Core rule
Only trust evidence. Only trust recent evidence.

- For tool/framework/model selection: evidence must be from 2025-06-01 or later
- For methodology/principles/patterns: no time limit, but must verify it still holds
- “I heard” / “people say” / “it’s common knowledge” = not evidence
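The recency rule above is mechanical enough to sketch as a date check. A minimal sketch, assuming the cutoff from the core rule; the function name and the `kind` labels are illustrative, not from the source:

```python
from datetime import date

# Cutoff from the core rule: tool/framework/model evidence must be
# dated 2025-06-01 or later to count at all.
TOOL_EVIDENCE_CUTOFF = date(2025, 6, 1)

def is_admissible(evidence_date: date, kind: str) -> bool:
    """Return True if a piece of evidence passes the recency rule.

    kind: "tool" for tool/framework/model selection,
          "methodology" for principles/patterns (no date cutoff,
          but the claim still needs separate re-verification).
    """
    if kind == "tool":
        return evidence_date >= TOOL_EVIDENCE_CUTOFF
    return True  # methodology evidence has no time limit
```

Note that passing the date check is necessary, not sufficient: the evidence still has to land in a tier below.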
Three evidence channels
Channel 1: Papers and articles
Published analysis with methodology and data.

| Tier | Source | Weight |
|---|---|---|
| A — Authoritative | Top venues (NeurIPS, ICML, ACL) + official publications from Anthropic, OpenAI, Google DeepMind | Primary evidence |
| B — Credible | Lesser-known papers, established tech blogs (Simon Willison, Lilian Weng, Chip Huyen) | Supporting evidence |
| C — Supplementary | Unknown authors, personal blogs, Medium posts | Supplement only |
Channel 2: Tool chains
Open-source tools, libraries, frameworks on GitHub, HuggingFace, npm, PyPI.

| Tier | Signal | Weight |
|---|---|---|
| A — Established | 1K+ stars + active maintenance + real issues addressed | Primary evidence |
| B — Emerging | 100-1K stars, OR backed by known org | Supporting evidence |
| C — Experimental | Under 100 stars, solo maintainer | Supplement only |
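The Channel 2 signals are quantitative, so the table can be sketched as a classifier. The star thresholds come from the table; collapsing "active maintenance + real issues addressed" and "backed by known org" into boolean inputs is a simplification:

```python
def tool_tier(stars: int, actively_maintained: bool, org_backed: bool) -> str:
    """Classify a tool per the Channel 2 table (A/B/C)."""
    if stars >= 1000 and actively_maintained:
        return "A"  # Established: primary evidence
    if stars >= 100 or org_backed:
        return "B"  # Emerging: supporting evidence
    return "C"      # Experimental: supplement only
```

A 20-star solo project lands in C even with an org logo in the README; the "backed by known org" signal should mean real maintenance commitment, not affiliation.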
Channel 3: User feedback
Community discussions, complaints, success stories.

| Tier | Source | Weight |
|---|---|---|
| A — Detailed | Specific experience with reproducible details, engagement > 50 | Primary evidence |
| B — Corroborated | Multiple independent users reporting the same thing | Supporting evidence |
| C — Anecdotal | Single user, no details, low engagement | Supplement only |
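The Channel 3 table can be sketched the same way. The engagement > 50 threshold is from the table; treating two or more independent reports as "corroborated" is an assumption, since the table does not give a number:

```python
def feedback_tier(has_reproducible_details: bool,
                  engagement: int,
                  independent_reports: int) -> str:
    """Classify community feedback per the Channel 3 table (A/B/C)."""
    if has_reproducible_details and engagement > 50:
        return "A"  # Detailed: primary evidence
    if independent_reports >= 2:
        return "B"  # Corroborated: supporting evidence
    return "C"      # Anecdotal: supplement only
```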
Reliability formula
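The source names a reliability formula but does not state it. One hypothetical way to combine the tiers above, purely as a sketch: weight A/B/C evidence as 3/2/1, sum across channels, and accept a claim only with at least one A-tier item or two independent B-tier items. None of these numbers are from the source:

```python
# Hypothetical tier weights; the actual formula is not given in the source.
TIER_WEIGHT = {"A": 3, "B": 2, "C": 1}

def reliability_score(tiers: list[str]) -> int:
    """Sum tier weights across all evidence items gathered."""
    return sum(TIER_WEIGHT[t] for t in tiers)

def is_supported(tiers: list[str]) -> bool:
    """Hypothetical acceptance rule: one A-tier item, or two B-tier items."""
    return tiers.count("A") >= 1 or tiers.count("B") >= 2
```

Whatever the real rule is, C-tier items should only ever raise a score that A/B evidence has already established, never carry a claim alone.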
Anti-patterns
| Trap | Why it fails |
|---|---|
| Appeal to popularity | 10K GitHub stars does not equal best solution |
| Appeal to recency | “Just released” does not equal better |
| Appeal to authority | “Google uses X” does not mean X is right for you |
| Survivorship bias | Success stories ignore all the failures |
| Single-source trust | One blog post does not equal truth |
| Outdated evidence | A 2024 benchmark is wrong for 2026 models |
When evidence is absent
- Acknowledge the gap explicitly
- Check existing knowledge bases for prior art under a different name
- If nothing: propose a small-scale test (pilot with 50 cases, not 5000)
- Never fill an evidence gap with AI speculation — label hypotheses as hypotheses