Core rule

Only trust evidence. Only trust recent evidence.
  • For tool/framework/model selection: evidence must be from 2025-06-01 or later
  • For methodology/principles/patterns: no time limit, but must verify it still holds
  • “I heard” / “people say” / “it’s common knowledge” = not evidence

Three evidence channels

Channel 1: Papers and articles

Published analysis with methodology and data.
Tier              | Source                                                                                           | Weight
A — Authoritative | Top venues (NeurIPS, ICML, ACL) + official publications from Anthropic, OpenAI, Google DeepMind | Primary evidence
B — Credible      | Lesser-known papers, established tech blogs (Simon Willison, Lilian Weng, Chip Huyen)            | Supporting evidence
C — Supplementary | Unknown authors, personal blogs, Medium posts                                                    | Supplement only

Channel 2: Tool chains

Open-source tools, libraries, frameworks on GitHub, HuggingFace, npm, PyPI.
Tier             | Signal                                                 | Weight
A — Established  | 1K+ stars + active maintenance + real issues addressed | Primary evidence
B — Emerging     | 100-1K stars, OR backed by a known org                 | Supporting evidence
C — Experimental | Under 100 stars, solo maintainer                       | Supplement only
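As a rough sketch, the tiering rules above could be encoded as a small function. The function name and boolean inputs are illustrative assumptions, and the "real issues addressed" signal is folded into a single maintenance flag:

```python
def tool_tier(stars: int, actively_maintained: bool = False,
              backed_by_known_org: bool = False) -> str:
    """Classify an open-source tool into an evidence tier (sketch, not a spec)."""
    # A — Established: 1K+ stars plus active maintenance
    if stars >= 1000 and actively_maintained:
        return "A"
    # B — Emerging: 100-1K stars, OR backed by a known org
    if stars >= 100 or backed_by_known_org:
        return "B"
    # C — Experimental: under 100 stars, solo maintainer
    return "C"

print(tool_tier(5000, actively_maintained=True))  # A
print(tool_tier(300))                             # B
print(tool_tier(40))                              # C
```

Note that a high-star but unmaintained project deliberately drops to tier B: star count alone is never sufficient for primary evidence.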

Channel 3: User feedback

Community discussions, complaints, success stories.
Tier             | Source                                                          | Weight
A — Detailed     | Specific experience with reproducible details, engagement > 50  | Primary evidence
B — Corroborated | Multiple independent users reporting the same thing             | Supporting evidence
C — Anecdotal    | Single user, no details, low engagement                         | Supplement only

Reliability formula

Reliability = Channel_Count x Evidence_Quality

Channel count:
  1 channel  -> supplement only
  2 channels -> credible (act with caution)
  3 channels -> high confidence (act on this)

Evidence quality (per channel):
  A-tier -> weight 3
  B-tier -> weight 2
  C-tier -> weight 1

Score = sum of weights across channels
  >= 7 -> strong evidence
  4-6  -> moderate evidence
  1-3  -> weak evidence

Anti-patterns

Trap                 | Why it fails
Appeal to popularity | 10K GitHub stars does not equal best solution
Appeal to recency    | “Just released” does not equal better
Appeal to authority  | “Google uses X” does not mean X is right for you
Survivorship bias    | Success stories ignore all the failures
Single-source trust  | One blog post does not equal truth
Outdated evidence    | A 2024 benchmark is wrong for 2026 models

When evidence is absent

  1. Acknowledge the gap explicitly
  2. Check existing knowledge bases for prior art under a different name
  3. If nothing: propose a small-scale test (pilot with 50 cases, not 5000)
  4. Never fill an evidence gap with AI speculation — label hypotheses as hypotheses