Core rule

Only trust evidence. Only trust recent evidence.
  • For tool/framework/model selection: evidence must be from 2025-06-01 or later
  • For methodology/principles/patterns: no time limit, but must verify it still holds
  • “I heard” / “people say” / “it’s common knowledge” = not evidence

Three evidence channels

Channel 1: Papers and articles

Published analysis with methodology and data.
Tier              | Source                                                                                           | Weight
A — Authoritative | Top venues (NeurIPS, ICML, ACL) + official publications from Anthropic, OpenAI, Google DeepMind | Primary evidence
B — Credible      | Lesser-known papers, established tech blogs (Simon Willison, Lilian Weng, Chip Huyen)            | Supporting evidence
C — Supplementary | Unknown authors, personal blogs, Medium posts                                                    | Supplement only

Channel 2: Tool chains

Open-source tools, libraries, frameworks on GitHub, HuggingFace, npm, PyPI.
Tier             | Signal                                                 | Weight
A — Established  | 1K+ stars + active maintenance + real issues addressed | Primary evidence
B — Emerging     | 100-1K stars, OR backed by a known org                 | Supporting evidence
C — Experimental | Under 100 stars, solo maintainer                       | Supplement only
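As a rough sketch, the tiering rules above could be encoded as a small function. The function name and boolean inputs are illustrative assumptions, and the "real issues addressed" signal is folded into a single maintenance flag:

```python
def tool_tier(stars: int, actively_maintained: bool = False,
              backed_by_known_org: bool = False) -> str:
    """Classify an open-source tool into an evidence tier (sketch, not a spec)."""
    # A — Established: 1K+ stars plus active maintenance
    if stars >= 1000 and actively_maintained:
        return "A"
    # B — Emerging: 100-1K stars, OR backed by a known org
    if stars >= 100 or backed_by_known_org:
        return "B"
    # C — Experimental: under 100 stars, solo maintainer
    return "C"

print(tool_tier(5000, actively_maintained=True))  # A
print(tool_tier(300))                             # B
print(tool_tier(40))                              # C
```

Note that a high-star but unmaintained project deliberately drops to tier B: star count alone is never sufficient for primary evidence.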

Channel 3: User feedback

Community discussions, complaints, success stories.
Tier             | Source                                                          | Weight
A — Detailed     | Specific experience with reproducible details, engagement > 50  | Primary evidence
B — Corroborated | Multiple independent users reporting the same thing             | Supporting evidence
C — Anecdotal    | Single user, no details, low engagement                         | Supplement only

Reliability formula

Reliability = Channel_Count x Evidence_Quality

Channel count:
  1 channel  -> supplement only
  2 channels -> credible (act with caution)
  3 channels -> high confidence (act on this)

Evidence quality (per channel):
  A-tier -> weight 3
  B-tier -> weight 2
  C-tier -> weight 1

Score = sum of weights across channels
  >= 7 -> strong evidence
  4-6  -> moderate evidence
  1-3  -> weak evidence

Anti-patterns

Trap                 | Why it fails
Appeal to popularity | 10K GitHub stars does not equal best solution
Appeal to recency    | “Just released” does not equal better
Appeal to authority  | “Google uses X” does not mean X is right for you
Survivorship bias    | Success stories ignore all the failures
Single-source trust  | One blog post does not equal truth
Outdated evidence    | A 2024 benchmark is wrong for 2026 models

When evidence is absent

  1. Acknowledge the gap explicitly
  2. Check existing knowledge bases for prior art under a different name
  3. If nothing: propose a small-scale test (pilot with 50 cases, not 5000)
  4. Never fill an evidence gap with AI speculation — label hypotheses as hypotheses