Stage 3: LLM Judge Filtering

Precision filtering with chain-of-thought reasoning

Rationale & Novelty

Problem

The high recall of earlier stages comes at the cost of substantial noise, and manual filtering is infeasible at this scale (108K papers).

Solution

Use LLMs with chain-of-thought reasoning and expert-informed prompts to classify each paper from its title and abstract alone.

Key Novelty

Golden-dataset-driven prompt engineering: LLMs analyze sets of true and false positives to refine the inclusion/exclusion criteria
Abstract-only, cost-optimized processing: $0.002 per paper vs. $0.02+ for full-text analysis
Edge-case handling: dedicated logic for complex scenarios (e.g., "hallmarks", "senolytics", and intervention-only papers)

6-Step Chain-of-Thought Classification

Initial Assessment

Does the abstract mention biological aging or related theories?

Theory Identification

Are specific aging theories or mechanisms explicitly discussed?

Exclusion Check

Filter out clinical trials, reviews without a theoretical focus, and purely interventional papers

Edge Case Analysis

Handle ambiguous cases: hallmarks papers, senolytic studies, evolutionary theories

Confidence Scoring

Assign a confidence level (high/medium/low) based on how central an aging theory is to the paper

Final Classification

Output structured JSON containing the decision (include/exclude/review) and the reasoning behind it
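The six steps can be folded into a single prompt whose JSON reply is validated before it is accepted. The sketch below is a minimal, standard-library-only illustration under stated assumptions: the prompt wording, the helpers `build_prompt` and `parse_judgment`, and the field names `decision`, `confidence`, and `reasoning` are hypothetical, and the rule that routes medium-confidence answers to manual review is an assumption suggested by the reported review set (edge cases and medium confidence), not a documented part of the pipeline. The actual LLM call is left out.

```python
import json
from dataclasses import dataclass

# The six reasoning steps, paraphrased from the classification scheme (hypothetical wording).
COT_STEPS = (
    "1. Initial assessment: does the abstract mention biological aging or related theories?",
    "2. Theory identification: are specific aging theories or mechanisms explicitly discussed?",
    "3. Exclusion check: clinical trial, review without theory focus, or purely interventional?",
    "4. Edge-case analysis: hallmarks papers, senolytic studies, evolutionary theories.",
    "5. Confidence scoring: high, medium, or low, based on theory centrality.",
    "6. Final classification: include, exclude, or review.",
)
ALLOWED_DECISIONS = {"include", "exclude", "review"}
ALLOWED_CONFIDENCE = {"high", "medium", "low"}


def build_prompt(title: str, abstract: str) -> str:
    """Assemble a judge prompt from the title and abstract only."""
    steps = "\n".join(COT_STEPS)
    return (
        "You are an expert in the biology of aging. Classify the paper below.\n"
        f"Reason through these steps before answering:\n{steps}\n\n"
        f"Title: {title}\nAbstract: {abstract}\n\n"
        'Answer with JSON: {"decision": "include|exclude|review", '
        '"confidence": "high|medium|low", "reasoning": "..."}'
    )


@dataclass
class Judgment:
    decision: str
    confidence: str
    reasoning: str


def parse_judgment(raw: str) -> Judgment:
    """Parse and validate the model's JSON reply; raise ValueError if it is malformed."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    decision, confidence, reasoning = (data.get(k) for k in ("decision", "confidence", "reasoning"))
    if not isinstance(decision, str) or decision not in ALLOWED_DECISIONS:
        raise ValueError(f"invalid decision: {decision!r}")
    if not isinstance(confidence, str) or confidence not in ALLOWED_CONFIDENCE:
        raise ValueError(f"invalid confidence: {confidence!r}")
    if not isinstance(reasoning, str) or not reasoning.strip():
        raise ValueError("missing reasoning")
    # Assumption: any medium-confidence answer is sent to manual review.
    if confidence == "medium":
        decision = "review"
    return Judgment(decision, confidence, reasoning)
```

Rejecting malformed replies outright, rather than guessing a decision, keeps every accepted classification traceable to an explicit reasoning string.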

Technical Implementation

Prompt Engineering

Expert-informed inclusion criteria based on aging biology domain knowledge
Golden dataset analysis: an LLM reviews true and false positives to refine the inclusion/exclusion criteria
Structured output format with reasoning transparency
Temperature tuning to balance consistency against creativity
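The golden-dataset step amounts to comparing the judge's includes against a hand-labeled set and feeding the disagreements back to an LLM. The sketch below assumes papers are identified by string IDs and that the golden dataset is a set of IDs known to be valid aging theory papers; the helper names and the meta-prompt wording are hypothetical, not the pipeline's own.

```python
from typing import Dict, Set, Tuple


def confusion_sets(predicted: Set[str], golden: Set[str]) -> Dict[str, Set[str]]:
    """Compare predicted-include IDs with golden-dataset IDs."""
    return {
        "tp": predicted & golden,  # correctly included
        "fp": predicted - golden,  # wrongly included
        "fn": golden - predicted,  # wrongly excluded (missed)
    }


def precision_recall(sets: Dict[str, Set[str]]) -> Tuple[float, float]:
    """Standard precision and recall from the three sets."""
    tp, fp, fn = len(sets["tp"]), len(sets["fp"]), len(sets["fn"])
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def refinement_prompt(sets: Dict[str, Set[str]], abstracts: Dict[str, str],
                      criteria: str, k: int = 5) -> str:
    """Build a meta-prompt asking an LLM to revise the criteria from TP/FP examples."""
    def sample(ids: Set[str]) -> str:
        # Sort for reproducibility and show at most k abstracts per group.
        return "\n".join(f"- {abstracts[i]}" for i in sorted(ids)[:k])

    return (
        f"Current inclusion/exclusion criteria:\n{criteria}\n\n"
        f"Correctly included (true positives):\n{sample(sets['tp'])}\n\n"
        f"Wrongly included (false positives):\n{sample(sets['fp'])}\n\n"
        "Explain what separates the two groups and propose revised criteria."
    )
```

Tracking recall alongside precision matters here because tightening the criteria to remove false positives can silently drop true theory papers.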

Processing Pipeline

Parallel, async processing with rate limiting
Checkpointing for fault tolerance and resumability
JSON schema validation for output quality
Real-time monitoring and error recovery
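A minimal sketch of such a pipeline, assuming an async `classify(text)` callable that returns a dict and a JSONL checkpoint file (one record per line). Rate limiting is approximated here by a concurrency cap with `asyncio.Semaphore`; `run_pipeline`, `load_checkpoint`, and `fake_classify` are hypothetical names, and `fake_classify` only stands in for a real LLM call.

```python
import asyncio
import json
import os
import tempfile
from typing import Awaitable, Callable, Dict, List, Tuple

Classifier = Callable[[str], Awaitable[dict]]


def load_checkpoint(path: str) -> Dict[str, dict]:
    """Return papers already classified in a JSONL checkpoint (empty if none)."""
    done: Dict[str, dict] = {}
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    record = json.loads(line)
                    done[record["id"]] = record
    return done


async def run_pipeline(papers: Dict[str, str], classify: Classifier, checkpoint: str,
                       max_concurrency: int = 8, max_retries: int = 3
                       ) -> Tuple[Dict[str, dict], List[str]]:
    done = load_checkpoint(checkpoint)       # resume: skip papers already finished
    sem = asyncio.Semaphore(max_concurrency)  # caps in-flight LLM calls
    lock = asyncio.Lock()                     # serializes checkpoint appends
    failed: List[str] = []

    async def worker(pid: str, text: str) -> None:
        for attempt in range(max_retries):
            try:
                async with sem:
                    result = await classify(text)
            except Exception:  # sketch: treat every error as transient
                await asyncio.sleep(0.1 * 2 ** attempt)  # exponential backoff
                continue
            record = {"id": pid, **result}
            async with lock:
                with open(checkpoint, "a", encoding="utf-8") as f:
                    f.write(json.dumps(record) + "\n")
                done[pid] = record
            return
        failed.append(pid)  # not checkpointed, so a later run retries it

    todo = [(pid, text) for pid, text in papers.items() if pid not in done]
    await asyncio.gather(*(worker(pid, text) for pid, text in todo))
    return done, failed


async def fake_classify(text: str) -> dict:
    """Hypothetical stand-in for a real LLM call."""
    await asyncio.sleep(0)
    return {"decision": "include" if "aging" in text.lower() else "exclude"}


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "checkpoint.jsonl")
    papers = {"p1": "A test of the disposable soma theory of aging", "p2": "Crop yield models"}
    print(asyncio.run(run_pipeline(papers, fake_classify, path)))
```

Appending one line per paper makes the checkpoint durable at single-paper granularity, so a crash loses at most the in-flight requests. A semaphore bounds concurrency but does not enforce a requests-per-minute quota, which a rate-limited API may additionally require.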

Results & Impact

30,000 valid aging theory papers retained (27.8% of the input corpus)

2,000 papers flagged for manual review (edge cases and medium-confidence classifications)

$0.002 cost per paper classification (10x cheaper than full-text analysis)

72.2% of the input corpus removed as noise, while maintaining high recall for true aging theory papers
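The headline figures in this section are internally consistent, which a few lines of arithmetic confirm. This sketch assumes an input corpus of exactly 108,000 papers (the text gives "108K") and uses $0.02 as the lower bound of the full-text cost; the 2,000 review-flagged papers are left out because the text does not say whether they are counted within the 30,000.

```python
corpus = 108_000       # assumed exact size of the "108K" input corpus
retained = 30_000      # papers kept by the LLM judge
abstract_cost = 0.002  # USD per paper, abstract-only
fulltext_cost = 0.02   # USD per paper, lower bound of "$0.02+"

retained_share = retained / corpus
print(f"retained: {retained_share:.1%}")                       # retained: 27.8%
print(f"removed: {1 - retained_share:.1%}")                    # removed: 72.2%
print(f"abstract-only cost: ${corpus * abstract_cost:,.0f}")   # abstract-only cost: $216
print(f"full-text cost: ${corpus * fulltext_cost:,.0f}+")      # full-text cost: $2,160+
print(f"ratio: {fulltext_cost / abstract_cost:.0f}x")          # ratio: 10x
```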

Edge Case Handling

Hallmarks Papers

Papers discussing the "hallmarks of aging" are included only if they propose or test specific mechanistic theories; papers that merely review existing hallmarks are excluded.

Senolytic Studies

Senolytic and other intervention papers are included if they explicitly test or discuss an underlying aging theory (e.g., the cellular senescence theory) and excluded if they are purely pharmacological.

Evolutionary Theories

Papers on evolutionary theories of aging (antagonistic pleiotropy, disposable soma) are always included, even if the abstract is brief.

Review Papers

Reviews are included if they synthesize or compare theories; excluded if they only summarize empirical findings without theoretical framing.
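In the pipeline these rules are applied by the LLM through prompt instructions. Writing them as explicit branches makes their logic easy to inspect and test. The sketch below is illustrative only: `PaperSignals`, its fields, and `edge_case_decision` are hypothetical, and the boolean signals stand for judgments the model would make from the title and abstract.

```python
from dataclasses import dataclass


@dataclass
class PaperSignals:
    """Hypothetical per-paper signals extracted from the title and abstract."""
    topic: str                      # "hallmarks", "senolytic", "evolutionary", "review", or "other"
    proposes_or_tests_theory: bool  # proposes or tests a specific mechanistic theory
    discusses_theory: bool          # explicitly tests or discusses an underlying aging theory
    compares_theories: bool         # synthesizes or compares theories


def edge_case_decision(s: PaperSignals) -> str:
    """Apply the four edge-case rules; return 'undecided' for non-edge cases."""
    if s.topic == "evolutionary":
        return "include"  # always included, even with a brief abstract
    if s.topic == "hallmarks":
        return "include" if s.proposes_or_tests_theory else "exclude"
    if s.topic == "senolytic":
        return "include" if s.discusses_theory else "exclude"  # exclude purely pharmacological work
    if s.topic == "review":
        return "include" if s.compares_theories else "exclude"
    return "undecided"  # fall through to the general six-step classification
```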
