Aging Theories Pipeline
Stage 1 of 5

Literature Mining

AI-driven query expansion to maximize recall across diverse sources

Querier Agent
108,000+ Papers Collected

Rationale & Novelty

Problem:

Aging theories are scattered across disciplines and are often described with inconsistent terminology. Traditional search methods miss critical papers because of varied naming conventions and cross-disciplinary fragmentation.

Solution:

Cast an ultra-wide net using AI- and expert-informed queries, covering evolutionary, molecular, systems, and intervention-based frameworks. Iteratively expand queries based on discovered papers.

Key Novelty:

  • AI-driven query expansion: LLM generates seed queries, analyzes collected papers, and iteratively creates new queries to maximize coverage
  • Multi-source aggregation: PubMed, arXiv, bioRxiv, medRxiv, OpenAlex, and Europe PMC
  • Diverse terminology: 40+ theoretical frameworks and synonyms, all tracked for reproducibility

Key Features

Parallel Multi-threaded Fetching

Concurrent API calls with rate limiting and retry logic keep throughput high while respecting each source's usage limits.
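
A minimal sketch of how rate-limited, retried fetching on a thread pool could look. This is an illustration, not the pipeline's actual code: the names RateLimiter, fetch_with_retry, and fetch_all are hypothetical, and the fetch callable stands in for whatever client talks to a given source.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class RateLimiter:
    """Hypothetical limiter: at most one call per `min_interval` seconds, shared by all threads."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._lock = threading.Lock()
        self._next_slot = 0.0

    def wait(self):
        # Reserve the next free time slot under the lock, then sleep outside it
        # so other threads can reserve their own slots in the meantime.
        with self._lock:
            now = time.monotonic()
            slot = max(now, self._next_slot)
            self._next_slot = slot + self.min_interval
        time.sleep(max(0.0, slot - now))


def fetch_with_retry(fetch, query, limiter, max_attempts=3, base_delay=0.01):
    """Call `fetch(query)`, retrying with exponential backoff on any exception."""
    for attempt in range(max_attempts):
        limiter.wait()
        try:
            return fetch(query)
        except Exception:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            time.sleep(base_delay * 2 ** attempt)


def fetch_all(fetch, queries, limiter, max_workers=4):
    """Run one rate-limited, retried fetch per query on a thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {q: pool.submit(fetch_with_retry, fetch, q, limiter) for q in queries}
        return {q: f.result() for q, f in futures.items()}
```

In practice each source would likely get its own limiter, since the sources publish different request limits.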

API-aware Caching

A caching layer prevents redundant API calls and makes interrupted runs resumable, which is critical for large-scale data collection.
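
One way such a cache could work is sketched below, assuming responses are JSON-serializable and stored on disk under a hash of the source name and request parameters. ResponseCache and its methods are hypothetical names chosen for illustration.

```python
import hashlib
import json
from pathlib import Path


class ResponseCache:
    """Hypothetical on-disk cache: one JSON file per (source, params) request."""

    def __init__(self, cache_dir):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _path(self, source, params):
        # Sorting keys makes the cache key independent of dict ordering.
        key = json.dumps({"source": source, "params": params}, sort_keys=True)
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.cache_dir / f"{digest}.json"

    def get_or_fetch(self, source, params, fetch):
        """Return the cached response if present, else call `fetch` and store the result."""
        path = self._path(source, params)
        if path.exists():
            return json.loads(path.read_text(encoding="utf-8"))
        result = fetch(source, params)
        # Write to a temporary file first so an interrupted run never
        # leaves a half-written cache entry behind.
        tmp = path.with_suffix(".tmp")
        tmp.write_text(json.dumps(result), encoding="utf-8")
        tmp.replace(path)
        return result
```

Because completed requests are served from disk, rerunning the same queries after a crash only issues the calls that had not finished.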

Data Integrity Checks

Automatic deduplication across sources, provenance tracking for every record, and validation of metadata completeness.
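
A rough sketch of cross-source deduplication with provenance tracking follows. It assumes records are dictionaries with a source field and either a DOI or a title; the matching rule (normalized DOI, falling back to normalized title) and the REQUIRED_FIELDS list are illustrative assumptions, not the pipeline's documented logic.

```python
import re

# Assumed set of fields to check for completeness; the real list may differ.
REQUIRED_FIELDS = ("title", "abstract", "year")


def record_key(record):
    """Build a match key: normalized DOI if available, otherwise normalized title."""
    doi = (record.get("doi") or "").strip().lower()
    doi = re.sub(r"^(https?://(dx\.)?doi\.org/|doi:)", "", doi)
    if doi:
        return ("doi", doi)
    title = re.sub(r"[^a-z0-9]+", " ", (record.get("title") or "").lower()).strip()
    return ("title", title)


def deduplicate(records):
    """Merge records sharing a key, listing every contributing source in `sources`."""
    merged = {}
    for rec in records:
        key = record_key(rec)
        if key not in merged:
            kept = {k: v for k, v in rec.items() if k != "source"}
            kept["sources"] = [rec["source"]]
            merged[key] = kept
            continue
        kept = merged[key]
        if rec["source"] not in kept["sources"]:
            kept["sources"].append(rec["source"])
        # Fill fields the first copy was missing from later copies.
        for field, value in rec.items():
            if field != "source" and not kept.get(field) and value:
                kept[field] = value
    return list(merged.values())


def missing_fields(record):
    """Return the required metadata fields that are empty or absent."""
    return [f for f in REQUIRED_FIELDS if not record.get(f)]
```

A limitation of this simple rule is that a copy with a DOI and a copy without one will not be merged, even when their titles match.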

Structured Exports

SQLite database and JSON exports ensure reproducibility and enable downstream processing with full metadata preservation.
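
As an illustration of the export step, the sketch below writes the same records to a SQLite table and a JSON file using only the Python standard library. The table name, columns, and record fields (record_id, sources, matched_query) are assumptions made for this example; the actual schema is not described here.

```python
import json
import sqlite3


def export_records(records, db_path, json_path):
    """Write records to a SQLite table and a JSON file (hypothetical schema)."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS papers ("
                "record_id TEXT PRIMARY KEY, title TEXT, year INTEGER, "
                "sources TEXT, matched_query TEXT)"
            )
            # INSERT OR REPLACE keeps the export idempotent across reruns.
            conn.executemany(
                "INSERT OR REPLACE INTO papers VALUES (?, ?, ?, ?, ?)",
                [
                    (
                        r["record_id"],
                        r["title"],
                        r.get("year"),
                        json.dumps(r["sources"]),  # provenance stored as a JSON list
                        r.get("matched_query"),
                    )
                    for r in records
                ],
            )
    finally:
        conn.close()
    with open(json_path, "w", encoding="utf-8") as fh:
        json.dump(records, fh, ensure_ascii=False, indent=2)
```

Storing the matched query next to each record is one way to keep the link between a paper and the search that found it.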

Technical Implementation

Data Sources (6 APIs)

PubMed
arXiv
bioRxiv
medRxiv
OpenAlex
Europe PMC

Query Expansion Strategy

1. Seed Query Generation
LLM generates initial queries covering major aging theory categories (evolutionary, molecular, systems biology)
2. Paper Collection & Analysis
Collect papers from all sources, analyze abstracts and keywords to identify terminology patterns
3. Iterative Refinement
Generate new queries based on discovered papers, expanding to related terms and synonyms
4. Convergence Check
Stop when new queries yield diminishing returns (<5% new papers)
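
The four steps above can be sketched as a loop. This is a simplified illustration under stated assumptions: search and propose_queries are hypothetical callables standing in for the multi-source fetch and the LLM query generator, and the 5% threshold is interpreted here as the share of the total collection contributed by the latest round, which is only one possible reading of the criterion.

```python
def expand_queries(seed_queries, search, propose_queries, threshold=0.05, max_rounds=10):
    """Iteratively run queries until a round adds less than `threshold` new papers.

    search(query) -> iterable of paper IDs (stand-in for the multi-source fetch)
    propose_queries(new_ids) -> list of query strings (stand-in for the LLM step)
    """
    seen_ids = set()
    used_queries = []
    pending = list(seed_queries)
    for _ in range(max_rounds):
        if not pending:
            break
        # Steps 1-2: run the pending queries and collect what they return.
        round_ids = set()
        for query in pending:
            round_ids.update(search(query))
        used_queries.extend(pending)
        new_ids = round_ids - seen_ids
        seen_ids |= new_ids
        # Step 4: stop once this round contributes too small a share of the collection.
        if len(new_ids) / max(len(seen_ids), 1) < threshold:
            break
        # Step 3: derive follow-up queries from the newly discovered papers.
        pending = [q for q in propose_queries(new_ids) if q not in used_queries]
    return seen_ids, used_queries
```

Returning the list of queries actually issued alongside the paper IDs keeps the run traceable, in line with tracking all query variants for reproducibility.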

Results & Impact

108,000+ Unique Records: maximizing recall for downstream curation
40+ Query Variants: covering diverse theoretical frameworks
6 Data Sources: multi-source aggregation strategy