Aging Theories Pipeline
Agentic AI Against Aging Hackathon

Agentic Systematic Aging Theories Collection Multi-Step Pipeline

Imagine searching for the secret of aging in a jungle of scattered scientific papers, each using different terms and ideas. What if we could build a living, explorable map of every credible aging theory, linked to its evidence, ready for both scientists and AI to use?

Top-Down Approach
Recall → Filter → Validate
Compression
108K papers → 2,141 theories
Cost Optimized
$0.002 per paper
Multi-Agent
5 specialized agents

The Challenge

The literature on aging theories is diverse and thinly spread: no single keyword or database covers it all.

Scattered Knowledge

Theories are spread across multiple databases, with inconsistent terminology and diverse conceptual frameworks

High Complexity

Theories range from molecular mechanisms to system-level processes, requiring multi-scale analysis

Sparse Evidence

No comprehensive database exists—theories must be extracted from full-text papers, not just abstracts

Our Mission

Systematically collect, classify, and structure all scientific theories of aging—creating the most comprehensive, queryable knowledge base for aging research. A living, explorable map ready for both scientists and AI to use.

Top-Down Funnel Approach

Starting broad, then narrowing focus: maximize recall → fast filtering → precise extraction

1. Cast the Widest Net
Maximize recall with AI-generated queries
108K papers
2. Apply Fast, Scalable Filters
Remove obvious noise cheaply
90K papers
3. Precise LLM-Powered Classification
Identify true theory papers
30K papers
4. Extract & Normalize Theories
Validate and structure knowledge
2,141 theories
5. Enable Deep Question Answering
Structured Q&A with evidence
142K Q&A pairs

Why This Approach Works

Working from broad to narrow lets the pipeline handle the diversity and complexity of aging theory research. Recall comes first; progressively more sophisticated (and more expensive) filters follow, so important theories are not missed and costs stay low.
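The broad-to-narrow ordering can be illustrated with a minimal sketch. It assumes each stage is a simple keep/drop predicate and that stages are listed cheapest first; the names `Stage`, `run_funnel`, `has_abstract`, and `mentions_theory` are hypothetical and not taken from the project's code, and the keyword check only stands in for the LLM classifier.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Stage:
    name: str
    keep: Callable[[dict], bool]  # True if the paper survives this stage


def run_funnel(papers: Iterable[dict], stages: list[Stage]) -> tuple[list[dict], dict[str, int]]:
    """Apply stages in the given order and record how many papers survive each."""
    survivors = list(papers)
    counts = {"collected": len(survivors)}
    for stage in stages:
        survivors = [p for p in survivors if stage.keep(p)]
        counts[stage.name] = len(survivors)
    return survivors, counts


# Hypothetical stand-ins: a cheap metadata filter, then a (pretend) costly classifier.
def has_abstract(paper: dict) -> bool:
    return bool(paper.get("abstract"))


def mentions_theory(paper: dict) -> bool:
    return "theory" in paper.get("abstract", "").lower()


papers = [
    {"title": "A", "abstract": "The free radical theory of aging revisited."},
    {"title": "B", "abstract": ""},
    {"title": "C", "abstract": "A cohort study of sleep duration."},
]
stages = [Stage("cheap_filter", has_abstract), Stage("classifier", mentions_theory)]
survivors, counts = run_funnel(papers, stages)
print(counts)
```

Because the expensive predicate only sees papers that passed the cheap one, the per-paper cost of the later stage is paid on a smaller set.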

5-Stage Agentic Pipeline

Intelligent multi-agent orchestration for systematic knowledge extraction

Stage 1

Literature Mining

Querier Agent

AI-driven query expansion to maximize recall across diverse sources

108,000+
Papers Collected
6
Data Sources
40+
Query Variants
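
One simple way to produce many query variants is to pair subject terms with concept terms. The sketch below is a combinatorial stand-in, not the Querier Agent's AI-driven expansion; the term lists, the boolean query syntax, and the function name `expand_queries` are all assumptions made for illustration.

```python
from itertools import product


def expand_queries(subject_terms, concept_terms):
    """Return every subject/concept pairing as a boolean query, deduplicated, order preserved."""
    seen = set()
    queries = []
    for subject, concept in product(subject_terms, concept_terms):
        query = f'"{subject}" AND "{concept}"'
        if query not in seen:
            seen.add(query)
            queries.append(query)
    return queries


# Hypothetical seed terms; a real run would draw these from an LLM or a thesaurus.
subject_terms = ["aging", "ageing", "senescence", "longevity"]
concept_terms = ["theory", "hypothesis", "mechanism"]
queries = expand_queries(subject_terms, concept_terms)
print(len(queries), queries[0])
```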

Stage 2

Full-Text Extraction

Collector Agent

Multi-source retrieval with intelligent parsing and quality metrics

90%+
Recovery Rate
8x
Processing Speed
2
Parser Types

Stage 3

LLM Judge Filtering

Classification Agent

Chain-of-thought reasoning for precise paper classification

30,000
Valid Papers
27.8%
Acceptance Rate (30,000 of 108,000 papers)
$0.002
Cost per Paper

Stage 4

Theory Extraction

Normalization Agent

Multi-stage extraction, validation, and mechanism-based clustering

27,595
Theory Mentions
2,141
Canonical Theories
12.9:1
Compression Ratio
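
The compression ratio is the number of theory mentions divided by the number of canonical theories they collapse into. The sketch below shows the idea on toy data. It assumes normalization by lowercasing, whitespace cleanup, and a hand-written alias map; that is a simplification standing in for the mechanism-based clustering, and `normalize`, `group_mentions`, and the alias entries are hypothetical.

```python
from collections import defaultdict


def normalize(mention: str, aliases: dict) -> str:
    """Lowercase, collapse whitespace, then map known aliases to a canonical name."""
    key = " ".join(mention.lower().split())
    return aliases.get(key, key)


def group_mentions(mentions, aliases):
    """Group raw mentions under their canonical theory name."""
    groups = defaultdict(list)
    for mention in mentions:
        groups[normalize(mention, aliases)].append(mention)
    return dict(groups)


# Toy data: four mentions that collapse to two canonical names.
aliases = {"free radical theory of aging": "free radical theory"}
mentions = [
    "Free Radical Theory",
    "free radical theory of aging",
    "Disposable soma theory",
    "disposable  soma theory",
]
groups = group_mentions(mentions, aliases)
toy_ratio = len(mentions) / len(groups)

# The same arithmetic applied to the reported figures.
reported_ratio = round(27_595 / 2_141, 1)
print(toy_ratio, reported_ratio)
```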

Stage 5

RAG-based QA

Retrieval Agent

Semantic search with advanced RAG for scalable question answering

1.15M
Chunks Indexed
142,317
Q&A Pairs
15,813
Unique Papers
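
The retrieval step of a RAG system can be sketched in a few lines: split text into chunks, score each chunk against the question, and return the best matches. This sketch uses bag-of-words cosine similarity as a stand-in for the semantic embeddings the pipeline indexes; the chunk size, the toy sentences, and the function names are assumptions for illustration only.

```python
import math
import re
from collections import Counter


def chunk_text(text, size=8):
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def vectorize(text):
    """Bag-of-words term counts (a crude substitute for an embedding)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, chunks, k=1):
    """Return the k chunks most similar to the question."""
    q = vectorize(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, vectorize(c)), reverse=True)
    return ranked[:k]


# Toy chunks; a real index would hold chunks cut from full-text papers.
chunks = [
    "Telomere shortening limits the number of cell divisions.",
    "Caloric restriction extends lifespan in several model organisms.",
]
top = retrieve("Which intervention extends lifespan?", chunks, k=1)
print(top)
```

In a full system the retrieved chunks would then be passed to a language model together with the question, so that each answer can cite the papers it came from.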

Pipeline Results

Building the largest structured knowledge base of aging theories

108K
Papers Collected
30K
Valid Theory Papers
2,141
Canonical Theories

Additional Achievements

15,813 Unique DOIs
Mapped to high-confidence theories with provenance tracking
142,317 Q&A Pairs
Structured answers about biomarkers, mechanisms, and interventions
1.15M Chunks Indexed
Semantic embeddings for scalable RAG-based retrieval
12.9:1 Compression
From 27,595 mentions to 2,141 validated canonical theories

Novel multi-stage pipeline for systematic aging theories discovery and analysis

Full Documentation on GitHub