Agentic AI Against Aging Hackathon

Agentic Systematic Aging Theories Collection Multi-Step Pipeline

Imagine searching for the secret of aging in a jungle of scattered scientific papers, each using different terms and ideas. What if we could build a living, explorable map of every credible aging theory, linked to its evidence, ready for both scientists and AI to use?

Top-Down Approach

Recall → Filter → Validate

Compression

108K → 2,141 theories

Cost Optimized

$0.002 per paper

Multi-Agent

5 specialized agents

The Challenge

The literature on aging theories is highly diverse, and theory papers are sparse within it. No single keyword or database covers it all.

Scattered Knowledge

Theories are spread across multiple databases and described with inconsistent terminology and diverse conceptual frameworks

High Complexity

Theories range from molecular mechanisms to system-level processes, requiring multi-scale analysis

Sparse Evidence

No comprehensive database exists—theories must be extracted from full-text papers, not just abstracts

Our Mission

Systematically collect, classify, and structure all scientific theories of aging—creating the most comprehensive, queryable knowledge base for aging research. A living, explorable map ready for both scientists and AI to use.

Top-Down Funnel Approach

Starting broad, then narrowing focus: maximize recall → fast filtering → precise extraction

1. Cast the Widest Net

Maximize recall with AI-generated queries

108K papers

2. Apply Fast, Scalable Filters

Remove obvious noise cheaply

90K papers

3. Precise LLM-Powered Classification

Identify true theory papers

30K papers

4. Extract & Normalize Theories

Validate and structure knowledge

2,141 canonical theories

5. Enable Deep Question Answering

Structured Q&A with evidence

142K Q&A pairs

Why This Approach Works

This broad-to-narrow funnel is essential for handling the diversity and complexity of aging theory research. We prioritize recall first, then progressively apply more sophisticated (and more expensive) filters, so that important theories are not missed while costs stay low.
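As a rough illustration of the funnel idea, the Python sketch below runs stages in order of increasing cost and records how many items each stage sees. The `Stage` class, the `run_funnel` helper, the toy papers, and the keyword predicate standing in for the LLM judge are all hypothetical and are not the pipeline's actual code.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Stage:
    name: str
    keep: Callable[[dict], bool]  # returns True if the paper survives this stage
    cost_per_item: float          # assumed cost in USD for judging one paper


def run_funnel(papers, stages):
    """Apply stages in order; return the survivors and a per-stage report."""
    current = list(papers)
    report = []
    for stage in stages:
        n_in = len(current)
        current = [p for p in current if stage.keep(p)]
        # (stage name, papers in, papers out, cost spent at this stage)
        report.append((stage.name, n_in, len(current), n_in * stage.cost_per_item))
    return current, report


# Toy input, invented for illustration only.
papers = [
    {"title": "Free radical theory of aging revisited",
     "abstract": "We review the theory and its evidence."},
    {"title": "Aging of concrete bridges", "abstract": ""},
    {"title": "Telomere length in mice",
     "abstract": "Measurements of telomere length."},
]

stages = [
    # Cheap filter first: drop records without an abstract.
    Stage("cheap_filter", lambda p: bool(p["abstract"]), 0.0),
    # Expensive step last: a keyword check stands in for an LLM judge.
    Stage("llm_judge_stand_in", lambda p: "theory" in p["abstract"].lower(), 0.002),
]

kept, report = run_funnel(papers, stages)
for name, n_in, n_out, cost in report:
    print(f"{name}: {n_in} -> {n_out} (cost ${cost:.3f})")
```

Because the expensive stage only sees what the cheap stage lets through, its cost scales with the filtered count rather than with the full collection.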

5-Stage Agentic Pipeline

Intelligent multi-agent orchestration for systematic knowledge extraction

Stage 1

Literature Mining

Querier Agent

AI-driven query expansion to maximize recall across diverse sources

108,000+

Papers Collected

Data Sources

40+

Query Variants
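To make the query-expansion idea concrete, here is a minimal Python sketch. It assumes fixed term lists in place of AI-generated ones, and the `expand_queries` and `merge_by_doi` helpers, the boolean query syntax, and the placeholder DOIs are all hypothetical. It shows two things: building query variants from combinations of terms, and merging results from many queries without double-counting a paper.

```python
from itertools import product


def expand_queries(subject_terms, concept_terms):
    """Build one boolean query per (subject, concept) combination."""
    return [f'"{s}" AND "{c}"' for s, c in product(subject_terms, concept_terms)]


def merge_by_doi(result_lists):
    """Merge results from several queries, keeping the first record per DOI."""
    seen = {}
    for results in result_lists:
        for record in results:
            key = record["doi"].strip().lower()  # normalize before comparing
            seen.setdefault(key, record)
    return list(seen.values())


# Hypothetical term lists; a real run would use many more variants.
queries = expand_queries(["aging", "senescence"], ["theory", "hypothesis", "model"])
for q in queries:
    print(q)

# Two queries returning the same paper with differently formatted DOIs.
merged = merge_by_doi([
    [{"doi": "10.0000/EXAMPLE-1"}],
    [{"doi": "10.0000/example-1 "}, {"doi": "10.0000/example-2"}],
])
print(len(merged), "unique papers")
```

Broad query sets overlap heavily, so deduplicating on a normalized identifier is what keeps the recall-first strategy from inflating the paper count.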

Stage 2

Full-Text Extraction

Collector Agent

Multi-source retrieval with intelligent parsing and quality metrics

90%+

Recovery Rate

Processing Speed

Parser Types

Stage 3

LLM Judge Filtering

Classification Agent

Chain-of-thought reasoning for precise paper classification

30,000

Valid Papers

27.8%

Pass Rate (30,000 of 108,000 collected papers)

$0.002

Cost per Paper
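The Stage 3 figures can be checked with a few lines of arithmetic. The Python sketch below uses only the counts reported on this page; the total-cost estimate additionally assumes that each of the roughly 90K papers left after the fast filters is judged exactly once, which the page does not state.

```python
collected = 108_000      # papers collected in Stage 1
after_filters = 90_000   # papers remaining after fast filters (funnel step 2)
valid = 30_000           # papers accepted by the LLM judge
cost_per_paper = 0.002   # reported cost in USD per paper

# Share of all collected papers that end up classified as valid.
share_of_collected = valid / collected
# Share of the papers assumed to reach the LLM judge that pass it.
share_of_filtered = valid / after_filters
# Assumption: one judge call per filtered paper.
estimated_cost = after_filters * cost_per_paper

print(f"valid / collected: {share_of_collected:.1%}")
print(f"valid / filtered:  {share_of_filtered:.1%}")
print(f"estimated Stage 3 cost: ${estimated_cost:.2f}")
```

The 27.8% figure matches 30,000 out of 108,000, that is, the share of collected papers retained as valid theory papers.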

Stage 4

Theory Extraction

Normalization Agent

Multi-stage extraction, validation, and mechanism-based clustering

27,595

Theory Mentions

2,141

Canonical Theories

12.9:1

Compression Ratio
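The sketch below illustrates how many raw mentions can collapse into a few canonical names, and how the compression ratio is computed. It is a simplified stand-in: it groups mentions by plain string normalization, whereas the pipeline describes mechanism-based clustering. The `normalize` helper and the example mentions are hypothetical.

```python
import re
from collections import defaultdict


def normalize(mention):
    """Reduce a theory mention to a crude canonical key."""
    s = mention.lower()
    s = re.sub(r"[^a-z0-9 ]+", " ", s)                 # punctuation and hyphens to spaces
    s = re.sub(r"\b(the|of|aging|ageing)\b", " ", s)   # drop filler words
    return " ".join(s.split())                         # collapse whitespace


# Invented mentions, for illustration only.
mentions = [
    "Free Radical Theory of Aging",
    "free-radical theory",
    "The free radical theory of ageing",
    "Disposable soma theory",
    "disposable-soma theory of aging",
]

groups = defaultdict(list)
for m in mentions:
    groups[normalize(m)].append(m)

toy_ratio = len(mentions) / len(groups)
print(f"{len(mentions)} mentions -> {len(groups)} canonical names ({toy_ratio:.1f}:1)")

# The same calculation with the counts reported for Stage 4.
reported_ratio = 27_595 / 2_141
print(f"reported: {reported_ratio:.1f}:1")
```

String normalization alone would miss theories that share a mechanism under unrelated names, which is presumably why a clustering step is needed on top of it.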

Stage 5

RAG-based QA

Retrieval Agent

Semantic search with advanced RAG for scalable question answering

1.15M

Chunks Indexed

142,317

Q&A Pairs

15,813

Unique Papers
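For readers unfamiliar with retrieval-augmented generation, the Python sketch below shows its retrieval half under strong simplifications: text is split into overlapping word chunks, each chunk is scored against the question, and the top matches are returned with their source identifier. Word-count vectors stand in for semantic embeddings, and the `chunk_words`, `embed`, `cosine`, and `retrieve` helpers, the chunk texts, and the placeholder DOIs are all hypothetical.

```python
import math
from collections import Counter


def chunk_words(text, size=50, overlap=10):
    """Split text into overlapping chunks of `size` words (requires size > overlap)."""
    words = text.split()
    step = size - overlap
    chunks = []
    for i in range(0, len(words), step):
        chunks.append(" ".join(words[i:i + size]))
        if i + size >= len(words):  # last window already reaches the end
            break
    return chunks


def embed(text):
    """Toy embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c["text"])), reverse=True)
    return ranked[:k]


# Invented chunks with placeholder DOIs, for illustration only.
index = [
    {"doi": "10.0000/example-a", "text": "telomere shortening limits cell division"},
    {"doi": "10.0000/example-b", "text": "mitochondrial damage accumulates with age"},
    {"doi": "10.0000/example-c", "text": "caloric restriction extends lifespan in mice"},
]

top = retrieve("what limits cell division telomere", index, k=1)
print(top[0]["doi"], "->", top[0]["text"])
print(chunk_words("a b c d e", size=3, overlap=1))
```

Keeping the DOI on every chunk is what lets an answer be traced back to the paper it came from.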

Pipeline Results

Building the largest structured knowledge base of aging theories

108K

Papers Collected

30K

Valid Theory Papers

2,141

Canonical Theories

Additional Achievements

15,813 Unique DOIs

Mapped to high-confidence theories with provenance tracking

142,317 Q&A Pairs

Structured answers about biomarkers, mechanisms, and interventions

1.15M Chunks Indexed

Semantic embeddings for scalable RAG-based retrieval

12.9:1 Compression

From 27,595 mentions to 2,141 validated canonical theories
