Token Efficiency

NervaPack reduces token usage by roughly 90% compared to naive RAG on the example query below. Here's how.

The Token Problem

LLMs have limited context windows:
- GPT-4o: 128K tokens
- Claude Sonnet: 200K tokens

Problem: Traditional RAG sends entire files, wasting tokens on irrelevant code.

Naive RAG Approach

Query: "How does authentication work?"

Naive RAG:
1. Vector search finds 3 relevant files
2. Send entire files to LLM
3. Total: 12,840 tokens (mostly irrelevant)
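
To make the waste concrete, here is a minimal sketch of the naive approach. The function names, the sample file, and the 4-characters-per-token estimate are all assumptions for illustration; a real pipeline would use the model's tokenizer.

```python
def estimate_tokens(text: str) -> int:
    # Assumption: ~4 characters per token, a rough heuristic, not a real tokenizer.
    return len(text) // 4


def naive_rag_context(files: dict[str, str], matched_paths: list[str]) -> str:
    # Naive RAG: include every matched file in full, relevant or not.
    return "\n\n".join(files[path] for path in matched_paths)


# Hypothetical file: one relevant function surrounded by unrelated code.
relevant = "def login(user, password):\n    return check(user, password)\n"
files = {"auth.py": relevant + "# unrelated helper code\n" * 100}

context = naive_rag_context(files, ["auth.py"])
print(estimate_tokens(relevant), "of", estimate_tokens(context),
      "estimated tokens are relevant")
```

Only a small fraction of the estimated tokens belongs to the function the query is actually about.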

NervaPack Approach

Query: "How does authentication work?"

NervaPack:
1. Vector search finds 3 seed entities
2. Graph traversal (BFS) finds related code
3. Extract only relevant classes/functions
4. Total: 1,180 tokens (90.8% reduction)
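
The steps above can be sketched as a breadth-first walk over a code graph. This is an illustrative sketch, not NervaPack's actual implementation: the graph layout (entity name mapped to related entity names), the `max_depth` parameter, and the function names are assumptions.

```python
from collections import deque


def collect_entities(graph: dict[str, list[str]], seeds: list[str],
                     max_depth: int = 1) -> list[str]:
    """Breadth-first walk from the seed entities, returned in visit order."""
    order = list(dict.fromkeys(seeds))  # drop duplicate seeds, keep order
    seen = set(order)
    queue = deque((seed, 0) for seed in order)
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                order.append(neighbor)
                queue.append((neighbor, depth + 1))
    return order


def build_context(snippets: dict[str, str], entities: list[str]) -> str:
    """Join only the source snippets of the selected entities."""
    return "\n\n".join(snippets[e] for e in entities if e in snippets)


# Hypothetical call graph for the authentication query.
graph = {
    "login": ["verify_password", "create_session"],
    "verify_password": ["hash_password"],
}
print(collect_entities(graph, ["login"], max_depth=1))
# ['login', 'verify_password', 'create_session']
```

Limiting the traversal depth is one way to keep the extracted context small; the seed entities would come from the vector search in step 1.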

Token Savings Dashboard

Every query reports its token savings:

╭──────────────  Token Efficiency  ──────────────╮
│  Naive RAG        12,840   ████████████████    │
│  NervaPack         1,180   █░░░░░░░░░░░░░░░    │
│                                                 │
│  Savings: 11,660 tokens (90.8%)                │
│  Cost saved: $0.0292/query (GPT-4o)            │
╰─────────────────────────────────────────────────╯
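
The figures in the dashboard follow from simple arithmetic, sketched below. The price of $2.50 per 1M input tokens is an assumption used for illustration (it is consistent with the cost shown above); check current GPT-4o pricing before relying on it. The function name is hypothetical.

```python
def token_savings(naive_tokens: int, packed_tokens: int,
                  usd_per_million_input: float = 2.50):
    """Return (tokens saved, percent saved, USD saved per query)."""
    saved = naive_tokens - packed_tokens
    percent = 100 * saved / naive_tokens
    # Assumed input price; only input tokens are counted here.
    cost_saved = saved * usd_per_million_input / 1_000_000
    return saved, percent, cost_saved


saved, percent, cost_saved = token_savings(12_840, 1_180)
print(f"Savings: {saved:,} tokens ({percent:.1f}%)")  # Savings: 11,660 tokens (90.8%)
print(f"Cost saved: about ${cost_saved:.3f}/query")   # Cost saved: about $0.029/query
```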

Why This Matters

- Lower costs: fewer input tokens mean less API spending.
- Faster responses: smaller prompts take less time to process.
- Better quality: the LLM focuses on relevant code.
- More headroom: more queries fit in the same context window.