Token Efficiency
NervaPack reduces token usage by about 90% compared to naive RAG (90.8% on the example query below). Here's how.
The Token Problem
LLMs have limited context windows:
- GPT-4o: 128K tokens
- Claude Sonnet: 200K tokens
Problem: naive RAG sends entire files to the LLM, wasting tokens on code that is irrelevant to the query.
Naive RAG Approach
Query: "How does authentication work?"
Naive RAG:
1. Vector search finds 3 relevant files
2. Send the entire files to the LLM
3. Total: 12,840 tokens (mostly irrelevant)
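A minimal sketch of the naive pipeline above, assuming each file has one precomputed embedding and that "vector search" means ranking files by cosine similarity to the query. The `files` layout and the function names are hypothetical, not part of NervaPack. The point is step 2: every retrieved file goes into the prompt in full, including helpers that have nothing to do with the query.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def naive_rag_context(query_vec: list[float],
                      files: dict[str, tuple[list[float], str]],
                      k: int = 3) -> str:
    """Naive RAG: rank whole files by similarity and return their FULL text.

    `files` maps a path to (file embedding, full source). Hypothetical layout.
    """
    ranked = sorted(files, key=lambda p: cosine(query_vec, files[p][0]), reverse=True)
    # Step 2 of the naive approach: every retrieved file is sent in full,
    # including code that has nothing to do with the query.
    return "\n\n".join(f"# File: {p}\n{files[p][1]}" for p in ranked[:k])


# Toy index: 2-D vectors stand in for real embeddings.
files = {
    "auth/login.py": ([0.9, 0.1], "def login(user, pw): ...\ndef unrelated_helper(): ..."),
    "auth/session.py": ([0.8, 0.3], "class Session: ..."),
    "auth/tokens.py": ([0.7, 0.2], "def issue_token(user): ..."),
    "ui/theme.py": ([0.0, 1.0], "PRIMARY_COLOR = '#336699'"),
}
context = naive_rag_context([1.0, 0.0], files, k=3)
print(context)
```

Note that `unrelated_helper` lands in the prompt simply because it shares a file with `login`; at real file sizes this is where most of the 12,840 tokens go.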
NervaPack Approach
Query: "How does authentication work?"
NervaPack:
1. Vector search finds 3 seed entities
2. Graph traversal (BFS) finds related code
3. Extract only relevant classes/functions
4. Total: 1,180 tokens (90.8% reduction)
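The steps above can be sketched as a depth-limited breadth-first search over a code graph whose nodes are functions and classes and whose edges are relationships such as calls. This is an illustration under assumptions, not NervaPack's implementation: the adjacency-dict graph, the `max_depth` cutoff, the seed list (standing in for the vector-search hits of step 1) and all function names are hypothetical.

```python
from collections import deque


def bfs_related(graph: dict[str, list[str]], seeds: list[str],
                max_depth: int = 2) -> list[str]:
    """Entities reachable from the seeds within max_depth hops, in BFS order."""
    order = list(dict.fromkeys(seeds))      # de-duplicate seeds, keep order
    seen = set(order)
    queue = deque((s, 0) for s in order)
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:              # don't expand past the depth limit
            continue
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                order.append(neighbour)
                queue.append((neighbour, depth + 1))
    return order


def pack_context(entities: list[str], sources: dict[str, str]) -> str:
    """Concatenate only the source snippets of the selected entities."""
    return "\n\n".join(sources[e] for e in entities if e in sources)


# Hypothetical code graph: edges point from an entity to the code it uses.
graph = {
    "AuthService.login": ["verify_password", "SessionStore.create"],
    "verify_password": ["hash_password"],
    "SessionStore.create": ["generate_token"],
    "generate_token": ["random_bytes"],
    "render_navbar": ["load_theme"],
}
# One placeholder snippet per entity, e.g. "def verify_password(...): ...".
sources = {name: f"def {name.split('.')[-1]}(...): ..." for name in
           ["AuthService.login", "verify_password", "SessionStore.create",
            "hash_password", "generate_token", "random_bytes",
            "render_navbar", "load_theme"]}

seeds = ["AuthService.login"]   # stand-in for the vector-search hits of step 1
related = bfs_related(graph, seeds, max_depth=2)
context = pack_context(related, sources)
print(related)
# ['AuthService.login', 'verify_password', 'SessionStore.create', 'hash_password', 'generate_token']
```

With `max_depth=2`, `random_bytes` (three hops away) and the unconnected `render_navbar` are left out of the prompt. In this sketch the depth limit is the knob that trades recall for prompt size.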
Token Savings Dashboard
Every query displays a token-savings summary:
╭────────────── Token Efficiency ──────────────╮
│ Naive RAG   12,840  ████████████████         │
│ NervaPack    1,180  █░░░░░░░░░░░░░░░         │
│                                              │
│ Savings: 11,660 tokens (90.8%)               │
│ Cost saved: $0.0292/query (GPT-4o)           │
╰──────────────────────────────────────────────╯
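The dashboard figures follow from simple arithmetic, reproduced in the sketch below. It assumes a GPT-4o input price of $2.50 per million tokens, which is the rate that yields $0.0292 for 11,660 tokens; check current pricing before relying on the cost line. `token_savings` and the price constant are hypothetical names, not NervaPack's API.

```python
from decimal import Decimal, ROUND_HALF_UP

# ASSUMPTION: GPT-4o input price of $2.50 per 1M tokens. This is the rate that
# produces the dashboard's $0.0292 figure; check current pricing before reuse.
GPT4O_INPUT_USD_PER_MTOK = Decimal("2.50")


def token_savings(naive_tokens: int, packed_tokens: int,
                  usd_per_mtok: Decimal = GPT4O_INPUT_USD_PER_MTOK):
    """Return (tokens saved, percent saved, USD saved per query)."""
    saved = naive_tokens - packed_tokens
    percent = round(100 * saved / naive_tokens, 1)
    # Decimal keeps the cost exact (0.02915) before rounding to four places.
    cost = (Decimal(saved) * usd_per_mtok / Decimal(1_000_000)).quantize(
        Decimal("0.0001"), rounding=ROUND_HALF_UP
    )
    return saved, percent, cost


saved, percent, cost = token_savings(12_840, 1_180)
print(f"Savings: {saved:,} tokens ({percent}%)")   # Savings: 11,660 tokens (90.8%)
print(f"Cost saved: ${cost}/query (GPT-4o)")       # Cost saved: $0.0292/query (GPT-4o)
```

Decimal is used for the cost because the exact value sits on a rounding boundary at the fourth decimal place, where binary floats can round either way.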
Why This Matters
- Lower costs: less API spending per query
- Faster responses: smaller prompts take less time to process
- Better quality: the LLM focuses on relevant code instead of unrelated files
- More context: more queries fit in the same context window
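To put "more context" in numbers, here is a rough sketch of how many prompts of each size fit in the context windows listed earlier, taking 128K and 200K as 128,000 and 200,000 tokens. It ignores system prompts, conversation history and output tokens, so real headroom is smaller; the names are illustrative only.

```python
# Context-window sizes from "The Token Problem" (assumed exact values).
CONTEXT_WINDOWS = {"GPT-4o": 128_000, "Claude Sonnet": 200_000}
# Prompt sizes from the example query above.
PROMPT_SIZES = {"Naive RAG": 12_840, "NervaPack": 1_180}


def prompts_that_fit(window_tokens: int, prompt_tokens: int) -> int:
    """Whole prompts of a given size that fit in one context window."""
    return window_tokens // prompt_tokens


fits = {
    (model, approach): prompts_that_fit(window, size)
    for model, window in CONTEXT_WINDOWS.items()
    for approach, size in PROMPT_SIZES.items()
}
for (model, approach), n in fits.items():
    print(f"{model:<14} {approach:<10} {n:>4} prompts")
```

For GPT-4o this gives 9 naive prompts versus 108 NervaPack prompts per window.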