What does a RAG pipeline cost per query?

A typical RAG query costs roughly $0.0003–0.015: query embedding ($0.000002–0.000013) + retrieval (about $0.00001 or less) + LLM generation ($0.0003–0.015). The LLM generation step (context + response) is usually 95–99% of the total cost. Use this calculator to estimate the cost of your specific RAG deployment.

RAG Pipeline Cost Calculator 2026 — Embedding + Retrieval + Generation Costs

What is a RAG Pipeline?

RAG (Retrieval-Augmented Generation) is a technique where an AI model retrieves relevant information from a knowledge base before generating a response. It combines three stages:

Embedding: Your documents are split into chunks and converted to vector embeddings stored in a vector database (Pinecone, Weaviate, Chroma, pgvector)
Retrieval: When a user asks a question, it's embedded and the most relevant document chunks are retrieved
Generation: The retrieved context + the user's question are sent to an LLM for a grounded, accurate response
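
The three stages can be sketched in a few lines. This is a minimal illustration rather than a production pipeline: `embed` is a toy bag-of-words stand-in for a real embedding model, the in-memory list stands in for a vector database, and the generation stage stops at building the prompt because the LLM call depends on your provider. All function names here are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words term count."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, index: list[tuple[str, Counter]], k: int = 2) -> list[str]:
    """Stage 2: embed the question and return the k most similar chunks."""
    q_vec = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question: str, chunks: list[str]) -> str:
    """Stage 3 (input side): combine retrieved context with the question."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Stage 1: chunk and embed the documents (each sentence is one chunk here).
chunks = [
    "Refunds are issued within 14 days of purchase.",
    "Shipping to Europe takes 5 to 7 business days.",
    "Support is available by email on weekdays.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

question = "How long does shipping to Europe take?"
top_chunks = retrieve(question, index, k=1)
prompt = build_prompt(question, top_chunks)
print(prompt)  # this prompt is what would be sent to the LLM
```

The length of `prompt` is what drives the generation cost: every retrieved chunk is billed as LLM input tokens.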

RAG Cost Components Explained

Component	Typical Cost per Query	% of Total	How to Reduce
Embedding (query)	$0.000002–0.000013	< 1%	Use small embed models (text-embedding-3-small)
Vector Storage	$0.0000001–0.000001	< 1%	Delete unused embeddings, use quantized storage
LLM Generation	$0.0003–0.015	95–99%	Use smaller LLMs, limit context tokens, cache responses

How to Use This Calculator

Choose embedding model: text-embedding-3-small ($0.02/1M tokens) is 6.5x cheaper than text-embedding-3-large ($0.13/1M tokens)
Set token counts: Query tokens (usually 50–200), retrieved context tokens (chunk size × number of chunks retrieved), and expected output tokens
Select LLM: The generation model, which is where 95%+ of the cost lives
Set query volume: Monthly queries, used to project monthly and annual costs
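
As a rough sketch of the arithmetic behind these inputs, the function below multiplies token counts by per-million-token prices and scales the result by query volume. The LLM prices in the usage example are placeholder values chosen for illustration, not any provider's actual rates; the embedding price is the $0.02/1M figure quoted for text-embedding-3-small. Two simplifying assumptions: vector storage is left out because it is under 1% of the total, and the query tokens are billed both for embedding and as LLM input. The names `RagCostInputs` and `estimate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RagCostInputs:
    embed_price_per_m: float       # USD per 1M embedding tokens
    llm_input_price_per_m: float   # USD per 1M LLM input tokens
    llm_output_price_per_m: float  # USD per 1M LLM output tokens
    query_tokens: int
    context_tokens: int
    output_tokens: int
    monthly_queries: int


def estimate(inputs: RagCostInputs) -> dict:
    """Return per-query component costs, generation share, and monthly cost."""
    embed = inputs.query_tokens * inputs.embed_price_per_m / 1_000_000
    llm_input_tokens = inputs.query_tokens + inputs.context_tokens
    llm_input = llm_input_tokens * inputs.llm_input_price_per_m / 1_000_000
    llm_output = inputs.output_tokens * inputs.llm_output_price_per_m / 1_000_000
    generation = llm_input + llm_output
    total = embed + generation
    return {
        "embedding": embed,
        "generation": generation,
        "total": total,
        "generation_share": generation / total,
        "monthly": total * inputs.monthly_queries,
    }


example = RagCostInputs(
    embed_price_per_m=0.02,
    llm_input_price_per_m=0.50,   # placeholder price, for illustration only
    llm_output_price_per_m=1.50,  # placeholder price, for illustration only
    query_tokens=100,
    context_tokens=2_000,
    output_tokens=300,
    monthly_queries=100_000,
)
costs = estimate(example)
print(f"per query:        ${costs['total']:.6f}")
print(f"generation share: {costs['generation_share']:.1%}")
print(f"monthly:          ${costs['monthly']:.2f}")
```

With these placeholder inputs the sketch reports about $0.0015 per query and roughly $150 per month, with generation making up more than 99% of the total. Swap in your provider's current prices to get a usable estimate.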

Real-World RAG Cost Examples

Example 1: Customer Support Bot (100K queries/month)

Setup: text-embedding-3-small + GPT-4o mini, 2K context tokens, 300 output tokens

Cost per query: $0.000002 (embed) + $0.00048 (LLM: 2,000 input tokens at $0.15/1M + 300 output tokens at $0.60/1M) = $0.000482

Monthly cost: 100,000 × $0.000482 = $48.20

Annual cost: 12 × $48.20 = $578.40

That's 4.8 cents per 100 queries, or about $0.0005 per query.

Example 2: Legal Document Assistant (10K queries/month)

Setup: text-embedding-3-large + Claude 3.5 Sonnet, 8K context tokens, 500 output tokens

Cost per query: $0.000013 (embed) + $0.0315 (LLM: 8,000 input tokens at $3/1M + 500 output tokens at $15/1M) = $0.031513

Monthly cost: 10,000 × $0.031513 = $315.13

Annual cost: 12 × $315.13 = $3,781.56

If Claude proves more accurate on your legal documents, that may justify the higher cost per query.

Example 3: Research Assistant (1M queries/month)

Setup: text-embedding-3-small + Gemini Flash-Lite, 2K context tokens, 200 output tokens

Cost per query: $0.000002 (embed) + $0.000255 (LLM) = $0.000257

Monthly cost: 1,000,000 × $0.000257 = $257.00

Annual cost: 12 × $257.00 = $3,084.00

At 1M queries/month, even cheap per-query costs add up. Cache frequent queries to reduce this by 30–60%.
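
To see where a 30–60% reduction comes from, here is a small sketch that applies a cache hit rate to the $257.00 monthly figure above. It assumes a cached answer costs nothing, which is a simplification (a cache lookup still has some small infrastructure cost), and the function name is hypothetical.

```python
def monthly_cost_with_cache(per_query_cost: float, monthly_queries: int, hit_rate: float) -> float:
    """Monthly cost when a fraction `hit_rate` of queries is served from cache.

    Assumes cache hits cost nothing and cache misses pay the full per-query cost.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    billable_queries = monthly_queries * (1.0 - hit_rate)
    return billable_queries * per_query_cost


baseline = monthly_cost_with_cache(0.000257, 1_000_000, 0.0)
for hit_rate in (0.3, 0.6):
    cost = monthly_cost_with_cache(0.000257, 1_000_000, hit_rate)
    print(f"hit rate {hit_rate:.0%}: ${cost:.2f}/month (saves ${baseline - cost:.2f})")
```

Under this assumption the saving equals the hit rate: a 30% hit rate brings the bill to about $179.90 and a 60% hit rate to about $102.80.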

How to Reduce RAG Costs

Use cheap embedding models: text-embedding-3-small ($0.02/1M tokens) instead of text-embedding-3-large ($0.13/1M tokens) saves about 85% on embedding
Limit retrieved context: Retrieving 2K tokens instead of 8K cuts LLM input costs by 75%
Use cheaper LLMs for simple queries: Route straightforward questions to Gemini Flash-Lite and reserve premium models for complex ones
Cache responses: For repeated queries (common in support bots), serve cached responses and skip the LLM call entirely
HyDE or sparse retrieval: Hypothetical Document Embeddings (HyDE) or keyword-based sparse retrieval (such as BM25) can improve retrieval accuracy, reducing the need for large contexts. HyDE adds an extra LLM call per query, so weigh that against the context savings
Delete stale embeddings: Prune your vector DB regularly to reduce storage costs
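
The response-caching tip can be sketched as a thin wrapper around whatever function calls your LLM. This is a minimal exact-match cache under stated assumptions: queries are treated as identical after lowercasing and collapsing whitespace, entries never expire, and `fake_generate` is a hypothetical stand-in for the real retrieval and generation call. Matching paraphrased questions would need a semantic cache, which this sketch does not attempt.

```python
import hashlib
from typing import Callable


class ResponseCache:
    """Exact-match response cache keyed by a normalized form of the query."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(query: str) -> str:
        # Lowercase and collapse whitespace so trivial variations share a key.
        normalized = " ".join(query.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get_or_generate(self, query: str, generate: Callable[[str], str]) -> str:
        key = self._key(query)
        if key in self._store:
            self.hits += 1           # served from cache, no LLM call
            return self._store[key]
        self.misses += 1
        answer = generate(query)     # the only place the paid call happens
        self._store[key] = answer
        return answer


calls = []  # records every query that reached the (fake) LLM


def fake_generate(query: str) -> str:
    """Hypothetical stand-in for the real RAG call."""
    calls.append(query)
    return f"answer to: {query}"


cache = ResponseCache()
first = cache.get_or_generate("How do I reset my password?", fake_generate)
second = cache.get_or_generate("  how do I RESET my password? ", fake_generate)
print(f"LLM calls: {len(calls)}, cache hits: {cache.hits}")
```

In production you would likely add a time-to-live so answers are refreshed when the underlying documents change.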

Cost Breakdown per Query