What is an AI Token?
Before calculating costs, you need to understand what a token is. A token is the basic unit of text that AI language models process. For English text, 1 token is approximately 4 characters or 0.75 words.
This means:
- A typical sentence = ~15–20 tokens
- A paragraph = ~75–100 tokens
- A page of text = ~300–500 tokens
- 1,000 words = ~1,333 tokens
Code uses more tokens per word than prose because it contains many special characters. Always measure your actual token usage through the API rather than estimating from word counts.
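The rules of thumb above can be turned into a quick estimator. This is a minimal sketch that assumes English prose; the function names are illustrative, and the result is only a rough guide, not a substitute for the token counts your API reports.

```python
import math

CHARS_PER_TOKEN = 4      # rule of thumb from the text above (English)
WORDS_PER_TOKEN = 0.75   # rule of thumb from the text above (English)

def estimate_tokens_from_chars(text: str) -> int:
    """Rough estimate: about 4 characters per token."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)

def estimate_tokens_from_words(word_count: int) -> int:
    """Rough estimate: about 0.75 words per token."""
    return round(word_count / WORDS_PER_TOKEN)

print(estimate_tokens_from_words(1000))  # 1333
```

Code, non-English text, and unusual formatting can differ noticeably from these ratios, so treat the output as a first approximation for budgeting.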
How to Calculate AI Token Cost: The Formula
The token cost formula is straightforward:
Total Cost = (Input Tokens / 1,000,000 × Input Price per Million) + (Output Tokens / 1,000,000 × Output Price per Million)
Each LLM provider charges a different rate per million tokens. Prices vary significantly between providers and model tiers — from $0.05/1M input (Gemini 3 Flash-Lite) to $20.00/1M input (o3).
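As a sketch, the formula can be written as a small function. Because prices are quoted per million tokens, each token count is divided by 1,000,000 before it is multiplied by the price. The function and parameter names here are illustrative.

```python
def token_cost(input_tokens: int, output_tokens: int,
               input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost in dollars for one call, given prices in dollars per 1M tokens."""
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost

# 200 tokens in, 150 tokens out at $0.15 / $0.60 per 1M tokens
cost = token_cost(200, 150, 0.15, 0.60)
print(f"${cost:.5f}")  # $0.00012
```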
LLM Token Pricing Comparison 2026
Here's how the major 2026 frontier models compare on per-million-token pricing:
| Model | Provider | Input / 1M Tokens | Output / 1M Tokens | Best For |
|---|---|---|---|---|
| GPT-5 | OpenAI | $10.00 | $40.00 | Maximum capability |
| GPT-5 mini | OpenAI | $0.75 | $3.00 | Capable, affordable |
| GPT-4o | OpenAI | $2.50 | $10.00 | General purpose, coding |
| GPT-4o mini | OpenAI | $0.15 | $0.60 | High volume, cost-sensitive |
| o3 | OpenAI | $20.00 | $80.00 | Advanced reasoning |
| o3-mini | OpenAI | $4.00 | $16.00 | Reasoning, budget |
| o1 | OpenAI | $15.00 | $60.00 | Chain-of-thought tasks |
| o1-mini | OpenAI | $3.00 | $12.00 | Fast reasoning |
| Claude 4 Opus | Anthropic | $15.00 | $75.00 | Maximum intelligence |
| Claude 4 Sonnet | Anthropic | $3.00 | $15.00 | Long docs, analysis |
| Claude 3.5 Sonnet | Anthropic | $3.00 | $15.00 | Long docs, analysis |
| Claude 3.5 Haiku | Anthropic | $0.80 | $4.00 | Fast, budget tasks |
| Gemini 3 Ultra | Google | $1.25 | $5.00 | Multimodal, high intelligence |
| Gemini 3 Pro | Google | $0.35 | $1.05 | Balanced performance |
| Gemini 3 Flash | Google | $0.075 | $0.30 | High volume, real-time |
| Gemini 3 Flash-Lite | Google | $0.05 | $0.20 | Cheapest option |
| DeepSeek V3 | DeepSeek | $0.27 | $1.10 | Open-weight, cost-efficient |
| DeepSeek R1 | DeepSeek | $0.55 | $2.20 | Reasoning, open-weight |
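To compare models for a specific workload, one option is to load a few rows of the table into a dictionary and rank them by cost. This is a sketch: the prices are copied from the table above, and `rank_models` is an illustrative helper, not part of any provider's SDK.

```python
# (input $/1M, output $/1M), copied from selected rows of the table above
PRICES = {
    "GPT-5": (10.00, 40.00),
    "GPT-4o mini": (0.15, 0.60),
    "o3": (20.00, 80.00),
    "Claude 4 Sonnet": (3.00, 15.00),
    "Gemini 3 Flash": (0.075, 0.30),
    "Gemini 3 Flash-Lite": (0.05, 0.20),
    "DeepSeek V3": (0.27, 1.10),
}

def rank_models(input_tokens, output_tokens, prices=PRICES):
    """Return (model, cost) pairs sorted from cheapest to most expensive."""
    costs = {
        model: (input_tokens * inp + output_tokens * out) / 1_000_000
        for model, (inp, out) in prices.items()
    }
    return sorted(costs.items(), key=lambda item: item[1])

# Example workload: 2,000 tokens in, 500 tokens out per call
for model, cost in rank_models(2_000, 500):
    print(f"{model:<20} ${cost:.6f}")
```

Because the ranking depends on the input-to-output mix, it is worth rerunning with token counts measured from your own traffic.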
Real-World Calculation Examples
Example 1: AI Chatbot (GPT-4o mini)
A customer sends a message (200 tokens in) and gets a response (150 tokens out):
Input: (200 / 1,000,000) × $0.15 = $0.00003
Output: (150 / 1,000,000) × $0.60 = $0.00009
Total: $0.00012 per message
That means 1,000 messages cost just $0.12. A chatbot handling 10,000 such messages a day (for example, 10,000 daily users sending one message each) costs about $36/month.
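The monthly figure depends heavily on how many messages each user sends. The sketch below assumes a 30-day month and, by default, one message per user per day; both values, and the function name, are assumptions for illustration.

```python
def monthly_cost(cost_per_message: float, daily_users: int,
                 messages_per_user: int = 1, days: int = 30) -> float:
    """Projected monthly spend in dollars."""
    return cost_per_message * daily_users * messages_per_user * days

print(f"${monthly_cost(0.00012, 10_000):.2f}")                       # $36.00
print(f"${monthly_cost(0.00012, 10_000, messages_per_user=5):.2f}")  # $180.00
```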
Example 2: Document Analysis (Claude 4 Sonnet)
You upload a legal document of about 6,000 tokens and get a detailed analysis (800 tokens out):
Input: (6,000 / 1,000,000) × $3.00 = $0.018
Output: (800 / 1,000,000) × $15.00 = $0.012
Total: $0.030 per document
Processing 1,000 documents costs just $30. Claude 4 Sonnet's large context window also lets you send much longer documents in a single call, though input cost grows in proportion to the token count.
Example 3: RAG Pipeline (Gemini 3 Flash)
Retrieval-Augmented Generation query: retrieved context (2,000 tokens) + query (100 tokens) + answer (300 tokens):
Input: (2,100 / 1,000,000) × $0.075 = $0.0001575
Output: (300 / 1,000,000) × $0.30 = $0.00009
Total: $0.0002475 ≈ $0.00025 per query
At this rate, 1 million RAG queries per month cost about $250, far cheaper than GPT-5 or o3 at the same volume.
Example 4: Advanced Reasoning (o3)
A complex multi-step reasoning task: user query (3,000 tokens) + detailed response (1,500 tokens):
Input: (3,000 / 1,000,000) × $20.00 = $0.060
Output: (1,500 / 1,000,000) × $80.00 = $0.120
Total: $0.180 per query
At the same token counts, o3 is 400x more expensive than Gemini 3 Flash-Lite ($20.00 vs. $0.05 input, $80.00 vs. $0.20 output), so use it only when advanced reasoning genuinely justifies the cost.
Why Output Tokens Are More Expensive
Major LLM providers typically charge 2–5x more for output tokens than for input tokens; for the models in the table above, the ratio is 3–5x. Here's why:
- Compute difference: The whole input is processed in one parallel forward pass through the neural network. Output is generated token by token, and each new token depends on the ones before it, so the work is sequential
- Generation overhead: Each output token requires its own forward pass through the model, so generating 1,000 tokens means 1,000 sequential passes
- Quality vs. speed: Compute scales with the number of tokens generated, so a long, detailed answer costs more to produce than a short one
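The output-to-input price ratio can be checked directly against the pricing table. This sketch copies a handful of rows from that table; the variable names are illustrative.

```python
# (input $/1M, output $/1M), copied from selected rows of the pricing table
PRICES = {
    "GPT-5": (10.00, 40.00),
    "o3": (20.00, 80.00),
    "Claude 4 Opus": (15.00, 75.00),
    "Claude 4 Sonnet": (3.00, 15.00),
    "Gemini 3 Pro": (0.35, 1.05),
    "Gemini 3 Flash": (0.075, 0.30),
    "DeepSeek V3": (0.27, 1.10),
}

# Output price divided by input price for each model
ratios = {model: out / inp for model, (inp, out) in PRICES.items()}

for model, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{model:<18} {ratio:.2f}x")
```

For these rows the ratios fall between roughly 3x (Gemini 3 Pro) and 5x (the Claude models), which is why trimming output length often saves more than trimming the prompt by the same number of tokens.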
How to Reduce Token Costs
Token costs add up fast at scale. Here's how to cut them:
- Use cheaper models: Gemini 3 Flash-Lite costs 400x less than o3. For most routine tasks, you don't need the most powerful model
- Shorten prompts: Remove redundant instructions and context. Every token you don't send is a token you don't pay for
- Set max_tokens: Cap output length to prevent runaway responses
- Batch requests: OpenAI's Batch API offers a 50% discount for asynchronous processing
- Cache repeated queries: For identical queries, serve cached results at zero cost
- Prompt caching: Anthropic, Google, and OpenAI support caching repeated long contexts at a fraction of normal cost
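Caching identical queries, as suggested in the list above, can be sketched with a small in-memory, exact-match cache. Everything here is hypothetical: `ResponseCache` is an illustrative class, and `fake_model` stands in for whatever function actually calls your provider's API.

```python
import hashlib

class ResponseCache:
    """Exact-match cache: identical (model, prompt) pairs are served from memory."""

    def __init__(self, call_model):
        self._call_model = call_model  # hypothetical callable: (model, prompt) -> str
        self._store = {}
        self.hits = 0
        self.misses = 0

    def ask(self, model: str, prompt: str) -> str:
        key = hashlib.sha256(f"{model}\n{prompt}".encode("utf-8")).hexdigest()
        if key in self._store:
            self.hits += 1           # no API call, so no token cost
            return self._store[key]
        self.misses += 1
        answer = self._call_model(model, prompt)
        self._store[key] = answer
        return answer

def fake_model(model: str, prompt: str) -> str:
    """Stand-in for a real API call, used only for this example."""
    return f"[{model}] reply to: {prompt}"

cache = ResponseCache(fake_model)
cache.ask("example-model", "What are your opening hours?")
cache.ask("example-model", "What are your opening hours?")
print(cache.hits, cache.misses)  # 1 1
```

An exact-match cache only helps when the same prompt recurs verbatim. A production version would also need an eviction policy and an expiry time so stale answers are not served indefinitely.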
Token Cost by Use Case
| Use Case | Typical Tokens (in/out) | Model | Cost per Call |
|---|---|---|---|
| Chat message | 200 / 150 | GPT-4o mini | $0.00012 |
| Email response | 500 / 300 | Claude 3.5 Haiku | $0.00160 |
| Code generation | 1,000 / 800 | GPT-4o | $0.01050 |
| Long document summary | 8,000 / 600 | Claude 4 Sonnet | $0.03300 |
| Research analysis | 20,000 / 2,000 | GPT-5 | $0.28000 |
| Advanced reasoning | 3,000 / 1,500 | o3 | $0.18000 |
| RAG query | 2,100 / 300 | Gemini 3 Flash | $0.00025 |
Key Takeaways
- Token cost formula: (Input Tokens / 1M × $/1M Input) + (Output Tokens / 1M × $/1M Output)
- Gemini 3 Flash-Lite is the cheapest at $0.05/$0.20; o3 is the most expensive at $20/$80 per 1M
- Output tokens typically cost 2–5x more than input tokens
- DeepSeek V3 ($0.27/$1.10) is the cheaper of the two open-weight models in the comparison
- Use the AI Token Cost Calculator to estimate costs for any model