What is an AI Token?

Before calculating costs, you need to understand what a token is. A token is the basic unit of text that AI language models process. For English text, 1 token is approximately 4 characters or 0.75 words.

This means:

  • A typical sentence = ~15–20 tokens
  • A paragraph = ~75–100 tokens
  • A page of text = ~300–500 tokens
  • 1,000 words = ~1,333 tokens

Code uses more tokens per word than prose because it contains many special characters. Always measure your actual token usage through the API rather than estimating from word counts.
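As a rough illustration of the rules of thumb above, the sketch below estimates token counts from character and word counts. The function name and constants are illustrative assumptions, not part of any provider's SDK, and the result is only a ballpark figure for English prose; a real tokenizer will give different numbers.

```python
import math

CHARS_PER_TOKEN = 4      # rule of thumb: ~4 characters per token
WORDS_PER_TOKEN = 0.75   # rule of thumb: ~0.75 words per token


def estimate_tokens(text: str) -> dict:
    """Return two rough token estimates for English prose (heuristic only)."""
    by_chars = math.ceil(len(text) / CHARS_PER_TOKEN)
    by_words = math.ceil(len(text.split()) / WORDS_PER_TOKEN)
    return {"by_chars": by_chars, "by_words": by_words}


# 1,000 short words: the word-based estimate lands near the ~1,333 figure above.
sample = " ".join(["word"] * 1000)
print(estimate_tokens(sample))
```

The two heuristics rarely agree exactly, which is one more reason to treat them as planning numbers and rely on the usage figures the API reports for billing.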

How to Calculate AI Token Cost: The Formula

The token cost formula is straightforward:

Total Cost = (Input Tokens / 1,000,000 × Input Price per Million) + (Output Tokens / 1,000,000 × Output Price per Million)

Each LLM provider charges a different rate per million tokens. Prices vary significantly between providers and model tiers — from $0.05/1M input (Gemini 3 Flash-Lite) to $20.00/1M input (o3).
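The formula can be written as a small helper. This is a minimal sketch: the function name is made up for illustration, and the prices in the usage line are placeholders rather than any specific provider's rates. Note that token counts are divided by one million because prices are quoted per million tokens.

```python
def token_cost(input_tokens: int, output_tokens: int,
               input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the cost in dollars for one call.

    Prices are dollars per one million tokens, so each token count
    is scaled by 1,000,000 before multiplying.
    """
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost


# Placeholder prices: $3.00 per 1M input tokens, $15.00 per 1M output tokens.
cost = token_cost(1_000_000, 500_000, 3.00, 15.00)
print(f"${cost:.2f}")  # prints $10.50
```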

LLM Token Pricing Comparison 2026

Here's how the major 2026 frontier models compare on per-million-token pricing:

Model | Provider | Input / 1M Tokens | Output / 1M Tokens | Best For
--- | --- | --- | --- | ---
GPT-5 | OpenAI | $10.00 | $40.00 | Maximum capability
GPT-5 mini | OpenAI | $0.75 | $3.00 | Capable, affordable
GPT-4o | OpenAI | $2.50 | $10.00 | General purpose, coding
GPT-4o mini | OpenAI | $0.15 | $0.60 | High volume, cost-sensitive
o3 | OpenAI | $20.00 | $80.00 | Advanced reasoning
o3-mini | OpenAI | $4.00 | $16.00 | Reasoning, budget
o1 | OpenAI | $15.00 | $60.00 | Chain-of-thought tasks
o1-mini | OpenAI | $3.00 | $12.00 | Fast reasoning
Claude 4 Opus | Anthropic | $15.00 | $75.00 | Maximum intelligence
Claude 4 Sonnet | Anthropic | $3.00 | $15.00 | Long docs, analysis
Claude 3.5 Sonnet | Anthropic | $3.00 | $15.00 | Long docs, analysis
Claude 3.5 Haiku | Anthropic | $0.80 | $4.00 | Fast, budget tasks
Gemini 3 Ultra | Google | $1.25 | $5.00 | Multimodal, high intelligence
Gemini 3 Pro | Google | $0.35 | $1.05 | Balanced performance
Gemini 3 Flash | Google | $0.075 | $0.30 | High volume, real-time
Gemini 3 Flash-Lite | Google | $0.05 | $0.20 | Cheapest option
DeepSeek V3 | DeepSeek | $0.27 | $1.10 | Open-weight, cost-efficient
DeepSeek R1 | DeepSeek | $0.55 | $2.20 | Reasoning, open-weight

Real-World Calculation Examples

Example 1: AI Chatbot (GPT-4o mini)

A customer sends a message (200 tokens in) and gets a response (150 tokens out):

Input: (200 / 1,000,000) × $0.15 = $0.00003
Output: (150 / 1,000,000) × $0.60 = $0.00009
Total: $0.00012 per message

That means 1,000 messages cost just $0.12. A chatbot serving 10,000 daily users who each send one message a day costs only about $36 over a 30-day month.
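The monthly figure depends heavily on how many messages each user sends. The sketch below makes that assumption explicit; the function and parameter names are illustrative, and the one-message-per-user default and 30-day month are assumptions rather than figures stated by any provider.

```python
def monthly_chat_cost(daily_users: int, messages_per_user_per_day: float,
                      in_tokens: int, out_tokens: int,
                      in_price_per_m: float, out_price_per_m: float,
                      days: int = 30) -> float:
    """Project a monthly bill from per-message token counts and per-1M prices."""
    per_message = (in_tokens * in_price_per_m + out_tokens * out_price_per_m) / 1_000_000
    return daily_users * messages_per_user_per_day * days * per_message


# GPT-4o mini prices from the comparison table: $0.15 in, $0.60 out per 1M tokens.
one_message = monthly_chat_cost(10_000, 1, 200, 150, 0.15, 0.60)
five_messages = monthly_chat_cost(10_000, 5, 200, 150, 0.15, 0.60)
print(f"1 message/user/day:  ${one_message:.2f}")
print(f"5 messages/user/day: ${five_messages:.2f}")
```

With five messages per user per day the same chatbot costs about $180 a month, so it is worth measuring real usage before budgeting.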

Example 2: Document Analysis (Claude 4 Sonnet)

You upload a legal document of about 6,000 tokens (roughly 12 to 20 pages at the page estimate above) and get a detailed analysis (800 tokens out):

Input: (6,000 / 1,000,000) × $3.00 = $0.018
Output: (800 / 1,000,000) × $15.00 = $0.012
Total: $0.030 per document

Processing 1,000 documents costs just $30. Claude 4 Sonnet's large context window also lets you analyze much longer documents in a single call, as long as they fit within the model's context limit.

Example 3: RAG Pipeline (Gemini 3 Flash)

Retrieval-Augmented Generation query: retrieved context (2,000 tokens) + query (100 tokens) + answer (300 tokens):

Input: (2,100 / 1,000,000) × $0.075 = $0.0001575
Output: (300 / 1,000,000) × $0.30 = $0.00009
Total: $0.00025 per query

At this rate, 1 million RAG queries a month cost about $250, far cheaper than GPT-5 or o3 at the same volume.

Example 4: Advanced Reasoning (o3)

A complex multi-step reasoning task: user query (3,000 tokens) + detailed response (1,500 tokens):

Input: (3,000 / 1,000,000) × $20.00 = $0.060
Output: (1,500 / 1,000,000) × $80.00 = $0.120
Total: $0.180 per query

o3 is 400x more expensive per query than Gemini 3 Flash-Lite (the same 3,000-in, 1,500-out query would cost $0.00045 there), so use it only when advanced reasoning genuinely justifies the cost.
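The cost gap between the two models can be checked by pricing the same workload on both. This is a minimal sketch using the per-million prices from the comparison table; the dictionary and function names are illustrative.

```python
# (input $/1M, output $/1M), taken from the comparison table.
PRICES = {
    "o3": (20.00, 80.00),
    "Gemini 3 Flash-Lite": (0.05, 0.20),
}


def query_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in dollars of one call for the given model."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


o3_cost = query_cost("o3", 3_000, 1_500)
lite_cost = query_cost("Gemini 3 Flash-Lite", 3_000, 1_500)
ratio = o3_cost / lite_cost
print(f"o3: ${o3_cost:.5f}  Flash-Lite: ${lite_cost:.5f}  ratio: {ratio:.0f}x")
```

Because both the input and the output price differ by the same factor (20.00 / 0.05 and 80.00 / 0.20), the ratio comes out to about 400 regardless of the input/output mix.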

Why Output Tokens Are More Expensive

Every model in the comparison above charges roughly 3–5x more for output tokens than for input tokens. Here's why:

  • Compute difference: All input tokens are processed together in a single forward pass through the neural network. Output is generated token by token, and each new token depends on the ones before it, so the work is sequential
  • Generation overhead: Each output token requires its own forward pass through the model, so generating 1,000 tokens means 1,000 forward passes
  • Response length: Compute grows with every generated token, so a long, detailed response costs far more to serve than a short one
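To see how large the output premium is in practice, the sketch below computes the output-to-input price ratio for a few models, using prices from the comparison table. The selection of models is arbitrary and only meant as an illustration.

```python
# (input $/1M, output $/1M) for a sample of models from the comparison table.
PRICES = {
    "GPT-5": (10.00, 40.00),
    "Claude 4 Sonnet": (3.00, 15.00),
    "Gemini 3 Pro": (0.35, 1.05),
    "DeepSeek V3": (0.27, 1.10),
}

# Ratio of output price to input price for each model.
ratios = {model: out_price / in_price for model, (in_price, out_price) in PRICES.items()}

for model, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{model:<16} {ratio:.2f}x")
```

For workloads with long responses, this premium means output tokens can dominate the bill even when the prompt is much larger than the answer.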

How to Reduce Token Costs

Token costs add up fast at scale. Here's how to cut them:

  • Use cheaper models: Gemini 3 Flash-Lite costs 400x less than o3. For most routine tasks, you don't need the most powerful model
  • Shorten prompts: Remove redundant instructions and context. Every token you don't send is a token you don't pay for
  • Set max_tokens: Cap output length to prevent runaway responses
  • Batch requests: OpenAI's Batch API offers a 50% discount for asynchronous processing
  • Cache repeated queries: For identical queries, serve cached results at zero API cost
  • Prompt caching: Anthropic, Google, and OpenAI support caching repeated long contexts at a fraction of normal cost

Token Cost by Use Case

Use Case | Typical Tokens (in/out) | Model | Cost per Call
--- | --- | --- | ---
Chat message | 200 / 150 | GPT-4o mini | $0.00012
Email response | 500 / 300 | Claude 3.5 Haiku | $0.00160
Code generation | 1,000 / 800 | GPT-4o | $0.01050
Long document summary | 8,000 / 600 | Claude 4 Sonnet | $0.03300
Research analysis | 20,000 / 2,000 | GPT-5 | $0.28000
Advanced reasoning | 3,000 / 1,500 | o3 | $0.18000
RAG query | 2,100 / 300 | Gemini 3 Flash | $0.00025
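The per-call figures can be recomputed from the token counts and the per-million prices in the comparison table. The sketch below does so for every use case; the data structures and names are illustrative.

```python
# (input $/1M, output $/1M), taken from the comparison table.
PRICES = {
    "GPT-4o mini": (0.15, 0.60),
    "Claude 3.5 Haiku": (0.80, 4.00),
    "GPT-4o": (2.50, 10.00),
    "Claude 4 Sonnet": (3.00, 15.00),
    "GPT-5": (10.00, 40.00),
    "o3": (20.00, 80.00),
    "Gemini 3 Flash": (0.075, 0.30),
}

# (use case, input tokens, output tokens, model)
USE_CASES = [
    ("Chat message", 200, 150, "GPT-4o mini"),
    ("Email response", 500, 300, "Claude 3.5 Haiku"),
    ("Code generation", 1_000, 800, "GPT-4o"),
    ("Long document summary", 8_000, 600, "Claude 4 Sonnet"),
    ("Research analysis", 20_000, 2_000, "GPT-5"),
    ("Advanced reasoning", 3_000, 1_500, "o3"),
    ("RAG query", 2_100, 300, "Gemini 3 Flash"),
]


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in dollars of a single call."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


costs = {name: call_cost(model, tin, tout) for name, tin, tout, model in USE_CASES}
for name, cost in costs.items():
    print(f"{name:<22} ${cost:.5f}")
```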

Frequently Asked Questions

How many tokens is my text?
For English prose: approximately 1 token per 0.75 words (so 1,000 words = ~1,333 tokens). For code: approximately 1 token per 2–4 characters. For the most accurate count, use the provider's tokenizer tool (OpenAI tokenizer, Anthropic token counter). If you only need a rough estimate, divide the character count by 4.
Which model has the lowest token cost?
Gemini 3 Flash-Lite at $0.05/$0.20 per million input/output tokens. For context, GPT-5 costs $10/$40 — that's 200x more expensive. DeepSeek V3 at $0.27/$1.10 is another strong budget option with open weights.
Does caching reduce token costs?
Yes. Anthropic's prompt caching charges 10% of the normal input rate for cached tokens. If you reuse a 10K-token context across 1,000 queries, caching saves close to 90% on that context, since only the request that first writes the cache pays the full rate or more. Google Gemini, OpenAI, and DeepSeek also support context caching at discounted rates.
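The caching arithmetic can be sketched as follows, using the $3.00 per 1M input price listed for Claude 4 Sonnet. The function name is illustrative, and two simplifications are assumptions: the cache never expires between queries, and the first (cache-writing) request is billed at the normal input rate unless `write_multiplier` is raised.

```python
def context_cost(context_tokens: int, queries: int, input_price_per_m: float,
                 cached_rate: float = 0.10, write_multiplier: float = 1.0):
    """Return (uncached, cached) dollar cost of resending one shared context.

    cached_rate: fraction of the input price charged for cached tokens.
    write_multiplier: assumed price factor for the request that writes the cache.
    """
    full_price = context_tokens / 1_000_000 * input_price_per_m
    uncached = queries * full_price
    cached = full_price * write_multiplier + (queries - 1) * full_price * cached_rate
    return uncached, cached


uncached, cached = context_cost(10_000, 1_000, 3.00)
savings = 1 - cached / uncached
print(f"uncached ${uncached:.2f}, cached ${cached:.2f}, savings {savings:.1%}")
```

Under these assumptions the shared context costs about $3.03 instead of $30.00. Real savings depend on cache lifetime and on how each provider bills cache writes, so check the current pricing page before relying on the figure.
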
How much does 1 million tokens cost?
It depends on the model and on the input/output mix. For 1M input tokens plus 1M output tokens, the cheapest paid option (Gemini 3 Flash-Lite) costs $0.05 + $0.20 = $0.25. GPT-4o costs $2.50 + $10.00 = $12.50. o3 costs $20.00 + $80.00 = $100.00. The range is $0.25 to $100 for that combined workload, depending on the model.

Key Takeaways

  • Token cost formula: (Input Tokens / 1M × Input Price per 1M) + (Output Tokens / 1M × Output Price per 1M)
  • Gemini 3 Flash-Lite is the cheapest at $0.05/$0.20; o3 is the most expensive at $20/$80 per 1M
  • Output tokens cost roughly 3–5x more than input tokens across the models compared here
  • DeepSeek V3 ($0.27/$1.10) is a low-cost option among open-weight models
  • Use the AI Token Cost Calculator to estimate costs for any model