Gemini 3.5 Flash Pricing (May 2026)
Google released Gemini 3.5 Flash at Google I/O on May 19, 2026, replacing Gemini 3.1 Pro as the default production model. At the list prices below it is 25% cheaper than 3.1 Pro ($1.50/$9.00 versus $2.00/$12.00 per 1M tokens), runs 4x faster, and scores 76.2% on Terminal-Bench 2.1. Gemini 3.5 Flash supports text, vision, video, audio, function calling, context caching, Batch API, and Flex inference.
| Model | Input / 1M tokens | Output / 1M tokens | Cache Hit / 1M tokens | Context Window |
|---|---|---|---|---|
| Gemini 3.5 Flash | $1.50 | $9.00 | $0.15 | 1M tokens |
| Gemini 3.5 Flash (Batch) | $0.75 | $4.50 | $0.075 | 1M tokens |
| Gemini 3.5 Flash (Flex) | $2.70 | $16.20 | $0.15 | 1M tokens |
| Gemini 3.1 Flash Lite | $0.25 | $1.50 | $0.025 | 1M tokens |
| Gemini 3 Flash Preview | $0.50 | $3.00 | $0.05 | 1M tokens |
| Gemini 3.1 Pro | $2.00 | $12.00 | — | 1M tokens |
Source: Google AI Developer Documentation. Verified May 24, 2026. Non-global regions may incur a 10% surcharge.
Gemini API vs OpenAI vs Claude Pricing (2026 Comparison)
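Gemini 3.5 Flash sits in the middle of the price range among comparable OpenAI and Anthropic models: it costs more per token than GPT-4o mini and Claude Haiku 4.5, and less than GPT-4o and Claude Sonnet 4.6. Here is how the leading models compare: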
| Model | Input / 1M tokens | Output / 1M tokens | Cache Hit / 1M | Notes |
|---|---|---|---|---|
| Gemini 3.5 Flash | $1.50 | $9.00 | $0.15 | Best value — agentic workloads |
| GPT-4o mini | $0.15 | $0.60 | $0.075 | Lowest absolute cost |
| GPT-4o | $2.50 | $10.00 | $1.25 | Higher capability tier |
| Claude Haiku 4.5 | $1.00 | $5.00 | $0.10 | 200K context window |
| Claude Sonnet 4.6 | $3.00 | $15.00 | $0.30 | 200K context window |
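Per-token rates are easier to compare once they are applied to a concrete request size. The following is a minimal sketch that uses only the input and output rates from the comparison table; the request size (1,000 input tokens, 300 output tokens) and the function names are illustrative assumptions, and caching, batch discounts, and regional surcharges are ignored.

```python
# USD per 1M tokens (input, output), copied from the comparison table above.
RATES = {
    "Gemini 3.5 Flash": (1.50, 9.00),
    "GPT-4o mini": (0.15, 0.60),
    "GPT-4o": (2.50, 10.00),
    "Claude Haiku 4.5": (1.00, 5.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
}


def cost_per_1k_requests(model, input_tokens, output_tokens):
    """Return the USD cost of 1,000 uncached requests of the given size."""
    input_rate, output_rate = RATES[model]
    per_request = (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
    return per_request * 1_000


def rank_models(input_tokens, output_tokens):
    """Return (model, cost) pairs sorted from cheapest to most expensive."""
    costs = {m: cost_per_1k_requests(m, input_tokens, output_tokens) for m in RATES}
    return sorted(costs.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical request size, chosen only for illustration.
    for model, cost in rank_models(input_tokens=1_000, output_tokens=300):
        print(f"{model:<18} ${cost:.2f} per 1,000 requests")
```

For this request size the ranking from cheapest to most expensive is GPT-4o mini ($0.33), Claude Haiku 4.5 ($2.50), Gemini 3.5 Flash ($4.20), GPT-4o ($5.50), and Claude Sonnet 4.6 ($7.50) per 1,000 requests. Because output rates are several times the input rates, the gaps widen as responses get longer.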
How to Use the Gemini API Cost Calculator
- Select your Gemini model from the dropdown — use 3.5 Flash for general production, 3.1 Flash Lite for high-volume budget workloads, or 3.1 Pro for advanced reasoning tasks
- Enter your average input tokens per request — a typical chatbot uses 500–2,000 tokens; a code assistant may use 2,000–10,000
- Enter your average output tokens per request — short answers: 100–300 tokens; long-form content: 500–2,000 tokens
- Set your requests per day — the calculator multiplies this by 30.44 for monthly estimates
- Enable context caching if you use repeated system prompts or reference documents (saves up to 90% on context costs)
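The steps above amount to a short formula: tokens per request, times requests per day, times 30.44 days, times the per-million rate. The sketch below is one way to write that out. It is not the calculator's actual source; the class and function names, the `cached_fraction` parameter, and the sample workload are assumptions made for illustration. Rates come from the pricing table, and cache storage fees are left out.

```python
from dataclasses import dataclass

DAYS_PER_MONTH = 30.44  # average month length used for monthly estimates


@dataclass(frozen=True)
class Rates:
    """USD per 1M tokens."""
    input_per_m: float
    output_per_m: float
    cache_hit_per_m: float


# Standard-tier rates from the pricing table.
GEMINI_3_5_FLASH = Rates(input_per_m=1.50, output_per_m=9.00, cache_hit_per_m=0.15)


def monthly_cost(rates, requests_per_day, input_tokens, output_tokens, cached_fraction=0.0):
    """Estimate monthly spend in USD, split by cost component.

    cached_fraction is the share of input tokens billed at the cache-hit
    rate (0.0 means no caching). Cache storage fees are not modeled.
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    monthly_requests = requests_per_day * DAYS_PER_MONTH
    input_millions = monthly_requests * input_tokens / 1_000_000
    output_millions = monthly_requests * output_tokens / 1_000_000
    cached_millions = input_millions * cached_fraction
    uncached_millions = input_millions - cached_millions
    return {
        "input": uncached_millions * rates.input_per_m,
        "cached_input": cached_millions * rates.cache_hit_per_m,
        "output": output_millions * rates.output_per_m,
    }


if __name__ == "__main__":
    # Hypothetical workload: 1,000 requests/day, 1,000 input and 300 output tokens each.
    breakdown = monthly_cost(GEMINI_3_5_FLASH, 1_000, 1_000, 300)
    print(breakdown)
    print(f"total: ${sum(breakdown.values()):.2f}")  # about $127.85
```

Returning the components separately makes it easy to see that output tokens dominate the bill for this workload even though there are fewer of them, since the output rate is six times the input rate.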
Real-World Cost Examples
Example 1: Customer Support Chatbot
10,000 requests/day · 500 input tokens · 200 output tokens · Gemini 3.5 Flash
Monthly cost: $776.22 (input: 152.2M tokens × $1.50/M = $228.30 + output: 60.88M tokens × $9.00/M = $547.92)
Example 2: RAG-Powered Research Tool
1,000 requests/day · 5,000 input tokens (with context caching) · 500 output tokens · 100 cache hits/day
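Monthly cost: $344.73, assuming Gemini 3.5 Flash standard rates and that 100 of the 1,000 daily requests read their input from the cache (uncached input: $205.47 + cached input: $2.28 + output: $136.98; cache storage not included)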
Example 3: Batch Data Processing
10,000 requests/day · 2,000 input tokens · 1,000 output tokens · Gemini 3.5 Flash (Batch)
Monthly cost: $1,826.40 (input: $456.60 + output: $1,369.80), which is 50% cheaper than the $3,652.80 the same workload costs on the standard tier
Frequently Asked Questions
How much does Gemini 3.5 Flash cost per month?
Gemini 3.5 Flash costs $1.50 per 1M input tokens and $9.00 per 1M output tokens. A chatbot running 10,000 requests/day with 500 input + 200 output tokens per request costs approximately $776.22/month at those rates (about $228.30 for input and $547.92 for output). Use this calculator to estimate your specific usage.
How does Gemini 3.5 Flash compare to GPT-4o on price?
Gemini 3.5 Flash is cheaper than GPT-4o on both input and output. Gemini 3.5 Flash: $1.50/$9.00 per 1M tokens. GPT-4o: $2.50/$10.00 per 1M tokens. That is 40% less for input and 10% less for output, so the savings are largest for input-heavy workloads and shrink toward 10% as output tokens dominate the bill.
What is context caching and how much does it save?
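Context caching lets you store repeated context (system prompts, reference docs) and read it back at $0.15/M tokens, 90% cheaper than the standard $1.50/M input rate. For a 10,000-token system prompt used 100 times/day, that is about 30.44M cached tokens per month, so caching saves approximately $41.09/month in input costs (30.44M × $1.35/M). Storage costs $1.00 per 1M tokens per hour, which comes to roughly $7.31/month if that prompt stays cached around the clock, so the net saving is closer to $33.78/month.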
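Whether caching pays off depends on how often the cached context is reused relative to how long it is stored. The sketch below works through that trade-off using the three rates quoted in this answer. The function names and the `hours_cached_per_day` parameter are illustrative assumptions, and the model is deliberately simple: it assumes every use is a cache hit and ignores the cost of first writing the cache.

```python
DAYS_PER_MONTH = 30.44
INPUT_RATE = 1.50             # USD per 1M uncached input tokens
CACHE_HIT_RATE = 0.15         # USD per 1M cached input tokens
STORAGE_RATE_PER_HOUR = 1.00  # USD per 1M cached tokens per hour


def caching_savings(prompt_tokens, uses_per_day, hours_cached_per_day=24.0):
    """Return monthly gross savings, storage cost, and net savings in USD."""
    prompt_millions = prompt_tokens / 1_000_000
    monthly_read_millions = prompt_millions * uses_per_day * DAYS_PER_MONTH
    gross = monthly_read_millions * (INPUT_RATE - CACHE_HIT_RATE)
    storage = prompt_millions * STORAGE_RATE_PER_HOUR * hours_cached_per_day * DAYS_PER_MONTH
    return {"gross_savings": gross, "storage_cost": storage, "net_savings": gross - storage}


def break_even_uses_per_day(hours_cached_per_day=24.0):
    """Daily uses at which read savings equal storage cost.

    Prompt size cancels out, so the result depends only on the rates
    and on how many hours per day the context stays cached.
    """
    return STORAGE_RATE_PER_HOUR * hours_cached_per_day / (INPUT_RATE - CACHE_HIT_RATE)


if __name__ == "__main__":
    print(caching_savings(prompt_tokens=10_000, uses_per_day=100))
    print(f"break-even: {break_even_uses_per_day():.1f} uses/day")
```

Under these assumptions, a context that stays cached 24 hours a day needs roughly 18 uses per day before caching saves money. Shortening the storage window lowers that threshold proportionally.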
What is the difference between Standard, Batch, and Flex pricing tiers?
Standard ($1.50/$9.00 per 1M): Real-time requests, billed per token used.
Batch ($0.75/$4.50 per 1M): 50% off standard rates, asynchronous processing — submit jobs and receive results later. Ideal for bulk data analysis.
Flex ($2.70/$16.20 per 1M): 80% premium over standard, guaranteed low-latency priority inference — use for SLA-bound production systems.
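Since the three tiers differ only in their per-token rates, the same workload can be priced on each tier side by side. The sketch below does that with the rates listed above; the function name and the sample workload (1,000 requests/day, 2,000 input and 500 output tokens) are assumptions chosen for illustration, and caching is ignored.

```python
DAYS_PER_MONTH = 30.44

# USD per 1M tokens (input, output), copied from the tier descriptions above.
TIERS = {
    "Standard": (1.50, 9.00),
    "Batch": (0.75, 4.50),
    "Flex": (2.70, 16.20),
}


def monthly_cost_by_tier(requests_per_day, input_tokens, output_tokens):
    """Return the estimated monthly USD cost of one workload on each tier."""
    monthly_requests = requests_per_day * DAYS_PER_MONTH
    input_millions = monthly_requests * input_tokens / 1_000_000
    output_millions = monthly_requests * output_tokens / 1_000_000
    return {
        tier: input_millions * input_rate + output_millions * output_rate
        for tier, (input_rate, output_rate) in TIERS.items()
    }


if __name__ == "__main__":
    costs = monthly_cost_by_tier(1_000, 2_000, 500)  # hypothetical workload
    for tier, cost in costs.items():
        ratio = cost / costs["Standard"]
        print(f"{tier:<9} ${cost:>8.2f}  ({ratio:.1f}x standard)")
```

For this workload the estimate is about $228.30 on Standard, $114.15 on Batch, and $410.94 on Flex. The 0.5x and 1.8x ratios hold for any token mix because both the input and output rates scale by the same factor on each tier.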
Is Gemini Omni API available?
As of May 26, 2026, Gemini Omni is not yet available via the developer API. It launched at Google I/O 2026 for consumer access (Gemini app, YouTube Shorts, Google Flow) but API access is "coming in the coming weeks" with pricing TBA. Check the official pricing page for updates.
Is there a free tier for Gemini API?
Yes. Google offers a free tier for Gemini 3.5 Flash with 1M input tokens and 1M output tokens per month (rate limits apply). After that, usage is billed at the standard rates above. Batch and Flex tiers are paid-only.