Gemini API Cost Calculator 2026 — Gemini 3.5 Flash, Pro & Omni Pricing

Gemini 3.5 Flash Pricing (May 2026)

Google released Gemini 3.5 Flash at Google I/O on May 19, 2026, replacing Gemini 3.1 Pro as the default production model. At the list prices below it is 25% cheaper than 3.1 Pro ($1.50/$9.00 vs. $2.00/$12.00 per 1M tokens), runs 4x faster, and scores 76.2% on Terminal-Bench 2.1. Gemini 3.5 Flash supports text, vision, video, audio, function calling, context caching, Batch API, and Flex inference.

Model	Input / 1M tokens	Output / 1M tokens	Cache Hit / 1M tokens	Context Window
Gemini 3.5 Flash	$1.50	$9.00	$0.15	1M tokens
Gemini 3.5 Flash (Batch)	$0.75	$4.50	$0.075	1M tokens
Gemini 3.5 Flash (Flex)	$2.70	$16.20	$0.15	1M tokens
Gemini 3.1 Flash Lite	$0.25	$1.50	$0.025	1M tokens
Gemini 3 Flash Preview	$0.50	$3.00	$0.05	1M tokens
Gemini 3.1 Pro	$2.00	$12.00	—	1M tokens

Source: Google AI Developer Documentation. Verified May 24, 2026. Non-global regions may incur a 10% surcharge.
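
For scripting against these rates, the table can be held in a small lookup. The sketch below is a minimal illustration: the dictionary keys and the `rate` helper are hypothetical names, not official model IDs, and it assumes the 10% regional surcharge applies uniformly to every rate, which the source does not spell out.

```python
from typing import Optional

# List prices in USD per 1M tokens, copied from the table above.
# Keys are illustrative labels, not official API model identifiers.
PRICES = {
    "gemini-3.5-flash":       {"input": 1.50, "output": 9.00,  "cache_hit": 0.15},
    "gemini-3.5-flash-batch": {"input": 0.75, "output": 4.50,  "cache_hit": 0.075},
    "gemini-3.5-flash-flex":  {"input": 2.70, "output": 16.20, "cache_hit": 0.15},
    "gemini-3.1-flash-lite":  {"input": 0.25, "output": 1.50,  "cache_hit": 0.025},
    "gemini-3-flash-preview": {"input": 0.50, "output": 3.00,  "cache_hit": 0.05},
    "gemini-3.1-pro":         {"input": 2.00, "output": 12.00, "cache_hit": None},  # no rate listed
}

REGIONAL_SURCHARGE = 0.10  # assumption: applied uniformly to all rates


def rate(model: str, kind: str, non_global: bool = False) -> Optional[float]:
    """Return the USD price per 1M tokens, or None if the table lists no rate."""
    base = PRICES[model][kind]
    if base is None:
        return None
    return base * (1 + REGIONAL_SURCHARGE) if non_global else base


print(rate("gemini-3.5-flash", "input"))
print(round(rate("gemini-3.5-flash", "input", non_global=True), 4))
```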

Gemini API vs OpenAI vs Claude Pricing (2026 Comparison)

Gemini 3.5 Flash undercuts GPT-4o and Claude Sonnet 4.6 on list price, though GPT-4o mini and Claude Haiku 4.5 are cheaper per token. Here is how the leading models compare:

Model	Input / 1M tokens	Output / 1M tokens	Cache Hit / 1M	Notes
Gemini 3.5 Flash	$1.50	$9.00	$0.15	Best value — agentic workloads
GPT-4o mini	$0.15	$0.60	$0.075	Lowest absolute cost
GPT-4o	$2.50	$10.00	$1.25	Higher capability tier
Claude Haiku 4.5	$1.00	$5.00	$0.10	200K context window
Claude Sonnet 4.6	$3.00	$15.00	$0.30	200K context window
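
List prices per 1M tokens are easier to compare once they are applied to the same request shape. The following sketch ranks the five models above by cost per 1,000 uncached requests. The request size of 1,000 input and 500 output tokens is an arbitrary illustrative choice, and the function name is hypothetical.

```python
# (input, output) list prices in USD per 1M tokens, from the comparison table.
RATES = {
    "Gemini 3.5 Flash": (1.50, 9.00),
    "GPT-4o mini": (0.15, 0.60),
    "GPT-4o": (2.50, 10.00),
    "Claude Haiku 4.5": (1.00, 5.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
}


def cost_per_1k_requests(input_tokens: int, output_tokens: int) -> dict:
    """USD cost of 1,000 uncached requests of the given size, per model."""
    costs = {}
    for model, (in_rate, out_rate) in RATES.items():
        per_request = (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
        costs[model] = round(per_request * 1000, 2)
    return costs


# Illustrative request shape: 1,000 input tokens and 500 output tokens.
ranked = sorted(cost_per_1k_requests(1000, 500).items(), key=lambda kv: kv[1])
for model, usd in ranked:
    print(f"{model:<18} ${usd:.2f}")
```

For this request shape the ordering is GPT-4o mini ($0.45), Claude Haiku 4.5 ($3.50), Gemini 3.5 Flash ($6.00), GPT-4o ($7.50), and Claude Sonnet 4.6 ($10.50) per 1,000 requests.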

How to Use the Gemini API Cost Calculator

Select your Gemini model from the dropdown — use 3.5 Flash for general production, 3.1 Flash Lite for high-volume budget workloads, or 3.1 Pro for advanced reasoning tasks
Enter your average input tokens per request — a typical chatbot uses 500–2,000 tokens; a code assistant may use 2,000–10,000
Enter your average output tokens per request — short answers: 100–300 tokens; long-form content: 500–2,000 tokens
Set your requests per day — the calculator multiplies this by 30.44 for monthly estimates
Enable context caching if you use repeated system prompts or reference documents (saves up to 90% on context costs)
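
The arithmetic behind these steps can be sketched as follows. This is a minimal reconstruction, not the calculator's actual source: the function and field names are hypothetical, caching is left out, and the only details taken from the page are the 30.44 days-per-month multiplier and the per-1M-token rates.

```python
DAYS_PER_MONTH = 30.44  # multiplier stated in the steps above


def monthly_cost(input_tokens, output_tokens, requests_per_day,
                 input_rate, output_rate):
    """Estimate monthly spend; rates are USD per 1M tokens.

    Assumes requests_per_day > 0 and at least one token per request.
    """
    requests = requests_per_day * DAYS_PER_MONTH
    input_cost = requests * input_tokens * input_rate / 1_000_000
    output_cost = requests * output_tokens * output_rate / 1_000_000
    total = input_cost + output_cost
    total_tokens = requests * (input_tokens + output_tokens)
    return {
        "input_cost": round(input_cost, 2),
        "output_cost": round(output_cost, 2),
        "total": round(total, 2),
        "per_1k_requests": round(total / requests * 1000, 2),
        "per_1m_tokens": round(total / total_tokens * 1_000_000, 2),
    }


# Illustrative workload: 1,000 requests/day, 1,000 input and 300 output
# tokens per request, at Gemini 3.5 Flash standard rates ($1.50 / $9.00).
print(monthly_cost(1000, 300, 1000, 1.50, 9.00))
```

For the sample workload this gives $45.66 input plus $82.19 output, about $127.85 per month, or $4.20 per 1,000 requests.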

Real-World Cost Examples

Example 1: Customer Support Chatbot

10,000 requests/day · 500 input tokens · 200 output tokens · Gemini 3.5 Flash

Monthly cost: $776.22 (input: $228.30 + output: $547.92)

Example 2: RAG-Powered Research Tool

1,000 requests/day · 5,000 input tokens (with context caching) · 500 output tokens · 100 cache hits/day

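Monthly cost: $344.73 (uncached input: $205.47 + cached input: $2.28 + output: $136.98), assuming Gemini 3.5 Flash standard rates, 100 of the 1,000 daily requests served entirely from cache, and no cache storage charges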

Example 3: Batch Data Processing

10,000 requests/day · 2,000 input tokens · 1,000 output tokens · Gemini 3.5 Flash (Batch)

Monthly cost: $1,826.40 (input: $456.60 + output: $1,369.80), 50% cheaper than the $3,652.80 the same workload costs on the standard tier

Frequently Asked Questions

How much does Gemini 3.5 Flash cost per month?

Gemini 3.5 Flash costs $1.50 per 1M input tokens and $9.00 per 1M output tokens. A chatbot running 10,000 requests/day with 500 input + 200 output tokens per request costs approximately $776.22/month. Use this calculator to estimate your specific usage.

How does Gemini 3.5 Flash compare to GPT-4o on price?

Gemini 3.5 Flash is significantly cheaper than GPT-4o for most workloads. Gemini 3.5 Flash: $1.50/$9.00 per 1M tokens. GPT-4o: $2.50/$10.00 per 1M tokens. For a typical workload, Gemini 3.5 Flash costs about 40% less for input and 10% less for output compared to GPT-4o.
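
Because the input and output discounts differ, the overall saving depends on the input/output mix of a request. Here is a small sketch of that blended calculation, using the two list prices quoted above. The function name and the sample request sizes are illustrative assumptions.

```python
GEMINI_35_FLASH = (1.50, 9.00)  # USD per 1M input / output tokens
GPT_4O = (2.50, 10.00)


def blended_saving_pct(input_tokens: int, output_tokens: int) -> float:
    """Percent by which a Gemini 3.5 Flash request undercuts GPT-4o."""
    gemini = input_tokens * GEMINI_35_FLASH[0] + output_tokens * GEMINI_35_FLASH[1]
    gpt = input_tokens * GPT_4O[0] + output_tokens * GPT_4O[1]
    return round((1 - gemini / gpt) * 100, 1)


print(blended_saving_pct(500, 200))    # chat-style request
print(blended_saving_pct(5000, 500))   # input-heavy request
print(blended_saving_pct(200, 2000))   # output-heavy request
```

The three sample shapes come out at roughly 21.5%, 31.4%, and 10.7%. The result always falls between 10% (output only) and 40% (input only).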

What is context caching and how much does it save?

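Context caching lets you store repeated context (system prompts, reference docs) and read it back at $0.15 per 1M tokens, 90% cheaper than the standard $1.50 per 1M input rate. For a 10,000-token system prompt used 100 times/day, caching cuts the monthly input cost for that prompt from about $45.66 to $4.57, a saving of approximately $41.09. Storage costs $1.00 per 1M tokens per hour, so keeping that prompt cached around the clock adds about $7.31/month.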
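
Since storage is billed per hour while savings accrue per read, caching only pays off above a certain read frequency. The sketch below works that out from the three rates quoted above. It assumes the cache is kept alive around the clock, every read is a cache hit, and there is no separate charge for writing the cache. The function names and the sample workload are hypothetical.

```python
INPUT_RATE = 1.50      # USD per 1M uncached input tokens
CACHE_HIT_RATE = 0.15  # USD per 1M cached input tokens
STORAGE_RATE = 1.00    # USD per 1M tokens per hour
DAYS_PER_MONTH = 30.44
HOURS_PER_MONTH = 24 * DAYS_PER_MONTH


def break_even_reads_per_hour() -> float:
    """Reads per hour at which read savings equal the storage charge."""
    return STORAGE_RATE / (INPUT_RATE - CACHE_HIT_RATE)


def net_monthly_saving(cached_tokens: int, reads_per_day: float) -> float:
    """Monthly saving from caching a prefix, after paying for storage."""
    millions = cached_tokens / 1_000_000
    read_saving = millions * reads_per_day * DAYS_PER_MONTH * (INPUT_RATE - CACHE_HIT_RATE)
    storage = millions * STORAGE_RATE * HOURS_PER_MONTH
    return round(read_saving - storage, 2)


print(round(break_even_reads_per_hour(), 2))
print(net_monthly_saving(50_000, 500))  # 50K-token document, 500 reads/day
print(net_monthly_saving(50_000, 10))   # too few reads: storage outweighs savings
```

Under these assumptions the break-even is about 0.74 reads per hour (roughly 18 reads per day), regardless of prefix size. Below that frequency, storage costs more than caching saves.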

What is the difference between Standard, Batch, and Flex pricing tiers?

Standard ($1.50/$9.00 per 1M): Real-time requests, billed per token used.
Batch ($0.75/$4.50 per 1M): 50% off standard rates, asynchronous processing — submit jobs and receive results later. Ideal for bulk data analysis.
Flex ($2.70/$16.20 per 1M): 80% premium over standard, guaranteed low-latency priority inference — use for SLA-bound production systems.
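
The Batch and Flex rates are fixed multiples of the Standard rate (0.5x and 1.8x), so a tier comparison reduces to one multiplication. A brief sketch, with hypothetical names and an arbitrary sample workload of 2,000 requests/day at 1,500 input and 400 output tokens each:

```python
STANDARD = {"input": 1.50, "output": 9.00}  # USD per 1M tokens
TIER_MULTIPLIER = {"standard": 1.0, "batch": 0.5, "flex": 1.8}
DAYS_PER_MONTH = 30.44


def monthly_cost_by_tier(input_tokens, output_tokens, requests_per_day):
    """Monthly USD cost of the same workload under each pricing tier."""
    requests = requests_per_day * DAYS_PER_MONTH
    base = requests * (input_tokens * STANDARD["input"]
                       + output_tokens * STANDARD["output"]) / 1_000_000
    return {tier: round(base * m, 2) for tier, m in TIER_MULTIPLIER.items()}


print(monthly_cost_by_tier(1500, 400, 2000))
```

For the sample workload this yields about $356.15 on Standard, $178.07 on Batch, and $641.07 on Flex.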

Is Gemini Omni API available?

As of May 26, 2026, Gemini Omni is not yet available via the developer API. It launched at Google I/O 2026 for consumer access (Gemini app, YouTube Shorts, Google Flow) but API access is "coming in the coming weeks" with pricing TBA. Check the official pricing page for updates.

Is there a free tier for Gemini API?

Yes. Google offers a free tier for Gemini 3.5 Flash with 1M input tokens and 1M output tokens per month (rate limits apply). After that, usage is billed at the standard rates above. Batch and Flex tiers are paid-only.
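
Under that description, only usage beyond the monthly allowance is billed. A minimal sketch, assuming the 1M-token allowances are applied separately to input and output and reset each month (the source does not detail the mechanics). The names are hypothetical.

```python
FREE_TOKENS = 1_000_000  # per month, assumed separate for input and output
INPUT_RATE, OUTPUT_RATE = 1.50, 9.00  # USD per 1M tokens, standard tier


def billed_after_free_tier(input_tokens: int, output_tokens: int) -> float:
    """USD billed for one month's usage once the free allowance is used up."""
    billable_in = max(0, input_tokens - FREE_TOKENS)
    billable_out = max(0, output_tokens - FREE_TOKENS)
    return round((billable_in * INPUT_RATE + billable_out * OUTPUT_RATE) / 1_000_000, 2)


print(billed_after_free_tier(5_000_000, 800_000))    # output stays within the allowance
print(billed_after_free_tier(5_000_000, 3_000_000))  # both exceed the allowance
```

The first call bills only the 4M excess input tokens ($6.00). The second adds 2M excess output tokens for a total of $24.00.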