Calculator · Edition 2026

What does a prompt actually cost?

Type your token counts, pick a model, and watch the arithmetic unfold. Compare every current frontier model side-by-side, scale to production volumes, and see where caching and batching bend the curve.

01 Inputs

Input tokens

≈ 750 words

Output tokens

≈ 750 words

02 Modifiers

Model

Prompt caching: cached input billed at 10% of the input rate (90% off)

Cached portion of input: 50%

Batch API: 50% off, results within 24h

03 Volume & currency

Requests per day

Currency

Cost per request

Per day

Per month (×30 days)

Per year (×365 days)

Cost / 1K tokens

Cost / 1M tokens

In / Out split

Section II

The Field at a Glance

Model	Provider	In / Out per 1M	Your cost	Relative	vs. selected

Cheapest result highlighted. Bar widths scale to the most expensive option on screen. Modifiers (caching, batch) apply to every row.

Section III

Marginalia

What exactly is a token?

A token is the atomic unit a model reads or writes: usually a short run of characters, often a whole short word, sometimes a fragment. As a rough ratio for English, one token is about four characters, or about three-quarters of a word.
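The rule of thumb above can be turned into a quick estimator. This is a minimal sketch, not a real tokenizer: the function names are made up for illustration, and actual counts vary by model and by language.

```python
import math

# Rough English ratios from the text above; real tokenizers will differ.
CHARS_PER_TOKEN = 4
WORDS_PER_TOKEN = 0.75


def estimate_tokens_from_chars(text: str) -> int:
    """Estimate tokens from character count, rounding up."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_tokens_from_words(word_count: int) -> int:
    """Estimate tokens from a word count, rounding up."""
    return math.ceil(word_count / WORDS_PER_TOKEN)


def estimate_words_from_tokens(tokens: int) -> int:
    """Convert a token count back to an approximate word count."""
    return round(tokens * WORDS_PER_TOKEN)


if __name__ == "__main__":
    print(estimate_words_from_tokens(1000))   # the "≈ 750 words" hint for 1,000 tokens
    print(estimate_tokens_from_chars("What does a prompt actually cost?"))
```

For budgeting, treat these numbers as a first guess and measure real usage from the token counts your provider reports.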

Why are output tokens so much pricier?

Generating text is the expensive part. Reading your prompt is largely parallel work; producing each output token requires another forward pass through the model. Most providers price output three to five times higher than input.
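To see how the price gap shows up in a bill, here is a small sketch of the per-request arithmetic. The rates are hypothetical placeholders (output priced at five times input), not any provider's actual prices, and `request_cost` is an illustrative helper name.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_rate_per_m: float, out_rate_per_m: float) -> tuple[float, float]:
    """Return (input_cost, output_cost) for one request.

    Rates are expressed per 1M tokens, as in the comparison table.
    """
    input_cost = input_tokens * in_rate_per_m / 1_000_000
    output_cost = output_tokens * out_rate_per_m / 1_000_000
    return input_cost, output_cost


# Hypothetical rates: 3.00 per 1M input tokens, 15.00 per 1M output tokens.
EXAMPLE_IN_RATE = 3.00
EXAMPLE_OUT_RATE = 15.00

if __name__ == "__main__":
    in_cost, out_cost = request_cost(1000, 1000, EXAMPLE_IN_RATE, EXAMPLE_OUT_RATE)
    total = in_cost + out_cost
    print(f"input:  {in_cost:.6f}")
    print(f"output: {out_cost:.6f}")
    print(f"total:  {total:.6f}")
    print(f"output share of cost: {out_cost / total:.0%}")
```

With equal token counts on both sides, the output side dominates the total, which is why trimming response length often saves more than trimming the prompt.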

What does prompt caching actually do?

If your requests share a long system prompt or document, providers can cache those tokens so that repeated requests pay only a small fraction of the usual input rate, typically a 90% discount on the cached portion. For agent-style workloads, which resend the same long context on every step, this is often the single biggest lever.
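The effect of the discount on input cost can be sketched as follows. The 10% multiplier comes from the calculator's caching modifier; the rate and token counts are hypothetical. The sketch ignores cache-write surcharges and cache expiry, which some providers apply, so treat it as an upper bound on savings.

```python
# Cached input is billed at 10% of the normal input rate (90% off).
CACHED_RATE_MULTIPLIER = 0.10


def input_cost_with_cache(input_tokens: int, in_rate_per_m: float,
                          cached_fraction: float) -> float:
    """Input cost for one request when part of the input is served from cache."""
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached_tokens = input_tokens * cached_fraction
    uncached_tokens = input_tokens - cached_tokens
    billed = (uncached_tokens * in_rate_per_m
              + cached_tokens * in_rate_per_m * CACHED_RATE_MULTIPLIER)
    return billed / 1_000_000


if __name__ == "__main__":
    # Hypothetical: 10,000 input tokens at 3.00 per 1M, half of them cached.
    without = input_cost_with_cache(10_000, 3.00, 0.0)
    with_cache = input_cost_with_cache(10_000, 3.00, 0.5)
    print(f"no cache:   {without:.6f}")
    print(f"50% cached: {with_cache:.6f}")
    print(f"saving:     {1 - with_cache / without:.0%}")
```

Note that a 90% discount on half the input is a 45% saving on input cost overall, so the benefit depends on how much of each request is actually shared.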

When should I use the Batch API?

Use it for anything that doesn't need to answer a human within seconds: evaluations, bulk data processing, overnight content pipelines. Batch jobs return within 24 hours and are billed at half price on both input and output tokens, on every model.
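A rough sketch of how the batch discount carries through to daily, monthly, and yearly totals, using the same ×30 and ×365 multipliers as the calculator. The per-request cost and the volume are hypothetical inputs, and `projected_costs` is an illustrative name.

```python
# Batch jobs are billed at half price on both input and output tokens.
BATCH_MULTIPLIER = 0.5


def projected_costs(cost_per_request: float, requests_per_day: int,
                    use_batch: bool = False) -> dict[str, float]:
    """Scale a per-request cost to day, month (x30), and year (x365)."""
    per_request = cost_per_request * (BATCH_MULTIPLIER if use_batch else 1.0)
    per_day = per_request * requests_per_day
    return {
        "per_request": per_request,
        "per_day": per_day,
        "per_month": per_day * 30,
        "per_year": per_day * 365,
    }


if __name__ == "__main__":
    # Hypothetical workload: 0.018 per request, 10,000 requests per day.
    for label, use_batch in (("standard", False), ("batch", True)):
        costs = projected_costs(0.018, 10_000, use_batch=use_batch)
        print(label, {k: round(v, 4) for k, v in costs.items()})
```

Because the discount is a flat multiplier, the saving scales linearly with volume; the only trade-off is the wait of up to 24 hours for results.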

How do I cut costs in production?

Route by difficulty. Send classification and extraction to the smallest model, give the mid-tier the bulk of your traffic, and reserve the flagship for the requests where reasoning quality genuinely matters. A blended bill is almost always cheaper than a uniform one.
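The blended-bill claim is a weighted average, sketched here. The tier names, per-request costs, and traffic shares are all invented for illustration; substitute your own measured mix.

```python
def blended_cost_per_request(tiers: list[tuple[float, float]]) -> float:
    """Average cost per request across tiers.

    Each tier is (traffic_share, cost_per_request); shares must sum to 1.
    """
    total_share = sum(share for share, _ in tiers)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    return sum(share * cost for share, cost in tiers)


if __name__ == "__main__":
    # Hypothetical per-request costs for three model tiers.
    small, mid, flagship = 0.002, 0.018, 0.09
    routed = blended_cost_per_request([
        (0.3, small),     # classification and extraction
        (0.6, mid),       # the bulk of traffic
        (0.1, flagship),  # requests that need the strongest reasoning
    ])
    uniform = blended_cost_per_request([(1.0, flagship)])
    print(f"routed:  {routed:.4f} per request")
    print(f"uniform: {uniform:.4f} per request")
```

The saving depends entirely on how much traffic can safely move down a tier, so it is worth checking output quality on a sample before changing the routing.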

Are these prices current?

Rates are sourced from each provider's published pricing as of May 2026. Currency conversions are approximate. Confirm against your provider's current rate card before forecasting any material budget.