What exactly is a token?
A token is the atomic unit a model reads or
writes — usually a short run of characters, often
a whole short word, sometimes a fragment. As a
rule of thumb, one token is about four English
characters or three-quarters of a word, though
the ratio varies by tokenizer and language.
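As a rough sketch of that rule of thumb, the
Python below estimates a token count from a
character or word count. The function names and
constants are illustrative assumptions, not any
provider's tokenizer; for billing, use the count
your provider reports.

```python
import math

# Rule-of-thumb ratios from the answer above (English text only).
CHARS_PER_TOKEN = 4.0
WORDS_PER_TOKEN = 0.75


def estimate_tokens_from_chars(text: str) -> int:
    """Rough token estimate: about four characters per token."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_tokens_from_words(text: str) -> int:
    """Rough token estimate: about three-quarters of a word per token."""
    return math.ceil(len(text.split()) / WORDS_PER_TOKEN)
```

The two estimates will usually disagree a little;
either is only good enough for a first budget
pass.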
Why are output tokens so much pricier?
Generating text is the expensive part. Reading
your prompt is largely parallel work; producing
each output token requires another forward pass
through the model, one token at a time. Most
providers price output at three to five times
the input rate.
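To see how the split rates show up on a bill,
here is a minimal sketch of a per-request cost
calculation. The function name and the example
rates (3 and 15 dollars per million tokens) are
hypothetical placeholders, not any provider's
actual prices.

```python
def request_cost(
    input_tokens: int,
    output_tokens: int,
    input_rate_per_mtok: float,
    output_rate_per_mtok: float,
) -> float:
    """Cost of one request, with rates given per million tokens."""
    total = (
        input_tokens * input_rate_per_mtok
        + output_tokens * output_rate_per_mtok
    )
    return total / 1_000_000


# Hypothetical rates: output priced at five times input.
cost = request_cost(2_000, 500, input_rate_per_mtok=3.0, output_rate_per_mtok=15.0)
```

In this made-up example the 500 output tokens
cost more than the 2,000 input tokens, which is
why trimming output length often pays off first.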
What does prompt caching actually do?
If your requests share a long system prompt or
document, providers can cache those tokens so
repeated requests pay only a small fraction of the
usual input rate, typically a 90% discount on
the cached portion. This is usually the single
biggest cost lever for agent-style workloads,
which resend the same context on every step.
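The effect of a cached prefix can be sketched as
below. The function name and the example rate are
hypothetical, and the sketch is a simplification:
it applies the 90% discount quoted above and
ignores cache-write surcharges and cache expiry,
which differ between providers.

```python
def input_cost_with_cache(
    input_tokens: int,
    cached_tokens: int,
    input_rate_per_mtok: float,
    cache_discount: float = 0.90,  # discount quoted in the answer above
) -> float:
    """Input cost when part of the prompt is served from cache."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached_tokens = input_tokens - cached_tokens
    cached_rate = input_rate_per_mtok * (1.0 - cache_discount)
    total = uncached_tokens * input_rate_per_mtok + cached_tokens * cached_rate
    return total / 1_000_000


# Hypothetical request: 10,000 input tokens, 9,000 of them a shared prefix.
without_cache = input_cost_with_cache(10_000, 0, 3.0)
with_cache = input_cost_with_cache(10_000, 9_000, 3.0)
```

The larger the shared prefix relative to the rest
of the prompt, the closer the saving gets to the
full discount.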
When should I use the Batch API?
Use it for anything that doesn't need to answer a
human within seconds: evaluations, bulk data
processing, overnight content pipelines. Batch
jobs return within 24 hours at half price on both
input and output tokens, on every model.
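One way to apply that rule is a small helper that
picks the batch path whenever the deadline allows
it. This is a sketch under stated assumptions:
the names, the deadline check, and the example
rates are hypothetical, and only the 24-hour
window and half-price figure come from the answer
above.

```python
BATCH_WINDOW_HOURS = 24  # turnaround quoted in the answer above
BATCH_DISCOUNT = 0.5     # half price on input and output tokens


def should_batch(deadline_hours: float) -> bool:
    """A job can go to batch if it can wait out the full batch window."""
    return deadline_hours >= BATCH_WINDOW_HOURS


def job_cost(
    input_tokens: int,
    output_tokens: int,
    input_rate_per_mtok: float,
    output_rate_per_mtok: float,
    deadline_hours: float,
) -> float:
    """Job cost, discounted when the deadline permits batch processing."""
    full_price = (
        input_tokens * input_rate_per_mtok
        + output_tokens * output_rate_per_mtok
    ) / 1_000_000
    if should_batch(deadline_hours):
        return full_price * BATCH_DISCOUNT
    return full_price
```

For example, an overnight evaluation run with a
48-hour deadline would take the discounted path,
while a chat reply due in seconds would not.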
How do I cut costs in production?
Route by difficulty. Send classification and
extraction to the smallest model, give the
mid-tier the bulk of your traffic, and reserve
the flagship for the requests where reasoning
quality genuinely matters. A blended bill is
almost always cheaper than sending every request
to the flagship.
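A difficulty router can be sketched in a few
lines. Everything here is illustrative: the tier
names, the per-million-token rates, the task
labels, and the needs_deep_reasoning flag are
hypothetical, and a production router would need
a real way to judge difficulty.

```python
# Hypothetical tiers and input rates in dollars per million tokens.
RATES_PER_MTOK = {"small": 0.25, "mid": 3.0, "flagship": 15.0}

# Task types the answer above sends to the smallest model.
SIMPLE_TASKS = {"classification", "extraction"}


def route(task_type: str, needs_deep_reasoning: bool = False) -> str:
    """Pick a model tier for one request."""
    if task_type in SIMPLE_TASKS:
        return "small"
    if needs_deep_reasoning:
        return "flagship"
    return "mid"


def blended_cost(requests, tokens_per_request: int = 1_000) -> float:
    """Total cost when each (task_type, hard) request is routed by tier."""
    total = 0.0
    for task_type, hard in requests:
        total += tokens_per_request * RATES_PER_MTOK[route(task_type, hard)]
    return total / 1_000_000


def uniform_cost(requests, tier: str, tokens_per_request: int = 1_000) -> float:
    """Total cost when every request goes to the same tier."""
    return len(requests) * tokens_per_request * RATES_PER_MTOK[tier] / 1_000_000


# Made-up traffic mix: 3 simple, 6 ordinary, 1 hard request.
traffic = (
    [("classification", False)] * 3
    + [("chat", False)] * 6
    + [("analysis", True)]
)
```

With these placeholder numbers the routed mix
costs well under the all-flagship bill; the gap
depends entirely on your own traffic mix and
rates.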
Are these prices current?
Rates are sourced from each provider's published
pricing as of May 2026. Currency conversions are
approximate. Confirm against your provider's
current rate card before forecasting any
material budget.