The short answer

On a short-context finance task (8k input, 500 output, the shape of a news-sentiment or tagging call), 1,000 tasks cost $16.50 on Gemini 3.5 Flash, $41.70 on Claude Opus 4.7, and $55.00 on GPT-5.5, all computed live from the Token Cost Optimizer. The budget tier reframes the whole comparison: Gemini 2.5 Flash-Lite does the same 1,000 tasks for $1.00. The frontier three are within a 3.3x band of each other and 16x to 55x above the budget floor.

TL;DR

Model                    Cost per 1,000 tasks
Gemini 2.5 Flash-Lite    $1.00
Gemini 2.5 Flash         $3.65
Claude Haiku 4.5         $8.34
Gemini 3.5 Flash         $16.50
Claude Opus 4.7          $41.70
GPT-5.5                  $55.00

Same task for every row: 8,000 input + 500 output tokens, one call per task, no retry, a 0.30 cache-hit assumption, priced as 1,000 tasks in a day. Anthropic input reflects the 30% cache hit; Google and OpenAI are priced at full list input.
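The table can be reproduced from the per-Mtok rates listed in the engine output further down. The following is a minimal sketch, not the Token Cost Optimizer's actual code: the formula (blend the input and cache-read rates by the cache-hit rate, and only for models that list a cache-read rate) is an assumption inferred from the published numbers, and the names `RATES` and `cost_per_1000_tasks` are hypothetical.

```python
# Rates in USD per million tokens, copied from the engine output in this article.
# cache_read is None where the engine lists no cache-read rate.
RATES = {
    "Gemini 2.5 Flash-Lite": {"input": 0.10, "output": 0.40, "cache_read": None},
    "Gemini 2.5 Flash": {"input": 0.30, "output": 2.50, "cache_read": None},
    "Claude Haiku 4.5": {"input": 1.00, "output": 5.00, "cache_read": 0.10},
    "Gemini 3.5 Flash": {"input": 1.50, "output": 9.00, "cache_read": None},
    "Claude Opus 4.7": {"input": 5.00, "output": 25.00, "cache_read": 0.50},
    "GPT-5.5": {"input": 5.00, "output": 30.00, "cache_read": None},
}


def cost_per_1000_tasks(rate, input_tokens=8000, output_tokens=500, cache_hit_rate=0.30):
    """USD for 1,000 single-call tasks under the assumed cache-blend formula."""
    input_rate = rate["input"]
    if rate["cache_read"] is not None:
        # Assumption: cached input tokens bill at the cache-read rate,
        # the remainder at the list input rate.
        input_rate = (1 - cache_hit_rate) * rate["input"] + cache_hit_rate * rate["cache_read"]
    per_call = (input_tokens * input_rate + output_tokens * rate["output"]) / 1_000_000
    return per_call * 1000


for name, rate in RATES.items():
    print(f"{name}: ${cost_per_1000_tasks(rate):.2f}")
```

Under these assumptions the loop prints the same six figures as the table, from $1.00 for Gemini 2.5 Flash-Lite to $55.00 for GPT-5.5.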

What a "task" is here

A task is a single short-context LLM call: classify the sentiment of a news item, tag a filing paragraph, score a headline, extract one field. It is the atomic unit of a high-volume finance pipeline. The 8k-input, 500-output shape is typical: a news item plus an instruction in, a structured score out. Cost-per-1,000-tasks is the unit that scales, because these pipelines run tens of thousands of tasks a day.

The three frontier models, head to head

Among the three frontier headline models, the ranking on cost-per-1,000-tasks is clear and not close to a tie:

  • Gemini 3.5 Flash $16.50. The cheapest frontier option, by a wide margin.
  • Claude Opus 4.7 $41.70. About 2.5x Gemini 3.5 Flash.
  • GPT-5.5 $55.00. About 3.3x Gemini 3.5 Flash.

The driver is the rate table. Gemini 3.5 Flash bills $1.50/$9.00 per Mtok (input/output) [1]; GPT-5.5 bills $5.00/$30.00 [2], about 3.3x higher on both sides. Opus 4.7, at $5/$25 [3], shares GPT-5.5's input list rate, so at list price GPT-5.5's $30/Mtok output is what puts it at the top of the three. The 30% input cache hit, which only Anthropic gets in this engine, widens the gap: it cuts Opus 4.7's input bill from $40.00 to $29.20 per 1,000 tasks.
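To see where each frontier bill comes from, it helps to split it into an input portion and an output portion. This is a rough sketch that assumes cost is simply tokens times rate, and that the 30% cache hit blends Opus 4.7's $5.00 input rate with its $0.50 cache-read rate; that blend is inferred from the engine output rather than documented. The helper name `split_per_1000` is hypothetical.

```python
def split_per_1000(input_rate, output_rate, input_tokens=8000, output_tokens=500):
    """Return (input_usd, output_usd) per 1,000 tasks at the given per-Mtok rates."""
    input_usd = input_tokens * input_rate / 1_000_000 * 1000
    output_usd = output_tokens * output_rate / 1_000_000 * 1000
    return input_usd, output_usd


# Assumed effective input rate for Opus 4.7 under a 30% cache hit (USD per Mtok).
opus_input_rate = 0.7 * 5.00 + 0.3 * 0.50

frontier = [
    ("Gemini 3.5 Flash", 1.50, 9.00),
    ("Claude Opus 4.7", opus_input_rate, 25.00),
    ("GPT-5.5", 5.00, 30.00),
]

for name, in_rate, out_rate in frontier:
    inp, out = split_per_1000(in_rate, out_rate)
    print(f"{name}: input ${inp:.2f} + output ${out:.2f} = ${inp + out:.2f}")
```

With 8,000 input tokens against 500 output tokens, input is the larger share of every bill here: $12.00 of $16.50 for Gemini 3.5 Flash, $29.20 of $41.70 for Opus 4.7, and $40.00 of $55.00 for GPT-5.5.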

At 50,000 tasks a day, the band translates to real money: $825/day on Gemini 3.5 Flash, $2,085/day on Opus 4.7, $2,750/day on GPT-5.5. Over a month that is roughly $25k versus $63k versus $83k for the identical 1.5 million tasks.
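The scaling above is plain multiplication, sketched here so the volume and month length can be swapped for your own. The 30-day month matches the article's 1.5 million tasks; the function name `daily_and_monthly` is hypothetical.

```python
# Per-1,000-task costs in USD, taken from the table in this article.
PER_1000 = {"Gemini 3.5 Flash": 16.50, "Claude Opus 4.7": 41.70, "GPT-5.5": 55.00}


def daily_and_monthly(per_1000_usd, tasks_per_day=50_000, days_per_month=30):
    """Scale a per-1,000-task cost to a daily and a monthly bill."""
    daily = per_1000_usd * tasks_per_day / 1000
    return daily, daily * days_per_month


for name, per_1000 in PER_1000.items():
    daily, monthly = daily_and_monthly(per_1000)
    print(f"{name}: ${daily:,.0f}/day, ${monthly:,.0f}/month")
```

At 50,000 tasks a day this gives $24,750, $62,550, and $82,500 a month, which round to the $25k, $63k, and $83k quoted above.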

The budget tier changes the question

The frontier three-way is interesting, but the dominant fact is the budget floor. Gemini 2.5 Flash-Lite does 1,000 tasks for $1.00, 16x cheaper than Gemini 3.5 Flash and 55x cheaper than GPT-5.5. Claude Haiku 4.5 ($8.34) and Gemini 2.5 Flash ($3.65) sit in between.

For sentiment scoring and tagging, the task is structural classification at volume, exactly the regime where a budget model, ideally one fine-tuned on your label set, captures most of the value. A frontier model on a sentiment call is paying reasoning rates for a classification job. The defensible architecture is a budget first pass on all 1,000 tasks, with frontier escalation only on the ambiguous subset the cheap model flags. Routing 10% of tasks to Gemini 3.5 Flash on top of a Flash-Lite base costs about $2.65 per 1,000, versus $16.50 to run everything on the frontier model.
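The two-stage figure can be checked with a short calculation. This sketch assumes every task runs once on the budget model, that an escalated task is re-run on the frontier model with the same 8k/500 token shape, and that flagging itself costs nothing extra. The function names are hypothetical.

```python
def two_stage_cost_per_1000(base_per_1000, escalation_per_1000, escalation_fraction):
    """All tasks run on the base model; a fraction is re-run on the escalation model."""
    return base_per_1000 + escalation_fraction * escalation_per_1000


def break_even_fraction(base_per_1000, escalation_per_1000):
    """Escalation fraction at which two-stage costs the same as all-frontier."""
    return 1 - base_per_1000 / escalation_per_1000


FLASH_LITE = 1.00       # USD per 1,000 tasks, from the table
GEMINI_35_FLASH = 16.50  # USD per 1,000 tasks, from the table

blended = two_stage_cost_per_1000(FLASH_LITE, GEMINI_35_FLASH, 0.10)
print(f"10% escalation: ${blended:.2f} per 1,000 tasks")
print(f"break-even escalation fraction: {break_even_fraction(FLASH_LITE, GEMINI_35_FLASH):.1%}")
```

Under these assumptions, 10% escalation costs $2.65 per 1,000 tasks, and the two-stage design stays cheaper than running everything on Gemini 3.5 Flash until roughly 94% of tasks escalate.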

A capability caveat the cost numbers cannot settle

Cost is not quality. These figures say nothing about which model classifies a sentiment correctly, follows a structured-output schema reliably, or handles a sarcastic headline. Vendors publish capability benchmarks; treat them as vendor claims and run your own eval on a labeled sample of your real tasks. The cost ranking is firm; the accuracy ranking is yours to establish. The right model is the cheapest one that clears your accuracy bar, and on a structural classification task that is very often a budget model.
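The selection rule, cheapest model that clears the accuracy bar, is easy to mechanize once you have your own eval results. In the sketch below the costs come from the table (the Flash-Lite, Haiku 4.5, and Gemini 3.5 Flash rows, under generic tier names), but the accuracy values are made-up placeholders that stand in for your measurements; they are not results for any model. The function name `cheapest_passing` is hypothetical.

```python
def cheapest_passing(cost_per_1000, accuracy, bar):
    """Return the lowest-cost model whose measured accuracy meets the bar, or None."""
    passing = [model for model in cost_per_1000 if accuracy.get(model, 0.0) >= bar]
    if not passing:
        return None
    return min(passing, key=lambda model: cost_per_1000[model])


# Costs in USD per 1,000 tasks, from three rows of the table.
cost_per_1000 = {"budget": 1.00, "mid": 8.34, "frontier": 16.50}

# PLACEHOLDER accuracies for illustration only. Replace with scores from
# your own eval on a labeled sample of real tasks.
placeholder_accuracy = {"budget": 0.87, "mid": 0.92, "frontier": 0.95}

print(cheapest_passing(cost_per_1000, placeholder_accuracy, bar=0.85))
print(cheapest_passing(cost_per_1000, placeholder_accuracy, bar=0.90))
print(cheapest_passing(cost_per_1000, placeholder_accuracy, bar=0.99))
```

With these placeholder scores, the three calls pick the budget tier, the mid tier, and nothing at all. The point is the shape of the decision, not the numbers.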

Decision guidance

  1. Price in cost-per-1,000-tasks, not per-call. A few cents per call becomes tens of thousands of dollars a month at volume; the per-1,000 unit makes the decision legible.
  2. Default to the budget tier for classification. Sentiment, tagging, and field extraction are structural; the 16x-to-55x frontier premium rarely earns out.
  3. Two-stage on ambiguity. Cheap first pass, frontier escalation on flagged items, beats running everything on the frontier model.
  4. Recompute your real token shape. A longer instruction or a richer structured output moves the per-1,000 cost; the Token Cost Optimizer recomputes it instantly.
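As a rough illustration of the last point, the sketch below recomputes the per-1,000 cost for a few token shapes at Gemini 3.5 Flash's listed $1.50/$9.00 rates. It assumes a simple tokens-times-rate formula with no caching and is not the Token Cost Optimizer itself. The alternative shapes are hypothetical examples, and `per_1000` is a hypothetical helper.

```python
def per_1000(input_tokens, output_tokens, input_rate, output_rate):
    """USD per 1,000 single-call tasks at the given per-Mtok rates, no caching."""
    per_call = (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
    return per_call * 1000


GEMINI_35_FLASH = (1.50, 9.00)  # (input, output) USD per Mtok, from the article

shapes = [
    (8000, 500),    # the article's baseline shape
    (8000, 1000),   # hypothetical: richer structured output
    (12000, 500),   # hypothetical: longer instruction or context
]

for input_tokens, output_tokens in shapes:
    cost = per_1000(input_tokens, output_tokens, *GEMINI_35_FLASH)
    print(f"{input_tokens} in / {output_tokens} out: ${cost:.2f} per 1,000 tasks")
```

Doubling the output to 1,000 tokens moves the baseline from $16.50 to $21.00; growing the input to 12,000 tokens moves it to $22.50.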

Footnotes

  1. Google. "Gemini Developer API pricing." ai.google.dev, verified 2026-05-26. https://ai.google.dev/gemini-api/docs/pricing

  2. OpenAI. "API Pricing." developers.openai.com, verified 2026-05-26. https://developers.openai.com/api/docs/pricing

  3. Anthropic. "Pricing." platform.claude.com, verified 2026-05-26. https://platform.claude.com/docs/en/about-claude/pricing

Verified engine output

Cost per 1,000 tasks — Gemini 2.5 Flash-Lite (budget floor)
Inputs
  input_tokens_per_call: 8000
  output_tokens_per_call: 500
  calls_per_idea: 1
  retry_rate: 0
  ideas_per_day: 1000
  validation_rate: 0.9
  cache_hit_rate: 0.3
  model_id: gemini-2-5-flash-lite
Result
  model › id: gemini-2-5-flash-lite
  model › provider: google
  model › name: Gemini 2.5 Flash-Lite
  model › input usd per mtoken: 0.1
  model › output usd per mtoken: 0.4
  model › context window: 1000000
  model › notes: Cheapest tier in this table; 1M context.
  effective cost per call: 0.001
  cost per idea: 0.001
  cost per validated trade: 0.0011111111111111111
  cost per day: 1
  cost per month: 30
  cost per year: 365

Computed live at build time.

Cost per 1,000 tasks — Gemini 2.5 Flash
Inputs
  input_tokens_per_call: 8000
  output_tokens_per_call: 500
  calls_per_idea: 1
  retry_rate: 0
  ideas_per_day: 1000
  validation_rate: 0.9
  cache_hit_rate: 0.3
  model_id: gemini-2-5-flash
Result
  model › id: gemini-2-5-flash
  model › provider: google
  model › name: Gemini 2.5 Flash
  model › input usd per mtoken: 0.3
  model › output usd per mtoken: 2.5
  model › context window: 1000000
  model › notes: Fast mid-tier; 1M context.
  effective cost per call: 0.0036499999999999996
  cost per idea: 0.0036499999999999996
  cost per validated trade: 0.004055555555555555
  cost per day: 3.6499999999999995
  cost per month: 109.49999999999999
  cost per year: 1332.2499999999998

Computed live at build time.

Cost per 1,000 tasks — Claude Haiku 4.5
Inputs
  input_tokens_per_call: 8000
  output_tokens_per_call: 500
  calls_per_idea: 1
  retry_rate: 0
  ideas_per_day: 1000
  validation_rate: 0.9
  cache_hit_rate: 0.3
  model_id: claude-haiku-4-5
Result
  model › id: claude-haiku-4-5
  model › provider: anthropic
  model › name: Claude Haiku 4.5
  model › input usd per mtoken: 1
  model › output usd per mtoken: 5
  model › cache write usd per mtoken: 1.25
  model › cache read usd per mtoken: 0.1
  model › context window: 200000
  model › notes: Fast, cheap: filtering + pre-processing layers.
  effective cost per call: 0.00834
  cost per idea: 0.00834
  cost per validated trade: 0.009266666666666666
  cost per day: 8.34
  cost per month: 250.2
  cost per year: 3044.1

Computed live at build time.

Cost per 1,000 tasks — Gemini 3.5 Flash (cheapest frontier)
Inputs
  input_tokens_per_call: 8000
  output_tokens_per_call: 500
  calls_per_idea: 1
  retry_rate: 0
  ideas_per_day: 1000
  validation_rate: 0.9
  cache_hit_rate: 0.3
  model_id: gemini-3-5-flash
Result
  model › id: gemini-3-5-flash
  model › provider: google
  model › name: Gemini 3.5 Flash
  model › input usd per mtoken: 1.5
  model › output usd per mtoken: 9
  model › context window: 1000000
  model › notes: Frontier agent-tier at Flash speed, not a budget model (output ~3.6x Gemini 2.5 Flash).
  effective cost per call: 0.0165
  cost per idea: 0.0165
  cost per validated trade: 0.018333333333333333
  cost per day: 16.5
  cost per month: 495
  cost per year: 6022.5

Computed live at build time.

Cost per 1,000 tasks — Claude Opus 4.7 (30% cache hit on input)
Inputs
  input_tokens_per_call: 8000
  output_tokens_per_call: 500
  calls_per_idea: 1
  retry_rate: 0
  ideas_per_day: 1000
  validation_rate: 0.9
  cache_hit_rate: 0.3
  model_id: claude-opus-4-7
Result
  model › id: claude-opus-4-7
  model › provider: anthropic
  model › name: Claude Opus 4.7
  model › input usd per mtoken: 5
  model › output usd per mtoken: 25
  model › cache write usd per mtoken: 6.25
  model › cache read usd per mtoken: 0.5
  model › context window: 1000000
  model › notes: Flagship reasoning model, 1M context.
  effective cost per call: 0.0417
  cost per idea: 0.0417
  cost per validated trade: 0.04633333333333333
  cost per day: 41.7
  cost per month: 1251
  cost per year: 15220.500000000002

Computed live at build time.

Cost per 1,000 tasks — GPT-5.5 (premium)
Inputs
  input_tokens_per_call: 8000
  output_tokens_per_call: 500
  calls_per_idea: 1
  retry_rate: 0
  ideas_per_day: 1000
  validation_rate: 0.9
  cache_hit_rate: 0.3
  model_id: gpt-5
Result
  model › id: gpt-5
  model › provider: openai
  model › name: GPT-5.5
  model › input usd per mtoken: 5
  model › output usd per mtoken: 30
  model › context window: 400000
  model › notes: OpenAI frontier model (GPT-5.5).
  effective cost per call: 0.055
  cost per idea: 0.055
  cost per validated trade: 0.06111111111111111
  cost per day: 55
  cost per month: 1650
  cost per year: 20075

Computed live at build time.
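The derived fields in each block follow a simple pattern. The sketch below reproduces them under relationships inferred from the numbers above rather than taken from the engine's source: cost per validated trade is cost per idea divided by validation_rate, a month is 30 days, and a year is 365 days. The function name `derived_fields` is hypothetical.

```python
def derived_fields(cost_per_idea, ideas_per_day=1000, validation_rate=0.9):
    """Recompute the engine's derived cost fields under the inferred relationships."""
    cost_per_day = cost_per_idea * ideas_per_day
    return {
        "cost_per_validated_trade": cost_per_idea / validation_rate,
        "cost_per_day": cost_per_day,
        "cost_per_month": cost_per_day * 30,   # assumed 30-day month
        "cost_per_year": cost_per_day * 365,   # assumed 365-day year
    }


# Claude Opus 4.7 block: cost per idea is 0.0417 USD.
for field, value in derived_fields(0.0417).items():
    print(f"{field}: {value:.6f}")
```

For the Opus 4.7 block this gives about 0.046333 per validated trade, 41.7 per day, 1251 per month, and 15220.5 per year, matching the engine output up to floating-point noise.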

Frequently asked questions

What is the cost per 1,000 tasks for Gemini 3.5 Flash vs Opus 4.7 vs GPT-5.5?
On an 8k-input, 500-output finance task: Gemini 3.5 Flash $16.50, Claude Opus 4.7 $41.70, and GPT-5.5 $55.00 per 1,000 tasks, all computed from the Token Cost Optimizer. Gemini 3.5 Flash is the cheapest of the three frontier models by a wide margin.
Which is the cheapest model for high-volume finance classification?
Gemini 2.5 Flash-Lite at $1.00 per 1,000 tasks, 16x cheaper than Gemini 3.5 Flash and 55x cheaper than GPT-5.5. Sentiment scoring and tagging are structural classification, where the budget tier captures most of the value at a fraction of the frontier cost.
Why is GPT-5.5 the most expensive of the three frontier models per task?
Its rates on both sides, not output alone. GPT-5.5 bills $5.00/$30.00 per Mtok (input/output) against Gemini 3.5 Flash's $1.50/$9.00, about 3.3x higher on each, so the per-1,000 cost is 3.3x higher as well: $55.00 versus $16.50. With 8,000 input tokens per call, input accounts for $40 of GPT-5.5's $55.
How much does running 50,000 finance tasks a day cost on each model?
Scaling the per-1,000 figures: about $825/day on Gemini 3.5 Flash, $2,085/day on Claude Opus 4.7, and $2,750/day on GPT-5.5. Over a month that is roughly $25k, $63k, and $83k for the same 1.5 million tasks.
Are these cost numbers or accuracy scores?
Cost numbers only, computed from verified vendor list prices and recomputed by CI against the shipped cost engine. They say nothing about which model classifies correctly. Run your own eval on a labeled sample, then pick the cheapest model that clears your accuracy bar.