Skip to main content
aifinhub

Worked example

Running the shipped financial-document-token-estimator engine on the input below produces exactly this output. Continuous integration recomputes it against the engine bundle on every build, so these numbers cannot drift from the code.

Input

{
  "tool": "financial_document_token_estimator",
  "mode": "archetype",
  "archetype_id": "10-k",
  "output_tokens": 1000,
  "peers": 0,
  "cache_hit_rate": 0
}

Output

[
  {
    "model": {
      "id": "claude-haiku-4-5",
      "provider": "anthropic",
      "name": "Claude Haiku 4.5",
      "inputPerMillion": 1,
      "outputPerMillion": 5,
      "cacheReadMultiplier": 0.1,
      "cacheWriteMultiplier": 1.25,
      "contextWindow": 200000,
      "charsPerToken": 3.5,
      "notes": "Cheap small model; good for first-pass filing extraction."
    },
    "inputTokens": 20571,
    "cacheReadTokens": 0,
    "outputTokens": 1000,
    "onePassCost": 0.025571,
    "synthesisCost": 0.025571,
    "synthesisInputTokens": 20571,
    "fitsInContext": true
  },
  {
    "model": {
      "id": "claude-sonnet-4-6",
      "provider": "anthropic",
      "name": "Claude Sonnet 4.6",
      "inputPerMillion": 3,
      "outputPerMillion": 15,
      "cacheReadMultiplier": 0.1,
      "cacheWriteMultiplier": 1.25,
      "contextWindow": 1000000,
      "charsPerToken": 3.5,
      "notes": "Balanced price/performance; 1M context."
    },
    "inputTokens": 20571,
    "cacheReadTokens": 0,
    "outputTokens": 1000,
    "onePassCost": 0.076713,
    "synthesisCost": 0.076713,
    "synthesisInputTokens": 20571,
    "fitsInContext": true
  },
  {
    "model": {
      "id": "claude-opus-4-1",
      "provider": "anthropic",
      "name": "Claude Opus 4.1 (retired, prior generation)",
      "inputPerMillion": 15,
      "outputPerMillion": 75,
      "cacheReadMultiplier": 0.1,
      "cacheWriteMultiplier": 1.25,
      "contextWindow": 200000,
      "charsPerToken": 3.5,
      "notes": "Prior-generation Opus flagship at the $15/$75 rate; kept as a generational price reference against Opus 4.7 ($5/$25)."
    },
    "inputTokens": 20571,
    "cacheReadTokens": 0,
    "outputTokens": 1000,
    "onePassCost": 0.383565,
    "synthesisCost": 0.383565,
    "synthesisInputTokens": 20571,
    "fitsInContext": true
  },
  {
    "model": {
      "id": "claude-opus-4-7",
      "provider": "anthropic",
      "name": "Claude Opus 4.7",
      "inputPerMillion": 5,
      "outputPerMillion": 25,
      "cacheReadMultiplier": 0.1,
      "cacheWriteMultiplier": 1.25,
      "contextWindow": 1000000,
      "charsPerToken": 3.5,
      "notes": "Flagship reasoning; 1M context for multi-filing synthesis."
    },
    "inputTokens": 20571,
    "cacheReadTokens": 0,
    "outputTokens": 1000,
    "onePassCost": 0.127855,
    "synthesisCost": 0.127855,
    "synthesisInputTokens": 20571,
    "fitsInContext": true
  },
  {
    "model": {
      "id": "gpt-5",
      "provider": "openai",
      "name": "GPT-5.5",
      "inputPerMillion": 5,
      "outputPerMillion": 30,
      "cacheReadMultiplier": 0.5,
      "cacheWriteMultiplier": 1,
      "contextWindow": 400000,
      "charsPerToken": 4,
      "notes": "Frontier OpenAI model."
    },
    "inputTokens": 18000,
    "cacheReadTokens": 0,
    "outputTokens": 1000,
    "onePassCost": 0.12,
    "synthesisCost": 0.12,
    "synthesisInputTokens": 18000,
    "fitsInContext": true
  },
  {
    "model": {
      "id": "gpt-5-mini",
      "provider": "openai",
      "name": "GPT-5.4 mini",
      "inputPerMillion": 0.75,
      "outputPerMillion": 4.5,
      "cacheReadMultiplier": 0.5,
      "cacheWriteMultiplier": 1,
      "contextWindow": 256000,
      "charsPerToken": 4,
      "notes": "Mid-tier OpenAI; cheaper for high-volume extraction."
    },
    "inputTokens": 18000,
    "cacheReadTokens": 0,
    "outputTokens": 1000,
    "onePassCost": 0.018,
    "synthesisCost": 0.018,
    "synthesisInputTokens": 18000,
    "fitsInContext": true
  },
  {
    "model": {
      "id": "gemini-3-5-flash",
      "provider": "google",
      "name": "Gemini 3.5 Flash",
      "inputPerMillion": 1.5,
      "outputPerMillion": 9,
      "cacheReadMultiplier": 0.25,
      "cacheWriteMultiplier": 1,
      "contextWindow": 1000000,
      "charsPerToken": 4,
      "notes": "Frontier agent-tier at Flash speed; output ~3.6x Gemini 2.5 Flash, not a budget model."
    },
    "inputTokens": 18000,
    "cacheReadTokens": 0,
    "outputTokens": 1000,
    "onePassCost": 0.036,
    "synthesisCost": 0.036,
    "synthesisInputTokens": 18000,
    "fitsInContext": true
  },
  {
    "model": {
      "id": "gemini-2-5-flash",
      "provider": "google",
      "name": "Gemini 2.5 Flash",
      "inputPerMillion": 0.3,
      "outputPerMillion": 2.5,
      "cacheReadMultiplier": 0.25,
      "cacheWriteMultiplier": 1,
      "contextWindow": 1000000,
      "charsPerToken": 4,
      "notes": "Fast mid-tier; 1M context."
    },
    "inputTokens": 18000,
    "cacheReadTokens": 0,
    "outputTokens": 1000,
    "onePassCost": 0.007899999999999999,
    "synthesisCost": 0.007899999999999999,
    "synthesisInputTokens": 18000,
    "fitsInContext": true
  },
  {
    "model": {
      "id": "gemini-2-5-flash-lite",
      "provider": "google",
      "name": "Gemini 2.5 Flash-Lite",
      "inputPerMillion": 0.1,
      "outputPerMillion": 0.4,
      "cacheReadMultiplier": 0.25,
      "cacheWriteMultiplier": 1,
      "contextWindow": 1000000,
      "charsPerToken": 4,
      "notes": "Cheapest tier; 1M context."
    },
    "inputTokens": 18000,
    "cacheReadTokens": 0,
    "outputTokens": 1000,
    "onePassCost": 0.0022,
    "synthesisCost": 0.0022,
    "synthesisInputTokens": 18000,
    "fitsInContext": true
  },
  {
    "model": {
      "id": "gemini-2-5-pro",
      "provider": "google",
      "name": "Gemini 2.5 Pro",
      "inputPerMillion": 1.25,
      "outputPerMillion": 10,
      "cacheReadMultiplier": 0.25,
      "cacheWriteMultiplier": 1,
      "contextWindow": 2000000,
      "charsPerToken": 4,
      "notes": "2M context; strong on long-document synthesis."
    },
    "inputTokens": 18000,
    "cacheReadTokens": 0,
    "outputTokens": 1000,
    "onePassCost": 0.0325,
    "synthesisCost": 0.0325,
    "synthesisInputTokens": 18000,
    "fitsInContext": true
  }
]

Frequently asked questions

What does the Financial Document Token Estimator methodology page document?
How the financial document token estimator converts filing text into per-model token counts, one-pass and peer-synthesis costs, and context-window fit. It states the formulas, assumptions, data sources, limitations, and reproducibility steps behind the Financial Document Token Estimator, in the Finance category.
When was the Financial Document Token Estimator methodology last reviewed?
This methodology was last reviewed on 2026-04-23. The matching tool is at https://aifinhub.io/financial-document-token-estimator/.
Are the Financial Document Token Estimator numbers reproducible?
Yes. This page embeds a worked example whose output is the verbatim result of running the shipped financial-document-token-estimator engine on a fixed input; the embedded JSON is recomputed and diffed against the engine in CI, so the numbers cannot drift from the code.

Methodology · Tool · Last updated 2026-04-23

How Financial Document Token Estimator works

How the Financial Document Token Estimator prices 10-K, 10-Q, 8-K, and earnings-call extractions across ten frontier LLMs.

What the tool computes

Given either a pasted filing body or a representative archetype, the tool estimates the input-token count for each of ten frontier models, the output-token cost for a planned extraction, and the total dollar cost both for a single one-pass run and for a synthesis against N peer filings. It also flags whether the resulting context fits inside each model’s published window.

Everything runs client-side in your browser. Nothing is uploaded. There is no API call, no server, no telemetry on your pasted text.

Char-per-token ratios

Tokenization is probabilistic and model-family specific. The tool uses published rough ratios per provider family to convert character length into a token estimate:

  • Anthropic (Claude): ~3.5 characters per token on English prose.
  • OpenAI (GPT family): ~4.0 characters per token on English prose (tiktoken baseline).
  • Google (Gemini): ~4.0 characters per token as a practical approximation.

These are averages. A 10-K heavy in tabular numeric data will typically tokenize at a lower char/token ratio (i.e., more tokens per character) because numbers and symbols fragment more aggressively. The calculator is deliberately a planning tool, not a precise counter.

Archetype assumptions

Archetype token counts are mid-range estimates sampled from public EDGAR filings and investor-relations transcripts. Actual lengths vary widely with issuer size and complexity; large-cap 10-Ks can easily exceed 40K tokens in the narrative sections alone.

Archetype~tokens~charsReasoning
10-K annual report (body)18,00072,000Business + MD&A + risk-factors prose, excluding exhibits.
10-Q quarterly report (body)9,00036,000MD&A + condensed statements body.
8-K current report2,50010,000Single-event disclosure; typically short.
Earnings call transcript35,000140,000Prepared remarks + Q&A for a large-cap quarterly call.

Cost formula

input_tokens    = chars / chars_per_token             (per provider)
cached_tokens   = input_tokens × cache_hit_rate
fresh_tokens    = input_tokens − cached_tokens

input_cost      = fresh_tokens  / 1e6 × input_rate
                + cached_tokens / 1e6 × input_rate × cache_read_multiplier
output_cost     = output_tokens / 1e6 × output_rate

one_pass_cost   = input_cost + output_cost
synthesis_cost  = cost_for(input_tokens × (1 + peers), output_tokens)
fits_in_context = (synthesis_input_tokens + output_tokens) ≤ context_window

Pricing rate table (2026-05-25, USD per 1M tokens)

ModelInputOutputCache read mult.Context
Claude Haiku 4.5$1$50.10×200K
Claude Sonnet 4.6$3$150.10×1M
Claude Opus 4.1 (retired, prior generation — $15/$75 reference)$15$750.10×200K
Claude Opus 4.7$5$250.10×1M
GPT-5.5$5$300.50×400K
GPT-5.4 mini$0.75$4.500.50×256K
Gemini 3.5 Flash$1.50$90.25×1M
Gemini 2.5 Flash$0.30$2.500.25×1M
Gemini 2.5 Flash-Lite$0.10$0.400.25×1M
Gemini 2.5 Pro$1.25$100.25×2M

Pricing sources

Limitations

  • Tokenization is probabilistic. Use tiktoken (OpenAI) or Anthropic’s count_tokens endpoint for audit-grade numbers.
  • Archetypes are representative mid-range samples; real filings from the same category span an order of magnitude.
  • Cache-read multipliers for OpenAI and Gemini are approximate — verify against the current published tier before material decisions.
  • Exhibits, tables, and image-based PDFs are not modelled. OCR’d tables in particular tokenize worse than narrative prose.
  • No batch-API discounts, no enterprise tier pricing, no multi-modal surcharges.
  • This is a planning tool, not investment advice, and not a substitute for the vendor’s official tokenizer or billing records.

Related articles

Changelog

  • 2026-04-23 — Initial release with 8 models and 4 archetypes.
  • 2026-05-25 — Added Gemini 3.5 Flash ($1.50/$9, a frontier agent-tier) and Gemini 2.5 Flash-Lite ($0.10/$0.40, the cheapest tier); table now tracks 10 models.
Planning estimates only — not financial, tax, or investment advice.