Methodology · Tool · Last updated 2026-04-23

How the Financial Document Token Estimator works

How the Financial Document Token Estimator prices 10-K, 10-Q, 8-K, and earnings-call extractions across eight frontier LLMs.

What the tool computes

Given either a pasted filing body or a representative archetype, the tool estimates the input-token count for each of eight frontier models, the output-token cost for a planned extraction, and the total dollar cost both for a single one-pass run and for a synthesis against N peer filings. It also flags whether the resulting context fits inside each model’s published window.

Everything runs client-side in your browser. Nothing is uploaded. There is no API call, no server, no telemetry on your pasted text.

Char-per-token ratios

Tokenization is model-family specific: each provider's tokenizer splits the same text differently, so exact counts cannot be read off character length alone. The tool uses published rough ratios per provider family to convert character length into a token estimate:

  • Anthropic (Claude): ~3.5 characters per token on English prose.
  • OpenAI (GPT family): ~4.0 characters per token on English prose (tiktoken baseline).
  • Google (Gemini): ~4.0 characters per token as a practical approximation.

These are averages. A 10-K heavy in tabular numeric data will typically tokenize at a lower char/token ratio (i.e., more tokens per character) because numbers and symbols fragment more aggressively. The calculator is deliberately a planning tool, not a precise counter.
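
For concreteness, a minimal sketch of the conversion in TypeScript (the function shape and names are illustrative, not the tool's actual source):

// Rough chars-per-token ratios per provider family (from the list above).
type Provider = "anthropic" | "openai" | "google";

const CHARS_PER_TOKEN: Record<Provider, number> = {
  anthropic: 3.5, // Claude: English prose
  openai: 4.0,    // GPT family: tiktoken baseline
  google: 4.0,    // Gemini: practical approximation
};

// Convert character length into a planning-grade token estimate.
function estimateInputTokens(chars: number, provider: Provider): number {
  return Math.ceil(chars / CHARS_PER_TOKEN[provider]);
}

// estimateInputTokens(72_000, "anthropic") → 20572 (≈ the 10-K archetype on Claude)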

Archetype assumptions

Archetype token counts are mid-range estimates sampled from public EDGAR filings and investor-relations transcripts. Actual lengths vary widely with issuer size and complexity; large-cap 10-Ks can easily exceed 40K tokens in the narrative sections alone.

Archetype                      ~tokens    ~chars     Reasoning
10-K annual report (body)       18,000     72,000    Business + MD&A + risk-factors prose, excluding exhibits.
10-Q quarterly report (body)     9,000     36,000    MD&A + condensed statements body.
8-K current report               2,500     10,000    Single-event disclosure; typically short.
Earnings call transcript        35,000    140,000    Prepared remarks + Q&A for a large-cap quarterly call.
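
Expressed as data, the archetypes reduce to a lookup table of character counts (the key names are this sketch's, not the tool's); the ~tokens column above is simply chars divided by the 4.0 baseline ratio.

// Archetype character counts from the table above. Token estimates are
// derived per provider via estimateInputTokens, not stored separately.
const ARCHETYPE_CHARS = {
  "10-K": 72_000,           // annual report body
  "10-Q": 36_000,           // quarterly report body
  "8-K": 10_000,            // single-event current report
  "earnings-call": 140_000, // large-cap quarterly transcript
} as const;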

Cost formula

input_tokens    = chars / chars_per_token             (per provider)
cached_tokens   = input_tokens × cache_hit_rate
fresh_tokens    = input_tokens − cached_tokens

input_cost      = fresh_tokens  / 1e6 × input_rate
                + cached_tokens / 1e6 × input_rate × cache_read_multiplier
output_cost     = output_tokens / 1e6 × output_rate

one_pass_cost   = input_cost + output_cost
synthesis_cost  = cost_for(input_tokens × (1 + peers), output_tokens)
fits_in_context = (synthesis_input_tokens + output_tokens) ≤ context_window
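
The same formula transcribed into TypeScript, assuming a per-model rate record shaped like the pricing table below (field names are this sketch's own, and applying the cache hit rate to the scaled synthesis input is this sketch's assumption):

interface ModelRates {
  inputRate: number;           // USD per 1M input tokens
  outputRate: number;          // USD per 1M output tokens
  cacheReadMultiplier: number; // e.g. 0.10 for Claude models
  contextWindow: number;       // tokens
}

function onePassCost(
  inputTokens: number,
  outputTokens: number,
  cacheHitRate: number, // 0..1
  rates: ModelRates,
): number {
  const cachedTokens = inputTokens * cacheHitRate;
  const freshTokens = inputTokens - cachedTokens;
  const inputCost =
    (freshTokens / 1e6) * rates.inputRate +
    (cachedTokens / 1e6) * rates.inputRate * rates.cacheReadMultiplier;
  const outputCost = (outputTokens / 1e6) * rates.outputRate;
  return inputCost + outputCost;
}

// Synthesis against N peer filings scales the input side only.
function synthesisCost(
  inputTokens: number,
  outputTokens: number,
  peers: number,
  cacheHitRate: number,
  rates: ModelRates,
): number {
  return onePassCost(inputTokens * (1 + peers), outputTokens, cacheHitRate, rates);
}

function fitsInContext(
  synthesisInputTokens: number,
  outputTokens: number,
  rates: ModelRates,
): boolean {
  return synthesisInputTokens + outputTokens <= rates.contextWindow;
}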

Pricing rate table (2026-04-23, USD per 1M tokens)

Model                Input    Output    Cache read mult.    Context
Claude Haiku 4.5     $1       $5        0.10×               200K
Claude Sonnet 4.6    $3       $15       0.10×               500K
Claude Opus 4.6      $15      $75      0.10×               500K
Claude Opus 4.7      $15      $75      0.10×               1M
GPT-5                $10      $40      0.50×               400K
GPT-5 mini           $2       $8       0.50×               256K
Gemini 2.5 Flash     $0.30    $2.50    0.25×               1M
Gemini 2.5 Pro       $1.25    $10      0.25×               2M
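
As data, each pricing row becomes one rate record (three shown; the model keys are illustrative labels, not API identifiers):

const RATES: Record<string, ModelRates> = {
  "claude-sonnet-4.6": { inputRate: 3.0,  outputRate: 15.0, cacheReadMultiplier: 0.10, contextWindow: 500_000 },
  "gpt-5":             { inputRate: 10.0, outputRate: 40.0, cacheReadMultiplier: 0.50, contextWindow: 400_000 },
  "gemini-2.5-pro":    { inputRate: 1.25, outputRate: 10.0, cacheReadMultiplier: 0.25, contextWindow: 2_000_000 },
  // ...the remaining five models follow the same shape.
};

// Worked example: 10-K archetype on Claude Sonnet 4.6, 2,000 output tokens,
// no cache hits: input ≈ 20,572 tokens → $0.0617; output → $0.0300;
// one-pass total ≈ $0.09.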

Limitations

  • Token counts here are ratio-based estimates, not tokenizer output. Use tiktoken (OpenAI) or Anthropic’s count_tokens endpoint for audit-grade numbers; see the sketch after this list.
  • Archetypes are representative mid-range samples; real filings from the same category span an order of magnitude.
  • Cache-read multipliers for OpenAI and Gemini are approximate — verify against the current published tier before material decisions.
  • Exhibits, tables, and image-based PDFs are not modelled. OCR’d tables in particular tokenize worse than narrative prose.
  • No batch-API discounts, no enterprise tier pricing, no multi-modal surcharges.
  • This is a planning tool, not investment advice, and not a substitute for the vendor’s official tokenizer or billing records.
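
For the audit-grade path in the first bullet, a minimal sketch of the Anthropic side using the official TypeScript SDK; the model string is a placeholder, and the call should be verified against current SDK docs before relying on it (the tiktoken path is the analogous route for OpenAI models).

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// Count tokens with the same tokenizer the billing meter uses,
// instead of approximating with chars / 3.5.
async function exactClaudeTokens(filingBody: string): Promise<number> {
  const result = await client.messages.countTokens({
    model: "claude-sonnet-4-5", // placeholder; use the model you plan to run
    messages: [{ role: "user", content: filingBody }],
  });
  return result.input_tokens;
}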

Changelog

  • 2026-04-23 — Initial release with 8 models and 4 archetypes.

Planning estimates only; not financial, tax, or investment advice.