Methodology · Financial Document Token Estimator

What the tool computes

Given either a pasted filing body or a representative archetype, the tool estimates the input-token count for each of eight frontier models, the output-token cost for a planned extraction, and the total dollar cost for both a single-pass run on one filing and a synthesis run that adds N peer filings to the same context. It also flags whether the resulting context fits inside each model’s published window.

Everything runs client-side in your browser. Nothing is uploaded. There is no API call, no server, no telemetry on your pasted text.

Char-per-token ratios

Tokenization is model-family specific, so any count derived from character length is an approximation. The tool uses rough published ratios per provider family to convert character length into a token estimate:

Anthropic (Claude): ~3.5 characters per token on English prose.
OpenAI (GPT family): ~4.0 characters per token on English prose (tiktoken baseline).
Google (Gemini): ~4.0 characters per token as a practical approximation.

These are averages. A 10-K heavy in tabular numeric data will typically tokenize at a lower char/token ratio (i.e., more tokens per character) because numbers and symbols fragment more aggressively. The calculator is deliberately a planning tool, not a precise counter.
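As a minimal sketch of the conversion described above, the ratios can be applied as follows. The function name `estimate_tokens`, the provider keys, and the choice to round up are assumptions made for illustration; they are not taken from the tool's source.

```python
import math

# Rough characters-per-token ratios from the list above (English prose).
CHARS_PER_TOKEN = {
    "anthropic": 3.5,
    "openai": 4.0,
    "google": 4.0,
}

def estimate_tokens(text: str, provider: str) -> int:
    """Estimate input tokens from character length.

    Rounding up is an assumption; the tool may round differently.
    """
    return math.ceil(len(text) / CHARS_PER_TOKEN[provider])

sample = "x" * 72_000  # stand-in for a 72,000-character filing body
print(estimate_tokens(sample, "openai"))     # 18000
print(estimate_tokens(sample, "anthropic"))  # 20572
```

The same text yields a higher estimate for the Claude family because its ratio is lower (fewer characters per token).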

Archetype assumptions

Archetype token counts are mid-range estimates sampled from public EDGAR filings and investor-relations transcripts. The ~chars column is the ~tokens column multiplied by 4.0 characters per token. Actual lengths vary widely with issuer size and complexity; large-cap 10-Ks can easily exceed 40K tokens in the narrative sections alone.

Archetype	~tokens	~chars	Reasoning
10-K annual report (body)	18,000	72,000	Business + MD&A + risk-factors prose, excluding exhibits.
10-Q quarterly report (body)	9,000	36,000	MD&A + condensed statements body.
8-K current report	2,500	10,000	Single-event disclosure; typically short.
Earnings call transcript	35,000	140,000	Prepared remarks + Q&A for a large-cap quarterly call.

Cost formula

input_tokens    = chars / chars_per_token             (per provider)
cached_tokens   = input_tokens × cache_hit_rate
fresh_tokens    = input_tokens − cached_tokens

input_cost      = fresh_tokens  / 1e6 × input_rate
                + cached_tokens / 1e6 × input_rate × cache_read_multiplier
output_cost     = output_tokens / 1e6 × output_rate

one_pass_cost   = input_cost + output_cost
synthesis_input_tokens = input_tokens × (1 + peers)
synthesis_cost  = cost_for(synthesis_input_tokens, output_tokens)
fits_in_context = (synthesis_input_tokens + output_tokens) ≤ context_window
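The formula above can be sketched in Python as follows. This is an illustrative transcription, not the tool's actual code: the keyword-argument signature of `cost_for` is an assumption, and the inputs in the worked example (18,000 input tokens, 2,000 output tokens, a 50% cache-hit rate, 4 peers, $3 / $15 rates with a 0.10× cache multiplier and a 500,000-token window) are chosen only to make the arithmetic easy to follow.

```python
def cost_for(input_tokens, output_tokens, *, input_rate, output_rate,
             cache_hit_rate=0.0, cache_read_multiplier=1.0):
    """Dollar cost of one call; rates are USD per 1M tokens."""
    cached_tokens = input_tokens * cache_hit_rate
    fresh_tokens = input_tokens - cached_tokens
    input_cost = (fresh_tokens / 1e6 * input_rate
                  + cached_tokens / 1e6 * input_rate * cache_read_multiplier)
    output_cost = output_tokens / 1e6 * output_rate
    return input_cost + output_cost

# Illustrative inputs (assumptions, see the lead-in).
rates = dict(input_rate=3.0, output_rate=15.0,
             cache_hit_rate=0.5, cache_read_multiplier=0.10)
input_tokens, output_tokens, peers = 18_000, 2_000, 4
context_window = 500_000

one_pass_cost = cost_for(input_tokens, output_tokens, **rates)

# Synthesis puts the filing plus its peers into one context.
synthesis_input_tokens = input_tokens * (1 + peers)
synthesis_cost = cost_for(synthesis_input_tokens, output_tokens, **rates)
fits_in_context = synthesis_input_tokens + output_tokens <= context_window

print(round(one_pass_cost, 4))   # 0.0597
print(round(synthesis_cost, 4))  # 0.1785
print(fits_in_context)           # True
```

In this example half of the input is billed at one tenth of the input rate, so the one-pass input cost drops from $0.054 to $0.0297, while output cost is unaffected by caching.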

Pricing rate table (2026-04-23, USD per 1M tokens)

Model	Input	Output	Cache read mult.	Context (tokens)
Claude Haiku 4.5	$1	$5	0.10×	200K
Claude Sonnet 4.6	$3	$15	0.10×	500K
Claude Opus 4.6	$15	$75	0.10×	500K
Claude Opus 4.7	$15	$75	0.10×	1M
GPT-5	$10	$40	0.50×	400K
GPT-5 mini	$2	$8	0.50×	256K
Gemini 2.5 Flash	$0.30	$2.50	0.25×	1M
Gemini 2.5 Pro	$1.25	$10	0.25×	2M
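To illustrate how the context column feeds the fit check, the sketch below applies the window sizes from the table to a 140,000-character transcript synthesized against 5 peers. It is a hedged example rather than the tool's implementation: the `models_that_fit` helper, the provider keys, the reading of "K" and "M" as decimal thousands and millions, and the 2,000-token output budget are all assumptions.

```python
# (model, provider, context window in tokens), copied from the table above.
# Assumption: 200K means 200,000 and 1M means 1,000,000.
MODELS = [
    ("Claude Haiku 4.5", "anthropic", 200_000),
    ("Claude Sonnet 4.6", "anthropic", 500_000),
    ("Claude Opus 4.6", "anthropic", 500_000),
    ("Claude Opus 4.7", "anthropic", 1_000_000),
    ("GPT-5", "openai", 400_000),
    ("GPT-5 mini", "openai", 256_000),
    ("Gemini 2.5 Flash", "google", 1_000_000),
    ("Gemini 2.5 Pro", "google", 2_000_000),
]

# Rough characters-per-token ratios per provider family.
CHARS_PER_TOKEN = {"anthropic": 3.5, "openai": 4.0, "google": 4.0}

def models_that_fit(chars, peers, output_tokens):
    """Return names of models whose window holds the synthesis context."""
    fitting = []
    for name, provider, window in MODELS:
        input_tokens = chars / CHARS_PER_TOKEN[provider]
        synthesis_input_tokens = input_tokens * (1 + peers)
        if synthesis_input_tokens + output_tokens <= window:
            fitting.append(name)
    return fitting

fitting = models_that_fit(chars=140_000, peers=5, output_tokens=2_000)
too_small = [name for name, _, _ in MODELS if name not in fitting]
print(too_small)  # ['Claude Haiku 4.5']
```

Under these assumptions the Claude estimate is 40,000 tokens per transcript, so six transcripts plus output come to 242,000 tokens, which exceeds a 200,000-token window; every other model in the table has room.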

Pricing sources

Limitations

Token counts here are ratio-based approximations, not tokenizer output. Use tiktoken (OpenAI) or Anthropic’s count_tokens endpoint for audit-grade numbers.
Archetypes are representative mid-range samples; real filings from the same category span an order of magnitude.
Cache-read multipliers for OpenAI and Gemini are approximate; verify them against the currently published rates before making material decisions.
Exhibits, tables, and image-based PDFs are not modelled. OCR’d tables in particular tokenize worse than narrative prose.
Batch-API discounts, enterprise-tier pricing, and multi-modal surcharges are not modelled.
This is a planning tool, not investment advice, and not a substitute for the vendor’s official tokenizer or billing records.

Prompt-caching economics for finance LLM pipelines — why cache-hit rate dominates long-context cost.
Token-cost reality for LLM trading research — how input size and retries compound into real monthly bills.
Reading financial filings with LLMs in 2026 — extraction patterns, context-window trade-offs, and where caching pays back.

Changelog

2026-04-23 — Initial release with 8 models and 4 archetypes.

How Financial Document Token Estimator works