The short answer

For SEC filing extraction in 2026, the pick scales by filing size and accuracy need. Gemini 2.5 Flash-Lite ($0.10/$0.40 per Mtok, 1M context) wins high-volume tagging, Claude Haiku 4.5 wins a re-queried cohort via prompt caching, Sonnet 4.6 and GPT-5.4 mini handle dense tables, and Opus 4.7 or GPT-5.5 are for synthesis, not extraction.

For extracting structured data from SEC filings in 2026, the right LLM API depends on filing size and how strict your accuracy needs are. Gemini 2.5 Flash-Lite and Gemini 2.5 Flash win the high-volume extraction tier on published price and 1M-context headroom (a full 10-K fits in one call). Claude Haiku 4.5 is the cheap-and-accurate-enough pick when you want Anthropic prompt caching across a filing cohort. Claude Sonnet 4.6 and GPT-5.4 mini are the step-up when a filing's tables and footnotes need a stronger reader. Claude Opus 4.7 and GPT-5.5 are reserved for synthesis on top of extraction, not the extraction pass itself. Size your exact pick in the Model Selector for Finance and price the loop in the Token-Cost Optimizer.

Why "best for SEC filings" is its own question

A 10-K runs 50,000 to 150,000 tokens; a 10-Q is smaller but still routinely clears 20,000. So the model that wins a generic "best LLM API" roundup is rarely the model that wins SEC extraction, because three filing-specific constraints dominate:

  • Context window. If the filing does not fit in one call you are chunking, and chunking a financial statement across calls is where numeric errors enter (a figure split from its row header, a footnote split from its table). A 1M-context model reads a full 10-K in one pass.
  • Cost per filing, not per call. EDGAR has millions of filings. The rate that matters is the all-in cost of reading one filing end to end, including the long input every call carries.
  • Numeric fidelity. Extraction is a transcription task with a correctness floor: a transposed digit in a revenue line is worse than a vague summary. The accuracy bar is real-money, not stylistic.

This roundup ranks the major API models against those three axes using published vendor pricing and documented context windows only. No accuracy benchmark numbers are asserted here; where a model reads better, that is reasoned from tier positioning, not from a test we ran.

The headline table

All per-1M-token rates are vendor list prices, verified 2026-05-25 on the pages cited in Sources. Context windows are the vendor-documented standard-tier figures.

| Model | Input /1M | Output /1M | Context | Prompt caching | Notes for extraction |
|---|---|---|---|---|---|
| Gemini 2.5 Flash-Lite | $0.10 | $0.40 | 1M | n/a in this tool | Cheapest published rate; built for highest-volume extraction |
| Gemini 2.5 Flash | $0.30 | $2.50 | 1M | n/a in this tool | Volume extraction with more headroom; 10-K fits in one call |
| Claude Haiku 4.5 | $1.00 | $5.00 | 200K | yes (cache-read ~90% off) | Cheap Anthropic tier; caching shines on a filing cohort |
| GPT-5.4 mini | $0.75 | $4.50 | 256K | yes (cached input ~10% of rate) | Stronger reader than the cheap tier; 10-Q-sized contexts |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 1M | yes (cache-read ~90% off) | Step-up reader for dense tables and footnotes |
| GPT-5.5 | $5.00 | $30.00 | 400K | yes | Synthesis tier, overkill for plain extraction |
| Gemini 3.5 Flash | $1.50 | $9.00 | 1M | yes | Google's current frontier (launched May 19); agent-tier reasoning at Flash latency, overkill for plain extraction |
| Claude Opus 4.7 | $5.00 | $25.00 | 1M | yes | Flagship; reserve for reasoning over extracted data |

Rates verified 2026-05-25; verify on the vendor page before committing a budget. Gemini 2.5 Flash-Lite input is the cheapest published rate in this set.
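To make "cost per filing, not per call" concrete, here is a minimal sketch of the arithmetic using the list prices from the table. The workload (a 100,000-token 10-K producing 2,000 tokens of extracted output) and the function name `cost_per_filing` are illustrative assumptions, not measurements or part of any tool on this page.

```python
# Vendor list prices in USD per 1M tokens (input, output), copied from the table.
RATES = {
    "Gemini 2.5 Flash-Lite": (0.10, 0.40),
    "Gemini 2.5 Flash": (0.30, 2.50),
    "Claude Haiku 4.5": (1.00, 5.00),
    "GPT-5.4 mini": (0.75, 4.50),
    "Claude Sonnet 4.6": (3.00, 15.00),
}


def cost_per_filing(model, input_tokens, output_tokens):
    """List-price cost in USD of reading one filing in a single call."""
    in_rate, out_rate = RATES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


# Assumed workload: 100,000 input tokens, 2,000 output tokens per filing.
for model in RATES:
    cost = cost_per_filing(model, 100_000, 2_000)
    print(f"{model}: ${cost:.4f} per filing, ${cost * 10_000:,.2f} per 10,000 filings")
```

Under those assumptions the input side dominates the bill, which is why the input rate and the window size matter more for extraction than the output rate does.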

Who wins for which extraction job

Job: high-volume tagging across thousands of filings

Gemini 2.5 Flash-Lite at $0.10 input / $0.40 output, 1M context. When you are pulling a fixed set of fields (revenue, segment breakdowns, risk-factor headings) across a large filing universe, the cheapest published rate with a window big enough to swallow a full 10-K is the right default. Step up to Gemini 2.5 Flash ($0.30 / $2.50) when the cheaper tier's reading misses dense tables.

Job: a filing cohort you re-query repeatedly

Claude Haiku 4.5 with prompt caching. Anthropic's cache-read rate is roughly 90% off cached input, so when many extraction prompts share a long stable preamble (the schema, the instructions, the same filing read multiple ways), the effective input cost falls hard. The Token-Cost Optimizer prices this directly: set a cache-hit rate and it applies the cache-read rate to the cached fraction of input.
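The cache arithmetic described above can be sketched as follows. This is an illustration of the stated rule (cache-read rate applied to the cached fraction of input), not the Token-Cost Optimizer's actual code. The 0.10 multiplier is an assumption derived from "roughly 90% off", the function name is hypothetical, and the sketch ignores any surcharge a vendor applies when the cache is first written.

```python
def effective_input_cost(input_tokens, rate_per_mtok, cache_hit_rate,
                         cache_read_multiplier=0.10):
    """Input cost in USD when a fraction of the input is served from cache.

    cache_read_multiplier=0.10 is an assumption standing in for "~90% off".
    Cache-write surcharges, if any, are deliberately left out.
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    cached = input_tokens * cache_hit_rate
    fresh = input_tokens - cached
    cost = fresh * rate_per_mtok + cached * rate_per_mtok * cache_read_multiplier
    return cost / 1_000_000


# Claude Haiku 4.5 at $1.00 per 1M input tokens, 100,000-token prompt (assumed size).
uncached = effective_input_cost(100_000, 1.00, cache_hit_rate=0.0)
mostly_cached = effective_input_cost(100_000, 1.00, cache_hit_rate=0.8)
print(f"no cache: ${uncached:.3f}, 80% cache hits: ${mostly_cached:.3f}")
```

With an 80% hit rate the input cost of each call in this example falls from $0.100 to $0.028, which is the effect that makes a re-queried cohort cheap.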

Job: dense financial statements and footnotes that the cheap tier misreads

Claude Sonnet 4.6 ($3 / $15, 1M context) or GPT-5.4 mini ($0.75 / $4.50, 256K). When extraction quality on multi-column tables and cross-referenced footnotes matters more than per-filing cost, the mid tier earns its rate. Sonnet's 1M window reads a long 10-K in one call; GPT-5.4 mini is cheaper but its 256K window may force a split on the largest filings.
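Whether a split is forced is a simple budget check, sketched below. The window sizes treat "256K" and "1M" as 256,000 and 1,000,000 tokens, and the prompt and output reserves are illustrative assumptions; how a vendor counts output against the window varies, so reserving room for it here is the conservative choice. The 300,000-token case is a hypothetical oversized filing, not a measured one.

```python
# Context windows from the table, with K read as 1,000 tokens (an assumption).
CONTEXT_WINDOWS = {
    "Claude Sonnet 4.6": 1_000_000,
    "GPT-5.4 mini": 256_000,
}


def fits_in_one_call(model, filing_tokens, prompt_tokens=2_000, output_reserve=4_000):
    """True if the filing, the instructions, and room for output fit the window."""
    needed = filing_tokens + prompt_tokens + output_reserve
    return needed <= CONTEXT_WINDOWS[model]


# 150,000 tokens is the top of the 10-K range cited in this article;
# 300,000 is a hypothetical filing bundled with long exhibits.
for model in CONTEXT_WINDOWS:
    for filing_tokens in (150_000, 300_000):
        verdict = "one call" if fits_in_one_call(model, filing_tokens) else "needs a split"
        print(f"{model}, {filing_tokens:,} tokens: {verdict}")
```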

Job: reasoning on top of the extracted numbers

Claude Opus 4.7 or GPT-5.5. These are not extraction models; they are what you run after extraction, to compare filings, rank risk factors, or synthesize a memo. Using the flagship for plain transcription burns money for no benefit.

The chunking trap

The single most expensive mistake in SEC extraction is splitting a filing across calls when you did not have to. A 1M-context model reads a full 10-K in one pass, which removes the failure mode where a number is separated from its header or a footnote from its table. If your filing genuinely exceeds the window, the chunking decision (overlap, structural vs fixed boundaries) becomes load-bearing for numeric accuracy. See SEC Chunking Overlap Tradeoff and Structural vs Fixed Chunking for the decision, and the SEC Filing Chunk Optimizer to size chunks for a target model window.
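For the case where a split is unavoidable, here is a minimal sketch of fixed-boundary chunking with overlap. It is the simplest of the strategies mentioned, not a recommendation over structural chunking, and it is not the SEC Filing Chunk Optimizer's method. The function name is hypothetical, and the "tokens" are any pre-tokenized sequence; real token counts depend on the model's tokenizer.

```python
def chunk_with_overlap(tokens, chunk_size, overlap):
    """Split a token sequence into fixed-size chunks that share `overlap` tokens.

    The overlap repeats the tail of each chunk at the head of the next, so a
    figure near a boundary appears together with some of its preceding context.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last chunk already reaches the end of the filing
    return chunks


# Toy example: ten "tokens", chunks of four, one token of overlap.
print(chunk_with_overlap(list(range(10)), chunk_size=4, overlap=1))
# [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```

Fixed boundaries can still cut a table in half; the overlap only softens that. Structural chunking, which splits on section or table boundaries, targets the same failure more directly.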

Where the filings come from

None of this matters without the filings. The SEC EDGAR full-text search API is free, requires no API key, and limits each caller to 10 requests per second in total; a descriptive User-Agent header with your contact details is mandatory or requests are refused (verified 2026-05-25). Build your ingestion against that limit, then feed the documents to whichever model wins your job above. See SEC EDGAR API 2026 for the ingestion specifics.
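Building against the limit mostly means spacing requests and always sending the header. The sketch below shows one way to do that with the Python standard library, under stated assumptions: the `Throttle` class and `build_request` helper are hypothetical names, the User-Agent string is a placeholder you must replace with your own contact details, and nothing here sends a request until you pass the result to `urllib.request.urlopen`.

```python
import time
import urllib.request


class Throttle:
    """Spaces calls at least 1/max_per_second seconds apart."""

    def __init__(self, max_per_second=10, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = 0.0

    def wait(self):
        """Block until the next request is allowed, then reserve the slot."""
        now = self.clock()
        if now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = max(self.clock(), self.next_allowed)
        self.next_allowed = now + self.min_interval


def build_request(url, user_agent):
    """Build a GET request carrying the descriptive User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Usage sketch (placeholder contact details; no request is sent here):
throttle = Throttle(max_per_second=10)
throttle.wait()
request = build_request(
    "https://www.sec.gov/about/developer-resources",
    "Example Research admin@example.com",
)
# response = urllib.request.urlopen(request)  # uncomment to actually fetch
```

Injecting `clock` and `sleep` is a design choice for testability: the spacing logic can be checked with a fake clock instead of real waits.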

Pick it with the tool

The table is a starting point; your real pick depends on filing size, monthly budget, and how strict your accuracy floor is. The Model Selector for Finance takes those inputs (task, latency, cost budget, context need, quality sensitivity) and returns a ranked list with a monthly budget estimate per model. The verified output block at the foot of this page is computed live from the shipped engine for the extraction scenario described below.

Sources

  • Anthropic. Claude API pricing (Opus 4.7 $5/$25, Sonnet 4.6 $3/$15, Haiku 4.5 $1/$5 per 1M tokens; 1M context on Opus/Sonnet). https://www.anthropic.com/pricing (accessed 2026-05-25).
  • OpenAI. API pricing (GPT-5.5 $5/$30, GPT-5.4 mini $0.75/$4.50 per 1M tokens; cached input ~10% of standard rate). https://openai.com/api/pricing/ (accessed 2026-05-25).
  • Google. Gemini API pricing (Gemini 3.5 Flash $1.50/$9, Gemini 2.5 Flash $0.30/$2.50, Gemini 2.5 Flash-Lite $0.10/$0.40, Gemini 2.5 Pro $1.25/$10 per 1M tokens, context-tiered above 200K). https://ai.google.dev/gemini-api/docs/pricing (accessed 2026-05-25).
  • SEC. EDGAR Application Programming Interfaces and Developer Resources (10 requests/second per user; User-Agent header required; no API key). https://www.sec.gov/about/developer-resources (accessed 2026-05-25).
  • Model Selector for Finance model table (this hub), rates verified 2026-05-25: /model-selector-finance/.

Editorial independence

AI Fin Hub Research maintains editorial independence across sponsor relationships. Vendor placements in tools and comparators are not altered by sponsor payments. Disclosures at /sponsor-disclosure/.

Verified engine output

SEC extraction: extract task, sub-30s latency, $50/mo budget, 200K-1M context, high quality sensitivity
Inputs
task: extract
latency: sub_30s
cost: b50
context: k200_1m
quality: high
Result
ranked (10 items)[...]

Computed live at build time.

Frequently asked questions

What is the best LLM API for extracting data from SEC 10-K filings in 2026?
For high-volume structured extraction, Gemini 2.5 Flash-Lite ($0.10 input / $0.40 output per 1M tokens, 1M context) reads a full 10-K in one call at the cheapest published rate in this set. Step up to Gemini 2.5 Flash or Claude Sonnet 4.6 when dense tables and footnotes need a stronger reader. Rates verified 2026-05-25.
Why not just use the most capable model like Claude Opus 4.7 or GPT-5.5 for SEC extraction?
Extraction is a transcription task, not a reasoning task. Flagship models charge $5 per 1M input tokens to produce output that a cheaper model can transcribe correctly. Reserve Opus 4.7 and GPT-5.5 for the synthesis pass that runs on top of the extracted numbers.
Does the context window matter for SEC filings?
Yes, more than for most tasks. A 10-K runs 50,000 to 150,000 tokens. A 1M-context model (Gemini 2.5 Flash, Claude Opus 4.7) reads it in one call; a 256K model may force a split, and chunking a financial statement is where numeric extraction errors enter.
How do I get the SEC filings themselves?
The SEC EDGAR full-text search API is free with no API key, limits each caller to 10 requests per second in total, and requires a User-Agent header with your contact details (verified 2026-05-25).