The short answer
For SEC filing extraction in 2026, DeepSeek V4-Flash is dramatically cheaper at about $0.14/$0.28 per million input/output tokens with a 1M-token context window, versus Gemini 3.5 Flash at $1.50/$9 — roughly 10x input and 30x output. On raw extraction cost DeepSeek wins decisively; pay the Gemini premium for ecosystem, reliability, and integration.
For SEC filing extraction at scale in 2026, DeepSeek V4 vs Gemini 3.5 Flash is a price-versus-ecosystem call. DeepSeek V4-Flash is dramatically cheaper — about $0.14 per million input and $0.28 per million output tokens — and both V4 variants carry a 1M-token context window that swallows a full 10-K. Gemini 3.5 Flash costs $1.50 per million input and $9 per million output, roughly 10x more on input and 30x more on output, but brings Google's tooling, reliability, and broad integration. On raw extraction cost DeepSeek wins decisively; pay the Gemini premium for ecosystem and operational maturity. Price your own filing volume in the Token-Cost Optimizer.
TL;DR
| Model | Input ($/1M) | Output ($/1M) | Context window |
|---|---|---|---|
| DeepSeek V4-Flash | $0.14 | $0.28 | 1M tokens |
| DeepSeek V4-Pro | $0.435* | $0.87* | 1M tokens |
| Gemini 3.5 Flash | $1.50 | $9.00 | large (Google docs) |
*V4-Pro reflects a 75% promotional discount ending 2026/05/31 15:59 UTC, after which list price is 1/4 of the original. Prices verified against official pages 2026-05-26.
Why this pairing matters for filings
SEC filing extraction is a long-context, high-volume, structured-output job: feed a 10-K or 10-Q, pull out specific line items, sections, or facts, and do it across thousands of filings. Two costs dominate — the input tokens of long documents and the output tokens of structured results — so per-token price and context window decide the bill. Both contenders are built for exactly this: DeepSeek V4's 1M-token window holds an entire filing without chunking, and Gemini 3.5 Flash is Google's frontier-intelligence-at-Flash-speed tier aimed at agentic workloads.
DeepSeek V4: the cost leader
DeepSeek V4-Flash lists at about $0.14 per million input tokens and $0.28 per million output tokens, with a 1M-token context window and up to 384K tokens of output per request. V4-Pro, the reasoning-heavier variant, lists at $0.435 input and $0.87 output during a 75% promotion ending 2026/05/31; afterward it settles at one quarter of its original reference price. Cache-hit input is reduced to roughly one tenth of the cache-miss rate, which matters when you reuse a long system prompt across many filings.
For pure extraction throughput, this is the cheapest credible option here by a wide margin. The 1M context means a full 10-K rarely needs chunking, removing a class of overlap and stitching bugs. The tradeoffs are ecosystem and operational: fewer native integrations than Google's stack, and a promotional-pricing structure you should model at post-promo rates so a future price step does not surprise your budget.
Gemini 3.5 Flash: the ecosystem premium
Gemini 3.5 Flash lists at $1.50 per million input tokens and $9 per million output tokens — frontier-tier intelligence at Flash speed, but not budget pricing. That is roughly 10x DeepSeek V4-Flash on input and about 30x on output. The output multiple is the one that bites on extraction, because structured pulls can be output-heavy.
What the premium buys is the Google ecosystem: mature SDKs, reliability, broad platform integration, and a model family you can scale up or down within. For a production pipeline where operational maturity and vendor stability outweigh raw token cost, that premium is often defensible. For a cost-sensitive batch job over thousands of filings, it is hard to justify against DeepSeek.
The decision
- Minimize cost per filing, comfortable operating a leaner ecosystem: DeepSeek V4-Flash.
- Need a reasoning-heavier extractor, still cost-conscious: DeepSeek V4-Pro (model post-promo pricing).
- Production pipeline where ecosystem, reliability, and integration outweigh token cost: Gemini 3.5 Flash.
- Output-heavy structured extraction: the output-price gap ($0.28 vs $9.00) favors DeepSeek sharply.
A common pattern is to extract with DeepSeek V4 for cost, then spot-check a sample with a higher-tier model to validate field accuracy before trusting the pipeline at scale.
Validate accuracy, not just price
Cheapest tokens do not mean correct extractions. Numeric precision and field grounding fail silently on filings, so a cost win means nothing if the numbers are wrong. Run a labeled sample through whichever model you pick and check field-level accuracy; the Token-Cost Optimizer prices your exact filing volume and prompt shape so the cost comparison reflects your workload, not a headline rate.
Related in this series
- Gemini 3.5 Flash vs GPT-5.5 vs Opus 4.7 Finance Extraction 2026: the three-way frontier comparison.
- Cheapest LLM for SEC 10-K Extraction Across 10,000 Filings 2026: the cost model at batch scale.
- Best LLM APIs for SEC Filing Extraction 2026: the full model field for filings work.
Connects to
- Token-Cost Optimizer: price your filing volume across models.
- Hallucination Detector: catch ungrounded numbers in extracted fields.
- SEC Filing Chunk Optimizer: chunking strategy when context limits bite.
Sources
- DeepSeek API Models & Pricing, api-docs.deepseek.com/quick_start/pricing (accessed 2026-05-26).
- Google AI for Developers, Gemini 3.5 Flash pricing (accessed 2026-05-26).
Frequently asked questions
- Is DeepSeek V4 cheaper than Gemini 3.5 Flash for SEC extraction?
- Yes, substantially: V4-Flash is about $0.14/$0.28 per million input/output against Gemini 3.5 Flash's $1.50/$9, roughly 10x cheaper input and 30x output. For output-heavy structured extraction across thousands of filings that gap dominates the bill. The Gemini premium buys ecosystem maturity, not lower token cost, so cost-sensitive batch work favors DeepSeek (verified 2026-05-26).
- Does DeepSeek V4 fit a full 10-K in context?
- Yes. Both V4 variants carry a 1M-token window with up to 384K tokens of output per request, comfortably holding a full 10-K without chunking, which removes a class of overlap and stitching bugs. Gemini 3.5 Flash also offers a large window per Google's docs. For long single-document extraction both avoid forced chunking; the differentiator is per-token price.
- What is the catch with DeepSeek V4-Pro pricing?
- The low headline rate reflects a temporary 75% promotion. V4-Pro lists at $0.435/$0.87 during the discount, which ends 2026/05/31 at 15:59 UTC, after which it settles at one quarter of the original reference rate. Budget at the post-promotion price so the scheduled step-up does not surprise you. V4-Flash ($0.14/$0.28) is the steadier cost-leader for high volume.