How to Use the SEC Filing Chunk Optimizer
Pick a filing archetype, tune chunk size and overlap, and the page reports chunk count, embedding cost, and structural-boundary warnings across three chunking strategies so your retrieval pipeline doesn't split tables across chunks.
What It Does
Use the calculator with intent
Built for RAG pipeline builders who already know that splitting a table across two chunks breaks retrieval and who want to size chunk parameters before re-indexing.
Interpreting Results
Check boundary warnings first: chunks that split tables or cross section boundaries cause retrieval misses. Total chunk count drives embedding cost; structural-aware chunking trades a higher chunk count for fewer warnings.
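The relationship between chunk size, overlap, chunk count, and embedding cost can be sketched as below. This is a rough illustration of the arithmetic, not the page's actual implementation: the function names are made up for this example, and the price per million tokens is a parameter you supply from your embedding provider's current pricing.

```python
import math


def chunk_count(total_tokens: int, chunk_size: int, overlap: int) -> int:
    """Fixed-size chunks, each starting (chunk_size - overlap) tokens after the last."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    if total_tokens <= 0:
        return 0
    if total_tokens <= chunk_size:
        return 1
    stride = chunk_size - overlap
    return 1 + math.ceil((total_tokens - chunk_size) / stride)


def embedded_tokens(total_tokens: int, chunk_size: int, overlap: int) -> int:
    """Tokens actually sent to the embedder: every chunk after the first repeats `overlap` tokens."""
    n = chunk_count(total_tokens, chunk_size, overlap)
    return 0 if n == 0 else total_tokens + (n - 1) * overlap


def embedding_cost(tokens: int, usd_per_million_tokens: float) -> float:
    """Cost in USD; the price is an input, not a value taken from this page."""
    return tokens / 1_000_000 * usd_per_million_tokens


# Example: a 100,000-token filing, 1500-token chunks, 200-token overlap.
n = chunk_count(100_000, 1500, 200)         # 77 chunks
sent = embedded_tokens(100_000, 1500, 200)  # 115,200 tokens embedded
```

Raising the overlap increases both the chunk count and the number of tokens embedded, which is why the two are worth tuning together.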
Input Steps
Field by field
1. Upload data: Upload the filing (10-K, 10-Q, or 8-K) or paste its SEC EDGAR URL.
2. Pick option: Pick a chunk strategy: retrieval (4K tokens, for embedding search), summarization (16K tokens, for long-context models), or structured (XBRL fields extracted separately).
3. Run calculation: Run the optimizer. It splits on Item boundaries and MD&A sub-sections, producing coherent chunks rather than arbitrary character cuts.
4. Download: Download the chunks as JSON. Each chunk includes its section path, character offsets, and token count.
5. Pair with hierarchical retrieval: For Q&A use cases, pair the chunks with hierarchical retrieval (filing → item → paragraph), which outperforms flat retrieval on filing-Q&A benchmarks.
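To make the splitting and download steps concrete, here is a minimal sketch of structure-aware splitting that emits JSON records with a section path, character offsets, and a token count. It is an assumption-laden illustration, not the optimizer's own code: the `ITEM_HEADING` pattern, the field names, and the whitespace-based token estimate are all hypothetical, and MD&A sub-sections are not handled.

```python
import json
import re

# Hypothetical pattern: a line that begins with "Item 1.", "Item 1A.", "ITEM 7.", etc.
ITEM_HEADING = re.compile(r"^Item\s+\d+[A-Z]?\.", re.MULTILINE | re.IGNORECASE)


def split_on_items(text: str) -> list[dict]:
    """Cut the filing only at Item headings and describe each resulting chunk."""
    starts = [m.start() for m in ITEM_HEADING.finditer(text)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any text before the first Item heading
    bounds = starts + [len(text)]

    chunks = []
    for start, end in zip(bounds, bounds[1:]):
        body = text[start:end]
        if not body.strip():
            continue
        heading = ITEM_HEADING.match(body)
        chunks.append({
            "section_path": heading.group(0).rstrip(".") if heading else "Preamble",
            "char_start": start,
            "char_end": end,
            # Whitespace word count as a rough stand-in for a real tokenizer.
            "token_count": len(body.split()),
        })
    return chunks


sample = (
    "Cover page\n"
    "Item 1. Business\nWe sell widgets.\n"
    "Item 1A. Risk Factors\nDemand may fall.\n"
)
print(json.dumps(split_on_items(sample), indent=2))
```

Because each record carries character offsets, the original text of a chunk can be recovered with `sample[rec["char_start"]:rec["char_end"]]`, which is convenient when a retrieval layer needs to show the source passage.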
Common Scenarios
Use realistic starting points
10-K with many tables
Archetype: 10-K
Chunk size: 1500 tokens
Fixed-size chunking generates many table-split warnings; structural-aware chunking reduces them at the cost of a higher chunk count and higher embedding cost.
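One way to think about a table-split warning is to ask whether any single chunk contains the whole table. The sketch below assumes you already have table positions as token-offset spans, which is a hypothetical input for this example; the function names are likewise illustrative and not taken from the tool.

```python
def fixed_chunks(total_tokens: int, chunk_size: int, overlap: int) -> list[tuple[int, int]]:
    """Return (start, end) token spans for fixed-size chunking with overlap."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    stride = chunk_size - overlap
    spans = []
    start = 0
    while start < total_tokens:
        end = min(start + chunk_size, total_tokens)
        spans.append((start, end))
        if end == total_tokens:
            break
        start += stride
    return spans


def split_tables(chunks, tables):
    """Tables that no single chunk fully contains, i.e. candidates for a warning."""
    return [
        (t_start, t_end)
        for t_start, t_end in tables
        if not any(c_start <= t_start and t_end <= c_end for c_start, c_end in chunks)
    ]


# Hypothetical table positions in a 5000-token filing.
tables = [(1400, 1600), (2000, 2200), (2500, 4200)]
warnings = split_tables(fixed_chunks(5000, 1500, 0), tables)
```

A table longer than the chunk size, like the 1700-token span above, is flagged under any fixed-size setting, which is the case where a structure-aware strategy is the only way to keep it whole.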
Earnings transcript (Q&A heavy)
Archetype: earnings
Chunk size: 2000 tokens
Q&A segments are natural chunk boundaries; the structural strategy produces clean chunks aligned with speaker turns.
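Speaker-aligned chunking can be sketched as splitting the transcript wherever a new speaker label starts a line. The `Name:` label format and the `SPEAKER_LINE` pattern are assumptions made for this example; real transcripts vary, and any line that merely starts with a capitalized word and a colon would be misread as a speaker.

```python
import re

# Assumed label format: a capitalized name at the start of a line, followed by a colon.
SPEAKER_LINE = re.compile(r"^([A-Z][\w .'-]{0,60}):\s*", re.MULTILINE)


def split_by_speaker(transcript: str) -> list[dict]:
    """Return one record per speaker turn, in order of appearance."""
    matches = list(SPEAKER_LINE.finditer(transcript))
    turns = []
    for i, m in enumerate(matches):
        # A turn runs until the next speaker label or the end of the transcript.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(transcript)
        turns.append({
            "speaker": m.group(1).strip(),
            "text": transcript[m.end():end].strip(),
        })
    return turns


sample = (
    "Analyst: What drove margin?\n"
    "CFO: Mostly pricing.\n"
    "Analyst: Any guidance update?\n"
    "CFO: Not today.\n"
)
turns = split_by_speaker(sample)
```

Adjacent question and answer turns can then be merged into one chunk as long as the pair stays under the chosen chunk size.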
Try These Tools
Run the numbers next
Financial Document Token Estimator
Paste a 10-K, 10-Q, 8-K or earnings transcript and see token count + one-pass extraction cost across eight frontier LLMs, with cache-hit toggle.
Earnings-Call Summarization Cost Calculator
LLM cost per stock per quarter to summarize earnings transcripts across Sonnet, Opus, GPT-4o, Gemini 2.5 Pro/Flash. Cache-hit-rate aware. Snapshot pricing.
Related Content