SEC Filing Chunk Optimizer
Chunk-size and cost calculator for chunking SEC filings such as the 10-K. Pick a document archetype, chunk size, overlap, chunking strategy, and embedding model. Runs entirely in the browser; free to use.
- Inputs: Configuration
- Runtime: Instant
- Privacy: Client-side · no upload
- API key: Not required
1 · Configure chunk strategy
| Metric | Value |
|---|---|
| Total chunks | 138 |
| Avg tokens/chunk | 1,021 (min 1,024 · max 1,024) |
| Embedding model | text-embedding-3-small |
| Ingest cost (once) | $0.002818 |
| Query cost (100 query embeddings) | $0.000080 |
| Tokens embedded | 140,898 |
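The headline figures can be cross-checked by hand. Dividing tokens embedded by chunk count gives the average chunk size, and both costs imply a rate of $0.02 per million tokens for text-embedding-3-small (a rate derived from the displayed numbers, not quoted from a price sheet). The query line uses the 40-tokens-per-query assumption from the estimate formula.

$$
\begin{aligned}
\frac{140{,}898}{138} &= 1{,}021 \text{ tokens/chunk} \\
\text{ingest cost} &= \frac{140{,}898}{10^6} \times \$0.02 \approx \$0.002818 \\
\text{query cost} &= \frac{40 \times 100}{10^6} \times \$0.02 = \$0.000080
\end{aligned}
$$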
2 · Results
Strategy note (structural): splits on Item boundaries, section headers, and speaker turns, and keeps each table block together with its heading. Chunk sizes are uneven, but each chunk is semantically clean.
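A minimal sketch of the structural idea, assuming filings have already been converted to text with tables as pipe-delimited rows and blank lines between blocks. It starts a new chunk at every 10-K Item header, glues each table to the block just before it (its heading or caption), and never splits a block, which is where the uneven chunk sizes come from. The header regex, `max_words`, and word counts as a stand-in for tokens are all assumptions for illustration, not this tool's implementation.

```python
import re

# Assumed header shape for 10-K Items ("Item 1A.", "ITEM 7."); real filings vary.
ITEM_HEADER = re.compile(r"^item\s+\d+[a-c]?\.", re.IGNORECASE)


def to_blocks(text: str) -> list[str]:
    """Split on blank lines, then glue each table block to the block before it."""
    raw = [b.strip() for b in text.split("\n\n") if b.strip()]
    blocks: list[str] = []
    for block in raw:
        # Assumption: tables arrive as pipe-delimited rows (e.g. HTML converted to markdown).
        if block.startswith("|") and blocks:
            blocks[-1] = blocks[-1] + "\n\n" + block  # heading + table stay together
        else:
            blocks.append(block)
    return blocks


def structural_chunks(text: str, max_words: int = 700) -> list[str]:
    """Pack blocks into chunks, always starting a new chunk at an Item header.

    Word count stands in for tokens. A block larger than max_words is kept
    whole rather than split, so chunk sizes come out uneven.
    """
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    for block in to_blocks(text):
        words = len(block.split())
        starts_item = ITEM_HEADER.match(block) is not None
        if current and (starts_item or size + words > max_words):
            chunks.append("\n\n".join(current))
            current, size = [], 0
        current.append(block)
        size += words
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Even with a tiny `max_words`, a table stays attached to its heading; the price is a chunk far larger than the target size.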
No structural warnings at these settings. Still, run a retrieval eval before production: heuristics can't replace ground truth.
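One way to act on that advice is to label a handful of questions with the chunk IDs that actually answer them, then measure how often retrieval surfaces those chunks. A minimal sketch of recall@k, assuming string chunk IDs and a ranked result list per query; `recall_at_k` and `mean_recall_at_k` are illustrative names, not part of this tool.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of ground-truth relevant chunk IDs found in the top-k results."""
    if not relevant:
        raise ValueError("each query needs at least one relevant chunk")
    hits = set(retrieved[:k]) & relevant  # set() so duplicate hits count once
    return len(hits) / len(relevant)


def mean_recall_at_k(runs: list[tuple[list[str], set[str]]], k: int) -> float:
    """Average recall@k over (ranked results, relevant IDs) pairs, one per query."""
    scores = [recall_at_k(retrieved, relevant, k) for retrieved, relevant in runs]
    return sum(scores) / len(scores)
```

Running the same labelled questions against chunks from each strategy gives a like-for-like comparison that the structural-warning heuristics cannot.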
Archetype reference: Form 10-K (business, risk factors, MD&A, financials), ~12 Items, with dense tables in Items 7 and 8.
3 · Compare strategies (same archetype + chunk size)
| Strategy | Chunks | Avg tokens | Min / max tokens | Ingest cost | Tradeoff |
|---|---|---|---|---|---|
| structural (selected) | 138 | 1,021 | 1,024 / 1,024 | $0.002818 | Highest fidelity; uneven chunk sizes. |
| recursive | 138 | 1,021 | 614 / 1,024 | $0.002818 | Cheap and deterministic; blind to tables. |
| semantic | 132 | 1,061 | 409 / 1,433 | $0.002801 | Coherent prose groups; variable sizes, higher compute. |
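Recursive splitting, in the sense popularized by LangChain's RecursiveCharacterTextSplitter, tries the coarsest separator first and falls back to finer ones only for pieces that are still too long. A minimal sketch, assuming word counts as a stand-in for tokens and a separator list chosen for illustration; it is not this tool's code. It shows concretely why the strategy is "blind to tables": nothing stops a table's heading and its rows from landing in different chunks.

```python
def recursive_split(
    text: str,
    max_words: int,
    separators: tuple[str, ...] = ("\n\n", "\n", " "),
) -> list[str]:
    """Split on the coarsest separator first; recurse into pieces still too long.

    Word count stands in for tokens (an assumption; a real splitter would use
    the embedding model's tokenizer).
    """
    if len(text.split()) <= max_words:
        return [text] if text.strip() else []
    if not separators:
        # Last resort: hard cut into fixed word windows.
        words = text.split()
        return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]
    sep, finer = separators[0], separators[1:]
    chunks: list[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = current + sep + piece if current else piece
        if len(candidate.split()) <= max_words:
            current = candidate  # keep merging small pieces up to the limit
            continue
        if current.strip():
            chunks.append(current)
        if len(piece.split()) > max_words:
            # This piece alone is too long: retry it with finer separators.
            chunks.extend(recursive_split(piece, max_words, finer))
            current = ""
        else:
            current = piece
    if current.strip():
        chunks.append(current)
    return chunks


print(recursive_split("Revenue table\n| A | 10 |\n| B | 20 |", max_words=5))
# ['Revenue table', '| A | 10 |', '| B | 20 |']
```

The output is deterministic and cheap to compute, but the heading "Revenue table" ends up alone, separated from the rows it describes.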
How the estimate works
- stride = chunk_size × (1 − overlap_pct)
- base_count = ceil(total_tokens / stride)
- chunk count by strategy:
  - structural → max(boundary_count, base_count)
  - recursive → base_count
  - semantic → ceil(base_count × 0.95), with wider size variance
- ingest_cost = (tokens_embedded / 1,000,000) × $/M_tokens
- query_cost = (40 × n_queries / 1,000,000) × $/M_tokens, assuming about 40 tokens per query
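The estimate formulas translate directly into a few lines of code. A minimal sketch: the 0.95 semantic factor and 40-token query size come from the formulas above. The inputs in the usage lines (140,898 tokens, 1,024-token chunks, zero overlap, $0.02 per million tokens) are assumptions chosen because they reproduce the figures shown earlier on this page; the tool does not display its overlap setting. Function names are illustrative.

```python
import math

QUERY_TOKENS = 40        # tokens per query, as in the query_cost formula
SEMANTIC_FACTOR = 0.95   # semantic chunking yields ~5% fewer chunks


def chunk_count(strategy: str, total_tokens: int, chunk_size: int,
                overlap_pct: float, boundary_count: int = 0) -> int:
    """Estimated chunk count for one strategy, using the stride formula."""
    if not 0 <= overlap_pct < 1:
        raise ValueError("overlap_pct must be in [0, 1)")
    stride = chunk_size * (1 - overlap_pct)
    base_count = math.ceil(total_tokens / stride)
    if strategy == "structural":
        # Never fewer chunks than detected section boundaries.
        return max(boundary_count, base_count)
    if strategy == "recursive":
        return base_count
    if strategy == "semantic":
        return math.ceil(base_count * SEMANTIC_FACTOR)
    raise ValueError(f"unknown strategy: {strategy}")


def embedding_costs(tokens_embedded: int, n_queries: int,
                    usd_per_million: float) -> tuple[float, float]:
    """Return (ingest_cost, query_cost) in USD; prices are per million tokens."""
    ingest = tokens_embedded / 1_000_000 * usd_per_million
    query = QUERY_TOKENS * n_queries / 1_000_000 * usd_per_million
    return ingest, query


print(chunk_count("recursive", 140_898, 1024, 0.0))  # 138
print(chunk_count("semantic", 140_898, 1024, 0.0))   # 132
ingest, query = embedding_costs(140_898, 100, 0.02)
print(f"{ingest:.6f} {query:.6f}")                   # 0.002818 0.000080
```

The division by 1,000,000 matters: embedding prices are quoted per million tokens, so token counts must be converted to millions before multiplying by the price.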
Pricing verified 2026-04-23. See methodology for archetype sources and the table-splitting pitfall.
Complementary tools
- Financial Document Token Estimator: paste a 10-K, 10-Q, 8-K, or earnings transcript to see its token count and one-pass extraction cost across eight frontier LLMs, with a cache-hit toggle and a context-window fit check.
- Structured Schema Validator for Finance: paste LLM JSON output and validate it against four pre-built finance schemas (research output, trade decision, risk snapshot, peer comparison), with sanity checks on units and GAAP basis.
- Hallucination Detector: paste a source document and an LLM's extraction; every numeric claim in the output is checked against the source. Runs client-side and catches silent fabrication before it reaches your pipeline.