For a full-body 10-K with 1024-token chunks at 10% overlap on the OpenAI text-embedding-3-small model, the SEC Filing Chunk Optimizer returns: structural strategy = 131 chunks, 1017 avg tokens, 133,227 total ingested, $0.00266 one-pass embed cost. The same configuration with recursive (table-blind) chunking returns the same 131 chunks and same 133,227 total but with min token = 614 (some chunks under-fill) and a warning that recursive splitters are table-blind. Switching to a 512-token chunk size doubles the chunk count to 261 and nudges the ingestion cost to $0.00267. Switching the embedding model to voyage-finance-2 holds chunks at 131 but raises one-pass cost 6× to $0.01598. Three distinct decisions, three distinct cost-vs-quality curves.
TL;DR
Six engine runs (four on the 10-K body, plus one MD&A-only run and one earnings call), three trade-off axes:
| Configuration | Chunks | Tokens ingested | Embed cost (one pass) | Notes |
|---|---|---|---|---|
| 10-K body, 1024 tok, 10% overlap, structural, OpenAI 3-small | 131 | 133,227 | $0.00266 | Clean table boundaries |
| 10-K body, 1024 tok, 10% overlap, recursive, OpenAI 3-small | 131 | 133,227 | $0.00266 | Table-blind warning |
| 10-K body, 512 tok, 10% overlap, structural, OpenAI 3-small | 261 | 133,371 | $0.00267 | 2× chunk count, ~same tokens |
| 10-K body, 1024 tok, 10% overlap, structural, voyage-finance-2 | 131 | 133,227 | $0.01598 | 6× the embedding cost |
| MD&A only, 1024 tok, 10% overlap, structural, OpenAI 3-small | 28 | 27,748 | $0.00055 | ~4.8× cheaper than full body |
| Earnings call, 1024 tok, 10% overlap, structural, OpenAI 3-small | 30 | 16,950 | $0.00034 | Many short chunks (warning) |
Three orthogonal decisions drive the cost curve: chunk size (1024 vs 512), chunking strategy (structural vs recursive), and embedding model (OpenAI 3-small vs voyage-finance-2). Halving chunk size barely moves cost (the overlap is the only added token volume); switching embed model raises cost 6×; switching strategy at fixed size and model doesn't change cost but does change retrieval quality on table-heavy filings.
Why structural and recursive cost the same here
The engine reports identical chunkCount (131) and tokensIngested (133,227) for structural and recursive strategies at this configuration. That is the right answer at the ingestion-cost level: both strategies process the same source text and produce the same total token volume. The difference is where the boundaries fall, not how many boundaries exist.
The recursive strategy's warning is the load-bearing output: "Recursive splitters are table-blind. This archetype is table-heavy, expect numeric rows to be severed from their headers." The structural strategy's notes confirm the alternative: "Respects Items / section headers / speaker turns. Preserves table blocks by keeping heading+table together."
For retrieval quality on a 10-K body where MD&A and Notes contain table-dense numerics, this difference is the entire ball-game. A retrieval that returns a chunk containing "Revenue: 1,850" without the column header is unusable for a downstream LLM extraction. The structural chunker preserves the header-table coupling; the recursive chunker does not.
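The header-table coupling can be illustrated with a minimal sketch. It assumes a toy filing represented as a list of lines and a hypothetical `is_heading` rule; it is not the engine's chunker, only an illustration of keeping a heading and the table rows beneath it in one block.

```python
from typing import Callable, List


def split_on_headings(
    lines: List[str], is_heading: Callable[[str], bool]
) -> List[List[str]]:
    """Group lines into blocks that each begin at a heading line."""
    blocks: List[List[str]] = []
    current: List[str] = []
    for line in lines:
        # A new heading closes the previous block, so table rows always
        # stay in the same block as the heading above them.
        if is_heading(line) and current:
            blocks.append(current)
            current = []
        current.append(line)
    if current:
        blocks.append(current)
    return blocks


# Toy input for illustration only; not taken from a real filing.
toy_lines = [
    "Item 7. MD&A",
    "| Segment | Revenue |",
    "| Widgets | 1,850 |",
    "Item 8. Financial Statements",
    "| Asset | Value |",
]

blocks = split_on_headings(toy_lines, lambda s: s.startswith("Item "))
for block in blocks:
    print(block)
```

In this toy case the row containing 1,850 lands in the same block as its column header, which is the property a size-only splitter does not guarantee.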
The engine cannot measure retrieval quality directly; it measures cost and boundaries. The decision is to pay the same ingestion cost and pick the strategy that preserves semantic coupling. For 10-K bodies and MD&A sections (table-heavy archetypes), structural is the unambiguous default.
Where recursive wins: earnings-call transcripts
The earnings-call archetype run shows structural's failure mode: 30 chunks with avgTokens = 565 and minTokens = 186. The warning reads: "Earnings calls have many short speaker turns. Structural chunking here yields lots of tiny chunks, consider recursive." The structural strategy's "respect speaker turns" rule produces under-filled chunks because Q&A turns are often short.
On an earnings call, a chunk of 186 tokens (a single 30-second analyst question) is too small to retrieve meaningfully. The recursive strategy would merge adjacent turns to fill toward the 1024 target, which is the right behaviour for this archetype. The engine flags the configuration mismatch in the warnings array.
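The merging behaviour described above can be sketched as a greedy packing of speaker turns. This is an assumption-laden illustration, not the engine's recursive splitter: `merge_turns` is a hypothetical helper, the turn lengths are made up, and a single turn longer than the target is left whole rather than split.

```python
from typing import List


def merge_turns(turn_tokens: List[int], target: int = 1024) -> List[int]:
    """Greedily pack consecutive speaker turns into chunks of at most `target` tokens.

    Returns the token count of each resulting chunk. A single turn that
    exceeds `target` is emitted as its own oversized chunk.
    """
    chunks: List[int] = []
    current = 0
    for tokens in turn_tokens:
        # Close the current chunk if adding this turn would overshoot the target.
        if current and current + tokens > target:
            chunks.append(current)
            current = 0
        current += tokens
    if current:
        chunks.append(current)
    return chunks


# Hypothetical turn lengths in tokens, starting with a short 186-token question.
toy_turns = [186, 300, 400, 120, 250, 500]
print(merge_turns(toy_turns))
```

With these toy lengths, six short turns collapse into two chunks, neither of which is as small as the 186-token turn on its own.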
The strategic implication: structural wins on documents with stable, semantically-load-bearing boundaries (10-K Items, MD&A subsections, footnotes). Recursive wins on documents with noisy or unstable boundaries (earnings-call turns, news articles, blog posts). The engine flags both directions of failure mode.
The chunk-size axis: 1024 vs 512
At 1024-token chunks the 10-K body produces 131 chunks; at 512 it produces 261. The ingestion cost barely moves, from $0.00266 to $0.00267, even though the chunk count doubled. The reason is that the 10% overlap is relative to chunk size, not a fixed token count: each boundary duplicates roughly 102 tokens at 1024 and roughly 51 at 512, so twice as many boundaries each carry half as much overlap, and total ingested tokens stay nearly flat (133,227 vs 133,371).
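A plain sliding-window model makes the arithmetic concrete. The sketch below is an assumption, not the engine's algorithm: it takes the 120,000-token source size from the engine's archetype output, treats overlap as a fraction of chunk size, and counts windows. It reproduces the reported chunk counts (131, 261, and 118 at 0% overlap), but its token totals are only approximate; the engine reports 133,227 and 133,371.

```python
from typing import Tuple


def sliding_window_estimate(
    source_tokens: int, chunk_size: int, overlap_pct: float
) -> Tuple[int, int, int]:
    """Estimate (chunk_count, overlap_tokens_per_boundary, tokens_ingested).

    Assumes fixed-size windows whose overlap is a fraction of chunk_size.
    """
    overlap = int(chunk_size * overlap_pct)   # tokens shared by adjacent chunks
    stride = chunk_size - overlap             # new tokens contributed per chunk
    # Ceiling division: number of windows needed to cover the source.
    chunks = -(-(source_tokens - overlap) // stride)
    # Every boundary between adjacent chunks re-ingests `overlap` tokens.
    ingested = source_tokens + (chunks - 1) * overlap
    return chunks, overlap, ingested


for size in (1024, 512):
    chunks, overlap, ingested = sliding_window_estimate(120_000, size, 0.10)
    print(f"chunk_size={size}: {chunks} chunks, {overlap} overlap tokens/boundary, ~{ingested} ingested")
```

Under this model the duplicated volume is the same at both chunk sizes, which is why the embedding cost is nearly insensitive to halving the chunk size.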
The retrieval implications, in a dense-passage-retrieval setup [1], run in opposite directions:
- Smaller chunks → finer-grained retrieval. A query for a specific accounting policy is more likely to retrieve a chunk containing only that policy at 512 tokens; at 1024 the policy may be embedded in a larger Item discussion.
- Smaller chunks → less context per chunk. The downstream LLM extracting from a retrieved chunk has less surrounding context. For an MD&A discussion that spans multiple paragraphs, the 1024-chunk preserves the discussion flow; the 512-chunk fragments it.
The trade-off is task-dependent. For high-precision fact extraction (specific numbers, specific policies), 512 is the right default. For interpretive summarization (what is management saying about supply chain risk), 1024 is the right default. The engine does not adjudicate; the buyer's prompt pattern does.
The Lewis et al. RAG paper sets up this trade-off explicitly, and production folk wisdom converges on 1024-token chunks for most retrieval-augmented LLM workflows [2].
The embedding-model axis: OpenAI 3-small vs voyage-finance-2
OpenAI text-embedding-3-small lists at $0.02/Mtok; voyage-finance-2 lists at $0.12/Mtok, a 6× price differential the engine surfaces directly: $0.00266 vs $0.01598 for the same chunking output. The question is whether the 6× cost buys 6× retrieval quality.
For generic finance content the answer is usually "no, the cost is in the noise of the whole workflow." For a 1,000-filing pipeline embedded once and queried 100k times, the embedding cost is $2.66 vs $15.99, both negligible compared to the LLM-call cost at retrieval-augmented-generation time ($50–500/month at scale).
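The cost figures follow from tokens ingested times the list price per million tokens. A minimal sketch, assuming the list prices quoted above and, for the 1,000-filing scaling, that every filing ingests the same 133,227 tokens as the 10-K body run; `embed_cost_usd` is an illustrative helper, not part of the engine.

```python
def embed_cost_usd(tokens_ingested: int, usd_per_mtok: float) -> float:
    """One-pass embedding cost: tokens divided by one million, times the list price."""
    return tokens_ingested / 1_000_000 * usd_per_mtok


TOKENS_10K_BODY = 133_227  # tokens ingested for the 1024-token structural run
LIST_PRICES = {            # USD per million tokens, as quoted in the text
    "text-embedding-3-small": 0.02,
    "voyage-finance-2": 0.12,
}

for model, price in LIST_PRICES.items():
    per_filing = embed_cost_usd(TOKENS_10K_BODY, price)
    # Scaling assumption: 1,000 filings of identical size, each embedded once.
    print(f"{model}: ${per_filing:.8f} per filing, ${per_filing * 1000:.2f} per 1,000 filings")
```

The ratio between the two results is exactly the price ratio, since chunking output and token volume are unchanged by the choice of embedding model.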
For domain-specialized retrieval (specific financial terminology, IFRS vs US-GAAP nuance, regulatory disclosure idiom), voyage-finance-2 was trained on a finance corpus and reports stronger retrieval scores on finance-specific benchmarks per its documentation. The 6× cost differential is real but the absolute number is small; the retrieval-quality differential is real and the downstream impact is large.
The defensible procedure is to run a 50-query retrieval eval on both embedding models with the buyer's own queries and pick the winner. The cost differential at retail volumes does not justify skipping the eval.
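One way to score such an eval is hit rate at k: the share of queries whose known-relevant chunk appears in the top k results. The sketch below is a hypothetical harness, not a vendor API. It assumes you have already embedded the queries and chunks with each model and labelled one relevant chunk index per query; the toy 2-dimensional vectors stand in for real embeddings.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hit_rate_at_k(
    query_vecs: List[Sequence[float]],
    chunk_vecs: List[Sequence[float]],
    relevant_ids: List[int],
    k: int = 5,
) -> float:
    """Fraction of queries whose labelled relevant chunk ranks in the top k."""
    hits = 0
    for query, relevant in zip(query_vecs, relevant_ids):
        ranked = sorted(
            range(len(chunk_vecs)),
            key=lambda i: cosine(query, chunk_vecs[i]),
            reverse=True,
        )
        if relevant in ranked[:k]:
            hits += 1
    return hits / len(query_vecs)


# Toy vectors for illustration; real runs would use each model's embeddings.
toy_chunks = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_queries = [[0.9, 0.1], [0.1, 0.9]]
toy_relevant = [0, 1]
print(hit_rate_at_k(toy_queries, toy_chunks, toy_relevant, k=1))
```

Running the same labelled query set through both models and comparing the two hit rates gives a like-for-like basis for the choice.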
The overlap-percentage axis
The canonical run uses 10% overlap. Sweeping overlap:
- 0% overlap: 118 chunks, 120,006 tokens ingested (no boundary duplication).
- 10% overlap (canonical): 131 chunks, 133,227 tokens ingested — about 11% more tokens than the 0% case.
- 20% overlap: more chunks again and proportionally more total tokens.
- 30% overlap: more still, scaling roughly with the overlap fraction.
The overlap-cost penalty is modest in absolute terms — 10% overlap adds about 11% more ingested tokens than the 0% case (120,006 → 133,227), which on this embedding model is a fraction of a cent. The retrieval benefit of overlap is non-trivial: a query that lands on a chunk boundary gets context from both sides. 10% is the conventional default; 0% is acceptable for archetypes with strong boundary alignment to typical queries.
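The overhead can be checked directly from the two reported totals. A small sketch using only figures quoted in this section (the 0% and 10% token totals and the $0.02 per million token list price); the helper name is illustrative.

```python
def overlap_overhead(base_tokens: int, overlapped_tokens: int) -> float:
    """Relative increase in ingested tokens caused by overlap."""
    return (overlapped_tokens - base_tokens) / base_tokens


TOKENS_0_PCT = 120_006    # reported total at 0% overlap
TOKENS_10_PCT = 133_227   # reported total at 10% overlap
USD_PER_MTOK = 0.02       # text-embedding-3-small list price

overhead = overlap_overhead(TOKENS_0_PCT, TOKENS_10_PCT)
extra_tokens = TOKENS_10_PCT - TOKENS_0_PCT
extra_cost = extra_tokens / 1_000_000 * USD_PER_MTOK

print(f"overhead: {overhead:.1%}, extra tokens: {extra_tokens}, extra cost: ${extra_cost:.8f}")
```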
Connects to
- Reading Financial Filings with LLMs 2026 — the broader workflow this chunking sits in.
- Finetune vs RAG vs Long-Context for Filings — the sibling decision on retrieval architecture.
- RAG Cost Model vs Fine-Tuning — total-cost-of-ownership across the retrieval choices.
- SEC Filing Chunk Optimizer — engine endpoint.
- Financial Document Token Estimator — companion for downstream LLM cost.
- Token Cost Optimizer — broader cost-per-validated-trade frame.
References
- OpenAI. "Embeddings." platform.openai.com/docs/guides/embeddings, accessed 2026-05-21. Pricing and model documentation for text-embedding-3-small.
- Voyage AI. "voyage-finance-2 model documentation." docs.voyageai.com, accessed 2026-05-21. Domain-specialized finance embeddings.
- SEC. "Financial Reporting Manual." sec.gov. Reference for 10-K Item structure. https://www.sec.gov/divisions/corpfin/cffinancialreportingmanual.shtml
Footnotes
1. Karpukhin, V., Oğuz, B., Min, S., et al. (2020). "Dense Passage Retrieval for Open-Domain Question Answering." EMNLP 2020. https://arxiv.org/abs/2004.04906
2. Lewis, P., Perez, E., Piktus, A., Karpukhin, V., et al. (2020). "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks." NeurIPS 2020. https://arxiv.org/abs/2005.11401
Verified engine output
| archetype_id | 10k-body |
|---|---|
| chunk_size | 512 |
| overlap_pct | 0.1 |
| strategy | structural |
| embedding_model_id | openai-embed-3-small |
| query_reembed_count | 100 |

| archetype › id | 10k-body |
|---|---|
| archetype › name | 10-K (full body) |
| archetype › total tokens | 120000 |
| archetype › structural boundaries | 12 |
| archetype › table heavy | true |
| archetype › notes | Form 10-K business + risk + MD&A + financials. ~12 Items. Dense tables in Item 7 / 8. |
| embedding › id | openai-embed-3-small |
| embedding › name | text-embedding-3-small |
| embedding › vendor | OpenAI |
| embedding › usd per mtokens | 0.02 |
| embedding › dim | 1536 |
| embedding › source | https://openai.com/api/pricing/ |
| strategy | structural |
| chunk count | 261 |
| avg tokens | 511 |
| min tokens | 512 |
| max tokens | 512 |
| tokens ingested | 133371 |
| embedding cost once | 0.0026674199999999998 |
| embedding cost per100 queries | 0.00008 |
| strategy notes | Respects Items / section headers / speaker turns. Preserves table blocks by keeping heading+table together. Chunk sizes are uneven but semantically clean. |
Computed live at build time.
| archetype_id | 10k-body |
|---|---|
| chunk_size | 1024 |
| overlap_pct | 0.1 |
| strategy | structural |
| embedding_model_id | voyage-finance-2 |
| query_reembed_count | 100 |

| archetype › id | 10k-body |
|---|---|
| archetype › name | 10-K (full body) |
| archetype › total tokens | 120000 |
| archetype › structural boundaries | 12 |
| archetype › table heavy | true |
| archetype › notes | Form 10-K business + risk + MD&A + financials. ~12 Items. Dense tables in Item 7 / 8. |
| embedding › id | voyage-finance-2 |
| embedding › name | voyage-finance-2 |
| embedding › vendor | Voyage AI |
| embedding › usd per mtokens | 0.12 |
| embedding › dim | 1024 |
| embedding › source | https://docs.voyageai.com/docs/pricing |
| strategy | structural |
| chunk count | 131 |
| avg tokens | 1017 |
| min tokens | 1024 |
| max tokens | 1024 |
| tokens ingested | 133227 |
| embedding cost once | 0.01598724 |
| embedding cost per100 queries | 0.00048 |
| strategy notes | Respects Items / section headers / speaker turns. Preserves table blocks by keeping heading+table together. Chunk sizes are uneven but semantically clean. |
Computed live at build time.
Frequently asked questions
- Why do structural and recursive cost the same on a 10-K body?
- Both process the same source text and produce the same total token volume. The cost is identical; the retrieval-quality difference shows in the engine's warnings — pick structural for table-heavy archetypes.
- Should I always pick 1024-token chunks?
- No. 512 is better for high-precision fact extraction; 1024 is better for interpretive summarization. Pick based on the downstream prompt pattern, not on cost (the one-pass embedding cost differs by about 0.1%, $0.00266 vs $0.00267).
- Is voyage-finance-2 worth 6× the cost?
- Yes for domain-specialized finance retrieval; no for generic workflows. The absolute cost differential at retail volumes is small, so a retrieval eval on your own queries is a better basis for the choice than picking blindly on price.
- What overlap percentage should I use?
- 10% is the conventional default. Higher overlap helps when queries land on chunk boundaries; the token penalty at 10% is about 11% over the 0% case, a fraction of a cent in absolute embedding cost, so the default is safe.
- Why does earnings-call structural produce a warning?
- Speaker-turn boundaries produce many short chunks (minTokens = 186) too small for retrieval. Switch to recursive for earnings calls; the engine's warning surfaces this directly.