Why do structural and recursive cost the same on a 10-K body?

Both process the same source text and produce the same total token volume. The cost is identical; the retrieval-quality difference shows in the engine's warnings — pick structural for table-heavy archetypes.

Should I always pick 1024-token chunks?

No. 512 is better for high-precision fact extraction; 1024 is better for interpretive summarization. Pick based on the downstream prompt pattern, not on cost (the one-pass cost difference is about 0.1%: $0.00266 vs $0.00267).

Is voyage-finance-2 worth 6× the cost?

Yes for domain-specialized finance retrieval; no for generic workflows. The absolute cost differential at retail volumes is small, so run a retrieval eval on your own queries rather than picking blindly.

What overlap percentage should I use?

10% is the conventional default. Higher overlap helps when queries land on chunk boundaries; the token penalty at 10% is about 11% over the 0% case, a fraction of a cent in absolute embedding cost, so the default is safe.

Why does earnings-call structural produce a warning?

Speaker-turn boundaries produce many short chunks (minTokens = 186) too small for retrieval. Switch to recursive for earnings calls; the engine's warning surfaces this directly.

SEC Filing Chunking: Strategy, Size, and Embedding Cost

For a full-body 10-K with 1024-token chunks at 10% overlap on the OpenAI text-embedding-3-small model, the SEC Filing Chunk Optimizer returns: structural strategy = 131 chunks, 1017 avg tokens, 133,227 total ingested, $0.00266 one-pass embed cost. The same configuration with recursive (table-blind) chunking returns the same 131 chunks and same 133,227 total but with min token = 614 (some chunks under-fill) and a warning that recursive splitters are table-blind. Switching to a 512-token chunk size doubles the chunk count to 261 and nudges the ingestion cost to $0.00267. Switching the embedding model to voyage-finance-2 holds chunks at 131 but raises one-pass cost 6× to $0.01598. Three distinct decisions, three distinct cost-vs-quality curves.

TL;DR

Six engine runs across three archetypes (10-K body, MD&A only, earnings call), three trade-off axes:

Configuration	Chunks	Tokens ingested	Embed cost (one pass)	Notes
10-K body, 1024 tok, 10% overlap, structural, OpenAI 3-small	131	133,227	$0.00266	Clean table boundaries
10-K body, 1024 tok, 10% overlap, recursive, OpenAI 3-small	131	133,227	$0.00266	Table-blind warning
10-K body, 512 tok, 10% overlap, structural, OpenAI 3-small	261	133,371	$0.00267	2× chunk count, ~same tokens
10-K body, 1024 tok, 10% overlap, structural, voyage-finance-2	131	133,227	$0.01598	6× the embedding cost
MD&A only, 1024 tok, 10% overlap, structural, OpenAI 3-small	28	27,748	$0.00055	About 4.8× cheaper than full body
Earnings call, 1024 tok, 10% overlap, structural, OpenAI 3-small	30	16,950	$0.00034	Many short chunks (warning)

Three orthogonal decisions drive the cost curve: chunk size (1024 vs 512), chunking strategy (structural vs recursive), and embedding model (OpenAI 3-small vs voyage-finance-2). Halving chunk size barely moves cost (the overlap is the only added token volume); switching embed model raises cost 6×; switching strategy at fixed size and model doesn't change cost but does change retrieval quality on table-heavy filings.

Why structural and recursive cost the same here

The engine reports identical chunkCount (131) and tokensIngested (133,227) for structural and recursive strategies at this configuration. That is the right answer at the ingestion-cost level: both strategies process the same source text and produce the same total token volume. The difference is where the boundaries fall, not how many boundaries exist.

The recursive strategy's warning is the load-bearing output: "Recursive splitters are table-blind. This archetype is table-heavy, expect numeric rows to be severed from their headers." The structural strategy's notes confirm the alternative: "Respects Items / section headers / speaker turns. Preserves table blocks by keeping heading+table together."

For retrieval quality on a 10-K body where MD&A and Notes contain table-dense numerics, this difference is the entire ball-game. A retrieval that returns a chunk containing "Revenue: 1,850" without the column header is unusable for a downstream LLM extraction. The structural chunker preserves the header-table coupling; the recursive chunker does not.
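The severed-header failure can be shown with a toy example. This is a minimal sketch, not the engine's chunker: the table contents, the `BLOCKS` layout, and both splitter functions are hypothetical, and chunk size is measured in lines rather than tokens to keep the arithmetic visible.

```python
# Toy illustration only. All names and data here are hypothetical.
TABLE_HEADER = "Metric | FY2024 | FY2023"
TABLE_ROWS = [
    "Revenue | 1,850 | 1,620",
    "Operating income | 410 | 355",
    "Net income | 290 | 240",
]
BLOCKS = [
    ("text", ["Management discusses segment performance below."]),
    ("table", [TABLE_HEADER, *TABLE_ROWS]),
    ("text", ["Growth was driven by pricing."]),
]


def blind_split(blocks, max_lines):
    """Table-blind: flatten all lines and cut every max_lines lines."""
    lines = [line for _, body in blocks for line in body]
    return [lines[i:i + max_lines] for i in range(0, len(lines), max_lines)]


def structural_split(blocks, max_lines):
    """Never cut inside a block, so a table stays with its header.

    Chunks may overflow max_lines when a single block is larger than it.
    """
    chunks, current = [], []
    for _, body in blocks:
        if current and len(current) + len(body) > max_lines:
            chunks.append(current)
            current = []
        current.extend(body)
    if current:
        chunks.append(current)
    return chunks


def orphaned_rows(chunks):
    """Table rows that landed in a chunk without the column header."""
    return [row for chunk in chunks for row in chunk
            if row in TABLE_ROWS and TABLE_HEADER not in chunk]


print(orphaned_rows(blind_split(BLOCKS, 3)))       # two rows lose their header
print(orphaned_rows(structural_split(BLOCKS, 3)))  # no orphaned rows
```

The structure-aware version produces uneven chunk sizes, which is the price of keeping every numeric row retrievable together with its header.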

The engine cannot measure retrieval quality directly; it measures cost and boundaries. Since the ingestion cost is the same either way, the decision is to pick the strategy that preserves semantic coupling. For 10-K bodies and MD&A sections (table-heavy archetypes), structural is the unambiguous default.

Where recursive wins: earnings-call transcripts

The earnings-call archetype run shows structural's failure mode: 30 chunks with avgTokens = 565 and minTokens = 186. The warning reads: "Earnings calls have many short speaker turns. Structural chunking here yields lots of tiny chunks, consider recursive." The structural strategy's "respect speaker turns" rule produces under-filled chunks because Q&A turns are often short.

On an earnings call, a chunk of 186 tokens (a single 30-second analyst question) is too small to retrieve meaningfully. The recursive strategy would merge adjacent turns to fill toward the 1024 target, which is the right behaviour for this archetype. The engine flags the configuration mismatch in the warnings array.
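The merging behaviour described above can be sketched as a greedy pack of adjacent turns. This is an illustration under stated assumptions, not a full recursive splitter: `merge_turns` is a hypothetical helper, the turn lengths are made up (only the 186-token minimum comes from the engine run), and a turn longer than the target is left unsplit.

```python
def merge_turns(turn_tokens, target=1024):
    """Greedily pack adjacent speaker turns without exceeding target tokens.

    Returns the token count of each merged chunk.
    """
    chunks, current = [], 0
    for n in turn_tokens:
        # Close the current chunk if adding this turn would overshoot.
        if current and current + n > target:
            chunks.append(current)
            current = 0
        current += n
    if current:
        chunks.append(current)
    return chunks


# Hypothetical turn lengths in tokens, including two 186-token questions.
turns = [186, 420, 310, 900, 240, 515, 186]
merged = merge_turns(turns)
print(min(turns), min(merged))  # smallest chunk before and after merging
```

With one chunk per turn the smallest chunk is a lone question; after merging, every chunk carries the question together with at least part of its answer.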

The strategic implication: structural wins on documents with stable, semantically load-bearing boundaries (10-K Items, MD&A subsections, footnotes). Recursive wins on documents with noisy or unstable boundaries (earnings-call turns, news articles, blog posts). The engine flags both failure modes.

The chunk-size axis: 1024 vs 512

At 1024-token chunks the 10-K body produces 131 chunks; at 512 it produces 261. The ingestion cost barely moves, from $0.00266 to $0.00267, even though the chunk count doubled. The reason is that the 10% overlap is relative to chunk size, not absolute in token terms: halving the chunk size doubles the number of boundaries but halves the tokens duplicated at each one, so total ingested volume stays almost flat (133,227 vs 133,371 tokens).
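One way to sanity-check these counts is a sliding-window model in which each chunk advances by chunk_size × (1 − overlap). That model is an assumption, not the engine's documented algorithm, but it reproduces the reported chunk counts from the archetype's 120,000 source tokens. The function names below are hypothetical.

```python
import math


def window_chunk_count(total_tokens, chunk_size, overlap_pct):
    """Chunks needed when each window advances by chunk_size * (1 - overlap)."""
    stride = chunk_size * (1 - overlap_pct)
    return math.ceil(total_tokens / stride)


def duplicated_tokens(chunk_count, chunk_size, overlap_pct):
    """Tokens repeated across the boundaries between consecutive chunks."""
    return round((chunk_count - 1) * chunk_size * overlap_pct)


TOTAL = 120_000  # archetype total tokens reported for the 10-K body

for size in (1024, 512):
    count = window_chunk_count(TOTAL, size, 0.10)
    print(size, count, duplicated_tokens(count, size, 0.10))
```

Under this model both chunk sizes duplicate the same number of tokens, which is why cost stays flat. The engine's reported tokens-ingested figures differ slightly from what the model implies, so treat it as an approximation of the engine's accounting.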

The retrieval implications, in a dense-passage-retrieval setup¹, run in opposite directions:

Smaller chunks → finer-grained retrieval. A query for a specific accounting policy is more likely to retrieve a chunk containing only that policy at 512 tokens; at 1024 the policy may be embedded in a larger Item discussion.
Smaller chunks → less context per chunk. The downstream LLM extracting from a retrieved chunk has less surrounding context. For an MD&A discussion that spans multiple paragraphs, the 1024-chunk preserves the discussion flow; the 512-chunk fragments it.

The trade-off is task-dependent. For high-precision fact extraction (specific numbers, specific policies), 512 is the right default. For interpretive summarization (what is management saying about supply chain risk), 1024 is the right default. The engine does not adjudicate; the buyer's prompt pattern does.

The Lewis et al. RAG paper sets up this trade-off explicitly and the production folk-wisdom converges on 1024-token chunks for most retrieval-augmented LLM workflows².

The embedding-model axis: OpenAI 3-small vs voyage-finance-2

OpenAI text-embedding-3-small lists at $0.02/Mtok; voyage-finance-2 lists at $0.12/Mtok, a 6× price differential the engine surfaces directly: $0.00266 vs $0.01598 for the same chunking output. The question is whether the 6× cost buys 6× retrieval quality.

For generic finance content the answer is usually no, and the cost itself is in the noise of the whole workflow. For a 1,000-filing pipeline embedded once and queried 100k times, the embedding cost is $2.66 vs $15.99, both negligible compared to the LLM-call cost at retrieval-augmented-generation time ($50–500/month at scale).

For domain-specialized retrieval (specific financial terminology, IFRS vs US-GAAP nuance, regulatory disclosure idiom), voyage-finance-2 was trained on a finance corpus and reports stronger retrieval scores on finance-specific benchmarks per its documentation. The 6× cost differential is real but the absolute number is small; the retrieval-quality differential is real and the downstream impact is large.

The defensible procedure is to run a 50-query retrieval eval on both embedding models with the buyer's own queries and pick the winner. The cost differential at retail volumes does not justify skipping the eval.
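A retrieval eval of this kind can be as small as a recall@k comparison. The following is a minimal sketch under stated assumptions: the metric choice (recall@k), the function names, and the toy data are all illustrative, and the ranked results would in practice come from querying an index built with each embedding model.

```python
def recall_at_k(ranked_ids, relevant_ids, k=5):
    """Fraction of the relevant chunk ids that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    top = set(ranked_ids[:k])
    return len(top & set(relevant_ids)) / len(relevant_ids)


def mean_recall(results, gold, k=5):
    """results: query -> ranked chunk ids; gold: query -> relevant chunk ids."""
    return sum(recall_at_k(results[q], gold[q], k) for q in gold) / len(gold)


def pick_winner(runs, gold, k=5):
    """Score each model's results and return (best model name, all scores)."""
    scores = {name: mean_recall(res, gold, k) for name, res in runs.items()}
    return max(scores, key=scores.get), scores


# Toy data with placeholder model names; a real eval would use ~50 queries.
gold = {"q1": ["c3"], "q2": ["c7", "c9"]}
runs = {
    "model_a": {"q1": ["c1", "c3"], "q2": ["c7", "c2"]},
    "model_b": {"q1": ["c3", "c1"], "q2": ["c9", "c7"]},
}
winner, scores = pick_winner(runs, gold, k=2)
print(winner, scores)
```

Labelling the relevant chunks for each query is the real cost of the eval; the scoring itself is trivial once that gold set exists.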

The overlap-percentage axis

The canonical run uses 10% overlap. Sweeping overlap:

0% overlap: 118 chunks, 120,006 tokens ingested (no boundary duplication).
10% overlap (canonical): 131 chunks, 133,227 tokens ingested — about 11% more tokens than the 0% case.
20% overlap: more chunks again and proportionally more total tokens.
30% overlap: more still, scaling roughly with the overlap fraction.

The overlap-cost penalty is modest in absolute terms — 10% overlap adds about 11% more ingested tokens than the 0% case (120,006 → 133,227), which on this embedding model is a fraction of a cent. The retrieval benefit of overlap is non-trivial: a query that lands on a chunk boundary gets context from both sides. 10% is the conventional default; 0% is acceptable for archetypes with strong boundary alignment to typical queries.

Connects to

Reading Financial Filings with LLMs 2026 — the broader workflow this chunking sits in.
Finetune vs RAG vs Long-Context for Filings — the sibling decision on retrieval architecture.
RAG Cost Model vs Fine-Tuning — total-cost-of-ownership across the retrieval choices.
SEC Filing Chunk Optimizer — engine endpoint.
Financial Document Token Estimator — companion for downstream LLM cost.
Token Cost Optimizer — broader cost-per-validated-trade frame.

References

OpenAI. "Embeddings." platform.openai.com/docs/guides/embeddings, accessed 2026-05-21. Pricing and model documentation for text-embedding-3-small.
Voyage AI. "voyage-finance-2 model documentation." docs.voyageai.com, accessed 2026-05-21. Domain-specialized finance embeddings.
SEC. "Financial Reporting Manual." sec.gov. Reference for 10-K Item structure. https://www.sec.gov/divisions/corpfin/cffinancialreportingmanual.shtml

¹ Karpukhin, V., Oğuz, B., Min, S., et al. (2020). "Dense Passage Retrieval for Open-Domain Question Answering." EMNLP 2020. https://arxiv.org/abs/2004.04906
² Lewis, P., Perez, E., Piktus, A., Karpukhin, V., et al. (2020). "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks." NeurIPS 2020. https://arxiv.org/abs/2005.11401

Verified engine output


10-K body, 512-token chunks, 10% overlap, structural, OpenAI 3-small

Inputs
archetype_id	10k-body
chunk_size	512
overlap_pct	0.1
strategy	structural
embedding_model_id	openai-embed-3-small
query_reembed_count	100

Result
archetype › id	10k-body
archetype › name	10-K (full body)
archetype › total tokens	120000
archetype › structural boundaries	12
archetype › table heavy	true
archetype › notes	Form 10-K business + risk + MD&A + financials. ~12 Items. Dense tables in Item 7 / 8.
embedding › id	openai-embed-3-small
embedding › name	text-embedding-3-small
embedding › vendor	OpenAI
embedding › usd per mtokens	0.02
embedding › dim	1536
embedding › source	https://openai.com/api/pricing/
strategy	structural
chunk count	261
avg tokens	511
min tokens	512
max tokens	512
tokens ingested	133371
embedding cost once	0.0026674199999999998
embedding cost per100 queries	0.00008
strategy notes	Respects Items / section headers / speaker turns. Preserves table blocks by keeping heading+table together. Chunk sizes are uneven but semantically clean.

Computed live at build time.

10-K body, 1024-token chunks, 10% overlap, structural, voyage-finance-2 (6× embedding cost)

Inputs
archetype_id	10k-body
chunk_size	1024
overlap_pct	0.1
strategy	structural
embedding_model_id	voyage-finance-2
query_reembed_count	100

Result
archetype › id	10k-body
archetype › name	10-K (full body)
archetype › total tokens	120000
archetype › structural boundaries	12
archetype › table heavy	true
archetype › notes	Form 10-K business + risk + MD&A + financials. ~12 Items. Dense tables in Item 7 / 8.
embedding › id	voyage-finance-2
embedding › name	voyage-finance-2
embedding › vendor	Voyage AI
embedding › usd per mtokens	0.12
embedding › dim	1024
embedding › source	https://docs.voyageai.com/docs/pricing
strategy	structural
chunk count	131
avg tokens	1017
min tokens	1024
max tokens	1024
tokens ingested	133227
embedding cost once	0.01598724
embedding cost per100 queries	0.00048
strategy notes	Respects Items / section headers / speaker turns. Preserves table blocks by keeping heading+table together. Chunk sizes are uneven but semantically clean.
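The cost fields in both result blocks can be recomputed from the other reported fields. The sketch below is a reader-side check, not the engine's code: `recompute` is a hypothetical helper, and the implied tokens-per-query figure is an inference from the reported per-100-queries cost, which may itself be rounded.

```python
import math


def recompute(tokens_ingested, usd_per_mtok, cost_per_100_queries):
    """Return (one-pass embedding cost, implied tokens per query)."""
    once = tokens_ingested * usd_per_mtok / 1_000_000
    # Invert cost = 100 * query_tokens * price / 1e6 to get query_tokens.
    query_tokens = cost_per_100_queries / 100 / usd_per_mtok * 1_000_000
    return once, query_tokens


# (tokens ingested, usd per mtokens, embedding cost per100 queries)
runs = {
    "openai-embed-3-small @ 512": (133_371, 0.02, 0.00008),
    "voyage-finance-2 @ 1024": (133_227, 0.12, 0.00048),
}
for name, fields in runs.items():
    once, query_tokens = recompute(*fields)
    print(f"{name}: ${once:.8f} one pass, ~{query_tokens:.0f} tokens/query")
```

Both runs imply the same query length, which suggests the engine prices query re-embedding with a fixed per-query token assumption.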