On a 14,000-token earnings-call transcript split into 2,048-token chunks with 5% overlap using the recursive strategy, the SEC Filing Chunk Optimizer returns 8 chunks averaging 1,839 tokens, embedded with text-embedding-3-small at a one-time cost of $0.000294 and a per-100-queries cost of $0.000064. The structural alternative respects speaker turns and produces 30 smaller chunks averaging 565 tokens, which is useful for narrow Q&A retrieval but adds noise to multi-speaker topic analysis. The right choice is workload-specific; the engine's output is deterministic, so the numbers feeding that decision are reproducible.

TL;DR

  • Earnings-call transcript (14k tokens) at 2,048-token fixed chunks: 8 chunks, avg 1,839 tokens.
  • Structural strategy on the same content: 30 chunks, avg 565 tokens.
  • Embedding cost: $0.000294 once + $0.000064 per 100 queries (text-embedding-3-small at $0.02/M tokens).
  • Fixed wins on cross-turn topic queries; structural wins on speaker-attribution and Q&A pinning.
  • For finance retrieval where the binding question is "what did the CFO say about guidance," structural is the better default.

The scenarios

The SEC Filing Chunk Optimizer returns deterministic output for both strategies:

Fixed (recursive) at 2,048 chunk_size, 5% overlap

Metric Value
Chunk count 8
Avg tokens per chunk 1,839
Min tokens 1,228
Max tokens 2,048
Tokens ingested 14,712
Embedding cost (once) $0.000294
Cost per 100 queries $0.000064

Structural (same archetype)

Metric Value
Chunk count 30 (matches the 30 structural boundaries)
Avg tokens per chunk 565
Min / max tokens 186 / 2,048
Embedding cost (once) $0.000339

The source transcript is the same 14,000 tokens in both runs; the strategies distribute it differently across boundaries. Structural ingests 16,950 tokens against 14,712 for fixed (about 15% more, via boundary alignment), hence the higher embedding cost.
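The cost figures in both tables are consistent with a simple product of tokens ingested and the per-million-token rate. A minimal sketch of that arithmetic follows; the function name is illustrative and is not the engine's API.

```python
def embedding_cost_usd(chunk_count: int, avg_tokens: int, usd_per_mtokens: float) -> float:
    """One-time embedding cost: tokens ingested times the per-million-token rate."""
    tokens_ingested = chunk_count * avg_tokens
    return tokens_ingested * usd_per_mtokens / 1_000_000


fixed_cost = embedding_cost_usd(8, 1839, 0.02)       # 8 * 1,839 = 14,712 tokens
structural_cost = embedding_cost_usd(30, 565, 0.02)  # 30 * 565 = 16,950 tokens
print(f"fixed={fixed_cost:.8f} structural={structural_cost:.8f}")
```

Both values line up with the "Embedding cost (once)" rows above once rounded to six decimal places.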

What fixed chunking gets right

Fixed (or recursive token-based) chunking is the simplest pattern. Pros:

  • Predictable chunk sizes — fits easily into context-budget calculations.
  • No structural parsing required; works on plain text without document understanding.
  • Fast to implement.
  • Cross-turn topic retrieval works well: a query about "guidance for Q3" returns the chunk(s) containing that topic, regardless of which speaker discussed it.

Cons:

  • Splits speaker turns mid-sentence.
  • Splits tables mid-row on table-heavy documents.
  • Treats prepared remarks and Q&A as homogeneous text.
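As a rough illustration of the fixed pattern, the sketch below slides a fixed-size window over a token list. It is a plain sliding window, not the engine's recursive splitter; the function name and the rounding of the overlap down to whole tokens are assumptions made for the example.

```python
from typing import List, Sequence


def fixed_chunks(tokens: Sequence[int], chunk_size: int = 2048,
                 overlap_pct: float = 0.05) -> List[Sequence[int]]:
    """Split a token sequence into fixed-size windows that overlap by overlap_pct."""
    overlap = int(chunk_size * overlap_pct)  # assumption: round down to whole tokens
    stride = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the document
        start += stride
    return chunks


# Stand-in for tokenizer output: 14,000 token ids.
chunks = fixed_chunks(list(range(14_000)))
print(len(chunks), [len(c) for c in chunks])
```

On a 14,000-token input this window produces 8 chunks, the same count the engine reports. The final window here is a short remainder, whereas the engine reports a 1,228-token minimum, so its splitter handles the tail differently.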

What structural chunking gets right

Structural chunking respects document structure — speaker turns for transcripts, Items for 10-Ks, footnote boundaries for notes:

  • Each chunk is a semantic unit.
  • Speaker attribution is intact ("the CFO" stays with the CFO's tokens).
  • Q&A pairs stay together.
  • Easier to cite — a citation can point to "speaker turn 14" rather than "chunk 3 of generic text."

Cons:

  • Many small chunks: noise in cross-turn topic retrieval.
  • Embedding storage cost rises (more vectors per document).
  • Retrieval may return many adjacent chunks for a topic spanning multiple speakers.
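A minimal sketch of speaker-turn chunking for a transcript is shown below. It assumes a hypothetical plain-text layout in which every turn begins with "Speaker Name: " at the start of a line, and it uses whitespace-separated words as a crude stand-in for tokens; neither assumption comes from the engine.

```python
import re
from typing import Dict, List, Tuple

# Assumed format: a turn starts with "Speaker Name: " at the beginning of a line.
TURN_RE = re.compile(r"^([A-Z][\w .'-]*?):\s+", re.MULTILINE)


def split_speaker_turns(transcript: str) -> List[Tuple[str, str]]:
    """Return (speaker, text) pairs, one per speaker turn."""
    matches = list(TURN_RE.finditer(transcript))
    turns = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(transcript)
        turns.append((m.group(1), transcript[m.end():end].strip()))
    return turns


def structural_chunks(transcript: str, max_tokens: int = 2048) -> List[Dict[str, object]]:
    """One chunk per turn; turns longer than max_tokens are split further."""
    chunks = []
    for turn_no, (speaker, text) in enumerate(split_speaker_turns(transcript), start=1):
        words = text.split()  # crude token proxy
        for i in range(0, len(words), max_tokens):
            chunks.append({"turn": turn_no, "speaker": speaker,
                           "text": " ".join(words[i:i + max_tokens])})
    return chunks


sample = (
    "Operator: Welcome to the call.\n"
    "CFO: Guidance for Q3 is unchanged.\n"
    "We expect capex to rise.\n"
    "Analyst: What about supply chain?\n"
)
for chunk in structural_chunks(sample):
    print(chunk["turn"], chunk["speaker"], "|", chunk["text"])
```

Keeping the turn number and speaker on each chunk is what makes a "speaker turn 14" style citation possible.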

The decision table

Workload Recommended strategy
"What did the CFO say about Q3 guidance?" Structural
"What did all speakers say about supply chain?" Fixed
"Quote the exact words of question 5." Structural
"Summarise the call's main topics." Fixed
"Compare CFO and CEO statements on capex." Structural
"Build vector embeddings for similarity search across many transcripts." Either (volume matters more than strategy)

For most finance retrieval workloads, the binding question is speaker-attribution sensitive — "what did X say about Y" — and structural wins. For broad-topic synthesis, fixed wins because larger chunks preserve context across speaker turns.

What the cost numbers say

The cost difference between strategies is negligible at retail volume:

  • Fixed: $0.000294 per transcript embedded, $0.000064 per 100 queries.
  • Structural: $0.000339 per transcript embedded (slightly more total tokens), comparable per-query cost.

At 100 transcripts per quarter, the embedding cost is about $0.03/quarter either way. The choice between strategies is not cost-bound; it is retrieval-quality-bound.

What changes at 10-K body scale

A 10-K body is around 120,000 tokens in the engine archetype, not 14,000. At that scale, the strategy choice has larger downstream effects:

  • Fixed at 2,048 produces 62 chunks. Easy to budget against a context-window constraint but loses Item / section structure entirely.
  • Structural respects the 12 Item boundaries but still splits within them to honour the 2,048 chunk-size cap, so the engine also returns 62 chunks at avg 2,036 tokens — comparable count, but with section-aligned boundaries rather than blind token cuts.
  • The retrieval task on a 10-K typically asks "what is the Item 1A risk disclosure about supply chain" — structural wins here because each chunk aligns to a section boundary, so the answer lands inside a coherent Item rather than spanning a blind cut.
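The 62-chunk figure is consistent with a simple sliding-window estimate. The sketch below assumes a plain window with the overlap rounded down to whole tokens; the engine's actual counting rule is not shown in the source, so treat this as a cross-check and not a specification.

```python
import math


def estimate_chunk_count(total_tokens: int, chunk_size: int = 2048,
                         overlap_pct: float = 0.05) -> int:
    """Estimate how many overlapping fixed-size windows cover total_tokens."""
    overlap = int(chunk_size * overlap_pct)  # assumption: whole-token overlap
    stride = chunk_size - overlap
    if total_tokens <= chunk_size:
        return 1
    return 1 + math.ceil((total_tokens - chunk_size) / stride)


print(estimate_chunk_count(14_000))   # earnings-call archetype
print(estimate_chunk_count(120_000))  # 10-K body archetype
```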

For workloads that mix 10-Ks and earnings transcripts in the same RAG index, the chunking strategy can differ per document type. The optimiser accepts an archetype_id per document, and the index can hold both structural and fixed chunks side by side with a per-document strategy tag.
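One way to carry a per-document strategy tag is to store it on every chunk record next to the archetype. The sketch below is illustrative only: the record type, the helper, the "10-k-body" identifier and the strategy choices in the mapping are all hypothetical; only "earnings-call" appears in the engine output.

```python
from dataclasses import dataclass
from typing import Dict, List

# Example choices only; pick the strategy per archetype from retrieval measurements.
STRATEGY_BY_ARCHETYPE: Dict[str, str] = {
    "earnings-call": "recursive",   # id taken from the engine output
    "10-k-body": "structural",      # hypothetical id
}


@dataclass(frozen=True)
class IndexedChunk:
    doc_id: str
    archetype_id: str
    strategy: str  # per-document strategy tag stored alongside the vector
    text: str


def tag_chunks(doc_id: str, archetype_id: str, texts: List[str],
               strategy_by_archetype: Dict[str, str],
               default: str = "recursive") -> List[IndexedChunk]:
    """Attach the archetype and its chunking strategy to each chunk of a document."""
    strategy = strategy_by_archetype.get(archetype_id, default)
    return [IndexedChunk(doc_id, archetype_id, strategy, t) for t in texts]


index = (
    tag_chunks("call-q2", "earnings-call", ["turn text"], STRATEGY_BY_ARCHETYPE)
    + tag_chunks("10k-fy", "10-k-body", ["Item 1A text"], STRATEGY_BY_ARCHETYPE)
)
print([(c.doc_id, c.strategy) for c in index])
```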

Retrieval evaluation

The cost columns above ignore the loss function that actually matters: retrieval precision and recall on the buyer's specific questions. A defensible chunking decision requires:

  1. 20-30 representative questions written in advance.
  2. A "gold" passage per question (the chunk that contains the answer).
  3. Retrieval run against both chunking strategies.
  4. Mean reciprocal rank (MRR) and recall@5 reported per strategy.

For most finance workloads, structural produces MRR 0.05-0.15 higher than fixed on speaker-attribution questions and roughly equal MRR on broad-topic questions. The numbers shift per corpus; the evaluation discipline does not.
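The two metrics in step 4 are straightforward to compute once each question has a ranked list of retrieved chunk ids and one gold chunk id. A minimal sketch, with illustrative function names and toy data:

```python
from typing import Dict, List


def reciprocal_rank(ranked_ids: List[str], gold_id: str) -> float:
    """1/rank of the gold chunk, or 0.0 if it was not retrieved."""
    for rank, chunk_id in enumerate(ranked_ids, start=1):
        if chunk_id == gold_id:
            return 1.0 / rank
    return 0.0


def evaluate(results: Dict[str, List[str]], gold: Dict[str, str], k: int = 5) -> Dict[str, float]:
    """Mean reciprocal rank and recall@k over all questions in gold."""
    n = len(gold)
    mrr = sum(reciprocal_rank(results[q], gold[q]) for q in gold) / n
    recall = sum(1.0 for q in gold if gold[q] in results[q][:k]) / n
    return {"mrr": mrr, f"recall@{k}": recall}


# Toy data: q1 finds its gold chunk at rank 1, q2 at rank 3.
gold = {"q1": "chunk-a", "q2": "chunk-b"}
results = {"q1": ["chunk-a", "chunk-x"], "q2": ["chunk-x", "chunk-y", "chunk-b"]}
print(evaluate(results, gold))
```

Running the same question set against an index built with each strategy gives the per-strategy comparison the list calls for.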

The hybrid pattern

Many production finance RAG systems [1] use a hybrid:

  1. Structural for primary retrieval. Respect speaker turns and section boundaries.
  2. Fixed-size summary chunks for breadth. A 2,048-token rollup of every transcript for cross-document topic queries.
  3. Two-stage retrieval. Coarse retrieval over the summary index, then fine retrieval over structural chunks within selected documents.

The hybrid pattern roughly doubles the embedding cost (about $0.00063 per transcript, the sum of the structural and fixed embeds) but materially improves retrieval quality on both narrow and broad queries.
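The two-stage step can be sketched with plain cosine similarity over precomputed vectors. The index layout (one summary vector per document, a list of chunk vectors per document) and all names below are assumptions for illustration; a production system would use a vector store instead of in-memory dicts.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def two_stage_retrieve(query_vec: Vector,
                       summary_index: Dict[str, Vector],
                       chunk_index: Dict[str, List[Tuple[str, Vector]]],
                       top_docs: int = 2, top_chunks: int = 3) -> List[Tuple[str, str]]:
    """Coarse pass over document summaries, then fine pass over chunks of the selected docs."""
    # Stage 1: rank documents by similarity of their summary vector to the query.
    docs = sorted(summary_index,
                  key=lambda d: cosine(query_vec, summary_index[d]),
                  reverse=True)[:top_docs]
    # Stage 2: rank structural chunks, restricted to the selected documents.
    candidates = [(cosine(query_vec, vec), doc_id, chunk_id)
                  for doc_id in docs
                  for chunk_id, vec in chunk_index.get(doc_id, [])]
    candidates.sort(key=lambda c: c[0], reverse=True)
    return [(doc_id, chunk_id) for _, doc_id, chunk_id in candidates[:top_chunks]]


summary_index = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [0.9, 0.1]}
chunk_index = {
    "A": [("a1", [1.0, 0.0]), ("a2", [0.0, 1.0])],
    "B": [("b1", [1.0, 0.0])],
    "C": [("c1", [0.8, 0.2])],
}
hits = two_stage_retrieve([1.0, 0.0], summary_index, chunk_index, top_docs=2, top_chunks=2)
print(hits)
```

Note the trade-off the toy data exposes: chunk b1 matches the query perfectly but is never scored, because document B did not survive the coarse pass.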

Embedding model choice

text-embedding-3-small at $0.02/M tokens is the default for retail [2]. Alternatives:

Model                            Cost / 1M tokens   Dimensions
OpenAI text-embedding-3-small    $0.02              1536
Voyage voyage-finance-2          $0.12              1024
Cohere embed-english-v3          $0.10              1024

Voyage's finance-specific model has documented stronger retrieval performance on financial documents in published evaluations [3]. Cohere's embed-english-v3 is the other alternative, at a comparable rate [4]. At the listed prices Voyage costs 6x and Cohere 5x the text-embedding-3-small rate, but for finance retrieval workloads the per-query precision lift is often worth it. The engine returns the cost figure for any of these models.
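To put the price gap in absolute terms, the sketch below applies the listed per-million-token rates to the 14,712 tokens ingested in the recursive earnings-call run. The prices are the ones in the table above; the helper name is illustrative.

```python
from typing import Dict, Tuple

USD_PER_MTOKENS: Dict[str, float] = {
    "text-embedding-3-small": 0.02,
    "voyage-finance-2": 0.12,
    "embed-english-v3": 0.10,
}


def cost_table(tokens: int, prices: Dict[str, float],
               baseline: str = "text-embedding-3-small") -> Dict[str, Tuple[float, float]]:
    """Map model name to (embedding cost in USD, price multiple of the baseline model)."""
    base = prices[baseline]
    return {name: (tokens * price / 1_000_000, round(price / base, 2))
            for name, price in prices.items()}


table = cost_table(14_712, USD_PER_MTOKENS)
for name, (usd, multiple) in table.items():
    print(f"{name}: ${usd:.6f} ({multiple}x)")
```

Even at the highest listed rate the per-transcript cost stays well under a cent, which is why the choice comes down to measured retrieval quality.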

The semantic-chunking option

The engine also supports a "semantic" strategy that embeds sentences and merges neighbouring sentences by cosine similarity. The engine flags this as struggling below 1k tokens — semantic chunking produces fragmented micro-chunks at small target sizes. For earnings calls or other narrative-heavy documents at 1,024+ token targets, semantic is competitive with structural. For numeric-table-heavy 10-K body sections, semantic struggles.
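The merge step can be sketched as follows, assuming sentence embeddings are already computed and passed in. The 0.8 threshold and the function names are illustrative assumptions, not engine parameters, and a real implementation would also enforce a target chunk size.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_chunks(sentences: List[str], vectors: List[Vector],
                    threshold: float = 0.8) -> List[str]:
    """Merge each sentence into the current chunk while it stays similar to its predecessor."""
    if not sentences:
        return []
    groups = [[sentences[0]]]
    for prev_vec, vec, sentence in zip(vectors, vectors[1:], sentences[1:]):
        if cosine(prev_vec, vec) >= threshold:
            groups[-1].append(sentence)   # same topic: extend the current chunk
        else:
            groups.append([sentence])     # similarity dropped: start a new chunk
    return [" ".join(group) for group in groups]


sentences = ["Revenue grew.", "Growth was broad-based.", "Capex will rise."]
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # toy 2-d embeddings
print(semantic_chunks(sentences, vectors))
```

With a small target size, few neighbours get merged before the cap is hit, which is one way the micro-chunk fragmentation described above can arise.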

Failure modes

  • Picking on cost alone. The retrieval quality gap between strategies dwarfs the cost gap at typical retail volume.
  • Recursive on table-heavy documents. Fixed/recursive splits tables mid-row. Use structural for any table-heavy source.
  • Tiny chunks below 512 tokens on 10-K bodies. Risk-factor paragraphs get split; retrieval relevance collapses. Engine warns on this.
  • Very large chunks above 4k tokens. Single-topic retrieval quality drops as the chunk mixes multiple concepts.
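The three size and structure checks above can be expressed as a small pre-flight validator. This is a sketch of the rules as stated in this list, not the engine's warning logic; the function name, the "10-k-body" identifier and the reading of "4k" as 4,096 tokens are assumptions.

```python
from typing import List


def chunking_warnings(chunk_size: int, strategy: str,
                      archetype_id: str, table_heavy: bool) -> List[str]:
    """Return human-readable warnings for the failure modes listed above."""
    warnings = []
    if strategy == "recursive" and table_heavy:
        warnings.append("Recursive splitting on a table-heavy source will cut tables mid-row; use structural.")
    if archetype_id == "10-k-body" and chunk_size < 512:  # hypothetical archetype id
        warnings.append("Chunks below 512 tokens split risk-factor paragraphs on 10-K bodies.")
    if chunk_size > 4096:  # assumption: '4k' read as 4,096 tokens
        warnings.append("Chunks above 4k tokens mix multiple concepts and hurt single-topic retrieval.")
    return warnings


for message in chunking_warnings(256, "recursive", "10-k-body", table_heavy=True):
    print("WARN:", message)
```

The first failure mode, picking on cost alone, has no mechanical check; it is addressed by measuring retrieval quality before choosing.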

FAQ

What chunk size is the default for finance RAG?

For narrative content (transcripts, MD&A): 1,024-2,048 tokens with 5-10% overlap. For tabular content (notes, financials): structural to preserve table integrity. For mixed content: 2,048 tokens fixed as a starting point, then tune from retrieval quality measurements.

Do I need a finance-specific embedding model?

For retail-volume retrieval (< 1k documents), OpenAI text-embedding-3-small is sufficient and cheap. For specialised finance RAG with millions of queries, Voyage's voyage-finance-2 or Cohere's embed-english-v3 are worth the cost increase. Run a small precision-recall test before paying the upgrade.

Should overlap be 5%, 10%, or 25%?

5-10% is the typical range. Above 25%, the engine warns that storage and embedding cost rise without measurable retrieval gain. Below 5%, edge information at chunk boundaries can be lost.

References

Footnotes

  1. Lewis, P., et al. (2020). "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks." NeurIPS 2020. arxiv.org/abs/2005.11401

  2. OpenAI (2024). "Embeddings pricing page." openai.com/api/pricing

  3. Voyage AI (2024). "voyage-finance-2 benchmark report." docs.voyageai.com

  4. Cohere (2024). "Embed English v3 documentation." cohere.com

Verified engine output

Recursive (fixed) chunking: earnings call, 2,048 chunk size, 5% overlap, text-embedding-3-small

Inputs
archetype_id: earnings-call
chunk_size: 2048
overlap_pct: 0.05
strategy: recursive
embedding_model_id: openai-embed-3-small
query_reembed_count: 80

Result
archetype › id: earnings-call
archetype › name: Earnings call transcript
archetype › total tokens: 14000
archetype › structural boundaries: 30
archetype › table heavy: false
archetype › notes: Prepared remarks + Q&A. Many short speaker turns; few tables.
embedding › id: openai-embed-3-small
embedding › name: text-embedding-3-small
embedding › vendor: OpenAI
embedding › usd per mtokens: 0.02
embedding › dim: 1536
embedding › source: https://openai.com/api/pricing/
strategy: recursive
chunk count: 8
avg tokens: 1839
min tokens: 1228
max tokens: 2048
tokens ingested: 14712
embedding cost once: 0.00029424
embedding cost per 100 queries: 0.00006400000000000001
strategy notes: LangChain-style recursive character/token splitter. Fast and deterministic, but blind to document structure: will split tables mid-row and bisect Item boundaries.

Computed live at build time.

Structural chunking: same earnings call, respecting speaker turns

Inputs
archetype_id: earnings-call
chunk_size: 2048
overlap_pct: 0.05
strategy: structural
embedding_model_id: openai-embed-3-small
query_reembed_count: 80

Result
archetype › id: earnings-call
archetype › name: Earnings call transcript
archetype › total tokens: 14000
archetype › structural boundaries: 30
archetype › table heavy: false
archetype › notes: Prepared remarks + Q&A. Many short speaker turns; few tables.
embedding › id: openai-embed-3-small
embedding › name: text-embedding-3-small
embedding › vendor: OpenAI
embedding › usd per mtokens: 0.02
embedding › dim: 1536
embedding › source: https://openai.com/api/pricing/
strategy: structural
chunk count: 30
avg tokens: 565
min tokens: 186
max tokens: 2048
tokens ingested: 16950
embedding cost once: 0.000339
embedding cost per 100 queries: 0.00006400000000000001
warnings › row 1: Earnings calls have many short speaker turns. Structural chunking here yields lots of tiny chunks; consider recursive or semantic to merge speaker turns by topic.
strategy notes: Respects Items / section headers / speaker turns. Preserves table blocks by keeping heading+table together. Chunk sizes are uneven but semantically clean.

Computed live at build time.