On a 14,000-token earnings-call transcript chunked into 2,048-token blocks with 5% overlap using the recursive strategy, the SEC Filing Chunk Optimizer returns 8 chunks averaging 1,839 tokens, embedded with text-embedding-3-small at a one-time cost of $0.000294 and a per-100-queries cost of $0.000064. The structural alternative respects speaker turns and produces 30 small chunks averaging 565 tokens, which suits narrow Q&A retrieval but adds noise to multi-speaker topic analysis. The right choice is workload-specific; the engine returns enough numbers to make the decision deterministic.
TL;DR
- Earnings-call transcript (14k tokens) at 2,048-token fixed chunks: 8 chunks, avg 1,839 tokens.
- Structural strategy on the same content: 30 chunks, avg 565 tokens.
- Embedding cost: $0.000294 once + $0.000064 per 100 queries (text-embedding-3-small at $0.02/M).
- Fixed wins on cross-turn topic queries; structural wins on speaker-attribution and Q&A pinning.
- For finance retrieval where the binding question is "what did the CFO say about guidance," structural is the better default.
The scenarios
The SEC Filing Chunk Optimizer returns deterministic output for both strategies:
Fixed (recursive) at 2,048 chunk_size, 5% overlap
| Metric | Value |
|---|---|
| Chunk count | 8 |
| Avg tokens per chunk | 1,839 |
| Min tokens | 1,228 |
| Max tokens | 2,048 |
| Tokens ingested | 14,712 |
| Embedding cost (once) | $0.000294 |
| Cost per 100 queries | $0.000064 |
Structural (same archetype)
| Metric | Value |
|---|---|
| Chunk count | 30 (matches the 30 structural boundaries) |
| Avg tokens per chunk | 565 |
| Min / max tokens | 186 / 2,048 |
| Embedding cost (once) | $0.000339 |
The token count is similar; the chunking strategy distributes the tokens differently across boundaries (structural ingests slightly more total tokens via boundary alignment, hence the marginally higher embed cost).
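The one-time embedding costs in both tables follow from the tokens ingested and the $0.02 per million token rate. A minimal sketch of that arithmetic follows; the function name is illustrative and is not part of the optimizer, and the 16,950 structural token count is taken from the verified engine output at the end of this article.

```python
def embedding_cost_usd(tokens: int, usd_per_million_tokens: float) -> float:
    """One-time embedding cost for a given number of ingested tokens."""
    return tokens * usd_per_million_tokens / 1_000_000


RATE = 0.02  # text-embedding-3-small, USD per 1M tokens (rate quoted in this article)

recursive_cost = embedding_cost_usd(14_712, RATE)   # 8 chunks x 1,839 avg tokens
structural_cost = embedding_cost_usd(16_950, RATE)  # 30 chunks x 565 avg tokens

print(f"recursive:  ${recursive_cost:.6f}")
print(f"structural: ${structural_cost:.6f}")
```

At the same rate, the $0.000064 per-100-queries figure corresponds to 3,200 query tokens; how the engine arrives at that token count is not stated here.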
What fixed chunking gets right
Fixed (or recursive token-based) chunking is the simplest pattern. Pros:
- Predictable chunk sizes — fits easily into context-budget calculations.
- No structural parsing required; works on plain text without document understanding.
- Fast to implement.
- Cross-chunk topic retrieval works well — a query about "guidance for Q3" returns the chunk(s) containing that topic, regardless of which speaker discussed it.
Cons:
- Splits speaker turns mid-sentence.
- Splits tables mid-row on table-heavy documents.
- Treats prepared remarks and Q&A as homogeneous text.
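To make the fixed pattern concrete, here is a minimal sketch of a plain sliding token window with overlap. It is an assumption-laden simplification, not the engine's recursive splitter: a recursive splitter also looks for separators, so per-chunk sizes will differ from the table above even where the chunk count agrees.

```python
def fixed_windows(n_tokens: int, chunk_size: int = 2048, overlap_pct: float = 0.05):
    """Return (start, end) token offsets for fixed-size windows with overlap."""
    overlap = int(chunk_size * overlap_pct)  # 102 tokens at 2,048 and 5%
    stride = chunk_size - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than chunk_size")
    windows = []
    start = 0
    while True:
        end = min(start + chunk_size, n_tokens)
        windows.append((start, end))
        if end == n_tokens:  # the last window reached the end of the document
            break
        start += stride
    return windows


windows = fixed_windows(14_000)
print(len(windows))  # 8 windows for the 14,000-token transcript
```

The same function gives 62 windows for a 120,000-token document, which is why fixed chunking is easy to budget: the count depends only on length, chunk size, and overlap.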
What structural chunking gets right
Structural chunking respects document structure — speaker turns for transcripts, Items for 10-Ks, footnote boundaries for notes:
- Each chunk is a semantic unit.
- Speaker attribution is intact ("the CFO" stays with the CFO's tokens).
- Q&A pairs stay together.
- Easier to cite — a citation can point to "speaker turn 14" rather than "chunk 3 of generic text."
Cons:
- Many small chunks: noise in cross-turn topic retrieval.
- Embedding storage cost rises (more vectors per document).
- Retrieval may return many adjacent chunks for a topic spanning multiple speakers.
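For transcripts, the structural approach can be sketched as a speaker-turn splitter. This is a minimal illustration under a strong assumption: every turn starts on its own line as "Name:" or "Name (Role):". The regular expression, the function name, and the sample speakers are all made up for the example, and the sketch does not enforce a chunk-size cap.

```python
import re

# Assumed layout: a turn begins with "Name:" or "Name (Role):" at the start of a line.
SPEAKER_LINE = re.compile(
    r"^(?P<speaker>[A-Z][\w .'-]*(?: \([^)]*\))?):\s*(?P<text>.*)$"
)


def split_speaker_turns(transcript: str) -> list[dict]:
    """Group transcript lines into numbered speaker turns."""
    turns: list[dict] = []
    for raw in transcript.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = SPEAKER_LINE.match(line)
        if match:
            first = match["text"]
            turns.append({"speaker": match["speaker"], "lines": [first] if first else []})
        elif turns:
            # Continuation line: belongs to the most recent speaker.
            turns[-1]["lines"].append(line)
    return [
        {"turn": i, "speaker": t["speaker"], "text": " ".join(t["lines"])}
        for i, t in enumerate(turns, start=1)
    ]


sample = """Operator: Our first question comes from Alex Kim.
Alex Kim (Analyst): Can you update us on Q3 guidance?
Jane Doe (CFO): We are holding the range.
It reflects current supply conditions.
"""
turns = split_speaker_turns(sample)
for t in turns:
    print(t["turn"], t["speaker"], "->", t["text"])
```

Keeping the turn number and speaker on each chunk is what makes "speaker turn 14" citations possible. A heuristic like this will misread any body line that happens to look like a speaker label, so real transcripts usually need a stricter pattern or vendor-supplied markup.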
The decision table
| Workload | Recommended strategy |
|---|---|
| "What did the CFO say about Q3 guidance?" | Structural |
| "What did all speakers say about supply chain?" | Fixed |
| "Quote the exact words of question 5." | Structural |
| "Summarise the call's main topics." | Fixed |
| "Compare CFO and CEO statements on capex." | Structural |
| "Build vector embeddings for similarity search across many transcripts." | Either (volume matters more than strategy) |
For most finance retrieval workloads, the binding question is speaker-attribution sensitive — "what did X say about Y" — and structural wins. For broad-topic synthesis, fixed wins because larger chunks preserve context across speaker turns.
What the cost numbers say
The cost difference between strategies is negligible at retail volume:
- Fixed: $0.000294 per transcript embedded, $0.000064 per 100 queries.
- Structural: $0.000339 per transcript embedded (slightly more total tokens), comparable per-query cost.
At 100 transcripts per quarter, the embedding cost is about $0.03/quarter either way. The choice between strategies is not cost-bound; it is retrieval-quality-bound.
What changes at 10-K body scale
A 10-K body is around 120,000 tokens in the engine archetype, not 14,000. At that scale, the strategy choice has larger downstream effects:
- Fixed at 2,048 produces 62 chunks. Easy to budget against a context-window constraint but loses Item / section structure entirely.
- Structural respects the 12 Item boundaries but still splits within them to honour the 2,048 chunk-size cap, so the engine also returns 62 chunks at avg 2,036 tokens — comparable count, but with section-aligned boundaries rather than blind token cuts.
- The retrieval task on a 10-K typically asks "what is the Item 1A risk disclosure about supply chain" — structural wins here because each chunk aligns to a section boundary, so the answer lands inside a coherent Item rather than spanning a blind cut.
For workloads that mix 10-Ks and earnings transcripts in the same RAG index, the chunking strategy can differ per document type. The optimiser accepts an archetype_id per document, and the index can hold both structural and fixed chunks side by side with a per-document strategy tag.
Retrieval evaluation
The cost columns above ignore the loss function that actually matters: retrieval precision and recall on the buyer's specific questions. A defensible chunking decision requires:
- 20-30 representative questions written in advance.
- A "gold" passage per question (the chunk that contains the answer).
- Retrieval run against both chunking strategies.
- Mean reciprocal rank (MRR) and recall@5 reported per strategy.
For most finance workloads, structural produces MRR 0.05-0.15 higher than fixed on speaker-attribution questions and roughly equal MRR on broad-topic questions. The numbers shift per corpus; the evaluation discipline does not.
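The two metrics in the checklist above are simple to compute once each question has a ranked result list and a gold chunk id. The sketch below assumes one gold chunk per question; the function names and the toy chunk ids are illustrative.

```python
def reciprocal_rank(ranked_ids: list[str], gold_id: str) -> float:
    """1/rank of the gold chunk, or 0.0 if it was not retrieved."""
    for rank, chunk_id in enumerate(ranked_ids, start=1):
        if chunk_id == gold_id:
            return 1.0 / rank
    return 0.0


def evaluate(runs: list[tuple[list[str], str]], k: int = 5) -> dict[str, float]:
    """runs holds one (ranked_chunk_ids, gold_chunk_id) pair per question."""
    if not runs:
        raise ValueError("need at least one question")
    mrr = sum(reciprocal_rank(ranked, gold) for ranked, gold in runs) / len(runs)
    recall = sum(1 for ranked, gold in runs if gold in ranked[:k]) / len(runs)
    return {"mrr": mrr, "recall_at_k": recall}


# Toy results for three questions against one chunking strategy.
runs = [
    (["c3", "c1", "c7"], "c1"),                    # gold at rank 2
    (["c2", "c5"], "c2"),                          # gold at rank 1
    (["c9", "c8", "c4", "c6", "c0", "c1"], "c1"),  # gold at rank 6, outside top 5
]
scores = evaluate(runs, k=5)
print(scores)
```

Run the same question set against each strategy's index and compare the two dictionaries. Note that gold chunk ids differ between strategies because the chunk boundaries differ, so gold passages are best stored as text spans and mapped to chunk ids per index.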
The hybrid pattern
Many production finance RAG systems [1] use a hybrid:
- Structural for primary retrieval. Respect speaker turns and section boundaries.
- Fixed-size summary chunks for breadth. A 2,048-token rollup of every transcript for cross-document topic queries.
- Two-stage retrieval. Coarse retrieval over the summary index, then fine retrieval over structural chunks within selected documents.
The hybrid pattern roughly doubles the embedding cost (about $0.00063 per transcript, the sum of the structural and fixed embeds) but materially improves retrieval quality on both narrow and broad queries.
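The two-stage step can be sketched with plain cosine similarity over in-memory dictionaries. Everything here is an assumption for illustration: the index shapes, the function names, and the two-dimensional toy vectors stand in for a real vector store and real embeddings.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def two_stage_search(query_vec, summary_index, chunk_index, top_docs=2, top_chunks=3):
    """Coarse pass over per-document summaries, then fine pass over structural chunks.

    summary_index: {doc_id: vector}
    chunk_index:   {doc_id: [(chunk_id, vector), ...]}
    """
    # Stage 1: pick the documents whose summary vectors are closest to the query.
    docs = sorted(
        summary_index,
        key=lambda d: cosine(query_vec, summary_index[d]),
        reverse=True,
    )[:top_docs]
    # Stage 2: rank only the structural chunks inside those documents.
    candidates = [
        (cosine(query_vec, vec), doc, chunk_id)
        for doc in docs
        for chunk_id, vec in chunk_index.get(doc, [])
    ]
    candidates.sort(key=lambda c: c[0], reverse=True)
    return [(doc, chunk_id) for _, doc, chunk_id in candidates[:top_chunks]]


summary_index = {"call-a": [1.0, 0.0], "call-b": [0.0, 1.0], "call-c": [0.7, 0.7]}
chunk_index = {
    "call-a": [("turn-1", [1.0, 0.1]), ("turn-2", [0.2, 1.0])],
    "call-b": [("turn-1", [0.0, 1.0])],
    "call-c": [("turn-1", [0.9, 0.4])],
}
hits = two_stage_search([1.0, 0.0], summary_index, chunk_index, top_docs=2, top_chunks=2)
print(hits)
```

The coarse pass keeps the fine pass from scanning every structural chunk in the corpus, which matters more as the number of small chunks grows.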
Embedding model choice
text-embedding-3-small at $0.02/M tokens is the default for retail [2]. Alternatives:
| Model | Cost / 1M tokens | Dimensions |
|---|---|---|
| OpenAI text-embedding-3-small | $0.02 | 1536 |
| Voyage voyage-finance-2 | $0.12 | 1024 |
| Cohere embed-english-v3 | $0.10 | 1024 |
Voyage's finance-specific model has documented stronger retrieval performance on financial documents in published evaluations [3]. Cohere's embed-english-v3 is the other option at a comparable rate [4]. Both cost 5x to 6x more than text-embedding-3-small, but for finance retrieval workloads the per-query precision lift is often worth it. The engine returns the cost figure for any of these models.
The semantic-chunking option
The engine also supports a "semantic" strategy that embeds sentences and merges neighbouring sentences by cosine similarity. The engine flags this as struggling below 1k tokens — semantic chunking produces fragmented micro-chunks at small target sizes. For earnings calls or other narrative-heavy documents at 1,024+ token targets, semantic is competitive with structural. For numeric-table-heavy 10-K body sections, semantic struggles.
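The merge step described above can be sketched as a greedy pass over adjacent sentence vectors. This is one simple way to do it, not the engine's implementation: the 0.8 threshold, the function names, and the toy vectors are assumptions, and the sketch does not enforce a target token size.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def merge_by_similarity(sentences, vectors, threshold=0.8):
    """Merge each sentence into the previous chunk when the two neighbours are similar."""
    if len(sentences) != len(vectors):
        raise ValueError("need one vector per sentence")
    chunks: list[list[str]] = []
    for i, sentence in enumerate(sentences):
        if i > 0 and cosine(vectors[i - 1], vectors[i]) >= threshold:
            chunks[-1].append(sentence)  # same topic: extend the current chunk
        else:
            chunks.append([sentence])    # topic shift: start a new chunk
    return [" ".join(chunk) for chunk in chunks]


sentences = ["Revenue rose.", "Margins rose too.", "We opened a new plant."]
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # toy stand-ins for sentence embeddings
merged = merge_by_similarity(sentences, vectors)
print(merged)
```

With a small target size, every modest drop in similarity starts a new chunk, which is consistent with the fragmentation the engine flags below 1k tokens.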
Failure modes
- Picking on cost alone. The retrieval quality gap between strategies dwarfs the cost gap at typical retail volume.
- Recursive on table-heavy documents. Fixed/recursive splits tables mid-row. Use structural for any table-heavy source.
- Tiny chunks below 512 tokens on 10-K bodies. Risk-factor paragraphs get split; retrieval relevance collapses. Engine warns on this.
- Very large chunks above 4k tokens. Single-topic retrieval quality drops as the chunk mixes multiple concepts.
FAQ
What chunk size is the default for finance RAG?
For narrative content (transcripts, MD&A): 1,024-2,048 tokens with 5-10% overlap. For tabular content (notes, financials): structural to preserve table integrity. For mixed content: 2,048 tokens fixed as a starting point, then tune from retrieval quality measurements.
Do I need a finance-specific embedding model?
For retail-volume retrieval (< 1k documents), OpenAI text-embedding-3-small is sufficient and cheap. For specialised finance RAG with millions of queries, Voyage's voyage-finance-2 or Cohere's embed-english-v3 are worth the cost increase. Run a small precision-recall test before paying the upgrade.
Should overlap be 5%, 10%, or 25%?
5-10% is the typical range. Above 25%, the engine warns that storage and embedding cost rise without measurable retrieval gain. Below 5%, edge information at chunk boundaries can be lost.
Connects to
- Reading Financial Filings with LLMs 2026: broader filing-retrieval design.
- Prompt Patterns for Earnings Calls: downstream prompt design.
- Finetune vs RAG vs Long Context for Filings: alternative context strategies.
- LLM Prompt Patterns for 10-K and 8-K Extraction: extraction patterns assuming a chunking strategy.
- SEC Filing Chunk Optimizer: re-run on your content.
- SEC Filing Chunk Optimizer methodology: full input/output specification.
References
[1] Lewis, P., et al. (2020). "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks." NeurIPS 2020. arxiv.org/abs/2005.11401
[2] OpenAI (2024). "Embeddings pricing page." openai.com/api/pricing
[3] Voyage AI (2024). "voyage-finance-2 benchmark report." docs.voyageai.com
[4] Cohere (2024). "Embed English v3 documentation." cohere.com
Verified engine output
Inputs (recursive run)

| Input | Value |
|---|---|
| archetype_id | earnings-call |
| chunk_size | 2048 |
| overlap_pct | 0.05 |
| strategy | recursive |
| embedding_model_id | openai-embed-3-small |
| query_reembed_count | 80 |

Outputs (recursive run)

| Output | Value |
|---|---|
| archetype › id | earnings-call |
| archetype › name | Earnings call transcript |
| archetype › total tokens | 14000 |
| archetype › structural boundaries | 30 |
| archetype › table heavy | false |
| archetype › notes | Prepared remarks + Q&A. Many short speaker turns; few tables. |
| embedding › id | openai-embed-3-small |
| embedding › name | text-embedding-3-small |
| embedding › vendor | OpenAI |
| embedding › usd per mtokens | 0.02 |
| embedding › dim | 1536 |
| embedding › source | https://openai.com/api/pricing/ |
| strategy | recursive |
| chunk count | 8 |
| avg tokens | 1839 |
| min tokens | 1228 |
| max tokens | 2048 |
| tokens ingested | 14712 |
| embedding cost once | 0.00029424 |
| embedding cost per 100 queries | 0.000064 |
| strategy notes | LangChain-style recursive character/token splitter. Fast and deterministic, but blind to document structure: will split tables mid-row and bisect Item boundaries. |

Computed live at build time.
Inputs (structural run)

| Input | Value |
|---|---|
| archetype_id | earnings-call |
| chunk_size | 2048 |
| overlap_pct | 0.05 |
| strategy | structural |
| embedding_model_id | openai-embed-3-small |
| query_reembed_count | 80 |

Outputs (structural run)

| Output | Value |
|---|---|
| archetype › id | earnings-call |
| archetype › name | Earnings call transcript |
| archetype › total tokens | 14000 |
| archetype › structural boundaries | 30 |
| archetype › table heavy | false |
| archetype › notes | Prepared remarks + Q&A. Many short speaker turns; few tables. |
| embedding › id | openai-embed-3-small |
| embedding › name | text-embedding-3-small |
| embedding › vendor | OpenAI |
| embedding › usd per mtokens | 0.02 |
| embedding › dim | 1536 |
| embedding › source | https://openai.com/api/pricing/ |
| strategy | structural |
| chunk count | 30 |
| avg tokens | 565 |
| min tokens | 186 |
| max tokens | 2048 |
| tokens ingested | 16950 |
| embedding cost once | 0.000339 |
| embedding cost per 100 queries | 0.000064 |
| warnings › row 1 | Earnings calls have many short speaker turns. Structural chunking here yields lots of tiny chunks; consider recursive or semantic to merge speaker turns by topic. |
| strategy notes | Respects Items / section headers / speaker turns. Preserves table blocks by keeping heading+table together. Chunk sizes are uneven but semantically clean. |

Computed live at build time.