
How to use the SEC Filing Chunk Optimizer

Pick a filing archetype, tune chunk size and overlap, and the page reports chunk count, embedding cost, and structural-boundary warnings across three chunking strategies so your retrieval pipeline doesn't split tables across chunks.

By Orbyd Editorial · AI Fin Hub Team

What It Does

Use the calculator with intent


It is built for RAG pipeline builders who already know that splitting a table across two chunks breaks retrieval, and who want to size chunk parameters before re-indexing.
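Before re-indexing, you can size chunk parameters by hand: count how many fixed-size windows cover a filing and how many tokens get embedded once overlap is included. The sketch below is a minimal sketch that assumes a sliding window whose start advances by chunk_size - overlap tokens and a flat per-token embedding price. The 60,000-token filing and the $0.02-per-million price are hypothetical placeholders, not the calculator's own figures or formula.

```python
def chunk_count(total_tokens: int, chunk_size: int, overlap: int) -> int:
    """Number of sliding windows needed to cover total_tokens.

    Each window holds up to chunk_size tokens and starts
    (chunk_size - overlap) tokens after the previous one.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    if total_tokens <= 0:
        return 0
    if total_tokens <= chunk_size:
        return 1
    stride = chunk_size - overlap
    # Integer ceiling division: windows needed after the first one.
    return 1 + (total_tokens - chunk_size + stride - 1) // stride


def embedded_tokens(total_tokens: int, chunk_size: int, overlap: int) -> int:
    """Tokens actually sent to the embedding model.

    Every boundary between consecutive windows re-embeds `overlap` tokens.
    """
    n = chunk_count(total_tokens, chunk_size, overlap)
    return total_tokens + max(n - 1, 0) * overlap


def embedding_cost_usd(total_tokens: int, chunk_size: int, overlap: int,
                       usd_per_million_tokens: float) -> float:
    """Cost at a flat per-token price; the price is a caller-supplied assumption."""
    tokens = embedded_tokens(total_tokens, chunk_size, overlap)
    return tokens / 1_000_000 * usd_per_million_tokens


if __name__ == "__main__":
    # Hypothetical 60,000-token 10-K, 1,500-token chunks, 150-token overlap.
    print(chunk_count(60_000, 1_500, 150))      # 45
    print(embedded_tokens(60_000, 1_500, 150))  # 66600
    # Hypothetical price of $0.02 per million tokens.
    print(embedding_cost_usd(60_000, 1_500, 150, 0.02))
```

Overlap shows up in cost because the overlapping tokens are embedded twice: 44 boundaries times 150 tokens adds 6,600 tokens to the example filing.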

Interpreting Results

Read boundary warnings first: chunks that split a table or cross a section boundary cause retrieval misses. Total chunk count drives embedding cost, and structural-aware chunking trades a higher chunk count for fewer warnings.
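One way to reproduce a boundary warning by hand is to lay fixed-size windows over the filing's token stream, then flag any table that no single chunk fully contains and any chunk that straddles a section start. This sketch assumes you already have table spans and section start offsets as token positions; the `Span` type, the offsets, and the labels are hypothetical, and the calculator's own warning rules may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A labeled token range: start inclusive, end exclusive."""
    label: str
    start: int
    end: int


def fixed_windows(total_tokens: int, chunk_size: int, overlap: int) -> list[tuple[int, int]]:
    """(start, end) token ranges of fixed-size chunks with overlap."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    stride = chunk_size - overlap
    windows, start = [], 0
    while start < total_tokens:
        end = min(start + chunk_size, total_tokens)
        windows.append((start, end))
        if end == total_tokens:
            break
        start += stride
    return windows


def boundary_warnings(windows, tables, section_starts):
    warnings = []
    for t in tables:
        # A table is safe only if at least one chunk contains all of it.
        if not any(s <= t.start and t.end <= e for s, e in windows):
            warnings.append(f"table '{t.label}' split across chunks")
    for i, (s, e) in enumerate(windows):
        for pos in section_starts:
            # A section start strictly inside a chunk means the chunk mixes two sections.
            if s < pos < e:
                warnings.append(f"chunk {i} crosses section boundary at token {pos}")
    return warnings


if __name__ == "__main__":
    tables = [Span("segment revenue", 900, 1200), Span("balance sheet", 2100, 2800)]
    sections = [0, 2500]
    for overlap in (0, 300):
        w = fixed_windows(4_000, 1_000, overlap)
        print(overlap, boundary_warnings(w, tables, sections))
```

In this toy layout, adding 300 tokens of overlap clears the table warning, because one window now holds the whole table. The section-boundary warning remains: overlap alone does not stop a chunk from mixing two sections.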

Input Steps

Field by field

  1. Upload the filing

    Upload the filing (10-K, 10-Q, or 8-K) or paste its SEC EDGAR URL.

  2. Pick a chunk strategy

    Choose retrieval (4K tokens, for embedding search), summarization (16K tokens, for long-context models), or structured (XBRL fields extracted separately).

  3. Run the optimizer

    The optimizer splits on Item boundaries and MD&A sub-sections, so each chunk is a coherent unit rather than an arbitrary character cut.

  4. Download

    Download the chunks as JSON. Each chunk includes its section path, character offsets, and token count.

  5. Pair with hierarchical retrieval

    For Q&A use cases, pair the chunks with hierarchical retrieval (filing → item → paragraph), which outperforms flat retrieval on filing-Q&A benchmarks.
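Hierarchical retrieval searches in stages instead of scoring every paragraph in one flat pool. This sketch shows the item → paragraph stages inside a single, already-selected filing. To keep it self-contained, it uses bag-of-words cosine similarity as a stand-in for real embeddings, which is a deliberate simplification. The `filing` dict layout and the `top_items` and `top_paragraphs` parameters are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Counter returns 0 for missing words, so absent terms add nothing.
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def hierarchical_search(query, filing, top_items=1, top_paragraphs=2):
    """Stage 1 ranks whole Items; stage 2 ranks paragraphs only inside the top Items.

    filing maps an Item name to its list of paragraphs.
    """
    q = bag_of_words(query)
    ranked_items = sorted(
        filing,
        key=lambda item: cosine(q, bag_of_words(" ".join(filing[item]))),
        reverse=True,
    )
    candidates = [
        (cosine(q, bag_of_words(para)), item, para)
        for item in ranked_items[:top_items]
        for para in filing[item]
    ]
    candidates.sort(key=lambda c: c[0], reverse=True)
    return [(item, para) for _, item, para in candidates[:top_paragraphs]]


if __name__ == "__main__":
    filing = {
        "Item 1A. Risk Factors": [
            "Supply chain disruption could reduce revenue.",
            "Interest rate changes affect our debt costs.",
        ],
        "Item 7. MD&A": [
            "Revenue grew due to higher unit volume.",
            "Operating margin declined on input costs.",
        ],
    }
    print(hierarchical_search("what could disrupt our supply chain", filing, top_paragraphs=1))
```

The first stage discards MD&A paragraphs before they are ever scored. This pruning is what keeps an irrelevant but lexically similar paragraph elsewhere in the filing from crowding out the answer.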

Common Scenarios

Use realistic starting points

10-K with many tables

Archetype: 10-K
Chunk size: 1500 tokens

Fixed-size chunking generates many table-split warnings. Structural-aware chunking reduces them, at the price of a higher chunk count and higher embedding cost.
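The trade-off in this scenario shows up on a toy block layout. Fixed-size windows cut straight through tables. A packer that only breaks between whole paragraphs and tables avoids those splits, but it leaves chunks partly filled, so the count rises. The block sizes are made up and overlap is ignored. Greedy whole-block packing is one plausible structural-aware strategy, assumed here rather than taken from the calculator.

```python
def fixed_size_stats(blocks, chunk_size):
    """Chunk count and table splits when the filing is cut every chunk_size tokens (no overlap)."""
    total = sum(n for _, _, n in blocks)
    count = (total + chunk_size - 1) // chunk_size
    splits, offset = 0, 0
    for _, kind, n in blocks:
        start, last = offset, offset + n - 1
        # A table is split if its first and last tokens land in different chunks.
        if kind == "table" and start // chunk_size != last // chunk_size:
            splits += 1
        offset += n
    return count, splits


def structural_chunks(blocks, budget):
    """Greedily pack whole blocks; never cut a paragraph or table.

    A block larger than budget becomes its own oversized chunk.
    """
    chunks, current, size = [], [], 0
    for label, _, n in blocks:
        if current and size + n > budget:
            chunks.append(current)
            current, size = [], 0
        current.append(label)
        size += n
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    # Hypothetical 10-K excerpt: (label, kind, tokens).
    blocks = [
        ("p1", "text", 900), ("tbl1", "table", 800), ("p2", "text", 900),
        ("tbl2", "table", 800), ("p3", "text", 900),
    ]
    print(fixed_size_stats(blocks, 1_500))        # (3, 2)
    print(len(structural_chunks(blocks, 1_500)))  # 5
```

Here fixed-size chunking needs 3 chunks but splits both tables. Whole-block packing needs 5 chunks and splits none. The extra chunks come from the unused space left at the end of each one.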

Earnings transcript (Q&A heavy)

Archetype: earnings
Chunk size: 2000 tokens

Q&A segments are natural chunk boundaries, so the structural strategy produces clean chunks aligned with speaker turns.
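Speaker turns are easy to use as boundaries. This sketch parses a transcript into turns and then packs whole turns into chunks under a budget. It assumes a plain-text transcript where each turn opens with a `Speaker: text` label; this format is hypothetical, and real transcripts vary. It also counts whitespace-separated words as a rough stand-in for model tokens.

```python
import re

# Hypothetical label format: "Speaker Name: text" at the start of a line.
TURN_LABEL = re.compile(r"^([A-Z][\w .'-]*?):\s*(.*)$")


def speaker_turns(transcript: str) -> list[tuple[str, str]]:
    """Split a transcript into (speaker, text) turns; unlabeled lines continue the previous turn."""
    turns: list[list[str]] = []
    for raw in transcript.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = TURN_LABEL.match(line)
        if match:
            turns.append([match.group(1), match.group(2)])
        elif turns:
            turns[-1][1] += " " + line
    return [(speaker, text) for speaker, text in turns]


def pack_turns(turns, budget):
    """Pack whole turns into chunks, starting a new chunk before `budget` words would be exceeded."""
    chunks, current, size = [], [], 0
    for speaker, text in turns:
        n = len(text.split())
        if current and size + n > budget:
            chunks.append(current)
            current, size = [], 0
        current.append((speaker, text))
        size += n
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = """
Operator: Our first question comes from Jane Analyst.
Jane Analyst: How should we think about gross margin next year?
CFO: We expect modest expansion
as input costs ease.
Operator: Next question comes from Sam Analyst.
Sam Analyst: Any update on buybacks?
CEO: The board reviews capital returns each quarter.
"""
    for chunk in pack_turns(speaker_turns(sample), budget=25):
        print([speaker for speaker, _ in chunk])
```

With a 25-word budget, each chunk in the sample ends up holding one full question-and-answer exchange. The label regex is naive: a continuation line that starts with a capital letter and contains a colon would be misread as a new speaker.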


FAQ

Questions people ask next

The short answers readers usually want after the first pass.

Chunks are sections of an SEC filing (10-K, 10-Q, 8-K) split for downstream LLM ingestion. The splits respect document structure, such as Item boundaries (Item 1, Item 1A, and so on) and MD&A sub-sections, rather than naive character-count cuts. The methodology page documents the parsing rules.
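Here is a minimal sketch of Item-boundary splitting. It emits JSON records shaped like the download described in the steps: section path, character offsets, and token count. The heading regex, the field names, and the whitespace-based token count are assumptions for illustration. They are not the optimizer's documented parsing rules or output schema.

```python
import json
import re

# Assumed heading shape: a line starting "Item 1.", "Item 1A.", "ITEM 7A." etc.
ITEM_HEADING = re.compile(r"^[ \t]*Item[ \t]+(\d+[A-Z]?)\.", re.IGNORECASE | re.MULTILINE)


def split_by_item(text: str, filing_type: str = "10-K") -> list[dict]:
    """One record per Item, from its heading up to the next Item heading."""
    headings = list(ITEM_HEADING.finditer(text))
    records = []
    for i, heading in enumerate(headings):
        start = heading.start()
        end = headings[i + 1].start() if i + 1 < len(headings) else len(text)
        records.append({
            "section_path": [filing_type, f"Item {heading.group(1).upper()}"],
            "char_start": start,
            "char_end": end,
            # Whitespace-split words as a stand-in for a real tokenizer's count.
            "token_count": len(text[start:end].split()),
        })
    return records


if __name__ == "__main__":
    sample = (
        "Item 1. Business\nWe make widgets.\n"
        "Item 1A. Risk Factors\nDemand may fall.\n"
        "Item 7. Management's Discussion and Analysis\nRevenue rose.\n"
    )
    print(json.dumps(split_by_item(sample), indent=2))
```

Real 10-Ks usually repeat the Item headings in a table of contents, so a splitter would likely need to skip that block first. Cover-page text before the first Item is dropped here. MD&A sub-sections would need a second pass inside the Item 7 record.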


Planning estimates only — not financial, tax, or investment advice.