Does RAG remove the need to verify the model's numbers?

No. Retrieval reduces fabrication by grounding the answer in real passages, but the model can still transcribe a figure incorrectly, confuse units, or read the wrong row of a table. Numeric claims need an explicit check against the source regardless of how good retrieval is, because the failure mode is in the reading and copying, not in whether the right passage was found.

Should I use RAG or just paste the whole filing into a long-context model?

Both have a place. Long-context generation is simpler and avoids retrieval errors, but it is more expensive per query, can bury the relevant passage in a large context, and does not scale to questions that span many filings. RAG retrieves only the relevant passages, which is cheaper at scale and lets you cite sources cleanly. Many production pipelines combine them: retrieve to narrow the candidates, then pass the survivors to a long-context model.

How do I handle tables and financial statements in filings?

Keep tables intact during chunking so a row never gets separated from its header and the table never splits across chunks. For the numbers themselves, prefer extracting them with a structured schema and verifying each value against the source, rather than letting the model freely read a table in prose. Tables are where transcription and unit errors cluster, so they deserve the strictest verification.
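One way to picture the structured-schema approach is the minimal sketch below. It assumes a deterministic table parser has already produced `source_rows`, keyed by row label and column header. The `ExtractedValue` class, the `SCALES` mapping, and the `matches_source` helper are hypothetical names used for illustration, not part of any library.

```python
from dataclasses import dataclass
from decimal import Decimal

# Unit notes as they typically appear in a table header (assumed vocabulary).
SCALES = {"units": 1, "thousands": 1_000, "millions": 1_000_000, "billions": 1_000_000_000}

@dataclass(frozen=True)
class ExtractedValue:
    label: str      # row label, e.g. "Total revenue"
    period: str     # column header, e.g. "FY2023"
    value: Decimal  # the number as printed in the table
    scale: str      # unit note taken from the table header

    def absolute(self) -> Decimal:
        """Value in base units, so the scale is applied in code rather than by the model."""
        return self.value * SCALES[self.scale]

def matches_source(extracted: ExtractedValue, source_rows: dict) -> bool:
    """True only if the same row and column in the parsed table hold the same printed value."""
    expected = source_rows.get((extracted.label, extracted.period))
    return expected is not None and expected == extracted.value

# Toy table parsed deterministically from the filing (illustrative figures).
source_rows = {
    ("Total revenue", "FY2023"): Decimal("4200"),
    ("Total revenue", "FY2022"): Decimal("3900"),
}
good = ExtractedValue("Total revenue", "FY2023", Decimal("4200"), "millions")
wrong_column = ExtractedValue("Total revenue", "FY2023", Decimal("3900"), "millions")
print(matches_source(good, source_rows), matches_source(wrong_column, source_rows))
```

Keying the check on both the row label and the period is what catches a value read from the wrong row or column, which a bare "does this number appear anywhere in the source" test would miss.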

AI in Markets Guide

How to Build a RAG Pipeline Over SEC Filings

SEC filings are long, structured, and dense with numbers, which makes them a strong fit for retrieval-augmented generation (RAG) and an unforgiving one for mistakes. A pipeline that retrieves the right passage and grounds its answer there is far more reliable than a model answering from memory. But retrieval alone does not stop fabrication, and filings carry their own traps: amendments, restatements, and point-in-time dates. This guide covers the pipeline end to end, from ingestion through numerical verification.

10 min read · Published May 26, 2026

By AI Fin Hub Research · AI Fin Hub Team

Before You Start

Set up the inputs that make the next steps easier

Access to filings, whether from EDGAR directly or a fundamentals data vendor with point-in-time history.

An embedding model and a vector store, plus a generation model with a large enough context window for the retrieved passages.

A deterministic way to compute or look up any figure the model is allowed to state, so its numbers can be checked.

Guide Steps

Move through it in order

Each step focuses on one decision so you can keep momentum without losing the thread.

1

Ingest filings with point-in-time discipline

Pull the filings and store both the document and the date it was filed, so a query about a past period retrieves what was knowable then rather than a later restatement. SEC filings get amended, and fundamentals get restated; answering a historical question with restated figures is a subtle form of look-ahead bias. Normalize the document structure (items, sections, tables) during ingestion so later chunking can respect those boundaries.

Store the filing type and filing date as metadata on every chunk. This lets you filter retrieval to the right period and the right document, which matters when a company has dozens of filings.
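A minimal sketch of the point-in-time lookup, assuming each stored filing carries the period it reports on and the date it was filed. The `FilingRecord` fields and the `as_of` helper are hypothetical names for illustration, and the sample records are made up.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class FilingRecord:
    company: str      # ticker or other identifier
    form_type: str    # e.g. "10-K" or "10-K/A"
    period_end: date  # fiscal period the filing reports on
    filed_on: date    # date the filing became public
    text: str

def as_of(records, company, period_end, as_of_date):
    """Return the latest version of a filing that was already public on as_of_date, or None."""
    candidates = [
        r for r in records
        if r.company == company
        and r.period_end == period_end
        and r.filed_on <= as_of_date  # ignore anything filed after the query date
    ]
    return max(candidates, key=lambda r: r.filed_on, default=None)

# Illustrative data: an original annual report and a later amendment.
records = [
    FilingRecord("ACME", "10-K", date(2023, 12, 31), date(2024, 2, 20), "original"),
    FilingRecord("ACME", "10-K/A", date(2023, 12, 31), date(2024, 8, 5), "amended"),
]
print(as_of(records, "ACME", date(2023, 12, 31), date(2024, 3, 1)).form_type)  # 10-K
```

A query dated before the amendment gets the original filing, and a query dated after it gets the amended one, so the same question can return different answers depending on when it is asked.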
2

Chunk on structural boundaries

Split filings into chunks that respect the document's structure rather than cutting at arbitrary character counts. A chunk that spans the boundary between two unrelated items dilutes retrieval relevance, and one that splits a table from its header loses the meaning of the numbers. Tune chunk size and overlap so each chunk is self-contained: large enough to carry context, small enough to stay on one topic. Compare fixed-size, recursive, and structure-aware strategies on your filings.

Watch for chunks that split financial tables. A retrieved half-table is worse than useless because it gives the model numbers without their labels.
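As a rough illustration of structure-aware chunking, the sketch below assumes ingestion has already split the filing into typed blocks, given as `(kind, text)` pairs where `kind` is `"heading"`, `"para"`, or `"table"`, and where a table block includes its own header row. That block format and the `chunk_blocks` function are assumptions for this example. The sketch omits overlap for brevity.

```python
def chunk_blocks(blocks, max_chars=2000):
    """Pack whole blocks into chunks; start a new chunk at every heading.

    A block is never cut, so a table always stays in one chunk even if it
    exceeds max_chars on its own.
    """
    chunks = []
    current = []          # block texts in the chunk being built
    size = 0              # characters accumulated in the current chunk
    bare_heading = False  # True while the chunk holds only a heading
    for kind, text in blocks:
        # Keep a heading attached to the first block that follows it.
        too_big = size + len(text) > max_chars and not bare_heading
        if current and (kind == "heading" or too_big):
            chunks.append("\n".join(current))
            current, size = [], 0
        current.append(text)
        size += len(text)
        bare_heading = kind == "heading" and len(current) == 1
    if current:
        chunks.append("\n".join(current))
    return chunks

# Toy filing fragment (illustrative content only).
table = "Revenue | 2023 | 2022\nTotal | 10 | 9"
blocks = [
    ("heading", "Item 7. MD&A"),
    ("para", "a" * 50),
    ("table", table),
    ("heading", "Item 8. Financial Statements"),
    ("para", "b" * 30),
]
chunks = chunk_blocks(blocks, max_chars=60)
print(len(chunks))  # 3
```

The size limit here is a soft budget: the function prefers an oversized chunk to a split table, which matches the priority described above.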

Use the tool: SEC Filing Chunk Optimizer
Pick a filing archetype, tune chunk size and overlap, and see chunk count, embedding cost, and structural-boundary warnings across three chunking strategies.
3

Embed, index, and estimate the cost

Embed each chunk with your embedding model and store the vectors in an index alongside the chunk text and metadata. The embedding step has a real and predictable cost that scales with total token count, so estimate it before running the full corpus. For a large filing archive this is the dominant one-time cost, and the chunk size you chose directly drives it: smaller chunks mean more chunks and more embeddings.

Estimate the token count and embedding cost for one representative filing first, then multiply by your corpus size. Doing so catches budget surprises before you embed thousands of documents.
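The estimate can be done on the back of an envelope. The sketch below assumes fixed-size chunks with a fixed overlap, a rough heuristic of four characters per token for English text, and a placeholder price per million tokens. None of these numbers come from a specific vendor, so substitute your tokenizer's real counts and your provider's actual rate.

```python
import math

def estimate_embedding_cost(total_chars, chunk_chars, overlap_chars,
                            price_per_million_tokens, chars_per_token=4.0):
    """Return (chunk_count, embedded_tokens, cost) for one document."""
    if overlap_chars >= chunk_chars:
        raise ValueError("overlap must be smaller than the chunk size")
    stride = chunk_chars - overlap_chars  # new text covered by each extra chunk
    n_chunks = max(1, math.ceil((total_chars - overlap_chars) / stride))
    # Overlapping regions are embedded twice, once in each neighbouring chunk.
    embedded_chars = total_chars + (n_chunks - 1) * overlap_chars
    tokens = embedded_chars / chars_per_token
    cost = tokens / 1_000_000 * price_per_million_tokens
    return n_chunks, tokens, cost

# One representative filing of 400,000 characters at a placeholder price.
n_chunks, tokens, cost = estimate_embedding_cost(
    total_chars=400_000, chunk_chars=2_000, overlap_chars=200,
    price_per_million_tokens=0.10,
)
print(n_chunks, tokens)  # 223 111100.0
```

Multiplying `cost` by the number of filings gives the corpus estimate. Rerunning with a smaller `chunk_chars` shows how chunk count, and with it the overlap overhead, grows as chunks shrink.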
4

Retrieve the relevant passages

At query time, embed the question and retrieve the top matching chunks, filtered by the metadata that scopes the query to the right company, filing, and period. Retrieve enough chunks to cover the answer but few enough to fit the context window without burying the relevant passage in noise. Quality here is decisive: if retrieval misses the passage that contains the answer, no amount of clever prompting will recover it.

Always scope retrieval by company and period metadata before semantic ranking. Semantic similarity alone will happily return the right concept from the wrong year.
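The filter-then-rank order can be sketched in plain Python as follows. It assumes each chunk is a dict with `company`, `fiscal_year`, and `vector` keys, which is an illustrative layout and not any particular vector store's API. Most vector stores offer an equivalent metadata filter of their own.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec, chunks, company, fiscal_year, top_k=3):
    """Scope by metadata first, then rank the survivors by similarity."""
    scoped = [
        c for c in chunks
        if c["company"] == company and c["fiscal_year"] == fiscal_year
    ]
    ranked = sorted(scoped, key=lambda c: cosine(query_vec, c["vector"]), reverse=True)
    return ranked[:top_k]

# Toy index: the 2022 chunk is the closest semantic match to the query.
chunks = [
    {"id": "a", "company": "ACME", "fiscal_year": 2022, "vector": [1.0, 0.0]},
    {"id": "b", "company": "ACME", "fiscal_year": 2023, "vector": [0.8, 0.6]},
    {"id": "c", "company": "ACME", "fiscal_year": 2023, "vector": [0.0, 1.0]},
]
top = retrieve([1.0, 0.0], chunks, company="ACME", fiscal_year=2023, top_k=1)
print(top[0]["id"])  # b
```

Without the metadata filter, chunk `a` would win on similarity even though it comes from the wrong year.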
5

Generate with mandatory citations

Prompt the model to answer only from the retrieved passages and to cite the specific passage backing each claim. Then check that the cited passage actually supports the statement, and reject or flag answers where it does not. Grounding lowers fabrication but does not eliminate it: models still occasionally state claims their own cited source does not contain. The citation-faithfulness check is what turns grounding into something you can rely on.

Treat an uncited claim as a failed answer, not a stylistic lapse. If the model cannot point to a passage, it is answering from memory, which is what RAG exists to prevent.
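The structural half of this check is easy to automate. The sketch below assumes the prompt asks the model to end each sentence with bracketed passage numbers such as `[1]`, placed before the final period. That citation format and the `check_citations` helper are assumptions for illustration. The sketch only confirms that every sentence cites a passage that was actually retrieved; whether the passage supports the claim still needs a separate faithfulness check.

```python
import re

CITATION = re.compile(r"\[(\d+)\]")

def check_citations(answer, passages):
    """Return (sentence, problem) pairs for sentences that fail the citation rule.

    passages maps a passage number to its retrieved text.
    """
    problems = []
    # Split after sentence-ending punctuation followed by whitespace,
    # so decimals such as 4.2 are not treated as sentence breaks.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    for sentence in sentences:
        cited = [int(n) for n in CITATION.findall(sentence)]
        if not cited:
            problems.append((sentence, "uncited"))
        elif any(n not in passages for n in cited):
            problems.append((sentence, "unknown passage"))
    return problems

passages = {1: "Total revenue was $4.2 billion in fiscal 2023."}
answer = "Revenue was $4.2 billion in 2023 [1]. Margins improved. Debt fell [7]."
for sentence, problem in check_citations(answer, passages):
    print(problem, "->", sentence)
```

With the sample answer, the second sentence is flagged as uncited and the third as citing a passage that was never retrieved.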
6

Verify every extracted number

Do not trust a number just because it appears in a retrieved passage and the model repeated it. Models transcribe figures incorrectly, mix up units, and pull the wrong line from a table. Run each numeric claim in the output through a check against the source text, and surface any mismatch rather than silently accepting the model's value. For derived figures like ratios, compute them deterministically and have the model present the verified result.

Numeric transcription errors are among the most common and most damaging failures in filing extraction. A per-number check catches them before they reach a decision.
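A minimal version of the per-number check is sketched below. It assumes plain-text source and output, strips `[n]` citation markers, and compares literal values only. It does not understand units or scale words, so "4.2 billion" and "4,200 million" would not match, and identifiers such as "10-K" would be read as numbers. The function names are hypothetical.

```python
import re
from decimal import Decimal

CITATION = re.compile(r"\[\d+\]")
# Numbers with thousands separators (4,200.5) or plain digits (2023, 0.35).
NUMBER = re.compile(r"\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?")

def extract_numbers(text):
    """Return the set of numeric values in text, ignoring citation markers."""
    text = CITATION.sub(" ", text)
    return {Decimal(m.replace(",", "")) for m in NUMBER.findall(text)}

def unverified_numbers(model_output, source_text):
    """Numbers the model stated that do not appear anywhere in the source."""
    return sorted(extract_numbers(model_output) - extract_numbers(source_text))

def ratio(numerator, denominator, places=4):
    """Compute a derived figure in code instead of asking the model to do arithmetic."""
    return (Decimal(numerator) / Decimal(denominator)).quantize(Decimal(1).scaleb(-places))

source = "Total revenue was $4,200 million in 2023, up from $3,900 million in 2022."
output = "Revenue rose to $4,300 million in 2023 from $3,900 million [1]."
print(unverified_numbers(output, source))  # [Decimal('4300')]
```

Any non-empty result should be surfaced to the user or sent back for correction. For a derived figure, pass already-verified inputs to something like `ratio` and have the model present that result.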

Use the tool: Hallucination Detector
Paste a source document and an LLM's extraction; every numeric claim in the output is checked against the source. Runs client-side and catches silent fabrication.

Common Mistakes

The misses that undo good inputs

Chunking by character count and ignoring structure

Arbitrary cuts split tables from headers and merge unrelated sections, which degrades retrieval relevance and feeds the model numbers stripped of their labels. Structure-aware chunking is the single highest-leverage fix.

Trusting retrieved numbers without verification

Retrieval gets the right passage in front of the model, but the model can still transcribe a figure wrong or read the wrong table row. Without a numeric check, these errors flow straight into the answer with full confidence.

Ignoring restatements and point-in-time dates

Answering a historical question with later restated figures is look-ahead bias. A backtest or analysis built on it will look better than reality, because it used numbers that were not available at the time.

Try These Tools

Run the numbers next

Financial Document Token Estimator

Paste a 10-K, 10-Q, 8-K or earnings transcript and see token count + one-pass extraction cost across ten frontier LLMs, with cache-hit toggle.

Structured Schema Validator for Finance

Paste LLM JSON output and validate against four pre-built finance schemas (research output, trade decision, risk snapshot, peer comparison), with sanity.

FAQ

Questions people ask next

The short answers readers usually want after the first pass.

What chunk size should I use for SEC filings?

There is no universal size, but the goal is self-contained passages that respect structure. Too large and retrieval returns diluted, off-topic context; too small and a chunk loses the surrounding context needed to interpret it, while the chunk count and embedding cost explode. Start by aligning chunks to filing sections and items, then tune size and overlap against retrieval quality on your own queries rather than picking a fixed token count.
