TL;DR
Free-form "summarize this 10-K" prompts produce unreliable, unreproducible output. Three structured patterns — field-by-field JSON extraction, citation-required verbatim quoting, and contradiction-triangle cross-check — produce extractions that are auditable, hallucination-checkable, and cheap enough to run on every filing in a watchlist. Below: each pattern as a copy-paste system prompt, what it costs, and how it composes with the Hallucination Detector.
The failure mode
You ask an LLM to "summarize the risks in this 10-K." The output is fluent, coherent, plausible. Three of the ten listed risks don't appear in the filing. One is materially restated in the LLM's words. Two are paraphrased accurately. Four are conflated with risks from a completely different filing the LLM remembers from training data.
This is the normal case for unstructured extraction. The fix is not "prompt better." The fix is structure plus verification.
Pattern 1: Field-by-field JSON extraction
Don't ask for a summary. Ask for a structured object with exact field names:
You are an SEC filings extractor. For the attached filing, return a JSON
object with EXACTLY these keys. Use null when the filing does not state
the value. Do NOT infer. Do NOT estimate. Do NOT use knowledge of other
filings.
{
"company_legal_name": string | null,
"period_end": string | null, // YYYY-MM-DD
"fiscal_year": integer | null,
"revenue_usd": number | null,
"net_income_usd": number | null,
"operating_cash_flow_usd": number | null,
"capex_usd": number | null,
"shares_outstanding": integer | null,
"risk_factors": string[] // up to 10, each <= 150 chars, verbatim or paraphrased
}
Return only the JSON. No prose. No code fences.
This forces the model to commit to specific fields. Nullability signals when the filing doesn't contain a value, and each numeric field is independently checkable. Budget: ~200–500 output tokens at Sonnet pricing, roughly $0.003 per filing.
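The schema is only useful if off-schema output fails loudly. A minimal validator sketch in Python (the keys mirror the prompt above; the function name and error handling are illustrative, not a fixed API):

```python
import json

# Allowed types per key, mirroring the Pattern 1 schema above.
SCHEMA = {
    "company_legal_name": (str,),
    "period_end": (str,),
    "fiscal_year": (int,),
    "revenue_usd": (int, float),
    "net_income_usd": (int, float),
    "operating_cash_flow_usd": (int, float),
    "capex_usd": (int, float),
    "shares_outstanding": (int,),
    "risk_factors": (list,),
}

def validate_extraction(raw: str) -> dict:
    """Parse the model's output and reject anything off-schema.

    Raises ValueError so a bad extraction fails loudly instead of
    silently flowing downstream.
    """
    obj = json.loads(raw)  # raises if the model added prose or fences
    if set(obj) != set(SCHEMA):
        raise ValueError(f"key mismatch: {set(obj) ^ set(SCHEMA)}")
    for key, types in SCHEMA.items():
        value = obj[key]
        if value is None and key != "risk_factors":
            continue  # null is a legal "not stated" signal
        if not isinstance(value, types):
            raise ValueError(f"{key}: expected {types}, got {type(value)}")
    risks = obj["risk_factors"]
    if len(risks) > 10 or any(len(r) > 150 for r in risks):
        raise ValueError("risk_factors violates the length limits")
    return obj
```

Rejecting extra or missing keys (rather than ignoring them) catches the common failure where the model invents a plausible-sounding field like "ebitda_usd".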
Pattern 2: Citation-required verbatim quoting
For any numeric claim, require the model to quote the exact source sentence:
You are an SEC filings extractor. For each claim you make, you must quote
the exact sentence from the filing that supports it.
Return a JSON array of claims, each shaped:
{
"claim": string,
"value": number | string | null,
"source_quote": string, // EXACT sentence from the filing, copy-paste
"page_or_section": string // where in the filing the quote appeared
}
If you cannot find a verbatim source quote, omit the claim. Do NOT paraphrase.
The source_quote field is the audit anchor. After extraction, a post-processor checks that each source_quote appears literally in the filing text. Any claim whose quote doesn't match is hallucinated and dropped.
This is strictly more expensive than Pattern 1 (longer outputs) but substantially more reliable. Use for high-stakes extractions; skip for screening.
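The literal-match post-processor is a few lines. A sketch, assuming the filing has already been extracted to a plain string; whitespace is normalized on both sides so line wrapping in the source doesn't cause false negatives (function names are illustrative):

```python
import re

def normalize(text: str) -> str:
    """Collapse runs of whitespace so line wraps don't break matching."""
    return re.sub(r"\s+", " ", text).strip()

def filter_grounded(claims: list[dict], filing_text: str) -> tuple[list, list]:
    """Split claims into (grounded, hallucinated) by literal quote lookup."""
    haystack = normalize(filing_text)
    grounded, dropped = [], []
    for claim in claims:
        quote = normalize(claim.get("source_quote") or "")
        (grounded if quote and quote in haystack else dropped).append(claim)
    return grounded, dropped
```

Anything in the dropped list is a hallucinated or paraphrased quote and never reaches downstream consumers.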
Pattern 3: Contradiction-triangle cross-check
For claims that matter to a trade decision, run the extraction three times with different framings:
Framing A: "What is the company's stated revenue for the fiscal year?"
Framing B: "What is the largest revenue figure mentioned in the income statement?"
Framing C: "Quote the sentence that states the company's total revenue."
All three should converge on the same number. If they diverge, the filing text is ambiguous (or the LLM is hallucinating). Ambiguous filings get flagged for human review; unambiguous ones pass through.
The cost is 3× Pattern 1 (~$0.009 per filing) but the catch rate for subtle errors is materially higher.
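The convergence check itself is trivial once each framing's answer has been parsed to a number. A sketch; the relative tolerance absorbs unit-rounding differences (e.g. "$4,217.3 million" vs. "$4,217 million") and the 0.1% figure is an arbitrary choice here, not a recommendation from any source:

```python
def triangle_converges(a: float, b: float, c: float, rel_tol: float = 0.001) -> bool:
    """True when all three framings agree within rel_tol of their median."""
    values = sorted([a, b, c])
    median = values[1]
    if median == 0:
        return a == b == c == 0
    return all(abs(v - median) / abs(median) <= rel_tol for v in values)
```

A False result routes the filing to the human-review queue; True passes it through.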
Wiring into the Hallucination Detector
Output from any of these patterns can be fed directly into /tools/hallucination-detector/ along with the source filing:
- Paste the source filing's text in the "Source" box.
- Paste the LLM's output in the "LLM extraction" box.
- The detector highlights every numeric claim; ungrounded ones are flagged in rose, grounded ones in emerald.
- Ungrounded claims show the nearest numeric value in the source so the reviewer can tell "it meant this other number" from "it made up the number."
This is the simplest production pattern: extract with Pattern 1 for coverage, cross-check with the Hallucination Detector for safety, run Pattern 3 contradiction-triangle when stakes are material.
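The detector's core check, splitting numeric claims into grounded and ungrounded with a nearest-value hint, can be approximated in a few lines. A sketch only (the real tool does more; the regex and function names here are illustrative):

```python
import re

NUM_RE = re.compile(r"-?\d[\d,]*\.?\d*")

def numbers_in(text: str) -> list[float]:
    """Pull every numeric literal out of a text, ignoring thousands commas."""
    return [float(m.replace(",", "")) for m in NUM_RE.findall(text)]

def ground_numeric_claims(extraction: str, source: str) -> list[dict]:
    """Flag each number in the extraction as grounded (appears in the
    source) or ungrounded (with the nearest source value as a hint)."""
    source_nums = numbers_in(source)
    report = []
    for n in numbers_in(extraction):
        if n in source_nums:
            report.append({"value": n, "grounded": True})
        else:
            nearest = min(source_nums, key=lambda s: abs(s - n)) if source_nums else None
            report.append({"value": n, "grounded": False, "nearest_in_source": nearest})
    return report
```

The nearest_in_source hint is what lets a reviewer distinguish "it meant this other number" from "it made up the number".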
Cost at scale
For a 500-ticker watchlist on quarterly 10-Q filings:
- Pattern 1 only: 500 × $0.003 = $1.50 / quarter.
- Pattern 1 + Hallucination Detector (no extra LLM cost): $1.50 / quarter.
- Pattern 1 + Pattern 3 contradiction-triangle on flagged items: assume 10% flag rate → $1.50 + 50 × $0.009 = $1.95 / quarter.
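The arithmetic above can be wrapped in a one-line cost model for experimenting with different flag rates and per-filing prices (the defaults are this article's estimates, not published pricing):

```python
def quarterly_cost(tickers: int, p1_cost: float = 0.003,
                   triangle_cost: float = 0.009, flag_rate: float = 0.0) -> float:
    """Pattern 1 on every filing, plus the 3x triangle on the flagged share."""
    return tickers * p1_cost + tickers * flag_rate * triangle_cost

quarterly_cost(500)                  # Pattern 1 only, ~$1.50 / quarter
quarterly_cost(500, flag_rate=0.10)  # + triangle on 10% flagged, ~$1.95
```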
For reference, the Token-Cost Optimizer models loops like this across all 8 tracked LLMs — use it to pick the right model for your specific workload.
What these patterns don't fix
- Filing errors themselves. If the 10-K has a typo or misstatement, the extractor will faithfully extract the error. Cross-reference against the 10-K's audited financials and the company's 8-K corrections.
- Financial-reporting games. Non-GAAP adjustments, one-time items, restated comparatives — the extractor sees what the filing says, not what it means. A structured extractor is necessary but not sufficient for a trading thesis.
- Missing context. A risk factor that says "continued pricing pressure in specialty components" means one thing in 2019 and another in 2025. The LLM doesn't know the context drift; you do. Use the extracted text as a starting point, not a conclusion.
References
- Bailey, D. H., & López de Prado, M. — on selection bias from over-extraction.
- SEC EDGAR filings corpus — the authoritative source; always cross-check against the filing itself, not the LLM output.
- Anthropic prompt caching documentation — for cost reduction on repeated filings-style prompts.