Zero-Shot vs Few-Shot Extraction
When extracting structured data from filings with an LLM, you choose how much to show the model before the target document. Zero-shot gives instructions and the document, trusting the model's general capability. Few-shot prepends a handful of input-output examples that demonstrate exactly the format and edge-case handling you want. More examples cost more input tokens per call but steer the model toward consistent, correct output. With modern prompt caching, that fixed example block can be cached and reused across thousands of documents, which changes the economics. This matrix compares the two for production extraction.
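To make the distinction concrete, here is a minimal sketch of how the two prompts might be assembled. The `Example` class, the instruction text, the field names, and the `Document:` / `Output:` layout are all assumptions made for illustration, not a prescribed format.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Example:
    document: str  # input filing excerpt
    output: str    # exact output string the model should imitate


# Hypothetical task instructions; the field names are illustrative only.
INSTRUCTIONS = (
    "Extract the fiscal year and total revenue from the filing. "
    'Respond with JSON only, shaped like {"fiscal_year": 2023, "revenue_usd": 1000000}. '
    "Use null for revenue_usd if it is not stated."
)


def build_zero_shot(document: str) -> str:
    """Instructions plus the target document, with no demonstrations."""
    return f"{INSTRUCTIONS}\n\nDocument:\n{document}\n\nOutput:"


def few_shot_prefix(examples: Sequence[Example]) -> str:
    """The fixed block (instructions + examples) shared by every call."""
    shots = "\n\n".join(
        f"Document:\n{ex.document}\n\nOutput:\n{ex.output}" for ex in examples
    )
    return f"{INSTRUCTIONS}\n\n{shots}\n\n"


def build_few_shot(examples: Sequence[Example], document: str) -> str:
    """Fixed prefix first, then the target document in the same layout."""
    return f"{few_shot_prefix(examples)}Document:\n{document}\n\nOutput:"
```

Keeping the instructions and examples in one byte-identical prefix, with the target document last, is the property a prompt cache generally needs: only the tail of the prompt changes between calls.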
Zero-Shot Prompting
Provides only task instructions and the target document, with no examples. Relies entirely on the model's general ability to follow the spec.
Pros
- Cheapest per call, since the prompt carries no example tokens
- Simplest to write and maintain, with nothing to curate but the instructions
- Fast to iterate when the task is easy and the model already handles it well
- No risk of examples biasing the model toward an unrepresentative pattern
Cons
- Less consistent output format, often needing a schema validator to catch deviations
- Weaker on edge cases and ambiguous fields the instructions cannot fully specify
- More sensitive to instruction wording, so small prompt changes can swing results
- Higher error rate on nuanced extraction, where a demonstration would clarify intent
Best for: simple, well-specified extractions, rapid prototyping, and capable models on tasks they already do reliably.
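Because zero-shot output format tends to drift, a validator usually sits directly after the model call. The following is a minimal standard-library sketch; the two fields and their types are hypothetical, and a production pipeline would more likely use a full schema library.

```python
import json

# Hypothetical schema: field name -> accepted Python type(s).
REQUIRED_FIELDS = {
    "fiscal_year": int,
    "revenue_usd": (int, float, type(None)),
}


def validate_extraction(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the output passed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]

    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in data:
            problems.append(f"missing field: {name}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(data[name], bool) or not isinstance(data[name], expected):
            problems.append(f"wrong type for {name}: {type(data[name]).__name__}")
    for name in sorted(data.keys() - REQUIRED_FIELDS.keys()):
        problems.append(f"unexpected field: {name}")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to log which deviations are most common, which in turn suggests which examples would be worth adding.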
Few-Shot Prompting
Prepends several input-output examples that demonstrate the desired format and edge-case handling before the target document.
Pros
- Markedly better format adherence, since examples show the exact output shape expected
- Higher accuracy on edge cases the examples cover, reducing ambiguous-field errors
- More robust to instruction wording, because the examples anchor the behavior
- The fixed example block is ideal for prompt caching, amortizing its cost across many calls
Cons
- More input tokens per call before caching, raising cost and latency
- Examples must be curated and kept representative, or they bias the model wrongly
- Poorly chosen examples can overfit the model to a narrow pattern and hurt generalization
- Longer prompts can crowd a small context window on very large documents
Best for: high-volume extraction that needs a consistent format and nuanced fields, in pipelines that can cache the example block across calls.
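The caching argument is easiest to see as arithmetic. The sketch below uses a simplified, assumed pricing model: one cache write for the example block at a surcharge, then discounted cache reads on every later call, with the cache never expiring. The multipliers and prices are placeholders, not any provider's actual rates.

```python
def total_cost_uncached(n_docs: int, prefix_tokens: int, doc_tokens: int,
                        price_per_mtok: float) -> float:
    """Every call pays full input price for the example block and the document."""
    return n_docs * (prefix_tokens + doc_tokens) * price_per_mtok / 1_000_000


def total_cost_cached(n_docs: int, prefix_tokens: int, doc_tokens: int,
                      price_per_mtok: float,
                      write_mult: float, read_mult: float) -> float:
    """First call writes the example block to cache; later calls read it.

    write_mult and read_mult are assumed multipliers on the base input
    price. Cache expiry and minimum cacheable lengths are ignored.
    """
    if n_docs == 0:
        return 0.0
    prefix_cost = prefix_tokens * write_mult + (n_docs - 1) * prefix_tokens * read_mult
    return (prefix_cost + n_docs * doc_tokens) * price_per_mtok / 1_000_000


# Placeholder inputs: 1,000 filings of 8,000 tokens, a 2,000-token example block.
zero_shot = total_cost_uncached(1000, 0, 8000, price_per_mtok=1.0)
few_shot = total_cost_uncached(1000, 2000, 8000, price_per_mtok=1.0)
few_shot_cached = total_cost_cached(1000, 2000, 8000, price_per_mtok=1.0,
                                    write_mult=1.25, read_mult=0.1)
```

With these placeholder inputs the three totals come to 8.00, 10.00, and about 8.20, so under these assumptions caching removes most of the few-shot premium. Real savings depend on the provider's rates, cache lifetime, and how steadily the calls arrive.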
Decision Table
See the tradeoffs side by side
| Criterion | Zero-Shot Prompting | Few-Shot Prompting |
|---|---|---|
| Examples in prompt | None | A handful, curated |
| Format consistency | Lower | Higher |
| Edge-case accuracy | Weaker | Stronger on covered cases |
| Tokens per call | Fewer | More, before caching |
| Prompt-cache fit | Less to cache | Fixed block caches well |
| Maintenance | Instructions only | Curate and refresh examples |
Verdict
For one-off or simple, well-specified extractions, zero-shot is the right starting point: it is cheap, fast, and a capable model often handles a clear task without demonstrations. The moment you move to high-volume production extraction with nuanced fields and a strict output schema, few-shot examples usually pay for themselves by cutting format violations and edge-case errors, which are the failures that quietly corrupt a dataset. The token-cost objection to few-shot largely dissolves with prompt caching: the example block is fixed across every document, so it can be cached once and reused at a steep discount over thousands of calls, making the marginal cost of the examples small. The practical path is to prototype zero-shot, measure where it fails on a labeled sample, and add the minimum set of representative examples that fix those specific failure modes, keeping them curated so they do not bias the model toward an unrepresentative pattern. Pair either approach with a schema validator, since neither guarantees structurally valid output on its own.
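The "measure where it fails" step can be as simple as a per-field error rate over a labeled sample. This is a minimal sketch that assumes predictions and labels are already parsed into dictionaries; the function name and the exact-match comparison are illustrative choices, and numeric fields may need a tolerance instead.

```python
from collections import Counter
from typing import Any, Mapping, Sequence

_MISSING = object()  # sentinel so an absent field never matches a None label


def field_error_rates(
    predictions: Sequence[Mapping[str, Any]],
    labels: Sequence[Mapping[str, Any]],
) -> dict[str, float]:
    """Fraction of documents on which each labeled field was wrong or absent."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return {}

    errors: Counter = Counter()
    fields: set = set()
    for pred, gold in zip(predictions, labels):
        for name, expected in gold.items():
            fields.add(name)
            if pred.get(name, _MISSING) != expected:
                errors[name] += 1
    return {name: errors[name] / len(labels) for name in sorted(fields)}
```

Fields with high error rates under zero-shot are the natural candidates for a demonstration; fields the model already gets right do not need one.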