Does prompt caching really make few-shot cheap?

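For high-volume extraction, largely yes. The few-shot example block is identical across every document you process, so it qualifies as a cacheable prefix. Once it is cached, later calls pay a reduced rate for those tokens instead of the full input price. The examples' share of each document's cost drops to that reduced rate, and the one-time cost of writing the cache is spread across the whole batch. The variable part, the target document, changes on every call and gains nothing from caching, but the fixed instruction-plus-example prefix is exactly the kind of stable content caching is designed to amortize. Two conditions apply. The fixed block must come first and be identical on every call, because caches match on the prompt prefix. Cached entries also expire after a period of inactivity, so the discount assumes a steady stream of calls.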
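The amortization argument is simple arithmetic, sketched below under loudly stated assumptions. The `Pricing` values (a $3 per million token input rate, a 1.25x surcharge to write the cache, and a 0.10x rate to read it) are hypothetical placeholders, not any provider's published prices. The sketch also assumes every call after the first hits the cache, and it ignores output tokens and minimum cacheable prefix lengths.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    """Hypothetical pricing in dollars per million input tokens.

    The multipliers are illustrative assumptions: writing the cache costs a
    little more than a normal input token, reading it costs a fraction.
    """
    input_per_mtok: float = 3.00
    cache_write_multiplier: float = 1.25
    cache_read_multiplier: float = 0.10


def batch_cost(n_docs: int, prefix_tokens: int, doc_tokens: int,
               pricing: Pricing, cached: bool) -> float:
    """Total input cost of n_docs calls that share one fixed prefix."""
    if n_docs <= 0:
        return 0.0
    rate = pricing.input_per_mtok / 1_000_000  # dollars per token
    doc_cost = n_docs * doc_tokens * rate  # the target document is never reused
    if not cached:
        return doc_cost + n_docs * prefix_tokens * rate
    # The first call writes the prefix to the cache; every later call reads it.
    write = prefix_tokens * rate * pricing.cache_write_multiplier
    reads = (n_docs - 1) * prefix_tokens * rate * pricing.cache_read_multiplier
    return doc_cost + write + reads


if __name__ == "__main__":
    # Hypothetical batch: 10,000 filings, a 4,000-token prefix, 6,000-token documents.
    for flag in (False, True):
        cost = batch_cost(10_000, 4_000, 6_000, Pricing(), cached=flag)
        print(f"cached={flag}: ${cost:,.2f}")
```

Under these assumed rates the batch costs about $300 uncached and about $192 cached, and almost all of the remaining cost is the documents themselves, which is the sense in which caching makes the examples cheap.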

Can few-shot examples hurt accuracy?

Yes, if they are unrepresentative or biased. Examples teach the model a pattern, so if they all show one variant of a field, the model may wrongly impose that variant on documents that differ, a form of overfitting to the demonstrations. Examples drawn from a narrow slice of your corpus can also steer the model away from cases they do not cover. The fix is to curate examples that span the real variety of your documents and to re-check accuracy on a held-out labeled sample after changing them.
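Re-checking accuracy on a held-out labeled sample can be as simple as per-field exact-match scoring, sketched below. The function names and the sample records (tickers `ACME` and `BETA`) are hypothetical. The scenario shows the overfitting failure described above: a new example set that only ever showed fiscal years as `"FY2024"`-style strings pushes the model to emit that variant where the label is a plain integer.

```python
from collections import Counter


def field_accuracy(predictions: list[dict], labels: list[dict]) -> dict[str, float]:
    """Per-field exact-match accuracy of extracted records against gold labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must be aligned one-to-one")
    correct = Counter()
    total = Counter()
    for pred, gold in zip(predictions, labels):
        for field, expected in gold.items():
            total[field] += 1
            if pred.get(field) == expected:
                correct[field] += 1
    return {field: correct[field] / total[field] for field in total}


def regressions(before: dict[str, float], after: dict[str, float],
                tolerance: float = 0.0) -> list[str]:
    """Fields whose accuracy dropped by more than `tolerance` after a prompt change."""
    return sorted(f for f, score in before.items()
                  if after.get(f, 0.0) < score - tolerance)


if __name__ == "__main__":
    # Hypothetical held-out sample and outputs from two prompt versions.
    labels = [{"ticker": "ACME", "fiscal_year": 2024},
              {"ticker": "BETA", "fiscal_year": 2023}]
    old_prompt = [{"ticker": "ACME", "fiscal_year": 2024},
                  {"ticker": "BETA", "fiscal_year": 2023}]
    new_prompt = [{"ticker": "ACME", "fiscal_year": 2024},
                  {"ticker": "BETA", "fiscal_year": "FY2023"}]
    before = field_accuracy(old_prompt, labels)
    after = field_accuracy(new_prompt, labels)
    print(regressions(before, after))  # → ['fiscal_year']
```

Exact match is deliberately strict; fields such as monetary amounts may warrant a normalized or tolerance-based comparison instead.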

Zero-Shot vs Few-Shot Extraction

When extracting structured data from filings with an LLM, you choose how much to show the model before the target document. Zero-shot gives the instructions and the document, trusting the model's general capability. Few-shot prepends a handful of input-output examples that demonstrate exactly the format and edge-case handling you want. More examples cost more input tokens per call but steer the model toward consistent, correct output. With prompt caching, that fixed example block can be cached once and reused across thousands of documents at a discounted input rate, which changes the cost comparison. This matrix compares the two approaches for production extraction.
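A minimal sketch of the two prompt shapes, assuming a single plain-text prompt rather than any particular provider's message format. The `Example` type, the `build_prompt` helper, and the section labels are hypothetical. What matters is the ordering: instructions and examples come first and the target document comes last, so the fixed part is identical on every call and can serve as a cacheable prefix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """One hypothetical input-output demonstration."""
    document: str
    output_json: str


def build_prompt(instructions: str, target_document: str,
                 examples: tuple[Example, ...] = ()) -> str:
    """Zero-shot when `examples` is empty, few-shot otherwise.

    The fixed part (instructions plus examples) always comes first and the
    variable target document comes last, so the prefix stays byte-identical
    across calls that share the same instructions and examples.
    """
    parts = [instructions.strip()]
    for i, ex in enumerate(examples, start=1):
        parts.append(f"Example {i} input:\n{ex.document.strip()}")
        parts.append(f"Example {i} output:\n{ex.output_json.strip()}")
    parts.append(f"Target document:\n{target_document.strip()}")
    return "\n\n".join(parts)


if __name__ == "__main__":
    demos = (Example("Revenue was $5M.", '{"revenue_usd": 5000000}'),)
    print(build_prompt("Extract revenue as JSON.", "Revenue was $7M.", demos))
```

Putting the document last is the design choice that makes caching possible; interleaving it before the examples would change the prefix on every call.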

Published May 26, 2026

By AI Fin Hub Research · AI Fin Hub Team

Zero-Shot Prompting

Provides only task instructions and the target document, with no examples. Relies entirely on the model's general ability to follow the spec.

Pros

Cheapest per call, since the prompt carries no example tokens
Simplest to write and maintain, with nothing to curate but the instructions
Fast to iterate when the task is easy and the model already handles it well
No risk of examples biasing the model toward an unrepresentative pattern

Cons

Less consistent output format, often needing a schema validator to catch deviations
Weaker on edge cases and ambiguous fields the instructions cannot fully specify
More sensitive to instruction wording, so small prompt changes can swing results
Higher error rate on nuanced extraction, where a demonstration would clarify intent

Best for: simple, well-specified extractions, rapid prototyping, and capable models on tasks they already perform reliably.

Few-Shot Prompting

Prepends several input-output examples that demonstrate the desired format and edge-case handling before the target document.

Pros

Markedly better format adherence, since examples show the exact output shape expected
Higher accuracy on edge cases the examples cover, reducing ambiguous-field errors
More robust to instruction wording, because the examples anchor the behavior
The fixed example block is ideal for prompt caching, amortizing its cost across many calls

Cons

More input tokens per call before caching, raising cost and latency
Examples must be curated and kept representative, or they bias the model wrongly
Poorly chosen examples can overfit the model to a narrow pattern and hurt generalization
Longer prompts can crowd a small context window on very large documents

Best for: high-volume extraction that needs consistent format and nuanced fields, in pipelines that can cache the example block across calls.
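The context-window concern can be checked before each call with a rough budget, sketched below. The roughly four-characters-per-token ratio is a common rule of thumb for English text, not an exact count, and the 2,000-token output reserve and the helper names are hypothetical; in production the provider's own tokenizer should replace `approx_tokens`.

```python
import math


def approx_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count; a real tokenizer is more accurate."""
    return math.ceil(len(text) / chars_per_token)


def document_budget(prefix: str, context_window: int,
                    output_reserve: int = 2_000) -> int:
    """Tokens left for the target document after the prefix and output reserve."""
    return context_window - approx_tokens(prefix) - output_reserve


def fits_context(prefix: str, document: str, context_window: int,
                 output_reserve: int = 2_000) -> bool:
    """True if prefix + document + reserved output tokens fit in the window."""
    return approx_tokens(document) <= document_budget(prefix, context_window,
                                                      output_reserve)
```

When a large filing fails this check, the usual options are trimming the example block or splitting the document into chunks, each of which trades some accuracy or cost.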

Decision Table

Criterion	Zero-Shot Prompting	Few-Shot Prompting
Examples in prompt	None	A handful, curated
Format consistency	Lower	Higher
Edge-case accuracy	Weaker	Stronger on covered cases
Tokens per call	Fewer	More, before caching
Prompt-cache fit	Only instructions to cache	Fixed example block caches well
Maintenance	Instructions only	Curate and refresh examples

Verdict

For one-off or simple, well-specified extractions, zero-shot is the right starting point: it is cheap, fast, and a capable model often handles a clear task without demonstrations. Once you move to high-volume production extraction with nuanced fields and a strict output schema, few-shot examples usually pay for themselves by cutting format violations and edge-case errors, the failures that quietly corrupt a dataset. The token-cost objection to few-shot largely dissolves with prompt caching: the example block is fixed across every document, so it can be cached once and reused at a discounted rate over thousands of calls, making the marginal cost of the examples small.

The practical path is to prototype zero-shot, measure where it fails on a labeled sample, and add the minimum set of representative examples that fixes those specific failure modes, keeping them curated so they do not bias the model toward an unrepresentative pattern. Pair either approach with a schema validator, since neither guarantees structurally valid output on its own.
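A schema validator does not need to be elaborate to catch the common failures, as the standard-library sketch below suggests. `FILING_SCHEMA` and its three fields are hypothetical; a real pipeline would more likely use a JSON Schema library or a provider's structured-output feature. The sketch checks that the output parses, that required fields are present with the expected types, and that no extra fields appear.

```python
import json

# Hypothetical schema for illustration: field name -> accepted Python type(s).
FILING_SCHEMA = {
    "ticker": str,
    "fiscal_year": int,
    "revenue_usd": (int, float),
}


def validate(raw: str, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the output passed."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return ["top-level value must be a JSON object"]
    errors = []
    for field, expected in schema.items():
        value = record.get(field)
        if field not in record:
            errors.append(f"missing field: {field}")
        # bool is a subclass of int in Python, so reject it explicitly;
        # this schema has no boolean fields.
        elif isinstance(value, bool) or not isinstance(value, expected):
            errors.append(f"wrong type for {field}: {type(value).__name__}")
    for field in sorted(record.keys() - schema.keys()):
        errors.append(f"unexpected field: {field}")
    return errors
```

Records that fail can be retried, routed to review, or dropped, but they should never be written to the dataset silently.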

FAQ

How many few-shot examples should I include?

Usually a small number, often two to five, is enough to lock in the format and demonstrate the main edge cases; the gains diminish quickly, and very long example blocks can crowd the context and even confuse the model. The right count is empirical: add examples that target your observed failure modes and stop when accuracy on a labeled sample plateaus. The quality and representativeness of the examples matter far more than their number, since a few well-chosen demonstrations beat many redundant ones.
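One way to put "add examples until accuracy plateaus" into practice is greedy forward selection, sketched below as a swapped-in technique rather than a prescribed method. `evaluate` is a hypothetical callback you would supply: it should run the extraction with the given example list over your labeled sample and return accuracy. The `min_gain` and `max_examples` defaults are illustrative assumptions. Each round calls `evaluate` once per remaining candidate, so the model cost grows with pool size times the number of examples chosen.

```python
from typing import Callable, Sequence, TypeVar

E = TypeVar("E")


def select_examples(candidates: Sequence[E],
                    evaluate: Callable[[list[E]], float],
                    min_gain: float = 0.01,
                    max_examples: int = 5) -> list[E]:
    """Greedily add the candidate that most improves accuracy; stop on a plateau."""
    chosen: list[E] = []
    best_score = evaluate(chosen)  # zero-shot baseline
    remaining = list(candidates)
    while remaining and len(chosen) < max_examples:
        # Score every remaining candidate appended to the current set.
        scored = [(evaluate(chosen + [c]), i) for i, c in enumerate(remaining)]
        score, idx = max(scored)
        if score - best_score < min_gain:
            break  # the best available addition no longer helps enough
        chosen.append(remaining.pop(idx))
        best_score = score
    return chosen
```

Because the selection is scored on the same labeled sample, confirming the final set on a separate held-out sample guards against picking examples that only fit the tuning data.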
