Can over-constraining the schema cause problems?

Yes. When a schema marks fields as required, the model is pushed to supply a value even for fields a particular document does not contain, which can produce confident fabrications rather than an honest omission. The fix is to design the schema to reflect reality: make fields optional or nullable where the data may be absent, allow an explicit not-found value, and avoid forcing the model to invent data. A validator alone cannot catch a plausible-but-wrong value, so schema design matters as much as enforcement.
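As a minimal sketch of a schema that leaves room for absence, the JSON Schema below (written as a Python dictionary) uses nullable types and an explicit not-found enum value. The field names and the `not_found` label are hypothetical, chosen only for illustration. Some strict modes require every key to be listed as required, in which case a nullable type is how absence is expressed; check your provider's documentation.

```python
import json

# Hypothetical extraction schema; every field name here is illustrative.
FILING_SCHEMA = {
    "type": "object",
    "properties": {
        "company_name": {"type": "string"},
        # Nullable: the model may return null when the filing has no such figure.
        "total_revenue": {"type": ["number", "null"]},
        "dividend_per_share": {"type": ["number", "null"]},
        # Explicit not-found value instead of forcing a guess.
        "auditor_opinion": {
            "type": "string",
            "enum": ["unqualified", "qualified", "adverse", "not_found"],
        },
    },
    "required": ["company_name", "total_revenue", "dividend_per_share", "auditor_opinion"],
    "additionalProperties": False,
}


def absent_fields(record: dict) -> list[str]:
    """Return the fields the model reported as absent (null or 'not_found')."""
    return sorted(k for k, v in record.items() if v is None or v == "not_found")


record = json.loads(
    '{"company_name": "Acme Corp", "total_revenue": 1250.0, '
    '"dividend_per_share": null, "auditor_opinion": "not_found"}'
)
print(absent_fields(record))  # ['auditor_opinion', 'dividend_per_share']
```

Downstream code can then treat an absent field as a signal to review or skip, rather than trusting a value the model was forced to supply.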

Why still validate output if strict mode enforces the schema?

Strict structured-output modes guarantee structural conformance (types and required fields) but not semantic correctness: a value can be the right type and still be wrong, such as a number extracted from the wrong line of a filing. Validation should therefore extend beyond schema checks to sanity rules, like range and cross-field consistency checks, and ideally to source grounding that ties each extracted number back to the document. Enforcement handles structure; you still own correctness.
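The three kinds of checks can be sketched roughly as follows. The field names, the tolerance, and the specific rules are assumptions made for illustration, and the grounding check is deliberately crude: it only confirms that a number appears somewhere in the source, not that it came from the right line.

```python
import re

# Hypothetical balance-sheet fields; names are illustrative only.
NUMERIC_FIELDS = ("total_assets", "total_liabilities", "total_equity")


def sanity_check(record: dict, source_text: str, tolerance: float = 0.5) -> list[str]:
    """Return a list of problems found in a schema-valid extraction."""
    problems = []
    values = {name: record.get(name) for name in NUMERIC_FIELDS}

    # Range rule: total assets should not be negative.
    assets = values["total_assets"]
    if assets is not None and assets < 0:
        problems.append("total_assets is negative")

    # Cross-field rule: assets = liabilities + equity, within a rounding tolerance.
    if all(v is not None for v in values.values()):
        gap = assets - (values["total_liabilities"] + values["total_equity"])
        if abs(gap) > tolerance:
            problems.append("balance sheet does not balance")

    # Source grounding: every extracted number must appear in the source text.
    found = {
        float(m.replace(",", ""))
        for m in re.findall(r"\d[\d,]*(?:\.\d+)?", source_text)
    }
    for name, value in values.items():
        if value is not None and float(value) not in found:
            problems.append(f"{name} not found in source")
    return problems


source = "Total assets were 500.0. Total liabilities were 300.0 and total equity was 200.0."
good = {"total_assets": 500.0, "total_liabilities": 300.0, "total_equity": 200.0}
bad = {"total_assets": 500.0, "total_liabilities": 300.0, "total_equity": 250.0}
print(sanity_check(good, source))  # []
print(sanity_check(bad, source))
```

Both records would pass a type-only schema check; only the second list of problems reveals that one value is wrong.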

JSON Mode vs Tool Calling for Extraction

Getting structured data out of an LLM means choosing how to constrain the output. Plain JSON mode tells the model to produce valid JSON, which eliminates syntax errors but does not guarantee that the fields, types, and required keys match what you need. Tool calling defines a function with a typed parameter schema; the model fills in the arguments, and the platform can validate them against that schema. The line between the two has blurred as providers add strict structured-output modes that guarantee schema conformance. The real question is how much of the burden of producing correct, schema-valid output the platform carries and how much falls on your validation code. This matrix compares the two approaches for extraction.

6 criteria · Published May 26, 2026

By AI Fin Hub Research · AI Fin Hub Team

JSON Mode Option

Instructs the model to return syntactically valid JSON. Guarantees parseable output but, in its basic form, not conformance to a particular schema.

Pros

Guarantees the output parses as JSON, eliminating a whole class of syntax errors
Simple to invoke and lightweight, with no function or tool schema to define
Flexible when the exact shape can vary and any valid JSON object is acceptable
Lower prompt overhead than passing a full tool schema for simple cases

Cons

Basic JSON mode does not enforce your field names, types, or required keys
The model can return valid JSON that is structurally wrong for your schema
You still need a downstream schema validator to catch missing or mistyped fields
Multi-field consistency and enums are not guaranteed without extra constraints

Best for: cases where any syntactically valid JSON suffices, simple shapes, and quick prototypes before a strict schema is needed
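To make the gap concrete, here is a small sketch of the downstream check that basic JSON mode leaves to you. The expected keys and the sample payload are hypothetical; a production pipeline would more likely use a full schema validator than this hand-rolled check.

```python
import json

# Hypothetical expected shape: key -> accepted Python types.
EXPECTED = {
    "ticker": (str,),
    "fiscal_year": (int,),
    "total_revenue": (int, float),
}


def shape_errors(raw: str) -> list[str]:
    """Parse JSON-mode output and list the ways it deviates from EXPECTED."""
    data = json.loads(raw)  # succeeds for any syntactically valid JSON
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    errors = []
    for key, types in EXPECTED.items():
        if key not in data:
            errors.append(f"missing key: {key}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(data[key], bool) or not isinstance(data[key], types):
            errors.append(f"wrong type for {key}")
    for key in sorted(data.keys() - EXPECTED.keys()):
        errors.append(f"unexpected key: {key}")
    return errors


# Valid JSON, but a renamed key and a year returned as a string.
raw = '{"symbol": "ACME", "fiscal_year": "2025", "total_revenue": 1250.0}'
print(shape_errors(raw))
# ['missing key: ticker', 'wrong type for fiscal_year', 'unexpected key: symbol']
```

The payload parses without complaint, which is all basic JSON mode promises; every error in the list is something your own code has to catch.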

Tool Calling (Function Calling) Option

Defines a function with a typed parameter schema that the model populates. Strict or structured variants enforce the schema, so arguments must match types and required fields.

Pros

Enforces a typed schema: field names, types, required keys, and enums are constrained
Strict structured-output modes guarantee the output conforms, removing most validation failures
Natural fit for multi-field extraction and for routing to multiple typed operations
Self-documenting, since the schema is the contract the model and your code both agree on

Cons

More setup: you must define and maintain the function schema
Schema tokens add to the prompt, though they are stable and cacheable
Over-constraining can cause the model to force-fit data into fields that do not apply
Behavior and strictness guarantees vary across providers and model versions

Best for: reliable multi-field extraction, strict schema conformance, and any pipeline that treats the output as a typed contract
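A tool definition and the code that receives its arguments might look roughly like this. The tool name, fields, and handler are hypothetical, and the wrapper keys around the schema differ by provider, so treat the layout as a generic illustration rather than any specific API's request format.

```python
import json

# Hypothetical tool definition: a name, a description, and a typed parameter schema.
RECORD_FILING_FACTS = {
    "name": "record_filing_facts",
    "description": "Record facts extracted from a single filing.",
    "parameters": {
        "type": "object",
        "properties": {
            "ticker": {"type": "string"},
            "fiscal_year": {"type": "integer"},
            "filing_type": {"type": "string", "enum": ["10-K", "10-Q", "8-K"]},
        },
        "required": ["ticker", "fiscal_year", "filing_type"],
        "additionalProperties": False,
    },
}


def record_filing_facts(ticker: str, fiscal_year: int, filing_type: str) -> dict:
    """Hypothetical handler: the schema's fields map directly onto its arguments."""
    return {"key": f"{ticker}-{fiscal_year}-{filing_type}"}


# Routing table: one entry per tool the model is allowed to call.
HANDLERS = {"record_filing_facts": record_filing_facts}


def dispatch(tool_name: str, arguments_json: str) -> dict:
    """Route a tool call (name plus JSON-encoded arguments) to its handler."""
    handler = HANDLERS[tool_name]
    return handler(**json.loads(arguments_json))


result = dispatch(
    "record_filing_facts",
    '{"ticker": "ACME", "fiscal_year": 2025, "filing_type": "10-K"}',
)
print(result)  # {'key': 'ACME-2025-10-K'}
```

Because the schema and the handler signature share the same field names, the schema doubles as the contract between the model and your code.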

Decision Table

See the tradeoffs side by side

Criterion	JSON Mode	Tool Calling (Function Calling)
Guarantees valid JSON	Yes	Yes
Enforces your schema	No, basic mode	Yes, with strict mode
Field types and enums	Not constrained	Constrained
Setup effort	Low	Higher, define schema
Multi-field control	Manual	Native
Downstream validation	Still required	Structural checks largely handled by the platform; semantic checks still required

Verdict

The deciding question is whether the output must satisfy a specific schema or merely parse as JSON. If any valid JSON object will do, plain JSON mode is the lighter choice and works well for prototypes. Filing extraction, however, almost always has a real schema (named fields with types, required keys, and enumerated values), and basic JSON mode does not enforce it: the model can return valid JSON that is structurally wrong for your needs, leaving the burden on your validator. Tool calling, especially in a strict or structured-output mode, makes the schema a contract the platform enforces, which removes most structural failures at the source and is the natural fit for multi-field extraction. The caveat is that over-constraining can push the model to fabricate values for fields that do not apply to a given document, so design schemas with optional and nullable fields and still run a validator as a backstop. In short: use tool calling or strict structured output for production extraction, and reserve plain JSON mode for cases where the shape genuinely does not need to be pinned down.

FAQ

Questions people ask next

The short answers readers usually want after the first pass.

Basic JSON mode guarantees only that the output is syntactically valid JSON, not that it matches your field names, types, or required keys. The model can return a perfectly parseable object that omits a required field, uses the wrong type, or invents a key. To guarantee schema conformance you either use a tool-calling or structured-output mode that enforces a schema, or you keep JSON mode and add a strict downstream validator that rejects nonconforming output and triggers a retry.
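The second option, JSON mode plus a strict validator that triggers a retry, can be sketched as below. `call_model` is a hypothetical stand-in for whatever client function returns the model's raw text, and the required keys are illustrative.

```python
import json
from typing import Callable

# Hypothetical required shape: key -> expected Python type.
REQUIRED_KEYS = {"ticker": str, "fiscal_year": int}


def conforms(data: object) -> bool:
    """True only if data is an object with exactly the required keys and types."""
    return (
        isinstance(data, dict)
        and set(data) == set(REQUIRED_KEYS)
        and all(
            isinstance(data[k], t) and not isinstance(data[k], bool)
            for k, t in REQUIRED_KEYS.items()
        )
    )


def extract_with_retry(
    call_model: Callable[[str], str], prompt: str, max_attempts: int = 3
) -> dict:
    """Call the model until its output parses and conforms, or attempts run out."""
    for _ in range(max_attempts):
        raw = call_model(prompt)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # not even valid JSON: retry
        if conforms(data):
            return data
    raise ValueError(f"no conforming output after {max_attempts} attempts")


# Stub model for demonstration: the first reply omits a required key.
replies = iter(['{"ticker": "ACME"}', '{"ticker": "ACME", "fiscal_year": 2025}'])
result = extract_with_retry(lambda prompt: next(replies), "Extract ticker and fiscal year.")
print(result)  # {'ticker': 'ACME', 'fiscal_year': 2025}
```

Capping the number of attempts keeps a persistently nonconforming response from turning into an unbounded retry loop.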
