On the canonical trade-decision schema, the Structured Schema Validator (Finance) requires six fields (ticker, side, size_shares, rationale, stop_loss, take_profit). A valid payload passes with pass: true and zero sanity flags. A payload missing rationale, stop_loss, and take_profit fails with pass: false and three "required field is missing" entries. A payload with size_shares = 50,000,000 passes hard validation but raises a sanityFlag: "value 50000000 outside plausible band [1 … 1,000,000] shares". A payload with side = "maybe" fails on enum_mismatch because side ∉ [long, short, none]. Strict mode is the only mode that survives contact with an LLM output stream; lenient mode is responsible for most live-trade incidents.

TL;DR

Four payloads against the trade_decision schema:

Payload | Pass? | What fired
Valid (all six fields) | true | No sanity flags
Missing rationale / stop_loss / take_profit | false | 3× "required field is missing"
size_shares = 50,000,000 | true | sanityFlag: outside band [1, 1,000,000]
side = "maybe", size_shares = "lots" | false | enum_mismatch + type_mismatch

The schema enforces presence (required fields), type (number vs string), enum (side ∈ {long, short, none}), and soft band (sanity range for size_shares). The fourth check is the load-bearing one for catching LLM substitution failures that pass the first three.
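The four checks can be sketched in a few lines of Python. This is an illustrative sketch, not the engine's implementation: the function name validate_trade_decision, the rule table, the verdict keys, and the exact error strings are assumptions. The field names, the enum values, and the [1, 1,000,000] band come from the text above. The min and minLength rules are left out to keep the sketch short.

```python
from typing import Any

# Hypothetical rule table; field names, enum values and the band follow the article.
REQUIRED = {
    "ticker": str,
    "side": str,
    "size_shares": (int, float),
    "rationale": str,
    "stop_loss": (int, float),
    "take_profit": (int, float),
}
SIDE_ENUM = ("long", "short", "none")
SIZE_BAND = (1, 1_000_000)  # soft sanity band, in shares


def validate_trade_decision(payload: dict[str, Any]) -> dict[str, Any]:
    """Run presence, type, enum and soft-band checks; return an assumed verdict shape."""
    errors: list[str] = []
    sanity_flags: list[str] = []
    for field, expected in REQUIRED.items():
        if field not in payload:
            errors.append(f"{field}: required field is missing")
            continue
        value = payload[field]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            errors.append(f"{field}: type_mismatch")
            continue
        if field == "side" and value not in SIDE_ENUM:
            errors.append(f"{field}: enum_mismatch")
        if field == "size_shares" and not SIZE_BAND[0] <= value <= SIZE_BAND[1]:
            # Soft check: recorded as a flag, does not change the pass verdict.
            sanity_flags.append(f"{field}: value {value} outside plausible band")
    return {"pass": not errors, "errors": errors, "sanity_flags": sanity_flags}


valid = {
    "ticker": "SYNTHETIC_A",
    "side": "long",
    "size_shares": 50,
    "rationale": "Capex guidance reiterated above consensus.",
    "stop_loss": 145.5,
    "take_profit": 178,
}
oversized = dict(valid, size_shares=50_000_000)
malformed = dict(valid, side="maybe", size_shares="lots")

for name, payload in (("valid", valid), ("oversized", oversized), ("malformed", malformed)):
    verdict = validate_trade_decision(payload)
    print(name, verdict["pass"], verdict["errors"], verdict["sanity_flags"])
```

The oversized payload keeps pass set to true and only gains a sanity flag, which is the behaviour the third row of the table describes.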

The four required fields that block the worst failures

The trade_decision schema has six fields total; four of them are the audit-defensible minimum for a machine-executable trade intent:

  1. side ∈ {long, short, none}: the LLM cannot emit "maybe" or "trim" or any free-form variant. The enum lock turns ambiguity into a validation failure.
  2. size_shares: number, min 0.00001. The LLM cannot emit "lots" or "a small position" or any string. Type-locking to number prevents downstream parser crashes.
  3. stop_loss: number, min 0. The LLM cannot emit "no stop" or "trail at 5%" without computing the absolute price. A machine-executable stop is a price.
  4. rationale: string, minLength 20. The LLM cannot emit "buy" or "earnings" or any one-word justification. The minimum length forces enough text to audit.

Each of these four fields blocks a class of failure that lenient validation would let through. Without the enum lock, a downstream router can pass "maybe" to a broker order endpoint and crash the routing layer. Without the type lock, the size field can be a string that the broker rejects with a vague error. Without the stop_loss requirement, a position can be sized with no risk control. Without a minimum rationale length, an LLM-driven journal becomes uninterpretable after a 6-month gap.

The remaining two fields in the canonical schema (ticker, with its own minLength, and take_profit) cover the rest of the audit gap. Both are required by the engine. take_profit is conceptually a target, not a guard, so some workflows treat it as optional.

The strict-mode-vs-lenient-mode comparison

Lenient mode would:

  • Accept "maybe" for side, mapping it to "none" implicitly.
  • Accept "lots" for size_shares, defaulting to a configured size.
  • Skip the rationale field if missing, using an LLM-generated placeholder.
  • Accept arbitrary numeric values for stop_loss without sanity-banding.

Each of these is a real engineering choice. Each is a real source of production incidents in retail LLM-driven trading layers. The strict-mode rule set (the engine's default) refuses all four implicit conversions and forces the LLM to emit a clean payload or fail.
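The difference between the two modes is easier to see side by side. The sketch below models the first two lenient conversions from the list above and the strict refusal of the same payload. The names lenient_normalize, strict_check, and DEFAULT_SIZE are hypothetical, and the lenient behaviour is reconstructed from the bullets, not taken from any engine code.

```python
SIDE_ENUM = ("long", "short", "none")
DEFAULT_SIZE = 100  # hypothetical configured fallback size


def lenient_normalize(payload: dict) -> dict:
    """Lenient behaviour as described above: silently coerce bad values."""
    out = dict(payload)
    if out.get("side") not in SIDE_ENUM:
        out["side"] = "none"  # implicit mapping hides the ambiguity
    size = out.get("size_shares")
    if isinstance(size, bool) or not isinstance(size, (int, float)):
        out["size_shares"] = DEFAULT_SIZE  # "lots" becomes a real order size
    return out


def strict_check(payload: dict) -> list:
    """Strict behaviour: report every problem, convert nothing."""
    errors = []
    if payload.get("side") not in SIDE_ENUM:
        errors.append("side: enum_mismatch")
    size = payload.get("size_shares")
    if isinstance(size, bool) or not isinstance(size, (int, float)):
        errors.append("size_shares: type_mismatch")
    return errors


bad = {"side": "maybe", "size_shares": "lots"}
print(lenient_normalize(bad))  # {'side': 'none', 'size_shares': 100}
print(strict_check(bad))       # ['side: enum_mismatch', 'size_shares: type_mismatch']
```

The lenient result looks like a clean payload, which is the problem: nothing downstream can tell that the size was never chosen by the model.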

The engineering trade-off: strict mode produces more "invalid" responses from the LLM, so retry rate and cost rise. Lenient mode silently accepts malformed output, so incident rate and recovery cost rise. For a trade-decision flow, the strict-mode retry cost is small ($0.0302 per call from the Token Cost Optimizer at an 8% retry rate), while the lenient-mode incident cost is large: one wrongly sized position can lose more than a year of saved API spend.

The sanity band: why pass can still mean "warning"

The engine's sanityRange on size_shares is {low: 1, high: 1,000,000, unit: "shares"}. A payload with size_shares = 50,000,000 passes hard validation (it is a positive number, above the min: 0.00001 floor) but triggers a sanity flag.

The sanity flag is the audit reviewer's flag, not the executor's flag. A 50-million-share order on a retail account is implausible; either the LLM hallucinated a position size, or the buyer's account is much larger than the schema's calibration assumes. The sanity flag forces the human reviewer to confirm the order intent before it routes to the broker.

The two-tier check (hard validation + soft sanity flag) is the right design. Hard validation rejects malformed payloads at the parser layer. Soft sanity flags raise a human-review event for payloads that are well-formed but suspicious. Skipping either tier creates a hole.
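The two tiers map naturally onto a three-way routing decision. The sketch below assumes a verdict dictionary with a pass key and a sanity_flags list; that shape, the Route enum, and the function name route_verdict are illustrative assumptions, not the engine's actual output format.

```python
from enum import Enum


class Route(Enum):
    REJECT = "reject"
    HUMAN_REVIEW = "human_review"
    EXECUTE = "execute"


def route_verdict(verdict: dict) -> Route:
    """Map a validator verdict (assumed shape) to the next pipeline step."""
    if not verdict["pass"]:
        return Route.REJECT        # hard failure: never reaches the broker
    if verdict.get("sanity_flags"):
        return Route.HUMAN_REVIEW  # well-formed but suspicious
    return Route.EXECUTE


print(route_verdict({"pass": False, "sanity_flags": []}))
print(route_verdict({"pass": True, "sanity_flags": ["size_shares outside band"]}))
print(route_verdict({"pass": True, "sanity_flags": []}))
```

Checking the hard verdict first means a payload that is both malformed and implausible is rejected outright and never occupies a reviewer's queue.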

The enum-mismatch failure mode

The side = "maybe" example produces enum_mismatch with the message: "maybe" not in [long, short, none]. The error is informative and the validator refuses to fall through to a default.

This is the load-bearing strict-mode behaviour for tool-output-injection defence. An adversarial input (or a confused LLM) that emits side: "cancel-all-positions" gets the same enum_mismatch failure as side: "maybe". The validator does not interpret the value; it checks it against the allow-list. On the output side, the allow-list is what stops a prompt-injection attack from escalating the trade-decision payload into a higher-privilege action.
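A minimal allow-list check looks like the sketch below. The function name check_side and the message wording (modelled on the error text quoted above) are assumptions. The point is that the check is exact membership, with no trimming, lower-casing, or fallback.

```python
from typing import Optional

ALLOWED_SIDES = frozenset({"long", "short", "none"})


def check_side(value: object) -> Optional[str]:
    """Return an error string if value is not on the allow-list, else None."""
    if isinstance(value, str) and value in ALLOWED_SIDES:
        return None
    return f"enum_mismatch: {value!r} not in [long, short, none]"


for candidate in ("long", "maybe", "cancel-all-positions", "LONG", None):
    print(repr(candidate), "->", check_side(candidate))
```

"LONG" fails as well. Normalising case before the check would be a small lenient-mode conversion, so a strict validator leaves that to the prompt or the retry loop.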

The Prompt Injection Tester catches the prompt-side of this; the schema validator catches the output-side. Both together close the loop.

The minimum-length string check

The rationale: minLength 20 rule catches LLM outputs with short or empty justifications. "buy" (3 chars), "earnings" (8 chars), and "bullish" (7 chars) all fail. "Entry on thesis-confirming earnings beat" (40 chars) passes.

The 20-character minimum is the engine's default. For audit-trail-bound contexts (BaFin / SEC) a longer minimum (80-120 chars) is more defensible because it forces enough substance to reconstruct the decision context. The schema accepts custom minLength values; tighten it for production deployments.

The string check does not enforce content quality, only length. An LLM can emit 20 characters of random text and pass. The check is a guard against the worst empty-rationale failures, not a substitute for the Hallucination Detector verification of the rationale's claims.
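The length rule and its limits fit in a few lines. The function name check_rationale is hypothetical, and the sketch assumes raw character count with no whitespace stripping, which the text above does not specify.

```python
def check_rationale(text: object, min_length: int = 20) -> bool:
    """Length-only check: True when text is a string of at least min_length characters."""
    return isinstance(text, str) and len(text) >= min_length


for sample in ("buy", "earnings", "bullish", "Entry on thesis-confirming earnings beat"):
    print(len(sample), check_rationale(sample))

# Length is not quality: 20 filler characters pass the default check.
print(check_rationale("x" * 20))
# A stricter, audit-oriented minimum rejects the 40-character example.
print(check_rationale("Entry on thesis-confirming earnings beat", min_length=80))
```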

Pairing with the hallucination detector

A defensible LLM-to-trade pipeline runs two validators in sequence:

  1. Structured schema validator: checks that the trade_decision JSON is well-formed, that fields are present and typed correctly, and that sanity bands hold.
  2. Hallucination detector: checks that any numeric claims in the rationale string match the source data the LLM was conditioned on.

The schema validator alone passes a payload like {ticker: "AAPL", side: "long", size_shares: 250, rationale: "Earnings beat by 15% on $1.85B revenue per source document.", stop_loss: 180, take_profit: 220}. The hallucination detector then checks whether "15% beat" and "$1.85B revenue" actually appear in the source document.

If the source says "Revenue beat by 12% to $1.78B," the hallucination detector flags both numbers as ungrounded. The schema-level check passes; the content-level check fails. Together the two gates catch both shape and content failures.
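The content-level gate can be approximated with a deliberately naive token match. The sketch below is not the Hallucination Detector's method: the regular expression, the function name ungrounded_claims, and the exact-token comparison are all assumptions chosen to reproduce the example above. A real detector would also need to handle rephrasings such as "15 percent" or "1,850 million".

```python
import re

# Matches tokens such as "15%", "$1.85B", "12%", "$1.78B" (illustrative pattern only).
NUMERIC_CLAIM = re.compile(r"\$?\d+(?:\.\d+)?[%BMK]?")


def ungrounded_claims(rationale: str, source: str) -> list:
    """Return numeric tokens in the rationale that do not appear verbatim in the source."""
    source_tokens = set(NUMERIC_CLAIM.findall(source))
    return [tok for tok in NUMERIC_CLAIM.findall(rationale) if tok not in source_tokens]


rationale = "Earnings beat by 15% on $1.85B revenue per source document."
source = "Revenue beat by 12% to $1.78B,"
print(ungrounded_claims(rationale, source))  # ['15%', '$1.85B']
```

An empty list means every numeric token in the rationale was found in the source; it says nothing about whether the surrounding claim is used correctly.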

References

  • SEC. "Books and Records Requirements." sec.gov/divisions/marketreg/mrnote.htm, accessed 2026-05-21. The records-keeping basis for audit-trail design.
  • FINRA. "Rule 4511 Books and Records." finra.org/rules-guidance/rulebooks/finra-rules/4511, accessed 2026-05-21.
  • Anthropic. "Tool use." docs.anthropic.com/en/docs/build-with-claude/tool-use, accessed 2026-05-21. Reference for structured tool-call schemas.
  • OpenAI. "Structured Outputs." platform.openai.com/docs/guides/structured-outputs, accessed 2026-05-21.
  • BaFin. "Statement on Algorithmic Trading." bafin.de/EN/Aufsicht/BoersenMaerkte/Hochfrequenzhandel/, accessed 2026-05-21. The German regulatory basis for algorithmic-trade record requirements.

Verified engine output

Valid trade-decision payload (all six fields, synthetic ticker)

Inputs

schema_id: trade_decision
json › ticker: SYNTHETIC_A
json › side: long
json › size_shares: 50
json › rationale: Data-center capex guidance reiterated above consensus, sized for 0.8% portfolio risk against the declared stop.
json › stop_loss: 145.5
json › take_profit: 178

Result

pass: true

Field | Status | Message
ticker | ok | ok
side | ok | ok
size_shares | ok | ok
rationale | ok | ok
stop_loss | ok | ok
take_profit | ok | ok

Computed live at build time.

Frequently asked questions

Why is take_profit a required field? What if I don't want a take-profit on every trade?
The audit-defensible contract includes both worst-case and best-case exit prices. Set take_profit to a numerically large valid number for open-ended trades; the validator requires presence, not tightness.
Can I customize the schema beyond the canonical four?
Not in this engine. The four canonical schemas are baked in; for custom schemas, use a generic JSON Schema validator such as ajv, at the cost of the finance-specific sanity bands.
What happens if the LLM emits valid JSON but with extra fields?
The validator does not reject extra fields; it enforces required fields but does not treat the schema as closed-world. Extra fields are dropped by downstream consumers.
Should I block trades on any failed validation?
Yes on pass: false. For sanity flags (pass: true with warnings), route to human review instead of blocking. The engine makes this distinction explicit.
How do I integrate this with my agent framework?
Wrap the engine as a tool the agent calls. The agent passes the proposed trade payload, the validator returns the verdict, the agent submits on pass or fixes on fail.
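The submit-on-pass, fix-on-fail loop can be sketched as a small wrapper. Everything named here is hypothetical: submit_with_validation, the propose and validate callables, the verdict keys pass and errors, and the stubs that stand in for the agent and the engine. Sanity-flag routing to human review is left out of this sketch.

```python
from typing import Callable


def submit_with_validation(
    propose: Callable[[list], dict],
    validate: Callable[[dict], dict],
    max_attempts: int = 3,
) -> dict:
    """Ask the agent for a payload, validate it, and retry with the errors as feedback."""
    feedback: list = []
    for attempt in range(1, max_attempts + 1):
        payload = propose(feedback)
        verdict = validate(payload)
        if verdict["pass"]:
            return {"status": "accepted", "attempts": attempt, "payload": payload}
        feedback = verdict["errors"]  # fed back so the agent can fix the payload
    return {"status": "rejected", "attempts": max_attempts, "errors": feedback}


# Stubs standing in for the real validator and the real agent (assumptions).
def stub_validate(payload: dict) -> dict:
    errors = [] if payload.get("side") in ("long", "short", "none") else ["side: enum_mismatch"]
    return {"pass": not errors, "errors": errors}


def stub_propose(feedback: list) -> dict:
    # First call has no feedback and emits a bad value; the retry corrects it.
    return {"side": "long"} if feedback else {"side": "maybe"}


print(submit_with_validation(stub_propose, stub_validate))
```

Capping the attempts keeps the retry cost bounded, and a payload that never validates ends as an explicit rejection with the last error list attached.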