A prompt is a stochastic specification of what the LLM should do. A contract is a deterministic one. Moving an LLM skill from prompt to contract (typed inputs, typed outputs, runtime invariants, automated verification) is the highest-yield methodology change in LLM finance work in 2026. The shift turns "the LLM tends to produce JSON when asked nicely" into "the LLM produces JSON or the call fails, and the verifier catches structural and consistency defects." Skills built as contracts compose into agents; skills built as prompts compose into liability. The methodology below specifies what a contract-form skill looks like and how to migrate from prompt-form.

TL;DR

Three properties separate a contract-form skill from a prompt-form skill:

Property           | Prompt-form                | Contract-form
Input shape        | Implicit, natural language | Typed JSON schema
Output shape       | "Probably JSON if asked"   | Typed JSON schema, strict-mode validated
Runtime invariants | Trust the LLM              | Verifier function catches violations

A contract-form skill survives prompt drift, model upgrades, and adversarial inputs without silent failure. A prompt-form skill does none of these. The migration path is to add the schema gates first, then refactor the prompt to fit the gates.

Why prompts are stochastic specifications

A prompt instructs the LLM in natural language: "Extract revenue, net income, and EPS from this 10-K excerpt; return JSON with these three fields." The LLM usually returns JSON with those three fields. It sometimes returns a fourth field. It sometimes returns prose around the JSON. It sometimes returns a different schema. It sometimes returns a string where a number is expected.

The "usually" is the load-bearing flaw. Production systems that consume LLM output downstream cannot tolerate "usually." They need "always" or "fails loudly." A prompt that produces the right output 95% of the time and silently corrupts the other 5% is worse than a prompt that produces correct output 90% of the time and fails loudly on the other 10%: silent corruption compounds undetected, while a loud failure can be caught and handled.
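To make the comparison concrete, here is a rough arithmetic sketch. It assumes calls fail independently and reuses the illustrative 5% silent-failure rate from the paragraph above; the function name is hypothetical.

```python
def silent_corruption_probability(silent_rate: float, n_calls: int) -> float:
    """Chance that at least one of n independent calls is silently wrong."""
    return 1.0 - (1.0 - silent_rate) ** n_calls


# Prompt A: 95% correct, 5% silently corrupt.
# Prompt B: 90% correct, 10% loud failures (so 0% silent corruption).
for n in (1, 5, 20):
    a = silent_corruption_probability(0.05, n)
    b = silent_corruption_probability(0.0, n)
    print(f"{n:>2} calls: prompt A undetected-error risk {a:.1%}, prompt B {b:.1%}")
```

Under these assumptions, a five-call chain built on prompt A has roughly a 23% chance of carrying at least one undetected error. Prompt B costs retries but never passes bad data downstream.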

The Toolformer paper[1] documents the same insight from the model-training side: LLMs that learn to call typed external tools are more accurate than LLMs that emit free-form text containing the answer. Typed tool calls force the model into a contract-shaped output space. The same logic applies at the application layer.

What a contract-form skill looks like

A contract-form skill has four artefacts:

  1. Input schema. A JSON-schema (draft 2020-12) specification of the input the skill accepts. Includes types, required fields, sanity ranges, enum values. The skill rejects inputs that do not validate.
  2. Output schema. A schema of the same kind for the skill's output, validated in strict mode: the LLM's output is parsed against this schema, and non-conforming output is treated as failure, not "best effort."
  3. Invariant function. A pure function that checks runtime properties of the (input, output) pair. Example: "if the input mentions a revenue number, the output's revenue field must equal that number." Invariants catch semantic errors that schema validation passes.
  4. Verification harness. A test set of (input, expected-output) pairs that runs on every prompt change. The harness gates deployment: prompt changes that drop the verification pass rate below threshold do not deploy.

The skill is then defined by these four artefacts together. The LLM prompt is an implementation detail; the contract is the API. A future model can replace the LLM call entirely (e.g., a fine-tuned classifier) without changing the contract.
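One way to picture the four artefacts as a single unit is the sketch below. The class and field names (ContractSkill, ContractViolation, keyword_stub) are hypothetical, the validators are plain callables standing in for real JSON-schema checks, and the "LLM" is a keyword stub.

```python
from dataclasses import dataclass, replace
from typing import Callable


class ContractViolation(Exception):
    """Raised when an input or output breaks the contract."""


@dataclass(frozen=True)
class ContractSkill:
    validate_input: Callable[[dict], list]          # artefact 1: returns error strings
    validate_output: Callable[[dict], list]         # artefact 2: returns error strings
    check_invariants: Callable[[dict, dict], list]  # artefact 3: pure (input, output) check
    cases: tuple                                    # artefact 4: (input, expected) pairs
    generate: Callable[[dict], dict]                # implementation detail, swappable

    def __call__(self, payload: dict) -> dict:
        errors = self.validate_input(payload)
        if errors:
            raise ContractViolation(f"input rejected: {errors}")
        output = self.generate(payload)
        errors = self.validate_output(output) + self.check_invariants(payload, output)
        if errors:
            raise ContractViolation(f"output rejected: {errors}")
        return output


LABELS = {"positive", "negative"}


def keyword_stub(payload: dict) -> dict:
    # Stands in for the LLM call; any callable with this signature would do.
    return {"label": "positive" if "beat" in payload["headline"] else "negative"}


skill = ContractSkill(
    validate_input=lambda p: [] if isinstance(p.get("headline"), str) else ["headline must be a string"],
    validate_output=lambda o: [] if o.get("label") in LABELS else ["label not in enum"],
    check_invariants=lambda p, o: [],  # no cross-field rules in this toy contract
    cases=(({"headline": "ACME beat estimates"}, {"label": "positive"}),),
    generate=keyword_stub,
)

# Swapping the implementation leaves the contract untouched; a non-conforming
# replacement is rejected at call time instead of leaking downstream.
broken = replace(skill, generate=lambda p: {"label": "maybe"})
```

Callers only ever see the contract: `skill({"headline": "ACME beat estimates"})` returns a validated dict, while `broken(...)` raises ContractViolation.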

The migration path: schema-first

The cheapest migration from prompt-form to contract-form is schema-first:

Step 1: define the output schema. Look at the prompt's current output examples. Write the schema that describes them. Pick types deliberately: string vs enum, number vs string-containing-a-number, optional vs required. The Structured Schema Validator (Finance) is a reference for finance-domain schemas.
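As an illustration of Step 1, a draft 2020-12 output schema for the 10-K extraction example might look like the sketch below. Only revenue, net income, and EPS come from the example prompt; the other field names, the enum values, and the ranges are assumptions made for illustration.

```python
import json

# Hypothetical output schema for the 10-K extraction skill.
EXTRACTION_OUTPUT_SCHEMA = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "properties": {
        # number, not string-containing-a-number
        "revenue": {"type": "number", "minimum": 0},
        "net_income": {"type": "number"},  # may be negative
        "eps": {"type": "number"},
        # enum, not free-form string
        "currency": {"enum": ["USD", "EUR", "GBP"]},
        # sanity range on an integer field
        "fiscal_year": {"type": "integer", "minimum": 1990, "maximum": 2100},
        # optional: deliberately left out of "required"
        "restated": {"type": "boolean"},
    },
    "required": ["revenue", "net_income", "eps", "currency", "fiscal_year"],
    # a fourth, unexpected field becomes a validation failure
    "additionalProperties": False,
}

print(json.dumps(EXTRACTION_OUTPUT_SCHEMA, indent=2))
```

Setting additionalProperties to false is what turns "it sometimes returns a fourth field" from a silent quirk into a detectable failure.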

Step 2: wrap the prompt's output in the validator. Every call to the prompt now passes its output through the schema validator. Failures are logged. The failure rate over the first 100-1,000 calls is the baseline; analyze the failures to identify whether the prompt or the schema needs adjustment.
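A minimal sketch of the Step 2 wrapper follows. It uses only the standard library, so shape_errors is a simplified stand-in for a real JSON-schema validator, and the names (shape_errors, ValidationStats) and sample outputs are hypothetical. The samples mirror the failure modes listed earlier: an extra field, prose around the JSON, and a string where a number is expected.

```python
import json
import logging
from collections import Counter

logger = logging.getLogger("skill.validation")

REQUIRED_TYPES = {"revenue": (int, float), "net_income": (int, float), "eps": (int, float)}


def shape_errors(raw: str) -> list:
    """Simplified stand-in for a JSON-schema validator; returns error codes."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ["not_json"]
    if not isinstance(data, dict):
        return ["not_object"]
    errors = []
    for key, types in REQUIRED_TYPES.items():
        if key not in data:
            errors.append(f"missing:{key}")
        elif isinstance(data[key], bool) or not isinstance(data[key], types):
            errors.append(f"wrong_type:{key}")
    errors += [f"extra:{key}" for key in data if key not in REQUIRED_TYPES]
    return errors


class ValidationStats:
    """Logs every failure and keeps the counts needed for a baseline rate."""

    def __init__(self) -> None:
        self.calls = 0
        self.failed_calls = 0
        self.failures = Counter()

    def record(self, raw: str) -> list:
        self.calls += 1
        errors = shape_errors(raw)
        if errors:
            self.failed_calls += 1
            self.failures.update(errors)
            logger.warning("schema failure: %s", errors)
        return errors

    @property
    def failure_rate(self) -> float:
        return self.failed_calls / self.calls if self.calls else 0.0


stats = ValidationStats()
for raw in [
    '{"revenue": 1200.5, "net_income": 310.0, "eps": 2.14}',
    '{"revenue": 1200.5, "net_income": 310.0, "eps": 2.14, "note": "FY24"}',
    'Here is the JSON: {"revenue": 1200.5, "net_income": 310.0, "eps": 2.14}',
    '{"revenue": "1200.5", "net_income": 310.0, "eps": 2.14}',
]:
    stats.record(raw)

print(stats.failure_rate, dict(stats.failures))
```

Counting failures by error code, not just in aggregate, makes it easier to tell whether the prompt or the schema is at fault. A single dominant code usually points at one specific fix.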

Step 3: define invariants. Beyond the schema, what does the output need to be internally consistent with? Examples: "the rationale field must mention every ticker in the tickers field"; "the position-size field must be ≤ 1% of the bankroll input." Write these as pure functions; route LLM outputs through them.
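The two example invariants could be written as pure functions along these lines. The field names (tickers, rationale, position_size, bankroll) are assumed from the wording of the examples, and the whole-word ticker match is one possible reading of "mention".

```python
import re


def rationale_mentions_all_tickers(skill_input: dict, output: dict) -> list:
    """Every ticker in `tickers` must appear as a whole word in `rationale`."""
    return [
        f"rationale omits ticker {ticker}"
        for ticker in output["tickers"]
        if not re.search(rf"\b{re.escape(ticker)}\b", output["rationale"])
    ]


def position_within_cap(skill_input: dict, output: dict, cap: float = 0.01) -> list:
    """position_size must not exceed `cap` (1% by default) of the bankroll input."""
    limit = cap * skill_input["bankroll"]
    if output["position_size"] > limit:
        return [f"position_size {output['position_size']} exceeds cap {limit}"]
    return []


INVARIANTS = (rationale_mentions_all_tickers, position_within_cap)


def check_invariants(skill_input: dict, output: dict) -> list:
    """Run every invariant; an empty list means the pair is consistent."""
    return [err for inv in INVARIANTS for err in inv(skill_input, output)]


violations = check_invariants(
    {"bankroll": 50_000},
    {"tickers": ["AAPL", "MSFT"], "rationale": "AAPL momentum is strong.", "position_size": 750.0},
)
print(violations)
```

Both fields in this output are well-typed, so a schema check would pass. The invariants are what flag the missing MSFT mention and the oversized position.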

Step 4: refactor the prompt. With the schema and invariants in place, refactor the prompt to make conforming output more likely. Add examples that show the schema-shape; add explicit reminders about the invariants. The prompt is now downstream of the contract, not upstream.

Step 5: build the verification harness. 50-200 (input, expected-output) pairs. Run on every prompt change. Gate deployment on the pass rate.
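A harness of this kind can be very small. The sketch below assumes exact-match comparison against the expected output and a 95% threshold, neither of which is prescribed above. stub_skill stands in for the real prompt-backed skill, and the cases are made up.

```python
from typing import Callable


def pass_rate(skill: Callable[[dict], dict], cases: list) -> float:
    """Fraction of (input, expected_output) pairs the skill reproduces exactly."""
    passed = 0
    for payload, expected in cases:
        try:
            if skill(payload) == expected:
                passed += 1
        except Exception:
            pass  # a raised contract failure counts as a failed case
    return passed / len(cases)


def deployment_gate(skill: Callable[[dict], dict], cases: list, threshold: float = 0.95) -> bool:
    """True if the prompt change may deploy."""
    return pass_rate(skill, cases) >= threshold


def stub_skill(payload: dict) -> dict:
    # Placeholder for the prompt-backed skill under test.
    text = payload["headline"].lower()
    return {"direction": "up" if "beat" in text else "down"}


CASES = [
    ({"headline": "ACME beat EPS estimates"}, {"direction": "up"}),
    ({"headline": "ACME missed revenue guidance"}, {"direction": "down"}),
    ({"headline": "ACME cut its dividend"}, {"direction": "down"}),
    ({"headline": "ACME raised full-year guidance"}, {"direction": "up"}),  # the stub gets this wrong
]

print(pass_rate(stub_skill, CASES), deployment_gate(stub_skill, CASES))
```

In a CI job, the boolean from deployment_gate would typically decide the exit code, so a prompt change that regresses the pass rate blocks the deploy automatically.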

The migration is iterative: each step makes the skill more contract-shaped. The total time is 1-3 days for a single skill. The payoff compounds: every downstream consumer of the skill can trust its output shape and invariants without re-validating.

What contracts cannot do

Contracts cannot make a fundamentally wrong LLM right. If the LLM systematically picks the wrong revenue number from a 10-K because it cannot read the table, the schema validates and the invariants pass, but the output is still wrong. Contracts catch structural errors and internal inconsistencies; they do not catch factual errors.

The complementary tool for factual error is the Hallucination Detector, which checks the output against the source. The two layers (contract validation + content validation) together catch most of the meaningful failure modes. Skipping either creates a hole.

Contracts also cannot specify creativity. A skill whose value comes from open-ended generation (idea generation, hypothesis building) does not benefit from contract-form constraints. The right scope for contract-form skills is the structured portion of a workflow (extraction, classification, decision-making), not the exploratory portion.

Anthropic's Constitutional AI methodology[2] addresses a third class of failure (the LLM produces harmful or off-policy output) that contracts cannot catch. The three layers (contract validation, content validation, constitutional/policy filtering) together approximate a safe, structured, factual LLM-driven workflow.

The composability dividend

A contract-form skill composes with other contract-form skills. Skill A's output is Skill B's input; the schemas connect at the API boundary. The pipeline as a whole is type-safe; the auditor can trace data flow without inspecting prompts.

Prompt-form skills do not compose. The output of prompt-form Skill A is unstructured text; Skill B's prompt has to re-parse it. Errors propagate; the pipeline becomes a chain of stochastic transformations.

For an LLM-driven research-to-trade pipeline, the composability dividend is substantial. A pipeline of 5 contract-form skills has a single end-to-end schema; the pipeline can be validated in one pass. The same pipeline built from prompt-form skills has 5 ad-hoc parser layers, each a source of silent failure.
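The boundary check that makes composition safe can be sketched as follows. Schemas are reduced to "field name -> Python type" maps as a stand-in for JSON schemas, and the Skill class, the two stub skills, and their fields are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Skill:
    name: str
    input_fields: dict   # field name -> Python type (stand-in for a JSON schema)
    output_fields: dict
    run: Callable[[dict], dict]


def conforms(data: dict, fields: dict) -> bool:
    """Exact key match plus a type check per field."""
    return data.keys() == fields.keys() and all(
        isinstance(data[key], expected) for key, expected in fields.items()
    )


def build_pipeline(*skills: Skill) -> Callable[[dict], dict]:
    # Wiring check at build time: each output schema must match the next input schema.
    for upstream, downstream in zip(skills, skills[1:]):
        if upstream.output_fields != downstream.input_fields:
            raise TypeError(f"{upstream.name} -> {downstream.name}: schemas do not connect")

    def pipeline(payload: dict) -> dict:
        for skill in skills:
            if not conforms(payload, skill.input_fields):
                raise ValueError(f"{skill.name}: input violates schema")
            payload = skill.run(payload)
            if not conforms(payload, skill.output_fields):
                raise ValueError(f"{skill.name}: output violates schema")
        return payload

    return pipeline


extract = Skill(
    name="extract",
    input_fields={"text": str},
    output_fields={"ticker": str, "eps_surprise": float},
    run=lambda p: {"ticker": "ACME", "eps_surprise": 0.12},  # stub for an LLM extraction
)
decide = Skill(
    name="decide",
    input_fields={"ticker": str, "eps_surprise": float},
    output_fields={"ticker": str, "action": str},
    run=lambda p: {"ticker": p["ticker"], "action": "buy" if p["eps_surprise"] > 0 else "hold"},
)

research_to_trade = build_pipeline(extract, decide)
print(research_to_trade({"text": "ACME beat EPS by 0.12"}))
```

A mis-wired pipeline such as `build_pipeline(decide, extract)` fails when it is built, before any LLM call is made. This is the property that lets an auditor reason about data flow from the schemas alone.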

When prompts are still the right shape

For exploratory work (first-prototype hypothesis generation, one-off analyses, throwaway-script reasoning), prompts are the right shape. Building the four contract artefacts is overkill for a skill that will be discarded next week.

The transition is workload-dependent. A skill that runs in production for more than a month, that has downstream consumers, that touches real capital or real publication, should be contract-form. A skill that runs once or twice for exploratory purposes can stay prompt-form.

The trap is leaving an "exploratory" prompt in production. The exploratory skill that doesn't get the contract treatment but does end up running every day is the source of most retail LLM-driven incident reports. The discipline is: if it's in launchd, it should be contract-form.

The JSON-schema standard

The contract-form skill writes its schemas in JSON-schema (draft 2020-12)[3]. This is the de-facto standard, supported by:

  • Anthropic's tool-use schema definitions (compatible subset).
  • OpenAI's structured-outputs feature (compatible subset).
  • Most JSON-validation libraries across languages (ajv, jsonschema, etc.).

Avoid custom schema formats. The portability cost of a non-standard schema (porting between models, between languages, between validation libraries) is large; the apparent simplicity of a custom format is not worth it.

OpenAI's structured outputs documentation[4] specifies the model-side enforcement: the model is constrained at the decoding level to produce schema-conforming output. The same level of enforcement is not available on every vendor; the application-level schema validator is the portable defence.

References

  • Anthropic. "Tool use." docs.anthropic.com/en/docs/build-with-claude/tool-use, accessed 2026-05-21. Reference for Anthropic's typed tool-call schemas.
  • Google. "Gemini API: Schema and structured output." ai.google.dev/gemini-api/docs/structured-output, accessed 2026-05-21.
  • OECD. "OECD AI Principles." https://oecd.ai/en/ai-principles. Reference for the policy layer above contract-form skills.

Footnotes

  1. Schick, T., Dwivedi-Yu, J., Dessì, R., Raileanu, R., Lomeli, M., Zettlemoyer, L., Cancedda, N., & Scialom, T. (2023). "Toolformer: Language Models Can Teach Themselves to Use Tools." Meta AI Research. https://arxiv.org/abs/2302.04761

  2. Bai, Y., Kadavath, S., Kundu, S., et al. (2022). "Constitutional AI: Harmlessness from AI Feedback." Anthropic Research. https://arxiv.org/abs/2212.08073

  3. JSON Schema. "Specification: 2020-12." json-schema.org. https://json-schema.org/draft/2020-12/schema

  4. OpenAI. "Structured Outputs." https://platform.openai.com/docs/guides/structured-outputs, accessed 2026-05-21.

Frequently asked questions

Should I migrate every existing prompt to contract-form?
No. Migrate the ones that run in production, that have downstream consumers, or that touch real capital. Exploratory prompts can stay prompt-form. Budget 1-3 days per skill.

What if the LLM refuses to produce schema-conforming output reliably?
That is a signal that either the schema is too strict or the prompt is wrong. Relax required fields first, then re-tighten one at a time. If the LLM still fails to conform, the schema and the LLM's capabilities are mismatched.

How do invariants differ from schemas?
Schemas constrain shape (types, required fields, ranges). Invariants constrain relationships between fields (e.g., stop_loss < take_profit). A schema validates fields independently; an invariant validates the combination.

Can I use TypeScript types instead of JSON schema?
Yes for development-time checking, no for runtime enforcement. TypeScript types are erased at compile time; JSON schema is enforced at runtime. Use both: TypeScript for IDE safety, JSON schema for runtime gating.

How does this interact with vendor-specific structured-output features?
Use them. Anthropic tool use, OpenAI structured outputs, and Google schema-constrained generation provide model-side enforcement to varying degrees. The application-level validator is the safety net.