Agent Skill Tester for Markets

Author: AI Fin Hub Research

Test Anthropic Agent Skills for market extraction. Paste SKILL.md + sample 10-K excerpt + your key. See output, token cost, latency. Browser-only. Free.

Published: Apr 20, 2026

Inputs: Prompt / input + API key
Runtime: 2–15 s per model call
Privacy: Client-side · no upload
API key: BYO key (Anthropic · OpenAI · Google)

Education · Not investment advice. BaFin/EU framework. Past performance does not indicate future results.

Trust model · two boundaries

The Anthropic call goes directly from your browser to api.anthropic.com. The API key never reaches aifinhub.
Scoring imports the /engines/agent-skill-tester.js module and scores the prompt + response in your browser. No key, no model id, no network call. Deterministic checks, no LLM.
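
The first boundary can be sketched as a plain fetch() request built in the page. This is a minimal sketch, not the tool's source: the function name buildAnthropicRequest is hypothetical, and sending the SKILL.md as the system prompt is an assumption. The endpoint and header names are those of Anthropic's public Messages API.

```javascript
// Sketch: build the fetch() arguments for a direct browser call to Anthropic's
// Messages API. Nothing here is taken from the tool's own code.
function buildAnthropicRequest({ apiKey, model, skillMd, sampleInput, maxTokens = 1024 }) {
  return {
    url: "https://api.anthropic.com/v1/messages",
    init: {
      method: "POST",
      headers: {
        "content-type": "application/json",
        "x-api-key": apiKey,
        "anthropic-version": "2023-06-01",
        // Opt-in header Anthropic requires for CORS requests made from a browser.
        "anthropic-dangerous-direct-browser-access": "true",
      },
      body: JSON.stringify({
        model,
        max_tokens: maxTokens,
        system: skillMd, // assumption: SKILL.md is sent as the system prompt
        messages: [{ role: "user", content: sampleInput }],
      }),
    },
  };
}

// Usage (performs a real, billed API call):
// const { url, init } = buildAnthropicRequest({ apiKey, model, skillMd, sampleInput });
// const response = await fetch(url, init).then((r) => r.json());
```

The only host in the request is api.anthropic.com, which is what keeps the key away from any intermediary.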

1 · Config

Anthropic API key

Remember in this browser (localStorage)

Model

2 · Skill definition (SKILL.md)

3 · Sample input document

4 · Pass/fail rubric (one criterion per line)

Recognised: valid_json, has_field:<path>, field_type:<path>:<type>, contains:<text>, regex:<pattern>, no_apology, numbers_grounded_in_prompt, and more.
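
To show how one-criterion-per-line scoring can stay deterministic, here is a rough sketch of an evaluator for five of the criteria named above. The functions getPath and scoreRubric are hypothetical, and the semantics of each criterion (dot-separated paths, the apology pattern) are assumptions; the tool's own engine may define them differently.

```javascript
// Hypothetical evaluator for a few rubric criteria. The semantics below are
// assumptions, not the behaviour of the tool's engine.
function getPath(obj, path) {
  // Walk a dot-separated path such as "risk.severity"; undefined if any hop is missing.
  return path.split(".").reduce(
    (node, key) => (node !== null && typeof node === "object" ? node[key] : undefined),
    obj
  );
}

function scoreRubric(rubricText, responseText) {
  let parsed;
  let isJson = true;
  try {
    parsed = JSON.parse(responseText);
  } catch {
    isJson = false;
  }

  return rubricText
    .split("\n")
    .map((line) => line.trim())
    .filter(Boolean)
    .map((criterion) => {
      const sep = criterion.indexOf(":");
      const name = sep === -1 ? criterion : criterion.slice(0, sep);
      const arg = sep === -1 ? "" : criterion.slice(sep + 1);
      let pass;
      switch (name) {
        case "valid_json": pass = isJson; break;
        case "has_field": pass = isJson && getPath(parsed, arg) !== undefined; break;
        case "contains": pass = responseText.includes(arg); break;
        case "regex": pass = new RegExp(arg).test(responseText); break;
        case "no_apology": pass = !/\b(sorry|apologi[sz]e)\b/i.test(responseText); break;
        default: pass = null; // criterion not covered by this sketch
      }
      return { criterion, pass };
    });
}
```

Each result is a pure function of the rubric text and the response text, so the same pair always scores the same way.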

How to use

Step-by-step

1. Paste your SKILL.md definition into the editor. The schema spec must be valid JSON Schema for strict-mode validation to pass.
2. Paste sample input that matches your skill's input schema. Use a realistic example, not a minimal one.
3. Enter your Anthropic API key. The key stays in browser memory and is never logged; it is written to localStorage only if you tick "Remember in this browser".
4. Click Run. Watch the structured output, token cost (input and output tokens, each multiplied by current pricing), and end-to-end latency.
5. Re-run several times. Variance in outputs is informative: high variance suggests the prompt is under-constrained.
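
One way to put a number on the variance from step 5 is to count how many distinct outputs a batch of runs produced. The sketch below is an illustration only: canonical and runAgreement are hypothetical helpers, and treating key order as irrelevant is an assumption about what counts as "the same" output. It expects at least one output.

```javascript
// Canonicalise JSON so key order does not count as a difference.
function canonical(value) {
  if (Array.isArray(value)) return value.map(canonical);
  if (value !== null && typeof value === "object") {
    return Object.keys(value).sort().reduce((acc, k) => {
      acc[k] = canonical(value[k]);
      return acc;
    }, {});
  }
  return value;
}

// Summarise repeated runs: how many distinct outputs, and what share of runs
// agree with the most common one.
function runAgreement(outputs) {
  const counts = new Map();
  for (const text of outputs) {
    let key;
    try {
      key = JSON.stringify(canonical(JSON.parse(text)));
    } catch {
      key = text.trim(); // non-JSON output: compare as raw text
    }
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  const modal = Math.max(...counts.values());
  return { runs: outputs.length, distinct: counts.size, agreement: modal / outputs.length };
}
```

An agreement close to 1 means the runs converge; a low value is the under-constrained signal described in step 5.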

For agents

Use in an agent

Same math and same result shape as the UI above, as a static ES module. No HTTP request, no auth, no rate limit.

import { compute } from "https://aifinhub.io/engines/agent-skill-tester.js";

Contract: /contracts/agent-skill-tester.json

Questions people ask next

FAQ

What's a SKILL.md?

Anthropic's structured-skill spec: a markdown file with a name, description, input/output schema, and worked examples. Skills bundle a small repeatable agent capability (e.g., 'extract a 10-K risk factor') in a portable format. The tester loads your SKILL.md and runs it against sample input.

Why does the tester need my own API key?

Calls go to Anthropic's API directly from your browser. The tool never sees or proxies the key. This keeps cost on your account, makes rate limits predictable, and avoids the privacy issue of routing finance prompts through a third-party proxy. Your key stays in browser memory by default; it is saved to localStorage only if you enable "Remember in this browser".

What does the tester measure?

Three things: structured-output compliance (does the model return valid JSON matching the schema?), token cost (input and output tokens, each multiplied by its current price), and end-to-end latency. Repeated runs show variance, which is useful for diagnosing flaky outputs.
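
The cost and latency figures reduce to simple arithmetic. A minimal sketch, assuming prices are quoted in USD per million tokens and supplied by the caller: tokenCostUsd, timed, inputPerMTok and outputPerMTok are hypothetical names, while input_tokens and output_tokens are the field names in the usage object of an Anthropic Messages API response.

```javascript
// Prices are USD per million tokens and must be supplied by the caller;
// no real pricing is hard-coded here.
function tokenCostUsd(usage, price) {
  const inputCost = (usage.input_tokens / 1e6) * price.inputPerMTok;
  const outputCost = (usage.output_tokens / 1e6) * price.outputPerMTok;
  return inputCost + outputCost;
}

// Wrap any async call and report wall-clock latency in milliseconds.
async function timed(fn) {
  const start = Date.now();
  const result = await fn();
  return { result, latencyMs: Date.now() - start };
}
```

Input and output tokens are priced separately, so the two products are summed instead of multiplying one price by the combined token count.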

Why does my SKILL.md sometimes fail validation?

The tester enforces strict-mode JSON schema validation. Common failures: missing required fields, nested objects with no properties defined, enum values outside the declared list. The validation error shows the exact field path. Real-world agents are more forgiving but also more inconsistent — strict-mode catches issues that bite later.
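
The three failure modes above can be illustrated with a small recursive check that reports a field path for every problem. This is a sketch under stated assumptions, not the tester's validator and not a full JSON Schema implementation: strictCheck is a hypothetical function, the "$" root prefix is an arbitrary convention, and only the type, properties, required, and enum keywords are read.

```javascript
// Sketch of three strict checks; not a full JSON Schema validator.
function strictCheck(schema, value, path = "$") {
  const errors = [];
  if (Array.isArray(schema.enum) && !schema.enum.includes(value)) {
    errors.push(`${path}: value ${JSON.stringify(value)} is not in enum`);
  }
  if (schema.type === "object") {
    const props = schema.properties || {};
    if (Object.keys(props).length === 0) {
      errors.push(`${path}: object schema defines no properties`);
    }
    const obj = value !== null && typeof value === "object" ? value : {};
    for (const name of schema.required || []) {
      if (!(name in obj)) errors.push(`${path}.${name}: required field is missing`);
    }
    // Recurse into every declared property that is present in the value.
    for (const [name, sub] of Object.entries(props)) {
      if (name in obj) errors.push(...strictCheck(sub, obj[name], `${path}.${name}`));
    }
  }
  return errors;
}
```

Returning every error with its path, instead of stopping at the first, makes it easier to fix a SKILL.md schema in one pass.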

Can I test multi-turn skills?

The current version is single-turn only: one input, one output. Multi-turn skill testing (where the model asks clarifying questions or takes multiple steps) is on the roadmap. For now, test multi-turn skills by looping programmatically against the API directly.
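
A programmatic loop of that kind might look like the sketch below. Everything in it is an assumption for illustration: runMultiTurn, callModel, isDone, and nextUserTurn are hypothetical names, and callModel is whatever async function you write to send the message history to the API and return the assistant's text.

```javascript
// callModel is a caller-supplied async function: (messages) => assistant text.
// isDone and nextUserTurn are hypothetical hooks, named here for illustration.
async function runMultiTurn({ callModel, firstInput, isDone, nextUserTurn, maxTurns = 5 }) {
  const messages = [{ role: "user", content: firstInput }];
  for (let turn = 1; turn <= maxTurns; turn++) {
    const reply = await callModel(messages);
    messages.push({ role: "assistant", content: reply });
    if (isDone(reply)) return { turns: turn, finished: true, messages };
    // Not finished: answer the model's question and go around again.
    messages.push({ role: "user", content: nextUserTurn(reply) });
  }
  return { turns: maxTurns, finished: false, messages };
}
```

The maxTurns cap bounds cost when a skill never reaches its final output, and the returned transcript can be scored turn by turn.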

Used in

Decision workflows that use this tool

Goal-driven flows that bundle this tool with adjacent ones.

Audit Your Pipeline
Catch hallucinations, prompt injections, and regression drift before they ship.

Complementary tools

Prompt Regression Tester

Run the same prompt against multiple models (Claude 4.5/4.6/4.7, GPT-5, Gemini 2.5) with your own keys. Diff outputs, score drift, catch regressions.

Hallucination Detector

Paste a source document + an LLM's extraction. Every numeric claim in the output is checked against the source. Client-side. Catches silent fabrication.

Prompt Injection Tester

Red-team a finance agent against 24 documented prompt-injection attacks — direct override, role confusion, indirect injection via retrieved content.
