Playground
Agent Skill Tester for Markets
Test Anthropic Agent Skills for market extraction. Paste SKILL.md + sample 10-K excerpt + your key. See output, token cost, latency. Browser-only. Free.
- Inputs: Prompt / input + API key
- Runtime: 2–15 s per model call
- Privacy: Client-side · no upload
- API key: BYO key (Anthropic · OpenAI · Google)
Trust model · two boundaries
- Anthropic call goes direct from your browser to api.anthropic.com. The API key never reaches aifinhub.
- Scoring imports the /engines/agent-skill-tester.js module and scores the prompt + response in your browser. No key, no model id, no network call. Deterministic checks, no LLM.
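As an illustration of the first boundary, here is a minimal sketch of what a direct browser-to-Anthropic request could look like. It is not the tool's actual source: the function name `buildAnthropicRequest`, the model id you pass in, the `maxTokens` default, and the choice to send the SKILL.md text as the system prompt are all assumptions. The endpoint and header names are those of Anthropic's public Messages API.

```javascript
// Sketch only: builds the request a browser tab could send straight to Anthropic.
// Function name, maxTokens default, and system-prompt placement are assumptions.
function buildAnthropicRequest({ apiKey, model, skillMd, sampleInput, maxTokens = 1024 }) {
  return {
    url: "https://api.anthropic.com/v1/messages",
    init: {
      method: "POST",
      headers: {
        "content-type": "application/json",
        "x-api-key": apiKey, // kept in a variable only; never written to storage
        "anthropic-version": "2023-06-01",
        // Opt-in header Anthropic requires for calls made directly from a browser.
        "anthropic-dangerous-direct-browser-access": "true",
      },
      body: JSON.stringify({
        model, // placeholder: supply whichever model id you are testing
        max_tokens: maxTokens,
        system: skillMd, // assumption: SKILL.md text is sent as the system prompt
        messages: [{ role: "user", content: sampleInput }],
      }),
    },
  };
}

// Usage (performs a real, billed API call, so it is left commented out):
// const { url, init } = buildAnthropicRequest({ apiKey, model, skillMd, sampleInput });
// const response = await fetch(url, init);
```

Because the request is assembled and sent from the page itself, the only host that ever receives the key is api.anthropic.com.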
1 · Config
2 · Skill definition (SKILL.md)
3 · Sample input document
4 · Pass/fail rubric (one criterion per line)
Recognised: valid_json, has_field:<path>, field_type:<path>:<type>, contains:<text>, regex:<pattern>, no_apology, numbers_grounded_in_prompt, and more.
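To make the rubric format concrete, the sketch below evaluates a few of the recognised criteria against a raw response string. It is a hypothetical re-implementation, not the engine's code: `getPath`, `checkCriterion`, and `runRubric` are made-up names, dotted paths are an assumed path syntax, and only five criteria are covered.

```javascript
// Hypothetical sketch of deterministic rubric checks (no LLM, no network).
// Resolve a dotted path such as "company.ticker" inside a parsed object.
function getPath(obj, path) {
  return path.split(".").reduce(
    (node, key) => (node !== null && typeof node === "object" && key in node ? node[key] : undefined),
    obj
  );
}

// Evaluate one rubric line against the raw model response text.
function checkCriterion(line, responseText) {
  let parsed;
  try { parsed = JSON.parse(responseText); } catch { parsed = undefined; }
  const sep = line.indexOf(":");
  const name = sep === -1 ? line.trim() : line.slice(0, sep).trim();
  const arg = sep === -1 ? "" : line.slice(sep + 1).trim();

  switch (name) {
    case "valid_json":
      return parsed !== undefined;
    case "has_field":
      return getPath(parsed, arg) !== undefined;
    case "field_type": {
      // The argument is "<path>:<type>"; split on the last colon.
      const split = arg.lastIndexOf(":");
      const value = getPath(parsed, arg.slice(0, split));
      const actual = Array.isArray(value) ? "array" : value === null ? "null" : typeof value;
      return actual === arg.slice(split + 1);
    }
    case "contains":
      return responseText.includes(arg);
    case "regex":
      return new RegExp(arg).test(responseText);
    default:
      throw new Error(`Unsupported criterion in this sketch: ${name}`);
  }
}

// One criterion per line, blank lines ignored.
function runRubric(rubricText, responseText) {
  return rubricText
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((criterion) => ({ criterion, pass: checkCriterion(criterion, responseText) }));
}
```

Each result pairs the criterion text with a boolean, so a failing line can be reported back verbatim.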
How to use
Step-by-step
1. Paste your SKILL.md definition into the editor. The schema spec must be valid JSON Schema for strict-mode validation to pass.
2. Paste sample input that matches your skill's input schema. Use a realistic example, not a minimal one.
3. Enter your Anthropic API key. The key stays in browser memory only: not persisted, not logged.
4. Click Run. Watch the structured output, token cost (input and output tokens at current pricing), and end-to-end latency.
5. Re-run several times. Variance in outputs is informative: high variance suggests the prompt is under-constrained.
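One rough way to quantify the run-to-run variance mentioned in the last step is to count how many distinct outputs a batch of runs produced and how often the most common one appeared. The sketch below is an assumption about how you might do this yourself; `agreementRate` is a hypothetical helper, not something the tool exposes.

```javascript
// Hypothetical helper: summarise how consistent repeated runs were.
// JSON outputs are re-serialised so whitespace differences do not count as
// variance (key order still does); non-JSON outputs are compared as trimmed text.
function agreementRate(outputs) {
  if (outputs.length === 0) throw new Error("need at least one output");
  const counts = new Map();
  for (const text of outputs) {
    let key;
    try { key = JSON.stringify(JSON.parse(text)); } catch { key = text.trim(); }
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  const mostCommon = Math.max(...counts.values());
  return { distinct: counts.size, agreement: mostCommon / outputs.length };
}
```

An agreement value well below 1 across a handful of runs is the signal that the prompt or schema may need tighter constraints.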
For agents
Use in an agent
Same math, same result shape as the UI above — as a static ES module. No HTTP request, no auth, no rate limit.
import { compute } from "https://aifinhub.io/engines/agent-skill-tester.js";
Contract: /contracts/agent-skill-tester.json
Questions people ask next
FAQ
What's a SKILL.md?
Anthropic's structured-skill spec: a markdown file with a name, description, input/output schema, and worked examples. Skills bundle a small repeatable agent capability (e.g., 'extract a 10-K risk factor') in a portable format. The tester loads your SKILL.md and runs it against sample input.
Why does the tester need my own API key?
Calls go to Anthropic's API directly from your browser. The tool never sees or proxies the key. This keeps cost on your account, makes rate limits predictable, and avoids the privacy issue of routing finance prompts through a third-party proxy. Your key stays in browser memory only — it's not persisted to localStorage.
What does the tester measure?
Three things: structured-output compliance (does the model return valid JSON matching the schema?), token cost (input and output token counts multiplied by current pricing), and end-to-end latency. Repeated runs show variance, which is useful for diagnosing flaky outputs.
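The cost figure can be reproduced by hand from the token counts. The sketch below assumes separate per-million-token prices for input and output and reads the counts from `input_tokens` and `output_tokens`, the field names in the `usage` object of an Anthropic Messages API response. The function name `estimateCostUsd` and the prices in the usage note are illustrative placeholders, not current pricing.

```javascript
// Hypothetical helper: estimate the dollar cost of one call.
// usage: { input_tokens, output_tokens } as reported by the API response.
// pricePerMillion: { input, output } in USD per one million tokens (you supply these).
function estimateCostUsd(usage, pricePerMillion) {
  const inputCost = (usage.input_tokens / 1e6) * pricePerMillion.input;
  const outputCost = (usage.output_tokens / 1e6) * pricePerMillion.output;
  return inputCost + outputCost;
}
```

For example, with placeholder prices of 3 USD (input) and 15 USD (output) per million tokens, a call using 12,000 input tokens and 800 output tokens comes to roughly 0.048 USD.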
Why does my SKILL.md sometimes fail validation?
The tester enforces strict-mode JSON schema validation. Common failures: missing required fields, nested objects with no properties defined, enum values outside the declared list. The validation error shows the exact field path. Real-world agents are more forgiving but also more inconsistent — strict-mode catches issues that bite later.
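The three common failures can be illustrated with a small checker that walks a schema and a candidate output together and reports the field path of each problem. This is a sketch over a tiny subset of JSON Schema (`type: "object"`, `properties`, `required`, `enum`), not the tester's validator; `findViolations` and the `$` root-path notation are assumptions.

```javascript
// Hypothetical sketch: report the three failure kinds with their field paths.
function findViolations(schema, value, path = "$") {
  const errors = [];
  if (schema.enum && !schema.enum.includes(value)) {
    errors.push(`${path}: value not in enum`);
  }
  if (schema.type === "object") {
    // Strict-style check: an object schema must spell out its properties.
    if (!schema.properties || Object.keys(schema.properties).length === 0) {
      errors.push(`${path}: object schema defines no properties`);
      return errors;
    }
    const obj = value !== null && typeof value === "object" ? value : {};
    for (const key of schema.required || []) {
      if (!(key in obj)) errors.push(`${path}.${key}: missing required field`);
    }
    // Recurse into every declared property that is present in the output.
    for (const [key, child] of Object.entries(schema.properties)) {
      if (key in obj) errors.push(...findViolations(child, obj[key], `${path}.${key}`));
    }
  }
  return errors;
}
```

Running it on an output that omits a required field, uses an undeclared enum value, and includes an object whose schema has no properties yields one path-prefixed message per problem.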
Can I test multi-turn skills?
The current version is single-turn only: one input, one output. Multi-turn skill testing (where the model asks clarifying questions or takes multiple steps) is on the roadmap. For now, test multi-turn skills programmatically by looping calls against the API directly.
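A programmatic multi-turn loop can be as simple as the sketch below. Everything here is hypothetical scaffolding: `runMultiTurn` is a made-up name, `callModel` stands in for whatever function sends the message history to the API and returns the assistant's text, and `answerFor` is your own logic that returns a follow-up user message, or `null` once the reply is final.

```javascript
// Hypothetical multi-turn driver. callModel(messages) -> Promise<string>.
// answerFor(reply) -> string follow-up, or null when the reply is the final answer.
async function runMultiTurn(callModel, firstUserMessage, answerFor, maxTurns = 5) {
  const messages = [{ role: "user", content: firstUserMessage }];
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = await callModel(messages);
    messages.push({ role: "assistant", content: reply });
    const followUp = answerFor(reply);
    if (followUp === null) return { final: reply, messages };
    messages.push({ role: "user", content: followUp });
  }
  // Guard against conversations that never converge.
  throw new Error(`No final answer after ${maxTurns} turns`);
}
```

Passing `callModel` in as a parameter keeps the loop testable with a scripted stub before any real, billed calls are made.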
Related deep dive
Read further
Long-form context behind the tool output.
- Methodology · Opinion · 9 min
The 8-Step LLM Research Prompt Template
Free-form prompts yield uncalibrated LLM output. An 8-step template makes research reproducible and better-calibrated across model versions.
- Tutorial · Runnable · 9 min
LLM Prompt Patterns for 10-K and 8-K Extraction
Three structured patterns for auditable 10-K extractions: field-by-field JSON, citation-required verbatim quotes, and contradiction-triangle cross-check.
- Tutorial · Runnable · 10 min
Options Greeks for LLM-Driven Trading
Options Greeks for LLM-driven trading (delta, gamma, theta, vega, rho): what each costs, three rules, plus a prompt template for multi-leg positions.
Complementary tools
Users of this tool often explore
Prompt Regression Tester
Run the same prompt against multiple models (Claude 4.5/4.6/4.7, GPT-5, Gemini 2.5) with your own keys. Diff outputs, score drift, catch regressions.
Hallucination Detector
Paste a source document + an LLM's extraction. Every numeric claim in the output is checked against the source. Client-side. Catches silent fabrication.
Prompt Injection Tester
Red-team a finance agent against 24 documented prompt-injection attacks — direct override, role confusion, indirect injection via retrieved content.