Why does the tester need my own API key?

Calls go to Anthropic's API directly from your browser. The tool never sees or proxies the key. This keeps cost on your account, makes rate limits predictable, and avoids the privacy issue of routing finance prompts through a third-party proxy. Your key stays in browser memory only — it's not persisted to localStorage.

What does the tester measure?

Three things: structured-output compliance (does the model return valid JSON matching the schema?), token cost (input + output tokens × current pricing), and end-to-end latency. Repeated runs show variance — useful for diagnosing flaky outputs.
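The token-cost figure can be reproduced by hand. The sketch below assumes input and output tokens are priced separately per million tokens; the two price constants are hypothetical placeholders, not current rates, and `estimate_cost_usd` is an illustrative name rather than anything the tester exposes.

```python
from decimal import Decimal

# Hypothetical prices in USD per million tokens; substitute the current published rates.
INPUT_PRICE_PER_MTOK = Decimal("3.00")
OUTPUT_PRICE_PER_MTOK = Decimal("15.00")


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Price input and output tokens separately, then sum."""
    million = Decimal(1_000_000)
    return (
        Decimal(input_tokens) * INPUT_PRICE_PER_MTOK
        + Decimal(output_tokens) * OUTPUT_PRICE_PER_MTOK
    ) / million


# 500 input tokens and 200 output tokens at the placeholder prices above.
cost = estimate_cost_usd(input_tokens=500, output_tokens=200)
print(cost)
```

`Decimal` is used instead of `float` so that per-call costs well under a cent do not pick up rounding noise when summed over many runs.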

What's a common mistake when using Agent Skill Tester for Markets?

Testing only the happy-path input. Skills break on malformed or adversarial input — feed at least one example with missing or wrong-type fields to see how the skill degrades.
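One way to generate those degraded inputs systematically, rather than writing them by hand, is sketched below. It assumes the sample input is a flat JSON object; `degraded_variants` and the sample fields are hypothetical names chosen for illustration.

```python
import copy
from typing import Any, Iterator


def degraded_variants(sample: dict[str, Any]) -> Iterator[tuple[str, dict[str, Any]]]:
    """Yield (label, variant) pairs: each top-level field dropped once, then type-swapped once."""
    for key in sample:
        missing = copy.deepcopy(sample)
        del missing[key]
        yield f"missing:{key}", missing
    for key, value in sample.items():
        wrong = copy.deepcopy(sample)
        # Strings become a number; every other type becomes a string.
        wrong[key] = 0 if isinstance(value, str) else "wrong-type"
        yield f"wrong-type:{key}", wrong


# Hypothetical sample input for an extraction skill.
sample_input = {"ticker": "ACME", "revenue": 1250.5}
variants = dict(degraded_variants(sample_input))
for label, variant in variants.items():
    print(label, variant)
```

Each variant can be pasted into the tester as its own run, which makes it easy to see which fields the skill silently tolerates and which ones break it.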

What gets missed when a skill works on the current model?

Drift across model versions. Re-run the same input against Sonnet 4.5 and 4.7: output shapes can change silently, and a regression test catches the change before production does.
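A shape-level regression check can be quite small. The sketch below compares only field paths and value types between two saved outputs, ignoring the values themselves; the two JSON strings are made-up examples, not real model output, and `shape_diff` is a hypothetical helper name.

```python
import json
from typing import Any


def flatten_types(value: Any, path: str = "$") -> dict[str, str]:
    """Map each leaf path (for example "$.eps") to the name of its Python type."""
    if isinstance(value, dict) and value:
        out: dict[str, str] = {}
        for key, child in value.items():
            out.update(flatten_types(child, f"{path}.{key}"))
        return out
    return {path: type(value).__name__}


def shape_diff(old: Any, new: Any) -> list[str]:
    """List fields that were added, removed, or changed type between two outputs."""
    old_types, new_types = flatten_types(old), flatten_types(new)
    changes = []
    for path in sorted(old_types.keys() | new_types.keys()):
        if path not in new_types:
            changes.append(f"removed {path}")
        elif path not in old_types:
            changes.append(f"added {path}")
        elif old_types[path] != new_types[path]:
            changes.append(f"{path}: {old_types[path]} -> {new_types[path]}")
    return changes


# Made-up outputs standing in for the same input run against two model versions.
baseline = json.loads('{"ticker": "ACME", "eps": 1.42, "risks": ["fx"]}')
candidate = json.loads('{"ticker": "ACME", "eps": "1.42", "risk_factors": ["fx"]}')
changes = shape_diff(baseline, candidate)
print(changes)
```

Lists are treated as leaves here, so a change inside array elements would go unnoticed; that is a deliberate simplification to keep the sketch short.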


How to use Agent Skill Tester for Markets

Paste a SKILL.md definition, a sample input, and your Anthropic API key. The page runs the skill in your browser and returns the structured extraction, token cost, and latency — useful for evaluating Claude skills before wiring them into production.

Published May 12, 2026

By AI Fin Hub Research · AI Fin Hub Team



What It Does

Use the calculator with intent

The tester is built for engineers iterating on Claude skill definitions who want fast feedback without standing up a backend or burning CI minutes per change.

Interpreting Results

Validate that the extracted output matches your schema (every required field present, types correct). Watch latency: skills with deep context or many tool calls can easily exceed p95 latency budgets for interactive use cases.
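The "required fields present, types correct" check can be approximated offline with the standard library. This is a deliberately small sketch, not a full JSON Schema validator: it assumes a flat object schema where each property's `type` is a single string, and both the schema and the output below are hypothetical.

```python
from typing import Any

# JSON Schema type names mapped to Python types (a small subset, enough for flat objects).
JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "array": list,
    "object": dict,
}


def check_output(output: dict[str, Any], schema: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = []
    for field in schema.get("required", []):
        if field not in output:
            problems.append(f"missing required field: {field}")
    for field, spec in schema.get("properties", {}).items():
        type_name = spec.get("type")
        if field not in output or not isinstance(type_name, str) or type_name not in JSON_TYPES:
            continue
        value = output[field]
        # bool is a subclass of int in Python, so reject it explicitly for numeric fields.
        bool_as_number = isinstance(value, bool) and type_name in ("number", "integer")
        if bool_as_number or not isinstance(value, JSON_TYPES[type_name]):
            problems.append(f"wrong type for {field}: expected {type_name}")
    return problems


# Hypothetical schema and model output for illustration.
schema = {
    "required": ["ticker", "revenue", "period"],
    "properties": {
        "ticker": {"type": "string"},
        "revenue": {"type": "number"},
        "period": {"type": "string"},
    },
}
output = {"ticker": "ACME", "revenue": "1250.5"}
problems = check_output(output, schema)
print(problems)
```

For nested schemas, enums, or formats, a dedicated JSON Schema library is the better choice; the point here is only to show what the two headline checks amount to.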

Input Steps

Field by field

1. Paste the skill definition

Paste your SKILL.md definition into the editor. The schema spec must be valid JSON Schema for strict-mode validation to pass.

2. Paste sample input

Paste sample input that matches your skill's input schema. Use a realistic example, not a minimal one.

3. Enter your API key

Enter your Anthropic API key. The key stays in browser memory only; it is not persisted or logged.

4. Click Run

Click Run. Watch the structured output, token cost (input and output tokens × current pricing), and end-to-end latency.

5. Re-run

Re-run several times. Variance in outputs is informative: high variance suggests the prompt is under-constrained.
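Run-to-run variance from the last step can be summarized as a single number. The sketch below is one possible measure, assuming each run's structured output was saved as a JSON object; `agreement_rate` is a hypothetical name and the four runs are invented data.

```python
import json
from collections import Counter


def agreement_rate(outputs: list[dict]) -> float:
    """Share of runs that produced the most common output (1.0 means every run agreed)."""
    # Sorting keys makes two outputs with the same content but different key order compare equal.
    canonical = [json.dumps(o, sort_keys=True) for o in outputs]
    most_common_count = Counter(canonical).most_common(1)[0][1]
    return most_common_count / len(canonical)


# Invented outputs from four runs of the same skill on the same input.
runs = [
    {"rating": "hold", "score": 3},
    {"score": 3, "rating": "hold"},
    {"rating": "buy", "score": 4},
    {"rating": "hold", "score": 3},
]
rate = agreement_rate(runs)
print(rate)
```

Exact-match agreement is strict: free-text fields will rarely match word for word, so it is most useful on enum-like or numeric fields. The function expects at least one run.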

Common Scenarios

Use realistic starting points

Simple structured extraction skill

Skill input length: ~500 tokens

Expected output: JSON with 4 fields

Expect output that validates against the schema, latency under ~3s, and cost under one cent per call. Anything else suggests the skill needs trimming.

Multi-step research skill

Skill input length: ~3000 tokens

Expected output: Bulleted analysis

Latency 5–15s expected; check token cost against your per-decision budget before scaling.
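Scenario thresholds like these are easy to turn into a pass/fail check over repeated runs. The sketch below uses the simple extraction scenario's limits (about 3s latency, one cent per call) and a nearest-rank p95; the function names and the latency and cost figures are invented for illustration.

```python
def p95_nearest_rank(latencies_s: list[float]) -> float:
    """95th percentile by the nearest-rank method; integer math avoids float rounding."""
    ordered = sorted(latencies_s)
    rank = (95 * len(ordered) + 99) // 100  # equivalent to ceil(0.95 * n)
    return ordered[rank - 1]


def within_budget(
    latencies_s: list[float],
    costs_usd: list[float],
    max_p95_s: float,
    max_cost_usd: float,
) -> bool:
    """True when p95 latency and the most expensive call both stay under their limits."""
    return p95_nearest_rank(latencies_s) <= max_p95_s and max(costs_usd) <= max_cost_usd


# Invented measurements from ten runs: nine fast calls and one slow outlier.
latencies = [1.8, 2.1, 2.0, 2.4, 1.9, 2.2, 2.0, 2.3, 2.1, 4.6]
costs = [0.004] * 10
ok = within_budget(latencies, costs, max_p95_s=3.0, max_cost_usd=0.01)
print(p95_nearest_rank(latencies), ok)
```

With only ten samples, a single slow run sets the p95, which is why the check fails here even though the typical call is close to 2s. More runs give a steadier estimate.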

Try These Tools

Run the numbers next


Prompt Regression Tester

Run the same prompt against multiple models (Claude 4.5/4.6/4.7, GPT-5, Gemini 2.5) with your own keys. Diff outputs, score drift, catch regressions.


Hallucination Detector

Paste a source document + an LLM's extraction. Every numeric claim in the output is checked against the source. Client-side. Catches silent fabrication.


Structured Schema Validator for Finance

Paste LLM JSON output and validate against four pre-built finance schemas — research output, trade decision, risk snapshot, peer comparison — with sanity.


FAQ

Questions people ask next

The short answers readers usually want after the first pass.

SKILL.md is Anthropic's structured-skill spec: a markdown file with a name, description, input/output schema, and worked examples. Skills bundle a small repeatable agent capability (e.g., 'extract a 10-K risk factor') in a portable format. The tester loads your SKILL.md and runs it against sample input.

Keep the topic connected


Agent Skill Testing

Agent skill testing: the regression-test discipline for LLM-driven agents. What to test, how to score, and the difference between pass-rate and capability.


LLM Hallucination Detection in Finance

How to detect LLM hallucinations in financial outputs: citation grounding, verifiable-claim checks, and cross-model agreement that flag fabricated data.


Model Drift

Model drift: when an LLM's behavior changes between calls, versions, or weeks. The monitoring stack that catches it before production breaks.

