What does the Agent Skill Tester methodology page document?

It documents how the Agent Skill Tester calls the Anthropic API, scores outputs, and handles your API key, with source citations, assumption deltas, and as-of dates included. It states the formulas, assumptions, data sources, limitations, and reproducibility steps behind the Agent Skill Tester, which sits in the Finance category.

When was the Agent Skill Tester methodology last reviewed?

This methodology was last reviewed on 2026-04-20. The matching tool is at https://aifinhub.io/agent-skill-tester/.

Does the Agent Skill Tester run server-side?

No. The Agent Skill Tester runs entirely in the browser; there is no server component and no headless deterministic engine input, which is why this page does not embed a fixed recompute example.

Methodology: Agent Skill Tester — AI Fin Hub Research

What it does

The tool sends your SKILL.md as the system prompt and your sample input as the user message to Anthropic's Messages API. It returns the model's output alongside the measured latency, the input and output token counts reported by the API, and a computed cost estimate.

API call

POST https://api.anthropic.com/v1/messages
x-api-key: {your key}
anthropic-version: 2023-06-01
anthropic-dangerous-direct-browser-access: true
{
  "model": "{selected model}",
  "max_tokens": 1024,
  "temperature": 0,
  "system": "{SKILL.md contents}",
  "messages": [{ "role": "user", "content": "{input}" }]
}
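
The request above could be assembled in the browser roughly as follows. This is a minimal sketch, not the tool's actual source: the helper name buildSkillRequest and the SkillRequest type are hypothetical, and the content-type header is an assumption that does not appear in the listing above.

```typescript
// Hypothetical helper: names here are illustrative, not taken from the tool's source.
interface SkillRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildSkillRequest(
  apiKey: string,
  model: string,
  skillMd: string,
  input: string,
): SkillRequest {
  return {
    url: "https://api.anthropic.com/v1/messages",
    init: {
      method: "POST",
      headers: {
        "x-api-key": apiKey,
        "anthropic-version": "2023-06-01",
        // Opt-in header for calls made directly from a browser.
        "anthropic-dangerous-direct-browser-access": "true",
        // Assumption: not listed in the request above; JSON bodies are normally sent with it.
        "content-type": "application/json",
      },
      body: JSON.stringify({
        model,
        max_tokens: 1024,
        temperature: 0, // fixed, as in the request shown above
        system: skillMd, // SKILL.md contents become the system prompt
        messages: [{ role: "user", content: input }],
      }),
    },
  };
}
```

The returned pair can be passed straight to fetch(url, init). Keeping the request construction separate from the network call makes it possible to inspect exactly what would be sent before any key leaves the page.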

Cost estimate

Cost is computed from the usage field in the API response:

cost = input_tokens × price_in + output_tokens × price_out
       (pricing from /methodology/token-cost-optimizer/)
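
As a worked illustration of the formula, the sketch below computes the estimate from a usage object. It assumes prices are quoted in USD per million tokens and converts them to per-token prices first; that unit, the Pricing type, and the function name estimateCost are assumptions made for this example, and the figures in the usage note are placeholders rather than real rates.

```typescript
// Shape of the usage field returned by the Messages API.
interface Usage {
  input_tokens: number;
  output_tokens: number;
}

// Hypothetical pricing shape; the unit (USD per million tokens) is an assumption.
interface Pricing {
  inPerMTok: number;
  outPerMTok: number;
}

function estimateCost(usage: Usage, pricing: Pricing): number {
  // Convert per-million-token prices to per-token prices.
  const priceIn = pricing.inPerMTok / 1e6;
  const priceOut = pricing.outPerMTok / 1e6;
  // cost = input_tokens × price_in + output_tokens × price_out
  return usage.input_tokens * priceIn + usage.output_tokens * priceOut;
}
```

With placeholder prices of 3 and 15 per million tokens, a call reporting 1200 input tokens and 300 output tokens comes to about 0.0081, split as 0.0036 for input and 0.0045 for output.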

Privacy + key handling

Your API key is kept only in React state. It is never persisted to localStorage, sessionStorage, cookies, or any server.
The only network call made with your key is directly to api.anthropic.com.
Refreshing the page clears the key; you must re-enter it on next use.
For automated / scheduled use, run the same SKILL.md from your own scripts with scoped rate-limited keys rather than pasting a full-authority key into a browser tool.

Limitations

Anthropic-only. This tool calls Anthropic's API exclusively. For cross-model comparison, use the Prompt Regression Tester.
No caching. Anthropic prompt caching is disabled in this tool; each call is billed at full input token rate.
Single-shot. No tool-use, no multi-turn. The skill runs once per click.
Model IDs drift. If Anthropic retires a model ID listed in the selector, the request will 404. Update the model list as needed; contact /about/ for corrections.
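
A retired model ID can be told apart from other failures by checking the HTTP status together with the error type in the response body. The sketch below assumes the error body has the shape { type: "error", error: { type, message } } and that an unknown model is reported as not_found_error; the function name explainFailure and its messages are hypothetical and not part of the tool.

```typescript
// Assumed shape of an API error response body.
interface ApiErrorBody {
  type: "error";
  error: { type: string; message: string };
}

// Hypothetical helper that turns a failed response into a readable message.
function explainFailure(status: number, body: ApiErrorBody): string {
  if (status === 404 && body.error.type === "not_found_error") {
    // Most likely cause in this tool: the selected model ID no longer exists.
    return `Model not found (possibly retired): ${body.error.message}`;
  }
  return `Request failed with HTTP ${status}: ${body.error.message}`;
}
```

Surfacing the 404 case separately tells the user to pick a different model rather than to re-check their key or input.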
