aifinhub

Last updated 2026-04-20

How Agent Skill Tester works

How the Agent Skill Tester tool actually works: assumptions, algorithms, limitations.

What it does

The tool sends your SKILL.md as the system prompt and your sample input as the user message to Anthropic's Messages API. It returns the model's output alongside the measured latency, the input and output token counts reported by the API, and a computed cost estimate.

API call

POST https://api.anthropic.com/v1/messages
x-api-key: {your key}
anthropic-version: 2023-06-01
anthropic-dangerous-direct-browser-access: true
{
  "model": "{selected model}",
  "max_tokens": 1024,
  "temperature": 0,
  "system": "{SKILL.md contents}",
  "messages": [{ "role": "user", "content": "{input}" }]
}
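
For readers who want to reproduce the call outside the browser, the request above can be sketched in Python using only the standard library. This is a minimal sketch, not the tool's own code: the function names (build_headers, build_payload, run_skill) are hypothetical, the content-type header is an addition not shown in the request above, and timing the call with a wall-clock timer is an assumption about how latency might be measured. The anthropic-dangerous-direct-browser-access header is left out on the assumption that it only matters for calls made from a browser.

```python
import json
import time
import urllib.request

API_URL = "https://api.anthropic.com/v1/messages"


def build_headers(api_key: str) -> dict:
    """Headers for a non-browser caller (hypothetical helper)."""
    return {
        "x-api-key": api_key,
        "anthropic-version": "2023-06-01",
        # Not listed in the request above; added because the body is JSON.
        "content-type": "application/json",
    }


def build_payload(model: str, skill_md: str, user_input: str) -> dict:
    """Mirror the JSON body shown above: SKILL.md as system, input as user."""
    return {
        "model": model,
        "max_tokens": 1024,
        "temperature": 0,
        "system": skill_md,
        "messages": [{"role": "user", "content": user_input}],
    }


def run_skill(api_key: str, model: str, skill_md: str, user_input: str,
              timeout: float = 60.0):
    """Send one request and return (parsed response, latency in seconds)."""
    body = json.dumps(build_payload(model, skill_md, user_input)).encode("utf-8")
    request = urllib.request.Request(
        API_URL, data=body, headers=build_headers(api_key), method="POST"
    )
    started = time.perf_counter()  # assumed latency measure: wall-clock time
    with urllib.request.urlopen(request, timeout=timeout) as response:
        result = json.load(response)
    latency_s = time.perf_counter() - started
    return result, latency_s
```

Keeping the payload builder separate from the network call makes it possible to inspect or log the exact body before anything is sent.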

Cost estimate

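Cost is computed from the input_tokens and output_tokens values in the usage field of the API response, with price_in and price_out as the per-token prices for the selected model: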

cost = input_tokens × price_in + output_tokens × price_out
       (pricing from /methodology/token-cost-optimizer/)
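
The formula can be sketched as a small function. This is an illustration under stated assumptions: the function name estimate_cost is hypothetical, prices are assumed to be quoted in dollars per million tokens (so they are converted to per-token prices first), and the prices in the usage example are placeholders rather than real rates for any model.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  price_in_per_mtok: float, price_out_per_mtok: float) -> float:
    """cost = input_tokens * price_in + output_tokens * price_out.

    Assumes prices are given per million tokens and converts them
    to per-token prices before applying the formula.
    """
    price_in = price_in_per_mtok / 1_000_000
    price_out = price_out_per_mtok / 1_000_000
    return input_tokens * price_in + output_tokens * price_out


# Placeholder usage values and placeholder prices, for illustration only.
usage = {"input_tokens": 1200, "output_tokens": 300}
cost = estimate_cost(usage["input_tokens"], usage["output_tokens"], 3.00, 15.00)
print(f"${cost:.6f}")  # prints $0.008100
```

If the pricing table already lists per-token prices, the division by one million would be dropped.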

Privacy + key handling

  • Your API key is kept only in React state. It is never persisted to localStorage, sessionStorage, cookies, or any server.
  • The only network call made with your key is directly to api.anthropic.com.
  • Refreshing the page clears the key; you must re-enter it on next use.
  • For automated or scheduled use, run the same SKILL.md from your own scripts with scoped, rate-limited keys rather than pasting a full-authority key into a browser tool.
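
For the scripted route mentioned in the last point, one common pattern is to read the key from an environment variable and the skill from a file, so that neither is hard-coded. The sketch below assumes the variable name ANTHROPIC_API_KEY and the file name SKILL.md; both helper functions are hypothetical and not part of the tool.

```python
import os
from pathlib import Path


def load_api_key(env=None) -> str:
    """Read the key from the environment instead of embedding it in code."""
    env = os.environ if env is None else env
    key = env.get("ANTHROPIC_API_KEY", "").strip()  # assumed variable name
    if not key:
        raise RuntimeError("ANTHROPIC_API_KEY is not set")
    return key


def load_skill(path: str = "SKILL.md") -> str:
    """Load the skill text that would be sent as the system prompt."""
    return Path(path).read_text(encoding="utf-8")
```

Passing a plain dictionary as env makes the key lookup easy to exercise without touching the real environment.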

Limitations

  1. Anthropic-only. This tool calls Anthropic's API exclusively. For cross-model comparison, use the Prompt Regression Tester.
  2. No caching. Anthropic prompt caching is disabled in this tool; each call is billed at the full input-token rate.
  3. Single-shot. No tool-use, no multi-turn. The skill runs once per click.
  4. Model IDs drift. If Anthropic retires a model ID listed in the selector, the request will fail with a 404 error. The model list is updated as needed; contact /about/ for corrections.
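
A script that reproduces the call may want to turn the 404 described in limitation 4 into a readable message. The following is a minimal sketch: the function name explain_status and the message wording are hypothetical, and the 401 and 429 cases are included on the assumption that they indicate a rejected key and rate limiting respectively.

```python
def explain_status(status: int) -> str:
    """Map an HTTP status code from the API call to a short hint."""
    if status == 404:
        # Per limitation 4: a retired or unknown model ID is one likely cause.
        return "Not found: the selected model ID may have been retired."
    if status == 401:
        return "Unauthorized: the API key was rejected."
    if status == 429:
        return "Rate limited: wait and retry."
    return f"Request failed with HTTP {status}."
```

With urllib, the status code is available as the code attribute of the urllib.error.HTTPError raised by the failed request.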
Planning estimates only — not financial, tax, or investment advice.