
Frequently asked questions

What does the Prompt Regression Tester methodology page document?
It documents how the Prompt Regression Tester compares LLM outputs across providers and scores drift. It states the formulas, assumptions, data sources, limitations, and reproducibility steps behind the tool, which is listed in the Finance category. Source citations, assumption deltas, and as-of dates are included.
When was the Prompt Regression Tester methodology last reviewed?
This methodology was last reviewed on 2026-04-20. The matching tool is at https://aifinhub.io/prompt-regression-tester/.
Does the Prompt Regression Tester run server-side?
No. The Prompt Regression Tester runs entirely in the browser. There is no server component and no headless deterministic engine that accepts inputs, which is why this page does not embed a fixed recompute example.

Methodology · Playground · Last updated 2026-04-20

How Prompt Regression Tester works

How the Prompt Regression Tester tool works: assumptions, algorithms, and limitations.

What it does

Sends the same prompt to multiple LLM endpoints in parallel, using API keys that you supply (BYO keys). Renders the outputs side by side, together with a pairwise drift matrix.

Drift metric

Jaccard distance (1 minus Jaccard similarity) between the character-trigram sets of the two outputs.

A = {all length-3 substrings of output_A (lowercased, whitespace-normalized)}
B = {all length-3 substrings of output_B (same normalization)}
drift(A, B) = 1 − |A ∩ B| / |A ∪ B|

A drift of 0 means the two outputs have identical trigram sets (near-identical text); a drift of 1 means they share no trigrams. This is a cheap syntactic gauge, not a semantic judge. Two paraphrases of the same answer can score surprisingly high drift; an embedding-based similarity is the right next step for semantic equivalence testing.
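The drift formula above can be sketched in a few lines of TypeScript. This is a minimal illustration, not the tool's actual source: the function names (`trigrams`, `drift`, `driftMatrix`) are hypothetical, the exact whitespace normalization (collapse runs to one space, then trim) is an assumption, and so is the choice to treat two outputs with no trigrams at all as drift 0.

```typescript
// Character-trigram set of a string.
// Assumption: "whitespace-normalized" means collapsing runs of whitespace
// to a single space and trimming the ends.
function trigrams(text: string): Set<string> {
  const normalized = text.toLowerCase().replace(/\s+/g, " ").trim();
  const grams = new Set<string>();
  for (let i = 0; i + 3 <= normalized.length; i++) {
    grams.add(normalized.slice(i, i + 3));
  }
  return grams;
}

// drift(A, B) = 1 - |A ∩ B| / |A ∪ B|
function drift(a: string, b: string): number {
  const A = trigrams(a);
  const B = trigrams(b);
  // Assumption: two outputs with no trigrams count as identical,
  // which also avoids dividing by zero.
  if (A.size === 0 && B.size === 0) return 0;
  let shared = 0;
  A.forEach((g) => {
    if (B.has(g)) shared++;
  });
  const union = A.size + B.size - shared;
  return 1 - shared / union;
}

// Pairwise drift matrix: entry [i][j] is drift(outputs[i], outputs[j]).
function driftMatrix(outputs: string[]): number[][] {
  return outputs.map((x) => outputs.map((y) => drift(x, y)));
}
```

For example, "abcd" has trigrams {abc, bcd} and "bcde" has {bcd, cde}; they share one trigram out of three distinct ones, so the drift is 1 − 1/3 ≈ 0.67 even though the strings overlap heavily.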

Parallelization

All enabled targets run in parallel via Promise.all(). The wall-clock latency reported for the batch is therefore that of the slowest call, not the sum of all calls. Per-target latency is reported on each tile.
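A rough sketch of this pattern, under stated assumptions: the names `Caller`, `TargetResult`, `timeTarget`, and `runAll` are hypothetical, and catching errors inside each target (so that one failure does not reject the whole Promise.all()) is an assumed design, chosen because it matches the per-target error reporting described on this page.

```typescript
// A target is any function that resolves to the model's text output.
type Caller = () => Promise<string>;

interface TargetResult {
  name: string;
  ok: boolean;
  output?: string;
  error?: string;
  latencyMs: number; // per-target latency, as shown on each tile
}

// Time one call and convert a failure into a result instead of a rejection.
async function timeTarget(name: string, call: Caller): Promise<TargetResult> {
  const start = Date.now();
  try {
    const output = await call();
    return { name, ok: true, output, latencyMs: Date.now() - start };
  } catch (err) {
    return { name, ok: false, error: String(err), latencyMs: Date.now() - start };
  }
}

// Start every target at once; batchMs is the wall-clock time of the batch,
// which is bounded below by the slowest individual call.
async function runAll(
  targets: Record<string, Caller>
): Promise<{ results: TargetResult[]; batchMs: number }> {
  const start = Date.now();
  const results = await Promise.all(
    Object.keys(targets).map((name) => timeTarget(name, targets[name]))
  );
  return { results, batchMs: Date.now() - start };
}
```

Without the per-target catch, Promise.all() would reject as soon as any single call failed and the other outputs would be lost to the caller; Promise.allSettled() is an alternative way to get the same isolation.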

API calls

  • Anthropic: POST /v1/messages with the anthropic-dangerous-direct-browser-access: true header, which opts in to direct calls from a browser.
  • OpenAI: POST /v1/chat/completions.
  • Google Gemini: POST /v1beta/models/{model}:generateContent.
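The three calls can be sketched as one request builder. This is an illustrative sketch only: `buildRequest` and `BuiltRequest` are hypothetical names, the base URLs, authentication headers, and minimal request bodies are assumptions drawn from each provider's public API documentation rather than from this page, and `max_tokens: 1024` and the `anthropic-version` value are example settings.

```typescript
type Provider = "anthropic" | "openai" | "gemini";

interface BuiltRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build (but do not send) the HTTP request for one provider.
function buildRequest(
  provider: Provider,
  apiKey: string,
  model: string,
  prompt: string
): BuiltRequest {
  const json = { "content-type": "application/json" };
  switch (provider) {
    case "anthropic":
      return {
        url: "https://api.anthropic.com/v1/messages", // assumed base URL
        method: "POST",
        headers: {
          ...json,
          "x-api-key": apiKey,
          "anthropic-version": "2023-06-01", // example version string
          "anthropic-dangerous-direct-browser-access": "true",
        },
        body: JSON.stringify({
          model,
          max_tokens: 1024, // illustrative limit
          messages: [{ role: "user", content: prompt }],
        }),
      };
    case "openai":
      return {
        url: "https://api.openai.com/v1/chat/completions", // assumed base URL
        method: "POST",
        headers: { ...json, authorization: `Bearer ${apiKey}` },
        body: JSON.stringify({
          model,
          messages: [{ role: "user", content: prompt }],
        }),
      };
    case "gemini":
      return {
        // assumed base URL; the model name is part of the path
        url: `https://generativelanguage.googleapis.com/v1beta/models/${encodeURIComponent(model)}:generateContent`,
        method: "POST",
        headers: { ...json, "x-goog-api-key": apiKey },
        body: JSON.stringify({ contents: [{ parts: [{ text: prompt }] }] }),
      };
  }
}
```

A caller would then send it with `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`. Keeping request construction separate from sending makes it easy to check that a key is only ever attached to a request for its own provider's origin.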

Privacy + key handling

  • Each API key stays in React state. It is never persisted and never sent to any origin other than the respective provider's API.
  • Refreshing the page clears all keys.

Limitations

  1. Rate limits + retries. No backoff is implemented. A provider rate-limit error surfaces as a per-target error without retry.
  2. Timeouts. No explicit timeout is set on the fetch calls, so only the browser's own network-level limits apply; very slow responses hang until the browser kills them.
  3. Token counts. Reported directly from each provider. Gemini's token accounting uses SentencePiece, Anthropic uses its own tokenizer, and OpenAI uses BPE, so raw token counts are not directly comparable across providers.
  4. No cross-run history. Each run starts fresh; there is no "regression against last release" replay mode. Save your prompt + outputs locally for longitudinal comparisons.

Planning estimates only — not financial, tax, or investment advice.