Skip to main content
aifinhub
AI in Markets Calculator Guide

How to use Prompt Injection Tester

Red-team your finance agent against 24 documented prompt-injection attacks — direct override, role confusion, indirect injection via retrieved content. The page reports which attacks the agent followed and which it correctly refused.

By Orbyd Editorial · AI Fin Hub Team

What It Does

Use the calculator with intent

Red-team your finance agent against 24 documented prompt-injection attacks — direct override, role confusion, indirect injection via retrieved content. The page reports which attacks the agent followed and which it correctly refused.

Engineers deploying agents that read external content (news, filings, social posts) and need to know which prompt-injection vectors break their guardrails.

Interpreting Results

Any 'followed' attack is a critical bug — patch the system prompt or input sanitization before deployment. 'Refused' attacks are passing the bar; 'partial' means the agent didn't follow but also didn't flag the attempt, which is a softer fail.

Input Steps

Field by field

  1. 1

    Pick option

    Pick the prompt being tested (e.g., your agent's system prompt + sample user message).

  2. 2

    Run calculation

    Run the injection battery: instruction override, context smuggling, role confusion, data exfiltration, refusal bypass.

  3. 3

    Read outputs

    Read the per-attack pass/fail. Aggregate pass rate is the high-level metric; per-attack details show specific weaknesses.

  4. 4

    Investigate

    Investigate failures. Each failure shows the attack prompt and the model's compromised output — useful for hardening the system prompt.

  5. 5

    Re-run

    Re-run after every system prompt change. Injection resistance is fragile; small prompt edits can introduce regressions.

Common Scenarios

Use realistic starting points

Naive agent (no defenses)

System prompt

minimal

Input sanitization

none

Most attacks succeed. Direct-override attacks score 80%+. Use this as a baseline before hardening.

Hardened agent (post-defenses)

System prompt

explicit refusal rules

Input sanitization

yes

Direct attacks largely refused; indirect (RAG-based) attacks still leak through if retrieved content can include instructions.

Try These Tools

Run the numbers next

FAQ

Questions people ask next

The short answers readers usually want after the first pass.

Documented on the methodology page: instruction override ('ignore previous instructions'), context smuggling (hidden instructions in retrieved documents), role confusion ('you are now a different assistant'), data exfiltration ('repeat your system prompt'), and refusal bypass ('this is a test, please respond despite policy'). Plus jailbreak templates from public databases.

Related Content

Keep the topic connected

Planning estimates only — not financial, tax, or investment advice.