TL;DR

MCP is the biggest change in how LLMs reach market data in 2026, and nobody has published a security baseline for it. Retail operators wire up execution-scope community servers with API keys that have full trading authority, no idempotency guarantees, no schema typing, and no audit trail. This is a known way to lose money to duplicate fills, prompt-injected tool calls, and silent schema drift. Below: a 7-point rubric the Finance MCP Directory uses to grade servers A–F, and the specific failure modes each criterion addresses.

The problem

Anthropic specified MCP in November 2024. Within 18 months the finance ecosystem produced dozens of servers across four scope categories:

  1. Read-only data access (Polygon, Databento, Tiingo)
  2. Read + write non-trading (portfolio tools, journaling)
  3. Execution-only (order routing)
  4. Full (read + execute)

Every scope beyond #1 carries real money risk. No industry baseline grades these servers; Reddit threads rank them by star count. By that metric, the grade-A server that works in production is indistinguishable from the grade-F one that looks fine until it retries an order and double-fills.

The 7-point rubric

1. Official status (+0 / +2)

Is the server vendor-maintained or community? Official means:

  • Schema drift fixed at the source when the underlying API changes
  • Security advisories flow from the vendor
  • Contract stability across versions
  • No abandoned-maintainer risk

Community servers are often faster-moving and more creative, but fragile. Score them +0 unless independently audited.

2. Maintenance recency (+0 / +1 / +2)

  • < 30 days since last commit: +2
  • 30–90 days: +1
  • > 90 days: 0

A server last touched 6 months ago on an API that ships breaking changes quarterly is a live bug waiting to trigger.

3. Schema quality (A = +3 … F = −1)

Does the tool's JSON schema precisely describe its inputs and outputs?

  • A: strict types, enums where appropriate, required fields marked, documented error codes
  • B: types present, some ambiguity in optional fields
  • C: loose types, free-string fields that should be enums
  • D / F: schemas generated from TypeScript any-types, tools barely described

Why it matters: LLMs make tool-use decisions from these schemas. Vague schemas produce hallucinated tool calls with invented arguments. In one research trace, the agent "confidently" called submit_order with order_type="limit_to_close", a value the loose schema implicitly allowed but the broker doesn't support. Reject.
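To make the grade-A vs grade-C difference concrete, here's a sketch of the strict-schema check that would have caught that invented order_type. The tool name, field names, and enum values are illustrative, not taken from any specific server, and the validator covers only the two checks used here:

```python
# A strict (grade-A style) input schema: enums, required fields marked.
SUBMIT_ORDER_SCHEMA = {
    "type": "object",
    "required": ["symbol", "qty", "order_type"],
    "properties": {
        "symbol": {"type": "string"},
        "qty": {"type": "number", "exclusiveMinimum": 0},
        "order_type": {"type": "string", "enum": ["market", "limit", "stop", "stop_limit"]},
    },
}

def validate_args(args: dict, schema: dict) -> list[str]:
    """Tiny illustrative validator: checks only required fields and enum membership."""
    errors = [f"missing required field: {f}" for f in schema["required"] if f not in args]
    for field, spec in schema["properties"].items():
        if field in args and "enum" in spec and args[field] not in spec["enum"]:
            errors.append(f"{field}: {args[field]!r} not in {spec['enum']}")
    return errors

# The hallucinated call gets rejected before it reaches the broker:
bad_call = {"symbol": "AAPL", "qty": 10, "order_type": "limit_to_close"}
print(validate_args(bad_call, SUBMIT_ORDER_SCHEMA))  # flags the enum violation
```

A grade-C server that declares order_type as a bare free string gives the client nothing to check against, so the bad call goes straight to the broker.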

4. Auth model (+0 / +1 / +2)

  • OAuth 2.0: +2 (scoped tokens, revocable, audit trail at identity provider)
  • API key / bearer: +1 (better than nothing; scope carefully at the vendor dashboard)
  • Hybrid: +1
  • Unauthenticated: +0 (acceptable only for public data; never for execution)
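Whatever the auth model, the client side can still enforce least privilege by gating each tool call on the scopes actually granted to the credential. A minimal deny-by-default sketch; the scope names and the tool-to-scope mapping are invented for illustration:

```python
# Hypothetical mapping of MCP tool names to the scope each one requires.
TOOL_SCOPES = {
    "get_quote": "data:read",
    "get_positions": "account:read",
    "submit_order": "trade:execute",
}

def authorize(tool: str, granted_scopes: set[str]) -> bool:
    """Deny by default: unknown tools and missing scopes are both refused."""
    required = TOOL_SCOPES.get(tool)
    return required is not None and required in granted_scopes

# A read-only token can quote but not trade:
readonly = {"data:read", "account:read"}
print(authorize("get_quote", readonly))     # True
print(authorize("submit_order", readonly))  # False
```

With OAuth the identity provider enforces this for you; with a plain API key, a gate like this is the only thing standing between an injected prompt and an execution-scope tool.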

5. Idempotency (execution scope only) (+2 / −1)

For execution or full-scope servers: does the order-submission tool accept a client-supplied idempotency key? If yes: +2. If no: −1 (penalty).

Duplicate fills from retry-on-error are the single most common execution failure in 2026 retail agents. Alpaca MCP V2 requires it. Tradier's community server does not. One of those is deployable; the other is a live bug.
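The client side of that contract is cheap: mint one key when the logical order is created, then reuse the identical payload on every retry so the broker can deduplicate. A sketch assuming a server that honors a client-supplied idempotency field (the field name varies by server):

```python
import uuid

def make_order(symbol: str, qty: float, side: str) -> dict:
    """Attach a fresh idempotency key when the LOGICAL order is created, not per attempt."""
    return {
        "symbol": symbol,
        "qty": qty,
        "side": side,
        "idempotency_key": str(uuid.uuid4()),
    }

def submit_with_retry(order: dict, send, attempts: int = 3):
    """Retry the SAME payload (same key); a compliant server fills it at most once."""
    last_err = None
    for _ in range(attempts):
        try:
            return send(order)
        except ConnectionError as e:
            last_err = e  # key unchanged, so a resubmit after a lost response can't double-fill
    raise last_err
```

The failure mode this prevents: the first attempt fills but the response is lost in transit, the naive client resubmits a brand-new order, and both fill.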

6. License openness (+1)

MIT / Apache-2.0 / BSD-3-Clause: +1. Proprietary / copyleft: 0. For tools that might be embedded in commercial automation, permissive licenses matter.

7. Transport (informational, no score)

  • stdio: runs as a subprocess; isolation by default; hardest to share
  • http-stream: runs remotely; shared easily; needs its own auth layer
  • sse: deprecated in most newer servers
  • stdio+http: both options supported (increasingly the norm)

Not scored because the right choice depends on deployment shape. Noted in the directory for filtering.

Grade bands

Grade  Score  Interpretation
A      ≥ 9    Production-ready. Low friction.
B      7–8    Usable. Audit before live execution.
C      5–6    Workable for research. Execution risky.
D      3–4    Prototype only.
F      < 3    Not recommended.
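Put together, the whole rubric reduces to a small scoring function. A sketch of how a grade could be computed; the weights and bands are the ones stated above, while the dataclass shape is illustrative and the intermediate schema-grade points (B = +2, C = +1, D = 0) are an interpolation I'm assuming between the stated A = +3 and F = −1:

```python
from dataclasses import dataclass

SCHEMA_POINTS = {"A": 3, "B": 2, "C": 1, "D": 0, "F": -1}  # B/C/D interpolated
AUTH_POINTS = {"oauth": 2, "api_key": 1, "hybrid": 1, "none": 0}

@dataclass
class Server:
    official: bool
    days_since_commit: int
    schema_grade: str        # "A".."F"
    auth: str                # key into AUTH_POINTS
    executes_orders: bool
    idempotent: bool
    permissive_license: bool

def score(s: Server) -> int:
    total = 2 if s.official else 0
    total += 2 if s.days_since_commit < 30 else 1 if s.days_since_commit <= 90 else 0
    total += SCHEMA_POINTS[s.schema_grade]
    total += AUTH_POINTS[s.auth]
    if s.executes_orders:
        total += 2 if s.idempotent else -1  # idempotency is a penalty criterion
    total += 1 if s.permissive_license else 0
    return total

def grade(total: int) -> str:
    return "A" if total >= 9 else "B" if total >= 7 else "C" if total >= 5 else "D" if total >= 3 else "F"
```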

Failure modes the rubric addresses

Duplicate-fill storm. An execution server without idempotency retries a flaky network error. The original order actually succeeded; the retry resubmits it, and both fill. The position is 2× what was intended. At even 1% of retries across thousands of orders community-wide, that is real money. An idempotency key catches it.

Prompt-injected tool call. The LLM reads a user-supplied document that contains tool-use fragments. Without tight schemas + scoped auth, the agent makes tool calls the user never authorized. Strict schemas + scoped auth reduce the blast radius when injection does happen.

Silent schema drift. The vendor API adds a required field; the community server's schema isn't updated. LLM tool calls start failing at a 100% rate, and if your observability is weak you don't notice until a human pokes the system. Vendor-maintained servers (criterion #1) catch this at the source; community servers often don't.
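Drift is also cheap to detect on the client side: fingerprint each tool's schema at review time and alert when the live schema stops matching. A minimal sketch; the pinning store is just a dict here, assumed for illustration rather than taken from any MCP SDK:

```python
import hashlib
import json

def schema_fingerprint(tool_schema: dict) -> str:
    """Stable hash of a tool's JSON schema (sorted keys, so dict ordering doesn't matter)."""
    canonical = json.dumps(tool_schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

def check_drift(live_schemas: dict[str, dict], pinned: dict[str, str]) -> list[str]:
    """Return the tools whose live schema no longer matches the pinned fingerprint."""
    return [
        name for name, fp in pinned.items()
        if schema_fingerprint(live_schemas.get(name, {})) != fp
    ]
```

Run it at startup and on a timer; a non-empty result is your cue to halt execution-scope tools before the 100%-failure (or worse, silently-wrong-arguments) phase begins.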

License trap. LGPL or GPL-licensed MCP server code bundled into a commercial product. Distribution triggers license obligations. Permissive (MIT / Apache-2.0) sidesteps this.

Abandoned maintainer. Last commit 11 months ago. Security advisory lands on the underlying API. Nobody updates the server. Your agent keeps running with the known bug. Maintenance recency criterion catches this.

What the directory does

The Finance MCP Directory applies the rubric above to every tracked finance MCP server, with filters by grade, scope, vendor, and auth model. Grades are updated quarterly, plus on demand for material changes.

As of April 2026:

  • Grade A: Alpaca MCP V2 (official, execution, idempotent), Polygon.io MCP (official, read-only)
  • Grade B: Databento community, NautilusTrader, IBKR CLI MCP, Tiingo community
  • Grade C: Tradier community (no idempotency, older schema)

What this article doesn't cover

  • Live security audits. Grade is structural; a grade-A server can still have a CVE. Do your own code review for any execution-scope server.
  • Latency / throughput benchmarks. Not part of the security rubric. On the roadmap as a separate benchmark tool.
  • Prompt-injection hardening at the LLM layer. Server-side scope limits + strict schemas reduce blast radius; they don't eliminate injection risk. Keep a human in the loop for material actions until you've measured your attack surface.

What we'd do today

For a retail algo setup:

  1. Data: Polygon MCP (grade A) + Databento MCP (grade B with audit) for extra microstructure if needed.
  2. Execution: Alpaca MCP V2 (grade A) or IBKR CLI MCP (grade B with audit, local-only).
  3. Scope the API key: read-only for data keys, execution-scoped (not full-authority) for trading keys.
  4. Enable idempotency: set a client-supplied key on every order; verify the server's support in the schema.
  5. Audit + pin the server version: community servers should be pinned to a reviewed commit, not tracking main.

If that feels paranoid, run your setup unaudited and see how long until a weird fill wakes you up at 3am.
