Security test catalog

The bundled security checks mcptest runs against an MCP server. The framework that runs them (declarative checks, deterministic verdicts, separate from the LLM eval path) is settled, and the checks cover the MCP security taxonomy.

This catalog is Layer A, the deterministic baseline. The advisory LLM-judge detection (Layer B), the dynamic red-team (Layer C), and external scanner integrations (Layer D) are part of the multi-layer security testing design. Only Layer A decides the security grade; the other layers are advisory, per-model, or normalized third-party signals.

Every check is deterministic. No model decides a pass or fail. Each row lists the probe method (static inspection of definitions, an active protocol probe, or a diff against a pinned manifest), a severity, whether a black-box client can detect it (yes, partial, or no), and how it maps to the threat model and to the external catalogs (OWASP LLM Top 10, OWASP Agentic Top 10, the MCP specification security guidance, or a CWE).

Out of scope, and marked so on the scorecard rather than scored as a pass: anything that needs host or server-runtime visibility (sandbox escape, server or client internals, server-side WAF behavior). A separate, model-dependent signal ("does this model fall for an injection") belongs to the agent eval path, not here, because an LLM never decides a security verdict.

Check IDs follow the rule-ID standard and emit through the SARIF reporter.

Tool-surface static analysis

Static inspection of tool, prompt, and resource definitions: names, descriptions, schemas, and annotations. The tool description is executable context, so this is the highest-yield family.

ID	Check	Method	Severity	Black-box	Maps to
SEC-001 description-injection	Imperative-to-model instructions in a description ("ignore previous", "before doing X also")	static	high	yes	tool poisoning; LLM01
SEC-002 cross-tool-directive	A description that instructs the model to call or alter another tool	static	high	yes	shadowing, parasitic toolchain
SEC-003 exfiltration-directive	A description that tells the model to read files, env vars, or send data outward	static	high	yes	data exfiltration; LLM02
SEC-004 encoded-payload	Base64, hex, or other encoded blobs embedded in a description or schema	static	medium	yes	tool poisoning; CWE-506
SEC-005 hidden-unicode	Invisible, zero-width, or bidirectional unicode in names or descriptions	static	high	yes	tool poisoning; CWE-176
SEC-006 preference-manipulation	Persuasive "always use this tool" language that biases selection	static	medium	yes	preference manipulation
SEC-007 docstring-schema-mismatch	A parameter named in the description that the input schema does not declare	static	medium	yes	behavioral mismatch
SEC-008 secret-in-definition	API keys, tokens, or PII in a description or example	static	high	yes	sensitive disclosure; LLM02
SEC-009 unannotated-destructive-tool	A write or delete tool with no destructive-action annotation	static	medium	partial	excessive agency; LLM06
SEC-036 unbounded-list-tool	A list/search/query/fetch tool whose input schema declares no bound parameter (limit, max, count, page_size, top_k, page, offset, cursor, per_page)	static	low	partial	unbounded consumption; LLM10; CWE-770
SEC-037 system-prompt-leakage	A prompt or resource that embeds a system-instruction block carrying a secret	static	high	partial	system prompt leakage; LLM07

Definition integrity and drift

Diff the current definitions against a pinned manifest. This is the rug-pull and drift defense, the same hash-and-version approach mcp-scan calls tool pinning, reusing the cassette manifest.

ID	Check	Method	Severity	Black-box	Maps to
SEC-014 tool-pinning-diff	A previously approved tool's description changed between the baseline and the current catalog	diff	high	yes	rug pull
SEC-015 schema-drift	A previously approved tool's input schema, output schema, or annotations changed	diff	medium	yes	configuration drift
SEC-016 version-stamp-posture	Whether the server version-stamps its definitions (ETDI-style)	static	info	partial	integrity posture signal

Namespace and supply chain

Static analysis across a multi-server config.

ID	Check	Method	Severity	Black-box	Maps to
SEC-010 duplicate-tool-name	The same tool name served by more than one server	static	high	yes	tool shadowing
SEC-011 tool-name-squat	A tool name that is a near-duplicate or typosquat of another server's tool	static	medium	partial	tool-name squatting
SEC-012 server-name-squat	A server name that is a near-duplicate of another server name	static	medium	partial	server-name squatting
SEC-013 ambiguous-resolution	The same tool name with different input schemas across servers	static	medium	yes	multi-server trust; LLM03

Toxic flow and capability

Static analysis that scores a catalog's latent capability before any payload fires. Each tool is classified into zero or more capability tiers from keyword heuristics over its name, description, and input-schema property names (a conservative match, so detectability is partial). The risk is the pairing: an untrusted-content source plus an exfil-or-destructive sink is a complete exfiltration chain a prompt injection can wire together. SEC-003 and SEC-009 catch the directive half; these score the structural half.

The five tiers are sensitive-data-exposure (email, credential, vault, secret), workspace-data-exposure (file, path, repo, code), destructive (delete, drop, exec, transfer, pay), local-destructive (rm, unlink, format, local path), and untrusted-content-source (fetch, http, url, web, third-party). A sink is any of the three destructive or sensitive-exposure tiers.

ID	Check	Method	Severity	Black-box	Maps to
SEC-033 capability-tier	Classify each tool into latent capability tiers (informational posture)	static	info	partial	toxic flow; excessive agency; LLM06
SEC-034 untrusted-content-source	A tool that pulls untrusted external content, the injection entry point	static	low	partial	toxic flow; LLM01
SEC-035 toxic-flow-pairing	An untrusted-content source coexists with an exfil-or-destructive sink	static	high	partial	toxic flow; data exfiltration; LLM02

Transport and protocol

Active probes against the running server.

Implemented probes carry a SEC id and analyze captured [ProbeEvidence]; the live probing sits behind that evidence struct so a run is deterministic and cassette-replayable. The first batch (SEC-017..023) landed and the follow-up batch (SEC-024..032); the catalog is now fully implemented.

ID	Check	Method	Severity	Black-box	Maps to
SEC-017 transport/tls-required	Server accepts plaintext HTTP on a non-loopback address	active	high	yes	MCP spec (HTTPS)
SEC-018 transport/origin-validation	Server does not validate the Origin header	active	high	partial	DNS rebinding; MCP spec
SEC-019 transport/private-ip-guard	OAuth or discovery URLs resolve to private or cloud-metadata ranges	active	high	partial	SSRF (server-side request forgery); MCP spec (block private IPs)
SEC-020 transport/error-envelope-leak	Error responses leak stack traces, internal paths, or secrets	active	medium	yes	sensitive disclosure; CWE-209
SEC-024 transport/rate-limit-present	No rate limiting under a controlled burst	active	low	partial	unbounded consumption; LLM10
SEC-025 transport/capability-attestation	Server advertises capabilities without attestation	active	low	partial	Breaking the Protocol
SEC-026 transport/sampling-origin-auth	Server uses sampling without authenticating the origin	active	medium	partial	Breaking the Protocol
SEC-027 transport/unbounded-response	A single response can grow without a server-side bound	active	low	yes	unbounded consumption; LLM10

Auth and identity posture

Active probes of the authentication and authorization surface, grounded in the MCP specification's MUST and SHOULD requirements.

ID	Check	Method	Severity	Black-box	Maps to
SEC-021 auth/posture-tier	Report the auth tier (none, header-bearer, OAuth)	active	info	yes	posture signal
SEC-022 auth/token-audience	Server accepts a token not issued for it (token passthrough)	active	high	partial	MCP spec (no passthrough); RFC 8707
SEC-028 auth/resource-indicators	Server advertises and honors RFC 8707 resource indicators	active	medium	partial	RFC 8707
SEC-029 auth/scope-minimization	Server publishes wildcard or omnibus scopes	active	medium	partial	MCP spec (scope minimization)
SEC-030 auth/session-id-hygiene	Predictable session IDs, or sessions used for authentication	active	high	partial	MCP spec (session hygiene)
SEC-031 auth/confused-deputy-posture	Proxy server lacks per-client consent or exact redirect_uri matching	active	medium	partial	MCP spec (confused deputy)

Local server and supply chain

Static analysis of stdio and local-server configuration.

ID	Check	Method	Severity	Black-box	Maps to
SEC-023 local/dangerous-startup-command	A startup command with a dangerous pattern (sudo, rm -rf, piped curl, SSH-key access)	static	critical	yes	MCP spec (local compromise); LLM03
SEC-032 local/install-source-provenance	Install source has no provenance or pinned version	static	low	partial	supply chain; LLM03

OWASP MCP Top 10 cross-walk

In addition to the per-row mappings above (OWASP LLM and Agentic Top 10, MCP spec requirements, CWEs), the catalog cross-walks to the official OWASP MCP Top 10, the most on-point external anchor. That list is in beta, so the mapping covers its published items.

OWASP MCP Top 10	Covered by
MCP01 Token mismanagement and secret exposure	surface/secret-in-definition, auth/token-audience
MCP02 Privilege escalation	surface/unannotated-destructive-tool, auth/scope-minimization
MCP03 Tool poisoning	the surface/ family, SEC-014 tool-pinning-diff
MCP04 Supply chain and dependency tampering	the namespace family (SEC-010 through SEC-013), local/install-source-provenance
MCP05 Command injection	local/dangerous-startup-command, surface/exfiltration-directive
MCP07 Insufficient authentication	the auth/ family
MCP09 Shadow MCP servers	SEC-010 duplicate-tool-name, SEC-012 server-name-squat
MCP10 Context over-sharing	surface/exfiltration-directive, transport/error-envelope-leak, SEC-035 toxic-flow-pairing

Coverage and follow-ups

This catalog is the bundled pack for the deterministic security framework. The static lanes are wired into the mcptest security CLI: the surface family (SEC-001..009), the namespace family (SEC-010..013), the integrity family (SEC-014..016, behind --baseline), and the toxic-flow family (SEC-033..035). The advisory LLM-judge lane runs behind --model and is reported separately so it never moves the verdict. The active probe rows (transport, auth, local SEC-017..032) and the per-model dynamic red-team layers (C1/C2) remain deferred until a live red-team command is wired: they need a running server and a captured trace, which the static-snapshot CLI does not supply. Coverage against the MCP-DPT inventory is reported as "N of the testable rows," and the scorecard security section renders each fired check as a line item with its severity and defense-layer tag.