mcptest docs GitHub

Security test catalog

The bundled security checks mcptest runs against an MCP server. The framework that runs them (declarative checks, deterministic verdicts, separate from the LLM eval path) is settled, and the checks cover the MCP security taxonomy.

This catalog is Layer A, the deterministic baseline. The advisory LLM-judge detection (Layer B), the dynamic red-team (Layer C), and external scanner integrations (Layer D) are part of the multi-layer security testing design. Only Layer A decides the security grade; the other layers are advisory, per-model, or normalized third-party signals.

Every check is deterministic. No model decides a pass or fail. Each row lists the probe method (static inspection of definitions, an active protocol probe, or a diff against a pinned manifest), a severity, whether a black-box client can detect it (yes, partial, or no), and how it maps to the threat model and to the external catalogs (OWASP LLM Top 10, OWASP Agentic Top 10, the MCP specification security guidance, or a CWE).

Out of scope, and marked so on the scorecard rather than scored as a pass: anything that needs host or server-runtime visibility (sandbox escape, server or client internals, server-side WAF behavior). A separate, model-dependent signal ("does this model fall for an injection") belongs to the agent eval path, not here, because an LLM never decides a security verdict.

Check IDs follow the rule-ID standard and emit through the SARIF reporter.

Tool-surface static analysis

Static inspection of tool, prompt, and resource definitions: names, descriptions, schemas, and annotations. The tool description is executable context, so this is the highest-yield family.

IDCheckMethodSeverityBlack-boxMaps to
SEC-001 description-injectionImperative-to-model instructions in a description ("ignore previous", "before doing X also")statichighyestool poisoning; LLM01
SEC-002 cross-tool-directiveA description that instructs the model to call or alter another toolstatichighyesshadowing, parasitic toolchain
SEC-003 exfiltration-directiveA description that tells the model to read files, env vars, or send data outwardstatichighyesdata exfiltration; LLM02
SEC-004 encoded-payloadBase64, hex, or other encoded blobs embedded in a description or schemastaticmediumyestool poisoning; CWE-506
SEC-005 hidden-unicodeInvisible, zero-width, or bidirectional unicode in names or descriptionsstatichighyestool poisoning; CWE-176
SEC-006 preference-manipulationPersuasive "always use this tool" language that biases selectionstaticmediumyespreference manipulation
SEC-007 docstring-schema-mismatchA parameter named in the description that the input schema does not declarestaticmediumyesbehavioral mismatch
SEC-008 secret-in-definitionAPI keys, tokens, or PII in a description or examplestatichighyessensitive disclosure; LLM02
SEC-009 unannotated-destructive-toolA write or delete tool with no destructive-action annotationstaticmediumpartialexcessive agency; LLM06
SEC-036 unbounded-list-toolA list/search/query/fetch tool whose input schema declares no bound parameter (limit, max, count, page_size, top_k, page, offset, cursor, per_page)staticlowpartialunbounded consumption; LLM10; CWE-770
SEC-037 system-prompt-leakageA prompt or resource that embeds a system-instruction block carrying a secretstatichighpartialsystem prompt leakage; LLM07

Definition integrity and drift

Diff the current definitions against a pinned manifest. This is the rug-pull and drift defense, the same hash-and-version approach mcp-scan calls tool pinning, reusing the cassette manifest.

IDCheckMethodSeverityBlack-boxMaps to
SEC-014 tool-pinning-diffA previously approved tool's description changed between the baseline and the current catalogdiffhighyesrug pull
SEC-015 schema-driftA previously approved tool's input schema, output schema, or annotations changeddiffmediumyesconfiguration drift
SEC-016 version-stamp-postureWhether the server version-stamps its definitions (ETDI-style)staticinfopartialintegrity posture signal

Namespace and supply chain

Static analysis across a multi-server config.

IDCheckMethodSeverityBlack-boxMaps to
SEC-010 duplicate-tool-nameThe same tool name served by more than one serverstatichighyestool shadowing
SEC-011 tool-name-squatA tool name that is a near-duplicate or typosquat of another server's toolstaticmediumpartialtool-name squatting
SEC-012 server-name-squatA server name that is a near-duplicate of another server namestaticmediumpartialserver-name squatting
SEC-013 ambiguous-resolutionThe same tool name with different input schemas across serversstaticmediumyesmulti-server trust; LLM03

Toxic flow and capability

Static analysis that scores a catalog's latent capability before any payload fires. Each tool is classified into zero or more capability tiers from keyword heuristics over its name, description, and input-schema property names (a conservative match, so detectability is partial). The risk is the pairing: an untrusted-content source plus an exfil-or-destructive sink is a complete exfiltration chain a prompt injection can wire together. SEC-003 and SEC-009 catch the directive half; these score the structural half.

The five tiers are sensitive-data-exposure (email, credential, vault, secret), workspace-data-exposure (file, path, repo, code), destructive (delete, drop, exec, transfer, pay), local-destructive (rm, unlink, format, local path), and untrusted-content-source (fetch, http, url, web, third-party). A sink is any of the three destructive or sensitive-exposure tiers.

IDCheckMethodSeverityBlack-boxMaps to
SEC-033 capability-tierClassify each tool into latent capability tiers (informational posture)staticinfopartialtoxic flow; excessive agency; LLM06
SEC-034 untrusted-content-sourceA tool that pulls untrusted external content, the injection entry pointstaticlowpartialtoxic flow; LLM01
SEC-035 toxic-flow-pairingAn untrusted-content source coexists with an exfil-or-destructive sinkstatichighpartialtoxic flow; data exfiltration; LLM02

Transport and protocol

Active probes against the running server.

Implemented probes carry a SEC id and analyze captured [ProbeEvidence]; the live probing sits behind that evidence struct so a run is deterministic and cassette-replayable. The first batch (SEC-017..023) landed and the follow-up batch (SEC-024..032); the catalog is now fully implemented.

IDCheckMethodSeverityBlack-boxMaps to
SEC-017 transport/tls-requiredServer accepts plaintext HTTP on a non-loopback addressactivehighyesMCP spec (HTTPS)
SEC-018 transport/origin-validationServer does not validate the Origin headeractivehighpartialDNS rebinding; MCP spec
SEC-019 transport/private-ip-guardOAuth or discovery URLs resolve to private or cloud-metadata rangesactivehighpartialSSRF (server-side request forgery); MCP spec (block private IPs)
SEC-020 transport/error-envelope-leakError responses leak stack traces, internal paths, or secretsactivemediumyessensitive disclosure; CWE-209
SEC-024 transport/rate-limit-presentNo rate limiting under a controlled burstactivelowpartialunbounded consumption; LLM10
SEC-025 transport/capability-attestationServer advertises capabilities without attestationactivelowpartialBreaking the Protocol
SEC-026 transport/sampling-origin-authServer uses sampling without authenticating the originactivemediumpartialBreaking the Protocol
SEC-027 transport/unbounded-responseA single response can grow without a server-side boundactivelowyesunbounded consumption; LLM10

Auth and identity posture

Active probes of the authentication and authorization surface, grounded in the MCP specification's MUST and SHOULD requirements.

IDCheckMethodSeverityBlack-boxMaps to
SEC-021 auth/posture-tierReport the auth tier (none, header-bearer, OAuth)activeinfoyesposture signal
SEC-022 auth/token-audienceServer accepts a token not issued for it (token passthrough)activehighpartialMCP spec (no passthrough); RFC 8707
SEC-028 auth/resource-indicatorsServer advertises and honors RFC 8707 resource indicatorsactivemediumpartialRFC 8707
SEC-029 auth/scope-minimizationServer publishes wildcard or omnibus scopesactivemediumpartialMCP spec (scope minimization)
SEC-030 auth/session-id-hygienePredictable session IDs, or sessions used for authenticationactivehighpartialMCP spec (session hygiene)
SEC-031 auth/confused-deputy-postureProxy server lacks per-client consent or exact redirect_uri matchingactivemediumpartialMCP spec (confused deputy)

Local server and supply chain

Static analysis of stdio and local-server configuration.

IDCheckMethodSeverityBlack-boxMaps to
SEC-023 local/dangerous-startup-commandA startup command with a dangerous pattern (sudo, rm -rf, piped curl, SSH-key access)staticcriticalyesMCP spec (local compromise); LLM03
SEC-032 local/install-source-provenanceInstall source has no provenance or pinned versionstaticlowpartialsupply chain; LLM03

OWASP MCP Top 10 cross-walk

In addition to the per-row mappings above (OWASP LLM and Agentic Top 10, MCP spec requirements, CWEs), the catalog cross-walks to the official OWASP MCP Top 10, the most on-point external anchor. That list is in beta, so the mapping covers its published items.

OWASP MCP Top 10Covered by
MCP01 Token mismanagement and secret exposuresurface/secret-in-definition, auth/token-audience
MCP02 Privilege escalationsurface/unannotated-destructive-tool, auth/scope-minimization
MCP03 Tool poisoningthe surface/ family, SEC-014 tool-pinning-diff
MCP04 Supply chain and dependency tamperingthe namespace family (SEC-010 through SEC-013), local/install-source-provenance
MCP05 Command injectionlocal/dangerous-startup-command, surface/exfiltration-directive
MCP07 Insufficient authenticationthe auth/ family
MCP09 Shadow MCP serversSEC-010 duplicate-tool-name, SEC-012 server-name-squat
MCP10 Context over-sharingsurface/exfiltration-directive, transport/error-envelope-leak, SEC-035 toxic-flow-pairing

Coverage and follow-ups

This catalog is the bundled pack for the deterministic security framework. The static lanes are wired into the mcptest security CLI: the surface family (SEC-001..009), the namespace family (SEC-010..013), the integrity family (SEC-014..016, behind --baseline), and the toxic-flow family (SEC-033..035). The advisory LLM-judge lane runs behind --model and is reported separately so it never moves the verdict. The active probe rows (transport, auth, local SEC-017..032) and the per-model dynamic red-team layers (C1/C2) remain deferred until a live red-team command is wired: they need a running server and a captured trace, which the static-snapshot CLI does not supply. Coverage against the MCP-DPT inventory is reported as "N of the testable rows," and the scorecard security section renders each fired check as a line item with its severity and defense-layer tag.