Pentest gate and scorecard security section
This is the product-facing output of the MCP security research track. It turns the MCP security taxonomy into three things: a coverage matrix for the pentest gate, a shortlist of black-box checks mcptest can run, and the spec for the scorecard's security posture section. The threat model and the testability flags come from that taxonomy; this document does not restate them.
mcptest is a black-box client. It can inspect tool, resource, and prompt definitions, diff them across runs, analyze a multi-server config, and probe some transport behavior. It cannot see server-side execution or host internals. Every check below stays inside that boundary.
Coverage matrix
The pentest gate measures coverage against the testable rows of the ADR 0039 inventory, not against all 22 entries. The deterministic security engine (ADR 0040) now ships sixteen checks, SEC-001 through SEC-016, so the Server surface is covered and the Current column below reflects that. The Client, Transport, and User rows are not yet covered; each is tracked in a follow-up implementation ticket.
| Surface | Attacks (testable) | Current | Check |
|---|---|---|---|
| User | prompt injection, confused-model misuse | none | agent-loop probes (partial, deferred) |
| Client | schema inconsistencies | none | schema-vs-response validation |
| Transport | rebinding, MitM, capability attestation, sampling origin | none | transport hygiene probes |
| Server | tool poisoning, indirect injection, shadowing, rug pull, config drift, name squatting, preference manipulation, multi-server trust | shipped | surface scan (SEC-001/002/005/006), integrity diff (SEC-014/015/016), namespace scan (SEC-010 to SEC-013) |
The non-testable rows (vulnerable client, vulnerable server, sandbox escape, slash-command overlap) are documented as out of scope for the OSS client in ADR 0039. They are candidates for the cloud or enterprise tier, not the gate.
Black-box detection checks
The shortlist below is ordered by value relative to false-positive risk. We recommend shipping the first three in pentest-gate v1 because they are high-signal and low-noise.
| Check | Method | Attacks covered | False-positive risk | Status |
|---|---|---|---|---|
| Tool-definition manifest + diff | Hash each tool/prompt/resource definition; persist a manifest; diff across runs and flag any change to a previously approved definition | rug pull, configuration drift | low | shipped (SEC-014/015/016) |
| Duplicate-name and shadowing scan | In a multi-server config, flag duplicate tool names across servers and ambiguous resolution order | tool shadowing, multi-server trust | low | shipped (SEC-010, SEC-013) |
| Description-injection scan | Scan descriptions and schemas for imperative-to-model instructions, known injection phrases ("ignore previous", "before doing X also"), and invisible or bidirectional unicode | tool poisoning, indirect prompt injection | medium | shipped (SEC-001/002/005) |
| Name-squatting heuristic | Flag tool or server names that are near-duplicates or typosquats of known servers | tool-name and server-name squatting | medium | shipped (SEC-011/012) |
| Preference-manipulation smell test | Flag descriptions stuffed with persuasive "always use this tool" language | preference manipulation | medium to high | shipped (SEC-006) |
| Transport hygiene probe | Check whether an HTTP server validates Origin, rejects DNS-rebinding, and enforces TLS | rebinding, MitM | low | planned |
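The description-injection row can be sketched in a few lines. This is a minimal Python illustration, not the SEC-001/002/005 implementation: the function name `scan_description`, the finding strings, and the two-pattern phrase list are assumptions made for the example (the two phrases are the ones quoted in the table). Flagging every Unicode format character also catches legitimate uses such as zero-width joiners in emoji, which is one reason the table rates this check as medium false-positive risk.

```python
import re
import unicodedata

# Hypothetical phrase list built from the two examples quoted in the table;
# a real rule set would be considerably larger.
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?previous", re.IGNORECASE),
    re.compile(r"before\s+doing\s+.+?\s+also", re.IGNORECASE),
]

# Explicit bidirectional controls: embeddings, overrides, and isolates.
BIDI_CONTROLS = set("\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069")


def scan_description(text: str) -> list[str]:
    """Return findings for one description or schema string."""
    findings = []
    for pattern in INJECTION_PATTERNS:
        if pattern.search(text):
            findings.append(f"injection-phrase:{pattern.pattern}")
    for index, char in enumerate(text):
        if char in BIDI_CONTROLS:
            findings.append(f"bidi-control:U+{ord(char):04X}@{index}")
        elif unicodedata.category(char) == "Cf":
            # "Cf" is the Unicode format category: zero-width spaces,
            # joiners, byte-order marks, and similar invisible characters.
            findings.append(f"invisible-format:U+{ord(char):04X}@{index}")
    return findings
```

A clean description yields an empty list; any non-empty result would become a finding for the engine to render.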
Decision on the manifest: the integrity check family (SEC-014 through SEC-016) implements the manifest-and-diff check. It diffs a pinned baseline catalog against the current one with the shared diff_tool_catalogs routine and flags any change to a previously approved definition, which covers rug pull and configuration drift. We kept it as its own family in the security engine rather than overloading the cassette manifest, because the engine already owns finding rendering across pretty, JSON, and SARIF output.
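The hash-and-diff idea behind the integrity family can be illustrated as follows. This is a sketch under stated assumptions, not the diff_tool_catalogs routine itself: the helper names (`definition_digest`, `build_manifest`, `diff_manifests`), the choice of SHA-256 over canonical JSON, and keying definitions by a `name` field are all hypothetical.

```python
import hashlib
import json


def definition_digest(definition: dict) -> str:
    """Hash a definition in a key-order-independent way."""
    # Sorting keys and fixing separators gives a canonical JSON string,
    # so two semantically equal definitions always hash the same.
    canonical = json.dumps(
        definition, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_manifest(definitions: list[dict]) -> dict[str, str]:
    """Map each definition name to its digest (assumes a "name" field)."""
    return {d["name"]: definition_digest(d) for d in definitions}


def diff_manifests(baseline: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Compare a pinned baseline manifest against the current one."""
    return {
        # A changed digest on an approved name is the rug-pull / drift signal.
        "changed": sorted(
            n for n in baseline if n in current and baseline[n] != current[n]
        ),
        "removed": sorted(n for n in baseline if n not in current),
        "added": sorted(n for n in current if n not in baseline),
    }
```

In this sketch only the `changed` list corresponds to "a change to a previously approved definition"; whether additions and removals are also findings is a policy choice the sketch leaves open.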
Each shipped check maps to one scorecard line item in the security section.
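As an example of how small a shipped check can be, the duplicate-name half of the shadowing scan reduces to an inverted index from tool name to the servers that expose it. The sketch below is illustrative only: the function name `find_shadowed_tools` and the input shape (server name mapped to a list of tool names) are assumptions, and it does not model the resolution-order part of the check.

```python
from collections import defaultdict


def find_shadowed_tools(servers: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return tool names exposed by more than one server.

    `servers` maps a server name to the tool names it advertises
    (a hypothetical shape for a parsed multi-server config).
    """
    owners = defaultdict(list)
    for server, tools in servers.items():
        # Deduplicate within a server so a repeated name on one server
        # is not mistaken for a cross-server collision.
        for tool in set(tools):
            owners[tool].append(server)
    return {tool: sorted(names) for tool, names in owners.items() if len(names) > 1}
```

Given two servers that both advertise `read_file`, the result names the tool and both servers, which is enough to render one finding per colliding name.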
Benchmark corpora
We treat published benchmarks as a source of cases to learn from and, where licensing allows, to import. The recommendation is a small non-redundant core, not wholesale import.
| Benchmark | Format fit | Recommendation |
|---|---|---|
| MCPTox (arXiv:2508.14925) | tool-poisoning cases on real tool metadata; anonymized repo released | import a subset as scenarios once the license and attribution are confirmed; otherwise derive equivalents |
| MCPSecBench (arXiv:2508.13220) | 17 attacks with a playground | learn from; reuse cases only after checking the playground license |
| MCP-SafetyBench (arXiv:2512.15163) | multi-step, multi-server, real servers | cite and derive; too host-dependent to import directly |
| MCP-AttackBench (~70k samples) | scale corpus | cite; sample if a specific gap needs volume |
| SafeMCP (arXiv:2506.13666), SHADE-Arena, MCIP-bench | varied | cite as reference |
Plan for adopted cases: express each as an mcptest scenario or cassette so the gate runs them offline with no dependency on a live remote server. License and attribution must be cleared before any case is redistributed; until then we derive equivalents rather than copy. Because the current corpus is empty, the coverage delta is the full testable inventory, so the first import is also the first real coverage.
Scorecard security posture
The security section reports two things: attack results (what broke) and defensive posture (what protections the server has). Posture signals are tagged with their MCP-DPT defense layer (arXiv:2604.07551) so the scorecard reads as a coverage map, not a flat checklist.
| Signal | Detected black-box by | MCP-DPT layer | Detectable |
|---|---|---|---|
| Auth posture (none / header-bearer / OAuth) | connection requirements at handshake | transport / auth | yes |
| Tool-definition integrity (versioned, pinned, ETDI-style) | whether the server version-stamps or signs definitions | tool / supply chain | partial |
| Transport hygiene (TLS, Origin, rebinding) | probes from the transport hygiene check | transport | yes |
| Rate limiting | controlled burst probe | host / transport | partial |
| Error hygiene | inspect error envelopes for stack traces or leaked secrets | server | yes |
What we cannot detect black-box: server-side WAF behavior, payload-inspection pipelines (MCP-Guard style), and runtime intent verification. The scorecard says so explicitly rather than scoring a blank as a pass. ETDI (arXiv:2506.01333) is the source for the integrity and auth-posture signals; MCP-DPT is the organizing frame for the layer tags.
This section is implemented in the security::posture module. It emits one signal per row with its MCP-DPT layer and a detectability marker, and lists the undetectable controls so an absent signal is never read as a pass. The transport-hygiene and rate-limiting signals report that they require the active probe until it ships, rather than reporting a verdict they cannot support.
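The shape of that output can be sketched as a small data model. This Python sketch only mirrors the table and the behavior described above; it is not the security::posture module. The type names, the `needs_active_probe` flag, and the verdict strings are assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Detectable(Enum):
    YES = "yes"
    PARTIAL = "partial"


@dataclass(frozen=True)
class PostureSignal:
    name: str
    layer: str  # MCP-DPT layer tag from the table
    detectable: Detectable
    needs_active_probe: bool = False  # hypothetical flag for probe-backed rows


# One entry per row of the posture table.
SIGNALS = [
    PostureSignal("Auth posture", "transport / auth", Detectable.YES),
    PostureSignal("Tool-definition integrity", "tool / supply chain", Detectable.PARTIAL),
    PostureSignal("Transport hygiene", "transport", Detectable.YES, needs_active_probe=True),
    PostureSignal("Rate limiting", "host / transport", Detectable.PARTIAL, needs_active_probe=True),
    PostureSignal("Error hygiene", "server", Detectable.YES),
]

# Controls the scorecard lists explicitly as not detectable black-box.
UNDETECTABLE = [
    "server-side WAF behavior",
    "payload-inspection pipelines",
    "runtime intent verification",
]


def render_signal(signal: PostureSignal, observed: Optional[str], probe_available: bool) -> str:
    """Render one scorecard line without ever implying a pass by omission."""
    if signal.needs_active_probe and not probe_available:
        verdict = "requires active probe"
    elif observed is None:
        verdict = "not observed"
    else:
        verdict = observed
    return f"{signal.name} [{signal.layer}] ({signal.detectable.value}): {verdict}"
```

The point of the design is in `render_signal`: a probe-backed signal with no probe, or a signal with no observation, always produces an explicit non-verdict rather than a blank that could be read as a pass.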
Follow-ups
The concrete implementation work (pentest-gate checks, the scorecard posture section, and the corpus import) is tracked as separate implementation tickets. This document and ADR 0039 are the design inputs they build from.