Pentest gate and scorecard security section

This is the product-facing output of the MCP security research track. It turns the MCP security taxonomy into four things: a coverage matrix for the pentest gate, a shortlist of black-box checks mcptest can run, a plan for adopting published benchmark corpora, and the spec for the security posture section of the scorecard. The threat model and the testability flags come from that taxonomy, and this document does not restate them.

mcptest is a black-box client. It can inspect tool, resource, and prompt definitions, diff them across runs, analyze a multi-server config, and probe some transport behavior. It cannot see server-side execution or host internals. Every check below stays inside that boundary.

Coverage matrix

The pentest gate measures coverage against the testable rows of the ADR 0039 inventory, not against all 22 rows. The deterministic security engine (ADR 0040) now ships sixteen checks, SEC-001 through SEC-016. These cover the Server surface, and the Current column below reflects that. The Client, Transport, and User rows remain uncovered, and each has a follow-up implementation ticket.

Surface	Attacks (testable)	Current	Check
User	prompt injection, confused-model misuse	none	agent-loop probes (partial, deferred)
Client	schema inconsistencies	none	schema-vs-response validation
Transport	rebinding, MitM, capability attestation, sampling origin	none	transport hygiene probes
Server	tool poisoning, indirect injection, shadowing, rug pull, config drift, name squatting, preference manipulation, multi-server trust	shipped	surface scan (SEC-001/002/005/006), integrity diff (SEC-014/015/016), namespace scan (SEC-010 to SEC-013)

The non-testable rows (vulnerable client, vulnerable server, sandbox escape, slash-command overlap) are documented as out of scope for the OSS client in ADR 0039. They are candidates for the cloud or enterprise tier, not the gate.
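
A minimal Rust sketch shows how coverage is measured over the testable rows only, as the gate does. The Row and Status types and the sample rows are hypothetical and do not follow the ADR 0039 inventory format. The point is that non-testable rows drop out of the denominator instead of counting as gaps. An inventory with no testable rows reports no coverage figure at all.

```rust
/// Hypothetical inventory status; the real ADR 0039 inventory may be shaped differently.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Status {
    NotTestable, // out of scope for a black-box client (e.g. sandbox escape)
    Uncovered,   // testable, but no shipped check yet
    Covered,     // testable and covered by a shipped check
}

/// Hypothetical inventory row.
pub struct Row {
    pub attack: &'static str,
    pub status: Status,
}

/// Coverage over testable rows only: covered / (covered + uncovered).
/// Returns None when nothing is testable, so an empty denominator never reads as 100%.
pub fn gate_coverage(rows: &[Row]) -> Option<f64> {
    let testable = rows.iter().filter(|r| r.status != Status::NotTestable).count();
    if testable == 0 {
        return None;
    }
    let covered = rows.iter().filter(|r| r.status == Status::Covered).count();
    Some(covered as f64 / testable as f64)
}
```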

Black-box detection checks

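The shortlist below is ordered by detection value weighed against false-positive risk. The original recommendation was to ship the first three in pentest-gate v1 as the highest-signal checks. The first two also carry low false-positive risk, and the description-injection scan carries medium risk. Every check except the transport hygiene probe has since shipped, as the Status column shows.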

Check	Method	Attacks covered	False-positive risk	Status
Tool-definition manifest + diff	Hash each tool/prompt/resource definition; persist a manifest; diff across runs and flag any change to a previously approved definition	rug pull, configuration drift	low	shipped (SEC-014/015/016)
Duplicate-name and shadowing scan	In a multi-server config, flag duplicate tool names across servers and ambiguous resolution order	tool shadowing, multi-server trust	low	shipped (SEC-010, SEC-013)
Description-injection scan	Scan descriptions and schemas for imperative-to-model instructions, known injection phrases ("ignore previous", "before doing X also"), and invisible or bidirectional unicode	tool poisoning, indirect prompt injection	medium	shipped (SEC-001/002/005)
Name-squatting heuristic	Flag tool or server names that are near-duplicates or typosquats of known servers	tool-name and server-name squatting	medium	shipped (SEC-011/012)
Preference-manipulation smell test	Flag descriptions stuffed with persuasive "always use this tool" language	preference manipulation	medium to high	shipped (SEC-006)
Transport hygiene probe	Check whether an HTTP server validates the Origin header, rejects DNS-rebinding attempts, and enforces TLS	rebinding, MitM	low	planned
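
The description-injection scan can be sketched as a pass over each description. The pass flags invisible or bidirectional control characters and known injection phrases. This Rust sketch is illustrative and is not the SEC-001/002/005 rule set. The Finding type, the phrase list, and the chosen Unicode ranges are all assumptions. The ranges cover zero-width characters, bidi embeddings, overrides and isolates, and tag characters.

```rust
/// Hypothetical finding type for this sketch; not the shipped SEC-001/002/005 output.
#[derive(Debug, PartialEq)]
pub enum Finding {
    /// Byte offset and code point of an invisible or bidirectional control character.
    HiddenChar { index: usize, codepoint: u32 },
    /// A known injection phrase, matched case-insensitively.
    InjectionPhrase(&'static str),
}

// Illustrative lowercase phrase list; a real rule set would be larger and tuned for noise.
const PHRASES: &[&str] = &[
    "ignore previous",
    "ignore all previous",
    "do not tell the user",
    "before using this tool",
];

fn is_hidden(c: char) -> bool {
    matches!(
        c,
        '\u{200B}'..='\u{200F}'         // zero-width space and joiners, LRM, RLM
            | '\u{202A}'..='\u{202E}'   // bidi embeddings and overrides
            | '\u{2060}'..='\u{2064}'   // word joiner and invisible operators
            | '\u{2066}'..='\u{2069}'   // bidi isolates
            | '\u{FEFF}'                // zero-width no-break space (BOM)
            | '\u{E0000}'..='\u{E007F}' // tag characters, usable to smuggle hidden ASCII
    )
}

pub fn scan_description(text: &str) -> Vec<Finding> {
    let mut findings = Vec::new();
    for (index, c) in text.char_indices() {
        if is_hidden(c) {
            findings.push(Finding::HiddenChar { index, codepoint: c as u32 });
        }
    }
    // Strip hidden characters before phrase matching so a zero-width space
    // inserted inside a phrase cannot break the substring match.
    let visible: String = text.chars().filter(|c| !is_hidden(*c)).collect();
    let lowered = visible.to_lowercase();
    for &phrase in PHRASES {
        if lowered.contains(phrase) {
            findings.push(Finding::InjectionPhrase(phrase));
        }
    }
    findings
}
```

Stripping hidden characters before phrase matching is the part worth keeping in any real rule. Without it, one zero-width space inside a phrase defeats the match.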

Decision on the manifest: the integrity check family (SEC-014 through SEC-016) implements it. It diffs a pinned baseline catalog against the current one with the shared diff_tool_catalogs routine and flags any change to a previously approved definition, which covers rug pull and configuration drift. We kept it as its own family in the security engine rather than overloading the cassette manifest, because the engine already owns finding rendering across pretty, JSON, and SARIF output.
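
A minimal sketch of the hash-manifest-diff approach follows. It is not the diff_tool_catalogs routine. The names build_manifest, diff_manifests, and Change are hypothetical, and the caller is assumed to pass definition text that is already canonicalized. FNV-1a stands in for a cryptographic hash only to keep the example dependency-free.

```rust
use std::collections::BTreeMap;

/// 64-bit FNV-1a: deterministic across runs and Rust versions, unlike std's
/// DefaultHasher, so it can be persisted. Not collision-resistant; a real
/// manifest should use a cryptographic hash such as SHA-256.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for &b in bytes {
        hash ^= b as u64;
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Definition name -> hash of its canonicalized text (canonicalization assumed done).
pub type Manifest = BTreeMap<String, u64>;

pub fn build_manifest(defs: &[(&str, &str)]) -> Manifest {
    defs.iter()
        .map(|(name, canonical)| (name.to_string(), fnv1a64(canonical.as_bytes())))
        .collect()
}

#[derive(Debug, PartialEq)]
pub enum Change {
    Added(String),
    Removed(String),
    Modified(String),
}

/// Compares an approved baseline with the current manifest, in name order.
pub fn diff_manifests(baseline: &Manifest, current: &Manifest) -> Vec<Change> {
    let mut changes = Vec::new();
    for (name, old) in baseline {
        match current.get(name) {
            None => changes.push(Change::Removed(name.clone())),
            Some(new) if new != old => changes.push(Change::Modified(name.clone())),
            Some(_) => {}
        }
    }
    for name in current.keys() {
        if !baseline.contains_key(name) {
            changes.push(Change::Added(name.clone()));
        }
    }
    changes
}
```

A Modified entry is the rug-pull and drift signal: the name stayed the same while the approved definition changed underneath it.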

Each shipped check maps to one scorecard line item in the security section.
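
The namespace rows of the shortlist cover the duplicate-name scan and the name-squatting heuristic. The Rust sketch below illustrates both. The names duplicate_tools and possible_squats, the input shape, and the edit-distance threshold are assumptions, and the shipped SEC-010 to SEC-013 checks may use different rules. Lowercased Levenshtein distance is one common typosquat heuristic, not necessarily the one mcptest uses.

```rust
use std::collections::BTreeMap;

/// Tool names exposed by more than one server, mapped to the servers that expose them.
/// Input: (server name, tool names) pairs from a multi-server config.
pub fn duplicate_tools<'a>(servers: &[(&'a str, Vec<&'a str>)]) -> BTreeMap<&'a str, Vec<&'a str>> {
    let mut owners: BTreeMap<&'a str, Vec<&'a str>> = BTreeMap::new();
    for (server, tools) in servers {
        for tool in tools {
            let list = owners.entry(*tool).or_default();
            if !list.contains(server) {
                list.push(*server);
            }
        }
    }
    owners.retain(|_, list| list.len() > 1);
    owners
}

/// Classic dynamic-programming edit distance over chars (insert, delete, substitute).
pub fn levenshtein(a: &str, b: &str) -> usize {
    let b_chars: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b_chars.len()).collect();
    for (i, ca) in a.chars().enumerate() {
        let mut cur = vec![i + 1];
        for (j, cb) in b_chars.iter().enumerate() {
            let cost = if ca == *cb { 0 } else { 1 };
            // diagonal (substitute), above (delete), left (insert)
            cur.push((prev[j] + cost).min(prev[j + 1] + 1).min(cur[j] + 1));
        }
        prev = cur;
    }
    prev[b_chars.len()]
}

/// Known names within `max_edits` of `name`, excluding exact (case-insensitive) matches.
pub fn possible_squats<'a>(name: &str, known: &[&'a str], max_edits: usize) -> Vec<&'a str> {
    let name = name.to_lowercase();
    known
        .iter()
        .copied()
        .filter(|k| {
            let d = levenshtein(&name, &k.to_lowercase());
            d > 0 && d <= max_edits
        })
        .collect()
}
```

Exact matches are left out of possible_squats. A name identical to a known server is either the real server or a duplicate, and the duplicate scan handles that case.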

Benchmark corpora

We treat published benchmarks as a source of cases to learn from and, where licensing allows, to import. The recommendation is a small non-redundant core, not wholesale import.

Benchmark	Format fit	Recommendation
MCPTox (arXiv:2508.14925)	tool-poisoning cases on real tool metadata; anonymized repo released	import a subset as scenarios once the license and attribution are confirmed; otherwise derive equivalents
MCPSecBench (arXiv:2508.13220)	17 attacks with a playground	learn from; reuse cases only after checking the playground license
MCP-SafetyBench (arXiv:2512.15163)	multi-step, multi-server, real servers	cite and derive; too host-dependent to import directly
MCP-AttackBench (~70k samples)	scale corpus	cite; sample if a specific gap needs volume
SafeMCP (arXiv:2506.13666), SHADE-Arena, MCIP-bench	varied	cite as reference

Plan for adopted cases: express each as an mcptest scenario or cassette so the gate runs them offline with no dependency on a live remote server. License and attribution must be cleared before any case is redistributed, and until then we derive equivalents rather than copy. The current corpus is empty, so the coverage delta of the first import is the full testable inventory. Those cases become the first corpus-backed coverage the gate has.
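
The license rule and the offline requirement can be written as a small admission check on each corpus case. The CorpusCase fields and the Origin, License, and Replay enums are hypothetical and do not reflect the mcptest scenario or cassette format.

```rust
/// Where a case came from (hypothetical provenance model).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Origin {
    Imported, // copied from a published benchmark
    Derived,  // our own equivalent, written from the paper
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum License {
    Unknown,
    Cleared,
}

/// How the gate would replay the case.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Replay {
    Scenario,
    Cassette,
    LiveServer,
}

pub struct CorpusCase {
    pub id: &'static str,
    pub origin: Origin,
    pub license: License,
    pub attribution: Option<&'static str>,
    pub replay: Replay,
}

/// Imported cases need a cleared license and an attribution; derived equivalents do not.
pub fn may_redistribute(case: &CorpusCase) -> bool {
    match case.origin {
        Origin::Derived => true,
        Origin::Imported => case.license == License::Cleared && case.attribution.is_some(),
    }
}

/// The gate only admits cases that replay offline and may be redistributed.
pub fn admit_to_gate(case: &CorpusCase) -> bool {
    case.replay != Replay::LiveServer && may_redistribute(case)
}
```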

Scorecard security posture

The security section reports two things: attack results (what broke) and defensive posture (what protections the server has). Posture signals are tagged with their MCP-DPT defense layer (arXiv:2604.07551) so the scorecard reads as a coverage map, not a flat checklist.

Signal	Black-box detection method	MCP-DPT layer	Detectable
Auth posture (none / header-bearer / OAuth)	connection requirements at handshake	transport / auth	yes
Tool-definition integrity (versioned, pinned, ETDI-style)	whether the server version-stamps or signs definitions	tool / supply chain	partial
Transport hygiene (TLS, Origin, rebinding)	probes from the transport hygiene check	transport	yes
Rate limiting	controlled burst probe	host / transport	partial
Error hygiene	inspect error envelopes for stack traces or leaked secrets	server	yes
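
The error-hygiene signal comes down to scanning error envelopes for stack-trace markers and secret-shaped strings. The Rust sketch below makes several assumptions. The marker lists are illustrative, and matching is plain substring search with no word boundaries. Extracting the text, for example from the message and data fields of a JSON-RPC error, is assumed to happen elsewhere.

```rust
// Illustrative markers only; real rules would need word boundaries and tuning for noise.
const TRACE_MARKERS: &[&str] = &[
    "Traceback (most recent call last)", // Python traceback header
    "panicked at",                       // Rust panic message
    "at java.",                          // JVM stack frame
    "node_modules/",                     // Node.js module path inside a stack
];
const SECRET_MARKERS: &[&str] = &[
    "AKIA",       // AWS access key ID prefix
    "ghp_",       // GitHub personal access token prefix
    "-----BEGIN", // PEM block (private key or certificate)
];

#[derive(Debug, PartialEq)]
pub enum Leak {
    StackTrace(&'static str),
    SecretLike(&'static str),
}

/// Scans the human-readable text of one error envelope for leak markers.
pub fn error_hygiene(text: &str) -> Vec<Leak> {
    let mut leaks = Vec::new();
    for &m in TRACE_MARKERS {
        if text.contains(m) {
            leaks.push(Leak::StackTrace(m));
        }
    }
    for &m in SECRET_MARKERS {
        if text.contains(m) {
            leaks.push(Leak::SecretLike(m));
        }
    }
    leaks
}
```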

What we cannot detect black-box: server-side WAF behavior, payload-inspection pipelines (MCP-Guard style), and runtime intent verification. The scorecard says so explicitly rather than scoring a blank as a pass. ETDI (arXiv:2506.01333) is the source for the integrity and auth-posture signals; MCP-DPT is the organizing frame for the layer tags.

This section is implemented in the security::posture module. It emits one signal per row, with its MCP-DPT layer and a detectability marker. It also lists the undetectable controls so that an absent signal is never read as a pass. The active transport-hygiene and burst probes have not shipped yet. Until they do, the transport-hygiene and rate-limiting signals report that they need those probes instead of giving a verdict they cannot support.
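
The sketch below illustrates that reporting rule. Only signals with a real verdict are counted. Signals waiting on a probe and the undetectable controls are listed separately. The names Layer, Verdict, Signal, and summarize are hypothetical and are not the security::posture API.

```rust
/// Layer tags mirroring the MCP-DPT column of the posture table (names are hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Layer {
    TransportAuth,
    ToolSupplyChain,
    Transport,
    HostTransport,
    Server,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Verdict {
    Present,       // protection observed
    Absent,        // protection checked and missing
    RequiresProbe, // needs an active probe that has not shipped
    NotDetectable, // outside what a black-box client can observe
}

pub struct Signal {
    pub name: &'static str,
    pub layer: Layer,
    pub verdict: Verdict,
}

/// Controls the scorecard lists explicitly as undetectable black-box.
pub const UNDETECTABLE: &[&str] = &[
    "server-side WAF behavior",
    "payload-inspection pipelines",
    "runtime intent verification",
];

/// Returns (present, assessed, unassessed names). Only Present and Absent count as
/// assessed, so a missing verdict can never inflate the pass count.
pub fn summarize(signals: &[Signal]) -> (usize, usize, Vec<&'static str>) {
    let assessed = |v: Verdict| matches!(v, Verdict::Present | Verdict::Absent);
    let present = signals.iter().filter(|s| s.verdict == Verdict::Present).count();
    let assessed_count = signals.iter().filter(|s| assessed(s.verdict)).count();
    let unassessed = signals
        .iter()
        .filter(|s| !assessed(s.verdict))
        .map(|s| s.name)
        .chain(UNDETECTABLE.iter().copied())
        .collect();
    (present, assessed_count, unassessed)
}
```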

Follow-ups

The concrete implementation work (pentest-gate checks, the scorecard posture section, and the corpus import) is tracked as separate implementation tickets. This document and ADR 0039 are the design inputs they build from.