Pentest gate and scorecard security section

This is the product-facing output of the MCP security research track. It turns the MCP security taxonomy into three things: a coverage matrix for the pentest gate, a shortlist of black-box checks mcptest can run, and the spec for the scorecard's security posture section. The threat model and the testability flags come from that taxonomy; this document does not restate them.

mcptest is a black-box client. It can inspect tool, resource, and prompt definitions, diff them across runs, analyze a multi-server config, and probe some transport behavior. It cannot see server-side execution or host internals. Every check below stays inside that boundary.

Coverage matrix

The pentest gate measures coverage against the testable rows of the ADR 0039 inventory, not all 22. The deterministic security engine (ADR 0040) now ships sixteen checks, SEC-001 through SEC-016, so the Server surface is covered and the Current column below reflects that. The Client, Transport, and User rows still have follow-up implementation tickets.

| Surface | Attacks (testable) | Current | Check |
| --- | --- | --- | --- |
| User | prompt injection, confused-model misuse | none | agent-loop probes (partial, deferred) |
| Client | schema inconsistencies | none | schema-vs-response validation |
| Transport | rebinding, MitM, capability attestation, sampling origin | none | transport hygiene probes |
| Server | tool poisoning, indirect injection, shadowing, rug pull, config drift, name squatting, preference manipulation, multi-server trust | shipped | surface scan (SEC-001/002/005/006), integrity diff (SEC-014/015/016), namespace scan (SEC-010 to SEC-013) |

The non-testable rows (vulnerable client, vulnerable server, sandbox escape, slash-command overlap) are documented as out of scope for the OSS client in ADR 0039. They are candidates for the cloud or enterprise tier, not the gate.
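As a rough illustration of measuring coverage against only the testable rows, the sketch below counts covered rows out of testable rows and leaves non-testable rows out of both numbers. The `InventoryRow` type and `gate_coverage` function are hypothetical names for this example, not part of mcptest or ADR 0039.

```rust
/// Hypothetical inventory row: one attack plus two flags.
pub struct InventoryRow {
    pub attack: &'static str,
    /// True when a black-box client can test for the attack at all.
    pub testable: bool,
    /// True when at least one shipped check covers the attack.
    pub covered: bool,
}

/// Returns (covered, testable) counts.
/// Non-testable rows are excluded from both numbers, so they can
/// neither raise nor lower the gate's coverage figure.
pub fn gate_coverage(rows: &[InventoryRow]) -> (usize, usize) {
    let testable: Vec<&InventoryRow> = rows.iter().filter(|r| r.testable).collect();
    let covered = testable.iter().filter(|r| r.covered).count();
    (covered, testable.len())
}
```

With one shipped row (tool poisoning), one unshipped row (rebinding), and one out-of-scope row (sandbox escape), this reports one covered row out of two testable ones.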

Black-box detection checks

The shortlist below is ordered by detection value relative to false-positive risk. The recommendation is to ship the first three in pentest-gate v1; they are high-signal and low-noise.

| Check | Method | Attacks covered | False-positive risk | Status |
| --- | --- | --- | --- | --- |
| Tool-definition manifest + diff | Hash each tool/prompt/resource definition; persist a manifest; diff across runs and flag any change to a previously approved definition | rug pull, configuration drift | low | shipped (SEC-014/015/016) |
| Duplicate-name and shadowing scan | In a multi-server config, flag duplicate tool names across servers and ambiguous resolution order | tool shadowing, multi-server trust | low | shipped (SEC-010, SEC-013) |
| Description-injection scan | Scan descriptions and schemas for imperative-to-model instructions, known injection phrases ("ignore previous", "before doing X also"), and invisible or bidirectional unicode | tool poisoning, indirect prompt injection | medium | shipped (SEC-001/002/005) |
| Name-squatting heuristic | Flag tool or server names that are near-duplicates or typosquats of known servers | tool-name and server-name squatting | medium | shipped (SEC-011/012) |
| Preference-manipulation smell test | Flag descriptions stuffed with persuasive "always use this tool" language | preference manipulation | medium to high | shipped (SEC-006) |
| Transport hygiene probe | Check whether an HTTP server validates Origin, rejects DNS-rebinding, and enforces TLS | rebinding, MitM | low | planned |
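To make the description-injection row concrete, here is a minimal sketch of a scan that looks for known injection phrases and for invisible or bidirectional control characters. It is an assumption-laden illustration, not the SEC-001/002/005 implementation: the type and function names are hypothetical, the phrase list is a two-entry placeholder (only "ignore previous" comes from the table), and detection of imperative-to-model instructions is not attempted.

```rust
/// Hypothetical finding type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum InjectionFinding {
    /// A known injection phrase matched (case-insensitive).
    Phrase(&'static str),
    /// An invisible or bidirectional control character was present.
    HiddenChar(char),
}

// Placeholder phrase list; a real list would be longer and tuned for noise.
const PHRASES: [&str; 2] = ["ignore previous", "disregard the above"];

/// True for zero-width and bidirectional formatting characters.
fn is_hidden(c: char) -> bool {
    matches!(
        c,
        '\u{200B}'..='\u{200F}'   // zero-width space and joiners, LRM, RLM
        | '\u{202A}'..='\u{202E}' // bidi embeddings and overrides
        | '\u{2066}'..='\u{2069}' // bidi isolates
        | '\u{FEFF}'              // zero-width no-break space
    )
}

/// Scans one description or schema string.
/// Phrase findings come first, then hidden characters in text order.
pub fn scan_description(text: &str) -> Vec<InjectionFinding> {
    let lowered = text.to_lowercase();
    let mut findings: Vec<InjectionFinding> = PHRASES
        .iter()
        .filter(|p| lowered.contains(**p))
        .map(|p| InjectionFinding::Phrase(*p))
        .collect();
    findings.extend(
        text.chars()
            .filter(|c| is_hidden(*c))
            .map(InjectionFinding::HiddenChar),
    );
    findings
}
```

Substring matching is what drives the medium false-positive risk in the table: a description that legitimately documents the phrase would also be flagged.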

Decision on the manifest: the integrity check family (SEC-014 through SEC-016) implements it. The family diffs a pinned baseline catalog against the current one with the shared diff_tool_catalogs routine and flags any change to a previously approved definition, which covers rug pull and configuration drift. We kept it as its own family in the security engine rather than overloading the cassette manifest, because the engine already owns finding rendering across pretty, JSON, and SARIF output.
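The hash, persist, and diff idea can be sketched as follows. This is not the diff_tool_catalogs routine; `Definition`, `Manifest`, `build_manifest`, and `diff_manifests` are hypothetical names. The sketch also assumes `DefaultHasher` purely to stay within the standard library. Its output is not guaranteed stable across Rust releases, so a persisted manifest would need a stable cryptographic hash instead.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for a tool, prompt, or resource definition.
#[derive(Hash)]
pub struct Definition {
    pub name: String,
    pub description: String,
    pub schema_json: String,
}

/// Definition name mapped to the fingerprint of its approved content.
pub type Manifest = BTreeMap<String, u64>;

#[derive(Debug, PartialEq, Eq)]
pub enum Change {
    Added(String),
    Removed(String),
    Modified(String),
}

/// Fingerprints every field of the definition.
/// Assumption: DefaultHasher is for illustration only (not stable across releases).
fn fingerprint(def: &Definition) -> u64 {
    let mut hasher = DefaultHasher::new();
    def.hash(&mut hasher);
    hasher.finish()
}

pub fn build_manifest(defs: &[Definition]) -> Manifest {
    defs.iter().map(|d| (d.name.clone(), fingerprint(d))).collect()
}

/// Reports removed and modified baseline entries first, then additions.
pub fn diff_manifests(baseline: &Manifest, current: &Manifest) -> Vec<Change> {
    let mut changes = Vec::new();
    for (name, old) in baseline {
        match current.get(name) {
            None => changes.push(Change::Removed(name.clone())),
            Some(new) if new != old => changes.push(Change::Modified(name.clone())),
            Some(_) => {}
        }
    }
    for name in current.keys() {
        if !baseline.contains_key(name) {
            changes.push(Change::Added(name.clone()));
        }
    }
    changes
}
```

A `Modified` entry on a previously approved definition is the rug-pull and configuration-drift signal; `Added` and `Removed` entries would more likely feed review than fail the gate outright.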

Each shipped check maps to one scorecard line item in the security section.
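As one more illustration from the shortlist, the name-squatting heuristic could be approximated with an edit-distance threshold against a list of known names. This is a sketch under assumptions: Levenshtein distance is swapped in here as a plausible technique, not confirmed as what SEC-011/012 use, and `edit_distance`, `squat_candidates`, and the distance threshold are hypothetical.

```rust
/// Levenshtein edit distance, counted in Unicode scalar values.
pub fn edit_distance(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] is the distance between the processed prefix of `a`
    // and the first j characters of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut curr = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let substitution = prev[j] + usize::from(ca != cb);
            let deletion = prev[j + 1] + 1;
            let insertion = curr[j] + 1;
            curr.push(substitution.min(deletion).min(insertion));
        }
        prev = curr;
    }
    prev[b.len()]
}

/// Returns the known names that `name` is close to but not equal to.
/// Exact matches are left to the duplicate-name scan.
pub fn squat_candidates<'a>(name: &str, known: &[&'a str], max_distance: usize) -> Vec<&'a str> {
    let lowered = name.to_lowercase();
    known
        .iter()
        .copied()
        .filter(|k| {
            let d = edit_distance(&lowered, &k.to_lowercase());
            d > 0 && d <= max_distance
        })
        .collect()
}
```

A small threshold keeps noise down, but short names sit close to each other by chance, which is consistent with the medium false-positive rating for this check.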

Benchmark corpora

We treat published benchmarks as a source of cases to learn from and, where licensing allows, to import. The recommendation is a small non-redundant core, not wholesale import.

| Benchmark | Format fit | Recommendation |
| --- | --- | --- |
| MCPTox (arXiv:2508.14925) | tool-poisoning cases on real tool metadata; anonymized repo released | import a subset as scenarios once the license and attribution are confirmed; otherwise derive equivalents |
| MCPSecBench (arXiv:2508.13220) | 17 attacks with a playground | learn from; reuse cases only after checking the playground license |
| MCP-SafetyBench (arXiv:2512.15163) | multi-step, multi-server, real servers | cite and derive; too host-dependent to import directly |
| MCP-AttackBench (~70k samples) | scale corpus | cite; sample if a specific gap needs volume |
| SafeMCP (arXiv:2506.13666), SHADE-Arena, MCIP-bench | varied | cite as reference |

Plan for adopted cases: express each as an mcptest scenario or cassette so the gate runs them offline with no dependency on a live remote server. License and attribution must be cleared before any case is redistributed; until then we derive equivalents rather than copy. Coverage delta against the current (empty) corpus is the full testable inventory, so the first import is also the first real coverage.

Scorecard security posture

The security section reports two things: attack results (what broke) and defensive posture (what protections the server has). Posture signals are tagged with their MCP-DPT defense layer (arXiv:2604.07551) so the scorecard reads as a coverage map, not a flat checklist.

| Signal | Detected black-box by | MCP-DPT layer | Detectable |
| --- | --- | --- | --- |
| Auth posture (none / header-bearer / OAuth) | connection requirements at handshake | transport / auth | yes |
| Tool-definition integrity (versioned, pinned, ETDI-style) | whether the server version-stamps or signs definitions | tool / supply chain | partial |
| Transport hygiene (TLS, Origin, rebinding) | probes from the transport hygiene check | transport | yes |
| Rate limiting | controlled burst probe | host / transport | partial |
| Error hygiene | inspect error envelopes for stack traces or leaked secrets | server | yes |
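The error-hygiene row lends itself to a small example. The sketch below inspects an error message for stack-trace and secret-like markers. Everything in it is an assumption made for illustration: the `Leak` type, the `inspect_error_message` function, and both marker lists are hypothetical, and real lists would be tuned per runtime.

```rust
/// Hypothetical leak categories for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum Leak {
    StackTrace,
    Secret,
}

// Illustrative markers only; none of these lists come from mcptest.
const TRACE_MARKERS: [&str; 3] = ["Traceback (most recent call last)", "panicked at", "\tat "];
const SECRET_MARKERS: [&str; 3] = ["Bearer ", "api_key=", "-----BEGIN"];

/// Returns each leak category at most once, stack traces first.
pub fn inspect_error_message(message: &str) -> Vec<Leak> {
    let mut leaks = Vec::new();
    if TRACE_MARKERS.iter().any(|m| message.contains(*m)) {
        leaks.push(Leak::StackTrace);
    }
    if SECRET_MARKERS.iter().any(|m| message.contains(*m)) {
        leaks.push(Leak::Secret);
    }
    leaks
}
```

An empty result only means that none of the markers matched in the errors the client managed to provoke, so it is weaker evidence than a positive finding.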

What we cannot detect black-box: server-side WAF behavior, payload-inspection pipelines (MCP-Guard style), and runtime intent verification. The scorecard says so explicitly rather than scoring a blank as a pass. ETDI (arXiv:2506.01333) is the source for the integrity and auth-posture signals; MCP-DPT is the organizing frame for the layer tags.

This section is implemented in the security::posture module. It emits one signal per row with its MCP-DPT layer and a detectability marker, and lists the undetectable controls so an absent signal is never read as a pass. The transport-hygiene and rate-limiting signals report that they require the active probe until it ships, rather than reporting a verdict they cannot support.
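One way to picture the shape described above is a signal record that carries its layer, a detectability marker, and a status that can say "requires active probe" instead of a verdict. The sketch below is not the security::posture module; all type names, function names, and output strings are hypothetical.

```rust
/// Detectability marker, mirroring the "yes" and "partial" values in the table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Detectable {
    Yes,
    Partial,
}

/// What a signal can report. There is deliberately no bare "pass" variant.
#[derive(Debug, PartialEq, Eq)]
pub enum Status {
    /// A value actually observed black-box, for example "OAuth".
    Observed(String),
    /// The signal depends on an active probe that has not run.
    RequiresActiveProbe,
}

pub struct PostureSignal {
    pub name: &'static str,
    pub layer: &'static str,
    pub detectable: Detectable,
    pub status: Status,
}

/// Controls the text lists as not detectable black-box.
pub const UNDETECTABLE: [&str; 3] = [
    "server-side WAF behavior",
    "payload-inspection pipelines",
    "runtime intent verification",
];

/// Renders one signal as a single scorecard line.
pub fn render(signal: &PostureSignal) -> String {
    let status = match &signal.status {
        Status::Observed(value) => value.as_str(),
        Status::RequiresActiveProbe => "requires active probe",
    };
    format!(
        "{} [{}] detectable={:?}: {}",
        signal.name, signal.layer, signal.detectable, status
    )
}

/// Lists undetectable controls explicitly so a blank is never scored as a pass.
pub fn render_undetectable() -> Vec<String> {
    UNDETECTABLE
        .iter()
        .map(|control| format!("{}: not detectable black-box (not scored)", control))
        .collect()
}
```

Keeping the status as a closed enum means a renderer has to handle the "requires active probe" case explicitly, so an unsupported verdict cannot slip through as a default.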

Follow-ups

The concrete implementation work (pentest-gate checks, the scorecard posture section, and the corpus import) is tracked as separate implementation tickets. This document and ADR 0039 are the design inputs they build from.