
Transport, auth, and local probes

The static engine reads a server's catalog. Some risks only show up when you poke the running server: does it accept plaintext, validate the Origin header, block private-IP URLs, leak in its error envelopes, require auth, rate limit a burst, bound its responses, or launch a dangerous stdio command. These are the probe checks.

Run this example. examples/security-multi-server.json captures more than one server, so a single scan exercises the probe checks across each transport.

mcptest security examples/security-multi-server.json

How they work

A probe needs active behavior, but the analysis stays deterministic. The split is the same one the red-team and advisory layers use: a caller captures what the server did into a ProbeEvidence value, and the analyzers read that evidence. The live probing (making the requests, launching the connection) sits behind the evidence struct, so the analysis is reproducible and a probe run replays from a cassette without touching the network.

Every evidence field is optional. A probe whose evidence was not captured produces no finding rather than guessing, which keeps a "did not measure" honest instead of scoring it as a pass.
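
As an illustration of that split, the sketch below models an evidence record whose fields all default to "not captured", plus two analyzers that decline to report when their inputs are missing. It is written in Python for brevity and is not mcptest's actual code: the field names, the Finding type, the NOT_MEASURED sentinel, and the analyze function are hypothetical. Only the ProbeEvidence name and the SEC-017 and SEC-018 rule IDs come from this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProbeEvidence:
    """Hypothetical evidence record; None means 'not captured'."""
    plaintext_accepted: Optional[bool] = None       # server answered over plain HTTP
    non_loopback: Optional[bool] = None             # address was outside loopback
    foreign_origin_accepted: Optional[bool] = None  # foreign Origin was accepted


@dataclass(frozen=True)
class Finding:
    rule: str
    message: str


# Sentinel (an assumption of this sketch) that keeps "did not measure"
# distinct from "measured and clean".
NOT_MEASURED = object()


def check_tls_required(ev: ProbeEvidence):
    if ev.plaintext_accepted is None or ev.non_loopback is None:
        return NOT_MEASURED
    if ev.plaintext_accepted and ev.non_loopback:
        return Finding("SEC-017", "plaintext HTTP accepted on a non-loopback address")
    return None  # measured, nothing to report


def check_origin_validation(ev: ProbeEvidence):
    if ev.foreign_origin_accepted is None:
        return NOT_MEASURED
    if ev.foreign_origin_accepted:
        return Finding("SEC-018", "request with a foreign Origin was accepted")
    return None


CHECKS = {
    "SEC-017": check_tls_required,
    "SEC-018": check_origin_validation,
}


def analyze(ev: ProbeEvidence):
    """Pure function of the evidence: same input, same output, no network."""
    findings, not_measured = [], []
    for rule, check in CHECKS.items():
        outcome = check(ev)
        if outcome is NOT_MEASURED:
            not_measured.append(rule)
        elif outcome is not None:
            findings.append(outcome)
    return findings, not_measured
```

Because analyze depends only on the evidence value, feeding it a recorded ProbeEvidence yields the same findings as a live run would, which is the property that makes replay possible.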

Implemented probes

| Rule | What it flags |
| --- | --- |
| SEC-017 tls-required | Plaintext HTTP accepted on a non-loopback address. |
| SEC-018 origin-validation | A request with a foreign Origin was accepted (DNS rebinding). |
| SEC-019 private-ip-guard | An advertised URL is an IP literal in a private or cloud-metadata range (SSRF). |
| SEC-020 error-envelope-leak | An error body carries a stack trace or an obvious secret (CWE-209). |
| SEC-021 posture-tier | Reports the auth tier (none, header bearer, OAuth). Informational. |
| SEC-022 token-audience | The server accepted a token not issued for it (token passthrough; RFC 8707). |
| SEC-023 dangerous-startup-command | A stdio launch command runs a dangerous shell pattern (piped curl, rm -rf, sudo, eval). |
| SEC-024 rate-limit-present | A controlled burst saw no rate limiting (unbounded consumption). |
| SEC-025 capability-attestation | Capabilities were advertised without an attestation. |
| SEC-026 sampling-origin-auth | Sampling was used without authenticating the origin. |
| SEC-027 unbounded-response | A response had no declared size bound, or exceeded the declared one. |
| SEC-028 resource-indicators | The server does not advertise or honor RFC 8707 resource indicators. |
| SEC-029 scope-minimization | An advertised scope is a wildcard or an omnibus (*, admin, all). |
| SEC-030 session-id-hygiene | A session ID is low entropy or sequential, or sessions are used for auth. |
| SEC-031 confused-deputy-posture | A proxy lacks per-client consent or exact redirect_uri matching. |
| SEC-032 install-source-provenance | An install source has no version pin or provenance. |
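
To make one of these rules concrete, here is a minimal sketch of a check in the spirit of SEC-019 private-ip-guard, using Python's standard ipaddress and urllib.parse modules. It is not mcptest's implementation. The function name and the exact list of ranges (RFC 1918, IPv4 link-local, which contains the common 169.254.169.254 metadata address, and the IPv6 unique-local and link-local blocks) are assumptions made for illustration.

```python
import ipaddress
from urllib.parse import urlsplit

# Assumed range list for this sketch; the real catalog may differ.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "10.0.0.0/8",      # RFC 1918
        "172.16.0.0/12",   # RFC 1918
        "192.168.0.0/16",  # RFC 1918
        "169.254.0.0/16",  # IPv4 link-local, includes 169.254.169.254
        "fc00::/7",        # IPv6 unique local
        "fe80::/10",       # IPv6 link-local
    )
]


def flags_private_ip(url: str) -> bool:
    """Return True only when the URL host is an IP literal in a blocked range."""
    try:
        host = urlsplit(url).hostname
    except ValueError:
        return False  # malformed URL: no clear evidence, so stay silent
    if host is None:
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # a hostname, not an IP literal: never resolved, never flagged
    return any(ip in net for net in BLOCKED_NETWORKS)
```

A hostname falls through the second except branch without any DNS lookup, so the result depends only on the URL string.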

What stays partial

Several of these probes are marked partial in the catalog because a black-box client cannot always see the full picture. They fire only on clear evidence and stay silent in ambiguous cases. private-ip-guard flags only IP literals: resolving a hostname is something the client cannot do deterministically, so hostnames are never flagged. Each fuzzy analyzer carries a documented heuristic. scope-minimization flags a * wildcard or a known omnibus scope name. session-id-hygiene flags a session ID that is shorter than 16 characters, an all-digit counter, a single repeated character, or a consecutive numeric sequence. install-source-provenance flags a source with no version pin (@<version>, ==, a #-ref, or a sha256: digest) and no @latest override. Every catalog probe now has an implementation; nothing remains as a follow-up.
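
The session-id-hygiene heuristic above can be sketched as a small pure function. This is an illustrative reading and not the shipped analyzer: the function names are hypothetical, and the text does not say exactly what counts as a "consecutive numeric sequence", so the sketch assumes it means digits that each increase by one, wrapping from 9 to 0. Only the 16-character threshold and the four categories come from the paragraph above.

```python
from typing import Optional

MIN_LENGTH = 16  # threshold stated in the text


def _is_consecutive_digits(s: str) -> bool:
    # Assumed interpretation: every digit is the previous one plus one,
    # wrapping 9 -> 0, e.g. "0123456789012345".
    if not (s.isascii() and s.isdigit()):
        return False
    return all((int(a) + 1) % 10 == int(b) for a, b in zip(s, s[1:]))


def session_id_weakness(session_id: str) -> Optional[str]:
    """Return the first matching weakness, or None when nothing clearly matches."""
    if len(session_id) < MIN_LENGTH:
        return "shorter than 16 characters"
    if len(set(session_id)) == 1:
        return "single repeated character"
    if _is_consecutive_digits(session_id):
        return "consecutive numeric sequence"
    if session_id.isascii() and session_id.isdigit():
        return "all-digit counter"
    return None
```

Returning None for anything that does not clearly match mirrors the rule that a partial probe stays silent in ambiguous cases.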

These probes are deterministic security findings. Unlike the red-team and advisory layers, they do count toward the grade, because serving plaintext or passing tokens through is a defect in the server, not a property of a model.