Transport, auth, and local probes

The static engine reads a server's catalog. Some risks only show up when you poke the running server: does it accept plaintext HTTP, validate the Origin header, block private-IP URLs, leak details in its error envelopes, require auth, rate limit a burst, bound its responses, or start from a dangerous stdio command? The probe checks answer these questions.

To try them, run the command below. examples/security-multi-server.json captures more than one server, so a single scan exercises the probe checks against every transport in the capture.

mcptest security examples/security-multi-server.json

How they work

A probe needs active behavior, but the analysis stays deterministic. The split is the same one the red-team and advisory layers use: a caller captures what the server did into a ProbeEvidence value, and the analyzers read only that evidence. The live probing (making the requests, establishing the connection) happens before the evidence struct is built, so the analysis is reproducible, and a probe run can be replayed from a cassette without touching the network.
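A minimal sketch of that capture/analyze split, in Python for illustration. ProbeEvidence is the document's name, but its field foreign_origin_status, the helpers capture_from_cassette and analyze_origin_validation, and the JSON cassette shape are hypothetical. The sketch also assumes that "accepted" for origin-validation means the server answered with a 2xx status.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProbeEvidence:
    # Hypothetical field: the HTTP status the server returned when the probe
    # sent a foreign Origin header. None means the probe did not capture it.
    foreign_origin_status: Optional[int] = None


def capture_from_cassette(cassette_json: str) -> ProbeEvidence:
    """Build evidence from a recorded cassette instead of the live network."""
    recorded = json.loads(cassette_json)
    return ProbeEvidence(foreign_origin_status=recorded.get("foreign_origin_status"))


def analyze_origin_validation(ev: ProbeEvidence) -> Optional[str]:
    """A pure function of the evidence: the same evidence gives the same result."""
    if ev.foreign_origin_status is None:
        return None  # nothing captured, so nothing to report
    # Assumption: any 2xx answer means the foreign Origin was accepted.
    if 200 <= ev.foreign_origin_status < 300:
        return "SEC-018 origin-validation: foreign Origin accepted"
    return None


cassette = '{"foreign_origin_status": 200}'
first = analyze_origin_validation(capture_from_cassette(cassette))
second = analyze_origin_validation(capture_from_cassette(cassette))
assert first == second  # replaying the cassette reproduces the finding
print(first)  # SEC-018 origin-validation: foreign Origin accepted
```

Because the analyzer never performs I/O, replaying a cassette and probing live feed it the same kind of value. Only the capture step differs.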

Every evidence field is optional. A probe whose evidence was not captured produces no finding rather than a guess, which keeps "did not measure" from being scored as a pass.
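To show why "did not measure" must stay distinct from a pass, here is a hypothetical three-state outcome for the rate-limit-present check. The Outcome enum, the BurstEvidence type, and the rule that any HTTP 429 in the burst counts as rate limiting are all assumptions made for this sketch. They are not the tool's API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Outcome(Enum):
    FINDING = "finding"
    PASS = "pass"
    NOT_MEASURED = "not measured"


@dataclass(frozen=True)
class BurstEvidence:
    # Hypothetical: HTTP statuses seen during a controlled burst,
    # or None when the burst was never run.
    burst_statuses: Optional[List[int]] = None


def rate_limit_outcome(ev: BurstEvidence) -> Outcome:
    if ev.burst_statuses is None:
        # Missing evidence yields no finding, and it is not a pass either.
        return Outcome.NOT_MEASURED
    # Assumption: a 429 anywhere in the burst means a rate limit kicked in.
    if 429 in ev.burst_statuses:
        return Outcome.PASS
    return Outcome.FINDING


print(rate_limit_outcome(BurstEvidence()))                 # Outcome.NOT_MEASURED
print(rate_limit_outcome(BurstEvidence([200, 200, 429])))  # Outcome.PASS
print(rate_limit_outcome(BurstEvidence([200] * 50)))       # Outcome.FINDING
```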

Implemented probes

Rule	What it flags
SEC-017 tls-required	Plaintext HTTP accepted on a non-loopback address.
SEC-018 origin-validation	A request with a foreign `Origin` was accepted (DNS rebinding).
SEC-019 private-ip-guard	An advertised URL is an IP literal in a private or cloud-metadata range (SSRF).
SEC-020 error-envelope-leak	An error body carries a stack trace or an obvious secret (CWE-209).
SEC-021 posture-tier	Reports the auth tier (none, header bearer, OAuth). Informational.
SEC-022 token-audience	The server accepted a token not issued for it (token passthrough; RFC 8707).
SEC-023 dangerous-startup-command	A stdio launch command runs a dangerous shell pattern (piped curl, rm -rf, sudo, eval).
SEC-024 rate-limit-present	A controlled burst saw no rate limiting (unbounded consumption).
SEC-025 capability-attestation	Capabilities were advertised without an attestation.
SEC-026 sampling-origin-auth	Sampling was used without authenticating the origin.
SEC-027 unbounded-response	A response had no declared size bound, or exceeded the declared one.
SEC-028 resource-indicators	The server does not advertise or honor RFC 8707 resource indicators.
SEC-029 scope-minimization	An advertised scope is a wildcard (`*`) or an omnibus name (`admin`, `all`).
SEC-030 session-id-hygiene	A session ID is low entropy or sequential, or sessions are used for auth.
SEC-031 confused-deputy-posture	A proxy lacks per-client consent or exact `redirect_uri` matching.
SEC-032 install-source-provenance	An install source has no version pin or provenance.
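The two URL-based rows, tls-required and private-ip-guard, can be decided from captured URLs with Python's standard ipaddress and urllib.parse modules. The following is a sketch under stated assumptions. The function names are hypothetical. The evidence is assumed to hold the URL a plaintext request was actually answered on. The sketch also treats `localhost` and loopback literals as exempt from tls-required. Hostnames are never resolved, so private-ip-guard stays silent on them.

```python
import ipaddress
from typing import Optional, Union
from urllib.parse import urlsplit

IPAddress = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]


def ip_literal(host: str) -> Optional[IPAddress]:
    """Return the host as an IP address if it is a literal, else None."""
    try:
        return ipaddress.ip_address(host)
    except ValueError:
        return None  # a hostname; this sketch never resolves it


def tls_required_finding(answered_url: str) -> bool:
    """SEC-017: a plaintext HTTP request was answered on a non-loopback host."""
    parts = urlsplit(answered_url)
    if parts.scheme != "http":
        return False
    host = parts.hostname or ""
    ip = ip_literal(host)
    is_loopback = host == "localhost" or (ip is not None and ip.is_loopback)
    return not is_loopback


def private_ip_guard_finding(advertised_url: str) -> bool:
    """SEC-019: an advertised URL is an IP literal in a private range."""
    ip = ip_literal(urlsplit(advertised_url).hostname or "")
    if ip is None:
        return False  # hostnames are not flagged, only IP literals
    # is_link_local covers 169.254.169.254, the usual cloud-metadata address.
    return ip.is_private or ip.is_link_local


print(tls_required_finding("http://10.0.0.5:8080/mcp"))               # True
print(tls_required_finding("http://127.0.0.1:8080/mcp"))              # False
print(private_ip_guard_finding("http://169.254.169.254/latest/"))     # True
print(private_ip_guard_finding("https://internal.example.com/data"))  # False
```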

What stays partial

Several of these probes are marked partial in the catalog. Partial means limited coverage, not a missing implementation: every catalog probe has an implementation, and none remains as a follow-up. A black-box client cannot always see the full picture, so a partial probe fires only on clear evidence and stays silent when the evidence is ambiguous.

private-ip-guard flags only IP literals. Deciding whether a hostname points into a private range would require DNS resolution, which the client cannot do deterministically, so hostnames are never flagged.

The fuzzy analyzers each carry a documented heuristic. scope-minimization flags a `*` wildcard or a known omnibus scope name. session-id-hygiene flags a session ID that is shorter than 16 characters, an all-digit counter, a single repeated character, or a consecutive numeric sequence. install-source-provenance flags a source with no version pin (`@<version>`, `==`, a `#`-ref, or a `sha256:` digest) and no `@latest` override.
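The scope-minimization and session-id-hygiene heuristics translate directly into string checks. This Python sketch has several assumptions. The function names are hypothetical. Any scope containing `*` (for example `repo:*`) counts as a wildcard, and omnibus names are matched case-insensitively. The sketch also omits the consecutive-numeric-sequence rule, because the text does not pin down whether that rule looks at one ID or at a series of IDs.

```python
import re
from typing import List

# Omnibus names taken from the rule table; the real list may be longer.
OMNIBUS_SCOPES = {"admin", "all"}


def overbroad_scopes(scopes: List[str]) -> List[str]:
    """scope-minimization: scopes that contain `*` or are an omnibus name."""
    # Assumption: a `*` anywhere in the scope counts as a wildcard,
    # and omnibus names are compared case-insensitively.
    return [s for s in scopes if "*" in s or s.lower() in OMNIBUS_SCOPES]


def weak_session_id_reasons(session_id: str) -> List[str]:
    """session-id-hygiene: three of the four documented heuristics."""
    reasons = []
    if len(session_id) < 16:
        reasons.append("shorter than 16 characters")
    if re.fullmatch(r"[0-9]+", session_id):
        reasons.append("all-digit counter")
    if session_id and len(set(session_id)) == 1:
        reasons.append("single repeated character")
    # The consecutive-numeric-sequence rule is deliberately left out here.
    return reasons


print(overbroad_scopes(["tools:read", "*", "admin", "repo:*"]))
# ['*', 'admin', 'repo:*']
print(weak_session_id_reasons("0000"))
# ['shorter than 16 characters', 'all-digit counter', 'single repeated character']
print(weak_session_id_reasons("3f9c1a7e5b2d4c8e9a0b1c2d"))
# []
```

Returning a list of reasons rather than a bare boolean keeps each finding explainable, which fits a heuristic that is documented as such.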

These probes produce deterministic security findings. Unlike the red-team and advisory layers, they count toward the grade, because accepting plaintext or passing tokens through is a defect in the server, not a property of a model.