External-scanner supplement

mcptest's bundled checks are the authoritative lane: they are deterministic, and they are the only thing that sets a security grade. The supplement primitive lets you fold in findings from a scanner you already run, so one report shows both the bundled findings and the scanner's. mcptest does not bundle or endorse any scanner. This is the security analog of the exec scorer: a generic wrapper around a local tool, and one layer of the multi-layer security testing design.

mcptest owns the ingest, not the scan. Do not try to out-scan the scanners: run the one you already trust, then fold its output into the report.

The `security import` command

security import reads one or more scanner result files and prints a unified report. SARIF 2.1.0 is the de facto interchange format (AgentSeal and similar scanners emit it natively); Snyk agent-scan emits its own ScanPathResult JSON; anything else goes through the generic JSON reader.

# Fold an AgentSeal SARIF file and a Snyk agent-scan JSON file into one report.
mcptest security import \
  --sarif examples/security/agentseal.sarif.json \
  --snyk examples/security/snyk-agent-scan.json

# Also run the bundled deterministic lanes against a snapshot, so imports
# dedup against findings from the same run.
mcptest security import \
  --sarif examples/security/agentseal.sarif.json \
  --snapshot examples/security-tools-list.json

# SARIF out, for code scanning. Imported results carry their scanner provenance.
mcptest security import --sarif scan.sarif --format sarif > security.sarif

Each of --sarif, --snyk, and --supplement (a generic JSON file) is repeatable. The scanner name for a SARIF file is read from its tool driver; Snyk imports are tagged snyk; a generic file is tagged with its file stem. --advisory marks every imported finding advisory, so none of it can fail the command or set a grade (see below). The exit code is 1 when any counted finding is at or above --fail-on (default high), 2 when a file cannot be read or no scanner file is given, and 0 otherwise.
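The exit-code rule can be sketched as a small function. This is a minimal sketch under stated assumptions, not mcptest's code: Severity, ImportedFinding, ImportOutcome, and exit_code are hypothetical names, and a "counted" finding is assumed to be one that is neither advisory nor a duplicate of a bundled rule, matching the counts_toward_grade behavior described under "Advisory findings never set a grade".

```rust
/// Hypothetical severity axis, ordered info < low < medium < high < critical.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Severity {
    Info,
    Low,
    Medium,
    High,
    Critical,
}

/// Hypothetical imported finding; only the fields the exit code needs.
struct ImportedFinding {
    severity: Severity,
    advisory: bool,
    duplicate: bool,
}

/// Hypothetical outcome of reading the scanner files named on the command line.
enum ImportOutcome {
    /// Every file was read; carries the normalized findings.
    Read(Vec<ImportedFinding>),
    /// A file could not be read, or no scanner file was given.
    InputError,
}

/// 2 on an input error, 1 if any counted finding is at or above `fail_on`,
/// otherwise 0. Advisory and duplicate findings are assumed not to count.
fn exit_code(outcome: &ImportOutcome, fail_on: Severity) -> i32 {
    match outcome {
        ImportOutcome::InputError => 2,
        ImportOutcome::Read(findings) => {
            let gated = findings
                .iter()
                .any(|f| !f.advisory && !f.duplicate && f.severity >= fail_on);
            if gated {
                1
            } else {
                0
            }
        }
    }
}
```

With the default threshold of high, an advisory critical finding leaves the exit code at 0, while a single counted high or critical finding makes it 1.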

Managed vendor adapters (Cisco AI Defense, Snyk Agent Scan integrations) are not part of this primitive, which only reads result files a scanner has already written.

What it does

The command takes each scanner result and does three things:

Normalize. Each result becomes a finding on mcptest's severity axis (info through critical), with its rule ID, message, and a piece of evidence.
Record provenance. Every supplement finding records the scanner it came from, so a combined report shows which engine raised what.
Dedup against the catalog. A finding whose rule ID matches a bundled SEC rule is marked as a duplicate, so the combined report counts it once.
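The three steps can be pictured with a hypothetical finding type. The names below (Severity, SupplementFinding, mark_duplicates, novel_count) are illustrative assumptions that mirror the description, not mcptest_core's actual types; mark_duplicates plays the role that dedup_against_catalog plays in the real API.

```rust
/// Hypothetical severity axis, info through critical.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Severity {
    Info,
    Low,
    Medium,
    High,
    Critical,
}

/// Hypothetical normalized finding; field names are illustrative only.
#[allow(dead_code)]
#[derive(Debug, Clone)]
struct SupplementFinding {
    rule_id: String,
    severity: Severity,
    message: String,
    evidence: String,
    /// Provenance: the scanner that raised this finding.
    scanner: String,
    /// Set when the rule ID matches a bundled SEC rule.
    duplicate: bool,
}

/// Mark every finding whose rule ID appears in the bundled catalog as a duplicate.
fn mark_duplicates(findings: &mut [SupplementFinding], catalog: &[&str]) {
    for f in findings.iter_mut() {
        f.duplicate = catalog.contains(&f.rule_id.as_str());
    }
}

/// Findings the combined report adds on top of the bundled engine's own.
fn novel_count(findings: &[SupplementFinding]) -> usize {
    findings.iter().filter(|f| !f.duplicate).count()
}
```

Keeping provenance on the finding itself, rather than in a side table, means a duplicate can still be shown next to the bundled result with the name of the scanner that also raised it.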

Reading SARIF

Many scanners can emit SARIF. Run yours, then read its SARIF output into supplement findings. The reader walks every result in every run:

use mcptest_core::security::{normalize_sarif, dedup_against_catalog};

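// scanner_output holds the SARIF text your scanner wrote; `?` needs an enclosing fn that returns a Result.
let sarif: serde_json::Value = serde_json::from_str(&scanner_output)?;
// "semgrep" is the scanner name recorded as provenance; false means not advisory.
let mut findings = normalize_sarif(&sarif, "semgrep", false);
// Mark findings whose rule ID matches a bundled SEC rule as duplicates.
dedup_against_catalog(&mut findings, &["SEC-001", "SEC-002", "SEC-003"]);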

Severity comes from a properties.severity label when the scanner writes one, otherwise from the SARIF level: error maps to high, warning to medium, note to low, and anything else to info. A clean scan and a document that is not SARIF both read as zero findings rather than an error.
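The severity rule reads as a small function. This is a sketch, assuming the label has already been pulled out of properties.severity and the level out of the result; case-insensitive label matching and falling back to the level on an unrecognized label are assumptions of this sketch, not documented mcptest behavior. All names are hypothetical.

```rust
/// Hypothetical severity axis, ordered info < low < medium < high < critical.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Severity {
    Info,
    Low,
    Medium,
    High,
    Critical,
}

/// Parse a scanner's own severity label (case-insensitive by assumption).
fn parse_label(label: &str) -> Option<Severity> {
    match label.to_ascii_lowercase().as_str() {
        "info" => Some(Severity::Info),
        "low" => Some(Severity::Low),
        "medium" => Some(Severity::Medium),
        "high" => Some(Severity::High),
        "critical" => Some(Severity::Critical),
        _ => None,
    }
}

/// SARIF level to severity: error -> high, warning -> medium, note -> low,
/// anything else (including a missing level) -> info.
fn from_level(level: Option<&str>) -> Severity {
    match level {
        Some("error") => Severity::High,
        Some("warning") => Severity::Medium,
        Some("note") => Severity::Low,
        _ => Severity::Info,
    }
}

/// Prefer the properties.severity label when it parses; otherwise use the level.
fn result_severity(properties_severity: Option<&str>, level: Option<&str>) -> Severity {
    properties_severity
        .and_then(parse_label)
        .unwrap_or_else(|| from_level(level))
}
```

For example, result_severity(None, Some("warning")) yields Medium, while a result labeled critical stays Critical even if its level is note.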

For a scanner that does not speak SARIF, normalize_generic_json reads either a top-level array of findings or an object with a findings array, where each item carries a rule_id (or id), a severity label, and a message. Snyk agent-scan output (a ScanPathResult with an issues array of code-keyed findings) goes through normalize_snyk, a thin adapter over the same path; that adapter is what --snyk calls.
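The two accepted generic shapes can be modeled as plain Rust types after JSON decoding. GenericItem, GenericDoc, and rows are hypothetical names for this sketch, and skipping an item that has neither rule_id nor id is an assumption; the text above does not say how normalize_generic_json handles one.

```rust
/// One item in a generic findings file. Either `rule_id` or `id` names the rule.
struct GenericItem {
    rule_id: Option<String>,
    id: Option<String>,
    severity: String,
    message: String,
}

/// The two accepted top-level shapes:
///   [ {...}, {...} ]                  a bare array
///   { "findings": [ {...}, {...} ] }  an object with a findings array
enum GenericDoc {
    Array(Vec<GenericItem>),
    Object { findings: Vec<GenericItem> },
}

impl GenericItem {
    /// Prefer `rule_id`, fall back to `id`.
    fn rule(&self) -> Option<&str> {
        self.rule_id.as_deref().or(self.id.as_deref())
    }
}

impl GenericDoc {
    /// Both shapes yield the same list of items.
    fn items(&self) -> &[GenericItem] {
        match self {
            GenericDoc::Array(items) => items.as_slice(),
            GenericDoc::Object { findings } => findings.as_slice(),
        }
    }
}

/// Flatten a document into (rule, severity label, message) rows,
/// skipping items with neither `rule_id` nor `id` (assumption of this sketch).
fn rows(doc: &GenericDoc) -> Vec<(&str, &str, &str)> {
    doc.items()
        .iter()
        .filter_map(|it| it.rule().map(|r| (r, it.severity.as_str(), it.message.as_str())))
        .collect()
}
```

Once both shapes collapse to one item list, the rest of the pipeline (severity labels, provenance, dedup) does not need to know which shape the file used.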

Worked example: wrapping semgrep

semgrep runs locally and writes SARIF. A rule pack aimed at MCP servers might flag a tool handler that shells out with an interpolated argument, or a tool that fetches a user-supplied URL with no size limit:

semgrep --config ./mcp-rules --sarif --output findings.sarif src/

Feed findings.sarif to normalize_sarif. With the shell-exec rule reported at SARIF level error and the fetch rule at warning, the shell-exec result maps to a high-severity finding with the offending line as evidence, and the unbounded-fetch result maps to medium. Neither rule ID is a SEC ID, so both survive dedup as novel signals. The fixtures under tests/fixtures/supplement/ show the exact shapes.
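The worked example's outcome can be checked with a short sketch, assuming the rule pack reports the shell-exec rule at level error and the fetch rule at level warning. The rule IDs and function names are hypothetical; the level mapping and the SEC-001 to SEC-003 catalog come from the Reading SARIF section.

```rust
/// Hypothetical severity axis.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Severity {
    Info,
    Low,
    Medium,
    High,
    Critical,
}

/// SARIF level to severity, per the mapping described under Reading SARIF.
fn from_level(level: &str) -> Severity {
    match level {
        "error" => Severity::High,
        "warning" => Severity::Medium,
        "note" => Severity::Low,
        _ => Severity::Info,
    }
}

/// (rule ID, SARIF level) pairs standing in for the two semgrep results.
/// The rule IDs are hypothetical; a real rule pack chooses its own.
const RESULTS: [(&str, &str); 2] = [
    ("mcp-rules.shell-exec-interpolated", "error"),
    ("mcp-rules.unbounded-fetch", "warning"),
];

/// Bundled catalog used for dedup in this example.
const BUNDLED: [&str; 3] = ["SEC-001", "SEC-002", "SEC-003"];

/// Normalize each result and report whether it survives dedup as a novel signal.
fn worked_example() -> Vec<(&'static str, Severity, bool)> {
    RESULTS
        .iter()
        .map(|&(rule, level)| (rule, from_level(level), !BUNDLED.contains(&rule)))
        .collect()
}
```

Both rows come back novel, one at High and one at Medium, which matches the narrative above.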

Advisory findings never set a grade

Some scanners reach their verdict with an LLM judge. Pass advisory = true when you normalize such a run (or --advisory on the command line). Advisory findings are reported, but counts_toward_grade returns false for them, so they never set or flip a grade. That keeps the grade on the deterministic lane, the rule the whole security design rests on. counts_toward_grade also returns false for a finding that duplicates a bundled rule, because the bundled engine already counted it.
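The grading rule fits in a few lines. This is a sketch only: Finding, worst_counted, and the counts_toward_grade below mirror the behavior described above and are not mcptest's implementation; how a grade is derived from severities is not covered here.

```rust
/// Hypothetical severity axis, ordered info < low < medium < high < critical.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Severity {
    Info,
    Low,
    Medium,
    High,
    Critical,
}

/// Hypothetical imported finding with the two flags that gate grading.
struct Finding {
    severity: Severity,
    advisory: bool,
    duplicate: bool,
}

/// Advisory findings and duplicates of a bundled rule never count.
fn counts_toward_grade(f: &Finding) -> bool {
    !f.advisory && !f.duplicate
}

/// Worst severity among findings allowed to influence the grade, if any.
fn worst_counted(findings: &[Finding]) -> Option<Severity> {
    findings
        .iter()
        .filter(|f| counts_toward_grade(f))
        .map(|f| f.severity)
        .max()
}
```

An advisory critical finding still appears in the report, but worst_counted ignores it, so it cannot move the grade.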