Portable run evidence
A run report already knows almost everything a registry needs to trust a run: which server was tested, under which MCP spec version, against which corpus, at which commit, when, and how it scored. mcptest evidence aggregates those fields into one small, schema-stable artifact a registry can ingest, and pairs it with a verifier that rejects evidence that is stale, claims a forked identity, or is unsigned.
No registry (Glama, IndexMCP, MCP Scoreboard, Smithery, PulseMCP) publishes an ingestion schema today, so mcptest defines a minimal one and lets registries adopt it. This is deliberately not a public directory: it is an artifact plus a trust policy.
Emitting an artifact
evidence reads the JSON report written by mcptest run --format json and writes the artifact:
mcptest run tests/ --format json --output run.json
mcptest evidence run.json --out evidence.json
Fold in a security scan's severity counts, and mark the run reproducible (the sbom --verify / SOURCE_DATE_EPOCH parity signal), with:
mcptest security tools-list.json --format json > security.json
mcptest evidence run.json --security security.json --reproducible --out evidence.json
The artifact maps one-to-one onto existing run metadata:
{
  "schema_version": "mcptest.dev/evidence/v1",
  "server_identity": [
    { "name": "weather", "transport": "stdio", "auth": "none" }
  ],
  "spec_version": "2025-03-26",
  "corpus_version": "sha256:beef...",
  "source": {
    "repo": "git@github.com:acme/weather.git",
    "branch": "main",
    "commit_sha": "1f2e..."
  },
  "generated_at": "2026-06-02T10:01:00Z",
  "grades": {
    "tests_total": 42,
    "tests_passed": 41,
    "tests_failed": 1,
    "security_severity_counts": { "high": 1 }
  },
  "reproducible": true,
  "unverifiable_origin": false
}
A run with no git commit (a private deployment, a one-off) sets unverifiable_origin: true. Such an artifact is accepted but flagged as unattested rather than rejected, so a private server can still publish a badge.
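As a rough illustration of how a registry might consume the artifact, the sketch below parses it and treats unverifiable_origin as a flag rather than a reason to reject. The field names come from the example above; EvidenceSummary, summarize, and the derived pass rate are hypothetical names for illustration, not part of mcptest or of the schema.

```python
import json
from dataclasses import dataclass
from typing import List, Optional

SCHEMA_VERSION = "mcptest.dev/evidence/v1"


@dataclass(frozen=True)
class EvidenceSummary:
    """Hypothetical registry-side view of one evidence artifact."""
    servers: List[str]
    commit_sha: Optional[str]
    pass_rate: float
    attested: bool


def summarize(artifact_text: str) -> EvidenceSummary:
    doc = json.loads(artifact_text)
    if doc.get("schema_version") != SCHEMA_VERSION:
        raise ValueError("unsupported schema_version: %r" % doc.get("schema_version"))

    grades = doc["grades"]
    total = grades["tests_total"]
    # The pass rate is derived here for display; the artifact stores raw counts.
    pass_rate = grades["tests_passed"] / total if total else 0.0

    return EvidenceSummary(
        servers=[s["name"] for s in doc["server_identity"]],
        commit_sha=doc.get("source", {}).get("commit_sha"),
        pass_rate=pass_rate,
        # Unattested artifacts are still accepted, only flagged.
        attested=not doc.get("unverifiable_origin", False),
    )
```

A registry following this shape would render a badge either way and use attested only to decide whether to label the result as unattested.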
Signing
--sign attaches a detached Sigstore signature, reusing the same keyless cosign sign-blob path the release workflow uses (GitHub Actions OIDC). It writes evidence.json.sig and evidence.json.cert beside the artifact:
mcptest evidence run.json --out evidence.json --sign
Signing needs cosign on PATH; without it, --sign exits 2 rather than emitting an unsigned artifact that looks signed. The cryptographic signature is cosign's job (the same root of trust as the SLSA provenance the release publishes); the binary owns the artifact and the anti-gaming bindings below.
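The fail-closed behaviour described above can be sketched as a preflight check. This is an illustrative Python sketch, not mcptest's implementation; require_cosign is a hypothetical helper name, and only the exit code 2 and the cosign-on-PATH requirement come from the text.

```python
import shutil
import sys
from typing import Optional


def require_cosign(path: Optional[str] = None) -> str:
    """Return the resolved cosign executable, or exit with status 2.

    `path` overrides the PATH search (useful in tests); None means
    the process's own PATH.
    """
    found = shutil.which("cosign", path=path)
    if found is None:
        # Fail closed: never fall through to writing an artifact
        # that merely looks signed.
        print("cosign not found on PATH; refusing to sign", file=sys.stderr)
        raise SystemExit(2)
    return found
```

Checking before any artifact is written means a missing tool can never leave behind an evidence.json without its .sig and .cert.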
Verifying
evidence verify weighs three things and rejects on any of them:
# Reject evidence older than 30 days, require it to be signed.
mcptest evidence verify evidence.json --max-age 720h --require-signed
- Stale: generated_at is older than --max-age (or absent when an age is required). Defeats replaying an old green run forever.
- Forked identity: the artifact's commit_sha is not an ancestor of the current HEAD (checked with git merge-base --is-ancestor). Defeats a fork claiming another repo's identity. When the commit cannot be resolved (not a git checkout), the check is skipped rather than failing.
- Unsigned: --require-signed is set but no <evidence>.sig accompanies the artifact.
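The three checks can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the mcptest source: the function names are hypothetical, timestamps are assumed to be UTC with a trailing Z as in the example artifact, and an unresolvable commit is modelled as None so the ancestry check is skipped.

```python
import subprocess
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import List, Optional


def is_stale(generated_at: Optional[str], max_age: timedelta, now: datetime) -> bool:
    """An absent timestamp counts as stale when an age is required."""
    if generated_at is None:
        return True
    ts = datetime.fromisoformat(generated_at.replace("Z", "+00:00"))
    return now - ts > max_age


def is_ancestor(commit_sha: str, repo_dir: str = ".") -> Optional[bool]:
    """True/False from git; None when the commit cannot be resolved."""
    try:
        proc = subprocess.run(
            ["git", "-C", repo_dir, "merge-base", "--is-ancestor", commit_sha, "HEAD"],
            capture_output=True,
        )
    except FileNotFoundError:  # git itself is not installed
        return None
    if proc.returncode == 0:
        return True
    if proc.returncode == 1:
        return False
    return None  # any other status: not a checkout, unknown commit, etc.


def rejection_reasons(doc: dict, evidence_path: Path, max_age: Optional[timedelta],
                      require_signed: bool, now: datetime,
                      repo_dir: str = ".") -> List[str]:
    reasons = []
    if max_age is not None and is_stale(doc.get("generated_at"), max_age, now):
        reasons.append("stale")
    sha = doc.get("source", {}).get("commit_sha")
    # Only an explicit False rejects; None means the check was skipped.
    if sha and is_ancestor(sha, repo_dir) is False:
        reasons.append("forked identity")
    if require_signed and not Path(str(evidence_path) + ".sig").exists():
        reasons.append("unsigned")
    return reasons
```

An empty list means the artifact is accepted; any entry is a reason to reject. Note that the signature check here is presence only, matching the policy described, and says nothing about cryptographic validity.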
Exit code is 0 when accepted, 1 when rejected (with the reasons printed), 2 when a file cannot be read or parsed. Full cryptographic Sigstore verification (transparency-log inclusion, certificate identity) is cosign verify-blob's job against the .sig / .cert; evidence verify owns the freshness, ancestry, and signature-presence policy a registry enforces per ingest.
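For a registry that shells out to the verifier on each ingest, the exit-code contract can be wrapped as below. This is a hedged sketch: Verdict, verify_argv, and ingest are hypothetical names, while the command line and the meaning of exit codes 0, 1, and 2 are taken from the text above.

```python
import subprocess
from enum import Enum
from typing import List


class Verdict(Enum):
    ACCEPTED = 0    # evidence passed every enabled check
    REJECTED = 1    # at least one check failed; reasons are printed
    UNREADABLE = 2  # file could not be read or parsed


def verify_argv(evidence_path: str, max_age: str = "720h",
                require_signed: bool = True) -> List[str]:
    argv = ["mcptest", "evidence", "verify", evidence_path, "--max-age", max_age]
    if require_signed:
        argv.append("--require-signed")
    return argv


def verdict_from_exit(code: int) -> Verdict:
    try:
        return Verdict(code)
    except ValueError:
        # Anything outside the documented 0/1/2 contract is treated as a fault.
        raise RuntimeError("unexpected exit code from verifier: %d" % code)


def ingest(evidence_path: str) -> Verdict:
    """Run the verifier and translate its exit status (requires mcptest on PATH)."""
    proc = subprocess.run(verify_argv(evidence_path), capture_output=True, text=True)
    return verdict_from_exit(proc.returncode)
```

Keeping REJECTED and UNREADABLE distinct lets a registry retry or report a broken upload separately from evidence that was read correctly and failed policy.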
What is and is not here
The artifact carries cassette_hash and cost_profile as optional fields, populated when a run supplies them; the OSS command leaves them absent. The compliance tier and TDQS tool scores are likewise optional, filled in only when a run folds those checks in.