mcptest conformance CLI spec
Spec for the new mcptest conformance subcommand and its run and refresh operations. Implementation is still pending; this document is the contract.
Source of the corpus is documented in ../conformance-corpus/README.md; the scoring library is documented in conformance-tier.md.
Goals
- An end user with a running MCP server can score it against the vendored SEP corpus in one command.
- The default invocation works offline against a corpus that ships with the binary, so a `cargo install mcptest` user does not need the source repo or a network round trip to get a tier badge.
- The user can refresh the corpus from upstream without re-cloning the source repo, by writing the latest SEPs into a user-owned cache directory.
- The user can point at a specific corpus directory (their own fork, a vendored copy in their server's repo) without touching global state.
Subcommands
```
mcptest conformance run [FLAGS]       # score a server against a corpus
mcptest conformance refresh [FLAGS]   # pull the latest SEPs into the cache
```
Why two subcommands and not one `--refresh` flag: `run` requires `--server` and writes a report, while `refresh` does not need a server and only writes corpus files. Two subcommands keep each `--help` focused and let clap reject invalid combinations (such as `refresh --server`) at parse time.
mcptest conformance run
| Flag | Required | Default | Purpose |
|---|---|---|---|
| `--server <URL>` | yes | (none) | MCP server to probe. Same shape as `mcptest run --server`. |
| `--target-version <V>` | no | latest available locally | Which spec revision's SEPs to score against. Resolved against the corpus directory chosen below. |
| `--corpus-dir <PATH>` | no | (see resolution order) | Override the corpus location. When set, only this path is consulted; the cache and embedded fallbacks are skipped. |
| `--reporter <FORMAT>` | no | `pretty` | One of `pretty`, `json`, `markdown`, `html`. Mirrors `mcptest compliance`'s renderer enum. |
| `--out <PATH>` | no | stdout | Where to write the report. The JSON renderer writes the `result.json` shape below. |
| `--auto-refresh` | no | off | If `--target-version` is set but not present in the resolved corpus, perform a refresh for that version before scoring. Off by default so `run` never silently makes a network call. |
Corpus resolution order, when `--corpus-dir` is not set:
- `$XDG_CACHE_HOME/mcptest/conformance/<spec-version>/` (or `~/.cache/mcptest/conformance/<spec-version>/` if `XDG_CACHE_HOME` is unset, or `%LOCALAPPDATA%\mcptest\conformance\<spec-version>\` on Windows).
- The corpus baked into the binary via `include_dir!`. The build embeds whatever is in `crates/mcptest-core/seps/<spec-version>/` at compile time, so a `cargo install` user gets the same corpus the source clone has.
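The resolution order above can be sketched in std-only Rust. This sketch assumes `--corpus-dir` names a root that holds `<spec-version>/` subdirectories, which is the same layout `refresh` writes. `CorpusSource`, `cache_root`, and `resolve_corpus` are hypothetical names. The filesystem check is passed in as a closure so the logic can be exercised without disk access. Treating an empty `XDG_CACHE_HOME` as unset follows the XDG Base Directory convention.

```rust
use std::path::{Path, PathBuf};

/// Where the corpus for a run came from; `as_str` yields the JSON `corpus_source` value.
#[derive(Debug, PartialEq, Eq)]
enum CorpusSource {
    Override(PathBuf),
    Cache(PathBuf),
    Embedded,
}

impl CorpusSource {
    fn as_str(&self) -> &'static str {
        match self {
            CorpusSource::Override(_) => "override",
            CorpusSource::Cache(_) => "cache",
            CorpusSource::Embedded => "embedded",
        }
    }
}

/// Treat unset and empty environment values the same way.
fn non_empty(value: Option<&str>) -> Option<PathBuf> {
    value.filter(|s| !s.is_empty()).map(PathBuf::from)
}

/// Cache root from the resolution order: `$XDG_CACHE_HOME`, else `~/.cache`,
/// or `%LOCALAPPDATA%` on Windows. Env values are passed in (hypothetical API).
fn cache_root(
    xdg_cache_home: Option<&str>,
    home: Option<&str>,
    local_app_data: Option<&str>,
    windows: bool,
) -> Option<PathBuf> {
    let base = if windows {
        non_empty(local_app_data)?
    } else if let Some(xdg) = non_empty(xdg_cache_home) {
        xdg
    } else {
        non_empty(home)?.join(".cache")
    };
    Some(base.join("mcptest").join("conformance"))
}

/// Resolve the corpus for `version`: override only, else cache, else embedded.
/// `exists` stands in for a directory check; `embedded_has` reports whether the
/// `include_dir!` corpus contains `version`.
fn resolve_corpus(
    corpus_dir: Option<&Path>,
    cache_dir: Option<&Path>,
    version: &str,
    exists: impl Fn(&Path) -> bool,
    embedded_has: bool,
) -> Option<CorpusSource> {
    if let Some(dir) = corpus_dir {
        // --corpus-dir is exclusive: no cache or embedded fallback.
        let path = dir.join(version);
        return exists(&path).then(|| CorpusSource::Override(path));
    }
    if let Some(root) = cache_dir {
        let path = root.join(version);
        if exists(&path) {
            return Some(CorpusSource::Cache(path));
        }
    }
    embedded_has.then_some(CorpusSource::Embedded)
}
```

A `None` from `resolve_corpus` corresponds to the "corpus missing" case, which exits 2 unless `--auto-refresh` triggers a refresh first.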
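If `--target-version` is omitted, `run` picks the lexicographically greatest version present in the resolved corpus. For `YYYY-MM-DD[-suffix]` names this orders dates chronologically, but a suffixed revision sorts after the unsuffixed one of the same date (`2026-07-28-rc` > `2026-07-28`). If no version is present anywhere, `run` exits 2 with a message that points at `mcptest conformance refresh`.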
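The following sketch lists local versions and picks one. It assumes version directories are named `YYYY-MM-DD` with an optional non-empty `-suffix`, and that any other entry in the corpus root is ignored. `is_spec_version`, `local_versions`, and `latest_version` are hypothetical names. `latest_version` applies the plain lexicographic rule, so it inherits the suffix-ordering behavior described above.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// True for `YYYY-MM-DD` or `YYYY-MM-DD-<suffix>` with a non-empty suffix.
fn is_spec_version(name: &str) -> bool {
    let b = name.as_bytes();
    if b.len() < 10 {
        return false;
    }
    // Digits everywhere except the two dashes at byte offsets 4 and 7.
    let date_ok = b[..10].iter().enumerate().all(|(i, c)| match i {
        4 | 7 => *c == b'-',
        _ => c.is_ascii_digit(),
    });
    date_ok && (b.len() == 10 || (b.len() > 11 && b[10] == b'-'))
}

/// Names of version directories directly under `root` (for example, the cache root).
fn local_versions(root: &Path) -> io::Result<Vec<String>> {
    let mut versions = Vec::new();
    for entry in fs::read_dir(root)? {
        let entry = entry?;
        if !entry.file_type()?.is_dir() {
            continue;
        }
        if let Some(name) = entry.file_name().to_str() {
            if is_spec_version(name) {
                versions.push(name.to_owned());
            }
        }
    }
    Ok(versions)
}

/// Lexicographically greatest version, or None if the list is empty.
fn latest_version(versions: &[String]) -> Option<&str> {
    versions.iter().map(String::as_str).max()
}
```

For example, given `2026-07-28` and `2026-07-28-rc`, `latest_version` returns `2026-07-28-rc`. If the final release should win, the implementation needs a custom ordering instead of plain string comparison.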
Output shape (JSON renderer):

```json
{
  "target_version": "2026-07-28-rc",
  "corpus_source": "embedded",
  "must": { "passed": 42, "total": 42 },
  "should": { "passed": 18, "total": 20 },
  "may": { "passed": 3, "total": 5 },
  "tier": "tier-1",
  "badge": "T1",
  "per_sep": [
    { "sep": 2106, "tested": 1, "passed": 1, "excluded": 4 },
    { "sep": 837, "tested": 1, "passed": 0, "excluded": 4 }
  ]
}
```
`corpus_source` is `"cache"`, `"embedded"`, or `"override"`, so a reader can tell whether the run used a refreshed cache, the corpus vendored into the binary, or a user-supplied `--corpus-dir`.
mcptest conformance refresh
| Flag | Required | Default | Purpose |
|---|---|---|---|
| `--target-version <V>` | no | `latest` | Which spec revision to fetch. `latest` resolves to the most recent `<spec-version>` upstream advertises (see "Version selection" below). |
| `--corpus-dir <PATH>` | no | user cache | Destination. Default is the same XDG cache path `run` reads from. |
| `--url <URL>` | no | `https://github.com/modelcontextprotocol/conformance` | Upstream repo. Override for a fork or a mirror. |
| `--ref <TAG-OR-SHA>` | no | (see version selection) | Pin to a specific ref. Bypasses the version-selection logic. |
| `--source-path <PATH>` | no | `src/seps` | Subdirectory in the upstream tree to mirror. |
| `--dry-run` | no | off | Print what would be fetched and where, without writing. |
Transport: an HTTPS GET to `https://codeload.github.com/<owner>/<repo>/tar.gz/<ref>` via `reqwest`. The response is untarred in memory, files matching `<source-path>/**/*.{yaml,yml,json}` are extracted, and they are written to `<corpus-dir>/<spec-version>/`. There is no runtime dependency on git, so this works in minimal containers and on platforms without `git` on `PATH`.
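The transport has two pure steps: building the tarball URL and deciding which archive entries to keep. Both can be sketched without network or archive code. The sketch assumes `--url` is a plain `https://github.com/<owner>/<repo>` URL, optionally ending in `.git`, so a non-GitHub mirror would need a different URL scheme. It also assumes files keep their layout relative to `<source-path>` when written under `<corpus-dir>/<spec-version>/`. `codeload_url` and `corpus_relative_path` are hypothetical names. GitHub archives wrap the tree in a single top-level directory, so the first path component is dropped before matching. Entries with `.` or `..` components are rejected as a safety measure, which is an assumption and not part of the spec.

```rust
/// Build the codeload tarball URL for `https://github.com/<owner>/<repo>` and a ref.
/// Returns None for anything that is not a plain github.com repository URL.
fn codeload_url(repo_url: &str, git_ref: &str) -> Option<String> {
    let trimmed = repo_url.trim_end_matches('/');
    let trimmed = trimmed.strip_suffix(".git").unwrap_or(trimmed);
    let rest = trimmed.strip_prefix("https://github.com/")?;
    let mut parts = rest.split('/');
    let (owner, repo) = (parts.next()?, parts.next()?);
    if owner.is_empty() || repo.is_empty() || parts.next().is_some() {
        return None;
    }
    Some(format!("https://codeload.github.com/{owner}/{repo}/tar.gz/{git_ref}"))
}

/// Map a tarball entry path to its path relative to `<source-path>`, or None if the
/// entry should not be written. The archive's top-level directory is skipped first.
fn corpus_relative_path(entry: &str, source_path: &str) -> Option<String> {
    let (_top_level, inside) = entry.split_once('/')?;
    let prefix = format!("{}/", source_path.trim_matches('/'));
    let rel = inside.strip_prefix(prefix.as_str())?;
    // Skip directory entries and refuse `.`/`..` so a hostile archive cannot escape the destination.
    if rel.split('/').any(|c| c.is_empty() || c == "." || c == "..") {
        return None;
    }
    let (_, ext) = rel.rsplit_once('.')?;
    matches!(ext, "yaml" | "yml" | "json").then(|| rel.to_owned())
}
```

In the real command, these helpers would be applied while iterating `tar::Archive::entries()` over a `flate2::read::GzDecoder` that wraps the downloaded bytes.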
Behavior on conflict: if files exist at the destination, refresh diffs and rewrites them, then prints an added / removed / unchanged summary (same shape as the bash script's summary today). Use --dry-run to preview without writing.
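The diff behind the summary can be sketched as follows. The sketch assumes both the existing destination and the fetched set have been read into maps from relative path to file bytes. `RefreshSummary` and `summarize` are hypothetical names. The spec lists added, removed, and unchanged. This sketch also collects files whose content differs as `changed`, because the spec does not say which bucket those belong in.

```rust
use std::collections::BTreeMap;

/// Per-file outcome of comparing the fetched corpus with what is already on disk.
#[derive(Debug, Default, PartialEq, Eq)]
struct RefreshSummary {
    added: Vec<String>,
    removed: Vec<String>,
    changed: Vec<String>,
    unchanged: Vec<String>,
}

impl RefreshSummary {
    /// One-line summary to print after a refresh or a --dry-run.
    fn line(&self) -> String {
        format!(
            "{} added, {} removed, {} changed, {} unchanged",
            self.added.len(),
            self.removed.len(),
            self.changed.len(),
            self.unchanged.len()
        )
    }
}

/// Compare relative-path -> bytes maps; BTreeMap keeps every list sorted by path.
fn summarize(
    existing: &BTreeMap<String, Vec<u8>>,
    fetched: &BTreeMap<String, Vec<u8>>,
) -> RefreshSummary {
    let mut summary = RefreshSummary::default();
    for (path, bytes) in fetched {
        match existing.get(path) {
            None => summary.added.push(path.clone()),
            Some(old) if old == bytes => summary.unchanged.push(path.clone()),
            Some(_) => summary.changed.push(path.clone()),
        }
    }
    // Anything on disk that upstream no longer ships.
    for path in existing.keys() {
        if !fetched.contains_key(path) {
            summary.removed.push(path.clone());
        }
    }
    summary
}
```

Under `--dry-run`, the same summary would be printed and the write step skipped.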
Version selection (refresh --target-version latest)
Resolving latest requires answering two questions:
- Which upstream tag (or branch tip) should we pin to?
- Which spec revision string do we file it under locally?
The selection algorithm:
- Query the GitHub API for tags on `--url`. Filter to tags whose tree contains `<source-path>`. Pick the highest-versioned tag.
- If no tag satisfies the filter, fall back to the default branch HEAD (this is the case today: `src/seps/` postdates `v0.1.16`).
- The local `<spec-version>` is read from a sidecar file in upstream (proposed: `<source-path>/SPEC_VERSION`) if present, otherwise derived from the ref name (a tag like `spec-2026-07-28-rc` maps to `2026-07-28-rc`), otherwise the user must pass `--target-version <V>` explicitly.
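A sketch of that fallback chain follows. It assumes upstream tags are named `vMAJOR.MINOR.PATCH` (as with `v0.1.16`). It also assumes the GitHub API lookups have already reported, for each tag, whether its tree contains `<source-path>`; checking each tag's tree likely costs one request per tag. Tags that do not parse as semver are ignored here, since the spec leaves open how `spec-<date>` tags rank against semver tags. `UpstreamTag`, `Pin`, `semver`, `spec_version_from_ref`, and `select_pin` are hypothetical names. The `reason` string carries what a `latest` resolve would print about its pin.

```rust
/// A tag as reported upstream; `has_source_path` says whether its tree contains `<source-path>`.
struct UpstreamTag<'a> {
    name: &'a str,
    has_source_path: bool,
}

/// What `refresh` pins to, plus the human-readable reason it prints.
#[derive(Debug, PartialEq, Eq)]
struct Pin {
    git_ref: String,
    spec_version: String,
    reason: String,
}

/// Parse `vMAJOR.MINOR.PATCH` (leading `v` optional); other shapes return None.
fn semver(tag: &str) -> Option<(u64, u64, u64)> {
    let mut parts = tag.strip_prefix('v').unwrap_or(tag).split('.');
    let v: (u64, u64, u64) = (
        parts.next()?.parse().ok()?,
        parts.next()?.parse().ok()?,
        parts.next()?.parse().ok()?,
    );
    parts.next().is_none().then_some(v)
}

/// `spec-2026-07-28-rc` -> `2026-07-28-rc`; other ref names yield None.
fn spec_version_from_ref(git_ref: &str) -> Option<String> {
    git_ref
        .strip_prefix("spec-")
        .filter(|v| !v.is_empty())
        .map(str::to_owned)
}

/// Choose a ref (best qualifying tag, else default-branch HEAD), then a spec version
/// (sidecar, else ref name, else error). `read_sidecar` returns SPEC_VERSION for a ref.
fn select_pin(
    tags: &[UpstreamTag<'_>],
    default_branch: &str,
    read_sidecar: impl Fn(&str) -> Option<String>,
) -> Result<Pin, String> {
    let best = tags
        .iter()
        .filter(|t| t.has_source_path)
        .filter_map(|t| semver(t.name).map(|v| (v, t.name)))
        .max();
    let (git_ref, why_ref) = match best {
        Some((_, name)) => (name.to_owned(), format!("highest tag containing the source path is {name}")),
        None => (default_branch.to_owned(), format!("no tag contains the source path, using {default_branch} HEAD")),
    };
    let sidecar = read_sidecar(git_ref.as_str())
        .map(|s| s.trim().to_owned())
        .filter(|s| !s.is_empty());
    let (spec_version, why_version) = match (sidecar, spec_version_from_ref(&git_ref)) {
        (Some(v), _) => (v, "read from the SPEC_VERSION sidecar"),
        (None, Some(v)) => (v, "derived from the ref name"),
        (None, None) => {
            return Err(format!(
                "pinned {git_ref} ({why_ref}) but found no spec version; pass --target-version <V>"
            ))
        }
    };
    Ok(Pin {
        reason: format!("{why_ref}; spec version {why_version}"),
        git_ref,
        spec_version,
    })
}
```

For example, with only `v0.1.15` and `v0.1.16` upstream and neither containing `src/seps/`, the pin falls back to `main`. The spec version then comes from `main`'s sidecar if it has one.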
The fallback chain is intentionally explicit so a latest resolve that lands on main HEAD prints exactly what it pinned and why.
Exit codes
| Code | Meaning |
|---|---|
| 0 | Run completed; tier badge written. (For run, this does not mean the server passed, only that the run executed.) |
| 1 | Generic CLI error (bad args, IO failure, etc.). |
| 2 | Corpus missing for the requested --target-version and --auto-refresh was not set. |
| 3 | Network failure during refresh. |
| 4 | Server unreachable during run. |
run does not exit nonzero on a Fail tier verdict; callers that want CI to fail on a bad tier should parse the JSON output. This matches how mcptest compliance separates "run executed" from "score was good."
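The following sketch maps errors onto the exit-code table, assuming a single error enum (`ConformanceError` and `to_exit_code` are hypothetical names). A Fail tier is deliberately not an error variant, so a completed run still exits 0. One implementation caveat: clap's `Parser::parse()` exits with status 2 on usage errors, which would collide with the corpus-missing code. The binary therefore likely needs `Parser::try_parse()` with argument errors mapped to 1, while still letting `--help` and `--version` exit 0.

```rust
use std::fmt;
use std::process::ExitCode;

#[derive(Debug)]
enum ConformanceError {
    /// Bad arguments, IO failure, and anything else without a dedicated code.
    Cli(String),
    /// `--target-version` not in the corpus and `--auto-refresh` not set.
    CorpusMissing { version: String },
    /// Network failure during `refresh`.
    Network(String),
    /// The `--server` target could not be reached during `run`.
    ServerUnreachable(String),
}

impl ConformanceError {
    /// Exit code from the table above.
    fn code(&self) -> u8 {
        match self {
            ConformanceError::Cli(_) => 1,
            ConformanceError::CorpusMissing { .. } => 2,
            ConformanceError::Network(_) => 3,
            ConformanceError::ServerUnreachable(_) => 4,
        }
    }
}

impl fmt::Display for ConformanceError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConformanceError::Cli(msg) => write!(f, "{msg}"),
            ConformanceError::CorpusMissing { version } => write!(
                f,
                "no corpus for {version}; run `mcptest conformance refresh --target-version {version}` or pass --auto-refresh"
            ),
            ConformanceError::Network(msg) => write!(f, "network error during refresh: {msg}"),
            ConformanceError::ServerUnreachable(msg) => write!(f, "server unreachable: {msg}"),
        }
    }
}

/// Convert the command's outcome into a process exit status.
/// `Ok` covers every completed run, including one whose tier verdict is Fail.
fn to_exit_code(outcome: Result<(), ConformanceError>) -> ExitCode {
    match outcome {
        Ok(()) => ExitCode::SUCCESS,
        Err(err) => {
            eprintln!("error: {err}");
            ExitCode::from(err.code())
        }
    }
}
```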
Implementation notes
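- `crates/mcptest-core/seps/` is embedded at compile time with `include_dir`. The build embeds the directory verbatim; no codegen, no JSON conversion.
- Cache and config locations come from the `directories` crate (`ProjectDirs::from("sh", "mcptest", "mcptest")`). Its `cache_dir()` is `$XDG_CACHE_HOME/mcptest` (or `~/.cache/mcptest`) on Linux, but `~/Library/Caches/sh.mcptest.mcptest` on macOS and `%LOCALAPPDATA%\mcptest\mcptest\cache` on Windows. The Windows path in the resolution order above must therefore be reconciled with whichever location the implementation uses.
- The tarball transport reuses the existing `reqwest` client config (rustls, gzip, no native-tls). Untar via `tar` + `flate2`, both already in `mcptest-cassette`'s dependency tree.
- Version-selection HTTP calls go to the unauthenticated GitHub API and are subject to the 60-requests/hour anonymous rate limit. `refresh` should fail informatively (not retry-loop) on a rate-limit response (HTTP 403 or 429). A future enhancement could honor `GITHUB_TOKEN` from the environment if present.
- The bash script `scripts/refresh-conformance-corpus.sh` becomes a wrapper around `mcptest conformance refresh` once the subcommand lands, and can then be removed.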
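The following sketch shows the informative, non-retrying failure. It assumes the caller already has the response status and the raw `x-ratelimit-remaining` and `x-ratelimit-reset` header values. GitHub signals an exhausted primary rate limit with HTTP 403 or 429 and `x-ratelimit-remaining: 0`, and reports the reset time in Unix epoch seconds. Here, a 403 with quota remaining is treated as some other error. `rate_limit_error` and `auth_header` are hypothetical names; the latter illustrates the possible `GITHUB_TOKEN` enhancement only.

```rust
/// Authorization header value for the possible GITHUB_TOKEN enhancement.
/// Blank tokens are ignored so an empty env var behaves like an unset one.
fn auth_header(token: Option<&str>) -> Option<String> {
    token
        .map(str::trim)
        .filter(|t| !t.is_empty())
        .map(|t| format!("Bearer {t}"))
}

/// If the response is a rate-limit rejection, build one actionable message
/// (the caller fails with it instead of retrying). Otherwise return None.
fn rate_limit_error(status: u16, remaining: Option<&str>, reset: Option<&str>) -> Option<String> {
    let exhausted = remaining.map(str::trim) == Some("0");
    if !(status == 429 || (status == 403 && exhausted)) {
        return None;
    }
    let reset_note = reset
        .and_then(|r| r.trim().parse::<u64>().ok())
        .map(|epoch| format!(" (quota resets at Unix time {epoch})"))
        .unwrap_or_default();
    Some(format!(
        "GitHub API rate limit reached (HTTP {status}){reset_note}; \
         try again later or pin a ref with --ref to skip version selection"
    ))
}
```

A caller might pass `std::env::var("GITHUB_TOKEN").ok().as_deref()` to `auth_header`. Suggesting `--ref` in the message follows from the flag table, where `--ref` bypasses version selection.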
Open decisions for the implementation ticket
- Should `refresh` overwrite the in-repo `crates/mcptest-core/seps/` when run from a source clone, or always write to the user cache? Recommendation: always the cache, never the source tree. Maintainers updating the vendored copy use the bash script (or a future `mcptest internal vendor-corpus`).
- Should `run` accept multiple `--corpus-dir` entries to merge corpora from a fork plus upstream? Defer; one path keeps the resolution order trivial to reason about.
- Should the binary expose a `conformance check-ids` subcommand that lists every `check:` id the corpus references, plus whether mcptest implements it? Useful for tracking implementation coverage. Defer to a follow-up ticket.
Cross-references
- ../conformance-corpus/README.md documents the vendored corpus layout and what each file contains.
- conformance-tier.md documents the scoring library (`TierInput`, `Tier`, `score_tier`) that backs the badge.
- cli-reference.md is the user-facing CLI reference; the `conformance` subcommand folds in once this spec is implemented.