`mcptest conformance` CLI spec

Spec for the new mcptest conformance subcommand and its run and refresh operations. Implementation is still pending; this document is the contract.

Source of the corpus is documented in ../conformance-corpus/README.md; the scoring library is documented in conformance-tier.md.

Goals

An end user with a running MCP server can score it against the vendored SEP corpus in one command.
The default invocation works offline against a corpus that ships with the binary, so a cargo install mcptest user does not need the source repo or a network round trip to get a tier badge.
The user can refresh the corpus from upstream without re-cloning the source repo, by writing the latest SEPs into a user-owned cache directory.
The user can point at a specific corpus directory (their own fork, a vendored copy in their server's repo) without touching global state.

Subcommands

mcptest conformance run     [FLAGS]   # score a server against a corpus
mcptest conformance refresh [FLAGS]   # pull the latest SEPs into the cache

Why two subcommands and not one --refresh flag: run requires --server and writes a report; refresh does not need a server and only writes corpus files. Two subcommands keep each --help clean and let clap reject nonsense combinations at parse time.

`mcptest conformance run`

Flag	Required	Default	Purpose
`--server <URL>`	yes		MCP server to probe. Same shape as `mcptest run --server`.
`--target-version <V>`	no	latest available locally	Which spec revision's SEPs to score against. Resolved against the corpus directory chosen below.
`--corpus-dir <PATH>`	no	(see resolution order)	Override corpus location. When set, only this path is consulted; the cache / embedded fallback is skipped.
`--reporter <FORMAT>`	no	`pretty`	One of `pretty`, `json`, `markdown`, `html`. Mirrors `mcptest compliance`'s renderer enum.
`--out <PATH>`	no	stdout	Where to write the report. JSON renderer writes a `result.json` shape (below).
`--auto-refresh`	no	off	If `--target-version` is set but not present in the resolved corpus, run a `refresh` for that version before scoring. Off by default so a `run` never silently makes a network call.

Corpus resolution order, when --corpus-dir is not set:

$XDG_CACHE_HOME/mcptest/conformance/<spec-version>/ (or ~/.cache/mcptest/conformance/<spec-version>/ if unset, or %LOCALAPPDATA%\mcptest\conformance\<spec-version>\ on Windows).
The corpus baked into the binary via include_dir!. The build embeds whatever is in crates/mcptest-core/seps/<spec-version>/ at compile time, so a cargo install user gets the same corpus the source clone has.

If --target-version is omitted, run picks the lexicographically greatest version present in the resolved corpus (which sorts correctly for YYYY-MM-DD[-suffix]). If no version is present anywhere, run exits 2 with a message that points at mcptest conformance refresh.

Output shape (JSON renderer):

{
  "target_version": "2026-07-28-rc",
  "corpus_source": "embedded",
  "must":   { "passed": 42, "total": 42 },
  "should": { "passed": 18, "total": 20 },
  "may":    { "passed":  3, "total":  5 },
  "tier":   "tier-1",
  "badge":  "T1",
  "per_sep": [
    { "sep": 2106, "tested": 1, "passed": 1, "excluded": 4 },
    { "sep": 837,  "tested": 1, "passed": 0, "excluded": 4 }
  ]
}

corpus_source is "cache", "embedded", or "override" so a reader can tell whether the run used fresh or vendored data.

`mcptest conformance refresh`

Flag	Required	Default	Purpose
`--target-version <V>`	no	`latest`	Which spec revision to fetch. `latest` resolves to the most recent `<spec-version>` upstream advertises (see "Version selection" below).
`--corpus-dir <PATH>`	no	user cache	Destination. Default is the same XDG cache path `run` reads from.
`--url <URL>`	no	`https://github.com/modelcontextprotocol/conformance`	Upstream repo. Override for a fork or a mirror.
`--ref <TAG-OR-SHA>`	no	(see version selection)	Pin to a specific ref. Bypasses the version-selection logic.
`--source-path <PATH>`	no	`src/seps`	Subdirectory in the upstream tree to mirror.
`--dry-run`	no	off	Print what would be fetched and where, without writing.

Transport: HTTPS GET to https://codeload.github.com/<owner>/<repo>/tar.gz/<ref> via reqwest, untar the response in memory, extract files matching <source-path>/**/*.{yaml,yml,json}, and write them to <corpus-dir>/<spec-version>/. No git runtime dependency, works in minimal containers and on platforms without git on PATH.

Behavior on conflict: if files exist at the destination, refresh diffs and rewrites them, then prints an added / removed / unchanged summary (same shape as the bash script's summary today). Use --dry-run to preview without writing.

Version selection (`refresh --target-version latest`)

Resolving latest requires answering two questions:

Which upstream tag (or branch tip) should we pin to?
Which spec revision string do we file it under locally?

The selection algorithm:

Query the GitHub API for tags on --url. Filter to tags whose tree contains <source-path>. Pick the highest-versioned tag.
If no tag satisfies the filter, fall back to the default branch HEAD (today this is the case: src/seps/ postdates v0.1.16).
The local <spec-version> is read from a sidecar file in upstream (proposed: <source-path>/SPEC_VERSION) if present, otherwise derived from the ref name (a tag like spec-2026-07-28-rc maps to 2026-07-28-rc), otherwise the user must pass --target-version <V> explicitly.

The fallback chain is intentionally explicit so a latest resolve that lands on main HEAD prints exactly what it pinned and why.

Exit codes

Code	Meaning
0	Run completed; tier badge written. (For `run`, this does not mean the server passed, only that the run executed.)
1	Generic CLI error (bad args, IO failure, etc.).
2	Corpus missing for the requested `--target-version` and `--auto-refresh` was not set.
3	Network failure during `refresh`.
4	Server unreachable during `run`.

run does not exit nonzero on a Fail tier verdict; callers that want CI to fail on a bad tier should parse the JSON output. This matches how mcptest compliance separates "run executed" from "score was good."

Implementation notes

crates/mcptest-core/seps/ is embedded at compile time with include_dir. The build embeds the directory verbatim; no codegen, no JSON conversion.
Cache and config locations come from the directories crate (ProjectDirs::from("sh", "mcptest", "mcptest")).
The tarball transport reuses the existing reqwest client config (rustls, gzip, no native-tls). Untar via tar + flate2, both already in mcptest-cassette's dep tree.
Version-selection HTTP calls are unauthenticated GitHub API and subject to the 60-req/hr anonymous rate limit. refresh should fail informatively (not retry-loop) on 403. A future enhancement could honor GITHUB_TOKEN from the env if present.
The bash script scripts/refresh-conformance-corpus.sh becomes a wrapper around mcptest conformance refresh once the subcommand lands, then can be removed.

Open decisions for the implementation ticket

Should refresh overwrite the in-repo crates/mcptest-core/seps/ when run from a source clone, or always write to the user cache? Recommendation: always cache, never the source tree. Maintainers updating the vendored copy use the bash script (or a future mcptest internal vendor-corpus).
Should run accept multiple --corpus-dir entries to merge corpora from a fork plus upstream? Defer; one path keeps the resolution order trivial to reason about.
Should the binary expose a conformance check-ids subcommand that lists every check: id the corpus references plus whether mcptest implements it? Useful for tracking implementation coverage. Defer to a follow-up ticket.

Cross-references

../conformance-corpus/README.md documents the vendored corpus layout and what each file contains.
conformance-tier.md documents the scoring library (TierInput, Tier, score_tier) that backs the badge.
cli-reference.md is the user-facing CLI reference; the conformance subcommand folds in once this spec is implemented.

mcptest conformance CLI spec