mcptest docs GitHub

mcptest conformance CLI spec

Spec for the new mcptest conformance subcommand and its run and refresh operations. Implementation is still pending; this document is the contract.

Source of the corpus is documented in ../conformance-corpus/README.md; the scoring library is documented in conformance-tier.md.

Goals

  1. An end user with a running MCP server can score it against the vendored SEP corpus in one command.
  2. The default invocation works offline against a corpus that ships with the binary, so a cargo install mcptest user does not need the source repo or a network round trip to get a tier badge.
  3. The user can refresh the corpus from upstream without re-cloning the source repo, by writing the latest SEPs into a user-owned cache directory.
  4. The user can point at a specific corpus directory (their own fork, a vendored copy in their server's repo) without touching global state.

Subcommands

mcptest conformance run     [FLAGS]   # score a server against a corpus
mcptest conformance refresh [FLAGS]   # pull the latest SEPs into the cache

Why two subcommands and not one --refresh flag: run requires --server and writes a report; refresh does not need a server and only writes corpus files. Two subcommands keep each --help clean and let clap reject nonsense combinations at parse time.

mcptest conformance run

FlagRequiredDefaultPurpose
--server <URL>yesMCP server to probe. Same shape as mcptest run --server.
--target-version <V>nolatest available locallyWhich spec revision's SEPs to score against. Resolved against the corpus directory chosen below.
--corpus-dir <PATH>no(see resolution order)Override corpus location. When set, only this path is consulted; the cache / embedded fallback is skipped.
--reporter <FORMAT>noprettyOne of pretty, json, markdown, html. Mirrors mcptest compliance's renderer enum.
--out <PATH>nostdoutWhere to write the report. JSON renderer writes a result.json shape (below).
--auto-refreshnooffIf --target-version is set but not present in the resolved corpus, run a refresh for that version before scoring. Off by default so a run never silently makes a network call.

Corpus resolution order, when --corpus-dir is not set:

  1. $XDG_CACHE_HOME/mcptest/conformance/<spec-version>/ (or ~/.cache/mcptest/conformance/<spec-version>/ if unset, or %LOCALAPPDATA%\mcptest\conformance\<spec-version>\ on Windows).
  2. The corpus baked into the binary via include_dir!. The build embeds whatever is in crates/mcptest-core/seps/<spec-version>/ at compile time, so a cargo install user gets the same corpus the source clone has.

If --target-version is omitted, run picks the lexicographically greatest version present in the resolved corpus (which sorts correctly for YYYY-MM-DD[-suffix]). If no version is present anywhere, run exits 2 with a message that points at mcptest conformance refresh.

Output shape (JSON renderer):

{
  "target_version": "2026-07-28-rc",
  "corpus_source": "embedded",
  "must":   { "passed": 42, "total": 42 },
  "should": { "passed": 18, "total": 20 },
  "may":    { "passed":  3, "total":  5 },
  "tier":   "tier-1",
  "badge":  "T1",
  "per_sep": [
    { "sep": 2106, "tested": 1, "passed": 1, "excluded": 4 },
    { "sep": 837,  "tested": 1, "passed": 0, "excluded": 4 }
  ]
}

corpus_source is "cache", "embedded", or "override" so a reader can tell whether the run used fresh or vendored data.

mcptest conformance refresh

FlagRequiredDefaultPurpose
--target-version <V>nolatestWhich spec revision to fetch. latest resolves to the most recent <spec-version> upstream advertises (see "Version selection" below).
--corpus-dir <PATH>nouser cacheDestination. Default is the same XDG cache path run reads from.
--url <URL>nohttps://github.com/modelcontextprotocol/conformanceUpstream repo. Override for a fork or a mirror.
--ref <TAG-OR-SHA>no(see version selection)Pin to a specific ref. Bypasses the version-selection logic.
--source-path <PATH>nosrc/sepsSubdirectory in the upstream tree to mirror.
--dry-runnooffPrint what would be fetched and where, without writing.

Transport: HTTPS GET to https://codeload.github.com/<owner>/<repo>/tar.gz/<ref> via reqwest, untar the response in memory, extract files matching <source-path>/**/*.{yaml,yml,json}, and write them to <corpus-dir>/<spec-version>/. No git runtime dependency, works in minimal containers and on platforms without git on PATH.

Behavior on conflict: if files exist at the destination, refresh diffs and rewrites them, then prints an added / removed / unchanged summary (same shape as the bash script's summary today). Use --dry-run to preview without writing.

Version selection (refresh --target-version latest)

Resolving latest requires answering two questions:

  1. Which upstream tag (or branch tip) should we pin to?
  2. Which spec revision string do we file it under locally?

The selection algorithm:

The fallback chain is intentionally explicit so a latest resolve that lands on main HEAD prints exactly what it pinned and why.

Exit codes

CodeMeaning
0Run completed; tier badge written. (For run, this does not mean the server passed, only that the run executed.)
1Generic CLI error (bad args, IO failure, etc.).
2Corpus missing for the requested --target-version and --auto-refresh was not set.
3Network failure during refresh.
4Server unreachable during run.

run does not exit nonzero on a Fail tier verdict; callers that want CI to fail on a bad tier should parse the JSON output. This matches how mcptest compliance separates "run executed" from "score was good."

Implementation notes

Open decisions for the implementation ticket

  1. Should refresh overwrite the in-repo crates/mcptest-core/seps/ when run from a source clone, or always write to the user cache? Recommendation: always cache, never the source tree. Maintainers updating the vendored copy use the bash script (or a future mcptest internal vendor-corpus).
  2. Should run accept multiple --corpus-dir entries to merge corpora from a fork plus upstream? Defer; one path keeps the resolution order trivial to reason about.
  3. Should the binary expose a conformance check-ids subcommand that lists every check: id the corpus references plus whether mcptest implements it? Useful for tracking implementation coverage. Defer to a follow-up ticket.

Cross-references