
Scenario 8: catch schema drift

A consumer depends on an MCP server's tool catalog. The server owner ships a release, and a tool quietly disappears, an argument that used to be optional becomes required, or an enum value gets dropped. Each of those breaks callers the moment they hit the new surface, but none of them show up as a failed unit test on the server side. They are catalog changes, not behavior bugs.

mcptest diff is built for exactly this. You capture the catalog you depend on as a baseline (a saved tools/list snapshot), then diff a new catalog against it. The command classifies every change as breaking or non-breaking and sets its exit code so CI fails loudly on a regression.

This walkthrough uses the hosted test server at https://test.mcptest.sh. Its conformant endpoint is POST https://test.mcptest.sh/mcp. A second endpoint, POST https://test.mcptest.sh/mcp?catalog=v1, serves the prior catalog, so you can diff the two and watch the breaking changes surface.

Capture and diff

The diff command compares two saved tools/list JSON snapshots. The snapshot shape is a single object with a tools array, the same shape the server returns from tools/list. The committed pair examples/diff-tools-baseline.json and examples/diff-tools-current.json is a ready-made example you can diff with no network at all:

mcptest diff examples/diff-tools-baseline.json examples/diff-tools-current.json
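Because a malformed snapshot also produces a non-zero exit, it can help to sanity-check a file before committing it as a baseline. The following is a minimal sketch of such a check, assuming only the shape described above (one object with a tools array, each tool carrying a name). The function name load_snapshot is hypothetical and this is not how mcptest itself validates input.

```python
import json


def load_snapshot(text: str) -> dict:
    """Parse a tools/list snapshot and check the minimal shape (hypothetical helper)."""
    snapshot = json.loads(text)
    # The snapshot must be a single object holding a "tools" array.
    if not isinstance(snapshot, dict) or not isinstance(snapshot.get("tools"), list):
        raise ValueError("snapshot must be a JSON object with a 'tools' array")
    # Each tool entry is expected to be an object with a string name.
    for tool in snapshot["tools"]:
        if not isinstance(tool, dict) or not isinstance(tool.get("name"), str):
            raise ValueError("every tool entry needs a string 'name'")
    return snapshot


snapshot = load_snapshot('{"tools": [{"name": "search", "inputSchema": {"type": "object"}}]}')
print([tool["name"] for tool in snapshot["tools"]])
```

In practice you would pass the contents of a saved snapshot file to the helper rather than an inline string.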

For the hosted-server walkthrough, capture each endpoint's catalog into its own snapshot file. mcptest discover runs the handshake and tools/list for you; save the tools/list result for each endpoint as <name>.json. In this walkthrough, prior.json (captured from /mcp?catalog=v1) stands in for "the prior release" and current.json (captured from /mcp) stands in for "the current release".

A snapshot is just the tools/list object. A trimmed prior.json looks like:

{
  "tools": [
    { "name": "archive_item", "description": "Archive an item by id.",
      "inputSchema": { "type": "object",
        "properties": { "id": { "type": "string" } },
        "required": ["id"] } },
    { "name": "search", "description": "Search the catalog.",
      "inputSchema": { "type": "object",
        "properties": { "query": { "type": "string" } },
        "required": [] } },
    { "name": "get_forecast", "description": "Forecast for a city.",
      "inputSchema": { "type": "object",
        "properties": {
          "city": { "type": "string" },
          "units": { "type": "string", "enum": ["celsius", "fahrenheit", "kelvin"] }
        },
        "required": ["city"] } }
  ]
}

The matching current.json drops archive_item, moves query into search's required array, and removes kelvin from the units enum.
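To make the three kinds of breakage concrete, here is a rough Python sketch that detects them in two in-memory snapshots. It illustrates the idea only and is not mcptest's implementation: the helper names tool and breaking_changes are hypothetical, the message strings are invented for the example, and the current catalog is reconstructed from the description above.

```python
def tool(name, properties, required):
    """Build a trimmed tool entry (hypothetical helper, descriptions omitted)."""
    return {"name": name,
            "inputSchema": {"type": "object", "properties": properties, "required": required}}


def breaking_changes(old: dict, new: dict) -> list:
    """List removed tools, newly required args, and dropped enum values."""
    new_tools = {t["name"]: t for t in new["tools"]}
    found = []
    for old_tool in old["tools"]:
        name = old_tool["name"]
        if name not in new_tools:
            found.append(f"{name}: tool removed")
            continue
        old_schema = old_tool.get("inputSchema", {})
        new_schema = new_tools[name].get("inputSchema", {})
        # An argument that is required now but was not before breaks callers that omit it.
        old_required = set(old_schema.get("required", []))
        for arg in new_schema.get("required", []):
            if arg not in old_required:
                found.append(f"{name}.{arg}: now required")
        # An enum value that disappears breaks callers that still send it.
        new_props = new_schema.get("properties", {})
        for arg, old_prop in old_schema.get("properties", {}).items():
            new_enum = new_props.get(arg, {}).get("enum")
            if new_enum is None:
                continue  # no enum on the new side: nothing was narrowed
            for value in old_prop.get("enum", []):
                if value not in new_enum:
                    found.append(f"{name}.{arg}: enum value {value!r} removed")
    return found


string = {"type": "string"}
prior = {"tools": [
    tool("archive_item", {"id": string}, ["id"]),
    tool("search", {"query": string}, []),
    tool("get_forecast", {"city": string, "units": {"type": "string",
         "enum": ["celsius", "fahrenheit", "kelvin"]}}, ["city"]),
]}
current = {"tools": [
    tool("search", {"query": string}, ["query"]),
    tool("get_forecast", {"city": string, "units": {"type": "string",
         "enum": ["celsius", "fahrenheit"]}}, ["city"]),
]}

for change in breaking_changes(prior, current):
    print(change)
```

The sketch deliberately ignores non-breaking changes such as added tools or newly optional arguments; it only shows why each of the three edits above counts as a regression for existing callers.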

Diff the prior catalog (old) against the current catalog (new):

mcptest diff prior.json current.json

The first argument is the baseline (old), the second is the candidate (new). Order matters: the diff describes how to get from old to new, so passing them backwards reports an added tool and a relaxed argument instead of the breakage you are looking for.
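The direction dependence is easy to see with tool names alone. The sketch below is an illustration under simplifying assumptions (it compares names only, and tool_name_changes is a hypothetical helper, not part of mcptest): the same pair of catalogs yields a removal in one direction and an addition in the other.

```python
def tool_name_changes(old: dict, new: dict) -> dict:
    """Compare tool names only, reading the change as old -> new (hypothetical helper)."""
    old_names = {t["name"] for t in old["tools"]}
    new_names = {t["name"] for t in new["tools"]}
    return {"removed": sorted(old_names - new_names),
            "added": sorted(new_names - old_names)}


prior = {"tools": [{"name": "archive_item"}, {"name": "search"}, {"name": "get_forecast"}]}
current = {"tools": [{"name": "search"}, {"name": "get_forecast"}]}

# Correct order: baseline first, candidate second.
print(tool_name_changes(prior, current))  # {'removed': ['archive_item'], 'added': []}

# Swapped order: the removal now looks like a harmless addition.
print(tool_name_changes(current, prior))  # {'removed': [], 'added': ['archive_item']}
```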

To gate CI, leave --fail-on-breaking at its default (true) so any breaking change exits non-zero. Add --scorecard for a release letter grade, and pick a machine format with --format when a downstream tool consumes the output:

# CI gate: non-zero exit on any breaking change (the default).
mcptest diff prior.json current.json --fail-on-breaking true

# Advisory PR comment that never fails the job.
mcptest diff prior.json current.json --format markdown --fail-on-breaking false > pr-comment.md

# Append a release scorecard (A+ / A / B / C / D / F).
mcptest diff prior.json current.json --scorecard


Expected output

Diffing the prior catalog against the current one reports three breaking changes and exits non-zero:

$ mcptest diff prior.json current.json

Tool catalog diff: prior.json -> current.json

Tools removed (1):
  - archive_item (BREAKING)
      last seen with: args.id (string, required)

Tools changed (2):
  search
      args.query: optional -> required (BREAKING)
  get_forecast
      args.units: enum value `kelvin` removed (BREAKING)

Summary: 3 BREAKING, 0 NON-BREAKING.
Exit code: 1

With --scorecard appended, the diff gains a grade line. A removed tool grades the release F:

Release scorecard: F
  removed:   archive_item
  regressed: search (query now required), get_forecast (units enum narrowed)

The exit code is the load-bearing CI signal. 0 means no breaking changes were found, or that --fail-on-breaking false was set; 1 means at least one breaking change was found, or a snapshot file was missing or malformed. A CI step that runs mcptest diff against the committed baseline therefore fails the build the moment a breaking catalog change lands.
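If your pipeline is driven from a script rather than a bare shell step, the exit code can be propagated as follows. This is a minimal sketch, assuming mcptest is on PATH and using only the invocation shown above. The function name catalog_gate and its run parameter are hypothetical; run exists so the wrapper can be exercised without the real binary.

```python
import subprocess
import sys


def catalog_gate(baseline: str, candidate: str, run=subprocess.run) -> int:
    """Run `mcptest diff baseline candidate` and return its exit code unchanged."""
    result = run(["mcptest", "diff", baseline, candidate])
    if result.returncode == 0:
        print("catalog check passed: no breaking changes")
    else:
        # Non-zero covers both a breaking change and a missing or malformed snapshot.
        print("catalog check failed: see the diff output above", file=sys.stderr)
    return result.returncode


# In a CI script, hand the code back to the runner:
#     sys.exit(catalog_gate("prior.json", "current.json"))
```

Returning the code instead of exiting inside the function keeps the wrapper easy to test and leaves the decision to fail the job with the caller.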

Troubleshooting

See also