Drift watch

A third-party MCP server you depend on can change under you: a parameter is renamed, a tool is removed, or a call that used to return rows starts returning nothing. The schema still type-checks and the agent papers over the gap, so the regression ships silently. mcptest drift is a scheduled canary that catches both schema drift and behavioral drift against a committed baseline.

It builds on mcptest diff (semantic catalog drift with breaking-change classification) and adds the second half a watch needs: golden behavioral canaries. The comparison is deterministic and offline; only the live capture touches the network. The hosted scheduler and alerting dashboard stay in the enterprise tier; the OSS engine ships the diff, the canary capture, and the file/exit-code gate.

Drift state

A drift capture is one file bundling the watched contract:

schema: the server's tools/list result.
canaries: a set of golden tool calls and the responses they produced.

The baseline (recorded once against a healthy server) and the current capture (recorded on each scheduled run) are both drift-state files; drift check diffs the two.
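To make the two-part layout concrete, here is a minimal Python sketch of how a drift-state file could be modeled and loaded. The JSON field names (`schema`, `canaries`, `tool`, `args`, `response`) and the `load_state` helper are assumptions for illustration; the actual on-disk format written by mcptest may differ.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class CanaryResult:
    """One golden tool call and the response it produced (hypothetical layout)."""
    tool: str
    args: dict[str, Any]
    response: Any


@dataclass
class DriftState:
    """The watched contract: a tools/list snapshot plus canary results."""
    schema: dict[str, Any]
    canaries: list[CanaryResult] = field(default_factory=list)


def load_state(text: str) -> DriftState:
    """Parse a drift-state document from a JSON string."""
    raw = json.loads(text)
    return DriftState(
        schema=raw["schema"],
        canaries=[CanaryResult(**c) for c in raw["canaries"]],
    )
```

Because the baseline and the current capture share one shape, a comparison can load both with the same function and walk them side by side.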

Spec

drift capture and drift check --spec read a spec naming the server and the canary calls:

server:
  url: https://server.example/mcp        # or: command: [node, server.js]
  headers:
    Authorization: Bearer ${SEARCH_TOKEN}
canaries:
  - tool: search
    args: { query: widget }
  - tool: get_item
    args: { id: "1" }

CLI

mcptest drift capture <spec> -o <state.json>
mcptest drift check --baseline <state.json> (--current <state.json> | --spec <spec>) [--json]

capture connects, snapshots tools/list, runs the canaries, and writes a drift-state file.
check compares a current capture against the baseline. Pass --current for an offline file comparison (CI tier), or --spec to capture live first.
check exits non-zero on a breaking schema change or any behavioral drift, so a scheduled job fails with a classified diff.
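The gate rule above can be sketched in a few lines of Python. The `Finding` type and the specific exit code `1` are assumptions; the text only states that the exit code is non-zero.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """One classified difference (hypothetical type for illustration)."""
    kind: str        # "schema" or "behavior"
    breaking: bool   # only meaningful for schema findings
    message: str


def exit_code(findings: list[Finding]) -> int:
    """Return 1 if any finding should fail the gate, else 0.

    Any behavioral drift fails; schema drift fails only when breaking.
    """
    failing = [f for f in findings if f.kind == "behavior" or f.breaking]
    return 1 if failing else 0
```

Non-breaking schema findings still appear in the report; they just do not affect the exit code.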

A typical CI job:

mcptest drift check --baseline drift-baseline.json --spec drift-spec.yml

What counts as drift

Schema drift reuses the catalog diff classifier. A renamed or removed parameter, a tightened type, an enum value removal, or a removed tool is breaking and fails the gate. A new optional field, an added tool, or a reworded description is reported but does not fail.
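As an illustration of this classification, the following Python sketch compares two tools/list results and labels each difference as breaking or not. It is not the mcptest classifier. It assumes the standard MCP tool shape (`name`, `description`, `inputSchema` as JSON Schema), cannot tell a rename from a removal plus an addition, treats any type change as breaking rather than detecting tightening specifically, and treats a newly added required parameter as breaking (an assumption the text does not state).

```python
def index_tools(schema):
    """Map tool name -> tool definition for a tools/list result."""
    return {t["name"]: t for t in schema.get("tools", [])}


def classify(baseline, current):
    """Return a list of (breaking, message) tuples."""
    findings = []
    old, new = index_tools(baseline), index_tools(current)

    for name in sorted(old.keys() - new.keys()):
        findings.append((True, f"tool removed: {name}"))
    for name in sorted(new.keys() - old.keys()):
        findings.append((False, f"tool added: {name}"))

    for name in sorted(old.keys() & new.keys()):
        o, n = old[name], new[name]
        if o.get("description") != n.get("description"):
            findings.append((False, f"{name}: description changed"))

        o_in, n_in = o.get("inputSchema", {}), n.get("inputSchema", {})
        o_props = o_in.get("properties", {})
        n_props = n_in.get("properties", {})
        required = set(n_in.get("required", []))

        # A rename shows up as one removal plus one addition.
        for p in sorted(o_props.keys() - n_props.keys()):
            findings.append((True, f"{name}: parameter removed or renamed: {p}"))
        for p in sorted(n_props.keys() - o_props.keys()):
            # Assumption: a new required parameter breaks existing callers.
            findings.append((p in required, f"{name}: parameter added: {p}"))

        for p in sorted(o_props.keys() & n_props.keys()):
            if o_props[p].get("type") != n_props[p].get("type"):
                findings.append((True, f"{name}.{p}: type changed"))
            o_enum, n_enum = o_props[p].get("enum"), n_props[p].get("enum")
            if o_enum is not None and n_enum is not None:
                for v in o_enum:
                    if v not in n_enum:
                        findings.append((True, f"{name}.{p}: enum value removed: {v!r}"))
    return findings
```

A gate would fail when any tuple has `breaking` set to `True` and would print the rest as informational.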
Behavioral drift reduces each canary response to a value-tolerant fingerprint: a structure hash (keys, types, nesting) plus a size bucket (empty / small / medium / large). A shape change or an empty-result regression is flagged; concrete values such as ids and timestamps are ignored, so a healthy server does not drift just because its data changed. An errored canary contributes a normalized error signature instead.
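A value-tolerant fingerprint of this kind could look like the Python sketch below. The bucket thresholds (10 and 100 leaf values), the choice of SHA-256, and counting leaf values as the size measure are all assumptions for illustration; the error-signature path for failed canaries is not covered.

```python
import hashlib
import json


def shape(value):
    """Reduce a JSON value to its structure: keys, types, nesting."""
    if isinstance(value, dict):
        return {k: shape(v) for k, v in sorted(value.items())}
    if isinstance(value, list):
        # Keep only the distinct element shapes, so list length is ignored.
        shapes = {json.dumps(shape(v), sort_keys=True) for v in value}
        return ["list", sorted(shapes)]
    if isinstance(value, bool):  # check before int: bool is a subclass of int
        return "bool"
    if value is None:
        return "null"
    if isinstance(value, (int, float)):
        return "number"
    return "string"


def count_leaves(value):
    """Count scalar values anywhere in the response."""
    if isinstance(value, dict):
        return sum(count_leaves(v) for v in value.values())
    if isinstance(value, list):
        return sum(count_leaves(v) for v in value)
    return 1


def size_bucket(n):
    """Map a leaf count to a coarse bucket (thresholds are hypothetical)."""
    if n == 0:
        return "empty"
    if n <= 10:
        return "small"
    if n <= 100:
        return "medium"
    return "large"


def fingerprint(response):
    """Return (structure hash, size bucket) for one canary response."""
    canonical = json.dumps(shape(response), sort_keys=True)
    digest = hashlib.sha256(canonical.encode()).hexdigest()[:16]
    return digest, size_bucket(count_leaves(response))
```

In this sketch, two responses with the same keys and types but different ids or timestamps produce identical fingerprints, while a response whose row list comes back empty lands in the `empty` bucket and is flagged.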

See the runnable, offline drift-watch example.