Drift watch
A third-party MCP server you depend on can change under you: a parameter is renamed, a tool is removed, or a call that used to return rows starts returning nothing. The schema still type-checks and the agent papers over the gap, so the regression ships silently. mcptest drift is a scheduled canary that catches both schema drift and behavioral drift against a committed baseline.
It builds on mcptest diff (semantic catalog drift with breaking-change classification) and adds the second half a watch needs: golden behavioral canaries. The comparison is deterministic and offline; only the live capture touches the network. The hosted scheduler and alerting dashboard stay in the enterprise tier; the OSS engine ships the diff, the canary capture, and the file/exit-code gate.
Drift state
A drift capture is one file bundling the watched contract:
- schema: the server's tools/list result.
- canaries: a set of golden tool calls and the responses they produced.
Both a baseline (recorded once against a healthy server) and a current capture (each scheduled run) are drift-state files; drift check diffs the two.
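Because check compares two of these files, a stable serialization keeps the comparison deterministic and the committed baseline easy to review. The sketch below is a hypothetical model of such a file, not mcptest's actual on-disk format: the field names (schema, canaries, tool, args, response) and the JSON layout are assumptions for illustration.

```python
import json
from typing import Any


# Hypothetical drift-state layout; the real mcptest file format may differ.
def build_drift_state(tools_list_result: dict[str, Any],
                      canary_results: list[dict[str, Any]]) -> dict[str, Any]:
    """Bundle the watched contract: the tools/list result plus each canary call and its response."""
    return {
        "schema": tools_list_result,
        "canaries": [
            # Keep only the fields the comparison needs; drop anything else the runner attached.
            {"tool": c["tool"], "args": c["args"], "response": c["response"]}
            for c in canary_results
        ],
    }


def dump_drift_state(state: dict[str, Any]) -> str:
    # sort_keys plus a fixed indent give a stable, diff-friendly serialization,
    # so an unchanged server produces a byte-identical file.
    return json.dumps(state, sort_keys=True, indent=2) + "\n"
```

A baseline would then be written once with something like dump_drift_state(build_drift_state(tools, results)) and committed next to the spec.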
Spec
drift capture and drift check --spec read a spec naming the server and the canary calls:
```yaml
server:
  url: https://server.example/mcp   # or: command: [node, server.js]
  headers:
    Authorization: Bearer ${SEARCH_TOKEN}
canaries:
  - tool: search
    args: { query: widget }
  - tool: get_item
    args: { id: "1" }
```
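The spec refers to the bearer token as ${SEARCH_TOKEN} instead of inlining it, which suggests the value is substituted from the environment when the spec is loaded, so the secret never lands in the committed file. The following is a minimal sketch of that substitution under two assumptions: ${NAME} placeholders resolve from environment variables, and an unset variable should fail the run rather than silently send an empty header. The function name expand_env is hypothetical.

```python
import os
import re
from typing import Any, Mapping

# Matches ${NAME} placeholders; NAME follows the usual environment-variable naming rules.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: Any, env: Mapping[str, str] = os.environ) -> Any:
    """Recursively replace ${NAME} inside strings of a parsed spec (dicts, lists, scalars)."""
    if isinstance(value, str):
        def sub(m: re.Match) -> str:
            name = m.group(1)
            if name not in env:
                # Fail loudly: a missing token would look like an auth outage, not drift.
                raise KeyError(f"spec references unset environment variable {name}")
            return env[name]
        return _PLACEHOLDER.sub(sub, value)
    if isinstance(value, dict):
        return {k: expand_env(v, env) for k, v in value.items()}
    if isinstance(value, list):
        return [expand_env(v, env) for v in value]
    # Numbers, booleans and None pass through unchanged.
    return value
```

With a YAML parser such as PyYAML, this would be applied as expand_env(yaml.safe_load(spec_text)) before connecting to the server.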
CLI
```shell
mcptest drift capture <spec> -o <state.json>
mcptest drift check --baseline <state.json> (--current <state.json> | --spec <spec>) [--json]
```
- drift capture connects, snapshots tools/list, runs the canaries, and writes a drift-state file.
- drift check compares a current capture against the baseline. Pass --current for an offline file comparison (CI tier), or --spec to capture live first.
- drift check exits non-zero on a breaking schema change or any behavioral drift, so a scheduled job fails with a classified diff.
A typical CI job:
```shell
mcptest drift check --baseline drift-baseline.json --spec drift-spec.yml
```
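The gate policy described above fits in a few lines: fail on any breaking schema change or any behavioral drift, and report non-breaking schema changes without failing. The sketch below is a hypothetical model of that policy, not mcptest's code. The Finding type, its kind and severity fields, and the specific exit codes 0 and 1 are assumptions; the source only promises a non-zero exit on failure.

```python
import sys
from dataclasses import dataclass
from typing import Iterable, Literal


@dataclass(frozen=True)
class Finding:
    kind: Literal["schema", "behavior"]
    severity: Literal["breaking", "info"]
    message: str


def gate_exit_code(findings: Iterable[Finding]) -> int:
    """Return 0 if the watch passes, 1 if any finding should fail the scheduled job."""
    failed = False
    for f in findings:
        # Behavioral drift always fails; schema drift fails only when classified as breaking.
        if f.kind == "behavior" or f.severity == "breaking":
            failed = True
        # Report every finding, including non-breaking ones, so the job log shows the full diff.
        print(f"[{f.kind}/{f.severity}] {f.message}", file=sys.stderr)
    return 1 if failed else 0
```

Ending a CI step with sys.exit(gate_exit_code(findings)) turns the classification into the job's pass/fail status.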
What counts as drift
- Schema drift reuses the catalog diff classifier. A renamed or removed parameter, a tightened type, a removed enum value, or a removed tool is breaking and fails the gate. A new optional field, an added tool, or a reworded description is reported but does not fail.
- Behavioral drift reduces each canary response to a value-tolerant fingerprint: a structure hash (keys, types, nesting) plus a size bucket (empty / small / medium / large). A shape change or an empty-result regression is flagged; concrete values such as ids and timestamps are ignored, so a healthy server does not drift just because its data changed. An errored canary contributes a normalized error signature instead.
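To make the schema rules concrete, here is a minimal sketch that classifies the difference between two tools/list results, where each tool carries a name, a description, and a JSON Schema inputSchema as in the MCP specification. It is not the mcptest diff classifier, and it simplifies on purpose: there is no rename detection (a rename surfaces as a removed parameter plus an added one, which still fails), any type change is treated as a tightening, and flagging a newly required parameter as breaking is this sketch's own assumption.

```python
from typing import Any

Tool = dict[str, Any]


def _props(tool: Tool) -> tuple[dict[str, Any], set[str]]:
    """Return (properties, required names) from a tool's JSON Schema inputSchema."""
    schema = tool.get("inputSchema", {})
    return schema.get("properties", {}), set(schema.get("required", []))


def classify_schema_drift(baseline: list[Tool], current: list[Tool]) -> list[tuple[str, str]]:
    """Return (severity, message) pairs; severity is "breaking" or "info"."""
    out: list[tuple[str, str]] = []
    old_tools = {t["name"]: t for t in baseline}
    new_tools = {t["name"]: t for t in current}
    for name in sorted(old_tools.keys() - new_tools.keys()):
        out.append(("breaking", f"tool removed: {name}"))
    for name in sorted(new_tools.keys() - old_tools.keys()):
        out.append(("info", f"tool added: {name}"))
    for name in sorted(old_tools.keys() & new_tools.keys()):
        old, new = old_tools[name], new_tools[name]
        if old.get("description") != new.get("description"):
            out.append(("info", f"{name}: description changed"))
        old_props, old_req = _props(old)
        new_props, new_req = _props(new)
        for p in sorted(old_props.keys() - new_props.keys()):
            # A renamed parameter also lands here, as a removal of the old name.
            out.append(("breaking", f"{name}: parameter removed: {p}"))
        for p in sorted(new_props.keys() - old_props.keys()):
            severity = "breaking" if p in new_req else "info"
            out.append((severity, f"{name}: parameter added: {p}"))
        for p in sorted(old_props.keys() & new_props.keys()):
            o, n = old_props[p], new_props[p]
            if o.get("type") != n.get("type"):
                out.append(("breaking", f"{name}.{p}: type {o.get('type')} -> {n.get('type')}"))
            # If the new schema drops the enum entirely, it accepts more values, so nothing is removed.
            dropped = set(o.get("enum", [])) - set(n.get("enum", o.get("enum", [])))
            if dropped:
                out.append(("breaking", f"{name}.{p}: enum values removed: {sorted(dropped)}"))
            if p in new_req and p not in old_req:
                out.append(("breaking", f"{name}.{p}: now required"))
    return out
```

Comparing a catalog with itself yields an empty list, so a healthy server produces no schema findings at all.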
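The behavioral fingerprint can be sketched in a similar spirit. The code below hashes a response's shape (keys, types, and nesting, with every concrete value dropped), pairs that hash with a coarse size bucket, and reduces an error message to a signature by masking numbers and double-quoted strings. Everything specific here is an assumption for illustration, not mcptest's documented normalization: the function names, SHA-256 truncated to 16 hex characters, counting list elements as the measure of size, the bucket thresholds (0, up to 10, up to 100), and the masking rules.

```python
import hashlib
import json
import re
from typing import Any


def _shape(value: Any) -> Any:
    """Replace every concrete value with its type name, keeping keys and nesting."""
    if isinstance(value, dict):
        return {k: _shape(v) for k, v in sorted(value.items())}
    if isinstance(value, list):
        # Collapse a list to its distinct element shapes, so 2 rows and 30 rows of the
        # same kind hash identically; the row count goes into the size bucket instead.
        shapes = {json.dumps(_shape(v), sort_keys=True) for v in value}
        return ["list", sorted(shapes)]
    return type(value).__name__  # e.g. "str", "int", "bool", "NoneType"


def _count_items(value: Any) -> int:
    """Total number of list elements anywhere in the value (assumed proxy for result size)."""
    if isinstance(value, dict):
        return sum(_count_items(v) for v in value.values())
    if isinstance(value, list):
        return len(value) + sum(_count_items(v) for v in value)
    return 0


def size_bucket(n: int) -> str:
    if n == 0:
        return "empty"
    if n <= 10:
        return "small"
    if n <= 100:
        return "medium"
    return "large"


def fingerprint(response: Any) -> tuple[str, str]:
    """Value-tolerant fingerprint: (structure hash, size bucket)."""
    canonical = json.dumps(_shape(response), sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()[:16], size_bucket(_count_items(response))


def error_signature(message: str) -> str:
    """Mask volatile parts so 'item "abc" not found' and 'item "xyz" not found' match."""
    msg = re.sub(r'"[^"]*"', "<str>", message)
    msg = re.sub(r"\d+", "<n>", msg)
    return " ".join(msg.split()).lower()
```

Under these rules, a search canary that returns different rows of the same shape keeps its fingerprint, while one that starts returning an empty list, or returns ids as strings instead of integers, does not.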
See the runnable, offline drift-watch example.