Conformance invariants
A conformance invariant is a named property of a whole captured MCP session, not a shape match on one request/response pair. The compliance corpus expect DSL checks a single message at a time: it answers "does this tools/list response have the right shape." Some spec requirements span the session: the handshake has to come first, a server that answers tools/list must have advertised the tools capability, and a JSON-RPC error has to carry an integer code. mcptest lifts those into typed invariants, each a pure function over the captured exchange. The same capture always produces the same result, so the checks fit a CI gate.
Run this example. examples/official-conformance.yml runs the local conformance checks that these invariants back.
mcptest run --config examples/official-conformance.yml
Running them
The invariants run against a JSON capture file, so the command is deterministic and contacts no server:
mcptest compliance invariants --capture session.json
mcptest compliance invariants --capture session.json --format json
The command exits 0 when every invariant passes and no composition hazard is found, and 1 otherwise, so it gates CI directly.
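As a rough illustration of using that exit code as a gate, the sketch below wraps the command in a small Python helper. The helper names (build_command, gate) are hypothetical and not part of mcptest; only the subcommand and the --capture and --format flags come from the text above.

```python
import subprocess


def build_command(capture_path, fmt=None):
    """Build the argv for the invariants command shown above."""
    cmd = ["mcptest", "compliance", "invariants", "--capture", capture_path]
    if fmt is not None:
        cmd += ["--format", fmt]
    return cmd


def gate(capture_path):
    """Run the check and return its exit code.

    Per the text: 0 when every invariant passes and no composition
    hazard is found, 1 otherwise. Requires mcptest on PATH.
    """
    completed = subprocess.run(build_command(capture_path))
    return completed.returncode


# Typical use in a CI script (not executed here):
#   raise SystemExit(gate("session.json"))
```

Because the command reads only the capture file, a wrapper like this needs no network access or running server in CI.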
Capture format
A capture is a session object, or an array of session objects for the multi-server composition mode. Each session carries the server label, the negotiated capability block, and the ordered client exchanges:
{
  "server_label": "stdio://my-server",
  "server_capabilities": { "tools": {} },
  "exchanges": [
    {
      "request": { "jsonrpc": "2.0", "id": 1, "method": "initialize" },
      "response": { "jsonrpc": "2.0", "id": 1, "result": {} }
    },
    {
      "request": { "jsonrpc": "2.0", "id": 2, "method": "tools/list" },
      "response": { "jsonrpc": "2.0", "id": 2, "result": { "tools": [] } }
    }
  ]
}
A notification omits response. A session that never completed a handshake omits server_capabilities, which defaults to null. Recording the capture off a live server is the runner's job; this command is the scoring half.
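To make the format rules concrete, here is a minimal Python sketch of how a reader of the capture might normalize it. mcptest itself implements this in Rust; the function name load_sessions and the normalized dictionary layout are assumptions made for illustration, and only the field names come from the format described above.

```python
import json


def load_sessions(text):
    """Parse capture JSON and return a list of normalized session dicts."""
    data = json.loads(text)
    # A capture is one session object, or an array of them (composition mode).
    sessions = data if isinstance(data, list) else [data]
    normalized = []
    for session in sessions:
        normalized.append({
            "server_label": session.get("server_label"),
            # Omitted when no handshake completed; defaults to null (None).
            "server_capabilities": session.get("server_capabilities"),
            "exchanges": [
                # A notification omits "response", so it becomes None here.
                {"request": ex["request"], "response": ex.get("response")}
                for ex in session.get("exchanges", [])
            ],
        })
    return normalized


# A session with no handshake result recorded and one notification.
example = """
{
  "server_label": "stdio://my-server",
  "exchanges": [
    { "request": { "jsonrpc": "2.0", "method": "notifications/initialized" } }
  ]
}
"""
sessions = load_sessions(example)
```

Normalizing both shapes to a list up front lets the single-server and multi-server paths share the same per-session checks.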
The INV-NNN family
Invariants carry IDs in a dedicated INV-NNN family. The family is documented here, following the standardized rule-ID scheme. Like the SCHEMA-006, SEC, and DESC checks, invariants run in code rather than as compliance corpus rows, because each one reads the whole captured exchange, which the single request/response corpus assertions cannot express. Encoding them as Rust properties also keeps them out of the rule registry, so the rubric stats counts do not shift.
| ID | Category | Property |
|---|---|---|
| INV-001 | lifecycle | initialize is the first request the client sends. |
| INV-002 | lifecycle | The notifications/initialized notification follows the initialize response. |
| INV-003 | capability | A server that answers tools/list, resources/list, or prompts/list advertised the matching capability at initialize. |
| INV-004 | capability | A server that advertised a capability does not error on every call to it. |
| INV-005 | result-shape | A successful tools/call result carries a content array or a structuredContent object, and a boolean isError when present. |
| INV-006 | error-envelope | Every JSON-RPC error envelope carries an integer code and a string message. |
| INV-007 | error-envelope | A method-not-found error uses JSON-RPC code -32601. |
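The table rows can be read as pure functions over one session. The Python sketch below illustrates three of them (INV-001, INV-003, INV-006); it is not mcptest's Rust implementation, and the function names are hypothetical. Two interpretations are assumptions: a request is any message with an id, and a server "answers" a list method when the response carries a result.

```python
LIST_METHODS = {
    "tools/list": "tools",
    "resources/list": "resources",
    "prompts/list": "prompts",
}


def inv_001_initialize_first(session):
    """INV-001: initialize is the first request the client sends."""
    # Assumption: requests carry an "id"; notifications do not.
    requests = [ex["request"] for ex in session["exchanges"] if "id" in ex["request"]]
    # Assumption: a session with no requests at all fails the check.
    return bool(requests) and requests[0].get("method") == "initialize"


def inv_003_capability_advertised(session):
    """INV-003: an answered list method implies the matching capability."""
    caps = session.get("server_capabilities") or {}
    for ex in session["exchanges"]:
        cap = LIST_METHODS.get(ex["request"].get("method"))
        response = ex.get("response") or {}
        if cap and "result" in response and cap not in caps:
            return False
    return True


def inv_006_error_envelope(session):
    """INV-006: every error carries an integer code and a string message."""
    for ex in session["exchanges"]:
        error = (ex.get("response") or {}).get("error")
        if error is None:
            continue
        if not isinstance(error, dict):
            return False
        code = error.get("code")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(code, int) or isinstance(code, bool):
            return False
        if not isinstance(error.get("message"), str):
            return False
    return True


good = {
    "server_label": "stdio://my-server",
    "server_capabilities": {"tools": {}},
    "exchanges": [
        {"request": {"jsonrpc": "2.0", "id": 1, "method": "initialize"},
         "response": {"jsonrpc": "2.0", "id": 1, "result": {}}},
        {"request": {"jsonrpc": "2.0", "id": 2, "method": "tools/list"},
         "response": {"jsonrpc": "2.0", "id": 2, "result": {"tools": []}}},
    ],
}
bad = {
    "server_label": "stdio://other",
    "server_capabilities": None,
    "exchanges": [
        {"request": {"jsonrpc": "2.0", "id": 1, "method": "tools/list"},
         "response": {"jsonrpc": "2.0", "id": 1, "result": {"tools": []}}},
        {"request": {"jsonrpc": "2.0", "id": 2, "method": "nope"},
         "response": {"jsonrpc": "2.0", "id": 2,
                      "error": {"code": "-32601", "message": "not found"}}},
    ],
}
results = {name: (fn(good), fn(bad)) for name, fn in [
    ("INV-001", inv_001_initialize_first),
    ("INV-003", inv_003_capability_advertised),
    ("INV-006", inv_006_error_envelope),
]}
```

Each function depends only on its session argument, which is what makes the result repeatable for a given capture.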
Multi-server composition safety
When the capture holds two or more sessions, the command runs each server's invariants individually and then re-asserts the properties that can fail only when servers coexist behind one client. These hazards are not visible in any single capture:
- Tool-namespace overlap: two servers exposing the same unprefixed tool name. mcptest namespaces tools as <server>__tool, so an unprefixed collision means a client that forgot to namespace would route ambiguously. The finding names the colliding tool and the servers that share it.
- Shared-transport id collision: two servers reusing the same JSON-RPC request id. On one shared transport the client multiplexes both id spaces over a single channel, so a reused id cannot be correlated back to one request. Giving each server its own transport avoids this; the check flags the overlap so a shared-transport setup does not silently interfere.
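Both hazards reduce to finding a key that more than one server owns. The following Python sketch shows one way to compute them from a list of sessions. It is an illustration, not the command's Rust code: the function names are hypothetical, and it assumes tool names are read from the name field of each entry in a tools/list result.

```python
from collections import defaultdict


def tool_name_overlaps(sessions):
    """Map each tool name exposed by 2+ servers to the sorted server labels."""
    owners = defaultdict(set)
    for session in sessions:
        for ex in session["exchanges"]:
            if ex["request"].get("method") != "tools/list":
                continue
            result = (ex.get("response") or {}).get("result") or {}
            for tool in result.get("tools", []):
                owners[tool["name"]].add(session["server_label"])
    return {name: sorted(labels) for name, labels in owners.items() if len(labels) > 1}


def request_id_collisions(sessions):
    """Map each request id used by 2+ servers to the sorted server labels."""
    owners = defaultdict(set)
    for session in sessions:
        for ex in session["exchanges"]:
            # Notifications have no id and cannot collide.
            if "id" in ex["request"]:
                owners[ex["request"]["id"]].add(session["server_label"])
    return {rid: sorted(labels) for rid, labels in owners.items() if len(labels) > 1}


def _list_exchange(request_id, names):
    """Build a tools/list exchange for the example data below."""
    return {
        "request": {"jsonrpc": "2.0", "id": request_id, "method": "tools/list"},
        "response": {"jsonrpc": "2.0", "id": request_id,
                     "result": {"tools": [{"name": n} for n in names]}},
    }


sessions = [
    {"server_label": "stdio://a", "exchanges": [_list_exchange(1, ["search", "read"])]},
    {"server_label": "stdio://b", "exchanges": [_list_exchange(1, ["search", "write"])]},
]
overlaps = tool_name_overlaps(sessions)
collisions = request_id_collisions(sessions)
```

In this example data both servers expose search and both use request id 1, so each function reports one finding naming the two server labels.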
The composition mode stays per-run and single-developer. Fleet aggregation, governance, and dashboards are out of scope here.