Compliance baseline (compliance-baseline.yml)
mcptest borrows the expected-failures pattern from the @modelcontextprotocol/conformance suite. A baseline file is a list of compliance rule IDs your server is known to fail today. CI passes when only those rules fail and flips red when anything new fails or when a previously failing rule starts passing without the baseline being updated.
Run this example. examples/compliance-baseline.yml declares the rules a server is known to fail, in both the short and long forms. CI stays green while only those rules fail.
mcptest compliance run --from-suite tests/compliance.yaml --baseline examples/compliance-baseline.yml
This page explains the workflow. The matcher lives in mcptest-core::compliance::baseline; the CLI wiring for mcptest compliance --update-baseline is in flight and lands in a follow-up ticket.
Why baselines exist
Compliance suites grow faster than servers do. If you must pass every check on day one, you either delay shipping or vendor in a fork of the suite that quietly skips the failures. Neither is good. The baseline lets you adopt incrementally: declare what you do not pass yet, ship green CI, and chip away at the list. Each removal is a measurable improvement.
The pattern also catches drift. A baselined rule that suddenly passes is either real progress (great, remove it) or a bug that lets the check pass when it should not. CI calling out the stale entry forces a review either way.
When to use a baseline
Use one when:
- You are wiring mcptest compliance into CI for the first time and the suite reports failures you cannot fix today.
- A new compliance rule lands upstream and you need a grace period.
- You are refactoring an area that temporarily regresses a rule (rare; document the timeline in the entry's reason).
Do not use one when:
- The server passes every rule. Just delete the file. An empty baseline is a silent invitation for failures to accumulate.
- You disagree with a rule. Argue for the rule's removal upstream, do not hide it under your baseline.
- A single failure is flaky. Fix the flake; do not paper over it.
File format
compliance-baseline.yml lives next to your mcptest.yml. Two equivalent shapes are accepted in the same list so you can adopt the short form first and grow to the long form when you want metadata.
Long form:
# compliance-baseline.yml
server:
  - rule_id: LC-INIT-007
    reason: server.json field missing in a future release
    tracking: https://github.com/our-org/server/issues/42
  - rule_id: TOOLS-CALL-003
    reason: known race condition
    tracking: https://github.com/our-org/server/issues/421
Short form:
server:
  - LC-INIT-007
  - TOOLS-CALL-003
The short form expands to { rule_id: "...", reason: null, tracking: null }. Mixing is allowed, so you can record metadata for the rules where you have something to say and leave the rest bare.
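The expansion rule can be sketched in a few lines of Python. This is an illustration only, not the mcptest-core implementation: normalize_entry and BaselineEntry are hypothetical names, and the input is assumed to be the list under the server key after a YAML parser has already loaded it.

```python
from typing import Optional, TypedDict, Union


class BaselineEntry(TypedDict):
    rule_id: str
    reason: Optional[str]
    tracking: Optional[str]


def normalize_entry(raw: Union[str, dict]) -> BaselineEntry:
    """Expand a short-form string or a long-form mapping into the full shape."""
    if isinstance(raw, str):
        # Short form: only the rule ID is known.
        return {"rule_id": raw, "reason": None, "tracking": None}
    # Long form: missing metadata fields default to None.
    return {
        "rule_id": raw["rule_id"],
        "reason": raw.get("reason"),
        "tracking": raw.get("tracking"),
    }


# A mixed list: one bare ID and one entry that carries a reason.
mixed = [
    "LC-INIT-007",
    {"rule_id": "TOOLS-CALL-003", "reason": "known race condition"},
]
entries = [normalize_entry(item) for item in mixed]
```

After normalization every entry has the same three keys, so downstream code does not need to care which form the author wrote.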
Editors that read JSON Schema can point at the published schema for inline validation:
# yaml-language-server: $schema=https://mcptest.sh/schema/v1.json#/$defs/ComplianceBaseline
server:
  - LC-INIT-007
Exit code semantics
The runner walks every rule, decides per rule, then aggregates:
| Per-rule decision | Run-level effect |
|---|---|
| Normal pass | No effect, exit 0. |
| Expected failure | Counted, exit 0. |
| New regression | Flips CI red, exit 1. |
| Stale baseline entry | Flips CI red, exit 1. |
Concretely: the run exits 0 only when the set of failing rules matches the baseline exactly. Anything else is a signal worth surfacing.
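The table can be read as a two-input decision followed by an any-red aggregation. The Python sketch below illustrates that reading; it is not the matcher in mcptest-core. The names Decision, decide, and exit_code are hypothetical, and the sketch assumes every baselined rule actually ran and appears in the results.

```python
from enum import Enum
from typing import Dict, Set


class Decision(Enum):
    NORMAL_PASS = "normal pass"
    EXPECTED_FAILURE = "expected failure"
    NEW_REGRESSION = "new regression"
    STALE_ENTRY = "stale baseline entry"


def decide(passed: bool, in_baseline: bool) -> Decision:
    """Classify one rule from its outcome and its baseline membership."""
    if passed:
        return Decision.STALE_ENTRY if in_baseline else Decision.NORMAL_PASS
    return Decision.EXPECTED_FAILURE if in_baseline else Decision.NEW_REGRESSION


def exit_code(results: Dict[str, bool], baseline: Set[str]) -> int:
    """results maps rule ID -> passed. Any red decision makes the run exit 1."""
    red = {Decision.NEW_REGRESSION, Decision.STALE_ENTRY}
    decisions = [decide(passed, rule in baseline) for rule, passed in results.items()]
    return 1 if any(d in red for d in decisions) else 0
```

Under the stated assumption, exit_code returns 0 exactly when the set of failing rule IDs equals the baseline set.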
Regenerating the baseline
A future ticket wires mcptest compliance --update-baseline so the CLI can rewrite the file for you. The flow:
- Run mcptest compliance --update-baseline against your server.
- The runner replaces compliance-baseline.yml with the current set of failing rules in short form. Existing reason and tracking fields are not preserved (the runner does not know them).
- Review the diff. Promote entries to long form by hand and link them to tracker issues.
The library function update_baseline is already in place; the CLI hookup is tracked separately.
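To make the rewrite step concrete, here is a minimal Python sketch of what a short-form regeneration could produce. It is not the update_baseline library function, whose signature this page does not document; render_short_form is a hypothetical helper, and the sorting, de-duplication, and empty-list output are assumptions made so the resulting diff is stable and easy to review.

```python
from typing import Iterable


def render_short_form(failing_rule_ids: Iterable[str]) -> str:
    """Return the text of a short-form baseline for the given failing rules."""
    rule_ids = sorted(set(failing_rule_ids))  # assumption: stable, de-duplicated order
    if not rule_ids:
        # Assumption: an explicit empty list; the page recommends deleting
        # the file once the server passes every rule.
        return "server: []\n"
    lines = ["server:"]
    for rule_id in rule_ids:
        lines.append(f"  - {rule_id}")
    return "\n".join(lines) + "\n"


text = render_short_form(["TOOLS-CALL-003", "LC-INIT-007"])
```

Because the output carries only rule IDs, any reason or tracking metadata has to be restored by hand during the diff review, as described in the flow above.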
References
- The @modelcontextprotocol/conformance project, which inspired the pattern. Pin a specific commit when you reproduce its behavior so the contract is stable across upstream changes.