Scenario 4: compliance baseline
You just ran mcptest compliance against your server for the first time. Sixteen rules pass; four fail. You cannot fix all four today, and you do not want CI to flip red on every push until you do. The baseline file is the answer: declare the four failures as expected, ship green CI now, and chip away at the list.
The full pattern lives in docs/compliance-baseline.md. This scenario is the practical "what does this look like in my repo" walk-through.
The YAML
You have an existing tests/compliance.yml:
# yaml-language-server: $schema=https://mcptest.sh/schema/v1.json
servers:
  local:
    command: ["./target/debug/my-mcp-server"]
compliance:
  - name: "initialize handshake"
    server: local
    check: "initialize"
  - name: "tools list shape"
    server: local
    check: "tools/list"
  - name: "ping liveness"
    server: local
    check: "ping"
  - name: "resources/list shape"
    server: local
    check: "resources/list"
Save the baseline as tests/compliance-baseline.yml:
# yaml-language-server: $schema=https://mcptest.sh/schema/v1.json#/$defs/ComplianceBaseline
server:
  # Long form: documented reason plus a tracker link.
  - rule_id: RES-003
    reason: server has no resources surface yet
    tracking: https://github.com/our-org/server/issues/512
  # Short form: bare rule ID, no metadata yet.
  - PROTO-003
The baseline is a top-level server: list of the compliance rule IDs the server is known to fail. Each entry is one of two shapes, and you can mix them in the same list:
- Short form: a bare rule-ID string (PROTO-003).
- Long form: a mapping with rule_id (required), plus optional reason (why the rule is suppressed, surfaced in reporter output) and tracking (a link to the issue tracking the fix).
There is no per-entry test name and no expiry date. The list is keyed on the rule ID alone, so it stays small and reviewable.
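To make the two entry shapes concrete, here is a minimal Python sketch of how a mixed list could be normalized into a mapping keyed on rule ID. It illustrates the data shape only and is not mcptest's own parser; the function name normalize_baseline is hypothetical, and the entries are written as the plain Python values a YAML loader would typically produce for the example file.

```python
from typing import Any


def normalize_baseline(entries: list[Any]) -> dict[str, dict[str, str]]:
    """Map each baselined rule ID to its optional metadata (hypothetical helper)."""
    normalized: dict[str, dict[str, str]] = {}
    for entry in entries:
        if isinstance(entry, str):
            # Short form: a bare rule ID with no metadata.
            rule_id, meta = entry, {}
        elif isinstance(entry, dict) and "rule_id" in entry:
            # Long form: rule_id is required; reason and tracking are optional.
            rule_id = entry["rule_id"]
            meta = {k: entry[k] for k in ("reason", "tracking") if k in entry}
        else:
            raise ValueError(f"unrecognized baseline entry: {entry!r}")
        normalized[rule_id] = meta
    return normalized


# The `server:` list from the example baseline, as loaded values.
entries = [
    {
        "rule_id": "RES-003",
        "reason": "server has no resources surface yet",
        "tracking": "https://github.com/our-org/server/issues/512",
    },
    "PROTO-003",
]
baseline = normalize_baseline(entries)
```

Because the result is keyed on the rule ID alone, both shapes collapse to the same lookup; the long form only adds metadata for the reporter and for reviewers.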
How to run it
# drive the suite against the live server and apply the baseline
mcptest compliance run --from-suite tests/compliance.yml \
  --baseline tests/compliance-baseline.yml
--from-suite points at the compliance test file; --baseline points at the baseline file. Each rule ends up in one of four outcomes. Two keep the run green (exit 0): a rule that passes normally, and a baselined rule that still fails. Two turn it red (exit 1): a rule that fails without being baselined (a new regression), and a baselined rule that now passes (a stale entry).
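The four outcomes form a two-by-two table over "did the rule pass?" and "is the rule baselined?". The Python sketch below models that table and the resulting exit code. It is an illustration of the decision logic described above, not mcptest's implementation: the function names, the outcome labels, and the rule ID EXAMPLE-001 are hypothetical, and the sketch ignores baselined rules that the suite never exercises.

```python
def classify(rule_passed: bool, baselined: bool) -> str:
    """Return one of four outcome labels (labels are illustrative)."""
    if rule_passed:
        # A baselined rule that passes is a stale entry.
        return "stale" if baselined else "pass"
    # A failing rule is tolerated only when it is baselined.
    return "known" if baselined else "regression"


def exit_code(results: dict[str, bool], baseline: set[str]) -> int:
    """Exit 1 if any rule is a new regression or a stale baseline entry."""
    outcomes = {
        classify(passed, rule_id in baseline)
        for rule_id, passed in results.items()
    }
    return 1 if outcomes & {"regression", "stale"} else 0


baseline = {"PROTO-003", "RES-003"}

# EXAMPLE-001 is a hypothetical rule ID standing in for any passing rule.
green = exit_code(
    {"EXAMPLE-001": True, "PROTO-003": False, "RES-003": False}, baseline
)
red = exit_code(
    {"EXAMPLE-001": True, "PROTO-003": False, "RES-003": True}, baseline
)
```

In the first call both failures are baselined, so the run stays green. In the second, RES-003 passes while still listed in the baseline, so the stale entry turns the run red.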
Expected output
A clean run with two known failures suppressed:
mcptest compliance run --from-suite tests/compliance.yml --baseline tests/compliance-baseline.yml
PASS initialize handshake
PASS tools list shape
KNOWN PROTO-003 (expected failure, baselined)
KNOWN RES-003 (expected failure, baselined: server has no resources surface yet)
2 passed, 0 failed, 2 expected failures suppressed in 312ms
When a baselined rule starts passing (you fixed it), the reporter flags it as a stale entry and the run exits 1 so the baseline gets trimmed:
STALE RES-003 (was baselined, now passing)
2 passed, 0 failed, 1 stale baseline entry
ERROR: stale baseline entries. Remove fixed rule IDs from compliance-baseline.yml.
This is the design: a baselined rule that suddenly passes is either real progress (remove it) or a bug (investigate why it stopped failing). Either way, the baseline file gets a review.
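Trimming the baseline after a fix is mechanical once you know which rule IDs the reporter flagged as stale. As a rough sketch, assuming the baseline's server: list has already been loaded into Python values, a hypothetical helper (trim_stale is not part of mcptest) could drop the fixed entries while keeping both entry shapes and their original order:

```python
from typing import Any


def trim_stale(entries: list[Any], now_passing: set[str]) -> list[Any]:
    """Drop baseline entries whose rule now passes (hypothetical helper)."""

    def rule_id(entry: Any) -> str:
        # Short form is a bare string; long form is a mapping with rule_id.
        return entry if isinstance(entry, str) else entry["rule_id"]

    return [entry for entry in entries if rule_id(entry) not in now_passing]


entries = [
    {"rule_id": "RES-003", "reason": "server has no resources surface yet"},
    "PROTO-003",
]

# RES-003 was reported as stale, so only PROTO-003 should remain.
remaining = trim_stale(entries, {"RES-003"})
```

Doing the edit by hand in a commit is usually just as quick; the point is that removal is keyed on the rule ID only, so nothing else in the file needs to change.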
CI integration
- name: compliance
  run: mcptest compliance run --from-suite tests/compliance.yml --baseline tests/compliance-baseline.yml
That is it. The baseline file is checked in to your repository alongside the test file; CI uses whatever is on disk. When you fix a rule, the next CI run flips red until you remove the rule ID from the baseline. The cost of fixing is one extra commit, paid only once per fixed rule.
When to use a baseline vs. when to fix
A baseline is a triage tool, not a parking lot. Use one for:
- Rules you cannot fix this sprint but that have an owner and an ETA.
- New compliance rules that landed upstream and need a grace period.
- Temporary regressions during a refactor (document the timeline in reason).
Do not use one for:
- Rules you do not intend to ever fix. Either fix them, or tag the tests that exercise them and exclude them with --skip-tag.
- Failures whose cause you do not understand. A baseline that hides an unexplained failure is technical debt.
- An empty list. If your server passes every rule, delete the baseline file. An empty file is a silent invitation for failures to accumulate.
See also
- docs/compliance-baseline.md, the full pattern.
- Previous: Snapshot tests.
- Next: CI quality gate.