mcptest docs GitHub

Scenario 4: compliance baseline

You just ran mcptest compliance against your server for the first time. Sixteen rules pass; four fail. You cannot fix all four today, and you do not want CI to flip red on every push until you do. The baseline file is the answer: declare the four failures as expected, ship green CI now, and chip away at the list.

The full pattern lives in docs/compliance-baseline.md. This scenario is the practical "what does this look like in my repo" walk-through.

The YAML

You have an existing tests/compliance.yml:

# yaml-language-server: $schema=https://mcptest.sh/schema/v1.json

servers:
  local:
    command: ["./target/debug/my-mcp-server"]

compliance:
  - name: "initialize handshake"
    server: local
    check: "initialize"
  - name: "tools list shape"
    server: local
    check: "tools/list"
  - name: "ping liveness"
    server: local
    check: "ping"
  - name: "resources/list shape"
    server: local
    check: "resources/list"

Save the baseline as tests/compliance-baseline.yml:

# yaml-language-server: $schema=https://mcptest.sh/schema/v1.json#/$defs/ComplianceBaseline

server:
  # Long form: documented reason plus a tracker link.
  - rule_id: RES-003
    reason: server has no resources surface yet
    tracking: https://github.com/our-org/server/issues/512
  # Short form: bare rule ID, no metadata yet.
  - PROTO-003

The baseline is a top-level server: list of the compliance rule IDs the server is known to fail. Each entry is one of two shapes, and you can mix them in the same list:

There is no per-entry test name and no expiry date. The list is keyed on the rule ID alone, so it stays small and reviewable.

How to run it

# drive the suite against the live server and apply the baseline
mcptest compliance run --from-suite tests/compliance.yml \
  --baseline tests/compliance-baseline.yml

--from-suite points at the compliance test file; --baseline points at the baseline file. Outcomes classify against four decisions: a rule that passes normally and a baselined rule that still fails both exit 0; a rule that fails without being baselined (a new regression) and a baselined rule that now passes (a stale entry) both exit 1.

Expected output

A clean run with two known failures suppressed:

mcptest compliance run --from-suite tests/compliance.yml --baseline tests/compliance-baseline.yml

  PASS  initialize handshake
  PASS  tools list shape
  KNOWN PROTO-003                             (expected failure, baselined)
  KNOWN RES-003                               (expected failure, baselined: server has no resources surface yet)

2 passed, 0 failed, 2 expected failures suppressed in 312ms

When a baselined rule starts passing (you fixed it), the reporter flags it as a stale entry and the run exits 1 so the baseline gets trimmed:

  STALE RES-003                               (was baselined, now passing)

2 passed, 0 failed, 1 stale baseline entry
ERROR: stale baseline entries. Remove fixed rule IDs from compliance-baseline.yml.

This is the design: a baselined rule that suddenly passes is either real progress (remove it) or a bug (investigate why it stopped failing). Either way, the baseline file gets a review.

CI integration

- name: compliance
  run: mcptest compliance run --from-suite tests/compliance.yml --baseline tests/compliance-baseline.yml

That is it. The baseline file is checked in to your repository alongside the test file; CI uses whatever is on disk. When you fix a rule, the next CI run flips red until you remove the rule ID from the baseline. The cost of fixing is one extra commit, paid only once per fixed rule.

When to use a baseline vs. when to fix

A baseline is a triage tool, not a parking lot. Use one for:

Do not use one for:

See also