
SDK-tier scoring (SEP-2484)

The 2026-07-28 spec ties the publicly visible SDK tier to the conformance suite's MUST / SHOULD pass counts. mcptest aggregates a run's results into a single tier so a CI badge can summarize the run with a short code (T1, T2, T3, or F) and a percentage.

Tiers

| Tier | Badge | Rule |
| --- | --- | --- |
| Tier 1 (gold) | T1 | Every MUST check passes and at least 95 % of SHOULD checks pass. |
| Tier 2 (silver) | T2 | Every MUST check passes and at least 70 % of SHOULD checks pass. |
| Tier 3 (bronze) | T3 | Every MUST check passes, but fewer than 70 % of SHOULD checks pass. |
| Fail | F | At least one MUST check failed. No tier awarded. |

The thresholds match the spec table exactly. MAY checks affect neither the tier nor the percentage; they are reported for the operator and never gate the badge.

Library surface

mcptest_core::conformance::tier:

| Item | Purpose |
| --- | --- |
| TierInput { must_passed, must_total, should_passed, should_total } | Aggregated counts from a run. |
| Tier { Tier1, Tier2, Tier3, Fail } | Verdict. |
| score_tier(input) -> Tier | Pure function implementing the SEP-2484 rule. |
| Tier::badge() | Short letter form (T1 / T2 / T3 / F) for CI badges. |
| TIER1_SHOULD_THRESHOLD, TIER2_SHOULD_THRESHOLD | The two pass-rate boundaries as f64 constants. |
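
The surface above can be sketched as follows. This is a minimal illustration of the rule in the Tiers table, not the crate's actual source. Several details are assumptions: the u32 field types, the thresholds being stored as fractions (0.95 and 0.70) rather than percentages, the &'static str return type of badge(), and the treatment of a run with zero SHOULD checks as a full pass.

```rust
/// Aggregated counts from a run. Field types are assumed to be u32.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TierInput {
    pub must_passed: u32,
    pub must_total: u32,
    pub should_passed: u32,
    pub should_total: u32,
}

/// Verdict for a run.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    Tier1,
    Tier2,
    Tier3,
    Fail,
}

// ASSUMPTION: the boundaries are stored as fractions, not percentages.
pub const TIER1_SHOULD_THRESHOLD: f64 = 0.95;
pub const TIER2_SHOULD_THRESHOLD: f64 = 0.70;

/// Pure scoring function: any MUST failure means Fail; otherwise the
/// SHOULD pass rate selects the tier.
pub fn score_tier(input: TierInput) -> Tier {
    if input.must_passed < input.must_total {
        return Tier::Fail;
    }
    // ASSUMPTION: a run with no SHOULD checks counts as a 100 % pass rate.
    let should_rate = if input.should_total == 0 {
        1.0
    } else {
        f64::from(input.should_passed) / f64::from(input.should_total)
    };
    if should_rate >= TIER1_SHOULD_THRESHOLD {
        Tier::Tier1
    } else if should_rate >= TIER2_SHOULD_THRESHOLD {
        Tier::Tier2
    } else {
        Tier::Tier3
    }
}

impl Tier {
    /// Short form for CI badges. The return type is an assumption.
    pub fn badge(self) -> &'static str {
        match self {
            Tier::Tier1 => "T1",
            Tier::Tier2 => "T2",
            Tier::Tier3 => "T3",
            Tier::Fail => "F",
        }
    }
}
```

Under these assumptions, a run with 40 of 40 MUST checks and 42 of 60 SHOULD checks passing sits exactly on the 70 % boundary and scores as Tier 2 (badge T2). MAY counts do not appear in TierInput at all, which matches the statement that they never gate the badge.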

Edge cases

Vendored corpus

The MCP conformance suite scenarios live under conformance-corpus/ in this repo, not as a submodule. The README in that directory documents the refresh procedure: locate the upstream working-group repository, copy the scenarios for the target spec revision, update the upstream tag in upstream.txt, and open a pull request that lists added, removed, and changed scenarios so a reviewer can decide whether the change crosses an SDK-tier boundary.
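
The added / removed / changed listing that the refresh PR is expected to carry could be computed along these lines. This is a hypothetical sketch, not tooling that ships with the repo: CorpusDiff and diff_corpus are illustrative names, and it assumes each corpus snapshot has already been reduced to a map from scenario name to some content digest.

```rust
use std::collections::BTreeMap;

/// Hypothetical summary of a corpus refresh, one list per PR section.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct CorpusDiff {
    pub added: Vec<String>,
    pub removed: Vec<String>,
    pub changed: Vec<String>,
}

/// Compare two snapshots that map scenario name -> content digest.
/// BTreeMap iteration is ordered by key, so each list comes out sorted.
pub fn diff_corpus(
    old: &BTreeMap<String, String>,
    new: &BTreeMap<String, String>,
) -> CorpusDiff {
    let mut diff = CorpusDiff::default();
    for (name, digest) in new {
        match old.get(name) {
            // Present only in the new snapshot.
            None => diff.added.push(name.clone()),
            // Present in both, but the content digest differs.
            Some(prev) if prev != digest => diff.changed.push(name.clone()),
            // Present in both and identical: nothing to report.
            Some(_) => {}
        }
    }
    for name in old.keys() {
        // Present only in the old snapshot.
        if !new.contains_key(name) {
            diff.removed.push(name.clone());
        }
    }
    diff
}
```

Keeping the three lists separate lets a reviewer focus on removed or changed MUST scenarios, which are the ones most likely to move an SDK across a tier boundary.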

Once the upstream repository is identified, a weekly cron job will open the refresh PR automatically. Until then, the directory is intentionally empty.

Planned follow-up