mcptest documentation
mcptest runs YAML tests against any Model Context Protocol (MCP) server, in CI, on every commit. Start with the quickstart, or jump straight to a reference. These pages mirror the docs in the source repository.
mcptest
What mcptest is, why a green inspector is not a passing test, and how the pieces fit.
Getting started
Install the CLI, write your first test, and wire the quality gate into CI.
Scenarios
End-to-end recipes for the things teams actually test against MCP servers.
- Your first test
- Performance and token budgets
- Snapshot tests
- Compliance baseline
- Three-stage CI quality gate
- URL target against staging
- LLM-judge matcher preview
- Catch schema drift
- Scan for attacks
- Grade against the spec
- Test behind OAuth
- Multi-server suites
- Tool overload and selection under noise
- Rate limiting and backoff
- The migration doctor
- Record and replay with cassettes
Guides
Task-focused how-tos: CI integration, auth, caching, reporters, and more.
- CI integration
- Testing popular MCP servers
- Docker and package runners
- Multi-server suites
- Setup, teardown, and fixtures
- Native test-framework SDKs (YAML or in code)
- Test isolation
- URL targets
- Discovery (well-known/mcp.json)
- Authentication
- Auth: OAuth refresh
- Auth in tests
- Using the cache
- Cache eligibility
- OpenTelemetry tracing
- Schema diff
- Agent and model testing
- Model compatibility
- Rubric scoring
- LLM evals
- Jury consensus
- Judge calibration
- External scorers
- Troubleshooting
Reference
The exact YAML schema, CLI flags, exit codes, and output formats.
- YAML test format
- YAML configuration reference
- Suite composition
- Compositions (tool DAGs)
- Transforms
- CLI reference
- Pipelines and tool-call chaining
- Cassettes (record and replay)
- MCP server (mcptest mcp-server)
- Mock server (mcptest mock)
- Auto stub generation
- SARIF reporter
- GitLab Code Quality reporter
- Comparison-matrix reporter and model sweeps
- Compliance baseline
- Compliance grade
- Compliance score delta (CI gate)
- Spec-version pinning
- Stateless transport
- Conformance tier (SDK scorecard)
- Conformance CLI
- InputRequiredResult elicitation
- Auth-hardening conformance
- Extensions framework
- Subprocess plugin protocol
- Official conformance bridge
- Conformance invariants
- Structured output conformance
- Code-mode testing
- Multi-run pass^k tool selection
- Tool-selection F1 via equal-function sets
- Name-free discovery and orchestration diagnostics
- Distractor tools and tool-overload scoring
- Offline trace validation
- Within-session stability
- Narrative-vs-trace divergence
- Fault injection and recovery scoring
- Reliability reporting beyond pass^k
- Description quality scoring
- Scorer evidence oracle
- Model-compatibility baseline format
- Software Bill of Materials (mcptest sbom)
- Portable run evidence (mcptest evidence)
- Session ledger (mcptest ledger)
- Release process
- Verifying a published release
- Research references
Security
Scan tool definitions for prompt injection and toxic pairings, and wire findings into your scanners.