mcptest documentation
mcptest runs YAML tests against any Model Context Protocol server, in CI, on every commit. Start with the quickstart, or jump straight to a reference. These pages mirror the docs in the source repository.
mcptest
What mcptest is, why a green inspector is not a passing test, and how the pieces fit together.
Foundation
6 pages.
Your first test
20 pages.
- Quickstart
- Scenario corpus
- Your first test
- Performance and token budgets
- Snapshot tests
- Compliance baseline
- Three-stage CI quality gate
- URL target against staging
- LLM-judge matcher
- Catch schema drift
- Scan for attacks
- Grade against the spec
- Test behind OAuth
- Multi-server suites
- Tool overload and selection under noise
- Rate limiting and backoff
- The migration doctor
- Record and replay with cassettes
- One agent test across a model matrix
- The hosted test server
Agent integration
4 pages.
Guides by topic
36 pages.
- Authentication
- Authentication reference
- OAuth access token auto-refresh
- Auth in tests
- Auth-hardening conformance
- Headless auth for agents
- Compliance and conformance
- Compliance baseline
- Compliance grade
- Conformance CLI
- Conformance tier (SDK scorecard)
- Official conformance bridge
- CI integration
- One command, one audit
- Testing popular MCP servers
- Docker and package runners
- Multi-server suites
- Setup, teardown, and fixtures
- Native test-framework SDKs (YAML or in code)
- Test isolation
- URL targets
- Discovery (well-known/mcp.json)
- Using the cache
- Cache eligibility
- OpenTelemetry tracing
- Schema diff
- Drift watch
- Agent and model testing
- Scenario-world harness
- Model compatibility
- Rubric scoring
- LLM evals
- Jury consensus
- Judge calibration
- External scorers
- Troubleshooting
Robustness without golden outputs
9 pages.
Reference
The exact YAML schema, CLI flags, exit codes, and output formats.
- YAML test format
- YAML configuration reference
- Suite composition
- Compositions (tool DAGs)
- Transforms
- CLI reference
- Exit codes
- Pipelines and tool-call chaining
- Cassettes (record and replay)
- Record to test (mcptest record)
- Serve a cassette (mcptest serve)
- Mock server (mcptest mock)
- Auto stub generation
- SARIF reporter
- GitLab Code Quality reporter
- Observability and eval-platform exports
- Comparison-matrix reporter and model sweeps
- Compliance score delta (CI gate)
- Software Bill of Materials (mcptest sbom)
- Portable run evidence (mcptest evidence)
- Importing external benchmarks (mcptest import)
- Session ledger (mcptest ledger)
- Policy simulator (mcptest policy simulate)
Advanced and deep dives
26 pages.
- Spec-version pinning
- Stateless transport
- InputRequiredResult elicitation
- Extensions framework
- Subprocess plugin protocol
- Coprocess protocol (SDK wire contract)
- Conformance invariants
- Structured output conformance
- Code-mode testing
- Choosing an agent scoring method
- Multi-run pass^k tool selection
- Tool-selection F1 via equal-function sets
- Tool-surface token efficiency
- Name-free discovery and orchestration diagnostics
- Distractor tools and tool-overload scoring
- Offline trace validation
- Within-session stability
- Narrative-vs-trace divergence
- Fault injection and recovery scoring
- Reliability reporting beyond pass^k
- Description quality scoring
- Tool-description quality benchmark
- Scorer evidence oracle
- Model-compatibility baseline format
- Workspace layout (architecture)
- Announcing model compatibility (v1.1 post)
Security
Scan tool definitions for prompt injection and toxic pairings, and wire findings into your scanners.
- Running the security checks
- Security test catalog
- Security vulnerability report and OWASP coverage
- Secret redaction
- External-scanner supplement
- Red-team exploitability
- Red-team corpus
- Pentest gate and scorecard
- Advisory LLM judge
- Transport, auth, and local probes
- Trust-boundary conformance
- Web Bot Auth
- Web Bot Auth corpus
Project
3 pages.