What is mcptest

mcptest is an open-source CLI for testing Model Context Protocol (MCP) servers. It runs declarative YAML test files against any MCP server over stdio, streamable HTTP, or legacy SSE, and checks the whole surface: tool, resource, and prompt calls; the agent loop a real model drives; schema drift; spec compliance; and a deterministic security scan of your tool definitions. It runs on every commit, fails the build on a regression, and reports in seven formats. Single static binary, Apache-2.0 licensed, by Soap Bucket LLC.

This page is the canonical short definition that the rest of the site, the README, the GitHub repo description, package metadata, and the og:description meta tag all derive from. If the wording here changes, update those surfaces in the same commit.

In one sentence

mcptest tests Model Context Protocol servers in CI by running declarative YAML against any MCP server and checking the responses, the agent loop a real model drives, schema drift, spec compliance, and the security of your tool definitions, each behind a stable exit code.

In one paragraph

mcptest is the CLI for testing Model Context Protocol (MCP) servers. You write YAML test files that name a server, a tool, the arguments, and the responses you expect. mcptest connects over stdio, streamable HTTP, or legacy SSE, performs the MCP initialize handshake, and runs each test. The same tool drives a real model through the agent loop and asserts on the trace, diffs the tool catalog against a baseline to catch breaking drift, grades the server against a compliance corpus, and scans your tool definitions for prompt-injection and other deterministic security findings. It captures performance numbers and emits a report in any of seven formats (pretty, JSON, JUnit, Markdown, HTML, SARIF, GitLab Code Quality). Single binary, Apache-2.0, by Soap Bucket LLC.

What mcptest is

mcptest is:

A spec runner. Tests live in YAML, validated at load time by a published JSON Schema at https://mcptest.sh/schema/v1.json. Write once, run anywhere mcptest runs.
CI-grade. Exit codes are stable, reporters cover every format CI already understands, and a single run produces machine-readable artifacts CI can store. First-class integrations for GitHub Actions, GitLab CI, and CircleCI.
Deterministic. Cassette record and replay capture real protocol exchanges to JSON; the replay path uses the same normalization pass as the record path so a recording from today and one from next week diff cleanly.
Single-developer scope. Everything a single developer needs on their own machine ships in the open-source binary.
Apache-2.0. No commercial restrictions.
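
The "diff cleanly" property in the Deterministic item can be illustrated with a minimal sketch. This is not mcptest's actual normalization pass: the names normalize, to_cassette, and VOLATILE_KEYS, and the choice of which fields count as volatile, are assumptions made for illustration. The idea is only that both record and replay run the same function, so two recordings that differ in key order or timing serialize to identical JSON.

```python
import json

# Hypothetical list of fields that change from run to run.
# This is an assumption for illustration, not mcptest's real list.
VOLATILE_KEYS = {"timestamp", "duration_ms"}


def normalize(value):
    """Recursively drop volatile fields and order keys deterministically."""
    if isinstance(value, dict):
        return {
            key: normalize(inner)
            for key, inner in sorted(value.items())
            if key not in VOLATILE_KEYS
        }
    if isinstance(value, list):
        return [normalize(item) for item in value]
    return value


def to_cassette(exchanges) -> str:
    """Serialize recorded exchanges as stable, diff-friendly JSON text."""
    return json.dumps(normalize(exchanges), indent=2, sort_keys=True) + "\n"
```

Running the same normalize function on both paths is what keeps a diff between two recordings limited to real protocol changes.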

What mcptest is not

mcptest is not:

A debugger. Use your editor's MCP tooling for interactive inspection.
A fuzzer. Use a property-testing library for randomized input generation.
A load tester. Use a dedicated load tool (k6, wrk, vegeta) for throughput testing.
An MCP server. mcptest talks to servers; it does not implement the server side of the protocol, apart from a small fixture server bundled for smoke tests.
A linter for MCP server source code. mcptest validates runtime behavior over the wire, not the implementation language.

What mcptest replaces

Before mcptest, teams testing MCP servers wrote bespoke harnesses: shell scripts that piped JSON-RPC frames into a subprocess, ad-hoc HTTP clients that compared response bodies with jq and grep, and one-off Node or Python integration suites that no two teams shared. mcptest replaces all of that with one YAML schema, one binary, and seven reporters.
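
For a sense of what such a bespoke harness had to hand-roll, here is a minimal sketch of the first step only: building the MCP initialize request as a newline-delimited JSON-RPC 2.0 frame for the stdio transport, and checking the reply. The function names, the client name, and the chosen protocol revision are assumptions for illustration. This is not how mcptest is implemented.

```python
import json


def initialize_frame(request_id: int = 1) -> str:
    """Build one newline-delimited JSON-RPC 2.0 'initialize' request."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            # Assumption: use whichever protocol revision your server speaks.
            "protocolVersion": "2025-03-26",
            "capabilities": {},
            # Hypothetical client identity for this sketch.
            "clientInfo": {"name": "bespoke-harness", "version": "0.0.1"},
        },
    }
    # Compact separators keep the message on a single line, as stdio framing requires.
    return json.dumps(request, separators=(",", ":")) + "\n"


def check_initialize_response(line: str, request_id: int = 1) -> dict:
    """Parse one reply line and return its 'result', or raise on a mismatch."""
    reply = json.loads(line)
    if reply.get("jsonrpc") != "2.0" or reply.get("id") != request_id:
        raise ValueError("not a matching JSON-RPC 2.0 reply")
    if "error" in reply:
        raise ValueError(f"server returned an error: {reply['error']}")
    return reply["result"]
```

Every team that went this route also had to write its own process management, timeouts, assertions, and reporting on top of this.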

Key capabilities

Transports: stdio, streamable HTTP, legacy SSE.
Matchers: exact, contains, regex, subset, JSON schema, snapshot, the string matchers (contains-all, contains-any, icontains, starts-with, levenshtein), is-json, is-valid-tools-call, the LLM-backed llm-judge and llm-jury matchers, and the not combinator. Latency and token budgets are separate per-test fields, not matchers.
Cassettes: deterministic record/replay, content-addressed cache, diff-friendly JSON.
Auth: OAuth 2.1 + PKCE, bearer tokens, custom headers, environment redaction on every reporter.
Agent loop: drive a real model across one or more servers and assert on the trace (tool choice, arguments, tokens, latency), with a matrix across models and an llm-judge or llm-jury on the final answer.
Drift: mcptest diff compares the tool catalog and response shapes against a recorded baseline and classifies each change as breaking or not.
Compliance corpus: PROTO, SCHEMA, SEQ, TOOL, RES, EDGE categories, graded A+ through F.
Security: mcptest security runs deterministic checks over your tool, prompt, and resource definitions (description injection, hidden unicode, secrets, toxic-flow pairings, integrity drift against a baseline) and reports as SARIF.
Reporters: pretty, JSON, JUnit, Markdown, HTML, SARIF, GitLab Code Quality.
Baselines: an expected-failures file lets adoption proceed without blocking PRs on known-flaky tests.
Auto-discovery: detects servers configured in Claude Desktop, Cursor, and Claude Code, then scaffolds a starter suite.
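
The Drift capability above classifies each catalog change as breaking or not. A rough sketch of that idea follows, under stated assumptions: the function name classify_drift, the catalog shape (tool name mapped to its JSON Schema input schema), and the three rules used are illustrative choices, not mcptest's actual rule set or output format.

```python
def classify_drift(baseline: dict, current: dict) -> list:
    """Compare two tool catalogs and label each change.

    Each catalog maps a tool name to its input schema (a JSON Schema dict).
    Returns a list of (severity, description) tuples.
    """
    changes = []
    for name in sorted(baseline.keys() - current.keys()):
        changes.append(("breaking", f"tool removed: {name}"))
    for name in sorted(current.keys() - baseline.keys()):
        changes.append(("non-breaking", f"tool added: {name}"))

    for name in sorted(baseline.keys() & current.keys()):
        old, new = baseline[name], current[name]
        old_props = set(old.get("properties", {}))
        new_props = set(new.get("properties", {}))
        old_required = set(old.get("required", []))
        new_required = set(new.get("required", []))

        # Callers that still send a removed argument may be rejected.
        for arg in sorted(old_props - new_props):
            changes.append(("breaking", f"{name}: argument removed: {arg}"))
        # Callers that omit a newly required argument will fail.
        for arg in sorted(new_required - old_required):
            changes.append(("breaking", f"{name}: argument now required: {arg}"))
        # A new optional argument does not affect existing callers.
        for arg in sorted(new_props - old_props - new_required):
            changes.append(("non-breaking", f"{name}: optional argument added: {arg}"))
    return changes
```

A CI gate built on this kind of classification would fail the build only when at least one change is labeled breaking.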

Who builds mcptest

Soap Bucket LLC (two words, capital S and B) at soapbucket.com. Legal contact: legal@soapbucket.com. The open-source project is Apache-2.0 licensed and accepts contributions through GitHub at github.com/soapbucket/mcptest.

License

Apache-2.0. See LICENSE and NOTICE at the repo root.

Distribution

Homebrew: brew install soapbucket/tap/mcptest.
Install script: curl -sSL https://download.mcptest.sh/install.sh | sh.
Docker: soapbucket/mcptest:latest on Docker Hub.
Cargo: cargo install mcptest.
GitHub Actions: install with the script in a workflow step (a soapbucket/mcptest-action is staged as an example but not yet published).
Releases: signed tarballs at github.com/soapbucket/mcptest/releases.