
Input fuzzing

Status: implemented behind the preview schema flag. Tracked as epic WOR-1236 and child WOR-1238.

Most tests check the happy path: given good arguments, expect a good result. A server also has to survive bad arguments. The MCP runtime-fault taxonomies (A Taxonomy of Runtime Faults in MCP Servers, arXiv:2606.05339; Real Faults in MCP Software, arXiv:2603.05637) found parameter and type-validation faults to be a recurring failure mode: a type mismatch, a missing required field, or a malformed structure that the server does not handle cleanly. The fuzzer exercises that surface.

[Figure: a terminal session in which mcptest fuzz drives schema-derived malformed input at every tool of the built-in evil mock, eight seeded cases each, and every tool stays well-behaved.]

The fuzzer derives malformed argument cases from a tool's inputSchema, issues each one, and checks that the server fails cleanly. The case generator runs no model and is deterministic: the same schema and seed produce the same cases, so a fuzz run is reproducible in CI.
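As a rough illustration of schema-derived case generation, the sketch below builds the three fault kinds named earlier (a missing required field, a type mismatch, a malformed structure) from a JSON Schema and a set of base arguments. It is not mcptest's generator: `derive_cases`, `WRONG_TYPE`, and the case labels are hypothetical names chosen for this example, and the fallback for servers that advertise no schema is not covered.

```python
from typing import Any

# Hypothetical wrong-type replacements, keyed by the declared JSON Schema type.
WRONG_TYPE = {
    "string": 12345,
    "integer": "not-a-number",
    "number": "not-a-number",
    "boolean": "yes",
    "array": {"unexpected": "object"},
    "object": ["unexpected", "array"],
}


def derive_cases(schema: dict, base_args: dict) -> list[tuple[str, Any]]:
    """Enumerate malformed argument cases in a stable order (illustrative only)."""
    cases: list[tuple[str, Any]] = []

    # Missing required field: drop each required key in turn.
    for key in sorted(schema.get("required", [])):
        mutated = {k: v for k, v in base_args.items() if k != key}
        cases.append((f"missing:{key}", mutated))

    # Type mismatch: replace each typed property with a value of another type.
    for key, prop in sorted(schema.get("properties", {}).items()):
        declared = prop.get("type")
        if isinstance(declared, str) and declared in WRONG_TYPE:
            cases.append((f"wrong_type:{key}", {**base_args, key: WRONG_TYPE[declared]}))

    # Malformed structure: the arguments are not an object at all.
    cases.append(("malformed:not_an_object", ["not", "an", "object"]))
    return cases
```

Because the function uses no randomness and sorts the keys it iterates over, calling it twice with the same schema and base arguments returns the same list, which is the property that makes a run reproducible.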

The cases

From the schema (and, when no schema is advertised, from the shape of the base arguments) the generator builds:

The cases are enumerated in a stable order. When the enumeration yields more cases than the cases budget allows, a seed-dependent window is taken, so a smaller budget still varies with the seed.
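One way to picture the seed-dependent window is the sketch below. It assumes a contiguous, wrap-around window whose start offset is the seed modulo the number of cases. That offset rule and the name `select_window` are assumptions made for illustration, since the page does not say how the window is positioned.

```python
def select_window(cases: list, budget: int, seed: int) -> list:
    """Pick at most `budget` cases from a stably ordered list (illustrative only)."""
    if len(cases) <= budget:
        # Everything fits in the budget, so the seed does not matter.
        return list(cases)
    # Assumed rule: the seed picks the start offset, and the window wraps around.
    start = seed % len(cases)
    return [cases[(start + i) % len(cases)] for i in range(budget)]
```

With ten cases and a budget of three, seed 7 selects the cases at positions 7, 8, and 9, while seed 9 wraps around and selects positions 9, 0, and 1. The same seed always selects the same window.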

The oracle

The oracle is negative-path correctness, not a golden output. Each case is classified by how the call comes back:

Independently, a leak is flagged when an error response carries an internal detail (a stack trace, a source location, a secret-shaped token). The leak check is a conservative heuristic, so an ordinary "missing required argument" message does not trip it.
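A conservative leak heuristic of this kind might look like the following sketch. The patterns and the name `looks_like_leak` are assumptions for illustration, not the patterns mcptest ships. They cover one example of each detail named above: a stack trace, a source location, and a secret-shaped token.

```python
import re

# Hypothetical patterns; a real checker would likely carry a longer, tuned list.
LEAK_PATTERNS = [
    # Python stack trace header.
    re.compile(r"Traceback \(most recent call last\)"),
    # JavaScript stack frame, e.g. "    at handler (/srv/app/index.js:10:5)".
    re.compile(r"^\s+at \S+ \(.*:\d+:\d+\)", re.MULTILINE),
    # Source location of the form path/file.ext:line.
    re.compile(r"\b[\w./\\-]+\.(?:py|js|ts|go|rs):\d+\b"),
    # Secret-shaped token with a well-known prefix and a long alphanumeric tail.
    re.compile(r"\b(?:sk-|ghp_|AKIA)[A-Za-z0-9]{16,}\b"),
]


def looks_like_leak(message: str) -> bool:
    """Return True when an error message appears to expose an internal detail."""
    return any(pattern.search(message) for pattern in LEAK_PATTERNS)
```

Matching only specific, recognizable shapes is what keeps the check conservative: a plain validation message such as "missing required argument: query" matches none of the patterns.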

Assertable targets and the gate

The check exposes six targets. The names are exact.

| Target | Meaning |
| --- | --- |
| fuzz.cases_run | Total cases dispatched. |
| fuzz.crashes | Cases that crashed the server or dropped the transport. |
| fuzz.hangs | Cases that did not return within max_call_ms. |
| fuzz.protocol_violations | Cases that returned a malformed envelope. |
| fuzz.leaks | Cases whose error response leaked an internal detail. |
| fuzz.gate_passed | 1 when the report is clean, 0 otherwise. |

The default gate (no expect:) fails on any crash, hang, protocol violation, or leak. Write an explicit expect: to assert a target directly. The suite entry below has no expect:, so it runs under the default gate.

tools:
  - name: search survives a fuzz sweep
    server: api
    tool: search
    args: { query: "anthropic", limit: 10 }
    fuzz:
      seed: 1729
      cases: 64
      max_call_ms: 2000
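The relationship between the counters and the default gate can be sketched as below. The `FuzzReport` class is a hypothetical model written for this example, not mcptest's internal type. Its field names mirror the target names with the `fuzz.` prefix dropped.

```python
from dataclasses import dataclass


@dataclass
class FuzzReport:
    """Hypothetical tally of a fuzz run, one field per assertable counter."""
    cases_run: int = 0
    crashes: int = 0
    hangs: int = 0
    protocol_violations: int = 0
    leaks: int = 0

    @property
    def gate_passed(self) -> int:
        # Default gate: any crash, hang, protocol violation, or leak fails the run.
        clean = (
            self.crashes == 0
            and self.hangs == 0
            and self.protocol_violations == 0
            and self.leaks == 0
        )
        return 1 if clean else 0
```

For example, `FuzzReport(cases_run=64).gate_passed` is 1, while a report with a single leak yields 0 regardless of how many cases ran.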

The subcommand

To fuzz every tool a server exposes without writing a suite:

mcptest fuzz --server-command "node ./dist/server.js"
mcptest fuzz --server-url https://api.example.com/mcp --seed 7 --cases 128

The subcommand lists the server's tools, fuzzes each from its advertised schema, and exits non-zero if any tool crashes, hangs, violates the protocol, or leaks.

What it does not do

The fuzzer checks that bad input fails cleanly, not that good input produces a correct result. It will not find a logic bug that returns a wrong-but-well-formed answer. Pair it with ordinary assertion tests for correctness and with the metamorphic relations for the oracle-free middle ground.