Fuzzing, schema lint, and edge coverage
Three checks sit between the golden floor and the oracle-free hub: input fuzzing drives schema-derived malformed input and checks the server fails cleanly, the strict schema lint catches under-constrained schemas statically before they ever run, and tool-edge coverage gates an agent run against its declared tool edges. They are complementary: a tool that passes the lint is far less likely to crash under a fuzz sweep, and a restricted-edge attempt is also a security signal.
Input fuzzing
Status: implemented behind the preview schema flag.
Most tests check the happy path: given good arguments, expect a good result. A server also has to survive bad arguments. The MCP runtime-fault taxonomies (A Taxonomy of Runtime Faults in MCP Servers, arXiv:2606.05339; Real Faults in MCP Software, arXiv:2603.05637) found parameter and type-validation faults to be a recurring failure mode: a type mismatch, a missing required field, or a malformed structure that the server does not handle cleanly. The fuzzer exercises that surface.
It derives malformed argument cases from a tool's inputSchema, issues each one, and checks the server fails cleanly. The case generator runs no model and is deterministic: the same schema and seed produce the same cases, so a fuzz run is reproducible in CI.
In a suite
Add a fuzz: block to a tool entry in mcptest.yaml:
tools:
- name: search survives a fuzz sweep
server: api
tool: search
args: { query: "anthropic", limit: 10 }
fuzz:
seed: 1729
cases: 64
max_call_ms: 2000
Then run the whole suite, one report and one exit code:
mcptest run --config mcptest.yaml
No suite? Run it standalone
To fuzz every tool a server exposes without writing a suite:
mcptest fuzz --server-command "node ./dist/server.js"
mcptest fuzz --server-url https://api.example.com/mcp --seed 7 --cases 128
The subcommand lists the server's tools, fuzzes each from its advertised schema, and exits non-zero if any tool crashes, hangs, violates the protocol, or leaks.
The cases
From the schema (and, when no schema is advertised, from the shape of the base arguments) the generator builds:
- omit a required field, for each field the schema marks required.
- wrong type, setting a field to a value of a different type than declared.
- null, setting a field to null.
- oversize, a very long string or a very large array for a field of that type, and an extreme integer for a numeric field.
- structural, a non-object as the whole arguments value, an empty object, and an unexpected field when
additionalPropertiesis false.
The cases are enumerated in a stable order. When there are more than the cases budget, a seed-dependent window is taken, so a smaller budget still varies with the seed.
The oracle
The oracle is negative-path correctness, not a golden output. Each case is classified by how the call comes back:
- clean: a well-formed JSON-RPC error (the server rejected bad input) or a valid result (the server accepted it). Both are fine.
- crash: the transport dropped or the server died (a closed pipe, a panic).
- hang: the call did not return within
max_call_ms. - protocol_violation: the response was a malformed or unparseable JSON-RPC envelope.
Independently, a leak is flagged when an error response carries an internal detail (a stack trace, a source location, a secret-shaped token). The leak check is a conservative heuristic, so an ordinary "missing required argument" message does not trip it.
Assertable targets and the gate
The check exposes six targets. The names are exact.
| Target | Meaning |
|---|---|
fuzz.cases_run | Total cases dispatched. |
fuzz.crashes | Cases that crashed the server or dropped the transport. |
fuzz.hangs | Cases that did not return within max_call_ms. |
fuzz.protocol_violations | Cases that returned a malformed envelope. |
fuzz.leaks | Cases whose error response leaked an internal detail. |
fuzz.gate_passed | 1 when the report is clean, 0 otherwise. |
The default gate (no expect:) fails on any crash, hang, protocol violation, or leak. Write an explicit expect: to assert a target directly.
The fuzzer checks that bad input fails cleanly, not that good input produces a correct result. It will not find a logic bug that returns a wrong-but-well-formed answer. Pair it with ordinary assertion tests for correctness and with the metamorphic relations for the oracle-free middle ground.
Strict input-schema lint
Status: implemented.
An under-constrained inputSchema lets malformed input reach the server, which the runtime-fault taxonomy (Real Faults in MCP Software, arXiv:2603.05637) ties to a class of parameter and type-validation faults. The fuzzer finds these at runtime; the schema lint catches them statically, which is cheaper.
The rules
Each rule inspects one tool's inputSchema and carries a stable id.
| Rule | Severity | What it flags |
|---|---|---|
| SCH-001 | warning | the object declares properties but no required list, so the server cannot rely on any argument being present |
| SCH-002 | warning | additionalProperties is not false, so unexpected fields are accepted silently |
| SCH-003 | critical | a property declares neither type nor enum, so any value is accepted |
| SCH-004 | warning | a string property has no maxLength, or an array property has no maxItems, so input size is unbounded |
In a suite
The findings surface through the tool_quality: block as two assertable targets, alongside the existing description-quality targets:
schema_warnings: count of SCH findings at warning severity.schema_criticals: count of SCH findings at critical severity.
tool_quality:
- name: tool schemas are well constrained
server: local
expect:
- target: schema_criticals
matcher: { schema: { maximum: 0 } }
- target: schema_warnings
matcher: { schema: { maximum: 3 } }
These do not change the default tool_quality: gate; declare them explicitly to opt in. Run the whole suite with mcptest run --config mcptest.yaml.
No suite? Run it standalone
Run the lint and the autofix from the command line over a captured tools/list snapshot:
mcptest schema-lint tools.json # report findings, exit 1 if any
mcptest schema-lint tools.json --fix # print the tightened catalog
mcptest schema-lint tools.json --fix --write # tighten the snapshot in place
mcptest schema-lint is the standalone surface; the same lint also runs inside a suite's tool_quality: check. See the CLI reference for every flag. (This is distinct from mcptest lint, which scans suites for deprecated MCP features.)
The autofix
The lint ships a mechanical fix. Given an under-constrained schema it returns a tightened copy that sets additionalProperties: false and adds a required list of every declared property, applied recursively to nested object schemas. It deliberately does not invent a type, a maxLength, or a maxItems, since the right value is the author's to choose, so SCH-003 and SCH-004 stay findings rather than guesses. The examples/tool-schema-lint directory pins a loose schema and its tightened output together with a byte-for-byte test.
The lint is structural: it checks that a schema constrains its inputs, not that the constraints are semantically right. A maxLength of one million still passes SCH-004. Pair it with the fuzzer, which exercises the actual runtime handling the schema describes.
Tool-edge coverage
Status: implemented behind the preview schema flag.
End-to-end task success hides whether a declared access rule was actually exercised. An agent can pass its task and still have called a tool it was never supposed to touch, or never have exercised the tool you most wanted covered. Testing Agentic Workflows with Structural Coverage Criteria (Kahani, Bagherzadeh, 2026, arXiv:2605.26521) derives coverage obligations over the workflow's tool edges. The tool_edges: gate brings that to an agent test: it folds the run trace against a declared edge set into three deterministic numbers, with no model in the scoring.
The edges
- allowed: tools the run is expected to exercise.
edges.allowed_pctis the share that were called, 0 to 100. - restricted: tools the run must never call.
edges.restricted_attemptsis the count of calls to one, and any attempt fails the default gate. This is the safety edge. - delegation: declared
from -> toagent hand-offs.edges.delegation_pctis the share observed in the trace'sdelegationslist, for multi-agent runs.
In a suite
The gate lives on an agent entry in mcptest.yaml and exposes four targets, each usable in expect: with the standard matcher::
| Target | Meaning |
|---|---|
edges.allowed_pct | Percent of allowed edges exercised. |
edges.restricted_attempts | Count of calls to a restricted tool. |
edges.delegation_pct | Percent of delegation edges observed. |
edges.gate_passed | 1 when no restricted tool was called, 0 otherwise. |
agents:
- name: triage agent stays within its allowed tools
model: claude-sonnet-4-5
servers: [repo]
prompt: Find the open issues and summarize them.
tool_edges:
allowed: [search, summarize]
restricted: [delete_repo, force_push]
delegation: [{ from: planner, to: worker }]
expect:
- target: edges.restricted_attempts
matcher: { schema: { maximum: 0 } }
- target: edges.allowed_pct
matcher: { schema: { minimum: 80 } }
Omit expect: to apply the default gate, which fails on any call to a restricted tool (edges.restricted_attempts <= 0). A restricted-edge attempt is also a security signal: a destructive tool the agent was told to avoid but reached for anyway. Run the whole suite with mcptest run --config mcptest.yaml.
The gate checks that the run stayed inside its declared edges, not that the declared edges are the right ones. It is structural coverage, not correctness. Pair it with ordinary agent assertions on the final answer, and with the narrative-vs-trace check so the agent's story matches the calls the coverage counted.