Negative-path conformance
Status: implemented behind the preview schema flag. Tracked as epic WOR-1236 and child WOR-1239.
Most tests check that good input produces a good result. A robust server also has to reject bad input cleanly: a well-formed JSON-RPC error, not a silent acceptance, a crash, or a hang. The MCP runtime-fault taxonomies (A Taxonomy of Runtime Faults in MCP Servers, arXiv:2606.05339; Real Faults in MCP Software, arXiv:2603.05637) catalog the ways servers get this wrong. The negative_path: block runs a small, taxonomy-keyed probe set against a tool and gates on whether each probe was rejected.
The probes
Each probe maps to a fault-taxonomy id so a failure points back to the literature.
| Probe | Taxonomy id | Bad request | Contract |
|---|---|---|---|
| unknown_tool | FAULT-PROTO-UNKNOWN-METHOD (2606.05339) | call a tool that does not exist | a method-not-found-class error |
| missing_required | FAULT-SCHEMA-MISSING-REQUIRED (2603.05637) | omit a required argument | an invalid-params-class error |
| wrong_type | FAULT-SCHEMA-TYPE-MISMATCH (2603.05637) | send a wrong-typed argument | an error |
| extra_field | FAULT-SCHEMA-UNEXPECTED-FIELD (2603.05637) | send an unexpected field | rejection when additionalProperties is false |
| oversized | FAULT-INPUT-OVERSIZED (2606.05339) | send an oversized argument | an error or a result, never a hang |
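
To make the "Bad request" column concrete, here is a minimal sketch of how the five requests might be built from a known-good call. It assumes the standard MCP `tools/call` JSON-RPC method; the helper name `build_probe_requests`, the `__does_not_exist` suffix, the `__unexpected__` field, and the 1,000,000-character payload are all hypothetical choices for illustration, not what the runner necessarily sends.

```python
def build_probe_requests(tool: str, args: dict, required: str, request_id: int = 1) -> dict:
    """Return one illustrative JSON-RPC request per probe name.

    `required` names an argument the tool's schema marks as required
    (an assumption supplied by the caller, not discovered here).
    """
    def call(name: str, arguments: dict) -> dict:
        return {
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "tools/call",
            "params": {"name": name, "arguments": arguments},
        }

    # missing_required: drop the required argument entirely.
    missing = {k: v for k, v in args.items() if k != required}

    # wrong_type: swap a string for a number, or anything else for a string.
    wrong = dict(args)
    wrong[required] = 12345 if isinstance(args[required], str) else "not-the-expected-type"

    # extra_field: add a key the schema does not declare (hypothetical name).
    extra = dict(args)
    extra["__unexpected__"] = True

    # oversized: an arbitrary large payload; the size is an assumption.
    big = dict(args)
    big[required] = "x" * 1_000_000

    return {
        "unknown_tool": call(tool + "__does_not_exist", dict(args)),
        "missing_required": call(tool, missing),
        "wrong_type": call(tool, wrong),
        "extra_field": call(tool, extra),
        "oversized": call(tool, big),
    }
```

For the `search` tool used elsewhere on this page, `build_probe_requests("search", {"query": "anthropic"}, required="query")` would yield five requests that differ from the good call in exactly one respect each.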
A probe passes when the server rejects the request, either with a JSON-RPC error response or with a tool-level error result (isError: true). The oversized probe is softer, because accepting a large input is legitimate: it requires only that the call returns, with an error or a result, and does not hang or crash the server.
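
The pass rule can be sketched as a small predicate over the decoded reply. This is an illustration under stated assumptions, not the runner's code: the function name `probe_passed` is hypothetical, a reply of `None` stands in for a hang or a crash, and the sketch ignores the finer per-probe contracts in the table (for example, that extra_field only expects rejection when additionalProperties is false).

```python
from typing import Optional


def probe_passed(probe: str, response: Optional[dict]) -> bool:
    """Decide whether a probe met its contract.

    `response` is the decoded JSON-RPC reply, or None when the call
    timed out or the server died (an assumed convention for this sketch).
    """
    if response is None:
        # No reply at all: a hang or a crash fails every probe.
        return False

    if probe == "oversized":
        # Softer contract: any reply, error or result, is acceptable.
        return True

    # Protocol-level rejection: a JSON-RPC error object.
    if "error" in response:
        return True

    # Tool-level rejection: a result flagged with isError: true.
    result = response.get("result")
    return isinstance(result, dict) and result.get("isError") is True
```

Note that the predicate looks only at whether an error was returned, not at its code or message.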
Targets and the gate
| Target | Meaning |
|---|---|
| negative_path.checks_run | Number of probes that ran. |
| negative_path.failures | Number of probes that did not meet the contract. |
| negative_path.gate_passed | 1 when every probe passed, 0 otherwise. |
    tools:
      - name: search rejects bad requests
        server: api
        tool: search
        args: { query: "anthropic" }
        negative_path:
          checks: [unknown_tool, missing_required, wrong_type]
Omit checks: to run the full set. Omit expect: to apply the default gate, which fails if any probe did not meet its contract.
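
The three targets follow directly from the per-probe outcomes. A minimal sketch, assuming outcomes are collected as a mapping from probe name to a pass/fail boolean (the function name `summarize` and that input shape are hypothetical):

```python
def summarize(outcomes: dict) -> dict:
    """Derive the negative_path targets from {probe_name: passed} outcomes."""
    failures = sum(1 for passed in outcomes.values() if not passed)
    return {
        "negative_path.checks_run": len(outcomes),
        "negative_path.failures": failures,
        # The default gate: 1 only when no probe failed.
        "negative_path.gate_passed": 1 if failures == 0 else 0,
    }
```

With the three checks from the example above, one failing wrong_type probe would give checks_run of 3, failures of 1, and gate_passed of 0.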
A note on lenient servers
Many servers do not validate argument types, so the wrong_type and extra_field probes will report a finding against them. That is the point: a server that accepts a wrong-typed argument is the type-validation fault the taxonomy describes. Run those probes against a server you expect to validate, and select the universal unknown_tool and missing_required probes for one you do not.
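
That selection rule could be captured in a helper like the one below when generating configs for many servers. It is only a sketch: `select_checks` and the `validates_arguments` flag are hypothetical names, and whether a server validates is something you decide, not something the sketch detects.

```python
FULL_SET = ["unknown_tool", "missing_required", "wrong_type", "extra_field", "oversized"]
UNIVERSAL = ["unknown_tool", "missing_required"]


def select_checks(validates_arguments: bool) -> list:
    """Pick the value for checks: based on what the server is expected to do."""
    # Servers expected to validate get the full probe set;
    # lenient servers get only the probes every server should pass.
    return list(FULL_SET) if validates_arguments else list(UNIVERSAL)
```

The returned list is what would go under checks: in the negative_path: block for that server.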
What it does not do
The probes check the error contract, not the error message wording or the exact code. Pair them with the input fuzzer, which sweeps a broader space of malformed input and checks the same crash-and-hang safety, and with ordinary assertion tests for the happy path.