Test isolation: restart policy and parallelism
Status: the policy library and the ServerPool API surface ship today; the per-test loop integration is the next step. The schema accepts every value, so configs that opt in to per_file or per_test today are valid and will not need editing once the runtime ships the respawn loop. Until then, those values parse correctly but the runner treats them as per_run (no error, no actual respawn). The parallelism knob works today.
mcptest run today spawns the server once, runs every test against the same process, and tears it down at the end. That default is right for stateless servers. For stateful servers (databases, queues, in-memory state machines) it leads to state leaks between tests and parallel-worker interference. The run_options: block adds explicit restart and parallelism controls.
The library that classifies each test boundary lives in mcptest_core::runner::restart: restart_action(policy, boundary) takes a RestartPolicy plus a Boundary (RunStart, BetweenTestsInFile, or BetweenFiles) and returns Keep or Restart. mcptest_core::connector::ServerPool::shutdown_one(name) tears down a single server when the action says restart; a subsequent connect_server + insert brings it back up. The runner integration calls these in the per-test loop.
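The classification described above can be sketched as a small decision table. This is a minimal sketch, not the library's source: the variant names PerRun, PerFile, and PerTest and the RestartAction type name are assumptions made for illustration, and the choice to return Keep at RunStart (on the assumption that the initial spawn happens elsewhere) is also a guess.

```rust
// Hypothetical sketch of the boundary classification. Only the Boundary
// variants and the Keep/Restart outcomes are named in the text; the
// RestartPolicy variant names and the RestartAction type name are assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RestartPolicy {
    PerRun,
    PerFile,
    PerTest,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Boundary {
    RunStart,
    BetweenTestsInFile,
    BetweenFiles,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RestartAction {
    Keep,
    Restart,
}

pub fn restart_action(policy: RestartPolicy, boundary: Boundary) -> RestartAction {
    match (policy, boundary) {
        // Assumption: the initial spawn is handled elsewhere, so there is
        // nothing to restart at the very start of a run.
        (_, Boundary::RunStart) => RestartAction::Keep,
        // per_run never restarts once the server is up.
        (RestartPolicy::PerRun, _) => RestartAction::Keep,
        // per_file restarts only when crossing a file boundary.
        (RestartPolicy::PerFile, Boundary::BetweenFiles) => RestartAction::Restart,
        (RestartPolicy::PerFile, Boundary::BetweenTestsInFile) => RestartAction::Keep,
        // per_test restarts at every boundary between two tests.
        (RestartPolicy::PerTest, Boundary::BetweenTestsInFile)
        | (RestartPolicy::PerTest, Boundary::BetweenFiles) => RestartAction::Restart,
    }
}
```

Keeping the decision a pure function of (policy, boundary) means it can be unit-tested without spawning any server.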
At a glance
run_options:
  restart_policy: per_run   # default; spawn once per `mcptest run`
  parallel: auto            # default; respect CPU count
| Setting | Values | Default | Effect |
|---|---|---|---|
| restart_policy | per_run, per_file, per_test | per_run | When to respawn (or reconnect to) the server. |
| parallel | auto, false, positive integer | auto | Parallelism. false serializes; an integer pins the worker count. |
Worked example: stateful server, per-test restart
A server keeps an in-memory cache that survives between tests. Earlier tests leave entries that confuse later ones. Force a fresh process per test and serial execution to prevent worker interference:
# yaml-language-server: $schema=https://mcptest.sh/schema/v1.json
run_options:
  restart_policy: per_test
  parallel: false
servers:
  cache:
    command: ["./bin/cache-mcp"]
tools:
  - name: "first put goes in clean"
    server: cache
    tool: put
    args: { key: "a", value: "1" }
  - name: "second put goes in clean"
    server: cache
    tool: put
    args: { key: "a", value: "2" }
    expect:
      - target: "result.content[0].existing"
        matcher:
          exact: false  # would be true if the cache leaked between tests
The two tests both expect a fresh process. With per_test, the runner respawns the binary between each test; with parallel: false, no two tests collide on shared state.
Restart policy details
per_run (default)
Spawn once at the start of the run, run every test, and tear down at the end. This is today's behavior. Right for stateless servers and idempotent operations.
per_file
Spawn before each test file and tear down after the file. Use when state leaks between files but is fine inside a file. Composes with the setup: / teardown: blocks at file level.
per_test
Spawn before each test and tear down after each test. Slowest but maximally isolated. Composes with the setup_per_test: block.
URL targets
For URL targets the runner cannot restart a server it did not spawn. Instead, per_file and per_test control the connection lifecycle: the runner disconnects, reconnects, and optionally re-runs the readiness probe before the next file or test.
The docs call this out so operators do not file bugs about a "not restarted" remote server: it is a deliberate limit of the URL transport.
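The stdio versus URL distinction can be sketched as a mapping from target kind to isolation step. All names here (TargetKind, IsolationStep, isolation_step) are hypothetical and chosen for illustration; the runner's real types are not shown in this document.

```rust
// Hypothetical sketch: which isolation step applies to which target kind.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TargetKind {
    /// A server process the runner spawned itself.
    Stdio,
    /// A remote server reached by URL; the runner does not own the process.
    Url,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IsolationStep {
    /// Leave the server and connection untouched.
    Keep,
    /// Kill and respawn the child process.
    RespawnProcess,
    /// Drop and re-establish the connection, optionally probing readiness.
    Reconnect { rerun_readiness_probe: bool },
}

pub fn isolation_step(target: TargetKind, restart: bool, probe: bool) -> IsolationStep {
    match (restart, target) {
        (false, _) => IsolationStep::Keep,
        (true, TargetKind::Stdio) => IsolationStep::RespawnProcess,
        // The remote process keeps running; only the connection is recycled.
        (true, TargetKind::Url) => IsolationStep::Reconnect {
            rerun_readiness_probe: probe,
        },
    }
}
```

Under this sketch, any state held by a remote server survives a per_test boundary, which is exactly the limit described above.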
Parallelism details
auto
Pick a worker count based on the CPU count. The current default.
false
Force serial execution. Safest for stateful servers, slowest for stateless ones. Pair with per_test or per_file when state leaks are catastrophic.
Positive integer
Pin the worker count. Useful when CI builds run on a host with a fixed budget (for example, a 4-vCPU runner where auto overshoots).
Why no parallel: true?
Operators historically write parallel: true to mean "default worker count," which is ambiguous: is that one worker, all cores, or something else? auto is the explicit name for the default. The loader rejects parallel: true with a hint pointing at auto.
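The accepted values and the rejection of parallel: true can be sketched as a small parser plus a worker-count resolver. This is an illustrative sketch only: the Parallel type, parse_parallel, worker_count, and the error wording are assumptions, and it parses a plain string rather than a real YAML node. The auto case uses std::thread::available_parallelism from the standard library as one plausible way to read the CPU count.

```rust
// Hypothetical sketch of the `parallel` setting; names and messages are assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Parallel {
    Auto,
    Serial,
    Workers(usize),
}

pub fn parse_parallel(raw: &str) -> Result<Parallel, String> {
    match raw.trim() {
        "auto" => Ok(Parallel::Auto),
        "false" => Ok(Parallel::Serial),
        // `true` is rejected with a hint instead of being silently mapped.
        "true" => Err(
            "`parallel: true` is ambiguous; use `parallel: auto` for the default worker count"
                .to_string(),
        ),
        other => match other.parse::<usize>() {
            Ok(n) if n > 0 => Ok(Parallel::Workers(n)),
            _ => Err(format!(
                "invalid `parallel` value `{}`; expected `auto`, `false`, or a positive integer",
                other
            )),
        },
    }
}

pub fn worker_count(setting: Parallel) -> usize {
    match setting {
        // Fall back to one worker if the CPU count cannot be determined.
        Parallel::Auto => std::thread::available_parallelism()
            .map(|n| n.get())
            .unwrap_or(1),
        Parallel::Serial => 1,
        Parallel::Workers(n) => n,
    }
}
```

Modeling the setting as a three-variant enum rather than a boolean plus a count keeps the "no parallel: true" rule visible in the type itself.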
Project vs file overrides
run_options: at file level overrides project-level values. A specific file can pin itself to per_test while the rest of the suite stays on per_run:
# tests/stateful-cache.yml
# yaml-language-server: $schema=https://mcptest.sh/schema/v1.json
run_options:
  restart_policy: per_test
servers:
  cache:
    command: ["./bin/cache-mcp"]
tools:
  - name: "cache starts empty"
    server: cache
    tool: get
    args:
      key: "session"
    expect:
      - target: "result.isError"
        matcher:
          exact: false
CLI flags --restart-policy and --parallel N / --no-parallel override everything for one invocation:
mcptest run --restart-policy per_test --no-parallel tests/
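The precedence described in this section (CLI flag, then file-level run_options:, then project-level, then the per_run default) can be sketched as a chain of optional layers. The Layer struct and effective_policy function are hypothetical names for illustration, not the loader's actual types.

```rust
// Hypothetical sketch of override resolution for restart_policy.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RestartPolicy {
    PerRun,
    PerFile,
    PerTest,
}

/// One configuration layer; `None` means the layer does not set the value.
#[derive(Debug, Clone, Copy, Default)]
pub struct Layer {
    pub restart_policy: Option<RestartPolicy>,
}

/// The most specific layer that sets a value wins: CLI, then file, then project.
pub fn effective_policy(cli: Layer, file: Layer, project: Layer) -> RestartPolicy {
    cli.restart_policy
        .or(file.restart_policy)
        .or(project.restart_policy)
        // Documented default when no layer sets the value.
        .unwrap_or(RestartPolicy::PerRun)
}
```

The same chain would apply to parallel; it is omitted here to keep the sketch short.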
Composes with fixtures
When setup_per_test: and restart_policy: per_test both appear in a file, the runner does, in order:
- Respawn (or reconnect) the server.
- Run setup_per_test: steps.
- Run the test.
- (Future) run teardown_per_test: steps if added.
- Tear down the server.
The fixture semantics and the spawn order are both committed, and both run on top of the same future runtime release.
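The ordering above can be sketched as a loop over a set of lifecycle hooks. The Lifecycle trait, the run_with_per_test_isolation function, and the Recorder type are all hypothetical; they exist only to make the order observable and do not reflect the runner's real interfaces.

```rust
// Hypothetical sketch of the per-test ordering when restart_policy: per_test
// and setup_per_test: are both present.
pub trait Lifecycle {
    fn respawn_or_reconnect(&mut self);
    fn setup_per_test(&mut self);
    fn run_test(&mut self, name: &str);
    fn teardown_server(&mut self);
}

pub fn run_with_per_test_isolation<L: Lifecycle>(hooks: &mut L, tests: &[&str]) {
    for &name in tests {
        hooks.respawn_or_reconnect();
        hooks.setup_per_test();
        hooks.run_test(name);
        // A future teardown_per_test: step would run here, before the
        // server itself is torn down.
        hooks.teardown_server();
    }
}

/// Records the order in which hooks fire instead of touching a real server.
#[derive(Default)]
pub struct Recorder {
    pub events: Vec<String>,
}

impl Lifecycle for Recorder {
    fn respawn_or_reconnect(&mut self) {
        self.events.push("respawn".to_string());
    }
    fn setup_per_test(&mut self) {
        self.events.push("setup".to_string());
    }
    fn run_test(&mut self, name: &str) {
        self.events.push(format!("test:{}", name));
    }
    fn teardown_server(&mut self) {
        self.events.push("teardown".to_string());
    }
}
```

Running the loop with a Recorder over two tests yields the respawn, setup, test, teardown sequence once per test, so setup steps always see a fresh server.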
Today: what the runner does
The runner honors per_run (no change to behavior). per_file and per_test parse successfully, but the runner currently treats them as per_run (no error, no actual respawn) until the executor gains ServerPool shutdown-and-respawn between files and tests.
parallel: auto, false, and integer worker counts are honored today.
Roadmap
The runtime work still pending:
- Per-file and per-test respawn for stdio targets.
- Connection lifecycle handling for URL targets.
- Worker pool with auto, false, and pinned counts.
- Reporter feedback for non-default policies (so operators can spot slow modes in the run header).
These items are planned for a future release. The schema does not change between today and that release, so your YAML files will continue to validate.
References
- docs/fixtures.md, the setup_per_test: story that composes with per_test
- docs/multi-server.md, the multi-server story that uses default_server: alongside run_options: