Test isolation: restart policy and parallelism

Status: the policy library and the ServerPool API surface ship today; the per-test loop integration is the next step. The schema already accepts every value, so configs that opt in to per_file or per_test are valid today and will not need editing once the runtime ships the respawn loop. Until then, the runner parses those values but treats them as per_run (no error, no actual respawn). The parallelism knob works today.

mcptest run today spawns the server once, runs every test against the same process, and tears it down at the end. That default is right for stateless servers. For stateful servers (databases, queues, in-memory state machines) it leads to state leaks between tests and parallel-worker interference. The run_options: block adds explicit restart and parallelism controls.

The library that classifies each test boundary lives in mcptest_core::runner::restart: restart_action(policy, boundary) takes a RestartPolicy plus a Boundary (RunStart, BetweenTestsInFile, or BetweenFiles) and returns Keep or Restart. mcptest_core::connector::ServerPool::shutdown_one(name) tears down a single server when the action is Restart; a subsequent connect_server + insert brings it back up. Once the runner integration lands, it will call these in the per-test loop.
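The classification described above can be sketched as a single match over policy and boundary. This is an illustrative stand-in, not the code in mcptest_core: the enum definitions are redeclared locally, the return type name RestartAction is hypothetical (the text names only the outcomes Keep and Restart), and treating RunStart as Keep is an assumption.

```rust
/// Local stand-in for the policy enum; variants mirror the `restart_policy` values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RestartPolicy {
    PerRun,
    PerFile,
    PerTest,
}

/// Local stand-in for the boundary kinds named in the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Boundary {
    RunStart,
    BetweenTestsInFile,
    BetweenFiles,
}

/// Hypothetical name for the result type; only `Keep` and `Restart` come from the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RestartAction {
    Keep,
    Restart,
}

/// Decide whether the server should be kept or restarted at a boundary.
pub fn restart_action(policy: RestartPolicy, boundary: Boundary) -> RestartAction {
    use Boundary::*;
    use RestartPolicy::*;
    match (policy, boundary) {
        // Assumption: the initial spawn at run start is not counted as a restart.
        (_, RunStart) => RestartAction::Keep,
        // per_run never restarts after the initial spawn.
        (PerRun, _) => RestartAction::Keep,
        // per_file restarts only when crossing from one file to the next.
        (PerFile, BetweenFiles) => RestartAction::Restart,
        (PerFile, BetweenTestsInFile) => RestartAction::Keep,
        // per_test restarts at every boundary between two tests.
        (PerTest, BetweenTestsInFile) | (PerTest, BetweenFiles) => RestartAction::Restart,
    }
}
```

Keeping the decision in a pure function like this means the policy table can be unit-tested without spawning a server.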

At a glance

run_options:
  restart_policy: per_run     # default; spawn once per `mcptest run`
  parallel: auto              # default; respect CPU count

Setting	Values	Default	Effect
`restart_policy`	`per_run`, `per_file`, `per_test`	`per_run`	When to respawn (or reconnect to) the server.
`parallel`	`auto`, `false`, positive integer	`auto`	How many workers run tests concurrently. `auto` sizes the pool from the CPU count; `false` serializes; an integer pins the worker count.

Worked example: stateful server, per-test restart

A server keeps an in-memory cache that survives between tests. Earlier tests leave entries that confuse later ones. Force a fresh process per test and serial execution to prevent worker interference:

# yaml-language-server: $schema=https://mcptest.sh/schema/v1.json

run_options:
  restart_policy: per_test
  parallel: false

servers:
  cache:
    command: ["./bin/cache-mcp"]

tools:
  - name: "first put goes in clean"
    server: cache
    tool: put
    args: { key: "a", value: "1" }

  - name: "second put goes in clean"
    server: cache
    tool: put
    args: { key: "a", value: "2" }
    expect:
      - target: "result.content[0].existing"
        matcher:
          exact: false  # would be true if the cache leaked between tests

Both tests expect a fresh process. With per_test, the runner respawns the binary between tests; with parallel: false, no two tests collide on shared state.

Restart policy details

`per_run` (default)

Spawn once at the start of the run, run every test, and tear down at the end. This is today's behavior. Right for stateless servers and idempotent operations.

`per_file`

Spawn before each test file and tear down after the file. Use when state leaks between files but is fine inside a file. Composes with the setup: / teardown: blocks at file level.

`per_test`

Spawn before each test and tear down after each test. Slowest but maximally isolated. Composes with the setup_per_test: block.
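To compare what the three policies cost, the sketch below counts server spawns for a suite. It assumes serial execution (parallel: false) and a non-empty suite; the function name spawn_count and the locally declared enum are illustrative, not part of mcptest.

```rust
/// Local stand-in for the policy enum; variants mirror the `restart_policy` values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RestartPolicy {
    PerRun,
    PerFile,
    PerTest,
}

/// Count server spawns for a suite, where `tests_per_file[i]` is the number
/// of tests in file `i`. Assumes serial execution and at least one test.
pub fn spawn_count(policy: RestartPolicy, tests_per_file: &[usize]) -> usize {
    match policy {
        // One process for the whole run.
        RestartPolicy::PerRun => 1,
        // One process per test file.
        RestartPolicy::PerFile => tests_per_file.len(),
        // One process per individual test.
        RestartPolicy::PerTest => tests_per_file.iter().sum(),
    }
}
```

For a suite of three files holding 3, 2, and 5 tests, this gives 1 spawn under per_run, 3 under per_file, and 10 under per_test, which is why per_test is the slowest mode.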

URL targets

For URL targets the runner cannot restart a server it did not spawn. Instead, per_file and per_test control the connection lifecycle: before the next file or test, the runner disconnects, reconnects, and optionally re-runs the readiness probe.

This is a deliberate limit of the URL transport, called out here so operators do not file bugs about a remote server that was "not restarted."
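The difference between spawned and URL targets at a restart boundary can be sketched as follows. The Target enum, the boundary_steps function, and the step labels are hypothetical names used only for illustration; the real runner's types may look different.

```rust
/// Hypothetical target kinds: a process the runner spawned, or a remote URL.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Target {
    Spawned,
    Url,
}

/// List the steps taken at a boundary where the policy asks for a restart.
pub fn boundary_steps(target: Target, rerun_readiness_probe: bool) -> Vec<&'static str> {
    match target {
        // The runner owns the process, so it can replace it outright.
        Target::Spawned => vec!["shut down process", "spawn process"],
        // The runner does not own a remote server; it can only cycle the connection.
        Target::Url => {
            let mut steps = vec!["disconnect", "reconnect"];
            if rerun_readiness_probe {
                steps.push("readiness probe");
            }
            steps
        }
    }
}
```

Note that for a URL target any state held by the remote server survives the boundary; only connection-scoped state is reset.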

Parallelism details

`auto`

Pick a worker count based on the CPU count. The current default.

`false`

Force serial execution. Safest for stateful servers, slowest for stateless ones. Pair with per_test or per_file when state leaks are catastrophic.

Positive integer

Pin the worker count. Useful when CI builds run on a host with a fixed budget (for example, a 4-vCPU runner where auto overshoots).
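The three settings can be reduced to a concrete worker count along these lines. This is a sketch under assumptions: the Parallel enum and worker_count function are illustrative names, and mapping auto to exactly one worker per available CPU is a guess, since the text says only that auto is based on the CPU count.

```rust
use std::num::NonZeroUsize;
use std::thread;

/// Illustrative model of the `parallel` setting.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Parallel {
    /// `parallel: auto`
    Auto,
    /// `parallel: false`
    Off,
    /// `parallel: <positive integer>`
    Workers(NonZeroUsize),
}

/// Resolve the setting to a worker count.
pub fn worker_count(setting: Parallel) -> usize {
    match setting {
        // Assumption: one worker per available CPU, falling back to 1 if the
        // platform cannot report its parallelism.
        Parallel::Auto => thread::available_parallelism()
            .map(|n| n.get())
            .unwrap_or(1),
        // Serial execution is a pool of one.
        Parallel::Off => 1,
        // A pinned count is used as given.
        Parallel::Workers(n) => n.get(),
    }
}
```

Using NonZeroUsize for the pinned count makes a zero-worker pool unrepresentable, which matches the "positive integer" rule in the settings table.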

Why no `parallel: true`?

Operators have historically written parallel: true to mean "default worker count," which is ambiguous: is that one worker, all cores, or something else? auto is the explicit name for the default. The loader rejects parallel: true with a hint pointing at auto.
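A minimal sketch of that validation rule, assuming the value arrives as a raw scalar string. The parse_parallel function, the Parallel enum, and the error wording are illustrative; the actual loader's messages and types are not shown in this document.

```rust
/// Illustrative model of the `parallel` setting.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Parallel {
    Auto,
    Off,
    Workers(usize),
}

/// Parse a raw `parallel` scalar, rejecting `true` with a hint toward `auto`.
pub fn parse_parallel(raw: &str) -> Result<Parallel, String> {
    match raw.trim() {
        "auto" => Ok(Parallel::Auto),
        "false" => Ok(Parallel::Off),
        // `true` is ambiguous, so it is refused rather than guessed at.
        "true" => Err(
            "`parallel: true` is ambiguous; use `parallel: auto` for the default worker count"
                .to_string(),
        ),
        other => match other.parse::<usize>() {
            // Only strictly positive worker counts are accepted.
            Ok(n) if n > 0 => Ok(Parallel::Workers(n)),
            _ => Err(format!(
                "invalid `parallel` value `{other}`; expected `auto`, `false`, or a positive integer"
            )),
        },
    }
}
```

With this shape, "4" parses to a pinned count of four workers, while "0", "-1", and "true" all produce an error instead of a silent fallback.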

Project vs file overrides

run_options: at file level overrides project-level values. A specific file can pin itself to per_test while the rest of the suite stays on per_run:

# tests/stateful-cache.yml
# yaml-language-server: $schema=https://mcptest.sh/schema/v1.json

run_options:
  restart_policy: per_test

servers:
  cache:
    command: ["./bin/cache-mcp"]

tools:
  - name: "cache starts empty"
    server: cache
    tool: get
    args:
      key: "session"
    expect:
      - target: "result.isError"
        matcher:
          exact: false

CLI flags --restart-policy and --parallel N / --no-parallel override everything for one invocation:

mcptest run --restart-policy per_test --no-parallel tests/
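The override order described in this section (CLI flag, then file-level run_options:, then project-level run_options:, then the built-in default) can be sketched as a chain of optional values. The function name resolve_policy and the locally declared enum are illustrative, not mcptest's API.

```rust
/// Local stand-in for the policy enum; variants mirror the `restart_policy` values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RestartPolicy {
    PerRun,
    PerFile,
    PerTest,
}

/// Pick the effective policy: the most specific layer that sets a value wins.
pub fn resolve_policy(
    cli: Option<RestartPolicy>,
    file: Option<RestartPolicy>,
    project: Option<RestartPolicy>,
) -> RestartPolicy {
    cli.or(file)
        .or(project)
        // Documented default when no layer sets a value.
        .unwrap_or(RestartPolicy::PerRun)
}
```

A file that sets per_test on top of a project set to per_file resolves to per_test, and passing --restart-policy per_run on the command line overrides both for that invocation.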

Composes with fixtures

When setup_per_test: and restart_policy: per_test both appear in a file, the runner performs these steps in order for each test:

1. Respawn (or reconnect to) the server.
2. Run setup_per_test: steps.
3. Run the test.
4. (Future) Run teardown_per_test: steps, if that block is added.
5. Tear down the server.
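The ordering above can be made concrete with a small trace generator. This sketch only records step labels in order and performs no real work; the function name per_test_trace and the label strings are illustrative, and the future teardown_per_test step is left out because it does not exist yet.

```rust
/// Produce the ordered list of steps for a file that combines
/// `setup_per_test:` with `restart_policy: per_test`.
pub fn per_test_trace(tests: &[&str]) -> Vec<String> {
    let mut trace = Vec::new();
    for name in tests {
        // A fresh server comes first, so setup always sees clean state.
        trace.push(format!("respawn-or-reconnect ({name})"));
        trace.push(format!("setup_per_test ({name})"));
        trace.push(format!("run test ({name})"));
        // A future teardown_per_test step would slot in here.
        trace.push(format!("tear down server ({name})"));
    }
    trace
}
```

For two tests the trace has eight entries, and the server teardown for the first test always precedes the respawn for the second, so fixture state created by setup_per_test: never crosses a test boundary.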

Both the fixture semantics and the spawn order are committed, and both depend on the same future runtime release.

Today: what the runner does

The runner honors per_run (no change to behavior). per_file and per_test parse successfully, but the runner currently treats them as per_run (no error, no actual respawn) until the executor gains ServerPool shutdown-and-respawn between files and tests.

parallel: auto, false, and integer worker counts are honored today.

Roadmap

The runtime work still pending:

Per-file and per-test respawn for stdio targets.
Connection lifecycle handling for URL targets.
Worker pool with auto, false, and pinned counts.
Reporter feedback for non-default policies (so operators can spot slow modes in the run header).

These items are planned for a future release. The schema does not change between today and that release, so your YAML files will continue to validate.

References

docs/fixtures.md, the setup_per_test: story that composes with per_test
docs/multi-server.md, the multi-server story that uses default_server: alongside run_options: