Using the cache

mcptest can skip re-running a test when nothing about the test or the server has changed since the last green run. This page covers the user-facing surface: what the cache does, when it helps, when it does not, the CLI commands you reach for, and the troubleshooting playbook.

What the cache does

When the runner is about to invoke a tool on a server, it asks the cache: "have I seen this exact invocation before, against this exact server build, evaluated by this exact matcher set?" If yes, and the test is eligible (see Eligibility rules below), the cached result is replayed and the matchers run against it. If no, the runner makes the real call, runs the matchers, and stores the result if the test is eligible.
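The lookup, replay, and store steps above can be sketched in Rust. This is a minimal illustration under stated assumptions, not the runner's actual code: `run_with_cache`, `Outcome`, and the string-keyed map are hypothetical stand-ins for the real cache key, backend, and matcher set.

```rust
use std::collections::HashMap;

/// What the reporter would see for one invocation (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Outcome {
    Hit { passed: bool },
    Miss { passed: bool },
}

/// Hypothetical flow: `key` stands in for the real cache key, `call` for the
/// real tool invocation, and `matchers` for the test's matcher set.
pub fn run_with_cache(
    cache: &mut HashMap<String, String>,
    key: &str,
    eligible: bool,
    call: impl FnOnce() -> String,
    matchers: impl Fn(&str) -> bool,
) -> Outcome {
    if eligible {
        if let Some(cached) = cache.get(key) {
            // Replay: the matchers still run against the cached result.
            return Outcome::Hit { passed: matchers(cached.as_str()) };
        }
    }
    // Miss (or ineligible test): make the real call and evaluate it.
    let response = call();
    let passed = matchers(response.as_str());
    if eligible {
        // Only eligible tests ever write to the cache.
        cache.insert(key.to_string(), response);
    }
    Outcome::Miss { passed }
}
```

Calling it twice with the same key and `eligible = true` yields a `Miss` followed by a `Hit`, and the second call never invokes `call`. With `eligible = false` the cache is neither read nor written.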

The cache lives at ~/.cache/mcptest/ by default (or $XDG_CACHE_HOME/mcptest/ if set, or the OS-equivalent on Windows). A repository can pin a different directory with the global --cache-dir <path> flag or the cache.dir key in mcptest.yml. Pinning is how CI points mcptest at a path the build cache can restore.
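One way to picture the directory resolution is the sketch below. The function name `resolve_cache_dir` is hypothetical, and the precedence between the CLI flag and the config file (flag first) is an assumption that the text above does not confirm. The Windows fallback is left out.

```rust
use std::path::PathBuf;

/// Hypothetical helper: resolve the cache directory from the inputs named in
/// the text. Assumed precedence (not confirmed by the docs): --cache-dir,
/// then cache.dir in mcptest.yml, then $XDG_CACHE_HOME, then ~/.cache.
pub fn resolve_cache_dir(
    cli_flag: Option<&str>,       // value of --cache-dir, if given
    config_dir: Option<&str>,     // cache.dir from mcptest.yml, if set
    xdg_cache_home: Option<&str>, // $XDG_CACHE_HOME, if set
    home: &str,                   // the user's home directory
) -> PathBuf {
    // An explicit pin is used as-is, without appending "mcptest".
    if let Some(dir) = cli_flag.or(config_dir) {
        return PathBuf::from(dir);
    }
    match xdg_cache_home {
        Some(xdg) if !xdg.is_empty() => PathBuf::from(xdg).join("mcptest"),
        _ => PathBuf::from(home).join(".cache").join("mcptest"),
    }
}
```

With no pin and no XDG variable, a home of /home/u resolves to /home/u/.cache/mcptest, which matches the default described above.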

The default backend in v1.0 is the local filesystem; a GitHub Actions backend also ships in v1.0. Swapping backends is transparent to your YAML: you pick the backend with a CLI flag or a config block, not by editing tests.

When the cache helps

The cache is at its most useful in two settings:

Tight iteration loops. A developer running mcptest run repeatedly while debugging a single failing test does not need to re-execute the forty passing tests every time. The cache serves them in milliseconds and the run feels free.
CI on dependent PRs. A PR that touches one file should not invalidate cache entries for tests against parts of the server that did not change. A well-keyed cache turns the typical "every PR runs the full suite" cost into "every PR runs the tests whose inputs actually changed."

The PRD §9 budget is explicit: a typical suite of 50 deterministic tests should complete in under five seconds on a warm cache. Without caching that target is out of reach; with naive caching it is out of reach in a different way, because the cache hides regressions. The eligibility engine (see Eligibility rules below) exists to keep the speed without the false greens.

When the cache does not help

The cache adds zero value, and in fact pays a small overhead, in these cases:

First run. The cache is empty. Every test is a miss.
Server revision changed. The cache key includes server_version_pin (the negotiated MCP Revision), so a server that switches from 2025-03-26 to 2025-06-18 invalidates every cached result for that server on the next run. The runner does the right thing here, but the run that triggers the invalidation pays the full cost. A new server build that reports the same Revision does not change the key.
Test args changed. Changing one byte in a test's args that survives canonicalization changes the cache key. The cache cannot help you on a test you just rewrote.
Runner upgraded. Upgrading mcptest changes the runner_version field in the cache key, which invalidates the entire cache. This is deliberate; a runner upgrade may include a matcher bug fix, and reusing pre-upgrade results would mask the fix.
Tests with non-deterministic matchers. regex, contains, any LLM-eval matcher, performance thresholds. These are never cached. See the next section.

If you are seeing zero cache hits and the run is identical to the previous one, the most common cause is that you upgraded mcptest, or that the server now negotiates a different Revision, between runs. Run with --debug and look for the miss reason in the reporter output.

Eligibility rules

The eligibility engine decides per test whether the cache should ever see it. The check is a pure function over the parsed YAML, so it runs during planning and the reporter knows ahead of time how many tests in the suite are cacheable.

Per-test-type defaults

Each kind of test has a default eligibility:

Tools tests are usually a fixed input mapped to a fixed output. Cacheable by default.
Compliance tests are read-only protocol probes (initialize, tools/list, error shape). Cacheable by default.
Eval tests route through an LLM with sampling. The same input produces different scores across runs. Not cacheable by default.
Performance tests measure wall-clock latency. Caching a timing measurement would be a lie. Not cacheable by default.
Model-compatibility tests mirror tools-style behavior across models, so they share the tools default.

Explicit `cache:` directive

The YAML side has a single cache: field with three accepted values:

tools:
  - name: "force cache off"
    server: local
    tool: ping
    cache: never

  - name: "force cache on"
    server: local
    tool: ping
    cache: always

  - name: "follow the default"
    server: local
    tool: ping
    cache: auto    # this is the default; you can omit the field

cache: never is sovereign. It beats every default and every hard exclusion. Use it when you want a single test pinned uncacheable even after a future refactor changes the engine's view of the test.
cache: always overrides the type default but cannot override hard exclusions. Forcing the cache on a hook-driven test or a test with effects: [external] would let the cache silently return a stale answer.
cache: auto (the default) means "use the type default."

Hard exclusions

Three conditions exclude a test regardless of cache: always:

hooks: block declared on the test. Hooks are arbitrary author code (shell commands, custom Rust extensions, custom matchers). Non-deterministic by definition, never cacheable.
HTTP transport without an explicit server_version: pin. The cache key needs a stable server identity; without a pin, the runner cannot tell that the server changed.
effects: list contains external. External effects (calls to third-party APIs, payment side effects, irreversible actions) cannot be replayed safely.

Compliance has one carve-out: the HTTP-without-pin exclusion does not apply to it, because compliance asks the protocol what it advertises, and the answer does not depend on the server build.

The full eligibility table lives in docs/cache-eligibility.md.
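The rules in this section compose into one pure function. The sketch below is an illustration under stated assumptions, not the engine's real code: `TestPlan`, its field names, and `is_cacheable` are hypothetical, and the struct is a flattened view of only the fields the rules above mention.

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
pub enum TestKind { Tools, Compliance, Eval, Performance, ModelCompat }

#[derive(Clone, Copy, PartialEq, Debug)]
pub enum CacheDirective { Never, Always, Auto }

#[derive(Clone, Copy, PartialEq, Debug)]
pub enum Transport { Stdio, Http }

/// Hypothetical, flattened view of what the eligibility check reads.
#[derive(Clone, Copy, Debug)]
pub struct TestPlan {
    pub kind: TestKind,
    pub directive: CacheDirective,
    pub has_hooks: bool,
    pub transport: Transport,
    pub has_server_version_pin: bool,
    pub has_external_effects: bool,
}

pub fn is_cacheable(t: &TestPlan) -> bool {
    // `cache: never` wins over everything else.
    if t.directive == CacheDirective::Never {
        return false;
    }
    // Hard exclusions apply even under `cache: always`.
    let unpinned_http = t.transport == Transport::Http
        && !t.has_server_version_pin
        && t.kind != TestKind::Compliance; // compliance carve-out
    if t.has_hooks || t.has_external_effects || unpinned_http {
        return false;
    }
    match t.directive {
        CacheDirective::Always => true,
        // `auto`: fall back to the per-type default.
        _ => matches!(
            t.kind,
            TestKind::Tools | TestKind::Compliance | TestKind::ModelCompat
        ),
    }
}
```

Because the function reads only parsed fields and performs no I/O, it can run during planning, which is what lets the reporter count cacheable tests before any test executes.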

CLI surface

mcptest cache is the CLI namespace for cache operations. The subcommands are in flight; the intended shape is documented here so you can plan against it.

# show every cache entry, grouped by config file
mcptest cache list

# summary statistics
mcptest cache stats

# remove every entry
mcptest cache clear

# remove entries that match a predicate
mcptest cache prune --older-than 7d
mcptest cache prune --server remote_api
mcptest cache prune --match "tools/list_*"

The output of mcptest cache stats looks like this:

mcptest cache stats
~/.cache/mcptest/

  entries:        1,247
  total size:     342 MiB
  oldest entry:   2026-04-22 (24 days)
  hit rate (7d):  84.2%   (3,891 hits, 731 misses)
  evictions (7d): 412     (412 TTL, 0 LRU)
  cap:            1 GiB

The hit rate is computed from the local hit and miss counters the runner emits to the cache directory's index.sqlite, so it reflects whichever projects run on this machine. CI runners and developer machines have their own numbers.
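The hit-rate figure is plain arithmetic over the two counters. A minimal sketch follows, with the hypothetical function name `hit_rate_percent`; returning `None` for an empty counter set is an assumption about how a zero-lookup case could be represented, not documented behavior.

```rust
/// Hit rate as a percentage, or None when there were no lookups at all.
pub fn hit_rate_percent(hits: u64, misses: u64) -> Option<f64> {
    let total = hits + misses;
    if total == 0 {
        None
    } else {
        Some(100.0 * hits as f64 / total as f64)
    }
}
```

For the sample output above, 3,891 hits and 731 misses give 3891 / 4622, which formats to 84.2 at one decimal place.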

Per-run flags

Two global flags on mcptest run control cache behavior for a single invocation:

--no-cache. Bypass the cache entirely for this run. Tests still record their results, so you can diagnose a suspected cache poisoning issue without losing the cached good state.
--cache refresh. Evict every entry that matches the current run's tests, then re-run. Equivalent to "this suite, with a cold cache." Useful in CI on the main branch to make sure the cache is healthy on a known-good state.

--no-cache is the right hammer when you are debugging a suspected stale hit; --cache refresh is for periodic cache rebuilds.

Cache key composition

The cache key is the lowercase hex SHA-256 of a canonical serialization of this struct:

pub struct CacheKey {
    pub server_spec: ServerSpec,        // canonicalized
    pub tool_name: String,
    pub normalized_args: serde_json::Value,
    pub server_version_pin: String,     // upstream Revision
    pub runner_version: String,         // env!("CARGO_PKG_VERSION")
    pub matcher_set: Vec<MatcherSpec>,  // sorted by ID, canonicalized
}

Six things go into the key:

server_spec. The full server specification after CLI override resolution. A test that runs against a URL is a different key from the same test running against a stdio command.
tool_name. The literal name passed to the server.
normalized_args. The args object after canonicalization (sort object keys, strip trailing whitespace, normalize Unicode to NFC). Two semantically identical args produce the same key.
server_version_pin. The negotiated MCP Revision. A protocol-version change invalidates the cache for that server.
runner_version. The mcptest version that produced the entry. Upgrading mcptest invalidates every entry.
matcher_set. The matchers applied to the response, with their rule IDs sorted and canonicalized.

The key is deliberately conservative. A few false misses are acceptable; a single false hit is not. If you find a case where the runner replays a stale answer, file a GitHub issue: that is the cache invariant breaking and we want to know.

The canonicalization rules are documented in detail (Unicode form, JSON key sort order, whitespace handling); the canonicalizer lives in mcptest-core::cache::canonical and is exercised by golden tests.
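To make the args canonicalization concrete, here is a small standard-library sketch. It is not the code in mcptest-core::cache::canonical: the `Value` enum is a hypothetical stand-in for serde_json::Value, and the sketch covers only key sorting, verbatim numbers, and array order. NFC normalization, whitespace handling, and the SHA-256 step are left out because the Rust standard library provides neither a Unicode normalizer nor SHA-256.

```rust
use std::collections::BTreeMap;

/// Minimal JSON-like value for illustration only.
#[derive(Debug, Clone)]
pub enum Value {
    Num(String),                     // literal kept verbatim, so 1 and 1.0 differ
    Str(String),
    Array(Vec<Value>),               // element order is significant
    Object(BTreeMap<String, Value>), // BTreeMap iterates in sorted key order
}

/// Serialize a value so that object key order never affects the output.
/// Strings use Rust's debug escaping here, which is close to but not
/// identical to JSON escaping.
pub fn canonical(v: &Value) -> String {
    match v {
        Value::Num(n) => n.clone(),
        Value::Str(s) => format!("{:?}", s),
        Value::Array(items) => {
            let parts: Vec<String> = items.iter().map(canonical).collect();
            format!("[{}]", parts.join(","))
        }
        Value::Object(map) => {
            let parts: Vec<String> = map
                .iter()
                .map(|(k, v)| format!("{:?}:{}", k, canonical(v)))
                .collect();
            format!("{{{}}}", parts.join(","))
        }
    }
}
```

Two objects built with their keys inserted in different orders serialize identically, while swapping two array elements or writing 1.0 instead of 1 changes the output and would therefore change the key.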

CI integration

The cache is most valuable in CI when the build cache restores the cache directory between runs. The pattern is the same across CI vendors: tell the build cache to save and restore .mcptest-cache/, then point mcptest at the same path.

GitHub Actions

- name: cache mcptest
  uses: actions/cache@v4
  with:
    path: .mcptest-cache/
    key: mcptest-${{ runner.os }}-${{ hashFiles('mcptest.yml', 'tests/**') }}

- name: run mcptest
  run: mcptest run --cache-dir .mcptest-cache/

The cache key includes the OS (Linux and macOS runners cannot share a cache) and a content hash of every test file. Any change to mcptest.yml or anything under tests/ produces a new key. With only an exact key configured, that is a full miss: actions/cache restores nothing and the run starts cold.

To fall back to the most recent cache for the same OS instead, add restore-keys:

- name: cache mcptest
  uses: actions/cache@v4
  with:
    path: .mcptest-cache/
    key: mcptest-${{ runner.os }}-${{ hashFiles('mcptest.yml', 'tests/**') }}
    restore-keys: |
      mcptest-${{ runner.os }}-

Now a near-miss (one test file changed) restores from the most recent cache for this OS and the runner pays only for the actually invalidated entries.
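The restore order that actions/cache applies (exact key first, then each restore-key as a prefix, preferring the most recently created match) can be modeled as below. This is a simplified illustration: the `Saved` type and `restore` function are hypothetical, and the model ignores branch scoping and cache versioning.

```rust
/// Hypothetical record of one saved cache archive.
pub struct Saved {
    pub key: String,
    pub created_at: u64, // larger means more recent
}

/// Pick the archive to restore: exact key match first, then the newest
/// archive whose key starts with a restore-key prefix, tried in order.
pub fn restore<'a>(
    saved: &'a [Saved],
    key: &str,
    restore_keys: &[&str],
) -> Option<&'a Saved> {
    if let Some(hit) = saved.iter().find(|s| s.key == key) {
        return Some(hit);
    }
    for prefix in restore_keys {
        let best = saved
            .iter()
            .filter(|s| s.key.starts_with(*prefix))
            .max_by_key(|s| s.created_at);
        if best.is_some() {
            return best;
        }
    }
    None
}
```

With an empty `restore_keys` list, a changed content hash restores nothing. With the OS-scoped prefix, the same lookup returns the newest archive for that OS and never one from another OS.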

GitLab CI

mcptest:
  cache:
    key:
      files:
        - mcptest.yml
        - tests/**/*.yml
    paths:
      - .mcptest-cache/
  script:
    - mcptest run --cache-dir .mcptest-cache/

CircleCI

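# checksum takes a single file, not a directory, so hash tests/ into one
# file first. The .mcptest-tests.sum name is only an example.
- run:
    name: fingerprint test files
    command: find tests -type f -print0 | sort -z | xargs -0 sha256sum > .mcptest-tests.sum

- restore_cache:
    keys:
      - mcptest-{{ checksum "mcptest.yml" }}-{{ checksum ".mcptest-tests.sum" }}
      - mcptest-

- run: mcptest run --cache-dir .mcptest-cache/

- save_cache:
    paths: [.mcptest-cache/]
    key: mcptest-{{ checksum "mcptest.yml" }}-{{ checksum ".mcptest-tests.sum" }}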

Troubleshooting

"Cache hit but I changed the test"

Symptoms: you edited a test (or a server config) and the runner still reports a hit.

Almost always one of:

The change you made does not affect the cache key. Renaming a test, changing a message: string, or adjusting whitespace in a comment does not invalidate the entry. Only the fields listed in Cache key composition do.
Server identity drift. The server changed under you, but the negotiated server_version_pin did not. The MCP Revision is the only signal the cache has about server identity; if your server reports the same Revision from two different builds, the cache will treat them as the same. The fix is to bump the server's reported version on every release, even when the protocol version did not move. See the troubleshooting tip in docs/cache-eligibility.md.
Normalization absorbed your change. The canonicalizer sorts JSON keys, strips trailing whitespace, and normalizes Unicode to NFC, so reordering object keys or switching between equivalent Unicode forms still produces a hit. It does not normalize numbers (1 and 1.0 are different keys) or array order, so those edits produce a miss, not a hit.

When in doubt, run --no-cache once to confirm the test still passes against a real call, then --cache refresh to rebuild.

"Cache is making my tests stale"

Symptoms: your tests pass in CI but fail locally (or vice versa), and the suspect is a stale cache entry.

This should not happen, but if it does:

Run with --no-cache to see the result of a real call.
If that result differs from the cached run, the cache is poisoned. Run mcptest cache clear to remove every entry, then re-run.
File a GitHub issue with the smallest reproducer you can produce. Never serving a stale hit is the invariant the cache is built around; if we have a leak, we want to fix it.

"Cache too large"

Symptoms: ~/.cache/mcptest/ is several gigabytes and you want the disk back.

The cache has a default cap of 1 GiB; when the cap is exceeded, LRU eviction kicks in. If your cap is set higher (via the cache block in mcptest.yml) or if you have many projects sharing the same root, you may want to prune manually:

# remove entries older than 7 days
mcptest cache prune --older-than 7d

# remove every entry
mcptest cache clear

The TTL on every entry is 7 days by default; entries older than that are evicted on next access regardless of LRU position. If you never re-run an old project, its entries age out naturally; running mcptest cache prune --older-than 7d just makes the cleanup eager.
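The interaction between the TTL and the size cap can be sketched as a single sweep. This is an illustration, not mcptest's eviction code: the `Entry` fields and the `evict` function are hypothetical, the TTL is assumed to count from creation time, and the real runner applies the TTL lazily on access rather than in a batch.

```rust
/// Hypothetical index row: size in bytes, timestamps in seconds.
#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    pub key: String,
    pub size: u64,
    pub created_at: u64,
    pub last_used_at: u64,
}

/// Default TTL from the text: 7 days.
pub const TTL_SECS: u64 = 7 * 24 * 60 * 60;

/// Return the entries that survive: drop expired entries first, then evict
/// least-recently-used entries until the total size fits under `cap`.
pub fn evict(mut entries: Vec<Entry>, now: u64, cap: u64) -> Vec<Entry> {
    // TTL pass: age is measured from creation (an assumption).
    entries.retain(|e| now.saturating_sub(e.created_at) <= TTL_SECS);
    // Most recently used first, so the tail holds the LRU candidates.
    entries.sort_by(|a, b| b.last_used_at.cmp(&a.last_used_at));
    let mut total: u64 = entries.iter().map(|e| e.size).sum();
    while total > cap {
        match entries.pop() {
            Some(e) => total -= e.size,
            None => break,
        }
    }
    entries
}
```

Running the TTL pass before the LRU pass means expired entries never count against the cap, so a project that has not run for more than a week cannot push out entries that are still live.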

"Cache stats look wrong"

Symptoms: mcptest cache stats reports a hit rate of zero or something nonsensical.

The stats database (index.sqlite in the cache directory) is maintained on a best-effort basis. A force-killed runner can leave the database slightly stale; a corrupted database is a known failure mode on filesystems that lose locking semantics under contention (NFS, some network shares). The fix is to delete the stats database:

rm "$(mcptest cache stats --path-only)/index.sqlite"

The next run will rebuild it as it goes. You will lose historical hit-rate data but no cached entries.

The LLM-judge verdict cache

LLM judges are non-deterministic and slow, so mcptest ships a separate verdict cache that keys on the juror's inputs (model, prompt-template version, criteria, prompt, response) rather than on the call shape. The cache is opt-in because a stale cached verdict can mask drift.

Opt in inside the YAML evals: block:

evals:
  cache:
    verdicts: true

Override the YAML setting on the command line with --no-verdict-cache. The flag is accepted on mcptest run and mcptest eval (anywhere a jury fires) and always wins over the YAML opt-in, so a spot check that needs fresh verdicts does not have to edit YAML:

# Force fresh verdicts even when the YAML opts into caching.
mcptest eval --no-verdict-cache

# Same for the runner.
mcptest run --no-verdict-cache

When the cache is disabled (default, or by --no-verdict-cache), no verdicts are read or written. When enabled, verdicts live under ~/.cache/mcptest/verdicts/ with a 24-hour default TTL.
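The precedence between the YAML opt-in and the command-line flag, plus the 24-hour TTL, reduces to two small predicates. The function names are hypothetical, and treating an entry that is exactly 24 hours old as expired is an assumption.

```rust
/// Default verdict TTL from the text: 24 hours.
pub const VERDICT_TTL_SECS: u64 = 24 * 60 * 60;

/// The verdict cache is off unless the YAML opts in, and
/// --no-verdict-cache always wins over the YAML setting.
pub fn verdict_cache_enabled(yaml_verdicts: Option<bool>, no_verdict_cache: bool) -> bool {
    !no_verdict_cache && yaml_verdicts.unwrap_or(false)
}

/// Whether a stored verdict is still within its TTL (boundary is assumed).
pub fn verdict_is_fresh(stored_at: u64, now: u64) -> bool {
    now.saturating_sub(stored_at) < VERDICT_TTL_SECS
}
```

Passing `None` for the YAML setting models a config that omits evals.cache.verdicts, which leaves the verdict cache disabled.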