Using the cache

mcptest can skip re-running a test when nothing about the test or the server has changed since the last green run. This page covers the user-facing surface: what the cache does, when it helps, when it does not, the CLI commands you reach for, and the troubleshooting playbook.

What the cache does

When the runner is about to invoke a tool on a server, it asks the cache: "have I seen this exact invocation before, against this exact server build, evaluated by this exact matcher set?" If yes, and the test is eligible (see Eligibility rules below), the cached result is replayed and the matchers run against it. If no, the runner makes the real call, runs the matchers, and stores the result if the test is eligible.
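
The lookup-or-call flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not mcptest's implementation: `Cache`, `Source`, and `run_test` are hypothetical names, the key is a plain string, and the on-disk store is replaced by an in-memory map.

```rust
use std::collections::HashMap;

/// Where the result handed to the matchers came from.
#[derive(Debug, PartialEq)]
pub enum Source {
    Replayed, // served from the cache
    RealCall, // fetched from the server
}

/// Hypothetical in-memory stand-in for the on-disk cache.
#[derive(Default)]
pub struct Cache {
    entries: HashMap<String, String>,
}

/// Replays a cached result when the test is eligible and the key is known;
/// otherwise makes the real call and stores the result if the test is eligible.
/// The matchers run against the result in both cases.
pub fn run_test(
    cache: &mut Cache,
    key: &str,
    eligible: bool,
    call_server: impl FnOnce() -> String,
    matchers_pass: impl Fn(&str) -> bool,
) -> (Source, bool) {
    if eligible {
        if let Some(cached) = cache.entries.get(key) {
            return (Source::Replayed, matchers_pass(cached.as_str()));
        }
    }
    let result = call_server();
    let passed = matchers_pass(result.as_str());
    if eligible {
        cache.entries.insert(key.to_string(), result);
    }
    (Source::RealCall, passed)
}
```

In this sketch an ineligible test never reads from or writes to the cache, so repeated runs always reach the server.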

The cache lives at ~/.cache/mcptest/ by default (or $XDG_CACHE_HOME/mcptest/ if set, or the OS-equivalent on Windows). A repository can pin a different directory with the global --cache-dir <path> flag or the cache.dir block in mcptest.yml, which is the form CI uses to point at a path the build cache can restore.
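
As a rough sketch of that resolution order on Unix-like systems: `resolve_cache_dir` is a hypothetical helper, `pinned` stands for a directory set through either --cache-dir or cache.dir (the precedence between those two is not modeled), and the Windows case is left out.

```rust
use std::path::{Path, PathBuf};

/// Resolves the cache directory on Unix-like systems.
/// `pinned` is a directory set by --cache-dir or cache.dir, if any;
/// `xdg_cache_home` and `home` are the values of $XDG_CACHE_HOME and $HOME.
pub fn resolve_cache_dir(
    pinned: Option<&Path>,
    xdg_cache_home: Option<&str>,
    home: &str,
) -> PathBuf {
    if let Some(dir) = pinned {
        return dir.to_path_buf();
    }
    match xdg_cache_home {
        // Assumption: an empty variable counts as unset, following the
        // XDG Base Directory convention.
        Some(xdg) if !xdg.is_empty() => Path::new(xdg).join("mcptest"),
        _ => Path::new(home).join(".cache").join("mcptest"),
    }
}
```

Passing the environment values in as arguments, instead of reading them inside the function, keeps the sketch a pure function that is easy to test.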

The default backend in v1.0 is the local filesystem. A GitHub Actions backend also ships in v1.0. Swapping backends is transparent to your YAML; you pick the backend with a CLI flag or a config block, not by editing tests.

When the cache helps

The cache is at its most useful in two settings:

The PRD §9 budget is explicit: a typical suite of 50 deterministic tests should complete in under five seconds on a warm cache. Without caching that target is out of reach; with naive caching it is out of reach in a different way, because the cache hides regressions. The eligibility engine (next section) exists to keep the speed without the false greens.

When the cache does not help

The cache adds zero value, and in fact pays a small overhead, in these cases:

If you are seeing zero cache hits and the run is identical to the previous one, the most common cause is that you upgraded mcptest or pulled a new server binary between runs. Run with --debug and look for the miss reason in the reporter output.

Eligibility rules

The eligibility engine decides per test whether the cache should ever see it. The check is a pure function over the parsed YAML, so it runs during planning and the reporter knows ahead of time how many tests in the suite are cacheable.

Per-test-type defaults

Each kind of test has a default eligibility:

Explicit cache: directive

The YAML side has a single cache: field with three accepted values:

tools:
  - name: "force cache off"
    server: local
    tool: ping
    cache: never

  - name: "force cache on"
    server: local
    tool: ping
    cache: always

  - name: "follow the default"
    server: local
    tool: ping
    cache: auto    # this is the default; you can omit the field

Hard exclusions

Three conditions exclude a test regardless of cache: always:

  1. hooks: block declared on the test. Hooks are arbitrary author code (shell commands, custom Rust extensions, custom matchers). Non-deterministic by definition, never cacheable.
  2. HTTP transport without an explicit server_version: pin. The cache key needs a stable server identity; without a pin, the runner cannot tell that the server changed.
  3. effects: list contains external. External effects (calls to third-party APIs, payment side effects, irreversible actions) cannot be replayed safely.

Compliance tests have one carve-out: the HTTP-without-pin exclusion does not apply to them, because a compliance test asks the protocol what it advertises, and the answer does not depend on the server build.
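
The way the cache: directive, the hard exclusions, and the compliance carve-out interact can be sketched as a pure function. All names here (`TestPlan`, `is_cacheable`, the field names) are hypothetical, and the per-test-type default is passed in as `default_eligible` instead of being derived from the test kind.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum CacheDirective {
    Never,
    Always,
    Auto,
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Transport {
    Stdio,
    Http,
}

/// Hypothetical view of one parsed test, reduced to what eligibility needs.
#[derive(Clone, Copy, Debug)]
pub struct TestPlan {
    pub directive: CacheDirective,
    pub has_hooks: bool,
    pub transport: Transport,
    pub server_version_pinned: bool,
    pub has_external_effect: bool,
    pub is_compliance: bool,
    pub default_eligible: bool, // per-test-type default, supplied by the caller
}

/// Decides whether the cache should ever see this test.
pub fn is_cacheable(t: &TestPlan) -> bool {
    // Hard exclusions win even over `cache: always`.
    if t.has_hooks || t.has_external_effect {
        return false;
    }
    let unpinned_http = t.transport == Transport::Http && !t.server_version_pinned;
    // Compliance tests are exempt from the HTTP-without-pin exclusion.
    if unpinned_http && !t.is_compliance {
        return false;
    }
    match t.directive {
        CacheDirective::Never => false,
        CacheDirective::Always => true,
        CacheDirective::Auto => t.default_eligible,
    }
}
```

Because the function only reads parsed fields and has no side effects, it can run during planning, which is what lets the reporter count cacheable tests before anything executes.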

The full eligibility table lives in docs/cache-eligibility.md.

CLI surface

mcptest cache is the CLI namespace for cache operations. The subcommands are still being implemented; the intended shape is documented here so you can plan against it.

# show every cache entry, grouped by config file
mcptest cache list

# summary statistics
mcptest cache stats

# remove every entry
mcptest cache clear

# remove entries that match a predicate
mcptest cache prune --older-than 7d
mcptest cache prune --server remote_api
mcptest cache prune --match "tools/list_*"

The output of mcptest cache stats looks like this:

mcptest cache stats
~/.cache/mcptest/

  entries:        1,247
  total size:     342 MiB
  oldest entry:   2026-04-22 (24 days)
  hit rate (7d):  84.2%   (3,891 hits, 731 misses)
  evictions (7d): 412     (412 TTL, 0 LRU)
  cap:            1 GiB

The hit rate is computed from the local hit and miss counters the runner emits to the cache directory's index.sqlite, so it reflects whichever projects run on this machine. CI runners and developer machines have their own numbers.

Per-run flags

Two global flags on mcptest run control cache behavior for a single invocation:

--no-cache is the right tool when you suspect a stale cache hit and want to force real calls; --cache refresh is for periodic cache rebuilds.

Cache key composition

The cache key is the lowercase hex SHA-256 of a canonical serialization of this struct:

pub struct CacheKey {
    pub server_spec: ServerSpec,        // canonicalized
    pub tool_name: String,
    pub normalized_args: serde_json::Value,
    pub server_version_pin: String,     // upstream Revision
    pub runner_version: String,         // env!("CARGO_PKG_VERSION")
    pub matcher_set: Vec<MatcherSpec>,  // sorted by ID, canonicalized
}

Six things go into the key:

  1. server_spec. The full server specification after CLI override resolution. A test that runs against a URL is a different key from the same test running against a stdio command.
  2. tool_name. The literal name passed to the server.
  3. normalized_args. The args object after canonicalization (sort object keys, strip trailing whitespace, normalize Unicode to NFC). Two semantically identical args produce the same key.
  4. server_version_pin. The negotiated MCP Revision. A protocol-version change invalidates the cache for that server.
  5. runner_version. The mcptest version that produced the entry. Upgrading mcptest invalidates every entry.
  6. matcher_set. The matchers applied to the response, with their rule IDs sorted and canonicalized.

The key is deliberately conservative. A few false misses are acceptable; a single false hit is not. If you find a case where the runner replays a stale answer, file a GitHub issue: that is the cache invariant breaking and we want to know.

The canonicalization rules are documented in detail (Unicode form, JSON key sort order, whitespace handling); the canonicalizer lives in mcptest-core::cache::canonical and is exercised by golden tests.
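
To illustrate why canonicalization makes semantically identical args produce the same key, here is a simplified sketch. It is not the mcptest canonicalizer: `Arg` is a hypothetical stand-in for serde_json::Value, trailing whitespace is assumed to be stripped from string values, strings are quoted with Rust's debug formatting instead of JSON escaping, and both NFC normalization and the final SHA-256 step are omitted because they need crates outside the standard library.

```rust
use std::collections::BTreeMap;

/// Hypothetical, reduced stand-in for a JSON value.
pub enum Arg {
    Str(String),
    Num(i64),
    Obj(Vec<(String, Arg)>),
}

/// Serializes an argument tree so that key order and trailing whitespace
/// in string values do not affect the output.
pub fn canonical(arg: &Arg) -> String {
    match arg {
        Arg::Str(s) => format!("{:?}", s.trim_end()),
        Arg::Num(n) => n.to_string(),
        Arg::Obj(fields) => {
            // A BTreeMap iterates in sorted key order.
            let sorted: BTreeMap<&str, &Arg> =
                fields.iter().map(|(k, v)| (k.as_str(), v)).collect();
            let body: Vec<String> = sorted
                .iter()
                .map(|(k, v)| format!("{:?}:{}", k, canonical(v)))
                .collect();
            format!("{{{}}}", body.join(","))
        }
    }
}
```

Two objects that differ only in key order or in trailing whitespace serialize to the same string here, so hashing that string would yield the same key for both.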

CI integration

The cache is most valuable in CI when the build cache restores the cache directory between runs. The pattern is the same across CI vendors: tell the build cache to save and restore .mcptest-cache/, then point mcptest at the same path.

GitHub Actions

- name: cache mcptest
  uses: actions/cache@v4
  with:
    path: .mcptest-cache/
    key: mcptest-${{ runner.os }}-${{ hashFiles('mcptest.yml', 'tests/**') }}

- name: run mcptest
  run: mcptest run --cache-dir .mcptest-cache/

The cache key includes the OS (Linux and macOS runners cannot share a cache) and a content hash of mcptest.yml and every file under tests/. Any change to those files produces a new key. With only key: configured, a new key is a full miss: actions/cache restores nothing and the run starts from an empty cache directory.

To fall back to the most recent cache for the same OS instead, add restore-keys:

- name: cache mcptest
  uses: actions/cache@v4
  with:
    path: .mcptest-cache/
    key: mcptest-${{ runner.os }}-${{ hashFiles('mcptest.yml', 'tests/**') }}
    restore-keys: |
      mcptest-${{ runner.os }}-

Now a near-miss (one test file changed) restores from the most recent cache for this OS and the runner pays only for the actually invalidated entries.

GitLab CI

mcptest:
  cache:
    key:
      files:
        - mcptest.yml
        - tests/**/*.yml
    paths:
      - .mcptest-cache/
  script:
    - mcptest run --cache-dir .mcptest-cache/

CircleCI

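# CircleCI's checksum template hashes a single file, not a directory, so
# fold the test files into one manifest first. The manifest name
# .mcptest-tests.sha256 is illustrative.
- run:
    name: hash test files
    command: find tests -type f -print0 | sort -z | xargs -0 sha256sum > .mcptest-tests.sha256

- restore_cache:
    keys:
      - mcptest-{{ checksum "mcptest.yml" }}-{{ checksum ".mcptest-tests.sha256" }}
      - mcptest-

- run: mcptest run --cache-dir .mcptest-cache/

- save_cache:
    paths: [.mcptest-cache/]
    key: mcptest-{{ checksum "mcptest.yml" }}-{{ checksum ".mcptest-tests.sha256" }}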

Troubleshooting

"Cache hit but I changed the test"

Symptoms: you edited a test (or a server config) and the runner still reports a hit.

Almost always one of:

When in doubt, run --no-cache once to confirm the test still passes against a real call, then --cache refresh to rebuild.

"Cache is making my tests stale"

Symptoms: your tests pass in CI but fail locally (or vice versa), and the suspect is a stale cache entry.

This should not happen, but if it does:

"Cache too large"

Symptoms: ~/.cache/mcptest/ is several gigabytes and you want the disk back.

The cache has a default cap of 1 GiB; when the cap is exceeded, LRU eviction kicks in. If your cap is set higher (via the cache block in mcptest.yml), or if many projects share the same cache root, you may want to prune manually:

# remove entries older than 7 days
mcptest cache prune --older-than 7d

# remove every entry
mcptest cache clear

The TTL on every entry is 7 days by default; entries older than that are evicted on next access regardless of LRU position. If you never re-run an old project, its entries age out naturally; running mcptest cache prune --older-than 7d just makes the cleanup eager.
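
The interaction between the TTL and the size cap can be sketched as a batch planner. This is an assumption-laden illustration: `Entry` and `plan_evictions` are hypothetical, and the sketch decides everything in one pass, whereas the text above describes TTL eviction as happening on access.

```rust
use std::time::Duration;

/// Hypothetical cache entry metadata.
pub struct Entry {
    pub key: String,
    pub size: u64,      // bytes
    pub age: Duration,  // time since the entry was written
    pub idle: Duration, // time since the entry was last used
}

/// Returns the keys to evict: expired entries first (TTL, regardless of
/// LRU position), then least recently used entries until the remaining
/// total fits under the cap.
pub fn plan_evictions(entries: &[Entry], ttl: Duration, cap_bytes: u64) -> Vec<String> {
    let mut evicted = Vec::new();
    let mut live: Vec<&Entry> = Vec::new();
    for e in entries {
        if e.age > ttl {
            evicted.push(e.key.clone());
        } else {
            live.push(e);
        }
    }
    // Longest idle first, so the least recently used entries go first.
    live.sort_by(|a, b| b.idle.cmp(&a.idle));
    let mut total: u64 = live.iter().map(|e| e.size).sum();
    for e in live {
        if total <= cap_bytes {
            break;
        }
        total -= e.size;
        evicted.push(e.key.clone());
    }
    evicted
}
```

With the defaults described above, `ttl` would be seven days and `cap_bytes` would be 1 GiB.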

"Cache stats look wrong"

Symptoms: mcptest cache stats reports a hit rate of zero or something nonsensical.

The stats database (index.sqlite in the cache directory) is maintained on a best-effort basis. A force-killed runner can leave the database slightly stale; a corrupted database is a known failure mode on filesystems that lose locking semantics under contention (NFS, some network shares). The fix is to delete the stats database:

rm "$(mcptest cache stats --path-only)/index.sqlite"

The next run will rebuild it as it goes. You will lose historical hit-rate data but no cached entries.

The LLM-judge verdict cache

LLM judges are non-deterministic and slow, so mcptest ships a separate verdict cache that keys on the juror's inputs (model, prompt-template version, criteria, prompt, response) rather than on the call shape. The cache is opt-in because a stale cached verdict can mask drift.

Opt in inside the YAML evals: block:

evals:
  cache:
    verdicts: true

Override the YAML setting on the command line with --no-verdict-cache. The flag is accepted on mcptest run and mcptest eval (anywhere a jury fires) and always wins over the YAML opt-in, so a spot check that needs fresh verdicts does not have to edit YAML:

# Force fresh verdicts even when the YAML opts into caching.
mcptest eval --no-verdict-cache

# Same for the runner.
mcptest run --no-verdict-cache

When the cache is disabled (default, or by --no-verdict-cache), no verdicts are read or written. When enabled, verdicts live under ~/.cache/mcptest/verdicts/ with a 24-hour default TTL.
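
The precedence between the YAML opt-in and the command-line flag reduces to a small rule, sketched here with a hypothetical helper: `yaml_verdicts` is the value of evals.cache.verdicts if present, and `no_verdict_cache` is true when --no-verdict-cache was passed.

```rust
/// Returns true only when the YAML opts in and the flag is absent.
/// The verdict cache is off by default, and the flag always wins.
pub fn verdict_cache_enabled(yaml_verdicts: Option<bool>, no_verdict_cache: bool) -> bool {
    !no_verdict_cache && yaml_verdicts.unwrap_or(false)
}
```

There is deliberately no branch in which the flag turns the cache on: in this sketch, as in the text above, the command line can only disable it.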

See also