YAML configuration reference

The condensed reference for every key the mcptest configuration loader accepts. The JSON Schema at schemas/v1.json is the source of truth; the longer narrative walk-through with worked examples is in yaml-reference.md. This page is the lookup table.

Validate your config: mcptest schema | python -m jsonschema -i config.yml /dev/stdin

Every field below appears in the schema today. Fields that are reserved on the schema for a later release are called out inline.

Top-level keys

The schema requires at least one of servers: or corpus: at the top level. Every other top-level key is optional. Unknown keys fail the load with a pointer to the offending field.

Key	Type	Required	Purpose
`servers`	object	yes (unless `corpus`)	Named MCP servers under test.
`imports`	array of strings	no	Other YAML files to merge in.
`variables`	object	no	Author-defined values for `${name}` interpolation.
`tools`	array of objects	no	Tool-call tests.
`compliance`	array or object	no	Protocol-level checks against the server.
`evals`	array of objects	no	Rubric or model-graded evaluations.
`model_compatibility`	array of strings	no	Metadata list of model identifiers the suite targets.
`fixtures`	object	no	Reusable named error scenarios.
`performance`	object	no	Suite-wide latency and timeout budgets.
`corpus`	object	yes (unless `servers`)	Compliance corpus metadata for rule-definition files.
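
The two top-level rules above (at least one of servers: or corpus:, and no unknown keys) can be sketched in a few lines of Python. This is an illustration only, not the loader's code: the function name, the error strings, and the pointer format are assumptions; only the key names come from the table.

```python
# Hypothetical sketch of the top-level checks; not the real mcptest loader.
KNOWN_TOP_LEVEL_KEYS = {
    "servers", "imports", "variables", "tools", "compliance", "evals",
    "model_compatibility", "fixtures", "performance", "corpus",
}


def check_top_level(config: dict) -> list[str]:
    """Return a list of problems found at the top level of a parsed config."""
    errors = []
    # At least one of the two anchor keys must be present.
    if "servers" not in config and "corpus" not in config:
        errors.append("config must define at least one of 'servers' or 'corpus'")
    # Unknown keys are reported with a pointer to the offending field.
    for key in config:
        if key not in KNOWN_TOP_LEVEL_KEYS:
            errors.append(f"unknown top-level key: /{key}")
    return errors
```

An empty list means the top level passes both rules; schema validation of the values themselves is a separate step.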

Note: the schema does not currently validate dedicated top-level keys for cache, reporters, auth, profiles, or defaults. Cache behavior is configured per-test under cache: on a tool or compliance entry (see tools[].cache) and is documented in cache-eligibility.md. Reporters are configured on the CLI (--reporter, --output), not in YAML. Auth is configured per-server under servers.<name>.auth. Profile selection is a CLI flag (--profile); the YAML profile: keyword on individual tests is in flight. Suite-wide defaults are written as ordinary top-level keys (performance.default_timeout_ms, variables.*) rather than a dedicated defaults: block.

`servers`

Named MCP servers under test. Each key is a name the test blocks reference via server:. The value is either a subprocess specification (command:) or a URL specification (url:). Exactly one of the two shapes is allowed per server.

Minimal example:

servers:
  local:
    command: ["./target/debug/my-mcp-server"]

Real-world example:

servers:
  filesystem:
    command: ["npx", "-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
    env:
      LOG_LEVEL: "info"

  remote_api:
    url: "https://mcp.example.com/v1"
    auth:
      bearer_token_env: "MCPTEST_REMOTE_API_TOKEN"
    wait_for_ready: "https://mcp.example.com/healthz"
    headers:
      X-Tenant: "acme"
    http:
      timeout: 30s
      connect_timeout: 5s
      max_redirects: 5
      tls:
        min_version: "1.3"
        ca_bundle_path: "/etc/ssl/internal-ca.pem"

Subprocess server fields

Field	Type	Required	Default	Notes
`command`	array of strings	yes	`-`	Argv to spawn. First element is the executable; remaining elements are arguments.
`env`	object of string to string	no	`{}`	Extra env vars passed to the child process. `${VAR}` interpolation runs against the parent env.

URL server fields

Field	Type	Required	Default	Notes
`url`	string (URI)	yes	`-`	HTTP or SSE endpoint.
`auth`	object	no	`-`	One of bearer or OAuth.
`wait_for_ready`	string	no	`-`	URL or path the runner polls until it returns 2xx.
`headers`	object	no	`{}`	Custom headers, literal or env-backed. `Authorization` and `Proxy-Authorization` are rejected here. See `server.headers`.
`http`	object	no	`{}`	HTTP transport options. See `server.http`.
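
The "exactly one of the two shapes" rule for a server entry can be sketched as below. The function name and the error text are hypothetical; the real check lives in the JSON Schema.

```python
# Hypothetical sketch: classify a server entry as subprocess or URL.
def server_kind(name: str, spec: dict) -> str:
    """Return 'subprocess' or 'url', rejecting entries with both or neither."""
    has_command = "command" in spec
    has_url = "url" in spec
    # Equal flags mean both keys or neither key is present; either is invalid.
    if has_command == has_url:
        raise ValueError(f"servers.{name}: set exactly one of 'command' or 'url'")
    return "subprocess" if has_command else "url"
```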

`auth`

Required when a URL server needs credentials. Exactly one of bearer_token_env or oauth must be present. The headers and mtls keys on the auth: block are reserved for v1.1 and validated permissively today (the loader prints a warning). The auth architecture is tracked in a separate ticket.

Minimal example (bearer):

servers:
  remote_api:
    url: "https://mcp.example.com/v1"
    auth:
      bearer_token_env: "MCPTEST_REMOTE_API_TOKEN"

OAuth example:

servers:
  remote_api:
    url: "https://mcp.example.com/v1"
    auth:
      oauth:
        client_id_env: "MCPTEST_OAUTH_CLIENT_ID"
        authorization_url: "https://auth.example.com/oauth/authorize"
        token_url: "https://auth.example.com/oauth/token"
        scopes:
          - "mcp:read"
          - "mcp:invoke"

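bearer_token_env names the environment variable that holds the token. For OAuth, the matching client secret is read by convention from ${client_id_env}_SECRET, that is, the value of client_id_env with _SECRET appended (MCPTEST_OAUTH_CLIENT_ID_SECRET in the example above). Credentials never appear in the YAML.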

`server.headers`

Map of header name to either a literal string (with ${VAR} interpolation) or {env: NAME} (read from the environment at connect time).

servers:
  saas:
    url: "https://api.example.com/mcp"
    headers:
      X-Tenant: "acme"
      X-API-Key:
        env: ACME_TENANT_API_KEY
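
A rough Python sketch of how the two header forms might be resolved at connect time. It is not the runner's implementation: the function name is hypothetical, only the plain ${VAR} form is handled, and the case-insensitive match on the rejected header names is an assumption.

```python
import re

# Headers the reference says are rejected under headers:.
FORBIDDEN = {"authorization", "proxy-authorization"}
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_headers(headers: dict, env: dict) -> dict:
    """Turn the YAML headers map into concrete header values."""
    resolved = {}
    for name, spec in headers.items():
        # Assumption: header names are compared case-insensitively.
        if name.lower() in FORBIDDEN:
            raise ValueError(f"header {name!r} is not allowed under headers:")
        if isinstance(spec, dict):
            # {env: NAME} form: read the whole value from the environment.
            resolved[name] = env[spec["env"]]
        else:
            # Literal form: substitute each ${VAR} from the environment.
            resolved[name] = _VAR.sub(lambda m: env[m.group(1)], spec)
    return resolved
```

Passing dict(os.environ) as env reproduces the connect-time lookup; a missing variable surfaces here as a KeyError.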

`server.http`

HTTP transport tuning. Duration fields take a unit suffix (30s, 500ms). Bare numbers without a unit are rejected.

Field	Type	Default	Notes
`timeout`	duration string	`30s`	Per-request body timeout.
`connect_timeout`	duration string	`5s`	TCP and TLS connect timeout.
`max_redirects`	integer 0 to 50	`5`	`0` disables redirect following.
`user_agent_override`	boolean	`true`	Set a User-Agent identifying mcptest.
`tls.insecure_skip_verify`	boolean	`false`	Disable TLS verification. Use only against private staging.
`tls.ca_bundle_path`	string	`-`	PEM-encoded CA bundle path.
`tls.min_version`	`"1.2"` or `"1.3"`	`"1.2"`	Minimum TLS protocol version.
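
The duration rule (unit suffix required, bare numbers rejected) can be illustrated with a small parser. This sketch assumes only the two suffixes shown on this page, s and ms; the real loader may accept others, and the function name is hypothetical.

```python
import re

# Assumption: only the "s" and "ms" suffixes shown in this reference.
_DURATION = re.compile(r"(\d+)(ms|s)")
_UNIT_MS = {"ms": 1, "s": 1000}


def parse_duration_ms(value) -> int:
    """Convert a duration string such as '30s' or '500ms' to milliseconds."""
    match = _DURATION.fullmatch(str(value))
    if match is None:
        # Covers bare numbers (30, "30") and unknown suffixes alike.
        raise ValueError(f"invalid duration {value!r}: expected e.g. '30s' or '500ms'")
    number, unit = match.groups()
    return int(number) * _UNIT_MS[unit]
```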

Mutual TLS (mtls) is deferred to v1.1.

`imports`

Array of paths (relative or absolute) to other YAML files the loader merges into the current file. Later imports override earlier ones; the current file overrides all its imports.

imports:
  - "./shared/servers.yml"
  - "./shared/variables.yml"

Real-world example with a shared servers file and a per-environment override:

imports:
  - "./shared/servers.yml"
  - "./shared/variables.yml"
  - "./environments/staging.yml"

tools:
  - name: "uses an imported server"
    server: shared_filesystem
    tool: "list_directory"
    args:
      path: "/tmp"

Cycle detection and the full merge implementation land in a follow-up ticket. Today the loader recognizes the array and prints a clear error if any path fails to resolve.
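
Until the full merge lands, the stated override order (later imports beat earlier ones, and the current file beats all imports) can be pictured with the sketch below. It assumes mappings merge recursively and every other value, arrays included, is replaced wholesale; the reference does not specify that, so treat the merge depth as a guess. Function names are hypothetical.

```python
# Hypothetical sketch of import layering; merge depth is an assumption.
def _deep_merge(base: dict, override: dict) -> dict:
    """Return base with override applied; nested mappings merge recursively."""
    result = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = _deep_merge(result[key], value)
        else:
            # Scalars and arrays are replaced, not combined.
            result[key] = value
    return result


def merge_configs(imported: list[dict], current: dict) -> dict:
    """Apply imports in listed order, then the current file on top."""
    merged: dict = {}
    for layer in [*imported, current]:
        merged = _deep_merge(merged, layer)
    return merged
```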

`variables`

Author-defined named variables usable inside any string field via ${name} interpolation. Each entry carries either a literal value: or a from_env: reference, never both.

Minimal example:

variables:
  fixture_path:
    value: "/tmp/mcptest-fixture.txt"

Real-world example with both forms and a default:

variables:
  api_base:
    value: "https://staging.example.com"
  account_id:
    from_env: "MCPTEST_ACCOUNT_ID"
    default: "acct_demo"
  auth_token:
    from_env: "MCPTEST_TOKEN"

Per-entry fields:

Field	Type	Required	Notes
`value`	string, number, or boolean	one of	Literal value.
`from_env`	string	one of	Env var name to read.
`default`	string	no	Fallback when `from_env` is unset.

Resolution precedence (highest to lowest): process env, dotenv file, default:, value:. References that fail to resolve raise a structured error before any test runs. The full interpolation syntax (${VAR}, ${VAR:-default}, ${VAR:?}, $VAR, $$) is documented in yaml-reference.md.
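
The precedence order can be read as the lookup below. This is a sketch under assumptions: process_env and dotenv are plain dicts standing in for the real sources, the function name and exception types are hypothetical, and the structured error is reduced to a Python exception.

```python
# Hypothetical sketch of variable resolution; not the loader's code.
def resolve_variable(name: str, entry: dict, process_env: dict, dotenv: dict):
    """Resolve one variables: entry following the documented precedence."""
    if "value" in entry and "from_env" in entry:
        raise ValueError(f"variables.{name}: use value or from_env, never both")
    if "from_env" in entry:
        var = entry["from_env"]
        # Process env wins over the dotenv file.
        for source in (process_env, dotenv):
            if var in source:
                return source[var]
        # default: is the fallback when the env var is unset everywhere.
        if "default" in entry:
            return entry["default"]
        raise LookupError(f"variables.{name}: env var {var!r} is unset and has no default")
    if "value" in entry:
        return entry["value"]
    raise ValueError(f"variables.{name}: needs value or from_env")
```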

`tools`

Array of tool-call tests. Each entry invokes a single tool on a named server and runs assertions against the result.

Minimal example:

tools:
  - name: "lists tools without error"
    server: local
    tool: "list_directory"
    args:
      path: "/tmp"
    expect:
      - target: "result.content"
        matcher:
          schema:
            type: array
            minItems: 1

Real-world example with budgets and a header check:

tools:
  - name: "search returns content for a known query"
    server: remote_api
    tool: "search"
    args:
      query: "anthropic"
    timeout_ms: 5000
    server_version: "1.4.2"
    cache: auto
    effects: []
    expect:
      assertions:
        - target: "result.content[0].text"
          matcher:
            contains: "Anthropic"
          message: "result should mention Anthropic"
      max_duration_ms: 2000
      max_response_tokens: 4000
      response_headers:
        content-type: "application/json"
      response_headers_absent: ["set-cookie"]

Per-entry fields:

Field	Type	Required	Default	Notes
`name`	string	yes	`-`	Human-readable test name. Appears in reporter output.
`server`	string	yes	`-`	Key into the top-level `servers:` map.
`tool`	string	yes	`-`	Tool name the server exposes.
`args`	object	no	`{}`	JSON-serializable arguments. Values may use `${name}` interpolation.
`expect`	array or object	no	`[]`	Assertions. See `expect`.
`timeout_ms`	integer >= 1	no	`-`	Per-test timeout override.
`inject_error`	string	no	`-`	Fixture name from `fixtures.errors[]`. See `fixtures`.
`cache`	enum	no	`auto`	One of `auto`, `always`, `never`. See Cache directive.
`effects`	array	no	`[]`	One or more of `external`, `local`, `filesystem`.
`server_version`	string	no	`-`	Version pin required for HTTP-transport cache eligibility.
`hooks`	object	no	`-`	Per-test lifecycle hooks. Declaring any hooks opts the test out of caching.

A style: stepwise | single keyword is not in the schema today. mcptest models stepwise interactions by chaining flat tool tests that share ${variables}; see yaml-reference.md for the worked shape and the tracking ticket.

A capture keyword (capture a value from one response and reuse it in a later step) is in flight; see the yaml-reference deferred summary. A retry keyword is not yet on the schema; for now, wrap flaky tests in test-level timeout_ms and a higher suite-wide budget.

A call: keyword (raw JSON-RPC method plus params, sidestepping the tool-call abstraction) lives on compliance: entries today via the check field.

Cache directive

cache: accepts one of three string values:

auto. Defer to the per-test-type default (see cache-eligibility.md).
always. Force the test to be cacheable, subject to the hard exclusions (hooks, external effects).
never. Opt out unconditionally; beats every other rule.
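
One way to read the three values as a decision procedure is sketched below. The type_default argument is a hypothetical boolean standing in for the per-test-type rules in cache-eligibility.md, and applying the hard exclusions to auto as well as always is an assumption.

```python
# Hypothetical sketch of the cache decision; not the runner's logic.
def is_cacheable(directive: str, type_default: bool, has_hooks: bool, effects: list) -> bool:
    """Decide whether a test result may be cached."""
    # never beats every other rule.
    if directive == "never":
        return False
    # Hard exclusions: declared hooks or an external effect.
    if has_hooks or "external" in effects:
        return False
    if directive == "always":
        return True
    if directive == "auto":
        # Defer to the per-test-type default.
        return type_default
    raise ValueError(f"cache must be auto, always, or never, got {directive!r}")
```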

The cache design and the storage layer are tracked separately.

`expect` block

Either an array of per-target assertions (short form) or an object with assertions: plus budgets and header checks (long form).

Short form:

expect:
  - target: "result.content[0].text"
    matcher:
      contains: "hello"

Long form:

expect:
  assertions:
    - target: "result.content[0].text"
      matcher:
        contains: "hello"
  max_duration_ms: 500
  response_headers:
    content-type: "application/json"
  response_headers_absent: ["set-cookie"]

Assertion fields:

Field	Type	Required	Notes
`target`	string	yes	JSONPath-style. Example: `result.content[0].text`.
`matcher`	object	yes	One key. See Matchers.
`message`	string	no	Friendly description used in reporter output.

Long-form expectation fields:

Field	Type	Required	Notes
`assertions`	array of objects	no	Per-target assertions.
`max_duration_ms`	integer >= 1	no	Wall-clock budget for the step.
`max_response_tokens`	integer or object	no	Token cap. Object form takes `budget`, `tokenizer`, `mode`, `image_cost`.
`response_headers`	object	no	Header name to literal string or one of `regex`, `schema`, `exists`, `contains`.
`response_headers_absent`	array of strings	no	Headers that must not be present.
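
Because expect: accepts two shapes, tooling that reads configs usually folds the short form into the long one first. A minimal sketch, assuming that normalization is acceptable for your use; the function name is hypothetical and the loader may represent this differently.

```python
# Hypothetical sketch: fold the short expect form into the long form.
def normalize_expect(expect) -> dict:
    """Return the long-form object for either expect shape."""
    if expect is None:
        return {"assertions": []}
    if isinstance(expect, list):
        # Short form: a bare array of per-target assertions.
        return {"assertions": list(expect)}
    if isinstance(expect, dict):
        # Long form: keep budgets and header checks, default assertions to [].
        return {"assertions": [], **expect}
    raise TypeError("expect must be an array or an object")
```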

The reporter output model (failure shape, exit codes, JSON envelope) is tracked separately.

Matchers

A matcher object carries exactly one key. v1 ships six matchers.

Matcher	Argument	Notes
`exact`	any	Deep equality.
`contains`	string, object, or array	Case-sensitive substring for strings, sub-object for objects, array multiset membership for arrays. Use `icontains` for case-insensitive substring, `exact` for full equality. Non-string scalars compare by equality.
`regex`	string	Pattern matched against the stringified value.
`schema`	object	Inline JSON Schema the value must satisfy.
`snapshot`	string or object	Record-on-first-run, diff-on-subsequent-runs.
`llm-judge`	object	LLM panel with a rubric. See LLM evals guide.
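
The contains row packs several behaviors into one cell; the sketch below spells them out. It is an illustration, not the matcher's source: sub-object matching is taken to be one level deep with equality on values, which is an assumption, and the function name is hypothetical.

```python
# Hypothetical sketch of the contains matcher semantics.
def contains(actual, expected) -> bool:
    """Approximate the documented behavior of the contains matcher."""
    if isinstance(expected, str):
        # Case-sensitive substring.
        return isinstance(actual, str) and expected in actual
    if isinstance(expected, dict):
        # Sub-object: every expected key is present with an equal value.
        return isinstance(actual, dict) and all(
            key in actual and actual[key] == value for key, value in expected.items()
        )
    if isinstance(expected, list):
        if not isinstance(actual, list):
            return False
        # Multiset membership: duplicates in expected need duplicates in actual.
        remaining = list(actual)
        for item in expected:
            if item not in remaining:
                return False
            remaining.remove(item)
        return True
    # Non-string scalars compare by equality.
    return actual == expected
```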

llm-judge fields:

matcher:
  llm-judge:
    rubric: "Answer must mention the service name and the release tag."
    threshold: 0.7
    model: "claude-opus-4-7"

`compliance`

Either a bare array of compliance test entries (short form) or an object with optional baseline: and a required tests: array (long form).

Short-form example:

compliance:
  - name: "negotiates capabilities on initialize"
    server: filesystem
    check: "initialize"

Long-form example with a baseline file:

compliance:
  baseline: "./compliance-baseline.yml"
  tests:
    - name: "negotiates capabilities on initialize"
      server: filesystem
      check: "initialize"
      expect:
        - target: "result.protocolVersion"
          matcher:
            regex: "^2\\d{3}-\\d{2}-\\d{2}$"
    - name: "advertises required tools"
      server: filesystem
      check: "tools/list"
      cache: auto
      expect:
        - target: "result.tools"
          matcher:
            schema:
              type: array
              minItems: 1

Per-entry fields:

Field	Type	Required	Notes
`name`	string	yes	Human-readable test name.
`server`	string	yes	Key into the top-level `servers:` map.
`check`	string	yes	Compliance check identifier (e.g. `initialize`, `tools/list`).
`expect`	array or object	no	Same shape as `tools[].expect`.
`cache`	enum	no	See Cache directive.
`effects`	array	no	See `tools`.
`server_version`	string	no	Optional version pin. Compliance tests do not require a pin for cache eligibility.
`hooks`	object	no	Per-test lifecycle hooks.

The compliance baseline file shape is documented in compliance-baseline.md.

`evals`

Rubric or model-graded evaluations. v1 surface is intentionally small. The full grader implementation lands in W7; see guides/llm-evals.md.

Minimal example:

evals:
  - name: "summary stays on topic"
    server: remote_api
    prompt: "Summarize the latest deployment."
    threshold: 0.7

Real-world example with a rubric:

evals:
  - name: "release notes mention service and tag"
    server: remote_api
    prompt: "Write release notes for the last deploy."
    rubric: |
      The notes must mention the service name and the release tag.
      Reject responses that hallucinate a version or omit the tag.
    threshold: 0.8

Per-entry fields:

Field	Type	Required	Default	Notes
`name`	string	yes	`-`	Test name.
`server`	string	yes	`-`	Key into `servers:`.
`prompt`	string	yes	`-`	Prompt sent to the model or tool.
`rubric`	string	no	`-`	Free-form grading rubric.
`threshold`	number in `[0, 1]`	no	`0.7`	Pass threshold.

Evaluations run via mcptest eval. The default mcptest run invocation skips them so the CI gate stays cheap.

`model_compatibility`

Pure metadata. Each entry is a model identifier the test suite targets. Reporters use this to label results.

Minimal example:

model_compatibility:
  - "claude-opus-4-7"

Real-world example for a multi-model suite:

model_compatibility:
  - "claude-opus-4-7"
  - "claude-sonnet-4-5"
  - "gpt-4o"
  - "gpt-5-preview"
  - "gemini-2.5-pro"

The full model-compatibility matrix (cross-model diff, baseline vs candidate gating) is the v1.1 wedge feature; see guides/model-compatibility.md.

`fixtures`

Declares reusable named error scenarios. Any tool test can trigger one by name via inject_error:.

Minimal example:

fixtures:
  errors:
    - name: rate_limited
      code: -32000
      message: "Rate limit exceeded"
      applies_to: any

Real-world example, scoped to a specific tool:

fixtures:
  server: mock_github
  errors:
    - name: rate_limited
      code: -32000
      message: "GitHub API rate limit exceeded"
      tool: create_issue
    - name: auth_expired
      code: -32001
      message: "OAuth token expired"
      applies_to: any

tools:
  - name: "handles rate limiting gracefully"
    server: mock_github
    tool: create_issue
    inject_error: rate_limited
    args:
      title: "Triage queue overflow"
    expect:
      - target: "result.isError"
        matcher:
          exact: true

Per-entry fields under fixtures.errors[]:

Field	Type	Required	Notes
`name`	string	yes	Unique identifier referenced by `inject_error:`.
`code`	integer	yes	JSON-RPC error code.
`message`	string	yes	Human-readable error message.
`tool`	string	one of	Scopes the error to a single tool.
`applies_to`	`"any"`	one of	Applies to every tool on the server.

Exactly one of tool or applies_to must be present.
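
A sketch of how inject_error: might be resolved against fixtures.errors[]. The function name is hypothetical, and raising when a tool-scoped fixture is used from a different tool is an assumption; the reference does not say what the runner does in that case.

```python
# Hypothetical sketch of fixture lookup; not the runner's code.
def find_injected_error(fixtures: dict, inject_error: str, tool: str) -> dict:
    """Look up a named error fixture and check that it applies to the tool."""
    for entry in fixtures.get("errors", []):
        if entry["name"] != inject_error:
            continue
        # Exactly one of tool or applies_to must be present.
        if ("tool" in entry) == ("applies_to" in entry):
            raise ValueError(f"fixture {inject_error!r}: set exactly one of tool or applies_to")
        if "applies_to" in entry or entry["tool"] == tool:
            return {"code": entry["code"], "message": entry["message"]}
        # Assumption: a tool-scoped fixture cannot be injected into another tool.
        raise ValueError(f"fixture {inject_error!r} is scoped to tool {entry['tool']!r}")
    raise KeyError(inject_error)
```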

`performance`

Suite-wide soft budgets.

Minimal example:

performance:
  default_timeout_ms: 10000

Real-world example:

performance:
  default_timeout_ms: 15000
  p95_latency_ms: 2000

Field	Type	Required	Default	Notes
`default_timeout_ms`	integer >= 1	no	`-`	Default per-test timeout. Individual `tools[].timeout_ms` overrides.
`p95_latency_ms`	integer >= 1	no	`-`	Soft p95 latency budget; reporters highlight tests that breach it.
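
The override relationship between the suite-wide default and a per-test timeout amounts to the lookup below. The function name is hypothetical, and returning None when neither value is set is a placeholder: this page does not document the runner's built-in fallback.

```python
# Hypothetical sketch of timeout selection for one tool test.
def effective_timeout_ms(test: dict, performance: dict):
    """Pick the timeout for a test: per-test override, else suite default."""
    if "timeout_ms" in test:
        return test["timeout_ms"]
    # None means neither level set a timeout; the runner's own fallback applies.
    return performance.get("default_timeout_ms")
```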

Deeper performance work (load generation, sustained throughput) lives behind a separate mcptest bench command and is out of scope for v1.

`corpus`

Compliance corpus metadata. Corpus files declare reusable rule definitions consumed by the future compliance runner. They do not require a servers: block because they describe what should be tested, not which server to target.

Minimal example:

corpus:
  category: PROTO
  rules:
    - rule_id: PROTO-001
      name: "initialize negotiates a supported protocol version"

Real-world example:

corpus:
  category: PROTO
  spec_version: v2025-06-18
  tickets:
    - 
    - 
  rules:
    - rule_id: PROTO-001
      name: "initialize negotiates a supported protocol version"
      description: |
        The server must respond to `initialize` with a protocolVersion the
        client offered, or reject with an explicit unsupported-version error.
      assertions:
        - description: "initialize succeeds with the canonical client version"
          call:
            method: "initialize"
            params:
              protocolVersion: "2025-06-18"
              capabilities: {}
          expect:
            result:
              protocolVersion: "2025-06-18"
            match_mode: subset
      applies_to: ["stdio", "http"]
      notes: "Authored from PRD section on protocol negotiation."

Top-level corpus fields:

Field	Type	Required	Notes
`category`	enum	yes	`PROTO`, `SCHEMA`, `SEQ`, `TOOL`, `RES`, `PROMPT`, `AUTH`, `TRANSPORT`, `HEADER`, `REPLAY`, `EDGE`, or `COMPAT`.
`spec_version`	enum	no	One of `v2024-11-05`, `v2025-03-26`, `v2025-06-18`, `draft`.
`tickets`	array of strings	no	Linear ticket IDs that own the rules in this file.
`rules`	array of objects	yes	At least one rule entry.

Per-rule fields:

Field	Type	Required	Notes
`rule_id`	string	yes	Pattern `PREFIX-NNN`.
`name`	string	yes	Short summary, repeated from the registry.
`description`	string	no	Longer description.
`assertions`	array of objects	no	Ordered call+expect pairs. See `CorpusAssertion` in the schema.
`applies_to`	array of strings	no	Overrides registry. Allowed values: `stdio`, `http`.
`notes`	string	no	Free-form authoring notes the runner ignores.
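
The PREFIX-NNN pattern for rule_id can be checked with a regular expression. The sketch assumes PREFIX is uppercase ASCII letters and NNN is exactly three digits, matching the PROTO-001 example; the authoritative pattern is whatever schemas/v1.json declares, and the function name is hypothetical.

```python
import re

# Assumption: uppercase prefix, dash, exactly three digits (as in PROTO-001).
_RULE_ID = re.compile(r"([A-Z]+)-(\d{3})")


def parse_rule_id(rule_id: str) -> tuple:
    """Split a rule_id into (prefix, number) or raise if it is malformed."""
    match = _RULE_ID.fullmatch(rule_id)
    if match is None:
        raise ValueError(f"rule_id {rule_id!r} does not match PREFIX-NNN")
    return match.group(1), int(match.group(2))
```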

Each assertion carries a call: (method, params, optional raw bytes, transport, headers, repeat count, timeout) and an expect: (result, error, schema, headers, transport observation, latency budget, stable key order). The full subschema lives under $defs/CorpusCall and $defs/CorpusExpect in schemas/v1.json.

