YAML configuration reference
The condensed reference for every key the mcptest configuration loader accepts. The JSON Schema at schemas/v1.json is the source of truth; the longer narrative walk-through with worked examples is yaml-reference.md. This page is the lookup table.
Validate your config:
mcptest schema | python -m jsonschema -i config.yml /dev/stdin
Every field below appears in the schema today. Fields that are reserved on the schema for a later release are called out inline.
Top-level keys
The schema requires at least one of servers: or corpus: at the top level. Every other top-level key is optional. Unknown keys fail the load with a pointer to the offending field.
| Key | Type | Required | Purpose |
|---|---|---|---|
servers | object | yes (unless corpus) | Named MCP servers under test. |
imports | array of strings | no | Other YAML files to merge in. |
variables | object | no | Author-defined values for ${name} interpolation. |
tools | array of objects | no | Tool-call tests. |
compliance | array or object | no | Protocol-level checks against the server. |
evals | array of objects | no | Rubric or model-graded evaluations. |
model_compatibility | array of strings | no | Metadata list of model identifiers the suite targets. |
fixtures | object | no | Reusable named error scenarios. |
performance | object | no | Suite-wide latency and timeout budgets. |
corpus | object | yes (unless servers) | Compliance corpus metadata for rule-definition files. |
Note: the schema does not currently validate dedicated top-level keys for cache, reporters, auth, profiles, or defaults. Cache behavior is configured per-test under cache: on a tool or compliance entry (see tools[].cache) and is documented in cache-eligibility.md. Reporters are configured on the CLI (--reporter, --output), not in YAML. Auth is configured per-server under servers.<name>.auth. Profile selection is a CLI flag (--profile); the YAML profile: keyword on individual tests is in flight. Suite-wide defaults are written as ordinary top-level keys (performance.default_timeout_ms, variables.*) rather than a dedicated defaults: block.
servers
Named MCP servers under test. Each key is a name the test blocks reference via server:. The value is either a subprocess specification (command:) or a URL specification (url:). Exactly one of the two shapes is allowed per server.
Minimal example:
servers:
local:
command: ["./target/debug/my-mcp-server"]
Real-world example:
servers:
filesystem:
command: ["npx", "-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
env:
LOG_LEVEL: "info"
remote_api:
url: "https://mcp.example.com/v1"
auth:
bearer_token_env: "MCPTEST_REMOTE_API_TOKEN"
wait_for_ready: "https://mcp.example.com/healthz"
headers:
X-Tenant: "acme"
http:
timeout: 30s
connect_timeout: 5s
max_redirects: 5
tls:
min_version: "1.3"
ca_bundle_path: "/etc/ssl/internal-ca.pem"
Subprocess server fields
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
command | array of strings | yes | - | Argv to spawn. First element is the executable; remaining elements are arguments. |
env | object of string to string | no | {} | Extra env vars passed to the child process. ${VAR} interpolation runs against the parent env. |
URL server fields
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
url | string (URI) | yes | - | HTTP or SSE endpoint. |
auth | object | no | - | One of bearer or OAuth. |
wait_for_ready | string | no | - | URL or path the runner polls until it returns 2xx. |
headers | object | no | {} | Custom headers, literal or env-backed. Authorization and Proxy-Authorization are rejected here. See server.headers. |
http | object | no | {} | HTTP transport options. See server.http. |
auth
Required when a URL server needs credentials. Exactly one of bearer_token_env or oauth is present. The headers and mtls keys on the auth: block are reserved for v1.1 and validated permissively today (the loader prints a warning). The auth architecture is tracked under .
Minimal example (bearer):
servers:
remote_api:
url: "https://mcp.example.com/v1"
auth:
bearer_token_env: "MCPTEST_REMOTE_API_TOKEN"
OAuth example:
servers:
remote_api:
url: "https://mcp.example.com/v1"
auth:
oauth:
client_id_env: "MCPTEST_OAUTH_CLIENT_ID"
authorization_url: "https://auth.example.com/oauth/authorize"
token_url: "https://auth.example.com/oauth/token"
scopes:
- "mcp:read"
- "mcp:invoke"
bearer_token_env is the name of the env var that holds the token; the matching client secret for OAuth is read from ${client_id_env}_SECRET by convention. Credentials never appear in the YAML.
server.headers
Map of header name to either a literal string (with ${VAR} interpolation) or {env: NAME} (read from the environment at connect time).
servers:
saas:
url: "https://api.example.com/mcp"
headers:
X-Tenant: "acme"
X-API-Key:
env: ACME_TENANT_API_KEY
server.http
HTTP transport tuning. Duration fields take a unit suffix (30s, 500ms). Bare numbers without a unit are rejected.
| Field | Type | Default | Notes |
|---|---|---|---|
timeout | duration string | 30s | Per-request body timeout. |
connect_timeout | duration string | 5s | TCP and TLS connect timeout. |
max_redirects | integer 0 to 50 | 5 | 0 disables redirect following. |
user_agent_override | boolean | true | Set a User-Agent identifying mcptest. |
tls.insecure_skip_verify | boolean | false | Disable TLS verification. Use only against private staging. |
tls.ca_bundle_path | string | - | PEM-encoded CA bundle path. |
tls.min_version | "1.2" or "1.3" | "1.2" | Minimum TLS protocol version. |
Mutual TLS (mtls) is deferred to v1.1.
imports
Array of paths (relative or absolute) to other YAML files the loader merges into the current file. Later imports override earlier ones; the current file overrides all its imports.
imports:
- "./shared/servers.yml"
- "./shared/variables.yml"
Real-world example with a shared servers file and a per-environment override:
imports:
- "./shared/servers.yml"
- "./shared/variables.yml"
- "./environments/staging.yml"
tools:
- name: "uses an imported server"
server: shared_filesystem
tool: "list_directory"
args:
path: "/tmp"
Cycle detection and the full merge implementation land in a follow-up ticket. Today the loader recognizes the array and prints a clear error if any path fails to resolve.
variables
Author-defined named variables usable inside any string field via ${name} interpolation. Each entry is either a literal value or a reference from_env. Never both.
Minimal example:
variables:
fixture_path:
value: "/tmp/mcptest-fixture.txt"
Real-world example with both forms and a default:
variables:
api_base:
value: "https://staging.example.com"
account_id:
from_env: "MCPTEST_ACCOUNT_ID"
default: "acct_demo"
auth_token:
from_env: "MCPTEST_TOKEN"
Per-entry fields:
| Field | Type | Required | Notes |
|---|---|---|---|
value | string, number, or boolean | one of | Literal value. |
from_env | string | one of | Env var name to read. |
default | string | no | Fallback when from_env is unset. |
Resolution precedence (highest to lowest): process env, dotenv file, default:, value:. References that fail to resolve raise a structured error before any test runs. The full interpolation syntax (${VAR}, ${VAR:-default}, ${VAR:?}, $VAR, $$) is documented in yaml-reference.md.
tools
Array of tool-call tests. Each entry invokes a single tool on a named server and runs assertions against the result.
Minimal example:
tools:
- name: "lists tools without error"
server: local
tool: "list_directory"
args:
path: "/tmp"
expect:
- target: "result.content"
matcher:
schema:
type: array
minItems: 1
Real-world example with budgets and a header check:
tools:
- name: "search returns content for a known query"
server: remote_api
tool: "search"
args:
query: "anthropic"
timeout_ms: 5000
server_version: "1.4.2"
cache: auto
effects: []
expect:
assertions:
- target: "result.content[0].text"
matcher:
contains: "Anthropic"
message: "result should mention Anthropic"
max_duration_ms: 2000
max_response_tokens: 4000
response_headers:
content-type: "application/json"
response_headers_absent: ["set-cookie"]
Per-entry fields:
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
name | string | yes | - | Human-readable test name. Appears in reporter output. |
server | string | yes | - | Key into the top-level servers: map. |
tool | string | yes | - | Tool name the server exposes. |
args | object | no | {} | JSON-serializable arguments. Values may use ${name} interpolation. |
expect | array or object | no | [] | Assertions. See expect. |
timeout_ms | integer >= 1 | no | - | Per-test timeout override. |
inject_error | string | no | - | Fixture name from fixtures.errors[]. See fixtures. |
cache | enum | no | auto | One of auto, always, never. See Cache directive. |
effects | array | no | [] | One or more of external, local, filesystem. |
server_version | string | no | - | Version pin required for HTTP-transport cache eligibility. |
hooks | object | no | - | Per-test lifecycle hooks. Declaring any hooks opts the test out of caching. |
A style: stepwise | single keyword is not in the schema today. mcptest models stepwise interactions by chaining flat tool tests that share ${variables}; see yaml-reference.md for the worked shape and the tracking ticket.
A capture keyword (capture a value from one response and reuse it in a later step) is in flight; see the yaml-reference deferred summary. A retry keyword is not yet on the schema; for now, wrap flaky tests in test-level timeout_ms and a higher suite-wide budget.
A call: keyword (raw JSON-RPC method plus params, sidestepping the tool-call abstraction) lives on compliance: entries today via the check field.
Cache directive
cache: accepts one of three string values:
auto. Defer to the per-test-type default (seecache-eligibility.md).always. Force the test cacheable subject to hard exclusions (hooks, external effects).never. Opt out unconditionally; beats every other rule.
The cache design and the storage layer are.
expect block
Either an array of per-target assertions (short form) or an object with assertions: plus budgets and header checks (long form).
Short form:
expect:
- target: "result.content[0].text"
matcher:
contains: "hello"
Long form:
expect:
assertions:
- target: "result.content[0].text"
matcher:
contains: "hello"
max_duration_ms: 500
response_headers:
content-type: "application/json"
response_headers_absent: ["set-cookie"]
Assertion fields:
| Field | Type | Required | Notes |
|---|---|---|---|
target | string | yes | JSONPath-style. Example: result.content[0].text. |
matcher | object | yes | One key. See Matchers. |
message | string | no | Friendly description used in reporter output. |
Long-form expectation fields:
| Field | Type | Required | Notes |
|---|---|---|---|
assertions | array of objects | no | Per-target assertions. |
max_duration_ms | integer >= 1 | no | Wall-clock budget for the step. |
max_response_tokens | integer or object | no | Token cap. Object form takes budget, tokenizer, mode, image_cost. |
response_headers | object | no | Header name to literal string or one of regex, schema, exists, contains. |
response_headers_absent | array of strings | no | Headers that must not be present. |
The reporter output model (failure shape, exit codes, JSON envelope) is tracked.
Matchers
A matcher object carries exactly one key. v1 ships six matchers.
| Matcher | Argument | Notes |
|---|---|---|
exact | any | Deep equality. |
contains | string, object, or array | Case-sensitive substring for strings, sub-object for objects, array multiset membership for arrays. Use icontains for case-insensitive substring, exact for full equality. Non-string scalars compare by equality. |
regex | string | Pattern matched against the stringified value. |
schema | object | Inline JSON Schema the value must satisfy. |
snapshot | string or object | Record-on-first-run, diff-on-subsequent-runs. |
llm-judge | object | LLM panel with a rubric. See LLM evals guide. |
llm-judge fields:
matcher:
llm-judge:
rubric: "Answer must mention the service name and the release tag."
threshold: 0.7
model: "claude-opus-4-7"
compliance
Either a bare array of compliance test entries (short form) or an object with optional baseline: and a required tests: array (long form).
Short-form example:
compliance:
- name: "negotiates capabilities on initialize"
server: filesystem
check: "initialize"
Long-form example with a baseline file:
compliance:
baseline: "./compliance-baseline.yml"
tests:
- name: "negotiates capabilities on initialize"
server: filesystem
check: "initialize"
expect:
- target: "result.protocolVersion"
matcher:
regex: "^2\\d{3}-\\d{2}-\\d{2}$"
- name: "advertises required tools"
server: filesystem
check: "tools/list"
cache: auto
expect:
- target: "result.tools"
matcher:
schema:
type: array
minItems: 1
Per-entry fields:
| Field | Type | Required | Notes |
|---|---|---|---|
name | string | yes | Human-readable test name. |
server | string | yes | Key into the top-level servers: map. |
check | string | yes | Compliance check identifier (e.g. initialize, tools/list). |
expect | array or object | no | Same shape as tools[].expect. |
cache | enum | no | See Cache directive. |
effects | array | no | See tools. |
server_version | string | no | Optional version pin. Compliance tests do not require a pin for cache eligibility. |
hooks | object | no | Per-test lifecycle hooks. |
The compliance baseline file shape is documented in compliance-baseline.md.
evals
Rubric or model-graded evaluations. v1 surface is intentionally small. The full grader implementation lands in W7; see guides/llm-evals.md.
Minimal example:
evals:
- name: "summary stays on topic"
server: remote_api
prompt: "Summarize the latest deployment."
threshold: 0.7
Real-world example with a rubric:
evals:
- name: "release notes mention service and tag"
server: remote_api
prompt: "Write release notes for the last deploy."
rubric: |
The notes must mention the service name and the release tag.
Reject responses that hallucinate a version or omit the tag.
threshold: 0.8
Per-entry fields:
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
name | string | yes | - | Test name. |
server | string | yes | - | Key into servers:. |
prompt | string | yes | - | Prompt sent to the model or tool. |
rubric | string | no | - | Free-form grading rubric. |
threshold | number in [0, 1] | no | 0.7 | Pass threshold. |
Evaluations run via mcptest eval. The default mcptest run invocation skips them so the CI gate stays cheap.
model_compatibility
Pure metadata. Each entry is a model identifier the test suite targets. Reporters use this to label results.
Minimal example:
model_compatibility:
- "claude-opus-4-7"
Real-world example for a multi-model suite:
model_compatibility:
- "claude-opus-4-7"
- "claude-sonnet-4-5"
- "gpt-4o"
- "gpt-5-preview"
- "gemini-2.5-pro"
The full model-compatibility matrix (cross-model diff, baseline vs candidate gating) is the v1.1 wedge feature; see guides/model-compatibility.md.
fixtures
Declares reusable named error scenarios. Any tool test can trigger one by name via inject_error:.
Minimal example:
fixtures:
errors:
- name: rate_limited
code: -32000
message: "Rate limit exceeded"
applies_to: any
Real-world example, scoped to a specific tool:
fixtures:
server: mock_github
errors:
- name: rate_limited
code: -32000
message: "GitHub API rate limit exceeded"
tool: create_issue
- name: auth_expired
code: -32001
message: "OAuth token expired"
applies_to: any
tools:
- name: "handles rate limiting gracefully"
server: mock_github
tool: create_issue
inject_error: rate_limited
args:
title: "Triage queue overflow"
expect:
- target: "result.isError"
matcher:
exact: true
Per-entry fields under fixtures.errors[]:
| Field | Type | Required | Notes |
|---|---|---|---|
name | string | yes | Unique identifier referenced by inject_error:. |
code | integer | yes | JSON-RPC error code. |
message | string | yes | Human-readable error message. |
tool | string | one of | Scopes the error to a single tool. |
applies_to | "any" | one of | Applies to every tool on the server. |
Exactly one of tool or applies_to must be present.
performance
Suite-wide soft budgets.
Minimal example:
performance:
default_timeout_ms: 10000
Real-world example:
performance:
default_timeout_ms: 15000
p95_latency_ms: 2000
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
default_timeout_ms | integer >= 1 | no | - | Default per-test timeout. Individual tools[].timeout_ms overrides. |
p95_latency_ms | integer >= 1 | no | - | Soft p95 latency budget; reporters highlight tests that breach it. |
Deeper performance work (load generation, sustained throughput) lives behind a separate mcptest bench command and is out of scope for v1.
corpus
Compliance corpus metadata. Corpus files declare reusable rule definitions consumed by the future compliance runner. They do not require a servers: block because they describe what should be tested, not which server to target.
Minimal example:
corpus:
category: PROTO
rules:
- rule_id: PROTO-001
name: "initialize negotiates a supported protocol version"
Real-world example:
corpus:
category: PROTO
spec_version: v2025-06-18
tickets:
-
-
rules:
- rule_id: PROTO-001
name: "initialize negotiates a supported protocol version"
description: |
The server must respond to `initialize` with a protocolVersion the
client offered, or reject with an explicit unsupported-version error.
assertions:
- description: "initialize succeeds with the canonical client version"
call:
method: "initialize"
params:
protocolVersion: "2025-06-18"
capabilities: {}
expect:
result:
protocolVersion: "2025-06-18"
match_mode: subset
applies_to: ["stdio", "http"]
notes: "Authored from PRD section on protocol negotiation."
Top-level corpus fields:
| Field | Type | Required | Notes |
|---|---|---|---|
category | enum | yes | PROTO, SCHEMA, SEQ, TOOL, RES, PROMPT, AUTH, TRANSPORT, HEADER, REPLAY, EDGE, or COMPAT. |
spec_version | enum | no | One of v2024-11-05, v2025-03-26, v2025-06-18, draft. |
tickets | array of strings | no | Linear ticket IDs that own the rules in this file. |
rules | array of objects | yes | At least one rule entry. |
Per-rule fields:
| Field | Type | Required | Notes |
|---|---|---|---|
rule_id | string | yes | Pattern PREFIX-NNN. |
name | string | yes | Short summary, repeated from the registry. |
description | string | no | Longer description. |
assertions | array of objects | no | Ordered call+expect pairs. See CorpusAssertion in the schema. |
applies_to | array of strings | no | Overrides registry. Allowed values: stdio, http. |
notes | string | no | Free-form authoring notes the runner ignores. |
Each assertion carries a call: (method, params, optional raw bytes, transport, headers, repeat count, timeout) and an expect: (result, error, schema, headers, transport observation, latency budget, stable key order). The full subschema lives under $defs/CorpusCall and $defs/CorpusExpect in schemas/v1.json.
See also
- YAML reference: narrative walk-through with worked examples.
- CLI reference: every command and flag.
schemas/v1.json: the source of truth.