
YAML configuration reference

This is the condensed reference for every key the mcptest configuration loader accepts. The JSON Schema at schemas/v1.json is the source of truth; yaml-reference.md is the longer narrative walk-through with worked examples. This page is the lookup table.

Validate your config: mcptest schema | python -m jsonschema -i config.yml /dev/stdin

Every field below appears in the schema today. Fields that are reserved on the schema for a later release are called out inline.

Top-level keys

The schema requires at least one of servers: or corpus: at the top level. Every other top-level key is optional. Unknown keys fail the load with a pointer to the offending field.

| Key | Type | Required | Purpose |
| --- | --- | --- | --- |
| servers | object | yes (unless corpus) | Named MCP servers under test. |
| imports | array of strings | no | Other YAML files to merge in. |
| variables | object | no | Author-defined values for ${name} interpolation. |
| tools | array of objects | no | Tool-call tests. |
| compliance | array or object | no | Protocol-level checks against the server. |
| evals | array of objects | no | Rubric or model-graded evaluations. |
| model_compatibility | array of strings | no | Metadata list of model identifiers the suite targets. |
| fixtures | object | no | Reusable named error scenarios. |
| performance | object | no | Suite-wide latency and timeout budgets. |
| corpus | object | yes (unless servers) | Compliance corpus metadata for rule-definition files. |
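
The top-level rules (at least one of servers: or corpus:, and no unknown keys) can be approximated in a few lines of Python. This is a minimal sketch, not mcptest's loader: the function name check_top_level is hypothetical, the key set is copied from the table above, and the real check is the JSON Schema at schemas/v1.json.

```python
# Hypothetical sketch; the authoritative check is schemas/v1.json.
KNOWN_TOP_LEVEL_KEYS = {
    "servers", "imports", "variables", "tools", "compliance", "evals",
    "model_compatibility", "fixtures", "performance", "corpus",
}


def check_top_level(config: dict) -> list:
    """Return a list of problems; an empty list means the top level looks OK."""
    problems = []
    # Rule 1: at least one of servers: or corpus: must be present.
    if "servers" not in config and "corpus" not in config:
        problems.append("at least one of 'servers' or 'corpus' is required")
    # Rule 2: unknown keys fail the load.
    for key in sorted(set(config) - KNOWN_TOP_LEVEL_KEYS):
        problems.append(f"unknown top-level key: {key!r}")
    return problems
```

Passing a parsed config such as {"corpus": {}, "cache": {}} to this sketch reports cache as an unknown top-level key.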

Note: the schema does not currently validate dedicated top-level keys for cache, reporters, auth, profiles, or defaults. Cache behavior is configured per-test under cache: on a tool or compliance entry (see tools[].cache) and is documented in cache-eligibility.md. Reporters are configured on the CLI (--reporter, --output), not in YAML. Auth is configured per-server under servers.<name>.auth. Profile selection is a CLI flag (--profile); the YAML profile: keyword on individual tests is in flight. Suite-wide defaults are written as ordinary top-level keys (performance.default_timeout_ms, variables.*) rather than a dedicated defaults: block.

servers

Named MCP servers under test. Each key is a name the test blocks reference via server:. The value is either a subprocess specification (command:) or a URL specification (url:). Exactly one of the two shapes is allowed per server.

Minimal example:

servers:
  local:
    command: ["./target/debug/my-mcp-server"]

Real-world example:

servers:
  filesystem:
    command: ["npx", "-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
    env:
      LOG_LEVEL: "info"

  remote_api:
    url: "https://mcp.example.com/v1"
    auth:
      bearer_token_env: "MCPTEST_REMOTE_API_TOKEN"
    wait_for_ready: "https://mcp.example.com/healthz"
    headers:
      X-Tenant: "acme"
    http:
      timeout: 30s
      connect_timeout: 5s
      max_redirects: 5
      tls:
        min_version: "1.3"
        ca_bundle_path: "/etc/ssl/internal-ca.pem"

Subprocess server fields

| Field | Type | Required | Default | Notes |
| --- | --- | --- | --- | --- |
| command | array of strings | yes | - | Argv to spawn. First element is the executable; remaining elements are arguments. |
| env | object of string to string | no | {} | Extra env vars passed to the child process. ${VAR} interpolation runs against the parent env. |

URL server fields

| Field | Type | Required | Default | Notes |
| --- | --- | --- | --- | --- |
| url | string (URI) | yes | - | HTTP or SSE endpoint. |
| auth | object | no | - | One of bearer or OAuth. |
| wait_for_ready | string | no | - | URL or path the runner polls until it returns 2xx. |
| headers | object | no | {} | Custom headers, literal or env-backed. Authorization and Proxy-Authorization are rejected here. See server.headers. |
| http | object | no | {} | HTTP transport options. See server.http. |

auth

Required when a URL server needs credentials. Exactly one of bearer_token_env or oauth is present. The headers and mtls keys on the auth: block are reserved for v1.1 and validated permissively today (the loader prints a warning). The auth architecture is tracked under .

Minimal example (bearer):

servers:
  remote_api:
    url: "https://mcp.example.com/v1"
    auth:
      bearer_token_env: "MCPTEST_REMOTE_API_TOKEN"

OAuth example:

servers:
  remote_api:
    url: "https://mcp.example.com/v1"
    auth:
      oauth:
        client_id_env: "MCPTEST_OAUTH_CLIENT_ID"
        authorization_url: "https://auth.example.com/oauth/authorize"
        token_url: "https://auth.example.com/oauth/token"
        scopes:
          - "mcp:read"
          - "mcp:invoke"

bearer_token_env is the name of the env var that holds the token; the matching client secret for OAuth is read from ${client_id_env}_SECRET by convention. Credentials never appear in the YAML.
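
One reading of the _SECRET convention is that the secret's env var name is the client_id_env value with _SECRET appended. The sketch below assumes that reading. The function names oauth_env_names and load_oauth_credentials are hypothetical and are not part of mcptest.

```python
import os


def oauth_env_names(client_id_env: str) -> tuple:
    """Return (client id env var name, client secret env var name)."""
    # Assumption: the secret lives in "<client_id_env>_SECRET".
    return client_id_env, f"{client_id_env}_SECRET"


def load_oauth_credentials(client_id_env: str, environ=None) -> tuple:
    """Read both values from the environment, failing loudly if either is unset."""
    environ = os.environ if environ is None else environ
    id_name, secret_name = oauth_env_names(client_id_env)
    missing = [name for name in (id_name, secret_name) if name not in environ]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return environ[id_name], environ[secret_name]
```

Under that assumption, the OAuth example above would read its secret from MCPTEST_OAUTH_CLIENT_ID_SECRET.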

server.headers

Map of header name to either a literal string (with ${VAR} interpolation) or {env: NAME} (read from the environment at connect time).

servers:
  saas:
    url: "https://api.example.com/mcp"
    headers:
      X-Tenant: "acme"
      X-API-Key:
        env: ACME_TENANT_API_KEY
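
The two header value shapes, and the rejection of Authorization and Proxy-Authorization, can be sketched as follows. This is an illustration only: resolve_headers is a hypothetical name, the case-insensitive comparison is an assumption (HTTP header names are case-insensitive), and ${VAR} interpolation inside literal values is left out.

```python
import os

# Headers the reference says are rejected under headers:.
FORBIDDEN_HEADERS = {"authorization", "proxy-authorization"}


def resolve_headers(spec: dict, environ=None) -> dict:
    """Turn a headers: mapping into concrete header values."""
    environ = os.environ if environ is None else environ
    resolved = {}
    for name, value in spec.items():
        if name.lower() in FORBIDDEN_HEADERS:
            raise ValueError(f"header {name!r} is not allowed under headers:")
        if isinstance(value, dict):
            # {env: NAME} form: read the value from the environment.
            resolved[name] = environ[value["env"]]
        else:
            # Literal form (interpolation not modeled in this sketch).
            resolved[name] = str(value)
    return resolved
```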

server.http

HTTP transport tuning. Duration fields take a unit suffix (30s, 500ms). Bare numbers without a unit are rejected.

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| timeout | duration string | 30s | Per-request body timeout. |
| connect_timeout | duration string | 5s | TCP and TLS connect timeout. |
| max_redirects | integer 0 to 50 | 5 | 0 disables redirect following. |
| user_agent_override | boolean | true | Set a User-Agent identifying mcptest. |
| tls.insecure_skip_verify | boolean | false | Disable TLS verification. Use only against private staging. |
| tls.ca_bundle_path | string | - | PEM-encoded CA bundle path. |
| tls.min_version | "1.2" or "1.3" | "1.2" | Minimum TLS protocol version. |
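
The duration rule (a unit suffix is mandatory, bare numbers are rejected) can be illustrated with a small parser. This is a sketch under assumptions: parse_duration_ms is a hypothetical name, and only the two suffixes shown on this page (s and ms) are handled; the schema may accept others.

```python
import re

# Assumption: only "ms" and "s" suffixes, as shown in this reference.
_DURATION = re.compile(r"(\d+)(ms|s)")


def parse_duration_ms(text) -> int:
    """Parse a duration string such as '30s' or '500ms' into milliseconds."""
    if not isinstance(text, str):
        # A bare YAML number (for example 30) arrives as an int, not a string.
        raise ValueError(f"duration must be a string with a unit suffix: {text!r}")
    match = _DURATION.fullmatch(text)
    if match is None:
        raise ValueError(f"invalid duration (missing or unknown unit): {text!r}")
    amount, unit = match.groups()
    return int(amount) * (1000 if unit == "s" else 1)
```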

Mutual TLS (mtls) is deferred to v1.1.

imports

Array of paths (relative or absolute) to other YAML files the loader merges into the current file. Later imports override earlier ones; the current file overrides all its imports.

imports:
  - "./shared/servers.yml"
  - "./shared/variables.yml"

Real-world example with a shared servers file and a per-environment override:

imports:
  - "./shared/servers.yml"
  - "./shared/variables.yml"
  - "./environments/staging.yml"

tools:
  - name: "uses an imported server"
    server: shared_filesystem
    tool: "list_directory"
    args:
      path: "/tmp"

Cycle detection and the full merge implementation land in a follow-up ticket. Today the loader recognizes the array and prints a clear error if any path fails to resolve.
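
The override order (later imports beat earlier ones, and the current file beats all imports) can be pictured as layered dictionary updates. This is only a sketch: merge_layers is a hypothetical name, and replacing whole top-level keys is an assumption, since the real merge semantics are not yet implemented.

```python
def merge_layers(imported: list, current: dict) -> dict:
    """Combine parsed import files and the current file into one mapping.

    `imported` holds the parsed imports in the order they are listed.
    Assumption: a later layer replaces a whole top-level key.
    """
    merged = {}
    for layer in imported:
        merged.update(layer)   # later imports override earlier ones
    merged.update(current)     # the current file overrides all imports
    return merged
```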

variables

Author-defined named variables usable inside any string field via ${name} interpolation. Each entry is either a literal value: or a from_env: reference, never both.

Minimal example:

variables:
  fixture_path:
    value: "/tmp/mcptest-fixture.txt"

Real-world example with both forms and a default:

variables:
  api_base:
    value: "https://staging.example.com"
  account_id:
    from_env: "MCPTEST_ACCOUNT_ID"
    default: "acct_demo"
  auth_token:
    from_env: "MCPTEST_TOKEN"

Per-entry fields:

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| value | string, number, or boolean | one of | Literal value. |
| from_env | string | one of | Env var name to read. |
| default | string | no | Fallback when from_env is unset. |

Resolution precedence (highest to lowest): process env, dotenv file, default:, value:. References that fail to resolve raise a structured error before any test runs. The full interpolation syntax (${VAR}, ${VAR:-default}, ${VAR:?}, $VAR, $$) is documented in yaml-reference.md.
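
The precedence order can be sketched as a lookup chain. This is an illustration, not mcptest's resolver: resolve_variable is a hypothetical name, and the dotenv file is modeled as an already-parsed dictionary.

```python
def resolve_variable(entry: dict, process_env: dict, dotenv: dict):
    """Resolve one variables: entry using the documented precedence."""
    has_value = "value" in entry
    has_env = "from_env" in entry
    if has_value == has_env:
        raise ValueError("an entry needs exactly one of value: or from_env:")
    if has_value:
        return entry["value"]
    name = entry["from_env"]
    if name in process_env:      # 1. process environment
        return process_env[name]
    if name in dotenv:           # 2. dotenv file
        return dotenv[name]
    if "default" in entry:       # 3. default:
        return entry["default"]
    # Unresolved references fail before any test runs.
    raise LookupError(f"variable source {name!r} is unset and has no default")
```

With the real-world example above, account_id falls back to acct_demo when MCPTEST_ACCOUNT_ID is unset in both places, while auth_token has no default and raises.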

tools

Array of tool-call tests. Each entry invokes a single tool on a named server and runs assertions against the result.

Minimal example:

tools:
  - name: "lists tools without error"
    server: local
    tool: "list_directory"
    args:
      path: "/tmp"
    expect:
      - target: "result.content"
        matcher:
          schema:
            type: array
            minItems: 1

Real-world example with budgets and a header check:

tools:
  - name: "search returns content for a known query"
    server: remote_api
    tool: "search"
    args:
      query: "anthropic"
    timeout_ms: 5000
    server_version: "1.4.2"
    cache: auto
    effects: []
    expect:
      assertions:
        - target: "result.content[0].text"
          matcher:
            contains: "Anthropic"
          message: "result should mention Anthropic"
      max_duration_ms: 2000
      max_response_tokens: 4000
      response_headers:
        content-type: "application/json"
      response_headers_absent: ["set-cookie"]

Per-entry fields:

| Field | Type | Required | Default | Notes |
| --- | --- | --- | --- | --- |
| name | string | yes | - | Human-readable test name. Appears in reporter output. |
| server | string | yes | - | Key into the top-level servers: map. |
| tool | string | yes | - | Tool name the server exposes. |
| args | object | no | {} | JSON-serializable arguments. Values may use ${name} interpolation. |
| expect | array or object | no | [] | Assertions. See expect. |
| timeout_ms | integer >= 1 | no | - | Per-test timeout override. |
| inject_error | string | no | - | Fixture name from fixtures.errors[]. See fixtures. |
| cache | enum | no | auto | One of auto, always, never. See Cache directive. |
| effects | array | no | [] | One or more of external, local, filesystem. |
| server_version | string | no | - | Version pin required for HTTP-transport cache eligibility. |
| hooks | object | no | - | Per-test lifecycle hooks. Declaring any hooks opts the test out of caching. |

A style: stepwise | single keyword is not in the schema today. mcptest models stepwise interactions by chaining flat tool tests that share ${variables}; see yaml-reference.md for the worked shape and the tracking ticket.

A capture keyword (capture a value from one response and reuse it in a later step) is in flight; see the yaml-reference deferred summary. A retry keyword is not yet on the schema; for now, wrap flaky tests in test-level timeout_ms and a higher suite-wide budget.

A call: keyword (raw JSON-RPC method plus params, sidestepping the tool-call abstraction) lives on compliance: entries today via the check field.

Cache directive

cache: accepts one of three string values: auto, always, or never. The default is auto.

The cache design and the storage layer are.

expect block

Either an array of per-target assertions (short form) or an object with assertions: plus budgets and header checks (long form).

Short form:

expect:
  - target: "result.content[0].text"
    matcher:
      contains: "hello"

Long form:

expect:
  assertions:
    - target: "result.content[0].text"
      matcher:
        contains: "hello"
  max_duration_ms: 500
  response_headers:
    content-type: "application/json"
  response_headers_absent: ["set-cookie"]

Assertion fields:

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| target | string | yes | JSONPath-style. Example: result.content[0].text. |
| matcher | object | yes | One key. See Matchers. |
| message | string | no | Friendly description used in reporter output. |

Long-form expectation fields:

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| assertions | array of objects | no | Per-target assertions. |
| max_duration_ms | integer >= 1 | no | Wall-clock budget for the step. |
| max_response_tokens | integer or object | no | Token cap. Object form takes budget, tokenizer, mode, image_cost. |
| response_headers | object | no | Header name to literal string or one of regex, schema, exists, contains. |
| response_headers_absent | array of strings | no | Headers that must not be present. |
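
Because expect: accepts two shapes, a consumer of the parsed config may find it convenient to normalize both into the long form. The sketch below shows one way to do that. The function name normalize_expect is hypothetical and does not describe how mcptest handles the two forms internally.

```python
def normalize_expect(expect) -> dict:
    """Return the long form (an object with assertions:) for either shape."""
    if expect is None:
        # expect: is optional; treat a missing block as no assertions.
        return {"assertions": []}
    if isinstance(expect, list):
        # Short form: a bare array of per-target assertions.
        return {"assertions": list(expect)}
    if isinstance(expect, dict):
        # Long form: keep budgets and header checks, default assertions to [].
        normalized = dict(expect)
        normalized.setdefault("assertions", [])
        return normalized
    raise TypeError("expect must be an array or an object")
```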

The reporter output model (failure shape, exit codes, JSON envelope) is tracked.

Matchers

A matcher object carries exactly one key. v1 ships six matchers.

| Matcher | Argument | Notes |
| --- | --- | --- |
| exact | any | Deep equality. |
| contains | string, object, or array | Case-sensitive substring for strings, sub-object for objects, array multiset membership for arrays. Use icontains for case-insensitive substring, exact for full equality. Non-string scalars compare by equality. |
| regex | string | Pattern matched against the stringified value. |
| schema | object | Inline JSON Schema the value must satisfy. |
| snapshot | string or object | Record-on-first-run, diff-on-subsequent-runs. |
| llm-judge | object | LLM panel with a rubric. See LLM evals guide. |
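
The contains rules are easier to see in code. The following sketch is one interpretation of the notes above, not the matcher's actual implementation: contains_matches is a hypothetical name, and comparing sub-object values by plain equality (rather than recursively) is an assumption.

```python
def contains_matches(actual, expected) -> bool:
    """Approximate the documented contains semantics."""
    if isinstance(actual, str) and isinstance(expected, str):
        # Strings: case-sensitive substring.
        return expected in actual
    if isinstance(actual, dict) and isinstance(expected, dict):
        # Objects: every expected key is present with an equal value.
        return all(k in actual and actual[k] == v for k, v in expected.items())
    if isinstance(actual, list) and isinstance(expected, list):
        # Arrays: multiset membership, so duplicates must be matched one-for-one.
        remaining = list(actual)
        for item in expected:
            if item not in remaining:
                return False
            remaining.remove(item)
        return True
    # Non-string scalars (and mismatched types): plain equality.
    return actual == expected
```

Under this reading, an expected array of [1, 1] matches [1, 1, 2] but not [1, 2], because each expected element consumes one actual element.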

llm-judge fields:

matcher:
  llm-judge:
    rubric: "Answer must mention the service name and the release tag."
    threshold: 0.7
    model: "claude-opus-4-7"

compliance

Either a bare array of compliance test entries (short form) or an object with optional baseline: and a required tests: array (long form).

Short-form example:

compliance:
  - name: "negotiates capabilities on initialize"
    server: filesystem
    check: "initialize"

Long-form example with a baseline file:

compliance:
  baseline: "./compliance-baseline.yml"
  tests:
    - name: "negotiates capabilities on initialize"
      server: filesystem
      check: "initialize"
      expect:
        - target: "result.protocolVersion"
          matcher:
            regex: "^2\\d{3}-\\d{2}-\\d{2}$"
    - name: "advertises required tools"
      server: filesystem
      check: "tools/list"
      cache: auto
      expect:
        - target: "result.tools"
          matcher:
            schema:
              type: array
              minItems: 1

Per-entry fields:

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| name | string | yes | Human-readable test name. |
| server | string | yes | Key into the top-level servers: map. |
| check | string | yes | Compliance check identifier (e.g. initialize, tools/list). |
| expect | array or object | no | Same shape as tools[].expect. |
| cache | enum | no | See Cache directive. |
| effects | array | no | See tools. |
| server_version | string | no | Optional version pin. Compliance tests do not require a pin for cache eligibility. |
| hooks | object | no | Per-test lifecycle hooks. |

The compliance baseline file shape is documented in compliance-baseline.md.

evals

Rubric or model-graded evaluations. v1 surface is intentionally small. The full grader implementation lands in W7; see guides/llm-evals.md.

Minimal example:

evals:
  - name: "summary stays on topic"
    server: remote_api
    prompt: "Summarize the latest deployment."
    threshold: 0.7

Real-world example with a rubric:

evals:
  - name: "release notes mention service and tag"
    server: remote_api
    prompt: "Write release notes for the last deploy."
    rubric: |
      The notes must mention the service name and the release tag.
      Reject responses that hallucinate a version or omit the tag.
    threshold: 0.8

Per-entry fields:

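| Field | Type | Required | Default | Notes |
| --- | --- | --- | --- | --- |
| name | string | yes | - | Test name. |
| server | string | yes | - | Key into servers:. |
| prompt | string | yes | - | Prompt sent to the model or tool. |
| rubric | string | no | - | Free-form grading rubric. |
| threshold | number in [0, 1] | no | 0.7 | Pass threshold. |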

Evaluations run via mcptest eval. The default mcptest run invocation skips them so the CI gate stays cheap.

model_compatibility

Pure metadata. Each entry is a model identifier the test suite targets. Reporters use this to label results.

Minimal example:

model_compatibility:
  - "claude-opus-4-7"

Real-world example for a multi-model suite:

model_compatibility:
  - "claude-opus-4-7"
  - "claude-sonnet-4-5"
  - "gpt-4o"
  - "gpt-5-preview"
  - "gemini-2.5-pro"

The full model-compatibility matrix (cross-model diff, baseline vs candidate gating) is the v1.1 wedge feature; see guides/model-compatibility.md.

fixtures

Declares reusable named error scenarios. Any tool test can trigger one by name via inject_error:.

Minimal example:

fixtures:
  errors:
    - name: rate_limited
      code: -32000
      message: "Rate limit exceeded"
      applies_to: any

Real-world example, scoped to a specific tool:

fixtures:
  server: mock_github
  errors:
    - name: rate_limited
      code: -32000
      message: "GitHub API rate limit exceeded"
      tool: create_issue
    - name: auth_expired
      code: -32001
      message: "OAuth token expired"
      applies_to: any

tools:
  - name: "handles rate limiting gracefully"
    server: mock_github
    tool: create_issue
    inject_error: rate_limited
    args:
      title: "Triage queue overflow"
    expect:
      - target: "result.isError"
        matcher:
          exact: true

Per-entry fields under fixtures.errors[]:

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| name | string | yes | Unique identifier referenced by inject_error:. |
| code | integer | yes | JSON-RPC error code. |
| message | string | yes | Human-readable error message. |
| tool | string | one of | Scopes the error to a single tool. |
| applies_to | "any" | one of | Applies to every tool on the server. |

Exactly one of tool or applies_to must be present.
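
The scoping rule can be sketched as a validation step plus a lookup. This is a hedged illustration: validate_error_fixture and fixture_applies are hypothetical names and do not describe how mcptest resolves inject_error: internally.

```python
def validate_error_fixture(entry: dict) -> None:
    """Enforce that exactly one of tool or applies_to is present."""
    has_tool = "tool" in entry
    has_applies_to = "applies_to" in entry
    if has_tool == has_applies_to:
        raise ValueError(
            f"fixture {entry.get('name')!r}: exactly one of tool or applies_to is required"
        )
    if has_applies_to and entry["applies_to"] != "any":
        raise ValueError("applies_to only accepts the value 'any'")


def fixture_applies(entry: dict, tool_name: str) -> bool:
    """Report whether an error fixture covers the given tool."""
    validate_error_fixture(entry)
    if "applies_to" in entry:
        return True  # applies_to: any covers every tool on the server
    return entry["tool"] == tool_name
```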

performance

Suite-wide soft budgets.

Minimal example:

performance:
  default_timeout_ms: 10000

Real-world example:

performance:
  default_timeout_ms: 15000
  p95_latency_ms: 2000

| Field | Type | Required | Default | Notes |
| --- | --- | --- | --- | --- |
| default_timeout_ms | integer >= 1 | no | - | Default per-test timeout. Individual tools[].timeout_ms overrides. |
| p95_latency_ms | integer >= 1 | no | - | Soft p95 latency budget; reporters highlight tests that breach it. |

Deeper performance work (load generation, sustained throughput) lives behind a separate mcptest bench command and is out of scope for v1.

corpus

Compliance corpus metadata. Corpus files declare reusable rule definitions consumed by the future compliance runner. They do not require a servers: block because they describe what should be tested, not which server to target.

Minimal example:

corpus:
  category: PROTO
  rules:
    - rule_id: PROTO-001
      name: "initialize negotiates a supported protocol version"

Real-world example:

corpus:
  category: PROTO
  spec_version: v2025-06-18
  tickets:
    - 
    - 
  rules:
    - rule_id: PROTO-001
      name: "initialize negotiates a supported protocol version"
      description: |
        The server must respond to `initialize` with a protocolVersion the
        client offered, or reject with an explicit unsupported-version error.
      assertions:
        - description: "initialize succeeds with the canonical client version"
          call:
            method: "initialize"
            params:
              protocolVersion: "2025-06-18"
              capabilities: {}
          expect:
            result:
              protocolVersion: "2025-06-18"
            match_mode: subset
      applies_to: ["stdio", "http"]
      notes: "Authored from PRD section on protocol negotiation."

Top-level corpus fields:

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| category | enum | yes | PROTO, SCHEMA, SEQ, TOOL, RES, PROMPT, AUTH, TRANSPORT, HEADER, REPLAY, EDGE, or COMPAT. |
| spec_version | enum | no | One of v2024-11-05, v2025-03-26, v2025-06-18, draft. |
| tickets | array of strings | no | Linear ticket IDs that own the rules in this file. |
| rules | array of objects | yes | At least one rule entry. |

Per-rule fields:

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| rule_id | string | yes | Pattern PREFIX-NNN. |
| name | string | yes | Short summary, repeated from the registry. |
| description | string | no | Longer description. |
| assertions | array of objects | no | Ordered call+expect pairs. See CorpusAssertion in the schema. |
| applies_to | array of strings | no | Overrides registry. Allowed values: stdio, http. |
| notes | string | no | Free-form authoring notes the runner ignores. |
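
A quick local check for the PREFIX-NNN shape might look like the sketch below. The exact pattern is an assumption inferred from the PROTO-001 example (uppercase letters, a hyphen, three digits); the authoritative pattern is in schemas/v1.json. The names is_valid_rule_id and rule_prefix are hypothetical.

```python
import re

# Assumption: uppercase prefix, hyphen, exactly three digits (as in PROTO-001).
_RULE_ID = re.compile(r"([A-Z]+)-(\d{3})")


def is_valid_rule_id(rule_id: str) -> bool:
    """Return True when the id has the assumed PREFIX-NNN shape."""
    return _RULE_ID.fullmatch(rule_id) is not None


def rule_prefix(rule_id: str) -> str:
    """Return the PREFIX part, for example to compare with corpus.category."""
    match = _RULE_ID.fullmatch(rule_id)
    if match is None:
        raise ValueError(f"not a PREFIX-NNN rule id: {rule_id!r}")
    return match.group(1)
```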

Each assertion carries a call: (method, params, optional raw bytes, transport, headers, repeat count, timeout) and an expect: (result, error, schema, headers, transport observation, latency budget, stable key order). The full subschema lives under $defs/CorpusCall and $defs/CorpusExpect in schemas/v1.json.

See also