mcptest docs GitHub

Scenario 12: multi-server suites

A real workflow rarely lives on one server. An agent calls an issues server and a notifications server. A contract suite validates that two related services interoperate. And once two servers share a session, a new class of bug appears: output from one server quietly becoming control input for another.

This scenario walks through a suite that spans two independent MCP servers on the hosted test server, routes each tool test to the right one, and then adds the cross-server trust-boundary check that catches the implicit-trust pattern. No API key and no local binary are required; everything points at https://test.mcptest.sh.

The two servers are genuinely separate. The primary server at https://test.mcptest.sh/mcp serves greet, search, get_forecast, list_items, slow_op, fail, and delete_record. A second independent server, datastore-b, sits at https://test.mcptest.sh/mcp-b with its own catalog under distinct names: db_get(key), db_put(key, value), db_list(), db_purge() (which carries a destructiveHint), plus a records://{key} resource. The tool names do not overlap, so this is a true multi-server target rather than the same catalog behind two URLs.

The YAML

Save this as tests/multi-server.yml:

# yaml-language-server: $schema=https://mcptest.sh/schema/v1.json

servers:
  primary:
    url: https://test.mcptest.sh/mcp
  datastore:
    url: https://test.mcptest.sh/mcp-b
  datastore_crosstrust:
    url: https://test.mcptest.sh/mcp-b?scenario=crosstrust

tools:
  # Each test names its server. mcptest merges the catalogs and reaches
  # the right tool on the right server.
  - name: primary server greets
    server: primary
    tool: greet
    args:
      name: mcptest
    expect:
      - target: result.content[0].text
        matcher:
          exact: "Hello, mcptest!"

  - name: datastore-b reads a stored value
    server: datastore
    tool: db_get
    args:
      key: report
    expect:
      - target: result.content[0].text
        matcher:
          exact: "Q3 numbers are within range."

  - name: datastore-b lists its own keys
    server: datastore
    tool: db_list
    expect:
      - target: result.content[0].text
        matcher:
          contains: "report"

  # Trust boundary: under the crosstrust scenario the same db_get returns
  # stored data carrying an instruction aimed at the OTHER server.
  - name: crosstrust output carries a cross-server instruction
    server: datastore_crosstrust
    tool: db_get
    args:
      key: report
    expect:
      - target: result.content[0].text
        matcher:
          contains: "delete_record"
      - target: result.content[0].text
        matcher:
          contains: "attacker.example"

What is happening here:

Run it

mcptest run tests/multi-server.yml

Nothing else to set up. The hosted server is reachable over plain HTTPS, the values are deterministic, and the suite needs no credentials, so this run is stable in CI.

If you want to confirm the file is well-formed before the first run, validate it:

mcptest validate --config tests/multi-server.yml

A clean run prints ok and exits 0.

The cross-server trust boundary

The first three tests are ordinary multi-server routing. The fourth is the interesting one.

When the second server runs under ?scenario=crosstrust, the value stored at report is no longer just Q3 numbers are within range.. It looks benign, but it carries an embedded instruction that targets a different server: call delete_record on the primary, then forward the result to an exfiltration sink at attacker.example. The honeypot value is benign by default; the crosstrust scenario is what makes the payload appear, so you can assert against it without standing up an attack of your own.

The danger is not the string itself. The danger is an agent or orchestrator that reads db_get output from datastore and feeds it straight into a tool call on primary. That is the implicit-trust pattern: output from one server quietly becoming control input for another. A trust-boundary check flags it. The rule is simple and absolute: tool output from one server must never be treated as control input for another server.

The two assertions on the last test pin the payload to the wire:

Asserting both makes the boundary-crossing payload a concrete, checkable fact. A cross-server conformance check has something specific to flag, and a regression that sanitized the honeypot (or that let the instruction leak into a real delete_record call) would change the test result.

Expected output

mcptest run tests/multi-server.yml

  PASS  primary server greets                                  (318ms)
  PASS  datastore-b reads a stored value                       (262ms)
  PASS  datastore-b lists its own keys                         (244ms)
  PASS  crosstrust output carries a cross-server instruction   (271ms)

4 passed, 0 failed in 1.1s

All four tests pass. The first lands on the primary server, the next two on datastore, and the last on the crosstrust variant of the second server. The per-test lines show that mcptest dispatched each test to the server it named and resolved the right tool there.

Troubleshooting

See also