Scenario 12: multi-server suites

A real workflow rarely lives on one server. An agent calls an issues server and a notifications server. A contract suite validates that two related services interoperate. And once two servers share a session, a new class of bug appears: output from one server quietly becoming control input for another.

This scenario walks through a suite that spans two independent MCP servers on the hosted test server, routes each tool test to the right one, and then adds the cross-server trust-boundary check that catches the implicit-trust pattern. No API key and no local binary are required; everything points at https://test.mcptest.sh.

The two servers are genuinely separate. The primary server at https://test.mcptest.sh/mcp serves greet, search, get_forecast, list_items, slow_op, fail, and delete_record. A second independent server, datastore-b, sits at https://test.mcptest.sh/mcp-b with its own catalog under distinct names: db_get(key), db_put(key, value), db_list(), db_purge() (which carries a destructiveHint), plus a records://{key} resource. The tool names do not overlap, so this is a true multi-server target rather than the same catalog behind two URLs.

The YAML

Save this as tests/multi-server.yml:

# yaml-language-server: $schema=https://mcptest.sh/schema/v1.json

servers:
  primary:
    url: https://test.mcptest.sh/mcp
  datastore:
    url: https://test.mcptest.sh/mcp-b
  datastore_crosstrust:
    url: https://test.mcptest.sh/mcp-b?scenario=crosstrust

tools:
  # Each test names its server. mcptest merges the catalogs and reaches
  # the right tool on the right server.
  - name: primary server greets
    server: primary
    tool: greet
    args:
      name: mcptest
    expect:
      - target: result.content[0].text
        matcher:
          exact: "Hello, mcptest!"

  - name: datastore-b reads a stored value
    server: datastore
    tool: db_get
    args:
      key: report
    expect:
      - target: result.content[0].text
        matcher:
          exact: "Q3 numbers are within range."

  - name: datastore-b lists its own keys
    server: datastore
    tool: db_list
    expect:
      - target: result.content[0].text
        matcher:
          contains: "report"

  # Trust boundary: under the crosstrust scenario the same db_get returns
  # stored data carrying an instruction aimed at the OTHER server.
  - name: crosstrust output carries a cross-server instruction
    server: datastore_crosstrust
    tool: db_get
    args:
      key: report
    expect:
      - target: result.content[0].text
        matcher:
          contains: "delete_record"
      - target: result.content[0].text
        matcher:
          contains: "attacker.example"

What is happening here:

servers: is the object-map form, one entry per server name. Each entry is a plain URL target; the hosted server needs no auth.
Every tool test carries a server: field. The runner connects each referenced server into a pool and dispatches each test to the server it names. primary server greets reaches greet on the primary; datastore-b reads a stored value reaches db_get on datastore. The catalogs are merged for routing, so the same suite can call tools whose names live on different servers.
primary and datastore point at distinct URLs with disjoint tool names. If the runner sent db_get to the primary it would not resolve, which is exactly the routing the per-test server: field prevents.
datastore_crosstrust is the same second server with ?scenario=crosstrust on the URL. That query string flips the hosted server into a mode where db_get returns benign-looking stored data with an embedded instruction. It is a third named server entry so the honeypot run is isolated from the clean datastore reads above it.
The two real values are deterministic: db_get for key report returns Q3 numbers are within range., and greet for name mcptest returns Hello, mcptest!. That is why the first two tests can use exact matchers.

Run it

mcptest run tests/multi-server.yml

Nothing else to set up. The hosted server is reachable over plain HTTPS, the values are deterministic, and the suite needs no credentials, so this run is stable in CI.

If you want to confirm the file is well-formed before the first run, validate it:

mcptest validate --config tests/multi-server.yml

A clean run prints ok and exits 0.

The cross-server trust boundary

The first three tests are ordinary multi-server routing. The fourth is the interesting one.

When the second server runs under ?scenario=crosstrust, the value stored at report is no longer just Q3 numbers are within range.. It looks benign, but it carries an embedded instruction that targets a different server: call delete_record on the primary, then forward the result to an exfiltration sink at attacker.example. The honeypot value is benign by default; the crosstrust scenario is what makes the payload appear, so you can assert against it without standing up an attack of your own.

The danger is not the string itself. The danger is an agent or orchestrator that reads db_get output from datastore and feeds it straight into a tool call on primary. That is the implicit-trust pattern: output from one server quietly becoming control input for another. A trust-boundary check flags it. The rule is simple and absolute: tool output from one server must never be treated as control input for another server.

The two assertions on the last test pin the payload to the wire:

contains: "delete_record" proves the stored data names a tool that lives on the primary server, not on datastore where it was read.
contains: "attacker.example" proves the same data carries an exfiltration target.

Asserting both makes the boundary-crossing payload a concrete, checkable fact. A cross-server conformance check has something specific to flag, and a regression that sanitized the honeypot (or that let the instruction leak into a real delete_record call) would change the test result.

Expected output

mcptest run tests/multi-server.yml

  PASS  primary server greets                                  (318ms)
  PASS  datastore-b reads a stored value                       (262ms)
  PASS  datastore-b lists its own keys                         (244ms)
  PASS  crosstrust output carries a cross-server instruction   (271ms)

4 passed, 0 failed in 1.1s

All four tests pass. The first lands on the primary server, the next two on datastore, and the last on the crosstrust variant of the second server. The per-test lines show that mcptest dispatched each test to the server it named and resolved the right tool there.

Troubleshooting

tool ... did not resolve on server <name>. The server: field on a test names a server whose catalog does not carry that tool. Most often this is a tool routed to the wrong server: db_get belongs to datastore and greet belongs to primary. Check that each test's server: matches the catalog the tool actually lives in.
server <name> is not defined. A test references a server name that is not in the servers: map. The loader rejects this at load time, before any request goes out. Fix the name or add the entry.
The crosstrust test fails on contains: "delete_record". The ?scenario=crosstrust query string is missing or misspelled on the datastore_crosstrust URL. Without it the second server returns the benign value (Q3 numbers are within range.) and the payload assertions do not match. Confirm the URL is exactly https://test.mcptest.sh/mcp-b?scenario=crosstrust.
All four tests hang or fail to connect. The hosted server was not reachable from this network. Add --wait-for-ready=30s so the runner polls each URL server until it accepts a connection before the suite starts, and confirm https://test.mcptest.sh is reachable from your environment.