Scenario 14: rate limiting and backoff

You want to know what your test run does when the server pushes back with an HTTP 429. A real MCP server fronted by a gateway, an API quota, or a load shedder will eventually answer "too many requests" with a Retry-After header, and you would rather see that behavior in a controlled run than discover it for the first time in CI at 2am.

The hosted test server makes this easy to exercise. Point a URL target at https://test.mcptest.sh/mcp?scenario=ratelimit and every request comes back as HTTP 429 with a Retry-After: 1 header, before any JSON-RPC is handled. There is no happy path to fall through to; the endpoint exists only to push back. That gives mcptest's transport-level backoff something real to chew on and lets you watch how the failure finally surfaces.

What mcptest actually does here is worth being precise about. The Streamable HTTP transport retries a 429 (and 503 and other 5xx) a small, fixed number of times, honoring Retry-After when the server sends a whole-number-of-seconds value. There is no --retry-style knob for this: the backoff is built into the transport. The per-test --retry flag is a different thing (it re-runs a failing test, for flaky third-party services), and it does not come into play here because the 429 lands during connect, before any test step runs. So this scenario is about observing the built-in backoff and the clean transport error it ends with, not about configuring a retry policy.

The YAML

Save this as tests/ratelimit.yml:

# yaml-language-server: $schema=https://mcptest.sh/schema/v1.json

servers:
  ratelimit:
    url: "https://test.mcptest.sh/mcp?scenario=ratelimit"
    http:
      timeout: 30s
      connect_timeout: 5s

tools:
  - name: "lists tools"
    server: ratelimit
    tool: "tools/list"
    args: {}
    expect:
      - target: "error"
        matcher:
          exact: null
        message: "tools/list should not return an error"

What is happening here:

The ratelimit server points at the hosted endpoint with the scenario=ratelimit query parameter. Every request to it, including the MCP initialize handshake, comes back 429 with Retry-After: 1.
The transport retries the 429 a small, fixed number of times with exponential backoff. When the response carries a numeric Retry-After, the transport waits that many seconds instead of the default backoff step. With Retry-After: 1 you get roughly one second between attempts.
Because the endpoint answers 429 to every request, the retries are always exhausted. The transport gives up, the stream closes, and the initialize handshake fails. The run never reaches the tools/list step, so the tools: entry above is what the run was trying to reach, not the thing that fails.
http.timeout and http.connect_timeout are the per-request and TCP connect budgets. They bound how long a single attempt can hang; the retry loop sits on top of them. Neither field turns the 429 into a pass; they only cap how long each attempt waits for bytes.
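
The retry behavior described above can be sketched in a few lines of Python. This is an illustrative model, not mcptest's transport code: MAX_RETRIES, BASE_BACKOFF_S, is_transient, and post_with_retry are hypothetical names and values chosen for the example, since the real retry count and backoff steps are not stated here.

```python
from typing import Callable, List, Optional, Tuple

# Placeholders: the text says only "a small, fixed number" of retries and
# sub-second default steps, so both values below are assumptions.
MAX_RETRIES = 4
BASE_BACKOFF_S = 0.1

# (HTTP status, Retry-After in whole seconds if the server sent one)
Response = Tuple[int, Optional[int]]


def is_transient(status: int) -> bool:
    """429 and any 5xx are retried; every other status is final."""
    return status == 429 or 500 <= status <= 599


def post_with_retry(send: Callable[[], Response],
                    sleep: Callable[[float], None]) -> Response:
    """Send once, then retry transient failures up to MAX_RETRIES times."""
    status, retry_after = send()
    for attempt in range(MAX_RETRIES):
        if not is_transient(status):
            break
        if retry_after is not None:
            delay = float(retry_after)               # server-provided wait wins
        else:
            delay = BASE_BACKOFF_S * 2 ** attempt    # exponential fallback
        sleep(delay)
        status, retry_after = send()
    return status, retry_after


# The always-429 endpoint, modeled as a sender that never succeeds.
# Recording the sleeps instead of performing them keeps the sketch instant.
waits: List[float] = []
final = post_with_retry(lambda: (429, 1), waits.append)
print(final, waits)
```

Against a sender that always answers 429 with Retry-After: 1, the loop sleeps for one second per retry and then hands back the final 429. That is the point where the real transport gives up and the handshake fails.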

A note on the Retry-After value: mcptest parses it as a whole number of seconds. The HTTP spec also allows an HTTP-date form of Retry-After; the transport does not parse that form and falls back to its built-in backoff step when the value is not a plain integer. The hosted endpoint sends the integer form (1) so the wait is honored.
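
The integer-only parsing rule can be illustrated as follows. This is a minimal sketch of the behavior described above, not the transport's actual parser; parse_retry_after is a hypothetical helper, and returning None stands in for "fall back to the built-in backoff step".

```python
import re
from typing import Optional

# Plain ASCII digits only: this is the whole-number-of-seconds form.
_WHOLE_SECONDS = re.compile(r"[0-9]+")


def parse_retry_after(value: Optional[str]) -> Optional[int]:
    """Return the wait in seconds, or None to mean 'use the default backoff'."""
    if value is None:
        return None
    text = value.strip()
    if _WHOLE_SECONDS.fullmatch(text):
        return int(text)
    # HTTP-date form (or anything else) is not parsed; the caller falls back.
    return None


print(parse_retry_after("1"))                              # integer form, honored
print(parse_retry_after("Wed, 21 Oct 2015 07:28:00 GMT"))  # HTTP-date form, ignored
```

The first call yields 1 and the second yields None, which mirrors why the hosted endpoint's Retry-After: 1 is honored while a date-valued header would not be.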

Run it

mcptest run --config tests/ratelimit.yml

If you want to watch the retries happen, turn on transport debug logging. The Streamable HTTP transport logs each non-2xx response and each retry decision:

mcptest --log-level "mcptest_core::transport::streamable_http=debug" \
        run --config tests/ratelimit.yml

To probe the endpoint on its own before wiring it into a suite, point the layered network diagnostic at it:

mcptest doctor --url "https://test.mcptest.sh/mcp?scenario=ratelimit"

The [AUTH] / [MCP-INIT] rows will not reach OK here, because the server answers 429 to the initialize probe too. That is the expected result for this scenario, not a misconfiguration on your side.

Expected output

A run against the always-429 endpoint exhausts the transport's retries and then fails the connect, since the initialize handshake never completes:

mcptest run --config tests/ratelimit.yml

  FAIL  ratelimit  connect failed
        initialize handshake failed: transport closed
        last HTTP status: 429 Too Many Requests (Retry-After: 1)

0 passed, 1 failed in 4.3s

exit code: 1

With transport debug logging on, the retries are visible before the final failure:

mcptest --log-level "mcptest_core::transport::streamable_http=debug" run --config tests/ratelimit.yml

WARN mcptest_core::transport::streamable_http: non-2xx HTTP response  status=429 Too Many Requests
DEBUG mcptest_core::transport::streamable_http: post attempt failed   transient=true
WARN mcptest_core::transport::streamable_http: non-2xx HTTP response  status=429 Too Many Requests
DEBUG mcptest_core::transport::streamable_http: post attempt failed   transient=true
WARN mcptest_core::transport::streamable_http: non-2xx HTTP response  status=429 Too Many Requests
DEBUG mcptest_core::transport::streamable_http: post attempt failed   transient=true
...
  FAIL  ratelimit  connect failed

The wall-clock time (about four seconds in the example) reflects the Retry-After: 1 waits stacked across the retry attempts. A server that sent no Retry-After would fail faster, because the default backoff steps are sub-second.
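
The arithmetic behind that timing is simple enough to sketch. The retry count and the default backoff base below are assumptions for illustration (the text only says "a small, fixed number" and "sub-second"), and stacked_wait_ms is a hypothetical helper, so treat the outputs as order-of-magnitude estimates rather than documented values.

```python
from typing import Optional

ASSUMED_RETRIES = 4        # placeholder, not a documented value
ASSUMED_BACKOFF_MS = 100   # placeholder base for the default backoff


def stacked_wait_ms(retries: int,
                    retry_after_s: Optional[int] = None,
                    base_backoff_ms: int = ASSUMED_BACKOFF_MS) -> int:
    """Total time spent waiting between attempts, in milliseconds."""
    if retry_after_s is not None:
        # Every retry waits the server-advertised number of seconds.
        return retries * retry_after_s * 1000
    # No Retry-After: exponential steps of base, 2*base, 4*base, ...
    return sum(base_backoff_ms * 2 ** i for i in range(retries))


print(stacked_wait_ms(ASSUMED_RETRIES, retry_after_s=1))    # the hosted demo
print(stacked_wait_ms(ASSUMED_RETRIES, retry_after_s=30))   # a slower server
print(stacked_wait_ms(ASSUMED_RETRIES))                     # no Retry-After
```

Under these assumptions the three cases come to 4000 ms, 120000 ms, and 1500 ms of waiting. Request latency comes on top, but the comparison shows why a large Retry-After dominates the run time and why a server that sends none fails faster.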

Other transport endpoints on the same host are useful for the same kind of "see how a status surfaces" check:

GET https://test.mcptest.sh/status/503 returns the requested status code. 503 is also retried by the transport; a 404 is not (only 429 and 5xx statuses are treated as transient).
GET https://test.mcptest.sh/error returns 500.
GET https://test.mcptest.sh/health returns {"ok": true}, a clean 2xx, handy as a wait_for_ready target or a sanity check that the host is reachable at all.
GET https://test.mcptest.sh/slow?ms=N waits N milliseconds and then responds, which is the endpoint to use when you want to exercise http.timeout rather than a 429.

Troubleshooting

The run hangs much longer than expected. A large Retry-After value multiplied across the retry attempts adds up. The transport honors the server's Retry-After (in whole seconds) over its own backoff, so a server advertising Retry-After: 30 makes the run wait far longer than the always-429 demo does. Lower the value the server sends, or accept that a genuinely rate-limited server is telling you to slow down.
I expected --retry to make this pass. The per-test --retry N flag re-runs a test that failed its assertions; it does not change transport behavior, and it does not help here because the 429 fails the connect before any test step runs. There is no separate transport-retry flag; the backoff is built in and not configurable in v1.0.
A 429 against my real server fails the whole suite. That is the point of this scenario: if your server returns 429 during connect, every test behind it fails because the handshake never completes. Either raise the quota on the server side, or slow the client down, for example by reducing concurrency with --parallel 1 so the suite stops tripping the limit.
I want to assert on the rate-limit behavior itself. mcptest surfaces the 429 as a transport-level connect failure, not as a JSON-RPC result you can match with an expect: block, so there is nothing in the response body to assert against. Treat the non-zero exit code as the signal, and use --log-level "mcptest_core::transport::streamable_http=debug" to confirm the status and the retry path in the log.
My 4xx response is not retried at all. Only 429, 503, and other 5xx statuses are treated as transient. A 4xx other than 429 (a 400 or a 404) fails on the first attempt with no backoff, which is intentional: those are not worth retrying.
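
If you want a script to treat the non-zero exit code as the signal, as suggested above, a small wrapper is enough. This is a sketch under stated assumptions: run_expecting_failure is a hypothetical helper, and the command list simply repeats the invocation from the "Run it" section. It checks only the exit status and does not parse mcptest's output.

```python
import subprocess
from typing import List

# The invocation from the "Run it" section, split into argv form.
MCPTEST_CMD: List[str] = ["mcptest", "run", "--config", "tests/ratelimit.yml"]


def run_expecting_failure(cmd: List[str]) -> bool:
    """Run cmd and report whether it exited non-zero.

    For the always-429 scenario a non-zero exit is the expected outcome,
    so True here means "the rate limit surfaced as a failure".
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode != 0
```

Calling run_expecting_failure(MCPTEST_CMD) from a wrapper script or a CI step then returns True when the run fails the way this scenario expects. It raises FileNotFoundError if mcptest is not on the PATH, which is worth letting through rather than masking.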
