Commit 29cb1ba

examples(mrtr): handler-shape comparison deck (SEP-2322)
Python-SDK counterpart to typescript-sdk#1701. Seven ways to write the same weather-lookup tool so the diff between files is the argument.

SDK primitives (src/mcp/server/experimental/mrtr.py):
- MrtrCtx.once(): idempotency guard tracked in request_state (Option F)
- ToolBuilder: structural step decomposition; end_step runs exactly once regardless of round count (Option G)
- input_response(): sugar for the guard-first pattern
- sse_retry_shim(): Option A comparison artifact (pragma no-cover until LATEST_PROTOCOL_VERSION bumps past the MRTR gate)
- dispatch_by_version(): Option D comparison artifact

Option examples (examples/servers/mrtr-options/):
- E (degrade-only): the SDK default. MRTR-native; pre-MRTR gets a default or error. Both quadrant rows collapse here.
- A (SSE shim): SDK emulates retry over SSE. Safe re-entry, hidden loop.
- B (await shim): exception-based. UNSAFE: hidden double-execution above await. Not a ship target; for contrast.
- C (version branch): explicit if/else in handler body.
- D (dual handler): two functions, SDK picks by version.
- F (ctx.once): idempotency guard, opt-in per side-effect.
- G (ToolBuilder): no above-the-guard zone; end_step structurally unreachable until all elicitations complete.

The invariant test (tests/experimental/test_mrtr.py) parametrises E/F/G against the same Client + callback to prove identical wire behaviour; the server's internal choice doesn't leak. The footgun test measures audit_log count to prove F and G actually hold the guard (naive handler fires twice; F and G fire once).

Both F and G depend on request_state integrity. The demos use plain base64-JSON; a production SDK MUST HMAC-sign the blob.
1 parent 25fb05f commit 29cb1ba

14 files changed: +1320 -0 lines changed
Lines changed: 115 additions & 0 deletions
@@ -0,0 +1,115 @@
# MRTR handler-shape options (SEP-2322)

Python-SDK counterpart to [typescript-sdk#1701]. Seven ways to write the same
weather-lookup tool, so the diff between files is the argument.

Unlike the TS demos, the lowlevel plumbing here is **real**: each option is
an actual `mcp.server.Server` that round-trips `IncompleteResult` through the
wire protocol. The invariant test at the bottom asserts they all produce
identical client-observed behaviour.

[typescript-sdk#1701]: https://github.com/modelcontextprotocol/typescript-sdk/pull/1701

## The quadrant

| Server infra                    | Pre-MRTR client                   | MRTR client |
| ------------------------------- | --------------------------------- | ----------- |
| Can hold SSE                    | E by default; A/C/D if you opt in | MRTR        |
| MRTR-only (horizontally scaled) | E by necessity                    | MRTR        |

Both rows *work* for old clients: version negotiation succeeds,
`tools/list` is complete, and tools that don't elicit are unaffected. Only
elicitation inside a tool is unavailable. Bottom-left isn't "unresolvable";
it's "E is the only option." Top-left is "E, unless you choose to carry SSE
infra." The rows collapse for E, which is why it's the SDK default.

## Options

| Option | Author writes | SDK does | Hidden re-entry | Old client gets |
| ------ | ------------- | -------- | --------------- | --------------- |
| [E](mrtr_options/option_e_degrade.py) | MRTR-native only | Nothing | No | Result w/ default, or error |
| [A](mrtr_options/option_a_sse_shim.py) | MRTR-native only | Retry-loop over SSE | Yes, safe | Full elicitation |
| [B](mrtr_options/option_b_await_shim.py) | `await elicit()` | Exception → `IncompleteResult` | **Yes, unsafe** | Full elicitation |
| [C](mrtr_options/option_c_version_branch.py) | One handler, `if version` branch | Version accessor | No | Full elicitation |
| [D](mrtr_options/option_d_dual_handler.py) | Two handlers | Picks by version | No | Full elicitation |
| [F](mrtr_options/option_f_ctx_once.py) | MRTR-native + `ctx.once` wraps | `once()` guard in request_state | No | (same as E) |
| [G](mrtr_options/option_g_tool_builder.py) | Step functions + `.build()` | Step-tracking in request_state | No | (same as E) |

"Hidden re-entry" = the handler function is invoked more than once for a
single logical tool call, and the author can't tell from the source text.

**A is safe** because MRTR-native code has the re-entry guard (`if not
prefs: return IncompleteResult(...)`) visible in source even though the
*loop* is hidden.
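
As a rough illustration of why a visible guard makes a hidden loop tolerable, here is a minimal, SDK-free sketch of a retry shim. `Incomplete`, `retry_shim`, and `fake_client` are hypothetical stand-ins invented for this example; they are not the real `IncompleteResult` or `sse_retry_shim`, whose signatures are not shown in this README.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Incomplete:
    """Hypothetical stand-in for ``IncompleteResult``: the questions still open."""

    input_requests: dict[str, str]


Result = Union[str, Incomplete]
Handler = Callable[[dict[str, str], dict[str, str]], Result]


def weather(arguments: dict[str, str], input_responses: dict[str, str]) -> Result:
    # The guard is visible in source: no answer yet, so ask and return early.
    if "units" not in input_responses:
        return Incomplete({"units": "Which units?"})
    temp = "22°C" if input_responses["units"] == "metric" else "72°F"
    return f"Weather in {arguments['location']}: {temp}"


def retry_shim(
    handler: Handler, ask: Callable[[str], str], max_rounds: int = 8
) -> Callable[[dict[str, str]], str]:
    """Drive the retry loop locally, roughly as a shim might for an old client."""

    def wrapped(arguments: dict[str, str]) -> str:
        responses: dict[str, str] = {}
        for _ in range(max_rounds):
            result = handler(arguments, responses)
            if not isinstance(result, Incomplete):
                return result
            # Fulfil every open request, then re-invoke the handler from the top.
            for key, question in result.input_requests.items():
                responses[key] = ask(question)
        raise RuntimeError("handler never completed")

    return wrapped


calls: list[str] = []


def fake_client(question: str) -> str:
    # Assumption: stands in for a real elicitation round trip to the client.
    calls.append(question)
    return "metric"


print(retry_shim(weather, fake_client)({"location": "Oslo"}))
```

The handler body is invoked twice here, but nothing sits above the guard, so the second invocation is harmless.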

**B is unsafe** because `await elicit()` looks like a suspension point but
is actually a re-entry point on MRTR sessions; see the `audit_log`
landmine in that file.

## Footgun prevention (F, G)

A–E are about the dual-path axis (old client vs new). F and G address a
different axis: even in a pure-MRTR world, the naive handler shape has a
footgun. Code above the `if not prefs` guard runs on every retry. If that
code is a DB write or HTTP POST, it executes N times for N-round
elicitation. Nothing *enforces* putting side-effects below the guard;
safety depends on the developer knowing the convention. The analogy from
SDK-WG review: the naive MRTR handler is de-facto GOTO.
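
The footgun can be reproduced without the SDK. The sketch below is a hypothetical simplification: `naive_handler` returns either the final text or a list of keys it still needs, and `drive` plays the client. The second question (`detail`) is invented purely to show a multi-round case.

```python
from __future__ import annotations

audit_entries: list[str] = []


def naive_handler(location: str, answers: dict[str, str]) -> str | list[str]:
    """Return the final text, or the keys still needed (hypothetical shape)."""
    audit_entries.append(location)  # side-effect ABOVE the guards: runs on every entry
    if "units" not in answers:
        return ["units"]
    if "detail" not in answers:
        return ["detail"]
    return f"{location}: {answers['units']}, {answers['detail']}"


def drive(location: str, client_answers: dict[str, str]) -> str:
    """Play the client: retry with accumulated answers until the result is complete."""
    answers: dict[str, str] = {}
    while True:
        result = naive_handler(location, answers)
        if isinstance(result, str):
            return result
        for key in result:
            answers[key] = client_answers[key]


text = drive("Oslo", {"units": "metric", "detail": "brief"})
print(text, len(audit_entries))
```

Two sequential questions take three round trips (two incomplete, one final), so the unguarded side-effect is recorded three times for one logical tool call.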

**F (`MrtrCtx.once`)** keeps the monolithic handler but wraps side-effects
in an idempotency guard. `ctx.once("audit", lambda: audit_log(...))` checks
`request_state`: if the key is marked executed, the call is skipped. It is
opt-in: an unwrapped mutation still fires twice. The footgun is made
*visually distinct*, which is reviewable.
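
A minimal sketch of the idea, assuming the executed keys travel in a base64-JSON blob as the demos do. `OnceCtx` and `export_state` are hypothetical names for this illustration; the real `MrtrCtx` may be shaped differently.

```python
from __future__ import annotations

import base64
import json
from typing import Callable


class OnceCtx:
    """Hypothetical context that remembers which side-effects already ran."""

    def __init__(self, request_state: str | None = None) -> None:
        raw = base64.b64decode(request_state) if request_state else b"{}"
        self._done: set[str] = set(json.loads(raw).get("done", []))

    def once(self, key: str, effect: Callable[[], None]) -> None:
        if key in self._done:
            return  # already executed in an earlier round
        effect()
        self._done.add(key)

    def export_state(self) -> str:
        """Serialise the executed keys so the client can echo them back."""
        payload = json.dumps({"done": sorted(self._done)}).encode()
        return base64.b64encode(payload).decode()


audit: list[str] = []

# Round 1: no state yet, so the effect fires and the key is recorded.
ctx1 = OnceCtx()
ctx1.once("audit", lambda: audit.append("Oslo"))
state = ctx1.export_state()

# Round 2: the client echoes the state back; the guarded effect is skipped.
ctx2 = OnceCtx(state)
ctx2.once("audit", lambda: audit.append("Oslo"))

print(audit)
```

Anything not wrapped in `once` would still run in both rounds, which is the opt-in limitation described above.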

**G (`ToolBuilder`)** decomposes the handler into named step functions.
`incomplete_step` may return `IncompleteResult` or data; `end_step` receives
everything and runs exactly once. There is no "above the guard" zone because
there is no guard: the SDK's step-tracking is the guard. Side-effects go in
`end_step`, which is structurally unreachable until all elicitations complete.
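
The structural property can be sketched in a few lines. `ToolSketch` and `Ask` are hypothetical names, and this simplified version re-evaluates the steps against the answers received so far instead of tracking step progress in `request_state` as the real `ToolBuilder` is described as doing.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional, Union


@dataclass
class Ask:
    """Hypothetical stand-in for an ``IncompleteResult`` carrying one question."""

    key: str
    question: str


Step = Callable[[dict], Optional[Ask]]


class ToolSketch:
    def __init__(self) -> None:
        self._steps: list[Step] = []
        self._end: Optional[Callable[[dict], str]] = None

    def incomplete_step(self, fn: Step) -> ToolSketch:
        self._steps.append(fn)
        return self

    def end_step(self, fn: Callable[[dict], str]) -> ToolSketch:
        self._end = fn
        return self

    def build(self) -> Callable[[dict], Union[Ask, str]]:
        if self._end is None:
            raise ValueError("end_step is required")
        steps, end = list(self._steps), self._end

        def handler(answers: dict) -> Union[Ask, str]:
            for step in steps:
                pending = step(answers)
                if pending is not None:
                    return pending  # still waiting on the client
            return end(answers)  # reached only when every step is satisfied

        return handler


audit: list[str] = []


def ask_units(answers: dict) -> Optional[Ask]:
    return None if "units" in answers else Ask("units", "Which units?")


def finish(answers: dict) -> str:
    audit.append("lookup")  # the side-effect lives in the end step
    return f"units={answers['units']}"


handler = ToolSketch().incomplete_step(ask_units).end_step(finish).build()
first = handler({})
second = handler({"units": "metric"})
print(first, second, audit)
```

The author never writes a guard; the only place a side-effect can go is a function the builder calls last.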

Both depend on `request_state` integrity. The demos use plain base64-JSON;
a real SDK MUST HMAC-sign the blob, or the client can forge step-done
markers and skip the guards. A per-session key derived from `initialize`
keeps it stateless. Without signing, the safety story is advisory.
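
One possible shape for the signing, using only the standard library. This is a sketch under assumptions: the function names, the tag-then-payload layout, and the literal demo key are all hypothetical, and how a per-session key would actually be derived from `initialize` is not specified here.

```python
import base64
import hashlib
import hmac
import json

TAG_LEN = 32  # SHA-256 digest size in bytes


def sign_state(state: dict, key: bytes) -> str:
    """Serialise the state and prepend an HMAC-SHA256 tag."""
    payload = json.dumps(state, sort_keys=True, separators=(",", ":")).encode()
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(tag + payload).decode()


def verify_state(blob: str, key: bytes) -> dict:
    """Return the state only if the tag matches; otherwise raise ValueError."""
    raw = base64.urlsafe_b64decode(blob.encode())
    tag, payload = raw[:TAG_LEN], raw[TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("request_state failed integrity check")
    return json.loads(payload)


key = b"demo-session-key"  # assumption: a real key would be secret and per-session
blob = sign_state({"done": []}, key)

# A client tries to mark "audit" as done by swapping the payload under the old tag.
forged_payload = json.dumps({"done": ["audit"]}).encode()
old_tag = base64.urlsafe_b64decode(blob)[:TAG_LEN]
forged = base64.urlsafe_b64encode(old_tag + forged_payload).decode()

try:
    verify_state(forged, key)
except ValueError as exc:
    print("rejected:", exc)
```

Because the server only needs the key to verify the blob, no per-request state has to be stored server-side.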

## Trade-offs

**E is the SDK default.** A horizontally-scaled server gets E for free:
it's the only thing that works on that infra. A server that can hold SSE
also gets E by default, and opts into A/C/D only if serving old-client
elicitation is worth the extra infra dependency.

**A vs E** is the core tension. The author-facing code is the same
(MRTR-native); the only difference is whether old clients get elicitation.
A requires shipping `sse_retry_shim`; E requires nothing. A also carries a
deployment-time hazard E doesn't: the shim calls real SSE under the hood,
so on MRTR-only infra it fails at runtime when an old client connects, a
constraint that lives nowhere near the tool code.

**B** is zero-migration but breaks silently for anything non-idempotent
above the await. Not a ship target.

**C vs D** is factoring: one function with a branch vs two functions with a
dispatcher. Both put the dual-path burden on the tool author.
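
For a sense of what the D-style dispatcher amounts to, here is a small sketch. It is not the SDK's `dispatch_by_version()`, whose signature is not shown in this README; the handler shapes are hypothetical. The `"2026-06-01"` gate is the literal used in the option B and C demo files.

```python
from typing import Callable, Optional

MRTR_MIN_VERSION = "2026-06-01"  # gate literal taken from the demo files

ToolFn = Callable[[str], str]


def supports_mrtr(version: Optional[str]) -> bool:
    # Protocol versions are ISO dates (YYYY-MM-DD), so plain string
    # comparison orders them chronologically.
    return version is not None and version >= MRTR_MIN_VERSION


def dispatch(mrtr: ToolFn, sse: ToolFn) -> Callable[[Optional[str], str], str]:
    """Return a handler that picks one of two author-written functions."""

    def handler(version: Optional[str], location: str) -> str:
        chosen = mrtr if supports_mrtr(version) else sse
        return chosen(location)

    return handler


handler = dispatch(
    mrtr=lambda location: f"mrtr:{location}",
    sse=lambda location: f"sse:{location}",
)

print(handler("2026-06-01", "Oslo"))
print(handler("2025-11-25", "Oslo"))
print(handler(None, "Oslo"))
```

Option C inlines the same `supports_mrtr` test as an `if` inside a single handler; either way the author maintains both paths.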

**F vs G** is the footgun-prevention trade. F is minimal: one line per
side-effect, and it composes with any handler shape. G is structural:
double-execution is impossible for `end_step`, but it costs two function
defs per tool. Likely SDK answer: ship F as a primitive on the context, ship
G as an opt-in builder, recommend G for multi-round tools and F for
single-question tools.

## The invariant test

`tests/server/experimental/test_mrtr_options.py` parametrises all seven
servers against the same `Client` + `elicitation_callback`, asserting
identical output. The footgun test measures `audit_count` to prove F and G
hold the side-effect to one.

## Not in scope

- Persistent/Tasks workflow: `ServerTaskContext` already does
  `input_required`; MRTR integration is a separate PR
- `mrtrOnly` client flag: trivial to add, not demoed
- requestState HMAC signing: called out in code comments
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
"""MRTR handler-shape comparison: seven options on the same weather tool.

See README.md for the trade-off matrix. Every option here is a real lowlevel
``mcp.server.Server`` that produces identical wire behaviour to each client
version; the server's internal choice doesn't leak. That's the argument
against per-feature ``-mrtr`` capability flags.
"""
Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
"""Domain logic shared across all options, *not* SDK machinery.

The weather tool: given a location, asks which units, returns a temperature
string. Same tool throughout so the diff between option files is the
argument.

``audit_log`` is the side-effect that makes the MRTR footgun concrete: under
naive re-entry it fires once per round. Options F and G tame it.
"""

from __future__ import annotations

from mcp import types
from mcp.server import Server, ServerRequestContext

UNITS_SCHEMA: types.ElicitRequestedSchema = {
    "type": "object",
    "properties": {"units": {"type": "string", "enum": ["metric", "imperial"], "title": "Units"}},
    "required": ["units"],
}

UNITS_REQUEST = types.ElicitRequest(
    params=types.ElicitRequestFormParams(message="Which units?", requested_schema=UNITS_SCHEMA)
)


def lookup_weather(location: str, units: str) -> str:
    temp = "22°C" if units == "metric" else "72°F"
    return f"Weather in {location}: {temp}, partly cloudy."


_audit_count = 0


def audit_log(location: str) -> None:
    """The footgun. Under naive re-entry this fires N times for N-round MRTR."""
    global _audit_count
    _audit_count += 1
    print(f"[audit] lookup requested for {location} (count={_audit_count})")


def audit_count() -> int:
    return _audit_count


def reset_audit() -> None:
    global _audit_count
    _audit_count = 0


async def no_tools(ctx: ServerRequestContext, params: types.PaginatedRequestParams | None) -> types.ListToolsResult:
    """Minimal tools/list handler so Client validation has something to call."""
    return types.ListToolsResult(tools=[])


def build_server(name: str, on_call_tool: object, **kwargs: object) -> Server:
    """Consistent Server construction across option files."""
    return Server(name, on_call_tool=on_call_tool, on_list_tools=no_tools, **kwargs)  # type: ignore[arg-type]
Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
"""Option A: SDK shim emulates the MRTR retry loop over SSE. Hidden loop.

Tool author writes MRTR-native code only. The SDK wrapper detects the
negotiated version:
  - new client → pass ``IncompleteResult`` through, client drives retry
  - old client → SDK runs the retry loop *locally*, fulfilling each
    ``InputRequest`` via real SSE (``ctx.session.elicit_form()``),
    re-invoking the handler until it returns a complete result

Author experience: one code path. Re-entry is explicit in source (the
``if not prefs`` guard), so the handler is safe to re-invoke by
construction. But the *fact* that it's re-invoked for old clients is
invisible: the shim is doing work the author can't see.

What makes this "clunky but possible": the SDK runs a loop on the
author's behalf. If the handler does something expensive before the
guard, the author won't find out until an old client connects in prod.
Works, but it's magic.

Deployment hazard: ``sse_retry_shim`` calls real SSE under the hood.
On MRTR-only infra it fails at runtime when an old client connects,
a constraint that lives nowhere near the tool code. If that's the
deployment, use Option E.
"""

from __future__ import annotations

from mcp import types
from mcp.server import ServerRequestContext
from mcp.server.experimental.mrtr import input_response, sse_retry_shim

from ._shared import UNITS_REQUEST, build_server, lookup_weather

# ───────────────────────────────────────────────────────────────────────────
# This is what the tool author writes. One function, MRTR-native. No
# version check, no SSE awareness. The ``if not prefs`` guard IS the
# re-entry contract; the author sees it, but doesn't see the shim
# calling this in a loop for old-client sessions.
# ───────────────────────────────────────────────────────────────────────────


async def weather(
    ctx: ServerRequestContext, params: types.CallToolRequestParams
) -> types.CallToolResult | types.IncompleteResult:
    location = (params.arguments or {}).get("location", "?")

    prefs = input_response(params, "units")
    if prefs is None:
        return types.IncompleteResult(input_requests={"units": UNITS_REQUEST})

    return types.CallToolResult(content=[types.TextContent(text=lookup_weather(location, prefs["units"]))])


# ───────────────────────────────────────────────────────────────────────────
# Registration applies the shim. In a real SDK this could be a flag on
# ``add_tool`` or inferred from the handler signature; the author opts in
# once at registration, not per-call.
# ───────────────────────────────────────────────────────────────────────────

server = build_server("mrtr-option-a", on_call_tool=sse_retry_shim(weather))
Lines changed: 93 additions & 0 deletions
@@ -0,0 +1,93 @@
"""Option B: exception-based shim, ``await elicit()`` canonical. The footgun.

Tool author writes today's ``await ctx.elicit(...)`` style. The shim routes:
  - old client → native SSE, blocks inline (today's behaviour exactly)
  - new client → ``elicit()`` raises ``NeedsInputSignal``, shim catches,
    emits ``IncompleteResult``. On retry the handler runs *from the top*
    and this time ``elicit()`` finds the answer in ``input_responses``.

Author experience: zero migration. Handlers that work today keep working.
The ``await`` reads linearly.

The problem: the ``await`` is a lie on MRTR sessions. Everything above it
re-executes on retry. Uncomment the ``audit_log()`` call below (and import
``audit_log`` from ``._shared``): an MRTR client then triggers *two* audit
entries for one tool call. A pre-MRTR client triggers one. Same source,
different observable behaviour, nothing warns.

Only safe if you can enforce "no side-effects before await" as a lint
rule, which is hard in practice.

**This is not a ship target; it's a cautionary comparison.**
"""

from __future__ import annotations

from mcp import types
from mcp.server import ServerRequestContext
from mcp.server.experimental.mrtr import input_response

from ._shared import UNITS_REQUEST, UNITS_SCHEMA, build_server, lookup_weather


class NeedsInputSignal(Exception):
    """Control-flow-by-exception. Unwound by the shim, packaged as IncompleteResult."""

    def __init__(self, input_requests: types.InputRequests) -> None:
        self.input_requests = input_requests
        super().__init__("NeedsInputSignal (control flow, not an error)")


async def elicit_or_signal(
    ctx: ServerRequestContext, params: types.CallToolRequestParams, key: str
) -> dict[str, str] | None:
    """The ``await``-able elicit that looks linear but isn't on MRTR."""
    version = ctx.session.client_params.protocol_version if ctx.session.client_params else None

    # Old client: native SSE, no trickery.
    if version is None or str(version) < "2026-06-01":
        result = await ctx.session.elicit_form(message="Which units?", requested_schema=UNITS_SCHEMA)
        if result.action != "accept" or not result.content:
            return None
        return {k: str(v) for k, v in result.content.items()}

    # New client: check input_responses first.
    prefs = input_response(params, key)
    if prefs is not None:
        return {k: str(v) for k, v in prefs.items()}

    # Not pre-supplied → signal the shim. Everything on the stack unwinds.
    # On retry the handler re-executes from line one.
    raise NeedsInputSignal({key: UNITS_REQUEST})


# ───────────────────────────────────────────────────────────────────────────
# This is what the tool author writes. Looks linear. Isn't, on MRTR.
# ───────────────────────────────────────────────────────────────────────────


async def _weather_inner(ctx: ServerRequestContext, params: types.CallToolRequestParams) -> types.CallToolResult:
    location = (params.arguments or {}).get("location", "?")

    # audit_log(location)
    # ^^^^^^^^^^^^^^^^^^
    # On pre-MRTR: runs once. On MRTR: runs once on the initial call,
    # once more on the retry. The await below isn't a suspension point
    # on MRTR; it's a re-entry point. Nothing in this syntax says so.

    prefs = await elicit_or_signal(ctx, params, "units")
    if not prefs:
        return types.CallToolResult(content=[types.TextContent(text="Cancelled.")])

    return types.CallToolResult(content=[types.TextContent(text=lookup_weather(location, prefs["units"]))])


async def weather(
    ctx: ServerRequestContext, params: types.CallToolRequestParams
) -> types.CallToolResult | types.IncompleteResult:
    try:
        return await _weather_inner(ctx, params)
    except NeedsInputSignal as signal:
        return types.IncompleteResult(input_requests=signal.input_requests)


server = build_server("mrtr-option-b", on_call_tool=weather)
Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
"""Option C: explicit version branch in the handler body.

No shim. Tool author checks the negotiated version themselves and writes
both code paths inline. The SDK provides nothing except the version
accessor and the raw primitives for each path.

Author experience: everything is visible. Both protocol behaviours are
right there in source, separated by an ``if``. No hidden re-entry, no
magic wrappers. A reader traces exactly what happens for each client
version.

The cost is also visible: the elicitation schema is duplicated, the
cancel-handling is duplicated, and there's a conditional at the top of
every handler that uses elicitation. For one tool, fine. For twenty,
it's twenty copies of the same branch.
"""

from __future__ import annotations

from mcp import types
from mcp.server import ServerRequestContext
from mcp.server.experimental.mrtr import input_response

from ._shared import UNITS_REQUEST, UNITS_SCHEMA, build_server, lookup_weather


async def weather(
    ctx: ServerRequestContext, params: types.CallToolRequestParams
) -> types.CallToolResult | types.IncompleteResult:
    location = (params.arguments or {}).get("location", "?")
    version = ctx.session.client_params.protocol_version if ctx.session.client_params else None

    # ───────────────────────────────────────────────────────────────────────
    # The branch is the whole story.
    # ───────────────────────────────────────────────────────────────────────

    if version is not None and str(version) >= "2026-06-01":
        # MRTR path: check input_responses, return IncompleteResult if missing.
        prefs = input_response(params, "units")
        if prefs is None:
            return types.IncompleteResult(input_requests={"units": UNITS_REQUEST})
        return types.CallToolResult(content=[types.TextContent(text=lookup_weather(location, prefs["units"]))])

    # SSE path: inline await, blocks on the response stream.
    result = await ctx.session.elicit_form(message="Which units?", requested_schema=UNITS_SCHEMA)
    if result.action != "accept" or not result.content:
        return types.CallToolResult(content=[types.TextContent(text="Cancelled.")])
    units = str(result.content.get("units", "metric"))
    return types.CallToolResult(content=[types.TextContent(text=lookup_weather(location, units))])


server = build_server("mrtr-option-c", on_call_tool=weather)
