Python: Support structuredContent in MCP tool results and fix sampling options type#4763
Python: Support structuredContent in MCP tool results and fix sampling options type#4763eavanvalkenburg wants to merge 9 commits intomicrosoft:mainfrom
Conversation
eavanvalkenburg
left a comment
There was a problem hiding this comment.
Automated Code Review
Reviewers: 4 | Confidence: 88%
✓ Correctness
The changes are correct and well-tested. The
_parse_tool_result_from_mcpfunction now handlesstructuredContentby serializing it to JSON text, placed correctly after the content loop and before the empty-result fallback. TheChatOptions[None]type annotation is a safe improvement — it's a local variable annotation in function scope, so it's never evaluated at runtime, and during type-checking the Generic branch provides proper type safety. Both test cases cover the relevant scenarios (structured content alone, and combined with regular content). No bugs, no missing error handling, no incorrect assumptions.
✓ Security Reliability
The changes are minimal and low-risk. The
structuredContenthandling usesjson.dumpson a value typed asdict[str, Any] | Noneby the MCP SDK, which is safe. TheChatOptions[None]type annotation is a purely cosmetic improvement. No injection risks, resource leaks, or unhandled failure modes introduced. One minor reliability concern:json.dumpscan raiseTypeErroron non-serializable values if the MCP server returns unexpected types within the dict, but this is an edge case given the MCP protocol guarantees JSON-compatible content.
✓ Test Coverage
The two new tests for
structuredContenthandling in_parse_tool_result_from_mcpare well-written and cover the key scenarios (structured content alongside regular content, and structured content alone). However, theChatOptions[None]type annotation change insampling_callbackhas no corresponding test coverage. A minor missing edge case is thatstructuredContent=None(explicit) isn't tested, though it's implicitly covered by all existing tests that omit it.
✗ Design Approach
The diff makes two changes: (1) appends
structuredContentas a JSON textContentitem after regular content in_parse_tool_result_from_mcp, and (2) tightens the type annotation ofoptionsfromdict[str, Any]toChatOptions[None]. The structured-content parsing is a reasonable pragmatic approach given the framework's limitedContenttype set, but has a meaningful design flaw:structuredContentis appended unconditionally even when the regularcontent[]array already carries the same data serialised as text (which is FastMCP's default behaviour whenresult_typeis used). The LLM will then see the same JSON payload twice, one in human-readable position and once as the appended structured blob. TheChatOptions[None]annotation is type-correct but purely cosmetic —Noneas the type parameter explicitly means 'no response model', so it does not fulfil the stated goal from the prior discussion of forwarding the desired output schema toget_response.
Flagged Issues
- When a FastMCP server uses
result_type, it typically serialises the structured value into bothcontent[0](as a text item) andstructuredContent. The new code appendsstructuredContentunconditionally after allcontentitems, causing the LLM to receive the same JSON payload twice. The fix should either (a) only emitstructuredContentwhencontentis empty (replacing the existing 'null' fallback), or (b) at minimum skipcontentitems that are identical to the JSON-serialised structured content. Emitting both by default risks confusing the model and inflating context.
Suggestions
- Consider whether
structuredContentshould be distinguished from regular text content (e.g., via a metadata flag or a different Content type) so downstream consumers can programmatically identify structured output without having to speculatively parse JSON. - Consider adding a test for
structuredContentwith nested complex types (e.g., nested dicts/lists) to verifyjson.dumpshandles deeper structures correctly. - The
ChatOptions[None]annotation is accurate but theNonetype parameter signals 'no structured output type', so it does not advance the goal (raised in prior discussion) of passing the server's expected schema asresponse_formattoget_response. If that goal is deferred intentionally, add a TODO comment so it is not silently dropped. Existing sampling callback tests already exercise this code path, but a type-checking CI step (mypy/pyright) would catch any incompatibility.
Automated review by eavanvalkenburg's agents
There was a problem hiding this comment.
Pull request overview
Adds support for retaining MCP structuredContent in parsed tool results and improves sampling callback option typing/forwarding so sampling metadata (system prompt/tools/tool choice) is passed through to the chat client.
Changes:
- Preserve
CallToolResult.structuredContentby appending it to parsed results as JSON text content. - Introduce typed
ChatOptionsusage and forwardsystemPrompt,tools, andtoolChoiceintoget_response(..., options=...). - Add unit tests covering structured content parsing and sampling forwarding / session capability wiring.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| python/packages/core/agent_framework/_mcp.py | Parse structuredContent, build typed sampling options, and advertise sampling capabilities when a chat client is present |
| python/packages/core/tests/core/test_mcp.py | Adds coverage for structuredContent parsing and sampling/connection behaviors |
You can also share your feedback on Copilot code review. Take the survey.
eavanvalkenburg
left a comment
There was a problem hiding this comment.
Automated Code Review
Reviewers: 4 | Confidence: 88%
✓ Correctness
The changes are logically correct. The structuredContent fallback change correctly prioritizes content items over structuredContent (avoiding duplication), matching the MCP spec intent. The truthiness-to-None checks for
params.systemPromptandparams.toolsare a genuine bugfix: since both default toNone, the oldif params.systemPrompt:would silently drop an empty string system prompt, andif params.tools:would drop an empty tools list — both valid values a server might send. Thedefault=straddition tojson.dumpsis a reasonable defensive measure. Tests are updated consistently with the behavior changes.
✓ Security Reliability
The changes are small and well-scoped. The
structuredContentfallback logic and theis not Nonetruthiness fixes are correct improvements. Thedefault=strinjson.dumpssilently coerces arbitrary types to strings, which could mask data-corruption bugs rather than failing loudly, but this is a low-severity concern sincestructuredContentis a dict from the MCP SDK. The error message in the sampling callback at line 777 still exposes raw exception details (f"Failed to get chat message content: {ex}") which is returned as an MCPErrorData—this could leak internal details to an untrusted MCP server, but this is pre-existing, not introduced by this diff. No injection, deserialization, secret, or resource-leak issues found.
✗ Test Coverage
The diff modifies two behaviors: (1) structuredContent is now a fallback (only used when no content items exist) with
default=stradded to json.dumps, and (2) the sampling callback now usesis not Nonechecks instead of truthiness for systemPrompt and tools. The existing tests for structuredContent are updated and a new nested-object test is added, which is good. However, there are three test coverage gaps: no test for thedefault=strbehavior on json.dumps (e.g., structuredContent containing a non-serializable type like datetime), no test for an empty-string systemPrompt being forwarded (the semantic change from truthiness tois not None), and no test for an empty tools list being forwarded (same truthiness-to-identity change).
✓ Design Approach
The diff makes three independent improvements: (1) making
structuredContenta fallback-only whencontentitems are absent, (2) addingdefault=strto thejson.dumpscall, and (3) tighteningsystemPrompt/toolschecks from truthiness to explicitNonecomparisons. All three changes are directionally correct. The one design concern worth calling out is that permanently discardingstructuredContentwhen anycontentitems are present means structured data is irreversibly lost at the parsing boundary — even when the structured payload contains richer information than the accompanying human-readable text. The function signature-> list[Content]has no way to carry structured data forward, so the TODO aboutresult_type/response_formatwill require a different entry point (e.g., a parallel_parse_tool_result_from_mcp_typedor a richer return type) rather than being retrofittable into this function. That said, the current fix is correct for the immediate LLM-consumption use case, and the limitation is explicitly acknowledged.
Flagged Issues
- No tests cover the behavioral changes from truthiness to
is not Nonechecks. An empty stringparams.systemPrompt = ""is now forwarded asinstructions, and an empty listparams.tools = []is now forwarded astools, but no tests verify these edge cases.
Suggestions
- Consider whether
default=strinjson.dumps(mcp_type.structuredContent, default=str)is the right tradeoff. SincestructuredContentis typed asdict[str, Any]by the MCP SDK and comes from JSON deserialization, it should always be JSON-serializable. Omittingdefault=strwould fail fast on unexpected types; alternatively, keep it but log a warning when the fallback is actually invoked. - Pre-existing: the sampling callback at line 777 returns
f"Failed to get chat message content: {ex}"as an MCPErrorDatamessage. If the MCP server is untrusted, this could leak internal exception details (file paths, credentials in connection strings). Consider sanitizing the error message. - Add a test for
structuredContentcontaining a non-JSON-serializable value (e.g. adatetimeobject) to exercise thedefault=strfallback injson.dumps. - Add tests for the empty-value edge cases:
params.systemPrompt = ""should setoptions["instructions"] = "", andparams.tools = []should setoptions["tools"] = []. - The
structuredContentis permanently discarded whencontentis non-empty, making it impossible to add typed result support (theresult_type/response_formatTODO) through_parse_tool_result_from_mcpwithout a breaking signature change. Consider noting this constraint in the TODO comment so the next author knows a parallel code path or richer return type will be needed. - An MCP server sending
tools=[]will now produceoptions["tools"] = []rather than omitting it. Downstream LLM clients likely filter via truthiness, but verify no middleware path treatsoptions.get("tools") is not Noneas semantically meaningful.
Automated review by eavanvalkenburg's agents
eavanvalkenburg
left a comment
There was a problem hiding this comment.
Automated Code Review
Reviewers: 4 | Confidence: 92%
✓ Correctness
This is a small, correct change that improves error handling hygiene in the MCP sampling callback. It removes the raw exception message from the error response returned to the MCP server (preventing potential information leakage), and instead logs it locally at DEBUG level with full traceback via exc_info=True. The test is updated accordingly. The logger is already defined in the module. No correctness issues found.
✓ Security Reliability
This is a good security hardening change. It removes the internal exception message (which could leak implementation details like stack traces, internal service names, or credential fragments) from the error response sent back to the MCP server, and instead logs it server-side at debug level. The change is minimal, correct, and the test is properly updated. No issues found.
✗ Test Coverage
The diff strips exception details from the user-facing error message (good security practice) and adds debug logging with
exc_info=True. The existing testtest_mcp_tool_sampling_callback_chat_client_exceptionis updated to match the new message but does not verify that the debug log is actually emitted. Since the logging is the new behavior introduced by this change, it should be tested.
✓ Design Approach
The change strips the exception message from the user-facing error string and logs it at DEBUG level instead. This is a reasonable security/information-leakage improvement — internal exception details (which might include credentials, tokens, or internal hostnames from the chat client) should not be forwarded to the MCP server/caller in the error message. Logging at DEBUG with exc_info=True preserves full diagnostics for operators. The test is updated consistently. No design-level concerns: the fix is narrowly scoped, doesn't mask a deeper root cause, and doesn't extend a questionable pattern.
Flagged Issues
- No test verifies the new
logger.debug("Sampling callback error: ...")call. Since the purpose of this change is to move exception details from the user-facing message into the debug log, a test should confirm the debug log is actually produced (e.g., viacaplogor by patching the logger).
Suggestions
- Consider logging at WARNING or ERROR level rather than DEBUG, since a sampling callback failure is an operational error that operators need to notice in production logs where DEBUG is typically suppressed.
Automated review by eavanvalkenburg's agents
Python Test Coverage Report •
Python Unit Test Overview
|
||||||||||||||||||||||||||||||
chetantoshniwal
left a comment
There was a problem hiding this comment.
Automated Code Review
Reviewers: 4 | Confidence: 81%
✓ Correctness
✗ Security Reliability
The diff adds structured content fallback handling, sampling capability negotiation, and richer option forwarding in the MCP sampling callback. There is one blocking reliability issue in the test file: the test for
test_connect_sampling_capabilities_with_clientis missing itsasync deffunction definition and its body is accidentally embedded insidetest_mcp_tool_sampling_callback_omits_temperature_when_none, making it an unreliable merged test that won't be discoverable as a separate test case. In the production code, thejson.dumpscall onstructuredContentcan raise an unhandledTypeErrorfor non-serializable payloads, which would propagate as an unexpected exception to the caller instead of being caught and returned gracefully. Additionally, the error logging in the sampling callback includesexc_info=True, which is fine at DEBUG level but the error message returned to the MCP server still includes the raw exception string, potentially leaking internal details across the trust boundary.
✗ Test Coverage
The diff adds good test coverage for the new
structuredContentfallback and the expandedsampling_callbackoptions forwarding. However, there is a critical structural bug: the test fortest_connect_sampling_capabilities_with_clientis missing itsasync deffunction definition, causing its body to be appended inside the unrelatedtest_mcp_tool_sampling_callback_omits_temperature_when_nonetest. This means the sampling-capabilities-with-client scenario will not run as a discoverable test, and it will corrupt the temperature-omission test. Additionally, theoptions or Noneexpression on theget_responsecall is effectively dead code becauseoptionsalways contains at leastmax_tokens, making the dict truthy; a test verifying the 'no options' path would be misleading.
✗ Design Approach
The refactoring of
sampling_callbackto use aChatOptionsdict is the right direction, but contains one inconsistency that makes a guard expression permanently dead:options["max_tokens"] = params.maxTokensis unconditional, always inserting a key into the dict (even with aNonevalue), which meansoptions or Noneon theget_responsecall always evaluates tooptions— the guard never fires. Beyond the dead guard, passing an explicitNoneformax_tokensis semantically different from omitting the key, and some backend implementations may treat them differently. ThestructuredContentfallback andsampling_capabilitieswiring are clean and well-scoped. However, there is a test-file bug: thetest_connect_with_sampling_capabilitiestest function is missing itsasync defdeclaration — its body (starting with a bare docstring) is appended as dead code insidetest_mcp_tool_sampling_callback_omits_temperature_when_none, so that test never runs.
Flagged Issues
- python/packages/core/tests/core/test_mcp.py: The test for
connect() with sampling_capabilitiesis missing itsasync deffunction definition. Its body is accidentally embedded at the end oftest_mcp_tool_sampling_callback_omits_temperature_when_none, so (1) pytest won't discover it as a separate test, (2) it runs under a misleading test name, and (3) the temperature-omission test is corrupted by the appended code. - python/packages/core/agent_framework/_mcp.py: In
sampling_callback,options["max_tokens"] = params.maxTokensis always executed unconditionally (even whenparams.maxTokens is None), inserting aNone-valued key into the dict. This makes theoptions or Noneguard on theget_responsecall permanently dead (a dict with any key is truthy) and silently propagatesmax_tokens=Noneto the underlying client. Guard it the same way temperature is guarded:if params.maxTokens is not None: options["max_tokens"] = params.maxTokens.
Suggestions
- In
_parse_tool_result_from_mcp, wrap thejson.dumps(mcp_type.structuredContent)call in a try/except to handle non-serializable payloads gracefully (e.g., fall back tostr()) rather than lettingTypeErrorpropagate as an unhandled exception to the caller. - Consider sanitizing the exception message in the sampling callback error response (
message=f"Failed to get chat message content: {ex}") to avoid leaking internal details (stack traces, file paths, config values) across the MCP trust boundary back to the server. - Add a test for the edge case where
params.toolChoiceis set butparams.toolChoice.modeisNone, to verifytool_choiceis not added to options. - After fixing the max_tokens guard, consider simplifying or dropping the
options or Noneexpression entirely, depending on the client contract for handling an empty dict.
Automated review by chetantoshniwal's agents
Forward systemPrompt, tools, and toolChoice from MCP sampling requests to the chat client's get_response() call. Also advertise the sampling.tools capability to MCP servers when a client is configured. - Pass SamplingCapability with tools support to ClientSession - Convert systemPrompt to instructions in options - Convert MCP Tool objects to FunctionTool instances for options - Map MCP ToolChoice.mode to tool_choice in options - Add tests for all new behaviors and update existing sampling tests Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…structured content - Fix mypy error by typing sampling callback options as ChatOptions[None] instead of dict[str, Any], and importing ChatOptions from _types - Handle structuredContent from CallToolResult in _parse_tool_result_from_mcp, serializing it as JSON text Content when present - Add tests for structuredContent parsing (with and without regular content) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…dge-case tests
- Remove default=str from json.dumps for structuredContent to fail fast
on non-JSON-serializable values instead of silently converting
- Add test for non-JSON-serializable structuredContent (TypeError)
- Add tests for empty systemPrompt ('') and empty tools list ([]) edge
cases in sampling callback
- Expand TODO comment noting list[Content] return type constraint for
future result_type support
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…t#4625) Log exception details at DEBUG level instead of including them in the ErrorData message returned to the MCP server, which may be untrusted. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…estore error info
- Remove stale TODO comment about response_format (ChatOptions already has it)
- Restore {ex} in sampling callback error message for useful debugging info
- Set structuredContent as additional_property on Content for structured access
- Move temperature, max_tokens, stop into options dict (not top-level kwargs)
- Only set temperature when provided (not all models support it)
- Add tests for generation params in options and temperature omission
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…osoft#4625) - Guard max_tokens like temperature: only set when not None, so options can properly evaluate to None when all params are absent - Wrap json.dumps of structuredContent in try/except to fall back to str() for non-serializable values instead of propagating TypeError - Extract test_connect_sampling_capabilities_with_client into its own test function so pytest can discover it independently - Add test for max_tokens=None omission from options - Update structured content non-serializable test to expect fallback Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
73ec337 to
e08036a
Compare
Motivation and Context
MCP servers can return
structuredContentinCallToolResult, but the agent framework silently dropped this data. Additionally, the sampling handler'soptionsdict lacked proper typing, making it inconsistent with the rest of the codebase.Fixes #4625
Description
The
_parse_tool_result_from_mcpfunction did not handle thestructuredContentfield onCallToolResult, so any structured data returned by MCP tools was lost. This fix appendsstructuredContentas a JSON-serializedContent.from_textentry to the parsed result list, and updates the sampling handler'soptionsvariable to use the typedChatOptions[None]instead of a baredict[str, Any]. Tests are added to verify thatstructuredContentis correctly included both alongside regular content and when it is the only content present.Contribution Checklist