Skip to content

Python: Support structuredContent in MCP tool results and fix sampling options type#4763

Open
eavanvalkenburg wants to merge 9 commits intomicrosoft:mainfrom
eavanvalkenburg:agent/fix-4625-1
Open

Python: Support structuredContent in MCP tool results and fix sampling options type#4763
eavanvalkenburg wants to merge 9 commits intomicrosoft:mainfrom
eavanvalkenburg:agent/fix-4625-1

Conversation

@eavanvalkenburg
Copy link
Member

Motivation and Context

MCP servers can return structuredContent in CallToolResult, but the agent framework silently dropped this data. Additionally, the sampling handler's options dict lacked proper typing, making it inconsistent with the rest of the codebase.

Fixes #4625

Description

The _parse_tool_result_from_mcp function did not handle the structuredContent field on CallToolResult, so any structured data returned by MCP tools was lost. This fix appends structuredContent as a JSON-serialized Content.from_text entry to the parsed result list, and updates the sampling handler's options variable to use the typed ChatOptions[None] instead of a bare dict[str, Any]. Tests are added to verify that structuredContent is correctly included both alongside regular content and when it is the only content present.

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR follows the Contribution Guidelines
  • All unit tests pass, and I have added new tests where possible
  • Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

Note: PR autogenerated by eavanvalkenburg's agent

Copilot AI review requested due to automatic review settings March 18, 2026 13:34
@eavanvalkenburg eavanvalkenburg self-assigned this Mar 18, 2026
Copy link
Member Author

@eavanvalkenburg eavanvalkenburg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated Code Review

Reviewers: 4 | Confidence: 88%

✓ Correctness

The changes are correct and well-tested. The _parse_tool_result_from_mcp function now handles structuredContent by serializing it to JSON text, placed correctly after the content loop and before the empty-result fallback. The ChatOptions[None] type annotation is a safe improvement — it's a local variable annotation in function scope, so it's never evaluated at runtime, and during type-checking the Generic branch provides proper type safety. Both test cases cover the relevant scenarios (structured content alone, and combined with regular content). No bugs, no missing error handling, no incorrect assumptions.

✓ Security Reliability

The changes are minimal and low-risk. The structuredContent handling uses json.dumps on a value typed as dict[str, Any] | None by the MCP SDK, which is safe. The ChatOptions[None] type annotation is a purely cosmetic improvement. No injection risks, resource leaks, or unhandled failure modes introduced. One minor reliability concern: json.dumps can raise TypeError on non-serializable values if the MCP server returns unexpected types within the dict, but this is an edge case given the MCP protocol guarantees JSON-compatible content.

✓ Test Coverage

The two new tests for structuredContent handling in _parse_tool_result_from_mcp are well-written and cover the key scenarios (structured content alongside regular content, and structured content alone). However, the ChatOptions[None] type annotation change in sampling_callback has no corresponding test coverage. A minor missing edge case is that structuredContent=None (explicit) isn't tested, though it's implicitly covered by all existing tests that omit it.

✗ Design Approach

The diff makes two changes: (1) appends structuredContent as a JSON text Content item after regular content in _parse_tool_result_from_mcp, and (2) tightens the type annotation of options from dict[str, Any] to ChatOptions[None]. The structured-content parsing is a reasonable pragmatic approach given the framework's limited Content type set, but has a meaningful design flaw: structuredContent is appended unconditionally even when the regular content[] array already carries the same data serialised as text (which is FastMCP's default behaviour when result_type is used). The LLM will then see the same JSON payload twice, one in human-readable position and once as the appended structured blob. The ChatOptions[None] annotation is type-correct but purely cosmetic — None as the type parameter explicitly means 'no response model', so it does not fulfil the stated goal from the prior discussion of forwarding the desired output schema to get_response.

Flagged Issues

  • When a FastMCP server uses result_type, it typically serialises the structured value into both content[0] (as a text item) and structuredContent. The new code appends structuredContent unconditionally after all content items, causing the LLM to receive the same JSON payload twice. The fix should either (a) only emit structuredContent when content is empty (replacing the existing 'null' fallback), or (b) at minimum skip content items that are identical to the JSON-serialised structured content. Emitting both by default risks confusing the model and inflating context.

Suggestions

  • Consider whether structuredContent should be distinguished from regular text content (e.g., via a metadata flag or a different Content type) so downstream consumers can programmatically identify structured output without having to speculatively parse JSON.
  • Consider adding a test for structuredContent with nested complex types (e.g., nested dicts/lists) to verify json.dumps handles deeper structures correctly.
  • The ChatOptions[None] annotation is accurate but the None type parameter signals 'no structured output type', so it does not advance the goal (raised in prior discussion) of passing the server's expected schema as response_format to get_response. If that goal is deferred intentionally, add a TODO comment so it is not silently dropped. Existing sampling callback tests already exercise this code path, but a type-checking CI step (mypy/pyright) would catch any incompatibility.

Automated review by eavanvalkenburg's agents

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds support for retaining MCP structuredContent in parsed tool results and improves sampling callback option typing/forwarding so sampling metadata (system prompt/tools/tool choice) is passed through to the chat client.

Changes:

  • Preserve CallToolResult.structuredContent by appending it to parsed results as JSON text content.
  • Introduce typed ChatOptions usage and forward systemPrompt, tools, and toolChoice into get_response(..., options=...).
  • Add unit tests covering structured content parsing and sampling forwarding / session capability wiring.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File Description
python/packages/core/agent_framework/_mcp.py Parse structuredContent, build typed sampling options, and advertise sampling capabilities when a chat client is present
python/packages/core/tests/core/test_mcp.py Adds coverage for structuredContent parsing and sampling/connection behaviors

You can also share your feedback on Copilot code review. Take the survey.

Copy link
Member Author

@eavanvalkenburg eavanvalkenburg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated Code Review

Reviewers: 4 | Confidence: 88%

✓ Correctness

The changes are logically correct. The structuredContent fallback change correctly prioritizes content items over structuredContent (avoiding duplication), matching the MCP spec intent. The truthiness-to-None checks for params.systemPrompt and params.tools are a genuine bugfix: since both default to None, the old if params.systemPrompt: would silently drop an empty string system prompt, and if params.tools: would drop an empty tools list — both valid values a server might send. The default=str addition to json.dumps is a reasonable defensive measure. Tests are updated consistently with the behavior changes.

✓ Security Reliability

The changes are small and well-scoped. The structuredContent fallback logic and the is not None truthiness fixes are correct improvements. The default=str in json.dumps silently coerces arbitrary types to strings, which could mask data-corruption bugs rather than failing loudly, but this is a low-severity concern since structuredContent is a dict from the MCP SDK. The error message in the sampling callback at line 777 still exposes raw exception details (f"Failed to get chat message content: {ex}") which is returned as an MCP ErrorData—this could leak internal details to an untrusted MCP server, but this is pre-existing, not introduced by this diff. No injection, deserialization, secret, or resource-leak issues found.

✗ Test Coverage

The diff modifies two behaviors: (1) structuredContent is now a fallback (only used when no content items exist) with default=str added to json.dumps, and (2) the sampling callback now uses is not None checks instead of truthiness for systemPrompt and tools. The existing tests for structuredContent are updated and a new nested-object test is added, which is good. However, there are three test coverage gaps: no test for the default=str behavior on json.dumps (e.g., structuredContent containing a non-serializable type like datetime), no test for an empty-string systemPrompt being forwarded (the semantic change from truthiness to is not None), and no test for an empty tools list being forwarded (same truthiness-to-identity change).

✓ Design Approach

The diff makes three independent improvements: (1) making structuredContent a fallback-only when content items are absent, (2) adding default=str to the json.dumps call, and (3) tightening systemPrompt/tools checks from truthiness to explicit None comparisons. All three changes are directionally correct. The one design concern worth calling out is that permanently discarding structuredContent when any content items are present means structured data is irreversibly lost at the parsing boundary — even when the structured payload contains richer information than the accompanying human-readable text. The function signature -> list[Content] has no way to carry structured data forward, so the TODO about result_type/response_format will require a different entry point (e.g., a parallel _parse_tool_result_from_mcp_typed or a richer return type) rather than being retrofittable into this function. That said, the current fix is correct for the immediate LLM-consumption use case, and the limitation is explicitly acknowledged.

Flagged Issues

  • No tests cover the behavioral changes from truthiness to is not None checks. An empty string params.systemPrompt = "" is now forwarded as instructions, and an empty list params.tools = [] is now forwarded as tools, but no tests verify these edge cases.

Suggestions

  • Consider whether default=str in json.dumps(mcp_type.structuredContent, default=str) is the right tradeoff. Since structuredContent is typed as dict[str, Any] by the MCP SDK and comes from JSON deserialization, it should always be JSON-serializable. Omitting default=str would fail fast on unexpected types; alternatively, keep it but log a warning when the fallback is actually invoked.
  • Pre-existing: the sampling callback at line 777 returns f"Failed to get chat message content: {ex}" as an MCP ErrorData message. If the MCP server is untrusted, this could leak internal exception details (file paths, credentials in connection strings). Consider sanitizing the error message.
  • Add a test for structuredContent containing a non-JSON-serializable value (e.g. a datetime object) to exercise the default=str fallback in json.dumps.
  • Add tests for the empty-value edge cases: params.systemPrompt = "" should set options["instructions"] = "", and params.tools = [] should set options["tools"] = [].
  • The structuredContent is permanently discarded when content is non-empty, making it impossible to add typed result support (the result_type/response_format TODO) through _parse_tool_result_from_mcp without a breaking signature change. Consider noting this constraint in the TODO comment so the next author knows a parallel code path or richer return type will be needed.
  • An MCP server sending tools=[] will now produce options["tools"] = [] rather than omitting it. Downstream LLM clients likely filter via truthiness, but verify no middleware path treats options.get("tools") is not None as semantically meaningful.

Automated review by eavanvalkenburg's agents

Copy link
Member Author

@eavanvalkenburg eavanvalkenburg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated Code Review

Reviewers: 4 | Confidence: 92%

✓ Correctness

This is a small, correct change that improves error handling hygiene in the MCP sampling callback. It removes the raw exception message from the error response returned to the MCP server (preventing potential information leakage), and instead logs it locally at DEBUG level with full traceback via exc_info=True. The test is updated accordingly. The logger is already defined in the module. No correctness issues found.

✓ Security Reliability

This is a good security hardening change. It removes the internal exception message (which could leak implementation details like stack traces, internal service names, or credential fragments) from the error response sent back to the MCP server, and instead logs it server-side at debug level. The change is minimal, correct, and the test is properly updated. No issues found.

✗ Test Coverage

The diff strips exception details from the user-facing error message (good security practice) and adds debug logging with exc_info=True. The existing test test_mcp_tool_sampling_callback_chat_client_exception is updated to match the new message but does not verify that the debug log is actually emitted. Since the logging is the new behavior introduced by this change, it should be tested.

✓ Design Approach

The change strips the exception message from the user-facing error string and logs it at DEBUG level instead. This is a reasonable security/information-leakage improvement — internal exception details (which might include credentials, tokens, or internal hostnames from the chat client) should not be forwarded to the MCP server/caller in the error message. Logging at DEBUG with exc_info=True preserves full diagnostics for operators. The test is updated consistently. No design-level concerns: the fix is narrowly scoped, doesn't mask a deeper root cause, and doesn't extend a questionable pattern.

Flagged Issues

  • No test verifies the new logger.debug("Sampling callback error: ...") call. Since the purpose of this change is to move exception details from the user-facing message into the debug log, a test should confirm the debug log is actually produced (e.g., via caplog or by patching the logger).

Suggestions

  • Consider logging at WARNING or ERROR level rather than DEBUG, since a sampling callback failure is an operational error that operators need to notice in production logs where DEBUG is typically suppressed.

Automated review by eavanvalkenburg's agents

@markwallace-microsoft
Copy link
Member

markwallace-microsoft commented Mar 18, 2026

Python Test Coverage

Python Test Coverage Report •
FileStmtsMissCoverMissing
packages/core/agent_framework
   _mcp.py5497286%97–98, 108–113, 124, 129, 175, 184, 196–197, 260, 269, 332, 340, 399, 512, 547, 560, 562–565, 584–585, 598–601, 603–604, 608, 665, 706, 708, 712–713, 715–716, 791, 824, 869, 1001, 1014–1019, 1043, 1102–1103, 1109–1111, 1130, 1155–1156, 1160–1164, 1181–1185, 1332
TOTAL27274321988% 

Python Unit Test Overview

Tests Skipped Failures Errors Time
5346 20 💤 0 ❌ 0 🔥 1m 28s ⏱️

Copy link

@chetantoshniwal chetantoshniwal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated Code Review

Reviewers: 4 | Confidence: 81%

✓ Correctness

✗ Security Reliability

The diff adds structured content fallback handling, sampling capability negotiation, and richer option forwarding in the MCP sampling callback. There is one blocking reliability issue in the test file: the test for test_connect_sampling_capabilities_with_client is missing its async def function definition and its body is accidentally embedded inside test_mcp_tool_sampling_callback_omits_temperature_when_none, making it an unreliable merged test that won't be discoverable as a separate test case. In the production code, the json.dumps call on structuredContent can raise an unhandled TypeError for non-serializable payloads, which would propagate as an unexpected exception to the caller instead of being caught and returned gracefully. Additionally, the error logging in the sampling callback includes exc_info=True, which is fine at DEBUG level but the error message returned to the MCP server still includes the raw exception string, potentially leaking internal details across the trust boundary.

✗ Test Coverage

The diff adds good test coverage for the new structuredContent fallback and the expanded sampling_callback options forwarding. However, there is a critical structural bug: the test for test_connect_sampling_capabilities_with_client is missing its async def function definition, causing its body to be appended inside the unrelated test_mcp_tool_sampling_callback_omits_temperature_when_none test. This means the sampling-capabilities-with-client scenario will not run as a discoverable test, and it will corrupt the temperature-omission test. Additionally, the options or None expression on the get_response call is effectively dead code because options always contains at least max_tokens, making the dict truthy; a test verifying the 'no options' path would be misleading.

✗ Design Approach

The refactoring of sampling_callback to use a ChatOptions dict is the right direction, but contains one inconsistency that makes a guard expression permanently dead: options["max_tokens"] = params.maxTokens is unconditional, always inserting a key into the dict (even with a None value), which means options or None on the get_response call always evaluates to options — the guard never fires. Beyond the dead guard, passing an explicit None for max_tokens is semantically different from omitting the key, and some backend implementations may treat them differently. The structuredContent fallback and sampling_capabilities wiring are clean and well-scoped. However, there is a test-file bug: the test_connect_with_sampling_capabilities test function is missing its async def declaration — its body (starting with a bare docstring) is appended as dead code inside test_mcp_tool_sampling_callback_omits_temperature_when_none, so that test never runs.

Flagged Issues

  • python/packages/core/tests/core/test_mcp.py: The test for connect() with sampling_capabilities is missing its async def function definition. Its body is accidentally embedded at the end of test_mcp_tool_sampling_callback_omits_temperature_when_none, so (1) pytest won't discover it as a separate test, (2) it runs under a misleading test name, and (3) the temperature-omission test is corrupted by the appended code.
  • python/packages/core/agent_framework/_mcp.py: In sampling_callback, options["max_tokens"] = params.maxTokens is always executed unconditionally (even when params.maxTokens is None), inserting a None-valued key into the dict. This makes the options or None guard on the get_response call permanently dead (a dict with any key is truthy) and silently propagates max_tokens=None to the underlying client. Guard it the same way temperature is guarded: if params.maxTokens is not None: options["max_tokens"] = params.maxTokens.

Suggestions

  • In _parse_tool_result_from_mcp, wrap the json.dumps(mcp_type.structuredContent) call in a try/except to handle non-serializable payloads gracefully (e.g., fall back to str()) rather than letting TypeError propagate as an unhandled exception to the caller.
  • Consider sanitizing the exception message in the sampling callback error response (message=f"Failed to get chat message content: {ex}") to avoid leaking internal details (stack traces, file paths, config values) across the MCP trust boundary back to the server.
  • Add a test for the edge case where params.toolChoice is set but params.toolChoice.mode is None, to verify tool_choice is not added to options.
  • After fixing the max_tokens guard, consider simplifying or dropping the options or None expression entirely, depending on the client contract for handling an empty dict.

Automated review by chetantoshniwal's agents

Copilot and others added 9 commits March 20, 2026 11:10
Forward systemPrompt, tools, and toolChoice from MCP sampling requests
to the chat client's get_response() call. Also advertise the
sampling.tools capability to MCP servers when a client is configured.

- Pass SamplingCapability with tools support to ClientSession
- Convert systemPrompt to instructions in options
- Convert MCP Tool objects to FunctionTool instances for options
- Map MCP ToolChoice.mode to tool_choice in options
- Add tests for all new behaviors and update existing sampling tests

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…structured content

- Fix mypy error by typing sampling callback options as ChatOptions[None]
  instead of dict[str, Any], and importing ChatOptions from _types
- Handle structuredContent from CallToolResult in _parse_tool_result_from_mcp,
  serializing it as JSON text Content when present
- Add tests for structuredContent parsing (with and without regular content)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…dge-case tests

- Remove default=str from json.dumps for structuredContent to fail fast
  on non-JSON-serializable values instead of silently converting
- Add test for non-JSON-serializable structuredContent (TypeError)
- Add tests for empty systemPrompt ('') and empty tools list ([]) edge
  cases in sampling callback
- Expand TODO comment noting list[Content] return type constraint for
  future result_type support

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…t#4625)

Log exception details at DEBUG level instead of including them in the
ErrorData message returned to the MCP server, which may be untrusted.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…estore error info

- Remove stale TODO comment about response_format (ChatOptions already has it)
- Restore {ex} in sampling callback error message for useful debugging info
- Set structuredContent as additional_property on Content for structured access
- Move temperature, max_tokens, stop into options dict (not top-level kwargs)
- Only set temperature when provided (not all models support it)
- Add tests for generation params in options and temperature omission

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…osoft#4625)

- Guard max_tokens like temperature: only set when not None, so options
  can properly evaluate to None when all params are absent
- Wrap json.dumps of structuredContent in try/except to fall back to
  str() for non-serializable values instead of propagating TypeError
- Extract test_connect_sampling_capabilities_with_client into its own
  test function so pytest can discover it independently
- Add test for max_tokens=None omission from options
- Update structured content non-serializable test to expect fallback

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Python: [Feature]: Support for MCP sampling tool

4 participants