fix: repair orphaned toolUser in last message during session restore#2026
fix: repair orphaned toolUser in last message during session restore#2026konippi wants to merge 2 commits intostrands-agents:mainfrom
Conversation
|
Hi, I stumbled upon this PR while investigating the same issue in my own project. Nice fix — the In-place mutation makes the old test vacuousThe original fixed_messages = session_manager._fix_broken_tool_use(messages)
assert fixed_messages == messagesSince Your new assertions ( import copy
original = copy.deepcopy(messages)
fixed_messages = session_manager._fix_broken_tool_use(messages)
assert len(original) == 2 # sanity: original is untouched
assert len(fixed_messages) == 3The
|
|
Following up on my comment above — I went ahead and opened an issue and a PR for the test improvements:
Happy to adjust if anything doesn't fit your project conventions. |
Thanks for the thorough review and for catching both the deepcopy issue and the insert-induced edge case — really appreciate it! |
|
Assessment: Approve Good fix for a real bug. The change correctly extends Minor SuggestionThe docstring for Thanks for the clear PR description and test coverage! 🎉 |
|
Thanks for raising this pull request! I'm still a bit confused how this issue happens in the first place. We intentionally don't repair the latest message in the session manager, and instead rely on the agent invocation to fix this: https://github.com/strands-agents/sdk-python/blob/main/src/strands/agent/agent.py#L986-L1001 Basically, during agent invocation, we check to see if the latest message is a tool use, and if so, then we put a toolResult message after it so that the conversation is valid. Then we append the use message so we have a fully valid conversation. The reason we do this is because if you resume a session that had interrupted its tool use, you can pass None into your agent and that tool will automatically execute, saving an llm invoke. (This pull request goes into more detail: #1123) By automatically adding a toolResult after an orphaned tool use in the session manager, you break this behavior. But ideally you shouldnt be getting this error in the first place. Can you help me understand how you get this error? Maybe a small reproducible example? |
Description
When an agent process is terminated during tool execution (e.g., runtime timeout in multi-agent setups), the session ends with an assistant message containing
toolUsebut no correspondingtoolResult. On the next invocation,_fix_broken_tool_userestores the session but explicitly skips the last message, leaving the conversation in an invalid state that causes aValidationExceptionfrom the model provider.This change removes the last-message exclusion so that orphaned
toolUseat the end of the conversation is repaired during session restoration, the same way mid-conversation orphans are already handled.Related Issues
resolves: #2025
_fix_broken_tool_use(the last-message case was deferred toAgent._convert_prompt_to_messages, which doesn't cover all code paths)Documentation PR
N/AType of Change
Bug fix
Testing
How have you tested the change? Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli
hatch run prepareChecklist
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.