Preserve reasoning content in DurableAgent conversation history#1444
Conversation
…n history Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
🦋 Changeset detected — latest commit: bc66735. The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package.
🧪 E2E Test Results — ❌ Some tests failed

Failed tests:
- ▲ Vercel Production (2 failed): nextjs-webpack (1 failed), sveltekit (1 failed)
- 🌍 Community Worlds (56 failed): mongodb (3 failed), redis (2 failed), turso (51 failed)

Details by category:
- ❌ ▲ Vercel Production
- ✅ 💻 Local Development
- ✅ 📦 Local Production
- ✅ 🐘 Local Postgres
- ✅ 🪟 Windows
- ❌ 🌍 Community Worlds
- ✅ 📋 Other

❌ Some E2E test jobs failed; check the workflow run for details.
📊 Benchmark Results

Each scenario was benchmarked in 💻 Local Development and ▲ Production (Vercel), with 🔍 Observability links for Express, Nitro, and Next.js (Turbopack):
- workflow with no steps
- workflow with 1 / 10 / 25 / 50 sequential steps
- Promise.all with 10 / 25 / 50 concurrent steps
- Promise.race with 10 / 25 / 50 concurrent steps
- Stream benchmarks (includes TTFB metrics): workflow with stream

Summary: fastest framework by world and fastest world by framework (winner determined by most benchmark wins); column definitions and world descriptions in the full report.
Include reasoning parts from step results in the assistant message alongside tool-call parts when building the conversation prompt for the next tool-loop iteration. This mirrors what the AI SDK's `toResponseMessages()` does, ensuring reasoning models retain access to their prior reasoning across multi-step tool loops.

Remove `sanitizeProviderMetadataForToolCall()` — with reasoning items now preserved in the conversation, OpenAI's itemId references are valid and no longer need to be stripped.

Closes #1393

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
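The idea in this commit can be sketched as follows. The types and function name here are simplified stand-ins for illustration, not the actual shapes from the AI SDK or `stream-text-iterator.ts`:

```typescript
// Hypothetical, simplified part shapes; the real ones come from the
// AI SDK's LanguageModelV3 prompt types.
type ReasoningPart = {
  type: 'reasoning';
  text: string;
  providerOptions?: Record<string, unknown>;
};

type ToolCallPart = {
  type: 'tool-call';
  toolCallId: string;
  toolName: string;
  input: unknown;
};

// Build the assistant message content for the next tool-loop iteration:
// reasoning parts first, then tool calls, so the model sees its prior
// reasoning alongside the calls it made (as toResponseMessages() does).
function buildAssistantContent(
  reasoning: ReasoningPart[],
  toolCalls: ToolCallPart[],
): Array<ReasoningPart | ToolCallPart> {
  return [...reasoning, ...toolCalls];
}
```

Because the reasoning parts carry their `providerOptions` (including OpenAI's itemId), the follow-up request can legitimately reference those items instead of having them stripped.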
`chunksToStep()` previously only collected reasoning-delta chunks. For models with encrypted reasoning (like o4-mini), there are no deltas — only reasoning-start + reasoning-end chunks carrying the itemId needed for OpenAI Responses API item references.

Now aggregates reasoning by ID from reasoning-start chunks, appending delta text when available. This ensures providerMetadata (including itemId and reasoningEncryptedContent) flows through even when no reasoning deltas are emitted.

Verified with o4-mini via the OpenAI Responses API:
- With fix: follow-up request accepted
- Without fix: "Item 'fc_...' of type 'function_call' was provided without its required 'reasoning' item: 'rs_...'" (#880)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
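The aggregation described above can be sketched roughly like this. Chunk shapes and the function name are simplified assumptions for illustration; the real implementation lives in `chunksToStep()`:

```typescript
// Hypothetical, simplified chunk shapes modeled on the AI SDK's
// reasoning-start / reasoning-delta / reasoning-end stream parts.
type ReasoningChunk =
  | { type: 'reasoning-start'; id: string; providerMetadata?: Record<string, unknown> }
  | { type: 'reasoning-delta'; id: string; delta: string }
  | { type: 'reasoning-end'; id: string; providerMetadata?: Record<string, unknown> };

type ReasoningPart = {
  type: 'reasoning';
  text: string;
  providerMetadata?: Record<string, unknown>;
};

// Aggregate reasoning by ID, keyed off reasoning-start rather than deltas.
// Encrypted reasoning emits no deltas at all, so starting from
// reasoning-start is what lets providerMetadata (itemId,
// reasoningEncryptedContent) survive into the resulting part.
function aggregateReasoning(chunks: ReasoningChunk[]): ReasoningPart[] {
  const byId = new Map<string, ReasoningPart>();
  for (const chunk of chunks) {
    if (chunk.type === 'reasoning-start') {
      byId.set(chunk.id, {
        type: 'reasoning',
        text: '',
        providerMetadata: chunk.providerMetadata,
      });
    } else if (chunk.type === 'reasoning-delta') {
      const entry = byId.get(chunk.id);
      if (entry) entry.text += chunk.delta;
    }
  }
  return [...byId.values()];
}
```

With only a start/end pair and no deltas, this still yields a reasoning part with empty text but intact metadata — the case that previously produced the "required 'reasoning' item" error.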
This reverts commit 5244397.
pranaygp
left a comment
Solid PR — the approach of mirroring AI SDK's toResponseMessages() is the right call, and the removal of sanitizeProviderMetadataForToolCall follows cleanly from reasoning now being preserved. Tests are thorough (unit + e2e with mock reasoning model). Left a couple of inline comments.
```ts
        providerMetadata: chunk.providerMetadata,
      });
    }
  }
```
The `reasoning-end` chunks can also carry `providerMetadata` (visible in the stream transform at lines 345-347 of this same file), but this aggregation loop only processes `reasoning-start` and `reasoning-delta`. If a provider attaches metadata exclusively to `reasoning-end` (rather than `reasoning-start`), it would be silently dropped.
Probably fine for current providers (OpenAI puts it on `reasoning-start`), but worth a comment or a defensive `reasoning-end` handler that merges metadata into the entry.
In the AI SDK we throw an error when reasoning data is only attached to `reasoning-end`, but here we still capture it. I updated the PR.
```ts
      ...(meta != null ? { providerOptions: meta } : {}),
    };
  }),
] as typeof toolCalls,
```
Nit: this `as typeof toolCalls` cast was already here before this PR, but now it's even less accurate — the array genuinely contains reasoning parts that don't conform to `LanguageModelV3ToolCall`. If the prompt type's `content` field supports a union of reasoning + tool-call parts, it'd be cleaner to type it properly. Not a blocker though.
- Add a `reasoning-end` handler in `chunksToStep()` to merge `providerMetadata`, mirroring the AI SDK's behavior where `reasoning-end` can carry final metadata that should override earlier values.
- Replace the inaccurate `as typeof toolCalls` cast with `as LanguageModelV3Prompt[number]['content']`, since the content array now contains both reasoning and tool-call parts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
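The merge behavior described in the first bullet can be sketched as below. The function name and entry shape are hypothetical stand-ins; the real handler is part of `chunksToStep()`:

```typescript
// Hypothetical, simplified shapes for illustration.
type Meta = Record<string, unknown>;
type ReasoningEntry = { type: 'reasoning'; text: string; providerMetadata?: Meta };

// On reasoning-end, merge the chunk's metadata into the aggregated entry.
// Final (end) metadata overrides earlier (start) values via a shallow
// merge — an assumption here, mirroring the override behavior the
// commit message describes.
function applyReasoningEnd(
  entry: ReasoningEntry,
  endMetadata: Meta | undefined,
): ReasoningEntry {
  if (endMetadata == null) return entry;
  return {
    ...entry,
    providerMetadata: { ...(entry.providerMetadata ?? {}), ...endMetadata },
  };
}
```

This keeps the reviewer's defensive case covered: a provider that attaches metadata only to `reasoning-end` still produces an entry with metadata, and one that attaches it to both gets the final values.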
```ts
] as Extract<
  LanguageModelV3Prompt[number],
  { role: 'assistant' }
>['content'],
```
Not sure I like this type; we could also revert to `as typeof toolCalls` with the comment:
```diff
-] as Extract<
-  LanguageModelV3Prompt[number],
-  { role: 'assistant' }
->['content'],
+// Cast: content is a mix of reasoning + tool-call parts, both valid
+// in an assistant message. typeof toolCalls is imprecise but harmless.
+] as typeof toolCalls,
```
Summary
Closes #1393
- Include reasoning parts in the assistant message in `stream-text-iterator.ts`, mirroring `toResponseMessages()` in the AI SDK
- Remove `sanitizeProviderMetadataForToolCall()` and OpenAI `itemId` stripping — with reasoning items preserved, `itemId` references become valid
- Update `chunksToStep()` to preserve `providerMetadata` on reasoning parts (needed for OpenAI Responses API `item_reference`)
- Update `chunksToStep()` to aggregate reasoning from `reasoning-start` chunks, not just `reasoning-delta` — encrypted reasoning (OpenAI o-series) emits no deltas
- `itemId` sanitization

Manual validation (OpenAI Responses API, #880)
The following script reproduces the error from #880 and confirms the fix. It requires an `OPENAI_API_KEY` env var with access to `o4-mini`. Save as `packages/ai/test-openai-reasoning.mjs` and run with `cd packages/ai && node test-openai-reasoning.mjs`:

Expected output:
Test plan
- Unit tests (`pnpm vitest run packages/ai/`)
- `agentReasoningPreservationE2e` verifies reasoning flows through the mock model tool loop
- Manual run with `o4-mini` via the OpenAI Responses API confirms "Bug: DurableAgent + OpenAI Responses API fails on tool calls due to missing required `reasoning` item" #880 is fixed
- Build (`cd packages/ai && pnpm build`)