
Preserve reasoning content in DurableAgent conversation history#1444

Open
gr2m wants to merge 8 commits into main from gr2m/preserve-reasoning-content

Conversation


gr2m commented Mar 18, 2026

Summary

Closes #1393

  • Include reasoning content parts in the assistant message alongside tool-call parts in stream-text-iterator.ts, mirroring toResponseMessages() in the AI SDK
  • Remove sanitizeProviderMetadataForToolCall() and OpenAI itemId stripping — with reasoning items preserved, itemId references become valid
  • Fix chunksToStep() to preserve providerMetadata on reasoning parts (needed for OpenAI Responses API item_reference)
  • Fix chunksToStep() to aggregate reasoning from reasoning-start chunks, not just reasoning-delta — encrypted reasoning (OpenAI o-series) emits no deltas
  • Update tests to verify reasoning preservation and reflect the removal of itemId sanitization
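In sketch form, the first bullet amounts to the following (the part shapes and the `buildAssistantMessage` helper are illustrative assumptions, not the actual `stream-text-iterator.ts` diff):

```js
// Illustrative sketch: build the assistant message for the next tool-loop
// iteration so reasoning parts are preserved alongside tool-call parts,
// as toResponseMessages() does in the AI SDK. Part shapes are assumed.
function buildAssistantMessage(reasoningParts, toolCallParts) {
  return {
    role: 'assistant',
    // Reasoning first, then tool calls, matching the order models emit them.
    content: [...reasoningParts, ...toolCallParts],
  };
}

const msg = buildAssistantMessage(
  [{ type: 'reasoning', text: '', providerOptions: { openai: { itemId: 'rs_123' } } }],
  [{ type: 'tool-call', toolCallId: 'call_1', toolName: 'getWeather', input: { city: 'SF' } }],
);
console.log(msg.content.map((p) => p.type).join(','));
```

Previously only the tool-call parts were carried forward, which is what made OpenAI's `itemId` references dangle.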

Manual validation (OpenAI Responses API, #880)

The following script reproduces the error from #880 and confirms the fix. It requires an `OPENAI_API_KEY` env var with access to `o4-mini`.

Save as `packages/ai/test-openai-reasoning.mjs` and run with `cd packages/ai && node test-openai-reasoning.mjs`:

```js
import { createOpenAI } from '@ai-sdk/openai';

const openai = createOpenAI({ apiKey: process.env.OPENAI_API_KEY });
const model = openai.responses('o4-mini');

// Step 1: Send prompt that triggers a tool call with reasoning
console.log('Step 1: Sending initial request to trigger tool call...');
const result1 = await model.doStream({
  prompt: [
    { role: 'system', content: 'You are a helpful assistant. Always use the getWeather tool.' },
    { role: 'user', content: [{ type: 'text', text: 'What is the weather in San Francisco?' }] },
  ],
  tools: [{
    type: 'function',
    name: 'getWeather',
    parameters: { type: 'object', properties: { city: { type: 'string' } }, required: ['city'] },
  }],
  toolChoice: { type: 'required' },
});

const chunks = [];
const reader = result1.stream.getReader();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  chunks.push(value);
}

const reasoningChunks = chunks.filter(c =>
  c.type === 'reasoning-start' || c.type === 'reasoning-delta' || c.type === 'reasoning-end'
);
const toolCallChunks = chunks.filter(c => c.type === 'tool-call');
console.log(`  Reasoning chunks: ${reasoningChunks.length}, Tool call chunks: ${toolCallChunks.length}`);

// Step 2: Build reasoning parts (mirrors chunksToStep + stream-text-iterator fix)
const reasoningById = new Map();
for (const chunk of reasoningChunks) {
  if (chunk.type === 'reasoning-start') {
    reasoningById.set(chunk.id, { text: '', providerMetadata: chunk.providerMetadata });
  } else if (chunk.type === 'reasoning-delta') {
    const entry = reasoningById.get(chunk.id);
    if (entry) {
      entry.text += chunk.delta;
      if (chunk.providerMetadata) entry.providerMetadata = chunk.providerMetadata;
    }
  }
}
const reasoningParts = Array.from(reasoningById.values()).map(r => ({
  type: 'reasoning',
  text: r.text,
  ...(r.providerMetadata != null ? { providerOptions: r.providerMetadata } : {}),
}));
const toolCallParts = toolCallChunks.map(tc => ({
  type: 'tool-call',
  toolCallId: tc.toolCallId,
  toolName: tc.toolName,
  input: JSON.parse(tc.input),
  ...(tc.providerMetadata != null ? { providerOptions: tc.providerMetadata } : {}),
}));

const toolResultContent = toolCallChunks.map(tc => ({
  type: 'tool-result',
  toolCallId: tc.toolCallId,
  toolName: tc.toolName,
  output: { type: 'text', value: JSON.stringify({ city: 'San Francisco', temperature: 62, condition: 'partly cloudy' }) },
}));

// Step 3: Follow-up WITH reasoning (the fix)
console.log('\nStep 3: Follow-up WITH reasoning preserved...');
try {
  const result2 = await model.doStream({
    prompt: [
      { role: 'system', content: 'You are a helpful assistant.' },
      { role: 'user', content: [{ type: 'text', text: 'What is the weather in San Francisco?' }] },
      { role: 'assistant', content: [...reasoningParts, ...toolCallParts] },
      { role: 'tool', content: toolResultContent },
    ],
    tools: [{ type: 'function', name: 'getWeather', parameters: { type: 'object', properties: { city: { type: 'string' } }, required: ['city'] } }],
  });
  const reader2 = result2.stream.getReader();
  let text = '';
  while (true) {
    const { done, value } = await reader2.read();
    if (done) break;
    if (value.type === 'text-delta') text += value.delta;
  }
  console.log(`  ✅ SUCCESS: ${text.slice(0, 150)}`);
} catch (e) {
  console.error(`  ❌ FAILED: ${e.message}`);
  process.exit(1);
}

// Step 4: Follow-up WITHOUT reasoning (old behavior — should fail)
console.log('\nStep 4: Follow-up WITHOUT reasoning (old behavior)...');
try {
  const result3 = await model.doStream({
    prompt: [
      { role: 'system', content: 'You are a helpful assistant.' },
      { role: 'user', content: [{ type: 'text', text: 'What is the weather in San Francisco?' }] },
      { role: 'assistant', content: [...toolCallParts] }, // no reasoning parts
      { role: 'tool', content: toolResultContent },
    ],
    tools: [{ type: 'function', name: 'getWeather', parameters: { type: 'object', properties: { city: { type: 'string' } }, required: ['city'] } }],
  });
  const reader3 = result3.stream.getReader();
  while (true) { const { done } = await reader3.read(); if (done) break; }
  console.log('  (Succeeded unexpectedly — model may not have used reasoning)');
} catch (e) {
  console.log(`  ❌ Old behavior failed as expected: ${e.message.slice(0, 120)}`);
}
```

Expected output:

```
Step 1: Sending initial request to trigger tool call...
  Reasoning chunks: 2, Tool call chunks: 1

Step 3: Follow-up WITH reasoning preserved...
  ✅ SUCCESS: The current weather in San Francisco is partly cloudy with a temperature of 62°F.

Step 4: Follow-up WITHOUT reasoning (old behavior)...
  ❌ Old behavior failed as expected: Item 'fc_...' of type 'function_call' was provided without its required 'reasoning' item: 'rs_...'
```

Test plan

…n history

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

vercel bot commented Mar 18, 2026


changeset-bot bot commented Mar 18, 2026

🦋 Changeset detected

Latest commit: bc66735

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
| Name | Type |
| --- | --- |
| @workflow/ai | Patch |



github-actions bot commented Mar 18, 2026

🧪 E2E Test Results

Some tests failed

Summary

| | Passed | Failed | Skipped | Total |
| --- | --- | --- | --- | --- |
| ❌ ▲ Vercel Production | 756 | 2 | 67 | 825 |
| ✅ 💻 Local Development | 782 | 0 | 118 | 900 |
| ✅ 📦 Local Production | 782 | 0 | 118 | 900 |
| ✅ 🐘 Local Postgres | 782 | 0 | 118 | 900 |
| ✅ 🪟 Windows | 72 | 0 | 3 | 75 |
| ❌ 🌍 Community Worlds | 118 | 56 | 15 | 189 |
| ✅ 📋 Other | 198 | 0 | 27 | 225 |
| **Total** | 3490 | 58 | 466 | 4014 |

❌ Failed Tests

▲ Vercel Production (2 failed)

nextjs-webpack (1 failed):

  • hookCleanupTestWorkflow - hook token reuse after workflow completion | wrun_01KM1F93ZTTJHB76YYWWWGX040 | 🔍 observability

sveltekit (1 failed):

  • hookDisposeTestWorkflow - hook token reuse after explicit disposal while workflow still running | wrun_01KM1FAD47YF4NK4BCN1KEGAW9 | 🔍 observability
🌍 Community Worlds (56 failed)

mongodb (3 failed):

  • hookWorkflow is not resumable via public webhook endpoint | wrun_01KM1F4AM2Y3NYPQZPN2XGT5XE
  • webhookWorkflow | wrun_01KM1F4K0P3YVPM90CZ2CW1MZM
  • concurrent hook token conflict - two workflows cannot use the same hook token simultaneously | wrun_01KM1F9RNDEWKCD65KQS3B0GVM

redis (2 failed):

  • hookWorkflow is not resumable via public webhook endpoint | wrun_01KM1F4AM2Y3NYPQZPN2XGT5XE
  • concurrent hook token conflict - two workflows cannot use the same hook token simultaneously | wrun_01KM1F9RNDEWKCD65KQS3B0GVM

turso (51 failed):

  • addTenWorkflow | wrun_01KM1F35HETZGE4TB1TVW84F4D
  • wellKnownAgentWorkflow (.well-known/agent) | wrun_01KM1F3SYB0ZRRHCYBYT2XPHJ4
  • should work with react rendering in step
  • promiseAllWorkflow | wrun_01KM1F3C0BSSJFVHHG70JWEM0C
  • promiseRaceWorkflow | wrun_01KM1F3HZHVHNGWNKPM0VSPJ82
  • promiseAnyWorkflow | wrun_01KM1F3M462PA0J0HSWRRYYJHG
  • importedStepOnlyWorkflow | wrun_01KM1F43YF75N7QMWQQMZ3CKV5
  • hookWorkflow | wrun_01KM1F4097VS0K9D36EREN4R4B
  • hookWorkflow is not resumable via public webhook endpoint | wrun_01KM1F4AM2Y3NYPQZPN2XGT5XE
  • webhookWorkflow | wrun_01KM1F4K0P3YVPM90CZ2CW1MZM
  • sleepingWorkflow | wrun_01KM1F4S157KMX1Q84AJDQFGGM
  • parallelSleepWorkflow | wrun_01KM1F557TMBMJ5GRNKJS9ECS6
  • nullByteWorkflow | wrun_01KM1F59K9S5DVAMGVQR03C0B0
  • workflowAndStepMetadataWorkflow | wrun_01KM1F5BNN5TWHQCEY1HDNDQ9B
  • fetchWorkflow | wrun_01KM1F6708AY6MRRTE92NTQ1E2
  • promiseRaceStressTestWorkflow | wrun_01KM1F6AA5B7A22PWQGYGM8VBT
  • error handling error propagation workflow errors nested function calls preserve message and stack trace
  • error handling error propagation workflow errors cross-file imports preserve message and stack trace
  • error handling error propagation step errors basic step error preserves message and stack trace
  • error handling error propagation step errors cross-file step error preserves message and function names in stack
  • error handling retry behavior regular Error retries until success
  • error handling retry behavior FatalError fails immediately without retries
  • error handling retry behavior RetryableError respects custom retryAfter delay
  • error handling retry behavior maxRetries=0 disables retries
  • error handling catchability FatalError can be caught and detected with FatalError.is()
  • hookCleanupTestWorkflow - hook token reuse after workflow completion | wrun_01KM1F93ZTTJHB76YYWWWGX040
  • concurrent hook token conflict - two workflows cannot use the same hook token simultaneously | wrun_01KM1F9RNDEWKCD65KQS3B0GVM
  • hookDisposeTestWorkflow - hook token reuse after explicit disposal while workflow still running | wrun_01KM1FAD47YF4NK4BCN1KEGAW9
  • stepFunctionPassingWorkflow - step function references can be passed as arguments (without closure vars) | wrun_01KM1FB1YFYP9Q3BDX5JPTSNVG
  • stepFunctionWithClosureWorkflow - step function with closure variables passed as argument | wrun_01KM1FBAKFW5XDWMHJ59GJ6VKZ
  • closureVariableWorkflow - nested step functions with closure variables | wrun_01KM1FBFF7PZFHGDWMTMMZGNQG
  • spawnWorkflowFromStepWorkflow - spawning a child workflow using start() inside a step | wrun_01KM1FBHNQVEZM7M08NDTKNPWB
  • health check (queue-based) - workflow and step endpoints respond to health check messages
  • pathsAliasWorkflow - TypeScript path aliases resolve correctly | wrun_01KM1FBZVDK81DXYSHB2PKMSM7
  • Calculator.calculate - static workflow method using static step methods from another class | wrun_01KM1FC6GESXQY417F95T04P9G
  • AllInOneService.processNumber - static workflow method using sibling static step methods | wrun_01KM1FCDQ22H5CBVSDSXZYAQ1M
  • ChainableService.processWithThis - static step methods using this to reference the class | wrun_01KM1FCM3PKTYSYDVQ651G9716
  • thisSerializationWorkflow - step function invoked with .call() and .apply() | wrun_01KM1FCSJARXBNDTVCA4F9ZEVH
  • customSerializationWorkflow - custom class serialization with WORKFLOW_SERIALIZE/WORKFLOW_DESERIALIZE | wrun_01KM1FD0BXH39SNMZN3HVBJNRR
  • instanceMethodStepWorkflow - instance methods with "use step" directive | wrun_01KM1FD6ZQ6C0484DWA0SC1B49
  • crossContextSerdeWorkflow - classes defined in step code are deserializable in workflow context | wrun_01KM1FDHJF583KH91SW2MZ8PE8
  • stepFunctionAsStartArgWorkflow - step function reference passed as start() argument | wrun_01KM1FDT82EVGHSHSTCQ3804YX
  • cancelRun - cancelling a running workflow | wrun_01KM1FE0Z0562JPAJ6YWTSBYMF
  • cancelRun via CLI - cancelling a running workflow | wrun_01KM1FEA2V111F6WFTNMVYKMC4
  • pages router addTenWorkflow via pages router
  • pages router promiseAllWorkflow via pages router
  • pages router sleepingWorkflow via pages router
  • hookWithSleepWorkflow - hook payloads delivered correctly with concurrent sleep | wrun_01KM1FEP3EFH4256ZWNK5JJ2NW
  • sleepInLoopWorkflow - sleep inside loop with steps actually delays each iteration | wrun_01KM1FF9CTTFTY9RFYCHVXWGZ0
  • sleepWithSequentialStepsWorkflow - sequential steps work with concurrent sleep (control) | wrun_01KM1FFMMET6W7MHKRQMQPQBXT

Details by Category

❌ ▲ Vercel Production

| App | Passed | Failed | Skipped |
| --- | --- | --- | --- |
| ✅ astro | 68 | 0 | 7 |
| ✅ example | 68 | 0 | 7 |
| ✅ express | 68 | 0 | 7 |
| ✅ fastify | 68 | 0 | 7 |
| ✅ hono | 68 | 0 | 7 |
| ✅ nextjs-turbopack | 73 | 0 | 2 |
| ❌ nextjs-webpack | 72 | 1 | 2 |
| ✅ nitro | 68 | 0 | 7 |
| ✅ nuxt | 68 | 0 | 7 |
| ❌ sveltekit | 67 | 1 | 7 |
| ✅ vite | 68 | 0 | 7 |

✅ 💻 Local Development

| App | Passed | Failed | Skipped |
| --- | --- | --- | --- |
| ✅ astro-stable | 66 | 0 | 9 |
| ✅ express-stable | 66 | 0 | 9 |
| ✅ fastify-stable | 66 | 0 | 9 |
| ✅ hono-stable | 66 | 0 | 9 |
| ✅ nextjs-turbopack-canary | 55 | 0 | 20 |
| ✅ nextjs-turbopack-stable | 72 | 0 | 3 |
| ✅ nextjs-webpack-canary | 55 | 0 | 20 |
| ✅ nextjs-webpack-stable | 72 | 0 | 3 |
| ✅ nitro-stable | 66 | 0 | 9 |
| ✅ nuxt-stable | 66 | 0 | 9 |
| ✅ sveltekit-stable | 66 | 0 | 9 |
| ✅ vite-stable | 66 | 0 | 9 |

✅ 📦 Local Production

| App | Passed | Failed | Skipped |
| --- | --- | --- | --- |
| ✅ astro-stable | 66 | 0 | 9 |
| ✅ express-stable | 66 | 0 | 9 |
| ✅ fastify-stable | 66 | 0 | 9 |
| ✅ hono-stable | 66 | 0 | 9 |
| ✅ nextjs-turbopack-canary | 55 | 0 | 20 |
| ✅ nextjs-turbopack-stable | 72 | 0 | 3 |
| ✅ nextjs-webpack-canary | 55 | 0 | 20 |
| ✅ nextjs-webpack-stable | 72 | 0 | 3 |
| ✅ nitro-stable | 66 | 0 | 9 |
| ✅ nuxt-stable | 66 | 0 | 9 |
| ✅ sveltekit-stable | 66 | 0 | 9 |
| ✅ vite-stable | 66 | 0 | 9 |

✅ 🐘 Local Postgres

| App | Passed | Failed | Skipped |
| --- | --- | --- | --- |
| ✅ astro-stable | 66 | 0 | 9 |
| ✅ express-stable | 66 | 0 | 9 |
| ✅ fastify-stable | 66 | 0 | 9 |
| ✅ hono-stable | 66 | 0 | 9 |
| ✅ nextjs-turbopack-canary | 55 | 0 | 20 |
| ✅ nextjs-turbopack-stable | 72 | 0 | 3 |
| ✅ nextjs-webpack-canary | 55 | 0 | 20 |
| ✅ nextjs-webpack-stable | 72 | 0 | 3 |
| ✅ nitro-stable | 66 | 0 | 9 |
| ✅ nuxt-stable | 66 | 0 | 9 |
| ✅ sveltekit-stable | 66 | 0 | 9 |
| ✅ vite-stable | 66 | 0 | 9 |

✅ 🪟 Windows

| App | Passed | Failed | Skipped |
| --- | --- | --- | --- |
| ✅ nextjs-turbopack | 72 | 0 | 3 |

❌ 🌍 Community Worlds

| App | Passed | Failed | Skipped |
| --- | --- | --- | --- |
| ✅ mongodb-dev | 3 | 0 | 2 |
| ❌ mongodb | 52 | 3 | 3 |
| ✅ redis-dev | 3 | 0 | 2 |
| ❌ redis | 53 | 2 | 3 |
| ✅ turso-dev | 3 | 0 | 2 |
| ❌ turso | 4 | 51 | 3 |

✅ 📋 Other

| App | Passed | Failed | Skipped |
| --- | --- | --- | --- |
| ✅ e2e-local-dev-nest-stable | 66 | 0 | 9 |
| ✅ e2e-local-postgres-nest-stable | 66 | 0 | 9 |
| ✅ e2e-local-prod-nest-stable | 66 | 0 | 9 |

📋 View full workflow run


Some E2E test jobs failed:

  • Vercel Prod: failure
  • Local Dev: success
  • Local Prod: success
  • Local Postgres: success
  • Windows: success

Check the workflow run for details.


github-actions bot commented Mar 18, 2026

📊 Benchmark Results

📈 Comparing against baseline from main branch. Green 🟢 = faster, Red 🔺 = slower.

workflow with no steps

💻 Local Development

| World | Framework | Workflow Time | Wall Time | Overhead | Samples | vs Fastest |
| --- | --- | --- | --- | --- | --- | --- |
| 💻 Local | 🥇 Express | 0.042s (-1.6%) | 1.005s (~) | 0.963s | 10 | 1.00x |
| 💻 Local | Nitro | 0.044s (-6.8% 🟢) | 1.005s (~) | 0.961s | 10 | 1.04x |
| 💻 Local | Next.js (Turbopack) | 0.049s | 1.005s | 0.956s | 10 | 1.16x |
| 🌐 Redis | Next.js (Turbopack) | 0.053s | 1.005s | 0.952s | 10 | 1.26x |
| 🐘 Postgres | Next.js (Turbopack) | 0.059s | 1.012s | 0.953s | 10 | 1.39x |
| 🐘 Postgres | Nitro | 0.059s (-3.9%) | 1.012s (~) | 0.953s | 10 | 1.40x |
| 🐘 Postgres | Express | 0.062s (+3.7%) | 1.012s (~) | 0.950s | 10 | 1.47x |
| 🌐 MongoDB | Next.js (Turbopack) | 0.122s | 1.008s | 0.886s | 10 | 2.89x |

▲ Production (Vercel)

| World | Framework | Workflow Time | Wall Time | Overhead | Samples | vs Fastest |
| --- | --- | --- | --- | --- | --- | --- |
| ▲ Vercel | 🥇 Next.js (Turbopack) | 0.478s (+11.7% 🔺) | 2.352s (+1.3%) | 1.874s | 10 | 1.00x |
| ▲ Vercel | Nitro | 0.482s (+9.4% 🔺) | 2.562s (+19.1% 🔺) | 2.080s | 10 | 1.01x |
| ▲ Vercel | Express | 0.730s (+52.3% 🔺) | 2.574s (-1.0%) | 1.844s | 10 | 1.53x |

🔍 Observability: Next.js (Turbopack) | Nitro | Express

workflow with 1 step

💻 Local Development

| World | Framework | Workflow Time | Wall Time | Overhead | Samples | vs Fastest |
| --- | --- | --- | --- | --- | --- | --- |
| 💻 Local | 🥇 Next.js (Turbopack) | 1.114s | 2.006s | 0.892s | 10 | 1.00x |
| 💻 Local | Nitro | 1.124s (~) | 2.006s (~) | 0.882s | 10 | 1.01x |
| 💻 Local | Express | 1.125s (~) | 2.006s (~) | 0.880s | 10 | 1.01x |
| 🌐 Redis | Next.js (Turbopack) | 1.127s | 2.007s | 0.880s | 10 | 1.01x |
| 🐘 Postgres | Next.js (Turbopack) | 1.156s | 2.012s | 0.856s | 10 | 1.04x |
| 🐘 Postgres | Nitro | 1.161s (+1.2%) | 2.022s (~) | 0.861s | 10 | 1.04x |
| 🐘 Postgres | Express | 1.163s (~) | 2.013s (~) | 0.850s | 10 | 1.04x |
| 🌐 MongoDB | Next.js (Turbopack) | 1.305s | 2.008s | 0.704s | 10 | 1.17x |

▲ Production (Vercel)

| World | Framework | Workflow Time | Wall Time | Overhead | Samples | vs Fastest |
| --- | --- | --- | --- | --- | --- | --- |
| ▲ Vercel | 🥇 Express | 2.014s (-5.1% 🟢) | 3.887s (+6.6% 🔺) | 1.873s | 10 | 1.00x |
| ▲ Vercel | Nitro | 2.169s (+2.9%) | 3.631s (+5.8% 🔺) | 1.462s | 10 | 1.08x |
| ▲ Vercel | Next.js (Turbopack) | 2.182s (-5.2% 🟢) | 3.684s (+5.2% 🔺) | 1.503s | 10 | 1.08x |

🔍 Observability: Express | Nitro | Next.js (Turbopack)

workflow with 10 sequential steps

💻 Local Development

| World | Framework | Workflow Time | Wall Time | Overhead | Samples | vs Fastest |
| --- | --- | --- | --- | --- | --- | --- |
| 🌐 Redis | 🥇 Next.js (Turbopack) | 10.756s | 11.022s | 0.266s | 3 | 1.00x |
| 💻 Local | Next.js (Turbopack) | 10.785s | 11.023s | 0.237s | 3 | 1.00x |
| 💻 Local | Nitro | 10.888s (~) | 11.022s (~) | 0.134s | 3 | 1.01x |
| 💻 Local | Express | 10.893s (~) | 11.023s (~) | 0.130s | 3 | 1.01x |
| 🐘 Postgres | Next.js (Turbopack) | 10.898s | 11.046s | 0.148s | 3 | 1.01x |
| 🐘 Postgres | Nitro | 10.956s (~) | 11.042s (~) | 0.086s | 3 | 1.02x |
| 🐘 Postgres | Express | 11.008s (~) | 11.378s (+3.0%) | 0.370s | 3 | 1.02x |
| 🌐 MongoDB | Next.js (Turbopack) | 12.230s | 13.021s | 0.791s | 3 | 1.14x |

▲ Production (Vercel)

| World | Framework | Workflow Time | Wall Time | Overhead | Samples | vs Fastest |
| --- | --- | --- | --- | --- | --- | --- |
| ▲ Vercel | 🥇 Express | 16.548s (-5.5% 🟢) | 18.335s (-6.1% 🟢) | 1.787s | 2 | 1.00x |
| ▲ Vercel | Nitro | 17.120s (-1.7%) | 18.596s (+1.2%) | 1.476s | 2 | 1.03x |
| ▲ Vercel | Next.js (Turbopack) | 21.439s (+15.7% 🔺) | 23.264s (+18.6% 🔺) | 1.825s | 2 | 1.30x |

🔍 Observability: Express | Nitro | Next.js (Turbopack)

workflow with 25 sequential steps

💻 Local Development

| World | Framework | Workflow Time | Wall Time | Overhead | Samples | vs Fastest |
| --- | --- | --- | --- | --- | --- | --- |
| 🌐 Redis | 🥇 Next.js (Turbopack) | 26.791s | 27.050s | 0.259s | 3 | 1.00x |
| 🐘 Postgres | Next.js (Turbopack) | 27.082s | 27.728s | 0.646s | 3 | 1.01x |
| 💻 Local | Next.js (Turbopack) | 27.132s | 28.051s | 0.919s | 3 | 1.01x |
| 🐘 Postgres | Express | 27.281s (~) | 28.068s (~) | 0.787s | 3 | 1.02x |
| 🐘 Postgres | Nitro | 27.286s (~) | 28.067s (~) | 0.780s | 3 | 1.02x |
| 💻 Local | Express | 27.445s (~) | 28.050s (~) | 0.605s | 3 | 1.02x |
| 💻 Local | Nitro | 27.450s (~) | 28.052s (~) | 0.603s | 3 | 1.02x |
| 🌐 MongoDB | Next.js (Turbopack) | 30.353s | 31.033s | 0.680s | 2 | 1.13x |

▲ Production (Vercel)

| World | Framework | Workflow Time | Wall Time | Overhead | Samples | vs Fastest |
| --- | --- | --- | --- | --- | --- | --- |
| ▲ Vercel | 🥇 Nitro | 42.914s (-4.5%) | 44.495s (-3.4%) | 1.581s | 2 | 1.00x |
| ▲ Vercel | Express | 44.322s (-4.8%) | 45.969s (-5.3% 🟢) | 1.647s | 2 | 1.03x |
| ▲ Vercel | Next.js (Turbopack) | 45.717s (-1.0%) | 47.653s (~) | 1.937s | 2 | 1.07x |

🔍 Observability: Nitro | Express | Next.js (Turbopack)

workflow with 50 sequential steps

💻 Local Development

| World | Framework | Workflow Time | Wall Time | Overhead | Samples | vs Fastest |
| --- | --- | --- | --- | --- | --- | --- |
| 🌐 Redis | 🥇 Next.js (Turbopack) | 53.526s | 54.098s | 0.572s | 2 | 1.00x |
| 🐘 Postgres | Next.js (Turbopack) | 54.236s | 55.111s | 0.875s | 2 | 1.01x |
| 🐘 Postgres | Express | 54.395s (~) | 55.105s (~) | 0.709s | 2 | 1.02x |
| 🐘 Postgres | Nitro | 54.483s (~) | 55.110s (~) | 0.627s | 2 | 1.02x |
| 💻 Local | Next.js (Turbopack) | 55.863s | 56.101s | 0.238s | 2 | 1.04x |
| 💻 Local | Express | 56.389s (~) | 57.102s (~) | 0.712s | 2 | 1.05x |
| 💻 Local | Nitro | 56.625s (~) | 57.102s (~) | 0.477s | 2 | 1.06x |
| 🌐 MongoDB | Next.js (Turbopack) | 60.492s | 61.059s | 0.568s | 2 | 1.13x |

▲ Production (Vercel)

| World | Framework | Workflow Time | Wall Time | Overhead | Samples | vs Fastest |
| --- | --- | --- | --- | --- | --- | --- |
| ▲ Vercel | 🥇 Nitro | 94.297s (-4.4%) | 95.455s (-4.1%) | 1.158s | 1 | 1.00x |
| ▲ Vercel | Express | 94.620s (-3.8%) | 96.288s (-3.8%) | 1.668s | 1 | 1.00x |
| ▲ Vercel | Next.js (Turbopack) | 97.073s (-6.7% 🟢) | 99.155s (-5.4% 🟢) | 2.082s | 1 | 1.03x |

🔍 Observability: Nitro | Express | Next.js (Turbopack)

Promise.all with 10 concurrent steps

💻 Local Development

| World | Framework | Workflow Time | Wall Time | Overhead | Samples | vs Fastest |
| --- | --- | --- | --- | --- | --- | --- |
| 🐘 Postgres | 🥇 Next.js (Turbopack) | 1.252s | 2.013s | 0.761s | 15 | 1.00x |
| 🐘 Postgres | Express | 1.278s (~) | 2.011s (~) | 0.733s | 15 | 1.02x |
| 🐘 Postgres | Nitro | 1.293s (~) | 2.012s (~) | 0.718s | 15 | 1.03x |
| 🌐 Redis | Next.js (Turbopack) | 1.388s | 2.006s | 0.618s | 15 | 1.11x |
| 💻 Local | Express | 1.493s (-1.1%) | 2.006s (~) | 0.512s | 15 | 1.19x |
| 💻 Local | Nitro | 1.528s (+1.4%) | 2.006s (~) | 0.478s | 15 | 1.22x |
| 💻 Local | Next.js (Turbopack) | 1.562s | 2.006s | 0.445s | 15 | 1.25x |
| 🌐 MongoDB | Next.js (Turbopack) | 2.163s | 3.007s | 0.845s | 10 | 1.73x |

▲ Production (Vercel)

| World | Framework | Workflow Time | Wall Time | Overhead | Samples | vs Fastest |
| --- | --- | --- | --- | --- | --- | --- |
| ▲ Vercel | 🥇 Express | 2.168s (-16.2% 🟢) | 3.694s (-12.2% 🟢) | 1.525s | 9 | 1.00x |
| ▲ Vercel | Next.js (Turbopack) | 2.457s (-16.2% 🟢) | 3.827s (-4.5%) | 1.371s | 8 | 1.13x |
| ▲ Vercel | Nitro | 2.729s (+12.0% 🔺) | 4.148s (+6.1% 🔺) | 1.419s | 8 | 1.26x |

🔍 Observability: Express | Next.js (Turbopack) | Nitro

Promise.all with 25 concurrent steps

💻 Local Development

| World | Framework | Workflow Time | Wall Time | Overhead | Samples | vs Fastest |
| --- | --- | --- | --- | --- | --- | --- |
| 🐘 Postgres | 🥇 Nitro | 2.463s (~) | 3.012s (~) | 0.548s | 10 | 1.00x |
| 🐘 Postgres | Express | 2.469s (+0.7%) | 3.014s (~) | 0.545s | 10 | 1.00x |
| 🐘 Postgres | Next.js (Turbopack) | 2.474s | 3.013s | 0.538s | 10 | 1.00x |
| 🌐 Redis | Next.js (Turbopack) | 2.527s | 3.008s | 0.481s | 10 | 1.03x |
| 💻 Local | Express | 2.880s (+0.9%) | 3.108s (+3.3%) | 0.227s | 10 | 1.17x |
| 💻 Local | Next.js (Turbopack) | 2.891s | 3.108s | 0.216s | 10 | 1.17x |
| 💻 Local | Nitro | 2.930s (-2.0%) | 3.564s (+6.7% 🔺) | 0.634s | 9 | 1.19x |
| 🌐 MongoDB | Next.js (Turbopack) | 4.662s | 5.177s | 0.514s | 6 | 1.89x |

▲ Production (Vercel)

| World | Framework | Workflow Time | Wall Time | Overhead | Samples | vs Fastest |
| --- | --- | --- | --- | --- | --- | --- |
| ▲ Vercel | 🥇 Express | 2.679s (+7.3% 🔺) | 4.319s (+8.8% 🔺) | 1.640s | 7 | 1.00x |
| ▲ Vercel | Nitro | 2.708s (+5.5% 🔺) | 4.167s (+10.9% 🔺) | 1.459s | 8 | 1.01x |
| ▲ Vercel | Next.js (Turbopack) | 3.061s (+8.5% 🔺) | 4.416s (+12.5% 🔺) | 1.355s | 7 | 1.14x |

🔍 Observability: Express | Nitro | Next.js (Turbopack)

Promise.all with 50 concurrent steps

💻 Local Development

| World | Framework | Workflow Time | Wall Time | Overhead | Samples | vs Fastest |
| --- | --- | --- | --- | --- | --- | --- |
| 🐘 Postgres | 🥇 Nitro | 3.612s (-0.5%) | 4.016s (~) | 0.403s | 8 | 1.00x |
| 🐘 Postgres | Express | 3.621s (+0.8%) | 4.016s (~) | 0.394s | 8 | 1.00x |
| 🐘 Postgres | Next.js (Turbopack) | 3.823s | 4.016s | 0.193s | 8 | 1.06x |
| 🌐 Redis | Next.js (Turbopack) | 4.170s | 5.012s | 0.842s | 6 | 1.15x |
| 💻 Local | Next.js (Turbopack) | 7.541s | 8.015s | 0.474s | 4 | 2.09x |
| 💻 Local | Express | 7.793s (-2.0%) | 8.017s (-3.1%) | 0.224s | 4 | 2.16x |
| 💻 Local | Nitro | 8.149s (+3.2%) | 9.020s (+2.8%) | 0.871s | 4 | 2.26x |
| 🌐 MongoDB | Next.js (Turbopack) | 9.745s | 10.350s | 0.605s | 3 | 2.70x |

▲ Production (Vercel)

| World | Framework | Workflow Time | Wall Time | Overhead | Samples | vs Fastest |
| --- | --- | --- | --- | --- | --- | --- |
| ▲ Vercel | 🥇 Nitro | 2.644s (-21.2% 🟢) | 3.856s (-19.0% 🟢) | 1.211s | 8 | 1.00x |
| ▲ Vercel | Express | 2.867s (-6.7% 🟢) | 4.142s (-4.2%) | 1.275s | 8 | 1.08x |
| ▲ Vercel | Next.js (Turbopack) | 3.727s (+8.4% 🔺) | 5.295s (+12.1% 🔺) | 1.568s | 6 | 1.41x |

🔍 Observability: Nitro | Express | Next.js (Turbopack)

Promise.race with 10 concurrent steps

💻 Local Development

| World | Framework | Workflow Time | Wall Time | Overhead | Samples | vs Fastest |
| --- | --- | --- | --- | --- | --- | --- |
| 🐘 Postgres | 🥇 Next.js (Turbopack) | 1.253s | 2.011s | 0.758s | 15 | 1.00x |
| 🐘 Postgres | Express | 1.287s (+2.2%) | 2.011s (~) | 0.725s | 15 | 1.03x |
| 🐘 Postgres | Nitro | 1.287s (~) | 2.011s (~) | 0.724s | 15 | 1.03x |
| 🌐 Redis | Next.js (Turbopack) | 1.323s | 2.006s | 0.684s | 15 | 1.06x |
| 💻 Local | Express | 1.505s (-0.9%) | 2.005s (~) | 0.500s | 15 | 1.20x |
| 💻 Local | Next.js (Turbopack) | 1.532s | 2.006s | 0.475s | 15 | 1.22x |
| 💻 Local | Nitro | 1.533s (-0.5%) | 2.007s (~) | 0.474s | 15 | 1.22x |
| 🌐 MongoDB | Next.js (Turbopack) | 2.192s | 3.009s | 0.817s | 10 | 1.75x |

▲ Production (Vercel)

| World | Framework | Workflow Time | Wall Time | Overhead | Samples | vs Fastest |
| --- | --- | --- | --- | --- | --- | --- |
| ▲ Vercel | 🥇 Nitro | 2.119s (-4.4%) | 3.594s (~) | 1.475s | 9 | 1.00x |
| ▲ Vercel | Next.js (Turbopack) | 2.213s (-14.4% 🟢) | 3.888s (-3.6%) | 1.676s | 8 | 1.04x |
| ▲ Vercel | Express | 2.276s (+3.1%) | 3.707s (+5.5% 🔺) | 1.430s | 9 | 1.07x |

🔍 Observability: Nitro | Next.js (Turbopack) | Express

Promise.race with 25 concurrent steps

💻 Local Development

| World | Framework | Workflow Time | Wall Time | Overhead | Samples | vs Fastest |
| --- | --- | --- | --- | --- | --- | --- |
| 🐘 Postgres | 🥇 Nitro | 2.452s (~) | 3.013s (~) | 0.561s | 10 | 1.00x |
| 🐘 Postgres | Express | 2.462s (+0.8%) | 3.012s (~) | 0.551s | 10 | 1.00x |
| 🐘 Postgres | Next.js (Turbopack) | 2.474s | 3.014s | 0.540s | 10 | 1.01x |
| 🌐 Redis | Next.js (Turbopack) | 2.576s | 3.008s | 0.432s | 10 | 1.05x |
| 💻 Local | Express | 2.877s (-2.3%) | 3.453s (+4.4%) | 0.577s | 9 | 1.17x |
| 💻 Local | Nitro | 3.053s (-2.1%) | 3.762s (-3.2%) | 0.709s | 8 | 1.24x |
| 💻 Local | Next.js (Turbopack) | 3.208s | 3.885s | 0.677s | 8 | 1.31x |
| 🌐 MongoDB | Next.js (Turbopack) | 4.653s | 5.177s | 0.524s | 6 | 1.90x |

▲ Production (Vercel)

| World | Framework | Workflow Time | Wall Time | Overhead | Samples | vs Fastest |
| --- | --- | --- | --- | --- | --- | --- |
| ▲ Vercel | 🥇 Express | 2.793s (+11.4% 🔺) | 4.114s (+8.1% 🔺) | 1.321s | 9 | 1.00x |
| ▲ Vercel | Nitro | 3.334s (+33.8% 🔺) | 4.576s (+24.8% 🔺) | 1.242s | 7 | 1.19x |
| ▲ Vercel | Next.js (Turbopack) | 3.671s (+45.8% 🔺) | 5.274s (+52.0% 🔺) | 1.603s | 6 | 1.31x |

🔍 Observability: Express | Nitro | Next.js (Turbopack)

Promise.race with 50 concurrent steps

💻 Local Development

| World | Framework | Workflow Time | Wall Time | Overhead | Samples | vs Fastest |
| --- | --- | --- | --- | --- | --- | --- |
| 🐘 Postgres | 🥇 Nitro | 3.630s (~) | 4.015s (~) | 0.386s | 8 | 1.00x |
| 🐘 Postgres | Express | 3.632s (+0.7%) | 4.015s (~) | 0.383s | 8 | 1.00x |
| 🐘 Postgres | Next.js (Turbopack) | 3.785s | 4.015s | 0.229s | 8 | 1.04x |
| 🌐 Redis | Next.js (Turbopack) | 4.161s | 4.868s | 0.708s | 7 | 1.15x |
| 💻 Local | Next.js (Turbopack) | 7.963s | 8.516s | 0.553s | 4 | 2.19x |
| 💻 Local | Express | 8.265s (-3.3%) | 9.021s (~) | 0.755s | 4 | 2.28x |
| 💻 Local | Nitro | 9.011s (+4.5%) | 9.520s (+5.5% 🔺) | 0.510s | 4 | 2.48x |
| 🌐 MongoDB | Next.js (Turbopack) | 10.086s | 10.349s | 0.263s | 3 | 2.78x |

▲ Production (Vercel)

| World | Framework | Workflow Time | Wall Time | Overhead | Samples | vs Fastest |
| --- | --- | --- | --- | --- | --- | --- |
| ▲ Vercel | 🥇 Express | 2.689s (-8.5% 🟢) | 4.158s (-3.7%) | 1.469s | 8 | 1.00x |
| ▲ Vercel | Nitro | 3.096s (+2.6%) | 4.368s (~) | 1.272s | 7 | 1.15x |
| ▲ Vercel | Next.js (Turbopack) | 3.452s (-13.3% 🟢) | 5.102s (+2.1%) | 1.651s | 6 | 1.28x |

🔍 Observability: Express | Nitro | Next.js (Turbopack)

Stream Benchmarks (includes TTFB metrics)
workflow with stream

💻 Local Development

| World | Framework | Workflow Time | TTFB | Slurp | Wall Time | Overhead | Samples | vs Fastest |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 💻 Local | 🥇 Next.js (Turbopack) | 0.170s | 1.002s | 0.011s | 1.017s | 0.848s | 10 | 1.00x |
| 🌐 Redis | Next.js (Turbopack) | 0.191s | 1.000s | 0.001s | 1.007s | 0.817s | 10 | 1.13x |
| 💻 Local | Express | 0.195s (-0.9%) | 1.003s (~) | 0.011s (-1.8%) | 1.017s (~) | 0.822s | 10 | 1.15x |
| 💻 Local | Nitro | 0.199s (-1.0%) | 1.003s (~) | 0.011s (-3.4%) | 1.017s (~) | 0.819s | 10 | 1.17x |
| 🐘 Postgres | Next.js (Turbopack) | 0.202s | 1.002s | 0.002s | 1.013s | 0.811s | 10 | 1.19x |
| 🐘 Postgres | Express | 0.225s (+2.7%) | 0.998s (+0.5%) | 0.001s (+16.7% 🔺) | 1.013s (~) | 0.789s | 10 | 1.32x |
| 🐘 Postgres | Nitro | 0.230s (+2.6%) | 0.994s (~) | 0.001s (-7.1% 🟢) | 1.013s (~) | 0.783s | 10 | 1.36x |
| 🌐 MongoDB | Next.js (Turbopack) | 0.524s | 0.926s | 0.001s | 1.009s | 0.484s | 10 | 3.09x |

▲ Production (Vercel)

| World | Framework | Workflow Time | TTFB | Slurp | Wall Time | Overhead | Samples | vs Fastest |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ▲ Vercel | 🥇 Nitro | 1.574s (+2.3%) | 2.317s (-16.2% 🟢) | 0.006s (-59.0% 🟢) | 2.964s (-12.5% 🟢) | 1.390s | 10 | 1.00x |
| ▲ Vercel | Express | 1.643s (-5.1% 🟢) | 2.272s (-21.2% 🟢) | 0.005s (-11.7% 🟢) | 2.909s (-17.8% 🟢) | 1.265s | 10 | 1.04x |
| ▲ Vercel | Next.js (Turbopack) | 1.677s (+2.4%) | 2.450s (-15.0% 🟢) | 0.004s (-29.1% 🟢) | 3.090s (-7.0% 🟢) | 1.413s | 10 | 1.06x |

🔍 Observability: Nitro | Express | Next.js (Turbopack)

Summary

Fastest Framework by World

Winner determined by most benchmark wins

| World | 🥇 Fastest Framework | Wins |
| --- | --- | --- |
| 💻 Local | Next.js (Turbopack) | 7/12 |
| 🐘 Postgres | Next.js (Turbopack) | 8/12 |
| ▲ Vercel | Express | 6/12 |

Fastest World by Framework

Winner determined by most benchmark wins

| Framework | 🥇 Fastest World | Wins |
| --- | --- | --- |
| Express | 🐘 Postgres | 6/12 |
| Next.js (Turbopack) | 🐘 Postgres | 4/12 |
| Nitro | 🐘 Postgres | 6/12 |

Column Definitions
  • Workflow Time: Runtime reported by workflow (completedAt - createdAt) - primary metric
  • TTFB: Time to First Byte - time from workflow start until first stream byte received (stream benchmarks only)
  • Slurp: Time from first byte to complete stream consumption (stream benchmarks only)
  • Wall Time: Total testbench time (trigger workflow + poll for result)
  • Overhead: Testbench overhead (Wall Time - Workflow Time)
  • Samples: Number of benchmark iterations run
  • vs Fastest: How much slower compared to the fastest configuration for this benchmark

Worlds:

  • 💻 Local: In-memory filesystem world (local development)
  • 🐘 Postgres: PostgreSQL database world (local development)
  • ▲ Vercel: Vercel production/preview deployment
  • 🌐 Turso: Community world (local development)
  • 🌐 MongoDB: Community world (local development)
  • 🌐 Redis: Community world (local development)
  • 🌐 Jazz: Community world (local development)

📋 View full workflow run

Include reasoning parts from step results in the assistant message
alongside tool-call parts when building the conversation prompt for
the next tool loop iteration. This mirrors what the AI SDK's
toResponseMessages() does, ensuring reasoning models retain access
to their prior reasoning across multi-step tool loops.

Remove sanitizeProviderMetadataForToolCall() — with reasoning items
now preserved in the conversation, OpenAI's itemId references are
valid and no longer need to be stripped.

Closes #1393

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
chunksToStep() previously only collected reasoning-delta chunks. For
models with encrypted reasoning (like o4-mini), there are no deltas —
only reasoning-start + reasoning-end chunks carrying the itemId needed
for OpenAI Responses API item references.

Now aggregates reasoning by ID from reasoning-start chunks, appending
delta text when available. This ensures providerMetadata (including
itemId and reasoningEncryptedContent) flows through even when no
reasoning deltas are emitted.
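
In sketch form, the aggregation the commit describes looks like this (chunk shapes follow the validation script shown earlier; this is illustrative, not the committed code):

```js
// Sketch: aggregate reasoning keyed by chunk id. Entries are created at
// reasoning-start, so encrypted reasoning (which emits no deltas) still
// yields a part that carries its providerMetadata (itemId, encrypted blob).
function aggregateReasoning(chunks) {
  const byId = new Map();
  for (const chunk of chunks) {
    if (chunk.type === 'reasoning-start') {
      byId.set(chunk.id, { text: '', providerMetadata: chunk.providerMetadata });
    } else if (chunk.type === 'reasoning-delta') {
      // Append delta text when available (non-encrypted reasoning).
      const entry = byId.get(chunk.id);
      if (entry) entry.text += chunk.delta;
    }
  }
  return [...byId.values()];
}

// Encrypted reasoning case: start/end chunks only, no deltas.
const parts = aggregateReasoning([
  { type: 'reasoning-start', id: 'r1', providerMetadata: { openai: { itemId: 'rs_1' } } },
  { type: 'reasoning-end', id: 'r1' },
]);
console.log(parts.length, parts[0].providerMetadata.openai.itemId);
```

A delta-only aggregation would return an empty list here, which is exactly the failure mode the commit fixes.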

Verified with o4-mini via OpenAI Responses API:
- With fix: follow-up request accepted
- Without fix: "Item 'fc_...' of type 'function_call' was provided
  without its required 'reasoning' item: 'rs_...'" (#880)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@pranaygp pranaygp left a comment


Solid PR — the approach of mirroring AI SDK's toResponseMessages() is the right call, and the removal of sanitizeProviderMetadataForToolCall follows cleanly from reasoning now being preserved. Tests are thorough (unit + e2e with mock reasoning model). Left a couple of inline comments.

```ts
      providerMetadata: chunk.providerMetadata,
    });
  }
}
```
Collaborator

The reasoning-end chunks can also carry providerMetadata (visible in the stream transform at line 345-347 of this same file), but this aggregation loop only processes reasoning-start and reasoning-delta. If a provider attaches metadata exclusively to reasoning-end (rather than reasoning-start), it would be silently dropped.

Probably fine for current providers (OpenAI puts it on reasoning-start), but worth a comment or a defensive reasoning-end handler that merges metadata into the entry.
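
A defensive handler along the lines suggested here might look like this (a sketch only; `reasoningById` and the chunk shapes follow the validation script above, not the actual patch):

```js
// Sketch: merge providerMetadata carried on reasoning-end into the entry
// created at reasoning-start, so metadata attached only to the end chunk
// is not silently dropped.
function applyReasoningEnd(reasoningById, chunk) {
  if (chunk.type !== 'reasoning-end') return;
  const entry = reasoningById.get(chunk.id);
  if (entry && chunk.providerMetadata) {
    // Later metadata overrides earlier values, since reasoning-end carries
    // the final state for this reasoning item.
    entry.providerMetadata = chunk.providerMetadata;
  }
}

const byId = new Map([['r1', { text: 'hi', providerMetadata: undefined }]]);
applyReasoningEnd(byId, {
  type: 'reasoning-end',
  id: 'r1',
  providerMetadata: { openai: { itemId: 'rs_abc' } },
});
console.log(byId.get('r1').providerMetadata.openai.itemId);
```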

Author

In the AI SDK we throw an error when reasoning data is only attached to reasoning-end

https://github.com/vercel/ai/blob/438708ae26b2340c9703c480d179c3b13c01d1af/packages/ai/src/ui/process-ui-message-stream.ts#L449

but we are still capturing it; I updated the PR.

```ts
    ...(meta != null ? { providerOptions: meta } : {}),
  };
}),
] as typeof toolCalls,
```
Collaborator

Nit: this as typeof toolCalls cast was already here before this PR, but now it's even less accurate — the array genuinely contains reasoning parts that don't conform to LanguageModelV3ToolCall. If the prompt type's content field supports a union of reasoning + tool-call parts, it'd be cleaner to type it properly. Not a blocker though.

- Add reasoning-end handler in chunksToStep() to merge providerMetadata,
  mirroring the AI SDK's behavior where reasoning-end can carry final
  metadata that should override earlier values.
- Replace inaccurate `as typeof toolCalls` cast with
  `as LanguageModelV3Prompt[number]['content']` since the content array
  now contains both reasoning and tool-call parts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment on lines +334 to +337
```ts
] as Extract<
  LanguageModelV3Prompt[number],
  { role: 'assistant' }
>['content'],
```
Author

Not sure I like this type; we could also revert to `as typeof toolCalls` with a comment:

Suggested change:

```diff
-] as Extract<
-  LanguageModelV3Prompt[number],
-  { role: 'assistant' }
->['content'],
+// Cast: content is a mix of reasoning + tool-call parts, both valid
+// in an assistant message. typeof toolCalls is imprecise but harmless.
+] as typeof toolCalls,
```



Development

Successfully merging this pull request may close these issues.

DurableAgent should preserve reasoning content in conversation history across tool loop steps
