Skip to content

fix: restore explicit span.end() to fix span end_time regression#2032

Merged
zastrowm merged 2 commits intostrands-agents:mainfrom
zastrowm:fix_otel_timing
Apr 1, 2026
Merged

fix: restore explicit span.end() to fix span end_time regression#2032
zastrowm merged 2 commits intostrands-agents:mainfrom
zastrowm:fix_otel_timing

Conversation

@zastrowm
Copy link
Copy Markdown
Member

@zastrowm zastrowm commented Apr 1, 2026

Description

Since v1.24.0 (PR #1293), execute_event_loop_cycle spans no longer reflect per-cycle duration. When a cycle performs tool use and recurses, the parent cycle's OTel span stays open until all recursive children complete, producing cumulative bottom-up latency instead of per-step latency in observability backends (Langfuse, Jaeger, etc.).

The root cause: event_loop_cycle() is an async generator whose body was wrapped in use_span(end_on_exit=True). Because yield keeps the context manager open across recursive cycles, all span.end() calls fire simultaneously when the generator chain unwinds.

This is a cherry-pick of the first commit from #1939 (by @Di-Is), isolated to the core span timing fix.

The fix switches to end_on_exit=False and restores explicit span.end() calls via _end_span() in end_event_loop_cycle_span() and end_model_invoke_span(), with end_span_with_error() on all exception paths.

Related Issues

Resolves #1930

Documentation PR

N/A

Type of Change

Bug fix

Testing

  • I ran hatch run prepare

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

PR strands-agents#1293 wrapped event_loop_cycle() in use_span(end_on_exit=True) and removed
explicit span.end() calls. Because event_loop_cycle is an async generator,
yield keeps the context manager open across recursive cycles, causing all
execute_event_loop_cycle spans to share the same OTel end_time.

Switch to end_on_exit=False and explicitly call span.end() via _end_span()
in end_event_loop_cycle_span() and end_model_invoke_span(), restoring
end_span_with_error() in all exception paths.
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 1, 2026

Review Summary

Assessment: Approve ✅

This PR correctly fixes the OTel span timing regression by switching from end_on_exit=True to explicit span.end() calls, ensuring cycle spans end before recursion begins.

Review Notes
  • Root Cause: The async generator with end_on_exit=True kept spans open across yields during recursion, causing cumulative duration instead of per-cycle duration
  • Fix Approach: end_on_exit=False with explicit _end_span() calls via end_event_loop_cycle_span() and end_model_invoke_span()
  • Defensive Coding: Added is_recording() check to prevent double-ending spans
  • Error Handling: Good fallback chain error_message or str(error) or type(error).__name__
  • Test Coverage: Comprehensive tests verify timing behavior (cycle_spans[0].end_time <= cycle_spans[1].start_time) and error paths
  • Documentation PR: N/A is appropriate for this internal bug fix

All 112 tests pass. Clean fix that resolves the regression correctly.

lizradway
lizradway previously approved these changes Apr 1, 2026
@zastrowm
Copy link
Copy Markdown
Member Author

zastrowm commented Apr 1, 2026

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Regression: event_loop_cycle span duration becomes cumulative across recursive cycles

3 participants