Skip to content

fix: fix duplicate eval logs & metrics from showing up#3

Open
shuningc wants to merge 2 commits intomainfrom
fix-duplicate-evals-logs/metrics
Open

fix: fix duplicate eval logs & metrics from showing up#3
shuningc wants to merge 2 commits intomainfrom
fix-duplicate-evals-logs/metrics

Conversation

@shuningc
Copy link
Owner

@shuningc shuningc commented Feb 6, 2026

Skip util-genai scope spans so we don't process already-instrumented spans
Removed redundant handler.evaluate_agent() calls - Since finish() already triggers evaluations via _notify_completion(), the explicit evaluate_agent() call was causing duplicate evaluations

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Fixed duplicate span processing by preventing re-processing of internally-generated spans.
  • Refactor

    • Streamlined agent evaluation flow by consolidating evaluation into the span completion process.
    • Improved LLM span detection logic for better accuracy.
    • Simplified message conversion handling for cleaner processing.

Skip util-genai scope spans so we don't process already-instrumented spans

Signed-off-by: Pavan Sudheendra <pavan0591@gmail.com>
@coderabbitai
Copy link

coderabbitai bot commented Feb 6, 2026

📝 Walkthrough

Walkthrough

These changes refactor three OpenTelemetry span processors to skip processing spans generated by the util-genai library, consolidate agent evaluation responsibility into the finish() method, and streamline LLM span detection logic across LangSmith, OpenLit, and Traceloop translators.

Changes

Cohort / File(s) Summary
Skip Logic Enhancement
util/opentelemetry-util-genai-langsmith-translator/.../langsmith_span_processor.py, util/opentelemetry-util-genai-openlit-translator/.../openlit_span_processor.py, util/opentelemetry-util-genai-traceloop-translator/.../traceloop_span_processor.py
Added skip conditions in _should_skip_span to filter out spans with instrumentation_scope.name starting with "opentelemetry.util.genai", preventing redundant reprocessing of util-genai spans.
Agent Evaluation Consolidation
util/opentelemetry-util-genai-langsmith-translator/.../langsmith_span_processor.py, util/opentelemetry-util-genai-openlit-translator/.../openlit_span_processor.py, util/opentelemetry-util-genai-traceloop-translator/.../traceloop_span_processor.py
Removed explicit AgentInvocation evaluation blocks in on_end handlers; delegated agent evaluation responsibility to finish() method via the emitter.
LLM Span Detection Refinement
util/opentelemetry-util-genai-traceloop-translator/.../traceloop_span_processor.py
Removed CRITICAL exclusion block that prevented evaluation-related spans from LLM processing; broadened _is_llm_span logic to use only _EXCLUDE_SPAN_PATTERNS for non-LLM filtering.
Message Conversion Simplification
util/opentelemetry-util-genai-traceloop-translator/.../traceloop_span_processor.py
Removed dedicated LangChain serialization format extraction in _convert_langchain_to_genai_messages; content now proceeds through generic handling path instead of specialized LangChain-specific parsing.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Three processors cleaned their span-filled nests,
Skipping util-genai guests with their best,
Delegation blooms where finish() now rules,
Evaluation flows through simpler new tools,
LangChain's special way bids farewell today! ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main objective of the changeset: preventing duplicate evaluation logs and metrics by skipping util-genai spans and removing redundant evaluate_agent() calls.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix-duplicate-evals-logs/metrics

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@shuningc
Copy link
Owner Author

shuningc commented Feb 6, 2026

/review

@qodo-code-review
Copy link

qodo-code-review bot commented Feb 6, 2026

PR Reviewer Guide 🔍

(Review updated until commit c732f21)

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Compatibility

The new skip logic relies on span.instrumentation_scope.name being present and matching a specific prefix. Please validate behavior across the OpenTelemetry Python versions you support (some older SDKs used instrumentation_library_info), and ensure a missing/alternative scope representation doesn’t reintroduce duplicate processing.

# Skip spans created by the util-genai library itself.
# These spans are already properly instrumented and should not be
# processed again by the langsmith translator to avoid duplicate
# evaluations, metrics, and logs.
scope = getattr(span, "instrumentation_scope", None)
scope_name = getattr(scope, "name", "") if scope else ""
if scope_name.startswith("opentelemetry.util.genai"):
    _logger.debug(
        "[LANGSMITH_PROCESSOR] Skipping util-genai span (scope=%s): %s",
        scope_name,
        span.name,
    )
    return True
Skip Criteria

Skipping spans based on an instrumentation scope name prefix is broad; confirm there are no spans legitimately meant to be translated that could also originate from that scope (e.g., if util-genai emits spans that still require vendor translation). Consider documenting the expected scope naming contract to prevent accidental trace signal loss.

# Skip spans created by the util-genai library itself.
# These spans are already properly instrumented and should not be
# processed again by the openlit translator to avoid duplicate
# evaluations, metrics, and logs.
scope = getattr(span, "instrumentation_scope", None)
scope_name = getattr(scope, "name", "") if scope else ""
if scope_name.startswith("opentelemetry.util.genai"):
    _logger.debug(
        "[OPENLIT_PROCESSOR] Skipping util-genai span (scope=%s): %s",
        scope_name,
        span.name,
    )
    return True
Evaluation Semantics

The change removes explicit evaluate_agent() calls and relies on finish() triggering evaluations via completion callbacks. Please verify that agent-level evaluation results are still attached to the intended agent span (and not only to downstream LLM spans) for all invocation types, since this affects how evaluation events/attributes are correlated in traces.

# Close the invocation to trigger core lifecycle handling
# This will call the appropriate stop_* method and emit spans/metrics.
# Note: handler.finish() already triggers evaluations via the completion
# callback mechanism (_notify_completion -> on_completion), so no explicit
# evaluate_agent() call is needed for AgentInvocation types.
handler = self.telemetry_handler or get_telemetry_handler()
try:
    handler.finish(invocation)
    _logger.debug(

@shuningc
Copy link
Owner Author

shuningc commented Feb 6, 2026

/improve

@qodo-code-review
Copy link

qodo-code-review bot commented Feb 6, 2026

PR Code Suggestions ✨

Latest suggestions up to c732f21

CategorySuggestion                                                                                                                                    Impact
Possible issue
Handle SDK scope API differences

Improve robustness of the span skipping logic by checking for
instrumentation_library_info to support older OpenTelemetry SDKs and ensuring
the scope name is a string.

util/opentelemetry-util-genai-langsmith-translator/src/opentelemetry/util/genai/processor/langsmith_span_processor.py [486-498]

 # Skip spans created by the util-genai library itself.
 # These spans are already properly instrumented and should not be
 # processed again by the langsmith translator to avoid duplicate
 # evaluations, metrics, and logs.
-scope = getattr(span, "instrumentation_scope", None)
+scope = getattr(span, "instrumentation_scope", None) or getattr(
+    span, "instrumentation_library_info", None
+)
 scope_name = getattr(scope, "name", "") if scope else ""
-if scope_name.startswith("opentelemetry.util.genai"):
+if isinstance(scope_name, str) and scope_name.startswith("opentelemetry.util.genai"):
     _logger.debug(
         "[LANGSMITH_PROCESSOR] Skipping util-genai span (scope=%s): %s",
         scope_name,
         span.name,
     )
     return True
  • Apply / Chat
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies a potential compatibility issue with older OpenTelemetry SDKs by proposing a fallback to instrumentation_library_info. This improves the robustness of the new span skipping logic, preventing duplicate telemetry in environments with older SDK versions.

Medium
Support older scope field names

Improve robustness of the span skipping logic by checking for
instrumentation_library_info to support older OpenTelemetry SDKs and ensuring
the scope name is a string.

util/opentelemetry-util-genai-openlit-translator/src/opentelemetry/util/genai/processor/openlit_span_processor.py [464-476]

 # Skip spans created by the util-genai library itself.
 # These spans are already properly instrumented and should not be
 # processed again by the openlit translator to avoid duplicate
 # evaluations, metrics, and logs.
-scope = getattr(span, "instrumentation_scope", None)
+scope = getattr(span, "instrumentation_scope", None) or getattr(
+    span, "instrumentation_library_info", None
+)
 scope_name = getattr(scope, "name", "") if scope else ""
-if scope_name.startswith("opentelemetry.util.genai"):
+if isinstance(scope_name, str) and scope_name.startswith("opentelemetry.util.genai"):
     _logger.debug(
         "[OPENLIT_PROCESSOR] Skipping util-genai span (scope=%s): %s",
         scope_name,
         span.name,
     )
     return True
  • Apply / Chat
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies a potential compatibility issue with older OpenTelemetry SDKs by proposing a fallback to instrumentation_library_info. This improves the robustness of the new span skipping logic, preventing duplicate telemetry in environments with older SDK versions.

Medium
Make skipping version-tolerant

Improve robustness of the span skipping logic by checking for
instrumentation_library_info to support older OpenTelemetry SDKs and ensuring
the scope name is a string.

util/opentelemetry-util-genai-traceloop-translator/src/opentelemetry/util/genai/processor/traceloop_span_processor.py [485-497]

 # Skip spans created by the util-genai library itself.
 # These spans are already properly instrumented and should not be
 # processed again by the traceloop translator to avoid duplicate
 # evaluations, metrics, and logs.
-scope = getattr(span, "instrumentation_scope", None)
+scope = getattr(span, "instrumentation_scope", None) or getattr(
+    span, "instrumentation_library_info", None
+)
 scope_name = getattr(scope, "name", "") if scope else ""
-if scope_name.startswith("opentelemetry.util.genai"):
+if isinstance(scope_name, str) and scope_name.startswith("opentelemetry.util.genai"):
     _logger.debug(
         "[TL_PROCESSOR] Skipping util-genai span (scope=%s): %s",
         scope_name,
         span.name,
     )
     return True
  • Apply / Chat
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies a potential compatibility issue with older OpenTelemetry SDKs by proposing a fallback to instrumentation_library_info. This improves the robustness of the new span skipping logic, preventing duplicate telemetry in environments with older SDK versions.

Medium
  • More

Previous suggestions

Suggestions up to commit 30523d6
CategorySuggestion                                                                                                                                    Impact
Possible issue
Harden scope-based span skipping

Improve the robustness of skipping util-genai spans by checking for the legacy
instrumentation_info attribute and ensuring the scope name is a lowercase string
before comparison.

util/opentelemetry-util-genai-traceloop-translator/src/opentelemetry/util/genai/processor/traceloop_span_processor.py [485-497]

 # Skip spans created by the util-genai library itself.
 # These spans are already properly instrumented and should not be
 # processed again by the traceloop translator to avoid duplicate
 # evaluations, metrics, and logs.
-scope = getattr(span, "instrumentation_scope", None)
-scope_name = getattr(scope, "name", "") if scope else ""
+scope = getattr(span, "instrumentation_scope", None) or getattr(
+    span, "instrumentation_info", None
+)
+scope_name = str(getattr(scope, "name", "") or "").lower()
 if scope_name.startswith("opentelemetry.util.genai"):
     _logger.debug(
         "[TL_PROCESSOR] Skipping util-genai span (scope=%s): %s",
         scope_name,
         span.name,
     )
     return True
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies a potential robustness issue and proposes a valid improvement for backward compatibility and handling of edge cases, which directly supports the PR's goal of reliably skipping specific spans.

Medium

@shuningc
Copy link
Owner Author

shuningc commented Feb 6, 2026

/review

@qodo-code-review
Copy link

Persistent review updated to latest commit 30523d6

@shuningc
Copy link
Owner Author

shuningc commented Feb 6, 2026

/review

@qodo-code-review
Copy link

Persistent review updated to latest commit c732f21

@shuningc
Copy link
Owner Author

shuningc commented Feb 6, 2026

/improve

@qodo-code-review
Copy link

Persistent suggestions updated to latest commit c732f21

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants