Skip to content

fix(langchain): use default EmbeddingInvocation operation_name instead of hardcoded value#7

Open
shuningc wants to merge 1 commit intomainfrom
copy-pr-214
Open

fix(langchain): use default EmbeddingInvocation operation_name instead of hardcoded value#7
shuningc wants to merge 1 commit intomainfrom
copy-pr-214

Conversation

@shuningc
Copy link
Owner

@shuningc shuningc commented Feb 20, 2026

The langchain instrumentation was setting operation_name="embedding" (singular) which overrode the correct semantic convention default of "embeddings" (plural) defined in EmbeddingInvocation types.py. Remove the explicit override so the dataclass default from GenAiOperationNameValues.EMBEDDINGS is used.

Add embedding tests for both packages:

  • util-genai: default operation_name, semconv attributes, span name, error path
  • langchain: VCR integration tests for success and error paths

Summary by CodeRabbit

  • Tests

    • Added comprehensive test coverage for LangChain embedding instrumentation, including successful embedding retrieval and error handling scenarios.
    • Added test fixtures for API interactions covering both successful and error cases.
  • Chores

    • Enhanced embedding instrumentation telemetry validation and test infrastructure.

…d of hardcoded value

The langchain instrumentation was setting operation_name="embedding" (singular)
which overrode the correct semantic convention default of "embeddings" (plural)
defined in EmbeddingInvocation types.py. Remove the explicit override so the
dataclass default from GenAiOperationNameValues.EMBEDDINGS is used.

Add embedding tests for both packages:
- util-genai: default operation_name, semconv attributes, span name, error path
- langchain: VCR integration tests for success and error paths

Co-authored-by: Cursor <cursoragent@cursor.com>
@qodo-code-review
Copy link

Review Summary by Qodo

Fix embedding operation_name and add comprehensive test coverage

🐞 Bug fix 🧪 Tests

Grey Divider

Walkthroughs

Description
• Remove hardcoded "embedding" operation_name override in langchain instrumentation
  - Allows default "embeddings" semantic convention value to be used
• Add comprehensive embedding tests for util-genai package
  - Default operation_name, semantic convention attributes, span naming, error handling
• Add VCR integration tests for langchain embedding instrumentation
  - Success and error paths with recorded API cassettes
Diagram
flowchart LR
  A["LangChain Instrumentation"] -->|Remove hardcoded operation_name| B["Use EmbeddingInvocation Default"]
  B -->|operation_name='embeddings'| C["Semantic Convention Compliance"]
  D["util-genai Tests"] -->|Default operation_name| E["Span Attributes"]
  D -->|Error handling| E
  D -->|Span naming| E
  F["LangChain Integration Tests"] -->|VCR cassettes| G["Success Path"]
  F -->|VCR cassettes| H["Error Path"]
Loading

Grey Divider

File Changes

1. instrumentation-genai/opentelemetry-instrumentation-langchain/src/opentelemetry/instrumentation/langchain/__init__.py 🐞 Bug fix +0/-1

Remove hardcoded embedding operation_name override

• Removed hardcoded operation_name="embedding" parameter from UtilEmbeddingInvocation constructor
• Allows the dataclass default from GenAiOperationNameValues.EMBEDDINGS to be used instead
• Fixes semantic convention compliance by using plural "embeddings" instead of singular "embedding"

instrumentation-genai/opentelemetry-instrumentation-langchain/src/opentelemetry/instrumentation/langchain/init.py


2. util/opentelemetry-util-genai/tests/test_embedding_invocation.py 🧪 Tests +148/-1

Add comprehensive EmbeddingInvocation unit tests

• Added module docstring documenting test scope
• Added 9 new test functions covering EmbeddingInvocation lifecycle and defaults
• Tests verify default operation_name is "embeddings", semantic convention attributes are correct
• Tests validate span naming convention, error handling, and custom operation_name overrides
• Tests mirror langchain instrumentation usage pattern to ensure compatibility

util/opentelemetry-util-genai/tests/test_embedding_invocation.py


3. instrumentation-genai/opentelemetry-instrumentation-langchain/tests/test_langchain_embedding.py 🧪 Tests +154/-0

Add langchain embedding VCR integration tests

• Created new test module with VCR integration tests for langchain embedding instrumentation
• test_langchain_embedding_call validates successful embedding with correct telemetry attributes
• test_langchain_embedding_call_error validates error path produces correct span with ERROR status
• Tests verify operation_name='embeddings', span naming, metrics emission, and semantic convention
 compliance

instrumentation-genai/opentelemetry-instrumentation-langchain/tests/test_langchain_embedding.py


View more (2)
4. instrumentation-genai/opentelemetry-instrumentation-langchain/tests/cassettes/test_langchain_embedding_call.yaml 🧪 Tests +78/-0

VCR cassette for successful embedding call

• VCR cassette recording successful OpenAI embedding API call
• Contains request with embedding query and response with embedding vector
• Used by test_langchain_embedding_call to validate success path without live API calls

instrumentation-genai/opentelemetry-instrumentation-langchain/tests/cassettes/test_langchain_embedding_call.yaml


5. instrumentation-genai/opentelemetry-instrumentation-langchain/tests/cassettes/test_langchain_embedding_call_error.yaml 🧪 Tests +71/-0

VCR cassette for embedding API error response

• VCR cassette recording failed OpenAI embedding API call with 401 Unauthorized response
• Contains request and error response with invalid API key message
• Used by test_langchain_embedding_call_error to validate error path handling

instrumentation-genai/opentelemetry-instrumentation-langchain/tests/cassettes/test_langchain_embedding_call_error.yaml


Grey Divider

Qodo Logo

@coderabbitai
Copy link

coderabbitai bot commented Feb 20, 2026

📝 Walkthrough

Walkthrough

A minor parameter removal from EmbeddingInvocation construction in LangChain instrumentation, paired with comprehensive test coverage including two new VCR cassettes for embedding API calls (success and error scenarios), a new LangChain embedding test module, and extended utility tests for EmbeddingInvocation telemetry lifecycle.

Changes

Cohort / File(s) Summary
Core Instrumentation Change
instrumentation-genai/opentelemetry-instrumentation-langchain/src/opentelemetry/instrumentation/langchain/__init__.py
Removed operation_name="embedding" argument from EmbeddingInvocation constructor call, omitting the operation_name field from the emitted embedding invocation payload.
VCR Test Cassettes
instrumentation-genai/opentelemetry-instrumentation-langchain/tests/cassettes/test_langchain_embedding_call.yaml, test_langchain_embedding_call_error.yaml
Added mock HTTP interaction fixtures for OpenAI embeddings API calls capturing successful embedding responses and 401 Unauthorized error scenarios with invalid API key responses.
LangChain Embedding Tests
instrumentation-genai/opentelemetry-instrumentation-langchain/tests/test_langchain_embedding.py
Added new test module with two tests: test_langchain_embedding_call validates successful embedding retrieval with span attributes and duration metrics; test_langchain_embedding_call_error validates error path span emission and ERROR status on API failure.
Utility Library Tests
util/opentelemetry-util-genai/tests/test_embedding_invocation.py
Extended EmbeddingInvocation tests to cover telemetry lifecycle (start/stop/fail), default operation_name behavior, semantic attributes validation, span naming conventions, and explicit operation_name override scenarios.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~18 minutes

Poem

🐰 A hop and a skip through telemetry springs,
Embeddings dance where the operation sings,
Tests wrapped in cassettes, both triumph and woe,
The instrumentation chain flexes its flow! ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: removing a hardcoded operation_name value from EmbeddingInvocation in langchain instrumentation to use the default instead.
Docstring Coverage ✅ Passed Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch copy-pr-214

Tip

Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
util/opentelemetry-util-genai/tests/test_embedding_invocation.py (2)

67-69: Silent no-op when internal API changes.

The if span_emitters: guard means that if emitters_for("span") returns an empty list (e.g., due to an internal refactor), the tracer override is silently skipped and the tests will either pass vacuously or fail with a confusing "0 spans" message. Consider failing fast instead:

-    if span_emitters:
-        span_emitters[0]._tracer = tracer_provider.get_tracer(__name__)
+    assert span_emitters, "Expected at least one span emitter from telemetry handler"
+    span_emitters[0]._tracer = tracer_provider.get_tracer(__name__)

Also applies to: 98-100, 123-125

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@util/opentelemetry-util-genai/tests/test_embedding_invocation.py` around
lines 67 - 69, The test currently silently skips overriding the tracer when
handler._emitter.emitters_for("span") returns an empty list (span_emitters),
which masks API changes; replace the permissive if-check with a fail-fast
assertion or explicit raise so the test errors immediately if no span emitter is
found, then proceed to set span_emitters[0]._tracer =
tracer_provider.get_tracer(__name__); apply the same change to the other
occurrences that use emitters_for("span") (the blocks around lines 98-100 and
123-125) to ensure tests fail loudly on internal API changes.

62-69: Consider extracting the repeated tracer-setup boilerplate into a fixture.

The span exporter + tracer provider + internal _emitter wiring is duplicated across three tests (test_embedding_invocation_span_attributes, test_embedding_invocation_span_name, test_embedding_invocation_with_error). A shared pytest fixture would reduce duplication and make it easier to update if the internal API changes.

♻️ Example fixture sketch
`@pytest.fixture`
def telemetry_setup():
    span_exporter = InMemorySpanExporter()
    tracer_provider = TracerProvider()
    tracer_provider.add_span_processor(SimpleSpanProcessor(span_exporter))

    handler = get_telemetry_handler()
    span_emitters = list(handler._emitter.emitters_for("span"))
    if span_emitters:
        span_emitters[0]._tracer = tracer_provider.get_tracer(__name__)

    return handler, span_exporter

Also applies to: 93-100, 118-125

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@util/opentelemetry-util-genai/tests/test_embedding_invocation.py` around
lines 62 - 69, The three tests (test_embedding_invocation_span_attributes,
test_embedding_invocation_span_name, test_embedding_invocation_with_error)
repeat tracer setup boilerplate; extract that into a pytest fixture (e.g.,
telemetry_setup) that creates InMemorySpanExporter, TracerProvider, adds
SimpleSpanProcessor, obtains handler via get_telemetry_handler(), wires
handler._emitter.emitters_for("span")[0]._tracer with
tracer_provider.get_tracer(__name__) if present, and returns (handler,
span_exporter); then update the three tests to accept the telemetry_setup
fixture and use the returned handler and span_exporter instead of duplicating
the setup.
instrumentation-genai/opentelemetry-instrumentation-langchain/tests/test_langchain_embedding.py (1)

63-74: Extract duplicate span-search logic into a helper.

The span-finding loop is duplicated verbatim between both tests. A small helper would improve readability and reduce maintenance burden.

♻️ Example helper
def _find_span_by_operation(spans, operation_name):
    for s in spans:
        attrs = getattr(s, "attributes", None)
        if attrs and attrs.get(gen_ai_attributes.GEN_AI_OPERATION_NAME) == operation_name:
            return s
    return None

Usage:

embedding_span = _find_span_by_operation(spans, EMBEDDINGS)
assert embedding_span is not None, "No embeddings operation span found"

Also applies to: 132-143

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@instrumentation-genai/opentelemetry-instrumentation-langchain/tests/test_langchain_embedding.py`
around lines 63 - 74, Extract the duplicated span-search loop into a small
helper function (e.g., _find_span_by_operation) that accepts spans and an
operation_name and returns the matching span or None; replace both verbatim
loops (the one using spans, attrs_obj/getattr and
gen_ai_attributes.GEN_AI_OPERATION_NAME to compare against EMBEDDINGS) with
calls to this helper (e.g., embedding_span = _find_span_by_operation(spans,
EMBEDDINGS)) and add an assertion like assert embedding_span is not None where
needed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In
`@instrumentation-genai/opentelemetry-instrumentation-langchain/tests/test_langchain_embedding.py`:
- Around line 63-74: Extract the duplicated span-search loop into a small helper
function (e.g., _find_span_by_operation) that accepts spans and an
operation_name and returns the matching span or None; replace both verbatim
loops (the one using spans, attrs_obj/getattr and
gen_ai_attributes.GEN_AI_OPERATION_NAME to compare against EMBEDDINGS) with
calls to this helper (e.g., embedding_span = _find_span_by_operation(spans,
EMBEDDINGS)) and add an assertion like assert embedding_span is not None where
needed.

In `@util/opentelemetry-util-genai/tests/test_embedding_invocation.py`:
- Around line 67-69: The test currently silently skips overriding the tracer
when handler._emitter.emitters_for("span") returns an empty list
(span_emitters), which masks API changes; replace the permissive if-check with a
fail-fast assertion or explicit raise so the test errors immediately if no span
emitter is found, then proceed to set span_emitters[0]._tracer =
tracer_provider.get_tracer(__name__); apply the same change to the other
occurrences that use emitters_for("span") (the blocks around lines 98-100 and
123-125) to ensure tests fail loudly on internal API changes.
- Around line 62-69: The three tests (test_embedding_invocation_span_attributes,
test_embedding_invocation_span_name, test_embedding_invocation_with_error)
repeat tracer setup boilerplate; extract that into a pytest fixture (e.g.,
telemetry_setup) that creates InMemorySpanExporter, TracerProvider, adds
SimpleSpanProcessor, obtains handler via get_telemetry_handler(), wires
handler._emitter.emitters_for("span")[0]._tracer with
tracer_provider.get_tracer(__name__) if present, and returns (handler,
span_exporter); then update the three tests to accept the telemetry_setup
fixture and use the returned handler and span_exporter instead of duplicating
the setup.

@qodo-code-review
Copy link

Code Review by Qodo

🐞 Bugs (2) 📘 Rule violations (2) 📎 Requirement gaps (0)

Grey Divider


Action required

1. Changelog missing embeddings rename 📘 Rule violation ✧ Quality
Description
The PR changes emitted GenAI operation/span naming for LangChain embeddings by removing the
hardcoded operation_name, but the package changelog was not updated to document this user-visible
telemetry behavior change. This can surprise users who rely on span names/attributes for dashboards
and alerts.
Code

instrumentation-genai/opentelemetry-instrumentation-langchain/src/opentelemetry/instrumentation/langchain/init.py[R205-209]

            # Create embedding invocation
            embedding = UtilEmbeddingInvocation(
-                operation_name="embedding",
                request_model=request_model,
                input_texts=texts if isinstance(texts, list) else [texts],
                provider=provider,
Evidence
Compliance requires updating documentation/changelog when behavior changes; the PR changes how the
embedding invocation is created (no explicit operation_name override), but the LangChain
instrumentation changelog contains no entry describing this embeddings operation name/span naming
change.

AGENTS.md
instrumentation-genai/opentelemetry-instrumentation-langchain/src/opentelemetry/instrumentation/langchain/init.py[205-211]
instrumentation-genai/opentelemetry-instrumentation-langchain/CHANGELOG.md[5-14]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The LangChain instrumentation behavior changed (embeddings operation/span naming now uses the default `embeddings` operation name), but the package changelog was not updated.

## Issue Context
This is a user-visible observability change that can affect dashboards/alerts and should be documented per the compliance checklist.

## Fix Focus Areas
- instrumentation-genai/opentelemetry-instrumentation-langchain/CHANGELOG.md[1-30]
- instrumentation-genai/opentelemetry-instrumentation-langchain/src/opentelemetry/instrumentation/langchain/__init__.py[205-211]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Remediation recommended

2. Non-Google test docstring 📘 Rule violation ✓ Correctness
Description
A newly added docstring does not follow Google-style formatting (summary line + blank line +
sections like Raises when applicable). This reduces consistency and can cause documentation/lint
tooling to flag style violations.
Code

instrumentation-genai/opentelemetry-instrumentation-langchain/tests/test_langchain_embedding.py[R110-114]

+    """When the embedding API returns an error the wrapper must:
+    1. Still emit a span with operation_name == 'embeddings'.
+    2. Mark the span status as ERROR.
+    3. Re-raise the original exception so the caller sees the failure.
+    """
Evidence
The compliance checklist requires new/modified docstrings to follow Google style; the new docstring
uses an inline numbered list without standard Google-style structure.

AGENTS.md
instrumentation-genai/opentelemetry-instrumentation-langchain/tests/test_langchain_embedding.py[110-114]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
A newly added docstring does not conform to Google-style formatting.

## Issue Context
Google-style docstrings improve consistency and compatibility with documentation tooling.

## Fix Focus Areas
- instrumentation-genai/opentelemetry-instrumentation-langchain/tests/test_langchain_embedding.py[110-114]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


3. Examples still override op name 🐞 Bug ✧ Quality
Description
LangChain instrumentation now relies on EmbeddingInvocation’s default operation_name ("embeddings"),
but util-genai’s embeddings example still hardcodes operation_name="embedding". Users copying the
example will emit non-semconv operation names/span names and get inconsistent telemetry compared to
the instrumentations.
Code

instrumentation-genai/opentelemetry-instrumentation-langchain/src/opentelemetry/instrumentation/langchain/init.py[R205-209]

            # Create embedding invocation
            embedding = UtilEmbeddingInvocation(
-                operation_name="embedding",
                request_model=request_model,
                input_texts=texts if isinstance(texts, list) else [texts],
                provider=provider,
Evidence
The PR change removes the explicit override in LangChain and now depends on the util-genai
EmbeddingInvocation default. The EmbeddingInvocation default is semconv "embeddings", but the
util-genai example continues to explicitly pass the singular "embedding", overriding the default and
producing inconsistent semconv attributes/span names for anyone following the example.

instrumentation-genai/opentelemetry-instrumentation-langchain/src/opentelemetry/instrumentation/langchain/init.py[205-211]
util/opentelemetry-util-genai/src/opentelemetry/util/genai/types.py[417-424]
util/opentelemetry-util-genai/examples/embeddings_example.py[70-76]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`util-genai` examples hardcode `operation_name=&quot;embedding&quot;` (singular), which overrides the semconv default (`&quot;embeddings&quot;`). After this PR, LangChain instrumentation intentionally relies on the default, so the example becomes a misleading source of non-semconv telemetry.

### Issue Context
- `EmbeddingInvocation` defaults operation_name to `GenAiOperationNameValues.EMBEDDINGS`.
- Examples should not teach overriding this default with a non-semconv string.

### Fix Focus Areas
- util/opentelemetry-util-genai/examples/embeddings_example.py[70-76]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


4. Singleton handler test leakage 🐞 Bug ⛯ Reliability
Description
New util-genai tests mutate the singleton TelemetryHandler’s internal span emitter tracer
(span_emitters[0]._tracer) without resetting the singleton between tests. This can create order
dependence and flaky behavior (especially with parallel test execution) because the mutated
handler/emitter pipeline is shared.
Code

util/opentelemetry-util-genai/tests/test_embedding_invocation.py[R60-70]

+def test_embedding_invocation_span_attributes():
+    """Spans should carry the correct operation_name attribute from the default."""
+    span_exporter = InMemorySpanExporter()
+    tracer_provider = TracerProvider()
+    tracer_provider.add_span_processor(SimpleSpanProcessor(span_exporter))
+
+    handler = get_telemetry_handler()
+    span_emitters = list(handler._emitter.emitters_for("span"))
+    if span_emitters:
+        span_emitters[0]._tracer = tracer_provider.get_tracer(__name__)
+
Evidence
get_telemetry_handler() caches a singleton in a function attribute. The new tests call
get_telemetry_handler() and mutate internal/private emitter state (_emitter, _tracer), meaning
subsequent tests may reuse a handler with unexpected tracer configuration.

util/opentelemetry-util-genai/tests/test_embedding_invocation.py[60-70]
util/opentelemetry-util-genai/src/opentelemetry/util/genai/handler.py[1115-1133]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`test_embedding_invocation.py` mutates private state on a cached singleton handler (`handler._emitter... _tracer = ...`). Because the handler is a singleton, this state can leak across tests, creating order dependence and flakiness.

### Issue Context
`get_telemetry_handler()` caches a singleton handler on the function object (`_default_handler`). Without resetting it, tests share the same handler instance.

### Fix Focus Areas
- util/opentelemetry-util-genai/tests/test_embedding_invocation.py[60-70]
- util/opentelemetry-util-genai/src/opentelemetry/util/genai/handler.py[1115-1133]
- util/opentelemetry-util-genai/tests/conftest.py[1-7]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

ⓘ The new review experience is currently in Beta. Learn more

Grey Divider

Qodo Logo

Comment on lines 205 to 209
# Create embedding invocation
embedding = UtilEmbeddingInvocation(
operation_name="embedding",
request_model=request_model,
input_texts=texts if isinstance(texts, list) else [texts],
provider=provider,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action required

1. Changelog missing embeddings rename 📘 Rule violation ✧ Quality

The PR changes emitted GenAI operation/span naming for LangChain embeddings by removing the
hardcoded operation_name, but the package changelog was not updated to document this user-visible
telemetry behavior change. This can surprise users who rely on span names/attributes for dashboards
and alerts.
Agent Prompt
## Issue description
The LangChain instrumentation behavior changed (embeddings operation/span naming now uses the default `embeddings` operation name), but the package changelog was not updated.

## Issue Context
This is a user-visible observability change that can affect dashboards/alerts and should be documented per the compliance checklist.

## Fix Focus Areas
- instrumentation-genai/opentelemetry-instrumentation-langchain/CHANGELOG.md[1-30]
- instrumentation-genai/opentelemetry-instrumentation-langchain/src/opentelemetry/instrumentation/langchain/__init__.py[205-211]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants