Conversation

@coderabbitai review

✅ Actions performed: Review triggered.
📝 Walkthrough

This pull request updates logging infrastructure across OpenTelemetry GenAI utilities by centralizing log configuration, bumps package versions, expands documentation with Deepeval and instrumentation examples, and modifies a test to align with the new logging setup.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes

🚥 Pre-merge checks: 1 passed, 2 failed (1 warning, 1 inconclusive)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
README.md (1)
354-371: ⚠️ Potential issue | 🟡 Minor — Fix typos and syntax in the in-code example.
Line 356 “Sudo-code” → “Pseudo-code”; Line 366 “actuall” → “actually”; Line 364 has mismatched parentheses and will not run as written.
✅ Suggested correction

```diff
-Sudo-code to create LLMInvocation for your in-code for an application:
+Pseudo-code to create LLMInvocation for your in-code for an application:

-inv = LLMInvocation(request_model="gpt-5-nano", input_messages=[InputMessage(role="user", parts=[Text(user_input))])], provider="openai")
+inv = LLMInvocation(
+    request_model="gpt-5-nano",
+    input_messages=[InputMessage(role="user", parts=[Text(user_input)])],
+    provider="openai",
+)

-# your code which actuall invokes llm here
+# your code which actually invokes llm here
```
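As a sanity check that the corrected call is balanced and runnable, here is a minimal sketch using stand-in dataclasses. The real `LLMInvocation`, `InputMessage`, and `Text` types ship with `opentelemetry-util-genai` and may carry more fields than these stubs; only the shapes used in the README snippet are mirrored here.

```python
from dataclasses import dataclass, field
from typing import List


# Stand-in stubs: the real classes live in opentelemetry.util.genai.types
# and likely have additional fields; these mirror only what the snippet uses.
@dataclass
class Text:
    content: str


@dataclass
class InputMessage:
    role: str
    parts: List[Text]


@dataclass
class LLMInvocation:
    request_model: str
    input_messages: List[InputMessage] = field(default_factory=list)
    provider: str = ""


user_input = "What is OpenTelemetry?"

# The corrected snippet from the suggestion above, now with balanced parentheses.
inv = LLMInvocation(
    request_model="gpt-5-nano",
    input_messages=[InputMessage(role="user", parts=[Text(user_input)])],
    provider="openai",
)

print(inv.provider)
```

With the parentheses fixed, the constructor call parses and builds the nested message structure as intended.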
🤖 Fix all issues with AI agents
In `@README.md`:
- Around line 283-317: Standardize the evaluator token casing used in examples
to match the parser's expected token (replace the uppercase "Deepeval(...)" with
the lowercase "deepeval(...)" and ensure all examples using
OTEL_INSTRUMENTATION_GENAI_EVALS_EVALUATORS consistently show "deepeval" e.g.,
change Deepeval(LLMInvocation(bias,toxicity)) to
deepeval(LLMInvocation(bias,toxicity)) so examples all use the same token casing
to avoid user confusion.
- Around line 335-342: In the README "Installing a Translator library" section,
fix the typos and wording: change the phrase "exiting 3rd party" to "existing
third‑party" and "traseloop" to "traceloop" (also ensure the example package
path uses "traceloop" in "opentelemetry-util-genai-traceloop-translator");
update the sentence containing "convert it to Splunk Distro semantic
conventions/run instrumentation-side evaluations" only if needed for clarity but
at minimum correct the two spelled words mentioned above.
🧹 Nitpick comments (1)

util/opentelemetry-util-genai/src/opentelemetry/util/genai/debug.py (1)

60-69: Consider setting `_PARENT_LOGGER.propagate = False` to prevent duplicate log output.

If the application (or a framework) has configured a handler on the root logger (e.g., via `logging.basicConfig()`), messages from any `opentelemetry.util.genai.*` child logger will be emitted twice: once by this parent handler and once by the root logger's handler, since `propagate` defaults to `True`. Adding `_PARENT_LOGGER.propagate = False` would keep all GenAI log output self-contained and avoid duplicates in applications that configure root logging.

Suggested fix

```diff
 _PARENT_LOGGER = logging.getLogger("opentelemetry.util.genai")
 if not _PARENT_LOGGER.handlers:
     parent_handler = logging.StreamHandler()
     parent_fmt = logging.Formatter("[%(name)s] %(levelname)s: %(message)s")
     parent_handler.setFormatter(parent_fmt)
     _PARENT_LOGGER.addHandler(parent_handler)
     _PARENT_LOGGER.setLevel(logging.DEBUG if _ENABLED else logging.INFO)
+    _PARENT_LOGGER.propagate = False
```
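The duplicate-output scenario this nitpick describes can be reproduced in isolation. The sketch below borrows only the logger names from the review; the counting handler and record list are illustrative, standing in for real root and GenAI handlers:

```python
import logging

records = []


class CountingHandler(logging.Handler):
    """Records which named handler saw each log record."""

    def emit(self, record):
        records.append(self.name)


# Simulate an application that configured a handler on the root logger,
# e.g. via logging.basicConfig().
root_handler = CountingHandler()
root_handler.name = "root"
logging.getLogger().addHandler(root_handler)

# Simulate the GenAI parent logger with its own handler.
parent = logging.getLogger("opentelemetry.util.genai")
parent_handler = CountingHandler()
parent_handler.name = "genai"
parent.addHandler(parent_handler)
parent.setLevel(logging.DEBUG)

# A child logger, as the eval module would use.
child = logging.getLogger("opentelemetry.util.genai.evaluators")

child.debug("first")   # propagate=True: handled by "genai", then "root"
parent.propagate = False
child.debug("second")  # now only the "genai" handler sees it

print(records)  # ['genai', 'root', 'genai']
```

The first record reaches both handlers because propagation continues past the parent to the root; after `propagate = False`, the second record stops at the GenAI parent, which is exactly the duplicate suppression the suggested fix buys.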
```bash
# configure which GenAI types to evaluate and which evaluations
export OTEL_INSTRUMENTATION_GENAI_EVALS_EVALUATORS="Deepeval(LLMInvocation(bias,toxicity))"
# Deepeval optimization
export DEEPEVAL_FILE_SYSTEM=READ_ONLY
export DEEPEVAL_TELEMETRY_OPT_OUT=YES
# set environment and service names for ease of filtering
export OTEL_SERVICE_NAME=genai-eval-test
export OTEL_RESOURCE_ATTRIBUTES='deployment.environment=genai-dev'
```

For telemetry to properly work with Splunk Platform instrumentation, set the env var to enable Splunk format for aggregated evaluation results.

```bash
export OTEL_INSTRUMENTATION_GENAI_EVALS_RESULTS_AGGREGATION=true
export OTEL_INSTRUMENTATION_GENAI_EMITTERS_EVALUATION="replace-category:SplunkEvaluationResults"
```

### Deepeval evaluator integration configuration

Instrumentation-side evaluations can be configured using `OTEL_INSTRUMENTATION_GENAI_EVALS_EVALUATORS` environment variable

```bash
# uses defaults - evaluates LLMInvocation and AgentInvocation with 5 metrics:
# (bias,toxicity,answer_relevancy,hallucination,sentiment)
OTEL_INSTRUMENTATION_GENAI_EVALS_EVALUATORS="deepeval"

# Specific metrics for LLMInvocation
OTEL_INSTRUMENTATION_GENAI_EVALS_EVALUATORS="deepeval(LLMInvocation(bias,toxicity))"

# Multiple types with metrics
OTEL_INSTRUMENTATION_GENAI_EVALS_EVALUATORS="deepeval(LLMInvocation(bias,toxicity),AgentInvocation(hallucination))"

# With metric options
OTEL_INSTRUMENTATION_GENAI_EVALS_EVALUATORS="deepeval(LLMInvocation(hallucination(threshold=0.8)))"
```
Align Deepeval token casing to avoid config ambiguity.
Line 284 uses Deepeval(...) while Lines 307–316 use deepeval(...). If the evaluator grammar is case‑sensitive, this is a foot‑gun for users. Please standardize the token casing across examples (and ideally match the parser’s expected value).
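One way to make this casing foot-gun moot is for the spec parser to normalize the evaluator token. The real parser in `opentelemetry-util-genai` is not shown in this review, so the grammar below (evaluator name, per-type metric lists, optional `key=value` metric options) is inferred from the README examples, and `parse_evaluators` is a made-up name for illustration:

```python
import re


def parse_evaluators(spec: str) -> dict:
    """Parse a spec like 'deepeval(LLMInvocation(bias,toxicity))' into
    {evaluator: {gen_ai_type: [metric, ...]}}.

    Illustrative sketch only: the evaluator token is lowercased so that
    'Deepeval' and 'deepeval' resolve to the same evaluator.
    """
    m = re.fullmatch(r"(\w+)(?:\((.*)\))?", spec.strip())
    if not m:
        raise ValueError(f"bad evaluator spec: {spec!r}")
    evaluator = m.group(1).lower()  # normalize casing
    types: dict = {}
    body = m.group(2) or ""
    # Match each GenAI type with its parenthesized metric list, allowing one
    # nested level of parentheses for metric options like (threshold=0.8).
    for t in re.finditer(r"(\w+)\(([^()]*(?:\([^()]*\))?[^()]*)\)", body):
        # Split on commas that are not inside metric-option parentheses,
        # then strip trailing option groups from each metric name.
        metrics = [
            re.sub(r"\(.*\)$", "", p).strip()
            for p in re.split(r",(?![^()]*\))", t.group(2))
            if p.strip()
        ]
        types[t.group(1)] = metrics
    return {evaluator: types}


print(parse_evaluators("Deepeval(LLMInvocation(bias,toxicity))"))
# {'deepeval': {'LLMInvocation': ['bias', 'toxicity']}}
```

Under this scheme both spellings in the README would behave identically; standardizing the docs on lowercase `deepeval` is still the cleaner fix.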
### Installing a Translator library

To use exiting 3rd party instrumentations and convert it to Splunk Distro semantic conventions/run instrumentation-side evaluations you can install a translator library.

For example for existing traseloop instrumentations

```bash
pip install -e util/opentelemetry-util-genai-traceloop-translator --no-deps
```
Fix wording + spelling in translator section.
Line 337 “exiting 3rd party” → “existing third‑party” and Line 339 “traseloop” → “traceloop”.
💡 Suggested doc fix
-To use exiting 3rd party instrumentations and convert it to Splunk Distro semantic conventions/run instrumentation-side evaluations you can install a translator library.
+To use existing third-party instrumentations and convert them to Splunk Distro semantic conventions/run instrumentation-side evaluations you can install a translator library.
-For example for existing traseloop instrumentations
+For example for existing traceloop instrumentations🧰 Tools
🪛 LanguageTool
[grammar] ~337-~337: Use a hyphen to join words.
Context: ...a Translator library To use exiting 3rd party instrumentations and convert it to...
(QB_NEW_EN_HYPHEN)
[grammar] ~339-~339: Ensure spelling is correct
Context: ...ator library. For example for existing traseloop instrumentations ```bash pip install -e...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
Fixes some outdated or confusing parts of the development setup.
Fixes GenAI debug logging (it did not work before in the eval module).

Summary by CodeRabbit

- Documentation
- Bug Fixes
- Tests