
Fix/dev set up test #2

Open
shuningc wants to merge 4 commits into main from fix/dev-set-up-test

Conversation

@shuningc
Owner

@shuningc shuningc commented Feb 6, 2026

Fixes some outdated or confusing parts of the development setup guide.
Fixes GenAI debug logging, which previously did not work in the eval module.

Summary by CodeRabbit

  • Documentation

    • Expanded development setup guide with additional environment variables and Deepeval integration guidance.
    • Added comprehensive instrumentation examples and configuration instructions.
    • Updated install commands and related references.
  • Bug Fixes

    • Fixed logging visibility across modules: INFO messages now always displayed; DEBUG messages require OTEL_INSTRUMENTATION_GENAI_DEBUG flag.
  • Tests

    • Refined test logging configuration for improved precision.
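As a rough sketch, the logging fix described above gates the logger level on an environment flag. The snippet below is illustrative only; the actual implementation lives in the package's debug module and may differ in details such as flag parsing:

```python
import logging
import os

# Illustrative sketch of the described fix, not the package's actual code:
# INFO messages are always visible, DEBUG only when the env flag is set.
_ENABLED = os.environ.get(
    "OTEL_INSTRUMENTATION_GENAI_DEBUG", ""
).strip().lower() in ("true", "1")

logger = logging.getLogger("opentelemetry.util.genai")
if not logger.handlers:
    handler = logging.StreamHandler()
    handler.setFormatter(
        logging.Formatter("[%(name)s] %(levelname)s: %(message)s")
    )
    logger.addHandler(handler)
logger.setLevel(logging.DEBUG if _ENABLED else logging.INFO)
```

Because child loggers such as `opentelemetry.util.genai.evals` inherit this level, configuring the parent once covers every module under it.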

@shuningc
Owner Author

shuningc commented Feb 6, 2026

@coderabbitai review

@coderabbitai

coderabbitai bot commented Feb 6, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai

coderabbitai bot commented Feb 6, 2026

📝 Walkthrough


This pull request updates logging infrastructure across OpenTelemetry GenAI utilities by centralizing log configuration, bumps package versions, expands documentation with Deepeval and instrumentation examples, and modifies a test to align with the new logging setup.

Changes

| Cohort / File(s) | Summary |
|------------------|---------|
| **Documentation Updates**<br>`README.md` | Expanded development setup with environment variables, Deepeval configuration examples, instrumentation library installation instructions, and reorganized in-code instrumentation examples with updated aidefense usage paths. |
| **Logging Configuration**<br>`util/opentelemetry-util-genai/src/opentelemetry/util/genai/debug.py` | Introduces a centralized parent logger ("opentelemetry.util.genai") with a StreamHandler, dynamic level management (DEBUG when enabled, INFO otherwise), and propagation control to prevent duplicate logs. |
| **Bootstrap Integration**<br>`util/opentelemetry-util-genai-evals/src/opentelemetry/util/genai/evals/bootstrap.py` | Adds an import of the debug module to ensure parent logger configuration executes before logging calls in the evals bootstrap. |
| **Version Bumps**<br>`util/opentelemetry-util-genai-evals/src/opentelemetry/util/genai/evals/version.py`, `util/opentelemetry-util-genai/src/opentelemetry/util/genai/version.py` | Version increments: evals package from 0.1.7 to 0.1.8, genai package from 0.1.9 to 0.1.10. |
| **Changelog Entries**<br>`util/opentelemetry-util-genai-evals/CHANGELOG.md`, `util/opentelemetry-util-genai/CHANGELOG.md` | Added entries documenting fixes for log visibility: INFO logs always shown, DEBUG logs require OTEL_INSTRUMENTATION_GENAI_DEBUG=true. |
| **Test Updates**<br>`util/opentelemetry-util-genai/tests/test_upload_hook.py` | Modified assertLogs to target "opentelemetry.util.genai.upload_hook" specifically instead of any logger for more precise DEBUG capture. |
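The test refinement can be sketched with a minimal assertLogs pattern. The test body below is hypothetical; only the idea of naming the specific child logger (instead of capturing any logger) comes from this PR:

```python
import logging
import unittest

class UploadHookLogTest(unittest.TestCase):
    # Hypothetical test body; the real test in test_upload_hook.py asserts
    # on actual upload-hook behavior. The key idea is passing the specific
    # logger name to assertLogs rather than capturing records from any logger.
    def test_debug_log_captured(self):
        logger = logging.getLogger("opentelemetry.util.genai.upload_hook")
        with self.assertLogs(
            "opentelemetry.util.genai.upload_hook", level="DEBUG"
        ) as captured:
            logger.debug("uploading fixture payload")
        self.assertIn("uploading fixture payload", captured.output[0])

result = unittest.TextTestRunner(verbosity=0).run(
    unittest.defaultTestLoader.loadTestsFromTestCase(UploadHookLogTest)
)
```

Targeting the named logger keeps the assertion from passing accidentally because some unrelated module happened to log at DEBUG.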

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 Our logs now dance in harmony,
With parent logger's symphony,
INFO clear, DEBUG when you ask—
Organization's our beloved task!
From evals to genai they align,
Each version bumped, each changelog signed. ✨

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|------------|--------|-------------|------------|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Title check | ❓ Inconclusive | The title 'Fix/dev set up test' is vague and uses generic formatting (slashes, capitalization) that doesn't clearly convey the main changes: fixing logging configuration and development documentation. | Consider a more specific title like 'Fix genai debug logging and development setup documentation' to clearly communicate the primary changes. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
|------------|--------|-------------|
| Description Check | ✅ Passed | Check skipped because CodeRabbit's high-level summary is enabled. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches

  • 📝 Generate docstrings
  • 🧪 Generate unit tests (beta)
    • Create PR with unit tests
    • Post copyable unit tests in a comment
    • Commit unit tests in branch fix/dev-set-up-test

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@shuningc
Owner Author

shuningc commented Feb 6, 2026

@coderabbitai review

@coderabbitai

coderabbitai bot commented Feb 6, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
README.md (1)

354-371: ⚠️ Potential issue | 🟡 Minor

Fix typos and syntax in the in-code example.

Line 356 “Sudo-code” → “Pseudo-code”; Line 366 “actuall” → “actually”; Line 364 has mismatched parentheses and will not run as written.

✅ Suggested correction
```diff
-Sudo-code to create LLMInvocation for your in-code for an application:
+Pseudo-code to create LLMInvocation for your in-code for an application:

-inv = LLMInvocation(request_model="gpt-5-nano", input_messages=[InputMessage(role="user", parts=[Text(user_input))])], provider="openai")
+inv = LLMInvocation(
+    request_model="gpt-5-nano",
+    input_messages=[InputMessage(role="user", parts=[Text(user_input)])],
+    provider="openai",
+)

-# your code which actuall invokes llm here
+# your code which actually invokes llm here
```
🤖 Fix all issues with AI agents
In `@README.md`:
- Around line 283-317: Standardize the evaluator token casing used in examples
to match the parser's expected token (replace the uppercase "Deepeval(...)" with
the lowercase "deepeval(...)" and ensure all examples using
OTEL_INSTRUMENTATION_GENAI_EVALS_EVALUATORS consistently show "deepeval" e.g.,
change Deepeval(LLMInvocation(bias,toxicity)) to
deepeval(LLMInvocation(bias,toxicity)) so examples all use the same token casing
to avoid user confusion.
- Around line 335-342: In the README "Installing a Translator library" section,
fix the typos and wording: change the phrase "exiting 3rd party" to "existing
third‑party" and "traseloop" to "traceloop" (also ensure the example package
path uses "traceloop" in "opentelemetry-util-genai-traceloop-translator");
update the sentence containing "convert it to Splunk Distro semantic
conventions/run instrumentation-side evaluations" only if needed for clarity but
at minimum correct the two spelled words mentioned above.
🧹 Nitpick comments (1)
util/opentelemetry-util-genai/src/opentelemetry/util/genai/debug.py (1)

60-69: Consider setting _PARENT_LOGGER.propagate = False to prevent duplicate log output.

If the application (or a framework) has configured a handler on the root logger (e.g., via logging.basicConfig()), messages from any opentelemetry.util.genai.* child logger will be emitted twice: once by this parent handler and once by the root logger's handler, since propagate defaults to True.

Adding _PARENT_LOGGER.propagate = False would keep all GenAI log output self-contained and avoid duplicates in applications that configure root logging.

Suggested fix

```diff
 _PARENT_LOGGER = logging.getLogger("opentelemetry.util.genai")
 if not _PARENT_LOGGER.handlers:
     parent_handler = logging.StreamHandler()
     parent_fmt = logging.Formatter("[%(name)s] %(levelname)s: %(message)s")
     parent_handler.setFormatter(parent_fmt)
     _PARENT_LOGGER.addHandler(parent_handler)
     _PARENT_LOGGER.setLevel(logging.DEBUG if _ENABLED else logging.INFO)
+    _PARENT_LOGGER.propagate = False
```
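The duplication this suggestion prevents can be demonstrated in isolation (generic logger names below, not the package's actual code):

```python
import logging

# Demonstration of duplicate emission with propagation left on: a record
# logged on a child reaches both the parent's handler and any root handler.
records = []

class Capture(logging.Handler):
    def emit(self, record):
        records.append(record.getMessage())

# Simulate an application that configured a handler on the root logger.
logging.getLogger().addHandler(Capture())

parent = logging.getLogger("demo.parent")
parent.addHandler(Capture())
parent.setLevel(logging.INFO)

child = logging.getLogger("demo.parent.child")
child.info("hello")        # handled twice: parent handler, then root handler

parent.propagate = False   # the suggested fix: stop records at the parent
child.info("again")        # handled once by the parent handler only

assert records.count("hello") == 2
assert records.count("again") == 1
```

The trade-off is that with propagation off, application-level root handlers (file logs, structured logging) no longer see GenAI records, so this is a deliberate containment choice rather than a free fix.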

Comment on lines +283 to +317

```bash
# configure which GenAI types to evaluate and which evaluations
export OTEL_INSTRUMENTATION_GENAI_EVALS_EVALUATORS="Deepeval(LLMInvocation(bias,toxicity))"
# Deepeval optimization
export DEEPEVAL_FILE_SYSTEM=READ_ONLY
export DEEPEVAL_TELEMETRY_OPT_OUT=YES
# set environment and service names for ease of filtering
export OTEL_SERVICE_NAME=genai-eval-test
export OTEL_RESOURCE_ATTRIBUTES='deployment.environment=genai-dev'
```

For telemetry to properly work with Splunk Platform instrumentation, set the env var to enable Splunk format for aggregated evaluation results.

```bash
export OTEL_INSTRUMENTATION_GENAI_EVALS_RESULTS_AGGREGATION=true
export OTEL_INSTRUMENTATION_GENAI_EMITTERS_EVALUATION="replace-category:SplunkEvaluationResults"
```

### Deepeval evaluator integration configuration

Instrumentation-side evaluations can be configured using `OTEL_INSTRUMENTATION_GENAI_EVALS_EVALUATORS` environment variable

```bash
# uses defaults - evaluates LLMInvocation and AgentInvocation with 5 metrics:
# (bias,toxicity,answer_relevancy,hallucination,sentiment)
OTEL_INSTRUMENTATION_GENAI_EVALS_EVALUATORS="deepeval"

# Specific metrics for LLMInvocation
OTEL_INSTRUMENTATION_GENAI_EVALS_EVALUATORS="deepeval(LLMInvocation(bias,toxicity))"

# Multiple types with metrics
OTEL_INSTRUMENTATION_GENAI_EVALS_EVALUATORS="deepeval(LLMInvocation(bias,toxicity),AgentInvocation(hallucination))"

# With metric options
OTEL_INSTRUMENTATION_GENAI_EVALS_EVALUATORS="deepeval(LLMInvocation(hallucination(threshold=0.8)))"
```

⚠️ Potential issue | 🟡 Minor

Align Deepeval token casing to avoid config ambiguity.

Line 284 uses Deepeval(...) while Lines 307–316 use deepeval(...). If the evaluator grammar is case‑sensitive, this is a foot‑gun for users. Please standardize the token casing across examples (and ideally match the parser’s expected value).

🤖 Prompt for AI Agents
In `@README.md` around lines 283 - 317, Standardize the evaluator token casing
used in examples to match the parser's expected token (replace the uppercase
"Deepeval(...)" with the lowercase "deepeval(...)" and ensure all examples using
OTEL_INSTRUMENTATION_GENAI_EVALS_EVALUATORS consistently show "deepeval" e.g.,
change Deepeval(LLMInvocation(bias,toxicity)) to
deepeval(LLMInvocation(bias,toxicity)) so examples all use the same token casing
to avoid user confusion.
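As a rough illustration of why consistent token casing matters, the nested spec syntax above can be handled by a toy parser that splits on top-level commas. This sketch is entirely hypothetical; the real parser in the evals package is not shown in this PR and may behave differently (including in its case handling):

```python
# Toy parser for the nested evaluator-spec syntax, e.g.
# "deepeval(LLMInvocation(bias,toxicity))". Hypothetical illustration only.
def split_top(s: str) -> list[str]:
    """Split on commas that are not inside parentheses."""
    parts, depth, buf = [], 0, []
    for ch in s:
        if ch == "," and depth == 0:
            parts.append("".join(buf))
            buf = []
        else:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            buf.append(ch)
    if buf:
        parts.append("".join(buf))
    return parts

def parse(spec: str) -> dict:
    """Parse name(arg,arg(...)) into {"name": ..., "args": [...]}."""
    name, _, rest = spec.partition("(")
    if not rest:
        return {"name": spec, "args": []}
    return {"name": name, "args": [parse(p) for p in split_top(rest[:-1])]}

spec = parse("deepeval(LLMInvocation(bias,toxicity))")
```

If such a parser compares the leading token literally, `Deepeval(...)` and `deepeval(...)` parse to different evaluator names, which is exactly the ambiguity the comment flags.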

Comment on lines +335 to +342
### Installing a Translator library

To use exiting 3rd party instrumentations and convert it to Splunk Distro semantic conventions/run instrumentation-side evaluations you can install a translator library.

For example for existing traseloop instrumentations
```bash
pip install -e util/opentelemetry-util-genai-traceloop-translator --no-deps
```

⚠️ Potential issue | 🟡 Minor

Fix wording + spelling in translator section.

Line 337 “exiting 3rd party” → “existing third‑party” and Line 339 “traseloop” → “traceloop”.

💡 Suggested doc fix

```diff
-To use exiting 3rd party instrumentations and convert it to Splunk Distro semantic conventions/run instrumentation-side evaluations you can install a translator library.
+To use existing third-party instrumentations and convert them to Splunk Distro semantic conventions/run instrumentation-side evaluations you can install a translator library.

-For example for existing traseloop instrumentations
+For example for existing traceloop instrumentations
```
🧰 Tools
🪛 LanguageTool

[grammar] ~337-~337: Use a hyphen to join words.
Context: ...a Translator library To use exiting 3rd party instrumentations and convert it to...

(QB_NEW_EN_HYPHEN)


[grammar] ~339-~339: Ensure spelling is correct
Context: ...ator library. For example for existing traseloop instrumentations ```bash pip install -e...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🤖 Prompt for AI Agents
In `@README.md` around lines 335 - 342, In the README "Installing a Translator
library" section, fix the typos and wording: change the phrase "exiting 3rd
party" to "existing third‑party" and "traseloop" to "traceloop" (also ensure the
example package path uses "traceloop" in
"opentelemetry-util-genai-traceloop-translator"); update the sentence containing
"convert it to Splunk Distro semantic conventions/run instrumentation-side
evaluations" only if needed for clarity but at minimum correct the two spelled
words mentioned above.

