Add `MultiTurnSample` conversion path for RAGAS trajectory-aware agent metrics by yczhang-nv · Pull Request #1774 · NVIDIA/NeMo-Agent-Toolkit

yczhang-nv · 2026-03-09T23:32:46Z

Description

The RAGAS ATIF adapter only converted to SingleTurnSample, limiting evaluation to metrics like AnswerAccuracy. Trajectory-aware metrics (AgentGoalAccuracyWithoutReference, ToolCallAccuracy) require MultiTurnSample and previously failed with a validation error.

This PR adds a MultiTurnSample conversion path alongside the existing SingleTurnSample path, controlled by a new sample_type config field (single_turn | multi_turn).
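For readers skimming the thread, the dispatch described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `build_single_turn`, `build_multi_turn`, `to_ragas_sample`, and the dict-shaped samples are hypothetical stand-ins, and only the `sample_type` values (`single_turn`, `multi_turn`) and the single-turn default come from the description.

```python
from typing import Callable, Literal

# Assumed alias: the PR names the two allowed values, not this exact definition.
SampleType = Literal["single_turn", "multi_turn"]


def build_single_turn(sample: dict) -> dict:
    # Hypothetical stand-in for constructing a ragas SingleTurnSample.
    return {"kind": "SingleTurnSample", "user_input": sample["query"], "response": sample["answer"]}


def build_multi_turn(sample: dict) -> dict:
    # Hypothetical stand-in for constructing a ragas MultiTurnSample.
    return {"kind": "MultiTurnSample", "user_input": sample["messages"]}


# One builder per configured sample type.
_BUILDERS: dict[str, Callable[[dict], dict]] = {
    "single_turn": build_single_turn,
    "multi_turn": build_multi_turn,
}


def to_ragas_sample(sample: dict, sample_type: SampleType = "single_turn") -> dict:
    """Pick the conversion path from the configured sample type."""
    try:
        builder = _BUILDERS[sample_type]
    except KeyError:
        raise ValueError(f"Unsupported sample_type: {sample_type!r}") from None
    return builder(sample)
```

Keeping `single_turn` as the default means existing configurations that never set `sample_type` keep their current behavior.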

Closes

By Submitting this PR I confirm:

I am familiar with the Contributing Guidelines.
We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
- Any contribution which contains commits that are not Signed-Off will not be accepted.
When the PR is ready for review, new or existing tests cover these changes.
When the PR is ready for review, the documentation is up to date with these changes.

Summary by CodeRabbit

Documentation
- Added comprehensive guidance on multi-turn Ragas metrics for evaluating agent workflows with tool calls, including configuration examples and metric recommendations.
New Features
- Introduced multi-turn evaluation mode via new sample_type configuration option (defaults to single-turn for backward compatibility).
- Enabled assessment of tool call accuracy and agent goal metrics in multi-turn agent conversations.
Tests
- Added comprehensive test coverage for multi-turn evaluation scenarios and edge cases.


coderabbitai · 2026-03-09T23:33:04Z

Walkthrough

Adds multi-turn evaluation support to Ragas evaluators by introducing sample type configuration, ATIF trajectory-to-Ragas message conversion helpers, configuration validation requiring ATIF evaluator enablement for multi-turn mode, and comprehensive test coverage for single-turn and multi-turn conversions.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation: `docs/source/extend/custom-components/custom-evaluator.md`, `docs/source/improve-workflows/evaluate.md`, `examples/evaluation_and_profiling/simple_web_query_eval/atif-eval-readme.md` | Added guidance on multi-turn Ragas metrics with ATIF trajectories, including configuration examples, message type mappings (user→HumanMessage, agent→AIMessage with ToolCall, observations→ToolMessage), and multi-turn metric options (AgentGoalAccuracyWithoutReference, ToolCallAccuracy). |
| Core Implementation: `packages/nvidia_nat_ragas/src/nat/plugins/ragas/rag_evaluator/atif_evaluate.py` | Introduced SampleType alias and new sampling path with `_build_single_turn_sample` and `_build_multi_turn_sample` methods. Added helper functions (`_join_non_empty`, `_atif_step_to_ragas_messages`, `_atif_trajectory_to_multi_turn_messages`) to convert ATIF trajectories to Ragas message sequences. Extended `__init__` with a sample_type parameter and updated `atif_samples_to_ragas` to dispatch based on the configured sample type. |
| Configuration: `packages/nvidia_nat_ragas/src/nat/plugins/ragas/rag_evaluator/register.py` | Added sample_type field (Literal["single_turn", "multi_turn"]) to RagasEvaluatorConfig with validation requiring enable_atif_evaluator=True when sample_type is multi_turn. Wired sample_type through to RAGAtifEvaluator instantiation. |
| Tests: `packages/nvidia_nat_ragas/tests/test_rag_evaluate.py` | Added 9 new test cases covering multi-turn ATIF-to-MultiTurnSample conversion scenarios: basic conversion, no tools, empty trajectories, multiple parallel tool calls, observations without tool calls, reference_tool_calls propagation, default single-turn behavior, and configuration validation. |
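The configuration rule in the table (multi-turn mode requires the ATIF evaluator) could look something like the sketch below. It uses a plain dataclass so it runs without extra dependencies; `EvaluatorConfigSketch` is a hypothetical stand-in for `RagasEvaluatorConfig`, and the `False` default for `enable_atif_evaluator` is an assumption, not something the PR states.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass
class EvaluatorConfigSketch:
    """Hypothetical stand-in for RagasEvaluatorConfig with only the two fields named in the PR."""

    sample_type: Literal["single_turn", "multi_turn"] = "single_turn"
    enable_atif_evaluator: bool = False  # assumed default, for illustration only

    def __post_init__(self) -> None:
        # Literal is not enforced at runtime by dataclasses, so check explicitly.
        if self.sample_type not in ("single_turn", "multi_turn"):
            raise ValueError(f"Unsupported sample_type: {self.sample_type!r}")
        # Multi-turn samples are built from ATIF trajectories, so the ATIF path must be on.
        if self.sample_type == "multi_turn" and not self.enable_atif_evaluator:
            raise ValueError("sample_type='multi_turn' requires enable_atif_evaluator=True")
```

Failing at configuration time surfaces the mismatch before any evaluation run starts, rather than as a metric-level validation error later.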

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 77.27%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: adding MultiTurnSample conversion for RAGAS agent metrics, directly aligning with the PR's core objective. |



coderabbitai

🧹 Nitpick comments (1)

packages/nvidia_nat_ragas/src/nat/plugins/ragas/rag_evaluator/atif_evaluate.py (1)

144-159: Add return type hints to private methods.

Both _build_single_turn_sample and _build_multi_turn_sample lack return type annotations. Per coding guidelines, public APIs require type hints, and while these are private methods, adding type hints improves code clarity and IDE support.

♻️ Suggested type hints

-    def _build_single_turn_sample(self, sample: AtifEvalSample):
+    def _build_single_turn_sample(self, sample: AtifEvalSample) -> "SingleTurnSample":
         """Build a RAGAS SingleTurnSample from an ATIF eval sample."""
         from ragas import SingleTurnSample

-    def _build_multi_turn_sample(self, sample: AtifEvalSample):
+    def _build_multi_turn_sample(self, sample: AtifEvalSample) -> "MultiTurnSample":
         """Build a RAGAS MultiTurnSample from an ATIF eval sample."""
         from ragas import MultiTurnSample
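One way to satisfy this suggestion while keeping the lazy `ragas` imports inside the methods is a `TYPE_CHECKING` guard. The sketch below is only an illustration: `SampleBuilderSketch` and the placeholder bodies are hypothetical, and `Any` stands in for `AtifEvalSample`.

```python
from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
    # Seen only by static type checkers; ragas is not imported at module load.
    from ragas import MultiTurnSample, SingleTurnSample


class SampleBuilderSketch:
    """Hypothetical stand-in for the evaluator class; method bodies are placeholders."""

    def _build_single_turn_sample(self, sample: Any) -> "SingleTurnSample":
        """Build a RAGAS SingleTurnSample from an ATIF eval sample."""
        raise NotImplementedError("placeholder body for illustration")

    def _build_multi_turn_sample(self, sample: Any) -> "MultiTurnSample":
        """Build a RAGAS MultiTurnSample from an ATIF eval sample."""
        raise NotImplementedError("placeholder body for illustration")
```

The quoted annotations are never evaluated at runtime, so the module imports cleanly even when `ragas` is loaded lazily.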

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In
`@packages/nvidia_nat_ragas/src/nat/plugins/ragas/rag_evaluator/atif_evaluate.py`
around lines 144 - 159, Add explicit return type annotations to the private
builder methods: annotate _build_single_turn_sample to return
ragas.SingleTurnSample and annotate _build_multi_turn_sample to return
ragas.MultiTurnSample (or the concrete types imported from ragas if already
imported). Ensure the types are importable in the module (add "from ragas import
SingleTurnSample, MultiTurnSample" or qualify with ragas.SingleTurnSample) and
update any typing imports (e.g., from typing import Optional) if required by
your editor/linters so IDEs and static checkers recognize the return types.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a1f107d5-d7f1-4a0d-a301-64de77bca6c6

📥 Commits

Reviewing files that changed from the base of the PR and between 0aeb76d and 051bf74.

📒 Files selected for processing (6)

docs/source/extend/custom-components/custom-evaluator.md
docs/source/improve-workflows/evaluate.md
examples/evaluation_and_profiling/simple_web_query_eval/atif-eval-readme.md
packages/nvidia_nat_ragas/src/nat/plugins/ragas/rag_evaluator/atif_evaluate.py
packages/nvidia_nat_ragas/src/nat/plugins/ragas/rag_evaluator/register.py
packages/nvidia_nat_ragas/tests/test_rag_evaluate.py

willkill07 · 2026-03-10T14:37:02Z

packages/nvidia_nat_ragas/src/nat/plugins/ragas/rag_evaluator/atif_evaluate.py

+    Mapping:
+        source="user"  → HumanMessage
+        source="agent" → AIMessage (with optional ToolCall list)
+                         followed by ToolMessage per observation result
+        source="system" → HumanMessage (best-effort; RAGAS has no SystemMessage)


I believe this should fix the documentation build failure

Suggested change

-    Mapping:
-        source="user" → HumanMessage
-        source="agent" → AIMessage (with optional ToolCall list)
-                         followed by ToolMessage per observation result
-        source="system" → HumanMessage (best-effort; RAGAS has no SystemMessage)
+    Mapping:
+        * source="user" → HumanMessage
+        * source="agent" → AIMessage (with optional ToolCall list)
+          followed by ToolMessage per observation result
+        * source="system" → HumanMessage (best-effort; RAGAS has no SystemMessage)
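To make the mapping in this docstring concrete, here is a rough sketch of the conversion. It is not the PR's `_atif_step_to_ragas_messages`: the message classes are local stand-ins for the `ragas.messages` types, the step dictionary keys (`source`, `message`, `tool_calls`, `function_name`, `arguments`, `observation`, `results`, `content`) are an assumed ATIF shape, and skipping empty user or system text is an illustrative choice.

```python
from dataclasses import dataclass, field
from typing import Any


# Local stand-ins for the ragas message types, so the sketch runs without ragas installed.
@dataclass
class ToolCall:
    name: str
    args: dict[str, Any]


@dataclass
class HumanMessage:
    content: str


@dataclass
class AIMessage:
    content: str
    tool_calls: list[ToolCall] = field(default_factory=list)


@dataclass
class ToolMessage:
    content: str


def step_to_messages(step: dict[str, Any]) -> list[Any]:
    """Convert one step (assumed dict shape) into zero or more messages."""
    source = step.get("source")
    text = step.get("message") or ""
    if source in ("user", "system"):
        # System steps are mapped to HumanMessage on a best-effort basis.
        return [HumanMessage(text)] if text else []
    if source == "agent":
        calls = [
            ToolCall(call["function_name"], call.get("arguments", {}))
            for call in step.get("tool_calls") or []
        ]
        messages: list[Any] = [AIMessage(text, calls)]
        # One ToolMessage per observation result, placed right after the AIMessage.
        results = (step.get("observation") or {}).get("results") or []
        messages.extend(ToolMessage(str(result.get("content", ""))) for result in results)
        return messages
    return []


def trajectory_to_messages(steps: list[dict[str, Any]]) -> list[Any]:
    """Flatten a whole trajectory into a single ordered message list."""
    return [message for step in steps for message in step_to_messages(step)]
```

Under these assumptions, a user step followed by an agent step with one tool call and one observation result yields a HumanMessage, an AIMessage carrying the ToolCall, and then a ToolMessage, in that order.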

AnuradhaKaruppiah · 2026-03-11T15:57:04Z

@yczhang-nv thx for making these changes, I will be taking this over for integration with the new ragas version

Add MultiTurnSample conversion path for RAGAS trajectory-aware agent …

5dfd9a3

…metrics Signed-off-by: Yuchen Zhang <yuchenz@nvidia.com>

yczhang-nv self-assigned this Mar 9, 2026

yczhang-nv added the `feature request` (New feature or request) and `non-breaking` (Non-breaking change) labels Mar 9, 2026

yczhang-nv marked this pull request as ready for review March 9, 2026 23:32

yczhang-nv requested a review from a team as a code owner March 9, 2026 23:32

fix CI

051bf74

Signed-off-by: Yuchen Zhang <yuchenz@nvidia.com>

coderabbitai bot reviewed Mar 9, 2026


yczhang-nv changed the title ~~Add MultiTurnSample conversion path for RAGAS trajectory-aware agent metrics~~ Add MultiTurnSample conversion path for RAGAS trajectory-aware agent metrics Mar 9, 2026

willkill07 approved these changes Mar 10, 2026


AnuradhaKaruppiah added the `DO NOT MERGE` label (PR should not be merged; see PR for details) Mar 11, 2026

yczhang-nv wants to merge 2 commits into NVIDIA:develop from yczhang-nv:yuchen-impl-multi-turn-sample

3 participants
