Add MultiTurnSample conversion path for RAGAS trajectory-aware agent metrics #1774

Open
yczhang-nv wants to merge 2 commits into NVIDIA:develop from yczhang-nv:yuchen-impl-multi-turn-sample

Conversation

yczhang-nv (Contributor) commented Mar 9, 2026

Description

Previously, the RAGAS ATIF adapter converted samples only to SingleTurnSample, which limited evaluation to metrics such as AnswerAccuracy. Trajectory-aware metrics (AgentGoalAccuracyWithoutReference, ToolCallAccuracy) require MultiTurnSample, so they failed with a validation error.

This PR adds a MultiTurnSample conversion path alongside the existing SingleTurnSample path, controlled by a new sample_type config field (single_turn | multi_turn).
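
As a rough illustration of how a `sample_type` switch can select between the two conversion paths, here is a minimal sketch. It is not the PR's code: the builder functions, the `BUILDERS` table, and the dictionary shapes are hypothetical stand-ins for the real `SingleTurnSample` and `MultiTurnSample` builders, and only the two `sample_type` values and the single-turn default come from this PR.

```python
from typing import Any, Callable, Dict, List

# Hypothetical stand-in for an ATIF eval sample; the field names are assumptions.
Sample = Dict[str, Any]


def build_single_turn(sample: Sample) -> Dict[str, Any]:
    """Stand-in for building a SingleTurnSample (question/answer only)."""
    return {"kind": "single_turn", "user_input": sample["question"], "response": sample["answer"]}


def build_multi_turn(sample: Sample) -> Dict[str, Any]:
    """Stand-in for building a MultiTurnSample (full message list)."""
    return {"kind": "multi_turn", "user_input": list(sample["messages"])}


# One builder per supported sample_type value.
BUILDERS: Dict[str, Callable[[Sample], Dict[str, Any]]] = {
    "single_turn": build_single_turn,
    "multi_turn": build_multi_turn,
}


def convert_samples(samples: List[Sample], sample_type: str = "single_turn") -> List[Dict[str, Any]]:
    """Dispatch every sample to the builder selected by sample_type."""
    try:
        builder = BUILDERS[sample_type]
    except KeyError:
        raise ValueError(f"unsupported sample_type: {sample_type!r}") from None
    return [builder(sample) for sample in samples]
```

Keeping `single_turn` as the default in this sketch mirrors the backward-compatible default described for this PR.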

Closes

By Submitting this PR I confirm:

  • I am familiar with the Contributing Guidelines.
  • We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
    • Any contribution which contains commits that are not Signed-Off will not be accepted.
  • When the PR is ready for review, new or existing tests cover these changes.
  • When the PR is ready for review, the documentation is up to date with these changes.

Summary by CodeRabbit

  • Documentation

    • Added comprehensive guidance on multi-turn Ragas metrics for evaluating agent workflows with tool calls, including configuration examples and metric recommendations.
  • New Features

    • Introduced multi-turn evaluation mode via new sample_type configuration option (defaults to single-turn for backward compatibility).
    • Enabled assessment of tool call accuracy and agent goal metrics in multi-turn agent conversations.
  • Tests

    • Added comprehensive test coverage for multi-turn evaluation scenarios and edge cases.

…metrics

Signed-off-by: Yuchen Zhang <yuchenz@nvidia.com>
yczhang-nv self-assigned this Mar 9, 2026
yczhang-nv added the "feature request" and "non-breaking" labels Mar 9, 2026
yczhang-nv marked this pull request as ready for review March 9, 2026 23:32
yczhang-nv requested a review from a team as a code owner March 9, 2026 23:32
coderabbitai bot commented Mar 9, 2026

Walkthrough

Adds multi-turn evaluation support to Ragas evaluators by introducing sample type configuration, ATIF trajectory-to-Ragas message conversion helpers, configuration validation requiring ATIF evaluator enablement for multi-turn mode, and comprehensive test coverage for single-turn and multi-turn conversions.
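
The configuration rule mentioned above (multi-turn mode requires the ATIF evaluator to be enabled) could be sketched roughly as follows. This is an illustration only: `EvaluatorConfigSketch` is a hypothetical plain-dataclass stand-in for `RagasEvaluatorConfig`, whose real base class and validator mechanism are not shown in this thread, and the `False` default for `enable_atif_evaluator` is an assumption.

```python
from dataclasses import dataclass
from typing import Literal

SampleType = Literal["single_turn", "multi_turn"]


@dataclass
class EvaluatorConfigSketch:
    """Hypothetical stand-in for RagasEvaluatorConfig (not the real class)."""

    sample_type: SampleType = "single_turn"  # single-turn default, per the PR
    enable_atif_evaluator: bool = False      # default value is an assumption

    def __post_init__(self) -> None:
        # Dataclasses do not enforce Literal at runtime, so check explicitly.
        if self.sample_type not in ("single_turn", "multi_turn"):
            raise ValueError(f"unsupported sample_type: {self.sample_type!r}")
        # Multi-turn samples are built from ATIF trajectories, so the ATIF
        # evaluator has to be enabled for that mode.
        if self.sample_type == "multi_turn" and not self.enable_atif_evaluator:
            raise ValueError("sample_type='multi_turn' requires enable_atif_evaluator=True")
```

Failing at configuration time rather than during evaluation surfaces the misconfiguration before any samples are converted.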

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation: docs/source/extend/custom-components/custom-evaluator.md, docs/source/improve-workflows/evaluate.md, examples/evaluation_and_profiling/simple_web_query_eval/atif-eval-readme.md | Added guidance on multi-turn Ragas metrics with ATIF trajectories, including configuration examples, message type mappings (user → HumanMessage, agent → AIMessage with ToolCall, observations → ToolMessage), and multi-turn metric options (AgentGoalAccuracyWithoutReference, ToolCallAccuracy). |
| Core Implementation: packages/nvidia_nat_ragas/src/nat/plugins/ragas/rag_evaluator/atif_evaluate.py | Introduced SampleType alias and new sampling path with _build_single_turn_sample and _build_multi_turn_sample methods. Added helper functions (_join_non_empty, _atif_step_to_ragas_messages, _atif_trajectory_to_multi_turn_messages) to convert ATIF trajectories to Ragas message sequences. Extended __init__ with sample_type parameter and updated atif_samples_to_ragas to dispatch based on configured sample type. |
| Configuration: packages/nvidia_nat_ragas/src/nat/plugins/ragas/rag_evaluator/register.py | Added sample_type field (Literal["single_turn", "multi_turn"]) to RagasEvaluatorConfig with validation requiring enable_atif_evaluator=True when sample_type is multi_turn. Wired sample_type through to RAGAtifEvaluator instantiation. |
| Tests: packages/nvidia_nat_ragas/tests/test_rag_evaluate.py | Added 9 new test cases covering multi-turn ATIF-to-MultiTurnSample conversion scenarios: basic conversion, no tools, empty trajectories, multiple parallel tool calls, observations without tool calls, reference_tool_calls propagation, default single-turn behavior, and configuration validation. |
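
The message mapping summarized in the table (user → HumanMessage, agent → AIMessage with tool calls, one ToolMessage per observation) can be sketched as below. This is a self-contained approximation, not the PR's helpers: the message classes are local stand-ins named after the Ragas message types, and the step dictionary keys other than `source` (`message`, `tool_calls`, `observations`) are assumed for illustration rather than taken from the ATIF schema.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


# Local stand-ins named after the Ragas message types (not imported from ragas).
@dataclass
class HumanMessage:
    content: str


@dataclass
class ToolCall:
    name: str
    args: Dict[str, Any]


@dataclass
class AIMessage:
    content: str
    tool_calls: List[ToolCall] = field(default_factory=list)


@dataclass
class ToolMessage:
    content: str


def step_to_messages(step: Dict[str, Any]) -> List[Any]:
    """Convert one trajectory step (hypothetical dict shape) into messages."""
    source = step.get("source")
    text = step.get("message") or ""
    if source in ("user", "system"):
        # System steps are folded into HumanMessage on a best-effort basis.
        return [HumanMessage(content=text)]
    if source == "agent":
        calls = [ToolCall(name=c["name"], args=c.get("args", {})) for c in step.get("tool_calls", [])]
        messages: List[Any] = [AIMessage(content=text, tool_calls=calls)]
        # Each observation result becomes its own ToolMessage after the AIMessage.
        messages.extend(ToolMessage(content=str(obs)) for obs in step.get("observations", []))
        return messages
    return []  # unknown sources are skipped in this sketch


def trajectory_to_messages(steps: List[Dict[str, Any]]) -> List[Any]:
    """Flatten a whole trajectory into one ordered message list."""
    messages: List[Any] = []
    for step in steps:
        messages.extend(step_to_messages(step))
    return messages
```

The flattened list is the kind of ordered conversation that a multi-turn sample takes as its `user_input`.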

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 77.27% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: adding MultiTurnSample conversion for RAGAS agent metrics, directly aligning with the PR's core objective. |

Signed-off-by: Yuchen Zhang <yuchenz@nvidia.com>
coderabbitai bot left a comment

🧹 Nitpick comments (1)
packages/nvidia_nat_ragas/src/nat/plugins/ragas/rag_evaluator/atif_evaluate.py (1)

144-159: Add return type hints to private methods.

Both _build_single_turn_sample and _build_multi_turn_sample lack return type annotations. Per coding guidelines, public APIs require type hints, and while these are private methods, adding type hints improves code clarity and IDE support.

♻️ Suggested type hints
-    def _build_single_turn_sample(self, sample: AtifEvalSample):
+    def _build_single_turn_sample(self, sample: AtifEvalSample) -> "SingleTurnSample":
         """Build a RAGAS SingleTurnSample from an ATIF eval sample."""
         from ragas import SingleTurnSample

-    def _build_multi_turn_sample(self, sample: AtifEvalSample):
+    def _build_multi_turn_sample(self, sample: AtifEvalSample) -> "MultiTurnSample":
         """Build a RAGAS MultiTurnSample from an ATIF eval sample."""
         from ragas import MultiTurnSample
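
One way to keep the lazy `from ragas import ...` inside each method while still letting static checkers resolve the quoted return annotations is a `TYPE_CHECKING` guard. The sketch below assumes that approach; `SampleBuilderSketch` and its simplified method signatures are hypothetical and do not reproduce the PR's `AtifEvalSample`-based methods.

```python
from typing import TYPE_CHECKING, Any, List

if TYPE_CHECKING:
    # Seen only by type checkers; ragas is not imported here at runtime.
    from ragas import MultiTurnSample, SingleTurnSample


class SampleBuilderSketch:
    """Hypothetical stand-in for the evaluator class that owns the builders."""

    def build_single_turn_sample(self, user_input: str, response: str) -> "SingleTurnSample":
        """Build a RAGAS SingleTurnSample from a question and an answer."""
        from ragas import SingleTurnSample  # deferred runtime import

        return SingleTurnSample(user_input=user_input, response=response)

    def build_multi_turn_sample(self, messages: List[Any]) -> "MultiTurnSample":
        """Build a RAGAS MultiTurnSample from an ordered message list."""
        from ragas import MultiTurnSample  # deferred runtime import

        return MultiTurnSample(user_input=messages)
```

Because the annotations are strings, defining the class does not require ragas to be installed; the import cost is paid only when a builder is actually called.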
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/nvidia_nat_ragas/src/nat/plugins/ragas/rag_evaluator/atif_evaluate.py`
around lines 144 - 159, Add explicit return type annotations to the private
builder methods: annotate _build_single_turn_sample to return
ragas.SingleTurnSample and annotate _build_multi_turn_sample to return
ragas.MultiTurnSample (or the concrete types imported from ragas if already
imported). Ensure the types are importable in the module (add "from ragas import
SingleTurnSample, MultiTurnSample" or qualify with ragas.SingleTurnSample) and
update any typing imports (e.g., from typing import Optional) if required by
your editor/linters so IDEs and static checkers recognize the return types.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a1f107d5-d7f1-4a0d-a301-64de77bca6c6

📥 Commits

Reviewing files that changed from the base of the PR and between 0aeb76d and 051bf74.

📒 Files selected for processing (6)
  • docs/source/extend/custom-components/custom-evaluator.md
  • docs/source/improve-workflows/evaluate.md
  • examples/evaluation_and_profiling/simple_web_query_eval/atif-eval-readme.md
  • packages/nvidia_nat_ragas/src/nat/plugins/ragas/rag_evaluator/atif_evaluate.py
  • packages/nvidia_nat_ragas/src/nat/plugins/ragas/rag_evaluator/register.py
  • packages/nvidia_nat_ragas/tests/test_rag_evaluate.py

@yczhang-nv yczhang-nv changed the title Add MultiTurnSample conversion path for RAGAS trajectory-aware agent metrics Add MultiTurnSample conversion path for RAGAS trajectory-aware agent metrics Mar 9, 2026
Comment on lines +71 to +75
Mapping:
source="user" → HumanMessage
source="agent" → AIMessage (with optional ToolCall list)
followed by ToolMessage per observation result
source="system" → HumanMessage (best-effort; RAGAS has no SystemMessage)
Member

I believe this should fix the documentation build failure

Suggested change
- Mapping:
- source="user" → HumanMessage
- source="agent" → AIMessage (with optional ToolCall list)
- followed by ToolMessage per observation result
- source="system" → HumanMessage (best-effort; RAGAS has no SystemMessage)
+ Mapping:
+ * source="user" → HumanMessage
+ * source="agent" → AIMessage (with optional ToolCall list)
+   followed by ToolMessage per observation result
+ * source="system" → HumanMessage (best-effort; RAGAS has no SystemMessage)

AnuradhaKaruppiah added the "DO NOT MERGE" label (PR should not be merged; see PR for details) Mar 11, 2026
@AnuradhaKaruppiah
Contributor

@yczhang-nv thx for making these changes. I will be taking this over for integration with the new ragas version.

Labels

  • DO NOT MERGE: PR should not be merged; see PR for details
  • feature request: New feature or request
  • non-breaking: Non-breaking change

3 participants