Add MultiTurnSample conversion path for RAGAS trajectory-aware agent metrics #1774
yczhang-nv wants to merge 2 commits into NVIDIA:develop
…metrics Signed-off-by: Yuchen Zhang <yuchenz@nvidia.com>
Walkthrough
Adds multi-turn evaluation support to Ragas evaluators by introducing sample type configuration, ATIF trajectory-to-Ragas message conversion helpers, configuration validation requiring ATIF evaluator enablement for multi-turn mode, and comprehensive test coverage for single-turn and multi-turn conversions.
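The validation rule in the walkthrough can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code in `register.py`: the class name `RagasEvaluatorSettings` and the flag `enable_atif_evaluator` are hypothetical, and only the `sample_type` field and its two values come from the PR description.

```python
from dataclasses import dataclass
from enum import Enum


class SampleType(str, Enum):
    """The two sample types named in the PR description."""
    SINGLE_TURN = "single_turn"
    MULTI_TURN = "multi_turn"


@dataclass
class RagasEvaluatorSettings:  # hypothetical name, for illustration only
    metric: str
    # Defaults to single-turn so existing configs keep their behavior.
    sample_type: SampleType = SampleType.SINGLE_TURN
    # Hypothetical flag standing in for "ATIF evaluator enabled".
    enable_atif_evaluator: bool = False

    def __post_init__(self) -> None:
        # Accept plain strings from a config file and normalize to the enum;
        # an unknown value raises ValueError here.
        self.sample_type = SampleType(self.sample_type)
        # Multi-turn samples are built from ATIF trajectories, so reject
        # multi-turn mode when the ATIF path is disabled.
        if self.sample_type is SampleType.MULTI_TURN and not self.enable_atif_evaluator:
            raise ValueError("sample_type='multi_turn' requires the ATIF evaluator to be enabled")
```

Failing at configuration time surfaces the mismatch before any evaluation run starts, rather than as a sample validation error partway through.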
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
🧹 Nitpick comments (1)
packages/nvidia_nat_ragas/src/nat/plugins/ragas/rag_evaluator/atif_evaluate.py (1)
144-159: Add return type hints to private methods.

Both `_build_single_turn_sample` and `_build_multi_turn_sample` lack return type annotations. Per coding guidelines, public APIs require type hints, and while these are private methods, adding type hints improves code clarity and IDE support.

♻️ Suggested type hints

```diff
-    def _build_single_turn_sample(self, sample: AtifEvalSample):
+    def _build_single_turn_sample(self, sample: AtifEvalSample) -> "SingleTurnSample":
         """Build a RAGAS SingleTurnSample from an ATIF eval sample."""
         from ragas import SingleTurnSample
```

```diff
-    def _build_multi_turn_sample(self, sample: AtifEvalSample):
+    def _build_multi_turn_sample(self, sample: AtifEvalSample) -> "MultiTurnSample":
         """Build a RAGAS MultiTurnSample from an ATIF eval sample."""
         from ragas import MultiTurnSample
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/nvidia_nat_ragas/src/nat/plugins/ragas/rag_evaluator/atif_evaluate.py` around lines 144 - 159, Add explicit return type annotations to the private builder methods: annotate _build_single_turn_sample to return ragas.SingleTurnSample and annotate _build_multi_turn_sample to return ragas.MultiTurnSample (or the concrete types imported from ragas if already imported). Ensure the types are importable in the module (add "from ragas import SingleTurnSample, MultiTurnSample" or qualify with ragas.SingleTurnSample) and update any typing imports (e.g., from typing import Optional) if required by your editor/linters so IDEs and static checkers recognize the return types.
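One way to reconcile the suggested annotations with the lazy `from ragas import ...` inside each method is a `TYPE_CHECKING` guard, sketched below. This is an assumption about how the change could look, not the merged code: `AtifSampleBuilder`, the `Any`-typed `sample` parameter, and the attributes read from it are hypothetical stand-ins for the evaluator class and `AtifEvalSample`.

```python
from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
    # Seen only by static checkers and IDEs; ragas is not imported
    # when this module is loaded at runtime.
    from ragas import MultiTurnSample, SingleTurnSample


class AtifSampleBuilder:  # hypothetical stand-in for the evaluator class
    def _build_single_turn_sample(self, sample: Any) -> "SingleTurnSample":
        """Build a RAGAS SingleTurnSample from an ATIF eval sample."""
        from ragas import SingleTurnSample  # lazy runtime import, as in the PR

        # `sample.user_input` and `sample.response` are assumed attribute names.
        return SingleTurnSample(user_input=sample.user_input, response=sample.response)

    def _build_multi_turn_sample(self, sample: Any) -> "MultiTurnSample":
        """Build a RAGAS MultiTurnSample from an ATIF eval sample."""
        from ragas import MultiTurnSample  # lazy runtime import, as in the PR

        # `sample.messages` is an assumed attribute holding RAGAS message objects.
        return MultiTurnSample(user_input=sample.messages)
```

The quoted return types keep the annotations unevaluated at definition time, so the module still imports cleanly in environments where ragas is not installed.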
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a1f107d5-d7f1-4a0d-a301-64de77bca6c6
📒 Files selected for processing (6)
- docs/source/extend/custom-components/custom-evaluator.md
- docs/source/improve-workflows/evaluate.md
- examples/evaluation_and_profiling/simple_web_query_eval/atif-eval-readme.md
- packages/nvidia_nat_ragas/src/nat/plugins/ragas/rag_evaluator/atif_evaluate.py
- packages/nvidia_nat_ragas/src/nat/plugins/ragas/rag_evaluator/register.py
- packages/nvidia_nat_ragas/tests/test_rag_evaluate.py
```
Mapping:
    source="user" → HumanMessage
    source="agent" → AIMessage (with optional ToolCall list)
        followed by ToolMessage per observation result
    source="system" → HumanMessage (best-effort; RAGAS has no SystemMessage)
```
I believe this should fix the documentation build failure:
```diff
-Mapping:
-    source="user" → HumanMessage
-    source="agent" → AIMessage (with optional ToolCall list)
-        followed by ToolMessage per observation result
-    source="system" → HumanMessage (best-effort; RAGAS has no SystemMessage)
+Mapping:
+    * source="user" → HumanMessage
+    * source="agent" → AIMessage (with optional ToolCall list)
+      followed by ToolMessage per observation result
+    * source="system" → HumanMessage (best-effort; RAGAS has no SystemMessage)
```
@yczhang-nv thx for making these changes, I will be taking this over for integration with the new ragas version
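For readers following the mapping discussed in this thread, here is a rough sketch of the conversion it describes. It is an illustration only, not the helper in `atif_evaluate.py`: the message classes are local stand-ins that mirror the names in `ragas.messages` so the snippet runs without ragas installed, and the step keys (`source`, `message`, `tool_calls`, `function_name`, `arguments`, `observation`, `results`, `content`) are an assumed ATIF step layout.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Union


# Local stand-ins mirroring ragas.messages; swap in the real classes in practice.
@dataclass
class ToolCall:
    name: str
    args: Dict[str, Any]


@dataclass
class HumanMessage:
    content: str


@dataclass
class AIMessage:
    content: str
    tool_calls: Optional[List[ToolCall]] = None


@dataclass
class ToolMessage:
    content: str


Message = Union[HumanMessage, AIMessage, ToolMessage]


def steps_to_messages(steps: List[Dict[str, Any]]) -> List[Message]:
    """Convert ATIF-style steps (assumed dict layout) into a flat message list."""
    messages: List[Message] = []
    for step in steps:
        source = step.get("source")
        text = step.get("message") or ""
        if source in ("user", "system"):
            # System steps map to HumanMessage on a best-effort basis,
            # since RAGAS has no SystemMessage.
            messages.append(HumanMessage(content=text))
        elif source == "agent":
            calls = [
                ToolCall(name=call["function_name"], args=call.get("arguments") or {})
                for call in step.get("tool_calls") or []
            ]
            messages.append(AIMessage(content=text, tool_calls=calls or None))
            # Each observation result becomes its own ToolMessage,
            # placed directly after the AIMessage that issued the calls.
            for result in (step.get("observation") or {}).get("results") or []:
                messages.append(ToolMessage(content=str(result.get("content", ""))))
    return messages
```

Steps with any other `source` value are skipped in this sketch; whether the real adapter skips, warns, or raises in that case is not stated in the PR.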
Description
The RAGAS ATIF adapter only converted to `SingleTurnSample`, limiting evaluation to metrics like `AnswerAccuracy`. Trajectory-aware metrics (`AgentGoalAccuracyWithoutReference`, `ToolCallAccuracy`) require `MultiTurnSample` and previously failed with a validation error.

This PR adds a `MultiTurnSample` conversion path alongside the existing `SingleTurnSample` path, controlled by a new `sample_type` config field (`single_turn` | `multi_turn`).

Closes
By Submitting this PR I confirm:
Summary by CodeRabbit
Documentation
New Features
`sample_type` configuration option (defaults to single-turn for backward compatibility).
Tests