Eval samples test (3): Add inline-data evaluation samples #44950

Merged
aprilk-ms merged 4 commits into main from aprilk/eval-sample-test-3
Feb 3, 2026

Conversation

@aprilk-ms (Member)

Summary

Add 3 new inline-data evaluation samples and include them in the sample recording tests to increase test coverage.

Changes

  • Add sample_evaluations_graders_with_inline_data.py - Tests label_model, text_similarity, string_check, and score_model graders
  • Add sample_evaluations_ai_assisted_with_inline_data.py - Tests Similarity, ROUGE, METEOR, GLEU, F1, and BLEU evaluators
  • Add sample_evaluation_cluster_insight_with_inline_data.py - Tests cluster insight generation
  • Update test_samples_evaluations.py to include the 3 new samples (coverage: 25 → 28 samples)
  • Update assets.json with new recordings

Why inline-data versions?

The original samples (sample_evaluations_graders.py, sample_evaluations_ai_assisted.py, sample_evaluation_cluster_insight.py) require file uploads to Azure Blob Storage, and those uploads are incompatible with test proxy playback. These inline-data versions test the same evaluation functionality without file upload dependencies.
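
To illustrate the difference, here is a minimal sketch of the two data-source shapes, written as plain dictionaries rather than SDK types. The field names (`jsonl`, `file_id`, `file_content`, `item`) are assumed to follow the JSONL run data source shape of the OpenAI Evals API; the helper names, the file id, and the sample rows are hypothetical and are not taken from the samples in this PR.

```python
from typing import Any


def file_data_source(file_id: str) -> dict[str, Any]:
    """Data source that points at a previously uploaded dataset file."""
    return {"type": "jsonl", "source": {"type": "file_id", "id": file_id}}


def inline_data_source(rows: list[dict[str, Any]]) -> dict[str, Any]:
    """Data source that embeds the rows directly in the request body."""
    return {
        "type": "jsonl",
        "source": {
            "type": "file_content",
            # Each row is wrapped in an {"item": ...} envelope.
            "content": [{"item": row} for row in rows],
        },
    }


# Hypothetical evaluation rows, for illustration only.
rows = [
    {"query": "What is the capital of France?", "response": "Paris"},
    {"query": "What is 2 + 2?", "response": "4"},
]

uploaded = file_data_source("hypothetical-file-id")
inline = inline_data_source(rows)
```

The inline form needs no upload step before the run is created, so a recorded session only has to replay the evaluation requests themselves.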

Testing

  • ✅ Live recording completed
  • ✅ Playback tests pass (all 3 new samples)

@aprilk-ms aprilk-ms changed the title from "Add inline-data evaluation samples to increase test coverage" to "Eval samples test (3): Add inline-data evaluation samples" Feb 1, 2026
@aprilk-ms aprilk-ms force-pushed the aprilk/eval-sample-test-3 branch from 7d31678 to 7626e53 February 1, 2026 07:47
- Update sample_evaluations_graders.py to use inline data instead of file upload
- Update sample_evaluations_ai_assisted.py to use inline data instead of file upload
- Update sample_evaluation_cluster_insight.py to use inline data instead of file upload
- Add samples/evaluations/README.md as an index for all evaluation samples
- Update test_samples_evaluations.py to test the updated samples
- Remove unnecessary inline-data sample duplicates
- Update assets.json with new recordings
@aprilk-ms aprilk-ms force-pushed the aprilk/eval-sample-test-3 branch from 7626e53 to 883fd24 February 2, 2026 06:25
@aprilk-ms aprilk-ms marked this pull request as ready for review February 2, 2026 18:20
Copilot AI review requested due to automatic review settings February 2, 2026 18:20
Copilot AI (Contributor) left a comment

Pull request overview

This PR modifies three existing evaluation samples to use inline data instead of file uploads to Azure Blob Storage, enabling them to be included in automated test coverage with test proxy playback.

Changes:

  • Converted sample_evaluations_graders.py, sample_evaluations_ai_assisted.py, and sample_evaluation_cluster_insight.py to use inline data instead of dataset file uploads
  • Added these three samples to the test suite (increasing test coverage from 25 to 28 samples)
  • Added comprehensive README.md documentation for the evaluations samples folder
  • Updated test recordings in assets.json
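
The coverage change in the second bullet can be pictured with a small sketch. The actual selection mechanism in test_samples_evaluations.py is not shown in this conversation, so the `select_samples` helper and the placeholder file names below are hypothetical; only the three converted sample names and the 25 → 28 counts come from the PR.

```python
def select_samples(all_samples: list[str], excluded: set[str]) -> list[str]:
    """Return the sample files to run, skipping any that are excluded."""
    return sorted(name for name in all_samples if name not in excluded)


# The three samples converted to inline data in this PR.
converted = [
    "sample_evaluations_graders.py",
    "sample_evaluations_ai_assisted.py",
    "sample_evaluation_cluster_insight.py",
]

# Placeholder names standing in for the 25 samples that were already tested.
already_tested = [f"sample_placeholder_{i:02d}.py" for i in range(25)]
all_samples = already_tested + converted

# Before: the upload-based samples are excluded from playback tests.
excluded_before = set(converted)
# After: the converted samples no longer need to be excluded.
excluded_after = excluded_before - set(converted)

count_before = len(select_samples(all_samples, excluded_before))
count_after = len(select_samples(all_samples, excluded_after))
```

Under these assumptions, `count_before` is 25 and `count_after` is 28.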

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| test_samples_evaluations.py | Updated docstring to reflect new sample count (25→28) and added three samples to test list; removed these samples from the excluded list |
| sample_evaluations_graders.py | Removed dataset upload code and DatasetVersion import; converted to use SourceFileContent with inline data; fixed "and and" typo in description |
| sample_evaluations_ai_assisted.py | Removed dataset upload code and DatasetVersion import; converted to use SourceFileContent with inline data; fixed "and and" typo in description |
| sample_evaluation_cluster_insight.py | Removed dataset upload, temp file creation, and unused imports (json, tempfile); converted to use SourceFileContent with inline data |
| README.md | Added comprehensive documentation for all evaluation samples with categorized tables and links |
| assets.json | Updated test recording tag to include new recordings |
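
As a rough illustration of why json and tempfile became unused in the cluster insight sample, the sketch below contrasts an upload-style preparation step (serialize rows to a temporary JSONL file) with wrapping the same rows as inline items. Both helper names and the rows are hypothetical, and the `{"item": ...}` envelope is an assumption about the inline content shape rather than code from the sample.

```python
import json
import os
import tempfile
from typing import Any


def write_rows_to_temp_jsonl(rows: list[dict[str, Any]]) -> str:
    """Upload-style preparation: serialize rows to a temporary JSONL file."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=".jsonl", delete=False, encoding="utf-8"
    ) as handle:
        for row in rows:
            handle.write(json.dumps(row) + "\n")
        return handle.name


def rows_to_inline_content(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Inline preparation: wrap each row directly, with no file involved."""
    return [{"item": row} for row in rows]


# Hypothetical rows, for illustration only.
rows = [
    {"query": "How do I reset my password?", "response": "Use the reset link."},
    {"query": "Where is my order?", "response": "It ships tomorrow."},
]

# Round-trip through the temp file to show both paths carry the same data.
path = write_rows_to_temp_jsonl(rows)
try:
    with open(path, encoding="utf-8") as handle:
        round_tripped = [{"item": json.loads(line)} for line in handle if line.strip()]
finally:
    os.remove(path)  # the file-based path also needs cleanup

inline_content = rows_to_inline_content(rows)
```

Both paths end with the same items; the inline path simply skips serialization, the temporary file, and the cleanup.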

@aprilk-ms aprilk-ms merged commit 28024b6 into main Feb 3, 2026
21 checks passed
@aprilk-ms aprilk-ms deleted the aprilk/eval-sample-test-3 branch February 3, 2026 20:51
aprilk-ms added a commit that referenced this pull request Feb 3, 2026
…4950)

- Update sample_evaluations_graders.py to use inline data instead of file upload
- Update sample_evaluations_ai_assisted.py to use inline data instead of file upload
- Update sample_evaluation_cluster_insight.py to use inline data instead of file upload
- Add samples/evaluations/README.md as an index for all evaluation samples
- Update test_samples_evaluations.py to test the updated samples
- Remove unnecessary inline-data sample duplicates
- Update assets.json with new recordings
aprilk-ms added a commit that referenced this pull request Feb 3, 2026
…4950) (#44983)