bug(cli): --threshold compares mean score instead of per-test score #882
Closed
Description
Problem
--threshold compares the mean score against the threshold, but the RESULT: line uses per-test pass/fail (score >= 0.8). The two checks can disagree, which produces contradictory output and exit codes:
RESULT: FAIL (28/31 passed, mean score: 0.927)
Suite score: 0.93 (threshold: 0.80) — PASS ← exit code 0
The output says FAIL but the exit code is 0. Users expect --threshold 0.8 to mean "each test must score >= 0.8" — matching the per-test requirement.
Root cause
- formatEvaluationSummary(): per-test pass/fail (score >= hardcoded 0.8)
- formatThresholdSummary(): mean score comparison
- Exit code follows threshold (mean-based), not RESULT (per-test)
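The divergence between the two checks can be reproduced in a few lines. This is a minimal sketch, not the project's code: the function names passesByMean and passesPerTest and the scores are hypothetical, chosen only to show a suite whose mean clears the threshold while one test does not.

```typescript
// Hypothetical scores for illustration; not taken from the run in this issue.
const scores: number[] = [1.0, 1.0, 1.0, 0.5];
const threshold = 0.8;

// Mean-based check: the rule the exit code followed before the fix.
function passesByMean(values: number[], limit: number): boolean {
  const mean = values.reduce((sum, s) => sum + s, 0) / values.length;
  return mean >= limit;
}

// Per-test check: the rule the RESULT line reported.
function passesPerTest(values: number[], limit: number): boolean {
  return values.every((s) => s >= limit);
}

console.log(passesByMean(scores, threshold));  // true  (mean is 0.875)
console.log(passesPerTest(scores, threshold)); // false (one test scored 0.5)
```

A single low score is averaged away by the high ones, so the mean-based verdict passes while the per-test verdict fails.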
Fix
PR #885: --threshold now overrides the per-test score requirement:
- calculateEvaluationSummary() recomputes passed/failed using the threshold
- RESULT line shows the threshold: 28/31 scored >= 0.8
- Exit code matches RESULT verdict
- Removed separate formatThresholdSummary(): one unified output line
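The unified behavior described above might look roughly like the following. This is a sketch under stated assumptions, not the code from PR #885: summarizeSketch, formatResultSketch, and the SummarySketch shape are hypothetical names, the exact wording of the RESULT line is approximated from the examples in this issue, and exit code 1 for failure is assumed.

```typescript
// Hypothetical shape; the real return type of calculateEvaluationSummary() is not shown in this issue.
interface SummarySketch {
  passed: number;
  total: number;
  meanScore: number;
  verdict: "PASS" | "FAIL";
  exitCode: number;
}

// The threshold defaults to 0.8, the previously hardcoded per-test requirement.
function summarizeSketch(scores: number[], threshold = 0.8): SummarySketch {
  const total = scores.length;
  // Per-test comparison: each score is checked against the threshold on its own.
  const passed = scores.filter((s) => s >= threshold).length;
  const meanScore = total === 0 ? 0 : scores.reduce((a, b) => a + b, 0) / total;
  const verdict = passed === total ? "PASS" : "FAIL";
  // Assumption: 0 on PASS, 1 on FAIL, so the exit code always follows the verdict.
  return { passed, total, meanScore, verdict, exitCode: verdict === "PASS" ? 0 : 1 };
}

// One output line carries the verdict, the threshold, and the mean (reported, not judged).
function formatResultSketch(s: SummarySketch, threshold: number): string {
  return `RESULT: ${s.verdict} (${s.passed}/${s.total} scored >= ${threshold}, mean score: ${s.meanScore.toFixed(3)})`;
}

const summary = summarizeSketch([1.0, 1.0, 1.0, 0.5], 0.8);
console.log(formatResultSketch(summary, 0.8));
// RESULT: FAIL (3/4 scored >= 0.8, mean score: 0.875)
console.log(summary.exitCode); // 1
```

Deriving the verdict and the exit code from the same passed/total counts means they cannot contradict each other.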