test: fix flaky key-order assertions in structured output tests #2537
Hey @BrendanWalsh 👋! We use semantic commit messages to streamline the release process. Examples of commit messages with semantic prefixes:
To test your commit locally, please follow our guide on building from source.
Pull request overview
This PR updates the Python OpenAI integration tests to avoid flaky assertions about JSON key ordering in model responses, shifting the tests to validate structured output correctness (valid JSON + expected keys/types) instead of response property order.
Changes:
- Replaced response key-order assertions (text.find(...)) with JSON parsing and key/type validation.
- Added a shared _assert_valid_response helper documenting why key-order is not asserted.
- Removed an unused os import.
cognitive/src/test/python/synapsemltest/services/openai/test_ResponseFormatOrder.py
6aa723d to a0aa012
Remove non-deterministic assertions on JSON key ordering in model responses. LLMs do not guarantee response key order matches schema property order. Tests now validate structured output correctness (valid JSON with expected keys and types) instead.

Rename module/class/methods to reflect the new intent:
- test_ResponseFormatOrder.py -> test_StructuredOutput.py
- TestResponseFormatOrder -> TestStructuredOutput
- test_*_reason_then_ans -> test_*_structured_output_reason_first
- test_*_ans_then_reason -> test_*_structured_output_ans_first

Extract shared _make_prompt helper to reduce test boilerplate.

Schema ordering preservation is already covered by the Scala ResponseFormatOrderSuite unit tests.
a0aa012 to 391ac2c
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
Codecov Report
✅ All modified and coverable lines are covered by tests.

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #2537   +/-   ##
=======================================
  Coverage   84.61%   84.61%
=======================================
  Files         335      335
  Lines       17708    17708
  Branches     1612     1612
=======================================
  Hits        14984    14984
  Misses       2724     2724

☔ View full report in Codecov by Sentry.
Problem
The 4 tests in test_ResponseFormatOrder.py assert that JSON keys in the model response appear in a specific order matching the schema property order. However, OpenAI does not guarantee that response keys will follow the schema ordering, making these tests non-deterministic.

CI failure: ADO build 213040246, test run 837042988.
Fix
- Replace text.find() key-order assertions with JSON parsing + key presence checks
- Validate that the response contains both expected keys (reason and ans) and correct types
- Add _assert_valid_response helper with a docstring explaining the rationale
- Remove unused os import

Why this is correct
The library feature (preserving schema property ordering in the API request via LinkedHashMap) is already thoroughly tested by the Scala unit tests in ResponseFormatOrderSuite.scala, which verify serialization ordering without hitting the API. The Python integration tests should verify that structured output works (returns valid JSON matching the schema), not that the model happens to return keys in a particular order.
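To illustrate why an order-based check is fragile while a structural check is not, here is a small sketch. The two response strings and the keys_in_order function are made up for illustration. The PR only says the old assertions used text.find(...), so their exact form is an assumption.

```python
import json

# Two hypothetical model responses with identical content but different key order.
reason_first = '{"reason": "2 + 2 = 4", "ans": "4"}'
ans_first = '{"ans": "4", "reason": "2 + 2 = 4"}'


def keys_in_order(text, first, second):
    """Order-based check in the style of text.find(...): position of one key before another."""
    return text.find(f'"{first}"') < text.find(f'"{second}"')


# The order-based check gives different answers for equivalent responses.
order_a = keys_in_order(reason_first, "reason", "ans")
order_b = keys_in_order(ans_first, "reason", "ans")

# Parsed as JSON, both responses are the same object, since dict equality ignores key order.
same_content = json.loads(reason_first) == json.loads(ans_first)

print(order_a, order_b, same_content)
```

Both strings are equally valid structured outputs, so a test that depends on order_a holding can fail intermittently even when the feature works.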