test(nexus): improve robustness of json extraction utility#78
test(nexus): improve robustness of json extraction utility#78JuanCS-Dev wants to merge 3 commits intomainfrom
Conversation
- Add comprehensive test suite `tests/nexus/test_utils.py` for `tools/nexus/utils.py`. - Fix `_try_parse_json_loose` to correctly handle multiple closing braces/brackets by iterating backwards through all occurrences, instead of only checking the last one. - This improves recovery from truncated or noisy model outputs. Co-authored-by: JuanCS-Dev <227056558+JuanCS-Dev@users.noreply.github.com>
|
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
Summary of ChangesHello @JuanCS-Dev, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly improves the robustness and test coverage of the Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request significantly improves the robustness of the _try_parse_json_loose utility by fixing a bug in its truncation recovery logic and introducing a comprehensive suite of unit tests. The changes are well-implemented and the tests cover a wide range of scenarios. My review includes suggestions to further enhance the new test code by using pytest parameterization for better maintainability, and a minor refinement in the implementation for improved readability.
| class TestStripJsonFences: | ||
| """Tests for _strip_json_fences function.""" | ||
|
|
||
| def test_empty_string(self): | ||
| """Test with empty string.""" | ||
| assert _strip_json_fences("") == "" | ||
| assert _strip_json_fences(None) == "" | ||
|
|
||
| def test_plain_text(self): | ||
| """Test with plain text without fences.""" | ||
| assert _strip_json_fences("some text") == "some text" | ||
| assert _strip_json_fences(" some text ") == "some text" | ||
|
|
||
| def test_json_fence(self): | ||
| """Test with ```json fence.""" | ||
| text = "```json\n{\"key\": \"value\"}\n```" | ||
| expected = "{\"key\": \"value\"}" | ||
| assert _strip_json_fences(text) == expected | ||
|
|
||
| def test_generic_fence(self): | ||
| """Test with generic ``` fence.""" | ||
| text = "```\n{\"key\": \"value\"}\n```" | ||
| expected = "{\"key\": \"value\"}" | ||
| assert _strip_json_fences(text) == expected | ||
|
|
||
| def test_mixed_content(self): | ||
| """Test with content before and after fences.""" | ||
| text = "Here is the JSON:\n```json\n{\"key\": \"value\"}\n```\nHope this helps." | ||
| expected = "{\"key\": \"value\"}" | ||
| assert _strip_json_fences(text) == expected | ||
|
|
||
| def test_multiple_fences(self): | ||
| """Test that it takes the first fenced block.""" | ||
| text = "```json\n{\"first\": 1}\n```\n```json\n{\"second\": 2}\n```" | ||
| expected = "{\"first\": 1}" | ||
| assert _strip_json_fences(text) == expected |
There was a problem hiding this comment.
The tests for _strip_json_fences can be made more concise and maintainable by using pytest.mark.parametrize. This would consolidate all test cases into a single test function, reducing boilerplate and making it easier to add new cases in the future.
Additionally, test_empty_string currently tests _strip_json_fences(None). The function is type-hinted to accept a str, so passing None violates its contract. The refactored test below omits this case to adhere to the function's type signature.
@pytest.mark.parametrize(
("text", "expected"),
[
pytest.param("", "", id="empty_string"),
pytest.param("some text", "some text", id="plain_text"),
pytest.param(" some text ", "some text", id="plain_text_with_spaces"),
pytest.param("```json\n{\"key\": \"value\"}\n```", "{\"key\": \"value\"}", id="json_fence"),
pytest.param("```\n{\"key\": \"value\"}\n```", "{\"key\": \"value\"}", id="generic_fence"),
pytest.param(
"Here is the JSON:\n```json\n{\"key\": \"value\"}\n```\nHope this helps.",
"{\"key\": \"value\"}",
id="mixed_content",
),
pytest.param(
"```json\n{\"first\": 1}\n```\n```json\n{\"second\": 2}\n```",
"{\"first\": 1}",
id="multiple_fences",
),
],
)
def test_strip_json_fences(text, expected):
"""Tests for _strip_json_fences function with various inputs."""
assert _strip_json_fences(text) == expected| class TestTryParseJsonLoose: | ||
| """Tests for _try_parse_json_loose function.""" | ||
|
|
||
| def test_valid_json(self): | ||
| """Test with valid JSON string.""" | ||
| text = '{"key": "value", "number": 123}' | ||
| result = _try_parse_json_loose(text) | ||
| assert result == {"key": "value", "number": 123} | ||
|
|
||
| def test_fenced_json(self): | ||
| """Test with fenced JSON block.""" | ||
| text = '```json\n{"key": "value"}\n```' | ||
| result = _try_parse_json_loose(text) | ||
| assert result == {"key": "value"} | ||
|
|
||
| def test_json_with_chatter(self): | ||
| """Test with leading/trailing chatter.""" | ||
| text = 'Here is the output: {"key": "value"} ends here.' | ||
| result = _try_parse_json_loose(text) | ||
| assert result == {"key": "value"} | ||
|
|
||
| def test_array_json(self): | ||
| """Test with JSON array.""" | ||
| text = '[1, 2, 3]' | ||
| result = _try_parse_json_loose(text) | ||
| assert result == [1, 2, 3] | ||
|
|
||
| def test_array_with_chatter(self): | ||
| """Test with JSON array surrounded by text.""" | ||
| text = 'List: [1, 2, 3] end.' | ||
| result = _try_parse_json_loose(text) | ||
| assert result == [1, 2, 3] | ||
|
|
||
| def test_truncated_json_recovery(self): | ||
| """Test recovery from truncated JSON.""" | ||
| # Valid object followed by garbage | ||
| text = '{"key": "value"} truncated...' | ||
| result = _try_parse_json_loose(text) | ||
| assert result == {"key": "value"} | ||
|
|
||
| def test_multiple_objects_recovery(self): | ||
| """Test recovery when multiple JSON-like structures exist.""" | ||
| # Should pick the largest valid JSON starting from the first { | ||
| # strict JSON does not allow multiple objects at root, so it might fail if we try to parse '{"a":1}{"b":2}' | ||
| # But our recovery strategy tries slicing from the END. | ||
|
|
||
| # Case 1: Extra text after valid JSON | ||
| text = '{"a": 1} and then {"b": 2}' | ||
| # It starts at first {, so candidate is '{"a": 1} and then {"b": 2}' | ||
| # rfind('}') gives end of {"b": 2}. slice is '{"a": 1} and then {"b": 2}'. Invalid. | ||
| # Next rfind('}') is end of {"a": 1}. slice is '{"a": 1}'. Valid. | ||
| result = _try_parse_json_loose(text) | ||
| assert result == {"a": 1} | ||
|
|
||
| def test_nested_truncation_recovery(self): | ||
| """Test recovery from nested truncated JSON.""" | ||
| # It tries to find the last `}` or `]` and slice there. | ||
| text = '{"a": 1, "b": 2} some extra junk' | ||
| result = _try_parse_json_loose(text) | ||
| assert result == {"a": 1, "b": 2} | ||
|
|
||
| def test_failed_recovery(self): | ||
| """Test when recovery fails.""" | ||
| text = '{"key": "value' # Completely invalid | ||
| with pytest.raises(json.JSONDecodeError): | ||
| _try_parse_json_loose(text) | ||
|
|
||
| def test_empty_input(self): | ||
| """Test with empty input.""" | ||
| with pytest.raises(json.JSONDecodeError, match="Empty insight"): | ||
| _try_parse_json_loose("") | ||
|
|
||
| def test_no_json_start(self): | ||
| """Test with input having no JSON start characters.""" | ||
| text = "Just some text without braces or brackets." | ||
| with pytest.raises(json.JSONDecodeError, match="No JSON object/array start found"): | ||
| _try_parse_json_loose(text) | ||
|
|
||
| def test_malformed_json(self): | ||
| """Test with malformed JSON that looks like JSON.""" | ||
| text = '{"key": value}' # missing quotes around value | ||
| with pytest.raises(json.JSONDecodeError): | ||
| _try_parse_json_loose(text) |
There was a problem hiding this comment.
Similar to TestStripJsonFences, the tests in TestTryParseJsonLoose can be refactored using pytest.mark.parametrize. This separates successful parsing cases from failure cases into two clear, parameterized test functions, improving readability and maintainability.
@pytest.mark.parametrize(
("text", "expected"),
[
pytest.param('{"key": "value", "number": 123}', {"key": "value", "number": 123}, id="valid_json"),
pytest.param('```json\n{"key": "value"}\n```', {"key": "value"}, id="fenced_json"),
pytest.param('Here is the output: {"key": "value"} ends here.', {"key": "value"}, id="json_with_chatter"),
pytest.param('[1, 2, 3]', [1, 2, 3], id="array_json"),
pytest.param('List: [1, 2, 3] end.', [1, 2, 3], id="array_with_chatter"),
pytest.param('{"key": "value"} truncated...', {"key": "value"}, id="truncated_json_recovery"),
pytest.param('{"a": 1} and then {"b": 2}', {"a": 1}, id="multiple_objects_recovery"),
pytest.param('{"a": 1, "b": 2} some extra junk', {"a": 1, "b": 2}, id="nested_truncation_recovery"),
],
)
def test_try_parse_json_loose_success(text, expected):
"""Tests for _try_parse_json_loose function that should succeed."""
assert _try_parse_json_loose(text) == expected
@pytest.mark.parametrize(
("text", "match"),
[
pytest.param('{"key": "value', None, id="failed_recovery_invalid"),
pytest.param("", "Empty insight", id="empty_input"),
pytest.param("Just some text without braces or brackets.", "No JSON object/array start found", id="no_json_start"),
pytest.param('{"key": value}', None, id="malformed_json"),
],
)
def test_try_parse_json_loose_failure(text, match):
"""Tests for _try_parse_json_loose function that should fail."""
with pytest.raises(json.JSONDecodeError, match=match):
_try_parse_json_loose(text)| potential_ends = [] | ||
| for i, char in enumerate(candidate): | ||
| if char in ("}", "]"): | ||
| potential_ends.append(i) |
There was a problem hiding this comment.
- Fix `quality.yml`: Remove unsupported `--fail D` argument from `radon cc`. - Fix `security.yml`: Update command-line search and `bandit` scan paths to `packages/vertice-core/src/vertice_core`. - Fix `tui_e2e.yaml`: Set correct `PYTHONPATH` to `packages/vertice-core/src`. - Fix `vertice-mcp-cicd.yaml`: Update paths for basic validation checks. - Fix linting errors in `repl.py`: Add missing tool imports. - Fix linting errors in `uncertainty.py`: Rename ambiguous variable `l` to `logit`. - Fix linting errors in `llm_router.py`: Remove unused `ProviderType` import. - Fix linting errors in `coordination.py`: Resolve duplicate dictionary key and unused imports. - Fix linting errors in `tools.py`: Use `is` for type comparisons. - Fix linting errors in `shell_main.py`: Add missing imports for tool classes and TUI components. Co-authored-by: JuanCS-Dev <227056558+JuanCS-Dev@users.noreply.github.com>
- Fix ambiguous variable names (l -> line, left, lightness) - Remove unused imports and definitions - Fix undefined names in validator.py and prompts.py - Handle bare except clauses - Fix CI workflow paths for npm and python tests - Ensure packages/vertice-core is installed in test workflow Co-authored-by: JuanCS-Dev <227056558+JuanCS-Dev@users.noreply.github.com>
Improved the testing and robustness of the
_try_parse_json_looseutility function intools/nexus/utils.py.Testing Gap Addressed:
Changes:
tests/nexus/test_utils.pywith 18 test cases covering:_strip_json_fences: empty strings, plain text, fenced blocks, mixed content._try_parse_json_loose: valid JSON, fenced JSON, chatter, arrays, truncation recovery, nested structures, and error handling._try_parse_json_loosewhere it only considered the last closing brace/bracket for truncation recovery. The fix now iterates backwards through all potential closing positions to find the longest valid JSON prefix.Result:
tests/nexus/test_utils.pypasses with 100% success rate.PR created automatically by Jules for task 5929303825375036850 started by @JuanCS-Dev