Skip to content

test(nexus): improve robustness of json extraction utility#78

Closed
JuanCS-Dev wants to merge 3 commits intomainfrom
testing-improvement-nexus-utils-5929303825375036850
Closed

test(nexus): improve robustness of json extraction utility#78
JuanCS-Dev wants to merge 3 commits intomainfrom
testing-improvement-nexus-utils-5929303825375036850

Conversation

@JuanCS-Dev
Copy link
Owner

Improved the testing and robustness of the _try_parse_json_loose utility function in tools/nexus/utils.py.

Testing Gap Addressed:

  • The utility lacked dedicated unit tests, relying on implicit usage in other modules.
  • Scenarios like fenced code blocks, leading/trailing chatter, and complex truncation were not explicitly covered.

Changes:

  • Created tests/nexus/test_utils.py with 18 test cases covering:
    • _strip_json_fences: empty strings, plain text, fenced blocks, mixed content.
    • _try_parse_json_loose: valid JSON, fenced JSON, chatter, arrays, truncation recovery, nested structures, and error handling.
  • Identified and fixed a bug in _try_parse_json_loose where it only considered the last closing brace/bracket for truncation recovery. The fix now iterates backwards through all potential closing positions to find the longest valid JSON prefix.

Result:

  • tests/nexus/test_utils.py passes with 100% success rate.
  • The utility is now more robust against imperfect LLM outputs containing multiple JSON-like structures or garbage text containing closing braces.

PR created automatically by Jules for task 5929303825375036850 started by @JuanCS-Dev

- Add comprehensive test suite `tests/nexus/test_utils.py` for `tools/nexus/utils.py`.
- Fix `_try_parse_json_loose` to correctly handle multiple closing braces/brackets by iterating backwards through all occurrences, instead of only checking the last one.
- This improves recovery from truncated or noisy model outputs.

Co-authored-by: JuanCS-Dev <227056558+JuanCS-Dev@users.noreply.github.com>
@google-labs-jules
Copy link
Contributor

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

@github-actions github-actions bot added the testing Testing related label Feb 12, 2026
@gemini-code-assist
Copy link

Summary of Changes

Hello @JuanCS-Dev, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves the robustness and test coverage of the _try_parse_json_loose utility, which is vital for reliably extracting JSON data from potentially messy or incomplete text, especially from LLM responses. It introduces a comprehensive suite of unit tests and addresses a critical bug in the JSON truncation recovery mechanism, ensuring more accurate and resilient parsing.

Highlights

  • New Test Suite: A dedicated test file tests/nexus/test_utils.py was added, introducing 18 new test cases for _strip_json_fences and _try_parse_json_loose.
  • Bug Fix: The _try_parse_json_loose function was updated to correctly handle JSON truncation recovery by iterating through all potential closing brace/bracket positions from the end, rather than just the last one.
  • Improved Robustness: The utility functions are now more resilient to imperfect LLM outputs, including fenced code blocks, leading/trailing text, and complex truncation scenarios.
Changelog
  • tests/nexus/test_utils.py
    • Added a new test file containing unit tests for _strip_json_fences and _try_parse_json_loose.
    • Included tests for various scenarios like empty strings, plain text, fenced JSON, mixed content, valid JSON, JSON with chatter, array JSON, and truncation recovery.
    • Added tests for error handling, including empty input, no JSON start, and malformed JSON.
  • tools/nexus/utils.py
    • Modified the _try_parse_json_loose function to improve its truncation recovery logic.
    • Changed the method for identifying potential JSON end positions to iterate through all closing braces/brackets from the end of the candidate string.
Activity
  • No human activity has been recorded on this pull request since its creation.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request significantly improves the robustness of the _try_parse_json_loose utility by fixing a bug in its truncation recovery logic and introducing a comprehensive suite of unit tests. The changes are well-implemented and the tests cover a wide range of scenarios. My review includes suggestions to further enhance the new test code by using pytest parameterization for better maintainability, and a minor refinement in the implementation for improved readability.

Comment on lines +9 to +44
class TestStripJsonFences:
"""Tests for _strip_json_fences function."""

def test_empty_string(self):
"""Test with empty string."""
assert _strip_json_fences("") == ""
assert _strip_json_fences(None) == ""

def test_plain_text(self):
"""Test with plain text without fences."""
assert _strip_json_fences("some text") == "some text"
assert _strip_json_fences(" some text ") == "some text"

def test_json_fence(self):
"""Test with ```json fence."""
text = "```json\n{\"key\": \"value\"}\n```"
expected = "{\"key\": \"value\"}"
assert _strip_json_fences(text) == expected

def test_generic_fence(self):
"""Test with generic ``` fence."""
text = "```\n{\"key\": \"value\"}\n```"
expected = "{\"key\": \"value\"}"
assert _strip_json_fences(text) == expected

def test_mixed_content(self):
"""Test with content before and after fences."""
text = "Here is the JSON:\n```json\n{\"key\": \"value\"}\n```\nHope this helps."
expected = "{\"key\": \"value\"}"
assert _strip_json_fences(text) == expected

def test_multiple_fences(self):
"""Test that it takes the first fenced block."""
text = "```json\n{\"first\": 1}\n```\n```json\n{\"second\": 2}\n```"
expected = "{\"first\": 1}"
assert _strip_json_fences(text) == expected

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The tests for _strip_json_fences can be made more concise and maintainable by using pytest.mark.parametrize. This would consolidate all test cases into a single test function, reducing boilerplate and making it easier to add new cases in the future.

Additionally, test_empty_string currently tests _strip_json_fences(None). The function is type-hinted to accept a str, so passing None violates its contract. The refactored test below omits this case to adhere to the function's type signature.

@pytest.mark.parametrize(
    ("text", "expected"),
    [
        pytest.param("", "", id="empty_string"),
        pytest.param("some text", "some text", id="plain_text"),
        pytest.param("  some text  ", "some text", id="plain_text_with_spaces"),
        pytest.param("```json\n{\"key\": \"value\"}\n```", "{\"key\": \"value\"}", id="json_fence"),
        pytest.param("```\n{\"key\": \"value\"}\n```", "{\"key\": \"value\"}", id="generic_fence"),
        pytest.param(
            "Here is the JSON:\n```json\n{\"key\": \"value\"}\n```\nHope this helps.",
            "{\"key\": \"value\"}",
            id="mixed_content",
        ),
        pytest.param(
            "```json\n{\"first\": 1}\n```\n```json\n{\"second\": 2}\n```",
            "{\"first\": 1}",
            id="multiple_fences",
        ),
    ],
)
def test_strip_json_fences(text, expected):
    """Tests for _strip_json_fences function with various inputs."""
    assert _strip_json_fences(text) == expected

Comment on lines +47 to +129
class TestTryParseJsonLoose:
"""Tests for _try_parse_json_loose function."""

def test_valid_json(self):
"""Test with valid JSON string."""
text = '{"key": "value", "number": 123}'
result = _try_parse_json_loose(text)
assert result == {"key": "value", "number": 123}

def test_fenced_json(self):
"""Test with fenced JSON block."""
text = '```json\n{"key": "value"}\n```'
result = _try_parse_json_loose(text)
assert result == {"key": "value"}

def test_json_with_chatter(self):
"""Test with leading/trailing chatter."""
text = 'Here is the output: {"key": "value"} ends here.'
result = _try_parse_json_loose(text)
assert result == {"key": "value"}

def test_array_json(self):
"""Test with JSON array."""
text = '[1, 2, 3]'
result = _try_parse_json_loose(text)
assert result == [1, 2, 3]

def test_array_with_chatter(self):
"""Test with JSON array surrounded by text."""
text = 'List: [1, 2, 3] end.'
result = _try_parse_json_loose(text)
assert result == [1, 2, 3]

def test_truncated_json_recovery(self):
"""Test recovery from truncated JSON."""
# Valid object followed by garbage
text = '{"key": "value"} truncated...'
result = _try_parse_json_loose(text)
assert result == {"key": "value"}

def test_multiple_objects_recovery(self):
"""Test recovery when multiple JSON-like structures exist."""
# Should pick the largest valid JSON starting from the first {
# strict JSON does not allow multiple objects at root, so it might fail if we try to parse '{"a":1}{"b":2}'
# But our recovery strategy tries slicing from the END.

# Case 1: Extra text after valid JSON
text = '{"a": 1} and then {"b": 2}'
# It starts at first {, so candidate is '{"a": 1} and then {"b": 2}'
# rfind('}') gives end of {"b": 2}. slice is '{"a": 1} and then {"b": 2}'. Invalid.
# Next rfind('}') is end of {"a": 1}. slice is '{"a": 1}'. Valid.
result = _try_parse_json_loose(text)
assert result == {"a": 1}

def test_nested_truncation_recovery(self):
"""Test recovery from nested truncated JSON."""
# It tries to find the last `}` or `]` and slice there.
text = '{"a": 1, "b": 2} some extra junk'
result = _try_parse_json_loose(text)
assert result == {"a": 1, "b": 2}

def test_failed_recovery(self):
"""Test when recovery fails."""
text = '{"key": "value' # Completely invalid
with pytest.raises(json.JSONDecodeError):
_try_parse_json_loose(text)

def test_empty_input(self):
"""Test with empty input."""
with pytest.raises(json.JSONDecodeError, match="Empty insight"):
_try_parse_json_loose("")

def test_no_json_start(self):
"""Test with input having no JSON start characters."""
text = "Just some text without braces or brackets."
with pytest.raises(json.JSONDecodeError, match="No JSON object/array start found"):
_try_parse_json_loose(text)

def test_malformed_json(self):
"""Test with malformed JSON that looks like JSON."""
text = '{"key": value}' # missing quotes around value
with pytest.raises(json.JSONDecodeError):
_try_parse_json_loose(text)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Similar to TestStripJsonFences, the tests in TestTryParseJsonLoose can be refactored using pytest.mark.parametrize. This separates successful parsing cases from failure cases into two clear, parameterized test functions, improving readability and maintainability.

@pytest.mark.parametrize(
    ("text", "expected"),
    [
        pytest.param('{"key": "value", "number": 123}', {"key": "value", "number": 123}, id="valid_json"),
        pytest.param('```json\n{"key": "value"}\n```', {"key": "value"}, id="fenced_json"),
        pytest.param('Here is the output: {"key": "value"} ends here.', {"key": "value"}, id="json_with_chatter"),
        pytest.param('[1, 2, 3]', [1, 2, 3], id="array_json"),
        pytest.param('List: [1, 2, 3] end.', [1, 2, 3], id="array_with_chatter"),
        pytest.param('{"key": "value"} truncated...', {"key": "value"}, id="truncated_json_recovery"),
        pytest.param('{"a": 1} and then {"b": 2}', {"a": 1}, id="multiple_objects_recovery"),
        pytest.param('{"a": 1, "b": 2} some extra junk', {"a": 1, "b": 2}, id="nested_truncation_recovery"),
    ],
)
def test_try_parse_json_loose_success(text, expected):
    """Tests for _try_parse_json_loose function that should succeed."""
    assert _try_parse_json_loose(text) == expected


@pytest.mark.parametrize(
    ("text", "match"),
    [
        pytest.param('{"key": "value', None, id="failed_recovery_invalid"),
        pytest.param("", "Empty insight", id="empty_input"),
        pytest.param("Just some text without braces or brackets.", "No JSON object/array start found", id="no_json_start"),
        pytest.param('{"key": value}', None, id="malformed_json"),
    ],
)
def test_try_parse_json_loose_failure(text, match):
    """Tests for _try_parse_json_loose function that should fail."""
    with pytest.raises(json.JSONDecodeError, match=match):
        _try_parse_json_loose(text)

Comment on lines +53 to +56
potential_ends = []
for i, char in enumerate(candidate):
if char in ("}", "]"):
potential_ends.append(i)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This loop can be expressed more concisely and pythonically using a list comprehension.

Suggested change
potential_ends = []
for i, char in enumerate(candidate):
if char in ("}", "]"):
potential_ends.append(i)
potential_ends = [i for i, char in enumerate(candidate) if char in ("}", "]")]

google-labs-jules bot and others added 2 commits February 12, 2026 11:54
- Fix `quality.yml`: Remove unsupported `--fail D` argument from `radon cc`.
- Fix `security.yml`: Update command-line search and `bandit` scan paths to `packages/vertice-core/src/vertice_core`.
- Fix `tui_e2e.yaml`: Set correct `PYTHONPATH` to `packages/vertice-core/src`.
- Fix `vertice-mcp-cicd.yaml`: Update paths for basic validation checks.
- Fix linting errors in `repl.py`: Add missing tool imports.
- Fix linting errors in `uncertainty.py`: Rename ambiguous variable `l` to `logit`.
- Fix linting errors in `llm_router.py`: Remove unused `ProviderType` import.
- Fix linting errors in `coordination.py`: Resolve duplicate dictionary key and unused imports.
- Fix linting errors in `tools.py`: Use `is` for type comparisons.
- Fix linting errors in `shell_main.py`: Add missing imports for tool classes and TUI components.

Co-authored-by: JuanCS-Dev <227056558+JuanCS-Dev@users.noreply.github.com>
- Fix ambiguous variable names (l -> line, left, lightness)
- Remove unused imports and definitions
- Fix undefined names in validator.py and prompts.py
- Handle bare except clauses
- Fix CI workflow paths for npm and python tests
- Ensure packages/vertice-core is installed in test workflow

Co-authored-by: JuanCS-Dev <227056558+JuanCS-Dev@users.noreply.github.com>
@JuanCS-Dev JuanCS-Dev closed this Feb 12, 2026
@JuanCS-Dev JuanCS-Dev deleted the testing-improvement-nexus-utils-5929303825375036850 branch February 12, 2026 12:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

testing Testing related

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant