Fix: Resolve JSON malformation causing infinite loops and TypeError #1037

Closed
gdeyoung wants to merge 0 commits into agent0ai:main from gdeyoung:main

Conversation

@gdeyoung
Contributor

Summary

Fixed multiple critical bugs causing JSON malformation, infinite loops, and TypeError crashes in Agent Zero.

Issues Fixed

1. JSON Object Extraction Bug (rfind)

File: python/helpers/extract_tools.py - Used rfind to find the last closing brace in the text instead of the brace that matches the opening one
Fix: Track nested braces with a depth counter and stop at the matching closing brace
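
For illustration, the difference between the two approaches can be sketched as below. This is a minimal sketch, not the code from this PR: both function names are hypothetical, and it deliberately ignores braces inside JSON strings to keep the depth counter visible.

```python
from typing import Optional


def extract_with_rfind(text: str) -> Optional[str]:
    """Old behaviour as described: first '{' through the LAST '}' in the text."""
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end < start:
        return None
    return text[start : end + 1]


def extract_with_depth(text: str) -> Optional[str]:
    """Stop at the '}' that closes the first '{' by counting nesting depth."""
    start = text.find("{")
    if start == -1:
        return None
    depth = 0
    for i in range(start, len(text)):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[start : i + 1]
    return None  # unbalanced input: no matching close found


# A tool call followed by prose that happens to contain braces.
sample = '{"tool_name": "response", "tool_args": {"text": "hi"}} see /restore/{backup_id}/'

greedy = extract_with_rfind(sample)    # runs on into "/restore/{backup_id}"
matched = extract_with_depth(sample)   # ends at the object's own closing brace
```

With the sample above, the rfind variant swallows the trailing path, while the depth counter returns only the tool call object.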

2. Escape Handling Logic Error

File: python/helpers/extract_tools.py - Escaped quotes not toggling in_string flag
Fix: Proper check for escaped quotes

3. No Loop Protection

File: agent.py - No protection against consecutive misformat errors, so the agent could retry the same malformed output indefinitely
Fix: Added a consecutive_misformat counter with a 5-attempt limit that raises HandledException
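
A rough sketch of what such a counter could look like, for readers who want the idea without reading the diff. The `MisformatGuard` class and the local `HandledException` stand-in are assumptions made for this example; only the counter name and the 5-attempt limit come from the description above.

```python
class HandledException(Exception):
    """Stand-in for the class in python.helpers.errors (shape assumed)."""


MAX_CONSECUTIVE_MISFORMAT = 5  # limit stated in the PR description


class MisformatGuard:
    """Hypothetical helper: counts misformatted replies in a row."""

    def __init__(self, limit: int = MAX_CONSECUTIVE_MISFORMAT) -> None:
        self.limit = limit
        self.consecutive_misformat = 0

    def record(self, parsed_ok: bool) -> None:
        if parsed_ok:
            # Any well-formed reply resets the streak.
            self.consecutive_misformat = 0
            return
        self.consecutive_misformat += 1
        if self.consecutive_misformat >= self.limit:
            # Stop the message loop instead of retrying forever.
            raise HandledException(
                f"{self.consecutive_misformat} consecutive misformatted replies"
            )
```

Resetting on every well-formed reply means an occasional bad response does not count against the agent; only an unbroken streak trips the limit.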

4. TypeError: tool_args must be a mapping

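File: agent.py - .get() returns the stored value whenever the key exists, so a string tool_args is passed through as-is; a dict default only applies when the key is missing
Fix: Validate with isinstance(tool_args, dict) before use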
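
The pitfall can be reproduced in a few lines. This is a sketch under assumptions: `split_tool_args` is a hypothetical helper, and returning a validity flag is just one way to surface the problem; the description above does not say what agent.py does when the check fails.

```python
from typing import Any, Dict, Tuple


def split_tool_args(tool_request: Dict[str, Any]) -> Tuple[Dict[str, Any], bool]:
    """Return (tool_args, is_valid). Non-mapping values are reported, not used."""
    tool_args = tool_request.get("tool_args", {})
    if not isinstance(tool_args, dict):
        return {}, False
    return tool_args, True


# The default passed to .get() does not help when the key is present.
request = {"tool_name": "code_execution_tool", "tool_args": "ls -la"}
raw = request.get("tool_args", {})          # still the string "ls -la"
args, is_valid = split_tool_args(request)   # ({}, False)
```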

@gdeyoung
Contributor Author

Additional Fix v1.2 - HandledException Shadowing

Issue Found

A duplicate local class definition of HandledException in agent.py was shadowing the imported class from python.helpers.errors. This caused:

  • isinstance(exception, HandledException) check to fail in handle_critical_exception
  • Unhandled exceptions causing crashes instead of graceful loop termination

Root Cause

  • Line 36: from python.helpers.errors import RepairableException, HandledException ✅ (import)
  • Line 353: class HandledException(Exception): pass ❌ (duplicate local definition)

Fix Applied

Removed the duplicate class definition (lines 349-355) in agent.py. Now uses only the imported HandledException from errors.py.
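
Why the duplicate matters: two classes with the same name are still different class objects, so an `isinstance` check against one does not match instances of the other. The snippet below is a self-contained illustration of that effect, not the agent.py code; `ImportedHandledException` and `is_handled` are names made up for the example.

```python
class HandledException(Exception):
    """Plays the role of the class imported from the errors module."""


# The exception handler holds a reference to the imported class.
ImportedHandledException = HandledException


class HandledException(Exception):  # duplicate local definition shadows the name
    pass


def is_handled(exc: BaseException) -> bool:
    """Mimics the isinstance check in the handler."""
    return isinstance(exc, ImportedHandledException)


raised_locally = HandledException("stop the loop")
same_name = type(raised_locally).__name__ == ImportedHandledException.__name__
recognised = is_handled(raised_locally)  # False: same name, different class object
```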

Files Changed

  • agent.py: Removed duplicate HandledException class definition

Verification

  • Python syntax validated
  • Import now works correctly (same class object used throughout)
  • Exception handling now works as intended

Added: 2026-02-14

@longman391

I can confirm this is a critical bug. I traced tool call failures across multiple chat sessions on my instance (v0.9.8.1, Claude Opus 4.6 via GitHub Copilot, 128k context) and found 105 empty tool_name failures in a single log file — all caused by the rfind bug in extract_json_object_string().

Evidence from my logs:

  • Chat mYlPuJkf: 38 consecutive 'Tool not found' errors (messages 131-168) — the agent was stuck in an infinite loop of malformed output → error → retry
  • The extract_json_object_string() function grabs everything between the first { and last }, which means incidental curly braces in LLM output (file paths like /restore/{backup_id}/, inline JSON examples, etc.) get misinterpreted as tool calls with empty tool_name
  • I verified this by testing DirtyJson directly: input 'The backup is at /restore/{backup_id}/files' parses to {'backup_id': ''} with no tool_name → triggers 'Tool not found'

Additional issue not covered by this PR: The LLM frequently hallucinates tool names from training data instead of using the actual tool names in the system prompt:

  • code_execution instead of code_execution_tool
  • web_search instead of search_engine
  • browser_tool instead of browser_agent
  • response_tool / message_tool instead of response
  • terminal instead of code_execution_tool

A simple alias mapping in get_tool() would catch these. Happy to submit a PR for that.
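
Something along these lines, as a sketch only. The alias pairs are the ones listed above; `TOOL_ALIASES` and `resolve_tool_name` are proposed names, and how `get_tool()` would call into this is an assumption.

```python
# Hallucinated name -> actual Agent Zero tool name (pairs from the list above).
TOOL_ALIASES = {
    "code_execution": "code_execution_tool",
    "terminal": "code_execution_tool",
    "web_search": "search_engine",
    "browser_tool": "browser_agent",
    "response_tool": "response",
    "message_tool": "response",
}


def resolve_tool_name(name: str) -> str:
    """Map a known alias to the real tool name; pass other names through."""
    cleaned = name.strip()
    return TOOL_ALIASES.get(cleaned, cleaned)
```

Unknown names pass through unchanged, so a genuinely missing tool still produces the normal 'Tool not found' error.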

The consecutive misformat counter in this PR would have prevented the 38-message infinite loop. Please consider merging this — it's a significant stability improvement.

longman391 added a commit to longman391/agent-zero that referenced this pull request Feb 20, 2026
Fix agent0ai#3 - Empty tool_name validation:
- When DirtyJson parses valid JSON but the object has no tool_name field,
  the agent previously dispatched with an empty string, triggering
  'Tool  not found' errors. Now treats this as a misformat and increments
  the consecutive_misformat counter (integrates with PR agent0ai#1037's circuit breaker).
- Evidence: 105 empty tool_name failures found in a single log session.

Fix agent0ai#4 - Tool name alias mapping:
- LLMs frequently hallucinate tool names from training data instead of
  using the actual names in the system prompt. Added TOOL_ALIASES dict
  that maps common hallucinated names to actual Agent Zero tool names:
  - code_execution/terminal/shell -> code_execution_tool
  - web_search/search -> search_engine
  - browser_tool/browser -> browser_agent
  - response_tool/message_tool/message/reply -> response
  - knowledge_tool/memory_tool -> memory
  - task_manager -> scheduler
- Evidence: 20+ hallucinated tool name failures across multiple chat logs.

Related: agent0ai#1031, agent0ai#805
longman391 added a commit to longman391/agent-zero that referenced this pull request Feb 20, 2026
…ng resistance

Claude subordinates interpret Agent Zero system prompt as "prompt injection"
and refuse to output JSON, causing infinite misformat loops (even with
circuit breaker from PR agent0ai#1037).

GLM-5 reliably follows JSON formatting instructions and is capable
enough for agentic subordinate work.

Uses existing initialize_agent(override_settings=) mechanism.
@PaoloC68
Contributor

Hi @gdeyoung @frdel — I can independently confirm all three bugs here. I've been hitting the misformat infinite loop repeatedly with MiniMax M2.5 and GLM-5 on OpenRouter, and traced the root cause through the same path:

  1. rfind bug: extract_json_object_string() grabs the last } instead of the matching one — confirmed this corrupts tool JSON when model output contains incidental braces (code examples, file paths, etc.)

  2. No loop protection: My instance hit 4+ consecutive misformat warnings with no break. The agent burns through its entire context retrying the same malformed pattern.

  3. ValueError crash (related): The validate_tool_request added in 1b89a0d raises plain ValueError instead of RepairableException, which makes the crash even worse — the agent can't recover at all. I filed a separate fix for this: fix: use RepairableException in validate_tool_request for graceful recovery #1242.

I've deployed all three fixes (rfind, circuit breaker, RepairableException) on my production instance and they work together correctly:

  • The rfind fix reduces false misformat detections significantly
  • When a model still produces malformed JSON, the circuit breaker stops after 5 attempts instead of looping forever
  • The RepairableException fix lets the agent retry on validation failures instead of hard-crashing
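
To make the ValueError vs. RepairableException point concrete, here is a minimal sketch. The `RepairableException` class is a local stand-in, the checks inside `validate_tool_request` are assumed for illustration, and `run_step` is a hypothetical caller; none of this is copied from the repository.

```python
from typing import Any


class RepairableException(Exception):
    """Stand-in for python.helpers.errors.RepairableException (shape assumed)."""


def validate_tool_request(tool_request: Any) -> None:
    """Raise a recoverable error so the caller can ask the model to retry."""
    if not isinstance(tool_request, dict):
        raise RepairableException("Tool request must be a JSON object")
    tool_name = tool_request.get("tool_name")
    if not isinstance(tool_name, str) or not tool_name:
        raise RepairableException("Tool request is missing tool_name")
    if not isinstance(tool_request.get("tool_args", {}), dict):
        raise RepairableException("tool_args must be a JSON object")


def run_step(tool_request: Any) -> str:
    """Hypothetical caller: a repairable failure becomes a retry, not a crash."""
    try:
        validate_tool_request(tool_request)
    except RepairableException as exc:
        return f"retry: {exc}"
    return "ok"
```

A plain ValueError raised at the same points would not be caught by the `except RepairableException` branch and would propagate out of the step.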

Note: @Krashnicov's review about the escape handling bug in this PR is correct — escaped quotes should NOT toggle in_string. The fix is simply continue on escape_next without any toggle. I used the corrected version in my deployment.
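
For reference, the corrected escape handling looks roughly like this. It is a sketch of the approach rather than the exact code I deployed; `extract_json_object` is a made-up name so it is not confused with the real extract_json_object_string().

```python
import json
from typing import Optional


def extract_json_object(text: str) -> Optional[str]:
    """Return the first balanced {...} span, ignoring braces inside strings."""
    start = text.find("{")
    if start == -1:
        return None
    depth = 0
    in_string = False
    escape_next = False
    for i in range(start, len(text)):
        ch = text[i]
        if escape_next:
            # Character after a backslash: skip it, never toggle in_string.
            escape_next = False
            continue
        if in_string:
            if ch == "\\":
                escape_next = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[start : i + 1]
    return None  # no matching close found


raw = r'{"tool_name": "response", "tool_args": {"text": "say \"hi\" and {ok}"}} trailing }'
parsed = json.loads(extract_json_object(raw))
```

The escaped quotes and the `{ok}` inside the string value are both skipped, so the scan ends at the object's own closing brace and the trailing `}` is left out.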

@frdel this is probably the highest-impact stability fix pending for the project — multiple users are hitting this (#624, #841, #1031, #1234, #1241). Would love to see this reviewed and merged.
