Add agent_chat tool for agentic workflows with tool-calling #2
Open
lambertmt wants to merge 8 commits into openconstruct:main from
Conversation
Features:
- New agent_chat tool with stateful conversation management
- Tool definition schema for describing available tools
- Few-shot prompt format for reliable JSON tool call output
- Multi-strategy JSON parsing (code blocks, inline, permissive)
- Conversation continuation with tool results
- Automatic conversation cleanup after 30 minutes
- list_conversations debug tool

Enables Claude to delegate tasks to local LLMs while maintaining control of tool execution - a cost-effective hybrid approach.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
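A tool definition along these lines illustrates the schema idea (field names here are an assumption in the style of common function-calling schemas, not the PR's exact format):

```json
{
  "name": "ssh_exec",
  "description": "Run a shell command on the remote host over SSH",
  "parameters": {
    "type": "object",
    "properties": {
      "command": { "type": "string", "description": "Command to execute" }
    },
    "required": ["command"]
  }
}
```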
The parseToolCall method now properly handles:
- Nested JSON objects (e.g., {"tool": "x", "arguments": {...}})
- Braces inside string values (e.g., "command": "awk '{print}'")
Replaced regex-based extraction with balanced brace tracking that
respects string escaping, enabling reliable tool call detection
from LLM output.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
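Balanced-brace extraction along these lines handles both cases (a sketch with illustrative names; the PR's actual `parseToolCall` may differ):

```typescript
// Scan for the first complete JSON object, tracking brace depth while
// ignoring braces that appear inside string literals (respecting \" escapes).
function extractFirstJsonObject(text: string): string | null {
  const start = text.indexOf("{");
  if (start === -1) return null;
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue; // braces inside strings never affect depth
    }
    if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}") {
      depth--;
      if (depth === 0) return text.slice(start, i + 1);
    }
  }
  return null; // no balanced object found
}

// Braces inside string values no longer break extraction:
const out = extractFirstJsonObject(
  `Sure! {"tool": "ssh_exec", "arguments": {"command": "awk '{print}'"}} done`
);
```

Unlike a regex, this cannot be fooled by a `}` embedded in a command string, and nested `arguments` objects are captured whole.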
- Remove dangerous single-quote replacement in tryParseToolJson that broke JSON containing single quotes in string values (e.g., awk '$3')
- Add DEBUG_MCP env var to enable detailed logging of:
  - parseToolCall input/output and strategies
  - agent_chat conversation flow and LLM responses
- Try JSON parsing strategies in order: as-is, trailing comma fix

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
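The strategy ordering might be sketched as follows (illustrative shape, not the PR's exact code):

```typescript
// Try parsing strategies in order: strict JSON first, then a trailing-comma
// repair pass. Note there is deliberately no single-quote rewriting, so
// values like "awk '$3'" survive intact. The trailing-comma regex is a
// simplification: it does not guard against commas inside string values.
function tryParseToolJson(raw: string): unknown | null {
  const candidates = [
    raw,                                  // strategy 1: as-is
    raw.replace(/,\s*([}\]])/g, "$1"),    // strategy 2: strip trailing commas
  ];
  for (const candidate of candidates) {
    try {
      return JSON.parse(candidate);
    } catch {
      // fall through to the next strategy
    }
  }
  return null;
}
```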
Major changes:
- agent_chat now auto-executes ssh_exec internally (no CC middleman)
- Agentic loop runs until final_answer or max_iterations
- Strict prompt guidelines for clean JSON/text output formatting
- Remove max_tokens limit (local tokens are free with 128K context)
- ssh_exec added as a built-in tool automatically
- Reports tools_executed in the response for transparency

This enables massive CC token savings - raw tool output (e.g., logs) never touches Claude's context; only the final analysis is returned.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
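The internal agentic loop described above can be sketched roughly like this (a hypothetical shape with a deliberately simplified parser; the PR's actual implementation is more involved):

```typescript
type ToolCall = { tool: string; arguments: Record<string, unknown> };

// Minimal stand-in parser: treat the whole reply as a JSON tool call,
// or null if it is plain text.
function parseToolCallSimple(text: string): ToolCall | null {
  try {
    return JSON.parse(text) as ToolCall;
  } catch {
    return null;
  }
}

// Loop until the model emits final_answer (or plain text), or the
// iteration budget runs out. Raw tool output stays in the local history
// and never reaches the orchestrator's context.
async function runAgentLoop(
  prompt: string,
  llm: (history: string[]) => Promise<string>,
  execTool: (call: ToolCall) => Promise<string>,
  maxIterations = 10
): Promise<{ answer: string; toolsExecuted: string[] }> {
  const history = [prompt];
  const toolsExecuted: string[] = [];
  for (let i = 0; i < maxIterations; i++) {
    const reply = await llm(history);
    const call = parseToolCallSimple(reply);
    if (!call || call.tool === "final_answer") {
      // Plain text or an explicit final_answer ends the loop.
      return { answer: call ? String(call.arguments.text) : reply, toolsExecuted };
    }
    const result = await execTool(call); // e.g. ssh_exec, executed internally
    toolsExecuted.push(call.tool);
    history.push(reply, `Tool result: ${result}`);
  }
  return { answer: "max_iterations reached without final_answer", toolsExecuted };
}
```

Returning `toolsExecuted` alongside the answer is what gives the orchestrator transparency into what ran, without paying tokens for the raw output.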
Documents real-world testing showing 70-90% Claude token savings:
- Log analysis: 15,000 tokens → 1,500 tokens (90% reduction)
- System health check: 15,000 tokens → 4,500 tokens (70% reduction)

Includes architecture diagrams, usage examples, and a configuration guide for the autonomous agent with internal tool execution.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add "How the Token Math Works" section explaining the token breakdown
- Correct savings percentages: 40-80% (was 70-90%)
- Add a Local Tokens column to the comparison tables
- Update video script with consistent numbers and explanations

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Show tokens shifting from Claude (paid) to the local LLM (free)
- Add security audit (93%) and Docker logs (95%) as top examples
- Update all claims to "up to 95%" based on actual testing
- Include 8 test cases sorted by Claude Direct tokens
- Add test-scripts/health_check.sh generated by the agent

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Acknowledge CC Token Saver, Ollama Claude, and Rubber Duck MCP as prior art
- Position as an infrastructure-focused implementation, not a novel invention
- Add comparison tables: when to use this vs. other options
- Add PORTABILITY.md for others evaluating the tool
- Add posts/ with Reddit and Substack drafts
- Update video script with an honest intro and metadata
- Add Nextcloud debugging use case to the stats table

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Summary
This PR adds support for multi-turn agent conversations where the local LLM can request tool calls that the orchestrating system (Claude) executes.
Features
- agent_chat tool - Start or continue agent conversations with tool-calling support
- list_conversations debug tool - Monitor active agent sessions

Use Case
This enables a hybrid architecture where Claude (or other orchestrators) can delegate tasks to a local LLM while maintaining control of tool execution.
Example
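A hypothetical agent_chat invocation (argument names here are illustrative, not verified against the PR's schema) might look like:

```json
{
  "tool": "agent_chat",
  "arguments": {
    "message": "Summarize the last 200 lines of /var/log/syslog and flag any errors",
    "max_iterations": 5
  }
}
```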
Testing
Tested with gpt-oss-120b model via llama-server. The few-shot prompt format produces reliable JSON tool calls.
Test Plan
🤖 Generated with Claude Code