Add agent_chat tool for agentic workflows with tool-calling #2
Open
lambertmt wants to merge 8 commits into openconstruct:main from
Conversation
Features:
- New agent_chat tool with stateful conversation management
- Tool definition schema for describing available tools
- Few-shot prompt format for reliable JSON tool call output
- Multi-strategy JSON parsing (code blocks, inline, permissive)
- Conversation continuation with tool results
- Automatic conversation cleanup after 30 minutes
- list_conversations debug tool

Enables Claude to delegate tasks to local LLMs while maintaining control of tool execution - a cost-effective hybrid approach.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
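A tool definition along these lines illustrates the schema idea (field names here are an assumption in the style of common function-calling schemas, not the PR's exact format):

```json
{
  "name": "ssh_exec",
  "description": "Run a shell command on the remote host over SSH",
  "parameters": {
    "type": "object",
    "properties": {
      "command": { "type": "string", "description": "Command to execute" }
    },
    "required": ["command"]
  }
}
```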
The parseToolCall method now properly handles:
- Nested JSON objects (e.g., {"tool": "x", "arguments": {...}})
- Braces inside string values (e.g., "command": "awk '{print}'")
Replaced regex-based extraction with balanced brace tracking that
respects string escaping, enabling reliable tool call detection
from LLM output.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
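Balanced-brace extraction along these lines handles both cases (a sketch with illustrative names; the PR's actual `parseToolCall` may differ):

```typescript
// Scan for the first complete JSON object, tracking brace depth while
// ignoring braces that appear inside string literals (respecting \" escapes).
function extractFirstJsonObject(text: string): string | null {
  const start = text.indexOf("{");
  if (start === -1) return null;
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue; // braces inside strings never affect depth
    }
    if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}") {
      depth--;
      if (depth === 0) return text.slice(start, i + 1);
    }
  }
  return null; // no balanced object found
}

// Braces inside string values no longer break extraction:
const out = extractFirstJsonObject(
  `Sure! {"tool": "ssh_exec", "arguments": {"command": "awk '{print}'"}} done`
);
```

Unlike a regex, this cannot be fooled by a `}` embedded in a command string, and nested `arguments` objects are captured whole.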
- Remove dangerous single-quote replacement in tryParseToolJson that broke JSON containing single quotes in string values (e.g., awk '$3')
- Add DEBUG_MCP env var to enable detailed logging of:
  - parseToolCall input/output and strategies
  - agent_chat conversation flow and LLM responses
- Try JSON parsing strategies in order: as-is, trailing comma fix

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
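The strategy ordering might be sketched as follows (illustrative shape, not the PR's exact code):

```typescript
// Try parsing strategies in order: strict JSON first, then a trailing-comma
// repair pass. Note there is deliberately no single-quote rewriting, so
// values like "awk '$3'" survive intact. The trailing-comma regex is a
// simplification: it does not guard against commas inside string values.
function tryParseToolJson(raw: string): unknown | null {
  const candidates = [
    raw,                                  // strategy 1: as-is
    raw.replace(/,\s*([}\]])/g, "$1"),    // strategy 2: strip trailing commas
  ];
  for (const candidate of candidates) {
    try {
      return JSON.parse(candidate);
    } catch {
      // fall through to the next strategy
    }
  }
  return null;
}
```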
Major changes:
- agent_chat now auto-executes ssh_exec internally (no CC middleman)
- Agentic loop runs until final_answer or max_iterations
- Strict prompt guidelines for clean JSON/text output formatting
- Remove max_tokens limit (local tokens are free with 128K context)
- ssh_exec added as a built-in tool automatically
- Reports tools_executed in the response for transparency

This enables massive CC token savings - raw tool output (e.g., logs) never touches Claude's context; only the final analysis is returned.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
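The internal agentic loop described above can be sketched roughly like this (a hypothetical shape with a deliberately simplified parser; the PR's actual implementation is more involved):

```typescript
type ToolCall = { tool: string; arguments: Record<string, unknown> };

// Minimal stand-in parser: treat the whole reply as a JSON tool call,
// or null if it is plain text.
function parseToolCallSimple(text: string): ToolCall | null {
  try {
    return JSON.parse(text) as ToolCall;
  } catch {
    return null;
  }
}

// Loop until the model emits final_answer (or plain text), or the
// iteration budget runs out. Raw tool output stays in the local history
// and never reaches the orchestrator's context.
async function runAgentLoop(
  prompt: string,
  llm: (history: string[]) => Promise<string>,
  execTool: (call: ToolCall) => Promise<string>,
  maxIterations = 10
): Promise<{ answer: string; toolsExecuted: string[] }> {
  const history = [prompt];
  const toolsExecuted: string[] = [];
  for (let i = 0; i < maxIterations; i++) {
    const reply = await llm(history);
    const call = parseToolCallSimple(reply);
    if (!call || call.tool === "final_answer") {
      // Plain text or an explicit final_answer ends the loop.
      return { answer: call ? String(call.arguments.text) : reply, toolsExecuted };
    }
    const result = await execTool(call); // e.g. ssh_exec, executed internally
    toolsExecuted.push(call.tool);
    history.push(reply, `Tool result: ${result}`);
  }
  return { answer: "max_iterations reached without final_answer", toolsExecuted };
}
```

Returning `toolsExecuted` alongside the answer is what gives the orchestrator transparency into what ran, without paying tokens for the raw output.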
Documents real-world testing showing 70-90% Claude token savings:
- Log analysis: 15,000 tokens → 1,500 tokens (90% reduction)
- System health check: 15,000 tokens → 4,500 tokens (70% reduction)

Includes architecture diagrams, usage examples, and a configuration guide for the autonomous agent with internal tool execution.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add "How the Token Math Works" section explaining the token breakdown
- Correct savings percentages: 40-80% (was 70-90%)
- Add a Local Tokens column to the comparison tables
- Update video script with consistent numbers and explanations

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Show tokens shifting from Claude (paid) to the local LLM (free)
- Add security audit (93%) and Docker logs (95%) as top examples
- Update all claims to "up to 95%" based on actual testing
- Include 8 test cases sorted by Claude Direct tokens
- Add test-scripts/health_check.sh generated by the agent

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Acknowledge CC Token Saver, Ollama Claude, and Rubber Duck MCP as prior art
- Position as an infrastructure-focused implementation, not a novel invention
- Add comparison tables: when to use this vs. other options
- Add PORTABILITY.md for others evaluating the tool
- Add posts/ with Reddit and Substack drafts
- Update video script with an honest intro and metadata
- Add Nextcloud debugging use case to the stats table

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Summary
This PR adds support for multi-turn agent conversations where the local LLM can request tool calls that the orchestrating system (Claude) executes.
Features
- agent_chat tool - Start or continue agent conversations with tool-calling support
- list_conversations debug tool - Monitor active agent sessions

Use Case
This enables a hybrid architecture where Claude (or other orchestrators) can delegate tasks to a local LLM while maintaining control of tool execution.
Example
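A hypothetical agent_chat invocation (argument names here are illustrative, not verified against the PR's schema) might look like:

```json
{
  "tool": "agent_chat",
  "arguments": {
    "message": "Summarize the last 200 lines of /var/log/syslog and flag any errors",
    "max_iterations": 5
  }
}
```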
Testing
Tested with gpt-oss-120b model via llama-server. The few-shot prompt format produces reliable JSON tool calls.
Test Plan
🤖 Generated with Claude Code