Bug fix: Fix the Git Service when running on Ollama #219
esnible wants to merge 2 commits into kagenti:main
Conversation
Signed-off-by: Ed Snible <snible@us.ibm.com>
rubambiza left a comment
Review Summary
The MCP port fix (9090 -> 8000) is correct and matches Kagenti's default tool port. The /v1 URL suffix and model ID change look intentional for OpenAI-compatible mode via litellm. One issue: the file header comment is now stale.
Areas reviewed: Config/env
Commits: 1 commit, signed-off
CI status: All passing (10/10)
```diff
 # LLM configuration
-TASK_MODEL_ID=ollama_chat/ibm/granite4:latest
+TASK_MODEL_ID=gpt-oss:latest
```
must-fix: The header comment on line 5 still says `ollama pull ibm/granite4:latest` but the model ID is now `gpt-oss:latest`. Please update the comment to match the new model, or explain what gpt-oss is and what prerequisite pull command is needed.
Signed-off-by: Ed Snible <esnible@acm.org>
rubambiza left a comment
Please see below concern about model switch.
```diff
 # Uses a local Ollama instance for LLM inference.
 # Prerequisite: Ollama must be running with the model pulled:
-#   ollama pull ibm/granite4:latest
+#   ollama pull gpt-oss:latest
```
If this is intended for local development and testing of a PoC, I have concerns about the switch to gpt-oss. The smallest version, at 20B parameters, requires 12-16GB of RAM. The granite models, on the other hand, range from 350M to 32B parameters. My intuition is that granite should be good enough for a git issue agent, so I'd definitely recommend that we stick with them. This is coming from my experience developing kagenti on a laptop with only 16GB of RAM.
Good point - I agree, let's try to stick with the smallest possible model for this in terms of footprint.
I also agree. However, the granite model didn't work for me -- the tools were not called. The weather service uses `llama3.2:3b-instruct-fp16`, which also didn't work. Can you suggest a model that works?
For testing, I was deploying with the current environment variables and running `kubectl set env` with models from other `.env.ollama` files, with queries such as `Tell me about https://github.com/kagenti/agent-examples/issues/208`.
@rubambiza Can you or someone on the team provide guidance on what we are looking for in an open source model for demos? I have tried my query, `What do you think about https://github.com/kagenti/agent-examples/issues/218 ?`, with the models used by all the Kagenti ollama examples and with some popular recent small models:
| model | output |
|---|---|
| granite4:latest | outputs a thought |
| llama3.2:3b-instruct-fp16 | outputs a thought |
| gpt-oss:latest | works |
| phi4-mini | outputs a thought |
| gemma3 | gemma3:latest does not support tools |
At this point I can either try more models in the hopes of finding one that meets your criteria, or I can assume there is some bug in the agent that prevents it from going from thought to action. What do you think?
@esnible In our recent meeting (I didn't have time to capture her suggestion), @kellyaa mentioned there might be some peculiarities to do with prompting granite in particular and with the underlying agent framework itself. I hope @kellyaa can restate her insight here, and hopefully it can help us move forward.
pdettori left a comment
Re-review: Model Guidance + Suggested Path Forward
Areas reviewed: Config/env, LLM integration code (llm.py, config.py, prompts.py, main.py)
Commits: 2 commits, both signed-off
CI status: All 10 checks passing
What's Correct and Should Merge
The MCP port fix (9090 → 8000) is correct and matches Kagenti's default tool port. The /v1 suffix, API key comment, and OLLAMA_API_BASE removal are fine. These changes are needed regardless of the model question.
Suggested Path Forward
Split this PR into two changes:
- Merge now: MCP port fix + comment updates (uncontested, correct)
- Follow-up PR: Model selection, once a small model is validated
Why Small Models Fail at Tool Calling
Ed's test results are consistent with a well-understood limitation. The agent uses CrewAI's ReAct-style loop, which requires the model to output a strict multi-line format (Thought: → Action: → Action Input: {JSON}). The "outputs a thought" failure means the model generates reasoning but doesn't follow through with the action format.
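The "outputs a thought" failure mode can be made concrete with a small sketch. This is not CrewAI's or the repo's actual code — the regex and function names are hypothetical — but it mirrors how a ReAct-style executor can only act when the model emits the complete block:

```python
import json
import re

# Hypothetical parser sketch: a ReAct executor can only invoke a tool when the
# model output contains the full Thought: -> Action: -> Action Input: {JSON}
# sequence. A reply that stops after "Thought:" never triggers a tool call.
ACTION_RE = re.compile(
    r"Action:\s*(?P<tool>\S+)\s*Action Input:\s*(?P<args>\{.*\})", re.DOTALL
)

def extract_tool_call(model_output: str):
    """Return (tool_name, args) for a complete action block, else None."""
    match = ACTION_RE.search(model_output)
    if match is None:
        return None  # the model stopped at "Thought:" -- no tool is ever called
    return match.group("tool"), json.loads(match.group("args"))

# "outputs a thought": reasoning with no action block, so nothing happens.
print(extract_tool_call("Thought: I should look up issue 218."))  # -> None

# "works": a complete block that the executor can dispatch.
print(extract_tool_call(
    'Thought: fetch it.\nAction: get_issue\nAction Input: {"issue_number": 218}'
))  # -> ('get_issue', {'issue_number': 218})
```

So a model that reasons well but does not follow the exact output format fails in a way that is indistinguishable, from the user's side, from a broken agent.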
Three factors contribute:
- Native tool support in Ollama — Ollama maintains a curated list of models that support the `tools` parameter (ollama.com/search?c=tools). Models without proper tool-call templates in their model files simply cannot emit structured tool calls. `granite4` may not be on this list.
- The `ollama_chat/` prefix matters — When `TASK_MODEL_ID` starts with `ollama_chat/`, litellm routes through Ollama's native chat API. When the prefix is dropped and `/v1` is added to the base URL, litellm routes through the OpenAI-compatible endpoint. This is a meaningful routing change that affects how tool calls are handled.
- Hidden `num_ctx` regression — In `git_issue_agent/llm.py:12-13`, the code only sets `num_ctx=8192` for models prefixed with `ollama/` or `ollama_chat/`. With `gpt-oss:latest` (no prefix), this override doesn't trigger, leaving Ollama's default 2048-token context. The prompts in `prompts.py` are quite large — this context limit alone could cause silent failures even with a capable model.
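Factors 2 and 3 can be sketched together. Both function names are illustrative (neither is from the repo), and the routing table is my understanding of litellm's prefix behavior, not its source:

```python
# Sketch of how the model-ID prefix steers routing, and of the suggested
# num_ctx fix: detect Ollama by its base URL instead of the model prefix,
# so unprefixed models like "gpt-oss:latest" still get the larger context.

def route_for(model_id: str, api_base: str) -> str:
    """Roughly which endpoint litellm targets for a given model ID (assumed)."""
    base = api_base.rstrip("/")
    if model_id.startswith("ollama_chat/"):
        return base + "/api/chat"         # Ollama native chat API
    if model_id.startswith("ollama/"):
        return base + "/api/generate"     # Ollama native completion API
    return base + "/chat/completions"     # OpenAI-compatible route (base ends in /v1)

def build_llm_kwargs(model_id: str, api_base: str) -> dict:
    """Suggested fix: key the num_ctx override off Ollama's default port."""
    kwargs = {"model": model_id, "api_base": api_base}
    if ":11434" in api_base or model_id.startswith(("ollama/", "ollama_chat/")):
        kwargs["num_ctx"] = 8192  # Ollama otherwise defaults to a 2048-token context
    return kwargs

# Prefix-based detection misses gpt-oss:latest; the URL-based check catches it.
print(build_llm_kwargs("gpt-oss:latest", "http://ollama:11434/v1"))
```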
Recommended Models to Test
Tool/function calling is a specific model capability, not just a matter of parameter count. For any candidate, verify it appears in Ollama's tools-capable list.
| Model | Size (disk) | Tool calling | Notes |
|---|---|---|---|
| `qwen2.5:7b` | ~4.4 GB | ✅ Excellent | Best balance of size vs capability for dev laptops |
| `llama3.1:8b` | ~4.7 GB | ✅ Good | Native function calling; different from llama3.2:3b |
| `mistral:7b` | ~4.1 GB | ✅ Good | Established function calling support |
| `qwen2.5:3b` | ~1.9 GB | | Worth testing if minimal footprint needed |
`qwen2.5:7b` would be my first recommendation — it fits in 16 GB RAM alongside Ollama overhead and has strong tool-calling benchmarks.
Action Items
- Split PR: extract MCP port fix into a mergeable change
- Fix `llm.py` to set `num_ctx` regardless of model prefix (detect Ollama via base URL containing `:11434` rather than model ID prefix)
- Test `qwen2.5:7b` with Ed's query: `What do you think about https://github.com/kagenti/agent-examples/issues/218 ?`
- If `qwen2.5:7b` works, use it as the `.env.ollama` default
```diff
 # LLM configuration
-TASK_MODEL_ID=ollama_chat/ibm/granite4:latest
+TASK_MODEL_ID=gpt-oss:latest
```
Model selection guidance — The switch to gpt-oss works but requires 12-16 GB RAM, which defeats the purpose of a local dev config.
The root cause of the granite4 failure is likely that it lacks native tool-calling support in Ollama (check ollama.com/search?c=tools). The ReAct format in `prompts.py` requires strong instruction-following that sub-7B models struggle with.
Recommended: Try `qwen2.5:7b` (~4.4 GB) — it has excellent tool-calling support and fits on a 16 GB laptop. The config would be:
```
TASK_MODEL_ID=qwen2.5:7b
```
Also note: dropping the `ollama_chat/` prefix means `llm.py:12-13` no longer sets `num_ctx=8192`, falling back to Ollama's 2048-token default. This could cause failures even with a capable model. Consider filing a follow-up to fix the `num_ctx` detection logic.
I tried the recommended models. None of them were able to respond meaningfully to the query What do you think about https://github.com/kagenti/agent-examples/issues/218 ?.
| Model | Output |
|---|---|
| qwen2.5:7b | Thought: The user provided owner and repo; list_issues fits. Action: list_issues Action Input: {"after": null, "direction": null, "labels": null, "orderBy": null, "owner": "kagenti", "perPage": 100, "repo": "agent-examples", "since": null, "state": null} |
| llama3.1:8b | Thought: The user provided owner and repo; list_sub_issues fits. Action: list_sub_issues Action Input: {"owner": "kagenti", "repo": "agent-examples", "issue_number": 218} |
| mistral:7b | Thought: The user provided owner and repo along with an issue number; list_sub_issues fits. Action: list_sub_issues Action Input: {"owner":"kagenti","repo":"agent-examples","issueNumber":218} |
| qwen2.5:3b | Thought: I now know the final answer Final Answer: The issue with ID 218 in the repository kagenti/agent-examples is titled "Issue Title Not Provided". As of my last query, there are no comments or reactions associated with this issue. To provide a more comprehensive report, I would need to fetch additional details such as the state (open/closed), labels, and any related sub-issues. |
I haven't been able to get any model other than gpt-oss to produce output that isn't wrong.
It might be worthwhile to merge this as-is and create a new Issue to investigate why both the Git agent and the Simple Generalist ( https://github.com/kagenti/agent-examples/blob/main/a2a/simple_generalist/.env.ollama#L7 ) use gpt-oss with Ollama if it is too large.
```diff
-# MCP Tool endpoint
-MCP_URL=http://github-tool-mcp:9090/mcp
+# Port 8000 is the default for Kagenti Tools
+MCP_URL=http://github-tool-mcp:8000/mcp
```
✅ Correct — Port 8000 is Kagenti's default for tools. This fix (and the comment) should be merged independently of the model discussion.
Summary
We supply a configuration for the Git Service when running on Kagenti under Ollama, but it doesn't work.
For example, the chat query `What do you think about https://github.com/kagenti/agent-examples/issues/218 ?` yields an error.
The default port for an MCP tool in Kagenti is 8000, but this Agent example assumes the MCP server will be deployed on 9090.
(Optional) Testing Instructions