
Bug fix: Fix the Git Service when running on Ollama #219

Open
esnible wants to merge 2 commits into kagenti:main from esnible:git-service-on-ollama

Conversation

Contributor

@esnible esnible commented Apr 3, 2026

Summary

We supply a configuration for the Git Service when running on Kagenti under Ollama, but it doesn't work.

For example, the chat query What do you think about https://github.com/kagenti/agent-examples/issues/218 ? yields

Thought: The user query involves retrieving issues from a specific GitHub repository and issue number. Action: list_issues Action Input: {"owner":"kagenti","repo":"agent-examples","state":"all"}

The default port for an MCP tool in Kagenti is 8000, but this Agent example assumes the MCP server will be deployed on 9090.

(Optional) Testing Instructions

  1. Deploy the Git Tool
  2. Import the agent at http://kagenti-ui.localtest.me:8080/agents/import

Signed-off-by: Ed Snible <snible@us.ibm.com>
Contributor

@rubambiza rubambiza left a comment


Review Summary

The MCP port fix (9090 → 8000) is correct and matches Kagenti's default tool port. The /v1 URL suffix and model ID change look intentional for OpenAI-compatible mode via litellm. One issue: the file header comment is now stale.

Areas reviewed: Config/env
Commits: 1 commit, signed-off
CI status: All passing (10/10)


```diff
 # LLM configuration
-TASK_MODEL_ID=ollama_chat/ibm/granite4:latest
+TASK_MODEL_ID=gpt-oss:latest
```
Contributor


must-fix: The header comment on line 5 still says ollama pull ibm/granite4:latest but the model ID is now gpt-oss:latest. Please update the comment to match the new model, or explain what gpt-oss is and what prerequisite pull command is needed.

Comment thread a2a/git_issue_agent/.env.ollama
Signed-off-by: Ed Snible <esnible@acm.org>
Contributor

@rubambiza rubambiza left a comment


Please see below concern about model switch.

```diff
 # Uses a local Ollama instance for LLM inference.
 # Prerequisite: Ollama must be running with the model pulled:
-# ollama pull ibm/granite4:latest
+# ollama pull gpt-oss:latest
```
Contributor


If this is intended for local development and testing of a PoC, I have concerns about the switch to gpt-oss. The smallest version, at 20B parameters, requires 12-16 GB of RAM. The granite models, on the other hand, range from 350M to 32B parameters. My intuition is that granite should be good enough for a git issue agent, so I'd definitely recommend that we stick with them. This is coming from my experience developing kagenti on a laptop with only 16 GB of RAM.

Contributor


Good point - I agree, let's try to stick with the smallest possible model for this in terms of footprint.

Contributor Author


I also agree. However, the granite model didn't work for me -- the tools were not called. The weather service uses llama3.2:3b-instruct-fp16 which also didn't work. Can you suggest a model that works?

For testing, I was deploying using the current environment variables and doing kubectl set env with models from other .env.ollama files, with queries such as Tell me about https://github.com/kagenti/agent-examples/issues/208.
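For reference, the swap-a-model test loop looks roughly like this (the deployment name `git-issue-agent` and namespace are guesses; substitute whatever the agent is actually deployed as):

```shell
# Override the model on the running agent without redeploying.
# Deployment name is illustrative, not taken from the repo.
kubectl set env deployment/git-issue-agent TASK_MODEL_ID=llama3.2:3b-instruct-fp16

# Wait for the rollout to finish, then re-run the test query in the chat UI.
kubectl rollout status deployment/git-issue-agent
```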

Contributor Author


@rubambiza Can you or someone on the team provide guidance on what we are looking for in an open-source model for demos? I have tried my query, What do you think about https://github.com/kagenti/agent-examples/issues/218 ? with the models used by all the Kagenti ollama examples and with some popular recent small models:

| Model | Output |
|---|---|
| granite4:latest | outputs a thought |
| llama3.2:3b-instruct-fp16 | outputs a thought |
| gpt-oss:latest | works |
| phi4-mini | outputs a thought |
| gemma3 | gemma3:latest does not support tools |

At this point I can either try more models in the hopes of finding one that meets your criteria, or I can assume there is some bug in the agent that prevents it from going from thought to action. What do you think?

Contributor


@esnible In our recent meeting (I didn't have time to capture her suggestion), @kellyaa mentioned there might be some peculiarities to do with prompting granite in particular and the underlying agent framework itself. I hope @kellyaa can restate her insight here; hopefully it can help us move forward.

Comment thread a2a/git_issue_agent/.env.ollama
Contributor

@pdettori pdettori left a comment


Re-review: Model Guidance + Suggested Path Forward

Areas reviewed: Config/env, LLM integration code (llm.py, config.py, prompts.py, main.py)
Commits: 2 commits, both signed-off
CI status: All 10 checks passing

What's Correct and Should Merge

The MCP port fix (9090 → 8000) is correct and matches Kagenti's default tool port. The /v1 suffix, API key comment, and OLLAMA_API_BASE removal are fine. These changes are needed regardless of the model question.

Suggested Path Forward

Split this PR into two changes:

  1. Merge now: MCP port fix + comment updates (uncontested, correct)
  2. Follow-up PR: Model selection, once a small model is validated

Why Small Models Fail at Tool Calling

Ed's test results are consistent with a well-understood limitation. The agent uses CrewAI's ReAct-style loop, which requires the model to output a strict multi-line format (Thought: → Action: → Action Input: {JSON}). The "outputs a thought" failure means the model generates reasoning but doesn't follow through with the action format.

Three factors contribute:

  1. Native tool support in Ollama — Ollama maintains a curated list of models that support the tools parameter (ollama.com/search?c=tools). Models without proper tool-call templates in their model files simply cannot emit structured tool calls. granite4 may not be on this list.

  2. The ollama_chat/ prefix matters — When TASK_MODEL_ID starts with ollama_chat/, litellm routes through Ollama's native chat API. When the prefix is dropped and /v1 is added to the base URL, litellm routes through the OpenAI-compatible endpoint. This is a meaningful routing change that affects how tool calls are handled.

  3. Hidden num_ctx regression — In git_issue_agent/llm.py:12-13, the code only sets num_ctx=8192 for models prefixed with ollama/ or ollama_chat/. With gpt-oss:latest (no prefix), this override doesn't trigger, leaving Ollama's default 2048-token context. The prompts in prompts.py are quite large — this context limit alone could cause silent failures even with a capable model.
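A minimal sketch of the suggested `num_ctx` fix, detecting Ollama from the base URL rather than the model-ID prefix (names such as `build_llm_kwargs` and `api_base` are illustrative, not the actual llm.py API):

```python
def build_llm_kwargs(model_id: str, api_base: str) -> dict:
    """Build LLM kwargs, applying the Ollama context-size override.

    Hypothetical helper mirroring the logic described above; not the
    real llm.py code.
    """
    kwargs = {"model": model_id, "api_base": api_base}
    # Current logic: only the prefix is checked, so "gpt-oss:latest" is missed.
    is_ollama_by_prefix = model_id.startswith(("ollama/", "ollama_chat/"))
    # Suggested logic: Ollama serves on port 11434, so inspect the base URL.
    is_ollama_by_url = ":11434" in api_base
    if is_ollama_by_prefix or is_ollama_by_url:
        # Ollama defaults to a 2048-token context; raise it for the large prompts.
        kwargs["num_ctx"] = 8192
    return kwargs
```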

Recommended Models to Test

Tool/function calling is a specific model capability, not just a matter of parameter count. For any candidate, verify it appears in Ollama's tools-capable list.

| Model | Size (disk) | Tool calling | Notes |
|---|---|---|---|
| qwen2.5:7b | ~4.4 GB | ✅ Excellent | Best balance of size vs capability for dev laptops |
| llama3.1:8b | ~4.7 GB | ✅ Good | Native function calling; different from llama3.2:3b |
| mistral:7b | ~4.1 GB | ✅ Good | Established function calling support |
| qwen2.5:3b | ~1.9 GB | ⚠️ May work | Worth testing if minimal footprint needed |

qwen2.5:7b would be my first recommendation — it fits in 16 GB RAM alongside Ollama overhead and has strong tool-calling benchmarks.

Action Items

  • Split PR: extract MCP port fix into a mergeable change
  • Fix llm.py to set num_ctx regardless of model prefix (detect Ollama via base URL containing :11434 rather than model ID prefix)
  • Test qwen2.5:7b with Ed's query: What do you think about https://github.com/kagenti/agent-examples/issues/218 ?
  • If qwen2.5:7b works, use it as the .env.ollama default


```diff
 # LLM configuration
-TASK_MODEL_ID=ollama_chat/ibm/granite4:latest
+TASK_MODEL_ID=gpt-oss:latest
```
Contributor


Model selection guidance — The switch to gpt-oss works but requires 12-16 GB RAM, which defeats the purpose of a local dev config.

The root cause of the granite4 failure is likely that it lacks native tool-calling support in Ollama (check ollama.com/search?c=tools). The ReAct format in prompts.py requires strong instruction-following that sub-7B models struggle with.

Recommended: Try qwen2.5:7b (~4.4 GB) — it has excellent tool-calling support and fits on a 16 GB laptop. The config would be:

```
TASK_MODEL_ID=qwen2.5:7b
```

Also note: dropping the ollama_chat/ prefix means llm.py:12-13 no longer sets num_ctx=8192, falling back to Ollama's 2048-token default. This could cause failures even with a capable model. Consider filing a follow-up to fix the num_ctx detection logic.

Contributor Author

@esnible esnible Apr 16, 2026


I tried the recommended models. None of them were able to respond meaningfully to the query What do you think about https://github.com/kagenti/agent-examples/issues/218 ?.

| Model | Output |
|---|---|
| qwen2.5:7b | Thought: The user provided owner and repo; list_issues fits. Action: list_issues Action Input: `{"after": null, "direction": null, "labels": null, "orderBy": null, "owner": "kagenti", "perPage": 100, "repo": "agent-examples", "since": null, "state": null}` |
| llama3.1:8b | Thought: The user provided owner and repo; list_sub_issues fits. Action: list_sub_issues Action Input: `{"owner": "kagenti", "repo": "agent-examples", "issue_number": 218}` |
| mistral:7b | Thought: The user provided owner and repo along with an issue number; list_sub_issues fits. Action: list_sub_issues Action Input: `{"owner":"kagenti","repo":"agent-examples","issueNumber":218}` |
| qwen2.5:3b | Thought: I now know the final answer Final Answer: The issue with ID 218 in the repository kagenti/agent-examples is titled "Issue Title Not Provided". As of my last query, there are no comments or reactions associated with this issue. To provide a more comprehensive report, I would need to fetch additional details such as the state (open/closed), labels, and any related sub-issues. |

I haven't been able to get any model other than gpt-oss to produce output that isn't wrong.

It might be worthwhile to merge this as-is, and open a new issue to investigate why both the Git agent and the Simple Generalist ( https://github.com/kagenti/agent-examples/blob/main/a2a/simple_generalist/.env.ollama#L7 ) use gpt-oss with Ollama, if it really is too large.

```diff
 # MCP Tool endpoint
-MCP_URL=http://github-tool-mcp:9090/mcp
+# Port 8000 is the default for Kagenti Tools
+MCP_URL=http://github-tool-mcp:8000/mcp
```
Contributor


Correct — Port 8000 is Kagenti's default for tools. This fix (and the comment) should be merged independently of the model discussion.
