Help configuring Ollama/Continue to split 7B model between 4GB VRAM and 24GB RAM (Exit Status 2) #11788
Unanswered
opaulomatias asked this question in Help
Replies: 1 comment
not sure about this one tbh, but it might be a limitation with the VRAM. ollama's auto-split should handle it, but some models just need more VRAM than it can offer. maybe check their docs or discord for any specific flags or settings.
Hello everyone,
I'm trying to set up Continue to run local models via Ollama, specifically qwen2.5-coder:7b, but I keep running into memory crashes when trying to use file context, and I'm hoping to find a way to properly balance the load between my VRAM and system RAM.

My Hardware:
- 4 GB VRAM
- 24 GB system RAM
The Problem:
If I run the 3B model, everything works perfectly. However, when I load the 7B model and try to use @index.html or @codebase, Continue instantly throws this error:

"llama runner process has terminated: exit status 2"

What I've Tried:
- Lowered the context window in config.yaml by setting num_ctx: 2048 for the 7B model, but it still crashes the moment I attach a file.
- Set num_gpu: 0. Same results.

My Question:
Since Ollama normally auto-splits models, is there a specific config.yaml configuration or Ollama parameter I can use to force the 7B model to use my 4 GB of VRAM for speed, while safely offloading the rest (and the context window) to my 24 GB of RAM without triggering the out-of-memory crash?

Any guidance on how to optimize this specific hardware split would be hugely appreciated!
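Not an answer from the thread, but one way to sketch an explicit split using standard Ollama Modelfile parameters: `num_gpu` in Ollama is the number of model layers offloaded to the GPU (not a boolean), so a partial value keeps the remaining layers in system RAM. The layer count of 12 below is a hypothetical starting point for a 4 GB card and would need tuning downward if the runner still crashes:

```
# Modelfile — hypothetical low-VRAM variant of qwen2.5-coder:7b
FROM qwen2.5-coder:7b

# Keep the context window small so the KV cache fits alongside the layers
PARAMETER num_ctx 2048

# Offload only 12 layers to the 4 GB GPU; the rest stay in system RAM
PARAMETER num_gpu 12
```

Build it with `ollama create qwen2.5-coder-7b-lowvram -f Modelfile`, then point Continue's model entry at `qwen2.5-coder-7b-lowvram` instead of the original tag.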