- Local AI Inference: Uses llama.cpp for low latency, on-device inference == no data leaves your machine.
- LSP-Backed Context: Leverages your existing LSP servers for rich context (completions & signatures).
- Prompt Caching: Caches suggestions for repeated contexts to reduce latency.
- Cross-file Context Chunks: Automatically extracts and includes relevant code snippets from your project files.
- Git-Aware: Respects
.gitignorefor context extraction. - Trigger Characters: Automatically makes fresh requests on trigger characters (e.g.,
.,:,(,{,[).
vim.pack.add({ "https://github.com/davkk/quickfill.nvim" })
-- no need to call setup!
-- the plugin uses `<Plug>` mappings for flexibility
-- you can map them to your preferred keys like this:
vim.keymap.set("i", "<C-y>", "<Plug>(quickfill-accept)") -- accept full suggestion
vim.keymap.set("i", "<C-k>", "<Plug>(quickfill-accept-word)") -- accept next word
vim.keymap.set("i", "<C-l>", "<Plug>(quickfill-accept-replace)") -- accept and replace
vim.keymap.set("i", "<C-x>", "<Plug>(quickfill-trigger)") -- trigger fresh infill requestCustomize behavior via vim.g.quickfill.
Defaults are used if not set explicitly:
vim.g.quickfill = {
url = "http://localhost:8080", -- llama.cpp server URL
n_predict = 128, -- max tokens to predict
temperature = 0.3, -- temperature
top_k = 20, -- top-k sampling
top_p = 0.4, -- top-p sampling
repeat_penalty = 1.5, -- repeat penalty
stop_chars = { "\n", "\r", "\r\n" }, -- stop characters
trigger_chars = { ".", ":", "[", "{", "(" }, -- trigger characters for fresh request
fresh_on_trigger_char = true, -- make fresh request on trigger char
stop_on_trigger_char = false, -- stop generating on trigger char
n_prefix = 16, -- prefix context lines
n_suffix = 16, -- suffix context lines
max_cache_entries = 32, -- max cache entries
extra_chunks = true, -- enable extra project chunks
max_extra_chunks = 6, -- max extra chunks
chunk_lines = 16, -- lines per chunk
lsp_completion = true, -- enable LSP completions
max_lsp_completion_items = 20, -- max LSP completion items
lsp_signature_help = true, -- enable signature help
}Before using the plugin, make sure to have a llama.cpp server running.
Here's an example command to download a model and start the server in the background:
llama-server \
-hf bartowski/Qwen2.5-Coder-0.5B-GGUF:Q4_0 \
--n-gpu-layers 99 \
--ctx-size 0 \
--flash-attn on \
--mlock \
--cache-reuse 256- start plugin with
:AI startor:AI - stop plugin with
:AI stop

