quickfill.nvim

Quick code infill suggestions by combining a local llama.cpp server with active LSP servers.

Features

Local AI Inference: Uses llama.cpp for low-latency, on-device inference, so no data leaves your machine.
LSP-Backed Context: Leverages your existing LSP servers for rich context (completions & signatures).
Prompt Caching: Caches suggestions for repeated contexts to reduce latency.
Cross-file Context Chunks: Automatically extracts and includes relevant code snippets from your project files.
Git-Aware: Respects .gitignore for context extraction.
Trigger Characters: Automatically makes fresh requests on trigger characters (by default ".", ":", "(", "{", and "[").

Installation

vim.pack.add({ "https://github.com/davkk/quickfill.nvim" })

-- no need to call setup!

-- the plugin uses `<Plug>` mappings for flexibility
-- you can map them to your preferred keys like this:
vim.keymap.set("i", "<C-y>", "<Plug>(quickfill-accept)")         -- accept full suggestion
vim.keymap.set("i", "<C-k>", "<Plug>(quickfill-accept-word)")    -- accept next word
vim.keymap.set("i", "<C-l>", "<Plug>(quickfill-accept-replace)") -- accept and replace
vim.keymap.set("i", "<C-x>", "<Plug>(quickfill-trigger)")        -- trigger fresh infill request

Configuration

Customize behavior via vim.g.quickfill.

If vim.g.quickfill is not set explicitly, the following defaults are used:

vim.g.quickfill = {
    url = "http://localhost:8080",          -- llama.cpp server URL

    n_predict = 128,                        -- max tokens to predict
    temperature = 0.3,                      -- temperature
    top_k = 20,                             -- top-k sampling
    top_p = 0.4,                            -- top-p sampling
    repeat_penalty = 1.5,                   -- repeat penalty

    stop_chars = { "\n", "\r", "\r\n" },    -- stop characters
    trigger_chars = { ".", ":", "[", "{", "(" }, -- trigger characters for fresh request
    fresh_on_trigger_char = true,           -- make fresh request on trigger char
    stop_on_trigger_char = false,           -- stop generating on trigger char

    n_prefix = 16,                          -- prefix context lines
    n_suffix = 16,                          -- suffix context lines

    max_cache_entries = 32,                 -- max cache entries

    extra_chunks = true,                    -- enable extra project chunks
    max_extra_chunks = 6,                   -- max extra chunks
    chunk_lines = 16,                       -- lines per chunk

    lsp_completion = true,                  -- enable LSP completions
    max_lsp_completion_items = 20,          -- max LSP completion items

    lsp_signature_help = true,              -- enable signature help
}
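
As a minimal sketch of a partial override, you might set only the options you care about. This assumes the plugin merges your table with its defaults key by key, which is an assumption here rather than documented behavior; if it does not, copy the full default table and edit it instead. The port 8012 and the other values are illustrative, not recommendations.

```lua
-- Hypothetical partial override: only the keys listed here are changed.
-- Assumes every other option is filled in from the plugin's defaults.
vim.g.quickfill = {
    url = "http://localhost:8012", -- assumed port; match your llama-server --port
    n_predict = 64,                -- shorter suggestions
    trigger_chars = { ".", ":" },  -- fewer trigger characters
    extra_chunks = false,          -- skip cross-file context chunks
}
```

It is probably safest to set this early in your config, before the plugin is first used.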

Local Inference Server Setup

Before using the plugin, make sure a llama.cpp server is running.

Here is an example command that downloads a model and starts the server:

llama-server \
    -hf bartowski/Qwen2.5-Coder-0.5B-GGUF:Q4_0 \
    --n-gpu-layers 99 \
    --ctx-size 0 \
    --flash-attn on \
    --mlock \
    --cache-reuse 256
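
To confirm that the server is reachable from Neovim, one option is a quick check against llama-server's /health endpoint. This is a sketch and not part of the plugin: it assumes curl is on your PATH and a Neovim version that provides vim.system (0.10 or later).

```lua
-- Read the configured URL, falling back to the documented default.
local url = (vim.g.quickfill or {}).url or "http://localhost:8080"

-- Run curl synchronously; --fail makes curl exit non-zero on HTTP errors,
-- and --max-time keeps the check from hanging if nothing is listening.
local result = vim.system(
    { "curl", "--silent", "--fail", "--max-time", "2", url .. "/health" },
    { text = true }
):wait()

if result.code == 0 then
    vim.notify("llama.cpp server is reachable at " .. url)
else
    vim.notify("llama.cpp server is not reachable at " .. url, vim.log.levels.WARN)
end
```

You could run this once with :lua or :source after starting the server.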

Commands

Start the plugin with :AI start or :AI.
Stop the plugin with :AI stop.
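
If you toggle the plugin often, you could bind these commands to keys. The mappings below are a sketch; the <leader> keys are arbitrary choices and not something the plugin defines.

```lua
-- Hypothetical normal-mode mappings for the :AI commands described above.
vim.keymap.set("n", "<leader>as", "<cmd>AI start<CR>", { desc = "quickfill: start" })
vim.keymap.set("n", "<leader>ax", "<cmd>AI stop<CR>", { desc = "quickfill: stop" })
```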
