quickfill-logo

quickfill.nvim


Quick code infill suggestions by combining a local llama.cpp server with active LSP servers.

quickfill-demo

Features

  • Local AI Inference: Uses llama.cpp for low-latency, on-device inference, so no data leaves your machine.
  • LSP-Backed Context: Leverages your existing LSP servers for rich context (completions & signatures).
  • Prompt Caching: Caches suggestions for repeated contexts to reduce latency.
  • Cross-file Context Chunks: Automatically extracts and includes relevant code snippets from your project files.
  • Git-Aware: Respects .gitignore for context extraction.
  • Trigger Characters: Automatically makes fresh requests on trigger characters (e.g., ., :, (, {, [).

Installation

vim.pack.add({ "https://github.com/davkk/quickfill.nvim" })

-- no need to call setup!

-- the plugin uses `<Plug>` mappings for flexibility
-- you can map them to your preferred keys like this:
vim.keymap.set("i", "<C-y>", "<Plug>(quickfill-accept)")         -- accept full suggestion
vim.keymap.set("i", "<C-k>", "<Plug>(quickfill-accept-word)")    -- accept next word
vim.keymap.set("i", "<C-l>", "<Plug>(quickfill-accept-replace)") -- accept and replace
vim.keymap.set("i", "<C-x>", "<Plug>(quickfill-trigger)")        -- trigger fresh infill request

Configuration

Customize behavior via vim.g.quickfill.

Any option you don't set explicitly falls back to the defaults below:

vim.g.quickfill = {
    url = "http://localhost:8080",          -- llama.cpp server URL

    n_predict = 128,                        -- max tokens to predict
    temperature = 0.3,                      -- sampling temperature
    top_k = 20,                             -- top-k sampling
    top_p = 0.4,                            -- top-p sampling
    repeat_penalty = 1.5,                   -- repetition penalty

    stop_chars = { "\n", "\r", "\r\n" },    -- stop characters
    trigger_chars = { ".", ":", "[", "{", "(" }, -- trigger characters for fresh request
    fresh_on_trigger_char = true,           -- make fresh request on trigger char
    stop_on_trigger_char = false,           -- stop generating on trigger char

    n_prefix = 16,                          -- prefix context lines
    n_suffix = 16,                          -- suffix context lines

    max_cache_entries = 32,                 -- max cache entries

    extra_chunks = true,                    -- enable extra project chunks
    max_extra_chunks = 6,                   -- max extra chunks
    chunk_lines = 16,                       -- lines per chunk

    lsp_completion = true,                  -- enable LSP completions
    max_lsp_completion_items = 20,          -- max LSP completion items

    lsp_signature_help = true,              -- enable signature help
}
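
For example, to point the plugin at a server on a non-default port and turn off cross-file chunks while keeping every other default (the port 8012 here is just a placeholder; unset keys fall back to the defaults above):

vim.g.quickfill = {
    url = "http://localhost:8012",  -- llama-server started with --port 8012
    extra_chunks = false,           -- skip cross-file context chunks
}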

Local Inference Server Setup

Before using the plugin, make sure you have a llama.cpp server running.

Here's an example command that downloads a model (if it isn't cached yet) and starts the server:

llama-server \
    -hf bartowski/Qwen2.5-Coder-0.5B-GGUF:Q4_0 \
    --n-gpu-layers 99 \
    --ctx-size 0 \
    --flash-attn on \
    --mlock \
    --cache-reuse 256
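
Once the server is up, you can sanity-check that Neovim can reach it before expecting suggestions; a quick sketch using vim.system and curl (assumes curl is installed and the default URL from the configuration above; llama-server exposes a /health endpoint):

vim.system(
    { "curl", "-s", "http://localhost:8080/health" },
    { text = true },
    function(out)
        vim.schedule(function()
            -- a zero exit code with a JSON body means the server is ready
            print(out.code == 0 and out.stdout or "llama.cpp server not reachable")
        end)
    end
)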

Commands

  • Start the plugin with :AI start (or simply :AI).
  • Stop it with :AI stop.
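
If you'd rather toggle this from a keymap than type the command, a minimal sketch (the <leader>as and <leader>ax keys are arbitrary placeholders):

vim.keymap.set("n", "<leader>as", "<cmd>AI start<cr>") -- start quickfill
vim.keymap.set("n", "<leader>ax", "<cmd>AI stop<cr>")  -- stop quickfill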
