Commits (25)
- `03cc708` Fix parser: handle multi-line and edge case parsing (JesseHerrick, Apr 12, 2026)
- `1ecda73` Handle more edge-cases (JesseHerrick, Apr 13, 2026)
- `cb03bb0` Replace line-based parser with tokenizer and walker (JesseHerrick, Apr 13, 2026)
- `3bea731` Handle transitive use lookups using the tokenizer (JesseHerrick, Apr 13, 2026)
- `b1149a1` Rewrite LSP extraction functions to use tokenizer; fix delegate chain… (JesseHerrick, Apr 13, 2026)
- `e4dd9ae` Remove now-dead code (JesseHerrick, Apr 13, 2026)
- `5dc449c` Fix Keyword.fetch! use scenario (JesseHerrick, Apr 13, 2026)
- `7ede6cc` Fix off-by-one bug (JesseHerrick, Apr 13, 2026)
- `694039a` Fix tokenizer scope and inline-def walking in LSP parsing (JesseHerrick, Apr 13, 2026)
- `a4e3595` Fix processModuleDef scope tracking for split-line do and inline do: … (JesseHerrick, Apr 13, 2026)
- `0aa0f17` Fix infinite loop in multi-alias brace scanning on unexpected tokens (JesseHerrick, Apr 13, 2026)
- `5c91c82` Merge remote-tracking branch 'origin/main' into feat-tokenizer (JesseHerrick, Apr 13, 2026)
- `6d2686e` Fix `require` with `as:` not registering aliases for go-to-definition (JesseHerrick, Apr 13, 2026)
- `55be4e8` Remove dead code: TokDo followed by TokColon check (JesseHerrick, Apr 13, 2026)
- `d4a6e2c` Fix tokenizer losing line count on escaped newlines in strings/heredocs (JesseHerrick, Apr 14, 2026)
- `9eb3926` Tokenize all the things! (JesseHerrick, Apr 14, 2026)
- `7b90104` Fix bugbot reported issues (JesseHerrick, Apr 14, 2026)
- `e7c0dfc` Refactor token-walk parsing with shared helpers (JesseHerrick, Apr 14, 2026)
- `39ea46d` Fix moduledoc sigil parsing and attribute definition matching (JesseHerrick, Apr 14, 2026)
- `3eb476d` Fix no-paren call context misclassifying Elixir keywords (JesseHerrick, Apr 14, 2026)
- `be7551a` Fix performance and quality issues (JesseHerrick, Apr 14, 2026)
- `38959c3` Add regression tests for alias parsing edge cases (JesseHerrick, Apr 14, 2026)
- `ce1086b` Tighten alias-block and same-line alias/require regression tests (JesseHerrick, Apr 14, 2026)
- `2547e12` Fix Bugbot issues: off-by-one loop, code duplication, and defprotocol… (JesseHerrick, Apr 14, 2026)
- `24c125c` Even more bug fixes (JesseHerrick, Apr 15, 2026)
docs/architecture.md (11 changes: 5 additions & 6 deletions)
@@ -5,9 +5,9 @@ Dexter is a fast Elixir LSP server. It indexes module and function definitions f
## Module structure

- `cmd/main.go` — CLI entrypoint: `init`, `reindex`, `lookup`, `lsp` subcommands
-- `internal/parser/` — Regex-based Elixir parser. Extracts defmodule, def, defp, defmacro, defdelegate, defguard, defprotocol, defimpl, @type, @callback. Handles heredocs, module nesting, alias resolution for defdelegate targets.
+- `internal/parser/` — Elixir parser backed by a hand-rolled tokenizer (`tokenizer.go`). The tokenizer produces a flat token stream (handling heredocs, sigils, strings, comments as opaque tokens) and `parser_tokenized.go` walks it to extract defmodule, def, defp, defmacro, defdelegate, defguard, defprotocol, defimpl, @type, @callback, alias, import, use, and Module.function references. Handles module nesting, alias resolution for defdelegate targets, and multi-line expressions natively via bracket depth tracking.
- `internal/store/` — SQLite layer. Tables: `files` (path + mtime), `definitions` (module, function, kind, line, file_path, delegate_to, delegate_as), `refs` (module, function, line, file_path, kind).
-- `internal/lsp/` — LSP server. `server.go` handles all LSP methods. `elixir.go` contains pure functions for cursor expression extraction, alias/import resolution, use-chain parsing. `rename.go` has rename helpers. `hover.go` has hover formatting. `documents.go` is an in-memory open-buffer store.
+- `internal/lsp/` — LSP server. `server.go` handles all LSP methods. `elixir.go` contains pure functions for cursor expression extraction, alias/import/use extraction (tokenizer-based), and use-chain parsing. `rename.go` has rename helpers. `hover.go` has hover formatting. `documents.go` is an in-memory open-buffer store.
- `internal/treesitter/` — Tree-sitter integration for scope-aware variable rename and go-to-references.
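The tokenizer-plus-walker design described above can be illustrated with a minimal sketch (this is not the actual `internal/parser` code; `tok` and `scan` are hypothetical stand-ins). It shows two properties the entry calls out: strings are consumed as single opaque tokens, and line counting survives escaped newlines inside them (cf. commit d4a6e2c):

```go
package main

import (
	"fmt"
	"unicode"
)

// tok is a hypothetical token; the real parser's Token type differs.
type tok struct {
	kind string // "ident", "string", "punct"
	text string
	line int
}

// scan produces a flat token stream, treating double-quoted strings as
// opaque tokens so multi-line content never confuses the walker.
func scan(src string) []tok {
	var out []tok
	line := 1
	rs := []rune(src)
	for i := 0; i < len(rs); {
		r := rs[i]
		switch {
		case r == '\n':
			line++
			i++
		case unicode.IsSpace(r):
			i++
		case r == '"':
			start, startLine := i, line
			i++
			for i < len(rs) && rs[i] != '"' {
				if rs[i] == '\\' && i+1 < len(rs) {
					if rs[i+1] == '\n' {
						line++ // escaped newlines still advance the line counter
					}
					i += 2
					continue
				}
				if rs[i] == '\n' {
					line++
				}
				i++
			}
			if i < len(rs) {
				i++ // consume closing quote
			}
			out = append(out, tok{"string", string(rs[start:i]), startLine})
		case unicode.IsLetter(r) || r == '_' || r == '@':
			start := i
			for i < len(rs) && (unicode.IsLetter(rs[i]) || unicode.IsDigit(rs[i]) ||
				rs[i] == '_' || rs[i] == '@' || rs[i] == '.') {
				i++
			}
			out = append(out, tok{"ident", string(rs[start:i]), line})
		default:
			out = append(out, tok{"punct", string(r), line})
			i++
		}
	}
	return out
}

func main() {
	for _, t := range scan("defmodule Foo do\n  @doc \"multi\nline\"\nend") {
		fmt.Printf("%d %s %s\n", t.line, t.kind, t.text)
	}
}
```

A walker over this stream can match `defmodule`/`def` identifiers by position without any line-joining heuristics.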

## LSP feature map
@@ -45,7 +45,7 @@ The `__using__` cache (`usingCacheEntry`) stores the parsed result of each modul
- **`transUses`** — `use Mod` inside the body (double-use chains); also a heuristic for `Keyword.put_new/put`
- **`optBindings`** — dynamic `import unquote(var)` where `var` comes from `Keyword.get(opts, :key, Default)`; stores `{optKey, defaultMod, kind}` so consumer opts override the default

-`parseUsingBody` handles three forms:
+`parseUsingBody` uses the tokenizer to walk the `__using__` body directly on the token stream. This avoids line-joining heuristics and correctly handles heredocs in moduledocs (which previously caused a regression where `bracketDepth` in line-based joining treated `#` inside markdown links as comments, cascading into file-wide line merges). It handles three forms:
- `defmacro __using__` — standard form
- `using opts do` — ExUnit.CaseTemplate form (only when `use ExUnit.CaseTemplate` is present)
- Function delegation — when the body calls a local helper like `using_block(opts)`, `parseHelperQuoteBlock` finds the function definition and parses its `quote do` body
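The `optBindings` override rule above (consumer opts win over the `Keyword.get` default) amounts to a single lookup with fallback. A hypothetical sketch; `optBinding` and `resolveDynamicImport` are illustrative names, not the real cache types:

```go
package main

import "fmt"

// optBinding mirrors the {optKey, defaultMod, kind} triple described above,
// recorded for a dynamic `import unquote(var)` in a __using__ body.
type optBinding struct {
	optKey     string
	defaultMod string
	kind       string // e.g. "import"
}

// resolveDynamicImport picks the consumer-supplied module for the opt key
// when present, and falls back to the Keyword.get default otherwise.
func resolveDynamicImport(b optBinding, consumerOpts map[string]string) string {
	if mod, ok := consumerOpts[b.optKey]; ok {
		return mod
	}
	return b.defaultMod
}

func main() {
	b := optBinding{optKey: "client", defaultMod: "DefaultClient", kind: "import"}
	fmt.Println(resolveDynamicImport(b, nil))                                     // prints "DefaultClient"
	fmt.Println(resolveDynamicImport(b, map[string]string{"client": "MyClient"})) // prints "MyClient"
}
```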
@@ -84,11 +84,10 @@ Call sites are attributed to the **injecting module** in the store (not the defi

## Key design decisions

-- **Regex instead of tree-sitter for indexing** — 7.5x faster per file. Tree-sitter is only used when necessary in a file already opened by the editor.
+- **Tokenizer instead of tree-sitter for indexing** — a hand-rolled tokenizer + walker replaced the original regex-based parser for both file indexing and runtime `__using__` parsing. The tokenizer handles heredocs, sigils, multi-line expressions, and comments as opaque tokens, eliminating fragile line-joining heuristics. Tree-sitter is only used for scope-aware variable operations in files already opened by the editor.
- **SQLite for storage** — single file, fast reads, incremental updates via mtime tracking.
- **Parallel indexing** — `init` uses all CPU cores for parsing, single writer for SQLite.
-- **Delegate following** — `defdelegate` targets are resolved at index time (including alias resolution and `as:` renames).
+- **Delegate following** — `defdelegate` targets are resolved at index time (including alias resolution and `as:` renames). `LookupFollowDelegate` follows chains recursively (up to 5 hops) so `A → B → C` resolves to `C`.
- **Git HEAD polling** — watches `.git/HEAD` mtime every 2 seconds to detect branch switches and trigger reindex.
- **Full document sync** — `TextDocumentSyncKindFull`; Elixir files are small enough that incremental sync adds complexity without benefit.
- **Index versioning** — `IndexVersion` in `internal/version/version.go`. Mismatch on startup triggers a forced rebuild. Bump when parser or schema changes would invalidate existing indexes.
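The bounded delegate-chain walk in the list above can be sketched as follows; `def`, `followDelegate`, and the map index are illustrative stand-ins for the real `LookupFollowDelegate` and the SQLite store:

```go
package main

import "fmt"

// def is a simplified definition row; the real store keeps these in SQLite.
type def struct {
	module, function string
	delegateTo       string // "" when not a defdelegate
}

const maxDelegateHops = 5 // bound prevents cycles like A -> B -> A from looping forever

// followDelegate resolves a delegate chain A -> B -> C to its final target,
// giving up after maxDelegateHops to stay safe on cyclic or very deep chains.
func followDelegate(index map[string]def, key string) (def, bool) {
	d, ok := index[key]
	if !ok {
		return def{}, false
	}
	for hops := 0; hops < maxDelegateHops && d.delegateTo != ""; hops++ {
		next, ok := index[d.delegateTo]
		if !ok {
			break // target not indexed; stop at the last known definition
		}
		d = next
	}
	return d, true
}

func main() {
	index := map[string]def{
		"A.f": {"A", "f", "B.f"},
		"B.f": {"B", "f", "C.f"},
		"C.f": {"C", "f", ""},
	}
	d, _ := followDelegate(index, "A.f")
	fmt.Println(d.module) // prints "C"
}
```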
internal/lsp/documents.go (59 changes: 56 additions & 3 deletions)
@@ -5,12 +5,17 @@ import (

tree_sitter "github.com/tree-sitter/go-tree-sitter"
tree_sitter_elixir "github.com/tree-sitter/tree-sitter-elixir/bindings/go"

"github.com/remoteoss/dexter/internal/parser"
)

type cachedDoc struct {
	text       string
	tree       *tree_sitter.Tree
	src        []byte         // source bytes the tree references — must stay alive
	tokens     []parser.Token // cached tokenizer output
	tokSrc     []byte         // source bytes for tokens
	lineStarts []int          // byte offset of each line start (from TokenizeFull)
}

**Bugbot review (High Severity): Token cache may not be invalidated on document update**

Three new fields (tokens, tokSrc, lineStarts) were added to the cachedDoc struct, and GetTokens/GetTokensFull lazily populate them (checking doc.tokens == nil). However, the Set() method is not in the diff, meaning it was not updated. If Set() mutates the existing cachedDoc in-place (resetting only text, tree, and src), the stale token cache will persist, causing all token-based operations (hover, completion, go-to-definition, alias extraction) to use outdated tokens after a file edit. This would only be safe if Set() replaces the entire *cachedDoc pointer.

Reviewed by Cursor Bugbot for commit 24c125c.
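One way to satisfy the Bugbot finding above is to replace the `*cachedDoc` pointer wholesale on every update, so lazily built caches can never survive an edit. A self-contained sketch, with `docStore` and string tokens standing in for the real `DocumentStore` and `parser.Token` (the actual Set() is not shown in this diff):

```go
package main

import (
	"fmt"
	"sync"
)

type cachedDoc struct {
	text   string
	tokens []string // stand-in for []parser.Token
}

type docStore struct {
	mu   sync.Mutex
	docs map[string]*cachedDoc
}

// set replaces the *cachedDoc pointer wholesale, so lazily built token
// caches cannot outlive an edit (the fresh struct has tokens == nil).
func (s *docStore) set(uri, text string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.docs[uri] = &cachedDoc{text: text}
}

// getTokens lazily tokenizes on first access, mirroring GetTokens above.
func (s *docStore) getTokens(uri string) []string {
	s.mu.Lock()
	defer s.mu.Unlock()
	doc := s.docs[uri]
	if doc.tokens == nil {
		doc.tokens = []string{doc.text} // stand-in for parser.TokenizeFull
	}
	return doc.tokens
}

func main() {
	s := &docStore{docs: map[string]*cachedDoc{}}
	s.set("a.ex", "v1")
	fmt.Println(s.getTokens("a.ex")[0]) // prints "v1"
	s.set("a.ex", "v2")
	fmt.Println(s.getTokens("a.ex")[0]) // prints "v2", not stale "v1"
}
```

Mutating the existing struct in place would instead need to explicitly nil out tokens, tokSrc, and lineStarts alongside tree and src.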

// DocumentStore tracks the text content of open buffers and caches
@@ -89,3 +94,51 @@ func (ds *DocumentStore) GetTree(uri string) (*tree_sitter.Tree, []byte, bool) {
}
return doc.tree, doc.src, true
}

// GetTokens returns cached tokenizer output and source bytes for the given URI.
// Tokenizes on first access and caches the result. The cache is invalidated on
// the next Set() call.
func (ds *DocumentStore) GetTokens(uri string) ([]parser.Token, []byte, bool) {
	ds.mu.Lock()
	defer ds.mu.Unlock()
	doc, ok := ds.docs[uri]
	if !ok {
		return nil, nil, false
	}
	if doc.tokens == nil {
		doc.tokSrc = []byte(doc.text)
		result := parser.TokenizeFull(doc.tokSrc)
		doc.tokens = result.Tokens
		doc.lineStarts = result.LineStarts
	}
	return doc.tokens, doc.tokSrc, true
}

// GetTokensFull returns cached tokenizer output including line starts for
// efficient (line, col) → byte offset conversion.
func (ds *DocumentStore) GetTokensFull(uri string) ([]parser.Token, []byte, []int, bool) {
	ds.mu.Lock()
	defer ds.mu.Unlock()
	doc, ok := ds.docs[uri]
	if !ok {
		return nil, nil, nil, false
	}
	if doc.tokens == nil {
		doc.tokSrc = []byte(doc.text)
		result := parser.TokenizeFull(doc.tokSrc)
		doc.tokens = result.Tokens
		doc.lineStarts = result.LineStarts
	}
	return doc.tokens, doc.tokSrc, doc.lineStarts, true
}
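The (line, col) to byte offset conversion that `lineStarts` enables (per the GetTokensFull comment) reduces to one slice index plus an addition. A minimal sketch, where `computeLineStarts` stands in for the `LineStarts` slice that `TokenizeFull` returns:

```go
package main

import "fmt"

// computeLineStarts returns the byte offset of each line start;
// a stand-in for the LineStarts slice that TokenizeFull produces.
func computeLineStarts(src []byte) []int {
	starts := []int{0}
	for i, b := range src {
		if b == '\n' {
			starts = append(starts, i+1)
		}
	}
	return starts
}

// offsetAt converts a 0-based (line, col) position to a byte offset in O(1),
// instead of rescanning the document from the top on every request.
func offsetAt(lineStarts []int, line, col int) int {
	if line < 0 || line >= len(lineStarts) {
		return -1
	}
	return lineStarts[line] + col
}

func main() {
	src := []byte("defmodule Foo do\n  def bar\nend\n")
	ls := computeLineStarts(src)
	off := offsetAt(ls, 2, 0)
	fmt.Println(string(src[off : off+3])) // prints "end"
}
```

Note this treats col as a byte column; LSP clients that send UTF-16 columns would need an extra conversion step first.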

// GetTokenizedFile returns a cached TokenizedFile for the given URI, or nil
// if the document is not tracked. This is the preferred way to get a
// TokenizedFile from the document store.
func (ds *DocumentStore) GetTokenizedFile(uri string) *TokenizedFile {
	tokens, src, lineStarts, ok := ds.GetTokensFull(uri)
	if !ok {
		return nil
	}
	return NewTokenizedFileFromCache(tokens, src, lineStarts)
}