feat: auto-reindex with Merkle tree change detection #31
Merged
iamvirul merged 7 commits into VecGrep:main on Mar 1, 2026
- Add directory Merkle tree (sha256 of mtime+size for files, sorted child hashes for dirs) to detect changes without stat-ing every file
- Save/load Merkle tree to ~/.vecgrep/<project_hash>/merkle.json
- Persist watched paths in ~/.vecgrep/watched.json
- Restore watchers in background thread on startup with fast Merkle diff to only re-index changed files
- Update Merkle tree incrementally on live file sync events
- Add stop_watching MCP tool to remove watched paths

Closes VecGrep#25
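The tree construction described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the names `file_hash` and `build_tree`, the nested `{"hash", "children"}` node layout, and the use of `st_mtime_ns` for the mtime component are all assumptions made for the example.

```python
import hashlib
from pathlib import Path


def file_hash(path: Path) -> str:
    """Hash a file's metadata (mtime and size), not its contents."""
    st = path.stat()
    return hashlib.sha256(f"{st.st_mtime_ns}:{st.st_size}".encode()).hexdigest()


def build_tree(root: Path) -> dict:
    """Build a nested node: {"hash": ...} for files, plus "children" for dirs."""
    if root.is_file():
        return {"hash": file_hash(root)}
    children = {}
    for entry in sorted(root.iterdir(), key=lambda p: p.name):
        try:
            children[entry.name] = build_tree(entry)
        except OSError:
            continue  # unreadable or vanished entry: leave it out of the tree
    # Directory hash = sha256 over the sorted hashes of its children.
    joined = "".join(sorted(node["hash"] for node in children.values()))
    return {"hash": hashlib.sha256(joined.encode()).hexdigest(), "children": children}
```

Because a directory's hash depends only on its children's hashes, a change to one file alters the hashes on the path from that file up to the root and leaves sibling subtrees untouched.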
- Merkle tree build, save/load, change detection
- Watch state persistence save/load
- stop_watching MCP tool
- Background startup restore with stale path pruning
- Merkle tree updates after _do_index and live sync
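As an illustration of watch-state persistence with stale path pruning, here is a small sketch. It assumes the state file holds a plain JSON list of directory paths; the actual layout of `watched.json` is not shown in this PR, and `save_watched` and `load_watched` are hypothetical names.

```python
import json
from pathlib import Path


def save_watched(state_file: Path, paths: list[str]) -> None:
    """Write the watched paths as a sorted, de-duplicated JSON list."""
    state_file.parent.mkdir(parents=True, exist_ok=True)
    state_file.write_text(json.dumps(sorted(set(paths))))


def load_watched(state_file: Path) -> list[str]:
    """Load watched paths, pruning any that are no longer directories."""
    try:
        paths = json.loads(state_file.read_text())
    except (OSError, json.JSONDecodeError):
        return []  # missing or corrupt state file: nothing to restore
    return [p for p in paths if Path(p).is_dir()]
```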
The old code passed "vector" as the first positional argument to LanceDB's create_index(), which maps to the `metric` parameter — not the column name. This would cause the index to be built with an invalid distance metric instead of targeting the "vector" column. Fix: use `vector_column_name="vector"` as a keyword argument and `metric="cosine"` explicitly. Note: this fix is unrelated to the Merkle tree auto-reindex feature and was a pre-existing change found in the working tree.
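The pitfall can be reproduced without LanceDB by using a stand-in function with the same parameter order as described above. The `create_index` below is a hypothetical stub for illustration only; it is not LanceDB's implementation, and its validation logic is an assumption.

```python
def create_index(metric="L2", vector_column_name="vector"):
    """Hypothetical stub: `metric` is the first positional parameter."""
    if metric.lower() not in {"l2", "cosine", "dot"}:
        raise ValueError(f"invalid metric: {metric!r}")
    return {"metric": metric, "column": vector_column_name}


# Buggy call shape: "vector" binds to `metric`, not to the column name.
try:
    create_index("vector")
    buggy_call_failed = False
except ValueError:
    buggy_call_failed = True

# Fixed call shape: keyword arguments make the intent explicit.
fixed = create_index(vector_column_name="vector", metric="cosine")
```

Passing both values by keyword keeps the call correct even if the parameter order differs between library versions.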
Codecov Report: All modified and coverable lines are covered by tests.
- ValueError in relative_to() falls back to str(entry)
- OSError on entry.stat() in hash block skips file gracefully
- Gitignored dirs are excluded from Merkle tree
- _merkle_sync returns 'No file changes detected' when only dir hash differs
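The first edge case above can be sketched as a small helper. `rel_key` is a hypothetical name; the PR's actual function is not shown here.

```python
from pathlib import Path


def rel_key(entry: Path, root: Path) -> str:
    """Return entry's path relative to root, or the full path if it is outside root."""
    try:
        return str(entry.relative_to(root))
    except ValueError:
        # Path.relative_to raises ValueError when entry is not under root.
        return str(entry)
```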
iamvirul reviewed on Feb 27, 2026
…ndition
- `_project_dir` now delegates to `_project_hash` instead of duplicating logic
- `_merkle_sync` always calls `_ensure_watcher` even when no changes detected, so watchers are properly restored on server restart in the common no-change case
- Add `_MERKLE_LOCK` global lock to serialize load-modify-save of merkle.json in `_process_file`, preventing concurrent file events from overwriting each other's tree updates

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
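The load-modify-save race that the lock addresses can be illustrated with a simplified sketch. It assumes a flat `{path: hash}` JSON file rather than the real tree format, and `update_entry` is a hypothetical stand-in for the update done in `_process_file`.

```python
import json
import threading
from pathlib import Path

_MERKLE_LOCK = threading.Lock()


def update_entry(merkle_path: Path, rel_path: str, new_hash: str) -> None:
    """Read the stored map, change one entry, and write it back atomically
    with respect to other threads in this process."""
    with _MERKLE_LOCK:
        tree = json.loads(merkle_path.read_text()) if merkle_path.exists() else {}
        tree[rel_path] = new_hash
        merkle_path.write_text(json.dumps(tree))
```

Without the lock, two file events could both load the same snapshot and the later write would discard the earlier update. Note that a `threading.Lock` only serializes threads within one process; it does not protect against a second server process writing the same file.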
Summary
- Directory Merkle tree (`sha256(mtime:size)` for files, `sha256(sorted child hashes)` for dirs). On startup, compares the stored tree against the current one; if a directory hash matches, the entire subtree is skipped. Only changed files are re-indexed.
- Watched paths are persisted in `~/.vecgrep/watched.json` and restored on server restart, so watchers survive MCP server restarts without user intervention.
- Watchers are restored in a background thread, so `mcp.run()` starts immediately and users get instant tool availability.
- `stop_watching` MCP tool: New tool to explicitly stop watching a codebase path.
- Fixes a bug in `VectorStore.build_index()` where `"vector"` was passed as the first positional arg to LanceDB's `create_index()`, mapping to the `metric` param instead of `vector_column_name`.

Closes #25
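The subtree-skipping comparison can be sketched as a recursive diff over two trees. This is an illustration under assumptions: nodes are taken to be dicts of the form `{"hash": ..., "children": {...}}` (files have no `"children"` key), and `changed_files` is a hypothetical name. Deleted files are not reported in this sketch.

```python
def changed_files(old, new, prefix=""):
    """Return relative paths of files that are new or whose hash changed."""
    if old is not None and old["hash"] == new["hash"]:
        return []  # identical hash: skip the whole subtree
    if "children" not in new:
        return [prefix]  # a file that is new or modified
    old_children = (old or {}).get("children", {})
    changed = []
    for name, node in new["children"].items():
        path = f"{prefix}/{name}" if prefix else name
        changed.extend(changed_files(old_children.get(name), node, path))
    return changed


old_tree = {"hash": "r1", "children": {
    "a": {"hash": "a1", "children": {"x.py": {"hash": "x1"}}},
    "b": {"hash": "b1", "children": {"y.py": {"hash": "y1"}}},
}}
new_tree = {"hash": "r2", "children": {
    "a": {"hash": "a2", "children": {"x.py": {"hash": "x2"}, "z.py": {"hash": "z1"}}},
    "b": {"hash": "b1", "children": {"y.py": {"hash": "y1"}}},
}}
result = changed_files(old_tree, new_tree)  # directory "b" is never descended into
```

When the root hashes match, the function returns immediately, which is what makes the no-change startup case cheap.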
Test plan
- `uv run pytest`
- `uv run ruff check src/ tests/`
- With `watch=True`, verify `~/.vecgrep/<hash>/merkle.json` is created
- Verify `watched.json`
- `stop_watching`: verify path removed