
feat: auto-reindex with Merkle tree change detection #31

Merged
iamvirul merged 7 commits into VecGrep:main from Kavirubc:feat/merkle-auto-reindex on Mar 1, 2026

@Kavirubc (Contributor)

Summary

  • Merkle tree change detection: Each directory gets a hash computed from its children (sha256(mtime:size) for files, sha256(sorted child hashes) for dirs). On startup, compares stored vs current tree — if a directory hash matches, the entire subtree is skipped. Only changed files are re-indexed.
  • Watch state persistence: Watched paths are saved to ~/.vecgrep/watched.json and restored on server restart, so watchers survive MCP server restarts without user intervention.
  • Background startup sync: Watcher restoration and Merkle sync run in a background daemon thread so mcp.run() starts immediately — users get instant tool availability.
  • stop_watching MCP tool: New tool to explicitly stop watching a codebase path.
  • Bonus fix: Corrected a pre-existing bug in VectorStore.build_index() where "vector" was passed as the first positional arg to LanceDB's create_index(), mapping to the metric param instead of vector_column_name.
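As a rough illustration of the change-detection idea in the first bullet, the sketch below builds a nested tree and yields only files whose hash differs, skipping any directory whose hash matches. The names `build_node` and `changed_files`, the nested dict layout, and the use of `st_mtime_ns` are assumptions for illustration, not the code in this PR. It also ignores gitignore rules and does not report deleted files.

```python
import hashlib
from pathlib import Path
from typing import Iterator, Optional


def _sha256(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()


def build_node(path: Path) -> dict:
    """Build a Merkle node: files hash mtime:size, dirs hash their sorted child hashes."""
    if path.is_dir():
        children = {c.name: build_node(c) for c in sorted(path.iterdir())}
        digest = _sha256("".join(sorted(n["hash"] for n in children.values())))
        return {"hash": digest, "children": children}
    st = path.stat()
    # Assumption: nanosecond mtime; the PR only says "mtime:size".
    return {"hash": _sha256(f"{st.st_mtime_ns}:{st.st_size}")}


def changed_files(old: Optional[dict], new: dict, prefix: str = "") -> Iterator[str]:
    """Yield relative paths of files that are new or whose hash changed."""
    if old is not None and old["hash"] == new["hash"]:
        return  # identical hash: skip this file or the whole subtree
    if "children" not in new:
        yield prefix  # a changed or newly added file
        return
    old_children = (old or {}).get("children", {})
    for name, child in new["children"].items():
        child_prefix = f"{prefix}/{name}" if prefix else name
        yield from changed_files(old_children.get(name), child, child_prefix)
```

Passing `None` as the stored tree treats every file as changed, which is one way a first run without a saved `merkle.json` could be handled.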

Closes #25

Test plan

  • All 138 tests pass (uv run pytest)
  • Lint clean (uv run ruff check src/ tests/)
  • Index a project with watch=True, verify ~/.vecgrep/<hash>/merkle.json is created
  • Restart server, verify background restore happens and watcher resumes
  • Edit a file while server is down → restart → verify only that file is re-indexed
  • Delete a project dir → restart → verify stale path pruned from watched.json
  • Call stop_watching → verify path removed
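The restart and pruning checks above exercise behaviour along the following lines. This is a minimal sketch that assumes `watched.json` holds a plain JSON list of directory paths; the actual file layout is not shown in this PR, and `save_watched`, `load_watched`, `restore_watchers`, `restore_in_background`, and the `start_watcher` callback are hypothetical names.

```python
import json
import threading
from pathlib import Path
from typing import Callable, List


def save_watched(state_file: Path, paths: List[str]) -> None:
    """Persist the watched paths (assumed format: a sorted JSON list)."""
    state_file.parent.mkdir(parents=True, exist_ok=True)
    state_file.write_text(json.dumps(sorted(set(paths)), indent=2))


def load_watched(state_file: Path) -> List[str]:
    """Load watched paths; a missing or corrupt file yields an empty list."""
    try:
        data = json.loads(state_file.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return []
    if not isinstance(data, list):
        return []
    return [p for p in data if isinstance(p, str)]


def restore_watchers(state_file: Path, start_watcher: Callable[[str], None]) -> List[str]:
    """Prune paths that no longer exist, then restart a watcher for the rest."""
    alive = [p for p in load_watched(state_file) if Path(p).is_dir()]
    save_watched(state_file, alive)  # drop stale entries from disk
    for path in alive:
        start_watcher(path)
    return alive


def restore_in_background(state_file: Path, start_watcher: Callable[[str], None]) -> threading.Thread:
    """Run the restore in a daemon thread so the caller can start serving at once."""
    thread = threading.Thread(
        target=restore_watchers, args=(state_file, start_watcher), daemon=True
    )
    thread.start()
    return thread
```

Because the thread is a daemon, it does not keep the process alive on shutdown; anything it was doing at that moment is simply abandoned.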

- Add directory Merkle tree (sha256 of mtime+size for files, sorted
  child hashes for dirs) to detect changes without stat-ing every file
- Save/load Merkle tree to ~/.vecgrep/<project_hash>/merkle.json
- Persist watched paths in ~/.vecgrep/watched.json
- Restore watchers in background thread on startup with fast Merkle
  diff to only re-index changed files
- Update Merkle tree incrementally on live file sync events
- Add stop_watching MCP tool to remove watched paths

Closes VecGrep#25

- Merkle tree build, save/load, change detection
- Watch state persistence save/load
- stop_watching MCP tool
- Background startup restore with stale path pruning
- Merkle tree updates after _do_index and live sync

The old code passed "vector" as the first positional argument to
LanceDB's create_index(), which maps to the `metric` parameter —
not the column name. This would cause the index to be built with
an invalid distance metric instead of targeting the "vector" column.

Fix: use `vector_column_name="vector"` as a keyword argument and
`metric="cosine"` explicitly.

Note: this fix is unrelated to the Merkle tree auto-reindex feature
and was a pre-existing change found in the working tree.
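
The positional-argument pitfall described above can be reproduced without LanceDB. The function below is a hypothetical stand-in, not LanceDB's real `create_index()`: it only mirrors the one property the commit message states, namely that `metric` is the first parameter and `vector_column_name` comes later. The default values are placeholders.

```python
def create_index(metric="l2", vector_column_name="vector"):
    """Stand-in with `metric` as the first parameter (defaults are assumptions)."""
    return {"metric": metric, "vector_column_name": vector_column_name}


# Old call: the string lands in `metric`, not in the column name.
buggy = create_index("vector")

# Fixed call: both values are passed by keyword, so neither can be misplaced.
fixed = create_index(vector_column_name="vector", metric="cosine")

print(buggy["metric"])  # the column name ended up as the metric
print(fixed["metric"])
```

Passing both arguments by keyword keeps the call correct even if the parameter order differs between library versions.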
Kavirubc requested a review from iamvirul as a code owner on February 27, 2026 at 02:55
codecov bot commented Feb 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

- ValueError in relative_to() falls back to str(entry)
- OSError on entry.stat() in hash block skips file gracefully
- Gitignored dirs are excluded from Merkle tree
- _merkle_sync returns 'No file changes detected' when only dir hash differs
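
The first two edge cases in this list could be handled roughly as follows. The helper names `rel_key` and `stat_signature` are hypothetical; the sketch only shows the fallback and skip behaviour the bullets describe, not the PR's actual functions.

```python
from pathlib import Path
from typing import Optional


def rel_key(entry: Path, root: Path) -> str:
    """Key an entry by its path relative to root, or by its full path as a fallback."""
    try:
        return entry.relative_to(root).as_posix()
    except ValueError:
        # entry is not located under root
        return str(entry)


def stat_signature(entry: Path) -> Optional[str]:
    """Return 'mtime:size' for a file, or None if it cannot be stat-ed."""
    try:
        st = entry.stat()
    except OSError:
        # e.g. the file vanished or is unreadable; the caller skips it
        return None
    return f"{st.st_mtime_ns}:{st.st_size}"
```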
…ndition

- `_project_dir` now delegates to `_project_hash` instead of duplicating logic
- `_merkle_sync` always calls `_ensure_watcher` even when no changes detected,
  so watchers are properly restored on server restart in the common no-change case
- Add `_MERKLE_LOCK` global lock to serialize load-modify-save of merkle.json
  in `_process_file`, preventing concurrent file events from overwriting each
  other's tree updates

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
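
The load-modify-save serialization mentioned in the last bullet might look like the sketch below. It assumes `merkle.json` is a flat JSON mapping of relative path to hash, which this PR does not confirm, and `update_entry` is a hypothetical helper rather than the real `_process_file`.

```python
import json
import threading
from pathlib import Path

# One process-wide lock guarding every read-modify-write of merkle.json.
_MERKLE_LOCK = threading.Lock()


def update_entry(merkle_file: Path, rel_path: str, digest: str) -> None:
    """Update one entry while holding the lock for the whole load-modify-save cycle."""
    with _MERKLE_LOCK:
        try:
            tree = json.loads(merkle_file.read_text())
        except (FileNotFoundError, json.JSONDecodeError):
            tree = {}
        tree[rel_path] = digest
        merkle_file.write_text(json.dumps(tree))
```

Without the lock, two file events could both load the same snapshot and the later save would silently drop the earlier update. A `threading.Lock` only protects threads within one process; multiple server processes sharing the file would need a different mechanism.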
iamvirul merged commit ee8cf3c into VecGrep:main on Mar 1, 2026
2 checks passed
iamvirul mentioned this pull request on Mar 2, 2026

Development

Successfully merging this pull request may close these issues.

[Bug]: Auto-reindexing not working properly

2 participants