💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 21e316d86c
cruessler left a comment
Thanks a lot for the effort; that’s much appreciated, as making blame incremental was on our TODO list as well! I had a first look at the changes and left 2 comments (though this is far from an in-depth review).
The `git commit-graph write` command also supports writing a separate section in the cache file that contains information about the paths changed between a commit and its first parent. This information can be used to significantly speed up some traversal operations, such as `git log -- <PATH>` and `git blame`. This commit teaches the `gix-commitgraph` crate in gitoxide how to parse and access this information. We've only implemented support for reading v2 of this cache, because v1 is deprecated in Git: it can return incorrect results in some corner cases (paths containing non-ASCII bytes). The implementation is 100% compatible with Git itself; it uses the exact same version of murmur3 that Git uses, including the seed values.
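To make the lookup concrete, here is a minimal, self-contained sketch of how a reader can query one commit's changed-path filter, modeled on Git's `bloom.c`: murmur3 (x86, 32-bit) over unsigned bytes, the two seeds `0x293ae76f` and `0x7e646e2c`, 7 hash functions combined by double hashing, and bit positions taken modulo the filter size in bits. The function names (`murmur3_32`, `bloom_key`, `filter_may_contain`, `filter_insert`) are hypothetical and not the `gix-commitgraph` API; treating an empty filter as "no answer" is an assumption of this sketch.

```rust
/// MurmurHash3 (x86, 32-bit), reading input bytes as unsigned values,
/// i.e. the behaviour of Git's `murmur3_seeded_v2`.
fn murmur3_32(seed: u32, data: &[u8]) -> u32 {
    const C1: u32 = 0xcc9e_2d51;
    const C2: u32 = 0x1b87_3593;
    let mut h = seed;
    let mut chunks = data.chunks_exact(4);
    for chunk in &mut chunks {
        // Each 4-byte block is read little-endian.
        let k = u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
        h ^= k.wrapping_mul(C1).rotate_left(15).wrapping_mul(C2);
        h = h.rotate_left(13).wrapping_mul(5).wrapping_add(0xe654_6b64);
    }
    // Mix in the trailing 1..=3 bytes, if any.
    let tail = chunks.remainder();
    let mut k1: u32 = 0;
    if tail.len() >= 3 { k1 ^= (tail[2] as u32) << 16; }
    if tail.len() >= 2 { k1 ^= (tail[1] as u32) << 8; }
    if !tail.is_empty() {
        k1 ^= tail[0] as u32;
        h ^= k1.wrapping_mul(C1).rotate_left(15).wrapping_mul(C2);
    }
    // Finalization ("fmix32").
    h ^= data.len() as u32;
    h ^= h >> 16;
    h = h.wrapping_mul(0x85eb_ca6b);
    h ^= h >> 13;
    h = h.wrapping_mul(0xc2b2_ae35);
    h ^= h >> 16;
    h
}

/// Git's default number of hash functions per key.
const NUM_HASHES: usize = 7;

/// Double hashing: hash_i = h0 + i * h1 (wrapping 32-bit arithmetic).
fn bloom_key(path: &[u8]) -> [u32; NUM_HASHES] {
    let h0 = murmur3_32(0x293a_e76f, path);
    let h1 = murmur3_32(0x7e64_6e2c, path);
    let mut key = [0u32; NUM_HASHES];
    for (i, slot) in key.iter_mut().enumerate() {
        *slot = h0.wrapping_add((i as u32).wrapping_mul(h1));
    }
    key
}

/// `Some(false)` means "definitely not changed"; `Some(true)` means "maybe changed".
/// An empty filter gives no usable answer (assumption of this sketch).
fn filter_may_contain(filter: &[u8], key: &[u32]) -> Option<bool> {
    if filter.is_empty() {
        return None;
    }
    let bits = filter.len() as u64 * 8;
    Some(key.iter().all(|&h| {
        let pos = h as u64 % bits;
        (filter[(pos / 8) as usize] & (1u8 << (pos % 8) as u32)) != 0
    }))
}

/// Sets the key's bits; used here only to build a test filter (filter must be non-empty).
fn filter_insert(filter: &mut [u8], key: &[u32]) {
    let bits = filter.len() as u64 * 8;
    for &h in key {
        let pos = h as u64 % bits;
        filter[(pos / 8) as usize] |= 1u8 << (pos % 8) as u32;
    }
}

fn main() {
    let mut filter = vec![0u8; 4]; // hypothetical filter for a small commit
    filter_insert(&mut filter, &bloom_key(b"src/lib.rs"));
    assert_eq!(filter_may_contain(&filter, &bloom_key(b"src/lib.rs")), Some(true));
    // May print Some(true) for an unrelated path: that is a false positive.
    println!("{:?}", filter_may_contain(&filter, &bloom_key(b"README.md")));
}
```

A filter can only produce false positives, never false negatives, which is why a `Some(false)` answer is safe to act on while `Some(true)` still requires a real tree diff.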
Implement a `gix_blame::incremental` API that yields blame entries as they're discovered, similar to Git's `git blame --incremental`. The implementation takes the original `gix_blame::file` and replaces its `Vec` of blame entries with a generic `BlameSink` trait. `gix_blame::file` is now a wrapper around `gix_blame::incremental`: it implements `BlameSink` for `Vec<BlameEntry>` and sorts and coalesces the entries before returning them.
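A minimal sketch of the sink pattern described above, assuming a simplified `BlameEntry` with hypothetical fields and a hypothetical `emit` method; the real `gix_blame` types and signatures may differ. It shows a `Vec` acting as the sink, with the non-incremental result recovered afterwards by sorting on the blamed-file position and merging neighbours that are contiguous in both files and come from the same commit (one plausible coalescing rule).

```rust
use std::ops::Range;

/// Simplified stand-in for a blame entry (hypothetical, not gix_blame's type).
#[derive(Debug, Clone, PartialEq, Eq)]
struct BlameEntry {
    /// Lines in the file being blamed.
    range_in_blamed_file: Range<u32>,
    /// Corresponding lines in the commit that introduced them.
    range_in_source_file: Range<u32>,
    /// Placeholder for the commit id.
    commit: u64,
}

/// Hypothetical sink: receives entries in discovery order.
trait BlameSink {
    fn emit(&mut self, entry: BlameEntry);
}

/// Collecting into a Vec is the simplest possible sink.
impl BlameSink for Vec<BlameEntry> {
    fn emit(&mut self, entry: BlameEntry) {
        self.push(entry);
    }
}

/// Sort by position in the blamed file, then merge adjacent entries that are
/// contiguous in both files and attributed to the same commit.
fn sort_and_coalesce(mut entries: Vec<BlameEntry>) -> Vec<BlameEntry> {
    entries.sort_by_key(|e| e.range_in_blamed_file.start);
    let mut out: Vec<BlameEntry> = Vec::with_capacity(entries.len());
    for e in entries {
        if let Some(last) = out.last_mut() {
            if last.commit == e.commit
                && last.range_in_blamed_file.end == e.range_in_blamed_file.start
                && last.range_in_source_file.end == e.range_in_source_file.start
            {
                last.range_in_blamed_file.end = e.range_in_blamed_file.end;
                last.range_in_source_file.end = e.range_in_source_file.end;
                continue;
            }
        }
        out.push(e);
    }
    out
}

fn entry(blamed: Range<u32>, source: Range<u32>, commit: u64) -> BlameEntry {
    BlameEntry { range_in_blamed_file: blamed, range_in_source_file: source, commit }
}

fn main() {
    let mut sink: Vec<BlameEntry> = Vec::new();
    // Entries arrive in discovery order, not in file order.
    sink.emit(entry(5..8, 10..13, 1));
    sink.emit(entry(0..5, 0..5, 2));
    sink.emit(entry(8..9, 13..14, 1));
    for e in sort_and_coalesce(sink) {
        println!("{:?}", e);
    }
}
```

Keeping the sorting and coalescing out of the sink means an incremental consumer can print entries the moment they arrive, while `file` still pays the ordering cost only once at the end.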
Use the new changed-path bloom filters from the commit graph to greatly speed up our blame implementation. Whenever a commit's bloom filter rules out the current path (i.e. the path definitely did not change), we skip that commit altogether and pass the blame on without diffing the trees.
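The skip rule boils down to a tri-state decision, sketched below under stated assumptions: `FilterAnswer`, `needs_tree_diff` and `count_tree_diffs` are hypothetical names, not gitoxide's. Only a definite negative allows skipping; a "maybe" (which can be a false positive) or a missing filter still requires the tree diff. Since Git computes these filters against the first parent only, the shortcut applies only to first-parent diffs.

```rust
/// Hypothetical result of asking one commit's changed-path filter about a path.
#[derive(Clone, Copy, Debug, PartialEq)]
enum FilterAnswer {
    /// At least one of the path's bits is unset: the path definitely did not change.
    DefinitelyUnchanged,
    /// All bits are set: the path may have changed (possibly a false positive).
    MaybeChanged,
    /// No filter for this commit, e.g. it is not covered by the commit-graph.
    Unavailable,
}

/// Only a definite negative lets the blame pass to the first parent untouched.
fn needs_tree_diff(answer: FilterAnswer) -> bool {
    answer != FilterAnswer::DefinitelyUnchanged
}

/// Counts how many commits along a (hypothetical) history still need a tree diff.
fn count_tree_diffs(history: &[FilterAnswer]) -> usize {
    history.iter().filter(|a| needs_tree_diff(**a)).count()
}

fn main() {
    use FilterAnswer::*;
    let history = [DefinitelyUnchanged, DefinitelyUnchanged, MaybeChanged, Unavailable];
    println!("tree diffs needed: {}", count_tree_diffs(&history));
}
```

On a long history where the blamed file rarely changes, most commits answer `DefinitelyUnchanged`, which is where the savings in the benchmark below come from.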
Implement the `log_file` method in gitoxide-core, which allows running path-limited log commands. With the new changed-paths bloom filter, it is now possible to perform this operation very efficiently.
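When Git writes these filters it also inserts each leading directory of a changed path (so that `git log dir/` can benefit), which lets a reader test the path and every prefix: if any of them is definitely absent, nothing at that path changed. The sketch below shows that query shape; whether `log_file` in this PR does the same is not stated, and `path_and_leading_dirs` / `can_skip_commit` are hypothetical helpers. The `HashSet` stands in for a real filter and, unlike one, has no false positives.

```rust
use std::collections::HashSet;

/// The path itself plus each leading directory,
/// e.g. "a/b/c.rs" -> ["a/b/c.rs", "a/b", "a"].
fn path_and_leading_dirs(path: &str) -> Vec<&str> {
    let mut out = vec![path];
    let mut rest = path;
    while let Some(idx) = rest.rfind('/') {
        rest = &rest[..idx];
        out.push(rest);
    }
    out
}

/// A commit can be skipped for `path` if the filter rules out *any* of its keys.
fn can_skip_commit(may_contain: impl Fn(&str) -> bool, path: &str) -> bool {
    path_and_leading_dirs(path).into_iter().any(|key| !may_contain(key))
}

fn main() {
    // Stand-in "filter" for a commit that changed only a/x.rs.
    let mut changed: HashSet<&str> = HashSet::new();
    for key in path_and_leading_dirs("a/x.rs") {
        changed.insert(key);
    }
    println!("skip a/b/c.rs: {}", can_skip_commit(|k| changed.contains(k), "a/b/c.rs"));
    println!("skip a/x.rs:   {}", can_skip_commit(|k| changed.contains(k), "a/x.rs"));
}
```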
Change `process_changes` to take `&[Change]` instead of `Vec<Change>`, eliminating the `changes.clone()` heap allocation at every call site. Replace the O(H×C) approach (H hunks, C changes), which restarted from the beginning of the changes list for every hunk, with a cursor that advances through the changes list across hunks. Non-suspect hunks are now skipped immediately. In the rare case where overlapping suspect ranges are detected (which can happen when blame converges through a merge), the cursor is safely reset to maintain correctness.
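A simplified model of the cursor idea, assuming hunks and changes are plain line ranges, changes are sorted and non-overlapping, and hunks usually arrive in ascending order. `match_hunks` is a hypothetical function that only finds which changes overlap each hunk; it does not model suspects or how `process_changes` splits hunks. The cursor only ever skips changes that end before the current hunk's start, so it stays valid while hunk starts are non-decreasing; if a hunk starts before the previous one, the sketch conservatively resets the cursor.

```rust
use std::ops::Range;

/// For each hunk, return the indices of the changes that overlap it.
/// Assumes `changes` is sorted by start and non-overlapping.
fn match_hunks(hunks: &[Range<u32>], changes: &[Range<u32>]) -> Vec<Vec<usize>> {
    let mut cursor = 0usize;
    let mut last_start = 0u32;
    let mut out = Vec::with_capacity(hunks.len());
    for hunk in hunks {
        // Out-of-order hunk: changes we skipped may be relevant again.
        if hunk.start < last_start {
            cursor = 0;
        }
        last_start = hunk.start;
        // Advance past changes that end before this hunk begins; these are
        // also irrelevant for every later hunk with a start >= this one.
        while cursor < changes.len() && changes[cursor].end <= hunk.start {
            cursor += 1;
        }
        // Collect overlapping changes without moving the cursor, since the
        // next hunk may overlap some of them too.
        let mut hits = Vec::new();
        let mut i = cursor;
        while i < changes.len() && changes[i].start < hunk.end {
            if changes[i].end > hunk.start {
                hits.push(i);
            }
            i += 1;
        }
        out.push(hits);
    }
    out
}

fn main() {
    let changes = [2..3, 12..15, 30..31];
    println!("{:?}", match_hunks(&[0..5, 10..20], &changes));
}
```

With ordered input each change is skipped at most once, so the scan costs roughly O(H + C) plus the overlaps actually reported, instead of O(H×C).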
Compare the performance of the implementation with and without the commit graph cache.
gix-blame::incremental/without-commit-graph
time: [14.852 s 14.895 s 14.944 s]
change: [+0.2968% +0.7623% +1.2529%] (p = 0.00 < 0.05)
Change within noise threshold.
gix-blame::incremental/with-commit-graph
time: [287.55 ms 290.30 ms 292.85 ms]
change: [−3.1181% −1.6720% −0.4502%] (p = 0.11 > 0.05)
No change in performance detected.
Signed-off-by: Vicent Marti <vmg@strn.cat>
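Taking the point estimates (the middle value of each `time` interval) from the benchmark above, the commit-graph cache makes this blame roughly 51 times faster, i.e. it removes about 98% of the runtime:

```latex
\frac{14.895\ \text{s}}{290.30\ \text{ms}}
  = \frac{14\,895\ \text{ms}}{290.30\ \text{ms}} \approx 51.3,
\qquad
1 - \frac{290.30}{14\,895} \approx 0.98
```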
The `BlameSink` trait now returns a `std::ops::ControlFlow` value that can be used to interrupt the blame early.
Signed-off-by: Vicent Marti <vmg@strn.cat>
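A sketch of what early interruption enables, assuming a hypothetical `emit` method returning `ControlFlow<()>` and a simplified `BlameEntry`; the exact `gix_blame` signature may differ. The sink below answers "who last touched line N?" and breaks as soon as the entry covering that line arrives, and the hypothetical `drive` loop stands in for the traversal, stopping when the sink asks it to.

```rust
use std::ops::ControlFlow;

/// Simplified stand-in for a blame entry (hypothetical fields).
#[derive(Debug, Clone, PartialEq)]
struct BlameEntry {
    start: u32,
    len: u32,
    commit: u64,
}

/// Hypothetical sink: `Break(())` asks the traversal to stop.
trait BlameSink {
    fn emit(&mut self, entry: BlameEntry) -> ControlFlow<()>;
}

/// Stops once the entry covering `line` has been found.
struct FindLine {
    line: u32,
    inspected: usize,
    found: Option<BlameEntry>,
}

impl BlameSink for FindLine {
    fn emit(&mut self, entry: BlameEntry) -> ControlFlow<()> {
        self.inspected += 1;
        if (entry.start..entry.start + entry.len).contains(&self.line) {
            self.found = Some(entry);
            ControlFlow::Break(())
        } else {
            ControlFlow::Continue(())
        }
    }
}

/// Stand-in for the traversal: returns false if the sink interrupted it.
fn drive(discovered: impl IntoIterator<Item = BlameEntry>, sink: &mut impl BlameSink) -> bool {
    for entry in discovered {
        if sink.emit(entry).is_break() {
            return false;
        }
    }
    true
}

fn main() {
    let discovered = vec![
        BlameEntry { start: 0, len: 5, commit: 1 },
        BlameEntry { start: 5, len: 3, commit: 2 },
        BlameEntry { start: 8, len: 2, commit: 3 },
    ];
    let mut sink = FindLine { line: 6, inspected: 0, found: None };
    let completed = drive(discovered, &mut sink);
    println!("completed={completed} found={:?}", sink.found);
}
```

In a real traversal, breaking early also avoids the remaining tree diffs, which is the point of threading `ControlFlow` through the sink rather than simply ignoring later entries.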
Thanks for the review @cruessler! I made the two changes. Interrupting the incremental blame is a little involved and makes the code harder to follow, so I've kept it in a separate commit. Let me know what else you would like to see changed.
Hiiiiii @Byron! Thanks for all your work on the library!
I've been playing around with the new blame APIs that @cruessler developed. The existing `gix_blame::file` didn't fit the use case we needed at Cursor, so I took a stab at implementing an equivalent to `git blame --incremental`. The changes were quite minimal because I just left `gix_blame::file` as a thin wrapper over `gix_blame::incremental`.
I then tried benchmarking the `incremental` API against Git itself and the numbers were not good at all. After some review, I noticed that the `gix-commitgraph` crate just didn't support Git's changed-paths bloom filter cache, so I took a stab at implementing that too.
The results are very good. These are for `tools/clang/spanify/Spanifier.cpp` in the Chromium repository, which is a very, very hairy file:
Since all these changes are quite related, I'm putting them up here in a single PR. Every commit is self-contained and explains its changes in the commit message, so if you'd like me to split this into smaller PRs, just let me know.
Thanks!