perf: batch git exec calls for notes, rev-parse, status, and checkpoints#595
perf: batch git exec calls for notes, rev-parse, status, and checkpoints#595
Conversation
- Fix #1: get_commits_with_notes_from_list uses batch cat-file instead of N individual git notes show calls - Fix #2: populate_ai_human_authors and overlay_ai_authorship in blame batch-fetch all authorship notes upfront - Fix #4: resolve_commit + resolve_parent combined into single git rev-parse call - Fix #5: status() skips separate get_staged_filenames() call when no pathspecs provided - Fix #7: read_all_checkpoints() called once instead of twice in get_all_tracked_files - Add batch_get_authorship_logs() helper to refs.rs for reuse across codebase - Make batch_read_blobs_with_oids() public for cross-module access Co-Authored-By: Sasha Varlamov <sasha@sashavarlamov.com>
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
|
No AI authorship found for these commits. Please install git-ai to start tracking AI generated code in your commits. |
|
No AI authorship found for these commits. Please install git-ai to start tracking AI generated code in your commits. |
|
|
Co-Authored-By: Sasha Varlamov <sasha@sashavarlamov.com>
Co-Authored-By: Sasha Varlamov <sasha@sashavarlamov.com>
|
No AI authorship found for these commits. Please install git-ai to start tracking AI generated code in your commits. |
Summary
This PR implements performance optimizations to reduce git subprocess overhead by batching git exec calls. The primary bottleneck in git-ai is the number of individual git invocations, especially for operations like
git notes showandgit rev-parsethat are called repeatedly in loops.Changes:
blame.rsandrefs.rsusing 2 git calls (1cat-file --batch-check+ 1cat-file --batch) instead of N individualgit notes showcallsresolve_commit+resolve_parentinto singlegit rev-parsecall indiff.rsget_staged_filenames()call instatus()when no pathspecs provided (porcelain v2 output already contains staging info)git-aiinit idempotent #7: Read checkpoints once instead of twice inget_all_tracked_files()batch_get_authorship_logs()helper function for reuse across codebasebatch_read_blobs_with_oids()public for cross-module accessExpected impact: Significant reduction in subprocess overhead for blame operations (N commits → 2 git calls), diff operations (2 calls → 1), and status operations (2 calls → 1 when no pathspecs).
Updates since last revision
maininsrc/commands/diff.rs: theresolve_two_revsfunction now usesexec_git_with_profile(&args, InternalGitProfile::General)?(combining the PR's?error propagation + stdout parsing with main's migration fromexec_gittoexec_git_with_profile). This matches the pattern already used byresolve_commitin the same file.Review & Testing Checklist for Human
resolve_two_revs— confirm error propagation via?behaves correctly when revisions can't be resolved.get_staged_filenames()pre-call.batch_get_authorship_logs()correctly filters byAUTHORSHIP_LOG_VERSION, but the inline batch code inget_commits_with_notes_from_list()does not — it deserializes without a version check. Confirm this difference is intentional.Notes
cargo buildandcargo clippywere run — no functional tests executedstatus.rschange assumes porcelain v2 output correctly identifies staged files; this should be verified with real reposdiff.rsresolve_two_revsnow propagates errors with?instead of the oldmatchonOk/Err; initial commits fall back to the empty tree hash via the caller'smatchonresolve_two_revsget_diff_json_filtered, the parent is resolved viaresolve_two_revs(repo, &parent_rev, &to_commit)which redundantly re-resolvesto_commit— minor inefficiency but not a bugLink to Devin run: https://app.devin.ai/sessions/972f0c2bf6d94b17b663d3e7d52da147
Requested by: @svarlamov