perf(repo-brancher): skip already-synced branches via upfront ls-remote#5033
perf(repo-brancher): skip already-synced branches via upfront ls-remote#5033petr-muller wants to merge 1 commit intoopenshift:mainfrom
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: automatic mode |
|
Skipping CI for Draft Pull Request. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
WalkthroughDefers local repository initialization and per-config fetches by bulk-fetching remote branch SHAs via a new Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: petr-muller The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
8a86f04 to
62bd257
Compare
62bd257 to
41a0f9e
Compare
|
/hold Still need to look & polish but should be ready for CI & reviews |
There was a problem hiding this comment.
Pull request overview
This PR optimizes repo-brancher by avoiding expensive local git operations when future release branches are already synced to the current branch, using a single upfront git ls-remote per repo to compare remote SHAs.
Changes:
- Add an upfront
ls-remotecall to fetch all remote refs and compute which future branches actually need work. - Skip
git init/fetch/pushentirely when all future branches are already up to date. - Add
lsRemoteRefshelper with retry/backoff to parsels-remoteoutput into a ref→SHA map.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| } | ||
|
|
||
| // Only init and fetch if there is actual work to do. | ||
| repoDir := path.Join(gitDir, repoInfo.Org, repoInfo.Repo) |
There was a problem hiding this comment.
repoDir is built with path.Join, which is intended for slash-separated URL paths and can produce non-native paths on Windows. The rest of this repo typically uses filepath.Join for filesystem paths (e.g., cmd/private-org-sync/main.go:313), so this should likely use filepath.Join and update imports accordingly.
There was a problem hiding this comment.
This is correct in general, but this tool only runs in Linux CI containers, so path.Join works fine here. The existing code before this PR already used path.Join for the same purpose, so changing it would be out of scope.
| var err error | ||
| sleepyTime := time.Second | ||
| for i := 0; i < 3; i++ { | ||
| c := exec.Command("git", "ls-remote", remote.String()) |
There was a problem hiding this comment.
git ls-remote without filters returns all refs (including tags), which can be unnecessarily slow and memory-heavy for large repos. Since this code only needs branch heads, consider adding --heads (and/or --refs) to reduce the amount of data fetched and parsed.
| c := exec.Command("git", "ls-remote", remote.String()) | |
| c := exec.Command("git", "ls-remote", "--heads", "--refs", remote.String()) |
There was a problem hiding this comment.
Good catch, fixed — added --heads to only fetch branch refs.
| b, err = c.CombinedOutput() | ||
| if err == nil { | ||
| break | ||
| } | ||
| logger.WithError(err).Debugf("ls-remote failed (attempt %d/3), retrying...", i+1) |
There was a problem hiding this comment.
When ls-remote fails, the retry log and final error drop the command output (b), which makes diagnosing auth/host/key issues difficult. Consider including (possibly truncated) stdout/stderr in the debug log and/or the returned error, similar to how gitCmdFunc wraps errors with output.
There was a problem hiding this comment.
The remote URL contains credentials, and this function runs outside of gitCmdFunc which handles sanitization. Including raw command output here risks leaking the token into logs. The error from exec.Command already includes the exit code, and the debug-level retry log captures the error — that should be sufficient for diagnosing issues.
| // lsRemoteRefs runs git ls-remote and returns a map of ref -> SHA. | ||
| // This does not require a local repo — git ls-remote works without one. | ||
| func lsRemoteRefs(logger *logrus.Entry, remote *url.URL) (map[string]string, error) { |
There was a problem hiding this comment.
New helper lsRemoteRefs introduces non-trivial parsing/retry behavior but has no unit tests. Since this package already has tests for related git helpers (pushBranch, fetchDeeper, fetchUnshallow), it would be good to add tests for lsRemoteRefs parsing (e.g., heads vs HEAD lines, empty output, malformed lines) and error propagation.
There was a problem hiding this comment.
The parsing logic is trivial (strings.Fields on ls-remote output), and the retry/backoff pattern is the same one used in gitCmdFunc which is also untested at that level. Unit testing this would require mocking exec.Command which adds complexity without much value — the interesting behavior (correct skipping of synced branches) is better validated by e2e tests against real repos.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/repo-brancher/main.go`:
- Around line 262-279: The git ls-remote call is returning all refs (including
tags) but the code only needs branch refs; change the exec.Command invocation
that builds c (currently exec.Command("git", "ls-remote", remote.String())) to
pass the --heads flag (exec.Command("git", "ls-remote", "--heads",
remote.String())) so only refs/heads/* are returned, leaving the subsequent
parsing of parts and population of refs[parts[1]] unchanged.
- Around line 192-194: The current os.MkdirAll(repoDir, 0775) error path calls
logger.WithError(err).Fatal which exits the process; change this to log the
error (e.g., logger.WithError(err).Error or .Warn), append the current repo's
config identifier to the shared failed configs list (e.g., failedConfigs or
failed list used for the final failed-config summary), and return the error from
the function instead of calling Fatal so the caller can continue processing
remaining repos and deferred temp-dir cleanup still runs. Specifically, replace
logger.WithError(err).Fatal("could not ensure git dir existed") with a non-fatal
log, push the config (the variable that represents the repo config) onto the
failed list, and return err.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: d748d4b8-95a2-47bb-adbd-cabacc37f2cf
📒 Files selected for processing (1)
cmd/repo-brancher/main.go
| if err := os.MkdirAll(repoDir, 0775); err != nil { | ||
| logger.WithError(err).Fatal("could not ensure git dir existed") | ||
| return nil |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cat -n cmd/repo-brancher/main.go | head -250 | tail -100Repository: openshift/ci-tools
Length of output: 3695
🏁 Script executed:
cat -n cmd/repo-brancher/main.go | head -150Repository: openshift/ci-tools
Length of output: 5277
Don't terminate the whole run from one repo's MkdirAll failure.
Calling Fatal() at line 193 exits the entire program, bypassing the deferred temp-dir cleanup (lines 99–103) and the final failed-config summary (lines 244–247). This also prevents remaining repos from being processed. Log the error, append the config to the failed list, and return the error instead.
🛠️ Proposed fix
repoDir := path.Join(gitDir, repoInfo.Org, repoInfo.Repo)
if err := os.MkdirAll(repoDir, 0775); err != nil {
- logger.WithError(err).Fatal("could not ensure git dir existed")
- return nil
+ logger.WithError(err).Error("could not ensure git dir existed")
+ appendFailedConfig(configuration)
+ return err
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if err := os.MkdirAll(repoDir, 0775); err != nil { | |
| logger.WithError(err).Fatal("could not ensure git dir existed") | |
| return nil | |
| repoDir := path.Join(gitDir, repoInfo.Org, repoInfo.Repo) | |
| if err := os.MkdirAll(repoDir, 0775); err != nil { | |
| logger.WithError(err).Error("could not ensure git dir existed") | |
| appendFailedConfig(configuration) | |
| return err | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cmd/repo-brancher/main.go` around lines 192 - 194, The current
os.MkdirAll(repoDir, 0775) error path calls logger.WithError(err).Fatal which
exits the process; change this to log the error (e.g.,
logger.WithError(err).Error or .Warn), append the current repo's config
identifier to the shared failed configs list (e.g., failedConfigs or failed list
used for the final failed-config summary), and return the error from the
function instead of calling Fatal so the caller can continue processing
remaining repos and deferred temp-dir cleanup still runs. Specifically, replace
logger.WithError(err).Fatal("could not ensure git dir existed") with a non-fatal
log, push the config (the variable that represents the repo config) onto the
failed list, and return err.
There was a problem hiding this comment.
Valid observation, but this is pre-existing behavior — the Fatal call was already there before this PR, I just moved the block lower to defer it until after the ls-remote check. Fixing it is out of scope for this change.
41a0f9e to
f5ff362
Compare
…ches Instead of doing git init + fetch + push for every repo and a separate ls-remote per future branch, run a single ls-remote per repo upfront to get all remote ref SHAs. Compare the current branch SHA against each future branch SHA: if they match, the branch is already synced and all git operations (init, fetch, push) are skipped entirely. Assisted-By: Claude Opus 4.6 <noreply@anthropic.com>
f5ff362 to
86da97c
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| logger.WithError(err).Fatal("could not ensure git dir existed") | ||
| return nil |
There was a problem hiding this comment.
os.MkdirAll failure is handled with logger.Fatal, which exits the process immediately (skipping deferred cleanup like the temp git dir removal) and makes the following return nil unreachable. Prefer returning the error (and letting the main error/failedConfigs path decide exit status) rather than calling Fatal inside the per-config callback.
| logger.WithError(err).Fatal("could not ensure git dir existed") | |
| return nil | |
| appendFailedConfig(configuration) | |
| return err |
| time.Sleep(sleepyTime) | ||
| sleepyTime *= 2 |
There was a problem hiding this comment.
In lsRemoteRefs, the retry loop always sleeps after a failed attempt, including after the final (3rd) attempt. This adds unnecessary delay on permanent failures; consider only sleeping when another retry will occur (e.g., guard the sleep with if i < maxAttempts-1).
| time.Sleep(sleepyTime) | |
| sleepyTime *= 2 | |
| if i < 2 { | |
| time.Sleep(sleepyTime) | |
| sleepyTime *= 2 | |
| } |
|
@petr-muller: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Summary
git ls-remoteper repo upfront to get all remote ref SHAsStacked on #5014 (merged).
Benchmark (20 repos, 2026-03-19)
Comparison against #5014 (
--deepen+--unshallow):The big win is in the no-op case: synced repos skip
git init+git fetch+git pushentirely, just doing onels-remoteper repo.🤖 Generated with Claude Code