Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds retry logic with exponential backoff to handle HTTP 429 (Too Many Requests) rate limiting errors when pulling Docker images in CI. The implementation prevents CI failures caused by Docker registry rate limits by automatically retrying failed pulls with increasing delays.
Changes:
- Added retry loop with configurable MAX_RETRIES (5 attempts) and exponential backoff strategy
- Implemented 429 error detection by checking for "toomanyrequests" or "429" in stderr output
- Added parsing of "retry-after" header from registry responses to respect server-suggested delays
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
.ci/pull_fluent_image.py
Outdated
| ) | ||
| break # Success, exit retry loop | ||
| except subprocess.CalledProcessError as e: | ||
| stderr_output = e.stderr.lower() |
There was a problem hiding this comment.
While capture_output=True with text=True typically guarantees stderr is a string, add defensive programming by using 'e.stderr or ""' before calling .lower() to handle any edge cases where stderr might be None. This prevents potential AttributeError exceptions.
.ci/pull_fluent_image.py
Outdated
| # Parse retry-after hint if available | ||
| retry_after = None | ||
| match = re.search( | ||
| r"retry-after:\s*([\d.]+)\s*ms", e.stderr, re.IGNORECASE |
There was a problem hiding this comment.
The regex pattern on line 43 searches against e.stderr (which is the original bytes or string), but line 35 processes it with .lower(). If the retry-after header contains uppercase letters like "Retry-After" (which is the standard HTTP header format), it may match correctly due to re.IGNORECASE flag. However, the inconsistency between using e.stderr here and stderr_output elsewhere could lead to confusion. Consider using e.stderr consistently, or apply the same transformations to both.
| r"retry-after:\s*([\d.]+)\s*ms", e.stderr, re.IGNORECASE | |
| r"retry-after:\s*([\d.]+)\s*ms", stderr_output, re.IGNORECASE |
| subprocess.run( | ||
| ["docker", "pull", full_image_name], | ||
| check=True, | ||
| capture_output=True, | ||
| text=True, | ||
| ) |
There was a problem hiding this comment.
The success message from docker pull is not printed, which makes the output silent on success. The original code allowed docker pull to print to stdout/stderr directly. With capture_output=True, successful pulls produce no output, making it harder to debug CI issues. Consider printing a success message or the stdout output on successful pulls to maintain visibility into what's happening during CI runs.
| subprocess.run( | |
| ["docker", "pull", full_image_name], | |
| check=True, | |
| capture_output=True, | |
| text=True, | |
| ) | |
| result = subprocess.run( | |
| ["docker", "pull", full_image_name], | |
| check=True, | |
| capture_output=True, | |
| text=True, | |
| ) | |
| if result.stdout: | |
| print(result.stdout, end="") | |
| print(f"Successfully pulled Docker image: {full_image_name}") |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
.ci/pull_fluent_image.py
Outdated
| result = subprocess.run( | ||
| ["docker", "pull", full_image_name], | ||
| check=True, | ||
| stderr=subprocess.PIPE, | ||
| text=True, | ||
| ) | ||
| print(f"Successfully pulled Docker image: {full_image_name}") |
There was a problem hiding this comment.
result is assigned from subprocess.run(...) but never used. Remove the assignment (or use the value) to avoid dead code and keep the retry logic minimal.
Up to standards ✅🟢 Issues
|
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| ) | ||
| raise | ||
| else: | ||
| # Not a rate limit error, re-raise immediately |
There was a problem hiding this comment.
Because stderr is captured (stderr=subprocess.PIPE), Docker error output won’t be streamed to CI logs. When the exception is re-raised (non-429 path, or after max retries), the captured stderr likely won’t be visible, making failures hard to diagnose. Consider printing/logging e.stderr before re-raising (and/or including it in a new exception message) so CI logs retain the original docker pull error details.
| ) | |
| raise | |
| else: | |
| # Not a rate limit error, re-raise immediately | |
| ) | |
| print("docker pull stderr output:\n" + (e.stderr or "")) | |
| raise | |
| else: | |
| # Not a rate limit error, re-raise immediately | |
| print("docker pull stderr output:\n" + (e.stderr or "")) |
Change Summary
Add 429 backoff support to the docker image puller
Rationale
My jobs kept failing due to this
Impact
Just CI