feat: add half-open state to circuit breaker#33

Merged
xakep666 merged 2 commits into platacard:main from bjoern-weidlich-anchorage:bjoern/half-open-circuit-breaker
Mar 30, 2026

Conversation

Contributor

bjoern-weidlich-anchorage commented Mar 25, 2026

The circuit breaker currently stays permanently open once tripped, so remote storage is disabled for the rest of the build. In long CI builds, the server may recover mid-build, but cacheprog won't try again.

This adds a half-open state: after the circuit trips, it periodically allows a single probe request through. If the probe succeeds, the circuit closes and remote storage resumes. If it fails, the circuit stays open and the timer resets.

The implementation uses a timestamp plus CompareAndSwap so that only one concurrent request wins the probe; no background goroutines or lifecycle management are needed.

New config: --retry-after / REMOTE_STORAGE_RETRY_AFTER (default 15s). Set to 0 to disable recovery and preserve the current behavior.

The circuit breaker now periodically probes the upstream after tripping,
allowing recovery if the remote cache server comes back during a long
build. Uses a timestamp-based approach with CompareAndSwap to ensure
only one concurrent request acts as the probe.

Configurable via --retry-after / REMOTE_STORAGE_RETRY_AFTER (default 15s).
Set to 0 to disable and preserve the previous permanently-open behavior.
bjoern-weidlich-anchorage marked this pull request as ready for review March 25, 2026 16:20
Comment thread internal/app/cacheprog/circuit_breaker_test.go
@xakep666
Collaborator

Some requests here:

  • please fix linter issues (golangci-lint run --fix should help)
  • please add a new changelog entry under Unreleased

I also thought about adding a functests scenario (IMHO one should be present for a feature like this). However, it requires compiling a large-scale project, which would need to be found first. Simulating network problems shouldn't be an issue, because the functests setup already includes toxiproxy.

- Remove now func() field, use time.Now() directly with synctest fake time
- Add test verifying only one probe request in half-open state
- Add changelog entry
- Fix linter issues
xakep666 merged commit 22feeb4 into platacard:main on Mar 30, 2026
6 of 8 checks passed