feat: Implement lease-based concurrency backend for PipelineRun queuing #2687

Closed
chmouel wants to merge 2 commits into tektoncd:main from chmouel:concurrency-lease-backend

Conversation

chmouel (Member) commented Apr 16, 2026

📝 Description of the Change

Implements an opt-in, lease-based concurrency backend as an alternative to the existing in-memory queue for coordinating PipelineRun execution. Because queue state is stored in the cluster rather than only in memory, the queue can be managed across controller replicas in distributed deployments.

Key Features

  • Lease-based coordination: Uses Kubernetes Lease resources to coordinate concurrency slots across multiple controller replicas
  • Queue status annotations: Exposes queue position and status on PipelineRun objects via annotations for observability and debugging
  • Configuration sync: Automatically restarts the watcher when concurrency backend configuration changes, ensuring live config updates are picked up immediately
  • Enhanced debug logging: Comprehensive debug logging throughout the lease manager and reconciler for troubleshooting
  • Background recovery: Recovers queued PipelineRuns on controller startup to handle any lost state from replicas going offline

🧪 Testing Strategy

  • Unit tests (1,464 lines of test code covering all new functionality)
  • Integration tests
  • End-to-end tests
  • Manual testing
  • Not Applicable

🤖 AI Assistance

  • I have not used any AI assistance for this PR.
  • I have used AI assistance for this PR.

✅ Submitter Checklist

  • 📝 My commit messages are clear, informative, and follow the project's How to write a git commit message guide. The Gitlint linter validates this in CI.
  • ✨ I have ensured my commit message prefix (e.g., fix:, feat:) matches the "Type of Change" I selected above.
  • ♽ I have run make test and make lint locally to check for and fix any issues.
  • 📖 I have added or updated documentation for any user-facing changes.
  • 🧪 I have added sufficient unit tests for my code changes.
  • 🎁 I have added end-to-end tests where feasible.
  • 🔎 I have addressed any CI test flakiness or provided a clear reason to bypass it.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a new Kubernetes-backed concurrency queue backend called "lease" to Pipelines-as-Code, alongside the existing in-process "memory" backend. This new coordination mechanism uses Kubernetes Lease objects and PipelineRun annotations to manage concurrent execution more resiliently across watcher restarts. The changes include the implementation of a LeaseManager, a background recovery loop in the controller, and enhanced debugging visibility through new PipelineRun annotations. Feedback was provided to optimize performance in the LeaseManager by using server-side label selectors when listing PipelineRuns.

Comment thread pkg/queue/lease_manager.go Outdated

codecov-commenter commented Apr 16, 2026


Codecov Report

❌ Patch coverage is 75.41436% with 267 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.85%. Comparing base (c9be9d6) to head (04733a6).
⚠️ Report is 19 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/queue/lease_manager.go | 79.04% | 94 Missing and 25 partials ⚠️ |
| pkg/reconciler/controller.go | 74.35% | 35 Missing and 5 partials ⚠️ |
| pkg/reconciler/queue_pipelineruns.go | 67.54% | 29 Missing and 8 partials ⚠️ |
| pkg/queue/debug_info.go | 74.10% | 28 Missing and 8 partials ⚠️ |
| pkg/params/config_sync.go | 33.33% | 14 Missing ⚠️ |
| pkg/reconciler/reconciler.go | 40.00% | 12 Missing ⚠️ |
| pkg/queue/common.go | 85.71% | 4 Missing and 2 partials ⚠️ |
| pkg/test/concurrency/concurrency.go | 80.00% | 2 Missing ⚠️ |
| pkg/queue/queue_manager.go | 87.50% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2687      +/-   ##
==========================================
+ Coverage   58.85%   59.85%   +0.99%     
==========================================
  Files         204      207       +3     
  Lines       20149    21201    +1052     
==========================================
+ Hits        11859    12690     +831     
- Misses       7525     7703     +178     
- Partials      765      808      +43     


@chmouel chmouel force-pushed the concurrency-lease-backend branch 8 times, most recently from af6d7ca to 57be58c Compare April 16, 2026 07:44
chmouel (Member, Author) commented Apr 16, 2026

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a new lease-backed concurrency backend for Pipelines-as-Code, providing a more resilient alternative to the default in-memory queue by using Kubernetes Leases and PipelineRun claims. The implementation includes a background recovery loop to handle stalled promotions and adds comprehensive debugging annotations and events. Feedback focuses on performance optimizations, specifically recommending the use of label selectors instead of labels.Everything() in the recovery loop and transitioning from direct API list calls to informer-based listers in the reconciliation path to minimize API server load.
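
To make the label-selector feedback concrete, here is a small stdlib-only sketch of an equality-based selector. The comma-separated `key=value` form is the standard Kubernetes selector syntax, and such a string can be passed as `LabelSelector` in list options so the API server filters before returning results. The label keys and the helper names `selectorFor` and `matches` are hypothetical; they are not the keys or functions used in this PR.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// selectorFor renders an equality-based label selector such as "a=1,b=2".
// Keys are sorted so the same requirements always give the same string.
func selectorFor(want map[string]string) string {
	keys := make([]string, 0, len(want))
	for k := range want {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, k+"="+want[k])
	}
	return strings.Join(parts, ",")
}

// matches models the server-side check: every requirement must be met.
func matches(labels, want map[string]string) bool {
	for k, v := range want {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

func main() {
	// Hypothetical label keys, used only for this example.
	want := map[string]string{
		"example.dev/repository": "my-repo",
		"example.dev/state":      "queued",
	}
	fmt.Println(selectorFor(want))

	queued := map[string]string{"example.dev/repository": "my-repo", "example.dev/state": "queued"}
	other := map[string]string{"example.dev/repository": "other-repo", "example.dev/state": "queued"}
	fmt.Println(matches(queued, want), matches(other, want))
}
```

An empty requirement set matches every object, which is the behaviour the review is steering away from in the recovery loop: narrowing the selector means fewer objects are returned and decoded on each pass.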

Comment thread pkg/reconciler/controller.go
Comment thread pkg/queue/lease_manager.go
chmouel added 2 commits April 16, 2026 13:08
Introduced an opt-in concurrency backend that used Kubernetes Lease
objects and PipelineRun annotations for queue coordination. This
addressed potential state drift and race conditions during watcher
restarts or API delays by storing the queue state in the cluster
instead of only in memory. Added a global configuration setting to
choose between the legacy memory backend and the new lease-based
logic. Updated the queue manager interface to support context-aware
operations and ensured that stale claims were automatically
reclaimed via TTL-based expiration.

The system preserved existing lease objects during the release process
instead of completely deleting them. This reused the same records
across multiple acquisition cycles, which reduced excessive
communication overhead with the cluster.

AI-assisted-by: Cursor (Codex)
Signed-off-by: Chmouel Boudjnah <chmouel@redhat.com>

Refactored the event checking logic to specifically target and report
warning events instead of filtering out individual event reasons. This
ensures that unexpected errors are correctly identified during test
validation.
@chmouel chmouel force-pushed the concurrency-lease-backend branch from 57be58c to 04733a6 Compare April 16, 2026 11:08
chmouel (Member, Author) commented Apr 16, 2026

closing for now, will come back to it when time allows it

@chmouel chmouel closed this Apr 16, 2026