perf(informer): add TransformFuncs to reduce cache memory usage#2667
perf(informer): add TransformFuncs to reduce cache memory usage#2667theakshaypant wants to merge 1 commit intotektoncd:mainfrom
Conversation
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2667 +/- ##
==========================================
+ Coverage 58.82% 58.86% +0.04%
==========================================
Files 204 205 +1
Lines 20134 20186 +52
==========================================
+ Hits 11844 11883 +39
- Misses 7525 7536 +11
- Partials 765 767 +2 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Code Review
This pull request introduces new cache transform functions (RepositoryForCache and PipelineRunForCache) in pkg/informer/transform to reduce memory usage by stripping unnecessary fields from Repository and PipelineRun objects before they are stored in informer caches. The pkg/reconciler/controller.go is updated to apply these transforms to the respective informers. The review comments point out a critical issue where the transform functions modify objects in-place, which could lead to race conditions or unexpected behavior. It is suggested to use DeepCopy() to operate on copies of the objects for safety.
| repo, ok := obj.(*pacv1alpha1.Repository) | ||
| if !ok { | ||
| return obj, nil | ||
| } | ||
|
|
||
| repo.ManagedFields = nil | ||
| repo.Annotations = nil | ||
| repo.Status = nil | ||
|
|
||
| return repo, nil |
There was a problem hiding this comment.
The transform function modifies the input object repo in-place. This can lead to race conditions or unexpected behavior if other parts of the system hold a reference to the original object. It's safer to operate on a copy of the object. Please add a repo.DeepCopy() at the beginning of the function body. The Tekton implementation this PR was inspired by also uses DeepCopy for safety.
repo, ok := obj.(*pacv1alpha1.Repository)
if !ok {
return obj, nil
}
repo = repo.DeepCopy()
repo.ManagedFields = nil
repo.Annotations = nil
repo.Status = nil
return repo, nilThere was a problem hiding this comment.
Intended behaviour to strip these fields from the cache.
| pr, ok := obj.(*tektonv1.PipelineRun) | ||
| if !ok { | ||
| return obj, nil | ||
| } | ||
|
|
||
| pr.ManagedFields = nil | ||
|
|
||
| // Strip large Spec fields — watcher only checks Spec.Status (pending state) | ||
| pr.Spec.PipelineRef = nil | ||
| pr.Spec.PipelineSpec = nil | ||
| pr.Spec.Params = nil | ||
| pr.Spec.Workspaces = nil | ||
| pr.Spec.TaskRunSpecs = nil | ||
| pr.Spec.TaskRunTemplate = tektonv1.PipelineTaskRunTemplate{} | ||
| pr.Spec.Timeouts = nil | ||
|
|
||
| // Strip large Status fields — watcher only reads Conditions, StartTime, CompletionTime | ||
| pr.Status.PipelineSpec = nil | ||
| pr.Status.ChildReferences = nil | ||
| pr.Status.Provenance = nil | ||
| pr.Status.SpanContext = nil | ||
|
|
||
| return pr, nil |
There was a problem hiding this comment.
The transform function modifies the input object pr in-place. This can lead to race conditions or unexpected behavior if other parts of the system hold a reference to the original object. It's safer to operate on a copy of the object. Please add a pr.DeepCopy() at the beginning of the function body. The Tekton implementation this PR was inspired by also uses DeepCopy for safety.
pr, ok := obj.(*tektonv1.PipelineRun)
if !ok {
return obj, nil
}
pr = pr.DeepCopy()
pr.ManagedFields = nil
// Strip large Spec fields — watcher only checks Spec.Status (pending state)
pr.Spec.PipelineRef = nil
pr.Spec.PipelineSpec = nil
pr.Spec.Params = nil
pr.Spec.Workspaces = nil
pr.Spec.TaskRunSpecs = nil
pr.Spec.TaskRunTemplate = tektonv1.PipelineTaskRunTemplate{}
pr.Spec.Timeouts = nil
// Strip large Status fields — watcher only reads Conditions, StartTime, CompletionTime
pr.Status.PipelineSpec = nil
pr.Status.ChildReferences = nil
pr.Status.Provenance = nil
pr.Status.SpanContext = nil
return pr, nilThere was a problem hiding this comment.
Intended behaviour to strip these fields from the cache.
|
Since the repo status field has already been deprecated, we could also target removing it altogether. |
🤖 AI Analysis - pr-complexity-ratingTo provide an accurate assessment, please provide the diff/file changes associated with PR #2667. Since the current metadata only shows a merge commit and pipeline success, I cannot evaluate the specific code logic. However, based on the branch name 📊 PR Review Complexity
Overall difficulty: [TBD] Summary[Awaiting diff] This PR appears to implement a cache for repository informers in Suggested reviewers focus
Please paste the code diff or file list to complete this triage. Generated by Pipelines-as-Code LLM Analysis |
🤖 AI Analysis - pr-complexity-ratingBased on the provided metadata, this pull request appears to be a merge commit synchronizing a feature branch ( 📊 PR Review Complexity
Overall difficulty: Easy SummaryThis PR is a merge commit from Suggested reviewers focus
Generated by Pipelines-as-Code LLM Analysis |
58ed503 to
049275e
Compare
|
Push to see the |
049275e to
9966baa
Compare
|
@theakshaypant fyi i disabled it... need to do a rework of that feature |
Add cache transform functions for the Repository and PipelineRun informers, stripping large unnecessary fields before objects enter the informer cache. Inspired by tektoncd/pipeline#9316. For Repository objects, ManagedFields, Annotations and Status are stripped. The reconciler never reads Repository annotations or Status from the lister; Status is always fetched fresh via direct API call before updates. For PipelineRun objects, ManagedFields and large Spec and Status fields are stripped. The watcher only needs Annotations, Spec.Status (pending check), Status.Conditions, and timing fields. All other data is fetched directly from the API when needed. Benchmark results with production-realistic objects show an 89% JSON size reduction for Repository objects (5.6KB to 600B) and 94% for PipelineRun objects (10.7KB to 677B), with corresponding 8-10x reductions in heap allocation per cached object. Signed-off-by: Akshay Pant <akpant@redhat.com> Asisted-by: Claude <noreply@anthropic.com>
9966baa to
781ae7a
Compare
📝 Description of the Change
Add cache transform functions for the Repository and PipelineRun informers, stripping large unnecessary fields before objects enter the informer cache. Inspired by tektoncd/pipeline#9316.
For Repository objects, ManagedFields, Annotations and Status are stripped. The reconciler never reads Repository annotations or Status from the lister; Status is always fetched fresh via direct API call before updates.
For PipelineRun objects, ManagedFields and large Spec and Status fields are stripped. The watcher only needs Annotations, Spec.Status (pending check), Status.Conditions, and timing fields. All other data is fetched directly from the API when needed.
Benchmark results with production-realistic objects show an 89% JSON size reduction for Repository objects (5.6KB to 600B) and 94% for PipelineRun objects (10.7KB to 677B), with corresponding 8-10x reductions in heap allocation per cached object.
🔗 Linked GitHub Issue
N/A
🧪 Testing Strategy
Ran a script to simulate a high-load env on the watcher, creating ~5000 PipelineRuns in 10 minutes with bloated annotations on the PipelineRuns, the heap profile does not show majority resource utilisation from the Informer as was the case earlier in such a test

🤖 AI Assistance
AI assistance can be used for various tasks, such as code generation,
documentation, or testing.
Please indicate whether you have used AI assistance
for this PR and provide details if applicable.
Important
Slop will be simply rejected, if you are using AI assistance you need to make sure you
understand the code generated and that it meets the project's standards. you
need at least know how to run the code and deploy it (if needed). See
startpaac to make it easy
to deploy and test your code changes.
If the majority of the code in this PR was generated by an AI, please add a
Co-authored-bytrailer to your commit message.For example:
Co-authored-by: Claude noreply@anthropic.com
✅ Submitter Checklist
fix:,feat:) matches the "Type of Change" I selected above.make testandmake lintlocally to check for and fix anyissues. For an efficient workflow, I have considered installing
pre-commit and running
pre-commit installtoautomate these checks.