
perf(informer): add TransformFuncs to reduce cache memory usage #2667

Open
theakshaypant wants to merge 1 commit into tektoncd:main from theakshaypant:feat/add-repo-informer-cache

Conversation

@theakshaypant (Member) commented Apr 9, 2026

📝 Description of the Change

Add cache transform functions for the Repository and PipelineRun informers, stripping large unnecessary fields before objects enter the informer cache. Inspired by tektoncd/pipeline#9316.

For Repository objects, ManagedFields, Annotations and Status are stripped. The reconciler never reads Repository annotations or Status from the lister; Status is always fetched fresh via direct API call before updates.

For PipelineRun objects, ManagedFields and large Spec and Status fields are stripped. The watcher only needs Annotations, Spec.Status (pending check), Status.Conditions, and timing fields. All other data is fetched directly from the API when needed.

Benchmark results with production-realistic objects show an 89% JSON size reduction for Repository objects (5.6KB to 600B) and 94% for PipelineRun objects (10.7KB to 677B), with corresponding 8-10x reductions in heap allocation per cached object.

🔗 Linked GitHub Issue

N/A

🧪 Testing Strategy

  • Unit tests
  • Integration tests
  • End-to-end tests
  • Manual testing
  • Not Applicable

Ran a script to simulate a high-load environment on the watcher, creating ~5000 PipelineRuns with bloated annotations over 10 minutes. The resulting heap profile no longer shows the informer accounting for the majority of memory usage, as it did in the same test before this change.
(heap profile screenshot: watcher-memory-with-transformfunc)

🤖 AI Assistance

AI assistance can be used for various tasks, such as code generation, documentation, or testing. Please indicate whether you have used AI assistance for this PR and provide details if applicable.

  • I have not used any AI assistance for this PR.
  • I have used AI assistance for this PR.

Important

Slop will simply be rejected. If you are using AI assistance, you need to make sure you understand the generated code and that it meets the project's standards. You need to at least know how to run the code and deploy it (if needed). See
startpaac to make it easy
to deploy and test your code changes.

If the majority of the code in this PR was generated by an AI, please add a Co-authored-by trailer to your commit message.
For example:

Co-authored-by: Claude noreply@anthropic.com

✅ Submitter Checklist

  • 📝 My commit messages are clear, informative, and follow the project's How to write a git commit message guide. The Gitlint linter validates this in CI.
  • ✨ I have ensured my commit message prefix (e.g., fix:, feat:) matches the "Type of Change" I selected above.
  • ♽ I have run make test and make lint locally to check for and fix any
    issues. For an efficient workflow, I have considered installing
    pre-commit and running pre-commit install to
    automate these checks.
  • 📖 I have added or updated documentation for any user-facing changes.
  • 🧪 I have added sufficient unit tests for my code changes.
  • 🎁 I have added end-to-end tests where feasible. See README for more details.
  • 🔎 I have addressed any CI test flakiness or provided a clear reason to bypass it.
  • If adding a provider feature, I have filled in the following and updated the provider documentation:
    • GitHub App
    • GitHub Webhook
    • Gitea/Forgejo
    • GitLab
    • Bitbucket Cloud
    • Bitbucket Data Center

@theakshaypant changed the title from "perf(informer): add TransformFuncs to reduce cache memory usage" to "[WIP] perf(informer): add TransformFuncs to reduce cache memory usage" on Apr 9, 2026
@codecov-commenter commented Apr 9, 2026

⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 73.58491% with 14 lines in your changes missing coverage. Please review.
✅ Project coverage is 58.86%. Comparing base (de6de63) to head (9966baa).

Files with missing lines            | Patch % | Lines
pkg/reconciler/controller.go        | 0.00%   | 8 Missing ⚠️
pkg/informer/transform/transform.go | 86.66%  | 4 Missing and 2 partials ⚠️
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2667      +/-   ##
==========================================
+ Coverage   58.82%   58.86%   +0.04%     
==========================================
  Files         204      205       +1     
  Lines       20134    20186      +52     
==========================================
+ Hits        11844    11883      +39     
- Misses       7525     7536      +11     
- Partials      765      767       +2     

☔ View full report in Codecov by Sentry.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces new cache transform functions (RepositoryForCache and PipelineRunForCache) in pkg/informer/transform to reduce memory usage by stripping unnecessary fields from Repository and PipelineRun objects before they are stored in informer caches. The pkg/reconciler/controller.go is updated to apply these transforms to the respective informers. The review comments point out a critical issue where the transform functions modify objects in-place, which could lead to race conditions or unexpected behavior. It is suggested to use DeepCopy() to operate on copies of the objects for safety.

Comment on lines +40 to +49
repo, ok := obj.(*pacv1alpha1.Repository)
if !ok {
return obj, nil
}

repo.ManagedFields = nil
repo.Annotations = nil
repo.Status = nil

return repo, nil
Severity: high

The transform function modifies the input object repo in-place. This can lead to race conditions or unexpected behavior if other parts of the system hold a reference to the original object. It's safer to operate on a copy of the object. Please add a repo.DeepCopy() at the beginning of the function body. The Tekton implementation this PR was inspired by also uses DeepCopy for safety.

repo, ok := obj.(*pacv1alpha1.Repository)
if !ok {
	return obj, nil
}
repo = repo.DeepCopy()

repo.ManagedFields = nil
repo.Annotations = nil
repo.Status = nil

return repo, nil

@theakshaypant (Member, Author) replied:
Intended behaviour to strip these fields from the cache.

Comment on lines +74 to +96
pr, ok := obj.(*tektonv1.PipelineRun)
if !ok {
return obj, nil
}

pr.ManagedFields = nil

// Strip large Spec fields — watcher only checks Spec.Status (pending state)
pr.Spec.PipelineRef = nil
pr.Spec.PipelineSpec = nil
pr.Spec.Params = nil
pr.Spec.Workspaces = nil
pr.Spec.TaskRunSpecs = nil
pr.Spec.TaskRunTemplate = tektonv1.PipelineTaskRunTemplate{}
pr.Spec.Timeouts = nil

// Strip large Status fields — watcher only reads Conditions, StartTime, CompletionTime
pr.Status.PipelineSpec = nil
pr.Status.ChildReferences = nil
pr.Status.Provenance = nil
pr.Status.SpanContext = nil

return pr, nil
Severity: high

The transform function modifies the input object pr in-place. This can lead to race conditions or unexpected behavior if other parts of the system hold a reference to the original object. It's safer to operate on a copy of the object. Please add a pr.DeepCopy() at the beginning of the function body. The Tekton implementation this PR was inspired by also uses DeepCopy for safety.

pr, ok := obj.(*tektonv1.PipelineRun)
if !ok {
	return obj, nil
}
pr = pr.DeepCopy()

pr.ManagedFields = nil

// Strip large Spec fields — watcher only checks Spec.Status (pending state)
pr.Spec.PipelineRef = nil
pr.Spec.PipelineSpec = nil
pr.Spec.Params = nil
pr.Spec.Workspaces = nil
pr.Spec.TaskRunSpecs = nil
pr.Spec.TaskRunTemplate = tektonv1.PipelineTaskRunTemplate{}
pr.Spec.Timeouts = nil

// Strip large Status fields — watcher only reads Conditions, StartTime, CompletionTime
pr.Status.PipelineSpec = nil
pr.Status.ChildReferences = nil
pr.Status.Provenance = nil
pr.Status.SpanContext = nil

return pr, nil

@theakshaypant (Member, Author) replied:
Intended behaviour to strip these fields from the cache.

@theakshaypant (Member, Author) commented:

Since the Repository Status field has already been deprecated, we could also target removing it altogether. That would, however, require a major refactoring effort in the e2e tests, which use the Repository status to confirm whether a PipelineRun has completed.

@theakshaypant changed the title from "[WIP] perf(informer): add TransformFuncs to reduce cache memory usage" to "perf(informer): add TransformFuncs to reduce cache memory usage" on Apr 10, 2026
@pipelines-as-code commented:

🤖 AI Analysis - pr-complexity-rating

To provide an accurate assessment, please provide the diff/file changes associated with PR #2667.

Since the current metadata only shows a merge commit and pipeline success, I cannot evaluate the specific code logic. However, based on the branch name feat/add-repo-informer-cache, here is the template ready for your input:

📊 PR Review Complexity

Dimension        | Score | Rationale
Size             | TBD   | Pending diff analysis
Logic complexity | TBD   | Pending cache implementation details
Risk             | TBD   | Caching layer changes often involve consistency/concurrency risks
Cross-cutting    | TBD   | Likely affects informer patterns and repository state
Test coverage    | TBD   | Needs verification of cache invalidation tests

Overall difficulty: [TBD]

Summary

[Awaiting diff] This PR appears to implement a cache for repository informers in pipelines-as-code. Reviewers should focus on cache invalidation logic, thread safety, and potential memory impact.

Suggested reviewers focus

  • Cache Lifecycle Management: Ensure cache expiration and invalidation logic prevent stale repository data.
  • Concurrency: Look for potential race conditions if the informer cache is accessed by multiple controller loops.

Please paste the code diff or file list to complete this triage.


Generated by Pipelines-as-Code LLM Analysis

@pipelines-as-code commented:

🤖 AI Analysis - pr-complexity-rating

Based on the provided metadata, this pull request appears to be a merge commit synchronizing a feature branch (feat/add-repo-informer-cache) with the main branch.

📊 PR Review Complexity

Dimension        | Score | Rationale
Size             | 1     | This is a merge commit; typically involves no direct code changes, only synchronization.
Logic complexity | 1     | No new logic introduced in this specific commit.
Risk             | 1     | Minimal risk as it is a sync of existing branches.
Cross-cutting    | 1     | Confined to branch synchronization.
Test coverage    | 5     | The CI pipeline (go-testing-dj2vn) passed successfully.

Overall difficulty: Easy

Summary

This PR is a merge commit from main into feat/add-repo-informer-cache. It serves to bring the feature branch up to date with the latest changes in the upstream repository.

Suggested reviewers focus

  • No code review is required for this specific commit. The reviewer should focus on verifying that the merge did not introduce any unexpected conflicts and that the feature branch is ready for final testing or integration.

Generated by Pipelines-as-Code LLM Analysis

@theakshaypant force-pushed the feat/add-repo-informer-cache branch from 58ed503 to 049275e on April 12, 2026 06:06
@theakshaypant (Member, Author) commented:

Push to see the PR Complexity Rating in action.

@theakshaypant force-pushed the feat/add-repo-informer-cache branch from 049275e to 9966baa on April 14, 2026 07:22
@chmouel (Member) commented Apr 14, 2026

@theakshaypant fyi i disabled it... need to do a rework of that feature

Add cache transform functions for the Repository and PipelineRun
informers, stripping large unnecessary fields before objects enter
the informer cache. Inspired by tektoncd/pipeline#9316.

For Repository objects, ManagedFields, Annotations and Status are
stripped. The reconciler never reads Repository annotations or
Status from the lister; Status is always fetched fresh via direct
API call before updates.

For PipelineRun objects, ManagedFields and large Spec and Status
fields are stripped. The watcher only needs Annotations, Spec.Status
(pending check), Status.Conditions, and timing fields. All other
data is fetched directly from the API when needed.

Benchmark results with production-realistic objects show an 89% JSON
size reduction for Repository objects (5.6KB to 600B) and 94% for
PipelineRun objects (10.7KB to 677B), with corresponding 8-10x
reductions in heap allocation per cached object.

Signed-off-by: Akshay Pant <akpant@redhat.com>
Assisted-by: Claude <noreply@anthropic.com>
@theakshaypant force-pushed the feat/add-repo-informer-cache branch from 9966baa to 781ae7a on April 17, 2026 08:25
@theakshaypant marked this pull request as ready for review on April 17, 2026 08:26