Skip to content

OCPBUGS-77571: Add only to tags to volume object, if volume is not already tagged#518

Merged
openshift-merge-bot[bot] merged 12 commits intoopenshift:mainfrom
gnufied:fix-tagging-if-exists
Mar 10, 2026
Merged

OCPBUGS-77571: Add only to tags to volume object, if volume is not already tagged#518
openshift-merge-bot[bot] merged 12 commits intoopenshift:mainfrom
gnufied:fix-tagging-if-exists

Conversation

@gnufied
Copy link
Member

@gnufied gnufied commented Mar 4, 2026

Fixes https://issues.redhat.com/browse/OCPBUGS-77571

The tags on ROSA cluster should not change after creation, so the day-2 tagging of volumes should not be necessary. This will avoid triggering CreateTags RPC calls.

Verified in real cluster as well:

I0307 01:19:56.561236       1 aws_ebs_tags_controller.go:358] All volumes already have the desired tags, skipping CreateTags call
I0307 01:19:56.570951       1 aws_ebs_tags_queue_worker.go:152] Successfully updated PV annotations and tags for volume pvc-fa832390-b62e-4287-92e9-658e7420603f
I0307 01:19:56.578964       1 aws_ebs_tags_queue_worker.go:152] Successfully updated PV annotations and tags for volume pvc-0aca21f1-bfe0-4edf-88be-39201145809d
I0307 01:19:56.584962       1 aws_ebs_tags_queue_worker.go:152] Successfully updated PV annotations and tags for volume pvc-496249d4-5e37-44a4-98df-f05ddc83fc82
I0307 01:19:56.592478       1 aws_ebs_tags_queue_worker.go:152] Successfully updated PV annotations and tags for volume pvc-728de243-db2a-4480-b204-c04bda1e236c

Summary by CodeRabbit

  • Chores

    • Tagging now checks existing volume tags in batches and skips volumes already matching desired tags; skips CreateTags entirely when no updates are needed and logs accordingly.
  • Bug Fixes

    • More robust error classification and retry logic: batch-level failures are retried with rate limiting while per-volume failures are handled individually; improved handling of missing volumes and tag-describe errors.
  • Tests

    • New tests covering tag-comparison, queueing behavior, batch vs. per-volume failure paths, and early-skip scenarios when resource tags are absent or empty.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 4, 2026
@openshift-ci-robot
Copy link

@gnufied: This pull request references Jira Issue OCPBUGS-77571, which is invalid:

  • expected the bug to target the "4.22.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label Mar 4, 2026
@openshift-ci openshift-ci bot requested review from dobsonj and tsmetana March 4, 2026 21:20
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 4, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gnufied

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 4, 2026
@coderabbitai
Copy link

coderabbitai bot commented Mar 4, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Controller now queries EC2 for existing EBS volume tags, compares them to desired ResourceTags, and only issues CreateTags for volumes needing updates. EC2 calls are abstracted via ec2TagsAPI; errors distinguish whole-batch vs per-item failures. Queue worker re-queues failures accordingly; processing skips when ResourceTags is empty.

Changes

Cohort / File(s) Summary
EBS Tags Controller Implementation
pkg/driver/aws-ebs/aws_ebs_tags_controller.go
Added ec2TagsAPI interface and error types (failedTagError, failWholeBatchErrror, failOneOrMoreTagError). Added volumeHasAllTags, fetchTagsOnVolumes, and filterVolumesNeedingTagUpdate. updateEBSTags now accepts ec2TagsAPI, pre-filters PVs, skips CreateTags when none need updates, and wraps errors into batch vs per-item failures. processInfrastructure now requires non-empty ResourceTags.
EBS Tags Controller Tests
pkg/driver/aws-ebs/aws_ebs_tags_controller_test.go
Added TestVolumeHasAllTags covering matching/extra/missing/mismatched tag scenarios and TestProcessInfrastructureSkipsWhenNoTags to assert no-op when ResourceTags are nil or empty.
Queue Worker Logic
pkg/driver/aws-ebs/aws-ebs_tags_queue_worker.go
Updated processBatchVolumes and processIndividualVolume signatures to accept ec2TagsAPI. Enhanced error handling to differentiate batch-wide failures (re-queue whole batch) versus per-item failures (enqueue individual PVs). processIndividualVolume adds NotFound handling and refines retry behavior.
Queue Worker Tests
pkg/driver/aws-ebs/aws-ebs_tags_queue_worker_test.go
New comprehensive tests and mockEC2Client covering batch and per-item flows: success, already-tagged, partial updates, DescribeTags failure (batched re-queue), and CreateTags failure (per-item re-queue). Adds test helpers to build controllers, PVs, and infra, and verifies queue state, annotations, and events.

Sequence Diagram(s)

sequenceDiagram
    participant Controller as Controller\n(aws-ebs-tags-controller)
    participant PVStore as PV Store\n(persistent volumes)
    participant EC2 as EC2 API\n(DescribeTags/CreateTags)
    participant Queue as Queue Worker\n(batch/individual)

    Controller->>PVStore: list PVs with AWS volume IDs
    Controller->>EC2: DescribeTags(volumeIDs)
    EC2-->>Controller: existing tags per volume
    Controller->>Controller: filterVolumesNeedingTagUpdate(existingTags, desiredTags)
    alt volumes need updates
        Controller->>Queue: enqueue batch of volumes needing update
        Queue->>EC2: CreateTags(filtered volume IDs)
        EC2-->>Queue: success
        Queue->>PVStore: update PV annotations (hash)
    else all volumes up-to-date
        Controller-->>Controller: log skip (no CreateTags)
    end
    alt DescribeTags failure
        Controller-->>Queue: fail-open / rate-limited requeue of batch
    end
    alt CreateTags failure (partial)
        Queue->>Queue: re-enqueue per-volume with rate limiting
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 28.57% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Structure And Quality ⚠️ Warning Tests contain critical timeout vulnerabilities in goroutine-based verification functions with unbounded sync.WaitGroup.Wait() calls lacking timeout protection. Wrap goroutine waits with timeout mechanisms using time.AfterFunc(), select with <-time.After() channels, or leverage t.Context() with timeouts to prevent indefinite hangs.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: optimizing EBS volume tagging to avoid redundant operations by only adding tags when volumes are not already tagged.
Stable And Deterministic Test Names ✅ Passed Test files use standard Go testing package with t.Run() pattern, not Ginkgo framework. All test names are static and contain no dynamic values.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Mar 4, 2026
@openshift-ci-robot
Copy link

@gnufied: This pull request references Jira Issue OCPBUGS-77571, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.22.0) matches configured target version for branch (4.22.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira (jiezhao@redhat.com), skipping review request.

Details

In response to this:

Fixes https://issues.redhat.com/browse/OCPBUGS-77571

Summary by CodeRabbit

  • Chores

  • Optimized AWS EBS volume tagging to reduce unnecessary API calls. The system now filters volumes to identify which ones require tag updates before making changes, skipping operations when all desired tags are already present with correct values.

  • Tests

  • Added comprehensive test coverage for tag comparison logic, validating behavior across various scenarios including matching tags, missing tags, and mismatched values.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
pkg/driver/aws-ebs/aws_ebs_tags_controller.go (1)

413-448: Consider direct unit tests for filterVolumesNeedingTagUpdate behavior.

This helper now gates whether CreateTags runs. Adding focused tests for mixed volume states and the DescribeVolumes fail-open path would lock in this contract.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/driver/aws-ebs/aws_ebs_tags_controller.go` around lines 413 - 448, Add
direct unit tests for filterVolumesNeedingTagUpdate to lock in its gating
behavior for CreateTags: write tests that call filterVolumesNeedingTagUpdate
with mock EC2 DescribeVolumes responses (or a fake ec2Client) covering at
minimum (1) a mixed set of PVs where some volumes already have all desiredTags
and others do not, asserting only the ones needing tags are returned, and (2) a
DescribeVolumes error path asserting the function returns the original pvs
(fail-open). Use pvsToResourceIDs to build inputs and validate that
volumeHasAllTags logic is exercised by including volumes with subsets/duplicates
of desiredTags and by checking the klog skipping branch for fully-tagged
volumes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/driver/aws-ebs/aws_ebs_tags_controller.go`:
- Around line 317-324: The filtering logic treats volumes with nil Tags as
already-tagged and also allows an empty desired-tag set to proceed incorrectly;
update the code so updateEBSTags returns early (no-op) when the desired tag set
produced by newAndUpdatedTags is empty, and modify filterVolumesNeedingTagUpdate
(and its internal map population where it iterates volumes/vol.Tags) to include
volume IDs even if vol.Tags == nil so untagged volumes are considered missing
tags and will be returned for update; reference functions newAndUpdatedTags,
filterVolumesNeedingTagUpdate, and updateEBSTags to locate the changes.

---

Nitpick comments:
In `@pkg/driver/aws-ebs/aws_ebs_tags_controller.go`:
- Around line 413-448: Add direct unit tests for filterVolumesNeedingTagUpdate
to lock in its gating behavior for CreateTags: write tests that call
filterVolumesNeedingTagUpdate with mock EC2 DescribeVolumes responses (or a fake
ec2Client) covering at minimum (1) a mixed set of PVs where some volumes already
have all desiredTags and others do not, asserting only the ones needing tags are
returned, and (2) a DescribeVolumes error path asserting the function returns
the original pvs (fail-open). Use pvsToResourceIDs to build inputs and validate
that volumeHasAllTags logic is exercised by including volumes with
subsets/duplicates of desiredTags and by checking the klog skipping branch for
fully-tagged volumes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4741ca6b-2853-4fa8-b166-e48b2a2373da

📥 Commits

Reviewing files that changed from the base of the PR and between d1e1ab6 and e2dae1f.

📒 Files selected for processing (2)
  • pkg/driver/aws-ebs/aws_ebs_tags_controller.go
  • pkg/driver/aws-ebs/aws_ebs_tags_controller_test.go

@openshift-ci-robot
Copy link

@gnufied: This pull request references Jira Issue OCPBUGS-77571, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.22.0) matches configured target version for branch (4.22.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira (jiezhao@redhat.com), skipping review request.

Details

In response to this:

Fixes https://issues.redhat.com/browse/OCPBUGS-77571

Summary by CodeRabbit

  • Chores

  • Optimized AWS EBS volume tagging to avoid unnecessary API calls by detecting volumes that already have the desired tags and skipping needless updates; logs when no updates are required.

  • Tests

  • Added tests for tag-comparison logic and for the early-skip behavior when resource tags are absent or empty.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
pkg/driver/aws-ebs/aws_ebs_tags_controller.go (1)

317-324: ⚠️ Potential issue | 🟡 Minor

Handle empty desired-tag sets as an explicit no-op.

At Line [317], empty tags still flow into filtering logic; combined with Line [433] (vol.Tags != nil), untagged volumes can still be treated as needing update in the empty-desired-tags case. This edge case was previously raised and still appears unresolved.

Proposed fix
 func (c *EBSVolumeTagsController) updateEBSTags(ctx context.Context, ec2Client *ec2.Client, resourceTags []configv1.AWSResourceTag,
 	pvs ...*v1.PersistentVolume) error {
 	// Prepare tags
 	tags := newAndUpdatedTags(resourceTags)
+	if len(tags) == 0 {
+		klog.V(4).Infof("No desired tags configured, skipping CreateTags call")
+		return nil
+	}
 
 	// Filter out volumes that already have all desired tags
 	pvsNeedingUpdate := filterVolumesNeedingTagUpdate(ctx, ec2Client, tags, pvs)
@@
 	// Build a map of volumeID -> existing tags
 	volumeTags := make(map[string][]ec2types.Tag, len(output.Volumes))
 	for _, vol := range output.Volumes {
-		if vol.VolumeId != nil && vol.Tags != nil {
+		if vol.VolumeId != nil {
 			volumeTags[*vol.VolumeId] = vol.Tags
 		}
 	}

Also applies to: 433-435

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/driver/aws-ebs/aws_ebs_tags_controller.go` around lines 317 - 324, If
newAndUpdatedTags(resourceTags) returns an empty slice, treat this as an
explicit no-op and skip calling filterVolumesNeedingTagUpdate/any CreateTags
logic: check the result of newAndUpdatedTags (assigned to tags) and if len(tags)
== 0 return nil immediately. This prevents untagged volumes from being
considered "needing update" due to nil vol.Tags; update the same guard in the
symmetric path around the logic that inspects vol.Tags (the code referencing
vol.Tags != nil) so both create/update flows short-circuit when the desired tag
set is empty.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@pkg/driver/aws-ebs/aws_ebs_tags_controller.go`:
- Around line 317-324: If newAndUpdatedTags(resourceTags) returns an empty
slice, treat this as an explicit no-op and skip calling
filterVolumesNeedingTagUpdate/any CreateTags logic: check the result of
newAndUpdatedTags (assigned to tags) and if len(tags) == 0 return nil
immediately. This prevents untagged volumes from being considered "needing
update" due to nil vol.Tags; update the same guard in the symmetric path around
the logic that inspects vol.Tags (the code referencing vol.Tags != nil) so both
create/update flows short-circuit when the desired tag set is empty.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f8943c45-5052-4db6-8d2e-64e466a44933

📥 Commits

Reviewing files that changed from the base of the PR and between e2dae1f and 64213a2.

📒 Files selected for processing (2)
  • pkg/driver/aws-ebs/aws_ebs_tags_controller.go
  • pkg/driver/aws-ebs/aws_ebs_tags_controller_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/driver/aws-ebs/aws_ebs_tags_controller_test.go

@gnufied
Copy link
Member Author

gnufied commented Mar 5, 2026

/retest

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
pkg/driver/aws-ebs/aws_ebs_tags_controller.go (1)

317-333: ⚠️ Potential issue | 🟠 Major

Add a hard no-op in updateEBSTags when desired tags are empty.

processInfrastructure now skips enqueueing for empty tags, but stale queue items can still reach this method. In that case, CreateTags is invoked with an empty Tags list and can repeatedly fail/requeue.

Proposed fix
 func (c *EBSVolumeTagsController) updateEBSTags(ctx context.Context, ec2Client *ec2.Client, resourceTags []configv1.AWSResourceTag,
 	pvs ...*v1.PersistentVolume) error {
 	// Prepare tags
 	tags := newAndUpdatedTags(resourceTags)
+	if len(tags) == 0 {
+		klog.V(4).Infof("No desired tags configured, skipping CreateTags call")
+		return nil
+	}
 
 	// Filter out volumes that already have all desired tags
 	pvsNeedingUpdate, err := filterVolumesNeedingTagUpdate(ctx, ec2Client, tags, pvs)
In AWS EC2 CreateTags API, is an empty `Tags` list valid? If not, what validation error is returned?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/driver/aws-ebs/aws_ebs_tags_controller.go` around lines 317 - 333, The
updateEBSTags flow should short-circuit when the desired tags list is empty to
avoid calling ec2Client.CreateTags with an empty Tags slice; inside
updateEBSTags (after computing tags via newAndUpdatedTags and before calling
filterVolumesNeedingTagUpdate or CreateTags) check if tags is nil or has length
0 and return nil as a hard no-op; keep the existing
filterVolumesNeedingTagUpdate, pvsNeedingUpdate, and CreateTags logic unchanged
so only non-empty tag updates reach ec2Client.CreateTags.
pkg/driver/aws-ebs/aws_ebs_tags_queue_worker.go (1)

119-122: ⚠️ Potential issue | 🟠 Major

Whole-batch retry can starve healthy volumes behind one bad ID.

If one volume in the batch is permanently invalid, retrying the same batch item can block successful tagging for all other volumes in that item. Consider degrading to per-volume retries (or at least on non-retryable AWS API errors) to isolate poison entries.

Safer fallback pattern
 	err := c.updateEBSTags(ctx, ec2Client, infra.Status.PlatformStatus.AWS.ResourceTags, pvList...)
 	if err != nil {
 		klog.Errorf("failed to update EBS tags: %v", err)
-		c.queue.AddRateLimited(item)
+		// isolate failing volumes so healthy ones can progress
+		c.handleBatchTagUpdateFailure(pvList, err)
+		c.queue.Forget(item)
 		return
 	}
For AWS EC2 CreateTags, if one resource ID in `Resources` is invalid (e.g., InvalidVolume.NotFound), does the whole request fail or are valid IDs still tagged?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/driver/aws-ebs/aws_ebs_tags_queue_worker.go` around lines 119 - 122, The
current whole-batch retry (calling c.queue.AddRateLimited(item) on any
CreateTags failure) can starve healthy volumes; modify the error handling in the
batch tagging path (where c.queue.AddRateLimited(item) is called) to inspect the
AWS error returned from CreateTags — if it is a non-retryable/partial-failure
(e.g., InvalidVolume.NotFound or other per-resource errors), degrade to
per-volume handling: for each Resource ID in the failed item's Resources,
attempt individual tagging or enqueue individual retry items (or mark invalid
IDs as dead-letter) instead of re-queuing the entire batch; only requeue the
whole item for truly transient errors (e.g., throttling) and use the same
queue/AddRateLimited API for per-volume retries when needed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/driver/aws-ebs/aws_ebs_tags_controller.go`:
- Around line 447-458: The DescribeTags call can fail due to missing
ec2:DescribeTags IAM permission and filterVolumesNeedingTagUpdate currently
propagates that error (contradicting the "fail-open" comment); either (A) add
ec2:DescribeTags to the documented/required IAM policy for this component and
update docs/comments to reflect the new dependency, or (B) implement a true
fail-open: catch/inspect the error from ec2Client.DescribeTags in the
DescribeTags call path, treat access-denied (or other permission errors) as a
non-fatal condition and return a nil/empty tag-map so
filterVolumesNeedingTagUpdate proceeds without pre-filtering, and update the
comment at the top of filterVolumesNeedingTagUpdate to accurately describe the
behavior; modify DescribeTags call site and filterVolumesNeedingTagUpdate to
handle the chosen option.

---

Duplicate comments:
In `@pkg/driver/aws-ebs/aws_ebs_tags_controller.go`:
- Around line 317-333: The updateEBSTags flow should short-circuit when the
desired tags list is empty to avoid calling ec2Client.CreateTags with an empty
Tags slice; inside updateEBSTags (after computing tags via newAndUpdatedTags and
before calling filterVolumesNeedingTagUpdate or CreateTags) check if tags is nil
or has length 0 and return nil as a hard no-op; keep the existing
filterVolumesNeedingTagUpdate, pvsNeedingUpdate, and CreateTags logic unchanged
so only non-empty tag updates reach ec2Client.CreateTags.

In `@pkg/driver/aws-ebs/aws_ebs_tags_queue_worker.go`:
- Around line 119-122: The current whole-batch retry (calling
c.queue.AddRateLimited(item) on any CreateTags failure) can starve healthy
volumes; modify the error handling in the batch tagging path (where
c.queue.AddRateLimited(item) is called) to inspect the AWS error returned from
CreateTags — if it is a non-retryable/partial-failure (e.g.,
InvalidVolume.NotFound or other per-resource errors), degrade to per-volume
handling: for each Resource ID in the failed item's Resources, attempt
individual tagging or enqueue individual retry items (or mark invalid IDs as
dead-letter) instead of re-queuing the entire batch; only requeue the whole item
for truly transient errors (e.g., throttling) and use the same
queue/AddRateLimited API for per-volume retries when needed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 75b4da50-2b40-47e0-aeca-cf1c31ca6ef0

📥 Commits

Reviewing files that changed from the base of the PR and between 64213a2 and 935ec98.

📒 Files selected for processing (3)
  • pkg/driver/aws-ebs/aws_ebs_tags_controller.go
  • pkg/driver/aws-ebs/aws_ebs_tags_controller_test.go
  • pkg/driver/aws-ebs/aws_ebs_tags_queue_worker.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/driver/aws-ebs/aws_ebs_tags_controller_test.go

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/driver/aws-ebs/aws_ebs_tags_queue_worker.go (1)

165-168: ⚠️ Potential issue | 🔴 Critical

Fix nil-pointer dereference in the NotFound path.

At line 163, pv is nil when apierrors.IsNotFound(err) is true. Accessing pv.Name will cause a panic. Use item.pvNames[0] instead, which is already available and is the PV name being queried.

Proposed fix
 		if apierrors.IsNotFound(err) {
 			klog.Infof("skipping volume tags update because PV %v does not exist", item.pvNames[0])
-			c.removeVolumesFromQueueSet(pv.Name)
+			c.removeVolumesFromQueueSet(item.pvNames[0])
 			c.queue.Forget(item)
 			return
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/driver/aws-ebs/aws_ebs_tags_queue_worker.go` around lines 165 - 168, In
the apierrors.IsNotFound(err) branch inside the AWS EBS tags queue worker, avoid
dereferencing the nil pv by replacing pv.Name with the already-available
item.pvNames[0]; update the call to c.removeVolumesFromQueueSet(...) to use
item.pvNames[0] (and keep c.queue.Forget(item) as-is) so the NotFound path no
longer causes a nil-pointer panic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/driver/aws-ebs/aws_ebs_tags_controller.go`:
- Around line 83-90: The failedTagError type wraps an AWS error but doesn't
implement Unwrap(), breaking error chains used by errors.As in
processIndividualVolume and preventing detection of smithy.APIError (and
awsErrorVolumeNotFound). Add an Unwrap() error method on failedTagError that
returns f.awsError so errors.As(err, &smithy.APIError) can reach the underlying
AWS error and allow the awsErrorVolumeNotFound branch to execute.

In `@pkg/driver/aws-ebs/aws_ebs_tags_queue_worker_test.go`:
- Around line 343-353: The test spawns a goroutine that blocks on c.queue.Get()
and then wg.Wait() with no timeout, which can hang CI if queue behavior
regresses; update both goroutine+wait blocks (the ones that call c.queue.Get(),
assign updateItem, and use wg.Wait()) to use a bounded wait: start a timer (or
context with timeout) and use select to either receive the item from
c.queue.Get() or fail the test with t.Fatal/t.Fatalf on timeout, ensure the
goroutine returns and wg.Done() is called in both success and timeout branches
so the WaitGroup does not block indefinitely.

---

Outside diff comments:
In `@pkg/driver/aws-ebs/aws_ebs_tags_queue_worker.go`:
- Around line 165-168: In the apierrors.IsNotFound(err) branch inside the AWS
EBS tags queue worker, avoid dereferencing the nil pv by replacing pv.Name with
the already-available item.pvNames[0]; update the call to
c.removeVolumesFromQueueSet(...) to use item.pvNames[0] (and keep
c.queue.Forget(item) as-is) so the NotFound path no longer causes a nil-pointer
panic.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: bda8636f-439c-4108-91cb-bef1e5c0db57

📥 Commits

Reviewing files that changed from the base of the PR and between 935ec98 and f90723b.

📒 Files selected for processing (3)
  • pkg/driver/aws-ebs/aws_ebs_tags_controller.go
  • pkg/driver/aws-ebs/aws_ebs_tags_queue_worker.go
  • pkg/driver/aws-ebs/aws_ebs_tags_queue_worker_test.go

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
pkg/driver/aws-ebs/aws_ebs_tags_queue_worker_test.go (1)

343-353: ⚠️ Potential issue | 🟠 Major

Bound queue waits with a timeout to prevent hung tests.

Line 343 and Line 387 use unbounded c.queue.Get() + wg.Wait(). If expected requeue does not occur, these tests can block forever and stall CI.

Suggested fix pattern
- wg := sync.WaitGroup{}
- wg.Add(1)
- go func() {
- 	defer wg.Done()
- 	for {
- 		item, _ := c.queue.Get()
- 		updateItem = item
- 		break
- 	}
- }()
- wg.Wait()
+ itemCh := make(chan *pvUpdateQueueItem, 1)
+ go func() {
+ 	item, _ := c.queue.Get()
+ 	itemCh <- item
+ }()
+ select {
+ case updateItem = <-itemCh:
+ case <-time.After(2 * time.Second):
+ 	t.Fatal("timed out waiting for re-queued queue item")
+ }

For the multi-item case, use the same timeout pattern in a bounded receive loop until len(workItems)==2.

Also applies to: 387-401

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/driver/aws-ebs/aws_ebs_tags_queue_worker_test.go` around lines 343 - 353,
Tests use unbounded blocking calls to c.queue.Get() with wg.Wait() (see the
goroutine reading updateItem and the multi-item loop that waits for workItems),
which can hang CI if the expected requeue never happens; change these to bounded
receives using a timeout select (e.g., time.After) inside the goroutine/loop and
fail the test or break after the timeout, and for the multi-item case loop until
len(workItems)==2 (or timeout) instead of a single blocking c.queue.Get() call
so the test always terminates.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@pkg/driver/aws-ebs/aws_ebs_tags_queue_worker_test.go`:
- Around line 343-353: Tests use unbounded blocking calls to c.queue.Get() with
wg.Wait() (see the goroutine reading updateItem and the multi-item loop that
waits for workItems), which can hang CI if the expected requeue never happens;
change these to bounded receives using a timeout select (e.g., time.After)
inside the goroutine/loop and fail the test or break after the timeout, and for
the multi-item case loop until len(workItems)==2 (or timeout) instead of a
single blocking c.queue.Get() call so the test always terminates.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 73bc38a8-f75b-452f-bc99-ece5147e4bc1

📥 Commits

Reviewing files that changed from the base of the PR and between f90723b and cedbdcf.

📒 Files selected for processing (1)
  • pkg/driver/aws-ebs/aws_ebs_tags_queue_worker_test.go

@gnufied
Copy link
Member Author

gnufied commented Mar 6, 2026

/retest

}

type failedTagError struct {
pvs []*v1.PersistentVolume
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is pvs ever read and used for anything?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is pv nil at this point?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, though this existed since beginning I am afraid. :(

gnufied and others added 2 commits March 9, 2026 10:28
Co-authored-by: Jan Šafránek <jsafrane@redhat.com>
Co-authored-by: Jan Šafránek <jsafrane@redhat.com>
Comment on lines +214 to +216
klog.Infof("Successfully updated PV annotations and tags for volume %s", volumeName)
} else {
klog.Infof("Successfully updated PV annotations and tags for volume %s", volumeName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the messages are the same

@jsafrane
Copy link
Contributor

jsafrane commented Mar 9, 2026

/lgrm

@jsafrane
Copy link
Contributor

jsafrane commented Mar 9, 2026

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 9, 2026
@gnufied
Copy link
Member Author

gnufied commented Mar 9, 2026

I have tested this change manually on a hypershift cluster.

/verified by @gnufied

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Mar 9, 2026
@openshift-ci-robot
Copy link

@gnufied: This PR has been marked as verified by @gnufied.

Details

In response to this:

I have tested this change manually on a hypershift cluster.

/verified by @gnufied

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@Phaow
Copy link
Contributor

Phaow commented Mar 10, 2026

/retest-required

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 10, 2026

@gnufied: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot bot merged commit 9b7feda into openshift:main Mar 10, 2026
21 checks passed
@openshift-ci-robot
Copy link

@gnufied: Jira Issue Verification Checks: Jira Issue OCPBUGS-77571
✔️ This pull request was pre-merge verified.
✔️ All associated pull requests have merged.
✔️ All associated, merged pull requests were pre-merge verified.

Jira Issue OCPBUGS-77571 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓

Details

In response to this:

Fixes https://issues.redhat.com/browse/OCPBUGS-77571

The tags on ROSA cluster should not change after creation, so the day-2 tagging of volumes should not be necessary. This will avoid triggering CreateTags RPC calls.

Verified in real cluster as well:

I0307 01:19:56.561236       1 aws_ebs_tags_controller.go:358] All volumes already have the desired tags, skipping CreateTags call
I0307 01:19:56.570951       1 aws_ebs_tags_queue_worker.go:152] Successfully updated PV annotations and tags for volume pvc-fa832390-b62e-4287-92e9-658e7420603f
I0307 01:19:56.578964       1 aws_ebs_tags_queue_worker.go:152] Successfully updated PV annotations and tags for volume pvc-0aca21f1-bfe0-4edf-88be-39201145809d
I0307 01:19:56.584962       1 aws_ebs_tags_queue_worker.go:152] Successfully updated PV annotations and tags for volume pvc-496249d4-5e37-44a4-98df-f05ddc83fc82
I0307 01:19:56.592478       1 aws_ebs_tags_queue_worker.go:152] Successfully updated PV annotations and tags for volume pvc-728de243-db2a-4480-b204-c04bda1e236c

Summary by CodeRabbit

  • Chores

  • Tagging now checks existing volume tags in batches and skips volumes already matching desired tags; skips CreateTags entirely when no updates are needed and logs accordingly.

  • Bug Fixes

  • More robust error classification and retry logic: batch-level failures are retried with rate limiting while per-volume failures are handled individually; improved handling of missing volumes and tag-describe errors.

  • Tests

  • New tests covering tag-comparison, queueing behavior, batch vs. per-volume failure paths, and early-skip scenarios when resource tags are absent or empty.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@gnufied
Copy link
Member Author

gnufied commented Mar 10, 2026

/cherry-pick release-4.21

@openshift-cherrypick-robot

@gnufied: new pull request created: #521

Details

In response to this:

/cherry-pick release-4.21

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-merge-robot
Copy link
Contributor

Fix included in accepted release 4.22.0-0.nightly-2026-03-10-100251

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. verified Signifies that the PR passed pre-merge verification criteria

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants