Skip to content

fix: register event sources when dependents are marked for deletion#3250

Merged
metacosm merged 3 commits intooperator-framework:mainfrom
SamBarker:bulk-activation-condition-reproducer
Mar 25, 2026
Merged

fix: register event sources when dependents are marked for deletion#3250
metacosm merged 3 commits intooperator-framework:mainfrom
SamBarker:bulk-activation-condition-reproducer

Conversation

@SamBarker
Copy link
Contributor

@SamBarker SamBarker commented Mar 24, 2026

Adds a failing integration test to reproduce the bug described in #3249.

Workflow under test

ConfigMapDependentResource  (reconcilePrecondition = ALWAYS_FALSE)
  └── SecretBulkDependentResource  (activationCondition = ALWAYS_TRUE)

On first reconciliation the ConfigMap precondition fails → markDependentsForDelete cascades to SecretBulkDependentResourceNodeDeleteExecutor fires → the Secret event source was never registered (because NodeReconcileExecutor never ran) → NoEventSourceForClassException.

The test asserts this exception does not occur, and currently fails, demonstrating the bug.

I'm quite happy to attempt a fix, but didn't wanted to make sure it was confirmed as a bug first

Copilot AI review requested due to automatic review settings March 24, 2026 01:08
@openshift-ci openshift-ci bot requested review from csviri and xstefank March 24, 2026 01:08
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new integration-test workflow scenario under operator-framework to reproduce (and guard against) NoEventSourceForClassException when a downstream BulkDependentResource with an activationCondition is deleted before its event source is registered (issue #3249).

Changes:

  • Introduces a minimal 2-node workflow (ConfigMap dependent with always-failing reconcile precondition → Secret bulk dependent with always-true activation condition).
  • Adds a new integration test intended to surface the NodeDeleteExecutor / event source registration ordering bug.
  • Adds supporting test-only custom resource, reconciler, dependent resources, and condition implementations.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
operator-framework/src/test/java/io/javaoperatorsdk/operator/workflow/bulkactivationcondition/SecretBulkDependentResource.java Test bulk dependent resource that accesses a Secret event source in a way that exposes missing registration.
operator-framework/src/test/java/io/javaoperatorsdk/operator/workflow/bulkactivationcondition/ConfigMapDependentResource.java Parent dependent resource used to trigger downstream delete cascade via failing precondition.
operator-framework/src/test/java/io/javaoperatorsdk/operator/workflow/bulkactivationcondition/BulkActivationConditionReconciler.java Reconciler declaring the workflow under test and capturing reconciliation errors.
operator-framework/src/test/java/io/javaoperatorsdk/operator/workflow/bulkactivationcondition/BulkActivationConditionIT.java New integration test reproducer asserting NoEventSourceForClassException should not occur.
operator-framework/src/test/java/io/javaoperatorsdk/operator/workflow/bulkactivationcondition/BulkActivationConditionCustomResource.java Test custom resource type used as the primary for the workflow.
operator-framework/src/test/java/io/javaoperatorsdk/operator/workflow/bulkactivationcondition/AlwaysTrueActivation.java Activation condition implementation that always returns true.
operator-framework/src/test/java/io/javaoperatorsdk/operator/workflow/bulkactivationcondition/AlwaysFailingPrecondition.java Reconcile precondition implementation that always returns false.

Comment on lines +73 to +77
// Wait for the error to arrive — the ConfigMap precondition always fails,
// so JOSDK should attempt NodeDeleteExecutor for the Secret bulk dependent.
await()
.atMost(Duration.ofSeconds(30))
.until(() -> BulkActivationConditionReconciler.lastError.get() != null);
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test currently waits until lastError becomes non-null. If/when the underlying bug is fixed (or if the issue reproduces intermittently), lastError may remain null and this test will time out rather than assert the intended behavior. Consider waiting for a positive signal that reconciliation happened (e.g., increment a reconcile counter in the reconciler) and then asserting lastError stays null / does not contain NoEventSourceForClassException.

Suggested change
// Wait for the error to arrive — the ConfigMap precondition always fails,
// so JOSDK should attempt NodeDeleteExecutor for the Secret bulk dependent.
await()
.atMost(Duration.ofSeconds(30))
.until(() -> BulkActivationConditionReconciler.lastError.get() != null);
// Allow some time for reconciliation to happen — the ConfigMap precondition always fails,
// so JOSDK should attempt NodeDeleteExecutor for the Secret bulk dependent.
await()
.atMost(Duration.ofSeconds(30))
.pollDelay(Duration.ofSeconds(2))
.until(() -> true);

Copilot uses AI. Check for mistakes.
Comment on lines +48 to +49
* <p>This test FAILS on unfixed JOSDK, demonstrating the bug.
*/
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR adds an integration test that is expected to fail on current code (“This test FAILS on unfixed JOSDK”). If this is intended as a reproducer-only PR, it should be marked @Disabled (with a link to #3249) or otherwise excluded from the default test suite; otherwise it will break CI for any branch that includes it. Alternatively, include the corresponding production fix in the same PR so the test passes on merge.

Copilot uses AI. Check for mistakes.
Comment on lines +50 to +56
public class BulkActivationConditionIT {

@RegisterExtension
static LocallyRunOperatorExtension extension =
LocallyRunOperatorExtension.builder()
.withReconciler(new BulkActivationConditionReconciler())
.build();
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All existing workflow integration tests under io.javaoperatorsdk.operator.workflow.* appear to be annotated with @Sample (used for sample documentation). This new workflow IT isn’t annotated, so it won’t be picked up consistently. Consider adding an appropriate @Sample annotation (tldr/description) to match the established pattern (e.g., workflow/workflowactivationcondition/WorkflowActivationConditionIT.java).

Copilot uses AI. Check for mistakes.
public class BulkActivationConditionReconciler
implements Reconciler<BulkActivationConditionCustomResource> {

static final AtomicReference<Exception> lastError = new AtomicReference<>();
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lastError is a static global used to communicate from the reconciler into the IT. This can leak state across tests and makes parallel test execution brittle. Prefer storing error state on the reconciler instance (e.g., an instance AtomicReference) and retrieving it via extension.getReconcilerOfType(...), similar to other ITs in this module.

Suggested change
static final AtomicReference<Exception> lastError = new AtomicReference<>();
final AtomicReference<Exception> lastError = new AtomicReference<>();

Copilot uses AI. Check for mistakes.
SamBarker and others added 3 commits March 24, 2026 15:57
…teExecutor

When a BulkDependentResource has an activationCondition and its parent
dependent has a failing reconcilePrecondition, JOSDK's
markDependentsForDelete() cascades to the bulk dependent and fires
NodeDeleteExecutor for it. However, NodeDeleteExecutor does not call
registerOrDeregisterEventSourceBasedOnActivation() before invoking
delete(), so if NodeReconcileExecutor has never run for that node (e.g.
on first reconciliation) the event source is never registered.

The delete() path calls getSecondaryResources() → eventSourceRetriever
.getEventSourceFor() → NoEventSourceForClassException.

This IT demonstrates the bug with a minimal workflow:
  ConfigMapDependentResource  (reconcilePrecondition = ALWAYS_FALSE)
    └── SecretBulkDependentResource  (activationCondition = ALWAYS_TRUE)

The fix is to call registerOrDeregisterEventSourceBasedOnActivation()
in NodeDeleteExecutor.doRun() before calling dependent.delete(),
mirroring what NodeReconcileExecutor already does.

Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
- Make lastError and callCount instance fields on the reconciler; hold
  the reconciler instance as a static field in the IT so tests can
  access state without static leakage between tests
- Add callCount so the test can wait for any reconciliation activity
  (reconcile() or updateErrorStatus()) then assert cleanly, rather than
  timing out if the bug is fixed
- Add @disabled linking to issue operator-framework#3249 so this reproducer-only test
  does not break CI
- Add @sample annotation to match the pattern of other workflow ITs

Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
@metacosm metacosm force-pushed the bulk-activation-condition-reproducer branch from be363a8 to ae0eb4a Compare March 24, 2026 14:58
@metacosm
Copy link
Collaborator

@SamBarker can you confirm that this fixes your issue?

@SamBarker
Copy link
Contributor Author

@SamBarker can you confirm that this fixes your issue?

Yes I've built 5.3.1-SNAPSHOT locally and tested the kroxylicious PR with it. I no longer see the NoEventSourceForClassException exception.

Copy link
Contributor Author

@SamBarker SamBarker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Thanks for the quick turn around @metacosm

Copy link
Collaborator

@csviri csviri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, LGTM.

just added a nit, but looks great. thank you all!

if (activationConditionMet) {
// make sure we register the dependent's event source if it hasn't been added already
// this might be needed in corner cases such as
// https://github.com/operator-framework/java-operator-sdk/issues/3249
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just nit: instead of linking the issue we might want to describe the case when it is needed. (Actually this is quite an obvious why it should be there so might not even need a description, but maybe it just me).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the issue explains the issue way better than I could do in a short comment and that provides historical context for the change that might be useful later on because that method call here is quite unexpected, imo.

@metacosm metacosm changed the title test: add failing IT reproducing NoEventSourceForClassException in NodeDeleteExecutor for activation-conditioned BulkDependentResource fix: register event sources when dependents are marked for deletion Mar 25, 2026
@metacosm metacosm merged commit 2c16143 into operator-framework:main Mar 25, 2026
26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants