@jaypoulz
Contributor

@jaypoulz jaypoulz commented Jan 22, 2026

Individual recovery tests now skip when cluster preconditions aren't met, but an AfterSuite hook ensures the suite fails with diagnostic information about which tests were skipped and why. This makes precondition failures visible to CI analysis services while maintaining test stability.
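
Here is a minimal, framework-agnostic sketch of the skip-tracking idea. It assumes nothing about the PR's actual code: the type `preconditionSkips` and its `Record`/`Summary` methods are hypothetical. A test would call `Record` just before skipping. A Ginkgo `AfterSuite` hook could then turn a non-nil `Summary()` into a suite failure that lists each skipped test and its reason.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// preconditionSkips is a hypothetical, concurrency-safe registry of tests that
// skipped because the cluster did not meet their preconditions.
type preconditionSkips struct {
	mu      sync.Mutex
	reasons map[string]string // test name -> skip reason
}

func newPreconditionSkips() *preconditionSkips {
	return &preconditionSkips{reasons: make(map[string]string)}
}

// Record is what an individual test would call right before skipping.
func (p *preconditionSkips) Record(test, reason string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.reasons[test] = reason
}

// Summary returns nil if nothing was recorded; otherwise it returns an error
// listing every recorded skip. An AfterSuite-style hook could fail on it.
func (p *preconditionSkips) Summary() error {
	p.mu.Lock()
	defer p.mu.Unlock()
	if len(p.reasons) == 0 {
		return nil
	}
	names := make([]string, 0, len(p.reasons))
	for name := range p.reasons {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic order for CI logs
	var b strings.Builder
	fmt.Fprintf(&b, "%d test(s) skipped due to unmet cluster preconditions:", len(names))
	for _, name := range names {
		fmt.Fprintf(&b, "\n  - %s: %s", name, p.reasons[name])
	}
	return fmt.Errorf("%s", b.String())
}
```

Sorting the test names keeps the failure message identical across runs with the same skips. That stability makes it easier for CI analysis tooling to group matching failures.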

@openshift-ci-robot

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: automatic mode

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jan 22, 2026
@openshift-ci-robot

openshift-ci-robot commented Jan 22, 2026

@jaypoulz: This pull request references OCPEDGE-2011 which is a valid jira issue.

Details

In response to this:

Replace skip behavior with fail for cluster health checks to make precondition failures visible to test analysis services. Consolidate health check functions and use consistent 5-minute timeouts for pre-checks across recovery and node replacement tests.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested review from eggfoobar and qJkee January 22, 2026 20:29
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 22, 2026
@eggfoobar
Contributor

/payload-job periodic-ci-openshift-release-master-nightly-4.22-e2e-metal-ovn-two-node-fencing-recovery-techpreview

@openshift-ci
Contributor

openshift-ci bot commented Jan 22, 2026

@eggfoobar: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-nightly-4.22-e2e-metal-ovn-two-node-fencing-recovery-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/b98661b0-f7d2-11f0-965f-3eb360421fed-0

@eggfoobar
Contributor

/test verify

Checking if it's a fluke

@jaypoulz jaypoulz force-pushed the OCPEDGE-2011-tnf-node-replacement-updates branch from fca0530 to 447b67f Compare January 22, 2026 21:26
@openshift-ci-robot

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

@eggfoobar
Contributor

/payload-job pull-ci-openshift-release-master-nightly-4.22-e2e-metal-ovn-two-node-fencing-recovery-techpreview

@openshift-ci
Contributor

openshift-ci bot commented Jan 22, 2026

@eggfoobar: trigger 0 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

@eggfoobar
Contributor

/payload-job periodic-ci-openshift-release-master-nightly-4.22-e2e-metal-ovn-two-node-fencing-recovery-techpreview

@openshift-ci
Contributor

openshift-ci bot commented Jan 22, 2026

@eggfoobar: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-nightly-4.22-e2e-metal-ovn-two-node-fencing-recovery-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/6edd8510-f7e8-11f0-9c54-2a9b6caa1fe3-0

Contributor

@clobrano clobrano left a comment

I just left one comment; otherwise, lgtm

@jaypoulz
Contributor Author

/hold
I've thought of a different way to do this

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 26, 2026
@jaypoulz jaypoulz force-pushed the OCPEDGE-2011-tnf-node-replacement-updates branch from b2eb821 to cd9d035 Compare January 26, 2026 14:20
@jaypoulz jaypoulz changed the title OCPEDGE-2011: test(two-node): fail tests on unhealthy cluster preconditions OCPEDGE-2011: test(two-node): track precondition skips and fail suite on cluster health issues Jan 26, 2026
@openshift-ci-robot

openshift-ci-robot commented Jan 26, 2026

@jaypoulz: This pull request references OCPEDGE-2011 which is a valid jira issue.

Details

In response to this:

Individual recovery tests now skip when cluster preconditions aren't met, but an AfterSuite hook ensures the suite fails with diagnostic information about which tests were skipped and why. This makes precondition failures visible to CI analysis services while maintaining test stability.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@jaypoulz
Contributor Author

/payload-job periodic-ci-openshift-release-master-nightly-4.22-e2e-metal-ovn-two-node-fencing-recovery-techpreview

@openshift-ci
Contributor

openshift-ci bot commented Jan 26, 2026

@jaypoulz: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-nightly-4.22-e2e-metal-ovn-two-node-fencing-recovery-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/590785b0-fac2-11f0-93aa-392383318691-0

Copy link
Contributor

@clobrano clobrano left a comment

Great rework, I like the idea to track skipped tests!

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 27, 2026
@jaypoulz jaypoulz force-pushed the OCPEDGE-2011-tnf-node-replacement-updates branch from cd9d035 to c387d04 Compare January 27, 2026 13:49
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jan 27, 2026
@openshift-ci
Contributor

openshift-ci bot commented Jan 27, 2026

New changes are detected. LGTM label has been removed.

@jaypoulz
Contributor Author

/payload-job periodic-ci-openshift-release-master-nightly-4.22-e2e-metal-ovn-two-node-fencing-recovery-techpreview

@openshift-ci
Contributor

openshift-ci bot commented Jan 29, 2026

@jaypoulz: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-nightly-4.22-e2e-metal-ovn-two-node-fencing-recovery-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/87b2a9b0-fd1c-11f0-9c7d-84e7536a74e3-0

@jaypoulz
Contributor Author

/payload-job periodic-ci-openshift-release-master-nightly-4.22-e2e-metal-ovn-two-node-fencing-recovery-techpreview

@openshift-ci
Contributor

openshift-ci bot commented Jan 29, 2026

@jaypoulz: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-nightly-4.22-e2e-metal-ovn-two-node-fencing-recovery-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/e451f060-fd33-11f0-8490-88a959329d05-0

@jaypoulz
Contributor Author

/payload-job periodic-ci-openshift-release-master-nightly-4.22-e2e-metal-ovn-two-node-fencing-recovery-techpreview

@openshift-ci
Contributor

openshift-ci bot commented Jan 29, 2026

@jaypoulz: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-nightly-4.22-e2e-metal-ovn-two-node-fencing-recovery-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/e2c8cfa0-fd44-11f0-9fd5-54fbf52872ce-0

@jaypoulz jaypoulz force-pushed the OCPEDGE-2011-tnf-node-replacement-updates branch from 70afc49 to 7824a88 Compare January 29, 2026 21:34
@eggfoobar
Contributor

/payload-job periodic-ci-openshift-release-master-nightly-4.22-e2e-metal-ovn-two-node-fencing-recovery-techpreview

@openshift-ci
Contributor

openshift-ci bot commented Jan 29, 2026

@eggfoobar: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-nightly-4.22-e2e-metal-ovn-two-node-fencing-recovery-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/847b0a40-fd5c-11f0-80ad-b0c189ec5243-0

@eggfoobar
Contributor

/payload-job periodic-ci-openshift-release-master-nightly-4.22-e2e-metal-ovn-two-node-fencing-recovery-techpreview

@openshift-ci
Contributor

openshift-ci bot commented Jan 30, 2026

@eggfoobar: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-nightly-4.22-e2e-metal-ovn-two-node-fencing-recovery-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/801f2400-fd79-11f0-9a89-5659735b00a6-0

@jaypoulz jaypoulz force-pushed the OCPEDGE-2011-tnf-node-replacement-updates branch 2 times, most recently from f0b86c7 to 1c2bc6b Compare January 30, 2026 23:03
@jaypoulz jaypoulz changed the title OCPEDGE-2011: test(two-node): track precondition skips and fail suite on cluster health issues OCPEDGE-2011: test(two-node): stabilize tnf recovery suite Jan 31, 2026
@jaypoulz
Contributor Author

/payload-job periodic-ci-openshift-release-master-nightly-4.22-e2e-metal-ovn-two-node-fencing-recovery-techpreview

@openshift-ci
Contributor

openshift-ci bot commented Jan 31, 2026

@jaypoulz: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-nightly-4.22-e2e-metal-ovn-two-node-fencing-recovery-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/7694ff00-fe45-11f0-91b0-a115b2c7995c-0

@jaypoulz jaypoulz force-pushed the OCPEDGE-2011-tnf-node-replacement-updates branch from 1c2bc6b to 0c18169 Compare January 31, 2026 01:59
@openshift-ci-robot

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

@jaypoulz jaypoulz force-pushed the OCPEDGE-2011-tnf-node-replacement-updates branch from 0c18169 to 9908909 Compare January 31, 2026 02:35
@jaypoulz
Contributor Author

/payload-job periodic-ci-openshift-release-master-nightly-4.22-e2e-metal-ovn-two-node-fencing-recovery-techpreview

@openshift-ci
Contributor

openshift-ci bot commented Jan 31, 2026

@jaypoulz: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-nightly-4.22-e2e-metal-ovn-two-node-fencing-recovery-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/a575c0e0-fe4d-11f0-8967-f7abf998ba7b-0

DualReplica topology runs etcd externally via Pacemaker/Podman rather
than as static pods. Skip pod log streaming when this topology is
detected to avoid spurious errors.

Co-authored-by: Claude Opus 4.5 (Anthropic) <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
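
Here is a rough sketch of that guard. It assumes the topology string is read from the cluster's control-plane topology setting, which is an assumption for this example. `TopologyMode` is a local stand-in mirroring the `HighlyAvailable` and `DualReplica` values. `shouldStreamPodLogs` is a hypothetical helper, not the function changed by this commit.

```go
package main

// TopologyMode is a local stand-in for the cluster's control-plane topology
// value (assumed source: the Infrastructure resource).
type TopologyMode string

const (
	HighlyAvailable TopologyMode = "HighlyAvailable"
	DualReplica     TopologyMode = "DualReplica"
)

// shouldStreamPodLogs (hypothetical) reports whether etcd logs can be read
// from pods. On DualReplica clusters etcd runs under Pacemaker/Podman rather
// than as static pods, so streaming pod logs would only produce errors.
func shouldStreamPodLogs(mode TopologyMode) bool {
	return mode != DualReplica
}
```
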
@jaypoulz jaypoulz force-pushed the OCPEDGE-2011-tnf-node-replacement-updates branch from 9908909 to 1c79060 Compare January 31, 2026 02:51
jaypoulz and others added 3 commits January 30, 2026 22:01
Adds infrastructure to detect and report tests that skip due to unmet
cluster preconditions:

- Add precondition skip detection in cmd_runsuite.go that converts
  skips with "unmet cluster preconditions" marker to synthetic failures
- Handle framework initialization errors gracefully instead of panicking
- Update IsMicroShiftCluster to skip with precondition marker on timeout
- Run two-node suite serially with Disruptive marker

Co-authored-by: Claude Opus 4.5 (Anthropic) <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
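
The following sketches the skip-to-failure conversion described in the first bullet. `TestResult` and `promotePreconditionSkips` are hypothetical simplifications, not the types in `cmd_runsuite.go`. Only the marker string comes from the commit message.

```go
package main

import "strings"

// preconditionMarker is the skip-message substring named in the commit message.
const preconditionMarker = "unmet cluster preconditions"

// TestResult is a hypothetical, simplified per-test result.
type TestResult struct {
	Name    string
	State   string // "passed", "failed", or "skipped"
	Message string
}

// promotePreconditionSkips returns a copy of results in which every skip whose
// message contains the marker is reported as a synthetic failure. Other skips
// and all non-skip results are left unchanged. The input slice is not modified.
func promotePreconditionSkips(results []TestResult) []TestResult {
	out := make([]TestResult, len(results))
	for i, r := range results { // r is a copy, so the input is untouched
		if r.State == "skipped" && strings.Contains(r.Message, preconditionMarker) {
			r.State = "failed"
			r.Message = "synthetic failure: " + r.Message
		}
		out[i] = r
	}
	return out
}
```

Matching on a shared marker string keeps the decision in the suite runner. Individual tests then only need to include the marker in their skip message.
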
Improves two-node cluster test reliability and diagnostics:

- Use stonith confirm for quorum recovery when VM is destroyed
- Wait for CEO update-setup jobs instead of manual pacemaker cycling
- Add diagnostic gathering on test failure (VM states, pcs status, etcd members)
- Add pacemaker cleanup before health checks to clear stale failures
- Add context support to debug container pacemaker functions
- Improve progress logging during recovery validation
- Replace klog with e2e.Logf for consistent ginkgo log capture
- Fix topology detection to bypass stale framework cache
- Reduce verbose logging in libvirt/pacemaker utilities

Co-authored-by: Claude Opus 4.5 (Anthropic) <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
OPM index.json files are JSON arrays, not objects. The jsonformat tool
only handles objects, so exclude these files from verification.

Co-authored-by: Claude Opus 4.5 (Anthropic) <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
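
One way to make such an exclusion shape-based rather than path-based is to inspect the first JSON token. `isJSONObject` is a hypothetical helper that uses only `encoding/json`. The commit itself excludes the files from verification and does not detect their shape.

```go
package main

import (
	"encoding/json"
	"strings"
)

// isJSONObject (hypothetical) reports whether the first JSON value in data is
// an object. A formatter that only accepts objects could skip inputs for
// which this returns false, such as top-level arrays.
func isJSONObject(data string) (bool, error) {
	dec := json.NewDecoder(strings.NewReader(data))
	tok, err := dec.Token()
	if err != nil {
		return false, err // includes io.EOF for empty input
	}
	delim, ok := tok.(json.Delim)
	return ok && delim == '{', nil
}
```
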
@jaypoulz jaypoulz force-pushed the OCPEDGE-2011-tnf-node-replacement-updates branch from 1c79060 to 1f759fb Compare January 31, 2026 03:15
@jaypoulz
Contributor Author

/payload-job periodic-ci-openshift-release-master-nightly-4.22-e2e-metal-ovn-two-node-fencing-recovery-techpreview

@openshift-ci
Contributor

openshift-ci bot commented Jan 31, 2026

@jaypoulz: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-nightly-4.22-e2e-metal-ovn-two-node-fencing-recovery-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/25a50dc0-fe53-11f0-8776-19560c1b7387-0

@openshift-ci-robot

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

@openshift-ci
Contributor

openshift-ci bot commented Jan 31, 2026

@jaypoulz: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels

do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

4 participants