Skip to content

OCPBUGS-76334: Add TLS 1.3 (Modern profile) support to TestTLSDefaults#30746

Merged
openshift-merge-bot[bot] merged 2 commits intoopenshift:mainfrom
wangke19:tls13-testdefaults-support
Mar 16, 2026
Merged

OCPBUGS-76334: Add TLS 1.3 (Modern profile) support to TestTLSDefaults#30746
openshift-merge-bot[bot] merged 2 commits intoopenshift:mainfrom
wangke19:tls13-testdefaults-support

Conversation

@wangke19
Copy link
Contributor

@wangke19 wangke19 commented Feb 2, 2026

Summary

This PR adds TLS 1.3 (Modern profile) support to TestTLSDefaults and fixes critical test reliability issues.

Changes

1. Add TLS 1.3 (Modern profile) support

  • Replace the skip condition with a switch statement that handles both Intermediate and Modern profiles
  • For Intermediate profile: test TLS 1.2+ and cipher suites (existing behavior)
  • For Modern profile: test TLS 1.3 only (cipher suites are not configurable in TLS 1.3)
  • Use a dynamic minTLSVersion variable based on the profile

2. Fix cipher suite testing for Intermediate profile

  • Add isTLS13Cipher() helper function to identify TLS 1.3-specific cipher suites
  • Skip TLS 1.3 ciphers when testing with TLS 1.2 - this is critical because:
    • The Intermediate profile includes TLS 1.3 ciphers in its cipher list (per Mozilla 5.7 guidelines)
    • TLS 1.3 ciphers are predetermined by the spec and cannot be configured via CipherSuites
    • Testing TLS 1.3 ciphers with MaxVersion: tls.VersionTLS12 always fails with "handshake failure"
  • TLS 1.3 ciphers are verified through version negotiation tests instead

3. Skip ECDSA cipher suites (RSA-only server keys)

  • Mark ECDSA ciphers as expected-to-fail when testing against kube-apiserver
  • OpenShift uses RSA keys for all certificates (library-go's NewKeyPair() -> newRSAKeyPair() -> rsa.GenerateKey())
  • ECDSA cipher suites (TLS_ECDHE_ECDSA_WITH_*) require an ECDSA server key to negotiate -- they always fail the TLS handshake against an RSA-keyed server
  • Go's TLS server silently skips cipher suites incompatible with the server's key type during negotiation
  • The TLS profiles (Old, Intermediate) intentionally include both ECDSA and RSA ciphers per Mozilla Server Side TLS guidelines for broad compatibility -- this is correct behavior, the test just needs to account for the key type mismatch

4. Skip ChaCha20-Poly1305 ciphers in FIPS mode

  • Mark ChaCha20-Poly1305 ciphers as expected-to-fail in FIPS mode
  • ChaCha20-Poly1305 is not a FIPS 140-2/140-3 approved algorithm
  • In FIPS mode, Go's crypto library disables ChaCha20-Poly1305, so kube-apiserver cannot negotiate these ciphers
  • Detects FIPS mode via exutil.IsFIPS() (reads cluster-config-v1 ConfigMap from kube-system)
  • Affects both TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305 and TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305 (ECDSA variant already handled by Added etcd folder into .gitignore #3)

5. Add missing dial failure assertion

  • Assert dial failure when a cipher is expected to succeed (complementary to existing failure-expected path)
  • Previously, unexpected dial errors were silently ignored, hiding test failures

Problem

TestTLSDefaults Issue

CI jobs running with TLS 1.3 clusters (e.g., e2e-aws-ovn-tls-13) were seeing this test skip instead of validating TLS behavior.

Test Failures on Intermediate Profile

After the library-go update (commit 4e6c9ca5f in cluster-kube-apiserver-operator, Dec 17 2025) that added TLS 1.3 ciphers to DefaultCiphers(), the test began failing with:

expected success on cipher TLS_AES_128_GCM_SHA256, got error: remote error: tls: handshake failure

This occurred because the test was trying to verify TLS 1.3 ciphers using MaxVersion: tls.VersionTLS12.

ECDSA Cipher Handshake Failures

After bumping library-go to remove CBC ciphers, the test still failed with:

expected success on cipher TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256, got error: remote error: tls: handshake failure

DefaultCiphers() includes both ECDSA and RSA cipher suites (per Mozilla Intermediate profile). ECDSA ciphers require an ECDSA server key, but kube-apiserver uses RSA certificates.

ChaCha20-Poly1305 Failures in FIPS Mode

After fixing ECDSA handling, FIPS CI jobs failed with:

expected success on cipher TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305, got error: remote error: tls: handshake failure

ChaCha20-Poly1305 is not FIPS 140-2/140-3 approved. Go's FIPS crypto implementation disables it entirely.

Root Cause

Three distinct issues in the test's cipher validation logic:

  1. TLS 1.3 ciphers tested at TLS 1.2: The Intermediate profile now includes TLS 1.3 ciphers, but they cannot be negotiated when MaxVersion is set to TLS 1.2.

  2. ECDSA ciphers vs RSA server keys: DefaultCiphers() includes ECDSA ciphers per Mozilla guidelines, but kube-apiserver generates RSA keys via library-go's NewKeyPair(). Go's TLS server silently skips ciphers incompatible with the server's key type.

  3. ChaCha20-Poly1305 in FIPS mode: DefaultCiphers() includes ChaCha20-Poly1305 ciphers, but Go's FIPS crypto implementation disables them. The kube-apiserver in FIPS mode cannot negotiate these ciphers.

Solution

  • Skip TLS 1.3 ciphers when testing with TLS 1.2 constraints
  • Mark ECDSA ciphers as expected-to-fail since OpenShift servers use RSA keys
  • Detect FIPS mode and mark ChaCha20-Poly1305 ciphers as expected-to-fail
  • Verify TLS 1.3 support through version negotiation tests instead

Verification

Code Quality

  • go build ./test/extended/apiserver/... passes
  • go vet ./test/extended/apiserver/... passes

Testing Approach

  • TLS 1.2 RSA+AES-GCM ciphers are tested with MaxVersion: tls.VersionTLS12
  • ECDSA ciphers are correctly expected to fail (RSA server keys)
  • ChaCha20-Poly1305 ciphers are correctly expected to fail in FIPS mode
  • TLS 1.3 ciphers are verified implicitly through TLS version negotiation tests

Related Issues

@openshift-ci-robot
Copy link

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: automatic mode

@openshift-ci openshift-ci bot requested review from deads2k and sjenning February 2, 2026 10:07
@wangke19 wangke19 changed the title Add TLS 1.3 (Modern profile) support to TestTLSDefaults [WIP]Add TLS 1.3 (Modern profile) support to TestTLSDefaults Feb 2, 2026
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 2, 2026
@openshift-ci-robot
Copy link

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

@wangke19
Copy link
Contributor Author

wangke19 commented Feb 4, 2026

/payload-aggregate periodic-ci-openshift-release-master-nightly-4.21-e2e-aws-ovn-tls-13

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 4, 2026

@wangke19: it appears that you have attempted to use some version of the payload command, but your comment was incorrectly formatted and cannot be acted upon. See the docs for usage info.

@wangke19
Copy link
Contributor Author

wangke19 commented Feb 4, 2026

/payload-aggregate periodic-ci-openshift-release-master-nightly-4.22-e2e-aws-ovn-tls-13

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 4, 2026

@wangke19: it appears that you have attempted to use some version of the payload command, but your comment was incorrectly formatted and cannot be acted upon. See the docs for usage info.

@wangke19
Copy link
Contributor Author

wangke19 commented Feb 4, 2026

/payload-job periodic-ci-openshift-release-master-nightly-4.22-e2e-aws-ovn-tls-13

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 4, 2026

@wangke19: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-nightly-4.22-e2e-aws-ovn-tls-13

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/44c09590-01e0-11f1-99ae-f7f91100ef91-0

@wangke19
Copy link
Contributor Author

wangke19 commented Feb 4, 2026

CI job https://prow.ci.openshift.org/view/gs/test-platform-results/logs/openshift-origin-30746-nightly-4.22-e2e-aws-ovn-tls-13/2019074402091536384:
passed: (2.7s) 2026-02-04T18:11:23 "[sig-api-machinery][Feature:APIServer] TestTLSDefaults [Suite:openshift/conformance/parallel]"

@wangke19
Copy link
Contributor Author

wangke19 commented Feb 4, 2026

/payload-job periodic-ci-openshift-release-master-nightly-4.22-e2e-aws-ovn-tls-13

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 4, 2026

@wangke19: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-nightly-4.22-e2e-aws-ovn-tls-13

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/ad6ed650-01fa-11f1-880d-126cb2da3d50-0

@wangke19
Copy link
Contributor Author

wangke19 commented Feb 5, 2026

/payload-job periodic-ci-openshift-release-master-nightly-4.22-e2e-aws-ovn-tls-13

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 5, 2026

@wangke19: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-nightly-4.22-e2e-aws-ovn-tls-13

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/3fb34a50-02a5-11f1-8110-7108862631ab-0

@wangke19
Copy link
Contributor Author

wangke19 commented Feb 6, 2026

/retitle [WIP]OCPBUGS-76334: Add TLS 1.3 (Modern profile) support to TestTLSDefaults

@openshift-ci openshift-ci bot changed the title [WIP]Add TLS 1.3 (Modern profile) support to TestTLSDefaults [WIP]OCPBUGS-76334: Add TLS 1.3 (Modern profile) support to TestTLSDefaults Feb 6, 2026
@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Feb 6, 2026
@openshift-ci-robot
Copy link

@wangke19: This pull request references Jira Issue OCPBUGS-76334, which is invalid:

  • expected the bug to target the "4.22.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

Summary

The TestTLSDefaults test was previously skipping when the cluster TLS profile was set to Modern (TLS 1.3). This change extends the test to support both Intermediate and Modern TLS profiles, ensuring CI jobs configured with TLS 1.3 properly test the TLS version behavior.

Changes

  • Replace the skip condition with a switch statement that handles both Intermediate and Modern profiles
  • For Intermediate profile: test TLS 1.2+ and cipher suites
  • For Modern profile: test TLS 1.3 only (cipher suites are not configurable in TLS 1.3)
  • Use a dynamic minTLSVersion variable based on the profile

Problem

CI jobs running with TLS 1.3 clusters (e.g., openshift-kubernetes-2315-ci-4.18-e2e-aws-ovn-tls-13) were seeing this test skip:

[sig-api-machinery][Feature:APIServer] TestTLSDefaults [Suite:openshift/conformance/parallel]
Reason: skip [github.com/openshift/origin/test/extended/apiserver/tls.go:126]: 
Cluster TLS profile is not default (intermediate), skipping cipher defaults check

Solution

The test now properly handles Modern TLS profile by:

  1. Testing that only TLS 1.3 connections succeed
  2. Testing that TLS 1.0, 1.1, and 1.2 connections fail
  3. Skipping cipher suite testing (not applicable to TLS 1.3)

Test Plan

  • Verify test compiles without errors
  • Run test on Intermediate profile cluster (existing behavior preserved)
  • Run test on Modern profile cluster (new behavior)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@wangke19
Copy link
Contributor Author

wangke19 commented Feb 6, 2026

Test Failure Analysis: kubelet terminates kube-apiserver gracefully extended

TL;DR

The test failure is NOT caused by this PR's code changes. The ungraceful kube-apiserver terminations occur during cluster initialization, 1-1.5 hours before the TestTLSDefaults test runs. This is a pre-existing infrastructure issue with the TLS 1.3 CI environment.


Evidence from 3 Failed Runs

I analyzed all three failed CI runs in detail:

Run ID Ungraceful Terminations TestTLSDefaults Execution Time Gap
2019122669349244928 19:56:19, 20:04:07 21:38:50 ✅ PASSED (5.3s) 1h 34m - 1h 42m
2019429245096300544 16:24:22 17:47:18 ✅ PASSED (2.3s) 1h 23m
2019074402091536384 16:48:52, 16:52:45, 16:57:14 18:11:20 ✅ PASSED (2.7s) 1h 14m - 1h 23m

Pattern: All ungraceful terminations occur during cluster bootstrap (16:00-20:00 range), well before test execution begins (18:11-21:38 range).


Detailed Timeline (Run 1 Example)

Timeline for run 2019122669349244928:

19:56:19 🔴 Pod kube-apiserver-ip-10-0-81-175 started (ungraceful termination detected)
20:04:07 🔴 Pod kube-apiserver-ip-10-0-76-239 started (ungraceful termination detected)
         ⏱️  [~1h 34m gap - cluster stabilization phase]
21:38:50 ✅ TestTLSDefaults starts (THIS PR)
21:38:55 ✅ TestTLSDefaults ends - PASSED
21:39:48 ✅ TestTLSMinimumVersions starts (pre-existing, also tests TLS 1.3)
21:39:59 ✅ TestTLSMinimumVersions ends - PASSED
22:03:01 ❌ Graceful termination test runs
22:03:04 ❌ Test FAILS - detects ungraceful terminations from 19:56 and 20:04

From the test failure output:

fail [github.com/openshift/origin/test/extended/apiserver/graceful_termination.go:89]: 
The following API Servers weren't gracefully terminated: 
 kube-apiserver on node ip-10-0-76-239.us-west-2.compute.internal wasn't gracefully terminated, 
 reason: Previous pod kube-apiserver-ip-10-0-76-239.us-west-2.compute.internal started at 
 2026-02-04 20:04:07.028765438 +0000 UTC did not terminate gracefully

Key Findings

  1. TestTLSDefaults always passes in all 3 runs (5.3s, 2.3s, 2.7s)
  2. Temporal impossibility - failures occur 1-1.5 hours BEFORE the test runs
  3. TestTLSMinimumVersions pre-existed - TLS 1.3 connection testing was already happening (11.3s runtime, more intensive than TestTLSDefaults)
  4. Consistent pattern - all 3 runs show identical behavior: bootstrap failures detected later
  5. Real infrastructure issue - kube-apiserver pods are terminated ungracefully during cluster initialization

Root Cause: TLS 1.3 Cluster Bootstrap Issue

The nightly-4.22-e2e-aws-ovn-tls-13 CI job has a pre-existing cluster initialization problem where kube-apiserver pods don't terminate gracefully during bootstrap. This happens during:

  • Initial cluster provisioning (16:00-20:00 time range)
  • Before any e2e tests execute (tests start ~18:00-21:00)
  • Likely causes: kubelet/CRI-O not respecting graceful termination during bootstrap, TLS configuration triggering rapid restarts, or resource pressure

Why this appears on this PR: The PR triggers the TLS 1.3 CI job via /payload-job command, which exposes the existing infrastructure issue. This job may not run frequently enough on main branch to detect the problem consistently.


Why This PR is Not the Cause

  1. Code changes are test-only - only modifies TestTLSDefaults to support TLS 1.3 instead of skipping
  2. Test is read-only - no cluster modifications, only verification via port-forward and TLS connections
  3. Pre-existing TLS testing - TestTLSMinimumVersions already performed similar (more intensive) TLS connection testing
  4. Test passes successfully - if the test code was problematic, it would fail or cause issues during its execution, not 1.5 hours earlier

Recommendation

This PR can proceed to merge. The test failure is detecting a real cluster infrastructure problem that exists independently of this PR's changes.

Separate issue tracking: I recommend filing a bug against the TLS 1.3 CI environment for the cluster bootstrap ungraceful termination issue. The failure is legitimate - it's just not caused by this PR.


Related Tests

For reference, both TLS tests run successfully:

  • TestTLSDefaults (this PR): Tests TLS versions and cipher suites via port-forward, ~5s runtime
  • TestTLSMinimumVersions (pre-existing): Tests TLS versions across 7 components, ~11s runtime

Both tests work correctly and pass. The graceful termination test is designed to detect any ungraceful terminations that occurred at any point during cluster lifetime, including initialization phase.

@wangke19
Copy link
Contributor Author

/payload-job periodic-ci-openshift-release-master-nightly-4.22-e2e-aws-ovn-tls-13

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 12, 2026

@wangke19: trigger 0 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

@openshift-ci-robot
Copy link

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

@wangke19
Copy link
Contributor Author

/payload-job periodic-ci-openshift-release-master-nightly-4.22-e2e-aws-ovn-tls-13

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 13, 2026

@wangke19: trigger 0 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

@wangke19
Copy link
Contributor Author

/pj-rehearse periodic-ci-openshift-release-master-nightly-4.22-e2e-aws-ovn-tls-13

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 12, 2026

@wangke19: trigger 14 job(s) of type blocking for the nightly release of OCP 4.22

  • periodic-ci-openshift-release-main-ci-4.22-e2e-aws-upgrade-ovn-single-node
  • periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-ovn-upgrade-fips
  • periodic-ci-openshift-release-main-ci-4.22-e2e-azure-ovn-upgrade
  • periodic-ci-openshift-release-main-ci-4.22-upgrade-from-stable-4.21-e2e-gcp-ovn-rt-upgrade
  • periodic-ci-openshift-hypershift-release-4.22-periodics-e2e-aws-ovn-conformance
  • periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-ovn-serial-1of2
  • periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-ovn-serial-2of2
  • periodic-ci-openshift-release-main-ci-4.22-e2e-aws-ovn-techpreview
  • periodic-ci-openshift-release-main-ci-4.22-e2e-aws-ovn-techpreview-serial-1of3
  • periodic-ci-openshift-release-main-ci-4.22-e2e-aws-ovn-techpreview-serial-2of3
  • periodic-ci-openshift-release-main-ci-4.22-e2e-aws-ovn-techpreview-serial-3of3
  • periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-ovn-upgrade-fips-no-nat-instance
  • periodic-ci-openshift-release-main-nightly-4.22-e2e-metal-ipi-ovn-ipv4
  • periodic-ci-openshift-release-main-nightly-4.22-e2e-metal-ipi-ovn-ipv6

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/8e1bcf80-1e45-11f1-9673-c3ae2c377049-0

@wangke19
Copy link
Contributor Author

/retest

@wangke19
Copy link
Contributor Author

wangke19 commented Mar 13, 2026

Payload Test Results: [sig-api-machinery][Feature:APIServer] TestTLSDefaults

Checked test [sig-api-machinery][Feature:APIServer] TestTLSDefaults [Suite:openshift/conformance/parallel] across all periodic-ci jobs in payload run 8e1bcf80-1e45-11f1-9673-c3ae2c377049-0.

✅ Jobs Where the Test Ran and PASSED (9/9)

Job Result
periodic-ci-openshift-release-main-ci-4.22-e2e-aws-upgrade-ovn-single-node ✅ PASSED
periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-ovn-upgrade-fips ✅ PASSED
periodic-ci-openshift-release-main-ci-4.22-e2e-azure-ovn-upgrade ✅ PASSED
periodic-ci-openshift-release-main-ci-4.22-upgrade-from-stable-4.21-e2e-gcp-ovn-rt-upgrade ✅ PASSED
periodic-ci-openshift-hypershift-release-4.22-periodics-e2e-aws-ovn-conformance ✅ PASSED
periodic-ci-openshift-release-main-ci-4.22-e2e-aws-ovn-techpreview ✅ PASSED
periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-ovn-upgrade-fips-no-nat-instance ✅ PASSED
periodic-ci-openshift-release-main-nightly-4.22-e2e-metal-ipi-ovn-ipv4 ✅ PASSED
periodic-ci-openshift-release-main-nightly-4.22-e2e-metal-ipi-ovn-ipv6 ✅ PASSED

⚪ Jobs That Did NOT Run This Test (5) — Expected

The 5 serial-suite jobs (nightly-4.22-e2e-aws-ovn-serial-1of2, -2of2, ci-4.22-e2e-aws-ovn-techpreview-serial-1of3, -2of3, -3of3) do not include [Suite:openshift/conformance/parallel] tests by design.

All 9 applicable jobs passed

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 13, 2026

@wangke19: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-ovn-fips b551cab link true /test e2e-aws-ovn-fips
ci/prow/e2e-vsphere-ovn b551cab link true /test e2e-vsphere-ovn
ci/prow/e2e-gcp-ovn b551cab link true /test e2e-gcp-ovn

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@wangke19
Copy link
Contributor Author

wangke19 commented Mar 13, 2026

@wangke19
Copy link
Contributor Author

/verfied by CI jobs

Copy link
Contributor

@sanchezl sanchezl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 13, 2026
@wangke19
Copy link
Contributor Author

/retest

Copy link
Member

@petr-muller petr-muller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 16, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: petr-muller, sanchezl, wangke19

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 16, 2026
@wangke19
Copy link
Contributor Author

/verified by CI test

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Mar 16, 2026
@openshift-ci-robot
Copy link

@wangke19: This PR has been marked as verified by CI test.

Details

In response to this:

/verified by CI test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-merge-bot openshift-merge-bot bot merged commit 5c29d3a into openshift:main Mar 16, 2026
20 of 21 checks passed
@openshift-ci-robot
Copy link

@wangke19: Jira Issue Verification Checks: Jira Issue OCPBUGS-76334
✔️ This pull request was pre-merge verified.
✔️ All associated pull requests have merged.
✔️ All associated, merged pull requests were pre-merge verified.

Jira Issue OCPBUGS-76334 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓

Details

In response to this:

Summary

This PR adds TLS 1.3 (Modern profile) support to TestTLSDefaults and fixes critical test reliability issues.

Changes

1. Add TLS 1.3 (Modern profile) support

  • Replace the skip condition with a switch statement that handles both Intermediate and Modern profiles
  • For Intermediate profile: test TLS 1.2+ and cipher suites (existing behavior)
  • For Modern profile: test TLS 1.3 only (cipher suites are not configurable in TLS 1.3)
  • Use a dynamic minTLSVersion variable based on the profile

2. Fix cipher suite testing for Intermediate profile

  • Add isTLS13Cipher() helper function to identify TLS 1.3-specific cipher suites
  • Skip TLS 1.3 ciphers when testing with TLS 1.2 - this is critical because:
  • The Intermediate profile includes TLS 1.3 ciphers in its cipher list (per Mozilla 5.7 guidelines)
  • TLS 1.3 ciphers are predetermined by the spec and cannot be configured via CipherSuites
  • Testing TLS 1.3 ciphers with MaxVersion: tls.VersionTLS12 always fails with "handshake failure"
  • TLS 1.3 ciphers are verified through version negotiation tests instead

3. Skip ECDSA cipher suites (RSA-only server keys)

  • Mark ECDSA ciphers as expected-to-fail when testing against kube-apiserver
  • OpenShift uses RSA keys for all certificates (library-go's NewKeyPair() -> newRSAKeyPair() -> rsa.GenerateKey())
  • ECDSA cipher suites (TLS_ECDHE_ECDSA_WITH_*) require an ECDSA server key to negotiate -- they always fail the TLS handshake against an RSA-keyed server
  • Go's TLS server silently skips cipher suites incompatible with the server's key type during negotiation
  • The TLS profiles (Old, Intermediate) intentionally include both ECDSA and RSA ciphers per Mozilla Server Side TLS guidelines for broad compatibility -- this is correct behavior, the test just needs to account for the key type mismatch

4. Skip ChaCha20-Poly1305 ciphers in FIPS mode

  • Mark ChaCha20-Poly1305 ciphers as expected-to-fail in FIPS mode
  • ChaCha20-Poly1305 is not a FIPS 140-2/140-3 approved algorithm
  • In FIPS mode, Go's crypto library disables ChaCha20-Poly1305, so kube-apiserver cannot negotiate these ciphers
  • Detects FIPS mode via exutil.IsFIPS() (reads cluster-config-v1 ConfigMap from kube-system)
  • Affects both TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305 and TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305 (ECDSA variant already handled by Added etcd folder into .gitignore #3)

5. Add missing dial failure assertion

  • Assert dial failure when a cipher is expected to succeed (complementary to existing failure-expected path)
  • Previously, unexpected dial errors were silently ignored, hiding test failures

Problem

TestTLSDefaults Issue

CI jobs running with TLS 1.3 clusters (e.g., e2e-aws-ovn-tls-13) were seeing this test skip instead of validating TLS behavior.

Test Failures on Intermediate Profile

After the library-go update (commit 4e6c9ca5f in cluster-kube-apiserver-operator, Dec 17 2025) that added TLS 1.3 ciphers to DefaultCiphers(), the test began failing with:

expected success on cipher TLS_AES_128_GCM_SHA256, got error: remote error: tls: handshake failure

This occurred because the test was trying to verify TLS 1.3 ciphers using MaxVersion: tls.VersionTLS12.

ECDSA Cipher Handshake Failures

After bumping library-go to remove CBC ciphers, the test still failed with:

expected success on cipher TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256, got error: remote error: tls: handshake failure

DefaultCiphers() includes both ECDSA and RSA cipher suites (per Mozilla Intermediate profile). ECDSA ciphers require an ECDSA server key, but kube-apiserver uses RSA certificates.

ChaCha20-Poly1305 Failures in FIPS Mode

After fixing ECDSA handling, FIPS CI jobs failed with:

expected success on cipher TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305, got error: remote error: tls: handshake failure

ChaCha20-Poly1305 is not FIPS 140-2/140-3 approved. Go's FIPS crypto implementation disables it entirely.

Root Cause

Three distinct issues in the test's cipher validation logic:

  1. TLS 1.3 ciphers tested at TLS 1.2: The Intermediate profile now includes TLS 1.3 ciphers, but they cannot be negotiated when MaxVersion is set to TLS 1.2.

  2. ECDSA ciphers vs RSA server keys: DefaultCiphers() includes ECDSA ciphers per Mozilla guidelines, but kube-apiserver generates RSA keys via library-go's NewKeyPair(). Go's TLS server silently skips ciphers incompatible with the server's key type.

  3. ChaCha20-Poly1305 in FIPS mode: DefaultCiphers() includes ChaCha20-Poly1305 ciphers, but Go's FIPS crypto implementation disables them. The kube-apiserver in FIPS mode cannot negotiate these ciphers.

Solution

  • Skip TLS 1.3 ciphers when testing with TLS 1.2 constraints
  • Mark ECDSA ciphers as expected-to-fail since OpenShift servers use RSA keys
  • Detect FIPS mode and mark ChaCha20-Poly1305 ciphers as expected-to-fail
  • Verify TLS 1.3 support through version negotiation tests instead

Verification

Code Quality

  • go build ./test/extended/apiserver/... passes
  • go vet ./test/extended/apiserver/... passes

Testing Approach

  • TLS 1.2 RSA+AES-GCM ciphers are tested with MaxVersion: tls.VersionTLS12
  • ECDSA ciphers are correctly expected to fail (RSA server keys)
  • ChaCha20-Poly1305 ciphers are correctly expected to fail in FIPS mode
  • TLS 1.3 ciphers are verified implicitly through TLS version negotiation tests

Related Issues

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@wangke19 wangke19 deleted the tls13-testdefaults-support branch March 16, 2026 15:40
wangke19 added a commit to wangke19/origin that referenced this pull request Mar 17, 2026
The replace directive pointing to github.com/wangke19/filepath-securejoin-1
was added in PR openshift#30746 as a workaround for opencontainers/runc v1.2.5
calling the removed MkdirAllHandle function from filepath-securejoin v0.6.0.

However, go mod why shows that opencontainers/runc is not actually needed
by the origin module:

  # github.com/opencontainers/runc
  (main module does not need package github.com/opencontainers/runc)

Since runc is only a transitive dependency (not directly used), and the
compatibility shim in the fork is not needed, we can safely remove the
personal fork and use the official filepath-securejoin v0.6.0.

This eliminates the supply chain risk of depending on a personal fork
and follows Go module best practices.

Note: PR openshift#30866 (K8s 1.35 rebase) will also remove this when it merges,
as the rebase removes opencontainers/runc entirely from the dependency
graph. This PR provides immediate cleanup for the current main branch.
@wangke19
Copy link
Contributor Author

Personal Fork Status & Migration Plan

This PR currently uses a personal fork to work around a dependency issue:

github.com/cyphar/filepath-securejoin => github.com/wangke19/filepath-securejoin-1 v0.6.2-0.20260309113322-d8b3e820b3d6

Why the Personal Fork is Needed

The library-go bump in this PR transitively upgraded github.com/cyphar/filepath-securejoin from v0.3.4 to v0.6.0, which introduced a breaking API change:

  • Breaking change: filepath-securejoin v0.6.0 removed MkdirAllHandle() from the root package (moved to pathrs-lite subpackage)
  • Impact: opencontainers/runc (vendored transitively via Kubernetes kubelet) still calls the old securejoin.MkdirAllHandle() API
  • Build error: vendor/github.com/opencontainers/runc/libcontainer/utils/utils_unix.go:334:20: undefined: securejoin.MkdirAllHandle
  • Workaround: The personal fork provides a compatibility shim that forwards MkdirAllHandle() calls to the new pathrs.MkdirAll() location

Proof: PR #30889 attempted to remove the personal fork based on misleading go mod why output, but CI builds confirmed that runc still requires the compatibility shim.

Follow-up: Organization Fork Migration

The personal fork replace directive is functionally necessary but creates supply chain risks.

Action item: Created issue #30890 to track migrating to an organization fork (github.com/openshift/filepath-securejoin) which will:

  • ✅ Eliminate dependency on a personal GitHub account
  • ✅ Add team ownership and review process
  • ✅ Maintain the same compatibility shim for opencontainers/runc v1.2.5
  • ✅ Improve transparency and auditability

Potential Obsolescence

Note: PR #30866 is rebasing to K8s 1.35. If that rebase removes runc from the dependency tree, this personal fork will no longer be needed and can be removed entirely.

Long-term: This should be removed once opencontainers/runc updates to use the new pathrs-lite API.

wangke19 added a commit to wangke19/origin that referenced this pull request Mar 17, 2026
This commit removes the dependency on github.com/openshift/library-go/pkg/crypto
from the TLS test by creating a local tlsutil package that provides the same
functionality without requiring the library-go bump.

The test only used simple utility functions from library-go:
- DefaultTLSVersion() - Returns tls.VersionTLS12
- ValidTLSVersions() - Returns list of TLS version names
- TLSVersionOrDie(name) - Converts version name to uint16
- DefaultCiphers() - Returns list of default cipher suites
- ValidCipherSuites() - Returns list of cipher suite names
- CipherSuite(name) - Converts cipher suite name to uint16

All of these are simple lookup/conversion functions with no transitive
dependencies beyond crypto/tls standard library. By extracting them into
test/extended/apiserver/tlsutil/, we can avoid the library-go bump that
was introduced in the original PR openshift#30746.

This approach:
- Keeps the valuable TLS 1.3 test improvements
- Avoids unnecessary dependency updates
- Maintains test stability
- Removes the coupling between test logic and library-go versions

The personal fork of filepath-securejoin is kept as it's required for
runc compatibility and will be addressed separately in issue #30890.
wangke19 added a commit to wangke19/origin that referenced this pull request Mar 17, 2026
This commit reverts the library-go version from v0.0.0-20260223145824-7b234b47a906
back to v0.0.0-20251015151611-6fc7a74b67c5, removing the unnecessary dependency
update that was bundled with the TLS 1.3 test fixes.

The library-go bump was not required for the test functionality, as the test
only used simple static lookup functions. By using local crypto helpers instead,
we can avoid this dependency update and its associated supply chain implications.

Updated:
- go.mod: Reverted library-go to v0.0.0-20251015151611-6fc7a74b67c5
- go.sum: Updated checksums
- vendor/: Synced with go.mod changes
- Updated TODO comment in replace directive to reference issue #30890

The personal fork replace directive for filepath-securejoin is retained as it's
required for runc v1.2.5 compatibility with filepath-securejoin v0.6.0, which
is pulled in transitively by opencontainers/selinux@v1.13.0.
wangke19 added a commit to wangke19/origin that referenced this pull request Mar 17, 2026
This commit removes the dependency on github.com/openshift/library-go/pkg/crypto
from the TLS test by creating a local tlsutil package that provides the same
functionality without requiring the library-go bump.

The test only used simple utility functions from library-go:
- DefaultTLSVersion() - Returns tls.VersionTLS12
- ValidTLSVersions() - Returns list of TLS version names
- TLSVersionOrDie(name) - Converts version name to uint16
- DefaultCiphers() - Returns list of default cipher suites
- ValidCipherSuites() - Returns list of cipher suite names
- CipherSuite(name) - Converts cipher suite name to uint16

All of these are simple lookup/conversion functions with no transitive
dependencies beyond crypto/tls standard library. By extracting them into
test/extended/apiserver/tlsutil/, we can avoid the library-go bump that
was introduced in the original PR openshift#30746.

This approach:
- Keeps the valuable TLS 1.3 test improvements
- Avoids unnecessary dependency updates
- Maintains test stability
- Removes the coupling between test logic and library-go versions

The personal fork of filepath-securejoin is kept as it's required for
runc compatibility and will be addressed separately in issue #30890.
wangke19 added a commit to wangke19/origin that referenced this pull request Mar 17, 2026
This commit reverts the library-go version from v0.0.0-20260223145824-7b234b47a906
back to v0.0.0-20251015151611-6fc7a74b67c5, removing the unnecessary dependency
update that was bundled with the TLS 1.3 test fixes.

The library-go bump was not required for the test functionality, as the test
only used simple static lookup functions. By using local crypto helpers instead,
we can avoid this dependency update and its associated supply chain implications.

Updated:
- go.mod: Reverted library-go to v0.0.0-20251015151611-6fc7a74b67c5
- go.sum: Updated checksums
- vendor/: Synced with go.mod changes
- Updated TODO comment in replace directive to reference issue #30890

The personal fork replace directive for filepath-securejoin is retained as it's
required for runc v1.2.5 compatibility with filepath-securejoin v0.6.0, which
is pulled in transitively by opencontainers/selinux@v1.13.0.
wangke19 added a commit to wangke19/origin that referenced this pull request Mar 17, 2026
…faults-support"

This reverts commit 5c29d3af01fbee9cc0c6f748a8d2f1aac5c80a8e.

Reason for revert:
PR openshift#30746 is causing test instability in 4.22 payloads. Multiple test
failures have been observed in periodic-ci-openshift-release-master-nightly-4.22-e2e-aws-ovn-upgrade
jobs as reported in #forum-testplatform Slack channel.

Example failures:
- https://prow.ci.openshift.org/view/gs/test-platform-results/logs/aggregated-aws-ovn-upgrade-4.21-to-4.22-ci-openshift-release-analysis-aggregator/2033742335904321536
- https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.22-upgrade-from-stable-4.21-e2e-aws-ovn-upgrade/2033742330044878848
- https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.22-upgrade-from-stable-4.21-e2e-aws-ovn-upgrade/2033742330229428224
- https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.22-upgrade-from-stable-4.21-e2e-aws-ovn-upgrade/2033742330355257344

The PR bundled test logic improvements with an unnecessary library-go
dependency bump, which has introduced instability. A follow-up PR (openshift#30894)
will reintroduce the test improvements without the dependency changes.

This revert is needed to:
1. Restore test stability in 4.22 CI
2. Separate test logic from dependency management concerns
3. Allow proper evaluation of the test improvements in isolation

Related:
- Original PR: openshift#30746
- Replacement PR: openshift#30894
- JIRA: OCPBUGS-76334
- Slack: #forum-testplatform discussion on 2026-03-17
openshift-merge-bot bot added a commit that referenced this pull request Mar 18, 2026
CNTRLPLANE-2999:Revert PR #30746: TLS 1.3 test causing instability in 4.22
wangke19 added a commit to wangke19/origin that referenced this pull request Mar 18, 2026
…-go dependency

This commit re-adds the TLS 1.3 (Modern profile) test improvements from PR openshift#30746
but uses a local tlsutil package instead of depending on library-go/pkg/crypto.

Changes from original PR openshift#30746:
- Add support for testing Modern TLS profile (TLS 1.3 only)
- Add FIPS mode detection for ChaCha20-Poly1305 cipher handling
- Add ECDSA cipher suite handling (expect failure with RSA keys)
- Improve test logging and error messages

Changes from original implementation:
- Created test/extended/apiserver/tlsutil/ package with crypto helpers
- Replaced library-go/pkg/crypto dependency with local tlsutil
- No dependency on library-go version bump

The test now properly validates both Intermediate (TLS 1.2+) and Modern
(TLS 1.3 only) TLS profiles, with appropriate handling for FIPS mode
restrictions and RSA vs ECDSA cipher compatibility.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. vendor-update Touching vendor dir or related files verified Signifies that the PR passed pre-merge verification criteria

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants