Skip to content

OCPBUGS-77263: create self-signed crt if failing to read default#740

Open
jcmoraisjr wants to merge 1 commit intoopenshift:masterfrom
jcmoraisjr:OCPBUGS-77263-autogen-crt
Open

OCPBUGS-77263: create self-signed crt if failing to read default#740
jcmoraisjr wants to merge 1 commit intoopenshift:masterfrom
jcmoraisjr:OCPBUGS-77263-autogen-crt

Conversation

@jcmoraisjr
Copy link
Copy Markdown
Member

@jcmoraisjr jcmoraisjr commented Feb 26, 2026

Ingress operator manages and exposes a default certificate for Router. A default certificate is used if the route does not provide one, or a request is made for a non matching route.

Current approach uses a hardcoded certificate in the router image, which should lead to image scanners reporting the storage of its private key.

This implementation removes the hardcoded certificate, as well as the template code that ensures a default certificate is provided. In the Go code side, we are ensuring that the default certificate is always provided by generating one on the fly. If this last resort option fails due to permission, the whole router process fail to start.

Summary by CodeRabbit

  • New Features

    • Automatic self-signed certificate creation when no default is configured.
    • Live certificate reloads via filesystem watch with atomic replacement.
  • Bug Fixes

    • Fail-fast handling during certificate initialization (no silent fallback).
    • TLS termination now uses the configured certificate directly.
    • Default certificate and key files created with stricter permissions (0600).
    • Plugin construction now returns errors instead of producing invalid instances.

@openshift-ci-robot openshift-ci-robot added jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Feb 26, 2026
@openshift-ci-robot
Copy link
Copy Markdown
Contributor

@jcmoraisjr: This pull request references Jira Issue OCPBUGS-77263, which is invalid:

  • expected the bug to target the "4.22.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

Ingress operator manages and exposes a default certificate for Router. A default certificate is used if the route does not provide one, or a request is made for a non matching route.

Current approach uses a hardcoded certificate in the router image, which should lead to image scanners reporting the storage of its private key.

This implementation removes the hardcoded certificate, as well as the template code that ensures a default certificate is provided. In the Go code side, we are ensuring that the default certificate is always provided by generating one on the fly. If this last resort option fails due to permission, the whole router process fail to start.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested review from Miciah and Thealisyed February 26, 2026 19:26
@jcmoraisjr
Copy link
Copy Markdown
Member Author

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Feb 26, 2026
@openshift-ci-robot
Copy link
Copy Markdown
Contributor

@jcmoraisjr: This pull request references Jira Issue OCPBUGS-77263, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.22.0) matches configured target version for branch (4.22.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @melvinjoseph86

Details

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@melvinjoseph86
Copy link
Copy Markdown

melvinjoseph86 commented Mar 2, 2026

Tested using clusterbot


➜  oc get clusterversion
NAME      VERSION                                                AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.22.0-0-2026-03-04-150532-test-ci-ln-jn3hyqb-latest   True        False         7m19s   Error while reconciling 4.22.0-0-2026-03-04-150532-test-ci-ln-jn3hyqb-latest: some cluster operators are not available

➜  oc get po -n openshift-ingress
NAME                             READY   STATUS    RESTARTS   AGE
router-default-849f787f8-48qdx   1/1     Running   0          108m
router-default-849f787f8-r5gkq   1/1     Running   0          108m
➜    oc exec router-default-849f787f8-48qdx   -n openshift-ingress -- cat /var/lib/haproxy/conf/default_pub_keys.pem|openssl x509 -out - --text 
cat: /var/lib/haproxy/conf/default_pub_keys.pem: No such file or directory
command terminated with exit code 1
Could not find certificate from <stdin>

Making the CVO unmanged
➜  oc patch clusterversions/version --type=json --patch='[{"op":"add","path":"/spec/overrides","value":[{"kind":"Deployment","group":"apps","name":"ingress-operator","namespace":"openshift-ingress-operator","unmanaged":true}]}]'
clusterversion.config.openshift.io/version patched
➜  oc -n openshift-ingress-operator scale deployment.apps/ingress-operator --replicas=0
Warning: spec.template.spec.nodeSelector[node-role.kubernetes.io/master]: use "node-role.kubernetes.io/control-plane" instead
deployment.apps/ingress-operator scaled

Customize router deployment by removing envvar DEFAULT_CERTIFICATE_DIR and ROUTER_USE_PROXY_PROTOCOL as false
➜  oc -n openshift-ingress edit deployment/router-default
deployment.apps/router-default edited

➜  oc get po -n openshift-ingress
NAME                              READY   STATUS        RESTARTS   AGE
router-default-5bb7dd9445-4hc72   0/1     Pending       0          41s
router-default-5bb7dd9445-dt8vc   1/1     Running       0          41s
router-default-849f787f8-48qdx    1/1     Terminating   0          113m
router-default-849f787f8-r5gkq    0/1     Terminating   0          113m

➜  oc rsh  -n openshift-ingress router-default-5bb7dd9445-4hc72                        
E0304 23:04:05.456678   30758 websocket.go:296] Unknown stream id 1, discarding message
sh-5.1$ 
sh-5.1$ 
sh-5.1$ :|openssl s_client -connect 127.0.0.1:443 |openssl x509 -noout -dates
Connecting to 127.0.0.1
Can't use SSL_get_servername
depth=0 CN=*.apps.ci-ln-jn3hyqb-76ef8.aws-2.ci.openshift.org
verify error:num=18:self-signed certificate
verify return:1
depth=0 CN=*.apps.ci-ln-jn3hyqb-76ef8.aws-2.ci.openshift.org
verify return:1
DONE
notBefore=Mar  4 16:32:15 2026 GMT
notAfter=Mar  3 16:32:15 2036 GMT
sh-5.1$ 
sh-5.1$ 
sh-5.1$ exit
exit
➜  oc logs router-default-5bb7dd9445-4hc72  -n openshift-ingress
I0304 17:32:15.349148       1 template.go:561] "msg"="starting router" "logger"="router" "version"="majorFromGit: \nminorFromGit: \ncommitFromGit: abfd88a2\nversionFromGit: v0.0.0-unknown\ngitTreeState: dirty\nbuildDate: 2026-03-04T15:00:17Z\n"
<-----skip---->

The notBefore date is 1h before the router started to run, and notAfter is 10 years in the future comparing the router logs
Dummy cert file is not there and checked the auto generated certificate, hence marking as verified
/verified by @mjoseph

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Mar 2, 2026
@openshift-ci-robot
Copy link
Copy Markdown
Contributor

@melvinjoseph86: This PR has been marked as verified by @mjoseph.

Details

In response to this:

Tested using clusterbot


➜  oc get clusterversion
NAME      VERSION                                                AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.22.0-0-2026-03-02-091706-test-ci-ln-nty2fgk-latest   True        False         118m    Cluster version is 4.22.0-0-2026-03-02-091706-test-ci-ln-nty2fgk-latest

➜  oc exec router-default-678c8f8c5f-kxwtl  -n openshift-ingress -- cat /var/lib/haproxy/conf/default_pub_keys.pem|openssl x509 -out - --text 
cat: /var/lib/haproxy/conf/default_pub_keys.pem: No such file or directory
command terminated with exit code 1
Could not find certificate from <stdin>

➜  oc rsh -n openshift-ingress   router-default-678c8f8c5f-kxwtl  
sh-5.1$ cd /var/lib/haproxy/conf
sh-5.1$ ls
cert_config.map      error-page-503.http      haproxy.config  os_edge_reencrypt_be.map	os_route_http_redirect.map  os_tcp_be.map
error-page-404.http  haproxy-config.template  metrics-auth    os_http_be.map		os_sni_passthrough.map	    os_wildcard_domain.map
sh-5.1$ 

Dummy cert file is not there, hence marking as verified
/verified by @mjoseph

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Comment on lines -169 to +172
return newDefaultTemplatePlugin(router, cfg.IncludeUDP, lookupSvc), err
if err != nil {
return nil, err
}
return newDefaultTemplatePlugin(router, cfg.IncludeUDP, lookupSvc), nil
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change purely stylistic?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's skipping the call to the later in the case the former fails.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right, the old logic ignores err from newTemplateRouter. Was that causing a problem, or is it something you just happened to notice while fixing the certificate issue?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just found the pattern and preferred to see it fixed, since we are handling a non properly initialized router var in case of an error. It wasn't causing issues - and didn't look for any though =)

Comment thread pkg/router/template/router.go Outdated
Comment on lines +421 to +425
if err := generateSelfSignCert(outPath); err != nil {
// Last option, we give up. This should only happen on misconfigured envs,
// like missing directory or permission.
return err
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed on the call, we really should fall back to the self-signed certificate only if no certificate was provided. If a certificate was provided but we fail to write it to disk, we should not fall back to a self-signed certificate.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just updated as we talked: 1 commit removing the hardcoded crt+key and using a self-siged crt created on the fly, another one only using the self-signed crt in case one is not provided, failing the router start otherwise.

@jcmoraisjr jcmoraisjr force-pushed the OCPBUGS-77263-autogen-crt branch from 3386c51 to aeaf362 Compare March 5, 2026 20:22
@openshift-ci-robot openshift-ci-robot removed the verified Signifies that the PR passed pre-merge verification criteria label Mar 5, 2026
@jcmoraisjr
Copy link
Copy Markdown
Member Author

/test e2e-aws-serial-2of2

@alebedev87
Copy link
Copy Markdown
Contributor

/assign

@jcmoraisjr
Copy link
Copy Markdown
Member Author

/test e2e-aws-fips

@gcs278
Copy link
Copy Markdown
Contributor

gcs278 commented Apr 8, 2026

/unassign @alebedev87

@rikatz
Copy link
Copy Markdown
Member

rikatz commented Apr 8, 2026

/assign

@rikatz
Copy link
Copy Markdown
Member

rikatz commented Apr 10, 2026

@coderabbitai rise and shine, validate this PR, check for consistency, be an adversarial reviewer of Joao and tell me major and critical findings

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 10, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Get the PR diff
gh pr view 740 --repo openshift/router --json title,body,files,additions,deletions

echo "=== PR DIFF ==="
gh pr diff 740 --repo openshift/router

Length of output: 9710


🏁 Script executed:

#!/bin/bash
echo "=== ssl.go ==="
gh api repos/openshift/router/contents/pkg/router/template/ssl.go?ref=refs/pull/740/head \
  | jq -r '.content' | base64 -d

echo "=== router.go writeDefaultCert region ==="
gh api repos/openshift/router/contents/pkg/router/template/router.go?ref=refs/pull/740/head \
  | jq -r '.content' | base64 -d | grep -n "" | sed -n '400,480p'

Length of output: 4748


@rikatz — adversarial review complete. Here are the critical and major findings:


🔴 Critical: 2^63 is Bitwise XOR in Go, Not Exponentiation

File: pkg/router/template/ssl.go, Line 31

serial, err := rand.Int(rand.Reader, big.NewInt(2^63))

In Go, ^ is the bitwise XOR operator, not the power operator. 2 ^ 63 evaluates to 61 (binary: 0b0000010 XOR 0b0111111 = 0b0111101). This means the serial number is always in the range [0, 60] — a trivially small space — instead of the intended [0, 2^63).

Fix:

serialNumberLimit := new(big.Int).Lsh(big.NewInt(1), 128)
serial, err := rand.Int(rand.Reader, serialNumberLimit)

🔴 Critical: reloadFn Is Broken for the Self-Signed Cert Case

File: pkg/router/template/router.go, Lines 435–444

reloadFn := func() {
    log.V(0).Info("updating default certificate", "path", outPath)
    os.Remove(outPath)                                              // deletes the cert first
    if err := secretToPem(r.defaultCertificateDir, outPath); err != nil { // always calls secretToPem
        log.Error(err, "failed to update default certificate", "path", outPath)
        return
    }
    ...
}
if err := r.watchVolumeMountDir(r.defaultCertificateDir, reloadFn); err != nil {
    // swallowed on error
}

The reloadFn always calls secretToPem(r.defaultCertificateDir, outPath). In the self-signed cert branch, r.defaultCertificateDir is "". If the watch ever fires (or if watchVolumeMountDir("") doesn't error out), the code will:

  1. Delete the existing self-signed cert (os.Remove)
  2. Call secretToPem("", ...) which will fail
  3. Never call generateSelfSignCert to recreate it

The router is then left with no certificate and haproxy will fail. The reloadFn needs to be conditioned on which path was taken:

reloadFn := func() {
    log.V(0).Info("updating default certificate", "path", outPath)
    os.Remove(outPath)
    var regenErr error
    if len(r.defaultCertificateDir) != 0 {
        regenErr = secretToPem(r.defaultCertificateDir, outPath)
    } else {
        regenErr = generateSelfSignCert(outPath)
    }
    if regenErr != nil {
        log.Error(regenErr, "failed to update default certificate", "path", outPath)
        return
    }
    log.V(0).Info("reloading to get updated default certificate")
    r.rateLimitedCommitFunction.RegisterChange()
}

🟠 Major: Private Key Written World-Readable (0644)

File: pkg/router/template/ssl.go, Line 26

return os.WriteFile(outputFile, append(crt, key...), 0644)

The output file contains the RSA private key and is written with 0644 permissions — world-readable. Any process on the node running as a different user can read the private key. Should be 0600.

return os.WriteFile(outputFile, append(crt, key...), 0600)

🟡 Moderate: Missing KeyUsage and ExtKeyUsage in Certificate Template

File: pkg/router/template/ssl.go, Lines 34–43

The generated certificate does not set KeyUsage or ExtKeyUsage. Modern TLS stacks (and strict clients) may reject a server certificate that lacks ExtKeyUsageServerAuth. It also doesn't set BasicConstraints.

template := x509.Certificate{
    // ...
    KeyUsage:    x509.KeyUsageDigitalSignature | x509.KeyUsageKeyEncipherment,
    ExtKeyUsage: []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
    BasicConstraintsValid: true,
}

🟡 Minor: Typo in Function Name

func generateSelfSignCert(outputFile string) error {

Should be generateSelfSignedCert (missing d). Not functional but inconsistent naming.


Summary Table

Severity Finding
🔴 Critical 2^63 is XOR in Go → serial in [0, 60] only
🔴 Critical reloadFn deletes cert then calls secretToPem("") in self-signed case — cert gone forever
🟠 Major Private key file written with 0644 (world-readable) instead of 0600
🟡 Moderate No KeyUsage/ExtKeyUsage/BasicConstraints in cert template
🟡 Minor Function name typo: generateSelfSignCertgenerateSelfSignedCert

Comment thread pkg/router/template/router.go Outdated
} else {
r.defaultCertificatePath = outPath
// generate one on the fly only in case we don't have neither a path nor a directory configured
if err := generateSelfSignCert(outPath); err != nil {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are two things here that can be disrupting but maybe they are already disrupting:

  • If an error happens, previously we would use a default certificate, but now we will error. What does this imply? That our proxy will be out? Is this "acceptable"?
  • From the 'user' perspective, the worst case fallback was accepting an insecure self signed certificate once (meh) and then never caring about it, while here we will always generate a new certificate

I understand this is the edge of the edge case, just some food for thought, although I agree that it is better to have a "safe" (not much, but still better) fallback than having someone being able to do MITM in this case because the private key is there and common

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was previously raised and we ended up agreeing that it should fail in case there is an attempt to configure, so better to have a failing/crashing router instead of incorrectly configuring it with a distinct (the fake one) certificate.

Anyway this is an ingress operator task, so failures here would only happen in case of a bug in the operator. Users could not (should not) cause a disruption in the router due to this change in the code.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I see, thanks!

@jcmoraisjr jcmoraisjr force-pushed the OCPBUGS-77263-autogen-crt branch from aeaf362 to 87548de Compare April 13, 2026 13:07
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci bot commented Apr 13, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from rikatz. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jcmoraisjr
Copy link
Copy Markdown
Member Author

@coderabbitai good catch. Note that I used a simplified approach in the max key, all the others are addressed as suggested. Can you confirm all is good now?

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 13, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 07344e11-32db-409d-9bba-0a7e0014301e

📥 Commits

Reviewing files that changed from the base of the PR and between 5cc70f2 and 0500b49.

⛔ Files ignored due to path filters (1)
  • images/router/haproxy/conf/default_pub_keys.pem is excluded by !**/*.pem
📒 Files selected for processing (4)
  • images/router/haproxy/conf/haproxy-config.template
  • pkg/router/template/plugin.go
  • pkg/router/template/router.go
  • pkg/router/template/ssl.go
🚧 Files skipped from review as they are similar to previous changes (4)
  • images/router/haproxy/conf/haproxy-config.template
  • pkg/router/template/plugin.go
  • pkg/router/template/router.go
  • pkg/router/template/ssl.go

Walkthrough

Adds self-signed certificate generation and atomic default-cert updates, tightens error handling in template router initialization and secret handling, changes default PEM file permissions to 0600, and updates the HAProxy template to use .DefaultCertificate directly in TLS bind directives.

Changes

Cohort / File(s) Summary
HAProxy template
images/router/haproxy/conf/haproxy-config.template
Replaced template-time firstMatch fallback with direct use of .DefaultCertificate in TLS bind ... ssl crt ... for fe_sni and fe_no_sni.
Template plugin init
pkg/router/template/plugin.go
NewTemplatePlugin now returns nil, err if newTemplateRouter fails, preventing a plugin instance from being returned on initialization error.
Default cert lifecycle & watcher
pkg/router/template/router.go
writeDefaultCert() sets defaultCertificatePath earlier, writes PEM files with mode 0600, returns fatal errors for secretToPem failures when defaultCertificateDir is set, generates self-signed cert immediately if no default provided, and installs a watcher that writes to outPath + ".tmp" and atomically renames to outPath on updates (with temp-file cleanup and rate-limited reload).
Self-signed cert generation
pkg/router/template/ssl.go
New file adding generateSelfSignedCert and createX509Certificate to produce RSA-based self-signed cert+key (SAN from ROUTER_DOMAIN or localhost) and write combined PEM (cert then key) with 0600 permissions.

Sequence Diagram

sequenceDiagram
    participant Plugin as TemplatePlugin
    participant Router as TemplateRouter
    participant CertGen as CertGenerator
    participant FS as FileSystem
    participant Watcher as FileWatcher
    participant HAProxy as HAProxy

    Plugin->>Router: Initialize (NewTemplatePlugin)
    Router->>Router: writeDefaultCert()
    alt no defaultCertificate configured
        Router->>CertGen: generateSelfSignedCert(outPath)
        CertGen->>CertGen: determine SAN (ROUTER_DOMAIN/localhost) and create cert+key
        CertGen->>FS: write PEM (cert + key) to outPath with mode 0600
        FS-->>Router: write success
    else defaultCertificateDir provided
        Router->>FS: secretToPem(defaultCertificateDir -> outPath)
        FS-->>Router: write success or error (fatal)
        Router->>Watcher: start watching defaultCertificateDir
        Watcher->>Watcher: on change -> write to outPath.tmp
        Watcher->>FS: os.Rename(outPath.tmp -> outPath)
        FS-->>Router: rename success
        Router->>HAProxy: trigger rate-limited reload
    end
    Router-->>Plugin: return initialized plugin (or error)
    Plugin->>HAProxy: render config using .DefaultCertificate
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 8 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Structure And Quality ❓ Inconclusive PR summary lists only 4 files changed (plugin.go, router.go, ssl.go, haproxy-config.template) with no test files mentioned. Clarify whether test files were modified in this PR. If test code was added/modified, provide those files for Ginkgo quality assessment per the custom check criteria.
✅ Passed checks (8 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: generating a self-signed certificate as a fallback when unable to read the default certificate, which is the core objective of this PR.
Stable And Deterministic Test Names ✅ Passed No Ginkgo test files found in PR. The modified files are production code and configuration, not tests.
Microshift Test Compatibility ✅ Passed This PR modifies router template and configuration files without adding any Ginkgo e2e tests.
Single Node Openshift (Sno) Test Compatibility ✅ Passed This PR does not add any new Ginkgo e2e tests; modified files are router component code without test markers.
Topology-Aware Scheduling Compatibility ✅ Passed PR contains only internal router component changes with no deployment manifests, operator code, or scheduling constraints.
Ote Binary Stdout Contract ✅ Passed The modified code does not violate the OTE Binary Stdout Contract. All logging uses the custom logger based on klogr.New() that writes to stderr by default, with no direct stdout writes detected.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed This PR does not introduce any new Ginkgo e2e tests; changes are limited to OpenShift Router implementation code without test patterns.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 13, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

Let me fetch the current state of the files to verify all fixes.


🏁 Script executed:

#!/bin/bash
echo "=== CURRENT ssl.go ==="
gh api "repos/openshift/router/contents/pkg/router/template/ssl.go?ref=refs/pull/740/head" \
  | jq -r '.content' | base64 -d

echo ""
echo "=== CURRENT router.go (writeDefaultCert region, lines 400-490) ==="
gh api "repos/openshift/router/contents/pkg/router/template/router.go?ref=refs/pull/740/head" \
  | jq -r '.content' | base64 -d | grep -n "" | sed -n '400,490p'

Length of output: 5541


@jcmoraisjr — confirmed, all issues are resolved. Here's the verification summary:

# Finding Status
🔴 Critical Serial number 2^63 XOR → now big.NewInt(1<<63-1) (max int64, 9223372036854775807) — valid Go, no XOR ✅ Fixed
🔴 Critical reloadFn deleting cert in self-signed case → the self-signed branch now returns early at line 422 before reloadFn is ever registered; the reloadFn is only set up when r.defaultCertificateDir != "", so secretToPem is always called with a non-empty dir ✅ Fixed
🟠 Major Private key file permissions 06440600 ✅ Fixed
🟡 Moderate Missing KeyUsage / ExtKeyUsage / BasicConstraintsValid → all three are now set in the certificate template ✅ Fixed
🟡 Minor Typo generateSelfSignCertgenerateSelfSignedCert ✅ Fixed

The simplified serial bound 1<<63-1 is a perfectly reasonable choice — it gives a serial range of [0, 2^63−2], which is far more than sufficient for a self-signed fallback cert. LGTM from my side. 🚀

@openshift-ci-robot
Copy link
Copy Markdown
Contributor

@jcmoraisjr: This pull request references Jira Issue OCPBUGS-77263, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.22.0) matches configured target version for branch (4.22.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @melvinjoseph86

Details

In response to this:

Ingress operator manages and exposes a default certificate for Router. A default certificate is used if the route does not provide one, or a request is made for a non matching route.

Current approach uses a hardcoded certificate in the router image, which should lead to image scanners reporting the storage of its private key.

This implementation removes the hardcoded certificate, as well as the template code that ensures a default certificate is provided. In the Go code side, we are ensuring that the default certificate is always provided by generating one on the fly. If this last resort option fails due to permission, the whole router process fail to start.

Summary by CodeRabbit

  • New Features

  • Added automatic self-signed certificate generation when no default certificate is configured.

  • Implemented live certificate updates via file system monitoring for dynamic reloading.

  • Bug Fixes

  • Improved error handling for certificate initialization; failures are now reported instead of silently falling back to defaults.

  • Simplified certificate path resolution in TLS termination configuration.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/router/template/router.go (1)

432-443: ⚠️ Potential issue | 🟠 Major

Keep the last good PEM in place during watched updates.

Lines 436-438 remove outPath before the replacement is known-good. If secretToPem() fails, r.defaultCertificatePath still points to a file that no longer exists, so later config writes/reloads will reference a missing default certificate.

🩹 Proposed fix
 		// watching for changes
 		r.defaultCertificatePath = outPath
 		reloadFn := func() {
 			log.V(0).Info("updating default certificate", "path", outPath)
-			os.Remove(outPath)
-			if err := secretToPem(r.defaultCertificateDir, outPath); err != nil {
+			tmpPath := outPath + ".tmp"
+			if err := secretToPem(r.defaultCertificateDir, tmpPath); err != nil {
+				_ = os.Remove(tmpPath)
 				log.Error(err, "failed to update default certificate", "path", outPath)
 				return
 			}
+			if err := os.Rename(tmpPath, outPath); err != nil {
+				_ = os.Remove(tmpPath)
+				log.Error(err, "failed to replace default certificate", "path", outPath)
+				return
+			}
 			log.V(0).Info("reloading to get updated default certificate")
 			r.rateLimitedCommitFunction.RegisterChange()
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/router/template/router.go` around lines 432 - 443, The reloadFn currently
removes outPath before generating the new PEM, which can leave
r.defaultCertificatePath pointing at a missing file if secretToPem fails; change
reloadFn to write the new PEM to a temporary file (use r.defaultCertificateDir
and outPath to derive a temp name), call secretToPem to that temp file, and only
on success atomically replace/rename the temp to outPath (or update
r.defaultCertificatePath) and then call
r.rateLimitedCommitFunction.RegisterChange(); if secretToPem returns an error,
do not remove or modify the existing outPath and keep r.defaultCertificatePath
unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/router/template/router.go`:
- Around line 419-423: When falling back to the self-signed branch, populate
r.defaultCertificatePath with the path to the generated PEM before returning so
writeConfig()/templateData.DefaultCertificate can reference it; specifically,
after calling generateSelfSignedCert(outPath) (inside the branch that checks
r.defaultCertificateDir == "") set r.defaultCertificatePath = outPath (or the
actual path returned/used by generateSelfSignedCert) and then return the
error/result.

---

Outside diff comments:
In `@pkg/router/template/router.go`:
- Around line 432-443: The reloadFn currently removes outPath before generating
the new PEM, which can leave r.defaultCertificatePath pointing at a missing file
if secretToPem fails; change reloadFn to write the new PEM to a temporary file
(use r.defaultCertificateDir and outPath to derive a temp name), call
secretToPem to that temp file, and only on success atomically replace/rename the
temp to outPath (or update r.defaultCertificatePath) and then call
r.rateLimitedCommitFunction.RegisterChange(); if secretToPem returns an error,
do not remove or modify the existing outPath and keep r.defaultCertificatePath
unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: deade3fc-bdb7-4c51-8e3b-a4e8df080cfa

📥 Commits

Reviewing files that changed from the base of the PR and between 8963907 and 87548de.

⛔ Files ignored due to path filters (1)
  • images/router/haproxy/conf/default_pub_keys.pem is excluded by !**/*.pem
📒 Files selected for processing (4)
  • images/router/haproxy/conf/haproxy-config.template
  • pkg/router/template/plugin.go
  • pkg/router/template/router.go
  • pkg/router/template/ssl.go

Comment thread pkg/router/template/router.go
@jcmoraisjr jcmoraisjr force-pushed the OCPBUGS-77263-autogen-crt branch from 87548de to 5cc70f2 Compare April 14, 2026 16:16
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
pkg/router/template/router.go (2)

429-433: ⚠️ Potential issue | 🟠 Major

Tighten the default PEM permissions.

This branch now relies on secretToPem(), but secretToPem() still writes the combined cert+key PEM with mode 0444 at Line 336. That leaves the default private key world-readable inside the container, unlike the self-signed path which already uses 0600.

🔐 Proposed fix
-	return os.WriteFile(outName, pemBlock, 0444)
+	return os.WriteFile(outName, pemBlock, 0600)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/router/template/router.go` around lines 429 - 433, The code currently
calls secretToPem(...) to write the combined cert+key PEM for
defaultCertificateDir, but secretToPem writes the file with mode 0444 making the
private key world-readable; update secretToPem to write the combined PEM file
with mode 0600 (owner read/write only) instead of 0444 so the default private
key is not world-readable, and ensure any file creation/open logic in
secretToPem uses the tighter permission mode when creating the PEM.

436-445: ⚠️ Potential issue | 🔴 Critical

Don’t delete the live PEM before the replacement is ready.

Line 439 removes outPath before the new PEM is rendered. If the updated secret is malformed or unreadable, the current process keeps serving with the old cert in memory, but the file disappears and the next unrelated reload can fail because DefaultCertificate still points at a missing path.

🩹 Proposed fix
 	reloadFn := func() {
 		log.V(0).Info("updating default certificate", "path", outPath)
-		os.Remove(outPath)
-		if err := secretToPem(r.defaultCertificateDir, outPath); err != nil {
+		tmpPath := outPath + ".tmp"
+		if err := secretToPem(r.defaultCertificateDir, tmpPath); err != nil {
+			_ = os.Remove(tmpPath)
 			log.Error(err, "failed to update default certificate", "path", outPath)
 			return
 		}
+		if err := os.Rename(tmpPath, outPath); err != nil {
+			_ = os.Remove(tmpPath)
+			log.Error(err, "failed to replace default certificate", "path", outPath)
+			return
+		}
 		log.V(0).Info("reloading to get updated default certificate")
 		r.rateLimitedCommitFunction.RegisterChange()
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/router/template/router.go` around lines 436 - 445, The reloadFn currently
deletes outPath before rendering the new PEM which can leave DefaultCertificate
pointing at a missing file; instead, render the new PEM to a temporary file in
r.defaultCertificateDir (use secretToPem targetting a temp path), verify
success, then atomically replace the existing outPath with os.Rename (or
equivalent) so the file is only removed on success; keep the existing error
logging on secretToPem failure and only call
r.rateLimitedCommitFunction.RegisterChange() after the atomic rename completes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@pkg/router/template/router.go`:
- Around line 429-433: The code currently calls secretToPem(...) to write the
combined cert+key PEM for defaultCertificateDir, but secretToPem writes the file
with mode 0444 making the private key world-readable; update secretToPem to
write the combined PEM file with mode 0600 (owner read/write only) instead of
0444 so the default private key is not world-readable, and ensure any file
creation/open logic in secretToPem uses the tighter permission mode when
creating the PEM.
- Around line 436-445: The reloadFn currently deletes outPath before rendering
the new PEM which can leave DefaultCertificate pointing at a missing file;
instead, render the new PEM to a temporary file in r.defaultCertificateDir (use
secretToPem targetting a temp path), verify success, then atomically replace the
existing outPath with os.Rename (or equivalent) so the file is only removed on
success; keep the existing error logging on secretToPem failure and only call
r.rateLimitedCommitFunction.RegisterChange() after the atomic rename completes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 5e7ac4c6-4d67-4329-8752-366aec0e5a22

📥 Commits

Reviewing files that changed from the base of the PR and between 87548de and 5cc70f2.

⛔ Files ignored due to path filters (1)
  • images/router/haproxy/conf/default_pub_keys.pem is excluded by !**/*.pem
📒 Files selected for processing (4)
  • images/router/haproxy/conf/haproxy-config.template
  • pkg/router/template/plugin.go
  • pkg/router/template/router.go
  • pkg/router/template/ssl.go
✅ Files skipped from review due to trivial changes (1)
  • pkg/router/template/ssl.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • images/router/haproxy/conf/haproxy-config.template
  • pkg/router/template/plugin.go

Ingress operator manages and exposes a default certificate for Router.
A default certificate is used if the route does not provide one, or a
request is made for a non matching route.

Current approach uses a hardcoded certificate in the router image, which
should lead to image scanners reporting the storage of its private key.

This implementation removes the hardcoded certificate, as well as the
template code that ensures a default certificate is provided. In the Go
code side, we are ensuring that the default certificate is provided
somehow, either by serving the certificate provided by the operator,
or in the case of a failure, using a self-signed certificate created on
the fly.

https://issues.redhat.com/browse/OCPBUGS-77263
@jcmoraisjr jcmoraisjr force-pushed the OCPBUGS-77263-autogen-crt branch from 5cc70f2 to 0500b49 Compare April 14, 2026 20:07
@openshift-ci-robot openshift-ci-robot added jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. and removed jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. labels Apr 14, 2026
@openshift-ci-robot
Copy link
Copy Markdown
Contributor

@jcmoraisjr: This pull request references Jira Issue OCPBUGS-77263, which is invalid:

  • expected the bug to target either version "5.0." or "openshift-5.0.", but it targets "4.22.0" instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

Details

In response to this:

Ingress operator manages and exposes a default certificate for Router. A default certificate is used if the route does not provide one, or a request is made for a non matching route.

Current approach uses a hardcoded certificate in the router image, which should lead to image scanners reporting the storage of its private key.

This implementation removes the hardcoded certificate, as well as the template code that ensures a default certificate is provided. In the Go code side, we are ensuring that the default certificate is always provided by generating one on the fly. If this last resort option fails due to permission, the whole router process fail to start.

Summary by CodeRabbit

  • New Features

  • Automatic self-signed certificate creation when no default is configured.

  • Live certificate reloads via filesystem watch with atomic replacement.

  • Bug Fixes

  • Fail-fast handling during certificate initialization (no silent fallback).

  • TLS termination now uses the configured certificate directly.

  • Default certificate and key files created with stricter permissions (0600).

  • Plugin construction now returns errors instead of producing invalid instances.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci bot commented Apr 14, 2026

@jcmoraisjr: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-fips 0500b49 link true /test e2e-aws-fips
ci/prow/e2e-agnostic 0500b49 link true /test e2e-agnostic

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants