GCP-429: feat: Add GCP Workload Identity Federation (WIF) credential support#206
GCP-429: feat: Add GCP Workload Identity Federation (WIF) credential support#206apahim wants to merge 1 commit intoopenshift:mainfrom
Conversation
WalkthroughAdds GCP Workload Identity Federation credential detection and a consolidated credential loader that checks Changes
Sequence Diagram(s)sequenceDiagram
participant K8sSecret as "Kubernetes Secret (WIF/SA)"
participant EnvFile as "Env file (`GOOGLE_APPLICATION_CREDENTIALS`)"
participant GCPProvider as "GCP Cloud Provider"
GCPProvider->>K8sSecret: check for `workload_identity_config.json`
alt WIF present
K8sSecret-->>GCPProvider: return WIF credentials
else not present
GCPProvider->>K8sSecret: check for `service_account.json`
alt SA present
K8sSecret-->>GCPProvider: return SA credentials
else not present
GCPProvider->>EnvFile: check env var file path
alt env file exists
EnvFile-->>GCPProvider: return credentials bytes
else file missing
EnvFile-->>GCPProvider: return error
end
end
end
GCPProvider->>GCPProvider: ensureUniverseDomain(credentialsJSON)
GCPProvider->>GCPProvider: initialize Google client with credentials
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
@apahim: This pull request references GCP-429 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@apahim: This pull request references GCP-429 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/cloudprovider/gcp.go`:
- Around line 95-109: The code currently treats any error from
g.readSecretData("workload_identity_config.json") and
g.readSecretData("service_account.json") as a benign miss and falls through to
the next credential source; change this so readSecretData errors that are not a
"not found" condition cause an immediate return of that error (fail fast)
instead of trying the next source. In practice update the handling around
wifConfig and saConfig (calls to g.readSecretData) to check the error type/value
and only continue when the error indicates the secret is missing; for any other
error return nil, err (and log it) so functions like readSecretData and the
surrounding logic in pkg/cloudprovider/gcp.go do not silently fall back on
permission or I/O failures.
- Around line 72-83: The code in ensureUniverseDomain unmarshals credentialsJSON
into var jsonMap map[string]interface{} which will be nil if the JSON is the
literal null, causing a panic when assigning jsonMap["universe_domain"]; fix by
checking for a nil map after json.Unmarshal and initializing it (e.g., jsonMap =
make(map[string]interface{})) before setting
jsonMap["universe_domain"]=defaultUniverseDomain, then proceed to json.Marshal
and return; reference symbols: ensureUniverseDomain, jsonMap,
defaultUniverseDomain.
In `@README.md`:
- Around line 104-113: The fenced code block containing the Kubernetes Secret
example is missing a language tag which triggers MD040; update the opening fence
from ``` to ```yaml so the block beginning with "apiVersion: v1" and the Secret
example (workload_identity_config.json, kind: Secret, metadata:name/namespace,
type: Opaque) is annotated as YAML.
ℹ️ Review info
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to data retention organization setting
📒 Files selected for processing (3)
README.mdpkg/cloudprovider/gcp.gopkg/cloudprovider/gcp_test.go
CI StatusAll checks relevant to this PR have passed: unit tests, lint, images, and verify-deps. The e2e tests are still running. The
None of these are introduced by the changes in this PR. This scan would fail on |
|
@apahim: This pull request references GCP-429 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/retest-required |
|
/lgtm |
There was a problem hiding this comment.
Changes look good, I only added a few minor comments.
A couple of more things:
- you can squash the second commit onto the first one: no need to keep the changes requested by coderabbit separate from the main changes
- Has this been tested end-to-end already?
- Are you planning to have any end-to-end tests, perhaps in openshift/origin, to test all three authentication methods as well as perhaps some corner cases, like what happens when a certificate rotates?
| t.Fatalf("expected env var config, got: %s", string(data)) | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
I would also add a test where you have service_account.json and theGOOGLE_APPLICATION_CREDENTIALS env var, in order to show which one gets picked.
EDIT: Actually, looking at openshift-online/gcp-hcp#7, in the case of HCP I see that GOOGLE_APPLICATION_CREDENTIALS contains the WIF credentials and there's no fallback. Right?
There was a problem hiding this comment.
Right, for HCP, we always use the GOOGLE_APLICATION_CREDENTIALS.
End-to-end flow:
- NEW - HO PR #7824: HyperShift Operator (ReconcileCredentials in hostedcluster controller)
- Creates secret "cloud-network-config-controller-creds" containing: application_default_credentials.json (WIF external_account JSON)
- NEW - HO PR #7824: HyperShift CPO (control-plane-operator, CNO deployment component)
- Sets GCP_CNCC_CREDENTIALS_FILE=application_default_credentials.json as env var on the CNO deployment in the control plane namespace
- NEW - CNO PR #2915: CNO (cluster-network-operator, renderCloudNetworkConfigController)
- Reads GCP_CNCC_CREDENTIALS_FILE
- Renders managed/controller.yaml template with GOOGLE_APPLICATION_CREDENTIALS=/etc/secret/cloudprovider/application_default_credentials.json on the CNCC container
- EXISTING - already in managed/controller.yaml template: kubelet (on management cluster node)
- Mounts the secret as a volume at /etc/secret/cloudprovider (configured in the template as volume "cloud-provider-secret")
- EXISTING - cloud-token minter
- Sidecar container in CNCC pod
- Mints a projected SA token and writes it to /var/run/secrets/openshift/serviceaccount/token
- NEW - CNCC PR GCP-429: feat: Add GCP Workload Identity Federation (WIF) credential support #206: CNCC (cloud-network-config-controller, readGCPCredentialsConfig)
- Reads GOOGLE_APPLICATION_CREDENTIALS
- Loads WIF config
- WIF config's credential_source.file references the minted token
- EXISTING - CNCC GCP client exchanges token via STS for access credentials
There was a problem hiding this comment.
OK, so let's add to gcp.go a line explaining that through the GOOGLE_APPLICATION_CREDENTIALS env var we load the WIF config. :)
|
@apahim: This pull request references GCP-429 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Add auto-detection of WIF credentials so CNCC can authenticate without long-lived service account keys. The credential priority chain is: 1) workload_identity_config.json from secret (WIF) 2) service_account.json from secret (existing behavior) 3) GOOGLE_APPLICATION_CREDENTIALS env var (for HCP deployments) The auth mechanism (WithCredentialsJSON) is preserved since it already handles both service_account and external_account credential types. Extract ensureUniverseDomain() helper for independent testability, rewrite universe domain tests to call real code instead of replicating logic inline, add invalid JSON test, and use %w for error wrapping in initialization code to match Azure/AWS patterns. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Done
Not yet, working on it, I will add the verified label as soon as I have it tested.
I was planning to have this only tested on the GCP HCP e2e, but I will look into the openshift/origin repository to add the tests for the auth methods. What type of coordination we need there? We first merge this and then have the tests added there? |
|
@apahim: This pull request references GCP-429 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: apahim, cristianoveiga The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
♻️ Duplicate comments (1)
pkg/cloudprovider/gcp.go (1)
101-114:⚠️ Potential issue | 🟠 MajorDo not silently downgrade all secret-read errors to “not found”.
At Line [101] and Line [109], any
readSecretDatafailure falls through; logs at Line [106] and Line [114] also label all errors as “not found”. This can mask real read failures and cause unintended fallback to lower-priority credentials.#!/bin/bash # Verify how readSecretData wraps and surfaces file-read errors. # Expected: confirm whether os.ErrNotExist is preserved (%w) or flattened (%v), # so callers can reliably distinguish "missing file" vs other failures. rg -n -C4 'func \(.*\) readSecretData\(' pkg/cloudprovider rg -n -C4 'unable to read secret data' pkg/cloudprovider rg -n -C3 'readSecretData\(' pkg/cloudprovider/gcp.go🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/cloudprovider/gcp.go` around lines 101 - 114, The current logic in gcp.go treats any error from g.readSecretData as "not found" and falls back silently; change it so that after calling g.readSecretData for wifCredentialsFile and serviceAccountFile you inspect the returned error and only treat os.ErrNotExist (or errors.Is(err, fs.ErrNotExist)) as "not found" and continue to the next priority, but for any other error return that error immediately (or log and return) instead of falling back; update the log calls around g.readSecretData so they include the actual error details (use %w or errors.Is for wrapping/inspection) and reference wifCredentialsFile, serviceAccountFile and the readSecretData function when locating the code to modify.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@pkg/cloudprovider/gcp.go`:
- Around line 101-114: The current logic in gcp.go treats any error from
g.readSecretData as "not found" and falls back silently; change it so that after
calling g.readSecretData for wifCredentialsFile and serviceAccountFile you
inspect the returned error and only treat os.ErrNotExist (or errors.Is(err,
fs.ErrNotExist)) as "not found" and continue to the next priority, but for any
other error return that error immediately (or log and return) instead of falling
back; update the log calls around g.readSecretData so they include the actual
error details (use %w or errors.Is for wrapping/inspection) and reference
wifCredentialsFile, serviceAccountFile and the readSecretData function when
locating the code to modify.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e310fb4f-b3c4-4d1a-ac1c-784d6d141973
📒 Files selected for processing (3)
README.mdpkg/cloudprovider/gcp.gopkg/cloudprovider/gcp_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/cloudprovider/gcp_test.go
| t.Fatalf("expected env var config, got: %s", string(data)) | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
OK, so let's add to gcp.go a line explaining that through the GOOGLE_APPLICATION_CREDENTIALS env var we load the WIF config. :)
| // Priority: | ||
| // 1. WIF config from secret | ||
| // 2. service account JSON from secret (existing behavior) | ||
| // 3. GOOGLE_APPLICATION_CREDENTIALS env var (for HCP) |
There was a problem hiding this comment.
Can we add here that GOOGLE_APPLICATION_CREDENTIALS is also used for WIF? Otherwise I'm afraid it's not super clear for somebody who only looks at this repo. Thanks!
Ok. Should we involve QE from your team for the /verified label? That's how we do it in our team for our PRs, but I don't know if it's an org-wide rule.
Uhm will your GCP HCP e2e tests exercise also these CNCC changes or just the HCP changes? |
For instance: openshift/origin#30560 (comment) |
|
@apahim: This PR was included in a payload test run from openshift/origin#30832
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/67ccb430-1813-11f1-9f44-422dfd494bcd-0 |
|
Thanks, adding tests here: |
We don't have QE, we usually (in hypershift) verify it ourselves. Is that alright?
Good point, adding here: openshift/origin#30832 |
|
/retest-required |
|
@apahim: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Sure!
Great! The new test is failing somewhere :) We can continue the discussion there |
Add auto-detection of WIF credentials so CNCC can authenticate without long-lived service account keys. The credential priority chain is:
The auth mechanism (WithCredentialsJSON) is preserved since it already handles both service_account and external_account credential types.
Additions:
Summary by CodeRabbit
New Features
Documentation
Tests