Skip to content

GCP: Add scheduled refresh for storage credentials held by GCSFileIO#15696

Open
nastra wants to merge 1 commit intoapache:mainfrom
nastra:gcsfile-refresh-credentials
Open

GCP: Add scheduled refresh for storage credentials held by GCSFileIO#15696
nastra wants to merge 1 commit intoapache:mainfrom
nastra:gcsfile-refresh-credentials

Conversation

@nastra
Copy link
Contributor

@nastra nastra commented Mar 20, 2026

The GCSFileIO implementation never refreshes the credentials that are held directly from the table load. The GCS storage clients use a OAuth2RefreshCredentialsHandler that internally refreshes the storage client, but those updates are not reflected back to the FileIO.

If an GCSFileIO instance is serialized to remote workers, the credentials may be expired triggering a thundering herd of requests to refresh immediately.

This change addresses this problem by proactively updating the credentials in the FileIO so that only valid credentials are propagated to remote clients.

This applies the same changes as was already done for S3FileIO in #15678

@nastra nastra force-pushed the gcsfile-refresh-credentials branch from 9574bba to 09f1ad0 Compare March 20, 2026 08:59
@nastra nastra requested a review from danielcweeks March 20, 2026 09:01
Copy link
Contributor

@singhpk234 singhpk234 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes LGTM !

Have some optional suggestions, when seeing things from fresh set of eyes !

}

try (OAuth2RefreshCredentialsHandler handler =
OAuth2RefreshCredentialsHandler.create(properties)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

creating a OAuth2RefreshCredentialsHandler per refresh call would be expensive, it creates a http client and an AuthManager additionally, i wonder if we can just have this done one (invalidated / created again if props change) and just reuse that across refreshes ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel like this is too expensive. Typically, we're talking about possibly one invocation per hour (most queries will likely have none because the run within the timeout), so holding all of the http client thread pools and other resources is the more wasteful approach.

Copy link
Contributor

@singhpk234 singhpk234 Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typically, we're talking about possibly one invocation per hour (most queries will likely have none because the run within the timeout)

If we create the handler lazily, we can achieve the same.

holding all of the http client thread pools and other resources is the more wasteful approach

should we trim down the http client pool to be way less than ~100 in the first place ? we technically don't need the whole HTTP Client pool (~100) for OAuth2RefreshCredsHandler ? its gonna be mostly call to rest server by picking a connection from the pool .
My main suggestion was we are creating burst of connections / threads which we are discarding anyways after the credentials call completes, hence i though it was an expensive

.toList();

if (!refreshed.isEmpty() && !isResourceClosed.get()) {
this.storageCredentials = Lists.newArrayList(refreshed);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to schedule a new refresh at this point. Otherwise we never refresh after the first time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants