Search before asking
Description
The token delegation mechanism on the server side is quite "stiff".
It only allows delegating credentials to clients via an AK/SK pair.
Ideally, one would like to configure the server-side credentials to remote storage and the delegated credentials in two separate ways, especially for obtaining RW access on the server and RO access on the client side.
Furthermore, one may want to opt out of the token delegation mechanism and let clients obtain credentials on their own (see #1245).
Details below.
How it works now
The Fluss server authenticates to S3 and uses credentials for writing snapshots and log segments.
The Fluss server also serves “delegated” credentials to clients to access S3.
On the server side, the Fluss cluster relies on standard Hadoop FS + AWS SDK authentication, hence it can use any credentials provider supported by the AWS SDK and by Hadoop FS implementing:

```java
package com.amazonaws.auth;

public interface AWSCredentialsProvider {
    AWSCredentials getCredentials();
    void refresh();
}
```
This includes any provider and also the DefaultAWSCredentialsProviderChain.
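As an illustration of how such a chain resolves credentials, here is a minimal sketch with simplified stand-ins for the SDK types (these are not the real AWS classes):

```java
import java.util.Arrays;
import java.util.List;

// Simplified stand-ins for the AWS SDK types, for illustration only.
interface Credentials { String accessKeyId(); }

interface CredentialsProvider {
    Credentials getCredentials(); // throws RuntimeException if unavailable
}

// A chain tries each provider in order and returns the first that succeeds,
// mirroring how DefaultAWSCredentialsProviderChain resolves credentials
// (env vars, system properties, profile files, instance metadata, ...).
class ProviderChain implements CredentialsProvider {
    private final List<CredentialsProvider> providers;

    ProviderChain(CredentialsProvider... providers) {
        this.providers = Arrays.asList(providers);
    }

    @Override
    public Credentials getCredentials() {
        for (CredentialsProvider p : providers) {
            try {
                return p.getCredentials();
            } catch (RuntimeException e) {
                // this provider has no credentials; try the next one
            }
        }
        throw new RuntimeException("No provider in the chain yielded credentials");
    }
}

public class ChainDemo {
    public static void main(String[] args) {
        CredentialsProvider failing = () -> { throw new RuntimeException("not configured"); };
        CredentialsProvider fixed = () -> () -> "AKIDEXAMPLE";
        ProviderChain chain = new ProviderChain(failing, fixed);
        System.out.println(chain.getCredentials().accessKeyId()); // prints "AKIDEXAMPLE"
    }
}
```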
On the client side (credentials delegation), the Fluss cluster relies on the implementation of org.apache.fluss.fs.Filesystem::obtainSecurityToken, a method that is invoked via gRPC by the Fluss client.
The client also relies on the SecurityTokenReceiver registered for the filesystem scheme to consume the credentials provided by the server.
In the S3 case, the token provided by the server is hardcoded to an AK/SK pair (see S3DelegationTokenProvider.java), which is read from the configuration.
On the client side, the client configures the DynamicTemporaryAWSCredentialsProvider that just uses the credentials fetched by the SecurityTokenReceiver.
More precisely, when instantiating the FileSystem class for the corresponding scheme, the code tests for the presence of an s3.access-key-like configuration key to determine whether we are on the server side or on the client side:
- if the key is not set => we are on the client: prepend the DynamicTemporaryAWSCredentialsProvider (which takes credentials from the token receiver) to the providers chain;
- if the key is set => we are on the server: use the current configuration to determine how to authenticate to S3.
In any case, the client side always needs to have received some credentials through the SecurityTokenReceiver before filesystem instantiation, as per this line of code; otherwise an exception is thrown.
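The discrimination logic described above can be sketched roughly as follows (key handling simplified; class and key names are illustrative, not the actual Fluss sources):

```java
import java.util.Map;

// Sketch of the server/client discrimination described above.
// Class and key names are illustrative, not the actual Fluss sources.
public class SideDetectionSketch {

    static final String ACCESS_KEY = "s3a.access-key";
    static final String DYNAMIC_PROVIDER = "DynamicTemporaryAWSCredentialsProvider";

    /** Returns the credentials provider chain to configure for this process. */
    static String resolveProviderChain(Map<String, String> conf) {
        if (conf.containsKey(ACCESS_KEY)) {
            // Key present => assumed to be the server: use the configuration as-is.
            return conf.getOrDefault("fs.s3a.aws.credentials.provider", "default-chain");
        }
        // Key absent => assumed to be a client: prepend the dynamic provider
        // that reads the credentials fetched by the SecurityTokenReceiver.
        return DYNAMIC_PROVIDER + ","
                + conf.getOrDefault("fs.s3a.aws.credentials.provider", "default-chain");
    }

    public static void main(String[] args) {
        System.out.println(resolveProviderChain(Map.of(ACCESS_KEY, "AK"))); // server path
        System.out.println(resolveProviderChain(Map.of()));                 // client path
    }
}
```

This is exactly the coupling the flaws below complain about: the mere presence of the key decides the role of the process.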
Here is an example configuration for the Fluss server:
```yaml
# The dir to use as the remote storage of Fluss.
# The scheme makes the server select a filesystem implementation.
# The scheme also makes the client select a TokenReceiver.
remote.data.dir: s3a://<your-bucket>/path/to/remote/storage

# One can configure the server to use different providers, for example:
fs.s3a.aws.credentials.provider: org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider
fs.s3a.assumed.role.arn: ${FLUSS_CLUSTER_ASSUME_ROLE}
fs.s3a.assumed.role.credentials.provider: com.amazonaws.auth.EnvironmentVariableCredentialsProvider

# Discriminant to recognize this is the server.
# Used to delegate credentials to the client.
# Also used on the server side if no alternate credential provider is set,
# otherwise ignored.
s3a.access-key: <your-access-key>
s3a.secret-key: <your-secret-key>
s3a.region: <your-s3-region>
```
Clients can configure the filesystem plugin by prefixing configuration keys with client.fs.
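For example, a client could override the region like this (key name assumed from the prefixing rule above, mirroring the server-side s3a.region option):

```yaml
# Client-side filesystem options are prefixed with client.fs.
client.fs.s3a.region: <your-s3-region>
```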
Flaws
1. The client/server side is determined by the presence of specific configuration keys, which forces the server to set those keys even if they don't represent the authentication mechanism one wants to use on the server side.
2. The S3FileSystemPlugin, when it recognizes it is on the client, always checks for credentials obtained through the DynamicTemporaryAWSCredentialsProvider, hardwiring the client to that credentials provider and to the token delegation mechanism. In principle, one could defeat the mechanism by setting s3a.access-key to some dummy value, pretending to be a server, and thus opt out of token delegation.
3. The server cannot configure how to delegate credentials to the client, hence it is forced to provide credentials to the client as AK/SK:
   - one may want to use another mechanism;
   - one may want to separate the RW and RO paths and obtain credentials in one way while providing them to clients in a different way.
What does Alibaba OSS do differently?
The Alibaba OSS filesystem defines 2 clearly separated access paths for RW and RO (see oss.md).
The client will be provided with credentials obtained via fs.oss.sts.endpoint and fs.oss.roleArn, while the server will use this part of the configuration:
```yaml
# Option 1: Direct credentials
# Aliyun access key ID
fs.oss.accessKeyId: <your-access-key>
# Aliyun access key secret
fs.oss.accessKeySecret: <your-secret-key>

# Option 2: Secure credential provider
fs.oss.credentials.provider: <your-credentials-provider>
```
This allows the server to skip AK/SK when not needed (see fluss/fluss-filesystems/fluss-fs-oss/src/main/java/org/apache/fluss/fs/oss/OSSFileSystemPlugin.java at release-0.9 · apache/fluss).
This is a good approach; still, it hardwires two authentication methods, one for the client side and one for the server side.
Suggested changes
Providing distinct patterns for obtaining credentials on the client and server sides would make the server-side configuration clearer.
Likewise, rather than assuming the user will assume a role or favor an AK/SK pair, simply leveraging the powerful AWSCredentialsProvider mechanism, with its "chains" and "lists", would give maximum flexibility:
- assuming roles
- using profiles
- web-identity
- ak/sk
- etc.
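For instance, Hadoop S3A already accepts a comma-separated list of providers in fs.s3a.aws.credentials.provider, tried in order until one yields credentials:

```yaml
# Providers are tried in order until one yields credentials.
fs.s3a.aws.credentials.provider: com.amazonaws.auth.WebIdentityTokenCredentialsProvider,com.amazonaws.auth.profile.ProfileCredentialsProvider,org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider
```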
In this case, one may opt for obtaining RW credentials for internal use and only expose RO credentials to clients.
Example configuration:
```yaml
# The dir to use as the remote storage of Fluss.
# The scheme makes the server select a filesystem implementation.
# The scheme also makes the client select a TokenReceiver.
remote.data.dir: s3a://<your-bucket>/path/to/remote/storage

# One can configure the server to use different providers, for example:
fs.s3a.aws.credentials.provider: org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider
fs.s3a.assumed.role.arn: ${RW_ROLE}
fs.s3a.assumed.role.credentials.provider: com.amazonaws.auth.EnvironmentVariableCredentialsProvider

# Delegation settings:
fs.delegation.s3a.access-key: <your-access-key>
fs.delegation.s3a.secret-key: <your-secret-key>
fs.delegation.s3a.region: <your-s3-region>
# Or
fs.delegation.s3a.aws.credentials.provider: org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider
fs.delegation.s3a.assumed.role.arn: ${RO_ROLE}
fs.delegation.s3a.assumed.role.credentials.provider: com.amazonaws.auth.EnvironmentVariableCredentialsProvider
```
The implementation can instantiate the Hadoop filesystem as it currently does.
org.apache.fluss.fs.Filesystem::obtainSecurityToken should be re-implemented to accept credential providers and instantiate them by reading the delegation options, which should not be forwarded to the HadoopConfig of the filesystem used by the server.
If not present, the delegation options should default to the fs.s3a options of the internal configuration.
This preserves a sane default path: the client is provided with the server credentials (same method, same access rights).
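The fallback could look roughly like this (a sketch with hypothetical prefixes and helper names; key handling simplified):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of resolving delegation options with fallback to the server's
// fs.s3a options. Prefixes and helper names are illustrative.
public class DelegationOptionsSketch {

    static final String SERVER_PREFIX = "fs.s3a.";
    static final String DELEGATION_PREFIX = "fs.delegation.s3a.";

    /**
     * Builds the effective delegation config: fs.delegation.s3a.* wins,
     * otherwise fall back to the corresponding fs.s3a.* option.
     */
    static Map<String, String> resolveDelegationOptions(Map<String, String> conf) {
        Map<String, String> resolved = new HashMap<>();
        // Start from the server-side options as defaults.
        conf.forEach((k, v) -> {
            if (k.startsWith(SERVER_PREFIX)) {
                resolved.put(k.substring(SERVER_PREFIX.length()), v);
            }
        });
        // Delegation-specific options override the defaults.
        conf.forEach((k, v) -> {
            if (k.startsWith(DELEGATION_PREFIX)) {
                resolved.put(k.substring(DELEGATION_PREFIX.length()), v);
            }
        });
        return resolved;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("fs.s3a.region", "eu-west-1");
        conf.put("fs.s3a.assumed.role.arn", "rw-role");
        conf.put("fs.delegation.s3a.assumed.role.arn", "ro-role");
        // region falls back to eu-west-1; assumed.role.arn is overridden to ro-role
        System.out.println(resolveDelegationOptions(conf));
    }
}
```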
This implementation would also allow to disable token delegation.
A simple NoDelegationAWSCredentialsProvider can be implemented and provided in the distribution.
The user may set it for fs.delegation.s3a.aws.credentials.provider, ensuring that the server only delegates with empty credentials.
In turn, clients would need to be configured for access.
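Under this proposal, the opt-out provider could be as simple as the following sketch (built against a minimal local stand-in for the AWS provider interface, not the real SDK types):

```java
// Minimal stand-ins for com.amazonaws.auth.AWSCredentials(Provider),
// for illustration only.
interface AwsCredentials { String accessKeyId(); String secretKey(); }

interface AwsCredentialsProvider {
    AwsCredentials getCredentials();
    void refresh();
}

// Always yields empty credentials, so the server effectively delegates
// nothing and clients must obtain credentials on their own.
class NoDelegationCredentialsProvider implements AwsCredentialsProvider {
    @Override
    public AwsCredentials getCredentials() {
        return new AwsCredentials() {
            @Override public String accessKeyId() { return ""; }
            @Override public String secretKey() { return ""; }
        };
    }

    @Override
    public void refresh() { /* nothing to refresh */ }
}

public class NoDelegationDemo {
    public static void main(String[] args) {
        AwsCredentials c = new NoDelegationCredentialsProvider().getCredentials();
        System.out.println("delegated access key is empty: " + c.accessKeyId().isEmpty());
    }
}
```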
The client may leave the filesystem side unconfigured, leveraging the token delegation mechanism (if it is not active on the server side, a meaningful exception will be thrown).
If any fs.s3a option is set (delegation options would not be used by the client in any case), the client simply ignores the tokens received from the server (whether valid or not) and uses the configured credential provider, as the server would.
Willingness to contribute