[FLINK-26570][statefun] Remote module configuration interpolation#309
Conversation
3d3ff6f to
4f63228
Compare
tillrohrmann
left a comment
There was a problem hiding this comment.
Thanks for creating this PR @FilKarnicki. I like the flexibility it gives to the user. The main thing I am asking myself is whether it gives users too much power so that they risk shooting themselves into the foot. I don't have a perfect answer for it yet.
| 1. System properties | ||
| 2. Environment variables | ||
| 3. flink-conf.yaml entries with prefix 'statefun.module.global-config.' | ||
| 4. Command line args |
There was a problem hiding this comment.
I think it is more common to give env variables precedence over flink-conf.yaml values.
There was a problem hiding this comment.
I agree in principal. That said, because globalConfiguration is already an combination of args and flink-conf.yaml entries with the statefun.module.global-config. prefix, there's no easy way to put env variables in between them without affecting other parts of the system.
I made a start in my fork and the number of changes is pretty high for what we gain. Please have a look and let me know if I Should working on the change you mentioned in this comment
| kind: io.statefun.endpoints.v2/http | ||
| spec: | ||
| functions: com.example/* | ||
| urlPathTemplate: ${FUNC_PROTOCOL}://${FUNC_DNS}/{function.name} | ||
| --- | ||
| kind: io.statefun.kafka.v1/ingress | ||
| spec: | ||
| id: com.example/my-ingress | ||
| address: ${KAFKA_ADDRESS}:${KAFKA_PORT} | ||
| consumerGroupId: my-consumer-group | ||
| topics: | ||
| - topic: ${KAFKA_INGRESS_TOPIC} | ||
| (...) | ||
| properties: | ||
| - ssl.truststore.location: ${SSL_TRUSTSTORE_LOCATION} | ||
| - ssl.truststore.password: ${SSL_TRUSTSTORE_PASSWORD} | ||
| (...) |
There was a problem hiding this comment.
I really like the flexibility the templating mechanism gives to the user. The thing I am asking myself is whether it gives too much power so that the user can shoot himself into his foot.
The danger I see is that we add level of indirections that make it harder to reason about what the effective yaml will look like for a user. The underlying problem the issue wants to solve is to pass in information that is only available to the process that runs SF but not the user (e.g. secrets). I am wondering whether there is an alternative to achieve the same but with a bit less power (e.g. allowing substitution only for selected fields (also confusing)). I don't have a perfect answer here. I mainly wanted to hear your and @igalshilman's opinion here.
Maybe one idea could be to log the effective/resolved yaml somewhere so that the user sees what is actually run, if this does not happen already.
There was a problem hiding this comment.
I like the idea of logging the effective yaml somewhere. We should probably make it an opt-in kind of a deal, since we don't want to be automatically logging secrets. I'll hold off on coding this until we hear from @igalshilman
|
I'm beginning to think that we can achieve the same with a 'jar -uf' at deploy time. This means that this PR can be closed, do you agree @igalshilman / @tillrohrmann ? |
|
I still see value in this PR. When deploying into a managed environment such as Kinesis Data Analytics on AWS it is not possible to use something like 'jar -uf' to modify the jar at deploy time. We see value in being able to pass a hostname for API Gateway as an environment variable, for instance, to simplify multi environment deployment in an AWS CDK stack. |
What is the purpose of the change
The goal of this PR is to add the ability to define ${placeholders} in the module.yaml's
specobject, e.g.:These placeholders are then replaced using
where (4) override (3) which override (2) which override (1).
Main changes are
RemoteModuleresolves placeholders by traversing through thespecjson nodeComponentJsonObjecthas a new constructor that allows the use of resolvedspecjson node (note:rawObjectNodestill contains unresolved ${placeholders}, but does not seem to be used anywhere)Verifying this change
Dependencies (does it add or upgrade a dependency): no
The public API, i.e., is any changed class annotated with @public(Evolving): N/A
The serializers: no
The runtime per-record code paths (performance sensitive): no
Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: no
The S3 file system connector: no