Add azure support for storing MLRun artifacts #281
Add azure support for storing MLRun artifacts #281GiladShapira94 merged 17 commits intomlrun:developmentfrom
Conversation
…upport # Conflicts: # charts/mlrun-ce/Chart.yaml # charts/mlrun-ce/templates/pipelines/deployments/ml-pipeline.yaml # charts/mlrun-ce/values.yaml
| {{- $azure_container := .Values.storage.azure.containerName | default "" }} | ||
| {{- $is_azure := eq .Values.storage.mode "azure-blob" }} |
There was a problem hiding this comment.
Id move it to template file (_helpers.tpl and set it once. then, create a "schema" variable of "s3://" or "az://" according to the storage mode. that way, you wont need to do many if/else but rather put the schema template as a prefix and variables would be the same
| @@ -1,5 +1,7 @@ | |||
| {{ if .Values.mlrun.enabled}} | |||
| {{- $bucket_name := .Values.global.infrastructure.aws.bucketName | default "mlrun" }} | |||
| {{- $azure_container := .Values.storage.azure.containerName | default "" }} | |||
There was a problem hiding this comment.
The default here is empty string, but in jupyter-env-configmap.yaml the same variable defaults to "mlrun". I think it should be consistent and probably default "mlrun" in both places. This would also be resolved by the _helpers.tpl refactor Liran suggested.
There was a problem hiding this comment.
I thought that for Azure Blob that we don't want to use default values as this service is managed by the user, in difference to the SeaweedFS that manage by us
| @@ -154,21 +154,21 @@ S3 Service Port - returns the port for pipeline config | |||
| S3 Access Key - uses top-level s3.accessKey for all components (MLRun, Jupyter, Pipelines) | |||
There was a problem hiding this comment.
the values now live under storage.s3.*., adjust comment
There was a problem hiding this comment.
This file is deleted but no replacement is created. Both values.yaml references (name: storage-credentials) and the secret_name=storage-credentials auto-mount param point to this secret, how will this work? am I missing something?
There was a problem hiding this comment.
It is a change in MLRun 1.11 that mount secrets as an ENV to the running pods, it the same as it was before but more generic approach
| MLRUN_CE__MODE: {{ .Values.jupyterNotebook.ce.mode | default "full" }} | ||
| MLRUN_CE__VERSION: {{ .Chart.Version }} | ||
| MLRUN_FUNCTION__SPEC__SERVICE_ACCOUNT__DEFAULT: {{ .Values.mlrun.api.functionSpecServiceAccountDefault | default "" | quote }} | ||
| MLRUN_FEATURE_STORE__DATA_PREFIXES__DEFAULT: s3:///{{ $bucket_name }}/projects/{project}/FeatureStore/{name}/{kind} |
There was a problem hiding this comment.
MLRUN_FEATURE_STORE__DATA_PREFIXES__DEFAULT was removed here but not replaced. In mlrun-env-configmap.yaml it's set for both modes (S3 and Azure) inside the if/else block. Since Jupyter uses a separate configmap (jupyter-common-env), it won't inherit that value - I guess it will fall back to MLRun's built-in default which points to v3io://. Should this be added back here for both modes, same as in the MLRun configmap or is it works differently?
There was a problem hiding this comment.
it is ok, as this values return from MLRun server so it's better to have one source of truth
This PR fixes: https://iguazio.atlassian.net/browse/CEML-672
Summary
This PR adds Azure Blob Storage as an artifact storage in MLRun CE, alongside the existing S3/SeaweedFS backend.
Based on the storage mode (S3/azure) and the use values, the storage secrets get update and MLRun artifact storage configuration gets updates for example: the artifact_path, and credentials to store the files in SeaweedFS or Azure.
Main changes
Other changes
Important Note - In both modes KFP pipelines writes artifacts to SeaweedFS