A new inference environment is computed whenever the inference config is updated, because the inference config itself is used to generate the second short hash inside the `run_id` wildcard. As a result, a new unique `run_id` is generated whenever an inference config is updated, which in turn means a new inference environment is generated. This should not happen: there is no functional dependency between the two, and rebuilding environments is time consuming and hard on the distributed filesystem.
As a solution, we might consider splitting the `run_id` in two. Right now, a `run_id` looks like:
```
<model-identity(+hash)>-<config-hash>
├── requirements.txt
├── anemoi.json
├── venv.squashfs
├── 2020010100
└── ...
<model-identity(+hash)>-<another-config-hash>
├── requirements.txt
├── anemoi.json
├── venv.squashfs
├── 2020010100
└── ...
```
where `<model-identity(+hash)>` represents the part that uniquely identifies a model (optionally containing a hash, e.g. from an mlflow run id), and `<config-hash>` is a short hash derived from how the model is configured to run.
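The current scheme could be sketched as follows. This is a minimal illustration, not the actual implementation: the helper name, hash length, and example values are all assumptions.

```python
import hashlib
import json


def short_hash(obj, length=8):
    """Stable short hash of a JSON-serialisable object (illustrative helper)."""
    blob = json.dumps(obj, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()[:length]


# Hypothetical model identity and inference config.
model_identity = "my-model-a1b2c3d4"
inference_config = {"lead_time": 240, "extra_requirements": ["some-package"]}

# The whole config feeds the second short hash, so any config change
# yields a new run_id (and hence a new environment directory).
run_id = f"{model_identity}-{short_hash(inference_config)}"
```

Because the entire config is hashed, even a change that has no bearing on the environment (e.g. a different lead time) produces a fresh `run_id`.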
We could, instead, split the two:
```
<model-identity(+hash)>
├── requirements.txt
├── anemoi.json
├── venv.squashfs
├── <config-hash>
│   ├── 2020010100
│   └── ...
└── <another-config-hash>
    ├── 2020010100
    └── ...
```
This way, even if the inference configuration file is updated, the same environment is reused.
It's important to determine which parts of the config contribute to the identity hash and which to the configuration hash. For instance, `extra_requirements` will be used for the identity hash, since it changes the environment itself.
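The split could be sketched like this. The partition of keys is a loose assumption for illustration (only `extra_requirements` is named in the text; the other key names and the helper are hypothetical):

```python
import hashlib
import json

# Keys assumed to affect the environment, and therefore the identity hash.
# `extra_requirements` comes from the text; treat the set as illustrative.
IDENTITY_KEYS = {"extra_requirements"}


def short_hash(obj, length=8):
    """Stable short hash of a JSON-serialisable object (illustrative helper)."""
    blob = json.dumps(obj, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()[:length]


def split_hashes(config):
    """Return (identity_hash, config_hash) for an inference config."""
    identity_part = {k: v for k, v in config.items() if k in IDENTITY_KEYS}
    runtime_part = {k: v for k, v in config.items() if k not in IDENTITY_KEYS}
    return short_hash(identity_part), short_hash(runtime_part)


# Two configs that differ only in a runtime-level setting...
cfg_a = {"extra_requirements": ["some-package"], "lead_time": 240}
cfg_b = {"extra_requirements": ["some-package"], "lead_time": 120}

id_a, run_a = split_hashes(cfg_a)
id_b, run_b = split_hashes(cfg_b)

# ...share the identity hash (environment reused) but get distinct
# config hashes (separate <config-hash> subdirectories).
assert id_a == id_b
assert run_a != run_b
```

With this split, updating a runtime-only setting creates a new `<config-hash>` subdirectory under the same model directory, and the existing `venv.squashfs` is reused.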