[1699] Allow anemoi-datasets config to be passed to the reader #1958
cristianlussana wants to merge 4 commits into ecmwf:develop
##
filename = stream_info.get("filenames")
if type(filename) is omegaconf.dictconfig.DictConfig:
    # Convert OmegaConf DictConfig to dict
We should test here immediately for filename['join'] and generate an error if it does not exist.
We should also generate an error here if the type is not anemoi.
Agreed. Then, for the moment, we implement only the possibility to join anemoi datasets and we leave other operations for later implementations.
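The checks requested in this thread could look roughly like the sketch below. It assumes the config has already been converted to a plain dict (for example with OmegaConf.to_container) and that the stream type is available as a string. The function name validate_join_config and the stream_type parameter are hypothetical and not part of the PR.

```python
from collections.abc import Mapping


def validate_join_config(filenames, stream_type):
    """Return the datasets to join, or raise if the config is unsupported.

    Hypothetical helper: name and signature are illustrative only.
    """
    if not isinstance(filenames, Mapping):
        raise TypeError(f"expected a mapping, got {type(filenames).__name__}")
    # Only anemoi streams may pass a config dict for now.
    if stream_type != "anemoi":
        raise ValueError(
            f"dict-style filenames require type 'anemoi', got {stream_type!r}"
        )
    # Only the 'join' operation is supported; fail fast if it is absent.
    if "join" not in filenames:
        raise KeyError("dict-style filenames must contain a 'join' key")
    datasets = list(filenames["join"])
    if not datasets:
        raise ValueError("'join' must list at least one dataset")
    return datasets
```

Raising early keeps unsupported operations from reaching the dataset reader, which matches the decision to implement only join for now.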
Note that at l169 we do for fname in stream_info["filenames"]: and this loop currently does nothing useful (fname is not used), but maybe I am not reading it correctly.
> Note that at l169 we do for fname in stream_info["filenames"]: this loop currently does nothing useful (fname is not used) but maybe I don't get it correctly
I just compared to develop and there the iteration over the elements in filenames does make sense:
else branch (i.e. filenames is not a dict) is equivalent to current develop.
Good point. I have now aligned the else branch with the current develop.
if not any(filename.exists() for filename in filenames): # see above
##
filename = stream_info.get("filenames")
if type(filename) is omegaconf.dictconfig.DictConfig:
The two branches of the if-statement should go to separate functions to not blow up the constructor here even further.
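One way to keep the constructor small is to dispatch to two helpers, one per branch. The sketch below is only an assumption about how that split might look: the helper names (_sources_from_list, _sources_from_config, resolve_sources) and the returned dict layout are hypothetical, and it works on plain Python containers (i.e. after any OmegaConf conversion) rather than on the actual sampler state.

```python
from collections.abc import Mapping


def _sources_from_list(filenames):
    # List case: one source per file name.
    return [{"kind": "file", "path": str(name)} for name in filenames]


def _sources_from_config(config):
    # Dict case: a single source built from the datasets listed under 'join'.
    if "join" not in config:
        raise KeyError("dict-style filenames must contain a 'join' key")
    return [{"kind": "join", "paths": [str(p) for p in config["join"]]}]


def resolve_sources(stream_info):
    """Dispatch on the shape of stream_info['filenames'] at a single level."""
    filenames = stream_info.get("filenames")
    if isinstance(filenames, Mapping):
        return _sources_from_config(filenames)
    if isinstance(filenames, (list, tuple)):
        return _sources_from_list(filenames)
    raise TypeError(f"unsupported 'filenames' value: {filenames!r}")
```

With this shape, the dict check and the list handling sit at the same level, and the constructor only needs a single call such as resolve_sources(stream_info).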
# list of sources for current stream
self.streams_datasets[stream_info["name"]] = []

for fname in stream_info["filenames"]:
I don't think the changes in l167 ff are correct because the list case is handled here and the check for dict should be at the same level.
I just opened #1960 to move 148-165 above 147 since it doesn't belong here. Sorry if this confused you.
Description
The Anemoi datasets reader only supports passing in a filename. In many cases it is more convenient to pass in a config dictionary. This allows multiple datasets to be joined and allows operations to be performed on the dataset.
Changes
multi_stream_data_sampler.py: Modified the way filename is defined
Issue Number
Closes #1699
Is this PR a draft? Mark it as draft.
Checklist before asking for review
./scripts/actions.sh lint
./scripts/actions.sh unit-test
./scripts/actions.sh integration-test
launch-slurm.py --time 60