Open
Conversation
bda4959 to
0cefb2c
Compare
0cefb2c to
d6dbaf4
Compare
dfulu
commented
Mar 2, 2026
|
|
||
| def convert_to_numpy_sample( | ||
| sample: dict[str, xr.DataArray | dict[str, xr.DataArray]], | ||
| datasets_dict: dict[str, xr.DataArray | dict[str, xr.DataArray]], |
Member
Author
There was a problem hiding this comment.
Renamed this to match the naming convention we've applied elsewhere
dfulu
commented
Mar 6, 2026
| _ = pickle.loads(pickle_bytes) # noqa: S301 | ||
|
|
||
|
|
||
| def test_pvnet_dataset_batch_size_2(pvnet_config_filename): |
Member
Author
There was a problem hiding this comment.
Since we removed the batch_to_tensor and copy_batch_to_device functions I've got rid of this
Member
Author
There was a problem hiding this comment.
Though I'm not really sure what the aim of this test was
dfulu
commented
Mar 6, 2026
|
|
||
| def test_pvnet_dataset(pvnet_config_filename): | ||
| dataset = PVNetDataset(pvnet_config_filename) | ||
| def _pvnet_dataset_sample_check(sample, config, batch_dim = None): |
Member
Author
There was a problem hiding this comment.
I did some refactoring of the tests here to remove code duplication using this new helper function
dfulu
commented
Mar 6, 2026
| @@ -3,8 +3,7 @@ | |||
| import logging | |||
Member
Author
There was a problem hiding this comment.
I did some updates and refactoring of this file but after discussion I think we'll delete this file in the next PR
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Pull Request
Description
This PR is motivated by wanting to use the torch
default_collatefunction and the in-built lightning functionality which can move batches to the required device and also manage their dtype (i.e. cast to float32 / float16 etc). Our samples are currently nested dictionaries and so we have to maintain code to be able to deal with this. I think this maintenance cost is too high considering we don't get much out of the nested sample structure.Changes:
stack_np_samples_into_batch(),batch_to_tensor(), andcopy_batch_to_device()functions which are no longer needed since we can get these for free from pytorch and lightningfill_nans_in_arrays()->fill_nans_in_dataset_dicts()so that NaNs are filled in the datasets dict rather than the sample. This seemed a cleaner solution after the batch had been flattened and we also don't waste time trying to fill NaNs in the coordinate arraysinclude_extra_metadata. This flag is accessed from thePVNetDatasetclasses rather than the configuration since it doesn't matter for training and running our models at inference.numpy_sample/convert.pyChecklist: