Introduce under- and oversampling in RBatchGenerator #21079
martinfoell merged 3 commits into root-project:master
Test Results: 22 files, 22 suites, 3d 15h 4m 51s. Results for commit b58ea60.
vepadulano
left a comment
A few minor remarks. Once the comments are addressed, please squash the changes into the first commit 👍
fbd6205 to 87e015e
vepadulano
left a comment
The changes look good. Before merging, can you squash the last commit that addresses the PR reviews into the first?
This commit implements under- and oversampling strategies in RBatchGenerator, which make it possible to create a balanced dataset from a dataset consisting of a minority and a majority class when 'load_eager=True'. The under- and oversampling methods are implemented in the RSampler class, and the 'sampling_type' parameter can now take one of two values: '"undersampling"' or '"oversampling"'. The '"random"' option for 'sampling_type' is removed, and eager loading without sampling is the default when 'sampling_type' is not specified. The code that enables eager loading is moved from RSampler to the RDatasetLoader class, where the 'ConcatenateDatasets' function is introduced. In the Flat2DMatrixOperators class, the 'ShuffleTensors' function is changed so that it avoids the unnecessary copy when 'shuffle=False'. Lastly, the changes described above are incorporated into the RBatchGenerator class to enable under- and oversampling, and the new input parameters 'sampling_ratio' and 'replacement' are added to the Python bindings for loading into PyTorch, TensorFlow and NumPy tensors.
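For readers unfamiliar with the two strategies, here is a minimal NumPy sketch of random under- and oversampling. It is not the RSampler implementation. The function name 'resample' is hypothetical, and the sketch assumes 'sampling_ratio' means the target ratio of minority to majority entries and that 'replacement' only affects how majority entries are drawn when undersampling. The actual semantics in RBatchGenerator may differ.

```python
import numpy as np


def resample(majority, minority, sampling_type, sampling_ratio=1.0,
             replacement=False, seed=0):
    """Return a rebalanced (majority, minority) pair of arrays.

    Assumption: sampling_ratio is the target n_minority / n_majority
    after resampling. This is an illustration, not ROOT's RSampler.
    """
    rng = np.random.default_rng(seed)

    if sampling_type == "undersampling":
        # Keep every minority row and draw a subset of the majority rows.
        n_major = round(len(minority) / sampling_ratio)
        if not replacement and n_major > len(majority):
            raise ValueError("not enough majority rows without replacement")
        idx = rng.choice(len(majority), size=n_major, replace=replacement)
        return majority[idx], minority

    if sampling_type == "oversampling":
        # Keep every majority row and draw minority rows with replacement,
        # so that individual minority rows may appear more than once.
        n_minor = round(len(majority) * sampling_ratio)
        idx = rng.choice(len(minority), size=n_minor, replace=True)
        return majority, minority[idx]

    raise ValueError(f"unknown sampling_type: {sampling_type!r}")


# Toy dataset: 1000 majority rows and 100 minority rows, one column each.
majority = np.arange(1000).reshape(-1, 1)
minority = np.arange(100).reshape(-1, 1)

under_major, under_minor = resample(majority, minority, "undersampling",
                                    sampling_ratio=0.5)
over_major, over_minor = resample(majority, minority, "oversampling",
                                  sampling_ratio=0.5)
```

Under these assumptions, undersampling with a ratio of 0.5 keeps the 100 minority rows and 200 distinct majority rows, while oversampling with the same ratio keeps all 1000 majority rows and draws 500 minority rows with repetition.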
…he dataframes are correctly reset
87e015e to b58ea60
Thank you for the review and the comments! I have now rebased to three commits, with the review changes from the last commit squashed into the first.
Thank you!