Introduce under- and oversampling in RBatchGenerator #21079

Merged
martinfoell merged 3 commits into root-project:master from martinfoell:rbatchgenerator-under-over-sampling on Feb 3, 2026

@martinfoell
Contributor

No description provided.

@github-actions
github-actions Bot commented Jan 30, 2026

Test Results

    22 files      22 suites   3d 15h 4m 51s ⏱️
 3 778 tests  3 778 ✅ 0 💤 0 ❌
75 141 runs  75 141 ✅ 0 💤 0 ❌

Results for commit b58ea60.

♻️ This comment has been updated with latest results.

Member

@vepadulano vepadulano left a comment

A few minor remarks. Once the comments are addressed, please squash the changes into the first commit 👍

Comment thread tmva/tmva/inc/TMVA/BatchGenerator/RSampler.hxx Outdated
Comment thread tmva/tmva/inc/TMVA/BatchGenerator/RSampler.hxx Outdated
Comment thread tmva/tmva/inc/TMVA/BatchGenerator/RSampler.hxx Outdated
@martinfoell martinfoell force-pushed the rbatchgenerator-under-over-sampling branch from fbd6205 to 87e015e Compare February 2, 2026 16:23
Member

@vepadulano vepadulano left a comment

The changes look good. Before merging, can you squash the last commit that addresses the PR reviews into the first?

In this commit, under- and oversampling strategies are implemented in RBatchGenerator to enable creating a balanced dataset from a dataset that consists of a minority and a majority class when 'load_eager=True'. The under- and oversampling methods are implemented in the RSampler class, and the 'sampling_type' parameter can now take one of two values: '"undersampling"' or '"oversampling"'. The '"random"' option for 'sampling_type' is removed, and eager loading without sampling is the default when 'sampling_type' is not specified. The code that enables eager loading is moved from RSampler to the RDatasetLoader class, where the 'ConcatenateDatasets' function is introduced. In the Flat2DMatrixOperators class, the 'ShuffleTensors' function is changed so that it avoids unnecessary copying when 'shuffle=False'. Lastly, the changes described above are incorporated into the RBatchGenerator class to enable under- and oversampling, and the new input parameters 'sampling_ratio' and 'replacement' are added to the Python bindings for loading into PyTorch, TensorFlow and NumPy tensors.
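
For readers unfamiliar with the two strategies, the following is a minimal standalone sketch in plain Python. It is not the RSampler implementation and does not call any ROOT API. The helper names `undersample` and `oversample` are hypothetical, and the meaning given to `sampling_ratio` (minority entries divided by majority entries after resampling) and to `replacement` (whether majority entries may be drawn more than once when undersampling) is an assumption made for illustration only.

```python
import random


def undersample(majority, minority, sampling_ratio=1.0, replacement=False, seed=0):
    """Shrink the majority class and keep every minority entry.

    Assumption: sampling_ratio = len(minority) / (number of majority entries kept).
    """
    rng = random.Random(seed)
    n_major = int(round(len(minority) / sampling_ratio))
    if replacement:
        # Entries may repeat, so any target size is reachable.
        picked = rng.choices(majority, k=n_major)
    else:
        if n_major > len(majority):
            raise ValueError("not enough majority entries to sample without replacement")
        picked = rng.sample(majority, n_major)
    return picked + list(minority)


def oversample(majority, minority, sampling_ratio=1.0, seed=0):
    """Grow the minority class by drawing with replacement and keep every majority entry.

    Assumption: sampling_ratio = (number of minority entries drawn) / len(majority).
    """
    rng = random.Random(seed)
    n_minor = int(round(len(majority) * sampling_ratio))
    picked = rng.choices(minority, k=n_minor)
    return list(majority) + picked


# Toy data: 100 majority entries and 10 minority entries.
majority = list(range(100))
minority = list(range(100, 110))

balanced_under = undersample(majority, minority)  # 10 majority + 10 minority
balanced_over = oversample(majority, minority)    # 100 majority + 100 minority
print(len(balanced_under), len(balanced_over))
```

In this sketch, undersampling discards majority entries and so yields a smaller dataset, while oversampling repeats minority entries and so yields a larger one. Which trade-off is preferable depends on how much data the training can afford to lose or duplicate.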
@martinfoell martinfoell force-pushed the rbatchgenerator-under-over-sampling branch from 87e015e to b58ea60 Compare February 3, 2026 10:20
@martinfoell
Contributor Author

The changes look good. Before merging, can you squash the last commit that addresses the PR reviews into the first?

Thank you for the review and the comments! I have now rebased to three commits, with the review changes from the last commit squashed into the first.

Member

@vepadulano vepadulano left a comment

Nice work! LGTM now.

@martinfoell
Contributor Author

Nice work! LGTM now.

Thank you!

@martinfoell martinfoell merged commit 406ec68 into root-project:master Feb 3, 2026
28 of 30 checks passed
@siliataider siliataider added the in:ML Everything under ROOT/ML label Feb 9, 2026