Conversation
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Code Review
This pull request introduces balanced feature subsampling and refactors the preprocessing pipeline to be aware of feature budgets. Key changes include the addition of a num_added_features method across preprocessing steps to accurately calculate post-transformation feature counts and the migration of pipeline creation to the TabPFNEnsemblePreprocessor initialization. Feedback focuses on performance bottlenecks, specifically the O(N^2) complexity of feature budget calculations, memory inefficiencies caused by data slicing in the main process, and redundant object copies or instantiations that could impact high-dimensional data processing and fine-tuning speed.
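To make the feature-budget idea concrete, here is a minimal sketch, not the code from this PR. It assumes each preprocessing step exposes a `num_added_features` method that takes the incoming feature count and returns how many columns the step appends; that signature, the `AddFingerprint` and `AddSquares` steps, and the `features_after` and `max_input_features` helpers are all hypothetical. It shows one way the post-transformation width could be computed in a single pass over the steps, and how a budget could be mapped back to an input width without rescanning every candidate.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


class Step(Protocol):
    """Hypothetical interface: a step reports how many columns it appends."""

    def num_added_features(self, n_features: int) -> int: ...


@dataclass
class AddFingerprint:
    """Hypothetical step that appends one extra column."""

    def num_added_features(self, n_features: int) -> int:
        return 1


@dataclass
class AddSquares:
    """Hypothetical step that appends one derived column per input column."""

    def num_added_features(self, n_features: int) -> int:
        return n_features


def features_after(steps: Sequence[Step], n_features: int) -> int:
    """Feature count after all steps, computed in one pass over the steps."""
    total = n_features
    for step in steps:
        total += step.num_added_features(total)
    return total


def max_input_features(steps: Sequence[Step], budget: int) -> int:
    """Largest input width whose post-pipeline width fits within budget.

    Assumes features_after is non-decreasing in the input width, so a
    binary search suffices. Returns 0 if no positive width fits.
    """
    lo, hi = 0, budget
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if features_after(steps, mid) <= budget:
            lo = mid
        else:
            hi = mid - 1
    return lo
```

With this shape, the subsampling logic would only need `max_input_features(steps, budget)` to decide how many raw columns each estimator may receive; whether the actual pipeline can use a monotonicity assumption like this depends on the real steps.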
…bs/TabPFN into ben/balanced-feature-subsampling
LeoGrin
left a comment
Looks great, thanks a lot!
Nit: maybe we could add an end-to-end test checking that our estimator works with balanced feature subsampling turned on and with more features than max_features per estimator?