
Refactor Multimodal #15

Merged
andrewscouten merged 58 commits into main from refactor/multiomnics
Mar 15, 2026

andrewscouten (Collaborator) commented Feb 26, 2026

Refactors multimodal code to work with:

  • uv dependency management
  • git submodule located in "submodules/biomed-multi-omic" (vs. locally copied code with no git history)

Closes #5, #13

andrewscouten self-assigned this Feb 27, 2026
andrewscouten (Collaborator, Author) commented:

It looks like some source files weren't committed. Several files reference a src/multimodal/src/data module, but src/multimodal/.gitignore contains the pattern /src/data/*. These files were most likely ignored unintentionally, and as a result the code cannot run as-is.

@seohyun408, @seungjindes, @y00628, if any of you still have the source code, I'd appreciate it if you pushed it. If not, I can try to reverse engineer it from the existing code.

seungjindes (Collaborator) commented Feb 27, 2026 via email

seungjindes (Collaborator) commented Feb 28, 2026 via email

seungjindes (Collaborator) commented Feb 28, 2026 via email

andrewscouten (Collaborator, Author) commented Feb 28, 2026

@seungjindes I am not the team's writer, so I would ask one of those two, if that has not changed since. I think this is okay, but I would confirm first.

…dules

Add Multimodal's README.md so that the refactor can reference it more easily.
…rs while keeping original functionality. Still needs more testing.
- Added a new `encoders.py` file to implement an encoder registration system.
- Implemented `register_encoder`, `get_encoder`, and `get_all_encoders` functions for managing encoders.
- Updated imports in `train_nvflare.py` to reflect new encoder structure.
- Refactored `trainer.py` to utilize PyTorch Lightning for training orchestration.
- Removed legacy configuration management code from `config.py`.
- Added tests for configuration loading and validation in `test_config.py`.
- Created example configuration files for testing purposes.
- Added YAML error handling in XenaCohortBuilder to raise ValueError for invalid configurations.
- Filtered empty cohort names in download script to prevent processing errors.
- Initialized _full_dataset in ImageDataModule and ClinicalDataModule to improve data handling.
- Updated PillowLoader to provide more informative error messages for image loading failures.
- Improved dataset validation in MultimodalDataModule to ensure only valid labels are processed.
- Enhanced encoder classes to conditionally freeze models based on configuration settings.
- Enhanced OncoTrainer to support hyperparameter optimization (HPO) with Optuna, including a new method that runs HPO and applies the best parameters to the training configuration.
- Updated L1 regularization calculation in BaseOncoClassifier to only include parameters that require gradients.
- Changed the registration string for GatedLateFusionConfig to include the full module path.
- Adjusted gradient clipping value handling in OncoTrainer to ensure it is only applied when greater than zero.
- Updated dependency management in `uv.lock` to include new packages: alembic, colorlog, greenlet, mako, optuna, and sqlalchemy, along with their respective versions and dependencies.
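For readers unfamiliar with the pattern behind the `register_encoder` / `get_encoder` / `get_all_encoders` commit, here is a minimal sketch of a name-based encoder registry. It is an illustration under assumptions, not the contents of `encoders.py`: the decorator style, the signatures, the `freeze` flag, and the `ImageEncoder` / `ClinicalEncoder` classes are all hypothetical.

```python
from typing import Callable, Dict, TypeVar

# Hypothetical module-level registry mapping a string key to an encoder class.
_ENCODERS: Dict[str, type] = {}

T = TypeVar("T", bound=type)


def register_encoder(name: str) -> Callable[[T], T]:
    """Class decorator that records an encoder class under `name`."""
    def decorator(cls: T) -> T:
        if name in _ENCODERS:
            raise ValueError(f"Encoder '{name}' is already registered")
        _ENCODERS[name] = cls
        return cls  # return the class unchanged so it stays usable directly
    return decorator


def get_encoder(name: str) -> type:
    """Look up a registered encoder class, listing valid keys on failure."""
    try:
        return _ENCODERS[name]
    except KeyError:
        available = ", ".join(sorted(_ENCODERS)) or "<none>"
        raise KeyError(f"Unknown encoder '{name}'. Available: {available}") from None


def get_all_encoders() -> Dict[str, type]:
    """Return a copy so callers cannot mutate the registry by accident."""
    return dict(_ENCODERS)


# Hypothetical encoders; `freeze` stands in for config-driven freezing.
@register_encoder("image")
class ImageEncoder:
    def __init__(self, freeze: bool = False) -> None:
        self.freeze = freeze


@register_encoder("clinical")
class ClinicalEncoder:
    def __init__(self, freeze: bool = False) -> None:
        self.freeze = freeze


encoder = get_encoder("image")(freeze=True)
print(type(encoder).__name__, encoder.freeze)  # ImageEncoder True
print(sorted(get_all_encoders()))  # ['clinical', 'image']
```

A registry like this lets a config file select an encoder by string key without the training script importing every encoder class explicitly.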
…ic assay data; enhance API client with retry logic
…zer and loss parameter handling in YAML and code
andrewscouten linked an issue Mar 11, 2026 that may be closed by this pull request
- Introduced unit tests for the pipeline executor in `test_pipeline_executor.py`, covering various scenarios including loading data, joining datasets, and handling errors.
- Added unit tests for pipeline nodes in `test_pipeline_nodes.py`, validating default behaviors and configurations for `DataSource`, `Load`, `Join`, `Sequence`, and modality classes.
- Refactored image and multimodal data modules to improve structure and consistency in `test_image_e2e.py`, `test_multimodal_e2e.py`, and `test_tabular_e2e.py`.
- Updated configuration tests in `test_config.py` to reflect changes in the pipeline-based schema and removed deprecated modality tests.
- Consolidated data module tests in `test_datamodules.py` to focus on the new `ImageDataModule` and removed legacy tests for `GeneDataModule` and `ClinicalDataModule`.
- Enhanced the dataset registry tests in `test_registry.py` to include dataset registration and retrieval functionalities.
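The node names covered by these tests (`DataSource`, `Load`, `Join`, `Sequence`) suggest pipeline steps that compose. The following is a hypothetical, in-memory sketch of how such nodes could fit together; every field, method, and dataset name here is an assumption, and the project's actual nodes presumably operate on real datasets rather than lists of dicts.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Mapping, Optional, Protocol

Row = Dict[str, object]
Table = List[Row]


@dataclass
class DataSource:
    """Hypothetical stand-in for a named dataset held in memory."""
    name: str
    rows: Table


class Step(Protocol):
    def run(self, sources: Mapping[str, DataSource], table: Table) -> Table: ...


@dataclass
class Load:
    source: str  # name of the DataSource to read

    def run(self, sources: Mapping[str, DataSource], table: Table) -> Table:
        # Replaces the incoming table with a copy of the source's rows.
        return [dict(row) for row in sources[self.source].rows]


@dataclass
class Join:
    source: str
    on: str  # key column shared by both tables

    def run(self, sources: Mapping[str, DataSource], table: Table) -> Table:
        # Inner join; assumes the key is unique in the right-hand source.
        right = {row[self.on]: row for row in sources[self.source].rows}
        return [{**left, **right[left[self.on]]}
                for left in table if left[self.on] in right]


@dataclass
class Sequence:
    steps: List[Step] = field(default_factory=list)

    def run(self, sources: Mapping[str, DataSource],
            table: Optional[Table] = None) -> Table:
        # Feeds each step's output into the next step.
        result: Table = [] if table is None else table
        for step in self.steps:
            result = step.run(sources, result)
        return result


sources = {
    "clinical": DataSource("clinical", [{"patient_id": "P1", "stage": "II"},
                                        {"patient_id": "P2", "stage": "III"}]),
    "expression": DataSource("expression", [{"patient_id": "P1", "BRCA1": 5.2}]),
}
pipeline = Sequence([Load("clinical"), Join("expression", on="patient_id")])
print(pipeline.run(sources))  # [{'patient_id': 'P1', 'stage': 'II', 'BRCA1': 5.2}]
```

Because each node only maps a table to a table, a whole pipeline can be described declaratively (for example in YAML) and rebuilt identically for each run.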
…e labels; refactor data modules and add Log2Normalization support
- Created train and test split files for fold 0 to fold 4 in the PAM50 and stage datasets.
- Implemented logging functionality to capture the KFold generation process, including patient counts and splits.
- Updated Docker Compose configuration to mount the configs directory for easier access within containers.
- Enhanced the kfold.py script to log output to a file while also displaying it in the console.
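As a rough illustration of patient-level k-fold splitting with output logged to both a file and the console, here is a standard-library-only sketch. It is not `kfold.py`: the function names, the seeded round-robin fold assignment, and the `fold{N}_train.txt` / `fold{N}_test.txt` file names are assumptions.

```python
import logging
import random
from pathlib import Path
from typing import Iterable, List, Tuple


def setup_logger(log_file: Path, name: str = "kfold_sketch") -> logging.Logger:
    """Return a logger that writes each record to the console and to log_file."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate output via the root logger
    for old in list(logger.handlers):  # make repeated calls idempotent
        old.close()
        logger.removeHandler(old)
    fmt = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    for handler in (logging.StreamHandler(), logging.FileHandler(log_file)):
        handler.setFormatter(fmt)
        logger.addHandler(handler)
    return logger


def patient_kfold(patient_ids: Iterable[str], n_splits: int = 5,
                  seed: int = 0) -> List[Tuple[List[str], List[str]]]:
    """Split unique patients into folds so no patient is in both train and test."""
    patients = sorted(set(patient_ids))  # deduplicate: one entry per patient
    if not 2 <= n_splits <= len(patients):
        raise ValueError("n_splits must be between 2 and the number of patients")
    random.Random(seed).shuffle(patients)  # seeded for reproducible folds
    folds = [patients[i::n_splits] for i in range(n_splits)]  # round-robin
    splits = []
    for i, test in enumerate(folds):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        splits.append((sorted(train), sorted(test)))
    return splits


def write_splits(splits: List[Tuple[List[str], List[str]]], out_dir: Path,
                 logger: logging.Logger) -> None:
    """Write one train and one test file per fold and log patient counts."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for k, (train, test) in enumerate(splits):
        (out_dir / f"fold{k}_train.txt").write_text("\n".join(train) + "\n")
        (out_dir / f"fold{k}_test.txt").write_text("\n".join(test) + "\n")
        logger.info("fold %d: %d train patients, %d test patients",
                    k, len(train), len(test))
```

Typical use would be `write_splits(patient_kfold(ids, n_splits=5), Path("splits"), setup_logger(Path("kfold.log")))`. Splitting on patient IDs rather than individual samples keeps every sample from one patient in a single fold, which prevents the same patient from leaking across train and test.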
andrewscouten (Collaborator, Author) commented Mar 15, 2026

Final verification still needs to be run to confirm that the code reproduces the metrics and methods described in the paper, but I will have to take an extended break for exams. Since our manuscript will be published soon, I want to push this out beforehand. The current code in this PR:

  • Trains models and runs inference.
  • Reconstructs the multimodal model's missing code, accurately to the best of my understanding.
  • Defines a config / DSL system for reproducibility.
  • Defines a CLI for downloading and preprocessing data.
  • Documents everything.

I am therefore merging this branch into main so that the project is in a more complete state when the manuscript is released.

andrewscouten marked this pull request as ready for review March 15, 2026 13:42
andrewscouten merged commit cd90062 into main Mar 15, 2026
andrewscouten deleted the refactor/multiomnics branch March 15, 2026 13:42
Development

Successfully merging this pull request may close these issues.

  • [Docs] Flesh out the Wiki more
  • [Refactor] Multi-Modal

2 participants