refactor: updated pyproject toml dependencies, changed readme file name conventions, added new bash scripts #50
nazlicantrn wants to merge 1 commit into EPFLiGHT:experts_new from …
Conversation
Pull request overview
This PR aims to make the OpenAI Batch augmentation workflow fully configurable via env/CLI (no code edits), consolidate “experts” dependencies into the root pyproject.toml, and clean up documentation across the experts pipelines.
Changes:
- Added shell scripts to run the GPT augmentation pipeline for full runs and cost-estimation runs via environment variables.
- Refactored parts of the GPT augmentation batch builder to be env-driven and updated related docs.
- Added an `experts` optional-dependency group in `pyproject.toml` and updated the experts READMEs to install via `pip install -e ".[experts]"` (see the install example just below this list).
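For reference, this is what the documented install command looks like (a minimal sketch, presumably run from the repository root; the quotes around the extras spec are needed in some shells):

```bash
# Editable install of the package plus the new "experts" optional dependencies.
pip install -e ".[experts]"
```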
Reviewed changes
Copilot reviewed 10 out of 11 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/multimeditron/experts/gpt_augmentation_pipeline/submit_batches.py | Removes shebang (execution now assumed via python ...). |
| src/multimeditron/experts/gpt_augmentation_pipeline/scripts/run_estimate.sh | New env-driven script to run a small sample and estimate cost. |
| src/multimeditron/experts/gpt_augmentation_pipeline/scripts/run_all.sh | New env-driven script to run the full pipeline end-to-end. |
| src/multimeditron/experts/gpt_augmentation_pipeline/make_batches.py | Refactors entrypoint to use env-driven settings (currently broken due to missing helpers). |
| src/multimeditron/experts/gpt_augmentation_pipeline/README.md | Documents new scripts + env vars (has a script-name mismatch). |
| src/multimeditron/experts/evaluation_pipeline/readme.md | Updates install instructions and fixes filename references. |
| src/multimeditron/experts/evaluation_pipeline/hard_benchmark_skin_tone_stratified.py | Adds a new SCIN skin-tone stratified hard benchmark script. |
| src/multimeditron/experts/evaluation_pipeline/disease_classification_pipeline/readme.md | Updates install instructions to use the new experts extra. |
| src/multimeditron/experts/dataset_processing/skin_expert/process_isic.py | Removes paraphrasing flags and simplifies ISIC processing flow. |
| src/multimeditron/experts/dataset_processing/readme.md | Updates script names/examples (contains path + command typos/inconsistencies). |
| pyproject.toml | Introduces project.optional-dependencies.experts with pinned versions. |
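As a rough illustration of how the env-driven cost-estimation run from the table above might be invoked (a sketch only: TASK_TYPE, NB_SAMPLES, and BATCHES_DIR are the variables referenced in the review comments below, their values here are made up, and the OPENAI_API_KEY requirement is an assumption for any OpenAI Batch API call):

```bash
# Hypothetical invocation of scripts/run_estimate.sh (see lead-in for assumptions).
export OPENAI_API_KEY=sk-...      # assumed: required for the OpenAI Batch API
TASK_TYPE=skin NB_SAMPLES=100 BATCHES_DIR=batches ./scripts/run_estimate.sh
```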
src/multimeditron/experts/dataset_processing
├── skin/
│   ├── process_skin10.py
│   ├── process_dermnet.py
│   ├── process_fitzpatrick.py
│   └── process_isic4.py
│   └── process_isic.py
├── ophthalmology/
│   ├── process_rfmid2.py
│   ├── process_messidor2.py
The directory structure shown here uses skin/ and ophthalmology/, but the actual directories in this repo are skin_expert/ and ophthalmology_expert/ (e.g., src/multimeditron/experts/dataset_processing/skin_expert). Please update this tree so users can find the scripts at the documented paths.
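For illustration, the documented tree might instead look like this (a sketch that only swaps in the actual directory names; the file lists are copied from the quoted hunk, assuming process_isic.py replaces process_isic4.py as the diff suggests):

```
src/multimeditron/experts/dataset_processing
├── skin_expert/
│   ├── process_skin10.py
│   ├── process_dermnet.py
│   ├── process_fitzpatrick.py
│   └── process_isic.py
├── ophthalmology_expert/
│   ├── process_rfmid2.py
│   ├── process_messidor2.py
```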
**Usage:**
```bash
python split_jsonl_train_val.py \
python train_val_split.py.py \
```
The usage snippet has a typo: python train_val_split.py.py will fail. It should invoke the actual script filename once (e.g., python train_val_split.py ...).
Suggested change:
python train_val_split.py.py \
python train_val_split.py \
python skin/process_isic.py \
  --output_images_root data/isic4/images \
  --output_jsonl data/isic4/manifest.jsonl \
  --paraphrase \
  --output_jsonl data/isic/manifest.jsonl

python paraphrase_jsonl.py \
  --in_jsonl data/isic4/manifest.jsonl \
  --out_jsonl data/isic/manifest_paraphrased.jsonl \
  --seed 42
This example uses python skin/process_isic.py, but the script in this repo is skin_expert/process_isic.py (under src/multimeditron/experts/dataset_processing/skin_expert). Also the paths are inconsistent (--output_images_root data/isic4/... vs --output_jsonl data/isic/..., and the paraphrasing step reads data/isic4/manifest.jsonl even though the previous command writes data/isic/manifest.jsonl). Please align the example commands/paths with the actual script location and produced files. Note: process_isic.py no longer supports --paraphrase/--seed, so the docs should reflect that paraphrasing happens only via paraphrase_jsonl.py.
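A possible aligned version of the example, assuming the intended output root is data/isic and that paraphrasing now happens only via paraphrase_jsonl.py (flag names and the --seed value are taken from the quoted hunk; the consolidated paths are an assumption):

```bash
# Hypothetical aligned example; see lead-in for assumptions.
python skin_expert/process_isic.py \
  --output_images_root data/isic/images \
  --output_jsonl data/isic/manifest.jsonl

# Paraphrasing is now a separate, explicit step.
python paraphrase_jsonl.py \
  --in_jsonl data/isic/manifest.jsonl \
  --out_jsonl data/isic/manifest_paraphrased.jsonl \
  --seed 42
```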
def main() -> None:
    task = _get_env_task()
    nb_samples = _get_env_nb_samples()
    batches_dir = _get_env_batches_dir()
main() calls _get_env_task(), _get_env_nb_samples(), and _get_env_batches_dir(), but these helpers are not defined or imported anywhere in this module. As-is, python make_batches.py will raise NameError and the new run_all.sh / run_estimate.sh scripts will fail. Define these functions (with validation/parsing for TASK_TYPE, NB_SAMPLES, BATCHES_DIR) or switch back to using os.getenv(...)/config.BATCHES_DIR directly in main().
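One way the missing helpers could be filled in (a minimal sketch, not the author's implementation: the helper names match the calls in main(), but the defaults, the allowed task types, and the validation rules are assumptions):

```python
import os
from pathlib import Path

# Assumed set of valid task types; adjust to whatever the pipeline actually supports.
_ALLOWED_TASKS = {"skin", "ophthalmology"}


def _get_env_task() -> str:
    """Read and validate TASK_TYPE from the environment."""
    task = os.getenv("TASK_TYPE", "skin").strip().lower()
    if task not in _ALLOWED_TASKS:
        raise ValueError(f"TASK_TYPE must be one of {sorted(_ALLOWED_TASKS)}, got {task!r}")
    return task


def _get_env_nb_samples() -> int | None:
    """Read NB_SAMPLES; unset or empty means 'use the full dataset'."""
    raw = os.getenv("NB_SAMPLES")
    if not raw:
        return None
    try:
        return int(raw)
    except ValueError as exc:
        raise ValueError(f"NB_SAMPLES must be an integer, got {raw!r}") from exc


def _get_env_batches_dir() -> Path:
    """Read BATCHES_DIR, falling back to a local 'batches/' directory."""
    return Path(os.getenv("BATCHES_DIR", "batches"))
```

The alternative mentioned in the comment, reading os.getenv(...) and config.BATCHES_DIR directly inside main(), would avoid the helpers entirely.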
Use the full pipeline script to process the entire dataset:

```bash
python submit_batches.py
```

```bash
# Default runs full skin dataset
./scripts/run_full.sh

# Customize task type
TASK_TYPE=ophthalmology ./scripts/run_full.sh
```

* Uploads each batch file to OpenAI
* Submits Batch API jobs
* Saves returned batch IDs for later retrieval
The README instructs running ./scripts/run_full.sh, but the repo adds scripts/run_all.sh (no run_full.sh). Update the documentation to reference the actual script name so the full-dataset instructions work as written.
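The full-dataset instructions could then read along these lines (a sketch assuming run_all.sh honours the same TASK_TYPE override the README already shows):

```bash
# Default runs the full skin dataset
./scripts/run_all.sh

# Customize the task type
TASK_TYPE=ophthalmology ./scripts/run_all.sh
```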
PR summary
This PR makes the OpenAI Batch augmentation pipeline fully env/CLI-driven, removing the need for code edits. It also consolidates dependencies in the root pyproject.toml and cleans up readme documentation inconsistencies across the repo.