
Add JEPA forecasting finetuning config and respect enabled flags#1946

Open
csjfwang wants to merge 52 commits into ecmwf:develop from csjfwang:develop-add-jepa-forecast-finetune

Conversation

@csjfwang (Contributor)

Description

  1. Create config_jepa_forecasting_finetuning.yml
  2. Fix issue #1943 (JEPA fine-tuning fails with 2D-rope): make JEPA finetuning work with 2D-RoPE

Issue Number

Closes #1943

Is this PR a draft? Mark it as draft.

Checklist before asking for review

  • I have performed a self-review of my code
  • My changes comply with basic sanity checks:
    • I have fixed formatting issues with ./scripts/actions.sh lint
    • I have run unit tests with ./scripts/actions.sh unit-test
    • I have documented my code and I have updated the docstrings.
    • I have added unit tests, if relevant
  • I have tried my changes with data and code:
    • I have run the integration tests with ./scripts/actions.sh integration-test
    • (bigger changes) I have run a full training and I have written in the comment the run_id(s): launch-slurm.py --time 60
  • (bigger changes and experiments) I have shared a hedgedoc in the github issue with all the configurations and runs for these experiments
  • I have informed and aligned with people impacted by my change:
    • for config changes: the MatterMost channels and/or a design doc
    • for changes of dependencies: the MatterMost software development channel

wang85 and others added 30 commits July 16, 2025 10:07
    "student-teacher": {
        enabled: False,
-       type: LossLatentSSLStudentTeacher,
+       type: Disabled,
Contributor

Why was this change necessary?

Contributor Author

Because this check does not filter on enabled: False:

if v.type == "LossLatentSSLStudentTeacher"

Contributor

Thank you!

Collaborator

But the fix here is to change the above line to:

if v.type == "LossLatentSSLStudentTeacher" and v.get("enabled", True):
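The suggested guard can be sketched in context. The config shape below (a dict of loss-name to settings dict) is an assumption for illustration; only the type string and the `v.get("enabled", True)` idiom come from the thread.

```python
# Hedged sketch: select only losses of the given type that are also enabled,
# treating a missing "enabled" key as enabled (the v.get("enabled", True) idiom).
def active_student_teacher_losses(losses: dict) -> dict:
    return {
        name: v
        for name, v in losses.items()
        if v["type"] == "LossLatentSSLStudentTeacher" and v.get("enabled", True)
    }

# Illustrative config: the first entry is explicitly disabled, the second
# carries no flag and therefore counts as enabled.
losses = {
    "student-teacher": {"enabled": False, "type": "LossLatentSSLStudentTeacher"},
    "aux": {"type": "LossLatentSSLStudentTeacher"},
}
```

With this input, only the `aux` entry survives the filter, so a disabled `student-teacher` block no longer triggers the 2D-RoPE finetuning path.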

        enabled: False,
        masking_strategy: "random",
-       num_samples: 1,
+       num_samples: 0,
Contributor

If enabled: False is already set, why do we also need num_samples: 0?

Contributor Author

Since I re-used the function get_batch_size_from_config(), which doesn't filter on enabled: False, I will try to send another fix to avoid num_samples: 1 still being used.

self.batch_size_per_gpu = get_batch_size_from_config(cf.training_config)

Contributor

oh I see, I guess in effect this part of the code gets the unfiltered config and that is the source of the error.

For now set it to 0 samples, but maybe raise an issue!

Contributor Author

Thank you! Then I will keep the num_samples: 0 and raise an issue to report the unfiltered config thing!

Collaborator

> …function get_batch_size_from_config(), and it doesn't filter on enabled: False, I will try to send another fix to…

Where is this actually used?

Contributor Author

> > …function get_batch_size_from_config(), and it doesn't filter on enabled: False, I will try to send another fix to…
>
> Where is this actually used?

I previously used it in model.py to get the batch size, which is used during the initialization of rope_coords. But now the use of self.batch_size_per_gpu is not needed, because it can be captured by broadcasting, as we discussed and tested here:
#1895 (comment)

I can now remove this line, which caused the error when enabled: False was set but num_samples was not zero:

self.batch_size_per_gpu = get_batch_size_from_config(cf.training_config)

But I am thinking it might be better to send a PR to fix the function get_batch_size_from_config() as well. Since it does not filter on enabled: False, calling it from other places might cause the same error again.

def get_batch_size_from_config(config: Config) -> int:

Please let me know what you think. Do we need to fix it as well? If so, in this PR or in a new one?

Collaborator

We can use this PR to clean it up and change the name. The discussion here is valuable.
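The clean-up discussed above can be sketched as an enabled-aware batch-size helper. The real get_batch_size_from_config() lives in the repo and takes a Config object; here a plain dict stands in, and the model_input/num_samples layout is assumed from the discussion, not copied from the codebase.

```python
# Hedged sketch of an enabled-aware get_batch_size_from_config(): sum
# num_samples only over entries that are not explicitly disabled, so a
# leftover num_samples: 1 on a disabled entry can no longer inflate the
# batch size.
def get_batch_size_from_config(training_config: dict) -> int:
    total = 0
    for entry in training_config["model_input"].values():
        # Missing "enabled" key counts as enabled, mirroring v.get("enabled", True).
        if not entry.get("enabled", True):
            continue
        total += entry.get("num_samples", 0)
    return total

# Illustrative config: the disabled entry contributes nothing.
cfg = {
    "model_input": {
        "student-teacher": {"enabled": False, "num_samples": 1},
        "masking": {"masking_strategy": "random", "num_samples": 2},
    }
}
```

With this filtering in place, callers such as model.py no longer need a num_samples: 0 workaround on disabled entries.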

# granted to it by virtue of its status as an intergovernmental organisation
# nor does it submit to any jurisdiction.

embed_orientation: "channels"
Contributor

let's remove model params of the encoder, because they should be taken from the base_config anyway

Contributor Author

@sophie-xhonneux I removed the params related to encoder, can you look again?

@clessig clessig self-requested a review March 2, 2026 13:50
Collaborator

@clessig left a comment

Let's please fix the cause of the problem in model.py and not patch over issues.

Can we please also change the comments to the standard style:

### POSITIONAL EMBEDDINGS ###


@csjfwang csjfwang changed the title Create config for jepa forecasting finetuning, and fix jepa finetuning Add JEPA forecasting finetuning config and respect enabled flags Mar 6, 2026
@csjfwang
Copy link
Contributor Author

csjfwang commented Mar 6, 2026

Hi @clessig ,

thanks again for the feedback, I addressed the points we discussed in the latest commit.
Updates:

  1. Respect enabled flags consistently:
     • Batch size computation now ignores disabled model_input entries.
     • Active loss/loss-head selection now only considers enabled losses.
  2. Updated the comments you mentioned to a more standard style.

This should resolve the previous issue where enabled: False entries could still affect behavior in some paths.
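For illustration, once the flags are respected consistently, a disabled entry can keep its settings without affecting behavior (illustrative YAML fragment, not copied from the repo):

```yaml
student-teacher:
  enabled: False                    # skipped by loss selection and batch-size computation
  type: LossLatentSSLStudentTeacher
  masking_strategy: "random"
  num_samples: 1                    # no longer counted while enabled is False
```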

Could you please take another look when you have time?

