Conversation
…iex/dev/warm-and-frozen-teachers
…to be (at least here) identical
clessig
left a comment
Overall this looks fine; I pushed some minor changes. config_jepa.yml has a 2D RoPE param, but it's not in here, so it should be removed (it was also one of the things that caused problems for me).
    cf, target_and_aux_calc_params.get("model_param_overrides", {})
)
prepare_encoder_teacher(
    meta_ema_model, cf.training_config, cf_overridden.ae_global_dim_embed
It would be more generic to pass cf_overridden to prepare_encoder_teacher(); there might be more parameters from the config in the future that are useful beyond cf_overridden.ae_global_dim_embed.
will fix that
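The suggested change above can be sketched as follows. This is a hypothetical illustration, not the project's actual code: the `Config` dataclass here is a stand-in for the real config object, and the body of `prepare_encoder_teacher` is invented; only the names `cf_overridden` and `ae_global_dim_embed` come from the diff.

```python
# Sketch: pass the whole overridden config instead of only the embedding
# dimension, so future config-driven parameters need no signature change.
from dataclasses import dataclass, field


@dataclass
class Config:  # stand-in for the project's real config class
    ae_global_dim_embed: int = 512
    extra: dict = field(default_factory=dict)


def prepare_encoder_teacher(model, training_config, cf_overridden: Config):
    # Any field of the overridden config is reachable here,
    # not just cf_overridden.ae_global_dim_embed.
    return {"dim_embed": cf_overridden.ae_global_dim_embed, **cf_overridden.extra}


cf_overridden = Config(ae_global_dim_embed=256, extra={"rope_2d": False})
out = prepare_encoder_teacher(model=None, training_config=None, cf_overridden=cf_overridden)
```

The design point is simply that widening the parameter to the config object keeps the call site stable as new teacher-relevant fields are added.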
self.batch_size = batch_size
self.reset()

def _forward_teacher(self, model_params, batch):
I don't think it's a "private" function, since it's called from the base class. We also usually don't use the '_' convention, so I would remove it.
I can rename it
class FrozenTeacher(EncoderTeacher):
    """SSL teacher using a frozen pre-trained encoder.

    The encoder is loaded from a checkpoint and never updated. Non-encoder
The teacher_model is assumed to have the non-encoder parts discarded, no?
The code should do the discarding; the original model, as specified in the config associated with its run id, may have more than just an encoder.
self.teacher_model.eval()

@classmethod
def from_pretrained(cls, cf: Config, dataset, device, params: dict) -> FrozenTeacher:
This function is inconsistent with what is done for EMATeacher in model_interface. Either we have from_pretrained() for both classes, or we have the functionality in model_interface.py.
But they conceptually and functionally do different things, so I don't follow
Ok, can you then maybe briefly explain what, for you, the difference is between this and load_encoder_from_checkpoint()?
logger = logging.getLogger(__name__)


def _create_head(name: str, head_type: str, dim_embed: int, loss_conf, cf=None) -> nn.Module:
If this is for teacher_heads, then the function name should say so.
model.pred_heads = nn.ModuleDict()

# Ensure latent_pre_norm exists (teacher may not have had SSL training)
if model.latent_pre_norm is None:
When/why wouldn't this exist?
I don't understand, what makes you think we can assume this layer norm exists?
Ok, I assumed it always exists because it's used in the output.
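The guard discussed in this thread can be illustrated with a minimal sketch. Everything here is a stand-in: `DummyNorm` and `TeacherModel` are invented placeholders for the real torch modules (which are not reproduced), and `ensure_latent_pre_norm` is a hypothetical helper name; only `latent_pre_norm` itself comes from the diff.

```python
# Sketch of the guard: a teacher checkpoint from a run without SSL training
# may have latent_pre_norm set to None, so a fresh norm layer is created.
class DummyNorm:  # stand-in for nn.LayerNorm
    def __init__(self, dim):
        self.dim = dim


class TeacherModel:  # stand-in for the real teacher model
    def __init__(self, dim_embed, latent_pre_norm=None):
        self.dim_embed = dim_embed
        self.latent_pre_norm = latent_pre_norm


def ensure_latent_pre_norm(model: TeacherModel) -> TeacherModel:
    # Create the layer only if the checkpoint did not provide one,
    # so student and teacher latents are normalized consistently.
    if model.latent_pre_norm is None:
        model.latent_pre_norm = DummyNorm(model.dim_embed)
    return model


m = ensure_latent_pre_norm(TeacherModel(dim_embed=128))
```

This mirrors the review question: if the layer norm is always present in models used as teachers, the guard is dead code; if teachers can come from runs without SSL training, it is load-bearing.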
3. Creates fresh latent_heads based on the student's SSL loss config
"""
# Strip non-encoder components
model.forecast_engine = None
Can we formulate it as "is not encoder", so that we are robust to changes in the model design? E.g., we discussed having a decoder-type model for the stream-specific prediction heads, and we will most likely forget this hidden dependency here. Otherwise, we might have a function on the model that reduces it to the encoder, which is called here.
Something similar to:
encoder_params = {
    k: v for k, v in params.items() if k.startswith(("encoder.", "latent_pre_norm"))
}
okay, will change this
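The reviewer's snippet above can be run end to end on a toy state dict. The parameter names below are invented examples (only the `encoder.`, `latent_pre_norm`, and `forecast_engine` prefixes are taken from the discussion); the values are dummies standing in for tensors.

```python
# Toy state dict: two encoder params, the pre-norm, and one non-encoder
# param that the prefix filter should drop.
params = {
    "encoder.layer0.weight": [1.0],
    "encoder.layer0.bias": [0.0],
    "latent_pre_norm.weight": [1.0],
    "forecast_engine.head.weight": [2.0],  # non-encoder: should be dropped
}

# Keep only parameters belonging to the encoder (plus latent_pre_norm),
# so the teacher stays robust to new non-encoder components being added.
encoder_params = {
    k: v for k, v in params.items() if k.startswith(("encoder.", "latent_pre_norm"))
}
```

Filtering by what to *keep* (encoder prefixes) rather than what to *discard* (forecast_engine, pred_heads, ...) is exactly the robustness point made above: new non-encoder components are excluded by default instead of silently leaking in.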
logger.warning(f"Unknown SSL loss type {name!r} in teacher setup, skipping.")


def load_encoder_from_checkpoint(
Why do we need this as well as the first part of prepare_encoder_teacher()? It seems to be the same functionality.
@@ -0,0 +1,16 @@
training_config:
How is this config used? Maybe we can give an example at the top of what pretraining it can be used for. The copyright header is also missing.
It is for testing purposes; I will remove it at the end.
@@ -0,0 +1,7 @@
training_config:
How is this config used? Maybe we can give an example at the top of what pretraining it can be used for. The copyright header is also missing.
Description
Allow for warm starts with EMA and Frozen Teachers.
Issue Number
Closes #1881
Is this PR a draft? Mark it as draft.
Checklist before asking for review
./scripts/actions.sh lint
./scripts/actions.sh unit-test
./scripts/actions.sh integration-test
launch-slurm.py --time 60