Add JEPA forecasting finetuning config and respect enabled flags #1946
csjfwang wants to merge 52 commits into ecmwf:develop

Conversation
2. fix issue 1943, JEPA finetuning with 2D-RoPE
  "student-teacher": {
    enabled: False,
-   type: LossLatentSSLStudentTeacher,
+   type: Disabled,
Why was this change necessary?
Because this part of the code is not filtered by enabled: False:
But the fix here is to change the above line to:
if v.type == "LossLatentSSLStudentTeacher" and v.get("enabled", True):
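For illustration, a minimal sketch of the suggested condition (the dict-based config and the helper name `should_run_student_teacher` are assumptions for a self-contained example; the real config uses attribute access):

```python
def should_run_student_teacher(v: dict) -> bool:
    # Mirrors the suggested fix: the student-teacher branch runs only when
    # the loss has that type AND it has not been disabled via enabled: False.
    # "enabled" defaults to True when the key is absent.
    return v.get("type") == "LossLatentSSLStudentTeacher" and v.get("enabled", True)

# An entry without an enabled key is treated as enabled.
print(should_run_student_teacher({"type": "LossLatentSSLStudentTeacher"}))  # True
# An explicitly disabled entry is skipped.
print(should_run_student_teacher(
    {"type": "LossLatentSSLStudentTeacher", "enabled": False}))  # False
```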
    enabled: False,
    masking_strategy: "random",
-   num_samples: 1,
+   num_samples: 0,
If enabled: False is already set, why do we need num_samples: 0?
Since I re-used the function get_batch_size_from_config(), which does not filter by enabled: False, I will send another fix so that num_samples: 1 is no longer picked up.
Oh I see, in effect this part of the code gets the unfiltered config, and that is the source of the error.
For now set it to 0 samples, but maybe raise an issue!
Thank you! Then I will keep the num_samples: 0 and raise an issue to report the unfiltered config thing!
> Since I re-used the function get_batch_size_from_config(), and it doesn't filter by enabled: False, I will try to send another fix to …

Where is this actually used?
I previously used it in model.py to get the batch size, which is used during the initialization of the rope_coords. But now self.batch_size_per_gpu is not needed, because the batch size can be captured by broadcasting, as we discussed and tested here:
#1895 (comment)
I can now remove this line, which caused the error when enabled: False was set but num_samples was not zero:
But I am thinking it might be better to also send a PR to fix the function get_batch_size_from_config(), because it does not filter by enabled: False; when we call this function from another place, it might cause the same error again.
Please let me know what you think. Do we need to fix it as well? If so, in this PR or in a new one?
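As a sketch of the proposed fix, summing num_samples only over enabled entries would keep a disabled strategy from leaking into the batch size (the function body and config layout here are assumptions based on this thread, not the actual implementation):

```python
def get_batch_size_from_config(sampling_config: dict) -> int:
    # Hypothetical filtered version: entries with enabled: False are
    # skipped, so their num_samples can no longer inflate the batch size.
    return sum(
        entry.get("num_samples", 0)
        for entry in sampling_config.values()
        if entry.get("enabled", True)
    )

config = {
    "student-teacher": {"enabled": False, "masking_strategy": "random", "num_samples": 1},
    "forecast": {"enabled": True, "num_samples": 2},
}
print(get_batch_size_from_config(config))  # 2: the disabled entry is ignored
```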
We can use this PR to clean it up and change the name. The discussion here is valuable.
  # granted to it by virtue of its status as an intergovernmental organisation
  # nor does it submit to any jurisdiction.

+ embed_orientation: "channels"
Let's remove the model params of the encoder, because they should be taken from the base_config anyway.
@sophie-xhonneux I removed the params related to the encoder, can you look again?
clessig left a comment
Let's please fix the cause of the problem in model.py and not patch over issues.
Can we please also change the comments to the standard style?
…ve losses/heads. 2. change a few comments to the standard style
Hi @clessig, thanks again for the feedback, I addressed the points we discussed in the latest commit.
This should resolve the previous issue. Could you please take another look when you have time?
Description
Issue Number
Closes #1943
Is this PR a draft? Mark it as draft.
Checklist before asking for review
- ./scripts/actions.sh lint
- ./scripts/actions.sh unit-test
- ./scripts/actions.sh integration-test
- launch-slurm.py --time 60