Skip to content

Fix integration test single stream#1996

Merged
clessig merged 2 commits intoecmwf:developfrom
pierluigicosi:pierluigicosi/dev/1993_fix_integration_single
Mar 6, 2026
Merged

Fix integration test single stream#1996
clessig merged 2 commits intoecmwf:developfrom
pierluigicosi:pierluigicosi/dev/1993_fix_integration_single

Conversation

@pierluigicosi
Copy link
Contributor

Description

Fix integration test for single stream:

  • Fixed bugs in small1_test.py
  • Rewritten small1.yml

Reference to #1860

Issue Number

Closes #1993

Is this PR a draft? Mark it as draft.

Checklist before asking for review

  • I have performed a self-review of my code
  • My changes comply with basic sanity checks:
    • I have fixed formatting issues with ./scripts/actions.sh lint
    • I have run unit tests with ./scripts/actions.sh unit-test
    • I have documented my code and I have updated the docstrings.
    • I have added unit tests, if relevant
  • I have tried my changes with data and code:
    • I have run the integration tests with ./scripts/actions.sh integration-test
    • (bigger changes) I have run a full training and I have written in the comment the run_id(s): launch-slurm.py --time 60
    • (bigger changes and experiments) I have shared a hegdedoc in the github issue with all the configurations and runs for this experiments
  • I have informed and aligned with people impacted by my change:
    • for config changes: the MatterMost channels and/or a design doc
    • for changes of dependencies: the MatterMost software development channel

@clessig
Copy link
Collaborator

clessig commented Mar 5, 2026

@pierluigicosi : Does the test pass the threshold for you?

@pierluigicosi
Copy link
Contributor Author

@pierluigicosi : Does the test pass the threshold for you?

I raised the threshold to 0.5 but i see sometimes it's not enough. Should we use an higher value?

Copy link
Collaborator

@clessig clessig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor adjustments still needed

@@ -1,30 +1,219 @@
streams_directory: "./integration_tests/streams/"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The extension of the file should be yml

[
"inference",
"train",
f"--config={WEATHERGEN_HOME}/integration_tests/small1.yaml",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be --base-config

run_id: ???
run_history: []

train_log_freq:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This must be train_logging

training_mode: ["masking"]

num_mini_epochs: 1
samples_per_mini_epoch: 32 #128 #256
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please set to 48 and remove commented out values

# Check that the loss does not explode in a single mini_epoch
# This is meant to be a quick test, not a convergence test
target = 0.25
target = 0.5
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change back to 0.25.

assert loss_metric is not None, f"'{loss_avg_name}' metric is missing in metrics file"
# Check that the loss does not explode in a single mini_epoch
# This is meant to be a quick test, not a convergence test
assert loss_metric < 0.25, f"'{loss_avg_name}' is {loss_metric}, expected to be below 0.25"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change to 0.1

@clessig
Copy link
Collaborator

clessig commented Mar 5, 2026 via email

Copy link
Collaborator

@clessig clessig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this, much needed.

@clessig
Copy link
Collaborator

clessig commented Mar 6, 2026

@tjhunter @eirinik : this would be a good first real test case for the CI pipeline

@clessig clessig merged commit 01010c5 into ecmwf:develop Mar 6, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Integration test single stream fails

2 participants