Resume training does not work for multi-GPU training #23

I added `--resume_checkpoint $path_to_checkpoint$` to continue training. This works on a single GPU, but does not work with multiple GPUs.

The code gets stuck here, right after the checkpoint loading message, and prints nothing further:

Logging to /proj/ihorse_2021/users/x_manni/guided-diffusion/log9
creating model and diffusion...
creating data loader...
start training...
loading model from checkpoint: /proj/ihorse_2021/users/x_manni/guided-diffusion/log9/model200000.pt...
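To narrow down where each process is waiting, one option is to have every process dump its Python stack if it is still running after a timeout. The sketch below is only a debugging aid and uses the standard-library `faulthandler` module. The helper names `arm_hang_watchdog` and `disarm_hang_watchdog`, the `label` argument, and the log file naming are hypothetical and not part of the training code. A common cause of this kind of hang, which is an assumption here and not confirmed by the log above, is a collective operation that only some ranks reach.

```python
import faulthandler
import os


def arm_hang_watchdog(timeout_s, log_dir, label):
    """Schedule a dump of all thread stacks if the process is still alive after timeout_s.

    `label` is a hypothetical tag (for example the process rank) used only in the file name.
    Returns the log path and the open file object, which must stay open until disarmed
    because faulthandler writes directly to its file descriptor.
    """
    path = os.path.join(log_dir, f"hang_{label}_{os.getpid()}.log")
    log_file = open(path, "w")
    faulthandler.dump_traceback_later(timeout_s, repeat=False, file=log_file)
    return path, log_file


def disarm_hang_watchdog(log_file):
    """Cancel the pending dump (if any) and close the log file."""
    faulthandler.cancel_dump_traceback_later()
    log_file.close()
```

Calling `arm_hang_watchdog` in each process just before the checkpoint is loaded, and `disarm_hang_watchdog` once training steps begin, would leave one stack dump per stuck process. Comparing those dumps shows whether all ranks are blocked at the same call or at different ones.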