Skip to content

resume training does not work for multi-gpus training #23

@forever208

Description

@forever208

I add --resume_checkpoint $path_to_checkpoint$ to continue the training, it works for a single GPU, but does not work for multi-gpus

the code gets stuck here:

Logging to /proj/ihorse_2021/users/x_manni/guided-diffusion/log9
creating model and diffusion...
creating data loader...
start training...
loading model from checkpoint: /proj/ihorse_2021/users/x_manni/guided-diffusion/log9/model200000.pt...

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions