Conversion scripts fail on standard TPU VMs for large models #3418

@zzxslp

Bug report

The official model conversion instructions fail on a standard TPU v5p VM for large models such as Qwen3-235B, with CPU OOM errors while sharding the MoE layers. Since CPU memory on GCP is fixed (400GB), I wonder whether the script can be improved to work around this issue, or is the doc outdated?
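For a rough sense of scale, here is a back-of-envelope estimate of the raw weight footprint. It is only a sketch under stated assumptions: the parameter count is the nominal 235B from the model name, the dtype sizes are the standard 2 bytes (bfloat16) and 4 bytes (float32), and the helper name `weight_footprint_gb` is made up for illustration. It does not measure what the conversion script actually allocates.

```python
def weight_footprint_gb(num_params: float, bytes_per_param: int) -> float:
    """Return the raw weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


HOST_RAM_GB = 400    # fixed host memory reported in this issue
NUM_PARAMS = 235e9   # assumption: nominal parameter count taken from the model name

for dtype, nbytes in (("bfloat16", 2), ("float32", 4)):
    size = weight_footprint_gb(NUM_PARAMS, nbytes)
    verdict = "exceeds" if size > HOST_RAM_GB else "fits within"
    print(f"{dtype}: {size:.0f} GB {verdict} {HOST_RAM_GB} GB of host RAM")
```

Under these assumptions, a single bfloat16 copy of the weights is about 470 GB, which is already above the 400GB host limit before counting any temporary buffers created during sharding.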

Also, during model conversion, the sharding process with the default simulated_cpu_devices_count=16 is very slow (even for a 30B model).

Logs/Output

No response

Environment Information

No response

Additional Context

No response

Labels

bug: Something isn't working
