fix(config): align default decoder_kwargs with checkpoint (3× patch2) by Mi221e · Pull Request #8 · OpenMOSS/MOSS-Audio-Tokenizer

Mi221e · 2026-04-13T07:32:34Z

Summary

The default decoder_kwargs in MossAudioTokenizerConfig contained one extra Transformer (384→768) and one extra PatchedPretransform(patch_size=2) compared with the shipped config.json and the trained weights. This PR removes that extra stage so the default matches the checkpoint.

Why

Encoder: 240 × 2³ = 1920, which matches downsample_rate.
Decoder: it must mirror the encoder with three ×2 upsample stages plus the final ×240. Four ×2 stages give 240 × 2⁴ = 3840, which breaks the ratio and does not match the checkpoint.
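The arithmetic above can be checked with a short sketch. The lists below are hypothetical stand-ins for the patch_size values of the PatchedPretransform layers only; they are not the actual decoder_kwargs schema, and the sketch assumes Transformer layers leave the frame rate unchanged.

```python
from math import prod

# Hypothetical stand-ins: only the patch_size of each PatchedPretransform
# is listed, since those are the layers that change the temporal ratio.
ENCODER_PATCH_SIZES = [240, 2, 2, 2]          # 240 x 2^3
DECODER_PATCH_SIZES_FIXED = [2, 2, 2, 240]    # three x2 stages, then x240
DECODER_PATCH_SIZES_OLD = [2, 2, 2, 2, 240]   # old default with the extra x2 stage

DOWNSAMPLE_RATE = 1920


def temporal_ratio(patch_sizes):
    """Total resampling factor is the product of all patch sizes."""
    return prod(patch_sizes)


for name, sizes in [
    ("encoder", ENCODER_PATCH_SIZES),
    ("decoder (fixed)", DECODER_PATCH_SIZES_FIXED),
    ("decoder (old default)", DECODER_PATCH_SIZES_OLD),
]:
    ratio = temporal_ratio(sizes)
    status = "ok" if ratio == DOWNSAMPLE_RATE else "MISMATCH"
    print(f"{name}: {ratio} ({status})")
```

Under these assumptions the encoder and the fixed decoder both come out at 1920, while the old default comes out at 3840.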

Verification

The new default matches decoder_kwargs in the official model config.json (e.g. Hugging Face / local weights).
The weight index has four decoder Transformer modules (decoder.0/2/4/6). This is consistent with three interleaved patch_size:2 PatchedPretransform layers (which carry no weights) plus the final patch_size:240 layer.
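The weight-index check could be reproduced roughly as follows. This is a sketch only: the key names in example_keys are hypothetical placeholders, and the real parameter names in the checkpoint may differ. It assumes keys follow a `decoder.<index>.<...>` pattern.

```python
import re


def decoder_module_indices(weight_keys):
    """Return the sorted top-level decoder module indices that own weights."""
    pattern = re.compile(r"^decoder\.(\d+)\.")
    return sorted(
        {int(m.group(1)) for key in weight_keys if (m := pattern.match(key))}
    )


def weightless_gaps(indices):
    """Indices between the first and last weighted module that own no weights."""
    present = set(indices)
    return [i for i in range(indices[0], indices[-1]) if i not in present]


# Hypothetical key names, for illustration only.
example_keys = [
    "decoder.0.layers.0.weight",
    "decoder.2.layers.0.weight",
    "decoder.4.layers.0.weight",
    "decoder.6.layers.0.weight",
    "encoder.0.layers.0.weight",
]

indices = decoder_module_indices(example_keys)
print(indices)                   # weighted decoder modules
print(weightless_gaps(indices))  # slots left for weightless patch_size:2 layers
```

With these placeholder keys the weighted modules are 0, 2, 4 and 6, leaving slots 1, 3 and 5 for the three weightless patch_size:2 layers. The final patch_size:240 layer follows the last Transformer and does not appear in the index either.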

Remove duplicate Transformer + PatchedPretransform stage so total temporal ratio stays 240 × 2³ = 1920, matching encoder and official config.json.

Made-with: Cursor

fix(config): align default decoder_kwargs with checkpoint (3× patch2)

5c979e6

Mi221e wants to merge 1 commit into OpenMOSS:main from Mi221e:fix/decoder-default-symmetric-patches
