fix(config): align default decoder_kwargs with checkpoint (3× patch2) #8

Open
Mi221e wants to merge 1 commit into OpenMOSS:main from Mi221e:fix/decoder-default-symmetric-patches

Mi221e commented Apr 13, 2026

Summary

The default decoder_kwargs in MossAudioTokenizerConfig contained one extra Transformer (384→768) and one extra PatchedPretransform(patch_size=2) compared to the shipped config.json and the trained weights. This PR removes that extra stage so the defaults match the checkpoint.

Why

  • Encoder: 240 × 2³ = 1920, which matches downsample_rate.
  • Decoder: must mirror the encoder with three ×2 upsample stages plus the final ×240. Four ×2 stages give 240 × 2⁴ = 3840, which breaks the ratio and does not match the checkpoint.
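
The ratio argument can be checked with a few lines of Python. This is only a sketch of the arithmetic: the patch sizes come from the description above, while the list names and the `temporal_ratio` helper are made up for illustration and are not part of the repository.

```python
from math import prod

# Patch sizes as stated in the PR description. The variable names are
# hypothetical; they do not correspond to identifiers in the codebase.
ENCODER_PATCH_SIZES = [240, 2, 2, 2]          # 240 × 2³
DECODER_PATCH_SIZES_FIXED = [2, 2, 2, 240]    # three ×2 stages, then ×240
DECODER_PATCH_SIZES_OLD = [2, 2, 2, 2, 240]   # old default with the extra ×2 stage


def temporal_ratio(patch_sizes):
    """Total temporal ratio is the product of all patch sizes in the stack."""
    return prod(patch_sizes)


if __name__ == "__main__":
    print("encoder:      ", temporal_ratio(ENCODER_PATCH_SIZES))
    print("decoder fixed:", temporal_ratio(DECODER_PATCH_SIZES_FIXED))
    print("decoder old:  ", temporal_ratio(DECODER_PATCH_SIZES_OLD))
```

With the extra stage, the decoder would upsample by twice the factor the encoder downsamples by, so the two halves no longer agree.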

Verification

  • Matches decoder_kwargs in the official model config.json (e.g. Hugging Face / local weights).
  • Weight index has four decoder Transformer modules (decoder.0/2/4/6), consistent with three interleaved patch_size:2 PatchedPretransform layers (no weights) plus final patch_size:240.
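
A similar check can be sketched for the weight-index observation. The layout schema (a list of dicts with a `module_type` key) and the example weight-key names below are assumptions made for illustration; the actual config.json and weight index may be structured differently.

```python
import re

# Hypothetical layout mirroring the description: Transformers interleaved with
# three patch_size=2 PatchedPretransform layers, then a final patch_size=240.
FIXED_LAYOUT = [
    {"module_type": "Transformer"},
    {"module_type": "PatchedPretransform", "patch_size": 2},
    {"module_type": "Transformer"},
    {"module_type": "PatchedPretransform", "patch_size": 2},
    {"module_type": "Transformer"},
    {"module_type": "PatchedPretransform", "patch_size": 2},
    {"module_type": "Transformer"},
    {"module_type": "PatchedPretransform", "patch_size": 240},
]


def transformer_indices(layout):
    """Positions in the decoder stack that hold a Transformer (and so have weights)."""
    return [i for i, m in enumerate(layout) if m["module_type"] == "Transformer"]


def decoder_indices_from_keys(weight_keys):
    """Collect the distinct N from keys shaped like 'decoder.N.<rest>'."""
    found = set()
    for key in weight_keys:
        match = re.match(r"decoder\.(\d+)\.", key)
        if match:
            found.add(int(match.group(1)))
    return sorted(found)


if __name__ == "__main__":
    # Made-up key names, used only to exercise the parser.
    example_keys = [
        "decoder.0.a.weight",
        "decoder.2.a.weight",
        "decoder.4.a.weight",
        "decoder.6.a.weight",
    ]
    print(transformer_indices(FIXED_LAYOUT))
    print(decoder_indices_from_keys(example_keys))
```

Under these assumptions, the layout places Transformers at positions 0, 2, 4 and 6, which is the set of decoder indices the weight index is reported to contain. A fifth Transformer would appear at position 8, which has no counterpart in the checkpoint.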

Remove duplicate Transformer + PatchedPretransform stage so total
temporal ratio stays 240×2³=1920, matching encoder and official config.json.

Made-with: Cursor