[OMNIML-2914] Fix export of fused layernorm weights for TE spec #876
Conversation
📝 Walkthrough
This change adds an internal proxy class to support exporting fused layernorm weights in Transformer Engine (TE)-based transformer models. The proxy exposes fused layernorm weights correctly during state-dict export by wrapping TE modules, handling fusion across the input, pre-MLP, and Mamba mixer layers without modifying public APIs.
Codecov Report
✅ All modified and coverable lines are covered by tests.

Coverage Diff (main vs. #876):
  Coverage   73.44%   73.44%
  Files         197      197
  Lines       20657    20657
  Hits        15172    15172
  Misses       5485     5485
Signed-off-by: James Shen <yueshen@nvidia.com>

Force-pushed from ec257b8 to ff4332f
Diff context:

    has_mcore = True

    Qwen3VLModel = None
    try:
does Qwen3VLModel exist in mcore? the other imports were from mcore. just wondering if this will cause a circular dependency since MBridge also depends on modelopt
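For context, the guarded optional-import pattern in the diff above usually looks like the sketch below; the import source is a placeholder, since where Qwen3VLModel should come from is exactly what this comment is asking:

```python
# Hypothetical illustration of the guarded optional import; the module path
# below is a placeholder, not a verified location of Qwen3VLModel.
Qwen3VLModel = None
try:
    from megatron.bridge.models import Qwen3VLModel  # placeholder path
except ImportError:
    # Leave Qwen3VLModel as None so isinstance checks simply fail closed and
    # no hard (potentially circular) dependency is taken at import time.
    pass
```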
Diff context:

    )
    else:
        # For llama 4, in hf unified checkpoint, all local experts share one scale
        self.rules["local_experts.linear_fc1"](
does non-grouped MLP export still work even if you remove these lines?
What does this PR do?
Type of change: Bug fix
Overview: When using the TE spec (Transformer Engine layer specification), input_layernorm and pre_mlp_layernorm are fused into TELayerNormColumnParallelLinear, making the standalone layernorm modules IdentityOp. The HF export (export_mcore_gpt_to_hf) skips IdentityOp modules, which silently dropped these layernorm weights from the exported HF checkpoint, producing an incomplete checkpoint missing input_layernorm.weight and post_attention_layernorm.weight for every layer.

This PR adds _FusedLayerNormProxy, a lightweight nn.Module that extracts the fused layer_norm_weight (and optional layer_norm_bias) from the TE linear module and exposes it as a standard .weight attribute. This allows the existing export rules to produce the correct HF key names without any changes to the rule mappings. The fix is applied to both the transformer layer and Mamba layer export paths.
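For reference, a minimal sketch of what such a proxy could look like, based only on the description above (the attribute names layer_norm_weight and layer_norm_bias come from Transformer Engine; everything else is illustrative and may differ from the PR's actual code):

```python
import torch.nn as nn


class _FusedLayerNormProxy(nn.Module):
    """Sketch only: expose a TE linear module's fused layernorm parameters
    under the standard .weight / .bias names so existing export rules apply."""

    def __init__(self, te_linear: nn.Module):
        super().__init__()
        # TELayerNormColumnParallelLinear keeps the fused norm weight as
        # .layer_norm_weight (and, when present, .layer_norm_bias).
        self.weight = te_linear.layer_norm_weight
        layer_norm_bias = getattr(te_linear, "layer_norm_bias", None)
        if layer_norm_bias is not None:
            self.bias = layer_norm_bias
```

Because the proxy shares the underlying parameters rather than copying them, it adds no memory or compute overhead during export.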
Usage

No API changes. The fix is transparent: export_mcore_gpt_to_hf now correctly exports layernorm weights regardless of whether the model uses the TE spec or the local spec.
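As an illustration, an export call site stays unchanged; the import path and argument names below are assumptions based on the function name in this PR and may not match the current modelopt signature:

```python
from modelopt.torch.export import export_mcore_gpt_to_hf  # assumed import path

# Illustrative only: argument names are assumptions. The point is that no
# call-site change is needed; fused layernorm weights are now exported
# automatically for both TE-spec and local-spec models.
export_mcore_gpt_to_hf(
    model,                                  # Megatron-Core GPT model
    pretrained_model_name_or_path="meta-llama/Llama-3.1-8B-Instruct",
    export_dir="/tmp/hf_export",
)
```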
Testing

- Ran quantize.py followed by export.py on a TE-spec model.
- Verified the exported HF checkpoint contains input_layernorm.weight and post_attention_layernorm.weight for all layers (see the sketch below).
- Deployed the exported checkpoint with vLLM (vllm/vllm-openai:latest, quantization="modelopt"); the model loaded successfully and produced coherent text.
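The checkpoint check above can be scripted roughly as follows; the export path, shard file name, and layer count are assumptions:

```python
from safetensors import safe_open

# Assumed export location and shard name; adjust to the actual output.
with safe_open("/tmp/hf_export/model.safetensors", framework="pt", device="cpu") as f:
    keys = set(f.keys())

num_layers = 32  # assumed layer count for the exported model
missing = [
    key
    for layer in range(num_layers)
    for key in (
        f"model.layers.{layer}.input_layernorm.weight",
        f"model.layers.{layer}.post_attention_layernorm.weight",
    )
    if key not in keys
]
assert not missing, f"Layernorm weights still missing: {missing[:4]} ..."
```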
Before your PR is "Ready for review"

- Changes are backward compatible (the new elif branches only activate when the layernorm is IdentityOp and fused weights exist).

Additional Information
The root cause is that the TE spec fuses layernorm into the subsequent linear layer (TELayerNormColumnParallelLinear), storing the norm weight as .layer_norm_weight on the linear module instead of as .weight on a standalone norm module. The export walker checks isinstance(layer.input_layernorm, IdentityOp) and skips it, but never looks for the fused weight on the linear module. This fix bridges that gap with a zero-overhead proxy.
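A hedged sketch of how the export walker could bridge this gap, reusing the proxy sketched earlier (the helper name and the linear_qkv attribute path are assumptions; the PR's actual wiring may differ):

```python
from megatron.core.transformer.identity_op import IdentityOp


def _get_input_layernorm_module(layer):
    """Hypothetical helper: pick the module whose .weight should be exported
    as input_layernorm.weight for one transformer layer."""
    norm = layer.input_layernorm
    if not isinstance(norm, IdentityOp):
        # Local spec: the standalone norm module already exposes .weight.
        return norm
    linear_qkv = layer.self_attention.linear_qkv
    if getattr(linear_qkv, "layer_norm_weight", None) is not None:
        # TE spec: the norm is fused into TELayerNormColumnParallelLinear,
        # so surface its weight through the proxy sketched earlier.
        return _FusedLayerNormProxy(linear_qkv)
    # Nothing to export for this layer's input layernorm.
    return None
```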