[OMNIML-2914] Fix export of fused layernorm weights for TE spec #876
Conversation
📝 Walkthrough
This change adds an internal proxy class to support exporting fused layernorm weights in Transformer Engine (TE)-based transformer models. The proxy exposes fused layernorm weights correctly during state-dict export by wrapping TE modules, handling fusion across the input, pre-MLP, and Mamba mixer layers without modifying public APIs.
Codecov Report
✅ All modified and coverable lines are covered by tests.

Coverage Diff (main vs. #876):
  Coverage   73.44%   73.44%
  Files         197      197
  Lines       20657    20657
  Hits        15172    15172
  Misses       5485     5485
Signed-off-by: James Shen <yueshen@nvidia.com>

Force-pushed from ec257b8 to ff4332f
Diff context:

    has_mcore = True

    Qwen3VLModel = None
    try:
does Qwen3VLModel exist in mcore? the other imports were from mcore. just wondering if this will cause a circular dependency since MBridge also depends on modelopt
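For context, the guarded optional-import pattern in the diff above usually looks like the sketch below; the import source is a placeholder, since where Qwen3VLModel should come from is exactly what this comment is asking:

```python
# Hypothetical illustration of the guarded optional import; the module path
# below is a placeholder, not a verified location of Qwen3VLModel.
Qwen3VLModel = None
try:
    from megatron.bridge.models import Qwen3VLModel  # placeholder path
except ImportError:
    # Leave Qwen3VLModel as None so isinstance checks simply fail closed and
    # no hard (potentially circular) dependency is taken at import time.
    pass
```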
Diff context:

    )
    else:
        # For llama 4, in hf unified checkpoint, all local experts share one scale
        self.rules["local_experts.linear_fc1"](
does non-grouped MLP export still work even if you remove these lines?
What does this PR do?
Type of change: Bug fix
Overview: When using the TE spec (Transformer Engine layer specification), input_layernorm and pre_mlp_layernorm are fused into TELayerNormColumnParallelLinear, making the standalone layernorm modules IdentityOp. The HF export (export_mcore_gpt_to_hf) skips IdentityOp modules, which silently dropped these layernorm weights from the exported HF checkpoint, producing an incomplete checkpoint missing input_layernorm.weight and post_attention_layernorm.weight for every layer.

This PR adds _FusedLayerNormProxy, a lightweight nn.Module that extracts the fused layer_norm_weight (and optional layer_norm_bias) from the TE linear module and exposes it as a standard .weight attribute. This allows the existing export rules to produce the correct HF key names without any changes to the rule mappings. The fix is applied to both the transformer layer and Mamba layer export paths.
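For reference, a minimal sketch of what such a proxy could look like, based only on the description above (the attribute names layer_norm_weight and layer_norm_bias come from Transformer Engine; everything else is illustrative and may differ from the PR's actual code):

```python
import torch.nn as nn


class _FusedLayerNormProxy(nn.Module):
    """Sketch only: expose a TE linear module's fused layernorm parameters
    under the standard .weight / .bias names so existing export rules apply."""

    def __init__(self, te_linear: nn.Module):
        super().__init__()
        # TELayerNormColumnParallelLinear keeps the fused norm weight as
        # .layer_norm_weight (and, when present, .layer_norm_bias).
        self.weight = te_linear.layer_norm_weight
        layer_norm_bias = getattr(te_linear, "layer_norm_bias", None)
        if layer_norm_bias is not None:
            self.bias = layer_norm_bias
```

Because the proxy shares the underlying parameters rather than copying them, it adds no memory or compute overhead during export.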
Usage

No API changes. The fix is transparent: export_mcore_gpt_to_hf now correctly exports layernorm weights regardless of whether the model uses the TE spec or the local spec.
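As an illustration, an export call site stays unchanged; the import path and argument names below are assumptions based on the function name in this PR and may not match the current modelopt signature:

```python
from modelopt.torch.export import export_mcore_gpt_to_hf  # assumed import path

# Illustrative only: argument names are assumptions. The point is that no
# call-site change is needed; fused layernorm weights are now exported
# automatically for both TE-spec and local-spec models.
export_mcore_gpt_to_hf(
    model,                                  # Megatron-Core GPT model
    pretrained_model_name_or_path="meta-llama/Llama-3.1-8B-Instruct",
    export_dir="/tmp/hf_export",
)
```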
Testing

- Ran quantize.py followed by export.py on a TE-spec model.
- Verified the exported HF checkpoint contains input_layernorm.weight and post_attention_layernorm.weight for all layers (see the sketch below).
- Deployed the exported checkpoint with vLLM (vllm/vllm-openai:latest, quantization="modelopt"); the model loaded successfully and produced coherent text.
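The checkpoint check above can be scripted roughly as follows; the export path, shard file name, and layer count are assumptions:

```python
from safetensors import safe_open

# Assumed export location and shard name; adjust to the actual output.
with safe_open("/tmp/hf_export/model.safetensors", framework="pt", device="cpu") as f:
    keys = set(f.keys())

num_layers = 32  # assumed layer count for the exported model
missing = [
    key
    for layer in range(num_layers)
    for key in (
        f"model.layers.{layer}.input_layernorm.weight",
        f"model.layers.{layer}.post_attention_layernorm.weight",
    )
    if key not in keys
]
assert not missing, f"Layernorm weights still missing: {missing[:4]} ..."
```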
Before your PR is "Ready for review"

- Changes are backward compatible (the new elif branches only activate when the layernorm is IdentityOp and fused weights exist).

Additional Information
The root cause is that the TE spec fuses layernorm into the subsequent linear layer (TELayerNormColumnParallelLinear), storing the norm weight as .layer_norm_weight on the linear module instead of as .weight on a standalone norm module. The export walker checks isinstance(layer.input_layernorm, IdentityOp) and skips it, but never looks for the fused weight on the linear module. This fix bridges that gap with a zero-overhead proxy.
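A hedged sketch of how the export walker could bridge this gap, reusing the proxy sketched earlier (the helper name and the linear_qkv attribute path are assumptions; the PR's actual wiring may differ):

```python
from megatron.core.transformer.identity_op import IdentityOp


def _get_input_layernorm_module(layer):
    """Hypothetical helper: pick the module whose .weight should be exported
    as input_layernorm.weight for one transformer layer."""
    norm = layer.input_layernorm
    if not isinstance(norm, IdentityOp):
        # Local spec: the standalone norm module already exposes .weight.
        return norm
    linear_qkv = layer.self_attention.linear_qkv
    if getattr(linear_qkv, "layer_norm_weight", None) is not None:
        # TE spec: the norm is fused into TELayerNormColumnParallelLinear,
        # so surface its weight through the proxy sketched earlier.
        return _FusedLayerNormProxy(linear_qkv)
    # Nothing to export for this layer's input layernorm.
    return None
```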