
expert distributions#3709

Draft
CUHKSZzxy wants to merge 8 commits into InternLM:main from CUHKSZzxy:expert-distribution

Conversation


@CUHKSZzxy CUHKSZzxy commented Jul 4, 2025

Adapted from dlBLAS.

Usage

LMDEPLOY_DUMP_EXPERT_DISTRIBUTION=1 \
LMDEPLOY_EXPERT_DUMP_DIR="/tmp/lmdeploy/expert_distribution" \
LMDEPLOY_EXPERT_DUMP_FREQUENCY=30 \
LMDEPLOY_EXPERT_DUMP_VISUALIZE=1 \
LMDEPLOY_DP_MASTER_ADDR=0.0.0.0 \
LMDEPLOY_DP_MASTER_PORT=29555 \
lmdeploy serve api_server \
    Qwen/Qwen3-235B-A22B-FP8 \
    --backend pytorch \
    --tp 1 \
    --dp 4 \
    --ep 4 \
    --proxy-url http://0.0.0.0:8001 \
    --nnodes 1 \
    --node-rank 0 \
    --log-level INFO

Result

(attached image: rank0_step120_expert_counts_heatmap)

@CUHKSZzxy CUHKSZzxy marked this pull request as ready for review July 4, 2025 04:16
Copilot AI review requested due to automatic review settings March 30, 2026 12:54
@CUHKSZzxy CUHKSZzxy marked this pull request as draft March 30, 2026 12:55

Copilot AI left a comment


Pull request overview

Adds an MoE expert-dispatch distribution recorder to the PyTorch backend, enabling periodic dumps of routed expert counts to JSON for debugging/analysis (intended for eager-mode runs).

Changes:

  • Introduce ExpertsDistributionRecorder utility (real vs no-op based on env flags) that aggregates and dumps expert token counts.
  • Hook the recorder into Qwen3 MoE and DeepSeek V2 MoE forward passes to record topk_ids.
  • Add new environment variables for enabling/disabling and configuring dump output.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File → Description:

  • lmdeploy/pytorch/models/utils/expert_distribution_recorder.py: New recorder implementation that tracks expert token counts and periodically dumps JSON.
  • lmdeploy/pytorch/models/qwen3_moe.py: Records MoE router topk_ids per forward pass for Qwen3 MoE blocks.
  • lmdeploy/pytorch/models/deepseek_v2.py: Records MoE router topk_ids per forward pass for DeepSeek V2 MoE blocks; stores layer_idx.
  • lmdeploy/pytorch/envs.py: Adds env configuration knobs for expert-distribution dumping.


CUHKSZzxy and others added 5 commits March 30, 2026 21:27
- Sort by (layer_index, num_experts) for stable JSON output
- Use topk_ids.device instead of hard-coded 'cuda'
- Use reshape(-1) instead of view(-1) for non-contiguous safety
- Move all_reduce inside dump block to avoid per-step sync overhead
- Replace minute-modulo dump guard with absolute timestamp; validate dump_frequency >= 1

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
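The last fix in the commit notes (replacing the minute-modulo dump guard with an absolute timestamp, plus validating the frequency) can be illustrated side by side. A hedged sketch with hypothetical class names; the real code integrates this logic into the recorder itself:

```python
import time

class ModuloGuard:
    # Old approach: dump whenever the current minute is a multiple of the
    # frequency. This can fire repeatedly within the same matching minute,
    # and can skip dumps entirely if no step lands in one.
    def __init__(self, frequency_min):
        self.frequency_min = frequency_min

    def should_dump(self, now=None):
        now = time.time() if now is None else now
        return int(now // 60) % self.frequency_min == 0

class TimestampGuard:
    # New approach: track an absolute deadline and advance it after each
    # dump, so dumps fire at most once per interval regardless of step rate.
    def __init__(self, frequency_s):
        if frequency_s < 1:
            raise ValueError('dump frequency must be >= 1')
        self.frequency_s = frequency_s
        self._deadline = time.monotonic() + frequency_s

    def should_dump(self):
        if time.monotonic() >= self._deadline:
            self._deadline = time.monotonic() + self.frequency_s
            return True
        return False
```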
