
expert distributions#3709

Draft
CUHKSZzxy wants to merge 8 commits into InternLM:main from CUHKSZzxy:expert-distribution

Conversation


@CUHKSZzxy CUHKSZzxy commented Jul 4, 2025

Adapted from dlBLAS.

Usage

LMDEPLOY_DUMP_EXPERT_DISTRIBUTION=1 \
LMDEPLOY_EXPERT_DUMP_DIR="/tmp/lmdeploy/expert_distribution" \
LMDEPLOY_EXPERT_DUMP_FREQUENCY=30 \
LMDEPLOY_EXPERT_DUMP_VISUALIZE=1 \
LMDEPLOY_DP_MASTER_ADDR=0.0.0.0 \
LMDEPLOY_DP_MASTER_PORT=29555 \
lmdeploy serve api_server \
    Qwen/Qwen3-235B-A22B-FP8 \
    --backend pytorch \
    --tp 1 \
    --dp 4 \
    --ep 4 \
    --proxy-url http://0.0.0.0:8001 \
    --nnodes 1 \
    --node-rank 0 \
    --log-level INFO

Result

(attached image: rank0_step120_expert_counts_heatmap)

@CUHKSZzxy CUHKSZzxy marked this pull request as ready for review July 4, 2025 04:16
Copilot AI review requested due to automatic review settings March 30, 2026 12:54
@CUHKSZzxy CUHKSZzxy marked this pull request as draft March 30, 2026 12:55

Copilot AI left a comment


Pull request overview

Adds an MoE expert-dispatch distribution recorder to the PyTorch backend, enabling periodic dumps of routed expert counts to JSON for debugging/analysis (intended for eager-mode runs).

Changes:

  • Introduce ExpertsDistributionRecorder utility (real vs no-op based on env flags) that aggregates and dumps expert token counts.
  • Hook the recorder into Qwen3 MoE and DeepSeek V2 MoE forward passes to record topk_ids.
  • Add new environment variables for enabling/disabling and configuring dump output.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File → Description:

  • lmdeploy/pytorch/models/utils/expert_distribution_recorder.py: New recorder implementation that tracks expert token counts and periodically dumps JSON.
  • lmdeploy/pytorch/models/qwen3_moe.py: Records MoE router topk_ids per forward pass for Qwen3 MoE blocks.
  • lmdeploy/pytorch/models/deepseek_v2.py: Records MoE router topk_ids per forward pass for DeepSeek V2 MoE blocks; stores layer_idx.
  • lmdeploy/pytorch/envs.py: Adds env configuration knobs for expert-distribution dumping.


CUHKSZzxy and others added 5 commits March 30, 2026 21:27
- Sort by (layer_index, num_experts) for stable JSON output
- Use topk_ids.device instead of hard-coded 'cuda'
- Use reshape(-1) instead of view(-1) for non-contiguous safety
- Move all_reduce inside dump block to avoid per-step sync overhead
- Replace minute-modulo dump guard with absolute timestamp; validate dump_frequency >= 1

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
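The last fix in the commit notes (replacing the minute-modulo dump guard with an absolute timestamp, plus validating the frequency) can be illustrated side by side. A hedged sketch with hypothetical class names; the real code integrates this logic into the recorder itself:

```python
import time

class ModuloGuard:
    # Old approach: dump whenever the current minute is a multiple of the
    # frequency. This can fire repeatedly within the same matching minute,
    # and can skip dumps entirely if no step lands in one.
    def __init__(self, frequency_min):
        self.frequency_min = frequency_min

    def should_dump(self, now=None):
        now = time.time() if now is None else now
        return int(now // 60) % self.frequency_min == 0

class TimestampGuard:
    # New approach: track an absolute deadline and advance it after each
    # dump, so dumps fire at most once per interval regardless of step rate.
    def __init__(self, frequency_s):
        if frequency_s < 1:
            raise ValueError('dump frequency must be >= 1')
        self.frequency_s = frequency_s
        self._deadline = time.monotonic() + frequency_s

    def should_dump(self):
        if time.monotonic() >= self._deadline:
            self._deadline = time.monotonic() + self.frequency_s
            return True
        return False
```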
