[AMD] [code not in mergeable state yet] [blocker: waiting for more nodes to speed up dev iteration] mi325 sglang disagg #985
Draft: JordanNanos wants to merge 50 commits into main from jordan/mi325x-disagg-bnxt
Commits:
834fe82 Add MI325X DeepSeek-R1 FP8 disaggregated inference (1P1D, Broadcom Th…
7b50476 Update amd-master.yaml
b40908c Add MTP config, expand sweep to full pareto frontier, use -good image
2421ca5 Add perf-changelog entry for MI325X disagg configs
6abdf85 Fix MI325X QoS detection and NFS-safe cleanup for disagg benchmarks
3716258 Add local NVMe model caching for faster model loading
db677bd Switch model caching from rsync to rclone sync
0a485de Add MTP baseline to single-node MI325X DeepSeek-R1 FP8 config
67dec7c Split MI325X single-node MTP into separate config key
f18257f Fix MI325X single-node script resolution and add MTP support
3ccfba3 Fix decode dispatch token limit for DP attention disagg configs
0213032 Disable EP8/DP disagg configs on MI325X and bump MTP to 3 tokens
2afb24a Add single-node EP8/DP test configs for MI325X disagg
36aebfd Move container image to semianalysiswork Docker Hub and fix launcher …
b5a0bc2 Test EP8/DP workaround: drop MoRI a2a backend on MI325X bnxt_re
beb3808 Fix MODEL_NAME for EP8/DP test configs with MODEL_YAML_KEY override
23c2931 fix: resolve MODEL_NAME from flat repo dir when HF snapshot absent
e5b9d00 Tune EP8/DP test: lower concurrency + QP params for SQ full fix
76d89d0 fix: lower bnxt_re QP limits and concurrency for MI325X EP8/DP disagg
4d9ee30 Add GLM-5 FP8 single-node benchmark for MI325X
13c1167 Skip HF download validation when model is cached on MI325X
d4d6e19 Add Qwen3.5 and GLM-5 FP8 disaggregated inference for MI325X
5228c62 Fix HF cache path resolution: use sed instead of tr for org/repo sepa…
b08abaf Sanitize MODEL_NAME in Docker container name
6dbaa19 Force-reinstall transformers for GLM-5 in disagg Docker containers
2c24d0d Switch GLM-5 MI325X configs to v0.5.10 image
d3522ec Switch GLM-5 MI325X to MI355X GLM-5 image (rocm/sgl-dev mori-0402)
5dd235f Switch Qwen3.5/GLM-5 disagg to v0.5.10 image + no-MoRI transfer
d8abc66 Switch Qwen3.5/GLM-5 disagg to v0.5.10 image + no-MoRI transfer
44780e0 Fix YAML: switch Qwen3.5/GLM-5 disagg to v0.5.10 + no-MoRI transfer
21ce11a Remove MODEL_NAME overrides — let launcher resolve HF cache path
fc2f0d9 Fix TP mismatch for non-MLA models in Qwen3.5/GLM-5 disagg
c956ce2 Add MI325X container image build scripts and documentation
18f1c5c Use latest SGLang main for MI325X image build
13be2f6 Update build script default to SGL_BRANCH=v0.5.10
9ec6e9d Add transformers patch layer for GLM-5/Qwen3.5 model type support
02645c7 Build from SGLang main for Qwen3.5/GLM-5 PD disagg fixes
947e339 Switch Qwen3.5/GLM-5 to main-bnxt image with PD disagg fixes
d648774 Switch to v0.5.10-bnxt-patched (PD fixes + transformers patch)
d6053e1 Add thin bnxt layer Dockerfile for existing SGLang images
757d015 Switch Qwen3.5/GLM-5 to amd-disagg-bnxt-lite image
a205fda Switch to amd-main-bnxt image (full AMD fork build)
e661747 Switch to amd-main-bnxt-nopatch (no transformers override)
03015e0 Update MI325X runners to new amds naming convention
268be7b Install tiktoken/sentencepiece in disagg server for GLM-5 tokenizer
02154ef Switch Qwen3.5/GLM-5 disagg to explicit Mooncake transfer backend
dcade3d Build MI325X image from MI355X Qwen3.5 disagg base + bnxt
b8d3d44 Fix router health check: use /health instead of /readiness
3b9eb4f Add --trust-remote-code to disagg benchmark for Qwen3.5/GLM-5
ec7e4f3 Switch to PR #22665 image with MoRI DSA/GDN fix
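Several of the commits in this PR deal with locating a cached model on disk (resolving MODEL_NAME from the HF cache, falling back to a flat repo dir when no snapshot exists, and switching from tr to sed for the org/repo separator). As a hedged illustration only, the Python sketch below shows the standard Hugging Face Hub cache layout: a repo id such as org/repo maps to a folder models--org--repo, whose refs/main file holds the commit hash that names a folder under snapshots/. Each "/" becomes the two-character separator "--", which a one-for-one character translator like tr cannot emit; that is presumably the motivation for the sed change, though the PR's scripts are not shown here. The helper names and the flat <cache_root>/<org>/<repo> fallback layout are hypothetical assumptions, not the launcher's actual logic.

```python
from pathlib import Path
from typing import Optional

# Separator used by the Hugging Face Hub cache between "models", org and repo.
REPO_ID_SEPARATOR = "--"


def hf_cache_folder_name(repo_id: str) -> str:
    """Hypothetical helper: map 'org/repo' to the cache folder 'models--org--repo'."""
    # Every '/' turns into the two-character '--'; a char-for-char tool such as
    # `tr` cannot produce this, whereas a string substitution (sed-style) can.
    return REPO_ID_SEPARATOR.join(["models", *repo_id.split("/")])


def resolve_model_dir(cache_root: Path, repo_id: str, revision: str = "main") -> Optional[Path]:
    """Hypothetical resolver: prefer the HF snapshot dir, else an assumed flat copy."""
    repo_dir = cache_root / hf_cache_folder_name(repo_id)
    ref_file = repo_dir / "refs" / revision
    if ref_file.is_file():
        # refs/<revision> stores the commit hash that names the snapshot folder.
        snapshot = repo_dir / "snapshots" / ref_file.read_text().strip()
        if snapshot.is_dir():
            return snapshot
    # Assumed fallback layout (not confirmed by the PR): <cache_root>/<org>/<repo>.
    flat = cache_root / repo_id
    if flat.is_dir():
        return flat
    return None
```

For example, hf_cache_folder_name("deepseek-ai/DeepSeek-R1") yields "models--deepseek-ai--DeepSeek-R1". Returning None instead of raising leaves the caller free to decide whether to trigger a download or fail fast.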
Conversations
Can you verify whether these changes break MI355 disagg? +viz @Oseltamivir
The nicctl check was breaking on this cluster. MoRI uses nicctl to enforce QoS, but it isn't installed on these nodes or in the container we built, and the check seems unnecessary here, so it's disabled for now.