
[Feat][Plugin] Enable spec decoding for vLLM Plugin#557

Draft
whx-sjtu wants to merge 7 commits into main from whx-sjtu/atom-support-vllm-glm5-mtp

Conversation

@whx-sjtu
Contributor

@whx-sjtu whx-sjtu commented Apr 14, 2026

Motivation

This PR enables the spec decode feature for running GLM5 with vLLM + atom.

Technical Details

  1. Fix atom_config-related bugs.
  2. Fix the wrong full_cls_name of the different MLA sparse attention backends.
  3. Register the model architecture and model class for GLM5 MTP.
  4. Add an index_buffer for DeepseekMTP.
  5. Adapt the full graph of the main model with MTP enabled.
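For reference, a launch command along these lines should exercise the MTP path. This is only a sketch: the exact flag name, JSON fields, and `"method"` value vary across vLLM versions, so check `vllm serve --help` for the version being used.

```shell
# Hypothetical serving command for the accuracy runs below.
# The speculative-config schema is version-dependent; adjust as needed.
vllm serve /home/models/GLM-5.1-FP8 \
    --port 8000 \
    --speculative-config '{"method": "mtp", "num_speculative_tokens": 3}'
```

With `num_speculative_tokens: 3` the draft model proposes three tokens per step, matching the `mtp=3` setting reported in the results.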

Test Plan

Coming soon.

Test Result

  1. zai-org/GLM-5.1-FP8

Accuracy test commands:

lm_eval --model local-completions \
        --model_args model=/home/models/GLM-5.1-FP8,base_url=http://localhost:8000/v1/completions,num_concurrent=64,max_retries=3 \
        --tasks gsm8k \
        --num_fewshot 20

Accuracy test result with mtp=3:

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|    20|exact_match|↑  |0.9454|±  |0.0063|
|     |       |strict-match    |    20|exact_match|↑  |0.9462|±  |0.0062|
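As a quick sanity check, the reported standard errors are consistent with the binomial stderr over gsm8k's 1319-question test split (the sample size is an assumption about what lm_eval ran on; its exact estimator may differ):

```python
import math

def binomial_stderr(p: float, n: int) -> float:
    """Standard error of a proportion p estimated from n samples."""
    return math.sqrt(p * (1.0 - p) / n)

# 1319 is the size of the gsm8k test split (assumed here).
print(round(binomial_stderr(0.9454, 1319), 4))  # 0.0063, matching the table
```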
  2. deepseek-ai/DeepSeek-R1-0528

Accuracy test commands:

lm_eval --model local-completions \
        --model_args model=/home/models/DeepSeek-R1-0528,base_url=http://localhost:8000/v1/completions,num_concurrent=16,max_retries=3,tokenized_requests=False \
        --tasks gsm8k \
        --num_fewshot 3

Accuracy test result with mtp=3:

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     3|exact_match|↑  |0.9492|±  |0.0060|
|     |       |strict-match    |     3|exact_match|↑  |0.9469|±  |0.0062|

Submission Checklist

@whx-sjtu whx-sjtu marked this pull request as ready for review April 14, 2026 14:52
@whx-sjtu whx-sjtu changed the title [Feat][Plugin] Enable spec decoding for GLM5 in atom (vLLM Plugin) [Feat][Plugin] Enable spec decoding for GLM5 (vLLM Plugin) Apr 14, 2026
Signed-off-by: whx-sjtu <xiaowang990929@gmail.com>
@whx-sjtu whx-sjtu force-pushed the whx-sjtu/atom-support-vllm-glm5-mtp branch from 17446a6 to 9015568 Compare April 15, 2026 03:37
@wuhuikx
Collaborator

wuhuikx commented Apr 15, 2026

Could you please help attach the accuracy test results on gsm8k? Do we support MTP=1 or MTP=1/2/3? How about the acceptance ratio?

@wuhuikx wuhuikx marked this pull request as draft April 15, 2026 09:16
@wuhuikx
Collaborator

wuhuikx commented Apr 15, 2026

I will turn this PR to draft and go through CI after the code review is done.

@whx-sjtu
Contributor Author

Could you please help attach the accuracy test results on gsm8k? Do we support MTP=1 or MTP=1/2/3? How about the acceptance ratio?

Sure, I will attach the accuracy results later. Currently we support MTP=1/2/3, but the acceptance rate is low (about 20% for the first draft token and 0 for the others), and I'm working on it.
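For context, those acceptance rates translate directly into tokens emitted per decoding step. A minimal sketch, using the rough per-position figures from this thread (not measured values):

```python
def expected_tokens_per_step(accept_probs):
    """Expected tokens per step: 1 from the target model, plus each draft
    token weighted by the chance that it and all earlier drafts were accepted."""
    expected = 1.0   # the target model always contributes one token
    cumulative = 1.0
    for p in accept_probs:
        cumulative *= p
        expected += cumulative
    return expected

# ~20% acceptance for the first draft token, 0 for the rest (from the thread)
print(expected_tokens_per_step([0.2, 0.0, 0.0]))  # 1.2
```

At these rates MTP=3 yields only ~1.2 tokens per step, which is why improving acceptance beyond the first draft token matters.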

Signed-off-by: whx-sjtu <xiaowang990929@gmail.com>
@whx-sjtu whx-sjtu changed the title [Feat][Plugin] Enable spec decoding for GLM5 (vLLM Plugin) [Feat][Plugin] Enable spec decoding for vLLM Plugin Apr 17, 2026