This is a draft roadmap for DeepSpeed Q2 2026. Feedback is welcome — please leave comments on this issue or join the #2026q2-roadmap channel on the DeepSpeed Slack.
New features and enhancements
AutoEP support
AutoEP enables Expert Parallelism (EP) for major Mixture-of-Experts (MoE) models out of the box, eliminating the need for users to write model-specific parallelization code. By automatically distributing expert layers across devices, AutoEP allows users to scale MoE training with minimal configuration changes.
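The core idea can be sketched in a few lines. This is an illustrative sketch of expert parallelism itself, not the AutoEP API (which is still in development); all names here are hypothetical:

```python
# Hypothetical illustration of expert parallelism (EP): expert layers are
# distributed across devices, and tokens are routed to the device that
# hosts their assigned expert. This is NOT the AutoEP API.

def assign_experts(num_experts: int, num_devices: int) -> dict[int, int]:
    """Map each expert index to the rank that hosts it, using
    contiguous blocks of experts per rank (a common EP layout)."""
    per_rank = num_experts // num_devices
    return {e: e // per_rank for e in range(num_experts)}

def route_tokens(token_expert_ids, placement):
    """Group token indices by the rank hosting their assigned expert,
    mimicking the all-to-all dispatch step of expert parallelism."""
    buckets = {}
    for tok, expert in enumerate(token_expert_ids):
        buckets.setdefault(placement[expert], []).append(tok)
    return buckets

# 8 experts over 4 devices: experts 0-1 on rank 0, 2-3 on rank 1, etc.
placement = assign_experts(num_experts=8, num_devices=4)
dispatch = route_tokens([0, 3, 5, 2, 7], placement)
```

AutoEP's job is to derive this placement and dispatch automatically for supported MoE architectures, so the user never writes routing or sharding code by hand.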
A prototype implementation has been validated on 8xH100, achieving ~5x throughput improvement over ZeRO-3 baselines. We will build on this work to extend AutoEP support to production readiness in Q2.
AutoTP extension
AutoTP was significantly revamped in Q1 (PR #7806), introducing a flexible, configuration-driven API for custom layer partitioning patterns. In Q2, we will extend this foundation to support a broader range of models and scales.
tp_plan support: Leverage the base_model_tp_plan metadata provided by HuggingFace Transformers models to automatically derive partitioning configurations, enabling out-of-the-box TP for any model that ships with a tp_plan.
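To make "configuration-driven layer partitioning" concrete, here is a minimal sketch of how a plan mapping layer names to partitioning styles can drive weight sharding across tensor-parallel ranks. The plan format and layer names are assumptions for illustration, not DeepSpeed's actual configuration schema:

```python
import numpy as np

# Illustrative sketch (assumed plan format, not DeepSpeed's actual API):
# a configuration-driven partitioner that shards each weight "colwise"
# (split the output dimension) or "rowwise" (split the input dimension)
# across tensor-parallel ranks, in the spirit of HF's tp_plan metadata.

def shard(weight: np.ndarray, style: str, rank: int, world_size: int) -> np.ndarray:
    if style == "colwise":   # column-parallel: split output dim (axis 0 of (out, in))
        return np.array_split(weight, world_size, axis=0)[rank]
    if style == "rowwise":   # row-parallel: split input dim (axis 1)
        return np.array_split(weight, world_size, axis=1)[rank]
    raise ValueError(f"unknown partitioning style: {style}")

# Hypothetical plan and weights for a 2-layer MLP with hidden size 16.
tp_plan = {"mlp.up_proj": "colwise", "mlp.down_proj": "rowwise"}
weights = {"mlp.up_proj": np.zeros((16, 8)), "mlp.down_proj": np.zeros((8, 16))}

local = {name: shard(w, tp_plan[name], rank=0, world_size=4)
         for name, w in weights.items()}
# Each rank keeps a (4, 8) slice of up_proj and an (8, 4) slice of down_proj.
```

The appeal of this design is that supporting a new model reduces to supplying (or, with tp_plan support, inheriting) such a plan rather than writing model-specific sharding code.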
AutoSP Integration
AutoSP (ICLR 2026) is a compiler-based approach that automatically applies sequence parallelism via DeepSpeed Ulysses, removing the need for manual partitioning of sequence dimensions.
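The re-sharding that AutoSP automates can be illustrated as follows. This is a single-process simulation of the DeepSpeed-Ulysses all-to-all, with illustrative shapes; the real implementation uses collective communication, and no names here come from the actual API:

```python
import numpy as np

# Illustrative, single-process simulation of the DeepSpeed-Ulysses idea
# that AutoSP automates: each rank starts with a slice of the sequence
# and all attention heads; an all-to-all re-shards the activations so
# each rank holds the full sequence but only its subset of heads, where
# attention can then run locally. Shapes and names are hypothetical.

P, S, H, D = 4, 8, 8, 2           # ranks, sequence length, heads, head dim
x = np.arange(S * H * D, dtype=float).reshape(S, H, D)

# Before the all-to-all: rank r holds tokens [r*S/P : (r+1)*S/P], all H heads.
seq_shards = np.array_split(x, P, axis=0)

def all_to_all(shards, rank):
    """Simulated all-to-all: rank `rank` receives its H/P head slice from
    every rank's sequence shard and reassembles the full sequence."""
    return np.concatenate(
        [np.array_split(s, P, axis=1)[rank] for s in shards], axis=0)

head_shards = [all_to_all(seq_shards, r) for r in range(P)]
# Each rank now holds shape (S, H // P, D), ready for local attention.
```

A second all-to-all after attention restores the sequence-sharded layout; AutoSP's contribution is having the compiler insert these exchanges and partition the sequence dimension without manual changes to the model.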
Compiler Integration Enhancement (Optional)
New Accelerator Support (Q2)
RL-training-specific optimizations for DeepSpeed-Inference
Stability (Q2)