This document describes the environment variables used in the ATOM project.
## Data Parallelism

| Variable | Type | Default | Description |
| --- | --- | --- | --- |
| `ATOM_DP_RANK` | int | `0` | The rank ID for the current process in data parallelism. |
| `ATOM_DP_RANK_LOCAL` | int | `0` | The local rank ID for the current process (used in SPMD mode). |
| `ATOM_DP_SIZE` | int | `1` | Total number of data parallel ranks. |
| `ATOM_DP_MASTER_IP` | str | `127.0.0.1` | Master IP address for DP rank coordination. |
| `ATOM_DP_MASTER_PORT` | int | `29500` | Master port for DP rank coordination. |
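For illustration, a minimal sketch of how a launcher might read these variables. The `env_int` helper is hypothetical, not part of ATOM; only the variable names and defaults come from the table above.

```python
import os

# Hypothetical helper: read an integer env var, falling back to the
# documented default when the variable is unset.
def env_int(name: str, default: int) -> int:
    return int(os.environ.get(name, default))

dp_rank = env_int("ATOM_DP_RANK", 0)
dp_size = env_int("ATOM_DP_SIZE", 1)
master_ip = os.environ.get("ATOM_DP_MASTER_IP", "127.0.0.1")
master_port = env_int("ATOM_DP_MASTER_PORT", 29500)

print(f"rank {dp_rank}/{dp_size} coordinating via {master_ip}:{master_port}")
```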
## Model Loading

| Variable | Type | Default | Description |
| --- | --- | --- | --- |
| `ATOM_DISABLE_MMAP` | bool | `false` | If set to `true`, disable memory-mapped file loading for model weights. Useful in containerized environments where mmap may cause issues. |
## Plugin Mode

| Variable | Type | Default | Description |
| --- | --- | --- | --- |
| `ATOM_DISABLE_VLLM_PLUGIN` | bool | `0` (false) | If set to `1`, disable the vLLM plugin registration entirely. |
| `ATOM_DISABLE_VLLM_PLUGIN_ATTENTION` | bool | `0` (false) | If set to `1`, disable only the vLLM attention plugin while keeping other plugins active. |
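Note that the boolean variables in this document use two spellings: `ATOM_DISABLE_MMAP` is documented as `true`/`false`, while the flags here use `1`/`0`. A parsing sketch that tolerates both conventions (the `env_flag` helper is hypothetical, not ATOM's actual implementation):

```python
import os

# Hypothetical helper: treat "1", "true", "yes", "on" (case-insensitive)
# as truthy, everything else as falsy; unset means the documented default.
def env_flag(name: str, default: bool = False) -> bool:
    val = os.environ.get(name)
    if val is None:
        return default
    return val.strip().lower() in ("1", "true", "yes", "on")

disable_plugin = env_flag("ATOM_DISABLE_VLLM_PLUGIN")            # default: False
disable_attention = env_flag("ATOM_DISABLE_VLLM_PLUGIN_ATTENTION")
```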
## Kernel / Backend Selection

| Variable | Type | Default | Description |
| --- | --- | --- | --- |
| `ATOM_USE_TRITON_GEMM` | bool | `0` (false) | If set to `1`, use the AITER Triton FP4 weight-preshuffled GEMM; otherwise, use the AITER ASM FP4 weight-preshuffled GEMM. |
| `ATOM_USE_TRITON_MXFP4_BMM` | bool | `0` (false) | If set to `1`, use FP4 BMM in the MLA attention module. |
## Fusion Passes

### TP AllReduce Fusion

| Variable | Type | Default | Description |
| --- | --- | --- | --- |
| `ATOM_ENABLE_ALLREDUCE_RMSNORM_FUSION` | bool | `1` (true) | If set to `1`, fuse allreduce with RMSNorm in tensor parallel mode. |
### DeepSeek-style

| Variable | Type | Default | Description |
| --- | --- | --- | --- |
| `ATOM_ENABLE_DS_INPUT_RMSNORM_QUANT_FUSION` | bool | `1` (true) | If set to `1`, fuse RMSNorm with quantization. |
| `ATOM_ENABLE_DS_QKNORM_FUSION` | bool | `1` (true) | If set to `1`, use the fused Q/K RMSNorm path (`fused_qk_rmsnorm`) in the DeepSeek MLA attention module when Q-LoRA is enabled and QK norm+quant fusion is not used. If set to `0`, apply separate RMSNorm for the Q and KV branches instead. |
| `ATOM_ENABLE_DS_QKNORM_QUANT_FUSION` | bool | `1` (true) | If set to `1`, fuse QK norm with quantization in the MLA attention module. |
| `ATOM_DUAL_STREAM_MOE_TOKEN_THRESHOLD` | int | `1024` | Upper bound on the MoE token count (`num_tokens` in the MoE forward) for using the dual-stream path: shared experts run on a secondary CUDA stream while routed experts run on the default stream. If `num_tokens` exceeds this value, that forward uses single-stream MoE instead. Set to `0` to disable the dual-stream setup entirely (no alternate stream, no `maybe_dual_stream_forward` registration). |
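The threshold gating described for `ATOM_DUAL_STREAM_MOE_TOKEN_THRESHOLD` can be sketched as below. This is a simplified illustration of the documented rule, not ATOM's actual dispatch code; `use_dual_stream` is a hypothetical name.

```python
import os

# Read the documented threshold (default 1024); 0 disables dual-stream entirely.
THRESHOLD = int(os.environ.get("ATOM_DUAL_STREAM_MOE_TOKEN_THRESHOLD", 1024))

def use_dual_stream(num_tokens: int, threshold: int = THRESHOLD) -> bool:
    """Return True if this MoE forward should take the dual-stream path."""
    if threshold == 0:
        # Dual-stream setup disabled: no alternate stream is created at all.
        return False
    # Dual-stream only while the token count does not exceed the threshold.
    return num_tokens <= threshold
```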
### Qwen3-MoE style

| Variable | Type | Default | Description |
| --- | --- | --- | --- |
| `ATOM_ENABLE_QK_NORM_ROPE_CACHE_QUANT_FUSION` | bool | `0` (false) | If set to `1`, fuse QK norm, RoPE, and cache quantization into one kernel. Enable this for Qwen3-MoE models for better performance. |