ATOM Environment Variables

This document describes the environment variables used in the ATOM project.

Data Parallelism

Variable	Type	Default	Description
ATOM_DP_RANK	int	0	The rank ID for the current process in data parallelism.
ATOM_DP_RANK_LOCAL	int	0	The local rank ID for the current process (used in SPMD mode).
ATOM_DP_SIZE	int	1	Total number of data parallel ranks.
ATOM_DP_MASTER_IP	str	127.0.0.1	Master IP address for DP ranks coordination.
ATOM_DP_MASTER_PORT	int	29500	Master port for DP ranks coordination.

Model Loading

Variable	Type	Default	Description
ATOM_DISABLE_MMAP	bool	false	If set to `true`, disable memory-mapped file loading for model weights. Useful in containerized environments where mmap may cause issues.

Plugin Mode

Variable	Type	Default	Description
ATOM_DISABLE_VLLM_PLUGIN	bool	0 (false)	If set to `1`, disable the vLLM plugin registration entirely.
ATOM_DISABLE_VLLM_PLUGIN_ATTENTION	bool	0 (false)	If set to `1`, disable only the vLLM attention plugin while keeping other plugins active.

Kernel / Backend Selection

Variable	Type	Default	Description
ATOM_USE_TRITON_GEMM	bool	0 (false)	If set to `1`, use AITER Triton FP4 weight preshuffled GEMM. Otherwise use AITER ASM FP4 weight preshuffled GEMM.
ATOM_USE_TRITON_MXFP4_BMM	bool	0 (false)	If set to `1`, use FP4 BMM in MLA attention module.

Fusion Passes

TP AllReduce Fusion

Variable	Type	Default	Description
ATOM_ENABLE_ALLREDUCE_RMSNORM_FUSION	bool	1 (true)	If set to `1`, fuse allreduce with RMSNorm in tensor parallel mode.

DeepSeek-style

Variable	Type	Default	Description
ATOM_ENABLE_DS_INPUT_RMSNORM_QUANT_FUSION	bool	1 (true)	If set to `1`, fuse RMSNorm with quantization.
ATOM_ENABLE_DS_QKNORM_FUSION	bool	1 (true)	If set to `1`, use the fused Q/K RMSNorm path (`fused_qk_rmsnorm`) in the DeepSeek MLA attention module when Q-LoRA is enabled and QK norm+quant fusion is not used. If set to `0`, apply separate RMSNorm for the Q and KV branches instead.
ATOM_ENABLE_DS_QKNORM_QUANT_FUSION	bool	1 (true)	If set to `1`, fuse QK norm with quantization in MLA attention module.
ATOM_DUAL_STREAM_MOE_TOKEN_THRESHOLD	int	1024	Upper bound on MoE token count (`num_tokens` in the MoE forward) for using the dual-stream path: shared experts on a secondary CUDA stream while routed experts run on the default stream. If `num_tokens` exceeds this value, that forward uses single-stream MoE instead. Set to `0` to disable dual-stream setup entirely (no alt stream, no `maybe_dual_stream_forward` registration).

Qwen3-MoE style

Variable	Type	Default	Description
ATOM_ENABLE_QK_NORM_ROPE_CACHE_QUANT_FUSION	bool	0 (false)	If set to `1`, fuse QK norm, RoPE, and cache quantization into one kernel. Enable this for Qwen3-MoE models for better performance.

Llama-style

Variable	Type	Default	Description
ATOM_LLAMA_ENABLE_AITER_TRITON_FUSED_RMSNORM_QUANT	bool	1 (true)	If set to `1`, use Triton kernel to fuse RMSNorm with quantization.
ATOM_LLAMA_ENABLE_AITER_TRITON_FUSED_SILU_MUL_QUANT	bool	1 (true)	If set to `1`, use Triton kernel to fuse SiLU and mul with quantization in MLP module.

Profiling & Debugging

Variable	Type	Default	Description
ATOM_TORCH_PROFILER_DIR	str	—	When set, enables PyTorch profiler and writes traces to this directory. Create subdirectories per rank (e.g., `rank_0`, `dp0_tp0`).
ATOM_PROFILER_MORE	bool	0 (false)	When `ATOM_TORCH_PROFILER_DIR` is set and this is `1`, enables detailed profiling: `record_shapes`, `with_stack`, and `profile_memory`.
ATOM_LOG_MORE	bool	0 (false)	If set to `1`, use verbose logging format (includes process name, PID, path, line number, function name).

Benchmarks (Optional)

Variable	Type	Default	Description
OPENAI_API_KEY	str	—	API key for OpenAI-compatible benchmark requests.
VLLM_USE_MODELSCOPE	bool	false	If set to `true`, use ModelScope for model downloads in benchmarks.
SAVE_TO_PYTORCH_BENCHMARK_FORMAT	bool	false	If set, save benchmark results in PyTorch benchmark format.

Internal / Set by ATOM

The following variables are set internally by ATOM; users typically do not need to configure them:

Variable	Description
AITER_QUICK_REDUCE_QUANTIZATION	Set to `INT4` for Llama models with bf16/fp16.
TORCHINDUCTOR_CACHE_DIR	Set by compiler interface for inductor cache.
TRITON_CACHE_DIR	Set by compiler interface for Triton cache.

Reference

Environment variables are defined and accessed via atom.utils.envs:

from atom.utils import envs

# Example: check data parallel size
dp_size = envs.ATOM_DP_SIZE

See atom/utils/envs.py for the full list of lazy-evaluated environment variables.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ATOM Environment Variables

Data Parallelism

Model Loading

Plugin Mode

Kernel / Backend Selection

Fusion Passes

TP AllReduce Fusion

DeepSeek-style

Qwen3-MoE style

Llama-style

Profiling & Debugging

Benchmarks (Optional)

Internal / Set by ATOM

Reference

FilesExpand file tree

environment_variables.md

Latest commit

History

environment_variables.md

File metadata and controls

ATOM Environment Variables

Data Parallelism

Model Loading

Plugin Mode

Kernel / Backend Selection

Fusion Passes

TP AllReduce Fusion

DeepSeek-style

Qwen3-MoE style

Llama-style

Profiling & Debugging

Benchmarks (Optional)

Internal / Set by ATOM

Reference