Enable CPU Optimizer Support for bitsandbytes #1901

Open
jiqing-feng wants to merge 14 commits into bitsandbytes-foundation:main from jiqing-feng:cpu
Conversation


@jiqing-feng jiqing-feng commented Mar 18, 2026

Summary

This PR enables all bitsandbytes optimizers (32-bit and 8-bit blockwise) to run on CPU. Previously optimizers were restricted to CUDA/XPU only.

Motivation

Users on CPU-only machines had to fall back to vanilla PyTorch optimizers, losing the benefits of 8-bit state compression. This PR removes that limitation.

Changes

Python CPU Kernels (bitsandbytes/backends/cpu/ops.py)

  • Implemented _optimizer_update_32bit_cpu (Adam, AdEMAMix, LAMB/LARS, Lion, SGD, RMSprop)
  • Implemented _optimizer_update_8bit_blockwise_cpu with blockwise quantization/dequantization
  • Fixed AdEMAMix m1/m2 interleaved state layout
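To make the 32-bit CPU path concrete, here is a minimal sketch of what an Adam-style update on plain CPU tensors looks like. The function name `adam_update_32bit_cpu` and its signature are illustrative assumptions for this example, not the PR's actual kernel API.

```python
# Illustrative sketch only: the real kernel in bitsandbytes/backends/cpu/ops.py
# covers many optimizers; this shows the Adam case on float32 CPU tensors.
import torch

def adam_update_32bit_cpu(p, grad, m, v, step, lr=1e-3,
                          beta1=0.9, beta2=0.999, eps=1e-8):
    """In-place, bias-corrected Adam step on CPU tensors."""
    m.mul_(beta1).add_(grad, alpha=1 - beta1)            # first moment
    v.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)  # second moment
    m_hat = m / (1 - beta1 ** step)                      # bias correction
    v_hat = v / (1 - beta2 ** step)
    p.sub_(lr * m_hat / (v_hat.sqrt() + eps))

p = torch.zeros(4)
g = torch.ones(4)
m, v = torch.zeros(4), torch.zeros(4)
adam_update_32bit_cpu(p, g, m, v, step=1)
```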

Optimizer Framework (bitsandbytes/optim/optimizer.py, bitsandbytes/functional.py)

  • get_state_buffer: CPU uses regular tensors; paged optimizers fall back to non-paged with warning
  • to_gpu: skips CPU parameters
  • is_on_gpu: accepts all-CPU tensor sets, rejects mixed CPU/GPU
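The device-consistency rule above (accept an all-CPU set, reject a mixed set) can be sketched as follows. The helper name `check_same_device` is an assumption for illustration; it is not the actual `is_on_gpu` implementation.

```python
# Sketch of the mixed-device check described above (illustrative names).
import torch

def check_same_device(tensors):
    """Allow an all-CPU or all-GPU tensor set; reject mixed CPU/GPU sets."""
    devices = {t.device.type for t in tensors if t is not None}
    if len(devices) > 1:
        raise RuntimeError(f"Tensors span multiple device types: {devices}")
    return devices == {"cuda"}  # True only for an all-GPU set

a = torch.zeros(2)
b = torch.ones(2)
on_gpu = check_same_device([a, b])  # all-CPU set is accepted, returns False
```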

Native C++ Kernels (csrc/cpu_ops.cpp, csrc/cpu_ops.h, csrc/pythonInterface.cpp)

  • Replaced per-element binary search with LUT-based quantization (4-slot cached LUT with content fingerprinting)
  • Added quantize_cpu_bf16 / quantize_cpu_fp16 for direct half-precision quantization
  • Fixed 8-bit blockwise kernel for bf16/fp16 inputs
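The LUT idea replaces a per-element binary search over the 256-entry codebook with a one-time precomputed table: discretize the normalized input range into uniform buckets and store the nearest code index for each bucket, so quantizing a value becomes a single table load. This Python sketch shows the principle under stated assumptions (a sorted codebook over [-1, 1], an arbitrary bucket count); the actual C++ kernel, its 4-slot LUT cache, and its fingerprinting are not reproduced here.

```python
# Sketch of LUT-based blockwise quantization; bucket count and names are
# illustrative, and the codebook below is a stand-in for the real one.
import numpy as np

def build_lut(code, buckets=4096):
    """Precompute the nearest-code index for each uniform bucket in [-1, 1]."""
    centers = -1.0 + (np.arange(buckets) + 0.5) * (2.0 / buckets)
    idx = np.clip(np.searchsorted(code, centers), 1, len(code) - 1)
    left, right = code[idx - 1], code[idx]
    # pick whichever neighbor is closer to the bucket center
    return np.where(centers - left <= right - centers, idx - 1, idx).astype(np.uint8)

def quantize_block(x, lut):
    """Quantize one block: normalize by absmax, then one LUT load per value."""
    absmax = np.abs(x).max() or 1.0
    normed = np.clip(x / absmax, -1.0, 1.0)
    buckets = len(lut)
    b = np.minimum(((normed + 1.0) * 0.5 * buckets).astype(np.int64), buckets - 1)
    return lut[b], absmax

code = np.linspace(-1.0, 1.0, 256)  # stand-in for the dynamic quantization map
lut = build_lut(code)
q, absmax = quantize_block(np.array([0.5, -0.25, 1.0]), lut)
```

Building the LUT costs one pass over the buckets, but every subsequent element is quantized in O(1) instead of O(log n), which is where the speedup over per-element binary search comes from.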

Tests (tests/test_optim.py)

  • Removed no_cpu=True filters — all optimizer tests now run on CPU
  • Paged optimizer variants auto-skip on CPU

Example (examples/cpu/cpu_training.py)

  • End-to-end CPU training with JackFram/llama-68m + Alpaca, supporting multiple optimizers, --compare mode, and HF Trainer integration

Supported Optimizers on CPU

| Optimizer       | 32-bit | 8-bit Blockwise |
|-----------------|--------|-----------------|
| Adam / AdamW    | ✓      | ✓               |
| SGD (Momentum)  | ✓      | ✓               |
| Lion            | ✓      | ✓               |
| RMSprop         | ✓      | ✓               |
| LARS            | ✓      | —               |
| LAMB            | ✓      | —               |
| AdEMAMix        | ✓      | ✓               |

Note: Paged variants (e.g., PagedAdamW) fall back to non-paged on CPU. LARS/LAMB 8-bit blockwise is not available upstream.
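The paged-to-non-paged fallback in the note above could look roughly like this. The signature of `get_state_buffer` here is a guess for illustration; the real function in `bitsandbytes/functional.py` may differ.

```python
# Hedged sketch of the paged-on-CPU fallback (illustrative signature).
import warnings
import torch

def get_state_buffer(shape, dtype, device, paged=False):
    if paged and device.type == "cpu":
        warnings.warn("Paged optimizers are not supported on CPU; "
                      "falling back to a regular (non-paged) state buffer.")
        paged = False
    if paged:
        # On CUDA this would allocate paged/unified memory (not shown here).
        raise NotImplementedError("paged allocation sketch omitted")
    return torch.zeros(shape, dtype=dtype, device=device)

buf = get_state_buffer((8,), torch.float32, torch.device("cpu"), paged=True)
```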

How to Test

pytest tests/test_optim.py -x -v -k "cpu"
python examples/cpu/cpu_training.py --optimizer adamw8bit --steps 20

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
@jiqing-feng jiqing-feng marked this pull request as draft March 18, 2026 02:25
@jiqing-feng jiqing-feng marked this pull request as ready for review March 18, 2026 02:41
@jiqing-feng
Contributor Author

Hi @matthewdouglas
Please review this PR. Thanks!

@matthewdouglas matthewdouglas added this to the v0.50.0 milestone Mar 18, 2026
@matthewdouglas matthewdouglas added the Intel, x64 CPU, and Optimizers (issues or feature requests relating to optimizers) labels Mar 18, 2026
@github-actions

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@jiqing-feng
Contributor Author

Hi @matthewdouglas. The failed tests seem to be because the CI node does not have AVX-512. Please rerun the CI to see if I fixed this issue.
