
Add 1-bit affine quantization support on top of mlx-swift 0.31.1 #381

Closed
khosravipasha wants to merge 1 commit into ml-explore:main from PrismML-Eng:v0.31.1_prism

Conversation


@khosravipasha khosravipasha commented Mar 31, 2026

This PR includes 1-bit affine kernels for mlx-swift.

These kernels support our newly released 1-bit Bonsai models on iPhones, iPads, etc.
With this PR, the 1-bit Bonsai 8B model runs on an iPhone 17 Pro at ~44 tok/s.

In the meantime, check out our demos here:

We might need some help with properly syncing this with mlx and handling some edge cases. In our testing, all three of our models run okay, but we might have missed a few edge cases.
I tested the kernels by unpacking the models from 1-bit to fp16 and checking the KL divergence of the logits; happy to share model details.

mlx PR: ml-explore/mlx#3161
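The logit-level check described above can be sketched as follows. This is an illustrative harness, not the author's actual test code: the shapes and function names are made up, and only the KL step is shown (the fp16-vs-1-bit forward passes are stubbed with toy logits).

```python
# Sketch: compare logits from an fp16 reference model against the 1-bit
# path by computing per-row KL divergence KL(P_ref || Q_test).
import numpy as np

def log_softmax(x, axis=-1):
    # numerically stable log-softmax
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def kl_from_logits(p_logits, q_logits):
    """KL(P || Q) per row, where P comes from the fp16 reference."""
    log_p = log_softmax(p_logits)
    log_q = log_softmax(q_logits)
    return (np.exp(log_p) * (log_p - log_q)).sum(axis=-1)

rng = np.random.default_rng(0)
ref = rng.standard_normal((4, 32000))           # stand-in fp16 logits
test = ref + 0.01 * rng.standard_normal((4, 32000))  # stand-in 1-bit logits
print(kl_from_logits(ref, test).mean())         # small value => close match
```

A small mean KL over held-out prompts is a reasonable proxy for "the quantized model behaves like the reference" without needing bit-exact outputs.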

- Point mlx submodule to PrismML-Eng/mlx v0.31.1_prism (1-bit kernels on v0.31.1)
- Keep mlx-c at upstream v0.6.0 (global_scale support is native)
- Regenerate mlx-generated/ with 1-bit Metal kernel support
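For readers unfamiliar with the scheme: 1-bit affine quantization stores each weight as a single bit q ∈ {0, 1} and reconstructs it as q * scale + bias, with scale and bias kept per group. A minimal NumPy sketch follows; the group size and packing layout here are illustrative, not the actual layout used by the mlx kernels.

```python
# Sketch of per-group 1-bit affine (de)quantization.
import numpy as np

GROUP = 32  # hypothetical group size

def quantize_1bit(w):
    g = w.reshape(-1, GROUP)
    lo = g.min(axis=1, keepdims=True)
    hi = g.max(axis=1, keepdims=True)
    scale = hi - lo                      # affine scale, one per group
    bias = lo                            # affine bias (zero point), one per group
    safe = np.where(scale > 0, scale, 1)
    q = np.round((g - bias) / safe).astype(np.uint8)  # each entry is 0 or 1
    packed = np.packbits(q, axis=1)      # 32 one-bit weights -> 4 bytes
    return packed, scale, bias

def dequantize_1bit(packed, scale, bias):
    q = np.unpackbits(packed, axis=1, count=GROUP)
    return (q * scale + bias).reshape(-1)  # q*scale + bias, per group

w = np.random.default_rng(0).standard_normal(4 * GROUP).astype(np.float32)
packed, scale, bias = quantize_1bit(w)
w_hat = dequantize_1bit(packed, scale, bias)
# each weight is reconstructed as its group's min or max, so the
# worst-case error is half the group's range
assert np.abs(w - w_hat).max() <= scale.max() / 2 + 1e-5
```

With 1 bit per weight plus per-group scale/bias, an 8B-parameter model's weights fit in roughly 1 GB before the per-group metadata, which is what makes on-device inference on an iPhone plausible.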
Comment thread on .gitmodules:

  [submodule "submodules/mlx"]
      path = Source/Cmlx/mlx
  -   url = https://github.com/ml-explore/mlx
  +   url = https://github.com/PrismML-Eng/mlx.git
Author

Todo reminder for myself: switch the url back to the main mlx repo. For now, it is included as-is so you can test and run the code.

@davidkoski
Collaborator

We will pick this up through ml-explore/mlx#3161 via the mlx and mlx-c dependencies.

@davidkoski davidkoski closed this Apr 1, 2026
@khosravipasha
Author

@davidkoski Thanks! I see, so it happens automatically? I had been running some of the Metal generation scripts to get these locally for our testing.

@davidkoski
Collaborator

More or less -- every couple of weeks we make a new mlx-c release for the current mlx tag. Once that is done, I pick up all the changes from mlx here. Due to some limitations in the Swift build machinery, there are scripts in tools/ that help with the prep.

Yes, it is automatic in that all the changes are picked up, and manual in that I have some steps to do (mostly making sure the build and tests are in good shape after each rev).

@davidkoski
Collaborator

Feel free to manually integrate like this for testing. In fact if you:

  • repoint the Source/Cmlx/mlx to your fork
  • run tools/update-mlx.sh

that is most of what I would do. The change you are making in this PR looked reasonable, so it should be good for testing.
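The two steps above can be sketched as follows. The commands are illustrative and are demonstrated on a throwaway repo so they run anywhere; the submodule name, path, and fork URL come from this PR's .gitmodules change.

```shell
# Illustrative: repoint the mlx submodule URL at a fork. In a real
# mlx-swift checkout you would run the `git config` line at the repo
# root and then follow with tools/update-mlx.sh.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q .
# .gitmodules entry as it appears in mlx-swift
printf '[submodule "submodules/mlx"]\n\tpath = Source/Cmlx/mlx\n\turl = https://github.com/ml-explore/mlx\n' > .gitmodules
# repoint the submodule at the fork (the change this PR makes)
git config -f .gitmodules submodule.submodules/mlx.url https://github.com/PrismML-Eng/mlx.git
cat .gitmodules
# in a real checkout, follow with:
#   git submodule sync && git submodule update --init Source/Cmlx/mlx
#   ./tools/update-mlx.sh
```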

@khosravipasha
Author

Awesome, yeah, it worked pretty well in our testing; happy to know it's easy to do.

It runs at 44 tok/s on an iPhone 17 Pro Max.

We partnered with the Locally AI developer to do day-0 support for iPhones, if you want to try it in the meantime :)
https://x.com/adrgrondin/status/2039066539022778613?s=20


https://github.com/PrismML-Eng/Bonsai-demo/blob/main/1-bit-bonsai-8b-whitepaper.pdf
