Add 1-bit affine quantization support on top of mlx-swift 0.31.1#381
khosravipasha wants to merge 1 commit into ml-explore:main
Conversation
- Point mlx submodule to PrismML-Eng/mlx v0.31.1_prism (1-bit kernels on v0.31.1)
- Keep mlx-c at upstream v0.6.0 (global_scale support is native)
- Regenerate mlx-generated/ with 1-bit Metal kernel support
```diff
 [submodule "submodules/mlx"]
 	path = Source/Cmlx/mlx
-	url = https://github.com/ml-explore/mlx
+	url = https://github.com/PrismML-Eng/mlx.git
```
Todo reminder for myself: switch the url back to the main mlx repo; for now it is included as-is so you can test and run the code.
We will pick this up through ml-explore/mlx#3161 via the mlx and mlx-c dependencies.
@davidkoski Thanks, I see, so it happens automatically? I was running some of the Metal generation scripts to get these locally for our testing.
More or less -- every couple of weeks we will make a new mlx-c for the current mlx tag. Once that is done I will pick up all changes from mlx here. Due to some limitations in the Swift build machinery there are some manual steps. So yes, automatic in that all the changes are picked up; manual in that I have some steps to do (mostly making sure the build and tests are in good shape after each rev).
Feel free to manually integrate like this for testing. In fact, if you:
that is most of what I would do. The change you are making in this PR looks reasonable, so it should be good for testing.
Awesome, yeah, it worked pretty well in our testing; happy to know it's easy to do. It runs at 44 tok/s on an iPhone 17 Pro Max. We partnered with the Locally AI developer to do day-0 support for iPhones if you want to try it in the meantime :)
https://github.com/PrismML-Eng/Bonsai-demo/blob/main/1-bit-bonsai-8b-whitepaper.pdf

This PR adds 1-bit affine quantization kernels to mlx-swift, to support our newly released 1-bit Bonsai models on iPhones, iPads, etc.
With this PR, the 1-bit Bonsai 8B model runs on an iPhone 17 Pro at ~44 tok/s.
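For context, affine quantization at 1 bit stores each weight as a single bit plus a per-group scale and bias, so every weight in a group snaps to one of two levels. Below is a minimal NumPy sketch of that idea; the group size, helper names, and min/max parameter choice are illustrative assumptions, not the actual Metal kernels in this PR (which also bit-pack the quantized values):

```python
import numpy as np

def quantize_1bit(w, group_size=32):
    """Affine 1-bit quantization: within each group, w ~= q * scale + bias
    with q in {0, 1}. Real kernels would also pack q, 8 weights per byte."""
    g = w.reshape(-1, group_size)
    lo = g.min(axis=1, keepdims=True)
    hi = g.max(axis=1, keepdims=True)
    scale = np.maximum(hi - lo, 1e-8)                 # guard flat groups
    q = np.round((g - lo) / scale).astype(np.uint8)   # 0 or 1 per weight
    return q, scale, lo

def dequantize_1bit(q, scale, bias):
    # Invert the affine map used above.
    return q * scale + bias

w = np.random.default_rng(0).standard_normal((8, 32)).astype(np.float32)
q, scale, bias = quantize_1bit(w)
w_hat = dequantize_1bit(q, scale, bias).reshape(w.shape)
# Round-to-nearest means the per-weight error is at most half a step (scale / 2).
```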
In the meantime check out our demos here:
We might need some help with properly syncing this with mlx and handling some edge cases; in our testing all three of our models run okay, but we may have missed a few edge cases.
I tested the kernels by unpacking the models from 1-bit to fp16 and checking the KL divergence of the logits; happy to share model details.
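That kind of check can be sketched like this; a generic NumPy version of a logit KL comparison, not the actual test harness used here (function names and the perturbation used as a stand-in for quantization error are illustrative):

```python
import numpy as np

def log_softmax(x, axis=-1):
    # Numerically stable log-softmax.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def kl_from_logits(p_logits, q_logits, axis=-1):
    """KL(P || Q) per position, where P comes from the fp16 reference
    logits and Q from the dequantized model's logits."""
    lp = log_softmax(p_logits, axis=axis)
    lq = log_softmax(q_logits, axis=axis)
    return (np.exp(lp) * (lp - lq)).sum(axis=axis)

rng = np.random.default_rng(0)
ref = rng.standard_normal((4, 16))                  # stand-in fp16 logits
quant = ref + 0.01 * rng.standard_normal((4, 16))   # stand-in quantized logits
kl = kl_from_logits(ref, quant)                     # one KL value per row
```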
mlx PR: ml-explore/mlx#3161