Kernel-Smith is a GPU kernel generation system developed by the Shanghai Artificial Intelligence Laboratory and MetaX. The technical report is available here.
We do not currently plan to release the Kernel-Smith model weights or agent code. For now, this repository will focus on sharing generated kernels, benchmarks, and related documentation. Stay tuned.
- Uses an evolution-based optimization loop with stable evaluation on both NVIDIA Triton and MetaX MACA backends.
- Is trained to improve kernels, with rewards for changes that preserve correctness and increase performance.
- Outperforms frontier models such as Gemini-3.0-pro and Claude-4.6-opus on KernelBench.
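Because the Kernel-Smith agent code is not being released, the following is only a minimal Python sketch of the general idea in the bullets above: an evolutionary loop that keeps a candidate kernel only when it stays correct and runs faster. Every name here (`Candidate`, `reward`, `evolve`, `toy_mutate`, `toy_evaluate`) is hypothetical, and the "speedup minus one, zero if incorrect" reward shape is an assumption, not the report's formula. A real system would compile and benchmark kernels on the Triton or MACA backend and check outputs against a reference implementation instead of using the toy functions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    source: str        # kernel source text (hypothetical representation)
    correct: bool      # True if outputs matched the reference implementation
    latency_ms: float  # measured runtime on the target backend


def reward(c: Candidate, baseline_ms: float) -> float:
    """Assumed reward: incorrect kernels score 0; correct ones score speedup - 1, floored at 0."""
    if not c.correct or c.latency_ms <= 0:
        return 0.0
    speedup = baseline_ms / c.latency_ms
    return max(0.0, speedup - 1.0)


def evolve(
    seed: str,
    mutate: Callable[[str, int], List[str]],
    evaluate: Callable[[str], Candidate],
    generations: int = 5,
) -> Candidate:
    """Greedy evolutionary loop: each generation, keep the highest-reward candidate seen so far."""
    best = evaluate(seed)
    baseline_ms = best.latency_ms  # the seed kernel defines the baseline runtime
    for gen in range(generations):
        for src in mutate(best.source, gen):
            child = evaluate(src)
            # A child replaces the incumbent only if it is correct AND faster.
            if reward(child, baseline_ms) > reward(best, baseline_ms):
                best = child
    return best


# Toy stand-ins (hypothetical): each "+fast" edit speeds the kernel up,
# while any "+broken" edit makes it produce wrong results.
def toy_mutate(src: str, gen: int) -> List[str]:
    return [src + "+fast", src + "+broken"]


def toy_evaluate(src: str) -> Candidate:
    fast_edits = src.count("+fast")
    return Candidate(src, correct="+broken" not in src, latency_ms=10.0 / (1 + fast_edits))


best = evolve("k", toy_mutate, toy_evaluate)
print(best.source, round(reward(best, 10.0), 3))
```

In this toy run the broken variants always score zero and are discarded, so after five generations the loop keeps the kernel with five "+fast" edits, illustrating how a correctness gate in the reward prevents the search from trading accuracy for speed.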
Kernel-Smith generated kernels have already been integrated into several open-source projects:
| Project | Optimized Kernel | Impact | Pull Request |
|---|---|---|---|
| SGLang | normal_decode_set_metadata | 4.78x kernel acceleration | #20778 |
| LMDeploy | DeepSeek MoE Routing | 1.36x kernel acceleration | #4345 |
| DLBlas | DeepSeek Engram kernels | Accelerated architecture research | #102 |