[DEPRECATED] Moved to ROCm/rocm-systems repo
-
Updated
Mar 31, 2026 - Python
[DEPRECATED] Moved to ROCm/rocm-systems repo
Online CUDA Occupancy Calculator
(Spring 2017) Assignment 2: GPU Executor
Runs a single CUDA/OpenCL kernel, taking its source from a file and arguments from the command-line
GPU Drano Static Analysis for GPU programs.
Prototype for a SPIR-V assembler and dissasembler. It provides a composable Java interface for generating SPIR-V code at runtime.
Open source skill library for AI coding agents to write, optimize, and debug high performance compute kernels across CUDA, Triton, and quantized workloads.
A self-hosted low-level functional-style programming language 🌀
Noeris — autonomous kernel fusion discovery + drop-in LLM training accelerator. Cross-op fusion, autotuning, beats cuDNN on sliding-window. pip install noeris.
High-performance GPU-accelerated C# scripting for Rhino Grasshopper, powered by ILGPU
🍭 Sweet GPU compute kernels in CUDA, wrapped via CuPy
Medical AI diagnostics system implementing real compiled Mojo GPU kernels with MAX Graph integration
Runtime correctness checker for custom CUDA kernels. Attach a single decorator to periodically verify outputs against a reference implementation, with outlier-biased sampling and zero training graph impact.
16-step CUDA optimization of FlashAttention-2 achieving 99.2% of official performance on A100 — Ampere architecture
High-performance Triton kernels for NVIDIA H100. Implements fused FP8 LayerNorm, tiled FlashAttention, and SRAM-optimized memory primitives for Hopper architecture.
LLM primitives rebuilt in Triton — FlashAttention 2.52×, fused AdamW 3.45×, Bias+GELU 14.65× faster than PyTorch
Benchmarking hand-written CUDA C, Numba, and Triton self-attention kernels against PyTorch's SDPA - how fast can you go depending on the tool?
Add a description, image, and links to the gpu-kernels topic page so that developers can more easily learn about it.
To associate your repository with the gpu-kernels topic, visit your repo's landing page and select "manage topics."