GitHub - ZeroKernel798/Triton-CUDA-Lab: Reproduces and optimizes common deep learning operators using both CUDA and Triton, for learning and reference

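This project is a lab for operator development and performance analysis. It covers operator implementations from basic to cutting-edge (CUDA & Triton), provides a Pybind11-based automated testing and tuning framework, and integrates the nsys and ncu tools to support deeper performance analysis. Distributed operator implementations will be added later; the project is under active development.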

├── operators/                              # operator implementations
│   └── 04-matrix-multiplication/
│       ├── cuda/                           # CUDA kernel implementations
│       │   ├── native.cu
│       │   ├── float4.cu
│       │   ├── flattened_float4.cu
│       │   └── cublas.cu                   # cuBLAS
│       ├── triton/                         # Triton kernel implementations
│       │   └── main.py
│       └── test_cfg.py                     # operator test cases and parameter combinations for tuning
├── requirements.txt
├── build_lab.py                            # build script
├── run.sh                                  # test script
├── run_lab.py
├── utils/                                  # utility modules
│   ├── compiler.py
│   ├── logger.py
│   ├── prober.py
│   └── runner.py
└── README.md

Build commands

Build all operators

python3 build_lab.py --j 8

Build a specific operator (fuzzy name matching and multi-process compilation are supported)

python3 build_lab.py --op 04 --j 8
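To make the fuzzy matching and parallel build behavior more concrete, here is a minimal sketch of how a value such as `04` could be resolved to operator directories and how the matched operators could be built with a fixed number of workers. It is not taken from `build_lab.py`: the function names, the case-insensitive substring rule, and the directory names `01-vector-add` and `05-softmax` are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def match_operators(pattern, operator_dirs):
    """Return the directory names that contain `pattern` (case-insensitive).

    Assumption: "fuzzy matching" is modeled here as a plain substring test.
    """
    needle = pattern.lower()
    return sorted(name for name in operator_dirs if needle in name.lower())


def build_all(names, jobs, build_one):
    """Run the hypothetical `build_one` callback on each name with `jobs` workers."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # pool.map preserves the input order of `names` in its results.
        return list(pool.map(build_one, names))


# Hypothetical directory listing; only 04-matrix-multiplication appears in this repo's tree.
dirs = ["01-vector-add", "04-matrix-multiplication", "05-softmax"]
selected = match_operators("04", dirs)
print(selected)
```

A substring rule keeps short prefixes like `04` convenient, at the cost of possibly selecting more than one operator when the pattern is ambiguous.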

Clean build artifacts

python3 build_lab.py --clean

Running tests

Benchmark across the full range of sizes:

./run.sh 04-matrix-multiplication --mode cuda --bench_mode scaling

Test specific sizes:

./run.sh 04-matrix-multiplication --mode cuda --bench_mode tuning

Triton and CUDA together:

./run.sh 04-matrix-multiplication --mode all --bench_mode tuning

Profiling with nsys:

./run.sh 04-matrix-multiplication --profile

Profiling with ncu:

./run.sh 04-matrix-multiplication --profile --ncu
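For readers unfamiliar with the two profilers, the sketch below shows one way a wrapper could assemble an `nsys` or `ncu` command line around a benchmark process. It is only an illustration and not the contents of `run.sh`: the function name `profiler_command`, the report name, the choice of `--set full` and `--trace=cuda,nvtx`, and the bare `python3 run_lab.py` target are assumptions.

```python
def profiler_command(target, use_ncu, report="report"):
    """Build an argv list that wraps `target` in Nsight Compute or Nsight Systems.

    Assumption: the real script may pass different options to either tool.
    """
    if use_ncu:
        # Nsight Compute: collect the full metric set and write the result to `report`.
        return ["ncu", "--set", "full", "-o", report, *target]
    # Nsight Systems: record a timeline of CUDA API calls and NVTX ranges.
    return ["nsys", "profile", "-o", report, "--trace=cuda,nvtx", *target]


# Hypothetical target invocation; the actual arguments of run_lab.py are not shown in this README.
target = ["python3", "run_lab.py"]
print(" ".join(profiler_command(target, use_ncu=False)))
print(" ".join(profiler_command(target, use_ncu=True)))
```

The resulting list can be passed to `subprocess.run` unchanged, which avoids shell quoting issues.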

Generate different performance curves

./run.sh 04-matrix-multiplication --mode cuda --bench_mode scaling --metric bw

./run.sh 04-matrix-multiplication --mode cuda --bench_mode scaling --metric ms

./run.sh 04-matrix-multiplication --mode cuda --bench_mode scaling --metric flops
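The three metrics are related through the measured latency. The sketch below shows the usual way to derive throughput and compute figures for a GEMM of shape (M, K) x (K, N) from a latency in milliseconds. It is an assumption about how `bw`, `ms`, and `flops` could be computed, not code from this repository: the function name, the returned keys, and the 4-byte element size are hypothetical, and the byte count is the minimum traffic (read A and B once, write C once).

```python
def gemm_metrics(m, n, k, latency_ms, bytes_per_element=4):
    """Derive latency, bandwidth, and compute metrics for one GEMM run."""
    seconds = latency_ms * 1e-3
    # Each of the m*n outputs needs k multiplies and k adds.
    flops = 2.0 * m * n * k
    # Minimum memory traffic: read A (m*k) and B (k*n), write C (m*n).
    bytes_moved = (m * k + k * n + m * n) * bytes_per_element
    return {
        "ms": latency_ms,
        "bw_gb_s": bytes_moved / seconds / 1e9,
        "tflops": flops / seconds / 1e12,
    }


result = gemm_metrics(1024, 1024, 1024, latency_ms=1.0)
print(result)
```

Because the byte count is a lower bound, a bandwidth figure computed this way is an effective bandwidth and can differ from the DRAM traffic that a profiler reports.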

GEMM Benchmark:

Latency

Throughput

Compute
