
ggml-cpu: add RVV repack GEMM and GEMV for Q3_K, Q6_K #12

Open
taimur-10x wants to merge 13 commits into master from 10x/riscv-quant-repack-k


taimur-10x commented Mar 15, 2026

Summary

This PR extends the existing repacking and GEMM/GEMV kernels to Q3_K and Q6_K on RVV, for VLEN from 128 to 1024 bits.

Key Changes

  • Added RVV repack GEMM and GEMV kernels for:
    • Q3_K
    • Q6_K
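Repacking is typically done so that a single vector load can feed several output columns at once. The sketch below illustrates the general idea under a simplifying assumption: weight rows are interleaved at whole-super-block granularity. The helper `interleave_rows` is hypothetical, and the PR's actual layout may interleave individual fields (scales, low bits, high bits) differently. The byte sizes follow ggml's `block_q3_K` and `block_q6_K` definitions with QK_K = 256.

```python
QK_K = 256  # weights per K-quant super-block in ggml

# block_q3_K: hmask[QK_K/8] + qs[QK_K/4] + scales[12] + fp16 super-block scale d
Q3_K_BYTES = QK_K // 8 + QK_K // 4 + 12 + 2          # 110 bytes
# block_q6_K: ql[QK_K/2] + qh[QK_K/4] + int8 scales[QK_K/16] + fp16 d
Q6_K_BYTES = QK_K // 2 + QK_K // 4 + QK_K // 16 + 2  # 210 bytes


def interleave_rows(rows, n_interleave):
    """Hypothetical repack: take n_interleave weight rows at a time and, for
    each super-block index, emit the blocks of every row in the group together.

    rows: list of rows, each a list of super-blocks (opaque objects here).
    Returns a flat list of tuples, one tuple per (group, super-block index).
    """
    if len(rows) % n_interleave:
        raise ValueError("row count must be a multiple of n_interleave")
    n_blocks = len(rows[0])
    out = []
    for g in range(0, len(rows), n_interleave):
        group = rows[g:g + n_interleave]
        for b in range(n_blocks):
            # All rows' b-th super-blocks become adjacent in memory.
            out.append(tuple(row[b] for row in group))
    return out


print(Q3_K_BYTES * 8 / QK_K, Q6_K_BYTES * 8 / QK_K)  # bits per weight
print(interleave_rows([["a0", "a1"], ["b0", "b1"]], 2))
```

Under this simplified layout, with the 16-column interleave used at VLEN=256, one group would hold 16 x 210 = 3360 bytes of Q6_K data (or 16 x 110 = 1760 bytes of Q3_K data) per super-block index.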

Tile Sizes

| VLEN (bits) | Tiling | LHS | RHS | OUT |
|---|---|---|---|---|
| 128 | 4, 8, 1 | 4x1 | 8x1 | 4x8 |
| 256 | 4, 16, 1 | 4x1 | 16x1 | 4x16 |
| 512 | 4, 32, 1 | 4x1 | 32x1 | 4x32 |
| 1024 | 4, 64, 1 | 4x1 | 64x1 | 4x64 |
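Every row of the tile table follows one pattern: the LHS tile is always 4x1 and the RHS width is VLEN / 16, giving a 4 x (VLEN/16) output tile. The hypothetical helper `tile_shape` below just encodes that pattern so it can be checked against the table. The PR does not say why VLEN / 16 was chosen, and reading the trailing 1 as one step along K per tile is an assumption.

```python
def tile_shape(vlen_bits):
    """Return (lhs, rhs, out) tile shapes as (rows, cols) for a given VLEN.

    Pattern read off the table: M = 4 LHS rows, N = VLEN / 16 RHS columns,
    K = 1 (assumed to be the shared reduction step).
    """
    if vlen_bits not in (128, 256, 512, 1024):
        raise ValueError("the PR covers VLEN = 128 .. 1024 only")
    m, n, k = 4, vlen_bits // 16, 1
    return (m, k), (n, k), (m, n)


for vlen in (128, 256, 512, 1024):
    lhs, rhs, out = tile_shape(vlen)
    print(f"VLEN={vlen}: LHS {lhs[0]}x{lhs[1]} RHS {rhs[0]}x{rhs[1]} OUT {out[0]}x{out[1]}")
```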

Testing

The kernels were functionally tested on QEMU at VLENs from 128 to 1024 bits, across a range of input sizes.
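A functional check of this kind can be sketched as comparing a tiled loop nest against a naive reference over several VLENs and matrix shapes, including shapes that leave partial tiles. This is a plain-Python illustration with small integer inputs (so results are exact), assuming the 4 x (VLEN/16) output tile from the table; the function names are hypothetical, and it does not reproduce the PR's QEMU tests or the Q3_K/Q6_K formats.

```python
import random


def naive_matmul(a, b):
    """Reference: out[i][j] = sum_k a[i][k] * b[j][k] (B stored one row per output column)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in b] for row in a]


def tiled_matmul(a, b, tile_m, tile_n):
    """Hypothetical tiled loop nest: each tile_m x tile_n output block is
    accumulated one K step at a time; edge tiles are clipped with min()."""
    m, n, k = len(a), len(b), len(a[0])
    out = [[0] * n for _ in range(m)]
    for i0 in range(0, m, tile_m):
        for j0 in range(0, n, tile_n):
            for kk in range(k):
                for i in range(i0, min(i0 + tile_m, m)):
                    for j in range(j0, min(j0 + tile_n, n)):
                        out[i][j] += a[i][kk] * b[j][kk]
    return out


def check(vlen_bits, m, n, k, seed=0):
    """Compare the tiled result against the reference for one shape."""
    rng = random.Random(seed)
    a = [[rng.randint(-32, 31) for _ in range(k)] for _ in range(m)]
    b = [[rng.randint(-32, 31) for _ in range(k)] for _ in range(n)]
    return tiled_matmul(a, b, 4, vlen_bits // 16) == naive_matmul(a, b)


results = {
    (v, m, n, k): check(v, m, n, k)
    for v in (128, 256, 512, 1024)
    for (m, n, k) in ((4, 64, 8), (5, 70, 3), (12, 128, 16))
}
print(all(results.values()))
```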

Benchmarking Results

End-to-end benchmarks on a Banana Pi BPI-F3 (VLEN=256):

Q3_K

Prompt Processing

| Model | Prompt size (tokens) | Repack GEMM 4x16x1 (Tok/s) | Vec Dot (Tok/s) |
|---|---|---|---|
| TinyLlama 1.1B Q3_K | 32 | 24.9 | 11.52 |
| TinyLlama 1.1B Q3_K | 64 | 25.4 | 11.86 |
| TinyLlama 1.1B Q3_K | 128 | 25.16 | 10.66 |
| TinyLlama 1.1B Q3_K | 256 | 24.97 | 10.54 |
| TinyLlama 1.1B Q3_K | 512 | 23.53 | 10.42 |

Token Generation

| Model | Tokens generated | Repack GEMV 1x16x1 (Tok/s) | Vec Dot (Tok/s) |
|---|---|---|---|
| TinyLlama 1.1B Q3_K | 10 | 6.04 | 9.26 |
| TinyLlama 1.1B Q3_K | 16 | 6.26 | 9.17 |
| TinyLlama 1.1B Q3_K | 32 | 6.05 | 8.99 |
| TinyLlama 1.1B Q3_K | 64 | 5.98 | 7.82 |
| TinyLlama 1.1B Q3_K | 100 | 5.91 | 7.66 |
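The two benchmark groups exercise different kernels: prompt processing multiplies many activation rows at once and can use the 4-row GEMM tile, whereas token generation feeds one new token per step and so goes through the 1-row GEMV. A simplified dispatcher, assumed here for illustration (`plan_calls` is hypothetical and not ggml's API), splits the rows into 4-row GEMM work plus single-row GEMV calls for any remainder.

```python
def plan_calls(n_rows, gemm_rows=4):
    """Split n_rows activation rows into one GEMM call covering the largest
    multiple of gemm_rows, then one GEMV call per leftover row.

    Each entry is (kernel, first_row, end_row) with end_row exclusive.
    """
    calls = []
    full = n_rows - n_rows % gemm_rows
    if full:
        calls.append(("gemm", 0, full))
    for r in range(full, n_rows):
        calls.append(("gemv", r, r + 1))
    return calls


print(plan_calls(32))  # a 32-token prompt: all rows go through GEMM
print(plan_calls(1))   # one generated token per step: GEMV only
print(plan_calls(6))   # 4 rows via GEMM, 2 leftover rows via GEMV
```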

Q6_K

Prompt Processing

| Model | Prompt size (tokens) | Repack GEMM 4x16x1 (Tok/s) | Vec Dot (Tok/s) |
|---|---|---|---|
| TinyLlama 1.1B Q6_K | 32 | 20.34 | 10.67 |
| TinyLlama 1.1B Q6_K | 64 | 21.11 | 10.29 |
| TinyLlama 1.1B Q6_K | 128 | 20.29 | 10.37 |
| TinyLlama 1.1B Q6_K | 256 | 20.73 | 10.48 |
| TinyLlama 1.1B Q6_K | 512 | 19.92 | 10.11 |

Token Generation

| Model | Tokens generated | Repack GEMV 1x16x1 (Tok/s) | Vec Dot (Tok/s) |
|---|---|---|---|
| TinyLlama 1.1B Q6_K | 10 | 5.63 | 7.08 |
| TinyLlama 1.1B Q6_K | 16 | 5.76 | 7.05 |
| TinyLlama 1.1B Q6_K | 32 | 5.54 | 7.06 |
| TinyLlama 1.1B Q6_K | 64 | 5.61 | 6.67 |
| TinyLlama 1.1B Q6_K | 100 | 5.44 | 6.75 |
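To compare the two paths directly, the throughput ratio repack / vec dot can be computed for each row of the four benchmark tables. The numbers below are copied from those tables; a ratio above 1.0 means the repack path is faster.

```python
# (repack Tok/s, vec dot Tok/s) pairs, copied from the benchmark tables.
TABLES = {
    "Q3_K prompt processing": [(24.9, 11.52), (25.4, 11.86), (25.16, 10.66), (24.97, 10.54), (23.53, 10.42)],
    "Q3_K token generation": [(6.04, 9.26), (6.26, 9.17), (6.05, 8.99), (5.98, 7.82), (5.91, 7.66)],
    "Q6_K prompt processing": [(20.34, 10.67), (21.11, 10.29), (20.29, 10.37), (20.73, 10.48), (19.92, 10.11)],
    "Q6_K token generation": [(5.63, 7.08), (5.76, 7.05), (5.54, 7.06), (5.61, 6.67), (5.44, 6.75)],
}


def speedups(pairs):
    """Per-row ratio repack / vec dot; values above 1.0 favour the repack kernels."""
    return [repack / vec_dot for repack, vec_dot in pairs]


for name, pairs in TABLES.items():
    s = speedups(pairs)
    print(f"{name}: {min(s):.2f}x .. {max(s):.2f}x")
```

Worked out this way, repack GEMM delivers roughly 1.9x to 2.4x the vec-dot prompt-processing throughput, while repack GEMV runs at roughly 0.65x to 0.84x of vec dot for token generation on this board.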

taimur-10x self-assigned this Mar 15, 2026
github-actions bot added the ggml label Mar 15, 2026