Description
Implement modular forward and backward CUDA kernels for the GDN (Gated Delta Network) algorithm, compatible with Kimi CP.
Context
GDN is a linear attention variant that improves expressiveness with gating and delta-style updates. Implementing modular forward/backward kernels compatible with Kimi CP would enable efficient distributed training via context parallelism.
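As a point of reference for what the kernels must compute, here is a naive sequential sketch of a gated delta-rule recurrence of the form commonly associated with GDN. The exact update used here (per-step gate `alpha_t`, delta step size `beta_t`, and the rank-1 state correction) is an assumption for illustration, not the definitive formulation this issue targets; the real CUDA kernels would compute the same recurrence in chunked/parallel form.

```python
import numpy as np

def gdn_forward_reference(q, k, v, alpha, beta):
    """Sequential reference for an assumed gated delta rule:
        S_t = alpha_t * S_{t-1} @ (I - beta_t * k_t k_t^T) + beta_t * v_t k_t^T
        o_t = S_t @ q_t
    Shapes: q, k: (T, d_k); v: (T, d_v); alpha, beta: (T,).
    """
    T, d_k = k.shape
    d_v = v.shape[1]
    S = np.zeros((d_v, d_k))   # recurrent state, carried across time steps
    out = np.empty((T, d_v))
    I = np.eye(d_k)
    for t in range(T):
        # Gated decay of the old state plus a delta-style rank-1 update
        # that writes v_t into the slot addressed by k_t.
        S = alpha[t] * S @ (I - beta[t] * np.outer(k[t], k[t])) \
            + beta[t] * np.outer(v[t], k[t])
        out[t] = S @ q[t]       # read out with the query
    return out
```

Because the state `S` is carried step to step, a CP-compatible kernel would need to pass `S` (and its gradient, for backward) across sequence shards, which is the crux of making the kernels modular.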
Tasks
References