
feat(ascend): add 9 Ascend operator kernels#47

Open
zhangyue207 wants to merge 8 commits into feat/ascend-framework from feat/ascend-operators

Conversation

@zhangyue207
Collaborator

Add, RmsNorm, Swiglu, Matmul, CausalSoftmax, AddRmsNorm, ReshapeAndCache, RotaryEmbedding, FlashAttention.

zhangyue added 8 commits April 8, 2026 16:48
Add, RmsNorm, Swiglu, Matmul, CausalSoftmax, AddRmsNorm,
ReshapeAndCache, RotaryEmbedding, FlashAttention.
Pass stream to all CANN ops in existing tests; add FlashAttention,
ReshapeAndCache, RotaryEmbedding, and E2E LLaMA layer tests.
…/Linear/Mul operators

Descriptor caching (`AclTensorCache` + `aclSetRawTensorAddr`), executor caching
(`aclSetAclOpExecutorRepeatable`), D2H sync elimination, `add_rms_norm` decomposition,
and `WorkspacePool` thread-local fast path. Host dispatch dropped from ~255 us/call to
17-57 us/call for all cacheable operators. New operators: Cast (`aclnnCast`), Cat
(`aclnnCat` with TensorList executor caching), Linear (`aclnnAddmm`/`aclnnBaddbmm`/
`aclnnMatmul`), Mul (`aclnnMul`). Full regression: 2040 passed, 0 failed.
Use `unique_ptr<WorkspaceArena>` in the arena map so that thread-local
cached pointers remain valid across `unordered_map` rehashes.  Remove
unused `detail::reshapeView` helper from FlashAttention.
…tion

Normalize negative `dim` in the base class constructor (e.g. -1 → last
dimension).  Add comment in the Ascend kernel explaining why
`aclSetRawTensorAddr` on TensorList-contained descriptors is sufficient
without `aclSetInputTensorAddr`.  Add negative-dim test case.
