
feat(ascend): add 9 Ascend operator kernels#47

Open
zhangyue207 wants to merge 8 commits into feat/ascend-framework from feat/ascend-operators

Conversation

@zhangyue207
Collaborator

Add, RmsNorm, Swiglu, Matmul, CausalSoftmax, AddRmsNorm, ReshapeAndCache, RotaryEmbedding, FlashAttention.

zhangyue added 8 commits April 8, 2026 16:48
Add, RmsNorm, Swiglu, Matmul, CausalSoftmax, AddRmsNorm,
ReshapeAndCache, RotaryEmbedding, FlashAttention.
Pass stream to all CANN ops in existing tests; add FlashAttention,
ReshapeAndCache, RotaryEmbedding, and E2E LLaMA layer tests.
…/Linear/Mul operators

Descriptor caching (`AclTensorCache` + `aclSetRawTensorAddr`), executor caching
(`aclSetAclOpExecutorRepeatable`), D2H sync elimination, `add_rms_norm` decomposition,
and `WorkspacePool` thread-local fast path. Host dispatch dropped from ~255 us/call to
17-57 us/call for all cacheable operators. New operators: Cast (`aclnnCast`), Cat
(`aclnnCat` with TensorList executor caching), Linear (`aclnnAddmm`/`aclnnBaddbmm`/
`aclnnMatmul`), Mul (`aclnnMul`). Full regression: 2040 passed, 0 failed.
Use `unique_ptr<WorkspaceArena>` in the arena map so that thread-local
cached pointers remain valid across `unordered_map` rehashes.  Remove
unused `detail::reshapeView` helper from FlashAttention.
…tion

Normalize negative `dim` in the base class constructor (e.g. -1 → last
dimension).  Add comment in the Ascend kernel explaining why
`aclSetRawTensorAddr` on TensorList-contained descriptors is sufficient
without `aclSetInputTensorAddr`.  Add negative-dim test case.
