
[BUG] incorrect spin-loop synchronization #3117

@gonzalobg


Which component has the problem?

CuTe DSL

Bug Report

Describe the bug
The following sample uses a spin-loop to acquire system-scope writes, but its load uses `relaxed` instead of `acquire` semantics:

```python
res = cute.arch.load(flag.llvm_ptr, cutlass.Int32, sem="relaxed", scope="gpu")
```

It then issues a warp barrier. However, a warp barrier only orders memory operations among the threads of that same warp. It does not establish ordering with writes from other warps or other thread blocks, which is exactly what the example relies on.

Expected behavior
I expected the spin-loop to use acquire memory ordering.
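For illustration only, the sketch below expresses the acquire/release message-passing pattern that the expected behavior relies on in standard C++ (`std::atomic`) rather than CuTe DSL. It runs on the host and is not the sample's actual code. The names `payload`, `flag`, `producer`, `consumer_acquire`, `consumer_relaxed_then_fence` and `run_once` are hypothetical. The producer writes ordinary data and then publishes a flag with a release store. The consumer spins on the flag with an acquire load, which guarantees that the data is visible once the loop exits.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical shared state: `payload` is ordinary (non-atomic) data written by
// the producer; `flag` signals that the payload is ready.
int payload = 0;
std::atomic<int> flag{0};

void producer() {
    payload = 42;                              // plain write of the data
    flag.store(1, std::memory_order_release);  // publish: the write above happens-before any acquire that sees 1
}

int consumer_acquire() {
    // Spin until the flag is observed. The acquire load synchronizes with the
    // release store, so the payload write is visible after the loop exits.
    while (flag.load(std::memory_order_acquire) != 1) {
    }
    return payload;
}

int consumer_relaxed_then_fence() {
    // Alternative: poll with relaxed loads, then issue an acquire fence once the
    // flag has been seen. The fence, not the relaxed load alone, provides ordering.
    while (flag.load(std::memory_order_relaxed) != 1) {
    }
    std::atomic_thread_fence(std::memory_order_acquire);
    return payload;
}

// Runs one producer/consumer pair and returns the payload the consumer observed.
int run_once(int (*consumer)()) {
    payload = 0;
    flag.store(0, std::memory_order_relaxed);  // reset before threads start (thread creation synchronizes)
    int observed = -1;
    std::thread c([&] { observed = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();                                  // join makes `observed` safe to read here
    return observed;
}
```

With this setup, `run_once(consumer_acquire)` returns 42. The second consumer shows the other valid form in the C++ model: relaxed polling is acceptable only when an acquire fence follows it. A relaxed load on its own gives no ordering guarantee for `payload`, which is the problem this report describes.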

Additional context
This is a customer request (ping me internally for more details if needed).
