DWCAS fun

Hi Martin,

There are a lot of interesting utilities in your repo. This one in particular caught my attention because I can see you're using the DWCAS (double-width compare-and-swap) lock-free technique.

Given that I occasionally go down the lock-free programming rabbit hole myself, I hope you don't mind me sharing a few findings here.

According to the [x86-64 manual](https://www.felixcloutier.com/x86/cmpxchg8b:cmpxchg16b):

> Note that CMPXCHG16B requires that the destination (memory) operand be 16-byte aligned.

So the first precondition for DWCAS appears to be that the destination operand is 16-byte aligned.

I don't see that being the case here:
https://github.com/avakar/intrusive_lfstack/blob/ab522c2f8dfd64f6e09317d1e89f8a17c9b64371/include/avakar/intrusive/lfstack.h#L24
so perhaps the declaration should be changed to
> struct alignas(16) _next_t

in order to actually enforce this guarantee?
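For what it's worth, here is a minimal sketch of how that guarantee could be spelled out and checked at compile time. The `next_t` layout below (a pointer plus a tag) is only my assumption, not the actual definition in `lfstack.h`, and `is_aligned_16` is a hypothetical helper added purely for illustration.

```cpp
#include <cstddef>
#include <cstdint>

struct node;  // hypothetical node type, only used through a pointer

// Assumed layout: a pointer plus an ABA tag. alignas goes between
// `struct` and the name so that it applies to the type itself.
struct alignas(16) next_t {
    node* ptr;
    std::uintptr_t tag;
};

// Fail the build early if the layout ever stops matching what
// CMPXCHG16B needs on x86-64.
static_assert(alignof(next_t) == 16, "next_t must be 16-byte aligned");
static_assert(sizeof(next_t) == 16, "next_t must be exactly 16 bytes");

// Hypothetical runtime check, e.g. for objects placed in raw storage.
inline bool is_aligned_16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}
```

The runtime check is only interesting for objects constructed in manually managed storage, where the compiler cannot enforce the alignment by itself.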

I also ran some experiments, both now and in the past (which is why this code resonated with me), and I remember that compiler support for DWCAS was a bit kludgy, to say the least.

My findings were, and still are, that support for DWCAS on GCC+Linux is essentially broken: no `CMPXCHG16B` instruction is emitted, only a call into `__atomic_compare_exchange`, which, as you already know, is free to use mutexes underneath, and which it probably does given the absence of `CMPXCHG16B` in the generated code. You can see this in this [godbolt example](https://godbolt.org/z/WE1oEK7sE) if you remove the `static_assert`.

It seems that even plain clang+Linux isn't sufficient unless you compile the code with `-march=native`; I found that `-mcx16` seems to have the same effect. This, however, doesn't help with GCC, and I think only very old versions of GCC handled it correctly.

The good thing is that at least a `static_assert` such as

> static_assert(std::atomic<_next_t>::is_always_lock_free);

fires correctly for all the combinations I tried.

The only real solution to all of this fun, as I see it, is to handcraft the assembly. Another option would be to use Boost, which already handles it [that way](https://github.com/boostorg/atomic/blob/f72bef98e0117fb2952235ae86bbc29ce8c50454/include/boost/atomic/detail/core_arch_ops_gcc_x86.hpp).
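To illustrate what I mean by handcrafting it, here is a rough sketch of a DWCAS wrapper using GCC/Clang extended inline assembly. It assumes an x86-64 target with `CMPXCHG16B` support, and the names `pair128` and `dwcas` are hypothetical. It is not a copy of what Boost does, just the same general idea.

```cpp
#include <cstdint>

// Hypothetical 16-byte payload; alignas(16) satisfies the alignment
// requirement of CMPXCHG16B.
struct alignas(16) pair128 {
    std::uint64_t lo;
    std::uint64_t hi;
};

// Atomically: if *dest == *expected, store desired into *dest and
// return true. Otherwise copy the current *dest into *expected and
// return false. Assumes x86-64 and a GCC-compatible compiler.
inline bool dwcas(pair128* dest, pair128* expected, pair128 desired) {
    bool ok;
    __asm__ __volatile__(
        "lock cmpxchg16b %1\n\t"  // compares RDX:RAX with the memory operand
        "sete %0"                 // ZF is set when the exchange happened
        : "=q"(ok), "+m"(*dest),
          "+a"(expected->lo), "+d"(expected->hi)
        : "b"(desired.lo), "c"(desired.hi)
        : "cc", "memory");
    return ok;
}
```

On failure the instruction loads the current value into `RDX:RAX`, which is why `expected` is updated in place, in the same way `compare_exchange_strong` does it.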

Cheers,
Adi
