Hi Martin,
There are a lot of interesting utilities in your repo. This one in particular caught my attention because I can see you're using the DWCAS lock-free technique.
Given that I occasionally go down the lock-free programming rabbit hole myself, I hope you don't mind me sharing a few observations here.
According to the x86-64 manual:
"Note that CMPXCHG16B requires that the destination (memory) operand be 16-byte aligned."
So the first precondition for DWCAS appears to be that the destination operand must be 16-byte aligned.
Given that I don't see this being the case here (intrusive_lfstack/include/avakar/intrusive/lfstack.h, line 24 in ab522c2), perhaps this should be changed to
struct alignas(16) _next_t
in order to really guarantee it?
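To make the alignment point concrete, here is a minimal sketch. The `next_t` struct and its members are hypothetical stand-ins, not the actual `_next_t` from lfstack.h, and `is_16_byte_aligned` is a helper I made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the stack header; the real _next_t may differ.
// Note that alignas goes after the `struct` keyword.
struct alignas(16) next_t {
    void* ptr;           // pointer to the next node
    std::uintptr_t tag;  // counter used against ABA
};

// The alignment requirement is now part of the type and checked at compile time.
static_assert(alignof(next_t) == 16, "next_t must be 16-byte aligned for DWCAS");

// Hypothetical helper: runtime check that an address is 16-byte aligned.
inline bool is_16_byte_aligned(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Debug-build guard one could call before attempting a DWCAS on `p`.
inline void require_dwcas_alignment(const void* p) {
    assert(is_16_byte_aligned(p));
}
```

With the `alignas` on the type, any `next_t` object with automatic or static storage satisfies the runtime check as well.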
I also ran some experiments, both just now and in the past (which is why this code resonated with me), and I remember that compiler support for DWCAS was a bit kludgy, to say the least.
My findings were, and still are, that support for DWCAS on GCC+Linux is essentially broken: no CMPXCHG16B instruction is emitted, only a call into __atomic_compare_exchange, which, as you already know, is free to use mutexes underneath, and probably does given the absence of CMPXCHG16B in the generated code. This can be seen in this godbolt example if you remove the static_assert.
It seems that even plain clang+Linux isn't sufficient unless you compile the code with -march=native. I found that -mcx16 seems to have the same effect. This, however, doesn't help with GCC, and I think only very old versions of GCC handled it correctly.
The good thing is that at least a static_assert such as
static_assert(std::atomic<_next_t>::is_always_lock_free);
triggers correctly for all the combinations I tried.
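If a hard compile error is too strict, the same information can be turned into compile-time flags. The sketch below assumes C++17; `next_t`, `always_lock_free` and `compiled_with_cx16` are hypothetical names of mine. As far as I know, GCC and clang define `__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16` when 16-byte CAS is enabled for the target (for example via -mcx16), but please treat that as an assumption to verify on your toolchain.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the stack header; the real _next_t may differ.
struct alignas(16) next_t {
    void* ptr;
    std::uintptr_t tag;
};

// True only when the compiler promises that std::atomic<T> never uses locks.
template <class T>
constexpr bool always_lock_free() {
    return std::atomic<T>::is_always_lock_free;
}

// True when the compiler reports 16-byte CAS support for the target
// (assumption: the macro is set by flags such as -mcx16 or a suitable -march).
constexpr bool compiled_with_cx16() {
#if defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16)
    return true;
#else
    return false;
#endif
}
```

If my observations above hold, recent GCC can report `compiled_with_cx16()` as true while `always_lock_free<next_t>()` is still false, which is exactly the mismatch I ran into.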
The only true solution to all of this fun, as I see it, is to handcraft the assembly. Another option would be to use Boost, which already handles it that way.
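For the handcrafted route, a rough sketch of what I have in mind is below. It assumes x86-64 with GCC or clang extended inline assembly; `dw_t` and `dwcas` are hypothetical names, not anything from your repo or from Boost, and this is an illustration rather than a drop-in replacement.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))

// Hypothetical 16-byte pair standing in for the pointer + tag header.
struct alignas(16) dw_t {
    std::uint64_t lo;  // low 8 bytes (RAX / RBX side of CMPXCHG16B)
    std::uint64_t hi;  // high 8 bytes (RDX / RCX side of CMPXCHG16B)
};

// Atomically: if *dst == *expected, store desired into *dst and return true;
// otherwise copy the current *dst into *expected and return false.
inline bool dwcas(dw_t* dst, dw_t* expected, dw_t desired) {
    // CMPXCHG16B faults on a misaligned destination, so check in debug builds.
    assert(reinterpret_cast<std::uintptr_t>(dst) % 16 == 0);
    bool ok;
    __asm__ __volatile__(
        "lock cmpxchg16b %1\n\t"  // compare RDX:RAX with *dst, swap in RCX:RBX if equal
        "setz %0"                 // ZF is set when the exchange happened
        : "=q"(ok), "+m"(*dst), "+a"(expected->lo), "+d"(expected->hi)
        : "b"(desired.lo), "c"(desired.hi)
        : "cc", "memory");
    return ok;
}

#endif
```

The `lock` prefix makes the instruction a full barrier, so this behaves like a sequentially consistent compare-exchange; there is no way to ask for weaker ordering with this approach.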
Cheers,
Adi