Listing 5.6 - highly possible output error (+ book explanation)

## Original listing from the book
### Listing 5.6 Relaxed operations on multiple threads
<details>

```cpp
#include <thread>
#include <atomic>
#include <iostream>

std::atomic<int> x(0),y(0),z(0);
std::atomic<bool> go(false);
unsigned const loop_count=10;

struct read_values
{
    int x,y,z;
};

read_values values1[loop_count];
read_values values2[loop_count];
read_values values3[loop_count];
read_values values4[loop_count];
read_values values5[loop_count];

void increment(std::atomic<int>* var_to_inc,read_values* values)
{
    while(!go)
        std::this_thread::yield();
    for(unsigned i=0;i<loop_count;++i)
    {
        values[i].x=x.load(std::memory_order_relaxed);
        values[i].y=y.load(std::memory_order_relaxed);
        values[i].z=z.load(std::memory_order_relaxed);
        var_to_inc->store(i+1,std::memory_order_relaxed);
        std::this_thread::yield();
    }
}

void read_vals(read_values* values)
{
    while(!go)
        std::this_thread::yield();
    for(unsigned i=0;i<loop_count;++i)
    {
        values[i].x=x.load(std::memory_order_relaxed);
        values[i].y=y.load(std::memory_order_relaxed);
        values[i].z=z.load(std::memory_order_relaxed);
        std::this_thread::yield();
    }
}

void print(read_values* v)
{
    for(unsigned i=0;i<loop_count;++i)
    {
        if(i)
            std::cout<<",";
        std::cout<<"("<<v[i].x<<","<<v[i].y<<","<<v[i].z<<")";
    }
    std::cout<<std::endl;
}

int main()
{
    std::thread t1(increment,&x,values1);
    std::thread t2(increment,&y,values2);
    std::thread t3(increment,&z,values3);
    std::thread t4(read_vals,values4);
    std::thread t5(read_vals,values5);
    go=true;
    t5.join();
    t4.join();
    t3.join();
    t2.join();
    t1.join();
    print(values1);
    print(values2);
    print(values3);
    print(values4);
    print(values5);
}
```
</details>

The book also states that `One possible output from this program is as follows:`
### Output 5.6
<details>

```
(0,0,0),(1,0,0),(2,0,0),(3,0,0),(4,0,0),(5,7,0),(6,7,8),(7,9,8),(8,9,8),
(9,9,10)
(0,0,0),(0,1,0),(0,2,0),(1,3,5),(8,4,5),(8,5,5),(8,6,6),(8,7,9),(10,8,9),
(10,9,10)
(0,0,0),(0,0,1),(0,0,2),(0,0,3),(0,0,4),(0,0,5),(0,0,6),(0,0,7),(0,0,8),
(0,0,9)
(1,3,0),(2,3,0),(2,4,1),(3,6,4),(3,9,5),(5,10,6),(5,10,8),(5,10,10),
(9,10,10),(10,10,10)
(0,0,0),(0,0,0),(0,0,0),(6,3,7),(6,5,7),(7,7,7),(7,8,7),(8,8,7),(8,8,9),
(8,8,9)
```
</details>

I believe this output is **impossible**, for _reasons explained later_. I also believe that the explanation in the book might contain **logical mistakes and formulation errors** that do not conform to the Standard.

I will try to elaborate on a **simplified example** before going back to the original.

## Simplified identical example
Let's reduce the number of threads to 2, keeping only two of the writers.  
We will also reduce the number of loop iterations to 2.
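
A minimal sketch of this reduced program, assuming the same loop body as the book's `increment` function. The names `run_simplified`, `xy_reading`, `readings` and `simplified_loop_count` are made up for this illustration and do not come from the book; the atomics are local so that the function can be called repeatedly.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>
#include <utility>

constexpr unsigned simplified_loop_count = 2;   // assumption: 2 iterations, as stated above

struct xy_reading { int x; int y; };            // hypothetical: one recorded pair
using readings = std::array<xy_reading, simplified_loop_count>;

// Runs the two writer threads once and returns what each of them recorded
// (first: thread incrementing x, second: thread incrementing y).
std::pair<readings, readings> run_simplified()
{
    std::atomic<int> x(0), y(0);
    std::atomic<bool> go(false);
    readings by_thread_x{}, by_thread_y{};

    // Same loop body as the book's increment(), minus the z variable.
    auto increment = [&](std::atomic<int>& var_to_inc, readings& out) {
        while (!go.load())
            std::this_thread::yield();
        for (unsigned i = 0; i < simplified_loop_count; ++i) {
            out[i].x = x.load(std::memory_order_relaxed);
            out[i].y = y.load(std::memory_order_relaxed);
            var_to_inc.store(static_cast<int>(i + 1), std::memory_order_relaxed);
            std::this_thread::yield();
        }
    };

    std::thread thread_x(increment, std::ref(x), std::ref(by_thread_x));
    std::thread thread_y(increment, std::ref(y), std::ref(by_thread_y));
    go = true;
    thread_x.join();
    thread_y.join();

    // After both joins, each variable holds the last value stored to it.
    assert(x.load() == static_cast<int>(simplified_loop_count));
    assert(y.load() == static_cast<int>(simplified_loop_count));
    return {by_thread_x, by_thread_y};
}
```

Calling `run_simplified()` returns the two rows, `first` for thread_x and `second` for thread_y. A single run only shows one outcome, so it cannot by itself demonstrate that some other outcome is impossible.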

All we are interested in is a situation where the output produced looks like this:
```
(0,1),(1,1)
(1,0),(1,1)
```
Our claim is that this sequence is impossible, even though the book showcases an analogous output as possible and justifies it as follows:
> This is a valid outcome for relaxed operations, but it’s not the only valid outcome. Any
set of values that’s consistent with the three variables, each holding the values 0 to 10
in turn, and that has the thread incrementing a given variable printing the values 0 to
9 for that variable, is valid.
>
> -- Anthony Williams - C++ Concurrency in Action (2nd Edition), p.154

This statement, rewritten for our simplified case, reads:
> This is a valid outcome for relaxed operations, but it’s not the only valid outcome. Any
set of values that’s consistent with the 2 variables, each holding the values 0 to 2
in turn, and that has the thread incrementing a given variable printing the values 0 to
1 for that variable, is valid.

What's wrong with it? Let's take a closer look.

### Circular dependency on a future value
|                | T1 | T3 |
|:---|:--|:---|
| thread_x | (0,**1**)  |  (**_1_**,1)   |   
| thread_y | (_**1**_,**0**)  |  (1,1)   |  

And our simplified pseudo-code for reference:
```cpp
// body of the loop in increment()
values[i].x=x.load(std::memory_order_relaxed);
values[i].y=y.load(std::memory_order_relaxed);     // <-- happens first, as seen by THIS thread
var_to_inc->store(i+1,std::memory_order_relaxed);  // <-- happens afterwards, as seen by THIS thread
```
Here T3 > T2 > T1.

It is important to understand that, for now, T1, T2 and T3 are thread-local and might be very different for the 2 threads. That is, they are timestamps relative to each thread and are not necessarily (actually, very unlikely to be) synchronized in "real" / global time.

At T1, thread_x:
1) reads `y` (already 1)
2) writes the incremented `x` (as 1)

Somewhere at T2 (as seen by thread_x) the value of `x` is updated. Other threads might see it a little later, but in any case they can see it only AFTER it was written.

At its own T1, thread_y:
1) reads `x` (already updated, 1)
2) reads and records `y` (still 0: no one has touched it yet, this is the first iteration)
3) writes the incremented `y` (as 1)

So from the point of view of thread_y, whose execution the Standard guarantees to be sequential (at least as seen by this very thread), reading `x` as 1 happens BEFORE writing `y` as 1.

#### Now, a contradiction and an impossible circular dependency on a future value:
thread_x, which is sequential in its own eyes, has **read** `y` as 1. But `y` is **written** as 1 only AFTER thread_y has **read** `x` as 1, and `x` is **written** as 1 only AFTER thread_x has **read** `y` as 1. Blitz!

From the point of view of thread_x, `y` can be read as 1 only AFTER the instruction that writes `x`. But the output shows the opposite, which means it is invalid.

Important to note: we make no assumptions about how fast the threads synchronize. Relaxed memory ordering allows written values to become visible later than we expect, and allows threads to read values that are lagging behind.
But the order of operations within one specific thread is guaranteed by the Standard inside that very thread, so an early read of a value that will only be written as a consequence of a store this very thread performs later should not be possible.
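
The T1/T2/T3 reasoning above can be cross-checked mechanically under one loudly stated assumption: that the two threads run as a simple interleaving over a single shared memory. The sketch below (all names, such as `count_matches` and `step`, are hypothetical) enumerates every interleaving of the two simplified writers and counts those that produce a given pair of rows. The Standard's note on threads of execution, quoted later in this issue, says that some atomic operations allow executions inconsistent with a simple interleaving, so a count of zero supports the argument only within this model and is not a proof about relaxed operations.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

struct reading { int x; int y; };
using readings = std::array<reading, 2>;   // one entry per loop iteration

constexpr int steps_per_thread = 6;        // 2 iterations x (load x, load y, store)

// ASSUMPTION: a single shared memory that both threads access one step at a time.
struct state {
    int x = 0, y = 0;          // the shared variables
    int pc_x = 0, pc_y = 0;    // index of the next step of thread_x / thread_y
    readings seen_x{}, seen_y{};
};

// Executes the next step of one writer thread on the shared state.
void step(state& s, bool is_thread_x)
{
    int& pc = is_thread_x ? s.pc_x : s.pc_y;
    readings& seen = is_thread_x ? s.seen_x : s.seen_y;
    assert(pc < steps_per_thread);
    const std::size_t i = static_cast<std::size_t>(pc / 3);  // loop iteration
    switch (pc % 3) {
    case 0: seen[i].x = s.x; break;                          // values[i].x = x.load()
    case 1: seen[i].y = s.y; break;                          // values[i].y = y.load()
    default: (is_thread_x ? s.x : s.y) = pc / 3 + 1; break;  // var_to_inc->store(i+1)
    }
    ++pc;
}

bool same(const readings& a, const readings& b)
{
    for (std::size_t i = 0; i < a.size(); ++i)
        if (a[i].x != b[i].x || a[i].y != b[i].y)
            return false;
    return true;
}

// Counts the interleavings whose recorded rows equal the wanted ones.
int count_matches(state s, const readings& want_x, const readings& want_y)
{
    if (s.pc_x == steps_per_thread && s.pc_y == steps_per_thread)
        return same(s.seen_x, want_x) && same(s.seen_y, want_y) ? 1 : 0;
    int n = 0;
    if (s.pc_x < steps_per_thread) {       // let thread_x take the next step
        state next = s;
        step(next, true);
        n += count_matches(next, want_x, want_y);
    }
    if (s.pc_y < steps_per_thread) {       // let thread_y take the next step
        state next = s;
        step(next, false);
        n += count_matches(next, want_x, want_y);
    }
    return n;
}
```

Starting from a default `state`, the rows `(0,1),(1,1)` and `(1,0),(1,1)` are matched by no interleaving in this model, whereas the fully sequential rows `(0,0),(1,0)` and `(2,0),(2,1)` are matched by exactly one.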

## Standard and supporting logical materials
I was surprised to find out that the text of the Standard itself is not freely available to the public, even though everyone references it. Since I could not find the original text, I got a copy of a C++17 draft (N4659). I hope it is not too different from the published Standard, and I will be working with it.

https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/n4659.pdf

Here are the logical building blocks used to put together the argument later on (collapsed, because it is a lot of text =):
<details>

- The C++ Standard ensures that `memory_order_relaxed` operations within a single thread behave as if they follow program order, through the "happens before" relation, which is derived from the "sequenced before" relation. Within one thread, the operations follow the source code order:
> Sequenced before is an asymmetric, transitive, pair-wise relation between evaluations executed by a single
thread (4.7), which induces a partial order among those evaluations. Given any two evaluations A and B,
if A is sequenced before B (or, equivalently, B is sequenced after A), then the execution of A shall precede
the execution of B. If A is not sequenced before B and B is not sequenced before A, then A and B are
unsequenced. [ Note: The execution of unsequenced evaluations can overlap. — end note ] Evaluations
A and B are indeterminately sequenced when either A is sequenced before B or B is sequenced before A,
but it is unspecified which. [ Note: Indeterminately sequenced evaluations cannot overlap, but either could
be executed first. — end note ] An expression X is said to be sequenced before an expression Y if every
value computation and every side effect associated with the expression X is sequenced before every value
computation and every side effect associated with the expression Y
>
> -- C++ Standard, draft N4659, 4.6 [intro.execution], paragraph 15
- One thread is sequential, even though the program might consist of many concurrent ones:
> A thread of execution (also known as a thread) is a single flow of control within a program, including the initial
invocation of a specific top-level function <...>
>
> The execution of each thread proceeds
as defined by the remainder of this International Standard. The execution of the entire program consists
of an execution of all of its threads. [ Note: Usually the execution can be viewed as an interleaving of
all its threads. However, some kinds of atomic operations, for example, allow executions inconsistent
with a simple interleaving, as described below. — end note ]
>
> -- C++ Standard, draft N4659, Clause 4.7.1
- While `memory_order_relaxed` allows reordering relative to other threads, a single thread must observe its own actions in order, due to the coherence rules specified in section 4.7.1 (paragraphs 13-19). The "happens before" relation, which includes the "sequenced before" relation for operations within a single thread, together with the coherence requirements (paragraphs 14-17, write-read coherence in particular), ensures that a relaxed load sees the value of the thread's own prior relaxed store to the same object, or a later value in the modification order, preventing visible self-reordering. These coherence rules govern how different threads observe modifications to atomic objects, stating that a thread should not see "future" or "stale" values that contradict the modification order.
> 13 The value of an atomic object M, as determined by evaluation B, shall be the value stored by some side effect
A that modifies M, where B does not happen before A. [ Note: The set of such side effects is also restricted
by the rest of the rules described here, and in particular, by the coherence requirements below. — end note ]
>
> 14 If an operation A that modifies an atomic object M happens before an operation B that modifies M, then
A shall be earlier than B in the modification order of M. [ Note: This requirement is known as write-write
coherence. — end note ]
>
> 15 If a value computation A of an atomic object M happens before a value computation B of M, and A takes
its value from a side effect X on M, then the value computed by B shall either be the value stored by X or
the value stored by a side effect Y on M, where Y follows X in the modification order of M. [ Note: This
requirement is known as read-read coherence. — end note ]
>
> 16 If a value computation A of an atomic object M happens before an operation B that modifies M, then A
shall take its value from a side effect X on M, where X precedes B in the modification order of M. [ Note:
This requirement is known as read-write coherence. — end note ]
>
> 17 If a side effect X on an atomic object M happens before a value computation B of M, then the evaluation B
shall take its value from X or from a side effect Y that follows X in the modification order of M. [ Note: This
requirement is known as write-read coherence. — end note ]
>
> 18 [ Note: The four preceding coherence requirements effectively disallow compiler reordering of atomic operations
to a single object, even if both operations are relaxed loads. This effectively makes the cache coherence
guarantee provided by most hardware available to C++ atomic operations. — end note ]
>
> 19 [ Note: The value observed by a load of an atomic depends on the “happens before” relation, which depends
on the values observed by loads of atomics. The intended reading is that there must exist an association of
atomic loads with modifications they observe that, together with suitably chosen modification orders and
the “happens before” relation derived as described above, satisfy the resulting constraints as imposed here.
— end note ]
>
> -- C++ Standard, draft N4659, Clause 4.7.1

</details>
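
As a small illustration of the read-read coherence requirement quoted above (paragraph 15), here is a sketch in which one thread stores increasing values to a single atomic with relaxed ordering while another thread samples it. The function name `sample_single_writer` and its parameters are invented for this example. Because the only modifications are the stores 1, 2, 3, ... made in program order by one thread, the sampled values should never go backwards, even though nothing is promised about how quickly new values become visible.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: one writer stores 1..last_value into a single atomic,
// one reader samples it `samples` times; both use relaxed ordering only.
std::vector<int> sample_single_writer(int last_value, std::size_t samples)
{
    assert(last_value > 0);
    std::atomic<int> m(0);
    std::vector<int> seen;          // touched only by the reader until it is joined
    seen.reserve(samples);

    std::thread writer([&] {
        for (int v = 1; v <= last_value; ++v)
            m.store(v, std::memory_order_relaxed);
    });
    std::thread reader([&] {
        for (std::size_t k = 0; k < samples; ++k)
            seen.push_back(m.load(std::memory_order_relaxed));
    });
    writer.join();
    reader.join();
    return seen;
}
```

A caller can check the result with `std::is_sorted(seen.begin(), seen.end())`. Note that this only concerns a single atomic object; it says nothing yet about how loads and stores of different objects relate.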


## Back to original example
With all of this in mind, let's take a look at the part of the original output that is similarly problematic:

<img width="714" height="224" alt="Image" src="https://github.com/user-attachments/assets/b3004bbd-2964-4c07-8179-a9a0e373ae7c" />

Extracting the specific problematic part:
```
(5,7),(6,7)     <-- thread_x
(8,4),(8,5)    <--  thread_y
```
- When `thread_x` increments `x` from 5 to 6, it FIRST **reads** `y` as 7 and only THEN **writes** `x` as 6.
- When `thread_y` increments `y` from 4 to 5, it FIRST **reads** `x` as 8 and only THEN **writes** `y` as 5.
- So `thread_x` had already **read** `y` as 7 BEFORE writing `x` as 6 (let alone 8), while `thread_y` observed a future value, `x` being already 8, BEFORE it had written `y` as 5, not to mention 7.

These 2 rows are not just threads being unsynchronized: they show an impossible circular dependency on future values.
They do not merely represent threads lagging behind when reading values: each thread reads a value that has not been written yet and whose writing depends on a store that this very thread performs only afterwards.

It's about the sequence of actions within the very same thread.
`thread_x` HAS to read the value of `y` first, yet the value it reads can only be written AFTER the increment of `x`.
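
The pattern described above can be searched for mechanically. The sketch below assumes, as in Listing 5.6, that iteration `i` of a writer stores `i+1` after its loads. The helper name `find_mutual_future_reads` and the `xy` struct are hypothetical, and the two data functions simply transcribe the `x` and `y` columns of the first two rows of Output 5.6. It reports every pair of iterations `(i, j)` where `thread_x` read a `y` that `thread_y` stores at iteration `j` or later, while `thread_y` at iteration `j` read an `x` that `thread_x` stores at iteration `i` or later. Whether such a pair is really forbidden is the claim of this issue, not something the code proves.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct xy { int x; int y; };   // the x and y columns of one printed triple

// x and y columns of the first row of Output 5.6 (thread incrementing x).
std::vector<xy> book_row_thread_x()
{
    return std::vector<xy>{{0,0},{1,0},{2,0},{3,0},{4,0},{5,7},{6,7},{7,9},{8,9},{9,9}};
}

// x and y columns of the second row of Output 5.6 (thread incrementing y).
std::vector<xy> book_row_thread_y()
{
    return std::vector<xy>{{0,0},{0,1},{0,2},{1,3},{8,4},{8,5},{8,6},{8,7},{10,8},{10,9}};
}

// Returns all (i, j) where thread_x's iteration i read a y value stored by
// thread_y at iteration j or later, and thread_y's iteration j read an x
// value stored by thread_x at iteration i or later.
std::vector<std::pair<std::size_t, std::size_t>>
find_mutual_future_reads(const std::vector<xy>& by_x, const std::vector<xy>& by_y)
{
    // Precondition: each writer prints 0, 1, 2, ... for its own variable.
    for (std::size_t i = 0; i < by_x.size(); ++i)
        assert(by_x[i].x == static_cast<int>(i));
    for (std::size_t j = 0; j < by_y.size(); ++j)
        assert(by_y[j].y == static_cast<int>(j));

    std::vector<std::pair<std::size_t, std::size_t>> hits;
    for (std::size_t i = 0; i < by_x.size(); ++i) {
        for (std::size_t j = 0; j < by_y.size(); ++j) {
            // Iteration j of thread_y stores j+1, iteration i of thread_x stores i+1.
            const bool x_saw_y = by_x[i].y >= static_cast<int>(j) + 1;
            const bool y_saw_x = by_y[j].x >= static_cast<int>(i) + 1;
            if (x_saw_y && y_saw_x)
                hits.emplace_back(i, j);
        }
    }
    return hits;
}
```

With the transcribed rows, the first pair reported is `(5, 4)` (zero-based iterations), which corresponds to the `(5,7)` and `(8,4)` entries discussed above.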

Thus, I conclude that this output breaks the principles of the Standard and is impossible.

As for the statement in the book:
> This is a valid outcome for relaxed operations, but it’s not the only valid outcome. Any
set of values that’s consistent with the three variables, each holding the values 0 to 10
in turn, and that has the thread incrementing a given variable printing the values 0 to
9 for that variable, is valid.
>
> -- Anthony Williams - C++ Concurrency in Action (2nd Edition), p.154

My conclusion is that it seems to be wrongly formulated: **not every** set of values that's consistent with the three variables, each holding the values 0 to 10 in turn, and that has the thread incrementing a given variable printing the values 0 to 9 for that variable, is valid.


