497 questions
1 vote, 0 answers, 17 views
cache coherence vs atomicity
I want to verify that I properly understand the notions of atomicity and cache coherence.
Starting with a simple scenario with only one shared memory location a, so I suppose there is nothing to do ...
0 votes, 1 answer, 84 views
What Store/Store reordering do modern CPUs do in practice?
AArch64 and RISC-V WMO seem to allow Store/Store reordering according to their formal specifications.
However, Store/Store reordering seems very tricky to perform in practice: the CPU would need to ...
4 votes, 2 answers, 111 views
C11 atomics: How does a relaxed load interact with a release store on the same variable?
Context: I have been writing a multithreaded program that uses atomics extensively. I've noticed that these atomics are very slow, especially on ARM, because the compiler inserted too many fences, ...
0 votes, 0 answers, 65 views
Does disabling/enabling interrupts require memory barriers or similar memory ordering constraints?
AFAIK part of the reason why we need the C++11 memory model (and later patches/variants) is the fact that we trade various things for single-threaded execution to be fast, with only one main criterion, i.e. ...
1 vote, 0 answers, 97 views
Why can't LWSYNC make the Independent Read Independent Write (IRIW) example behave sensibly on PowerPC?
On the PowerPC platform, the book A Primer on Memory Consistency and Cache Coherence states:
As depicted in Table 5.18, Power’s HWSYNCs can be used to make the
Independent Read Independent Write Example (...
4 votes, 1 answer, 105 views
Visibility of atomic operations with seq-cst fences in C++20
Until C++17 the standard contained the following paragraph (C++17 Section 32.4 [atomics.order] paragraph 6):
For atomic operations A and B on an atomic object M, where A modifies M and B takes its ...
5 votes, 1 answer, 224 views
CUDA memory model: why is an acquire fence not needed to prevent load-load reordering?
I am reading the book "Programming Massively Parallel Processors" and noticed the code snippets below, which implement a "domino-style" scan:
if (threadIdx.x == 0) {
while(AtomicAdd(&...
4 votes, 2 answers, 197 views
Does this transitive happens-before use case need sequential consistency or will acquire/release suffice?
This snippet is from slide 19 of Herb Sutter's Atomic Weapons talk.
If I am understanding this correctly, what Herb is saying is that, for the assert() in thread 3 to succeed, this has to ...
1 vote, 1 answer, 69 views
How does Julia know what the type of some object is? What is the memory layout of a reference to a `Vector{Int64}`?
I'm trying to understand in more detail how the Julia type system works.
Consider the following simple example:
julia> v = [1]
1-element Vector{Int64}:
1
julia> typeof(v)
Vector{Int64} (alias ...
5 votes, 1 answer, 140 views
Does GCC treat a relaxed atomic operation as a compiler fence?
I have the following code with GCC 8.3 on x86-64 Linux:
// file: inc.cc
int inc_value(int* x) {
    (*x)++;
    //std::atomic<int> ww;
    //ww.load(std::memory_order_relaxed);
    (*x)++;
    return *x;
}
...
1 vote, 2 answers, 202 views
Can compiler optimizations avoid writing to memory by operating exclusively in registers?
Context:
In multithreaded programming, one of the key challenges is ensuring that changes made by one thread are visible to others. While this issue is commonly tied to memory visibility, another ...
1 vote, 0 answers, 97 views
How Does the Store Buffer Drain in x86 Architecture Work?
The topic of the Store Buffer (SB) and its mechanics, size, purpose, and interaction with other buffers has been discussed on Stack Overflow several times. However, certain aspects of its operation ...
2 votes, 2 answers, 119 views
How does C++23 happens-before apply to std::memory_order_seq_cst?
Consider the following C++23 program:
#include <atomic>
#include <cassert>
#include <thread>
#define LOAD_ORDER std::memory_order_seq_cst
#define STORE_ORDER std::...
-1 votes, 1 answer, 78 views
Why can different threads see different memory operation orders? [duplicate]
The following code is an example from the book C++ Concurrency in Action (2nd edition). The author mentions that threads Ta and Tb can observe different memory states:
Tc observes x == true and y == ...
2 votes, 1 answer, 88 views
Understanding Memory Controller RPQ/WPQ ordering guarantees for loads and ntstores
I'm trying to understand how memory controllers maintain program order between non-temporal loads and non-temporal stores when there's significant queue pressure disparity between RPQ (Read Pending ...