1 vote
0 answers
17 views

cache coherence vs atomicity

I want to verify that I properly understand the notions of atomicity and cache coherence. I start with a simple scenario with only one shared memory location a, so I suppose there is nothing to do ...
0x314159's user avatar
0 votes
1 answer
84 views

What Store/Store reordering do modern CPUs do in practice?

AArch64 and RISC-V WMO seem to allow Store/Store reordering according to their formal specifications. However, Store/Store reordering seems very tricky to perform in practice: the CPU would need to ...
64_'s user avatar
  • 559
4 votes
2 answers
111 views

C11 atomics: How does a relaxed load interact with a release store on the same variable?

Context: I have been writing a multithreaded program that uses atomics extensively. I've noticed that these atomics are very slow, especially on ARM, because the compiler inserted too many fences, ...
RedGreenBlue123's user avatar
0 votes
0 answers
65 views

Does disabling/enabling interrupts require memory barriers or similar memory ordering constraints?

AFAIK, part of the reason we need the C++11 memory model (and later patches/variants) is that we trade various things for single-threaded execution to be fast, with only one main criterion, i.e. ...
Not A Name's user avatar
1 vote
0 answers
97 views

Why can't LWSYNC make the Independent Read Independent Write (IRIW) example behave sensibly on PowerPC?

On the PowerPC platform, the book A Primer on Memory Consistency and Cache Coherence states: As depicted in Table 5.18, Power’s HWSYNCs can be used to make the Independent Read Independent Write Example (...
Anonemous's user avatar
  • 319
4 votes
1 answer
105 views

Visibility of atomic operations with seq-cst fences in C++20

Until C++17 the standard contained the following paragraph (C++17 Section 32.4 [atomics.order] paragraph 6): For atomic operations A and B on an atomic object M, where A modifies M and B takes its ...
mpoeter's user avatar
  • 3,011
5 votes
1 answer
224 views

CUDA memory model: why is an acquire fence not needed to prevent load-load reordering?

I am reading the book "Programming Massively Parallel Processors" and noticed the code snippet below, used to achieve a "domino-style" scan: if (threadIdx.x == 0) { while(AtomicAdd(&...
Lifu Huang's user avatar
  • 12.9k
4 votes
2 answers
197 views

Does this transitive happens-before use case need sequential consistency or will acquire/release suffice?

This snippet is from slide 19 of Herb Sutter's Atomic Weapons talk. If I am understanding this correctly, what Herb is saying is that, for the assert() in thread 3 to succeed, this has to ...
Dhwani Katagade's user avatar
1 vote
1 answer
69 views

How does Julia know what the type of some object is? What is the memory layout of a reference to a `Vector{Int64}`?

I'm trying to understand in more detail how the Julia type system works. Consider the following simple example: julia> v = [1] 1-element Vector{Int64}: 1 julia> typeof(v) Vector{Int64} (alias ...
user2138149's user avatar
  • 17.9k
5 votes
1 answer
140 views

Does GCC treat a relaxed atomic operation as a compiler fence?

I have the following code with GCC 8.3 on x86-64 Linux: // file: inc.cc int inc_value(int* x) { (*x)++; //std::atomic<int> ww; //ww.load(std::memory_order_relaxed); (*x)++; return *x; } ...
song xs's user avatar
  • 81
1 vote
2 answers
202 views

Can compiler optimizations avoid writing to memory by operating exclusively in registers?

Context: In multithreaded programming, one of the key challenges is ensuring that changes made by one thread are visible to others. While this issue is commonly tied to memory visibility, another ...
Dmytro Kostenko's user avatar
1 vote
0 answers
97 views

How Does the Store Buffer Drain in x86 Architecture Work?

The topic of the Store Buffer (SB) and its mechanics, size, purpose, and interaction with other buffers has been discussed on Stack Overflow several times. However, certain aspects of its operation ...
Dmytro Kostenko's user avatar
2 votes
2 answers
119 views

How does C++23 happens before apply to std::memory_order_seq_cst?

Consider the following C++23 program: #include <atomic> #include <cassert> #include <thread> #define LOAD_ORDER std::memory_order_seq_cst #define STORE_ORDER std::...
user3188445's user avatar
  • 4,812
-1 votes
1 answer
78 views

Why can different threads see different memory operation orders? [duplicate]

The following code is an example from the book C++ Concurrency in Action (2nd edition). The author mentions that threads Ta and Tb can observe different memory states: Tc observes x == true and y == ...
hao's user avatar
  • 1
2 votes
1 answer
88 views

Understanding Memory Controller RPQ/WPQ ordering guarantees for loads and ntstores

I'm trying to understand how memory controllers maintain program order between non-temporal loads and non-temporal stores when there's significant queue pressure disparity between RPQ (Read Pending ...
idle_cycles's user avatar
