Newest 'memory-model' Questions

1 vote

0 answers

17 views

cache coherence vs atomicity

I want to verify that I properly understand the notions of atomicity and cache coherence. I start with a simple scenario with only one shared memory location a, so I suppose there is nothing to do ...

0x314159

51

asked 5 hours ago

0 votes

1 answer

84 views

What Store/Store reordering do modern CPUs do in practice?

AArch64 and RISC-V WMO seem to allow Store/Store reordering according to their formal specifications. However, Store/Store reordering seems very tricky to perform in practice: the CPU would need to ...

64_

559

asked 2 days ago

4 votes

2 answers

111 views

C11 atomics: How does a relaxed load interact with a release store on the same variable?

Context: I have been writing a multithreaded program that uses atomics extensively. I've noticed that these atomics are very slow, especially on ARM, because the compiler inserts too many fences, ...

RedGreenBlue123

67

asked Apr 4 at 17:31

0 votes

0 answers

65 views

Does disabling/enabling interrupts require memory barriers or similar memory ordering constraints?

AFAIK, part of the reason we need the C++11 memory model (and later patches/variants) is that we trade various things for single-threaded execution to be fast, with only one main criterion, i.e. ...

Not A Name

43

asked Apr 4 at 5:31

1 vote

0 answers

97 views

Why can't LWSYNC make the Independent Read Independent Write (IRIW) example behave sensibly on PowerPC?

On the PowerPC platform, the book A Primer on Memory Consistency and Cache Coherence states: As depicted in Table 5.18, Power’s HWSYNCs can be used to make the Independent Read Independent Write Example (...

Anonemous

319

asked Apr 1 at 12:20

4 votes

1 answer

105 views

Visibility of atomic operations with seq-cst fences in C++20

Until C++17 the standard contained the following paragraph (C++17 Section 32.4 [atomics.order] paragraph 6): For atomic operations A and B on an atomic object M, where A modifies M and B takes its ...

mpoeter

3,011

asked Mar 3 at 14:41

5 votes

1 answer

224 views

CUDA memory model: why is an acquire fence not needed to prevent load-load reordering?

I am reading the book "Programming Massively Parallel Processors" and noticed the code snippets below, which implement a "domino-style" scan: if (threadIdx.x == 0) { while(AtomicAdd(&...

Lifu Huang

12.9k

asked Feb 11 at 9:16

4 votes

2 answers

197 views

Does this transitive happens-before use case need sequential consistency or will acquire/release suffice?

This snippet is from slide 19 of Herb Sutter's Atomic Weapons talk. If I understand this correctly, what Herb is saying is that, for the assert() in thread 3 to succeed, this has to ...

Dhwani Katagade

1,322

asked Feb 5 at 10:54

1 vote

1 answer

69 views

How does Julia know what the type of some object is? What is the memory layout of a reference to a `Vector{Int64}`?

I'm trying to understand in more detail how the Julia type system works. Consider the following simple example: julia> v = [1] 1-element Vector{Int64}: 1 julia> typeof(v) Vector{Int64} (alias ...

user2138149

17.9k

asked Jan 15 at 11:07

5 votes

1 answer

140 views

Does GCC treat a relaxed atomic operation as a compiler fence?

I have the following code with GCC 8.3 on x86-64 Linux: // file: inc.cc int inc_value(int* x) { (*x)++; //std::atomic<int> ww; //ww.load(std::memory_order_relaxed); (*x)++; return *x; } ...

song xs

81

asked Jan 2 at 10:10

1 vote

2 answers

202 views

Can compiler optimizations avoid writing to memory by operating exclusively in registers?

Context: In multithreaded programming, one of the key challenges is ensuring that changes made by one thread are visible to others. While this issue is commonly tied to memory visibility, another ...

Dmytro Kostenko

235

asked Dec 27, 2024 at 23:06

1 vote

0 answers

97 views

How Does the Store Buffer Drain in x86 Architecture Work?

The topic of the Store Buffer (SB) and its mechanics, size, purpose, and interaction with other buffers has been discussed on Stack Overflow several times. However, certain aspects of its operation ...

Dmytro Kostenko

235

asked Dec 25, 2024 at 15:21

2 votes

2 answers

119 views

How does C++23 happens-before apply to std::memory_order_seq_cst?

Consider the following C++23 program: #include <atomic> #include <cassert> #include <thread> #define LOAD_ORDER std::memory_order_seq_cst #define STORE_ORDER std::...

user3188445

4,812

asked Dec 18, 2024 at 19:56

-1 votes

1 answer

78 views

Why can different threads see different memory operation orders? [duplicate]

The following code is an example from the book C++ Concurrency in Action (2nd edition). The author mentions that threads Ta and Tb can observe different memory states: Tc observes x == true and y == ...

hao

1

asked Dec 9, 2024 at 6:34

2 votes

1 answer

88 views

Understanding Memory Controller RPQ/WPQ ordering guarantees for loads and ntstores

I'm trying to understand how memory controllers maintain program order between non-temporal loads and non-temporal stores when there's significant queue pressure disparity between RPQ (Read Pending ...

idle_cycles

173

asked Nov 10, 2024 at 23:44
