All Questions

0 votes
1 answer
51 views

Full cache-line alignment or half?

I have a ring buffer with heavy read and write usage. I have the buffer structure defined as below: typedef struct { uint8_t* buffer; int32_t readIdx; int32_t readBase; int32_t ...
Pengcheng
2 votes
2 answers
99 views

What's the benefit of bringing a frequently accessed array address into cache?

Recently I was looking over someone's Limit Order Book implementation. There is one place where the author left a comment, and I don't quite understand how it is going to benefit performance. Let me ...
Love Cute Shiba
2 votes
0 answers
83 views

How many cache lines does the adjacent cache line prefetcher bring into cache?

I was investigating the effectiveness of the adjacent cache line prefetcher and its impact on the number of cache lines prefetched from DRAM. Initially, I assumed it fetched only one more adjacent ...
Hod Badihi
0 votes
2 answers
75 views

Does truncating a file affect the cached contents?

Let's say I write some amount to some file A, then I close and reopen the file using open("A", O_TRUNC | O_WRONLY);. After this, I write the same amount again. Will this be slower than a ...
Hanz Schmidt
3 votes
1 answer
80 views

Measuring cache line latency

I want to measure the latency to access one element of a cache line. I have a struct with a next index and padding that brings it to the size of a cache line (64 bytes on my architecture). Then, an array of N ...
Franks
0 votes
1 answer
32 views

Need to decouple library and build a cohesive application

My application has a HAL function to invalidate cache memory. This application uses a library which needs the invalidate function. A straightforward solution is to couple the application and ...
basangouda46
0 votes
1 answer
92 views

Cache-friendly square matrix transposition logic issue

This code is a transposition algorithm specifically optimized to work on square matrices with rows and columns that are multiples of 8. The optimization involves processing components later that ...
hskimse
0 votes
1 answer
117 views

Prevent cache destruction by using thread affinity

I'm writing a program for Windows using the latest MSVC and the WinAPI. In short, I'm trying to speed up the computation process on a block of raw data read in from a file (in this case, creating a ...
Arush Agarampur
0 votes
0 answers
126 views

My cache simulator isn't running properly; can someone help me understand why?

For my studies, I need to make a cache simulator in C. I don't need to actually save the data, just count the misses, hits, and evictions. The input is s, E, b for sets, lines, and block offset, and I ...
Alabama92
0 votes
2 answers
180 views

Allocating an array in RAM, no cache misses, C/C++

Is there a way to tell the compiler: "do not try looking for it in the different caches and miss each time. Go for the RAM"? That way it should be faster to access a huge array, which ...
Mitto
1 vote
2 answers
104 views

Efficiency of storing from data cache to memory

Background: I am optimizing a function that reads from a Real-Time Clock (RTC) structure and stores the values into an array of 8-bit bytes. The original function used direct assignments, the compiler ...
Thomas Matthews
0 votes
1 answer
103 views

Impact of matrix storage method on cache miss penalty

I'm tasked with implementing a matrix-vector multiplication using both a rows-first and a columns-first approach, when the matrix is stored row-major. Of course I expected lots of cache misses when the ...
j-hap
2 votes
0 answers
134 views

Making a generic LRU cache in C for a PostgreSQL extension

I am trying to write an LRU cache in C that can hold any type of data item, for a PostgreSQL extension. Everything is fine except that I am relying on the PostgreSQL hashmap, which itself allocates memory ...
widesense
0 votes
0 answers
72 views

Delay for writing data (if the data is not in the cache)?

Given the following code which is executed on x86-64 (other architectures do not matter here): #include <stdatomic.h> _Atomic int x; void g(int a) { atomic_store_explicit(&x, a, ...
Kevin Meier
0 votes
1 answer
183 views

How does the CPU load data in cache in a vector multiplication function?

Considering the following function for element-wise vector multiplication: void vector_multiply(float* result, const float* a, const float* b, int n) { for (int i = 0; i < n; i++) { ...
scasci
