All Questions
496 questions
0 votes, 1 answer, 51 views
Full cache-line alignment or half?
I have a ring buffer with heavy read and write usage.
I have the buffer structure defined as below:
typedef struct {
uint8_t* buffer;
int32_t readIdx;
int32_t readBase;
int32_t ...
2 votes, 2 answers, 99 views
What's the benefit of bringing a frequently accessed array address into cache?
Recently I was looking over someone's Limit Order Book implementation. In one place the author left a comment, and I don't quite understand how it is going to benefit performance.
Let me ...
2 votes, 0 answers, 83 views
How many cache lines does the adjacent cache line prefetcher bring into cache?
I was investigating the effectiveness of the adjacent cache line prefetcher and its impact on the number of cache lines prefetched from DRAM. Initially, I assumed it fetched only one more adjacent ...
0 votes, 2 answers, 75 views
Does truncating a file affect the cached contents?
Let's say I write some amount to some file A, then I close and reopen the file using open("A", O_TRUNC | O_WRONLY);. After this, I write the same amount again.
Will this be slower than a ...
3 votes, 1 answer, 80 views
Measuring cache line latency
I want to measure the latency to access one element of a cache line.
I have a struct with a next index and padding so that its size equals one cache line (64 bytes on my architecture).
Then, an array of N ...
0 votes, 1 answer, 32 views
Need to decouple library and build a cohesive application
I have an application with a HAL function to invalidate cache memory. The application uses a library that needs the invalidate function.
A straightforward solution is to couple the application and ...
0 votes, 1 answer, 92 views
Cache-friendly square matrix transposition logic issue
This code is a transposition algorithm specifically optimized to work on square matrices with rows and columns that are multiples of 8. The optimization involves processing components later that ...
0 votes, 1 answer, 117 views
Prevent cache destruction by using thread affinity
I'm writing a program for Windows using the latest MSVC and the WinAPI. In short, I'm trying to speed up the computation process on a block of raw data read in from a file (in this case, creating a ...
0 votes, 0 answers, 126 views
My cache simulator isn't running properly; can someone help me understand why?
For my studies, I need to make a cache simulator in C. I don't need to actually save the data but just count the misses, hits, and evictions. The input is s, E, b for sets, lines, and block offset, and I ...
0 votes, 2 answers, 180 views
Allocating an array in RAM, no cache misses, C/C++
Is there a way to say to the compiler:
"do not try looking for it in the different caches and miss each time. Go for the RAM"
That way it should be faster to access a huge array, which ...
1 vote, 2 answers, 104 views
Efficiency of storing from data cache to memory
Background
I am optimizing a function that reads from a Real-Time Clock (RTC) structure and stores the values into an array of 8-bit bytes.
The original function used direct assignments, the compiler ...
0 votes, 1 answer, 103 views
Impact of matrix storage method on cache miss penalty
I'm tasked with implementing a matrix-vector multiplication using both a rows-first and a columns-first approach, when the matrix is stored row-major. Of course I expected lots of cache misses when the ...
2 votes, 0 answers, 134 views
Making a generic LRU cache in C for a PostgreSQL extension
I am trying to write an LRU cache in C that can hold any type of data item, for a PostgreSQL extension. Everything is fine except that I am relying on the PostgreSQL hash map, which itself allocates memory ...
0 votes, 0 answers, 72 views
Delay for writing data (if the data is not in the cache)?
Given the following code which is executed on x86-64 (other architectures do not matter here):
#include <stdatomic.h>
_Atomic int x;
void g(int a)
{
atomic_store_explicit(&x, a, ...
0 votes, 1 answer, 183 views
How does the CPU load data in cache in a vector multiplication function?
Considering the following function for element-wise vector multiplication:
void vector_multiply(float* result, const float* a, const float* b, int n) {
for (int i = 0; i < n; i++) {
...