282 questions
1
vote
0
answers
404
views
Error C1007: Unrecognized Flag '-archSSE2' After Upgrading Project to Visual Studio 2022
I've encountered an issue after upgrading my project to Visual Studio 2022. During the build process, I get the following error:
LINK : fatal error C1007: unrecognized flag '-archSSE2' in 'p2'
LINK :...
-1
votes
1
answer
135
views
Implementing real-time bitmap scaling with SSE2 intrinsics [closed]
I have this code that blits a bitmap onto the frame buffer with SSE2 intrinsics:
for (uint r = 0; r < height; r++)
{
uint32* bufPixels = (frameBuffer->pixels + xPos) + frameBuffer->pitch *...
1
vote
1
answer
648
views
What exactly is the _mm_movemask_epi8 intrinsic doing?
I encountered the _mm_movemask_epi8 intrinsic in some code and I am trying to understand what exactly it does through an example, as I didn't comprehend entirely what it does from reading the ...
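For readers landing on this entry, here is a minimal sketch of what the intrinsic computes: `_mm_movemask_epi8` collects the most significant bit of each of the 16 bytes into the low 16 bits of an `int`, with byte 0 mapping to bit 0. The helper name `sign_bits_of_bytes` is made up for illustration and is not from the question.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Hypothetical helper: returns a 16-bit mask in which bit i is the
// most significant bit of bytes[i].
int sign_bits_of_bytes(const std::uint8_t (&bytes)[16]) {
    // Unaligned load, so the array needs no special alignment.
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(bytes));
    return _mm_movemask_epi8(v);
}
```

With bytes `{0x80, 0, 0, 0xFF, 0, 0x7F, 0, ...}` the result is `0x9`: bytes 0 and 3 have their top bit set, while `0x7F` at index 5 does not.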
6
votes
3
answers
359
views
Clamp unsigned int to 0x10000 using SSE2
I want to clamp 32-bit unsigned ints to a fixed value (0x10000) using only SSE2 instructions.
Basically, this C code:
if (c>0x10000) c=0x10000;
This code below works, but I'm wondering if it can be ...
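One possible SSE2-only approach, sketched here as an assumption rather than as the asker's code (which is cut off above): SSE2 has no unsigned 32-bit compare or minimum, so both operands are XORed with `0x80000000` to map unsigned order onto signed order, compared with `_mm_cmpgt_epi32`, and the result is selected with AND/ANDNOT/OR. The function name `clamp_u32x4` is hypothetical.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Hypothetical helper: clamps four unsigned 32-bit values to 0x10000.
void clamp_u32x4(const std::uint32_t* in, std::uint32_t* out) {
    const __m128i limit = _mm_set1_epi32(0x10000);
    // Flipping the sign bit turns an unsigned comparison into a signed one.
    const __m128i sign = _mm_set1_epi32(static_cast<int>(0x80000000u));

    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in));
    // gt lanes are all-ones where v > limit (as unsigned), else zero.
    __m128i gt = _mm_cmpgt_epi32(_mm_xor_si128(v, sign),
                                 _mm_xor_si128(limit, sign));
    // Select limit where gt is set, otherwise keep v.
    __m128i r = _mm_or_si128(_mm_and_si128(gt, limit),
                             _mm_andnot_si128(gt, v));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), r);
}
```

The sign-bit flip matters for inputs at or above `0x80000000`, which a plain signed compare would treat as negative and leave unclamped.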
0
votes
0
answers
255
views
How to add an alpha channel very fast to a RGB image using SSE2 and c++
I am writing a YUV420p to RGBA color conversion algorithm in C++ using SSE2. Right now, I have YUV420p to RGB and RGB to RGBA. The results are as follows:
size of image: 1920 x 1200
time of RGBA to ...
1
vote
2
answers
152
views
Suggestions on further optimising this chi-square function using SSE2 intrinsics
I am trying to convert the chi-square function below from C code to SSE2 intrinsics.
I am getting the correct output for both functions, and I have measured the time it takes for both functions to ...
1
vote
1
answer
147
views
Matrix multiplication using simd produces incorrect results when filled with floating point values [closed]
I wanted to create a matrix multiplication with SIMD. Everything is fine when the matrix is filled with integers, but there are some issues when my matrices are filled with floating point values. ...
1
vote
0
answers
259
views
Sum of bytes in an __m128 register [duplicate]
I am trying to find the sum of all bytes in an __m128 register using SSE and SSE2.
So far what I have is
__m128i sum = _mm_sad_epu8(bytes, _mm_setzero_si128());
return _mm_cvtsi128_si32(sum) + ...
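As a hedged illustration of the general technique (not a reconstruction of the truncated line above): `_mm_sad_epu8` against zero leaves two partial sums, one in the low 16 bits of each 64-bit half, and those two halves still have to be added. The sketch below does that with a byte shift; `sum_bytes` is a made-up name.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Hypothetical helper: sums the 16 bytes starting at p.
int sum_bytes(const std::uint8_t* p) {
    __m128i bytes = _mm_loadu_si128(reinterpret_cast<const __m128i*>(p));
    // Sum of absolute differences against zero: one partial sum per
    // 64-bit half, each covering 8 bytes (at most 8 * 255 = 2040).
    __m128i sad = _mm_sad_epu8(bytes, _mm_setzero_si128());
    // Bring the upper half down and add it to the lower half.
    __m128i hi = _mm_srli_si128(sad, 8);
    return _mm_cvtsi128_si32(_mm_add_epi32(sad, hi));
}
```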
0
votes
1
answer
447
views
Why isn't Avx.Multiply significantly faster than the * operator?
I've created the following test method to understand how SSE and AVX work and what their benefits are. Now I'm actually very surprised to see that System.Runtime.Intrinsics.X86.Avx.Multiply is less ...
4
votes
1
answer
701
views
In SIMD, SSE2, many instructions are named "_mm_set_epi8", "_mm_cmpgt_epi8" and so on; what do "mm" and "epi" mean?
I see many instructions with shorthand names such as "_mm_and_si128". I want to know what the "mm" means.
1
vote
0
answers
1k
views
MOVDQU vs MOVDQA Instruction (x86/x64 assembly) better insights
First of all, let's start with the following links about MOVDQA and MOVDQU which are already in this community:
MOVDQU instruction + page boundary
MOVUPD vs. MOVDQU (x86/x64 assembly)
Difference ...
1
vote
0
answers
602
views
Efficiently find indices of 1-bits in large array, using SIMD
If I have a very large array of bytes and want to find the indices of all 1-bits, counting indices from the leftmost bit, how do I do this efficiently, probably using SIMD?
(For finding the first 1-bit, see an ...
1
vote
0
answers
98
views
C++ std::countr_zero() in SIMD 128/256/512 (find position of least significant 1 bit in 128/256/512-bit number) [duplicate]
If I have a 128-, 256- or 512-bit memory region, how can I find the number of consecutive zero bits starting from the least significant bit (left-most byte)? I can do:
#include <bit>
int ...
1
vote
0
answers
690
views
Having array of 16/32/64 bytes how to quickly find index of first byte equal to given, using SSE2/AVX/AVX2/AVX-512 [duplicate]
If I have an array of 16, 32 or 64 bytes (let's suppose it is aligned on a 64-byte memory boundary), how do I quickly find the index of the first byte equal to a given value, using SIMD SSE2/AVX/AVX2/AVX-512? If such byte ...
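A minimal SSE2 sketch for the 16-byte case only, under stated assumptions: compare every byte with the needle via `_mm_cmpeq_epi8`, turn the result into a bit mask with `_mm_movemask_epi8`, and take the position of the lowest set bit. The name `find_first_byte16` and the `-1` "not found" return value are choices made here for illustration, since the excerpt is cut off before it says what should happen in that case.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Hypothetical helper: index of the first byte equal to needle in the
// 16 bytes at p, or -1 if there is none (assumed convention).
int find_first_byte16(const std::uint8_t* p, std::uint8_t needle) {
    // Unaligned load; an aligned load would also do for aligned input.
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(p));
    __m128i eq = _mm_cmpeq_epi8(v, _mm_set1_epi8(static_cast<char>(needle)));
    int mask = _mm_movemask_epi8(eq);  // bit i set when p[i] == needle
    if (mask == 0) return -1;
    // Portable count of trailing zeros; a compiler builtin or C++20
    // std::countr_zero could replace this loop.
    int idx = 0;
    while ((mask & 1) == 0) {
        mask >>= 1;
        ++idx;
    }
    return idx;
}
```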
1
vote
3
answers
891
views
How can I implement Bit Shift Right and Bit Shift Left by Vector for 8-bit and 16-bit integers in SSE2?
I came across this post whilst doing research for my next project. Being able to bit shift 8 and 16-bit integers by vector using SIMD would be very useful to me and I think many other people here.
...