All Questions
107 questions
1 vote, 0 answers, 404 views
Error C1007: Unrecognized Flag '-archSSE2' After Upgrading Project to Visual Studio 2022
I've encountered an issue after upgrading my project to Visual Studio 2022. During the build, I get the following error:
LINK : fatal error C1007: unrecognized flag '-archSSE2' in 'p2'
LINK :...
-1 votes, 1 answer, 135 views
Implementing real-time bitmap scaling with SSE2 intrinsics [closed]
I have this code that blits a bitmap onto the frame buffer with SSE2 intrinsics:
for (uint r = 0; r < height; r++)
{
uint32* bufPixels = (frameBuffer->pixels + xPos) + frameBuffer->pitch *...
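The truncated loop only copies pixels. As a hedged illustration of one scaling case that SSE2 handles well, the sketch below upscales a row horizontally by exactly 2x (nearest neighbour) by duplicating each 32-bit pixel with `_mm_unpacklo_epi32`/`_mm_unpackhi_epi32`. The function name `upscale_row_2x` is hypothetical, and arbitrary (non-integer) scale factors would need a different approach, since SSE2 has no gather instruction.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstddef>
#include <cstdint>

// Hypothetical helper: upscale one row of 32-bit pixels horizontally by exactly 2x
// (nearest neighbour). dst must have room for 2 * width pixels.
void upscale_row_2x(const std::uint32_t* src, std::uint32_t* dst, std::size_t width)
{
    std::size_t x = 0;
    // Main loop: 4 source pixels become 8 destination pixels per iteration.
    for (; x + 4 <= width; x += 4) {
        __m128i p = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + x));
        __m128i lo = _mm_unpacklo_epi32(p, p);  // p0 p0 p1 p1
        __m128i hi = _mm_unpackhi_epi32(p, p);  // p2 p2 p3 p3
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + 2 * x), lo);
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + 2 * x + 4), hi);
    }
    // Scalar tail for the remaining 0-3 pixels.
    for (; x < width; ++x) {
        dst[2 * x] = src[x];
        dst[2 * x + 1] = src[x];
    }
}
```

Vertical 2x scaling can then reuse each output row twice, so the whole 2x blit stays within SSE2.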
0 votes, 0 answers, 255 views
How to add an alpha channel to an RGB image very fast using SSE2 and C++
I am writing a YUV420p to RGBA color conversion algorithm in C++ using SSE2. Right now, I convert YUV420p to RGB and then RGB to RGBA. The results are as follows:
size of image: 1920 x 1200
time of RGBA to ...
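SSE2 has no general byte shuffle (`pshufb` arrived with SSSE3), so expanding packed 3-byte RGB into 4-byte slots is the costly part. One option, assuming the YUV conversion can write 4-byte RGBX pixels directly, is to fill in alpha afterwards with a single OR per four pixels. This is a minimal sketch under that assumption; `set_alpha_opaque` is a hypothetical name.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstddef>
#include <cstdint>

// Assumes 'pixels' already holds 4-byte RGBX pixels (R, G, B, unused) in memory
// order, and forces the 4th byte (alpha) to 255. On little-endian x86, byte 3 of
// each pixel is the top byte of its 32-bit lane.
void set_alpha_opaque(std::uint8_t* pixels, std::size_t pixel_count)
{
    const __m128i alpha = _mm_set1_epi32(~0x00FFFFFF);  // 0xFF000000 in every lane
    std::size_t i = 0;
    for (; i + 4 <= pixel_count; i += 4) {
        __m128i* p = reinterpret_cast<__m128i*>(pixels + 4 * i);
        _mm_storeu_si128(p, _mm_or_si128(_mm_loadu_si128(p), alpha));
    }
    // Scalar tail for the last 0-3 pixels.
    for (; i < pixel_count; ++i)
        pixels[4 * i + 3] = 0xFF;
}
```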
1 vote, 1 answer, 147 views
Matrix multiplication using SIMD produces incorrect results when filled with floating-point values [closed]
I wanted to implement matrix multiplication with SIMD. Everything is fine when the matrix is filled with integers, but there are issues when my matrices are filled with floating-point values. ...
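For reference, a float matrix product has to use the `_ps` (packed single) intrinsics throughout; mixing in integer `_epi32` operations on float data is one common cause of "works for integers only" bugs, though not necessarily the asker's. The sketch below assumes row-major 4x4 matrices and broadcasts each element of A against a full row of B; `matmul4x4` is a hypothetical name.

```cpp
#include <xmmintrin.h>  // SSE float intrinsics

// C = A * B for row-major 4x4 float matrices.
// Row i of C = sum over k of A[i][k] * (row k of B).
void matmul4x4(const float* A, const float* B, float* C)
{
    __m128 b0 = _mm_loadu_ps(B + 0);
    __m128 b1 = _mm_loadu_ps(B + 4);
    __m128 b2 = _mm_loadu_ps(B + 8);
    __m128 b3 = _mm_loadu_ps(B + 12);
    for (int i = 0; i < 4; ++i) {
        // Broadcast A[i][k] to all lanes and accumulate into one row of C.
        __m128 r = _mm_mul_ps(_mm_set1_ps(A[4 * i + 0]), b0);
        r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(A[4 * i + 1]), b1));
        r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(A[4 * i + 2]), b2));
        r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(A[4 * i + 3]), b3));
        _mm_storeu_ps(C + 4 * i, r);
    }
}
```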
4 votes, 1 answer, 701 views
In SIMD (SSE2), many intrinsics are named like "_mm_set_epi8", "_mm_cmpgt_epi8" and so on. What do "mm" and "epi" mean?
I see many intrinsics with shorthand names such as "_mm_and_si128". I want to know what the "mm" means.
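As a rough guide: the `_mm` prefix marks the 128-bit intrinsic family (generally read as "multimedia", inherited from MMX; the wider ones use `_mm256`/`_mm512`), and the suffix gives the element type: `epi8` for packed signed 8-bit integers ("extended packed integer"), `epu8` for unsigned bytes, `si128` for the whole 128-bit integer register, and `ps`/`pd` for packed single/double floats. The small sketch below shows that the suffix really changes the semantics; the function names are hypothetical.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// epi8: signed bytes, so 0x80 is -128 and is NOT greater than 1.
bool signed_gt_demo()
{
    __m128i a = _mm_set1_epi8(static_cast<char>(0x80));
    __m128i b = _mm_set1_epi8(1);
    __m128i gt = _mm_cmpgt_epi8(a, b);      // 0xFF in lanes where a > b (signed)
    return _mm_movemask_epi8(gt) != 0;      // false: no lane compares greater
}

// epu8: unsigned bytes, so 0x80 is 128 and max(128, 1) == 128.
std::uint8_t unsigned_max_demo()
{
    __m128i a = _mm_set1_epi8(static_cast<char>(0x80));
    __m128i b = _mm_set1_epi8(1);
    __m128i m = _mm_max_epu8(a, b);
    return static_cast<std::uint8_t>(_mm_cvtsi128_si32(m) & 0xFF);  // byte 0
}
```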
1 vote, 0 answers, 1k views
MOVDQU vs MOVDQA Instruction (x86/x64 assembly) better insights
First of all, let's start with the following existing questions about MOVDQA and MOVDQU on this site:
MOVDQU instruction + page boundary
MOVUPD vs. MOVDQU (x86/x64 assembly)
Difference ...
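At the intrinsic level, `_mm_load_si128`/`_mm_store_si128` correspond to MOVDQA and require a 16-byte-aligned address (a misaligned one faults), while `_mm_loadu_si128`/`_mm_storeu_si128` correspond to MOVDQU and accept any address. A minimal sketch, with the hypothetical helper `copy16_to_aligned`, reading from an arbitrary offset and writing to an aligned buffer:

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cassert>
#include <cstdint>

// Copies 16 bytes from any address (unaligned load, MOVDQU) into a
// 16-byte-aligned destination (aligned store, MOVDQA).
void copy16_to_aligned(const std::uint8_t* src, std::uint8_t* dst_aligned16)
{
    // The aligned store is only valid if this holds.
    assert(reinterpret_cast<std::uintptr_t>(dst_aligned16) % 16 == 0);
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src));
    _mm_store_si128(reinterpret_cast<__m128i*>(dst_aligned16), v);
}
```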
1 vote, 0 answers, 602 views
Efficiently find indices of 1-bits in large array, using SIMD
If I have a very large array of bytes and want to find the indices of all 1-bits, counting from the leftmost bit, how can I do this efficiently, probably using SIMD?
(For finding the first 1-bit, see an ...
1 vote, 0 answers, 98 views
C++ std::countr_zero() in SIMD 128/256/512 (find the position of the least significant 1-bit in a 128/256/512-bit number) [duplicate]
If I have a 128-, 256- or 512-bit memory region, how can I find the number of consecutive zero bits starting from the least significant bit (left-most byte)? I can do:
#include <bit>
int ...
1 vote, 0 answers, 690 views
Given an array of 16/32/64 bytes, how to quickly find the index of the first byte equal to a given value using SSE2/AVX/AVX2/AVX-512 [duplicate]
If I have an array of 16, 32 or 64 bytes (let's suppose it is aligned on a 64-byte memory boundary), how do I quickly find the index of the first byte equal to a given value using SIMD SSE2/AVX/AVX2/AVX-512? If such a byte ...
1 vote, 3 answers, 891 views
How can I implement Bit Shift Right and Bit Shift Left by Vector for 8-bit and 16-bit integers in SSE2?
I came across this post while doing research for my next project. Being able to bit shift 8-bit and 16-bit integers by vector using SIMD would be very useful to me, and I think to many other people here.
...
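SSE2 shifts apply one count to every lane, so a per-lane shift can be built by decomposing each count into bits: for every bit `b` that is set in a lane's count, shift that lane by `2^b`, selecting lanes with an and/andnot/or blend. The sketch below does a 16-bit left shift for counts 0..15; a right shift would use `_mm_srli_epi16`/`_mm_srai_epi16` instead, and 8-bit lanes would additionally need masking, since SSE2 has no 8-bit shifts. The names are hypothetical.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

// Returns a where mask is all-ones, b elsewhere.
static inline __m128i select_si128(__m128i mask, __m128i a, __m128i b)
{
    return _mm_or_si128(_mm_and_si128(mask, a), _mm_andnot_si128(mask, b));
}

// Per-lane left shift of 16-bit integers: x[i] << count[i], for counts 0..15.
__m128i sllv_epi16_sse2(__m128i x, __m128i count)
{
    const __m128i bit0 = _mm_set1_epi16(1);
    const __m128i bit1 = _mm_set1_epi16(2);
    const __m128i bit2 = _mm_set1_epi16(4);
    const __m128i bit3 = _mm_set1_epi16(8);
    __m128i m;
    m = _mm_cmpeq_epi16(_mm_and_si128(count, bit0), bit0);  // lanes with count bit 0 set
    x = select_si128(m, _mm_slli_epi16(x, 1), x);
    m = _mm_cmpeq_epi16(_mm_and_si128(count, bit1), bit1);  // count bit 1
    x = select_si128(m, _mm_slli_epi16(x, 2), x);
    m = _mm_cmpeq_epi16(_mm_and_si128(count, bit2), bit2);  // count bit 2
    x = select_si128(m, _mm_slli_epi16(x, 4), x);
    m = _mm_cmpeq_epi16(_mm_and_si128(count, bit3), bit3);  // count bit 3
    x = select_si128(m, _mm_slli_epi16(x, 8), x);
    return x;
}
```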
0 votes, 0 answers, 118 views
Why do some SSE intrinsics introduce moves back and forth?
In my code, I set a 128-bit variable to zero, but I don't quite understand why it translates to two move instructions in the assembly code.
__m128i zeros = reinterpret_cast<__m128i>(_mm_setzero_pd())...
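For comparison, two idiomatic ways to get an integer zero vector are sketched below. `_mm_setzero_si128` produces the integer type directly, and with optimization enabled compilers typically emit a single xor-zeroing instruction for it; when a `__m128d` really must be viewed as `__m128i`, the `_mm_castpd_si128` intrinsic is documented as changing only the type, generating no instruction. Whether this removes the extra moves in the asker's build (for example an unoptimized one) is not certain. The function names are hypothetical.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

// Zero an integer vector directly.
__m128i zero_int() { return _mm_setzero_si128(); }

// Reinterpret a double vector as integers; the cast intrinsic only changes the C++ type.
__m128i zero_via_cast() { return _mm_castpd_si128(_mm_setzero_pd()); }
```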
1 vote, 1 answer, 636 views
AVX divide __m256i packed 32-bit integers by two (no AVX2)
I'm looking for the fastest way to divide an __m256i of packed 32-bit integers by two (aka shift right by one) using AVX. I don't have access to AVX2.
As far as I know, my options are:
Drop down to ...
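AVX1 has no 256-bit integer shifts, so a common workaround is to split the `__m256i` into two 128-bit halves (`_mm256_castsi256_si128` and `_mm256_extractf128_si256`), shift each half with SSE2, and recombine with `_mm256_insertf128_si256`. The sketch below shows only the per-half SSE2 step, plus one caveat: for negative signed values an arithmetic shift rounds toward negative infinity, while C++ `/ 2` truncates toward zero, so the two differ unless a correction is added. Function names are hypothetical.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

// Divide four signed 32-bit ints by 2 with C++ semantics (truncate toward zero).
// _mm_srai_epi32 alone rounds toward -infinity (-3 >> 1 == -2), so 1 is added to
// negative lanes first; _mm_srli_epi32(x, 31) yields exactly 1 for those lanes.
__m128i div2_epi32(__m128i x)
{
    __m128i bias = _mm_srli_epi32(x, 31);
    return _mm_srai_epi32(_mm_add_epi32(x, bias), 1);
}

// Plain logical shift: equals division by 2 for unsigned or non-negative values.
__m128i half_epu32(__m128i x) { return _mm_srli_epi32(x, 1); }
```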
4 votes, 1 answer, 894 views
Is there a difference between SVML vs. normal intrinsic square root functions?
Is there any difference in precision or performance between the normal sqrtps/sqrtpd and the SVML versions:
__m128d _mm_sqrt_pd (__m128d a) [SSE2]
__m128d _mm_svml_sqrt_pd (__m128d a) [SSE?]
...
2 votes, 3 answers, 758 views
How would you convert a "while" iterator into simd instructions?
This is the scalar code I actually had, which I've replicated four times (x4), storing the data in SIMD vectors:
waveTable *waveTables[4];
for (int i = 0; i < 4; i++) {
int waveTableIindex = 0;
while ...
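The general pattern for vectorizing a data-dependent `while` loop is to keep iterating while any lane is still active and use the comparison mask to freeze lanes whose condition has become false. Since the original loop body is truncated, the sketch below uses a hypothetical per-lane loop, `while (x < limit) { x += step; ++count; }`, and assumes a positive `step` in every lane (so each lane terminates) and no 32-bit overflow.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical per-lane scalar loop:
//     int count = 0; while (x < limit) { x += step; ++count; }
// Runs until every lane has exited; finished lanes are frozen by the mask.
__m128i steps_until_limit(__m128i x, __m128i step, __m128i limit)
{
    __m128i count = _mm_setzero_si128();
    for (;;) {
        __m128i active = _mm_cmplt_epi32(x, limit);  // all-ones in lanes still looping
        if (_mm_movemask_epi8(active) == 0) break;    // every lane has exited
        x = _mm_add_epi32(x, _mm_and_si128(step, active));  // advance active lanes only
        count = _mm_sub_epi32(count, active);         // active lanes are -1, so this adds 1
    }
    return count;
}
```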
0 votes, 1 answer, 581 views
How to set an int32 value at some index within an __m128i with only SSE2?
Is there an SSE2 intrinsic that can set a single int32 value within an __m128i?
For example, set the value 1000 at index 1 of an __m128i that already contains 1, 2, 3, 4 (resulting in 1, 1000, 3, 4).
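`_mm_insert_epi32` requires SSE4.1. With SSE2 only, one option is to build a lane mask by comparing lane numbers against the index, then blend with and/andnot/or; this also works when the index is a runtime value. (Another SSE2 route, for a compile-time index, is two `_mm_insert_epi16` calls for the low and high halves.) A minimal sketch, with the hypothetical name `set_epi32_lane`:

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

// Replace lane 'index' (0..3, memory order) of v with 'value' using only SSE2.
__m128i set_epi32_lane(__m128i v, int index, int value)
{
    const __m128i lane_ids = _mm_setr_epi32(0, 1, 2, 3);
    __m128i mask = _mm_cmpeq_epi32(lane_ids, _mm_set1_epi32(index));  // all-ones in chosen lane
    return _mm_or_si128(_mm_and_si128(mask, _mm_set1_epi32(value)),
                        _mm_andnot_si128(mask, v));
}
```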