All Questions

1 vote
0 answers
404 views

Error C1007: Unrecognized Flag '-archSSE2' After Upgrading Project to Visual Studio 2022

I've encountered an issue after upgrading my project to Visual Studio 2022. During the build process, I have the following error: LINK : fatal error C1007: unrecognized flag '-archSSE2' in 'p2' LINK :...
Frank Escobar
-1 votes
1 answer
135 views

Implementing real-time bitmap scaling with SSE2 intrinsics [closed]

I have this code that blits a bitmap onto the frame buffer with SSE2 intrinsics: for (uint r = 0; r < height; r++) { uint32* bufPixels = (frameBuffer->pixels + xPos) + frameBuffer->pitch *...
Nasir • 27
0 votes
0 answers
255 views

How to add an alpha channel very fast to an RGB image using SSE2 and C++

I am writing a YUV420p to RGBA color conversion algorithm in C++ using SSE2. Right now, I have YUV420p to RGB and RGB to RGBA. The results are as follows: size of image: 1920 x 1200 time of RGBA to ...
bluetooth16
1 vote
1 answer
147 views

Matrix multiplication using SIMD produces incorrect results when filled with floating-point values [closed]

I wanted to implement matrix multiplication with SIMD. Everything is fine when the matrices are filled with integers, but there are some issues when they are filled with floating-point values. ...
Arheus • 23
4 votes
1 answer
701 views

In SSE2 SIMD, many intrinsics are named "_mm_set_epi8", "_mm_cmpgt_epi8", and so on. What do "mm" and "epi" mean?

I see many instructions with shorthand names such as "_mm_and_si128". I want to know what the "mm" means.
dongwang
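
As background for this entry (the excerpt itself gives no answer): "mm" is commonly explained as "multimedia", "epi" as "extended packed integer", and "si128" as the whole register treated as a single 128-bit integer. The sketch below only illustrates how those suffixes change the way a register is interpreted; `keep_greater` and `to_bytes` are hypothetical helper names, not part of any library.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cassert>
#include <cstdint>

// Hypothetical helper: keep the bytes of `a` that are greater than the
// corresponding bytes of `b` (signed compare) and zero the rest.
inline __m128i keep_greater(__m128i a, __m128i b) {
    // _mm_  : prefix shared by the MMX/SSE intrinsic family
    // cmpgt : per-element "greater than"
    // epi8  : the register is treated as 16 packed signed 8-bit integers
    const __m128i mask = _mm_cmpgt_epi8(a, b);  // 0xFF where a > b, else 0x00
    // si128 : the register is treated as one 128-bit value (plain bitwise AND)
    return _mm_and_si128(a, mask);
}

// Hypothetical helper: copy the 16 bytes of a vector out for inspection.
inline void to_bytes(__m128i v, std::int8_t out[16]) {
    assert(out != nullptr);
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), v);
}
```

With `a` set to sixteen 5s and `b` alternating 0 and 9, the result keeps 5 in the even positions and 0 in the odd ones.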
1 vote
0 answers
1k views

MOVDQU vs MOVDQA Instruction (x86/x64 assembly) better insights

First of all, let's start with the following links about MOVDQA and MOVDQU which are already in this community: MOVDQU instruction + page boundary MOVUPD vs. MOVDQU (x86/x64 assembly) Difference ...
RajibTheKing • 1,362
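
For context on this entry, the two instructions are usually reached from C++ through `_mm_load_si128` (aligned) and `_mm_loadu_si128` (unaligned). A minimal sketch, assuming an SSE2 target; the helper names are hypothetical, and the compiler is free to pick a different encoding or fold the load into another instruction.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cassert>
#include <cstdint>

// Load 16 bytes with the aligned form (typically MOVDQA).
// Precondition: `p` is 16-byte aligned; otherwise the load may fault.
inline __m128i load_aligned(const std::uint8_t* p) {
    assert(reinterpret_cast<std::uintptr_t>(p) % 16 == 0);
    return _mm_load_si128(reinterpret_cast<const __m128i*>(p));
}

// Load 16 bytes with the unaligned form (typically MOVDQU); any address works.
inline __m128i load_unaligned(const std::uint8_t* p) {
    return _mm_loadu_si128(reinterpret_cast<const __m128i*>(p));
}

// True when two vectors hold identical bytes.
inline bool same_bytes(__m128i a, __m128i b) {
    return _mm_movemask_epi8(_mm_cmpeq_epi8(a, b)) == 0xFFFF;
}
```

On an aligned buffer both loads return the same bytes; only the unaligned form is safe at `buf + 1`.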
1 vote
0 answers
602 views

Efficiently find indices of 1-bits in large array, using SIMD

If I have a very large array of bytes and want to find the indices of all 1-bits, counting indices from the leftmost bit, how do I do this efficiently, probably using SIMD? (For finding the first 1-bit, see an ...
Arty • 16.8k
1 vote
0 answers
98 views

C++ std::countr_zero() in SIMD 128/256/512 (find position of least significant 1 bit in 128/256/512-bit number) [duplicate]

If I have a 128-, 256-, or 512-bit memory region, how can I find the number of consecutive zero bits starting from the least significant bit (left-most byte)? I can do: #include <bit> int ...
Arty • 16.8k
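
One possible SSE2 approach to the 128-bit case in this entry is a two-level scan: find the first non-zero byte with a compare and `_mm_movemask_epi8`, then count the trailing zero bits inside that byte. This is a sketch under assumptions, not the asker's code: `countr_zero128` is a hypothetical name, byte 0 is taken as least significant (as the excerpt describes), and returning 128 for an all-zero region mirrors what `std::countr_zero` does for a zero input.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cassert>
#include <cstdint>

// Number of consecutive zero bits starting at bit 0 of byte 0 in a 16-byte
// region. Returns 128 when every bit is zero (assumed convention).
inline int countr_zero128(const std::uint8_t* p) {
    assert(p != nullptr);
    const __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(p));
    // Bit i of zero_bytes is set when byte i is entirely zero.
    const unsigned zero_bytes = static_cast<unsigned>(
        _mm_movemask_epi8(_mm_cmpeq_epi8(v, _mm_setzero_si128())));
    unsigned nonzero = ~zero_bytes & 0xFFFFu;
    if (nonzero == 0) return 128;
    // Portable lowest-set-bit scans; a compiler builtin or C++20
    // std::countr_zero would normally replace both loops.
    int byte_index = 0;
    while ((nonzero & 1u) == 0) { nonzero >>= 1; ++byte_index; }
    unsigned bits = p[byte_index];
    int bit_index = 0;
    while ((bits & 1u) == 0) { bits >>= 1; ++bit_index; }
    return byte_index * 8 + bit_index;
}
```

The same idea would extend to 256 or 512 bits by repeating the step per 16-byte block, or by using wider compares where AVX2 or AVX-512 is available.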
1 vote
0 answers
690 views

Given an array of 16/32/64 bytes, how to quickly find the index of the first byte equal to a given value, using SSE2/AVX/AVX2/AVX-512 [duplicate]

If I have an array of 16, 32, or 64 bytes (let's suppose it is aligned on a 64-byte memory boundary), how do I quickly find the index of the first byte equal to a given value, using SIMD SSE2/AVX/AVX2/AVX-512? If such byte ...
Arty • 16.8k
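
For the 16-byte SSE2 case in this entry, a common pattern is a byte-wise compare followed by `_mm_movemask_epi8` and a lowest-set-bit scan. A minimal sketch, assuming the name `find_first_byte16` (hypothetical) and a return value of -1 when nothing matches; the excerpt is cut off before it says what that case should return.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cassert>
#include <cstdint>

// Index (0..15) of the first byte in `block` equal to `needle`, or -1 if no
// byte matches (the -1 convention is an assumption).
inline int find_first_byte16(const std::uint8_t* block, std::uint8_t needle) {
    assert(block != nullptr);
    const __m128i data = _mm_loadu_si128(reinterpret_cast<const __m128i*>(block));
    const __m128i want = _mm_set1_epi8(static_cast<char>(needle));
    // One bit per byte: bit i is set when block[i] == needle.
    unsigned mask = static_cast<unsigned>(
        _mm_movemask_epi8(_mm_cmpeq_epi8(data, want)));
    if (mask == 0) return -1;
    // Portable scan for the lowest set bit (at most 16 steps); a compiler
    // builtin or C++20 std::countr_zero would normally replace this loop.
    int index = 0;
    while ((mask & 1u) == 0) { mask >>= 1; ++index; }
    return index;
}
```

The unaligned load is used so the sketch works for any pointer; with the 64-byte alignment the question assumes, `_mm_load_si128` would also be valid.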
1 vote
3 answers
891 views

How can I implement Bit Shift Right and Bit Shift Left by Vector for 8-bit and 16-bit integers in SSE2?

I came across this post whilst doing research for my next project. Being able to bit shift 8 and 16-bit integers by vector using SIMD would be very useful to me and I think many other people here. ...
dave_thenerd
0 votes
0 answers
118 views

Why do some SSE intrinsics introduce moves back and forth?

In my code, I set a 128-bit variable to zero, but I don't quite understand why it translates to two move instructions in the assembly code. __m128i zeros = reinterpret_cast<__m128i>(_mm_setzero_pd())...
DoodleNoodle
1 vote
1 answer
636 views

AVX divide __m256i packed 32-bit integers by two (no AVX2)

I'm looking for the fastest way to divide an __m256i of packed 32-bit integers by two (aka shift right by one) using AVX. I don't have access to AVX2. As far as I know, my options are: Drop down to ...
GlassBeaver
4 votes
1 answer
894 views

Is there a difference between SVML vs. normal intrinsic square root functions?

Is there any sort of difference in precision or performance between normal sqrtps/pd or the SVML version: __m128d _mm_sqrt_pd (__m128d a) [SSE2] __m128d _mm_svml_sqrt_pd (__m128d a) [SSE?] ...
dave_thenerd
2 votes
3 answers
758 views

How would you convert a "while" iterator into simd instructions?

This is the code I actually had (for a scalar code) which I've replicated (x4) storing data into simd: waveTable *waveTables[4]; for (int i = 0; i < 4; i++) { int waveTableIindex = 0; while ...
markzzz • 48.2k
0 votes
1 answer
581 views

How to set an int32 value at some index within an __m128i with only SSE2?

Is there an SSE2 intrinsic that can set a single int32 value within an __m128i? For example, setting the value 1000 at index 1 in an __m128i that already contains 1,2,3,4 (resulting in 1,1000,3,4).
markzzz • 48.2k
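
One SSE2-only way to do what this entry asks is a compare-generated mask followed by a bitwise select, which also works when the index is only known at run time. A minimal sketch; `set_lane_epi32` is a hypothetical name, and index 0 is assumed to be the lowest lane (the first element in memory order).

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cassert>
#include <cstdint>

// Replace the 32-bit lane `index` (0..3) of `v` with `value`, using SSE2 only.
inline __m128i set_lane_epi32(__m128i v, int index, std::int32_t value) {
    assert(index >= 0 && index < 4);
    const __m128i lane_ids = _mm_setr_epi32(0, 1, 2, 3);
    // All-ones in the selected lane, all-zeros elsewhere.
    const __m128i mask = _mm_cmpeq_epi32(lane_ids, _mm_set1_epi32(index));
    const __m128i fill = _mm_set1_epi32(value);
    // Select: (mask & fill) | (~mask & v).
    return _mm_or_si128(_mm_and_si128(mask, fill), _mm_andnot_si128(mask, v));
}
```

Applied to the example from the question, `set_lane_epi32(_mm_setr_epi32(1, 2, 3, 4), 1, 1000)` yields 1,1000,3,4. When the index is a compile-time constant, two `_mm_insert_epi16` calls are an alternative.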
