1 vote
0 answers
404 views

Error C1007: Unrecognized Flag '-archSSE2' After Upgrading Project to Visual Studio 2022

I've encountered an issue after upgrading my project to Visual Studio 2022. During the build process, I have the following error: LINK : fatal error C1007: unrecognized flag '-archSSE2' in 'p2' LINK :...
Frank Escobar
-1 votes
1 answer
135 views

Implementing real-time bitmap scaling with SSE2 intrinsics [closed]

I have this code that blits a bitmap onto the frame buffer with SSE2 intrinsics: for (uint r = 0; r < height; r++) { uint32* bufPixels = (frameBuffer->pixels + xPos) + frameBuffer->pitch *...
Nasir
  • 27
1 vote
1 answer
648 views

What exactly is the _mm_movemask_epi8 intrinsic doing?

I came across the _mm_movemask_epi8 intrinsic in some code and am trying to understand exactly what it does through an example, as I didn't fully understand it from reading the ...
Blargian
  • 360
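As a rough illustration of the intrinsic asked about above, here is a scalar model in portable C. It is a sketch of the documented behaviour (bit i of the result is the top bit of byte i), not the hardware implementation; the function name `movemask_epi8_model` is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of _mm_movemask_epi8 (hypothetical helper name).
   Bit i of the result is the most significant bit of input byte i;
   the upper 16 bits of the result are zero. */
int movemask_epi8_model(const uint8_t bytes[16])
{
    assert(bytes != NULL);
    int mask = 0;
    for (int i = 0; i < 16; i++) {
        /* bytes[i] >> 7 is 1 when the byte's top bit is set, else 0 */
        mask |= (bytes[i] >> 7) << i;
    }
    return mask;
}
```

A typical use is after a byte-wise compare such as _mm_cmpeq_epi8, where every matching byte becomes 0xFF, so the mask has one set bit per matching position.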
6 votes
3 answers
359 views

Clamp unsigned int to 0x10000 using SSE2

I want to clamp 32-bit unsigned ints to a fixed value (0x10000) using only SSE2 instructions. Basically, this C code: if (c>0x10000) c=0x10000; This code below works, but I'm wondering if it can be ...
Sanyin
  • 101
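One possible SSE2-only approach to the clamp described above, sketched under assumptions: SSE2 has no unsigned 32-bit compare or minimum, so both operands get their sign bit flipped before a signed compare, and the result is blended with AND/ANDNOT. This is not the asker's code (which is cut off in the excerpt); `clamp_u32x4` is a hypothetical name, and the scalar branch exists only so the sketch builds on non-x86 targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>

/* Clamp four unsigned 32-bit values in place to at most 0x10000. */
void clamp_u32x4(uint32_t v[4])
{
    assert(v != NULL);
    const __m128i limit = _mm_set1_epi32(0x10000);
    const __m128i bias  = _mm_set1_epi32(INT32_MIN); /* 0x80000000 in each lane */
    __m128i x = _mm_loadu_si128((const __m128i *)v);
    /* Flipping the sign bit maps unsigned order onto signed order. */
    __m128i gt = _mm_cmpgt_epi32(_mm_xor_si128(x, bias),
                                 _mm_xor_si128(limit, bias));
    /* Select limit where x > limit, otherwise keep x. */
    __m128i r = _mm_or_si128(_mm_and_si128(gt, limit),
                             _mm_andnot_si128(gt, x));
    _mm_storeu_si128((__m128i *)v, r);
}
#else
/* Scalar fallback with the same behaviour, for targets without SSE2. */
void clamp_u32x4(uint32_t v[4])
{
    assert(v != NULL);
    for (int i = 0; i < 4; i++) {
        if (v[i] > 0x10000u) v[i] = 0x10000u;
    }
}
#endif
```

The sign-bit flip is what keeps inputs such as 0xFFFFFFFF from being treated as negative by the signed compare.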
0 votes
0 answers
255 views

How to add an alpha channel very fast to an RGB image using SSE2 and C++

I am writing a YUV420p to RGBA color conversion algorithm in C++ using SSE2. Right now, I have YUV420p to RGB and RGB to RGBA. The results are as follows: size of image: 1920 x 1200 time of RGBA to ...
bluetooth16
1 vote
2 answers
152 views

Suggestions on further optimising this chi-square function using SSE2 intrinsics

I am trying to convert the chi-square function below from C code to SSE2 intrinsics. I am getting the correct output for both functions, and I have measured the time it takes for both functions to ...
Sanku
  • 511
1 vote
1 answer
147 views

Matrix multiplication using SIMD produces incorrect results when filled with floating-point values [closed]

I wanted to implement matrix multiplication with SIMD. Everything is fine when the matrices are filled with integers, but there are issues when they are filled with floating-point values. ...
Arheus
  • 23
1 vote
0 answers
259 views

Sum of bytes in an __m128 register [duplicate]

I am trying to find the sum of all bytes in an __m128 register using SSE and SSE2. So far what I have is __m128i sum = _mm_sad_epu8(bytes, _mm_setzero_si128()); return _mm_cvtsi128_si32(sum) + ...
user17784058
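For reference, one common way to finish a byte sum that starts with _mm_sad_epu8 is sketched below. It is not necessarily what the cut-off part of the question does. _mm_sad_epu8 against zero leaves two partial sums, one in the low 16 bits of each 64-bit half, so the upper half is shifted down and added to the lower. The name `sum_bytes16` is hypothetical, and the scalar branch is only there so the sketch builds without SSE2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>

/* Sum of the 16 bytes at p (maximum 16 * 255 = 4080). */
unsigned sum_bytes16(const uint8_t p[16])
{
    assert(p != NULL);
    __m128i bytes = _mm_loadu_si128((const __m128i *)p);
    /* Two partial sums: bytes 0..7 in the low half, bytes 8..15 in the high half. */
    __m128i sad = _mm_sad_epu8(bytes, _mm_setzero_si128());
    /* Move the high half down by 8 bytes and add the two partial sums. */
    __m128i hi = _mm_srli_si128(sad, 8);
    return (unsigned)_mm_cvtsi128_si32(_mm_add_epi32(sad, hi));
}
#else
/* Scalar fallback with the same result, for targets without SSE2. */
unsigned sum_bytes16(const uint8_t p[16])
{
    assert(p != NULL);
    unsigned s = 0;
    for (int i = 0; i < 16; i++) s += p[i];
    return s;
}
#endif
```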
0 votes
1 answer
447 views

Why isn't Avx.Multiply significantly faster than the * operator?

I've created the following test method to understand how SSE and AVX work and what their benefits are. Now I'm actually very surprised to see that System.Runtime.Intrinsics.X86.Avx.Multiply is less ...
André Reichelt
4 votes
1 answer
701 views

In SIMD (SSE2), many instructions are named "_mm_set_epi8", "_mm_cmpgt_epi8" and so on; what do "mm" and "epi" mean?

I see many instructions with shorthand names such as "_mm_and_si128". I want to know what the "mm" means.
dongwang
1 vote
0 answers
1k views

MOVDQU vs MOVDQA Instruction (x86/x64 assembly) better insights

First of all, let's start with the following links about MOVDQA and MOVDQU which are already in this community: MOVDQU instruction + page boundary MOVUPD vs. MOVDQU (x86/x64 assembly) Difference ...
RajibTheKing
  • 1,362
1 vote
0 answers
602 views

Efficiently find indices of 1-bits in large array, using SIMD

If I have a very large array of bytes and want to find the indices of all 1-bits, with indices counting from the leftmost bit, how do I do this efficiently, probably using SIMD? (For finding the first 1-bit, see an ...
Arty
  • 16.8k
1 vote
0 answers
98 views

C++ std::countr_zero() in SIMD 128/256/512 (find position of least significant 1 bit in 128/256/512-bit number) [duplicate]

If I have a 128-, 256-, or 512-bit memory region, how can I find the number of consecutive zero bits starting from the least significant bit (left-most byte)? I can do: Try it online! #include <bit> int ...
Arty
  • 16.8k
1 vote
0 answers
690 views

Given an array of 16/32/64 bytes, how to quickly find the index of the first byte equal to a given value, using SSE2/AVX/AVX2/AVX-512 [duplicate]

If I have an array of 16, 32, or 64 bytes (let's suppose aligned on a 64-byte memory boundary), how do I quickly find the index of the first byte equal to a given value, using SIMD SSE2/AVX/AVX2/AVX-512? If such byte ...
Arty
  • 16.8k
1 vote
3 answers
891 views

How can I implement Bit Shift Right and Bit Shift Left by Vector for 8-bit and 16-bit integers in SSE2?

I came across this post whilst doing research for my next project. Being able to bit shift 8 and 16-bit integers by vector using SIMD would be very useful to me and I think many other people here. ...
dave_thenerd
