Newest 'sse2' Questions

1 vote

0 answers

404 views

Error C1007: Unrecognized Flag '-archSSE2' After Upgrading Project to Visual Studio 2022

I've encountered an issue after upgrading my project to Visual Studio 2022. During the build process, I get the following error: LINK : fatal error C1007: unrecognized flag '-archSSE2' in 'p2' LINK :...

Frank Escobar

376

asked Oct 11, 2024 at 11:24

-1 votes

1 answer

135 views

Implementing real-time bitmap scaling with SSE2 intrinsics [closed]

I have this code that blits a bitmap onto the frame buffer with SSE2 intrinsics: for (uint r = 0; r < height; r++) { uint32* bufPixels = (frameBuffer->pixels + xPos) + frameBuffer->pitch *...

Nasir

27

asked Sep 5, 2024 at 10:26

1 vote

1 answer

648 views

What exactly is the _mm_movemask_epi8 intrinsic doing?

I encountered the _mm_movemask_epi8 intrinsic in some code and I am trying to understand what exactly it does through an example, as I didn't comprehend entirely what it does from reading the ...

Blargian

360

asked Apr 30, 2024 at 17:37
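
For readers landing on the `_mm_movemask_epi8` question above, here is a minimal sketch of what the intrinsic computes: it gathers the most significant bit of each of the 16 bytes in the vector into the low 16 bits of an `int`, with byte 0 mapping to bit 0. The helper name `high_bits` is hypothetical and only there for illustration; it is not from the question.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Hypothetical helper: loads 16 bytes and returns a 16-bit mask in which
// bit i is the top (sign) bit of bytes[i].
inline int high_bits(const std::uint8_t bytes[16]) {
    // Unaligned load, so the caller's buffer needs no special alignment.
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(bytes));
    return _mm_movemask_epi8(v);
}
```

With an input where only `bytes[0]` and `bytes[3]` have their top bit set, the result has bits 0 and 3 set, i.e. `0x9`. This is why the intrinsic is usually paired with a byte-wise compare such as `_mm_cmpeq_epi8`, whose result bytes are either `0x00` or `0xFF`.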

6 votes

3 answers

359 views

Clamp unsigned int to 0x10000 using SSE2

I want to clamp 32-bit unsigned ints to a fixed value (0x10000) using only SSE2 instructions. Basically, this C code: if (c>0x10000) c=0x10000; This code below works, but I'm wondering if it can be ...

Sanyin

101

asked Feb 2, 2024 at 17:46
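
The asker's own code is cut off in the excerpt above, so the following is not their version. It is a sketch of one common SSE2-only approach, assuming four `uint32_t` lanes per vector: SSE2 has no unsigned 32-bit min or compare, so both operands are XORed with `0x80000000` to turn the unsigned comparison into a signed one, and the result is blended with and/andnot/or. The function names are hypothetical.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Hypothetical helper: per-lane  if (c > 0x10000) c = 0x10000;  for unsigned lanes.
inline __m128i clamp_epu32_0x10000(__m128i c) {
    const __m128i bias  = _mm_set1_epi32(INT32_MIN);  // 0x80000000 in every lane
    const __m128i limit = _mm_set1_epi32(0x10000);
    // Flipping the sign bit of both sides makes signed > behave like unsigned >.
    __m128i gt = _mm_cmpgt_epi32(_mm_xor_si128(c, bias),
                                 _mm_xor_si128(limit, bias));
    // Select limit where c > limit, otherwise keep c.
    return _mm_or_si128(_mm_and_si128(gt, limit), _mm_andnot_si128(gt, c));
}

// Convenience wrapper for a plain array of four values (unaligned access).
inline void clamp4_to_0x10000(std::uint32_t v[4]) {
    __m128i x = _mm_loadu_si128(reinterpret_cast<const __m128i*>(v));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(v), clamp_epu32_0x10000(x));
}
```

Whether this beats the asker's version is not something the excerpt lets us judge; with SSE4.1 available, `_mm_min_epu32` would replace the whole sequence.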

0 votes

0 answers

255 views

How to add an alpha channel very fast to a RGB image using SSE2 and c++

I am writing a YUV420p to RGBA color conversion algorithm in C++ using SSE2. Right now, I have YUV420p to RGB and RGB to RGBA. The results are as follows: size of image: 1920 x 1200 time of RGBA to ...

bluetooth16

11

asked Oct 24, 2023 at 13:43

1 vote

2 answers

152 views

Suggestions on further optimising this chi-square function using SSE2 intrinsics

I am trying to convert the below chi-square function in C code to SSE2 intrinsics. I am getting the correct output for both functions, and I have measured the time it takes for both functions to ...

Sanku

511

asked Sep 4, 2023 at 11:46

1 vote

1 answer

147 views

Matrix multiplication using simd produces incorrect results when filled with floating point values [closed]

I wanted to create a matrix multiplication with SIMD. Everything is fine when the matrix is filled with integers, but there are some issues when my matrices are filled with floating point values. ...

Arheus

23

asked Aug 3, 2023 at 14:07

1 vote

0 answers

259 views

Sum of bytes in an __m128 register [duplicate]

I am trying to find the sum of all bytes in an __m128 register using SSE and SSE2. So far what I have is __m128i sum = _mm_sad_epu8(bytes, _mm_setzero_si128()); return _mm_cvtsi128_si32(sum) + ...

user17784058

11

asked Jun 8, 2023 at 22:36
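
The excerpt above truncates the asker's return expression, so this is not a completion of their code. It is a sketch of one way to finish the horizontal add, assuming the bytes live in an `__m128i`: `_mm_sad_epu8` against zero leaves two partial sums, one in the low 16 bits of each 64-bit half, and those two halves still have to be added. The function names are hypothetical.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Hypothetical helper: sum of all 16 unsigned bytes in v (maximum 16 * 255 = 4080).
inline int sum_bytes(__m128i v) {
    // Sum of absolute differences against zero: bytes 0..7 are summed into
    // the low 64-bit lane, bytes 8..15 into the high 64-bit lane.
    __m128i sad = _mm_sad_epu8(v, _mm_setzero_si128());
    // Shift the high lane down by 8 bytes so it can be read as the low lane.
    __m128i hi = _mm_srli_si128(sad, 8);
    return _mm_cvtsi128_si32(sad) + _mm_cvtsi128_si32(hi);
}

// Convenience wrapper for a plain 16-byte buffer (unaligned load).
inline int sum_bytes16(const std::uint8_t* p) {
    return sum_bytes(_mm_loadu_si128(reinterpret_cast<const __m128i*>(p)));
}
```

The two lanes could also be combined inside the vector with `_mm_add_epi32(sad, hi)` before a single `_mm_cvtsi128_si32`; which is faster depends on the surrounding code.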

0 votes

1 answer

447 views

Why isn't Avx.Multiply significantly faster than the * operator?

I've created the following test method to understand how SSE and AVX work and what their benefits are. Now I'm actually very surprised to see that System.Runtime.Intrinsics.X86.Avx.Multiply is less ...

André Reichelt

1,641

asked Feb 15, 2023 at 10:58

4 votes

1 answer

701 views

In SIMD, SSE2, many instructions are named like "_mm_set_epi8", "_mm_cmpgt_epi8" and so on; what do "mm" and "epi" mean?

I see many instructions with shorthand names such as "_mm_and_si128". I want to know what the "mm" means.

dongwang

43

asked Dec 17, 2022 at 4:24

1 vote

0 answers

1k views

MOVDQU vs MOVDQA Instruction (x86/x64 assembly) better insights

First of all, let's start with the following links about MOVDQA and MOVDQU which are already in this community: MOVDQU instruction + page boundary MOVUPD vs. MOVDQU (x86/x64 assembly) Difference ...

RajibTheKing

1,362

asked Nov 8, 2022 at 12:20

1 vote

0 answers

602 views

Efficiently find indices of 1-bits in large array, using SIMD

If I have a very large array of bytes and want to find the indices of all 1-bits, counting indices from the leftmost bit, how do I do this efficiently, probably using SIMD? (For finding the first 1-bit, see an ...

Arty

16.8k

asked Nov 8, 2022 at 6:26

1 vote

0 answers

98 views

C++ std::countr_zero() in SIMD 128/256/512 (find position of least significant 1 bit in 128/256/512-bit number) [duplicate]

If I have a 128-, 256-, or 512-bit memory region, how can I find the number of consecutive zero bits starting from the least significant bit (left-most byte)? I can do: #include <bit> int ...

Arty

16.8k

asked Nov 7, 2022 at 19:27

1 vote

0 answers

690 views

Having array of 16/32/64 bytes how to quickly find index of first byte equal to given, using SSE2/AVX/AVX2/AVX-512 [duplicate]

If I have an array of 16, 32, or 64 bytes (let's suppose aligned on a 64-byte memory boundary), how do I quickly find the index of the first byte equal to a given value, using SIMD SSE2/AVX/AVX2/AVX-512? If such byte ...

Arty

16.8k

asked Oct 22, 2022 at 20:04
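
For the 16-byte SSE2 case of the question above, a minimal sketch of the usual compare + movemask + count-trailing-zeros pattern might look like this. The function name is hypothetical, and returning `-1` when nothing matches is an assumption, since the excerpt is cut off before it says what should happen in that case. The bit scan uses compiler builtins (`__builtin_ctz` on GCC/Clang, `_BitScanForward` on MSVC); C++20's `std::countr_zero` would work as well.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>
#if defined(_MSC_VER)
#include <intrin.h>     // _BitScanForward
#endif

// Hypothetical helper: index of the first byte in p[0..15] equal to value,
// or -1 if there is none (the -1 convention is an assumption).
inline int find_first_byte16(const std::uint8_t* p, std::uint8_t value) {
    __m128i v  = _mm_loadu_si128(reinterpret_cast<const __m128i*>(p));
    // Each matching byte becomes 0xFF, every other byte 0x00.
    __m128i eq = _mm_cmpeq_epi8(v, _mm_set1_epi8(static_cast<char>(value)));
    // One bit per byte; bit i set means p[i] == value.
    unsigned mask = static_cast<unsigned>(_mm_movemask_epi8(eq));
    if (mask == 0) return -1;
#if defined(_MSC_VER)
    unsigned long idx;
    _BitScanForward(&idx, mask);
    return static_cast<int>(idx);
#else
    return __builtin_ctz(mask);  // position of the lowest set bit
#endif
}
```

If the buffer really is 64-byte aligned as the question supposes, `_mm_load_si128` can replace the unaligned load. The 32- and 64-byte cases follow the same shape with wider vectors or with several 16-byte masks combined into one integer.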

1 vote

3 answers

891 views

How can I implement Bit Shift Right and Bit Shift Left by Vector for 8-bit and 16-bit integers in SSE2?

I came across this post while doing research for my next project. Being able to bit shift 8 and 16-bit integers by vector using SIMD would be very useful to me and I think many other people here. ...

dave_thenerd

468

asked Oct 13, 2022 at 4:04
