Newest 'sse2 c++' Questions

1 vote

0 answers

404 views

Error C1007: Unrecognized Flag '-archSSE2' After Upgrading Project to Visual Studio 2022

I've encountered an issue after upgrading my project to Visual Studio 2022. During the build process, I have the following error: LINK : fatal error C1007: unrecognized flag '-archSSE2' in 'p2' LINK :...

Frank Escobar

376

asked Oct 11, 2024 at 11:24

-1 votes

1 answer

135 views

Implementing real-time bitmap scaling with SSE2 intrinsics [closed]

I have this code that blits a bitmap onto the frame buffer with SSE2 intrinsics: for (uint r = 0; r < height; r++) { uint32* bufPixels = (frameBuffer->pixels + xPos) + frameBuffer->pitch *...

Nasir

27

asked Sep 5, 2024 at 10:26

0 votes

0 answers

255 views

How to add an alpha channel very fast to an RGB image using SSE2 and C++

I am writing a YUV420p to RGBA color conversion algorithm in C++ using SSE2. Right now, I have YUV420p to RGB and RGB to RGBA. The results are as follows: size of image: 1920 x 1200 time of RGBA to ...

bluetooth16

11

asked Oct 24, 2023 at 13:43

1 vote

1 answer

147 views

Matrix multiplication using simd produces incorrect results when filled with floating point values [closed]

I wanted to implement matrix multiplication with SIMD. Everything is fine when the matrices are filled with integers, but there are issues when they are filled with floating point values. ...

Arheus

23

asked Aug 3, 2023 at 14:07

4 votes

1 answer

701 views

In SIMD (SSE2), many instructions are named "_mm_set_epi8", "_mm_cmpgt_epi8", and so on; what do "mm" and "epi" mean?

I see many instructions with shorthand names such as "_mm_and_si128". I want to know what the "mm" means.

dongwang

43

asked Dec 17, 2022 at 4:24

1 vote

0 answers

1k views

MOVDQU vs MOVDQA Instruction (x86/x64 assembly) better insights

First of all, let's start with the following links about MOVDQA and MOVDQU which are already in this community: MOVDQU instruction + page boundary MOVUPD vs. MOVDQU (x86/x64 assembly) Difference ...

RajibTheKing

1,362

asked Nov 8, 2022 at 12:20

1 vote

0 answers

602 views

Efficiently find indices of 1-bits in large array, using SIMD

If I have a very large array of bytes and want to find the indices of all 1-bits, counting indices from the leftmost bit, how do I do this efficiently, probably using SIMD? (For finding the first 1-bit, see an ...

Arty

16.8k

asked Nov 8, 2022 at 6:26

1 vote

0 answers

98 views

C++ std::countr_zero() in SIMD 128/256/512 (find position of least significant 1 bit in 128/256/512-bit number) [duplicate]

If I have a 128-, 256-, or 512-bit memory region, how can I find the number of consecutive zero bits starting from the least significant bit (left-most byte)? I can do: #include <bit> int ...

Arty

16.8k

asked Nov 7, 2022 at 19:27

1 vote

0 answers

690 views

Having array of 16/32/64 bytes how to quickly find index of first byte equal to given, using SSE2/AVX/AVX2/AVX-512 [duplicate]

If I have an array of 16, 32, or 64 bytes (let's suppose it is aligned on a 64-byte memory boundary), how do I quickly find the index of the first byte equal to a given value, using SIMD SSE2/AVX/AVX2/AVX-512? If such byte ...
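For the 16-byte SSE2 case, the usual compare-then-movemask pattern can be sketched roughly as follows. This is only an illustration, not taken from the question: the helper name `find_first_byte16` and the choice to return -1 when no byte matches are assumptions, since the excerpt is cut off before it says what should happen in that case.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cassert>
#include <cstdint>

// Hypothetical helper (name and -1 convention are assumptions):
// returns the index of the first byte in a 16-byte block equal to `value`,
// or -1 if no byte matches.
inline int find_first_byte16(const std::uint8_t* data, std::uint8_t value) {
    assert(data != nullptr);
    // Unaligned load is used so the sketch works for any pointer.
    const __m128i block  = _mm_loadu_si128(reinterpret_cast<const __m128i*>(data));
    const __m128i needle = _mm_set1_epi8(static_cast<char>(value));
    // Each matching byte lane becomes 0xFF, every other lane 0x00.
    const __m128i eq = _mm_cmpeq_epi8(block, needle);
    // Collect the top bit of each lane into a 16-bit mask (bit i <-> byte i).
    unsigned mask = static_cast<unsigned>(_mm_movemask_epi8(eq));
    if (mask == 0) return -1;
    // Portable count of trailing zeros; a compiler builtin or C++20
    // std::countr_zero could replace this loop.
    int index = 0;
    while ((mask & 1u) == 0) { mask >>= 1; ++index; }
    return index;
}
```

The 32- and 64-byte cases would presumably repeat the same steps per 16-byte block under SSE2, or use wider registers where AVX2/AVX-512 is available.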

Arty

16.8k

asked Oct 22, 2022 at 20:04

1 vote

3 answers

891 views

How can I implement Bit Shift Right and Bit Shift Left by Vector for 8-bit and 16-bit integers in SSE2?

I came across this post whilst doing research for my next project. Being able to bit shift 8 and 16-bit integers by vector using SIMD would be very useful to me and I think many other people here. ...

dave_thenerd

468

asked Oct 13, 2022 at 4:04

0 votes

0 answers

118 views

Why do some SSE intrinsics introduce moves back and forth?

In my code, I set a 128-bit variable to zero. But I don't quite understand why it translates to two move instructions in assembly code? __m128i zeros = reinterpret_cast<__m128i>(_mm_setzero_pd())...

DoodleNoodle

19

asked Jun 25, 2022 at 6:57

1 vote

1 answer

636 views

AVX divide __m256i packed 32-bit integers by two (no AVX2)

I'm looking for the fastest way to divide an __m256i of packed 32-bit integers by two (aka shift right by one) using AVX. I don't have access to AVX2. As far as I know, my options are: Drop down to ...

GlassBeaver

332

asked Apr 30, 2022 at 22:46

4 votes

1 answer

894 views

Is there a difference between SVML vs. normal intrinsic square root functions?

Is there any sort of difference in precision or performance between normal sqrtps/pd or the SVML version: __m128d _mm_sqrt_pd (__m128d a) [SSE2] __m128d _mm_svml_sqrt_pd (__m128d a) [SSE?] ...

dave_thenerd

468

asked Sep 28, 2021 at 0:51

2 votes

3 answers

758 views

How would you convert a "while" iterator into simd instructions?

This is the code I actually had (for a scalar code) which I've replicated (x4) storing data into simd: waveTable *waveTables[4]; for (int i = 0; i < 4; i++) { int waveTableIindex = 0; while ...

markzzz

48.2k

asked Aug 16, 2021 at 8:57

0 votes

1 answer

581 views

How to set an int32 value at some index within an m128i with only SSE2?

Is there an SSE2 intrinsic that can set a single int32 value within an m128i? For example, setting the value 1000 at index 1 of an m128i that already contains 1,2,3,4 (which results in 1,1000,3,4).
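One SSE2-only way to do this with a runtime index is a mask-and-merge, sketched below. The helper name `set_lane_epi32` is hypothetical, and the sketch assumes that "index 1" means the second-lowest 32-bit lane, i.e. that 1,2,3,4 is the order of the elements in memory.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns a copy of `v` with the 32-bit lane `index`
// (0 = lowest lane) replaced by `value`. Uses only SSE2 operations.
inline __m128i set_lane_epi32(__m128i v, int index, std::int32_t value) {
    assert(index >= 0 && index < 4);
    // Lane numbers 0..3, lowest lane first (_mm_set_epi32 takes the highest lane first).
    const __m128i lane_ids = _mm_set_epi32(3, 2, 1, 0);
    // All-ones in the selected lane, zero in the others.
    const __m128i mask = _mm_cmpeq_epi32(lane_ids, _mm_set1_epi32(index));
    const __m128i fill = _mm_set1_epi32(value);
    // Take `fill` where the mask is set and keep `v` elsewhere.
    return _mm_or_si128(_mm_and_si128(mask, fill), _mm_andnot_si128(mask, v));
}
```

With `v = _mm_setr_epi32(1, 2, 3, 4)`, calling `set_lane_epi32(v, 1, 1000)` yields the lanes 1, 1000, 3, 4 in memory order. When the index is a compile-time constant, two `_mm_insert_epi16` calls are another SSE2-only option.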

markzzz

48.2k

asked Apr 21, 2021 at 16:44
