clang 20.0.0git
avxvnniint16intrin.h File Reference


Macros

#define _mm_dpwsud_epi32(__W, __A, __B)
 Multiply groups of 2 adjacent pairs of signed 16-bit integers in __A with corresponding unsigned 16-bit integers in __B, producing 2 intermediate signed 32-bit results.
 
#define _mm256_dpwsud_epi32(__W, __A, __B)
 Multiply groups of 2 adjacent pairs of signed 16-bit integers in __A with corresponding unsigned 16-bit integers in __B, producing 2 intermediate signed 32-bit results.
 
#define _mm_dpwsuds_epi32(__W, __A, __B)
 Multiply groups of 2 adjacent pairs of signed 16-bit integers in __A with corresponding unsigned 16-bit integers in __B, producing 2 intermediate signed 32-bit results.
 
#define _mm256_dpwsuds_epi32(__W, __A, __B)
 Multiply groups of 2 adjacent pairs of signed 16-bit integers in __A with corresponding unsigned 16-bit integers in __B, producing 2 intermediate signed 32-bit results.
 
#define _mm_dpwusd_epi32(__W, __A, __B)
 Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in __A with corresponding signed 16-bit integers in __B, producing 2 intermediate signed 32-bit results.
 
#define _mm256_dpwusd_epi32(__W, __A, __B)
 Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in __A with corresponding signed 16-bit integers in __B, producing 2 intermediate signed 32-bit results.
 
#define _mm_dpwusds_epi32(__W, __A, __B)
 Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in __A with corresponding signed 16-bit integers in __B, producing 2 intermediate signed 32-bit results.
 
#define _mm256_dpwusds_epi32(__W, __A, __B)
 Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in __A with corresponding signed 16-bit integers in __B, producing 2 intermediate signed 32-bit results.
 
#define _mm_dpwuud_epi32(__W, __A, __B)
 Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in __A with corresponding unsigned 16-bit integers in __B, producing 2 intermediate unsigned 32-bit results.
 
#define _mm256_dpwuud_epi32(__W, __A, __B)
 Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in __A with corresponding unsigned 16-bit integers in __B, producing 2 intermediate unsigned 32-bit results.
 
#define _mm_dpwuuds_epi32(__W, __A, __B)
 Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in __A with corresponding unsigned 16-bit integers in __B, producing 2 intermediate unsigned 32-bit results.
 
#define _mm256_dpwuuds_epi32(__W, __A, __B)
 Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in __A with corresponding unsigned 16-bit integers in __B, producing 2 intermediate unsigned 32-bit results.
 

Macro Definition Documentation

◆ _mm256_dpwsud_epi32

#define _mm256_dpwsud_epi32 (   __W,
  __A,
  __B 
)
Value:
((__m256i)__builtin_ia32_vpdpwsud256((__v8si)(__W), (__v8si)(__A), \
(__v8si)(__B)))

Multiply groups of 2 adjacent pairs of signed 16-bit integers in __A with corresponding unsigned 16-bit integers in __B, producing 2 intermediate signed 32-bit results.

Sum these 2 results with the corresponding 32-bit integer in __W, and store the packed 32-bit results in dst.

__m256i _mm256_dpwsud_epi32(__m256i __W, __m256i __A, __m256i __B)

This intrinsic corresponds to the VPDPWSUD instruction.

Parameters
__W    A 256-bit vector of [8 x int].
__A    A 256-bit vector of [16 x short].
__B    A 256-bit vector of [16 x unsigned short].
Returns
A 256-bit vector of [8 x int].
FOR j := 0 to 7
tmp1.dword := SignExtend32(__A.word[2*j]) * ZeroExtend32(__B.word[2*j])
tmp2.dword := SignExtend32(__A.word[2*j+1]) * ZeroExtend32(__B.word[2*j+1])
dst.dword[j] := __W.dword[j] + tmp1 + tmp2
ENDFOR
dst[MAX:256] := 0

Definition at line 82 of file avxvnniint16intrin.h.
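
The pseudocode above can be checked on any machine with a scalar model. The following is a minimal sketch, not the header's implementation: `ref_dpwsud_256` and `example_dpwsud_256` are hypothetical names, and the model assumes that the non-saturating sum keeps only the low 32 bits, as the dword assignment in the pseudocode suggests.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of avxvnniint16intrin.h: scalar model of
 * the 8-lane VPDPWSUD pseudocode. */
void ref_dpwsud_256(int32_t dst[8], const int32_t w[8],
                    const int16_t a[16], const uint16_t b[16])
{
    for (int j = 0; j < 8; ++j) {
        /* SignExtend32(a) * ZeroExtend32(b): each product fits in 32 bits. */
        int32_t tmp1 = (int32_t)a[2 * j] * (int32_t)b[2 * j];
        int32_t tmp2 = (int32_t)a[2 * j + 1] * (int32_t)b[2 * j + 1];
        /* Add modulo 2^32 in unsigned arithmetic to avoid signed overflow. */
        uint32_t sum = (uint32_t)w[j] + (uint32_t)tmp1 + (uint32_t)tmp2;
        /* Converting back is implementation-defined before C23; GCC and
         * Clang wrap, which is the behavior assumed here. */
        dst[j] = (int32_t)sum;
    }
}

void example_dpwsud_256(void)
{
    int32_t  w[8]  = {10, 0, 0, 0, 0, 0, 0, INT32_MAX};
    int16_t  a[16] = {-2, 4};
    uint16_t b[16] = {3, 65535};
    int32_t  dst[8];

    a[14] = 1;
    b[14] = 1;
    ref_dpwsud_256(dst, w, a, b);

    assert(dst[0] == 262144);     /* 10 + (-2 * 3) + (4 * 65535) */
    assert(dst[1] == 0);          /* untouched lanes stay at w[j] */
    assert(dst[7] == INT32_MIN);  /* INT32_MAX + 1 wraps around */
}
```

Lane 7 shows why the saturating form exists: a single extra unit on top of INT32_MAX flips the sign of the accumulator.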

◆ _mm256_dpwsuds_epi32

#define _mm256_dpwsuds_epi32 (   __W,
  __A,
  __B 
)
Value:
((__m256i)__builtin_ia32_vpdpwsuds256((__v8si)(__W), (__v8si)(__A), \
(__v8si)(__B)))

Multiply groups of 2 adjacent pairs of signed 16-bit integers in __A with corresponding unsigned 16-bit integers in __B, producing 2 intermediate signed 32-bit results.

Sum these 2 results with the corresponding 32-bit integer in __W with signed saturation, and store the packed 32-bit results in dst.

__m256i _mm256_dpwsuds_epi32(__m256i __W, __m256i __A, __m256i __B)

This intrinsic corresponds to the VPDPWSUDS instruction.

Parameters
__W    A 256-bit vector of [8 x int].
__A    A 256-bit vector of [16 x short].
__B    A 256-bit vector of [16 x unsigned short].
Returns
A 256-bit vector of [8 x int].
FOR j := 0 to 7
tmp1.dword := SignExtend32(__A.word[2*j]) * ZeroExtend32(__B.word[2*j])
tmp2.dword := SignExtend32(__A.word[2*j+1]) * ZeroExtend32(__B.word[2*j+1])
dst.dword[j] := SIGNED_DWORD_SATURATE(__W.dword[j] + tmp1 + tmp2)
ENDFOR
dst[MAX:256] := 0

Definition at line 152 of file avxvnniint16intrin.h.
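
To illustrate the SIGNED_DWORD_SATURATE step, here is a minimal scalar sketch of a single dword lane. The names `saturate_i32`, `ref_dpwsuds_lane` and `example_dpwsuds` are hypothetical, and the model assumes the clamp is applied once to the exact sum, which is how the pseudocode reads.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: clamp an exact 64-bit sum to the int32_t range. */
static int32_t saturate_i32(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* Scalar model of one dword lane of VPDPWSUDS (A signed, B unsigned). */
int32_t ref_dpwsuds_lane(int32_t w, const int16_t a[2], const uint16_t b[2])
{
    int64_t tmp1 = (int64_t)a[0] * (int64_t)b[0];
    int64_t tmp2 = (int64_t)a[1] * (int64_t)b[1];
    return saturate_i32((int64_t)w + tmp1 + tmp2);
}

void example_dpwsuds(void)
{
    const int16_t  neg[2]   = {-32768, -32768};
    const int16_t  pos[2]   = {32767, 32767};
    const uint16_t big[2]   = {65535, 65535};
    const int16_t  small[2] = {-3, 2};
    const uint16_t coef[2]  = {7, 11};

    assert(ref_dpwsuds_lane(5, neg, big) == INT32_MIN);  /* clamps low */
    assert(ref_dpwsuds_lane(5, pos, big) == INT32_MAX);  /* clamps high */
    assert(ref_dpwsuds_lane(5, small, coef) == 6);       /* 5 - 21 + 22 */
}
```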

◆ _mm256_dpwusd_epi32

#define _mm256_dpwusd_epi32 (   __W,
  __A,
  __B 
)
Value:
((__m256i)__builtin_ia32_vpdpwusd256((__v8si)(__W), (__v8si)(__A), \
(__v8si)(__B)))

Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in __A with corresponding signed 16-bit integers in __B, producing 2 intermediate signed 32-bit results.

Sum these 2 results with the corresponding 32-bit integer in __W, and store the packed 32-bit results in dst.

__m256i _mm256_dpwusd_epi32(__m256i __W, __m256i __A, __m256i __B)

This intrinsic corresponds to the VPDPWUSD instruction.

Parameters
__W    A 256-bit vector of [8 x int].
__A    A 256-bit vector of [16 x unsigned short].
__B    A 256-bit vector of [16 x short].
Returns
A 256-bit vector of [8 x int].
FOR j := 0 to 7
tmp1.dword := ZeroExtend32(__A.word[2*j]) * SignExtend32(__B.word[2*j])
tmp2.dword := ZeroExtend32(__A.word[2*j+1]) * SignExtend32(__B.word[2*j+1])
dst.dword[j] := __W.dword[j] + tmp1 + tmp2
ENDFOR
dst[MAX:256] := 0

Definition at line 220 of file avxvnniint16intrin.h.
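
The only difference between the "usd" and "sud" forms is which operand is sign-extended and which is zero-extended. The sketch below models one dword lane of each so the same bit patterns can be fed to both; all function names are hypothetical, and the operands are passed as raw 16-bit patterns purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* SignExtend32 of a raw 16-bit pattern, without relying on
 * implementation-defined narrowing conversions. */
static int32_t sext16(uint16_t bits)
{
    return (int32_t)(bits ^ 0x8000u) - 0x8000;
}

/* One dword lane of VPDPWUSD: A is zero-extended, B is sign-extended. */
int32_t ref_dpwusd_lane(int32_t w, const uint16_t a[2], const uint16_t b[2])
{
    int32_t tmp1 = (int32_t)a[0] * sext16(b[0]);
    int32_t tmp2 = (int32_t)a[1] * sext16(b[1]);
    uint32_t sum = (uint32_t)w + (uint32_t)tmp1 + (uint32_t)tmp2;
    return (int32_t)sum;  /* wraps on GCC and Clang */
}

/* One dword lane of VPDPWSUD: A is sign-extended, B is zero-extended. */
int32_t ref_dpwsud_lane(int32_t w, const uint16_t a[2], const uint16_t b[2])
{
    int32_t tmp1 = sext16(a[0]) * (int32_t)b[0];
    int32_t tmp2 = sext16(a[1]) * (int32_t)b[1];
    uint32_t sum = (uint32_t)w + (uint32_t)tmp1 + (uint32_t)tmp2;
    return (int32_t)sum;
}

void example_usd_vs_sud(void)
{
    const uint16_t a[2] = {0xFFFF, 0};
    const uint16_t b[2] = {0x0001, 0};

    assert(ref_dpwusd_lane(0, a, b) == 65535);  /* 0xFFFF read as 65535 */
    assert(ref_dpwsud_lane(0, a, b) == -1);     /* 0xFFFF read as -1 */
    /* Swapping the operands turns one form into the other. */
    assert(ref_dpwusd_lane(0, a, b) == ref_dpwsud_lane(0, b, a));
}
```

In this model, passing the unsigned data as __A selects the "usd" form and passing it as __B selects the "sud" form; the arithmetic is otherwise identical.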

◆ _mm256_dpwusds_epi32

#define _mm256_dpwusds_epi32 (   __W,
  __A,
  __B 
)
Value:
((__m256i)__builtin_ia32_vpdpwusds256((__v8si)(__W), (__v8si)(__A), \
(__v8si)(__B)))

Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in __A with corresponding signed 16-bit integers in __B, producing 2 intermediate signed 32-bit results.

Sum these 2 results with the corresponding 32-bit integer in __W with signed saturation, and store the packed 32-bit results in dst.

__m256i _mm256_dpwusds_epi32(__m256i __W, __m256i __A, __m256i __B)

This intrinsic corresponds to the VPDPWUSDS instruction.

Parameters
__W    A 256-bit vector of [8 x int].
__A    A 256-bit vector of [16 x unsigned short].
__B    A 256-bit vector of [16 x short].
Returns
A 256-bit vector of [8 x int].
FOR j := 0 to 7
tmp1.dword := ZeroExtend32(__A.word[2*j]) * SignExtend32(__B.word[2*j])
tmp2.dword := ZeroExtend32(__A.word[2*j+1]) * SignExtend32(__B.word[2*j+1])
dst.dword[j] := SIGNED_DWORD_SATURATE(__W.dword[j] + tmp1 + tmp2)
ENDFOR
dst[MAX:256] := 0

Definition at line 290 of file avxvnniint16intrin.h.
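
The saturating and non-saturating forms agree until the exact sum leaves the signed 32-bit range. The sketch below computes both results for one dword lane from the same exact 64-bit sum; `dpwusd_exact`, `ref_dpwusd_wrap` and `ref_dpwusds_sat` are hypothetical names, and the wrap and clamp rules are taken from the two pseudocode listings rather than from hardware measurements.

```c
#include <assert.h>
#include <stdint.h>

/* Exact (unbounded) value of W + tmp1 + tmp2 for one lane. */
static int64_t dpwusd_exact(int32_t w, const uint16_t a[2], const int16_t b[2])
{
    return (int64_t)w + (int64_t)a[0] * b[0] + (int64_t)a[1] * b[1];
}

/* Non-saturating form: keep the low 32 bits of the exact sum. */
int32_t ref_dpwusd_wrap(int32_t w, const uint16_t a[2], const int16_t b[2])
{
    return (int32_t)(uint32_t)dpwusd_exact(w, a, b);  /* wraps on GCC/Clang */
}

/* Saturating form: clamp the exact sum to [INT32_MIN, INT32_MAX]. */
int32_t ref_dpwusds_sat(int32_t w, const uint16_t a[2], const int16_t b[2])
{
    int64_t v = dpwusd_exact(w, a, b);
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

void example_wrap_vs_sat(void)
{
    const uint16_t a[2]   = {65535, 65535};
    const int16_t  hi[2]  = {32767, 32767};
    const int16_t  lo[2]  = {-32768, -32768};
    const int16_t  one[2] = {1, 1};

    /* Exact sum 4294770690 exceeds INT32_MAX. */
    assert(ref_dpwusd_wrap(0, a, hi) == -196606);
    assert(ref_dpwusds_sat(0, a, hi) == INT32_MAX);

    /* Exact sum -4294901760 is below INT32_MIN. */
    assert(ref_dpwusd_wrap(0, a, lo) == 65536);
    assert(ref_dpwusds_sat(0, a, lo) == INT32_MIN);

    /* In range: both forms agree. */
    assert(ref_dpwusd_wrap(0, a, one) == ref_dpwusds_sat(0, a, one));
}
```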

◆ _mm256_dpwuud_epi32

#define _mm256_dpwuud_epi32 (   __W,
  __A,
  __B 
)
Value:
((__m256i)__builtin_ia32_vpdpwuud256((__v8si)(__W), (__v8si)(__A), \
(__v8si)(__B)))

Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in __A with corresponding unsigned 16-bit integers in __B, producing 2 intermediate unsigned 32-bit results.

Sum these 2 results with the corresponding 32-bit integer in __W, and store the packed 32-bit results in dst.

__m256i _mm256_dpwuud_epi32(__m256i __W, __m256i __A, __m256i __B)

This intrinsic corresponds to the VPDPWUUD instruction.

Parameters
__W    A 256-bit vector of [8 x unsigned int].
__A    A 256-bit vector of [16 x unsigned short].
__B    A 256-bit vector of [16 x unsigned short].
Returns
A 256-bit vector of [8 x unsigned int].
FOR j := 0 to 7
tmp1.dword := ZeroExtend32(__A.word[2*j]) * ZeroExtend32(__B.word[2*j])
tmp2.dword := ZeroExtend32(__A.word[2*j+1]) * ZeroExtend32(__B.word[2*j+1])
dst.dword[j] := __W.dword[j] + tmp1 + tmp2
ENDFOR
dst[MAX:256] := 0

Definition at line 358 of file avxvnniint16intrin.h.
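
For the fully unsigned form, a scalar model has to widen each operand before multiplying. The sketch below (with the hypothetical names `ref_dpwuud_256` and `example_dpwuud_256`) follows the 8-lane pseudocode and assumes the result is reduced modulo 2^32.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: scalar model of the 8-lane VPDPWUUD pseudocode. */
void ref_dpwuud_256(uint32_t dst[8], const uint32_t w[8],
                    const uint16_t a[16], const uint16_t b[16])
{
    for (int j = 0; j < 8; ++j) {
        /* Cast before multiplying: uint16_t * uint16_t would otherwise be
         * promoted to int and could overflow, which is undefined in C. */
        uint32_t tmp1 = (uint32_t)a[2 * j] * (uint32_t)b[2 * j];
        uint32_t tmp2 = (uint32_t)a[2 * j + 1] * (uint32_t)b[2 * j + 1];
        /* Unsigned addition is defined to wrap modulo 2^32. */
        dst[j] = w[j] + tmp1 + tmp2;
    }
}

void example_dpwuud_256(void)
{
    uint32_t w[8]  = {1};
    uint16_t a[16] = {65535, 0, 65535, 65535};
    uint16_t b[16] = {65535, 0, 65535, 65535};
    uint32_t dst[8];

    ref_dpwuud_256(dst, w, a, b);

    assert(dst[0] == 0xFFFE0002u);  /* 1 + 65535 * 65535 */
    assert(dst[1] == 0xFFFC0002u);  /* 2 * 0xFFFE0001 modulo 2^32 */
    assert(dst[2] == 0u);
}
```

A single product already reaches 0xFFFE0001, so two maximal products overflow 32 bits even when the accumulator starts at zero.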

◆ _mm256_dpwuuds_epi32

#define _mm256_dpwuuds_epi32 (   __W,
  __A,
  __B 
)
Value:
((__m256i)__builtin_ia32_vpdpwuuds256((__v8si)(__W), (__v8si)(__A), \
(__v8si)(__B)))

Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in __A with corresponding unsigned 16-bit integers in __B, producing 2 intermediate unsigned 32-bit results.

Sum these 2 results with the corresponding 32-bit integer in __W with unsigned saturation, and store the packed 32-bit results in dst.

__m256i _mm256_dpwuuds_epi32(__m256i __W, __m256i __A, __m256i __B)

This intrinsic corresponds to the VPDPWUUDS instruction.

Parameters
__W    A 256-bit vector of [8 x unsigned int].
__A    A 256-bit vector of [16 x unsigned short].
__B    A 256-bit vector of [16 x unsigned short].
Returns
A 256-bit vector of [8 x unsigned int].
FOR j := 0 to 7
tmp1.dword := ZeroExtend32(__A.word[2*j]) * ZeroExtend32(__B.word[2*j])
tmp2.dword := ZeroExtend32(__A.word[2*j+1]) * ZeroExtend32(__B.word[2*j+1])
dst.dword[j] := UNSIGNED_DWORD_SATURATE(__W.dword[j] + tmp1 + tmp2)
ENDFOR
dst[MAX:256] := 0

Definition at line 428 of file avxvnniint16intrin.h.
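
The UNSIGNED_DWORD_SATURATE step in the pseudocode means an accumulator that reaches 0xFFFFFFFF stays there on later accumulations. A minimal single-lane sketch, assuming the clamp applies to the exact sum; `ref_dpwuuds_lane` and `example_dpwuuds` are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one dword lane of VPDPWUUDS. */
uint32_t ref_dpwuuds_lane(uint32_t w, const uint16_t a[2], const uint16_t b[2])
{
    /* 64-bit arithmetic holds the exact sum of W and both products. */
    uint64_t exact = (uint64_t)w
                   + (uint64_t)a[0] * (uint64_t)b[0]
                   + (uint64_t)a[1] * (uint64_t)b[1];
    return exact > UINT32_MAX ? UINT32_MAX : (uint32_t)exact;
}

void example_dpwuuds(void)
{
    const uint16_t a[2] = {65535, 0};
    const uint16_t b[2] = {65535, 0};
    uint32_t acc = 0;

    acc = ref_dpwuuds_lane(acc, a, b);
    assert(acc == 0xFFFE0001u);  /* first step still fits */
    acc = ref_dpwuuds_lane(acc, a, b);
    assert(acc == UINT32_MAX);   /* second step saturates */
    acc = ref_dpwuuds_lane(acc, a, b);
    assert(acc == UINT32_MAX);   /* and stays saturated */
}
```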

◆ _mm_dpwsud_epi32

#define _mm_dpwsud_epi32 (   __W,
  __A,
  __B 
)
Value:
((__m128i)__builtin_ia32_vpdpwsud128((__v4si)(__W), (__v4si)(__A), \
(__v4si)(__B)))

Multiply groups of 2 adjacent pairs of signed 16-bit integers in __A with corresponding unsigned 16-bit integers in __B, producing 2 intermediate signed 32-bit results.

Sum these 2 results with the corresponding 32-bit integer in __W, and store the packed 32-bit results in dst.

__m128i _mm_dpwsud_epi32(__m128i __W, __m128i __A, __m128i __B)

This intrinsic corresponds to the VPDPWSUD instruction.

Parameters
__W    A 128-bit vector of [4 x int].
__A    A 128-bit vector of [8 x short].
__B    A 128-bit vector of [8 x unsigned short].
Returns
A 128-bit vector of [4 x int].
FOR j := 0 to 3
tmp1.dword := SignExtend32(__A.word[2*j]) * ZeroExtend32(__B.word[2*j])
tmp2.dword := SignExtend32(__A.word[2*j+1]) * ZeroExtend32(__B.word[2*j+1])
dst.dword[j] := __W.dword[j] + tmp1 + tmp2
ENDFOR
dst[MAX:128] := 0

Definition at line 48 of file avxvnniint16intrin.h.
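
As a usage illustration, the sketch below accumulates a mixed-sign dot product with _mm_dpwsud_epi32 when the target supports it and falls back to scalar code otherwise. It makes several assumptions that are not stated on this page: that the compiler defines the feature macro `__AVXVNNIINT16__` when the extension is enabled (for example via `-mavxvnniint16`), that `<immintrin.h>` pulls in this header, and that wraparound on overflow is acceptable. `dot_s16_u16` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__AVXVNNIINT16__)   /* assumed feature-test macro */
#include <immintrin.h>
#endif

/* Hypothetical helper: sum of a[i] * b[i], reduced modulo 2^32. */
int32_t dot_s16_u16(const int16_t *a, const uint16_t *b, size_t n)
{
    uint32_t total = 0;
    size_t i = 0;
#if defined(__AVXVNNIINT16__)
    __m128i acc = _mm_setzero_si128();
    for (; i + 8 <= n; i += 8) {
        /* Each 128-bit load covers 8 words, i.e. 4 dword lanes. */
        __m128i va = _mm_loadu_si128((const __m128i *)(const void *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(const void *)(b + i));
        acc = _mm_dpwsud_epi32(acc, va, vb);
    }
    uint32_t lanes[4];
    _mm_storeu_si128((__m128i *)(void *)lanes, acc);
    total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif
    /* Scalar tail; also the whole computation without the extension. */
    for (; i < n; ++i)
        total += (uint32_t)((int32_t)a[i] * (int32_t)b[i]);
    return (int32_t)total;  /* wraps on GCC and Clang */
}

void example_dot_s16_u16(void)
{
    const int16_t  a[9] = {1, -2, 3, -4, 5, -6, 7, -8, 9};
    const uint16_t b[9] = {1, 2, 3, 4, 5, 6, 7, 8, 9};

    assert(dot_s16_u16(a, b, 9) == 45);
}
```

Because addition modulo 2^32 is associative, the vector path and the scalar path give the same result regardless of how the terms are grouped into lanes.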

◆ _mm_dpwsuds_epi32

#define _mm_dpwsuds_epi32 (   __W,
  __A,
  __B 
)
Value:
((__m128i)__builtin_ia32_vpdpwsuds128((__v4si)(__W), (__v4si)(__A), \
(__v4si)(__B)))

Multiply groups of 2 adjacent pairs of signed 16-bit integers in __A with corresponding unsigned 16-bit integers in __B, producing 2 intermediate signed 32-bit results.

Sum these 2 results with the corresponding 32-bit integer in __W with signed saturation, and store the packed 32-bit results in dst.

__m128i _mm_dpwsuds_epi32(__m128i __W, __m128i __A, __m128i __B)

This intrinsic corresponds to the VPDPWSUDS instruction.

Parameters
__W    A 128-bit vector of [4 x int].
__A    A 128-bit vector of [8 x short].
__B    A 128-bit vector of [8 x unsigned short].
Returns
A 128-bit vector of [4 x int].
FOR j := 0 to 3
tmp1.dword := SignExtend32(__A.word[2*j]) * ZeroExtend32(__B.word[2*j])
tmp2.dword := SignExtend32(__A.word[2*j+1]) * ZeroExtend32(__B.word[2*j+1])
dst.dword[j] := SIGNED_DWORD_SATURATE(__W.dword[j] + tmp1 + tmp2)
ENDFOR
dst[MAX:128] := 0

Definition at line 117 of file avxvnniint16intrin.h.

◆ _mm_dpwusd_epi32

#define _mm_dpwusd_epi32 (   __W,
  __A,
  __B 
)
Value:
((__m128i)__builtin_ia32_vpdpwusd128((__v4si)(__W), (__v4si)(__A), \
(__v4si)(__B)))

Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in __A with corresponding signed 16-bit integers in __B, producing 2 intermediate signed 32-bit results.

Sum these 2 results with the corresponding 32-bit integer in __W, and store the packed 32-bit results in dst.

__m128i _mm_dpwusd_epi32(__m128i __W, __m128i __A, __m128i __B)

This intrinsic corresponds to the VPDPWUSD instruction.

Parameters
__W    A 128-bit vector of [4 x int].
__A    A 128-bit vector of [8 x unsigned short].
__B    A 128-bit vector of [8 x short].
Returns
A 128-bit vector of [4 x int].
FOR j := 0 to 3
tmp1.dword := ZeroExtend32(__A.word[2*j]) * SignExtend32(__B.word[2*j])
tmp2.dword := ZeroExtend32(__A.word[2*j+1]) * SignExtend32(__B.word[2*j+1])
dst.dword[j] := __W.dword[j] + tmp1 + tmp2
ENDFOR
dst[MAX:128] := 0

Definition at line 186 of file avxvnniint16intrin.h.
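
The macro casts all three operands to __v4si, so __A and __B arrive as four dwords even though the operation treats them as eight words. On x86, word[2*j] is the low half of dword j and word[2*j+1] is the high half. The sketch below models one lane directly from packed dword values; `sext16_bits` and `ref_dpwusd_dword` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* SignExtend32 of the low 16 bits of a value. */
static int32_t sext16_bits(uint32_t bits)
{
    return (int32_t)((bits & 0xFFFFu) ^ 0x8000u) - 0x8000;
}

/* One dword lane of VPDPWUSD, taking A and B as packed 32-bit values. */
int32_t ref_dpwusd_dword(int32_t w, uint32_t a_dword, uint32_t b_dword)
{
    int32_t a_lo = (int32_t)(a_dword & 0xFFFFu);  /* __A.word[2*j], unsigned */
    int32_t a_hi = (int32_t)(a_dword >> 16);      /* __A.word[2*j+1] */
    int32_t b_lo = sext16_bits(b_dword);          /* __B.word[2*j], signed */
    int32_t b_hi = sext16_bits(b_dword >> 16);    /* __B.word[2*j+1] */
    uint32_t sum = (uint32_t)w + (uint32_t)(a_lo * b_lo)
                 + (uint32_t)(a_hi * b_hi);
    return (int32_t)sum;  /* wraps on GCC and Clang */
}

void example_dpwusd_dword(void)
{
    /* A = {65535, 2}, B = {-1, 3}: 100 + 65535 * (-1) + 2 * 3 */
    assert(ref_dpwusd_dword(100, 0x0002FFFFu, 0x0003FFFFu) == -65429);
}
```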

◆ _mm_dpwusds_epi32

#define _mm_dpwusds_epi32 (   __W,
  __A,
  __B 
)
Value:
((__m128i)__builtin_ia32_vpdpwusds128((__v4si)(__W), (__v4si)(__A), \
(__v4si)(__B)))

Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in __A with corresponding signed 16-bit integers in __B, producing 2 intermediate signed 32-bit results.

Sum these 2 results with the corresponding 32-bit integer in __W with signed saturation, and store the packed 32-bit results in dst.

__m128i _mm_dpwusds_epi32(__m128i __W, __m128i __A, __m128i __B)

This intrinsic corresponds to the VPDPWUSDS instruction.

Parameters
__W    A 128-bit vector of [4 x int].
__A    A 128-bit vector of [8 x unsigned short].
__B    A 128-bit vector of [8 x short].
Returns
A 128-bit vector of [4 x int].
FOR j := 0 to 3
tmp1.dword := ZeroExtend32(__A.word[2*j]) * SignExtend32(__B.word[2*j])
tmp2.dword := ZeroExtend32(__A.word[2*j+1]) * SignExtend32(__B.word[2*j+1])
dst.dword[j] := SIGNED_DWORD_SATURATE(__W.dword[j] + tmp1 + tmp2)
ENDFOR
dst[MAX:128] := 0

Definition at line 255 of file avxvnniint16intrin.h.

◆ _mm_dpwuud_epi32

#define _mm_dpwuud_epi32 (   __W,
  __A,
  __B 
)
Value:
((__m128i)__builtin_ia32_vpdpwuud128((__v4si)(__W), (__v4si)(__A), \
(__v4si)(__B)))

Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in __A with corresponding unsigned 16-bit integers in __B, producing 2 intermediate unsigned 32-bit results.

Sum these 2 results with the corresponding 32-bit integer in __W, and store the packed 32-bit results in dst.

__m128i _mm_dpwuud_epi32(__m128i __W, __m128i __A, __m128i __B)

This intrinsic corresponds to the VPDPWUUD instruction.

Parameters
__W    A 128-bit vector of [4 x unsigned int].
__A    A 128-bit vector of [8 x unsigned short].
__B    A 128-bit vector of [8 x unsigned short].
Returns
A 128-bit vector of [4 x unsigned int].
FOR j := 0 to 3
tmp1.dword := ZeroExtend32(__A.word[2*j]) * ZeroExtend32(__B.word[2*j])
tmp2.dword := ZeroExtend32(__A.word[2*j+1]) * ZeroExtend32(__B.word[2*j+1])
dst.dword[j] := __W.dword[j] + tmp1 + tmp2
ENDFOR
dst[MAX:128] := 0

Definition at line 324 of file avxvnniint16intrin.h.

◆ _mm_dpwuuds_epi32

#define _mm_dpwuuds_epi32 (   __W,
  __A,
  __B 
)
Value:
((__m128i)__builtin_ia32_vpdpwuuds128((__v4si)(__W), (__v4si)(__A), \
(__v4si)(__B)))

Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in __A with corresponding unsigned 16-bit integers in __B, producing 2 intermediate unsigned 32-bit results.

Sum these 2 results with the corresponding 32-bit integer in __W with unsigned saturation, and store the packed 32-bit results in dst.

__m128i _mm_dpwuuds_epi32(__m128i __W, __m128i __A, __m128i __B)

This intrinsic corresponds to the VPDPWUUDS instruction.

Parameters
__W    A 128-bit vector of [4 x unsigned int].
__A    A 128-bit vector of [8 x unsigned short].
__B    A 128-bit vector of [8 x unsigned short].
Returns
A 128-bit vector of [4 x unsigned int].
FOR j := 0 to 3
tmp1.dword := ZeroExtend32(__A.word[2*j]) * ZeroExtend32(__B.word[2*j])
tmp2.dword := ZeroExtend32(__A.word[2*j+1]) * ZeroExtend32(__B.word[2*j+1])
dst.dword[j] := UNSIGNED_DWORD_SATURATE(__W.dword[j] + tmp1 + tmp2)
ENDFOR
dst[MAX:128] := 0

Definition at line 393 of file avxvnniint16intrin.h.