Macros
#define	_mm_dpwsud_epi32(__W, __A, __B)
	Multiply groups of 2 adjacent pairs of signed 16-bit integers in __A with corresponding unsigned 16-bit integers in __B, producing 2 intermediate signed 32-bit results.

#define	_mm256_dpwsud_epi32(__W, __A, __B)
	Multiply groups of 2 adjacent pairs of signed 16-bit integers in __A with corresponding unsigned 16-bit integers in __B, producing 2 intermediate signed 32-bit results.

#define	_mm_dpwsuds_epi32(__W, __A, __B)
	Multiply groups of 2 adjacent pairs of signed 16-bit integers in __A with corresponding unsigned 16-bit integers in __B, producing 2 intermediate signed 32-bit results.

#define	_mm256_dpwsuds_epi32(__W, __A, __B)
	Multiply groups of 2 adjacent pairs of signed 16-bit integers in __A with corresponding unsigned 16-bit integers in __B, producing 2 intermediate signed 32-bit results.

#define	_mm_dpwusd_epi32(__W, __A, __B)
	Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in __A with corresponding signed 16-bit integers in __B, producing 2 intermediate signed 32-bit results.

#define	_mm256_dpwusd_epi32(__W, __A, __B)
	Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in __A with corresponding signed 16-bit integers in __B, producing 2 intermediate signed 32-bit results.

#define	_mm_dpwusds_epi32(__W, __A, __B)
	Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in __A with corresponding signed 16-bit integers in __B, producing 2 intermediate signed 32-bit results.

#define	_mm256_dpwusds_epi32(__W, __A, __B)
	Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in __A with corresponding signed 16-bit integers in __B, producing 2 intermediate signed 32-bit results.

#define	_mm_dpwuud_epi32(__W, __A, __B)
	Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in __A with corresponding unsigned 16-bit integers in __B, producing 2 intermediate unsigned 32-bit results.

#define	_mm256_dpwuud_epi32(__W, __A, __B)
	Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in __A with corresponding unsigned 16-bit integers in __B, producing 2 intermediate unsigned 32-bit results.

#define	_mm_dpwuuds_epi32(__W, __A, __B)
	Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in __A with corresponding unsigned 16-bit integers in __B, producing 2 intermediate unsigned 32-bit results.

#define	_mm256_dpwuuds_epi32(__W, __A, __B)
	Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in __A with corresponding unsigned 16-bit integers in __B, producing 2 intermediate unsigned 32-bit results.

Macro Definition Documentation

◆ _mm256_dpwsud_epi32

#define _mm256_dpwsud_epi32	(	__W,
		__A,
		__B
	)

Value:

((__m256i)__builtin_ia32_vpdpwsud256((__v8si)(__W), (__v8si)(__A), \
                                      (__v8si)(__B)))

Multiply groups of 2 adjacent pairs of signed 16-bit integers in __A with corresponding unsigned 16-bit integers in __B, producing 2 intermediate signed 32-bit results.

Sum these 2 results with the corresponding 32-bit integer in __W, and store the packed 32-bit results in dst.

__m256i _mm256_dpwsud_epi32(__m256i __W, __m256i __A, __m256i __B)

This intrinsic corresponds to the VPDPWSUD instruction.

Parameters

__W	A 256-bit vector of [8 x int].
__A	A 256-bit vector of [16 x short].
__B	A 256-bit vector of [16 x unsigned short].

Returns: A 256-bit vector of [8 x int].

FOR j := 0 to 7
  tmp1.dword := SignExtend32(__A.word[2*j]) * ZeroExtend32(__B.word[2*j])
  tmp2.dword := SignExtend32(__A.word[2*j+1]) * ZeroExtend32(__B.word[2*j+1])
  dst.dword[j] := __W.dword[j] + tmp1 + tmp2
ENDFOR
dst[MAX:256] := 0

Definition at line 82 of file avxvnniint16intrin.h.
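
The pseudocode above can be mirrored by a plain scalar loop. The following is a minimal sketch, not the header's implementation: the function name ref_dpwsud_256 and its array-based signature are hypothetical, chosen only so the behaviour can be checked without AVX-VNNI-INT16 hardware.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of _mm256_dpwsud_epi32 (name and signature are
   illustrative, not part of avxvnniint16intrin.h). */
static void ref_dpwsud_256(int32_t dst[8], const int32_t w[8],
                           const int16_t a[16], const uint16_t b[16])
{
    for (int j = 0; j < 8; ++j) {
        /* __A words are sign-extended, __B words are zero-extended. */
        int64_t tmp1 = (int64_t)a[2 * j] * (int64_t)b[2 * j];
        int64_t tmp2 = (int64_t)a[2 * j + 1] * (int64_t)b[2 * j + 1];
        /* Non-saturating form: only the low 32 bits of the sum are kept. */
        uint32_t sum = (uint32_t)w[j] + (uint32_t)tmp1 + (uint32_t)tmp2;
        /* Copy the bit pattern so the wrap-around is well defined in C. */
        memcpy(&dst[j], &sum, sizeof sum);
    }
}
```

With w[0] = 10, a = {-1, 2, ...} and b = {65535, 3, ...}, lane 0 evaluates to 10 + (-1 * 65535) + (2 * 3) = -65519.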

◆ _mm256_dpwsuds_epi32

#define _mm256_dpwsuds_epi32	(	__W,
		__A,
		__B
	)

Value:

((__m256i)__builtin_ia32_vpdpwsuds256((__v8si)(__W), (__v8si)(__A), \
                                       (__v8si)(__B)))

Multiply groups of 2 adjacent pairs of signed 16-bit integers in __A with corresponding unsigned 16-bit integers in __B, producing 2 intermediate signed 32-bit results.

Sum these 2 results with the corresponding 32-bit integer in __W with signed saturation, and store the packed 32-bit results in dst.

__m256i _mm256_dpwsuds_epi32(__m256i __W, __m256i __A, __m256i __B)

This intrinsic corresponds to the VPDPWSUDS instruction.

Parameters

__W	A 256-bit vector of [8 x int].
__A	A 256-bit vector of [16 x short].
__B	A 256-bit vector of [16 x unsigned short].

Returns: A 256-bit vector of [8 x int].

FOR j := 0 to 7
  tmp1.dword := SignExtend32(__A.word[2*j]) * ZeroExtend32(__B.word[2*j])
  tmp2.dword := SignExtend32(__A.word[2*j+1]) * ZeroExtend32(__B.word[2*j+1])
  dst.dword[j] := SIGNED_DWORD_SATURATE(__W.dword[j] + tmp1 + tmp2)
ENDFOR
dst[MAX:256] := 0

Definition at line 152 of file avxvnniint16intrin.h.
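
The only difference from the non-saturating form is the final clamp. The sketch below models a single dword lane, assuming (as the pseudocode suggests) that the three-term sum is formed at full precision before SIGNED_DWORD_SATURATE is applied. Both function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: clamp a wide value to the int32_t range. */
static int32_t sat_i32(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* Hypothetical scalar model of one dword lane of the dpwsuds form:
   signed __A words, unsigned __B words, signed saturating accumulate. */
static int32_t ref_dpwsuds_lane(int32_t w, int16_t a0, uint16_t b0,
                                int16_t a1, uint16_t b1)
{
    int64_t tmp1 = (int64_t)a0 * (int64_t)b0;
    int64_t tmp2 = (int64_t)a1 * (int64_t)b1;
    return sat_i32((int64_t)w + tmp1 + tmp2);
}
```

For example, an accumulator already at INT32_MAX stays there when a positive product is added, whereas the non-saturating form would wrap to a negative value.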

◆ _mm256_dpwusd_epi32

#define _mm256_dpwusd_epi32	(	__W,
		__A,
		__B
	)

Value:

((__m256i)__builtin_ia32_vpdpwusd256((__v8si)(__W), (__v8si)(__A), \
                                      (__v8si)(__B)))

Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in __A with corresponding signed 16-bit integers in __B, producing 2 intermediate signed 32-bit results.

Sum these 2 results with the corresponding 32-bit integer in __W, and store the packed 32-bit results in dst.

__m256i _mm256_dpwusd_epi32(__m256i __W, __m256i __A, __m256i __B)

This intrinsic corresponds to the VPDPWUSD instruction.

Parameters

__W	A 256-bit vector of [8 x int].
__A	A 256-bit vector of [16 x unsigned short].
__B	A 256-bit vector of [16 x short].

Returns: A 256-bit vector of [8 x int].

FOR j := 0 to 7
  tmp1.dword := ZeroExtend32(__A.word[2*j]) * SignExtend32(__B.word[2*j])
  tmp2.dword := ZeroExtend32(__A.word[2*j+1]) * SignExtend32(__B.word[2*j+1])
  dst.dword[j] := __W.dword[j] + tmp1 + tmp2
ENDFOR
dst[MAX:256] := 0

Definition at line 220 of file avxvnniint16intrin.h.
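
Because the "us" and "su" forms differ only in which operand is sign-extended, the same 16-bit patterns can yield different sums. The sketch below contrasts the two on raw bit patterns; the helper names sext16, lane_us and lane_su are hypothetical, and the functions return the exact sum of the two products (before adding __W and truncating to 32 bits).

```c
#include <stdint.h>

/* Hypothetical helper: sign-extend a raw 16-bit pattern without relying on
   implementation-defined narrowing conversions. */
static int64_t sext16(uint16_t bits)
{
    return (bits & 0x8000u) ? (int64_t)bits - 0x10000 : (int64_t)bits;
}

/* "us" form (dpwusd): __A zero-extended, __B sign-extended. */
static int64_t lane_us(const uint16_t a[2], const uint16_t b[2])
{
    return (int64_t)a[0] * sext16(b[0]) + (int64_t)a[1] * sext16(b[1]);
}

/* "su" form (dpwsud): __A sign-extended, __B zero-extended. */
static int64_t lane_su(const uint16_t a[2], const uint16_t b[2])
{
    return sext16(a[0]) * (int64_t)b[0] + sext16(a[1]) * (int64_t)b[1];
}
```

With a = {0xFFFF, 2} and b = {3, 0xFFFE}, the "us" form computes 65535 * 3 + 2 * (-2) = 196601, while the "su" form computes (-1) * 3 + 2 * 65534 = 131065.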

◆ _mm256_dpwusds_epi32

#define _mm256_dpwusds_epi32	(	__W,
		__A,
		__B
	)

Value:

((__m256i)__builtin_ia32_vpdpwusds256((__v8si)(__W), (__v8si)(__A), \
                                       (__v8si)(__B)))

Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in __A with corresponding signed 16-bit integers in __B, producing 2 intermediate signed 32-bit results.

Sum these 2 results with the corresponding 32-bit integer in __W with signed saturation, and store the packed 32-bit results in dst.

__m256i _mm256_dpwusds_epi32(__m256i __W, __m256i __A, __m256i __B)

This intrinsic corresponds to the VPDPWUSDS instruction.

Parameters

__W	A 256-bit vector of [8 x int].
__A	A 256-bit vector of [16 x unsigned short].
__B	A 256-bit vector of [16 x short].

Returns: A 256-bit vector of [8 x int].

FOR j := 0 to 7
  tmp1.dword := ZeroExtend32(__A.word[2*j]) * SignExtend32(__B.word[2*j])
  tmp2.dword := ZeroExtend32(__A.word[2*j+1]) * SignExtend32(__B.word[2*j+1])
  dst.dword[j] := SIGNED_DWORD_SATURATE(__W.dword[j] + tmp1 + tmp2)
ENDFOR
dst[MAX:256] := 0

Definition at line 290 of file avxvnniint16intrin.h.

◆ _mm256_dpwuud_epi32

#define _mm256_dpwuud_epi32	(	__W,
		__A,
		__B
	)

Value:

((__m256i)__builtin_ia32_vpdpwuud256((__v8si)(__W), (__v8si)(__A), \
                                      (__v8si)(__B)))

Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in __A with corresponding unsigned 16-bit integers in __B, producing 2 intermediate unsigned 32-bit results.

Sum these 2 results with the corresponding 32-bit integer in __W, and store the packed 32-bit results in dst.

__m256i _mm256_dpwuud_epi32(__m256i __W, __m256i __A, __m256i __B)

This intrinsic corresponds to the VPDPWUUD instruction.

Parameters

__W	A 256-bit vector of [8 x unsigned int].
__A	A 256-bit vector of [16 x unsigned short].
__B	A 256-bit vector of [16 x unsigned short].

Returns: A 256-bit vector of [8 x unsigned int].

FOR j := 0 to 7
  tmp1.dword := ZeroExtend32(__A.word[2*j]) * ZeroExtend32(__B.word[2*j])
  tmp2.dword := ZeroExtend32(__A.word[2*j+1]) * ZeroExtend32(__B.word[2*j+1])
  dst.dword[j] := __W.dword[j] + tmp1 + tmp2
ENDFOR
dst[MAX:256] := 0

Definition at line 358 of file avxvnniint16intrin.h.
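
In the all-unsigned, non-saturating form the accumulation is plain arithmetic modulo 2^32. A minimal single-lane sketch follows; ref_dpwuud_lane is a hypothetical name used for illustration only.

```c
#include <stdint.h>

/* Hypothetical scalar model of one dword lane of the dpwuud form. */
static uint32_t ref_dpwuud_lane(uint32_t w, uint16_t a0, uint16_t b0,
                                uint16_t a1, uint16_t b1)
{
    /* Each product of two zero-extended words fits in 32 bits. */
    uint32_t tmp1 = (uint32_t)a0 * (uint32_t)b0;
    uint32_t tmp2 = (uint32_t)a1 * (uint32_t)b1;
    /* Unsigned addition wraps modulo 2^32, matching the non-saturating form. */
    return w + tmp1 + tmp2;
}
```

Two maximal products (65535 * 65535 each) already exceed 32 bits when added, so with w = 0 the lane wraps to 0xFFFC0002.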

◆ _mm256_dpwuuds_epi32

#define _mm256_dpwuuds_epi32	(	__W,
		__A,
		__B
	)

Value:

((__m256i)__builtin_ia32_vpdpwuuds256((__v8si)(__W), (__v8si)(__A), \
                                       (__v8si)(__B)))

Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in __A with corresponding unsigned 16-bit integers in __B, producing 2 intermediate unsigned 32-bit results.

Sum these 2 results with the corresponding 32-bit integer in __W with unsigned saturation, and store the packed 32-bit results in dst.

__m256i _mm256_dpwuuds_epi32(__m256i __W, __m256i __A, __m256i __B)

This intrinsic corresponds to the VPDPWUUDS instruction.

Parameters

__W	A 256-bit vector of [8 x unsigned int].
__A	A 256-bit vector of [16 x unsigned short].
__B	A 256-bit vector of [16 x unsigned short].

Returns: A 256-bit vector of [8 x unsigned int].

FOR j := 0 to 7
  tmp1.dword := ZeroExtend32(__A.word[2*j]) * ZeroExtend32(__B.word[2*j])
  tmp2.dword := ZeroExtend32(__A.word[2*j+1]) * ZeroExtend32(__B.word[2*j+1])
  dst.dword[j] := UNSIGNED_DWORD_SATURATE(__W.dword[j] + tmp1 + tmp2)
ENDFOR
dst[MAX:256] := 0

Definition at line 428 of file avxvnniint16intrin.h.
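
The unsigned saturating form clamps at 0xFFFFFFFF instead of wrapping. The sketch below models one lane with a 64-bit intermediate sum, assuming the sum is formed at full precision before UNSIGNED_DWORD_SATURATE is applied; ref_dpwuuds_lane is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical scalar model of one dword lane of the dpwuuds form. */
static uint32_t ref_dpwuuds_lane(uint32_t w, uint16_t a0, uint16_t b0,
                                 uint16_t a1, uint16_t b1)
{
    /* A 64-bit sum cannot overflow: the maximum is below 2^34. */
    uint64_t sum = (uint64_t)w
                 + (uint64_t)a0 * (uint64_t)b0
                 + (uint64_t)a1 * (uint64_t)b1;
    /* Clamp to the largest unsigned 32-bit value. */
    return sum > UINT32_MAX ? UINT32_MAX : (uint32_t)sum;
}
```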

◆ _mm_dpwsud_epi32

#define _mm_dpwsud_epi32	(	__W,
		__A,
		__B
	)

Value:

((__m128i)__builtin_ia32_vpdpwsud128((__v4si)(__W), (__v4si)(__A), \
                                      (__v4si)(__B)))

Multiply groups of 2 adjacent pairs of signed 16-bit integers in __A with corresponding unsigned 16-bit integers in __B, producing 2 intermediate signed 32-bit results.

Sum these 2 results with the corresponding 32-bit integer in __W, and store the packed 32-bit results in dst.

__m128i _mm_dpwsud_epi32(__m128i __W, __m128i __A, __m128i __B)

This intrinsic corresponds to the VPDPWSUD instruction.

Parameters

__W	A 128-bit vector of [4 x int].
__A	A 128-bit vector of [8 x short].
__B	A 128-bit vector of [8 x unsigned short].

Returns: A 128-bit vector of [4 x int].

FOR j := 0 to 3
  tmp1.dword := SignExtend32(__A.word[2*j]) * ZeroExtend32(__B.word[2*j])
  tmp2.dword := SignExtend32(__A.word[2*j+1]) * ZeroExtend32(__B.word[2*j+1])
  dst.dword[j] := __W.dword[j] + tmp1 + tmp2
ENDFOR
dst[MAX:128] := 0

Definition at line 48 of file avxvnniint16intrin.h.
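
A possible way to call the 128-bit macro from ordinary C is sketched below. It assumes the compiler defines __AVXVNNIINT16__ when the extension is enabled (for example via a flag such as -mavxvnniint16) and falls back to a scalar loop otherwise. The wrapper name dpwsud_4lanes is hypothetical.

```c
#include <stdint.h>
#include <string.h>
#if defined(__AVXVNNIINT16__)
#include <immintrin.h>
#endif

/* Hypothetical wrapper: accumulate 8 (int16 x uint16) products into
   4 int32 lanes, two adjacent products per lane. */
static void dpwsud_4lanes(int32_t acc[4], const int16_t a[8],
                          const uint16_t b[8])
{
#if defined(__AVXVNNIINT16__)
    /* Unaligned loads and stores keep the sketch free of alignment rules. */
    __m128i w  = _mm_loadu_si128((const __m128i *)acc);
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    w = _mm_dpwsud_epi32(w, va, vb);
    _mm_storeu_si128((__m128i *)acc, w);
#else
    /* Scalar fallback following the documented pseudocode. */
    for (int j = 0; j < 4; ++j) {
        int64_t tmp1 = (int64_t)a[2 * j] * (int64_t)b[2 * j];
        int64_t tmp2 = (int64_t)a[2 * j + 1] * (int64_t)b[2 * j + 1];
        uint32_t sum = (uint32_t)acc[j] + (uint32_t)tmp1 + (uint32_t)tmp2;
        memcpy(&acc[j], &sum, sizeof sum);
    }
#endif
}
```

The accumulator is passed in and out through the same array, which matches the usual pattern of feeding the result back in as __W on the next call.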

◆ _mm_dpwsuds_epi32

#define _mm_dpwsuds_epi32	(	__W,
		__A,
		__B
	)

Value:

((__m128i)__builtin_ia32_vpdpwsuds128((__v4si)(__W), (__v4si)(__A), \
                                       (__v4si)(__B)))

Multiply groups of 2 adjacent pairs of signed 16-bit integers in __A with corresponding unsigned 16-bit integers in __B, producing 2 intermediate signed 32-bit results.

Sum these 2 results with the corresponding 32-bit integer in __W with signed saturation, and store the packed 32-bit results in dst.

__m128i _mm_dpwsuds_epi32(__m128i __W, __m128i __A, __m128i __B)

This intrinsic corresponds to the VPDPWSUDS instruction.

Parameters

__W	A 128-bit vector of [4 x int].
__A	A 128-bit vector of [8 x short].
__B	A 128-bit vector of [8 x unsigned short].

Returns: A 128-bit vector of [4 x int].

FOR j := 0 to 3
  tmp1.dword := SignExtend32(__A.word[2*j]) * ZeroExtend32(__B.word[2*j])
  tmp2.dword := SignExtend32(__A.word[2*j+1]) * ZeroExtend32(__B.word[2*j+1])
  dst.dword[j] := SIGNED_DWORD_SATURATE(__W.dword[j] + tmp1 + tmp2)
ENDFOR
dst[MAX:128] := 0

Definition at line 117 of file avxvnniint16intrin.h.

◆ _mm_dpwusd_epi32

#define _mm_dpwusd_epi32	(	__W,
		__A,
		__B
	)

Value:

((__m128i)__builtin_ia32_vpdpwusd128((__v4si)(__W), (__v4si)(__A), \
                                      (__v4si)(__B)))

Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in __A with corresponding signed 16-bit integers in __B, producing 2 intermediate signed 32-bit results.

Sum these 2 results with the corresponding 32-bit integer in __W, and store the packed 32-bit results in dst.

__m128i _mm_dpwusd_epi32(__m128i __W, __m128i __A, __m128i __B)

This intrinsic corresponds to the VPDPWUSD instruction.

Parameters

__W	A 128-bit vector of [4 x int].
__A	A 128-bit vector of [8 x unsigned short].
__B	A 128-bit vector of [8 x short].

Returns: A 128-bit vector of [4 x int].

FOR j := 0 to 3
  tmp1.dword := ZeroExtend32(__A.word[2*j]) * SignExtend32(__B.word[2*j])
  tmp2.dword := ZeroExtend32(__A.word[2*j+1]) * SignExtend32(__B.word[2*j+1])
  dst.dword[j] := __W.dword[j] + tmp1 + tmp2
ENDFOR
dst[MAX:128] := 0

Definition at line 186 of file avxvnniint16intrin.h.

◆ _mm_dpwusds_epi32

#define _mm_dpwusds_epi32	(	__W,
		__A,
		__B
	)

Value:

((__m128i)__builtin_ia32_vpdpwusds128((__v4si)(__W), (__v4si)(__A), \
                                       (__v4si)(__B)))

Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in __A with corresponding signed 16-bit integers in __B, producing 2 intermediate signed 32-bit results.

Sum these 2 results with the corresponding 32-bit integer in __W with signed saturation, and store the packed 32-bit results in dst.

__m128i _mm_dpwusds_epi32(__m128i __W, __m128i __A, __m128i __B)

This intrinsic corresponds to the VPDPWUSDS instruction.

Parameters

__W	A 128-bit vector of [4 x int].
__A	A 128-bit vector of [8 x unsigned short].
__B	A 128-bit vector of [8 x short].

Returns: A 128-bit vector of [4 x int].

FOR j := 0 to 3
  tmp1.dword := ZeroExtend32(__A.word[2*j]) * SignExtend32(__B.word[2*j])
  tmp2.dword := ZeroExtend32(__A.word[2*j+1]) * SignExtend32(__B.word[2*j+1])
  dst.dword[j] := SIGNED_DWORD_SATURATE(__W.dword[j] + tmp1 + tmp2)
ENDFOR
dst[MAX:128] := 0

Definition at line 255 of file avxvnniint16intrin.h.
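
When the saturating form is chained, with each result fed back as __W, the clamp is applied at every step rather than once at the end, so the final value can differ from a clamped exact sum. The sketch below follows one dword lane through several steps; chain_usds_lane and sat_i32 are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: clamp a wide value to the int32_t range. */
static int32_t sat_i32(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* Hypothetical model of one dword lane across `steps` chained dpwusds
   operations; step i consumes a[2*i], a[2*i+1], b[2*i], b[2*i+1]. */
static int32_t chain_usds_lane(int32_t w, const uint16_t *a,
                               const int16_t *b, size_t steps)
{
    for (size_t i = 0; i < steps; ++i) {
        int64_t tmp1 = (int64_t)a[2 * i] * (int64_t)b[2 * i];
        int64_t tmp2 = (int64_t)a[2 * i + 1] * (int64_t)b[2 * i + 1];
        /* Saturation happens here, once per step. */
        w = sat_i32((int64_t)w + tmp1 + tmp2);
    }
    return w;
}
```

For instance, two steps that each add 2 * 65535 * 32767 pin the lane at INT32_MAX, and a third step contributing -1 then yields INT32_MAX - 1, even though the exact sum is still far above INT32_MAX.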

◆ _mm_dpwuud_epi32

#define _mm_dpwuud_epi32	(	__W,
		__A,
		__B
	)

Value:

((__m128i)__builtin_ia32_vpdpwuud128((__v4si)(__W), (__v4si)(__A), \
                                      (__v4si)(__B)))

Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in __A with corresponding unsigned 16-bit integers in __B, producing 2 intermediate unsigned 32-bit results.

Sum these 2 results with the corresponding 32-bit integer in __W, and store the packed 32-bit results in dst.

__m128i _mm_dpwuud_epi32(__m128i __W, __m128i __A, __m128i __B)

This intrinsic corresponds to the VPDPWUUD instruction.

Parameters

__W	A 128-bit vector of [4 x unsigned int].
__A	A 128-bit vector of [8 x unsigned short].
__B	A 128-bit vector of [8 x unsigned short].

Returns: A 128-bit vector of [4 x unsigned int].

FOR j := 0 to 3
  tmp1.dword := ZeroExtend32(__A.word[2*j]) * ZeroExtend32(__B.word[2*j])
  tmp2.dword := ZeroExtend32(__A.word[2*j+1]) * ZeroExtend32(__B.word[2*j+1])
  dst.dword[j] := __W.dword[j] + tmp1 + tmp2
ENDFOR
dst[MAX:128] := 0

Definition at line 324 of file avxvnniint16intrin.h.
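
A typical use of this operation is a dot product over two arrays: four dword accumulators are updated once per 8 input words and summed at the end. The scalar sketch below imitates that structure. The function name dot_u16_mod32 is hypothetical, the result is taken modulo 2^32, and any tail of fewer than 8 elements is ignored for brevity.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: dot product of two uint16_t arrays, accumulated the
   way repeated 128-bit dpwuud steps would accumulate it. */
static uint32_t dot_u16_mod32(const uint16_t *a, const uint16_t *b, size_t n)
{
    uint32_t acc[4] = {0, 0, 0, 0};   /* plays the role of __W */
    for (size_t i = 0; i + 8 <= n; i += 8) {
        /* One iteration corresponds to one 128-bit dpwuud operation. */
        for (size_t j = 0; j < 4; ++j) {
            acc[j] += (uint32_t)a[i + 2 * j] * (uint32_t)b[i + 2 * j];
            acc[j] += (uint32_t)a[i + 2 * j + 1] * (uint32_t)b[i + 2 * j + 1];
        }
    }
    /* Horizontal reduction of the four lanes. */
    return acc[0] + acc[1] + acc[2] + acc[3];
}
```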

◆ _mm_dpwuuds_epi32

#define _mm_dpwuuds_epi32	(	__W,
		__A,
		__B
	)

Value:

((__m128i)__builtin_ia32_vpdpwuuds128((__v4si)(__W), (__v4si)(__A), \
                                       (__v4si)(__B)))

Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in __A with corresponding unsigned 16-bit integers in __B, producing 2 intermediate unsigned 32-bit results.

Sum these 2 results with the corresponding 32-bit integer in __W with unsigned saturation, and store the packed 32-bit results in dst.

__m128i _mm_dpwuuds_epi32(__m128i __W, __m128i __A, __m128i __B)

This intrinsic corresponds to the VPDPWUUDS instruction.

Parameters

__W	A 128-bit vector of [4 x unsigned int].
__A	A 128-bit vector of [8 x unsigned short].
__B	A 128-bit vector of [8 x unsigned short].

Returns: A 128-bit vector of [4 x unsigned int].

FOR j := 0 to 3
  tmp1.dword := ZeroExtend32(__A.word[2*j]) * ZeroExtend32(__B.word[2*j])
  tmp2.dword := ZeroExtend32(__A.word[2*j+1]) * ZeroExtend32(__B.word[2*j+1])
  dst.dword[j] := UNSIGNED_DWORD_SATURATE(__W.dword[j] + tmp1 + tmp2)
ENDFOR
dst[MAX:128] := 0

Definition at line 393 of file avxvnniint16intrin.h.
