User:Punpcklbw/sandbox

Original Pentium MMX instructions, and SSE2/AVX/AVX-512 extended variants thereof
These instructions are, unless otherwise noted, available in the following forms:
 * MMX: 64-bit vectors, operating on mm0..mm7 registers (aliased on top of the old x87 register file)
 * SSE2: 128-bit vectors, operating on xmm0..xmm15 registers (xmm0..xmm7 in 32-bit mode)
 * AVX: 128-bit vectors, operating on xmm0..xmm15 registers, with a new three-operand encoding enabled by the new VEX prefix. (AVX introduced 256-bit vector registers, but the full width of these vectors was in general not made available for integer SIMD instructions until AVX2.)
 * AVX2: 256-bit vectors, operating on ymm0..ymm15 registers (extended versions of the xmm0..xmm15 registers)
 * AVX-512: 512-bit vectors, operating on zmm0..zmm31 registers. AVX-512 also introduces opmasks, allowing the operation of most instructions to be masked on a per-lane basis by the opmask (the lane width varies from one instruction to another). AVX-512 also adds broadcast functionality for some of its instructions.
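As an illustration of per-lane masking, the following is a minimal Python sketch (a behavioural model, not real SIMD code) of a lane-wise addition under an opmask. The function name `masked_add` and the `zeroing` flag are hypothetical names introduced for this example. The sketch assumes the two masking behaviours offered by AVX-512: merge-masking, where masked-off lanes keep the old destination value, and zero-masking, where they are cleared. Lane overflow is not modelled.

```python
def masked_add(dst, a, b, mask, zeroing=False):
    """Model a lane-wise a + b under an opmask.

    dst, a, b: equal-length lists, one entry per lane.
    mask:      integer whose bit i enables lane i.
    zeroing:   False models merge-masking (disabled lanes keep dst),
               True models zero-masking (disabled lanes become 0).
    """
    out = []
    for i, (d, x, y) in enumerate(zip(dst, a, b)):
        if (mask >> i) & 1:
            out.append(x + y)      # lane enabled: write the result
        elif zeroing:
            out.append(0)          # lane disabled, zero-masking
        else:
            out.append(d)          # lane disabled, merge-masking
    return out
```

For example, with a mask of `0b0101` only lanes 0 and 2 receive the sum; the other lanes either keep the destination value or are zeroed, depending on `zeroing`.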

MMX instructions added with SSE/SSE2/SSSE3, and SSE2/AVX/AVX-512 extended variants thereof
Instructions other than the above that can touch MMX registers: MOVQ2DQ, MOVDQ2Q, CVTPS2PI, CVTPD2PI, CVTPI2PS, CVTPI2PD, CVTTPS2PI, CVTTPD2PI

Other integer SSE2/4 instructions with 66h prefix, and AVX/AVX-512 extended variants thereof
These instructions do not have any MMX forms, and do not support any encodings without a prefix. Most of these instructions have extended variants available in VEX-encoded and EVEX-encoded forms:
 * The VEX-encoded forms are available under AVX/AVX2. Under AVX, they are available only with a vector length of 128 bits (VEX.L=0 encoding) - under AVX2, they are (with some exceptions noted with "L=0") also made available with a vector length of 256 bits.
 * The EVEX-encoded forms are available under AVX-512 - the specific AVX-512 subset needed for each instruction is listed along with the instruction.
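The VEX vector-length rule described above can be summarised in a small Python sketch. The function `vex_vector_lengths` and its parameters are hypothetical names introduced for this example; it assumes that AVX itself is present and only models the rule stated in the list, not any individual instruction.

```python
def vex_vector_lengths(has_avx2, l0_only=False):
    """Return the vector lengths (in bits) at which a VEX-encoded form
    of one of these integer instructions is available.

    has_avx2: whether AVX2 is supported (AVX is assumed present).
    l0_only:  True for the exceptions marked "L=0", which stay
              128-bit only even under AVX2.
    """
    if has_avx2 and not l0_only:
        return [128, 256]   # VEX.L=0 and VEX.L=1
    return [128]            # VEX.L=0 only
```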

Regularly-encoded floating-point SSE/SSE2 instructions, and AVX/AVX-512 extended variants thereof
For the instructions in the following table, the following considerations apply unless otherwise noted:
 * Packed instructions are available at all vector lengths (128-bit for SSE/SSE2, 128/256-bit for AVX, 128/256/512-bit for AVX-512)
 * FP32 variants of instructions are introduced as part of SSE. FP64 variants of instructions are introduced as part of SSE2.
 * The AVX-512 variants of the FP32 and FP64 instructions are introduced as part of the AVX512F subset.
 * For AVX-512 variants of the instructions, opmasks and broadcasts are available with a width of 32 bits for FP32 operations and 64 bits for FP64 operations. (Broadcasts are available for vector operations only.)
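To make the lane granularity concrete, here is a small Python sketch of how the element width determines the number of lanes (and therefore the number of opmask bits in use), together with a model of broadcasting a single memory element to every lane. The helper names `lane_count` and `broadcast` are hypothetical and exist only for this illustration.

```python
def lane_count(vector_bits, element_bits):
    """Number of lanes, which is also the number of opmask bits used."""
    return vector_bits // element_bits


def broadcast(scalar, vector_bits, element_bits):
    """Model a broadcast: one memory element replicated to every lane."""
    return [scalar] * lane_count(vector_bits, element_bits)
```

Under this model a 512-bit FP32 operation has 16 lanes and a 512-bit FP64 operation has 8, so the same opmask register is interpreted with a different number of significant bits depending on the element width.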

Instructions introduced with AVX, AVX2 and F16C
This covers instructions/opcodes that are new to AVX and AVX2.

AVX and AVX2 also include extended VEX-encoded forms of a large number of MMX/SSE instructions - please see tables above.

Some of the AVX/AVX2 instructions also exist in extended EVEX-encoded forms under AVX-512.

Regularly-encoded AVX-512 floating-point instructions
These instructions all follow a given pattern where:
 * EVEX.W is used to specify floating-point format (0=FP32, 1=FP64)
 * The bottom opcode bit is used to select between packed and scalar operation (0=packed, 1=scalar)
 * For a given operation, all the scalar/packed variants belong to the same AVX-512 subset.
 * The instructions all support result masking by opmask registers.
 * Except for the AVX512ER and AVX512_4FMAPS extensions, all vector widths (128-bit, 256-bit and 512-bit) are supported.
 * Except for the AVX512_4FMAPS instructions, all variants support broadcast for memory operands
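The first two points of the pattern above can be sketched as a tiny decoder in Python. The function name `classify` is hypothetical, and the opcode bytes used in the usage note are arbitrary values chosen for illustration rather than references to particular instructions.

```python
def classify(opcode, evex_w):
    """Classify a regularly-encoded AVX-512 floating-point instruction.

    opcode: the opcode byte; its bottom bit selects packed (0) or scalar (1).
    evex_w: the EVEX.W bit; 0 selects FP32, 1 selects FP64.
    """
    fmt = "FP64" if evex_w else "FP32"
    shape = "scalar" if opcode & 1 else "packed"
    return fmt, shape
```

With this model, an even opcode byte such as `0x4C` with EVEX.W=0 would be a packed FP32 operation, and the next byte `0x4D` with EVEX.W=1 would be the scalar FP64 form of the same operation.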

AVX-512 foundation: opmask instructions
AVX-512 introduces, in addition to 512-bit vectors, a set of eight opmask registers, named k0 through k7. These registers are 64 bits wide in implementations that support AVX512BW and 16 bits wide otherwise. They are mainly used to enable or disable operation on a per-lane basis for most of the AVX-512 vector instructions. They are usually set by vector-compare instructions or by other instructions that produce a 1-bit per-lane result as a natural part of their operation. In addition, AVX-512 defines a set of 55 new instructions for manipulating the opmask registers directly.

These instructions are, for the most part, defined in groups of four, where the instructions in a group are 8-bit, 16-bit, 32-bit and 64-bit variants of the same basic operation. Only the low 8/16/32/64 bits of the registers participate in the operation and, if a result is written back to a register, all bits above the bottom 8/16/32/64 bits are set to zero. The opmask instructions are all encoded with the VEX prefix (unlike all other AVX-512 instructions, which are encoded with the EVEX prefix).
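The width behaviour of a group can be modelled with a short Python sketch. The helper `k_op` and the `WIDTHS` table are hypothetical names for this example; the B/W/D/Q suffixes follow the usual mnemonic suffixes for the 8/16/32/64-bit variants. Opmask registers are modelled as plain integers.

```python
import operator

# Suffix letter -> number of low bits that take part in the operation.
WIDTHS = {"B": 8, "W": 16, "D": 32, "Q": 64}


def k_op(op, suffix, src1, src2):
    """Apply a two-operand bitwise operation at the given opmask width.

    Only the low 8/16/32/64 bits of each source participate, and every
    bit of the result above that width is zero.
    """
    low = (1 << WIDTHS[suffix]) - 1
    return op(src1 & low, src2 & low) & low
```

For instance, `k_op(operator.and_, "B", 0xFFFF, 0x0F0F)` considers only the low byte of each source, so the upper bits of the sources never reach the result.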

In general, the 16-bit variants of the instructions are introduced by AVX512F (except KADDW and KTESTW, which require AVX512DQ), the 8-bit variants by the AVX512DQ extension, and the 32/64-bit variants by the AVX512BW extension.

Most of the instructions follow a very regular encoding pattern where the four instructions in a group have identical encodings except for the VEX.pp and VEX.W fields:
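As a rough sketch of this pattern, the Python function below maps a (VEX.pp, VEX.W) pair to an operand width. The mapping is an assumption taken from the way the KANDB/KANDW/KANDD/KANDQ group is listed in Intel's instruction reference (no prefix or 66h prefix, combined with W0 or W1). The names `opmask_width`, `PP_NONE` and `PP_66` are hypothetical, and the sketch does not cover the instructions that fall outside the regular pattern.

```python
PP_NONE = 0b00   # VEX.pp value meaning "no implied prefix"
PP_66 = 0b01     # VEX.pp value meaning "implied 66h prefix"

# (VEX.pp, VEX.W) -> opmask operand width in bits (assumed mapping).
_WIDTH_BY_FIELDS = {
    (PP_NONE, 0): 16,   # W variant
    (PP_66, 0): 8,      # B variant
    (PP_NONE, 1): 64,   # Q variant
    (PP_66, 1): 32,     # D variant
}


def opmask_width(vex_pp, vex_w):
    """Return the operand width selected by the VEX.pp and VEX.W fields."""
    return _WIDTH_BY_FIELDS[(vex_pp, vex_w)]
```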

Not all of the opmask instructions fit the pattern above - the remaining ones are:

AVX-512 foundation: compare, test, blend, opmask-convert
This covers vector-register instructions that use opmasks in ways other than as a result writeback mask.