X86 instruction listings

The x86 instruction set refers to the set of instructions that x86-compatible microprocessors support. The instructions are usually part of an executable program, often stored as a computer file and executed on the processor.

The x86 instruction set has been extended several times, introducing wider registers and datatypes as well as new functionality.

x86 integer instructions
Below is the full 8086/8088 instruction set of Intel (81 instructions total). These instructions are also available in 32-bit mode, they operate instead on 32-bit registers (eax, ebx, etc.) and values instead of their 16-bit (ax, bx, etc.) counterparts. The updated instruction set is grouped according to architecture (i186, i286, i386, i486, i586/i686) and is referred to as (32-bit) x86 and (64-bit) x86-64 (also known as AMD64).

Original 8086/8088 instructions
This is the original instruction set. In the 'Notes' column, r means register, m means memory address and imm means immediate (i.e. a value).

Added with 80286
The new instructions added in 80286 add support for x86 protected mode. Some but not all of the instructions are available in real mode as well.

Added with 80386
The 80386 added support for 32-bit operation to the x86 instruction set. This was done by widening the general-purpose registers to 32 bits and introducing the concepts of OperandSize and AddressSize – most instruction forms that would previously take 16-bit data arguments were given the ability to take 32-bit arguments by setting their OperandSize to 32 bits, and instructions that could take 16-bit address arguments were given the ability to take 32-bit address arguments by setting their AddressSize to 32 bits. (Instruction forms that work on 8-bit data continue to be 8-bit regardless of OperandSize. Using a data size of 16 bits will cause only the bottom 16 bits of the 32-bit general-purpose registers to be modified – the top 16 bits are left unchanged.)

The default OperandSize and AddressSize to use for each instruction is given by the D bit of the segment descriptor of the current code segment -  makes both 16-bit,   makes both 32-bit. Additionally, they can be overridden on a per-instruction basis with two new instruction prefixes that were introduced in the 80386:
 * : OperandSize override. Will change OperandSize from 16-bit to 32-bit if, or from 32-bit to 16-bit if.
 * : AddressSize override. Will change AddressSize from 16-bit to 32-bit if, or from 32-bit to 16-bit if.

The 80386 also introduced the two new segment registers  and   as well as the x86 control, debug and test registers.

The new instructions introduced in the 80386 can broadly be subdivided into two classes: For instruction forms where the operand size can be inferred from the instruction's arguments (e.g.  can be inferred to have a 32-bit OperandSize due to its use of EAX as an argument), new instruction mnemonics are not needed and not provided.
 * Pre-existing opcodes that needed new mnemonics for their 32-bit OperandSize variants (e.g.,  )
 * New opcodes that introduced new functionality (e.g.,  )

Added in P5/P6-class processors
Integer/system instructions that were not present in the basic 80486 instruction set, but were added in various x86 processors prior to the introduction of SSE. (Discontinued instructions are not included.)

Added with x86-64
These instructions can only be encoded in 64 bit mode. They fall in four groups:


 * original instructions that reuse existing opcodes for a different purpose ( replacing  )
 * original instructions with new opcodes
 * existing instructions extended to a 64 bit address size
 * existing instructions extended to a 64 bit operand size (remaining instructions)

Most instructions with a 64 bit operand size encode this using a  prefix; in the absence of the   prefix, the corresponding instruction with 32 bit operand size is encoded. This mechanism also applies to most other instructions with 32 bit operand size. These are not listed here as they do not gain a new mnemonic in Intel syntax when used with a 64 bit operand size.

Bit manipulation extensions
Bit manipulation instructions. For all of the VEX-encoded instructions defined by BMI1 and BMI2, the operand size may be 32 or 64 bits, controlled by the VEX.W bit – none of these instructions are available in 16-bit variants.

Added with Intel CET
Intel CET (Control-Flow Enforcement Technology) adds two distinct features to help protect against security exploits such as return-oriented programming: a shadow stack (CET_SS), and indirect branch tracking (CET_IBT).

x87 floating-point instructions
The x87 coprocessor, if present, provides support for floating-point arithmetic. The coprocessor provides eight data registers, each holding one 80-bit floating-point value (1 sign bit, 15 exponent bits, 64 mantissa bits) – these registers are organized as a stack, with the top-of-stack register referred to as "st" or "st(0)", and the other registers referred to as st(1),st(2),...st(7). It additionally provides a number of control and status registers, including "PC" (precision control, to control whether floating-point operations should be rounded to 24, 53 or 64 mantissa bits) and "RC" (rounding control, to pick rounding-mode: round-to-zero, round-to-positive-infinity, round-to-negative-infinity, round-to-nearest-even) and a 4-bit condition code register "CC", whose four bits are individually referred to as C0,C1,C2 and C3). Not all of the arithmetic instructions provided by x87 obey PC and RC.

MMX instructions
MMX instructions operate on the mm registers, which are 64 bits wide. They are shared with the FPU registers.

Original MMX instructions
Added with Pentium MMX

MMX instructions added with MMX+ and SSE
The following MMX instruction were added with SSE. They are also available on the Athlon under the name MMX+.

MMX instructions added with SSE2
The following MMX instructions were added with SSE2:

SSE instructions
Added with Pentium III

SSE instructions operate on xmm registers, which are 128 bit wide.

SSE consists of the following SSE SIMD floating-point instructions:

* The floating point single bitwise operations ANDPS, ANDNPS, ORPS and XORPS produce the same result as the SSE2 integer (PAND, PANDN, POR, PXOR) and double ones (ANDPD, ANDNPD, ORPD, XORPD), but can introduce extra latency for domain changes when applied values of the wrong type.

SSE2 instructions
Added with Pentium 4

SSE2 conversion instructions

 * CMPSD and MOVSD have the same name as the string instruction mnemonics CMPSD (CMPS) and MOVSD (MOVS); however, the former refer to scalar double-precision floating-points whereas the latter refer to doubleword strings. Assemblers disambiguate them based on the presence or absence of operands.

SSE2 MMX-like instructions extended to SSE registers
SSE2 allows execution of MMX instructions on SSE registers, processing twice the amount of data at once.

SSE2 integer instructions for SSE registers only
The following instructions can be used only on SSE registers, since by their nature they do not work on MMX registers

SSE3 instructions
Added with Pentium 4 supporting SSE3

SSSE3 instructions
Added with Xeon 5100 series and initial Core 2

The following MMX-like instructions extended to SSE registers were added with SSSE3

SSE4.1
Added with Core 2 manufactured in 45nm

SSE4a
Added with Phenom processors

SSE4.2
Added with Nehalem processors

F16C
Half-precision floating-point conversion.

AVX
AVX were first supported by Intel with Sandy Bridge and by AMD with Bulldozer.

Vector operations on 256 bit registers.

AVX2
Introduced in Intel's Haswell microarchitecture and AMD's Excavator.

Expansion of most vector integer SSE and AVX instructions to 256 bits

FMA3 and FMA4 instructions
Floating-point fused multiply-add instructions are introduced in x86 as two instruction set extensions, "FMA3" and "FMA4", both of which build on top of AVX to provide a set of scalar/vector instructions using the xmm/ymm/zmm vector registers. FMA3 defines a set of 3-operand fused-multiply-add instructions that take three input operands and writes its result back to the first of them. FMA4 defines a set of 4-operand fused-multiply-add instructions that take four input operands – a destination operand and three source operands.

FMA3 is supported on Intel CPUs starting with Haswell, on AMD CPUs starting with Piledriver, and on Zhaoxin CPUs starting with YongFeng. FMA4 was only supported on AMD Family 15h (Bulldozer) CPUs and has been abandoned from AMD Zen onwards. The FMA3/FMA4 extensions are not considered to be an intrinsic part of AVX or AVX2, although all Intel and AMD (but not Zhaoxin) processors that support AVX2 also support FMA3. FMA3 instructions (in EVEX-encoded form) are, however, AVX-512 foundation instructions.

The FMA3 and FMA4 instruction sets both define a set of 10 fused-multiply-add operations, all available in FP32 and FP64 variants. For each of these variants, FMA3 defines three operand orderings while FMA4 defines two.

FMA3 encoding

FMA3 instructions are encoded with the VEX or EVEX prefixes – on the form  or. The VEX.W/EVEX.W bit selects floating-point format (W=0 means FP32, W=1 means FP64). The opcode byte  consists of two nibbles, where the top nibble   selects operand ordering ( ='132',  ='213',  ='231') and the bottom nibble   (values 6..F) selects which one of the 10 fused-multiply-add operations to perform. ( and   outside the given ranges will result in something that is not an FMA3 instruction.)

At the assembly language level, the operand ordering is specified in the mnemonic of the instruction: For all FMA3 variants, the first two arguments must be xmm/ymm/zmm vector register arguments, while the last argument may be either a vector register or memory argument. Under AVX-512, the EVEX-encoded variants support EVEX-prefix-encoded broadcast, opmasks and rounding-controls.
 * will perform
 * will perform
 * will perform

The AVX512-FP16 extension, introduced in Sapphire Rapids, adds FP16 variants of the FMA3 instructions – these all take the form  with the opcode byte working in the same way as for the FP32/FP64 variants. (For the FMA4 instructions, no FP16 variants are defined.)

FMA4 encoding

FMA4 instructions are encoded with the VEX prefix, on the form  (no EVEX encodings are defined). The opcode byte  uses its bottom bit to select floating-point format (0=FP32, 1=FP64) and the remaining bits to select one of the 10 fused-multiply-add operations to perform.

For FMA4, operand ordering is controlled by the VEX.W bit. If VEX.W=0, then the third operand is the r/m operand specified by the instruction's ModR/M byte and the fourth operand is a register operand, specified by bits 7:4 of the ib (8-bit immediate) part of the instruction. If VEX.W=1, then these two operands are swapped. For example:
 * will perform  and require a W=0 encoding.
 * will perform  and require a W=1 encoding.
 * will perform  and can be encoded with either W=0 or W=1.

Opcode table

The 10 fused-multiply-add operations and the 110 instruction variants they give rise to are given by the following table – with FMA4 instructions highlighted with * and yellow cell coloring, and FMA3 instructions not highlighted:

AVX-512
AVX-512, introduced in 2014, adds 512-bit wide vector registers (extending the 256-bit registers, which become the new registers' lower halves) and doubles their count to 32; the new registers are thus named zmm0 through zmm31. It adds eight mask registers, named k0 through k7, which may be used to restrict operations to specific parts of a vector register. Unlike previous instruction set extensions, AVX-512 is implemented in several groups; only the foundation ("AVX-512F") extension is mandatory. Most of the added instructions may also be used with the 256- and 128-bit registers.

AMX
Intel AMX adds eight new tile-registers, -, each holding a matrix, with a maximum capacity of 16 rows of 64 bytes per tile-register. It also adds a  register to configure the sizes of the actual matrices held in each of the eight tile-registers, and a set of instructions to perform matrix multiplications on these registers.

Intel AES instructions
6 new instructions.

Intel SHA instructions
7 new instructions.

Intel AES Key Locker instructions
These instructions, available in Tiger Lake and later Intel processors, are designed to enable encryption/decryption with an AES key without having access to any unencrypted copies of the key during the actual encryption/decryption process.

VIA PadLock instructions
The VIA/Zhaoxin PadLock instructions are instructions designed to apply cryptographic primitives in bulk, similar to the 8086 repeated string instructions. As such, unless otherwise specified, they take, as applicable, pointers to source data in ES:rSI and destination data in ES:rDI, and a data-size or count in rCX. Like the old string instructions, they are all designed to be interruptible.

Intel VT-x instructions
Intel virtualization instructions. VT-x is also supported on some processors from VIA and Zhaoxin.

Other instructions
x86 also includes discontinued instruction sets which are no longer supported by Intel and AMD, and undocumented instructions which execute but are not officially documented.

Undocumented x86 instructions
The x86 CPUs contain undocumented instructions which are implemented on the chips but not listed in some official documents. They can be found in various sources across the Internet, such as Ralf Brown's Interrupt List and at sandpile.org

Some of these instructions are widely available across many/most x86 CPUs, while others are specific to a narrow range of CPUs.