ARM Cortex-A77

The ARM Cortex-A77 is a central processing unit designed by ARM Holdings' Austin design centre that implements the 64-bit ARMv8.2-A instruction set. ARM announced increases over the Cortex-A76 of 23% in integer performance and 35% in floating-point performance. Memory bandwidth increased 15% relative to the A76.

Design
The Cortex-A77 succeeds the Cortex-A76. It is a 4-wide decode, out-of-order superscalar design with a new 1.5K-entry macro-op (MOP) cache. It can fetch 4 instructions and 6 MOPs per cycle, and rename and dispatch 6 MOPs and 13 μops per cycle. The out-of-order window has been increased to 160 entries. The back end has 12 execution ports, a 50% increase over the Cortex-A76. The pipeline is 13 stages deep, with an execution latency of 10 stages.
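The role of the macro-op cache can be illustrated with a toy model. The Python sketch below is a simplification under stated assumptions, not Arm's design: it treats the 1.5K-entry MOP cache as a fully associative LRU map from a hypothetical instruction address to one decoded macro-op, and it assumes a run of instructions is served either entirely from the MOP cache (up to 6 MOPs per cycle) or entirely from the 4-wide decoders. The names `ToyMopCache` and `cycles_to_deliver` are illustrative only.

```python
from collections import OrderedDict

MOP_CACHE_ENTRIES = 1536  # "1.5K" entries, as stated in the text
DECODE_WIDTH = 4          # instructions decoded per cycle on the miss path
MOP_CACHE_WIDTH = 6       # macro-ops delivered per cycle on the hit path


class ToyMopCache:
    """Fully associative LRU map from instruction address to a decoded macro-op.

    Assumption: one macro-op per instruction; the real cache's organisation
    (associativity, line size, op fusion) is not modelled.
    """

    def __init__(self, capacity=MOP_CACHE_ENTRIES):
        self.capacity = capacity
        self.entries = OrderedDict()

    def lookup(self, addr):
        """Return True on a hit and mark the entry as most recently used."""
        if addr in self.entries:
            self.entries.move_to_end(addr)
            return True
        return False

    def fill(self, addr):
        """Insert a decoded macro-op, evicting the least recently used entry if full."""
        self.entries[addr] = f"mop@{addr:#x}"
        self.entries.move_to_end(addr)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)


def cycles_to_deliver(addresses, cache):
    """Front-end cycles needed to deliver a straight-line run of instructions.

    Simplification: the run comes entirely from the MOP cache if every address
    hits; otherwise it goes through the decoders and is then cached.
    """
    if all(cache.lookup(a) for a in addresses):
        width = MOP_CACHE_WIDTH
    else:
        width = DECODE_WIDTH
        for a in addresses:
            cache.fill(a)
    return -(-len(addresses) // width)  # ceiling division


if __name__ == "__main__":
    loop_body = [0x1000 + 4 * i for i in range(12)]  # hypothetical 12-instruction loop
    cache = ToyMopCache()
    print(cycles_to_deliver(loop_body, cache))  # first pass, via the decoders: 3
    print(cycles_to_deliver(loop_body, cache))  # later pass, via the MOP cache: 2
```

In this model the first pass over a 12-instruction loop takes 3 cycles through the decoders, and later passes take 2 cycles once the loop's macro-ops are cached, which is why hot loops gain the most from such a structure.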

The integer cluster has six pipelines, two more than the Cortex-A76. Another change from the Cortex-A76 is the unification of the issue queues: previously each pipeline had its own issue queue, whereas the Cortex-A77 has a single unified issue queue, which improves efficiency. The Cortex-A77 adds a fourth general-purpose ALU that typically executes simple math operations in 1 cycle and some more complex operations in 2 cycles. In total, there are three simple ALUs that perform arithmetic and logical data-processing operations, and a fourth port that also supports complex arithmetic (e.g. MAC, DIV). The Cortex-A77 also adds a second branch ALU, doubling branch throughput.
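One reason a unified issue queue can improve efficiency is capacity pooling: with per-pipeline queues, in-order dispatch stalls as soon as the queue for one pipeline fills, even if the other queues still have free entries. The sketch below illustrates only that effect. The queue sizes are hypothetical (the text does not give the Cortex-A77's), the function names are illustrative, and it assumes no ops issue while the burst is being dispatched.

```python
def ops_accepted_split(steering, per_queue_capacity, num_queues):
    """Ops dispatched before stalling when each pipeline has its own queue.

    `steering` lists the pipeline (0..num_queues-1) each op is sent to, in
    program order. Dispatch is in order, so it stops at the first op whose
    target queue is already full. Assumes nothing issues during the burst.
    """
    occupancy = [0] * num_queues
    accepted = 0
    for pipe in steering:
        if occupancy[pipe] == per_queue_capacity:
            break
        occupancy[pipe] += 1
        accepted += 1
    return accepted


def ops_accepted_unified(steering, per_queue_capacity, num_queues):
    """Ops dispatched before stalling when all pipelines share one queue with
    the same total number of entries; only the total count matters."""
    total_entries = per_queue_capacity * num_queues
    return min(len(steering), total_entries)


if __name__ == "__main__":
    # Hypothetical burst that mostly targets pipeline 0.
    burst = [0, 1, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0]
    print(ops_accepted_split(burst, per_queue_capacity=4, num_queues=4))    # 6
    print(ops_accepted_unified(burst, per_queue_capacity=4, num_queues=4))  # 12
```

For this burst, four separate 4-entry queues accept 6 ops before the queue for pipeline 0 is full, whereas a shared 16-entry queue accepts all 12.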

There are two ASIMD/FP execution pipelines, unchanged from the Cortex-A76. What did change is the issue queues: as in the integer cluster, the ASIMD cluster now has a single unified issue queue for both pipelines, improving efficiency. As on the Cortex-A76, both ASIMD pipelines are 128 bits wide, each capable of 2 double-precision, 4 single-precision, 8 half-precision or 16 8-bit integer operations. These pipelines can also execute the cryptographic instructions if the extension is supported (it is not offered by default and requires an additional license from Arm). The Cortex-A77 adds a second AES unit to improve the throughput of cryptographic operations.
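The per-pipeline figures follow from dividing the 128-bit vector width by the element size. The sketch below reproduces that arithmetic. The peak figure is an assumption-laden upper bound: it supposes each of the two pipelines accepts one full-width vector instruction per cycle and counts each lane as one operation (a fused multiply-add would count as one, not two).

```python
VECTOR_BITS = 128  # width of each ASIMD/FP pipeline, as stated in the text
ASIMD_PIPES = 2    # number of ASIMD/FP pipelines, as stated in the text

ELEMENT_BITS = {
    "double-precision": 64,
    "single-precision": 32,
    "half-precision": 16,
    "8-bit integer": 8,
}


def lanes_per_pipe(element_bits, vector_bits=VECTOR_BITS):
    """Elements that fit in one full-width vector register."""
    return vector_bits // element_bits


def peak_lane_ops_per_cycle(element_bits):
    """Assumes each pipeline accepts one full-width vector instruction per cycle."""
    return ASIMD_PIPES * lanes_per_pipe(element_bits)


if __name__ == "__main__":
    for name, bits in ELEMENT_BITS.items():
        print(f"{name}: {lanes_per_pipe(bits)} per pipe, "
              f"{peak_lane_ops_per_cycle(bits)} per cycle across both pipes")
```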

The re-order buffer (ROB) is larger, up to 160 entries from 128, and a new L0 MOP cache holds up to 1,536 entries.

The core supports unprivileged 32-bit applications, but privileged software must use the 64-bit ARMv8-A ISA. It also supports the load-acquire RCpc (LDAPR) instructions (ARMv8.3-A), the Dot Product instructions (ARMv8.4-A) and the PSTATE Speculative Store Bypass Safe (SSBS) bit instructions (ARMv8.5-A).
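On Linux, whether these extensions are exposed to user space can be checked from the `Features` line of `/proc/cpuinfo`, where the kernel lists them as `lrcpc`, `asimddp` and `ssbs`. The following sketch assumes that format; which flags appear depends on the kernel version, and `parse_features` and `report` are hypothetical helper names rather than part of any library.

```python
import os

# Linux /proc/cpuinfo flag names for the extensions mentioned above.
FEATURE_FLAGS = {
    "lrcpc": "LDAPR, load-acquire RCpc (ARMv8.3-A)",
    "asimddp": "Dot Product (ARMv8.4-A)",
    "ssbs": "PSTATE.SSBS (ARMv8.5-A)",
}


def parse_features(cpuinfo_text):
    """Collect every flag listed on the `Features` lines of cpuinfo text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            flags.update(value.split())
    return flags


def report(cpuinfo_text):
    """Map each extension description to whether its flag is present."""
    flags = parse_features(cpuinfo_text)
    return {desc: flag in flags for flag, desc in FEATURE_FLAGS.items()}


if __name__ == "__main__":
    path = "/proc/cpuinfo"
    if os.path.exists(path):  # only present on Linux
        with open(path) as f:
            for desc, present in report(f.read()).items():
                print(f"{desc}: {'yes' if present else 'no'}")
```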

The Cortex-A77 supports ARM's DynamIQ technology, and is expected to be used as high-performance cores in combination with Cortex-A55 power-efficient cores.

Architecture changes in comparison with ARM Cortex-A76

 * Front-end
    * Branch prediction
       * Better accuracy
       * Up to 64B runahead window (from 32B)
       * Larger L1 BTB, up to 64 entries (from 16)
       * Larger BTB, up to 8K entries (from 6K)
    * Improved prefetcher
    * New L0 macro-op cache
    * Wider instruction fetch, up to 6 instructions/cycle (from 4)
 * Execution engine
    * Larger re-order buffer, up to 160 entries (from 128)
    * Wider dispatch, up to 10-way (from 8-way)
    * Wider issue, up to 12-way (from 8-way)
 * Execution units
    * New integer ALU and port
    * New branch unit and port
    * New dedicated store-data ports
    * New AES unit
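The relative size of each quantified change in the list can be computed directly from its before and after values. The sketch below copies those values by hand (the dictionary labels are paraphrases, not Arm's terminology) and treats "K" as a common multiplier, which does not affect the ratios.

```python
# (Cortex-A76 value, Cortex-A77 value) pairs copied from the list above.
CHANGES = {
    "branch runahead window (bytes)": (32, 64),
    "L1 branch buffer entries": (16, 64),
    "BTB entries": (6 * 1024, 8 * 1024),
    "instruction fetch (instructions/cycle)": (4, 6),
    "re-order buffer entries": (128, 160),
    "dispatch width": (8, 10),
    "issue width": (8, 12),
}


def percent_increase(old, new):
    """Relative growth from `old` to `new`, in percent."""
    return 100.0 * (new - old) / old


if __name__ == "__main__":
    for name, (old, new) in CHANGES.items():
        print(f"{name}: {old} -> {new} (+{percent_increase(old, new):.0f}%)")
```

The 12-way issue figure, for example, works out to the 50% increase in execution ports mentioned in the design description.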

Licensing
The Cortex-A77 is available as a SIP core to licensees, and its design makes it suitable for integration with other SIP cores (e.g. GPU, display controller, DSP, image processor) into one die constituting a system on a chip (SoC).

Usage
The Samsung Exynos 980, introduced in September 2019, was the first SoC to use the Cortex-A77 microarchitecture; it was followed in May 2020 by the lower-end Exynos 880. The MediaTek Dimensity 1000, 1000L and 1000+ SoCs also use the Cortex-A77 microarchitecture. Derivatives named Kryo 585, Kryo 570 and Kryo 560 are used in the Snapdragon 865, 750G and 690, respectively. HiSilicon uses the Cortex-A77 at two different clock frequencies in its Kirin 9000 series.

Both its predecessor (Cortex-A76) and its successor (Cortex-A78) had automotive variants with Split-Lock capability, the Cortex-A76AE and Cortex-A78AE, but the Cortex-A77 did not, so it did not find its way into safety-critical applications.