Meltdown (security vulnerability)

Meltdown is one of the two original transient execution CPU vulnerabilities (the other being Spectre). Meltdown affects Intel x86 microprocessors, IBM Power microprocessors, and some ARM-based microprocessors. It allows a rogue process to read all memory, even when it is not authorized to do so.

Meltdown affects a wide range of systems. At the time of disclosure (2018), this included all devices running any but the most recent and patched versions of iOS, Linux, macOS, or Windows. Accordingly, many servers and cloud services were impacted, as well as a potential majority of smart devices and embedded devices using ARM-based processors (mobile devices, smart TVs, printers and others), including a wide range of networking equipment. A purely software workaround to Meltdown has been assessed as slowing computers between 5 and 30 percent in certain specialized workloads, although companies responsible for software correction of the exploit reported minimal impact from general benchmark testing.

Meltdown was issued a Common Vulnerabilities and Exposures ID of, also known as Rogue Data Cache Load (RDCL), in January 2018. It was disclosed in conjunction with another exploit, Spectre, with which it shares some characteristics. The Meltdown and Spectre vulnerabilities are considered "catastrophic" by security analysts. The vulnerabilities are so severe that security researchers initially believed the reports to be false.

Several procedures to help protect home computers and related devices from the Meltdown and Spectre security vulnerabilities have been published. Meltdown patches may produce performance loss. Spectre patches have been reported to significantly reduce performance, especially on older computers; on the then-newest (2017) eighth-generation Core platforms, benchmark performance drops of 2–14 percent have been measured. On 18 January 2018, unwanted reboots, even for newer Intel chips, due to Meltdown and Spectre patches, were reported. Nonetheless, according to Dell, "No 'real-world' exploits of these vulnerabilities [i.e., Meltdown and Spectre] have been reported to date [26 January 2018], though researchers have produced proof-of-concepts." Dell further recommended "promptly adopting software updates, avoiding unrecognized hyperlinks and websites, not downloading files or applications from unknown sources ... following secure password protocols ... [using] security software to help protect against malware (advanced threat prevention software or anti-virus)."

On 15 March 2018, Intel reported that it would redesign its CPUs to help protect against the Meltdown and related Spectre vulnerabilities (especially, Meltdown and Spectre-V2, but not Spectre-V1), and expected to release the newly redesigned processors later in 2018. On 8 October 2018, Intel is reported to have added hardware and firmware mitigations regarding Spectre and Meltdown vulnerabilities to its latest processors.

Overview
Meltdown exploits a race condition, inherent in the design of many modern CPUs. This occurs between memory access and privilege checking during instruction processing. Additionally, combined with a cache side-channel attack, this vulnerability allows a process to bypass the normal privilege checks that isolate the exploit process from accessing data belonging to the operating system and other running processes. The vulnerability allows an unauthorized process to read data from any address that is mapped to the current process's memory space. Because the affected processors implement instruction pipelining, the data from an unauthorized address will almost always be temporarily loaded into the CPU's cache during out-of-order execution – from which the data can be recovered. This can occur even if the original read instruction fails due to privilege checking, or if it never produces a readable result.

Since many operating systems map physical memory, kernel processes, and other running user space processes into the address space of every process, Meltdown effectively makes it possible for a rogue process to read any physical, kernel or other processes' mapped memory – regardless of whether it should be able to do so. Defenses against Meltdown would require avoiding the use of memory mapping in a manner vulnerable to such exploits (i.e. a software-based solution) or avoidance of the underlying race condition (i.e. a modification to the CPUs' microcode or execution path).

The vulnerability is viable on any operating system in which privileged data is mapped into virtual memory for unprivileged processes – which includes many present-day operating systems. Meltdown could potentially impact a wider range of computers than presently identified, as there is little to no variation in the microprocessor families used by these computers.

A Meltdown attack cannot be detected if it is carried out, as it does not leave any traces in traditional log files.

History
Meltdown was discovered independently by Jann Horn from Google's Project Zero, Werner Haas and Thomas Prescher from Cyberus Technology, and Daniel Gruss, Moritz Lipp, Stefan Mangard and Michael Schwarz from Graz University of Technology. The same research teams that discovered Meltdown also discovered Spectre. The security vulnerability was called Meltdown because "the vulnerability basically melts security boundaries which are normally enforced by the hardware".


 * On 8 May 1995, a paper called "The Intel 80x86 Processor Architecture: Pitfalls for Secure Systems" published at the 1995 IEEE Symposium on Security and Privacy warned against a covert timing channel in the CPU cache and translation lookaside buffer (TLB). This analysis was performed under the auspices of the National Security Agency's Trusted Products Evaluation Program (TPEP).
 * In July 2012, Apple's XNU kernel (used in macOS, iOS, and tvOS, among others) adopted kernel address space layout randomization (KASLR) with the release of OS X Mountain Lion 10.8. In essence, the base of the system, including its kernel extensions (kexts) and memory zones, is randomly relocated during the boot process in an effort to reduce the operating system's vulnerability to attacks.
 * In March 2014, the Linux kernel adopted KASLR to mitigate address leaks.
 * On 8 August 2016, Anders Fogh and Daniel Gruss presented "Using Undocumented CPU Behavior to See Into Kernel Mode and Break KASLR in the Process" at the Black Hat 2016 conference.
 * On 10 August 2016, Moritz Lipp et al. of Graz University of Technology published "ARMageddon: Cache Attacks on Mobile Devices" in the proceedings of the 25th USENIX security symposium. Even though focused on ARM, it laid the groundwork for the attack vector.
 * On 27 December 2016, at 33C3, Clémentine Maurice and Moritz Lipp of Graz University of Technology presented their talk "What could possibly go wrong with &lt;insert x86 instruction here&gt;? Side effects include side-channel attacks and bypassing kernel ASLR" which outlined already what was coming.
 * On 1 February 2017, the CVE numbers 2017–5715, 2017-5753 and 2017-5754 were assigned to Intel.
 * On 27 February 2017, Bosman et al. of Vrije Universiteit Amsterdam published their findings of how address space layout randomization (ASLR) could be abused on cache-based architectures at the NDSS Symposium.
 * On 27 March 2017, researchers at Graz University of Technology developed a proof-of-concept that could grab RSA keys from Intel SGX enclaves running on the same system within five minutes by using certain CPU instructions in lieu of a fine-grained timer to exploit cache DRAM side-channels.
 * In June 2017, KASLR was found to have a large class of new vulnerabilities. Research at Graz University of Technology showed how to solve these vulnerabilities by preventing all access to unauthorized pages. A presentation on the resulting KAISER technique was submitted for the Black Hat congress in July 2017, but was rejected by the organizers. Nevertheless, this work led to kernel page-table isolation (KPTI, originally known as KAISER) in 2017, which was confirmed to eliminate a large class of security bugs, including some limited protection against the not-yet-discovered Meltdown – a fact confirmed by the Meltdown authors.
 * In July 2017, research made public on the CyberWTF website by security researcher Anders Fogh outlined the use of a cache timing attack to read kernel space data by observing the results of speculative operations conditioned on data fetched with invalid privileges.
 * In October 2017, KASLR support on amd64 was added to NetBSD-current, making NetBSD the first totally open-source BSD system to support kernel address space layout randomization. However, the partially open-source Apple Darwin, which forms the foundation of macOS and iOS (among others), is based on FreeBSD; KASLR was added to its XNU kernel in 2012 as noted above.
 * On 14 November 2017, security researcher Alex Ionescu publicly mentioned changes in the new version of Windows 10 that would cause some speed degradation without explaining the necessity for the changes, just referring to similar changes in Linux.
 * After affected hardware and software vendors had been made aware of the issue on 28 July 2017, the two vulnerabilities were made public jointly, on 3 January 2018, several days ahead of the coordinated release date of 9 January 2018 as news sites started reporting about commits to the Linux kernel and mails to its mailing list. As a result, patches were not available for some platforms, such as Ubuntu, when the vulnerabilities were disclosed.
 * On 28 January 2018, Intel was reported to have shared news of the Meltdown and Spectre security vulnerabilities with Chinese technology companies before notifying the U.S. government of the flaws.
 * On 8 October 2018, Intel was reported to have added hardware and firmware mitigations regarding Spectre and Meltdown vulnerabilities to its latest processors.
 * In November 2018, two new variants of the attacks were revealed. Researchers attempted to compromise CPU protection mechanisms using code to exploit weaknesses in memory protection and the BOUND instruction. They also attempted but failed to exploit CPU operations for memory alignment, division by zero, supervisor modes, segment limits, invalid opcodes, and non-executable code.

Mechanism
Meltdown relies on a CPU race condition that can arise between instruction execution and privilege checking. Put briefly, the instruction execution leaves side effects that constitute information not hidden to the process by the privilege check. The process carrying out Meltdown then uses these side effects to infer the values of memory mapped data, bypassing the privilege check. The following provides an overview of the exploit, and the memory mapping that is its target. The attack is described in terms of an Intel processor running Microsoft Windows or Linux, the main test targets used in the original paper, but it also affects other processors and operating systems, including macOS (aka OS X), iOS, and Android.

Background – modern CPU design
Modern computer processors use a variety of techniques to gain high levels of efficiency. Four widely used features are particularly relevant to Meltdown:
 * Virtual (paged) memory, also known as memory mapping – used to make memory access more efficient and to control which processes can access which areas of memory.
 * A modern computer usually runs many processes in parallel. In an operating system such as Windows or Linux, each process is given the impression that it alone has complete use of the computer's physical memory, and may do with it as it likes. In reality it will be allocated memory to use from the physical memory, which acts as a "pool" of available memory, when it first tries to use any given memory address (by trying to read or write to it). This allows multiple processes, including the kernel or operating system itself, to co-habit on the same system, but retain their individual activity and integrity without being affected by other running processes, and without being vulnerable to interference or unauthorized data leaks caused by a rogue process.
 * Privilege levels, or protection domains – provide a means by which the operating system can control which processes are authorized to read which areas of virtual memory.
 * As virtual memory permits a computer to refer to vastly more memory than it will ever physically contain, the system can be greatly sped up by "mapping" every process and their in-use memory – in effect all memory of all active processes – into every process's virtual memory. In some systems all physical memory is mapped as well, for further speed and efficiency. This is usually considered safe, because the operating system can rely on privilege controls built into the processor itself, to limit which areas of memory any given process is permitted to access. An attempt to access authorized memory will immediately succeed, and an attempt to access unauthorized memory will cause an exception and void the read instruction, which will fail. Either the calling process or the operating system directs what will happen if an attempt is made to read from unauthorized memory – typically it causes an error condition and the process that attempted to execute the read will be terminated. As unauthorized reads are usually not part of normal program execution, it is much faster to use this approach than to pause the process every time it executes some function that requires privileged memory to be accessed, to allow that memory to be mapped into a readable address space. If the operating system immediately suspends and terminates any process that attempts to access an unauthorized memory location, then the process will have tricked the processor into accessing only one illegal memory address for it, and probably not have time to analyze the side-effects and store the result. If the operating system allows the process to simply recover from the access violation and continue, then the process can analyze the side-effects, store the result, and repeat the sequence millions of times per second, thereby uncovering thousands of bytes of privileged data per second.
 * Instruction pipelining and speculative execution – used to allow instructions to execute in the most efficient manner possible – if necessary allowing them to run out of order or in parallel across various processing units within the CPU – so long as the outcome is correct.
 * Modern processors commonly contain numerous separate execution units, and a scheduler that decodes instructions and decides, at the time they are executed, the most efficient way to execute them. This might involve the decision that two instructions can execute at the same time, or even out of order, on different execution units (known as "instruction pipelining"). So long as the correct outcome is still achieved, this maximizes efficiency by keeping all of the processor's execution units in use as much as possible. Some instructions, such as conditional branches, will lead to one of two different outcomes, depending on a condition. For example, if a value is 0, it will take one action, and otherwise will take a different action. In some cases, the CPU may not yet know which branch to take. This may be because a value is uncached. Rather than wait to learn the correct option, the CPU may proceed immediately (speculative execution). If so, it can either guess the correct option (predictive execution) or even take both (eager execution). If it executes the incorrect option, the CPU will attempt to discard all effects of its incorrect guess.
 * CPU cache – a modest amount of memory within the CPU used to ensure it can work at high speed, to speed up memory access, and to facilitate "intelligent" execution of instructions in an efficient manner.
 * From the perspective of a CPU, the computer's physical memory is slow to access. Also the instructions a CPU runs are very often repetitive, or access the same or similar memory numerous times. To maximize efficient use of the CPU's resources, modern CPUs often have a modest amount of very fast on-chip memory, known as CPU cache. When data is accessed or an instruction is read from physical memory, a copy of that information is routinely saved in the CPU cache at the same time. If the CPU later needs the same instruction or memory contents again, it can obtain it with minimal delay from its own cache rather than waiting for a request related to physical memory to take place.

Meltdown exploit
Ordinarily, the mechanisms described above are considered secure. They provide the basis for most modern operating systems and processors. Meltdown exploits the way these features interact to bypass the CPU's fundamental privilege controls and access privileged and sensitive data from the operating system and other processes. To understand Meltdown, consider the data that is mapped in virtual memory (much of which the process is not supposed to be able to access) and how the CPU responds when a process attempts to access unauthorized memory. The process is running on a vulnerable version of Windows, Linux, or macOS, on a 64-bit processor of a vulnerable type. This is a very common combination across almost all desktop computers, notebooks, laptops, servers and mobile devices.


 * 1) The CPU encounters an instruction accessing the value, A, at an address forbidden to the process by the virtual memory system and the privilege check. Because of speculative execution, the instruction is scheduled and dispatched to an execution unit. This execution unit then schedules both the privilege check and the memory access.
 * 2) The CPU encounters an instruction accessing address Base+A, with Base chosen by the attacker. This instruction is also scheduled and dispatched to an execution unit.
 * 3) The privilege check informs the execution unit that the address of the value, A, involved in the access is forbidden to the process (per the information stored by the virtual memory system), and thus the instruction should fail and subsequent instructions should have no effect. Because these instructions were speculatively executed, however, the data at Base+A may have been cached before the privilege check – and may not have been undone by the execution unit (or any other part of the CPU). If this is indeed the case, the mere act of caching constitutes a leak of information in and of itself. At this point, Meltdown intervenes.
 * 4) The process executes a timing attack by executing instructions referencing memory operands directly. To be effective, the operands of these instructions must be at addresses which cover the possible address, Base+A, of the rejected instruction's operand. Because the data at the address referred to by the rejected instruction, Base+A, was cached nevertheless, an instruction referencing the same address directly will execute faster. The process can detect this timing difference and determine the address, Base+A, that was calculated for the rejected instruction – and thus determine the value A at the forbidden memory address.

Meltdown uses this technique in sequence to read every address of interest at high speed, and depending on other running processes, the result may contain passwords, encryption data, and any other sensitive information, from any address of any process that exists in its memory map. In practice, because cache side-channel attacks are slow, it is faster to extract data one bit at a time (only 2 × 8 = 16 cache attacks needed to read a byte, rather than 256 steps if it tried to read all 8 bits at once).

Impact
The impact of Meltdown depends on the design of the CPU, the design of the operating system (specifically how it uses memory paging), and the ability of a malicious party to get any code run on that system, as well as the value of any data it could read if able to execute.
 * CPU – Many of the most widely used modern CPUs from the late 1990s until early 2018 have the required exploitable design. However, it is possible to mitigate it within CPU design. A CPU that could detect and avoid memory access for unprivileged instructions, or was not susceptible to cache timing attacks or similar probes, or removed cache entries upon non-privilege detection (and did not allow other processes to access them until authorized) as part of abandoning the instruction, would not be able to be exploited in this manner. Some observers consider that all software solutions will be "workarounds" and the only true solution is to update affected CPU designs and remove the underlying weakness.
 * Operating system – Most of the widely used and general-purpose operating systems use privilege levels and virtual memory mapping as part of their design. Meltdown can access only those pages that are memory mapped so the impact will be greatest if all active memory and processes are memory mapped in every process and have the least impact if the operating system is designed so that almost nothing can be reached in this manner. An operating system might also be able to mitigate in software to an extent by ensuring that probe attempts of this kind will not reveal anything useful. Modern operating systems use memory mapping to increase speed so this could lead to performance loss.
 * Virtual machine – A Meltdown attack cannot be used to break out of a virtual machine, i.e., in fully virtualized machines guest user space can still read from guest kernel space, but not from host kernel space. The bug enables reading memory from address space represented by the same page table, meaning the bug does not work between virtual tables. That is, guest-to-host page tables are unaffected, only guest-to-same-guest or host-to-host, and of course host-to-guest since the host can already access the guest pages. This means different VMs on the same fully virtualized hypervisor cannot access each other's data, but different users on the same guest instance can access each other's data.
 * Embedded device – Among the vulnerable chips are those designed by ARM and Intel designed for standalone and embedded devices, such as mobile phones, smart TVs, networking equipment, vehicles, hard drives, industrial control, and the like. As with all vulnerabilities, if a third party cannot run code on the device, its internal vulnerabilities remain unexploitable. For example, an ARM processor in a cellphone or Internet of Things "smart" device may be vulnerable, but the same processor used in a device that cannot download and run new code, such as a kitchen appliance or hard drive controller, is believed to not be exploitable.

The specific impact depends on the implementation of the address translation mechanism in the OS and the underlying hardware architecture. The attack can reveal the content of any memory that is mapped into a user address space, even if otherwise protected. For example, before kernel page-table isolation was introduced, most versions of Linux mapped all physical memory into the address space of every user-space process; the mapped addresses are (mostly) protected, making them unreadable from user-space and accessible only when transitioned into the kernel. The existence of these mappings makes transitioning to and from the kernel faster, but is unsafe in the presence of the Meltdown vulnerability, as the contents of all physical memory (which may contain sensitive information such as passwords belonging to other processes or the kernel) can then be obtained via the above method by any unprivileged process from user-space.

According to researchers, "every Intel processor that implements out-of-order execution is potentially affected, which is effectively every processor since 1995 (except Intel Itanium, and Intel Atom before 2013)." Intel responded to the reported security vulnerabilities with an official statement.

The vulnerability is expected to impact major cloud providers, such as Amazon Web Services (AWS) and Google Cloud Platform. Cloud providers allow users to execute programs on the same physical servers where sensitive data might be stored, and rely on safeguards provided by the CPU to prevent unauthorized access to the privileged memory locations where that data is stored, a feature that the Meltdown exploit circumvents.

The original paper reports that paravirtualization (Xen) and containers such as Docker, LXC, and OpenVZ, are affected. They report that the attack on a fully virtualized machine allows the guest user space to read from the guest kernel memory, but not read from the host kernel space.

Affected hardware
The Meltdown vulnerability primarily affects Intel microprocessors, but the ARM Cortex-A75 and IBM's Power microprocessors are also affected. The vulnerability does not affect AMD microprocessors. When the effect of Meltdown was first made public, Intel countered that the flaws affect all processors, but AMD denied this, saying "we believe AMD processors are not susceptible due to our use of privilege level protections within paging architecture".

Researchers have indicated that the Meltdown vulnerability is exclusive to Intel processors, while the Spectre vulnerability can possibly affect some Intel, AMD, and ARM processors. However, ARM announced that some of their processors were vulnerable to Meltdown. Google has reported that any Intel processor since 1995 with out-of-order execution is potentially vulnerable to the Meltdown vulnerability (this excludes Itanium and pre-2013 Intel Atom CPUs). Intel introduced speculative execution to their processors with Intel's P6 family microarchitecture with the Pentium Pro IA-32 microprocessor in 1995.

ARM has reported that the majority of their processors are not vulnerable, and published a list of the specific processors that are affected. The ARM Cortex-A75 core is affected directly by both Meltdown and Spectre vulnerabilities, and Cortex-R7, Cortex-R8, Cortex-A8, Cortex-A9, Cortex-A15, Cortex-A17, Cortex-A57, Cortex-A72, and Cortex-A73 cores are affected only by the Spectre vulnerability. This contradicts some early statements made about the Meltdown vulnerability as being Intel-only.

A large portion of the then-current mid-range Android handsets use the Cortex-A53 or Cortex-A55 in an octa-core arrangement and are not affected by either the Meltdown or Spectre vulnerability as they do not perform out-of-order execution. This includes devices with the Qualcomm Snapdragon 630, Snapdragon 626, Snapdragon 625, and all Snapdragon 4xx processors based on A53 or A55 cores. Also, no Raspberry Pi computers are vulnerable to either Meltdown or Spectre, except the newly released Raspberry Pi 4, which uses the ARM Cortex-A72 CPU.

IBM has also confirmed that its Power CPUs are affected by both CPU attacks. Red Hat has publicly announced that the exploits are also for IBM System Z, POWER8, and POWER9 systems.

Oracle has stated that V9-based SPARC systems (T5, M5, M6, S7, M7, M8, M10, M12 processors) are not affected by Meltdown, though older SPARC processors that are no longer supported may be impacted.

Mitigation
Mitigation of the vulnerability requires changes to operating system kernel code, including increased isolation of kernel memory from user-mode processes. Linux kernel developers have referred to this measure as kernel page-table isolation (KPTI). KPTI patches have been developed for Linux kernel 4.15, and have been released as a backport in kernels 4.14.11 and 4.9.75. Red Hat released kernel updates to their Red Hat Enterprise Linux distributions version 6 and version 7. CentOS also already released their kernel updates to CentOS 6 and CentOS 7.

Apple included mitigations in macOS 10.13.2, iOS 11.2, and tvOS 11.2. These were released a month before the vulnerabilities were made public. Apple has stated that watchOS and the Apple Watch are not affected. Additional mitigations were included in a Safari update as well a supplemental update to macOS 10.13, and iOS 11.2.2.

Microsoft released an emergency update to Windows 10, 8.1, and 7 SP1 to address the vulnerability on 3 January 2018,  as well as Windows Server (including Server 2008 R2, Server 2012 R2, and Server 2016) and Windows Embedded Industry. These patches are incompatible with third-party antivirus software that use unsupported kernel calls; systems running incompatible antivirus software will not receive this or any future Windows security updates until it is patched, and the software adds a special registry key affirming its compatibility. The update was found to have caused issues on systems running certain AMD CPUs, with some users reporting that their Windows installations did not boot at all after installation. On 9 January 2018, Microsoft paused the distribution of the update to systems with affected CPUs while it investigated and addressed this bug.

It was reported that implementation of KPTI may lead to a reduction in CPU performance, with some researchers claiming up to 30% loss in performance, depending on usage, though Intel considered this to be an exaggeration. It was reported that Intel processor generations that support process-context identifiers (PCID), a feature introduced with Westmere and available on all chips from the Haswell architecture onward, were not as susceptible to performance losses under KPTI as older generations that lack it. This is because the selective translation lookaside buffer (TLB) flushing enabled by PCID (also called address space number or ASN under the Alpha architecture) enables the shared TLB behavior crucial to the exploit to be isolated across processes, without constantly flushing the entire cache – the primary reason for the cost of mitigation.

A statement by Intel said that "any performance impacts are workload-dependent, and, for the average computer user, should not be significant and will be mitigated over time". Phoronix benchmarked several popular PC games on a Linux system with Intel's Coffee Lake Core i7-8700K CPU and KPTI patches installed, and found that any performance impact was small to non-existent. In other tests, including synthetic I/O benchmarks and databases such as PostgreSQL and Redis, an impact in performance was found, accounting even to tens of percent for some workloads. More recently, related tests, involving AMD's FX and Intel's Sandybridge and Ivybridge CPUs, have been reported.

Several procedures to help protect home computers and related devices from the Meltdown and Spectre security vulnerabilities have been published. Meltdown patches may produce performance loss. On 18 January 2018, unwanted reboots, even for newer Intel chips, due to Meltdown and Spectre patches, were reported. According to Dell, "No 'real-world' exploits of these vulnerabilities [ie, Meltdown and Spectre] have been reported to date [26 January 2018], though researchers have produced proof-of-concepts." Dell further recommended "promptly adopting software updates, avoiding unrecognized hyperlinks and websites, not downloading files or applications from unknown sources ... following secure password protocols ... [using] security software to help protect against malware (advanced threat prevention software or anti-virus)."

On 25 January 2018, the current status and possible future considerations in solving the Meltdown and Spectre vulnerabilities were presented. In March 2018, Intel announced that it had designed hardware fixes for future processors for Meltdown and Spectre-V2 only, but not Spectre-V1. The vulnerabilities were mitigated by a new partitioning system that improves process and privilege-level separation. The company also announced it had developed workarounds in microcode for processors dating back to 2013, and that it had plans to develop them for most processors dating back to 2007 including the Core 2 Duo; however, a month later in April 2018, it announced it was backing off that plan for a number of processor families and that no processor earlier than 2008 would have a patch available.

On 8 October 2018, Intel was reported to have added hardware and firmware mitigations regarding Spectre and Meltdown vulnerabilities to its latest processors.