The Machine (computer architecture)

The Machine is the name of an experimental computer made by Hewlett Packard Enterprise. It was created as part of a research project to develop a new type of computer architecture for servers. The design focused on a “memory centric computing” architecture, where NVRAM replaced traditional DRAM and disks in the memory hierarchy. The NVRAM was byte addressable and could be accessed from any CPU via a photonic interconnect. The aim of the project was to build and evaluate this new design.

Hardware overview
The Machine was a computer cluster with many individual nodes connected over a memory fabric. The fabric interconnect used VCSEL-based silicon photonics with a custom chip called the X1. Access to memory was non-uniform and could include multiple hops. The Machine was envisioned as a rack-scale computer, initially with 80 processors and 320 TB of fabric-attached memory, with the potential to scale to more enclosures and up to 32 ZB. The fabric-attached memory was not cache coherent, and software had to be aware of this property. Since traditional locks need cache coherency, hardware was added to the bridges to perform atomic operations at that level. Each node also had a limited amount of local, private, cache-coherent memory (256 GB). Storage and compute on each node had completely separate power domains.

The whole fabric-attached memory of The Machine was too large to be mapped into a processor's virtual address space, which was 48 bits wide, so a way was needed to map windows of the fabric-attached memory into processor memory. Communication between each node SoC and the memory pool therefore went through an FPGA-based “Z-bridge” component that managed the memory mapping from the local SoC to the fabric-attached memory. The Z-bridge dealt with two kinds of addresses: 53-bit logical Z addresses and 75-bit Z addresses, which allowed addressing 8 PB and 32 ZB respectively. Each Z-bridge also contained a firewall to enforce access control.

The interconnect protocol was developed in-house and known as Next Generation Memory Interconnect (NGMI). This protocol evolved into the open Gen-Z standard. The Z-bridge connected to the SoC using PCIe, avoiding major software changes.
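
The address-space figures in this section can be checked with a few lines of arithmetic. The following Python sketch assumes that "TB", "PB" and "ZB" are meant as binary units and that addresses are byte-granular; the function and variable names are illustrative only and do not come from HPE's software.

```python
# Sanity check of the address-space figures quoted in the hardware overview.
# Assumption: TB, PB and ZB are read as binary units (2**40, 2**50, 2**70 bytes).
TIB = 2**40
PIB = 2**50
ZIB = 2**70


def addressable_bytes(bits):
    """Bytes reachable with a byte-granular address of the given width."""
    return 2**bits


virtual_space = addressable_bytes(48)  # processor virtual address space
logical_z = addressable_bytes(53)      # 53-bit logical Z addresses
full_z = addressable_bytes(75)         # 75-bit Z addresses
fabric_memory = 320 * TIB              # planned fabric-attached memory

print(virtual_space // TIB, "TiB")     # 256 TiB
print(logical_z // PIB, "PiB")         # 8 PiB
print(full_z // ZIB, "ZiB")            # 32 ZiB
print(fabric_memory > virtual_space)   # True
```

Under these assumptions a 48-bit virtual address space covers 256 TiB, which is less than the planned 320 TB pool. This is consistent with the need to map windows of the pool rather than all of it at once.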

A half-rack prototype of The Machine was unveiled at HPE Discover in London in 2016. Each node contained ARMv8-A based Broadcom/Cavium ThunderX2 SoCs. In total there were 40 32-core SoCs. Due to the unavailability of adequate memristor-based NVRAM or phase-change memory, the prototype used 160 TB of battery-backed DRAM. Despite this setback, software architect Keith Packard said this "can be used to prove the other parts of the design before switching". According to The Register, HPE's partnership with SK Hynix to develop memristor-based NVRAM ran into funding and directional problems, and HPE was working with SanDisk on resistive RAM (ReRAM) for The Machine. According to The Next Platform, HPE considered switching to Intel Optane DIMMs once production quantities were available on the market.

The Next Platform estimated that the rack prototype consumed 24 kW to 36 kW of power.

Software overview
Two major software projects were created for The Machine. The first was an experimental version of Linux called Linux++, with the enhancements needed to configure the hardware and work with traditional programming models. These included bridge configuration, access control, and memory mapping using the DAX subsystem. In parallel, a new operating system (OS) called Carbon was announced, to be designed from first principles to take full advantage of an NVRAM-based computer.
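
From an application's point of view, DAX-style access generally means mapping a file into the address space and then using ordinary loads and stores. The Python sketch below illustrates that general pattern only. It is not the Linux++ interface, and it assumes an ordinary temporary file as a stand-in for a file on a DAX-capable filesystem.

```python
import mmap
import os
import tempfile

# Assumption: a regular temporary file stands in for a file on a DAX-capable
# filesystem. With real DAX the mapping would bypass the page cache.
PAGE = mmap.PAGESIZE

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.truncate(PAGE)  # size the backing file to one page
    path = f.name

fd = os.open(path, os.O_RDWR)
try:
    with mmap.mmap(fd, PAGE) as region:  # shared read/write mapping by default
        region[0:5] = b"hello"           # byte-addressable store into the mapping
        region.flush()                   # write the change back to the backing file
finally:
    os.close(fd)

# Read the file back through the normal file API to confirm the store landed.
with open(path, "rb") as f:
    stored = f.read(5)
os.unlink(path)
print(stored)
```

The explicit flush is the point to notice: with memory-mapped persistent storage, software rather than a block I/O layer decides when a store must become durable.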

Primary workloads for The Machine included in-memory databases, Hadoop-style software, and real-time big data analytics. HPE claimed that a memory-driven computing design like The Machine could "improve speeds by up to 8000x compared to conventional systems".

In the prototype system, the fabric-attached memory was organised by a "top of rack" management server component called The Librarian. The Librarian divided the memory into "shelves" of 8 GB "books", and hardware protections could be configured on book boundaries. A finer-grained 64 KB "booklet" was also supported.
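
The book and booklet sizes imply a simple address decomposition. The Python sketch below assumes binary units and a flat layout in which books are placed back to back from offset 0; the actual Librarian layout is not described here, and the `locate` helper is hypothetical.

```python
# Assumptions: 8 GB and 64 KB are binary units, and books are laid out
# contiguously from offset 0. The real Librarian layout may differ.
KIB = 2**10
GIB = 2**30
TIB = 2**40

BOOK_SIZE = 8 * GIB
BOOKLET_SIZE = 64 * KIB
BOOKLETS_PER_BOOK = BOOK_SIZE // BOOKLET_SIZE


def locate(offset):
    """Split a byte offset into (book index, booklet index in book, byte in booklet)."""
    book, within_book = divmod(offset, BOOK_SIZE)
    booklet, byte = divmod(within_book, BOOKLET_SIZE)
    return book, booklet, byte


# Number of books needed to cover the prototype's 160 TB pool.
books_in_prototype = (160 * TIB) // BOOK_SIZE

print(BOOKLETS_PER_BOOK)                          # 131072
print(books_in_prototype)                         # 20480
print(locate(BOOK_SIZE + 3 * BOOKLET_SIZE + 7))   # (1, 3, 7)
```

Because both sizes are powers of two, the same split could be done with shifts and masks, which is the kind of check that is cheap to implement in bridge hardware.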

The mapping of memory is handled by the OS, while the access controls for the memory are configured by the management infrastructure of The Machine system as a whole. Software needs to be aware that reads from fabric-attached memory can fail with synchronous errors, whilst writes can fail with asynchronous errors. On Linux, a memory error is reported using the SIGBUS operating system signal.

Programming model and data structure changes were also explored, including changes to thread libraries and heap data structures to make them resilient to the failure modes of non-volatile memory.

History
A few years after HP’s re-discovery of the memristor, the newly appointed CTO of HP, Martin Fink, created an HP Labs project to build a computer system based on memristors to tackle the slowing of Moore's law. He announced the project at HP’s Discover event in the summer of 2014. Some of the ideas of The Machine also came from Dragonhawk system designs. Three-quarters of HP Labs’ 200 staff were focused on the hardware and software of The Machine.

Speaking to Bloomberg, HP said it would commercialize The Machine within a few years, “or fall on its face trying.”

Kirk Bresniker served as Chief Architect, and Keith Packard was hired to work on the Linux enhancements. Bdale Garbee was hired to manage open source development.

In 2015, Hewlett-Packard split into two companies, HP Inc. and Hewlett Packard Enterprise (HPE), with The Machine project assigned to the latter.

In late 2016, Martin Fink retired as HPE CTO. Fink's retirement announcement also said that Hewlett Packard Labs staff would be moved into the Enterprise product group to "align our R&D work on The Machine with the business".

By early 2017, Hewlett Packard Labs had a slide saying that the project's aim was “to demonstrate progress, not develop products” and that they would “collaborate to deliver differentiating Machine value into existing architectures as well as disruptive architectures”. BleepingComputer said "In other words, The Machine is no longer a product in its own right. Instead it will provide technologies that will be used in other HPE products going forward." HPE restructured its pure R&D organization and placed it in the products group. Yahoo! Finance reported that The Machine prototype "remains years away from being commercially available".

In 2018, HPE stated that the project had reached the stage where it needed commercial applications from customers in the next step of its evolution.