Red Storm (computing)

Red Storm was a supercomputer architecture designed for the US Department of Energy’s National Nuclear Security Administration Advanced Simulation and Computing Program. Cray Inc. developed it in 2004 from architectural specifications provided under contract by Sandia National Laboratories. The architecture was later produced commercially as the Cray XT3.

Red Storm was a partitioned, space-shared, tightly coupled, massively parallel processing machine with a high-performance 3D mesh network. The processors were commodity AMD Opteron CPUs with off-the-shelf memory DIMMs. The NIC/router combination, called SeaStar, was the only custom ASIC component in the system and used a PowerPC 440-based core. When deployed in 2005, Red Storm’s initial configuration consisted of 10,880 single-core 2.0 GHz Opterons, of which 10,368 were dedicated to scientific calculations. The remaining 512 Opterons ran a version of Linux, serviced the computations, and provided the user interface to the system. This initial installation consisted of 140 cabinets occupying 280 m² of floor space.
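As a rough illustration of what a 3D mesh interconnect implies for node connectivity, the following Python sketch lists the direct neighbors of a node and counts the minimum number of hops between two nodes. The mesh dimensions, the function names, and the absence of wraparound links are assumptions made for illustration only; they do not describe Red Storm's actual topology or routing.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]


def mesh_neighbors(node: Coord, dims: Coord) -> List[Coord]:
    """Return the nodes directly linked to `node` in a 3D mesh.

    Assumes a plain mesh with no wraparound links, so nodes on a
    face, edge, or corner have fewer than six neighbors.
    """
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            candidate = list(node)
            candidate[axis] += step
            # Keep only coordinates that fall inside the mesh.
            if 0 <= candidate[axis] < dims[axis]:
                neighbors.append(tuple(candidate))
    return neighbors


def hop_count(a: Coord, b: Coord) -> int:
    """Minimum number of link traversals between two mesh nodes."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical 4 x 4 x 4 mesh, chosen only to keep the example small.
DIMS = (4, 4, 4)

if __name__ == "__main__":
    print(mesh_neighbors((0, 0, 0), DIMS))   # corner node: 3 neighbors
    print(len(mesh_neighbors((1, 1, 1), DIMS)))  # interior node: 6 neighbors
    print(hop_count((0, 0, 0), (3, 3, 3)))   # opposite corners: 9 hops
```

In a mesh of this kind the worst-case hop count grows with the sum of the dimensions rather than with the node count, which is one reason such topologies are attractive for large machines.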

The Red Storm supercomputer was designed to scale from a single cabinet to hundreds of cabinets, and it was scaled up twice during its lifetime. In 2006 the system was upgraded to 2.4 GHz dual-core Opterons, and a fifth row of cabinets was brought online, bringing the total to over 26,000 processor cores. This resulted in a peak performance of 124.4 teraflops, or 101.4 teraflops on the Linpack benchmark. A second major upgrade in 2008 introduced Cray XT4 technology: quad-core Opteron processors and an increase in memory to 2 GB per core. This resulted in a theoretical peak performance of 284 teraflops.
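The relationship between core count, clock rate, and peak performance can be sketched with a short Python calculation. The figure of 2 floating-point operations per cycle per core is an assumption about these Opterons and is not stated in this article; the function names are likewise illustrative. The core count, clock rate, and the 2006 peak and Linpack figures are taken from the text above.

```python
def peak_tflops(cores: int, clock_ghz: float, flops_per_cycle: int = 2) -> float:
    """Theoretical peak in teraflops: cores x clock x flops per cycle.

    flops_per_cycle=2 is an assumed value, not a figure from the article.
    """
    gigaflops = cores * clock_ghz * flops_per_cycle
    return gigaflops / 1000.0


def linpack_efficiency(rmax_tflops: float, rpeak_tflops: float) -> float:
    """Fraction of theoretical peak achieved on the Linpack benchmark."""
    return rmax_tflops / rpeak_tflops


# 2005 configuration: 10,368 single-core 2.0 GHz compute processors.
initial_peak = peak_tflops(cores=10_368, clock_ghz=2.0)

# 2006 configuration: 101.4 teraflops on Linpack against a 124.4 teraflop peak.
efficiency_2006 = linpack_efficiency(101.4, 124.4)

if __name__ == "__main__":
    print(f"Estimated 2005 compute peak: {initial_peak:.1f} TFLOPS")
    print(f"2006 Linpack efficiency: {efficiency_2006:.1%}")
```

Under the stated assumption, the 2005 compute partition works out to roughly 41.5 teraflops, and the 2006 system sustained about 81.5% of its peak on Linpack.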

Top500 performance ranking for Red Storm after each upgrade:
 * November 2005: Rank 6 (36.19 TFLOPS)
 * November 2006: Rank 2 (101.4 TFLOPS)
 * November 2008: Rank 9 (204.2 TFLOPS)

Red Storm was intended for capability computing, in which a single application can be run across the entire system. This is in contrast to cluster-style capacity computing, in which portions of a cluster are assigned to run different applications. For an application to make adequate progress across the entire machine, the performance of the memory subsystem, the processor, and the network had to be properly balanced. System software played a key role as well. The Portals network programming API was used to ensure that inter-processor communication could scale to the size of the entire system; Portals was also used on many other supercomputers, including the Intel Teraflops and Paragon. The compute processors used a custom lightweight kernel operating system named Catamount, which was based on "Cougar", the operating system of ASCI Red. A userspace implementation of the Lustre file system, named liblustre, was ported to the Catamount environment using the libsysio library to provide POSIX-like semantics. This file system client ran in the single-threaded Catamount environment without interrupts and serviced I/O requests only when explicitly allowed by the application, to reduce jitter introduced by background file system operations.

Red Storm was decommissioned in 2012.