IBM POWER architecture

IBM POWER is a reduced instruction set computer (RISC) instruction set architecture (ISA) developed by IBM. The name is an acronym for Performance Optimization With Enhanced RISC.

The ISA is used as base for high end microprocessors from IBM during the 1990s and were used in many of IBM's servers, minicomputers, workstations, and supercomputers. These processors are called POWER1 (RIOS-1, RIOS.9, RSC, RAD6000) and POWER2 (POWER2, POWER2+ and P2SC).

The ISA evolved into the PowerPC instruction set architecture and was deprecated in 1998 when IBM introduced the POWER3 processor that was mainly a 32/64-bit PowerPC processor but included the IBM POWER architecture for backwards compatibility. The original IBM POWER architecture was then abandoned. PowerPC evolved into the third Power ISA in 2006.

IBM continues to develop PowerPC microprocessor cores for use in their application-specific integrated circuit (ASIC) offerings. Many high volume applications embed PowerPC cores.

The 801 research project
In 1974, IBM started a project with a design objective of creating a large telephone-switching network with a potential capacity to deal with at least 300 calls per second. It was projected that 20,000 machine instructions would be required to handle each call while maintaining a real-time response, so a processor with a performance of 12 MIPS was deemed necessary. This requirement was extremely ambitious for the time, but it was realised that much of the complexity of contemporary CPUs could be dispensed with, since this machine would need only to perform I/O, branches, add register-register, move data between registers and memory, and would have no need for special instructions to perform heavy arithmetic.

This simple design philosophy, whereby each step of a complex operation is specified explicitly by one machine instruction, and all instructions are required to complete in the same constant time, would later come to be known as RISC.

By 1975 the telephone switch project was canceled without a prototype. From the estimates from simulations produced in the project's first year, however, it looked as if the processor being designed for this project could be a very promising general-purpose processor, so work continued at Thomas J. Watson Research Center building #801, on the 801 project.

1982 Cheetah project
For two years at the Watson Research Center, the superscalar limits of the 801 design were explored, such as the feasibility of implementing the design using multiple functional units to improve performance, similar to what had been done in the IBM System/360 Model 91 and the CDC 6600 (although the Model 91 had been based on a CISC design), to determine if a RISC machine could maintain multiple instructions per cycle, or what design changes need to be made to the 801 design to allow for multiple-execution-units.

To increase performance, Cheetah had separate branch, fixed-point, and floating-point execution units. Many changes were made to the 801 design to allow for multiple-execution-units. Cheetah was originally planned to be manufactured using bipolar emitter-coupled logic (ECL) technology, but by 1984 complementary metal–oxide–semiconductor (CMOS) technology afforded an increase in the level of circuit integration while improving transistor-logic performance.

The America project
In 1985, research on a second-generation RISC architecture started at the IBM Thomas J. Watson Research Center, producing the "AMERICA architecture"; in 1986, IBM Austin started developing the RS/6000 series, based on that architecture.

POWER
In February 1990, the first computers from IBM to incorporate the POWER instruction set were called the "RISC System/6000" or RS/6000. These RS/6000 computers were divided into two classes, workstations and servers, and hence introduced as the POWERstation and POWERserver. The RS/6000 CPU had 2 configurations, called the "RIOS-1" and "RIOS.9" (or more commonly the "POWER1" CPU). A RIOS-1 configuration had a total of 10 discrete chips - an instruction cache chip, fixed-point chip, floating-point chip, 4 data cache chips, storage control chip, input/output chips, and a clock chip. The lower cost RIOS.9 configuration had 8 discrete chips - an instruction cache chip, fixed-point chip, floating-point chip, 2 data cache chips, storage control chip, input/output chip, and a clock chip.

A single-chip implementation of RIOS, RSC (for "RISC Single Chip"), was developed for lower-end RS/6000's; the first machines using RSC were released in 1992.

POWER2
IBM started the POWER2 processor effort as a successor to the POWER1 two years before the creation of the 1991 Apple/IBM/Motorola alliance in Austin, Texas. Despite being impacted by diversion of resources to jump start the Apple/IBM/Motorola effort, the POWER2 took five years from start to system shipment. By adding a second fixed-point unit, a second floating point unit, and other performance enhancements to the design, the POWER2 had leadership performance when it was announced in November 1993.

New instructions were also added to the instruction set:
 * Quad-word storage instructions. The quad-word load instruction moves two adjacent double-precision values into two adjacent floating-point registers.
 * Hardware square root instruction.
 * Floating-point to integer conversion instructions.

To support the RS/6000 and RS/6000 SP2 product lines in 1996, IBM had its own design team implement a single-chip version of POWER2, the P2SC ("POWER2 Super Chip"), outside the Apple/IBM/Motorola alliance in IBM's most advanced and dense CMOS-6S process. P2SC combined all of the separate POWER2 instruction cache, fixed point, floating point, storage control, and data cache chips onto one huge die. At the time of its introduction, P2SC was the largest and highest transistor count processor in the industry. Despite the challenge of its size, complexity, and advanced CMOS process, the first tape-out version of the processor was able to be shipped, and it had leadership floating point performance at the time it was announced. P2SC was the processor used in the 1997 IBM Deep Blue chess playing supercomputer which beat chess grandmaster Garry Kasparov. With its twin sophisticated MAF floating point units and huge wide and low latency memory interfaces, P2SC was primarily targeted at engineering and scientific applications. P2SC was eventually succeeded by the POWER3, which included 64-bit, SMP capability, and a full transition to PowerPC in addition to P2SC's sophisticated twin MAF floating point units.

The architecture
The POWER design is descended directly from the 801's CPU, widely considered to be the first true RISC processor design. The 801 was used in a number of applications inside IBM hardware.

At about the same time the PC/RT was being released, IBM started the America Project, to design the most powerful CPU on the market. They were interested primarily in fixing two problems in the 801 design:


 * The 801 required all instructions to complete in one clock cycle, which precluded floating point instructions.
 * Although the decoder was pipelined as a side effect of these single-cycle operations, they didn't use superscalar effects.

Floating point became a focus for the America Project, and IBM was able to use new algorithms developed in the early 1980s that could support 64-bit double-precision multiplies and divides in a single cycle. The FPU portion of the design was separate from the instruction decoder and integer parts, allowing the decoder to send instructions to both the FPU and ALU (integer) execution units at the same time. IBM complemented this with a complex instruction decoder which could be fetching one instruction, decoding another, and sending one to the ALU and FPU at the same time, resulting in one of the first superscalar CPU designs in use.

The system used 32 32-bit integer registers and another 32 64-bit floating point registers, each in their own unit. The branch unit also included a number of "private" registers for its own use, including the program counter.

Another interesting feature of the architecture is a virtual address system which maps all addresses into a 52-bit space. In this way applications can share memory in a "flat" 32-bit space, and all of the programs can have different blocks of 32 bits each.

Appendix E of Book I: PowerPC User Instruction Set Architecture of the PowerPC Architecture Book, Version 2.02 describes the differences between the POWER and POWER2 instruction set architectures and the version of the PowerPC instruction set architecture implemented by the POWER5.