R5000

The R5000 is a 64-bit, bi-endian, two-issue, in-order superscalar microprocessor that implements the MIPS IV instruction set architecture (ISA). It was developed by Quantum Effect Design (QED) in 1996. The project was funded by MIPS Technologies, Inc. (MTI), which was also the licensor. MTI then licensed the design to Integrated Device Technology (IDT), NEC, NKK, and Toshiba. The R5000 succeeded the QED R4600 and R4700 as the company's flagship high-end embedded microprocessor. IDT marketed its version of the R5000 as the 79RV5000, NEC as the VR5000, NKK as the NR5000, and Toshiba as the TX5000. The R5000 was sold to PMC-Sierra when that company acquired QED. Derivatives of the R5000 are still in production today for embedded systems.

Users
Users of the R5000 in workstation and server computers were Silicon Graphics, Inc. (SGI) and Siemens-Nixdorf. SGI used the R5000 in its O2 and Indy low-end workstations. The R5000 was also used in embedded systems such as network routers and high-end printers. It also found its way into the arcade gaming industry, where R5000-powered mainboards were used by Atari and Midway. The early Cobalt Qube and Cobalt RaQ systems used derivative models, the RM5230 and RM5231. The Qube 2700 used the RM5230 microprocessor, whereas the Qube 2 used the RM5231. The original RaQ systems were equipped with RM5230 or RM5231 CPUs, but later models used AMD K6-2 chips, and the final models used Intel Pentium III CPUs.

History
The original roadmap called for 200 MHz operation in early 1996 and 250 MHz in late 1996, with the R5000A to follow in 1997. The R5000 was introduced in January 1996 but failed to reach 200 MHz, topping out at 180 MHz. When positioned as a low-end workstation microprocessor, its competition included the IBM and Motorola PowerPC 604, the HP PA-7300LC, and the Intel Pentium Pro.

Description
The R5000 is a two-way superscalar design that executes instructions in order. It can simultaneously issue one integer and one floating-point instruction. It has one simple pipeline for integer instructions and another for floating-point instructions, which saves transistors and die area and thereby reduces cost. For the same cost reasons, the R5000 does not perform dynamic branch prediction. Instead, it uses a static approach, relying on the hints that the compiler encodes in the branch-likely instructions, first introduced in the MIPS II architecture, to determine how likely a branch is to be taken.
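
The pairing rule described above can be illustrated with a toy issue model. This is a simplified sketch, not a description of the actual R5000 issue logic. It assumes each instruction is classified only as integer (`"I"`) or floating-point (`"F"`), and it ignores data dependencies, cache misses, branches, and any alignment constraints. The function name `issue_cycles` is hypothetical.

```python
def issue_cycles(stream: str) -> int:
    """Count issue cycles for an in-order stream of 'I' (integer)
    and 'F' (floating-point) instructions under a toy pairing rule."""
    cycles = 0
    i = 0
    while i < len(stream):
        # Issue two instructions together only when one is integer
        # and the other is floating-point; otherwise issue one.
        if i + 1 < len(stream) and {stream[i], stream[i + 1]} == {"I", "F"}:
            i += 2
        else:
            i += 1
        cycles += 1
    return cycles
```

Under these assumptions, a perfectly interleaved stream such as `"IFIF"` issues in two cycles, whereas an all-integer stream such as `"IIII"` needs four.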

The R5000 had large L1 caches, a distinctive characteristic of QED, whose designers favored simple designs with large caches. The R5000 had two L1 caches, one for instructions and the other for data. Each has a capacity of 32 KB. The caches are two-way set-associative, have a 32-byte line size, and are virtually indexed and physically tagged. Instructions were predecoded as they entered the instruction cache by appending four bits to each instruction. These four bits specify whether instructions can be issued together and which execution unit executes them. This assisted superscalar instruction issue by moving some of the dependency and conflict checking out of the critical path.
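
The cache parameters above imply a particular geometry, which a short calculation makes explicit. The sketch below assumes the conventional mapping in which the low address bits select the byte within a line and the next bits select the set. The source does not spell out the exact bit assignment, and the names `cache_geometry` and `split_address` are hypothetical.

```python
CACHE_BYTES = 32 * 1024   # capacity of each L1 cache
WAYS = 2                  # two-way set-associative
LINE_BYTES = 32           # line size in bytes

def cache_geometry(cache_bytes=CACHE_BYTES, ways=WAYS, line_bytes=LINE_BYTES):
    """Return (number of sets, offset bits, index bits)."""
    sets = cache_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1   # log2, valid for powers of two
    index_bits = sets.bit_length() - 1
    return sets, offset_bits, index_bits

def split_address(addr):
    """Split a (virtual) address into (set index, byte offset within the line)."""
    sets, offset_bits, _ = cache_geometry()
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> offset_bits) & (sets - 1)
    return index, offset
```

With the figures quoted above, each cache has 512 sets, so an address contributes 5 offset bits and 9 index bits.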

The integer unit executes most instructions with a one-cycle latency and throughput; the exceptions are multiply and divide. 32-bit multiplies have a five-cycle latency and a four-cycle throughput. 64-bit multiplies have an extra four cycles of latency and half the throughput. Divides have a 36-cycle latency and throughput for 32-bit integers; for 64-bit integers, both increase to 68 cycles.
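
To see how latency and throughput combine, the figures above can be fed into a simple timing estimate. This is a rough sketch under stated assumptions: "throughput" is read as the number of cycles between successive issues, the 64-bit multiply is assumed to have a nine-cycle latency and an eight-cycle interval (one reading of "an extra four cycles" and "half the throughput"), and the operations are independent and issued back to back. The table and function names are hypothetical.

```python
# (latency, issue interval) in cycles, from the figures quoted above.
INT_TIMINGS = {
    "mul32": (5, 4),
    "mul64": (9, 8),    # assumption: 5 + 4 latency, interval doubled to 8
    "div32": (36, 36),
    "div64": (68, 68),
}

def cycles_for(op: str, count: int) -> int:
    """Estimate cycles to complete `count` independent operations of one kind."""
    if count <= 0:
        return 0
    latency, interval = INT_TIMINGS[op]
    # The first result arrives after `latency` cycles; each further
    # operation can start only `interval` cycles after the previous one.
    return latency + (count - 1) * interval
```

For example, `cycles_for("mul32", 4)` adds three four-cycle intervals to the five-cycle latency of the first multiply.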

The floating-point unit (FPU) was a fast single-precision (32-bit) design, chosen to reduce cost and to benefit SGI, whose mid-range 3D graphics workstations relied mostly on single-precision math for 3D graphics applications. It was fully pipelined, which made it significantly better than that of the R4700. The R5000 implements the multiply-add instruction of the MIPS IV ISA. Single-precision adds, multiplies, and multiply-adds have a four-cycle latency and a one-cycle throughput. Single-precision divides have a 21-cycle latency and a 19-cycle throughput, while square roots have a 26-cycle latency and a 38-cycle throughput. Division and square root were not pipelined. Instructions that operate on double-precision numbers have a significantly higher latency and lower throughput, except for add, which has the same latency and throughput as single-precision add. Double-precision multiply and multiply-add have a five-cycle latency and a two-cycle throughput. Divide has a 36-cycle latency and a 34-cycle throughput. Square root has a 68-cycle latency and a 66-cycle throughput.
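
The throughput figures above translate into an upper bound on floating-point rate. The sketch below uses a subset of those figures and treats "throughput" as the issue interval in cycles. Counting a multiply-add as two floating-point operations is a common convention, assumed here rather than taken from the source. The result is a theoretical peak that ignores memory stalls and dependencies, and the names are hypothetical.

```python
# Issue interval in cycles, keyed by (precision, operation).
FP_INTERVAL = {
    ("single", "add"): 1, ("single", "mul"): 1, ("single", "madd"): 1,
    ("single", "div"): 19,
    ("double", "add"): 1, ("double", "mul"): 2, ("double", "madd"): 2,
    ("double", "div"): 34, ("double", "sqrt"): 66,
}

# Assumed convention: a multiply-add counts as two floating-point operations.
FLOPS_PER_OP = {"add": 1, "mul": 1, "madd": 2, "div": 1, "sqrt": 1}

def peak_mflops(precision: str, op: str, clock_mhz: float) -> float:
    """Theoretical peak MFLOPS when issuing only `op` at `clock_mhz`."""
    interval = FP_INTERVAL[(precision, op)]
    return clock_mhz * FLOPS_PER_OP[op] / interval
```

At 180 MHz, this bound is twice as high for single-precision multiply-add as for double-precision multiply-add, which reflects the emphasis on single-precision performance.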

The R5000 had an integrated L2 cache controller that supported capacities of 512 KB, 1 MB and 2 MB. The L2 cache shares the SysAD bus with the external interface. The cache was built with custom synchronous SRAMs (SSRAMs). The microprocessor uses the SysAD bus that is also used by several other MIPS microprocessors. The bus is multiplexed (address and data share the same set of wires) and can operate at clock frequencies up to 100 MHz. The initial R5000 did not support multiprocessing, but the package reserved eight pins for the future addition of this feature.

QED was a fabless company and did not fabricate its own designs. The R5000 was fabricated by IDT, NEC and NKK. All three companies fabricated the R5000 in a 0.35 μm complementary metal–oxide–semiconductor (CMOS) process, but with different process features. IDT fabricated the R5000 in a process with two levels of polysilicon and three levels of aluminium interconnect. The two levels of polysilicon enabled IDT to use a four-transistor SRAM cell, resulting in a transistor count of 3.6 million and a die that measured 8.7 mm by 9.7 mm (84.39 mm²). NEC and NKK fabricated the R5000 in a process with one level of polysilicon and three levels of aluminium interconnect. Without an extra level of polysilicon, both companies had to use a six-transistor SRAM cell, resulting in a transistor count of 5.0 million and a larger die with an area of around 87 mm². MTI claimed die sizes in the range of 80 to 90 mm². In both versions, 0.8 million of the transistors were for logic, and the remainder were in the caches. The R5000 was packaged in a 272-ball plastic ball grid array (BGA) or a 223-pin ceramic pin grid array (PGA). It was not pin-compatible with any previous MIPS microprocessor.
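
The figures in this paragraph can be cross-checked with a little arithmetic. The sketch below recomputes the IDT die area and the cache transistor counts from the quoted totals. It uses only numbers stated above, and the helper names are hypothetical.

```python
LOGIC_TRANSISTORS_M = 0.8   # millions of logic transistors, both versions

def die_area_mm2(width_mm: float, height_mm: float) -> float:
    """Die area in square millimetres, rounded to two decimals."""
    return round(width_mm * height_mm, 2)

def cache_transistors_m(total_m: float) -> float:
    """Millions of transistors left for the caches after subtracting logic."""
    return round(total_m - LOGIC_TRANSISTORS_M, 1)

idt_cache = cache_transistors_m(3.6)       # four-transistor SRAM cell
nec_nkk_cache = cache_transistors_m(5.0)   # six-transistor SRAM cell
ratio = round(nec_nkk_cache / idt_cache, 2)
```

The cache transistor counts come out at 2.8 million and 4.2 million, a ratio of 1.5, which matches the ratio of a six-transistor cell to a four-transistor cell.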

Derivatives
In the late 1990s, Quantum Effect Design acquired a license to manufacture and sell MIPS microprocessors from MTI and became a microprocessor vendor, changing its name to Quantum Effect Devices to reflect its new business model. The company's first products were members of the RM52xx family, which initially consisted of two models, the RM5230 and RM5260. These were announced on 24 March 1997. The RM5230 was initially available at 100 and 133 MHz, and the RM5260 at 133 and 150 MHz. On 29 September 1997, new 150 and 175 MHz RM5230s were introduced, as were 175 and 200 MHz RM5260s.

Both the RM5230 and RM5260 are derivatives of the R5000 and differ in the size of their primary caches (16 KB each instead of 32 KB), the width of their system interfaces (the RM5230 has a 32-bit 67 MHz SysAD bus, and the RM5260 a 64-bit 75 MHz SysAD bus), and the addition of multiply-add and three-operand multiply instructions for digital signal processing applications. These microprocessors were fabricated by the Taiwan Semiconductor Manufacturing Company (TSMC) in its 0.35 μm process with three levels of interconnect. They were packaged by Amkor Technology in its Power-Quad 4 packages, the RM5230 in a 128-pin version, and the RM5260 in a 208-pin version.
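
The difference between the two system interfaces can be expressed as a raw peak transfer rate. The sketch below assumes one bus-width transfer per clock and takes 1 MB as 10^6 bytes. Because the SysAD bus multiplexes address and data on the same wires, real sustained bandwidth would be lower than this bound. The function name is hypothetical.

```python
def peak_bandwidth_mb_s(width_bits: int, clock_mhz: int) -> int:
    """Raw upper bound in MB/s, assuming one full-width transfer per clock."""
    return width_bits // 8 * clock_mhz

rm5230 = peak_bandwidth_mb_s(32, 67)   # 32-bit SysAD at 67 MHz
rm5260 = peak_bandwidth_mb_s(64, 75)   # 64-bit SysAD at 75 MHz
```

Under these assumptions, the RM5260 interface has a little more than twice the raw peak rate of the RM5230 interface.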

The RM52xx family was later joined by the RM5270, which was announced at the Embedded Systems Conference on 29 September 1997. Intended for high-end embedded applications, the RM5270 was available at 150 and 200 MHz. Its main improvement was the addition of an on-chip secondary cache controller that supported up to 2 MB of cache. The SysAD bus is 64 bits wide and can operate at 100 MHz. The RM5270 was packaged in a 304-pin Super-BGA (SBGA) that was pin-compatible with the RM7000, and it was offered as a migration path to the RM7000.

On 20 July 1998, the RM52x1 family was announced. The family consisted of the RM5231, RM5261, and RM5271. These microprocessors were derivatives of the corresponding devices from the RM52x0 family, fabricated in a 0.25 μm process with four levels of metal. The RM5231 was initially available at 150, 200, and 250 MHz, whereas the RM5261 and RM5271 were available at 250 and 266 MHz. On 6 July 1999, a 300 MHz RM5271 was introduced, priced at US$140 in quantities of 10,000. The RM52x1 improved upon the previous family with larger 32 KB primary caches and a faster SysAD bus that supported clock rates up to 125 MHz.

After QED was acquired by PMC-Sierra, the RM52xx and RM52x1 families were continued as PMC-Sierra products. PMC-Sierra introduced two RM52x1 derivatives, the RM5231A and RM5261A, on 4 April 2001. These microprocessors were fabricated by TSMC in its 0.18 μm process and differ from the previous devices by featuring higher clock rates and lower power consumption. The RM5231A was available at clock rates of 250 to 350 MHz, and the RM5261A from 250 to 400 MHz.

The R5900 used in Sony's PlayStation 2, dubbed the Emotion Engine, is a modified version of the R5000 with a customized instruction/data cache arrangement and Sony's proprietary 107 vector SIMD Multimedia Extensions (MMI). Unlike the FPU of the R5000, its custom FPU is not IEEE 754 compliant. It also has a second MIPS core that acts as a sync controller for specialized vector coprocessors, which are important for 3D math; at the time, such math was principally computed on the CPU.