CDC 7600

The CDC 7600 was designed by Seymour Cray to be the successor to the CDC 6600, extending Control Data's dominance of the supercomputer field into the 1970s. The 7600 ran at 36.4 MHz (27.5 ns clock cycle) and had a 65 Kword primary memory of 60-bit words built from magnetic core, plus a secondary memory whose size varied by site, up to 512 Kword. It was generally about ten times as fast as the CDC 6600 and could deliver about 10 MFLOPS on hand-compiled code, with a peak of 36 MFLOPS. In benchmark tests in early 1970 it was also shown to be slightly faster than its IBM rival, the IBM System/360 Model 195. When the system was released in 1969, it sold for around $5 million in base configurations, and considerably more as options and features were added.

Among the 7600's notable state-of-the-art contributions, beyond extensive pipelining, was the physical C-shape, which both reduced floor space and dramatically increased performance by reducing the distance that signals needed to travel.

Design
As the 6600 neared production quality, Cray lost interest in it and turned to designing its replacement. Making a machine "somewhat" faster would not be too difficult in the late 1960s; the introduction of integrated circuits allowed denser packing of components and, in turn, a higher clock speed. Transistors in general were also getting somewhat faster as the production processes and quality improved. These sorts of improvements might be expected to make a machine twice as fast, perhaps as much as five times. However, as with the 6600 design, Cray set himself the goal of producing a machine with ten times the performance.

One of the reasons the 6600 was so much faster than its contemporaries is that it had multiple functional units that could operate in parallel. For instance, the machine could perform an addition of two numbers while simultaneously multiplying two others. However, any given instruction had to complete its trip through the unit before the next could be fed into it, which caused a bottleneck when the scheduler system ran out of instructions. Adding more functional units would not improve performance unless the scheduler was also greatly improved, especially in terms of allowing it to have more memory, so it could look through more instructions for ones that could be fed into the parallel units. That appeared to be a major problem.

To solve this problem, Cray turned to the concept of an instruction pipeline. Each functional unit consisted of several sections that operated in turn. For instance, an addition unit might have circuitry dedicated to retrieving the operands from memory, then the actual math unit, and finally another section to send the results back to memory. At any given instant only one part of the unit was active, while the rest waited their turn. A pipeline improves on this by feeding in the next instruction before the first has completed, using up that idle time. For instance, while the operands of one add instruction are being summed, the operands for the next add instruction can be fetched. That way, as soon as the current instruction completes and moves to the output circuitry, the operands for the next addition are already waiting to be added. In this way each functional unit works in "parallel" internally, just as the machine as a whole does. The improvement in performance generally depends on the number of steps the unit takes to complete an operation. For instance, the 6600's multiply unit took 10 cycles to complete an instruction, so a fully pipelined version could be expected to run about 10 times as fast on a long stream of multiplies.
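The "about 10 times" figure can be checked with a short calculation. The sketch below is an idealized model, not a description of the 7600's actual circuitry: it assumes every stage takes exactly one cycle, all operations are independent, and a new operation can enter the unit on every cycle. The function names are illustrative only.

```python
def unpipelined_cycles(n_ops: int, stages: int) -> int:
    """Each operation occupies the whole unit for `stages` cycles."""
    return n_ops * stages


def pipelined_cycles(n_ops: int, stages: int) -> int:
    """The first result needs `stages` cycles; each later one adds one cycle."""
    if n_ops == 0:
        return 0
    return stages + (n_ops - 1)


def speedup(n_ops: int, stages: int) -> float:
    """Ratio of unpipelined to pipelined time for the same work."""
    return unpipelined_cycles(n_ops, stages) / pipelined_cycles(n_ops, stages)


if __name__ == "__main__":
    # A 10-step unit, as in the multiply example in the text.
    for n in (1, 10, 100, 1000):
        print(f"{n:5d} operations: speedup {speedup(n, stages=10):.2f}")
```

Under these assumptions the speedup is 1 for a single operation and only approaches the stage count (10) for long runs of back-to-back operations, which is why the tenfold gain is an upper bound rather than a typical result.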

Things are never that simple, however. Pipelining requires that the unit's internals can be effectively separated to the point where each step of the operation is running on completely separate circuitry. This is rarely achievable in the real world. Nevertheless, the use of pipelining on the 7600 improved performance over the 6600 by a factor of about 3. To achieve the rest of the goal, the machine would have to run at a higher clock speed, which new transistor designs now made possible. However, there is a physical limit to performance because of the time it takes signals to move between parts of the machine, which in turn is defined by its physical size. As always, Cray put considerable design effort into shrinking the machine so that it could run at higher operating frequencies. For the 7600, each circuit module actually consisted of up to six printed circuit boards, each one stuffed with subminiature resistors, diodes, and transistors. The six boards were stacked up and then interconnected along their edges, making a very compact but basically unrepairable module.
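As a rough check on how the two improvements combine to reach the tenfold goal, the sketch below multiplies the pipelining gain by the clock gain. It relies on two assumptions that are not stated in this article: that the 6600 ran with a 100 ns minor cycle (10 MHz), and that the two gains are independent and simply multiply. The constant and function names are illustrative.

```python
MINOR_CYCLE_7600_NS = 27.5   # from the article
MINOR_CYCLE_6600_NS = 100.0  # assumption: 6600 clock of 10 MHz, not stated here
PIPELINE_GAIN = 3.0          # "a factor of about 3", from the article


def clock_gain(old_cycle_ns: float, new_cycle_ns: float) -> float:
    """Speedup from the shorter clock cycle alone."""
    return old_cycle_ns / new_cycle_ns


def combined_gain(pipeline_gain: float, clock: float) -> float:
    """Assumes the two improvements are independent and multiply."""
    return pipeline_gain * clock


if __name__ == "__main__":
    clock = clock_gain(MINOR_CYCLE_6600_NS, MINOR_CYCLE_7600_NS)
    print(f"clock gain:    {clock:.2f}x")
    print(f"combined gain: {combined_gain(PIPELINE_GAIN, clock):.1f}x")
```

With these inputs the clock contributes a little under 4 times and the combined figure lands near 11 times, consistent with the "about ten times" performance quoted for the machine.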

However, the same dense packing also led to the machine's biggest problem: heat. For the 7600, Cray once again turned to his refrigeration engineer, Dean Roush, formerly of the Amana company. Roush added an aluminum plate to the back of each side of the cordwood stack; the plates were in turn cooled by a liquid-freon system running through the core of the machine. Since this system was mechanical, and therefore prone to failure, the 7600 was redesigned into a large "C" shape so that a technician could walk into the inside of the "C", open the cabinet, and reach the modules on either side of the cooling piping.

Architecture
The 7600 was an architectural landmark, and most of its features are still standard parts of computer design. It is a load-store computer with a 15-bit instruction word containing a 6-bit operation code. There are only 64 machine codes, including a no-operation code, with no fixed-point multiply or divide operations in the central processor.

The 7600 has two main core memories. Small core memory holds the instructions currently being executed and the data currently being processed. It has an access time of 10 minor cycles (27.5 ns each) and a 60-bit word length. Large core memory holds data ready to transfer to small core memory. It has an access time of 60 minor cycles and a word length of 480 bits (512 bits with parity). Accesses are fully pipelined and buffered, so despite the different access times the two memories have the same sequential transfer rate of 60 bits per 27.5 ns minor cycle. Because they also work in parallel, data can be moved from one to the other at that same rate. On an operating system call, the contents of the small core memory are swapped out and replaced from the large core memory by the operating system, and restored afterward.
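The timing figures above can be converted into more familiar units with a few lines of arithmetic. The sketch below uses only the numbers given in the text; the constant and function names are illustrative, and the unit conversion (bits per nanosecond equals gigabits per second) is the only extra step.

```python
MINOR_CYCLE_NS = 27.5         # minor cycle length, from the article
WORD_BITS = 60                # small core memory word length
LARGE_CORE_WORD_BITS = 480    # large core memory word length (data only)
LARGE_CORE_PARITY_BITS = 512  # large core memory word length with parity


def access_time_ns(minor_cycles: int) -> float:
    """Convert an access time in minor cycles to nanoseconds."""
    return minor_cycles * MINOR_CYCLE_NS


def clock_mhz() -> float:
    """Clock frequency implied by the minor cycle."""
    return 1000.0 / MINOR_CYCLE_NS


def transfer_rate_gbit_per_s() -> float:
    """One 60-bit word per minor cycle; bits per ns equals Gbit/s."""
    return WORD_BITS / MINOR_CYCLE_NS


if __name__ == "__main__":
    print(f"small core access: {access_time_ns(10):.1f} ns")
    print(f"large core access: {access_time_ns(60):.1f} ns")
    print(f"clock:             {clock_mhz():.2f} MHz")
    print(f"sequential rate:   {transfer_rate_gbit_per_s():.2f} Gbit/s")
    print(f"60-bit words per large core word: {LARGE_CORE_WORD_BITS // WORD_BITS}")
    print(f"parity bits per large core word:  {LARGE_CORE_PARITY_BITS - LARGE_CORE_WORD_BITS}")
```

The calculation shows why pipelining and buffering matter here: large core memory takes six times as long as small core memory to answer a single request, yet each large core word carries eight 60-bit words, so the sustained rate can still match one word per minor cycle.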

There is a 12-word instruction pipeline, called the instruction word stack in CDC documentation. All addresses in the stack are fetched without waiting for the instruction field to be processed. Therefore, the fetch of the target instruction of a conditional branch precedes evaluation of the branch condition. During the execution of a 10-word (up to 40-instruction) loop, all the needed instructions remain in the stack, so no instructions are fetched, leaving small core memory free for data transfers.

There are eight 60-bit registers, each with an address register. Moving an address to an address register starts a small core memory read or write. Arithmetic and logic instructions have these registers as sources and destinations. The programmer or compiler tries to fetch data in time for it to be used, and to store data before the same register is needed for new data, but if an operand or register is not ready, the processor goes into a wait state until it is. It also waits if one of the four floating-point arithmetic units is not ready when requested, but due to pipelining, this does not usually happen.

Relationship with the CDC 6600
The CDC 7600 "was designed to be machine code upward compatible with the 6600, but to provide a substantial increase in performance". One user said: "Most users could run on either system without changes."

Although the 7600 shared many features of the 6600, including hardware, instructions, and its 60-bit word size, it was not object-code compatible with the CDC 6600. In addition, it was not entirely source-code (COMPASS) compatible, as some instructions in the 7600 did not exist in the 6600, and vice versa. The machine had originally been named the CDC 6800, but the name was changed to 7600 when Cray decided that it could not be completely compatible. However, due to the 7600's operating system design, the 6600 and 7600 shared a "uniform software environment" despite the low-level differences.

In fact, from a high-level perspective, the 7600 was quite similar to the 6600. At the time computer memory could be arranged in blocks with independent access paths, and Cray's designs used this to their advantage. While most machines would use a single CPU to run all the functionality of the system, Cray realized that this meant each memory block spent a considerable amount of time idle while the CPU was processing instructions and accessing other blocks. In order to take advantage of this, the 6600 and 7600 left mundane housekeeping tasks, printing output or reading punched cards, for instance, to a series of ten smaller 12-bit machines based on the CDC 160-A known as "Peripheral Processor Units", or PPUs. For any given cycle of the machine one of the PPUs was in control, feeding data into the memory while the main processor was crunching numbers. When the cycle completed, the next PPU was given control. In this way the memory always held up-to-date information for the main processor to work on (barring delays in the external devices themselves), eliminating delays on data, as well as allowing the CPU to be built for mathematical performance and nothing else. The PPU could have been called a very smart "communications channel".

Like the 6600, the 7600 used 60-bit words with instructions that were generally 15 bits in length, although there were also 30-bit instructions. The instructions were packed into the 60-bit words, but a 30-bit instruction could not straddle two words, and control could only be transferred to the first instruction in a word. However, the instruction set itself had changed to reflect the new internal memory layout, thereby rendering it incompatible with the earlier 6600. The machines were similar enough to make porting of compilers and operating systems possible without too much trouble. The machine initially did not come with software. Sites had to be willing to write their own operating systems, such as LTSS and NCAROS, and their own compilers, such as LRLTRAN (Livermore's version of Fortran with dynamic memory management and other non-standard features).
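The packing rules above can be illustrated with a small sketch. It treats each 60-bit word as four 15-bit slots and places instructions in order, starting a new word whenever a 30-bit instruction would straddle a boundary or an instruction is marked as a branch target. Filling the unused slots with a no-operation code is an assumption about how an assembler might satisfy the rules, and the `pack` function, the "NOP" label, and the tuple layout are hypothetical names introduced for illustration.

```python
PARCELS_PER_WORD = 4  # four 15-bit slots in a 60-bit word


def pack(instructions):
    """Pack (name, bits, is_branch_target) tuples into 60-bit words.

    Returns a list of words, each a list of four slot labels. The second
    slot of a 30-bit instruction is labeled with a trailing apostrophe.
    """
    words = []
    current = []

    def flush():
        nonlocal current
        if current:
            # Assumption: unused slots are padded with no-operation codes.
            current += ["NOP"] * (PARCELS_PER_WORD - len(current))
            words.append(current)
            current = []

    for name, bits, is_branch_target in instructions:
        if bits not in (15, 30):
            raise ValueError(f"unsupported instruction length: {bits}")
        parcels = bits // 15
        # Start a new word for branch targets and to avoid straddling.
        if is_branch_target or len(current) + parcels > PARCELS_PER_WORD:
            flush()
        current.append(name)
        if parcels == 2:
            current.append(name + "'")
        if len(current) == PARCELS_PER_WORD:
            flush()

    flush()
    return words


if __name__ == "__main__":
    program = [
        ("A", 15, False), ("B", 15, False), ("C", 15, False),
        ("D", 30, False),   # would straddle, so it starts a new word
        ("E", 15, True),    # branch target, so it starts a new word
        ("F", 30, False),
    ]
    for word in pack(program):
        print(word)
```

Four slots per word is also where the loop capacity quoted for the instruction word stack comes from: a 10-word loop holds at most 10 × 4 = 40 fifteen-bit instructions.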

CDC also manufactured two multi-processor computers based on the 7600, with the model number 7700. Each consisted of two 7600 machines in an asymmetric configuration: a central and an adjunct machine. They were used for missile launch and inbound tracking of USSR ICBMs. The radar simulator was a real-time simulator with a CDC 6400 as its input/output front end. These systems were to be used in the Pacific Missile Range. One computer was installed at TRW in Redondo Beach, California (later moved to Kwajalein Atoll, South Pacific), and the second was installed at McDonnell Douglas in Huntington Beach, California. They were actual 7600s connected by chassis 25 to make them a 7600 MP.

Reception and usage
From about 1969 to 1975, the CDC 7600 was generally regarded as the fastest computer in the world, except for specialized units. However, even with the advanced mechanicals and cooling, the 7600 was prone to failure. Both LLNL and NCAR reported that the machine would break down at least once a day, and often four or five times. Acceptance at installation sites took years while the bugs were worked out, and while the machine generally sold well enough given its "high end" niche, it is unlikely the machine generated any sort of real profits for CDC. The successor CDC 8600 was never completed, and Seymour Cray went on to form his own company, Cray Research.

One surviving 7600 is partially on display at the Computer History Museum. Its sheer size allows only two corner units to be shown. The rest are in storage. Another 7600 is on display at the Chippewa Falls Museum of Industry and Technology, along with its console and a tape controller.
