Bull Gamma 60

The Bull Gamma 60 was a large transistorized mainframe computer designed by Compagnie des Machines Bull. Initially announced in 1957, the first unit shipped in 1960. It holds the distinction of being the world's first multi-threaded computer, and the first to feature an architecture specially designed for parallelism.

The Gamma 60 spearheaded numerous groundbreaking technologies during the early 1960s, notably in multi-programming, utilizing tools that were still in their nascent stages. Upon its release, its architecture garnered significant attention among machine designers, becoming a subject of study alongside contemporary supercomputers and being cited as an example for progress in computer design.

Despite its innovations, the Gamma 60's large footprint (close to 4,000 sq ft), high cost, energy consumption, and complexity ultimately resulted in limited commercial success, with only about twenty units sold worldwide. Its main competitors included the IBM 7070, 7090, and 7030 "Stretch". The last Gamma 60 remained in service until 1974.

Design
The Gamma 60 marked Bull's entry into core memory, solid-state logic and magnetic tape capabilities. Its architectural core was based on a large, high-speed central memory, with an arbitrator (known as the Program Distributor) responsible for distributing data and instructions to the various units within the computer. The processor was segmented into a central unit and a series of discrete, specialized processing units. This design allowed for the concurrent operation of up to five clusters, each containing five processing units.

Each unit in the computer, whether a processing unit or a peripheral device, operated autonomously and requested data and instructions from the central unit as soon as it became available. Data transmission to and from the processing units occurred through two independent buses, one for transmission and another for retrieval.

Processor
The processor operated in a 24-bit parallel configuration, with its primary data types employing one, two, or four words, also referred to as 'catenae', ranging from 24 to 96 bits in width. Simpler and slower external devices often employed an 8-bit parallel logic internally. These devices communicated with the central unit via bit-serial messages for instruction and data transfer requests. All messages were asynchronous, and the machine, through priority classes, was designed to accommodate very high device latencies if necessary, even from an ALU (Arithmetic Logic Unit).

The processor was divided into four kinds of processing elements:


 * Logic Calculator (binary ALU): Capable of performing binary arithmetic and logical operations.
 * Arithmetic Calculator (BCD ALU): Designed for operations involving decimal numbers, including addition, subtraction, multiplication, and division.
 * Comparison Unit: Used for comparing strings, string-to-constant comparisons, data transfer between memory areas, and memory area erasure.
 * Translator: Responsible for translating between I/O device code and internal character code, as well as editing records for output.

Even though up to twenty-five of those specialized processing units could run simultaneously, the machine contained only one central unit (functioning as a dispatcher), so it does not qualify as an SMP (Symmetric Multiprocessing) architecture.

Because the processing units were distributed across separate cabinets, a clock speed of 100 kHz, half of what was originally envisioned, was chosen to mitigate the impact of propagation delays. Although this limited the performance of individual processing units, Bull's strategy was to compensate through parallelism and the ability to add processing units easily. While the Gamma 60 logic was transistorized, the operating voltage of the germanium transistors was too low to overcome the impedance of the long distances that could separate some cabinets and peripherals. As a result, vacuum tubes had to be used in certain instances to drive the clock and some I/O signals.

In modern terms, the Gamma 60 CPU would be described as a hardware time-sharing central processor for asynchronous parallel processes, using an explicit fork-join parallelism at the instruction level.

The Gamma 60 foreshadowed the architecture of superscalar processors, where the role of its central memory is now partly assumed by caches. Additionally, it shared similarities with EPIC architectures such as Intel Itanium, in that each instruction was its own thread, and the management of execution concurrency and memory access coherence was placed under the responsibility of the programmer.

Memory
Central memory was implemented with a stack of core memory with a basic memory cycle of 10 μs, making it the fastest component of the Gamma 60 (in comparison, a fixed-point addition required 100 μs). Main memory addresses were coded on 15 bits, allowing the central memory to store 32,768 words, or 96 kB.
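
The capacity and timing figures above can be checked with a short calculation. This is only an arithmetic sketch using the values quoted in the text; the constant names are illustrative, and "kB" is interpreted here as 1,024 bytes.

```python
ADDRESS_BITS = 15        # width of a main memory address (from the text)
WORD_BITS = 24           # width of one machine word (from the text)
MEMORY_CYCLE_US = 10     # basic memory cycle, in microseconds
ADD_TIME_US = 100        # fixed-point addition time, in microseconds

words = 2 ** ADDRESS_BITS                 # number of addressable words
total_bytes = words * WORD_BITS // 8      # capacity expressed in 8-bit bytes
kib = total_bytes / 1024                  # capacity in units of 1,024 bytes
cycles_per_add = ADD_TIME_US // MEMORY_CYCLE_US

print(words, total_bytes, kib)   # 32768 98304 96.0
print(cycles_per_add)            # 10
```

One fixed-point addition therefore took as long as ten memory cycles, which is consistent with the memory being described as the fastest component.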

The memory map was continuous and shared by all units, except for the first 128 bytes, which were unit-specific and contained a set of local registers. Since central memory had, by far, the lowest latency of all components, it constituted the heart of the Gamma 60. Consequently, the processing units had no working memory besides their registers. Instead, the Program Distributor dispatched data and instructions to them by loading pointers to the central memory into their registers.

The 128-byte memory of each processing unit was implemented using planar core memory and was laid out as follows:


 * A 15-bit Program Address Register (PAR)
 * A one-bit availability status (busy/available)
 * A 15-bit queue head pointer (B register)
 * A 15-bit queue tail pointer (C register)
 * One to four 15-bit Data Address Registers (DAR), depending on the unit
 * An Analysis Register (AR), which gave information to the Program Distributor about the status of the processing unit
 * An Actual Program Address Register (APAR). When a particular processing unit was selected by the Program Distributor, the PAR of that unit was loaded into the APAR. The APAR was used to fetch instruction words for that unit and was incremented after each fetch. When the unit was released by the Program Distributor, the current contents of the APAR were stored into the PAR for that unit.

One exception was the Program Distributor which had its own set of registers.
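
The register map and the PAR/APAR hand-off described above can be modelled roughly as follows. This is a minimal sketch, not a description of the actual hardware: the class and method names are hypothetical, the APAR is assumed here to be held by the Program Distributor, and memory is represented as a plain dictionary.

```python
from dataclasses import dataclass, field
from typing import List

ADDRESS_MASK = (1 << 15) - 1  # all address registers are 15 bits wide


@dataclass
class UnitRegisters:
    """Local registers of one processing unit (names are illustrative)."""
    par: int = 0                 # Program Address Register
    available: bool = True       # one-bit availability status
    queue_head: int = 0          # B register
    queue_tail: int = 0          # C register
    dars: List[int] = field(default_factory=lambda: [0])  # one to four DARs
    analysis: int = 0            # Analysis Register (AR)


class ProgramDistributor:
    """Only models the APAR hand-off; assumes the APAR lives here."""

    def __init__(self) -> None:
        self.apar = 0

    def select(self, unit: UnitRegisters) -> None:
        # On selection, the unit's PAR is copied into the APAR.
        self.apar = unit.par & ADDRESS_MASK

    def fetch(self, memory):
        # Fetch one instruction word, then increment the APAR.
        word = memory[self.apar]
        self.apar = (self.apar + 1) & ADDRESS_MASK
        return word

    def release(self, unit: UnitRegisters) -> None:
        # On release, the APAR is written back to the unit's PAR.
        unit.par = self.apar


memory = {100: "address word", 101: "directive word"}
unit = UnitRegisters(par=100)
distributor = ProgramDistributor()

distributor.select(unit)
first = distributor.fetch(memory)
second = distributor.fetch(memory)
distributor.release(unit)
print(first, "|", second, "|", unit.par)  # address word | directive word | 102
```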

Peripherals
The peripherals of the Gamma 60 were categorized into five distinct classes, numbered 0 to 4, based on their latency and performance characteristics.

Class 0
Class 0 included the fastest and most time-critical peripherals. The Program Distributor favored this class over any other to transfer instructions and data. It consisted of two devices: the Logical Unit (binary ALU) and the operator desk.

Class 1
Class 1 included the Arithmetic Unit (BCD ALU), the General Comparator and the Transcoder (translation unit).

Class 2
Class 2 included storage devices, notably:


 * Magnetic drums:

The magnetic drums acted as large memories with a slower access time than Central Memory (10 milliseconds versus 10 microseconds). Up to four drums could be added to the Gamma 60, each with a capacity of 128 tracks of 200 words (150 kB each). Their memory was not mapped into the main memory address space but was accessed in the same way as a magnetic tape.


 * Magnetic tapes:

Up to ten specialized I/O processors named Uniselectors could each control up to twelve tape units. However, no more than 48 tape units could be connected simultaneously.

Magnetic tapes had a recording density of 200 bits per inch and a length of 730 m or 1,100 m. The magnetic tape units used phase modulation rather than NRZ (non-return-to-zero), a technology that had previously been developed for the drum memory of the Gamma 3.

Class 3
Class 3 included slower recording equipment such as:


 * Card punches and card readers (300 cards per minute), up to four of each kind.
 * Paper-tape readers (200 characters per second)
 * High-performance printers (5 lines per second), up to eight of them. The Gamma 60 printers had their own core memory used as a buffer.

Class 4
Finally, class 4 included the slowest peripherals, such as teletypes and typewriters.

Execution cycle
The sequencing mechanism employed by the Gamma 60 encompassed an orchestration of instruction fetching and distribution, execution control, memory access, and inter-process communication. The following sequence details a complete instruction execution cycle:


 * 1) The Program Distributor scanned for the highest-priority processing unit requesting an instruction (ITR signal).
 * 2) The Program Distributor sent a signal to the highest-priority processing unit, which then returned its status word, subsequently placed in the Analysis Register (AR).
 * 3) a) If the unit was busy but not currently executing a directive, the Program Distributor fetched and executed the instruction addressed by that unit's Program Address Register (PAR). When a new directive was encountered, it was sent to the processing unit, and the Program Distributor scanned for other work. b) If the highest-priority processing unit was available, the Program Distributor checked the unit's queue for waiting processes. If one was found, the PAR was loaded with the address of the processing unit's queue head pointer plus one, the queue head pointer was updated, and instruction fetch and execution began for that unit until a directive (or cut) word was encountered. As before, upon sending the directive, the Program Distributor reset the instruction request signal and scanned for other work.
 * 4) When the processing unit was ready to receive or transmit data to/from memory, it signaled the memory arbiter.
 * 5) The memory arbiter honored the memory request, using the address from the appropriate DAR (Data Address Register).
 * 6) The processing unit continued data transfers until the instruction was complete. It then sent a new instruction request signal to the program distributor.
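
Steps 1 to 3 of the cycle above can be sketched as a small simulation. This is a simplified illustration under several assumptions: the names `Unit`, `run_until_directive` and `distributor_step` are hypothetical, the per-unit queue (B and C pointers) is replaced by a Python deque of program addresses, memory is a list of `(word type, payload)` tuples, and the memory arbiter of steps 4 to 6 is not modelled.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Unit:
    name: str
    priority_class: int            # class 0 is served first
    requesting: bool = False       # instruction request (ITR) raised
    busy: bool = False             # unit already owns a process
    par: int = 0                   # Program Address Register
    queue: deque = field(default_factory=deque)  # waiting program addresses


def run_until_directive(memory, address):
    """Step through words from `address` until a directive ("D") is found."""
    while memory[address][0] != "D":
        address += 1
    return memory[address], address + 1


def distributor_step(units, memory):
    """One scan of the distributor; returns (unit name, directive) or None."""
    requesting = [u for u in units if u.requesting]
    if not requesting:
        return None
    unit = min(requesting, key=lambda u: u.priority_class)   # step 1
    if not unit.busy:                                        # step 3 b)
        if not unit.queue:
            unit.requesting = False
            return None
        unit.par = unit.queue.popleft()
        unit.busy = True
    directive, unit.par = run_until_directive(memory, unit.par)  # step 3 a)
    unit.requesting = False        # request is reset once the directive is sent
    return unit.name, directive


memory = [("A", "b"), ("A", "c"), ("D", "mult"), ("A", "a"), ("D", "add")]
logic = Unit("logic", priority_class=0, requesting=True, queue=deque([3]))
arithmetic = Unit("arithmetic", priority_class=1, requesting=True, queue=deque([0]))
units = [arithmetic, logic]

print(distributor_step(units, memory))  # ('logic', ('D', 'add'))
print(distributor_step(units, memory))  # ('arithmetic', ('D', 'mult'))
print(distributor_step(units, memory))  # None
```

The class 0 unit is served first even though it appears second in the list, which is the only scheduling property this sketch is meant to show.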

Programming
Programming the Gamma 60 was tedious, as adequate development tools for parallel programming did not exist at the time. High-level languages only appeared two years after the Gamma 60's introduction.

Two kinds of machine languages existed:


 * Code A was actual machine code, written in a very similar way to the Gamma 3 with an Operation Type and arguments. Code A used a fixed structure conducive to card punching and easy visual inspection. It was meant to be fed directly to the Central Memory for execution.
 * Code B was a more modern assembler (then called autocode) supporting mnemonics, decimal addresses (useful in BCD mode), and symbolic names. Code B was meant to be fed to an assembler program which wrote the resulting machine code to a magnetic tape. Bull developed a library of about 300 Code B subroutines for the Gamma 60, most of them dealing with arithmetic computations and I/O.

Recognizing the need for a high-level language, Bull began the development of a new language for the Gamma 60 called AP3. However, this initiative was eventually abandoned in favor of ALGOL, in which Bull invested significantly starting in the late 1950s. The ALGOL 60 compiler was eventually released in December 1962.

While ALGOL solved many complexity issues with programming the Gamma 60, it was ill-suited to business applications. Additionally, COBOL did not exist when the Gamma 60 was designed, and was still in its early stages during the computer's lifespan. Furthermore, the Gamma 60 used a fork-join model which COBOL had not been designed to accommodate. As a result, developers had no alternative but to use Code B for programming the Gamma 60 in business applications.

This absence of advanced high-level tools for breaking down tasks into small concurrent threads meant that most programs only utilized a fraction of the hardware's capabilities.

Instructions
A complete Gamma 60 instruction was composed of a series of 24-bit words of five types:


 * A - Address: used a 15-bit address to identify an operand. An address word could load a data address register (DAR), modify its value through indexing or indirection, or store it in memory
 * B - Branch: the branch address was indicated by a 15-bit address field
 * C - Cut: Enabled multiprocessing (as seen below)
 * D - Directive: Gave a processing order
 * E (or 0) - Blank: had no processing effect and was used to provide an address field for queuing

A complete instruction always had to end with a directive.
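
The rule that every instruction ends with a directive can be illustrated by grouping a stream of words into instructions. This is only a sketch: `WordType` and `split_instructions` are hypothetical names, the words carry no operand fields, and treating a cut word as part of the instruction it precedes is an assumption made for illustration.

```python
from enum import Enum


class WordType(Enum):
    ADDRESS = "A"
    BRANCH = "B"
    CUT = "C"
    DIRECTIVE = "D"
    BLANK = "E"


def split_instructions(words):
    """Group words into instructions, each closed by a directive word."""
    instructions, current = [], []
    for word in words:
        current.append(word)
        if word is WordType.DIRECTIVE:
            instructions.append(current)
            current = []
    if current:
        # Leftover words were never closed by a directive.
        raise ValueError("trailing words without a closing directive")
    return instructions


stream = [WordType(letter) for letter in "CAAADCAAAD"]
for instruction in split_instructions(stream):
    print("".join(word.value for word in instruction))
# CAAAD
# CAAAD
```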

The instruction formats could vary depending on the processing unit, and supported direct, relative and indirect memory addressing. The Gamma 60 primarily performed memory operations through load-store operations in the main memory, or used processing instructions to manipulate registers in the processing units. Multiprocessing was achieved with two instructions: cut ("coupure") activated a processing unit by specifying a program address, while simu ("simultané") enabled an asynchronous branch for another unit.

As an example, consider the following expressions:

$$a = b * c$$
$$d = a + d$$

This could be implemented using the following Gamma 60 pseudo-code:

    [C] cut to multiplication unit:
        [A] address (load)  b
        [A] address (load)  c
        [A] address (store) a
        [D] directive (mult)
    [C] cut to arithmetic unit:
        [A] address (load)  a
        [A] address (load)  d
        [A] address (store) d
        [D] directive (add)

The distribution of instructions across the processing units, as well as the handling of synchronization, was done automatically by the Program Distributor.
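
For readers more familiar with modern fork-join code, the same two expressions can be written as a loose analogy in Python. This is not Gamma 60 code: each "unit" is assumed to be a single-worker thread pool, a cut is approximated by `submit`, and the explicit `result()` calls stand in for the synchronization that the Program Distributor performed automatically. All names and input values are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

# One single-worker pool per "processing unit" (an assumption of this sketch).
units = {
    "multiplication": ThreadPoolExecutor(max_workers=1),
    "arithmetic": ThreadPoolExecutor(max_workers=1),
}

# Shared "central memory" holding the operands (example values).
memory = {"a": 0, "b": 6, "c": 7, "d": 8}


def mult():
    memory["a"] = memory["b"] * memory["c"]   # a = b * c


def add():
    memory["d"] = memory["a"] + memory["d"]   # d = a + d


# "Cut" to the multiplication unit, then wait for it to finish,
# because the addition depends on the value of a.
units["multiplication"].submit(mult).result()

# "Cut" to the arithmetic unit and join again.
units["arithmetic"].submit(add).result()

for pool in units.values():
    pool.shutdown()

print(memory["a"], memory["d"])  # 42 50
```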

An early operating system, GGZ (Gestion Générale Zéro, or General Management version Zero), was developed for the Gamma 60. It was delivered as a compact resident supervisor stored on magnetic tape, containing a bootloader, resource table, error handler and an operator command interpreter. The bootloader was capable of initializing several variables during boot or accepting them from the operator.

A more advanced operating system, GGU (Gestion Générale des Unités), was later shipped but remained incomplete as the computer neared the end of its lifespan. GGU introduced features like memory management, enhanced error recovery, and automated job management and scheduling. It was notably used at RTT in Belgium.

Unfortunately, the software tools available in the late fifties and early sixties, such as compilers, operating systems, and debuggers, were still too primitive to fully utilize the capabilities of the Gamma 60. Compilers allowing automatic exploitation of concurrency for an EPIC architecture only emerged in the late 1990s, after a development period that exceeded the entire lifespan of the Gamma 60.

Due to the significant challenges in program development with the crude tools of the time, Bull lost several clients to IBM, whose machines, while single-processor, were simpler and more cost-effective.

Clients
About twenty large companies and organizations showed interest in the Gamma 60 in the late fifties and early sixties, although not all of them made a purchase. The unit shipped to AG Vie was the first one manufactured, but encountered reliability issues as some hardware problems were still being discovered and resolved. As the company's patience ran out, the computer was decommissioned prematurely. Gaz de France (now Engie) also expressed interest but later withdrew due to implementation and operational complexities. The contract was eventually awarded to the IBM 7070. Other clients are listed in the table below:

The Régie des Télégraphes et Téléphones (RTT), later known as Belgacom and now Proximus, had one of the largest Gamma 60 installations, featuring an average of twenty Processing Units. The installation spanned two floors: the lower floor housed a 400 kVA alternator that supplied stabilized current to the computer, along with robust air conditioning systems to maintain the heat-sensitive germanium electronics at a constant 18 °C. The upper floor was partitioned into three rooms: one for the mainframe, another for the operator console and tape readers, and the last for card punches and readers.

The Gamma 60 that remained in service the longest was also at RTT, where it operated continuously for 13 years. The decision to replace RTT's two Gamma 60 systems in the mid-1970s with several Siemens 4004 computers was politically motivated, as the Belgian government had established a five-year contract linking its information systems to Siemens and Philips.

Legacy
Bull committed significant resources to the development of the Gamma 60, even though the practical utilization of its architectural innovations would not become feasible until the 1980s or even the 1990s, greatly impacting its commercial success. The challenges it faced in the market led to financial difficulties that hindered the creation of a more accessible, scaled-down version suitable for smaller companies. Instead, the mid-range Gamma 30 transitioned to a licensed version of the simpler and more traditional RCA 301 computer, which, while successful in competing with the IBM 1401, could not capitalize on the innovations of the Gamma 60.

In 1963, Bull was eventually purchased by General Electric, and the primary successor to the Gamma 60 became the GE 600 family starting in 1965.

No complete Gamma 60 survives today, although some of its components are displayed in certain museums. The NAM Computer Museum in Namur, Belgium, showcases a detailed scale model of the Gamma 60 as it appeared at RTT. The Gamma 60 was also featured in the Jean-Luc Godard film Alphaville (1965), portraying the antagonist Alpha 60 computer.

No Gamma 60 emulator is known to exist.