BELLMAC-8

The MAC-8, better known today as the BELLMAC-8, is an 8-bit microprocessor designed by Bell Labs. Production began in CMOS form at Western Electric as the WE212 in 1977. The MAC-8 was used only in AT&T products, like the 4ESS. No commercial spec sheets were published, so it remains little known. Its best-known public use is the MAC-TUTOR computer trainer, released in 1979.

The MAC-8 was designed to run high-level programming languages, in particular Bell's own C programming language. An uncommon feature of the system is that its assembly language was deliberately written to resemble C code, including support for variables and high-level constructs like for loops. In contrast, most assemblers of the era mapped much more directly onto the low-level opcodes of the processor and lacked higher-level features.

The MAC-8 was followed by the BELLMAC-80, a 32-bit system that was very different from the MAC-8 internally but kept the concept of being designed to run C. This was followed by the experimental CRISP design, and finally by the 1992 AT&T Hobbit, which saw limited commercial use.

Design concepts
The MAC-8 is the first design to be produced by AT&T's ongoing C Machine Project, which started in 1975. The project aimed to produce processor designs that could directly run high-level languages, specifically Bell's own C programming language.

To offer reasonable performance in programs that are dominated by many function calls, it is important to have a large number of processor registers. These are small amounts of high-speed memory that can be accessed with no time penalty, unlike main memory which normally involves some delay. Using registers allows data to be passed in and out of functions very quickly. Registers are very expensive in terms of the number of transistors they need and their connections to the rest of the chip. Given Bell's goals and available process technology, the number of on-chip registers would be too few for their needs.

Instead, the C Machine concept places its registers in main memory. This was not uncommon in the minicomputer field in the late 1960s, where the core memory was relatively fast compared to the central processing unit (CPU), meaning there was only a one-cycle delay accessing the data. The value of having many registers outweighed the downside of the slower access. Microprocessors that were designed to work like older minicomputers, such as the Texas Instruments TMS9900, often used this concept. The MAC-8 followed this pattern as well, using blocks of sixteen memory locations to represent the registers, and selecting the start of the block in memory with the Register Pointer, or RP. This meant a compiler could pass data into a function by writing the values to memory, moving RP to point to them, and then jumping to the function. When the function exited, the compiler changed RP to its earlier value to return the machine to its previous state. This concept is known as a register window. In designs with a fixed number of hardware registers, like the MOS 6502, these sorts of function calls would normally require the data in the registers to be written to main memory or a call stack, both of which require multiple cycles accessing memory.
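
The idea of a register window held in memory can be illustrated with a small model. The following Python sketch is not MAC-8 code and makes several assumptions that the description above does not confirm: each register is taken to occupy a 16-bit slot stored low byte first, and the class name, method names, and window addresses are purely hypothetical.

```python
MEM_SIZE = 0x10000   # 16-bit address space
REG_COUNT = 16       # registers per window
REG_BYTES = 2        # assumption: each register occupies a 16-bit slot


class RegisterWindowMachine:
    """Toy model of registers that live in main memory (hypothetical)."""

    def __init__(self, rp=0x0100):
        self.mem = bytearray(MEM_SIZE)
        self.rp = rp  # Register Pointer: start of the current window

    def store16(self, addr, value):
        # Assumption: low byte first; the real byte order is not stated.
        self.mem[addr & 0xFFFF] = value & 0xFF
        self.mem[(addr + 1) & 0xFFFF] = (value >> 8) & 0xFF

    def load16(self, addr):
        return self.mem[addr & 0xFFFF] | (self.mem[(addr + 1) & 0xFFFF] << 8)

    def reg_addr(self, n):
        if not 0 <= n < REG_COUNT:
            raise ValueError("register number must be 0..15")
        return self.rp + n * REG_BYTES

    def write_reg(self, n, value):
        self.store16(self.reg_addr(n), value)

    def read_reg(self, n):
        return self.load16(self.reg_addr(n))

    def call(self, func, new_window, *args):
        """Write args to memory, point RP at them, run func, restore RP."""
        for i, arg in enumerate(args):
            self.store16(new_window + i * REG_BYTES, arg)
        saved_rp = self.rp
        self.rp = new_window
        try:
            return func(self)
        finally:
            self.rp = saved_rp  # caller's registers reappear untouched


def add_first_two(m):
    # Inside the callee, registers 0 and 1 hold the arguments.
    return (m.read_reg(0) + m.read_reg(1)) & 0xFFFF


machine = RegisterWindowMachine()
machine.write_reg(0, 0x1234)                        # caller's register 0
result = machine.call(add_first_two, 0x0200, 3, 4)  # callee window at 0x0200
```

In this model the caller's register 0 still reads 0x1234 after the call, because the callee worked in a different block of memory; nothing had to be saved or restored apart from RP itself.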

The system also included four public hardware registers. These were the 16-bit Program Counter (PC), Stack Pointer (SP), Register Pointer (RP) and Condition Register (CR), the last more commonly known as a status register on other platforms. It also included two internal 8-bit registers used only during the processing of the current instruction: the instruction register (IR), which held the last-read instruction opcode, and the D/S register, which stored the destination and source register numbers, 0 to 15, in two nibbles. Additionally, one 16-bit address value and its 8-bit data were latched during processing.

The system included a separate, simplified math unit dedicated to address translations, the address arithmetic unit (AAU). This could read or write the address on the bus to or from the internal registers, or perform offsets and indexing, independent of the main arithmetic logic unit (ALU). One of the reasons for offloading the registers to memory was to free up room for the AAU. This allowed the design to offer a wide variety of addressing modes without having to use the ALU and thus suffering cycle delays for the more complex modes. Because addresses were 16-bit, and often used registers for storage, addressing instructions always read and wrote the registers in pairs. In these cases, they were referred to as "base" registers, or "b registers", while registers used by instructions needing only a single 8-bit value were known as "a registers". Since the registers were always 16-bit in memory, a registers only used the low byte of the pair. Somewhat confusingly, when referred to in the documentation, a registers were denoted R, while the b registers were B.
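
The relationship between the two views of a register can be sketched in a few lines of Python. This is an illustrative model only: the function names are hypothetical, the register file is simplified to a list of sixteen 16-bit slots, and the behaviour of leaving the high byte untouched on an 8-bit write is an assumption that the description above does not confirm.

```python
def read_b(slots, n):
    """b-register view: the full 16-bit slot, typically an address."""
    return slots[n] & 0xFFFF


def read_a(slots, n):
    """a-register view: only the low byte of the same slot."""
    return slots[n] & 0xFF


def write_a(slots, n, value):
    # Assumption: an 8-bit write leaves the high byte of the slot unchanged.
    slots[n] = (slots[n] & 0xFF00) | (value & 0xFF)


def effective_address(slots, base_reg, offset=0):
    """Address arithmetic of the kind handled by the AAU: base plus offset,
    wrapped to the 16-bit address space."""
    return (read_b(slots, base_reg) + offset) & 0xFFFF


slots = [0] * 16
slots[3] = 0xABCD
low_byte = read_a(slots, 3)                  # low byte of the slot
indexed = effective_address(slots, 3, 0x10)  # base address plus an offset
write_a(slots, 3, 0x12)                      # slot now holds 0xAB12 in this model
```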

Instruction set
The MAC-8 instruction set architecture (ISA) was split into three broad groups, Arithmetic and Logical, Control Transfer (branching), and Special.

Arithmetic and Logical instructions took one or two operands, each of which pointed to a register, a memory location, or held an immediate value (constant). The instruction was held in a single byte, with the upper five bits containing the opcode and the lower three the addressing mode, indicating where the operands (if any) were held. For instance, the addition instruction could indicate that it wanted to add the value in Rs (source) to Rd (destination) by setting the mode to 0 (register-to-register). If the values were in R1 and R2, then the D/S byte would be set to 00010010. Alternatively, the same addition could use mode 6, which would add the value in a source b register to the value in a memory location offset by the value in another register (indirect addressing). In this case, one nibble of the D/S register selects the 16-bit source b register, Bs, and the other selects the destination a register, Rd.
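
The bit layout described above can be made concrete with a short Python sketch. The field widths come from the text, but the opcode number used for addition is a hypothetical placeholder, and placing the destination in the upper nibble is an assumption suggested only by the name "D/S"; the text gives just the resulting bit pattern for R1 and R2.

```python
HYPOTHETICAL_ADD_OPCODE = 0b00001  # placeholder, not a documented MAC-8 opcode
MODE_REGISTER_TO_REGISTER = 0      # mode 0, as described in the text


def encode_instruction(opcode, mode):
    """Pack a 5-bit opcode and a 3-bit addressing mode into one byte."""
    if not 0 <= opcode < 32:
        raise ValueError("opcode must fit in 5 bits")
    if not 0 <= mode < 8:
        raise ValueError("mode must fit in 3 bits")
    return (opcode << 3) | mode


def decode_instruction(byte):
    """Split an instruction byte into (opcode, mode)."""
    return (byte >> 3) & 0x1F, byte & 0x07


def encode_ds(upper, lower):
    """Pack two register numbers (0..15) into the D/S byte.
    Assumption: the destination goes in the upper nibble."""
    if not (0 <= upper < 16 and 0 <= lower < 16):
        raise ValueError("register numbers must be 0..15")
    return (upper << 4) | lower


def decode_ds(byte):
    """Split a D/S byte into its (upper, lower) nibbles."""
    return (byte >> 4) & 0x0F, byte & 0x0F


instruction = encode_instruction(HYPOTHETICAL_ADD_OPCODE, MODE_REGISTER_TO_REGISTER)
ds = encode_ds(1, 2)
print(f"{instruction:08b} {ds:08b}")  # prints "00001000 00010010"
```

The second byte matches the 00010010 pattern given in the text for registers 1 and 2.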

The main set of eight modes included register/register, register to or from base address in memory (indirect), register to or from base address plus an offset (indexed) and auto-incrementing, which added one to the value of the selected B register and then accessed the data at that location. This last mode was useful for implementing loops over memory, by placing the base address in a b register and then repeatedly calling the opcode with the same register number, causing it to increment without an explicit instruction. If either nibble was set to the value 15, the meaning of the eight modes changed. In this case, the source was no longer one of the 16 registers, but either the program counter or stack pointer depending on the mode, while the destination mostly referred to R15 but addresses were taken from the SP. These latter modes were mostly used with conditional branches, allowing the instruction to, for instance, jump forward n locations from the current PC based on the value in the source. Separate instructions, PUSH and POP and BUMP and DEBUMP, increment and decrement the stack pointer or register pointer, respectively.
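
The auto-incrementing mode can be illustrated with a small Python model of a loop over memory. This is a sketch under stated assumptions rather than MAC-8 code: the register file is simplified to a Python list, the helper name and the addresses are hypothetical, and the increment is applied before the access, following the description above, so the base register starts one location below the first element.

```python
def load_autoincrement(mem, regs, b):
    """Add one to b register `b`, then read the byte it now points to."""
    regs[b] = (regs[b] + 1) & 0xFFFF
    return mem[regs[b]]


mem = bytearray(0x10000)
base = 0x2000                      # hypothetical location of the data
data = [5, 10, 20, 40]
mem[base:base + len(data)] = bytes(data)

regs = [0] * 16
regs[0] = base - 1                 # increment-then-access: start one below

total = 0
for _ in range(len(data)):
    # The same register number is used on every pass; the address advances
    # as a side effect, with no explicit increment instruction.
    total = (total + load_autoincrement(mem, regs, 0)) & 0xFF

print(total)  # prints 75
```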

Control Transfer instructions are similar to the Arithmetic and Logical instructions but lack the optional opcodes, and only need a single register value to perform things like offsets, as the base address is normally the program counter. Unconditional jumps, subroutine calls and returns require only a single destination or none at all. In the case of conditional branches, only a single register needs to be used, as nothing is being written back, so the lower nibble of the D/S byte is instead used to indicate which of 16 different conditions should be tested, like whether the value in the indicated location is zero. The conditions include the typical negative, zero, carry and overflow conditions, but also whether or not the value is all ones (255), whether it is odd or even, and whether interrupts are enabled, among others.
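
A conditional branch of this kind can be modelled as a lookup of a test followed by an optional PC-relative jump. The Python sketch below is illustrative only: the condition names, the use of strings rather than 4-bit condition numbers, and the treatment of bit 7 as the sign bit are all assumptions, since the actual encodings are not given here.

```python
# Each test receives the 8-bit value being examined and a dict of flags
# standing in for the condition register (hypothetical representation).
CONDITIONS = {
    "zero":     lambda value, flags: value == 0,
    "all_ones": lambda value, flags: value == 0xFF,
    "odd":      lambda value, flags: (value & 1) == 1,
    "even":     lambda value, flags: (value & 1) == 0,
    "negative": lambda value, flags: bool(value & 0x80),  # assumption: bit 7 is the sign
    "carry":    lambda value, flags: bool(flags.get("carry", False)),
    "overflow": lambda value, flags: bool(flags.get("overflow", False)),
}


def next_pc(pc, offset, condition, value, flags=None):
    """Return pc + offset if the condition holds, otherwise pc unchanged.
    Here pc stands for the address of the following instruction."""
    flags = flags or {}
    taken = CONDITIONS[condition](value & 0xFF, flags)
    return (pc + offset) & 0xFFFF if taken else pc


taken_pc = next_pc(0x1000, 5, "zero", 0)      # branch taken
not_taken_pc = next_pc(0x1000, 5, "zero", 7)  # falls through
```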

Programming
The MAC-8 was designed to be programmed in C, and Bell offered cross-compiling support and the M8SIM simulator running on Unix on the PDP-11. The PLAID system, also running on the PDP-11, provided debugger support for MAC-8 systems over a cable connection. For those applications that did require direct assembly language programming, the system used a very different sort of language that was deliberately written to look like C. An example from the introduction to the system:

This code sets aside 100 bytes of memory and assigns it the name array. The sum routine then finds the address of array in memory, clears out a register a1 to hold the resulting sum, and then loops over the array summing the entries into a1 while incrementing the address in b0. When compiled, the "variables" b0, a1 and a2 will be placed into registers and the various operations translated into the ISA opcodes. For instance, the assignment of zero to a1 will be turned into a MOVE instruction with the addressing set so the destination is an R register and the source is the constant value zero. The for will be implemented as a macro using other registers as the source for the index variable a2. The language includes structs, variable definitions, functions and most of the other features of C.

As the system was new, and the entire concept of microprocessors was new to AT&T, the company also introduced the MAC-TUTOR single-board computer that could be used for testing and development. The Unix tools could be used to build a program, download it to the MAC-TUTOR, run and debug it, and send status back to the Unix side. The basic MAC-TUTOR included 2 kB of RAM, 2 kB of ROM with basic hardware control functions, three sockets for 1 kB PROM chips, a 28-button calculator-type keyboard (4 by 7), and a display consisting of eight 7-segment LEDs. Onboard interfaces included cassette tape, two RS-232 ports for computer terminals, and a 32-pin bus expander that could be used to add more memory or memory-mapped devices.

Implementation
The WE212 used over 7,000 transistors and was implemented on a 5 micron CMOS process, resulting in a die that measured 220x230 mils. It normally ran at 12 V and 2 MHz, resulting in a 200 milliwatt draw, significantly lower than contemporary processors like the 6502 or Z80. A second supply at 5 V was also needed to power the transistor-transistor logic (TTL) portions that interfaced with the rest of the computer hardware.
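
For readers more used to metric units, the die dimensions can be converted with a few lines of Python. The only inputs are the 220 by 230 mil figures above and the definition of a mil as one thousandth of an inch (0.0254 mm); the function name is a hypothetical helper.

```python
MM_PER_MIL = 0.0254  # one mil is a thousandth of an inch


def mils_to_mm(mils):
    """Convert a length in mils to millimetres."""
    return mils * MM_PER_MIL


width_mm = mils_to_mm(220)
height_mm = mils_to_mm(230)
area_mm2 = width_mm * height_mm

print(f"{width_mm:.2f} mm x {height_mm:.2f} mm = {area_mm2:.1f} mm^2")
# prints "5.59 mm x 5.84 mm = 32.6 mm^2"
```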

CMOS was chosen both for its lower power use compared to existing NMOS logic designs and for the ability to have both NMOS and PMOS transistors on the same chip, which the designers felt offered greater flexibility. The ALU, however, was considered too complex to implement in CMOS, which would have required twice the area of the NMOS implementation that was used instead. As this resulted in increased power dissipation, the system removed power to the ALU during the periods it was not being used, during memory access for instance. As the ALU was only active about 20% of the time, this represented a significant power saving.

It was packaged in a 40-pin DIP with a 16-pin address bus and 8-pin data bus, meaning none of the main pins were multiplexed and data could be read in a single cycle. Input/output was memory mapped and did not use separate pins, as it did on the Intel 8080. Two pins provided direct memory access (DMA); a device desiring DMA would pull DMAREQ low, and when the processor was ready to release the bus it would indicate this by pulling DMAACK low. The device could then access memory for as long as it needed, and indicated it was finished by releasing DMAREQ. Another three pins, S1 through S3, indicated the internal state of the CPU and any error conditions. The rest of the pins were a typical mix of power, interrupt control, and clock pins.
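
The DMA handshake can be followed step by step in a small Python model. This is a sketch of the sequence described above, not a timing-accurate simulation: the class and method names are hypothetical, and the processor raising the acknowledge line again once the request is released is an assumption that the description does not spell out.

```python
LOW, HIGH = False, True  # both DMA lines are asserted by pulling them low


class DmaHandshake:
    """Toy model of the request/acknowledge pair (hypothetical names)."""

    def __init__(self):
        self.dmareq = HIGH  # driven by the external device
        self.dmaack = HIGH  # driven by the processor

    def device_request(self):
        self.dmareq = LOW   # device asks for the bus

    def device_release(self):
        self.dmareq = HIGH  # device signals that it is finished

    def processor_tick(self, ready_to_release=True):
        if self.dmareq == LOW and ready_to_release:
            self.dmaack = LOW   # processor gives up the bus
        elif self.dmareq == HIGH:
            self.dmaack = HIGH  # assumption: acknowledge drops with the request

    def device_owns_bus(self):
        return self.dmareq == LOW and self.dmaack == LOW


bus = DmaHandshake()
trace = []

bus.device_request()
bus.processor_tick(ready_to_release=False)  # processor is still busy
trace.append(bus.device_owns_bus())

bus.processor_tick()                        # processor now releases the bus
trace.append(bus.device_owns_bus())

bus.device_release()                        # device is done
bus.processor_tick()
trace.append(bus.device_owns_bus())

print(trace)  # prints [False, True, False]
```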

At least three versions of the WE212 are known to exist, A through C. The differences between these, aside from the packaging, are not described in any available references.

Uses and influence
In addition to the MAC-TUTOR system, the WE212 is mentioned in passing in a number of AT&T products, including, among others, the 4ESS switch and SLC-96 subscriber loop carrier.

Although the BELLMAC-8 saw relatively little use, the basic concept of designing a processor specifically to run C and similar languages was continually explored by Bell over the next decade. The following BELLMAC-80 was essentially a 32-bit implementation of the C Machine. An attempted high-performance design in ECL was abandoned and a simpler implementation was produced as CRISP in 1986, achieving approximately 7.7 VAX MIPS.

AT&T decided to change the target for the C Machine efforts, reorienting towards low-power applications for mobile computing. This led to the AT&T Hobbit design, with the first version, the AT&T 92010, released in 1992. A lack of commercial success led AT&T to withdraw the Hobbit from the market in 1993, and with it the C Machine developments ended.