Intel iAPX 432

The iAPX 432 (Intel Advanced Performance Architecture) is a discontinued computer architecture introduced in 1981. It was Intel's first 32-bit processor design. The main processor of the architecture, the general data processor, is implemented as a set of two separate integrated circuits, due to technical limitations at the time. Although some early 8086, 80186 and 80286-based systems and manuals also used the iAPX prefix for marketing reasons, the iAPX 432 and the 8086 processor lines are completely separate designs with completely different instruction sets.

The project started in 1975 as the 8800 (after the 8008 and the 8080) and was intended to be Intel's major design for the 1980s. Unlike the 8086, which was designed the following year as a successor to the 8080, the iAPX 432 was a radical departure from Intel's previous designs meant for a different market niche, and completely unrelated to the 8080 or x86 product lines.

The iAPX 432 project is considered a commercial failure for Intel, and was discontinued in 1986.

Description
The iAPX 432 was referred to as a "micromainframe", designed to be programmed entirely in high-level languages. The instruction set architecture was also entirely new and a significant departure from Intel's previous 8008 and 8080 processors, as the iAPX 432 programming model is a stack machine with no visible general-purpose registers. It supports object-oriented programming, garbage collection and multitasking, as well as more conventional memory management, directly in hardware and microcode. Direct support for various data structures is also intended to allow modern operating systems to be implemented using far less program code than for ordinary processors. Intel iMAX 432 is a discontinued operating system for the 432, written entirely in Ada, and Ada was also the intended primary language for application programming. In some respects, it may be seen as a high-level language computer architecture.

These properties and features resulted in a hardware and microcode design that was more complex than most processors of the era, especially microprocessors. However, internal and external buses are (mostly) not wider than 16 bits, and, just like in other 32-bit microprocessors of the era (such as the 68000 or the 32016), 32-bit arithmetical instructions are implemented by a 16-bit ALU, via random logic and microcode or other kinds of sequential logic. The iAPX 432's enlarged address space over the 8080 was also limited by the fact that linear addressing of data could still only use 16-bit offsets, somewhat akin to Intel's first 8086-based designs, including the contemporary 80286 (the new 32-bit segment offsets of the 80386 architecture were described publicly in detail in 1984).
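To illustrate how a 16-bit ALU can carry out a 32-bit operation, the following minimal Python sketch performs a 32-bit addition as two 16-bit passes linked by a carry. The names add16 and add32 are illustrative assumptions; this is not a description of the actual 432 microcode sequence.

```python
MASK16 = 0xFFFF

def add16(a, b, carry_in=0):
    """One pass through a hypothetical 16-bit ALU: returns (sum, carry_out)."""
    total = (a & MASK16) + (b & MASK16) + carry_in
    return total & MASK16, total >> 16

def add32(x, y):
    """32-bit add built from two 16-bit passes, low half first."""
    lo, carry = add16(x, y)                      # low 16 bits
    hi, carry = add16(x >> 16, y >> 16, carry)   # high 16 bits plus carry from the low half
    return (hi << 16) | lo, carry

result, carry_out = add32(0x0001FFFF, 0x00000001)
print(hex(result), carry_out)
```

Each 32-bit operation therefore costs at least two ALU passes, which is one reason a 16-bit datapath limits 32-bit throughput.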

Using the semiconductor technology of its day, Intel's engineers were not able to translate the design into a very efficient first implementation. Along with the lack of optimization in a premature Ada compiler, this contributed to slow yet expensive computer systems, performing typical benchmarks at roughly 1/4 the speed of the new 80286 chip at the same clock frequency (in early 1982). This initial performance gap to the rather low-profile and low-priced 8086 line was probably the main reason why Intel's plan to replace the latter (later known as x86) with the iAPX 432 failed. Although engineers saw ways to improve a next-generation design, the iAPX 432 capability architecture had by then started to be regarded more as an implementation overhead than as the simplifying support it was intended to be.

Originally designed for clock frequencies of up to 10 MHz, actual devices sold were specified for maximum clock speeds of 4 MHz, 5 MHz, 7 MHz and 8 MHz with a peak performance of 2 million instructions per second at 8 MHz.

Development
Intel's 432 project started in 1975, a year after the 8-bit Intel 8080 was completed and a year before their 16-bit 8086 project began. The 432 project was initially named the 8800, as their next step beyond the existing Intel 8008 and 8080 microprocessors. This became a very big step. The instruction sets of these 8-bit processors were not very well fitted for typical Algol-like compiled languages. However, the major problem was their small native addressing ranges, just 16 KB for 8008 and 64 KB for 8080, far too small for many complex software systems without using some kind of bank switching, memory segmentation, or similar mechanism (which was built into the 8086, a few years later on). Intel now aimed to build a sophisticated complete system in a few LSI chips that was functionally equal to or better than the best 32-bit minicomputers and mainframes requiring entire cabinets of older chips. This system would support multiprocessors, modular expansion, fault tolerance, advanced operating systems, advanced programming languages, very large applications, ultra reliability, and ultra security. Its architecture would address the needs of Intel's customers for a decade.

The iAPX 432 development team was managed by Bill Lattin, with Justin Rattner as the lead engineer (although one source states that Fred Pollack was the lead engineer). (Rattner would later become CTO of Intel.) Initially the team worked from Santa Clara, but in March 1977 Lattin and his team of 17 engineers moved to Intel's new site in Portland. Pollack later specialized in superscalarity and became the lead architect of the i686 chip Intel Pentium Pro.

It soon became clear that it would take several years and many engineers to design all this. And it would similarly take several years of further progress in Moore's Law, before improved chip manufacturing could fit all this into a few dense chips. Meanwhile, Intel urgently needed a simpler interim product to meet the immediate competition from Motorola, Zilog, and National Semiconductor. So Intel began a rushed project to design the 8086 as a low-risk incremental evolution from the 8080, using a separate design team. The mass-market 8086 shipped in 1978.

The 8086 was designed to be backward-compatible with the 8080 in the sense that 8080 assembly language could be mapped onto the 8086 architecture using a special assembler. Existing 8080 assembly source code (though not executable code) was thereby made upward compatible with the new 8086 to a degree. In contrast, the 432 had no software compatibility or migration requirements. The architects had total freedom to do a novel design from scratch, using whatever techniques they guessed would be best for large-scale systems and software. They applied fashionable computer science concepts from universities, particularly capability machines, object-oriented programming, high-level CISC machines, Ada, and densely encoded instructions. This ambitious mix of novel features made the chip larger and more complex. The chip's complexity limited the clock speed and lengthened the design schedule.

The core of the design — the main processor — was termed the General Data Processor (GDP) and built as two integrated circuits: one (the 43201) to fetch and decode instructions, the other (the 43202) to execute them. Most systems would also include the 43203 Interface Processor (IP) which operated as a channel controller for I/O, and an Attached Processor (AP), a conventional Intel 8086 which provided "processing power in the I/O subsystem".

These were some of the largest designs of the era. The two-chip GDP had a combined count of approximately 97,000 transistors, while the single-chip IP had approximately 49,000. By comparison, the Motorola 68000 (introduced in 1979) had approximately 40,000 transistors.

In 1983, Intel released two additional integrated circuits for the iAPX 432 Interconnect Architecture: the 43204 Bus Interface Unit (BIU) and 43205 Memory Control Unit (MCU). These chips allowed for nearly glueless multiprocessor systems with up to 63 nodes.

The project's failures
Some of the innovative features of the iAPX 432 were detrimental to good performance. In many cases, the iAPX 432 had a significantly slower instruction throughput than conventional microprocessors of the era, such as the National Semiconductor 32016, Motorola 68010 and Intel 80286. One problem was that the two-chip implementation of the GDP limited it to the speed of the motherboard's electrical wiring. A larger issue was that the capability architecture needed large associative caches to run efficiently, but the chips had no room left for them. The instruction set also used bit-aligned variable-length instructions instead of the usual semi-fixed byte or word-aligned formats used in the majority of computer designs. Instruction decoding was therefore more complex than in other designs. Although this did not hamper performance in itself, it used additional transistors (mainly for a large barrel shifter) in a design that was already lacking space and transistors for caches, wider buses and other performance-oriented features. In addition, the BIU was designed to support fault-tolerant systems, and in doing so up to 40% of the bus time was held up in wait states.

Another major problem was its immature and untuned Ada compiler. It used high-cost object-oriented instructions in every case, instead of the faster scalar instructions where it would have made sense to do so. For instance, the iAPX 432 included a very expensive inter-module procedure call instruction, which the compiler used for all calls, despite the existence of much faster branch and link instructions. Another very slow call was enter_environment, which set up the memory protection. The compiler ran this for every single variable in the system, even when variables were used inside an existing environment and did not have to be checked. To make matters worse, data passed to and from procedures was always passed by value-return rather than by reference. When running the Dhrystone benchmark, parameter passing took ten times longer than all other computations combined.
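The cost difference between value-return and reference passing can be illustrated with a small Python sketch. It is only a simplified cost model that counts words copied per call; the function names are hypothetical and nothing here reflects the code the 432 Ada compiler actually emitted.

```python
def call_by_value_return(proc, record):
    """Copy the argument in, run the procedure on the copy, then copy the result back."""
    local = list(record)        # copy-in: one word moved per element
    proc(local)
    record[:] = local           # copy-out: one more word moved per element
    return 2 * len(record)      # words copied for this call

def call_by_reference(proc, record):
    """Hand the procedure the caller's own storage; nothing is copied."""
    proc(record)
    return 0

def bump_first(rec):
    rec[0] += 1

a = [0] * 100
b = [0] * 100
print(call_by_value_return(bump_first, a), call_by_reference(bump_first, b))
```

Both calls leave the caller's record updated, but in this model the value-return call moves twice the record's size on every invocation, regardless of how little the procedure changes.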

According to the New York Times, "the i432 ran 5 to 10 times more slowly than its competitor, the Motorola 68000".

Impact and similar designs
The iAPX 432 was one of the first systems to implement the new IEEE-754 Standard for Floating-Point Arithmetic.

An outcome of the failure of the 432 was that microprocessor designers concluded that object support in the chip leads to a complex design that will invariably run slowly, and the 432 was often cited as a counter-example by proponents of RISC designs. However, some hold that the OO support was not the primary problem with the 432, and that the implementation shortcomings (especially in the compiler) mentioned above would have made any CPU design slow. Since the iAPX 432 there has been only one other attempt at a similar design, the Rekursiv processor, although the INMOS Transputer's process support was similar — and very fast.

Intel had spent considerable time, money, and mindshare on the 432, had a skilled team devoted to it, and was unwilling to abandon it entirely after its failure in the marketplace. A new architect—Glenford Myers—was brought in to produce an entirely new architecture and implementation for the core processor, which would be built in a joint Intel/Siemens project (later BiiN), resulting in the i960-series processors. The i960 RISC subset became popular for a time in the embedded processor market, but the high-end 960MC and the tagged-memory 960MX were marketed only for military applications.

According to the New York Times, Intel's collaboration with HP on the Merced processor (later known as Itanium) was the company's comeback attempt for the very high-end market.

Architecture
The iAPX 432 instructions have variable length, between 6 and 321 bits. Unusually, they are not byte-aligned, that is, they may contain odd numbers of bits and directly follow each other without regard to byte boundaries.

Object-oriented memory and capabilities
The iAPX 432 has hardware and microcode support for object-oriented programming and capability-based addressing. The system uses segmented memory, with up to 2^24 segments of up to 64 KB each, providing a total virtual address space of 2^40 bytes. The physical address space is 2^24 bytes (16 MB).
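The address-space figures follow directly from the segment count and the segment size. A short Python check, with constant names chosen here for illustration:

```python
MAX_SEGMENTS = 2 ** 24            # number of addressable segments
MAX_SEGMENT_BYTES = 64 * 1024     # 64 KB per segment, i.e. 2**16 bytes

virtual_bytes = MAX_SEGMENTS * MAX_SEGMENT_BYTES   # 2**24 * 2**16 = 2**40
physical_bytes = 2 ** 24                           # 16 MB

print(virtual_bytes == 2 ** 40, physical_bytes // 2 ** 20)
```

The virtual space is thus 2^16 times larger than the physical memory that a system can actually hold.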

Programs are not able to reference data or instructions by address; instead they must specify a segment and an offset within the segment. Segments are referenced by access descriptors (ADs), which provide an index into the system object table and a set of rights (capabilities) governing accesses to that segment. Segments may be "access segments", which can only contain Access Descriptors, or "data segments" which cannot contain ADs. The hardware and microcode rigidly enforce the distinction between data and access segments, and will not allow software to treat data as access descriptors, or vice versa.
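The access rules described above can be modelled in a few lines of Python. This is a minimal sketch under stated assumptions: the class names, the two-right set (READ, WRITE), and the list-based object table are all hypothetical simplifications, not the 432's actual descriptor layout.

```python
from dataclasses import dataclass, field
from enum import Flag, auto

class Rights(Flag):
    NONE = 0
    READ = auto()
    WRITE = auto()

class ProtectionFault(Exception):
    """Raised when an access violates rights, bounds, or segment kind."""

@dataclass
class Segment:
    kind: str                                   # "data" or "access"
    contents: list = field(default_factory=list)

@dataclass(frozen=True)
class AccessDescriptor:
    index: int                                  # index into the object table
    rights: Rights                              # what the holder may do with the segment

def store(object_table, ad, offset, value):
    """Store `value` at (segment, offset), enforcing rights, bounds, and segment kind."""
    segment = object_table[ad.index]
    if Rights.WRITE not in ad.rights:
        raise ProtectionFault("descriptor lacks write right")
    if not 0 <= offset < len(segment.contents):
        raise ProtectionFault("offset outside segment")
    # Access segments hold only ADs; data segments never hold ADs.
    if (segment.kind == "access") != isinstance(value, AccessDescriptor):
        raise ProtectionFault("data/access segment mismatch")
    segment.contents[offset] = value

table = [Segment("data", [0] * 4), Segment("access", [None] * 2)]
data_ad = AccessDescriptor(0, Rights.READ | Rights.WRITE)
store(table, data_ad, 1, 42)                                          # plain data into a data segment
store(table, AccessDescriptor(1, Rights.WRITE), 0, data_ad)           # an AD into an access segment
```

Attempting to store an integer into the access segment, or an access descriptor into the data segment, raises ProtectionFault in this model, which mirrors the strict separation the text describes.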

System-defined objects consist of either a single access segment, or an access segment and a data segment. System-defined segments contain data or access descriptors for system-defined data at designated offsets, though the operating system or user software may extend these with additional data. Each system object has a type field which is checked by microcode, such that a Port Object cannot be used where a Carrier Object is needed. User programs can define new object types which will get the full benefit of the hardware type checking, through the use of type control objects (TCOs).
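The type check can be pictured with a small Python sketch. The send operation, the payload field, and the exception name are hypothetical and serve only to show a port object being rejected where a carrier object is required; on the 432 the check is performed by microcode.

```python
from dataclasses import dataclass, field
from enum import Enum

class SystemType(Enum):
    PORT = "port"
    CARRIER = "carrier"

class TypeFault(Exception):
    """Raised when an object of the wrong system type is supplied."""

@dataclass
class SystemObject:
    type: SystemType
    payload: list = field(default_factory=list)

def check_type(obj, expected):
    """Compare the object's type field with the type the operation requires."""
    if obj.type is not expected:
        raise TypeFault(f"expected {expected.value} object, got {obj.type.value} object")

def send(port, carrier, message):
    """Hypothetical send: the carrier holds the message and is queued at the port."""
    check_type(port, SystemType.PORT)
    check_type(carrier, SystemType.CARRIER)
    carrier.payload.append(message)
    port.payload.append(carrier)

port = SystemObject(SystemType.PORT)
carrier = SystemObject(SystemType.CARRIER)
send(port, carrier, "hello")
```

Calling send(port, port, "hello") raises TypeFault in this model, because the second argument carries the wrong type field.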

In Release 1 of the iAPX 432 architecture, a system-defined object typically consisted of an access segment, and optionally (depending on the object type) a data segment specified by an access descriptor at a fixed offset within the access segment.

By Release 3 of the architecture, in order to improve performance, access segments and data segments were combined into single segments of up to 128 KB, split into an access part and a data part of 0–64 KB each. This reduced the number of object table lookups dramatically, and doubled the maximum virtual address space.
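The doubling can be verified with the same arithmetic as before, assuming the segment count stays at 2^24 (the figure given earlier for the architecture). The constant names are illustrative:

```python
MAX_SEGMENTS = 2 ** 24
PART_BYTES = 64 * 1024                     # maximum size of the access part or the data part

release1_segment = PART_BYTES              # one 64 KB segment
release3_segment = 2 * PART_BYTES          # access part + data part = 128 KB

release1_space = MAX_SEGMENTS * release1_segment   # 2**40 bytes
release3_space = MAX_SEGMENTS * release3_segment   # 2**41 bytes

print(release3_space // release1_space)
```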

The iAPX 432 recognizes fourteen types of predefined system objects:
 * instruction object: contains executable instructions
 * domain object: represents a program module and contains references to subroutines and data
 * context object: represents the context of a process in execution
 * type-definition object: represents a software-defined object type
 * type-control object: represents type-specific privilege
 * object table: identifies the system's collection of active object descriptors
 * storage resource object: represents a free storage pool
 * physical storage object: identifies free storage blocks in memory
 * storage claim object: limits storage that may be allocated by all associated storage resource objects
 * process object: identifies a running process
 * port object: represents a port and message queue for interprocess communication
 * carrier: carries messages to and from ports
 * processor: contains state information for one processor in the system
 * processor communication object: is used for interprocessor communication

Garbage collection
Software running on the 432 does not need to explicitly deallocate objects that are no longer needed. Instead, the microcode implements part of the marking portion of Edsger Dijkstra's on-the-fly parallel garbage collection algorithm (a mark-and-sweep style collector). The entries in the system object table contain the bits used to mark each object as being white, black, or grey as needed by the collector. The iMAX 432 operating system includes the software portion of the garbage collector.
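The white, grey, and black markings follow the usual tri-colour scheme: white objects have not been reached, grey objects have been reached but not yet scanned, and black objects have been fully scanned. The Python sketch below is a deliberately simplified, sequential version of that marking idea. It is not Dijkstra's concurrent algorithm and not the 432 microcode, and the dictionary-based heap is an assumption made for illustration.

```python
WHITE, GREY, BLACK = "white", "grey", "black"

def mark(objects, roots):
    """objects maps each object name to the names it references; returns a colour per object."""
    colour = {name: WHITE for name in objects}
    for root in roots:
        colour[root] = GREY
    grey = list(roots)
    while grey:
        obj = grey.pop()
        for child in objects[obj]:
            if colour[child] == WHITE:      # first time this object is reached
                colour[child] = GREY
                grey.append(child)
        colour[obj] = BLACK                 # all of its references have been examined
    return colour

def sweep(colour):
    """Objects still white after marking are unreachable and can be reclaimed."""
    return sorted(name for name, c in colour.items() if c == WHITE)

heap = {"a": ["b"], "b": ["a"], "c": ["d"], "d": []}
print(sweep(mark(heap, ["a"])))
```

In the real on-the-fly algorithm the collector runs in parallel with the program, so the program must cooperate by shading objects when it copies references; that part is omitted here.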

Instruction format
Executable instructions are contained within a system "instruction object". Because instructions are bit-aligned, positions within the instruction object are given as a 16-bit bit displacement, which allows the object to contain up to 65,536 bits (8,192 bytes) of instructions.
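The size limit is a direct consequence of addressing bits rather than bytes. A brief Python check, with a hypothetical helper locate that splits a bit displacement into a byte index and a bit index:

```python
DISPLACEMENT_BITS = 16

max_bits = 2 ** DISPLACEMENT_BITS     # distinct bit positions a 16-bit displacement can name
max_bytes = max_bits // 8             # the same capacity expressed in bytes

def locate(bit_displacement):
    """Split a bit displacement into (byte index, bit index within that byte)."""
    return divmod(bit_displacement, 8)

print(max_bits, max_bytes, locate(13))
```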

Instructions consist of an operator, made up of a class and an opcode, and zero to three operand references. "The fields are organized to present information to the processor in the sequence required for decoding". More frequently used operators are encoded using fewer bits. The instruction begins with the 4- or 6-bit class field, which indicates the number of operands, called the order of the instruction, and the length of each operand. This is optionally followed by a 0- to 4-bit format field which describes the operands (if there are no operands, the format is not present). Then come zero to three operands, as described by the format. The instruction is terminated by the 0- to 5-bit opcode, if any (some classes contain only one instruction and therefore have no opcode). "The Format field permits the GDP to appear to the programmer as a zero-, one-, two-, or three-address architecture." The format field indicates whether an operand is a data reference, or the top or next-to-top element of the operand stack.
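The field order (class, format, operands, opcode) can be sketched as a small bit-stream decoder in Python. Everything specific in it is an assumption made for illustration: the class codes, the field widths in the two tables, the fixed 8-bit operands, and the least-significant-bit-first ordering are invented and do not reproduce the real 432 encoding, in which operand references are themselves variable in length.

```python
def read_bits(data, pos, width):
    """Read `width` bits starting at bit `pos`, least significant bit first (assumed order)."""
    value = 0
    for i in range(width):
        bit = (data[(pos + i) // 8] >> ((pos + i) % 8)) & 1
        value |= bit << i
    return value, pos + width

# HYPOTHETICAL class tables, invented for illustration only.
# Each entry gives the format width, one width per operand, and the opcode width.
CLASSES_4BIT = {
    0x1: {"format_bits": 0, "operand_bits": [], "opcode_bits": 2},
    0x2: {"format_bits": 1, "operand_bits": [8], "opcode_bits": 0},
}
CLASSES_6BIT = {
    0x30: {"format_bits": 2, "operand_bits": [8, 8], "opcode_bits": 3},
}

def decode(data, pos):
    """Decode one instruction at bit `pos`; return (fields, bit position of the next one)."""
    cls, after = read_bits(data, pos, 4)          # try the short class code first
    if cls in CLASSES_4BIT:
        spec = CLASSES_4BIT[cls]
    else:
        cls, after = read_bits(data, pos, 6)      # otherwise re-read it as a 6-bit class
        spec = CLASSES_6BIT[cls]
    pos = after
    fmt = None
    if spec["format_bits"]:                       # format is absent when there are no operands
        fmt, pos = read_bits(data, pos, spec["format_bits"])
    operands = []
    for width in spec["operand_bits"]:
        operand, pos = read_bits(data, pos, width)
        operands.append(operand)
    opcode = None
    if spec["opcode_bits"]:                       # single-instruction classes have no opcode
        opcode, pos = read_bits(data, pos, spec["opcode_bits"])
    return {"class": cls, "order": len(operands), "format": fmt,
            "operands": operands, "opcode": opcode}, pos

# Two instructions packed back to back: 13 bits, then 6 bits, with no byte alignment.
stream = (0x2 | 1 << 4 | 0xAB << 5 | 0x1 << 13 | 3 << 17).to_bytes(3, "little")
first, next_pos = decode(stream, 0)
second, end_pos = decode(stream, next_pos)
print(first, next_pos)
print(second, end_pos)
```

The second instruction starts at bit 13, in the middle of a byte, which is why a decoder of this kind needs a shifter capable of extracting fields at arbitrary bit offsets.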