NS32000

The NS32000, sometimes known as the 32k, is a series of microprocessors produced by National Semiconductor. The first member of the family came to market in 1982, briefly known as the 16032 before becoming the 32016. It was the first general-purpose microprocessor on the market that used 32-bit data internally: the Motorola 68000 had 32-bit registers and instructions to perform 32-bit arithmetic, but used a 16-bit ALU for arithmetic operations on data, and thus took twice as long to perform those arithmetic operations. However, the 32016 contained many bugs and often could not be run at its rated speed. These problems, and the presence of the otherwise similar 68000 which had been available since 1980, led to little use in the market.

Several improved versions followed, including 1985's 32032 which was essentially a bug-fixed 32016 with an external 32-bit data bus. While it offered about 50% better speed than the 32016, it was outperformed by the 32-bit Motorola 68020, released a year prior. The 32532, released in 1987, outperformed the contemporary Motorola 68030 by almost two times, but by this time most interest in microprocessors had turned to RISC platforms and this otherwise excellent design saw almost no use as well.

National was working on further improvements in the 32732, but eventually gave up attempting to compete in the central processing unit (CPU) space. Instead, the basic 32000 architecture was combined with several support systems and relaunched as the Swordfish microcontroller. This had some success in the market before it was replaced by the CompactRISC architecture in the mid-1990s.

Design concept


The NS32000 series traces its history to an effort by National Semiconductor to produce a single-chip implementation of the VAX-11 architecture. The VAX is well known for its highly "orthogonal" instruction set architecture (ISA), in which any instruction can be applied to any data. For instance, an ADD instruction might add the contents of two processor registers, or one register against a value in memory, two values in memory, or use the register as an offset against an address. This flexibility was considered the paragon of design in the era of complex instruction set computers (CISC).

National took DEC to court in California to ensure the legality of the design, but when DEC had the lawsuit moved to Massachusetts, DEC's home state, the lawsuit was dropped and the Series 32000 architecture was developed instead. Although the new instruction set architecture was not VAX-11 compatible, it did retain its highly "orthogonal" design philosophy. That is, every instruction could be used with any type of data. Articles of the time also referred to this as "symmetrical".

The original processor family consisted of the NS16032 CPU and a NS16C032 low-power variant, both having a 16-bit data path and so requiring two machine cycles to load a single 32-bit word. Both could be used with the NS16082 memory management unit, which provided 24-bit virtual memory support for up to 16 MB physical memory. The NS16008 was a cut-down version with an 8-bit external data path and no virtual memory support, which had a reduced pin count and was thus somewhat easier to implement. Early announcements of the family included the NS16016 with a 16-bit data bus, and both the NS16008 and NS16016 were to feature an emulation mode for the Intel 8080 running at four times the speed of that processor.

At the same time, National Semiconductor also announced two future versions, the NS32032 and NS32132. The former was essentially a version of the NS16032 with a 32-bit external data bus, allowing it to read data at twice the rate. This was projected to be released in 1984. The NS32132 was a version with 29-bit internal addresses and 32-bit external addresses, allowing it to address a complete 4 GB of memory. It was to be released in 1985.
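The memory sizes and transfer counts quoted above follow from simple arithmetic on the bus widths. The sketch below is illustrative only: the function names are hypothetical, and it assumes aligned operands and one bus transfer per bus-width chunk, which ignores real bus timing.

```python
def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses reachable with the given address width."""
    return 1 << address_bits


def bus_transfers(operand_bits: int, bus_bits: int) -> int:
    """Transfers needed to move one aligned operand over a data bus (ceiling division)."""
    return -(-operand_bits // bus_bits)


MIB = 1024 * 1024
GIB = 1024 * MIB

print(addressable_bytes(24) // MIB, "MB with 24-bit addresses")
print(addressable_bytes(32) // GIB, "GB with 32-bit addresses")
for bus in (8, 16, 32):
    print(f"32-bit word over a {bus}-bit bus: {bus_transfers(32, bus)} transfer(s)")
```

Under these assumptions a 32-bit word needs two transfers on the 16-bit bus of the NS16032 and one on the 32-bit bus of the NS32032, which is the doubling in read rate described above.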

All of these could also be used with the NS16081 floating point unit.

Architecture
The processors have 8 general-purpose 32-bit registers, plus a series of special-purpose registers (additional system registers are not listed):
 * Frame pointer
 * Stack pointer (one each for user and supervisor modes)
 * Static base register, for referencing global variables
 * Link base register for dynamically linked modules (object orientation)
 * Program counter
 * A typical processor status register, with a low-order user byte and a high-order system byte.

The instruction set is very much in the CISC model, with 2-operand instructions, memory-to-memory operations, flexible addressing modes, and variable-length byte-aligned instruction encoding. Addressing modes can involve up to two displacements and two memory indirections per operand as well as scaled indexing, making the longest conceivable instruction 23 bytes. The actual number of instructions is much lower than that of contemporary RISC processors.

Unlike some other processors, autoincrement of the base register is not provided; the only exception is a "top of stack" addressing mode that pops sources and pushes destinations. Uniquely, the size of the displacement is encoded in its most significant bits: the prefixes 0, 10 and 11 precede 7-, 14- and 30-bit signed displacements respectively. (Although the processors are otherwise consistently little-endian, displacements in the instruction stream are stored in big-endian order.)
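The displacement encoding described above can be sketched as a small decoder. This is a minimal illustration, not a reference implementation: the function name is hypothetical, and the sketch does not model any reserved bit patterns the real processors may define.

```python
def decode_displacement(stream: bytes, offset: int = 0):
    """Decode one displacement starting at `offset`.

    Returns (signed_value, length_in_bytes). The leading bits of the first
    byte select the size: 0 -> 7 bits, 10 -> 14 bits, 11 -> 30 bits.
    """
    first = stream[offset]
    if first & 0x80 == 0x00:      # prefix 0: one byte, 7-bit value
        length, bits = 1, 7
    elif first & 0xC0 == 0x80:    # prefix 10: two bytes, 14-bit value
        length, bits = 2, 14
    else:                         # prefix 11: four bytes, 30-bit value
        length, bits = 4, 30

    chunk = stream[offset:offset + length]
    if len(chunk) < length:
        raise ValueError("instruction stream ends inside a displacement")

    # Displacement bytes are stored most significant byte first.
    raw = int.from_bytes(chunk, "big") & ((1 << bits) - 1)

    # Sign-extend from the top bit of the value field.
    if raw & (1 << (bits - 1)):
        raw -= 1 << bits
    return raw, length


print(decode_displacement(b"\x05"))              # one-byte form
print(decode_displacement(b"\x81\x00"))          # two-byte form
print(decode_displacement(b"\xC0\x01\x00\x00"))  # four-byte form
```

Because the length is carried by the first byte, a decoder can step through the instruction stream without knowing the displacement size in advance.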

General-purpose operands are specified using a 5-bit field. To this can be added an index byte (specifying the index register and a 5-bit base addressing mode), and up to 2 variable-length displacements per operand.
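As a rough illustration of how an index byte combines the two fields, the sketch below splits one byte into a 5-bit base field and a 3-bit index register number. The bit layout used here (base field in the upper five bits, register number in the lower three) is an assumption for illustration and should be checked against the programmer's reference; the function name is hypothetical.

```python
def split_index_byte(index_byte: int):
    """Split an index byte into (base_field, index_register).

    Assumed layout: bits 7..3 hold the 5-bit base field,
    bits 2..0 select one of the 8 general-purpose registers.
    """
    if not 0 <= index_byte <= 0xFF:
        raise ValueError("index byte must fit in 8 bits")
    base_field = index_byte >> 3
    index_register = index_byte & 0x07
    return base_field, index_register


base, reg = split_index_byte(0b10101_011)
print(f"base field = {base:05b}, index register = R{reg}")
```

Three bits are enough for the register number because there are 8 general-purpose registers, which leaves exactly five bits for the base field.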

32016
The first chip in the series was originally referred to as the 16032, but later renamed 32016 to emphasize its 32-bit internals. This contrasts it with its primary competitor in this space, 1979's Motorola 68000 (68k). The 68k used 32-bit instructions and registers, but its arithmetic logic unit (ALU), which controls much of the overall processing task, was only 16-bit. This meant it had to cycle 32-bit data through the ALU twice to complete an operation. In contrast, the NS32000 has a 32-bit ALU, so that 16-bit and 32-bit instructions take the same time to complete.
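The difference between the two ALU widths can be sketched as follows. This is a simplified model with hypothetical function names: the "passes" value only counts trips through the ALU and is not a measurement of real cycle timings on either processor.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def alu16_add(a: int, b: int, carry_in: int = 0):
    """One pass through a 16-bit adder: returns (16-bit sum, carry out)."""
    total = (a & MASK16) + (b & MASK16) + carry_in
    return total & MASK16, total >> 16


def add32_via_16bit_alu(x: int, y: int):
    """32-bit add built from two 16-bit passes, low half first."""
    low, carry = alu16_add(x, y)
    high, carry = alu16_add(x >> 16, y >> 16, carry)
    return (high << 16) | low, carry, 2   # (result, carry out, passes)


def add32_via_32bit_alu(x: int, y: int):
    """32-bit add in a single pass through a 32-bit adder."""
    total = (x & MASK32) + (y & MASK32)
    return total & MASK32, total >> 32, 1  # (result, carry out, passes)


print(add32_via_16bit_alu(0x0001FFFF, 1))
print(add32_via_32bit_alu(0x0001FFFF, 1))
```

Both routes give the same result and carry; the narrower ALU simply needs a second pass to propagate the carry from the low half into the high half.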

The 32016 first shipped in 1982 in a 48-pin DIP package. It may have been the first 32-bit chip to reach mass production and sale (at least according to National's marketing). In a report in a June 1983 publication, however, it was remarked that National was "promising production quantities this summer" of 16032 parts, having been "shipping sample quantities for several months", with the floating point co-processor sampling "this month". Although a 1982 introduction post-dates the 68k by about two years, the 68k was not yet being widely used in the market and the 32016 generated significant interest. Unfortunately, the early versions were filled with bugs and could rarely be run at their rated speed. By 1984, after two years, the errata list still contained items specifying uncontrollable conditions that would result in the processor coming to a halt, forcing a reset.

The original product roadmap envisaged 6 MHz and 10 MHz parts during 1983 and 12 MHz and 14 MHz parts during 1984. However, press reports in 1984 indicated difficulties in keeping to this roadmap, with it reportedly having taken five months to increase the frequency of the parts from 6 MHz to 8 MHz, and with representatives estimating a further "two, three or five months" to increase the frequency to 10 MHz. Two unspecified chips of the five in the chipset were reported to be the cause of these problems. An early 1985 article about the 32016-based Whitechapel MG-1 workstation noted that the 32082 memory management unit was "suffering from bugs" and had been situated on its own board providing hardware fixes. In 1986, Texas Instruments announced a "fully qualified 10 MHz TI32000 32-bit microprocessor chip set" consisting of the TI32016 CPU and TI32082 memory management unit as 48-pin devices, the TI32201 timing control unit and TI32081 floating-point unit as 24-pin devices, and the TI32202 interrupt control unit as a 40-pin device, with the five-device chipset "priced at $289 in 100-unit quantities".

National changed its design methodology to make it possible to get the part into production, and a design system based on the language "Z" was co-developed with the University of Tel-Aviv, close to the "NSC" design centre in Herzliya, Israel. The "Z" language is similar to today's Verilog and VHDL, but has a Pascal-like syntax and is optimized for two-phase clock designs. However, by the time the fruits of these efforts were being felt in the design, numerous 68k machines were already on the market, notably the Apple Macintosh, and the 32016 never saw widespread use.

The 32016 has a 16-bit external data bus, a 24-bit external address bus, and a full 32-bit instruction set. It also includes a coprocessor interface, allowing coprocessors such as FPUs and MMUs to be attached as peers to the main processor. The MMU is based on demand-paged virtual memory, its most unusual feature compared to the segmented memory approach used by the competition, and one that has since become the standard for microprocessor design. The architecture supports an instruction restart mechanism on a page fault, which is much cleaner than the Motorola approach of dumping the internal state on a page fault, which then has to be read back before the instruction is continued.



Although the Series 32000 instruction set was often compared to the 68k's, NSC employees rejected the comparison; one of the key marketing phrases of the time was "Elegance is Everything", contrasting the highly orthogonal Series 32000 with the "kludge". One key difference is Motorola's use of separate address registers and data registers, with instructions working only on one kind or the other. The Series 32000 has general-purpose registers, described as "address-data" registers in technical documentation.

32032


The 32032 was introduced in 1984. It is almost completely compatible with the 32016, but features a 32-bit data bus (although keeping the 24-bit address bus) for somewhat faster performance, described as "minicomputer performance" comparable with that of a VAX-11 system. There was also a 32008, a 32016 with a data bus cut down to 8 bits wide for low-cost applications. It is philosophically similar to the MC68008, and equally unpopular.

National also produced a series of related support chips such as the NS32081 Floating Point Unit (FPU), NS32082 Memory Management Unit (MMU), NS32203 Direct Memory Access (DMA) controller and NS32202 Interrupt Controller. With the full set plus memory chips and peripherals, it was feasible to build a 32-bit computer system capable of supporting modern multi-tasking operating systems, something that had previously been possible only on expensive minicomputers and mainframes.

32332, 32532
In 1985, National Semi introduced the NS32332, a much-improved version of the 32032. From the datasheet, the enhancements include "the addition of new dedicated addressing hardware (consisting of a high speed ALU, a barrel shifter and an address register), a very efficient increased (20 bytes) instruction prefetch queue, a new system/memory bus interface/protocol, increased efficiency slave processor protocol and finally enhancements of microcode." There was also a new NS32382 MMU, NS32381 FPU and the (very rare) NS32310 interface to a Weitek FPA. Together, these enhancements made the NS32332 only about 50 percent faster than the original NS32032, leaving it slower than its main competitor, the MC68020.

National Semi introduced the NS32532 in early 1987. Running at 20, 25 and 30 MHz, it was a complete redesign of the internal implementation with a five-stage pipeline, an integrated Cache/MMU and improved memory performance, making it about twice as fast as the competing MC68030 and i80386. At this stage RISC architectures were starting to make inroads, and the main competitors became the now equally dead AM29000 and MC88000, which were considered faster than the NS32532. For floating-point, the NS32532 used the existing NS32381 or the NS32580 interface to a Weitek FPA. The NS32532 was the basis of the PC532, a "public domain" hardware project, and one of the few to produce a useful machine running a real operating system (in this case, Minix or NetBSD).

The semi-mythical NS32732 (sometimes called NS32764), envisioned as the high-performance successor to the NS32532, never came to the market.

Swordfish
A derivative of the NS32732 called Swordfish was aimed at embedded systems and arrived in about 1990. Swordfish has an integrated floating point unit, timers, DMA controllers and other peripherals not normally available in microprocessors. It has a 64-bit data bus and is internally overclocked from 25 to 50 MHz. The chief architect of the Swordfish is Donald Alpert, who went on to manage the architectural team designing the Pentium. The Pentium internal microarchitecture is similar to the preceding Swordfish.

The focus of Swordfish was high-end Postscript laser printers, and performance was exceptional at the time. Competing solutions could render about one new page per minute, but the Swordfish demo unit would print out sixteen pages per minute, limited only by the laser-engine mechanics. On each page it would print out how much time it was idling, waiting for the engine to complete.

The Swordfish die is huge, and it was eventually decided to drop the project altogether, and the product never went into production. The lessons from the Swordfish were used for the CompactRISC designs. In the beginning, there were both a CompactRISC-32 and a CompactRISC-16, designed using "Z". National never brought a chip to the market with the CompactRISC-32 core. National's Research department worked with the University of Michigan to develop the first synthesizable Verilog Model, and Verilog was used from the CR16C and onwards.

Others
Versions of the older NS32000 line for low-cost products, such as the NS32CG16, NS32CG160, NS32FV16, NS32FX161, NS32FX164 and the NS32AM160/1/3, all based on the NS32CG16, were introduced from 1987 onwards. These processors had some success in the laser printer and fax market, despite intense competition from AMD and Intel RISC chips. The NS32CG16 is especially notable. The key differences between it and the NS32C016 are the integration of the expensive TCU (Timing Control Unit), which generates the needed two-phase clock from a crystal, and the removal of floating point coprocessor support. The latter freed up microcode space for the useful BitBLT instruction set, which significantly improves performance in laser printer operations, making this 60,000 transistor chip faster than the 200,000 transistor MC68020. The NS32CG160 is the CG16 with timers and DMA peripherals, while the NS32FV/FX16x chips have extra DSP functionality on top of the CG16 BitBLT core for the fax and answering machine market. They were later complemented by the NS32532-based NS32GX32. Unlike the previous chips, it has no extra hardware: the NS32GX32 is the NS32532 without the MMU, sold at an attractive price for embedded systems. In the beginning, this was just a re-marked chip. It is unclear if the chip was redesigned for lower-cost production.

Datasheets exist for an NS32132, apparently designed for multiprocessor systems. This is the NS32032 extended with an arbiter. The bus usage of the NS32032 is about 50 percent, owing to its very compact instruction set, or its very slow pipeline as competitors would phrase it. Indeed, one suggested application of the NS32032 was as part of a "fault-tolerant transaction system" employing "two 32032s in parallel and comparing results on alternate memory cycles to detect soft errors". The NS32132 chip allows a pair of CPUs to be connected to the same memory system, without much change of the PCB. Prototype systems were built by Diab Data AB in Sweden, but did not perform as well as the single-CPU MC68020 system designed by the same company.

Machines using the NS32000 series

 * Acorn Cambridge Workstation – NS32016 (with 6502-based BBC Micro host)
 * BBC Micro 32016 Second Processor – a separate expansion for the BBC Micro providing the NS32016 capabilities from the Acorn Cambridge Workstation
 * Canon LBP-8 Mark III Laser Printer – NS32CG16
 * CompuPro 32016 – NS32016 S-100 Card
 * Encore Multimax – NS32032, NS32332 and NS32532 Multiprocessor
 * E-mu Systems Emax – NS32008
 * E-mu Systems Emulator III – NS32016
 * ETH Zürich Ceres workstation – NS32032
 * ETH Zürich Ceres-2 workstation – NS32532
 * ETH Zürich Ceres-3 workstation – NS32GX32
 * General Robotics Corp. Python – NS32032 & NS32016 Q-Bus card
 * Heurikon VME532 – NS32532 VME Card (with cache)
 * IBM RT PC – Some early models used the NS32081 FPU as a coprocessor for the IBM ROMP microprocessor
 * Intermec (previously A-Tech and then UBI) Label Printer – NS32CG16
 * Labtam Unix System – NS32032 and NS32332
 * Lauterbach Incircuit Emulator ICE (System Controller 32-bit, first version in 1996, max 16 MB ZIP20-RAM, Z180 to serve Ethernet)
 * National Semiconductor ICM-3216 – NS32016
 * National Semiconductor ICM-332-1 – NS32332 w/ NS32016 I/O processor
 * National Semiconductor SYS32/20 – NS32016 PC add-on board w/ Unix
 * Opus Systems Opus516 Personal Mainframe – NS32016 PC Add-On Board
 * Opus Systems Opus532.32 Personal Mainframe – NS32032 PC Add-On Board
 * PC532 – NS32532
 * Sequent Balance – NS32016, NS32032 and NS32332 multiprocessor
 * Siemens PC-MX2 – NS32016
 * Siemens MX300-05/-10/-15/-30 – NS32332 (-05/-10) or NS32532 (-15/-30) under SINIX (MX300-55 and later use i486)
 * Siemens MX500-75/-85 – NS32532 (2-8x CPUs; Sequent Boards / MX500-90 uses 2-12x i486)
 * Symmetric Computer Systems S/375 – NS32016, used to cross-develop 386BSD
 * Syte Information Technology Model 300 – NS32032-based Unix graphics workstation, several such "multiple tightly coupled microcomputers organized in a mainframe architecture" comprising the Syte Series 3000 "micro-mainframe" running the Global Environment Manager to manage multiple virtual environments on each processor node, with Syte eventually failing before initial product shipments
 * Tektronix 6130 and 6250 Workstation – NS32016 and NS32032
 * Tolerant Systems Eternity Series – NS32032 w/ NS32016 I/O processor
 * Trinity College Workstation – NS32332
 * Teklogix 9020 network controller – NS32332
 * Teklogix 9200 network controller – NS32CG160
 * Whitechapel MG-1 – NS32016
 * Whitechapel MG200 – NS32332

Legacy
In June 2015, Udo Möller released a complete Verilog implementation of an NS32000 processor on OpenCores. Fully software-compatible with an NS32532 CPU with NS32381 FPU, it is significantly faster when implemented on an FPGA, both operating at a higher clock rate and using fewer cycles per instruction.