NEC V60

The NEC V60 is a CISC microprocessor manufactured by NEC starting in 1986. Several improved versions were introduced with the same instruction set architecture (ISA): the V70 in 1987, and the V80 and AFPP in 1989. They were succeeded by the V800 product families, which are currently produced by Renesas Electronics.

The V60 family includes a floating-point unit (FPU), a memory management unit (MMU), and real-time operating system (RTOS) support for both Unix-based user-application-oriented systems and I-TRON–based hardware-control-oriented embedded systems. The processors can be used in a multi-CPU lockstep fault-tolerant mechanism named FRM. Development tools included the Ada-certified system MV-4000 and an in-circuit emulator (ICE).

The V60/V70/V80's applications covered a wide area, including circuit switching telephone exchanges, minicomputers, aerospace guidance systems, word processors, industrial computers, and various arcade games.

Introduction
NEC V60 is a CISC processor manufactured by NEC starting in 1986. It was the first 32-bit general-purpose microprocessor commercially available in Japan.

Based on a relatively traditional design for the period, the V60 was a radical departure from NEC's previous 16-bit V-series processors, the V20-V50, which were based on the Intel 8086 model, although the V60 had the ability to emulate the V20/V30.

According to NEC's documentation, this computer architectural change was due to the increasing demands for, and the diversity of, high-level programming languages. Such trends called for a processor with both improved performance, achieved by doubling the bus width to 32 bits, and with greater flexibility facilitated by having a large number of general-purpose registers. These were common features of RISC chips. At the time, a transition from CISC to RISC seemed to bring many benefits for emerging markets.

Today, RISC chips are common, and CISC designs—such as Intel's x86 and the 80486—which have been mainstream for several decades, internally adopt RISC features in their microarchitectures. According to Pat Gelsinger, binary backward compatibility for legacy software is more important than changing the ISA.

Instruction set
The V60 (a.k.a. μPD70616) retained a CISC architecture. Its manual describes its architecture as having "features of high-end mainframe and supercomputers", with a fully orthogonal instruction set that includes non-uniform-length instructions, memory-to-memory operations that include string manipulation, and complex operand-addressing schemes.

Family
The V60 operates as a 32-bit processor internally, while externally providing a 16-bit data bus and a 24-bit address bus. In addition, the V60 has thirty-two 32-bit general-purpose registers. Its basic architecture is used in several variants. The V70 (μPD70632), released in 1987, provides 32-bit external buses. Launched in 1989, the V80 (μPD70832) is the culmination of the series, with on-chip caches, a branch predictor, and less reliance on microcode for complex operations.

Software
The operating systems developed for the V60-V80 series are generally oriented toward real-time operations. Several OSs were ported to the series, including real-time versions of Unix and I-TRON.

Because the V60/V70 was used in various Japanese arcade games, their instruction set architecture is emulated in the MAME CPU simulator. The latest open-source code is available from the GitHub repository.

FRM
All three processors have the FRM (Functional Redundancy Monitoring) synchronous multiple modular lockstep mechanism, which enables fault-tolerant computer systems. It requires multiple devices of the same model, one of which then operates in "master mode", while the other devices listen to the master device, in "checker mode". If two or more devices simultaneously output different results via their "fault output" pins, a majority-voting decision can be taken by external circuits. In addition, a recovery method for the mismatched instruction—either "roll-back by retry" or "roll-forward by exception"—can be selected via an external pin.
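The master/checker comparison and the external majority vote described above can be illustrated with a small behavioural model in Python. This is only a sketch under assumptions: the function names `checker_mismatch` and `majority_vote`, and the representation of bus outputs as plain integers, are hypothetical and not taken from NEC documentation.

```python
from collections import Counter
from typing import List, Optional


def checker_mismatch(master_output: int, checker_output: int) -> bool:
    """Model a checker-mode device: report a fault when its own result
    differs from the value the master drove onto the bus."""
    return master_output != checker_output


def majority_vote(outputs: List[int]) -> Optional[int]:
    """Model the external voting circuit: return the value produced by a
    strict majority of devices, or None when no strict majority exists."""
    value, count = Counter(outputs).most_common(1)[0]
    return value if count * 2 > len(outputs) else None


# Three devices, one of which produced a corrupted result (made-up values).
results = [0x1234, 0x1234, 0x1235]
faulty = [i for i, r in enumerate(results) if checker_mismatch(results[0], r)]
agreed = majority_vote(results)
```

With three devices, a single faulty result is outvoted; with only two, a mismatch can be detected but not resolved, which is why the model returns `None` in that case.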

V60
Work on the V60 processor began in 1982 with about 250 engineers under the leadership of Yoichi Yano, and the processor debuted in February 1986. It had a six-stage pipeline, a built-in memory management unit, and floating-point arithmetic. It was manufactured using a two-layer aluminum CMOS process technology, under a 1.5 μm design rule, to implement 375,000 transistors on a 13.9 × 13.8 mm² die. It operated at 5 V and was initially packaged in a 68-pin PGA. The first version ran at 16 MHz and attained 3.5 MIPS. Its sample price at launch was set at ¥100,000 ($588.23). It entered full-scale production in August 1986.

Sega employed this processor for most of its arcade game sets in the 1990s; both the Sega System 32 and the Sega Model 1 architectures used the V60 as their main CPU. (The latter used the lower-cost μPD70615 variant, which does not implement V20/V30 emulation and FRM.) The V60 was also used as the main CPU in the SSV arcade architecture, so named because it was developed jointly by Seta, Sammy, and Visco. Sega originally considered using a 16 MHz V60 as the basis for its Sega Saturn console, but after receiving word that the PlayStation employed a 33.8 MHz MIPS R3000A processor, it instead chose the dual-SH-2 design for the production model.

In 1988, NEC released a kit called PS98-145-HMW for Unix enthusiasts. The kit contained a V60 processor board that could be plugged into selected models of the PC-9800 computer series and a distribution of NEC's UNIX System V port, PC-UX/V Rel 2.0 (V60), on fifteen 8-inch floppy disks. The suggested retail price for this kit was ¥450,000.

NEC-group companies themselves made extensive use of the V60 processor. Their telephone circuit switcher (exchange), which was one of the first intended targets, used the V60. In 1991, they expanded their word processor product line with the Bungou Mini (文豪ミニ in Japanese) series 5SX, 7SX, and 7SD, which used the V60 for fast outline font processing, while the main system processor was a 16 MHz NEC V33. In addition, V60 microcode variants were employed in NEC's MS-4100 minicomputer series, which was the fastest one in Japan at that time.

V70
The V70 (μPD70632) improved on the V60 by increasing the external buses to 32 bits, equal to the internal buses. It was also manufactured in 1.5 μm with a two-metal layer process. Its 14.35 × 14.24 mm² die had 385,000 transistors and was packaged in a 132-pin ceramic PGA. Its MMU had support for demand paging. Its floating-point unit was IEEE 754 compliant. The 20 MHz version attained a peak performance of 6.6 MIPS and was priced, at launch in August 1987, at ¥100,000 ($719.42). The initial production capacity was 20,000 units per month. A later report describes it as fabricated in 1.2-micrometer CMOS on a 12.23 × 12.32 mm² die. The V70 had a two-cycle non-pipeline (T1-T2) external bus system, whereas that of the V60 operated at 3 or 4 cycles (T1-T3/T4). Of course, the internal units were pipelined.
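The effect of the wider, shorter bus cycle can be estimated with simple arithmetic: peak bandwidth is the clock rate divided by the cycles per transfer, multiplied by the bus width in bytes. The following Python sketch assumes that the quoted core clocks (16 MHz for the V60, 20 MHz for the V70) are also the bus clocks and that every transfer uses the full bus width; both are simplifying assumptions, and the function name is hypothetical.

```python
def peak_bus_bandwidth(clock_hz: float, bus_bits: int, cycles_per_transfer: int) -> float:
    """Return the theoretical peak external bus bandwidth in bytes per second."""
    transfers_per_second = clock_hz / cycles_per_transfer
    return transfers_per_second * (bus_bits / 8)


# V60: 16-bit data bus, 3-cycle (T1-T3) bus cycle, 16 MHz (assumed bus clock).
v60_bandwidth = peak_bus_bandwidth(16e6, 16, 3)
# V70: 32-bit data bus, 2-cycle (T1-T2) bus cycle, 20 MHz (assumed bus clock).
v70_bandwidth = peak_bus_bandwidth(20e6, 32, 2)

print(f"V60: {v60_bandwidth / 1e6:.2f} MB/s, V70: {v70_bandwidth / 1e6:.2f} MB/s")
```

Under these assumptions, the V70's peak comes to 40 MB/s against roughly 10.7 MB/s for the V60, a gain that comes from the doubled width, the shorter bus cycle, and the higher clock together.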

The V70 was used by Sega in its System Multi 32 and by Jaleco in its Mega System 32. (See the photo of the V70 mounted on the latter system's printed circuit board.)

JAXA embedded its variant of the V70, with the I-TRON RX616 operating system, in the Guidance Control Computer of the H-IIA carrier rockets, in satellites such as the Akatsuki (Venus Climate Orbiter), and the Kibo International Space Station (ISS) module. The H-IIA launch vehicles were deployed domestically, in Japan, although their payloads included satellites from foreign countries. As described in JAXA's LSI (MPU/ASIC) roadmap, this V70 variant is designated "32bit MPU (H32/V70)", whose development, probably including the testing (QT) phase, was "from the middle of 1980s to early 1990s". This variant was used until its replacement, in 2013, by the HR5000 64-bit, 25 MHz microprocessor, which is based on the MIPS64-5Kf architecture, fabricated by HIREC, whose development was completed around 2011.

"Space Environment Data Acquisition" for the V70 was done at the Kibo-ISS exposed facility.

V80
The V80 (μPD70832) was launched in the spring of 1989. By incorporating on-chip caches and a branch predictor, it was declared NEC's 486 by Computer Business Review. The performance of the V80 was two to four times that of the V70, depending on the application. For example, compared with the V70, the V80 had a 32-bit hardware multiplier that reduced the number of cycles required to complete an integer-multiplication machine instruction from 23 to 9. (For more detailed differences, see the hardware architecture section below.) The V80 was manufactured in a 0.8-micrometer CMOS process on a die area of 14.49 × 15.47 mm², implementing 980,000 transistors. It was packaged in a 280-pin PGA, and operated at 25 and 33 MHz with claimed peak performances of 12.5 and 16.5 MIPS, respectively. The V80 had separate 1 KB on-die caches for both instructions and data. It had a 64-entry branch predictor, to which a 5% performance gain was attributed. The launch prices of the V80 were cited as equivalent to $1200 for the 33 MHz model and $960 for the 25 MHz model. Supposedly, a 45 MHz model was scheduled for 1990, but it did not materialize.
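The quoted clock rates and MIPS ratings can be turned into a rough cycles-per-instruction figure for each family member. The Python sketch below uses only the numbers stated in this article (16 MHz / 3.5 MIPS for the V60, 20 MHz / 6.6 MIPS for the V70, 25 MHz / 12.5 MIPS and 33 MHz / 16.5 MIPS for the V80). Treating these vendor-claimed figures as directly comparable is an assumption, and the function name is hypothetical.

```python
def cycles_per_instruction(clock_mhz: float, mips: float) -> float:
    """Average clock cycles per instruction implied by a MIPS rating."""
    return clock_mhz / mips


# (clock in MHz, quoted MIPS) as stated in the text.
ratings = {
    "V60 @ 16 MHz": (16.0, 3.5),
    "V70 @ 20 MHz": (20.0, 6.6),
    "V80 @ 25 MHz": (25.0, 12.5),
    "V80 @ 33 MHz": (33.0, 16.5),
}

cpi = {name: cycles_per_instruction(mhz, mips) for name, (mhz, mips) in ratings.items()}
for name, value in cpi.items():
    print(f"{name}: {value:.2f} cycles per instruction")
```

On these figures, the implied average falls from about 4.6 cycles per instruction on the V60 to about 3.0 on the V70 and 2.0 on the V80 at both clock rates.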

The V80, with μPD72691 co-FPP and μPD71101 simple peripheral chips, was used for an industrial computer running the RX-UX832 real-time UNIX operating system and an X11-R4-based windowing system.

AFPP (co-FPP)
The Advanced Floating Point Processor (AFPP) (μPD72691) is a co-processor for floating-point arithmetic operations. The V60/V70/V80 themselves can perform floating-point arithmetic, but they are very slow because they lack hardware dedicated to such operations. In 1989, to compensate for the fairly weak floating-point performance of the V60/V70/V80, NEC launched this 80-bit floating-point co-processor for 32-bit single-precision, 64-bit double-precision, and 80-bit extended-precision operations according to IEEE 754 specifications. This chip had a performance of 6.7 MFLOPS, doing vector-matrix multiplication while operating at 20 MHz. It was fabricated using a 1.2-micrometer double-metal layer CMOS process, resulting in 433,000 transistors on an 11.6 × 14.9 mm² die. It was packaged in a 68-pin PGA. This co-processor connected to a V80 via a dedicated bus, and to a V60 or V70 via a shared main bus, which constrained peak performance.

Hardware architecture
The V60/V70/V80 shared a basic architecture. They had thirty-two 32-bit general-purpose registers, with the last three of them commonly used as stack pointer, frame pointer, and argument pointer, which matched the calling conventions of high-level language compilers well. The V60 and V70 have 119 machine instructions, with that number being extended slightly to 123 instructions for the V80. The instructions are of non-uniform length, between one and 22 bytes, and take two operands, both of which can be addresses in main memory. After studying the V60's reference manual, Paul Vixie described it as "a very VAX-ish arch, with a V20/V30 emulation mode (which[...] means it can run Intel 8086/8088 software)".

The V60–V80 have a built-in memory management unit (MMU) that divides a 4-GB virtual address space into four 1-GB sections, each section being further divided into 1,024 1-MB areas, and each area being composed of 256 4-KB pages. On the V60/V70, four registers (ATBR0 to ATBR3) store section pointers, but the area table entries (ATE) and page table entries (PTE) are stored in off-chip RAM. The V80 merged the ATE and ATBR registers, which are both on-chip, with only the PTEs stored in external RAM. This allows faster handling of translation lookaside buffer (TLB) misses by eliminating one memory read.
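The section/area/page hierarchy implies a fixed split of the 32-bit virtual address: 2 bits select one of four sections, 10 bits one of 1,024 areas, 8 bits one of 256 pages, and 12 bits the offset within a 4-KB page. The Python sketch below illustrates that arithmetic. Placing the section field in the most significant bits is an assumption that follows the natural hierarchical ordering, and the function name is hypothetical.

```python
from typing import Tuple

# Field widths derived from the sizes given in the text (2 + 10 + 8 + 12 = 32).
SECTION_BITS, AREA_BITS, PAGE_BITS, OFFSET_BITS = 2, 10, 8, 12


def split_virtual_address(va: int) -> Tuple[int, int, int, int]:
    """Split a 32-bit virtual address into (section, area, page, offset).

    Assumes the section occupies the top bits and the offset the bottom bits.
    """
    if not 0 <= va < 2 ** 32:
        raise ValueError("virtual address must fit in 32 bits")
    offset = va & ((1 << OFFSET_BITS) - 1)
    page = (va >> OFFSET_BITS) & ((1 << PAGE_BITS) - 1)
    area = (va >> (OFFSET_BITS + PAGE_BITS)) & ((1 << AREA_BITS) - 1)
    section = va >> (OFFSET_BITS + PAGE_BITS + AREA_BITS)
    return section, area, page, offset


section, area, page, offset = split_virtual_address(0x40100ABC)
print(section, area, page, hex(offset))
```

In this model, the section number would select one of ATBR0 to ATBR3, the area number would index the area table, and the page number would index the page table reached from that area table entry.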

The translation lookaside buffers on the V60/V70 are 16-entry fully associative with replacement done by microcode. The V80, in contrast, has a 64-entry 2-way set associative TLB with replacement done in hardware. TLB replacement took 58 cycles in the V70 and disrupted the pipelined execution of other instructions. On the V80, a TLB replacement takes only 6 or 11 cycles depending on whether the page is in the same area; pipeline disruption no longer occurs in the V80 because of the separate TLB replacement hardware unit, which operates in parallel with the rest of the processor.
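A 64-entry, 2-way set associative TLB has 32 sets of two entries each. The following Python model sketches how such a structure behaves. It is an illustration only: indexing by the low bits of the virtual page number and using least-recently-used replacement are assumptions, since the text states only that replacement is done in hardware, and the class name `TwoWayTLB` is hypothetical.

```python
from typing import List, Optional, Tuple


class TwoWayTLB:
    """Behavioural model of a set associative TLB (default: 64 entries, 2 ways)."""

    def __init__(self, entries: int = 64, ways: int = 2) -> None:
        self.ways = ways
        self.num_sets = entries // ways
        # Each set holds (virtual page number, physical frame number) pairs,
        # ordered from least to most recently used.
        self.sets: List[List[Tuple[int, int]]] = [[] for _ in range(self.num_sets)]

    def lookup(self, vpn: int) -> Optional[int]:
        """Return the cached frame number, or None on a TLB miss."""
        entries = self.sets[vpn % self.num_sets]  # assumed index function
        for i, (tag, pfn) in enumerate(entries):
            if tag == vpn:
                entries.append(entries.pop(i))  # mark as most recently used
                return pfn
        return None

    def insert(self, vpn: int, pfn: int) -> None:
        """Install a translation, evicting the least recently used way if full."""
        entries = self.sets[vpn % self.num_sets]
        entries[:] = [(t, p) for (t, p) in entries if t != vpn]
        if len(entries) >= self.ways:
            entries.pop(0)  # assumed LRU replacement policy
        entries.append((vpn, pfn))


tlb = TwoWayTLB()
tlb.insert(0, 100)
tlb.insert(32, 200)   # maps to the same set as page 0
tlb.lookup(0)         # touch page 0 so that page 32 becomes the LRU entry
tlb.insert(64, 300)   # third page in the same set evicts page 32
```

Compared with the 16-entry fully associative TLB of the V60/V70, a lookup in this arrangement only has to compare two tags per access, at the cost of possible conflicts between pages that map to the same set.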

All three processors use the same protection mechanism, with 4 protection levels set via a program status word, Ring 0 being the privileged level that could access a special set of registers on the processors.

All three models support a triple modular redundancy configuration with three CPUs used in a Byzantine fault-tolerance scheme with bus freeze, instruction retry, and chip replacement signals. The V80 added parity signals to its data and address buses.

String operations were implemented in microcode in the V60/V70; but these were aided by a hardware data control unit, running at full bus speed, in the V80. This made string operations about five times faster in the V80 than in the V60/V70.

All floating-point operations are largely implemented in microcode across the processor family and are thus fairly slow. On the V60/V70, the 32-bit floating-point operations take 120/116/137 cycles for addition/multiplication/division, while the corresponding 64-bit floating-point operations take 178/270/590 cycles. The V80 has some limited hardware assist for phases of floating-point operations—e.g. decomposition into sign, exponent, and mantissa—thus its floating-point unit was claimed to be up to three times as effective as that of the V70, with 32-bit floating-point operations taking 36/44/74 cycles and 64-bit operations taking 75/110/533 cycles (addition/multiplication/division).
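The cycle counts quoted above allow the per-operation improvement to be worked out directly. The Python sketch below computes the V60/V70-to-V80 cycle ratio for each operation and converts cycle counts into latency at a given clock rate. The table values are those stated in the text, while the dictionary layout and function names are hypothetical.

```python
# Cycle counts from the text, keyed by precision and operation.
V70_CYCLES = {
    "fp32": {"add": 120, "mul": 116, "div": 137},
    "fp64": {"add": 178, "mul": 270, "div": 590},
}
V80_CYCLES = {
    "fp32": {"add": 36, "mul": 44, "div": 74},
    "fp64": {"add": 75, "mul": 110, "div": 533},
}


def cycle_speedup(precision: str, op: str) -> float:
    """Ratio of V60/V70 cycles to V80 cycles for one operation."""
    return V70_CYCLES[precision][op] / V80_CYCLES[precision][op]


def latency_us(cycles: int, clock_mhz: float) -> float:
    """Latency in microseconds for a given cycle count and clock rate."""
    return cycles / clock_mhz


speedups = {
    (precision, op): cycle_speedup(precision, op)
    for precision in V70_CYCLES
    for op in V70_CYCLES[precision]
}
for (precision, op), ratio in speedups.items():
    print(f"{precision} {op}: {ratio:.2f}x fewer cycles on the V80")
```

The per-cycle ratios range from about 1.1 (64-bit division) to about 3.3 (32-bit addition), which is broadly consistent with the "up to three times" claim. The V80's higher clock rates (25 or 33 MHz against 20 MHz for the V70) would widen the gap further in wall-clock terms.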

Unix (non-real-time and real-time)
NEC ported several variants of the Unix operating system to its V60/V70/V80 processors for user-application-oriented systems, including real-time ones. The first flavor of NEC's UNIX System V port for the V60 was called PC-UX/V Rel 2.0 (V60). (Also refer to external link photos below.) NEC also developed a Unix variant with a focus on real-time operation to run on the V60/V70/V80. Called Real-time UNIX RX-UX 832, it has a double-layered kernel structure, with all task scheduling handled by the real-time kernel. A multiprocessor version of RX-UX 832 was also developed, named MUSTARD (Multiprocessor Unix for Embedded Real-Time Systems). The MUSTARD-powered computer prototype uses eight V70 processors. It utilizes the FRM function and can set and change the master and checker configuration upon request.

I-TRON (real-time)
For hardware-control-oriented embedded systems, the I-TRON-based real-time operating system, named RX616, was implemented by NEC for the V60/V70. The 32-bit RX616 was a continuous fork from the 16-bit RX116, which was for the V20-V50.

FlexOS (real-time)
In 1987, Digital Research, Inc. also announced that they were planning on porting FlexOS to the V60 and V70.

CP/M and DOS (legacy 16-bit)
The V60 could also run CP/M and DOS programs (ported from the V20-V50 series) using V20/V30 emulation mode. According to a 1991 article in InfoWorld, Digital Research was working on a version of Concurrent DOS for the V60 at some point; but this was never released, as the V60/V70 processors were not imported to the US for use in PC clones.

C/C++ cross-compilers
As part of its development tool kit and integrated development environment (IDE), NEC had its own C compiler, the PKG70616 "Software Generation tool package for V60/V70". In addition, GHS (Green Hills Software) made a native-mode C compiler (MULTI), and MetaWare, Inc. (currently Synopsys, via ARC International) made one for the V20/V30 (Intel 8086) emulation mode, called High C/C++. Cygnus Solutions (currently Red Hat) also ported GCC as part of an enhanced GNU compiler system (EGCS) fork, but it seems not to be public.

The processor-specific directory necv70 is still kept alive in the newlib C-language libraries (libc.a and libm.a) by Red Hat. Recent maintenance seems to be done on Sourceware.org. The latest source code is available from its git repository.

MV-4000 Ada 83–certified system
The Ada 83–certified "platform system" was named MV-4000, certified as "MV4000". This certification was done with a target system that utilized the real-time UNIX RX-UX 832 OS running on a VMEbus (IEEE 1014)–based system with a V70 processor board plugged in. The host of the cross compiler was an NEC Engineering Work Station EWS 4800, whose host OS, EWS-US/V, was also UNIX System V–based.

The processor received Ada-83 validation from AETECH, Inc., running the Ada Compiler Validation Capability tests.

Evaluation board kits
NEC released some plug-in evaluation board kits for the V60/V70.

On-chip software debug support with the IE-V60
NEC based its own full (non-ROM and non-JTAG) probe-based in-circuit emulator, the IE-V60, on the V60, because V60/V70 chips themselves had emulator-chip capabilities. The IE-V60 was the first in-circuit emulator for the V60 that was manufactured by NEC. It also had a PROM programmer function (Section 9.4, p. 205). NEC described it as a "user friendly software debug function". The chips have various trapping exceptions, such as a data read (or write) at a user-specified address, and two breakpoints simultaneously (Section 9).

External bus status pins
The external bus system indicates its bus status using 3 status pins, which provide three bits to signal such conditions as first instruction fetch after branch, continuous instruction fetch, TLB data access, single data access, and sequential data access (Section 6.1, p. 114).

Debugging with V80
These software and hardware debugging functions were also built into the V80. However, the V80 did not have an in-circuit emulator, possibly because the presence of such software as real-time UNIX RX-UX 832 and real-time I-TRON RX616 rendered such a function unnecessary. Once Unix boots up, there is no need for an in-circuit emulator for developing either device drivers or application software. What is needed is a C compiler, a cross compiler, and a screen debugger, such as GDB-Tk, that works with the target device.

HP 64758
Hewlett-Packard (currently Keysight) offered probing-pod-based in-circuit emulation hardware for the V70, built on its HP 64700 Series systems, the successor to the HP 64000 Series, specifically the HP 64758. It provides a trace function like that of a logic analyzer. This test equipment also displays disassembled code automatically, alongside the trace data and without needing an object file, and displays high-level language source code when the source code and the object files are provided and the object files have been compiled in DWARF format. An interface for the V60 (10339G) was also in the catalog, but the long probing-pod cable required "special grade qualified" devices, i.e. the high-speed grade V70.

Strategic failure of the V80 microarchitecture
In its development phase, the V80 was thought to have the same performance as the Intel 80486, but the two ended up differing in many respects. The internal execution of each V80 instruction needed at least two cycles, while that of the i486 required one. The internal pipeline of the V80 seemed to be buffered and asynchronous, but that of the i486 was synchronous. In other words, the internal microarchitecture of the V80 was CISC, but that of the i486 was RISC. Both of their ISAs allowed long non-uniform CISC instructions, but the i486 had a wider, 128-bit internal cache memory bus, while that of the V80 was 32 bits wide. This difference can be seen on their die photos. The design was fatal from the performance point of view, but NEC did not change it. NEC might have been able to rework the physical design while keeping the same register-transfer level, but it did not.

Lack of commercial success
The V60-V80 architecture did not enjoy much commercial success.

The V60, V70, and V80 were listed in the 1989 and 1990 NEC catalogs in their PGA packaging. An NEC catalog from 1995 still listed the V60 and V70, not only in their PGA version but also in QFP packaging, alongside their assorted chipsets. It also included a low-cost variant of the V60 named μPD70615, which eliminated V20/V30 emulation and the FRM function. The V80, however, was not offered in this catalog. The 1999 edition of the same catalog no longer had any V60-V80 products.

The V800 series
In 1992, NEC launched a new model, the V800 Series 32-bit microcontroller, but it did not have a memory management unit (MMU). It had a RISC-based architecture inspired by the Intel i960 and MIPS architectures, with a load–store design and instructions found in other RISC processors, such as JARL (Jump and Register Link).

At this time, the enormous software assets of the V60/V70, such as real-time Unix, were abandoned and never carried over to their successors, a scenario Intel avoided.

The V800 Series had 3 major variants, the V810, V830, and V850 families.

The V820 (μPD70742) was a simple variant of the V810 (μPD70732), but with peripherals.

The designation V840 may have been skipped because of Japanese tetraphobia (see page 58). One Japanese pronunciation of "4" (shi) also means "death", so names that evoke it tend to be avoided; for example, "Shi-ban" (number 4) recalls the Shi-ban bug (死番虫, precisely "deathwatch beetle").

As of 2005, it was already the V850 era, and the V850 family has been enjoying great success. As of 2018, the products are called the Renesas V850 family and the RH850 family, with V850/V850E1/V850E2 and V850E2/V850E3 CPU cores, respectively. Those CPU cores have extended the ISA of the original V810 core and are supported by the V850 compiler.

MAME
Because the V60/V70 had been used for many Japanese arcade games, MAME (for "Multiple Arcade Machine Emulator"), which emulates multiple old arcade games for enthusiasts, includes a CPU simulator for their instruction set architecture. It is a kind of instruction set simulator, intended not for developers but for users.

It has been maintained by the MAME development team. The latest open-source code, written in C++, is available from the GitHub repository. The operation codes in the file optable.hxx are exactly the same as those of the V60.