WD16

The WD16 is a 16-bit microprocessor introduced by Western Digital in October 1976. It is based on the MCP-1600 chipset, a general-purpose design that was also used to implement the DEC LSI-11 low-end minicomputer and the Pascal MicroEngine processor. The three systems differed primarily in their microcode, giving each system a unique instruction set architecture (ISA).

The WD16 implements an extension of the PDP-11 instruction set architecture but is not machine code compatible with the PDP-11. The instruction set and microcoding were created by Dick Wilcox and Rich Notari. The WD16 is an example of orthogonal CISC architecture. Most two-operand instructions can operate memory-to-memory with any addressing mode and some instructions can result in up to ten memory accesses.

The WD16 is implemented in five 40-pin DIP packages. Maximum clock speed is 3.3 MHz. Its interface to memory is via a 16-bit multiplexed data/address bus.

The WD16 is best known for its use in Alpha Microsystems' AM-100 and AM-100/T processor boards. A prototype was demonstrated in 1977. As of 1981 there were at least 5,000 Alpha Micro computers based on the WD16. As late as 1982, WD16-based Alpha Micros were still being characterized as "supermicros." The WD16 was superseded by the Motorola 68000 in June 1982.

Data formats
The smallest unit of addressable and writable memory is the 8-bit byte. Bytes can also be held in the lower half of registers R0 through R5.

16-bit words are stored little-endian with least significant bytes at the lower address. Words are always aligned to even memory addresses. Words can be held in registers R0 through R7.

32-bit double words can only be stored in register pairs with the lower word being stored in the lower-numbered register. 32 bit values are used by MUL, DIV and some rotate and arithmetic shift instructions.

Floating point values are 48 bits long and can only be stored in memory. This format is half-way between single and double precision floating point formats. They are stored an unusual middle-endian format sometimes referred to as "PDP-endian." Floating point values are always aligned to even addresses. The first word contains the sign, exponent, and high byte of the mantissa. The next higher address contains the middle two bytes of the mantissa, and the next higher address contains the lowest two bytes of the mantissa. The complete format is as follows:

1. A 1 bit sign for the entire number which is zero for positive.

2. An 8-bit base-two exponent in excess-128 notation with a range of +127, -128. The only legal number with an exponent of -128 is true zero (all zeros).

3. A 40 bit mantissa with the MSB implied.

Memory management
The WD16's 16-bit addresses can directly access 64 KB of memory. The WD16 does not offer any inherent memory management or protection. In the AM-100 application, the last 256 memory locations are mapped to port space. As most AM-100 computers were used as multi-user computers, the memory would usually be expanded past 64K with bank switching. Although the AM-100 could be configured for up to 22 users and 512 Kilobytes of RAM, a typical memory configuration for a 9-user AM-100 might be in the range of 352 Kilobytes. In 1981 an optional AM-700 memory management unit was offered for the AM-100/T which allowed memory segmentation in 256 byte increments.

CPU registers
The CPU contains eight general-purpose 16-bit registers, R0 to R7. The registers can be used for any purpose with these exceptions: Register R7 is the program counter (PC). Although any register can be used as a stack pointer, R6 is the stack pointer (SP) used for hardware interrupts and traps. R0 is the count for the block transfer instructions.

Addressing modes
Most instructions allocate six bits to specify each operand. Three bits select one of eight addressing modes and three bits select a general register. The encoding of the six bit operand addressing mode is as follows:

In the following sections, each item includes an example of how the operand would be written in assembly language. Rn means one of the eight registers, written R0 through R7.

General register addressing modes
The following eight modes can be applied to any general register. Their effects when applied to R6 (the stack pointer, SP) and R7 (the program counter, PC) are set out separately in the following sections.

In index and index deferred modes, X is a 16-bit value taken from a second word of the instruction. In double-operand instructions, both operands can use these modes. Such instructions are three words long.

Autoincrement and autodecrement operations on a register are by 1 in byte instructions, by 2 in word instructions, and by 2 whenever a deferred mode is used, since the quantity the register addresses is a (word) pointer.

Program counter addressing modes
When R7 (the program counter) is specified, four of the addressing modes naturally yield useful effects:

There are only two common uses of absolute mode, whose syntax combines immediate and deferred mode. The first is accessing the reserved processor locations at 0000-003F. The other is to specify input/output registers in port space, as the registers for each device have specific memory addresses. Relative mode has a simpler syntax and is more typical for referring to program variables and jump destinations. A program that uses relative mode (and relative deferred mode) exclusively for internal references is position-independent; it contains no assumptions about its own location, so it can be loaded into an arbitrary memory location, or even moved, with no need for its addresses to be adjusted to reflect its location. In computing such addresses relative to the current location, the processor performs relocation on the fly.

Immediate and absolute modes are merely autoincrement and autoincrement deferred mode, respectively, applied to PC. When the auxiliary word is in the instruction, the PC for the next instruction is automatically incremented past the auxiliary word. As PC always points to words, the autoincrement operation is always by a stride of 2.

Stack addressing modes
R6, also written SP, is used as a hardware stack for traps and interrupts. A convention enforced by the set of addressing modes the WD16 provides is that a stack grows downward—toward lower addresses—as items are pushed onto it. When a mode is applied to SP, or to any register the programmer elects to use as a software stack, the addressing modes have the following effects:

Although software stacks can contain bytes, SP always points to a stack of words. Autoincrement and autodecrement operations on SP are always by a stride of 2.

Instruction set
Most of the WD16 instructions operate on bytes and words. Bytes are specified by a register number—identifying the register's low-order byte—or by a memory location. Words are specified by a register number or by the memory location of the low-order byte, which must be an even number. All opcodes and addresses are expressed in hexadecimal.

Double-operand instructions
The high-order four bits specify the operation to be performed. Two groups of six bits specify the source operand addressing mode and the destination operand addressing mode, as defined above.

Some two-operand instructions utilize an addressing mode for one operand and a register for the second operand:

The high-order seven bits specify the operation to be performed, six bits specify the operand addressing mode and three bits specify a register or register pair. Where a register pair is used (written below as "Reg+1:Reg") Reg contains the low-order portion of the operand. The next higher numbered register contains the high-order portion of the operand (or the remainder).

Single-operand instructions
The high-order ten bits specify the operation to be performed, with bit 15 generally selecting byte versus word addressing. A single group of six bits specifies the operand as defined above.

Single-operand short immediate instructions
The high-order seven bits and bits 5 and 4 specify the operation to be performed. A single group of three bits specifies the register. A four bit count field contains a small immediate or a count. In all cases one is added to this field making the range 1 through 16.

Floating point instructions
The high-order eight bits specify the operation to be performed. Two groups of four bits specify the source and destination addressing mode and register. If field I = 0, designated register contains the address of the operand, the equivalent of addressing mode (Rn). If field I = 1, designated register contains the address of the address of the operand, the equivalent of addressing mode @0(Rn).

Block transfer instructions
The high-order ten bits specify the operation to be performed. Two groups of three bits specify the source and destination registers. In all cases the source register contains the address of the first word or byte of memory to be moved, and the destination register contains the address of the first word or byte of memory to receive the data being moved. The number of words or bytes being moved is contained in R0 as a unsigned integer. The count ranges from 1–65536. These instructions are fully interruptible.

Branch instructions
The high-order byte of the instruction specifies the operation. The low-order byte is a signed word offset relative to the current location of the program counter. This allows for forward and reverse branches in code. Maximum branch range is +128, -127 words from the branch op code.

In most branch instructions, whether the branch is taken is based on the state of the condition codes. A branch instruction is typically preceded by a two-operand CMP (compare) or BIT (bit test) or a one-operand TST (test) instruction. Arithmetic and logic instructions also set the condition codes. In contrast to Intel processors in the x86 architecture, MOV instructions set them too, so a branch instruction could be used to branch depending on whether the value moved was zero or negative.

The limited range of the branch instructions meant that as code grows, the target addresses of some branches may become unreachable. The programmer would change the one-word Bcc to the two-word JMP instruction. As JMP has no conditional forms, the programmer would change the Bcc to its opposite sense to branch around the JMP.

SOB (Subtract One and Branch) is another conditional branch instruction. The specified register is decremented by 1, and if the result is not zero, a reverse branch is taken based on the 6-bit word offset.

Subroutine instructions
The JSR instruction can save any register on the stack. Programs that do not need this feature specify PC as the register (JSR PC, address) and the subroutine returns using RTN PC. If a routine were called with, for example, "JSR R4, address", then the old value of R4 would be saved on the top of the stack and the return address (just after JSR) would be in R4. This lets the routine gain access to values coded in-line by specifying (R4)+ or to in-line pointers by specifying @(R4)+. The autoincrementation moves past these data, to the point at which the caller's code resumes. Such a routine would have to specify RTN R4 to return to its caller.

The JSR PC,@(SP)+ form can be used to implement coroutines. Initially, the entry address of the coroutine is placed on the stack and from that point the JSR PC,@(SP)+ instruction is used for both the call and the return statements. The result of this JSR instruction is to exchange the contents of the PC and the top element of the stack, and so permit the two routines to swap control and resume operation where each was terminated by the previous swap.

Single register instructions
These instructions have a 13 bit opcode and a three bit register argument.

Supervisor calls
These instructions are used to implement operating system (supervisor) calls. All have a six bit register argument. SVCB and SVCC are designed so an argument to the operating system can use most of the addressing modes supported by the native instruction set.

Condition-code operations
The four condition codes in the processor status word (PSW) are
 * N indicating a negative value
 * Z indicating a zero (equal) condition
 * V indicating an overflow condition, and
 * C indicating a carry condition.

Reserved low-memory locations
Memory locations between 0000 and 003F have fixed functions defined by the processor. All addresses below are word addresses.

Performance
WD16 processor speed varies by clock speed, memory configuration, op code, and addressing modes. Instruction timing has up to three components, fetch/execute of the instruction itself and access time for the source and the destination. The last two components depend on the addressing mode. For example, at 3.3 MHz, an instruction of the form ADD x(Rm),y(Rn) has a fetch/execute time of 3.3 microseconds plus source time of 2.7 microseconds and destination time of 3.0 microseconds, for a total instruction time of 9.0 microseconds. The register-to-register ADD Rm,Rn executes in 3.3 microseconds. Floating point is significantly slower. A single-and-a-half precision (48 bit) floating add instruction typically ranges from 54 to 126 microseconds. The WD16's precision is a compromise between traditional single and double precision floats.

For contrast, the fastest PDP-11 computer at the time was the PDP-11/70. An instruction of the form ADD x(Rm),y(Rn) has a fetch/execute time of 1.35 microseconds plus source and destination times of 0.6 microseconds each, for a total instruction time of 2.55 microseconds. Any case where addressed memory was not in the cache adds 1.02 microseconds. The register-to-register ADD Rm,Rn could execute from the cache in 0.3 microseconds. A single-precision floating add instruction executed by the FP11-C co-processor could range from 0.9 to 2.5 microseconds plus time to fetch the operands which could range up to 4.2 microseconds.

The WD16 block transfer instructions approximately double the speed of moves and block I/O. A word moved with  instructions takes 9.6 microseconds per iteration. The  equivalent takes 4.8 microseconds per iteration.

The Association of Computer Users performed a series of benchmarks on an AM-100T-based system costing $35,680. They found that their CPU-bound benchmark executed in 31.4 seconds on the AM-100T compared to 218 seconds for the average single user system in the $15,000 to $25,000 price range. In a group of multi-user computers priced between $25,000 and $50,000, the AM-100T was in the "upper third" for speed.

The Creative Computing Benchmark of May 1984 placed the WD16 (in the AM-100T application) as number 34 out of 183 machines tested. The elapsed time was 10 seconds, compared to 24 seconds for the IBM PC.

Emulator
Virtual Alpha Micro is an open source WD16 emulator. Written in C, it emulates the WD16 processor and the Alpha Micro AM-100 hardware environment. The author claims it runs on Linux (including Raspberry Pi), Windows, and Macintosh desktops, though no binaries are provided. It will run the Alpha Micro Operating System (AMOS) and all associated programs. In 2002, Alpha Micro granted limited permission to distribute AMOS 4.x or 5.0 binaries including the manuals for hobby use only.