R800

The R800 is the central processing unit used in the MSX Turbo-R home computer. It was designed by ASCII Corporation of Japan and built by Mitsui & Co. The goal was a modern, pipelined CPU that was binary compatible with the Z80, and therefore with MSX software, while also maintaining compatibility with older Z80-based MSX hardware.

Compatibility
During the development of the MSX Turbo-R, ASCII Corporation considered various processors as candidates, both compatible and incompatible with the Z80. At that time, Kazuya Kishioka (岸岡和也), a company employee, was researching and developing an ASIC that was a high-speed version of the Z80, largely customized for the MSX architecture.

For compatibility with older MSX software, the R800 uses the same instruction set as the Z80, with a few minor but useful additions, such as the multiplication instructions MULUB (8×8-bit) and MULUW (16×16-bit). Also, many of the undocumented Z80 instructions were made official, including all the opcodes for instructions that treat IX and IY as pairs of 8-bit registers (IXH, IXL, IYH, IYL).
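
The arithmetic performed by the two multiplication instructions can be sketched in Python. This is only an illustration of the operand and result widths, assuming unsigned operands (as the mnemonics suggest) and full-width products (16 and 32 bits); the function names are hypothetical, and register placement and flag effects are not modelled.

```python
from typing import Tuple


def mulub(a: int, r: int) -> int:
    """Model of an 8x8-bit multiply: returns the 16-bit product (assumed width)."""
    if not (0 <= a <= 0xFF and 0 <= r <= 0xFF):
        raise ValueError("operands must fit in 8 bits")
    return (a * r) & 0xFFFF


def muluw(x: int, y: int) -> Tuple[int, int]:
    """Model of a 16x16-bit multiply: returns (high word, low word) of the 32-bit product."""
    if not (0 <= x <= 0xFFFF and 0 <= y <= 0xFFFF):
        raise ValueError("operands must fit in 16 bits")
    product = x * y
    return (product >> 16) & 0xFFFF, product & 0xFFFF


if __name__ == "__main__":
    print(hex(mulub(0xFF, 0xFF)))
    print([hex(word) for word in muluw(0xFFFF, 0xFFFF)])
```

On a plain Z80 the same products have to be built from shift-and-add loops, which is why a hardware multiply is a useful addition even though it is a small one.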

As the R800 is not based directly on the Z80, but stems from the Z800 family, it lacks some of the other undocumented Z80 features. For instance, the undocumented flags in bits 3 and 5 of the F register do not take the same values as on the Z80 (causing the R800 to fail the ZEXALL tests), and the undocumented opcode often called SLL is just an alias of the SLA instruction.

Hardware changes
Being a much newer design, the R800 implementation was quite different from the old Z80. The changes were similar to those in the Z800, Z280, Z380 and eZ80 lines of Z80-compatible processors. The original Z80 uses an unusual 4-bit ALU internally, a solution actually able to compete with similar CPUs using fully hardwired 8-bit ALU logic (such as its immediate precursor, the Intel 8080). However, the R800 designers implemented a full 16-bit ALU in order to keep up with its more heavily pipelined execution. Instructions like ADD HL,BC, which take 11 clock cycles on the Z80, can in some situations execute in as little as one bus cycle (1-2 clocks) on the R800, due to the degree of pipelining made possible by this full-width ALU. The maximum CPU clock speed used on this new MSX was 14.32 MHz, four times as fast as the 3.57 MHz used in the older MSX, while the bus clock was increased to 7.16 MHz. The data bus remained 8 bits wide to maintain compatibility with old hardware.
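
To put these figures in perspective, the short Python sketch below converts the cycle counts quoted above into wall-clock time. It takes the best case stated in the text (one bus cycle at 7.16 MHz) at face value; real code rarely sustains it, so the resulting ratio is an upper bound for illustration, not a benchmark. The constant and function names are hypothetical.

```python
Z80_CLOCK_MHZ = 3.57   # CPU clock of the older MSX machines
R800_BUS_MHZ = 7.16    # bus clock of the MSX Turbo-R


def duration_us(cycles: int, clock_mhz: float) -> float:
    """Time in microseconds taken by `cycles` cycles of a clock running at `clock_mhz`."""
    return cycles / clock_mhz


# ADD HL,BC: 11 clock cycles on the Z80, as little as one bus cycle on the R800.
z80_add_us = duration_us(11, Z80_CLOCK_MHZ)
r800_add_us = duration_us(1, R800_BUS_MHZ)

if __name__ == "__main__":
    print(f"Z80:  {z80_add_us:.2f} us")
    print(f"R800: {r800_add_us:.2f} us (best case)")
    print(f"best-case speed-up: about {z80_add_us / r800_add_us:.0f}x")
```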

Fetching opcodes
Additional changes were made in the way the CPU fetches opcodes. The original Z80 uses two cycles to fetch a simple instruction like OR A, plus two cycles for refresh. An additional waitstate is issued on the MSX architecture. A review of the fetch mechanism in a typical MSX environment helps in explaining the R800:


 * Z80, cycle 1: set the higher 8 bits of the address
 * Z80, cycle 2: set the lower 8 bits of the address
 * Z80, cycle 3: waitstate
 * Z80, cycle 4: refresh, part 1
 * Z80, cycle 5: refresh, part 2

Since most implementations of MSX use RAM arranged as a 256×256-byte block, two cycles are required to set the address for the fetch. The R800 avoids this by remembering the last known state of the higher 8 bits. If the next instruction lies within the same 256-byte boundaries, the higher 8 bits are not set again, and a cycle is saved. However, on the Z80, the refresh cycles destroy the information on the higher bits, so a workaround was needed.

The solution used in the R800 was to refresh entire blocks of RAM, instead of refreshing one line of RAM on each instruction issued. Every 30 μs, the CPU is halted for 4 μs, and this time is used to refresh a block of RAM. Since there is no refresh between instruction fetches, and the waitstate is removed thanks to faster RAM chips, simple instructions can be issued using only one cycle. This cycle corresponds to cycle 2 in the Z80 example above; cycle 1 becomes optional, and is only issued when the program crosses a 256-byte boundary.
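
The fetch scheme described above can be approximated with a small Python model that counts fetch cycles for a sequence of opcode addresses. It is a deliberately simplified sketch: it assumes one single-byte instruction per address, ignores operand reads and the periodic refresh pause, and assumes the very first fetch must always set the higher address byte. The function names are hypothetical.

```python
from typing import Optional, Sequence

# Per the list above: two address cycles, one waitstate, two refresh cycles.
Z80_MSX_FETCH_CYCLES = 5


def z80_fetch_cycles(addresses: Sequence[int]) -> int:
    """Every fetch on a Z80-based MSX costs the same five cycles."""
    return Z80_MSX_FETCH_CYCLES * len(addresses)


def r800_fetch_cycles(addresses: Sequence[int]) -> int:
    """One cycle per fetch, plus one extra whenever the higher address byte changes."""
    last_high: Optional[int] = None  # assumption: nothing is latched before the first fetch
    cycles = 0
    for address in addresses:
        high = (address >> 8) & 0xFF
        if high != last_high:
            cycles += 1              # optional cycle: set the higher 8 bits
            last_high = high
        cycles += 1                  # mandatory cycle: set the lower 8 bits and fetch
    return cycles


if __name__ == "__main__":
    program = [0x4000, 0x4001, 0x40FF, 0x4100]  # the last fetch crosses a 256-byte boundary
    print(z80_fetch_cycles(program), r800_fetch_cycles(program))
```

In this model, straight-line code that stays within one 256-byte region costs close to one cycle per fetch, while a jump or fall-through into another region pays the extra cycle once.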

External hardware
All this applies only to the fast RAM used on the MSX Turbo-R. External hardware, connected through the cartridge slots, uses timings similar to those of the Z80. Not even the internal ROM of the Turbo-R is fast enough for this fetch scheme, so additional chips on the Turbo-R can mirror the contents of the ROM into RAM in order to make it run faster.