CSG 65CE02

The CSG 65CE02 is an 8/16-bit microprocessor developed by Commodore Semiconductor Group in 1988. It is a member of the MOS Technology 6502 family, developed from the CMOS WDC 65C02 released by the Western Design Center in 1983.

Like the 65C02, the 65CE02 was built on a 2 µm CMOS process instead of the original 6502's 8 µm NMOS technology, making the chip smaller (and thus less expensive) and far less power-hungry. In addition to the changes made in the 65C02, the 65CE02 improved the processor pipeline so that one-byte instructions complete in 1 cycle, rather than the minimum of 2 cycles required by the 6502 and most of its variants. It also removed the 1-cycle delay incurred when crossing page boundaries. These changes improved performance by as much as 25% at the same clock speed.

Other changes included the addition of a third index register, Z, along with the addition and modification of a number of instructions to use this register. The zero page, the first 256 bytes of memory that were used as pseudo-registers, could now be moved to any page in main memory using the B (base page) register. The stack pointer was widened from 8 to 16 bits using a similar page register, SPH (stack pointer high), allowing the stack to be moved out of page one and to grow to larger sizes.

The 65CE02 was the basis for the system on a chip CSG 4510 that was developed for the unreleased Commodore 65. The 65CE02 was later used for the A2232 serial port card for the Amiga computer. It appears to have seen no other use.

Background
By the late 1970s, the original MOS Technology team that designed the 6502 had broken up. Bill Mensch had moved to Arizona and set up the Western Design Center (WDC) to provide 6502-based design services. Around 1981, the main licensees of the 6502 design, Rockwell Semiconductor, GTE and Signetics, began a redesign effort with Mensch that led to the WDC 65C02. This was mainly a CMOS implementation of the original NMOS 6502 that used 10 to 20 times less power, but it also included a number of new instructions to help improve the code density in certain applications. New instructions included INA/DEA to increment and decrement the accumulator, STZ to write a zero to a memory location, and BRA, which was a jump with a branch-style 1-byte relative address. The 65C02 also fixed a number of minor bugs in the original 6502 design.

The original 6502 was designed in the era before microcomputers existed, when microprocessors were used as the basis for simpler systems like smart terminals, desktop calculators and many different industrial controller systems. This was also an era when memory devices were generally based on static RAM, which was very expensive and had low memory density. For both of these reasons, the ability to handle "large" amounts of memory was not required, and many processors had operating modes that worked with small portions of a larger address space in order to offer higher performance. Such was the case in the 6502, which used the first memory page, or "zero page", to provide faster access, and the second page, "page one", to hold a 256-byte stack.

By the 1980s, these assumptions were no longer valid. Many machines based on these processors now shipped with the maximum 64 kB that the 6502 could address, using the far less expensive and denser dynamic RAM. The speed advantages of the zero page addressing mode remained, but they now existed within a memory space that was dramatically larger. Likewise, the single-page call stack was now a pittance within the overall memory, and high-level languages that made prodigious use of stack space could not easily run on the 6502.

New features
The 65CE02 is a further improved version of the 65C02 which expands the memory model to make it more suitable for a system with large amounts of main memory. To do this, it adds the following new features:


 * The 65CE02 adds an 8-bit B register, for Base Page, that offsets the zero page to any location in memory. B is set to zero on power-up or reset, so the 65CE02 initially works exactly like the 6502. If a value is placed into the B register using TAB (Transfer A to B), the zero page moves to the new location. A significant use of this feature is to allow small routines that fit within the 256 bytes of a page to use zero-page addressing (now known as base page addressing). This makes the code smaller, because addresses no longer need a second byte, and faster, because that second byte does not have to be fetched from memory.


 * The 65CE02 also extends the stack from the original 256 bytes of page one to, in theory, the entire address space. It does this by adding another 8-bit register, SPH, for Stack Pointer High. Normally this works like B, offsetting the base address of the stack from page one to any selected page. The stack otherwise continues to work as before, with a maximum size of one page, 256 bytes. On startup or reset, SPH is set to 01 so that the stack works exactly as it does on the 65C02.


 * When 16-bit mode is selected through the new "stack extend" bit in the status register, using the new CLE/SEE instructions, the stack pointer becomes a true 16-bit value. The value in SPH is combined, as the high byte, with the value in the original SP, now known as SPL for Stack Pointer Low, to produce a 16-bit stack pointer. This allows the stack to grow much larger than the original 256 bytes, which was too small for high-level languages.


 * This means there are two types of stacks, a 256-byte one that can be anywhere, or a 16-bit one spanning memory. While the latter is more flexible, it does mean that accesses into the stack have to construct a 16-bit address from the two registers, taking an extra cycle, and thus slowing overall performance. Using the smaller stack, where possible, offers better performance.


 * The 65CE02 also adds a new index register, Z. This is set to zero on startup or reset, meaning that its store-Z-to-memory instruction, STZ, works just like it does in the 65C02, where the same instruction means store-zero-to-memory. This allows unmodified 65C02 code to run on the 65CE02. A number of other instructions are added or modified to allow access to the Z register. Among these are LDZ to load the value from memory, TAZ/TZA to transfer the value to or from the accumulator, PHZ/PLZ to push and pull Z to the stack, INZ/DEZ for increment and decrement, and CPZ to compare the value in Z to a value in memory.


 * The 65C02 added BRA, Branch Always, which was essentially a JMP that used a branch-style 8-bit relative address instead of an absolute 16-bit address. For unknown reasons, the 65CE02 changed the mnemonic to BRU (Branch Unconditionally). It also added the BSR instruction, Branch to SubRoutine, which is the relative-addressing counterpart of JSR, Jump to SubRoutine.


 * In addition, the 65CE02 added 16-bit addressing, or "word relative", to all of the existing branch instructions. Previously, branches could only move backward 128 locations or forward 127, based on a signed 8-bit value, the "relative address". In the 65CE02, the range can be from -32768 to +32767 locations, by following the branch opcode with a 16-bit value. Previously, to perform a "long branch" one normally had to use a JMP to the 16-bit target and then branch over those three bytes when the jump was not wanted. For instance, to branch to address $1234 if the accumulator is zero, one would write BNE +3 followed by JMP $1234, which skips over the 3-byte JMP if the accumulator is not zero. In the 65CE02 this can be reduced to something like BEQ $1234, thereby making the code more obvious, removing two bytes of instructions, and eliminating the cycles spent fetching and running the extra branch. However, as it still uses relative addressing, the relative address has to be calculated from the label by the programmer or assembler when converting to machine code.


 * Another addition to the system was a number of "word" instructions that carry out operations on 16-bit data. These include INW/DEW to increment and decrement a 16-bit value in memory, and ASW/ROW to perform an Arithmetic Shift (left) Word or ROtate (left) Word.


 * More minor changes include the addition of ASR to perform an arithmetic (signed) right shift (the 6502 only had a logical, or unsigned, right shift), a NEG instruction which performs a two's complement negation on the accumulator, and RTN, a variation on RTS (ReTurn from Subroutine) that returns to an address offset into the stack instead of at the top, avoiding the need to explicitly pull off anything the routine added while it ran. The system also added a new addressing mode that used a base address on the stack as the basis for indirect addressing.


 * Finally, the new four-byte AUG instruction was added for future expansion. Although the data-sheet is not clear on its ultimate purpose, it appears to be a placeholder intended to allow instructions to be passed to co-processor units, like a memory management unit.
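The address arithmetic behind several of these features can be sketched in a few lines of Python. This is only an illustrative model, not a description of the chip's internal logic: the function names are hypothetical, and the `base` argument of `encode_relative` (the address from which the processor measures a branch offset) is left to the caller, because the exact reference point for word-relative branches should be taken from the data sheet.

```python
def base_page_address(b, offset):
    """Effective address of a base page access: B supplies the high byte."""
    return ((b & 0xFF) << 8) | (offset & 0xFF)


def push_one_byte(sph, spl, extended):
    """Return (SPH, SPL) after one byte has been pushed.

    With the 8-bit stack only SPL changes, so the stack wraps inside the
    page selected by SPH. With the 16-bit stack the two registers act as
    one pointer and a borrow from SPL carries into SPH.
    """
    if extended:
        sp = ((((sph & 0xFF) << 8) | (spl & 0xFF)) - 1) & 0xFFFF
        return sp >> 8, sp & 0xFF
    return sph & 0xFF, (spl - 1) & 0xFF


def encode_relative(target, base, bits):
    """Two's complement branch operand of the given width (8 or 16 bits).

    `base` is the address the offset is measured from (an assumption
    supplied by the caller). Raises ValueError if the target is out of range.
    """
    offset = target - base
    low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not low <= offset <= high:
        raise ValueError(f"offset {offset} does not fit in {bits} bits")
    return offset & ((1 << bits) - 1)


if __name__ == "__main__":
    print(hex(base_page_address(0x00, 0x10)))       # 0x10, B = 0 after reset
    print(hex(base_page_address(0x20, 0x10)))       # 0x2010, base page moved
    print(push_one_byte(0x01, 0x00, False))         # (1, 255), wraps in page
    print(push_one_byte(0x01, 0x00, True))          # (0, 255), crosses page
    print(hex(encode_relative(0x1234, 0x1002, 16))) # 0x232
```

In this model a target that is 0x232 bytes away cannot be encoded with an 8-bit operand, which is the situation that forced the branch-over-JMP idiom on the earlier processors, while the 16-bit form encodes it directly.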

Pipeline improvements
A major oddity of the original 6502 was that one-byte instructions like INX still took two cycles to complete. This allowed for simplifications in the pipeline system; the next byte from memory was fetched while the operation was being decoded, meaning the next byte was fetched no matter what. For most instructions, this byte would be part (or whole) of an operand, which could then be immediately fed into the now-decoded instruction.

If the instruction required only one byte, the processor still read the following byte as it decoded the first. In this case the next byte was the following instruction, but it had no way to feed that back into the first stage of the pipeline to decode it. The fetched instruction was instead discarded and re-read to feed it into the decoder. This wastes a cycle. Although this led to a number of instructions being slower than they could have been, this "feature" was retained in the 65C02, although whether this was in order to retain its pipeline's simplicity or its cycle timing is not explained in available sources.

Maintaining cycle compatibility was not a requirement for the 65CE02, and new fabrication processes made the extra circuitry in the pipeline a non-issue, so the pipeline was re-arranged to correctly handle one-byte instructions in a single cycle. These improvements allow the 65CE02 to execute code up to 25% faster than previous 65xx models.
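A rough back-of-the-envelope calculation shows how saving one cycle per one-byte instruction could add up to a figure of this size. The sketch below is a simplification under stated assumptions: the function name and the instruction mix are hypothetical, and every instruction other than the one-byte ones is assumed to keep its original timing, which may not hold on the real chip.

```python
def estimated_speedup(total_cycles, one_byte_instructions):
    """Ratio of old to new run time when each one-byte instruction
    drops from 2 cycles to 1 and nothing else changes (an assumption)."""
    new_cycles = total_cycles - one_byte_instructions
    return total_cycles / new_cycles


if __name__ == "__main__":
    # Hypothetical mix: a 100-cycle routine containing 20 one-byte
    # instructions (40 of the 100 cycles) shrinks to 80 cycles.
    print(estimated_speedup(100, 20))  # 1.25
```

Under these assumptions, code would have to spend about 40% of its cycles in one-byte instructions to reach the full 25% gain from this change alone, which is consistent with the source's wording of "up to" 25%.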

A further improvement concerns addressing modes that add values together to produce a final address. Examples include absolute indexed addressing, where the value in one of the index registers is added to a base address and the instruction is then applied to the resulting address. In the original 6502, if the addition of the two values crossed a page boundary, which occurs every 256 locations, an extra cycle was needed to produce the final address value. The 65CE02 removed this limitation, thereby improving the performance of these commonly used modes.
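The page-crossing condition can be sketched as a comparison of high bytes. The helper names below are hypothetical, and the example uses the documented 6502 timing for an absolute,X load (4 cycles, plus 1 when a page is crossed) purely as an illustration; no 65CE02 cycle counts are assumed.

```python
def crosses_page(base, index):
    """True when base + index lands in a different 256-byte page than base."""
    return ((base + index) & 0xFF00) != (base & 0xFF00)


def lda_abs_x_cycles_6502(base, x):
    """Cycle count of LDA absolute,X on the original NMOS 6502."""
    return 4 + (1 if crosses_page(base, x) else 0)


if __name__ == "__main__":
    print(lda_abs_x_cycles_6502(0x2000, 0x20))  # 4, stays in page $20
    print(lda_abs_x_cycles_6502(0x20F0, 0x20))  # 5, result is in page $21
```

The penalty arises because the low byte of the sum is available first and the high byte needs a further correction step only when the addition carries.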

Physical details
The 65CE02 is fabricated using 2 µm CMOS technology, allowing for lower power operation compared to previous NMOS and HMOS versions of the 65xx family. It is housed in a 40-pin DIP that is pin compatible with the 6502.

CSG 4510
The 4510 is a system in package (SiP) variant of the 65CE02 that includes two 6526 CIA I/O port controllers and a custom MMU to expand the address space to 20 bits (1 megabyte). It is housed in an 84-pin PLCC.

The 4510 was used in the unreleased Commodore 65 home computer and the unreleased Commodore CDTV cost-reduced revision.

Applications
The 65CE02 was used in the Commodore A2232 serial port card for the Amiga computer.