Hack computer

The Hack computer is a theoretical computer design created by Noam Nisan and Shimon Schocken and described in their book, The Elements of Computing Systems: Building a Modern Computer from First Principles. In using the term “modern”, the authors refer to a digital, binary machine that is patterned according to the von Neumann architecture model.

The Hack computer is intended for hands-on virtual construction in a hardware simulator application as part of a basic, but comprehensive, course in computer organization and architecture. One such course, created by the authors and delivered in two parts, is freely available as a massive open online course (MOOC) called Build a Modern Computer From First Principles: From Nand to Tetris. In the twelve projects included in the course, learners start with a two-input Nand gate and end up with a fully operational virtual computer, including both hardware (memory and CPU) and software (assembler, VM, Java-like programming language, and OS). In addition to the hardware simulator used for the initial implementation of the computer hardware, a complete Hack computer emulator and assembler supporting the projects described in the book and the online course are also available at the authors' website.

Hardware architecture
The Hack computer hardware consists of three basic elements as shown in the block diagram. There are two separate 16-bit memory units and a central processing unit (CPU). Because data is moved and processed by the computer in 16-bit words, the Hack computer is classified as a 16-bit architecture.

The instruction memory, implemented as read-only memory from the viewpoint of the computer and designated ROM, holds assembled binary program code for execution. The random access memory, called RAM, provides storage for an executing program’s data and provides services and storage areas for the computer’s memory-mapped I/O mechanism. Data processing and program control management are provided by the CPU.

The three units are connected by parallel buses. The address buses (15-bit) and the data and instruction buses (16-bit) for the ROM and RAM units are completely independent. Therefore, the Hack design follows the Harvard architecture model with respect to bus communication between the memory units and the CPU. All memory is word-addressable only.

Read-only memory (ROM)
The Hack computer’s ROM module is presented as a linear array of individually addressable, sequential, 16-bit memory registers. Addresses start at 0 (0x0000). Since the memory elements are sequential devices, a system clock signal is supplied by the simulation application and the computer emulator application. The ROM address bus is 15 bits wide, so a total of 32,768 individual words are available for program instructions. The address of the currently active word is supplied by a program counter register within the CPU (see below). The value in the ROM memory register identified by the address placed on the instruction address bus in a particular clock cycle is available as the "current" instruction at the beginning of the next cycle. There is no instruction register; instructions are decoded in each cycle from the currently active ROM register.

Random access memory (RAM)
Although the RAM module is also viewed as a continuous linear array of individually addressable, sequential, read-write, 16-bit memory registers, it is functionally organized by address range into three segments. Addresses 0 (0x0000) through 16383 (0x3FFF) contain conventional 16-bit, read-write registers meant for use as general-purpose program data storage.

The registers at addresses 16384 (0x4000) through 24575 (0x5FFF) are essentially like data RAM, but they are also designated for use by a built-in screen I/O subsystem. Data written to addresses in this range has the side effect of producing output on the computer’s virtual 256 x 512 screen (see I/O). If a program does not require screen output, registers in this range may be used for general program data.

The final address in the RAM address space, 24576 (0x6000), contains a single one-word register whose current value is controlled by the output of a keyboard attached to the computer hosting the Hack emulator program. This keyboard memory map register is read-only (see I/O).

Data memory addresses in the range 24577 (0x6001) through 32767 (0x7FFF) are invalid. State transitions of the selected RAM memory register are also coordinated by the system clock signal.

Central Processing Unit (CPU)
As illustrated in the accompanying diagram, the Hack computer central processing unit (CPU) is an integrated logic unit composed of several internal elements. It provides many of the functions found in simple, commercially available CPUs. The most complex element of the CPU is the arithmetic logic unit (ALU), which provides the computational functionality of the computer. The ALU is a combinational logic device having two 16-bit input operands and a single 16-bit output. The computation performed on the operands is selected by a set of six ordered, single-bit control inputs to the ALU. The ALU also emits two single-bit status flags, which indicate whether a computation result is zero (zr flag) or negative (ng flag).
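The ALU's behavior can be sketched as a pure Python function. This is a minimal model, assuming the six control bits are named zx, nx, zy, ny, f and no in that order (names taken from the Nand2Tetris course materials, not from this article) and that operands are handled as unsigned 16-bit integers; the function name hack_alu is hypothetical.

```python
MASK16 = 0xFFFF  # keeps every value within 16 bits


def hack_alu(x: int, y: int, zx: int, nx: int, zy: int, ny: int,
             f: int, no: int) -> tuple[int, int, int]:
    """Return (out, zr, ng) for 16-bit operands x and y."""
    x &= MASK16
    y &= MASK16
    if zx:
        x = 0                      # zero the x operand
    if nx:
        x = ~x & MASK16            # bitwise-negate x
    if zy:
        y = 0                      # zero the y operand
    if ny:
        y = ~y & MASK16            # bitwise-negate y
    # f selects 16-bit addition (1) or bitwise AND (0)
    out = (x + y) & MASK16 if f else x & y
    if no:
        out = ~out & MASK16        # bitwise-negate the result
    zr = int(out == 0)             # zero flag
    ng = (out >> 15) & 1           # negative flag: the 2's-complement sign bit
    return out, zr, ng
```

For example, setting only nx, f and no makes the ALU compute x - y: hack_alu(5, 3, 0, 1, 0, 0, 1, 1) returns (2, 0, 0).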

The CPU also contains two 16-bit registers, labeled D and A. The D (Data) register is a general-purpose register whose current value always supplies the ALU x operand, although for some instructions its value is ignored. The A (Address) register may provide its current value as the y operand to the ALU when so directed by an instruction, but its value may also be used for data memory addressing and as a target address in instruction memory for branching instructions. To support data memory addressing, the A register is directly associated with a "pseudo-register" designated M, which is not explicitly implemented in hardware. The M register therefore represents the value contained in the RAM register whose address is the current value of the A register.

The final important element in the CPU is the program counter (PC) register. The PC is a 16-bit binary counter whose low 15 bits specify the address in instruction memory of the next instruction for execution. Unless directed otherwise by a branching instruction, the PC increments its value at the end of each clock cycle. The CPU also includes logic to change, under program control, the order of the computer's instruction execution by setting the PC to a non-sequential value. The PC also implements a single-bit reset input that initializes the PC value to 0 (0x0000) when it is cycled from logic 0 to logic 1 and back. Unlike many actual CPU designs, the Hack CPU provides no program-accessible hardware mechanism for external or internal interrupts, and no hardware support for function calls.

External Input and Output (I/O)
The Hack computer employs a memory-mapped approach to I/O. Bitmapped, black and white output to a virtual 256 x 512 screen is effected by writing a bitmap of the desired output to data memory locations 16384 (0x4000) through 24575 (0x5FFF). The data words in this address range are viewed as a linear array of bits with each bit value representing the black/white state of a single pixel on the computer emulator's virtual screen. The least significant bit of the word in the first memory address of the screen RAM segment sets the pixel in the upper left corner of the screen to white if it is 0 and black if it is 1. The next-most significant bit in the first word controls the next pixel to the right, and so on. After the first 512-pixel row is described by the first 32 words of screen memory, the mapping is continued in the same fashion for the second row with the next 32 words. Logic external to the computer reads the screen RAM memory map segment and updates the virtual screen.
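The row and column mapping described above reduces to a little arithmetic. The sketch below, whose function names are hypothetical, computes which screen word and which bit control a given pixel, assuming rows are numbered 0 to 255 from the top and columns 0 to 511 from the left.

```python
SCREEN_BASE = 16384   # first word of screen memory (0x4000)
WORDS_PER_ROW = 32    # 512 pixels per row / 16 pixels per word


def pixel_location(row: int, col: int) -> tuple[int, int]:
    """Return (RAM address, bit index) of the pixel at (row, col)."""
    if not (0 <= row < 256 and 0 <= col < 512):
        raise ValueError("pixel outside the 256 x 512 screen")
    address = SCREEN_BASE + row * WORDS_PER_ROW + col // 16
    bit = col % 16        # bit 0 (the LSB) is the leftmost pixel of the word
    return address, bit


def set_pixel(word: int, bit: int, black: bool) -> int:
    """Return word with one pixel set to black (1) or white (0)."""
    return word | (1 << bit) if black else word & ~(1 << bit) & 0xFFFF
```

The last pixel on the screen, at row 255 and column 511, maps to bit 15 of address 24575, the final word of screen memory.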

If a keyboard is attached to the computer hosting the CPU emulator program, the emulator puts a 16-bit scan code corresponding to a key pressed during program execution into the keyboard register at RAM address 24576 (0x6000). If no key is pressed, this register contains the value 0. The emulator provides a toggle button to enable/disable the keyboard. The encoding scheme closely follows ASCII encoding for printable characters. The effect of the Shift key is generally honored. Codes are also provided for other keys often present on a standard PC keyboard; for example, direction control keys (←, ↑, ↓, →) and Fn keys.

Operating cycle
Step-wise operation of the CPU and memory units is controlled by a clock that is built-in to both the hardware simulator and the computer emulator programs. At the beginning of a clock cycle the instruction at the ROM address emitted by the current value of the program counter is decoded. The ALU operands specified in the instruction are marshalled where needed. The computation specified is performed by the ALU and the appropriate status flags are set. The computation result is saved as specified by the instruction. Finally, the program counter is updated to the value of the next required program instruction. If no branching was specified by the current instruction, the PC value is simply incremented. If branching was specified, the PC is loaded (from the A register) with the address of the next instruction to be executed. The cycle then repeats using the now current PC value.
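The final step of the cycle, updating the program counter, can be sketched as follows. This is a simplified model, assuming the branch decision has already been made from the ALU result; it keeps only the low 15 bits of the PC, which are the bits that address ROM, and the function name next_pc is hypothetical.

```python
def next_pc(pc: int, a_reg: int, jump: bool, reset: bool) -> int:
    """Return the program counter value for the next clock cycle."""
    if reset:
        return 0                   # reset restarts execution at ROM[0]
    if jump:
        return a_reg & 0x7FFF      # branch target is taken from the A register
    return (pc + 1) & 0x7FFF       # otherwise continue with the next instruction
```

Checking reset before the branch condition mirrors the behavior described for the reset bit: while it is set, the PC is held at 0 regardless of the current instruction.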

Because of its Harvard memory architecture model, the Hack computer is designed to execute the current instruction and “fetch” the next instruction in a single, two-part clock cycle. The speed of the clock may be varied by a control element in both the hardware simulator and the CPU emulator. Regardless of the selected speed, however, each instruction is completely executed in one cycle. The user may also single-step through a program.

Execution of a program loaded in ROM is controlled by the CPU's reset bit. If the value of the reset bit is 0, execution proceeds according to the operating cycle described above. Setting the reset bit to 1 sets the PC to 0. Setting the reset bit back to 0 then begins execution of the current program at the first instruction. Reset does not clear RAM, however; it retains the values left by any previous activity.

There is no hardware or machine language support for interrupts of any kind.

Data types
Values stored in ROM memory must represent valid Hack machine language instructions as described in the Instruction Set Architecture section.

Any 16-bit value may be stored in RAM. The data type of a value stored in RAM is inferred from its location and/or its use within a program. The primary hardware-supported data type is the 16-bit signed integer, represented in 2’s complement format. Signed integers therefore have the range -32768 through 32767. The lower 15 bits of a value in RAM may also represent an address in ROM or RAM, in the sense of a pointer.
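Because RAM words carry no type tag, a program (or a tool inspecting memory) decides how to read each word. The hypothetical helpers below show the standard 2’s complement conversion between raw 16-bit words and signed integers.

```python
def to_signed(word: int) -> int:
    """Interpret a 16-bit word as a 2's-complement signed integer."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word   # bit 15 is the sign bit


def to_word(value: int) -> int:
    """Encode a signed integer in -32768..32767 as a 16-bit word."""
    if not -32768 <= value <= 32767:
        raise ValueError("value outside the 16-bit signed range")
    return value & 0xFFFF
```

For instance, the word 0xFFFF reads as -1, and 0x8000 reads as -32768, the most negative value.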

For values in the RAM memory registers assigned to screen I/O, each word is interpreted by the computer's independent I/O subsystem as a 16-pixel segment of the 256 row x 512 column virtual screen if the screen is "turned on".

The code value in keyboard memory may be read programmatically and interpreted for use by a program.

There is no hardware support for floating-point types.

Instruction set architecture (ISA) and machine language
The Hack computer's instruction set architecture (ISA) and derived machine language are sparse compared to many other architectures. Although the 6 bits used to specify a computation by the ALU could allow for 64 distinct computations, only 18 are officially implemented in the Hack computer's ISA. Since the Hack computer hardware has direct support for neither integer multiplication (and division) nor function calls, there are no corresponding machine language instructions in the ISA for these operations.

Hack machine language has only two types of instructions, each encoded in 16 binary digits.

A-instructions
Instructions whose most significant bit is “0” are called A-instructions or address instructions. The A-instruction is bit-field encoded as follows:

0 b14 b13 b12 b11 b10 b9 b8 b7 b6 b5 b4 b3 b2 b1 b0

0 – the most significant bit of an A-instruction is “0”

b14 - b0 - these bits provide the binary representation of a non-negative integer in the decimal range 0 through 32767

When this instruction is executed, the remaining 15 bits are left-zero extended and loaded into the CPU's A-register. As a side effect, the RAM register having the address represented by that value is enabled for subsequent read/write action in the next clock cycle.
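The A-instruction encoding is simple enough that building and executing it takes only a bit mask. In this sketch (function names hypothetical), encoding leaves bit 15 at 0, and execution recovers the 15-bit payload that is loaded into the A register.

```python
def encode_a_instruction(value: int) -> int:
    """Return the 16-bit word for the A-instruction that loads value."""
    if not 0 <= value <= 32767:
        raise ValueError("operand must fit in 15 bits (0..32767)")
    return value                   # bit 15 remains 0, marking an A-instruction


def a_register_after(word: int) -> int:
    """Return the A-register contents after executing an A-instruction word."""
    if word & 0x8000:
        raise ValueError("bit 15 is set: this is a C-instruction")
    return word & 0x7FFF           # 15-bit payload, left-zero extended to 16 bits
```

For example, format(encode_a_instruction(100), "016b") gives the 16-character string 0000000001100100.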

C-instructions
The other instruction type, known as C-instructions (computation instructions), is the programming workhorse. It has “1” as the most significant bit. The remaining 15 bits are bit-field encoded to define the operands, the computation performed, and the storage location for the computation result. This instruction may also specify a program branch based on the result of the computation it specifies. The format is

C-instruction: dest=comp;jump. Either the dest or the jump field may be omitted, giving the two shorter forms dest=comp and comp;jump.

The C-instruction is bit-field encoded as follows:

1 1 1 a c1 c2 c3 c4 c5 c6 d1 d2 d3 j1 j2 j3

1 – the most significant bit of a C-instruction is “1”

11 – the next two bits are ignored by the CPU and, by convention, are each always set to “1”

a – this bit specifies the source of the “y” operand of the ALU when it is used in a computation: the A register when a is 0, or M (RAM[A]) when a is 1

c1-c6 – these six control bits specify the operands and computation to be performed by the ALU

d1-d3 – these three bits specify the destination(s) for storing the current ALU output

j1-j3 – these three bits specify an arithmetic branch condition, an unconditional branch (jump), or no branching

The Hack computer encoding scheme of the C-instruction is shown in the following tables.

In these tables,


 * A represents the value currently contained in the A-register
 * D represents the value currently contained in the D-register
 * M represents the value currently contained in the data memory register whose address is contained in the A-register; that is, M == RAM[A]
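The comp, dest, and jump tables can be restated as lookup dictionaries, which is how a simple assembler might use them. The bit patterns below are transcribed from the published Hack specification in the Nand2Tetris materials; the dictionary and function names are hypothetical. Each comp entry holds the a bit followed by c1 through c6, and each dest entry holds d1 (A), d2 (D), d3 (M).

```python
# comp mnemonic -> "a c1 c2 c3 c4 c5 c6"
COMP = {
    "0": "0101010", "1": "0111111", "-1": "0111010",
    "D": "0001100", "A": "0110000", "!D": "0001101", "!A": "0110001",
    "-D": "0001111", "-A": "0110011", "D+1": "0011111", "A+1": "0110111",
    "D-1": "0001110", "A-1": "0110010", "D+A": "0000010", "D-A": "0010011",
    "A-D": "0000111", "D&A": "0000000", "D|A": "0010101",
    "M": "1110000", "!M": "1110001", "-M": "1110011", "M+1": "1110111",
    "M-1": "1110010", "D+M": "1000010", "D-M": "1010011", "M-D": "1000111",
    "D&M": "1000000", "D|M": "1010101",
}
# dest mnemonic -> "d1 d2 d3" (store in A, D, M); "" means not stored
DEST = {"": "000", "M": "001", "D": "010", "MD": "011",
        "A": "100", "AM": "101", "AD": "110", "AMD": "111"}
# jump mnemonic -> "j1 j2 j3" (jump if out < 0, out == 0, out > 0)
JUMP = {"": "000", "JGT": "001", "JEQ": "010", "JGE": "011",
        "JLT": "100", "JNE": "101", "JLE": "110", "JMP": "111"}


def encode_c_instruction(dest: str, comp: str, jump: str) -> str:
    """Return the 16-character binary string for dest=comp;jump."""
    return "111" + COMP[comp] + DEST[dest] + JUMP[jump]
```

With these tables, "0;JMP" assembles to 1110101010000111 and "D=M" to 1111110000010000.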

Assembly language
The Hack computer has a text-based assembly language to create programs for the hardware platform that implements the Hack computer ISA. Hack assembly language programs may be stored in text files having the file name extension “.asm”. Hack assembly language source files are case sensitive. Each line of text contains one of the following elements:


 * Blank line
 * Comment
 * Label declaration (with optional end-of-line comment)
 * A-instruction (with optional end-of-line comment)
 * C-instruction (with optional end-of-line comment)

Each of these line types has a specific syntax and may contain predefined or user-defined symbols or numeric constants. Blank lines and comments are ignored by the assembler. Label declarations, A-instructions, and C-instructions, as defined below, may not include any internal whitespace characters, although leading or trailing whitespace is permitted (and ignored).

Comments
Any text beginning with the two-character sequence “//” is a comment. Comments may appear on a source code line alone or may be placed at the end of any other program source line. All text from the comment identifier to the end of the line is ignored by the assembler; consequently, comments produce no machine code.
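An assembler typically normalizes each source line before interpreting it. A minimal sketch of that step (the function name is hypothetical) drops everything from “//” onward and then trims leading and trailing whitespace, so comment-only and blank lines both become empty strings.

```python
def clean_line(line: str) -> str:
    """Strip a // comment and surrounding whitespace from a source line."""
    comment_start = line.find("//")
    if comment_start != -1:
        line = line[:comment_start]   # "//" to end of line is ignored
    return line.strip()               # leading/trailing whitespace is ignored too
```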

Symbols and numeric constants
Hack assembly language allows the use of alphanumeric symbols for a number of different purposes. A symbol may be any sequence of letters (upper and lower case) and decimal digits. Symbols may also contain any of the following characters: underscore (“_”), period (“.”), dollar sign (“$”), and colon (“:”). Symbols may not begin with a digit. Symbols are case sensitive. User-defined symbols are used to create variable names and labels (see below).
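The symbol rules above translate directly into a regular expression. This sketch (pattern and function names hypothetical) accepts a first character that is a letter or one of “_ . $ :”, followed by any mix of letters, digits, and those four characters.

```python
import re

# First character: letter or _ . $ : (never a digit); then letters, digits, or _ . $ :
SYMBOL_RE = re.compile(r"[A-Za-z_.$:][A-Za-z0-9_.$:]*")


def is_valid_symbol(text: str) -> bool:
    """Return True if text is a syntactically valid Hack assembly symbol."""
    return SYMBOL_RE.fullmatch(text) is not None
```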

The Hack assembler recognizes some predefined symbols for use in assembly language programs. The symbols R0, R1, …, R15 are bound respectively to the integers 0 through 15. These symbols are meant to represent general-purpose registers, and their values therefore represent data memory addresses 0 through 15. The predefined symbols SCREEN and KBD represent the data memory addresses of the start of the memory-mapped virtual screen output (16384) and of the keyboard input register (24576). A few other symbols (SP, LCL, ARG, THIS, and THAT) are used in building the operating system software stack.
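An assembler would usually start from a symbol table preloaded with these names. In the sketch below (function name hypothetical), the values for SP, LCL, ARG, THIS, and THAT (0 through 4) come from the Nand2Tetris specification rather than from this article.

```python
def predefined_symbols() -> dict[str, int]:
    """Return a fresh symbol table holding the Hack predefined symbols."""
    table = {f"R{i}": i for i in range(16)}           # R0..R15 -> RAM[0..15]
    table.update({"SCREEN": 16384, "KBD": 24576})     # memory-mapped I/O
    # VM stack pointers; values assumed from the Nand2Tetris specification
    table.update({"SP": 0, "LCL": 1, "ARG": 2, "THIS": 3, "THAT": 4})
    return table
```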

A string of decimal (0-9) digits may be used to represent a non-negative, decimal constant in the range 0 through 32,767. The use of the minus sign to indicate a negative number is not allowed. Binary or octal representation is not supported.

Variables
User defined symbols may be created in an assembly language program to represent variables; that is, a named RAM register. The symbol is bound at assembly to a RAM address chosen by the assembler. Therefore, variables must be treated as addresses when appearing in assembly language source code.

Variables are implicitly defined in assembly language source code when they are first referenced in an A-instruction. When the source code is processed by the assembler, each variable symbol is bound to a unique RAM address, beginning at address 16. Addresses are bound sequentially to variable symbols in the order of their first appearance in the source code. By convention, user-defined symbols that identify program variables are written in all lower case.
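Variable binding can be sketched as a single scan over the A-instruction operands in source order. This assumes the table already holds the predefined symbols and all labels, so any remaining non-numeric operand must be a variable; the function name is hypothetical.

```python
def bind_variables(a_operands: list[str], table: dict[str, int]) -> dict[str, int]:
    """Bind each new variable, in order of first appearance, to RAM 16, 17, ..."""
    next_address = 16                    # first RAM address used for variables
    for operand in a_operands:
        if operand.isdecimal() or operand in table:
            continue                     # constant, label, or already-bound symbol
        table[operand] = next_address    # first appearance defines the variable
        next_address += 1
    return table
```

Applied to the operands cnt, sum, 100, cnt, LOOP with LOOP already bound as a label, cnt receives address 16 and sum receives 17.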

Labels
Labels are symbols delimited by left "(" and right ")" parentheses. Each label is defined on a separate source program line and is bound by the assembler to the instruction memory address of the next instruction in the source code. A label may be defined only once, but it may be used multiple times anywhere within the program, even before the line on which it is defined. By convention, labels are written in all caps. They are used to identify the target address of branch C-instructions.
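Because a label may be used before it is defined, assemblers commonly resolve labels in a separate first pass over the source, counting only lines that generate machine code. The sketch below shows that two-pass approach as one possible implementation; it assumes the lines have already been stripped of comments and whitespace, and the function name is hypothetical.

```python
def bind_labels(lines: list[str]) -> dict[str, int]:
    """First pass: map each (LABEL) to the ROM address of the next instruction."""
    table: dict[str, int] = {}
    rom_address = 0
    for line in lines:
        if not line:
            continue                          # blank lines generate no code
        if line.startswith("(") and line.endswith(")"):
            label = line[1:-1]
            if label in table:
                raise ValueError(f"label {label} defined twice")
            table[label] = rom_address        # declarations occupy no ROM word
        else:
            rom_address += 1                  # each instruction occupies one ROM word
    return table
```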

A-instructions
The A-instruction has the syntax “@xxxx”, where xxxx is either a numeric decimal constant in the range 0 through 32767, a label, or a variable (predefined or user defined). When executed, this instruction loads the A register with the 15-bit binary value represented by “xxxx”, left-zero extended to 16 bits. As a consequence, the M pseudo-register then refers to RAM[xxxx].

The A-instruction may be used for one of three purposes. It is the only means to introduce a (non-negative) numeric value into the computer under program control; that is, it may be used to create program constants. Secondly, it is used to specify a RAM memory location using the M pseudo-register mechanism for subsequent reference by a C-instruction. Finally, a C-instruction which specifies a branch uses the current value of the A register as the branch target address. The A-instruction is used to set that target address prior to the branch instruction, usually by reference to a label.

C-Instructions
C-instructions direct the ALU computation engine and program flow control capabilities of the Hack computer. The instruction syntax is defined by three fields, referred to as “comp”, “dest”, and “jump”. The comp field is required in every C-instruction. The C-instruction syntax is “dest=comp;jump”. The “=” and “;” characters are used to delimit the fields of the instruction. If the dest field is not used, the “=” character is omitted. If the jump field is not used, the “;” character is omitted. The C-instruction allows no internal spaces.
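Because “=” and “;” delimit the fields unambiguously, splitting a C-instruction into its parts needs no real parser. In this sketch (function name hypothetical), an omitted dest or jump field comes back as an empty string, and a missing comp field is rejected.

```python
def split_c_instruction(text: str) -> tuple[str, str, str]:
    """Split 'dest=comp;jump' into (dest, comp, jump); omitted fields are ''."""
    dest, comp, jump = "", text, ""
    if "=" in comp:
        dest, comp = comp.split("=", 1)   # text before "=" is the dest field
    if ";" in comp:
        comp, jump = comp.split(";", 1)   # text after ";" is the jump field
    if not comp:
        raise ValueError("the comp field is required")
    return dest, comp, jump
```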

The comp field must be one of the 28 documented mnemonic codes defined in the table above. These codes are considered distinct units; they must be expressed in all caps with no internal spaces. Although the 6 ALU control bits could potentially specify 64 computational functions, only the 18 presented in the table are officially documented for recognition by the assembler; combined with the a bit, which selects A or M as the y operand, they yield the 28 mnemonics.

The dest field may be used to specify one or more locations to store the result of the specified computation. If this field is omitted, along with the “=” delimiter, the computed value is not stored. The allowed storage location combinations are specified by the mnemonic codes defined in the table above.

The jump field may be used to specify the address in ROM of the next instruction to be executed. If the field is omitted, along with the “;” delimiter, execution continues with the instruction immediately following the current instruction. The branch address target, in ROM, is provided by the current value of the A register if the specified branch condition is satisfied. If the branch condition fails, execution continues with the next instruction in ROM. Mnemonic codes are provided for six different comparisons based on the value of the current computation. Additionally, an unconditional branch is provided as a seventh option. Because the comp field must always be supplied, even though the value is not required for the unconditional branch, the syntax of this instruction is given as “0;JMP”. The branch conditions supported are specified in the table above.
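The branch conditions amount to comparisons of the computation result, read as a signed 16-bit integer, against zero. This sketch (function name hypothetical) takes the raw 16-bit ALU output and a jump mnemonic and reports whether the branch is taken.

```python
def jump_taken(mnemonic: str, out: int) -> bool:
    """Return True if the jump field's condition holds for ALU output out."""
    value = out - 0x10000 if out & 0x8000 else out   # read out as signed 16-bit
    conditions = {
        "": False,                                   # no jump field: never branch
        "JGT": value > 0, "JEQ": value == 0, "JGE": value >= 0,
        "JLT": value < 0, "JNE": value != 0, "JLE": value <= 0,
        "JMP": True,                                 # unconditional branch
    }
    return conditions[mnemonic]
```

For example, an ALU output of 0xFFFF is -1, so JLT branches and JGT does not.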

Assembler
Freely available software supporting the Hack computer includes a command line assembler application. The assembler reads Hack assembly language source files (*.asm) and produces Hack machine language output files (*.hack). The machine language file is also a text file. Each line of this file is a 16-character string of binary digits that represents the encoding of each corresponding executable line of the source text file according to the specification described in the section "Instruction set architecture (ISA) and machine language". The file created may be loaded into the Hack computer emulator by a facility provided by the emulator user interface.
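Producing the text of a .hack file from assembled instruction words needs only fixed-width binary formatting. This sketch (function name hypothetical) renders one 16-character line per word.

```python
def to_hack_text(words: list[int]) -> str:
    """Render machine words as the lines of a .hack text file."""
    return "".join(format(word & 0xFFFF, "016b") + "\n" for word in words)
```

Writing the returned string to a file with a .hack extension would give the line-per-instruction layout described above.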

Example Assembly Language Program
Following is an annotated example program written in Hack assembly language. This program sums the first 100 consecutive integers and places the result of the calculation in a user-defined variable called “sum”. It implements a “while” loop construct to iterate through the integer values 1 through 100, adding each integer to the “sum” variable. The user-defined variable “cnt” maintains the current integer value through the loop. This program illustrates all of the “documented” assembly language features of the Hack computer except memory-mapped I/O. It is a Hack assembly translation of the C fragment:

The contents of the Hack assembly language source file are shown in the second column in bold font. Line numbers are provided for reference in the following discussion but do not appear in the source code. The Hack machine code produced by the assembler is shown in the last column with the assigned ROM address in the preceding column. Note that full-line comments, blank lines, and label definition statements generate no machine language code. Also, the comments provided at the end of each line containing an assembly language instruction are ignored by the assembler.

The assembler output, shown in the last column, is a text string of 16 binary characters, not a 16-bit binary integer. Note that the instruction sequence follows the pattern A-instruction, C-instruction, A-instruction, C-instruction, and so on. This is typical of Hack assembly language programs: the A-instruction specifies a constant or memory address that is used by the subsequent C-instruction.

All three variations of the A-instruction are illustrated. In line 11 (@100), the constant value 100 is loaded into the A register. This value is used in line 12 (D=D-A) to compute the value used to test the loop branch condition.
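The loop logic can be mirrored in Python to check the expected result. This is a sketch only: it assumes the loop exits once cnt - 100 becomes positive, which matches the D=D-A test described here, but the exact branch instruction in the assembly listing is an assumption. The function name is hypothetical.

```python
def sum_first_100() -> int:
    """Mirror the example program's loop: sum the integers 1 through 100."""
    cnt = 1                        # plays the role of the "cnt" variable
    total = 0                      # plays the role of the "sum" variable
    while True:
        d = cnt - 100              # like loading cnt into D, then @100 and D=D-A
        if d > 0:                  # assumed exit test: stop once cnt exceeds 100
            break
        total += cnt
        cnt += 1
    return total
```

When the loop ends, the sum variable holds 5050.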

Since line 4 (@cnt) contains the first appearance of the user-defined variable "cnt", this statement binds the symbol to the next unused RAM address. In this instance, the address is 16, and that value is loaded into the A register. The M pseudo-register now also references this address, and RAM[16] becomes the active RAM memory location.

The third use of the A-instruction is seen in line 21 (@LOOP). Here the instruction loads the bound label value, representing an address in ROM, into the A register. The subsequent unconditional branch instruction in line 22 (0;JMP) loads the A register value into the CPU's program counter, transferring control to the beginning of the loop.

The Hack computer provides no machine language instruction to halt program execution. The final two lines of the program (@END and 0;JMP) create an infinite loop condition which Hack assembly programs conventionally use to terminate programs designed to run in the CPU emulator.