User:Maury Markowitz/sandbox

The VAX architecture is a 32-bit CISC instruction set architecture (ISA) developed by Digital Equipment Corporation (DEC). It is implemented by central processing units (CPUs) and microprocessors used in VAX minicomputers. It is based on, and highly compatible with, the earlier PDP-11 architecture, extending it with a larger address space and many more instructions.

The VAX is among the most complex ISAs ever put to market. In addition to typical operations on integers and floating point numbers, it included operations to directly manipulate doubly linked lists, translate strings from one encoding to another, directly support binary-coded decimal, and calculate cyclic redundancy checks (CRCs), among others. Among its more infamous instructions was INDEX, which looked up values in an array; it was demonstrated that this instruction was generally slower than calculating the location in code.

It was in wide use during the 1970s and 80s, but was eventually overshadowed by RISC-based machines in the late 1980s, and was replaced by the DEC Alpha in the early 1990s. The VAX ISA remains a canonical model of the CISC design philosophy, especially in discussions of the RISC concept.

Data formats
The VAX is a 32-bit machine that was designed to be highly compatible with the earlier 16-bit PDP-11. On the PDP-11, an 8-bit value was a byte, a 16-bit value was known as a "word", and those instructions working with 32-bit values referred to them as "longwords". This terminology was retained for the VAX. As on the PDP-11, multi-byte values are stored little-endian (with the least significant byte first).

To the two basic types in the PDP-11, the VAX ISA added several new types. Later versions of the VAX hardware manual describe a total of five integer formats: byte, word, longword, quadword and octaword. Only the first three are widely used in the ISA. Quadwords are supported by a small number of operations, and octawords are supported by even fewer operations and are optional in hardware implementations.

In contrast to the PDP-11, floating point support was now a standard part of the ISA and included several data types of its own. These were the 32-bit single known as "F" format, the 64-bit double "D" and "G" formats, and the 128-bit quad "H" format. The two double formats, D and G, differ in the number of bits dedicated to the fraction and the exponent: D format uses an 8-bit exponent, while G format uses 11 bits. This allows the G format to store a wider range of numbers, but at less precision.

The VAX also directly supported variable-length bit fields, or bit strings. These could be any length from 0 to 32 bits and could be located anywhere in memory; they did not have to be aligned with a byte or word boundary. Three values were required to completely describe them: the address of the byte where the first bit appeared, a value from 0 to 7 indicating the bit location within that byte (the offset), and a length. The first two values were stored in a long, with the lowest three bits describing the offset and the upper 29 the address. These values were placed in registers, one for smaller strings but possibly two if the sum of the offset and size was larger than 32 bits.
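The address-plus-offset packing described above can be sketched in Python. The function names are hypothetical, and the layout (offset in the low three bits, byte address in the upper 29) simply follows the description in this section; it is an illustration, not a model of any particular VAX instruction.

```python
def pack_bitfield_position(byte_address, bit_offset):
    """Combine a byte address and a 0-7 bit offset into one 32-bit long."""
    if not 0 <= bit_offset <= 7:
        raise ValueError("bit offset must be 0..7")
    if not 0 <= byte_address < (1 << 29):
        raise ValueError("byte address must fit in 29 bits")
    return (byte_address << 3) | bit_offset


def unpack_bitfield_position(position):
    """Split a packed position into (byte_address, bit_offset)."""
    return position >> 3, position & 0b111


def extract_field(memory, position, length):
    """Read `length` bits (0..32) from little-endian `memory` at `position`."""
    if not 0 <= length <= 32:
        raise ValueError("length must be 0..32")
    byte_address, bit_offset = unpack_bitfield_position(position)
    # Fetch just enough bytes to cover the offset plus the field itself.
    byte_count = (bit_offset + length + 7) // 8
    chunk = int.from_bytes(memory[byte_address:byte_address + byte_count], "little")
    return (chunk >> bit_offset) & ((1 << length) - 1)


memory = bytes([0b10110100, 0b00000001])
field = extract_field(memory, pack_bitfield_position(0, 6), 4)  # spans two bytes
```

The field in the usage line starts at bit 6 of the first byte and runs into the second, which is why an unaligned field may need more than one register's worth of data.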

Other data formats were formally specified for specific tasks like strings and queues. These will be discussed in their own sections below.

Memory management and addressing
VAX addresses are byte-oriented. The 32-bit words allow addresses up to 2^32 = 4 GB. In order to support PDP-11 compatibility and better provide memory protection, the address space was broken into four separate areas of 2^30 = 1 GB each. Attempts to read or write to addresses outside the program's natural "space" would result in a memory access error, and these types of operations were only possible with privileged instructions. The spaces are:

For the VMS operating system, P0 was used for user process space, P1 for process stack, S0 for the operating system, and S1 was reserved.
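Because each area is exactly 2^30 bytes, the top two bits of a 32-bit address select the area. A minimal sketch, assuming the areas are laid out in the order P0, P1, S0, S1 from the lowest address upward (the function name is hypothetical):

```python
REGION_NAMES = ("P0", "P1", "S0", "S1")  # assumed order, lowest address first


def region_of(address):
    """Return the name of the 1 GB area containing a 32-bit address."""
    if not 0 <= address < 2**32:
        raise ValueError("address must fit in 32 bits")
    # Shifting out the low 30 bits leaves the two region-select bits.
    return REGION_NAMES[address >> 30]
```

For example, `region_of(0x7FFFFFFF)` falls in the second area, the last byte before the system half of the address space under this assumed layout.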

CPU registers
The VAX system had sixteen 32-bit processor registers. The first 12 of these, R0 through R11, were general purpose and could be used by most instructions.

The last four, R12 through R15, could be used in any instruction that took a register value. However, these registers were used internally by the system, as the program counter and as part of its subroutine call system (see below). Using these as normal registers might lead to problems if subroutines were called.

Status register
The VAX also included one separate special-purpose register, the Processor Status Longword, or PSL:

Private registers
The VAX also used a number of internal registers that were not normally accessible by programs. These included the base pointers for the memory management system, internal parts of the PSL, and a separate stack pointer for each privilege mode (see below). These registers could only be accessed using the special Move to Processor Register (MTPR) and Move from Processor Register (MFPR) instructions, which were privileged. 23

Addressing modes
As the VAX was designed to allow existing 16-bit PDP-11 code to run on the new machines, it had to include the ability to work with 8, 16 or 32-bit addresses. This means there is a huge variety of encodings for every instruction that requires one or more addresses, encoding not only the type of access, but also the length of the addresses. As the PC and SP are visible to all instructions that take a register containing an address, the VAX automatically supported PC-relative and SP-relative addressing, in contrast to the PDP-11 where these were considered separate modes.

The purpose of the different length addresses is to reduce the size of code depending on how far the target memory location is from the current program counter. One can, for instance, use a single byte offset for nearby locations, and use word or long offsets only if needed. Using these shorter addresses can save significant amounts of space in typical programs where there are many branch offsets that are typically quite small.
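The space saving can be illustrated with a short Python sketch that picks the shortest form able to hold a given offset. It assumes the offsets are signed two's-complement values of 8, 16 or 32 bits; the function name is hypothetical.

```python
def smallest_displacement(offset):
    """Return (name, size_in_bytes) of the shortest form that can hold offset."""
    for name, bits in (("byte", 8), ("word", 16), ("long", 32)):
        lowest = -(1 << (bits - 1))
        highest = (1 << (bits - 1)) - 1
        if lowest <= offset <= highest:
            return name, bits // 8
    raise ValueError("offset does not fit in 32 bits")


# A branch 100 bytes ahead needs one byte of offset; one 40000 bytes ahead needs four.
near = smallest_displacement(100)
far = smallest_displacement(40000)
```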

Additionally, the VAX allows any address to be used as an indirect address, although DEC referred to this as deferred addressing. In this mode, the address is calculated and the value in that memory location is read, and then that value is used as the final address. This can be used, for instance, to build jump tables in memory, placing the addresses of subroutines in a known location in memory and then jumping to that value with deferred turned on. This way the location of the subroutine can move and only the value in the table has to change to reflect this; the code itself will continue to work as long as the table does not move.
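The jump-table idea can be sketched in Python with a small byte array standing in for memory. All names and addresses here are hypothetical; the only behavior modeled is the extra memory read that deferred addressing performs.

```python
def read_long(memory, address):
    """Read a 32-bit little-endian value from memory."""
    return int.from_bytes(memory[address:address + 4], "little")


def write_long(memory, address, value):
    """Write a 32-bit little-endian value into memory."""
    memory[address:address + 4] = value.to_bytes(4, "little")


def resolve(memory, address, deferred=False):
    """Return the final address, following one level of indirection if deferred."""
    return read_long(memory, address) if deferred else address


memory = bytearray(64)
TABLE_SLOT = 16                       # fixed location the calling code knows about

write_long(memory, TABLE_SLOT, 40)    # subroutine currently lives at address 40
target_before = resolve(memory, TABLE_SLOT, deferred=True)

write_long(memory, TABLE_SLOT, 52)    # subroutine moved; only the table changes
target_after = resolve(memory, TABLE_SLOT, deferred=True)
```

The calling code refers to `TABLE_SLOT` both times, yet ends up at the new location after the table entry is updated.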

Subroutine calls
On most machines of the era, subroutine calls are mostly handled by program code. Before calling the routine, the caller would push parameters onto the stack and then call an opcode like JSR, Jump to Subroutine. The JSR would push another item on the stack, the current program counter (PC) value. The subroutine would then be responsible for collecting any parameters it needed and ensuring the last item remaining on the stack was the PC when it called RET to return.

On the VAX, there were several mechanisms that were added to make subroutine calling easier, using the AP, FP and SP registers. The first change was to formalize the passing of parameters using an argument list. This consisted of a series of 32-bit longs pushed onto the stack and then an 8-bit value in the low bits of another 32-bit long indicating the number of parameters in the list. The caller put the parameters on the stack and then called CALLS with the number of parameters. CALLS pushes that count onto the stack to complete the argument list, sets the AP register to point to the count, and then jumps into the subroutine. Alternatively, the argument list can be placed in memory and called with CALLG, which then puts the address of the list in AP.
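The shape of the argument list can be sketched in Python with a dictionary standing in for stack memory. This models only the list itself, not the rest of the call sequence. The function names are hypothetical, and pushing the parameters last-to-first (so that the first one sits next to the count) is an assumption not stated in the text above.

```python
def build_argument_list(memory, sp, arguments):
    """Push arguments and the count onto a downward-growing stack; return (sp, ap)."""
    # Assumption: the caller pushes in reverse order so argument 1 ends up at AP+4.
    for value in reversed(arguments):
        sp -= 4
        memory[sp] = value & 0xFFFFFFFF
    # CALLS completes the list by pushing the count (held in the low 8 bits).
    sp -= 4
    memory[sp] = len(arguments) & 0xFF
    ap = sp  # AP points at the count
    return sp, ap


def argument(memory, ap, n):
    """Fetch the n-th argument (1-based) relative to AP."""
    return memory[ap + 4 * n]


memory = {}
sp, ap = build_argument_list(memory, 0x1000, [10, 20, 30])
```

With this layout the callee never needs to know how deep the stack is; every parameter is at a fixed offset from AP.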

The first word of the subroutine contains a mask of which registers need to be saved. This allows the routine to save only the registers it intends to use. These values are written to the stack, along with the mask, and then the SP and FP are both updated to point to the mask at the top of the stack. The routine could then use the stack as normal, adding data as required. To return, the routine executed RET: the SP was reset to the FP location, the mask was read and the saved values were read from the stack and restored to the registers, and then the count pointed to by AP was read and the SP moved past that many longs. The result was that a single user instruction returns the stack to its original state before the call.

VAX Subroutines https://people.computing.clemson.edu/~mark/subroutines/vax.html Mark Smotherman September 2002 Clemson University

Privilege modes
The VAX has four hardware implemented privilege modes:

Floating point
The VAX included several different floating point formats, F(loat), D(ouble), G, and H. Operations on these data types appended the type to the end of the instruction mnemonic; to copy an F between two locations, one used MOVF, to copy an H, MOVH. Additionally, most instructions came in two- or three-operand variants, indicated with a further suffix; to add two F's and place the result in the first of the two locations, one uses ADDF2, to do the same but place the result in a third location, ADDF3. 25

The primary math instructions were ADDx2, ADDx3, SUBx2, SUBx3, MULx2, MULx3, where x is the data type. EMODx performed an extended-precision multiply and split the product into its integer and fractional parts. POLYx evaluated a polynomial with the coefficients stored in a table using the length/location format (see below). CVTxy converted values between formats x and y, with the additional ability to convert to and from integer formats, indicated with B(yte), W(ord) or L(ong). There was also a dedicated CVTRxL to round any format into a long. 25
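Evaluating a polynomial from a coefficient table can be sketched in Python using Horner's rule. Both the use of Horner's rule and the ordering of the table (highest-degree coefficient first) are assumptions made for this illustration, and the function name is hypothetical.

```python
def poly(argument, coefficients):
    """Evaluate a polynomial whose coefficients are listed highest degree first."""
    result = 0.0
    for coefficient in coefficients:
        # One multiply and one add per table entry.
        result = result * argument + coefficient
    return result


# 3x^2 + 0x + 1 evaluated at x = 2
value = poly(2.0, [3.0, 0.0, 1.0])
```

Horner's rule needs only one multiplication and one addition per coefficient, which is what makes a single table-driven instruction practical.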

Comparison operations were also supported. These included CMPx to compare two numbers, TSTx to test a single number, and the multi-step ACBx, which added, compared and branched in a single opcode. Additional operations included MOVx, and MNEGx, which negated the number while moving. CLRx set a number to zero. 25

Character-string instructions
One of the later additions to the PDP-11 ISA was "CIS", the Commercial Instruction Set. This included a number of instructions to help manipulate strings and binary coded decimal data. In the VAX, these were promoted to be a standard part of the ISA, known as the character-string instructions. 140

A string was specified with two values, a word (16-bit) containing the length of the string in bytes, and a long (32-bit) pointing to the start of the data in memory. This length/location format was also used for a number of other purposes in the system. When used for strings, the operands are stored in the machine registers R0 through R5, depending on how many are needed for the operation. Registers are used in pairs, with the length in the least significant bits of the even-numbered register, like R0, and the address in the following register, like R1. 141

String instructions include CMPC to compare two strings, setting the condition register flags N, Z, V and C depending on the outcome. Like most instructions in this set, there are two formats, one taking two registers as operands, CMPC3, and the second taking two pairs of values for length and location, CMPC5. Thus, only two types of memory access were offered, register-register and memory-memory. While the instruction is being performed, R0/R1 contains the position being tested, and if an inequality is detected at any point, processing stops, leaving the remaining length in R0 and the location of the inequality in R1. R2 always equals R0, while R3 is the pointer to the inequality in the second string. 143 Other simple instructions include LOCC to find the first occurrence of a character in a string, MATCHC, which is similar but looks for a multi-character match (string within a string), MOVC which copies characters from one string to another, and SKPC which skips over a given character (useful for skipping blanks). 142-157
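The register results described for the comparison can be sketched in Python for the simple case of two strings of equal length. The function name and the dictionary of results are hypothetical; only the Z flag is modeled, and the register contents follow the description above.

```python
def compare_strings(memory, length, address1, address2):
    """Compare two equal-length byte strings, stopping at the first difference."""
    r0, r1, r3 = length, address1, address2
    while r0 > 0 and memory[r1] == memory[r3]:
        r0 -= 1   # one fewer byte left to test
        r1 += 1   # advance in the first string
        r3 += 1   # advance in the second string
    # R0 = bytes remaining, R1/R3 = location of the mismatch in each string.
    return {"R0": r0, "R1": r1, "R2": r0, "R3": r3, "Z": r0 == 0}


memory = b"HELLO WORLDHELLO WORKS"
result = compare_strings(memory, 11, 0, 11)   # mismatch at the tenth character
```

When the strings match completely, R0 reaches zero and both pointers end one byte past their strings.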

More powerful instructions are intended to perform translations. MOVTC (move translated characters) copies data from one string to another, but translates each character using a 256-byte table. The table is pointed to using the same length/address style as a string. As each character is read from the source, its numeric value is used as an index into the table, and the value at that location is copied to the output in place of the original character. This can be used for EBCDIC conversions, for instance, by placing the ASCII code for each character at the table position given by its EBCDIC code. The character "E" is character 69 in ASCII and 197 in EBCDIC, so to convert EBCDIC to ASCII one would make a table of 256 bytes with a 69 in location 197. When MOVTC is called and sees a 197 in the original string, it will output 69 in the new string, performing the conversion. MOVTUC is similar, but stops the conversion when a particular character is seen, which is useful if the input may include an end-of-line or end-of-file marker. 153 SCANC is similar, but also ANDs the characters with a mask byte provided as an operand, stopping when it gets a non-zero result. This can be used to scan a string for a given set of characters. SPANC works similarly, but stops when it encounters a zero result, as opposed to non-zero. 142-157
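The table-driven translation can be sketched in Python. The function names are hypothetical, the table below fills in only the single "E" entry from the example (every other byte maps to itself), and the stop condition in the second function, comparing the translated byte against the stop character, is an assumption.

```python
def translate(source, table):
    """Copy source to a new string, replacing each byte with table[byte]."""
    if len(table) != 256:
        raise ValueError("translation table must have 256 entries")
    return bytes(table[byte] for byte in source)


def translate_until(source, table, stop):
    """Like translate, but stop when a translated byte equals `stop`."""
    output = bytearray()
    for byte in source:
        translated = table[byte]
        if translated == stop:   # assumption: the test is on the translated byte
            break
        output.append(translated)
    return bytes(output)


table = bytearray(range(256))   # identity mapping as a placeholder
table[197] = 69                 # EBCDIC "E" (197) becomes ASCII "E" (69)

converted = translate(bytes([197]), table)
first_line = translate_until(bytes([197, 10, 197]), table, 10)
```

A real conversion table would fill in all 256 entries; the single entry here is enough to show the lookup.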

Decimal string instructions
The PDP-11's CIS instructions also included a set of instructions for handling binary-coded decimal (BCD) data. Packed BCD stores two decimal digits per byte, so a single 32-bit long stores 8 digits in total. The VAX includes instructions for packing digits into words, unpacking them, and performing integer operations on those numbers without having to first unpack them.
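The two-digits-per-byte packing can be sketched in Python. This is a simplified illustration with hypothetical function names; it handles unsigned digit strings only and makes no attempt to reproduce how the VAX formats represent the sign.

```python
def pack_bcd(digits):
    """Pack a string of decimal digits two per byte, high nibble first."""
    if not digits.isdigit():
        raise ValueError("only decimal digits can be packed")
    if len(digits) % 2:
        digits = "0" + digits   # pad to a whole number of bytes
    return bytes(
        (int(digits[i]) << 4) | int(digits[i + 1])
        for i in range(0, len(digits), 2)
    )


def unpack_bcd(packed):
    """Expand packed BCD bytes back into a string of decimal digits."""
    return "".join(f"{byte >> 4}{byte & 0x0F}" for byte in packed)


packed = pack_bcd("12345678")   # eight digits fit in four bytes
```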

CRC
An addition for the VAX was the CRC instruction to perform cyclic redundancy check calculations. This takes a string descriptor to point to data in memory, along with an initial CRC value, typically zero, and a 16-longword table that contains the polynomial that describes the CRC function. It then scans the bytes in the "string", calculates the resulting CRC, and places it in R0. This function is optional in subsets of the ISA, like the MicroVAX. 161-162
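A 16-entry table suggests a calculation that consumes four bits of input at a time. The Python sketch below shows that general technique for a right-shifting CRC; it is an illustration under that assumption, with hypothetical function names, and uses the common CRC-32 polynomial 0xEDB88320 purely as an example.

```python
def make_crc_table(polynomial):
    """Build the 16-entry table for a right-shifting (reflected) CRC."""
    table = []
    for index in range(16):
        value = index
        for _ in range(4):   # one shift per bit of the 4-bit index
            value = (value >> 1) ^ (polynomial if value & 1 else 0)
        table.append(value)
    return table


def crc(table, initial, data):
    """Run the CRC over data, consuming each byte as two 4-bit steps."""
    value = initial
    for byte in data:
        value ^= byte
        for _ in range(2):
            value = (value >> 4) ^ table[value & 0xF]
    return value


table = make_crc_table(0xEDB88320)
# CRC-32 convention: start from all ones and invert the final result.
checksum = crc(table, 0xFFFFFFFF, b"123456789") ^ 0xFFFFFFFF
```

Because the polynomial lives entirely in the table, the same loop can compute a different CRC simply by supplying a different table and initial value.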

Queue instructions
The VAX included instructions that atomically manipulated doubly linked lists in order to provide single-instruction operations on queues. This relied on the queue being implemented with a pair of longword addresses for the forward and backward links, respectively. 102 When the queue instructions were called, the program would provide a pointer to one of these pointer pairs, and then follow the links in that pair to find the previous or next item in the list to perform operations on them.

INSQHI inserted a new entry at the head of a queue, setting the previous pointer of the next item (the former head) to point to the new item. INSQTI did the same at the tail, while REMQHI and REMQTI performed removals at the head and tail. The "I" at the end of these instruction names indicates the operation is "interlocked", meaning it is non-interruptible and a context switch cannot occur until it is complete. This avoided issues when different programs using the same queue might be inserting or removing entries at the same time, leading to dangling pointers. INSQUE and REMQUE inserted or removed entries from any point in the queue, but were not interlocked and were only to be used on non-shared data. 108
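The pointer updates behind insertion and removal can be sketched in Python, with object references standing in for the forward and backward longword links. The class and function names are hypothetical, and nothing here is atomic; the sketch corresponds to the non-interlocked case only.

```python
class Entry:
    """A queue entry holding a forward link and a backward link."""

    def __init__(self, name):
        self.name = name
        self.flink = self   # an empty queue header points at itself
        self.blink = self


def insert_after(entry, predecessor):
    """Link entry into the queue immediately after predecessor."""
    successor = predecessor.flink
    entry.flink = successor
    entry.blink = predecessor
    predecessor.flink = entry
    successor.blink = entry


def remove(entry):
    """Unlink entry from whatever queue it is in and return it."""
    entry.blink.flink = entry.flink
    entry.flink.blink = entry.blink
    entry.flink = entry.blink = entry
    return entry


def names(header):
    """List entry names from head to tail, excluding the header itself."""
    result, node = [], header.flink
    while node is not header:
        result.append(node.name)
        node = node.flink
    return result


header, a, b = Entry("header"), Entry("a"), Entry("b")
insert_after(a, header)         # insert at the head
insert_after(b, header.blink)   # insert at the tail
```

Each operation touches four links in two or three different entries, which is why an interruption partway through could leave the queue inconsistent.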

VAX Vector Architecture
In 1990, DEC released an extension to add vector processing to the VAX family. It was first implemented on the VAX 9000 and VAX 6000 Model 400. 210

The vector unit added new hardware registers, consisting of a set of sixteen vectors, each holding 64 elements of 64 bits. Thus, every vector held 512 bytes, and the system as a whole had 8192 (8 k) bytes of register values. Because the register file was so large, loading and saving it during a context switch was very expensive, and could take up to several hundred microseconds. To avoid this, the operating system turned off the vector unit on a context switch, and only turned it back on when a vector instruction was issued, at which point it would save the state. This meant that in typical use, where a single vector program was running among a number of "normal" applications, the state would not have to be saved often, or at all, and the registers would still contain the values they held when the vector application was switched back in. 210

Three additional registers were used to control the execution of instructions. The Vector Length Register (VLR) indicated how many of the 64 entries in the vector should be processed, allowing it to, for instance, ignore empty entries at the end. The Vector Mask Register (VMR) is a 64-bit register with each bit controlling whether the instruction should apply to the corresponding entry. This provides more fine-grained control than the VLR. The VMR was also used to store per-entry comparison results; when two vectors were compared for equality, for instance, set bits in the VMR would indicate which entries were equal and clear bits indicated they were not equal. Finally, the Vector Count Register (VCR) indicates the length of a vector that has been unpacked from storage, where it might be run-length encoded or in a sparse array. 206

Instructions were 16 bits, along with a variable number of following control bytes indicating the mode for the operation and the source and destination registers, if any. Instructions could be lengthy; a comparison operation could take as many as 16 bytes. 207 There were 63 opcodes in total, although this includes different versions of the same instruction that had different inputs. The hardware could perform operations using the F, D and G floating point formats. 205 For instance, there were several different ADD instructions; VSADDL added a provided integer longword to the elements of a vector (VS for vector-scalar), VSADDF did the same with an F format single-precision floating point value while VSADDD and VSADDG did the same for D and G formats, and VVADDL, F, D and G performed the same operations between the elements of two vectors, potentially masked by VMR. Thus, there were a total of eight ADD opcodes. 208
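How the length and mask registers limit a vector add can be sketched in Python. The function names are hypothetical, and two details are assumptions made for illustration: bit i of the mask corresponds to element i, and elements that are masked off keep their previous destination value.

```python
def vector_vector_add(a, b, destination, vlr, vmr=None):
    """Add a[i] + b[i] for the first vlr elements, optionally masked by vmr."""
    for i in range(vlr):
        if vmr is None or (vmr >> i) & 1:   # assumed: mask bit i guards element i
            destination[i] = a[i] + b[i]
    return destination


def vector_scalar_add(scalar, a, destination, vlr, vmr=None):
    """Add one scalar to each of the first vlr elements, optionally masked."""
    return vector_vector_add([scalar] * len(a), a, destination, vlr, vmr)


a = [1, 2, 3, 4]
b = [10, 20, 30, 40]
masked = vector_vector_add(a, b, [0, 0, 0, 0], vlr=3, vmr=0b101)
shifted = vector_scalar_add(100, a, [0, 0, 0, 0], vlr=2)
```

In the first call the length register excludes the fourth element and the mask excludes the second, so only two sums are written.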

In contrast to the main VAX ISA, the vector arithmetic instructions did not include memory-memory or memory-register forms, only register-register. Memory was accessed only through dedicated load and store instructions, in a fashion very similar to the RISC methodology. There were four instructions for moving data: VLDL loaded longwords into a vector, VLDQ loaded quads, and VSTL and VSTQ saved data back to main memory. This was expanded with the use of a scatter/gather unit, which allowed data to be collected from different locations in memory. This was generally used to collect values from arrays or sparse vectors containing zeros, which could then be skipped instead of being loaded and masked off. VGATHL and VGATHQ gathered longs or quads from memory, and VSCATL and VSCATQ scattered them back. 208 The IOTA instruction was used to produce a packed memory structure used by the scatter/gather system, with elements that pointed to the start and length of the vectors to be loaded or saved. Other general instructions included VVMERGE, which merged two vectors into one using the VMR, and MFVP and MTVP to read and write values in the control registers. 208
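The gather and scatter operations can be sketched in Python. Memory is modeled as a plain list of elements rather than bytes, the offsets are given as a simple list, and the function names are hypothetical; the sketch shows the access pattern only.

```python
def gather(memory, base, offsets, length):
    """Collect `length` elements from memory[base + offset] into a dense vector."""
    return [memory[base + offsets[i]] for i in range(length)]


def scatter(memory, base, offsets, values, length):
    """Write `length` elements from a dense vector back to scattered locations."""
    for i in range(length):
        memory[base + offsets[i]] = values[i]


# A sparse array in which only three entries are non-zero.
memory = [0, 5, 0, 0, 7, 0, 9]
offsets = [1, 4, 6]

dense = gather(memory, 0, offsets, 3)
scatter(memory, 0, offsets, [value * 10 for value in dense], 3)
```

Only the three useful values travel through the vector register; the zeros in between are never loaded.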

The vector unit was essentially a separate computer controlled by one of the main system's CPUs. VAX machines had up to four CPUs, one or all of which might have an attached vector unit. Because the vector units ran independently of, and asynchronously from, the associated CPU, a program performing both vector and scalar operations, which is most of them, would have to periodically synchronize. This was handled through the MFVP instruction, which performed the MSYNC function, waiting until all outstanding memory accesses were complete before continuing. The related VSYNC ensured that all memory accesses within the vector unit itself were complete, to avoid different vector units reading and writing memory out of order when sharing data. 212