Burroughs Large Systems

The Burroughs Large Systems Group produced a family of large 48-bit mainframes using stack machine instruction sets with dense syllables. The first machine in the family was the B5000 in 1961, which was optimized for compiling ALGOL 60 programs extremely well, using single-pass compilers. The B5000 evolved into the B5500 (disk rather than drum) and the B5700 (up to four systems running as a cluster). Subsequent major redesigns include the B6500/B6700 line and its successors, as well as the separate B8500 line.

In the 1970s, the Burroughs Corporation was organized into three divisions with very different product line architectures for high-end, mid-range, and entry-level business computer systems. Each division's product line grew from a different concept for how to optimize a computer's instruction set for particular programming languages. "Burroughs Large Systems" referred to all of these large-system product lines together, in contrast to the COBOL-optimized Medium Systems (B2000, B3000, and B4000) or the flexible-architecture Small Systems (B1000).

Background
Founded in the 1880s, Burroughs was the oldest continuously operating company in computing (Elliott Brothers was founded before Burroughs, but did not make computing devices in the 19th century). By the late 1950s its computing equipment was still limited to electromechanical accounting machines such as the Sensimatic. It had nothing to compete with its traditional rivals IBM and NCR, who had started to produce larger-scale computers, or with recently founded Univac. In 1956, they purchased ElectroData Corporation and rebranded its design as the B205.

Burroughs' first internally developed machine, the B5000, was designed in 1961 and Burroughs sought to address its late entry in the market with the strategy of a completely different design based on the most advanced computing ideas available at the time. While the B5000 architecture is dead, it inspired the B6500 (and subsequent B6700 and B7700). Computers using that architecture were still in production as the Unisys ClearPath Libra servers which run an evolved but compatible version of the MCP operating system first introduced with the B6700. The third and largest line, the B8500, had no commercial success. In addition to a proprietary CMOS processor design, Unisys also uses Intel Xeon processors and runs MCP, Microsoft Windows and Linux operating systems on their Libra servers; the use of custom chips was gradually eliminated, and by 2018 the Libra servers had been strictly commodity Intel for some years.

B5000, B5500, and B5700
The first member of the first series, the B5000, was designed beginning in 1961 by a team under the leadership of Robert (Bob) Barton. It had an unusual architecture. It has been listed by the computing scientist John Mashey as one of the architectures that he admires the most. "I always thought it was one of the most innovative examples of combined hardware/software design I've seen, and far ahead of its time." The B5000 was succeeded by the B5500, which used disks rather than drum storage, and the B5700, which allowed multiple CPUs to be clustered around shared disk. While there was no successor to the B5700, the B5000 line heavily influenced the design of the B6500, and Burroughs ported the Master Control Program (MCP) to that machine.

Features

 * Hardware was designed to support software requirements
 * Hardware designed to exclusively support high-level programming languages
 * Simplified instruction set
 * No Assembly language or assembler; all system software written in an extended variety of ALGOL 60 named ESPOL. However, ESPOL had statements for each of the syllables in the architecture.
 * Partially data-driven tagged and descriptor-based design
 * Few programmer accessible registers
 * Stack machine where all operations use the stack rather than explicit operands. This approach has by now fallen out of favor.
 * All interrupts and procedure calls use the stack
 * Support for other languages such as COBOL
 * Powerful string manipulation
 * All code automatically reentrant: programmers don't have to do anything more to have any code in any language spread across processors than to use just the two shown simple primitives.
 * Support for an operating system (MCP, Master Control Program)
 * Support for asymmetric (master/slave) multiprocessing
 * An attempt at a secure architecture prohibiting unauthorized access of data or disruptions to operations
 * Early error-detection supporting development and testing of software
 * A commercial implementation virtual memory, preceded only by the Ferranti Atlas.
 * First segmented memory model

System design
The B5000 was unusual at the time in that the architecture and instruction set were designed with the needs of software taken into consideration. This was a large departure from the computer system design of the time, where a processor and its instruction set would be designed and then handed over to the software people.

The B5000, B5500 and B5700 in Word Mode has two different addressing modes, depending on whether it is executing a main program (SALF off) or a subroutine (SALF on). For a main program, the T field of an Operand Call or Descriptor Call syllable is relative to the Program Reference Table (PRT). For subroutines, the type of addressing is dependent on the high three bits of T and on the Mark Stack FlipFlop (MSFF), as shown in B5x00 Relative Addressing.

Language support
The B5000 was designed to exclusively support high-level languages. This was at a time when such languages were just coming to prominence with FORTRAN and then COBOL. FORTRAN and COBOL were considered weaker languages by some, when it comes to modern software techniques, so a newer, mostly untried language was adopted, ALGOL-60. The ALGOL dialect chosen for the B5000 was Elliott ALGOL, first designed and implemented by C. A. R. Hoare on an Elliott 503. This was a practical extension of ALGOL with I/O instructions (which ALGOL had ignored) and powerful string processing instructions. Hoare's famous Turing Award lecture was on this subject.

Thus the B5000 was based on a very powerful language. Donald Knuth had previously implemented ALGOL 58 on an earlier Burroughs machine during the three months of his summer break, and he was peripherally involved in the B5000 design as a consultant. Many wrote ALGOL off, mistakenly believing that high-level languages could not have the same power as assembler, and thus not realizing ALGOL's potential as a systems programming language.

The Burroughs ALGOL compiler was very fast &mdash; this impressed the Dutch scientist Edsger Dijkstra when he submitted a program to be compiled at the B5000 Pasadena plant. His deck of cards was compiled almost immediately and he immediately wanted several machines for his university, Eindhoven University of Technology in the Netherlands. The compiler was fast for several reasons, but the primary reason was that it was a one-pass compiler. Early computers did not have enough memory to store the source code, so compilers (and even assemblers) usually needed to read the source code more than once. The Burroughs ALGOL syntax, unlike the official language, requires that each variable (or other object) be declared before it is used, so it is feasible to write an ALGOL compiler that reads the data only once. This concept has profound theoretical implications, but it also permits very fast compiling. Burroughs large systems could compile as fast as they could read the source code from the punched cards, and they had the fastest card readers in the industry.

The powerful Burroughs COBOL compiler was also a one-pass compiler and equally fast. A 4000-card COBOL program compiled as fast as the 1000-card/minute readers could read the code. The program was ready to use as soon as the cards went through the reader.



B6500, B6700/B7700, and successors
The B6500 (delivery in 1969 ) and B7500 were the first computers in the only line of Burroughs systems to survive to the present day. While they were inspired by the B5000, they had a totally new architecture. Among the most important differences were
 * The B6500 had variable length instructions with an 8-bit syllable instead of fixed length instructions with a 12-bit syllable.
 * The B6500 had a 51-bit instead of a 48-bit word, and used 3 bits as a tag
 * The B6500 had Symmetric Multiprocessing (SMP)
 * The B6500 had a Saguaro stack
 * The B6500 had paged arrays
 * The B6500 had Display Registers, D1 thru D32 to allow nested subroutines to access variables in outer blocks.
 * The B6500 used monolithic integrated circuits with magnetic thin-film memory.

Among other customers of the B6700 and B7700 were all five New Zealand universities in 1971.

B8500
The B8500 line derives from the D825, a military computer that was inspired by the B5000.

The B8500 was designed in the 1960s as an attempt to merge the B5500 and the D825 designs. The system used monolithic integrated circuits with magnetic thin-film memory. The architecture employed a 48-bit word, stack, and descriptors like the B5500, but was not advertised as being upward-compatible. The B8500 could never be gotten to work reliably, and the project was canceled after 1970, never having delivered a completed system.

History
The central concept of virtual memory appeared in the designs of the Ferranti Atlas and the Rice Institute Computer, and the central concepts of descriptors and tagged architecture appeared in the design of the Rice Institute Computer in the late 1950s. However, even if those designs had a direct influence on Burroughs, the architectures of the B5000, B6500 and B8500 were very different from those of the Atlas and the Rice machine; they are also very different from each other.

The first of the Burroughs large systems was the B5000. Designed in 1961, it was a second-generation computer using discrete transistor logic and magnetic-core memory, followed by the B5500 and B5700. The first machines to replace the B5000 architecture were the B6500 and B7500. The successor machines to the B6500 and B7500 followed the hardware development trends to re-implement the architectures in new logic over the next 25 years, with the B6500, B7500, B6700, B7700, B6800, B7800, B5900, B7900 and finally the Burroughs A series. After a merger in which Burroughs acquired Sperry Corporation and changed its name to Unisys, the company continued to develop new machines based on the MCP CMOS ASIC. These machines were the Libra 100 through the Libra 500, with the Libra 590 being announced in 2005. Later Libras, including the 590, also incorporate Intel Xeon processors and can run the Burroughs large systems architecture in emulation as well as on the MCP CMOS processors. It is unclear if Unisys will continue development of new MCP CMOS ASICs.

Primary lines of hardware
Hardware and software design, development, and manufacturing were split between two primary locations, in Orange County, California, and the outskirts of Philadelphia. The initial Large Systems Plant, which developed the B5000 and B5500, was located in Pasadena, California but moved to City of Industry, California, where it developed the B6500. The Orange County location, which was based in a plant in Mission Viejo, California but at times included facilities in nearby Irvine and Lake Forest, was responsible for the smaller B6x00 line, while the East Coast operations, based in Tredyffrin, Pennsylvania, handled the larger B7x00 line. All machines from both lines were fully object-compatible, meaning a program compiled on one could be executed on another. Newer and larger models had instructions which were not supported on older and slower models, but the hardware, when encountering an unrecognized instruction, invoked an operating system function which interpreted it. Other differences include how process switching and I/O were handled, and maintenance and cold-starting functionality. Larger systems included hardware process scheduling and more capable input/output modules, and more highly functional maintenance processors. When the Bxx00 models were replaced by the A Series models, the differences were retained but no longer readily identifiable by model number.

ALGOL
The Burroughs large systems implement ALGOL-derived stack architectures. The B5000 was the first stack-based system.

While B5000 was specifically designed to support ALGOL, this was only a starting point. Other business-oriented languages such as COBOL were also well supported, most notably by the powerful string operators which were included for the development of fast compilers.

The ALGOL used on the B5000 is an extended ALGOL subset. It includes powerful string manipulation instructions but excludes certain ALGOL constructs, notably unspecified formal parameters. A DEFINE mechanism serves a similar purpose to the #defines found in C, but is fully integrated into the language rather than being a preprocessor. The EVENT data type facilitates coordination between processes, and ON FAULT blocks enable handling program faults.

The user level of ALGOL does not include many of the insecure constructs needed by the operating system and other system software. Two levels of language extensions provide the additional constructs: ESPOL and NEWP for writing the MCP and closely related software, and DCALGOL and DMALGOL to provide more specific extensions for specific kinds of system software.

ESPOL and NEWP
Originally, the B5000 MCP operating system was written in an extension of extended ALGOL called ESPOL (Executive Systems Programming Oriented Language). This was replaced in the mid-to-late 70s by a language called NEWP. Though NEWP probably just meant "New Programming language", legends surround the name. A common (perhaps apocryphal) story within Burroughs at the time suggested it came from "No Executive Washroom Privileges." Another story is that circa 1976, John McClintock of Burroughs (the software engineer developing NEWP) named the language "NEWP" after being asked, yet again, "does it have a name yet": answering "nyoooop", he adopted that as a name. NEWP, too, was a subset ALGOL extension, but it was more secure than ESPOL, and dropped some little-used complexities of ALGOL. In fact, all unsafe constructs are rejected by the NEWP compiler unless a block is specifically marked to allow those instructions. Such marking of blocks provide a multi-level protection mechanism.

NEWP programs that contain unsafe constructs are initially non-executable. The security administrator of a system is able to "bless" such programs and make them executable, but normal users are not able to do this. (Even "privileged users", who normally have essentially root privilege, may be unable to do this depending on the configuration chosen by the site.) While NEWP can be used to write general programs and has a number of features designed for large software projects, it does not support everything ALGOL does.

NEWP has a number of facilities to enable large-scale software projects, such as the operating system, including named interfaces (functions and data), groups of interfaces, modules, and super-modules. Modules group data and functions together, allowing easy access to the data as global within the module. Interfaces allow a module to import and export functions and data. Super-modules allow modules to be grouped.

DCALGOL and Message Control Systems (MCS)
In the original implementation, the system used an attached specialized data communications processor (DCP) to handle the input and output of messages from/to remote devices. This was a 24-bit minicomputer with a conventional register architecture and hardware I/O capability to handle thousands of remote terminals. The DCP and the B6500 communicated by messages in memory, essentially packets in today's terms, and the MCS did the B6500-side processing of those messages. In the early years the DCP did have an assembler (Dacoma), an application program called DCPProgen written in B6500 ALGOL. Later the NDL (Network Definition Language) compiler generated the DCP code and NDF (network definition file). Ultimately, a further update resulted in the development of the NDLII language and compiler which were used in conjunction with the model 4 and 5 DCPs. There was one ALGOL function for each kind of DCP instruction, and if you called that function, then the corresponding DCP instruction bits would be emitted to the output. A DCP program was an ALGOL program comprising nothing but a long list of calls on these functions, one for each assembly language statement. Essentially ALGOL acted like the macro pass of a macro assembler. The first pass was the ALGOL compiler; the second pass was running the resulting program (on the B6500) which would then emit the binary for the DCP.

Starting in the early 1980's, the DCP technology was replaced by the ICP (Integrated Communications Processor) which provided LAN based connectivity for the mainframe system. Remote devices, and remote servers/mainframes, were connected to the network via freestanding devices called CP2000s. The CP2000s were designed to provide network node support in a distributed network wherein the nodes were connected using the BNAV2 (Burroughs Network Architecture Version 2) networking technology. BNAV2 was a Burroughs functional equivalent of the IBM SNA product and did support interoperation with IBM environments in both PUT2 and PUT5 transport modes. The change in external datacommunications hardware did not require any change to existing MCS (Message Control System (discussed below)) software.

On input, messages were passed from the DCP via an internal bus to the relevant MCP Datacom Control (DCC) DCP process stack. One DCC process was initiated for each DCP configured on the system. The DCP Process stack would then ensure that the inbound message was queued for delivery to the MCS identified to handle traffic from the particular source device and return any response to the DCP for delivery to the destination device. From a processing perspective no changes were required to the MCS software to handle different types of gateway hardware, be it any of the 5 styles of DCP or the ICP or ICP/CP2000 combinations.

Apart from being a message delivery service, an MCS is an intermediate level of security between operating system code (in NEWP) and user programs (in ALGOL, or other application languages including COBOL, FORTRAN, and, in later days, JAVA). An MCS may be considered to be a middleware program and is written in DCALGOL (Data Communications ALGOL). As stated above, the MCS received messages from queues maintained by the Datacom Control Stack (DCC) and forwarded these messages to the appropriate application/function for processing. One of the original MCS's was CANDE (Command AND Edit) which was developed as the online program development environment. The University of Otago in New Zealand developed a skinny program development environment equivalent to CANDE which they called SCREAM/6700 in the same time that IBM was offering a remote time-sharing/program development service known as CALL/360 which ran on IBM 360 series systems. Another MCS named COMS was introduced around 1984 and developed as a high performance transaction processing control system. There were predecessor transaction processing environments which included GEMCOS (GEneralized Message COntrol System), and an Australian Burroughs subsidiary developed MCS called TPMCS (Transaction Processing MCS). The transaction processing MCS's supported the delivery of application data to online production environments and the return of responses to remote users/devices/systems.

MCSs are items of software worth noting – they control user sessions and provide keeping track of user state without having to run per-user processes since a single MCS stack can be shared by many users. Load balancing can also be achieved at the MCS level. For example, saying that you want to handle 30 users per stack, in which case if you have 31 to 60 users, you have two stacks, 61 to 90 users, three stacks, etc. This gives B5000 machines a great performance advantage in a server since you don't need to start up another user process and thus create a new stack each time a user attaches to the system. Thus you can efficiently service users (whether they require state or not) with MCSs. MCSs also provide the backbone of large-scale transaction processing.

Around 1988 an implementation of TCP/IP was developed principally for a U.S. government customer utilizing the CP2000 distributed communications processor as the protocol host. Two to three years later, the TCP/IP implementation was rewritten to be host/server based with significant performance and functionality improvements. In the same general time frame an implementation of the OSI protocol stacks was made, principally on the CP2000, but a large supporting infrastructure was implemented on the main system. All of the OSI standard defined applications were implemented including X.400 mail hosting and X.500 directory services.

DMALGOL and databases
Another variant of ALGOL is DMALGOL (Data Management ALGOL). DMALGOL is ALGOL extended for compiling the DMSII database software from database description files created by the DASDL (Data Access and Structure Definition Language) compiler. Database designers and administrators compile database descriptions to generate DMALGOL code tailored for the tables and indexes specified. Administrators never need to write DMALGOL themselves. Normal user-level programs obtain database access by using code written in application languages, mainly ALGOL and COBOL, extended with database instructions and transaction processing directives. The most notable feature of DMALGOL is its preprocessing mechanisms to generate code for handling tables and indices.

DMALGOL preprocessing includes variables and loops, and can generate names based on compile-time variables. This enables tailoring far beyond what can be done by preprocessing facilities which lack loops.

DMALGOL is used to provide tailored access routines for DMSII databases. After a database is defined using the Data Access and Structure Definition Language (DASDL), the schema is translated by the preprocessor into tailored DMALGOL access routines and then compiled. This means that, unlike in other DBMS implementations, there is often no need for database-specific if/then/else code at run-time. In the 1970s, this "tailoring" was used very extensively to reduce the code footprint and execution time. It became much less used in later years, partly because low-level fine tuning for memory and speed became less critical, and partly because eliminating the preprocessing made coding simpler and thus enabled more important optimizations.

An applications version of ALGOL to support the accessing of databases from application programs is called BDMSALGOL and included verbs like "FIND", "LOCK", "STORE", "GET", and "PUT" for database access and record manipulation. Additionally, the verbs "BEGINTRANSACTION" and "ENDTRANSACTION" were also implemented to solve the deadlock situation when multiple processes accessed and updated the same structures.

Roy Guck of Burroughs was one of the main developers of DMSII.

In later years, with compiler code size being less of a concern, most of the preprocessing constructs were made available in the user level of ALGOL. Only the unsafe constructs and the direct processing of the database description file remain restricted to DMALGOL.

Stack architecture
In many early systems and languages, programmers were often told not to make their routines too small. Procedure calls and returns were expensive, because a number of operations had to be performed to maintain the stack. The B5000 was designed as a stack machine – all program data except for arrays (which include strings and objects) was kept on the stack. This meant that stack operations were optimized for efficiency. As a stack-oriented machine, there are no programmer addressable registers.

Multitasking is also very efficient on the B5000 and B6500 lines. There are specific instruction to perform process switches:
 * B5000, B5500, B5700
 * Initiate P1 (IP1) and Initiate P2 (IP2)

Each stack and associated Program Reference Table (PRT) represents a process (task or thread) and tasks can become blocked waiting on resource requests (which includes waiting for a processor to run on if the task has been interrupted because of preemptive multitasking). User programs cannot issue an IP1, IP2 or MVST, and there is only one place in the operating system where this is done.
 * B6500, B7500 and successors
 * MVST (move stack).

So a process switch proceeds something like this – a process requests a resource that is not immediately available, maybe a read of a record of a file from a block which is not currently in memory, or the system timer has triggered an interrupt. The operating system code is entered and run on top of the user stack. It turns off user process timers. The current process is placed in the appropriate queue for the resource being requested, or the ready queue waiting for the processor if this is a preemptive context switch. The operating system determines the first process in the ready queue and invokes the instruction move_stack, which makes the process at the head of the ready queue active.

Stack speed and performance
Stack performance was considered to be slow compared to register-based architectures, for example, such an architecture had been considered and rejected for the System/360. One way to increase system speed is to keep data as close to the processor as possible. In the B5000 stack, this was done by assigning the top two positions of the stack to two registers A and B. Most operations are performed on those two top of stack positions. On faster machines past the B5000, more of the stack may be kept in registers or cache near the processor.

Thus the designers of the current successors to the B5000 systems can optimize in whatever is the latest technique, and programmers do not have to adjust their code for it to run faster – they do not even need to recompile, thus protecting software investment. Some programs have been known to run for years over many processor upgrades. Such speed up is limited on register-based machines.

Another point for speed as promoted by the RISC designers was that processor speed is considerably faster if everything is on a single chip. It was a valid point in the 1970s when more complex architectures such as the B5000 required too many transistors to fit on a single chip. However, this is not the case today and every B5000 successor machine now fits on a single chip as well as the performance support techniques such as caches and instruction pipelines.

In fact, the A Series line of B5000 successors included the first single chip mainframe, the Micro-A of the late 1980s. This "mainframe" chip (named SCAMP for Single-Chip A-series Mainframe Processor) sat on an Intel-based plug-in PC board.

How programs map to the stack
Here is an example of how programs map to the stack structure

begin — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —   — This is lexical level 2 (level zero is reserved for the operating system and level 1 for code segments). — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —   — At level 2 we place global variables for our program. integer i, j, k; real f, g; array a [0:9]; procedure p (real p1, p2); value p1;    — p1 passed by value, p2 implicitly passed by reference. begin — — — — — — — — — — — — — — — — — —         — This block is at lexical level 3 — — — — — — — — — — — — — — — — — —         real r1, r2;

r2 := p1 * 5; p2 := r2; — This sets g to the value of r2 p1 := r2; — This sets p1 to r2, but not f — Since this overwrites the original value of f in p1 it might be a         — coding mistake. Some few of ALGOL's successors therefore insist that — value parameters be read only – but most do not. if r2 > 10 then begin — — — — — — — — — — — — — — — — — — — — — — — — — — — —               — A variable declared here makes this lexical level 4 — — — — — — — — — — — — — — — — — — — — — — — — — — — —               integer n;

— The declaration of a variable makes this a block, which will invoke some — stack building code. Normally you won't declare variables here, in which — case this would be a compound statement, not a block. ... <== sample stack is executing somewhere here. end; end; .....   p (f, g); end.

Each stack frame corresponds to a lexical level in the current execution environment. As you can see, lexical level is the static textual nesting of a program, not the dynamic call nesting. The visibility rules of ALGOL, a language designed for single pass compilers, mean that only variables declared before the current position are visible at that part of the code, thus the requirement for forward declarations. All variables declared in enclosing blocks are visible. Another case is that variables of the same name may be declared in inner blocks and these effectively hide the outer variables which become inaccessible.

Lexical nesting is static, unrelated to execution nesting with recursion, etc. so it is very rare to find a procedure nested more than five levels deep, and it could be argued that such programs would be poorly structured. B5000 machines allow nesting of up to 32 levels. This could cause difficulty for some systems that generated Algol source as output (tailored to solve some special problem) if the generation method frequently nested procedure within procedure.

Procedures
Procedures can be invoked in four ways – normal, call, process, and run.

The normal invocation invokes a procedure in the normal way any language invokes a routine, by suspending the calling routine until the invoked procedure returns.

The call mechanism invokes a procedure as a coroutine. Coroutines are partner tasks established as synchronous entities operating in their own stack at the same lexical level as the initiating process. Control is explicitly passed between the initiating process and the coroutine by means of a CONTINUE instruction.

The process mechanism invokes a procedure as an asynchronous task with a separate stack set up starting at the lexical level of the processed procedure. As an asynchronous task, there is no control over exactly when control will be passed between the tasks, unlike coroutines. The processed procedure still has access to the enclosing environment and this is a very efficient IPC (Inter Process Communication) mechanism. Since two or more tasks now have access to common variables, the tasks must be synchronized to prevent race conditions, which is handled by the EVENT data type, where processes can WAIT on one, or more, events until it is caused by another cooperating process. EVENTs also allow for mutual exclusion synchronization through the PROCURE and LIBERATE functions. If for any reason the child task dies, the calling task can continue – however, if the parent process dies, then all child processes are automatically terminated. On a machine with more than one processor, the processes may run simultaneously. This EVENT mechanism is a basic enabler for multiprocessing in addition to multitasking.

Run invocation type
The last invocation type is run. This runs a procedure as an independent task which can continue on after the originating process terminates. For this reason, the child process cannot access variables in the parent's environment, and all parameters passed to the invoked procedure must be call-by-value.

Thus, Burroughs Extended ALGOL had some of the multi-processing and synchronization features of later languages like Ada. It made use of the support for asynchronous processes that was built into the hardware.

Inline procedures
One last possibility is that in NEWP a procedure may be declared INLINE, that is when the compiler sees a reference to it the code for the procedure is generated inline to save the overhead of a procedure call; this is best done for small pieces of code. Inline functions are similar to parameterized macros such as C #defines, except you don't get the problems with parameters that you can with macros.

Asynchronous calls
In the example program only normal calls are used, so all the information will be on a single stack. For asynchronous calls, a separate stack is initiated for each asynchronous process so that the processes share data but run asynchronously.

Display registers
A stack hardware optimization is the provision of D (or "display") registers. These are registers that point to the start of each called stack frame. These registers are updated automatically as procedures are entered and exited and are not accessible by any software other than the MCP. There are 32 D registers, which is what limits operations to 32 levels of lexical nesting.

Consider how we would access a lexical level 2 (D[2]) global variable from lexical level 5 (D[5]). Suppose the variable is 6 words away from the base of lexical level 2. It is thus represented by the address couple (2, 6). If we don't have D registers, we have to look at the control word at the base of the D[5] frame, which points to the frame containing the D[4] environment. We then look at the control word at the base of this environment to find the D[3] environment, and continue in this fashion until we have followed all the links back to the required lexical level. This is not the same path as the return path back through the procedures which have been called in order to get to this point. (The architecture keeps both the data stack and the call stack in the same structure, but uses control words to tell them apart.)

As you can see, this is quite inefficient just to access a variable. With D registers, the D[2] register points at the base of the lexical level 2 environment, and all we need to do to generate the address of the variable is to add its offset from the stack frame base to the frame base address in the D register. (There is an efficient linked list search operator LLLU, which could search the stack in the above fashion, but the D register approach is still going to be faster.) With D registers, access to entities in outer and global environments is just as efficient as local variable access.

D Tag Data               — Address couple, Comments register

| 0       | n          | (4, 1) The integer n (declared on entry to a block, not a procedure) |---| | D[4]==>3 | MSCW      | (4, 0) The Mark Stack Control Word containing the link to D[3]. |=======================| | 0       | r2         | (3, 5) The real r2 |---| | 0       | r1         | (3, 4) The real r1 |---| | 1       | p2         | (3, 3) A SIRW reference to g at (2,6) |---| | 0       | p1         | (3, 2) The parameter p1 from value of f |---| | 3       | RCW        | (3, 1) A return control word |---| | D[3]==>3 | MSCW      | (3, 0) The Mark Stack Control Word containing the link to D[2]. |=======================| | 1       | a          | (2, 7) The array a  ======>[ten word memory block] |---| | 0       | g          | (2, 6) The real g |---| | 0       | f          | (2, 5) The real f |---| | 0       | k          | (2, 4) The integer k |---| | 0       | j          | (2, 3) The integer j |---| | 0       | i          | (2, 2) The integer i |---| | 3       | RCW        | (2, 1) A return control word |---| | D[2]==>3 | MSCW      | (2, 0) The Mark Stack Control Word containing the link to the previous stack frame. |=======================| — Stack bottom

If we had invoked the procedure p as a coroutine, or a process instruction, the D[3] environment would have become a separate D[3]-based stack. This means that asynchronous processes still have access to the D[2] environment as implied in ALGOL program code. Taking this one step further, a totally different program could call another program's code, creating a D[3] stack frame pointing to another process' D[2] environment on top of its own process stack. At an instant the whole address space from the code's execution environment changes, making the D[2] environment on the own process stack not directly addressable and instead make the D[2] environment in another process stack directly addressable. This is how library calls are implemented. At such a cross-stack call, the calling code and called code could even originate from programs written in different source languages and be compiled by different compilers.

The D[1] and D[0] environments do not occur in the current process's stack. The D[1] environment is the code segment dictionary, which is shared by all processes running the same code. The D[0] environment represents entities exported by the operating system.

Stack frames actually don't even have to exist in a process stack. This feature was used early on for file I/O optimization, the FIB (file information block) was linked into the display registers at D[1] during I/O operations. In the early nineties, this ability was implemented as a language feature as STRUCTURE BLOCKs and – combined with library technology - as CONNECTION BLOCKs. The ability to link a data structure into the display register address scope implemented object orientation. Thus, the B6500 actually used a form of object orientation long before the term was ever used.

On other systems, the compiler might build its symbol table in a similar manner, but eventually the storage requirements would be collated and the machine code would be written to use flat memory addresses of 16-bits or 32-bits or even 64-bits. These addresses might contain anything so that a write to the wrong address could damage anything. Instead, the two-part address scheme was implemented by the hardware. At each lexical level, variables were placed at displacements up from the base of the level's stack, typically occupying one word - double precision or complex variables would occupy two. Arrays were not stored in this area, only a one word descriptor for the array was. Thus, at each lexical level the total storage requirement was not great: dozens, hundreds or a few thousand in extreme cases, certainly not a count requiring 32-bits or more. And indeed, this was reflected in the form of the VALC instruction (value call) that loaded an operand onto the stack. This op-code was two bits long and the rest of the byte's bits were concatenated with the following byte to give a fourteen-bit addressing field. The code being executed would be at some lexical level, say six: this meant that only lexical levels zero to six were valid, and so just three bits were needed to specify the lexical level desired. The address part of the VALC operation thus reserved just three bits for that purpose, with the remainder being available for referring to entities at that and lower levels. A deeply nested procedure (thus at a high lexical level) would have fewer bits available to identify entities: for level sixteen upwards five bits would be needed to specify the choice of levels 0–31 thus leaving nine bits to identify no more than the first 512 entities of any lexical level. This is much more compact than addressing entities by their literal memory address in a 32-bit addressing space. Further, only the VALC opcode loaded data: opcodes for ADD, MULT and so forth did no addressing, working entirely on the top elements of the stack.

Much more important is that this method meant that many errors possible on systems employing flat addressing could not occur because they were simply unspeakable even at the machine code level. A task had no way to corrupt memory in use by another task, because it had no way to develop its address. Offsets from a specified D-register would be checked by the hardware against the stack frame bound: rogue values would be trapped. Similarly, within a task, an array descriptor contained information on the array's bounds, and so any indexing operation was checked by the hardware: put another way, each array formed its own address space. In any case, the tagging of all memory words provided a second level of protection: a misdirected assignment of a value could only go to a data-holding location, not to one holding a pointer or an array descriptor, etc. and certainly not to a location holding machine code.

Array storage
Arrays were not stored contiguous in memory with other variables, they were each granted their own address space, which was located via the descriptor. The access mechanism was to calculate on the stack the index variable (which therefore had the full integer range potential, not just fourteen bits) and use it as the offset into the array's address space, with bound checking provided by the hardware. By default, should an array's length exceed 1,024 words, the array would be segmented, and the index be converted into a segment index and an offset into the indexed segment. There was, however, the option to prevent segmentation by specifying the array as LONG in the declaration. In ALGOL's case, a multidimensional array would employ multiple levels of such addressing. For a reference to A[i,j], the first index would be into an array of descriptors, one descriptor for each of the rows of A, which row would then be indexed with j as for a single-dimensional array, and so on for higher dimensions. Hardware checking against the known bounds of all the array's indices would prevent erroneous indexing.

FORTRAN however regards all multidimensional arrays as being equivalent to a single-dimensional array of the same size, and for a multidimensional array simple integer arithmetic is used to calculate the offset where element A[i,j,k] would be found in that single sequence. The single-dimensional equivalent array, possibly segmented if large enough, would then be accessed in the same manner as a single-dimensional array in ALGOL. Although accessing outside this array would be prevented, a wrong value for one index combined with a suitably wrong value for another index might not result in a bounds violation of the single sequence array; in other words, the indices were not checked individually.

Because an array's storage was not bounded on each side by storage for other items, it was easy for the system to "resize" an array - though changing the number of dimensions was precluded because compilers required all references to have the same number of dimensions. In ALGOL's case, this enabled the development of "ragged" arrays, rather than the usual fixed rectangular (or higher dimension) arrays. Thus in two dimensions, a ragged array would have rows that were of different sizes. For instance, given a large array A[100,100] of mostly-zero values, a sparse array representation that was declared as SA[100,0] could have each row resized to have exactly enough elements to hold only the non-zero values of A along that row.

Because arrays larger than 1024 words were generally segmented but smaller arrays were not, on a system that was short of real memory, increasing the declared size of a collection of scratchpad arrays from 1,000 to say 1,050 could mean that the program would run with far less "thrashing" as only the smaller individual segments in use were needed in memory. Actual storage for an array segment would be allocated at run time only if an element in that segment were accessed, and all elements of a created segment would be initialised to zero. Not initialising an array to zero at the start therefore was encouraged by this, normally an unwise omission.

Array equivalencing is also supported. The ARRAY declaration requested allocation of 48-bit data words which could be used to store any bit pattern but the general operational practice was that each allocated word was considered to be a REAL operand. The declaration of:

ARRAY A [0:99]

requested the allocation of 100 words of type REAL data space in memory. The programmer could also specify that the memory might be referred to as character oriented data by the following equivalence declaration:

EBCDIC ARRAY EA [0] = A [*];

or as hexadecimal data via the equivalence declaration:

HEX ARRAY HA [0] = A [*];

or as ASCII data via the equivalence declaration:

ASCII ARRAY AA [0] = A[*];

The capability to request data type specific arrays without equivalencing is also supported, e.g.

EBCDIC ARRAY MY_EA [0:99]

requested that the system allocated a 100 character array. Given that the architecture is word based, the actual space allocated is the requested number of characters rounded up to the next whole word boundary.

The Data Descriptor generated at compilation time indicated the data type usage for which the array was intended. If an array equivalence declaration was made a copy descriptor indicating that particular usage type was generated but pointed back to the original, or MOM, descriptor. Thus, indexing the correct location in memory was always guaranteed.

BOOLEAN arrays are also supported and may be used as a bit vector. INTEGER arrays may also be requested.

The immediately preceding discussion uses the ALGOL syntax implementation to describe ARRAY declarations, but the same functionality is supported in COBOL and FORTRAN.

Stack structure advantages
One nice thing about the stack structure is that if a program does happen to fail, a stack dump is taken and it is very easy for a programmer to find out exactly what the state of a running program was. Compare that to core dumps and exchange packages of other systems.

Another thing about the stack structure is that programs are implicitly recursive. FORTRAN was not expected to support recursion and perhaps one stumbling block to people's understanding of how ALGOL was to be implemented was how to implement recursion. On the B5000, this was not a problem – in fact, they had the reverse problem, how to stop programs from being recursive. In the end they didn't bother. The Burroughs FORTRAN compiler allowed recursive calls (just as every other FORTRAN compiler does), but unlike many other computers, on a stack-based system the returns from such calls succeeded as well. This could have odd effects, as with a system for the formal manipulation of mathematical expressions whose central subroutines repeatedly invoked each other without ever returning: large jobs were ended by stack overflow!

Thus Burroughs FORTRAN had better error checking than other contemporary implementation of FORTRAN. For instance, for subroutines and functions it checked that they were invoked with the correct number of parameters, as is normal for ALGOL-style compilers. On other computers, such mismatches were common causes of crashes. Similarly with the array-bound checking: programs that had been used for years on other systems embarrassingly often would fail when run on a Burroughs system. In fact, Burroughs became known for its superior compilers and implementation of languages, including the object-oriented Simula (a superset of ALGOL), and Iverson, the designer of APL declared that the Burroughs implementation of APL was the best he'd seen. John McCarthy, the language designer of LISP disagreed, since LISP was based on modifiable code, he did not like the unmodifiable code of the B5000, but most LISP implementations would run in an interpretive environment anyway.

The storage required for the multiple processes came from the system's memory pool as needed. There was no need to do SYSGENs on Burroughs systems as with competing systems in order to preconfigure memory partitions in which to run tasks.

Tagged architecture
The most defining aspect of the B5000 is that it is a stack machine as treated above. However, two other very important features of the architecture is that it is tag-based and descriptor-based.

In the original B5000, a flag bit in each control or numeric word was set aside to identify the word as a control word or numeric word. This was partially a security mechanism to stop programs from being able to corrupt control words on the stack.

Later, when the B6500 was designed, it was realized that the 1-bit control word/numeric distinction was a powerful idea and this was extended to three bits outside of the 48 bit word into a tag. The data bits are bits 0–47 and the tag is in bits 48–50. Bit 48 was the read-only bit, thus odd tags indicated control words that could not be written by a user-level program. Code words were given tag 3. Here is a list of the tags and their function:

Internally, some of the machines had 60 bit words, with the extra bits being used for engineering purposes such as a Hamming code error-correction field, but these were never seen by programmers.

The current incarnation of these machines, the Unisys ClearPath has extended tags further into a four bit tag. The microcode level that specified four bit tags was referred to as level Gamma.

Even-tagged words are user data which can be modified by a user program as user state. Odd-tagged words are created and used directly by the hardware and represent a program's execution state. Since these words are created and consumed by specific instructions or the hardware, the exact format of these words can change between hardware implementation and user programs do not need to be recompiled, since the same code stream will produce the same results, even though system word format may have changed.

Tag 1 words represent on-stack data addresses. The normal IRW simply stores an address couple to data on the current stack. The SIRW references data on any stack by including a stack number in the address. Amongst other things, SIRW's are used to provide addressing between discrete process stacks such as those generated in response to the CALL and PROCESS statements.

Tag 5 words are descriptors, which are more fully described in the next section. Tag 5 words represent off-stack data addresses.

Tag 7 is the program control word which describes a procedure entry point. When hardware operators hit a PCW, the procedure is entered. The ENTR operator explicitly enters a procedure (non-value-returning routine). Functions (value-returning routines) are implicitly entered by operators such as value call (VALC). Global routines are stored in the D[2] environment as SIRWs that point to a PCW stored in the code segment dictionary in the D[1] environment. The D[1] environment is not stored on the current stack because it can be referenced by all processes sharing this code. Thus code is reentrant and shared.

Tag 3 represents code words themselves, which won't occur on the stack. Tag 3 is also used for the stack control words MSCW, RCW, TOSCW.



Descriptor-based architecture
The figure to the left shows how the Burroughs Large System architecture was fundamentally a hardware architecture for object-oriented programming, something that still doesn't exist in conventional architectures.

Instruction sets
There are three distinct instruction sets for the Burroughs large systems. All three are based on short syllables that fit evenly into words.

B5000, B5500 and B5700
Programs on a B5000, B5500 and B5700 are made up of 12-bit syllables, four to a word. The architecture has two modes, Word Mode and Character Mode, and each has a separate repertoire of syllables. A processor may be either Control State or Normal State, and certain syllables are only permissible in Control State. The architecture does not provide for addressing registers or storage directly; all references are through the 1024 word Program Reference Table, current code segment, marked locations within the stack or to the A and B registers holding the top two locations on the stack. Burroughs numbers bits in a syllable from 0 (high bit) to 11 (low bit)

B6500 and successors
Programs are made up of 8-bit syllables, which may be Name Call, Value Call, or form an operator, which may be from one to twelve syllables in length. There are less than 200 operators, all of which fit into 8-bit syllables. Many of these operators are polymorphic depending on the kind of data being acted on as given by the tag. If we ignore the powerful string scanning, transfer, and edit operators, the basic set is only about 120 operators. If we remove the operators reserved for the operating system such as MVST and HALT, the set of operators commonly used by user-level programs is less than 100. The Name Call and Value Call syllables contain address couples; the Operator syllables either use no addresses or use control words and descriptors on the stack.

Multiple processors
The B5000 line also were pioneers in having two processors connected together on a high-speed bus as master and slave. In the B6000, B7000 and B8000 lines the processors were symmetric. The B7000 line could have up to eight processors, as long as at least one was an I/O module. RDLK (ReaD with LocK) is a very low-level way of synchronizing between processors. RDLK operates in a single cycle. The higher level mechanism generally employed by user programs is the EVENT data type. The EVENT data type did have some system overhead. To avoid this overhead, a special locking technique called Dahm locks (named after a Burroughs software guru, Dave Dahm) can be used. Dahm locks used the READLOCK ALGOL language statement which generated a RDLK operator at the code level.

Notable operators are:

HEYU — send an interrupt to another processor

RDLK — Low-level semaphore operator: Load the A register with the memory location given by the A register and place the value in the B register at that memory location in a single uninterruptible cycle. The Algol compiler produced code to invoke this operator via a special function that enabled a "swap" operation on single-word data without an explicit temporary value.

WHOI — Processor identification

IDLE — Idle until an interrupt is received

Two processors could infrequently simultaneously send each other a 'HEYU' command resulting in a lockup known as 'a deadly embrace'.

Influence of the B5000
The direct influence of the B5000 can be seen in the current Unisys ClearPath range of mainframes which are the direct descendants of the B6500, which was influenced by the B5000, and still have the MCP operating system after 40 years of consistent development. This architecture is now called emode (for emulation mode) since the B6500 architecture has been implemented on machines built from Intel Xeon processors running the x86 instruction set as the native instruction set, with code running on those processors emulating the B5000 instruction set. In those machines, there was also going to be an nmode (native mode), but this was dropped, so you may often hear the B6500 successor machines being referred to as "emode machines".

B5000 machines were programmed exclusively in high-level languages; there is no assembler.

The B5000 stack architecture inspired Chuck Moore, the designer of the programming language Forth, who encountered the B5500 while at MIT. In Forth - The Early Years, Moore described the influence, noting that Forth's DUP, DROP and SWAP came from the corresponding B5500 instructions (DUPL, DLET, EXCH).

B5000 machines with their stack-based architecture and tagged memory also heavily influenced the Soviet Elbrus series of mainframes and supercomputers. The first two generations of the series featured tagged memory and stack-based CPUs that were programmed only in high-level languages. There existed a kind of an assembly language for them, called El-76, but it was more or less a modification of ALGOL 68 and supported structured programming and first-class procedures. Later generations of the series, though, switched away from this architecture to the EPIC-like VLIW CPUs.

The Hewlett-Packard designers of the HP 3000 business system had used a B5500 and were greatly impressed by its hardware and software; they aimed to build a 16-bit minicomputer with similar software. Several other HP divisions created similar minicomputer or microprocessor stack machines. Bob Barton's work on reverse Polish notation (RPN) also found its way into HP calculators beginning with the 9100A, and notably the HP-35 and subsequent calculators.

The NonStop systems designed by Tandem Computers in the late 1970s and early 1980s were also 16-bit stack machines, influenced by the B5000 indirectly through the HP 3000 connection, as several of the early Tandem engineers were formerly with HP. Around 1990, these systems migrated to MIPS RISC architecture but continued to support execution of stack machine binaries by object code translation or direct emulation. Sometime after 2000, these systems migrated to Itanium architecture and continued to run the legacy stack machine binaries.

Bob Barton was also very influential on Alan Kay. Kay was also impressed by the data-driven tagged architecture of the B5000 and this influenced his thinking in his developments in object-oriented programming and Smalltalk.

Another facet of the B5000 architecture was that it was a secure architecture that runs directly on hardware. This technique has descendants in the virtual machines of today in their attempts to provide secure environments. One notable such product is the Java JVM which provides a secure sandbox in which applications run.

The value of the hardware-architecture binding that existed before emode would be substantially preserved in the x86-based machines to the extent that MCP was the one and only control program, but the support provided by those machines is still inferior to that provided on the machines where the B6500 instruction set is the native instruction set. A little-known Intel processor architecture that actually preceded 32-bit implementations of the x86 instruction set, the Intel iAPX 432, would have provided an equivalent physical basis, as it too was essentially an object-oriented architecture.