C.mmp



The C.mmp was an early multiple instruction, multiple data (MIMD) multiprocessor system developed at Carnegie Mellon University (CMU) by William Wulf (1971). The notation C.mmp came from the PMS notation of Gordon Bell and Allen Newell, where a central processing unit (CPU) was designated as C, a variant was noted by the dot notation, and mmp stood for Multi-Mini-Processor. The machine is on display at CMU, in Wean Hall, on the ninth floor.

Structure
Sixteen Digital Equipment Corporation PDP-11 minicomputers were used as the processing elements, named Compute Modules (CMs) in the system. Each CM had a local memory of 8K and a local set of peripheral devices. One challenge was that each device was reachable only through the single processor it was attached to, so the input/output (I/O) system (designed by Roy Levin) hid the connectivity of the devices and routed requests to the hosting processor. If a processor went down, the devices connected to its Unibus became unavailable, which hurt overall system reliability. Processor 0 (the boot processor) had the disk drives attached.

Each of the Compute Modules shared these communication pathways:
 * An Interprocessor bus – used to distribute system-wide clock, interrupt, and process control messaging among the CMs
 * A 16x16 crossbar switch – used to connect the 16 CMs on one side and 16 banks of shared memory on the other. If all 16 processors were accessing different banks of memory, the memory accesses would all be concurrent. If two or more processors were trying to access the same bank of memory, one of them would be granted access on one cycle and the remainder would be negotiated on subsequent memory cycles.
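The contention behaviour of the crossbar can be illustrated with a small model. This is only a sketch, under the simplifying assumption that every processor issues exactly one request and that each bank serves one processor per memory cycle; the function name `cycles_to_serve` and the request format are hypothetical and do not describe the actual arbitration hardware.

```python
from collections import Counter


def cycles_to_serve(requests):
    """Return how many memory cycles are needed to serve every request.

    `requests` maps a processor number to the memory bank it wants.
    Assumption: a bank grants one processor per cycle, so the bank with
    the most waiting processors determines the total number of cycles.
    """
    if not requests:
        return 0
    waiting_per_bank = Counter(requests.values())
    return max(waiting_per_bank.values())


# All 16 processors address different banks: fully concurrent.
no_conflict = {cpu: cpu for cpu in range(16)}
# All 16 processors address bank 3: fully serialized.
all_conflict = {cpu: 3 for cpu in range(16)}

print(cycles_to_serve(no_conflict))   # 1
print(cycles_to_serve(all_conflict))  # 16
```

In this model the best case costs a single cycle and the worst case costs one cycle per contending processor, which matches the two situations described in the list above.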

Because the PDP-11 had a 16-bit logical address space, an additional address translation unit was added to expand the address space to 25 bits for the shared memory. The Unibus architecture provided 18 bits of physical address, and the two high-order bits were used to select one of four relocation registers, which in turn selected a bank of memory. Properly managing these registers was one of the challenges of programming the operating system (OS) kernel.
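The register selection step can be sketched in a few lines. The text does not describe the format of a relocation register, so this example assumes, purely for illustration, that each register holds a base address in the 25-bit shared space and that the low-order 16 bits of the Unibus address are added to it as an offset. The real hardware layout may well have differed, and the name `translate` and the register values are hypothetical.

```python
OFFSET_BITS = 16   # low-order bits passed through as an offset (assumption)
UNIBUS_BITS = 18   # physical address width provided by the Unibus
SHARED_BITS = 25   # width of the shared memory address space


def translate(unibus_addr, relocation_registers):
    """Map an 18-bit Unibus address to a 25-bit shared memory address.

    The two high-order bits select one of four relocation registers.
    Assumption: the selected register holds a base address that is added
    to the remaining 16-bit offset.
    """
    if not 0 <= unibus_addr < (1 << UNIBUS_BITS):
        raise ValueError("address does not fit in 18 bits")
    select = unibus_addr >> OFFSET_BITS               # 0..3
    offset = unibus_addr & ((1 << OFFSET_BITS) - 1)   # low 16 bits
    shared_addr = relocation_registers[select] + offset
    if shared_addr >= (1 << SHARED_BITS):
        raise ValueError("translated address exceeds 25 bits")
    return shared_addr


# Hypothetical register contents, chosen only for the example.
registers = [0x0000000, 0x0010000, 0x1000000, 0x1FF0000]
print(hex(translate(0x2ABCD, registers)))  # 0x100abcd
```

Here the high-order bits `10` pick register 2, and the offset `0xABCD` is added to its base. A kernel that loaded a wrong value into one of the four registers would silently redirect every access in that quarter of the address space, which suggests why managing them was delicate.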

The original C.mmp design used magnetic-core memory, but during its lifetime, higher performance dynamic random-access memory (RAM) became available and the system was upgraded.

The original processors were PDP-11/20 processors, but in the final system, only five of these were used; the remaining 11 were PDP-11/40 processors, which were modified by having extra writeable microcode space. All modifications to these machines were designed and built at CMU.

Most of the 11/20 modifications were custom changes to the wire-wrapped backplane, but because the PDP-11/40 was implemented in microcode, a separate proc-mod board was designed that intercepted certain instructions and implemented the protected operating system requirements. For example, it was necessary, for operating system integrity, that the stack pointer register never be odd. On the 11/20, this was done by clipping the lead to the low-order bit of the stack register. On the 11/40, any access to the stack was intercepted by the proc-mod board and generated an illegal data access trap if the low-order bit was 1.
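The two ways of keeping the stack pointer even can be contrasted in a short sketch. It assumes that clipping the lead on the 11/20 had the effect of forcing the low-order bit to 0, which the text implies but does not state; the function names and the exception class are hypothetical stand-ins for hardware behaviour.

```python
class IllegalDataAccessTrap(Exception):
    """Stand-in for the trap raised by the 11/40 proc-mod board."""


def stack_pointer_11_20(value):
    # Assumption: with the lead clipped, the low-order bit always reads as 0,
    # so an odd value is silently rounded down to the even value below it.
    return value & 0xFFFE


def stack_pointer_11_40(value):
    # The proc-mod board intercepts the access and traps on an odd value.
    if value & 1:
        raise IllegalDataAccessTrap(f"odd stack pointer {value:#o}")
    return value & 0xFFFF


print(oct(stack_pointer_11_20(0o1001)))  # 0o1000
print(oct(stack_pointer_11_40(0o1000)))  # 0o1000
```

The first approach masks the problem, while the second reports it, so the same faulty program could behave differently depending on which processor model it happened to run on.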

Operating system
The operating system (OS) was named Hydra. It was a capability-based, object-oriented, multi-user microkernel. System resources were represented as objects and protected through capabilities.

The OS and most application software were written in the programming language BLISS-11, which required cross-compiling on a PDP-10. The OS used very little assembly language.

Among the programming languages available on the system was an ALGOL 68 variant that included extensions supporting parallel computing, to make good use of the C.mmp. The ALGOL compiler ran natively on Hydra.

Reliability
Because overall system reliability depended on having all 16 CPUs running, there were serious problems with overall hardware reliability. If the mean time between failures (MTBF) of one processor was 24 hours, then the MTBF of the whole system was 24/16 hours, or about 90 minutes. In practice, the system usually ran for between two and six hours. Many of the failures were due to timing glitches in the many custom circuits added to the processors. Great effort was expended to improve hardware reliability. When a processor was noticeably failing, it was partitioned out and would run diagnostics for several hours. When it had passed a first set of diagnostics, it was partitioned back in as an I/O processor: it would not run application code, but its peripheral devices were available again, and it continued to run diagnostics. If it passed these after several more hours, it was reinstated as a full member of the processor set. Similarly, if a block of memory (one page) was detected as faulty, it was removed from the pool of available pages, and the OS would ignore the page until otherwise notified. Thus, the OS became an early example of a fault-tolerant system, able to cope with the hardware problems that inevitably arose.
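The MTBF estimate for a system that needs every processor running can be checked with a short calculation. This is a sketch under the usual textbook assumption of independent processors with constant failure rates, in which case the failure rates add and the system MTBF is the per-processor MTBF divided by the number of processors. The function name `system_mtbf_hours` is hypothetical.

```python
def system_mtbf_hours(unit_mtbf_hours, units):
    """MTBF of a system that fails as soon as any one unit fails.

    Assumption: units fail independently at a constant rate, so the
    combined failure rate is units / unit_mtbf_hours.
    """
    if units < 1 or unit_mtbf_hours <= 0:
        raise ValueError("need at least one unit and a positive MTBF")
    return unit_mtbf_hours / units


mtbf = system_mtbf_hours(24, 16)
print(mtbf)        # 1.5 (hours)
print(mtbf * 60)   # 90.0 (minutes)
```

With 16 processors at 24 hours each, the estimate is 1.5 hours. The same formula suggests why partitioning out a failing processor helped: the remaining units no longer depended on the weakest one to stay up.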