Multiprocessor system architecture

A multiprocessor system is defined as "a system with more than one processor", and, more precisely, "a number of central processing units linked together to enable parallel processing to take place".

The key objective of a multiprocessor is to boost a system's execution speed. The other objectives are fault tolerance and application matching.

The term "multiprocessor" can be confused with the term "multiprocessing". While multiprocessing is a type of processing in which two or more processors work together to execute multiple programs simultaneously, multiprocessor refers to a hardware architecture that allows multiprocessing.

Multiprocessor systems are classified according to how processor memory access is handled and whether the system's processors are all of a single type or of different types.

Multiprocessor system types
There are many types of multiprocessor systems:


 * Loosely coupled multiprocessor system
 * Tightly coupled multiprocessor system
 * Homogeneous multiprocessor system
 * Heterogeneous multiprocessor system
 * Shared memory multiprocessor system
 * Distributed memory multiprocessor system
 * Uniform memory access (UMA) system
 * cc-NUMA system
 * Hybrid system – shared system memory for global data and local memory for local data

Loosely-coupled (distributed memory) multiprocessor system


In loosely-coupled multiprocessor systems, each processor has its own local memory, input/output (I/O) channels, and operating system. Processors exchange data over a high-speed communication network by sending messages via a technique known as "message passing". Loosely-coupled multiprocessor systems are also known as distributed-memory systems, as the processors do not share physical memory and have individual I/O channels.
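
The message-passing model described above can be illustrated with a small in-process simulation. This is only a sketch: the names Node, Network, and message_passing_demo are hypothetical, and a real system would use separate machines or processes joined by a physical network. The point it shows is that a value moves between processors only as an explicit message, never through shared memory.

```python
from collections import deque


class Node:
    """One processor with private local memory and an inbox (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.local_memory = {}   # private: no other node reads or writes this
        self.inbox = deque()     # messages delivered by the network

    def send(self, network, dest, key, value):
        # Data leaves the node only as an explicit message.
        network.deliver(dest, (self.name, key, value))

    def receive_all(self):
        # Copy each received value into this node's own local memory.
        while self.inbox:
            _sender, key, value = self.inbox.popleft()
            self.local_memory[key] = value


class Network:
    """Stand-in for the communication network linking the nodes."""

    def __init__(self):
        self.nodes = {}

    def attach(self, node):
        self.nodes[node.name] = node

    def deliver(self, dest, message):
        self.nodes[dest].inbox.append(message)


def message_passing_demo():
    net = Network()
    a, b = Node("A"), Node("B")
    net.attach(a)
    net.attach(b)
    a.local_memory["x"] = 42
    a.send(net, "B", "x", a.local_memory["x"])
    before = dict(b.local_memory)   # B has not processed the message yet
    b.receive_all()
    return before, b.local_memory
```

Until node B processes its inbox, its local memory does not contain the value, which mirrors the absence of shared physical memory.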

System characteristics

 * These systems can execute multiple-instruction, multiple-data (MIMD) programs.
 * This type of architecture allows parallel processing.
 * The distributed memory is highly scalable.

Tightly-coupled (shared memory) multiprocessor system
A tightly-coupled multiprocessor system has a shared memory that is closely connected to the processors.

A symmetric multiprocessing system is a system with centralized shared memory called main memory (MM) operating under a single operating system with two or more homogeneous processors.

There are two types of systems:


 * Uniform memory-access (UMA) system
 * NUMA system

Uniform memory access (UMA) system

 * Heterogeneous multiprocessing system
 * Symmetric multiprocessing system (SMP)

Heterogeneous multiprocessor system
A heterogeneous multiprocessing system contains multiple, but not homogeneous, processing units – central processing units (CPUs), graphics processing units (GPUs), digital signal processors (DSPs), or any type of application-specific integrated circuits (ASICs). The system architecture allows any accelerator – for instance, a graphics processor – to operate at the same processing level as the system's CPU.

Symmetric multiprocessor system


A symmetric multiprocessor system (SMP) is a system with a pool of homogeneous processors running under a single OS with a centralized, shared main memory. Each processor, executing different programs and working on different sets of data, can share common resources (memory, I/O devices, the interrupt system, and so on). These resources are connected by a system bus, a crossbar, or a mix of the two, such as an address bus combined with a data crossbar.

Each processor has its own cache memory that acts as a bridge between the processor and main memory. The function of the cache is to alleviate the need for main-memory data access, thus reducing system-bus traffic.
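
The effect of a per-processor cache on system-bus traffic can be sketched with a toy model. The classes MainMemory and CachedProcessor and the function bus_reads_for are hypothetical names introduced for illustration. The model handles reads only, has no eviction, and ignores cache coherence between processors, all of which a real SMP system must deal with.

```python
class MainMemory:
    """Shared main memory reached over the system bus; counts bus reads."""

    def __init__(self, size):
        self.cells = [0] * size
        self.bus_reads = 0

    def read(self, addr):
        self.bus_reads += 1
        return self.cells[addr]


class CachedProcessor:
    """Processor with a private read cache (no eviction, for simplicity)."""

    def __init__(self, memory):
        self.memory = memory
        self.cache = {}

    def read(self, addr):
        if addr not in self.cache:           # miss: go over the bus
            self.cache[addr] = self.memory.read(addr)
        return self.cache[addr]              # hit: served from the cache


def bus_reads_for(addresses):
    """Return how many reads reach main memory for a sequence of addresses."""
    mem = MainMemory(16)
    cpu = CachedProcessor(mem)
    for addr in addresses:
        cpu.read(addr)
    return mem.bus_reads
```

For the access sequence [0, 1, 0, 1, 0, 2], only the three distinct addresses cause a bus read; the three repeated accesses are served from the cache.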

Use of shared memory allows for a uniform memory-access time (UMA).

cc-NUMA system


SMP systems have limited scalability. To overcome this limitation, an architecture called "cc-NUMA" (cache-coherent non-uniform memory access) is normally used. The main characteristic of a cc-NUMA system is a shared global memory that is physically distributed among the nodes. A processor's access to the memory of a remote component subsystem, or "node", is slower than access to its local memory, which is why the memory access is "non-uniform".

A cc-NUMA system is a cluster of SMP systems, each called a "node". A node can have a single processor, a multi-core processor, or a mix of the two, of one or another kind of architecture. The nodes are connected via a high-speed "connection network", which can be a "link" (a single or double-reverse ring, a multi-ring, point-to-point connections, or a mix of these, e.g. IBM Power Systems), a bus interconnection (e.g. NUMAq), a "crossbar", a "segmented bus" (NUMA Bull HN ISI, ex Honeywell), a "mesh router", etc.

cc-NUMA is also called "distributed shared memory" (DSM) architecture.

The difference in access times between local and remote memory can be as much as an order of magnitude, depending on the kind of connection network used (faster with segmented bus, crossbar, and point-to-point interconnection; slower with serial ring connections).

To reduce the penalty of remote memory access, a large remote cache (see Remote cache) is normally used. With this solution, a cc-NUMA system behaves much like a large SMP system.
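
The effect of the local/remote gap, and of a remote cache, on mean access time can be estimated with a simple weighted average. This is a back-of-the-envelope sketch, not a model of any particular machine: the function name, its parameters, and the latencies used in the example are all assumptions chosen for illustration.

```python
def average_access_time(local_fraction, t_local, t_remote,
                        remote_cache_hit_rate=0.0, t_remote_cache=None):
    """Mean memory access time seen by one node of a cc-NUMA system (toy model).

    local_fraction: share of accesses that go to the node's own memory.
    remote_cache_hit_rate: share of remote accesses served by a remote cache.
    """
    if t_remote_cache is None:
        # Assumption: a remote-cache hit costs the same as a local access.
        t_remote_cache = t_local
    remote_cost = (remote_cache_hit_rate * t_remote_cache
                   + (1.0 - remote_cache_hit_rate) * t_remote)
    return local_fraction * t_local + (1.0 - local_fraction) * remote_cost
```

With hypothetical latencies of 100 time units for local memory and 1000 for remote memory (an order of magnitude apart) and half of all accesses going remote, average_access_time(0.5, 100, 1000) gives 550 units. Adding a remote cache with a 75% hit rate, average_access_time(0.5, 100, 1000, remote_cache_hit_rate=0.75), brings this down to 212.5 units, much closer to the uniform access time of an SMP system.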

Tightly-coupled versus loosely-coupled architecture
Both architectures have trade-offs which may be summarized as follows:
 * Loosely-coupled architectures offer high performance for each individual processor but do not allow easy real-time balancing of the load among processors.
 * Tightly-coupled architectures allow easy load balancing and distribution among processors but suffer from the bottleneck of sharing common resources through one or more buses.
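
The load-balancing side of this trade-off can be illustrated by comparing two scheduling policies. In this sketch, a fixed round-robin assignment stands in for a loosely-coupled system that cannot rebalance at run time, and a shared work queue stands in for a tightly-coupled system. Both function names and the task durations are hypothetical, and the model deliberately ignores the bus contention that penalizes the shared-queue approach in practice.

```python
import heapq


def makespan_static(task_times, n_procs):
    """Tasks are pre-assigned round-robin; no rebalancing at run time."""
    loads = [0] * n_procs
    for i, t in enumerate(task_times):
        loads[i % n_procs] += t
    return max(loads)


def makespan_shared_queue(task_times, n_procs):
    """Each task goes to whichever processor becomes free first."""
    free_at = [0] * n_procs      # min-heap of the times at which processors are free
    heapq.heapify(free_at)
    for t in task_times:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + t)
    return max(free_at)
```

For task durations [8, 1, 8, 1, 8, 1] on two processors, the static assignment finishes after 24 time units because one processor receives all the long tasks, while the shared queue finishes after 17.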

Multiprocessor system featuring global data multiplication
An intermediate approach between the two previous architectures is to have both common resources and local resources, such as local memories (LM), in each processor.

The common resources are accessible from all processors via the system bus, while local resources are only accessible to the local processor. Cache memories can be viewed in this perspective as local memories.

This system (patented by F. Zulian), used on the DPX/2 300 Unix-based system (Bull HN Information Systems Italia, ex Honeywell), is a mix of tightly and loosely coupled systems and makes use of the advantages of both architectures.

The local memory is divided into two sectors, global data (GD) and local data (LD).

The basic concept of this architecture is to have global data, which is modifiable information, accessible by all processors. This information is duplicated and stored in the local memory of each processor.

Each time the global data is modified in a local memory, a hardware write broadcast is sent over the system bus to all other local memories to maintain global data coherency. Thus, global data may be read by each processor from its own local memory without involving the system bus. System bus access is only required when global data is modified in one local memory, in order to update the copies of this data stored in the other local memories.
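
The write-broadcast scheme can be sketched in software as follows. This is an illustrative model only, not the patented hardware mechanism: the classes SystemBus and Processor and the function write_broadcast_demo are hypothetical, and the bus is reduced to a transaction counter.

```python
class SystemBus:
    """Counts transactions and forwards global-data writes to every local memory."""

    def __init__(self):
        self.processors = []
        self.transactions = 0

    def broadcast_write(self, source, key, value):
        self.transactions += 1
        for p in self.processors:
            if p is not source:
                p.global_data[key] = value


class Processor:
    def __init__(self, bus):
        self.bus = bus
        self.global_data = {}   # GD sector: replicated in every local memory
        self.local_data = {}    # LD sector: private to this processor
        bus.processors.append(self)

    def read_global(self, key):
        # Served from the local copy; the bus is not involved.
        return self.global_data[key]

    def write_global(self, key, value):
        # Update the local copy, then broadcast so all replicas stay coherent.
        self.global_data[key] = value
        self.bus.broadcast_write(self, key, value)


def write_broadcast_demo():
    bus = SystemBus()
    cpus = [Processor(bus) for _ in range(3)]
    cpus[0].write_global("counter", 7)
    values = [cpu.read_global("counter") for cpu in cpus]
    return values, bus.transactions
```

In the demo, one write by the first processor costs a single bus transaction, after which all three processors read the value 7 from their own local memories without any further bus traffic.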

Local data can be exchanged, as in a loosely coupled system, via message passing.