Talk:Symmetric multiprocessing

Interpretation of SMP
This article is incorrect in its interpretation of SMP. SMP (Symmetric MultiProcessing) refers to the capability of any part of the operating system to execute on any processor. Asymmetric MP is a system where key portions of the OS such as IO operations can only execute on the primary CPU. Applications code can also execute on secondary CPUs. Asymmetric MP is typically easier to implement but does not scale as well as SMP because the primary cpu becomes a bottleneck. SMP avoids this by allowing all code to execute on any available CPU. This requires reentrant OS code.

NUMA and UMA refer to memory access in shared memory MP architectures (usually SMP). UMA (Uniform Memory Access) is generally implemented as a bus where each CPU has essentially the same path to shared memory. This is difficult to implement in systems with large numbers of CPUs, though examples have existed with 64 CPUs. In this design the memory bus eventually becomes a bottleneck. To avoid this, NUMA (NonUniform Memory Access) systems are typically composed of building blocks of small UMA SMP nodes with two to four CPUs and some local memory linked by high speed networks so that any CPU can access all addressable memory. Access to nonlocal memory is slower. There are usually several tiers of networking in very large NUMA systems with over a thousand CPUs. These systems scale better than UMA because with good locality of reference and intelligent scheduling much data required by a given CPU will be held in local memory avoiding bus contention. The term ccNUMA means cache coherent NUMA. Some provision such as bus snooping or a directory is used to maintain a coherent picture of shared memory in the cache of each processor. All major commercial NUMA machines are cache coherent, so the cc is often dropped.
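As a rough illustration of the locality argument above, here is a minimal sketch of the expected access time on a NUMA node, assuming a simple two-level model (local versus nonlocal memory). The function name and the latency figures are hypothetical, chosen only to show the shape of the calculation, and are not taken from any real machine.

```python
def average_access_ns(local_fraction, local_ns, remote_ns):
    """Expected memory access time when a given fraction of accesses is local."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


# Hypothetical latencies, for illustration only.
LOCAL_NS = 100.0
REMOTE_NS = 300.0

if __name__ == "__main__":
    # Better locality pulls the average toward the local latency.
    for fraction in (0.5, 0.9, 0.99):
        print(fraction, average_access_ns(fraction, LOCAL_NS, REMOTE_NS))
```

The model ignores contention on the interconnect and cache effects, so it only shows why scheduling for locality matters, not how large the benefit is on a particular system.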

Another popular multiprocessing model is the distributed memory cluster. In this case you have a dedicated network of independent computing nodes which do not have a shared address space. These systems employ message passing to communicate data between nodes. This requires a different approach to programming since data resides on specific nodes rather than in a single shared address space. Distributed clusters are generally far less costly than shared memory multiprocessors of similar size.
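The programming difference can be sketched in a few lines. This is only a model, with assumptions stated plainly: two threads stand in for two nodes, each touches only its own list, and a pair of queues stands in for the network. The function `run_two_nodes` and its parameters are hypothetical names, and a real cluster would use a message-passing library rather than in-process queues.

```python
import queue
import threading


def run_two_nodes(data_a, data_b):
    """Each 'node' sums its own data and learns the other's sum only by message."""
    to_a, to_b = queue.Queue(), queue.Queue()  # stand-ins for network links
    totals = {}

    def node(name, local_data, inbox, outbox):
        partial = sum(local_data)   # work on data that resides on this node
        outbox.put(partial)         # send the partial result to the peer
        totals[name] = partial + inbox.get()  # wait for the peer's message

    threads = [
        threading.Thread(target=node, args=("A", data_a, to_a, to_b)),
        threading.Thread(target=node, args=("B", data_b, to_b, to_a)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return totals


if __name__ == "__main__":
    print(run_two_nodes([1, 2, 3], [10, 20]))
```

In a shared-memory design either worker could simply read the other's list; here the only way to learn about remote data is to receive it, which is the different approach to programming described above.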

— Preceding unsigned comment added by 64.136.49.229 (talk • contribs) 11:14, 1 January 2005‎


 * "This article is incorrect in its interpretation of SMP. SMP (Symmetric MultiProcessing) refers to the capability of any part of the operating system to execute on any processor."
 * Ah... No, that would be a multi-programmed OS, or a multiprocessor-aware application.
 * "Asymmetric MP is a system where key portions of the OS such as IO operations can only execute on the primary CPU."
 * An example being a Power Macintosh 9500/180MP.
 * "Applications code can also execute on secondary CPUs."
 * Actually, applications can execute functions on the secondary CPU. When an application is executed, it's loaded into memory, and the OS passes control to it.
 * "Asymmetric MP is typically easier to implement but does not scale as well as SMP because the primary cpu becomes a bottleneck."
 * Only for certain types of applications.
 * "SMP avoids this by allowing all code to execute on any available CPU. This requires reentrant OS code."
 * As most applications are. There are a few badly behaved applications, like games, that have to manage multi-programming themselves, but for the most part,
 * "Another popular multiprocessing model is the distributed memory cluster. In this case you have a dedicated network of independent computing nodes which do not have a shared address space. These systems employ message passing to communicate data between nodes.  This requires a different approach to programming since data resides on specific nodes rather than in a single shared address space.  Distributed clusters are generally far less costly than shared memory multiprocessors of similar size."
 * Distributed memory clusters, for the most part, are NUMA machines. For efficiency, each memory segment has multiple processors. There are many examples of this. The message passing can occur on a dedicated processor bus, the system bus, an I/O bus, an I/O bus to Ethernet/Myrinet, or custom communication fabrics like the MasPar.
 * — Preceding unsigned comment added by Artoftransformation (talk • contribs) 13:04, 5 November 2007‎

Undue weight on vSMP
There's a whole section on vSMP in this article. It describes several key advantages of vSMP in detail, but it fails to cover any potential disadvantages, nor do the advantages alone explain the very limited adoption of the technology. The only processors I know of that implemented this are the Tegra series, and the technology was not licensed to other processor designers. Indeed, most other ARM SoC designers and manufacturers went with ARM big.LITTLE instead.

Perhaps we could turn this section into one that briefly discusses heterogeneous multiprocessing techniques like vSMP and big.LITTLE in general? I think a brief summary of vSMP and its advantages is justified, but it should be one that fits within a paragraph, rather than a whole section. As it stands, the section gives too much weight to a processor technology that isn't widely adopted. — bwDraco talk /contribs 22:13, 11 August 2017 (UTC)

memory access time from all CPUs
“The term SMP is widely used but causes a bit of confusion. [...] The more precise description of what is intended by SMP is a shared memory multiprocessor where the cost of accessing a memory location is the same for all processors; that is, it has uniform access costs when the access actually is to memory. If the location is cached, the access will be faster, but cache access times and memory access times are the same on all processors.”

I don't dispute the authenticity of the quotation, but the definition is silly. On the IBM System/360 model 65, for example, the time required to access memory from each CPU varied based on the length of the cables from the CPU to the memory box. See page 34 of the Functional Characteristics manual. That doesn't make the model 65 not an SMP system. The actual rule is that memory access time is close enough to uniform that software can get away with ignoring the different times. This is similar to the transition from drum to core memory, which allowed software to treat all memory locations as having the same access time. John Sauter (talk) 03:42, 21 May 2018 (UTC)


 * And if the difference between memory access times is significant, that's NUMA.


 * I tend to think of "not symmetric" as meaning "the system's behavior is not symmetric under a permutation of processors". At the memory level, "symmetric" would mean that, for each main memory access speed, each processor has access to as much main memory of that speed as any other processor (assuming a full memory configuration).  If you permute two processors in an MP65 system, each processor still has fast access to half the memory and slower access to the other half, with the two processors swapping which memory they have fast access to and which memory they don't (according to the table you cited in the Model 65 Functional Characteristics manual).  The same applies to a NUMA system. Guy Harris (talk) 05:11, 21 May 2018 (UTC)
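The permutation idea in the comment above can be made concrete with a small sketch. It assumes the system is described by a table of access times, one row per processor and one column per memory bank, with all banks the same size. The function names and the numbers are hypothetical and only illustrate the memory-level definition, not the I/O or privileged-code kinds of asymmetry.

```python
def is_uniform(access_times):
    """True if every processor reaches every memory bank in the same time."""
    return len({t for row in access_times for t in row}) == 1


def is_symmetric(access_times):
    """True if every processor sees the same collection of access times.

    Sorting each row ignores *which* bank is fast for a given processor, so
    swapping two processors leaves the description of the system unchanged.
    """
    profiles = [sorted(row) for row in access_times]
    return all(profile == profiles[0] for profile in profiles)


if __name__ == "__main__":
    # Hypothetical units. Each processor is fast to one bank, slow to the other.
    two_cpu_split = [[1, 3], [3, 1]]
    # One processor is fast to all memory, the other slow to all of it.
    one_favoured = [[1, 1], [3, 3]]
    print(is_symmetric(two_cpu_split), is_uniform(two_cpu_split))
    print(is_symmetric(one_favoured), is_uniform(one_favoured))
```

Under this reading a system can be symmetric without being uniform, which is the case described for both the MP65 and a NUMA system.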


 * It sounds like you classify a NUMA computer as a type of SMP computer. I'm not sure I agree.  On a non-NUMA computer (a UMA computer?) the kernel scheduler can assign any CPU to any ready task.  Scheduling a different CPU from the last one used for the task might involve reloading the FPU state and reloading cache, but those are small enough penalties that they can be ignored.  On a NUMA system, by contrast, scheduling a different CPU can require copying the task's memory to that CPU's local memory.  Doing the copy is a big deal, and not doing it incurs unacceptable performance penalties.  That's different enough that I don't think a NUMA computer deserves to be called SMP.  John Sauter (talk) 18:25, 21 May 2018 (UTC)


 * I'm treating "symmetry" strictly, in the "symmetric under a permutation of processors" sense. I view the distinction between uniform and non-uniform memory access as separate from the distinction between symmetric and asymmetric.  The OS requirements for an ASMP system, such as routing all I/O through the processor to which the devices are attached if the asymmetry involves one CPU handling all I/O, or routing all privileged operations through the CPU running the privileged code if the asymmetry involves one CPU handling all the privileged code, are different from the OS requirements for a NUMA system, such as dividing the pool of CPUs into groups and assigning a process or thread to a group and scheduling it to run only on CPUs within the group.


 * (As for the MP65, was the 2x-3.3x speed difference in memory access sufficient for the OS to try to keep a job running on the CPU that has the fastest access to the job's memory? If so, it's sort of NUMA, although not as non-uniform as more recent NUMA systems.


 * My quick read of the functional characteristics manual seems to indicate that not all peripherals were necessarily accessible by both CPUs; if that's the case, that's another form of non-uniformity, non-uniform access to peripherals.)


 * I don't know whether treating "asymmetric vs. symmetric" and "uniform vs. non-uniform access to system resources" as separate distinctions is common in multiprocessor taxonomy, however. Guy Harris (talk) 19:24, 21 May 2018 (UTC)


 * I don't have any personal experience with the multiprocessor model 65, but as best I can tell from the surviving literature, there was no attempt on the part of the operating system to optimize for memory access times in the task scheduler. Doing that would have required knowing how the memory address switches were set, and as far as I know the operating system could not sense them.  Also, note that the memory boxes have a 750 nanosecond cycle time, so the access times were not extremely different from one box to the next.  The model 65 did not have memory interleaving, whereas the contemporary DEC PDP-6 did, at the expense of not having partitioning.  John Sauter (talk) 21:41, 22 May 2018 (UTC)


 * I was in the USAF in the 1970s. The Back-Up Interceptor Control systems based on the Burroughs D-825 had a large amount of coax wound on the internal card frames (I forget what those were called). We were warned not to shorten the cables as they kept the timing constant. It would appear that all memory cabinets, timing-wise, were equidistant from the IO and CPUs. Adakiko (talk) 20:07, 8 January 2022 (UTC)
 * "The model 65 did not have memory interleaving". This is not true. The 65 (including 65s with the multiprocessor feature) used Model 2365 storage units which implemented interleaving. There is a "Defeat Interleaving" switch on the 2065 front panel. Plenty of references to interleaving in the 2065 FETOM. Interleaving reduced the cycle time from 750 to 400 nanoseconds. Mark Triggers (talk) 21:07, 28 June 2024 (UTC)
 * I apologize for saying that the model 65 did not have interleaving: I was not aware of it, having overlooked the clear description in the Functional Characteristics manual. However, 400 nanoseconds is not the cycle time of interleaved memory, but the effective access time, assuming sequential access. John Sauter (talk) 23:33, 28 June 2024 (UTC)

x86 / Mass-market SMP
I came to this article looking for when SMP came to the masses, to the PC. I think it would be nice to have a section, or at least a line, on the matter.--Prosfilaes (talk) 11:18, 20 December 2019 (UTC)