User:Dv02159/sandbox/The Single-Chip Cloud Computer

 The Single-Chip Cloud Computer  

The Single-Chip Cloud Computer, or SCC, is a multiprocessor (a computer with more than one central processing unit) project created and funded by Intel Corporation beginning in 2009. The project was started to promote research on many-core processors and parallel programming, with a focus on multi-threaded code. Parallel programming is a form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved "in parallel", or at the same time. For this project, Intel designed the integrated circuit, also known as a chip, to explore integrated cloud computing on a chip. The configuration and design of the SCC took two and a half years and was a collaboration among a group of 50 to 60 people from Intel sites around the world. A unique feature of the SCC is its ability to adjust the voltage and frequency of the tiles, both at startup and dynamically during operation.

To achieve the necessary computing power, each SCC chip contains 48 P54C Pentium cores that are interconnected by a 4x6 2D mesh. The cores use four DDR3 memory controllers for random-access memory (RAM), which are also connected through the 2D mesh.

Development
The SCC is designed to continue Intel's trend toward higher core counts, which is intended to replace increases in clock frequency. Those increases have resulted in high power requirements, high temperatures that are difficult to control, lower reliability, and chip design difficulties.

Intel originally predicted that the project would take at least three years to complete, but the team of designers produced a working model in two and a half years. They credited the high core counts to the limits set by dynamic and leakage power consumption, software development tools, and the nature of the workloads that benefit. The SCC is capable of handling a range of workloads, from consumer-level platforms, with threads of just a handful of instructions, to commercial or computational platforms that crunch threads tens of thousands of instructions long and beyond.

Capacity
The endurance capacity, or the extent of the SCC's endurance under stress, came from Intel's interest in finding out whether it could realistically support the datacenter model used in most major corporations with better efficiency and cost ratio, and bring that programming model to the workstation and client with better hardware support.

The SCC interconnect driver reliably transports messages the size of a single cache line, 32 bytes, through a message setup in noncoherent shared memory. Shared memory is accessed entirely from user space, using the SCC write-combine buffer for performance. The polling approach to detecting incoming messages, used by lightweight message-passing runtimes such as RCCE, is inappropriate when shared memory is used to deliver message payloads, since each poll of a message-passing channel requires a cache invalidate followed by a load from DDR3 memory (Barrel ).
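To make the 32-byte message size concrete, the following Python sketch splits a payload into cache-line-sized messages and reassembles it. It is only an illustration: the one-byte length header and the names `to_messages` and `from_messages` are assumptions made for this example and are not taken from the SCC driver or from RCCE.

```python
CACHE_LINE = 32  # message size in bytes, as stated in the text


def to_messages(payload: bytes) -> list[bytes]:
    """Split a payload into fixed 32-byte messages.

    Hypothetical layout: byte 0 holds the number of valid payload
    bytes, the remaining 31 bytes hold data padded with zeros.
    """
    body = CACHE_LINE - 1
    messages = []
    for start in range(0, len(payload), body):
        chunk = payload[start:start + body]
        messages.append(bytes([len(chunk)]) + chunk.ljust(body, b"\0"))
    return messages


def from_messages(messages: list[bytes]) -> bytes:
    """Reassemble the payload by dropping the header and the padding."""
    return b"".join(m[1:1 + m[0]] for m in messages)
```

With this layout a 40-byte payload occupies two messages, each exactly one cache line long.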

Design
The SCC processor consists of 48 general-purpose x86 cores on a single die. The cores are distributed across 24 tiles, organized into a six by four 2D mesh. Each tile contains two P54C processor cores. The P54C is the second generation Pentium® processor core, as shown in Figure 1. The tile provides a separate unified L2 cache for each core, as well as two globally accessible test&set registers. These are atomic read-modify-write registers that can be in one of two states, set and unset. In addition, each tile has 16 kB of SRAM to support a shared address space visible to all cores, called the Message Passing Buffer (MPB). The cache controllers and MPB connect to the router through a Mesh Interface (I/F) unit.
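The arrangement of cores, tiles, and mesh described above can be sketched in a few lines of Python. The sketch assumes that cores are numbered consecutively, two per tile, and that tiles are numbered row by row across the six-wide mesh. It also assumes that a message travels the Manhattan distance between tiles. Neither the numbering nor the routing assumption is stated in the text, and the function names are hypothetical.

```python
MESH_COLS = 6        # tiles per row (six by four mesh)
MESH_ROWS = 4        # rows of tiles
CORES_PER_TILE = 2   # two P54C cores per tile
NUM_CORES = MESH_COLS * MESH_ROWS * CORES_PER_TILE  # 48


def core_to_tile(core_id: int) -> tuple[int, int]:
    """Return the (x, y) mesh coordinate of the tile holding a core.

    Assumes row-major tile numbering, which is an illustrative choice.
    """
    if not 0 <= core_id < NUM_CORES:
        raise ValueError(f"core_id must be in 0..{NUM_CORES - 1}")
    tile = core_id // CORES_PER_TILE
    return tile % MESH_COLS, tile // MESH_COLS


def hop_count(src_core: int, dst_core: int) -> int:
    """Router hops between two cores, assuming Manhattan-distance routing."""
    sx, sy = core_to_tile(src_core)
    dx, dy = core_to_tile(dst_core)
    return abs(sx - dx) + abs(sy - dy)
```

Under these assumptions, two cores on the same tile are zero hops apart, and the longest path in the mesh, between opposite corners, is eight hops.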

The SCC processor includes four DDR3 memory controllers at the corners of the mesh, and an extension of the on-die network to a unit that translates router traffic to support the PCI interface. In addition, the SCC processor provides three distinct address spaces: private DRAM, shared DRAM, and the Message Passing Buffer (MPB) in on-chip SRAM.

Originally, Intel expected the SCC processor to be used without an operating system; however, Linux was eventually ported to the SCC and has become the most common mode of operation. Intel found that the choices it made to support execution of RCCE programs without an OS reduced the overhead of message passing on the SCC processor.

Modes of Operation
Programs may be loaded into SCC memory by the management console through the system interface and the on-die mesh. The memory holding the program can be dynamically mapped to the address space of the 48 cores; it can also be mapped to the memory space used for loading and debugging. This means that a user can load programs onto the SCC and control, through mapping, what is done with them. A program can then be used in two different modes of operation, Processor Mode and Mesh Mode.

Processor Mode
In processor mode, the cores are operational. They execute code from system memory and perform programmed I/O through the system interface, which is connected off-die to the system board FPGA.

Loading memory and configuring the processor for bootstrapping is currently done by software running on the console.

Mesh Mode
In mesh mode, the cores are off and the router is stressed for performance measurements: the mesh and traffic generators are on, sending and receiving large amounts of data. The drawback of this mode is that, because the core logic is off, there is no memory map.

Goals and Usage
The Single-chip Cloud Computer is a research chip created by Intel Labs to study many-core CPUs, their architectures, and the techniques used to program them. The SCC processor is a co-design of hardware and software, i.e. the design of the processor and the parallel programming environment proceeded together. This environment exposes the low level details of the hardware to the programmer, but is still sufficiently abstract to support productive parallel programming. The SCC processor at its lowest level is based on an on-die network to move messages around the chip.

In its current usage model, the SCC has the following implications:
 * there are no message queues to maintain and traverse
 * synchronization variables can be limited to exactly two for each pair of communicating cores
 * the entire net dedicated on-chip message memory (MPB) is available to each individual communication payload
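The points above can be illustrated with a small, single-threaded Python model of one sender and one receiver. It is a sketch only: the class `PairChannel`, the flag names `sent` and `ready`, and the buffer capacity passed in by the caller are assumptions for this example and do not reproduce the RCCE implementation. The model keeps no queue, uses exactly two synchronization flags for the pair, and lets a single payload occupy the whole buffer.

```python
class PairChannel:
    """One-slot channel between a pair of cores (illustrative model)."""

    def __init__(self, capacity: int):
        self.buffer = bytearray(capacity)  # stands in for the message buffer
        self.length = 0
        self.sent = False   # set by the sender, cleared by the receiver
        self.ready = True   # set by the receiver, cleared by the sender

    def try_send(self, payload: bytes) -> bool:
        """Copy a payload into the buffer if the receiver is ready."""
        if len(payload) > len(self.buffer):
            raise ValueError("payload larger than the message buffer")
        if not self.ready:
            return False  # previous message not yet consumed
        self.buffer[:len(payload)] = payload
        self.length = len(payload)
        self.ready = False
        self.sent = True
        return True

    def try_recv(self):
        """Return the pending payload, or None if nothing was sent."""
        if not self.sent:
            return None
        data = bytes(self.buffer[:self.length])
        self.sent = False
        self.ready = True
        return data
```

Because there is only one slot, a second `try_send` fails until the receiver has consumed the first message, which is why no queue has to be maintained or traversed.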