User:BPon/sandbox/Intel Tera-Scale

Intel Tera-Scale is a research program by Intel that focuses on developing Intel processors and platforms that exploit the inherent parallelism of emerging visual-computing applications. Such applications require teraFLOPS of parallel computing performance to process terabytes of data quickly. Parallelism is the concept of performing multiple tasks simultaneously. Exploiting parallelism can increase both the efficiency of central processing units (CPUs) and the amount of data analyzed each second. To apply parallelism effectively, the CPU must be able to run multiple threads at once, which requires multiple cores. Consumer-grade computers conventionally have 2 to 8 cores, while workstation-grade computers can have more. Even these core counts are not enough to reach teraFLOPS performance, so many more cores must be added. The program has produced two prototypes, which were used to test the feasibility of having many more cores than is conventional; both proved successful.

Prototypes
Teraflops Research Chip (Polaris) is an 80-core prototype processor developed by Intel in 2007. It represents Intel's first public attempt at creating a Tera-Scale processor. The Polaris processor must run at 3.13 GHz and 1 V to reach the one-teraFLOPS performance that gives it its name. At its peak, the processor is capable of 1.28 teraFLOPS.
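The link between clock frequency and peak throughput can be checked with a short calculation. The sketch below assumes that each of the 80 cores contains two floating-point multiply-accumulate units, each completing one multiply and one add per cycle, for 4 floating-point operations per core per cycle. That per-core figure is not stated above and is an assumption taken from Intel's published descriptions of the chip; the function and constant names are illustrative.

```python
def peak_teraflops(cores: int, flops_per_core_per_cycle: int, clock_ghz: float) -> float:
    """Theoretical peak throughput in teraFLOPS (ignores memory and network stalls)."""
    flops_per_second = cores * flops_per_core_per_cycle * clock_ghz * 1e9
    return flops_per_second / 1e12


POLARIS_CORES = 80
# Assumption: 2 multiply-accumulate units per core x 2 operations each.
FLOPS_PER_CORE_PER_CYCLE = 4

# At the 3.13 GHz clock quoted above, the peak is just over one teraFLOPS.
print(peak_teraflops(POLARIS_CORES, FLOPS_PER_CORE_PER_CYCLE, 3.13))

# Under the same assumption, 1.28 teraFLOPS corresponds to a 4 GHz clock.
print(peak_teraflops(POLARIS_CORES, FLOPS_PER_CORE_PER_CYCLE, 4.0))
```

This is a theoretical ceiling only; sustained performance on real workloads depends on how well the cores can be kept supplied with data.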

Single-chip Cloud Computer is another research processor developed by Intel in 2009. This processor consists of 48 P54C cores, grouped into 24 dual-core tiles that are connected in a 6x4 2D mesh.
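To give a feel for what a 6x4 mesh implies for communication between cores, the following sketch counts the hops a message would take between two cores. It assumes two cores share each mesh position (48 cores over 6 x 4 = 24 positions), a simple row-by-row core numbering, and dimension-order (X then Y) routing. The numbering scheme and helper names are hypothetical and chosen only for illustration.

```python
MESH_WIDTH, MESH_HEIGHT = 6, 4
CORES_PER_TILE = 2  # assumption: 48 cores spread over 24 mesh positions


def tile_of(core_id: int) -> tuple[int, int]:
    """Map a core number (0..47) to the (x, y) mesh position of its tile."""
    tile = core_id // CORES_PER_TILE
    return tile % MESH_WIDTH, tile // MESH_WIDTH


def xy_route(src: tuple[int, int], dst: tuple[int, int]) -> list[tuple[int, int]]:
    """Dimension-order routing: travel along X first, then along Y."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


# Worst case in this layout: opposite corners of the mesh.
route = xy_route(tile_of(0), tile_of(47))
print(route)
print("hops:", len(route) - 1)
```

Two cores on the same tile need no mesh hops at all, while the corner-to-corner case shows the longest path a message can take in a mesh of this size.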

Ideology
Parallelism is the concept of performing multiple tasks simultaneously, effectively reducing the time needed to perform a given task. The Tera-Scale research program is focused on using many more cores than is conventional to increase performance through parallelism. In Intel's previous experience with increasing core counts, doubling the number of cores nearly doubled performance with no increase in power. A greater number of cores opens up the possibility of improved energy efficiency, improved performance, extended lifetimes and new capabilities. Tera-Scale processors would improve energy efficiency by "putting to sleep" cores that are not needed at a given time, and would improve performance by intelligently redistributing work so that the load is spread evenly across the chip. Extended lifetimes are also possible with tera-scale processors, because reserve cores could be brought online when a core in the processor fails. Lastly, the processors would gain new capabilities and functionality, as dedicated hardware engines, such as graphics engines, could be integrated.
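The three ideas in this section (sleeping unneeded cores, spreading work evenly, and bringing reserve cores online after a failure) can be sketched as a small planning routine. This is a minimal illustration under simplifying assumptions, not a description of Intel's actual scheduling logic: cores are numbered from 0, the highest-numbered healthy cores act as reserves, and work is balanced with a simple greedy rule. All names and numbers are hypothetical.

```python
def plan_cores(cores_needed, total_cores, num_spares, failed=frozenset()):
    """Decide which healthy cores run, which sleep, and which stay in reserve."""
    healthy = [c for c in range(total_cores) if c not in failed]
    capacity = total_cores - num_spares   # number of cores meant to be in service
    in_service = healthy[:capacity]       # reserves move up to replace failed cores
    active = in_service[:cores_needed]    # wake only as many cores as the load needs
    asleep = in_service[cores_needed:]    # the rest are put to sleep to save power
    reserve = healthy[capacity:]          # untouched spares for future failures
    return active, asleep, reserve


def distribute(task_costs, active_cores):
    """Greedy balancing: hand each task, largest first, to the least-loaded core."""
    loads = {core: 0 for core in active_cores}
    for cost in sorted(task_costs, reverse=True):
        target = min(loads, key=loads.get)
        loads[target] += cost
    return loads


# Hypothetical 8-core chip with 2 reserve cores; core 2 has failed.
active, asleep, reserve = plan_cores(cores_needed=4, total_cores=8,
                                     num_spares=2, failed={2})
print("active:", active, "asleep:", asleep, "reserve:", reserve)
print("loads:", distribute([5, 3, 3, 2, 2, 1], active))
```

In this run, core 6 is pulled out of reserve to cover for the failed core 2, two in-service cores stay asleep because only four are needed, and the six tasks end up spread so that no core carries much more than any other.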

Hardware
Intel Tera-Scale is focused on creating multi-core processors that can use parallel processing to reach teraFLOPS of computing performance. Current processors consist of highly complex cores, and that complexity makes it difficult to fit more cores on a chip than CPUs have today. As a result, Intel is focused on creating Tera-Scale processors with many simple cores rather than a few high-performance cores. To simplify the cores, Intel moved from the x86 architecture to a much simpler very long instruction word (VLIW) architecture for the Teraflops Research Chip. VLIW is an uncommon architecture for desktops, but is adequate for computers running specialized applications. This architecture simplifies hardware design at the cost of increasing the workload on the compiler: the compiler, rather than the hardware, must find operations that can execute in parallel, so more effort goes into software. This drawback is offset by the fact that the number of applications that will run on a Tera-Scale processor is small enough that the extra software effort is not too much of a burden.
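The extra compiler work that VLIW requires can be illustrated with a toy scheduler that packs independent operations into fixed-width instruction bundles. This is a simplified sketch, not the instruction format or compiler of any Intel chip: it assumes every result register is written only once, that every operation takes one cycle, and that a bundle has a fixed number of identical slots. The `Op` type and register names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str
    dest: str        # register written by this operation (assumed written once)
    sources: tuple   # registers read by this operation


def schedule(ops, slots_per_bundle):
    """Greedy scheduling: place each op in the earliest bundle that comes after
    every bundle producing one of its inputs and that still has a free slot."""
    bundles = []
    written_in = {}  # register -> index of the bundle that writes it
    for op in ops:
        earliest = 0
        for src in op.sources:
            if src in written_in:
                earliest = max(earliest, written_in[src] + 1)
        idx = earliest
        while idx < len(bundles) and len(bundles[idx]) >= slots_per_bundle:
            idx += 1
        if idx == len(bundles):
            bundles.append([])
        bundles[idx].append(op)
        written_in[op.dest] = idx
    return bundles


# Computes (a*b + c*d) + e*f using five single-cycle operations.
program = [
    Op("mul1", "t1", ("a", "b")),
    Op("mul2", "t2", ("c", "d")),
    Op("add1", "t3", ("t1", "t2")),
    Op("mul3", "t4", ("e", "f")),
    Op("add2", "t5", ("t3", "t4")),
]

for cycle, bundle in enumerate(schedule(program, slots_per_bundle=2)):
    print(cycle, [op.name for op in bundle])
```

With two slots per bundle, the five operations fit into three bundles instead of five sequential steps. The hardware simply issues each bundle as given, which is why the core can be simple and why the quality of the compiler's schedule matters so much.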

Software
With the release of the 80-core Polaris processor in 2007, observers questioned the need for tens to hundreds of cores. Intel responded by describing a category of software called Recognition, Mining, and Synthesis (RMS) applications, which require the computational power of tens to hundreds of cores. Recognition applications create models based on what they identify, such as a person's face. Mining applications extract one or more instances from a large amount of data. Lastly, synthesis applications allow for the prediction and projection of new environments.

An example of where RMS and tera-scale processors are needed is the creation of sport summaries. A computer usually requires hours to mine through hundreds of thousands of video frames to find the short action clips shown in a sport summary. With RMS software and a tera-scale processor, sport summaries could be created in real time during sporting events. Tera-Scale processors also show potential for real-time analysis in fields such as finance, which require a processor capable of analyzing immense amounts of data.

From its past evolution from single-core to multi-core processors, Intel has learned that parallelization is the key to greater processing power in the future. The Intel Tera-Scale research program is focused not only on creating multi-core processors, but also on parallelizing the applications of today and of the future. To show its dedication to all aspects of parallel computing, Intel set aside $20 million to establish centers that will research and develop new methods to use parallel computing in many more applications.
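The sport-summary example is a mining workload in which every frame can be scored independently, which is what makes it a good fit for many cores. The sketch below shows that structure using Python's standard `concurrent.futures` module. The scoring function is a hypothetical stand-in for real video analysis, and the frames are tiny lists of made-up brightness samples; a thread pool is used for brevity, whereas CPU-bound work in CPython would normally use `ProcessPoolExecutor` to obtain a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def action_score(frame):
    """Hypothetical stand-in for video analysis: the average absolute change
    between neighbouring samples, so 'busier' frames score higher."""
    changes = [abs(b - a) for a, b in zip(frame, frame[1:])]
    return sum(changes) / max(len(changes), 1)


def mine_highlights(frames, threshold, workers=4):
    """Score all frames in parallel and return the indices worth keeping."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(action_score, frames))  # results keep input order
    return [index for index, score in enumerate(scores) if score >= threshold]


# Made-up data: each inner list plays the role of one video frame.
frames = [
    [0, 0, 0, 0],   # static scene
    [0, 9, 0, 9],   # lots of change
    [1, 1, 2, 1],   # almost static
    [5, 0, 5, 0],   # lots of change
]
print(mine_highlights(frames, threshold=3.0))  # [1, 3]
```

Because no frame's score depends on any other frame, the same code can in principle keep as many cores busy as there are frames waiting to be scored.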

Challenges
In early 2005, Intel encountered the problem of memory bandwidth. As more cores are added, the memory bandwidth remains the same because of size constraints, effectively bottlenecking the CPU. Intel was able to overcome the problem with a process called die stacking, in which the CPU die, flash, and DRAM are stacked on top of each other, significantly raising the possible memory bus width. Another challenge that Intel encountered was the physical limitation of electrical buses. The bus is the CPU's connection to the outside world, and with current bus bandwidth it would be unable to keep up with the teraFLOPS performance of tera-scale processors. Intel's research into silicon photonics has produced a functional optical bus that can offer superior signaling speed and power efficiency compared to current buses. These optical buses are an ideal solution to the bus bandwidth limitation for tera-scale processors.
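The bandwidth bottleneck can be made concrete with a little arithmetic: peak bus bandwidth is the bus width multiplied by the transfer rate, and each core's share shrinks as cores are added unless the bus gets wider or faster. The figures in the sketch below (a 64-bit bus, one billion transfers per second, a 4096-bit stacked interface) are hypothetical values chosen for illustration and do not describe any specific Intel product.

```python
def peak_bandwidth_gbs(bus_width_bits, transfers_per_second):
    """Peak bandwidth in gigabytes per second for a bus of the given width."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfers_per_second / 1e9


def per_core_gbs(bus_width_bits, transfers_per_second, cores):
    """Bandwidth available to each core if the bus is shared equally."""
    return peak_bandwidth_gbs(bus_width_bits, transfers_per_second) / cores


TRANSFER_RATE = 1.0e9  # hypothetical: one billion transfers per second

scenarios = [
    ("2 cores, 64-bit bus", 2, 64),
    ("80 cores, 64-bit bus", 80, 64),
    ("80 cores, 4096-bit stacked bus", 80, 4096),
]
for label, cores, width in scenarios:
    share = per_core_gbs(width, TRANSFER_RATE, cores)
    print(f"{label}: {share:.2f} GB/s per core")
```

With these made-up numbers, going from 2 to 80 cores on the same bus cuts each core's share by a factor of 40, while the much wider bus made possible by stacking more than restores it. The same reasoning applies to the external bus, where a faster signaling technology plays the role of the wider bus.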