Reconfigurable computing

Reconfigurable computing is a computer architecture combining some of the flexibility of software with the high performance of hardware by processing with flexible hardware platforms like field-programmable gate arrays (FPGAs). The principal difference when compared to using ordinary microprocessors is the ability to add custom computational blocks using FPGAs. On the other hand, the main difference from custom hardware, i.e. application-specific integrated circuits (ASICs) is the possibility to adapt the hardware during runtime by "loading" a new circuit on the reconfigurable fabric, thus providing new computational blocks without the need to manufacture and add new chips to the existing system.

History
The concept of reconfigurable computing has existed since the 1960s, when Gerald Estrin's paper proposed the concept of a computer made of a standard processor and an array of "reconfigurable" hardware. The main processor would control the behavior of the reconfigurable hardware. The latter would then be tailored to perform a specific task, such as image processing or pattern matching, as quickly as a dedicated piece of hardware. Once the task was done, the hardware could be adjusted to do some other task. This resulted in a hybrid computer structure combining the flexibility of software with the speed of hardware.

In the 1980s and 1990s there was a renaissance in this area of research with many proposed reconfigurable architectures developed in industry and academia, such as: Copacobana, Matrix, GARP, Elixent, NGEN, Polyp, MereGen, PACT XPP, Silicon Hive, Montium, Pleiades, Morphosys, and PiCoGA. Such designs were feasible due to the constant progress of silicon technology that let complex designs be implemented on one chip. Some of these massively parallel reconfigurable computers were built primarily for special subdomains such as molecular evolution, neural or image processing. The world's first commercial reconfigurable computer, the Algotronix CHS2X4, was completed in 1991. It was not a commercial success, but was promising enough that Xilinx (the inventor of the Field-Programmable Gate Array, FPGA) bought the technology and hired the Algotronix staff. Later machines enabled first demonstrations of scientific principles, such as the spontaneous spatial self-organisation of genetic coding with MereGen.

Tredennick's Classification
The fundamental model of the reconfigurable computing machine paradigm, the data-stream-based anti machine is well illustrated by the differences to other machine paradigms that were introduced earlier, as shown by Nick Tredennick's following classification scheme of computing paradigms (see "Table 1: Nick Tredennick's Paradigm Classification Scheme").

Hartenstein's Xputer
Computer scientist Reiner Hartenstein describes reconfigurable computing in terms of an anti-machine that, according to him, represents a fundamental paradigm shift away from the more conventional von Neumann machine. Hartenstein calls it Reconfigurable Computing Paradox, that software-to-configware (software-to-FPGA) migration results in reported speed-up factors of up to more than four orders of magnitude, as well as a reduction in electricity consumption by up to almost four orders of magnitude—although the technological parameters of FPGAs are behind the Gordon Moore curve by about four orders of magnitude, and the clock frequency is substantially lower than that of microprocessors. This paradox is partly explained by the Von Neumann syndrome.

High-performance computing
High-Performance Reconfigurable Computing (HPRC) is a computer architecture combining reconfigurable computing-based accelerators like field-programmable gate array with CPUs or multi-core processors.

The increase of logic in an FPGA has enabled larger and more complex algorithms to be programmed into the FPGA. The attachment of such an FPGA to a modern CPU over a high speed bus, like PCI express, has enabled the configurable logic to act more like a coprocessor rather than a peripheral. This has brought reconfigurable computing into the high-performance computing sphere.

Furthermore, by replicating an algorithm on an FPGA or the use of a multiplicity of FPGAs has enabled reconfigurable SIMD systems to be produced where several computational devices can concurrently operate on different data, which is highly parallel computing.

This heterogeneous systems technique is used in computing research and especially in supercomputing. A 2008 paper reported speed-up factors of more than 4 orders of magnitude and energy saving factors by up to almost 4 orders of magnitude. Some supercomputer firms offer heterogeneous processing blocks including FPGAs as accelerators. One research area is the twin-paradigm programming tool flow productivity obtained for such heterogeneous systems.

The US National Science Foundation has a center for high-performance reconfigurable computing (CHREC). In April 2011 the fourth Many-core and Reconfigurable Supercomputing Conference was held in Europe.

Commercial high-performance reconfigurable computing systems are beginning to emerge with the announcement of IBM integrating FPGAs with its IBM Power microprocessors.

Partial re-configuration
Partial re-configuration is the process of changing a portion of reconfigurable hardware circuitry while the other portion keeps its former configuration. Field programmable gate arrays are often used as a support to partial reconfiguration.

Electronic hardware, like software, can be designed modularly, by creating subcomponents and then higher-level components to instantiate them. In many cases it is useful to be able to swap out one or several of these subcomponents while the FPGA is still operating.

Normally, reconfiguring an FPGA requires it to be held in reset while an external controller reloads a design onto it. Partial reconfiguration allows for critical parts of the design to continue operating while a controller either on the FPGA or off of it loads a partial design into a reconfigurable module. Partial reconfiguration also can be used to save space for multiple designs by only storing the partial designs that change between designs.

A common example for when partial reconfiguration would be useful is the case of a communication device. If the device is controlling multiple connections, some of which require encryption, it would be useful to be able to load different encryption cores without bringing the whole controller down.

Partial reconfiguration is not supported on all FPGAs. A special software flow with emphasis on modular design is required. Typically the design modules are built along well defined boundaries inside the FPGA that require the design to be specially mapped to the internal hardware.

From the functionality of the design, partial reconfiguration can be divided into two groups:
 * dynamic partial reconfiguration, also known as an active partial reconfiguration - permits to change the part of the device while the rest of an FPGA is still running;
 * static partial reconfiguration - the device is not active during the reconfiguration process. While the partial data is sent into the FPGA, the rest of the device is stopped (in the shutdown mode) and brought up after the configuration is completed.

Computer emulation
With the advent of affordable FPGA boards, students' and hobbyists' projects seek to recreate vintage computers or implement more novel architectures. Such projects are built with reconfigurable hardware (FPGAs), and some devices support emulation of multiple vintage computers using a single reconfigurable hardware (C-One).

COPACOBANA
A fully FPGA-based computer is the COPACOBANA, the Cost Optimized Codebreaker and Analyzer and its successor RIVYERA. A spin-off company SciEngines GmbH of the COPACOBANA-Project of the Universities of Bochum and Kiel in Germany continues the development of fully FPGA-based computers.

Mitrionics
Mitrionics has developed a SDK that enables software written using a single assignment language to be compiled and executed on FPGA-based computers. The Mitrion-C software language and Mitrion processor enable software developers to write and execute applications on FPGA-based computers in the same manner as with other computing technologies, such as graphical processing units ("GPUs"), cell-based processors, parallel processing units ("PPUs"), multi-core CPUs, and traditional single-core CPU clusters. (out of business)

National Instruments
National Instruments have developed a hybrid embedded computing system called CompactRIO. It consists of reconfigurable chassis housing the user-programmable FPGA, hot swappable I/O modules, real-time controller for deterministic communication and processing, and graphical LabVIEW software for rapid RT and FPGA programming.

Xilinx
Xilinx has developed two styles of partial reconfiguration of FPGA devices: module-based and difference-based. Module-based partial reconfiguration permits to reconfigure distinct modular parts of the design, while difference-based partial reconfiguration can be used when a small change is made to a design.

Intel
Intel supports partial reconfiguration of their FPGA devices on 28 nm devices such as Stratix V, and on the 20 nm Arria 10 devices. The Intel FPGA partial reconfiguration flow for Arria 10 is based on the hierarchical design methodology in the Quartus Prime Pro software where users create physical partitions of the FPGA that can be reconfigured at runtime while the remainder of the design continues to operate. The Quartus Prime Pro software also support hierarchical partial reconfiguration and simulation of partial reconfiguration.

Classification of systems
As an emerging field, classifications of reconfigurable architectures are still being developed and refined as new architectures are developed; no unifying taxonomy has been suggested to date. However, several recurring parameters can be used to classify these systems.

Granularity
The granularity of the reconfigurable logic is defined as the size of the smallest functional unit (configurable logic block, CLB) that is addressed by the mapping tools. High granularity, which can also be known as fine-grained, often implies a greater flexibility when implementing algorithms into the hardware. However, there is a penalty associated with this in terms of increased power, area and delay due to greater quantity of routing required per computation. Fine-grained architectures work at the bit-level manipulation level; whilst coarse grained processing elements (reconfigurable datapath unit, rDPU) are better optimised for standard data path applications. One of the drawbacks of coarse grained architectures are that they tend to lose some of their utilisation and performance if they need to perform smaller computations than their granularity provides, for example for a one bit add on a four bit wide functional unit would waste three bits. This problem can be solved by having a coarse grain array (reconfigurable datapath array, rDPA) and a FPGA on the same chip.

Coarse-grained architectures (rDPA) are intended for the implementation for algorithms needing word-width data paths (rDPU). As their functional blocks are optimized for large computations and typically comprise word wide arithmetic logic units (ALU), they will perform these computations more quickly and with more power efficiency than a set of interconnected smaller functional units; this is due to the connecting wires being shorter, resulting in less wire capacitance and hence faster and lower power designs. A potential undesirable consequence of having larger computational blocks is that when the size of operands may not match the algorithm an inefficient utilisation of resources can result. Often the type of applications to be run are known in advance allowing the logic, memory and routing resources to be tailored to enhance the performance of the device whilst still providing a certain level of flexibility for future adaptation. Examples of this are domain specific arrays aimed at gaining better performance in terms of power, area, throughput than their more generic finer grained FPGA cousins by reducing their flexibility.

Rate of reconfiguration
Configuration of these reconfigurable systems can happen at deployment time, between execution phases or during execution. In a typical reconfigurable system, a bit stream is used to program the device at deployment time. Fine grained systems by their own nature require greater configuration time than more coarse-grained architectures due to more elements needing to be addressed and programmed. Therefore, more coarse-grained architectures gain from potential lower energy requirements, as less information is transferred and utilised. Intuitively, the slower the rate of reconfiguration the smaller the power consumption as the associated energy cost of reconfiguration are amortised over a longer period of time. Partial re-configuration aims to allow part of the device to be reprogrammed while another part is still performing active computation. Partial re-configuration allows smaller reconfigurable bit streams thus not wasting energy on transmitting redundant information in the bit stream. Compression of the bit stream is possible but careful analysis is to be carried out to ensure that the energy saved by using smaller bit streams is not outweighed by the computation needed to decompress the data.

Host coupling
Often the reconfigurable array is used as a processing accelerator attached to a host processor. The level of coupling determines the type of data transfers, latency, power, throughput and overheads involved when utilising the reconfigurable logic. Some of the most intuitive designs use a peripheral bus to provide a coprocessor like arrangement for the reconfigurable array. However, there have also been implementations where the reconfigurable fabric is much closer to the processor, some are even implemented into the data path, utilising the processor registers. The job of the host processor is to perform the control functions, configure the logic, schedule data and to provide external interfacing.

Routing/interconnects
The flexibility in reconfigurable devices mainly comes from their routing interconnect. One style of interconnect made popular by FPGAs vendors, Xilinx and Altera are the island style layout, where blocks are arranged in an array with vertical and horizontal routing. A layout with inadequate routing may suffer from poor flexibility and resource utilisation, therefore providing limited performance. If too much interconnect is provided this requires more transistors than necessary and thus more silicon area, longer wires and more power consumption.

Challenges for operating systems
One of the key challenges for reconfigurable computing is to enable higher design productivity and provide an easier way to use reconfigurable computing systems for users that are unfamiliar with the underlying concepts. One way of doing this is to provide standardization and abstraction, usually supported and enforced by an operating system.

One of the major tasks of an operating system is to hide the hardware and present programs (and their programmers) with nice, clean, elegant, and consistent abstractions to work with instead. In other words, the two main tasks of an operating system are abstraction and resource management.

Abstraction is a powerful mechanism to handle complex and different (hardware) tasks in a well-defined and common manner. One of the most elementary OS abstractions is a process. A process is a running application that has the perception (provided by the OS) that it is running on its own on the underlying virtual hardware. This can be relaxed by the concept of threads, allowing different tasks to run concurrently on this virtual hardware to exploit task level parallelism. To allow different processes and threads to coordinate their work, communication and synchronization methods have to be provided by the OS.

In addition to abstraction, resource management of the underlying hardware components is necessary because the virtual computers provided to the processes and threads by the operating system need to share available physical resources (processors, memory, and devices) spatially and temporarily.