C-slowing

C-slowing is a technique used in conjunction with retiming to improve the throughput of a digital circuit. Each register in the circuit is replaced by a chain of C registers in series. The resulting circuit carries C independent threads of computation, interleaved cycle by cycle, as if it contained C copies of the original circuit. A single computation of the original circuit therefore takes C times as many clock cycles in the new circuit. C-slowing by itself increases the latency of each computation, but because the C threads share the hardware, the overall throughput remains the same.
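The interleaving can be illustrated with a small behavioral model. The following is a minimal sketch in Python rather than a hardware description; it assumes a circuit with a single state register in a feedback loop, and the names `run_original`, `run_c_slowed`, and `step` are hypothetical, chosen only for this illustration.

```python
from collections import deque

def run_original(inputs, step, init=0):
    """Original circuit: one state register, updated once per clock cycle."""
    state = init
    outputs = []
    for x in inputs:
        state = step(state, x)
        outputs.append(state)
    return outputs

def run_c_slowed(interleaved_inputs, step, c, init=0):
    """C-slowed circuit: the state register is replaced by a chain of c registers."""
    regs = deque([init] * c)  # regs[-1] is the end of the chain and feeds the logic
    outputs = []
    for x in interleaved_inputs:
        # The logic sees the state that was written c clock cycles earlier,
        # so cycle t belongs to thread t mod c.
        new_state = step(regs.pop(), x)
        regs.appendleft(new_state)
        outputs.append(new_state)
    return outputs

# Example: an accumulator, C-slowed with c = 3 and fed three interleaved streams.
accumulate = lambda state, x: state + x
streams = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
interleaved = [x for group in zip(*streams) for x in group]
slowed = run_c_slowed(interleaved, accumulate, c=3)

for k, stream in enumerate(streams):
    # De-interleaving thread k reproduces the original circuit's output on that stream.
    assert slowed[k::3] == run_original(stream, accumulate)
```

Each thread behaves exactly like the original circuit, but sees a new result only once every three clock cycles, which is the latency increase described above.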

The additional registers give retiming more freedom to rebalance the logic between registers and so reduce the clock period of the circuit. In the best case, the clock period can be reduced by a factor of C. Reducing the clock period reduces latency and increases throughput. Thus, for computations that can be multi-threaded, combining C-slowing with retiming can increase the throughput of the circuit with little or, in the best case, no increase in latency.
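The arithmetic behind these claims can be sketched as follows. This is an idealized model, assuming that each computation takes exactly C clock cycles and that one result (from some thread) completes every cycle; the function name `c_slow_metrics` and the example figures of 10 ns and C = 4 are hypothetical.

```python
def c_slow_metrics(period_ns, c, retimed_period_ns=None):
    """Return (latency in ns, results per ns) for a C-slowed circuit.

    period_ns: clock period of the original circuit.
    retimed_period_ns: clock period after retiming; defaults to period_ns,
    which models C-slowing alone with no retiming applied.
    """
    if retimed_period_ns is None:
        retimed_period_ns = period_ns
    latency_ns = c * retimed_period_ns        # one computation spans c cycles
    throughput_per_ns = 1.0 / retimed_period_ns  # one result per cycle across all threads
    return latency_ns, throughput_per_ns

# Original circuit: 10 ns period, so 10 ns latency and 0.1 results per ns.
# C-slowing alone (C = 4): latency rises to 40 ns, throughput is unchanged.
print(c_slow_metrics(10.0, 4))
# Best-case retiming (period 10 / 4 = 2.5 ns): latency returns to 10 ns,
# throughput rises by a factor of 4.
print(c_slow_metrics(10.0, 4, retimed_period_ns=2.5))
```

In practice retiming rarely achieves the full factor of C, so the retimed period lies somewhere between the two cases shown and the latency ends up somewhat above that of the original circuit.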

Since registers are relatively plentiful in FPGAs, this technique is typically applied to circuits implemented with FPGAs.

Resources

 * Intel® Hyperflex™ Architecture High-Performance Design Handbook
 * PipeRoute: A Pipelining-Aware Router for Reconfigurable Architectures
 * Simple Symmetric Multithreading in Xilinx FPGAs
 * Post Placement C-Slow Retiming for Xilinx Virtex (.ppt)
 * Post Placement C-Slow Retiming for Xilinx Virtex (.pdf)
 * Exploration of RaPiD-style Pipelined FPGA Interconnects
 * Time and Area Efficient Pattern Matching on FPGAs