HPC Challenge Benchmark

HPC Challenge Benchmark combines several benchmarks to test a number of independent attributes of the performance of high-performance computer (HPC) systems. The project has been co-sponsored by the DARPA High Productivity Computing Systems program, the United States Department of Energy and the National Science Foundation.

Context
The performance of complex applications on HPC systems can depend on a variety of independent performance attributes of the hardware. The HPC Challenge Benchmark is an effort to improve visibility into this multidimensional space by combining the measurement of several of these attributes into a single program.

Although the performance attributes of interest are not specific to any particular computer architecture, the reference implementation of the HPC Challenge Benchmark in C and MPI assumes that the system under test is a cluster of shared-memory multiprocessor systems connected by a network. Because of this assumed hierarchical structure, most of the tests are run in several different modes of operation. Following the notation used in the benchmark reports, results labeled "single" mean that the test was run on one randomly chosen processor in the system; results labeled "star" mean that an independent copy of the test was run concurrently on each processor in the system; and results labeled "global" mean that all the processors worked in coordination to solve a single problem, with data distributed across the nodes of the system.

Components
The benchmark currently consists of seven tests (with the modes of operation indicated for each):
 * 1) HPL (High Performance LINPACK) – measures performance of a solver for a dense system of linear equations (global).
 * 2) DGEMM – measures performance for matrix-matrix multiplication (single, star).
 * 3) STREAM – measures sustained memory bandwidth (single, star).
 * 4) PTRANS – measures the rate at which the system can transpose a large array (global).
 * 5) RandomAccess – measures the rate of 64-bit updates to randomly selected elements of a large table (single, star, global).
 * 6) FFT – performs a Fast Fourier Transform on a large one-dimensional vector using the generalized Cooley–Tukey algorithm (single, star, global).
 * 7) Communication Bandwidth and Latency – MPI-centric performance measurements based on the b_eff bandwidth/latency benchmark.
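The RandomAccess update pattern can be sketched serially in C. This is a minimal sketch of the "single" mode only, assuming a table whose size is a power of two and a 64-bit shift-and-XOR generator with polynomial 0x7 in the style of the reference code. The names `RA_POLY`, `ra_next` and `ra_update` are hypothetical, and the real benchmark's seeding, buffering and MPI distribution are omitted. Each update reads one table word, XORs a random value into it and writes it back, so the update rate (reported in giga-updates per second, GUP/s) tends to be limited by the memory system rather than by arithmetic.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical name for the generator polynomial (assumed value 0x7). */
#define RA_POLY UINT64_C(0x0000000000000007)

/* Advance the pseudo-random sequence by one step:
   shift left by one bit and XOR in RA_POLY if the top bit was set. */
static uint64_t ra_next(uint64_t x)
{
    return (x << 1) ^ ((x >> 63) ? RA_POLY : 0);
}

/* Apply n_updates updates to a table of table_size words.
   table_size must be a power of two so that the mask selects a valid index.
   Each update XORs the current random value into the word it selects. */
void ra_update(uint64_t *table, size_t table_size, uint64_t seed,
               size_t n_updates)
{
    uint64_t ran = seed; /* must be nonzero: a zero seed never changes */
    for (size_t i = 0; i < n_updates; i++) {
        ran = ra_next(ran);
        table[ran & (table_size - 1)] ^= ran;
    }
}
```

Because XOR is its own inverse, calling `ra_update` a second time with the same seed and update count restores the original table, which gives a simple correctness check for this serial sketch. The update rate would be the number of updates divided by the elapsed time.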

Performance attributes
At a high level, the tests are intended to provide coverage of four important attributes of performance: double-precision floating-point arithmetic (DGEMM and HPL), local memory bandwidth (STREAM), network bandwidth for "large" messages (PTRANS, RandomAccess, FFT, b_eff), and network bandwidth for "small" messages (RandomAccess, b_eff). Some of the codes are more complex than others and can have additional performance sensitivities. For example, in some systems HPL performance can be limited by network bandwidth and/or network latency.
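The "local memory bandwidth" attribute is what STREAM's Triad kernel exposes. The following is a minimal C sketch, not the STREAM source: the function names `stream_triad`, `triad_bytes` and `triad_gbps` are hypothetical, timing is left to the caller, and bandwidth is expressed in GB/s with 1 GB taken as 10^9 bytes. Following STREAM's accounting, each Triad element moves 24 bytes: two 8-byte reads and one 8-byte write.

```c
#include <stddef.h>

/* Triad kernel: a[i] = b[i] + q * c[i].
   Each iteration reads b[i] and c[i] and writes a[i]. */
void stream_triad(double *restrict a, const double *restrict b,
                  const double *restrict c, double q, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + q * c[i];
}

/* Bytes counted for one Triad pass over n elements:
   two reads and one write of an 8-byte double per element. */
double triad_bytes(size_t n)
{
    return 3.0 * sizeof(double) * (double)n;
}

/* Sustained bandwidth in GB/s (1 GB = 1e9 bytes) for one pass
   that took the given number of seconds, as measured by the caller. */
double triad_gbps(size_t n, double seconds)
{
    return triad_bytes(n) / seconds / 1e9;
}
```

For a meaningful measurement the arrays should be much larger than the last-level cache and the kernel should be timed over several repetitions, so that the result reflects main-memory traffic rather than cache hits.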

Competition
The annual HPC Challenge Award Competition at the Supercomputing Conference focuses on four of the most challenging benchmarks in the suite:
 * Global HPL
 * Global RandomAccess (or BSS Random Access Benchmark)
 * EP STREAM (Triad) per system
 * Global FFT
There are two classes of awards:
 * Class 1: Best performance on a base or optimized run submitted to the HPC Challenge website.
 * Class 2: Most "elegant" implementation of four or five computational kernels including three or more of the HPC Challenge benchmarks.