User:Diego Marcos/sandbox

The LINPACK Benchmarks are a measure of a system's floating point computing power. Introduced by Jack Dongarra, they measure how fast a computer solves a dense n by n system of linear equations Ax = b, which is a common task in engineering.

The latest version of these benchmarks are used to build the Top500 list, ranking the world's most powerful supercomputers.

The aim is to approximate how fast will a computer perform when solving real problems. It is a simplification, since no single number can reflect the overall performance of a computer system. Nevertheless, the LINPACK benchmark performance can provide a good correction over the peak performance provided by the manufacturer. The peak performance is the maximal theoretical performance a computer could achieve, calculated as the machine's frequency, in cycles per second, times the number of operations per cycle it can perform. The actual performance will always be lower than the peak performance. The performance of a computer is a complex issue that depends on many interconnected variables. The performance measured by the LINPACK benchmark consists of the number of 64-bit floating-point operations, generally additions and multiplications, a computer can perform per second, or FLOPS. However, a computer's performance when running actual applications is likely to be far behind the maximal performance it achieves running the appropriate LINPACK benchmark .

The name of these benchmarks comes from the LINPACK package, a collection of algebra Fortran subrutines widely used in the 80s, and initially tightly linked to the LINPACK benchmark. The LINPACK package has been since then replaced by other libraries.

History
The LINPACK benchmark report appeared first in 1979 as an appendix to the LINPACK user's manual . It was designed to help users estimate the time required by their systems to solve a problem using the LINPACK package, by extrapolating the performance results obtained by 23 different computers solving a matrix problem of size 100. This size was chosen due to memory limitations at that time. 10000 floating-point entries from -1 to 1 are randomly generated to fill in a general, dense matrix. Then, LU decomposition with partial pivoting is used for the timing. Over the years, additional versions, with different problem sizes, like matrices of order 300 and 1000, and constrains were released, allowing new optimization opportunities as hardware architectures started to implement matrix-vector and matrix-matrix operations . In the case of the order 1000 size problem, a benchmark now called LINPACK 1000, the algorithm can be modified . Finally, 1991 saw the appearance of a version of the benchmark allowing to solve problems of arbitrary size, enabling High Performance Computers to get near to their asymptotic performance and generating the data used, two years later, for the first Top500 list release.

LINPACK 100
Very similar to the original benchmark published in 1979 along with the LINAPCK users' manual. The solution is obtained by Gaussian elimination with partial pivoting, with 2/3n³ + 2n² floating point operations where n is 100, the order of the dense matrix A that defines the problem. It's small size and the lack of software flexibility doesn't allow most modern computers to reach their performance limits. However, it can still be useful to predict performances in numerically intensive user written code using compiler optimization.

LINPACK 1000
Can provide a performance nearer to the machine's limit because, in addition to offering a bigger problem size, a matrix of order 1000, changes in the algorithm are possible. The only constrains are that the relative accuracy can't be reduced and the number of operations will always be considered to be 2/3n³ + 2n², with n = 1000.

HPLinpack
The previous benchmarks are not suitable for testing parallel computers , and the so called Linpack’s Highly Parallel Computing benchmark, or HPLinpack benchmark, was introduced. In HPLinpack the size n of the problem can be made as large as it is needed to optimize the performance results of the machine. Once again, 2/3n³ + 2n² will be taken as the operation count, with independence of the algorithm used. Strassen algorithm is not allowed because it distorts the real execution rate . The accuracy must be such that the following expression is satisfied:

$${\lVert Ax-b\rVert\over \lVert A\rVert \lVert x\rVert n \epsilon} \leq O(1)$$, where $$\epsilon$$ is the machine's precision, and n is the size of the problem , $$\lVert \cdot \rVert$$ is the matrix norm and $$O(1)$$ corresponds to the big-O notation.

For each computer system, the following quantities are reported : These results are used to compile the Top500 list twice a year, with the world's most powerful computers .
 * Rmax: the performance in Gflop/s for the largest problem run on a machine.
 * Nmax: the size of the largest problem run on a machine.
 * N1/2: the size where half the Rmax execution rate is achieved.
 * Rpeak: the theoretical peak performance Gflop/s for the machine.

LINPACK benchmark implementations
The previous section describes the ground rules for the benchmarks. The actual implementation of the program can diverge, with some examples being available in Fortran , C or Java .

HPL
HPL is a portable implementation of HPLinpack that was written in C, originally as a guideline, but that is now widely used to provide data for the Top500 list, though other technologies and packages can be used. HPL generates a linear system of equations of order n and solves it using LU decomposition with partial row pivoting. It requires installed implementations of either MPI and BLAS or VSIPL to run .

Coarsely, the algorithm has the following characteristics :
 * Cyclic data distribution in 2D blocks.
 * LU factorization using the right-looking variant with various depths of look-ahead.
 * Recursive panel factorization.
 * Six different panel broadcasting variants.
 * Bandwidth reducing swap-broadcast algorithm.
 * Backward substitution with look-ahead of depth 1.

Criticism
The LINPACK benchmark is said to have succeeded because of the scalability of HPLinpack, the fact that it generates a single number, making the results easily comparable and the extensive historical data base it has associated. However, soon after its release, the LINPACK benchmark was criticized for providing performance levels "generally unobtainable by all but a very few programmers who tediously optimize their code for that machine and that machine alone", because it only tests the resolution of dense linear systems, what is not representative of all the operations usually performed in scientific computing. Jack Dongarra, the main driving force behind the LINAPCK benchmarks, said that, while they only emphasize "peak" CPU speed and number of CPUs, not enough stress is given to local bandwidth and the network. Thom Dunning, director of the National Center for Supercomputing Applications, had this to say about the LINPACK benchmark: "The Linpack benchmark is one of those interesting phenomena -- almost anyone who knows about it will deride its utility. They understand its limitations but it has mindshare because it's the one number we've all bought into over the years." According to Dongarra, "the organizers of the Top500 are actively looking to expand the scope of the benchmark reporting" because "it is important to include more performance characteristic and signatures for a given system". One of the possibilities that is being considered to extend the benchmark for the Top500 is the HPC Challenge Benchmark Suite.