Computational complexity of matrix multiplication

In theoretical computer science, an active area of research is determining how efficiently matrix multiplication can be performed. Matrix multiplication algorithms are a central subroutine in theoretical and numerical algorithms for numerical linear algebra and optimization, so determining the minimal time the operation requires is of major practical relevance.

Directly applying the mathematical definition of matrix multiplication gives an algorithm that requires on the order of $n^{3}$ field operations to multiply two $n × n$ matrices over that field ($Θ(n^{3})$ in big O notation). Surprisingly, algorithms exist that provide better running times than this straightforward "schoolbook algorithm". The first to be discovered was Strassen's algorithm, devised by Volker Strassen in 1969 and often referred to as "fast matrix multiplication". The optimal number of field operations needed to multiply two square $n × n$ matrices, up to constant factors, is still unknown. This is a major open question in theoretical computer science.

As of 2020, the matrix multiplication algorithm with the best asymptotic complexity runs in $O(n^{2.3728596})$ time, and is due to Josh Alman and Virginia Vassilevska Williams. However, this and similar improvements to Strassen are not used in practice, because the constant coefficient hidden by the big O notation is so large that these galactic algorithms are only worthwhile for matrices that are too large to handle on present-day computers.

Simple algorithms
If A, B are $n × n$ matrices over a field, then their product AB is also an $n × n$ matrix over that field, defined entrywise as

$$(AB)_{ij} = \sum_{k = 1}^n A_{ik} B_{kj}.$$

Schoolbook algorithm
The simplest approach to computing the product of two $n × n$ matrices A and B is to compute the arithmetic expressions coming from the definition of matrix multiplication. In pseudocode:

input is A and B, both n by n matrices
initialize C to be an n by n matrix of all zeros
for i from 1 to n:
    for j from 1 to n:
        for k from 1 to n:
            C[i][j] = C[i][j] + A[i][k]*B[k][j]
output C (as A*B)

This algorithm requires, in the worst case, $n^3$ multiplications of scalars and $(n-1)n^2$ additions for computing the product of two square $n × n$ matrices. Its computational complexity is therefore $O(n^3)$, in a model of computation where field operations (addition and multiplication) take constant time (in practice, this is the case for floating point numbers, but not necessarily for integers).
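As a concrete illustration, the pseudocode above translates directly into Python; this is a minimal sketch using plain nested lists (the function name is ours, not from any library):

```python
def schoolbook_multiply(A, B):
    """Multiply two n-by-n matrices, given as lists of lists, directly from
    the definition (AB)_ij = sum_k A_ik * B_kj. Uses Theta(n^3) field
    operations."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C
```

For example, `schoolbook_multiply([[1, 2], [3, 4]], [[5, 6], [7, 8]])` returns `[[19, 22], [43, 50]]`.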

Strassen's algorithm
Strassen's algorithm is based on a way of multiplying two $2 × 2$ matrices which requires only 7 multiplications (instead of the usual 8), at the expense of several additional addition and subtraction operations. Applying this recursively gives an algorithm with a multiplicative cost of $$O( n^{\log_{2}7}) \approx O(n^{2.807})$$. Strassen's algorithm is more complex, and the numerical stability is reduced compared to the naïve algorithm, but it is faster in cases where $n > 100$ or so and appears in several libraries, such as BLAS. It is very useful for large matrices over exact domains such as finite fields, where numerical stability is not an issue.
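A sketch of the recursion in Python, for matrices whose dimension is a power of two (the helper names are ours; practical implementations switch to the schoolbook algorithm below a cutoff size rather than recursing down to 1×1):

```python
def strassen(A, B):
    """Strassen's algorithm for n-by-n matrices, n a power of two.
    Each level uses 7 recursive multiplications instead of 8, giving
    O(n^log2(7)) ~ O(n^2.807) total multiplications."""
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    m = n // 2
    def block(M, r, c):  # extract the (r, c) half-size block
        return [row[c*m:(c+1)*m] for row in M[r*m:(r+1)*m]]
    def add(X, Y):
        return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]
    def sub(X, Y):
        return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]
    A11, A12, A21, A22 = block(A,0,0), block(A,0,1), block(A,1,0), block(A,1,1)
    B11, B12, B21, B22 = block(B,0,0), block(B,0,1), block(B,1,0), block(B,1,1)
    # The seven products of Strassen's identity.
    M1 = strassen(add(A11, A22), add(B11, B22))
    M2 = strassen(add(A21, A22), B11)
    M3 = strassen(A11, sub(B12, B22))
    M4 = strassen(A22, sub(B21, B11))
    M5 = strassen(add(A11, A12), B22)
    M6 = strassen(sub(A21, A11), add(B11, B12))
    M7 = strassen(sub(A12, A22), add(B21, B22))
    # Recombine into the four blocks of the product.
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(sub(add(M1, M3), M2), M6)
    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    bottom = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bottom
```

Over the integers the recursion is exact, so `strassen([[1, 2], [3, 4]], [[5, 6], [7, 8]])` returns `[[19, 22], [43, 50]]`, the same result as the schoolbook algorithm.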

The matrix multiplication exponent


The matrix multiplication exponent, usually denoted $ω$, is the smallest real number for which any two $n\times n$ matrices over a field can be multiplied together using $n^{\omega + o(1)}$ field operations. This notation is commonly used in algorithms research, so that algorithms using matrix multiplication as a subroutine have meaningful bounds on running time regardless of the true value of $ω$.

Using a naive lower bound and schoolbook matrix multiplication for the upper bound, one can straightforwardly conclude that $2 ≤ ω ≤ 3$. Whether $ω = 2$ is a major open question in theoretical computer science, and there is a line of research developing matrix multiplication algorithms to get improved bounds on $ω$.

The current best bound on $ω$ is $ω < 2.3728596$, by Josh Alman and Virginia Vassilevska Williams. This algorithm, like all other recent algorithms in this line of research, uses the laser method, a generalization of the Coppersmith–Winograd algorithm, which was given by Don Coppersmith and Shmuel Winograd in 1990, was the best matrix multiplication algorithm until 2010, and has an asymptotic complexity of $O(n^{2.375477})$. The conceptual idea of these algorithms is similar to Strassen's algorithm: a way is devised for multiplying two $k × k$-matrices with fewer than $k^{3}$ multiplications, and this technique is applied recursively. The laser method has limitations to its power, and cannot be used to show that $ω < 2.3725$.

Group theory reformulation of matrix multiplication algorithms
Henry Cohn, Robert Kleinberg, Balázs Szegedy and Chris Umans put methods such as the Strassen and Coppersmith–Winograd algorithms in an entirely different group-theoretic context, by utilising triples of subsets of finite groups which satisfy a disjointness property called the triple product property (TPP). They also gave conjectures that, if true, would imply that there are matrix multiplication algorithms with essentially quadratic complexity; this would mean that the optimal exponent of matrix multiplication is 2, which most researchers believe is indeed the case. One such conjecture is that families of wreath products of Abelian groups with symmetric groups realise families of subset triples with a simultaneous version of the TPP. Several of their conjectures have since been disproven by Blasiak, Cohn, Church, Grochow, Naslund, Sawin, and Umans using the slice rank method. Further, Alon, Shpilka and Umans have shown that some of these conjectures implying fast matrix multiplication are incompatible with another plausible conjecture, the sunflower conjecture.

Lower bounds for ω
There is a trivial lower bound of $\omega \ge 2$: since any algorithm for multiplying two $n × n$-matrices has to process all $2n^{2}$ entries, there is a trivial asymptotic lower bound of $\Omega(n^{2})$ operations for any matrix multiplication algorithm. Thus $2\le \omega < 2.373$. It is unknown whether $\omega > 2$. The best known lower bound for matrix-multiplication complexity is $\Omega(n^{2}\log(n))$, for bounded coefficient arithmetic circuits over the real or complex numbers, and is due to Ran Raz.

Related complexities
Problems that have the same asymptotic complexity as matrix multiplication include determinant, matrix inversion, and Gaussian elimination (see next section). Problems with complexity expressible in terms of $\omega$ include characteristic polynomial, eigenvalues (but not eigenvectors), Hermite normal form, and Smith normal form.

Matrix inversion, determinant and Gaussian elimination
In his 1969 paper, where he proved the complexity $O(n^{2.807})$ for matrix multiplication, Strassen also proved that matrix inversion, determinant and Gaussian elimination have, up to a multiplicative constant, the same computational complexity as matrix multiplication. The proof does not make any assumptions on the matrix multiplication algorithm that is used, except that its complexity is $O(n^\omega)$ for some $\omega \ge 2$.

The starting point of Strassen's proof is the use of block matrix multiplication. Specifically, a matrix of even dimension $2n×2n$ may be partitioned in four $n×n$ blocks
 * $$\begin{bmatrix} {A} & {B} \\{C} & {D} \end{bmatrix}.$$

Under this form, its inverse is

$$\begin{bmatrix} {A} & {B} \\ {C} & {D} \end{bmatrix}^{-1} = \begin{bmatrix} {A}^{-1}+{A}^{-1}{B}({D}-{CA}^{-1}{B})^{-1}{CA}^{-1} & -{A}^{-1}{B}({D}-{CA}^{-1}{B})^{-1} \\ -({D}-{CA}^{-1}{B})^{-1}{CA}^{-1} & ({D}-{CA}^{-1}{B})^{-1} \end{bmatrix},$$ provided that $A$ and $${D}-{CA}^{-1}{B}$$ are invertible.

Thus, the inverse of a $2n×2n$ matrix may be computed with two inversions, six multiplications and four additions or additive inverses of $n×n$ matrices. It follows that, denoting respectively by $I(n)$, $M(n)$ and $A(n)$ the number of operations needed for inverting, multiplying and adding $n×n$ matrices, one has
 * $$I(2n) \le 2I(n) + 6M(n)+ 4 A(n).$$
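The recursion can be sketched in Python as follows, assuming exact arithmetic (or well-conditioned floats), a power-of-two dimension, and that every block that must be inverted is indeed invertible; the function and helper names are ours:

```python
def block_invert(M):
    """Invert an n-by-n matrix, n a power of two, via the 2x2 block formula:
    two recursive inversions (of A and of the Schur complement D - C A^-1 B),
    six multiplications and a few additions/negations of half-size blocks."""
    n = len(M)
    if n == 1:
        return [[1 / M[0][0]]]
    m = n // 2
    def block(X, r, c):
        return [row[c*m:(c+1)*m] for row in X[r*m:(r+1)*m]]
    def mul(X, Y):
        return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
                 for j in range(len(Y[0]))] for i in range(len(X))]
    def sub(X, Y):
        return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]
    def add(X, Y):
        return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]
    def neg(X):
        return [[-x for x in row] for row in X]
    A, B = block(M, 0, 0), block(M, 0, 1)
    C, D = block(M, 1, 0), block(M, 1, 1)
    Ai = block_invert(A)                 # recursive inversion 1: A^-1
    S = sub(D, mul(C, mul(Ai, B)))       # Schur complement D - C A^-1 B
    Si = block_invert(S)                 # recursive inversion 2: S^-1
    AiB, CAi = mul(Ai, B), mul(C, Ai)
    top_left = add(Ai, mul(AiB, mul(Si, CAi)))   # A^-1 + A^-1 B S^-1 C A^-1
    top_right = neg(mul(AiB, Si))                # -A^-1 B S^-1
    bottom_left = neg(mul(Si, CAi))              # -S^-1 C A^-1
    top = [r1 + r2 for r1, r2 in zip(top_left, top_right)]
    bottom = [r1 + r2 for r1, r2 in zip(bottom_left, Si)]
    return top + bottom
```

For instance, `block_invert([[4.0, 2.0], [2.0, 3.0]])` returns (up to rounding) `[[0.375, -0.25], [-0.25, 0.5]]`, the inverse of that matrix.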

If $$n=2^k,$$ one may apply this formula recursively:
 * $$\begin{align}

I(2^k) &\le 2I(2^{k-1}) + 6M(2^{k-1})+ 4 A(2^{k-1})\\ &\le 2^2I(2^{k-2}) + 6(M(2^{k-1})+2M(2^{k-2})) + 4(A(2^{k-1}) + 2A(2^{k-2}))\\ &\ldots \end{align}$$ If $$M(n)\le cn^\omega,$$ and $$\alpha=2^\omega\ge 4,$$ one gets eventually
 * $$\begin{align}

I(2^k) &\le 2^k I(1) + 6c(\alpha^{k-1}+2\alpha^{k-2} + \cdots +2^{k-1}\alpha^0) + k 2^{k+1}\\ &\le 2^k + 6c\frac{\alpha^k-2^k}{\alpha-2} + k 2^{k+1}\\ &\le d(2^k)^\omega. \end{align}$$ for some constant $d$.

For matrices whose dimension is not a power of two, the same complexity is reached by increasing the dimension of the matrix to a power of two, by padding the matrix with rows and columns whose entries are 1 on the diagonal and 0 elsewhere.
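A minimal sketch of this padding step (the function name is ours): the padded matrix is invertible exactly when the original is, and its inverse contains the original's inverse in its upper-left corner.

```python
def pad_to_power_of_two(M):
    """Embed an n-by-n matrix in an m-by-m one, m the next power of two,
    with 1s on the new diagonal entries and 0s elsewhere."""
    n = len(M)
    m = 1
    while m < n:
        m *= 2
    return [[M[i][j] if i < n and j < n else (1 if i == j else 0)
             for j in range(m)] for i in range(m)]
```

Padding a 3×3 matrix thus yields a 4×4 matrix whose last row and column are `[0, 0, 0, 1]`.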

This proves the asserted complexity for matrices such that all submatrices that have to be inverted are indeed invertible. This complexity is thus proved for almost all matrices, as a matrix with randomly chosen entries is invertible with probability one.

The same argument applies to LU decomposition, as, if the matrix $A$ is invertible, the equality
 * $$\begin{bmatrix} {A} & {B} \\{C} & {D} \end{bmatrix}

= \begin{bmatrix}I & 0\\CA^{-1}&I\end{bmatrix}\,\begin{bmatrix}A&B\\0&D-CA^{-1}B\end{bmatrix}$$ defines a block LU decomposition that may be applied recursively to $$A$$ and $$D-CA^{-1}B,$$ for getting eventually a true LU decomposition of the original matrix.

The argument applies also for the determinant, since it results from the block LU decomposition that
 * $$\det \begin{bmatrix} {A} & {B} \\{C} & {D} \end{bmatrix} =

\det(A)\det(D-CA^{-1}B).$$
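As a numerical check of this identity, the following sketch computes the determinant of a 4×4 matrix by a single block step, using closed forms for the 2×2 base case (all helper names are ours, and $A$ is assumed invertible, as holds for almost all matrices):

```python
def det2(X):
    """Determinant of a 2x2 matrix."""
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]

def inv2(X):
    """Inverse of a 2x2 matrix (assumes det2(X) != 0)."""
    d = det2(X)
    return [[X[1][1] / d, -X[0][1] / d], [-X[1][0] / d, X[0][0] / d]]

def mul2(X, Y):
    """Product of two 2x2 matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def det4_by_blocks(M):
    """det of a 4x4 matrix via det(A) * det(D - C A^-1 B)."""
    A = [row[:2] for row in M[:2]]
    B = [row[2:] for row in M[:2]]
    C = [row[:2] for row in M[2:]]
    D = [row[2:] for row in M[2:]]
    CAiB = mul2(C, mul2(inv2(A), B))
    S = [[D[i][j] - CAiB[i][j] for j in range(2)] for i in range(2)]  # Schur complement
    return det2(A) * det2(S)
```

On a block-diagonal matrix with blocks $A$ and $D$ this reduces to $\det(A)\det(D)$, as expected.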