
In multilinear algebra, the tensor rank decomposition or canonical polyadic decomposition (CPD) may be regarded as a generalization of the matrix singular value decomposition (SVD) to tensors, which has found application in statistics, signal processing, psychometrics, linguistics and chemometrics. It was introduced by Hitchcock in 1927 and later rediscovered several times, notably in psychometrics. For this reason, the tensor rank decomposition is sometimes historically referred to as PARAFAC or CANDECOMP.

Definition
Consider a tensor space $$\mathbb{F}^{n_1 \times \cdots \times n_d} \cong \mathbb{F}^{n_1} \otimes \cdots \otimes \mathbb{F}^{n_d}$$, where $$\mathbb{F}$$ is either the real field $$\mathbb{R}$$ or the complex field $$\mathbb{C}$$. Every (order-$$d$$) tensor in this space may then be represented with a suitably large $$r$$ as a linear combination of $$r$$ rank-1 tensors:


 * $$\mathcal{A} = \sum_{i=1}^{r} \lambda_i \mathbf{a}_i^1 \otimes \mathbf{a}_i^2 \otimes \cdots \otimes \mathbf{a}_i^d,$$

where $$\lambda_i \in \mathbb{F}$$ and $$\mathbf{a}_i^k \in \mathbb{F}^{n_k}$$; note that the superscript $$k$$ should not be interpreted as an exponent but is merely another index. When the number of terms $$r$$ is minimal in the above expression, $$r$$ is called the rank of the tensor, and the decomposition is often referred to as a (tensor) rank decomposition, minimal CP decomposition, or canonical polyadic decomposition (CPD). If the number of terms is not minimal, the above decomposition is often referred to as an $$r$$-term decomposition, CANDECOMP/PARAFAC, or polyadic decomposition.
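As a concrete illustration, the following minimal NumPy sketch (illustrative only; the helper name cp_to_tensor is ours and not part of any standard library) assembles a tensor from a given $$r$$-term polyadic decomposition:

```python
import numpy as np

def cp_to_tensor(lambdas, factors):
    """Assemble sum_i lambda_i * a_i^1 (x) a_i^2 (x) ... (x) a_i^d.

    factors[k] is an (n_k x r) matrix whose i-th column holds a_i^{k+1}.
    """
    A = np.zeros(tuple(f.shape[0] for f in factors))
    for i, lam in enumerate(lambdas):
        term = np.array(lam)
        for f in factors:
            term = np.multiply.outer(term, f[:, i])  # grow one factor at a time
        A += term
    return A

# A 3-term decomposition in R^{2 x 3 x 2}; the rank of A is at most 3.
rng = np.random.default_rng(0)
factors = [rng.standard_normal((n, 3)) for n in (2, 3, 2)]
A = cp_to_tensor([1.0, -2.0, 0.5], factors)
print(A.shape)  # (2, 3, 2)
```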

Tensor rank
Contrary to the case of matrices, the rank of a tensor is presently not understood well. It is known that the problem of computing the rank of a tensor is NP-hard. The only notable well-understood case consists of tensors in $$\mathbb{F}^{m} \otimes \mathbb{F}^{n} \otimes \mathbb{F}^2$$, whose rank can be obtained from the Kronecker–Weierstrass normal form of the linear matrix pencil that the tensor represents. A simple polynomial-time algorithm exists for certifying that a tensor is of rank 1, namely the higher-order singular value decomposition.

The rank of the zero tensor is zero by convention. The rank of a tensor $$\mathbf{a}_1 \otimes \cdots \otimes \mathbf{a}_d $$ is one, provided that $$ \mathbf{a}_k \in \mathbb{F}^{n_k}\setminus\{0\} $$ for all $$k$$.

Field dependence
The rank of a tensor depends on the field over which the tensor is decomposed. It is known that some real tensors may admit a complex decomposition whose rank is strictly less than the rank of a real decomposition of the same tensor. As an example, consider the following real tensor


 * $$ \mathcal{A} = \mathbf{x}_1 \otimes \mathbf{x}_2 \otimes \mathbf{x}_3 + \mathbf{x}_1 \otimes \mathbf{y}_2 \otimes \mathbf{y}_3 - \mathbf{y}_1 \otimes \mathbf{x}_2 \otimes \mathbf{y}_3 + \mathbf{y}_1 \otimes \mathbf{y}_2 \otimes \mathbf{x}_3, $$

whose rank over the reals is known to be 3, while its complex rank is only 2 because it is the sum of a complex rank-1 tensor with its complex conjugate:


 * $$ \mathcal{A} = \frac{1}{2}( \bar{\mathbf{z}}_1 \otimes \mathbf{z}_2 \otimes \bar{\mathbf{z}}_3 + \mathbf{z}_1 \otimes \bar{\mathbf{z}}_2 \otimes \mathbf{z}_3), \quad\text{where } \mathbf{z}_j = \mathbf{x}_j + i \mathbf{y}_j.$$

In contrast, the rank of real matrices will never decrease under a field extension to $$\mathbb{C}$$: real matrix rank and complex matrix rank coincide for real matrices.
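The rank drop in this example can be checked numerically. The sketch below (an illustrative verification assuming NumPy, taking $$\mathbf{x}_j = \mathbf{e}_1$$ and $$\mathbf{y}_j = \mathbf{e}_2$$ for all $$j$$) confirms that the four-term real expression and the two-term complex expression define the same tensor:

```python
import numpy as np

x = np.array([1.0, 0.0])  # x_j = e_1 for every j
y = np.array([0.0, 1.0])  # y_j = e_2 for every j

def outer3(a, b, c):
    # a (x) b (x) c as a 2 x 2 x 2 array
    return np.einsum('i,j,k->ijk', a, b, c)

# Four-term real expression; its rank over R is 3.
A = outer3(x, x, x) + outer3(x, y, y) - outer3(y, x, y) + outer3(y, y, x)

# Two-term complex decomposition with z_j = x_j + i*y_j.
z = x + 1j * y
B = 0.5 * (outer3(z.conj(), z, z.conj()) + outer3(z, z.conj(), z))

assert np.allclose(B.imag, 0) and np.allclose(A, B.real)
```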

Generic rank
The generic rank $$r(n_1,\ldots,n_d)$$ is defined as the least rank $$r$$ such that the closure in the Zariski topology of the set of tensors of rank at most $$r$$ is the entire space $$\mathbb{F}^{n_1} \otimes \cdots \otimes \mathbb{F}^{n_d}$$. In the case of complex tensors, tensors of rank at most $$r(n_1,\ldots,n_d)$$ form a dense set $$S$$: every tensor in the aforementioned space is either of rank less than the generic rank, or it is the limit in the Euclidean topology of a sequence of tensors from $$S$$. In the case of real tensors, the set of tensors of rank at most $$r(n_1,\ldots,n_d)$$ only forms an open set of positive measure in the Euclidean topology. There may exist Euclidean-open sets of tensors of rank strictly higher than the generic rank. All ranks appearing on open sets in the Euclidean topology are called typical ranks. The smallest typical rank is called the generic rank; this definition applies to both complex and real tensors. The generic rank of tensor spaces was initially studied in 1983 by Volker Strassen.

As an illustration of the above concepts, it is known that both 2 and 3 are typical ranks of $$\mathbb{R}^2 \otimes \mathbb{R}^2 \otimes \mathbb{R}^2$$ while the generic rank of $$\mathbb{C}^2 \otimes \mathbb{C}^2 \otimes \mathbb{C}^2$$ is 2. Practically, this means that a randomly sampled real tensor (from a continuous probability measure on the space of tensors) of size $$2 \times 2 \times 2$$ will be a rank-1 tensor with probability zero, a rank-2 tensor with positive probability, and rank-3 with positive probability. On the other hand, a randomly sampled complex tensor of the same size will be a rank-1 tensor with probability zero, a rank-2 tensor with probability one, and a rank-3 tensor with probability zero. It is even known that the generic rank-3 real tensor in $$\mathbb{R}^2 \otimes \mathbb{R}^2 \otimes \mathbb{R}^2$$ will be of complex rank equal to 2.

The generic rank of tensor spaces depends on the distinction between balanced and unbalanced tensor spaces. A tensor space $$\mathbb{F}^{n_1} \otimes \cdots \otimes \mathbb{F}^{n_d}$$, where $$n_1 \ge n_2 \ge \cdots \ge n_d$$, is called unbalanced whenever


 * $$n_1 > 1 + \prod_{k=2}^d n_k - \sum_{k=2}^d (n_k-1), $$

and it is called balanced otherwise.

Unbalanced tensor spaces
When the first factor is very large with respect to the other factors in the tensor product, the tensor space essentially behaves as a linear space. The generic rank of tensors living in an unbalanced tensor space is known to equal


 * $$ r(n_1,\ldots,n_d) = \min\left\{ n_1, \prod_{k=2}^d n_k \right\} $$

almost everywhere. More precisely, the rank of every tensor in an unbalanced tensor space $$\mathbb{F}^{n_1 \times \cdots \times n_d} \setminus Z$$, where $$Z$$ is some unspecified closed set in the Zariski topology, equals the above value.
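In code, the unbalancedness test and the corresponding generic rank read as follows (a small sketch with our own helper names, assuming the dimensions are sorted internally so that $$n_1$$ is largest):

```python
import math

def is_unbalanced(dims):
    """Check n_1 > 1 + prod_{k>=2} n_k - sum_{k>=2} (n_k - 1)."""
    n = sorted(dims, reverse=True)
    return n[0] > 1 + math.prod(n[1:]) - sum(nk - 1 for nk in n[1:])

def generic_rank_unbalanced(dims):
    """Generic rank in the unbalanced regime: min{ n_1, prod_{k>=2} n_k }."""
    n = sorted(dims, reverse=True)
    return min(n[0], math.prod(n[1:]))

print(is_unbalanced((100, 3, 3)))            # True: 100 > 1 + 9 - 4 = 6
print(generic_rank_unbalanced((100, 3, 3)))  # 9
```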

Balanced tensor spaces
The generic rank of tensors living in a balanced tensor space is expected to equal


 * $$ r_E(n_1,\ldots,n_d) = \left\lceil \frac{\Pi}{\Sigma+1} \right\rceil $$

almost everywhere for complex tensors and on a Euclidean-open set for real tensors, where


 * $$ \Pi = \prod_{k=1}^{d} n_k \quad\text{and}\quad \Sigma = \sum_{k=1}^{d} (n_k - 1). $$

More precisely, the rank of every tensor in $$\mathbb{C}^{n_1 \times \cdots \times n_d} \setminus Z$$, where $$Z$$ is some unspecified closed set in the Zariski topology, is expected to equal the above value. For real tensors, $$r_E(n_1,\ldots,n_d)$$ is the least rank that is expected to occur on a set of positive Euclidean measure. The value $$r_E(n_1,\ldots,n_d)$$ is often referred to as the expected generic rank of the tensor space $$\mathbb{F}^{n_1 \times \cdots \times n_d}$$ because it is only conjecturally correct. It is known that the true generic rank always satisfies


 * $$ r(n_1, \ldots, n_d) \ge r_E(n_1, \ldots, n_d). $$

The Abo–Ottaviani–Peterson conjecture states that equality is expected, i.e., $$r(n_1,\ldots,n_d) = r_E(n_1,\ldots,n_d)$$, with the following exceptional cases:


 * $$ \mathbb{F}^{4 \times 4 \times 3}$$
 * $$ \mathbb{F}^{(2n+1) \times (2n+1) \times 3} \text{ with } n = 1, 2, \ldots$$
 * $$ \mathbb{F}^{(n+1) \times (n+1) \times 2 \times 2} \text{ with } n = 1, 2, \ldots$$

In each of these exceptional cases, the generic rank is known to be $$r(n_1,\ldots,n_d) = r_E(n_1,\ldots,n_d)+1$$. The conjecture has been proved completely in a number of special cases. As early as 1985, Lickteig showed that $$r(n,n,n) = r_E(n,n,n)$$, provided that $$n \ne 3$$. In 2011, a major breakthrough was established by Catalisano, Geramita, and Gimigliano, who proved that $$r(2,2,\ldots,2) = r_E(2,2,\ldots,2)$$, except for the space $$\mathbb{F}^2 \otimes \mathbb{F}^2 \otimes \mathbb{F}^2 \otimes \mathbb{F}^2$$.
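The expected generic rank itself is elementary to evaluate; a short sketch (our own helper, Python standard library only):

```python
import math

def expected_generic_rank(dims):
    """r_E = ceil(Pi / (Sigma + 1)) with Pi = prod n_k, Sigma = sum (n_k - 1)."""
    Pi = math.prod(dims)
    Sigma = sum(n - 1 for n in dims)
    return math.ceil(Pi / (Sigma + 1))

print(expected_generic_rank((2, 2, 2)))  # 2, matching the generic rank over C
print(expected_generic_rank((3, 3, 3)))  # 4, while the true generic rank is 5 (the n = 3 exception)
```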

Maximum rank
The maximum rank that can be admitted by any of the tensors in a tensor space is unknown in general; even a conjecture about this maximum rank is missing. Presently, the best general upper bound states that the maximum rank $$r_M(n_1,\ldots,n_d)$$ of $$\mathbb{F}^{n_1} \otimes \cdots \otimes \mathbb{F}^{n_d}$$, where $$n_1 \ge n_2 \ge \cdots \ge n_d$$, satisfies


 * $$r_M(n_1,\ldots,n_d) \le \min\left\{ \prod_{k=2}^d n_k, 2 \cdot r(n_1,\ldots,n_d) \right\},$$

where $$r(n_1,\ldots,n_d)$$ is the (least) generic rank of $$\mathbb{F}^{n_1} \otimes \cdots \otimes \mathbb{F}^{n_d}$$. It is well-known that the foregoing inequality may be strict. For instance, the generic rank of tensors in $$\mathbb{R}^{2\times 2 \times 2}$$ is two, so that the above bound yields $$r_M(2,2,2) \le 4$$, while the maximum rank is known to equal 3.

Border rank
A rank-$$s$$ tensor $$\mathcal{A}$$ is called a border tensor if there exists a sequence of tensors of rank at most $$r < s$$ whose limit is $$\mathcal{A}$$. If $$r$$ is the least value for which such a convergent sequence exists, then $$r$$ is called the border rank of $$\mathcal{A}$$. For order-2 tensors, i.e., matrices, rank and border rank always coincide; for tensors of order $$\ge3$$, however, they may differ. Border tensors were first studied in the context of fast approximate matrix multiplication algorithms by Bini, Lotti, and Romani in 1980.

A classic example of a border tensor is the rank-3 tensor


 * $$\mathcal{A} = \mathbf{u} \otimes \mathbf{u} \otimes \mathbf{v} + \mathbf{u} \otimes \mathbf{v} \otimes \mathbf{u} + \mathbf{v} \otimes \mathbf{u} \otimes \mathbf{u}, \quad \text{with } \|\mathbf{u}\| = \|\mathbf{v}\| = 1 \text{ and } \langle \mathbf{u}, \mathbf{v}\rangle \ne 1.$$

It can be approximated arbitrarily well by the following sequence of rank-2 tensors


 * $$\begin{align} \mathcal{A}_n &= n \left(\mathbf{u} + \frac{1}{n} \mathbf{v}\right) \otimes \left(\mathbf{u} + \frac{1}{n} \mathbf{v}\right) \otimes \left(\mathbf{u} + \frac{1}{n} \mathbf{v}\right) - n\, \mathbf{u}\otimes\mathbf{u}\otimes\mathbf{u} \\ &= \mathbf{u} \otimes \mathbf{u} \otimes \mathbf{v} + \mathbf{u} \otimes \mathbf{v} \otimes \mathbf{u} + \mathbf{v} \otimes \mathbf{u} \otimes \mathbf{u} + \frac{1}{n} (\mathbf{u}\otimes\mathbf{v}\otimes\mathbf{v} + \mathbf{v}\otimes\mathbf{u}\otimes\mathbf{v} + \mathbf{v}\otimes\mathbf{v}\otimes\mathbf{u}) + \frac{1}{n^2} \mathbf{v}\otimes\mathbf{v}\otimes\mathbf{v} \end{align}$$

as $$n \to \infty$$. Therefore, its border rank is 2, which is strictly less than its rank.
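The convergence, together with the growth of the individual rank-1 terms, can be observed numerically (an illustrative sketch assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(1)
u = rng.standard_normal(3); u /= np.linalg.norm(u)
v = rng.standard_normal(3); v /= np.linalg.norm(v)

def outer3(a, b, c):
    return np.einsum('i,j,k->ijk', a, b, c)

A = outer3(u, u, v) + outer3(u, v, u) + outer3(v, u, u)

for n in (1, 10, 100, 1000):
    w = u + v / n
    A_n = n * outer3(w, w, w) - n * outer3(u, u, u)  # a rank-2 tensor
    err = np.linalg.norm(A_n - A)                    # decays like 1/n
    term = n * np.linalg.norm(w) ** 3                # grows like n
    print(f"n={n:5d}  error={err:.2e}  first-term norm={term:.2e}")
```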

Identifiability
It follows from the definition of a pure tensor that $$\mathcal{A} = \mathbf{a}^1 \otimes \mathbf{a}^2 \otimes \cdots \otimes \mathbf{a}^d = \mathbf{b}^1 \otimes \mathbf{b}^2 \otimes \cdots \otimes \mathbf{b}^d$$ if and only if there exist $$\lambda_k$$ such that $$\lambda_1 \lambda_2 \cdots \lambda_d = 1$$ and $$\mathbf{a}^k = \lambda_k \mathbf{b}^k$$ for all $$k$$. For this reason, the parameters $$\{ \mathbf{a}^k \}_{k=1}^d$$ of a rank-1 tensor $$\mathcal{A}$$ are called identifiable or essentially unique. A rank-$$r$$ tensor $$\mathcal{A} \in F^{n_1} \otimes F^{n_2} \otimes \cdots \otimes F^{n_d}$$ is called identifiable if every one of its tensor rank decompositions is the sum of the same set of $$r$$ distinct tensors $$\{ \mathcal{A}_1, \mathcal{A}_2, \ldots, \mathcal{A}_r \}$$, where the $$\mathcal{A}_i$$'s are of rank 1. An identifiable rank-$$r$$ tensor thus has only one essentially unique decomposition $$\mathcal{A} = \sum_{i=1}^r \mathcal{A}_i,$$ and all $$r!$$ other tensor rank decompositions of $$\mathcal{A}$$ can be obtained by permuting the order of the summands. Observe that in a tensor rank decomposition all the $$\mathcal{A}_i$$'s are distinct, for otherwise the rank of $$\mathcal{A}$$ would be at most $$r-1$$.

Order-2 tensors in $$F^{n_1} \otimes F^{n_2} \simeq F^{n_1 \times n_2}$$, i.e., matrices, are not identifiable for $$r > 1$$. This follows essentially from the observation $$\mathcal{A} = \sum_{i=1}^r \mathbf{a}_i \otimes \mathbf{b}_i = \sum_{i=1}^r \mathbf{a}_i \mathbf{b}_i^T = A B^T = (A X^{-1}) (B X^T)^T = \sum_{i=1}^r \mathbf{c}_i \mathbf{d}_i^T = \sum_{i=1}^r \mathbf{c}_i \otimes \mathbf{d}_i,$$where $$X \in \mathrm{GL}_{r}(F)$$ is an invertible $$r \times r$$ matrix, $$A = [\mathbf{a}_i]_{i=1}^r$$, $$B = [\mathbf{b}_i]_{i=1}^r$$, $$A X^{-1} = [\mathbf{c}_i]_{i=1}^r$$ and $$B X^T = [\mathbf{d}_i]_{i=1}^r$$. It can be shown that for every $$X \in \mathrm{GL}_r(F)\setminus Z$$, where $$Z$$ is a closed set in the Zariski topology, the decomposition on the right-hand side is a sum of a different set of rank-1 tensors than the decomposition on the left-hand side, entailing that order-2 tensors of rank $$r>1$$ are generically not identifiable.
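A quick numerical illustration of this non-uniqueness (assuming NumPy; all names are ours):

```python
import numpy as np

rng = np.random.default_rng(2)
r = 2
A_f = rng.standard_normal((4, r))  # factor matrix A
B_f = rng.standard_normal((5, r))  # factor matrix B
M = A_f @ B_f.T                    # a rank-2 matrix

X = rng.standard_normal((r, r))    # generically invertible
C = A_f @ np.linalg.inv(X)         # new factors c_i
D = B_f @ X.T                      # new factors d_i

# Two different sets of rank-1 terms, yet the same matrix:
assert np.allclose(M, C @ D.T)
```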

The situation changes completely for higher-order tensors in $$F^{n_1} \otimes F^{n_2} \otimes \cdots \otimes F^{n_d}$$ with $$d > 2$$ and all $$n_i \ge 2$$. For simplicity in notation, assume without loss of generality that the factors are ordered such that $$n_1 \ge n_2 \ge \cdots \ge n_d \ge 2$$. Let $$S_r \subset F^{n_1} \otimes \cdots \otimes F^{n_d}$$denote the set of tensors of rank bounded by $$r$$. Then, the following statement was proved to be correct using a computer-assisted proof for all spaces of dimension $$\Pi < 15000 $$, and it is conjectured to be valid in general:

There exists a closed set $$Z_r$$ in the Zariski topology such that every tensor $$\mathcal{A} \in S_r\setminus Z_r$$ is identifiable ($$S_r$$ is called generically identifiable in this case), unless one of the following exceptional cases holds:
 * 1) The rank is too large: $$r > r_E(n_1, n_2, \ldots, n_d)$$;
 * 2) The space is identifiability-unbalanced, i.e., $$n_1 > \prod_{k=2}^d n_k - \sum_{k=2}^d (n_k - 1)$$, and the rank is too large: $$r \ge \prod_{k=2}^d n_k - \sum_{k=2}^d (n_k-1)$$;
 * 3) The space is the defective case $$F^4 \otimes F^4 \otimes F^3$$ and the rank is $$r=5$$;
 * 4) The space is the defective case $$F^n \otimes F^n \otimes F^2 \otimes F^2$$, where $$n \ge 2$$, and the rank is $$r = 2n-1$$;
 * 5) The space is $$F^4 \otimes F^4 \otimes F^4$$ and the rank is $$r=6$$;
 * 6) The space is $$F^6 \otimes F^6 \otimes F^3$$ and the rank is $$r = 8$$;
 * 7) The space is $$F^2 \otimes F^2 \otimes F^2 \otimes F^2 \otimes F^2$$ and the rank is $$r=5$$; or
 * 8) The space is perfect, i.e., $$\frac{\Pi}{\Sigma+1}$$ is an integer, and the rank is $$r = \frac{\Pi}{\Sigma+1}$$.

In these exceptional cases, the generic (and also minimum) number of complex decompositions is
 * proved to be $$\infty$$ in the first 4 cases;
 * proved to be two in case 5;
 * expected to be six in case 6;
 * proved to be two in case 7; and
 * expected to be at least two in case 8, with the exception of the two identifiable cases $$F^5 \otimes F^4 \otimes F^3$$ and $$F^3 \otimes F^2 \otimes F^2 \otimes F^2$$.

In summary, the generic tensor of order $$d > 2$$ and rank $$r < \frac{\Pi}{\Sigma+1}$$ that is not identifiability-unbalanced is expected to be identifiable (modulo the exceptional cases in small spaces).

Ill-posedness of the standard approximation problem
The rank approximation problem asks for the rank-$$r$$ decomposition closest (in the usual Euclidean topology) to some rank-$$s$$ tensor $$\mathcal{A}$$, where $$r \le s$$. That is, one seeks to solve


 * $$ \min_{\mathbf{a}_i^k \in \mathbb{F}^{n_k}} \left\| \mathcal{A} - \sum_{i=1}^{r} \mathbf{a}_i^1 \otimes \mathbf{a}_i^2 \otimes \cdots \otimes \mathbf{a}_i^d \right\|_F, $$

where $$\|\cdot\|_F$$ is the Frobenius norm.

It was shown in a 2008 paper by de Silva and Lim that the above standard approximation problem may be ill-posed. A solution to the aforementioned problem may sometimes not exist because the set over which one optimizes is not closed. As such, a minimizer may not exist, even though an infimum would exist. In particular, it is known that border tensors may be approximated arbitrarily well by a sequence of tensors of rank at most $$r$$, even though the sequence converges to a tensor of rank strictly higher than $$r$$. The tensor


 * $$\mathcal{A} = \mathbf{u} \otimes \mathbf{u} \otimes \mathbf{v} + \mathbf{u} \otimes \mathbf{v} \otimes \mathbf{u} + \mathbf{v} \otimes \mathbf{u} \otimes \mathbf{u}, \quad \text{with } \|\mathbf{u}\| = \|\mathbf{v}\| = 1 \text{ and } \langle \mathbf{u}, \mathbf{v}\rangle \ne 1$$

can be approximated arbitrarily well by the following sequence of tensors


 * $$ \mathcal{A}_n = n (\mathbf{u} + \frac{1}{n} \mathbf{v}) \otimes (\mathbf{u} + \frac{1}{n} \mathbf{v}) \otimes (\mathbf{u} + \frac{1}{n} \mathbf{v}) - n \mathbf{u}\otimes\mathbf{u}\otimes\mathbf{u} $$

as $$n \to \infty$$. This example neatly illustrates the general principle that a sequence of rank-$$r$$ tensors that converges to a tensor of strictly higher rank needs to admit at least two individual rank-1 terms whose norms become unbounded. Stated formally, whenever a sequence


 * $$ \mathcal{A}_n = \sum_{i=1}^r \mathbf{a}^1_{i,n} \otimes \mathbf{a}^2_{i,n} \otimes \cdots \otimes \mathbf{a}^d_{i,n} $$

has the property that $$\mathcal{A}_n \to \mathcal{A}$$ (in the Euclidean topology) as $$n\to\infty$$, then there must exist indices $$1 \le i \ne j \le r$$ such that


 * $$ \| \mathbf{a}^1_{i,n} \otimes \mathbf{a}^2_{i,n} \otimes \cdots \otimes \mathbf{a}^d_{i,n} \|_F \to \infty \text{ and  } \| \mathbf{a}^1_{j,n} \otimes \mathbf{a}^2_{j,n} \otimes \cdots \otimes \mathbf{a}^d_{j,n} \|_F \to \infty$$

as $$n\to\infty$$. This phenomenon is often encountered when attempting to approximate a tensor using numerical optimization algorithms. It is sometimes called the problem of diverging components. It was, in addition, shown that a random low-rank tensor over the reals may fail to admit a best rank-2 approximation with positive probability, leading to the understanding that the ill-posedness problem is an important consideration when employing the tensor rank decomposition.

A common partial solution to the ill-posedness problem consists of imposing an additional inequality constraint that bounds the norm of the individual rank-1 terms by some constant. Other constraints that result in a closed set, and thus a well-posed optimization problem, include imposing positivity or a bounded inner product strictly less than unity between the rank-1 terms appearing in the sought decomposition.

Calculating the CPD
Several classes of algorithms exist for computing an (approximate) CPD; the main families are listed below, and a minimal sketch of the alternating least squares approach follows the lists.

Alternating algorithms:
 * alternating least squares (ALS)
 * alternating slice-wise diagonalisation (ASD)

Algebraic algorithms:
 * simultaneous diagonalization (SD)
 * simultaneous generalized Schur decomposition (SGSD)

Optimization algorithms:
 * Levenberg–Marquardt (LM)
 * nonlinear conjugate gradient (NCG)
 * limited memory BFGS (L-BFGS)

Direct methods:
 * Direct multilinear decomposition (DMLD)
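As an illustration of the alternating family above, here is a minimal ALS sketch for third-order tensors (assuming NumPy; plain pseudoinverse updates without normalization, line search, or stopping test, so it is a sketch rather than a robust implementation):

```python
import numpy as np

def khatri_rao(B, C):
    """Column-wise Kronecker product: (m x r) and (n x r) -> (mn x r)."""
    return np.einsum('ir,jr->ijr', B, C).reshape(-1, B.shape[1])

def cp_als(T, r, iters=500, seed=0):
    """Fit a rank-r polyadic model to a third-order tensor T by plain ALS."""
    rng = np.random.default_rng(seed)
    n1, n2, n3 = T.shape
    A = rng.standard_normal((n1, r))
    B = rng.standard_normal((n2, r))
    C = rng.standard_normal((n3, r))
    T1 = T.reshape(n1, -1)                     # factor-1 flattening
    T2 = np.moveaxis(T, 1, 0).reshape(n2, -1)  # factor-2 flattening
    T3 = np.moveaxis(T, 2, 0).reshape(n3, -1)  # factor-3 flattening
    for _ in range(iters):
        A = T1 @ np.linalg.pinv(khatri_rao(B, C)).T  # solve T1 ~ A (B . C)^T
        B = T2 @ np.linalg.pinv(khatri_rao(A, C)).T
        C = T3 @ np.linalg.pinv(khatri_rao(A, B)).T
    return A, B, C

# Recover an exactly rank-2 synthetic tensor; the residual is typically tiny.
rng = np.random.default_rng(3)
A0, B0, C0 = (rng.standard_normal((n, 2)) for n in (4, 5, 6))
T = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
A, B, C = cp_als(T, 2)
print(np.linalg.norm(T - np.einsum('ir,jr,kr->ijk', A, B, C)))
```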

Many of these have been incorporated into software packages for tensor computations, such as:


 * Tensorlab
 * Tensor Toolbox
 * N-way toolbox

Chemometrics
Multi-way data are characterized by several sets of categorical variables that are measured in a crossed fashion. Chemical examples include fluorescence emission spectra measured at several excitation wavelengths for several samples, fluorescence lifetimes measured at several excitation and emission wavelengths, or any kind of spectrum measured chromatographically for several samples. Such measurements give rise to three-way data; i.e., the data can be arranged in a cube instead of a matrix as in standard multivariate data sets.

In the field of chemometrics, a number of diagnostic tools and techniques exist to help a PARAFAC user determine the best fitting model. These include the core consistency diagnostic (CORCONDIA), split-half analyses, examination of the loadings, and residual analysis.

Multilinear multiplication
Let $$F $$ be a field of characteristic zero, such as $$\mathbb{R} $$ or $$\mathbb{C} $$.

Abstract definition
Let $$V_k$$ be a finite-dimensional vector space over $$F$$, and let $$\mathcal{A} \in V_1 \otimes V_2 \otimes \cdots \otimes V_d$$ be an order-$$d$$ simple tensor, i.e., there exist some vectors $$\mathbf{v}_k \in V_k$$ such that $$\mathcal{A} = \mathbf{v}_1 \otimes \mathbf{v}_2 \otimes \cdots \otimes \mathbf{v}_d$$. If we are given a collection of linear maps $$A_k : V_k \to W_k$$, then the multilinear multiplication of $$\mathcal{A}$$ with $$(A_1, A_2, \ldots, A_d)$$ is defined as the action on $$\mathcal{A}$$ of the tensor product of these linear maps, namely$$\begin{array}{ll} A_1 \otimes A_2 \otimes \cdots \otimes A_d : &V_1 \otimes V_2 \otimes \cdots \otimes V_d \to W_1 \otimes W_2 \otimes \cdots \otimes W_d, \\ &\mathbf{v}_1 \otimes \mathbf{v}_2 \otimes \cdots \otimes \mathbf{v}_d \mapsto A_1(\mathbf{v}_1) \otimes A_2(\mathbf{v}_2) \otimes \cdots \otimes A_d(\mathbf{v}_d) \end{array} $$Since the tensor product of linear maps is itself a linear map, and because every tensor admits a tensor rank decomposition, the above expression extends linearly to all tensors. That is, for a general tensor $$\mathcal{A} \in V_1 \otimes V_2 \otimes \cdots \otimes V_d$$, the multilinear multiplication is$$\begin{array}{rcl} \mathcal{B} := (A_1 \otimes A_2 \otimes \cdots \otimes A_d)(\mathcal{A}) &=& (A_1 \otimes A_2 \otimes \cdots \otimes A_d)\left(\sum_{i=1}^r \mathbf{a}_i^1 \otimes \mathbf{a}_i^2 \otimes \cdots \otimes \mathbf{a}_i^d\right) \\ &=& \sum_{i=1}^r A_1(\mathbf{a}_i^1) \otimes A_2(\mathbf{a}_i^2) \otimes \cdots \otimes A_d( \mathbf{a}_i^d ) \end{array}$$where $$\mathcal{A} = \sum_{i=1}^r \mathbf{a}_i^1 \otimes \mathbf{a}_i^2 \otimes \cdots \otimes \mathbf{a}_i^d$$ with $$\mathbf{a}_i^k \in V_k$$ is one of $$\mathcal{A}$$'s tensor rank decompositions. The validity of the above expression is not limited to a tensor rank decomposition; in fact, it is valid for any expression of $$\mathcal{A}$$ as a linear combination of pure tensors.

It is standard to use the following shorthand notations in the literature for multilinear multiplications:$$(A_1, A_2, \ldots, A_d) \cdot \mathcal{A} := (A_1 \otimes A_2 \otimes \cdots \otimes A_d)(\mathcal{A}) \quad\text{and}\quad A_k \cdot_k \mathcal{A} := (\mathrm{Id}_{V_1}, \ldots, \mathrm{Id}_{V_{k-1}}, A_k, \mathrm{Id}_{V_{k+1}}, \ldots, \mathrm{Id}_{V_{d}}) \cdot \mathcal{A}, $$where $$\mathrm{Id}_{V_k} : V_k \to V_k $$ is the identity operator.

In coordinates
In computational multilinear algebra it is conventional to work in coordinates. Assume that an inner product is fixed on $$V_k$$ and let $$V_k^*$$ denote the dual vector space of $$V_k $$. Let $$\{ e_1^k, \ldots, e_{n_k}^k \} $$ be a basis for $$V_k $$, let $$\{ (e_1^k)^*, \ldots, (e_{n_k}^k)^* \} $$ be the dual basis, and let $$\{f_1^k, \ldots, f_{m_k}^k \} $$ be a basis for $$W_k $$. The linear map $$M_k = \sum_{i=1}^{m_k} \sum_{j=1}^{n_k} m_{i,j} f_i^k \otimes (e_j^k)^* $$ is then represented by the matrix $$\widehat{M}_k = [m_{i,j}] \in F^{m_k \times n_k}$$. Likewise, with respect to the standard tensor product basis $$\{ e_{j_1}^1 \otimes e_{j_2}^2 \otimes \cdots \otimes e_{j_d}^d \}_{j_1,j_2,\ldots,j_d} $$, the abstract tensor$$\mathcal{A} = \sum_{j_1=1}^{n_1} \sum_{j_2=1}^{n_2} \cdots \sum_{j_d=1}^{n_d} a_{j_1,j_2,\ldots,j_d} e_{j_1}^1 \otimes e_{j_2}^2 \otimes \cdots \otimes e_{j_d}^d $$is represented by the multidimensional array $$\widehat{\mathcal{A}} = [a_{j_1,j_2,\ldots,j_d}] \in F^{n_1 \times n_2 \times \cdots \times n_d} $$ . Observe that $$\widehat{\mathcal{A}} = \sum_{j_1=1}^{n_1} \sum_{j_2=1}^{n_2} \cdots \sum_{j_d=1}^{n_d} a_{j_1,j_2,\ldots,j_d} \mathbf{e}_{j_1}^1 \otimes \mathbf{e}_{j_2}^2 \otimes \cdots \otimes \mathbf{e}_{j_d}^d, $$where $$\mathbf{e}_j^k \in F^{n_k} $$ is the jth standard basis vector of $$F^{n_k} $$ and the tensor product of vectors is the Segre map $$\otimes : (\mathbf{v}^{(1)}, \mathbf{v}^{(2)}, \ldots, \mathbf{v}^{(d)}) \mapsto [v_{i_1}^{(1)} v_{i_2}^{(2)} \cdots v_{i_d}^{(d)}] $$. It follows from the above choices of bases that the multilinear multiplication $$\mathcal{B} = (M_1, M_2, \ldots, M_d) \cdot \mathcal{A} $$ becomes $$\begin{array}{rcl} \widehat{\mathcal{B}} &=& (\widehat{M}_1, \widehat{M}_2, \ldots, \widehat{M}_d) \cdot \sum_{j_1=1}^{n_1} \sum_{j_2=1}^{n_2} \cdots \sum_{j_d=1}^{n_d} a_{j_1,j_2,\ldots,j_d} \mathbf{e}_{j_1}^1 \otimes \mathbf{e}_{j_2}^2 \otimes \cdots \otimes \mathbf{e}_{j_d}^d \\ &=& \sum_{j_1=1}^{n_1} \sum_{j_2=1}^{n_2} \cdots \sum_{j_d=1}^{n_d} a_{j_1,j_2,\ldots,j_d} (\widehat{M}_1, \widehat{M}_2, \ldots, \widehat{M}_d) \cdot (\mathbf{e}_{j_1}^1 \otimes \mathbf{e}_{j_2}^2 \otimes \cdots \otimes \mathbf{e}_{j_d}^d) \\ &=& \sum_{j_1=1}^{n_1} \sum_{j_2=1}^{n_2} \cdots \sum_{j_d=1}^{n_d} a_{j_1,j_2,\ldots,j_d} (\widehat{M}_1 \mathbf{e}_{j_1}^1) \otimes (\widehat{M}_2 \mathbf{e}_{j_2}^2) \otimes \cdots \otimes (\widehat{M}_d \mathbf{e}_{j_d}^d) \\ &=& \sum_{j_1=1}^{n_1} \sum_{j_2=1}^{n_2} \cdots \sum_{j_d=1}^{n_d} a_{j_1,j_2,\ldots,j_d} \mathbf{m}_{j_1}^1 \otimes \mathbf{m}_{j_2}^2 \otimes \cdots \otimes \mathbf{m}_{j_d}^d, \end{array} $$where the $$\mathbf{m}_{j_k}^k $$'s form the columns of $$\widehat{M}_k = [\mathbf{m}_{j_k}^k]_{j_k=1}^{n_k} $$.
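In coordinates, each factor of the multilinear multiplication reduces to a single tensor contraction, so a minimal NumPy sketch (the helper names mode_k_product and multilinear_product are ours) reads:

```python
import numpy as np

def mode_k_product(M, A, k):
    """Coordinate form of M ._k A: contract M (m_k x n_k) with axis k of A."""
    return np.moveaxis(np.tensordot(M, A, axes=(1, k)), 0, k)

def multilinear_product(Ms, A):
    """Coordinate form of (M_1, ..., M_d) . A by successive factor products."""
    for k, M in enumerate(Ms):
        A = mode_k_product(M, A, k)
    return A

# Example: identity in factor 1, sum in factor 2, scaling in factor 3.
A = np.arange(24.0).reshape(2, 3, 4)
B = multilinear_product([np.eye(2), np.ones((1, 3)), 2.0 * np.eye(4)], A)
print(B.shape)  # (2, 1, 4)
```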

Properties
Let $$\mathcal{A} \in V_1 \otimes V_2 \otimes \cdots \otimes V_d $$ be an order-d tensor over the tensor product of $$F $$-vector spaces.

Since a multilinear multiplication is the tensor product of linear maps, we have the following multilinearity property (in the construction of the map):$$A_1 \otimes \cdots \otimes A_{k-1} \otimes (\alpha A_k + \beta B_k) \otimes A_{k+1} \otimes \cdots \otimes A_d = \alpha\, A_1 \otimes \cdots \otimes A_{k-1} \otimes A_k \otimes A_{k+1} \otimes \cdots \otimes A_d + \beta\, A_1 \otimes \cdots \otimes A_{k-1} \otimes B_k \otimes A_{k+1} \otimes \cdots \otimes A_d $$

Multilinear multiplication is a linear map: $$(M_1, M_2, \ldots, M_d) \cdot (\alpha \mathcal{A} + \beta \mathcal{B}) = \alpha \; (M_1, M_2, \ldots, M_d) \cdot \mathcal{A} + \beta \; (M_1, M_2, \ldots, M_d) \cdot \mathcal{B} $$

It follows from the definition that the composition of two multilinear multiplications is also a multilinear multiplication: $$(M_1, M_2, \ldots, M_d) \cdot \left( (K_1, K_2, \ldots, K_d) \cdot \mathcal{A} \right) = (M_1 \circ K_1, M_2 \circ K_2, \ldots, M_d \circ K_d) \cdot \mathcal{A}, $$where $$M_k : U_k \to W_k $$ and $$K_k : V_k \to U_k $$ are linear maps.

Observe specifically that multilinear multiplications in different factors commute, $$M_k \cdot_k \left( M_\ell \cdot_\ell \mathcal{A} \right) = M_\ell \cdot_\ell \left( M_k \cdot_k \mathcal{A} \right) = M_k \cdot_k M_\ell \cdot_\ell \mathcal{A}, $$if $$k \ne \ell $$.
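A quick numerical check of this commutation property, reusing mode_k_product from the sketch above:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 4, 5))
M1 = rng.standard_normal((2, 3))  # acts on factor 1
M3 = rng.standard_normal((6, 5))  # acts on factor 3

# Applying the factor-1 and factor-3 products in either order agrees.
left = mode_k_product(M3, mode_k_product(M1, A, 0), 2)
right = mode_k_product(M1, mode_k_product(M3, A, 2), 0)
assert np.allclose(left, right)
```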

Applications
The higher-order singular value decomposition (HOSVD) factorizes a tensor given in coordinates $$\mathcal{A} \in F^{n_1 \times n_2 \times \cdots \times n_d} $$ as the multilinear multiplication $$\mathcal{A} = (U_1, U_2, \ldots, U_d) \cdot \mathcal{S} $$, where $$U_k \in F^{n_k \times n_k} $$ are orthogonal matrices and $$\mathcal{S} \in F^{n_1 \times n_2 \times \cdots \times n_d} $$.

HOSVD

In multilinear algebra, the higher-order singular value decomposition (HOSVD) of a tensor is a specific orthogonal Tucker decomposition. It may be regarded as one generalization of the matrix singular value decomposition. The HOSVD has applications in computer graphics, machine learning, scientific computing, and signal processing. Some key ingredients of the HOSVD can be traced as far back as F. L. Hitchcock in 1928, but it was L. R. Tucker who developed the general Tucker decomposition for third-order tensors in the 1960s, including the HOSVD. The HOSVD as a decomposition in its own right was further advocated by L. De Lathauwer et al. in 2000.

As the HOSVD was studied in many scientific fields, it is sometimes historically referred to as multilinear singular value decomposition, m-mode SVD, or cube SVD, and sometimes it is incorrectly identified with a Tucker decomposition.

Definition
For the purpose of this article, the abstract tensor $$\mathcal{A}$$ is assumed to be given in coordinates with respect to some basis as a multidimensional array, also denoted by $$\mathcal{A}$$, in $$F^{n_1 \times n_2 \times \cdots \times n_d}$$, where d is the order of the tensor and $$F$$ is either $$\mathbb{R}$$ or $$\mathbb{C}$$.

Let $$U_k \in F^{n_k \times n_k}$$be a unitary matrix containing a basis of the left singular vectors of the standard factor-k flattening $$\mathcal{A}_{(k)}$$ of $$\mathcal{A}$$ such that the jth column $$\mathbf{u}_j^k$$ of $$U_k$$ corresponds to the jth largest singular value of $$\mathcal{A}_{(k)}$$. Observe that the factor matrix $$U_k$$ does not depend on the particular freedom of choice in the definition of the standard factor-k flattening. By the properties of the multilinear multiplication, we have$$\begin{array}{rcl} \mathcal{A} &=& (I, I, \ldots, I) \cdot \mathcal{A} \\ &=& (U_1 U_1^H, U_2 U_2^H, \ldots, U_d U_d^H) \cdot \mathcal{A} \\ &=& (U_1, U_2, \ldots, U_d) \cdot \left( (U_1^H, U_2^H, \ldots, U_d^H) \cdot \mathcal{A} \right), \end{array}$$where $$\cdot^H$$ denotes the conjugate transpose. The second equality is because the $$U_k$$'s are unitary matrices. Define now the core tensor$$\mathcal{S} := (U_1^H, U_2^H, \ldots, U_d^H) \cdot \mathcal{A}.$$

Then, the HOSVD of $$\mathcal{A}$$ is the decomposition $$\mathcal{A} = (U_1, U_2, \ldots, U_d) \cdot \mathcal{S}.$$The above construction shows that every tensor has a HOSVD.

Compact HOSVD
As in the case of the compact singular value decomposition of a matrix, it is also possible to consider a compact HOSVD, which is very useful in applications.

Assume that $$U_k \in F^{n_k \times r_k}$$ is a matrix with unitary columns containing a basis of the left singular vectors corresponding to the nonzero singular values of the standard factor-k flattening $$\mathcal{A}_{(k)}$$ of $$\mathcal{A}$$. Let the columns of $$U_k$$ be sorted such that the jth column $$\mathbf{u}_j^k$$ of $$U_k$$ corresponds to the jth largest nonzero singular value of $$\mathcal{A}_{(k)}$$. Since the columns of $$U_k$$ form a basis for the image of $$\mathcal{A}_{(k)}$$, we have $$\mathcal{A}_{(k)} = U_k U_k^H \mathcal{A}_{(k)} = \bigl( (U_k U_k^H) \cdot_k \mathcal{A} \bigr)_{(k)},$$where the first equality is due to the properties of orthogonal projections (in the Hermitian inner product) and the last equality is due to the properties of multilinear multiplication. As flattenings are bijective maps and the above formula is valid for all $$k=1,2,\ldots,d$$, we find as before that $$\begin{array}{rcl} \mathcal{A} &=& (U_1 U_1^H, U_2 U_2^H, \ldots, U_d U_d^H) \cdot \mathcal{A} \\ &=& (U_1, U_2, \ldots, U_d) \cdot \left( (U_1^H, U_2^H, \ldots, U_d^H) \cdot \mathcal{A} \right) \\ &=& (U_1, U_2, \ldots, U_d) \cdot \mathcal{S}, \end{array}$$where the core tensor $$\mathcal{S}$$ is now of size $$r_1 \times r_2 \times \cdots \times r_d$$.

Multilinear rank
The tuple $$(r_1, r_2, \ldots, r_d) \in \mathbb{N}^d$$ where $$r_k := \mathrm{rank}( \mathcal{A}_{(k)} )$$ is nowadays called the multilinear rank of $$\mathcal{A}$$. By definition, every tensor has a unique multilinear rank, and its components are bounded by $$0 \le r_k \le n_k$$. Not all tuples in $$\mathbb{N}^d$$ are multilinear ranks. In particular, it is known that $$r_k \le \prod_{j \ne k} r_j$$ must hold.

The compact HOSVD is a rank-revealing factorization in the sense that the dimensions of its core tensor correspond with the components of the multilinear rank of the tensor.

Interpretation
The following geometric interpretation is valid for both the full and compact HOSVD. Let $$(r_1, r_2, \ldots, r_d)$$ be the multilinear rank of the tensor $$\mathcal{A}$$. Since $$\mathcal{S} \in F^{r_1 \times r_2 \times \cdots \times r_d}$$ is a multidimensional array, we can expand it as follows $$\mathcal{S} = \sum_{j_1=1}^{r_1} \sum_{j_2=1}^{r_2} \cdots \sum_{j_d=1}^{r_d} s_{j_1,j_2,\ldots,j_d} \mathbf{e}_{j_1}^1 \otimes \mathbf{e}_{j_2}^2 \otimes \cdots \otimes \mathbf{e}_{j_d}^d,$$where $$\mathbf{e}_j^k$$ is the jth standard basis vector of $$F^{n_k}$$. By definition of the multilinear multiplication, it holds that $$\mathcal{A} = \sum_{j_1=1}^{r_1} \sum_{j_2=1}^{r_2} \cdots \sum_{j_d=1}^{r_d} s_{j_1,j_2,\ldots,j_d} \mathbf{u}_{j_1}^1 \otimes \mathbf{u}_{j_2}^2 \otimes \cdots \otimes \mathbf{u}_{j_d}^d,$$where the $$\mathbf{u}_j^k$$ are the columns of $$U_k \in F^{n_k \times r_k}$$. It is easy to verify that $$B = \{ \mathbf{u}_{j_1}^1 \otimes \mathbf{u}_{j_2}^2 \otimes \cdots \otimes \mathbf{u}_{j_d}^d \}_{j_1,j_2,\ldots,j_d}$$ is an orthonormal set of tensors. This means that the HOSVD can be interpreted as a way to express the tensor $$\mathcal{A}$$ with respect to a specifically chosen orthonormal basis $$B$$ with the coefficients given as the multidimensional array $$\mathcal{S}$$.

Computation
Let $$\mathcal{A} \in F^{n_1 \times n_2 \times \cdots \times n_d}$$, where $$F$$ is either $$\mathbb{R}$$ or $$\mathbb{C}$$, be a tensor of multilinear rank $$(r_1, r_2, \ldots, r_d)$$.

Classic computation
The classic strategy for computing a compact HOSVD was introduced in 1966 by L. R. Tucker and further advocated by L. De Lathauwer et al.; it is based on the definition of the decomposition. The following three steps are performed:
 * For $$k=1,2,\ldots,d$$, do the following:
 * 1) Construct the standard factor-k flattening $$\mathcal{A}_{(k)}$$;
 * 2) Compute the (compact) singular value decomposition $$\mathcal{A}_{(k)} = U_k \Sigma_k V^T_k $$, and store the left singular vectors $$U_k \in F^{n_k \times r_k}$$;
 * 3) Compute the core tensor $$\mathcal{S}$$ via the multilinear multiplication $$ \mathcal{S} = (U_1^H, U_2^H, \ldots, U_d^H) \cdot \mathcal{A}. $$
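A minimal NumPy rendering of this classic strategy (a sketch under our own naming; it reuses multilinear_product from the multilinear multiplication sketch earlier and determines $$r_k$$ with a simple singular value threshold, assuming a nonzero input tensor):

```python
import numpy as np

def flatten(A, k):
    """Standard factor-k flattening: the mode-k fibres become the columns."""
    return np.moveaxis(A, k, 0).reshape(A.shape[k], -1)

def hosvd(A):
    """Compact HOSVD by the classic strategy: one SVD per flattening of A."""
    U = []
    for k in range(A.ndim):
        Uk, s, _ = np.linalg.svd(flatten(A, k), full_matrices=False)
        rk = int(np.sum(s > s[0] * max(A.shape) * np.finfo(s.dtype).eps))
        U.append(Uk[:, :rk])  # numerically nonzero left singular vectors
    S = multilinear_product([Uk.conj().T for Uk in U], A)  # core tensor
    return S, U

# Sanity check: A is recovered as (U_1, ..., U_d) . S.
A = np.random.default_rng(5).standard_normal((3, 4, 5))
S, U = hosvd(A)
assert np.allclose(A, multilinear_product(U, S))
```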

Interlaced computation
A strategy that is significantly faster when some or all $$r_k \ll n_k $$ consists of interlacing the computation of the core tensor and the factor matrices, as follows:
 * Set $$\mathcal{A}^0 = \mathcal{A}$$;
 * For $$k = 1,2, \ldots, d$$ perform the following:
 * Construct the standard factor-k flattening $$\mathcal{A}_{(k)}^{k-1}$$;
 * Compute the (compact) singular value decomposition $$\mathcal{A}_{(k)}^{k-1} = U_k \Sigma_k V^T_k $$, and store the left singular vectors $$U_k \in F^{n_k \times r_k}$$;
 * Set $$\mathcal{A}^k = U_k^H \cdot_k \mathcal{A}^{k-1} $$, or, equivalently, $$\mathcal{A}^k_{(k)} = \Sigma_k V_k^T $$.
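Under the same assumptions as the classic sketch (reusing flatten and mode_k_product from the earlier sketches), the interlaced strategy differs only in that the working tensor shrinks after each factor:

```python
import numpy as np

def hosvd_interlaced(A):
    """Interlaced compact HOSVD: each SVD acts on the already-shrunk tensor."""
    U = []
    for k in range(A.ndim):
        Uk, s, _ = np.linalg.svd(flatten(A, k), full_matrices=False)
        Uk = Uk[:, : int(np.sum(s > s[0] * max(A.shape) * np.finfo(s.dtype).eps))]
        U.append(Uk)
        A = mode_k_product(Uk.conj().T, A, k)  # A^k = U_k^H ._k A^{k-1}
    return A, U  # A now holds the core tensor S
```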

Approximation
In applications, such as those mentioned below, a common problem consists of approximating a given tensor $$\mathcal{A} \in F^{n_1 \times n_2 \times \cdots \times n_d} $$ by one of low multilinear rank. Formally, if $$\mathrm{mlrank}(\mathcal{A}) $$ denotes the multilinear rank of $$\mathcal{A} $$, then the nonlinear non-convex $$\ell_2 $$-optimization problem is $$\min_{\mathcal{B}\in F^{n_1 \times n_2 \times \cdots \times n_d}} \frac{1}{2} \| \mathcal{A} - \mathcal{B} \|_F^2 \quad\text{s.t.}\quad \mathrm{mlrank}(\mathcal{B}) = (s_1, s_2, \ldots, s_d), $$where $$(s_1, s_2, \ldots, s_d) \in \mathbb{N}^d $$, where $$0 \le s_k \le n_k $$, is a target multilinear rank that is assumed to be given, and where the norm is the Frobenius norm.

Based on the foregoing algorithms for computing a compact HOSVD, a natural idea for trying to solve this optimization problem is to truncate the (compact) SVD in step 2 of either the classic or the interlaced computation. A classically truncated HOSVD is obtained by replacing step 2 in the classic computation by
 * Compute a rank-$$s_k $$ truncated SVD $$\mathcal{A}_{(k)} \approx U_k \Sigma_k V^T_k $$, and store the top $$s_k $$ left singular vectors $$U_k \in F^{n_k \times s_k}$$;
while a sequentially truncated HOSVD (or successively truncated HOSVD) is obtained by replacing step 2 in the interlaced computation by
 * Compute a rank-$$s_k $$ truncated SVD $$\mathcal{A}_{(k)}^{k-1} \approx U_k \Sigma_k V^T_k $$, and store the top $$s_k $$ left singular vectors $$U_k \in F^{n_k \times s_k}$$.
Unfortunately, neither of these strategies results in an optimal solution of the best low multilinear rank optimization problem, contrary to the matrix case where the Eckart–Young theorem holds. However, both the classically and sequentially truncated HOSVD result in a quasi-optimal solution: if $$\mathcal{B}_t $$ denotes the classically or sequentially truncated HOSVD and $$\mathcal{B}^* $$ denotes the optimal solution to the best low multilinear rank approximation problem, then $$\| \mathcal{A} - \mathcal{B}_t \|_F \le \sqrt{d} \| \mathcal{A} - \mathcal{B}^* \|_F; $$ in practice this means that if there exists an optimal solution with a small error, then a truncated HOSVD will for many intended purposes also yield a sufficiently good solution.
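A sketch of the sequentially truncated variant, reusing flatten, mode_k_product, and multilinear_product from the earlier sketches (illustrative only; the target ranks are assumed given):

```python
import numpy as np

def st_hosvd(A, ranks):
    """Sequentially truncated HOSVD: keep the top s_k left singular vectors."""
    U = []
    for k, sk in enumerate(ranks):
        Uk = np.linalg.svd(flatten(A, k), full_matrices=False)[0][:, :sk]
        U.append(Uk)
        A = mode_k_product(Uk.conj().T, A, k)  # shrink factor k to s_k
    return A, U

# Quasi-optimality: the achieved error is within sqrt(d) of the optimum.
A = np.random.default_rng(6).standard_normal((6, 7, 8))
S, U = st_hosvd(A, (2, 2, 2))
err = np.linalg.norm(A - multilinear_product(U, S))
print(err)  # at most sqrt(3) times the best multilinear rank-(2,2,2) error
```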

Applications
The HOSVD is most commonly applied to the extraction of relevant information from multi-way arrays.

Circa 2001, Vasilescu reframed the data analysis, recognition and synthesis problems as multilinear tensor problems based on the insight that most observed data are the result of several causal factors of data formation, and are well suited for multi-modal data tensor analysis. The power of the tensor framework was showcased in a visually and mathematically compelling manner by decomposing and representing an image in terms of its causal factors of data formation, in the context of Human Motion Signatures, face recognition (TensorFaces), and computer graphics (TensorTextures).

The HOSVD has been successfully applied to signal processing and big data, e.g., in genomic signal processing. These applications also inspired a higher-order GSVD (HO GSVD) and a tensor GSVD.

A combination of HOSVD and SVD also has been applied for real-time event detection from complex data streams (multivariate data with space and time dimensions) in disease surveillance.

It is also used in tensor product model transformation-based controller design. In multilinear subspace learning, it was applied to modeling tensor objects for gait recognition.

The concept of HOSVD was carried over to functions by Baranyi and Yam via the TP model transformation. This extension led to the definition of the HOSVD-based canonical form of tensor product functions and Linear Parameter Varying system models and to convex hull manipulation based control optimization theory, see TP model transformation in control theories.