User:Waldelefant/Matrix Calculus

Definition and Notation
For positive integers $$m,n$$ let $$M(m,n)=\mathbf R^{m\times n}$$ denote the space of $$m\times n$$ matrices over the field $$\mathbf R$$ of real numbers. With the standard inner product defined by
 * $$\langle X,Y \rangle = \operatorname{tr}(X^T Y) = \sum_{i,j} X_{i,j}Y_{i,j}$$,

for $$X,Y\in M(m,n)$$, the space of matrices is a Hilbert space. The induced norm is the Frobenius norm
 * $$\|X\| = \|X\|_F = \sqrt{\langle X,X \rangle}$$.

The (columnwise) vec operator $$\operatorname{vec} : M(m,n)\to \mathbf R^{mn}$$ is defined by stacking the columns on top of one another, that is, for $$X=\begin{bmatrix}X_1,\ldots,X_n\end{bmatrix}$$ with columns $$X_1, \ldots, X_n$$ we have
 * $$\operatorname{vec}X = \begin{bmatrix}X_1\\ \vdots \\ X_n\end{bmatrix}$$.

The vec operator is an $$\mathbf R$$-linear isomorphism between the space of $$m\times n$$ matrices and the standard Euclidean space $$\mathbf R^{mn}$$. Thus calculus in $$M(m,n)$$ reduces to calculus in $$\mathbf R^{mn}$$.
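The definitions above can be checked numerically. The following is a minimal sketch using NumPy; the helper names `vec` and `unvec` are chosen here for illustration and are not part of the article. Column-major (`order="F"`) reshaping is used as one way to realize the columnwise stacking.

```python
import numpy as np

def vec(X):
    """Stack the columns of X into one vector (column-major order)."""
    return np.asarray(X).reshape(-1, order="F")

def unvec(x, m, n):
    """Inverse of vec: rebuild the m-by-n matrix from its stacked columns."""
    return np.asarray(x).reshape((m, n), order="F")

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 2))
Y = rng.standard_normal((3, 2))

# <X, Y> = tr(X^T Y) equals the Euclidean inner product of vec X and vec Y.
inner_trace = np.trace(X.T @ Y)
inner_vec = vec(X) @ vec(Y)

# The Frobenius norm of X equals the Euclidean norm of vec X.
norm_frobenius = np.linalg.norm(X, "fro")
norm_vec = np.sqrt(vec(X) @ vec(X))
```

With this choice, `vec` also preserves the inner product and the norm, so the two spaces can be treated interchangeably in the computations that follow.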

For a matrix-valued map $$F : M(m,n)\to M(p,q)$$ consider its vectorized counterpart $$f : \mathbf R^{mn}\to \mathbf R^{pq}$$ defined by
 * $$f(\operatorname{vec}X) = \operatorname{vec}F(X)$$.

Since the vec operator is an isomorphism, the map $$F$$ is differentiable (in the Fréchet sense) at $$X\in M(m,n)$$ if and only if $$f$$ is differentiable at $$x=\operatorname{vec}X$$. Let $$DF(X)[H]$$ denote the derivative of $$F$$ at $$X$$ in the direction $$H\in M(m,n)$$. Applying the chain rule, we have
 * $$Df(x)[h] = \operatorname{vec}(DF(X)[H])$$

for $$h = \operatorname{vec}H$$. The Jacobian (matrix) $$J_F(X)$$ of $$F$$ at $$X$$ is defined as the Jacobian matrix $$J_f(x)$$ of $$f$$ at $$x$$, which induces the differential $$Df(x)$$, that is, $$Df(x)[h] = J_f(x)h$$. Equivalently, $$\operatorname{vec}(DF(X)[H]) = J_F(X)\operatorname{vec}H$$.
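To illustrate the definition, the sketch below approximates $$J_F(X)$$ by central finite differences and compares $$J_F(X)\operatorname{vec}H$$ with $$\operatorname{vec}(DF(X)[H])$$. The example map $$F(X)=XX^\mathrm T$$, with $$DF(X)[H]=HX^\mathrm T+XH^\mathrm T$$, and the helper `numerical_jacobian` are assumptions made for this illustration only.

```python
import numpy as np

def vec(X):
    """Stack the columns of X into one vector (column-major order)."""
    return np.asarray(X).reshape(-1, order="F")

def numerical_jacobian(F, X, eps=1e-6):
    """Approximate J_F(X) column by column with central differences."""
    m, n = X.shape
    cols = []
    for k in range(m * n):
        e = np.zeros(m * n)
        e[k] = eps
        E = e.reshape((m, n), order="F")  # perturbation along the k-th vec coordinate
        cols.append((vec(F(X + E)) - vec(F(X - E))) / (2 * eps))
    return np.column_stack(cols)

def F(X):
    # Example map M(3,2) -> M(3,3), chosen for illustration.
    return X @ X.T

rng = np.random.default_rng(1)
X = rng.standard_normal((3, 2))
H = rng.standard_normal((3, 2))

J = numerical_jacobian(F, X)        # shape (p*q, m*n) = (9, 6)
directional = H @ X.T + X @ H.T     # DF(X)[H] computed by hand

lhs = J @ vec(H)
rhs = vec(directional)
```

Note that `J @ vec(H)` is a vector of length 9, while `directional` is a 3-by-3 matrix; the two agree only after applying `vec`.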

Remarks on the Definition

 * 1) The definitions above imply that the $$n$$-tuple space $$\mathbf R^n$$ is identified with $$M(n,1)$$. This is a common convention but not a necessity. For other conventions the formulae presented here must be adapted accordingly.
 * 2) This article intentionally does not define formal quotients like $$dF/dX$$. The definition of $$dF/dX$$ usually depends on the application, and there is no consensus on a natural definition. The concept of a Jacobian matrix, by contrast, is precise: "Jacobian" fixes the shape and ordering of the components, and "matrix" excludes differential forms and higher-order tensors.
 * 3) The sense in which "Jacobian matrix" is used in this article is new. The usual notations are $$d\operatorname{vec}F/d\operatorname{vec}X$$, $$d\operatorname{vec}F/d\operatorname{vec}X^\mathrm T$$ or simply $$dF/dX$$. The last one is ambiguous, as noted above.
 * 4) The expression $$J_F(X)\operatorname{vec}H$$ is properly defined, whereas $$J_F(X)H$$ is not. Abusing notation by defining $$J_F(X)H:=J_F(X)\operatorname{vec}H$$ gains little, since one can always write $$DF(X)[H]$$, which is always well defined.

Properties
Let $$F,G$$ be differentiable matrix-valued maps, $$X$$ be a matrix and $$\alpha,\beta\in\mathbf R$$ be real numbers. Let $$I_n$$ be the $$n\times n$$ identity matrix. Finally, let $$\otimes$$ denote the Kronecker product. Assuming the matrix expression on the left is well defined, that is, the matrices have compatible sizes, the following rules hold:

Linearity:
 * $$J_{\alpha F + \beta G}(X) = \alpha J_F(X) + \beta J_G(X)$$,

Product rule:
 * $$J_{FG}(X) = (I_r\otimes F(X))J_G(X) + (G(X)^\mathrm T\otimes I_p)J_F(X)$$,

for $$F(X)\in M(p,q)$$ and $$G(X)\in M(q,r)$$.
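The product rule can be sanity-checked numerically. The sketch below is one possible check, not a proof: the maps $$F(X)=AX$$ and $$G(X)=X^\mathrm T B$$, the sizes, and the finite-difference helper `numerical_jacobian` are assumptions introduced for this example.

```python
import numpy as np

def vec(X):
    """Stack the columns of X into one vector (column-major order)."""
    return np.asarray(X).reshape(-1, order="F")

def numerical_jacobian(F, X, eps=1e-6):
    """Approximate J_F(X) column by column with central differences."""
    m, n = X.shape
    cols = []
    for k in range(m * n):
        e = np.zeros(m * n)
        e[k] = eps
        E = e.reshape((m, n), order="F")
        cols.append((vec(F(X + E)) - vec(F(X - E))) / (2 * eps))
    return np.column_stack(cols)

rng = np.random.default_rng(2)
m, n, p, r = 3, 2, 4, 5
A = rng.standard_normal((p, m))
B = rng.standard_normal((m, r))
X = rng.standard_normal((m, n))

F = lambda X: A @ X            # F(X) in M(p, q) with q = n
G = lambda X: X.T @ B          # G(X) in M(q, r)
FG = lambda X: F(X) @ G(X)     # (FG)(X) in M(p, r)

J_F = numerical_jacobian(F, X)
J_G = numerical_jacobian(G, X)
J_FG = numerical_jacobian(FG, X)

# Right-hand side of the product rule.
rhs = np.kron(np.eye(r), F(X)) @ J_G + np.kron(G(X).T, np.eye(p)) @ J_F
```

Here `J_FG` and `rhs` both have shape `(p*r, m*n)` and should agree up to finite-difference error.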

Chain rule:
 * $$J_{G\circ F}(X) = J_G(F(X))J_F(X)$$,

for $$(G\circ F)(X) = G(F(X))$$.
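The chain rule can be checked in the same way. In the sketch below the maps $$F(X)=XX^\mathrm T$$ and $$G(Y)=Y^2$$ are arbitrary choices for illustration, and `numerical_jacobian` is a hypothetical finite-difference helper, so agreement holds only up to approximation error.

```python
import numpy as np

def vec(X):
    """Stack the columns of X into one vector (column-major order)."""
    return np.asarray(X).reshape(-1, order="F")

def numerical_jacobian(F, X, eps=1e-6):
    """Approximate J_F(X) column by column with central differences."""
    m, n = X.shape
    cols = []
    for k in range(m * n):
        e = np.zeros(m * n)
        e[k] = eps
        E = e.reshape((m, n), order="F")
        cols.append((vec(F(X + E)) - vec(F(X - E))) / (2 * eps))
    return np.column_stack(cols)

rng = np.random.default_rng(3)
X = rng.standard_normal((3, 2))

F = lambda X: X @ X.T          # M(3,2) -> M(3,3)
G = lambda Y: Y @ Y            # M(3,3) -> M(3,3)
GF = lambda X: G(F(X))         # composition G o F

J_F = numerical_jacobian(F, X)        # evaluated at X, shape (9, 6)
J_G = numerical_jacobian(G, F(X))     # evaluated at F(X), shape (9, 9)
J_GF = numerical_jacobian(GF, X)      # shape (9, 6)

rhs = J_G @ J_F
```

The point to note is that $$J_G$$ is evaluated at $$F(X)$$, not at $$X$$.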

Linear forms
Trace:
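 * $$J_{\operatorname{tr}}(X) = \sum_{i=1}^n (e_i\otimes e_i)^\mathrm T = (\operatorname{vec} I_n)^\mathrm T$$,

for $$X\in M(n,n)$$, where $$e_i$$ denotes the $$i$$-th standard basis vector of $$\mathbf R^n$$. Since the trace is scalar valued, its Jacobian is a $$1\times n^2$$ row vector.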
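As a quick check of the trace formula, the sketch below builds $$\sum_i e_i\otimes e_i$$ with NumPy and confirms that it equals $$\operatorname{vec} I_n$$ and that pairing it with $$\operatorname{vec}H$$ gives $$\operatorname{tr}H$$, which is the derivative of the (linear) trace in direction $$H$$. The size $$n=4$$ and the variable names are arbitrary choices for this illustration.

```python
import numpy as np

n = 4
I = np.eye(n)

# Sum of e_i (x) e_i over the standard basis vectors e_i of R^n.
s = sum(np.kron(I[:, i], I[:, i]) for i in range(n))

# vec of the identity matrix (column-major stacking).
vec_I = I.reshape(-1, order="F")

rng = np.random.default_rng(4)
H = rng.standard_normal((n, n))

# Applying the Jacobian (as a row vector) to vec H yields tr(H).
applied = s @ H.reshape(-1, order="F")
trace_H = np.trace(H)
```

As a NumPy 1-D array, `s` carries no row or column orientation; in the article's convention the Jacobian is its transpose, a $$1\times n^2$$ row.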

Inner product:
 * 1) $$J_{\langle F,G\rangle}(X) = (\operatorname{vec}F(X))^\mathrm T J_G(X) + (\operatorname{vec}G(X))^\mathrm T J_F(X)$$,
 * 2) $$J_{\langle AXB,X\rangle}(X) = (\operatorname{vec} X)^\mathrm T ((B^\mathrm T\otimes A)^\mathrm T + (B^\mathrm T\otimes A))$$,

for constant matrices $$A\in M(m,m)$$, $$B\in M(n,n)$$ and $$X\in M(m,n)$$.
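Formula 2) follows from formula 1) with $$F(X)=AXB$$ and $$G(X)=X$$, using $$\operatorname{vec}(AXB)=(B^\mathrm T\otimes A)\operatorname{vec}X$$, so that $$J_F(X)=B^\mathrm T\otimes A$$ and $$J_G(X)=I_{mn}$$. The sketch below checks both against a finite-difference gradient; the sizes, the random matrices and the helper `numerical_jacobian` are assumptions made for this example.

```python
import numpy as np

def vec(X):
    """Stack the columns of X into one vector (column-major order)."""
    return np.asarray(X).reshape(-1, order="F")

def numerical_jacobian(F, X, eps=1e-6):
    """Approximate J_F(X) column by column with central differences."""
    m, n = X.shape
    cols = []
    for k in range(m * n):
        e = np.zeros(m * n)
        e[k] = eps
        E = e.reshape((m, n), order="F")
        cols.append((vec(F(X + E)) - vec(F(X - E))) / (2 * eps))
    return np.column_stack(cols)

rng = np.random.default_rng(5)
m, n = 3, 2
A = rng.standard_normal((m, m))
B = rng.standard_normal((n, n))
X = rng.standard_normal((m, n))

K = np.kron(B.T, A)                       # vec(A X B) = K vec(X)

# Formula 2): closed form of the Jacobian of X -> <A X B, X>.
J_formula = vec(X) @ (K.T + K)

# Formula 1) with F(X) = A X B (J_F = K) and G(X) = X (J_G = I).
J_rule1 = vec(A @ X @ B) @ np.eye(m * n) + vec(X) @ K

# Finite-difference Jacobian of the scalar map, shape (1, m*n).
phi = lambda X: np.trace((A @ X @ B).T @ X)
J_numeric = numerical_jacobian(phi, X)
```

All three results represent the same $$1\times mn$$ row vector, up to finite-difference error in `J_numeric`.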