Talk:Matrix calculus/Archive 3

Edit wars are annoying
To the editors that are currently edit-warring on this page: This constantly pops up on my watchlist. Please sort it out or take appropriate measures as suggested by WP:EDITWAR. Cs32en 05:53, 14 September 2009 (UTC)
 * The appropriate measure may be semiprotection. You may suggest it at WP:RFPP, but I probably shouldn't even request it.  — Arthur Rubin  (talk) 08:55, 14 September 2009 (UTC)
 * Not sure if that would be the best way, given the specific characteristics of this dispute. Which version should be protected, and if it's not fully protected, would that mean that one of the warriors is effectively blocked here while the other is not? They both should explain their positions on the talk page. COI is not a sufficient argument for excluding a valid piece of information from the article, but, especially if COI is an issue, the burden of proof is on the editor who wants to add the information. Cs32en  20:49, 14 September 2009 (UTC)
 * Relevant info can be found at Wikipedia_talk:WikiProject_Spam/2009_Archive_Aug_2. - MrOllie (talk) 21:11, 14 September 2009 (UTC)


 * The appendix can be found at http://www.stanford.edu/ dattorro/matrixcalc.pdf. It's about the topic, so I wouldn't classify it as obvious spam. On the other hand, it isn't really helpful for the readers of this article, as it presupposes a general understanding of matrix calculus. Please add your arguments for or against inclusion in the following sub-sections. Cs32en  22:20, 14 September 2009 (UTC)

Isomorphic spaces
On the vector space $$\mathbb K^{pq}$$, the result of the inner product is an element of $$\mathbb K$$, while this is not true for the usual products on the matrix space $$\mathbb K^{p\times q}$$. Therefore, we cannot simply extend vector calculus to matrix calculus. Cs32en 20:46, 12 December 2009 (UTC)
 * They're isomorphic as vector spaces over $$\mathbb K$$; any product operations may differ. The vec operator is a vector space isomorphism from $$\mathbb K^{p\times q}$$ to $$\mathbb K^{pq}$$.  — Arthur Rubin  (talk) 22:52, 15 December 2009 (UTC)
 * I agree with Arthur. Although the matrix product is sometimes referred to as the inner product, it is not really a vector space inner product, precisely because it does not take values in a field. --Paulginz (talk) 21:41, 12 April 2010 (UTC)
 * Since it is clear that $$\mathbb R^{pq}$$ and $$\mathbb R^{p\times q}$$ are isomorphic via the vec operator, which is even an isometry, should this not be mentioned in the text and the confusing "dubious" mark removed? Also, the vec operator appears in formulae, yet vec is never defined!--Waldelefant (talk) —Preceding undated comment added 17:59, 22 July 2010 (UTC).
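The claim that vec is an isomorphism and even an isometry can be checked numerically; here is a minimal sketch in Python/NumPy (the function name and test matrices are ours, not from the discussion):

```python
import numpy as np

# The vec operator stacks the columns of a p-by-q matrix
# into a vector of length p*q (column-major order).
def vec(A):
    return A.reshape(-1, order="F")

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4))
Y = rng.standard_normal((3, 4))

# Linearity: vec respects the vector space operations.
assert np.allclose(vec(2.0 * X + Y), 2.0 * vec(X) + vec(Y))

# Isometry (w.r.t. the Frobenius norm): ||X||_F == ||vec(X)||_2
assert np.isclose(np.linalg.norm(X, "fro"), np.linalg.norm(vec(X)))
```

The isometry statement here implicitly uses the Frobenius norm on the matrix side, which is the point raised below about having to name the scalar product.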


 * One can define a product on $$W = \mathbb K^{p \times q}$$ that is equivalent to the inner product on $$V = \mathbb K^{pq}$$, i.e. if we denote this product function as $$p$$, and there is an isomorphism $$f: V \longrightarrow W$$, then $$\forall v_1, v_2 \in V : p(f(v_1),f(v_2)) = \langle v_1, v_2 \rangle.$$ However, we also need other products on $$\mathbb K^{p \times q}$$ in matrix calculus, which are not equivalent to the inner product on $$\mathbb K^{pq}$$. Cs32en   Talk to me  02:30, 23 July 2010 (UTC)


 * There is a "canonical" Euclidean scalar product, the Frobenius scalar product, defined by $$\langle X, Y\rangle = \operatorname{tr}(X^TY) = \sum_{i,j} X_{i,j}Y_{i,j}$$. I do understand that there are other scalar products. But there are also other scalar products on $$\mathbb R^n$$. So clearly it should be stated which scalar product is used when speaking of an isometry. Still, $$\mathbb R^{pq}$$ and $$\mathbb R^{p\times q}$$ are isomorphic as vector spaces. Since any pair of norms on a finite-dimensional space are equivalent, $$\mathbb R^{pq}$$ and $$\mathbb R^{p\times q}$$ are equivalent from the viewpoint of analysis. So why is the isomorphism marked as dubious?
 * To be really strict, we could at least write "$$\mathbb R^{pq}$$ and $$\mathbb R^{p\times q}$$ are isomorphic as $$\mathbb R$$-vector spaces" or drop the claim entirely! Otherwise it is just confusing.
 * --Waldelefant (talk) —Preceding undated comment added 10:05, 23 July 2010 (UTC).
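The identity quoted above, $$\operatorname{tr}(X^TY) = \sum_{i,j} X_{i,j}Y_{i,j}$$, and its agreement with the standard dot product on $$\mathbb R^{pq}$$ can be verified directly; a short Python/NumPy sketch (our own example data):

```python
import numpy as np

# The Frobenius scalar product <X, Y> = tr(X^T Y) on R^{p x q}
# agrees with the entrywise sum and, after flattening, with the
# standard dot product on R^{pq} -- so vec is an isometry for
# this particular choice of scalar product.
rng = np.random.default_rng(1)
X = rng.standard_normal((2, 5))
Y = rng.standard_normal((2, 5))

frobenius = np.trace(X.T @ Y)
entrywise = np.sum(X * Y)           # sum_{i,j} X_ij * Y_ij
flattened = X.ravel() @ Y.ravel()   # dot product on R^{pq}

assert np.isclose(frobenius, entrywise)
assert np.isclose(frobenius, flattened)
```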


 * I think that the tag refers to the wording "not much changes with matrix spaces", rather than to the statement that the vector spaces would be isomorphic, which they actually are. What changes is that we have to be much more careful about which products we are using.  Cs32en   Talk to me  04:03, 24 July 2010 (UTC)


 * Now it makes a little more sense. Should we rephrase the sentence to: "The derivative in the space of n×m matrices is related to the one in the tuple space Rnm, since both vector spaces are isomorphic. To maintain the matrix structure, however, not every formula has a literal analogue."?
 * The phrase "simple functions" is very vague.
 * One does not define the derivative only for simple functions; one defines it for all differentiable functions.
 * It should be made clear why the derivatives are not the same.
 * We should probably give an example of the difference.
 * Waldelefant (talk) 13:28, 24 July 2010 (UTC)

Magnus/Neudecker
Arthur, please do not alter excerpts from Magnus/Neudecker. If you really feel you have to, do remove the entire sentence or paragraph. Cs32en 22:54, 13 December 2009 (UTC) -- Addition: I'm ok with tagging the formula as "contradictory", as this might encourage other editors to comment on it. (Unsurprisingly, I don't think the formula would be actually contradictory.) Cs32en  23:09, 13 December 2009 (UTC)

vector calculus vs. matrix calculus
In the vector calculus article the gradient of a scalar-valued function is a vector, that is, ∇f(x) has the same dimension as x for f : Rn->R. Perhaps this article should use the same convention or at least point out the differences. —Preceding unsigned comment added by 134.99.156.154 (talk) 16:24, 21 July 2010 (UTC)
 * Yes, it is a vector, but it's a vector in the dual space of the linear functionals of the vector space that x belongs to. Therefore, if represented by a matrix, it is usually written as a row vector. Cs32en   Talk to me  18:08, 21 July 2010 (UTC)
 * It is a matter of convention whether to understand the gradient as the (Fréchet) derivative or as the vector which induces that linear map. It seems the vector calculus and gradient articles prefer the latter. Even though they explicitly write gradients as row vectors, all formulae make sense only if the gradient ∇f(x) and x are from the same space.
 * Ahem. The Fréchet derivative of a scalar function with respect to a vector is a linear functional on that vector space, making it belong to the dual space. — Arthur Rubin  (talk) 08:36, 22 July 2010 (UTC)
 * Yes, and a linear functional is uniquely determined by a vector. The point is, should the gradient be the linear functional or the vector which induces it. —Preceding unsigned comment added by 134.99.156.154 (talk) 08:58, 22 July 2010 (UTC)
 * Actually, it's a co-vector. In a matrix context, it makes more sense for even a vector derivative to satisfy
 * $$\delta f \approx \frac{\partial f}{\partial x} \delta x$$
 * as matrix multiplication, rather than as a dot product. — Arthur Rubin  (talk) 15:19, 22 July 2010 (UTC)
 * Okay, let me rephrase my question.
 * Let $$Df(x)[\cdot]$$ denote the derivative of $$f$$ at $$x$$ in the Fréchet sense, that is, as a linear map.
 * Should we define the gradient $$\nabla f(x)$$ by
 * (1) $$Df(x)[h] = \nabla f(x)h$$, or
 * (2) $$Df(x)[h] = \langle \nabla f(x), h \rangle$$?
 * It seems the gradient article prefers (2), and (2) can be generalized to matrices using $$\langle X,Y \rangle = \operatorname{tr}(X^TY)$$.

--134.99.156.154 (talk) 16:26, 22 July 2010 (UTC)
 * It's noted (above) that we're using a unique notation. For the derivative of a scalar function of a matrix, the question of whether:
 * $$\delta Y \approx \operatorname{tr}\left( \frac {\partial Y}{\partial X} \delta X\right)$$
 * or
 * $$\delta Y \approx \operatorname{tr}\left(\left( \frac {\partial Y}{\partial X}\right)^{\mathrm T} \delta X\right)$$
 * is a question for further consideration, but, as far as I know, all editors have proposed the first until now. — Arthur Rubin  (talk) 16:59, 22 July 2010 (UTC)
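The two trace conventions above differ only by a transpose of the derivative; a numerical sketch for the simple test case Y = tr(AX) (our own example, not from the article) shows both forms recovering the same first-order change:

```python
import numpy as np

# For Y = tr(A X) we have delta Y = tr(A delta X), so under
#   delta Y ≈ tr( (dY/dX) delta X )        the derivative is A,
# while under
#   delta Y ≈ tr( (dY/dX)^T delta X )      the derivative is A^T.
rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
X = rng.standard_normal((4, 4))
dX = 1e-6 * rng.standard_normal((4, 4))

dY = np.trace(A @ (X + dX)) - np.trace(A @ X)

grad_first = A      # first convention: dY ≈ tr(grad_first @ dX)
grad_second = A.T   # second convention: dY ≈ tr(grad_second.T @ dX)

assert np.isclose(dY, np.trace(grad_first @ dX))
assert np.isclose(dY, np.trace(grad_second.T @ dX))
```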
 * Since this article links to the vector calculus and gradient articles, and they prefer (2), should we not point out the difference? That was my question from the beginning ...--Waldelefant (talk) —Preceding undated comment added 17:47, 22 July 2010 (UTC).
 * Definition (2), i.e. $$Df(x)[h] = \langle \nabla f(x), h \rangle$$, allows for additional flexibility, as the scalar product of two vectors can be defined in different ways. Definition (1), i.e. $$Df(x)[h] = \nabla f(x)h$$, assumes that the vector space is equipped with the canonical scalar product. However, the first definition may be more easily understood by the reader. After all, the term "matrix calculus" somehow implies that not every possible algebraic complication is being considered in this article. Cs32en   Talk to me  02:01, 23 July 2010 (UTC)
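The distinction between definitions (1) and (2) can be illustrated concretely for a quadratic form; a minimal Python/NumPy sketch (the function f and all names are our own example):

```python
import numpy as np

# For f(x) = x^T Q x, the Frechet derivative is
#   Df(x)[h] = x^T (Q + Q^T) h.
# (1) row-vector form:  Df(x)[h] = g_row @ h, with g_row a 1 x n co-vector
# (2) gradient form:    Df(x)[h] = <g, h>,    with g in the same space as x
rng = np.random.default_rng(3)
n = 5
Q = rng.standard_normal((n, n))
x = rng.standard_normal(n)
h = 1e-6 * rng.standard_normal(n)

g = (Q + Q.T) @ x         # convention (2): an n-vector
g_row = g.reshape(1, -1)  # convention (1): its row-vector representation

# First-order change of f, compared against both conventions:
df = (x + h) @ Q @ (x + h) - x @ Q @ x
assert np.isclose(df, g @ h)           # (2): inner product
assert np.isclose(df, (g_row @ h)[0])  # (1): matrix multiplication
```

Numerically the two conventions agree, of course; the difference is purely one of bookkeeping (row vector versus vector), which is why the articles should state which one they use.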

This requires index notation
In matrix multiplication $$\mathbf{A} \times \mathbf{B}$$, the rows of the matrix $$\mathbf{A}$$ are multiplied with the columns of $$\mathbf{B}$$ to obtain the elements of the result matrix. This is formally called contraction along the rows of $$\mathbf{A}$$ and the columns of $$\mathbf{B}$$. However, if $$\mathbf{A}$$ and $$\mathbf{B}$$ are higher-order tensors, there is no consensus on the index/dimension along which the contraction is to be done when defining a multiplication. In fact, as mentioned correctly in the article, the construction of tensors with notation like $$\frac{\partial \mathbf{X}}{\partial \mathbf{Y}}$$ can have dubious meaning, since it does not clearly state which columns or rows of the matrices go towards creating which of the 4 dimensions of the tensor (i.e. if $$\mathbf{Z} = \frac{\partial \mathbf{X}}{\partial \mathbf{Y}}$$, it is not clear whether $$Z_{abcd} = \frac{\partial X_{ab}}{\partial Y_{cd}}$$ or $$Z_{abcd} = \frac{\partial X_{ac}}{\partial Y_{bd}}$$ or something else). Moreover, the product $$\frac{\partial \mathbf{Z}} {\partial \mathbf{Y}} \frac{\partial \mathbf{Y}} {\partial \mathbf{X}}$$, apparently between two tensors, is not well defined, since there is no convention as to which indices are to be contracted (see Tensor contraction).

The resolution of this ambiguity was a principal motivation for the introduction of Einstein notation, or index notation. In index notation one uses contraction of indices to define multiplication among tensors without resorting to such dubious notations. In such a notation one defines the multiplication between two 4th-order tensors $$\mathbf{A}$$ and $$\mathbf{B}$$ by explicitly mentioning the contraction indices. For example, one way of multiplying them to obtain another 4th-order tensor $$\mathbf{C}$$ is $$C_{abij} = \sum_p \sum_q A_{abpq} B_{ipjq}$$, where the subscripted quantities represent elements of the tensors. In Einstein notation the summation signs are often dropped by convention.
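The explicit contraction $$C_{abij} = \sum_p \sum_q A_{abpq} B_{ipjq}$$ is exactly what `np.einsum` expresses; a short sketch (tensor shapes are our own choice) that also checks the result against explicit loops:

```python
import numpy as np

# Index notation made executable: the contracted indices p and q are
# named explicitly, so there is no ambiguity about what is summed.
rng = np.random.default_rng(4)
A = rng.standard_normal((2, 3, 4, 5))  # A_abpq
B = rng.standard_normal((6, 4, 7, 5))  # B_ipjq

# C_abij = sum_{p,q} A_abpq B_ipjq
C = np.einsum("abpq,ipjq->abij", A, B)
assert C.shape == (2, 3, 6, 7)

# The same contraction written as explicit sums, for comparison:
C_loops = np.zeros_like(C)
for p in range(4):
    for q in range(5):
        C_loops += np.multiply.outer(A[:, :, p, q], B[:, p, :, q])
assert np.allclose(C, C_loops)
```

The einsum subscript string plays the role of the index notation itself: a different contraction convention is simply a different string, which is the clarity the comment above is arguing for.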

So, I think the ideal thing to do for this article will be to add the following in the "Notes" section at the beginning: '''This article describes notations that have often been said to create ambiguity and conflict among different textbooks and mathematicians. During his development of general relativity, Albert Einstein introduced the Einstein notation, or index notation, in order to deal with tensor multiplications and tensor calculus in a systematic manner. For a modern and more accurate perspective on matrix and higher-order tensor calculus, see: Einstein notation, Tensor, Tensor field and Tensor contraction.'''

- Subh83 (talk &#124; contribs) 03:02, 19 March 2011 (UTC)