Talk:Matrix calculus

Factual accuracy dispute
In an attempt to resolve this, shouldn't the external links be cited inline against the matrix derivative identities? Some of the identities here appear in those externally linked resources. I also added a few extra external links which may be of interest.--Maschen (talk) 10:19, 3 December 2011 (UTC)
 * I cleaned up material cited from Magnus and Neudecker, but I'm not sure whether that's the portion that the tag applied to. I removed the  tag, since that didn't apply, but the material attributed to Magnus and Neudecker was clearly wrong; the  tags were not consistent with Magnus and Neudecker, and the accompanying text was trimmed down to the point that it lost context.
 * Also, to view the text that's cited, see http://www.amazon.com/dp/047198633X using "Search inside this book", and search for "bad notation". —Steve98052 (talk) 08:46, 30 June 2012 (UTC)

Matrix integration?
Could someone knowledgeable please add a section on this? The current article only covers matrix differentiation, not integration.--Maschen (talk) 10:51, 3 December 2011 (UTC)

I'm new to this. However, as far as I understand, there may not be a very general and useful rule. One way that should always work for integration is to find the corresponding derivative... Hupili (talk) 09:10, 18 March 2012 (UTC)

Request for derivatives of inverse matrix identities
I believe the information from:

 * http://www.colorado.edu/engineering/CAS/courses.d/IFEM.d/IFEM.AppD.d/IFEM.AppD.pdf
 * https://nrich.maths.org/discus/messages/7601/150862.html?1302548396
 * http://planetmath.org/encyclopedia/DerivativeOfInverseMatrix.html

should be integrated into this article, or into a new article that covers many of these properties specifically. — Preceding unsigned comment added by 150.135.222.177 (talk) 19:50, 17 February 2012 (UTC)


 * See above.  I can't think of a way to integrate it.  — Arthur Rubin  (talk) 06:44, 18 February 2012 (UTC)

Unifying the Notation
I think this page is a disaster for newcomers. I've read it several times and consulted several sources. As I understand it, the notation used in the first few sections, like "chain rule", is the transpose of that used in the last few sections, like "example". Can anyone help make this clear? Failing that, simply placing a warning there would be much better than the current situation. Hupili (talk) 09:16, 18 March 2012 (UTC)

fixed notation, hopefully
I rewrote this article almost entirely. Now, rather than try (and fail) to stick to one notation when there's no consistency among the sources, I present all identities according to different possible notations. Since there isn't even any self-consistency in notation in many sources, I separate all the identities according to type of numerator and denominator so that e.g. if a given source uses one type of layout for one numerator/denominator type and another layout for a different type, you can make sense by mixing and matching the appropriate identities.

I've elided entirely any discussion of derivatives that produce results beyond two dimensions (e.g. vector-by-matrix or matrix-by-matrix). I'm aware that some authors have indeed defined such derivatives, but I don't have much experience with such larger-dimensional aggregates and I imagine the notation is even less consistent here than elsewhere -- at least vectors and matrices themselves are pretty well-defined.

Benwing (talk) 01:22, 6 April 2012 (UTC)


 * You have. Thank you. =) F = q(E+v×B) ⇄ ∑ici 17:49, 6 May 2012 (UTC)

Re-Organized the article
I have reorganized the article in an attempt to make it more accessible to non-experts. In this spirit I have put definitions toward the front and identities toward the end of the article, and chosen one notation (numerator layout) for the first few sections, leaving technical discussion of other notations for later in the article.

Please give me any feedback on this. To make a major change like this, one has to make a number of small decisions, and I apologize if others feel that I have taken away clarity from one of the original sections by making it part of this new organization. Some immediate examples are the sections 'usages' and 'relation to other derivatives', which have been moved to the new first section called 'Scope' because I feel they are important to both the expert and the novice in the subject. Of course, when I say novice I do not mean someone who has no math background, but someone with some knowledge of calculus and linear algebra who has never heard of a derivative involving a matrix.

Finally, the newly added discussion on differential form is very good, but is still mainly only in the identities section. Perhaps we could move some of the initial discussion of differential form into the notation section, or in each of the definition sections. — Preceding unsigned comment added by Brent Perreault (talk • contribs) 15:18, 18 May 2012 (UTC)


 * Those edits were probably fine, thanks for the good faith and helpful edits. =) Moving things around for better continuity isn't a problem, by all means do so, if you think it will be better. F = q(E+v×B) ⇄ ∑ici 17:08, 18 May 2012 (UTC)

Unfortunately
I was surprised to see that this article uses "unfortunately" four times. This seems odd - it's surely not up to an encyclopaedia to editorialize about what is or is not unfortunate, especially in a topic like this. I'm hoping that someone with knowledge of the topic can, without losing meaning, replace these with something more appropriate. Thanks and best wishes 82.45.217.156 (talk) 17:04, 7 June 2012 (UTC)

I removed the 3 uses of the word "unfortunately" when speaking about the notational conventions. Many authors would argue that having multiple conventions serves a positive purpose for the users of matrix calculus. In fact, as indicated in the article, some authors find reason to mix the two consistent conventions within the same paper. Thus, while a single convention would certainly have its advantages, an encyclopedic article should not claim that it is the preferred (more fortunate) option. Also, the word appeared so many times largely because of repeated information, which I tried to cut down on. I did not remove the fourth one. The article still reads "The chain rule applies in some of the cases, but unfortunately does not apply in matrix-by-scalar derivatives or scalar-by-matrix derivatives", where I believe almost all mathematicians would use and understand the word "unfortunately" to mean that a certain rule is not as simple as one might have guessed, and that the word adds meaning in the correct way here. I'm open to other opinions, of course. : )    Brent Perreault (talk) 20:25, 23 July 2012 (UTC)

Not a numerator layout in the numerator column??
I would really appreciate if someone explained to me why in the numerator layout $$\frac{\partial \mathbf{x}^{\rm T}\mathbf{A}}{\partial \mathbf{x}} = \mathbf{A}^{\rm T}$$. The dimension of $$\mathbf{x}^{\rm T}\mathbf{A}$$ is 1xn, the dimension of $$\mathbf{x}$$ is nx1, thus according to the numerator layout the dimension of the result should be 1xn. All in all, this derivative looks like a derivative of a row vector by a column vector which is hard to interpret.

According to this identity, in the special case A=I, we would have $$\frac{\partial \mathbf{x}^{\rm T}}{\partial \mathbf{x}} = \mathbf{I} = \frac{\partial \mathbf{x}}{\partial \mathbf{x}}$$. Was that the intention?

Thanks, Sd1074 (talk) 20:03, 1 July 2012 (UTC)

This is a great question. This seems to be the only place where the article treats the derivative of a row vector (or at least, if you assumed all the preceding vectors to be row vectors, then this would be the first place where it treats the derivative of a column vector). I'm not sure that we can consistently define the derivative of a row vector with respect to a (column) vector once we have established a convention for the derivative of a column vector. My guess is that this identity was taken from a very limited context where this type of derivative could be consistently defined, but that the result does not belong here. Moreover, a clear understanding of these issues should inform what the article says about defining (or not defining) derivatives of transposes. Thanks, Brent Perreault (talk) 20:47, 23 July 2012 (UTC)
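One way to look at the identity in question is to ignore the row/column orientation of the numerator and simply list the entries of $$\mathbf{x}^{\rm T}\mathbf{A}$$ as components. The sketch below is only an illustration under that assumption: the matrix `A`, the test point, and the helper `jacobian` are made up for this check and come from no source. It builds a finite-difference Jacobian with one row per output component and one column per input component (numerator layout) and compares it with $$\mathbf{A}^{\rm T}$$.

```python
def jacobian(f, x, eps=1e-6):
    """Central-difference Jacobian in numerator layout:
    row i is output component i, column j is input component j."""
    n_out = len(f(x))
    J = [[0.0] * len(x) for _ in range(n_out)]
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        fp, fm = f(xp), f(xm)
        for i in range(n_out):
            J[i][j] = (fp[i] - fm[i]) / (2 * eps)
    return J

# Arbitrary illustrative 3x2 matrix A (n = 3, m = 2); not from the article.
A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

def xT_A(x):
    # The m entries of the row vector x^T A, stored as a flat list.
    return [sum(x[i] * A[i][k] for i in range(3)) for k in range(2)]

J = jacobian(xT_A, [0.5, -1.0, 2.0])                    # 2 x 3
A_T = [[A[i][k] for i in range(3)] for k in range(2)]   # A^T, also 2 x 3
max_err = max(abs(J[k][j] - A_T[k][j]) for k in range(2) for j in range(3))
print("max |J - A^T| =", max_err)
```

Under this reading the result has the shape of $$\mathbf{A}^{\rm T}$$ (m×n), which is the same as treating the row vector as the column vector $$\mathbf{A}^{\rm T}\mathbf{x}$$. It does not settle whether the article should define derivatives of row vectors at all.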

Computer Algebra Software (CAS)?
I was wondering which CAS supports performing matrix calculus operations?

I know Sage/Maxima can do tensor calculus, but the extensions do not work well with inverses or determinants. The list of CAS tools does not have a column comparing this functionality either. I am hoping someone can provide a list of tools which can automate this process or check my work. — Preceding unsigned comment added by 68.228.41.185 (talk) 04:11, 14 February 2013 (UTC)

Scalar function of a vector function chain rule
Is this case missing?

$$ h({\mathbf{x}}) = g({\mathbf{f}}({\mathbf{x}}))$$

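The chain rule can be obtained (using the chain rule for functions of several variables and the numerator layout) as

$$\frac{\partial h}{\partial \mathbf{x}} = \frac{\partial g}{\partial \mathbf{f}}\frac{\partial \mathbf{f}}{\partial \mathbf{x}}$$

Can I add this to the main page?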
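A quick numerical sanity check of this case, $$h(\mathbf{x}) = g(\mathbf{f}(\mathbf{x}))$$ with the numerator-layout rule that the 1×n row $$\partial h/\partial \mathbf{x}$$ equals the 1×m row $$\partial g/\partial \mathbf{f}$$ times the m×n Jacobian $$\partial \mathbf{f}/\partial \mathbf{x}$$. This is only a sketch: the functions `f` and `g`, the test point, and the helper `jacobian` are invented for the check and come from no source.

```python
import math

def jacobian(f, x, eps=1e-6):
    """Central-difference Jacobian in numerator layout:
    row i is output component i, column j is input component j."""
    n_out = len(f(x))
    J = [[0.0] * len(x) for _ in range(n_out)]
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        fp, fm = f(xp), f(xm)
        for i in range(n_out):
            J[i][j] = (fp[i] - fm[i]) / (2 * eps)
    return J

def f(x):
    # Hypothetical vector function R^2 -> R^3.
    return [x[0] * x[1], x[0] + x[1] ** 2, math.sin(x[0])]

def g(u):
    # Hypothetical scalar function R^3 -> R, returned as a 1-element list.
    return [u[0] * u[1] + u[2] ** 2]

def h_fn(x):
    return g(f(x))

x0 = [0.7, -1.3]
dh_dx = jacobian(h_fn, x0)    # 1 x 2 row
dg_df = jacobian(g, f(x0))    # 1 x 3 row
df_dx = jacobian(f, x0)       # 3 x 2 Jacobian

# Row vector times matrix: (1 x 3)(3 x 2) -> 1 x 2.
product = [[sum(dg_df[0][k] * df_dx[k][j] for k in range(3)) for j in range(2)]]
max_err = max(abs(dh_dx[0][j] - product[0][j]) for j in range(2))
print("max difference:", max_err)
```

The order of the factors matters in numerator layout; in denominator layout both factors are transposed and their order is reversed.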

{\rm tr} is not the same as \operatorname{tr}.
When one writes {\rm tr} in TeX one does not get proper spacing before and after "tr". Thus:
 * $$ a {\rm tr} B \, $$ is coded as   a {\rm tr} B, and
 * $$ a \operatorname{tr} B \, $$ is coded as  a \operatorname{tr} B, and
 * $$ a \operatorname{tr} (B) \, $$ is coded as  a \operatorname{tr} (B).

Writing \operatorname{tr} results in a certain amount of space before and after tr, and there is less space when (round brackets) follow tr than when they don't. The form {\rm tr}, on the other hand, involves no spacing conventions. The form \operatorname{tr} is standard usage and I edited accordingly. Michael Hardy (talk) 21:25, 7 February 2016 (UTC)

Matrix analysis
Not the same topic? Ardomlank (talk) 23:10, 31 March 2016 (UTC)


 * No more than Calculus and Mathematical analysis are. Highly related but not the same. -Apocheir (talk) 18:42, 22 February 2020 (UTC)

Jan R Magnus's criticism
In this edit, some criticism of the notation used in this article was added. However, it seems to be an out-of-date criticism. The first article cited criticizes a version of this Wikipedia page from more than 10 years ago, but the article has been nearly rewritten from scratch since then. The current revision of this page does not use the omega-derivative that Magnus derides (which entails use of an operator called "vec" to turn matrices into vectors). This page, in fact, appears to use what Magnus prefers and calls the alpha-derivative: we call it the numerator layout. The denominator layout is the transpose of Magnus's alpha-derivative. For what it's worth, Magnus's work isn't entirely new to this page: it was referenced in Talk:Matrix_calculus/Archive_2 (although it was in the omega-derivative era, so it got pooh-poohed).

My point is that I've reverted that edit, because it no longer applies. Let me know if I'm wrong, or if there's something from Magnus's articles that isn't included on this page but should be. -Apocheir (talk) 18:40, 22 February 2020 (UTC)


 * I think you made a mistake there. The alpha derivative in "On the concept of matrix derivative" is defined as:



$$ \mathrm{D} \mathbf{F} \left(\mathbf{X}\right) = \frac{\partial \mathrm{vec} \left( \mathbf{F} \left(\mathbf{X}\right)\right)}{\partial \left(\mathrm{vec}\left( \mathbf{X}\right)\right)^{\top}} $$


 * Whereas the omega derivative is used in this article. Here is an example of why the alpha derivative is the easier definition:

$$ \mathbf{F} \left(\mathbf{X}\right) = \mathbf{a}^{\top}\mathbf{X}^{-1} $$



$$ \text{d}\mathbf{F} = - \mathbf{a}^{\top}\mathbf{X}^{-1}\text{d}\mathbf{X} \mathbf{X}^{-1} $$


 * apply vec



$$ \text{d}\mathrm{vec} \left( \mathbf{F}\right) = - \mathrm{vec}\left(\mathbf{a}^{\top}\mathbf{X}^{-1}\text{d}\mathbf{X} \mathbf{X}^{-1}\right) = - \left(\mathbf{X}^{\top-1} \otimes \mathbf{a}^{\top}\mathbf{X}^{-1} \right) \text{d}\mathrm{vec} \left(\mathbf{X}\right) $$


 * and



$$ \text{D} \mathbf{F}\left( \mathbf{X}\right) = - \left(\mathbf{X}^{\top-1} \otimes \mathbf{a}^{\top}\mathbf{X}^{-1} \right) $$


 * To my knowledge there is no such elegant way to do this with the omega derivative. Also think about this in a more complicated context where you have to use the chain/product rule. With the alpha derivative you can just use it as you are used to in the scalar case, but with the omega derivative you quickly run out of options. Matrixcalc (talk) 19:42, 24 February 2020 (UTC)


 * OK, I misunderstood the article. That said, there are some strong opinions in the talk page archives about whether these derivatives involving $$\mathrm{vec}$$ are better expressed using tensor calculus. I'm not an expert on either matrix calculus or tensor calculus, nor am I in a position right now to reevaluate that whole discussion... -Apocheir (talk) 01:54, 17 March 2020 (UTC)
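For anyone who wants to check the vec-based derivation earlier in this thread, here is a small numerical sketch for the 2×2 case. It assumes column-major (column-stacking) vec, and the vector `a_vec`, the test matrix, and the helpers `inv2`, `kron` and `jacobian` are invented for the check and come from no source. It compares a finite-difference Jacobian of vec(F) with respect to vec(X) against the Kronecker-product formula for F(X) = aᵀX⁻¹.

```python
def inv2(X):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = X
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def kron(P, Q):
    """Kronecker product of two matrices given as nested lists."""
    return [[P[i][j] * Q[k][l] for j in range(len(P[0])) for l in range(len(Q[0]))]
            for i in range(len(P)) for k in range(len(Q))]

def jacobian(f, x, eps=1e-6):
    """Central-difference Jacobian: row i = output i, column j = input j."""
    n_out = len(f(x))
    J = [[0.0] * len(x) for _ in range(n_out)]
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        fp, fm = f(xp), f(xm)
        for i in range(n_out):
            J[i][j] = (fp[i] - fm[i]) / (2 * eps)
    return J

a_vec = [2.0, -1.0]  # hypothetical constant vector a

def vec_F(v):
    # v = vec(X) in column-major order: [X00, X10, X01, X11].
    Xi = inv2([[v[0], v[2]], [v[1], v[3]]])
    # F = a^T X^{-1} is 1 x 2, so vec(F) is just its two entries.
    return [a_vec[0] * Xi[0][i] + a_vec[1] * Xi[1][i] for i in range(2)]

v0 = [3.0, 2.0, 1.0, 4.0]                      # X = [[3, 1], [2, 4]]
Xi = inv2([[v0[0], v0[2]], [v0[1], v0[3]]])
Xi_T = [[Xi[j][i] for j in range(2)] for i in range(2)]
row = [[a_vec[0] * Xi[0][l] + a_vec[1] * Xi[1][l] for l in range(2)]]  # a^T X^{-1}

DF = [[-e for e in r] for r in kron(Xi_T, row)]   # -(X^{-T} kron a^T X^{-1}), 2 x 4
numeric = jacobian(vec_F, v0)                     # 2 x 4
max_err = max(abs(DF[i][j] - numeric[i][j]) for i in range(2) for j in range(4))
print("max difference:", max_err)
```

This only confirms the formula at one test point; it says nothing about which of the two definitions is preferable.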

Example says "numerator layout" but is actually "denominator layout"?
At each of the following spots, there's a scalar-by-matrix example labeled "numerator layout," but the examples aren't using the same layout, so one of them must be wrong (the first I think?):

https://en.wikipedia.org/wiki/Matrix_calculus#Scalar-by-matrix

https://en.wikipedia.org/wiki/Matrix_calculus#Numerator-layout_notation — Preceding unsigned comment added by 98.163.18.220 (talk) 13:42, 21 March 2020 (UTC)

[edited to fix second link]

Section on "differential-form first" technique could use some clarification re numerator vs denominator layout
At the start of this section on the "differential-form first" technique, the reader is warned: "It is often easier to work in differential form and then convert back to normal derivatives. This only works well using the numerator layout."

However, a helpful example a bit further up the page uses this "differential-form first" technique to get an answer for the denominator layout (according to the table just below it, anyway). Is there something wrong with the example? Or is the "differential-form first" technique actually fine for the denominator layout, too? Or should the aforementioned warning really say that the technique "only works well using the denominator layout"?

Perhaps someone with expertise in this area (not I, alas) could reconcile this contradiction. — Preceding unsigned comment added by 98.163.18.220 (talk) 14:26, 21 March 2020 (UTC)

Follow-up: I now think that the "problem" is that the example in question tacitly performs a transpose in the final step, which "converts" the answer to denominator layout. Since the section on the "differential-form first" technique warns that the technique "only works well using the numerator layout," then maybe the best course here would be to give the answer in the example in numerator layout instead, and to explain that one would just transpose the solution to convert it to denominator layout. (That's assuming I'm correct.) — Preceding unsigned comment added by 98.163.18.220 (talk) 15:37, 21 March 2020 (UTC)
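To illustrate the transpose step described in the follow-up, here is a standard worked example. It is chosen only for illustration and is not necessarily the example in the article. Take $$f(\mathbf{x}) = \mathbf{x}^{\rm T}\mathbf{A}\mathbf{x}$$ with constant $$\mathbf{A}$$. Working in differential form:

$$ df = (d\mathbf{x})^{\rm T}\mathbf{A}\mathbf{x} + \mathbf{x}^{\rm T}\mathbf{A}\,d\mathbf{x} = \mathbf{x}^{\rm T}\left(\mathbf{A} + \mathbf{A}^{\rm T}\right)d\mathbf{x} $$

Reading off the coefficient of $$d\mathbf{x}$$ gives the numerator-layout result directly, and a final transpose converts it to denominator layout:

$$ \frac{\partial f}{\partial \mathbf{x}} = \mathbf{x}^{\rm T}\left(\mathbf{A} + \mathbf{A}^{\rm T}\right) \quad \text{(numerator layout)}, \qquad \frac{\partial f}{\partial \mathbf{x}} = \left(\mathbf{A} + \mathbf{A}^{\rm T}\right)\mathbf{x} \quad \text{(denominator layout)} $$

If the article's example ends with such a transpose, that would be consistent with the reading above: the technique itself produces the numerator-layout form, and the denominator-layout answer is obtained by transposing afterwards.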

What is the meaning of $$\mathbf{U} \circ \mathbf{V}$$
In Matrix-by-scalar identities, there is a row:


{|class="wikitable" style="text-align: center;"
|+ Identities: matrix-by-scalar $$\frac{\partial \mathbf{Y}}{\partial x}$$
! scope="col" width="175" | Condition !! scope="col" width="100" | Expression !! scope="col" width="100" | Numerator layout, i.e. by Y
|-
| U = U(x), V = V(x) || $$\frac{\partial (\mathbf{U} \circ \mathbf{V})}{\partial x} =$$ || $$\mathbf{U} \circ \frac{\partial \mathbf{V}}{\partial x} + \frac{\partial \mathbf{U}}{\partial x} \circ \mathbf{V}$$
|}

If U = U(x) is a function of x, what is the meaning of $$\mathbf{U} \circ \mathbf{V}$$? 78.91.103.181 (talk) 10:57, 6 August 2021 (UTC)
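A possible reading, offered as an assumption rather than a confirmed answer: $$\circ$$ here denotes the Hadamard (element-wise) product of two matrices of the same shape, so the row is the ordinary product rule applied entry by entry. The sketch below checks that reading numerically. The matrix functions `U` and `V`, their derivatives, and the test point are invented for the check and come from no source.

```python
import math

# Hypothetical 2x2 matrix-valued functions of a scalar x and their derivatives.
def U(x):
    return [[x, x ** 2], [math.sin(x), 1.0]]

def dU(x):
    return [[1.0, 2 * x], [math.cos(x), 0.0]]

def V(x):
    return [[math.cos(x), x], [2.0, math.exp(x)]]

def dV(x):
    return [[-math.sin(x), 1.0], [0.0, math.exp(x)]]

def hadamard(P, Q):
    """Element-wise product of two same-shaped matrices."""
    return [[p * q for p, q in zip(pr, qr)] for pr, qr in zip(P, Q)]

def add(P, Q):
    return [[p + q for p, q in zip(pr, qr)] for pr, qr in zip(P, Q)]

x0, eps = 0.4, 1e-6

# Central difference of U(x) o V(x) with respect to the scalar x.
plus = hadamard(U(x0 + eps), V(x0 + eps))
minus = hadamard(U(x0 - eps), V(x0 - eps))
numeric = [[(p - m) / (2 * eps) for p, m in zip(pr, mr)] for pr, mr in zip(plus, minus)]

# Right-hand side of the identity: U o dV/dx + dU/dx o V.
rule = add(hadamard(U(x0), dV(x0)), hadamard(dU(x0), V(x0)))

max_err = max(abs(n - r) for nr, rr in zip(numeric, rule) for n, r in zip(nr, rr))
print("max difference:", max_err)
```

Under this reading, "U = U(x)" just means that every entry of the matrix U is a function of the scalar x.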