Talk:Chain rule

Initial comments
This page needs a proof and more rigorous maths.

The multivariable situation should be added: if u is a function of x and y, and both x and y are functions of t, then

$$ \frac{du}{dt} = \frac{\partial u}{\partial x} \frac{dx}{dt} + \frac{\partial u}{\partial y} \frac{dy}{dt} $$
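A quick symbolic check of this formula (an illustration of my own, with example functions u = x·y, x = cos t, y = sin t, not taken from the article) can be done with SymPy:

```python
# Sketch (my own example): verify du/dt = (∂u/∂x)(dx/dt) + (∂u/∂y)(dy/dt)
# for u = x*y, x = cos(t), y = sin(t).
import sympy as sp

t, xs, ys = sp.symbols('t xs ys')
x = sp.cos(t)
y = sp.sin(t)
u = xs * ys  # u as a function of the two intermediate variables

# Right-hand side: partial derivatives times the time derivatives
rhs = (sp.diff(u, xs) * sp.diff(x, t) + sp.diff(u, ys) * sp.diff(y, t)).subs({xs: x, ys: y})

# Left-hand side: differentiate the composite u(x(t), y(t)) directly
lhs = sp.diff(u.subs({xs: x, ys: y}), t)

print(sp.simplify(lhs - rhs))  # 0
```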

Example 1
I don't see how Ex 1 is "a typical chain rule application." The calculation doesn't seem to involve differentiation, which I would argue is the typical chain rule application. --flatfish89 (talk) 18:24, 24 April 2010 (UTC)

New proof
This discussion is outdated. Suggesting deletion. AurelienLourot (talk) 19:30, 30 December 2014 (UTC)


 * Should archive it. Rawsar6 (talk) 19:33, 23 January 2024 (UTC)

I've replaced the current proof with one I saw somewhere. It relies on nothing but the definition of a derivative and the concept of limits. I believe it is quite rigorous and more formal than the previous one, but if there are any flaws in it feel free to point them out or even restore the old one.

Also, if you think it is too long or verbose then correct it.

Someone42 13:43, 25 May 2005 (UTC)


 * Isn't that really just the same proof, though? Written out longer. I don't think it is different from the point of view of rigour. Charles Matthews 14:06, 25 May 2005 (UTC)


 * I just reverted the thing. Someone42, this is not a math paper, not a math exam, not a math book. This is a general purpose encyclopedia.


 * I appreciate the many hours you put in that proof, I appreciate your knowledge of mathematics, and your desire for rigor.


 * However, nobody will read that proof, even mathematicians will gloss over it.


 * According to WikiProject Mathematics/Proofs, proofs are discouraged on Wikipedia to start with, and long formal proofs especially. This has been discussed over and over again, and this seems to be the general view over here.


 * If anything, Wikipedia needs less rigor, not more. This is an encyclopedia for the general public, and mathematicians make up less than 10% of its audience, if not less. Oleg Alexandrov 15:25, 25 May 2005 (UTC)


 * Oh well, in retrospect I think I did go a little overboard. Someone42 09:25, 26 May 2005 (UTC)


 * I can't really agree about 'rigour'. Anything less than a rigorous proof is just an argument/derivation/bootstrapping. However that is not really the issue here. And I agree with Oleg that the need for proofs is not so high. Charles Matthews 15:29, 25 May 2005 (UTC)


 * OK, I did not mean that rigorous proofs are bad overall. I am a mathematician too, and I hope I know the value of proofs. I was trying to say that on Wikipedia we can get away with something less than full-blown proofs. So I agree that a proof is not a proof unless it is rigorous. Oleg Alexandrov 22:37, 25 May 2005 (UTC)


 * On the other hand, who but a mathematician (or at least a maths student) would want to know about something as particular as the chain rule? I think Wikipedia is a good maths textbook and dumbing it down so the general public can understand won't add any value... 203.97.255.167 08:37, 22 May 2006 (UTC) (edit: having looked at that proof though, I have to agree that it was a bit overboard and didn't really add any rigour) 203.97.255.167 08:39, 22 May 2006 (UTC)

In addition, this proof relies on f(g(x+deltax))-f(g(x))=f(g'(x)), which it does not. f(g'(x))=f(g(x+deltax)-g(x)).  Additionally, the chain rule is not f'(g(x))=f'(g(x))g'(x), as dividing by f'(g(x)) would give you 1=g'(x), which isn't always true.  The chain rule is f'(g(x))=f(g'(x))g'(x)  —Preceding unsigned comment added by 66.169.198.79 (talk) 06:08, 3 October 2009 (UTC)
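For what it's worth, the standard statement is (f ∘ g)'(x) = f'(g(x))·g'(x); a quick numerical check with example functions of my own (f(u) = u³, g = sin, not from the article) bears this out and rules out the variant suggested above:

```python
# Numerical check (my own example) comparing the standard chain rule
# (f∘g)'(x) = f'(g(x))·g'(x) with the variant f(g'(x))·g'(x) proposed above.
import math

f = lambda u: u**3
fp = lambda u: 3 * u**2          # f'
g = math.sin
gp = math.cos                    # g'

x, h = 0.7, 1e-6
numeric = (f(g(x + h)) - f(g(x - h))) / (2 * h)  # central difference for (f∘g)'

standard_form = fp(g(x)) * gp(x)  # f'(g(x))·g'(x)
variant_form = f(gp(x)) * gp(x)   # f(g'(x))·g'(x)

print(abs(numeric - standard_form) < 1e-6)  # True: matches
print(abs(numeric - variant_form) < 1e-6)   # False: does not match
```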

Basic idea
I think the basic idea behind the chain rule is getting swamped in a sea of details and special cases. The basic idea of the chain rule is pretty easy to explain in words:

The best linear approximation of the composition is the composition of the best linear approximations.

Every special case of the chain rule, more or less expresses this fact in various situations, with a wide variety of abstraction or concreteness. But, this is the basic idea, and it would be good if it were made a bit more prominent. Revolver 01:43, 9 October 2005 (UTC)

That is a very nice description of the chain rule. I'm not sure I dare edit "one of the 500 most frequently viewed mathematics articles," but I'd like to see the above sentence appearing prominently in the article. David Bulger (talk) 03:56, 5 July 2010 (UTC)
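Revolver's slogan can be made concrete in a few lines (a sketch of my own, with example functions f = sin, g = exp and base point a = 0.3): composing the tangent-line approximations of g at a and of f at g(a) yields an affine map whose slope is exactly the chain-rule derivative.

```python
# Sketch (my own example: f = sin, g = exp, a = 0.3) of the slogan above:
# compose the best linear (tangent) approximations of g at a and f at g(a);
# the result is affine with slope exactly f'(g(a))·g'(a).
import math

f, fp = math.sin, math.cos   # f and f'
g, gp = math.exp, math.exp   # g and g' (g' = g for exp)

a = 0.3
Lg = lambda x: g(a) + gp(a) * (x - a)           # best linear approx of g at a
Lf = lambda u: f(g(a)) + fp(g(a)) * (u - g(a))  # best linear approx of f at g(a)

slope = Lf(Lg(a + 1.0)) - Lf(Lg(a))  # slope of the affine composition
print(abs(slope - fp(g(a)) * gp(a)) < 1e-9)  # True
```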

Mix-up??
If I'm not completely mistaken, the definition mixes up f and $$f \circ g$$. Also, x isn't some kind of magic symbol, so $$f \circ g = f(g(x))$$ doesn't make sense either. It says that the composition of f and g (which is a function) is equal to a certain value of that function, namely that at x. I think the whole thing should read


 * In algebraic terms, the chain rule (of one variable) states: given a function f that is differentiable at g(x) and a function g that is differentiable at x, the composite $$h(x) = f(g(x))$$ (or, shorter, $$h = f \circ g$$) is differentiable at x and

$$ \frac{d}{dx} h(x) = \frac{d}{dx} f(g(x)) = f'(g(x)) \; g'(x). $$

Or leave out h completely and just write:

$$ \frac{d}{dx} f(g(x)) = f'(g(x)) \; g'(x). $$

--K. Sperling (talk) 00:37, 12 November 2005 (UTC)


 * I agree with both points; the current definition is both confusing and (to my understanding) informal. Ariel brunner (talk) 10:18, 14 July 2013 (UTC)

Chain rule for several variables
I would like to see an extensive expansion of the chain rule for several variables. Thanks, Silly rabbit 06:00, 15 November 2005 (UTC)


 * What kind of information would you like to see added? I might try to do this. --Monguin61 03:56, 10 December 2005 (UTC)

Is the notation in the section about the chain rule of higher dimensions really correct? For example $$D(f \circ g) = Df \circ Dg$$ and $$ J_{\mathbf{a}}(f \circ g) = J_{g(\mathbf{a})}(f)J_{\mathbf{a}}(g)$$, among others. To me it seems wrong, or at least misleading. In my opinion it should be $$D(f \circ g) = (Df \circ g)Dg$$ and $$J_{\mathbf{a}}(f \circ g) = (J_{g(\mathbf{a})}(f)\circ g)J_{\mathbf{a}}(g)$$ using the notation style from the original author. I personally would drop the subscripts as well in this case:$$J(f \circ g) = (J(f)\circ g)J(g)$$. --Vilietha (talk) 05:52, 12 June 2012 (UTC)
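Vilietha's corrected form can be checked symbolically. Here is a small SymPy sketch (example maps f, g: R² → R² of my own choosing) of the identity J_a(f∘g) = J_{g(a)}(f)·J_a(g):

```python
# SymPy check (my own example maps f, g: R^2 → R^2) that the Jacobian of a
# composite is the matrix product of Jacobians, the outer one evaluated at g.
import sympy as sp

x, y, u, v = sp.symbols('x y u v')

g = sp.Matrix([x**2 + y, sp.sin(y)])  # inner map g(x, y)
f = sp.Matrix([u * v, u + v**2])      # outer map f(u, v)

Jg = g.jacobian([x, y])
Jf = f.jacobian([u, v])

comp = f.subs({u: g[0], v: g[1]})     # f∘g, written out explicitly
J_comp = comp.jacobian([x, y])        # Jacobian of the composite

Jf_at_g = Jf.subs({u: g[0], v: g[1]})  # J(f) evaluated at g(x, y)
print(sp.simplify(J_comp - Jf_at_g * Jg))  # Matrix([[0, 0], [0, 0]])
```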

Derivative of composite function
Since the "composition of two functions" is technically a "composite function", would its derivative be called a "composite derivative"? Also, does the secondary (inside) derivative have a special name (something like "harmonic derivative"——though I think that term means something else)? ~Kaimbridge ~20:15, 3 February 2006 (UTC)
 * I did not hear this terminology before. Oleg Alexandrov (talk) 00:27, 4 February 2006 (UTC)
 * What, "composite function"? I believe that is the legitimate term.  See Thesaurus.Maths, PinkMonkey, ThinkQuest.  I just wanted to know if its derivative had a special name, and if the inside derivative had a special name. ~Kaimbridge ~14:48, 4 February 2006 (UTC)
 * I never heard of composite derivative or harmonic derivative. Oleg Alexandrov (talk) 17:30, 4 February 2006 (UTC)
 * No, the only term that I know is valid is "composite function". I'm asking if the derivative of a composite function would be referred to as a "composite derivative" (and, if not, does it have a special, unique name——other than just "derivative of a composite function").  Likewise, does the inside derivative (e.g., for $$\frac{d}{dx} f(g(x)),\; g'(x)$$) have a special name——"harmonic derivative" is just some reasonable sounding possibility that I used as an example, not that I'm in any way implying that that is what it is actually called. P=) ~Kaimbridge ~18:12, 4 February 2006 (UTC)
 * That's what I am trying to say. As far as I know, the answer to your questions is "no". I never heard of "composite derivative" or "harmonic derivative". They don't have any special name, either that or any other one, and I don't know why anybody would ever want those things to have any special name. Oleg Alexandrov (talk) 19:41, 4 February 2006 (UTC)

Role of eta in proof
This discussion is outdated. Suggesting deletion. AurelienLourot (talk) 19:33, 30 December 2014 (UTC)

I have no formal training in maths so forgive me if I sound naive. Near the bottom of the proof it says "Observe that as $$\delta\to 0,$$ $$\frac{\alpha_\delta}{\delta}\to g'(x)$$ and $$\frac{\eta(\alpha_\delta)}{\delta}\to 0$$." Would I be correct to think that "$$\frac{\eta(\alpha_\delta)}{\delta}\to 0$$" shows that the "error" (right word?) involved goes to zero as delta goes to zero? 202.180.83.6 03:52, 16 February 2006 (UTC)

Examples 1 and 2
The primes in examples 1 and 2 (where it says f'(x) = ) are very difficult to see; they look exactly like f(x). This may cause a lot of confusion. Is there any way to make the primes in f'(x) stand out?

Exceptions
I think that the exceptions to the rule should be mentioned. For instance, $$\frac{d}{dx}\sqrt{25-x^2} \neq \frac{1}{2\sqrt{x}}$$, as would be suggested by the power rule. The problem is that sqrt{x+a} is just sqrt{x} shifted back, but x+25 still differentiates to 1. This means that according to the power rule, the constant that's added to x has no effect on it, even though the derivative should be $$\frac{-x}{\sqrt{25-x^2}}$$, which doesn't follow directly from the Chain Rule. He Who Is 23:33, 3 June 2006 (UTC)


 * Umm... you are missing the whole point of the chain rule? If $$y=\sqrt{u}$$ and $$u=25-x^2$$, then $$\frac{dy}{dx} = \frac{dy}{du} \frac{du}{dx} = \frac{1}{2\sqrt{u}} (-2x) = \frac{-x}{\sqrt{25-x^2}}$$ --Spoon! 11:15, 31 August 2006 (UTC)
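Spoon!'s computation is easy to confirm with a computer algebra system (a check of my own, not from the article):

```python
# SymPy check (my own) that the chain rule gives
# d/dx sqrt(25 - x**2) = -x / sqrt(25 - x**2).
import sympy as sp

x = sp.symbols('x')
expr = sp.sqrt(25 - x**2)
expected = -x / sp.sqrt(25 - x**2)

print(sp.simplify(sp.diff(expr, x) - expected))  # 0
```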

Chain rule in Probability
All that calculus gives me a headache! I have no idea if and/or how they relate, but perhaps the chain rule pertaining to probability theory deserves a place somewhere on this page? You know, the $$P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1)\,P(A_2 \mid A_1)\,P(A_3 \mid A_1 \cap A_2)\cdots P(A_n \mid \textstyle\bigcap_{i=1}^{n-1} A_i)$$ thing, sorry about the crummy representation. 218.165.75.221 10:04, 30 September 2006 (UTC) M.H.

Uh, how about no. 69.215.17.209 14:41, 22 April 2007 (UTC)

Personally, I would like to see some detail about the Chain Rule for probability theory. I have been looking around the internet, and have not been able to find a discussion of it (ideally a step-by-step example or a detailed proof). So, it would be a good thing if wikipedia included something on it. Should the Chain Rule for probability theory be included on this page, the page for probability theory, or its own page [ex. Chain Rule (Probability Theory)]?? SteelSoul (talk) 17:50, 2 February 2009 (UTC)


 * I am adding this page. CharlesGillingham (talk) 21:41, 17 September 2009 (UTC)
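For readers landing here, the probability identity is easy to illustrate numerically (a dice example of my own, not from any article):

```python
# Illustration (my own example) of P(A1∩A2∩A3) = P(A1)·P(A2|A1)·P(A3|A1∩A2)
# on a fair six-sided die, with A1 = {roll ≥ 2}, A2 = {roll ≥ 3}, A3 = {roll ≥ 5}.
from fractions import Fraction

outcomes = range(1, 7)  # a fair six-sided die
P = lambda event: Fraction(sum(1 for o in outcomes if event(o)), 6)

A1 = lambda o: o >= 2
A2 = lambda o: o >= 3
A3 = lambda o: o >= 5

p1 = P(A1)                                      # P(A1) = 5/6
p12 = P(lambda o: A1(o) and A2(o))              # P(A1 ∩ A2) = 4/6
p123 = P(lambda o: A1(o) and A2(o) and A3(o))   # P(A1 ∩ A2 ∩ A3) = 2/6

chain = p1 * (p12 / p1) * (p123 / p12)          # P(A1)·P(A2|A1)·P(A3|A1∩A2)
print(p123 == chain)  # True: both equal 1/3
```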

(f(g(x)))' is incorrect and confusing
The statement (f o g)'(x) = (f(g(x)))' is incorrect and fundamentally misunderstands the prime (f') notation. The prime is a transformation from functions to functions; as such it should be applied before the variable x is evaluated, as on the LHS but not as on the RHS. 69.215.17.209 14:44, 22 April 2007 (UTC)

I see that this edit was reverted, with the argument that this notation is common enough to be included. I understand that people may sometimes use it (I've never seen it myself, and I challenge anyone to produce examples from a common calculus text), but it is not standard and pedagogically very confusing. It is ambiguous whether the constant f(x) or the function f is being differentiated. I do not think that such poor notation should be perpetuated in an encyclopedia, without evidence that it is at least commonly used. --69.212.231.101 03:52, 26 July 2007 (UTC)


 * Since the chain rule is a fairly basic concept in calculus and most people at that level haven't taken analysis would it be appropriate to add a note explaining the meaning of the composition operator? —Preceding unsigned comment added by 76.199.5.236 (talk) 06:47, 13 January 2008 (UTC)


 * Note: The objection to f(g(x))' refers to an earlier version of the article, and is no longer current.  Silly rabbit (talk) 13:51, 13 January 2008 (UTC)


 * I agree that, in an article about such an elementary concept in calculus, it is inappropriate to use composition operator. CharlesGillingham (talk) 00:21, 16 September 2009 (UTC)
 * Seconded, no need for this operator that's only encountered in 3rd/4th year undergraduate degree courses, this page should be targeting a much less advanced audience. Mirams (talk) 16:27, 14 February 2020 (UTC)

f prime is invisible inline
Perhaps resulting from corrections above, f prime is now invisible when inline (at end of first proof). g prime is visible inline due to different glyph for g not overlapping the prime in the way that it overlaps for f. f prime works ok with display notation instead of inline. Should be a simple fix but I don't know how. 58.175.211.1 (talk) —Preceding undated comment added 15:26, 7 February 2013 (UTC)

Examples
The examples could use some more description, depending on if we're shooting for "definition" or "instructional detail." Substituting U for X^2+1 makes the plug 'n' chug easier, but it's not strictly necessary. Any objections to expanding the current examples to include various applications of the chain rule and a detailed description of how and why substitutions are valid?--Legomancer (talk) 22:45, 8 September 2009 (UTC)

Comment from WPM
For any coordinate (real valued function) y on a line (e.g. the real line) and any point p, denote by dyp the equivalence class of y-y(p)1 (where 1 is the constant function, with value 1) modulo functions vanishing at p to higher order. If y=f(x) (i.e. y = f ∘ x for some other coordinate x on the same line and some f: R → R) then the definition of differentiability of f at x(p) ensures that dyp = f'(x(p)) dxp because f(x)-f(x(p))1 differs from f'(x(p))(x-x(p)1) by a function vanishing at p to higher order.

The chain rule is an immediate consequence. If u = g(y) then, omitting evaluations/subscripts at p, du = g'(y)dy= g'(f(x))f'(x) dx.

Most arguments formalize this basic idea without discussing the conceptual meaning. Geometry guy 00:57, 2 November 2010 (UTC)


 * The original proof that was in the article is close to being a direct formalization of this. Unfortunately, it looks like it had gotten a bit over-edited, since it included a version of the proof for several variables that was not really consistent with this philosophy.  I've put back in what I believe is an easier to read version of the old proof.  I wasn't sure whether and how to address the case of several variables: really the article should have a proper statement of the chain rule of a function between two (finite dimensional) Euclidean spaces.   Sławomir Biały  (talk) 11:22, 3 November 2010 (UTC)

Correctness of first proof?
I was revisiting the first proof, and I've come to the conclusion that I don't think it's correct; at least, not in spirit. I was trying to rewrite it from scratch (which is my normal style), and the best I could do was as follows:


 * One proof of the chain rule begins with the definition of the derivative:
 * $$(f \circ g)'(a) = \lim_{x \to a} \frac{f(g(x)) - f(g(a))}{x - a}.$$
 * Assume for the moment that g(x) does not equal g(a) for any x near a. Then the previous expression is equal to:
 * $$\lim_{x \to a} \frac{f(g(x)) - f(g(a))}{g(x) - g(a)} \cdot \frac{g(x) - g(a)}{x - a}.$$
 * When g oscillates near a, it might happen that no matter how close one gets to a, there is always an even closer x such that g(x) equals g(a). For example, this happens for  near the point .  To work around this, we introduce a function Q as follows:
 * $$Q(y) = \begin{cases} \frac{f(y) - f(g(a))}{y - g(a)}, & y \neq g(a), \\ f'(g(a)), & y = g(a). \end{cases}$$
 * Q is defined wherever f is. Furthermore, because f is differentiable at g(a) by assumption, Q is a continuous function at g(a).  Now we consider the function:
 * $$Q(g(x)) \cdot \frac{g(x) - g(a)}{x - a}.$$
 * Whenever g(x) is not equal to g(a), this product is clearly equal to the difference quotient for f ∘ g at a because the factors of g(x) - g(a) cancel.
 * They do not cancel if x != a but rather they yield the difference between g(x) and g(a) Mo5h (talk)

When g(x) equals g(a), THEN A MIRACLE OCCURS and this product is still equal to the difference quotient. Hence we can compute the derivative of f ∘ g at a by computing the limit as x goes to a of the above function. This limit exists because the above function is a product and the limit as x goes to a of each of its factors exists. Furthermore, because Q is continuous, the limit of the first factor equals f′(g(a)), and by definition the limit of the second factor equals g′(a). This proves the chain rule.

The problem is that when g(x) equals g(a) and x is not a, the miracle doesn't occur; the value of the product is f′(g(a)) times zero, which is zero. If we were to take the limit instead of evaluating, then the miracle would occur, but I then don't know how to prove that the limit computes what we want. If we could split up the product, then the miracle would occur, but then we need to show that the limit of the product exists, and I don't know how to prove that directly. The standard proofs get around this by explicitly measuring error terms; when we approach things that way, we never see the zero product, hence the miracle occurs. The whole reason we have this proof, though, is because it avoids error terms, and if we have to introduce them to make this work then there's no point in keeping this proof. So I'm stuck; I don't see how to fill this gap. In fact, as far as I can tell, since the product is zero this proof is just wrong.

The article presently seems to ignore this difficulty. It glosses over it by introducing Q only at the end and ignoring the need for a miracle. But as far as I can tell, it has exactly the same problem. Am I missing something? Or what? Ozob (talk) 04:32, 11 December 2010 (UTC)


 * The proof can be made valid, but as your experience shows, only at a considerable cost in complexity and comprehensibility. There used to be a single proof, which had been there since 2004, and which I corrected on 11 November 2008. This remained there until 31 October 2010, when it was replaced by the current first proof. On 2 November 2010 a form of the original proof was added back as "second proof", and on 16 November I repeated the correction to it which I'd made two years previously. The only possible advantage to the first proof is that it's somehow more intuitive; the trouble is, to make it sound it is necessary to complicate it so much that this advantage vanishes. The solution now is to delete the "first proof" and then we can all go round the loop again. SamuelTheGhost (talk) 23:38, 11 December 2010 (UTC)


 * OK, I figured it out! When I need a miracle to happen, the difference quotient is equal to zero (which in retrospect is obvious), so the miracle happens!


 * I'm putting a (revised version) of the above proof into the article as the first proof. Ozob (talk) 20:30, 12 December 2010 (UTC)
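As a side note, the Q construction in the proof above can be sanity-checked numerically (a sketch of my own, with example functions f(y) = y² and g(x) = sin x, which are not the oscillating edge case):

```python
# Numerical sanity check (my own sketch, with f(y) = y**2 and g(x) = sin(x))
# of the Q construction: Q(g(x))·(g(x) - g(a))/(x - a) reproduces the
# difference quotient of f∘g, and tends to f'(g(a))·g'(a) as x → a.
import math

f = lambda y: y**2
fprime = lambda y: 2 * y
g = math.sin
gprime = math.cos
a = 0.5

def Q(y):
    ga = g(a)
    if y != ga:
        return (f(y) - f(ga)) / (y - ga)
    return fprime(ga)  # continuous extension at y = g(a)

x = 0.5001
product = Q(g(x)) * (g(x) - g(a)) / (x - a)
diff_quotient = (f(g(x)) - f(g(a))) / (x - a)

print(abs(product - diff_quotient) < 1e-9)             # True: the factors cancel
print(abs(product - fprime(g(a)) * gprime(a)) < 1e-3)  # True: close to the limit
```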

Error in 'First example'
In the example "Suppose that a skydiver ..." the formula g(t) = 4000 − 9.8t² should be replaced with g(t) = 4000 − ½·9.8t², shouldn't it? 2.36.204.64 (talk) 22:21, 19 January 2011 (UTC)
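A quick check of the proposed correction (my own sketch): a body falling from rest covers ½·9.8·t² metres after t seconds, so the height function and its rate come out as follows, consistent with the figure g(10) = 3510 quoted in the next thread.

```python
# Quick check (mine) of the proposed correction: free fall from rest covers
# (1/2)·9.8·t² metres in t seconds, so g(t) = 4000 - 0.5·9.8·t².
def height(t):
    return 4000 - 0.5 * 9.8 * t**2  # metres above sea level

def height_rate(t):
    return -9.8 * t                 # g'(t), metres per second

print(height(10))       # 3510.0
print(height_rate(10))  # -98.0
```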

Suggested changes to 'First example'
1) clarify 2nd bullet from "...rate of change in atmospheric pressure at height..." to ...rate of change in atmospheric pressure w.r.t. h, at height...

2) clarify 4th bullet from "...rate of change in atmospheric pressure t seconds after..." to ...rate of change in atmospheric pressure w.r.t. t, t seconds after...

3) the bottom paragraph that starts "It is not true..." is misleading and includes an error. I would end it with the sentence "This need not have anything to do with the buoyant force ten seconds after the skydiver's jump." and start a new paragraph just below that states the following: It is true that (f o g)'(t) = f'(h) * g'(t).  To find the buoyant force w.r.t. t ten seconds after his jump, we must evaluate g(10), his height ten seconds after he jumps, and substitute the result into f'(h).  g(10) is 3510 meters above sea level, so the true buoyant force w.r.t. t ten seconds after the jump is (proportional to) f'(3510) * g'(10) = 7.133 * -98 = -699.

This example demonstrates the Chain Rule as the product of two rates. The last sentence that states "g(10) is 3020 meters above sea level, so the true buoyant force ten seconds after the jump is (proportional to) f'(3020)." is erroneous. To use the Chain Rule you need to multiply f'(g(t)) by g'(t). —Preceding unsigned comment added by 69.117.93.37 (talk) 04:41, 31 January 2011 (UTC)


 * I've changed the article. Ozob (talk) 12:34, 31 January 2011 (UTC)

Evaluation
(Copied from WT:WPM. Ozob (talk) 02:07, 2 March 2011 (UTC))

The article titled chain rule currently says:
 * The chain rule is frequently expressed in Leibniz notation. Suppose that u = g(x) and y = f(u).  Then the chain rule is
 * $$\left.\frac{dy}{dx}\right|_{x=c} = \left.\frac{dy}{du}\right|_{u = g(c)} \cdot \left.\frac{du}{dx}\right|_{x=c}.\, $$
 * This is often abbreviated as
 * $$\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}.$$
 * However, this formula does not specify where each of these derivatives is to be evaluated, which is necessary to make a complete and correct statement of the theorem.

Does this last form really fail to "specify where each of these derivatives is to be evaluated"? It seems to me that the first form above clutters things in such a way as to interfere with understanding, and that the second, read correctly, doesn't really fail to do anything that should be done.

Opinions? Michael Hardy (talk) 23:04, 1 March 2011 (UTC)


 * I'm with you on this one. The sentence isn't really Wikipedia-appropriate, anyway -- at best that's textbook language. CRGreathouse (t | c) 01:47, 2 March 2011 (UTC)


 * Well, since I'm the one who wrote that sentence, I think I should defend it. But I'm going to do so on Talk:Chain rule, not here. Ozob (talk) 01:49, 2 March 2011 (UTC)


 * OK, here's my defense. Yes, that last form really does fail to specify where the derivatives are to be evaluated. That's obvious because it leaves the evaluations out. I think your real objection is: Does anyone really need to specify where the derivatives are to be evaluated, or is it always safe to leave them out and let them be implicitly understood? I'm going to proceed assuming that this is your real objection.


 * I don't think it is. For a student first learning about the chain rule, the relationships between y, x, u, f, and g will not all be clear. While we don't intend the article to be a textbook treatment of the subject, we should target a very low-level audience—which includes students learning about the chain rule for the first time. Because of that I don't think we can assume that our audience will be able to infer anything about where the derivatives should be evaluated. In particular, I'm worried that they won't be able to guess that dy/du should be evaluated at g(c). I think if you were to ask most students, you'd probably get back nonsense, like saying that it should be evaluated at x or at u. I think it is much better for the article to spell out all the details. I admit that the article already does this when it gives the formula f′(g(x))g′(x); but I think that it is still a good idea to give a full and correct statement in the Leibniz notation, too.


 * I'm not particularly tied to those words, though. If they sound overly textbook-ish, then they ought to be changed. Maybe it would be good to just leave out that last part and stop after the second displayed equation? Ozob (talk) 02:07, 2 March 2011 (UTC)

If y = g(u) and u = f(x), then the point at which to evaluate dy/du is u and the point at which to evaluate du/dx is x. That seems obvious. The extra notation will be confusing. Michael Hardy (talk) 04:32, 2 March 2011 (UTC)


 * Both formulas should be included. The short one has its merits as a good mnemonic device, and the former is indispensable for properly understanding the formula, as Ozob suggested.  Tkuvho (talk) 05:55, 2 March 2011 (UTC)


 * My objection is that u and x are variables and, if you're being careful, it doesn't make sense to evaluate anything at them. The short form of the Leibniz notation chain rule is the equivalent of the statement (f ∘ g)′ = f′g′. If you were to teach this to most students, they'd believe that (f ∘ g)′(x) = f′(x)g′(x), which is wrong. It's wrong because f′ is a function of u and should be evaluated at u = g(c), just like the first displayed equation shows. Someone with experience can deduce the right place to evaluate f′ by looking at its domain, but I don't think our target audience can. Ozob (talk) 11:58, 2 March 2011 (UTC)
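The mistaken reading Ozob describes is easy to exhibit numerically (a counterexample of my own, with f(u) = u² and g(x) = x + 1 at x = 0):

```python
# A concrete counterexample (my own) to reading the rule as (f∘g)'(x) = f'(x)·g'(x):
# with f(u) = u**2 and g(x) = x + 1 at x = 0, only f'(g(x))·g'(x) matches.
f = lambda u: u**2
fp = lambda u: 2 * u
g = lambda x: x + 1
gp = lambda x: 1.0

x, h = 0.0, 1e-6
numeric = (f(g(x + h)) - f(g(x - h))) / (2 * h)  # central difference for (f∘g)'(0)

print(abs(numeric - fp(g(x)) * gp(x)) < 1e-6)  # True:  f'(g(0))·g'(0) = 2
print(abs(numeric - fp(x) * gp(x)) < 1e-6)     # False: f'(0)·g'(0) = 0
```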

I disagree: it does make sense to evaluate a function at a variable.
 * $$ \text{If } y = f(u)\text{ and } u = g(x) \text{ then }(f \circ g)'(x) = f'(u)\cdot g'(x). \, $$

There you have evaluation of a function at the variable x and evaluation of a function at the variable u. Are students really going to mistakenly assume I mean the pointwise product of ƒ and g if I write it that way? I don't think so.

And if I write
 * $$ f'(g(x)) \cdot g'(x), \, $$

is that not also "evaluation of a function at a variable"? Michael Hardy (talk) 20:14, 2 March 2011 (UTC)
 * Well, I don't really want to discuss the meaning of the word "evaluation", but I disagree (or at least think the point is arguable). Regardless, the analog of (f ∘ g)′(x) = f′(u)g′(x) in Leibniz notation would be:
 * $$\frac{dy}{dx}(x) = \frac{dy}{du}(u)\frac{du}{dx}(x),$$
 * or maybe
 * $$\left.\frac{dy}{dx}\right|_x = \left.\frac{dy}{du}\right|_u \left.\frac{du}{dx}\right|_x.$$
 * As I said above, dy/dx = (dy/du)(du/dx) is analogous to (f ∘ g)′ = f′g′, and I don't think either of those are clear to the novice. Ozob (talk) 02:06, 3 March 2011 (UTC)

To write
 * $$ \left.\frac{dy}{du}\right|_u $$

is at best redundant. That u is where it's evaluated is inherent in the meaning of the Leibniz notation. It's hard to see how anyone could mistakenly think otherwise. That's why this whole thing about evaluation is pointless. Michael Hardy (talk) 18:24, 3 March 2011 (UTC)


 * The telegraphic formula dy/dx = dy/du * du/dx is a good mnemonic device but it is too abbreviated to be self-explanatory. The fact that students frequently make the mistake of evaluating the first factor (dy/du) at the wrong point is amply illustrated in this very page, which contains a detailed discussion of such a typical error in the context of a physics example.  I agree that there is a problem with writing $$ \left.\frac{dy}{du}\right|_u $$, but the problem is not that it is redundant, but that it is too telegraphic: it should be u=g(c) or something.  Tkuvho (talk) 18:46, 3 March 2011 (UTC)
 * I suspect I know far more about frequent errors in calculus than does anyone else writing here, and I am not aware that that is a frequent mistake, and find it implausible. Please specify where I can find that error in a physics example.  Writing $$ \left.\frac{dy}{du}\right|_{u=g(c)} $$ amounts to refusing to use the Leibniz notation at all. Michael Hardy (talk) 18:57, 5 March 2011 (UTC)
 * Myself, I don't think I ever claimed it was a frequent mistake. But that does not mean it is not a source of confusion. I remember being confused by the shape of the chain rule when I first learned calculus: I wondered why it was so asymmetric, with f and g playing such apparently different roles.
 * I do not want to be pedantic and insist that everyone put evaluation bars everywhere at every use of the chain rule. But I do want the chain rule correctly and fully stated in Leibniz notation at least once, and that's impossible without evaluation bars. Ozob (talk) 20:43, 5 March 2011 (UTC)


 * I agree. Hardy's deletion of this material does not reflect a consensus here and should be reverted.  Tkuvho (talk) 07:46, 6 March 2011 (UTC)
 * I've put it back in but reorganized it somewhat. I'll be back for more later...... Michael Hardy (talk) 17:58, 6 March 2011 (UTC)


 * Thanks. Tkuvho (talk) 12:57, 7 March 2011 (UTC)


 * I have edited the section a little. I have a specific objection to the sentence beginning "as always": While that is usually what's done, it is entirely possible to evaluate these functions at some other value. It may even be useful.  So I've taken that sentence out.  I am pretty happy with how the section is now.  Ozob (talk) 22:54, 7 March 2011 (UTC)

Multivariate chain rule
I might misunderstand the notations, but why is the multivariate chain rule written as:


 * $$D_{\mathbf{a}}(f \circ g) = D_{g(\mathbf{a})}f \circ D_{\mathbf{a}}g$$

Why is there a composition on the RHS, and not a product of derivatives, as in the univariate case? Is it to be understood as a matrix operation, in which case composition corresponds to a product, and in that case, shouldn't this be explicitly signaled? Donvinzk (talk) 11:34, 4 June 2011 (UTC)


 * They are meant to be understood as linear transformations. The composite is the usual composite of functions.  By choosing a basis you can construct Jacobian matrices, and the composite of linear transformations corresponds to the product of Jacobian matrices.  I've revised the article so that it explains this in more detail.  Is it clearer now? Ozob (talk) 21:02, 4 June 2011 (UTC)

Further generalizations
[...]

This is exactly the formula D(f ∘ g) = Df ∘ g + Dg ∘ f.

Shouldn't this also be D(f o g) = Df o Dg ?

Second Proof
I have doubts about the comment that the second proof does not need a theorem about products of limits. In the intermediate step, we need to consider the product of $$\varepsilon(h)$$ and $$\eta(k_h)$$, which is equivalent to [Q-f'(g(a))]*{[g(x)-g(a)]/(x-a) - g'(a)}. My thought is that both proofs rely on the same theorem. However, I would like to get comments from more experienced editors before changing anything in the article.202.130.125.147 (talk) 09:04, 14 September 2013 (UTC)


 * Nice catch! You're entirely correct.  I've updated the article.  Ozob (talk) 14:14, 14 September 2013 (UTC)

Notational consistency/rigor
I understand the need to make the article accessible for those who have not had extensive math education, but at the same time there are massive inconsistencies (and outright errors) in notation on this page that really offend the sensibilities of anyone who has studied math, and are bound to cause confusion if people take certain bits of the notation as they are formally stated. There's no need for a dichotomy between formal accuracy and lucidity, and I'd argue that beginning students in calculus are poorly served by a resource which sacrifices accuracy for naive intuition.

For starters, we really ought to remove all ambiguity between f ∘ g and f; when you mean the former, write the former, and when you mean the latter, write the latter. These are conflated all over the place (the "higher order derivatives" section, for one, in which none of the stated formulas are formally true), and it is incorrect and confusing.

Additionally, any use of Leibniz' notation where the argument is inside the differential is abusive; it's a mild abuse of notation, granted, but it really should be avoided where it can be.

Thoughts?

129.2.129.149 (talk) 13:44, 25 October 2013 (UTC)


 * For perfect rigor, one should consult a mathematical analysis textbook rather than a general purpose encyclopedia. Recent edits have introduced the rather bizarre variant of Leibniz notation $$\frac{df}{dx}(x)$$ which is unlikely to be familiar to most readers.  The section on the quotient rule is now less clear than it was before, partly due to the awkward notation and partly due to the misguided attempt to make it "rigorous".  I have similar objections to the changes to the lead, although I would not revert that but instead prefer to write the chain rule in the more usual Leibniz notation $$\frac{dy}{dx}=\frac{dy}{du}\frac{du}{dx}$$.  It may also be appropriate to include the statement in Newton's notation as well, but mixing the two notations in the same expression is a mistake.  Some people may feel that Leibniz notation is an abuse, but it is still a very standard and familiar notation.  Wikipedia is an encyclopedia, not Bourbaki.  We should strive to write things in standard ways; it's not our place to correct such abuses.  Sławomir Biały  (talk) 17:07, 25 October 2013 (UTC)


 * That's not a "bizarre variant" at all, though. Writing df(x)/dx is meaningless, because f(x) is not a function.  You cannot differentiate a fixed value; it is the same error as writing f(x)' instead of f'(x), which has been discussed on this page before.  I agree that the quotient rule section is ugly with Leibniz' notation, which is why we'd probably do better to simply write all of it in Lagrange's notation, which does not suffer that problem. 129.2.129.149 (talk) 17:39, 25 October 2013 (UTC)


 * The formula in the lead as it stands is, simply put, wrong. The Leibniz formula as stated by Sławomir is standard and correct, and has a natural interpretation with variables being implicit functions of a single global parameter.  Using the same name for a "variable that depends on a global parameter" and for "a function that takes an explicit parameter" (as in f = f(x)) leads to serious problems.  Thus, $df⁄dx$(x) runs afoul of this, since $df⁄dx$, which is implicitly a function of some understood parameter, is now being abused, with x being two unrelated symbols in the same expression, as well as inventing the interpretation "derive, from the variable dependent on the global parameter, a function that is independent of the global parameter by substituting x for the global parameter".  Incidentally, this construction is undefined, since the "global parameter" is pretty arbitrary: f is a different function of the global parameter, depending on the choice of parameter.  — Quondum 18:08, 25 October 2013 (UTC)


 * The "dx" in df/dx is not a variable. It is there largely as historical accident, and plays no role at all in the formal definition of the notation (namely, the limit expression of a derivative).  It makes no sense, in the language of formal mathematics, to speak of "a function of a variable."  A function is simply a set of 2-tuples, and when you give it an argument it ceases to be a function.  (df/dx)(x) is the derivative of f (which is a function) evaluated at x (which is a value in the domain of f, and thus in the domain of df/dx), that is to say, it is a value in the image of df/dx.  That is all.  df/dx has no formal tie to 'x', nor does df/dx mean f'(x). 129.2.129.149 (talk) 19:41, 25 October 2013 (UTC)


 * Wrong. The "dx" in the expression df/dx does indeed refer to the variable of differentiation in the Leibniz notation.   Sławomir Biały  (talk) 20:19, 25 October 2013 (UTC)


 * Formally define "df/dx" given f:R->R, f differentiable. Please. 71.163.32.84 (talk) 21:23, 25 October 2013 (UTC)


 * It's the derivative of f with respect to the variable x. Since this seems to be an unfamiliar notion to you, may I suggest that you refer to a textbook on calculus?  For instance, the famous textbook on the subject by Richard Courant and Fritz John has the following example:
 * $$\frac{d\sqrt{x}}{dx} = \lim_{h\to 0}\frac{\sqrt{x+h}-\sqrt{x}}{h}$$
 * etc.  Sławomir Biały  (talk) 21:56, 25 October 2013 (UTC)


 * Cut the sarcasm; I said "formally." The word "derivative" is not meaningful absent a definition, especially when the issue in question is precisely the specifics of the notation.  Given the standard limit definition of a derivative, what you have claimed is provably false.  Mathematics is a formal system; things mean only what you define them to mean, and if your definitions are inconsistent then you have done it wrong.  Give a suitable limit expression for 'f', or something similar.  Note that sqrt(x) is not a function, so what you have quoted does not answer my question. 71.163.32.84  (talk) 22:00, 25 October 2013 (UTC)


 * So Courant and John are provably wrong? The statement "sqrt(x) is not a function" is baffling.  What could this possibly mean?  Also the statement "mathematics is a formal system" reflects only a very narrow view of mathematics that does not encompass most applications of calculus to the sciences.  Presumably this narrow view is also the source of your misapprehensions about this article.   Sławomir Biały  (talk) 22:22, 25 October 2013 (UTC)


 * If x is a real number, sqrt(x) is also a real number. Functions are not real numbers; if f:R->R, then an element in the image of f is *not* f.  If f(x) = sqrt(x), then f(x) is *not* a function; rather, f is a function, and f(x) is the specific value in the image of f (in this case, sqrt(x)) mapped to by x.  This is a crucial distinction in mathematics.  What is written in that textbook is an abuse of notation, and leads to further errors such as writing the convolution of some function f with some other function g evaluated at x as f(x) * g(x) instead of (f*g)(x).  Admittedly, this abuse of notation alone, when dealing only with differentiation of real numbers, is not likely to cause any errors; thus, its inclusion in a calculus textbook is not such a bad thing, but it is formally incorrect. 71.163.32.84 (talk) 22:48, 25 October 2013 (UTC)
 * Your error is in thinking that x refers to a particular number. It is not. It is a mathematical variable.   Sławomir Biały  (talk) 01:03, 26 October 2013 (UTC)
 * Are you not familiar with Analysis, or formal math in general? If we say "let x be a real number" (and we must say this if we are evaluating f(x) for some f:R->R), then any claim we make about x must hold if we replace x with a specific real number.  This is the essence of axiomatic, rigorous mathematics.  'x' is a real number, in that it represents any mathematical object which obeys the axioms of the real numbers; if we offer no definition of 'x' other than "let x be a real number," then it necessarily follows that statements which are provably true about 'x' are also provably true about, say, 5, or sqrt(3), or 98/3.  There is no magic "variable" designation that you give to 'x'; you only say that 'x' is an object which obeys the set of properties which define the real numbers. In a purely formal sense, it is vacuous to say something like "f is a function of x," and it is, simply, incorrect to use f(x) as a function and not as the value mapped to by x. 71.163.32.84 (talk) 01:28, 26 October 2013 (UTC)
 * I don't see why my familiarity with analysis or lack thereof could possibly be relevant here. (But since you brought up credentials, I am actually an expert in real analysis and have written research papers on the subject, and taught undergraduate and graduate level courses in it at major universities.  I find it extremely unlikely that you possess similar qualifications.)  The real question is, why should a general purpose encyclopedia article on the chain rule eschew the completely standard notation and approach that is used in essentially all textbooks on calculus in favor of the "axiomatic", "rigorous" perspective that you appear to be pushing?  This is an article on basic calculus, not mathematical analysis.  Certainly the "variables" approach to functions is less favored in modern higher courses on mathematical analysis.  But that does not make it meaningless to say that x and y are real variables, y depends on x, and dy/dx is the derivative of y with respect to x, just because it is awkward to give an axiomatic definition of these statements.   Sławomir Biały  (talk) 02:18, 26 October 2013 (UTC)
 * We'll have to disagree there, then; I do not think there is any good reason to use notation which is not formally consistent when proper notation is only trivially more complicated.
 * At any rate, the current quoted formula in the article lead does need to be changed to something else, because it is most definitely not the chain rule for f∘g (clearly, df/dx is not the derivative of f∘g for any reasonable use of notation). If we wish to nonrigorously use "dy/dx" to refer to the derivative of some expression y (rather than a formally-defined function) with respect to a variable x appearing in the expression, then we should do so explicitly, and if we're going to abuse the notation, we should at least do it in a way that suggests the rigorous interpretation; what is written does not, and is not a good representation of the chain rule by any metric. 71.163.32.84 (talk) 02:37, 26 October 2013 (UTC)


 * (Edit conflict) It is a variant that doesn't seem to appear widely in published mathematical sources. In contrast, the notation you are objecting to is thoroughly standard.  Sławomir Biały  (talk) 18:15, 25 October 2013 (UTC)
 * Perhaps if Newton and Leibniz had gotten along they could have worked together on a single consistent notation instead of leaving us with two incompatible ones. But a worse mistake is trying to Frankenstein them together for the sake of some misguided sense of rigor, for example the expression $$\frac {df\circ g}{dx} = \frac {df}{dx}\circ g \cdot \frac {dg}{dx}.$$ The Leibniz notation only makes sense in terms of a dependent variable and an independent variable while Newton's makes sense with functions. At some level the notion of dependent variable is logically equivalent to a function but that doesn't mean you can mix them together. For example u=2v ∴ u(2)=4 is nonsense, since I could add u=2v, v=x², u=2x², ∴ u(2)=8. There's a modern habit of writing expressions such as $$\frac{df}{dx}$$ but this should be understood as an abuse of notation meaning the rate of change wrt x of the variable defined by the expression f(x). It's ok to abuse notation up to a limit for convenience but the recent changes go beyond this. So I'd go further than SB and say the changes are not only nonstandard but incorrect according to the traditional use of the notation. In Newton's notation the correct formula is (f∘g)'=f'∘g ⋅ g', so I would use that in the lead and explain the Leibniz notation in a separate section. --RDBury (talk) 21:26, 25 October 2013 (UTC)
 * Every Analysis text I've ever seen defines df/dx = f'. This is not abusive at all if it is taken as a formal definition; in fact, any other use of Leibniz notation is an abuse as one cannot differentiate a value, one must differentiate a function.  The axiomatic definition of the derivative demands that the argument be a function; one cannot give the function an argument before differentiation, as it then ceases to be a function.  The current lead is most certainly wrong, at any rate, since f and f∘g are not the same function; either change the paragraph leading up to it to remove any mention of f∘g, or change the notation.  You cannot have it both ways.


 * It is also worth noting that neither Newton nor Leibniz were working with anything that particularly resembles modern mathematical formalisms, so I'm not sure what they originally thought about the notation is particularly relevant in any capacity other than a historical one. 71.163.32.84 (talk) 21:34, 25 October 2013 (UTC)


 * It seems to me that you are denying the existence of variables. It also seems that you are asserting that a function is equal to a set of ordered pairs.  Both of these seem rather strange to me.  The former seems strange because one needs a name for elements of the domain of f.  If you have studied topological spaces or manifolds then I'm sure you will recognize the need to keep different domains straight.  Also, you seem to want a name like x to refer to a single element of a domain at a time; but when one writes something like f(x) = x³ + 2, one always means, "for all x in the domain, it is true that f(x) = x³ + 2".  While it's true that without additional information you haven't defined your function (the domain could be all manner of things, such as the real numbers, the complex numbers, the rational numbers, the elements of a finite field, etc.), this problem is absent if x is treated as a variable taking values in the domain of the function (and if the domain of x has been specified beforehand).  Then there is no need to say that f : R → R (or whatever the domain and codomain actually are).


 * Regarding functions, it is not true to say that a function is always just a set of ordered pairs. That depends on your foundations for mathematics.  The definition of a function as a set of ordered pairs is very common in set-theoretic foundations of mathematics, but it's not the only possible one.  Even in set-theoretic foundations, one sometimes considers a function f : X → Y to be a triple (X, Y, Γf), where Γf is the graph (the set of ordered pairs).  But there are non-set-theoretic foundations for mathematics, and in these, a function isn't defined in terms of its graph.  In category-theoretic foundations, functions ("morphisms", in category theory jargon) are a kind of primitive object.  As a third example, in type-theoretic foundations (such as the homotopy type theory that's gotten so much press lately), for every two types X and Y there is a type for functions X → Y, and that type is a primitive concept.  So in neither of these foundations is a function defined in terms of its graph.


 * You seem to care very much about being rigorous and precise, and I agree that those are necessary for doing mathematics properly. But I'd also like to say that there is a lot to be said for doing mathematics heuristically.  Analysis didn't have rigorous foundations for hundreds of years, but that does not make the work of Euler or Gauss any less correct or important.  In the present day, there are outstanding conjectures in number theory that are justified mostly by heuristic considerations and computerized checks of small (sometimes not so small) cases.  When you get a feeling for what ought to be true and why, often the details of the argument are not too hard to fill in.  It is really insight, not rigor, that makes a good mathematician.  Ozob (talk) 04:09, 26 October 2013 (UTC)


 * The point of rigor is to reinforce good intuition and destroy bad intuition. The article lead, as it is written, is not a good heuristic at all.  If we state specific rules to define the functions 'f' and 'g' (say f(x) = x^2 and g(x) = 1/x), then what is written is false even within the abuse of notation, as we would have df(x)/dx = 2x, which is most definitely not the derivative of f composed with g.  If we're going to use the abuse of notation, we should at the very least write something of the form d(f(g(x)))/dx = d(f(g(x)))/d(g(x)) * d(g(x))/dx; while this is an abuse of notation, it is one unlikely to cause confusion; removing the arguments might look cuter, but it leads to inconsistency if one then tries to interpret df/dx formally.  At the very least, if we explicitly write the arguments, it is clear that one should not do that.


 * In general the type of intuition that leads to writing d(f(x))/dx can certainly lead to incorrect notions once one is no longer dealing simply with calculus of a single variable; that this abuse of notation is acceptable for an article such as this might be true, but you should still be very careful about doing it.


 * As for your example, if one writes f(x) = x³ + 2, this does indeed mean that for any x in the domain, the stated relation holds. But this is just the issue.  In this statement, f(x) is not the map, it is the value in the range mapped to by x.  That we have not chosen a specific number for x does not change this; we are stating that one may pick any number x in the domain, and, regardless of that choice, the element of the range mapped to by x (denoted f(x)) will equal x³ + 2.  Any statement made about f(x) must also hold if we replace x with a specific value in the domain.  This is why d(f(x))/dx is an abuse of notation; one cannot, for example, write d(f(5))/d5 and have any meaning at all, despite the fact that if we are treating 'x' as a formally defined entity then that is a perfectly permissible manipulation.  It is a mild abuse of notation - I admitted as much in my very first post - but it is an abuse and one must be wary of it.  71.163.32.84 (talk) 04:35, 26 October 2013 (UTC)
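The concrete instance mentioned earlier in this thread (f(x) = x², g(x) = 1/x) can be checked directly. The sketch below is purely illustrative (it is not part of any cited source), using a central-difference approximation to the derivative:

```python
# Illustrative check of the example above: f(x) = x**2, g(x) = 1/x.
# Then (f o g)(x) = 1/x**2, whose derivative is -2/x**3 -- not 2x,
# so df/dx (the derivative of f alone) differs from (f o g)'.

def f(x):
    return x * x

def g(x):
    return 1.0 / x

def d(func, x, h=1e-6):
    """Central-difference numerical derivative of func at x."""
    return (func(x + h) - func(x - h)) / (2 * h)

x = 1.5
df_dx = d(f, x)                      # approximately 2x = 3.0
dcomp_dx = d(lambda t: f(g(t)), x)   # approximately -2/x**3
assert abs(df_dx - 2 * x) < 1e-5
assert abs(dcomp_dx - (-2 / x ** 3)) < 1e-5
assert abs(df_dx - dcomp_dx) > 1     # the two genuinely disagree
```

So writing df/dx for the derivative of the composite, as the disputed lead did, gives a numerically wrong answer for these functions.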


 * It's easy to see what you're saying and how you're thinking about it. And I don't disagree with what you're saying at one level. Nevertheless, this is an encyclopedia and not something being prepared as the input to a theorem-verifying algorithm. For this reason, it is essential to give due weight to the notations currently in widespread use, in both mathematical and pedagogical texts – failing to do so will make the article far less accessible across the entire spectrum of readers. Leibniz is one of the prevalent notations, and so cannot be neglected.
 * I gave an interpretation of Leibniz notation above that makes rigorous sense, though this appears to have gone unnoticed. As Sławomir has pointed out, context is often not stated explicitly. For example, the statement f(x) = x + 1 has two distinct interpretations, the choice often being implied by inference: as an equation (true for some x, or more specifically, true for solutions x that solve some system), or as an identity (true for all x in the domain of f, the latter usually being determined as those values for which the statement is defined).  Yet we do not bother to add this clarifying verbiage, and most people would not know how to state what they mean formally.
 * The overall conclusion must be that the Leibniz notation must be included in the article, without embellishments or arguments about rigor. Debates about its rigor do not actually belong on this talk page. That said, we should get the article into shape using the standard notations, not the current mess. — Quondum 06:57, 26 October 2013 (UTC)

The present state of the article states: "For example, the chain rule for (f ∘ g)(x) is $$\frac {df}{dx} = \frac {df}{dg} \cdot \frac {dg}{dx}.$$". Reading it with a fresh view, and knowing the chain rule, I find it highly confusing. Thinking about it a little more, it appears that the formula is correct by itself, but wrongly specified. In fact neither Leibniz nor Newton knew the symbol of function composition, and probably not the modern notion of composition of functions. To be correct, the formula must be introduced by "If f is a function of a variable g, which is itself a function of x, then the chain rule is: ...". If one wants to introduce the chain rule for (f ∘ g), Leibniz notation must be avoided, because it supposes that the variables are named, which is not the case in (f ∘ g). On the other hand, Newton's notation, where the variables need not be named, works well and gives: "The chain rule for (f ∘ g) is (f ∘ g)′ = (f′ ∘ g) g′." I agree with the IP that, even with my correction, the formulation with Leibniz notation is difficult to make formally correct, because it needs the rather strange notion of a variable that is a function of another variable. But it works well in practice.

In conclusion, my opinion is that the two versions of the chain rule must appear in the lead, appropriately introduced.

D.Lazard (talk) 10:27, 26 October 2013 (UTC)


 * I think the new lede is a great improvement. Ozob (talk) 14:30, 26 October 2013 (UTC)


 * Agreed, I find this reasonably satisfying. 71.163.32.84 (talk) 14:33, 26 October 2013 (UTC)


 * Agree, although I still question whether df/dg is the best way to express the relevant derivative. I would prefer something like dy/dx=(dy/du)(du/dx), fwiw.  Sławomir Biały  (talk) 15:53, 26 October 2013 (UTC)


 * This is a more common notation, when using the language of "variables that are functions of other variables". But, here, it seems better to use the same names in the two formulas. D.Lazard (talk) 16:04, 26 October 2013 (UTC)


 * Sławomir, I had the same concern. I hope my edit has addressed it. Er, sorry, D.Lazard, I see I've gone against your vote; I have difficulty agreeing with you on this, but had I read your comment properly, I would have debated here first. — Quondum 17:53, 26 October 2013 (UTC)

Higher Dimensions
Am I missing something, or are the claims in this section completely wrong? D(f o g) = Df o Dg seems to me to be a false statement; the correct formulation would be D(f o g) = ((Df) o g)Dg. That this section cites no sources at all does not exactly inspire confidence, as well. 71.163.32.84 (talk) 03:05, 28 October 2013 (UTC)
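For what it's worth, the formulation D(f ∘ g)(a) = Df(g(a))·Dg(a) can be sanity-checked numerically against a finite-difference derivative of the composite. The maps f and g below are hypothetical choices picked only for illustration, not taken from the article:

```python
import math

# Hypothetical example maps: g : R -> R^2 and f : R^2 -> R,
# so the composite f o g : R -> R.
def g(t):
    return (t * t, math.sin(t))

def f(x, y):
    return x * y + y * y

def Dg(t):
    """Jacobian of g at t (a 2x1 matrix, stored as a pair)."""
    return (2.0 * t, math.cos(t))

def Df(x, y):
    """Jacobian of f at (x, y) (a 1x2 matrix, i.e. the gradient)."""
    return (y, x + 2.0 * y)

def chain_rule(t):
    """((Df) o g)(t) . Dg(t): the Jacobian of f evaluated at g(t),
    matrix-multiplied by the Jacobian of g at t."""
    x, y = g(t)
    fx, fy = Df(x, y)
    gx, gy = Dg(t)
    return fx * gx + fy * gy

def finite_difference(t, h=1e-6):
    """Central-difference derivative of the composite f o g at t."""
    comp = lambda s: f(*g(s))
    return (comp(t + h) - comp(t - h)) / (2 * h)

for t in (0.7, -1.2):
    assert abs(chain_rule(t) - finite_difference(t)) < 1e-6
```

The key point the check illustrates: the Jacobian of f must be evaluated at g(t), not at t, which is exactly the ((Df) ∘ g)·Dg formulation.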


 * There was an explanation in an earlier version, that seems to have been replaced by the statement about Jacobians:
 * "Note that the derivatives here are linear maps and not numbers. If the linear maps are represented as matrices (namely Jacobians), the composition on the right hand side turns into a matrix multiplication."
 * Your version seems in essence to be expressed by the bit on Jacobians. Though I cannot verify it, it does seem as though it could make sense as it stands, though an interpretation of the notation may help.  And references would help, of course. — Quondum 04:57, 28 October 2013 (UTC)
 * Upon further reflection and consultation of an analysis text, it seems this is an odd notational way of expressing the relation in terms of linear operators; the linear operator for the total derivative is usually notated with a 'd' rather than a 'D', as the latter is used to denote the matrix. I think it'd be worthwhile to restructure the whole section and give the relation first in terms of gradients and dot products (perhaps even write out the sums entirely), which would be far more comprehensible to people without extensive math education, followed by the more elegant matrix representation, with that followed by the linear operator expression, taking care to use different notation for each and expressing each new notation clearly in terms of the previous one.  This is the logical sequence used to construct the linear operator notation in the first place, so it would make sense to express it in the article.  As it is currently stated, it is highly confusing ("captures how the function changes in all directions" is not a useful definition).  I could make this edit tomorrow, and add references, but I don't have the time tonight. 71.163.32.84 (talk) 06:02, 28 October 2013 (UTC)
 * While I agree with most of what you've said, I do not agree with the matrix representation being "more elegant". Thinking using matrix representations unfortunately tends to obscure some insights by not tracking when one is working in the dual space or indeed what basis is being referred to (just a personal rant), and basis-independence somehow gets lost. It also does not as clearly apply to the more abstract cases. Possibly, use of juxtaposition (duly explained in text) to denote the application of an operator or composition of operators as appropriate, might be less jarring for those more familiar with linear algebra. Though I do not know whether this is standard in the context. — Quondum 14:12, 28 October 2013 (UTC)
 * You misunderstand me; I meant "more elegant than the notation that would come before it," i.e. gradient/dot product and summations. The linear operator notation is cleaner and more general than the matrix notation. 71.163.32.84 (talk) 14:19, 28 October 2013 (UTC)
 * Using d in place of D would be unacceptable, as df is standard notation for a differential form. Ozob (talk) 02:09, 29 October 2013 (UTC)
 * I should clarify: The context here is the derivative at a point in Rⁿ, but the differential form df carries with it all the baggage of differential geometry. In the present situation it would be a vector-valued differential form.  But the object in differential geometry more analogous to the derivative function of f is the pushforward.  By restricting the pushforward to the tangent space over the point a of Rⁿ we obtain the linear transformation that the article denotes Dₐ(f).  It is also possible to define a vector-valued differential form df which, when restricted to the tangent space over a, becomes Dₐ(f), and then pullback of differential forms corresponds to the chain rule.  If other editors think that it would be useful to cover pullbacks in this article, then I'm open to that; but my instinctive feeling is that it strays a little too far from the intended focus of the article, since in the general setup of manifolds the two ideas (pushforwards and vector-valued differential forms) become distinct.  Ozob (talk) 02:37, 29 October 2013 (UTC)
 * I think the current notation would be fine with a better introduction in which the notation being used is actually defined; it would make far more sense to introduce the higher-dimensional chain rule in terms of projection functions and sums first, then rewrite that statement in terms of Jacobian matrices, and then use the action of the Jacobian on a vector to define the linear operator. This would be far less confusing than what is written now, which seems to be backwards. 71.163.32.84 (talk) 02:56, 29 October 2013 (UTC)
 * It would possibly be easier to follow as you suggest. The most important thing is possibly a more careful explanation of notation, but the reordering would probably also help, as each step gives insight into the next, given that the premise is over Rⁿ and not more abstract. — Quondum 03:59, 29 October 2013 (UTC)
 * I disagree on this as well. The claim that the Jacobian matrix defines the total derivative requires some additional assumption, such as that the function is of differentiability class C¹.  For example, consider the function:
 * $$f(x, y) = \begin{cases}\frac{x^2 y}{x^4 + y^2},& (x, y) \neq (0,0) \\ 0, &(x, y) = (0,0).\end{cases}$$
 * This function has first partial derivatives everywhere -- this is clear away from the origin, and at the origin one does a simple computation. But it's not even continuous at the origin, as you can see by looking at the limit along the curve $$\gamma(t) = (t, t^2)$$.
 * It's true that, if the total derivative exists, then all the partial derivatives exist and the total derivative is the matrix of partial derivatives. But this counterexample shows that it is not true that one can simply assemble the partial derivatives into a matrix and expect to get the total derivative.  Ozob (talk) 13:35, 29 October 2013 (UTC)
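A direct computation confirms the counterexample; the sketch below just re-checks numerically what is stated above:

```python
def f(x, y):
    # Ozob's counterexample: both partial derivatives exist
    # everywhere, yet f is discontinuous at the origin.
    if (x, y) == (0.0, 0.0):
        return 0.0
    return (x * x * y) / (x ** 4 + y * y)

# Along the axes f vanishes identically, so both partial
# derivatives at the origin exist and equal 0:
for h in (1e-2, 1e-5, 1e-8):
    assert f(h, 0.0) == 0.0 and f(0.0, h) == 0.0

# But along gamma(t) = (t, t^2): x^2 y = t^4 and x^4 + y^2 = 2 t^4,
# so f(t, t^2) = 1/2 for every t != 0.
for t in (0.5, 0.01, 1e-4):
    assert abs(f(t, t * t) - 0.5) < 1e-12

# Hence f(t, t^2) -> 1/2 != 0 = f(0, 0): f is not continuous at the
# origin, so the matrix of partials cannot be a total derivative there.
```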
 * This is true, but this is not a convincing argument for presenting the total derivative first, rather for simply being careful when writing the section. 71.163.32.84 (talk) 14:10, 29 October 2013 (UTC)
 * The defining characteristic of the derivative is that it is the best linear approximation to a function at a given point. The partial derivatives describe the best linear approximations only in the axial directions.  Because of that the chain rule is especially simple for the total derivative and quite a bit more complicated for partial derivatives; to be honest the way I remember the chain rule for partial derivatives is by formulating everything as matrices.  I believe this simplicity means that the total derivative ought to go first.  (Contrast this with the derivative article, which treats partial derivatives first because their definition is simpler.)  Ozob (talk) 02:04, 30 October 2013 (UTC)
 * As the article is written, someone who doesn't already know what the total derivative is will be completely lost upon reading the beginning of this section, and will essentially have to read it backwards to get any understanding of the generalization. Even for people familiar with the total derivative, it's confusing, because the notation is not really defined beforehand; one would have to click on the link and read the total derivative article to understand what is being written, and even then due to different standard ways of notating the total derivative it's not immediately clear what the formula is stating (this has been mentioned on the talk page before). 71.163.32.84 (talk) 04:01, 30 October 2013 (UTC)
 * Ozob makes some good points, though I'm going to sidestep the point on which order to present things. Perhaps we should address the inadequate introduction/definition/explanation of the notation, which is a point I don't think anyone has disagreed with yet. How would you feel about introducing the notations D and ∘? The section mentions the total derivative being a linear transformation, but does not explicitly say that this is what D represents, nor what its subscript is, etc. It would also help if the composition of linear operators was mentioned in words. Your feeling? — Quondum 05:31, 30 October 2013 (UTC)
 * I agree that it is wrong to start with the Jacobian matrix formulation. Just for the function to be differentiable already requires the existence of a linear map (not just a Jacobian matrix).  If a reader doesn't already know this, then the section should not be written in a way to mislead such a reader.   Sławomir Biały  (talk) 12:05, 1 November 2013 (UTC)

I've made a first pass at attempting to be more explicit about the notation. Please feel free to improve it or to comment here. Ozob (talk) 13:39, 30 October 2013 (UTC)
 * There's definitely still something wrong here. f cannot be composed with g, only g can be composed with f. (The range of f does not equal the domain of g.) — Preceding unsigned comment added by 128.32.132.89 (talk) 01:25, 18 April 2017 (UTC)

Continuity of η in second proof
The second proof contains the following line:
 * The above definition imposes no constraints on η(0), even though it is assumed that η(k) tends to zero as k tends to zero. If we set η(0) = 0, then η is continuous at 0.

Is this line really necessary? Wouldn't it be enough to write that η behaves like ε, i.e. it simply tends to zero as its argument tends to zero?

The second proof terminates finally with the following line:
 * The need to define Q at g(a) is analogous to the need to define η at zero.

But I don't see the need to define η at zero in that second proof. Am I missing something? Don't we just need to know that η tends to zero as its argument tends to zero? AurelienLourot (talk) 19:51, 30 December 2014 (UTC)
 * I see, that's because $$k_h$$ not only tends to zero but can also be exactly zero; $$\eta(k_h)$$ would otherwise be undefined. Feel free to delete this discussion. AurelienLourot (talk) 20:31, 30 December 2014 (UTC)

Non-standard analysis
Would it be pertinent/beneficial to add a small section exposing the non-standard approach to the chain rule, i.e. using hyperreal numbers and standard parts? — Preceding unsigned comment added by Gio97 (talk • contribs) 08:30, 10 April 2015 (UTC)
 * What is the non-standard approach to the chain rule? The chain rule is a formula which is the same in standard and non-standard analysis. Only the proof differs slightly, as the definition of the derivative is not the same. A new subsection, called "Proof in non-standard analysis", of the section "Proofs" could be added. Are you willing to write it? D.Lazard (talk) 09:29, 10 April 2015 (UTC)

Abuse of prime notation
I tend to agree with the point being made in the edit summary in this revert. Placing the prime after an expression is pushing an abuse of notation too far. The correct notation would be $$(f\circ g)'(x) = f'(g(x)) g'(x)$$. The prime just does not have the necessary flexibility (expressive power). —Quondum 02:24, 20 May 2015 (UTC)
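As an aside, the unambiguous form $$(f\circ g)'(x) = f'(g(x)) g'(x)$$ is easy to sanity-check numerically. Here f = sin and g(x) = x² are arbitrary illustrative choices, not taken from the discussion:

```python
import math

# Illustrative functions (arbitrary choices for the check).
f, fprime = math.sin, math.cos      # f = sin, f' = cos
g = lambda x: x * x                 # g(x) = x^2
gprime = lambda x: 2.0 * x          # g'(x) = 2x

def composite_prime(x):
    """(f o g)'(x) = f'(g(x)) * g'(x) -- the unambiguous notation."""
    return fprime(g(x)) * gprime(x)

def numeric_prime(x, h=1e-6):
    """Central-difference derivative of the composite at x."""
    return (f(g(x + h)) - f(g(x - h))) / (2 * h)

for x in (0.3, 1.3, -2.1):
    assert abs(composite_prime(x) - numeric_prime(x)) < 1e-5
```

Note that the prime in composite_prime applies to a named function and is then evaluated at a point, which is exactly what the notation f(g(x))′ obscures.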


 * So the notation under dispute was introduced because it was supposed to be easier to understand.  That editor's point was that some students won't understand the notation for function composition.  I think that's both true and a concern for the target audience of the article.  Given that assumption, I'm not sure what other notation to use besides $$f(g(x))'$$ (and I do think that the meaning of that expression is unambiguous and not an abuse).  Ozob (talk) 13:45, 20 May 2015 (UTC)


 * I understand the motivation for trying to use the prime notation for the derivative, but that notation has serious shortcomings, and using it in this way creates problems. I doubt whether you'll find a reputable textbook using it in such a sloppy fashion, and in the lead it can cause confusion.  Its *intended* meaning is not even clear.  For example, consider the following invalid substitution: y = g(x), so f(y)′ = f(g(x))′.  Do we really want the student of differential calculus to learn that substitution cannot be done when there is a prime applied to the expression?  —Quondum 14:20, 20 May 2015 (UTC)


 * Good point; I think of the prime mark as implicitly having an x (in which case the substitution you make is valid essentially by definition), but a student isn't likely to have that subtlety down pat. What do you think we should do?  Insist on function composition notation?  Ozob (talk) 14:34, 20 May 2015 (UTC)
 * I agree that writing $$f(g(x))'$$ is an abuse of notation. However, it is widely used by mathematicians and it is legitimate in some cases. Why is it an abuse of notation? Because, formally, in modern mathematics, the prime is an operator that applies only to functions; therefore, it must normally be written just after the name of a function. But what if the function has not been named, such as $$g\circ f$$ or $$x \mapsto x^2+1$$? Note that, when Lagrange introduced his notation, the operator $$\circ$$ had not yet been invented (I believe, but I have not checked), and people said "let f(x) be a function" rather than "let f be a function of x". At that time, the distinction between an expression containing x and the function that it defines was unclear (it is still unclear for many people). Therefore, it is very common and very often useful to adopt the convention that a prime placed after an expression containing x indicates the derivative of the function defined by the expression. With this convention, the formula $$(x^2+1)'=2x$$ becomes correct, and $$f(g(x))'$$ becomes well defined. For these reasons I'll reinstall the disputed paragraph, with a note explaining the convention. D.Lazard (talk) 15:04, 20 May 2015 (UTC)


 * I understand f(x, y)′ and f(y, x)′ to be functions that are both differentiated with respect to their x-parameter (though actually, it depends on how y is related to x), and f′(z) entirely depends on how z is a function of x. I can see where the subscript notation came from: fx(x), and here (f(x))x could make some sense. I would feel more comfortable if the explanatory note were more cautionary about the notation and written in the past tense: this is not a use that should be treated as current or as free of confusion. At least Newton's overdot notation is unambiguously with respect to a universal parameter: time. —Quondum 15:45, 20 May 2015 (UTC)


 * Hopefully 's edit is a compromise acceptable to all. It neatly sidesteps the problem of whether the prime denotes differentiation with respect to the sole parameter or with respect to a specific variable x, and also allows it to be regarded more formally as an operator. —Quondum 16:43, 20 May 2015 (UTC)


 * The current compromise seems fine to me. Glad this has been sorted out.50.163.87.238 (talk) 21:04, 20 May 2015 (UTC)

Animation
For the life of me, I can't make heads or tails of that recently-added animation. I'm not even sure what it's supposed to be showing. Is it just me? Perhaps with a bit of clarification it could be a useful addition to the article, but in its present state I'm not sure it'll contribute to anyone's understanding. 96.231.153.5 (talk) 06:07, 26 January 2016 (UTC)
 * I agree. I have removed the animation. It is not even clear that the animation correctly represents the chain rule, and, in any case, it is too fast to allow understanding. D.Lazard (talk) 09:02, 26 January 2016 (UTC)

Example is wrong Chain_rule
Regarding the example $u(x, y) = x^{2} + 2y$. We strictly have $$\frac{\partial u(x,y)}{\partial x}=2x$$, $$\frac{\partial u(x,y)}{\partial y}=2$$ and $$\frac{\partial u(x,y)}{\partial r}=0$$, since the function does not take r as a parameter. The article uses non-standard notation if the chain rule is applied here. The chain rule should not be applied here. See https://www.icp.uni-stuttgart.de/~icp/mediawiki/images/7/74/Remark_on_partial_derivatives.pdf — please correct this!--94.217.251.2 (talk) 10:24, 26 June 2018 (UTC)


 * The x and y variables are given to be functions of r and t. It seems like a standard application of the chain rule to me.   Sławomir Biały  (talk) 11:27, 26 June 2018 (UTC)
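As a sanity check of the disputed application, here is a small numerical sketch. The thread does not specify how x and y depend on r and t, so the inner functions below (polar-style x = r cos t, y = r sin t) are hypothetical stand-ins, not from the article:

```python
import math

# Hypothetical inner functions (not specified in the article) to make the
# check concrete: x and y are given as functions of r and t.
def x_of(r, t): return r * math.cos(t)
def y_of(r, t): return r * math.sin(t)

def u(x, y): return x**2 + 2*y

def u_composed(r, t): return u(x_of(r, t), y_of(r, t))

def du_dr_chain(r, t):
    # Chain rule: du/dr = (du/dx)(dx/dr) + (du/dy)(dy/dr)
    x = x_of(r, t)
    du_dx, du_dy = 2*x, 2.0
    dx_dr, dy_dr = math.cos(t), math.sin(t)
    return du_dx * dx_dr + du_dy * dy_dr

def du_dr_numeric(r, t, h=1e-6):
    # Central finite difference of the composed function in r
    return (u_composed(r + h, t) - u_composed(r - h, t)) / (2 * h)

r, t = 1.3, 0.7
print(abs(du_dr_chain(r, t) - du_dr_numeric(r, t)))  # tiny
```

The two values agree, which is exactly the "standard application of the chain rule" being described: the partial of u with respect to r only makes sense once x and y are themselves functions of r.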

Wrong order of indexes in "Higher derivatives of multivariable functions" section's formula
I've stumbled across that formula, and the order of partial derivatives in the second sum is not correct; it should be $$\frac{\partial^2 y}{\partial u_\ell \partial u_k}$$, not $$\frac{\partial^2 y}{\partial u_k \partial u_\ell}$$

It is not always true that the order of the partial derivatives can be exchanged without affecting the result; Schwarz's theorem guarantees this only under certain conditions. So it is appropriate to write the formula as generally as possible, or at least to state that we are working under those conditions. — Preceding unsigned comment added by 213.243.253.119 (talk • contribs) 09:28, 6 May 2019 (UTC)


 * Please sign your posts on talk pages with four tildes ( ~ ).
 * As the sum is taken over all pairs $$k, \ell$$, the two versions always give the same result, as one passes from one formula to the other simply by exchanging the names of the summation indices. So your version is confusing, as you change the order of the indices in one place among three without need and without explanation. D.Lazard (talk) 12:33, 6 May 2019 (UTC)
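The index-renaming argument can be illustrated numerically: summing over all pairs (k, ℓ) gives the same total whichever way the indices are written, even when the second partials are not symmetric. The numbers below are arbitrary stand-ins, not values from the article:

```python
# Renaming the summation indices k <-> l leaves the total unchanged even when
# H[k][l] != H[l][k]. H stands in for the second partials, du for du_k/dx.
H = [[1, 2, -3],
     [4, 5, 6],
     [-7, 8, 9]]          # deliberately non-symmetric
du = [1, -2, 3]

# Version from the article: H[k][l]; version proposed on this page: H[l][k]
s1 = sum(H[k][l] * du[k] * du[l] for k in range(3) for l in range(3))
s2 = sum(H[l][k] * du[k] * du[l] for k in range(3) for l in range(3))
assert s1 == s2           # exchanging index names does not change the sum
print(s1, s2)
```

So the two ways of writing the double sum are the same expression under a change of bound variable names, independently of whether Schwarz's theorem applies.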

Please don't use capital F in the lead
I don't think it's wise, didactically, to use a capital F in F(x) = f(g(x)) since capitals are very often used to designate the primitive, as in F'(x) = f(x). Why not just use h or something? — Preceding unsigned comment added by 77.61.180.106 (talk • contribs) 13:00, 4 December 2020 (UTC)
 * ✅ However, the main reason for such a change is that $F$ wrongly suggests that the composition is more related to $f$ than to $g$. D.Lazard (talk) 14:30, 4 December 2020 (UTC)

Problem with First Example.
The First Example states "(f ∘ g)(t) is the atmospheric pressure the skydiver experiences t seconds after his jump". It's not that simple, because the distance fallen after t seconds is a tiny bit less than that given by g(t) due to buoyancy. I expect you would need to use a differential equation to model the physics in the first example rather than a simple composite function. — Preceding unsigned comment added by MathewMunro (talk • contribs) 09:26, 14 February 2021 (UTC)
 * I have removed this example as too technical here (this is not a physics article) and totally unrealistic (the aerodynamic force is much more important than buoyancy and is not considered). D.Lazard (talk) 10:19, 14 February 2021 (UTC)
 * Please sign all your talk page messages with four tildes ( ~ ) — See Help:Using talk pages. Thanks.
 * Yes, I think user D.Lazard adequately took care of this . - DVdm (talk) 10:27, 14 February 2021 (UTC)
 * If someone wants to re-do the recently deleted example, this might be useful: I found what is probably the simplest possible example of the Chain Rule: https://socratic.org/questions/5a6b377f7c0149554545a31c but of course you can't just plagiarise it. MathewMunro (talk) 11:02, 14 February 2021 (UTC)
 * That source is a schoolbook example of an unreliable source. I also don't think the example should be redone: without a source it likely was a schoolbook example of wp:original research, and it actually was a schoolbook example of a blatant wp:copyright violation of this (also unreliable) source. Good thing we got rid of it. - DVdm (talk) 13:17, 14 February 2021 (UTC)


 * I can assure everyone that the example was not plagiarised because I invented it myself. Its appearance elsewhere is an example of other websites plagiarising Wikipedia rather than the other way around.
 * I wrote the example quite a long time ago. I think it's been sixteen years; it was before I had even decided to make an account. The example does not say that it is a physically correct description of falling through air, and indeed it was never intended to be. Rather, it was intended to demonstrate certain points which I found my students confused about.  I thought the best way to do this was by means of an example where all the units involved were clear.
 * The specific points I wanted to demonstrate are in the last two paragraphs, which I'll quote here for reference. Notice the discussion of units!  Also notice the discussion of various mistakes and how the units in those mistakes are nonsense!

For example, suppose that we want to compute the rate of change in atmospheric pressure ten seconds after the skydiver jumps. This is $(f ∘ g)′(10)$ and has units of pascals per second. The factor $g′(10)$ in the chain rule is the velocity of the skydiver ten seconds after his jump, and it is expressed in meters per second. $$f'(g(10))\!$$ is the change in pressure with respect to height at the height $g(10)$ and is expressed in pascals per meter. The product of $$f'(g(10))\!$$ and $$g'(10)\!$$ therefore has the correct units of pascals per second.

Here, notice that it is not possible to evaluate $f$ anywhere else. For instance, the 10 in the problem represents ten seconds, while the expression $$f'(10)\!$$ would represent the change in pressure at a height of ten meters, which is not what we wanted. Similarly, while $g′(10) = −98$ has a unit of meters per second, the expression $f′(g′(10))$ would represent the change in pressure at a height of −98 meters, which is again not what we wanted. However, $g(10)$ is 3020 meters above sea level, the height of the skydiver ten seconds after his jump, and this has the correct units for an input to $f$.
 * I knew that some of my students would find this page. I hoped it would help them and possibly others.  The fact that other websites have copied it is some evidence that it did!
 * Of course, Wikipedia is an encyclopedia not a textbook. In the intervening sixteen years, the rules on original research have become more strict, and my appreciation for those rules has also increased.  Nowadays, I wouldn't feel quite so comfortable inventing an extended example like that and putting it onto Wikipedia.
 * Despite this, the example is still useful. What I would really like is if someone could find an example, with a citation to a textbook or paper, which clearly exhibited the same features: The equations involved should be simple, there should be some kind of obvious physical meaning, and the many mistakes that we all see when we teach calculus should result in physically meaningless quantities. That kind of example has encyclopedic value.  Ozob (talk) 00:23, 15 February 2021 (UTC)
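The units bookkeeping in the quoted example can be sketched numerically. The pressure model f and the height constant below are hypothetical, chosen only so that g(10) = 3020 m and g′(10) = −98 m/s match the numbers quoted above (a real atmosphere is not linear in height):

```python
# Hypothetical models, not from the article: a linear pressure profile and a
# free-fall height chosen so that g(10) = 3020 m and g'(10) = -98 m/s.
def f(h):                  # pressure in Pa at height h in meters
    return 101325.0 - 12.0 * h

def g(t):                  # height in meters t seconds after the jump
    return 3510.0 - 4.9 * t * t

def f_prime(h):            # Pa per meter (constant for this linear model)
    return -12.0

def g_prime(t):            # meters per second
    return -9.8 * t

t = 10.0
# Chain rule: Pa/m times m/s gives Pa/s, as the quoted text emphasizes.
chain = f_prime(g(t)) * g_prime(t)       # (-12) * (-98) = 1176 Pa/s
h = 1e-6
numeric = (f(g(t + h)) - f(g(t - h))) / (2 * h)
print(chain, numeric)
```

The numerical derivative of the composition matches the chain-rule product, and evaluating f′ at g′(10) = −98 or at 10 instead of at g(10) = 3020 would give different, physically meaningless numbers, which is the point of the quoted paragraphs.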


 * There you go: if you invented it yourself, it is original research, and unless it is used/discussed/mentioned in the relevant literature to establish its worthiness to be included here, any discussion about its merit or validity is actually off-topic here. As you say, this is Wikipedia... - DVdm (talk) 11:04, 15 February 2021 (UTC)
 * Well, of course it is. I said as much.  But as I also said, I would like if someone were to find a similar example which could be cited.  (Surely all it would require is perusing some textbooks...)  A discussion of common chain rule mistakes is excellent content for an encyclopedia.  Ozob (talk) 18:21, 15 February 2021 (UTC)


 * It turns out that it's quite difficult to find examples that are of the kind I'm hoping for and that are also out of copyright. (I consider being out of copyright important here, since the article would essentially have to rework the example in full, and that might not be considered fair use.)  As far as I can tell, calculus textbooks from a century or more ago drew their basic examples exclusively from analytic geometry.  Everything is about curves and tangents.  Physical applications, if they're included at all, are invariably in a later chapter, well after the main concepts have been introduced.
 * The best examples that I've found so far have been in Thompson's Calculus for the Practical Man, chapter IV. See .  I think problem (4), in particular (about a conical cup filling with water), would illustrate the same things that the skydiver example did; and it would be physically accurate and sourced.  What does everyone else think?  Ozob (talk) 23:03, 15 February 2021 (UTC)
 * WP:Wikipedia is not a textbook, and the example was misplaced: placing a physical example before a mathematical definition may make sense in a textbook, where readers are supposed to have their first access to the subject there. In the case of this article, readers may come here either by following a link or because they have encountered the title somewhere. So they must already have heard of derivatives and function composition, and a motivating example before the definition is probably useless for them; it may even be confusing if it uses non-mathematical concepts that they do not master.
 * So, I strongly oppose this sort of example at this place. However, I do not oppose adding such an example later in the article, for example in a section "Example of application". D.Lazard (talk) 09:02, 16 February 2021 (UTC)
 * That sounds fine to me. I think it would fit well in the "Applications" section, perhaps after the subsection entitled "Absence of formulas". Ozob (talk) 01:29, 17 February 2021 (UTC)
 * Copyrights are about the actual text being used; if someone has put an example in a textbook, and you rewrite the same mathematics in your own words, there is no copyright issue. --JBL (talk) 13:03, 17 February 2021 (UTC)
 * No; rewriting in our own words only creates a derivative work, to which the original copyright owner still has some rights. Ozob (talk) 04:20, 18 February 2021 (UTC)
 * No: the idea behind a mathematical example (or a math problem) is not copyrightable. (Likewise cookbook recipes.)  Problems only arise if you copy a collection of such things. --JBL (talk) 12:15, 18 February 2021 (UTC)

Correctness of example function in first proof?
I have a question about the example function g(x) in the First Proof. Currently the article states: "For example, this happens for g(x) = x^2 sin(1/x) near the point a = 0." I think the example function should be a split function as follows: "g(x) = x^2 sin(1/x) for x ≠ 0, and g(x) = 0 for x = 0". As it stands the function is undefined at the point a = 0, and so not differentiable there. I think the point of the article would be valid if it used the split function I have suggested. — Preceding unsigned comment added by Matthew.howey (talk • contribs) 10:38, 29 March 2021 (UTC)


 * Please put new talk page messages at the bottom of talk pages and sign your messages with four tildes ( ~ ) — See Help:Using talk pages. Thanks.
 * Yes, the second part of the Q function mentions g(a), so that should probably be taken into account. - DVdm (talk) 11:36, 29 March 2021 (UTC)

D.Lazard (talk) 11:40, 29 March 2021 (UTC)
 * Hi – many thanks for your quick response and edit. Could I suggest it would be clearer if the value of g(x) at x = 0 were explicitly built into the definition of the function itself – perhaps: "this happens for the split function: g(x) = x^2 sin(1/x) if x ≠ 0, g(x) = 0 if x = 0". Happy to make the edit but don't want to do anything without agreement. Thanks. Matthew.howey (talk) 15:27, 29 March 2021 (UTC)

I've proceeded to make the edit I suggested, obviously if anyone disagrees please amend or revert. Matthew.howey (talk) 20:40, 15 April 2021 (UTC)
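For anyone checking the edit, here is a short numerical illustration of why the split function works: the difference quotient at 0 tends to 0, so g is differentiable there, while the derivative away from 0 keeps oscillating:

```python
import math

# The split function from the discussion: differentiable at 0,
# but its derivative is discontinuous there.
def g(x):
    return x * x * math.sin(1.0 / x) if x != 0 else 0.0

# Difference quotient at a = 0: (g(h) - g(0)) / h = h*sin(1/h) -> 0,
# so g'(0) = 0 even though 1/x blows up.
for h in (1e-2, 1e-4, 1e-6):
    q = (g(h) - g(0)) / h
    assert abs(q) <= h          # |h * sin(1/h)| <= h

# Away from 0, g'(x) = 2x*sin(1/x) - cos(1/x); the cos(1/x) term keeps
# oscillating between -1 and 1 as x -> 0, so g' is not continuous at 0.
def g_prime(x):
    return 2 * x * math.sin(1 / x) - math.cos(1 / x)

# Points arbitrarily close to 0 where g' is near -1 and near +1:
print(g_prime(1 / (2 * math.pi)), g_prime(1 / math.pi))
```

Without the explicit g(0) = 0 clause, the formula alone is undefined at 0, which was the objection raised above.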

Notation for partial derivatives in multivariate case of $f(g1(x), ..., gk(x))$
I think the current notation for partial derivatives ($$D_i f$$) is unnecessarily pedantic. Thus I attempted to change it to what I believe is the most common notation ($$\frac{\partial f}{\partial g_i}$$), which was almost immediately reverted with the reason "Partial derivative with respect to a function is not defined". While that is correct, it is also not what I wrote, but I agree that it should perhaps be very clearly stated that it is not to be read as that. I think the $$\frac{\partial f}{\partial g_i}$$ notation is by far the most common, and I don't think the article is helping anyone by not adhering to that. Does anyone have major objections to changing the notation, perhaps with the addition of a few sentences explaining how it should be read? QuarksAndElectrons (talk) 10:03, 27 October 2023 (UTC)


 * Again, in the notation of partial derivatives, what appears in the denominator must be a variable, not the name of a function. This is the reason for using the rather standard $$D_i$$ notation. Note that the reason is explained in the following sentences, and your edit makes these explanations confusing. D.Lazard (talk) 12:10, 27 October 2023 (UTC)
 * And I am saying that it is incredibly common either to treat the variables and functions on an equal footing in this type of situation (see e.g. Terence Tao, Analysis II) or to make it clear from context that $$\partial/\partial g_i$$ should be read as the derivative with respect to the ith argument. I think that writing it with $$\partial$$s makes it much more readable.
 * I know that the D-notation exists; however, I strongly disagree that using it makes things simpler and clearer, especially since you end up with two different notations in the same equation in the following examples. QuarksAndElectrons (talk) 15:38, 27 October 2023 (UTC)
 * Alternatively, how about writing $$ f(y_1, \ldots, y_k) = f(g_1(x), \ldots , g_k(x)) $$, so that the chain rule can be written:
 * $$\frac{d}{dx}f(g_1(x), \dots, g_k (x))=\sum_{i=1}^k \left(\frac{d y_i}{dx}(x)\right) \frac{\partial f }{\partial y_i} (g_1(x), \dots, g_k (x)).$$ QuarksAndElectrons (talk) 16:00, 27 October 2023 (UTC)
 * The problem is not making things simpler or clearer; it is being mathematically correct. Your notation is a sort of jargon, that is, a shortcut that is clear to accustomed readers but may be misleading or confusing for others. So I still disagree with your change and its variant, which requires the introduction of k unneeded dependent variables.
 * Nevertheless, this page has 202 WP:watchers; let us wait for the opinion of others. D.Lazard (talk) 16:15, 27 October 2023 (UTC)
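For concreteness, here is a small numerical check of the rule under discussion, written with the $$D_i$$ (derivative-in-the-ith-argument) convention; the functions f, g1, g2 below are hypothetical examples, not taken from the article:

```python
import math

# Hypothetical example: f(a, b) = a*b + b^2, g1(x) = sin(x), g2(x) = x^2.
# The multivariate chain rule reads
#   d/dx f(g1(x), g2(x)) = D1 f(g1(x), g2(x)) * g1'(x) + D2 f(g1(x), g2(x)) * g2'(x),
# where D_i f denotes the derivative of f with respect to its i-th argument.
def f(a, b): return a * b + b * b
def D1f(a, b): return b            # derivative of f in its first slot
def D2f(a, b): return a + 2 * b    # derivative of f in its second slot

def g1(x): return math.sin(x)
def g2(x): return x * x

def chain(x):
    a, b = g1(x), g2(x)
    return D1f(a, b) * math.cos(x) + D2f(a, b) * 2 * x

def numeric(x, h=1e-6):
    comp = lambda s: f(g1(s), g2(s))
    return (comp(x + h) - comp(x - h)) / (2 * h)

x = 0.8
print(abs(chain(x) - numeric(x)))  # tiny
```

Whatever notation the article settles on, the content is the same: each $$D_i f$$ (or $$\partial f/\partial y_i$$ with $$y_i = g_i(x)$$) is evaluated at the inner values, then multiplied by the derivative of the corresponding inner function.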