Talk:Numerical differentiation

Generalizing the 5 point method
Does anyone know how to generalize the 5 point method using $$h_1$$, $$h_2$$, $$h_3$$, and $$h_4$$? I have a series of points in Latitude, Longitude, and Altitude and I'm trying to get an accurate estimation of heading at each point. Yoda of Borg 22:37, 6 September 2007 (UTC)

The file from Numerical Recipe is encoded
Does anyone know the password?--plarq 14:12, 9 March 2009 (GMT+8)

Higher order methods
I think you're supposed to divide by 20*h, not 12*h. — Preceding unsigned comment added by 194.127.8.18 (talk) 15:21, 30 August 2011 (UTC)

Three point?
For the finite difference formula there is mention of "three-point" when actually, only two points are involved, namely at f(x) and f(x + h) or alternatively at f(x - h) and f(x + h); in particular this last does not involve the point f(x). NickyMcLean (talk) 02:25, 7 August 2012 (UTC)
 * This one given for "difference methods" seems ok. I think your query assumes that the derivative is based on a quadratic fit (Lagrange polynomial ) for which I think there should be the alternate form given below. See http://www.shodor.org/cserd/Resources/Algorithms/NumericalDifferentiation/ http://www.nd.edu/~zxu2/acms40390F11/sec4-1-derivative-approximation-1.pdf - I dont know math markup, perhaps someone can add it to the article. Shyamal (talk) 03:44, 7 August 2012 (UTC)

-3*y(x) + 4*y(x+h) - y(x+2h) y'(x) = -- 2*h $$y'(x) = {-3y(x) + 4y(x + h) - y(x + 2h) \over 2h}$$ seems to work despite not being {...}over{...} or similar so I'm not sure about the parsing of more complex formulae. I followed the example earlier in the article by activating "edit" and seeing what gibberish was employed. Anyway, although your reference does indeed apply the appellation "three point" to $$y'(x) = {y(x + h) - y(x - h) \over 2h}$$ and also to the formula you quote, one manifestly has two points and the other three. Both do indeed span three x-positions (x - h, x and x + h) but one definitely uses three points, and the other two. So, despite your reference, I am in favour of counting correctly! Also, I strongly favour slope estimations that straddle the position of interest, so the three points should be x - h, x, and x + h for the slope estimate at x. NickyMcLean (talk) 21:38, 7 August 2012 (UTC)

Comparison of two point and Five-point stencil error as a function of h
Thanks for the great work on the article so far. It would be good if the error by h also showed a 4 point method error by h in this fantastic graph. I am trying to work out if I should implement 4 point in my minimization problem and this would be a great help. Doug (talk) 18:32, 23 November 2012 (UTC)

Mixed partial derivatives
AFAIK they are computed by successively differentiating in each variable to the desired order. But does it always give the best approximation? A few words added to the article by experts would be helpful. 89.138.86.3 (talk) 15:56, 5 July 2013 (UTC)

Apparently, successively differentiating does not do the trick for central differences as I found this formula in a paper/ phd thesis (LaTeX code; "i" and "j" are indices):

% if f_{x}(i,j)*f_{y}(i,j) < 0

f_{xy}(i,j) = \frac{2*f(i,j) - f(i+1,j) + f(i+1,j+1) - f(i-1,j) + f(i-1,j-1) - f(i,j+1) - f(i,j-1)}{2*h^{2}}

% else

f_{xy}(i,j) = \frac{-2*f(i,j) + f(i+1,j) - f(i+1,j-1) + f(i-1,j) - f(i-1,j+1) j(i,j+1) + f(i,j-1)}{2*h^{2}}

However, the authour does not mention the consistency. The source is chapter 5 of "From Pixels to Regions: Partial Diﬀerential Equations in Image Analysis" by Thomas Brox.

Another source (I can not recall the name) suggested successive differentiating in case of calculating a forward difference (or more correctly: forward-forward difference, as differentiating was done "forward-direction" for each dimension). It seems to work, but I do have my doubts and therefore would be grateful to see a section concerning forward- and backward-differences in multiple dimensions. — Preceding unsigned comment added by 188.97.251.186 (talk) 17:35, 12 July 2014 (UTC)

I do have to correct what I wrote earlier, as successive differentiation seems to work. "Numerical Recipes" states that e.g. for two dimensions, the mixed partial derivative is (LaTeX code; "i" and "j" are indices, h is grid size):

f_{xy}(i,j) = \frac{(f(x+h,y+h) - f(x+h,y-h)) - (f(x-h,y+h) - f(x-h,y-h))}{4*h^2}

Evidently, successive derivation with a numerical scheme gives the same result. — Preceding unsigned comment added by 188.97.251.186 (talk) 19:42, 12 July 2014 (UTC)

code in section "Practical considerations using floating point arithmetic" is incorrect?
My criticism on the code in section "Practical considerations using floating point arithmetic": 1. There is no reason to assume Sqrt(epsilon)*x is a good estimate of the ideal step size. Indeed it should be 2 * Sqrt(epsilon) * Sqrt(|f(x)|) / Sqrt(|f''(x)|). See source [A], lemma 11.13, page 263. 2. Any sensible method for numerical derivation should be scale independent. Yours is not because you explicitly use max(x, 1) in your calculations. Why do I think scale-independence is important? Say 'x' is of units millivolts and your algorithm yields good results. Then you decide to change 'x' to megavolts in your data. Thus all values of 'x' get rescaled by 10^-6. But then your algorithm gives strikingly different precision?! This clearly should not be true for floating point numbers. 3. I don't see why it should improve accuracy of your results when you calculate (x+h-h). IMHO this adds further round off errors to your total result. If you think what I say is wrong, please add external sources, which support your algorithm and derivations. IMHO I assume you have never read a book on numerical differentiation and you invented your algorithm from scratch. The correct procedure for computing the ideal step size can be read in source [A], Procedure 11.15 on page 265. Having read source [A], I personally find your explanations more misleading than helpful.

Source [A]: — Preceding unsigned comment added by 141.84.69.77 (talk) 20:08, 20 October 2015 (UTC)


 * I don't know where the max(x,1) in place of x came from and it is a puzzle as indeed it introduces a fixed scale. The reference and formula you quote requires knowledge of f" which is often available in example problems. In a more general situation, as when numerical methods are required, this would be unknown. An ad-hoc argument would be that sqrt(eps)*x reserves about half the digits of precision to the difference. The part about (x + h) - h, as computed is real enough for limited-precision floating point arithmetic, as (x + h) will be rounded/truncated to the limited precision used for (x + h), remembering that h is much smaller than x and will have lots of non-zero low-order bits/digits that did not meet those of x when the binary/decimal points were aligned for the addition. NickyMcLean (talk) 12:00, 24 October 2015 (UTC)


 * First of all, thanks for your reply. Well, after rethinking, I agree that (x+h-h) might actually improve your accuracy. In any case it is rather insignificant, since the relative error h/(x+h-h)-1 should be of order sqrt(eps); and by your argument your error is of that order anyway. But leave that aside, and reconsider the algorithm given in source [A]. It is *not* true that the procedure works only for known sample functions. (read carefully procedure 11.15 on page 265)! The idea is to first make a rough estimate for the second derivative. Then this is used to compute the step size. From what you wrote, I see that you accept, that the ideal step size should indeed be as given in source [A], lemma 11.13. Now, observe, that your step size is (apart from the max(x,1)) differs by a factor of sqrt(f/f ' ' ). This means, that here you introduced *another* fixed scale. Essentially your algorithm works best, if f/f ' ' is in the order of 1. If, now f(x) is of units 'meter' and x is of units 'second', then f/f ' ' is of units second^2. This means, you arbitrarily fixed a constant of '1 second^2'. E.g. rescaling the data to 10^V*seconds (e.g. microseconds or so) will worsen your number of significant bits by 2*log_2(10)*V. I consider this a pretty serious flaw!!  — Preceding unsigned comment added by 141.84.69.77 (talk) 17:23, 27 October 2015 (UTC)


 * After looking through the revision history, I see that the max(x,1) was introduced in the edit of the fifth of January 2015 with the remark "Fixed h calculation so that it works for small x", that appears to be the only contribution from Wiki Kaczor. My puzzlement at it is not reduced thereby. I am not worrying about underflowing the floating-point number system (on some systems, (x very small)*(small) produces zero) so if x is small there should be no difficulty. I have removed the max(x,1) from the W. article. Your referenced article contains statements such as "suppose f has continuous derivatives up to order two near a" and so on. For the example function this is so, and f" is computable, and is computed, and an optimum value for h devised for the example function at the example position. Because the error in the approximation of the tangent by the secant can be bounded, essentially because it is determined by the deviation of f' from linearity, which is f" being non-zero and bounds on f" can be devised, as in max|f"(x)| over some region near a such as [a,a + h]. But notice the remark "all h in the interval [3.2E-9, 2E-8] give an error of about the same magnitude". (p. 263) And notice that eps (for double precision) is given as 7E-17 so that sqrt(eps) is 8E-9 - though the W. article suggests eps = 2.2E-16 so that sqrt(eps) = 1.5E-8. Since the values for x are in the example for 0.5, sqrt(eps)*0.5 gives a similar value. But via a different route.
 * Dimensional analysis: h = 2*sqrt(eps*|f(a)|/|f"(a)|) - suppose f is a function of time, giving a position in metres at time t=x. Then f" is an acceleration, units metres/sec^2 so f/f" has units metres/(metres/sec^2) or sec^2 as you say, and sqrt reduces that to sec, suitable for adding h to x. Now consider h = sqrt(eps)*x Here, eps is still a "pure number", so sqrt(eps) is likewise, and so h also has the same units as x. In the general case where f is unknown, one could say that its value might just as well be around one, and likewise, f" is unknown, and who knows, its value might well be around one also so that f/f" would be of the order of one... Now consider the example f(x) = sin(x), but for a = 100, or so, not a = 0.5. Since sin(x) is periodic, the ratio f/f" will be about the same (presuming one does not chance on f" = 0, etc.) and so h will be the same. But for h = sqrt(eps)*x, its value will be a hundred times larger because x is a hundred times larger. Indeed, for this function, that would not be helpful as the period remains constant so that a large h might span a significant part of a period thereby producing a secant that poorly aligns with the tangent. Now consider a = 1000000. The difficulty now increasing is that the limited precision of floating-point arithmetic is starting to be eroded, and a still-small h will not be helpful. The larger h of sqrt(eps)*x will at least allow the calculation of x and x + h to remain well-separated, with a chance that f(x) and f(x + h) will have a good difference. For the example function of sin(x) this won't be good, but for an arbitrary function, one for which there is no knowledge of f" other than the pious hope that its values are not startling, if it is sensible to enquire about f'(1000000) there could be reasonable results. Though one should not "compute from the hip", this is the usual expedient and if nothing is obviously wrong, ...
 * In short, h = sqrt(eps)*x should work about as well as |f(x)/f"(x)| is approximated by one. Notice that in the example f(x) = sin(x), the result of h = 2*sqrt(eps*|f(x)/f"(x)|) for x = 0.5 equals 4*sqrt(eps)*x so there is just a factor of two between them. NickyMcLean (talk) 12:08, 28 October 2015 (UTC)
 * First, on your dimensional analysis: Well, I agree, that my statement, that <>, was incorrect. Nevertheless, let me clarify what I mean, by saying you fixed an arbitrary constant of "1 second^2" (say units of x is seconds). Take f(x) := 1 + (x - 1)^2 / a, and we want to compute the derivative at x = "1 second". The constant "a" is of units second^2, and "x" is of units second. If I use your algorithm, then the step size will always be h = sqrt(eps) * 1 second. However, as shown in source [A], the ideal step size should be h = 2*sqrt(eps*|f(x)|/|f"(x)|) @ x = 1 second, which is h = 2 * sqrt(eps) * sqrt(a). In your algorithm the step size is independent of the constant "a", while in my case it is proportional to sqrt(a). In both, your algorithm and my algorithm, h is of units seconds. The point is rather, that you ignore the constant "a" in the given sample function "f". The result is, that you have a lower number of significant digits, that I have (as shown in the paper: the ideal step size is = 2*sqrt(eps*|f(x)|/|f"(x)|)). In the extreme case, when '1/a' is scaled towards infinity, you will get grossly incorrect results. That's what I meant when I said you introduced another fixed scale: Given this function f(x), your algorithm is tuned for a=1. The algorithm from source [A] works for *any* a. This is a fixed scale!
 * My conclusion is: *** Mathematically speaking, your algorithm fundamentally assumes, the second derivative to be of the same order as x^2. *** And this should not be assumed.
 * Second, about the unknown f": It is **NOT** true, that the algorithm from source [A] requires a known (in the sense of: can be computed non-numerically) f". I think you missed the point from my last post. I will requote myself: <<""It is *not* true that the procedure works only for known sample functions. (read carefully procedure 11.15 on page 265)! The idea is to first make a rough estimate for the second derivative. Then this is used to compute the step size."">>
 * Finally, let me come back to your example with sin(x). You said QUOTE: <<""Notice that in the example f(x) = sin(x), the result of h = 2*sqrt(eps*|f(x)/f"(x)|) for x = 0.5 equals 4*sqrt(eps)*x so there is just a factor of two between them."">>. Well, yes, for your arbitrarily picked value of x. The point is, if x is scaled to infinity as x = pi/4+2*pi*N, with N->infinity, then from source [A] we will always get the same h, for any value of N. In your algorithm, you get an increasingly larger step size h. But this is just the best example to make my point: Your step size can deviate *arbitrarily much* from the ideal step size. E.g. you can always pick an N, such that the factor between your h and the ideal h is as large as you want.
 * I agree, that your algorithm may give reasonably good results for reasonable functions. But so does the naive algorithm, which just computes (f(x+K)-f(x))/K with a hand-tuned number of K. The point is, that in numerical algorithms we want to avoid any fixed scale, and extend it in a way, that treats the a most general class of functions nicely, without hand-tuning. And your algorithm will requires hand-tuning for certain classes of functions, as I have shown in the first paragraph. — Preceding unsigned comment added by 141.84.69.77 (talk) 21:51, 29 October 2015 (UTC)
 * REMARK: I do unterstand the point of "Wiki Kaczor". As you said, floating point numbers can underflow. And indeed they will underflow, on *any* system that has a fixed number of digits in the exponent [not as you said on "some systems"]. In particular, of course, on all IEEE-compatible systems. Moreover, what happens if you invoke the algorithm with exactly x=0? (Not so unreasonable assumption). Then it will fail with division by zero. — Preceding unsigned comment added by 141.84.69.77 (talk) 22:02, 29 October 2015 (UTC)


 * Err, the formula in the W. article is not my formula. Someone else presented it, without attribution. I agree that it will have trouble with x = 0. The usual meaning for eps is "the smallest value such that 1 + eps ≠ 1" so the sqrt(eps)*max(x,1) offers protection against x = 0, and sqrt(eps) would be adequate for x = 0. But consider what happens for x = 1E-20 for a perfectly sensible function where the x-values happen to be very small numbers (charge on an electron, say) - in this case the max(x,1) won't help at all and x+h and x-h will have different signs - crossing x=0 is probably not a good idea, as with log(x). To add to the confusion as zero is approached via ever-smaller numbers there is the possibility of graduated underflow, with non-normalised floating-point numbers. In this case, instead of the usual set of valid machine numbers for every power of two (or 8, or 16, or, decade for decimal) in a logarithmic manner, there now arises a set of machine numbers with constant spacing all the rest of the way to zero, and these can be considered as having lesser precision. Like, in decimal with a three-digit mantissa, valid mantissas might be 0.100 to 0.999 and the smallest power be 10^-9. Thus the previous decade would have the same set of mantissa values but a power of 10^-8, and so on upwards. The next smallest number would be zero itself. But with the introduction of non-normalised numbers for the approach to zero, there would be 0.099, 0.098, ..., 0.009, 0.008, ... 0.001, at constant spacing and all with powers of 10^-9 because 10^-10 is not available.
 * This was the sort of complication that I didn't want to think about, and I have an Intel manual on the floating-point co-processor that speaks of "obscene algorithms", for cases where, driven by the requirement to give the best possible accuracy for all possible arguments, a once-simple algorithm becomes obscenely intricate. There may be those who find such "staged underflow" helpful but I prefer to think about "exponent underflow", though some systems merely step straight to zero.
 * Thus, in the general situation there is difficulty in devising criteria that work for very small numbers, numbers around one, and very large numbers, with talk of "absolute error" and "relative error" and the possibility of different choices in different circumstances, as with the max(x,1) solving one problem (in one circumstance, x = 0) but introducing another (for x small) while causing no difficulty for x > 1. Now imagine if x was the result of some computation so that f'(x) was of interest, and x came out as zero (because of underflow) when in fact it should have been a very small number - I think such problems should be rescaled so that this doesn't happen! In any single circumstance, specific decisions can be made with confidence. Similarly, method A can work better in one trial or class of trial, and method B in another.
 * Your reference gives a formula that works well for any function, at any x, and involves the value of f(x), which is perfectly acceptable. But it also requires some knowledge of f"(x) and in a general context, this is unavailable. Yes, the procedure om p.265 is comprehensive: 1) interpolate f(x) by a polynomial p(x) at "suitable points", presumably at steps of h. 2) Approximate f'(x) by p'(x). This is straightforward, but now comes the effort: 3) produce Taylor series for f(x) and 4) investigate the roundoff and truncation errors in the calculation of f' (relative or absolute, to taste) with consideration of eps, thereby determining an optimum value for h, to be used for f near x in the first step. The result will be a value, plus knowledge of its accuracy, plus confidence in its correctness. This is good.
 * None of this is going to be done by someone who is satisfied by a linear polynomial for calculating p' from two points as in the W. article, or is faced by a function (or even just a set of data values with no function) that is too complex to produce f', let alone f" with reasonable effort. The ad-hoc sqrt(eps)*x will work well enough for cases where it works - not x = 0, as you say, and one ought not be attempting numerical differentiation of sin(x) anyway, especially for large x.
 * But since the good formula has a reference as well as merit, I'll introduce it to the article... NickyMcLean (talk) 10:25, 30 October 2015 (UTC)

Theorems? Proofs?
It would be useful to have a convenient link to material that shows what theorems underlie multipoint approximations and proof that such approximations are theoretically correct. For example, proof that 1) If the derivatives to be approximated exist then the multipoint approximations converge to them and 2) if the multipoint approximations converge to a limit then the derivatives exist (in the cases where such claims are true!). The reference http://mathfaculty.fullerton.edu/mathews/n2003/NumericalDiffMod.html used in the current article has links to some proofs that are pages "under construction".

Tashiro~enwiki (talk) 15:12, 8 March 2017 (UTC)

Which compilers optimize away mathematics?
In the "step size" section, it says "However, with computers, compiler optimization facilities may fail to attend to the details of actual computer arithmetic and instead apply the axioms of mathematics to deduce that dx and h are the same. With C and similar languages, a directive that xph is a volatile variable will prevent this." Which compilers does this apply to?Nice44449 (talk) 20:44, 23 September 2022 (UTC)


 * Most of them, potentially. It is difficult to be specific about generalities. One hard-line approach is that the compiler always compiles arithmetic expressions as they stand (according to the rules for order of evaluation) and so x2:=x1 + h; dx:=x2 - x1; will be evaluated as expected. But many compilers provide "optimisation" options, often with multiple levels of optimisation on offer, and these involve all manner of potential alterations so that (for example) x/10 might become x*0.1 because multiplication is deemed less costly in time than division. Except, in binary, 1/10 is a recurring sequence and not exactly represented, though in binary x/4 and x*0.25 would be safe. Some levels of optimisation would allow this change and others not, perhaps. With regard to the problem, the key is "re-ordering" or "sub-expression re-ordering" or similar descriptions such as "Allow Reordering of Floating Point Operations". Other phrases are "local optimisations", "global optimisations", "optimise for speed", "Maximum Optimisations", and so forth. Some compilers may allow individual options to be selected or not, possibly even with variations over different portions of the source code.
 * Another trap involves different precisions between arithmetic registers and storage formats, and this is routine on IBM pc and similar computers. Notably, the 8087 floating-point processor (and its followers) does floating-point calculations in 80-bit registers (with, during an operation such as +, -, *, /, three additional special bits), while values in memory are 32 or 64 bit numbers. The sequence x2:=x1 + h; dx:=x2 - x1; will be evaluated by an 80-bit register holding the result of (x1 + h) and that value will be truncated/rounded for storage in x2, a 32 or 64 bit variable - though some languages enable the use of 80-bit variables as well as the usual 32 and 64-bits. Anyway, in the next statement, x2 might be loaded from memory (thus eliciting its 32 or 64 bit value), OR, the compiler, its analysis recognising that the register already has the value of x2, will not generate code to load x2, instead re-using the 80-bit value still in the register. This will affect the value of dx when what is wanted is the actual value of (x2 - x1), those 32 or 64 bit numbers being the values supplied to the function via F(x1) and F(x2). And still further optimisations could result in the compiler generating code to pass parameters to the function via register-held values, as in "fast procedure calls". It's a foggy field. NickyMcLean (talk) 12:04, 22 November 2022 (UTC)

India Education Program course assignment
This article was the subject of an educational assignment at College of Engineering, Pune supported by Wikipedia Ambassadors through the India Education Program.&#32;Further details are available on the course page.

The above message was substituted from by PrimeBOT (talk) on 20:07, 1 February 2023 (UTC)