Talk:Savitzky–Golay filter

Possible renaming
The title of this article is wrong. S-G's paper also gave coefficients for numerical differentiation. Petergans 16:48, 24 March 2007 (UTC)


 * Hi Peter,
 * Would you like to suggest a more appropriate article title? Perhaps I'm wrong, but I get the impression that you have access to the paper. I did not (and still do not), but rather came across the filter in a software package. It filtered some data for me really well. I was so impressed I wanted to popularise the paper in some way. Comments appreciated etc...
 * Cheers, Colm Rice 12:57, 22 April 2007 (UTC)


 * What about just Savitzky–Golay filter? It's not taken, it's less verbose, and it's not likely to be confused with anything else... --naught101 (talk) 05:00, 9 October 2013 (UTC)
 * ✅ --CapitalR (talk) 07:24, 1 January 2014 (UTC)

I'd like to make a major (by volume :) addition
I've done a significant amount of work on optimal computation and could use some help in getting it into Wikipedia. First I'd like someone to review it! I do have a copy of the paper somewhere, and might add a synopsis of it to the main page if I can dig it up. Either of you interested enough to talk with me? Prenatal (talk) 02:34, 16 July 2008 (UTC)

Order of derivative
The article says "Methods are also provided for calculating the first up to the fifth derivatives." Is there any particular reason this is limited to 5th derivatives? I would expect ill-conditioning at some order, but I would expect that to be hardware dependent, rather than a hard limit of 5. There are also limits where you should pick a higher-order fit polynomial for higher derivatives, but again, that's not a hard limit. My intro to this filter was from "Numerical Recipes in C", 2nd ed., sec 14.8. —Preceding unsigned comment added by 70.234.254.126 (talk) 18:44, 14 October 2009 (UTC)
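There is indeed no hard limit at the fifth derivative: coefficients for any derivative order up to the polynomial degree fall out of the same least-squares construction. A sketch in NumPy (not taken from the article or the original paper; the function name is my own):

```python
import math
import numpy as np

def savgol_coeffs(window, polyorder, deriv):
    """Convolution coefficients for the deriv-th derivative at the centre
    of a window of `window` points, fitting a polynomial of degree
    `polyorder`. Assumes unit spacing (divide by h**deriv otherwise)."""
    m = window // 2
    z = np.arange(-m, m + 1)
    # Design matrix J with J[i, j] = z_i ** j
    J = np.vander(z, polyorder + 1, increasing=True)
    # Row `deriv` of the pseudoinverse gives the coefficient a_deriv of the
    # fitted polynomial; the derivative at z = 0 is deriv! * a_deriv.
    return math.factorial(deriv) * np.linalg.pinv(J)[deriv]

# A 7th derivative from a 9-point, degree-8 fit is perfectly well defined;
# conditioning, not a rule, is the only practical limit.
c7 = savgol_coeffs(9, 8, 7)
```

The only caveat is numerical conditioning of the Vandermonde-style design matrix at high degrees, which is hardware/precision dependent, exactly as the comment suggests.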

Example, please, for the stupid?
I looked at the fundamentals of statistics link; incomprehensible. The Anal. Chem. papers are paywalled — Preceding unsigned comment added by 68.236.121.54 (talk) 17:09, 23 April 2012 (UTC)

Alternate explanation
You might look at BRL Report 863 by L.S. Dederick, 1953. It describes the same algorithm. It is on the DTIC website. — Preceding unsigned comment added by 69.251.18.149 (talk) 17:32, 15 May 2012 (UTC)

Formulas are needed
Many might well come here for the explicit formulas. As it stands the article is lacking the most important aspects. The remark on Dederick's paper is interesting, direct link: http://www.dtic.mil/docs/citations/AD0021009. — Preceding unsigned comment added by 94.196.247.207 (talk) 23:24, 17 November 2012 (UTC)

Revised lead
Firstly, S&G did not invent the process. It goes back into the 19th century, and is described in the first edition of Whittaker & Robinson, which I have seen but cannot verify precisely. Guest has tables of convolution coefficients. I have removed the reference to the moving average because it's just the same as a SG filter: an n-point linear smoothing filter has coefficients 1/n. I assume that S&G regarded this as too trivial to include in their paper. I've also removed reference to "advantages" because the comparison with the moving average is irrelevant, for the reason mentioned in the last paragraph. There are some interesting properties, described and proved in appendix 7 of my book "Data Fitting in the Chemical Sciences", including the property that smoothing leaves the area under a function unchanged. These properties might well go into the body of the article. Petergans (talk) 12:30, 6 May 2013 (UTC)
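Both claims above (a moving average is just the linear-fit SG filter, and smoothing preserves area because the coefficients sum to one) are easy to verify numerically; a quick check, not drawn from either cited book:

```python
import numpy as np

n = 7
z = np.arange(-(n // 2), n // 2 + 1)

# Degree-1 SG smoothing: fit a straight line, evaluate at the centre point.
J = np.vander(z, 2, increasing=True)     # columns: 1, z
coeffs = np.linalg.pinv(J)[0]            # row giving a0 = value at z = 0

# Claim 1: identical to the n-point moving average, all coefficients 1/n
moving_average_equal = np.allclose(coeffs, np.full(n, 1 / n))

# Claim 2: the coefficients sum to 1, so a constant signal (and hence the
# total area of a sampled signal) is left unchanged by the smoothing pass
area_preserved = np.isclose(coeffs.sum(), 1.0)
```

The first claim holds because, on a symmetric window, the mean of z is zero, so the fitted intercept is just the sample mean.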

minus signs
A minus sign in non-TeX mathematical notation looks different from a stubby little hyphen:
 * -86
 * −86

I fixed a bunch of these. See WP:MOSMATH. Michael Hardy (talk) 14:49, 2 June 2013 (UTC)

Merged with theory
The article has been merged with Numerical smoothing and differentiation. After a week or two, I shall ask for that article to be deleted. In the meantime, any comments will be welcome.

Please check for spelling errors and typos. Petergans (talk) 20:30, 30 July 2013 (UTC)
 * The old article has not been deleted, but has been changed into a redirect. Petergans (talk) 11:56, 11 October 2013 (UTC)

Weighted filter
The article says: "When weights are not all the same the normal equations become
 * $$\mathbf a = \left( \mathbf {J^T W} \mathbf{J} \right)^{-1} \mathbf{J^T W} \mathbf{y}\qquad W_{i,i} \ne 1$$,

These equations do not have an analytical solution. To use non-unit weights, the coefficients of the fitting polynomial must be calculated individually for each data sub-set, using local regression."

While technically true, this seems to imply that weighted SG is a "nonlinear" filter (coefficients depend on the data, e.g. like the bilateral filter), which is misleading. If we write a = Chat*y, then the weighted SG filter is directly analogous (isomorphic) to the standard version, just with the Hat Matrix Chat instead of 'C', the matrix mentioned above in the article (i.e. C = Chat[I], where I is the identity matrix). In other words, the convolution coefficients are the elements of Chat (more precisely, the filter bank = the rows of Chat, one for each polynomial coefficient).

Since dChat/dy = 0, the filter is still linear. If the x data is regularly spaced, as in SG, then the W matrix is typically independent of window-position x0 (e.g. for standard moving least squares weight kernels, where w=w[x-x0]). Hence Chat can be formed numerically once, before seeing the data. Therefore weighted SG is much more efficient than the general case of local regression, where the x data is irregularly spaced, and hence Chat=Chat[x0].
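A NumPy sketch of the point being made (the name `Chat` follows the comment above; the Gaussian kernel is just an example weighting, not anything prescribed by the article):

```python
import numpy as np

m, degree = 3, 2
z = np.arange(-m, m + 1)
J = np.vander(z, degree + 1, increasing=True)
w = np.exp(-0.5 * (z / m) ** 2)          # example Gaussian weight kernel
W = np.diag(w)

# Formed once, before any data is seen: Chat = (J^T W J)^{-1} J^T W
Chat = np.linalg.solve(J.T @ W @ J, J.T @ W)

# Each row of Chat is a fixed set of convolution coefficients, one row per
# polynomial coefficient, so a = Chat @ y is the same linear map for every
# window position -- the filter is still linear.
y = np.random.default_rng(0).standard_normal(2 * m + 1)
a = Chat @ y
smoothed_centre = a[0]                   # fitted value at z = 0
```

Since `Chat @ J` is the identity, the filter reproduces any polynomial of the fitted degree exactly, whatever the weights, which is one quick sanity check on the construction.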

I tried to edit the article yesterday to reflect this, but my edit was then removed. I am not sure why, but perhaps my (much more concise) edit was unclear? If this comment makes sense ... then perhaps the article can be updated now to reflect the above? (I am open to alternative wording, the main point is to prevent any misunderstanding.)

BTW I can provide Matlab code to demonstrate the Chat computation if desired ...

An interesting aside: weighting can help make higher-order SG filters less ill-conditioned (Runge's phenomenon) ... I've used e.g. 9th order, but regularized with a Gaussian weighting kernel. — Preceding unsigned comment added by 99.119.88.209 (talk) 13:57, 24 October 2013 (UTC)
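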
 * I reverted the edit because it appeared to be factually incorrect. The coefficients matrix is $$ \mathbf{C=(J^TJ)^{-1}J^T}$$ whereas the hat matrix is $$ \mathbf{\hat H=J(J^TJ)^{-1}J^T}$$
 * I don't understand why you think that non-linearity is implied. As you say, correctly, by subsuming the weights into the relevant variables a standard linear expression is obtained. With a factored weight matrix, $$\mathbf{W = w^Tw}$$, the normal equations can be written as
 * $$\mathbf{ a = \left( (wJ)^T (wJ) \right)^{-1} (wJ)^T (wy)}$$
 * Where is the possibility of misunderstanding? In a non-linear system the parameters, a, have to be calculated by iteration. The non-iterative expression
 * $$\mathbf{ a = \left( J^TWJ \right)^{-1} J^TWy}$$
 * is only possible with a linear system. I'm not familiar with the Matlab routine, but I see a potential clash of nomenclature in the way the term Chat is used. Petergans (talk) 14:59, 24 October 2013 (UTC)
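The algebraic identity between the two expressions above is easy to confirm numerically for a diagonal W (a quick check with arbitrary data, not tied to any particular filter):

```python
import numpy as np

rng = np.random.default_rng(1)
J = np.vander(np.arange(-2, 3), 3, increasing=True)
y = rng.standard_normal(5)
wd = rng.uniform(0.1, 2.0, 5)            # positive diagonal weights
W = np.diag(wd)
w = np.diag(np.sqrt(wd))                 # factored form, W = w^T w

# a = (J^T W J)^{-1} J^T W y, solved directly ...
a_direct = np.linalg.solve(J.T @ W @ J, J.T @ W @ y)

# ... and via the factored normal equations a = ((wJ)^T wJ)^{-1} (wJ)^T (wy)
wJ, wy = w @ J, w @ y
a_factored = np.linalg.solve(wJ.T @ wJ, wJ.T @ wy)
# The two solutions coincide, as the equations above state.
```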


 * Thank you for clarifying. I do not normally use the term 'hat matrix', so had just assumed it meant "matrix multiplying the RHS vector in the explicit solution of a linear least-squares problem". I can see how this might be unclear, though. The misleading part, I think, was "To use non-unit weights, the coefficients of the fitting polynomial must be calculated individually for each data sub-set". This is not true, really, since each set of coefficients is the same (unless boundaries are treated differently, e.g. with an offset stencil). Also, when I clicked through to the linked LOESS article, the example, and much of the text, emphasizes irregularly-spaced data and the consequent computationally-intensive nature of the method.
 * Re "this is not true" - To simplify the discussion, consider the case of diagonal weights: $$(\mathbf{J^TWJ})_{ij}= \sum_k J_{ki}W_{kk}J_{kj}$$. The weight for each data point is, in principle, different. It follows that $$\mathbf{J^TWJ}$$ is different for each sub-set of data points, even though the subsidiary variables, z, are the same. Of course the "weighted" coefficients can be calculated by solving the weighted normal equations. My point is that the solution cannot be expressed in analytical form and smoothing is no longer a convolution process because the coefficients vary from sub-set to sub-set. Petergans (talk) 08:51, 25 October 2013 (UTC)

Common sense
Please.

I found a huge number of instances of "Savitzky-Golay" (with a hyphen) instead of "Savitzky–Golay" (with an en-dash). And many many cases of things like -7 (with a hyphen) instead of −7 (with a minus sign). And lots of other things just as bad. Please look at WP:MOS and apply common sense. Michael Hardy (talk) 05:13, 2 January 2014 (UTC)


 * Most editors don't know the difference and don't care. Just fix them.  That's what I do.  Dicklyon (talk) 05:31, 2 January 2014 (UTC)

Appendix has incorrect coefficients
The table of Savitzky–Golay coefficients at the bottom of the article is all correct, except for the last two columns of "Coefficients for 2nd derivative". In particular, the normalization factors are incorrect: instead of 99 and 4719, they should be 1188 and 56628. — Preceding unsigned comment added by 184.9.20.9 (talk) 22:42, 6 April 2014 (UTC)

I agree with this comment. Based on 'General Least-Squares Smoothing and Differentiation by the Convolution (Savitzky–Golay) Method', Analytical Chemistry, Vol. 62, No. 6, Mar. 15, 1990, the last two columns are incorrect. A common factor of 3 can also be taken out of the third column. I am going to make the change and add a reference for where these numbers are calculated from (i.e. the paper cited above). --Gwiz42 (talk) 16:10, 10 May 2014 (UTC)
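For anyone wanting to re-check a disputed column independently of any published table, the integer numerators and the normalisation factor can be regenerated from first principles; a sketch (the function name and the rational-recovery loop are my own construction):

```python
import numpy as np
from math import factorial, gcd
from functools import reduce

def table_column(npts, degree, deriv, max_den=100000):
    """Regenerate one column of the convolution-coefficient table as
    integer numerators plus the common normalisation factor."""
    m = npts // 2
    J = np.vander(np.arange(-m, m + 1), degree + 1, increasing=True)
    c = factorial(deriv) * np.linalg.pinv(J)[deriv]
    # The exact coefficients are rationals with a common denominator;
    # find it by trial multiplication, then cancel any common factor.
    for den in range(1, max_den):
        scaled = c * den
        if np.allclose(scaled, np.round(scaled), atol=1e-8):
            nums = np.round(scaled).astype(int)
            g = reduce(gcd, (abs(int(n)) for n in nums), den)
            return [int(n) for n in nums // g], den // g
    raise ValueError("no small common denominator found")

# e.g. the 5-point quadratic/cubic second-derivative column
nums, norm = table_column(5, 2, 2)   # expect [2, -1, -2, -1, 2] over 7
```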

Extend the Table of Coefficients
Lacking access to S-G's original paper, it would be a kindness to extend the table at the bottom of the article to show the convolution kernels for smoothing and derivatives for window sizes 5 to 25. — Preceding unsigned comment added by Shg4421 (talk • contribs) 16:55, 24 September 2014 (UTC)
 * For most practical purposes the coefficients for quadratic/cubic polynomials are adequate. These coefficients can be calculated with the algebraic expressions given in the article. Petergans (talk) 08:11, 25 September 2014 (UTC)

Fair enough, thank you. Then perhaps we could have the same formulas for the linear/quadratic case? (And if I may, why "linear/quadratic" and "quadratic/cubic" instead of just quadratic and cubic?) Thanks again.
 * Good point. I'll work on it. It's a quirk of the system that, for example, smoothing coefficients are the same for both quadratic and cubic polynomials. A full explanation is given in Savitzky–Golay filter. It arises because the sum of odd powers of z is always zero. Petergans (talk) 07:55, 28 September 2014 (UTC)
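The quirk described above (quadratic and cubic fits sharing the same smoothing coefficients on a symmetric window) is easy to demonstrate; a two-line check, not from the article:

```python
import numpy as np

z = np.arange(-2, 3)
# Smoothing coefficients = the a0 row of the least-squares pseudoinverse
c_quad = np.linalg.pinv(np.vander(z, 3, increasing=True))[0]   # degree 2
c_cubic = np.linalg.pinv(np.vander(z, 4, increasing=True))[0]  # degree 3
# On a symmetric window the odd-power columns are orthogonal to the even
# ones (sums of odd powers of z vanish), so adding the cubic term leaves
# the a0 estimate, and hence the smoothing coefficients, unchanged.
same = np.allclose(c_quad, c_cubic)
```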

"It arises because the sum of odd powers of z is always zero." Eureka -- thank you. — Preceding unsigned comment added by Shg4421 (talk • contribs) 21:24, 29 September 2014 (UTC)

I'm working on an Excel add-in that provides user-defined functions for S-G filtering (free, unprotected workbook, with source code). I could put that on a file sharing site and post a link, but don't recall seeing any such links in other articles. Is that not done on Wikipedia? Thanks. — Preceding unsigned comment added by Shg4421 (talk • contribs) 21:30, 29 September 2014 (UTC)
 * Personal websites are not normally considered suitable external links, unless they meet WP:ELNO. As the intention would be to use this as a semi-reference, WP:SPS is also relevant.  In short, only if you are a recognised expert in the field. SpinningSpark 08:13, 30 September 2014 (UTC)
 * A long time ago I wrote a stand-alone application to calculate the S-G coefficients. The user specifies the polynomial degree (<=3), the derivative order (0, 1, 2 or 3) and the number of convolution points. If anyone is interested in acquiring a copy of the EXE file, I can be contacted at peter dot gans @ hyperquad dot co dot uk. Source code (Delphi, Pascal) can also be provided. Petergans (talk) 09:34, 30 September 2014 (UTC)

"Personal websites are not normally considered suitable external links, ..." That's fine, when it's fully cooked, I will do as Petergans did and invite people to send me an email if they're interested. "... only if you are a recognised expert in the field" Don't know what defines an expert, but I am a Microsoft MVP for Excel. Thanks to all for responding. — Preceding unsigned comment added by Shg4421 (talk • contribs) 15:16, 30 September 2014 (UTC)
 * As it says in the link I provided, a recognised expert is one "whose work in the relevant field has previously been published by reliable third-party publications." A Microsoft MVP in Excel is not sufficient. SpinningSpark 15:52, 30 September 2014 (UTC)

Table of coefficients for smoothing the second derivative (polynomial degree 2-3) differs from formula by a factor of 2
Using the formula for smoothing the second derivative (polynomial degree 2-3) results in values exactly half the ones listed in the table. I'd tend to believe the tables, since the formula for the m=3 case results in 1/2 of the standard central difference formula for the second derivative. But one of them - the formula or the table - should be corrected. I had no trouble implementing the other formulas up to polynomial degree 3, for values or first derivative. (I didn't check the 3rd derivative formula shown). Gmstanley (talk) 06:04, 5 August 2015 (UTC)
 * With the expression for a polynomial
 * $$Y = a_0 + a_1 z + a_2 z^2 \cdots$$
 * the second derivative D2Y/dx2 is equal to 2a2 at z=0. This is clearly indicated indicated in the section "Derivation of convolution coefficients"; indeed, the interval, h, between actual data points must also be taken into consideration.
 * $$\frac = \frac{1}\left( {2a_2 + 6a_3 z} \right) = \frac{2}{h^2}a_2 {\text{ at }}z =0, x=\bar x $$
 * Petergans (talk) 21:42, 6 August 2015 (UTC)


 * The article gives the strong impression that the formulas are for calculating the convolution coefficients (using that terminology directly above the formulas for C0i, C1i, ... ). The tables are explicitly stated as presenting convolution coefficients. It's natural for readers to assume they are the same.  But they aren't the same - they differ by a factor of n!, where n is the derivative order, affecting calculations for convolution coefficients for 2nd derivative and above.  I think this needs to be more explicitly mentioned. It's not really enough mentioning that the derivation starts with polynomials with coefficients a0, a1, a2, ... and a statement about their derivatives .  The reader doesn't see all the details of what happened to that factor of 2 for the second derivative.  They just see the end result - the convolution coefficients, called C0i, C1i, ... .  They would naturally assume that the factor of 2 is built into the convolution coefficients for the derivatives. In the appendix, the tables show the convolution coefficients for the second derivative for 5 points (cubic) as ( 2/7, -1/7, -2/7, -1/7, 2/7 ) .  The example just above the tables makes their usage clear:  the second derivative is simply the dot product of the data values with the convolution coefficients, divided by the h^2 term to convert to normal time units.  There's no factor of 2 in that example, and that's fine because the table values already include that factor of 2.  But the convolution coefficient calculations by the formula come out to (1/7, -.5/7, -1/7, -.5/7, 1/7), because they don't include the n! factor.  Why not make the formulas calculate exactly the same coefficients as the table?  Many users of these formulas (obviously me, at least) would expect that the smoother will simply be the sum of products (dot product) of the convolution coefficients and the data -- that's the whole point of SG - to make things simple. 
And rather than put in extra explanatory words about the differences between two slightly different convolution coefficients, it would be simpler to modify the formulas to make them match the coefficients presented in the tables.    Gmstanley (talk) 07:52, 7 August 2015 (UTC)
 * I have checked that the numbers shown in the table for the 2nd derivative are the same as on p. 1634 of the paper published by Savitzky and Golay. It would be very confusing if we showed different numbers in Wikipedia. Petergans (talk) 12:55, 7 August 2015 (UTC)


 * Yes, I also verified that the tables in this article and in the original SG paper match, up through polynomial degree 3, up through derivative order 3. I also verified that when used with known polynomial inputs y as data, using the tables results in exact fits for polynomials and their analytically-exact derivatives, up to polynomial degree 3, up through derivative order 3,  where exact matches would be expected (at least for up to 13-point filters). Those fits use the tables without any correction for n!, where n is the filter derivative order. (Which is what I would expect, and is consistent with the example shown for the 2nd derivative in the Appendix, where the n! factor is not needed.)  The tables are fine, and consistent with the published papers, and consistent with the simple application that people would expect - just multiplying the convolution constants by the data values.


 * The only problem is just that the formulas in the "Algebraic expressions" section for 2nd and 3rd derivative do not calculate the table entries as I think most readers would expect. Both are convolution coefficients, and the only difference is that the 2nd and 3rd derivative formulas need to be multiplied by n! Since the tables are correct and hence should not be changed, why not fix the formulas to match the coefficients in the tables?  The derivation of the formulas is in Madden, reference 13 -- a link to a pdf that fails, so most people couldn't easily or cheaply see how the formulas were derived.  (I don't have that paper, either).  If the Madden formulas don't include the n!, you could simply put in a note just like your note 4, explaining that the difference between the Madden version of the formulas and the Wikipedia version is that it includes the n! term to generate the final, desired result of the tables.  I appreciate that you put in the comment near the top of the page about n!/h^n to start to address the problem.  But, I think that's actually confusing -- it doesn't apply at all to the formula just above it, it isn't really needed there.  Any modifications or notes are needed in the "algebraic expressions" section.


 * BTW, this is really a great article - I just think it would be even better if the formulas directly generated the table entries.  Gmstanley (talk) 23:00, 8 August 2015 (UTC)
 * You are right. The algebraic formulas have been adjusted to give the same coefficients as in the tables.
 * Given your interest, I wonder if you have an application that could be included at the top of the article. Petergans (talk) 10:50, 9 August 2015 (UTC)


 * Thank you for your quick responses! I did once build a product for real-time fault detection & diagnosis that included a graphical language for signal processing and logic, including what we called for convenience a Savitzky–Golay filter. However, it was truly a filter, not the smoother in the SG paper. In the literature for control systems, optimal estimation for dynamic systems, etc., there's a clear distinction between the two: filtering provides the best current estimate (without knowing any future values, because they aren't available yet), while smoothing provides a past estimate for time k, knowing the future values after k. Obviously, in control systems, filtering is what's needed to reduce noise for state estimates, not smoothing. (Although smoothing is often fine for model parameter estimation.) Real-time diagnostics often needed to be almost as current. But the idea was the same as SG: for a fixed time step size, derive constants independent of the data. To see what those filters look like, you could look at the pages under "Least Squares Filter" and "Derivative Estimate" in a filtering guide at http://gregstanleyandassociates.com/whitepapers/FaultDiagnosis/Filtering/filtering.htm
 * The reason I was looking at SG smoothing recently was for a new project, so there are no references. BTW, I would suggest changing the "filtering" terminology for this article to be exclusively "smoothing". Even S&G generally used the "correct" term "smoothing", not "filtering". But admittedly, in general usage not involving real-time applications (photos, spectra, etc.), many people would not care about the distinction. Gmstanley (talk) 21:39, 9 August 2015 (UTC)
 * Scanning spectrophotometric data are often presented in the form of absorbance as a function of wavelength, but in reality the data points are generated in a time sequence. Such data cannot be distinguished from other data generated as a time sequence. The relation between smoothing and filtering in SG processing is shown in the section on "frequency characteristics". Petergans (talk) 07:39, 10 August 2015 (UTC)


 * I can't say that I see any discussion of smoothing vs filtering in the frequency characteristics section. The issue isn't just that frequency domain analysis can be done, it's causality. Fourier transforms include past and future data in their definition. But control systems can't use future data. That's why, to the extent that control systems are still designed by (linear) frequency domain methods, Laplace transforms are used rather than Fourier transforms (and the equivalent in discrete time is the z-transform). Laplace transforms honor causality - that the future can't have any impact on the present. Smoothing can use future data, but real-time filtering (as used in control systems when reducing noise) cannot. Discrete filtering evaluated at a given time k can only multiply by constants at times (k−1), (k−2), etc., not the positive times in the future (k+1), (k+2), ... . You get different frequency responses for filters than for smoothers. For instance, unlike the smoothing case, for a least-squares filter like SG, you get phase lead, and overshoot in response to a step input. (See the example in the web page I cited above.) I would think that the same applies when dealing with data at the end points of a smoothing window when future data isn't available, and special handling is required. When estimating the newest time value as an end point (instead of the central value in the m points) by using the same polynomial as for previous points, I'd guess you're probably going to get the same phase-lead behavior. I'm not sure I understand the spectrophotometric data example -- the issue isn't how the data is generated, it's when you run the evaluation. I would have guessed that those spectral analyses are done after the time sequence is completed, not part-way through, possibly in the middle of a peak (which would be the real-time/dynamic system analogy). Gmstanley (talk) 15:50, 10 August 2015 (UTC)
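The overshoot claim above is easy to reproduce: fit a straight line to the last five samples of a unit step and evaluate at the newest point (a toy check of my own, not taken from the cited page):

```python
import numpy as np

# Causal 5-point least-squares "filter": fit y = a0 + a1*z on z = -4..0
# and evaluate at the newest point, z = 0 (no future samples used).
z = np.arange(-4, 1)
J = np.vander(z, 2, increasing=True)
coeffs = np.linalg.pinv(J)[0]        # picks out a0, the value at z = 0

step = np.array([0.0, 0.0, 1.0, 1.0, 1.0])   # unit step mid-window
estimate = coeffs @ step
# The causal estimate exceeds the true value of 1 -- the overshoot /
# phase-lead behaviour absent from the symmetric (smoothing) case.
```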


 * (back to the convolution coefficients...) I saw the most recent change in the Appendix. The use of the table terms for the second derivative now shows multiplying those coefficients by 2.  From numerical experiments with actual quadratic and cubic polynomials where the derivatives are known exactly, I can assure you that your original formula was correct, not this newly-modified one. Those tables already included the n! term built into them. The only changes that were ever needed were in the formulas in the "algebraic expressions" section, and those were already made. Gmstanley (talk) 16:03, 10 August 2015 (UTC)
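This point can be settled with a two-line numerical check: applying the published 5-point second-derivative coefficients (2, −1, −2, −1, 2)/7 to samples of y = x² returns the exact second derivative with no extra factor of 2, confirming that n! is already built into the table entries:

```python
import numpy as np

c2 = np.array([2, -1, -2, -1, 2]) / 7    # table column: 5-point, quadratic
x = np.arange(-2, 3)
y = x.astype(float) ** 2                 # y = x^2, so y'' = 2 everywhere

d2 = c2 @ y                              # plain dot product, no 2! applied
# d2 comes out as 2.0, the exact second derivative (h = 1 here).
```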


 * The distinction is not as clear-cut as it appears. Take the case of a 5-point smoothing function. In standard usage this is applied at the 3rd (central) point, using one set of coefficients. There is an alternative, described under "Treatment of first and last points", in which the smoothed value at the 5th (last) point can be calculated using more than one coefficient set. The issue is about the nature of the fitting function. SG uses a polynomial fitting function. Other filters use different functions. Regarding spectrophotometers, this is real-time smoothing as the data are being produced. In the old, pre-digital, days it was done by means of a resistance–capacitance circuit. Petergans (talk) 08:17, 11 August 2015 (UTC)


 * Seriously, that extra 2 multiplier recently added to show how to use the tables for the 2nd derivative in the appendix section is wrong. It was right before.  Try it with known polynomials to verify.  — Preceding unsigned comment added by Gmstanley (talk • contribs) 01:24, 15 August 2015 (UTC)

Robust variant?
The Savitzky–Golay filter fits a curve by "by the method of linear least squares," which means that it minimizes the mean square error between the polynomial and the observations; this is however sensitive to outliers and might therefore not be suitable in all cases. Is there a robust variant of the Savitzky–Golay filter that instead minimizes the mean absolute error, or a Huber loss (or some other loss function with bounded derivatives), and what is that filter called in that case? I have found a paper called "Robust Savitzky-Golay filters," but that is behind a paywall so I cannot access it. —Kri (talk) 12:35, 23 October 2023 (UTC)
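One standard construction for this (a generic robust-statistics technique, not necessarily what the paywalled paper does) is iteratively reweighted least squares with Huber weights inside each window. Note that the result is no longer a fixed convolution, since the weights depend on the data. A sketch, with the function name and the tuning constant `delta` my own choices:

```python
import numpy as np

def robust_sg_window(y, degree=2, delta=1.0, iters=10):
    """Robust local polynomial fit on a single window via iteratively
    reweighted least squares (IRLS) with Huber weights; returns the
    fitted value at the central point. `delta` is the Huber threshold."""
    m = len(y) // 2
    J = np.vander(np.arange(-m, m + 1), degree + 1, increasing=True)
    w = np.ones(len(y))
    for _ in range(iters):
        W = np.diag(w)
        a = np.linalg.solve(J.T @ W @ J, J.T @ W @ y)
        r = y - J @ a
        # Huber weight: 1 for small residuals, delta/|r| beyond the threshold
        w = np.minimum(1.0, delta / np.maximum(np.abs(r), 1e-12))
    return a[0]

# A single spike barely moves the robust estimate, while the ordinary
# least-squares (plain SG) value is dragged well off the baseline.
spike = np.array([0.0, 0.0, 10.0, 0.0, 0.0])
```

Minimising mean absolute error instead corresponds to repeated-median or L1 local regression; the IRLS loop above converges to the Huber M-estimate and reduces to ordinary SG when no residual exceeds `delta`.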