Wikipedia:Reference desk/Archives/Mathematics/2020 October 19

= October 19 =

Linear estimation of a local minimum from discrete data
I have an array of data, say six (x,y) values, characterized by a minimum y value in the middle of the set, increasing toward the ends of the x range. (The x,y values in this case come from a volatility smile, which I have characterized as a superposition of two hyperbolic functions, but that isn't important.) Basically, I have discrete points on a smooth curve, and I don't want to know or care what the underlying curve is.

Is there a quick linear method for approximating the value of x that corresponds to the minimum of the smooth curve, without resorting to spline fits, other polynomial fits, or fitting my own data model (which doesn't have a closed form solution)?

For example, I can eyeball it easily enough:

[ASCII sketch: six asterisks on a skewed V-shaped curve, with an underscore between the two lowest points and a caret below it labelled "minimum likely here"]

I'm just wondering if there's something I can do, perhaps with a weighted average of the slopes of each segment, that might give me an approximation of the minimum without curve-fitting. Somebody must have figured this out. Is there a technique for this? ~Anachronist (talk) 06:46, 19 October 2020 (UTC)
 * Simply calculating the argmin from the data directly should suffice, right? This linear-time procedure will take less time than anything else I could suggest here. Otherwise, your answer will depend on what kind of smooth curve you select; you'd essentially need to find the root of the derivative, and numerical differentiation really only works here with interpolation of some sort.--Jasper Deng (talk) 07:16, 19 October 2020 (UTC)


 * You can fit a quadratic curve (or, for that matter, any polynomial of a fixed degree) in linear time. If you give the data points weights $$\gamma^{\,y_i}$$ for some $$\gamma$$ between 0 and 1, the higher values – presumably farther away from the minimum – get a lower weight, which is useful if the shape of the curve is poorly approximated by a parabola; that in the example looks more cuspy. (This method of using weights is insensitive to translations but sensitive to scaling of the $$y$$ values.) It will work poorly, though, if the curve has relatively flat ends. You can alternatively use a sliding window of a fixed width, say with indices $$[0\,\,..\,2m],~m\geq 1,$$ to keep track of a data segment of length $$2m{+}1$$. The $$i$$-th data point is stored at position $$i~\mathbf{mod}~2m{+}1.$$ After detecting the minimum in the stream of data à la JD, keep copying to the window for $$m{-}1$$ more data points; then stop. The discrete minimum is in the middle of the segment of the retained data points. Then use any method of curve fitting to find the minimum. --Lambiam 08:27, 19 October 2020 (UTC)
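The weighted quadratic fit described above might look like the following minimal sketch. The function name `weighted_parabola_vertex` and the default `gamma=0.5` are assumptions made for illustration, not part of the suggestion itself. The moment sums are each a single pass over the data, which is what keeps the fit linear-time.

```python
def weighted_parabola_vertex(xs, ys, gamma=0.5):
    """Weighted least-squares fit of y = p*x**2 + q*x + r; returns the vertex x = -q/(2p).

    Illustrative sketch: the weights are gamma**y with 0 < gamma <= 1.
    """
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need at least three (x, y) pairs")
    # Higher y-values (presumably farther from the minimum) get lower weight.
    w = [gamma ** y for y in ys]
    # Moment sums: one pass over the data each, so the cost is linear in len(xs).
    s = [sum(wi * x ** k for wi, x in zip(w, xs)) for k in range(5)]
    t = [sum(wi * y * x ** k for wi, x, y in zip(w, xs, ys)) for k in range(3)]
    # Augmented normal equations for the unknowns (p, q, r).
    m = [[s[4], s[3], s[2], t[2]],
         [s[3], s[2], s[1], t[1]],
         [s[2], s[1], s[0], t[0]]]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, 3):
            f = m[row][col] / m[col][col]
            for c in range(col, 4):
                m[row][c] -= f * m[col][c]
    # Back substitution.
    sol = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        sol[i] = (m[i][3] - sum(m[i][j] * sol[j] for j in range(i + 1, 3))) / m[i][i]
    p, q, _ = sol
    if p <= 0:
        raise ValueError("fitted parabola does not open upward")
    return -q / (2 * p)
```

Because the weights depend on the $$y$$ values themselves, rescaling $$y$$ changes the fit, as noted above; `gamma` would need tuning to the data.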


 * If you're suggesting that argmin is simply the minimum-value data point, no, that isn't what I need. In my crude illustration, I deliberately put two points (asterisks) at the same y-level, with different slopes on the left and right, to suggest that the minimum of the curve isn't right in the middle of those two lowest points, nor is it either one or the other point if the points are unequal by a tiny amount. In the illustration, the minimum (the underscore character) is closer to the left low point.
 * I guess I could restate the problem this way:
 * Given:
 * an unknown asymmetric curve having a first derivative that increases smoothly from negative to positive in an unknown non-linear way,
 * a sparse sampling of known (x,y) data points on that curve, equally spaced along the x axis,
 * a minimum y value in the range of (x,y) samples available,
 * what approach would result in a quick approximation of the x location (not the value) of the minimum of the curve?


 * The best fit curve for data points in a volatility smile appears to be a skewed hyperbola. I haven't found an efficient way to fit such a thing without resorting to iterative techniques. You've given me an idea with sliding windows and linear interpolation though. I need to think about it. ~Anachronist (talk) 18:56, 19 October 2020 (UTC)


 * This is closely analogous (though the other way up) to estimating the position of the mode when data have been classified, the construction involving a pair of crossed straight lines at the top of the modal class bar of the drawn histogram. A simple estimate, which you would have to judge the adequacy of, comes from projecting inwards the straight lines given by joining the data points closest to the minimum to the respective next ones moving outwards. → 2A00:23C6:AA08:E500:40DF:4AE9:718D:E2EA (talk) 13:14, 20 October 2020 (UTC)


 * If the derivative passes smoothly through zero, the curve can locally, around the minimum, be approximated by a parabola. If the spacing of the sample values of $$x$$ is so narrow that the segment involving the four points nearest to the sample minimum is very close to a parabola, then the $$x$$-value of the intersection of these lines is very close to the $$x$$-value of the sample minimum. So, if the curve is locally an exact parabola of the form $$y=a(x-m)^2+b,$$ and the data points have $$x$$-values $$..., -2, -1, 0, 1, 2,...,$$ the value found is $$\mathrm{round}(x)$$, so it is not a continuous function of $$m$$. Obviously, the method is perfectly suited for curves that are locally close to $$y=a|x-m|+b.$$ --Lambiam 22:18, 20 October 2020 (UTC)


 * A general equation for a hyperbola is given by $$(y-a_\mathrm{L}x-b_\mathrm{L})(y-a_\mathrm{R}x-b_\mathrm{R})=c.$$ The asymptotes are the pair of lines defined by $$y=a_\mathrm{L}x+b_\mathrm{L}$$ and $$y=a_\mathrm{R}x+b_\mathrm{R}.$$ The equation has five degrees of freedom, which means a given hyperbola is uniquely defined by any five points lying on that hyperbola. If the curve is indeed hyperbolic, the five parameters can be determined by the five data points around the sample minimum. There are two solutions; take the one with $$a_\mathrm{L}<0,$$ $$a_\mathrm{R}>0,$$ and $$c>0.$$ Warning: there is a risk of instability if the data points do not lie neatly on a hyperbolic curve. It is safer to use a larger window (say seven data points) and find a best least-squares fit. --Lambiam 10:04, 21 October 2020 (UTC)
 * Note that for five random points the conic that fits them may be an ellipse, or if it's a hyperbola then the points might be on different branches, so some care will be needed to ensure this doesn't happen. Assuming the slope is really increasing, in other words that the curve is concave up, then the second issue should never occur. You can tell if the first one occurs by checking the discriminant of the quadratic terms. (If the conic has the form $$Ax^2+Bxy+Cy^2+Dx+Ey+F=0$$ then the discriminant is $$B^2-4AC.$$) The conic will have two horizontal tangents; if it's a hyperbola then you want the upper one (assuming the slopes of the asymptotes have opposite signs) and if it's an ellipse then you want the lower one. If there's any uncertainty in your data then you might consider sticking with a parabolic model to avoid overfitting. The algebra involved with the parabolic model is much simpler as well, so if you don't need much precision then it may be more suited. --RDBury (talk) 12:30, 21 October 2020 (UTC)
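As a worked check connecting the two forms above, the asymptote-product equation can be expanded into the general conic form. Only the coefficient names $$A$$ to $$F$$ from the conic form are reused here; the expansion is plain algebra rather than anything stated in the thread.

$$(y-a_\mathrm{L}x-b_\mathrm{L})(y-a_\mathrm{R}x-b_\mathrm{R})-c = a_\mathrm{L}a_\mathrm{R}\,x^2-(a_\mathrm{L}+a_\mathrm{R})\,xy+y^2+(a_\mathrm{L}b_\mathrm{R}+a_\mathrm{R}b_\mathrm{L})\,x-(b_\mathrm{L}+b_\mathrm{R})\,y+b_\mathrm{L}b_\mathrm{R}-c$$

So $$A=a_\mathrm{L}a_\mathrm{R},$$ $$B=-(a_\mathrm{L}+a_\mathrm{R}),$$ $$C=1,$$ and

$$B^2-4AC=(a_\mathrm{L}+a_\mathrm{R})^2-4a_\mathrm{L}a_\mathrm{R}=(a_\mathrm{L}-a_\mathrm{R})^2.$$

This is strictly positive whenever the two asymptote slopes differ, which is the hyperbola case. A fitted conic with $$B^2-4AC<0$$ is an ellipse and cannot be written in the product form with real slopes.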

The problem here is that the hyperbola is skewed, with a steeper curve on one side of the minimum than the other.

Anyway, I think I figured out a linear approximation without resorting to curve-fitting, which seems to work for an underlying assumption of a smoothly increasing first derivative that starts out negative and transitions to positive. The anonymous contribution from User:2A00:23C6:AA08:E500:40DF:4AE9:718D:E2EA had the right idea. To find the location of the minimum of my curve, given only discrete points on it, and without trying to fit an arbitrary curve to it, I do this:
 * 1) Find the two lowest points.
 * 2) For the point on the left, draw a line through that point and the point on its left.
 * 3) For the point on the right, draw a line through that point and the point on its right.
 * 4) Where these two lines intersect approximates the x-value of the minimum (but not the y-value). Done.
This seems to handle skewed hyperbolas well. If the slope on the left is steeper than the slope on the right, the location of the minimum is nearer to the left point of the two lowest points. ~Anachronist (talk) 05:50, 23 October 2020 (UTC)
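The four steps above can be sketched in a few lines of Python. The function name `intersection_estimate` is hypothetical. The sketch assumes the samples are sorted by ascending x, that the two lowest points are adjacent, and that there is at least one further sample on each side of them.

```python
def intersection_estimate(xs, ys):
    """Estimate the x-location of the minimum from samples sorted by ascending x."""
    n = len(xs)
    if n != len(ys) or n < 4:
        raise ValueError("need at least four (x, y) samples")
    # Step 1: the two lowest points. They are assumed to be adjacent, which
    # holds when the slope increases steadily through the sampled range.
    i = min(range(n), key=lambda k: ys[k])
    neighbours = [k for k in (i - 1, i + 1) if 0 <= k < n]
    j = min(neighbours, key=lambda k: ys[k])
    left, right = min(i, j), max(i, j)
    if left == 0 or right == n - 1:
        raise ValueError("need one more sample on each side of the two lowest points")
    # Step 2: slope of the line through the left low point and its left neighbour.
    slope_l = (ys[left] - ys[left - 1]) / (xs[left] - xs[left - 1])
    # Step 3: slope of the line through the right low point and its right neighbour.
    slope_r = (ys[right + 1] - ys[right]) / (xs[right + 1] - xs[right])
    if slope_l == slope_r:
        raise ValueError("the two lines are parallel")
    # Step 4: x-coordinate where the two lines intersect.
    return (ys[right] - ys[left] + slope_l * xs[left] - slope_r * xs[right]) / (slope_l - slope_r)


# Left flank steeper than right flank: the estimate falls nearer the left low point.
print(intersection_estimate([0, 1, 2, 3], [3, 1, 1, 2]))  # about 1.3333 (4/3)
```

Only four samples enter the result, so the estimate is cheap, but it ignores everything outside that window.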
 * That doesn't sound like a good way to take into consideration data other than the middle four points. This doesn't seem like a robust statistic to me in the same way that the min and max of a data set are both not robust statistics (sampling does a poor job of estimating them). Note that there is one and only one hyperbola with this property for each given set of points. --Jasper Deng (talk) 07:35, 23 October 2020 (UTC)


 * (a) The equation I gave above (at 10:04, 21 October 2020 UTC) describes skewed hyperbolas as well. The left and right slopes are given by $$a_\mathrm{L}$$ and $$a_\mathrm{R}.$$
 * (b) I expect that you will find that the intersection method generally gives a result that is suspiciously close to the $$x$$-value halfway between those of the lowest two points. If the four points involved happen to lie precisely on a parabola, you will get the exact halfway value. For example, take $$(x_0,y_0)=(0,1),(x_1,y_1)=(1,0),(x_2,y_2)=(2,0.99),(x_3,y_3)=(3,4).$$ These four points lie almost on the parabola with equation $$y=(x-1)^2.$$ The lowest two points are at $$x_1$$ and $$x_2.$$ The intersection procedure results in $$x=603/401\approx 1.50374.$$ Yet, clearly, the minimum should be just above $$x=1.$$  --Lambiam 08:51, 23 October 2020 (UTC)
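The halfway claim in (b) can be checked by hand. As a sketch, assume unit spacing, an exact parabola $$y=a(x-m)^2+b$$ with $$a>0,$$ and the two lowest samples at $$x=k$$ and $$x=k+1.$$ The symbols $$k,$$ $$d=k-m$$ and $$t=x-k$$ are introduced here only for this calculation. The two lines have slopes

$$s_\mathrm{L}=a\left[d^2-(d-1)^2\right]=a(2d-1),\qquad s_\mathrm{R}=a\left[(d+2)^2-(d+1)^2\right]=a(2d+3).$$

Setting the line through $$(k,\,ad^2+b)$$ equal to the line through $$(k+1,\,a(d+1)^2+b)$$ gives

$$ad^2+b+a(2d-1)\,t=a(d+1)^2+b+a(2d+3)(t-1),$$

which simplifies to $$-4at=-2a,$$ so $$t=\tfrac{1}{2}.$$ The intersection therefore lies at $$x=k+\tfrac{1}{2}$$ whatever the value of $$m,$$ so for parabolic data the method carries no information about where the minimum sits between the two lowest samples.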


 * To drive these points home, take the hyperbola defined by $$(y+x-1)(y-3x+7)=8.$$ The concave-up branch is then $$y=x-3+2\sqrt{x^2-4x+6}.$$ It attains its minimum at $$x \approx 1.18350,$$ yet the intersection method results in $$x \approx 1.61401$$, closer to the halfway point $$x=1.5$$ between the lowest two points when sampling at integral $$x$$-values. --Lambiam 11:29, 23 October 2020 (UTC)
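The figures in this example can be reproduced with a short script. This is a sketch only: the variable names are illustrative, and the closed form $$x=2-\sqrt{2/3}$$ for the true minimum comes from setting the derivative of the branch to zero, a step not spelled out above.

```python
import math


def branch(x):
    """Concave-up branch of (y + x - 1)(y - 3x + 7) = 8."""
    return x - 3 + 2 * math.sqrt(x * x - 4 * x + 6)


# True minimum: dy/dx = 1 + 2(x - 2)/sqrt(x^2 - 4x + 6) = 0 gives 3(x - 2)^2 = 2.
true_min = 2 - math.sqrt(2 / 3)

# Sample at integral x-values; the two lowest samples are at x = 1 and x = 2.
xs = [0, 1, 2, 3]
ys = [branch(x) for x in xs]

# Lines through (0, y0), (1, y1) on the left and (2, y2), (3, y3) on the right.
slope_l = (ys[1] - ys[0]) / (xs[1] - xs[0])
slope_r = (ys[3] - ys[2]) / (xs[3] - xs[2])
estimate = (ys[2] - ys[1] + slope_l * xs[1] - slope_r * xs[2]) / (slope_l - slope_r)

print(f"true minimum at x = {true_min:.5f}")        # 1.18350
print(f"intersection estimate x = {estimate:.5f}")  # 1.61401
```

The estimate misses the true minimum by about 0.43 of a sample spacing, and lands closer to the midpoint 1.5 than to the minimum.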