Wikipedia:Reference desk/Archives/Mathematics/2012 June 24

= June 24 =

"Maximum girth" of a graph?
Girth (graph theory) defines girth as the length of the shortest cycle of the graph. Is there a term for the longest cycle that does not repeat a vertex (except the start point)?--Wikimedes (talk) 07:38, 24 June 2012 (UTC)


 * See Longest path problem; but you're looking for the longest cycle, and I can't see any special name for that. -- The Anome (talk) 12:27, 24 June 2012 (UTC)


 * Update: perhaps "longest elementary circuit"? See http://stackoverflow.com/questions/4530120/finding-the-longest-cycle-in-a-directed-graph-using-dfs, http://dutta.csc.ncsu.edu/csc791_spring07/wrap/circuits_johnson.pdf and http://dutta.csc.ncsu.edu/csc791_spring07/wrap/circuits_tiernan.pdf -- The Anome (talk) 12:31, 24 June 2012 (UTC)


 * Update 2: we probably also need a bluelink for elementary path. See for definitions of both this and "elementary circuit". Also, I would expect that elementary cycle would be a reasonable synonym for "elementary circuit". -- The Anome (talk) 12:34, 24 June 2012 (UTC)


 * van Lint, J.H. and Wilson, R.M. (1992) A Course in Combinatorics. Cambridge University Press, Cambridge p.5 calls them "simple paths" and "simple closed paths".  Grimaldi, Ralph P. (1985) Discrete and Combinatorial Mathematics.  Addison-Wesley Publishing Company, Inc. p.426 calls them "simple paths" and "simple cycles".  I'll see if they say anything about longest paths and closed paths/cycles.
 * Distance (graph theory) and diameter look like related concepts, but they're based on the shortest path between vertices rather than the longest, and are not cycles.--Wikimedes (talk) 19:04, 24 June 2012 (UTC)

Fitting cos to Data Points
Hi,

Suppose there is a data set of points, where an x-value is paired with a y-value. I was wondering how one would attempt to find a for f(x) = cos(ax) such that the error for every point in the data set (which can be least squares) is minimized. Ideally, I would also like to have a function f(x) = cos(ax) that never overestimates the values for any x, but I understand this may be too much of a constraint. I have tried to look into how this might be done, but have mostly run into dead ends, and finding that people are much more concerned with finding a more complicated, better fitting function (sum of sinusoids, etc.). However, I find it a little hard to believe this is a difficult thing to do. Nkot (talk) 20:12, 24 June 2012 (UTC)


 * Do you know the amplitude and phase of the data a priori? If not, you need more parameters: $$f(x)=A\cos(kx+\delta)$$. I don't know if there is a nice formula for finding the optimal values for those parameters (but I doubt there is) so I would just use a computer. There are plenty of software packages that can do that kind of optimisation (including Microsoft Excel with the Solver add-in that comes with it). --Tango (talk) 23:51, 24 June 2012 (UTC)


 * To put that in English, you can vary the wavelength/frequency (what you called "a" and Tango called "k"), magnitude (what Tango called "A"), and phase shift (what Tango called "$$\delta$$") of a cosine wave. Can all of these change to fit the points, or are some known in advance ?  Also, do you want to do this curve fitting manually, using an existing program, or write your own ?  (I can help you write one, if this is your goal.)  StuRat (talk) 01:51, 25 June 2012 (UTC)


 * This can be done using non-linear least squares. Unfortunately that article is written in a style that only an advanced mathematician can understand -- it's easier than the article makes it look, but still involves some pretty intricate algebra and leaves you with a set of simultaneous nonlinear equations to solve. Looie496 (talk) 02:37, 25 June 2012 (UTC)


 * It is important that only the frequency can be/need be changed (so A=1, phase shift is zero). I looked at the non-linear least squares article and didn't really understand, only really gathering that I'm not likely to get a good closed-form solution (which seems to be what you suggest as well). It would be interesting to write a program to find a solution, but I'm interested in this as more of a curiosity than a real application. As for the constraint that the estimate is never greater than the actual data points, is this possible or would that have to be abandoned? Nkot (talk) 03:51, 25 June 2012 (UTC)


 * Yes, that's possible, but you must realize this means you aren't likely to get as good of a fit (or perhaps no fit at all). I can see two approaches for solving it by trial-and-error:


 * A) A hill-climbing algorithm. This would try a few values (let's say 10, evenly spaced) in the range you want to allow for a, and find the closest in that range (or closest curve which is always less than the data points, if you prefer).  It would then try more values close to the optimal value found so far, find the optimal from that set, then find more values close to that, etc.  This approach is fast, but not guaranteed to find the optimal solution.  It might instead find a local optimum value, not the global optimum.  Using the hill-climbing analogy, you find the top of a hill, but not necessarily the tallest hill.


 * B) Use a brute force algorithm. In this case, just divide the allowed range up into many values, say a million, and try each, to determine which is optimal.  This approach would be slower, but should find the global optimal solution.


 * If I were to write such a program, I'd make the constraint of having the curve be under every data point optional, so, if no solution is found, you can drop that constraint.


 * Also, you could use all 3 variables (frequency, amplitude, and phase) in the program, but this might make the brute force approach too slow (1000 frequencies × 1000 amplitudes × 1000 phase shifts = a billion trials), so then you'd need to go with the hill-climbing method. StuRat (talk) 04:24, 25 June 2012 (UTC)


 * It may be simpler to treat the data points as weighted Dirac delta functions of the variable x and to sum these to produce a data function $$\textstyle f(x) = \sum_i y_i \delta(x - x_i)$$. Determine its Fourier transform F(ω), find find the value of ω that maximizes Re(F(ω)), and use this as a. This would permit use of standard FFT libraries or else analytic methods up to before the (in principle) straightforward task of hunting for the peak. Just an idea – others are welcome to crit it, though it feels like some variant of this should work. — Quondum☏ 05:42, 25 June 2012 (UTC)


 * On second thoughts, this minimizes the wrong mean square quantity: the integrated square error. So perhaps not.  — Quondum☏ 09:29, 25 June 2012 (UTC)


 * You could try a polynomial fit using the first few terms of the Taylor series of the cosine to estimate a: cos(ax) = 1 - (ax)^2/(2!) + (ax)4/(4!) - (ax)^6/(6!) ....--Wikimedes (talk) 19:05, 25 June 2012 (UTC)


 * I haven't tried this out but I get the distinct feeling that I could fit practically any random set of ((x,y) pairs with y within -1 to 1 with cos((ax) to any desired accuracy by making a large enough. Dmcq (talk) 13:54, 25 June 2012 (UTC)


 * I see what you're saying, it would become a smear covering all points within the positive and negative amplitude range. However, I think we can assume that this would count as "cheating", and that some limit is to be applied to the range of frequencies allowed, to prevent this. StuRat (talk) 19:11, 25 June 2012 (UTC)


 * We normally measure vertical distance from the curve, rather than shortest distance, so that doesn't quite work. Consider the set of points (0,1), (1,0) and (2,1). To go through the first and last points, there must be a whole number of wavelengths between the two points. That means the midpoint must either be y=1 (if there is an even number of wavelengths) or y=-1 (if there is an odd number of wavelengths). It can never be zero. There may be some trick where you can get close to zero by slightly missing the first and last points, but I can't see how to do it. --Tango (talk) 20:46, 25 June 2012 (UTC)
 * For strategically placed points this can be made impossible. But I believe for generic points (e.g., irrational numbers linearly independent over the rationals) it can be fit with arbitrary accuracy. -- Meni Rosenfeld (talk) 08:48, 26 June 2012 (UTC)
 * Perhaps whenever I say 'practically any' I should add 'that don't arise in practice when people are involved'. ;-) Dmcq (talk) 09:44, 26 June 2012 (UTC)


 * An alternate solution, not guaranteed to be optimal, but easy to implement would be looking for peaks. Sort the data by x value, and look for any point that is greater than any of the n neighbors closest to it. If the data is dense enough and you pick a good value for n, you should be able to find how often peaks occur and calculate the frequency. This would be best for an interactive system so you can see if the results make sense or tweak n if they don't. 209.131.76.183 (talk) 16:25, 27 June 2012 (UTC)