Wikipedia:Reference desk/Archives/Mathematics/2011 November 29

= November 29 =

Cardinality before Cantor?
I'd like some history on the notion of cardinality before Cantor, and I'm coming up empty. Epp's Discrete Mathematics (text for the course I'm teaching) says:
 * "This way of thinking about numbers was developed over several centuries by mathematicians in the Chinese, Hindu, and Islamic worlds, culminating in the work of Al-Kashi in 1427. In Europe it was first clearly formulated ... by Simon Stevin." Google book link

I don't see anything specific in those articles about what exactly these guys said that we now recognize as being in the spirit of Cantor. I've heard that there is material in the Archimedes Palimpsest that suggests Cantor's approach, but couldn't find anything specific on that either. Any specifics would be greatly appreciated (not just limited to the folks I linked). I'm especially interested in Chinese, Hindu, and Islamic contributions. Thanks- Staecker (talk) 02:05, 29 November 2011 (UTC)

Oops- now that I reread that bit from Epp, it looks like maybe that quote is about the idea of real numbers as decimal expansions rather than cardinality via bijections. That would make more sense considering what little I know of Stevin. Anyway, I'm still interested in any Cantor-like ideas on cardinality before Cantor. I know (I assume) that nobody else had demonstrated that R was uncountable, but are there earlier sources who discuss cardinality in terms of bijections? Staecker (talk) 02:09, 29 November 2011 (UTC)
 * As I recall, Galileo did. I think his observation was something along the lines of there being as many square numbers as natural numbers.  But he didn't do much more with the idea; in particular, he never came across the idea of bigger cardinalities than that of the naturals.  This is all from memory; I don't have a reference at hand. --Trovatore (talk) 02:53, 29 November 2011 (UTC)
 * He also used nested triangles to show that two lines of different lengths have the same number of points. --COVIZAPIBETEFOKY (talk) 04:46, 29 November 2011 (UTC)
 * Our Georg Cantor article says "For example, geometric problems posed by Galileo and John Duns Scotus suggested that all infinite sets were equinumerous — see Moore 1995, p. 114." Also off the top of my head, on Galileo, I think Bell's Men of Mathematics has something, and/or James R. Newman's 4 volume World of Mathematics collection. Among philosophers, thinking goes back to Anaximander. John Z (talk) 04:59, 29 November 2011 (UTC)


 * As to the research on the Archimedes Palimpsest, let me say that it should be considered with some care, at least. My personal feeling is that there is no limit on what a happy, easy and well-paid researcher can find in those poor remains. Would you like the FLT by Archimedes? Pay, and they will find it. --pm a  17:13, 30 November 2011 (UTC)

Fourier Analysis
Here is the background. I have a bunch of data segments of different lengths, but they are all measurements of a single underlying signal, and I want to use these segments to estimate the power spectral density (PSD) of that signal. Each data segment is uniformly spaced in time. I am not familiar with signal processing, and Fourier analysis has a reputation for being black magic: I ask two people and get three different opinions on what to do and how to do it. And then no one can really convince me with their "reasoning" for why they did what they did, so I appeal to Wikipedia mathematicians. I have three specific questions and I ask for your opinions.

1. I am thinking of zero-padding all of the data segments (to match the length of the longest segment) so that they are all the same length. This way, when I get their PSDs, each segment will give me the same frequency resolution, and I can then average them to get a single PSD. How does this sound? Should I add zeros at the end of the segment, or should I pad on both sides to keep the nonzero values in the middle so that the padded segment is (continuously) periodic? Is there any advantage of one over the other? Zero-padding at the end is easier to program. And does it really matter nowadays if I pad to the next power of two? I mean, the algorithms nowadays are fast enough either way, right?
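The zero-padding scheme in question 1 can be sketched as follows. This is a minimal Python/NumPy illustration (the questioner works in MATLAB); the 100 Hz sampling rate, the 100-sample segment, and the common length of 2000 are all made-up values for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 100.0                          # assumed sampling rate in Hz
segment = rng.standard_normal(100)  # a short segment, 100 samples
n_target = 2000                     # length of the longest segment

# Remove the mean first, so the appended zeros do not create a step.
segment = segment - segment.mean()
padded = np.concatenate([segment, np.zeros(n_target - len(segment))])

# Periodogram-style PSD: |FFT|^2 scaled by fs and the ORIGINAL
# (nonzero) length, since the padding zeros add no energy.
spectrum = np.fft.rfft(padded)
psd = (np.abs(spectrum) ** 2) / (fs * len(segment))

# All segments padded to n_target share this frequency grid,
# so their PSDs can later be averaged bin by bin.
freqs = np.fft.rfftfreq(n_target, d=1.0 / fs)
```

The padding does not add resolution in any real sense; it only interpolates the spectrum onto a finer common grid, which is exactly what is needed to average the segments bin by bin.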

2. To clarify, I want to get the PSD of each segment separately and then take their average to get a single PSD of the underlying signal. When I take the average, I am also thinking about weighting each PSD according to how many nonzero elements there were in the original segment. If I have three segments of lengths 100, 1900, and 2000, I would pad the first with 1900 zeros and the second with 100 zeros to make them all the same length, compute the PSD for each segment individually, and then do

PSDaverage=PSD1*(100/4000)+PSD2*(1900/4000)+PSD3*(2000/4000).

Does this sound reasonable? Obviously zero padding adds no new information, and the 2000-sample segment contains much more information than the 100-sample segment, so it should be taken more seriously.
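The weighted average in the formula above is straightforward to compute once all PSDs share a common frequency grid. A small sketch with hypothetical constant PSD arrays standing in for the real estimates:

```python
import numpy as np

# Hypothetical PSDs on a common 1001-bin frequency grid (after
# padding all segments to the same length).
n_bins = 1001
psd1 = np.full(n_bins, 2.0)
psd2 = np.full(n_bins, 4.0)
psd3 = np.full(n_bins, 6.0)

# Weights are the original (unpadded) segment lengths.
lengths = np.array([100.0, 1900.0, 2000.0])
weights = lengths / lengths.sum()   # 100/4000, 1900/4000, 2000/4000

psd_avg = weights[0] * psd1 + weights[1] * psd2 + weights[2] * psd3
```

Weighting by nonzero length is one reasonable choice; it treats each original sample as carrying equal information, so the 2000-sample segment dominates the 100-sample one by a factor of 20.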

3. Lastly, to get a single PSD of the underlying signal, I am thinking of just taking the (weighted?) arithmetic mean of all of the zero-padded PSDs. But one of the previous papers attempting the same thing took the log (base 10) of the power and then averaged. That doesn't make much sense to me. Is there a reason for it, or any particular advantage someone knows of? The only thing I can think of is that the power might range over a few orders of magnitude, and a log scale makes sense for plotting. But taking the log and then averaging weights large and small values differently. Is it better to do

PSDaverage=PSD1*(100/4000)+PSD2*(1900/4000)+PSD3*(2000/4000).

or should I do

PSDaverage=10^(log(PSD1)*(100/4000)+log(PSD2)*(1900/4000)+log(PSD3)*(2000/4000)).
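For a single frequency bin whose three estimates span several orders of magnitude, the two schemes above can give very different answers. A Python sketch with made-up values:

```python
import numpy as np

# Hypothetical PSD estimates for ONE frequency bin, spanning
# six orders of magnitude, with the same 100/1900/2000 weights.
psd_vals = np.array([1e-2, 1e1, 1e4])
weights = np.array([100.0, 1900.0, 2000.0]) / 4000.0

# Arithmetic (linear-domain) weighted mean.
arith = np.sum(weights * psd_vals)

# Log-domain weighted mean, i.e. a weighted geometric mean.
geo = 10 ** np.sum(weights * np.log10(psd_vals))
```

The geometric mean is pulled strongly toward the small values, so for wide-ranging data it sits far below the arithmetic mean; which one is "right" depends on whether power or log-power is the quantity being modeled.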

Any comment/constructive criticism is welcome. If it is relevant, I am thinking of using the multitaper method with eight Slepian sequences. Thank you in advance! -  Looking for Wisdom and Insight! (talk) 05:01, 29 November 2011 (UTC)
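The multitaper idea mentioned above can be roughed out in Python with SciPy. This is only a sketch of the core averaging of tapered periodograms, not a full multitaper estimator; the time-bandwidth product NW=4.5 is an assumed choice that yields eight well-concentrated Slepian (DPSS) tapers:

```python
import numpy as np
from scipy.signal import windows

fs = 100.0   # assumed sampling rate
n = 2000     # assumed segment length
rng = np.random.default_rng(1)
x = rng.standard_normal(n)   # stand-in for a real data segment

# Eight discrete prolate spheroidal (Slepian) sequences.
tapers = windows.dpss(n, NW=4.5, Kmax=8)   # shape (8, n)

# One periodogram per taper, then average across tapers.
psd_tapered = np.array(
    [np.abs(np.fft.rfft(x * taper)) ** 2 / fs for taper in tapers]
)
psd_mt = psd_tapered.mean(axis=0)
```

Averaging across orthogonal tapers reduces the variance of the estimate without the resolution loss of averaging over time, which is why multitaper is attractive for short segments.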


 * This is a problem I have a lot of experience with. Unfortunately, you've left out the most important piece of information, which is how long your data segments are in relation to the signal frequencies you are interested in.  If the data segments are long, then almost anything you do will give reasonable results; if they are very short, then almost anything you do will be problematic.  Generally speaking, zero-padding would not be my first choice, as it attenuates power and also will induce spurious power at high frequencies unless you first detrend your signals.  When I have dealt with this problem, in the context of EEG signals recorded from brain activity, my approach has been to use periodic boundary conditions -- but I try to avoid working with data segments that are very short.
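The detrending step mentioned above can be illustrated like this (a Python/SciPy sketch with a made-up signal riding on a linear ramp; without detrending, padding this signal with zeros would create a large step at the segment edge and leak power into high frequencies):

```python
import numpy as np
from scipy.signal import detrend

t = np.arange(100.0)
# Made-up signal: an oscillation sitting on a linear trend.
x = 0.5 * t + 3.0 + np.sin(0.2 * t)

# Least-squares removal of the best-fit line.
x_dt = detrend(x)

# After detrending, the values are zero-mean and the endpoints are
# near zero, so appending zeros introduces only a small discontinuity.
```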


 * Regarding your question 3, using the log basically means taking the geometric mean instead of the arithmetic mean, but I don't know of a principled reason for doing that in the general case.
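The log-average/geometric-mean equivalence noted above is easy to verify numerically (a Python sketch with arbitrary values):

```python
import math

vals = [2.0, 8.0]

# Average the base-10 logs, then exponentiate.
log_avg = 10 ** (sum(math.log10(v) for v in vals) / len(vals))

# Geometric mean computed directly.
geo_mean = math.sqrt(vals[0] * vals[1])

# Both give 4.0: averaging logs IS the geometric mean.
```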


 * Lastly as a piece of advice, if your segment lengths are highly variable you should probably impose some constraints on them, otherwise you will face a serious problem of inhomogeneous variance when you try to do statistics. (Sad that we don't have an article about that.) Looie496 (talk) 16:45, 29 November 2011 (UTC)
 * We have an article on heteroscedasticity though, so could create a redirect, if appropriate? Qwfp (talk) 10:57, 30 November 2011 (UTC)
 * Good idea! I've done that.  Unfortunately that's a pretty poor article, but at least it's a start. Looie496 (talk) 17:15, 30 November 2011 (UTC)

Thanks for the reply, but how do I measure how long the data segments are in relation to the signal frequencies I am interested in (what does that mean)? As for zero padding, I forgot to mention that I will detrend each segment. Lastly, could the use of the geometric mean have something to do with the wide range of magnitudes, especially if they were working in single precision (the paper is two decades old and I think they just wrote all of their own routines)? I suspect (I don't know for sure) the power might range from 10^-4 to 10^6 or something of the sort. So perhaps they were afraid of numerical error in single precision, and taking the log makes the small numbers larger (in magnitude) and the large numbers smaller. The results in their paper spanned several orders of magnitude. I am working in MATLAB, so everything is double precision, and I am not writing my own FFT or anything, so I am not crazy about powers of two either. Another reason could be that because the power decays exponentially with frequency (that is what the PSD looks like in their paper), they thought the geometric mean would be more appropriate than the arithmetic mean. What do you think? -  Looking for Wisdom and Insight! (talk) 00:53, 30 November 2011 (UTC)
 * I suggest that what you had better do, if you intend to use your ultimate results for anything important, is to pick out some data that you don't intend to use for the final product, and play around with different methods of analyzing it. If you don't get an intuitive feel for the consequences of different treatments, you will never really be confident that your results are robust. Looie496 (talk) 17:19, 30 November 2011 (UTC)