Wikipedia:Reference desk/Archives/Mathematics/2010 June 25

= June 25 =

== Probability question ==
n identical shapes (think of them as cut out of paper), each of area S, are placed independently and uniformly at random over a region of area A, overlapping as necessary. What is the probability, p, that every point in the region will be covered by at least one shape? I am only interested in cases where n will be very large (millions or billions, say), before p becomes non-zero, and a decent approximation would be fine. Edge effects can be ignored. I assume p does not depend on the (non-pathological) shape of the region being covered, but it's not obvious to me whether it depends on the (non-pathological) shape of the covering shapes. If it does, assume circles. —Preceding unsigned comment added by 81.151.34.16 (talk) 03:49, 25 June 2010 (UTC)
 * Would you clarify "...uniformly over a region A"? A possible interpretation is: $$n$$ points of $$A$$ are chosen independently and uniformly on $$A$$, and the $$n$$ congruent shapes are $$S_i=x_i+S$$, for $$i=1,\dots,n$$ (having assumed $$0\in S$$). This allows some of them to partially get out of $$A$$. Or are you imposing $$S_i\subset A$$ (this should make things more complicated)? And are you also considering rotations or just translates (the simpler option)?


 * Thanks for your reply. I'm afraid I do not understand your notation "$$n$$ congruent shapes are $$S_i=x_i+S$$". As I mentioned, I am not concerned about what happens at the edges, including whether shapes can partially overlap the boundary. The shapes are so small compared to the containing region that it's irrelevant for my purposes. I assume that there is a limiting distribution that holds as the covering shapes get indefinitely small, and this limiting distribution is really what I'm after. If the shape of the covering pieces matters, and it matters whether we consider rotations, then assume circles. 86.135.29.110 (talk) 14:12, 25 June 2010 (UTC).


 * When S out of A square centimeters are covered, each cm² is covered on average S/A times. When n shapes are placed, the average cm² is covered nS/A times. The actual number of times has a Poisson distribution. The probability that some particular cm² is uncovered is $$e^{-nS/A}$$. The average uncovered area is $$L = Ae^{-nS/A}$$. Bo Jacoby (talk) 07:27, 25 June 2010 (UTC).
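The Poisson step above can be checked numerically. The following is a minimal Python sketch, assuming edge effects are ignored so that each shape covers a fixed point with probability S/A; the function names and the values of A, S and n are illustrative only and do not come from the discussion.

```python
import math

def uncovered_fraction_exact(n, S, A):
    """Chance that a fixed point is missed by all n shapes (edge effects ignored)."""
    return (1.0 - S / A) ** n

def uncovered_fraction_poisson(n, S, A):
    """Poisson approximation e^(-nS/A) to the same quantity."""
    return math.exp(-n * S / A)

A, S, n = 10_000.0, 25.0, 2_000          # illustrative values only
exact = uncovered_fraction_exact(n, S, A)
approx = uncovered_fraction_poisson(n, S, A)
expected_uncovered_area = A * approx     # L = A e^(-nS/A)
print(exact, approx, expected_uncovered_area)
```

The two fractions agree closely whenever S/A is small, which is the regime the question is about.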


 * Does this lead to an answer to the original question? 86.135.29.110 (talk) 14:12, 25 June 2010 (UTC).


 * Yes, I think so. The dependence on shape enters here. Consider the case where each shape is a square and the shapes are placed on a square grid like a chess board. Then the average number of uncovered squares is L / S and the answer to the original question is
 * $$p=e^{-\frac L S}=e^{-\frac {A e^{-\frac{nS}A}}S}$$
 * Bo Jacoby (talk) 19:18, 25 June 2010 (UTC).
 * Are you assuming that the squares are always placed exactly on the grid lines? If so, this won't work (I mean, it isn't in the spirit of the original question). The squares can be placed anywhere. 86.185.77.226 (talk) 19:29, 25 June 2010 (UTC).


 * The shape does matter, unfortunately. This can be seen by taking as the basic shape of area S two circles of area S/2 joined by a line. When rotations are allowed and the distance between the circles is large, this can be considered almost as two independent circles of area S/2. If we split the original area A in two, then if n circles of area S have probability p of fully covering area A, the same can be said of n circles of area S/2 covering area A/2. So the probability of the two-circle combinations of area S covering area A is p².
 * I think we should just consider the circles or squares form of the original question. I'll imagine the circle form as if I'm spray painting: what is my chance of covering the object completely? I've protected the area around so I can go evenly up to the edges. Of course spray paint is rather more uniform and the paint won't form exact circles, but we're in maths land here. Dmcq (talk) 08:51, 25 June 2010 (UTC)
 * p.s. I had been wondering if this could be applied to how long till Jackson Pollock completely covered every spot of the canvas but since he dribbled bits around rather than using drops I suppose not :) Dmcq (talk) 08:56, 25 June 2010 (UTC)
 * I don't quite follow this. I'm not sure how, in the two-circles-joined-by-a-line case, you arrange that the two circles always fall in different halves of A. In any case, in my problem we can assume that the extent (as well as the area) of the covering shapes is negligibly small compared to the region A (otherwise there are numerous edge-related complexities which I specifically want to ignore), so I don't think this argument can apply. 86.185.77.226 (talk) 18:26, 25 June 2010 (UTC).
 * I wasn't saying they would fall in different halves of A, just that, allowing rotations, the two halves of the long dumbbell would act as two independent circles to all intents and purposes when covering the area. And I believe Bo Jacoby's formula would apply generally when the area has a reasonable chance of being fully covered. For n shapes of area S in an area A you would have a probability of something like exp(-K*A*exp(-nS/A)/S) of completely covering the area, where K is some constant depending only on the shape. This does depend on some big assumptions, the main one being that as the flecks of uncovered space get separated they are small and are wholly covered with probability S/A for each added shape. I wouldn't have thought the difference between a circle and a square would make a big difference to the outcome. Dmcq (talk) 19:49, 25 June 2010 (UTC)
 * "I wasn't saying they would fall in different halves of A, just that, allowing rotations, the two halves of the long dumbbell would act as two independent circles to all intents and purposes when covering the area." I understand that, but wouldn't this just tell us that p(A,S,n) = p(A,S/2,2n)? Why would there be a problem with that? (By the way, I'm not particularly arguing that the formula shouldn't be shape-dependent, I just can't see how your argument demonstrates that it is.) 86.185.77.226 (talk) 21:02, 25 June 2010 (UTC).
 * It does tell you that, but you can then split up that 2n lot into two halves, and (S/2)/(A/2)=S/A. Each half has probability p(A/2,S/2,n) of being completely filled, which will be the same as p(A,S,n). With two of those areas stuck together, the probability of both being filled is p(A/2,S/2,n)p(A/2,S/2,n), and this should be the same as p(A,S/2,2n). However, if shape doesn't matter, this last should be the same as p(A,S,n). So we have p(A,S,n)=p(A,S,n)^2. Dmcq (talk) 21:25, 25 June 2010 (UTC)
 * When you talk about splitting A into two halves, each of area A/2, how are those halves defined? I've tried the exp(-K*A*exp(-nS/A)/S) formula against simulations with square-shaped covering shapes and, while there is always the possibility that I made a mistake, the results do not look good. If I haven't made a mistake then the formula looks definitely wrong. How confident do you feel about it? 86.174.161.139 (talk) 23:08, 25 June 2010 (UTC).
 * Nothing special, just chop in half. It shouldn't matter if A is a square or rectangle or circle if it is significantly larger than S. What did you simulate? Dmcq (talk) 23:18, 25 June 2010 (UTC)
 * I'm sorry if I'm being slow, but I just don't see it. If you chop A in half, then, in order for your logic to work, half of the dumbbell ends need to fall into one half, and half into the other, don't they? How is that arranged? A typical simulation was placing 5 x 5 squares (anywhere) on a 100 x 100 grid. The results I get are nothing like a shape that could be produced by the proposed formula (it feels too wrong to be just due to the fact that this is a way off insignificantly small S). However, placing 1 x 1 squares on a 20 x 20 grid, for example, gives fairly plausibly similar results to the formula with K = 1. This suggests something is going wonky with the logic once it becomes possible for the squares to overlap. I am not 100% certain about any of this, by the way! 86.174.161.139 (talk) 23:53, 25 June 2010 (UTC).
 * I'm pleased that your 1x1 simulation confirmed my analysis. How did you treat the edges in your 5x5 simulation? Bo Jacoby (talk) 06:49, 26 June 2010 (UTC).
 * I think Bo's reasoning may be correct for the 1×1 case but break down for larger shapes because the events of two nearby squares being covered (or uncovered) are not independent. For example, if there are only two squares left to cover and they are neighbours, there's a good chance that both will be covered at once. If they are far apart this is clearly impossible. I also notice that Bo's answer above to the original question takes the form of the cumulative distribution function of a Gumbel distribution, implying that the number of shapes placed before every square is covered would follow this distribution. Qwfp (talk) 08:52, 26 June 2010 (UTC)


 * For the dumbbell, the two parts would be in the same half practically all the time. Try an A4 sheet with small circles, and then two halves of an A4 sheet with dumbbells of circles 1/sqrt(2) the radius; I was thinking the smaller circles on the A5 sheet should behave to all intents and purposes like the bigger circles on the A4 sheet. I wasn't trying to make the two bits go in different halves, and normally they'd be in the same half. As to the reasoning about the little bits remaining, the logic is that when there is a reasonable chance of the whole area being covered, the specks would tend to be far apart relative to the size of the shapes. The big assumption is that a speck is likely to either be completely covered or completely missed; my guess was this doesn't make a big difference. I had a quick look at paint coverage and a couple of other things but didn't see anything close, but I agree with Bo Jacoby that someone has probably looked at something similar. Also, you get a nicer formula I think if you define N=A/S, i.e. the number of shapes you'd need if they had no overlap. The formula then is p(N,n)=exp(-K*N*exp(-n/N)), and solving for n you get n=N*log(K*N/log(1/p)), so it gets dominated by N*log(N). So the smaller the drops, the more paint you have to spray to be absolutely sure of covering every single point; quite a lot of paint in fact. I appreciate that you think the simulations disagree with the formula, but I have no idea from your description what it is that you were looking for or what you saw. Dmcq (talk) 10:16, 26 June 2010 (UTC)
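The heuristic formula in terms of N = A/S and its inverse can be written out as below. This is only a sketch of the expressions quoted in the post, with K left as an unknown shape constant (defaulting to 1 here as an assumption); the function names and the value of N are illustrative.

```python
import math

def p_cover(n, N, K=1.0):
    """p(N, n) = exp(-K*N*exp(-n/N)), the heuristic formula from the post."""
    return math.exp(-K * N * math.exp(-n / N))

def shapes_needed(p, N, K=1.0):
    """Inverse of p_cover in n: n = N*log(K*N / log(1/p))."""
    return N * math.log(K * N / math.log(1.0 / p))

N = 1_000_000                  # A/S; illustrative
for p in (0.5, 0.99):
    n = shapes_needed(p, N)
    print(p, n, n / N)         # n/N is the average number of layers of "paint"
```

The ratio n/N grows like log N, which is the "quite a lot of paint" remark in numbers.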


 * A study of the effects of bombing in might give something; I don't have access. Anything which references it might also be useful. Percolation is the study of when the area becomes solid; that is independent of n but might, I guess, relate to the shape effect K. Dmcq (talk) 12:48, 26 June 2010 (UTC)
 * Counting dust or bacteria is also related, but perhaps more to the percolation problem Dmcq (talk) 13:10, 26 June 2010 (UTC)
 * Thanks for those Dmcq. After a bit of forward and backward citation chasing with the help of the Science Citation Index, this paper looks most promising: . I'm fortunate enough to have access to it, but i don't have time to read it properly right this minute and to be honest its format is a bit mathematical for me so i thought i'd just post the ref in case someone else can 'decode' it first. Qwfp (talk) 15:24, 26 June 2010 (UTC)
 * On second thoughts this one looks more promising, though more heavily mathematical: . Interestingly his Theorems 1.1 & 1.2 do involve the Gumbel distribution, but the expressions are a lot more complicated than Bo's expression above. Qwfp (talk) 17:43, 26 June 2010 (UTC)

@Dmcq. Thanks for your further explanation about the dumbbells. I get it now.

@Bo Jacoby / Dmcq. The simulation works like this. I have a b x b grid, area A = b^2. Onto this I place c x c squares, each of area S = c^2. For each square I place, I randomly choose a cell in the b x b grid, and I align the bottom left of the c x c square with the bottom left of the random cell. Then I mark off all the covered cells in the b x b grid, ignoring any part of the c x c square that falls off the edge. I carry on placing squares until the b x b grid is completely covered (or n reaches some large cutoff point). I repeat this many times, building up a probability distribution that the b x b grid will be completely covered by n c x c squares. Then I draw a graph of this and overlay the graph of the exp(-K*A*exp(-nS/A)/S) equation with K = 1. When c = 1, the fit is good. With other values of c, the graphs are completely different. 86.185.76.146 (talk) 17:51, 26 June 2010 (UTC).
 * Thanks. Consider computing coordinates modulo b in order to avoid edge effects. Bo Jacoby (talk) 18:08, 26 June 2010 (UTC).
 * That is a very good point. I was thinking that edge effects wouldn't be such a big deal, but, of course they are the way I was doing it because the left and bottom edges get many fewer hits than they ought to. Making the change you suggest makes a big difference actually. The results still do not match the proposed formula, but they are quite a bit closer and approaching a situation where I don't feel completely certain that the discrepancy is not just due to other imperfections in the simulation (such as having a granular grid). However, the discussion above seems to be moving away from the idea that the simple formula ought to work? 86.185.76.146 (talk) 18:49, 26 June 2010 (UTC).
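The simulation as described, with the modulo-b wrap-around suggested above, might look roughly like the following Python sketch. It is a reconstruction from the description, not the poster's actual program; the function names, the trial count and the seed are assumptions, and the grid is still granular (squares are aligned to whole cells).

```python
import random

def shapes_to_cover(b, c, rng):
    """Place c x c squares on a b x b grid with wrap-around (coordinates mod b)
    until every cell is covered; return how many squares were needed."""
    covered = [[False] * b for _ in range(b)]
    remaining = b * b
    n = 0
    while remaining:
        x0, y0 = rng.randrange(b), rng.randrange(b)   # bottom-left cell of the square
        n += 1
        for dx in range(c):
            for dy in range(c):
                x, y = (x0 + dx) % b, (y0 + dy) % b   # wrap to avoid edge effects
                if not covered[x][y]:
                    covered[x][y] = True
                    remaining -= 1
    return n

def empirical_p(b, c, n, trials=200, seed=0):
    """Fraction of trials in which at most n squares sufficed to cover the grid."""
    rng = random.Random(seed)
    return sum(shapes_to_cover(b, c, rng) <= n for _ in range(trials)) / trials

print(empirical_p(b=20, c=2, n=600))
```

Plotting `empirical_p` against n for fixed b and c gives the curve that the thread compares with the proposed formulas.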

=== Result in Janson 1986 ===
I've had a go at decoding and simplifying a result in the Janson 1986 paper i mentioned above. I think i've got the gist of it, but i suggest this is checked by someone with a deeper knowledge of maths than me.

It appears the relevant result is (1.3) in Theorem 1.1 on p84, the second page of the paper. He allows the shapes ('small sets' in his terminology) to be random convex sets. The result depends on the ratio of the area of the region to the average area of the small set; i'll call this ratio r for simplicity (Dmcq called it N above but that clashes with the notation in this paper so would get too confusing). There's also a constant α that depends on the shape of the small set; in Section 9 he shows that α = 1 for the cases of a circle of fixed size and a rectangle of fixed size (he gives other examples too where it's different). In two dimensions, the result is that the distribution of n, the number of small sets required to cover every point of the region at least once, is given by:
 * $$n/r - \log r - 2 \log \log r - \log \alpha \; \xrightarrow{d} \; U,$$

where U has a standard Gumbel distribution and $$ \xrightarrow{d}$$ denotes convergence in distribution as r → ∞, i.e. as the area of the region becomes much larger than that of the small set. Qwfp (talk) 19:50, 26 June 2010 (UTC)


 * Wow. I tried plugging this in to my (crucially modified!) simulation and it doesn't seem to be a perfect match-up. However, I'm now within the realm where I'm not sure if the discrepancy might be due to deficiencies in the simulation (granularity, covering squares not vanishingly small compared to covered region). So, I think it doesn't tell me much even if I've got the workings right. Speaking of which, could anyone check my workings?


 * What I need to check against the simulation is a graph of p versus n, where n is the number of covering squares and p is the probability that these n squares will completely cover the larger region. Based on the formula Qwfp posted above, I'm using


 * p = Exp(-Exp(-x))


 * where


 * x = n / r - Log(r) - 2 * Log(Log(r)) - Log(alpha)
 * r = A / S (A is area of large region; S is area of covering squares)
 * alpha = 1


 * Have I got this part right? 86.185.76.146 (talk) 20:52, 26 June 2010 (UTC).
 * I believe so. That formula is very like the one we had before, but instead of K being a constant depending only on the shape it is $$(\log r)^2 \alpha$$. I'll have to have a think about what that $$(\log r)^2$$ factor means; I'm rather surprised by it, but I guess the paper has put a bit more thought into it. It conflicts with the dumbbell argument, though the difference isn't great. The formula is equivalent to:
 * $$p(n,r) = e^{- \alpha (\log{r})^2 r e^{- \frac{n}{r}}}$$
 * the dumbbell argument implies
 * $$p(2n,2r) = (p(n,r))^2$$
 * but using the formula on the two sides gives
 * $$e^{- \alpha (\log{2r})^2 2r e^{- \frac{n}{r}}}$$
 * and
 * $$e^{- \alpha (\log{r})^2 2r e^{- \frac{n}{r}}}$$
 * log r and log 2r have only a very small relative difference if r is large but they're still different. Dmcq (talk) 22:43, 26 June 2010 (UTC)
 * Right, thanks. Excuse me if my grasp of this is rather tenuous, but might not the value of alpha be different in the case of the dumbbell? 86.185.76.146 (talk) 22:52, 26 June 2010 (UTC).
 * Hopefully it should behave as two discs, I believe there must be some problem with saying
 * $$p(2n,2r) = (p(n,r))^2$$
 * Perhaps it has something to do with there actually being some variation of the numbers in both halves, it wouldn't divide into exactly n in each half when you put 2n discs in the whole area. I haven't any really good idea what I did wrong yet though. Dmcq (talk) 23:09, 26 June 2010 (UTC)
 * I think I misunderstood you. Although you called it the "dumbbell" argument, is this $$p(2n,2r) = (p(n,r))^2$$ formula actually using dumbbell shapes at all? Are you not just saying that 2n discs each of area S in a region of area 2A should be equivalent to two independent lots of n discs of area S in two regions of area A? As I understood it, the "dumbbell" argument was specifically to show that the formula couldn't be the same for a dumbbell and a disc because it led to the contradiction p = p^2, right? In that case, the only thing that gives in the formula is alpha, so alpha should be different for a dumbbell? I have to admit, I find it hard to conceive of this parameter alpha that could be the same for a circle and a square and yet differ for some other unknown shapes. 86.185.76.146 (talk) 23:49, 26 June 2010 (UTC).
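The two ways of writing the approximation in this exchange (the Exp(-Exp(-x)) form and the closed form with the (log r)² factor) can be checked against each other, together with the size of the p(2n,2r) versus p(n,r)² discrepancy. This is a minimal Python sketch with α = 1 assumed and illustrative values of r and n; the function names are not from the discussion.

```python
import math

def p_gumbel_form(n, r, alpha=1.0):
    """p = exp(-exp(-x)), x = n/r - log r - 2 log log r - log alpha."""
    x = n / r - math.log(r) - 2 * math.log(math.log(r)) - math.log(alpha)
    return math.exp(-math.exp(-x))

def p_closed_form(n, r, alpha=1.0):
    """p = exp(-alpha * (log r)^2 * r * exp(-n/r))."""
    return math.exp(-alpha * math.log(r) ** 2 * r * math.exp(-n / r))

r = 400.0            # e.g. 5 x 5 squares on a 100 x 100 region
n = 12 * r           # illustrative number of shapes
print(p_gumbel_form(n, r), p_closed_form(n, r))

# The comparison made above: p(2n, 2r) against p(n, r)**2
print(p_closed_form(2 * n, 2 * r), p_closed_form(n, r) ** 2)
```

Because log 2r exceeds log r, the first number on the last line is always the smaller of the two, and the gap narrows only slowly as r grows.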


 * Looking further at section 9, Janson says "smaller α corresponds to more efficient coverings". He shows that α=1 for a square of fixed size and fixed orientation, but 4/π for a square of fixed size but random orientation. It's 2 for a triangle of fixed size and fixed orientation, and 3√3/π for a triangle of fixed size but random orientation. (Orientation is irrelevant for circles of course.) So squares of fixed orientation cover better than those of random orientation (which seems intuitively right), but vice-versa for triangles (which seems reasonable when you think about it). Qwfp (talk) 12:39, 27 June 2010 (UTC)


 * Sorry, yes, the bit in that formula doesn't depend on anything about dumbbells. What I'm saying is wrong is the assumption that 2n discs each of area S in a region of area 2A are entirely equivalent to two independent lots of n discs of area S in two regions of area A. In fact we could have n+d in one part and n-d in the other, and one would have to sum the combinations with their various probabilities. I assumed that since d would be insignificant compared to n it didn't matter; however, it could differ by a few times the square root of n, and that must be what allows the $$(\log r)^2$$ factor to exist. I'll see if I can do the sums tomorrow (it looks a little nasty) and whether that makes the problem disappear. Dmcq (talk) 00:33, 27 June 2010 (UTC)
 * I'm afraid I haven't properly attempted to follow your dumbbell argument Dmcq, but the Janson result above is valid only for convex sets, and the dumbbell shape clearly isn't a convex set. There are some more recent papers on Svante Janson's website, the latest of which gives a very brief overview of previous work on "covering by translates of a set" in its introduction and then says "In almost all cases above, the set S is taken to be convex". (This latest paper isn't making that assumption, but it's about a different problem: what's the minimum number you need when you get to choose where to put them.) Qwfp (talk) 09:49, 27 June 2010 (UTC)
 * The dumbbell argument had nothing to do with that; I was just looking at two lots of circles in two adjoining areas. I had a quick go at adjusting it to see what happened if I used probabilities one standard deviation away, as in
 * $$p(n+\sqrt{\tfrac n 2},r)p(n-\sqrt{\tfrac n 2},r) \approx p(2n,2r)$$
 * which I hoped might give me an idea what's happening and I got that one would have to have
 * $${(\log {2r})}^2 \approx {(\log r)}^2 \cosh{\sqrt{\frac n {2r^2}}}$$
 * one can approximate the cosh as $$1+n/4r^2$$ but that gives a result which is even further from what I wanted when I assumed zero standard deviation. It does show though that the exponent should grow faster than the original one with just a constant K which is I suppose heading in the right direction. I think I'll take Janson's word for it :) Dmcq (talk) 13:20, 27 June 2010 (UTC)
 * Yeah, he seems to be a clever guy, to judge by Google Scholar, which gives him an h-index somewhere around 35—pretty amazing for a mathematician, (assuming there's only one with this name). Perhaps we should have an article on him... Qwfp (talk) 13:41, 27 June 2010 (UTC). Created a stub on Svante Janson. Please expand! This is not really my area. Qwfp (talk) 14:47, 27 June 2010 (UTC)
 * By the way that factor of 2 in front of the "log log r" is the only part of the formula that depends on the dimension of the space, to which it is equal. He also comments that the coupon collector's problem is a zero-dimensional analogue. The "log log r" term then disappears. Qwfp (talk) 13:41, 27 June 2010 (UTC)

== Linear or non-linear time series ==
How do you tell if a time series is linear or non-linear? Is there a formula? Thanks 92.28.242.52 (talk) 14:12, 25 June 2010 (UTC)

Or, if it's an easier question to answer, how do you measure the extent to which a time series is non-linear or chaotic? Thanks 92.15.15.76 (talk) 09:43, 26 June 2010 (UTC)


 * A time series that is linear with time will have the form x(t) = at + b. If there are errors or uncertainties in the observations x(t) then there are various methods for estimating the "best" values for the parameters a and b - the simplest method is ordinary least squares, but others are mentioned in our linear regression article. Once you have a linear model, then you can test the goodness of fit between model and observations using various correlation tests. Note that non-linear is not the same as chaotic - a time series of daily mid-day temperatures over the course of several years will be non-linear, but not chaotic. Gandalf61 (talk) 11:44, 26 June 2010 (UTC)
 * Thanks, you've described a straight line (possibly with noise also) but you have not answered either of the two questions. 92.15.13.228 (talk) 16:12, 26 June 2010 (UTC)
 * If you compare the goodness of fit of a linear model with that of a quadratic model, which uses up a degree of freedom, and you get a better fit with the linear model, then that's a good indication linear is a good model. Dmcq (talk) 14:14, 26 June 2010 (UTC)
 * Thanks, why? 92.15.13.228 (talk) 16:12, 26 June 2010 (UTC)
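As an illustration of the straight-line model x(t) = at + b mentioned above, here is a minimal Python sketch of an ordinary least squares fit with R² as a simple goodness-of-fit measure. The function name and the sample data are made up for the example, and R² alone is not a test of linearity.

```python
def fit_line(t, x):
    """Ordinary least squares fit of x(t) = a*t + b; returns (a, b, r_squared)."""
    n = len(t)
    t_mean = sum(t) / n
    x_mean = sum(x) / n
    s_tt = sum((ti - t_mean) ** 2 for ti in t)
    s_tx = sum((ti - t_mean) * (xi - x_mean) for ti, xi in zip(t, x))
    a = s_tx / s_tt                      # slope
    b = x_mean - a * t_mean              # intercept
    ss_res = sum((xi - (a * ti + b)) ** 2 for ti, xi in zip(t, x))
    ss_tot = sum((xi - x_mean) ** 2 for xi in x)
    return a, b, 1.0 - ss_res / ss_tot

t = list(range(10))
x = [3.0 * ti + 1.0 + (0.2 if ti % 2 else -0.2) for ti in t]   # made-up data
print(fit_line(t, x))
```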

Assuming the data contains a random component, there is likely no way to tell for certain -- you can only tell probabilistically. For example, regress y on a constant, x, and x^2. If the coefficient on x is statistically significant while the coefficient on x^2 is statistically insignificant, then you have evidence that the model is more likely to be linear than it is to be quadratic. If you want to test higher-order non-linearities, you can include x^3, x^4, etc. This is a better approach than comparing the goodness of fit for a couple of reasons: (1) without doing additional math, you don't know whether the improvement in the goodness of fit is statistically significant, (2) R^2 is guaranteed not to decrease when you add the quadratic term, so you must use an adjusted R^2 -- but there are several (or none) to choose from depending on what you intend to do with the data and whether your dependent variable is discrete or continuous. Wikiant (talk) 16:17, 26 June 2010 (UTC)
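The regression check described above can be sketched in plain Python as follows. This is an illustration under assumptions: independent errors with constant variance, synthetic data generated with a fixed seed, and hypothetical helper names (`solve`, `quadratic_t_stat`); a t-statistic well beyond roughly 2 in absolute value is the usual rough signal that the x^2 term matters.

```python
import random

def solve(M, v):
    """Solve M z = v by Gauss-Jordan elimination with partial pivoting."""
    k = len(v)
    aug = [row[:] + [vi] for row, vi in zip(M, v)]
    for col in range(k):
        piv = max(range(col, k), key=lambda i: abs(aug[i][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for i in range(k):
            if i != col:
                f = aug[i][col] / aug[col][col]
                aug[i] = [a - f * p for a, p in zip(aug[i], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]

def quadratic_t_stat(x, y):
    """Regress y on 1, x, x^2 and return the t-statistic of the x^2 coefficient."""
    X = [[1.0, xi, xi * xi] for xi in x]
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(3)]
    beta = solve(XtX, Xty)
    resid = [yi - sum(b * v for b, v in zip(beta, row)) for row, yi in zip(X, y)]
    sigma2 = sum(e * e for e in resid) / (len(x) - 3)   # residual variance
    inv_22 = solve(XtX, [0.0, 0.0, 1.0])[2]             # (X'X)^-1 entry for x^2
    return beta[2] / (sigma2 * inv_22) ** 0.5

rng = random.Random(0)
x = [i / 10 for i in range(100)]
y_lin = [1 + 2 * xi + rng.gauss(0, 1) for xi in x]                    # truly linear
y_quad = [1 + 2 * xi + 0.5 * xi * xi + rng.gauss(0, 1) for xi in x]   # truly quadratic
print(quadratic_t_stat(x, y_lin), quadratic_t_stat(x, y_quad))
```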

Would calculating or estimating the Lyapunov exponent be a way? 92.15.13.228 (talk) 16:20, 26 June 2010 (UTC)


 * The answer to that is beyond my expertise. However, my limited intuition is that this would not help since you are dealing with data that contains a random component. The Power transform is often used in econometrics to test for linearity vs. several types of non-linearity. Wikiant (talk) 17:46, 26 June 2010 (UTC)
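For what the Lyapunov exponent measures in the noise-free case, here is a small Python sketch using the logistic map as a stand-in system with a known rule. This is an assumption made for illustration: estimating the exponent from an observed, noisy time series is a harder problem and is not what this code does. The function name and parameter values are hypothetical.

```python
import math

def logistic_lyapunov(r, x0=0.3, burn_in=1000, steps=20000):
    """Average of log|f'(x)| along an orbit of f(x) = r*x*(1-x)."""
    x = x0
    for _ in range(burn_in):              # discard the transient
        x = r * x * (1 - x)
    total = 0.0
    for _ in range(steps):
        # f'(x) = r*(1 - 2x); the floor guards against log(0)
        total += math.log(max(abs(r * (1 - 2 * x)), 1e-300))
        x = r * x * (1 - x)
    return total / steps

print(logistic_lyapunov(3.2), logistic_lyapunov(3.9))
```

A negative value indicates that nearby orbits converge (here a periodic cycle), while a positive value indicates sensitive dependence on initial conditions.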

== Music inversion as a mathematical pattern ==
I have lost a beautiful video demonstrating the subject material. I recall that one example was Rimsky-Korsakov's Coq d'or, contrasting the bird's happy prediction with the bird's gloomy prediction: it was the identical notes, just inverted, evoking the contrasting emotion. The video was likely from a university lecture.

Any help locating such a video would be immensely appreciated. Padraeg Sullivan 23:32, 25 June 2010 (UTC) —Preceding unsigned comment added by PADRAEG (talk • contribs)