Wikipedia:Reference desk/Archives/Mathematics/2008 May 20

= May 20 =

Neg. Binom. GLM question
In doing a negative binomial glm in R, I found two oddities. Doing the analysis of deviance I get the message "Dispersion parameter for the negative binomial(0.553) family taken to be 1." I interpret this to mean no dispersion parameter, as in the case of a quasilikelihood, is estimated. Is this correct?

The second question is more detailed. In comparing two (nested, neg. binom.) models, the anova comparison, anova(model1,model2), shows a different log-likelihood than taking the deviance(model1)-deviance(model2). Not very different, mind you, about 0.1, which is more than I could attribute to rounding error. Any ideas? --TeaDrinker (talk) 01:57, 20 May 2008 (UTC)


 * Suggestions.


 * 1) Are you using glm or something from the MASS package? (Or something else???) Whichever you are, look at the help page for the relevant function, especially where it talks about "family".  But I would suspect your hunch is true regardless, that the dispersion is assumed fixed.
 * 2) Two thoughts.  You will want to make sure the fitting method for both models is the same, i.e., don't use REML for one and ML for the other; in fact, you should probably use ML for both and not REML (which is the default for some functions, BTW), as in many cases the comparison of REML-fitted models can be meaningless.  Second, check the "type" argument of the relevant anova method.  Make sure it will do the equivalent test to the comparison of the deviances, as some "types" do different tests.


 * Check also the R-Help mailing list if you need to. The readers there are very helpful and knowledgeable; just make sure you read the manuals that came with your R distribution, the relevant function help pages, and search the archives of that list before posting a question there. If you don't, they can be quite, er, uncivil.  ;-) Baccyak4H (Yak!) 03:20, 20 May 2008 (UTC)


 * Thanks! It is the glm.nb function from the MASS library; I was unaware of any ordinary GLMs being fit using REML (although I have seen it used for lme and GLMM models).  I may have to send it over to R-help, thanks! -TeaDrinker (talk) 20:39, 20 May 2008 (UTC)

Partial differential equation
Heh...does anyone wanna help me out and solve the partial differential equation given in the last problem here? Since it IS a PDE, it is quite a lot of grunt work for not much benefit. But if you were to get (1/2)sin(3x)e^(3t)+(1/2)sin(3x)e^(-3t), I would be happy and grateful. My answer is a solution to the PDE, but I'd like to know if it's the general solution, as was asked for. --M1ss1ontomars2k4 (talk) 02:42, 20 May 2008 (UTC)


 * I think you got it. Obviously $$u_{x x} = -9u$$, so $$u_{t t} = 9u$$ which has the general solution $$u(x,t) = a(x)e^{3 t} + b(x)e^{-3 t}$$ (or equivalently $$u(x,t) = c(x) \cosh(3 t) + d(x) \sinh(3 t)$$).  The initial conditions narrow that down to $$u(x,t) = \sin(3 x) \cosh(3 t)$$ just like you have it.  --Prestidigitator (talk) 05:59, 20 May 2008 (UTC)
 * Cool, thanks. I would never have thought of using a hyperbolic trig function to express it; that's neat. --M1ss1ontomars2k4 (talk) 02:21, 21 May 2008 (UTC)
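For readers comparing the two forms: the exponential expression in the question and the hyperbolic expression in the reply are the same function, which follows directly from the definition of cosh. The original problem statement is not reproduced here, so this only checks that the two answers agree with each other and with the relations quoted in the reply.

$$\tfrac{1}{2}\sin(3x)e^{3t} + \tfrac{1}{2}\sin(3x)e^{-3t} = \sin(3x)\,\frac{e^{3t} + e^{-3t}}{2} = \sin(3x)\cosh(3t)$$

Differentiating twice gives $$u_{xx} = -9\sin(3x)\cosh(3t) = -9u$$ and $$u_{tt} = 9\sin(3x)\cosh(3t) = 9u$$, matching the relations used above.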

Standard deviations and probabilities
Hi. I'm working on Donald Bradman, trying to get the article to FA. As long ago as Dec 2006, this edit was made, with no edit summary or discussion at the article talk page. It's never been challenged or referenced.

As the requirements of FAC mean that every claim needs to be referenced, and the user seems to have left the Project (with no email address enabled) I wondered whether the probability figures he's used derive from the (cited) standard deviation figures. As my maths is rubbish, I have no idea.

So, three questions:


 * 1) Do the probability figures derive from the SD ones?
 * 2) Are they accurate?
 * 3) If we're so far at a "yes" and "yes" scenario, how can I possibly cite the probabilities?

Cheers --Dweller (talk) 10:41, 20 May 2008 (UTC)
 * Yes, the probabilities derive from the number of SDs. They are accurate (except that the first should be 185,000, I think) subject to the following caveats: firstly, it is not clear on what basis they have been rounded to their current values. Without access to the book in question, I don't know the precise SD figures, but assuming the current figures are accurate to 1 decimal place only, the probability figures are being given far too precisely: for example, the Don's probability should be '1 in somewhere between 147,000 and 233,000'. Secondly, they depend on the assumption that the underlying distribution is approximately normal. I do not know if this is a sensible assumption.
 * As for sourcing, I doubt you'll be able to find these specific figures anywhere if they're not present in Davis' work (which I don't have access to). Algebraist 11:30, 20 May 2008 (UTC)
 * Thanks. I don't think Davis includes the probabilities, but getting hold of a copy of the book has been a bit of a problem! Your comment of "'1 in somewhere between 147,000 and 233,000'"... does that derive from a log chart or something that I could reference? --Dweller (talk) 11:36, 20 May 2008 (UTC)
 * I couldn't find precise enough normal distribution tables online (though they doubtless exist somewhere), so I used this tool, linked from normal distribution. Algebraist 11:48, 20 May 2008 (UTC)

I had a reply from User:Macgruder. The maths is beyond (I mean way beyond) my poor brain:

You can work these out from the number of standard deviations you are from the mean

For example

More than 2 [actually 1.96] standard deviations above the mean is about 2.5% (1/40)

http://davidmlane.com/hyperstat/z_table.html Type 1.96 into the 'above' box

Or look at the x-numbers here:

http://photos1.blogger.com/blogger/4850/1908/1600/sd%20table1.jpg

So Pele is 3.7 ---> 0.000108 > 1/9259 (about 1/9300)

The Bradman one is so far out that you need to find a special website that doesn't have rounding errors, but I calculated it to be 1/184,000

This is not 'original' research. This is a straight correspondence to the statistical meaning of Standard Deviation. —Preceding unsigned comment added by Macgruder (talk • contribs) 13:28, 20 May 2008 (UTC)

So I'm stuck. I don't know what figures to include or how to reference them. Please help! --Dweller (talk) 13:46, 20 May 2008 (UTC)


 * The above poster has a good point in that 4.4 is anywhere between 4.35 and 4.45, which would give a range.
 * You can use this

// Cumulative standard normal probabilities returned by the function below:
//   4.35: 0.99999318792039 (1/147,000)
//   4.4:  0.99999458304695 (1/185,000), agreeing with 4.4 --> 0.000005413 from published tables; see below
//   4.45: 0.99999570276356 (1/232,000)

// Normal CDF, P(value <= $X), using a five-term polynomial approximation.
function normdist($X, $mean, $sigma) {
    $res = 0;
    $x = ($X - $mean) / $sigma; // standardise to a z-score
    if ($x == 0) {
        $res = 0.5;
    } else {
        $oor2pi = 1 / (sqrt(2 * 3.14159265358979323846)); // 1 / sqrt(2 * pi)
        $t = 1 / (1 + 0.2316419 * abs($x));
        // After this statement $t holds the tail probability beyond |$x|.
        $t *= $oor2pi * exp(-0.5 * $x * $x) * (0.31938153 + $t *
            (-0.356563782 + $t * (1.781477937 + $t *
            (-1.821255978 + $t * 1.330274429))));
        if ($x >= 0) {
            $res = 1 - $t;
        } else {
            $res = $t;
        }
    }
    return $res;
}


 * refer to this table :

http://www.math.unb.ca/~knight/utility/NormTble.htm which agrees with my calculations. Look at the far 'right numbers'

If you link to that table it indicates that the calculations are by and large correct (if we consider 4.4 to be correct). There is no more need to cite a SD to probability than a conversion of Inches to cms. It's a factual conversion. Macgruder (talk) 14:09, 20 May 2008 (UTC)
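As a cross-check on the figures quoted in this thread, the same upper-tail probabilities can be sketched with the complementary error function from Python's standard library. This is only an illustration, not the method any poster used; the function name `upper_tail` is made up for this example, and the calculation assumes an exactly normal distribution.

```python
import math

def upper_tail(z):
    """P(Z > z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# SD values mentioned in the thread: 1.96, 3.7, and 4.4 with its rounding limits.
for z in (1.96, 3.7, 4.35, 4.4, 4.45):
    p = upper_tail(z)
    print(f"{z:>5}: tail = {p:.3e}, about 1 in {1 / p:,.0f}")
```

The loop prints the tail probability and its reciprocal for each value, so the spread between 4.35 and 4.45 can be read off directly.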
 * I'd like to reiterate that this is not a straight factual conversion, it is a conversion predicated on the assumption that the data are normally distributed. This assumption is of course straightforwardly wrong in a strict sense (the data are all nonnegative) but it might be a sensible approximation. I do not have the means to discover whether or not it is. Algebraist 14:15, 20 May 2008 (UTC)


 * Agree with Algebraist. In practice a data point like this that is more than 4 SDs away from the mean is far too much of an outlier to provide any useful likelihood estimates. I doubt that the sample size was anywhere near large enough to support the "1 in 184,000" conclusion. I would delete the whole "Probability" column and the paragraph immediately following the table, and just leave the SD figures, which presumably come straight from Davis. It is reasonable to state the SD figures, if they are given by Davis, but we should not embroider them with additional interpretations. Gandalf61 (talk) 14:25, 20 May 2008 (UTC)


 * That's exactly the conclusion User:The Rambling Man and I have just come to. The probabilities are just that bit too close to debatable (and are definitely very hard to source) for the strictures of WP:FA and I'm afraid they'll just have to go. We have sourced comment giving context for just how good people in other sports would have to be to reach such an exalted SD, which will do just fine. Thanks all, especially Macgruder who responded very quickly to my request for help. --Dweller (talk) 14:37, 20 May 2008 (UTC)


 * I mean 'factual' in the sense that (if the author is stating the statistics are a normal distribution) it is a straight interpretation of the number 4.4. What 4.4 means statistically, in terms of how it can be interpreted or how reliable it is, is not under discussion here. We are taking an author's conclusion and converting what it means. Any reliability issues with 4.4 must be sourced from the author. If the author states the 4.4 figure is not reliable then so can we. Of course, I can't find the original book so I don't know if he says it's normal or not. If in another article someone states X is 1.96 SDs above the mean we could state that this is 2.5% - it would not be appropriate to go into a discussion of sample bias, etc. if the respected source doesn't do so. My feeling is that either the 1/x probabilities stay or the whole table goes. If we feel that the author's original statistics are not a valid reference then fair enough. However, if we feel that the author's reference is valid and he himself does not discuss the 4.4 then there is no reason to remove it [unless of course the author says that it's not a normal distribution]. These numbers make the SDs easy for a reader to understand, and if we have problems with them anyway the whole table shouldn't be there.


 * Although thinking about it, major championships is obviously not Normal. It will be heavily skewed towards zero, but some of the other stats like goals might be, and cricket scores might be too. I wish I could see the original source.

Macgruder (talk) 14:53, 20 May 2008 (UTC)


 * Algebraist is correct that it's not possible to score <0 in these statistics. Davis seems to have only cited the SDs. To show their impact in a practical sense, rather than looking at probabilities of finding others with such skill, he's applied the SDs to the leading stats in various sports, showing what a (say) baseball batting average would need to be to be Bradmanesque. --Dweller (talk) 15:06, 20 May 2008 (UTC)

Fields
For the field of integers mod p, I cannot find an inverse element for all cases. $$x * x^{-1} = 1$$ where x = 0 does not seem to work. What have I missed? 81.187.252.174 (talk) 11:10, 20 May 2008 (UTC)
 * 0 is the only element with no multiplicative inverse. You've missed nothing. –King Bee (τ • γ) 11:29, 20 May 2008 (UTC)


 * Hurf, didn't read the definition properly. Blargh burp 81.187.252.174 (talk) 12:14, 20 May 2008 (UTC)
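To illustrate the point made in the reply, here is a small sketch for one prime, p = 7 (an arbitrary choice for this example). It relies on Python's built-in three-argument `pow`, which accepts an exponent of -1 from Python 3.8 onwards; the variable names are made up for illustration.

```python
p = 7  # any prime would do; 7 is just an example

# Every nonzero residue has a multiplicative inverse modulo p.
inverses = {x: pow(x, -1, p) for x in range(1, p)}
for x, inv in inverses.items():
    print(f"{x} * {inv} = 1 (mod {p})")

# Zero is the exception: pow raises ValueError because no inverse exists.
try:
    pow(0, -1, p)
    zero_has_inverse = True
except ValueError:
    zero_has_inverse = False
print("0 has an inverse:", zero_has_inverse)
```

This matches the field axioms, which only require inverses for the nonzero elements.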

Geometry question on angles
Assume a circle (call it A if you desire) on a line with radius r. Constructed at 2r from its center point (rightwards direction) is an identical circle B of radius r. Then assume an angle θ in the first circle, extending outwards as a line in the direction of the second circle. How does one find the point for the rightmost intersection of circle B's perimeter, expressed as a θ(b)? Ah, I will upload something to help illustrate my example! Right here! I'm thankful for any help. Scaller (talk) 16:09, 20 May 2008 (UTC)
 * The right circle has equation $$(x-r)^2+y^2=r^2$$ and the line has equation $$y=(x+r)\tan\theta$$ (you can work this out: 2 points on the line are (-r, 0) and (0, $$r\tan\theta$$), the latter by trigonometric identities.) Then it's just a simultaneous equation which may be able to be solved (haven't gone any further).  x42bn6 Talk Mess  16:34, 20 May 2008 (UTC)
 * Alternatively, you could use the law of sines. You have a triangle with one known angle and two known sides. The law of sines, combined with the fact that angles in a triangle add up to 180 degrees should be enough to determine all the other angles and sides. That will then give you the point in polar coordinates with the centre of the second circle as origin, which you can then convert to Cartesian coordinates if you want. --Tango (talk) 16:43, 20 May 2008 (UTC)
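Both suggestions can be carried through numerically. The sketch below uses the coordinates from the first reply (circle A centred at (-r, 0), circle B at (r, 0)) and assumes, since the illustration is not reproduced here, that θ is measured at A's centre from the line joining the two centres. Writing s for the distance along the ray from A's centre, substituting into B's equation gives s² − 4rs·cos θ + 3r² = 0, whose larger root is the rightmost intersection. The function name and return format are hypothetical choices for this example.

```python
import math

def rightmost_intersection(r, theta):
    """Rightmost point where the ray from A's centre meets circle B.

    Assumes A is centred at (-r, 0), B at (r, 0), and theta (radians) is
    measured at A's centre from the line of centres. Returns the point and
    its polar angle about B's centre, or None if the ray misses B.
    """
    disc = 4 * math.cos(theta) ** 2 - 3
    if disc < -1e-12:
        return None  # the ray only reaches B when |theta| <= 30 degrees
    s = r * (2 * math.cos(theta) + math.sqrt(max(disc, 0.0)))  # larger root
    x = -r + s * math.cos(theta)
    y = s * math.sin(theta)
    theta_b = math.atan2(y, x - r)  # angle seen from B's centre
    return (x, y), theta_b

point, theta_b = rightmost_intersection(1.0, math.radians(20))
print(point, math.degrees(theta_b))
```

Under the same assumptions, the law of sines route gives the closed form θ(b) = θ + arcsin(2 sin θ), which the numerical result can be compared against.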

Game show theorem
I came across the Monty Hall problem some time ago, and I'm curious about how it applies to more complex situations. Anybody have a clue? Bastard Soap (talk) 21:42, 20 May 2008 (UTC)
 * What kind of more complex situations did you have in mind? There are all kinds of situations in probability where intuition turns out to be wildly incorrect. Take a look at Prosecutor's fallacy for one example. Also, if you haven't seen it already, we have an article on the Monty Hall problem. --Tango (talk) 21:55, 20 May 2008 (UTC)
 * I’ve seen this problem which is somewhat similar cause much debate (and I remember it because I was convinced I had the right answer for several hours until I figured out otherwise).
 * The situation is that there are 3 people, and they will enter the same room and receive a hat (either red or blue, each with a 50% chance). They cannot communicate whatsoever once in the room, but they can collaborate and determine a strategy before entering the room.
 * Then at the exact same time each of them is to guess what color their hat is (or they may choose not to guess at all). If at least one person guesses correctly and nobody guesses wrong, they win a prize.
 * The question is, given an optimal strategy, what is the probability that they will get the prize? GromXXVII (talk) 22:36, 20 May 2008 (UTC)
 * Give me a minute... --Tango (talk) 22:44, 20 May 2008 (UTC)
 * Nope, not happening. At least one person has to guess, and they have a 50% chance of getting it wrong and blowing everything, so there is no way to improve on 50%. However, if the answer was 50%, you wouldn't have asked the question... Damn you... Let me sleep on it. --Tango (talk) 22:50, 20 May 2008 (UTC)
 * 75%. Turning my computer off and going to sleep: Take 2. --Tango (talk) 23:02, 20 May 2008 (UTC)
 * I had first confused this with another problem where after receiving the hats, the players are asked one by one if they know their own hat's color, and after at most 5 (IIRC) queries, someone will know. BTW, Tango's going to sleep quip is actually quite a cute coincidence, as a New York Times  article  covering the problem mentions someone going to bed after solving it and recognizing its relevance to coding theory as he fell asleep. And yes, the answer is 75%  Baccyak4H (Yak!) 04:11, 21 May 2008 (UTC)
 * I'm having a hard time understanding the problem completely. Each one gets a hat (at the same time?), but can't see what color their own hat is?  Can they see the color of each of the others' hats?  --Prestidigitator (talk) 04:55, 21 May 2008 (UTC)
 * You have to assume that they can in order to give them a better than 50% chance of getting it right. And it took me a while to work out what the strategy is but I knew it had something to do with what to guess given what you see. (Here's a hint: no matter what the arrangement of hats, there will always be at least two people with the same colour. If you're a prisoner who can see two hats the same colour, or two hats of different colour, see if that affects your optimum strategy for guessing your own colour.) Confusing Manifestation (Say hi!) 06:42, 21 May 2008 (UTC)
 * Interesting how the game show contestants became prisoners there. --tcsetattr (talk / contribs) 07:10, 21 May 2008 (UTC)
 * Am I missing something? If all are given with 50% probability then whatever colour hats the other people have won't affect the probability of guessing your own colour. The obvious answer seems to be that they should decide that just one should guess having a 50% chance. -- Q Chris (talk) 07:36, 21 May 2008 (UTC)
 * Yeah, you're missing it. The external link given by Baccyak4H contains a pretty thorough spoiler which should have you slapping your forehead and saying "Of course!" --tcsetattr (talk / contribs) 07:58, 21 May 2008 (UTC)
 * You’re right that each person that guesses has a 50% chance of guessing right, or guessing wrong regardless of any other factors. But the problem has three people, and whether one person guesses right or wrong is not independent of whether somebody else does (in an optimal strategy, which to my knowledge results in 75%). GromXXVII (talk) 10:52, 21 May 2008 (UTC)
 * The key point to note is that the method which the hats were assigned was made explicit; there is a prior distribution on what the hats can be. While when reading the problem the description seems trivial, it turns out to be crucial.  If no information about that is known then of course there is no improvement over 50%.  So look at this as a conditional probability problem, and consider Thomas Bayes your friend... Baccyak4H (Yak!) 14:28, 21 May 2008 (UTC)
 * Or just write out all the possible combinations - worked for me! --Tango (talk) 15:17, 21 May 2008 (UTC)
 * Right, but that still presupposes the prior.  What if the prior was probability 1 for all hats being red?  What is the chance that the strategy will work in that case?    Was your sleeping comment intentional, or just a coincidence, with respect to the article I linked to above? Baccyak4H (Yak!) 16:37, 21 May 2008 (UTC)
 * Well, yes, but we were given the prior. I wasn't suggesting you were wrong, just that there's a less technical approach. Pure coincidence - I read the problem just as I was about to turn the computer off and go to sleep, attempted it, gave up, turned the computer off, immediately realised the answer and turned the computer back on again! (I'm not entirely sure why... bragging rights, I guess.) --Tango (talk) 18:30, 21 May 2008 (UTC)

So it's not really correct that events which aren't linked don't "influence" each other? You can get the flow of the event by looking at other manifestations of it? Bastard Soap (talk) 08:40, 21 May 2008 (UTC)
 * Having read the spoiler I don't think that's right. The events don't influence each other; the strategy just means that when a wrong guess is made it will be by all three people at the same time, whereas a correct guess will be made by just one. Since each guess has a 50/50 chance, three out of four times a correct guess will be made, and one out of four times three incorrect guesses will be made. -- Q Chris (talk) 10:54, 21 May 2008 (UTC)
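Writing out all the combinations, as suggested above, is easy to do by machine. The sketch below assumes the usual strategy for this puzzle, which the thread hints at but does not spell out: a player who sees two hats of the same colour guesses the opposite colour, and a player who sees two different colours passes. The function names are hypothetical.

```python
from itertools import product

def guess(seen):
    """Assumed strategy: guess the opposite colour if both visible hats match, else pass."""
    first, second = seen
    if first == second:
        return "B" if first == "R" else "R"
    return None  # pass

def team_wins(hats):
    """The team wins if at least one guess is made and every guess made is correct."""
    outcomes = []
    for i, own in enumerate(hats):
        seen = tuple(h for j, h in enumerate(hats) if j != i)
        g = guess(seen)
        if g is not None:
            outcomes.append(g == own)
    return bool(outcomes) and all(outcomes)

wins = sum(team_wins(hats) for hats in product("RB", repeat=3))
print(f"{wins} winning arrangements out of 8")
```

The two losing arrangements are the ones where all three hats match, which is exactly where all three players guess wrong together.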

Incidentally, is there a simple proof that I'm missing that the 75% strategy is optimal? I'm pretty sure it is, but I can't see how to prove it. --Tango (talk) 11:46, 21 May 2008 (UTC)
 * I saw more or less the same problem a while ago. Here's one form: you have (say) 1023 (this is a hint) people, each provided with a random hat as before. This time everyone must vote, and the group wins if the majority vote correctly. A variant allows weighted voting, so you can (say) cast 100 votes that you have a red hat. I was reminded of these because the forced voting clause makes it easy to prove the optimal strategies are indeed optimal, which I can't quite see how to do for the problem here. Algebraist 12:53, 21 May 2008 (UTC)
 * This might work...although I haven’t tried to actually prove the optimality before
 * Consider that regardless of the strategy everyone will guess wrong 50% of the time. If two people guess wrong together in a case, that provides correctness of at most 67%. If all three people guess wrong together in a case, then this provides correctness of at most 75%. This exhausts all cases. It also assumes that one can partition the sample space though, which isn’t always possible, but nonetheless should provide the sufficient upper bound. (for instance, I don’t think it’s possible to have a strategy which wins two thirds of the time because 8 isn’t divisible by 3). GromXXVII (talk) 16:42, 21 May 2008 (UTC)
 * It's not possible to have a deterministic strategy with a 2/3 chance of winning; if you allow non-deterministic strategies, it should be doable. --Tango (talk) 18:25, 21 May 2008 (UTC)
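For the three-player case, the optimality question can also be settled by exhaustive search rather than proof. The sketch below assumes deterministic strategies only: each player maps the four possible views of the other two hats to red, blue or pass, giving 3^4 = 81 strategies per player and 81^3 joint strategies. The bitmask encoding and all names are choices made for this illustration.

```python
from itertools import product

PASS = 2
CONFIGS = list(product((0, 1), repeat=3))  # the 8 hat arrangements
VIEWS = list(product((0, 1), repeat=2))    # what one player can see

def masks_for_player(i):
    """For each of player i's 81 strategies, bitmasks of configs guessed right and wrong."""
    a, b = [j for j in range(3) if j != i]
    result = []
    for choice in product((0, 1, PASS), repeat=4):
        table = dict(zip(VIEWS, choice))
        right = wrong = 0
        for bit, hats in enumerate(CONFIGS):
            g = table[(hats[a], hats[b])]
            if g == PASS:
                continue
            if g == hats[i]:
                right |= 1 << bit
            else:
                wrong |= 1 << bit
        result.append((right, wrong))
    return result

players = [masks_for_player(i) for i in range(3)]
best = 0
for (r0, w0), (r1, w1), (r2, w2) in product(*players):
    # A config is a win if someone is right there and nobody is wrong there.
    win_mask = (r0 | r1 | r2) & ~(w0 | w1 | w2)
    best = max(best, bin(win_mask).count("1"))
print(f"best deterministic strategy wins {best} of 8 arrangements")
```

This says nothing about randomised strategies or larger groups; it only confirms the bound for three players under the stated rules.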

Momentum and vectors - Mechanics
Hopefully this is okay here, I have an exam tomorrow and I'm going over past papers and am very confused :(

Two particles, A and B, are moving on a smooth horizontal surface. Particle A has mass m kg and velocity (5i - 3j) ms<sup>-1</sup>. Particle B has mass 0.2kg and velocity (2i + 3j) ms<sup>-1</sup>. The particles collide and form a single particle C with velocity (ki - 1j) ms<sup>-1</sup>, where k is a constant.


 * Show that m = 0.1

I formed an expression for the total momentum of the two particles: m(5i - 3j) + 0.2(2i + 3j). Due to the conservation of momentum, the particle C has to have the same mass and momentum, right? So I set that expression equal to (m + 0.2)(ki + 1j). But now I appear to have two unknowns and am thoroughly stuck. Any pointers would be appreciated.  naerii -  talk  22:52, 20 May 2008 (UTC)
 * You have two unknowns, but if you look carefully, you also have two equations - one for i and one for j. --Tango (talk) 23:03, 20 May 2008 (UTC)
 * I love you tango :)  naerii -  talk  23:12, 20 May 2008 (UTC)
 * I love you, too! --Tango (talk) 11:47, 21 May 2008 (UTC)
 * Maybe this is a red herring and was just a typo when you restated the problem, but I think you introduced a sign error at some point in the velocity of C too. At one point you said -1j, and at another you used +1j.  --Prestidigitator (talk) 05:02, 21 May 2008 (UTC)
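Following the hint about two equations, here is a sketch of the component equations, assuming C's velocity is (ki + 1j) as in the asker's working rather than (ki - 1j) as in the restated problem. Conservation of momentum, $$m(5\mathbf{i} - 3\mathbf{j}) + 0.2(2\mathbf{i} + 3\mathbf{j}) = (m + 0.2)(k\mathbf{i} + \mathbf{j})$$, splits into

$$\mathbf{i}:\ 5m + 0.4 = (m + 0.2)k, \qquad \mathbf{j}:\ -3m + 0.6 = m + 0.2$$

The j equation gives $$4m = 0.4$$, so $$m = 0.1$$; the i equation then gives $$0.9 = 0.3k$$, so $$k = 3$$. With (ki - 1j) instead, the j equation becomes $$-3m + 0.6 = -(m + 0.2)$$, which gives $$m = 0.4$$ and does not match the required result, so the sign discrepancy noted above does matter.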