Wikipedia:Reference desk/Archives/Mathematics/2008 August 29

= August 29 =

Statistics - How can I test for the existence of a significant difference?
OK, here is an example. You have two countries, France and Spain. You collect a sample of foreign companies doing business in each country and ask specifically where they are from. Say 35% of the foreign companies interviewed in France were American but 55% of the foreign companies in Spain were American. How can you test for a bias, that is, for something that would suggest a preference for Spain? I was thinking of simply subtracting 55 − 35 and saying American companies have a 20% bias towards Spain, but I think this is wrong, and anyway it's really too basic. I would like to test it in a way that is a bit more advanced and from which I can draw some conclusions. Also, what is that thing about the degree of significance? Doesn't that mean how sure we can be that this bias exists? How can I include this in the above test? —Preceding unsigned comment added by 79.75.138.119 (talk) 07:26, 29 August 2008 (UTC)
 * I answered a similar question in http://en.wikipedia.org/w/index.php?title=Wikipedia:Reference_desk/Mathematics&oldid=229316117#Statistics.
 * It is not sufficient to know the percentages. "35 out of 100" is more information than "7 out of 20", even though the percentage is 35% in both cases. So you must provide the exact sample data. Let i out of n observed foreign companies in France be American. Then the unknown probability, 0 ≤ P ≤ 1, that a randomly chosen foreign company in France is American has a beta distribution with mean value ± standard deviation
 * $$ P\approx \frac {i+1}{n+2} \pm\sqrt{\frac{\frac {i+1}{n+2}\cdot(1-\frac {i+1}{n+2})}{n+3}}. $$
 * The probability for Spain has another beta distribution. You want to compute if these two distributions have a significant overlap, or if they are significantly separated. Bo Jacoby (talk) 08:32, 29 August 2008 (UTC).
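
The formula above can be sketched in a few lines of Python. This is only an illustration: the function name `beta_summary` is hypothetical, and the sketch assumes that nothing is known about P beforehand (a uniform prior), which is the setting in which the mean (i+1)/(n+2) arises. The two sample sizes are the ones mentioned in the reply.

```python
import math

def beta_summary(i, n):
    """Mean and standard deviation of P after observing i out of n.

    Assumes a uniform prior on P, so P follows Beta(i + 1, n - i + 1).
    """
    mean = (i + 1) / (n + 2)
    sd = math.sqrt(mean * (1 - mean) / (n + 3))
    return mean, sd

# The same 35% share observed with two different sample sizes.
for i, n in [(7, 20), (35, 100)]:
    mean, sd = beta_summary(i, n)
    print(f"{i} out of {n}: P = {mean:.3f} +/- {sd:.3f}")
```

The larger sample gives a noticeably smaller standard deviation, which is why the raw counts matter and not just the percentage.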

Hi Bo, thanks for answering my question on the help desk so promptly. The problem is that my ability in statistics is very limited, so I am struggling to get my head around your answer. I also looked at the similar answer you directed me to, but I got quite lost there too. Perhaps it would help if I told you what data I'm working with. Basically, the Japan and Spain data were collected from different sources and represent the number of foreign companies entering a foreign market for the first time. Assume none of the companies could have entered both markets at the same time. I wish to analyse this data so as to test for regional preferences. How can I test the extent of these preferences? And how can I say with a certain degree of certainty that these preferences exist? Perhaps, if you have the time, you could work through an example and explain what the findings show. I would be greatly appreciative.

OK, so I did the test you suggested, and assuming I was correct to both add and subtract the root, I have found the following:

For Japan:

America: 0.125382329 - 0.044830437; Asia: 0.645831957 - 0.503104213; Europe: 0.224457654 - 0.115967878; Middle East: 0.271838054 - 0.153693861; Oceania: 0.042105213 - 0.000447979

For Spain:

America: 0.091807244 - 0.040060888; Asia: 0.435336959 - 0.33389381; Europe: 0.435336959 - 0.33389381; Middle East: 0.154865589 - 0.086892653; Oceania: 0.104704428 - 0.049141726

Now how do I go about proving something along the lines of a preference of, say, European firms for Spain over Japan, which there ostensibly is when one looks at the original data? Thanks 79.75.138.119 (talk) 19:02, 29 August 2008 (UTC)


 * Hello sir.
 * First some hints for editing: Log in and provide a name for yourself rather than just the number 79.75.138.119. Provide links using square brackets, like this: [[user talk:Bo Jacoby]], rather than copying the address. Sign by typing four tildes, like this: ~~~~. When answering, just edit the current section rather than making a new section.
 * Next, I do appreciate your difficulties in understanding. It is conceptually complex, and there are missing links in the chain of thought, as I provided a formula without a proof.
 * The question is: "Assuming that the total number of foreign companies in Japan is large, observing that i = 3 out of n = 45 randomly chosen foreign companies were American, what is the probability, P, that a randomly chosen foreign company is American?" The quick answer is P = i/n = 3/45 = 0.067, but you cannot be sure. You know for sure that 0 < P < 1. It might turn out that P is actually 0.100, because in a big population of foreign companies having 10% American companies it is quite possible that there are only 3 American companies in a sample of 45 foreign companies. When you don't know, you can express your limited knowledge in the form of a probability distribution. But a distribution function is a complex entity. You summarize it by its mean value and its standard deviation. The mean value of the distribution of P is
 * $$\frac {i+1}{n+2}=\frac {3+1}{45+2}=0.085 .$$
 * This does not mean that P = 0.085 because we do not know what P is, but it means that P is probably rather close to 8%. What does 'rather close' mean? The standard deviation of the distribution of P provides the answer. It is computed by the formula
 * $$ \sqrt{\frac{\frac {i+1}{n+2}\cdot(1-\frac {i+1}{n+2})}{n+3}}=\sqrt{\frac{\frac {3+1}{45+2}\cdot(1-\frac {3+1}{45+2})}{45+3}}=0.040 .$$
 * So we write P ≈ 0.085 ± 0.040, meaning that we do not know the exact value of P, but it has some distribution having mean value 0.085 and standard deviation 0.040. The true value is very likely no more than a few standard deviations away from the mean value. The numbers for Spain are i = 5 and n = 89, giving P ≈ 0.066 ± 0.026. Now you want to know: "Is the probability that a foreign company is American greater in Japan than in Spain?" Or, written as a formula, "is 0.085 ± 0.040 > 0.066 ± 0.026?" Or, "is 0.085 - 0.066 ± 0.026 ± 0.040 > 0?" Standard deviations are combined as the square root of the sum of the squares, ± 0.026 ± 0.040 = ± (0.026² + 0.040²)^(1/2) = ± 0.048, so the question is: "is 0.019 ± 0.048 > 0?" Dividing by 0.048 gives: "is 0.4 ± 1 > 0?" The answer is: "not significantly", meaning that it is not improbable that a random observation from a distribution having mean = 0.4 and standard deviation = 1 assumes a negative value.
 * You must have made a mistake in computing your ten-figure results above.
 * Bo Jacoby (talk) 07:37, 30 August 2008 (UTC).
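
The worked comparison above can be reproduced with a short Python sketch. The function names `beta_summary` and `compare` are hypothetical. The final step, which turns "0.4 ± 1" into a probability, assumes the difference is approximately normally distributed; that is an added assumption, since the reply only speaks of "a distribution" with that mean and standard deviation.

```python
import math

def beta_summary(i, n):
    """Mean and standard deviation of P after observing i out of n."""
    mean = (i + 1) / (n + 2)
    sd = math.sqrt(mean * (1 - mean) / (n + 3))
    return mean, sd

def compare(i1, n1, i2, n2):
    """Difference of the two means, its combined sd, and their ratio."""
    m1, s1 = beta_summary(i1, n1)
    m2, s2 = beta_summary(i2, n2)
    diff = m1 - m2
    sd = math.hypot(s1, s2)  # square root of the sum of the squares
    z = diff / sd
    # Assumption: normal approximation for the difference.
    # Probability that the true difference is actually negative.
    p_negative = 0.5 * math.erfc(z / math.sqrt(2))
    return diff, sd, z, p_negative

# Japan: 3 American out of 45; Spain: 5 American out of 89.
diff, sd, z, p_negative = compare(3, 45, 5, 89)
print(f"difference = {diff:.3f} +/- {sd:.3f}")
print(f"in units of the standard deviation: {z:.1f} +/- 1")
print(f"probability the difference is negative: {p_negative:.2f}")
```

Under that normal approximation the chance of a negative difference comes out at roughly one in three, which is what "not significantly" means here: the data do not rule out the opposite ordering.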

A generalization of Brouwer's fixed point theorem
This is a question that I learned about from Fedor Nazarov. Suppose that f and g are continuous functions from the closed unit disk to itself and that they commute; namely, $$\scriptstyle f\circ g=g\circ f$$. Is it necessarily the case that there is a point x in the unit disk such that $$\scriptstyle f(x)=g(x)$$? If we take g to be the identity mapping, then this becomes Brouwer's fixed point theorem. Is the answer to this question known? Does anyone know offhand a reference to this problem? Any proofs/counterexamples? Oded (talk) 21:24, 29 August 2008 (UTC)