Wikipedia:Reference desk/Archives/Mathematics/2006 July 16

Confidence margin
In a survey of mine, I asked 35 Grade 11 boys in my school in what languages they could “carry on a conversation, read newspapers and write personal letters.” Three indicated that they could do this in Klingon. Two said they could do so in French. Based on this, how confident can I be that more Grade 11 boys were this proficient in Klingon than in French? Seahen Neon Merlin  07:11, 16 July 2006 (UTC)


 * You're in a better position than us to say whether you can believe what these boys told you; I'll assume you do believe them. If you took a simple random sample of 35 boys from your school, then you could use a hypothesis test to say whether the difference is statistically significant at a particular level of confidence, telling you whether you can generalise from your sample to all Grade 11 boys at your school. However, your sample is small enough, and Klingon and French proficiency rare enough, that the usual large-sample formula is not reliable. You should ideally use an exact test instead, or a simpler method like the Wilson score test. Our Binomial proportion confidence interval article gives details of the latter for a single binomial proportion, but your problem concerns the difference between two proportions from the same sample, which requires a slightly different formula. This is all a bit academic here, however, because all these methods will tell you that you can't have much confidence at all that a difference this small is real. -- Avenue 10:57, 16 July 2006 (UTC)
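For illustration only, here is a minimal Python sketch of the single-proportion Wilson score interval mentioned above, applied to each language separately. The function name `wilson_interval` and the choice z = 1.96 (roughly 95%) are assumptions made for this example. As noted in the reply, comparing two proportions from the same sample properly needs a different formula, so the overlap of these two intervals is only an informal indication.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for one binomial proportion.

    z = 1.96 corresponds to roughly 95% confidence (an assumed choice).
    """
    p_hat = successes / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n
                               + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width

# Survey figures from the question: 3 of 35 (Klingon), 2 of 35 (French).
klingon = wilson_interval(3, 35)
french = wilson_interval(2, 35)
print("Klingon: %.3f to %.3f" % klingon)
print("French:  %.3f to %.3f" % french)
```

The two intervals overlap over most of their range, which fits the conclusion that a difference of one respondent tells you very little.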


 * Let us use N = 35 for the sample size, K = 3 for the number of observed Klingon speakers, and F = 2 for the French speakers. Presumably your null hypothesis is that p_F, the probability of French proficiency for a random element of the space of Grade 11 boys, is at least as large as p_K, the same for Klingon proficiency. Assuming independence of these two properties, the maximum likelihood for the observation (N, K, F) under the constraint of the null hypothesis is obtained for p_F = p_K = (K+F)/(2N) = 1/14. The "surprising observation" under the assumption of the null hypothesis is K > F. How surprising is it really? Defining Bi(n, k, p) = C(n, k) p^k (1 – p)^(n – k) (see Binomial distribution), the probability of such an observation is the sum of the products Bi(N, k, p_K) Bi(N, f, p_F) over all pairs of counts with 0 ≤ f < k ≤ N. For the given values, this adds up to slightly more than 0.4 or 40 percent. This is much higher than levels commonly employed for statistical significance. In conclusion, the observation (under the presumed null hypothesis) is not surprising, and does not allow you to generalize beyond the sample with any reasonable confidence. --Lambiam Talk  11:42, 16 July 2006 (UTC)
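The sum described above can be checked with a few lines of Python. This is only a sketch of that calculation under the same assumptions (independent properties, both probabilities equal to 1/14, a very large population); the names `binom_pmf` and `prob_k_gt_f` are made up for this example.

```python
from math import comb

def binom_pmf(n, k, p):
    """Bi(n, k, p) = C(n, k) p^k (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

N = 35
p = (3 + 2) / (2 * N)  # common value of p_K and p_F under the null: 1/14

# Probability of each possible count 0..N for one language.
pmf = [binom_pmf(N, k, p) for k in range(N + 1)]

# Sum over all pairs (k, f) with 0 <= f < k <= N.
prob_k_gt_f = sum(pmf[k] * pmf[f] for k in range(N + 1) for f in range(k))
print(prob_k_gt_f)
```

The printed value comes out a little above 0.4, in line with the figure quoted in the reply.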


 * If I read that correctly, that means that even if F > K in the population, there was still a 40% chance of K > F in the sample. Is that right? Also, do the numbers change if we consider that the population size is finite? (I think it's between 50 and 150 in the school where I did the survey, but I don't have the exact size.) Seahen Neon Merlin  16:12, 16 July 2006 (UTC)


 * Almost right; change F > K to F = K. (If we suppose F >= K in the population, then F = K is the most likely possibility given the data you have.) Good point about the finite population. Assuming you sampled without replacement, you'd need to use the hypergeometric distribution to do an exact test. This will lower the confidence level further, however, so it wouldn't change the conclusion here. -- Avenue 02:08, 17 July 2006 (UTC)


 * Lower the confidence level? I don't understand. A smaller population means making fewer guesses, and less opportunity for the unsampled population to cancel out the sampled, no? Seahen Neon Merlin  02:33, 17 July 2006 (UTC)


 * Sorry, you're right - I don't know what I was thinking. Ignore that part of my response. -- Avenue 07:44, 17 July 2006 (UTC)


 * If the population size is 35, we cannot have simultaneously that F > K in the population and K > F in the sample, so size does matter. The 40% is for a very large population. For a population size of 150 it may be a reasonable approximation, but for 35 we know the chance is exactly 0, so for 50 it will be appreciably lower than 40%. --Lambiam Talk 12:10, 17 July 2006 (UTC)

Any easy & free simulation languages?
I would like to simulate a business situation where I buy a growing number of houses and rent them out to people. I can afford to buy another house when there is enough equity or cash to provide a deposit, and where the total rents will cover the extra mortgage. Renters may sometimes default on paying the rent, or there may be voids between renters.

I am only competent with GWBASIC, so ease of use is the primary concern. I have tried writing a simulation in BASIC but it got too big and complicated. As you may have surmised, the houses and renters could each be from the same template with different values. I have not learnt anything about object-oriented programming yet, though.

Any suggestions please?--81.104.12.20 12:37, 16 July 2006 (UTC)


 * Take a look at GNU Octave. It is almost as simple as Basic, and is probably more powerful. (Igny 15:53, 16 July 2006 (UTC))

Thanks. Sorry to quibble, but it's not actually a simulation language as far as I am aware. Using a non-simulation language means I would have to write the code to deal with the time aspects of the situation, including any graphs against time of various parameters, which a simulation language would have built in. Simulation languages I have heard of are Simula (not free), plus I have also heard of a language called Witness but know nothing about it.

From the article Simulation language, I see that only SimPy is free and general-purpose. I had hoped there would be others.
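For comparison, here is a rough sketch of how the month-by-month bookkeeping might look in plain Python, without a simulation library. Every number in it (prices, rent, mortgage payment, void and default probabilities) is a made-up placeholder, equity growth is ignored, and the buying rule is only one possible reading of the description above: buy when cash covers the deposit and expected rents cover all mortgage payments including the new one.

```python
import random

def simulate(months=120, seed=0):
    """Toy rental-portfolio model; all parameter values are hypothetical."""
    rng = random.Random(seed)          # fixed seed so runs are repeatable
    cash = 30_000.0                    # starting cash
    price, deposit_rate = 100_000.0, 0.25
    rent, mortgage = 700.0, 450.0      # per house, per month
    p_void, p_default = 0.05, 0.03     # chance of no tenant / unpaid rent
    houses = 0
    history = []                       # (month, houses, cash) per month

    for month in range(months):
        deposit = price * deposit_rate
        expected_rent = rent * (1 - p_void) * (1 - p_default)
        # Buying rule: enough cash for the deposit, and expected rents
        # cover the mortgages on all houses including the new one.
        if cash >= deposit and expected_rent * (houses + 1) >= mortgage * (houses + 1):
            cash -= deposit
            houses += 1

        # Collect rent house by house; voids and defaults pay nothing.
        income = 0.0
        for _ in range(houses):
            if rng.random() < p_void or rng.random() < p_default:
                continue
            income += rent
        cash += income - mortgage * houses
        history.append((month, houses, cash))
    return history

history = simulate()
for month, houses, cash in history[::12]:
    print(month, houses, round(cash))
```

The `history` list holds the values over time, so it could be passed to any plotting tool to get the graphs against time; a dedicated simulation package such as SimPy would take over the time-stepping part.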