Wikipedia:Reference desk/Archives/Mathematics/2009 January 7

= January 7 =

Estimation problem
I've been wondering how to estimate the size of a population if sample members have unique serial numbers, which are known to have been issued consecutively without gaps. Suppose that six enemy aircraft have been shot down, with serial numbers 235, 1421, 67, 216, 863 and 429 in that order. Assuming that any aircraft is as likely to be shot down as any other, is it possible to say anything about the likeliest total number? Indeed, what would this estimate be after each successive observation? The problem seems to have something in common with the lifetime estimation of J. Richard Gott, but I can't see any obvious way of tackling it.→81.159.14.226 (talk) 22:21, 7 January 2009 (UTC)


 * I've heard of this problem before, and can't remember how to construct an unbiased estimator, but have found a real-world application of it in Google Books. In case that doesn't show up for you, Allied statisticians in WWII used a formula that resembled x(1+1/n) to estimate the number of German tanks based on a captured sample, where x was the largest serial number on a captured tank and n was the number of tanks captured. Actually, a Google search for "estimate largest serial number" gives some more promising results on the first page. Confusing Manifestation (Say hi!) 22:30, 7 January 2009 (UTC)


 * Even better, this gives the unbiased estimator(s) W(i) = [(n + 1) X(i) / i] - 1, where X(i) is the ith smallest serial number found, with i = n giving the best (lowest variance) estimate. Confusing Manifestation (Say hi!) 22:34, 7 January 2009 (UTC)
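The estimators W(i) are easy to evaluate directly. Below is a minimal Python sketch applied to the six serial numbers from the question; it assumes sampling without replacement from 1, 2, ..., N and simply codes the formula W(i) = [(n + 1) X(i) / i] - 1 as stated above. The function names are hypothetical, chosen for illustration, and exact fractions are used so that no rounding creeps in.

```python
from fractions import Fraction

def w_estimators(sample):
    """Return [W(1), ..., W(n)] with W(i) = (n + 1) * X(i) / i - 1,
    where X(i) is the i-th smallest serial number in the sample."""
    xs = sorted(sample)
    n = len(xs)
    return [Fraction((n + 1) * x, i) - 1 for i, x in enumerate(xs, start=1)]

def sequential_w_n(sample):
    """Recompute W(n), the lowest-variance member, after each new observation."""
    return [w_estimators(sample[:k])[-1] for k in range(1, len(sample) + 1)]

serials = [235, 1421, 67, 216, 863, 429]

for i, w in enumerate(w_estimators(serials), start=1):
    print(f"W({i}) = {float(w):.1f}")

print([round(float(w), 1) for w in sequential_w_n(serials)])
```

For the full sample the six W(i) range from 468 (W(1)) to about 1656.8 (W(6)), a reminder of how much the individual estimators disagree. Recomputing W(n) after each new observation gives 469, 2130.5, about 1893.7, 1775.25, 1704.2 and about 1656.8, which matches the rounded sequence JackSchmidt lists further down.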


 * This reminds me of a famous tale about Hugo Steinhaus, from Mark Kac's book "Enigmas of Chance" (Harper & Row, 1985):
 * ... My favorite example of Steinhaus's incisive intelligence is the way he estimated the losses of the German army during WWII. Bear in mind that he was hiding under an assumed name and his only contact with the outside world was a rigidly controlled local news sheet that the Germans used mainly for propaganda purposes. The authorities allowed the news sheet to print each week a fixed number of obituaries of German soldiers who had been killed on the Eastern Front. The obituaries were standardized and read something like this: "Klaus, the son of Heinrich and Elvira Schmidt, fell for the Fuhrer and Fatherland." As time went on - late in 1942 and throughout 1943 - some obituaries began to appear which read "Gerhardt, the second of the sons of ...", and this was information enough to get the desired estimate. A friend to whom I told this story had occasion to tell it to a former high official of the CIA at a luncheon they both attended; the official was quite impressed, as well he might have been.  --PMajer (talk) 00:41, 8 January 2009 (UTC)


 * Is there a way to combine the various estimators W(i) into a single estimator of even lower variance than W(n)? It seems reasonable that using more information should give a better estimate (such as using a single sample to estimate the population average versus using an average of a larger sample), but I'm not sure how one would go about this.  Maybe a weighted average, with weights chosen carefully to emphasize the lesser variance estimates?  I remember something like this being a good idea, since scalars (the weights) get squared (and so smaller), while the variance of a sum is not that much more than the sum of the variances.  I'm not at all familiar enough with this stuff to figure out if the W(i) are sufficiently independent for the weights to be chosen well enough, nor to figure out if there is a better way of combining them. JackSchmidt (talk) 19:16, 8 January 2009 (UTC)
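One way to explore this question numerically is a small Monte Carlo experiment. The Python sketch below is not from the thread: the true total of 1500, the sample size of six, the number of trials and the random seed are all arbitrary assumptions, and the helper names are hypothetical. It draws many samples without replacement, evaluates every W(i) together with their equal-weight average, and reports the mean and spread of each.

```python
import random

def w_estimators(sample):
    """W(i) = (n + 1) * X(i) / i - 1, where X(i) is the i-th smallest serial."""
    xs = sorted(sample)
    n = len(xs)
    return [(n + 1) * x / i - 1 for i, x in enumerate(xs, start=1)]

def mean_and_variance(values):
    """Sample mean and (n - 1)-denominator sample variance."""
    mu = sum(values) / len(values)
    return mu, sum((v - mu) ** 2 for v in values) / (len(values) - 1)

def simulate(true_total=1500, n=6, trials=20_000, seed=1):
    """Repeatedly draw n distinct serials from 1..true_total and return
    (mean, variance) for W(1)..W(n) and, last, for their plain average."""
    rng = random.Random(seed)
    columns = [[] for _ in range(n + 1)]
    for _ in range(trials):
        ws = w_estimators(rng.sample(range(1, true_total + 1), n))
        for column, w in zip(columns, ws):  # fills the W(1)..W(n) columns
            column.append(w)
        columns[n].append(sum(ws) / n)      # equal-weight average
    return [mean_and_variance(c) for c in columns]

results = simulate()
labels = [f"W({i})" for i in range(1, 7)] + ["average"]
for label, (mu, var) in zip(labels, results):
    print(f"{label:>8}: mean {mu:7.1f}  std {var ** 0.5:6.1f}")
```

In such a run every estimator should average close to 1500, as unbiasedness requires, while W(n) shows the smallest spread and the equal-weight average a noticeably larger one. That is what standard theory predicts: the sample maximum is a complete sufficient statistic here, so by the Lehmann–Scheffé theorem no unbiased combination of the W(i) can have lower variance than W(n). A simulation illustrates this but does not prove it.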

See Likelihood function. Bo Jacoby (talk) 21:07, 8 January 2009 (UTC).


 * It is interesting (to me at least) to compare the unbiased estimator from the ConMan with the maximum likelihood estimator from Bo. The estimates in the first case would have been around 470, 2130, 1900, 1780, 1700, 1660 as the numbers were sampled (rounded to avoid doing homework since this appears to be a standard classroom example).  The estimates in the second case are quite simple: 235, 1421, 1421, 1421, 1421, 1421.  I'm not sure if the likelihood function article is suggesting to do this, but after 3 samples, one could confuse likelihood with probability and do an expected value kind of thing to get: undefined, ∞, 2840, 2130, 1890, 1780.  I think it is particularly interesting that the second (and the third) method "use" all of the information, but don't actually end up depending on anything more than X(n).  Since they both seem fishy and bracket the first method, I think I like the first method best.  The second method is always a lower bound (on any reasonable answer), but is the third method always larger than the first? Also, I still think it should be possible to use all of the W(i) in some way to get an even lower variance unbiased estimator; can anyone calculate or estimate the covariances well enough? JackSchmidt (talk) 00:51, 9 January 2009 (UTC)
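The MLE and likelihood-mean rows can be reproduced with a few lines of Python. A caveat: the closed form used for the mean, (m - 1)(n - 1)/(n - 2) for n >= 3 observations with maximum serial m, does not appear in the thread. It is a derivation added here from the likelihood 1/C(i, n) (it follows from summing tail probabilities with the telescoping identity Bo Jacoby gives later), so treat it as an assumption to check. The function names are hypothetical.

```python
def mle(sample):
    """Maximum-likelihood estimate: the likelihood 1/C(M, n) for M >= max(sample)
    decreases as M grows, so it peaks at the sample maximum."""
    return max(sample)

def likelihood_mean(sample):
    """Mean of M under the normalized likelihood P(M = i) proportional to
    1/C(i, n) for i >= m (assumed closed form, see the note above)."""
    n, m = len(sample), max(sample)
    if n == 1:
        return None          # sum of 1/i diverges: cannot be normalized
    if n == 2:
        return float("inf")  # normalizable, but the mean diverges
    return (m - 1) * (n - 1) / (n - 2)

serials = [235, 1421, 67, 216, 863, 429]
prefixes = [serials[:k] for k in range(1, len(serials) + 1)]
print([mle(s) for s in prefixes])
print([likelihood_mean(s) for s in prefixes])
```

This prints 235 followed by five 1421s for the MLE, and None, inf, 2840.0, 2130.0, about 1893.3 and 1775.0 for the mean, agreeing with the figures above once rounded.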

I think that the maximum serial number of the sample is a sufficient statistic for estimating the number of elements of the population. The maximum likelihood value is the lousiest estimate, but in the case N = 1 (N being the number of items in the sample) it is all you have got. If N = 2 you also have a median and confidence intervals. If N = 3 the mean value is defined, but the standard deviation is infinite. When N > 3 the standard deviation is also finite, and you may begin to feel confident. Bo Jacoby (talk) 08:37, 9 January 2009 (UTC).
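These thresholds can be checked from the tail of the likelihood. A short derivation (added here, not part of the original post), with N the sample size and m ≥ N the largest serial number observed:
 * $$ {\binom iN}^{-1}=\frac{N!}{i(i-1)\cdots(i-N+1)}\sim\frac{N!}{i^N}\quad(i\to\infty),\qquad\text{so}\qquad\sum_{i\ge m} i^r{\binom iN}^{-1}<\infty \iff N-r>1.$$

Taking r = 0, 1, 2 gives a normalizable likelihood for N ≥ 2, a finite mean for N ≥ 3 and a finite standard deviation for N ≥ 4, matching the paragraph above.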
 * Cool. And "sufficiency" means that even if we use more statistics (like the X(i)), we cannot do better than just using X(n), the maximum serial number, right?  Of course what we do with that one X(n) might produce better or worse estimators, but we can ignore the other X(i).  This makes sense in a way: the more samples we have the more confident we are that the largest value in the sample is really large.  When we got the 1421 on the second try, we weren't sure if the next one would be a million, but after four more tries with 1421 still the largest we begin to have intuitive confidence that it really is the largest.
 * Can you describe a little how to find the mean, confidence intervals, and standard deviation in this particular case? Feel free to just use n=2 and n=4 and the above numbers; I think I understand the concepts abstractly but have never worked a problem that was not basically set up for it from the start.
 * For the "median", is this the number S0 such that the likelihood that the population is ≥ S0 (given the observed serials above) is closest to the likelihood that the population is ≤ S0? Since n≥2, both of the likelihoods are finite, so there should be exactly one such integer.  I guess I could have a computer try numbers for S0 until it found the smallest (since it should be between 1421 and 10000).  Is there a better way?
 * I'm not sure on the confidence interval. Looking up numbers in tables labelled "confidence interval" is about as far as I've worked such problems.  How do you do this?
 * For the standard deviation, I guess I find the mean (listed above), then do something like sqrt(sum( 1/binomial(i,n)*( i - mean )^2,i=1421..infinity )/sum(1/binomial(i,n),i=1421..infinity)), where n is the sample size and mean is the mean. Is there some sane way to do this, or just let maple do it?
 * I worry I might have done this wrongly, since for n=4..6, I get standard deviations of 1230, 670, and 460. Since for n=4 the mean was only 2130, that's a heck of a deviation.
 * For reference, here are my values so far:
 * Unbiased: 470, 2130, 1900, 1780, 1700, 1660
 * MLE (L-mode?): 235, 1421, 1421, 1421, 1421, 1421
 * L-median: ∞, 2840, 2010, 1790, 1690, 1630
 * L-mean: undefined, ∞, 2840, 2130, 1890, 1780
 * How should I measure their accuracy? Confidence intervals?  Are they easy to compute from the variances? Interesting stuff. JackSchmidt (talk) 17:29, 9 January 2009 (UTC)
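The "just let the computer do it" route can be sketched in Python by truncating the infinite sums at a large cutoff. The cutoff of 200000 and the function name are assumptions for illustration. Because 1/C(i, n) decays like i^(-n), the truncation is harmless for n = 5 and 6, but too crude for the variance when n = 4, whose tail decays only like 1/i^2.

```python
from math import comb

def likelihood_summary(m, n, cap=200_000):
    """Mean, standard deviation and median of M under the normalized
    likelihood P(M = i) proportional to 1/C(i, n) for i >= m, computed by
    brute force with the infinite sums cut off at `cap` (an assumption)."""
    support = range(m, cap + 1)
    # Scale by C(m - 1, n - 1) so the float weights stay in a comfortable range.
    scale = comb(m - 1, n - 1)
    weights = [scale / comb(i, n) for i in support]
    total = sum(weights)
    probs = [w / total for w in weights]

    mean = sum(p * i for p, i in zip(probs, support))
    var = sum(p * (i - mean) ** 2 for p, i in zip(probs, support))

    # Median: smallest k with P(M <= k) >= 0.5 (the S0 asked about above).
    median, running = None, 0.0
    for p, k in zip(probs, support):
        running += p
        if running >= 0.5:
            median = k
            break
    return mean, var ** 0.5, median

for n in (5, 6):
    mean, std, median = likelihood_summary(1421, n)
    print(f"n={n}: mean {mean:.1f}, std {std:.1f}, median {median}")
```

For n = 6 this should give a mean of 1775, a standard deviation of about 458 and a median of 1631. Summing the same likelihood in closed form (a derivation added here, not from the thread) gives the variance (n - 1)(m - 1)(m - n + 1)/((n - 3)(n - 2)^2), i.e. standard deviations of about 1229, 669 and 458 for n = 4, 5, 6, so the values in the bullet list appear to be right: the spread really is that large when only a few aircraft have been observed.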

Thank you for asking. I too find this problem fascinating. If the number of items in the sample is N, the (unnormalized) likelihood function is
 * $$L({M=i})={[m\le i]}{\binom i N}^{-1}$$

where m is the maximum sequence number in the sample, M is the unknown number of items in the population, and the bracket [m ≤ i] equals 1 if m ≤ i and 0 otherwise. The accumulated likelihood is
 * $$ L(M\le k)=\sum_{i=0}^k L({M=i})=\sum_{i=m}^k{\binom i N}^{-1}=(1-N^{-1})^{-1}\left( {\binom {m-1}{N-1}}^{-1}-{\binom k{N-1}}^{-1}\right)$$

when k ≥ m ≥ N ≥ 2. See Binomial coefficient, equation 14. I don't know if the statisticians of WW2 were aware of this simplifying identity. The normalized accumulated likelihood function is
 * $$ P(M\le k)=\frac{L(M\le k)}{L(M<\infty)}=\frac{(1-N^{-1})^{-1}\left( {\binom {m-1}{N-1}}^{-1}-{\binom k{N-1}}^{-1}\right)}{(1-N^{-1})^{-1}{\binom {m-1}{N-1}}^{-1}}=1-{\binom {m-1}{N-1}}{\binom k{N-1}}^{-1}.$$

From this you may compute a median $$\scriptstyle M_{0.5}$$ satisfying $$\scriptstyle P(M\le M_{0.5})\approx 0.5$$, a 90th percentile $$\scriptstyle M_{0.9}$$ satisfying $$\scriptstyle P(M\le M_{0.9})\approx 0.9$$, an expected value of the number of items $$\scriptstyle \mu=\sum_{i=0}^\infty i\cdot P(M=i)$$, and a standard deviation $$\scriptstyle \sigma=\sqrt{\sum_{i=0}^\infty (i-\mu)^2\cdot P(M=i)}.$$

Bo Jacoby (talk) 08:18, 10 January 2009 (UTC).
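This closed form turns the percentile questions above into a simple search. The Python sketch below (function names are hypothetical) first checks the telescoping identity for one small case in exact rational arithmetic, then finds the smallest k with P(M ≤ k) ≥ q by doubling and bisection rather than trying candidate values one at a time.

```python
from fractions import Fraction
from math import comb

def cdf(k, m, N):
    """Normalized accumulated likelihood P(M <= k) = 1 - C(m-1, N-1) / C(k, N-1),
    as in the post above; assumes m >= N >= 2 and treats k < m as probability 0."""
    if k < m:
        return Fraction(0)
    return 1 - Fraction(comb(m - 1, N - 1), comb(k, N - 1))

def percentile(q, m, N):
    """Smallest integer k with P(M <= k) >= q, for 0 < q < 1."""
    lo, hi = m - 1, m          # invariant: cdf(lo) < q
    while cdf(hi, m, N) < q:   # grow hi until cdf(hi) >= q
        lo, hi = hi, 2 * hi
    while hi - lo > 1:         # bisect, keeping cdf(lo) < q <= cdf(hi)
        mid = (lo + hi) // 2
        if cdf(mid, m, N) >= q:
            hi = mid
        else:
            lo = mid
    return hi

# Exact check of the telescoping identity for one small case (m=5, N=3, k=40).
m, N, k = 5, 3, 40
lhs = sum(Fraction(1, comb(i, N)) for i in range(m, k + 1))
rhs = Fraction(N, N - 1) * (Fraction(1, comb(m - 1, N - 1)) - Fraction(1, comb(k, N - 1)))
assert lhs == rhs

for n in range(2, 7):
    median = percentile(Fraction(1, 2), 1421, n)
    p90 = percentile(Fraction(9, 10), 1421, n)
    print(f"N={n}: median {median}, 90th percentile {p90}")
```

With m = 1421 the medians come out as 2840, 2008, 1789, 1689 and 1631 for two to six observations, which agree with the L-median row earlier in the thread after rounding. Exact fractions avoid any doubt about rounding near the 0.5 and 0.9 thresholds.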


 * As the OP, let me thank everyone for their thoughts. No, it wasn't homework, I just made up the serial numbers to give an example, being long past the age of taking courses and being given set work.→81.151.247.41 (talk) 19:20, 10 January 2009 (UTC)