Talk:German tank problem

(init)

 * Title added for discussion —Nils von Barth (nbarth) (talk) 23:37, 16 February 2009 (UTC)

Hi guys, I really like this problem, and I didn't see it on Wikipedia, so I decided to write one myself! I hope I didn't screw things up too much :P Themandotcom (talk) 18:25, 24 March 2008 (UTC)

Title?
I’ve seen this problem at various times, to illustrate differences in estimation, but I haven’t heard a pithy name – has it one?

(I named it “Maximum of a discrete uniform distribution” to start to avoid the sesquipedalian “Estimation of the maximum of a discrete uniform distribution”, which is admittedly more precise.)

—Nils von Barth (nbarth) (talk) 16:41, 16 February 2009 (UTC)


 * Ok, looks like German tank problem already exists (and this was the pithy example that I had heard), so I’ve merged the article there (hat tip to Michael Hardy).
 * —Nils von Barth (nbarth) (talk) 23:40, 16 February 2009 (UTC)

Median discussion has errors
The median-unbiased estimator (with maximum concentration on symmetric convex sets) differs from the UMVU estimator: see van der Vaart's book, which has both examples. The UMVU estimator is not median-unbiased, and so there must be an error in the description of the UMVU estimator. Thank you. (I apologize for being brief today.) Sincerely, Kiefer.Wolfowitz (talk) 14:27, 27 July 2009 (UTC)
 * Be bold and remove or correct it! Bo Jacoby (talk) 08:35, 28 July 2009 (UTC).

iPhone production
I don't think it is appropriate for this article to include a discussion of iPhone production. This is an article on a mathematical problem. The connection to estimation of wartime production is appropriate because of the strong historical ties. There is no such connection to estimating the production of iPhones. It would be inappropriate to use this article as an archive of situations where this problem has arisen. I think we should restrict the scope of this article to (1) a mathematical treatment of the problem and (2) a discussion of the historical applications. iPhones meet neither of these criteria. Nippashish (talk) 16:50, 20 April 2010 (UTC)
 * Agree. (The iPhone production material has since been removed.) 98.210.208.107 (talk) 14:08, 19 February 2011 (UTC)


 * Totally disagree. The iPhone production estimate was a neat and high-profile use of the technique. 128.114.23.110 (talk) 06:03, 8 March 2011 (UTC)

cleanup needed
The article looks messy in my eyes. The structure needs clean-up. Some parts of the article tacitly assume that only one tank has been observed. The distinction between frequentist and Bayesian approaches is not clear. Somebody please help. Bo Jacoby (talk) 04:31, 29 May 2011 (UTC).


 * I have now done a lot of cleanup myself. The section 'Observing one tank' is not very enlightening, and I want to remove it. Any objections? Any comments? Bo Jacoby (talk) 18:22, 30 June 2011 (UTC).


 * Under "Specific data" it says "Applying the above formula..." but there's no formula above. — Preceding unsigned comment added by 94.237.38.23 (talk) 16:27, 31 October 2011 (UTC)

"circular reasoning"
The following could use some polish:

"Note that one cannot naively use m/k (or rather (m + m/k − 1)/k) as an estimate of the standard error SE, as the standard error of an estimator is based on the population maximum (a parameter), and using an estimate to estimate the error in that very estimate is circular reasoning."

If psi = f(theta), it is perfectly legitimate to estimate psi by f(theta-hat) ... it isn't circular reasoning. Might not produce a good estimator, but that's a different matter.

Floombottle (talk) 20:30, 4 April 2012 (UTC)

--I agree with Floombottle. There are plenty of precedents. The estimate of the rate for a Poisson process is the same as an estimate of its variance. This is used in parameter estimates. For any one parameter family of distributions, the mean and standard deviation (if they exist) are always functions of the parameter. And if there is a sufficient statistic it must yield the best estimate of both. There is nothing circular in that. Whether they are any use, by way of a normal approximation, is another matter. 118.92.171.121 (talk) 00:42, 27 August 2016 (UTC)

mean, sd, pmf
Does anyone have a reference for these? The pmf was clearly wrong until I changed it just now. I changed the normalizing constant so now it sums to 1 and agrees with the mean and sd that are given on the wiki page. Here is a verification in R.

I was lazy and just verified these results, but I did not take the time to actually go through the math and prove that the normalizing constant is (k-1)/k instead of k/(k-1) (which is what it used to say).

Can someone also give some intuition behind the pmf? I understand that the {m-1 \choose k-1} is coming from choosing where the observed serial numbers (except the maximum) come between 1 and m-1. I also understand the {n \choose k} since we are sampling k tanks from the n that we are considering. But I have no good intuition behind the (k-1)/k part.

AustenWHead (talk) 02:25, 11 July 2012 (UTC)
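A minimal Python sketch of such a check follows (this is not the R verification mentioned above). It assumes the pmf has the form (k−1)/k · C(m−1, k−1)/C(n, k) for n ≥ m, as described in this comment; the function names and the values m = 10, k = 5 are illustrative only. Because 1/C(n, k) = k/(k−1) · [1/C(n−1, k−1) − 1/C(n, k−1)], the partial sums telescope to 1 − C(m−1, k−1)/C(cutoff, k−1), which tends to 1 and may give some intuition for where the (k−1)/k factor comes from.

```python
from fractions import Fraction
from math import comb, sqrt


def pmf(n, m, k):
    """Assumed mass of N = n given sample maximum m and sample size k."""
    if n < m:
        return Fraction(0)
    return Fraction(k - 1, k) * Fraction(comb(m - 1, k - 1), comb(n, k))


def partial_sum(m, k, cutoff):
    """Exact sum of the pmf over n = m, ..., cutoff."""
    return sum((pmf(n, m, k) for n in range(m, cutoff + 1)), Fraction(0))


def approx_moments(m, k, cutoff):
    """Truncated mean and standard deviation as floats; needs k > 3 to converge."""
    mean = sum(n * float(pmf(n, m, k)) for n in range(m, cutoff + 1))
    second = sum(n * n * float(pmf(n, m, k)) for n in range(m, cutoff + 1))
    return mean, sqrt(second - mean * mean)


if __name__ == "__main__":
    m, k, cutoff = 10, 5, 2000  # illustrative values, not taken from the article
    print("partial sum of pmf:", float(partial_sum(m, k, cutoff)))
    print("truncated mean, sd:", approx_moments(m, k, cutoff))
    print("closed-form mean (m-1)(k-1)/(k-2):", (m - 1) * (k - 1) / (k - 2))
```

With the older constant k/(k−1) in place of (k−1)/k, the same sum would tend to (k/(k−1))² rather than 1.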

rounding mean value
Editor HugoMe just changed $$N \approx \mu \pm \sigma = 20 \pm 10 $$ into $$N \approx \mu \pm \sigma = 19.5 \pm 10 $$. The rounded value is to be preferred, IMO. Bo Jacoby (talk) 16:24, 28 December 2012 (UTC).

--Does \mu - \sigma make any practical sense when the distribution is not approximately normal? The example gives a lower limit much smaller than some of the observations. 118.92.171.121 (talk) 22:57, 26 August 2016 (UTC)

"The" frequentist estimate and "the" Bayesian estimate??
In this article in its present form, we are told that "the" frequentist estimate and "the" Bayesian estimate are certain expressions.

That's silly. Either an MLE or an unbiased estimate would be a frequentist estimate, and they're different. And there are as many Bayesian estimates as there are priors. Whoever wrote this didn't say which criteria were intended to be satisfied by "the" [sic] frequentist estimate or which prior was used in obtaining "the" [sic] Bayesian estimate. Omission of that sort of thing is disrespectful to the reader. Michael Hardy (talk) 21:23, 30 July 2013 (UTC)


 * I totally agree, though it is nice to contrast the two different approaches to the problem. I think it can be fixed by just being clear that "the" frequentist/Bayesian formula refers to the estimates given in the text below. 84.93.172.231 (talk) 16:46, 10 September 2013 (UTC)
 * "There are as many Bayesian estimates as there are priors". If you suggest another prior then please include it. Bo Jacoby (talk) 04:06, 9 July 2014 (UTC).

Historical Problem
It seems to me that the Panther tank had two series of eight wheels on each side, so that there should be thirty-two wheels on each tank, and not forty-eight, as it is written here. However, the total of ninety-six wheels could be true if we suppose there were three tanks instead of two. By the way, the article says SHAEF "obtained" two tanks. It would be interesting to know by which means. Can we assume that those tanks were captured in full condition at Anzio? — Preceding unsigned comment added by DrJosef (talk • contribs) 16:58, 24 August 2013 (UTC)
 * I agree on the number of wheels. I've found a citation that it was 2 tanks, so assuming no spare wheels mounted or wheels lost, that gives 64 in total. I've edited the article. On where these tanks were captured the citation mentions Anzio but not specifically as where they were captured.--Flexdream (talk) 11:33, 7 July 2014 (UTC)

Move 'Example' section?
The 'Example' section appears at the start of the body of the article. Shouldn't the 2 examples be moved to their relevant approaches i.e. the frequentist estimate and the Bayesian estimate? Rather than stand apart at the start?--Flexdream (talk) 11:10, 7 July 2014 (UTC)

Uh... "Credibility"??
I'm no expert but I've never heard "credibility" used as the Bayesian equivalent of probability. I've heard of "credibility intervals," but when talking about probabilities and distributions, everything I've ever read so far just uses "probability" in the usual way, which seems perfectly reasonable. This smells like a neologism to me, and might even count as original research. I'm going to leave this up for a few days and change it back to probability if nobody objects. Solemnavalanche (talk) 15:58, 23 September 2014 (UTC)
 * An event may be more or less probable to happen, and a hypothesis may be more or less credible to be true. Hypotheses are not more or less probable. So it is OK to talk about the probability of an event, and the credibility of a hypothesis. Bo Jacoby (talk) 22:41, 24 December 2015 (UTC).

Another approach
Another approach to the German tank problem is to assume that the average of the observed serial numbers is close to the true average, multiply that average by two, and then take the larger of this figure and the maximum observed serial number.

Here is an example. Say I have a series of serial numbers 5943 3641 5948 6592 6891 6967 5402 124 1131 8702 3947 1697 325 2164 2888 2755 6829 9760 6574 2737 4998 335 1556 3538 6152 5973 9036 3611 16 5462

The maximum serial number is 9760

The average is 4390, so the estimate is the maximum of (2 × 4390, 9760) = 9760. BernardZ (talk) 08:50, 22 December 2015 (UTC)

--This is an unbiased estimator but it is inefficient compared to those based on the maximum, which is a sufficient statistic. 118.92.171.121 (talk) 22:48, 26 August 2016 (UTC)
 * What about doubling the median number to estimate N?Rich (talk) 01:44, 30 December 2018 (UTC)
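The efficiency comparison in this thread can be checked by brute force for a small case. The sketch below enumerates every sample drawn without replacement from serial numbers 1..N and computes the exact mean and variance of three estimators: 2·mean − 1, 2·median − 1, and the maximum-based m + m/k − 1. The "− 1" corrections, the choice N = 10, k = 3, and the function names are assumptions made for illustration; the rule proposed above (the larger of 2·mean and the maximum) is a variant of the first.

```python
from fractions import Fraction
from itertools import combinations


def exact_mean_var(estimator, n_total, k):
    """Exact mean and variance of an estimator over all size-k samples from 1..n_total."""
    values = [Fraction(estimator(s)) for s in combinations(range(1, n_total + 1), k)]
    mean = sum(values, Fraction(0)) / len(values)
    var = sum(((v - mean) ** 2 for v in values), Fraction(0)) / len(values)
    return mean, var


def twice_mean(sample):
    """2 * (sample mean) - 1."""
    return Fraction(2 * sum(sample), len(sample)) - 1


def twice_median(sample):
    """2 * (sample median) - 1; assumes an odd sample size."""
    return 2 * sorted(sample)[len(sample) // 2] - 1


def max_based(sample):
    """m + m/k - 1, where m is the sample maximum and k the sample size."""
    m, k = max(sample), len(sample)
    return m + Fraction(m, k) - 1


if __name__ == "__main__":
    n_total, k = 10, 3  # illustrative values only
    for name, est in [("2*mean-1", twice_mean),
                      ("2*median-1", twice_median),
                      ("m+m/k-1", max_based)]:
        mean, var = exact_mean_var(est, n_total, k)
        print(f"{name}: mean = {mean}, variance = {var}")
```

In this small case all three are unbiased, but the maximum-based estimator has the smallest variance, which is consistent with the remark that the maximum is a sufficient statistic.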

German tank production using the Zimmermann method
(I moved this conversation from my talk page. Bo Jacoby (talk) 22:13, 24 December 2015 (UTC).)

At work this is how we solve the German tank production problem. We assume that the average of the serial numbers is similar to the real average.

If so, then multiply this average by two to give a figure. Then compare this figure to the maximum of the serial numbers you have.

Here is an example, say I have a series of serial numbers 5943 3641 5948 6592 6891 6967 5402 124 1131 8702 3947 1697 325 2164 2888 2755 6829 9760 6574 2737 4998 335 1556 3538 6152 5973 9036 3611 16 5462

The maximum serial number is 9760

The average is 4390, so the estimate is the maximum of (2 × 4390, 9760) = 9760, which is pretty close to the true value, 10000. BernardZ (talk) 08:55, 22 December 2015 (UTC)
 * I have three objections to your contribution. The first objection is formal: you did not provide a reference, and so the contribution is 'original research', which is not allowed on Wikipedia. See wp:OR. The second objection is that your method is theoretically unfounded. Once you know the number, k, of observed sequence numbers, and the highest observed sequence number, m, the other observed sequence numbers contain no additional information as to the estimation of the total number of tanks, N. The third objection is that your method does not estimate the uncertainty, so you cannot tell what 'pretty close' means. With your sample data, k=30 and m=9760, the Bayesian estimate is
 * $$N \approx \frac{(m-1)(k-1)}{k-2} \pm \frac{\sqrt{(m-1)(k-1)(m-k+1)}}{(k-2)\sqrt{k-3}} = 10107.5 \pm 360.7$$
 * In this case your method is inferior to the Bayesian method. Your estimate, 9760, is 0.96 Bayesian standard deviations below the Bayesian mean, while the correct value, 10000, is only 0.30 Bayesian standard deviations below the Bayesian mean.
 * Bo Jacoby (talk) 11:26, 23 December 2015 (UTC).
 * Obviously you do not know much statistical history and modern methods. http://wikieducator.org/Point_estimation_-_German_tank_problem check out Mean times 2 estimator. We use it at work to estimate max street numbers delivered by walkers. BernardZ (talk)
 * Suppose you have captured k=5 tanks with the highest serial number m=10. If the captured tanks had serial numbers 1, 2, 3, 4, and 10, then the mean value is 4 and your estimate is N ≃ max(8,10) = 10. If the captured tanks had serial numbers 6, 7, 8, 9, and 10, then the mean value is 8 and your estimate is N ≃ max(16,10) = 16. In either case you know that the tanks 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10 have been produced, so the estimated values of N should not differ. The Bayesian estimate is N ≃ 12 ± 3.46. Your method is far inferior. You should do better. Bo Jacoby (talk) 20:42, 25 December 2015 (UTC).
 * Whether you can do better is irrelevant, this is an encyclopedia which should include all methods used. BernardZ (talk)
 * Should we document a method just because it is used? It is not helpful to include arithmetic errors in an encyclopedia. Bo Jacoby (talk) 04:53, 29 December 2015 (UTC).
 * The answer, as the page was written, is yes. What I did was change the page slightly so that it is clear it is discussing only this method, which bypasses the problem. I do not know what you mean by arithmetic errors.

P.S. The Zimmermann method is quite accurate for very small sample sizes. BernardZ (talk)
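For readers who want to reproduce the figures quoted in this thread, here is a minimal Python sketch. It evaluates the Bayesian mean and standard deviation formula given above and the "mean times two" rule as described in the comments; the function names are illustrative, and the code takes the formula from the thread as given rather than deriving it.

```python
from math import sqrt


def bayes_mean_sd(m, k):
    """Mean and sd from the formula quoted in the thread; requires k > 3."""
    mean = (m - 1) * (k - 1) / (k - 2)
    sd = sqrt((m - 1) * (k - 1) * (m - k + 1) / (k - 3)) / (k - 2)
    return mean, sd


def mean_times_two(sample):
    """Rule described in the thread: the larger of 2 * average and the largest serial."""
    return max(2 * sum(sample) / len(sample), max(sample))


# Serial numbers as listed in the thread.
serials = [5943, 3641, 5948, 6592, 6891, 6967, 5402, 124, 1131, 8702,
           3947, 1697, 325, 2164, 2888, 2755, 6829, 9760, 6574, 2737,
           4998, 335, 1556, 3538, 6152, 5973, 9036, 3611, 16, 5462]

if __name__ == "__main__":
    mean, sd = bayes_mean_sd(max(serials), len(serials))
    print(f"Bayesian estimate: {mean:.1f} +/- {sd:.1f}")
    print("mean-times-two estimate:", mean_times_two(serials))
    # The two five-tank samples from the thread share k and m but not the average.
    print(mean_times_two([1, 2, 3, 4, 10]), mean_times_two([6, 7, 8, 9, 10]))
    print(bayes_mean_sd(10, 5))
```

The two five-tank samples show the point made in the thread: the formula based on the maximum depends only on k and m, while the average-based rule changes with the other serial numbers.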

External links modified
Hello fellow Wikipedians,

I have just modified one external link on German tank problem. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:
 * Added archive https://web.archive.org/web/20081120085633/http://www.rsscse.org.uk/ts/gtb/contents.html to http://www.rsscse.org.uk/ts/gtb/contents.html

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

Cheers.— InternetArchiveBot  (Report bug) 01:21, 11 January 2017 (UTC)

Any nation ever try to manipulate serial numbers, to confound analysis?
Can anyone comment on whether the Germans, in World War 2, tried to prevent or confuse enemy analysis of serial-numbers? ...such as with encryption? Or possibly, did any other nation try to prevent serial-number analysis? Or might a nation ever try to manipulate serial numbers, to cause an erroneously large or small estimate of production quantities? The possibilities for counter-intelligence are fascinating, such as to encourage an enemy to over-estimate production of inferior weapons, and under-estimate the quantity of superior weapons. 69.1.52.76 (talk) 02:58, 27 December 2019 (UTC)

What is the confidence interval for sampling without replacement?
The article states that the confidence interval is given for sampling with replacement, which seems useful, but inaccurate given the problem specifically asks about sampling without replacement. J2kun (talk) 21:26, 20 June 2021 (UTC)