Talk:James–Stein estimator

Is the assumption of equal variances fundamental to this? Should say one way or another. —The preceding unsigned comment was added by 65.217.188.20 (talk).
 * Thanks for the comment. The assumption of equal variances is not required. I will add some information about this shortly. --Zvika 19:29, 27 September 2006 (UTC)
 * Looking forward to this addition. Also, what can be done if the variances are not known?  After all, if $$\theta$$ is not known then probably $$\sigma^2$$ is not either.  (Can you use some version of the sample variances, for instance?)  Thanks!  Eclecticos (talk) 05:26, 5 October 2008 (UTC)

Thanks for the great article on the James-Stein estimator. I think you may also want to mention the connection to Empirical Bayes methods (e.g., as discussed by Efron and Morris in their paper "Stein's Estimation Rule and Its Competitors--An Empirical Bayes Approach"). Personally, I found the Empirical Bayes explanation provided some very useful intuition for the "magic" of this estimator. — Preceding unsigned comment added by 131.239.52.20 (talk) 17:54, 18 April 2007 (UTC)
 * Thanks for the compliment! Your suggestion sounds like a good idea. User:Billjefferys recently suggested a similar addition to the article Stein's example, but neither of us has gotten around to working on it yet. --Zvika 07:55, 19 April 2007 (UTC)
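For reference, the Empirical Bayes intuition mentioned above can be sketched as follows (a standard derivation, assuming a normal prior centered at zero, which is the case Efron and Morris treat first). If $$\theta_i \sim N(0, \tau^2)$$ and $$y_i \mid \theta_i \sim N(\theta_i, \sigma^2)$$, then marginally $$y_i \sim N(0, \sigma^2 + \tau^2)$$, and the Bayes estimate is

$$E[\theta_i \mid y] = \left(1 - \frac{\sigma^2}{\sigma^2+\tau^2}\right) y_i.$$

Since $$\|y\|^2/(\sigma^2+\tau^2) \sim \chi^2_m$$ and $$E[1/\chi^2_m] = 1/(m-2)$$, the quantity $$(m-2)\sigma^2/\|y\|^2$$ is an unbiased estimate of the unknown shrinkage factor $$\sigma^2/(\sigma^2+\tau^2)$$; substituting it into the Bayes estimate gives exactly the James–Stein estimator.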

dimensionality of y
A confusing point about this article: y is described as "observations" of an m-dimensional vector $$\theta$$, suggesting that it should be an m by n matrix, where n is the number of observations. However, this doesn't conform to the use of y in the formula for the James-Stein estimator, where y appears to be a single m-dimensional vector. (Is there some mean involved? Is $$||y||^2$$ computed over all mn scalars?)  Furthermore, can we still apply some version of the James-Stein technique in the case where we have more observations of $$\theta_1$$ than of $$\theta_2$$, i.e., there is not a single n?   Thanks for any clarification in the article. Eclecticos (talk) 05:19, 5 October 2008 (UTC)


 * The setting in the article describes a case where there is one observation per parameter. I have added a clarifying comment to this effect. In the situation you describe, in which several independent observations are given per parameter, the mean of these observations is a sufficient statistic for estimating θ, so that this setting can be reduced to the one in the article. --Zvika (talk) 05:48, 5 October 2008 (UTC)
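The reduction described above can be sketched in a few lines of Python (illustrative, with made-up numbers): average the n observations per parameter, then apply the James–Stein formula with the variance of the mean.

```python
import numpy as np

rng = np.random.default_rng(0)

m, n, sigma = 10, 5, 1.0           # m parameters, n observations of each
theta = rng.normal(0.0, 2.0, m)    # unknown parameter vector (simulated here)

# n independent observations per parameter
Y = rng.normal(theta, sigma, size=(n, m))

# Reduce to the one-observation-per-parameter setting via the sample mean,
# which is sufficient for theta; its variance is sigma^2 / n.
y_bar = Y.mean(axis=0)
var = sigma**2 / n

# James-Stein estimator, shrinking the averaged observation towards the origin
shrink = 1.0 - (m - 2) * var / np.sum(y_bar**2)
theta_js = shrink * y_bar
```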


 * The wording is still unclear, especially the sentence: "Suppose θ is an unknown parameter vector of length m, and let y be a vector of observations of θ (also of length m)". How can a vector of m-dimensional observations have length m? --StefanVanDerWalt (talk) 11:07, 1 February 2010 (UTC)


 * Indeed, it does not make sense. I'll give it a shot. 84.238.115.164 (talk) 19:49, 17 February 2010 (UTC)
 * Me too. What do you think of my edit? Yak90 (talk) 08:05, 24 September 2017 (UTC)


 * Is the formula using $$\sigma^2/n_i$$ applicable for different sample sizes in the groups? In Morris, 1983, Parametric Empirical Bayes Inference: Theory and Applications, it is claimed that a more general version (also derived there) of Stein's estimator is needed if the variances $$V_i$$ are unequal, where $$V_i$$ denotes $$\sigma_i^2/n_i$$, so as I understand it, Stein's formula is only applicable for equal $$n_i$$ as well.

Bias
The estimator is always biased, right? I think this is worth mentioning directly in the article. Lavaka (talk) 02:09, 22 March 2011 (UTC)

Risk functions
The graph of the MSE functions needs a bit more precision: we are in the case where ν=0, probably m=10 and σ=1, aren't we? (I thought that, in this case, for θ = 0, the MSE should equal 2; maybe the red curve represents the positive-part JS?) —Preceding unsigned comment added by 82.244.59.11 (talk) 15:40, 10 May 2011 (UTC)

Extensions
In the case of unknown variance, multiple observations are necessary, right? Thus it would make sense to swap bullet points 1 and 2 and reference the then-first from the second. Also, the "usual estimator of the variance" is a bit dubious to me. Shouldn't it be something like: $$\widehat{\sigma}^2 = \frac{1}{m(n-1)+2}\sum_i \left\| y_i-\overline{y} \right\|_2^2$$?
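A plug-in sketch of this unknown-variance version in Python (illustrative; it uses the $$m(n-1)+2$$ divisor suggested above, with the norm squared, which is an assumption of this sketch rather than a quote from the article):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 8, 6
theta = rng.normal(0.0, 3.0, m)
sigma = 1.5                                  # unknown to the estimator
Y = rng.normal(theta, sigma, size=(n, m))    # n observations per parameter

y_bar = Y.mean(axis=0)

# Pooled within-group sum of squares, divided by m(n-1)+2 as proposed above
ss = np.sum((Y - y_bar)**2)
sigma2_hat = ss / (m * (n - 1) + 2)

# Plug-in James-Stein estimate, using the variance of the mean, sigma2_hat / n
shrink = 1.0 - (m - 2) * (sigma2_hat / n) / np.sum(y_bar**2)
theta_js = shrink * y_bar
```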

Always or on average?
Currently the lead says


 * the James–Stein estimator dominates the "ordinary" least squares approach, i.e., it has lower mean squared error on average.

But two sections later the article says


 * the James–Stein estimator always achieves lower mean squared error (MSE) than the maximum likelihood estimator. By definition, this makes the least squares estimator inadmissible when $$m \ge 3$$.

(Bolding is mine.) This appears contradictory, and I suspect the passage in the lead should be changed from "on average" to "always". Since I don't know for sure, I won't change it myself. Loraof (talk) 23:55, 14 October 2017 (UTC)


 * It sounds like the first phrasing counts the "mean" twice: on average the squared error is lower, i.e., the mean squared error is lower. There is nothing more to average over. For a specific sample, the squared error of the James-Stein estimator can be worse. --mfb (talk) 03:09, 15 October 2017 (UTC)
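A small simulation makes this distinction concrete (illustrative Python, with an arbitrary choice of true mean): averaged over repeated samples, the James-Stein estimator has lower total squared error than the MLE, while on individual samples it can lose.

```python
import numpy as np

rng = np.random.default_rng(0)
m, sigma, trials = 10, 1.0, 20000
theta = np.ones(m)                       # arbitrary true mean vector

se_mle = np.empty(trials)
se_js = np.empty(trials)
for t in range(trials):
    y = rng.normal(theta, sigma)         # one m-dimensional observation
    shrink = 1.0 - (m - 2) * sigma**2 / np.sum(y**2)
    se_mle[t] = np.sum((y - theta)**2)            # squared error of the MLE
    se_js[t] = np.sum((shrink * y - theta)**2)    # squared error of JS

mse_mle = se_mle.mean()                  # close to m, the risk of the MLE
mse_js = se_js.mean()                    # smaller: JS dominates for m >= 3
frac_worse = np.mean(se_js > se_mle)     # nonzero: JS loses on some samples
```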

Concrete examples?
This article would be greatly improved if some concrete examples were given in the lead and text so that laymen might have some idea of what the subject deals with in the real world. μηδείς (talk) 23:17, 8 November 2017 (UTC)


 * There is an example starting at "A quirky example". I'm not sure if there are real world implications. --mfb (talk) 07:31, 9 November 2017 (UTC)


 * I agree with . And by "concrete", I don't mean more hand-waving, I mean an actual variance-covariance matrix and set of observations, so that I can check the claim for myself. Maproom (talk) 09:42, 22 June 2018 (UTC)

Using a single observation??
Is this correct?


 * "We are interested in obtaining an estimate $$\widehat{\boldsymbol \theta} $$ of $$\boldsymbol\theta$$, based on a single observation, $${\mathbf y} $$, of $${\mathbf Y} $$."

How can you get an estimate from a single observation? Presumably the means along each dimension of $$\boldsymbol\theta$$ are uncorrelated... Danski14(talk) 19:53, 2 March 2018 (UTC)
 * Apparently it is right. They use the prior to get the estimate. Never mind. Danski14(talk) 20:13, 4 April 2018 (UTC)


 * You make a single observation (in all dimensions), and then you either use this observation as your estimate or you do something else with it. --mfb (talk) 05:02, 5 April 2018 (UTC)

Using James-Stein with a linear regression
I am wondering how to use James-Stein in an ordinary least squares regression.

First, if $$\beta$$ are the coefficient estimates for an OLS (I skipped the hat), is the following the formula for shrinking it towards zero:

$$\hat\beta^{JS}=\beta \left(1- \frac{(p-2)\sigma^2_n}{\beta'\beta}\right)$$

where $$\sigma_n^2$$ is the true variance (I might substitute the sample variance here), and p is the number of parameters in $$\beta$$. (I'm a bit fuzzy on whether $$\alpha$$, the constant in the regression, is in the $$\beta$$.)

I guessed this formula from "The risk of James-Stein and Lasso Shrinkage", but I don't know if it's right.
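For what it's worth, here is that guessed formula in illustrative Python, on simulated data. I take $$\sigma_n^2$$ to mean the variance of a single OLS coefficient (estimated from the residuals); whether this is the "right" James-Stein analogue for regression is exactly the open question above, so treat this as a sketch, not an answer.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated regression data (names and sizes are illustrative)
n, p = 200, 8
X = rng.normal(size=(n, p))
beta_true = rng.normal(0.0, 0.5, p)
sigma = 1.0
y = X @ beta_true + rng.normal(0.0, sigma, n)

# Ordinary least squares estimate of the coefficients
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Noise variance estimated from the residuals
resid = y - X @ beta_ols
sigma2_hat = resid @ resid / (n - p)

# Average variance of an OLS coefficient: diagonal of sigma^2 (X'X)^{-1}.
# The plain James-Stein argument assumes equal, independent coordinate
# variances, which only roughly holds here (X'X is close to n * I).
var_coef = sigma2_hat * np.trace(np.linalg.inv(X.T @ X)) / p

# Shrinkage factor from the formula guessed above
shrink = 1.0 - (p - 2) * var_coef / (beta_ols @ beta_ols)
beta_js = shrink * beta_ols
```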

Second, what would the formula be for the confidence intervals of the shrunken $$\beta$$ estimates?

dfrankow (talk) 20:46, 12 May 2020 (UTC)