Talk:Bias–variance tradeoff

Clarification on the definitions of the terms in the bias–variance decomposition
In the bias–variance decomposition, would "error" be a more suitable term than "bias"? What is the difference between "bias" and "error"?

An example of an error would be the mean squared error (MSE): $$E[\operatorname{err}(D)] = E_{D}[|h(D) - m|^2]$$ In the MSE formula, $$m = E[Y]$$ is the true mean, and $$h(D)$$ is an algorithm's estimate of the mean computed from the data $$D$$.
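To see how "error" and "bias" differ, here is a small Monte Carlo sketch (the shrinkage estimator and all parameter values are illustrative choices, not from the article): the MSE above is the total error, which splits into squared bias plus variance.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 2.0              # true mean, m = E[Y]
n, trials = 10, 100_000

def h(D):
    # A deliberately biased estimate of the mean (shrunk toward zero),
    # so that bias and error visibly differ.
    return 0.8 * D.mean()

# Draws of h(D) over many independent data sets D
estimates = np.array([h(rng.normal(m, 1.0, size=n)) for _ in range(trials)])

mse      = np.mean((estimates - m) ** 2)   # E_D[|h(D) - m|^2], the total error
bias_sq  = (estimates.mean() - m) ** 2     # (E[h(D)] - m)^2, squared bias
variance = estimates.var()                 # E[(h(D) - E[h(D)])^2]

print(mse, bias_sq + variance)  # the two quantities agree
```

So "bias" is only the systematic part of the error; an unbiased estimator can still have a large MSE through its variance.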

"Spread" not defined in image caption.
The numerical parameter "spread" changes between images in the provided plots, but no description of what exactly "spread" is has been provided. The message this tries to convey is still clear, but the machinery behind it is not, which leaves this example lacking. --Physicsmichael (talk) 19:45, 1 February 2014 (UTC)

Clarifying which space expectations are being taken over
I would make changes to the article if I felt comfortable enough with the material, but I'm not.

One change that I think should be made throughout the page is clarifying what space expectations are being taken over. In Section 2.1, for example, we calculate var(Y) as sigma^2. This was confusing to me, because it is independent of the actual underlying distribution of y.

But in 2.1, we're not actually calculating the variance of y itself - we're computing the variance of y over the space of all possible models in our class trained on all possible subsets of the training data.

I think making this clearer (perhaps I've botched the explanation) would help the page. — Preceding unsigned comment added by Potnisanish (talk • contribs) 17:11, 26 September 2015 (UTC)
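To make the point above concrete, here is a sketch (the sine target, polynomial model class, and all parameters are illustrative assumptions, not taken from the article): the variance in the decomposition is taken over repeated random draws of the training set, not over the distribution of y itself.

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    # True regression function (illustrative choice)
    return np.sin(x)

def fit_and_predict(degree, x0, n=20, noise=0.3):
    # Train on one random draw of the training data, then predict at x0.
    x = rng.uniform(0, np.pi, size=n)
    y = f(x) + rng.normal(0, noise, size=n)
    coeffs = np.polyfit(x, y, degree)
    return np.polyval(coeffs, x0)

x0 = 1.0
variances = {}
for degree in (1, 5):
    # Variance of the learned model's prediction at x0, over training sets
    preds = np.array([fit_and_predict(degree, x0) for _ in range(2000)])
    variances[degree] = preds.var()
    print(degree, variances[degree])
```

Each entry is the variance of the fitted model's output at a fixed point x0 across 2000 re-drawn training sets; the more flexible model class (degree 5) moves around more from one training set to the next than the rigid one (degree 1).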

Difference between bias/variance and accuracy/precision
Can someone address this topic?

The definition of variance
Shouldn't $$\operatorname{var}\hat{f}(x)=E\left[\left(\hat{f}(x) - \bar{\hat{f}}(x)\right)^2\right]$$?

Minima2014 (talk) 03:29, 26 January 2018 (UTC)
 * $$\operatorname{Var}[X] = \operatorname{E}[X^2] - \operatorname{E}[X]^2$$ is an equivalent definition. Rab V (talk) 10:50, 26 January 2018 (UTC)


 * Oh I see now. Thank you Rab. I'm new to Wikipedia. Do you think it's a good idea to delete this section now? Minima2014 (talk) 03:05, 27 January 2018 (UTC)


 * I don't know if there is a policy on this but typically sections are left up even after they are resolved. Rab V (talk) 09:53, 27 January 2018 (UTC)


 * Below the equation it says the variance of the learning method, or, intuitively, how much the learning method $$\hat{f}$$ will move around its mean; I'm assuming that by "its mean" it refers to the mean of the estimator function $$\hat{f}$$. But the equation above in the article seems to use the mean of the true function $$f$$ instead. I ran some numbers in my head that I would expect to deliver low variance, but they ended up giving high variance. Is there an error in the equation? — Preceding unsigned comment added by 66.75.245.120 (talk) 17:16, 27 August 2020 (UTC)
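The distinction raised above matters numerically. A sketch (the near-constant estimator and its values are hypothetical, chosen only to make the gap obvious): an estimator that barely moves has almost zero variance around its own mean, but averaging squared deviations from the true $$f(x)$$ instead mixes in the squared bias and can be large.

```python
import numpy as np

rng = np.random.default_rng(2)

f_x = 1.0   # true value f(x) (illustrative)

# A heavily regularized, nearly constant estimator, biased away from f(x):
# draws of fhat(x) over different training sets.
fhat = 0.5 + 0.01 * rng.normal(size=100_000)

var_around_own_mean = np.mean((fhat - fhat.mean()) ** 2)  # the variance term
spread_around_f     = np.mean((fhat - f_x) ** 2)          # mixes in bias^2

print(var_around_own_mean)  # tiny: the estimator barely moves
print(spread_around_f)      # large: dominated by the squared bias of 0.25
```

So if a mental calculation against the true function's value gives "high variance" for a nearly constant estimator, that is the squared bias leaking in; the variance term proper must center on $$E[\hat{f}(x)]$$.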

Expectation of the estimator vs conditional mean function
I found the derivation a bit confusing because it doesn't distinguish between the expectation of the estimator, $$E[\hat{f}(x)]$$, and the conditional mean function $$E(y|x)$$. They coincide only in the special case where the model has the correct functional form. For example, the derivation at the link below makes this distinction and is clearer to me: https://www.cs.cornell.edu/courses/cs4780/2018fa/lectures/lecturenote12.html — Preceding unsigned comment added by 141.160.13.251 (talk) 01:29, 18 February 2021 (UTC)
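A small simulation illustrating the point above (the quadratic truth, the linear model class, and all parameters are illustrative assumptions): with a misspecified model, the average prediction over training sets converges to the best fit within the model class, not to the conditional mean.

```python
import numpy as np

rng = np.random.default_rng(3)

def cond_mean(x):
    # E[y | x]: the true conditional mean (illustrative quadratic)
    return x ** 2

x0 = 0.0
preds = []
for _ in range(5000):
    # Misspecified: a linear fit to a quadratic truth
    x = rng.uniform(-1, 1, size=50)
    y = cond_mean(x) + rng.normal(0, 0.1, size=50)
    a, b = np.polyfit(x, y, 1)
    preds.append(a * x0 + b)

print(np.mean(preds))   # E[fhat(x0)]: near 1/3, the best *linear* fit at x0
print(cond_mean(x0))    # E[y | x0] = 0; the two do not coincide
```

The gap between the two printed values is exactly the (model-misspecification) bias at x0, which is why conflating $$E[\hat{f}(x)]$$ with $$E(y|x)$$ hides the bias term in the derivation.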