Talk:Bootstrap aggregating

Figure
A few things look wrong with the figure (though it is still good enough to give the general idea of bagging):

1) Far fewer than 100 smoothers are plotted

2) The red curve does not appear to be an average of the grey curves

3) There are no units for the axes (e.g. Celsius/Fahrenheit) — Preceding unsigned comment added by 78.142.185.12 (talk) 10:48, 25 August 2015 (UTC)

(2) could be due to (1)

I agree. In addition, I would note that it is not at all obvious that the red curve is smoother than (all) the grey ones. Being thicker can make it look smoother without that being the case. Anyway, a curve fitted to the whole dataset would be more interesting for comparison. Elias (talk) 10:29, 30 September 2020 (UTC)
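To make point (2) concrete: in bagging, the aggregated curve should be the pointwise mean of the bootstrap curves. A minimal Python sketch, assuming a simple k-nearest-neighbour running mean as the base smoother (a hypothetical stand-in chosen for brevity, not necessarily the smoother used to draw the figure) and synthetic data:

```python
import random
import statistics


def knn_smoother(xs, ys, k=5):
    """Hypothetical base smoother: predict y at x as the mean of the k nearest training ys."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return statistics.fmean(y for _, y in nearest)

    return predict


def bagged_curve(xs, ys, grid, n_models=100, k=5, seed=0):
    """Fit one smoother per bootstrap sample and average them pointwise on a grid."""
    rng = random.Random(seed)
    n = len(xs)
    curves = []  # one "grey curve" per bootstrap sample
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        model = knn_smoother([xs[i] for i in idx], [ys[i] for i in idx], k)
        curves.append([model(g) for g in grid])
    # The bagged ("red") curve is the pointwise mean of the individual curves.
    red = [statistics.fmean(c[j] for c in curves) for j in range(len(grid))]
    return curves, red


# Synthetic example data (arbitrary choice for illustration).
rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(60)]
ys = [x ** 0.5 + rng.gauss(0, 0.3) for x in xs]
grid = [0.5 * i for i in range(21)]
grey, red = bagged_curve(xs, ys, grid)
```

Plotting every entry of `grey` in grey and `red` on top should give a red curve that sits inside the grey band at every grid point; adding a single `knn_smoother(xs, ys)` fit to the full data would give the comparison curve suggested above.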

Sorry, I don't know much about editing Wikipedia articles, and I don't know anything about "bootstrap aggregating". However, I do know a bit about bootstrapping in general, and it's my understanding that the best estimate of the mean is the original sample mean, not the bootstrap mean. The whole point of bootstrapping is to generate a confidence interval around this mean. So in this case it looks like the whole point is to generate confidence intervals (or the equivalent) from the 100 smoothers, whereas the average value of those smoothers is itself less important. This isn't clear to me from the article, so perhaps the article needs either a better explanation or an explanation of how bagging differs from standard bootstrap theory? 128.218.42.15 (talk) 22:47, 17 November 2017 (UTC)
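For a plain sample mean, the original estimate and the average of the bootstrap replicates essentially coincide (the bootstrap expectation of a resampled mean is exactly the sample mean), so there the bootstrap is mainly useful for the spread, as the comment says. A minimal Python sketch of a percentile interval, with synthetic data and parameter values chosen only for illustration:

```python
import random
import statistics


def bootstrap_means(data, n_boot=2000, seed=0):
    """Means of n_boot resamples of data, each drawn with replacement at full size."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(data, k=len(data))) for _ in range(n_boot)]


def percentile_interval(values, level=0.95):
    """Simple percentile interval: read the tails off the sorted bootstrap replicates."""
    s = sorted(values)
    low = s[int((1 - level) / 2 * (len(s) - 1))]
    high = s[int((1 + level) / 2 * (len(s) - 1))]
    return low, high


rng = random.Random(3)
data = [rng.gauss(20.0, 5.0) for _ in range(50)]
theta_hat = statistics.fmean(data)   # the point estimate comes from the original sample
boots = bootstrap_means(data)
low, high = percentile_interval(boots)  # the bootstrap supplies the uncertainty
```

Bagging instead uses the average of the bootstrap fits as the prediction; that average differs noticeably from the original fit mainly for unstable, nonlinear procedures such as flexible smoothers or trees.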

Merge
Should this page and Bootstrapping_(machine_learning) possibly be combined? It seems that they cover very similar content, although the other article is fairly short. Xekno (talk) 04:52, 20 May 2011 (UTC)


 * Agree, it describes the same technique. Let's replace that article with a redirect. It doesn't look to me like we need to merge anything. The only new fact mentioned there is "The error is then estimated by err = 0.632×err_test + 0.368×err_training." It's not clear where that formula comes from. The references don't look useful to me: 1) Efron's paper is about bootstrapping in statistics, not bagging, 2) there's nothing about bagging in the Viola and Jones paper, 3) the external link is dead. -- X7q (talk) 11:02, 20 May 2011 (UTC)
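For what it's worth, the quoted formula looks like Efron's .632 bootstrap error estimate: the weight 0.632 ≈ 1 − 1/e is the expected fraction of distinct examples in a bootstrap sample, and err_test is measured on the examples each bootstrap sample left out. A simplified Python sketch, assuming a constant (sample-mean) predictor as a hypothetical model and averaging the out-of-bag error per bootstrap round rather than per example as in the original estimator:

```python
import random
import statistics


def mse(preds, ys):
    """Mean squared error between predictions and targets."""
    return statistics.fmean((p - y) ** 2 for p, y in zip(preds, ys))


def err_632(ys, n_boot=200, seed=0):
    """Simplified .632 estimate for a hypothetical constant-mean model."""
    rng = random.Random(seed)
    n = len(ys)
    # err_training: fit on the full data and evaluate on the same data.
    err_training = mse([statistics.fmean(ys)] * n, ys)
    # err_test: fit on each bootstrap sample, evaluate on the examples it left out.
    oob_errors = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        chosen = set(idx)
        left_out = [ys[i] for i in range(n) if i not in chosen]
        if not left_out:
            continue  # every example was drawn; no out-of-bag data this round
        fitted = statistics.fmean(ys[i] for i in idx)
        oob_errors.append(mse([fitted] * len(left_out), left_out))
    err_test = statistics.fmean(oob_errors)
    return 0.632 * err_test + 0.368 * err_training, err_test, err_training


rng = random.Random(7)
ys = [rng.gauss(0.0, 1.0) for _ in range(40)]
estimate, err_test, err_training = err_632(ys)
```

The training error alone is optimistic and the out-of-bag error alone tends to be pessimistic, which is the usual motivation given for mixing the two.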

Linear models
This page claims that "One particular interesting point about bagging is that, since the method averages several predictors, it is not useful to improve linear models." Is there a reference for this? 88.211.7.22 12:10, 7 September 2007 (UTC)


 * I'm pretty sure that if you average a bunch of bootstrapped (linear?) models you will just end up with the same thing as fitting one model to the whole dataset. —Preceding unsigned comment added by 99.232.36.40 (talk) 22:18, 7 November 2007 (UTC)


 * I am not at all sure, but I would like to see the curve fitted on the full data in the figure! Maybe a simulated dataset would also give a better view of the advantages of bagging? Jeroenemans (talk) 15:11, 6 March 2008 (UTC)


 * What is the context of the statement? Bagging is not used in the traditional linear models but in machine learning approaches (random forests and the like). So what is the point?


 * I am not sure if this is what the anon poster meant, but my question is this. If I am making a GLM model with many free parameters (say I'm interested in fitting the first 10 terms in a Taylor expansion of some complicated function), would I benefit from bagging my models? --IlyaV (talk) 05:19, 1 April 2009 (UTC)


 * Heh, it seems I answered my own question. I ran some simulations, and it seems that bagging a GLM gave a bigger error relative to the noiseless data than the original GLM. My examples were generated by adding Gaussian noise to a nonlinear model. I tried very high-order GLMs (10+ coefficients), and bagging never improved the prediction (as measured by MSE relative to the noiseless original distribution). It seems that as n → ∞ the bagged prediction approaches that of the original GLM. --IlyaV (talk) 18:57, 2 April 2009 (UTC)

63.2%
Please explain this: "If n'=n, then for large n the set Di expected to have 63.2% of the unique examples of D, the rest being duplicates." How is the mysterious 63.2% derived? If n' (the size of each Di) equals n (the size of the original training dataset), then Di contains 100% of the training dataset! Also, instead of the word "examples", could we use standard language such as cases or observations?  —Preceding unsigned comment added by 70.231.152.119 (talk) 06:13, 20 March 2009 (UTC)


 * No, if Di is drawn from D with replacement (with n=size(D)), then there is 1/n^2 probability of duplicate, 1/n^3 of triplicates (is that a word?), etc. This is actually independent of the size of ni. If you have ni>>n then Di should have all elements from D; however, for ni<=n, it would be incredibly unlikely for Di to have all elements from D. Also, "examples" IS standard language when you talk about learning; bagging is, if anything, a learning algorithm. --IlyaV (talk) 16:43, 2 April 2009 (UTC)

I agree with the first comment. How can n'=n? --Zumerjud (talk) 12:55, 24 February 2021 (UTC)


 * Sampling is done with replacement. What's the problem with n'=n? Burritok (talk) 06:08, 27 February 2021 (UTC)


 * Sorry, there is no issue. --Zumerjud (talk) 14:29, 27 February 2021 (UTC)
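The 63.2% comes from sampling with replacement. Each of the n' = n draws picks a particular example with probability 1/n, so the chance that the example is never drawn is (1 − 1/n)^n, which tends to 1/e ≈ 0.368 as n grows. The expected fraction of distinct examples in Di is therefore 1 − (1 − 1/n)^n → 1 − 1/e ≈ 0.632, with the remaining draws being duplicates. A short Python simulation (sample size and number of trials are arbitrary) that can be compared against the closed form:

```python
import random


def unique_fraction(n, trials=200, seed=0):
    """Average fraction of distinct examples when drawing n' = n times with replacement."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = [rng.randrange(n) for _ in range(n)]  # indices of the drawn examples
        total += len(set(sample)) / n
    return total / trials


n = 1000
exact = 1 - (1 - 1 / n) ** n  # expected fraction of distinct examples, about 0.632
simulated = unique_fraction(n)
```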

Implementations
Is it worth including a list of implementations? AndrewHZ (talk) 18:44, 4 December 2012 (UTC)

Here is the start of a list:


 * Neural networks with dropout ("Dropout can be seen as an extreme form of bagging in which each model is trained on a single case and each parameter of the model is very strongly regularized by sharing it with the corresponding parameter in all the other models." From Improving neural networks by preventing co-adaptation of feature detectors)
 * bagEarth in the R package caret
 * R package ipred (Improved Predictors)
 * R package adabag (applies multiclass AdaBoost.M1, AdaBoost-SAMME and bagging)
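For readers without R, a from-scratch sketch of bagging for classification may also help. With class labels, the bootstrap models are usually combined by plurality vote instead of by averaging. This Python sketch assumes a hypothetical one-dimensional decision stump as the base learner and toy data; it is not meant to mirror the internals of caret, ipred or adabag:

```python
import random
from collections import Counter


def fit_stump(xs, labels):
    """Hypothetical base learner: the single threshold on a 1-D feature with fewest errors."""
    best = None
    for t in sorted(set(xs)):
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if x <= t else right) != y for x, y in zip(xs, labels))
            if best is None or errors < best[0]:
                best = (errors, t, left, right)
    _, t, left, right = best
    return lambda x: left if x <= t else right


def bagging_classifier(xs, labels, n_models=25, seed=0):
    """Train one stump per bootstrap sample; predict by plurality vote."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        models.append(fit_stump([xs[i] for i in idx], [labels[i] for i in idx]))

    def predict(x):
        # Plurality vote over the bootstrap models (ties go to the first-seen label).
        return Counter(m(x) for m in models).most_common(1)[0][0]

    return predict


# Toy, perfectly separable data: label 1 when x > 5.
xs = [i / 10 for i in range(100)]
labels = [1 if x > 5 else 0 for x in xs]
clf = bagging_classifier(xs, labels)
```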

Wording issue
"It is well known" by whom? You've made assumptions about both the reader and the general population. I'd say the dependent clause "It is well known that" can be dropped altogether and the sentence can begin with "The risk of a 1..." — Preceding unsigned comment added by 134.197.105.161 (talk) 21:24, 25 April 2015 (UTC)


 * I agree. I took out the offending phrase and added a ref verifying the assertion. --Mark viking (talk) 23:04, 25 April 2015 (UTC)