Talk:Gradient boosting

The URL for Reference 6 appears to have changed
The URL for Reference 6 returns Error 404. I've found a new URL: cran.r-project.org/web/packages/gbm/gbm.pdf

69.116.252.118 (talk) 18:37, 29 March 2013 (UTC)


 * I've updated the reference; thanks for bringing this to attention!    Sophus Bie  (talk) 18:42, 29 March 2013 (UTC)

Gradient Tree Boosting
The following equation in the Gradient Tree Boosting section is confusing:



$$\gamma_{jm} = \underset{\gamma}{\operatorname{arg\,min}} \sum_{x_i \in R_{jm}} L(y_i, F_{m-1}(x_i) + \gamma h_m(x_i)).$$

In the preceding formula, $$h_m$$ was explicitly written out as the indicator function of the region $$R_{jm}$$, so the equation could more simply be written:



$$\gamma_{jm} = \underset{\gamma}{\operatorname{arg\,min}} \sum_{x_i \in R_{jm}} L(y_i, F_{m-1}(x_i) + \gamma).$$

This also corresponds to step (c) of Algorithm 10.3 in 'The Elements of Statistical Learning'. Can anyone object/confirm?
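The equivalence Andre points out can be checked numerically: since $$h_m$$ is the indicator function of $$R_{jm}$$, the factor $$h_m(x_i)$$ equals 1 for every $$x_i \in R_{jm}$$, so the two objectives coincide term by term. A minimal numpy sketch with toy data and squared-error loss (all names are illustrative, not from the article):

```python
import numpy as np

# Hypothetical toy data: targets and current-model predictions for the
# points x_i that fall inside one terminal region R_jm.
y = np.array([3.0, 1.5, 2.5, 4.0])
F_prev = np.array([2.0, 2.0, 2.0, 2.0])  # F_{m-1}(x_i) for x_i in R_jm
h = np.ones_like(y)                      # h_m is the indicator of R_jm, so h_m(x_i) = 1 here

def sq_loss(y_true, pred):
    return (y_true - pred) ** 2

gammas = np.linspace(-5, 5, 2001)
# Objective as written in the article: L(y_i, F_{m-1}(x_i) + gamma * h_m(x_i))
obj_full = [sq_loss(y, F_prev + g * h).sum() for g in gammas]
# Simplified objective:                L(y_i, F_{m-1}(x_i) + gamma)
obj_simple = [sq_loss(y, F_prev + g).sum() for g in gammas]

best_full = gammas[np.argmin(obj_full)]
best_simple = gammas[np.argmin(obj_simple)]
print(best_full, best_simple)  # identical minimisers
```

For squared-error loss the minimiser is the mean residual in the region, here 0.75, found identically by both objectives.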

Andre.holzner (talk) 11:29, 19 August 2013 (UTC)

Grammar issue
"the goal is to learn a model "? This makes no sense. Can someone correct this? I took a stab, but it might need more work... — Preceding unsigned comment added by 128.210.106.76 (talk) 16:08, 8 December 2016 (UTC)

External links modified
Hello fellow Wikipedians,

I have just modified 2 external links on Gradient boosting. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FAQ for additional information. I made the following changes:
 * Added archive https://web.archive.org/web/20091110212529/http://www-stat.stanford.edu/~tibs/ElemStatLearn/ to http://www-stat.stanford.edu/~tibs/ElemStatLearn/
 * Added archive https://web.archive.org/web/20100807162855/http://www.stat.rutgers.edu/~tzhang/papers/it08-ranking.pdf to http://www.stat.rutgers.edu/~tzhang/papers/it08-ranking.pdf

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

Cheers.— InternetArchiveBot  (Report bug) 02:24, 22 October 2017 (UTC)

Names reference to MART trademark
The current final line of the Names section claims Salford Systems have trademarked "MART"; however, I can't see how this is plausible given the term was coined by Friedman & Meulman in 2006 (http://www.datatheory.nl/pages/cdc7.pdf). The claim is not substantiated by a citation, so I propose the reference to the trademark be removed: it may be incorrect and seems unlikely to be of high interest or relevance to the typical reader of this page. I've not edited Wikipedia much, thus am reluctant to presume to make the change first and justify it second! — Preceding unsigned comment added by SimonDedman (talk • contribs) 23:02, 31 August 2018 (UTC)

math error (inaccuracy)
I wanted to point out that the formulae


 * $$F_0(x) = \underset{\gamma}{\arg\min} {\sum_{i=1}^n {L(y_i, \gamma)}}$$,
 * $$F_m(x) = F_{m-1}(x) + \underset{h_m \in \mathcal{H}}{\operatorname{arg\,min}} \left[{\sum_{i=1}^n {L(y_i, F_{m-1}(x_i) + h_m(x_i))}}\right]$$,

should probably read something like


 * $$F_0(x) = \underset{\gamma}{\arg\min} \left[{\sum_{i=1}^n {L(y_i, \gamma)}}\right](x)$$,
 * $$F_m(x) = F_{m-1}(x) + \underset{h_m \in \mathcal{H}}{\operatorname{arg\,min}} \left[{\sum_{i=1}^n {L(y_i, F_{m-1}(x_i) + h_m(x_i))}}\right](x)$$,

or


 * $$F_0 = \underset{\gamma}{\arg\min} {\sum_{i=1}^n {L(y_i, \gamma)}}$$,
 * $$F_m = F_{m-1} + \underset{h_m \in \mathcal{H}}{\operatorname{arg\,min}} {\sum_{i=1}^n {L(y_i, F_{m-1}(x_i) + h_m(x_i))}}$$,

Do you agree? Since I'm new here, I haven't edited the main page myself. — Preceding unsigned comment added by Toedtli (talk • contribs) 12:03, 26 April 2019 (UTC)
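Whichever notation is preferred, the content of the two formulas is standard: for squared-error loss the initial argmin over a constant $$\gamma$$ is the mean of the targets, and each stage adds a weak learner fit to the current residuals. A minimal numpy sketch under those assumptions (the stump learner, data, and the full-step update without shrinkage are all illustrative simplifications):

```python
import numpy as np

def fit_stump(x, r):
    """Fit a one-split regression stump to residuals r; return a predict function.
    Leaf values are region means, which is the optimal constant per region
    under squared-error loss."""
    best = None
    for t in np.unique(x):
        left, right = r[x <= t], r[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, lv, rv = best
    return lambda z: np.where(z <= t, lv, rv)

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, 50)

# F_0(x) = argmin_gamma sum_i L(y_i, gamma), i.e. the mean for squared-error loss
F = np.full_like(y, y.mean())
for m in range(20):
    h = fit_stump(x, y - F)  # fit weak learner h_m to residuals
    F = F + h(x)             # F_m = F_{m-1} + h_m (no shrinkage, for brevity)

print(np.mean((y - F) ** 2))  # training error shrinks as m grows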

What is the point of omitting the (x)? — Preceding unsigned comment added by Zjplab (talk • contribs) 23:08, 21 October 2019 (UTC)

Should published references show authored date or published date?
Newbie here, so apologies if this is general knowledge a Wikipedian ought to have.

I'm confused about what date should be reported for a published reference, the date it first appeared, or the date it was published?

For example, consider two references by Friedman.

The first, "Greedy Function Approximation: A Gradient Boosting Machine", is dated February 1999, which is the date listed in a PDF freely available at a URL which may belong to Friedman. However, the paper was eventually published in the Annals of Statistics in October 2001.

Similarly, "Stochastic Gradient Boosting" is dated March 1999, which appears to be the date it was given as a lecture (available for free on the Stanford statistics domain), although it was published in Computational Statistics & Data Analysis in February 2002. Rmwenz (talk) 15:47, 26 August 2022 (UTC)

Potential changes to history section, which is very brief and may contain historical inaccuracies.
As it stands, the history section is very brief, and probably due for expansion, and possibly a few corrections.

As it's worded, in my view, there is ambiguity about the contributions of Friedman and Mason et al. I read this section as implying that Friedman merely adapted boosting to the regression context by constructing some specific algorithms, while Mason et al are responsible for the connection between boosting and gradient descent in function space, which doesn't appear historically accurate.

In my opinion, this section doesn't adequately recognize the primacy of Friedman's contribution, and is potentially misleading in crediting Mason et al with the functional gradient descent view instead. Moreover, it doesn't adequately convey the breakthrough importance of Friedman's work. Here's the current wording:


 * "Explicit regression gradient boosting algorithms were subsequently developed, by Jerome H. Friedman, simultaneously with the more general functional gradient boosting perspective of Llew Mason, Jonathan Baxter, Peter Bartlett and Marcus Frean. The latter two papers introduced the view of boosting algorithms as iterative functional gradient descent algorithms."

It's somewhat difficult to determine who first made the observation that boosting algorithms perform optimization by gradient descent in function space. Friedman makes the observation in Greedy Function Approximation: A Gradient Boosting Machine (GFA). From the abstract:


 * "Function estimation/approximation is viewed from the perspective of numerical optimization in function space, rather than parameter space. A  connection is made between stagewise additive expansions and steepest-descent minimization. A general gradient descent “boosting” paradigm is developed for additive expansions based on any fitting criterion."

Mason et al made a more abstract mathematically general version of this observation as well Boosting Algorithms as Gradient Descent (BAGD). From the abstract:


 * "We provide an abstract characterization of boosting algorithms as gradient descent on cost-functionals in an inner-product function space."
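Both abstracts describe the same mechanism: the pseudo-residuals fit at each stage are the negative gradient of the loss with respect to the current predictions, so boosting performs gradient descent in function space. A minimal numpy sketch of that view, with illustrative names and squared-error loss; it assumes a weak learner that reproduces the pseudo-residuals exactly, in which case the update reduces to pointwise gradient descent:

```python
import numpy as np

def loss_grad(y, F):
    # dL/dF for squared-error loss L = (y - F)^2 / 2
    return F - y

y = np.array([1.0, 2.0, 0.5, 3.0])
F = np.zeros_like(y)  # current predictions F_{m-1}(x_i)
nu = 0.5              # learning rate (shrinkage)

for m in range(50):
    pseudo_residuals = -loss_grad(y, F)  # negative functional gradient
    # A real implementation fits a weak learner h_m to the pseudo-residuals;
    # here h_m's predictions are taken to be the pseudo-residuals themselves.
    F = F + nu * pseudo_residuals

print(F)  # converges toward y
```

Swapping `loss_grad` for the derivative of any other differentiable loss changes nothing else, which is exactly the generality both papers claim.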

Assigning primacy or priority to the observation is potentially confounded by the difference between first-appearance, authorship, and publication dates. On closer inspection, GFA appears to have been first authored in February 1999 but not published until 2001 in the Annals of Statistics, while BAGD first appeared at the NeurIPS conference (November 29 to December 4, 1999) and was later published in its proceedings.

While it may not be strictly necessary to recognize primacy of this observation, a perusal of the literature indicates the connection is of considerable theoretical significance. The literature is full of references to the Friedman paper, which at the time of writing has a citation count of 18687, while the paper by Mason et al has a current citation count of 1257. There may be some difference in citation counts here due to sub-discipline - statistical learning (Friedman) versus machine learning (Mason et al) - but Friedman's paper seems clearly more influential.

Friedman's paper was published in the prestigious Annals of Statistics, and it appears responsible for coining the term "Gradient Boosting Machine", while the Mason et al paper was published in NeurIPS conference proceedings. I could find plenty of references crediting Friedman with gradient boosting, including the scikit-learn documentation, textbooks by Murphy (p. 642) and Zhou (p. 35), and well-cited blog articles. I could find no mentions of the paper by Mason et al in my reading of the wider literature.

I believe this section should be edited to reflect the significance of Friedman's contribution and clarify the potential confusion. The section could also stand to flesh out the work of Breiman on ARCing classifiers, which appears influential in directing the attention of the statistics community to AdaBoost specifically and boosting in general.

I'm willing to make these changes myself but since I'm new page editing, I thought I'd start with the talk page. Rmwenz (talk) 17:04, 26 August 2022 (UTC)
