Talk:Random forest


 * demo*

First
First time hearing about this. Reads a bit like an advertisement, and sort of short on info. I suspect that the claim of "most accurate classifier" would depend on the data being classified. I intend to read the link and see if I can contribute a bit. Dsol 09:34, 30 July 2005 (UTC)


 * I agree, the choice of prediction algorithm depends on the nature of the data among many other factors, and claiming random forests are "most accurate" isn't really true. I've updated the article to make a less strong claim. Neilc 22:40, 2 April 2006 (UTC)

Here's one such specific application. Neslin, Gupta, Kamakura, Lu and Mason "Defection Detection: Measuring and Understanding the Predictive Accuracy of Customer Churn Models" Journal of Marketing Research (2006) Vol. XLIII p. 204-211. In a competition about predicting future customer termination, 44 entries compete for the best out-of-sample performance using a variety of methods. P.210: "Our winning entry used the power of combining several trees to improve prediction [...]". Innohead 11:24, 16 August 2006 (UTC)

Similarly, the statement about "one of the best" highly discredits the entire article. Anyone heard of No Free Lunch? It should be removed with a short grace period for the OP to adapt it him/herself. This is irrespective of the above citation. — Preceding unsigned comment added by 134.147.168.152 (talk) 15:42, 4 August 2011 (UTC)

I agree with all above. I had a student whose dissertation referenced random forests, and landed here at this wiki 'article'. It reads like a sad self-promotionary advertisment for a few researchers. Your work should stand for itself, and wikipedia should not be an avenue for shameless promotion. Noting "random forests" is a trademark in the first paragraph? Weakens article and readers will subsequently disregard. — Preceding unsigned comment added by 67.163.247.231 (talk) 16:31, 24 February 2015 (UTC)

Decision process
The learning process is very well documented, and in the end of it we do get a 'forest' or collection off trees. But what is the decision process, which allows us to classify a new data point once we have the forest? —Preceding unsigned comment added by Hirak 99 (talk • contribs) 14:21, 15 October 2010 (UTC)

Disadvantages
The following two bullets were removed from the "disadvantages" section on 10 Sep, 2011 by user 78.42.107.187 with the comment "The statements were unrelated to the given publications.":


 * Random forests do not handle large numbers of irrelevant features as well as ensembles of entropy-reducing decision trees.
 * It is more efficient to select a random decision boundary than an entropy-reducing decision boundary, thus making larger ensembles more feasible. Although this may seem to be an advantage at first, it has the effect of shifting the computation from training time to evaluation time, which is actually a disadvantage for most applications.

The point expressed in the first bullet is made in Section 4, and is clearly illustrated in Figure 5. Is anyone aware of a better paper for making this point? Even if no suitable reference can be found, I think the bullet should remain since it is demonstrably true. Also, it appears that the second bullet was removed by mistake, since it was not connected to that reference. Thus, I propose restoring both bullets to the disadvantages section. Any objections? --Headlessplatter (talk) 17:29, 12 September 2011 (UTC)

I strongly object. While the above observations might be true for a few select data sets and might have been reported in this one paper, I don't think this is a mainstream observation supported by a large body of literature. Only in that case it should find entry to an encyclopedia. This also holds true for the other bullet. (For the second bullet, btw, it is not clear where you would have the advantage and the disadvantage. Plus: it is only an disadvantage for random multivariate splits, and most random forests are based on univariate decision trees.) BM 78.42.107.187 (talk) —Preceding undated comment added 10:37, 19 September 2011 (UTC).

Algorithm
If one reads the Bayes classifier article, you then have a reasonable chance of banging out a working Bayesan algorithm. I do not believe this is the case for the Random Forest article. I'm not sure one can glean enough information from the description presented here to write code that incorporates the many ideas from the lead paragraph. DavesPlanet (talk) 15:03, 3 January 2012 (UTC)

Overfitting
Random Forests do not overfit. The testing performance of Random Forests does not decrease (due to overfitting) as the number of trees increases. Hence after certain number of trees the performance tend to stay in a certain value. There are tons of examples about that.

http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#mislabel — Preceding unsigned comment added by 78.181.156.92 (talk) 12:01, 26 August 2012 (UTC)


 * This is completely false. No algorithm that trains its model is immune to overfit. (That which is considered noise in one problem could be considered signal in another problem, so only a magical algorithm could always discern what the user wants.) A Random Forest with few trees is quite prone to overfit to noise. This is easily demonstrated because RF with just one tree is the same as a single tree. As more trees are added, the tendency to overfit generally decreases. It never, however, approaches zero. No number of trees will ever remove overfit. The RF page here on Wikipedia gives a visual demonstration of this behavior. It is also well documented in published literature. Segal et al. showed that Brieman's choice of datasets lacked significant noise, and that RF does in fact overfit (http://escholarship.org/uc/item/35x3v9t4.pdf). Gashler et al. showed empirically that Random Forest does not handle noise injected into arbitrary datasets as well as entropy-reducing decision trees (http://axon.cs.byu.edu/papers/gashler2008icmla.pdf). This is precisely because RF overfits to the noise, whereas the entropy-reducing trees overfit less. The method of randomly choosing divisions is inherently more susceptible to overfit because it gives equal probability to all attributes. Breiman has a conflict-of-interest in promoting RF, and repeatable experiments show that he is wrong to suggest that RF never overfits. Further, the notion of an algorithm that does not overfit is absurd in context of the No Free Lunchn Theorem. Adding more trees to the ensemble does in fact reduce variance, but it does not correct any bias. Please stop repeating Breiman's false marketing efforts.--Headlessplatter (talk) 13:45, 28 August 2012 (UTC)

From my point of view it's an academic discussion and in academic discussions, people does not make marketing. I wasn't even aware of it was Breiman's algorithm. I was just making a research about it and since some "berkeley website" contradicted with wikipedia, I just wanted to show it. And from my personal experiance the correct statement should be, "Random forests has tendency to overfitting but as the number of trees increases this tendency decreases. That's why as you add more trees the testing performance does not fluctuates and tend to stay in a minima, hence adding more trees does not lead to overfitting but actually decreases it." — Preceding unsigned comment added by 78.181.128.141 (talk) 13:36, 31 August 2012 (UTC)


 * I am sorry. I thought you were saying RF cannot overfit anything. I see now that you were really saying that additional trees do not promote overfit. I believe that is true. (That is probably what Breiman meant too.) --Headlessplatter (talk) 23:03, 31 August 2012 (UTC)


 * The fact that random forests have proven in practical applications to be often the most successful of all available techniques indicates that this article is providing a biased view by only presenting a toy example where it does very badly against a simpler method. The article urgently needs better examples in order to better inform readers. 31.185.205.225 (talk) 18:38, 26 February 2013 (UTC)

As I understand it, Breiman shows in his paper that by Law of Large Numbers, as the number of trees goes to infinity, there is a limit to the generalization error. So he concludes random forest cannot "overfit". This is nice because that means your generalization error won't rocket off to infinity as you keep adding trees, but most people would understand "overfit" to mean that your test error is significantly greater than the training error. Obviously that can and does happen. --C S (talk) 12:20, 5 November 2013 (UTC)

Unfortunately, academic writing often has marketing in it (not that a website is really academic writing). There is a great deal of status and, stemming from that, money involved in being the originator of popular and (perceived to be) high performing machine learning algorithms. But to the question at hand, as I understand it, as the number of trees in random forests approaches infinity, the random forest estimate will approach the expected estimate of a tree built on a (bootstrap) sample of the training data. Since this limit is conditional on the training data, it can and will overfit (in the sense of customizing to the training data rather than the population). I suspect this is a relatively benign form of overfitting, since it is bounded by the difference in the training data distribution from the population distribution, rather than the difference between some error minimizing model of the training data and the population. — Preceding unsigned comment added by 90.224.108.221 (talk) 16:08, 31 March 2016 (UTC)

Major rewrite
I'm have started working on a major rewrite of this article. I plan to incorporate as much of the current article as I can but there is a lot missing right now.

I plan to extend the scope of the article significantly. Currently the article restricts itself to Breiman's original algorithm from 2001, which gives a very narrow picture of random forests and is not at all representative of how they are used in practice. I plan to incorporate many of the advances discussed in Antonio Crimini's article here: http://research.microsoft.com/apps/pubs/default.aspx?id=158806

There is also no discussion of theory in the current article. There has been a lot of work done on understanding the mathematical forces behind random forests, and although there are still many open problems here, there are several theoretical works which deserve to be mentioned here. I plan to add a section which gives a short overview of the major theoretical contributions to date. This will be more survey style than say, the SVM article, since the theory for random forests is much less well developed.

In short, the main goals are:


 * 1) Expand the article to cover more than Breiman's 2001 paper.
 * 2) Include some discussion of the current state of random forest theory.

Some other things I'm considering:

Nippashish (talk) 03:08, 10 March 2013 (UTC)
 * 1) Adding some links to major implementations of random forests.  I see from the article history that there used to be such a list but it was removed for link farming.  I'm leaning towards not re-adding it for similar reasons: policing it would be difficult.
 * 2) Discussions of some more exotic generalizations of random forests.  There are a lot of neat, somewhat exotic models which use random forests as a base, but this has the same risk as a list of links.
 * 3) Significantly more examples, similar to sections 3.3,4.3,5.3,6.3,etc of the Criminisi paper I linked above.  I think this would be quite useful but is a whole lot of work so I don't know if I'll ever get around to actually doing it.


 * As part of this rewrite I've removed the "Features and Advantages" and "Disadvantages" sections of the original article. These sections were just lists of bulletpoints with no discussion or justification and think the points are better made as part of other sections in the article.  I incorporated a few of the points from "Disadvantages" into the new "Variable Importance" section directly.


 * I have also removed the "Examples" section because I think it is very misleading. It appears that it was originally added to illustrate that growing trees to full depth leads to overfitting, but that is not at all clear from the text of the section.  There is also no reason that trees must be grown to full depth, and the depth is often limited in practice.  The section also claims that "Typically, random forest is best-suited for use with categorical features, but continuous features were used in this illustration because they were easier to visualize." implying that it is not appropriate for continuous features, which is simply false. Nippashish (talk) 07:22, 16 March 2013 (UTC)

Proposed merge with Random naive Bayes
The random naive Bayes article really only has a single reference for the idea of RNB, Prinzie and V.d. Poel 2007, which only has 11 citations according to GScholar. (The other references discuss somewhat similar ideas, such as AODE.) Q VVERTYVS (hm?) 10:33, 9 December 2013 (UTC)


 * I'd be in favor of simply deleting the random naive bayes article without merging. There's no reason to have an article for every different type of model that someone has applied bagging to. Maybe add a sentence here that says something like "Random feature selection and bagging have been applied to other models than trees." Nippashish (talk) 14:02, 19 January 2014 (UTC)

Section to add to the article
The following is a section added by a contributor. I removed it since it omitted some of the algorithmic elements already available in the article. Also, it adds references but without giving the exact source. I am pasting the edit below so others may revise it, and re-insert it to the article:

Tal Galili (talk) 19:34, 14 December 2013 (UTC)


 * Looks like a word-for-word copy of part of the introduction to Breiman's paper on Random Forest. It's not him, since he's passed away. --C S (talk) 14:58, 16 December 2013 (UTC)

Breiman's Algorithm
Significant improvements in classification accuracy have resulted from growing an ensemble of trees and letting them vote for the most popular class. In order to grow these ensembles, often random vectors are generated that govern the growth of each tree in the ensemble. An early example is bagging (Breiman, 1996), where to grow each tree a random selection (without replacement) is made from the examples in the training set. Another example is random split selection (Dietterich, 1998) where at each node the split is selected at random from among the K best splits. Breiman (1999) generates new training sets by randomizing the outputs in the original training set. Another approach is to select the training set from a random set of weights on the examples in the training set. Ho (1998) has written a number of papers on “the random subspace” method which does a random selection of a subset of features to use to grow each tree. In an important paper on written character recognition, Amit and Geman (1997) define a large number of geometric features and search over a random selection of these for the best split at each node. This latter paper has been influential in my thinking.

Proposed merge with Kernel random forest
The article Kernel random forest discusses some results published on arXiv and in tech reports (not peer-reviewed). I don't think the topic is notable enough to stand on its own, and that it should be discussed in a subsection of random forest, just like the relationship to nearest neighbors. Q VVERTYVS (hm?) 15:45, 27 May 2015 (UTC)


 * Agree - happy to do this in the near future Situphobos (talk) 20:48, 10 July 2016 (UTC)
 * seems that there are no objections to the merge, and that you have the expertise to do it; still interested? Klbrain (talk) 21:43, 18 October 2017 (UTC)

Variable Importance
This article seems to indicate that the bias (more categories -> more important) disappears if you use a "one-hot" instead of "integer" encoding.


 * WP:SPS. Wait till it gets published and peer reviewed. Q VVERTYVS (hm?) 14:57, 12 August 2015 (UTC)

The unsupervised learning section
The unsupervised learning section is almost entirely copy/pasted from the abstract of reference 21:

Shi, T., Horvath, S. (2006). "Unsupervised Learning with Random Forest Predictors".

Is that an issue? Besides that, the section seems pretty unclear to me. So the method is based on distinguishing observed vs. synthetic data, but how does that give a dissimilarity metric? — Preceding unsigned comment added by 173.71.96.250 (talk) 12:33, 26 May 2016 (UTC)

"Decision Stream" Editing Campaign
This article has been targeted by an (apparent) campaign to insert "Decision Stream" into various Wikipedia pages about Machine Learning. "Decision Stream" refers to a recently published paper that currently has zero academic citations. The number of articles that have been specifically edited to include "Decision Stream" within the last couple of months suggests conflict-of-interest editing by someone who wants to advertise this paper. They are monitoring these pages and quickly reverting any edits to remove this content.

Known articles targeted:
 * Artificial intelligence
 * Statistical classification
 * Deep learning
 * Random forest
 * Decision tree learning
 * Decision tree
 * Pruning (decision trees)
 * Predictive analytics
 * Chi-square automatic interaction detection
 * MNIST database  — Preceding unsigned comment added by ForgotMyPW (talk • contribs) 17:39, 2 September 2018 (UTC)

BustYourMyth (talk) 19:17, 26 July 2018 (UTC)

Dear BustYourMyth,

Your activity is quite suspiciase: registration of the user just to delete the mention of the one popular article. Peaple from different contries with the positive hystory of Wikipedia improvement are taking place in removing of your commits as well as in providing information about "Decision Stream".

Kind regards, Dave — Preceding unsigned comment added by 62.119.167.36 (talk) 13:33, 27 July 2018 (UTC)

I asked for partial protection at WP:ANI North8000 (talk) 17:09, 27 July 2018 (UTC)

Statements not supported by sources, possible WP:CITESPAM
Some very general statements regarding performance comparison of random forest performance versus other classifiers have been recently added by a single user. Those statements are always followed by multiple citations to works by Piryonesi and co-authors related to Pavement Deterioration Modeling. One common characteristic of all these three works is that they are recent and not published in mainstream machine learning journals. As such, they do not seem to represent authoritative references, and rather seem to be an attempt of self-promotion (see Conflict_of_interest/Noticeboard). I already removed some of these references scattered in the article without success, as the same user who inserted those works in the first place systematically reverted my changes. Unfortunately, there is not much that I can do in the current state, such that further review by a third party is needed.93.13.173.200 (talk) 23:04, 2 March 2021 (UTC)
 * , please see User_talk:Drmies. Drmies (talk) 16:00, 4 March 2021 (UTC)

Vague statements in "Algorithms" section
The third paragraph of subsection "Preliminaries: decision tree learning" in section "Algorithms" is very odd. It unsubstantiated, vague, and written in a style that does not match the rest of the article.

"Forests are like the pulling together of decision tree algorithm efforts. Taking the teamwork of many trees thus improving the performance of a single random tree. Though not quite similar, forests give the effects of a k-fold cross validation."

Some particularly unrigorous phrases used: - "pulling together of decision tree algorithm efforts" - "teamwork of many trees" - "Though not quite similar, forests give the effects of a..."

I suggest removing this paragraph entirely or rewriting to be more precise. Samvitj (talk) 21:41, 3 March 2023 (UTC)
 * removed ✅