Talk:Decision tree learning

Untitled
I'm very grateful for the work that's gone into this article so far, but I think that it's missing a couple of key points that would be helpful to readers. First, there is no discussion of pruning - what necessitates it, and what algorithms are used to guide it? Second, although Gini impurity and Information gain are discussed in their own section, there is no discussion explaining their application in the construction of decision trees (i.e. as criteria for selecting values on which to split a node).
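To illustrate the second point - impurity measures as split-selection criteria - here is a minimal Python sketch (my own illustrative code, not from the article; all names are my own) that scores candidate thresholds on a single numeric feature by the weighted Gini impurity of the two resulting subsets:

```python
# Sketch: Gini impurity as a split criterion. For each candidate threshold on
# a feature, compute the weighted impurity of the two subsets and keep the best.
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, score) minimising weighted Gini impurity of the subsets."""
    best = (None, float("inf"))
    n = len(labels)
    for t in sorted(set(values)):
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        if not left or not right:
            continue
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best[1]:
            best = (t, score)
    return best

# Toy data: one numeric feature, binary class label.
xs = [1, 2, 3, 10, 11, 12]
ys = ["no", "no", "no", "yes", "yes", "yes"]
print(gini(ys))            # 0.5 -- a 50/50 class mix
print(best_split(xs, ys))  # (3, 0.0) -- a clean split at 3
```

A real learner applies this scoring to every feature at every node and recurses on the two subsets; information gain works the same way with entropy in place of Gini impurity.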

Here's hoping that these observations are not out of place; this is my first-ever contribution to Wikipedia and I suppose I ought to go play in the sandbox and then add to the article myself, eh?

Yoy riblet 04:41, 18 July 2007 (UTC)

There's also a section missing on the _disadvantages_ of decision trees. For instance, ISTM that a decision tree can only divide the input region with axis-parallel lines (neural networks do not have this restriction). -Thenickdude (talk) 02:20, 3 November 2008 (UTC)
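The axis-parallel limitation mentioned above can be demonstrated concretely (my own sketch, not from the discussion): for XOR-like data, no single split of the form "coordinate ≤ threshold" separates the two classes, whereas a diagonal boundary would.

```python
# Sketch: no single axis-parallel split separates XOR-like data.
points = [((0, 0), "A"), ((1, 1), "A"), ((0, 1), "B"), ((1, 0), "B")]

def separates(axis, t):
    """True if the half-planes coord <= t and coord > t are each single-class."""
    left = {label for (p, label) in points if p[axis] <= t}
    right = {label for (p, label) in points if p[axis] > t}
    return len(left) <= 1 and len(right) <= 1

# Thresholds covering every distinct partition of the two coordinate values.
candidates = [(axis, t) for axis in (0, 1) for t in (-0.5, 0.0, 0.5, 1.0)]
print(any(separates(axis, t) for axis, t in candidates))  # False
```

A deeper tree can still carve out the XOR regions with several nested splits, but the resulting decision boundary is a staircase rather than the diagonal a neural network could learn directly - which is the disadvantage being described.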

Copyright problem

 * I'm concerned that the example in this page is needlessly copied without attribution from Quinlan's paper "Induction of Decision Trees" - in particular the table's content and organization is directly copied from Table 1. This is unlikely to be considered fair use since the specific example is not needed to comment on the method, much less decision trees in general. I suggest creating a novel example for this article (I don't believe No original research is an issue here, since we're describing decision trees in general and not any particular algorithm). Dcoetzee 00:10, 1 March 2009 (UTC)


 * I agree - the example seems to have been lifted from Quinlan, with tweaks. More importantly from a pedagogical standpoint, the example is not about decision trees in machine learning, but about decision trees in decision analysis, so it's in the wrong page. I am removing that section, as I don't see any benefit to having it in this article. --mcld (talk) 11:49, 3 November 2009 (UTC)


 * On a related note: the example is weird. The target variable is sort of a pretend categorical variable with values "Play" or "Don't Play", when what is really being measured is the number of people who show up on a given day.  The example talks about the decisions as if they are describing groups of people, which is a stretch from the data.  If there are no objections, I'm going to make up a completely different example, using Wikipedia data.  riedl 12 June 2009

Poor explanation
I read the general description several times and I'm confused. First of all, what is the target variable in the graph? There are two numbers under each leaf (I'll assume it's the probability of survival). What is the source set? The set of passengers? How do you assign probability of survival to a given passenger? What is the meaning of "... when the subset at a node all has the same value of the target variable"? Passengers with the same probability of survival? The graph will only make sense as an illustration if the mapping between it and the general description is made explicit at every step. Bartosz (talk) 19:46, 10 June 2011 (UTC)

Inclusion of "QuickDT" in list of Implementations
Two days ago a new decision tree learning implementation called QuickDT was added to the list of implementations. This is after QuickDT was posted to reddit/r/MachineLearning, to a favorable response, and also to mloss.org. It is already in commercial usage, and its author is well known.

Shortly afterwards this was removed by User:X7q with the explanation "wikipedia is not a place to promote your pet projects".

My opinion is that there is as much justification to include QuickDT in this list as any of the implementations already listed, and so I have restored the link.

If User:X7q would like to elaborate on their reason for thinking that this decision tree learning implementation should not be included in this list of decision tree learning implementations, I am happy to discuss it.

--Javalangstring (talk) 15:59, 22 September 2011 (UTC)


 * First of all, you should not add links to your own site (WP:EL, WP:COI). In a comment you wrote "This is a list of decision tree learning implementations (of which there aren't very many)." Both your points are wrong. Wikipedia is not a collection of links. Decision trees are a highly popular data mining tool, and as a result there are a lot of decision tree algorithms and implementations of them, both open and closed source. And they aren't that hard to implement, so I imagine many students learning this topic (including myself a long time ago) have implemented them - I don't see how this is different from what you've done. -- X7q (talk) 21:35, 22 September 2011 (UTC)


 * If you can't find a decent decision tree implementation on Google, then you're probably googling for the wrong keywords. Don't google for "ID3", "CART", "C4.5", etc. - no one uses them anymore. Ensembles are the state of the art today. Google for AdaBoost, random forests, or, to give a more recent example, additive groves - all these algorithms use decision trees as building blocks. -- X7q (talk) 21:47, 22 September 2011 (UTC)


 * I take your point that adding a link to a project I'm affiliated with may have violated the letter of the guidelines. However, I don't believe it is a "conflict" of interest, as I have no commercial interest in this project. My only motivation is to fill a need for an easy-to-use Java decision tree library. In this regard, my interests are aligned with, not in conflict with, the goals of Wikipedia.


 * While we are being sticklers about Wikipedia guidelines, I believe they also state that you should assume good faith (WP:AGF), which you failed to do when you questioned my motives in your original revision: "wikipedia is not a place to promote your pet projects". If my goal were self-promotion, I can think of better ways to do it than a link from an obscure Wikipedia article to a GitHub page that barely mentions my name.


 * I did look very hard for a Java decision tree learning library prior to implementing QuickDT, and I can assure you that I'm a proficient user of Google. I found two: jaDTi, which has a terrible API and has been dormant since 2004, and Weka, which also has a horrible API, is not at all idiomatic Java (they use Vectors and ungenericized collections, for crying out loud), and is a resource hog. I wasted several weeks trying to make each of these fit my needs, and they failed to do so. Decision tree learners may not be hard to implement, but there is more to writing a good library than simply implementing the algorithm; you also need a well-designed API.


 * In short, there are no good Java options.


 * I'm familiar with ensembles, they don't suit everyone. In my particular situation I required 100% recall accuracy on the training set, and ensembles degenerate to simple decision tree learners in this case.  Ensembles are also less well suited if your goal is to gain some insight into the data, rather than simply good predictive behavior.


 * That is why I originally came to this page about decision tree learning, and not a page about ensembles. Even if you consider decision tree learning algorithms to be obsolete, they are the subject matter of this article, and you shouldn't presume that visitors to this article are really looking for something else when they come here.


 * Because of WP:COI I will not make any further edits on this matter, however I request that you reconsider your removal of the link in the light of my argument above. If you decide against it I may request third-party arbitration. Javalangstring (talk) 22:32, 22 September 2011 (UTC)


 * "Based on this research, there are only two Java decision tree". Just among well-known libraries. There's probably tons of less known code implementing decision trees, likely as part of something bigger, e.g. ensemble methods. For example, random forests in Apache Mahout. There is DT code in there, it's just maybe a bit harder to reuse. And why Java? We aren't a javapedia; let's avoid bias toward particular languages. -- X7q (talk) 23:08, 22 September 2011 (UTC)
 * "required 100% recall accuracy on the training set": What a strange requirement. It's very easy to fulfill though - simply memorize the training set (1-NN). -- X7q (talk) 23:08, 22 September 2011 (UTC)
 * "Weka, which also has a horrible API," That's because it's designed to be very universal; it has everything and the kitchen sink. If I were you, I'd write a wrapper around it instead of spending time on reimplementing and debugging. -- X7q (talk) 23:08, 22 September 2011 (UTC)
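On the side point about memorising the training set: X7q's suggestion can be sketched in a few lines (my own illustrative code, one-dimensional for brevity). A 1-nearest-neighbour classifier trivially attains 100% recall on its own training set, because every training point is its own nearest neighbour at distance zero:

```python
# Sketch: 1-NN memorisation gives perfect accuracy on the training set itself.
def predict_1nn(train, x):
    """Classify x by the label of the nearest training point (1-D Euclidean)."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

train = [(0.0, "no"), (1.0, "no"), (5.0, "yes"), (6.0, "yes")]
print(all(predict_1nn(train, x) == y for x, y in train))  # True
```

This also makes concrete why "100% recall on the training set" alone is a weak requirement: it says nothing about behaviour on unseen points, which is the point of contention in the exchange below.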


 * There may be "less known code", but the type of person that would find this page and perhaps wants to experiment with decision trees doesn't want to decipher Weka's ugly API, nor do they want to try to extract the relevant parts of the Mahout codebase. They want a stand-alone library with a well-designed API, and right now the only option if they use Java is QuickDT.
 * There's an excellent recent ML book - http://www-ist.massey.ac.nz/smarsland/MLbook.html - which contains simple-to-understand implementations of basic algorithms, including decision trees. Useful for a beginner trying to understand the algorithms: it offers both explanations and code to demonstrate them. It's in Python, though, which is a quite popular language in the ML community. Source code from the book is available online. I think it would be better to point readers to this book than to a new unknown library. -- X7q (talk) 00:58, 23 September 2011 (UTC)
 * You're confusing two separate needs. One is a desire to understand the algorithms, the other is the desire to use them.  QuickDT is intended for the latter. Your definition of "unknown" is subjective, the library has received attention on a number of websites dedicated to machine learning.  People need a variety of options in a variety of languages -- Javalangstring (talk) 01:19, 23 September 2011 (UTC)
 * "They want a stand-alone library with a well designed API," - when I look for a library to use in a real-world project, I personally care much more about its proven stability, lack of bugs, and number of users than about its API. Right now Weka is the definite winner there. -- X7q (talk) 01:04, 23 September 2011 (UTC)
 * Why must there be a single "winner", and why must you be the judge? You are supposed to set your personal opinions aside when acting as a Wikipedia editor.  Let readers decide how they prioritize those things -- Javalangstring (talk) 01:19, 23 September 2011 (UTC)
 * Why Java? Because according to widely used metrics it is the world's most popular programming language, and it is particularly well suited to CPU-intensive tasks like machine learning. It is likely that more readers of this page will be familiar with Java than with most or all other languages, so I don't think it's "bias" to ensure that it's well represented.
 * Full recall is not really a strange requirement for real-world use of machine learning. It allows you to guarantee that the learning algorithm will exhibit correct behavior on known data, which is very useful in many scenarios. And yes, it's easy to achieve with a decision tree algorithm, but as I said it causes ensemble algorithms to degenerate to simple decision tree algorithms, yielding no benefit.
 * At the cost of much worse performance on unseen data due to a weak algorithm! Are you even doing cross-validation? That's the standard way to check that an algorithm performs well on known data. -- X7q (talk) 00:58, 23 September 2011 (UTC)
 * You don't know that the algorithm is weak, and you shouldn't make editorial decisions based on such assumptions. There really is no reason to be patronizing; I've been working in machine learning for over 15 years, I know exactly what cross-validation is - I eat, drink, and sleep it -- Javalangstring (talk) 01:19, 23 September 2011 (UTC)
 * Weka's API is not horrible because it is designed to be universal; it is horrible because its API was designed prior to modern Java language features and API design conventions, and it was never updated to take advantage of them. For example, it makes heavy use of the "Vector" class, which was effectively deprecated in 1998 with the release of Java 1.2 due to its inefficiency. Weka then replaced it with their own custom-built FastVector class to address the inefficiency, but that class doesn't implement any of the Java collections interfaces. If you know any Java developers, get them to take a look at Weka's API; they will recoil in horror.
 * Writing a wrapper for Weka would be putting lipstick on a pig, and could only address some of its flaws, not its high resource usage, nor the fact that it requires pre-processing of the dataset to enumerate all values in nominal fields.
 * I think it's fairly clear that we aren't going to agree. I will request a third opinion, and if they agree with you I'll drop this --Javalangstring (talk) 00:21, 23 September 2011 (UTC)

I might actually approach this from another angle: notability and reliable sourcing. Has this project been discussed in academic papers? Mentioned on the news? Have reliable sources covered QuickDT, such that we can usefully cover and discuss it in an encyclopedic sense, ideally even maintaining an article about it? If so, then I'd say it's absolutely worth a mention. If not, then I'm more dubious of its inclusion. I mean no ill will to the project, itself (actually, I think it's pretty cool!), but ultimately countless multitudes of software projects are started every year, and it's difficult to decide which ones in particular should be mentioned here. (For what it's worth, I'd say that this also goes for the other external link, "Sipina"). – Luna Santin  (talk) 05:18, 26 September 2011 (UTC)

Bayesian CART
I'm surprised that there is no information about Bayesian CART. It is widely used in many models now. Somebody should add information about this to the page.

Bayesian CART Model Search, by Hugh Chipman, Edward I. George, and Robert E. McCulloch

This is the paper describing the Bayesian method for producing trees. It has been cited 298 times, and many modern tree models use it as a basis for creating trees. 74.56.25.108 (talk) 23:37, 25 September 2011 (UTC)

Poor example tree
Rather than creating rich context, using Titanic survival as an example invites needless (here) political and social distractions, and should be replaced with a less complex example. The existing example distracts the reader with questions of the role of women in the early 20th century and why wealth (class of accommodation) was omitted by the example writer. Also, the decisions based on .5 values of integers (years of age; number of siblings+spouses) are needlessly confusing. I'm not an expert in decision trees, and so appeal to those more familiar with the topic to offer a better Decision Tree example. Knowthhill (talk) 17:04, 14 April 2013 (UTC)


 * I do agree; there are also no citations for the values quoted. A different, cited example would be good. --Salix (talk): 19:09, 14 April 2013 (UTC)

Another comment regarding the Titanic survival example: the symbols are misleading and incorrect. If one branch is "age<3", the other must be "age>=3" rather than "age<=3" etc. — Preceding unsigned comment added by 2A02:8108:50BF:FCC4:295E:1EAE:3A4C:A987 (talk) 10:58, 9 December 2022 (UTC)

Link to Gini Coefficient page
While the Gini coefficient (on the page linked under Gini impurity) and the Gini impurity measure (used here) were both created by the same person (Gini), they are not related to each other. I believe the link as written can lead one to believe they are the same thing. Perhaps instead of "Main article: Gini Coef" it would be better to have "Not to be confused with: Gini Coef". The best citation I've found for this is this (page 8) 38.99.52.154 (talk) 20:58, 12 March 2014 (UTC)Nathan
 * Go for it!
 * In case you are wondering, there is documentation at Template:Distinguish on how to get it to say "not to be confused with".
 * Yaris678 (talk) 07:58, 13 March 2014 (UTC)
 * Cool. Yaris678 (talk) 08:25, 14 March 2014 (UTC)
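The distinction drawn in this thread can be made concrete with a short numeric sketch (my own code and hypothetical figures, not from any cited source): the Gini coefficient measures inequality in a distribution of values, while Gini impurity measures class mixing at a tree node, and the two formulas have nothing to do with each other.

```python
# Sketch: Gini coefficient (inequality measure) vs. Gini impurity (node purity).
def gini_coefficient(values):
    """Half the relative mean absolute difference of a list of values."""
    n = len(values)
    mean = sum(values) / n
    mad = sum(abs(a - b) for a in values for b in values) / (n * n)
    return mad / (2 * mean)

def gini_impurity(proportions):
    """1 minus the sum of squared class proportions."""
    return 1.0 - sum(p * p for p in proportions)

print(gini_coefficient([1, 1, 1, 1]))  # 0.0 -- perfect equality of values
print(gini_impurity([0.5, 0.5]))       # 0.5 -- maximally mixed two-class node
```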

'Uses' section
There should be a section talking about what decision trees are used for out in the wild. Richard☺Decal (talk) 20:08, 24 March 2017 (UTC)

Questionable statement in opening line of Limitations: "Trees do not tend to be as accurate as other approaches."
This seems to be a very damning yet very controversial statement; it should either have more support and explanation, be rewritten, or be removed.

One textbook is referenced to which I do not have access. OTOH I could reference several hundred scholarly articles indicating the opposite, including famous ones such as Breiman's*** (and to be fair, I could also add some references from articles that may provide additional support to the original statement).

I don't want to start an editing war, but would it be fair to mitigate this statement?

Note that the author doesn't even tell us what the other, allegedly more accurate, methods are and/or under what circumstances accuracy is higher for the other algorithms and worse for tree models.

Breiman, L. Machine Learning (2001) 45: 5. doi:10.1023/A:1010933404324
 * Just a couple I quickly grabbed as examples:

Breiman, L. (1998b). Randomizing outputs to increase prediction accuracy. Technical Report 518, May 1, 1998, Statistics Department, UCB (in press, Machine Learning).

96.244.43.227 (talk) 12:50, 27 May 2017 (UTC)

"Decision Stream" Editing Campaign
This article has been targeted by an (apparent) campaign to insert "Decision Stream" into various Wikipedia pages about Machine Learning. "Decision Stream" refers to a recently published paper that currently has zero academic citations. The number of articles that have been specifically edited to include "Decision Stream" within the last couple of months suggests conflict-of-interest editing by someone who wants to advertise this paper. They are monitoring these pages and quickly reverting any edits to remove this content.

Known articles targeted:
 * Artificial intelligence
 * Statistical classification
 * Deep learning
 * Random forest
 * Decision tree learning
 * Decision tree
 * Pruning (decision trees)
 * Predictive analytics
 * Chi-square automatic interaction detection
 * MNIST database  — Preceding unsigned comment added by ForgotMyPW (talk • contribs) 17:39, 2 September 2018 (UTC)

BustYourMyth (talk) 19:15, 26 July 2018 (UTC)

Dear BustYourMyth,

Your activity is quite suspicious: registering an account just to delete the mention of one popular article. People from different countries with positive histories of improving Wikipedia are taking part in reverting your edits, as well as in providing information about "Decision Stream".

Kind regards, Dave — Preceding unsigned comment added by 62.119.167.36 (talk) 13:37, 27 July 2018 (UTC)

"Decision Stream" is a well-known method, which can be placed in Wikipedia. — Preceding unsigned comment added by 185.176.76.220 (talk) 14:13, 27 July 2018 (UTC)

Aggressive [citation/citation needed]
I understand the need for [citation needed] when claims are made without citation, but this sentence:

A tree can be "learned"[clarification needed] by splitting the source set[clarification needed] into subsets based on an attribute value test[clarification needed][citation needed].

is nearly unreadable. Wouldn't a single notation have been sufficient? — Preceding unsigned comment added by 198.244.99.210 (talk) 06:48, 23 November 2018 (UTC)

Split criteria for regression trees
The split should be minimization of the overall sum of squares and not the reduction in variance (cf. Hastie, Tibshirani & Friedman, The Elements of Statistical Learning). Specifically, the criterion ought to be

$$ \sum_{\substack{i, j \in S \\ i \neq j}} (x_i - x_j)^2 - \left( \sum_{\substack{i, j \in S_l \\ i \neq j}} (x_i - x_j)^2 + \sum_{\substack{i, j \in S_r \\ i \neq j}} (x_i - x_j)^2 \right) $$

where $$ S, S_l, \text{ and } S_r $$ denote the sets of points at the node and at the left and right child nodes, respectively. — Preceding unsigned comment added by 115.110.204.60 (talk) 06:19, 25 February 2019 (UTC)
 * Yup! This has been fixed. Bennetto (talk) 22:31, 10 March 2021 (UTC)
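For readers following this thread, the sum-of-squares criterion can be sketched in a few lines of Python (my own illustrative code, not from the article): pick the threshold minimising the total within-child sum of squared deviations from the child means, which for a fixed parent node is equivalent to maximising variance reduction.

```python
# Sketch: regression-tree split by minimising within-child sum of squares.
def sse(ys):
    """Sum of squared deviations from the mean (0.0 for an empty set)."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def best_regression_split(xs, ys):
    """Return (threshold, total SSE) minimising SSE(left) + SSE(right)."""
    pairs = sorted(zip(xs, ys))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        t = pairs[i - 1][0]
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (t, total)
    return best

xs = [1, 2, 3, 10, 11, 12]
ys = [1.0, 1.1, 0.9, 5.0, 5.1, 4.9]
print(best_regression_split(xs, ys)[0])  # 3 -- splits the two clusters apart
```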

Order of the calculation in the information gain example
The information gain example reads "To find the information gain of the split using windy, we must first calculate the information in the data before the split.", but this sentence appears near the end of the example. I suggest moving it to the beginning, which would be more consistent with the equation preceding the example. — Preceding unsigned comment added by Vl8r (talk • contribs) 14:21, 13 October 2020 (UTC)

Needs to quote dataset in Metrics > Information gain
In Metrics > Information gain, there attached an example:

"Consider an example data set with four attributes: outlook (sunny, overcast, rainy), temperature (hot, mild, cool), humidity (high, normal), and windy (true, false), with a binary (yes or no) target variable, play, and 14 data points. To construct a decision tree on this data, we need to compare the information gain of each of the four trees, each split on one of the four features. The split with the highest information gain will be taken as the first split and the process will continue until all children nodes each have consistent data, or until the information gain is 0.

To find the information gain of the split using windy, we must first calculate the information in the data before the split. The original data contained nine yes's and five no's."

At the end of the subsection "Information gain", the article does note: "This example is adapted from the example appearing in Witten et al.".

However, speaking of the dataset (and performing mathematics on it) without showing the dataset itself (i.e. how it looks and what it is comprised of) yields little demonstrative value.

The writer of this article needs to attach the dataset before the example. 2402:800:61B1:7C49:B1AD:31EB:F1EC:29CF (talk) 04:29, 3 February 2023 (UTC)
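For what it's worth, the pre-split calculation the quoted passage refers to needs only the class counts it gives (nine "yes" and five "no" out of 14 points). A minimal Python sketch (my own code, not from the article):

```python
# Sketch: entropy of the target variable before any split, from class counts.
import math

def entropy(counts):
    """Shannon entropy in bits of a label distribution given as counts."""
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c)

print(f"{entropy([9, 5]):.3f}")  # 0.940
```

The information gain of a split on "windy" would then be this value minus the weighted average entropy of the windy=true and windy=false subsets, which is exactly the computation that would benefit from the full table being shown.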