Talk:Feature selection

Untitled
This article lacks the improvements that have been made to address the issues of nesting and the curse of dimensionality. New methods have been devised including, but not limited to, Branch and Bound and Piecewise Linear Networks. I will sort through my resources and edit this article accordingly. Aboosh 07:49, 10 November 2006 (UTC)
 * There should also be a clear distinction between deterministic and stochastic approaches. A picture or a table would also make a nice example, e.g. for picking n features from N (N choose n possibilities) or picking any number of features from N (2^N possibilities).

Correlation vs. statistical dependence

 * "As a special case, the "correlation" can be replaced by the statistical dependency..."

As I understand it, correlation is a special case of statistical dependence, not the other way around. So this sentence is incorrect. -Pgan002 23:03, 1 February 2007 (UTC)


 * Yes, correlation is a special case of statistical dependence. linas (talk) 18:52, 8 September 2008 (UTC)

metric, distance
This article makes very sloppy use of the words 'metric' and 'distance' -- in order for something to be a distance measure, it must obey the triangle inequality, and yet this article proceeds to imply that things like correlations are distance measures, which is just plain wrong -- correlations are propagators, not metrics. See metric (mathematics) for the proper definition. linas (talk) 18:52, 8 September 2008 (UTC)
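The point above can be checked numerically. The sketch below is my own illustration, not from the article: it builds three centered, unit-norm vectors at angles 0°, 30° and 60° in the mean-zero plane of R^3, so their Pearson correlations are exactly the cosines of the angle differences, and then shows that the naive "distance" 1 − correlation violates the triangle inequality.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Orthonormal basis of the mean-zero plane in R^3: both vectors sum to 0.
u = [1 / math.sqrt(2), -1 / math.sqrt(2), 0.0]
v = [1 / math.sqrt(6), 1 / math.sqrt(6), -2 / math.sqrt(6)]

def at_angle(t):
    """A centered unit vector at angle t within that plane."""
    return [math.cos(t) * a + math.sin(t) * b for a, b in zip(u, v)]

x, y, z = at_angle(0.0), at_angle(math.pi / 6), at_angle(math.pi / 3)

def d(p, q):
    """Candidate 'distance' 1 - correlation -- NOT a true metric."""
    return 1 - pearson(p, q)
```

Here d(x, z) = 1 − cos 60° = 0.5, while d(x, y) + d(y, z) = 2(1 − cos 30°) ≈ 0.268, so the triangle inequality fails.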

Broken link: Feature Subset Selection Bias for Classification Learning --137.73.122.137 (talk) 15:55, 14 February 2011 (UTC)

Emphasize general approaches rather than specific techniques
This article doesn't make the differences between the various kinds of feature selection algorithm apparent. Wrapper and filter approaches differ only in their choice of evaluation metric, whereas embedded techniques cover a range of algorithms that are built into classification systems. The latter half also appears to have become a place for people to plug their pet techniques. I think someone should expand the general sections, move the various kinds to separate pages, and link to them. (I can do this if nobody else wants to.) Craigacp (talk) 20:06, 27 February 2012 (UTC)


 * I've rewritten the first half of this article to cover the current viewpoint on feature selection in machine learning. There is still a fair bit of work necessary to clean up the rest of it. Craigacp (talk) 23:50, 22 November 2012 (UTC)
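The filter/wrapper distinction discussed in this thread can be sketched in a few lines. Everything below is an illustrative assumption, not from the article: the toy data, the nearest-centroid scorer, and the function names are made up. A filter scores each feature without any learner; a wrapper treats the learner as a black box, retraining and re-evaluating it for every candidate subset.

```python
def pearson(xs, ys):
    """Pearson correlation of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def filter_select(X, y, k):
    """Filter: rank features by a cheap per-feature score (here
    absolute correlation with the label); no learner involved."""
    scores = [(abs(pearson(col, y)), j) for j, col in enumerate(zip(*X))]
    return sorted(j for _, j in sorted(scores, reverse=True)[:k])

def wrapper_select(X, y, k, train_eval):
    """Wrapper: greedily add the feature whose subset scores best when
    the black-box learner is trained and evaluated on that subset."""
    chosen = []
    while len(chosen) < k:
        best = max((j for j in range(len(X[0])) if j not in chosen),
                   key=lambda j: train_eval(X, y, chosen + [j]))
        chosen.append(best)
    return sorted(chosen)

def centroid_holdout_acc(X, y, feats):
    """Toy black-box learner: nearest-centroid classifier trained on the
    first half of the rows, scored on the held-out second half."""
    proj = [[row[j] for j in feats] for row in X]
    half = len(proj) // 2
    train, hold = list(zip(proj, y))[:half], list(zip(proj, y))[half:]
    cents = {}
    for lab in set(y):
        rows = [r for r, l in train if l == lab]
        cents[lab] = [sum(c) / len(rows) for c in zip(*rows)]
    def predict(r):
        return min(cents, key=lambda lab: sum((a - b) ** 2
                                              for a, b in zip(r, cents[lab])))
    return sum(predict(r) == l for r, l in hold) / len(hold)

# Toy data: feature 0 equals the label, feature 1 is noise,
# feature 2 is the negated label.
y = [0, 1, 0, 1, 0, 1, 0, 1]
noise = [3, 1, 4, 1, 5, 9, 2, 6]
X = [[y[i], noise[i], 1 - y[i]] for i in range(8)]
```

An embedded method would differ from both: the selection would happen inside the training procedure itself (e.g. an L1 penalty zeroing out coefficients), not in an outer loop like the two functions above.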

Subset Selection - True Metrics & Triangle Inequality
Why are folks so hung up on stating that correlation and mutual information are not true metrics because they do not obey the triangle inequality? If you are concerned about that, the standard thing that everyone does is to just use a symmetrized version of correlation and a symmetrized version of mutual information that DO obey the triangle inequality and ARE metrics. It seems like all the mathematicians who don't use this stuff in practice overlook that, or are concerned with pointing out that it's not a true metric and just a score when unsymmetrized, when every engineer in the world is like "duh, I know that". — Preceding unsigned comment added by 173.3.109.197 (talk) 20:10, 3 March 2012 (UTC)
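One standard construction along the lines the comment alludes to is the variation of information, VI(X, Y) = H(X, Y) − I(X; Y) = 2H(X, Y) − H(X) − H(Y), which is a true metric on discrete variables (it is non-negative, symmetric, zero iff the variables determine each other, and obeys the triangle inequality). A minimal sketch, with made-up sample data:

```python
from collections import Counter
from math import log2

def entropy(xs):
    """Shannon entropy (bits) of a sample's empirical distribution."""
    n = len(xs)
    return -sum(c / n * log2(c / n) for c in Counter(xs).values())

def variation_of_information(xs, ys):
    """VI(X, Y) = 2 H(X, Y) - H(X) - H(Y).
    Unlike raw mutual information, VI is a genuine metric."""
    joint = entropy(list(zip(xs, ys)))
    return 2 * joint - entropy(xs) - entropy(ys)

# Three toy discrete variables over the same four observations.
a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
c = [0, 0, 0, 1]
```

On these samples VI(a, a) = 0, VI is symmetric, and VI(a, c) ≤ VI(a, b) + VI(b, c), consistent with the triangle inequality.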

Feature Selection vs. Extraction
So what is the difference between feature selection and feature extraction? Should these two articles be merged? — Preceding unsigned comment added by 173.3.109.197 (talk) 17:18, 21 April 2012 (UTC)

Feature selection is a special case of feature extraction. Feature extraction generally destroys the original representation, which makes it difficult to interpret the results (i.e. it's hard to figure out which of the original measurements lead to a particular outcome). In contrast, feature selection just selects a subset of the inputs, ensuring interpretability. Something like the preceding text should probably appear in the FS article, and an appropriate link should appear in the feature extraction one. I don't think they need merging, as the two fields are very different and have different uses. Craigacp (talk) 17:04, 27 May 2012 (UTC)
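The interpretability point made above can be shown in a toy sketch. The column names, values, and projection weights below are all made up for illustration (a real extraction method such as PCA would learn the weights from data):

```python
# Toy data: three rows, three named measurements (all values made up).
names = ["height", "weight", "shoe_size"]
rows = [[170, 70, 42], [180, 80, 44], [160, 60, 40]]

# Feature selection keeps a subset of the original columns, so every
# retained value still carries its original name and meaning.
keep = [0, 1]
selected = [[r[j] for j in keep] for r in rows]
selected_names = [names[j] for j in keep]

# Feature extraction builds new features as combinations of the
# originals; the resulting axis has no original name to report.
weights = [0.5, 0.5, 0.0]  # made-up projection, not a fitted component
extracted = [[sum(w * v for w, v in zip(weights, r))] for r in rows]
```

After selection you can still say "the model used height and weight"; after extraction you can only say "the model used 0.5·height + 0.5·weight", which is the interpretability loss the comment describes.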

Proposed merge with Metaheuristics for feature selection
The article Metaheuristics for feature selection is actually about feature selection in general. The good bits (like the nice diagrams) should be merged into feature selection. Q VVERTYVS (hm?) 14:25, 3 February 2015 (UTC)

I agree with this, though the table of different techniques would need extensive trimming or updating if it made it into this article. Craigacp (talk) 23:33, 21 February 2015 (UTC)


 * The other page was speedily deleted under WP:CSD. Q VVERTYVS  (hm?) 10:23, 22 March 2015 (UTC)

External links modified
Hello fellow Wikipedians,

I have just modified one external link on Feature selection. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:
 * Added archive https://web.archive.org/web/20120511162342/http://featureselection.asu.edu:80/software.php to http://featureselection.asu.edu/software.php

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

Cheers.— InternetArchiveBot  (Report bug) 09:35, 30 December 2016 (UTC)

Stepwise regression: wrapper or embedded?
The article currently says "In traditional statistics, the most popular form of feature selection is stepwise regression, which is a wrapper technique."

I thought that wrapper methods treat the induction algorithm as a black box, train all candidate models on the training data, and evaluate them on holdout data. So if you try adding a variable to your linear regression by training every one-additional-variable model, then evaluate all of those models on the holdout data, that would be wrapper forward search.

However, traditional stepwise regression doesn't use a separate holdout set or treat regression as a black box. You find the one additional variable which will reduce Residual Sum of Squares the most (without needing to train all of those models separately), and you decide whether to continue or stop by a criterion like Mallows' Cp, AIC, BIC, p-values, etc. (despite the known issues with these approaches). Even if you use cross-validation as the stopping rule, you choose the next model using the training data and only evaluate that single model on the holdout data -- you do not evaluate all possible next models on the holdout.

Doesn't that mean stepwise regression is an embedded method, not a wrapper method?

Civilstat (talk) 16:37, 15 August 2017 (UTC)
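The contrast this thread draws can be sketched for a single forward step. Everything below is an illustrative assumption, not from the article: one-predictor OLS stands in for the regression, and the data and function names are made up. The stepwise flavour picks the candidate minimising RSS on the training data itself; the wrapper flavour fits each candidate on the training data but scores it on a separate holdout set.

```python
def fit_ols(xs, ys):
    """Least-squares intercept and slope for one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

def rss(xs, ys, a, b):
    """Residual sum of squares of the fitted line on (xs, ys)."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def stepwise_pick(train_cols, y_train):
    """Classical stepwise flavour: choose the candidate variable whose
    fit minimises RSS on the *training* data itself."""
    fits = [fit_ols(c, y_train) for c in train_cols]
    return min(range(len(train_cols)),
               key=lambda j: rss(train_cols[j], y_train, *fits[j]))

def wrapper_pick(train_cols, y_train, hold_cols, y_hold):
    """Wrapper flavour: fit every candidate model on the training data,
    but score each fitted model on a separate holdout set."""
    fits = [fit_ols(c, y_train) for c in train_cols]
    return min(range(len(train_cols)),
               key=lambda j: rss(hold_cols[j], y_hold, *fits[j]))

# Toy data: candidate column 0 generates y exactly (y = 1 + 2x),
# candidate column 1 is noise.
y_train = [1, 3, 5, 7]
train_cols = [[0, 1, 2, 3], [5, 2, 8, 1]]
y_hold = [9, 11]
hold_cols = [[4, 5], [7, 3]]
```

The structural difference is only in which data the `rss` call sees; a stopping rule like AIC or cross-validation, as discussed above, would sit on top of either picker.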