Wikipedia:Reference desk/Archives/Mathematics/2014 March 19

= March 19 =

Outlier in a Set of Numbers
Hello. My question is this: I've got a set of six numbers, and they are 151, 100, 135, 107, 156, and 70. In this example, would 70 be considered an outlier? If so, is it the only one? Thanks. 65.129.183.240 (talk) 00:02, 19 March 2014 (UTC)


 * The definition I was told to use (but remember that it's arbitrary) is that if a number is more than one and a half times the interquartile range from the nearest quartile, it's an outlier. Under that definition, 70 is not an outlier.--Jasper Deng (talk) 07:44, 19 March 2014 (UTC)


 * As Jasper said parenthetically, there is no general mathematical definition of what constitutes an outlier. People in various settings use various rules of thumb, like the one above. Staecker (talk) 11:45, 19 March 2014 (UTC)


 * See Outlier. I tend to just keep outliers, which solves the problem ;-) Dmcq (talk) 13:02, 19 March 2014 (UTC)


 * Whether you should keep outliers, and what you should consider to be one, is more a function of the nature of the data and how reliable your data collection methods are than of any inherent mathematical criterion. If your data is coming from an electronic device that is known to be accurate and reliable, then you should probably use all of it. If the data is coming from some guy with a clipboard and the guy seems hungover most mornings, then you should probably consider any reading not close to the expected value as due to input error. It may also depend on the desires and/or honesty of the people who are receiving the results of the data analysis. I myself was in a situation where errors in the data consistently made my department appear to be producing more than it actually was. Perhaps coincidentally, the directive from the department head was to keep all the data, no matter how unreasonable individual readings were. --RDBury (talk) 14:19, 19 March 2014 (UTC)

Thanks, this has all been helpful. Kudos to Jasper especially. Anyway, in this particular data set, error isn't a possibility, and all of the numbers, for my purposes, are equally credible. One more thing, might someone elaborate a little on Jasper's method which looks at quartiles? Explain it in layman's (idiot's) terms? If not, no biggie. I'll check in eventually. Again, thanks. 216.64.190.250 (talk) 16:33, 19 March 2014 (UTC)
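
The quartile rule described above can be sketched in a few lines of Python. This is only an illustration under one assumption: the quartiles are taken as the medians of the lower and upper halves of the sorted data, which is one of several conventions in use (others give slightly different quartiles). The function names `quartiles` and `iqr_outliers` are made up for this sketch.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    s = sorted(values)
    half = len(s) // 2
    lower = s[:half]
    # For an odd-sized sample, leave the middle value out of both halves.
    upper = s[half:] if len(s) % 2 == 0 else s[half + 1:]
    return median(lower), median(upper)

def iqr_outliers(values, k=1.5):
    """Flag values more than k * IQR below Q1 or above Q3."""
    q1, q3 = quartiles(values)
    spread = q3 - q1  # the interquartile range
    low, high = q1 - k * spread, q3 + k * spread
    flagged = [v for v in values if v < low or v > high]
    return flagged, (low, high)

data = [151, 100, 135, 107, 156, 70]
flagged, fences = iqr_outliers(data)
print(quartiles(data))  # (100, 151)
print(fences)           # (23.5, 227.5)
print(flagged)          # []
```

With this convention the quartiles are 100 and 151, so the interquartile range is 51 and a value would have to fall below 23.5 or above 227.5 to be flagged. 70 is well inside that band, in line with the answer given earlier.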


 * (error is always a possibility -- but may well be unlikely in your data. See uncertainty, measurement uncertainty, observational error, instrument error) SemanticMantis (talk) 16:53, 19 March 2014 (UTC)


 * Jasper's rule of thumb (or something very similar) is also given in the MathWorld page on outliers. AndrewWTaylor (talk) 17:43, 19 March 2014 (UTC)


 * You might also like to read Black swan theory. Also the ozone hole wasn't found for a while because software eliminated it as an outlier. Dmcq (talk) 19:22, 19 March 2014 (UTC)


 * By my calculation, 70 is only 1.5 sample SDs below the mean, which seems insufficient to condemn it, though again there's no generally-used criterion with this approach.→86.173.216.149 (talk) 19:48, 19 March 2014 (UTC)


 * Rather than just looking at distance from the mean, it seems to me that how close the other items are to each other also matters. So, while a value of 9 may not be an outlier where the average is 10 if the other values range from 9.1 to 10.9, it sure is an outlier if the other values range from 9.999 to 10.001. StuRat (talk) 22:36, 19 March 2014 (UTC)


 * That's why the quartile test and the SD test are used. I agree that they don't always clearly identify outliers, but either of them would in your examples. Dbfirs 06:05, 21 March 2014 (UTC)
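
The SD-based check from this thread can be reproduced with Python's standard library. The sketch below measures how many sample standard deviations a value lies from the mean of the data it belongs to. The helper name `sd_distance` is invented here, and the `wide` and `tight` lists are made-up samples chosen only to mimic the two ranges mentioned above; they are not data from the thread.

```python
from statistics import mean, stdev

def sd_distance(value, data):
    """Number of sample SDs between `value` and the mean of `data`.

    `data` is the full sample, including `value` itself.
    """
    return (value - mean(data)) / stdev(data)

# The six numbers from the original question.
data = [151, 100, 135, 107, 156, 70]
print(round(sd_distance(70, data), 2))

# Hypothetical samples (assumed for illustration): the same value 9,
# once among widely spread neighbours and once among tightly packed ones.
wide = [9, 9.1, 9.4, 9.7, 10.0, 10.3, 10.6, 10.9]
tight = [9, 9.999, 9.9995, 10.0, 10.0, 10.0, 10.0005, 10.001]
print(round(sd_distance(9, wide), 2), round(sd_distance(9, tight), 2))
```

For the original six numbers the result is close to -1.5, matching the figure quoted above. In the made-up samples, 9 sits further from the mean in SD units when the other values are tightly packed, which is the effect of the spread of the other values. Note that in a small sample a single extreme value inflates the SD itself, so this distance can never get very large.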

game theory
I am getting myself tangled up in game theory. I have a symmetric zero-sum game (that is, an antisymmetric payoff matrix A) and want to identify the optimal mixed strategy p. I know that this strategy is the same for both players, and that the game value is zero. So I am solving $$p^T A p=0$$ and $$\sum_i p_i=1$$. I would rather avoid simplex methods if possible. I thought the way to deal with this would be to use the eigenvector which corresponds to the zero eigenvalue of A and then scale that so the sum is 1. But my eigenvector has both negative and positive elements, so something's wrong with my understanding. How can I find the optimal mixed strategy without resorting to linear programming? Thanks, Robinh (talk) 21:56, 19 March 2014 (UTC)


 * In general, I think you may need to do some linear programming. This doc has a nice exposition on how to solve these zero-sum games. --Mark viking (talk) 22:54, 19 March 2014 (UTC)


 * (OP) Thanks for this, Mark. Ferguson is a super reference! And it's nice to hear that linear programming is needed for sure in the symmetric case. But the reason I asked for a solution without linear programming is that Ferguson's method at the bottom of page II-39 (his equations 16 and 17) fails because I know that the game value is zero, so his x's become infinite, and my linear program complains. Any advice? Thanks, Robinh (talk) 23:05, 19 March 2014 (UTC)


 * I see what you are saying. I suppose the choice is to use Ferguson's LP formulation in equations (12) and (13) on page II-38, or try a hack using eqs. 16 and 17. Being a physicist, I might try the hack of regularizing the problem by setting the value to a small positive constant and solving the problem. If the ratio of x's doesn't change as the value goes to zero, then you have a solution. But this is all hypothetical; I've never tried to solve a game this way. --Mark viking (talk) 23:45, 19 March 2014 (UTC)


 * (re-edit removing a misconception). Thanks for this; the trick was to perturb A (just as you suggested) and *then* apply (12) and (13). Turns out my matrix had a dominating strategy after all. Best wishes, Robinh (talk) 00:09, 20 March 2014 (UTC)
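
One point from the question can be checked numerically. For an antisymmetric A, the quadratic form $$p^T A p$$ is zero for every p, so that equation alone cannot single out the optimal strategy. What characterises an optimal p in a symmetric zero-sum game is that p is a probability vector and no pure strategy earns more than the game value 0 against it, that is, every entry of A p is at most 0. The sketch below only verifies a candidate strategy; it does not find one, and it is not Ferguson's method. The function names and the rock-paper-scissors matrix are assumptions chosen for illustration, with A[i][j] taken as the payoff to the row player.

```python
from fractions import Fraction

def payoffs_against(A, p):
    """Return A p: the payoff of each pure row strategy against the mix p."""
    return [sum(a * q for a, q in zip(row, p)) for row in A]

def quadratic_form(A, p):
    """Return p^T A p, which is 0 for every p when A is antisymmetric."""
    return sum(q * v for q, v in zip(p, payoffs_against(A, p)))

def is_optimal(A, p, tol=0):
    """Check that p is a probability vector and every entry of A p is <= 0."""
    if any(q < 0 for q in p) or abs(sum(p) - 1) > tol:
        return False
    return all(v <= tol for v in payoffs_against(A, p))

# Rock-paper-scissors as an example antisymmetric payoff matrix.
RPS = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]

uniform = [Fraction(1, 3)] * 3
lopsided = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]

# The quadratic form vanishes for both strategies...
print(quadratic_form(RPS, uniform), quadratic_form(RPS, lopsided))  # 0 0
# ...but only the uniform mix passes the optimality check.
print(is_optimal(RPS, uniform), is_optimal(RPS, lopsided))  # True False
```

The inequality condition also suggests why a null vector of A need not be a valid strategy: when some pure strategy is dominated and gets probability zero, the corresponding entry of A p can be strictly negative, so the optimal p does not have to satisfy A p = 0.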