Wikipedia:Reference desk/Archives/Mathematics/2022 September 24

= September 24 =

Eulerian quadruples
To clear something up: I use "Eulerian (n+1)-tuple" for all n>2 to mean n nth powers summing to an nth power; this is a generalization of a Pythagorean triple that must be distinguished from a "Pythagorean (n+1)-tuple", in which n squares sum to a square.

Because SSSS does not determine a unique shape and size for a quadrilateral, no theorem relates Eulerian quadruples to the side lengths of a quadrilateral (that is, we cannot know the exact shape and size of a quadrilateral whose sides are 3, 4, 5, and 6 units long). But what about determining a tetrahedron's shape and size from the areas of its faces? That is, if we had a tetrahedron whose face areas are 3, 4, 5, and 6 square units, what does this tell us about the tetrahedron? (In case you're wondering, (3, 4, 5, 6) is an Eulerian quadruple because 3^3 + 4^3 + 5^3 = 6^3.) Georgia guy (talk) 15:46, 24 September 2022 (UTC)
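The identity in the question can be checked, and small quadruples of the same kind found, by brute force. A minimal sketch; the bound and the function name `find_eulerian_quadruples` are illustrative choices, not established terminology:

```python
# Brute-force search for small "Eulerian quadruples" (a, b, c, d) with
# a^3 + b^3 + c^3 = d^3, the n = 3 case of the definition above.

def find_eulerian_quadruples(N):
    """Return all (a, b, c, d) with 0 < a <= b <= c < d <= N and a^3+b^3+c^3 = d^3."""
    cubes = {d ** 3: d for d in range(1, N + 1)}
    results = []
    for a in range(1, N + 1):
        for b in range(a, N + 1):
            for c in range(b, N + 1):
                s = a ** 3 + b ** 3 + c ** 3
                if s in cubes:
                    results.append((a, b, c, cubes[s]))
    return results

# (3, 4, 5, 6) appears, since 27 + 64 + 125 = 216.
print(find_eulerian_quadruples(12))
```

Note that scalar multiples such as (6, 8, 10, 12) also turn up, just as with Pythagorean triples.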


 * The areas of the 4 faces give us 4 equations to be satisfied. The shape of a tetrahedron is determined by the lengths of its 6 edges. Provided that each group of 3 edges that is to form one of the 4 faces satisfies the triangle inequality, these lengths can be chosen freely. So there are 6 unknowns to solve for, more than the number of equations. Therefore, given the areas of the 4 faces, there are in general many tetrahedra whose faces have these given areas. This has absolutely nothing whatsoever to do with any properties related to sums of powers of integers. --Lambiam 17:06, 24 September 2022 (UTC)
 * What I'm trying to do is see if there's a way to generalize the Pythagorean theorem to the general concept of n nth powers summing to an nth power. Georgia guy (talk) 17:11, 24 September 2022 (UTC)
 * The Pythagorean theorem is not about arbitrary 2-simplices, but specifically about those having a right angle. The first step when attempting a generalization to $n$-simplices should be to define a plausible generalization of "right angle" applicable to higher dimensions. --Lambiam 21:31, 24 September 2022 (UTC)
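The 4-equations-in-6-unknowns count above can be made concrete by writing the face areas as functions of the six edge lengths via Heron's formula. A sketch; the edge labelling and function names are illustrative choices:

```python
# Four face areas of a tetrahedron from its six edge lengths, via Heron's
# formula; this illustrates that the areas give only 4 equations in the
# 6 edge-length unknowns.

from math import sqrt

def heron(a, b, c):
    """Area of a triangle with side lengths a, b, c."""
    s = (a + b + c) / 2
    return sqrt(s * (s - a) * (s - b) * (s - c))

def face_areas(e01, e02, e03, e12, e13, e23):
    """Areas of faces (012), (013), (023), (123) of a tetrahedron with
    vertices 0..3, where eij is the length of the edge joining i and j."""
    return (heron(e01, e02, e12),
            heron(e01, e03, e13),
            heron(e02, e03, e23),
            heron(e12, e13, e23))

# Regular tetrahedron with unit edges: every face has area sqrt(3)/4.
print(face_areas(1, 1, 1, 1, 1, 1))
```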


 * Euler's sum of powers conjecture talks about sums of nth powers adding to an nth power. But afaik it doesn't really have anything to do with geometry or the Pythagorean theorem other than the form of the equation. As mentioned above, unlike triangles with SSS, the areas of the faces of a tetrahedron do not determine the congruence class or even the volume. If you define a right tetrahedron as one where all angles at one vertex are right, then the formula for the areas of the faces is A² + B² + C² = D²; all squares and not higher powers. Generalizing from the plane to higher dimensions is usually very difficult, not a matter of adding variables and increasing exponents. --RDBury (talk) 18:12, 25 September 2022 (UTC)
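The right-tetrahedron area relation A² + B² + C² = D² mentioned above can be checked numerically by placing the three mutually perpendicular legs a, b, c along the coordinate axes. A sketch; the specific legs (2, 3, 6) and the function name are illustrative:

```python
# Numerical check of A^2 + B^2 + C^2 = D^2 for a tetrahedron with three
# mutually perpendicular edges a, b, c meeting at one vertex.

from math import isclose, sqrt

def right_tetra_face_areas(a, b, c):
    """Areas of the three 'leg' faces and the 'hypotenuse' face of a
    tetrahedron with mutually perpendicular edges a, b, c at one vertex."""
    A, B, C = a * b / 2, b * c / 2, c * a / 2
    # The hypotenuse face has vertices (a,0,0), (0,b,0), (0,0,c); its area
    # is half the norm of the cross product (bc, ca, ab).
    D = sqrt((b * c) ** 2 + (c * a) ** 2 + (a * b) ** 2) / 2
    return A, B, C, D

A, B, C, D = right_tetra_face_areas(2, 3, 6)
print(isclose(A**2 + B**2 + C**2, D**2))  # True: the exponents stay at 2
```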

outliers
What is the largest set of datapoints that cannot contain any outliers? I want to say 2 but I don’t know if that’s right. Duomillia (talk) 19:01, 24 September 2022 (UTC)


 * To answer the question we need a usable definition of "outlier". As the book Statistics in a Nutshell points out, "There is no absolute agreement among statisticians about how to define outliers". A common definition is: a data point that is very different from the rest of the data. To apply this definition, a measure of this difference is needed, as well as a threshold separating "very" from "not very". If the data appears to follow a normal distribution, one can use its distance to the central tendency, divided by the square root of the variance. Somewhat arbitrarily, one can then set a threshold of, say, five sigma. If someone sees a normal distribution when looking at a five-point data set, my diagnosis is that they are suffering from statistical pareidolia. One needs a model for the distribution of "normal" points based either on a priori considerations, or because it is strongly suggested by the data. --Lambiam 21:24, 24 September 2022 (UTC)


 * Yes, but accepting that there is such a thing as an outlier means that a precise definition is not needed. Clearly, two points are equally favoured no matter how far apart, whereas it is possible to have two points any distance apart and a third much further from each. We don't need to worry about the precise meaning of "any" and "much" to see that an outlier is possible in this case, and by extension the case of any greater number of points. →2A00:23C6:AA0D:F501:D0B:AB28:1214:C2 (talk) 09:23, 25 September 2022 (UTC)
 * Taking a census of how many children my three nephews have, I find that two of my nephews have no children, whereas the third has two children. So I have the data "set" (a misnomer; really a multiset) {0, 0, 2}. The data point 2 is infinitely farther out from the other points than they are from each other. Is it justified to call that point an outlier? The point of classifying data points as outliers is to justify their exclusion when computing a statistic. It would be unreasonable to discard this data point when asked to give the average number of children. --Lambiam 10:45, 25 September 2022 (UTC)
 * Good example. In general, statistical inference doesn't really work with small sample sizes, whether you include "outliers" or not. A lot depends on context; real life data often contains measurement errors, transcription errors and errors from other sources, but you have to be familiar with the situation to be able to identify them. Without knowing specifics it's impossible to say whether an unusual value is better explained by statistical fluctuation or as the result of an error somewhere. For example, suppose your data consists of 12 values, 5, 6, 4, 7, 3, 5, 3, 0, 6, 4, 5, 7. Without knowing anything about the source, the 0 seems unusual but not unreasonable. But suppose this represents the number of crimes committed in a certain neighborhood by month, and you happen to know that these values come from reports that are prepared and submitted by another person. Now it starts to look like that 0 might be the result of that person going on vacation or forgetting to do the report instead of there actually being zero crime that month. When you're doing data analysis, you assume that the data has been scrubbed of errors and accurately represents reality. In my experience this is rarely the case though; the processes involved are complex and erroneous data creeps in, so the best you can do is weed out values that seem unreasonable, hence the concept of "outlier". But erroneous data can still look reasonable; for example, the last 7 could actually be incorrect, with the true value being 11. A value of 11 is certainly a possible value due to statistical fluctuation, or there could be some systemic reason that crime is more likely in December. But since the 7 "seems" more reasonable it would never be filtered out despite being far from the actual value. --RDBury (talk) 22:57, 25 September 2022 (UTC)
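The monthly-report example above can be put in numbers: the reported 0 sits well over two standard deviations below the mean, while the 7 looks unremarkable even if its true value were 11. A sketch using population standard deviation; the rounding and thresholds are illustrative:

```python
# z-scores for the monthly crime-report example: how far each value lies
# from the mean, in population standard deviations.

from statistics import mean, pstdev

reports = [5, 6, 4, 7, 3, 5, 3, 0, 6, 4, 5, 7]
m, s = mean(reports), pstdev(reports)

def z(x):
    """Number of population standard deviations x lies from the mean."""
    return (x - m) / s

print(round(z(0), 2))   # about -2.4: suspicious, would likely be filtered
print(round(z(7), 2))   # about 1.3: looks fine
print(round(z(11), 2))  # about 3.4: the hypothetical true value would stand out
```

This is exactly the trap the post describes: the plausible-looking 7 passes any reasonable filter even when it is the erroneous value.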