Wikipedia:Reference desk/Archives/Mathematics/2015 June 15

= June 15 =

== Representing a big data set by a small data set having the same cumulants? ==
Consider a data set $$X = (X_1, X_2, \ldots, X_I)$$ having mean value μ and standard deviation σ.

The one-element data set (μ) has the same mean value as the big data set X.

The two-element data set (μ-σ, μ+σ) has the same mean value and the same standard deviation as the big data set X.

I want to generalize this.

What is the three-element data set (A, B, C) having the same mean value and the same standard deviation and the same skewness as X?

What is the four-element data set (A, B, C, D) having the same mean value and the same standard deviation and the same skewness and the same kurtosis as X?

and so on.

Bo Jacoby (talk) 22:05, 15 June 2015 (UTC).
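
The two-element case in the question can be checked numerically. The following is a minimal Python sketch, assuming σ means the population standard deviation (dividing by the number of elements, not by one less); the values (1, 2, 2, 2, 3) are purely illustrative.

```python
from statistics import fmean, pstdev

data = [1, 2, 2, 2, 3]          # illustrative big data set
mu = fmean(data)                # mean value
sigma = pstdev(data)            # population standard deviation (assumption)

small = [mu - sigma, mu + sigma]

# The two-element set reproduces both statistics of the big set.
print(small)
print(fmean(small), pstdev(small))
```

With the sample standard deviation (dividing by one less than the number of elements) the two-element set would generally not match, so the convention matters.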


 * I think the way to proceed is this: first restate the problem in terms of moments. So you want A, B, C, ... so that $$A+B+C$$, $$A^2+B^2+C^2$$, $$A^3+B^3+C^3$$, ... have given values. These are power sums, and you can use Newton's identities to convert them into elementary symmetric polynomials. Using these as coefficients, write down a polynomial. The roots of this polynomial are then the values A, B, C, ... that you want. For example, for two elements you want A, B so that $$P_1 = A+B = (2/n)\sum_i X_i$$ and $$P_2 = A^2+B^2 = (2/n)\sum_i X_i^2$$. Then let $$E_1 = P_1$$ and $$E_2 = (E_1 P_1 - P_2)/2$$. The values A, B are now the roots of $$X^2 - E_1 X + E_2 = 0$$. You have to solve an equation with degree equal to the size of the set. I think this is analogous to Chebyshev Quadrature but for arbitrary moments. (This is like Gaussian quadrature but with equal weights. Not sure if we cover this, but see .) With Chebyshev Quadrature you start to get complex roots for large n, so the same thing will probably happen here as well. I'm more (but still not very) familiar with the Gaussian type Quadrature because it applies the theory of orthogonal polynomials; in that case you're guaranteed to get real roots and you get more moments with the same number of data points, you just have to allow arbitrary weights. Not sure if there is a similar theory for Chebyshev. (There are Chebyshev polynomials but those are different afaik.)--RDBury (talk) 06:36, 16 June 2015 (UTC)
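
 * As a worked illustration of the recipe above for three elements (the data set (1, 2, 2, 2, 3), with n = 5, is chosen only as an example), the power sums scaled to three elements are

 * $$P_k = \frac{3}{n}\sum_i X_i^k: \quad P_1 = 6,\quad P_2 = 13.2,\quad P_3 = 31.2$$

 * Newton's identities then give the elementary symmetric polynomials

 * $$E_1 = P_1 = 6,\quad E_2 = \frac{E_1 P_1 - P_2}{2} = 11.4,\quad E_3 = \frac{E_2 P_1 - E_1 P_2 + P_3}{3} = 6.8$$

 * and the three values are the roots of

 * $$X^3 - E_1 X^2 + E_2 X - E_3 = (X-2)(X^2 - 4X + 3.4) = 0$$

 * that is, $$X = 2$$ and $$X = 2 \pm \sqrt{0.6} \approx 1.2254,\ 2.7746$$.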


 * You can also consider the polynomial


 * $$p(x) = \prod_j (1-x X_j)$$


 * where the $$X_j$$ are the unknown data elements of the small data set (denoted by A, B, C, etc. by Bo above). The series expansion of the logarithm is then given by:


 * $$\log\left[p(x)\right] = -\sum_{k=1}^{\infty}\frac{M_k}{k}x^k$$

where


 * $$M_k = \sum_j X_j^k$$

are the moments that are known. So you can directly write down the logarithm of the polynomial using the known moments; exponentiation is easy using most computer algebra systems (I'm sure Bo can write a compact J program for this :) ), and then the $$X_j$$ can be extracted from the zeros of $$p(x)$$, as their reciprocals (and I think there is a simple J routine for that too). So I wouldn't be surprised if Bo can come up with a one-line J program that will do the job. Count Iblis (talk) 15:17, 16 June 2015 (UTC)
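
The exponentiation step above can be sketched in Python without a computer algebra system. This is only an illustration under stated assumptions: the function name `exp_series_coeffs` is hypothetical, the moments are scaled to the size m of the small set, and the coefficients come from the standard recurrence for the exponential of a power series (if g = exp(f), then g' = f'g), which here reads $$n g_n = -\sum_{k=1}^{n} M_k g_{n-k}$$.

```python
def exp_series_coeffs(moments):
    """Coefficients g_0..g_m of p(x) = exp(-sum_k M_k x^k / k), truncated at x^m.

    moments[k-1] holds M_k, the k-th power sum of the unknown small set.
    """
    m = len(moments)
    g = [1.0]  # g_0 = p(0) = 1
    for n in range(1, m + 1):
        # Recurrence from g' = f' g with f = -sum_k M_k x^k / k
        g.append(-sum(moments[k - 1] * g[n - k] for k in range(1, n + 1)) / n)
    return g


data = [1, 2, 2, 2, 3]   # illustrative big data set
m = 3                    # size of the small set
# M_k = (m/n) * sum of k-th powers, so that the small set matches the mean powers
moments = [m * sum(x ** k for x in data) / len(data) for k in range(1, m + 1)]

g = exp_series_coeffs(moments)
print(g)  # coefficients of p(x) = g_0 + g_1 x + ... + g_m x^m

# p vanishes at x = 1/X_j; for this data X_j = 2 is one of the values
p_at_half = sum(c * 0.5 ** k for k, c in enumerate(g))
print(p_at_half)
```

The coefficients are the elementary symmetric polynomials with alternating signs, so this is the same computation as Newton's identities, reached through the logarithm.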


 * Thanks gentlemen! I think I am on track now. Bo Jacoby (talk) 07:04, 17 June 2015 (UTC).

This is a one line J program implementing RDBury's method for two elements. (Oops: the double apostrophes around p q are changed to italics by the WP editor!)

    simplify=. 3 : '|.>{:p.(-:q-*:p),p,_1[''p q''=.2*}.(%{.)+/y^/i.3'
    simplify 1 2 2 2 3
 1.36754 2.63246
    simplify simplify 1 2 2 2 3
 1.36754 2.63246

Bo Jacoby (talk) 10:38, 17 June 2015 (UTC).
 * I took the liberty of fixing your apostrophe issue. -- Meni Rosenfeld (talk) 23:00, 17 June 2015 (UTC)
 * Thank you Meni! Bo Jacoby (talk) 04:33, 18 June 2015 (UTC).

This 6-liner implements RDBury's method for computing three elements and four elements etc. As predicted the roots are sometimes complex.

 simplify=. 4 : 0
 y=.x*}.(%{.)+/y^/i.>:x
 x=.1
 for.y do.x=.((-/x*(#x){.y)%#x),x end.
 -|.>{:p.x
 )

Examples:

    1 simplify 1 2 2 2 3
 2
    2 simplify 1 2 2 2 3
 1.36754 2.63246
    3 simplify 1 2 2 2 3
 1.2254 2 2.7746
    4 simplify 1 2 2 2 3
 1.05666 2j0.29983 2j_0.29983 2.94334
    5 simplify 1 2 2 2 3
 1 2 2 2 3
    10 simplify 1 2 2 2 3
 1 1 2 2 2 2 2 2 3 3

Thank you everybody. The problem is solved. -- Bo Jacoby (talk) 20:37, 18 June 2015 (UTC).
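
For readers who do not use J, the general routine can be restated in Python roughly as follows. This is a sketch, not a translation of the J code line by line: the names `small_set` and `durand_kerner` are hypothetical, and a plain Durand-Kerner iteration is swapped in for J's built-in root finder `p.`, which is an assumption that works for polynomials with distinct roots but converges slowly when roots repeat.

```python
def durand_kerner(coeffs, iterations=200):
    """Approximate all roots of a monic polynomial (highest power first).

    Swapped-in root finder; adequate when the roots are distinct.
    """
    degree = len(coeffs) - 1
    roots = [(0.4 + 0.9j) ** k for k in range(degree)]  # customary start values
    for _ in range(iterations):
        updated = []
        for i, r in enumerate(roots):
            value = 0j
            for c in coeffs:            # Horner evaluation of the polynomial at r
                value = value * r + c
            denom = 1 + 0j
            for j, s in enumerate(roots):
                if j != i:
                    denom *= r - s
            updated.append(r - value / denom)
        roots = updated
    return roots


def small_set(data, m):
    """Return m (possibly complex) values whose first m mean powers match data."""
    n = len(data)
    # Target power sums P_k = (m/n) * sum(x**k) for k = 1..m
    p = [m * sum(x ** k for x in data) / n for k in range(1, m + 1)]
    # Newton's identities: k * E_k = sum_{i=1..k} (-1)**(i-1) * E_{k-i} * P_i
    e = [1.0]
    for k in range(1, m + 1):
        e.append(sum((-1) ** (i - 1) * e[k - i] * p[i - 1]
                     for i in range(1, k + 1)) / k)
    # Monic polynomial x^m - E_1 x^(m-1) + E_2 x^(m-2) - ...
    coeffs = [(-1) ** k * e[k] for k in range(m + 1)]
    return durand_kerner(coeffs)


data = [1, 2, 2, 2, 3]
for m in (2, 3, 4):
    values = sorted(small_set(data, m), key=lambda z: (z.real, z.imag))
    print(m, [complex(round(z.real, 5), round(z.imag, 5)) for z in values])
```

For m = 2, 3 and 4 the printed values should agree with the J output above up to rounding and ordering, including the complex pair for m = 4. Cases with repeated roots, such as m = 5 for this data, would need a more robust root finder than this sketch.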