Wikipedia:Reference desk/Archives/Mathematics/2023 January 5

= January 5 =

== Random numbers ==
If you pick n random reals from the set x through y the smallest real will be a certain distance from the second smallest, the second smallest will be a certain distance from the third smallest and so on, n minus 1 of these gaps in all. How often is the skinniest of the n-1 gaps narrower than g? And how often is the most gaping of the n-1 gaps wider than G? Sagittarian Milky Way (talk) 00:31, 5 January 2023 (UTC)
 * One observation: assuming that $$x<y,$$ the distribution of the gap sizes depends solely on the width $$y{-}x$$ of the interval $$[x,y].$$ Given the result for the unit interval $$[0,1],$$ the result for $$[x,y]$$ is obtained by replacing $$g$$ by $$g/(y{-}x).$$
 * Let $$r_1,r_2,...,r_n$$ be the sequence obtained by sorting the outcomes of $$n$$ independent and identically distributed random variables drawn from the uniform distribution on the unit interval. As I understand the question, it is about the distribution of
 * $$\textstyle{\min_{\,i=1}^{n{-}1}(r_{i{+}1}{-}r_i)},$$
 * and the same with $$\min$$ replaced by $$\max.$$
 * A related question that has been studied is when the endpoints $$0$$ and $$1$$ are included, setting $$r_0=0,r_{n{+}1}=1,$$ and inquiring about
 * $$\textstyle{\min_{\,i=0}^{n}(r_{i{+}1}-r_i)}.$$
 * This problem is commonly referred to as "interval splitting". I suppose (but have not verified) that the techniques used can also be applied to the problem when the endpoints are not included. Having taken a glimpse of some of the papers, I think this will not be a simple exercise. --Lambiam 19:53, 5 January 2023 (UTC)
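To make the two variants concrete, here is a minimal Python sketch (standard library only). The helper names `sorted_sample` and `gaps` are made up for illustration and do not come from any of the papers. It computes the inner gaps (endpoints excluded) and the "interval splitting" gaps (endpoints included), and checks the rescaling observation by drawing the same sample on $$[0,1]$$ and on a hypothetical interval $$[3,8].$$

```python
import random

def sorted_sample(n, lo=0.0, hi=1.0, seed=None):
    """n independent uniform draws from [lo, hi], sorted ascending."""
    rng = random.Random(seed)
    return sorted(lo + (hi - lo) * rng.random() for _ in range(n))

def gaps(points, endpoints=None):
    """Consecutive differences of sorted points.

    If endpoints=(lo, hi) is given, lo and hi are added as extra points,
    which is the "interval splitting" variant.
    """
    if endpoints is not None:
        lo, hi = endpoints
        points = [lo] + list(points) + [hi]
    return [b - a for a, b in zip(points, points[1:])]

# Same seed, so `wide` is just `unit` mapped affinely onto [3, 8].
unit = sorted_sample(10, seed=1)
wide = sorted_sample(10, lo=3.0, hi=8.0, seed=1)

inner_unit = gaps(unit)                         # 9 gaps, endpoints excluded
inner_wide = gaps(wide)                         # the same gaps, 5 times wider
split_unit = gaps(unit, endpoints=(0.0, 1.0))   # 11 gaps, endpoints included

print("narrowest / widest inner gap on [0,1]:", min(inner_unit), max(inner_unit))
print("narrowest inner gap on [3,8] divided by 5:", min(inner_wide) / 5.0)
print("narrowest gap with endpoints included:", min(split_unit))
```

Including the endpoints only adds two gaps, so the narrowest gap can only shrink or stay the same.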
 * Does anyone know the median or average result from splitting the unit interval into a hundred, a thousand, 10,000, 100,000 or a million pieces? Or any of the x-sigma ranges, i.e. if you split the unit interval into 1,000 pieces with 999 random numbers (or 1,001 pieces with 1,000 random numbers), there's a 2-sigma/~2.28% chance the smallest gap is below a and a ~2.28% chance it is above b, leaving a ~95.44% chance it's between a and b. Sagittarian Milky Way (talk) 21:08, 5 January 2023 (UTC)
 * If I'm not mistaken, the cumulative distribution function of the shortest gap, as $$n$$ gets large, will approximate $$1-\exp(-n^2x)$$ for $$x\ll 1.$$ That puts the mean at about $$1/n^2$$ and the median at about $$\log(2)/n^2.$$ --Lambiam 22:43, 5 January 2023 (UTC)
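The approximation can be checked with a quick Monte Carlo run. This is a rough sketch, assuming the endpoints $$0$$ and $$1$$ are included (so $$n$$ points give $$n{+}1$$ gaps). The helper names, the choice $$n=50$$ and the number of trials are arbitrary. Inverting the approximate distribution function gives the quantile $$-\log(1-p)/n^2$$ for probability $$p,$$ which `approx_quantile` evaluates; it is only as good as the approximation itself.

```python
import math
import random
import statistics

def smallest_gap(n, rng):
    """Smallest of the n+1 gaps left by n uniform points on [0, 1] (endpoints included)."""
    pts = [0.0] + sorted(rng.random() for _ in range(n)) + [1.0]
    return min(b - a for a, b in zip(pts, pts[1:]))

def summarize(n, trials=20000, seed=0):
    """Empirical mean and median of the smallest gap over many trials."""
    rng = random.Random(seed)
    samples = [smallest_gap(n, rng) for _ in range(trials)]
    return statistics.fmean(samples), statistics.median(samples)

def approx_quantile(p, n):
    """Quantile of the approximate CDF 1 - exp(-n^2 x); an approximation, not exact."""
    return -math.log(1.0 - p) / n ** 2

n = 50
mean, median = summarize(n)
print("mean * n^2   =", mean * n ** 2, "(approximation predicts about 1)")
print("median * n^2 =", median * n ** 2, "(approximation predicts about", math.log(2), ")")
print("approximate 2.28% and 97.72% quantiles:",
      approx_quantile(0.0228, n), approx_quantile(0.9772, n))
```

For moderate $$n$$ the scaled values come out slightly below the predictions, as expected from an approximation that only holds for large $$n.$$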
 * This actually has an application to Wikipedia for when you pick a "Random article". Each article is assigned a random number, and when you press the Random article link another random number is selected and you're taken to the article with the next highest assigned number. Because the assigned numbers are random rather than evenly spaced, some articles have a much higher chance of being selected than others. I don't know if they do anything to ameliorate that issue; one way would be to periodically update each article's number. See FAQ/Technical for details. --RDBury (talk) 00:24, 6 January 2023 (UTC)
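A toy model of this selection scheme shows where the unevenness comes from. It is not the actual MediaWiki code: the wrap-around to the first article when the draw exceeds every assigned number is an assumption, and the function names are invented for the example. In this model the chance of landing on an article equals the length of the gap below its number.

```python
import bisect
import random

def selection_probabilities(keys):
    """Chance that a uniform draw on [0, 1) selects each key (keys sorted ascending).

    An article is selected when the draw falls in the gap just below its key.
    Draws above the largest key wrap around to the first article (assumption).
    """
    probs = [keys[0] + (1.0 - keys[-1])]
    probs += [b - a for a, b in zip(keys, keys[1:])]
    return probs

def pick(keys, draw):
    """Index of the first key >= draw, wrapping to index 0 past the end (assumption)."""
    return bisect.bisect_left(keys, draw) % len(keys)

rng = random.Random(42)
keys = sorted(rng.random() for _ in range(1000))   # 1000 hypothetical articles
probs = selection_probabilities(keys)
print("most likely / least likely article:", max(probs) / min(probs))
```

With evenly spaced numbers every article would have probability 1/1000; with random numbers the ratio printed above is typically very large, because the smallest gap is of order $$1/n^2$$ while the largest is much wider than $$1/n.$$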
 * If you sort $$n-1$$ uniformly distributed random numbers and introduce $$r_0=0$$ and $$r_n=1,$$ you obtain the sorted sequence $$r_0, r_1,...,r_{n-1},r_n$$ with the $$n$$ gaps $$g_i=r_i-r_{i-1}.$$ The expected values, variances and covariances are
 * $$E(r_i)=\tfrac{i}{n},$$
 * $$V(r_i)=\tfrac{i(n-i)}{n^2(n+1)}$$ (found by simulation),
 * $$C(r_i,r_j)=\tfrac{i(n-j)}{n^2(n+1)}$$ for $$i<j$$ (found by simulation),
 * $$E(g_i)=\tfrac{1}{n},$$
 * $$V(g_i)=\tfrac{n-1}{n^2(n+1)} \xrightarrow{\text{large} \, n} \tfrac{1}{n^2},$$
 * $$C(g_i,g_j)=\tfrac{-1}{n^2(n+1)}.$$
 * For large $$n$$ each $$g_i$$ seems to be nearly exponentially distributed with the cumulative distribution function $$f(x)=1-e^{-nx}.$$ Then the probability that a $$g_i$$ is greater than $$x$$ is $$f(\infty)-f(x)=e^{-nx}.$$ Ignoring the correlations, the probability that all $$g_i$$ are greater than $$x$$ is the $$n$$-th power $$(e^{-nx})^n = e^{-n^2x},$$ which leads to the cumulative distribution function for the smallest gap given above by Lambiam.
 * But the probability that all $$g_i$$ are lower than $$x$$ cannot be calculated as $$(f(x))^n.$$ This value is greater than 0 for $$x>0.$$ But for $$x < \tfrac{1}{n}$$ the probability must be 0, because there must be at least one $$g_i \ge \tfrac{1}{n}.$$ Ignoring the correlations does not seem to work in this case. .gs8 (talk) 15:16, 7 January 2023 (UTC)
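The gap moments listed above are easy to re-check by simulation. The following is only a sketch with arbitrary choices ($$n=5,$$ 50,000 trials, the first two gaps as representatives) and made-up helper names. It also records the smallest value the largest gap ever takes, which should never drop below $$\tfrac{1}{n}.$$

```python
import random

def gap_sample(n, rng):
    """The n gaps left by n-1 uniform points on [0, 1], with r_0 = 0 and r_n = 1."""
    pts = [0.0] + sorted(rng.random() for _ in range(n - 1)) + [1.0]
    return [b - a for a, b in zip(pts, pts[1:])]

def gap_moments(n, trials=50000, seed=0):
    """Empirical mean and variance of g_1, covariance of g_1 and g_2,
    and the smallest observed value of the largest gap."""
    rng = random.Random(seed)
    s_a = s_b = s_aa = s_ab = 0.0
    smallest_max = 1.0
    for _ in range(trials):
        g = gap_sample(n, rng)
        a, b = g[0], g[1]
        s_a += a
        s_b += b
        s_aa += a * a
        s_ab += a * b
        smallest_max = min(smallest_max, max(g))
    mean_a = s_a / trials
    mean_b = s_b / trials
    return {
        "mean": mean_a,
        "var": s_aa / trials - mean_a ** 2,
        "cov": s_ab / trials - mean_a * mean_b,
        "smallest_max": smallest_max,
    }

n = 5
m = gap_moments(n)
print("E(g_1)   :", m["mean"], "formula:", 1 / n)
print("V(g_1)   :", m["var"], "formula:", (n - 1) / (n ** 2 * (n + 1)))
print("C(g_1,g_2):", m["cov"], "formula:", -1 / (n ** 2 * (n + 1)))
print("smallest largest gap seen:", m["smallest_max"], "lower bound:", 1 / n)
```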
 * The exact form of the distribution function for the largest gap is a piecewise polynomial function. This can be seen by considering the standard $$n$$-simplex whose barycentric coordinates correspond to the gap lengths. The random point whose coordinates are those of a random sequence of $$n$$ gaps $$(g_i)_i$$ has a uniform distribution over the simplex. The $$n!$$ different orderings by size of these gaps induce as many partitions (each a non-standard $$n$$-simplex). In each partition, one coordinate dominates throughout. By symmetry, for each coordinate, the partitions in which it dominates (the number of which equals $$(n{-}1)!$$) have the same shape, so it suffices to consider one. The value of the dominating coordinate is a linear function of the Cartesian coordinates, and an appropriate similarity-preserving transformation will make it equal to say the $$x$$-coordinate. Then, the indefinite integral of the hypervolume of the $$(n{-}1)$$-dimensional cross-section of the partition (after the transformation) with respect to $$x$$, scaled in such a way that the value of the integral ranges from 0 to 1, gives the cumulative distribution function. Between vertices, the hypervolume of a cross-section varies polynomially, and then so does the integral. This does not immediately open a path to the asymptotics, but perhaps it is a step on the way. --Lambiam 22:20, 8 January 2023 (UTC)
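For comparison, the piecewise polynomial can be written down explicitly by a different route than the geometric argument above, namely the classical inclusion-exclusion formula for $$n$$ gaps (from $$n{-}1$$ points, endpoints included): $$P(\max_i g_i \le x)=\textstyle\sum_{k\ge 0}(-1)^k\binom{n}{k}(1-kx)_+^{\,n-1},$$ where $$(\cdot)_+$$ denotes the positive part. The Python sketch below evaluates it and compares it with a simulation. The function names and the test values $$n=5,$$ $$x=0.4$$ are arbitrary choices for illustration.

```python
import math
import random

def largest_gap_cdf(n, x):
    """P(largest of the n gaps <= x) by inclusion-exclusion.

    Terms with 1 - k*x <= 0 vanish, so the polynomial changes at x = 1/k,
    which is what makes the function piecewise polynomial.
    """
    total = 0.0
    for k in range(n + 1):
        rest = 1.0 - k * x
        if rest <= 0.0:
            break
        total += (-1) ** k * math.comb(n, k) * rest ** (n - 1)
    return total

def simulated_cdf(n, x, trials=20000, seed=0):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        pts = [0.0] + sorted(rng.random() for _ in range(n - 1)) + [1.0]
        if max(b - a for a, b in zip(pts, pts[1:])) <= x:
            hits += 1
    return hits / trials

n, x = 5, 0.4
print("formula   :", largest_gap_cdf(n, x))
print("simulation:", simulated_cdf(n, x))
print("below 1/n :", largest_gap_cdf(n, 0.19))
```

The last line illustrates the point made earlier in the thread: below $$\tfrac{1}{n}$$ the terms cancel and the probability is 0 up to rounding.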