Wikipedia:Reference desk/Archives/Mathematics/2015 December 28

= December 28 =

Probability of a lucky streak in a Poisson process
Given positive integers n, m and real $$T>0$$, suppose I have iid exponential variables (rate 1) $$X_1, X_2, \ldots, X_n$$. What is the probability p that there exists an integer index $$1\le i\le n-m$$ such that $$\sum_{j=i}^{i+m}X_j < T$$?

Intuition: The variables represent the times between successive events in a Poisson process. An index i as described represents the start of a lucky streak, where $$m+1$$ events happen within a span of time T (with T assumed smaller than the expected duration, $$m+1$$, of such a span). I want to find the probability of finding such a streak.

If it matters, in my application $$n=2(m+1)$$.

The probability v that a specific index starts a streak is easy to calculate using the Poisson or Erlang distribution. This gives a bound of $$2v \le p \le (m+2)v$$. But because of the correlations between the different indices, I can't figure out a way to calculate the exact probability.

If a symbolic expression can't be found, I will also appreciate methods to calculate it numerically or simulate it which improve upon the most naive simulation (my T is such that the event is fairly rare, so a naive simulation will take a lot of tries to reach sufficient precision). -- Meni Rosenfeld (talk) 00:34, 28 December 2015 (UTC)
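For concreteness, the "most naive simulation" mentioned above might look like the following Python sketch (the function name `streak_probability` and the default trial count are illustrative, not from the thread):

```python
import random

def streak_probability(n, m, T, trials=10000, seed=0):
    """Naive Monte Carlo estimate of p: the chance that some window of
    m+1 consecutive exp(1) inter-event times sums to less than T."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = [rng.expovariate(1.0) for _ in range(n)]
        # 1-based indices 1..n-m correspond to 0-based slices x[i:i+m+1]
        if any(sum(x[i:i + m + 1]) < T for i in range(n - m)):
            hits += 1
    return hits / trials
```

With the rare-event parameters quoted later in the thread (T=24, m around 48, n=2(m+1)), almost every trial misses, which is exactly the precision problem being described.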
 * For every i, the variable $$\sum_{j=i}^{i+m} X_j$$ has a gamma distribution. These distributions are identical, but only approximately independent. The condition that at least one of the sums is below the threshold means that not all the sums are above the threshold, and the latter probability is approximately equal to the product of the probabilities that each of the sums is above the threshold:
 * $$P\left(\exists i: \sum_{j=i}^{i+m} X_j < T\right) = 1-P\left(\forall i: \sum_{j=i}^{i+m} X_j \ge T\right) \simeq 1-\prod_{i=1}^{n-m} P\left(\sum_{j=i}^{i+m} X_j \ge T\right).$$
 * Bo Jacoby (talk) 23:07, 29 December 2015 (UTC).
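The approximation above can be evaluated without numerical integration, since for iid exp(1) variables $$P(\sum_{j=i}^{i+m} X_j \ge T) = P(\mathrm{Poisson}(T) \le m)$$ (the gamma-Poisson identity). A Python sketch (function names are illustrative):

```python
import math

def window_tail(m, T):
    """P(X_1 + ... + X_{m+1} >= T) for iid exp(1) gaps, computed
    as the Poisson CDF: sum_{k=0}^{m} T^k e^{-T} / k!."""
    term = math.exp(-T)
    total = 0.0
    for k in range(m + 1):
        total += term
        term *= T / (k + 1)
    return total

def approx_streak_probability(n, m, T):
    """Independence approximation: 1 minus the product over n-m windows."""
    return 1.0 - window_tail(m, T) ** (n - m)
```

The iterative `term *= T / (k + 1)` update avoids overflow from computing `T**k` and `k!` separately.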
 * Thanks. Since we're talking about events with low probability, multiplying the complementary probabilities gives more or less the same result as adding the individual probabilities. This gives rise to the $$p \le (m+2)v$$ bound. In my application the ratio between the approximate and real values is about 2, which is not good enough - I need a better approximation. -- Meni Rosenfeld (talk) 23:23, 29 December 2015 (UTC)
 * (I corrected my bounds above to $$\prod_{i=1}^{n-m}$$.) What is the use of better precision than what is computed by naive simulation? What are your values for m and T? Why not use the formula $$1-\prod_i(1-P_i)$$ rather than the approximate formula $$\sum_i P_i$$? Bo Jacoby (talk) 05:27, 30 December 2015 (UTC).
 * My T is 24 and m ranges between 48 and 53.
 * The formulas $$1-\prod_i(1-P_i)$$ and $$\sum_i P_i$$ are almost identical because the P's are very low ($$(1-a)(1-b) \approx 1-(a+b)$$ for $$a\approx 0, b\approx 0$$). They're both approximations: one assumes the events are independent and the other assumes they're mutually exclusive, which for low-probability events is almost the same thing. And with my numbers they're both off by a factor of ~2.
 * Because the probabilities are low, a naive simulation will require many particles to obtain even one hit, and many more to get low variance on the estimate. The estimator relative variance is $$\frac{1}{pn}$$ so for low p the variance is huge unless n is huge.
 * And I need better precision because... I need better precision. I'm trying to find the correct answer to the question.
 * I've tried importance sampling but that also didn't give satisfactory results. There's another thing I'm trying out now. -- Meni Rosenfeld (talk) 11:30, 30 December 2015 (UTC)
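For reference, one standard importance-sampling scheme for this kind of rare event is exponential tilting: draw the gaps from Exp(λ) with λ > 1 so that streaks become common, then reweight each hit by the likelihood ratio. A hedged sketch (the choice λ = 1.5 and the function name are illustrative; this is one possible scheme, not necessarily the one tried in the thread):

```python
import math
import random

def tilted_estimate(n, m, T, lam=1.5, trials=20000, seed=1):
    """Importance sampling with exponential tilting: sample gaps from
    Exp(lam) with lam > 1 and weight each hit by the likelihood ratio
    prod_j exp(-x_j) / (lam * exp(-lam * x_j))."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(trials):
        x = [rng.expovariate(lam) for _ in range(n)]
        if any(sum(x[i:i + m + 1]) < T for i in range(n - m)):
            # log of the true density over the sampling density
            log_w = (lam - 1.0) * sum(x) - n * math.log(lam)
            acc += math.exp(log_w)
    return acc / trials
```

The estimator is unbiased for any λ, but its variance depends strongly on the tilt; λ should be tuned so that the streak event is no longer rare under the sampling distribution.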
 * When evaluating the gamma distribution you may benefit from the nice formula
 * $$\int_y^\infty {x^i\over e^x i!}dx=\sum_{k=0}^i {y^k\over e^y k!}$$
 * Bo Jacoby (talk) 18:38, 30 December 2015 (UTC).
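The identity can be sanity-checked numerically: the left side is a gamma-distribution tail, the right side a Poisson CDF. A quick stdlib-only check using crude trapezoidal quadrature (function names are illustrative):

```python
import math

def poisson_side(i, y):
    """Right-hand side: sum_{k=0}^{i} y^k e^{-y} / k!."""
    term = math.exp(-y)
    total = 0.0
    for k in range(i + 1):
        total += term
        term *= y / (k + 1)
    return total

def integral_side(i, y, upper=60.0, steps=100000):
    """Left-hand side by trapezoidal integration of x^i e^{-x} / i!
    from y up to a cutoff where the remaining tail is negligible."""
    fact = math.factorial(i)
    f = lambda x: x ** i * math.exp(-x) / fact
    h = (upper - y) / steps
    s = 0.5 * (f(y) + f(upper))
    for j in range(1, steps):
        s += f(y + j * h)
    return s * h
```

The two sides agree to within the quadrature error, confirming the formula for small test values.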
 * If you make n experiments having success probability p, then the unknown number k of successes has a binomial distribution, $$k \simeq \mu\pm\sigma$$ where $$\mu=np$$ and $$\sigma^2\mu^{-1}=1-p$$. But note that if you make n experiments giving k successes, then the unknown success probability p has a beta distribution, $$p \simeq \mu\pm\sigma$$ where $$\mu=(k+1)(n+2)^{-1}$$ and $$\sigma^2\mu^{-1}=(n-k+1)(n+2)^{-1}(n+3)^{-1}$$.
 * Bo Jacoby (talk) 07:50, 31 December 2015 (UTC).
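In code, the binomial/beta point reads as follows (uniform prior, so that p ~ Beta(k+1, n−k+1); the function name is illustrative):

```python
def beta_posterior(n, k):
    """Mean and standard deviation of the success probability p after
    k successes in n trials, assuming a uniform prior on p, so that
    p ~ Beta(k + 1, n - k + 1)."""
    mu = (k + 1) / (n + 2)
    var = mu * (n - k + 1) / ((n + 2) * (n + 3))
    return mu, var ** 0.5
```

For instance, zero trials return mean 1/2 and variance 1/12, i.e. the uniform prior itself.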

Answer
The probability that $$\sum_{i=1}^m X_i>T$$ is $$\int_T^\infty {x^{m-1}\over e^x (m-1)!}dx=\sum_{k=0}^{m-1} {T^k\over e^T k!}$$.

Treating the $$n-m$$ window sums as independent, the requested probability is approximately $$p=1-\left(\sum_{k=0}^{m-1} {T^k\over e^T k!}\right)^{n-m}$$.

For T=24, m=48, and n=2(m+1)=98 we have $$p=1-\left(\sum_{k=0}^{47} {24^k\over e^{24} k!}\right)^{50}= 0.000521288$$
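The quoted number can be reproduced in a few lines of Python (a direct transcription of the formula above):

```python
import math

T, m, n = 24.0, 48, 98
term = math.exp(-T)
s = 0.0
for k in range(m):           # sum_{k=0}^{m-1} T^k e^{-T} / k!
    s += term
    term *= T / (k + 1)
p = 1.0 - s ** (n - m)       # n - m = 50 overlapping windows
print(p)                     # p is about 0.000521288
```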

Do you agree?

Bo Jacoby (talk) 15:24, 1 January 2016 (UTC).

The stochastic variable $$Y=\sum_{i=1}^m X_i=\mu+\sigma Z$$ has mean value $$\mu=m$$ and standard deviation $$\sigma=\sqrt{m}$$. For T=24 and m=48 we have $$(T-\mu)\sigma^{-1}=-2\sqrt{3}$$.

Assuming that Z has approximately a standard normal distribution, $$p_1=P(Y<T)\approx\Phi(-2\sqrt{3})$$ can be computed and compared with the exact single-window probability $$p_2$$ (J session):

   n01cdf=. -:@>:@erf@%&(%:2)   NB. cumulative standard normal distribution
   0j15":p1=.n01cdf -2*%:3
0.000266002752570
   -.50^~-.p1
0.0132138
   50*p1
0.0133001
   0j15":p2=.-.(+/(T^k)%!k=.i.48)*^-T=.24
0.000010428432773
   -.50^~-.p2
0.000521288

The normal approximation gives $$1-(1-p_1)^{50}\approx 0.0132$$, far above the exact $$1-(1-p_2)^{50}=0.000521288$$, so the normal approximation is unreliable this far out in the tail. Bo Jacoby (talk) 14:41, 2 January 2016 (UTC).
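The same comparison can be reproduced with the Python standard library's `math.erf` (a sketch of the check, not part of the original computation):

```python
import math

def std_normal_cdf(z):
    """Phi(z) via the error function: (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# normal approximation to the single-window probability
p1 = std_normal_cdf(-2.0 * math.sqrt(3.0))

# exact single-window probability: 1 - sum_{k=0}^{47} 24^k e^{-24} / k!
T = 24.0
term = math.exp(-T)
s = 0.0
for k in range(48):
    s += term
    term *= T / (k + 1)
p2 = 1.0 - s

approx = 1.0 - (1.0 - p1) ** 50   # about 0.0132
exact = 1.0 - (1.0 - p2) ** 50    # about 0.000521288
```

The normal approximation overstates the tail by a factor of roughly 25, which is why the exact Poisson sum is used for the final answer.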

Mathematical algorithm
(Removed duplicate question. See WP:RDS. -- ToE 13:39, 28 December 2015 (UTC))