User:Patrick/random sequence

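For a random sequence of independent, uniformly distributed characters from an alphabet of u characters, almost surely the limit as n tends to infinity of (the number of occurrences of a given subsequence of length s among the first n characters) divided by n is u^(-s). Here a subsequence means a block of consecutive characters, and overlapping occurrences are all counted.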

Let t be the expected length of the sequence up to and including the first occurrence of a given subsequence of length s. Then almost surely the limit as n tends to infinity of (the number of occurrences of the given subsequence of length s) divided by n is 1/t if overlapping occurrences are not counted (we count the first, then the first one completely after that, etc.).
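Both limits can be checked empirically. The following is a minimal simulation sketch, not a proof: the helper name `count_occurrences`, the two-letter alphabet, the seed and the sample size are all choices made here for illustration. For u = 2 the overlapping frequency of either pattern should be close to 2^(-2) = 0.25; the non-overlapping frequency of "aa" should be close to 1/6 (its t is 6), while for "ab", which cannot overlap itself, both counts coincide.

```python
import random


def count_occurrences(text, pattern):
    """Return (overlapping count, non-overlapping count) of pattern in text.

    Non-overlapping counting is greedy: count the first occurrence, then the
    first occurrence that starts after the previous counted one has ended.
    """
    s = len(pattern)
    overlapping = 0
    non_overlapping = 0
    next_free = 0  # earliest start index allowed for the next counted match
    start = text.find(pattern)
    while start != -1:
        overlapping += 1
        if start >= next_free:
            non_overlapping += 1
            next_free = start + s
        start = text.find(pattern, start + 1)
    return overlapping, non_overlapping


rng = random.Random(0)  # fixed seed, an arbitrary choice for repeatability
alphabet = "ab"         # u = 2, assumed here to keep the run short
n = 200_000
text = "".join(rng.choice(alphabet) for _ in range(n))

for pattern in ("aa", "ab"):
    both, disjoint = count_occurrences(text, pattern)
    print(pattern, both / n, disjoint / n)
```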

The ratio of the two values, t u^(-s), is the expected number of occurrences (overlaps included), counted from a given occurrence up to but not including the first occurrence that lies completely after it.

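Let the given subsequence be periodic with smallest period p (not necessarily a whole number of cycles; p = s if it has no shorter period), and suppose that every shift at which it overlaps itself is a multiple of p. Then this ratio is $$1 + u^{-p} + u^{-2p} + \cdots + u^{-\lfloor (s-1)/p \rfloor p}$$. If the subsequence also overlaps itself at a shift q that is not a multiple of p, as "aabaa" does at q = 4 with p = 3, each such shift adds a further term u^(-q).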

Thus $$u^s \le t \le \tfrac{u}{u-1} (u^s - 1)$$.
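As a short check of these bounds: the ratio t u^(-s) is smallest when only its first term is present (p = s) and largest when every shift contributes (p = 1), so

$$1 \le t\,u^{-s} \le \sum_{j=0}^{s-1} u^{-j} = \frac{1 - u^{-s}}{1 - u^{-1}} = \frac{u}{u-1} \cdot \frac{u^s - 1}{u^s}.$$

Multiplying through by $$u^s$$ gives the inequality above.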

Examples for an alphabet of 10 letters (u = 10) and subsequences of length s = 6:
 * p=6 banana t = 1,000,000
 * p=5 bananb t = 1,000,010
 * p=4 banaba t = 1,000,100
 * p=3 banban t = 1,001,000
 * p=2 bababa t = 1,010,100
 * p=1 bbbbbb t = 1,111,110
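The values in this list can be reproduced with a few lines of Python. This is a minimal sketch that assumes the general self-overlap form of the formula: every shift q at which the word matches itself contributes u^(s-q) to t, which reduces to the periodic formula when all such shifts are multiples of p. The function name `expected_wait` is made up here for illustration.

```python
def expected_wait(pattern: str, u: int) -> int:
    """Expected length t of a uniform random sequence over u letters,
    up to and including the first occurrence of pattern.

    A shift q (0 <= q < s) counts when the pattern, moved q places to the
    right, agrees with itself on the overlap; each such shift adds u**(s-q).
    """
    s = len(pattern)
    return sum(
        u ** (s - q)
        for q in range(s)
        if pattern[q:] == pattern[: s - q]
    )


for word in ["banana", "bananb", "banaba", "banban", "bababa", "bbbbbb"]:
    print(word, f"{expected_wait(word, 10):,}")
```

Shift q = 0 always matches, which is why t is never smaller than u^s.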