Sieve of Sundaram

In mathematics, the sieve of Sundaram is a variant of the sieve of Eratosthenes, a simple deterministic algorithm for finding all the prime numbers up to a specified integer. It was discovered by Indian student S. P. Sundaram in 1934.

Algorithm
Start with a list of the integers from 1 to n. From this list, remove all numbers of the form $i + j + 2ij$ where:
 * $$i,j\in\mathbb{N},\ 1 \le i \le j$$
 * $$i + j + 2ij \le n$$

The remaining numbers are doubled and incremented by one, giving a list of the odd prime numbers (i.e., all primes except 2) below $2n + 2$.
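As an illustration, the steps above can be transcribed almost literally into Python. This is a minimal, unoptimized sketch; the function name `sundaram_primes` and the list name `removed` are illustrative choices, not part of the original description.

```python
def sundaram_primes(n):
    """Return the odd primes below 2n + 2, following the description literally."""
    # removed[k] becomes True when k has the form i + j + 2ij with 1 <= i <= j.
    removed = [False] * (n + 1)
    for i in range(1, n + 1):
        j = i
        while i + j + 2 * i * j <= n:
            removed[i + j + 2 * i * j] = True
            j += 1
    # Double and increment every number from 1 to n that was not removed.
    return [2 * k + 1 for k in range(1, n + 1) if not removed[k]]


print(sundaram_primes(10))
```

For n = 10 the numbers 4, 7 and 10 are removed, so the call yields the odd primes below 22: 3, 5, 7, 11, 13, 17 and 19.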

The sieve of Sundaram sieves out the composite numbers just as the sieve of Eratosthenes does, but even numbers are not considered; the work of "crossing out" the multiples of 2 is done by the final double-and-increment step. Whenever Eratosthenes' method would cross out $k$ different multiples of a prime $2i+1$, Sundaram's method crosses out $i + j(2i+1)$ for $1 \le j \le \lfloor k/2 \rfloor$.
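This correspondence can be checked on a small case. The sketch below uses hypothetical helper names: it lists the indices $i + j(2i+1)$ crossed out for one base value and, separately, the odd multiples of $2i+1$ from $3(2i+1)$ upward. Doubling and incrementing the former should reproduce the latter.

```python
def crossed_out_indices(i, n):
    """Indices i + j(2i + 1) <= n for j = 1, 2, ..."""
    step = 2 * i + 1
    return list(range(i + step, n + 1, step))


def odd_multiples(i, n):
    """Odd multiples m(2i + 1) with m >= 3 that do not exceed 2n + 1."""
    p = 2 * i + 1
    return list(range(3 * p, 2 * n + 2, 2 * p))


# Base value 3 (i = 1) with n = 10: indices 4, 7, 10 stand for 9, 15, 21.
indices = crossed_out_indices(1, 10)
print(indices, [2 * k + 1 for k in indices], odd_multiples(1, 10))
```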

Correctness
If we start with integers from $1$ to $n$, the final list contains only odd integers from $3$ to $2n + 1$. From this final list, some odd integers have been excluded; we must show these are precisely the composite odd integers less than $2n + 2$.

Let $q$ be an odd integer of the form $2k + 1$. Then, $q$ is excluded if and only if $k$ is of the form $i + j + 2ij$, that is $q = 2(i + j + 2ij) + 1$. Then we have:
 * $$\begin{align} q &= 2(i + j + 2ij) + 1 \\ &= 2i + 2j + 4ij + 1 \\ &= (2i + 1)(2j + 1). \end{align}$$

So an odd integer is excluded from the final list if and only if it has a factorization of the form $(2i + 1)(2j + 1)$ with $i, j \ge 1$, which is to say, if it has a non-trivial odd factor. Therefore the list consists of exactly the odd prime numbers less than $2n + 2$.
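The argument can also be checked by brute force for small $n$. The following sketch, with illustrative function names, compares the set of indices of the form $i + j + 2ij$ against the set of indices $k$ for which $2k + 1$ is composite; by the reasoning above the two sets should coincide.

```python
from math import isqrt


def excluded_indices(n):
    """All k <= n of the form i + j + 2ij with 1 <= i <= j (brute force)."""
    return {i + j + 2 * i * j
            for i in range(1, n + 1)
            for j in range(i, n + 1)
            if i + j + 2 * i * j <= n}


def is_odd_composite(q):
    """True if the odd number q >= 3 has an odd divisor d with 3 <= d <= sqrt(q)."""
    return any(q % d == 0 for d in range(3, isqrt(q) + 1, 2))


def check(n):
    """True if the excluded indices are exactly those k with 2k + 1 composite."""
    composite_indices = {k for k in range(1, n + 1) if is_odd_composite(2 * k + 1)}
    return excluded_indices(n) == composite_indices


print(all(check(n) for n in (1, 4, 10, 50, 100)))
```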

Asymptotic complexity
A commonly seen Python implementation that transcribes the algorithm above literally obscures its true complexity, for the following reasons:


 * 1) The range of the outer looping variable i is much too large, resulting in redundant iterations that cannot cull any composite number representations; the proper range extends only to the array indices representing odd numbers less than the square root of the range.
 * 2) The code does not properly account for the indexing of Python arrays, which are zero-based, so it ignores the values at the bottom and top of the array; this is a minor issue, but it suggests that the algorithm behind the code has not been clearly understood.
 * 3) The inner culling loop (the j loop) exactly reflects the way the algorithm is formulated, seemingly without recognizing that culling starts at exactly the index representing the square of the base odd number, and that the index computed by multiplication can be expressed much more simply as repeated addition of the base odd number across the range; adding a constant value across the culling range is exactly how culling in the Sieve of Eratosthenes is generally implemented.

The following Python code in the same style resolves the above three issues, as well as converting the code to a prime-counting function that also displays the total number of composite-culling operations:
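A minimal sketch of such a function is given here, assuming an array in which index k represents the odd number 2k + 3. The function name, variable names and return shape are illustrative assumptions rather than a definitive listing.

```python
from math import isqrt


def sieve_of_sundaram_count(limit):
    """Return (number of primes <= limit, number of culling operations)."""
    if limit < 3:
        return (1 if limit >= 2 else 0), 0
    # Index k represents the odd number 2k + 3, so index 0 stands for 3.
    top = (limit - 3) // 2 + 1
    composite = [False] * top
    culls = 0
    # Issue 1: only odd base values up to sqrt(limit) can cull anything.
    sqrt_index = (isqrt(limit) - 3) // 2 + 1
    for i in range(sqrt_index):
        # if composite[i]: continue  # uncomment for the odds-only Sieve of Eratosthenes
        base = 2 * i + 3
        # Issue 3: start at the index of base squared, then add base repeatedly.
        start = (base * base - 3) // 2
        for k in range(start, top, base):
            composite[k] = True
            culls += 1
    # Issue 2: every index is used; add one for the only even prime, 2.
    return 1 + composite.count(False), culls


primes, culls = sieve_of_sundaram_count(100)
print(primes, culls)
```

For a range of 100, the base values 3, 5, 7 and 9 perform 16, 8, 4 and 2 culls respectively, so the sketch reports 25 primes and 30 culling operations.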

Note the commented-out line, which is all that is needed to convert the Sieve of Sundaram into the odds-only Sieve of Eratosthenes (wheel-factorized by 2, the only even prime). This makes clear that the only difference between the two algorithms is that the Sieve of Sundaram culls composite numbers using all odd numbers as base values, whereas the odds-only Sieve of Eratosthenes uses only the odd primes, with both sets of base values bounded by the square root of the range.

When run for various ranges, both algorithms of course give the same count of primes for a given range, but the number of culling operations is much higher for the Sieve of Sundaram and also grows much more quickly with increasing range.
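One way to see this is to tally both counts in a single pass. The sketch below is an assumption-laden illustration with hypothetical names: it sieves with every odd base value, as the Sieve of Sundaram does, and credits a base value's culls to the Eratosthenes tally only when that base is still unmarked, which is when it is prime.

```python
from math import isqrt


def cull_counts(limit):
    """Return (sundaram_culls, eratosthenes_culls) for odd numbers up to limit."""
    if limit < 9:
        return 0, 0                      # no odd composite is small enough to cull
    top = (limit - 3) // 2 + 1           # index k represents the odd number 2k + 3
    composite = bytearray(top)
    sundaram = eratosthenes = 0
    for base in range(3, isqrt(limit) + 1, 2):
        start = (base * base - 3) // 2   # index of base squared
        count = len(range(start, top, base))
        sundaram += count                # Sundaram culls with every odd base
        if not composite[(base - 3) // 2]:
            eratosthenes += count        # Eratosthenes culls with prime bases only
        composite[start:top:base] = b"\x01" * count
    return sundaram, eratosthenes


for limit in (10**2, 10**4, 10**6):
    s, e = cull_counts(limit)
    print(limit, s, e, round(s / e, 3))
```

For a range of 100 the two tallies are 30 and 28; the printed ratio for larger ranges indicates how quickly the gap widens.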

From the above implementation, the amount of work done is approximately proportional to:

$$\int_{a}^{b} \frac{n}{2x} \,dx = \frac{n}{2} \int_{a}^{b} \frac{1}{x} \,dx$$

where:
 * $n$ is the range to be sieved;
 * the range $a$ to $b$ covers the odd numbers between 3 and the square root of $n$;
 * culling for each base value $x$ actually starts at $x^2$ rather than at the bottom of the range, but this difference is negligible for large ranges.

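As the integral of the reciprocal of $x$ is exactly $\log{x}$, and as the lower limit $a$ is relatively very small (close to one, whose logarithm is zero), the integral is about $\log{\sqrt{n}}$. Because only the odd values of $x$ act as base values, only half of the integers in the range $a$ to $b$ contribute, which halves the total and gives approximately:

$$\frac{n}{4} \log{\sqrt{n}} = \frac{n}{4} \cdot \frac{1}{2} \log{n} = \frac{n}{8} \log{n}.$$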

Ignoring the constant factor of one eighth, the asymptotic complexity in Big O notation is $O(n \log{n})$.
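The estimate can be compared with an exact count of culling operations. The sketch below, with illustrative function names, counts the culls for every odd base value without performing the sieve and divides by $\frac{n}{8} \log{n}$, using the natural logarithm. It assumes the same culling pattern as described above: each odd base value $x$ starts at $x^2$ and advances in steps of $2x$.

```python
from math import isqrt, log


def sundaram_culls(limit):
    """Exact number of culls: each odd base x <= sqrt(limit) marks x^2, x^2 + 2x, ..."""
    return sum(len(range(x * x, limit + 1, 2 * x))
               for x in range(3, isqrt(limit) + 1, 2))


def estimate(limit):
    """The (n / 8) log n approximation derived above."""
    return limit / 8 * log(limit)


for limit in (10**4, 10**6, 10**8):
    print(limit, sundaram_culls(limit), round(sundaram_culls(limit) / estimate(limit), 3))
```

The ratio is expected to be of order one, with the estimate somewhat high because it drops lower-order terms such as the part of the range below $x^2$ that is never culled.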