User:Dcoetzee/Schönhage-Strassen algorithm

This section explains in detail how Schönhage-Strassen is implemented and a number of important practical optimizations. It is based primarily on an overview of the method by Crandall and Pomerance in their Prime Numbers: A Computational Perspective. Other sources for detailed information include Knuth's The Art of Computer Programming and a 2007 work by Gaudry, Kruppa, and Zimmermann describing enhancements to the GNU Multi-Precision Library.

Background
Like Toom-Cook multiplication and all methods based on the discrete Fourier transform, Schönhage-Strassen is based on the idea that one can break the two input integers up into parts, make these parts the coefficients of two polynomials, multiply the polynomials, and then extract the result from the coefficients of the product polynomial. Multiplying the polynomials is done by evaluating each polynomial at some fixed set of points, multiplying the values at each point to obtain points on the product polynomial, and then interpolating to produce the coefficients of the product polynomial. For example, to multiply the integers 123 and 456, one can write:


 * p(x) = 1x2 + 2x + 3
 * q(x) = 4x2 + 5x + 6

The product polynomial will have degree 4, and a degree-k polynomial requires k + 1 points to uniquely specify it, so we evaluate each polynomial at 5 points, say 0, 1, -1, 2, and -2:


 * p(0) = 3, p(1) = 6, p(-1) = 2, p(2) = 11, p(-2) = 3
 * q(0) = 6, q(1) = 15, q(-1) = 5, q(2) = 32, q(-2) = 12

We then pointwise multiply the values of p and q at each point to get points on a product polynomial r:


 * r(0) = 18, r(1) = 90, r(-1) = 10, r(2) = 352, r(-2) = 36

We interpolate to determine the polynomial r(x) matching these points:


 * r(x) = 4x4 + 13x3 + 28x2 + 27x + 18

The coefficients of this polynomial are precisely what we would get if we performed long multiplication without doing any carrying:

This sequence is the acyclic or linear convolution of the two original sequences (1,2,3) and (4,5,6) (see convolution). Once we have this, all that remains is to do the carrying, which requires only shifts and adds. This produces the correct answer of 123 &times; 456 = 56088.

Number theoretic transform
By carefully choosing which points to evaluate the polynomials at, it's possible to reuse the same partial computations for many points and achieve a speedup. This idea forms the basis of the Fast Fourier transform (FFT) algorithm. Here we describe a version of FFT called the number theoretic transform that forms the basis for Schönhage-Strassen.

Suppose we wish to compute (A &times; B) mod N. Break each of A and B up into D = 2k parts, where D is a factor of N&minus;1. As before, we make the D parts of A and B into the coefficients of polynomials p and q. Now, suppose x is a Dth primitive root of unity mod N; that is, xD &equiv; 1 (mod N), but no smaller power of x is equivalent to 1. We then evaluate p and q at the D points xi mod N for 0 &le; i < D (using arithmetic modulo N).

To interpolate the product polynomial, we need 2D &minus; 1 points, and it seems like we've only computed D; but in fact, since xD = 1, we know that x i

The inverse number-theoretic transform is given by


 * $$f^{-1}(x)_h=n^{p-2}\sum_{j=0}^{n-1}x_j(\omega^{p-1-\xi})^{hj}\mod p\quad\quad h=0,\dots,n-1$$