User:MRaccoon/Montgomery1

This is a Draft for new additional text to incorporate into the WP article "Montgomery modular multiplication".

-

MONTGOMERY MODULAR MULTIPLICATION

In modular arithmetic computation, Montgomery modular multiplication, more commonly referred to as Montgomery multiplication, is a method for performing fast modular multiplication, introduced in 1985 by the American mathematician Peter L. Montgomery.

The purpose of the method is to speed up modular multiplication, without a penalty in the speed of modular addition and subtraction.

Because of a need for conversion of inputs and outputs, actual speedup is achieved only when performing a whole series of modular multiplications on intermediate results, all with the same modulus $$N$$ (as occurs for example in calculation of modular exponentiation via exponentiation by squaring).

Inside Montgomery's modular multiplication method, arithmetic operations are performed on numbers represented in an internal representation specific to the method, which in this article we will call the Montgomery representation. This representation makes the speedup of modular multiplication possible; however, other arithmetic operations modulo $$N$$, such as modular addition, are still possible in Montgomery representation.

Montgomery's modular multiplication method includes a number of algorithms. The process (algorithm) of performing one single modular multiplication of two numbers in Montgomery representation is called a Montgomery multiplication step, or Montgomery step for short. This Montgomery multiplication step, as well as some other operations on numbers in Montgomery representation (including the process of conversion of inputs and outputs), makes use of another algorithm, called Montgomery reduction. Montgomery reduction is a subroutine (a function called REDC or Redc) specific to the Montgomery representation and specific to the Montgomery modular multiplication method.

Rationale (for Montgomery multiplication)
Every modular arithmetic operation involves first executing a normal nonmodular arithmetic operation, and then reducing the result modulo N. Reducing modulo N means to take the remainder after division by $$N$$. For example, calculating $$(a \times b) \mod N$$ involves first multiplying $$a$$ and $$b$$ using normal nonmodular multiplication, then reducing the result modulo $$N$$. (The operator $$\mod$$ is called the remainder operator.)

The remainder operation is computationally expensive (slow): it costs about as much as a division. Integer division and remainder (together called the division-like operations) cost significantly more than integer multiplication. Their cost is especially high when the operands are large, i.e. when $$N$$ is large. Thus, the remainder operation is the performance bottleneck in modular arithmetic computations.

Montgomery's modular multiplication method is targeted at easing this performance bottleneck. The purpose of Montgomery's modular multiplication method is to speed up modular arithmetic computations that involve many modular multiplications, in cases where the cost of the remainder operation is especially high. A good example of an application area that uses such computations is cryptography (see next section).

A notable feature of Montgomery's modular multiplication method is that it achieves a speed-up of modular multiplication without a cost in the speed of other modular arithmetic operations like addition. This makes the method applicable to computations which, among many modular multiplication operations, also include other modular arithmetic operations such as addition.

Use in cryptography
Modular arithmetic involving exponentiation modulo a large number (typically several hundred bits) is used in many important cryptosystems such as RSA and DSA.

Modular exponentiation can be implemented in various ways, but all of them perform a series of modular multiplication operations.

The naive way to calculate $$a^b \mod N$$ is to start from $$a$$ and repeatedly multiply by $$a$$ (for a total of $$b - 1$$ multiplications), each time reducing the result modulo $$N$$. (Note that reducing only once, at the end of the calculation, would produce increasingly large intermediate products, which is infeasible if $$b$$ is very large.)

A better way to compute $$a^b \mod N$$ is to use exponentiation by squaring, which means successively squaring $$a$$ (each time reducing the result modulo $$N$$), and then taking the product (again reduced modulo $$N$$) of the squares selected by the binary digits of $$b$$. This involves significantly fewer multiplication (and remainder) operations. But it still means performing a computation that consists of a series of modular multiplication operations.
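The following is a minimal Python sketch of exponentiation by squaring, added only to make the description above concrete. The function name `mod_pow` and its parameter names are illustrative assumptions, not part of the method. Each `%` in the loop is one of the expensive remainder operations discussed in the previous section.

```python
def mod_pow(a: int, b: int, n: int) -> int:
    """Return (a ** b) % n for b >= 0 and n >= 1 (right-to-left binary method)."""
    result = 1 % n      # 1 % n also covers the corner case n == 1
    square = a % n      # holds a^(2^i) mod n in iteration i
    while b > 0:
        if b & 1:
            # This bit of b is set: multiply the current square into the result.
            result = (result * square) % n
        square = (square * square) % n
        b >>= 1
    return result


# Cross-check against Python's built-in three-argument pow.
assert mod_pow(7, 13, 11) == pow(7, 13, 11)
```

The loop runs once per bit of the exponent, so the number of modular multiplications grows with the bit length of $$b$$ rather than with $$b$$ itself.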

Exponentiation by squaring is a type of computation that is well suited to application of Montgomery's modular multiplication method.

Outline of the method
TODO

Single Montgomery multiplication step
TODO (maybe with diagram)

Intuitive explanation
The purpose of this section is to explain in an intuitive way the logic behind the REDC algorithm.

All the variables used in this section ($$N, R, T, t, m, \xi$$) are integers.

Core idea
The REDC function (algorithm) computes the value of
 * $$t = ( T \times R^{-1} ) \mod N$$

However, it does not compute this value by following the formula literally.

As explained earlier in this article, the problem in the formula above is the expensive $$\mod N$$ operation at the end.

The core idea behind the REDC algorithm is to replace the $$T \times R^{-1}$$ (the operand of the $$\mod N$$ operation) in the formula above by an expression equivalent to it (mod $$N$$) that makes the $$\mod N$$ operation trivial to compute. In other words, the strategy used, properly speaking, is not to completely eliminate the $$\mod N$$ operation; but instead to "trivialize" it beyond recognition.

That is, from an abstract point of view the REDC algorithm does in fact go through a form of computation of the $$\mod N$$ operation; however before doing so, it performs a brief amount of preparatory computation that transforms the operand of the $$\mod N$$ operation in such a way that the $$\mod N$$ operation becomes utterly trivial. (Namely, so trivial that it can be carried out extremely quickly and without using division-like machine operations.)

Derivation of the algorithm
Below is a commented demonstration of the formula manipulations that transform the $$T \times R^{-1}$$ in


 * $$t = ( T \times R^{-1} ) \mod N$$

to the expression that is equivalent to it (mod $$N$$) that is actually computed by the REDC algorithm, and that makes the $$\mod N$$ trivial to compute.

Because the above formula for $$t$$ is evaluated modulo $$N$$, any of the multiplicands in the modular multiplication can be replaced by any other element of its residue class (mod $$N$$). In other words, the result $$t$$ doesn't change if prior to the multiplication we add a multiple of $$N$$ to the operand $$T$$:


 * $$t = ( (T + mN) \times R^{-1} ) \mod N$$

Here $$m$$ can be any integer. In the REDC algorithm, $$m$$ is chosen so that $$(T + mN)$$ becomes an integer multiple of $$R$$, i.e. so that $$T + mN = \xi R$$ for some integer $$\xi$$, or $$(T + mN) \equiv 0 \pmod{R}$$.

For arbitrary $$T$$, an $$m$$ with $$(T + mN) \equiv 0 \pmod{R}$$ is guaranteed to exist because $$N$$ and $$R$$ are relatively prime.

The value of $$m$$ that makes $$(T + mN)$$ a multiple of $$R$$ follows directly from $$(T + mN) \equiv 0 \pmod{R}$$ as


 * $$m = ( T \times (-N^{-1}) ) \mod R$$

Note that computing $$m$$ involves only arithmetic (mod $$R$$).

$$N^{-1}$$ is the multiplicative inverse (mod $$R$$) of $$N$$ (i.e. $$N N^{-1} \equiv 1 \pmod{R}$$). This inverse is guaranteed to exist since $$N$$ is relatively prime to $$R$$. It can be efficiently computed (from $$N$$ and $$R$$) with the Extended Euclidean algorithm.

$$(-N^{-1})$$ is the additive inverse (mod $$R$$) of $$N^{-1}$$ (i.e. $$(-N^{-1}) = R - (N^{-1})$$).
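As an illustration of how $$N^{-1}$$ and $$(-N^{-1})$$ might be obtained, here is a small Python sketch of the Extended Euclidean algorithm. The function name `inverse_mod` and the example values $$N = 17$$, $$R = 32$$ are assumptions chosen for this illustration only.

```python
def inverse_mod(n: int, r: int) -> int:
    """Return x in [0, r) with (n * x) % r == 1, via the extended Euclidean algorithm."""
    # Loop invariant: old_s * n == old_r (mod r) and s * n == rem (mod r).
    old_r, rem = n % r, r
    old_s, s = 1, 0
    while rem != 0:
        q = old_r // rem
        old_r, rem = rem, old_r - q * rem
        old_s, s = s, old_s - q * s
    if old_r != 1:
        raise ValueError("n and r are not relatively prime")
    return old_s % r


N, R = 17, 32                    # illustrative values only
N_inv = inverse_mod(N, R)        # multiplicative inverse of N (mod R)
N_neg_inv = (R - N_inv) % R      # additive inverse of N_inv (mod R)
assert (N * N_inv) % R == 1
assert (N * N_neg_inv) % R == R - 1
```

Both values depend only on $$N$$ and $$R$$, so they can be computed once and reused for every reduction with the same modulus.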

Thus with this choice of $$m$$, we have now made $$(T + mN)$$ a multiple of $$R$$, that is $$T + mN = \xi R$$ with $$\xi = (T + mN) / R$$, so that



 * $$(T + mN) \times R^{-1} \; = \; \xi \; R \; R^{-1} \; \equiv \; \xi \; = \; (T + mN) / R \pmod{N}$$

Therefore, in the formula for $$t$$, we can replace the multiplication by $$R^{-1}$$ with a division by $$R$$:


 * $$t = ( (T + mN) \times R^{-1} ) \mod N$$
 * $$t = ( (T + mN) / R ) \mod N$$

This last equation is the final result of the formula manipulation. The REDC algorithm computes $$m$$ as described above, and then computes $$(T+mN)/R$$.
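To make the derivation concrete, the short Python sketch below traces it with small numbers. The values $$N = 17$$, $$R = 32$$, $$T = 100$$ are hypothetical and chosen only for illustration; the inverses used are stated in the comments and checked by assertions.

```python
# Hypothetical example values (not from the text): N = 17, R = 32, T = 100.
N, R, T = 17, 32, 100

N_inv = 17                     # N^-1 mod R, since 17 * 17 = 289 = 9 * 32 + 1
assert (N * N_inv) % R == 1
N_neg_inv = R - N_inv          # (-N^-1) mod R = 15

m = (T * N_neg_inv) % R        # 1500 mod 32 = 28
assert (T + m * N) % R == 0    # T + mN = 576 is a multiple of R

xi = (T + m * N) // R          # 576 / 32 = 18, an exact division

# Cross-check against the defining formula, using R^-1 mod N = 8
# (32 * 8 = 256 = 15 * 17 + 1).
R_inv = 8
assert (R * R_inv) % N == 1
assert xi % N == (T * R_inv) % N   # both sides equal 1
```

Note that the only `%` operations needed to obtain `xi` are taken modulo $$R$$; the reductions modulo $$N$$ appear solely in the cross-check.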

The final trivial mod N computation
What was gained by these transformations is that they have given the $$\mod N$$ operation an operand $$(T+mN)/R$$ that has a very limited range. This in turn makes the $$\mod N$$ operation trivial to compute.

In Montgomery's multiplication method, $$T$$ (the input to the REDC function) is restricted to the interval $$[0,RN)$$. (This restriction is sensible because this interval suffices for the result of an ordinary nonmodular multiplication of two numbers both in $$[0,N)$$.)

$$m$$ is in $$[0,R)$$, so that $$mN$$ is in $$[0,RN)$$. $$(T+mN)$$ is then in $$[0,2RN)$$, so that $$(T+mN)/R$$ is in $$[0,2N)$$.

Because of this very limited range of $$(T+mN)/R$$, the $$\mod N$$ operation in the last equation above only has trivial work left to do. Namely, all that remains to be done is to map the values in the upper half of the interval $$[0,2N)$$ to their equivalents (mod $$N$$) in the interval $$[0,N)$$ (the interval required for the result of a $$\mod N$$ operation). This is a simple conditional subtraction: if the operand is greater than or equal to $$N$$, subtract $$N$$; otherwise do nothing.

To state it explicitly: what the REDC algorithm does, after computing $$\xi = (T + mN) / R$$, is to compute $$t$$ (the output value of the algorithm) as follows:


 * IF      $$ \xi \ge N $$
 * THEN      $$ t = \xi - N $$
 * ELSE      $$ t = \xi $$
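Putting the steps of this section together, a REDC function could be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not a reference implementation: it assumes that $$R$$ is a power of two (so that reduction modulo $$R$$ becomes a bit mask and division by $$R$$ becomes a right shift), and the parameter name `N_neg_inv` for the precomputed $$(-N^{-1}) \mod R$$ is made up for this example.

```python
def redc(T: int, N: int, R: int, N_neg_inv: int) -> int:
    """Return (T * R^-1) mod N without any division by N.

    Assumptions (for this sketch): R is a power of two, gcd(N, R) == 1,
    0 <= T < R * N, and (N * N_neg_inv) % R == R - 1.
    """
    assert R > 0 and R & (R - 1) == 0, "this sketch requires R to be a power of two"
    assert 0 <= T < R * N
    mask = R - 1                       # x & mask == x mod R
    shift = R.bit_length() - 1         # x >> shift == x // R

    m = ((T & mask) * N_neg_inv) & mask    # m = T * (-N^-1) mod R
    xi = (T + m * N) >> shift              # exact division: T + mN is a multiple of R
    return xi - N if xi >= N else xi       # the final conditional subtraction


# Illustrative values: N = 17, R = 32, (-N^-1) mod R = 15, R^-1 mod N = 8.
assert redc(100, 17, 32, 15) == (100 * 8) % 17
```

With $$R$$ a power of two, the function body uses only multiplication, addition, masking, shifting and one comparison, which is what makes the reduction cheap compared with a remainder operation modulo $$N$$.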

--