Talk:Division algorithm

Division based on a binomial theorem
I have seen the algorithm $$ \frac{1}{1-x} = \frac{1+x}{1-x^2} = \frac{(1+x)\cdot(1+x^2)}{1-x^4} = \frac{(1+x)\cdot(1+x^2)\cdot(1+x^4)}{1-x^8} $$ etc. When $$x^{2^n}$$ becomes small enough, the denominator is set to 1 and the algorithm terminates. Does someone know its name or, better, a reference? I believe it was used in an IBM machine. HenningThielemann (talk) 16:45, 25 March 2010 (UTC)
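For what it's worth, the iteration can be sketched in a few lines of C (my own illustration, not from the IBM source; the function name and tolerance handling are assumptions):

```c
#include <math.h>

/* Compute 1/(1-x) for |x| < 1 by repeatedly multiplying numerator and
   denominator by (1 + x^(2^k)), driving the denominator toward 1. */
static double recip_one_minus_x(double x, double tol) {
    double num = 1.0;        /* accumulated numerator product */
    while (fabs(x) > tol) {  /* denominator is 1 - x; stop when x is tiny */
        num *= 1.0 + x;      /* numerator gains the factor (1 + x^(2^k)) */
        x *= x;              /* denominator becomes 1 - x^(2^(k+1)) */
    }
    return num;              /* denominator is treated as 1 */
}
```

Each pass squares x, so the number of correct bits doubles per iteration, just like the fast division methods in the article.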


 * Why is this referred to as being related to the binomial theorem? (I don't see a source for making that connection.) What it relies upon is the difference of two squares, which is quite different! If I were to suggest an English name for the method (the elementary algebraic identity $$ (a-b)(a+b) = a^2 - b^2 $$ doesn't seem to have much of an established name in English, so it'd be hard to name the method after that), then "squaring away the error" is probably appropriate. Alternatively, the method can be viewed as an application of the formula for the sum of a geometric series, but that is again not the same as the binomial theorem. 130.243.94.123 (talk) 12:07, 4 June 2015 (UTC)

Requested move
This page actually discusses implementing division algorithms for digital circuits (i.e. a divider in a microprocessor math unit). Many other types of division also exist in electronics that are not addressed on this page (e.g. voltage divider, current divider, etc.).

Voting

 * ''Add *Support or *Oppose followed by an optional one sentence explanation, then sign your vote with ~~~~''
 * Support Jpape 07:21, 6 December 2005 (UTC)

Discussion

 * It's consistent with Adder (electronics), for whatever that's worth. Though I do agree with the proposal to some extent.  --Interiot 07:45, 6 December 2005 (UTC)
 * Move is done. Didn't need an admin to begin with, so I just did it.  Be bold! Tedernst | talk 20:46, 9 December 2005 (UTC)
 * So do we all support moving Adder (electronics) to Adder (digital) then? --Interiot 20:56, 9 December 2005 (UTC)

Non-restoring division
The algorithm as described is inefficient: if the only two digits used are 1 and -1, with the latter represented as 0, then:

Q = P - N, where P is the positive term and N is the negative term.
With the -1 digits stored as 0: N = ~P = -P-1.
Two's complement of N: -N = P+1.

The sum of P and -N = 2*P + 1.

Therefore, the least significant digit for the quotient is always 1, and the overall precision is one less bit than expected.

A better implementation of non-restoring division is to allow digits 0, 1, and -1 (this will require twice the number of state devices for the quotient, though):

P[0] = N
i = 0
while (i < n) {
    if (|P[i]| < 0.25)  { q[n-(i+1)] = 0;  P[i+1] = 2*P[i]; }
    else if (P[i] >= 0) { q[n-(i+1)] = 1;  P[i+1] = 2*P[i] - D; }
    else                { q[n-(i+1)] = -1; P[i+1] = 2*P[i] + D; }
    i++;
}

Comparing with 0.25 is cheap: the absolute value is less than 0.25 if the two most significant bits of the mantissa are equal to the sign bit.
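For illustration, the {0, 1, -1} scheme above can be written out on floating-point operands (my own compilable sketch; it assumes 0.5 <= D < 1 and |N| < D, and the names are mine):

```c
#include <math.h>

/* Non-restoring division with quotient digits {-1, 0, 1}. The comparison
   against 0.25 selects the 0 digit, as described above. After n steps,
   |Q - N/D| <= 2^-n. */
static double nr3_divide(double N, double D, int n) {
    double P = N;      /* partial remainder */
    double Q = 0.0;    /* accumulated quotient */
    double w = 0.5;    /* weight of the current quotient digit */
    for (int i = 0; i < n; i++) {
        if (fabs(P) < 0.25) { P = 2*P; }              /* digit  0 */
        else if (P >= 0)    { Q += w; P = 2*P - D; }  /* digit +1 */
        else                { Q -= w; P = 2*P + D; }  /* digit -1 */
        w *= 0.5;
    }
    return Q;
}
```

The selection keeps the invariant |P| <= D: a 0 digit doubles a remainder below 0.25 (still below 0.5 <= D), and a +1/-1 digit maps P into [0.5-D, D] or its mirror.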

((That's essentially SRT now, isn't it? -Ben))


 * what about an extra section with SRT explanation and code example? -- 89.247.39.228 (talk) 18:41, 29 July 2008 (UTC)

Simple compilable code sample - for demonstration only
I'm not smart enough to understand any of the formulas in this article, and I don't know which algorithm this is. I do know that this is not the best routine, but it is a simple example of integer division in compilable C code. I only post it to the talk page because I came here looking for such a thing and couldn't find it, so when I figured it out, I posted it here. Just delete it if you don't like it; I won't be offended. Maybe one person will be helped.

int Q = 0;
int N = 100000000;   // 1.00000000
int D = 3183;        // 0.3183 (in other words, 1/PI) - the result should be pi
int mult = 65536;
while (mult > 0) {
    if ((N - (D * mult)) >= 0) {
        N = N - (D * mult);
        Q = Q + mult;
    }
    mult = mult >> 1;   // right-shift by 1 bit
}
printf("Q=%d, R=%d\n", Q, N);

Anyway, after the loop is done, N contains the remainder. This code is very limited and for illustration only. PS: C code and this page don't get along.

64.146.180.232 (talk) 03:53, 11 July 2008 (UTC)

pseudo code untestable
Hi, I just fixed some issues in the pseudocode. Like all pseudocode it is untestable, and thus likely to be buggy. Below you find a C implementation of the same complexity for restoring unsigned integer division, but compilable and verifiable:
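(The C code originally posted here seems to have been lost from the page. As a stand-in, here is my own compilable sketch of restoring unsigned integer division; the names and the test-before-subtract variant are my choices, not the original posting.)

```c
/* Restoring unsigned integer division, non-performing variant: the
   subtraction is tested before it is committed, which is equivalent to
   subtracting and adding back on a negative result.
   Produces n quotient bits; requires N < (D << n). */
static unsigned restoring_divide(unsigned N, unsigned D, int n, unsigned *rem) {
    unsigned long long P = N;                            /* partial remainder */
    unsigned long long Dn = (unsigned long long)D << n;  /* pre-shifted divisor */
    unsigned Q = 0;
    for (int i = 0; i < n; i++) {
        P <<= 1;                                   /* shift remainder left */
        if (P >= Dn) { P -= Dn; Q = (Q << 1) | 1u; } /* subtraction succeeded */
        else         { Q <<= 1; }                    /* "restore": keep P */
    }
    *rem = (unsigned)(P >> n);   /* P holds remainder * 2^n */
    return Q;
}
```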

Hope this helps to make the situation better, -- 89.247.39.228 (talk) 18:36, 29 July 2008 (UTC)

Why Goldschmidt division mentioned separately?
It's essentially the same thing as Newton's iterations, just written in a different form.

IMHO if anything else should be mentioned regarding fast division, it is these two things:

1. How to derive an iteration formula with cubic convergence order (or write the formula itself). Higher-order convergence is of no real use here.

2. Numerical error and achievable accuracy. It's very important, because for instance when N=32, for most numbers achievable accuracy is lower by one digit of importance than what can be represented on the machine.

79.178.123.70 (talk) 15:15, 28 June 2009 (UTC)


 * The Goldschmidt method as described would be equivalent if one had infinite precision but one doesn't. In fact one can use other multipliers from a lookup table if one wants instead just provided the denominator eventually becomes 1. Newton's method corrects itself so you start from the start but with a better approximation each time. The Goldschmidt method just multiplies the two parts independently and has no such self correction but the hardware implementation is quicker. The hardware manufacturers put in little tweaks like a couple of extra bits and check their tables carefully to ensure they actually do get accurate division. This isn't the sort of thing one can do in software easily. So each has its strengths and weaknesses. I'm not sure what you're saying about achievable accuracy, any decent computer guarantees you absolutely accurate results for both integer and floating point division. Dmcq (talk) 16:19, 28 June 2009 (UTC)


 * Sorry for replying a bit late. In any case, achievable accuracy is:
 * Suppose you're zeroing $$f\left( x \right)$$, but notice that on computer you're actually zeroing: $$\tilde{f}\left( x \right)$$, for which: $$\left| \tilde{f}\left( x \right)-f\left( x \right) \right|\le \delta $$. Hence what actually happens is: $$\left| f\left( x \right) \right|\le \delta $$.
 * Expanding $$f\left( {{x}_{n}} \right)$$ around the root $$\alpha $$: $$f\left( {{x}_{n}} \right)=f\left( \alpha \right)+{f}'\left( \alpha  \right)\left( {{x}_{n}}-\alpha  \right)+O{{\left( {{x}_{n}}-\alpha  \right)}^{2}}\approx {f}'\left( \alpha  \right)\left( {{x}_{n}}-\alpha  \right)$$.
 * Here we arrive at: $$\left| {{x}_{n}}-\alpha \right|\approx \left| \frac{f\left( {{x}_{n}} \right)}{{f}'\left( \alpha  \right)} \right|\le \frac{\delta }{\left| {f}'\left( \alpha  \right) \right|}$$.
 * In our case $$f\left( x \right)=\frac{1}{x}-a$$ and $${f}'\left( \alpha \right)=-{{a}^{2}}$$. Hence we're bounded by: $$\left| {{x}_{n}}-\frac{1}{a} \right|\le \frac{\delta }{{{a}^{2}}}$$, adjusting for FP: $$\left| a{{x}_{n}}-1 \right|\le \frac{\delta }{\left| a \right|}$$.
 * Of course this can be fixed by inserting more bits, thus increasing internal precision. Now that I look at it, Goldschmidt division might be less plagued by numerical problems, and I can see the benefits of choosing it over NR division. However, I'd still put them in the same section, and just note that Goldschmidt division is a variation of NR division, with better numerical properties. 79.178.112.120 (talk) 19:40, 12 July 2009 (UTC)


 * Not sure exactly what you're up to but as I said above each iteration of Goldschmidt division accumulates errors and Newton's method doesn't. An individual iteration may be better but overall over a number of iterations it is worse. Both are notable and important and they are not the same thing. Dmcq (talk) 20:49, 12 July 2009 (UTC)

Non-restoring Division - two bugs, two issues.
As presented, the algorithm has two errors:


 * 1) D must be shifted left by n before the beginning of the loop as in the restoring case.


 * 2) When the quotient should be even it isn't.  It's too large by 1 and the remainder is negative.  Divide 5 by 2 and the result is 3 R -1.  This is incorrectly diagnosed as a loss of precision in the 2008 July 29 Talk entry.  Testing the remainder (P) for a negative value and decrementing the quotient (after quotient "fixup" as described) will give the correct quotient.  If the correct remainder is needed in this case, D should be added to P.

The two issues are:
 * 1) The subscripting of P is unnecessary and confusing.  It also obscures the similarities to and differences from the Restoring algorithm.  The algorithm can be presented much more clearly as in the Restoring case.


 * 2) The conversion from non-standard to standard representation of Q is unnecessarily complex: just subtract N from P to get Q.  Further, as others have noted, the -1 digit value is normally represented as 0.  Complementing the bits of Q and subtracting the result from Q yields the corrected Q.

I would present the algorithm so it looks similar to the restoring version. Here is a suggested replacement for the entire section:


Non-restoring division uses the digit set {&minus;1,1} for the quotient digits instead of {0,1}. The basic algorithm for binary (radix 2) non-restoring division is:

Following this algorithm, the quotient is in a non-standard form consisting of digits of &minus;1 and +1. This form needs to be converted to binary to form the final quotient. Example:

If the -1 digits of $$Q$$ are stored as zeros (0) as is common, then $$P$$ is $$Q$$ and computing $$N$$ is trivial: perform a bit-complement on the original $$Q$$.

Finally, quotients computed by this algorithm are always odd: 5 / 2 = 3 R -1. To correct this add the following after Q is converted from non-standard form to standard form:

The actual remainder is P >> n.
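A compilable C sketch of the scheme just described (my own variable names; the suggested algorithm, conversion, and even-quotient fixup are combined in one routine):

```c
/* Non-restoring division: N / D with n quotient bits; requires N < (D << n).
   The -1 quotient digits are stored as 0 bits, so conversion to standard
   binary is Q minus the bit-complement of Q, and a final fixup handles
   the always-odd quotient. */
static unsigned nr_divide(unsigned N, unsigned D, int n, unsigned *rem) {
    long long P = N;                        /* partial remainder */
    long long Dn = (long long)D << n;       /* divisor shifted left by n */
    unsigned mask = (n >= 32) ? 0xFFFFFFFFu : ((1u << n) - 1u);
    unsigned Q = 0;
    for (int i = 0; i < n; i++) {
        if (P >= 0) { Q = (Q << 1) | 1u; P = 2 * P - Dn; }  /* digit +1 */
        else        { Q = (Q << 1);      P = 2 * P + Dn; }  /* digit -1, stored as 0 */
    }
    Q = (Q & mask) - (~Q & mask);    /* {-1,1} form -> standard binary */
    if (P < 0) { Q -= 1; P += Dn; }  /* quotient was even: decrement Q, restore P */
    *rem = (unsigned)(P >> n);       /* P holds remainder * 2^n */
    return Q;
}
```

With the fixup in place, 5 / 2 comes out as 2 R 1 rather than 3 R -1.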

Michael Mulligan (talk) 15:24, 21 June 2010 (UTC)


 * It has been 4 1/2 years and this page is still showing the incorrect algorithm as described above. The above appears to be correct.  Why is the page still showing an example that is essentially wrong? — Preceding unsigned comment added by 141.149.176.86 (talk) 00:35, 21 January 2015 (UTC)


 * Changed tag for code from 'code' to 'syntaxhighlight'. Attendant changes to code syntax.
 * Added Code for Converting Q from {&minus;1, 1} to {0, 1}. Explained P = Q for common case.
 * Will move to Article page soon per 'unsigned' from 21 Jan 2015.


 * Michael Mulligan (talk) 06:32, 3 May 2015 (UTC)


 * Finally moved these fixes to the Article page per 'unsigned' from 21 Jan 2015.


 * Michael Mulligan (talk) 03:38, 26 September 2015 (UTC)

Initialization of Newton–Raphson division
In subsection Division (digital), it gave a formula for X0 (the initial approximation to 1/D) as a linear function of D (the divisor) which it claimed would minimize the maximum of the absolute error of this approximation on the interval [0.5, 1]. Using Fermat's theorem (stationary points), I determined that the coefficients given were incorrect; and I replaced them with the correct coefficients. However, I subsequently realized that the formula which I gave required one multiplication while the previous formula required no multiplications (since its coefficient for D was &minus;2 and &minus;2D = &minus;D&minus;D). Even if one forces the coefficient for D to be &minus;2, the other coefficient originally given would still have been wrong. However, avoiding the cost of the multiplication by &minus;32/17 in my version might be the right choice in some implementations, so I want to show what happens if we use &minus;2 for the coefficient of D.

Suppose we want to choose T to minimize the maximum absolute error when calculating $$X_0 = T - 2 \cdot D \,.$$ Then $$\epsilon_0 = D \cdot X_0 - 1 = T \cdot D - 2 \cdot D^2 - 1 \,.$$ So $$\frac{d \epsilon_0}{d D} = T - 4 \cdot D \,,$$ and thus the critical points lie at $$\frac{1}{2}, \frac{T}{4}, 1 \,.$$ If we assume that the absolute error is maximized at $$\frac{T}{4}, 1 \,$$ and has opposite signs, then we get $$0 = [ T \cdot \frac{T}{4} - 2 \cdot {\left( \frac{T}{4} \right)}^2 - 1 ] + [ T \cdot 1 - 2 \cdot 1^2 - 1 ] \,.$$ Thus $$T^2 + 8 \cdot T - 32 = 0 \,$$ and $$T = -4 + \sqrt{ 4^2 - (-32) } = 4 (\sqrt{3} - 1) \approx 2.928 \,.$$

To verify that this is correct, let us calculate $$\epsilon_0 \,$$ at each of the three critical points: if $$D = 0.5 \,,$$ then $$\epsilon_0 = 2.928 \cdot 0.5 - 2 \cdot 0.5^2 - 1 = -0.036 \,.$$ if $$D = 0.732 \,,$$ then $$\epsilon_0 = 2.928 \cdot 0.732 - 2 \cdot 0.732^2 - 1 = 0.072 \,.$$ if $$D = 1 \,,$$ then $$\epsilon_0 = 2.928 \cdot 1 - 2 \cdot 1^2 -1 = -0.072 \,.$$ So the extreme points are as we assumed and the coefficient T that we calculated is correct. Let me repeat that the extremal error is worse for this formula than for the one given in the current version of the article, but it saves one multiplication. JRSpriggs (talk) 08:44, 13 November 2010 (UTC)
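The same check can be done numerically rather than at three hand-picked points (my own grid-sampling sketch, not part of the derivation above):

```c
#include <math.h>

/* Max of |eps0| = |D*X0 - 1| over D in [0.5, 1] for X0 = T - 2*D,
   sampled on a fine grid. For T = 4*(sqrt(3)-1) the maximum should be
   |T - 3| = 7 - 4*sqrt(3), about 0.072, as computed above. */
static double max_abs_eps(double T) {
    double worst = 0.0;
    for (int i = 0; i <= 100000; i++) {
        double D = 0.5 + 0.5 * i / 100000.0;
        double e = fabs(D * (T - 2.0 * D) - 1.0);
        if (e > worst) worst = e;
    }
    return worst;
}
```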


 * This is the problem with people calculating things themselves and putting it into Wikipedia. I have done that for an example but not ever for the bit which purports to give the definitive information. This is unfortunately original research and the bit should be trimmed back to stuff that can be sourced from books. Dmcq (talk) 08:51, 13 November 2010 (UTC)
 * There was no source given for what was there before and it was definitely wrong. JRSpriggs (talk) 09:08, 13 November 2010 (UTC)


 * Then it should just have been removed or a citation needed tag stuck on it. It is possible they were paraphrasing something about it wrong. I looked this up on the web and here's a book that might help with citation. They say the optimal value depends on the number of iterations and that a common way of doing the job is by table lookup. Sounds like they weren't too worried by the multiplication cost which is surprising.


 * Optimal initial approximations for the Newton-Raphson division algorithm by M. J. Schulte, J. Omar and E. E. Swartzlander


 * I did a google on 'Newton–Raphson division initial' and it was at the top. Dmcq (talk) 09:11, 13 November 2010 (UTC)


 * Yes, that appears to be a good source. It has a table of formulas which implies that the coefficients I put into the article for the interval [0.5, 1] are the best ones for minimizing the maximum of the absolute value of the relative error. I was using relative error exclusively. It also implies that the original coefficients are best for minimizing the maximum of the absolute value of the absolute error. So perhaps this is all a tempest in a teapot. I guess the only real issue is which type of error matters. Of course, I think the relative error is what matters.
 * The number of multiplications in the initialization should matter because even one is as costly as half a step of the Newton–Raphson method.
 * I agree that a table look-up using the first byte of the divisor would have a smaller maximum error than any linear approximation, but I do not know how much that would cost.
 * If one multiplication is an acceptable cost, then perhaps we should also consider quadratic approximations where the coefficient of D2 is an integer since those could be done with just one multiplication. This might reduce the error significantly, but I have not worked out the details yet. JRSpriggs (talk) 13:01, 13 November 2010 (UTC)


 * Whether relative or absolute error matters most depends on what you're doing with it. If you're going to add it to something else the absolute error is very possibly going to be most important. That you got 1/10000000 out by 300% mightn't matter at all. Dmcq (talk) 13:31, 13 November 2010 (UTC)


 * Even if all you care about is the absolute error of the final result, it is the relative error of X0 which matters, because the absolute error of Xn is $$X_n - {1 \over D} = {\epsilon_n \over D} = {{- \epsilon _0^{2^n}} \over D} \,$$ which contains 2<sup>n</sup>&minus;1 factors of the relative error of X0 to one factor of its absolute error. JRSpriggs (talk) 08:24, 14 November 2010 (UTC)
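The error-squaring behaviour is easy to confirm in code (my own sketch; the function name is an assumption):

```c
#include <math.h>

/* Newton-Raphson reciprocal iteration X <- X*(2 - D*X).
   Each step maps the relative error eps = D*X - 1 to -eps^2,
   so eps_n = -eps_0^(2^n). */
static double newton_recip(double D, double X, int iters) {
    for (int i = 0; i < iters; i++)
        X = X * (2.0 - D * X);
    return X;
}
```

Starting from D = 0.6, X = 1.5 (so eps0 = -0.1), two iterations leave eps = -1e-4, and five iterations are already at machine precision.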

Looking at table 1 of your reference gave me a brain-storm: I can eliminate the one multiplication without sacrificing precision if I just move the interval within which D lies. If instead of shifting D into [1/2, 1], I shift it into $$[\frac{4}{\sqrt{34}}, \frac{8}{\sqrt{34}}] \,,$$ then $$X_0 = \frac{12}{\sqrt{34}} - D \,$$ which requires no multiplication but still gives $$\vert \epsilon_0 \vert \leq {1 \over 17} \,.$$ JRSpriggs (talk) 08:24, 14 November 2010 (UTC)
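A numerical check of that shifted-interval claim (my own sketch, grid sampling):

```c
#include <math.h>

/* Max relative error |D*X0 - 1| for X0 = 12/sqrt(34) - D when D has been
   scaled into [4/sqrt(34), 8/sqrt(34)] instead of [1/2, 1]. The extrema
   at both endpoints and the midpoint are all exactly 1/17 in magnitude. */
static double max_rel_err_shifted(void) {
    double lo = 4.0 / sqrt(34.0), hi = 8.0 / sqrt(34.0);
    double c = 12.0 / sqrt(34.0);
    double worst = 0.0;
    for (int i = 0; i <= 100000; i++) {
        double D = lo + (hi - lo) * i / 100000.0;
        double e = fabs(D * (c - D) - 1.0);
        if (e > worst) worst = e;
    }
    return worst;
}
```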


 * What I meant was that if one optimises for relative error across the input range then one deoptimises for absolute error. Worrying about 300% relative error for a small quantity will make a large quantity have a larger absolute error which matters for addition but not for multiplication. Dmcq (talk) 11:45, 14 November 2010 (UTC)


 * I understand that for sums and differences, the absolute error of the inputs is more relevant than their relative error. And so also for an integral $$\int_a^b f (x) d x \,$$ the absolute error is what matters in calculating f.
 * However, for the problem at hand, $$X_0 D - 1 \,$$ is more appropriate than $$X_0 - {1 \over D} \,$$ regardless of what we ultimately do with N/D.
 * Do you agree with putting the scheme of my last message into the article? It avoids multiplication and maximizes precision (for linear functions) at the cost of a possible extra comparison and shift. JRSpriggs (talk) 20:12, 14 November 2010 (UTC)

I didn't find best approximation in terms of relative error, but I found $$\frac{3}{2}+\sqrt{2} - 2\cdot D$$ to minimize the maximum absolute error for $$\frac{1}{D}$$ on the interval $$[0.5,1]$$. The error is then $$\frac{3}{2}-\sqrt{2}$$ --MrOklein (talk) 20:43, 19 January 2013 (UTC)

I think the best approximation in terms of relative error is $$\frac{48}{17}-\frac{32}{17} \cdot D$$ for the same function on the same interval. The maximal relative error here is $$\frac{1}{17}$$. --MrOklein (talk) 21:12, 19 January 2013 (UTC)

SRT Division
The distinctive difference between non-restoring and SRT division is incorrectly described as the use of a table. Radix 2 SRT division uses digits +1, 0 and -1, doesn't need any table, and delivers one result bit per iteration. Higher radix SRT dividers (for example radix 4, with digits +2, +1, 0, -1 and -2) need a table, as digit selection depends on the divisor and remainder values (not the dividend, except for the first iteration).

The main advantage of SRT division for electronic circuit design is that an approximate value of the remainder is used for digit selection at each iteration, instead of the exact value. Faster, non-carry-propagating adders can be used. That in turn enables higher precision, shorter cycle time, or the calculation of several bits per cycle, either by using a high radix divider or by combining several radix 2 cells. — Preceding unsigned comment added by 194.98.70.14 (talk) 11:13, 16 June 2011 (UTC)


 * I think we really need a citation for SRT division, if you could suggest a good one that would really help. Dmcq (talk) 11:27, 16 June 2011 (UTC)

Integer division (unsigned) with remainder
This new section stuck at the beginning is basically the same algorithm as the restoring division one. However neither of them is cited and I think the actual code in the restoring division one is a bit clunky. So I think I'll add citation needed to both and then we can keep the one that is cited first and remove the other one. Dmcq (talk) 23:04, 20 February 2012 (UTC)

I really think this section should be removed. It's just a bad implementation of the restoring division algorithm. --108.49.86.122 (talk) 16:35, 24 December 2019 (UTC)

Recent moves
I recently moved Division (digital) to Division algorithm, replacing a former redirect to Euclidean division. In retrospect this was too bold - I belatedly discovered that the term "division algorithm" is used (idiosyncratically) to refer to the theorem outlined at Euclidean division as well as to algorithms that perform division. I've fixed all the links now, and I think it makes sense for the primary topic to refer to the more obvious subject. I think it also makes sense to have a general article about division algorithms, similar to multiplication algorithm, rather than one solely concerned with division algorithms used in digital circuits. Unlike Adder (electronics), dividers are too complex to describe in detail at the logic or gate level, so most of what's here is already pseudocode suitable for implementation in a variety of hardware, software, and other settings. As such few changes were needed except to add some info on basic slow algorithms not normally used in implementations, which I think help in any case to introduce the notation and serve as a behavioural reference for the more complex ones. Feedback is welcome. Dcoetzee 12:31, 23 September 2012 (UTC)

Problem with Newton-Raphson initial estimate
The article claims:
 * $$X_0 = {48 \over 17} - {32 \over 17} D \,.$$
 * Using this approximation, the error of the initial value is less than
 * $$\vert \epsilon_0 \vert \leq {1 \over 17} \approx 0.059 \,.$$

It seems that the error is actually bounded by 2/17, not 1/17. Measuring error at the two endpoints and the single local maximum, we find error is 1/17 at D=1, 2/17 at D=1/2, and $$48/17 - 2\sqrt{32/17}$$ or about 1.36/17 at the local maximum. This would imply the estimate for S may be incorrect (should have $$\log_2 17/2$$ rather than $$\log_2 17$$).

I also attempted a derivation of the optimal linear approximation in terms of max error. The max error for f(x)=a-bx is given by $$\max\{a-2\sqrt{b}, 2-a+b/2, 1-a+b\}$$, assuming it crosses 1/x twice in [0.5, 1]. The planes 2-a+b/2 and 1-a+b have a difference of 1-b/2, so they intersect where b = 2; where b < 2, the plane 2 - a + b/2 is higher, and where b > 2, the plane 1-a+b is higher.

When b < 2 the functions $$a-2\sqrt{b}$$ and $$2 - a + b/2$$ intersect where $$a = b/4 + \sqrt{b} + 1$$, along which max error is $$1 - \sqrt{b} + b/4$$, which is minimized at endpoint b=2, giving the approximation $$f(x) = 3/2 + \sqrt{2} - 2x$$ which has max error of $$3/2 - \sqrt{2}$$ or about 0.0857864.

When b > 2 the functions $$a-2\sqrt{b}$$ and $$1-a+b$$ intersect where $$a = b/2 + \sqrt{b} + 1/2$$, along which max error is $$1/2 (-1 + \sqrt{b})^2$$, which is again minimized at endpoint b=2, giving the same result as before.

In short the optimal linear approximation appears to be $$f(D) = 3/2 + \sqrt{2} - 2D$$ with a max error of $$3/2 - \sqrt{2}$$ or about 0.0857864 < 1/11, as compared to 2/17 or 0.117647 < 1/8 for the one given in the article. Not only is the approximation better, but it saves a multiply since 2D is just a shift. Obviously I can't add this to the article since it's original research, but it does suggest that the approximation we're suggesting is suboptimal. Even rational approximations of this solution are better than the suggested one (e.g. 50/17 - 2D gives a max error of 0.112749, 32/11 - 2D gives 0.0909091, and 102/35 - 2D gives 0.0858586).
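The two max-error figures above are easy to verify by brute force (my own sketch, sampling rather than calculus):

```c
#include <math.h>

/* Max of |a - b*D - 1/D| over D in [0.5, 1]: the worst-case absolute
   error of the linear initial estimate X0 = a - b*D. */
static double max_abs_err(double a, double b) {
    double worst = 0.0;
    for (int i = 0; i <= 100000; i++) {
        double D = 0.5 + 0.5 * i / 100000.0;
        double e = fabs(a - b * D - 1.0 / D);
        if (e > worst) worst = e;
    }
    return worst;
}
```

Evaluating both candidates gives about 2/17 = 0.1176 for 48/17 - 32/17 D and about 0.0858 = 3/2 - sqrt(2) for 3/2 + sqrt(2) - 2D, matching the analysis.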

Note that the average error squared given by the integral $$\int\limits_{1/2}^{1} (f(x)-1/x)^2\, dx \approx 0.00170884$$ is not quite as good as for the currently supplied function which gives 0.00155871, so there is that tradeoff. However if there is no test for convergence and the number of iterations is determined solely by the desired precision, then average error doesn't help. Dcoetzee 05:26, 24 September 2012 (UTC)
 * Okay, I've now found a source "Optimal initial approximations for the Newton-Raphson division algorithm" (1994) that I may be able to use to update the article appropriately. Its treatment is more sophisticated than mine (my approach seems to have been that of ref [10], Swartzlander's "Overview of computer arithmetic" - it turns out limiting max error in the initial estimate doesn't necessarily best limit relative error in the result). Dcoetzee 06:20, 24 September 2012 (UTC)
 * Update I just noticed the thread farther up the page concerning the choice of initialization, and mentioning the same exact paper... silly me not reading the full talk page first. Regardless the article as it stands is in error, not because the initial estimate is a bad one but because it gives the wrong max error and the analysis is not sophisticated enough to explain why it's good - the current simple analysis based on max error of the initial estimate would favour an estimate with better max error. Clearly a more nuanced discussion will be required to inform readers about tradeoffs in the choice of the initial estimate. I'll try to fill it out once I get access to the paper. Dcoetzee 06:34, 24 September 2012 (UTC)

Section Rounding error
I have tagged this section expand-section not only because it is a stub, but mainly because some important material is completely lacking in the article, mainly the problem of correct rounding. The section should mention the IEEE 754 standard and describe how the algorithms have to be modified (and have been) to follow it. Also, and this is related, this section should mention the Intel Pentium FDIV bug. D.Lazard (talk) 09:52, 24 September 2012 (UTC)

Alternate version of non-restoring division
This algorithm for signed integers produces quotient bits of 0 and 1. The dividend has double the number of bits as the divisor, quotient, and remainder. In this example, a 32 bit dividend is divided by a 16 bit divisor, producing a 16 bit quotient and remainder. There are some pre and post processing steps used to handle negative divisors, and clean up of the remainder. The overflow logic is not included.

Rcgldr (talk) 16:39, 22 November 2013 (UTC)

Fast and Slow division
Labelling some algorithms as "slow" and others as "fast" can be a bit misleading, particularly from the perspective of hardware implementation.

There are additive methods (non-restoring, SRT) and multiplicative ones (Newton, Goldschmidt, ...).

The multiplicative methods converge in fewer steps, but these steps are far more complex, because a multiplication is more complex than an addition. The multiplicative methods also require an initial approximation, which incurs some delay and area.

The Goldschmidt method's purpose is also related to multiplication speed. It has some issues with precision and accumulating errors, but it has the benefit of using two independent multiplications per iteration, instead of dependent ones for Newton. This makes it possible to use two parallel multipliers or one pipelined two-cycle multiplier. — Preceding unsigned comment added by Temlib (talk • contribs) 23:06, 24 July 2014 (UTC)
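For reference, the independence of the two multiplies is visible in a minimal Goldschmidt sketch (my own code, assuming D has already been scaled into (0.5, 1]):

```c
/* Goldschmidt division sketch: Q = N/D with 0.5 < D <= 1. Each step
   multiplies both operands by F = 2 - D; the two multiplies are
   independent of each other, unlike Newton's method, where each
   operation feeds the next. D converges quadratically to 1. */
static double goldschmidt(double N, double D, int iters) {
    for (int i = 0; i < iters; i++) {
        double F = 2.0 - D;  /* correction factor */
        N *= F;              /* independent multiply #1 */
        D *= F;              /* independent multiply #2 */
    }
    return N;                /* D ~= 1, so N ~= quotient */
}
```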

Improved unsigned division by constant
I saw this improved algorithm but it isn't peer reviewed

I'm pretty certain it is okay, but I don't feel it passes reliable sources. Or does it, given that ridiculous fish has released an open source library with improved division algorithms and might be known as an authority? Dmcq (talk) 23:46, 25 September 2014 (UTC)


 * Include. This case is interesting wrt WP policies.
 * The article is not peer reviewed, and it looks like a blog. That puts it in the unreliable source category.
 * I'm willing to let such sources in if they reference reliable sources. Unfortunately, this self-published article only self-references.
 * Publishing open-source code does not make one a well-known authority. IIRC, I've challenged links to open-source code where the author had published a few refereed journal articles in the field. I don't see ridiculous fish having a significant presence. We don't know his refereed publication history. If his open-source library is well received, then I would put RF in the well-known authority column for this narrow field. I'm not sure RF qualifies yet.
 * I've challenged self-published works where I do not believe the content. That is an easy case. Blogs often have the author making an observation or stating an opinion. I've even challenged apparently reviewed primary sources that make dubious claims. Here, I believe the content.
 * I've challenged self-published works that state a possible-but-not-clearcut claim and then offer a flawed proof of the claim. Many amateurs do not have sufficient understanding of a field. Here, the author understands the field and offers proof. (I haven't checked the proofs, but I believe they are credible.)
 * Despite the above, there are times that I would accept self-published sources without references. If the author appears knowledgeable about the subject, makes simple claims, and follows those claims with clear explanations, then I will not complain about the source. That seems to be the case here. The author appears knowledgeable about the subject, makes clear statements, and offers proofs. With sufficiently skilled WP editors, the source could fall under WP:CALC. We could verify the claims in the article by checking the proofs and trying the code. The source is not expressing expert opinion where the weight of the opinion relies on the weight of the author's credentials. The claims seem to be verifiable.
 * I'd also invoke WP:IAR here. The material is relevant to the article.
 * Glrx (talk) 19:36, 30 September 2014 (UTC)


 * Another article. [divcnst-pldi94.pdf]. This one appears to be peer reviewed and in addition, it is used by GCC and Microsoft compilers, probably other compilers as well.

I generated X86 assembly code for 16, 32, and 64 bits to confirm the algorithm. In the article, a uword has N bits, a udword has 2N bits, n = numerator, d = denominator = divisor, ℓ is initially set to ceil(log2(d)), shpre is the pre-shift (used before the multiply) = e = number of trailing zero bits in d, shpost is the post-shift (used after the multiply), and prec is the precision = N - e = N - shpre. The goal is to optimize the calculation of n/d using a pre-shift, multiply, and post-shift.
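As a small worked instance of the multiply-and-shift idea (my own sketch, not the paper's full generation algorithm; it shows only the core identity for one constant at N = 16):

```c
#include <stdint.h>

/* Division by the invariant d = 7 via multiply + shift, round-up style.
   With N = 16 and l = ceil(log2 7) = 3: m = ceil(2^(16+3)/7) = 74899,
   and n/7 == (n*m) >> 19 for all 16-bit n, since the error term
   m*d - 2^19 = 5 <= 2^3 stays within the paper's bound. */
static uint16_t div16_by_7(uint16_t n) {
    const uint32_t m = ((1u << 19) + 7u - 1u) / 7u;   /* ceil(2^19 / 7) */
    return (uint16_t)(((uint64_t)n * m) >> 19);
}
```

For d = 7 at N = 32 the multiplier needs N+1 bits, which is exactly the case the figure 4.1/4.2 fixup (or the wider divide shown in the assembly below) has to handle.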

Scroll down to figure 6.2, which defines how a udword multiplier (max size is N+1 bits), is generated, but doesn't clearly explain the process. I'll explain this below.

Figure 4.2 and figure 6.2 show how the multiplier can be reduced to a N bit or less multiplier for most divisors. Equation 4.5 explains how the formula used to deal with N+1 bit multipliers in figure 4.1 and 4.2 was derived.

Going back to Figure 6.2. The numerator can be larger than a udword only when divisor > 2^(N-1) (when ℓ == N), in this case the optimized replacement for n/d is a compare (if n>=d, q = 1, else q = 0), so no multiplier is generated. The initial values of mlow and mhigh will be N+1 bits, and two udword/uword divides can be used to produce each N+1 bit value (mlow or mhigh). Using X86 in 64 bit mode as an example:

; upper 8 bytes of numerator = 2^(ℓ) = (upper part of 2^(N+ℓ))
; lower 8 bytes of numerator for mlow  = 0
; lower 8 bytes of numerator for mhigh = 2^(N+ℓ-prec) = 2^(ℓ+shpre) = 2^(ℓ+e)
numerator dq 2 dup(?)          ; 16 byte numerator
divisor   dq 1 dup(?)          ;  8 byte divisor
; ...
          mov     rcx,divisor
          mov     rdx,0
          mov     rax,numerator+8    ; upper 8 bytes of numerator
          div     rcx                ; after div, rax == 1
          mov     rax,numerator      ; lower 8 bytes of numerator
          div     rcx
          mov     rdx,1              ; rdx:rax = N+1 bit value = 65 bit value

Rcgldr (talk) 05:49, 20 January 2017 (UTC)

Regarding the Newton-Raphson algorithm
In this text :

Express D as M × 2^(e) where 1 &le; M < 2 (standard floating point representation)
D' := D / 2^(e+1)   // scale between 0.5 and 1, can be performed with bit shift / exponent subtraction
N' := N / 2^(e+1)
X := 48/17 - 32/17 × D'   // precompute constants with same precision as D
repeat $$\left \lceil \log_2 \frac{P + 1}{\log_2 17} \right \rceil \,$$ times   // can be precomputed based on fixed P
    X := X + X × (1 - D' × X)
end
return N' × X
 * For example, for a double-precision floating-point division, this method uses 10 multiplies, 9 adds, and 2 shifts.

I'm just curious how many iterations are performed in that case. Since the algorithm does two MULs and two ADDs each iteration, I assume that only five iterations are needed then? Correct? — Preceding unsigned comment added by Eleteria (talk • contribs) 09:35, 28 February 2016 (UTC)


 * I only see 4 iterations; 1 multiply, 1 add, and 2 shifts going in, 4 iters for 8 multiplies and 8 adds, and one multiply coming out. Glrx (talk) 22:45, 3 March 2016 (UTC)
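The multiply/add tally can be checked with a quick sketch of the pseudocode above (Python; `newton_reciprocal` is an illustrative name, and D is assumed already scaled into [1/2, 1]):

```python
import math

def newton_reciprocal(D, P=53):
    """Newton-Raphson reciprocal per the pseudocode above; returns the
    estimate of 1/D and the iteration count for P bits of precision."""
    X = 48/17 - 32/17 * D                        # initial error at most 1/17
    iters = math.ceil(math.log2((P + 1) / math.log2(17)))
    for _ in range(iters):
        X = X + X * (1 - D * X)                  # error squares each iteration
    return X, iters
```

With P = 53 (double precision) the iteration count comes out to 4, matching the tally above: one multiply and add going in, eight multiplies and eight adds in the loop, and one multiply at the end.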

Recursive algorithm?
should you not mention the recursive one, for non-negative integers? div(a,b): a < b => 0, else => 1 + div(a-b,b)  — Preceding unsigned comment added by 87.240.239.70 (talk) 18:37, 23 June 2016 (UTC)


 * This algorithm is correct, but is exponentially slower than long division (the number of steps is equal to the quotient, while for the usual algorithm the number of steps is the number of digits of the quotient). Therefore, it is never used and does not deserve to be mentioned. D.Lazard (talk) 20:28, 23 June 2016 (UTC)
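The step-count difference can be made concrete with a small sketch (illustrative helper names; a binary long division stands in for "the usual algorithm"):

```python
def div_recursive(a, b):
    """Repeated subtraction: one recursive step per unit of the quotient."""
    return 0 if a < b else 1 + div_recursive(a - b, b)

def div_long(a, b):
    """Binary long division: one loop step per bit of the quotient."""
    q = 0
    shift = a.bit_length() - b.bit_length()
    while shift >= 0:
        if a >= (b << shift):      # does b, shifted, fit under the remainder?
            a -= b << shift
            q |= 1 << shift
        shift -= 1
    return q
```

For a quotient near 2^64, the first function takes about 2^64 steps while the second takes about 64, which is the exponential gap described above.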

Newton–Raphson - number of significant bits
In this section it states "while making use of the second expression, one must compute the product between $$X_i$$ and $$(2-DX_i)$$ with double the required precision (2n bits)." It appears to me that the required precision would be n+1 bits. Rcgldr (talk) 18:21, 28 December 2016 (UTC)

The chipdie divide algorithm is, by design, most like multiplication. Each digit of the divisor is prepared by prepending a high digit. Here is a sample.
Let us divide 123 by 45. Compute 123 / 4 to get quotient 30. Then divide 30 by 15: 30 / 15 gives quotient 2. That divides the number; the answer is quotient 2. The only exceptions are divisions by the numbers 11, 111, 1111, ...: if one divides 22 by 11, one gets exactly 2. Now consider how the exact numbers are selected: the sequence of digit-1 prefixes is selected based on the divisor's value. See the sample above: 123 / 4 (the number 4 is a digit of the number 45). After the first, each digit of the divisor is prepared by prepending a digit 1; then the digit 5 becomes 15. This can easily be done by a binary divide algorithm. I would welcome any note on a hardware implementation of "chipdie" divide. 95.78.44.26 (talk) 08:46, 5 April 2017 (UTC) Victor Mineev

Generalization of Newton–Raphson
See Division algorithm. I noticed that one can reduce the number of iterations required if one makes the error converge faster after each iteration. Specifically, let
 * $$ X_{i+1} = X_i ( 1 + \epsilon_i + \epsilon_i^2 + \ldots + \epsilon_i^k ) = \frac{1}{D} ( 1 - \epsilon_i ) ( 1 + \epsilon_i + \epsilon_i^2 + \ldots + \epsilon_i^k ) = \frac{1}{D} ( 1 - \epsilon_i^{k+1} ) .$$

Then
 * $$ \epsilon_{i+1} = \epsilon_i^{k+1} .$$

If we assume that the calculator or computer which we are using does all its multiplications to the same precision and at the same cost, then we can minimize the over-all cost of the required iterations if we choose k such that precision gained per multiplication is maximized. That is, maximize
 * $$ \frac{\log (k+1)}{k+1} $$

since each iteration requires k+1 multiplications and multiplies the precision by k+1. It is easily determined that this occurs when k+1 = 3, i.e. when k = 2. So the method that I am recommending is
 * $$ X_{i+1} = X_i ( 1 + \epsilon_i + \epsilon_i^2 ) = X_i ( 1 + (1 - D X_i) + (1 - D X_i)^2 ) =  X_i ( 3 - 3 D X_i + D^2 X_i^2 ) .$$

OK? JRSpriggs (talk) 06:52, 19 May 2018 (UTC)
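As a sanity check, the recommended k = 2 iteration can be sketched in a few lines (Python; D is assumed already scaled into [1/2, 1], reusing the article's linear initial estimate; `reciprocal_cubic` is an illustrative name):

```python
def reciprocal_cubic(D, iterations=4):
    """Estimate 1/D with the cubically convergent iteration
    X := X * (1 + e + e**2), where e = 1 - D*X is the relative error;
    each iteration replaces e by e**3."""
    X = 48/17 - 32/17 * D          # initial error at most 1/17 on [0.5, 1]
    for _ in range(iterations):
        e = 1 - D * X              # X = (1 - e)/D
        X = X * (1 + e + e * e)    # new X = (1 - e**3)/D
    return X
```

Starting from an error of at most 1/17, two iterations already push the error below 10^-11, consistent with the three multiplications per iteration buying a tripling of the precision.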

Another part of the algorithm which can be improved is the calculation of the initial estimate X0. If we allow two multiplications (instead of just one) in the formula for the initial estimate, then we can use
 * $$ X_0 = \frac{140}{33} + D \cdot (\frac{-64}{11} + D \cdot \frac{256}{99}) = 4.\overline{24} + D \cdot (-5.\overline{81} + D \cdot 2.\overline{58}) \approx \frac{1}{D} $$

for D in the interval [0.5, 1.0]. This gives a maximum absolute value of the error of
 * $$ \vert \epsilon_0 \vert \le \frac{1}{99} = 0.\overline{01} $$

which is considerably better than the 1/17 which we currently are using. In fact,
 * $$ 17^3 = 4913 < 9801 = 99^2 $$
 * $$ 17^{\tfrac{3}{2}} < 99 $$
 * $$ \sqrt{2} < \sqrt[3]{3} < \tfrac{3}{2} .$$

So this is a better use of one extra multiplication than either half a Newton-Raphson iteration or a third of an iteration of the method which I explained in my previous comment here. Finding the coefficients which would give such a good result took considerable effort. So probably any further improvement in the initial estimate should rely on a table look-up. JRSpriggs (talk) 07:18, 22 May 2018 (UTC)
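The claimed 1/99 bound can be checked numerically (a rough grid search, not a proof; names are illustrative):

```python
def X0_quadratic(D):
    """Quadratic initial estimate of 1/D proposed above, for D in [0.5, 1]."""
    return 140/33 + D * (-64/11 + D * 256/99)

# Sample the relative error eps0 = 1 - D*X0 over the interval.
worst = max(abs(1 - D * X0_quadratic(D))
            for D in (0.5 + k / 20000 for k in range(10001)))
```

On this grid the worst error comes out at about 1/99 ≈ 0.0101, attained at the endpoints D = 0.5 and D = 1 (and at two interior points), which is the equioscillation one would expect of a best polynomial approximation.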

If three multiplications are used in getting the initial estimate, then I determined that the best cubic polynomial approximation to 1/D in [0.5, 1.0] would have a maximum absolute value of the error of 1/577. While this looks better than 1/99, it is not enough of an improvement to justify using the additional multiplication for that rather than to do more Newton-Raphson iterations. I used a figure of merit of
 * $$ \log_2 \left( \log_2 \frac{1}{M} \right) $$

where M is the maximum absolute value of the error in the estimate. This measures the number of Newton-Raphson iterations avoided relative to just using an initial estimate of 1. I guess that this is an instance of the law of diminishing returns. JRSpriggs (talk) 14:44, 23 May 2018 (UTC)

I just became aware of Chebyshev polynomials of the first kind and realized that they provide the answer to getting the best initial estimate of 1/D for polynomials of a specified degree. If the degree is to be n, then
 * $$ \epsilon_0 = M \, T_{n+1} (3 - 4D) $$

and thus
 * $$ X_0 = \frac{1}{D} (1 - \epsilon_0) = \frac{1}{D} (1 - M \, T_{n+1} (3 - 4D)) $$

where
 * $$ M = \frac{1}{T_{n+1} (3)} .$$

Well, this just shows that there is nothing new under the Sun, and my ignorance vastly exceeds my knowledge. JRSpriggs (talk) 07:33, 24 May 2018 (UTC)
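The error bounds quoted in the preceding comments fall out of the recurrence for $$T_n$$; a quick check (illustrative code):

```python
def T(n, x):
    """Chebyshev polynomial of the first kind, evaluated via the
    recurrence T_{n+1}(x) = 2*x*T_n(x) - T_{n-1}(x)."""
    a, b = 1, x          # T_0(x) = 1, T_1(x) = x
    for _ in range(n):
        a, b = b, 2 * x * b - a
    return a

# M = 1/T_{n+1}(3) gives the best error for a degree-n initial estimate
# on [0.5, 1]: degree 1 -> 1/17, degree 2 -> 1/99, degree 3 -> 1/577.
```

These reproduce the 1/17, 1/99, and 1/577 figures from the comments above.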

The delta column shows the improvement in the merit per multiplication. The best linear approximation is way better than either the reference (X0 = 1) or the best constant (X0 = 4/3). The best quadratic is enough better than the best linear to justify using another multiplication. Neither best cubic nor best quartic improves enough to justify the extra multiplications. JRSpriggs (talk) 04:58, 25 May 2018 (UTC)

Same improvements to Goldschmidt division
The same improvements which we made to Newton-Raphson division can be made to Goldschmidt division.
 * Shift N and D so that D is in [1/2, 1]
 * $$ F := \frac{140}{33} + D \cdot (\frac{-64}{11} + D \cdot \frac{256}{99}) $$ /* Puts D into [98/99, 100/99].
 * Begin loop
 * $$ N := N \cdot F $$
 * $$ D := D \cdot F $$
 * $$ E := 1 - D $$
 * If E is indistinguishable from 0, then output N and stop.
 * $$ F:= 1 + E + E \cdot E $$ /* Cubes the error E of D.
 * End loop.

OK? JRSpriggs (talk) 08:14, 29 May 2018 (UTC)
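A floating-point sketch of the loop above (assuming N and D are already scaled so D is in [1/2, 1]; the "indistinguishable from 0" test is simplified here to a fixed iteration count, and the name is illustrative):

```python
def goldschmidt_cubic(N, D, iterations=3):
    """Goldschmidt division N/D using the quadratic initial estimate and
    the cubic-convergence factor F = 1 + E + E*E from the comments above."""
    F = 140/33 + D * (-64/11 + D * 256/99)   # puts D*F into [98/99, 100/99]
    for _ in range(iterations):
        N *= F
        D *= F
        E = 1 - D              # D = 1 - E; E is cubed on the next pass
        F = 1 + E + E * E
    return N
```

Since the initial |E| is at most 1/99, three passes drive |E| to about (1/99)^9 ≈ 10^-18, already below double-precision round-off.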

Division through repeated subtraction
It is possible to perform division by repeated subtraction of complementary numbers, but the magic is that the result is gradually built by modifying the dividend (by addition) and accumulating digits (that form the result of the division) to the left of what remains of the dividend; successive digits of the quotient appear from left to right as one proceeds through the algorithm. How this can work is not intuitive and could benefit from a theoretical explanation. (This method was used on adding machines.) Axd (talk) 19:19, 31 July 2018‎ (UTC)


 * Since I do not have such a mechanical adding machine nor a manual for one, I can only guess. I suspect that it used a decimal version of what is here called Division algorithm or something similar to that. JRSpriggs (talk) 21:50, 31 July 2018 (UTC)
 * Perhaps the following may be helpful. If N = D·Q + R, then N + (10^k − D)·Q = Q·10^k + R. For example,
 * 5537 = 72·76 + 65 and 0 ≤ 65 < 72
 * 5537 + 9928·76 = 76·10000 + 65 = 760065
 * So one could begin with 5537 and repeatedly add 99280 to it until one fails to get a carry into the hundred-thousand's place; back up to the previous step 700497; repeatedly add 9928 until one fails to get a carry into the ten-thousand's place; back up to the previous step 760065; and finally separate that into Q = 76 and R = 65.
 * Did that help? JRSpriggs (talk) 00:02, 2 August 2018 (UTC)
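The procedure in that worked example can be sketched directly (an illustrative reconstruction, not a description of any particular machine; it requires N < 10^k so the quotient digits accumulate to the left of the dividend):

```python
def complement_divide(N, D, k, positions):
    """Divide N by D the way the worked example does: since
    N + (10**k - D)*Q == Q*10**k + R, repeatedly add the complement
    (10**k - D), shifted to each quotient-digit position, as long as the
    add carries into the digit just above the shifted complement.
    `positions` is the number of quotient digits to produce."""
    comp = 10**k - D
    acc = N
    for p in range(positions - 1, -1, -1):
        step = comp * 10**p
        while (acc + step) // 10**(k + p) > acc // 10**(k + p):  # carry appears
            acc += step
    return acc // 10**k, acc % 10**k     # (quotient, remainder)
```

Running it on the example above, `complement_divide(5537, 72, 4, 2)` passes through 700497 and 760065 exactly as described and separates into Q = 76 and R = 65; overshooting `positions` is harmless, since a leading position that never carries contributes a zero digit.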

Where is the digital division of real numbers?
To be encyclopedic, the article needs some introduction and links to floating point, Continued fraction, Arbitrary-precision arithmetic, etc.

Summary and Comparison
It would be nice if the article contained a concise comparison of the division methods: are some methods "better" than some others, in some respect? Currently there is very little that would help see how better and better division methods were developed, and what is the state of the art currently. At its present state the article is just a long list of different division algorithms. — Preceding unsigned comment added by 87.92.32.62 (talk) 11:32, 28 April 2019 (UTC)
 * When working with fixed-length numbers (floating point as well as fixed-point arithmetic), the best algorithm strongly depends on the hardware and software technology used. It is thus difficult to give more details in an encyclopedic article. However, when working with arbitrary-length numbers, the final answer has been given, and I have added it to the lead: division and multiplication have the same computational complexity (up to a constant factor), and a faster division algorithm relies on a faster multiplication algorithm. Some more details would be welcome, but we must wait until a competent editor is willing to improve the article. D.Lazard (talk) 13:40, 28 April 2019 (UTC)

Galley division
The article must include details of galley division. —Jencie Nasino (talk) 02:20, 16 August 2019 (UTC)
 * I put a link to that article in the "See also" section. There is no need to describe the method in detail again here. And since it is no longer used by anyone, it is notable only for historical reasons. JRSpriggs (talk) 04:58, 17 August 2019 (UTC)

"Anderson Earle Goldschmidt Powers algorithm" listed at Redirects for discussion
An editor has asked for a discussion to address the redirect Anderson Earle Goldschmidt Powers algorithm. Please participate in the redirect discussion if you wish to do so. D.Lazard (talk) 14:30, 24 October 2019 (UTC)

Recurrence relation at top of "Slow Division Methods"
Where is this relation coming from? I have seen R[i+1] = R[i] + q[i] * D * B^i, but not this one. — Preceding unsigned comment added by Mikecondron (talk • contribs) 19:07, 4 February 2020 (UTC)