User:Wvbailey/Euclid's algorithm

A sandbox page:

Algorithms for computers: an example
In computer systems, an algorithm is an instance of logic written in software by developers to be effective for the intended "target" computer(s), so that the target machines produce output from given input (perhaps null).

"Elegant" programs, "good" (fast) programs: The notion of "simplicity and elegance" appears informally in Knuth and precisely in Chaitin:
 * Knuth: ". . . we want good algorithms in some loosely defined aesthetic sense. One criterion . . . is the length of time taken to perform the algorithm . . . . Other criteria are adaptability of the algorithm to computers, its simplicity and elegance, etc."
 * Chaitin: "I'll show that you can't prove that a program is 'elegant,' by which I mean that it's the smallest possible program for producing the output that it does"

Algorithm versus function computable by an algorithm: For a given function multiple algorithms may exist. This is true even without expanding the instruction set available to the programmer. Rogers observes that "It is . . . important to distinguish between the notion of algorithm, i.e. procedure and the notion of function computable by algorithm, i.e. mapping yielded by procedure. The same function may have several different algorithms".

Unfortunately there may be a tradeoff between goodness (speed) and elegance (compactness) -- an elegant program may take more steps to complete a computation than one less elegant. An example of using Euclid's algorithm will be shown below.

Computers (and computors), models of computation: A computer (or human "computor") is a restricted type of machine, a "discrete deterministic mechanical device" that blindly follows its instructions. Melzak's and Lambek's primitive models reduced this notion to four elements: (i) discrete, distinguishable locations, (ii) discrete, indistinguishable counters; if no confusion will result, the word "counters" can be dropped, and a location can be said to contain a single "number", (iii) an agent, and (iv) a list of instructions that are effective relative to the capability of the agent.

Simulation of an algorithm: computer (computor) language: Knuth advises the reader that "the best way to learn an algorithm is to try it . . . immediately take pen and paper and work through an example". But what about a simulation or execution of the real thing? The programmer must translate the algorithm into a language that the simulator/computer/computor can effectively execute. Stone gives an example of this: when computing the roots of a quadratic equation the computor must know how to take a square root. If the computor does not, then for the algorithm to be effective relative to the capabilities of the computor, the algorithm must provide a set of rules for extracting a square root.

This means that the programmer must know a "language" effective relative to a target computing agent.

Van Emde Boas observes "even if we base complexity theory on abstract instead of concrete machines, arbitrariness of the choice of a model remains. It is at this point that the notion of simulation enters". But what model should be used for the simulation? When speed is being measured, the instruction set matters. For example, the subprogram in Euclid's algorithm to compute the remainder would execute much faster if a "modulus" (division) instruction were available rather than being limited to just subtraction (or worse: limited to the "decrement by 1" of Lambek's "abacus").
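The influence of the instruction set on step count can be made concrete with a minimal Python sketch. The three functions below are hypothetical illustrations rather than code from any cited source; each returns the remainder together with a count of the arithmetic instructions it used, and each assumes positive integer inputs. The comparison r ≥ s is treated as free, which a true abacus model would not allow.

```python
def remainder_by_modulus(l, s):
    """A single 'modulus' instruction yields the remainder directly."""
    return l % s, 1


def remainder_by_subtraction(l, s):
    """Repeatedly subtract s from the remaining length r until r < s."""
    r, steps = l, 0
    while r >= s:
        r -= s
        steps += 1
    return r, steps


def remainder_by_decrement(l, s):
    """Only 'decrement by 1' is available, so each subtraction of s
    costs s single decrements."""
    r, steps = l, 0
    while r >= s:
        for _ in range(s):
            r -= 1
            steps += 1
    return r, steps
```

For l = 3009 and s = 884 all three return the remainder 357, using 1, 3 and 2652 arithmetic instructions respectively.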

Structured programming, canonical structures: Kemeny and Kurtz observe that while "undisciplined" use of GOTOs and IF-THENs can result in "spaghetti code", a programmer can write structured programs using these instructions; on the other hand "it is also possible, and not too hard, to write badly structured programs in a structured language." In particular they mention the DO-LOOP (DO-WHILE) and the IF-THEN-ELSE structures. These are two of the three Böhm-Jacopini canonical structures, the third being the simple DO-THEN sequence. The three canonical structures (DO-THEN, IF-THEN-ELSE, and WHILE-DO) are perhaps augmented with DO-WHILE and CASE. An additional benefit of a structured program is that it lends itself to proofs of correctness using mathematical induction.

Canonical flowchart symbols: A flowchart is a graphical aid to both writing a program and documenting it. Flowcharts always start at the top of a page and proceed down the page. Only four primary symbols are used: the directed arrow showing program flow, the basic rectangle (DO-THEN or SEQUENCE), the diamond (IF-THEN-ELSE), and the dot (OR-tie). The canonical structures are made of these primitive shapes. "Nesting" of sub-structures in a superstructure is permitted only if a single exit occurs from the superstructure. The symbols:
 * DO-THEN, SEQUENCE: A series of assignment-rectangles with an entry and an exit.
 * IF-THEN-ELSE: Diamond with arrow entering top; arrows leave left or right vertex or from bottom and go to SEQUENCEs (perhaps null), terminating at a common "or-tie".
 * Unconditional GOTO: not recommended in structured programming but embedded in the WHILE-DO; a rectangle with a single arrow exiting the bottom and going to the instruction specified.
 * WHILE-DO: IF-THEN followed by a SEQUENCE ending in a GOTO that returns its arrow to the top of the diamond and OR-ties with the arrow entering it.

Euclid’s algorithm


Euclid’s algorithm appears as Proposition II in Book VII ("Elementary Number Theory") of his Elements. Euclid poses the problem: "Given two numbers not prime to one another, to find their greatest common measure". He defines "A number [to be] a multitude composed of units": a counting number, a positive integer not including 0. And to "measure" is to place a shorter measuring length s successively (q times) along longer length l until the remaining portion r is less than the shorter length s. In modern words, remainder r = l - q*s, q being the quotient, or remainder r is the "modulus", the integer-fractional part left over after the division.

For Euclid’s method to succeed, the starting lengths must satisfy two requirements: (i) the lengths must not be 0, AND (ii) the subtraction must be “proper”, i.e. a test must guarantee that the smaller of the two numbers is subtracted from the larger (alternatively, the two can be equal so that their subtraction yields 0).

Euclid's original proof adds a third: the two lengths are not prime to one another. Euclid stipulated this so that he could construct a reductio ad absurdum proof that the two numbers' common measure is in fact the greatest. While Nicomachus' algorithm is the same as Euclid's, when the numbers are prime to one another it yields the number "1" for their common measure. So to be precise the following is really Nicomachus' algorithm.

Computer (computor) language for Euclid's algorithm
Only a few instruction types are required to execute Euclid's algorithm -- some logical tests (conditional GOTO), unconditional GOTO, assignment (replacement), and subtraction.
 * A location is symbolized by upper case letter(s), e.g. S, A, etc.
 * The varying quantity (number) in a location will be written in lower case letter(s) and (usually) associated with the location's name. For example, location L at the start might contain the number l = 3009.

An inelegant program for Euclid's algorithm


The following algorithm is framed as Knuth's 4-step version of Euclid's and Nicomachus', but rather than using division to find the remainder it uses successive subtractions of the shorter length s from the remaining length r until r is less than s. The bold-face headings are adapted from Knuth 1973:2-4:

INPUT:
 * 1 [Into two locations L and S put the numbers l and s that represent the two lengths]: INPUT L, S
 * 2 [Initialize R: make the remaining length r equal to the starting/initial/input length l]: R ← L

E0: [Ensure r ≥ s.]
 * 3 [Ensure the smaller of the two numbers is in S and the larger in R]: IF R > S THEN the contents of L is the larger number so skip over the exchange-steps 4, 5 and 6: GOTO step 7 ELSE swap the contents of R and S.
 * 4 L ← R (this first step is redundant, but will be useful for later discussion).
 * 5 R ← S
 * 6 S ← L

E1:[Find remainder]: Until the remaining length r in R is less than the shorter length s in S, repeatedly subtract the measuring number s in S from the remaining length r in R.
 * 7 IF S > R THEN done measuring so GOTO 10 ELSE measure again,
 * 8 R ← R - S
 * 9 [Remainder-loop]: GOTO 7.

E2: [Is the remainder 0?]: EITHER (i) the last measure was exact and the remainder in R is 0, so the program can halt, OR (ii) the algorithm must continue: the last measure left a remainder in R less than the measuring number in S.
 * 10 IF R = 0 THEN done so GOTO step 15 ELSE continue to step 11.

E3: [Interchange]: The nut of Euclid's algorithm. Use remainder r to measure what was previously the smaller number s; L serves as a temporary location. Swap the contents of R and S.
 * 11 L ← R
 * 12 R ← S
 * 13 S ← L
 * 14 [Repeat the measuring process]: GOTO 7

OUTPUT:
 * 15 [Done. S contains the greatest common divisor]: PRINT S
 * 16 HALT, END, STOP.
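Python has no GOTO, so the listing cannot be transcribed literally; the following is a sketch of the same steps using loops, with the step numbers given in comments. The function name and the input check are additions made for this illustration (the listing itself never halts on a zero input), and step 3 is read as skipping the whole exchange when R > S.

```python
def inelegant_gcd(l, s):
    """Subtractive Euclid/Nicomachus, following steps 1-16 of the listing."""
    if l <= 0 or s <= 0:
        # Assumption added for this sketch: reject inputs on which
        # the listing would compute forever.
        raise ValueError("both numbers must be positive integers")
    r = l                  # step 2
    if not r > s:          # step 3: exchange unless R > S
        l = r              # step 4 (redundant, since R already equals L)
        r = s              # step 5
        s = l              # step 6
    while True:
        while not s > r:   # step 7: measure again until S > R
            r = r - s      # step 8; step 9 returns to step 7
        if r == 0:         # step 10
            return s       # step 15
        l = r              # step 11
        r = s              # step 12
        s = l              # step 13; step 14 returns to step 7
```

With the inputs 3009 and 884, in either order, the function returns 17.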

An elegant program
The following version of Euclid's algorithm requires only 6 core instructions to do what "Inelegant" requires 13 to do; worse, "Inelegant" requires more types of instructions. The flowchart of "Elegant" can be found at the top of this article. The program is written in the (unstructured) Basic language; the instruction LET [ ] = [ ] is the assignment instruction symbolized by ←.

 5 REM Euclid's algorithm for greatest common divisor
 6 PRINT "Type two integers greater than 0"
 10 INPUT A,B
 20 IF B=0 THEN GOTO 80
 30 IF A > B THEN GOTO 60
 40 LET B=B-A
 50 GOTO 20
 60 LET A=A-B
 70 GOTO 20
 80 PRINT A
 90 END

How "Elegant" works: In place of an outer "Euclid loop", "Elegant" shifts back and forth between two "co-loops", an A > B loop that computes A ← A - B, and an A ≤ B loop that computes B ← B - A. This works because, when at last the minuend M is less than or equal to the subtrahend S (Difference = Minuend - Subtrahend), the minuend can become s (the new measuring length) and the subtrahend can become the new l (the length to be measured); in other words the "sense" of the subtraction reverses.
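For comparison with a structured language, the Basic program might be rendered in Python roughly as follows. The function name and the check on the inputs are assumptions added for this sketch; the check follows the program's own prompt ("integers greater than 0"). The two branches of the IF correspond to the two co-loops.

```python
def elegant_gcd(a, b):
    """Structured rendering of the Basic program (lines 20-80)."""
    if a <= 0 or b <= 0:
        # Assumption: enforce the prompt "Type two integers greater than 0".
        raise ValueError("both numbers must be integers greater than 0")
    while b != 0:      # line 20: IF B=0 THEN GOTO 80
        if a > b:      # line 30: IF A > B THEN GOTO 60
            a = a - b  # line 60
        else:
            b = b - a  # line 40
    return a           # line 80: PRINT A
```

For example, elegant_gcd(3009, 884) returns 17, and two equal inputs return that same number after a single subtraction.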

Testing the Euclid algorithms
Does an algorithm do what its author wants it to do? A few test cases usually suffice to confirm core functionality. One source uses 3009 and 884. Knuth suggested 40902, 24140. Another interesting case is the two relatively-prime numbers 14157 and 5950.

But exceptional cases must be identified and tested. Will "Inelegant" perform properly when R > S, S > R, R = S? Ditto for "Elegant": B > A, A > B, A = B? (Yes to all). What happens when one number is zero, or both numbers are zero? ("Inelegant" computes forever in all cases; "Elegant" computes forever when A = 0 and B ≠ 0.) What happens if negative numbers are entered? Fractional numbers? If the input numbers, i.e. the domain of the function computed by the algorithm/program, are to include only non-negative integers (the positive integers together with zero), then the failures at zero indicate that the algorithm (and the program that instantiates it) is a partial function rather than a total function. A notable failure due to exceptions is the Ariane 5 rocket failure.
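Such tests are easy to mechanize. The sketch below is one hypothetical way to do it: the subtractive loop of "Elegant" is run under a step budget (the name elegant_with_budget and the limit of 10,000 iterations are arbitrary choices made here), so that an input on which it would compute forever returns None instead. Python's standard math.gcd serves as the reference answer.

```python
import math


def elegant_with_budget(a, b, max_steps=10_000):
    """Run the subtractive loop, giving up after max_steps iterations."""
    for _ in range(max_steps):
        if b == 0:
            return a
        if a > b:
            a -= b
        else:
            b -= a
    return None  # budget exhausted: treated here as "does not halt"


# Core cases from the text, in both orders, plus an equal pair.
for x, y in [(3009, 884), (40902, 24140), (14157, 5950), (7, 7)]:
    assert elegant_with_budget(x, y) == math.gcd(x, y)
    assert elegant_with_budget(y, x) == math.gcd(x, y)

# Exceptional cases involving zero.
print(elegant_with_budget(0, 5))  # None: subtracting 0 never reduces B
print(elegant_with_budget(5, 0))  # 5
print(elegant_with_budget(0, 0))  # 0
```

With both inputs zero the B = 0 test succeeds immediately, so the loop halts with 0; only a zero in A paired with a nonzero B exhausts the budget.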

Proof of program correctness by use of mathematical induction: Knuth demonstrates the application of mathematical induction to an "extended" version of Euclid's algorithm, and he proposes "a general method applicable to proving the validity of any algorithm". Tausworthe proposes that a measure of the complexity of a program be the length of its correctness proof.

Measuring and improving the Euclid algorithms
Elegance (compactness) versus goodness (speed): With only 6 core instructions, "Elegant" is the clear winner compared to "Inelegant" at 13 instructions. However, "Inelegant" is faster (it arrives at HALT in fewer steps). Algorithm analysis indicates why this is the case: "Elegant" does two conditional tests in every subtraction loop, whereas "Inelegant" only does one. As the algorithm (usually) requires many loop-throughs, on average much time is wasted doing a "B = 0?" test that is needed only after the remainder is computed.
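The difference in conditional tests can be checked by instrumenting both programs. The following Python sketch counts only the conditional tests each version executes (not assignments or GOTOs); the function names are hypothetical, the loops are structured stand-ins for the GOTO listings, and positive integer inputs are assumed.

```python
def elegant_tests(a, b):
    """Return (gcd, number of conditional tests) for the 'Elegant' scheme."""
    if a <= 0 or b <= 0:
        raise ValueError("positive integers assumed")
    tests = 0
    while True:
        tests += 1          # the "B = 0?" test
        if b == 0:
            return a, tests
        tests += 1          # the "A > B?" test
        if a > b:
            a -= b
        else:
            b -= a


def inelegant_tests(l, s):
    """Return (gcd, number of conditional tests) for the 'Inelegant' scheme."""
    if l <= 0 or s <= 0:
        raise ValueError("positive integers assumed")
    tests = 1               # the initial "R > S?" ordering test
    r = l
    if not r > s:
        r, s = s, r         # exchange; Python needs no temporary location
    while True:
        while True:
            tests += 1      # the "S > R?" test of the remainder loop
            if s > r:
                break
            r -= s
        tests += 1          # the "R = 0?" test
        if r == 0:
            return s, tests
        r, s = s, r         # interchange
```

For 3009 and 884 both functions find 17; the "Elegant" scheme makes 35 conditional tests and the "Inelegant" scheme 26.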

Can the algorithms be improved?: Once the programmer judges a program "fit" and "effective" -- that is, it computes the function intended by its author -- then the question becomes, can it be improved?

The compactness of "Inelegant" can be improved by the elimination of 5 steps. Observe that steps 4, 5 and 6 are repeated in steps 11, 12 and 13. Comparison with "Elegant" provides a hint that these steps together with steps 2 and 3 can be eliminated. This reduces the number of core instructions from 13 to 8, which makes it "more elegant" than "Elegant" at 9 steps (a count that includes the INPUT, PRINT and END lines of "Elegant" along with its 6 core instructions).
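One reading of this reduction, offered as a sketch rather than as the author's program, is that the two numbers are input directly into R and S and the remainder loop is entered at once. If r is initially smaller than s, the first pass through the remainder loop subtracts nothing and the interchange then puts the numbers in order. The function name is hypothetical and positive integers are assumed.

```python
def inelegant_compact(r, s):
    """'Inelegant' with steps 2-6 removed; inputs go straight into R and S."""
    if r <= 0 or s <= 0:
        raise ValueError("positive integers assumed")
    while True:
        while not s > r:   # step 7
            r = r - s      # step 8; step 9 returns to step 7
        if r == 0:         # step 10
            return s       # step 15
        r, s = s, r        # steps 11-13; step 14 returns to step 7
```

Called as inelegant_compact(884, 3009), the first remainder loop exits immediately, the interchange leaves 3009 in r and 884 in s, and the computation then proceeds as before to return 17.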

The speed of "Elegant" can be improved by moving the B=0? test outside of the two subtraction loops. This change calls for the addition of 3 instructions (B=0?, A=0?, GOTO). Now "Elegant" computes the example-numbers faster; whether for any given A, B and R, S this is always the case would require a detailed analysis.
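One possible arrangement of this idea is sketched below in Python; it is not necessarily the author's arrangement. Each co-loop runs to completion before B is tested for zero. In this particular arrangement A can never reach 0 when the inputs are positive, so a single "B = 0?" test suffices; the function name and the test counter are additions made for illustration.

```python
def elegant_improved(a, b):
    """Return (gcd, conditional tests) with the B = 0 test outside the loops."""
    if a <= 0 or b <= 0:
        raise ValueError("positive integers assumed")
    tests = 0
    while True:
        while True:         # A-loop: subtract B from A while A > B
            tests += 1
            if not a > b:
                break
            a -= b
        while True:         # B-loop: subtract A from B while B >= A
            tests += 1
            if not b >= a:
                break
            b -= a
        tests += 1          # the "B = 0?" test, once per pair of loops
        if b == 0:
            return a, tests
```

For 3009 and 884 the sketch performs 23 conditional tests. The original "Elegant" performs two tests per subtraction plus a final one, which for the same inputs (17 subtractions) comes to 2 × 17 + 1 = 35.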

For Euclid, the first requirement (that neither length be 0) was tacit and not discussed: a length of 0 would not meet the definition of “a number”, i.e. a multitude of units. However, Nicomachus’ example hints at the problem of a remainder equal to “zero”; in fact the equivalent is what terminates his algorithm. Heath 1908:300 reports (quoting Nicomachus) “ ‘. . . but 7 cannot be subtracted from 7.’ The last phrase is curious, but the meaning of it is obvious, as also the meaning of the phrase about ending 'at one and the same number' ".

With regard to the second requirement (that the subtraction be proper), Euclid begins his proof with the possibility that at the outset the two lengths are the same. Indeed if this is the case then the method ends. The computer algorithm will have to account for this possibility.