A Uni ed Approach to HGCD Algorithms for polynomials and integers

A Uni�ed Approach to HGCD Algorithms forpolynomials and integersKlaus Thull and Chee K. Yap�Freie Universit�at BerlinFachbereich MathematikArnimallee 2-6D-1000 Berlin 33West GermanyMarch, 1990AbstractWe present a uni�ed framework for the asymptotically fast Half-GCD(HGCD) algorithms, based on properties of the norm. Two other bene�tsof our approach are (a) a simpli�ed correctness proof of the polynomialHGCD algorithm and (b) the �rst explicit integer HGCD algorithm. Theinteger HGCD algorithm turns out to be rather intricate.Keywords: Integer GCD, Euclidean algorithm, Polynomial GCD, Half GCDalgorithm, e�cient algorithm.�On leave from the Courant Institute, New York University, 251 Mercer Street, New York,NY 10012. This author is supported by a grant from the German Science Foundation (DFG),the ESPRIT II Basic Research Action of the E.C. under contract No.3075 (Project ALCOM),and also NSF Grant #DCR-84-01898 and #CCR-87-03458.0

1 IntroductionThe �rst subquadratic algorithm for integer GCD came from Knuth (1970) [6]who described an O(n log5 n log logn) algorithm. The ideas go back to Lehmer[8] ([7, p.328]). Sch�onhage (1971) [11] improved it into an O(n log2 n log logn)algorithm. Moenck [9] adapted Sch�onhage's algorithm into an O(n log2 n) al-gorithm for polynomial GCD. But it's correctness was restricted to input poly-nomials that form \normal remainder sequences". The widely cited book ofAho, Hopcroft and Ullman [1] gives a false proof of correctness for Moenck'salgorithm. The Moenck-Aho-Hopcroft-Ullman algorithm is erroneous for non-trivial reasons, and is still not completely understood { we discuss this in anappendix. Brent-Gustavson-Yun [2] published a little-known1 corrected algo-rithm, but without proof of correctness. Since correctness is not obvious, andthe approach of [1] seems to be less than transparent (considering how its in-correctness had eluded detection), it seems desirable to have for the record anincontrovertible algorithm. We provide a such correctness proof here, which webelieve is new and simple.One bene�t of our uni�ed approach is that it almost gives two algorithms forthe price of one: we will present a parallel development for an integer HGCDalgorithm. In fact, there has not been a previous Half-GCD algorithm, and theone here has various subtleties. Finally, our framework can perhaps be extendedto other Euclidean domains (recall that GCD's are more generally de�ned inunique factorization domains) for which there are currently no subquadraticGCD algorithms (e.g., Gaussian integers). Currently, the most general GCDalgorithms are due to Kaltofen and Rolletschek [5]; their GCD algorithms workin quadratic integer domains which need be Euclidean.One caveat is that although the algorithms of Sch�onhage and Moenck, et alare worst-case asymptotically the fastest for their respective domains, it seemsthat modular methods are favored in practice. Also, the use of probabilistic al-gorithms (see Sch�onhage [12], Kaltofen [4]) may be more practical. There is alsoan entirely di�erent development of GCD algorithms over the ring D[x] whereD is an integral domain [3]. Finally, Strassen [13] has shown that the methodof Sch�onhage is asymptotically optimal with a bound of �(n logn), using adi�erent model of computation (counting only non-scalar multiplications).Notations. In this paper, we are interested in two domains, the ring of inte-gers and the polynomial ring K[x] over some �eld K. We let D denote eitherof these domains. We use the in�x binary operators div and mod to denotethe quotient and remainder functions (see below for precise de�nition) on D.As expected, the following equation holds: a = (a div b) � b+ (amod b); wherea; b 2 D, b 6= 0. The (Euclidean) remainder sequence for a; b is the sequence1In fact, we had independently developed our algorithm in ignorance of their result. Anearlier paper of Yun [14] also noted an error in the algorithm of [1] but there he seems to bereferring to what we called the \easily-�xed" error in the appendix.1

(a0; a1; : : : ; ak) of non-zero elements where ai+1 = ai�1 mod ai (i = 1; : : : ; k�1)and 0 = ak�1 mod ak.Half-GCD. Let us brie y recall that algorithm of Moenck is based on the`Half-GCD' function, HGCD(a; b). By de�nition, this function on input poly-nomials a; b 2 K[x] (deg(a) > deg(b) � 0), outputs the 2 by 2 matrix R suchthat if � cd � = R� ab � thendeg(c) � ddeg(a)=2e > deg(d):Once we have a Half-GCD algorithm, it is a simple matter to implement analgorithm for the GCD running in the same asymptotic time complexity (see[1]). An Integer HGCD function can also be de�ned (see below) with similarproperties. So in the rest of this paper, we focus on the Half-GCD problem.2 Some properties of the NormWe review some familiar properties of polynomial and integer that form thebasis for the HGCD algorithms. Let D be either the ring of integers or thepolynomial ring K[x] over a �eld K.De�ne the norm jjajj of a 2 D to be log2 jaj in case a is an integer, and deg(a)in case a is a polynomial. Hencejjajj 2 f�1g [R�(R� is the set of non-negative real numbers) with jjajj = �1 i� a = 0. This normis an (additive) valuation in the usual sense that the following two additionalproperties hold: jjabjj = jjajj+ jjbjjand jja+ bjj � 1 +maxfjjajj; jjbjjg:However, polynomials satisfy the stronger non-Archimedean property:jja+ bjj � maxfjjajj; jjbjjg:It is the non-Archimedean property that makes polynomials relatively easierthan integers. This property implies that jja+ bjj = maxfjjajj; jjbjjg if jjajj 6= jjbjj.The division property for D relative to the norm jj � jj holds: for any a; b 2 D,b 6= 0, there exists q; r 2 D such thata = qb+ r; jjrjj < jjbjj:We call r a remainder of the pair (a; b). In the polynomial case, the remainderis unique. For integers, a zero remainder is unique. Otherwise, (a; b) has two2

remainders, one positive and one negative. In this paper, we de�ne the functiona mod b to always pick the non-negative remainder. In the polynomial case,we have jjamod xmjj � minfjjajj;m� 1g (1)jja div xmjj = 8<: jjajj �m if jjajj � m�1 else. (2)We let gcd(a; b) denote the greatest common divisor of a; b 2 D. Since thegreatest common divisor is de�ned up to associates, we specify gcd(a; b) to benon-negative for integers and monic for polynomials.In this paper, `matrices' and `vectors' are understood to be 2 by 2 matricesand column 2-vectors, respectively. A matrix of the form Q = � q 11 0 �, wherejjqjj > 0 in the case of polynomials, and q > 0 in the case of integers, is said to beelementary, and q is called the partial quotient of Q. We also write Q = hqi inthis case. A regular matrix Q = Q1 � � �Qk is a product of some k � 0 elementarymatrices. If Qi = hqii for all i then Q is also denoted hq1; : : : ; qki. When k = 0,Q is the identity matrix, denoted by E or hi. Regular matrices satisfy thefollowing ordering property: If Q = � p qr s � is regular, thenQ 6= E ) jjpjj � maxfjjqjj; jjrjjg � minfjjqjj; jjrjjg � jjsjj; jjpjj > jjsjj: (3)For vectors U; V and matrix Q, we writeU Q�! V(or simply, U �! V ) and say V is a reduction of U (via Q) if U = QV whereQ is a regular matrix. If, in addition, U = � ab � ; V = � a0b0 � such thatjjajj > jjbjj and jja0jj > jjb0jj, then we say this is an Euclidean reduction. Clearly ifU Q�! V and V Q0�! W then U QQ0�! W . Observe that Euclid's algorithm for apair a; b 2 D (jjajj > jjbjj) can be regarded as a sequence of Euclidean reductions.Note that regular matrices are unimodular (determinant �1), and hence theyare trivial to invert. We say U; V are equivalent if U = QV for some unimodularmatrix Q. If U = � ab � then we write gcd(U) for the GCD of a and b. It iseasy to see that U and V are equivalent if and only if gcd(U) = gcd(V ).We use the following fact about (Euclidean) remainder sequences:Fact 1 Given a; b; a0; b0 such that jjajj > jjbjj � 0. The following are equivalent.(i) a0; b0 are consecutive elements in a remainder sequence of a; b.3

(ii) There is a regular matrix Q such that� ab � Q�! � a0b0 � (4)and either jja0jj > jjb0jj � 0 (polynomial case) or or a0 > b0 > 0 (integer case).Proof: If (i) holds then we can (by Euclid's algorithm) �nd some regular matrixQ satisfying (ii). Conversely assume (ii). We show (i) by induction on thenumber of elementary matrices in the product Q. The result is immediate ifQ = E. If Q is elementary, then (i) follows from the division property for D.Otherwise let Q = Q00Q0 where Q0 is elementary and Q00 is regular. Then forsome a00; b00, � ab � Q00�! � a00b00 � Q0�! � a0b0 � :But a00 = a0q0 + b0; b00 = a0 where q0 is the partial quotient of Q0. We verifythat this means jja00jj > jjb00jj. By induction, a00; b00 are consecutive elements in aremainder sequence of a; b. Then (i) follows. Q.E.D.Note that we require b0 6= 0 in (ii). If b0 6= 0 then a0 = gcd(a; b).3 The Polynomial CaseWe describe a fast polynomial HGCD algorithm and prove its correctness. Thecorrectness criteria for the HGCD algorithm is that on input a; b 2 K[x] withdeg(a) > deg(b) � 0, the output is a regular matrix Q such that� ab� Q�! � c0d0� ; deg(c0) � �deg(a)2 � > deg(d0): (5)

4

Algorithm Polynomial HGCD(a; b):[1] m ldeg(a)2 m ; fThis is the magic thresholdgif deg(b) < m then return(E);[2] a0 a div xm; b0 b div xm;fnow deg(a0) = m0 where m+m0 = deg(a)gR HGCD(a0; b0);flm02 m is the magic threshold for this recursive callg� a0b0 � R�1� ab �;[3] if deg(b0) < m then return(R);[4] q a0 div b0; � cd � � b0a0 mod b0 �;[5] l deg(c); k 2m� l; fnow l �m < lm02 mg[6] c0 c div xk ; d0 d div xk ; fnow deg(c0) = 2(l �m)gS HGCD(c0; d0); fl�m is magic thresholdfor this recursive call. If � cd � = S� c0d0 � thendeg(c0) � m and deg(d0) < m.g[7] Q R � � q 11 0 � � S; return(Q);The following correctness criteria is central to our analysis. First we set upthe notations:Let a; b 2 K[x], jjajj > jjbjj � 0 and m � 1 be given. As in the HGCDalgorithm above, let a0 = a div xm and b0 = b div xm. This determines a1; b1via the equation� ab � = � a0xm + a1b0xm + b1 � = � a0 a1b0 b1 �� xm1 � : (6)Now let Q be any given regular matrix. This determines a00; b00; a01; b01 via� a00 a01b00 b01 � = Q�1� a0 a1b0 b1 � : (7)Finally, de�ne a0; b0 via� a0b0 � = � a00 a01b00 b01 �� xm1 � : (8)Note that � a0b0 � Q�! � a00b00 � ; � ab � Q�! � a0b0 � :5

Lemma 1 (Correctness Criteria) Let a; b;m;Q be given as above, and de-�ne the remaining notations ai; bi; a0i; b0i; a0; b0 (i = 0; 1) as indicated. Ifjja00jj > jjb00jj; (9)jja0jj � 2jja00jj (10)then jja0jj = m+ jja00jj; (11)jjb0jj � m+maxfjjb00jj; jja0jj � jja00jj � 1g: (12)In particular, jja0jj > jjb0jj.Proof: Let Q = � p qr s �. First observe that equation (9) and a0 = a00p+ b00qimplies jja0jj = jja00jj+ jjpjj. Hence equation (10) is equivalent tojjpjj � jja00jj: (13)Since Q�1 = �� s �q�r p � and a01 = �(a1s� b1q),jja01jj � maxfjja1sjj; jjb1qjjg < m+ jjpjj � m+ jja00jjSince a0 = a00xm + a01, we have proven equation (11).From b01 = �(�a1r + b1p) we get jjb01jj � m� 1 + jjpjj = m� 1 + jja0jj � jja00jj.Since b0 = b00xm + b01, equation (12) follows. Q.E.D.Remarks The requirement (9) is understandable from Fact 1. But we callthe requirement (10) the (lower) \threshold" for jja00jj. This threshold is thereason for the lower bound on deg(c0) in the HGCD output speci�cation (5).The algorithm of Moenck-Aho-Hopcroft-Ullman fails because their algorithmdoes not respect this threshhold bound.We are ready to prove the correctness of the HGCD algorithm.Lemma 2 (HGCD Correctness) Algorithm HGCD is correct: with inputpolynomials a; b where jjajj > jjbjj � 0, it returns a regular matrix Q satisfy-ing (5).Proof: The algorithm returns a matrix in steps [1], [3] or [7]. It is clear thatin each case the returned matrix is regular. So it remains to check (5). In step[1], the result is clearly correct.Consider the matrix R returned in step [3]: the notations m; a0; b0; a0; b0 inthe algorithm conforms to those in lemma 1, after substituting R for Q. Byinduction hypothesis, the matrix R returned by the �rst recursive call (step [2])satis�es jja00jj � djja0jj=2e > jjb00jj6

where � a0b0 � R�! � a00b00 �. Then lemma 1 implies jja0jj = m + jja00jj � m +djjajj=2e. Since m > jjb0jj is the condition for exit at step [3], it follows that (5)is satis�ed on exit at step [3].Finally consider the matrix Q returned in step [7]. With the notations ofsteps [6] and [7], it is clear that � ab � Q�! � c0d0 �. So it su�ces to showjjc0jj � m > jjd0jj. Since we did not exit in step [3], we have m � jjb0jj. In step [4]we form the quotient q and remainder d of a0 divided by b0. Also we renamedb0 to c. Hence m � l where l = jjcjj. In the second recursive call to HGCD,HGCD(c0; d0) returns S. By induction,jjc00jj � djjc0jj=2e > jjd00jj (14)where � c0d0 � S�! � c00d00 � :But jjc0jj = l � k = 2(l�m) so (14) becomesjjc00jj � l�m > jjd00jj:Now let � cd � S�! � c0d0 �. Another application of lemma 1 (substituting kfor m, S for Q, c for a, d for b, etc) shows thatjjc0jj = k + jjc00jj � k + l �m = mand jjd0jj � k +maxfjjd00jj; jjc0jj � jjc00jj � 1g� k +maxfl �m� 1; l�m� 1g= k + l �m� 1 = m� 1:This shows jjc0jj � m > jjd0jj and hence (5). Q.E.D.The proof shows that we could have used k = 2m� l� 1 as well. The com-plexity of the HGCD algorithm here is O(n log2 n) following the usual analysis[1].4 The Integer CaseLet now D be the set of integers. We develop an integer version of the HGCDalgorithm. Two simple but crucial tricks are used. The �rst is to recover thenon-Archimedean property thus: for a; b; c 2 D,a = b+ c; bc � 0 =) jjajj � maxfjjbjj; jjcjjg:7

In the HGCD algorithm for polynomials a; b, we recursively call HGCD ona div xm; b div xm for a suitable m. The integer analogue would be to callHGCD on a div 2m; b div 2m. Instead, the second trick will call HGCD ona0 := 1 + (a div 2m); b0 := b div 2m: (15)Setup. The following notations will be �xed for this section, especially for thenext two lemmas.Assume that we are given a > b > 0 and m � 1 where a � 2m. Thisdetermines the values a0; a1; b0; b1 via� ab � = � a0 �a1b0 b1 �� 2m1 � ; 0 < a1 � 2m; 0 � b1 < 2m: (16)It is worth noting that both tricks are incorporated in (16). De�ning a0; b0 asin (15) is the same as choosing a1 := 2m� (amod 2m) and b1 := bmod 2m.This choice ensures a0 > b0, as required when recursively calling the algorithmon a0; b0.We are also given a regular matrix Q. This determines the valuesa00; b00; a01; b01; a0; b0 via � a00 a01b00 b01 � = Q�1� a0 �a1b0 b1 � (17)and � a0b0 � = � a00 a01b00 b01 �� 2m1 � : (18)Hence we have � a0b0 � Q�! � a00b00 � ; � ab � Q�! � a0b0 � :Finally, we assume two key inequalities:a00 > b00 � 0 (19)2jja00jj � 1 > jja0jj (20)Now write Q asQ = � p qr s � ; Q�1 = D� s �q�r p � (21)where D = detQ = �1. From (17) we obtaina01 = �D(sa1 + qb1); b01 = D(ra1 + pb1): (22)The proof below uses (22) to predict the signs of a01; b01, assuming the sign of D.This is possible thanks to the second trick.The following is the integer analogue of lemma 1:8

Lemma 3 (Partial Correctness Criteria)Let a; b;m;Q be speci�ed as above.({) Suppose detQ = �1.({a) jja0jj = m+ jja00jj+ �1, (0 � �1 < 1).({b) jjb0jj � m+maxfjjb00jj; jja0jj � jja00jj+ 1g.Moreover, jja0jj > jjb0jj.(+) Suppose detQ = +1.(+a) jja0jj � m+ jja00jj.(+b) jjb0jj � 1 +m+maxfjjb00jj; jja0jj � jja00jj+ 1g.Furthermore b0 � 0.In both cases ({) and (+), a0 > 0.Proof: Since a0 = pa00 + qb00, Q has the ordering property (3), and (19) holds,we get jja0jj = jjpjj+ jja00jj+ �2 (0 � �2 < 1):Hence (20) is equivalent to jjpjj+ �2 < jja00jj � 1:We now prove cases ({) and (+) in parallel.Part (a). From (22),jja01jj � maxfjjsa1jj; jjqb1jjg+ 1� jjpjj+m+ 1 (by (3))< jja00jj+m:Hence jja002mjj > jja01jj and so a0 = a002m+a01 > 0. This is as stated in the lemma.If D = �1 then jja0jj = m+ jja00jj+ �1 for some 0 � �1 < 1. This proves subcase({a). If D = +1 then a01 � 0 (by (22)) and subcase (+a) follows (but now wehave no lower bound on jja0jj).Part (b). Again from (22),jjb01jj � maxfjjra1jj; jjpb1jjg+ 1� jjpjj+m+ 1� jja0jj � jja00jj+m+ 1:In case D = +1, b01 � 0 and hence b0 = b002m + b01 � 0, as desired. Also subcase(+b) easily follows. In caseD = �1, b00b01 � 0 and we have the non-Archimedeaninequality: jjb0jj � maxfjjb002mjj; jjb01jjg:This proves subcase ({b).Finally, we must show that D = �1 implies jja0jj > jjb0jj: this follows imme-diately from (19), (20) and subcases ({a) and ({b). Q.E.D.To motivate the next lemma, we now state:9

Correctness Criteria: on input a > b � 0, the HGCD algorithm outputs aregular matrix Q such thata � 7) Q = E (23)a � 8) jja0jj � 1 + � jjajj2 � > jjb0jj; a0 > b0 � 0 (24)where � ab � Q�! � a0b0 � :Note that 8 is the smallest value of n0 such that for all a � n0, if a > b � 0then there exists a Q satisfying (24).Let us see how the integer analogue of the polynomial HGCD might proceed.As in the polynomial case, our �rst recursive call is HGCD(a0; b0), where m isnow chosen to be 1 + djjajj=2e and a0; b0 are given by (15). If the call returnsQ then de�ne a0; b0 as in (17) and (18). The correctness criteria (24) can bebroken down into four conditions: a0 > 0 (25)jja0jj � m (26)m > jjb0jj (27)b0 � 0 (28)Assuming the parameters are properly set up, it is not hard to see that thepartial correctness lemma could be applied to ensure conditions (25) and (27).Unfortunately, depending on the sign of detQ, the lemma can only ensure oneor the other of conditions (26) and (28). If detQ = �1 then we do have a lowerbound on jja0jj, so condition (26) could be ful�lled, but we have no informationon the sign of b0. If detQ = +1 we ful�ll condition (28) but have no lower boundon jja0jj.Hence we need a analysis deeper than in the above \partial" correctnesscriteria. It turns out that we only have to modify Q slightly to obtain someregular matrix Q� such that � ab � Q��! � a�b� �and a�; b� satisfy the correctness criteria, jja�jj � m > jjb�jj, a� > b� � 0. The\�xup lemma" below shows how to do obtain this.First we make some easily veri�ed observations (cf. basic properties of reg-ular continued fractions, e.g., in [10]). It is convenient to lethq1; : : : ; qki;where qi's are positive integers, denote the regular matrix whose partial quotientsequence is (q1; : : : ; qk). 10

� Let Q = hq1; q2; : : : ; qki be the matrix such that� ab � Q�! � a0b0 � :Call T = � 1 10 �1 � the \toggling" matrix. This is so-called because Tis idempotent (T 2 = E) and QT is equal to hq1; : : : ; qk�1; qk�1; 1i in caseqk > 1, and QT = hq1; : : : ; qk�2; qk�1 + 1i in case qk = 1 and k > 1. (Ifqk = 1 and k = 1, QT is not a regular matrix.) In either case, we have� ab � QT�! � a0 + b0�b0 � :� Below we will need to recover the last partial quotient x from a regularmatrix Q = � p qr s �. Note that Q = E if and only if q = 0, but in thiscase x is unde�ned. Hence assume Q 6= E. Then Q is elementary if andonly if s = 0, and in this case x = p. So we next assume that Q is notelementary. WriteQ0 = � p0 q0r0 s0 � ; Q = Q0 �� x 11 0 �where Q0 6= E and p = xp0 + q0; q = p0. There are two cases. Case ofq = 1: Clearly p0 = 1. Since p0 � q0 � 1 (Q0 is not elementary), we musthave q0 = 1. Hence x equals p � 1. Case of q > 1: Then p0 > q0 (thereare two possibilities to check, depending on whether Q0 is elementary ornot). This implies x = p div q. In summary, the last partial quotient ofQ is given by x = 8>><>>: unde�ned if q = 0p if s = 0p� 1 if q = 1p div q otherwiseLemma 4 (Fixing Up)Retaining the notations from the previous lemma and assuming (16)-(21), let tbe any number (the \�xup threshold") such thatjja00jj � t > maxfjjb00jj; jja0jj � jja00jj+ 1g: (29)Moreover, if we write Q as hq1; : : : ; qki and Q� is as speci�ed below, thenjja�jj � m+ t > jjb�jj (30)and b� � 0; (31)where � ab � Q��! � a�b� �. Here Q� is the regular matrix speci�ed as follows:11

({) Suppose detQ = �1.({A) If b0 � 0 then Q� := Q.({B) Else if jja0 + b0jj � m+ t then Q� := QT .({C) Else if the last partial quotient qk � 2 then Q� := hq1; : : : ; qk�1; qk�1i.({D) Else Q� := hq1; : : : ; qk�3; qk�2i.(+) Suppose detQ = +1.(+A) If jja0jj � m + t then there is a unique regular matrix ~Q that is aproduct of at most two partial quotients such that� a0b0 � ~Q�! � a�b� �and a�; b� satisfy (30) and (31). Let Q� := Q ~Q.(+B) Else let Q� := hq1; : : : ; qk�1i.Proof: Observe from (19) and (20) that t exists. It is useful to note the follow-ing: (*) Suppose that jja00jj � 1 + djja0jj=2e > jjb00jj (cf. (24)) and t =1 + djja0jj=2e. Then both (20) and (29) are satis�ed.In the following proof, we often appeal to (*) to enforce the conditions (20)and (29) required for the application of the partial correctness lemma.First assume D = �1.Subcase ({A) In this subcase, (31) is automatic, and (30) follows from case({) of the partial correctness lemma.Subcase ({B) In this subcase, we are simply \toggling" and, as noted earlier,� a�b� � = � a0 + b0�b0 �. Recall that toggling is invalid in case k = 1 and q1 = 1.But if this were the case then� ab � = � 1 11 0 �� a0b0 � :This implies a0 + b0 = a > b = a0 and so b0 > 0, contradicting the fact that wehad already excluded subcase ({A). Again, (31) is immediate, and (30) followsfrom case ({) of the partial correctness lemma (and jja�jj = jja0 + b0jj � m+ t byassumption).Subcase ({C) In this subcase, Q� can be written as Q� 1 0�1 1 �, and so� a�b� � = � a0a0 + b0 �. We see that (31) holds by virtue of a0 + b0 > 0 (sincejja0jj > jjb0jj; a0 > 0). Also (30) holds because the partial correctness lemmaimplies jja0jj � m+ t and, since subcase ({B) fails, jja0 + b0jj < m+ t.12

Subcase ({D) Now qk = 1 and Q� omits the last two partial quotients(qk�1; qk) = (x; 1) where we write x for qk�1. We ought to show k � 2, butthis is the same argument as in subcase ({B). Hence Q� = Q� 1 �x�1 x+ 1 �and � a�b� � = � a0(x+ 1) + b0xa0 + b0 �. It is not hard to check that the remaindersequence of a; b \corresponding" to Q has the form(a; b; : : : ; a�; b�; a0; b0):Then (31) holds because a0 + b0 > 0. To see (30), it is clear that m + t > jjb�jjand it remains to show jja�jj � m + t. Suppose � a0b0 � Q��! � a�0b�0 �. Thena�0 = (x+1)a00+xb00 and b�0 = a00+ b00, where a00; b00 are de�ned as in (17). SincedetQ� = �1, a�0 > b�0 � 0 and 2jja�0jj � 1 > jja0jj, we see that case ({) of thepartial correctness lemma is applicable with Q� instead of Q, a�0 instead of a00,etc. This shows jja�jj � m+ jja�0jj > m+ jja00jj � m+ t:Now we consider the case D = +1.Subcase (+A) By assumption, jja0jj � m+ t. Also, we check that the partialcorrectness lemma is applicable to show jjb0jj < 1 + m + t. By a simple andwell-known fact, at most two steps of the usual Euclidean algorithm su�cesto reduce an integer remainder by half. Hence ~Q is a product of at most twoelementary matrices, as claimed.Subcase (+B) So jja0jj < m + t. Note that a� = b0 + qka0 and b� = a0 > 0.Note that detQ� = �1, and the conditions of the partial correctness lemma aresatis�ed using Q� instead of Q, etc. Hence, from subcase ({a) of that lemma,we get jja�jj � m+ t. Hence (30) and (31) holds. Q.E.D.It is best to understand the subcases of the Fixing Up lemma as follows:({A) Nothing to �x({B) Toggle({C) Correct slight overshoot in last partial quotient({D) Backup two steps(+A) Forward at most two steps(+B) Backup one stepIt is interesting to note that in tests on randomly generated numbers ofabout 45 digits, subcases ({A) and (+A) both occur in comparable frequenciesand in the overwhelming number of subcases; subcases ({B) and (+B) both alsooccur in comparable frequences, but considerably less frequently than the (A)subcases. Subcase ({C) occurred once but subcase ({D) never arose.13

This lemma (and its proof) provides us with a speci�c procedure to convertthe tentative output matrix Q of the HGCD algorithm into a valid one. To bespeci�c, let FIXUP1(Q; a; b;m; t)denote the subroutine that returns Q�, using the notations of the Fixing Uplemma. Of course, we assume that the input parameters satisfy the conditions(16){(20) and (29) for the Fixing Up lemma.In applications of the Fixing Up lemma, we rely on observation (*) thatjja00jj � 1 + djja0jj=2e > jjb00jj (cf. (24)) ensures (29). But as noted following(24), this generally means a0 � 8. Hence we must �x-up Q in a di�erentway when a0 � 7. We use the following alternative method. If a0 � 7, thenb div 2m = b0 � a0 � 1 � 6, and hence b < 7 � 2m. Again exploiting the well-known fact that starting from � ab �, at most 6 steps of the usual Euclideanalgorithm will give us a pair � a00b00 �,� ab � Q��! � a00b00 �where jja00jj � m > jjb00jj. We encode this in a simple (the obvious) subroutineFIXUP0(a; b;m)which returns the regular matrix Q�. Assuming we invoke FIXUP0(a; b;m) onlywhen a0 = 1 + (a mod 2m) � 7, Q� is a product of at most six elementarymatrices.5 The AlgorithmNow we describe the integer HGCD algorithm, prove its correctness and analyzeits complexity.Input: integers a; b with a > b � 0.Output: a regular matrix Q satisfying the correctness criteria (23) or (24).function HGCD (a; b);[1] m 1 + l jjajj2 m; f this is the magical threshold gif jjbjj < m then return (E);[2] a0 1 + (a div 2m); b0 b div 2m;t 1 + d jja0jj2 e;if a0 � 7 then return (FIXUP0(a; b;m));else R FIXUP1(HGCD (a0; b0); a; b;m; t);14

� a0b0 � R�1� ab�;[3] if jjb0jj < m then return(R);[4] q a0 div b0; � cd� � b0a0 mod b0�;[5] l djjcjje; k 2m� l � 1; fWe claim k � 0 g[6] c0 1 + (c div 2k); d0 d div 2k;t0 1 + l jjc0jj2 m;if c div 2m � 6 then return(R �� q 11 0 � � FIXUP0(c; d;m));else S FIXUP1(HGCD (c0; d0); c; d; k; t0); fWe claim c0 � 8 here g[7] � c0d0 � S�1� cd �;fWe claim d0 div 2m � 1 and jjc0jj � m+ 1 > jjd0jj gT FIXUP0(c0; d0;m);Q R �� q 11 0� � S � T ; return(Q);Correctness: HGCD returns in �ve places in the algorithm. We check thatthe matrix returned at each of these places is correct.a) In case the algorithm returns the identity matrix E in step [1], we notethat (23) or (24) holds. (As a matter of fact, if a � 8 then we would surelyreturn at step [1].)b) If a0 � 7 then we would return the resulting matrix from FIXUP0(a; b;m).By de�nition of FIXUP0, this matrix is correct, i.e., (24) holds.c) If a0 � 8, the �rst recursive call to HGCD returns some regular matrix R0which is �xed up as R by FIXUP1. If � a0b0 � R0�! � a00b00 � then, by inductionhypothesis, jja00jj � t > jjb00jj; a00 > b00 � 0. Using the observation (*), we see thatequations (20) and (29) hold. Hence we may apply the Fixing Up lemma, withR0 instead of Q: this means the matrix R returned by FIXUP1 reduces � ab �to � a0b0 � such that jja0jj � m + t > jjb0jj. If the test jjb0jj < m succeeds in step[3], then the matrix R returned by HGCD is clearly correct, i.e., (24) holds.d) Now suppose we reach step [6]. First we show that k � 0, i.e., 2m�1 � l.jjcjj = jjb0jj < m+ t = m+ 1 + � jj1 + (a div 2m)jj2 �� m+ 1+ �1 + bjja div 2mjjc2 � (since jj1 + xjj � 1 + bjjxjjc)15

� m+ 1+ �1 + bjja=2mjjc2 � (since bjjx div yjjc = bjjx=yjjc)< m+ 1+ �1 + bjjajjc �m2 �� (1 + djjajj=2e) + d(m+ 1)=2e= m+ d(m+ 1)=2e � 2m� 1;since m � 3 (using the fact a � 8). Hence l = djjcjje � 2m � 1, as desired. IfHGCD returns the matrix returned by FIXUP0 in this step, the result is clearlycorrect.e) Otherwise, the matrix Q formed in step [7] is returned. Suppose Q reduces� ab � to � a0b0 �. So we want to showjja0jj � m > jjb0jj; a0 > b0 � 0: (32)First we show the claim c0 � 8, stated at the end of step [6]. This would implythat the recursive call to HGCD(c0; d0) returns some matrix Q0 that satis�esthe analogue of (24). We know that c div 2m � 7 (from the test in step [6]).Hence c � 7 � 2m and l = djjcjje � 3 +m or l �m � 3.jjc0 div 2t0 jj > jjc0=2t0 jj � 1 = jjc0jj � t0 � 1= �2 + bjjc0jj=2c (by de�nition of t0)� �2 + bjjc div 2kjj=2c� �2 + �djjc=2kjje � 12 � (since jjx div yjj � djjx=yjje � 1)= �2 + (l �m) � 1 ( l � k = 2l� 2m+ 1)Hence jjc0 div 2t0 jj > 1 and c0 div 2t0 � 3. Then c div 2m � 7 impliesc div 2k � 7 (k � m) and t0 � 3. Thus c0 � (c0 div 2t0) � 2t0 � 24, which isconsiderably more than we need.The conditions of the Fixing Up lemma hold for the call to FIXUP1 in step[6]; again we use observation (*), with c's and d's instead of a's and b's, k insteadof m, t0 instead of t, Q0 instead of Q. This means the matrix S returned byFIXUP1 reduces � cd � to � c0d0 � such that jjc0jj � k + t0 > jjd0jj. We claimk + t0 = m+ 1:We see that l � k = 2l� 2m+ 1 andt0 = 1+ djjc0jj=2e16

= 1+ �djjc0jje2 �= 1+ �djj�+ (c=2k)jje2 � (0 < � � 1)= 1 + � l � k + �2 � (� = 0 or 1)= 1 + �2l� 2m+ 1 + �2 �= 2+ l �m:Thus k + t0 = m + 1, as desired. Hence d0 < 2m+1, d0 div 2m < 2. We havethus justi�ed the comments in step [7]. The matrix T returned by FIXUP0will clearly reduce � c0d0 � to our desired � a0b0 � in (32). This concludes ourcorrectness proof. Q.E.D.Complexity Analysis: The algorithm has to perform comparisons of thekind jjajj : m;and compute ceiling functions in the special formsdjjajje; djjajj=2e;where a;m are positive. (In the algorithm a may be zero, but for simplicityhere, we treat those as special cases.) Since jjajj is generally not rational, we donot want to explicitly compute it. Instead we can reduce the above operationsto checking if a is a power of two, and to computing the function hai, whichis de�ned to be the number of bits in the binary representation of a positiveinteger a. So hai = 1 + blog2 ac and clearly this function is easily computed inlinear time. Then we have jjajj � m, hai � 1 � mand jjajj > m, � hai � 1 > m if a is a power of 2hai > m else.and �nally, � jjajj2 � = 8>><>>: l hai�12 m if a is a power of 2l hai2 m elseThe global structure of the complexity analysis is standard [1]: let M(n)denote the time complexity of multiplying two n-bit integers, H(n) the time17

complexity of our HGCD algorithm where both inputs have at most n bits. Itis known that M(n) = O(n logn log logn). It is not hard to see that FIXUP0and FIXUP1 takes time O(M(n)). HenceH(n) = O(M(n)) + 2H(n=2 +O(1)):Using the fact that n = O(M(n)) andM(n+O(1)) =M(n)+O(n), we concludethat H(n) = O(M(n) logn) = O(n log2 n log logn).The HGCD algorithm as presented is not locally optimized, since we preferwhenever possible to make the correctness proof more transparent. Both theinteger and polynomial HGCD algorithms have been implemented and testedusing Mathematica.6 SummaryWe have developed an integer analogue of the polynomial HGCD algorithm.Its proof of correctness is modeled after corresponding properties used to showthe correctness of the polynomial HGCD. Although the general outline of theinteger case is not hard to understand because of its analogy to the polynomialcase, it is still rather delicate and detailed. Further simpli�cation of our integerHGCD algorithm should be possible.Acknowledgement: Chee Yap would like to thank Lars Ericson for helpingto implement and test the polynomial and integer HGCD algorithms in thecomputer algebra system Mathematica.APPENDIXWe construct an example to show where the polynomial HGCD algorithm of [1](henceforth referred to as the `Old HGCD') breaks down. Roughly speaking,the output criterion for HGCD algorithms has two components: (i) the outputmatrix gives rise to a pair of consecutive members of the remainder sequence,and (ii) the degrees of these two members must \straddle" half the originalmaximum degree. We will construct input polynomials that violates (ii) butnot (i). This means that the complexity analysis of Old HGCD is invalid. Wehave not yet found any example that violates (i). Indeed, we do not even knowif Old HGCD algorithm terminates on all input polynomials. We will say anoutput matrix of a HGCD algorithm is pseudo-correct if it satis�es (i).It is worth remarking that �nding a pair of bad input polynomials to theOld HGCD algorithm requires some thought. This is because it is known thatOld HGCD is correct for inputs with so-called regular remainder sequences, thatis, where the degrees of consecutive members of the remainder sequence di�erby one (except for the initial pair). This was in the original proof of Moenck18

[9]. Furthermore, randomly chosen inputs are highly likely to be regular (sinceirregular remainder sequences depend on vanishing of certain subdeterminants).The Old HGCD algorithm from [1] is as follows. Input are polynomials a; bwith deg(a) > deg(b) � 0. Output is supposed to be an inverse regular matrixT with � gh� = T � ab�such that deg(g) > �deg(a)2 � � deg(h):(Matrix T is roughly the inverse of our R.)function OldHGCD (a; b);[1] m jdeg(a)2 k;if deg(b) � m then return(E);[2] a1 a div xm; b1 b div xm; R OldHGCD (a1; b1); � de� R� ab�;[3] f at this point, nothing prevents e to be the 0-polynomial g[4] f dmod e; q d div e;[5] k �m2 �;[6] e1 e div xk ; f1 f div xk; S OldHGCD (e1; f1);[7] return(S �� 0 11 �q� �R);Note that the matrix R in our algorithm is the inverse of theirs, but this isan unessential point.First of all, we note that at the (null) step [3], the polynomial e may bezero, and the next step would give us a divide-by-zero error. But this is easily�xed: we should check here if deg(e) < m, and if so, immediately return thematrix R. The non-trivial error is that in step [5], the choice of k as m div 2 isin general incorrect. Accordingly we cook up a pair of polynomials to illustratethis. Consider the pair of polynomialsf0 = x20 + x17 + x15 + x12 + x9and g0 = x16 + x13 + x11:Their remainder sequence and corresponding partial quotients arex12 + x9; x11; x9 (33)x4; x4; x; x2: (34)The two remainders conforming to Old HGCD's output speci�cations arethe two of degree 11 and 9 (i.e., 11 > b20=2c � 9) for which the transformation19

matrix is � �x4 x8 + 1x5 + 1 �x9 � x4 � x � : (35)(For comparability, this is the inverse of the matrix returned by our HGCD.)Old HGCD, however, returns the matrix� x5 + 1 �x9 � x4 � x�x7 � x4 � x2 x11 + x8 + x6 + x3 + 1 � (36)which yields the two remainders of degree 9 and �1.A detailed trace of Old HGCD's operations discloses why this is so. The�rst subcall of Old HGCD returns (correctly) the remainders of degrees 16 and12, and a subsequent mid step division yields the next remainder of degree11. Now k is (incorrectly) calculated as 5 and so the arguments to the secondsubcall of Old HGCD are the truncated remainders of degrees 6 and 5, whence(again correctly) the remainders of degrees 4 and �1 are computed which arepost-processed to be the remainders of degree 9 and �1.Nevertheless, the matrix is pseudo-correct. We extend this example to showeven stranger phenomena. Letf = x40 + x37 + x35 + x32 + x29and g = x36 + x33 + x31 + x19:So the original pair f0; g0 is equal to f div x20; g div x20. Here and be-low, we have underlined the \added terms" which comes from f mod x20 andg mod x20. The remainder sequence of f; g begins withx32 + x29 � x23; x31 + x27 + x19; x29 � x28 � x23 � x20;x28 + x27 + x25 + x24 + x23 + x22 + x21 + x20 + x19(cf. (33)) and corresponding partial quotients arex4; x4; x; x2 + x+ 1(cf. (34)).The correct matrix (35) applied to the vector � fg � yields the consecutiveremainders x31 + x27 + x19; x29 � x28 � x23 � x20; (37)while the incorrect matrix (36) computes the pairx29 � x28 � x23 � x20; x30 + x27 + x25 + x22 + x19: (38)20

Note that the degrees in (38) are in inverted order: this behaviour is not un-expected from our analysis! When compared to the degrees in (37), it is clearthat (38) cannot belong to the same strict remainder sequence as (37). This er-ror occurs within the computation of Old HGCD(f; g), and the surprise is thatOld HGCD nevertheless returns a matrix that is pseudo-correct: the matrixreturned by Old HGCD computes the pair4(�x24 + x20 + x19); (2x23 + x21 � x19)=4while the correct HGCD computes32x20 + 12x19; 23x19=256:Both are still consecutive members of the remainder sequence! Another surpriseis that the degrees of the remainders returned by the Old HGCD turned outto be more than the correct degrees (initially they were less). The interestedreader may further compare the behaviour of the HGCD's on other variationsof f; g (say, f + x19; g � x19, and x40f + x39; x40g).References[1] A. V. Aho, J. E. Hopcroft, and J. D. Ullman. The design and analysis ofcomputer algorithms. Addison-Wesley, 1974.[2] R. P. Brent, F. G. Gustavson, and D. Y. Y. Yun. Fast solution of Toeplitzsystems of equations and computation of Pade approximants. J. Algo-rithms, 1:259{295, 1980.[3] B. Buchberger, G. E. Collins, and R. Loos (eds.). Computer Algebra.Springer-Verlag, 1983.[4] E. Kaltofen. Greatest common divisors of polynomials given by straight-line programs. Journal of the ACM, 35:231{264, 1988.[5] E. Kaltofen and H. Rolletschek. Computing greatest common divisors andfactorizations in quadratic number �elds. Math. Comp., 52:697{720, 1989.[6] D. E. Knuth. The analysis of algorithms. In Actes du Congr�es Internationaldes Math�ematiciens, pages 269{274, Nice, France, 1970. Gauthier-Villars.[7] D. E. Knuth. The Art of Computer Programming: Seminumerical Algo-rithms, volume 2. Addison-Wesley, 2nd edition edition, 1981.[8] D.H. Lehmer. Titlexxx. American Mathematical Monthly, 45:227{233,1937. 21

[9] R. Moenck. Fast computations of GCD's. Proc. 5th ACM Symposium onTheory of Computation, pages 142{171, 1973.[10] Oskar Perron. Die Lehre von den Kettenbr�uchen. Teubner, Leipzig, 2ndedition edition, 1929.[11] Arnold Sch�onhage. Schnelle Berechnung von Kettenbruchentwicklungen.Acta Informatica, 1:139{144, 1971.[12] Arnold Sch�onhage. Probabilistic computation of integer polynomial gcd's.J. Algorithms, 9:365{371, 1988.[13] Volker Strassen. The computational complexity of continued fractions.SIAM Journal Computing, 12:1{27, 1983.[14] David Y. Yun. Uniform bounds for a class of algebraic mappings. SIAMJournal Computing, 12:348{356, 1983.

22

Documents

A Uni ed Approach to HGCD Algorithms for polynomials and integers