max(len(a), len(b)). In particular, you should verify that all of the divisions

62 Euclid™s algorithm

by 2 performed by the algorithm yield integer results. Moreover, show that

the outputs s and t are of length O( ).

4.3 Computing modular inverses and Chinese remaindering

One application of the extended Euclidean algorithm is to the problem of

computing multiplicative inverses in Zn , where n > 1.

Given y ∈ {0, . . . , n ’ 1}, in time O(len(n)2 ), we can determine if y is

relatively prime to n, and if so, compute y ’1 mod n, as follows. We run

the extended Euclidean algorithm on inputs a := n and b := y, obtaining

integers d, s, and t, such that d = gcd(n, y) and ns + yt = d. If d = 1, then

y does not have a multiplicative inverse modulo n. Otherwise, if d = 1, then

t is a multiplicative inverse of y modulo n; however, it may not lie in the

range {0, . . . , n ’ 1}, as required. Based on Theorem 4.3 (and the discussion

immediately following it), we know that |t| ¤ n/2 < n; therefore, either

t ∈ {0, . . . , n ’ 1}, or t < 0 and t + n ∈ {0, . . . , n ’ 1}. Thus, y ’1 mod n is

equal to either t or t + n.

We also observe that the Chinese remainder theorem (Theorem 2.8) can

be made computationally e¬ective:

Theorem 4.5. Given integers n1 , . . . , nk and a1 , . . . , ak , where n1 , . . . , nk

are pairwise relatively prime, and where ni > 1 and 0 ¤ ai < ni for i =

1, . . . , k, we can compute the integer z, such that 0 ¤ z < n and z ≡

ai (mod ni ) for i = 1, . . . , k, where n := i ni , in time O(len(n)2 ).

Proof. Exercise (just use the formulas in the proof of Theorem 2.8, and see

Exercises 3.22 and 3.23). 2

Exercise 4.3. In this exercise and the next, you are to analyze an “incre-

mental Chinese remaindering algorithm.” Consider the following algorithm,

which takes as input integers z, n, z , n , such that

n > 1, gcd(n, n ) = 1, 0 ¤ z < n, and 0 ¤ z < n .

It outputs integers z , n , such that

n = nn , 0 ¤ z < n , z ≡ z (mod n), and z ≡ z (mod n ).

It runs as follows:

1. Set n ← n’1 mod n .

˜

2. Set h ← ((z ’ z)˜ ) mod n .

n

4.4 Speeding up algorithms via modular computation 63

3. Set z ← z + nh.

4. Set n ← nn .

5. Output z , n .

Show that the output z , n of the algorithm satis¬es the conditions stated

above, and estimate the running time of the algorithm.

Exercise 4.4. Using the algorithm in the previous exercise as a subroutine,

give a simple O(len(n)2 ) algorithm that takes as input integers n1 , . . . , nk

and a1 , . . . , ak , where n1 , . . . , nk are pairwise relatively prime, and where

ni > 1 and 0 ¤ ai < ni for i = 1, . . . , k, and outputs integers z and n such

that 0 ¤ z < n, n = i ni , and z ≡ ai (mod ni ) for i = 1, . . . , k. The

algorithm should be “incremental,” in that it processes the pairs (ni , ai ) one

at a time, using time O(len(n) len(ni )) to process each such pair.

Exercise 4.5. Suppose you are given ±1 , . . . , ±k ∈ Z— . Show how to com-

n

’1 ’1

pute ±1 , . . . , ±k by computing one multiplicative inverse modulo n, and

performing less than 3k multiplications modulo n. This result is useful, as

in practice, if n is several hundred bits long, it may take 10“20 times longer

to compute multiplicative inverses modulo n than to multiply modulo n.

4.4 Speeding up algorithms via modular computation

An important practical application of the above “computational” version

(Theorem 4.5) of the Chinese remainder theorem is a general algorithmic

technique that can signi¬cantly speed up certain types of computations in-

volving long integers. Instead of trying to describe the technique in some

general form, we simply illustrate the technique by means of a speci¬c ex-

ample: integer matrix multiplication.

Suppose we have two m — m matrices A and B whose entries are large

integers, and we want to compute the product matrix C := AB. If the

entries of A are (ars ) and the entries of B are (bst ), then the entries (crt )

of C are given by the usual rule for matrix multiplication:

m

crt = ars bst .

s=1

Suppose further that H is the maximum absolute value of the entries

in A and B, so that the entries in C are bounded in absolute value by

H := H 2 m. Then by just applying the above formula, we can compute

the entries of C using m3 multiplications of numbers of length at most

len(H), and m3 additions of numbers of length at most len(H ), where

64 Euclid™s algorithm

len(H ) ¤ 2 len(H) + len(m). This yields a running time of

O(m3 len(H)2 + m3 len(m)). (4.1)

If the entries of A and B are large relative to m, speci¬cally, if

len(m) = O(len(H)2 ), then the running time is dominated by the

¬rst term above, namely

O(m3 len(H)2 ).

Using the Chinese remainder theorem, we can actually do much better

than this, as follows.

For any integer n > 1, and for all r, t = 1, . . . , m, we have

m

crt ≡ ars bst (mod n). (4.2)

s=1

Moreover, if we compute integers crt such that

m

crt ≡ ars bst (mod n) (4.3)

s=1

and if we also have

’n/2 ¤ crt < n/2 and n > 2H , (4.4)

then we must have

crt = crt . (4.5)

To see why (4.5) follows from (4.3) and (4.4), observe that (4.2) and (4.3)

imply that crt ≡ crt (mod n), which means that n divides (crt ’ crt ). Then

from the bound |crt | ¤ H and from (4.4), we obtain

|crt ’ crt | ¤ |crt | + |crt | ¤ H + n/2 < n/2 + n/2 = n.

So we see that the quantity (crt ’ crt ) is a multiple of n, while at the

same time this quantity is strictly less than n in absolute value; hence, this

quantity must be zero. That proves (4.5).

So from the above discussion, to compute C, it su¬ces to compute the

entries of C modulo n, where we have to make sure that we compute “bal-

anced” remainders in the interval [’n/2, n/2), rather than the more usual

“least non-negative” remainders.

To compute C modulo n, we choose a number of small integers n1 , . . . , nk ,

relatively prime in pairs, and such that the product n := n1 · · · nk is just a

bit larger than 2H . In practice, one would choose the ni to be small primes,

and a table of such primes could easily be computed in advance, so that all

4.4 Speeding up algorithms via modular computation 65

problems up to a given size could be handled. For example, the product of

all primes of at most 16 bits is a number that has more than 90, 000 bits.

Thus, by simply pre-computing and storing such a table of small primes,

we can handle input matrices with quite large entries (up to about 45, 000

bits).

Let us assume that we have pre-computed appropriate small primes

n1 , . . . , nk . Further, we shall assume that addition and multiplication mod-

ulo any of the ni can be done in constant time. This is reasonable, both from

a practical and theoretical point of view, since such primes easily “¬t” into

a memory cell. Finally, we assume that we do not use more of the numbers

ni than are necessary, so that len(n) = O(len(H )) and k = O(len(H )).

To compute C, we execute the following steps:

1. For each i = 1, . . . , k, do the following:

(i)

(a) compute ars ← ars mod ni for r, s = 1, . . . , m,

ˆ

(i)

(b) compute ˆst ← bst mod ni for s, t = 1, . . . , m,

b