\[ e_{n+1} = \frac{e_n^2}{2 + 2e_n}. \tag{1.2} \]

This implies that $e_1 > 0$, so after the first step we are always overshooting the mark. Now $2e_n < 2 + 2e_n$, so (1.2) implies that
\[ e_{n+1} < \tfrac{1}{2}\, e_n \]

so the error is cut in half (at least) at each stage and hence, in particular,
\[ x_1 > x_2 > \cdots, \]
the iterates are steadily decreasing. Eventually we will reach the stage that
\[ e_n < 1. \]

From this point on, we use the inequality $2 + 2e_n > 2$ in (1.2) and we get the estimate
\[ e_{n+1} < \tfrac{1}{2}\, e_n^2. \tag{1.3} \]

So if we renumber our approximations so that $0 \le e_0 < 1$ then (ignoring the $1/2$ factor in (1.3)) we have
\[ 0 \le e_n < e_0^{2^n}, \tag{1.4} \]

an exponential rate of convergence.

If we had started with an $x_0 < 0$ then all the iterates would be $< 0$ and we would get exponential convergence to $-\sqrt{a}$. Of course, had we been so foolish as to pick $x_0 = 0$ we could not get the iteration started.
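The error estimates above are easy to watch numerically. The following sketch runs the iteration $x_{n+1} = \frac{1}{2}(x_n + a/x_n)$ and records the errors $e_n = x_n - \sqrt{a}$; the function name `sqrt_iterate` and the starting values are our illustration, not the text's.

```python
import math

def sqrt_iterate(a, x0, steps=6):
    """Run x_{n+1} = (x_n + a/x_n)/2 and return the list of iterates."""
    xs = [x0]
    for _ in range(steps):
        xs.append(0.5 * (xs[-1] + a / xs[-1]))
    return xs

# Errors e_n = x_n - sqrt(a): positive after the first step (overshooting),
# and cut by more than half at each stage, as (1.2)-(1.4) predict.
xs = sqrt_iterate(2.0, 10.0)
errors = [x - math.sqrt(2.0) for x in xs]
```

Even from the poor start $x_0 = 10$, six steps already give $\sqrt{2}$ to many digits, and the printed errors shrink far faster than halving once $e_n < 1$.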


1.2 Newton's method

This is a generalization of the above algorithm to find the zeros of a function $P = P(x)$; it reduces to (1.1) when $P(x) = x^2 - a$. It is
\[ x_{n+1} = x_n - \frac{P(x_n)}{P'(x_n)}. \tag{1.5} \]

If we take $P(x) = x^2 - a$ then $P'(x) = 2x$ and the expression on the right in (1.5) is
\[ \frac{1}{2}\left( x_n + \frac{a}{x_n} \right) \]
so (1.5) reduces to (1.1).
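As a quick sanity check of this reduction (the helper name `newton_step` is ours), one step of (1.5) for $P(x) = x^2 - a$ agrees with the square root step $\frac{1}{2}(x + a/x)$:

```python
def newton_step(P, dP, x):
    """One step of (1.5): x - P(x)/P'(x)."""
    return x - P(x) / dP(x)

# With P(x) = x^2 - a (so P'(x) = 2x) the step is (x + a/x)/2, i.e. (1.1).
a = 3.0
x0 = 2.0
step = newton_step(lambda t: t * t - a, lambda t: 2.0 * t, x0)
```

Iterating `newton_step` with these arguments converges to $\sqrt{3}$ just as in the previous section.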

Also notice that if $x$ is a "fixed point" of this iteration scheme, i.e. if
\[ x = x - \frac{P(x)}{P'(x)} \]
then $P(x) = 0$ and we have a solution to our problem. To the extent that $x_{n+1}$ is "close to" $x_n$ we will be close to a solution (the degree of closeness depending on the size of $P'(x_n)$).

In the general case we can not expect that "most" points will converge to a zero of $P$, as was the case in the square root algorithm. After all, $P$ might not have any zeros. Nevertheless, we will show in this section that if we are "close enough" to a zero (that $P(x_0)$ is "sufficiently small" in a sense to be made precise) then (1.5) converges exponentially fast to a zero.

1.2.1 The guts of the method.

Before embarking on the formal proof, let us describe what is going on, on the assumption that we know the existence of a zero, say by graphically plotting the function. So let $z$ be a zero for the function $f$ of a real variable, and let $x$ be a point in the interval $(z - \mu, z + \mu)$ of radius $\mu$ about $z$. Then

\[ -f(x) = f(z) - f(x) = \int_x^z f'(s)\,ds \]

so
\[ -f(x) - (z - x)f'(x) = \int_x^z \big( f'(s) - f'(x) \big)\,ds. \]

Assuming $f'(x) \neq 0$ we may divide both sides by $f'(x)$ to obtain

z

f (x) 1

x’ ’z = (f (s) ’ f (x))ds. (1.6)

f (x) f (x) x

Assume that for all $y \in (z - \mu, z + \mu)$ we have
\[ |f'(y)| \ge \rho > 0, \tag{1.7} \]
\[ |f'(y_1) - f'(y_2)| \le \delta\,|y_1 - y_2|, \tag{1.8} \]
\[ \mu \le \rho/\delta. \tag{1.9} \]


Then setting $x = x_{\text{old}}$ in (1.6) and letting
\[ x_{\text{new}} := x - \frac{f(x)}{f'(x)} \]
in (1.6) we obtain
\[ |x_{\text{new}} - z| \le \frac{\delta}{\rho} \int_{x_{\text{old}}}^{z} |s - x_{\text{old}}|\,ds = \frac{\delta}{2\rho}\,|x_{\text{old}} - z|^2. \]

Since $|x_{\text{old}} - z| < \mu$ it follows that
\[ |x_{\text{new}} - z| \le \tfrac{1}{2}\,\mu \]
by (1.9). Thus the iteration

\[ x \mapsto x - \frac{f(x)}{f'(x)} \tag{1.10} \]
is well defined. At each stage it more than halves the distance to the zero and has the quadratic convergence property
\[ |x_{\text{new}} - z| \le \frac{\delta}{2\rho}\,|x_{\text{old}} - z|^2. \]
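The quadratic convergence constant $\delta/2\rho$ can be watched numerically. For $f(x) = x^2 - 2$ with zero $z = \sqrt{2}$, the ratio $|x_{\text{new}} - z| / |x_{\text{old}} - z|^2$ should stay bounded and, for this particular $f$, approaches $1/(2\sqrt{2})$. The starting point in this sketch is an arbitrary choice of ours.

```python
import math

# Track the ratio |x_new - z| / |x_old - z|^2 for f(x) = x^2 - 2, z = sqrt(2).
# Here f' has Lipschitz constant delta = 2, and on an interval of radius 0.2
# about z we have |f'| >= rho ~ 2.43, so the bound delta/(2 rho) is about 0.41.
z = math.sqrt(2.0)
x = 1.6  # arbitrary starting point inside such an interval
ratios = []
for _ in range(3):
    x_new = x - (x * x - 2.0) / (2.0 * x)  # the Newton step (1.10)
    ratios.append(abs(x_new - z) / abs(x - z) ** 2)
    x = x_new
```

Each ratio stays below the predicted bound, confirming that the error is squared (up to a constant) at every step.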

As we have seen from the application of Newton's method to a cubic, unless the stringent hypotheses are satisfied, there is no guarantee that the process will converge to the nearest root, or converge at all. Furthermore, encoding a computation for $f'(x)$ may be difficult. In practice, one replaces $f'$ by an approximation, and only allows Newton's method to proceed if in fact it does not take us out of the interval. We will return to these points, but first rephrase the above argument in terms of a vector variable.
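The two practical safeguards just mentioned, replacing $f'$ by an approximation and refusing steps that leave the interval, can be sketched as follows. The function, its guard logic, and the sample cubic are our illustration, not the text's.

```python
def guarded_newton(f, x, lo, hi, h=1e-6, steps=25, tol=1e-10):
    """Newton iteration that (a) replaces f'(x) by a central finite
    difference and (b) refuses any step that leaves [lo, hi]."""
    for _ in range(steps):
        slope = (f(x + h) - f(x - h)) / (2.0 * h)  # approximation to f'(x)
        x_new = x - f(x) / slope
        if not (lo <= x_new <= hi):
            break  # the step would take us out of the interval; stop
        x = x_new
        if abs(f(x)) < tol:
            break
    return x

# Hypothetical example: the real root of x^3 - 2x - 5 near 2.
root = guarded_newton(lambda t: t ** 3 - 2.0 * t - 5.0, 2.0, 1.0, 3.0)
```

The central difference costs two extra function evaluations per step but avoids coding the derivative; the interval guard is a crude stand-in for the more careful analysis to come.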

1.2.2 A vector version.

Now let $f$ be a function of a vector variable, with a zero at $z$, and let $x$ be a point in the ball of radius $\mu$ centered at $z$. Let $v_x := z - x$ and consider the function
\[ t \mapsto f(x + t v_x), \]
which takes the value $f(z)$ when $t = 1$ and the value $f(x)$ when $t = 0$. Differentiating with respect to $t$ using the chain rule gives $f'(x + t v_x)v_x$, where $f'$ denotes the derivative (the Jacobian matrix) of $f$. Hence

\[ -f(x) = f(z) - f(x) = \int_0^1 f'(x + t v_x)\,v_x\,dt. \]

This gives
\[ -f(x) - f'(x)v_x = -f(x) - f'(x)(z - x) = \int_0^1 \big[ f'(x + t v_x) - f'(x) \big] v_x\,dt. \]


Applying $[f'(x)]^{-1}$ (which we assume to exist) gives the analogue of (1.6):
\[ x - [f'(x)]^{-1} f(x) - z = [f'(x)]^{-1} \int_0^1 \big[ f'(x + t v_x) - f'(x) \big] v_x\,dt. \]

Assume now
\[ \big\| [f'(y)]^{-1} \big\| \le \rho^{-1}, \tag{1.11} \]
\[ \big\| f'(y_1) - f'(y_2) \big\| \le \delta\, \| y_1 - y_2 \| \tag{1.12} \]
for all $y, y_1, y_2$ in the ball of radius $\mu$ about $z$, and assume also that (1.9) holds.
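The vector analogue of (1.10) is the step $x \mapsto x - [f'(x)]^{-1} f(x)$. Here is a minimal two dimensional sketch of that step; the example system and all names are ours, and rather than inverting the Jacobian we solve the $2 \times 2$ linear system $f'(x)v = f(x)$ by Cramer's rule.

```python
import math

def newton_step_2d(f, jac, p):
    """The vector step p - [f'(p)]^{-1} f(p) for a map R^2 -> R^2,
    solving the 2x2 system f'(p) v = f(p) by Cramer's rule."""
    (a, b), (c, d) = jac(p)
    f1, f2 = f(p)
    det = a * d - b * c  # assumed nonzero, i.e. f'(p) invertible as in (1.11)
    v1 = (f1 * d - b * f2) / det
    v2 = (a * f2 - c * f1) / det
    return (p[0] - v1, p[1] - v2)

# Hypothetical test system: x^2 + y^2 = 4 and x*y = 1.
f = lambda p: (p[0] ** 2 + p[1] ** 2 - 4.0, p[0] * p[1] - 1.0)
jac = lambda p: ((2.0 * p[0], 2.0 * p[1]), (p[1], p[0]))
p = (2.0, 0.5)
for _ in range(8):
    p = newton_step_2d(f, jac, p)
```

For larger systems one would solve $f'(x)v = f(x)$ by Gaussian elimination instead of Cramer's rule; the structure of the step is otherwise identical.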