for convergence. One source of failure is the instability of the preconditioning operation. These

phenomena of instability have been studied by Elman [81] who proposed a detailed analysis of ILU

and MILU preconditioners for model problems. The theoretical analysis of ILUT stated as Theorem

10.3 is modeled after Theorem 1.14 in Axelsson and Barker [16] for ILU(0).

Some theory for block preconditioners is discussed in Axelsson's book [15]. Different forms of

block preconditioners were developed independently by Axelsson, Brinkkemper, and Il'in [17] and

by Concus, Golub, and Meurant [61], initially for block matrices arising from PDEs in two
dimensions. Later, some generalizations were proposed [137]. Thus, the 2-level implicit-explicit
preconditioning introduced in [137] consists of using sparse approximate inverses of the pivot
blocks for obtaining the block preconditioner.

The current rebirth of approximate inverse preconditioners [112, 62, 137, 54] is spurred by both

parallel processing and robustness considerations. Other preconditioners which are not covered here

are those based on domain decomposition techniques. Some of these techniques will be reviewed in

Chapter 13.


On another front, there is also increased interest in methods that utilize the Normal Equations
in one way or another. Earlier ideas revolved around shifting the normal-equations matrix before
applying the IC(0) factorization, as was suggested by Kershaw [134] in 1977. Manteuffel [148]
also made some suggestions on how to select a good shift in the context of the CGW algorithm.
Currently, new ways of exploiting the relationship with the QR (or LQ) factorization to define
IC(0) more rigorously are being explored; see the recent work in [222].
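In formulas, the shift idea amounts to the following sketch (the symbol alpha is an illustrative name for the shift parameter, not notation from this text): rather than computing IC(0) of the normal-equations matrix B = A^T A directly, one factors the shifted matrix

```latex
B(\alpha) = A^{T}A + \alpha I, \qquad \alpha \ge 0,
```

and uses the resulting incomplete Cholesky factors as a preconditioner. A larger shift makes the incomplete factorization more likely to exist and be stable to compute, at the cost of preconditioning a perturbed matrix, which is why selecting a good shift matters.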


PARALLEL IMPLEMENTATIONS

The remaining chapters of this book will examine the impact of high performance computing on the
design of iterative methods for solving large linear systems of equations. Because of the
increased importance of three-dimensional models combined with the high cost associated with
sparse direct methods for solving these problems, iterative techniques are starting to play a
major role in many application areas. The main appeal of iterative methods is their low storage
requirement. Another advantage is that they are far easier to implement on parallel computers
than sparse direct methods because they only require a rather small set of computational kernels.
Increasingly, direct solvers are being used in conjunction with iterative solvers to develop
robust preconditioners. The first considerations for high-performance implementations of
iterative methods involved implementations on vector computers. These efforts started in the mid
1970s when the first vector computers appeared. Currently, there is a larger effort to develop
new practical iterative methods that are not only efficient in a parallel environment, but also
robust. Often, however, these two requirements seem to be in conflict.

This chapter begins with a short overview of the various ways in which parallelism has

been exploited in the past and a description of the current architectural models for existing

commercial parallel computers. Then, the basic computations required in Krylov subspace

methods will be discussed along with their implementations.
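Most of these basic computations reduce to a small set of kernels: sparse matrix-vector products, inner products, and vector updates. A minimal sketch of that kernel set in Python follows; the CSR (compressed sparse row) storage layout and the function names are illustrative assumptions, not definitions from this text.

```python
# Illustrative versions of the small kernel set that Krylov subspace
# solvers are built from; names and storage layout are assumptions.

def csr_matvec(val, col_ind, row_ptr, x):
    """y = A*x for a sparse matrix A stored in CSR format."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):                       # rows are independent:
        s = 0.0                              # easy to parallelize
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += val[k] * x[col_ind[k]]
        y[i] = s
    return y

def dot(x, y):
    """Inner product: a global reduction on a parallel machine."""
    return sum(xi * yi for xi, yi in zip(x, y))

def axpy(alpha, x, y):
    """Return y + alpha*x: a fully independent, vectorizable update."""
    return [yi + alpha * xi for xi, yi in zip(x, y)]
```

For example, the 2x2 matrix with rows (2, 0) and (1, 3) is stored as val = [2, 1, 3], col_ind = [0, 0, 1], row_ptr = [0, 1, 3]; applying it to the vector (1, 1) gives (2, 4).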

FORMS OF PARALLELISM

Parallelism has been exploited in a number of different forms since the ¬rst computers were

built. The six major forms of parallelism are: (1) multiple functional units; (2) pipelining;

(3) vector processing; (4) multiple vector pipelines; (5) multiprocessing; and (6) distributed

computing. Next is a brief description of each of these approaches.

MULTIPLE FUNCTIONAL UNITS

This is one of the earliest forms of parallelism. It consists of multiplying the number of
functional units, such as adders and multipliers; the control units and the registers are then
shared by the functional units. The detection of parallelism is done at compilation time with a
"Dependence Analysis Graph," an example of which is shown in Figure 11.1.

[Figure 11.1: Dependence analysis for the arithmetic expression (a + b) + (c * d + e * f). The
leaves a, b and the products c * d and e * f feed two middle additions, whose results are
combined by the addition at the root.]

In the example of Figure 11.1, the two multiplications can be performed simultaneously,

then the two additions in the middle are performed simultaneously. Finally, the addition at

the root is performed.
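The evaluation order dictated by the dependence graph can be written out as straight-line code, grouping the operations by level; the numeric values here are illustrative assumptions.

```python
# Level-by-level evaluation of (a + b) + (c * d + e * f),
# following the dependence graph of Figure 11.1.
a, b, c, d, e, f = 1.0, 2.0, 3.0, 4.0, 5.0, 6.0

# Level 1: the two multiplications are independent,
# so they can execute simultaneously on two multipliers.
m1 = c * d
m2 = e * f

# Level 2: the two middle additions are independent of each other,
# so they can execute simultaneously on two adders.
s1 = a + b
s2 = m1 + m2

# Level 3: the addition at the root must wait for both middle results.
result = s1 + s2
```

With two adders and two multipliers, the expression therefore completes in three time steps instead of the five required by strictly sequential evaluation.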