to S is exact, i.e., when M_S = S, then the block preconditioner M to A induced from M_S is also exact.
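This exactness property can be checked numerically. The sketch below assumes the usual 2x2 block splitting A = [[B, E], [F, C]] with Schur complement S = C - F B^{-1} E; the blocks are randomly generated for illustration only. It builds the block preconditioner induced by a Schur complement preconditioner M_S in block LU form and verifies that choosing M_S = S reproduces A exactly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2x2 block matrix A = [[B, E], [F, C]],
# with B made safely nonsingular by a diagonal shift.
n, m = 5, 3
B = rng.standard_normal((n, n)) + n * np.eye(n)
E = rng.standard_normal((n, m))
F = rng.standard_normal((m, n))
C = rng.standard_normal((m, m)) + m * np.eye(m)
A = np.block([[B, E], [F, C]])

# Schur complement of B in A.
S = C - F @ np.linalg.solve(B, E)

def induced_preconditioner(M_S):
    """Block preconditioner induced by M_S, in block LU form:
    M = [[B, 0], [F, M_S]] @ [[I, B^{-1} E], [0, I]]."""
    lower = np.block([[B, np.zeros((n, m))],
                      [F, M_S]])
    upper = np.block([[np.eye(n), np.linalg.solve(B, E)],
                      [np.zeros((m, n)), np.eye(m)]])
    return lower @ upper

# When M_S = S exactly, the induced preconditioner equals A.
M = induced_preconditioner(S)
print(np.allclose(M, A))  # True
```

The block product expands to [[B, E], [F, F B^{-1} E + M_S]], so the (2,2) block equals C precisely when M_S equals the Schur complement.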

Although the previous results indicate that a Preconditioned Schur Complement iteration is mathematically equivalent to a certain preconditioned full matrix method, there are some practical benefits in iterating with the nonreduced system. The main benefit involves the requirement in the Schur Complement techniques to compute Sx exactly at each Krylov subspace iteration. Indeed, the matrix S represents the coefficient matrix of the linear system, and inaccuracies in the matrix-by-vector operation may result in loss of convergence. In the full matrix techniques, the operation Sx is never needed explicitly. In addition, this opens up the possibility of preconditioning the original matrix with approximate solves with the matrix B in the preconditioning operation M.
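The matrix-by-vector requirement can be made concrete. A minimal sketch, again assuming the block splitting A = [[B, E], [F, C]]: the product Sx = Cx - F B^{-1}(Ex) hides a solve with B inside every application of S, which is exactly where inaccuracy can creep into a Schur Complement iteration if that inner solve is only approximate.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 6, 4
B = rng.standard_normal((n, n)) + n * np.eye(n)
E = rng.standard_normal((n, m))
F = rng.standard_normal((m, n))
C = rng.standard_normal((m, m)) + m * np.eye(m)

# Explicit Schur complement, for reference only.
S = C - F @ np.linalg.solve(B, E)

def apply_S(x, solve_B=lambda r: np.linalg.solve(B, r)):
    """Matrix-free product S x = C x - F * B^{-1} (E x).

    Every application requires a solve with B; if solve_B is only
    approximate, the Krylov method effectively sees a perturbed S.
    """
    return C @ x - F @ solve_B(E @ x)

x = rng.standard_normal(m)
print(np.allclose(apply_S(x), S @ x))  # True with an exact inner solve
```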

GRAPH PARTITIONING

The very first task that a programmer faces when solving a problem on a parallel computer, be it a dense or a sparse linear system, is to decide how to map the data into the processors. For shared memory and SIMD computers, directives are often provided to help the user input a desired mapping, among a small set of choices. Distributed memory computers are more general since they allow mapping the data in an arbitrary fashion. However, this added flexibility puts the burden on the user to find good mappings. In particular, when implementing Domain Decomposition ideas on a parallel computer, efficient techniques must be available for partitioning an arbitrary graph. This section gives an overview of the issues and covers a few techniques.

BASIC DEFINITIONS

Consider a general sparse linear system whose adjacency graph is G = (V, E). There are two issues related to the distribution of mapping a general sparse linear system on a number of processors. First, a good partitioning must be found for the original problem. This translates into partitioning the graph G into subgraphs and can be viewed independently from the underlying architecture or topology. The second issue, which is architecture dependent, is to find a good mapping of the subdomains or subgraphs to the processors, after


the partitioning has been found. Clearly, the partitioning algorithm can take advantage of a

measure of quality of a given partitioning by determining different weight functions for the

vertices, for vertex-based partitionings. Also, a good mapping could be found to minimize

communication costs, given some knowledge on the architecture.

Graph partitioning algorithms address only the first issue. Their goal is to subdivide the graph into smaller subgraphs in order to achieve a good load balancing of the work among the processors and ensure that the ratio of communication over computation is small for the given task. We begin with a general definition.
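These two goals can be quantified directly on the graph. The sketch below uses a hypothetical helper (not from any particular partitioning library): the edge cut counts the edges joining different subdomains, which induce communication, while the load balance compares the largest and smallest subdomain sizes.

```python
from collections import Counter

def partition_quality(edges, part):
    """Edge cut and load balance of a vertex partition.

    edges: iterable of (u, v) pairs of an undirected graph
    part:  dict mapping each vertex to its subdomain (processor) index
    """
    # Edges whose endpoints lie in different subdomains require communication.
    cut = sum(1 for u, v in edges if part[u] != part[v])
    # Ratio of largest to smallest subdomain: 1.0 means perfect balance.
    sizes = Counter(part.values())
    balance = max(sizes.values()) / min(sizes.values())
    return cut, balance

# Tiny illustration: a 4-vertex path graph split over 2 processors.
edges = [(1, 2), (2, 3), (3, 4)]
part = {1: 0, 2: 0, 3: 1, 4: 1}
print(partition_quality(edges, part))  # (1, 1.0)
```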

[Figure: Mapping of a simple mesh to 4 processors.]
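For instance, one possible (hypothetical) assignment of the 12 mesh vertices to 4 processors is column-wise; its communication cost can be evaluated by counting the mesh edges whose endpoints land on different processors:

```python
# Hypothetical column-wise mapping of a 3x4 mesh (vertices numbered 1..12
# row by row) onto 4 processors: column c goes to processor c.
rows, cols = 3, 4

def vid(r, c):
    return r * cols + c + 1  # vertex id, 1..12, row-major

part = {vid(r, c): c for r in range(rows) for c in range(cols)}

# Nearest-neighbour mesh edges (horizontal and vertical).
edges = [(vid(r, c), vid(r, c + 1)) for r in range(rows) for c in range(cols - 1)]
edges += [(vid(r, c), vid(r + 1, c)) for r in range(rows - 1) for c in range(cols)]

# Edges cut by the partition: these require interprocessor communication.
cut = sum(1 for u, v in edges if part[u] != part[v])
print(cut)  # 9: every horizontal edge crosses a processor boundary
```

Each processor receives exactly 3 vertices, so the load is perfectly balanced; a different mapping could trade balance against a smaller cut.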
