2. k1 = idiag(j)

3. k2 = idiag(j+1) - 1

4. len = idiag(j+1) - k1

5. y(1:len) = y(1:len) + dj(k1:k2)*x(jdiag(k1:k2))

6. EndDo

Since the rows of the matrix have been permuted, the above code will compute PAx, a permutation of the vector Ax, rather than the desired Ax. It is possible to permute

the result back to the original ordering after the execution of the above program. This

operation can also be delayed until the final solution has been computed, so that only

two permutations on the solution vector are needed, one at the beginning and one at the

end. For preconditioning operations, it may be necessary to perform a permutation before

or within each call to the preconditioning subroutines. There are many possible variants

of the jagged diagonal format. One variant which does not require permuting the rows is

described in Exercise 8.
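As an illustration, the loop above together with the final inverse permutation can be sketched in NumPy with 0-based indexing. The array names mirror the pseudocode; the argument perm is an assumption of this sketch, holding the original row index of each permuted row:

```python
import numpy as np

def jad_matvec(idiag, dj, jcol, x, perm):
    """Jagged-diagonal product y = A*x (0-based sketch).

    idiag : pointers to the start of each jagged diagonal
    dj    : nonzero values of the row-permuted matrix, diagonal by diagonal
    jcol  : column indices matching dj
    perm  : perm[i] = original index of permuted row i (illustrative name)
    """
    y_perm = np.zeros(len(x))
    ndiag = len(idiag) - 1
    for j in range(ndiag):
        k1, k2 = idiag[j], idiag[j + 1]   # slice holding jagged diagonal j
        length = k2 - k1
        y_perm[:length] += dj[k1:k2] * x[jcol[k1:k2]]
    # permute the result back to the original row ordering
    y = np.empty_like(y_perm)
    y[perm] = y_perm
    return y
```

For the 3-by-3 matrix with rows (1,2,0), (0,3,0), (4,5,6), the rows sorted by decreasing nonzero count give perm = [2, 0, 1], and the call below returns the product in the original ordering.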

The Case of Distributed Sparse Matrices

Given a sparse linear system to be solved on a distributed memory environment, it is natural

to map equation-unknown pairs to the same processor in a certain predetermined way.

This mapping can be determined automatically by a graph partitioner or it can be assigned

ad hoc from knowledge of the problem. Assume that there is a convenient partitioning of

the adjacency graph. Without any loss of generality, the matrix under consideration can be

viewed as originating from the discretization of a Partial Differential Equation on a certain

domain. This is illustrated in Figure 11.8. Initially, assume that each subgraph (or subdo-

main, in the PDE literature) is assigned to a different processor, although this restriction


can be relaxed, i.e., a processor can hold several subgraphs to increase parallelism.

Figure 11.8: Decomposition of physical domain or adjacency graph and the local data structure. (The figure distinguishes internal points, internal interface points, and external interface points.)

A local data structure must be set up in each processor (or subdomain, or subgraph) which will allow the basic operations, such as (global) matrix-by-vector products and preconditioning operations, to be performed efficiently. The only assumption to make regarding the mapping is that if row number i is mapped into processor p, then so is the unknown i, i.e., the matrix is distributed row-wise across the processors according to the distribution of the variables. The graph is assumed to be undirected, i.e., the matrix has a symmetric pattern.

It is important to “preprocess the data” in order to facilitate the implementation of the communication tasks and to gain efficiency during the iterative process. The preprocessing requires setting up the following information in each processor:

1. The list of processors with which communication will take place. These are called “neighboring processors,” although they may not be physically nearest neighbors.

2. The list of local nodes that are coupled with external nodes. These are the local interface nodes.

3. A local representation of the distributed matrix in each processor.
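A sketch of this preprocessing is shown below (the function and variable names are illustrative, not from the text). Given the global adjacency lists and a row-to-processor map owner, a single pass over the locally owned rows collects all three pieces of information:

```python
def preprocess(adj, owner, myproc):
    """Preprocessing sketch for one processor.

    adj[i]   : column indices (neighbors) of row i in the adjacency graph
    owner[i] : processor that owns row/unknown i
    Returns (neighboring processors, local interface nodes, external nodes).
    """
    neighbors = set()   # processors we must communicate with
    interface = set()   # local nodes coupled to external nodes
    external = set()    # external nodes whose values must be received
    for i, cols in enumerate(adj):
        if owner[i] != myproc:
            continue                     # not a local row
        for j in cols:
            if owner[j] != myproc:       # coupling across the partition
                neighbors.add(owner[j])
                interface.add(i)
                external.add(j)
    return sorted(neighbors), sorted(interface), sorted(external)
```

On a 4-node path graph split between processors 0 (nodes 0, 1) and 1 (nodes 2, 3), processor 0 finds that it must talk to processor 1, that node 1 is its local interface node, and that node 2 is the external node it needs.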


In order to perform a matrix-by-vector product with a distributed sparse matrix, the matrix

consisting of rows that are local to a given processor must be multiplied by some global vector x. Some components of this vector will be local, and some components must be

brought from external processors. These external variables correspond to interface points

belonging to adjacent subdomains. When performing a matrix-by-vector product, neigh-

boring processors must exchange values of their adjacent interface nodes.
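This exchange can be sketched as follows (a minimal illustration with no real message passing; the name send_index is an assumption of the sketch). Each processor packs the values of the local interface nodes that each neighbor needs; because the pattern is symmetric, the neighbor expects exactly those entries:

```python
def pack_interface(x_local, send_index):
    """Pack outgoing interface values, one buffer per neighboring processor.

    x_local       : values of the locally owned components of x
    send_index[p] : local indices whose values neighbor p needs
    """
    return {p: [x_local[i] for i in idx] for p, idx in send_index.items()}

# In an actual code, each returned buffer would be handed to a send routine
# (e.g., an MPI send), and matching receives would fill in the external
# interface values before the multiplication proceeds.
```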

Figure 11.9: The local matrices and data structure associated with each subdomain. (The figure distinguishes internal points, local interface points, and the external interface matrix.)

Let Aloc be the local part of the matrix, i.e., the (rectangular) matrix consisting of all the rows that are mapped to myproc. Call Bloc the “diagonal block” of A located in Aloc, i.e., the submatrix of Aloc whose nonzero elements a_ij are such that j is a local variable.
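With these definitions, the local part of the product splits into a contribution from local unknowns and one from external interface unknowns, y_local = Bloc x_local + Bext x_ext, where Bext denotes the external interface matrix shown in the figure. A sketch (the explicit column-index lists are illustrative bookkeeping, not specified in the text):

```python
import numpy as np

def local_matvec(A_loc, local_cols, ext_cols, x_local, x_ext):
    """Sketch of the local piece of a distributed matrix-by-vector product.

    A_loc      : rectangular matrix of locally owned rows
    local_cols : columns of A_loc corresponding to local unknowns
    ext_cols   : columns corresponding to external interface unknowns
    """
    B_loc = A_loc[:, local_cols]   # diagonal block: couples local unknowns
    B_ext = A_loc[:, ext_cols]     # external interface matrix
    return B_loc @ x_local + B_ext @ x_ext
```

The split makes the communication pattern explicit: the Bloc part can be computed immediately, while the Bext part must wait for (or be overlapped with) the arrival of the external values x_ext.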