Similarly, call B_ext the "off-diagonal" block, i.e., the submatrix of A whose nonzero elements a_ij are such that j is not a local variable. To perform a matrix-by-vector product, start multiplying the diagonal block A_loc by the local variables. Then, multiply the external variables by the sparse matrix B_ext. Notice that since the external interface points are not coupled with local internal points, only the rows nint + 1 to nloc in the matrix B_ext will have nonzero elements. Thus, the matrix-by-vector product can be separated into two such operations, one involving only the local variables and the other involving external variables. It is necessary to construct these two matrices and define a local numbering of the local variables in order to perform the two matrix-by-vector products efficiently each time.
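The construction of the two matrices and of the local numbering can be sketched as follows. This is an illustrative assumption, not code from the text: the dict-of-rows storage and the helper name split_local_external are chosen for readability only.

```python
# Hypothetical sketch (not the book's code): build A_loc and B_ext for one
# processor from the rows it owns, and set up a local numbering.
# Matrices are dicts-of-rows {row: {col: value}} for readability.

def split_local_external(rows, local_cols):
    """rows: {global_row: {global_col: val}} owned by this processor;
    local_cols: global indices of the local variables, in local order."""
    loc = {g: i for i, g in enumerate(local_cols)}        # local numbering
    ext_cols = sorted({c for r in rows.values() for c in r if c not in loc})
    ext = {g: j for j, g in enumerate(ext_cols)}          # numbering of x_ext
    A_loc, B_ext = {}, {}
    for grow, entries in rows.items():
        i = loc[grow]
        A_loc[i] = {loc[c]: v for c, v in entries.items() if c in loc}
        brow = {ext[c]: v for c, v in entries.items() if c in ext}
        if brow:                    # only interface rows appear in B_ext
            B_ext[i] = brow
    return A_loc, B_ext, ext_cols

# Example: this processor owns global variables 0 and 1 of a tridiagonal
# matrix; variable 2 (owned elsewhere) couples only to interface row 1.
rows = {0: {0: 2.0, 1: -1.0}, 1: {0: -1.0, 1: 2.0, 2: -1.0}}
A_loc, B_ext, ext_cols = split_local_external(rows, [0, 1])
```

Note how the internal row (local index 0) does not appear in B_ext at all, reflecting the fact that only interface rows of B_ext carry nonzero elements.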

To perform a global matrix-by-vector product, with the distributed data structure described above, each processor must perform the following operations. First, multiply the local variables by the matrix A_loc. Second, obtain the external variables from the neighboring processors in a certain order. Third, multiply these by the matrix B_ext and add the resulting vector to the one obtained from the first multiplication by A_loc. Note that the first and second steps can be done in parallel. With this decomposition, the global matrix-by-vector product can be implemented as indicated in Algorithm 11.10 below. In what follows, x_loc is a vector of variables that are local to a given processor. The components corresponding to the local interface points (ordered to be the last components in x_loc for convenience) are called x_bnd. The external interface points, listed in a certain order, constitute a vector which is called x_ext. The matrix A_loc is a sparse matrix which represents the restriction of A to the local variables x_loc. The matrix B_ext operates on the external variables to give the correction B_ext x_ext which must be added to the vector A_loc x_loc in order to obtain the desired result (A x)_loc.

ALGORITHM 11.10  Distributed Sparse Matrix Product Kernel

1. Exchange interface data, i.e.,
2. Scatter x_bnd to neighbors and
3. Gather x_ext from neighbors
4. Do Local Matvec: y = A_loc x_loc
5. Do External Matvec: y = y + B_ext x_ext
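The five steps above can be sketched in Python as follows. This is a minimal single-process simulation: the scatter/gather of interface values is done with plain Python data rather than real message passing, and the small dense matrices, the ext_owner map, and the routine names are illustrative assumptions.

```python
# Sketch of the distributed matvec kernel on two simulated "processors".
# Each processor holds A_loc, B_ext, its local variables x_loc, and a map
# telling where each of its external values lives.

def local_matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def distributed_matvec(procs):
    """procs: list of dicts with keys A_loc, B_ext, x_loc, ext_owner.
    ext_owner maps each external slot to (proc, local index) of its owner."""
    # Steps 1-3: exchange interface data (scatter x_bnd / gather x_ext).
    for p in procs:
        p["x_ext"] = [procs[q]["x_loc"][i] for q, i in p["ext_owner"]]
    results = []
    for p in procs:
        y = local_matvec(p["A_loc"], p["x_loc"])      # step 4: local matvec
        corr = local_matvec(p["B_ext"], p["x_ext"])   # step 5: external matvec
        results.append([a + b for a, b in zip(y, corr)])
    return results

# Two processors, each owning two variables of the order-4 tridiagonal
# matrix tridiag(-1, 2, -1), with global x = (1, 2, 3, 4).  The interface
# point is ordered last on processor 0, so its B_ext row is the last one.
procs = [
    {"A_loc": [[2, -1], [-1, 2]], "B_ext": [[0], [-1]],
     "x_loc": [1, 2], "ext_owner": [(1, 0)]},   # needs x_2 from proc 1
    {"A_loc": [[2, -1], [-1, 2]], "B_ext": [[-1], [0]],
     "x_loc": [3, 4], "ext_owner": [(0, 1)]},   # needs x_1 from proc 0
]
y0, y1 = distributed_matvec(procs)
# Concatenating y0 and y1 reproduces the global product A x = (0, 0, 0, 5).
```

Because the exchange loop (steps 1 to 3) completes before any multiplication, it is easy to see how, on a real machine, the local matvec of step 4 can be overlapped with the communication of steps 2 and 3.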

An important observation is that the matrix-by-vector products in lines 4 and 5 can use any

convenient data structure that will improve efficiency by exploiting knowledge on the local

architecture. An example of the implementation of this operation is illustrated next:

call bdxchg(nloc,x,y,nproc,proc,ix,ipr,type,xlen,iout)

y(1:nloc) = 0.0

call amux1 (nloc,x,y,aloc,jaloc,ialoc)

nrow = nloc - nbnd + 1

call amux1(nrow,x,y(nbnd),aloc,jaloc,ialoc(nloc+1))

In the above code segment, bdxchg is the only routine requiring communication. Its

purpose is to exchange interface values between nearest neighbor processors. The first call

to amux1 performs the operation y := y + A_loc x_loc, where y has been initialized to zero prior to the call. The second call to amux1 performs y := y + B_ext x_ext. Notice that the data for the matrix B_ext is simply appended to that of A_loc, a standard technique used for storing a succession of sparse matrices. The B_ext matrix acts only on the subvector of x which starts at location nbnd of x. The size of the matrix B_ext is (nloc - nbnd + 1) x n_ext.
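The appended-storage trick can be mimicked in Python as follows. The 0-based CSR layout and the amux-style routine here are illustrative assumptions (the names a, ja, ia echo aloc, jaloc, ialoc from the code segment above, but this is not SPARSKIT's actual amux1).

```python
# Sketch of the appended-storage trick: B_ext's CSR data is stored right
# after A_loc's in the same (a, ja) arrays, and its row-pointer array is
# appended to ia, so one routine applies either matrix via an offset.

def amux_add(nrows, x, y, a, ja, ia, ptr_off=0, y_off=0):
    """y[y_off + i] += (row i) . x, with row pointers read from
    ia[ptr_off:]; an appended second matrix is applied by passing an
    offset into the pointer array (like ialoc(nloc+1) in the Fortran)."""
    for i in range(nrows):
        for k in range(ia[ptr_off + i], ia[ptr_off + i + 1]):
            y[y_off + i] += a[k] * x[ja[k]]

# A_loc = [[2, -1], [-1, 2]] in CSR, followed by the single interface row
# of B_ext ([-1], acting on the external value stored at x[2]).
nloc, nbnd = 2, 1                  # 0-based: interface rows start at 1
a  = [2, -1, -1, 2,  -1]           # A_loc data, then B_ext data appended
ja = [0,  1,  0, 1,   2]           # B_ext column points past x_loc
ia = [0, 2, 4,  4, 5]              # A_loc pointers, then B_ext pointers

x = [1, 2, 3]                      # x_loc = (1, 2), x_ext = (3)
y = [0] * nloc
amux_add(nloc, x, y, a, ja, ia)                # y := A_loc x_loc
amux_add(nloc - nbnd, x, y, a, ja, ia,
         ptr_off=nloc + 1, y_off=nbnd)         # y += B_ext x_ext
# y now holds the local part of the global product: [0, 0]
```

As in the Fortran segment, only the second call touches the interface rows of y, and the same inner kernel serves both multiplications.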
