
necessary interpolations to get the result in a given step and then iterate on the fine

mesh in the following step. This can be done without having to pass any data regarding the

matrix or the preconditioner to the FGMRES accelerator.

Note that the purpose of reverse communication is simply to avoid passing data structures related to the matrices to the FGMRES accelerator. The problem is that these data structures are not fixed. For example, it may be desirable to use different storage formats for different architectures. A more elegant solution to this problem is Object-Oriented Programming. In Object-Oriented Programming languages such as C++, a class can be declared, e.g., a class of sparse matrices, and operators can be defined on them. Data structures are not passed to these operators. Instead, the implementation will recognize the types

of the operands and invoke the proper functions. This is similar to what exists currently for arithmetic. For an operation such as x + y, the compiler will recognize what type of operand is involved and invoke the proper operation, either integer, double real, or complex, etc.

MATRIX-BY-VECTOR PRODUCTS

Matrix-by-vector multiplications (sometimes called “Matvecs” for short) are relatively

easy to implement efficiently on high performance computers. For a description of storage formats for sparse matrices, see Chapter 3. We will first discuss matrix-by-vector algorithms without consideration of sparsity. Then we will cover sparse Matvec operations for

a few different storage formats.

The Case of Dense Matrices

The computational kernels for performing sparse matrix operations such as matrix-by-vector products are intimately associated with the data structures used. However, there

are a few general approaches that are common to different algorithms for matrix-by-vector

products which can be described for dense matrices. Two popular ways of performing these

operations are (1) the inner product form described in Algorithm 11.1, and (2) the SAXPY

form described by Algorithm 11.2.

ALGORITHM 11.1: DotProduct Form, Dense Case

1. Do i = 1, n

2. y(i) = dotproduct(a(i,1:n),x(1:n))

3. EndDo

The dot product operation dotproduct(v(1:n),w(1:n)) computes the dot product of the two vectors v and w, each of length n. If there is no ambiguity on the bounds, we simply write dotproduct(v,w). The above algorithm proceeds by rows. It computes the dot product of row i of the matrix A with the vector x and assigns the result to y(i). The next algorithm uses columns instead and results in the use of the SAXPY operations.

ALGORITHM 11.2: SAXPY Form, Dense Case

1. y(1:n) = 0.0

2. Do j = 1, n

3. y(1:n) = y(1:n) + x(j) * a(1:n,j)

4. EndDo

The SAXPY form of the Matvec operation computes the result y = Ax as a linear combination of the columns of the matrix A. A third possibility consists of performing the

product by diagonals. This option bears no interest in the dense case, but it is at the basis

of many important matrix-by-vector algorithms in the sparse case.

ALGORITHM 11.3: DIAG Form, Dense Case

1. y(1:n) = 0

2. Do k = -n+1, n-1
3. Do i = 1 - min(k,0), n - max(k,0)

4. y(i) = y(i) + a(i,k+i)*x(k+i)

5. EndDo

6. EndDo

The product is performed by diagonals, starting from the leftmost diagonal, whose offset is -n+1, to the rightmost diagonal, whose offset is n-1.

The CSR and CSC Formats

One of the most general schemes for storing sparse matrices is the Compressed Sparse Row

storage format described in Chapter 3. Recall that the data structure consists of three arrays:

a real array A(1:nnz) to store the nonzero elements of the matrix row-wise, an integer array

JA(1:nnz) to store the column positions of the elements in the real array A, and, finally, a pointer array IA(1:n+1), the i-th entry of which points to the beginning of the i-th row in the arrays A and JA. To perform the matrix-by-vector product y = Ax in parallel using this

format, note that each component of the resulting vector can be computed independently