sparse version of Algorithm 11.1 for the case where the matrix is stored in CSR format.

˜ ¨ " ¨% " ¨ £ ¡ &9I!

D§ ¡w¤h¨ n¦ ¤ £¢

¦¢

§ "

!

!

¥ ¥

1. Do i = 1, n

2. k1 = ia(i)

3. k2 = ia(i+1)-1

4. y(i) = dotproduct(a(k1:k2),x(ja(k1:k2)))

5. EndDo

8˜ m ’ ”p£8 ¡©

8¥ $

¡¡

¡¨ ¨¦ ¤

© §¥ Yc¦ ¡©0c¨£"¥ §

¥

© § &

$

Line 4 of the above algorithm computes the dot product of the vector with components

¦¦¦ ¦¦¦

a(k1), a(k1+1), , a(k2) with the vector with components x(ja(k1)), x(ja(k1+1)), ,

x(ja(k2)).

The fact that the outer loop can be performed in parallel can be exploited on any par-

allel platform. On some shared-memory machines, the synchronization of this outer loop

is inexpensive and the performance of the above program can be excellent. On distributed

memory machines, the outer loop can be split in a number of steps to be executed on each

processor. Thus, each processor will handle a few rows that are assigned to it. It is common

to assign a certain number of rows (often contiguous) to each processor and to also assign

the component of each of the vectors similarly. The part of the matrix that is needed is

loaded in each processor initially. When performing a matrix-by-vector product, interpro-

cessor communication will be necessary to get the needed components of the vector that

do not reside in a given processor. This important case will return in Section 11.5.6.

Gather

+

DotProduct

+

y(i)

*

+

x(*) a(i,*)

+

x(1:n)

" Dd ¥ ¤¥£ ¢

¡

§§

Illustration of the row-oriented matrix-by-

vector multiplication.

The indirect addressing involved in the second vector in the dot product is called a

gather operation. The vector x(ja(k1:k2)) is ¬rst “gathered” from memory into a vector of

contiguous elements. The dot product is then carried out as a standard dot-product opera-

tion between two dense vectors. This is illustrated in Figure 11.6.

¥

§5D# ¥§¨

§§ ©

This example illustrates the use of scienti¬c libraries for performing

sparse matrix operations. If the pseudo-code for Algorithm 11.4 is compiled as it is on

the Connection Machine, in CM-FORTRAN (Thinking Machine™s early version of FOR-

TRAN 90), the resulting computations will be executed on the front-end host of the CM-2

or the Control Processor (CP) of the CM-5, rather than on the PEs. This is due to the fact

that the code does not involve any vector constructs. The scienti¬c library (CMSSL) pro-

vides gather and scatter operations as well as scan add operations which can be exploited

to implement this algorithm more ef¬ciently as is show in the following code segment:

y = 0.0

DB¬

¬¬

call sparse util gather ( tmp, x, gather trace, )

tmp = a*tmp

DB¬

¬¬

call cmf scan add (tmp, tmp, cmf upward, cmf inclusive, )

’™8 ¤ "¡ } ¡ } ¨8p

©w w ¥ r

’ 8 ¥¡

¤

¤ ¨& ¦§

1

call sparse util scatter ( y, scatter pointer, tmp,

¬BB¬

¬

scatter trace, )

The sparse util gather routine is ¬rst called to gather the corresponding entries from the

±¢

vector into a temporary array , then the multiplications are carried out element-by-

element in parallel. The cmf scan add routine from the CM Fortran Utility Library is used

to perform the summation for each row. Finally, the call to sparse util scatter copies the

results. Segmented scan-adds are particularly useful for implementing sparse matrix-by-

vector products when they are provided as part of the libraries. Note that the sparse util-

gather setup and sparse util scatter setup routines must be called to compute the com-

munication patterns, gather trace and scatter trace, before this algorithm is called. These

tend to be expensive operations.

Now assume that the matrix is stored by columns (CSC format). The matrix-by-vector

product can be performed by the following algorithm which is a sparse version of Algo-

rithm 11.2.

˜ ª‘ ¨ %

˜

D§ ¡w¤h¨ n¦ ¤ £¢

¦¢ !"

§ "

!

)

¥ ¥

!

1. y(1:n) = 0.0

2. Do i = 1, n

3. k1 = ia(i)

4. k2 = ia(i + 1)-1

5. y(ja(k1:k2)) = y(ja(k1:k2)) + x(j) * a(k1:k2)

6. EndDo

na ¥ #

¯` ®

The above code initializes to zero and then adds the vectors for ¦£

` a ¥%

G