matrices, we can either generalize the diagonal storage scheme or reorder the matrix in order
to obtain a diagonal structure. The simplest generalization is the Ellpack-Itpack Format.

The Ellpack-Itpack Format

The Ellpack-Itpack (or Ellpack) format is of interest only for matrices whose maximum
number of nonzeros per row, jmax, is small. The nonzero entries are stored in a real array
ae(1:n,1:jmax). Along with this is an integer array jae(1:n,1:jmax) which stores the column
indices of each corresponding entry in ae. In the code below, ncol denotes the second
dimension of these arrays, i.e., jmax. Similar to the diagonal scheme, there are also
two basic ways of implementing a matrix-by-vector product when using the Ellpack format.
We begin with an analogue of Algorithm 11.7.

ALGORITHM: Matrix-by-Vector Product in Ellpack Format

1. Do i = 1, n
2.    yi = 0
3.    Do j = 1, ncol
4.       yi = yi + ae(i,j) * x(jae(i,j))
5.    EndDo
6.    y(i) = yi
7. EndDo
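As a concrete illustration of the storage scheme and of the row-oriented loop above, here is a minimal Python sketch. It is not the text's code: the function names `dense_to_ellpack` and `ellpack_matvec` are hypothetical, indices are 0-based rather than 1-based, the matrix is assumed square, and padded slots are given the row index as a column index, which is one arbitrary but valid choice.

```python
def dense_to_ellpack(a):
    """Pack a dense square matrix (list of rows) into Ellpack arrays (ae, jae)."""
    n = len(a)
    # jmax: the largest number of nonzeros found in any row.
    jmax = max((sum(1 for v in row if v != 0) for row in a), default=0)
    ae = [[0.0] * jmax for _ in range(n)]
    # Assumption: padded slots point at column i, so x(jae) stays in bounds.
    jae = [[i] * jmax for i in range(n)]
    for i, row in enumerate(a):
        k = 0
        for col, v in enumerate(row):
            if v != 0:
                ae[i][k] = float(v)
                jae[i][k] = col
                k += 1
    return ae, jae


def ellpack_matvec(ae, jae, x):
    """Row-oriented product y = A x using indirect addressing through jae."""
    y = [0.0] * len(ae)
    for i in range(len(ae)):
        yi = 0.0
        for j in range(len(ae[i])):
            # Padded entries contribute 0.0 * x[i], i.e. nothing.
            yi += ae[i][j] * x[jae[i][j]]
        y[i] = yi
    return y
```

For the 4 x 4 matrix with rows (2,0,1,0), (0,3,0,0), (4,0,5,6), (0,0,0,7), jmax is 3, so two of the four rows carry two padded zeros each.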

In data-parallel mode, the above algorithm can be implemented by using a temporary
two-dimensional array to store the values x(jae(i,j)), and then performing a pointwise
array product of ae and this two-dimensional array. The result is then summed along the
rows:

forall ( i=1:n, j=1:ncol ) tmp(i,j) = x(jae(i,j))

y = SUM(ae*tmp, dim=2).

The FORTRAN forall construct performs the operations as controlled by the loop

heading, in parallel. Alternatively, use of the temporary array can be avoided by recoding

the above lines as follows:

forall (i = 1:n) y(i) = SUM(ae(i,1:ncol)*x(jae(i,1:ncol))) .

The main difference between these loops and the previous ones for the diagonal format is
the presence of indirect addressing in the innermost computation. A disadvantage of the
Ellpack format is that if the number of nonzero elements per row varies substantially, many
zero elements must be stored unnecessarily. Then the scheme becomes inefficient. As an
extreme example, if all rows are very sparse except for one of them which is full, then the
arrays ae, jae must be full n × n arrays, containing mostly zeros. This is remedied by a
variant of the format which is called the jagged diagonal format.
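The storage penalty in the extreme case can be made concrete with a small calculation. The sketch below is illustrative only: the helper name `ellpack_storage` and the sample sizes (1000 rows, 3 nonzeros per sparse row) are assumptions chosen for the example, not figures from the text.

```python
def ellpack_storage(row_counts):
    """Return (stored, nnz) for a list giving the nonzero count of each row."""
    n = len(row_counts)
    jmax = max(row_counts)      # width of ae and jae
    stored = n * jmax           # entries held in ae (and as many again in jae)
    nnz = sum(row_counts)       # entries that are actually nonzero
    return stored, nnz


# Assumed example: 999 rows with 3 nonzeros each and one full row of length 1000.
stored, nnz = ellpack_storage([3] * 999 + [1000])
```

Here `stored` is 1000000 while `nnz` is 3997, so fewer than half a percent of the stored entries are useful; a single long row dictates the width of the whole array.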

The Jagged Diagonal Format

A more general alternative to the diagonal or Ellpack format is the Jagged Diagonal (JAD)
format. This can be viewed as a generalization of the Ellpack-Itpack format which removes
the assumption of fixed-length rows. To build the jagged diagonal structure, start from
the CSR data structure and sort the rows of the matrix by decreasing number of nonzero
elements. To build the first "jagged diagonal" (j-diagonal), extract the first element from
each row of the CSR data structure. The second jagged diagonal consists of the second
elements of each row in the CSR data structure. The third, fourth, ..., jagged diagonals can
then be extracted in the same fashion. The lengths of the successive j-diagonals decrease.
The number of j-diagonals that can be extracted is equal to the number of nonzero elements
of the first row of the permuted matrix, i.e., to the largest number of nonzero elements per
row. To store this data structure, three arrays are needed: a real array DJ to store the values
of the jagged diagonals, the associated array JDIAG which stores the column positions of
these values, and a pointer array IDIAG which points to the beginning of each j-diagonal
in the DJ, JDIAG arrays.

Consider the following matrix A and its sorted version PA:

$$
A = \begin{pmatrix}
1. & 0. & 2. & 0. & 0. \\
3. & 4. & 0. & 5. & 0. \\
0. & 6. & 7. & 0. & 8. \\
0. & 0. & 9. & 10. & 0. \\
0. & 0. & 0. & 11. & 12.
\end{pmatrix},
\qquad
PA = \begin{pmatrix}
3. & 4. & 0. & 5. & 0. \\
0. & 6. & 7. & 0. & 8. \\
1. & 0. & 2. & 0. & 0. \\
0. & 0. & 9. & 10. & 0. \\
0. & 0. & 0. & 11. & 12.
\end{pmatrix}
$$

The rows of PA have been obtained from those of A by sorting them by number of nonzero
elements, from the largest to the smallest number. Then the JAD data structure for A is as
follows:

DJ      3.  6.  1.  9.  11.  4.  7.  2.  10.  12.  5.  8.
JDIAG   1   2   1   3   4    2   3   3   4    5    4   5
IDIAG   1   6   11  13

Thus, there are two j-diagonals of full length (five) and one of length two.
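The construction described above (sort the rows, then peel off one j-diagonal at a time) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the text's code: the function name `csr_to_jad` and the CSR array names `a`, `ja`, `ia` are hypothetical, all indices are 0-based, and ties between rows of equal length are broken by keeping the original row order. The sample CSR input is an assumed 5 x 5 matrix chosen to be consistent with the DJ, JDIAG and IDIAG arrays listed above.

```python
def csr_to_jad(a, ja, ia):
    """Convert 0-based CSR arrays (a, ja, ia) to JAD arrays (dj, jdiag, idiag, perm)."""
    n = len(ia) - 1
    # Sort rows by decreasing nonzero count; sorted() is stable, so rows of
    # equal length keep their original relative order (an assumption here).
    perm = sorted(range(n), key=lambda i: -(ia[i + 1] - ia[i]))
    counts = [ia[i + 1] - ia[i] for i in perm]
    ndiag = counts[0] if n > 0 else 0   # longest row gives the number of j-diagonals
    dj, jdiag, idiag = [], [], [0]
    for k in range(ndiag):
        for i, c in zip(perm, counts):
            if c <= k:
                break                   # rows are sorted, so all later rows are too short
            dj.append(a[ia[i] + k])     # k-th stored element of row i
            jdiag.append(ja[ia[i] + k])
        idiag.append(len(dj))           # start of the next j-diagonal
    return dj, jdiag, idiag, perm


# Assumed sample input: CSR form of a 5 x 5 matrix with 12 nonzeros.
a = [1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12.]
ja = [0, 2, 0, 1, 3, 1, 2, 4, 2, 3, 3, 4]
ia = [0, 2, 5, 8, 10, 12]
dj, jdiag, idiag, perm = csr_to_jad(a, ja, ia)
```

With this input, `dj` holds the values in the order 3, 6, 1, 9, 11, 4, 7, 2, 10, 12, 5, 8, and adding one to each entry of `jdiag` and `idiag` gives the 1-based JDIAG and IDIAG arrays shown above. The returned `perm` records the row ordering, which is needed to map a product computed in the permuted ordering back to the original one.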

A matrix-by-vector product with this storage scheme can be performed by the following
code segment.