The matrix-by-vector product is an important operation which is required in most of the
iterative solution algorithms for solving sparse linear systems. This section shows how
these can be implemented for a small subset of the storage schemes considered earlier.
The following Fortran 90 segment shows the main loop of the matrix-by-vector operation
for matrices stored in the Compressed Sparse Row (CSR) format.
      DO I=1, N
         K1 = IA(I)
         K2 = IA(I+1)-1
         Y(I) = DOTPRODUCT(A(K1:K2),X(JA(K1:K2)))
      ENDDO
Notice that each iteration of the loop computes a different component of the resulting
vector. This is advantageous because each of these components can be computed
independently. If the matrix is stored by columns, then the following code could be used instead:
))¨¥¢ "
©
¥ §&£¤¡ £©
¥¢
¤
©!¨ ©)¥¢ &£¤¡¢ ©
¨§
¦
§
¤ © ¢ ¥¢
¤ ¢ ¥¢
¤
© &#¥ §
§£ © § #¥ §
£
"!% )© &¤# ¥ §
§£
¦
" % %%¢
In each iteration of the loop, a multiple of the j-th column is added to the result, which
is assumed to have been initially set to zero. Notice now that the outer loop is no longer
parallelizable. An alternative to improve parallelization is to try to split the vector operation
in each inner loop. The inner loop has few operations, in general, so this is unlikely to be a
sound approach. This comparison demonstrates that data structures may have to change to
improve performance when dealing with high performance computers.
Now consider the matrix-by-vector product in diagonal storage.
))¨¥ "
©
¥ §&$%#" ¡ $$%#" ¥ $
$
)©)¡¡ © "
©
%§ ¡
¥ § ¥%£¤¢ ¦%¡§
¡
¡ %¡ ¦$%#" ¥ &!"
$ § #
" %% ¢
" % %%¢
¦
Here, each of the diagonals is multiplied by the vector x and the result added to the
vector y. It is again assumed that the vector y has been filled with zeros at the start of
the loop. From the point of view of parallelization and/or vectorization, the above code is
probably the better to use. On the other hand, it is not general enough.
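A Python sketch of the diagonal-storage product may help fix the indexing (hypothetical translation; diag[j][i] holds the entry of the diagonal with offset ioff[j] in row i, zero-padded where the diagonal leaves the matrix, and the loop bounds explicitly skip out-of-range rows):

```python
def dia_matvec(n, ioff, diag, x):
    """Compute y = A*x for A in diagonal (DIA) storage, 0-based."""
    y = [0.0] * n
    for j, off in enumerate(ioff):
        # entry A[i, i+off] lives in diag[j][i]; keep 0 <= i+off < n
        for i in range(max(0, -off), min(n, n - off)):
            y[i] += diag[j][i] * x[i + off]
    return y

# tridiagonal A = [[2, 1, 0], [1, 2, 1], [0, 1, 2]]
ioff = [-1, 0, 1]
diag = [[0.0, 1.0, 1.0],   # sub-diagonal (padded at row 0)
        [2.0, 2.0, 2.0],   # main diagonal
        [1.0, 1.0, 0.0]]   # super-diagonal (padded at row n-1)
print(dia_matvec(3, ioff, diag, [1.0, 1.0, 1.0]))  # [3.0, 4.0, 3.0]
```

The inner loop over i runs over a contiguous range with unit stride and no data dependence, which is what makes this format attractive for vectorization.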
Solving a lower or upper triangular system is another important "kernel" in sparse
matrix computations. The following segment of code shows a simple routine for solving a
unit lower triangular system Lx = y for the CSR storage format.
      X(1) = Y(1)
      DO I = 2, N
         K1 = IAL(I)
         K2 = IAL(I+1)-1
         X(I) = Y(I) - DOTPRODUCT(AL(K1:K2),X(JAL(K1:K2)))
      ENDDO
At each step, the inner product of the current solution x with the i-th row is computed and
subtracted from y(i). This gives the value of x(i). The function DOTPRODUCT computes
the dot product of two arbitrary vectors.
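This forward substitution can be sketched in Python as follows (a hypothetical translation of the Fortran routine; ial, jal, al store only the strict lower triangular part of L row by row, and the unit diagonal is implicit):

```python
def csr_unit_lower_solve(n, ial, jal, al, y):
    """Solve L*x = y, L unit lower triangular, strict lower part in CSR (0-based)."""
    x = [0.0] * n
    x[0] = y[0]                   # first row of L has only the unit diagonal
    for i in range(1, n):
        # inner product of row i of L with the already-computed part of x
        s = 0.0
        for k in range(ial[i], ial[i + 1]):
            s += al[k] * x[jal[k]]
        x[i] = y[i] - s
    return x

# L = [[1, 0, 0], [2, 1, 0], [0, 3, 1]]: strict lower part has two entries
ial = [0, 0, 1, 2]    # row pointers into the strict lower part
jal = [0, 1]          # column indices
al  = [2.0, 3.0]      # values
print(csr_unit_lower_solve(3, ial, jal, al, [1.0, 3.0, 10.0]))  # [1.0, 1.0, 7.0]
```

Note that x(i) depends on all previously computed components, so the outer loop is inherently sequential; this is the recurrence that makes sparse triangular solves hard to parallelize.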