, , consecutively, i.e., as . The resulting matrix after the permutation

is shown in the right side of Figure 11.11. An alternative is simply to keep the

permutation array and use it to identify unknowns that correspond to a given level in the

solution. Then the algorithm for solving the triangular systems can be written as follows,

assuming that the matrix is stored in the usual row sparse matrix format.

ALGORITHM: Forward Elimination with Level Scheduling

1. Do lev = 1, nlev
2.    j1 = level(lev)
3.    j2 = level(lev+1) - 1
4.    Do k = j1, j2
5.       i = q(k)
6.       Do j = ial(i), ial(i+1) - 1
7.          x(i) = x(i) - al(j) * x(jal(j))
8.       EndDo
9.    EndDo
10. EndDo
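The loop structure above can be sketched in Python as follows. This is a minimal illustration, not a tuned implementation, and it rests on several assumptions that the text does not state: indices are 0-based, the arrays `ial`, `jal`, `al` hold only the strictly lower triangular part in row (CSR) format, the diagonal is taken to be the identity, and `x` starts out as a copy of the right-hand side `b`. The helper `build_levels` is a hypothetical routine that derives `q` and `level` from the sparsity pattern by dependency depth.

```python
def build_levels(ial, jal):
    """Group unknowns by dependency depth; returns (q, level), both 0-based."""
    n = len(ial) - 1
    depth = [0] * n
    for i in range(n):
        # Row i can be solved one level after the deepest unknown it uses.
        for j in range(ial[i], ial[i + 1]):
            depth[i] = max(depth[i], depth[jal[j]] + 1)
    nlev = max(depth) + 1 if n else 0
    level = [0] * (nlev + 1)
    for d in depth:
        level[d + 1] += 1              # count unknowns per level
    for lev in range(nlev):
        level[lev + 1] += level[lev]   # prefix sums give level pointers
    q = sorted(range(n), key=lambda i: depth[i])  # stable: keeps row order
    return q, level


def forward_solve_by_levels(ial, jal, al, q, level, b):
    """Level-scheduled forward elimination for a unit lower triangular matrix."""
    x = list(b)                        # assumption: x is initialized to b
    for lev in range(len(level) - 1):
        # All unknowns of one level are independent of each other.
        for k in range(level[lev], level[lev + 1]):
            i = q[k]
            for j in range(ial[i], ial[i + 1]):
                x[i] -= al[j] * x[jal[j]]
    return x


# Small illustrative system: row 1 uses x0, row 3 uses x1 and x2.
ial = [0, 0, 1, 1, 3]
jal = [0, 1, 2]
al = [0.5, 2.0, -1.0]
b = [1.0, 2.0, 3.0, 4.0]

q, level = build_levels(ial, jal)
x = forward_solve_by_levels(ial, jal, al, q, level, b)
```

In this sketch the loop over `k` is the one that could be executed in parallel, since no unknown in a level depends on another unknown of the same level.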

An important observation here is that the outer loop, which corresponds to a level,

performs an operation of the form

$x := x - Bx$

where $B$ is a submatrix consisting only of the rows of level $lev$, and excluding the diagonal

elements. This operation can in turn be optimized by using a proper data structure for these

submatrices. For example, the JAD data structure can be used. The resulting performance

can be quite good. On the other hand, implementation can be quite involved since two

embedded data structures are required.
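To make the per-level operation concrete, the following sketch extracts the submatrix of one level and applies the update as a sparse matrix-by-vector product. It is only an illustration under stated assumptions: indices are 0-based, the matrix has a unit diagonal with its strictly lower part stored in CSR arrays `ial`, `jal`, `al`, and each level block is kept in plain CSR form rather than in the JAD format mentioned above. The function names and the sample data are hypothetical.

```python
def extract_level_block(ial, jal, al, rows):
    """Return CSR arrays (ib, jb, vb) of the submatrix made of the given rows."""
    ib, jb, vb = [0], [], []
    for i in rows:
        jb.extend(jal[ial[i]:ial[i + 1]])
        vb.extend(al[ial[i]:ial[i + 1]])
        ib.append(len(jb))
    return ib, jb, vb


def apply_level_update(x, rows, ib, jb, vb):
    """Update the entries of x listed in rows by subtracting the product B x."""
    for r, i in enumerate(rows):
        s = 0.0
        for p in range(ib[r], ib[r + 1]):
            s += vb[p] * x[jb[p]]      # only unknowns from earlier levels appear
        x[i] -= s


def solve_with_level_blocks(ial, jal, al, q, level, b):
    """Forward elimination written as one block update per level."""
    x = list(b)                        # assumption: x is initialized to b
    for lev in range(len(level) - 1):
        rows = q[level[lev]:level[lev + 1]]
        ib, jb, vb = extract_level_block(ial, jal, al, rows)
        apply_level_update(x, rows, ib, jb, vb)
    return x


# Illustrative data: strictly lower part of a 4 x 4 unit lower triangular matrix.
ial = [0, 0, 1, 1, 3]
jal = [0, 1, 2]
al = [0.5, 2.0, -1.0]
q = [0, 2, 1, 3]          # unknowns listed level by level
level = [0, 2, 3, 4]      # level pointers into q
x = solve_with_level_blocks(ial, jal, al, q, level, [1.0, 2.0, 3.0, 4.0])
```

In practice the blocks would be extracted once and reused across solves; rebuilding them inside the loop, as done here for brevity, would cancel any gain from the specialized storage.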

Figure 11.12: Lower-triangular matrix associated with a finite element matrix and its level-ordered version (left: natural ordering; right: level-scheduling ordering).

Consider a finite element matrix obtained from the example shown in Figure 3.1. After an additional level of refinement, done in the same way as was described in Chapter 3, the resulting matrix, shown in the left part of Figure 11.12, is of size . In this case, levels are obtained. If the matrix is reordered by levels, the matrix shown in the right side of the figure results. The last level consists of only one element.

EXERCISES

1 Give a short answer to each of the following questions:

a. What is the main disadvantage of shared memory computers based on a bus architecture?

b. What is the main factor in yielding the speed-up in pipelined processors?

c. Related to the previous question: What is the main limitation of pipelined processors with regard to their potential for providing high speed-ups?

2 Show that the number of edges in a binary $n$-cube is $n 2^{n-1}$.

3 Show that a binary 4-cube is identical with a torus which is a $4 \times 4$ mesh with wrap-around connections. Are there hypercubes of any other dimensions that are equivalent topologically to toruses?

4 A Gray code of length $k = 2^n$ is a sequence $a_0, \ldots, a_{k-1}$ of $n$-bit binary numbers such that (a) any two successive numbers in the sequence differ by one and only one bit; (b) all $n$-bit binary numbers are represented in the sequence; and (c) $a_0$ and $a_{k-1}$ differ by one bit.

a. Find a Gray code sequence of length $k = 8$ and show the (closed) path defined by the sequence of nodes of a 3-cube, whose labels are the elements of the Gray code sequence. What type of paths does a Gray code define in a hypercube?

b. To build a "binary reflected" Gray code, start with the trivial Gray code sequence consisting of the two one-bit numbers 0 and 1. To build a two-bit Gray code, take the same sequence and insert a zero in front of each number, then take the sequence in reverse order and insert a one in front of each number. This gives $\{00, 01, 11, 10\}$. The process is repeated until an $n$-bit sequence is generated. Show the binary reflected Gray code sequences of length 2, 4, 8, and 16. Prove (by induction) that this process does indeed produce a valid Gray code sequence.

c. Let an $n$-bit Gray code be given and consider the sub-sequence of all elements whose first bit is constant (e.g., zero). Is this an $(n-1)$-bit Gray code sequence? Generalize this to any of the $n$-bit positions. Generalize further to any set of bit positions.

d. Use the previous question to find a strategy to map a $2^{n_1} \times 2^{n_2}$ mesh into an $(n_1 + n_2)$-cube.

5 Consider a ring of processors with the following communication performance characteristics. Each processor can communicate with its two neighbors simultaneously, i.e., it can send or receive a message while sending or receiving another message. The time for a message of length $m$ to be transmitted between two nearest neighbors is of the form