(0101) and (1010). Hence, there are only two possibilities left: either s(S1 ) =

S1 and s(S2 ) = S3 , or s(S1 ) = S2 and s(S3 ) = S3 . Further analysis shows

that

s(S1 ) = S1 ” s(0001) = (0100),

s(S3 ) = S3 ” s(0111) = (1101),

because other choices lead to violations of Property 2. For each of these

possibilities, there are 4 possibilities left for s(0, 0, 1, 1) and this choice de-

termines completely the action of s on the coordinate vectors. Table 2 lists

the 8 remaining candidates for the inversion map in GF(16). The cases A-D

correspond to the 4 choices for S1 = s(S2 ) and the cases E-H to the 4 choices

for S3 = s(S2 ).

242 S. Nikova, V. Rijmen, and M. Schl¨¬er

a

Table 2. 8 candidates for the inversion map in GF(16). The ¬rst 4 cases have s(S2 ) =

S1 , the last 4 have s(S2 ) = S3 .

x s(x)

A B C D E F G H

0000 0000 0000 0000 0000 0000 0000 0000 0000

1111 1111 1111 1111 1111 1111 1111 1111 1111

0101 1010 1010 1010 1010 1010 1010 1010 1010

1010 0101 0101 0101 0101 0101 0101 0101 0101

0001 0011 0110 1100 1001 0100 0100 0100 0100

0010 0110 1100 1001 0011 1000 1000 1000 1000

0100 1100 1001 0011 0110 0001 0001 0001 0001

1000 1001 0011 0110 1100 0010 0010 0010 0010

0011 0001 1000 0100 0010 0111 1011 1101 1110

0110 0010 0001 1000 0100 1110 0111 1011 1101

1100 0100 0010 0001 1000 1101 1110 0111 1011

1001 1000 0100 0010 0001 1011 1101 1110 0111

0111 1101 1101 1101 1101 0011 0110 1100 1001

1011 1110 1110 1110 1110 1001 0011 0110 1100

1101 0111 0111 0111 0111 1100 1001 0011 0110

1110 1011 1011 1011 1011 0110 1100 1001 0011

The two choices S1 = s(S2 ) and S3 = s(S2 ) can be restated as S3 = s(S3 )

and S1 = s(S1 ). Hence, the choice is which elements will have order 5 and

will form together with the unit element the cyclic subgroup of order 5 of

the multiplicative group of GF (16).

Recall that the two cyclic groups of order 3 and order 5 build up the

multiplicative group of GF (16). Thus the fact that the cyclic group of order

3 is ¬xed and for the cyclic group of order 5 we have 2 choices implies that

we have two choices for the multiplicative group of GF (16) this naturally

corresponds to the fact that we have only two normal bases in GF (16).

6. Finally, we use the fact that s must satisfy

s(xy) = s(x)s(y), ∀x, y

to eliminate all but two of the candidate maps. We obtain that only case A

and case H are inversion maps, corresponding to the optimal, respectively

the normal base, mentioned before.

Table 3 gives the truth tables for the Boolean functions that computes the

rightmost bit f0 of s(x) in the two cases:

(f3 , f2 , f1 , f0 ) = s(x3 , x2 , x1 , x0 )

Due to the rotational symmetry of s, the other output bits f3 , f2 and f1 can

be computed by rotating the input bits. The Algebraic Normal Form (ANF) of

each output bit f0 is given by:

NB: f0 = x0 + x3 + x0 x1 + x1 x3 + x0 x1 x2 + x0 x1 x3 + x1 x2 x3

(5)

ONB: f0 = x1 + x0 x3 + x0 x2 + x1 x3 + x0 x1 x2 + x0 x1 x3 + x1 x2 x3

Using Normal Bases for Compact Hardware Implementations 243

Table 3. The truth tables for the Boolean functions computing the rightmost bit of

s(x) in the two cases

x A (NB) H (ONB)

0000 0 0

0001 1 0

0010 0 0

0011 1 0

0100 0 1

0101 0 0

0110 0 1

0111 1 1

1000 1 0

1001 0 1

1010 1 1

1011 0 0

1100 0 1

1101 1 0

1110 1 1

1111 1 1

In the next section we show how the ANF can be simpli¬ed to reduce the number

of operations and the resulting hardware size.

4 Hardware Considerations

In this section we optimize the hardware implementation of the di¬erent normal

bases regarding their size. We present the size of the inversion in GF(16) using

the two normal bases and compare our results with the results of Canright.

4.1 Optimizing the Implementation

In order to achieve minimal hardware implementations, the 4 formulae for the 4

output bits of the inversion in GF(16) need to be further optimized. We compute

the four output bits in parallel and share intermediate terms between di¬erent

functions. Hence, formulae which allow to share many terms, have an advantage.

Secondly, ˜+™ operations (XOR) are usually very expensive in hardware im-

plementations. For instance using the AMS 0.35μm technology [1], an XOR

gate costs 2.33 times more than a NAND gate. Hence, the ¬nal implementation

formulae should contain as little ˜+™ operations as possible. For instance, the

inversion formulae using the optimal normal basis (5) can be rewritten as:

f0 = NAND(NOR(NOR(NAND(x0 , x2 )), NAND(x3 , x1 ),

NOR(NOR(NOR(x0 , x3 ), x1 ), x2 )), NAND(x1 , x3 ))

f1 = NAND(NOR(NOR(NAND(x3 , x1 ), NAND(x2 , x0 )),

NOR(NOR(NOR(x3 , x2 ), x0 ), x1 )), NAND(x0 , x2 ))

(6)

f2 = NAND(NOR(NOR(NAND(x2 , x0 )), NAND(x1 , x3 ),

NOR(NOR(NOR(x2 , x1 ), x3 ), x0 )), NAND(x3 , x1 ))

f3 = NAND(NOR(NOR(NAND(x1 , x3 )), NAND(x0 , x2 ),

NOR(NOR(NOR(x1 , x0 ), x2 ), x3 )), NAND(x2 , x0 ))

244 S. Nikova, V. Rijmen, and M. Schl¨¬er

a

These formulae use 16 NAND gates, 20 NOR gates, 20 NOT gates. When im-

plementing the 4 output bits in parallel, the inverters xi and the common terms

NAND(xi , xi+2 ) can be shared between di¬erent output bits. Therefore, these 4

functions can ¬nally be implemented using 20 NOR gates, 8 NAND gates and 4

NOT gates.

4.2 Counting Gate Equivalents

We refer to the size of the NAND gate by one gate equivalent (GE). In the

AMS 0.35μm standard cell library [1] an XNOR gate corresponds to 2GE, an

XOR gate to 2.33GE, an inverter (INV) to 0.65GE and a NOR gate to 1GE.

These values give a total of 30.7GE for our best GF(16) inversion. This compares

favorably to the best GF(16) inversion circuit reported by Canright, which uses

9 XOR and 10 NAND gates, corresponding to 31.0GE [2]. Referring to personal

communication with Satoh, Canright equates XOR and XNOR gates to 1.75GE,

and inverters to 0.75GE. Using these values, our best GF(16) inversion costs

31.0GE, but the cost of the best Canright implementation is reduced to 25.8GE.

However, standard cell libraries provide additional operations other than the

basic Boolean operations INV, NAND, NOR, XNOR and XOR. For example,

there are extended operations with more than one input which can be used to

improve the 3-input NOR terms of (6). Additionally, the AMS 0.35μm library

provides specialized standard cells like AND-OR-INVERT (AOI) which compute

Q = A.B+C+1, or OR-AND-INVERT (OAI) which compute Q = (A+B).C+1.

Both can be implemented using only 1.33GE [1] and have far less size than

the XOR or XNOR gates. Using these additional cells, we have been able to

improve the GF(16) inversion of Canright to 26.7GE and the inversion based on

the optimal normal base formulae to 22.7GE. Table 4 gives an overview of these

results.

Table 4. Equivalent gate costs for the implementation of several GF(16) inversions

Design Canright™s GE [3] basic standard cells [1] all standard cells [1]

Canright 25.8 31.0 26.7

NB 34.3 34.0 24.7

ONB 31.0 30.7 22.7