Table 1 in Section 7 for a detailed comparison).

2 The AES Encryption Algorithm

In this section we recall the main operations involved in the AES algorithm. We

refer to [4] for a full description. AES operates on a 4 — 4 array of bytes si,j ,

termed the state. For encryption, each AES round (except the last) consists of

four stages:

1. AddRoundKey: each byte of the state is xored with the round key ki,j ,

derived from the key schedule:

si,j ← si,j • ki,j

2. SubBytes: each byte of the state is updated using an 8-bit S-box:

si,j ← S(si,j )

3. ShiftRows: the bytes of the state are cyclically shifted in each row by a

certain o¬set; the ¬rst row is left unchanged.

4. MixColumns: the bytes of the state are modi¬ed column by column as

follows:

s0,c ← (02 · s0,c ) • (03 · s1,c ) • s2,c • s3,c

s1,c ← s0,c • (02 · s1,c ) • (03 · s2,c ) • s3,c

s2,c ← s0,c • s1,c • (02 · s2,c ) • (03 · s3,c )

s3,c ← (03 · s0,c ) • s1,c • s2,c • (02 · s3,c )

The pseudo-code for AES encryption with a 128-bit key is recalled in Figure 1

in Appendix. The word array w contains the round keys that are generated by

the key-schedule algorithm. We refer to [4] for a more detailed description.

280 J.-S. Coron

For decryption, each round (except the last) consists in the following

operations:

1. InvShiftRows: is the inverse of the ShiftRows operation. The bytes of the

state are cyclically shifted in each row by a certain o¬set; the ¬rst row is left

unchanged.

2. InvSubBytes: is the inverse of the SubBytes operation. The inverse S-box

S ’1 is applied on each byte of the state.

3. AddRoundKey: the operation is equal to its own inverse.

4. InvMixColumns: is the inverse of the MixColumns operation. The bytes of

the state are modi¬ed column by column as follows:

s0,c ← (0e · s0,c ) • (0b · s1,c ) • (0d · s2,c ) • (09 · s3,c )

s1,c ← (09 · s0,c ) • (0e · s1,c ) • (0b · s2,c ) • (0d · s3,c )

s2,c ← (0d · s0,c ) • (09 · s1,c ) • (0e · s2,c ) • (0b · s3,c )

s3,c ← (0b · s0,c ) • (0d · s1,c ) • (09 · s2,c ) • (0e · s3,c )

The pseudo-code for the inverse cipher is recalled in Figure 2 in Appendix.

Finally, the key-schedule is based on the following operations:

1. SubWord: takes a four-byte input word and applies the S-box S to each of

the four bytes.

2. RotWord: takes a word [a0 , a1 , a2 , a3 ] as input and performs a cyclic per-

mutation to return [a1 , a2 , a3 , a0 ].

3. Xor with Rcon: takes as input a 32-bits word and xor it with the round

constant word array Rcon[i] = [(02)i’1 , 00, 00, 00], for round 1 ¤ i ¤ 10.

We refer to [4] for a full description of the key-schedule.

3 The Permutation Table Countermeasure

In this section we describe our basic countermeasure with permutation tables.

A variant with a time-memory trade-o¬ is described in Section 6.

Our countermeasure consists in using a randomised representation of the state

variables, using two independent 4-bit permutation tables p1 and p2 that are

freshly generated before each new execution of the algorithm. More precisely,

every intermediate byte x = xh xl , where xh is the high nibble and xl is the low

nibble, will be represented in the following form:

P (x) = p2 (xh ) p1 (xl )

This permuted representation is then used throughout the execution of AES.

Eventually the original representation is recovered at the end of the encryption

algorithm, by applying the inverse permutation.

A New DPA Countermeasure Based on Permutation Tables 281

Generation of Permutation Tables p1 and p2

3.1

The 4-bit permutation tables p1 and p2 are generated for each new execution of

the algorithm as follows. Let s0 be a ¬xed 4-bit permutation table; one can take

for example:

s0 = [14, 6, 0, 5, 9, 1, 4, 15, 8, 10, 12, 2, 3, 13, 11, 7]

One de¬nes a sequence of permutations si for i ≥ 0 as follows:

si+1 (x) = s0 (si (x) • ki )

where each ki ∈ {0, 1}4 is randomly generated. The 4-bit permutation p1 is then

de¬ned as p1 := sn for some n (in practice, one can take n = 4). One applies

the same procedure to generate the other table p2 (with independently generated

ki ™s). Every intermediate byte x = xh xl that appear in AES is then represented

as:

P (x) = p2 (xh ) p1 (xl )

Therefore, P is a 8-bit permutation; its storage requires 16 bytes of RAM. In

the following we explain how to use this permuted representation throughout the

AES operations, so that the intermediate data are never manipulated in clear.

3.2 AddRoundKey

Given P (x) and P (y) we explain how to compute P (x•y) without manipulating

x and y directly (since otherwise this would give a straightforward DPA attack).

We de¬ne the following two 8-bit to 4-bit xor-tables; for all x , y ∈ {0, 1}4:

XT1 (x , y ) := p1 (p’1 (x ) • p’1 (y ))

4 1 1

XT4 (x , y ) := p2 (p2 (x ) • p’1 (y ))

’1

2

2

Those two tables require a total of 256 bytes in memory. Then given p1 (x) and

p1 (y) one can compute p1 (x • y) using:

XT1 (p1 (x), p1 (y)) = p1 (x • y)

4

for all x, y ∈ {0, 1}4. Similarly, we have:

XT2 (p2 (x), p2 (y)) = p2 (x • y)

4

Using those two tables we de¬ne the following function for all x , y ∈ {0, 1}8,

where x = xh xl and y = yh yl :

XT8 (x , y ) = XT2 (xh , yh ) XT1 (xl , yl )

4 4

Then given P (x) and P (y), one can compute P (x • y) as:

XT8 (P (x), P (y)) = P (x • y) (1)

282 J.-S. Coron

The AddRoundKey operation can then be implemented as:

si,j ← XT8 (si,j , ki,j )

where si,j = P (si,j ) and ki,j = P (ki,j ). It is therefore necessary to use the

permuted representation for the round keys. We further describe how this is

done by modifying the key-schedule operations (see sections 3.9, 3.10 and 3.11).

3.3 SubBytes

Let S be the AES SBOX. We de¬ne the following randomised permutation S :

S (x) = P (S(P ’1 (x)))

Given P (x), the permuted representation of S(x) is then computed as:

P (S(x)) = S (P (x))

The SubBytes operation on the permuted state variables can then be computed

using the table S as follows: