DPA. This is due to the following lemma:

Lemma 1. For a ¬xed key and input message, every intermediate byte that

is computed in the course of the randomised AES algorithm has the uniform

distribution in {0, 1}8.

Proof. The proof follows directly from the fact that any intermediate AES data

x is represented as P (x), where P (xh xl ) = p2 (xh ) p1 (xl ) is the randomised

permutation. From the construction of p1 , p2 , this implies that P (x) is randomly

distributed in {0, 1}8.

The previous lemma implies that an attacker who makes statistics about

power-consumption at a single point gets only random values; hence, the coun-

termeasure is resistant against ¬rst-order DPA (like the random masking coun-

termeasure). In order to obtain valuable information about the key, the attacker

must correlate the power consumption at multiple points during the execution of

the algorithm; this is called a High Order Di¬erential Power Analysis (HO-DPA);

such attack usually requires a much larger number of power consumption curves,

which makes it infeasible in practice if the number of executions is limited (for

example, by using a counter).

5 A Compression Scheme

A very nice compression scheme for SBOX tables has been proposed in [11]. This

compression scheme works for SBOXes with a random mask; we recall it in this

286 J.-S. Coron

section.1 Then in Section 6 we show how to adapt it to our permutation table

countermeasure.

Let S(x) for x ∈ {0, 1}8 be a 8-bit to 8-bit SBOX. One de¬nes S1 (x) and

S2 (x) such that S(x) = S2 (x) S1 (x) for all x ∈ {0, 1}8, where S1 (x) and S2 (x)

are 4-bit values. Let r1 , r2 ∈ {0, 1}8 and s ∈ {0, 1}4 be random masks, and let

de¬ne the randomised table:

T (x) = S1 (x • r1 ) • S2 (x • r2 ) • s (6)

which is a 8-bit to 4-bit table. Let x = x • (r1 • r2 ) be a masked data; we have

from (6):

S1 (x) = S1 (x • r1 • r2 ) = T (x • r2 ) • S2 (x ) • s

S2 (x) = S2 (x • r1 • r2 ) = T (x • r1 ) • S1 (x ) • s

which gives:

S1 (x) • s = T (x • r2 ) • S2 (x ) (7)

S2 (x) • s = T (x • r1 ) • S1 (x ) (8)

Therefore given the masked data x one can obtain a masked S1 (x) and a

masked S2 (x), by using the randomised table T . The advantage is that the

size of T is only half the size of a classically randomised SBOX table; here the

storage of T requires only 128 bytes of RAM instead of 256 bytes for a classically

randomised AES SBOX. More precisely, the algorithm is as follows:

InitTable(r1 , r2 , s)

1. Write S(x) = S2 (x) S1 (x)

2. Generate randomised table T (x) = S1 (x•r1 )•S2 (x•r2 )•s for all x ∈ {0, 1}8

TableLookUp(x , r1 , r2 , s)

1. Generate a random t ∈ {0, 1}4

2. u ← T (x • r2 ) • S2 (x )

3. v ← T (x • r1 ) • S1 (x ) • t

4. Output y = v u ∈ {0, 1}8 and r = (s • t) s.

Here we add an additional masking step with t so that the values u and v are

not masked with the same nibble s; from equations (7) and (8), we obtain that

the value returned by TableLookUp is such that y = S(x) • r .

It is easy to see that this countermeasure is resistant against ¬rst-order DPA,

as all the variables that appear in the TableLookUp function are uniformly dis-

tributed. Note that the tables S1 and S2 are only stored in ROM; they don™t

need to be randomised because in the TableLookUp algorithm those tables are

accessed at point x which is already randomised.

1

In [11] the countermeasure is described in plain English which makes it di¬cult to

understand; here we attempt to provide a more clear description.

A New DPA Countermeasure Based on Permutation Tables 287

It is also easy to see that the countermeasure can be generalised to any SBOX

input and output size. Moreover, one can obtain a better compression factor by

splitting S(x) into more shares; for example, a 8-bit SBOX could be split into

8 tables (one for each output bit); then the resulting randomised T table would

be 8 times smaller, at the cost of increasing the running time for every table

look-up.

6 Time Memory Trade-O¬s

In this section we describe three time-memory trade-o¬s. The goal is to re-

duce the RAM requirement of the permutation table countermeasure described

in Section 3, for implementation on low-cost devices, at the cost of increasing

the running time. The main time-memory tradeo¬ is based on the SBOX com-

pression scheme recalled in the previous section. The second idea consists in

using a single XOR table XT1 (x , y ) instead of two XOR tables XT1 (x , y ) and

4 4

XT2 (x , y ) as in Section 3.2. The third idea consists in removing tables TL, TH,

4

RL and RH, by using a “Double-And-Add” approach. In Section 7 we describe

the results of practical implementations of these time-memory trade-o¬s.

6.1 Compressed SBOX

The compression scheme of [11] recalled in the previous section was used for

random masking; here we show how to adapt this compression scheme to our

permutation table countermeasure.

We de¬ne a new permutation P (x) = p2 (xh ) p1 (xl ) where p1 and p2 are 4-bit

permutations tables which are generated like p1 and p2 . As previously, we write:

S(x) = S2 (x) S1 (x)

where S1 (x) and S2 (x) are 4-bit nibbles. We then de¬ne a randomised table:

T (x ) = p1 S1 P ’1 (x ) • p2 S2 P ’1 P ’1

(x ) (9)

The randomised table T (x ) is a 8-bit to 4-bit table; therefore, its storage requires

128 bits in memory, instead of 256 bytes for the randomised table S (x ) in

Section 3.3. Writing x = P (x), we obtain from equation (9):

p1 (S1 (x)) = T (P (x)) • p2 S2 P ’1 P ’1

(P (x))

This shows that given P (x) we can compute p1 (S1 (x)) using randomised table

T and table S2 stored in ROM. Similarly, writing x = P (P (x)), we obtain from

equation (9):

p2 (S2 (x)) = T (P (P (x))) • p1 S1 P ’1 (P (P (x)))

This shows that given P (x) we can compute p2 (S2 (x)) using randomised table

T and table S1 stored in ROM. Therefore, given P (x) we can compute:

P (S(x)) = p2 (S2 (x)) p1 (S1 (x))

288 J.-S. Coron

using the following operations:

InitTable(P, P ):

1. Write S(x) = S2 (x) S1 (x)

2. Generate table T (x ) = p1 S1 P ’1 (x ) • p2 S2 P ’1 P ’1

(x ) for all

x ∈ {0, 1}8.

TableLookUp(x , T, P, P )

1. u ← T (x ) • p2 S2 P ’1 P ’1 (x )

2. v ← T (P (x )) • p1 S1 P ’1 (P (x ))

3. Output v u

Given the tables P , P , T and denoting F (x ) = TableLookUp(x , T, P, P ),

the SubBytes operation on the permuted state variables is computed as follows:

si,j ← F (si,j )

The table T requires 128 bytes in memory, and the additional permutations P

and P ’1 require 32 bytes in memory, so in total 160 bytes are required (instead

of 256 bytes for the randomised table S ). The InvSubBytes operation on the

permuted state variables with compressed inverse SBOX S ’1 is obtained by

replacing S by S ’1 in the previous equations.

6.2 Single XOR Table

In Section 3.2 two 8-bit to 4-bit xor-tables are de¬ned. In this section, we show

that it is su¬cient to de¬ne only one 8-bit to 4-bit xor-table. As in Section 3.2,

we de¬ne: