we have A = g1 1, just after the inner loop iteration for j = L1 , i = 1

(and still after the outer loop iteration for j = L1 if L1 ≥ maxi Li ). From this

and earlier loop iterations, we can obtain the following s ’ 1 intermediate values:

(b1, ,...,b1,L1 +···+Ls’1 )2

j = L1 + · · · + Ls’1 ’ A = g1

... ...

(b ,...,b1,L1 )2

j = L1 , i = 1 ’ A = g1 1,

Thus, we can output the following data to be cached”a cache entry comprising

g1 itself (as index to the cache) and s ’ 1 pairs, each consisting of an integer and

the corresponding power of g1 :

(b1, ,...,b1,L1 )2

g1 , (b1, , . . ., b1,L1 )2 , g1 ,

. . .,

(b1, ,...,b1,L1 +···+Ls’1 )2

(b1, , . . ., b1,L1 +···+Ls’1 )2 , g1

Note that when writing this to cache, the integers may be evaluated as such”

there is no need to store the speci¬c radix-2 representations. (However, since all

of these integers are derived from e1 following a ¬xed rule, it is clear that at most

bits are su¬cient to store complete information on all of them, should memory

e¬ciency be an utmost concern.) With any of the representations mentioned in

Section 2.1, these partial integers are guaranteed to be non-negative, with

(b1, , . . ., b1,L1 )2 ≥ . . . ≥ (b1, , . . ., b1,L1 +···+Ls’1 )2 ≥ 0.

Furthermore, if e1 is uniformly random from some set (0, . . ., q) of integers

where q is an -bit integer, then (unless Ls is very small) all of these inte-

gers will actually be positive with high probability (and will be reasonably

close to 2 ’L1 , . . ., 2 ’L1 ’···’Ls’1 , respectively; i.e., since = 1¤i¤s Li , to

L2 +···+Ls Ls

2 , . . ., 2 ).

Depending on the assumed distribution of e1 , it may be a good idea to skip

writing a cache entry if it ever turns out that (b1, , . . ., b1,L1 +···+Ls’1 )2 = 0. In

any case, writing a cache entry should be skipped if all of the integers in it would

be zero (and thus the corresponding powers of g1 trivial).

48 B. M¨ller and A. Rupp

o

Multi-exponentiation for an Old Base g1

3.2

If a cache entry based on g1 is available (created as described in Section 3.1),

e

then 1¤i¤k gi i may be computed as follows. First, parse the cache entry as

g1 , (»1 , G1 ), . . ., (»s’1 , Gs’1 ) . Here we have Gi = g1 i for 1 ¤ i ¤ s ’ 1, and

»

if one of the exponent representations mentioned in Section 2.1 was used while

creating the cache entry as speci¬ed in Section 3.1, we have »1 ≥ . . . ≥ »s’1 ≥ 0.

Now split e1 into integers Ei (1 ¤ i ¤ s):

“ let d0 = e1 ;

di’1

“ for 1 ¤ i ¤ s ’ 1, let Ei = and di = di’1 ’ Ei »i ;

»i

“ and ¬nally, let Es = ds’1 .

In the exceptional case that »i = 0, Ei = 0 should be substituted for di’1 . By

»i

this construction, we have e1 = E1 »1 + · · · + Es’1 »s’1 + Es . It follows that

E

g11 = GE1 · · · · · Gs’1 · g1 s ,

e E

s’1

1

e

and thus we have transformed the power g11 into a power product using new

exponents Ei . This step is similar to radix-2 exponent splitting; we call it modular

exponent splitting. Suitable digit sets for radix-2 representations of each of the

new exponents can be chosen depending on how much read/write memory is

available for storing powers of the bases G1 , . . ., Gs’1 and g1 (cf. Section 2.1).

For the exponents to the ¬xed bases g2 , . . ., gk , we again (exactly as in Sec-

tion 3.1) assume that these are given in representations ei = 0¤j¤ bi,j · 2j ,

bi,j ∈ Bi . We apply radix-2 exponent splitting to these, giving us exponent

representations of maximum length L.

In total, by applying both modular exponent splitting and radix-2 exponent

splitting, we have converted the k-fold multi-exponentiation into a ks-fold multi-

exponentiation. The maximum length of exponent representations here may ex-

ceed L since we do not have strict guarantees regarding the Ei . However, under

the assumptions regarding the distribution of e1 stated in Section 3.1, the max-

imum length will remain around L with high probability.

This completes the description of our new technique. For an illustrative ex-

ample we refer the reader to Appendix A.

4 Performance

Our multi-exponentiation technique can be used under many di¬erent

parameterizations”the number of bases may vary; the length of exponents

may vary; the amount of memory available for ¬xed precomputation (such as

ROM) may vary; the amount of memory available for cache entries (such as

slow read/write memory) may vary; the amount of memory available for vari-

able precomputed elements (i.e., intermediate powers) needed by the interleaved

exponentiation algorithm may vary; and under any of these parameterizations,

we have to decide on parameters s and L1 , . . ., Ls for exponent splitting (s-fold

Faster Multi-exponentiation through Caching 49

exponent splitting with exponent segment lengths Li ), and we have to decide

on digit sets and representation conversion techniques for the exponents to the

¬xed bases g2 , . . ., gk on the one hand, and for any of the s partial exponents

created from e1 when the algorithm from Section 3.2 uses a cache entry on the

other hand. This encompasses a large variety of di¬erent settings.

In the present section, we will look at a speci¬c range of rather simple

use scenarios for our new technique to assess its performance. Let us assume

that we want to implement the multi-exponentiation technique in an environ-

ment where only a very limited amount of fast read/write memory is avail-

able but where we have some slower memory suitable for the cache, and where

we have plenty of read-only memory for permanently ¬xed precomputed ele-

ments. As powers of g1 are frequently needed in the course of the algorithm,

this is what we will use such fast memory for. As particular examples, let us

consider the cases where we have such fast memory space to store 4, 8, 16 or

32 group elements, and let be 160, 192 or 256, which are practical values for

ECDSA. Note that restricting the space for storing powers of a base also limits

the number of di¬erent digit values that we can use in exponent representations

for the interleaved multi-exponentiation algorithm. We have implemented our

new multi-exponentiation strategy and counted certain group operations under

these prerequisites for di¬erent values of the splitting parameter s, always using

reasonable Li ≈ +1 and a left-to-right signed fractional window representa-

s

tion using appropriate digit sets B ± (m) = {±1, ±3, . . ., ±m, 0} such as to fully

utilize the fast memory. (See [11,23,16] for details regarding left-to-right signed

fractional window conversions.)

We have repeatedly simulated the behavior of our technique for uniformly

random exponents in the interval (0, . . ., 2 ’ 1), covering both the case of “new

bases” to create cache entries (Section 3.1) and the case of “old bases” to observe

the performance given such cache entries (Section 3.2). In these simulations, we

have counted the following operations:

“ Squarings (S) and other multiplications (M ) used for precomputing powers

of g1 (including powers of cache entries derived from g1 );

“ squarings (S) and multiplications (M ) by precomputed powers of g1 (or of

cache entries) within the interleaved multi-exponentiation algorithm.

We have excluded from counting any of the multiplications by ¬xed precomputed

elements (from ROM), since these are not a limiting factor given the assumption

that plenty of space is available for these elements: low-weight exponent represen-

tations accordingly may be used for the corresponding exponents, so changes of

the parameterization have less of an impact here. (Refer to Section 2.1 for applica-

ble weight estimates.) The simulation results can be found in Table 1. The values

in the ¬rst row (s = 1) re¬‚ect the special situation when no splitting at all is done.

This applies to the multi-exponentiation algorithm for a new base g1 for which no

cache entry is available (Section 3.1), where a signed-digit representation is used

for the full-length exponent e1 . The remaining rows contain operation counts for

cases where g1 is an old base, i.e., an existing cache entry is used (Section 3.2).

As we can see from the table, the number of squarings will be reduced to about

50 B. M¨ller and A. Rupp