no cost in hardware implementations.

5.4 Toom-3 and Toom-4

From [24], we take the number of basic operations needed for the 3 and 4-way

splitting. As usual we consider p-bits operands.

With Toom-3 we operate on p/3 -bits long parts. Evaluation requires 5

additions and 2 shifts per operand. The 5 point-wise multiplications involve

(p/3) + 2 -bits long parts, because of evaluations in x and x + 1, which increase

the degree. Interpolation operates on doubled parts, and requires 10 additions,

2 shifts and 2 divisions by x + 1, plus 2 other additions for ¬nal recomposition;

in total we have a cost of 10 + 2 · 2 + 2 additions.

The Toom-3 product by a ¬xed operand hence requires:

“ p/3 · 5 bit operations for the evaluation,

“ 5 multiplications of polynomials with degree (p/3) + 2 ,

“ (p/3) + 2 · 2 · 17 bit operations for the interpolation.

For Toom-4 we have p/4 -bits long parts. Evaluation requires 15 additions

and 9 shifts per operand. The 7 point-wise multiplications involve (p/4)+3 -bits

long parts. Interpolation operates on doubled parts, and requires 29 additions,

16 shifts and 4 divisions by x2 + 1 and x3 + 1, plus 3 other additions for ¬nal

recomposition; in total we have the cost of 29 + 4 · 2 + 3 additions.

The Toom-4 product by a ¬xed operand hence requires:

“ p/4 · 15 bit operations for the evaluation,

“ 7 multiplications of polynomials with degree (p/4) + 3 ,

“ (p/4) + 3 · 2 · 40 bit operations for the interpolation.

All Toom-s methods can be used recursively and have asymptotic complexity

O(plogs (2s’1) ), but the bigger the splitting order, the heavier the overhead of

evaluation/interpolation.

5.5 Numerical Examples

Let us ¬x the following choice for the system parameters: p = 8192, k0 = 2, n0 =

3, that are the choices adopted in the second variant of the cryptosystem (see

Section 3.3).

The use of 2 recursions of Toom-4, 1 of Toom-3 and 4 of Toom-2 reduces one

p-bit multiplication to 72 · 5 · 34 = 19845, 11-bits sized, sub-products, each one

A New Analysis of the McEliece Cryptosystem Based on QC-LDPC Codes 257

with a cost of 112 /2 operations. We have an overhead of 301 655 operations for

evaluations and 1 617 574 for interpolation. The total cost of computing u · G is

then 301 655·k0+1 617 574·n0+72 ·5·34 ·112 ·k0 ·n0 /2+n0 ·p = 12 684 343 bit opera-

tions, included the ¬nal reduction modulo (xp +1). This count gives a far smaller

result than the na¨ technique, that requires k0 · n0 · p2 /2 = 201 326 592 opera-

±ve

tions. With the same approach, we can obtain the cost of computing u · S = u:

8 657 332 operations using Toom-Cook, versus 134 217 728 with a na¨ imple-

±ve

mentation.

With parameters p = 4096, k0 = 3, n0 = 4, that are very similar to the

choices adopted for the proposal in [9], we assume 3 recursions of Toom-4 and 3

of Toom-2, that result in a number of bit operations for u · G of 8 074 444 versus

100 663 296 with the na¨ approach. While for the decryption step u · S = u

±ve

we obtain 6 166 127 bit operations versus 75 497 472.

6 Vector-Toeplitz Convolution

Another approach to speed-up polynomial products in cryptography is the Wino-

grad convolution [26]. It is very similar to Karatsuba™s multiplication, but it has

a smaller overhead. We shortly recall it. Given an even sized 2d — 2d Toeplitz

matrix T, we can factorize it as follows:

⎛ ⎞⎛ ⎞

T1 ’ T0 0 0 0I

T0 T1 I0I ⎝ ⎠ ⎝ I 0⎠ ,

T2 ’ T0 0

0

=

T2 T0 0II

0 0 T0 II

where I is the d— d identity matrix, and T0 , T1 , T2 are themselves d— d Toeplitz

matrices, as also T1 ’ T0 and T2 ’ T0 . It follows that the vector-matrix product

(V0 , V1 ) · T can be computed with three steps:

“ the addition V0 + V1 ,

“ 3 vector-matrix sub-products by d — d Toeplitz matrices,

“ 2 more additions to obtain the result.

Since circulant matrices are also Toeplitz and the proposed size p is a power

of 2, this optimization can be used for as many recursions as needed for our

computations. Asymptotic complexity is exactly the same for Toom-2 and this

approach, but if we analyze again the pre- and post-computation, we obtain:

“ p/2 operations for the “evaluation”,

“ 3 multiplications with dimension p/2,

“ p/2 operations for the “interpolation”.

6.1 Numerical Examples

This method cannot be mixed with Toom-Cook, and its main advantage is that

it is much easier to implement in software.

If we consider again p = 8192, k0 = 2, n0 = 3, the use of 11 recursions of Wino-

grad™s method leads to 14 106 224 operations for encryption versus 12 684 343

258 M. Baldi, M. Bodrato, and F. Chiaraluce

obtained with Toom. The decryption step u · S = u can be completed in

9 871 080 operations, around 15% more than with the polynomial strategy, but

with a far simpler source code. With smaller parameters, for example p =

4096, k0 = 3, n0 = 4, the di¬erence is smaller: 11 recursions used for encryption

give 8 103 706 operations, only 1% more than the cost computed using Toom-

Cook.

On the other hand, the use of polynomials gives greater ¬‚exibility. For exam-

ple, with the odd parameter p = 5555, k0 = 2, n0 = 3, 2 recursions of Toom-4 and

5 of Toom-2 give a cost for encryption of 7 310 809 bit operations, a 12— speed-

up with respect to the na¨ approach. For the same parameters, Winograd™s

±ve

trick is not applicable at all, because p is odd.

7 Cryptosystem Complexity Assessment

In this section we evaluate the encryption and decryption complexity of the pro-

posed cryptosystem, by considering the usage of e¬cient computation algorithms

suitable for the particular structure of the matrices involved.

Encryption complexity is due to multiplication of the cleartext by the code

generator matrix and to addition of intentional errors. It can be expressed as

follows:

Cenc = Cmul (u · G ) + n (12)

where Cmul (u · G ) represents the number of operations needed for calculating

the product u · G and n binary operations are considered for the addition of

vector e.

The decryption complexity, instead, can be divided into three parts:

Cdec = Cmul (x · Q) + CSP A + Cmul (u · S) (13)

where Cmul (x · Q) and Cmul (u · S) represent the number of operations needed

for computing x · Q and u · S, respectively, while CSP A is the number of op-

erations required for LDPC decoding through the sum-product algorithm. In

expressions (12) and (13), Cmul (u · G ) and Cmul (u · S) involve multiplication

by dense matrices, so we can resort to e¬cient algorithms, like the Toom-Cook

method described in the previous section. Cmul (x · Q), instead, expresses the

number of operations needed to perform the product of a 1—n vector by a sparse

n— n matrix (Q, with row/column weight equal to m). For this reason, we resort

to the na¨ implementation, that has the lowest complexity Cmul (x · Q) = n·m.

±ve

For the decoding complexity, the following expression can be adopted [16]:

CSP A = Iave · n [q (8dv + 12R ’ 11) + dv ] (14)

where Iave is the average number of decoding iterations, q is the number of

quantization bits used inside the decoder and R = k0 /n0 is the code rate. Nu-

merical simulations have permitted to verify that, for the codes involved in the

¬rst cryptosystem implementation, assuming q = 6 and t = 190, it is Iave 5.

A New Analysis of the McEliece Cryptosystem Based on QC-LDPC Codes 259

For the codes used in the new variant, instead, assuming q = 6 and t = 470, it

is Iave 9. These values of Iave are further reduced for smaller t.

As concerns the public key length, the proposed cryptosystem uses, as the

public key, a generator matrix, G , formed by k0 — n0 circulant blocks with size

p. Therefore, it can be completely described by k0 · n0 · p bits.

Table 1. Comparison between the proposed versions of QC-LDPC based McEliece

cryptosystem and other schemes

McEliece Niederreiter RSA QC-LDPC QC-LDPC