⎢ YDM , S ← KeyG (YDMGM , t = 1, m = 1); Y := YDMGM ||YDM ; ⎥

⎢ ⎥

DM

Pr ⎢ (δ1 , . . . , δK ) ← AOSign,MINING,Corrupt (1κ ); ⎥

⎢ ⎥

⎣ output1 ← usageChar(tableLook(OPEN(δ1 )), . . . , tableLook(OPEN(δK ))); ¦

output2 ← MINING(δ1 , . . . , δK )] : output1 = output2

recall that the MINING protocol is a code snippet that employs the subprotocols

DEC, CMP and MIX (as de¬ned in section 2) that operate on the mining tags

of the signatures.

2. Traceability: For any ppt adversary A, the following probability is negligible:

⎡ ¤

YDMGM , public user database, sk1 , . . . , skn ← KeyGDMGM (1κ , n);

⎢ aux, YDM , S1 , . . . , Sm , m, t ← A(YDMGM ); Y := YDMGM ||YDM ; ⎥

⎢ ⎥

Pr ⎢ M, δ ← ASign,Corrupt (Y, aux) s.t. (i, M ) ∈ hist(Sign); ⎥

⎢ ⎥

⎣ true ← VERIFY(Y, M, δ); i ← tableLook(OPEN(δ)) : ¦

(i ∈ Corr) ∨ (i ∈ {1, . . . , n})

/

3. Anonymity: an adversary A against anonymity is a ppt that receives the

public-key Y of the system as well as it is allowed to corrupt any number of

signers adaptively. The identi¬ers of the corrupted signers are maintained in

a set Corr . Furthermore A interacts with three oracles OSign, Open, Mining,

where OSign stands for an oracle that receives (i, M ) and returns a signature

on behalf of i-th signer whereas Open and Mining stand for the single (honest)

server executions of the corresponding two protocols de¬ned above.

70 A. Kiayias, S. Xu, and M. Yung

Consider now an oracle anonSign with the speci¬cation that it takes as

input a pair (i, M ) but it ignores its ¬rst input (which is the user™s iden-

tity) and records all its answers. We also de¬ne two oracles anonOpen and

anonMining. (1) anonOpen takes as input a signature δ on M ; if δ is not valid

it returns ⊥; if δ was the output of anonSign on input (i, M ) it returns pki

(the public-key of the user on whose behalf anonSign produced a signature),

otherwise (if δ was not produced by anonSign) the oracle behaves as Open as

long as Open returns some pki such that i ∈ Corr (otherwise, if i ∈ Corr

the oracle returns ⊥). (2) anonMining is given as input a sequence of valid

signatures δ1 , . . . , δK ; if δj was the output of anonSign on input (i, M ), set

Lj = i. Otherwise, compute i = tableLook(Open(δj )) and if i ∈ Corr set

Lj = i; if on the other hand, i ∈ Corr return ⊥. Finally, the anonMining

oracle returns usageChar(L1 , . . . , LK ).

A data mining group signature, satis¬es anonymity if there exists an ora-

cle anonSign such that for the oracles anonOpen, anonMining as de¬ned above

it holds that any ppt anoymity adversary A cannot distinguish between these

three oracles and the OSign, Open, Mining oracles.

Some remarks about the de¬nition above are in place: The de¬nition of anonymity

is in the sense of indistinguishability between the real implementation of Mining

and Open functionalities and an idealized version of them. This allows maximum

¬‚exibility in designing distributed datamining schemes.

The de¬nition of anonymity implies the non-malleability of the encryption

algorithm employed in DMGS, since if an adversary is capable of modifying

the signature of an uncorrupted user without a¬ecting its validity, this would

force the anonOPEN oracle to return ⊥ (something that would not occur in the

case of the OPEN oracle). Furthermore, observe that the de¬nition of anonymity

implies that the encryption algorithm satis¬es IND-CPA security (as all signa-

tures and hence ciphertexts can be simulated by anonSign without the plaintext

information that corresponds to the signer™s identity).

Note that the functions of anonOpen, anonMining can consult the correspon-

dence between simulated signatures and identities and thus maintain correctness

as in a real world execution.

4 Data Mining Group Signatures: E¬cient Construction

Here we present a concrete DMGS scheme based on the short group signature

of Boneh et al. [4], which is based on bilinear maps. The public parameters of

the scheme are the following:

(p1) Two groups of order p where p is a p -bit prime, denoted by G1 = g1

and G2 = g2 , so that e : G1 — G2 ’ GT is a bilinear map such that (1)

for all u ∈ G1 , v ∈ G2 , a, b ∈ Z, e(ua , v b ) = e(u, v)ab , and (2) e(g1 , g2 ) =

1GT . Moreover, let ψ be a computable isomorphism from G2 to G1 with

ψ(g2 ) = g1 .

(p2) An elliptic curve group of prime order q where q is q -bit prime, denoted

by G1 , over which the Decisional Di¬e-Hellman is hard.

Privacy Preserving Data Mining within Anonymous Credential Systems 71

We assume p = q for simplicity but our construction can also be ported to the

more general setting that di¬erent size groups are being used. We notice that p

can be quite small, e.g., the order of 170 bits is su¬cient. Now we specify the

signature scheme.

KeyGDMGM : The public parameters are selected as described above in (p1) and

γ

(p2). The key-generator selects γ ←R Zp and sets w = g2 . The key YDMGM

is set to g1 , g2 , w, u, desc(G1 ||G2 ||GT ||e), n , where desc(·) is a description of

the given groups including membership test and de¬nition of group operation,

1/(γ+xi )

for xi ←R Zp .

and the users™ secret keys are set to ski = xi , σi = g1

xi

Note that e(σi , g2 w) = e(g1 , g2 ) is the property satis¬ed by all user secret keys,

and that σi or Gxi will uniquely identify the user. We also let u ∈ G1 to be

a generator of the group. The DMGM maintains a user database that contains

entries of the form {(idi , Gxi )}i . The algorithm tableLook(·) on input Gxi will

return the identity idi . Note that idi , Gxi can be required to be digitally signed

by the user so that non-repudiation is facilitated (but we do not consider this

aspect in our current modeling).

KeyGDM : here we use the m-server protocols ExpGen and PkGen2 from section 2

over the group G1 . Recall that the protocol is based on parameters m, t and

will produce the values H0 , H1 , . . . , Ht’1 ∈ G1 as well as the private output

Si for each server i such that GSi = H0 H1 . . . Ht’1 . The public-key that will be

it’1

i

(t,m)

used for encryption will be H0 = Gs with s ←’ (S1 , . . . , Sm ). Additionally the

1

protocol PkGen2 produces value G2 ∈ G1 .

Remark 1. If the DMGM should not know the private keys of the users (i.e., if

the property of exculpability is required), then according to [4] one can achieve

this by extending the above KeyGDMGM algorithm as follows: instead of giving

1/(γ+xi )

user i the private key (σi = g1 , xi ), the user and the key issuer can execute

an interactive protocol so that at the end user i will obtain a triple (σi , xi , yi )

such that σi i hyi = g1 , where h1 ∈ G1 is a public parameter, and yi ←R Zp is

γ+x

1

chosen by the user and kept secret from the group manager. If done so, the SIGN

protocol below needs to be extended correspondingly (but this can be done in a

straightforward manner).

Sign: Given a user™s secret key x, σ and a message M . The signing algorithm will

be obtained by applying the Fiat-Shamir heuristics on an appropriately selected

proof of knowledge. The proof will also be helpful for the non-malleability aspects

of the ciphertext that is embedded into the signature. Below we explain this proof

in detail. First, the signer computes the following values: T1 = g1 uz , T2 = g1 σ,

z z

T3 = Gr , T4 = Gr , T5 = H0 Gx , where r, z, z ←R Zp .

r

1 2

Subsequently the signer will construct the signature on a given message M by

providing a proof for a suitable set of relations; the signer knows the witnesses

v

vx , vz , vz , vxz , vxz , vr that satisfy the following relationships: T1 = g1 z uvz in

G1 , T3 = Gvr in G1 , T4 = Gvr in G1 , T5 = H0 r Gvx in G1 , T1 x = g1 xz uvxz

v v v

1 2

in G1 , and e(T2 , g2 )vx · e(g1 , g2 )’vxz · e(g1 , w)’vz = e(g1 , g2 )/e(T2 , w) in GT .

72 A. Kiayias, S. Xu, and M. Yung

As a consequence, the signature is constructed as follows: ¬rst the values

ρz , ρz , ρx , ρxz , ρxz ←R Zp and ρr ←R Zq are selected. Then the following values

are computed:

R1 = g1 z uρz , R2 = e(T2 , g2 )ρx · e(g1 , g2 )’ρxz · e(g1 , w)’ρz

ρ