x=0

n’1 n’1

1/n2 = 1/n.

= P[Xi = x]P[Xj = x] =

x=0 x=0

We can compute the expectation and variance (see Example 6.22):

1 1 1

Var[Wij ] = (1 ’ ).

E[Wij ] =

,

n n n

Now consider the random variable

k k

W := Wij ,

i=1 j=i+1

which represents the number of distinct pairs of people with the same birth-

day. There are k(k ’ 1)/2 terms in this sum, so by the linearity of expecta-

tion, we have

k(k ’ 1)

E[W ] = .

2n

Thus, for k(k ’ 1) ≥ 2n, we “expect” there to be at least one pair of

matching birthdays. However, this does not guarantee that the probability

of a matching pair of birthdays is very high, assuming just pairwise inde-

pendence of the Xi . For example, suppose that n is prime and the Xi are

a subset of the family of pairwise independent random variables de¬ned in

Example 6.17. That is, each Xi is of the form ai X + Y , where X and Y

are uniformly and independently distributed modulo n. Then in fact, either

all the Xi are distinct, or they are all equal, where the latter event occurs

exactly when X = [0]n , and so with probability 1/n ” “when it rains, it

pours.”

To get a useful upper bound on the probability ± that there are no match-

ing birthdays, it su¬ces to assume that the Xi are 4-wise independent. In

this case, it is easy to verify that the variables Wij are pairwise indepen-

dent, since any two of the Wij are determined by at most four of the Xi .

Therefore, in this case, the variance of the sum is equal to the sum of the

6.6 The birthday paradox 123

variances, and so

k(k ’ 1) 1

(1 ’ ) ¤ E[W ].

Var[W ] =

2n n

Furthermore, by Chebyshev™s inequality,

± = P[W = 0] ¤ P[|W ’ E[W ]| ≥ E[W ]]

2n

¤ Var[W ]/E[W ]2 ¤ 1/E[W ] = .

k(k ’ 1)

Thus, if k(k ’ 1) ≥ 4n, then ± ¤ 1/2.

In many practical applications, it is more important to bound ± from

below, rather than from above; that is, to bound from above the probability

1 ’ ± that there are any collisions. For this, pairwise independence of the

Xi su¬ces, since than we have P[Wij = 1] = 1/n, and by (6.5), we have

k k

k(k ’ 1)

1’±¤ P[Wij = 1] = ,

2n

i=1 j=i+1

which is at most 1/2 provided k(k ’ 1) ¤ n.

n

Exercise 6.25. Let ±1 , . . . , ±n be real numbers with i=1 ±i = 1. Show

that

n n

2 2

0¤ (±i ’ 1/n) = ±i ’ 1/n,

i=1 i=1

and in particular,

n

2

±i ≥ 1/n.

i=1

Exercise 6.26. Let X be a set of size n ≥ 1, and let X and X be indepen-

dent random variables, taking values in X , and with the same distribution.

Show that

1

P[X = x]2 ≥ .

P[X = X ] =

n

x∈X

Exercise 6.27. Let X be a set of size n ≥ 1, and let x0 be an arbitrary,

¬xed element of X . Consider a random experiment in which a function F is

chosen uniformly from among all nn functions from X into X . Let us de¬ne

random variables Xi , for i = 0, 1, 2, . . . , as follows:

X0 := x0 , Xi+1 := F (Xi ) (i = 0, 1, 2, . . .).

124 Finite and discrete probability distributions

Thus, the value of Xi is obtained by applying the function F a total of i

times to the starting value x0 . Since X has size n, the sequence {Xi } must

repeat at some point; that is, there exists a positive integer k (with k ¤ n)

such that Xk = Xi for some i = 0, . . . , k ’ 1. De¬ne the random variable K

to be the smallest such value k.

(a) Show that for any i ≥ 0 and any ¬xed values of x1 , . . . , xi ∈ X such

that x0 , x1 , . . . , xi are distinct, the conditional distribution of Xi+1

given that X1 = x1 , . . . , Xi = xi is uniform over X .

(b) Show that for any integer k ≥ 1, we have K ≥ k if and only if

X0 , X1 , . . . , Xk’1 take on distinct values.

(c) From parts (a) and (b), show that for any k = 1, . . . , n, we have

P[K ≥ k | K ≥ k ’ 1] = 1 ’ (k ’ 1)/n,

and conclude that

k’1

(1 ’ i/n) ¤ e’k(k’1)/2n .

P[K ≥ k] =

i=1

(d) Show that

∞

e’k(k’1)/2n = O(n1/2 )

k=1

and then conclude from part (c) that

∞

n

e’k(k’1)/2n = O(n1/2 ).

P[K ≥ k] ¤

E[K] =

k=1 k=1

(e) Modify the above argument to show that E[K] = „¦(n1/2 ).

Exercise 6.28. The setup for this exercise is identical to that of the previous

exercise, except that now, the function F is chosen uniformly from among

all n! permutations of X .

(a) Show that if K = k, then Xk = X0 .

(b) Show that for any i ≥ 0 and any ¬xed values of x1 , . . . , xi ∈ X such

that x0 , x1 , . . . , xi are distinct, the conditional distribution of Xi+1

given that X1 = x1 , . . . , Xi = xi is uniform over X \ {x1 , . . . , xi }.

(c) Show that for any k = 2, . . . , n, we have

1

P[K ≥ k | K ≥ k ’ 1] = 1 ’ ,

n’k+2

6.7 Hash functions 125

and conclude that for all k = 1, . . . , n, we have

k’2

k’1

1

P[K ≥ k] = 1’ =1’ .

n’i n

i=0

(d) From part (c), show that K is uniformly distributed over {1, . . . , n},

and in particular,