pected value µ with high probability.

This fact, known as the law of large numbers, justifies the usual intuitive

interpretation given to expectation.

Let us now examine an even more specialized case of the above situation.

Suppose that X1 , . . . , Xn are pairwise independent random variables, each of

which takes the value 1 with probability p, and 0 with probability q := 1 − p.

As before, let X be the sample mean of X1, . . . , Xn. As we calculated in
Example 6.22, the Xi have a common expected value p and variance pq.

Therefore, by (6.17), for any ε > 0, we have

P[|X − p| ≥ ε] ≤ pq/(ε²n). (6.19)

The bound on the right-hand side of (6.19) decreases linearly in n. If one

makes the stronger assumption that the Xi are mutually independent (so
that the sum X1 + · · · + Xn has a binomial distribution), one can obtain a

much better bound that decreases exponentially in n:

Theorem 6.13 (Chernoff bound). Let X1, . . . , Xn be mutually independent
random variables, such that each Xi is 1 with probability p and 0 with
probability q := 1 − p. Assume that 0 < p < 1. Also, let X be the sample
mean of X1, . . . , Xn. Then for any ε > 0, we have:

(i) P[X − p ≥ ε] ≤ e^{−nε²/(2q)};

(ii) P[X − p ≤ −ε] ≤ e^{−nε²/(2p)};

(iii) P[|X − p| ≥ ε] ≤ 2 · e^{−nε²/2}.
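The theorem lends itself to a quick empirical sanity check. The following sketch (the parameter values n, p, and ε are illustrative choices, not taken from the text) simulates many sample means and compares the observed frequency of large deviations with the bound in part (iii):

```python
import math
import random

random.seed(1)

# Illustrative parameters (not from the text): n flips of a p-coin,
# deviation threshold eps, and the number of simulated sample means.
n, p, eps, trials = 1000, 0.3, 0.05, 2000

# Part (iii) of the Chernoff bound.
bound = 2 * math.exp(-n * eps**2 / 2)

# Empirical frequency of |sample mean - p| >= eps.
hits = sum(
    abs(sum(random.random() < p for _ in range(n)) / n - p) >= eps
    for _ in range(trials)
)
empirical = hits / trials
print(empirical, "<=", bound)
```

With these parameters the bound is about 0.57, while the observed frequency is far smaller; the bound is valid but not tight for moderate ε.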

Proof. First, we observe that (ii) follows directly from (i) by replacing Xi
by 1 − Xi and exchanging the roles of p and q. Second, we observe that (iii)
follows directly from (i) and (ii): since p, q ≤ 1, each of the bounds in (i)
and (ii) is at most e^{−nε²/2}, and the two events are combined by the union
bound. Thus, it suffices to prove (i).

Let α > 0 be a parameter, whose value will be determined later. Define
the random variable Z := e^{αn(X−p)}. Since the function x ↦ e^{αnx} is strictly
increasing, we have X − p ≥ ε if and only if Z ≥ e^{αnε}. By Markov's inequality,
it follows that

P[X − p ≥ ε] = P[Z ≥ e^{αnε}] ≤ E[Z]e^{−αnε}. (6.20)

So our goal is to bound E[Z] from above.

For i = 1, . . . , n, define the random variable Zi := e^{α(Xi−p)}. Note that
Z = ∏_{i=1}^{n} Zi, that the Zi are mutually independent random variables (see
Theorem 6.5), and that

E[Zi] = e^{α(1−p)}p + e^{α(0−p)}q = pe^{αq} + qe^{−αp}.

It follows that

E[Z] = E[∏_i Zi] = ∏_i E[Zi] = (pe^{αq} + qe^{−αp})^n.

We will prove below that

pe^{αq} + qe^{−αp} ≤ e^{α²q/2}. (6.21)


From this, it follows that

E[Z] ≤ e^{α²qn/2}. (6.22)

Combining (6.22) with (6.20), we obtain

P[X − p ≥ ε] ≤ e^{α²qn/2 − αnε}. (6.23)

Now we choose the parameter α so as to minimize the quantity α²qn/2 − αnε.
The optimal value is easily seen to be α = ε/q (setting the derivative
αqn − nε to zero), and substituting this value into (6.23) gives the exponent
ε²n/(2q) − ε²n/q = −nε²/(2q), which yields (i).

To finish the proof of the theorem, it remains to prove the inequality
(6.21). Let

β := pe^{αq} + qe^{−αp}.

We want to show that β ≤ e^{α²q/2}, or equivalently, that log β ≤ α²q/2. We
have

β = e^{αq}(p + qe^{−α}) = e^{αq}(1 − q(1 − e^{−α})),

and taking logarithms and applying parts (i) and (ii) of §A1, we obtain

log β = αq + log(1 − q(1 − e^{−α})) ≤ αq − q(1 − e^{−α}) = q(e^{−α} + α − 1) ≤ qα²/2.

This establishes (6.21) and completes the proof of the theorem. □
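Inequality (6.21) is also easy to spot-check numerically; the grid below is an illustrative sanity check, not part of the proof:

```python
import math

def holds(p, alpha):
    """Check pe^{αq} + qe^{−αp} ≤ e^{α²q/2} at a single point."""
    q = 1 - p
    lhs = p * math.exp(alpha * q) + q * math.exp(-alpha * p)
    return lhs <= math.exp(alpha**2 * q / 2) + 1e-12

# Grid of p = 0.1, ..., 0.9 and α = 0.1, ..., 3.0.
ok = all(holds(i / 10, j / 10) for i in range(1, 10) for j in range(1, 31))
print("inequality (6.21) holds at all grid points:", ok)
```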

Thus, the Chernoff bound is a quantitatively superior version of the law

of large numbers, although its range of application is clearly more limited.

Example 6.24. Suppose we toss 10,000 coins. The expected number of
heads is 5,000. What is an upper bound on the probability α that we get
6,000 or more heads? Using Markov's inequality, we get α ≤ 5/6. Using
Chebyshev's inequality, and in particular, the inequality (6.19), we get

α ≤ (1/4)/(10⁴ · 10⁻²) = 1/400.

Finally, using the Chernoff bound, we obtain

α ≤ e^{−10⁴ · 10⁻²/(2 · 0.5)} = e^{−100} ≈ 10^{−43.4}. □
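The three bounds in the example can be reproduced directly (here n = 10⁴, p = q = 1/2, and ε = 0.1):

```python
import math

n, p, eps = 10_000, 0.5, 0.1
q = 1 - p

# Markov: P[heads >= 6000] <= E[heads]/6000 = 5000/6000.
markov = (n * p) / 6000

# Chebyshev, via inequality (6.19): pq/(eps^2 * n).
chebyshev = (p * q) / (eps**2 * n)

# Chernoff, part (i): e^{-n eps^2 / (2q)}.
chernoff = math.exp(-n * eps**2 / (2 * q))

print(markov)     # 5/6 ≈ 0.833
print(chebyshev)  # 1/400 = 0.0025
print(chernoff)   # e^{-100} ≈ 3.7e-44
```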

Exercise 6.24. You are given a biased coin. You know that if tossed, it will

come up heads with probability at least 51%, or it will come up tails with

probability at least 51%. Design an experiment that attempts to determine

the direction of the bias (towards heads or towards tails). The experiment

should work by flipping the coin some number t times, and it should correctly

determine the direction of the bias with probability at least 99%. Try to

make t as small as possible.
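One natural approach to the exercise (a sketch of one possible solution, not necessarily the one with minimal t): flip the coin t times and report whichever side came up more often. If the coin favors heads with probability p ≥ 0.51, the majority answer is wrong only if the fraction of heads deviates from p by at least ε = 0.01; by part (ii) of the Chernoff bound (and symmetrically part (i) for a tails-biased coin), and using p, q ≤ 1, the error probability is at most e^{−tε²/2}. Making this at most 1% gives t on the order of 92,000; sharper tail bounds reduce t considerably.

```python
import math
import random

# Choose t so that e^{-t eps^2 / 2} <= 0.01, i.e. the majority vote
# is correct with probability at least 99%.
eps, max_error = 0.01, 0.01
t = math.ceil(2 * math.log(1 / max_error) / eps**2)

def majority_guess(flip, t):
    """Flip t times; guess the direction of the bias by majority."""
    heads = sum(flip() for _ in range(t))
    return "heads" if 2 * heads > t else "tails"

random.seed(0)
coin = lambda: random.random() < 0.51   # a coin biased toward heads
print(t, majority_guess(coin, t))
```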


6.6 The birthday paradox

This section discusses a number of problems related to the following ques-

tion: how many people must be in a room before there is a good chance

that two of them were born on the same day of the year? The answer is

surprisingly few, whence the “paradox.”

To answer this question, we index the people in the room with integers

1, . . . , k, where k is the number of people in the room. We abstract the

problem a bit, and assume that all years have the same number of days,

say n (setting n = 365 corresponds to the original problem, except that
leap years are not handled correctly, but we shall ignore this detail). For
i = 1, . . . , k, let Xi denote the day of the year on which i's birthday falls.
Let us assume that birthdays are uniformly distributed over {0, . . . , n − 1};
this assumption is actually not entirely realistic, as it is well known that
people are somewhat more likely to be born in some months than in others.
So for any i = 1, . . . , k and x = 0, . . . , n − 1, we have P[Xi = x] = 1/n.

Let α be the probability that no two persons share the same birthday, so
that 1 − α is the probability that there is a pair of matching birthdays. We
would like to know how big k must be relative to n so that α is not too
large, say, at most 1/2.

We can compute α as follows, assuming the Xi are mutually independent.
There are a total of n^k sequences of integers (x1, . . . , xk), with each xi ∈
{0, . . . , n − 1}. Among these, there are a total of n(n − 1) · · · (n − k + 1) that
contain no repetitions: there are n choices for x1, and for any fixed value of
x1, there are n − 1 choices for x2, and so on. Therefore

α = n(n − 1) · · · (n − k + 1)/n^k = (1 − 1/n)(1 − 2/n) · · · (1 − (k − 1)/n). (6.24)

Using part (i) of §A1, we obtain

α ≤ e^{−∑_{i=1}^{k−1} i/n} = e^{−k(k−1)/2n}.

So if k(k − 1) ≥ (2 log 2)n, we have α ≤ 1/2. Thus, when k is at least a
small constant times n^{1/2}, we have α ≤ 1/2, so the probability that two
people share the same birthday is at least 1/2. For n = 365, k ≥ 23 suffices.
Indeed, one can simply calculate α in this case numerically from equation
(6.24), obtaining α ≈ 0.493. Thus, if there are 23 people in the room, there

is about a 50-50 chance that two people have the same birthday.
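Both the exact probability from equation (6.24) and the upper bound e^{−k(k−1)/2n} are easy to compute (n = 365, as in the text):

```python
import math

def alpha(k, n):
    """Probability that k people all have distinct birthdays, per (6.24)."""
    prob = 1.0
    for i in range(1, k):
        prob *= 1 - i / n
    return prob

n = 365
print(alpha(23, n))                    # ≈ 0.4927: 23 people suffice
print(alpha(22, n))                    # ≈ 0.5243: 22 do not
print(math.exp(-23 * 22 / (2 * n)))    # the bound: ≈ 0.5000
```

Note how close the bound e^{−253/365} is to 1/2 here: 253/365 ≈ 0.69315, almost exactly log 2.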

The above analysis assumed the Xi are mutually independent. However,
we can still obtain useful upper bounds for α under much weaker indepen-
dence assumptions.


For i = 1, . . . , k and j = i + 1, . . . , k, let us define the indicator variable

Wij := 1 if Xi = Xj, and Wij := 0 if Xi ≠ Xj.

If we assume that the Xi are pairwise independent, then

∑_{x=0}^{n−1} P[Xi = x ∧ Xj = x]