E[X²] = ∑_{x=1}^{n} x² · (1/n) = (1/n) · n(n + 1)(2n + 1)/6 = (n + 1)(2n + 1)/6.

Therefore,

Var[X] = E[X²] − (E[X])² = (n² − 1)/12.
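As a quick sanity check on this derivation (an illustration, not part of the text), the closed forms for E[X²] and Var[X] of a uniform variable on {1, . . . , n} can be verified with exact rational arithmetic in Python:

```python
from fractions import Fraction

def uniform_moments(n):
    """Exact E[X], E[X^2], and Var[X] for X uniform on {1, ..., n}."""
    e_x = sum(Fraction(x, n) for x in range(1, n + 1))
    e_x2 = sum(Fraction(x * x, n) for x in range(1, n + 1))
    return e_x, e_x2, e_x2 - e_x ** 2

for n in (2, 6, 10, 100):
    _, e_x2, var = uniform_moments(n)
    # Compare the direct sums against the closed forms derived above.
    assert e_x2 == Fraction((n + 1) * (2 * n + 1), 6)
    assert var == Fraction(n * n - 1, 12)
```

Using `Fraction` avoids any floating-point rounding, so the comparison with the closed forms is exact.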

Example 6.21. Let X denote the value of a die toss. Let A be the event

that X is even. Then in the conditional probability space given A, we see

that X is uniformly distributed over {2, 4, 6}, and hence

E[X | A] = (2 + 4 + 6)/3 = 4.

Similarly, in the conditional probability space given Ā, we see that X is

uniformly distributed over {1, 3, 5}, and hence

E[X | Ā] = (1 + 3 + 5)/3 = 3.

We can compute the expected value of X using these conditional expectations; indeed, we have

E[X] = E[X | A]·P[A] + E[X | Ā]·P[Ā] = 4 · (1/2) + 3 · (1/2) = 7/2,

which agrees with the calculation in the previous example. □
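The arithmetic in this example is easy to verify mechanically. The sketch below (illustrative Python with exact rationals, not part of the text) recomputes the two conditional expectations and recombines them by the identity used above:

```python
from fractions import Fraction

outcomes = range(1, 7)                       # a fair die
even = [x for x in outcomes if x % 2 == 0]   # the event A: X is even
odd = [x for x in outcomes if x % 2 == 1]    # the complementary event

# X is uniform on each conditional space, so each conditional
# expectation is a plain average over the event's outcomes.
e_even = Fraction(sum(even), len(even))      # E[X | A]
e_odd = Fraction(sum(odd), len(odd))         # E[X | complement of A]

# Recombine: E[X] = E[X | A] P[A] + E[X | complement] P[complement].
p_even = Fraction(len(even), 6)
e_x = e_even * p_even + e_odd * (1 - p_even)
assert e_x == Fraction(7, 2)
```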

116 Finite and discrete probability distributions

Example 6.22. Suppose that a random variable X takes the value 1 with

probability p, and 0 with probability q := 1 − p. The distribution of X is

that of a Bernoulli trial, as discussed in Example 6.3. Let us compute E[X]

and Var[X]. We have

E[X] = 1 · p + 0 · q = p.

We also have

E[X²] = 1² · p + 0² · q = p.

Therefore,

Var[X] = E[X²] − (E[X])² = p − p² = pq. □

Example 6.23. Suppose that X1 , . . . , Xn are mutually independent ran-

dom variables such that each Xi takes the value 1 with probability p and 0

with probability q := 1 − p. Let us set X := X1 + · · · + Xn . Note that the

distribution of each Xi is that of a Bernoulli trial, as in Example 6.3, and

the distribution of X is a binomial distribution, as in Example 6.19. By the

previous example, we have E[Xi ] = p and Var[Xi ] = pq for i = 1, . . . , n. Let

us compute E[X] and Var[X]. By Theorem 6.6, we have

E[X] = ∑_{i=1}^{n} E[Xi ] = np,

and by Theorem 6.10, and the fact that the Xi are mutually independent,

we have

Var[X] = ∑_{i=1}^{n} Var[Xi ] = npq. □
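These closed forms can also be cross-checked against the binomial distribution itself. The Python sketch below (an illustration with an arbitrary parameter choice) computes E[X] and Var[X] directly from the probability mass function:

```python
from fractions import Fraction
from math import comb

def binomial_mean_var(n, p):
    """E[X] and Var[X] for X ~ Binomial(n, p), straight from the distribution."""
    q = 1 - p
    pmf = [comb(n, k) * p ** k * q ** (n - k) for k in range(n + 1)]
    e_x = sum(k * pk for k, pk in enumerate(pmf))
    e_x2 = sum(k * k * pk for k, pk in enumerate(pmf))
    return e_x, e_x2 - e_x ** 2

n, p = 8, Fraction(1, 3)       # example parameters, chosen arbitrarily
e_x, var = binomial_mean_var(n, p)
assert e_x == n * p            # np
assert var == n * p * (1 - p)  # npq
```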

Exercise 6.18. A casino offers you the following four dice games. In each

game, you pay 15 dollars to play, and two dice are rolled. In the first game,

the house pays out four times the value of the first die (in dollars). In the

second, the house pays out twice the sum of the two dice. In the third,

the house pays the square of the first. In the fourth, the house pays the

product of the two dice. Which game should you play? That is, which game

maximizes your expected winnings?

Exercise 6.19. Suppose X and Y are independent real random variables

such that E[X] = E[Y ]. Show that

E[(X − Y )²] = Var[X] + Var[Y ].


Exercise 6.20. Show that the variance of any 0/1-valued random variable

is at most 1/4.

Exercise 6.21. A die is tossed repeatedly until it comes up “1,” or until it

is tossed n times (whichever comes first). What is the expected number of

tosses of the die?

Exercise 6.22. Suppose that 20 percent of the students who took a certain

test were from school A and the average of their scores on the test was

65. Also, suppose that 30 percent of the students were from school B and

the average of their scores was 85. Finally, suppose that the remaining 50

percent of the students were from school C and the average of their scores

was 72. If a student is selected at random from the entire group that took

the test, what is the expected value of his score?

Exercise 6.23. An urn contains r ≥ 0 red balls and b ≥ 1 black balls.

Consider the following experiment. At each step in the experiment, a single

ball is removed from the urn, randomly chosen from among all balls that

remain in the urn: if a black ball is removed, the experiment halts, and

if a red ball is removed, the experiment continues (without returning the

red ball to the urn). Show that the expected number of steps performed is

(r + b + 1)/(b + 1).

6.5 Some useful bounds

In this section, we present several theorems that can be used to bound the

probability that a random variable deviates from its expected value by some

specified amount.

Theorem 6.11 (Markov's inequality). Let X be a random variable that

takes only non-negative real values. Then for any t > 0, we have

P[X ≥ t] ≤ E[X]/t.

Proof. We have

E[X] = ∑_x x·P[X = x] = ∑_{x<t} x·P[X = x] + ∑_{x≥t} x·P[X = x].

Since X takes only non-negative values, all of the terms in the summation

are non-negative. Therefore,

E[X] ≥ ∑_{x≥t} x·P[X = x] ≥ ∑_{x≥t} t·P[X = x] = t·P[X ≥ t]. □
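Markov's inequality is easy to watch in action on a concrete distribution. The following sketch (an illustrative example, not from the text) checks P[X ≥ t] ≤ E[X]/t for a small non-negative random variable at several thresholds:

```python
from fractions import Fraction

# A small non-negative distribution, chosen arbitrarily: value -> probability.
dist = {0: Fraction(1, 2), 1: Fraction(1, 4), 4: Fraction(1, 8), 10: Fraction(1, 8)}
assert sum(dist.values()) == 1

e_x = sum(x * p for x, p in dist.items())  # E[X] = 2 for this distribution

def tail(t):
    """P[X >= t]."""
    return sum(p for x, p in dist.items() if x >= t)

# Markov: P[X >= t] <= E[X] / t for every t > 0.
for t in (1, 2, 5, 10, 20):
    assert tail(t) <= e_x / t
```

Note how loose the bound can be: at t = 10 the true tail is 1/8 while Markov only guarantees 1/5.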


Markov's inequality may be the only game in town when nothing more

about the distribution of X is known besides its expected value. However,

if the variance of X is also known, then one can get a better bound.

Theorem 6.12 (Chebyshev's inequality). Let X be a real random vari-

able. Then for any t > 0, we have

P[|X − E[X]| ≥ t] ≤ Var[X]/t².

Proof. Let Y := (X − E[X])². Then Y is always non-negative, and E[Y ] =

Var[X]. Applying Markov's inequality to Y , we have

P[|X − E[X]| ≥ t] = P[Y ≥ t²] ≤ Var[X]/t². □
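As with Markov's inequality, Chebyshev's bound can be checked concretely. This sketch (an illustration using the fair-die distribution from the earlier examples) verifies P[|X − E[X]| ≥ t] ≤ Var[X]/t² at a few thresholds:

```python
from fractions import Fraction

# X uniform on {1, ..., 6}: E[X] = 7/2 and Var[X] = 35/12, as computed above.
outcomes = range(1, 7)
mu = Fraction(sum(outcomes), 6)
var = Fraction(sum(x * x for x in outcomes), 6) - mu ** 2

def dev_tail(t):
    """P[|X - E[X]| >= t]."""
    return Fraction(sum(1 for x in outcomes if abs(x - mu) >= t), 6)

# Chebyshev: P[|X - mu| >= t] <= Var[X] / t^2.
for t in (1, 2, Fraction(5, 2), 3):
    assert dev_tail(t) <= var / t ** 2
```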

An important special case of Chebyshev™s inequality is the following. Sup-

pose that X1 , . . . , Xn are pairwise independent real random variables, each

with the same distribution. Let µ be the common value of E[Xi ] and ν the

common value of Var[Xi ]. Set

X := (1/n)(X1 + · · · + Xn ).

The variable X is called the sample mean of X1 , . . . , Xn . By the linearity of

expectation, we have E[X] = µ, and since the Xi are pairwise independent,

it follows from Theorem 6.10 (along with part (ii) of Theorem 6.9) that

Var[X] = ν/n. Applying Chebyshev's inequality, for any ε > 0, we have

P[|X − µ| ≥ ε] ≤ ν/(nε²).    (6.17)
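Inequality (6.17) can also be observed numerically: as n grows, the sample mean of repeated die tosses concentrates around µ = 7/2, and its empirical deviation frequency stays under the bound ν/(nε²). The simulation below is only an illustration; the threshold and trial counts are arbitrary choices.

```python
import random

random.seed(1)
mu, nu = 3.5, 35 / 12        # mean and variance of one fair-die toss
eps = 0.5                    # an arbitrary deviation threshold

def sample_mean(n):
    """Mean of n independent fair-die tosses."""
    return sum(random.randint(1, 6) for _ in range(n)) / n

def deviation_freq(n, trials=2000):
    """Empirical frequency of |sample mean - mu| >= eps over many trials."""
    return sum(abs(sample_mean(n) - mu) >= eps for _ in range(trials)) / trials

for n in (10, 50, 200):
    bound = nu / (n * eps * eps)          # the right-hand side of (6.17)
    print(n, deviation_freq(n), min(bound, 1.0))
```

The printed deviation frequencies shrink toward zero much faster than the Chebyshev bound does, which is consistent with the bound being valid but not tight.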

The inequality (6.17) says that for all ε > 0, and for all δ > 0, there exists n0

(depending on ε and δ, as well as the variance ν) such that n ≥ n0 implies

P[|X − µ| ≥ ε] ≤ δ.    (6.18)

In words: