E[aX] = aE[X].

Proof. It is easiest to prove this using the defining equation (6.12) for expectation. For u ∈ U, the value of the random variable X + Y at u is by definition X(u) + Y(u), and so we have

E[X + Y] = ∑_{u∈U} (X(u) + Y(u))P[u] = ∑_{u∈U} X(u)P[u] + ∑_{u∈U} Y(u)P[u] = E[X] + E[Y].

For the second part of the theorem, by a similar calculation, we have

E[aX] = ∑_u (aX(u))P[u] = a ∑_u X(u)P[u] = aE[X]. □
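Since the sample space here is finite, both identities can be checked mechanically. The following sketch evaluates each side exactly with rational arithmetic; the sample space, probabilities, and random variables are arbitrary choices for illustration, not anything fixed by the text.

```python
from fractions import Fraction

# A small finite probability distribution on U = {0, 1, 2}; the values
# below are illustrative assumptions.
U = [0, 1, 2]
P = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}
X = {0: 5, 1: -1, 2: 2}   # a random variable: a function from U to R
Y = {0: 0, 1: 4, 2: 7}

def E(Z):
    """Expectation via the defining equation: the sum of Z(u) * P[u]."""
    return sum(Z[u] * P[u] for u in U)

XplusY = {u: X[u] + Y[u] for u in U}   # the random variable X + Y
aX = {u: 3 * X[u] for u in U}          # the random variable aX, with a = 3

assert E(XplusY) == E(X) + E(Y)   # E[X + Y] = E[X] + E[Y]
assert E(aX) == 3 * E(X)          # E[aX] = aE[X]
```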

More generally, the above theorem implies (using a simple induction argument) that for any real random variables X1 , . . . , Xn , we have

E[X1 + · · · + Xn ] = E[X1 ] + · · · + E[Xn ].

So we see that expectation is linear; however, expectation is not in general multiplicative, except in the case of independent random variables:

Theorem 6.7. If X and Y are independent real random variables, then

E[XY ] = E[X]E[Y ].

Proof. It is easiest to prove this using (6.14). We have

E[XY] = ∑_{x,y} xyP[X = x ∧ Y = y]
      = ∑_{x,y} xyP[X = x]P[Y = y]
      = (∑_x xP[X = x]) (∑_y yP[Y = y])
      = E[X] · E[Y]. □

More generally, the above theorem implies (using a simple induction argument) that if X1 , . . . , Xn are mutually independent real random variables, then

E[X1 · · · Xn ] = E[X1 ] · · · E[Xn ].
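Independence can be modeled concretely by taking U to be a product space, so that the two coordinate projections are independent random variables. The sketch below (all distributions are illustrative assumptions) verifies E[XY] = E[X]E[Y] exactly in this setting.

```python
from fractions import Fraction
from itertools import product

# X and Y are given by their distributions; we realize them as the two
# coordinates of the product space, where P[(s, t)] = pX[s] * pY[t],
# which makes them independent. The values are illustrative assumptions.
pX = {1: Fraction(1, 4), 2: Fraction(3, 4)}   # distribution of X
pY = {0: Fraction(2, 5), 5: Fraction(3, 5)}   # distribution of Y

EX = sum(x * p for x, p in pX.items())
EY = sum(y * p for y, p in pY.items())

# E[XY], computed over the product space
EXY = sum(s * t * pX[s] * pY[t] for s, t in product(pX, pY))

assert EXY == EX * EY   # E[XY] = E[X]E[Y] for independent X, Y
```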

The following fact is sometimes quite useful:

Theorem 6.8. If X is a random variable that takes values in the set

{0, 1, . . . , n}, then

E[X] = ∑_{i=1}^{n} P[X ≥ i].

Proof. For i = 1, . . . , n, define the random variable Xi so that Xi = 1 if X ≥ i and Xi = 0 if X < i. Note that E[Xi ] = 1 · P[X ≥ i] + 0 · P[X < i] = P[X ≥ i]. Moreover, X = X1 + · · · + Xn , and hence

E[X] = ∑_{i=1}^{n} E[Xi ] = ∑_{i=1}^{n} P[X ≥ i]. □
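This tail-sum formula is easy to check exactly on a small example; the distribution below on {0, 1, . . . , 4} is an arbitrary assumption.

```python
from fractions import Fraction

# An arbitrary distribution on {0, 1, ..., n} with n = 4 (illustrative).
n = 4
p = {0: Fraction(1, 10), 1: Fraction(2, 10), 2: Fraction(3, 10),
     3: Fraction(1, 10), 4: Fraction(3, 10)}

EX = sum(x * px for x, px in p.items())

# The sum over i = 1..n of P[X >= i]
tail_sum = sum(sum(p[x] for x in p if x >= i) for i in range(1, n + 1))

assert EX == tail_sum   # Theorem 6.8
```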

The variance of a real random variable X is Var[X] := E[(X − E[X])²]. The variance provides a measure of the spread or dispersion of the distribution of X around its expected value E[X]. Note that since (X − E[X])² takes only non-negative values, variance is always non-negative.

Theorem 6.9. Let X be a real random variable, and let a and b be real numbers. Then we have

(i) Var[X] = E[X²] − (E[X])²,

(ii) Var[aX] = a²Var[X], and

(iii) Var[X + b] = Var[X].

Proof. Let µ := E[X]. For part (i), observe that

Var[X] = E[(X − µ)²] = E[X² − 2µX + µ²]
       = E[X²] − 2µE[X] + E[µ²] = E[X²] − 2µ² + µ²
       = E[X²] − µ²,

where in the third equality, we used the fact that expectation is linear, and in the fourth equality, we used the fact that E[c] = c for constant c (in this case, c = µ²).

For part (ii), observe that

Var[aX] = E[a²X²] − (E[aX])² = a²E[X²] − (aµ)²
        = a²(E[X²] − µ²) = a²Var[X],

where we used part (i) in the first and fourth equality, and the linearity of expectation in the second.

Part (iii) follows by a similar calculation (verify):

Var[X + b] = E[(X + b)²] − (µ + b)²
           = (E[X²] + 2bµ + b²) − (µ² + 2bµ + b²)
           = E[X²] − µ² = Var[X]. □

A simple consequence of part (i) of Theorem 6.9 is that E[X²] ≥ (E[X])².
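All three parts of Theorem 6.9, together with this inequality, can be checked exactly on a small distribution; the distribution and the constants a, b below are arbitrary choices for illustration.

```python
from fractions import Fraction

# A distribution for a real random variable X, given as a map from the
# image of X to probabilities; the values are illustrative assumptions.
p = {-1: Fraction(1, 6), 2: Fraction(1, 2), 4: Fraction(1, 3)}

def E(f=lambda x: x):
    """E[f(X)], computed over the image of X."""
    return sum(f(x) * px for x, px in p.items())

mu = E()                              # E[X]
var = E(lambda x: (x - mu) ** 2)      # Var[X], by definition

a, b = 7, -3
assert var == E(lambda x: x * x) - mu ** 2                 # part (i)
assert E(lambda x: (a * x - a * mu) ** 2) == a * a * var   # part (ii): Var[aX]
assert E(lambda x: (x + b - (mu + b)) ** 2) == var         # part (iii): Var[X + b]
assert E(lambda x: x * x) >= mu ** 2                       # E[X²] >= (E[X])²
```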

Unlike expectation, the variance of a sum of random variables is not equal

to the sum of the variances, unless the variables are pairwise independent:

Theorem 6.10. If X1 , . . . , Xn is a collection of pairwise independent real

random variables, then

Var[∑_{i=1}^{n} Xi] = ∑_{i=1}^{n} Var[Xi ].

Proof. We have

Var[∑_i Xi] = E[(∑_i Xi)²] − E[∑_i Xi]²
            = ∑_i E[Xi²] + 2 ∑_{i,j : j<i} (E[Xi Xj ] − E[Xi ]E[Xj ]) − ∑_i E[Xi ]²
              (by Theorem 6.6 and rearranging terms)
            = ∑_i E[Xi²] − ∑_i E[Xi ]²
              (by pairwise independence and Theorem 6.7)
            = ∑_i Var[Xi ]. □
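A standard example of variables that are pairwise but not mutually independent takes two independent uniform bits together with their exclusive-or; the third variable is determined by the first two, yet any two of the three are independent. The sketch below (the construction is classical; the variable names are ours) checks that Theorem 6.10 nevertheless applies.

```python
from fractions import Fraction
from itertools import product

# Sample space: pairs of independent uniform bits, each pair with
# probability 1/4.
half = Fraction(1, 2)
space = [((b1, b2), half * half) for b1, b2 in product([0, 1], repeat=2)]

def E(f):
    """Expectation of the random variable f over the sample space."""
    return sum(f(u) * pu for u, pu in space)

# X1, X2 are the two bits; X3 = X1 XOR X2. These three are pairwise
# independent but not mutually independent.
X = [lambda u: u[0], lambda u: u[1], lambda u: u[0] ^ u[1]]

def var(f):
    mu = E(f)
    return E(lambda u: (f(u) - mu) ** 2)

S = lambda u: sum(f(u) for f in X)   # the sum X1 + X2 + X3

assert var(S) == sum(var(f) for f in X)   # Theorem 6.10
```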

For any random variable X and event B, with P[B] ≠ 0, we can define the conditional expectation of X given B, denoted E[X | B], to be the expected value of X in the conditional probability distribution given B. We have

E[X | B] = ∑_{u∈U} X(u) · P[u | B] = ∑_{x∈X} xP[X = x | B],     (6.15)

where X is the image of X.

If B1 , . . . , Bn is a collection of events that partitions U, where each Bi

occurs with non-zero probability, then it follows from the de¬nitions that

E[X] = ∑_{i=1}^{n} E[X | Bi ]P[Bi ].     (6.16)
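Equation (6.16) can be checked directly from the definitions; in the sketch below, the sample space, distribution, random variable, and partition are all illustrative assumptions.

```python
from fractions import Fraction

# A small sample space with an arbitrary distribution and random
# variable (illustrative values).
U = [0, 1, 2, 3]
P = {0: Fraction(1, 8), 1: Fraction(3, 8),
     2: Fraction(1, 4), 3: Fraction(1, 4)}
X = {0: 10, 1: 2, 2: -4, 3: 6}
B = [{0, 1}, {2, 3}]   # a partition of U into events B1, B2

EX = sum(X[u] * P[u] for u in U)

# Right-hand side of (6.16): sum of E[X | Bi] * P[Bi]
total = Fraction(0)
for Bi in B:
    PBi = sum(P[u] for u in Bi)
    E_given = sum(X[u] * P[u] / PBi for u in Bi)   # E[X | Bi]
    total += E_given * PBi

assert EX == total
```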

Example 6.20. Let X be uniformly distributed over {1, . . . , n}. Let us

compute E[X] and Var[X]. We have

E[X] = ∑_{x=1}^{n} x · (1/n) = (1/n) · (n(n + 1)/2) = (n + 1)/2.

We also have