6.3 Random variables 109

Since (a1, . . . , at) ≠ (b1, . . . , bt), we know that aj ≠ bj for some j = 1, . . . , t.

Let us assume that a1 ≠ b1 (the argument for j > 1 is analogous).

We first show that for all x2, . . . , xt ∈ Zp, we have

P[Za1,...,at = z ∧ Zb1,...,bt = z′ | X2 = x2 ∧ · · · ∧ Xt = xt] = 1/p². (6.11)

To prove (6.11), consider the conditional probability distribution given X2 = x2 ∧ · · · ∧ Xt = xt. In this conditional distribution, we have

Za1,...,at = a1X1 + Y + c and Zb1,...,bt = b1X1 + Y + d,

where

c := a2 x2 + · · · + at xt and d := b2 x2 + · · · + bt xt ,

and X1 and Y are independent and uniformly distributed over Zp (this

follows from the mutual independence of X1 , . . . , Xt , Y before conditioning).

By the result of the previous example, (a1X1 + Y, b1X1 + Y) is uniformly distributed over Zp × Zp, and since the function sending (x, y) ∈ Zp × Zp to (x + c, y + d) ∈ Zp × Zp is a bijection, it follows that (a1X1 + Y + c, b1X1 + Y + d) is uniformly distributed over Zp × Zp. That proves (6.11).

(6.10) now follows easily from (6.11), as follows:

P[Za1,...,at = z ∧ Zb1,...,bt = z′]
  = Σ_{x2,...,xt} P[Za1,...,at = z ∧ Zb1,...,bt = z′ | X2 = x2 ∧ · · · ∧ Xt = xt] · P[X2 = x2 ∧ · · · ∧ Xt = xt]
  = Σ_{x2,...,xt} (1/p²) · P[X2 = x2 ∧ · · · ∧ Xt = xt]
  = (1/p²) · Σ_{x2,...,xt} P[X2 = x2 ∧ · · · ∧ Xt = xt]
  = (1/p²) · 1 = 1/p². □
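The derivation above can be checked exhaustively for small parameters. The following sketch (with illustrative choices p = 3, t = 2, which are mine and not the text's) enumerates all outcomes of X1, . . . , Xt, Y and verifies that every pair of distinct variables in the family is jointly uniform:

```python
# Exhaustive check of the pairwise independence just proved, for small
# illustrative parameters p = 3, t = 2 (these choices are mine, not the
# text's).  All p^(t+1) outcomes of X1, ..., Xt, Y are enumerated.
from itertools import product

p, t = 3, 2
outcomes = list(product(range(p), repeat=t + 1))  # (x1, ..., xt, y)

def Z(a, outcome):
    # Z_{a1,...,at} = a1*X1 + ... + at*Xt + Y (mod p)
    *xs, y = outcome
    return (sum(ai * xi for ai, xi in zip(a, xs)) + y) % p

# For every pair of distinct tuples a != b and every pair of target values
# (z, z'), the joint probability should be exactly 1/p^2.
for a in product(range(p), repeat=t):
    for b in product(range(p), repeat=t):
        if a == b:
            continue
        for z, zp in product(range(p), repeat=2):
            count = sum(1 for o in outcomes if Z(a, o) == z and Z(b, o) == zp)
            assert count * p * p == len(outcomes)
```

The integer comparison `count * p * p == len(outcomes)` checks P = 1/p² exactly, avoiding floating-point equality.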

Using other algebraic techniques, there are many ways to construct pairwise and k-wise independent families of random variables. Such families play an important role in many areas of computer science.

Example 6.19. Suppose we perform an experiment by executing n

Bernoulli trials (see Example 6.3), where each trial succeeds with the same

probability p, and fails with probability q := 1 − p, independently of the

outcomes of all the other trials. Let X denote the total number of successes.

For k = 0, . . . , n, let us calculate the probability that X = k.

110 Finite and discrete probability distributions

To do this, let us introduce indicator variables X1, . . . , Xn, where for i = 1, . . . , n, we have Xi = 1 if the ith trial succeeds, and Xi = 0, otherwise.

By assumption, the Xi are mutually independent. Then we see that X =

X1 + · · · + Xn. Now, consider a fixed value k = 0, . . . , n. Let Ck denote

the collection of all subsets of {1, . . . , n} of size k. For I ∈ Ck, let AI be the event that Xi = 1 for all i ∈ I and Xi = 0 for all i ∉ I. Since the Xi are mutually independent, we see that P[AI] = p^k q^(n−k) (as in Example 6.8).

Evidently, the collection of events {AI }I∈Ck is a partition of the event that

X = k. Therefore,

P[X = k] = Σ_{I∈Ck} P[AI] = Σ_{I∈Ck} p^k q^(n−k) = |Ck| · p^k q^(n−k).

Finally, since

|Ck| = (n choose k),

we conclude that

P[X = k] = (n choose k) · p^k q^(n−k).

The distribution of the random variable X is called a binomial distribution. □
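The formula just derived can be sanity-checked numerically. This sketch (with illustrative values n = 4, p = 0.25, my choices rather than the text's) compares the closed form against brute-force enumeration of all 2^n outcomes of the indicators:

```python
# Numeric sanity check of the binomial formula just derived, with
# illustrative values n = 4, p = 0.25 (my choices, not the text's):
# the closed form C(n,k) p^k q^(n-k) is compared against brute-force
# enumeration of all 2^n outcomes of the indicators X1, ..., Xn.
from itertools import product
from math import comb

n, p = 4, 0.25
q = 1 - p

def brute_force(k):
    # Sum P[(x1,...,xn)] over outcomes with exactly k successes; each such
    # outcome has probability p^k * q^(n-k), as for the event AI above.
    return sum(p ** k * q ** (n - k)
               for xs in product((0, 1), repeat=n) if sum(xs) == k)

for k in range(n + 1):
    assert abs(brute_force(k) - comb(n, k) * p ** k * q ** (n - k)) < 1e-12
```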

Exercise 6.11. Let X1, . . . , Xn be random variables, and let 𝒳i denote the image of Xi for i = 1, . . . , n. Show that X1, . . . , Xn are mutually independent if and only if for all i = 2, . . . , n, and all x1 ∈ 𝒳1, . . . , xi ∈ 𝒳i, we have

P[Xi = xi | Xi−1 = xi−1 ∧ · · · ∧ X1 = x1] = P[Xi = xi].

Exercise 6.12. Let A1, . . . , An be events with corresponding indicator variables X1, . . . , Xn. Show that the events A1, . . . , An are mutually independent if and only if the random variables X1, . . . , Xn are mutually independent. Note: there is actually something non-trivial to prove here, since our definitions for independent events and independent random variables superficially look quite different.

Exercise 6.13. Let C be an event that occurs with non-zero probability, and let B1, . . . , Bn be a partition of C, such that each event Bi occurs with non-zero probability. Let X be a random variable whose image is 𝒳, and let D′ be a probability distribution on 𝒳. Suppose that for each i = 1, . . . , n, the conditional distribution of X given Bi is equal to D′. Show that the conditional distribution of X given C is also equal to D′.


Exercise 6.14. Let n be a positive integer, and let X be a random variable whose distribution is uniform over {0, . . . , n − 1}. For each positive divisor d of n, let us define the random variable Xd := X mod d. Show that for any positive divisors d1, . . . , dk of n, the random variables Xd1, . . . , Xdk are mutually independent if and only if d1, . . . , dk are pairwise relatively prime.

Exercise 6.15. With notation as in the previous exercise, let n := 30, and

describe the conditional distribution of X15 given that X6 = 1.
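Exercises 6.14 and 6.15 lend themselves to numerical exploration. The following sketch (not a proof) uses the exercises' n = 30, compares the joint distribution of two of the Xd with the product of their marginals, and tallies the conditional distribution asked about in Exercise 6.15:

```python
# Numerical exploration of Exercises 6.14 and 6.15 (a sketch, not a proof);
# n = 30 as in Exercise 6.15.  X is uniform over {0, ..., n-1}, Xd := X mod d.
from collections import Counter
from math import gcd

n = 30

def independent(d1, d2):
    # X_{d1} and X_{d2} are independent iff the joint distribution equals
    # the product of the marginals, i.e. the joint probability of every
    # residue pair is 1/(d1*d2) (each marginal is uniform since d1, d2 | n).
    joint = Counter((x % d1, x % d2) for x in range(n))
    return all(joint[(r1, r2)] * d1 * d2 == n
               for r1 in range(d1) for r2 in range(d2))

# Coprime divisors of 30 give independence; non-coprime divisors do not.
assert independent(2, 15) and gcd(2, 15) == 1
assert not independent(6, 15) and gcd(6, 15) == 3

# Exercise 6.15: tally X mod 15 over the outcomes with X mod 6 == 1.
# (Inspect `cond` to check your answer to the exercise.)
cond = Counter(x % 15 for x in range(n) if x % 6 == 1)
```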

Exercise 6.16. Let W, X, Y be mutually independent and uniformly distributed over Zp, where p is prime. For any a ∈ Zp, let Za := a²W + aX + Y. Show that each Za is uniformly distributed over Zp, and that the collection of all Za is 3-wise independent.

Exercise 6.17. Let Xib, for i = 1, . . . , k and b ∈ {0, 1}, be mutually independent random variables, each with a uniform distribution on {0, 1}. For b1, . . . , bk ∈ {0, 1}, let us define the random variable

Yb1···bk := X1b1 ⊕ · · · ⊕ Xkbk,

where "⊕" denotes "exclusive or." Show that the 2^k variables Yb1···bk are pairwise independent, each with a uniform distribution over {0, 1}.
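The claim in Exercise 6.17 can likewise be checked exhaustively for a small case. This sketch uses the illustrative choice k = 2 (mine, not the text's) and enumerates all assignments to the 2k independent bits:

```python
# Exhaustive check of Exercise 6.17's claim for the small illustrative
# case k = 2 (my choice): enumerate all assignments to the 2k independent
# bits X_{i,b} and verify every pair of distinct Y's is jointly uniform.
from itertools import product

k = 2
labels = list(product((0, 1), repeat=k))          # the strings b1...bk
keys = [(i, bit) for i in range(k) for bit in (0, 1)]

def Y(b, xs):
    # Y_{b1...bk} = X_{1,b1} xor ... xor X_{k,bk}
    out = 0
    for i, bi in enumerate(b):
        out ^= xs[(i, bi)]
    return out

assignments = [dict(zip(keys, vals)) for vals in product((0, 1), repeat=2 * k)]

for b in labels:
    for c in labels:
        if b == c:
            continue
        for zb, zc in product((0, 1), repeat=2):
            count = sum(1 for xs in assignments
                        if Y(b, xs) == zb and Y(c, xs) == zc)
            assert count * 4 == len(assignments)  # probability exactly 1/4
```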

6.4 Expectation and variance

Let D = (U, P) be a probability distribution. If X is a real random variable, then its expected value is

E[X] := Σ_{u∈U} X(u) · P[u]. (6.12)

If 𝒳 is the image of X, we have

E[X] = Σ_{x∈𝒳} Σ_{u∈X⁻¹({x})} x · P[u] = Σ_{x∈𝒳} x · P[X = x]. (6.13)

From (6.13), it is clear that E[X] depends only on the distribution of X (and not on any other properties of the underlying distribution D). More generally, by a similar calculation, one sees that if X is any random variable with image 𝒳, and f is a real-valued function on 𝒳, then

E[f(X)] = Σ_{x∈𝒳} f(x) · P[X = x]. (6.14)
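The three expressions (6.12)-(6.14) can be made concrete with a toy distribution. In the sketch below, the space U, the probabilities P, and the functions X and f are all illustrative choices, not from the text:

```python
# The formulas (6.12)-(6.14) computed on a toy distribution (the space U,
# the probabilities P, and the functions X and f are illustrative choices).
from collections import defaultdict

U = [0, 1, 2, 3]
P = {u: (u + 1) / 10 for u in U}    # P = 0.1, 0.2, 0.3, 0.4; sums to 1

def X(u): return u % 2              # a real random variable on U
def f(x): return 3 * x + 1          # a real-valued function on the image of X

# (6.12): sum over the sample points u.
e1 = sum(X(u) * P[u] for u in U)

# (6.13): group sample points by the value of X, then sum over the image.
dist = defaultdict(float)
for u in U:
    dist[X(u)] += P[u]
e2 = sum(x * px for x, px in dist.items())
assert abs(e1 - e2) < 1e-12

# (6.14): E[f(X)] via the distribution of X agrees with the direct sum.
ef = sum(f(x) * px for x, px in dist.items())
assert abs(ef - sum(f(X(u)) * P[u] for u in U)) < 1e-12
```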

We make a few trivial observations about expectation, which the reader may easily verify. First, if X is equal to a constant c (i.e., X(u) = c for all u ∈ U), then E[X] = E[c] = c. Second, if X takes only non-negative values (i.e., X(u) ≥ 0 for all u ∈ U), then E[X] ≥ 0. Similarly, if X takes only positive values, then E[X] > 0.

A crucial property of expectation is the following:

Theorem 6.6 (Linearity of expectation). For real random variables X

and Y , and real number a, we have

E[X + Y ] = E[X] + E[Y ]
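Linearity of expectation is easy to illustrate numerically; note in particular that it requires no independence assumption. The toy space and variables below are illustrative choices, not from the text:

```python
# A quick numeric illustration of linearity of expectation (the space and
# the variables are arbitrary illustrative choices, not from the text).
U = [0, 1, 2, 3]
P = {u: 0.25 for u in U}            # uniform distribution on U

def X(u): return u * u              # X and Y are deliberately NOT
def Y(u): return -u                 # independent: linearity holds anyway.

def E(Z):
    # Definition (6.12): E[Z] is the sum over u of Z(u) * P[u].
    return sum(Z(u) * P[u] for u in U)

assert abs(E(lambda u: X(u) + Y(u)) - (E(X) + E(Y))) < 1e-12
```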