independent, which is to say,

P[X = x ∧ Y = y] = P[X = x] P[Y = y].

Equivalently, X and Y are independent if and only if their joint distribution

is equal to the product of their individual distributions. Alternatively, X

and Y are independent if and only if, for all values x taken by X with nonzero probability, the conditional distribution of Y given the event X = x is the same as the distribution of Y.

Let X1, . . . , Xn be a collection of random variables, and let Xi be the image of Xi for i = 1, . . . , n. We say X1, . . . , Xn are pairwise independent if for all i, j = 1, . . . , n with i ≠ j, the variables Xi and Xj are independent. We say that X1, . . . , Xn are mutually independent if for all x1 ∈ X1, . . . , xn ∈ Xn, we have

P[X1 = x1 ∧ · · · ∧ Xn = xn] = P[X1 = x1] · · · P[Xn = xn].

More generally, for k = 2, . . . , n, we say that X1, . . . , Xn are k-wise independent if any k of them are mutually independent.

106 Finite and discrete probability distributions

Example 6.15. We toss three coins, and set Xi := 0 if the ith coin is "tails," and Xi := 1 otherwise. The variables X1, X2, X3 are mutually independent. Let us set Y12 := X1 ⊕ X2, Y13 := X1 ⊕ X3, and Y23 := X2 ⊕ X3, where "⊕" denotes "exclusive or," that is, addition modulo 2. Then the variables Y12, Y13, Y23 are pairwise independent, but not mutually independent: observe that Y12 ⊕ Y13 = Y23. □
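Example 6.15 can be verified by brute-force enumeration of the eight equally likely outcomes. The following Python sketch (the helper names are our own, not from the text) checks that each pair of the Yij is independent while the triple is not:

```python
from itertools import product
from fractions import Fraction

# Enumerate the 8 equally likely outcomes of three fair coin tosses.
outcomes = list(product([0, 1], repeat=3))

def prob(event):
    """Probability of an event (a predicate on (x1, x2, x3)) under the uniform distribution."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

# Y12 = X1 xor X2, Y13 = X1 xor X3, Y23 = X2 xor X3.
Y = {
    (1, 2): lambda o: o[0] ^ o[1],
    (1, 3): lambda o: o[0] ^ o[2],
    (2, 3): lambda o: o[1] ^ o[2],
}

# Pairwise independence: every pair of the Yij factors.
for (i, j) in [((1, 2), (1, 3)), ((1, 2), (2, 3)), ((1, 3), (2, 3))]:
    Yi, Yj = Y[i], Y[j]
    for a in (0, 1):
        for b in (0, 1):
            joint = prob(lambda o: Yi(o) == a and Yj(o) == b)
            assert joint == prob(lambda o: Yi(o) == a) * prob(lambda o: Yj(o) == b)

# Not mutually independent: Y12 = Y13 = 0 forces Y23 = 0,
# so this joint probability is 0 while the product of marginals is 1/8.
triple = prob(lambda o: Y[(1, 2)](o) == 0 and Y[(1, 3)](o) == 0 and Y[(2, 3)](o) == 1)
print(triple)  # prints 0
```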

The following is a simple but useful fact:

Theorem 6.1. Let Xi : U → Xi be random variables, for i = 1, . . . , n, and suppose that there exist functions fi : Xi → [0, 1], for i = 1, . . . , n, such that

∑_{xi ∈ Xi} fi(xi) = 1 (i = 1, . . . , n),

and

P[X1 = x1 ∧ · · · ∧ Xn = xn] = f1(x1) · · · fn(xn)

for all x1 ∈ X1, . . . , xn ∈ Xn. Then for any subset of distinct indices i1, . . . , iℓ ∈ {1, . . . , n}, we have

P[Xi1 = xi1 ∧ · · · ∧ Xiℓ = xiℓ] = fi1(xi1) · · · fiℓ(xiℓ)

for all xi1 ∈ Xi1, . . . , xiℓ ∈ Xiℓ.

Proof. To prove the theorem, it will suffice to show that we can "eliminate" a single variable, say Xn, meaning that for all x1, . . . , xn−1, we have

P[X1 = x1 ∧ · · · ∧ Xn−1 = xn−1] = f1(x1) · · · fn−1(xn−1).

Having established this, we may then proceed to eliminate any number of variables (the ordering of the variables is clearly irrelevant).

We have

P[X1 = x1 ∧ · · · ∧ Xn−1 = xn−1]
  = ∑_{xn ∈ Xn} P[X1 = x1 ∧ · · · ∧ Xn−1 = xn−1 ∧ Xn = xn]
  = ∑_{xn ∈ Xn} f1(x1) · · · fn−1(xn−1) fn(xn)
  = f1(x1) · · · fn−1(xn−1) ∑_{xn ∈ Xn} fn(xn)
  = f1(x1) · · · fn−1(xn−1). □
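The elimination step in the proof of Theorem 6.1 can be illustrated numerically. In the sketch below, the particular images and weight functions fi are invented for illustration; the joint distribution is defined in product form, and summing out the last variable leaves a product of the remaining fi:

```python
from fractions import Fraction
from itertools import product

# Hypothetical images X1, X2, X3 and weight functions f1, f2, f3,
# each summing to 1, as required by Theorem 6.1.
X1 = [0, 1];       f1 = {0: Fraction(1, 3), 1: Fraction(2, 3)}
X2 = [0, 1, 2];    f2 = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)}
X3 = [0, 1];       f3 = {0: Fraction(1, 2), 1: Fraction(1, 2)}

# A joint distribution of product form.
joint = {(a, b, c): f1[a] * f2[b] * f3[c] for a, b, c in product(X1, X2, X3)}
assert sum(joint.values()) == 1

# "Eliminate" X3: the marginal of (X1, X2) is again of product form f1 * f2.
for a, b in product(X1, X2):
    marginal = sum(joint[(a, b, c)] for c in X3)
    assert marginal == f1[a] * f2[b]
print("marginals factor as claimed")
```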

6.3 Random variables 107

The following three theorems are immediate consequences of the above

theorem:

Theorem 6.2. Let Xi : U → Xi be random variables, for i = 1, . . . , n, such that

P[X1 = x1 ∧ · · · ∧ Xn = xn] = (1/|X1|) · · · (1/|Xn|) (for all x1 ∈ X1, . . . , xn ∈ Xn).

Then the variables Xi are mutually independent with each Xi uniformly

distributed over Xi .

Theorem 6.3. If X1 , . . . , Xn are mutually independent random variables,

then they are k-wise independent for all k = 2, . . . , n.

Theorem 6.4. If Di = (Ui, Pi) are probability distributions for i = 1, . . . , n, then the projection functions πi : U1 × · · · × Un → Ui, where πi(u1, . . . , un) = ui, are mutually independent random variables on the product distribution D1 × · · · × Dn.

We also have:

Theorem 6.5. If X1, . . . , Xn are mutually independent random variables, and g1, . . . , gn are functions, then g1(X1), . . . , gn(Xn) are also mutually independent random variables.

Proof. The proof is a straightforward, if somewhat tedious, calculation. For i = 1, . . . , n, let yi be some value in the image of gi(Xi), and let Xi := gi⁻¹({yi}). We have

P[g1(X1) = y1 ∧ · · · ∧ gn(Xn) = yn]
  = P[(⋁_{x1 ∈ X1} X1 = x1) ∧ · · · ∧ (⋁_{xn ∈ Xn} Xn = xn)]
  = P[⋁_{x1 ∈ X1} · · · ⋁_{xn ∈ Xn} (X1 = x1 ∧ · · · ∧ Xn = xn)]
  = ∑_{x1 ∈ X1} · · · ∑_{xn ∈ Xn} P[X1 = x1 ∧ · · · ∧ Xn = xn]
  = ∑_{x1 ∈ X1} · · · ∑_{xn ∈ Xn} P[X1 = x1] · · · P[Xn = xn]
  = (∑_{x1 ∈ X1} P[X1 = x1]) · · · (∑_{xn ∈ Xn} P[Xn = xn])
  = P[⋁_{x1 ∈ X1} X1 = x1] · · · P[⋁_{xn ∈ Xn} Xn = xn]
  = P[g1(X1) = y1] · · · P[gn(Xn) = yn]. □

Example 6.16. If we toss n dice, and let Xi denote the value of the ith die for i = 1, . . . , n, then the Xi are mutually independent random variables. If we set Yi := Xi² for i = 1, . . . , n, then the Yi are also mutually independent random variables. □
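Theorem 6.5 and Example 6.16 can be checked by enumeration. The sketch below uses two fair dice; since squaring is injective on {1, . . . , 6}, we instead pick two non-injective functions (chosen arbitrarily for illustration), which makes the preservation of independence less trivial:

```python
from fractions import Fraction
from itertools import product

# Two independent fair dice X1, X2, enumerated exhaustively.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event (a predicate on (x1, x2)) under the uniform distribution."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

g1 = lambda v: v % 2        # parity of the first die (non-injective)
g2 = lambda v: min(v, 3)    # first die clamped at 3 (non-injective)

# g1(X1) and g2(X2) are again independent: every joint probability factors.
for y1 in {g1(v) for v in range(1, 7)}:
    for y2 in {g2(v) for v in range(1, 7)}:
        joint = prob(lambda o: g1(o[0]) == y1 and g2(o[1]) == y2)
        assert joint == prob(lambda o: g1(o[0]) == y1) * prob(lambda o: g2(o[1]) == y2)
print("independence preserved under g1, g2")
```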

Example 6.17. This example again illustrates the notion of pairwise in-

dependence. Let X and Y be independent and uniformly distributed over

Zp , where p is a prime. For a ∈ Zp , let Za := aX + Y . Then we claim that

each Za is uniformly distributed over Zp , and that the collection of random

variables {Za : a ∈ Zp } is pairwise independent.

To prove this claim, let a, b ∈ Zp with a ≠ b, and consider the map fa,b : Zp × Zp → Zp × Zp that sends (x, y) to (ax + y, bx + y). It is easy to see that fa,b is injective; indeed, if ax + y = ax′ + y′ and bx + y = bx′ + y′, then subtracting these two equations, we obtain (a − b)x = (a − b)x′, and since a − b ≠ [0]p, it follows that x = x′, which also implies y = y′. Since fa,b is injective, it must be a bijection from Zp × Zp onto itself. Thus, since (X, Y) is uniformly distributed over Zp × Zp, so is (Za, Zb) = (aX + Y, bX + Y). So for all z, z′ ∈ Zp, we have

P[Za = z ∧ Zb = z′] = 1/p²,

and so the claim follows from Theorem 6.2.

Note that the variables Za are not 3-wise independent, since the value of any two determines the value of all the rest (verify). □
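The claims of Example 6.17 are easy to verify exhaustively for a small prime. The following Python sketch (p = 5 is an arbitrary choice) checks uniformity of each Za, pairwise independence, and one witness against 3-wise independence:

```python
from fractions import Fraction
from itertools import product

p = 5  # any prime works; a small one keeps the enumeration tiny

# (X, Y) uniform over Zp x Zp; Z_a := a*X + Y (mod p).
pairs_xy = list(product(range(p), repeat=2))

def prob(event):
    """Probability of an event (a predicate on (x, y)) under the uniform distribution."""
    return Fraction(sum(1 for xy in pairs_xy if event(*xy)), len(pairs_xy))

def Z(a):
    return lambda x, y: (a * x + y) % p

# Each Z_a is uniformly distributed over Zp.
for a in range(p):
    for z in range(p):
        assert prob(lambda x, y: Z(a)(x, y) == z) == Fraction(1, p)

# Pairwise independence: for a != b, P[Z_a = z and Z_b = z'] = 1/p^2.
for a in range(p):
    for b in range(p):
        if a == b:
            continue
        for z in range(p):
            for zp in range(p):
                joint = prob(lambda x, y: Z(a)(x, y) == z and Z(b)(x, y) == zp)
                assert joint == Fraction(1, p * p)

# Not 3-wise independent: Z_0 = Y and Z_1 = X + Y determine X and Y, hence
# every Z_a.  E.g. Z_0 = Z_1 = 0 forces X = Y = 0 and thus Z_2 = 0, so the
# joint probability below is 0 while the product of marginals is 1/p^3.
assert prob(lambda x, y: Z(0)(x, y) == 0 and Z(1)(x, y) == 0 and Z(2)(x, y) == 1) == 0
print("claims of Example 6.17 verified for p =", p)
```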

Example 6.18. We can generalize the previous example as follows. Let

X1 , . . . , Xt , Y be mutually independent and uniformly distributed over Zp ,

where p is prime, and for a1 , . . . , at ∈ Zp , let Za1 ,...,at := a1 X1 +· · ·+at Xt +Y .

We claim that each Za1 ,...,at is uniformly distributed over Zp , and that the

collection of all such Za1 ,...,at is pairwise independent.

To prove this claim, it will suffice (by Theorem 6.2) to prove that for all a1, . . . , at, b1, . . . , bt, z, z′ ∈ Zp, subject to (a1, . . . , at) ≠ (b1, . . . , bt), we have