to choose, that door is opened and he wins whatever is behind the chosen

door. The question is, which strategy is better for the contestant: to stay

or to switch?

Let us evaluate the two strategies. If the contestant always stays with his

initial selection, then it is clear that his probability of success is exactly 1/3.

Now consider the strategy of always switching. Let B be the event that the contestant's initial choice was correct, and let A be the event that the contestant wins the grand prize. On the one hand, if the contestant's initial choice was correct, then switching will certainly lead to failure. That is, P[A | B] = 0. On the other hand, suppose that the contestant's initial choice was incorrect, so that one of the zonks is behind the initially chosen door. Since Monty reveals the other zonk, switching will lead with certainty to success. That is, P[A | B̄] = 1. Furthermore, it is clear that P[B] = 1/3, and hence P[B̄] = 2/3.

So we compute

P[A] = P[A | B] P[B] + P[A | B̄] P[B̄] = 0 · (1/3) + 1 · (2/3) = 2/3.

Thus, the “stay” strategy has a success probability of 1/3, while the

“switch” strategy has a success probability of 2/3. So it is better to switch

than to stay.

Of course, real life is a bit more complicated. Monty did not always reveal a zonk and offer a choice to switch. Indeed, if Monty only revealed a zonk when the contestant had chosen the correct door, then switching would certainly be the wrong strategy. However, if Monty's choice itself was a random decision made independent of the contestant's initial choice, then switching is again the preferred strategy. □
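The two strategies can also be checked empirically. Below is a minimal simulation sketch (the door numbering and trial counts are illustrative, not from the text):

```python
import random

def monty_trial(switch: bool) -> bool:
    """One round: 3 doors, one prize, Monty opens a door hiding a zonk."""
    prize = random.randrange(3)
    choice = random.randrange(3)
    # Monty opens some door that is neither the contestant's pick nor the prize.
    revealed = next(d for d in range(3) if d != choice and d != prize)
    if switch:
        # Switch to the one remaining unopened door.
        choice = next(d for d in range(3) if d != choice and d != revealed)
    return choice == prize

def estimate(switch: bool, trials: int = 100_000) -> float:
    """Fraction of wins over many independent rounds."""
    return sum(monty_trial(switch) for _ in range(trials)) / trials
```

Running estimate(False) gives roughly 1/3 and estimate(True) roughly 2/3, matching the analysis above.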

Example 6.12. Suppose that the rate of incidence of disease X in the

overall population is 1%. Also suppose that there is a test for disease X;

however, the test is not perfect: it has a 5% false positive rate (i.e., 5% of

healthy patients test positive for the disease), and a 2% false negative rate

(i.e., 2% of sick patients test negative for the disease). A doctor gives the

test to a patient and it comes out positive. How should the doctor advise

his patient? In particular, what is the probability that the patient actually

has disease X, given a positive test result?

Amazingly, many trained doctors will say the probability is 95%, since the

test has a false positive rate of 5%. However, this conclusion is completely

wrong.

Let A be the event that the test is positive and let B be the event that

the patient has disease X. The relevant quantity that we need to estimate

is P[B | A]; that is, the probability that the patient has disease X, given a


positive test result. We use Bayes' theorem to do this:

P[B | A] = P[A | B] P[B] / (P[A | B] P[B] + P[A | B̄] P[B̄])
         = (0.98 · 0.01) / (0.98 · 0.01 + 0.05 · 0.99)
         ≈ 0.17.

Thus, the chance that the patient has disease X, given a positive test result, is just 17%. The correct intuition here is that it is much more likely to get a false positive than it is to actually have the disease.
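The computation generalizes to any prior and error rates; a small sketch (the function name and parameters are illustrative):

```python
def posterior(prior: float, true_pos: float, false_pos: float) -> float:
    """P[B | A] by Bayes' theorem, where A = positive test, B = has disease.

    true_pos  = P[A | B]      (1 minus the false negative rate)
    false_pos = P[A | not B]  (the false positive rate)
    """
    num = true_pos * prior
    return num / (num + false_pos * (1 - prior))

# The rates from the example: 1% incidence, 2% false negatives, 5% false positives.
p = posterior(prior=0.01, true_pos=0.98, false_pos=0.05)
# p ≈ 0.165, i.e. about 17%
```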

Of course, the real world is a bit more complicated than this example

suggests: the doctor may be giving the patient the test because other risk

factors or symptoms may suggest that the patient is more likely to have the

disease than a random member of the population, in which case the above

analysis does not apply. □

Exercise 6.6. Consider again the situation in Example 6.12, but now suppose that the patient is visiting the doctor because he has symptom Y. Furthermore, it is known that everyone who has disease X exhibits symptom Y, while 10% of the population overall exhibits symptom Y. Assuming that the accuracy of the test is not affected by the presence of symptom Y, how should the doctor advise his patient should the test come out positive?

Exercise 6.7. Suppose we roll two dice, and let (x, y) denote the outcome

(as in Example 6.6). For each of the following pairs of events A and B,

determine if they are independent or not:

(a) A: x = y; B: y = 1.

(b) A: x ≥ y; B: y = 1.

(c) A: x ≥ y; B: y² = 7y − 6.

(d) A: xy = 6; B: y = 3.
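Each pair can be settled mechanically by comparing P[A ∩ B] with P[A]P[B] over the 36 equally likely outcomes. A sketch of such a checker, demonstrated on a different pair of events so as not to give away the answers:

```python
from fractions import Fraction
from itertools import product

# The uniform sample space for two fair dice, as in the exercise.
OUTCOMES = list(product(range(1, 7), repeat=2))

def prob(event) -> Fraction:
    """Exact probability of an event, given as a predicate on (x, y)."""
    hits = sum(1 for o in OUTCOMES if event(o))
    return Fraction(hits, len(OUTCOMES))

def independent(a, b) -> bool:
    """A and B are independent iff P[A ∩ B] = P[A]P[B]."""
    return prob(lambda o: a(o) and b(o)) == prob(a) * prob(b)

# Illustrative pair (not from the exercise): A: x is even; B: y = 1.
print(independent(lambda o: o[0] % 2 == 0, lambda o: o[1] == 1))  # True
```

Using exact Fraction arithmetic avoids any floating-point rounding in the equality test.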

Exercise 6.8. Let C be an event that occurs with non-zero probability, and let B1, . . . , Bn be a partition of C, such that each event Bi occurs with non-zero probability. Let A be an event and let p be a real number with 0 ≤ p ≤ 1. Suppose that for each i = 1, . . . , n, the conditional probability of A given Bi is ≤ p (resp., <, =, >, ≥ p). Show that the conditional probability of A given C is also ≤ p (resp., <, =, >, ≥ p).

Exercise 6.9. Show that if two events A and B are independent, then so are A and B̄. More generally, show that if A1, . . . , An are mutually independent, then so are A1′, . . . , An′, where each Ai′ denotes either Ai or Āi.

Exercise 6.10. This exercise develops an alternative proof, based on probability theory, of Theorem 2.14. Let n > 1 be an integer and consider an experiment in which a number a is chosen at random from {0, . . . , n − 1}. If n = p1^e1 · · · pr^er is the prime factorization of n, let Ai be the event that a is divisible by pi, for i = 1, . . . , r.

(a) Show that

    φ(n)/n = P[Ā1 ∩ · · · ∩ Ār],

where φ is Euler's phi function.

(b) Show that if i1, . . . , iℓ are distinct indices between 1 and r, then

    P[Ai1 ∩ · · · ∩ Aiℓ] = 1/(pi1 · · · piℓ).

Conclude that the events Ai are mutually independent, and P[Ai] = 1/pi.

(c) Using part (b) and the result of the previous exercise, show that

    P[Ā1 ∩ · · · ∩ Ār] = ∏_{i=1}^{r} (1 − 1/pi).

(d) Combine parts (a) and (c) to derive the result of Theorem 2.14 that

    φ(n) = n ∏_{i=1}^{r} (1 − 1/pi).
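The identity in part (d) is easy to sanity-check numerically (a check, not a proof; the helper names are made up for this sketch):

```python
from math import gcd

def phi(n: int) -> int:
    """Euler's phi function by direct count of a in {0, ..., n-1} with gcd(a, n) = 1."""
    return sum(1 for a in range(n) if gcd(a, n) == 1)

def phi_by_product(n: int) -> int:
    """n * prod(1 - 1/p) over the distinct primes p dividing n, in exact integer arithmetic."""
    result, m, p = n, n, 2
    while p * p <= m:
        if m % p == 0:
            result -= result // p   # multiply result by (1 - 1/p)
            while m % p == 0:
                m //= p
        p += 1
    if m > 1:                       # a remaining prime factor > sqrt of the original
        result -= result // m
    return result

print(phi(360), phi_by_product(360))  # both 96
```

For example, 360 = 2³ · 3² · 5 gives 360 · (1/2)(2/3)(4/5) = 96.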

6.3 Random variables

Let D = (U, P) be a probability distribution.

It is sometimes convenient to associate a real number, or other mathematical object, with each outcome u ∈ U. Such an association is called a random variable; more formally, a random variable X is a function from U into a set X. If X is a subset of the real numbers, then X is called a real random variable. When we speak of the image of X, we simply mean its image in the usual function-theoretic sense, that is, the set X(U) = {X(u) : u ∈ U}.

One may define any number of random variables on a given probability distribution. If X : U → X is a random variable, and f : X → Y is a function, then f(X) := f ∘ X is also a random variable.

Example 6.13. Suppose we flip n fair coins. Then we may define a random variable X that maps each outcome to a bit string of length n, where a “head” is encoded as a 1-bit, and a “tail” is encoded as a 0-bit. We may define another random variable Y that is the number of “heads.” The variable Y is a real random variable. □
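A concrete rendering of these two random variables as functions on the sample space (the 'H'/'T' encoding of outcomes is an illustrative choice):

```python
from itertools import product

n = 3
# The sample space: all 2^n equally likely outcomes of n fair coin flips.
U = list(product("HT", repeat=n))

def X(u):
    """Bit string of length n: head -> '1', tail -> '0'."""
    return "".join("1" if c == "H" else "0" for c in u)

def Y(u):
    """Number of heads: a real random variable."""
    return sum(1 for c in u if c == "H")

# Y = f(X) for f = "count the 1-bits", illustrating that f(X) is again
# a random variable.
assert all(Y(u) == X(u).count("1") for u in U)
print(X(("H", "T", "H")), Y(("H", "T", "H")))  # 101 2
```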


Example 6.14. If A is an event, we may define a random variable X as follows: X := 1 if the event A occurs, and X := 0 otherwise. The variable X is called the indicator variable for A. Conversely, if Y is any 0/1-valued random variable, we can define the event B to be the subset of all possible outcomes that lead to Y = 1, and Y is the indicator variable for the event B. Thus, we can work with either events or indicator variables, whichever is more natural and convenient. □
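This correspondence is easy to see in code (a sketch on the two-dice sample space; the event chosen here is illustrative):

```python
from itertools import product
from fractions import Fraction

U = list(product(range(1, 7), repeat=2))   # two fair dice
A = {u for u in U if u[0] == u[1]}         # event: doubles

def indicator(u):
    """X := 1 if the event A occurs, and X := 0 otherwise."""
    return 1 if u in A else 0

# Recovering the event from the 0/1-valued random variable:
B = {u for u in U if indicator(u) == 1}
assert B == A
print(Fraction(len(A), len(U)))  # 1/6
```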

Let X : U → X be a random variable. For x ∈ X, we write “X = x” as shorthand for the event {u ∈ U : X(u) = x}. More generally, for any predicate φ, we may write “φ(X)” as shorthand for the event {u ∈ U : φ(X(u))}.

A random variable X defines a probability distribution on its image X, where the probability associated with x ∈ X is P[X = x]. We call this the distribution of X. For two random variables X, Y defined on a probability distribution, Z := (X, Y) is also a random variable whose distribution is called the joint distribution of X and Y.

If X is a random variable, and A is an event with non-zero probability, then the conditional distribution of X given A is a probability distribution on the image X of X, where the probability associated with x ∈ X is P[X = x | A].
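The definition can be turned into a direct computation: restrict to the outcomes in A and renormalize. A sketch on the two-dice space (the choice of X and A is illustrative):

```python
from itertools import product
from fractions import Fraction
from collections import Counter

U = list(product(range(1, 7), repeat=2))   # uniform on the 36 outcomes

def conditional_distribution(X, A):
    """Distribution of X given A: tally X(u) over u in A, then renormalize."""
    hits = [u for u in U if u in A]
    counts = Counter(X(u) for u in hits)
    return {x: Fraction(c, len(hits)) for x, c in counts.items()}

# X = sum of the dice, conditioned on the (illustrative) event "y is even".
A = {u for u in U if u[1] % 2 == 0}
dist = conditional_distribution(lambda u: u[0] + u[1], A)
assert sum(dist.values()) == 1   # a probability distribution on the image of X
```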

We say two random variables X, Y are independent if for all x in the