Example 6.36. To illustrate that Theorem 6.24 does not hold in general, consider the geometric distribution on the positive integers, where P[j] = 2^{-j} for j ≥ 1. For i ≥ 1, define the random variable Xi so that Xi(i) = 2^i, Xi(i+1) = -2^{i+1}, and Xi(j) = 0 for all j ∉ {i, i+1}. Then E[Xi] = 0 for all i ≥ 1, and so Σ_{i≥1} E[Xi] = 0. Now define X := Σ_{i≥1} Xi. This is well defined, and in fact X(1) = 2, while X(j) = 0 for all j > 1. Hence E[X] = 1. □
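Both computations in the example can be verified exactly; the following is an illustrative sketch (not part of the text) that uses rational arithmetic to avoid rounding, checking E[Xi] = 0 term by term and E[X] = 1 from the closed form of X:

```python
from fractions import Fraction

N = 60  # how many indices i to check

def P(j):
    """The geometric distribution of the example: P[j] = 2^(-j)."""
    return Fraction(1, 2 ** j)

# E[Xi] = 2^i * P[i] - 2^(i+1) * P[i+1] = 1 - 1 = 0 for every i.
for i in range(1, N):
    assert 2 ** i * P(i) - 2 ** (i + 1) * P(i + 1) == 0

# X itself satisfies X(1) = 2 and X(j) = 0 for j > 1 (at each j > 1 the
# contributions 2^j from Xj and -2^j from X_{j-1} cancel), so
# E[X] = 2 * P(1) = 1, even though the E[Xi] sum to 0.
E_X = 2 * P(1)
assert E_X == 1
print(E_X)  # prints 1
```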

The variance Var[X] of X exists if and only if E[X] and E[(X − E[X])^2] exist, which holds if and only if E[X] and E[X^2] exist.

Theorem 6.9 holds under the additional hypothesis that E[X] and E[X^2] exist. Similarly, Theorem 6.10 holds under the additional hypothesis that E[Xi] and E[Xi^2] exist for each i.

The definition of conditional expectation carries over verbatim, as do equations (6.15) and (6.16). The analog of (6.16) for infinite partitions B1, B2, ... does not hold in general, but does hold if X is always non-negative.

6.10.5 Some useful bounds

Both Theorems 6.11 and 6.12 (Markov's and Chebyshev's inequalities) hold, under the additional hypothesis that the relevant expectations and variances exist.

Exercise 6.54. Suppose X is a random variable taking positive integer values, and that for some real number q, with 0 ≤ q ≤ 1, and for all integers i ≥ 1, we have P[X ≥ i] = q^{i-1}. Show that X has a geometric distribution with associated success probability p := 1 − q.
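The identity behind this exercise can be sanity-checked numerically: from the tail probabilities one gets P[X = i] = P[X ≥ i] − P[X ≥ i+1] = q^{i-1}(1 − q). The sketch below (illustrative only, with an arbitrary choice of q) verifies this with exact rationals:

```python
from fractions import Fraction

q = Fraction(1, 3)  # an arbitrary choice with 0 <= q <= 1
p = 1 - q           # the claimed success probability

for i in range(1, 50):
    tail_i = q ** (i - 1)  # P[X >= i]
    tail_next = q ** i     # P[X >= i+1]
    # P[X = i] = P[X >= i] - P[X >= i+1] = p * q^(i-1), the geometric
    # distribution with success probability p.
    assert tail_i - tail_next == p * q ** (i - 1)
print("ok")
```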

Exercise 6.55. A gambler plays a simple game in a casino: with each play of the game, the gambler may bet any number m of dollars; a coin is flipped, and if it comes up "heads," the casino pays m dollars to the gambler, and otherwise, the gambler pays m dollars to the casino. The gambler plays the game repeatedly, using the following strategy: he initially bets a dollar; each time he plays, if he wins, he pockets his winnings and goes home, and otherwise, he doubles his bet and plays again.

(a) Show that if the gambler has an infinite amount of money (so he can keep playing no matter how many times he loses), then his expected winnings are one dollar. Hint: model the gambler's winnings as a random variable on a geometric distribution, and compute its expected value.

(b) Show that if the gambler has a finite amount of money (so that he can only afford to lose a certain number of times), then his expected winnings are zero (regardless of how much money he starts with). Hint: in this case, you can model the gambler's winnings as a random variable on a finite probability distribution.
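The arithmetic underlying both parts can be checked exactly. In part (a), if the first head occurs on play i, the gambler has lost 1 + 2 + ... + 2^{i-2} = 2^{i-1} − 1 dollars and then wins his bet of 2^{i-1}, for a net of exactly one dollar. In part (b), a bankroll of 2^k − 1 dollars (an assumed parametrization) covers at most k consecutive losses. The following sketch verifies both claims:

```python
from fractions import Fraction

half = Fraction(1, 2)

# Part (a): whenever the gambler eventually wins, his net gain is +1,
# regardless of how many losses preceded the win.
for i in range(1, 40):
    lost = 2 ** (i - 1) - 1  # total lost on plays 1..i-1
    won = 2 ** (i - 1)       # amount won on play i
    assert won - lost == 1
# Hence E[winnings] = sum over i of 1 * 2^(-i) = 1 dollar.

# Part (b): with 2^k - 1 dollars the gambler wins 1 dollar unless the
# first k flips are all tails (probability 2^(-k)), in which case he
# has lost his entire bankroll of 2^k - 1 dollars.
for k in range(1, 40):
    p_ruin = half ** k
    expected = (1 - p_ruin) * 1 + p_ruin * (-(2 ** k - 1))
    assert expected == 0
print("ok")
```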

6.11 Notes

Our Chernoff bound (Theorem 6.13) is one of a number of different types of bounds that appear in the literature under the rubric of "Chernoff bound."

Universal and pairwise independent hash functions, with applications to hash tables and message authentication codes, were introduced by Carter and Wegman [25, 99].

The leftover hash lemma (Theorem 6.21) was originally stated and proved by Impagliazzo, Levin, and Luby [46], who use it to obtain an important result in the theory of cryptography. Our proof of the leftover hash lemma is loosely based on one by Impagliazzo and Zuckerman [47], who also present further applications.

7

Probabilistic algorithms

It is sometimes useful to endow our algorithms with the ability to generate random numbers. To simplify matters, we only consider algorithms that generate random bits. Where such random bits actually come from will not be of great concern to us here. In a practical implementation, one would use a pseudo-random bit generator, which should produce bits that "for all practical purposes" are "as good as random." While there is a well-developed theory of pseudo-random bit generation (some of which builds on the ideas in §6.9), we will not delve into this here. Moreover, the pseudo-random bit generators used in practice are not based on this general theory, and are much more ad hoc in design. So, although we will present a rigorous formal theory of probabilistic algorithms, the application of this theory to practice is ultimately a bit heuristic.

7.1 Basic definitions

Formally speaking, we will add a new type of instruction to our random access machine (described in §3.2):

random bit This type of instruction is of the form α ← RANDOM, where α takes the same form as in arithmetic instructions. Execution of this type of instruction assigns to α a value sampled from the uniform distribution on {0, 1}, independently from the execution of all other random-bit instructions.

In describing algorithms at a high level, we shall write "b ←R {0, 1}" to denote the assignment of a random bit to the variable b, and "s ←R {0, 1}^ℓ" to denote the assignment of a random bit string of length ℓ to the variable s.
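These two high-level operations can be mimicked in ordinary code; the sketch below uses Python's random module as a stand-in for the machine's random-bit instruction (in practice, as noted above, a pseudo-random generator plays this role):

```python
import random

def random_bit():
    """b <-R {0, 1}: one bit, uniform and independent of all others."""
    return random.randrange(2)

def random_bit_string(ell):
    """s <-R {0, 1}^ell: a uniform random bit string of length ell."""
    return [random_bit() for _ in range(ell)]

s = random_bit_string(8)
assert len(s) == 8 and all(b in (0, 1) for b in s)
```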

In describing the behavior of such a probabilistic or randomized algorithm A, for any input x, we view its running time and output as random variables, denoted TA(x) and A(x), respectively. The expected running time of A on input x is defined as the expected value E[TA(x)] of the random variable TA(x). Note that in defining expected running time, we are not considering the input to be drawn from some probability distribution. One could, of course, define such a notion; however, it is not always easy to come up with a distribution on the input space that reasonably models a particular real-world situation. We do not pursue this issue any more here.

We say that a probabilistic algorithm A runs in expected polynomial time if there exist constants c, d such that for all n ≥ 0 and all inputs x of length n, we have E[TA(x)] ≤ n^c + d. We say that A runs in strict polynomial time if there exist constants c, d such that for all n and all inputs x of length n, A always halts on input x within n^c + d steps, regardless of its random choices.

Defining the distributions of TA(x) and A(x) is a bit tricky. Things are quite straightforward if A always halts on input x after a finite number of steps, regardless of the outcomes of its random choices: in this case, we can naturally view TA(x) and A(x) as random variables on a uniform distribution over bit strings of some particular length; such a random bit string may be used as the source of random bits for the algorithm. However, if there is no a priori bound on the number of steps, things become more complicated: think of an algorithm that generates random bits one at a time until it generates, say, a 1 bit. Just as in Example 6.29, we do not attempt to model this as a probability distribution on the uncountable set of infinite bit strings, but rather, we directly define an appropriate discrete probability distribution that models the execution of A on input x.
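The troublesome "generate bits until a 1 appears" algorithm is easy to write down, and simulating it shows why it is troublesome: no fixed bit-string length suffices, yet it halts with probability 1 and its bit count follows the geometric distribution of Example 6.29. A minimal sketch, using Python's random module as the bit source:

```python
import random

def bits_until_one():
    """Draw random bits until a 1 appears; return how many were drawn.

    No a priori bound on the number of iterations exists, but the loop
    halts with probability 1, and P[count = i] = 2^(-i) for i >= 1.
    """
    count = 0
    while True:
        count += 1
        if random.randrange(2) == 1:
            return count

counts = [bits_until_one() for _ in range(10000)]
assert all(c >= 1 for c in counts)
# The empirical mean should be close to the expected value, which is 2.
```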

7.1.1 Defining the probability distribution

A warning to the reader: the remainder of this section is a bit technical, and you might want to skip ahead to §7.2 on first reading, if you are willing to trust your intuition regarding probabilistic algorithms.

To motivate our definition, which may at first seem a bit strange, consider again Example 6.29. We could view the sample space in that example to be the set of all bit strings consisting of zero or more 0 bits, followed by a single 1 bit, and to each such bit string σ of this special form, we assign the probability 2^{-|σ|}, where |σ| denotes the length of σ. The "random experiment" we have in mind is to generate random bits one at a time until one of these special "halting" strings is generated. In developing the definition of the probability distribution for a probabilistic algorithm, we simply consider more general sets of "halting" strings, determined by the algorithm and its input.
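That the probabilities 2^{-|σ|} assigned to the halting strings of Example 6.29 do sum to 1 can be checked exactly; this illustrative sketch enumerates the first 40 halting strings and verifies the truncated sum is 1 − 2^{-40}:

```python
from fractions import Fraction

def prob(sigma):
    """Probability 2^(-|sigma|) assigned to a halting string sigma."""
    return Fraction(1, 2 ** len(sigma))

# The halting strings: zero or more 0 bits followed by a single 1 bit.
halting = ["0" * k + "1" for k in range(40)]

total = sum(prob(s) for s in halting)
# 1/2 + 1/4 + 1/8 + ... converges to 1; the truncated sum falls short
# by exactly the mass 2^(-40) of the unenumerated tail.
assert total == 1 - Fraction(1, 2 ** 40)
print(total)
```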

To simplify matters, we assume that the machine produces a stream of

random bits, one with every instruction executed, and if the instruction

happens to be a random-bit instruction, then this is the bit it uses. For