Exercise 7.1. Consider a probabilistic algorithm A that halts with probability 1 on input x, and consider the probability distribution D_{A,x} on the set S of exact execution paths. Let τ be a fixed, partial execution path, and let B ⊆ S be the event that consists of all exact execution paths that extend τ. Show that P[B] = 2^{−|τ|}.
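The claim can be checked empirically: model an exact execution path as the sequence of random bits the algorithm consumes, and estimate the probability that a uniformly random bit sequence begins with a fixed prefix τ. The sketch below is an illustration with made-up names, not a solution to the exercise.

```python
import random

def extends(prefix, num_trials=100_000, seed=1):
    """Estimate the probability that a uniformly random bit sequence
    begins with the given prefix (i.e., the probability of the event B)."""
    random.seed(seed)
    hits = 0
    for _ in range(num_trials):
        # Draw just enough fresh random bits to compare against the prefix.
        sample = [random.randint(0, 1) for _ in range(len(prefix))]
        if sample == prefix:
            hits += 1
    return hits / num_trials

tau = [1, 0, 1]                 # a fixed partial execution path
estimate = extends(tau)
exact = 2 ** -len(tau)          # the claimed probability 2^{-|tau|}
print(estimate, exact)          # estimate should be close to 0.125
```

Each of the |τ| comparisons succeeds with probability 1/2 independently, which is exactly where the factor 2^{−|τ|} comes from.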

154 Probabilistic algorithms

Exercise 7.2. Consider a probabilistic algorithm A that halts with probability 1 on input x, and consider the probability distribution D_{A,x} on the set S of exact execution paths. For a bit string σ and an integer k ≥ 0, let {σ}_k denote the value of σ truncated to the first k bits. Suppose that B ⊆ S is an event of the form

B = {σ ∈ S : φ({σ}_k)}

for some predicate φ and some integer k ≥ 0. Intuitively, this means that B is completely determined by the first k bits of the execution path. Now consider the uniform distribution on {0,1}^{×k}. Let us define an event B′ in this distribution as follows. For σ ∈ {0,1}^{×k}, let us run A on input x using the execution path σ for k steps or until A halts (whichever comes first). If the number of steps executed was t (where t ≤ k), then we put σ in B′ if and only if φ({σ}_t). Show that the probability that the event B occurs (with respect to the distribution D_{A,x}) is the same as the probability that B′ occurs (with respect to the uniform distribution on {0,1}^{×k}). Hint: use Exercise 7.1.

The above exercise is very useful in simplifying the analysis of probabilistic algorithms. One can typically reduce the analysis of some event of interest to the analysis of a collection of events, each of which is determined by the first k bits of the execution path for some fixed k. The probability of an event that is determined by the first k bits of the execution path may then be calculated by analyzing the behavior of the algorithm on a random k-bit execution path.
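As a toy instance of this technique (an illustration, not part of the text): let A be the algorithm that flips coins until it sees a 1, and consider the event "A halts within 3 steps." This event is determined by the first k = 3 bits of the execution path, so its probability can be computed by enumerating all 3-bit strings under the uniform distribution.

```python
from itertools import product

def halts_within(sigma, k=3):
    """Return True iff the toy algorithm 'flip coins until a 1 appears'
    halts within k steps on execution path sigma."""
    return 1 in sigma[:k]

# Enumerate the uniform distribution on {0,1}^3.
paths = list(product([0, 1], repeat=3))
prob = sum(halts_within(sigma) for sigma in paths) / len(paths)
print(prob)   # 7/8: the only non-halting 3-bit path is (0, 0, 0)
```

This agrees with the exact calculation under D_{A,x}: the algorithm fails to halt within 3 steps only on the all-zero prefix, which by Exercise 7.1 has probability 2^{−3}.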

Exercise 7.3. Suppose algorithm A calls algorithm B as a subroutine. In the probability distribution D_{A,x}, consider a particular partial execution path τ that drives A to a point where A invokes algorithm B with a particular input y (determined by x and τ). Consider the conditional probability distribution given that τ is a prefix of A's actual execution path. We can define a random variable X on this conditional distribution whose value is the subpath traced out by the invocation of subroutine B. Show that the distribution of X is the same as D_{B,y}. Hint: use Exercise 7.1.

The above exercise is also very useful in simplifying the analysis of prob-

abilistic algorithms, in that it allows us to analyze a subroutine in isolation,

and use the results in the analysis of an algorithm that calls that subroutine.

Exercise 7.4. Let A be a probabilistic algorithm, and for an input x and integer k ≥ 0, consider the experiment in which we choose a random execution path of length k, and run A on input x for up to k steps using the selected execution path. If A halts within k steps, we define A_k(x) to be the output produced by A, and T_{A_k}(x) to be the actual number of steps executed by A; otherwise, we define A_k(x) to be the distinguished value "⊥" and T_{A_k}(x) to be k.

(a) Show that if A halts with probability 1 on input x, then for all possible outputs y,

P[A(x) = y] = lim_{k→∞} P[A_k(x) = y].

(b) Show that if A halts with probability 1 on input x, then

E[T_A(x)] = lim_{k→∞} E[T_{A_k}(x)].
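For the toy coin-flipping algorithm used earlier (flip bits until a 1 appears), the truncated quantities behave as the exercise predicts; the sketch below is illustrative, with hypothetical names, and uses the fact that this algorithm halts at step i with probability 2^{−i}.

```python
def expected_steps_truncated(k):
    """E[T_{A_k}(x)] for the toy algorithm that flips bits until a 1:
    it halts at step i with probability 2^{-i}, and non-halting paths
    are charged the full k steps."""
    e = sum(i * 2 ** -i for i in range(1, k + 1))
    e += k * 2 ** -k          # paths that have not halted within k steps
    return e

for k in (1, 5, 20):
    print(k, expected_steps_truncated(k))   # approaches E[T_A(x)] = 2
```

A closed form here is 2 − 2^{1−k}, which converges to the expected running time E[T_A(x)] = 2 of the untruncated algorithm, matching part (b).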

Exercise 7.5. One can generalize the notion of a discrete, probabilistic process, as follows. Let Γ be a finite or countably infinite set. Let f be a


function mapping sequences of one or more elements of Γ to [0, 1], such that the following property holds:

for all finite sequences (γ_1, . . . , γ_{i−1}), where i ≥ 1, f(γ_1, . . . , γ_{i−1}, γ) is non-zero for at most a finite number of γ ∈ Γ, and

Σ_{γ∈Γ} f(γ_1, . . . , γ_{i−1}, γ) = 1.

Now consider any prefix-free set S of finite sequences of elements of Γ. For σ = (γ_1, . . . , γ_n) ∈ S, define

P[σ] := ∏_{i=1}^{n} f(γ_1, . . . , γ_i).

Show that Σ_{σ∈S} P[σ] ≤ 1, and hence we may define a probability distribution on S using the probability function P[·] if this sum is 1. The intuition is that we are modeling a process in which we start out in the "empty" configuration; at each step, if we are in configuration (γ_1, . . . , γ_{i−1}), we halt if this is a "halting" configuration, that is, an element of S, and otherwise, we move to configuration (γ_1, . . . , γ_{i−1}, γ) with probability f(γ_1, . . . , γ_{i−1}, γ).
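A small numerical instance of this construction can be worked out directly (illustrative only): take Γ = {0, 1} with f assigning probability 1/2 to each next element regardless of history, and a small prefix-free set S.

```python
def path_prob(sigma, f):
    """P[sigma] as the product f(gamma_1) * f(gamma_1, gamma_2) * ...,
    following the definition in Exercise 7.5."""
    p = 1.0
    for i in range(1, len(sigma) + 1):
        p *= f(sigma[:i])
    return p

# f assigns probability 1/2 to each next element, independent of history.
f = lambda prefix: 0.5

# A prefix-free set: no sequence in S is a prefix of another.
S = [(1,), (0, 1), (0, 0, 1), (0, 0, 0)]
total = sum(path_prob(sigma, f) for sigma in S)
print(total)    # 1/2 + 1/4 + 1/8 + 1/8 = 1.0
```

Since the sum is exactly 1, this particular S carries a genuine probability distribution; dropping, say, (0, 0, 0) from S would leave a sum strictly below 1, illustrating the inequality in the exercise.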

7.2 Approximation of functions

Suppose f is a function mapping bit strings to bit strings. We may have an algorithm A that approximately computes f in the following sense: there exists a constant ε, with 0 ≤ ε < 1/2, such that for all inputs x, P[A(x) = f(x)] ≥ 1 − ε. The value ε is a bound on the error probability, which is defined as P[A(x) ≠ f(x)].

7.2.1 Reducing the error probability

There is a standard "trick" by which one can make the error probability very small; namely, run A on input x some number, say t, times, and take the majority output as the answer. Using the Chernoff bound (Theorem 6.13), the error probability for the iterated version of A is bounded by exp[−(1/2 − ε)²t/2], and so the error probability decreases exponentially with the number of iterations. This bound is derived as follows. For i = 1, . . . , t, let X_i be a random variable representing the outcome of the ith iteration of A; more precisely, X_i = 1 if A(x) ≠ f(x) on the ith iteration, and X_i = 0 otherwise. Let ε_x be the probability that A(x) ≠ f(x). The probability that the majority output is wrong is equal to the probability that the sample


mean of X_1, . . . , X_t exceeds the mean ε_x by at least 1/2 − ε_x. Part (i) of Theorem 6.13 says that this occurs with probability at most

exp[−(1/2 − ε_x)² t / (2(1 − ε_x))] ≤ exp[−(1/2 − ε)² t / 2].
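The amplification trick can be simulated directly. In the sketch below (parameters chosen only for illustration), a run of A is modeled as a coin that errs with probability eps; we take the majority of t runs and compare the empirical error rate with the bound exp[−(1/2 − ε)²t/2].

```python
import math
import random

def majority_error_rate(eps, t, trials=20_000, seed=7):
    """Empirical error probability of the majority vote of t
    independent runs, each of which errs with probability eps."""
    random.seed(seed)
    wrong = 0
    for _ in range(trials):
        errors = sum(random.random() < eps for _ in range(t))
        if errors > t / 2:          # a majority of the runs were wrong
            wrong += 1
    return wrong / trials

eps, t = 0.3, 51
empirical = majority_error_rate(eps, t)
bound = math.exp(-(0.5 - eps) ** 2 * t / 2)
print(empirical, bound)             # empirical error lies below the bound
```

Even for the fairly weak starting guarantee ε = 0.3, fifty-odd repetitions already push the majority's error well below the Chernoff estimate, consistent with the exponential decay in t.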

7.2.2 Strict polynomial time

If we have an algorithm A that runs in expected polynomial time, and which approximately computes a function f, then we can easily turn it into a new algorithm A′ that runs in strict polynomial time, and also approximates f, as follows. Suppose that ε < 1/2 is a bound on the error probability, and T(n) is a polynomial bound on the expected running time for inputs of length n. Then A′ simply runs A for at most tT(n) steps, where t is any constant chosen so that ε + 1/t < 1/2; if A does not halt within this time bound, then A′ simply halts with an arbitrary output. The probability that A′ errs is at most the probability that A errs plus the probability that A runs for more than tT(n) steps. By Markov's inequality (Theorem 6.11), the latter probability is at most 1/t, and hence A′ approximates f as well, but with an error probability bounded by ε + 1/t.
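The construction of A′ amounts to a step-limited wrapper around A. A minimal sketch (all names hypothetical) models the original algorithm as a generator that yields once per step and finally returns its output:

```python
def strict_time_version(algorithm, step_bound, default_output=0):
    """Run `algorithm` (a generator yielding None once per step and
    finally returning its output) for at most `step_bound` steps; if it
    has not halted by then, halt with an arbitrary default output."""
    gen = algorithm()
    for _ in range(step_bound):
        try:
            next(gen)
        except StopIteration as halt:
            return halt.value        # A halted within the step bound
    return default_output            # timeout: arbitrary answer

def slow_algorithm():
    for _ in range(10):              # 10 "steps" of computation
        yield
    return 42

print(strict_time_version(slow_algorithm, 100))   # 42 (halts in time)
print(strict_time_version(slow_algorithm, 5))     # 0  (cut off, default)
```

The wrapper's worst-case running time is the fixed bound step_bound, regardless of how A behaves, which is exactly the strict polynomial-time guarantee the text describes when step_bound = tT(n).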

7.2.3 Language recognition

An important special case of approximately computing a function is when the output of the function f is either 0 or 1 (or equivalently, false or true). In this case, f may be viewed as the characteristic function of the language L := {x : f(x) = 1}. (It is the tradition of computational complexity theory to call sets of bit strings "languages.") There are several "flavors" of probabilistic algorithms for approximately computing the characteristic function f of a language L that are traditionally considered; for the purposes of these definitions, we may restrict ourselves to algorithms that output either 0 or 1:

• We call a probabilistic, expected polynomial-time algorithm an Atlantic City algorithm for recognizing L if it approximately computes f with error probability bounded by a constant ε < 1/2.

• We call a probabilistic, expected polynomial-time algorithm A a Monte Carlo algorithm for recognizing L if for some constant δ > 0, we have:

  – for any x ∈ L, we have P[A(x) = 1] ≥ δ, and
  – for any x ∉ L, we have P[A(x) = 1] = 0.
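The one-sided error of a Monte Carlo algorithm can be seen in a toy simulation (entirely made up for illustration): a recognizer that, on a "yes" instance, outputs 1 with probability δ, and on a "no" instance never outputs 1.

```python
import random

def toy_monte_carlo(x_in_L, delta=0.25, rng=random.random):
    """Toy Monte Carlo recognizer: may miss a 'yes' instance,
    but never accepts a 'no' instance (one-sided error)."""
    if x_in_L:
        return 1 if rng() < delta else 0
    return 0                        # P[A(x) = 1] = 0 for x not in L

random.seed(3)
runs = 10_000
yes_accepts = sum(toy_monte_carlo(True) for _ in range(runs)) / runs
no_accepts = sum(toy_monte_carlo(False) for _ in range(runs))
print(yes_accepts, no_accepts)      # roughly 0.25, and exactly 0
```

An output of 1 is therefore conclusive evidence that x ∈ L, while an output of 0 is inconclusive; repeating the algorithm and accepting if any run outputs 1 drives the miss probability (1 − δ)^t toward 0.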
