that (H, H(A)) is δ-uniform on H × Z, with

δ ≤ √(2^64 · 2^(−160))/2 = 2^(−49). □

The leftover hash lemma allows one to convert “low quality” sources of randomness into “high quality” sources of randomness. Suppose that to conduct an experiment, we need to sample a random variable Z whose distribution is uniform on a set Z of size n, or at least δ-uniform for a small value of δ. However, we may not have direct access to a source of “real” randomness whose distribution looks anything like that of the desired uniform distribution, but rather, only to a “low quality” source of randomness. For example, one could model various characteristics of a person's typing at the keyboard, or perhaps various characteristics of the internal state of a computer (both its software and hardware) as a random process. We cannot say very much about the probability distributions associated with such processes, but perhaps we can conservatively estimate the collision or guessing probability associated with these distributions. Using the leftover hash lemma, we can hash the output of this random process, using a suitably generated random hash function. The hash function acts like a “magnifying glass”: it “focuses” the randomness inherent in the “low quality” source distribution onto the set Z, obtaining a “high quality,” nearly uniform, distribution on Z.
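As a concrete illustration (a sketch only; the prime, the output size, and the toy "weak source" below are arbitrary choices for illustration, not taken from the text), one can hash the output of a weak source through a randomly chosen member of a near-universal family of the form h_{a,b}(x) = ((ax + b) mod p) mod n:

```python
import random

# A sketch of extraction via a near-universal hash: members of the
# Carter-Wegman family h_{a,b}(x) = ((a*x + b) mod p) mod n are close
# to universal when p is prime and much larger than n.

P = (1 << 61) - 1            # a Mersenne prime, the intermediate modulus
N = 1 << 16                  # size of the target set Z

def sample_hash():
    """Pick the hash function H uniformly at random (done once; public)."""
    a = random.randrange(1, P)
    b = random.randrange(P)
    return lambda x: ((a * x + b) % P) % N

def low_quality_source():
    """Stand-in for a weak source: plenty of entropy, but far from uniform."""
    # e.g. timing jitter or keystroke data; here just a biased toy source
    return sum((random.getrandbits(1) & random.getrandbits(1)) << i
               for i in range(48))

h = sample_hash()            # the reusable "magnifying glass"
z = h(low_quality_source())  # nearly uniform on {0, ..., N - 1}
assert 0 <= z < N
```

The same sampled hash function h can be applied to many independent source outputs; the theorem below quantifies the resulting loss.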

Of course, this approach requires a random hash function, which may be just as difficult to generate as a random element of Z. The following theorem shows, however, that we can at least use the same “magnifying glass” many times over, with the statistical distance from uniform of the output distribution increasing linearly in the number of applications of the hash function.

140 Finite and discrete probability distributions

Theorem 6.22. Let H be a universal family of hash functions from A to Z, where Z is of size n. Let H denote a random variable with the uniform distribution on H, and let A_1, …, A_ℓ denote random variables taking values in A, with H, A_1, …, A_ℓ mutually independent. Let κ := max{κ(A_1), …, κ(A_ℓ)}. Then (H, H(A_1), …, H(A_ℓ)) is δ-uniform on H × Z^{×ℓ}, where

δ ≤ ℓ√(nκ)/2.

Proof. Let Z_1, …, Z_ℓ denote random variables with the uniform distribution on Z, with H, A_1, …, A_ℓ, Z_1, …, Z_ℓ mutually independent. We shall make a hybrid argument (as in the proof of Theorem 6.18). Define random variables W_0, W_1, …, W_ℓ as follows:

W_0 := (H, H(A_1), …, H(A_ℓ)),
W_i := (H, Z_1, …, Z_i, H(A_{i+1}), …, H(A_ℓ))  for i = 1, …, ℓ − 1, and
W_ℓ := (H, Z_1, …, Z_ℓ).

We have

δ = Δ[W_0; W_ℓ]
  ≤ Σ_{i=1}^{ℓ} Δ[W_{i−1}; W_i]   (by part (iv) of Theorem 6.14)
  ≤ Σ_{i=1}^{ℓ} Δ[H, Z_1, …, Z_{i−1}, H(A_i), A_{i+1}, …, A_ℓ;
                  H, Z_1, …, Z_{i−1}, Z_i, A_{i+1}, …, A_ℓ]   (by Theorem 6.16)
  = Σ_{i=1}^{ℓ} Δ[H, H(A_i); H, Z_i]   (by Theorem 6.17)
  ≤ ℓ√(nκ)/2   (by Theorem 6.21). □
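The single-application bound of Theorem 6.21 used in the last step can be checked exactly on a toy instance. In the sketch below, the family h_{a,b}(x) = ((ax + b) mod 7) mod 2 and the non-uniform source distribution are arbitrary illustrative choices, not taken from the text; the family is genuinely universal, so the bound √(nκ)/2 must hold.

```python
from fractions import Fraction
from itertools import product

# Exact check of delta <= sqrt(n * kappa) / 2 on a toy instance.
# h_{a,b}(x) = ((a*x + b) mod 7) mod 2 is universal from {0,...,6} to
# {0,1}: for distinct x, y, the pair (a*x+b, a*y+b) mod 7 is uniform
# over the 42 distinct ordered pairs, and 18 of those agree mod 2, so
# the collision probability is 18/42 = 3/7 <= 1/2 = 1/n.

p, n = 7, 2
H = [(a, b) for a in range(1, p) for b in range(p)]

def h(a, b, x):
    return ((a * x + b) % p) % n

# an arbitrary non-uniform source A on {0, ..., p-1}
pa = [Fraction(2, 9), Fraction(2, 9), Fraction(1, 9), Fraction(1, 9),
      Fraction(1, 9), Fraction(1, 9), Fraction(1, 9)]
kappa = sum(q * q for q in pa)          # collision probability kappa(A)

# exact joint distribution of (H, H(A))
dist = {}
for (a, b), x in product(H, range(p)):
    key = (a, b, h(a, b, x))
    dist[key] = dist.get(key, Fraction(0)) + pa[x] / len(H)

# statistical distance from the uniform distribution on H x Z
u = Fraction(1, len(H) * n)
delta = sum(abs(dist.get((a, b, z), Fraction(0)) - u)
            for (a, b), z in product(H, range(n))) / 2

bound = float(n * kappa) ** 0.5 / 2     # sqrt(n * kappa) / 2
assert float(delta) <= bound
```

Working over Fraction keeps the distance computation exact, so the comparison against the bound is not clouded by floating-point error.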

Another source of “low quality” randomness arises in certain cryptographic applications, where we have a “secret” random variable A that is distributed uniformly over a large subset of some set A, but we want to derive from A a “secret key” whose distribution is close to that of the uniform distribution on a specified “key space” Z (typically, Z is the set of all bit strings of some specified length). The leftover hash lemma, combined with Theorem 6.22, allows us to do this using a “public” hash function, generated at random once and for all, published for all to see, and used over and over to derive secret keys as needed.


Exercise 6.52. Consider again the situation in Theorem 6.21. Suppose that Z = {0, …, n − 1}, but that we would rather have an almost-uniform distribution over Z′ = {0, …, t − 1}, for some t < n. While it may be possible to work with a different family of hash functions, we do not have to if n is large enough with respect to t, in which case we can just use the value H(A) mod t. If Z′ is uniformly distributed over Z′, show that

Δ[H, H(A) mod t; H, Z′] ≤ √(nκ)/2 + t/n.
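The t/n term in this bound already appears in the simpler fact that reducing a uniform value mod t moves it at most t/n from uniform. A quick exact check of that simpler fact (a sketch; the toy sizes n = 10, t = 3 are arbitrary):

```python
from fractions import Fraction

# If Z is uniform on {0, ..., n-1}, then Z mod t is within t/n of
# uniform on {0, ..., t-1}.  Exact computation for n = 10, t = 3.

n, t = 10, 3

# distribution of Z mod t when Z is uniform on {0, ..., n-1}
pmod = [Fraction(sum(1 for z in range(n) if z % t == r), n)
        for r in range(t)]

delta = sum(abs(q - Fraction(1, t)) for q in pmod) / 2   # equals 1/15 here
assert delta <= Fraction(t, n)
```

For these sizes the residues have probabilities 4/10, 3/10, 3/10, giving a distance of 1/15, comfortably below t/n = 3/10.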

Exercise 6.53. Suppose X and Y are random variables with images X and Y, respectively, and suppose that for some β, we have P[X = x | Y = y] ≤ β for all x ∈ X and y ∈ Y. Let H be a universal family of hash functions from X to Z, where Z is of size n. Let H denote a random variable with the uniform distribution on H, and Z denote a random variable with the uniform distribution on Z, where the three variables H, Z, and (X, Y) are mutually independent. Show that the statistical distance between (Y, H, H(X)) and (Y, H, Z) is at most √(nβ)/2.

6.10 Discrete probability distributions

In addition to working with probability distributions over finite sample spaces, one can also work with distributions over infinite sample spaces. If the sample space is countable, that is, either finite or countably infinite, then the distribution is called a discrete probability distribution. We shall not consider any other types of probability distributions in this text. The theory developed in §§6.1–6.5 extends fairly easily to the countably infinite setting, and in this section, we discuss how this is done.

6.10.1 Basic definitions

To say that the sample space U is countably infinite simply means that there is a bijection f from the set of positive integers onto U; thus, we can enumerate the elements of U as u_1, u_2, u_3, …, where u_i = f(i).

As in the finite case, the probability function assigns to each u ∈ U a value P[u] ∈ [0, 1]. The basic requirement that the probabilities sum to one (equation (6.1)) is the requirement that the infinite series Σ_{i=1}^∞ P[u_i] converges to one. Luckily, the convergence properties of an infinite series whose terms are all non-negative are invariant under a re-ordering of terms (see §A4), so it does not matter how we enumerate the elements of U.

Example 6.29. Suppose we flip a fair coin repeatedly until it comes up “heads,” and let the outcome u of the experiment denote the number of coins flipped. We can model this experiment as a discrete probability distribution D = (U, P), where U consists of the set of all positive integers, and where for u ∈ U, we set P[u] = 2^(−u). We can check that indeed Σ_{u=1}^∞ 2^(−u) = 1, as required.
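A quick simulation of this experiment (a sketch; the trial count and tolerances are arbitrary) confirms both the normalization and the value P[1] = 1/2:

```python
import random

# Example 6.29 in code: flip a fair coin until "heads" and record the
# number of flips; the outcome u has probability P[u] = 2^(-u).

def flips_until_heads():
    u = 1
    while random.random() >= 0.5:   # tails with probability 1/2
        u += 1
    return u

# the probabilities 2^(-u) sum to 1 (partial sum of the first 50 terms)
partial = sum(2.0 ** -u for u in range(1, 51))
assert abs(partial - 1.0) < 1e-12

# empirically, about half of all runs should stop after a single flip
trials = 100_000
ones = sum(1 for _ in range(trials) if flips_until_heads() == 1)
assert abs(ones / trials - 0.5) < 0.02
```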

One may be tempted to model this experiment by setting up a probability distribution on the sample space of all infinite sequences of coin tosses; however, this sample space is not countably infinite, and so we cannot construct a discrete probability distribution on this space. While it is possible to extend the notion of a probability distribution to such spaces, this would take us too far afield. □

Example 6.30. More generally, suppose we repeatedly execute a Bernoulli trial until it succeeds, where each execution succeeds with probability p > 0 independently of the previous trials, and let the outcome u of the experiment denote the number of trials executed. Then we associate the probability P[u] = q^(u−1) p with each positive integer u, where q := 1 − p, since we have u − 1 failures before the one success. One can easily check that these probabilities sum to 1. Such a distribution is called a geometric distribution. □
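One can verify the normalization numerically and sample from the distribution directly (a sketch; p = 0.3 is an arbitrary success probability, and the mean-of-a-geometric fact 1/p is standard, not stated in the text):

```python
import random

# Example 6.30 in code: the geometric probabilities q^(u-1) * p sum to 1.

p = 0.3
q = 1 - p
partial = sum(q ** (u - 1) * p for u in range(1, 200))
assert abs(partial - 1.0) < 1e-12      # 1 - q^199 is within 1e-12 of 1

def geometric_sample():
    """Number of Bernoulli(p) trials up to and including the first success."""
    u = 1
    while random.random() >= p:
        u += 1
    return u

# the mean of a geometric distribution is 1/p (about 3.33 here)
trials = 200_000
mean = sum(geometric_sample() for _ in range(trials)) / trials
assert abs(mean - 1 / p) < 0.05
```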

Example 6.31. The series Σ_{i=1}^∞ 1/i^3 converges to some positive number c. Therefore, we can define a probability distribution on the set of positive integers, where we associate with each i ≥ 1 the probability 1/(c i^3). □
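Numerically, the constant c here is Apéry's constant ζ(3) ≈ 1.2020569; a short computation (a sketch, truncating the series at an arbitrarily chosen point):

```python
# Example 6.31 numerically: c = sum of 1/i^3 is zeta(3) ~ 1.2020569,
# and the probabilities 1/(c * i^3) then sum to 1 by construction.

N_TERMS = 1_000_000                      # truncation point (arbitrary)
c = sum(1.0 / i ** 3 for i in range(1, N_TERMS + 1))
assert abs(c - 1.2020569) < 1e-4         # tail beyond N_TERMS is ~5e-13

total = sum(1.0 / (c * i ** 3) for i in range(1, N_TERMS + 1))
assert abs(total - 1.0) < 1e-6
```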

Example 6.32. More generally, if x_i, i = 1, 2, …, are non-negative numbers, and 0 < c := Σ_{i=1}^∞ x_i < ∞, then we can define a probability distribution on the set of positive integers, assigning the probability x_i/c to i. □