[Figure: the regions A and C, lying above the subsets V0 and V1 of V]

Now, as in Example 6.26,

∆[X; Y ] = area of A = area of C.

Further, consider any subset W of V. The quantity |P[X ∈ W] − P[Y ∈ W]| is equal to the absolute value of the difference of the area of the subregion of A that lies above W and the area of the subregion of C that lies above W. This quantity is maximized when W = V0 or W = V1, in which case it is equal to ∆[X; Y]. □

We can restate Theorem 6.15 as follows:

∆[X; Y] = max{|P[φ(X)] − P[φ(Y)]| : φ is a predicate on V}.

This implies that when ∆[X; Y ] is very small, then for any predicate φ, the

events φ(X) and φ(Y ) occur with almost the same probability. Put another

way, there is no “statistical test” that can effectively distinguish between

the distributions of X and Y . For many applications, this means that the

distribution of X is “for all practical purposes” equivalent to that of Y , and

hence in analyzing the behavior of X, we can instead analyze the behavior

of Y , if that is more convenient.
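For finite distributions, this characterization is easy to verify by brute force. A minimal sketch in Python (the two distributions on V are invented for illustration):

```python
from itertools import chain, combinations

# Two distributions on V = {0, 1, 2} (probabilities chosen arbitrarily).
V = [0, 1, 2]
pX = {0: 0.5, 1: 0.3, 2: 0.2}
pY = {0: 0.4, 1: 0.4, 2: 0.2}

# Statistical distance: half the sum of |P[X = v] - P[Y = v]| over v.
delta = 0.5 * sum(abs(pX[v] - pY[v]) for v in V)

# Theorem 6.15 restated: delta equals the maximum of |P[X in W] - P[Y in W]|
# over all events (subsets) W of V.
events = chain.from_iterable(combinations(V, k) for k in range(len(V) + 1))
best = max(abs(sum(pX[v] for v in W) - sum(pY[v] for v in W)) for W in events)

assert abs(delta - best) < 1e-12
```

Here the maximum 0.1 is attained at W = {0} (and at its complement), matching the proof sketch above.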

6.8 Statistical distance 133

Theorem 6.16. Let X, Y be random variables taking values on a set V, and let f be a function from V into a set W. Then ∆[f(X); f(Y)] ≤ ∆[X; Y].

Proof. By Theorem 6.15, for any subset W of W, we have

|P[f(X) ∈ W] − P[f(Y) ∈ W]| = |P[X ∈ f⁻¹(W)] − P[Y ∈ f⁻¹(W)]| ≤ ∆[X; Y].

In particular, again by Theorem 6.15,

∆[f(X); f(Y)] = |P[f(X) ∈ W] − P[f(Y) ∈ W]|

for some W. □
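Theorem 6.16 says that applying a function can only shrink statistical distance. A quick numerical sanity check, with randomly chosen distributions and an arbitrary f (all names invented for illustration):

```python
import random

random.seed(1)

def rand_dist(n):
    """A random probability distribution on {0, ..., n-1}."""
    w = [random.random() for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

def stat_dist(p, q):
    """Statistical distance between two distributions given as lists."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

n = 6
pX, pY = rand_dist(n), rand_dist(n)

def f(v):
    return v % 3  # any function from {0,...,5} into {0,1,2} works

def pushforward(p):
    """Distribution of f(X) when X has distribution p."""
    q = [0.0] * 3
    for v, pv in enumerate(p):
        q[f(v)] += pv
    return q

# The distance can only decrease under f.
assert stat_dist(pushforward(pX), pushforward(pY)) <= stat_dist(pX, pY) + 1e-12
```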

Example 6.27. Let X be uniformly distributed on the set {0, . . . , n − 1}, and let Y be uniformly distributed on the set {0, . . . , m − 1}, for m ≥ n. Let f(y) := y mod n. We want to compute an upper bound on the statistical distance between X and f(Y). We can do this as follows. Let m = qn − r, where 0 ≤ r < n, so that q = ⌈m/n⌉. Also, let Z be uniformly distributed over {0, . . . , qn − 1}. Then f(Z) is uniformly distributed over {0, . . . , n − 1}, since every element of {0, . . . , n − 1} has the same number (namely, q) of pre-images under f which lie in the set {0, . . . , qn − 1}. Therefore, by the previous theorem,

∆[X; f(Y)] = ∆[f(Z); f(Y)] ≤ ∆[Z; Y],

and as we saw in Example 6.26,

∆[Z; Y] = r/(qn) < 1/q ≤ n/m.

Therefore,

∆[X; f(Y)] < n/m. □
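The bound in Example 6.27 can be checked exactly for small parameters; here is a sketch with n = 10 and m = 23 (values picked arbitrarily), using exact rational arithmetic:

```python
from fractions import Fraction

n, m = 10, 23  # m >= n, as in the example

# X is uniform on {0, ..., n-1}; f(Y) = Y mod n with Y uniform on {0, ..., m-1}.
pX = [Fraction(1, n)] * n
pfY = [Fraction(0)] * n
for y in range(m):
    pfY[y % n] += Fraction(1, m)

# Statistical distance between X and f(Y), computed exactly.
delta = Fraction(1, 2) * sum(abs(a - b) for a, b in zip(pX, pfY))
assert delta < Fraction(n, m)  # the bound Delta[X; f(Y)] < n/m
print(delta)  # 21/230
```

Here q = 3 and r = 7, so the exact distance 21/230 is well below the bound n/m = 10/23.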

We close this section with two useful theorems.

Theorem 6.17. Let X and Y be random variables taking values on a set V,

and let W be a random variable taking values on a set W. Further, suppose

that X and W are independent, and that Y and W are independent. Then

the statistical distance between (X, W ) and (Y, W ) is equal to the statistical

distance between X and Y ; that is,

∆[X, W ; Y, W ] = ∆[X; Y ].

134 Finite and discrete probability distributions

Proof. From the definition of statistical distance,

2∆[X, W ; Y, W ] = Σ_{v,w} |P[X = v ∧ W = w] − P[Y = v ∧ W = w]|
                 = Σ_{v,w} |P[X = v]P[W = w] − P[Y = v]P[W = w]|   (by independence)
                 = Σ_{v,w} P[W = w] |P[X = v] − P[Y = v]|
                 = (Σ_w P[W = w]) (Σ_v |P[X = v] − P[Y = v]|)
                 = 1 · 2∆[X; Y ]. □
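Since X and W are independent (and likewise Y and W), the joint distributions factor into products, and the equality of Theorem 6.17 can be checked numerically (distributions invented for illustration):

```python
import random

random.seed(2)

def rand_dist(n):
    """A random probability distribution on {0, ..., n-1}."""
    w = [random.random() for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

def stat_dist(p, q):
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

pX, pY = rand_dist(4), rand_dist(4)  # X, Y take values on a 4-element set
pW = rand_dist(3)                    # W takes values on a 3-element set

# By independence, P[(X, W) = (v, w)] = P[X = v] * P[W = w], and likewise for Y.
joint_XW = [px * pw for px in pX for pw in pW]
joint_YW = [py * pw for py in pY for pw in pW]

assert abs(stat_dist(joint_XW, joint_YW) - stat_dist(pX, pY)) < 1e-12
```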

Theorem 6.18. Let U1, . . . , Uℓ, V1, . . . , Vℓ be mutually independent random variables. We have

∆[U1, . . . , Uℓ; V1, . . . , Vℓ] ≤ Σ_{i=1}^{ℓ} ∆[Ui; Vi].

Proof. We introduce random variables W0, . . . , Wℓ, defined as follows:

W0 := (U1, . . . , Uℓ),
Wi := (V1, . . . , Vi, Ui+1, . . . , Uℓ)   for i = 1, . . . , ℓ − 1, and
Wℓ := (V1, . . . , Vℓ).

By definition,

∆[U1, . . . , Uℓ; V1, . . . , Vℓ] = ∆[W0; Wℓ].

Moreover, by part (iv) of Theorem 6.14, we have

∆[W0; Wℓ] ≤ Σ_{i=1}^{ℓ} ∆[Wi−1; Wi].

Now consider any fixed index i = 1, . . . , ℓ. By Theorem 6.17, we have

∆[Wi−1; Wi] = ∆[Ui, (V1, . . . , Vi−1, Ui+1, . . . , Uℓ); Vi, (V1, . . . , Vi−1, Ui+1, . . . , Uℓ)]
            = ∆[Ui; Vi].

The theorem now follows immediately. □

The technique used in the proof of the previous theorem is sometimes called a hybrid argument, as one considers the sequence of “hybrid” variables W0, W1, . . . , Wℓ, and shows that the distance between each consecutive pair of variables is small.
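The inequality of Theorem 6.18 is also easy to confirm numerically for small cases: by mutual independence, the joint distributions are products of the marginals (a sketch; all distributions invented for illustration):

```python
import random
from itertools import product
from math import prod

random.seed(3)

def rand_dist(n):
    """A random probability distribution on {0, ..., n-1}."""
    w = [random.random() for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

def stat_dist(p, q):
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

ell = 3
Us = [rand_dist(2) for _ in range(ell)]  # marginals of U_1, ..., U_ell
Vs = [rand_dist(2) for _ in range(ell)]  # marginals of V_1, ..., V_ell

def joint(marginals):
    """Joint distribution of mutually independent components."""
    return [prod(combo) for combo in product(*marginals)]

lhs = stat_dist(joint(Us), joint(Vs))
rhs = sum(stat_dist(u, v) for u, v in zip(Us, Vs))
assert lhs <= rhs + 1e-12  # Delta[U_1,...,U_ell; V_1,...,V_ell] <= sum over i
```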

Exercise 6.41. Let X and Y be independent random variables, each uni-

formly distributed over Zp , where p is prime. Calculate ∆[X, Y ; X, XY ].

Exercise 6.42. Let n be a large integer that is the product of two distinct primes of roughly the same bit length. Let X be uniformly distributed over Zn, and let Y be uniformly distributed over Zn*. Show that ∆[X; Y] = O(n^{−1/2}).

Exercise 6.43. Let V be a finite set, and consider any function φ : V → {0, 1}. Let B be a random variable uniformly distributed over {0, 1}, and for b = 0, 1, let Xb be a random variable taking values in V, and assume that Xb and B are independent. Show that

|P[φ(XB) = B] − 1/2| = (1/2)|P[φ(X0) = 1] − P[φ(X1) = 1]| ≤ (1/2)∆[X0; X1].

Exercise 6.44. Let X, Y be random variables on a probability distribution, and let B1, . . . , Bn be events that partition the underlying sample space, where each Bi occurs with non-zero probability. For i = 1, . . . , n, let Xi and Yi denote the random variables X and Y in the conditional probability distribution given Bi; that is, P[Xi = v] = P[X = v | Bi], and P[Yi = v] = P[Y = v | Bi]. Show that

∆[X; Y] ≤ Σ_{i=1}^{n} ∆[Xi; Yi] P[Bi].

Exercise 6.45. Let X and Y be random variables that take the same value unless a certain event F occurs. Show that ∆[X; Y] ≤ P[F].

Exercise 6.46. Let M be a large integer. Consider three random exper-

iments. In the first, we generate a random integer n between 1 and M ,

and then a random integer w between 1 and n. In the second, we gen-

erate a random integer n between 2 and M , and then generate a random

integer w between 1 and n. In the third, we generate a random integer

n between 2 and M , and then a random integer w between 2 and n. For

i = 1, 2, 3, let Xi denote the outcome (n, w) of the ith experiment. Show

that ∆[X1 ; X2 ] = O(1/M ) and ∆[X2 ; X3 ] = O(log M/M ), and conclude

that ∆[X1 ; X3 ] = O(log M/M ).
