As in the finite case, an event is an arbitrary subset A of U. The probability P[A] of A is defined as the sum of the probabilities associated with the elements of A; in the definition (6.2), the sum is treated as an infinite series when A is infinite. This series is guaranteed to converge, and its value does not depend on the particular enumeration of the elements of A.

Example 6.33. Consider the geometric distribution discussed in Example 6.30, where p is the success probability of each Bernoulli trial, and q := 1 − p. For integer i ≥ 1, consider the event A that the number of trials executed is at least i. Formally, A is the set of all integers greater than or equal to i. Intuitively, P[A] should be $q^{i-1}$, since we perform at least i trials if and only if the first i − 1 trials fail. Just to be sure, we can compute
\[
P[A] = \sum_{u \ge i} P[u] = \sum_{u \ge i} q^{u-1} p = q^{i-1} p \sum_{u \ge 0} q^u = q^{i-1} p \cdot \frac{1}{1-q} = q^{i-1}. \qquad \Box
\]
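As a quick numerical sanity check of this identity (a sketch, not part of the text; the values of p, i, and the truncation bound N are arbitrary choices), we can compare a truncated version of the series against the closed form $q^{i-1}$:

```python
# Numerical check of Example 6.33: for a geometric distribution with
# success probability p, P[X >= i] should equal q^(i-1), where q = 1 - p.
p = 0.3
q = 1.0 - p
i = 5

# Truncate the infinite series sum_{u >= i} q^(u-1) * p at N terms.
N = 10_000
series = sum(q ** (u - 1) * p for u in range(i, i + N))

closed_form = q ** (i - 1)
print(series, closed_form)  # the two values agree to high precision
```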

It is an easy matter to check that all the statements made in §6.1 carry over verbatim to the case of countably infinite sample spaces. Moreover, it also makes sense in the countably infinite case to consider events that are a union or intersection of a countably infinite number of events:

Theorem 6.23. Let $A_1, A_2, \ldots$ be an infinite sequence of events.

(i) If $A_i \subseteq A_{i+1}$ for all $i \ge 1$, then $P[\bigcup_{i \ge 1} A_i] = \lim_{i \to \infty} P[A_i]$.

(ii) In general, we have $P[\bigcup_{i \ge 1} A_i] \le \sum_{i \ge 1} P[A_i]$.

(iii) If the $A_i$ are pairwise disjoint, then $P[\bigcup_{i \ge 1} A_i] = \sum_{i \ge 1} P[A_i]$.

(iv) If $A_i \supseteq A_{i+1}$ for all $i \ge 1$, then $P[\bigcap_{i \ge 1} A_i] = \lim_{i \to \infty} P[A_i]$.

Proof. For (i), let $A := \bigcup_{i \ge 1} A_i$, and let $a_1, a_2, \ldots$ be an enumeration of the elements of A. For any $\varepsilon > 0$, there exists a value $k_0$ such that $\sum_{i=1}^{k_0} P[a_i] > P[A] - \varepsilon$. Also, there is some $k_1$ such that $\{a_1, \ldots, a_{k_0}\} \subseteq A_{k_1}$. Therefore, for any $k \ge k_1$, we have $P[A] - \varepsilon < P[A_k] \le P[A]$.

(ii) and (iii) follow by applying (i) to the sequence $\{\bigcup_{j=1}^{i} A_j\}_i$, and making use of (6.5) and (6.6), respectively.

(iv) follows by applying (i) to the sequence $\{\overline{A_i}\}$, using (the infinite version of) DeMorgan's law. $\Box$
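Part (i) of the theorem can be illustrated numerically (a sketch; the geometric distribution and the particular indices below are illustrative choices, not from the text): taking $A_i := \{1, \ldots, i\}$ under a geometric distribution gives a nested sequence of events whose probabilities increase to $P[\bigcup_{i \ge 1} A_i] = 1$.

```python
# Illustration of Theorem 6.23(i): for a geometric distribution with
# success probability p, let A_i = {1, ..., i}. These events are nested
# (A_i is a subset of A_{i+1}), their union is the whole sample space,
# and P[A_i] = 1 - q^i converges to P[union] = 1 as i grows.
p = 0.5
q = 1.0 - p

def prob_A(i):
    # P[A_i] = sum_{u=1}^{i} q^(u-1) * p
    return sum(q ** (u - 1) * p for u in range(1, i + 1))

probs = [prob_A(i) for i in (1, 2, 5, 10, 50)]
print(probs)  # strictly increasing toward 1
```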

6.10.2 Conditional probability and independence

All of the definitions and results in §6.2 carry over verbatim to the countably infinite case. Equation (6.7), as well as Bayes' theorem (equation (6.8)) and equation (6.9), extend mutatis mutandis to the case of an infinite partition $B_1, B_2, \ldots$.

6.10.3 Random variables

All of the definitions and results in §6.3 carry over verbatim to the countably infinite case (except Theorem 6.2, which of course only makes sense in the finite setting).


6.10.4 Expectation and variance

We define the expected value of a real random variable X exactly as before:
\[
E[X] := \sum_{u \in U} X(u) \cdot P[u],
\]

where, of course, the sum is an infinite series. However, if X may take negative values, then we require that the series converges absolutely; that is, we require that $\sum_{u \in U} |X(u)| \cdot P[u] < \infty$ (see §A4). Otherwise, we say the expected value of X does not exist. Recall from calculus that a series that converges absolutely will itself converge, and will converge to the same value under a re-ordering of terms. Thus, if the expectation exists at all, its value is independent of the ordering on U. For a non-negative random variable X, if its expectation does not exist, one may express this as "E[X] = ∞."

All of the results in §6.4 carry over essentially unchanged, except that one must pay some attention to "convergence issues."

Equations (6.13) and (6.14) hold, but with the following caveats (verify):

• If X is a real random variable with image $\mathcal{X}$, then its expected value E[X] exists if and only if the series $\sum_{x \in \mathcal{X}} x\,P[X = x]$ converges absolutely, in which case E[X] is equal to the value of the latter series.

• If X is a random variable with image $\mathcal{X}$ and f a real-valued function on $\mathcal{X}$, then E[f(X)] exists if and only if the series $\sum_{x \in \mathcal{X}} f(x)\,P[X = x]$ converges absolutely, in which case E[f(X)] is equal to the value of the latter series.
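To make the first bullet concrete (a sketch; the distribution below, with X taking value n with probability $1/2^n$, is an invented illustrative example, not one from the text), one can compute E[X] by summing $x\,P[X = x]$ over a truncation of the image:

```python
# Computing E[X] as a sum over the image of X, in the spirit of (6.13):
# here X takes value n with probability 1/2^n for n = 1, 2, ...
# (these probabilities sum to 1), and E[X] = sum n/2^n = 2.
N = 200  # truncation point; the remaining terms are negligibly small
total_prob = sum(0.5 ** n for n in range(1, N + 1))
expectation = sum(n * 0.5 ** n for n in range(1, N + 1))
print(total_prob, expectation)  # close to 1 and 2, respectively
```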

Example 6.34. Let X be a random variable whose distribution is as in Example 6.31. Since the series $\sum 1/n^2$ converges and the series $\sum 1/n$ diverges, the expectation E[X] exists, while E[$X^2$] does not. $\Box$

Theorems 6.6 and 6.7 hold under the additional hypothesis that E[X] and

E[Y ] exist.

If $X_1, X_2, \ldots$ is an infinite sequence of real random variables, then the random variable $X := \sum_{i=1}^{\infty} X_i$ is well defined provided the series $\sum_{i=1}^{\infty} X_i(u)$ converges for all $u \in U$. One might hope that $E[X] = \sum_{i=1}^{\infty} E[X_i]$; however, this is not in general true, even if the individual expectations $E[X_i]$ are non-negative, and even if the series defining X converges absolutely for all u; nevertheless, it is true when the $X_i$ are non-negative:

Theorem 6.24. Let $X := \sum_{i \ge 1} X_i$, where each $X_i$ takes non-negative values only. Then we have
\[
E[X] = \sum_{i \ge 1} E[X_i].
\]


Proof. We have
\[
\sum_{i \ge 1} E[X_i] = \sum_{i \ge 1} \sum_{u \in U} X_i(u) P[u] = \sum_{u \in U} \sum_{i \ge 1} X_i(u) P[u] = \sum_{u \in U} P[u] \sum_{i \ge 1} X_i(u) = E[X],
\]
where we use the fact that we may reverse the order of summation in an infinite double summation of non-negative terms (see §A5). $\Box$
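A small numerical illustration of the theorem (a sketch; the sample space, the probabilities, and the variables $X_i(u) = u/2^i$ are invented for illustration): summing the expectations and taking the expectation of the pointwise sum give the same answer, precisely because the non-negative double sum can be reordered.

```python
# Illustration of Theorem 6.24 on a small discrete sample space
# U = {0, 1, 2, 3} with probabilities P[u], using non-negative
# random variables X_i(u) = u / 2^i (truncated to i = 1..N).
P = [0.1, 0.2, 0.3, 0.4]
N = 50

def X_i(i, u):
    return u / 2 ** i

# Left-hand side: sum over i of E[X_i].
lhs = sum(sum(X_i(i, u) * P[u] for u in range(4)) for i in range(1, N + 1))

# Right-hand side: E[X], where X(u) = sum over i of X_i(u).
X = [sum(X_i(i, u) for i in range(1, N + 1)) for u in range(4)]
rhs = sum(X[u] * P[u] for u in range(4))

print(lhs, rhs)  # equal: the two orders of summation agree
```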

Using this theorem, one can prove the analog of Theorem 6.8 for countably infinite sample spaces, using exactly the same argument.

Theorem 6.25. If X is a random variable that takes non-negative integer values, then
\[
E[X] = \sum_{i=1}^{\infty} P[X \ge i].
\]

A nice picture to keep in mind with regard to Theorem 6.25 is the following. Let $p_i := P[X = i]$ for $i = 0, 1, \ldots$, and let us arrange the probabilities $p_i$ in a table as follows:

    p_1
    p_2  p_2
    p_3  p_3  p_3
    ...  ...  ...  ...

Summing the ith row of this table, we get $i \cdot P[X = i]$, and so E[X] is equal to the sum of all the entries in the table. However, we may compute the same sum column by column, and the sum of the entries in the ith column is $P[X \ge i]$.
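The row-versus-column picture is easy to check numerically (a sketch; the particular distribution on {0, ..., 6} below is an arbitrary illustrative choice):

```python
# Checking the picture behind Theorem 6.25 on a small distribution:
# summing i * P[X = i] over the rows equals summing P[X >= i] over
# the columns. Here P[X = i] is an arbitrary distribution on {0,...,6}.
p = [0.05, 0.25, 0.20, 0.15, 0.15, 0.10, 0.10]

# Row sums: E[X] = sum_i i * P[X = i].
row_total = sum(i * p[i] for i in range(len(p)))

# Column sums: E[X] = sum_{i >= 1} P[X >= i].
col_total = sum(sum(p[i:]) for i in range(1, len(p)))

print(row_total, col_total)  # the two totals agree
```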

Example 6.35. Suppose X is a random variable with a geometric distribution, as in Example 6.30, with an associated success probability p and failure probability q := 1 − p. As we saw in Example 6.33, for all integers $i \ge 1$, we have $P[X \ge i] = q^{i-1}$. We may therefore apply Theorem 6.25 to easily compute the expected value of X:
\[
E[X] = \sum_{i=1}^{\infty} P[X \ge i] = \sum_{i=1}^{\infty} q^{i-1} = \frac{1}{1-q} = \frac{1}{p}. \qquad \Box
\]
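We can confirm this value both ways (a sketch; the value of p and the truncation bound N are arbitrary choices): the truncated tail sum from Theorem 6.25 and the direct sum $\sum_i i \cdot q^{i-1} p$ should both approach 1/p.

```python
# Verifying E[X] = 1/p for the geometric distribution two ways:
# via the tail-sum formula of Theorem 6.25, and via the direct definition.
p = 0.25
q = 1.0 - p
N = 10_000  # truncation point for the infinite series

tail_sum = sum(q ** (i - 1) for i in range(1, N + 1))        # sum of P[X >= i]
direct = sum(i * q ** (i - 1) * p for i in range(1, N + 1))  # sum of i * P[X = i]

print(tail_sum, direct, 1 / p)  # all three agree
```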