well defined, and satisfies 0 < D < ∞.

5.6 Notes

The prime number theorem was conjectured by Gauss in 1791. It was proven independently in 1896 by Hadamard and de la Vallée Poussin. A proof of the prime number theorem may be found, for example, in the book by Hardy and Wright [44].

Theorem 5.21, as well as the estimates for the constants A, B1, and B2 mentioned in that theorem and Exercises 5.13, 5.14, and 5.15, are from Rosser and Schoenfeld [79].

Theorem 5.17 is from Walfisz [96].

Theorem 5.19, which made the first connection between the theory of prime numbers and the zeta function, was discovered in the 18th century by Euler. The Riemann hypothesis was made by Riemann in 1859, and to this day, remains one of the most vexing conjectures in mathematics. Riemann in fact showed that his conjecture about the zeros of the zeta function is equivalent to the conjecture that for each fixed ε > 0, π(x) = li(x) + O(x^(1/2+ε)). This was strengthened by von Koch in 1901, who showed


that the Riemann hypothesis is true if and only if π(x) = li(x) + O(x^(1/2) log x).

See Chapter 1 of the book by Crandall and Pomerance [30] for more on

the connection between the Riemann hypothesis and the theory of prime

numbers; in particular, see Exercise 1.36 in that book for an outline of a

proof that Conjecture 5.18 follows from the Riemann hypothesis.

A warning: some authors (and software packages) define the logarithmic integral using the interval of integration (0, x), rather than (2, x), which increases its value by a constant c ≈ 1.0452.
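One can check this constant numerically: the difference between the two definitions is the principal-value integral of 1/ln t over (0, 2), which by the classical series li(x) = γ + ln ln x + ∑_{k≥1} (ln x)^k/(k · k!) (a standard identity, not from the text) can be evaluated at x = 2. A quick sketch:

```python
import math

def li0(x, terms=60):
    # Principal-value logarithmic integral over (0, x), computed via the
    # classical series li(x) = gamma + ln ln x + sum_{k>=1} (ln x)^k / (k * k!),
    # valid for x > 1.
    gamma = 0.5772156649015329  # Euler-Mascheroni constant
    lx = math.log(x)
    s = gamma + math.log(lx)
    term = 1.0
    for k in range(1, terms + 1):
        term *= lx / k          # term is now (ln x)^k / k!
        s += term / k           # add (ln x)^k / (k * k!)
    return s

# The constant by which the two definitions of li differ:
print(round(li0(2), 4))  # 1.0452
```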

Theorem 5.22 was proved by Dirichlet in 1837, while Theorem 5.23 was proved by de la Vallée Poussin in 1896. A result of Oesterlé [69] implies that Conjecture 5.24 for d ≥ 3 is a consequence of an assumption about the location of the zeros of certain generalizations of Riemann's zeta function; the case d = 2 follows from the bound in Conjecture 5.18 under the ordinary Riemann hypothesis. Theorem 5.25 is from Heath-Brown [45].

Hypothesis H is from Hardy and Littlewood [43].

For the reader who is interested in learning more on the topics discussed in this chapter, we recommend the books by Apostol [8] and Hardy and Wright [44]; indeed, many of the proofs presented in this chapter are minor variations on proofs from these two books. Our proof of Bertrand's postulate is based on the presentation in Section 9.2 of Redmond [76]. See also Bach and Shallit [12] (especially Chapter 8) and Crandall and Pomerance [30] (especially Chapter 1) for a more detailed overview of these topics.

The data in Tables 5.1 and 5.2 was obtained using the computer program

Maple.

6

Finite and discrete probability distributions

This chapter introduces concepts from discrete probability theory. We begin with a discussion of finite probability distributions, and then towards the end of the chapter we discuss the more general notion of a discrete probability distribution.

6.1 Finite probability distributions: basic definitions

A finite probability distribution D = (U, P) is a finite, non-empty set U, together with a function P that maps u ∈ U to P[u] ∈ [0, 1], such that

∑_{u ∈ U} P[u] = 1.    (6.1)

The set U is called the sample space and the function P is called the

probability function.

Intuitively, the elements of U represent the possible outcomes of a random

experiment, where the probability of outcome u ∈ U is P[u].

Up until §6.10, we shall use the phrase “probability distribution” to mean “finite probability distribution.”

Example 6.1. If we think of rolling a fair die, then U := {1, 2, 3, 4, 5, 6}, and P[u] := 1/6 for all u ∈ U gives a probability distribution describing the possible outcomes of the experiment. □

Example 6.2. More generally, if U is a finite set, and P[u] = 1/|U| for all u ∈ U, then D is called the uniform distribution on U. □
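A minimal sketch in Python (the representation and names are my own, not from the text): a finite distribution can be stored as a dictionary mapping outcomes to probabilities, with the uniform distribution on a fair die as in Examples 6.1 and 6.2.

```python
from fractions import Fraction

def uniform(U):
    """Uniform distribution on a finite, non-empty set U: P[u] = 1/|U|."""
    n = len(U)
    return {u: Fraction(1, n) for u in U}

die = uniform({1, 2, 3, 4, 5, 6})
# Property (6.1): the probabilities sum to 1.
assert sum(die.values()) == 1
print(die[3])  # 1/6
```

Exact rationals (`Fraction`) are used so that (6.1) can be checked with equality rather than floating-point tolerance.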

Example 6.3. A coin flip is an example of a Bernoulli trial, which is in general an experiment with only two possible outcomes: success, which occurs with probability p, and failure, which occurs with probability q := 1 − p. □



An event is a subset A of U, and the probability of A is defined to be

P[A] := ∑_{u ∈ A} P[u].    (6.2)

Thus, we extend the domain of definition of P from outcomes u ∈ U to events A ⊆ U.

For an event A ⊆ U, let Ā denote the complement of A in U. We have

P[∅] = 0, P[U] = 1, P[Ā] = 1 − P[A].

For any events A, B ⊆ U, if A ⊆ B, then P[A] ≤ P[B]. Also, for any events A, B ⊆ U, we have

P[A ∪ B] = P[A] + P[B] − P[A ∩ B] ≤ P[A] + P[B]; (6.3)

in particular, if A and B are disjoint, then

P[A ∪ B] = P[A] + P[B]. (6.4)

More generally, for any events A1, . . . , An ⊆ U we have

P[A1 ∪ · · · ∪ An] ≤ P[A1] + · · · + P[An], (6.5)

and if the Ai are pairwise disjoint, then

P[A1 ∪ · · · ∪ An] = P[A1] + · · · + P[An]. (6.6)

In working with events, one makes frequent use of the usual rules of Boolean logic. DeMorgan's law says that for events A and B, the complement of A ∪ B is Ā ∩ B̄, and the complement of A ∩ B is Ā ∪ B̄. We also have the distributive law: for events A, B, C, we have

A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) and A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C).

In some applications and examples, it is more natural to use the logical “or” connective “∨” in place of “∪,” and the logical “and” connective “∧” in place of “∩.”
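These set identities are easy to check mechanically with Python's built-in set operations (a sketch; the sample space U and the particular events are my own choices):

```python
U = set(range(1, 7))            # sample space: faces of a die
A, B, C = {1, 3, 5}, {1, 2}, {4, 5}

def comp(X):
    """Complement of an event X within the sample space U."""
    return U - X

# DeMorgan's laws
assert comp(A | B) == comp(A) & comp(B)
assert comp(A & B) == comp(A) | comp(B)

# Distributive laws
assert A & (B | C) == (A & B) | (A & C)
assert A | (B & C) == (A | B) & (A | C)
```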

Example 6.4. Continuing with Example 6.1, the probability of an “odd roll” A = {1, 3, 5} is 1/2. □

Example 6.5. More generally, if D is the uniform distribution on a set U of cardinality n, and A is a subset of U of cardinality k, then P[A] = k/n. □
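The event probability (6.2) is just a sum of outcome probabilities over the event. A small illustration (my own code, using the fair-die distribution of Example 6.1):

```python
from fractions import Fraction

die = {u: Fraction(1, 6) for u in range(1, 7)}

def prob(P, A):
    """P[A] := sum of P[u] over u in A, as in (6.2)."""
    return sum(P[u] for u in A)

odd = {1, 3, 5}
print(prob(die, odd))  # 1/2, as in Example 6.4

# Complement rule: P[A] + P[complement of A] = 1.
assert prob(die, odd) + prob(die, set(range(1, 7)) - odd) == 1
```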

Example 6.6. Alice rolls two dice, and asks Bob to guess a value that

appears on either of the two dice (without looking). Let us model this


situation by considering the uniform distribution on {(x, y) : x, y = 1, . . . , 6}, where x represents the value of the first die, and y the value of the second. For x = 1, . . . , 6, let Ax be the event that the first die is x, and Bx the event that the second die is x. Let Cx = Ax ∪ Bx be the event that x appears on either of the two dice. No matter what value x Bob chooses, the probability that this choice is correct is

P[Cx] = P[Ax ∪ Bx] = P[Ax] + P[Bx] − P[Ax ∩ Bx] = 1/6 + 1/6 − 1/36 = 11/36. □
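The 11/36 computation can be cross-checked by brute force over the product sample space (a sketch; the variable names are mine):

```python
from fractions import Fraction

# Uniform distribution on pairs (x, y) with x, y = 1..6.
space = [(x, y) for x in range(1, 7) for y in range(1, 7)]
p = Fraction(1, len(space))            # each pair has probability 1/36

guess = 3  # Bob's guess; by symmetry any value gives the same probability
hit = sum(p for (x, y) in space if x == guess or y == guess)
print(hit)  # 11/36
```

The direct count agrees with inclusion-exclusion: 6 pairs where the first die shows the guess, plus 6 where the second does, minus the 1 pair counted twice.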

If D1 = (U1, P1) and D2 = (U2, P2) are probability distributions, we can form the product distribution D = (U, P), where U := U1 × U2, and P[(u1, u2)] := P1[u1]P2[u2]. It is easy to verify that the product distribution is also a probability distribution. Intuitively, the elements (u1, u2) of U1 × U2 denote the possible outcomes of two separate and independent experiments. More generally, if Di = (Ui, Pi) for i = 1, . . . , n, we can define the product distribution D = (U, P), where U := U1 × · · · × Un, and P[(u1, . . . , un)] := P1[u1] · · · Pn[un].

Example 6.7. We can view the probability distribution in Example 6.6 as the product of two copies of the uniform distribution on {1, . . . , 6}. □
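The product construction is easy to carry out directly (a sketch with my own helper names, using the dictionary representation from earlier):

```python
from fractions import Fraction
from itertools import product

def product_dist(*dists):
    """Product of finite distributions given as dicts:
    P[(u1, ..., un)] = P1[u1] * ... * Pn[un]."""
    result = {}
    for tup in product(*dists):        # iterates over tuples of outcomes
        p = Fraction(1)
        for d, u in zip(dists, tup):
            p *= d[u]
        result[tup] = p
    return result

die = {u: Fraction(1, 6) for u in range(1, 7)}
two = product_dist(die, die)           # the two-dice distribution of Example 6.6
assert sum(two.values()) == 1          # still a probability distribution
print(two[(2, 5)])  # 1/36
```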

Example 6.8. Consider the product distribution of n copies of a Bernoulli trial (see Example 6.3), with associated success probability p and failure probability q := 1 − p. An element of the sample space is an n-tuple of success/failure values. Any such tuple that contains, say, k successes and n − k failures, occurs with probability p^k q^(n−k), regardless of the particular positions of the successes and failures. □
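A quick check of this claim (the code and names are my own; success is encoded as 1, failure as 0, and the particular p is an arbitrary choice):

```python
from fractions import Fraction
from itertools import product

p = Fraction(1, 3)          # success probability (arbitrary choice)
q = 1 - p                   # failure probability
n = 4

def tuple_prob(t):
    """Probability of a particular success/failure n-tuple under the
    product of n Bernoulli trials."""
    k = sum(t)              # number of successes in the tuple
    return p**k * q**(len(t) - k)

# Tuples with the same number of successes have the same probability,
# regardless of position:
assert tuple_prob((1, 1, 0, 0)) == tuple_prob((0, 1, 0, 1)) == p**2 * q**2

# The whole product distribution sums to 1, since (p + q)^n = 1:
assert sum(tuple_prob(t) for t in product((0, 1), repeat=n)) == 1
```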