common part of the two events, A ∩ B, is included in both A and B and thus P(A ∩ B) is included in the calculation of P(A) + P(B) twice.

2.2.5 Conditional Probability. Consider a weather event A (such as the occurrence of severe convective activity) and suppose that the climatological probability of this event is P(A). Now consider a 24-hour weather forecast that describes an event B within the daily weather sample space. If the forecast is skilful, our perception of the likelihood of A will change. That is, the probability of A conditional upon forecast B, which is written P(A|B), will not be the same as the climatological probability P(A).

The conditional probability of event A, given an event B for which P(B) ≠ 0, is

P(A|B) = P(A ∩ B)/P(B).   (2.2)

The interpretation is that only the part of A that is contained within B can take place, and thus the probability that this restricted version of A takes place must be scaled by P(B) to account for the change of context. Note that all conditional probabilities range between 0 and 1, just as ordinary probabilities do. In particular, P(S|B) = P(B|B) = 1.

'heads.' Such functions are referred to as random variables. We will usually use a bold face upper case character, such as X, to denote the function and a bold face lower case variable x to denote a particular value taken by X. This value is also often referred to as a realization of X.

Random variables are variable because their values depend upon which event in S takes place when the experiment is conducted. They are random because the outcome in S, and hence the value of the function, can not be predicted in advance.

Random variables are discrete if the collection of values they take is enumerable, and continuous otherwise. Discrete random variables will be discussed in this section and continuous random variables in Section 2.6.

The probability of observing any particular value x of a discrete random variable X is determined by characterizing the event {X = x} and then calculating P(X = x). Thus, its randomness depends upon both P(·) and how X is defined on S.

2.3.2 Probability and Distribution Functions. In general, it is cumbersome to use the sample space S and the probability rule P(·) to describe the random, or stochastic, characteristics of a random variable X. Instead, the stochastic


properties of X are characterized by the probability function fX and the distribution function FX.

The probability function fX of a discrete random variable X associates probabilities with values taken by X. That is

fX(x) = P(X = x).

Two properties of the probability function are:

• 0 ≤ fX(x) ≤ 1 for all x, and

• Σ_x fX(x) = 1, where the notation Σ_x indicates that the summation is taken over all possible values of X.

The distribution function FX of a discrete random variable X is defined as

FX(x) = Σ_{y≤x} fX(y).

Some properties of the distribution function are:

• FX(x) ≤ FX(y) if x ≤ y,

• lim_{x→−∞} FX(x) = 0, and

• lim_{x→+∞} FX(x) = 1.

The phrase probability distribution is often used to refer to either of these functions because the probability function can be derived from the distribution function and vice versa.

2.3.3 The Expectation Operator. A random variable X and its probability function fX together constitute a model for the operation of an experiment: every time it is conducted we obtain a realization x of X with probability fX(x). A natural question is to ask what the average value of X will be in repeated operation of the experiment. For the coin tossing experiment, with X being the number of 'heads', the answer is 0 × 1/8 + 1 × 3/8 + 2 × 3/8 + 3 × 1/8 = 3/2 because we expect to observe X = 0 (no 'heads' in three tosses of the coin) 1/8 of the time, X = 1 (one 'head' and two 'tails') 3/8 of the time, and so on. Thus, in this example, the expected value of X is 1.5.

In general, the expected value of X is given by

E(X) = Σ_x x fX(x).

The expected value of a random variable is also sometimes called its first moment, a term that has its roots in elementary physics. Think of a collection of particles distributed so that the mass of the particles at location x is fX(x). Then the expected value E(X) is the location of the centre of mass of the collection of particles.

The idea of expectation is easily extended to functions of random variables. Let g(·) be any function and let X be a random variable. The expected value of g(X) is given by

E(g(X)) = Σ_x g(x) fX(x).

The interpretation of the expected value as the average value of g(X) remains the same.

We often use the phrase expectation operator to refer to the act of computing an expectation because we operate on a random variable (or a function of a random variable) with its probability function to derive one of its properties.

A very useful property of the expectation operator E is that the expectation of a sum is a sum of expectations. That is, if g1(·) and g2(·) are both functions defined on the random variable X, then

E(g1(X) + g2(X)) = E(g1(X)) + E(g2(X)).   (2.4)

Another useful property is that if g(·) is a function of X and a and b are constants, then

E(a g(X) + b) = a E(g(X)) + b.   (2.5)

As a special case, note that the expectation of a constant, say b, is that constant itself. This is, of course, quite reasonable. A constant can be viewed as an example of a degenerate random variable. It has the same value b after every repetition of an experiment. Thus, its average value in repeated sampling must also be b.

A special class of functions of a random variable is the collection of powers of the random variable. The expectation of the kth power of a random variable is known as the kth moment of X. Probability distributions can often be identified by their moments. Therefore, the determination of the moments of a random variable sometimes proves useful when deriving the distribution of a random variable that is a function of other random variables.

2.3.4 The Mean and Variance. In the preceding subsection we defined the expected value E(X) of the random variable X as the mean of X itself. Frequently the symbol µ (µX when clarity is required) is used to represent the mean. The phrase population mean is often used to denote the expected value of a random variable; the sample mean is the mean of a sample of realizations of a random variable.
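The coin-tossing experiment that runs through this section can be coded directly as a check on these definitions. The sketch below (plain Python with exact fractions; the particular events A and B chosen to illustrate the conditional probability (2.2) are our own, not from the text) enumerates the sample space S of three tosses, builds fX and FX for X = number of 'heads', verifies the two properties of the probability function, and reproduces E(X) = 3/2.

```python
from itertools import product
from fractions import Fraction

# Sample space S for three tosses of an honest coin: 8 equally likely outcomes.
S = list(product("HT", repeat=3))
P = {w: Fraction(1, 8) for w in S}

# The random variable X maps each outcome to the number of heads.
def X(w):
    return w.count("H")

# Probability function f_X(x) = P(X = x), built by characterizing {X = x}.
values = sorted({X(w) for w in S})
f = {x: sum(P[w] for w in S if X(w) == x) for x in values}

# Distribution function F_X(x) = sum of f_X(y) over y <= x.
F = {x: sum(f[y] for y in values if y <= x) for x in values}

# Check the two properties of the probability function.
assert all(0 <= f[x] <= 1 for x in values)
assert sum(f.values()) == 1

# Expected value E(X) = sum over x of x * f_X(x).
EX = sum(x * f[x] for x in values)
print(EX)  # prints 3/2

# Conditional probability (2.2), with illustrative events of our own choosing:
# A = "exactly two heads", B = "first toss is heads".
A = {w for w in S if X(w) == 2}
B = {w for w in S if w[0] == "H"}
P_B = sum(P[w] for w in B)
P_A_given_B = sum(P[w] for w in A & B) / P_B
print(P_A_given_B)  # prints 1/2
```

Using `Fraction` rather than floating point keeps every probability exact, so the checks against the values quoted in the text hold with equality rather than within rounding error.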


Another important part of the characterization of a random variable is dispersion. Random variables with little dispersion have realizations tightly clustered about the mean, and vice versa. There are many ways to describe dispersion, but it is usually characterized by variance.

The population variance (or simply the variance) of a discrete random variable X with probability distribution fX is given by

Var(X) = E((X − µX)²) = Σ_x (x − µX)² fX(x).

The variance is often denoted by σ² or σX². The square root of the variance, denoted as σX, is known as the standard deviation.

In the coin tossing example above, in which X is the number of 'heads' in three tosses with an honest coin, the variance is given by

σ² = (0 − 3/2)² × 1/8 + (3 − 3/2)² × 1/8 + (1 − 3/2)² × 3/8 + (2 − 3/2)² × 3/8 = 3/4.

a sample space. Such related random variables are conveniently organized into a random vector, defined as follows:

A random vector X is a vector of scalar random variables that are the result of the same experiment.

All elements of a random vector are defined on the same sample space S. They do not necessarily all have the same probability distribution, because their distributions depend not only on the generating experiment but also on the way in which the variables are defined on S.

We will see in Section 2.8 that random vectors also have properties analogous to the probability function, mean, and variance.

The terms univariate and multivariate are often used in the statistical literature to distinguish between problems that involve a random variable and those that involve a random vector. In the context of climatology or meteorology, univariate means a single variable at a single location. Anything else, such as a single variable at multiple locations, or more than one variable at more than one location, is multivariate to the statistician.
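Both the variance calculation and the random-vector idea can be illustrated in a few lines. In this sketch of our own, X1 is the number of heads from the text's example, while X2 (a hypothetical second variable, not from the text) indicates whether the first toss is heads; (X1, X2) behaves like a small random vector: both components are defined on the same S, yet they have different means and variances.

```python
from itertools import product
from fractions import Fraction

# Outcomes of three tosses of an honest coin, each with probability 1/8.
S = list(product("HT", repeat=3))
p = Fraction(1, 8)

# Two random variables defined on the same sample space S.
def X1(w):
    return w.count("H")        # number of heads (the text's example)

def X2(w):
    return 1 if w[0] == "H" else 0   # hypothetical: indicator of heads on toss 1

def mean(X):
    # E(X) = sum over outcomes of X(w) * P({w}).
    return sum(X(w) * p for w in S)

def var(X):
    # Var(X) = E((X - mu)^2), computed outcome by outcome.
    mu = mean(X)
    return sum((X(w) - mu) ** 2 * p for w in S)

print(mean(X1), var(X1))  # prints 3/2 3/4
print(mean(X2), var(X2))  # prints 1/2 1/4
```

The first line reproduces the values derived in the text (µ = 3/2, σ² = 3/4); the second shows that a differently defined variable on the same experiment has its own distribution, which is exactly why the elements of a random vector need not be identically distributed.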