
analysts. The logician knows his work on computational complexity. To the applied mathematician he is the man who made the single most important contribution to the impossibly difficult problem of understanding turbulent fluid flow. With Arnold and Moser he revolutionised the theory of dynamical systems. With Sinai he added a new invariant to ergodic theory. With Smirnov he gave statisticians a new tool for non-parametric testing. And so on, and so on.

But there was a unifying thread to all his work, which was his fascination with the mathematics of probability. That was the subject of his early work, the title of his professorial chair, and it remained a major interest throughout his long mathematical life. Moreover, many of his successful forays into other parts of the subject were in effect applications of his deep understanding of stochastic phenomena. I hope therefore that I am not being unduly partisan if, as a probabilist myself, I concentrate here on the impact of Kolmogorov on modern probability theory.

Probability calculations go back for centuries before Kolmogorov. They began to assume modern form with the work of Fermat and Pascal, the normal distribution of de Moivre and Laplace gave Gauss the tool for his theory of errors, and later the Russian school of Chebyshev, Bernstein and Markov showed how complex a calculus could be developed.

But these calculations were distrusted by many mathematicians well into the twentieth century. When I was a student in Cambridge in the 1950s I was discouraged from studying probability because it was neither proper pure mathematics nor the application of mathematics to well-defined physical phenomena. It dealt with ‘random variables’, which were entities subject to random variation about which all one could say was that probability statements could be made about them. What these random variables were, or what the probability statements meant, was left obscure. The basic tools were an ‘addition law of probability’ and a ‘multiplication law of probability’, which had a formal similarity but one of which was an axiom and the other a circular definition.

Had we but known, the answer to these difficulties was contained in a little book, little in size but gigantic in importance, that had been published in German in 1933 as Grundbegriffe der Wahrscheinlichkeitsrechnung, but not until 1950 in English translation. Seventy years later, this book remains a remarkably modern account of the theory of probability; with a few changes of notation it could be used as the basis of an excellent lecture course today.

Mathematicians tend to be bored by foundations, regarding them as best left to a few eccentric pedants while the real mathematics is done according to accepted canons which the logicians may find inadequate. Probability is somewhat different, and experience shows that the subject needs to go back regularly to its basis, and its basis is that laid down by Kolmogorov in 1933, neither more nor less. If you think about it, that is a very remarkable statement to make about a 30-year-old, and I want to illustrate it in relation to the development of probability theory over the 70 years since the book appeared.

As everyone knows, the fundamental concept is that of a probability space. This is an abstract set whose elements (denoted by small Greek letters) are thought of as the outcomes of some random phenomenon, but of much greater importance is a distinguished class of subsets (denoted by italic capitals) which are the events about which probability statements can be made. Every event has associated with it a number between 0 and 1 which is its probability, and it is assumed that the class of events is closed under countable set-theoretic operations (that it is a Borel field), and that probability is a countably additive function from the Borel field of events to the unit interval (a probability measure).
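The definition just described can be written out in the notation that has since become standard (the symbols below are the modern convention, not the essay's own):

```latex
% A probability space is a triple (\Omega, \mathcal{F}, \mathbb{P}):
%   \Omega      -- the abstract set of outcomes \omega
%   \mathcal{F} -- a Borel field (\sigma-field) of subsets of \Omega, the events
%   \mathbb{P}  -- a probability measure on \mathcal{F}
\mathbb{P} : \mathcal{F} \to [0,1], \qquad \mathbb{P}(\Omega) = 1, \qquad
\mathbb{P}\Bigl(\bigcup_{n=1}^{\infty} A_n\Bigr)
  = \sum_{n=1}^{\infty} \mathbb{P}(A_n)
\quad \text{for pairwise disjoint } A_1, A_2, \ldots \in \mathcal{F}.
```

Countable additivity, the last condition, is precisely what ties the definition to the Borel–Lebesgue theory of measure discussed next.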

He saw that this formulation made a probability space an abstraction of the Borel–Lebesgue theory of measure and integration that had already transformed the integral calculus of Newton and Riemann. Random variables could now be defined as measurable functions from the probability space to the real line (or indeed to more complicated spaces); they induced probability measures as distributions and joint distributions, and thus validated the calculations of probabilists from Fermat to Markov.

A powerful extension theorem allowed the construction of probability spaces whose elements are functions of a (time or space) parameter, and this is the foundation of the whole theory of random processes. The modern definition of conditional probability, based on the Radon–Nikodym theorem, is there in almost full generality. Independence is properly presented as a special case, and a good version of the strong law of large numbers is proved, as is a general zero-one law.

One of the most perceptive insights in the book is the importance of the Borel field of events. In the Borel–Lebesgue theory of measure on Euclidean space, the notion of a measurable set is a way of excluding pathological subsets. Every set that the analyst encounters is, if not Borel measurable, at least Lebesgue measurable, and indeed it is possible to do most real analysis in an axiom system in which every set is Lebesgue measurable.

Kolmogorov could have followed the same path, but he realised that a much more useful approach was to regard the Borel field as describing the information available in a particular context. For instance, in a random process evolving in time, we have at any time t a Borel field depending on the value of t. The events in this field are those of which we can say with confidence at time t whether or not they have occurred. As t increases, the field gets bigger and bigger, and we have the concept of a filtration, an increasing family of Borel fields, which is basic to the modern theory of temporal random processes. Combined with his definition of conditional probability, it facilitated in the hands of Doob and others the theory of martingales and stopping times that is essential to the development of Markov processes.

Markov processes are only referred to in passing in the Grundbegriffe, but they were central to Kolmogorov’s later work in probability. Markov had studied them as a simple generalisation of independent sequences of random events, in which the probability of one event could depend on its predecessor but not further back. He had set up the fundamental equation, a special case of what we now call the Chapman–Kolmogorov equation, and had seen that a matrix formulation of this equation permitted useful calculations and limiting results. (The mysterious name Chapman is that of the distinguished geophysicist Sydney Chapman, who derived a very special case of the equation in a particular physical problem.)
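The fundamental equation can be stated in modern notation (the transition probabilities below follow the standard convention rather than anything quoted from the essay): the chance of passing from state i to state j in time s + t decomposes over the intermediate state k,

```latex
p_{ij}(s+t) \;=\; \sum_{k} p_{ik}(s)\, p_{kj}(t),
\qquad \text{or in matrix form} \qquad
P(s+t) \;=\; P(s)\,P(t).
```

Markov’s matrix formulation is exactly the second form: the transition matrices compose under multiplication, a semigroup property.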

In fact, the Markov property can now be seen as the exact analogue, for random processes evolving in time, of the property of a well-posed dynamical system. If for instance we model the evolution of the orbits of the planets around the Sun, we need enough dynamical variables that we can write down differential equations giving the rates of change of all the variables as functions of some or all of them. The system of differential equations should then predict the way in which the dynamics will evolve in time, although in practice instability and chaos may limit the value of such predictions.

Likewise, in a system evolving randomly in time, we need to specify the state at time t in enough detail that the probability of events after t, conditional on the past up to t, depends only on the present state. In principle this can always be done (take the state at t to be the conditional probability given the whole past), but useful calculations depend, as in the deterministic case, on being able to summarise this complexity into a reasonably simple format. In other words, we must reduce the problem to a Markov process on a reasonably concrete state space.
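Using the filtration language introduced earlier (the symbols here are a standard convention, assumed for illustration), the requirement can be written compactly:

```latex
\mathbb{P}\bigl(A \mid \mathcal{F}_t\bigr) \;=\; \mathbb{P}\bigl(A \mid X_t\bigr)
```

for every event A determined by the process after time t, where the Borel field on the left records everything observable up to time t and X_t is the present state: the past acts on the future only through the present.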

It is one sign of a great mathematician to choose the right level of abstraction and generality. Too special, and the results are parochial and lack application. Too abstract, and they are shallow and superficial. Kolmogorov saw that the most interesting case was that of the Markov process, in discrete or in continuous time, with a countable infinity of states. He proved the fundamental limit theorem that determines the stationary distribution (when it exists) of the state after a long time, thus anticipating the Erdős–Feller–Pollard theorem on recurrent events. He showed that the process could recur infinitely often to its starting state without having a stationary distribution, or could wander off to infinity (and could have different destinations at infinity, so that the countable state space had an intrinsic compactification).
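The limit theorem is easiest to see in the finite-state case, the simplest instance of Kolmogorov's countable-state result. The sketch below (an invented illustration, not anything from the essay) computes the stationary distribution of a small irreducible, aperiodic chain and checks that the rows of the n-step transition matrix converge to it.

```python
import numpy as np

# An invented 3-state transition matrix: irreducible and aperiodic,
# so the fundamental limit theorem applies. Rows sum to 1.
P = np.array([
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.1, 0.4, 0.5],
])

# The stationary distribution pi satisfies pi P = pi, i.e. pi is a left
# eigenvector of P for the eigenvalue 1, normalised to sum to 1.
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
pi = pi / pi.sum()

# The limit theorem: every row of P^n converges to pi as n grows,
# whatever the starting state.
Pn = np.linalg.matrix_power(P, 50)
print(pi)
print(Pn[0])  # numerically indistinguishable from pi
```

In Kolmogorov's countable-state setting the eigenvector computation is no longer available, which is exactly why the recurrence analysis sketched above in the essay is needed.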

Markov processes in continuous time have the convenient property that their stochastic properties are often determined by their transition rates, so that one can neglect transitions that in time intervals of small length h have probabilities of smaller order than h. Kolmogorov gave conditions for this to be true, but he also produced remarkable examples of processes where this property fails. His work, rigorous and analytic, complemented the highly intuitive description by Paul Lévy of the possible behaviour of these processes. It led in all sorts of fascinating directions: to the Hille–Yosida theory of infinitesimal generators, to the Lévy–Trotter theory of local time, to Chung’s definitive treatment of the analytic properties of Markov transition functions and the subsequent characterisation of Markov transition functions, to the martingale analysis of Markov processes that culminated in the intrinsic compactification of the state space, and so on.

But Kolmogorov’s approach to Markov’s theory also made applied probability possible. When the operational researcher builds models of complex queueing systems, when the biologist analyses the evolution of a genetically diverse population, when the statistician uses Markov chain Monte Carlo techniques of modern Bayesian analysis, they are basing their calculations on a foundation that Kolmogorov made secure.
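A Metropolis sampler, the workhorse of Markov chain Monte Carlo, illustrates that foundation concretely: it is nothing but a Markov chain constructed so that its stationary distribution is a target of interest. The sketch below is a hypothetical toy (the target weights and all names are invented for illustration): a random walk on the integers whose stationary distribution is proportional to exp(-|k|).

```python
import math
import random

def weight(k):
    # Unnormalised target: stationary probability of state k
    # proportional to exp(-|k|).
    return math.exp(-abs(k))

def metropolis(steps, seed=0):
    rng = random.Random(seed)
    state = 0
    counts = {}
    for _ in range(steps):
        # Symmetric proposal: step one unit left or right.
        proposal = state + rng.choice((-1, 1))
        # Metropolis rule: accept with probability min(1, w(y)/w(x)).
        if rng.random() < min(1.0, weight(proposal) / weight(state)):
            state = proposal
        counts[state] = counts.get(state, 0) + 1
    return counts

counts = metropolis(200_000)
# Long-run occupation frequencies approximate the normalised target,
# by exactly the kind of limit theorem Kolmogorov proved.
print(counts[0] / 200_000)
```

The acceptance rule guarantees detailed balance with respect to the target weights, so the limit theorem for recurrent countable-state chains delivers the desired frequencies.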
