in space). Principal Component Analysis, or Empirical Orthogonal Function (EOF) Analysis as it is called in the Earth Sciences, was described by Pearson [309] in 1902 and by Hotelling [186] in 1935. EOF analysis was introduced into meteorology by Lorenz [259] in 1956.

Concepts in linear algebra that are needed to read this chapter (linear bases, matrix properties, eigenanalysis and singular value decomposition) are offered in Appendix B. Empirical Orthogonal Functions are formally defined in Section 13.1. Techniques for estimating EOFs, eigenvalues, and EOF coefficients are explained in Section 13.2. We discuss the quality of estimates in Section 13.3. Several EOF analyses of climate-related problems are given as examples in Section 13.4. Rotated EOFs¹ are dealt with in Section 13.5. Finally, a time series analysis technique called Singular Systems Analysis, which uses the same mathematics as EOF analysis, is introduced in Section 13.6. An alternative introduction to EOFs is given by von Storch [387].

13.0.1 Introductory Example:² Daily Profile of Geopotential Height at Berlin. To motivate the concept of Empirical Orthogonal Functions we consider a time series of daily geopotential height profiles as obtained by radiosonde at Berlin (Germany) (Fraedrich and Dümmel [125]). A total of 1080 observations are available in each winter (NDJF) season: 120 days times 9 vertical levels between 950 hPa and 300 hPa. Thus, in a 20-year data set we have 21 600 observations at our disposal to describe the statistics of the geopotential height at Berlin in winter.

The mean state can be estimated by computing the mean value at each level. But how should we describe the variability? One way would be to [...] a positive anomaly (i.e., a positive deviation from the mean profile) at 300 hPa and at 950 hPa at the same time?

EOF analysis is a technique that is used to identify patterns of simultaneous variation. To demonstrate the concept we let $x_t$ represent the $m = 9$ level geopotential height profile observed at time $t$. The mean profile is denoted by $\mu$ and to describe the variability we form the anomalies

$$x'_t = x_t - \mu.$$

These anomalies are then expanded into a finite series

$$x'_t = \sum_{i=1}^{k} \alpha_{i,t}\, e^i \qquad (13.1)$$

with time coefficients $\alpha_{i,t}$ and fixed patterns $e^i$. Equality is usually only possible when $k = m$, but the variance of the time coefficients $\alpha_{i,t}$ usually decreases quickly with increasing index $i$, so that good approximations are usually possible for $k$ much less than $m$. The patterns are chosen to be orthogonal so that optimal coefficients $\alpha_{i,t}$ are obtained by simply projecting the anomalies $x'_t$ onto the patterns $e^i$. Moreover, the patterns can be specified such that the error

$$\sum_t \Big\| x'_t - \sum_{i=1}^{k} \alpha_{i,t}\, e^i \Big\|^2$$

is minimal. The lag-0 sample cross-correlations of the optimal time coefficients are all zero,

$$\sum_t \alpha_{i,t}\, \alpha_{j,t} = 0$$

for $i \neq j$. The patterns $e^j$ are estimated Empirical Orthogonal Functions.³ The coefficients $\alpha_i$ are the EOF coefficients.⁴
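The expansion (13.1) and the properties listed with it can be checked numerically. The following is a minimal sketch in Python with NumPy. It assumes synthetic data rather than the Berlin radiosonde record: the function name `eof_expansion`, the two vertical structures `mode1` and `mode2`, and all amplitudes are illustrative assumptions, and only the sizes (120 days, 9 levels) are taken from the text. The patterns are obtained from a singular value decomposition of the anomaly matrix, which is one common way to estimate them (estimation is the subject of Section 13.2).

```python
import numpy as np

def eof_expansion(x, k):
    """Return the mean profile, the first k patterns, their time
    coefficients, and the variance fraction carried by every pattern."""
    mu = x.mean(axis=0)              # mean profile, one value per level
    anom = x - mu                    # anomalies x'_t = x_t - mu
    # Rows of vt are orthonormal patterns, ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(anom, full_matrices=False)
    patterns = vt[:k].T              # shape (m, k), one pattern per column
    coeffs = anom @ patterns         # projections onto the patterns, shape (n, k)
    var_frac = s**2 / np.sum(s**2)   # share of total variance per pattern
    return mu, patterns, coeffs, var_frac

rng = np.random.default_rng(0)
n, m = 120, 9                        # days and levels; the values below are synthetic
levels = np.linspace(0.0, 1.0, m)
mode1 = np.ones(m)                   # hypothetical structure with one sign throughout
mode2 = np.cos(np.pi * levels)       # hypothetical structure that changes sign once
x = (5500.0
     + 100.0 * rng.standard_normal((n, 1)) * mode1
     + 30.0 * rng.standard_normal((n, 1)) * mode2
     + 5.0 * rng.standard_normal((n, m)))

mu, e, alpha, var_frac = eof_expansion(x, k=2)
anom = x - mu
approx = alpha @ e.T                 # truncated series (13.1) with k = 2
residual = np.sum((anom - approx) ** 2)
total = np.sum(anom ** 2)

print("patterns orthonormal:", np.allclose(e.T @ e, np.eye(2)))
print("sum_t alpha_1t * alpha_2t:", alpha[:, 0] @ alpha[:, 1])
print("variance fraction of first two patterns:", var_frac[:2].sum())
print("relative truncation error:", residual / total)
```

In this sketch the relative truncation error equals one minus the variance fraction of the retained patterns, and the cross product of the two coefficient series vanishes up to rounding. The sign of each pattern returned by the decomposition is arbitrary; flipping a pattern and its coefficient series together leaves the expansion unchanged.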

¹ A misnomer.

² The mathematics in this subsection are explained in more detail in Sections 13.1 and 13.2.

³ Note that the ‘functions’ $e^k$ are really vectors and not functions.

⁴ Statisticians refer to the EOF coefficients as principal components.


13: Empirical Orthogonal Functions


Figure 13.1: The first two EOFs, labelled z1 and z2, of the daily geopotential height over Berlin in winter. From Fraedrich and Dümmel [125].

The analysis of daily Berlin radiosonde data showed that only two patterns are required to describe most of the variability in the observed geopotential height profiles in winter (NDJF) as well as in summer (MJJA). In winter, the first EOF represents 91.2% of the variance (92.6% in summer), and the second EOF represents an additional 8.2% of the variance (7% in summer).⁵ The remaining seven EOFs, which together with the first two EOFs span the full nine-dimensional space, represent only 0.6% of the variance of the height profiles (0.4% in summer). Thus, only two coefficient time series are required to represent the essential information in the time series of geopotential height at the nine levels. Instead of dealing with 1080 numbers per season, only 2 × 120 = 240 are needed. This demonstrates one of the advantages of EOFs, namely the ability to often identify a small subspace that contains most of the dynamics of the observed system.⁶

Another advantage is that the patterns can sometimes be seen as modes of variability. In the present example the two patterns $e^1$ and $e^2$ may be identified with the equivalent barotropic mode and the first baroclinic mode of the tropospheric circulation: The first patterns in winter (Figure 13.1) as well as in summer have the same sign throughout the troposphere, that is, they exhibit an equivalent barotropic structure. The second EOF, however, changes sign in the middle of the troposphere: it represents the first baroclinic mode.⁷

13.0.2 ‘Complex’ EOFs. EOFs may be derived from real- or complex-valued random vectors. The latter results in complex-valued EOFs. The ‘Complex EOF Analysis’ (CEOF) described in the climate literature (see [181]) is a special case of the EOF analysis of complex random vectors. The time order of the observations is important for these ‘CEOFs,’ or ‘Frequency Domain EOFs,’ since they are the EOFs of a complexified time series. In contrast, the time order of observations is irrelevant in ordinary complex EOF analysis. The original real-valued time series is made complex by adding its Hilbert transform (see Section 16.2) as the imaginary component. The Hilbert transform can be thought of as the time derivative of the original process so that the EOF analysis of the complexified process reveals properties of the variability of the state and its change at the same time. To avoid confusion with the ordinary complex EOF analysis we refer to these EOFs as Hilbert EOFs (see Section 16.3).

13.1 Definition of Empirical Orthogonal Functions

13.1.1 Overview. EOFs are introduced formally in this section as parameters of the distribution of an $m$-dimensional random vector $X$.⁸ For the sake of brevity we assume $\mu = 0$. We first construct the first EOF, which is the most powerful single pattern in representing the variance of $X$. The idea is easily generalized to several patterns and in [13.1.3] the calculations are condensed into a theorem.

13.1.2 The First EOF. The first step is to find one ‘pattern’ $e^1$, with $\|e^1\| = 1$, such that

$$\epsilon_1 = E\Big( \big\| X - \langle X, e^1 \rangle\, e^1 \big\|^2 \Big) \qquad (13.2)$$

is minimized.⁹ Equation (13.2) describes the projection of the random vector $X$ onto a one-dimensional subspace spanned by the fixed vector

⁵ When we say that an expansion $Y$ ‘represents’ $p\%$ of the variance of $X$, we mean that the variance of $Y - X$ is $(100 - p)\%$ of the variance of $X$. The word ‘explains’ is often used instead of the word ‘represents’ in the literature. This is misleading since nothing is explained causally; only part of the variability of $X$ has been described by $Y$.

⁶ The assumption that the subspace with maximum variance coincides with the dynamically active subspace is arbitrary. In general, it will not be valid and counter examples can easily be constructed. However, in climate research, it is often reasonable to make this assumption. An example demonstrating

⁷ A similar result for the vertical structure of the shelf ocean has been reported by Kundu, Allen, and Smith [234].

⁸ Mainly based on [392].

⁹ $\|\cdot\|$ denotes the vector norm, and $\langle \cdot, \cdot \rangle$ denotes the inner