the dynamical dominance of EOFs is given by Selten [343].

13.1: Definition of Empirical Orthogonal Functions 295

e^1. Minimizing ε_1 is equivalent to maximizing the variance of X that is contained in this subspace:

\epsilon_1 = E\big( \|X\|^2 - 2\langle X, e^1\rangle X^\dagger e^1 + \langle X, e^1\rangle\,\overline{\langle X, e^1\rangle} \big)
           = E\big( \|X\|^2 - \langle X, e^1\rangle\,\overline{\langle X, e^1\rangle} \big)
           = \mathrm{Var}(X) - \mathrm{Var}\big(\langle X, e^1\rangle\big),

where the variance of the random vector X is defined to be the sum of variances of the elements of X.[10] Note that

\mathrm{Var}\big(\langle X, e^1\rangle\big) = e^{1\dagger} \Sigma\, e^1,

where Σ is the covariance matrix of X. Then minimization of equation (13.2), under the constraint \|e^1\| = 1, leads to

\frac{d}{d e^1}\Big( -e^{1\dagger}\Sigma e^1 + \lambda\big(e^{1\dagger}e^1 - 1\big) \Big) = -2\Sigma e^1 + 2\lambda e^1 = 0,

where λ is the Lagrange multiplier associated with the constraint \|e^1\| = 1.[11] Thus e^1 is an eigenvector, with a corresponding eigenvalue λ, of the covariance matrix Σ. But Σ has m eigenvectors. Therefore, to minimize ε_1, we select the eigenvector that maximizes

\mathrm{Var}\big(\langle X, e^1\rangle\big) = e^{1\dagger}\Sigma e^1 = e^{1\dagger}\lambda e^1 = \lambda.

Thus ε_1 is minimized when e^1 is an eigenvector of Σ associated with its largest eigenvalue λ.[12] This 'pattern' is the first EOF.[13]

13.1.3 More EOFs. Having found the first EOF, we now repeat the exercise by finding the 'pattern' e^2 that minimizes

\epsilon_2 = E\Big( \big\| \big(X - \langle X, e^1\rangle e^1\big) - \langle X, e^2\rangle e^2 \big\|^2 \Big)

subject to the constraint that \|e^2\| = 1. The result is that e^2 is the eigenvector of Σ that corresponds to its second largest eigenvalue λ_2.[14] This second pattern is orthogonal to the first because the eigenvectors of a Hermitian matrix are orthogonal to one another.

13.1.4 Theorem. The following theorem results from the analysis presented so far.

Let X be an m-dimensional random vector with mean µ and covariance matrix Σ. Let λ_1 ≥ λ_2 ≥ · · · ≥ λ_m be the eigenvalues of Σ and let e^1, . . . , e^m be the corresponding eigenvectors of unit length. Since Σ is Hermitian, the eigenvalues are non-negative and the eigenvectors are orthogonal.

(i) The k eigenvectors that correspond to λ_1, . . . , λ_k minimize

\epsilon_k = E\Big( \big\| (X - \mu) - \sum_{i=1}^{k} \langle X - \mu, e^i \rangle e^i \big\|^2 \Big).  (13.3)

(ii) \epsilon_k = \mathrm{Var}(X) - \sum_{i=1}^{k} \lambda_i.  (13.4)

(iii) \mathrm{Var}(X) = \sum_{i=1}^{m} \lambda_i.  (13.5)

The total variance of X is broken up into m components. Each of these components is obtained by projecting X onto one of the EOFs e^i. The variance contribution of the kth component to the total variance \sum_j \lambda_j is just λ_k. In relative terms, the proportion of the total variance represented by EOF k is \lambda_k / \sum_j \lambda_j. This proportion may be given as a percentage.

If the components are ordered by the size of the eigenvalues, then the first component is the most important in representing variance, the second is the second most important, and so forth.

Equation (13.3) gives the mean squared error ε_k that is incurred when approximating the full m-dimensional random vector X in a k-dimensional subspace spanned by the first k EOFs. The construction of the EOFs ensures that the approximation is optimal; the use of any other k-dimensional subspace will lead to mean squared errors at least as large as ε_k.

13.1.5 Properties of the EOF Coefficients. The EOF coefficients, or principal components,

\alpha_i = \langle X, e^i \rangle = X^{\mathrm{T}} e^{i*} = e^{i\dagger} X  (13.6)

[10] That is, if X has covariance matrix Σ, then we define Var(X) = tr(Σ).
[11] Graybill [148, Section 10.8] describes the differentiation of quadratic forms.
[12] Recall (see Appendix B) that all eigenvalues of the Hermitian matrix Σ = E(XX†) are real and non-negative.
[13] The pattern is unique up to sign if Σ has only one eigenvector that corresponds to eigenvalue λ. Otherwise, the pattern can be any vector with unit norm that is spanned by the eigenvectors corresponding to λ. In this case, the EOF is said to be degenerate. See Appendix B.
[14] Note that λ_1 = λ_2 if e^1 is degenerate. In fact, if λ_1 has k linearly independent eigenvectors, then k of the m eigenvalues of Σ will be equal to λ_1.

13: Empirical Orthogonal Functions 296

are uncorrelated, and hence independent when X is multivariate normal. In fact, for i ≠ j,

\mathrm{Cov}(\alpha_i, \alpha_j) = E\Big( \langle X - \mu, e^i \rangle\, \overline{\langle X - \mu, e^j \rangle} \Big)
 = e^{i\dagger}\, E\big( (X - \mu)(X - \mu)^\dagger \big)\, e^j
 = e^{i\dagger} \Sigma\, e^j
 = \lambda_j\, e^{i\dagger} e^j = 0.

Therefore, the variance of X_k, the kth component of X, can also be decomposed into contributions from the individual EOFs as

\mathrm{Var}(X_k) = \sum_{i=1}^{m} \lambda_i \big| e^i_k \big|^2.  (13.7)

If the elements of X represent locations in space, the spatial distribution of variance can be visualized by plotting Var(X_k) as a function of location. Similarly, the variance contribution from the ith EOF can be visualized by plotting \lambda_i |e^i_k|^2 or \lambda_i |e^i_k|^2 / \mathrm{Var}(X_k) as a function of location.

13.1.6 Interpretation. The bulk of the variance of X can often be represented by the first few EOFs. If the original variable has m components, the approximation of X by α = (α_1, . . . , α_k), with k ≪ m, leads to a significant reduction of the amount of data while retaining most of the variance. It was shown in the introductory example of Berlin geopotential height [13.0.2] that just two EOFs represent almost all of the information in the data set.

α_1, . . . , α_m. Because the EOFs are orthonormal, expression (13.8) may be inverted to obtain

\alpha = P^\dagger X,  (13.10)

where P† is the conjugate transpose of P. Another consequence of the orthonormality of the EOFs is that

\Sigma = \mathrm{Cov}(X, X) = P\, \mathrm{Cov}(\alpha, \alpha)\, P^\dagger = P \Lambda P^\dagger,

where Λ is the diagonal m × m matrix composed of the eigenvalues of Σ,

\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_m).

It therefore follows that

\mathrm{Var}(X) = \sum_{k=1}^{m} \mathrm{Var}(X_k) = \mathrm{tr}(\Sigma) = \mathrm{tr}(P \Lambda P^\dagger) = \mathrm{tr}(\Lambda) = \sum_{k=1}^{m} \lambda_k.

It also follows that the eigenvalues are the m roots of the mth degree characteristic polynomial p(λ) = det(Σ − λI), where I is the m × m identity matrix. In fact

p(\lambda) = \det\big( P \Lambda P^\dagger - \lambda P P^\dagger \big) = \det\big( P (\Lambda - \lambda I) P^\dagger \big)
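The identities collected in this section lend themselves to a quick numerical check. The sketch below is our illustration rather than part of the original text: it assumes a real-valued random vector X, so that conjugate transposes reduce to ordinary transposes, and all variable names (`Sigma`, `lam`, `P`, `alpha`) are ours.

```python
# Numerical check of the EOF identities for a real-valued random vector,
# using NumPy. Sampling error is handled with loose tolerances.
import numpy as np

rng = np.random.default_rng(1)

# Build a symmetric covariance matrix and draw many realizations of X.
m, n = 5, 200_000
A = rng.standard_normal((m, m))
Sigma = A @ A.T                       # symmetric, positive semi-definite
X = rng.multivariate_normal(np.zeros(m), Sigma, size=n)  # rows = samples

# EOFs: unit-length eigenvectors of Sigma, ordered by decreasing eigenvalue.
lam, P = np.linalg.eigh(Sigma)        # eigh exploits that Sigma is Hermitian
order = np.argsort(lam)[::-1]
lam, P = lam[order], P[:, order]

# (13.5): the total variance Var(X) = tr(Sigma) equals the eigenvalue sum.
assert np.isclose(np.trace(Sigma), lam.sum())

# Orthonormality of the EOFs gives Sigma = P Lambda P^T.
assert np.allclose(Sigma, P @ np.diag(lam) @ P.T)

# EOF coefficients alpha = P^T X (13.10); their sample covariance is close
# to diag(lam), i.e. the coefficients are (approximately) uncorrelated.
alpha = X @ P
assert np.allclose(np.cov(alpha, rowvar=False), np.diag(lam), atol=0.5)

# (13.4): the mean squared error of the k-term truncation matches the sum
# of the discarded eigenvalues, up to sampling error.
k = 2
X_k = alpha[:, :k] @ P[:, :k].T       # projection onto the first k EOFs
eps_k = np.mean(np.sum((X - X_k) ** 2, axis=1))
assert np.isclose(eps_k, lam[k:].sum(), rtol=0.05)
```

With the eigenvalues sorted in decreasing order, truncating to the first k EOFs is the optimal k-dimensional approximation, which is exactly what the last assertion probes.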