the n × n identity matrix, and J is the n × n matrix composed entirely of units. The n columns of the m × n data matrix X are the sample vectors x_1, ..., x_n; the rows mark the m coordinates in the original space. The matrix product XX† is a square matrix even if X is not.

13.2.5 Theorem. The following theorem is often useful when computing eigenvalues and eigenvectors [391].

Let A be any m × n matrix. If λ is a nonzero eigenvalue of multiplicity s of A†A with s linearly independent eigenvectors e^1, ..., e^s, then λ is also an s-fold eigenvalue of AA† with s linearly independent eigenvectors Ae^1, ..., Ae^s.

A proof is given in Appendix M.

13.2.6 Recipe. The message of Theorem [13.2.5] is that the nonzero eigenvalues of AA† are identical to those of A†A and that the eigenvectors of the two matrices associated with nonzero eigenvalues are related through a simple linear relationship. Thus the following recipe may be used to estimate EOFs.

• If the sample size, n, is larger than the dimension of the problem, m, then the EOFs are calculated directly as the normalized eigenvectors of the m × m matrix

  \frac{1}{n} X (I - \tfrac{1}{n}J)(I - \tfrac{1}{n}J) X^\dagger.

• If the sample size, n, is smaller than the dimension of the problem, m, the EOFs may be obtained by first calculating the normalized eigenvectors g of the n × n matrix

  \frac{1}{n} (I - \tfrac{1}{n}J) X^\dagger X (I - \tfrac{1}{n}J)

and then computing the EOFs as

  \hat{e} = \frac{X (I - \tfrac{1}{n}J) g}{\| X (I - \tfrac{1}{n}J) g \|}.

13.2.7 Properties of the Coefficients of the Estimated EOFs. There are several properties worth noting.

• As with the true EOFs, the estimated EOFs span the full m-dimensional vector space. Random vector X can therefore be expanded in terms of the estimated EOFs as X = \sum_{j=1}^{m} \hat{\alpha}_j \hat{e}^j, where

  \hat{\alpha}_j = \langle X, \hat{e}^j \rangle. \qquad (13.30)

The distribution of α̂, where α̂ is the m-dimensional vector of EOF coefficients α̂_j, conditional upon the samples used to estimate the EOFs, is multivariate normal with mean E(α̂ | x_1, ..., x_n) = P†µ and covariance matrix Cov(α̂, α̂ | x_1, ..., x_n) = P†ΣP. Matrix P, which has ê^j in column j, is a complicated function of x_1, ..., x_n.

• λ̂_j is the variance of the EOF coefficients computed from the sample used to estimate the EOFs. That is, if α̂_{ji} = ⟨x_i, ê^j⟩, then

  \frac{1}{n} \sum_{i=1}^{n} |\hat{\alpha}_{ji} - \overline{\hat{\alpha}}_j|^2 = \hat{\lambda}_j.

Note that λ̂_j has at least two interpretations as a variance estimate. We could regard λ̂_j as an estimate of the variance of the true EOF coefficient α_j = ⟨X, e^j⟩ (see [13.3.3]). Alternatively, we could view the estimated EOFs ê^j as fixed, not quite optimal, proxies for e^j. Then λ̂_j could be viewed as an estimator of the variance of α̂_j = ⟨X, ê^j⟩ when ê^j is fixed (see [13.3.2]). These two variances are not equal, although they become asymptotically equivalent as n → ∞. Thus, at least one of the interpretations makes λ̂_j a biased estimator. In fact, they are both poor estimators when the sample is small. In the former case there is uncertainty because the EOFs must be estimated. In the latter case the EOFs are regarded as fixed, but there is a bias because independent data are not used to estimate Var(α̂_j). See also [13.3.2,3].

• The sample covariance of a pair of EOF coefficients computed from the sample used to estimate the EOFs is zero. That is,

  \frac{1}{n} \sum_{i=1}^{n} (\hat{\alpha}_{ji} - \overline{\hat{\alpha}}_j)(\hat{\alpha}_{ki} - \overline{\hat{\alpha}}_k) = 0 \quad \text{if } j \neq k.

As with λ̂_j, the covariance has two interpretations. It correctly estimates the covariance of the true EOF coefficients α_j = ⟨X, e^j⟩ and α_k = ⟨X, e^k⟩. Alternatively, if we view the estimated EOFs ê^j as being fixed, then it incorrectly estimates Cov(α̂_j, α̂_k). The latter, the (j, k) element of P†ΣP, can be substantially different from zero if ê^j and ê^k are computed from a small sample.

13.2.8 Gappy Data. Data are often incomplete, that is, there are irregularly distributed gaps in the data vectors caused by missing observations. Estimated EOFs and EOF coefficients can be derived in this case, but the procedure is slightly


different. Each element of Σ is estimated by forming sums of all available products,

  \hat{\sigma}_{ij} = \frac{1}{|K_i \cap K_j|} \sum_{k \in K_i \cap K_j} (x_{ki} - \hat{\mu}_i)(x_{kj} - \hat{\mu}_j)^* \qquad (13.31)

where K_i = {k: component i of x_k is not missing}, and where µ̂_i = \frac{1}{|K_i|} \sum_{k \in K_i} x_{ki}. The estimated EOFs are then the eigenvectors ê^i of this covariance matrix estimate. The set K_i ∩ K_j is the set of all indices such that x_{ki} and x_{kj} are not missing. The | · | notation is used to indicate the size of the enclosed set.

The EOF coefficient α̂_i of a gappy data vector x cannot be obtained as a simple dot product of the gappy data vector x and the estimated EOF ê^i, as in equation (13.30), but a least squares estimate can be obtained by choosing α̂_i to minimize ‖x − α̂_i ê^i‖. The least squares estimate is given by

  \hat{\alpha}_i = \frac{\sum_{j \in K} x_j \hat{e}_j^{i*}}{\sum_{j \in K} |\hat{e}_j^i|^2} \qquad (13.32)

where x_j and ê_j^i are the jth components of x and ê^i, respectively, and where K = {j: x_j is not missing}.

X̃.¹⁷ Since nΣ̂ = X̃X̃†, we infer from equation (B.6) that the right singular vectors v^i are equal to the estimated EOFs ê^i. The singular values s_i are related to the estimated eigenvalues by λ̂_i = \frac{1}{n} s_i^2. The left singular vectors u^i are given by (B.5),

  u^i = \frac{1}{s_i} \tilde{X}^\dagger v^i. \qquad (13.35)

The kth column of X̃ represents the vector of deviations x_k − µ̂ so that

  u_k^i = \frac{1}{s_i} (x_k - \hat{\mu})^\dagger v^i. \qquad (13.36)

Thus, u_k^i is the ith normalized EOF coefficient (13.21) of the anomalies x_k − µ̂. Note that the sample variance of the ith normalized EOF coefficient is

  \frac{1}{n} \sum_{k=1}^{n} \left| \frac{\sqrt{n}}{s_i} (x_k - \hat{\mu})^\dagger \hat{e}^i \right|^2 = \frac{n}{s_i^2} \operatorname{Var}(\hat{\alpha}_i) = \frac{n}{s_i^2} \hat{\lambda}_i = 1. \qquad (13.37)

Note also that equations (13.35)–(13.37) are only valid for those EOFs that correspond to nonzero eigenvalues. The number of nonzero eigenvalues, which is determined by the rank¹⁸ of the centred
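The eigenvector and SVD routes above lend themselves to a direct numerical check. The following NumPy sketch (variable names are illustrative; it assumes real-valued data, so † is an ordinary transpose, and the 1/n covariance convention of this section) verifies that the recipe of [13.2.6] and the SVD relations agree, that λ̂_i = s_i²/n, and that centring leaves at most n − 1 nonzero eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 20                       # dimension m exceeds sample size n
X = rng.standard_normal((m, n))     # columns of X are the n sample vectors

# Centre the samples: Xc = X (I - J/n), with J the n x n matrix of units
J = np.ones((n, n))
Xc = X @ (np.eye(n) - J / n)

# Direct route: EOFs as eigenvectors of the m x m matrix (1/n) Xc Xc^T
lam, E = np.linalg.eigh(Xc @ Xc.T / n)
lam, E = lam[::-1], E[:, ::-1]      # reorder to descending eigenvalues

# SVD route: the right singular vectors of Xc^T are the estimated EOFs,
# and the eigenvalue estimates are s_i^2 / n
U, s, Vt = np.linalg.svd(Xc.T, full_matrices=False)
assert np.allclose(s**2 / n, lam[:n])

# Eigenvectors agree up to sign for the nonzero eigenvalues; centring
# costs one degree of freedom, so at most n - 1 eigenvalues are nonzero
for i in range(n - 1):
    assert abs(Vt[i] @ E[:, i]) > 1 - 1e-6

# Sample variance of the coefficients of an EOF equals its eigenvalue ...
alpha = Xc.T @ Vt[0]                # coefficients of the leading EOF
assert np.isclose(np.mean(alpha**2), lam[0])

# ... and the normalized coefficients have sample variance 1
z = np.sqrt(n) / s[0] * alpha
assert np.isclose(np.mean(z**2), 1.0)
```

Here `Xc.T` plays the role of X̃†, so NumPy's right singular vectors (the rows of `Vt`) line up with the ê^i of the text; with complex-valued data the transposes would become conjugate transposes (`.conj().T`).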