16, EDF = 32, EBW = 0.042). The spectral

5% signi¬cance level test of the null hypothesis

that κx y (ω) = 0. Despite the noise and the density is well estimated. We now have a

many large peaks, only a few coherency estimates reasonable indication of the shape of the coherence

rise above the critical value. The bottom panel spectrum although it is severely underestimated at

frequencies ω > 0.1. The peak in the estimated

displays the corresponding phase estimates (solid)

and true phase (horizontal long-dashed line). We coherence spectrum is located at a slightly higher

see that the phase is reasonably well estimated frequency than that in the true spectrum. The phase

in the same interval (0.02, 0.1) in which we is well estimated in the interval (0.02, 0.2).

have some indication that the true coherency is The right hand column of Figure 12.23

nonzero. Elsewhere, the phase estimates are of no illustrates the effect of over-smoothing the

value. periodogram. We used the Daniell estimator with

12.5: Estimating the Cross-spectrum 287

n = 64 (EDF = 128, EBW = 0.17). The spectrum in some more detail, we consider a

estimate of the spectral density at low frequencies bivariate process Xt with components that vary

’1 ’1

similarly at time scales ωb to ωa . Suppose that

is now affected by the ˜peak spreading™ effect of

Yt leads Xt by ζ time intervals on these time

the smoothing. The coherence estimates are now

strongly affected by bias as well. The peak has scales. Then

been shifted to the right and its magnitude has been

≈ e2πiζ ω

x y (ω) x x (ω) for ω ∈ (ωa , ωb ).

diminished. This estimate gives quite a distorted

(12.60)

view of the rotational properties of the sampled

process. However, the phase is surprisingly well

A very simple process of this type has

estimated over a wide frequency band.

Yt = Xt+ζ

12.5.8 Yet Another Source of Bias. Another in which case approximation (12.60) holds for

potential source of bias occurs when one time all time scales (see equation (11.72) in [11.4.4]).

series is delayed relative to another. This may Generally, however, one process might lag or lead

happen in very simple ways, for example, by the other over only some subset of time scales.

shifting the time origin of one series relative to Suppose, for simplicity, that x y is estimated

another. For example, one could conceive of proxy from a time series of length T using the Daniell

data derived from tree rings, varves, ice cores, etc. estimator with bandwidth n/ T . The estimator of

in which time is measured relative to an uncertain the cross-spectrum is

time origin. However, it may also happen in much

j+(n’1)/2

1

more complex ways, with delay occurring on some

x y (ω j ) = (I x y )T k .

time scales but not others. n k= j’(n’1)/2

Unrecognized delay can lead to severe under-

estimation of the cross-spectral density function The bivariate periodogram is an asymptotically

on the time scales at which delay occurs. One unbiased estimator of the cross-spectrum. Thus

might even be led to the false conclusion that two

E I x y T k ≈ e2πiζ ωk x x (ωk )

strongly related time series are unrelated. This is

intuitively easy to understand if we think of a for ωk ∈ (ωa , ωb ). Assuming that x x (ωk ) is

weighted covariance spectral estimator with a lag approximately constant for the ωk s lying in a

window that is zero beyond lag M. Imagine a pair bandwidth centred on ω j , we then obtain

of strongly related processes in which the delay is

ζ . Cross-covariances at lags near ζ will be large j+(n’1)/2

e2πiζ ωk

while those at other lags will be small. If the delay

ζ is greater than M lags, the weighted covariance k= j’(n’1)/2

E x y (ω j ) ≈ x x (ω j ) .

n

estimator will entirely miss the contributions to

(12.61)

the cross-spectrum that are made by the large

cross-covariances near lag ζ . When the delay is If the delay ζ is large (greater than about

the same at all time scales it may be possible to 1/EBW = T /n), the elements of the sum in

correct this problem by aligning the components (12.61) describe points all around the unit circle

of the observed time series (see the examples in and consequently we have large bias with

Jenkins and Watts [195, Sections 9.3.2 and 9.3.3],

|E x y (ω j ) | | x y (ω j )|

and also Bloom¬eld [49, Section 9.6]). A simple,

but not very ef¬cient, way to do this is to shift the

at frequencies in (ωa , ωb ).

time origin of the delayed time series by ζ time

This type of bias affects both the estimated

units.30

coherency and phase spectra. The coherency

To examine the effect of delay on the estimated

spectrum will be underestimated and we might

incorrectly conclude that the processes X and Y are

30 When the delay is independent of frequency, alignment

’1 ’1

uncorrelated at time scales between ωb and ωa .

is performed ef¬ciently by estimating the delay ζ from

the cross-correlation function and then multiplying the The phase spectrum will be estimated incorrectly

cross-periodogram I x y T j by e’2πiζ ω j before using it in

and we may entirely miss the linear component

a periodogram-based spectral estimator. In general, when

of the variation of phase with frequency (see

the delay varies with time scale, simple alignment cannot

approximation (12.60)) that is induced by the

be used. Hannan and Thompson [159, 160] describe a

delay.

method for estimating frequency-dependent delay. See also

Bloom¬eld [49, pp. 228“231].

This Page Intentionally Left Blank

Part V

Eigen Techniques

This Page Intentionally Left Blank

291

Overview

A characteristic dif¬culty in climate research is the size of the phase space. It is practically in¬nite

in the case of the real system, and much smaller, though still very large, in the case of quasi-realistic

models, such as atmospheric or oceanic General Circulation Models. Thus observations or simulated

data sets, per se, are not always useful to the researcher who wants to know the dynamics controlling

the developments and relationships in the system. Statistical analysis becomes an indispensable tool

for helping the researcher to discriminate between a few dynamically signi¬cant components and the

majority of components that are irrelevant or, in terms of frequently used slang, of ˜second (or higher)

order™ for the problem at hand. The task of sorting out the ¬rst-order processes from myriads of

second-order processes makes statistical analysis in climate research different from both conventional

(mathematical) statistics and statistical mechanics. In mathematical statistics, problems are usually of

low dimension, and in statistical mechanics the phase space, though in¬nite, is isotropic or of some

simple structure. In climate problems, however, one has to expect different characteristics for each

different direction in phase space. The problem is to ¬nd the relevant directions.

In this part of the book, a number of linear techniques are introduced that attempt to identify

˜relevant™ components in phase space. We assume that these components take the form of characteristic

vectors, which can usually be represented by patterns (i.e., spatial distributions). These techniques are

often based on an eigenproblem, which arises naturally when maximizing some interesting squared

properties, for instance variance or correlation, under certain constraints. (The differentiation of the

squared property leads to the linear problem, with the eigenvalue originating from the addition of a

Lagrange multiplier.)

In Chapter 13 we begin with the problem of one random vector and its decomposition into its

Empirical Orthogonal Functions (EOFs). The EOFs are orthogonal spatial patterns that can be thought

of as empirically derived basis functions. The low-order EOFs can sometimes be interpreted as natural

modes of variation of the observed system. The time coef¬cients obtained by projecting the observed

¬eld onto the EOFs are uncorrelated and represent the variability of the ¬eld ef¬ciently. We also

introduce two related topics, Singular Systems Analysis and Rotated EOF Analysis.

In Chapter 14 we consider a pair of random vectors and we search for pairs of directions, or patterns,

that represent the strongest joint patterns of variations. Techniques designed for this purpose include

Canonical Correlation Analysis, Maximum Covariance Analysis, and Redundancy Analysis.

Principal Oscillation Patterns (POPs; Chapter 15) are obtained by imposing a speci¬c model for the

time evolution of a ¬eld. This technique is useful if the system under consideration has quasi-oscillatory

modes. POP analysis can be generalized to cyclo-stationary time series. The general concept of state

space models is brie¬‚y outlined in the last section of Chapter 15.

Another approach for analysing the time evolution patterns of variability is to complexify the observed

¬eld (Chapter 16). This can be done by assigning the observed ¬eld to the real part of the complexi¬ed

process and assigning the Hilbert transform of the observed ¬eld to the imaginary part. This approach

has been used widely in conjunction with EOF analysis, but can also be used with the techniques

introduced in Chapters 13“15.

This Page Intentionally Left Blank

13 Empirical Orthogonal Functions

13.0.0 Overview. In this chapter we present a compute the standard deviation at each level and

multivariate analysis technique that is to derive the to plot it in the vertical. However such a pro¬le

dominant patterns of variability from a statistical does not tell us how the variations are correlated in