greatest variance that is uncorrelated with $\mathbf{X}^T e^1$, and so on. The objective of CCA is to find a pair of patterns $f_X^1$ and $f_Y^1$ (subject to $\|f_X^1\| = \|f_Y^1\| = 1$) so that the correlation between linear combinations $\mathbf{X}^T f_X^1$ and $\mathbf{Y}^T f_Y^1$ is maximized.¹ A second pair of patterns $f_X^2$ and $f_Y^2$ is found so that $\mathbf{X}^T f_X^2$ and $\mathbf{Y}^T f_Y^2$ are the most strongly correlated linear combinations of $\mathbf{X}$ and $\mathbf{Y}$ that are not correlated with $\mathbf{X}^T f_X^1$ and $\mathbf{Y}^T f_Y^1$, and so on.

¹ One could also choose $f_X^1$ and $f_Y^1$ to maximize the covariance between $\mathbf{X}^T f_X^1$ and $\mathbf{Y}^T f_Y^1$. Climatologists sometimes call this SVD analysis since the patterns are found by obtaining a singular value decomposition of the cross-covariance matrix. See [14.1.7], Bretherton, Smith, and Wallace [64] and Cherry [83].

Canonical Correlation Analysis was first described by Hotelling [187].

The ‘Canonical Correlation Patterns’ of a paired random vector $(\mathbf{X}, \mathbf{Y})$ are defined in Section 14.1, and their estimation is described in Section 14.2. Examples of some applications are given in Section 14.3. A closely related technique, called Redundancy Analysis, is described in Section 14.4.

14.0.1 Introductory Example: Large-scale Temperature and SLP over Europe and Local Weather Elements in Bern. Gyalistras et al. [152] analysed the simultaneous variations of the local climate in Bern (Switzerland) and the troposphere over the North Atlantic in DJF. The state of the local climate in a given season was represented by a 17-dimensional random vector $\mathbf{X}$ consisting of the number of days in the season with at least 1 mm of precipitation, and the [...] corresponding pattern coefficients is maximized.

The pair of patterns with the largest correlation is shown in Figure 14.1. The two patterns, one of which consists of two sub-patterns for the pressure and temperature (Figure 14.1, top and middle), have a meaningful physical interpretation. Below normal temperatures in Bern are associated with high pressure over the British Isles and below normal temperatures in the rest of Europe since the correlation between the local climate pattern (bottom panel, Figure 14.1) and the tropospheric pattern (top two panels, Figure 14.1) is negative. Weakened westerly flow is associated with reduced precipitation; the seasonal mean, standard deviation, and number of ‘wet’ days all tend to be below normal. The large-scale patterns have little effect on wind speed and relative humidity.

The link between the two patterns in Figure 14.1 is strong. The correlation between the coefficient time series (not shown) is −0.89, and the CCA pattern represents a large proportion of the variance of the local climate (Figure 14.2). More than 50% of interannual variance of the seasonal means of daily mean, minimum and maximum temperature are represented by the first CCA pair. They also represent almost 80% of the interannual variance of DJF precipitation and about 75% of the interannual variance of the number of ‘wet’ days.

14.1 Definition of Canonical Correlation Patterns

14.1.1 One Pair of Patterns. Let us consider an $m_X$-dimensional random vector $\mathbf{X}$ and an




$m_Y$-dimensional random vector $\mathbf{Y}$. We require an $m_X$-dimensional vector $f_X$ and an $m_Y$-dimensional vector $f_Y$ such that the inner products $\beta^X = \langle \mathbf{X}, f_X \rangle$ and $\beta^Y = \langle \mathbf{Y}, f_Y \rangle$ have maximum correlation. That is, we want to maximize

$$\rho = \frac{\mathrm{Cov}(\beta^X, \beta^Y)}{\sqrt{\mathrm{Var}(\beta^X)\,\mathrm{Var}(\beta^Y)}} = \frac{f_X^T\,\mathrm{Cov}(\mathbf{X}, \mathbf{Y})\,f_Y}{\sqrt{\mathrm{Var}(\langle \mathbf{X}, f_X \rangle)\,\mathrm{Var}(\langle \mathbf{Y}, f_Y \rangle)}}. \tag{14.1}$$

Note that if a pair of vectors $f_X$ and $f_Y$ maximizes (14.1), then all vectors $\alpha_X f_X$ and $\alpha_Y f_Y$ do the same for any nonzero $\alpha_X$ and $\alpha_Y$. Thus the patterns $f_X$ and $f_Y$ are subject to arbitrary normalization. In particular, we can choose patterns such that

$$\mathrm{Var}(\langle \mathbf{X}, f_X \rangle) = f_X^T \Sigma_{XX} f_X = 1 \tag{14.2}$$

$$\mathrm{Var}(\langle \mathbf{Y}, f_Y \rangle) = f_Y^T \Sigma_{YY} f_Y = 1, \tag{14.3}$$

where $\Sigma_{XX}$ and $\Sigma_{YY}$ are the covariance matrices of $\mathbf{X}$ and $\mathbf{Y}$. Then equation (14.1) can be rewritten as

$$\rho = f_X^T \Sigma_{XY} f_Y, \tag{14.4}$$

where $\Sigma_{XY}$ is the cross-covariance matrix

$$\Sigma_{XY} = E\big((\mathbf{X} - \mu_X)(\mathbf{Y} - \mu_Y)^T\big).$$

Figure 14.1: First pair of canonical correlation patterns of $\mathbf{Y}$ = (DJF mean SLP, DJF mean temperature) and a vector $\mathbf{X}$ of DJF statistics of local weather elements at Bern (Switzerland).
Top: The SLP part of the first canonical correlation pattern for $\mathbf{Y}$.
Middle: The near-surface temperature part of the first canonical correlation pattern for $\mathbf{Y}$.
Bottom: The canonical correlation pattern for the local variable $\mathbf{X}$.
Note that the correlation between the corresponding pattern coefficients is negative.
From Gyalistras et al. [152].

Figure 14.2: Percentage of year-to-year variance of the local climate variables for Bern represented by the first CCA pair.
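As a numerical reading of (14.1) to (14.4), the following NumPy sketch estimates the correlation for an arbitrary pair of patterns from sample covariance matrices. It then checks two statements made in [14.1.1]: rescaling the patterns leaves the correlation unchanged, and once the patterns are normalized as in (14.2) and (14.3) the correlation reduces to the bilinear form (14.4). The synthetic data, the function names and the trial patterns are illustrative assumptions; the patterns used here are not the maximizing pair.

```python
import numpy as np


def sample_covariances(x, y):
    """Sample estimates of Sigma_XX, Sigma_YY and Sigma_XY.

    x has shape (n, m_x), y has shape (n, m_y); each row is one realization.
    """
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    n = x.shape[0]
    s_xx = xc.T @ xc / (n - 1)
    s_yy = yc.T @ yc / (n - 1)
    s_xy = xc.T @ yc / (n - 1)
    return s_xx, s_yy, s_xy


def pattern_correlation(f_x, f_y, s_xx, s_yy, s_xy):
    """Correlation (14.1) between <X, f_x> and <Y, f_y>."""
    cov = f_x @ s_xy @ f_y
    return cov / np.sqrt((f_x @ s_xx @ f_x) * (f_y @ s_yy @ f_y))


def normalize(f, s):
    """Rescale f so that f^T S f = 1, as in (14.2) and (14.3)."""
    return f / np.sqrt(f @ s @ f)


# Synthetic data (assumption): the first component of Y partly follows X.
rng = np.random.default_rng(0)
x = rng.standard_normal((500, 3))
y = np.column_stack([x[:, 0] + 0.5 * rng.standard_normal(500),
                     rng.standard_normal(500)])

s_xx, s_yy, s_xy = sample_covariances(x, y)

# Arbitrary trial patterns (assumption), chosen only for illustration.
f_x = np.array([1.0, 0.2, -0.1])
f_y = np.array([1.0, 0.3])

rho = pattern_correlation(f_x, f_y, s_xx, s_yy, s_xy)

# Rescaling by positive factors does not change the correlation.
rho_scaled = pattern_correlation(2.5 * f_x, 0.1 * f_y, s_xx, s_yy, s_xy)

# With the normalization (14.2)-(14.3), rho is the bilinear form (14.4).
g_x, g_y = normalize(f_x, s_xx), normalize(f_y, s_yy)
rho_normalized = g_x @ s_xy @ g_y
```

In this sketch `rho`, `rho_scaled` and `rho_normalized` agree to rounding error. Factors of opposite sign would flip the sign of the correlation while leaving its magnitude unchanged.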


Vectors $f_X$ and $f_Y$ are found by maximizing

$$\epsilon = f_X^T \Sigma_{XY} f_Y + \zeta\,(f_X^T \Sigma_{XX} f_X - 1) + \eta\,(f_Y^T \Sigma_{YY} f_Y - 1), \tag{14.5}$$

where $\zeta$ and $\eta$ are Lagrange multipliers that are used to account for constraints (14.2) and (14.3). Setting the partial derivatives of $\epsilon$ to zero, we obtain

$$\frac{\partial \epsilon}{\partial f_X} = \Sigma_{XY} f_Y + 2\zeta\,\Sigma_{XX} f_X = 0 \tag{14.6}$$

so that

$$\Sigma_{XX}^{-1} \Sigma_{XY} f_Y = -2\zeta f_X, \tag{14.7}$$

and

$$\frac{\partial \epsilon}{\partial f_Y} = \Sigma_{XY}^T f_X + 2\eta\,\Sigma_{YY} f_Y = 0, \tag{14.8}$$

which is equivalent to

$$\Sigma_{YY}^{-1} \Sigma_{XY}^T f_X = -2\eta f_Y. \tag{14.9}$$

Then (14.9) is substituted into (14.7) and vice versa to obtain a pair of eigen-equations for $f_X$ and $f_Y$:

$$\Sigma_{XX}^{-1} \Sigma_{XY} \Sigma_{YY}^{-1} \Sigma_{XY}^T f_X = 4\zeta\eta\, f_X \tag{14.10}$$

$$\Sigma_{YY}^{-1} \Sigma_{XY}^T \Sigma_{XX}^{-1} \Sigma_{XY} f_Y = 4\zeta\eta\, f_Y. \tag{14.11}$$

An argument similar to that used to establish [...]

Thus the correlation is the square root of the eigenvalue that corresponds to eigenvectors $f_X$ and $f_Y$.³

14.1.2 More Pairs. The derivation detailed above can now be repeated to obtain $m = \min(m_X, m_Y)$ pairs of patterns $(f_X^i, f_Y^i)$ and $m$ corresponding pairs of canonical variates⁴

$$\beta_i^X = \langle \mathbf{X}, f_X^i \rangle \tag{14.12}$$

$$\beta_i^Y = \langle \mathbf{Y}, f_Y^i \rangle \tag{14.13}$$

with correlation

$$\rho_i = \mathrm{Cov}(\beta_i^X, \beta_i^Y) = \sqrt{\lambda_i}.$$

The patterns and canonical variates are indexed in order of decreasing eigenvalue $\lambda_i$. Pairs of canonical variates are uncorrelated. That is, for $i \neq j$,

$$\mathrm{Cov}(\beta_i^X, \beta_j^X) = \mathrm{Cov}(\beta_i^Y, \beta_j^Y) = \mathrm{Cov}(\beta_i^X, \beta_j^Y) = 0.$$

14.1.3 The Canonical Correlation Patterns. For simplicity, we assume in this subsection that $\mathbf{X}$ and $\mathbf{Y}$ are of the same dimension $m$. Then the canonical variates $\beta^X = (\beta_1^X, \ldots, \beta_m^X)^T$ and