Theorem [13.2.4] proves that the two matrices share the same non-negative eigenvalues.² The eigenvectors of the two matrices are related to each other through a simple equation: if $\vec f_X$ is a solution of equation (14.10), then $\Sigma_{YY}^{-1}\Sigma_{XY}^{T}\vec f_X$ is a solution of equation (14.11), provided that their joint eigenvalue is nonzero. Finally, equation (14.4) is maximized by letting $\vec f_X$ and $\vec f_Y$ be the solutions of equations (14.10) and (14.11) that correspond to the largest eigenvalue $\lambda = 4\zeta\eta$.

Now that we have found the canonical random variables $\beta^X = \langle \vec X, \vec f_X \rangle$ and $\beta^Y = \langle \vec Y, \vec f_Y \rangle$ that are most strongly correlated, the natural next step is to find the value of $\rho$. Using equations (14.4), (14.6), (14.8), and (14.2), (14.3) in sequence, we find:

$$
\begin{aligned}
\rho^2 &= \vec f_X^{\,T}\,\Sigma_{XY}\,\vec f_Y \;\; \vec f_Y^{\,T}\,\Sigma_{XY}^{T}\,\vec f_X \\
       &= 4\eta\zeta \;\, \vec f_X^{\,T}\,\Sigma_{XX}\,\vec f_X \;\; \vec f_Y^{\,T}\,\Sigma_{YY}\,\vec f_Y \\
       &= \lambda.
\end{aligned}
$$

$\vec\beta^{\,Y} = (\beta_1^Y, \ldots, \beta_m^Y)^T$ can be viewed as the result of coordinate transforms that have been applied to $\vec X$ and $\vec Y$.⁵ The transformations relate $\vec\beta^{\,X}$ and $\vec\beta^{\,Y}$ to $\vec X$ and $\vec Y$ through unknown matrices $\mathcal{F}_X$ and $\mathcal{F}_Y$:

$$
\begin{aligned}
\vec X &= \mathcal{F}_X\,\vec\beta^{\,X} \\
\vec Y &= \mathcal{F}_Y\,\vec\beta^{\,Y}.
\end{aligned}
\tag{14.14}
$$

To find $\mathcal{F}_X$, note that

$$
\vec\beta^{\,X} = \bigl(\langle \vec X, \vec f_X^{\,1} \rangle, \ldots, \langle \vec X, \vec f_X^{\,m} \rangle\bigr)^T = f_X^{\,T}\,\vec X
$$

2 Note that if $\vec f_X$ is a solution of equation (14.10), then $\Sigma_{XX}^{1/2}\vec f_X$ is an eigenvector of $(\Sigma_{XX}^{-1/2})^T\,\Sigma_{XY}\,\Sigma_{YY}^{-1}\,\Sigma_{XY}^{T}\,\Sigma_{XX}^{-1/2}$. Similarly, $\Sigma_{YY}^{1/2}\vec f_Y$ is an eigenvector of $(\Sigma_{YY}^{-1/2})^T\,\Sigma_{XY}^{T}\,\Sigma_{XX}^{-1}\,\Sigma_{XY}\,\Sigma_{YY}^{-1/2}$. Since these are non-negative definite matrices, their eigenvalues are real and non-negative.

3 Note that the sign of the correlation is arbitrary since $\vec f_X$ and $\vec f_Y$ are determined uniquely only up to their signs.

4 We assume that $(\Sigma_{XX}^{-1/2})^T\,\Sigma_{XY}\,\Sigma_{YY}^{-1}\,\Sigma_{XY}^{T}\,\Sigma_{XX}^{-1/2}$ (or, equivalently, $(\Sigma_{YY}^{-1/2})^T\,\Sigma_{XY}^{T}\,\Sigma_{XX}^{-1}\,\Sigma_{XY}\,\Sigma_{YY}^{-1/2}$) has $m = \min(m_X, m_Y)$ distinct, nonzero eigenvalues. Eigenvalues of multiplicity greater than one lead to degeneracy just as in EOF analysis. Uncorrelated canonical variates can still be constructed, but their interpretation is clouded by their non-unique determination. Tools comparable to North's Rule-of-Thumb [13.3.5] are not yet developed for CCA. Note that a pair of degenerate eigenvalues may be an indication of a propagating pattern. See Chapter 15.

5 The discussion in this subsection is easily generalized to the case in which $\vec X$ and $\vec Y$ are not of the same dimension.
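The relations stated above (shared nonzero eigenvalues, the construction of $\vec f_Y$ from $\vec f_X$, and $\rho^2 = \lambda$) can be checked numerically. The following is a minimal sketch using NumPy and is not part of the original derivation. The joint covariance matrix is a randomly generated, hypothetical example, and all variable names (`S_xx`, `S_xy`, `S_yy`, `M_x`, `M_y`) are illustrative. It assumes that the two eigenproblems use the CCA matrices $\Sigma_{XX}^{-1}\Sigma_{XY}\Sigma_{YY}^{-1}\Sigma_{XY}^{T}$ and $\Sigma_{YY}^{-1}\Sigma_{XY}^{T}\Sigma_{XX}^{-1}\Sigma_{XY}$, and that the eigenvectors are scaled so that the canonical variates have unit variance.

```python
import numpy as np

# Hypothetical example: a random positive definite joint covariance of (X, Y),
# with X of dimension 3 and Y of dimension 2.
rng = np.random.default_rng(0)
m_x, m_y = 3, 2
A = rng.standard_normal((m_x + m_y, m_x + m_y))
S = A @ A.T + 0.5 * np.eye(m_x + m_y)
S_xx, S_xy, S_yy = S[:m_x, :m_x], S[:m_x, m_x:], S[m_x:, m_x:]

# CCA matrices (assumed form of the two eigenproblems).
B_x = np.linalg.solve(S_xx, S_xy)      # Sigma_XX^{-1} Sigma_XY
B_y = np.linalg.solve(S_yy, S_xy.T)    # Sigma_YY^{-1} Sigma_XY^T
M_x = B_x @ B_y                        # 3 x 3 matrix
M_y = B_y @ B_x                        # 2 x 2 matrix

# Eigenvalues of both matrices, sorted in descending order.
lam_x = np.sort(np.linalg.eigvals(M_x).real)[::-1]
lam_y = np.sort(np.linalg.eigvals(M_y).real)[::-1]

# Leading eigenvector of M_x, scaled so that f_x^T S_xx f_x = 1,
# i.e. the canonical variate <X, f_x> has unit variance.
w, V = np.linalg.eig(M_x)
k = int(np.argmax(w.real))
lam = float(w[k].real)
f_x = V[:, k].real
f_x = f_x / np.sqrt(f_x @ S_xx @ f_x)

# f_y is Sigma_YY^{-1} Sigma_XY^T f_x after normalization to unit variance.
f_y = B_y @ f_x
f_y = f_y / np.sqrt(f_y @ S_yy @ f_y)

# Correlation of the leading pair of canonical variates.
rho = float(f_x @ S_xy @ f_y)

print("eigenvalues of M_x:", lam_x)
print("eigenvalues of M_y:", lam_y)
print("rho^2 =", rho**2, " lambda =", lam)
```

In this sketch the third eigenvalue of `M_x` is zero up to rounding error, which is why the eigenvector relation is stated only for a nonzero joint eigenvalue.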

14: Canonical Correlation Analysis


where $f_X$ is the $m \times m$ matrix with eigenvector $\vec f_X^{\,i}$ in its $i$th column. Thus

$$
\begin{aligned}
\operatorname{Cov}(\vec X, \vec\beta^{\,X}) &= \operatorname{Cov}(\vec X, f_X^{\,T}\vec X) \\
&= \operatorname{Cov}(\vec X, \vec X)\, f_X = \Sigma_{XX}\, f_X .
\end{aligned}
$$

However, substituting equation (14.14) for $\vec X$, we also have

$$
\begin{aligned}
\operatorname{Cov}(\vec X, \vec\beta^{\,X}) &= \operatorname{Cov}(\mathcal{F}_X \vec\beta^{\,X}, \vec\beta^{\,X}) \\
&= \mathcal{F}_X \operatorname{Cov}(\vec\beta^{\,X}, \vec\beta^{\,X}) = \mathcal{F}_X
\end{aligned}
$$

since $\operatorname{Cov}(\vec\beta^{\,X}, \vec\beta^{\,X}) = I$. Thus

$$
\mathcal{F}_X = \Sigma_{XX}\, f_X \tag{14.15}
$$

and similarly

$$
\mathcal{F}_Y = \Sigma_{YY}\, f_Y . \tag{14.16}
$$

The columns of $\mathcal{F}_X$ and $\mathcal{F}_Y$, $\vec F_X^{\,i}$ and $\vec F_Y^{\,i}$, are called the canonical correlation patterns.⁶ The canonical variates $\beta_i^X$ and $\beta_i^Y$ are also often called canonical correlation coordinates. Since the canonical correlation coordinates are normalized to unit variance, the canonical correlation patterns are expressed in the units of the field they represent, and they indicate the 'typical' strength of the mode of covariation described by the patterns.

While the matrix-vector representations of $\vec X$ and $\vec Y$ in (14.14) are convenient for the derivation of $\mathcal{F}_X$ and $\mathcal{F}_Y$, they are not very evocative. Therefore, note that (14.14) can also be written as

$$
\begin{aligned}
\vec X &= \sum_i \beta_i^X\, \vec F_X^{\,i} \\
\vec Y &= \sum_i \beta_i^Y\, \vec F_Y^{\,i} .
\end{aligned}
\tag{14.17}
$$

This allows us to see more clearly that (14.14) describes an expansion of $\vec X$ and $\vec Y$ with respect to their corresponding canonical correlation patterns. It also suggests that it may be possible to approximate $\vec X$ and $\vec Y$ by truncating the summation in (14.17).

14.1.4 Computational Aspects. Once we know one set of vectors, say $\vec f_X^{\,i}$, all other vectors are easily obtained through simple matrix operations. Let us assume that we have the vectors $\vec f_X^{\,i}$. Then (14.15) yields $\vec F_X^{\,i}$. In [14.1.1] we noted that $\Sigma_{YY}^{-1}\Sigma_{XY}^{T}\vec f_X^{\,i}$ is equal to $\vec f_Y^{\,i}$ after suitable normalization. Application of (14.16) gives the last set of vectors, the Y-canonical correlation patterns $\vec F_Y^{\,i}$. It is therefore necessary to solve only the smaller of the two eigenproblems (14.10) and (14.11).

14.1.5 Coordinate Transformations. What happens to the canonical correlation patterns and correlations when coordinates are transformed by an invertible matrix $L$ through $L\vec X = \vec Z$? For simplicity we assume random vector $\vec Y$ is unchanged.

To get the same maximum correlation (14.1), we have to transform the patterns $\vec f_X^{\,i}$ with $L^{-1}$,

$$
\vec f_Z^{\,i} = (L^{-1})^T\, \vec f_X^{\,i} . \tag{14.18}
$$

Thus the canonical correlation coordinates $\beta_i^X = \langle \vec f_Z^{\,i}, \vec Z \rangle = \langle \vec f_X^{\,i}, \vec X \rangle$ are unaffected by the transformation. Note that relation (14.18) can also be obtained by verifying that $\vec f_Z^{\,i}$ and $\vec f_X^{\,i}$ are eigenvectors of the CCA matrices $\Sigma_{ZZ}^{-1}\Sigma_{ZY}\Sigma_{YY}^{-1}\Sigma_{ZY}^{T}$ and $\Sigma_{XX}^{-1}\Sigma_{XY}\Sigma_{YY}^{-1}\Sigma_{XY}^{T}$ with the same eigenvalues.

The canonical correlation patterns $\vec F_X^{\,i}$ are determined by the covariance matrix of $\vec X$ and the $\vec f_X^{\,i}$-pattern (14.15). Therefore,

$$
\begin{aligned}
\vec F_Z^{\,i} &= \Sigma_{ZZ}\, \vec f_Z^{\,i} \\
&= L\,\Sigma_{XX}\,L^T (L^{-1})^T\, \vec f_X^{\,i} \\
&= L\,\Sigma_{XX}\, \vec f_X^{\,i} \\
&= L\, \vec F_X^{\,i} .
\end{aligned}
\tag{14.19}
$$

Thus the canonical correlation patterns are transformed in the same way as the random vector $\vec X$. We may conclude that the CCA is invariant under coordinate transformations.

14.1.6 CCA after a Transformation to EOF Coordinates. The CCA algebra becomes considerably simpler if the data are transformed into EOF space before the analysis (Barnett and Preisendorfer [21]). Suppose that only the first $k_X$ and $k_Y$ EOFs are retained, so that

$$
\begin{aligned}
\vec X &\approx \sum_{i=1}^{k_X} \alpha_i^{X+}\, \vec e_X^{\,i+} \\
\vec Y &\approx \sum_{i=1}^{k_Y} \alpha_i^{Y+}\, \vec e_Y^{\,i+} ,
\end{aligned}
\tag{14.20}
$$

where we have used the renormalized versions (13.20, 13.21) of the EOFs and their coefficients $\alpha_i^{+} = (\lambda_i)^{-1/2}\alpha_i$ and $\vec e^{\,i+} = (\lambda_i)^{1/2}\vec e^{\,i}$. The CCA is then applied to the random vectors $\vec X = (\alpha_1^{X+}, \ldots, \alpha_{k_X}^{X+})^T$ and $\vec Y = (\alpha_1^{Y+}, \ldots, \alpha_{k_Y}^{Y+})^T$.

An advantage of this approach is that it is

6 Note that neither the eigenvectors $\vec f_X^{\,i}$ and $\vec f_Y^{\,i}$ nor the canonical correlation patterns $\vec F_X^{\,i}$ and $\vec F_Y^{\,i}$ are generally orthogonal. However, the columns of $\Sigma_{XX}^{1/2} f_X = \Sigma_{XX}^{-1/2}\mathcal{F}_X$

’1/2

1/2