numerator of (14.34), so that

    R²(Y : Q_{m_X}^T X) = R²(Y : X).        (14.36)

The implication of (14.36) is that the coordinate system in which the random vector X is given does not matter, so long as it describes the same linear space. This is a favourable property, since the information contained in X about Y should not depend on the specifics of the presentation of X, such as the metric used to measure the components of X, or the order of its components.

However, if the linear transformation Q_k maps the m_X-dimensional variable X onto a k-dimensional variable X_k = Q_k^T X, the new variable contains less information about Y, so that

    R²(Y : X_k) ≤ R²(Y : X_{k+1}) ≤ ··· ≤ R²(Y : X_{m_X}) = R²(Y : X).        (14.37)

that the index of redundancy (i.e., the amount of Y-variance explained through the regression of B_k^T X on Y) is maximized for any k = 1, ..., min(m_X, m_Y). Matrix B_k contains the first k columns of B. Thus redundancy analysis determines the k-dimensional subspace that allows for the most efficient regression on Y. Since we are free to choose the coordinates of this subspace, we may use a linear basis with k orthogonal patterns that satisfies (14.39), so that the redundancy index may be expressed specifically as (14.40).

The following theorem identifies a second set of patterns, A = (a_1 | a_2 | ··· | a_k), that represent an orthogonal partitioning of the variance of Y that is accounted for by the regression of Y on X. More specifically, the regression maps the subspace

¹³ The column space of a matrix Q is the vector space spanned by the columns of Q.

14: Canonical Correlation Analysis    330

represented by X_k onto the space spanned by the first k columns of A.

The following subsections describe the mathematics required for the determination of matrices A and B. The theorems are taken from Tyler's paper [376].

14.4.5 The Redundancy Analysis Transformations. For any random vectors Y of dimension m_Y and X of dimension m_X, there exists an orthonormal transformation A and a non-singular transformation B such that

    Cov(B^T X, B^T X) = I        (14.41)
    Cov(A^T Y, B^T X) = D        (14.42)

where D is an m_Y × m_X matrix with elements d_ij = 0 for i ≠ j and diagonal elements d_jj = √λ_j for j ≤ min(m_X, m_Y).

The proof, which is detailed in Appendix M, revolves around two eigen-equations:

    Σ_YX Σ_XX^{-1} Σ_XY a_j = λ_j a_j        (14.43)
    Σ_XX^{-1} Σ_XY Σ_YX b_j = λ_j b_j.        (14.44)

Both equations have the same positive eigenvalues λ_j, and the eigenvectors a_j and b_j belonging to the same nonzero eigenvalue λ_j are related through

    b_j = (1/√λ_j) Σ_XX^{-1} Σ_XY a_j.        (14.45)

The matrices A and B, which are composed of eigenvectors a_j and b_j, respectively, are the only matrices that satisfy the requirements of the theorem.

From the computational point of view, it is advisable to solve the eigenproblem with the Hermitian matrix (14.43), then use the identity (14.45). Since (14.43) is a Hermitian problem, all eigenvectors a_j are real valued, and since (14.45) involves only real matrices, the 'patterns' b_j are also real valued.

14.4.6 Theorem: Optimality of the Redundancy Transformation. The significance of the redundancy transformation originates from the following theorem given by Tyler [376]:

The redundancy index R²(Y : Q_k^T X) is maximized by setting Q_k = B_k, where B_k is the m_X × k matrix that contains the k eigenvectors satisfying (14.42) that correspond to the k largest eigenvalues.

Note that the statement holds for all k ≤ m_X. Thus, among all possible single patterns q, the eigenvector b_1 belonging to the largest eigenvalue of the matrix Σ_XX^{-1} Σ_XY Σ_YX provides the maximum information, in a linear sense, about the variance of Y:

    R²(Y : q^T X) ≤ R²(Y : b_1^T X)        (14.46)

for any m_X-dimensional vector q. Moreover, by equations (14.41) and (14.40), the index of redundancy takes a particularly simple form,

    R²(Y : B_k^T X) = Σ_{j=1}^{k} R²(Y : b_j^T X).        (14.47)

Also, note that inequality (14.46) may be generalized to

    Σ_{j=1}^{k} R²(Y : q_j^T X) ≤ Σ_{j=1}^{k} R²(Y : b_j^T X)        (14.48)

for any set of vectors q_1, ..., q_k.

14.4.7 The Role of Matrix A. Since B = (b_1 | ··· | b_{m_X}) is non-singular, random vector X can be expanded in the usual manner as

    X = Σ_{j=1}^{m_X} (X^T b_j) p_j,        (14.49)

where the adjoint patterns P = (p_1 | ··· | p_{m_X}) are given by P = (B^T)^{-1}. When re-expressed in matrix-vector form, equation (14.49) simply reads as

    X = P B^T X.

Similarly, since A is orthonormal, the part of Y that can be represented by X, that is, Ŷ, can be expanded as

    Ŷ = A A^T Ŷ = Σ_j (Ŷ^T a_j) a_j.        (14.50)

When we regress Y on X, we find that Ŷ = Σ_YX Σ_XX^{-1} X. Thus the expansion coefficients in (14.50), the elements of A^T Ŷ, are given by

    A^T Ŷ = A^T Σ_YX Σ_XX^{-1} X.

Now, from equations (14.41) and (14.42) we have that Σ_XX^{-1} = B B^T and A^T Σ_YX B = D. Thus

    A^T Ŷ = A^T Σ_YX B B^T X = D B^T X.
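The construction and the optimality properties above lend themselves to a compact numerical check. The sketch below (illustrative variable names; NumPy assumed) takes the redundancy index in its usual trace form, R²(Y : X) = tr(Σ_YX Σ_XX⁻¹ Σ_XY) / tr(Σ_YY) — presumably the content of (14.34), which is not reproduced here. It solves the Hermitian problem (14.43), builds B via (14.45), and verifies (14.41), (14.42), the basis invariance (14.36), the nesting inequality (14.37), the additive decomposition (14.47), and the coefficient identity A^T Ŷ = D B^T X. Only the leading min(m_X, m_Y) columns of B are constructed; a full non-singular B would require further columns for the zero eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(1)
n, mX, mY = 5000, 4, 3
X = rng.standard_normal((n, mX))
Y = X @ rng.standard_normal((mX, mY)) + 0.5 * rng.standard_normal((n, mY))
Xc, Yc = X - X.mean(0), Y - Y.mean(0)
Sxx = Xc.T @ Xc / (n - 1)              # Sigma_XX
Sxy = Xc.T @ Yc / (n - 1)              # Sigma_XY  (m_X x m_Y)
Syx = Sxy.T                            # Sigma_YX
Syy = Yc.T @ Yc / (n - 1)              # Sigma_YY

def redundancy(Z):
    """R^2(Y : Z) in trace form, for predictor data Z (rows = realizations)."""
    Zc = Z - Z.mean(0)
    Szz = Zc.T @ Zc / (n - 1)
    Syz = Yc.T @ Zc / (n - 1)
    return np.trace(Syz @ np.linalg.solve(Szz, Syz.T)) / np.trace(Syy)

# Hermitian eigenproblem (14.43): Sigma_YX Sigma_XX^-1 Sigma_XY a_j = lambda_j a_j
M = Syx @ np.linalg.solve(Sxx, Sxy)
lam, A = np.linalg.eigh(M)             # eigh returns ascending eigenvalues
order = np.argsort(lam)[::-1]
lam, A = lam[order], A[:, order]       # largest eigenvalue first

# Identity (14.45): b_j = Sigma_XX^-1 Sigma_XY a_j / sqrt(lambda_j)
B = np.linalg.solve(Sxx, Sxy @ A) / np.sqrt(lam)
D = np.diag(np.sqrt(lam))              # d_jj = sqrt(lambda_j)

# Theorem requirements (14.41) and (14.42), restricted to the leading columns
assert np.allclose(B.T @ Sxx @ B, np.eye(mY))
assert np.allclose(A.T @ Syx @ B, D)

# (14.36): the index is invariant under an invertible change of basis of X
Q = rng.standard_normal((mX, mX))
assert np.isclose(redundancy(X @ Q), redundancy(X))

# (14.37): projecting onto nested subspaces can only lose information about Y
r = [redundancy(X[:, :k]) for k in range(1, mX + 1)]
assert all(r[i] <= r[i + 1] + 1e-12 for i in range(mX - 1))

# (14.47): the index for B_k^T X is the sum of the single-pattern indices
k = 2
lhs = redundancy(Xc @ B[:, :k])
rhs = sum(redundancy(Xc @ B[:, [j]]) for j in range(k))
assert np.isclose(lhs, rhs)

# Expansion coefficients, one realization per row: A^T Yhat = D B^T X
Yhat = Xc @ np.linalg.solve(Sxx, Sxy)  # regression of Y on X
assert np.allclose(Yhat @ A, Xc @ B @ D)
```

All of the checks hold to machine precision because each identity is exact in the sample covariance matrices themselves; no asymptotic argument is involved.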


Hence the expansion coefficients in (14.50) are given by

    Ŷ^T a_j = √λ_j X^T b_j.        (14.51)

Considering both (14.49) and (14.50), we see that the regression maps variations in the amplitude of X patterns p_j onto variations in the amplitude of Ŷ patterns a_j. On average, Ŷ = √λ_j a_j when X = p_j (cf. (14.49) and (14.51)). It is easily shown that the patterns themselves are related by¹⁴

    A D = Σ_YX Σ_XX^{-1} P.

That is, the X-patterns are transformed into scaled versions of the Ŷ-patterns by the regression.

14.4.8 Comparison with CCA. Let us now consider the special case in which Σ_XX and Σ_YY are both identity matrices. Then B and P are also orthonormal matrices, and the regressed patterns a_j, the EOFs of Ŷ, are the eigenvectors of Σ_YX Σ_XY. That is, X provides the most information about the component of Y that lies in the a_1 direction, where a_1 is the first eigenvector of Σ_YX Σ_XY. The best predictor of this component is X^T Σ_XY a_1.

When we perform CCA on the same system we must solve the paired eigenvalue problem

    Σ_XY Σ_YX f^X = λ f^X
    Σ_YX Σ_XY f^Y = λ f^Y.

The first pair of eigenvectors of this system is given by f^X = Σ_XY a_1, and f^Y = a_1, indicating
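The correspondence between the redundancy patterns and the CCA patterns in this whitened special case can be checked numerically. In the sketch below (illustrative names; NumPy assumed), X and Y are explicitly transformed to unit sample covariance, so that Σ_XX = Σ_YY = I holds exactly in-sample; the assertions then confirm that f^X = Σ_XY a_1 and f^Y = a_1 solve the paired CCA eigenproblem with the same eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(2)
n, mX, mY = 4000, 4, 3
X0 = rng.standard_normal((n, mX))
Y0 = X0 @ rng.standard_normal((mX, mY)) + rng.standard_normal((n, mY))

def whiten(Z):
    """Transform Z so that its sample covariance is exactly the identity."""
    Zc = Z - Z.mean(0)
    S = Zc.T @ Zc / (len(Zc) - 1)
    vals, vecs = np.linalg.eigh(S)
    return Zc @ vecs @ np.diag(vals ** -0.5) @ vecs.T   # Zc @ S^(-1/2)

X, Y = whiten(X0), whiten(Y0)
Sxy = X.T @ Y / (n - 1)                # Sigma_XY; Sigma_XX = Sigma_YY = I
Syx = Sxy.T

# Redundancy pattern: a_1 is the leading eigenvector of Sigma_YX Sigma_XY
lam, A = np.linalg.eigh(Syx @ Sxy)
a1, lam1 = A[:, np.argmax(lam)], lam.max()

# CCA pair: f^X = Sigma_XY a_1 (up to normalization) and f^Y = a_1
fX, fY = Sxy @ a1, a1
assert np.allclose(Sxy @ Syx @ fX, lam1 * fX, atol=1e-10)
assert np.allclose(Syx @ Sxy @ fY, lam1 * fY, atol=1e-10)
```

The second assertion is just the defining eigen-equation for a_1; the first follows because multiplying that equation by Σ_XY turns an eigenvector of Σ_YX Σ_XY into one of Σ_XY Σ_YX with the same eigenvalue.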