related. In other words, they are the best linearly auto-predictable components in $Z_t$.⁸ An example is given in [14.3.7].

⁸ It seems that the idea was first suggested by Hasselmann in an unpublished paper in 1983, but it was not pursued until 1996 [103].

14.2 Estimating Canonical Correlation Patterns

14.2.1 Estimation. Estimates of canonical correlation patterns and coefficients are obtained in the obvious way by replacing $\Sigma_{XX}$, $\Sigma_{YY}$, and $\Sigma_{XY}$ with corresponding estimates. We recommend that the problem be kept small by approximating the data with truncated EOF expansions (see [14.1.5] and also Bretherton et al. [64]). This has the added benefit of eliminating small-scale spatial noise.

14.2.2 Making Inferences. As noted previously, very little is known about the sampling variability of the eigenvectors or canonical correlation patterns. However, there are some useful asymptotic results for making inferences about the canonical correlations themselves.

Bartlett [32] proposed a test of the null hypothesis $H_0: \rho_{l+1} = \cdots = \rho_m = 0$ that the last $m - l$ canonical correlations are zero when it is known that the first $l$ are nonzero. Here $m = \min(m_X, m_Y)$. Bartlett's test can be used when the canonical correlations have been estimated from a sample $\{(x_1, y_1), \ldots, (x_n, y_n)\}$ of independent realizations of random vectors $X$ and $Y$ that are jointly multivariate normal. The test statistic (Bartlett [32])

$$\chi^2 = -\Big(n - 1 - l - \tfrac{1}{2}(m_X + m_Y + 1) + \sum_{i=1}^{l} \hat\rho_i^{-2}\Big) \ln \prod_{i=l+1}^{m} \big(1 - \hat\rho_i^2\big), \tag{14.25}$$

where $\hat\rho_i = \sqrt{\hat\lambda_i}$, is approximately distributed as $\chi^2\big((m_X - l)(m_Y - l)\big)$ under $H_0$. The test is … and $z_i = \frac{1}{2}\ln\frac{1+\hat\rho_i}{1-\hat\rho_i}$, then the bias of

$$\theta_i = z_i - \frac{1}{2n\hat\rho_i}\Big(m_X + m_Y - 2 + \hat\rho_i^2 + 2(1 - \hat\rho_i^2)\sum_{j=1,\, j \neq i}^{m} \frac{\hat\rho_j^2}{\hat\rho_i^2 - \hat\rho_j^2}\Big)$$

as an estimator of $\frac{1}{2}\ln\frac{1+\rho_i}{1-\rho_i}$ is approximately $O(n^{-2})$, and

$$\mathrm{Var}(\theta_i) = \frac{1}{n} + O(n^{-2}).$$

Thus the bounds for an approximate $\tilde p \times 100\%$ confidence interval for $\rho_i$ are given by

$$\tanh\big(\theta_i \pm z_{(1+\tilde p)/2}/\sqrt{n}\big), \tag{14.26}$$

where $z_{(1+\tilde p)/2}$ is the $(1+\tilde p)/2$-quantile of the standard normal distribution (Appendix D). Muirhead and Waternaux [282] show that asymptotic statistics like equations (14.25) and (14.26) are not particularly robust against departures from the multivariate normal assumption. Use of the bootstrap (see Section 5.5) is probably the best practical alternative when this is a concern.

One question rarely mentioned in the context of CCA is the size of sample needed to make good estimates and inferences. Thorndike [365, pp. 183–184] suggests that $n > 10(m_X + m_Y) + 50$ is a reasonable rule of thumb, and argues that $n > (m_X + m_Y)^2 + 50$ may be needed for some purposes. Our experience, however, is that much smaller samples can provide meaningful information about the first few patterns and correlations. Be aware, though, that the asymptotic results discussed above are not likely to hold under these circumstances. The Monte Carlo experiments discussed in the next subsection give some further insight into what can be accomplished with small samples.
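The estimation and inference steps of [14.2.1] and [14.2.2] can be sketched in Python with NumPy. This is a minimal illustration, not the authors' code: the function names `canonical_correlations`, `bartlett_test` and `confidence_interval` are hypothetical; the canonical correlations are computed as the singular values of $S_{XX}^{-1/2} S_{XY} S_{YY}^{-1/2}$ (one standard formulation), with the inputs assumed to be EOF coefficients already truncated to a few modes; and the bias correction in $\theta_i$ is evaluated with the estimated correlations. The example data (three pairs of variables with true canonical correlations 0.8, 0.5 and 0) are made up.

```python
import math
from statistics import NormalDist

import numpy as np


def canonical_correlations(x, y):
    """Sample canonical correlations of x (n, m_X) and y (n, m_Y), largest first.

    Assumes x and y hold (truncated) EOF coefficients, so m_X and m_Y are small.
    """
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]
    sxx, syy, sxy = x.T @ x / (n - 1), y.T @ y / (n - 1), x.T @ y / (n - 1)

    def inv_sqrt(s):
        # Symmetric inverse square root from the eigendecomposition of s.
        w, v = np.linalg.eigh(s)
        return v @ np.diag(1.0 / np.sqrt(w)) @ v.T

    return np.linalg.svd(inv_sqrt(sxx) @ sxy @ inv_sqrt(syy), compute_uv=False)


def bartlett_test(rho, n, m_x, m_y, l):
    """Bartlett's statistic (14.25) for H0: rho_{l+1} = ... = rho_m = 0.

    Returns (chi2, dof); compare chi2 with an upper quantile of chi^2(dof).
    """
    m = min(m_x, m_y)
    rho = np.asarray(rho, dtype=float)
    factor = n - 1 - l - 0.5 * (m_x + m_y + 1) + np.sum(rho[:l] ** -2.0)
    chi2 = -factor * np.sum(np.log(1.0 - rho[l:m] ** 2))  # ln of the product
    return chi2, (m_x - l) * (m_y - l)


def confidence_interval(rho, i, n, m_x, m_y, p=0.95):
    """Approximate p*100% interval (14.26) for the i-th (0-based) correlation."""
    rho = np.asarray(rho, dtype=float)[: min(m_x, m_y)]
    r = rho[i]
    z = 0.5 * math.log((1 + r) / (1 - r))  # Fisher z of the estimate
    others = np.delete(rho, i)
    s = np.sum(others**2 / (r**2 - others**2))
    theta = z - (m_x + m_y - 2 + r**2 + 2 * (1 - r**2) * s) / (2 * n * r)
    half = NormalDist().inv_cdf((1 + p) / 2) / math.sqrt(n)
    return math.tanh(theta - half), math.tanh(theta + half)


if __name__ == "__main__":
    # Hypothetical data: y_j = a_j x_j + sqrt(1 - a_j^2) e_j, so the true
    # canonical correlations are a = (0.8, 0.5, 0).
    rng = np.random.default_rng(0)
    n, a = 2000, np.array([0.8, 0.5, 0.0])
    x = rng.standard_normal((n, 3))
    y = a * x + np.sqrt(1 - a**2) * rng.standard_normal((n, 3))
    rho = canonical_correlations(x, y)
    print("estimated rho:", np.round(rho, 3))
    for l in range(3):
        print("l =", l, "(chi2, dof) =", bartlett_test(rho, n, 3, 3, l))
    print("95% interval for rho_1:", confidence_interval(rho, 0, n, 3, 3))
```

With $l = 0$ the statistic should be far above its 9 degrees of freedom, while with $l = 2$ it should be of the order of its single degree of freedom. The caveats above still apply: the sketch relies on multivariate normality and independent samples.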


                  EOF truncation k = 20           length n = 250
mode i   ρ_i^xy   n=50   n=100  n=500  n=1000     k=10   k=30   k=50
  1      0.69     0.96   0.83   0.70   0.69       0.68   0.71   0.74
  2      0.60     0.92   0.76   0.59   0.58       0.58   0.61   0.65
  3      0.37     0.79   0.51   0.33   0.31       0.30   0.36   0.43
  4      0.11     0.54   0.28   0.10   0.09       0.06   0.16   0.27
  5      0.07     0.46   0.23   0.08   0.06       0.03   0.13   0.25

Table 14.1: The means of 100 canonical correlation estimates computed from simulated samples of n pairs of 251-dimensional random fields (see text). For brevity, only five of the 10 canonical correlations are listed. The true correlations ρ_i^xy are given in the second column; the results obtained for variable time series lengths n, with an EOF truncation of k = 20, are given in columns three to six. The effect of including different numbers of EOFs k, using a fixed time series length of n = 250, is listed in columns seven to nine. From Borgert [55].
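The pattern in Table 14.1 can be mimicked qualitatively with a much simpler simulation. This sketch is not Borgert's setup: it simulates the $k$ retained EOF coefficients directly instead of 251-point fields, draws serially independent (not auto-regressive) canonical variates, and assumes two hypothetical nonzero true canonical correlations, 0.7 and 0.4, with all others zero. The helper `mean_canonical_correlations` is hypothetical; like the table, it averages the sample canonical correlations over 100 replications.

```python
import numpy as np


def canonical_correlations(x, y):
    """Sample canonical correlations (largest first) of x (n, k) and y (n, k)."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)

    def inv_sqrt(s):
        # Symmetric inverse square root; the 1/(n-1) scalings cancel overall.
        w, v = np.linalg.eigh(s)
        return v @ np.diag(1.0 / np.sqrt(w)) @ v.T

    m = inv_sqrt(x.T @ x) @ (x.T @ y) @ inv_sqrt(y.T @ y)
    return np.linalg.svd(m, compute_uv=False)


def mean_canonical_correlations(n, k, rho_true, reps=100, seed=0):
    """Average the k sample canonical correlations over `reps` simulated samples.

    Column j of x and y stands in for the j-th retained EOF coefficient; pair j
    has true canonical correlation rho_true[j] (zero beyond len(rho_true)).
    """
    rng = np.random.default_rng(seed)
    a = np.zeros(k)
    a[: len(rho_true)] = rho_true
    total = np.zeros(k)
    for _ in range(reps):
        x = rng.standard_normal((n, k))
        y = a * x + np.sqrt(1.0 - a**2) * rng.standard_normal((n, k))
        total += canonical_correlations(x, y)
    return total / reps


if __name__ == "__main__":
    rho_true = [0.7, 0.4]  # hypothetical true canonical correlations
    for n in (50, 100, 500, 1000):
        est = mean_canonical_correlations(n, 20, rho_true)[:3]
        print(f"k = 20, n = {n:4d}:", np.round(est, 2))
    for k in (5, 20, 30):
        est = mean_canonical_correlations(250, k, rho_true)[:3]
        print(f"n = 250, k = {k:2d}:", np.round(est, 2))
```

The averages should drift upward as $n$ shrinks or $k$ grows, most visibly for the weaker modes, which is the kind of bias Table 14.1 reports.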

14.2.3 Monte Carlo Experiments. Borgert [55] conducted a Monte Carlo study of the performance of CCA on EOF-truncated [14.1.6] data. He simulated a pair of 251-point random fields X and Y that consisted of a random linear combination of 10 pairs of patterns. Each pair of patterns was multiplied by a pair of random coefficients that were independent of all other pairs of coefficients; thus the random coefficients are the true canonical variables. Each pair of random coefficients was generated from a different bivariate auto-regressive process, so the cross-correlations between the pairs of canonical variates, that is, the true canonical correlations, were known. Borgert was therefore able to simulate a pair of random fields with known canonical correlations and patterns.

Borgert used this tool to generate 100 independent samples for a number of combinations of sample size n and EOF truncation point k = k_X = k_Y. A canonical correlation analysis was performed on each sample, and statistics assessing the average quality of the CCA were gathered for each combination of n and k. He found that the CCA was indeed able to identify the correct pairs of patterns: the estimated patterns were close to the prescribed patterns. However, as exemplified in Table 14.1, there were considerable biases in the estimated correlations if too many EOFs were retained or if the time series were too short.

Bretherton et al. [64] reviewed a number of techniques for diagnosing coupled patterns and intercompared them in a series of small Monte Carlo experiments. They found that CCA with a priori EOF truncation and Maximum Covariance Analysis were more robust than the other techniques considered.

14.2.4 Irregularly Distributed Gaps in the Data. One way to cope with missing data is to fill the gaps by spatial or temporal interpolation. However, this is unsatisfactory if more than a small amount of data is missing, because we end up trying to diagnose connections between real data on the one hand and imputed data with much lower information content on the other. A better procedure is to use only the data that are actually available. This can be achieved by the procedure already outlined in [13.2.7]. The various matrices, such as $\Sigma_{XX}$, are estimated by forming sums over only the available pairs of observations (cf. (13.31)):

$$\sigma_{ij} = \frac{1}{|K_i \cap K_j|} \sum_{k \in K_i \cap K_j} (x_{ki} - \mu_i)(x_{kj} - \mu_j)^*,$$

where $K_i = \{k : \text{component } i \text{ of } x_k \text{ is not missing}\}$, the notation $|\cdot|$ indicates the number of elements in a set, and $\mu_i = \frac{1}{|K_i|}\sum_{k \in K_i} x_{ki}$. As with EOFs, the calculation of the time coefficients can no longer be done by means of the dot products (14.12) and (14.13). Instead, the coefficients are determined by least squares, as in equation (13.32).

14.3 Examples

14.3.0 Overview. We will present three examples in this section. The joint variability of a pair of large-scale fields is examined for evidence of a cause-and-effect relationship between the occurrence of large-scale sea-level air pressure and sea-surface temperature anomalies in the North