    ρ̂_XY = Σ_{i=1}^n (X_i − X̄)(Y_i − Ȳ) / √(Σ_{i=1}^n (X_i − X̄)² Σ_{i=1}^n (Y_i − Ȳ)²).      (8.4)

This is the maximum likelihood estimator [5.3.8] when (X, Y) is bivariate normally distributed.

As noted in Section 2.8, the correlation coefficient measures the tendency of X and Y to co-vary (see Example [2.8.12] and Figure 2.10); the greater |ρ|, the greater the ability of X to specify Y.
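As a concrete illustration (not part of the original text), estimator (8.4) can be computed directly from a paired sample; the function name below is ours:

```python
import math

def sample_correlation(x, y):
    """Sample correlation coefficient, estimator (8.4):
    sum((x_i - xbar)(y_i - ybar)) / sqrt(sum((x_i - xbar)^2) * sum((y_i - ybar)^2))."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)
```

For perfectly linearly related samples the estimator returns ±1, reflecting the bound |ρ̂_XY| ≤ 1.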

Suppose that X and Y are bivariate normally distributed with means µ_X and µ_Y, variances σ_X² and σ_Y², and correlation coefficient ρ_XY. Their joint density function is given by (2.35). Suppose also that only X is observable and we want to find a function, say g(X), that specifies the value of Y as accurately as possible on average. A reasonable measure of accuracy is the mean squared error, given by

    E((Y − g(X))²).                                         (8.3)

It can be shown that

    g(X) = µ_Y + ρ_XY (σ_Y/σ_X)(X − µ_X)

minimizes (8.3) when g is linear in X and that the mean squared error is σ_Y²(1 − ρ_XY²). To reduce the mean squared error to less than 50% of the variance of Y, it is necessary that |ρ_XY| > 1/√2. That is, X represents at least 50% of the variance of Y when |ρ_XY| > 1/√2. To reduce the root mean squared error to less than 50% of the standard deviation of Y it is necessary that |ρ_XY| > √3/2 ≈ 0.87.

Using the estimated correlation ρ̂ = 0.667 between Wright's [426] monthly SST index and the monthly SOI [8.1.4], we estimate that the mean squared error of the best linear specification of the SO index is 56% of its variance.
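The linear specification and its relative mean squared error can be sketched as follows (the function names are ours; the fraction 1 − ρ² follows from the text above, and for ρ̂ = 0.667 it evaluates to about 0.56):

```python
import math

def best_linear_g(x, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Best linear specification of Y from X, the minimizer of (8.3):
    g(X) = mu_Y + rho * (sigma_Y / sigma_X) * (X - mu_X)."""
    return mu_y + rho * (sigma_y / sigma_x) * (x - mu_x)

def relative_mse(rho):
    """Mean squared error of g(X) as a fraction of Var(Y): 1 - rho^2."""
    return 1.0 - rho ** 2
```

Note that relative_mse(1/√2) = 0.5, which is the 50% threshold quoted above, and that g(µ_X) = µ_Y.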

Furthermore, (8.4) is asymptotically normally distributed with mean ρ_XY and variance (1 − ρ_XY²)²/n. However, because ρ̂_XY converges slowly to its asymptotic distribution, this result is generally not used to make inferences about ρ_XY. Instead, inferences are based on Fisher's z-transform,

    z = (1/2) ln[(1 + ρ̂_XY)/(1 − ρ̂_XY)],                  (8.5)

which converges quickly to the normal distribution N((1/2) ln[(1 + ρ_XY)/(1 − ρ_XY)], 1/(n − 3)) when ρ_XY is nonzero. It is then easily demonstrated that an approximate p̃ × 100% confidence interval for ρ_XY is given by

    (tanh(z_L), tanh(z_U)),                                 (8.6)

where

    z_L = z − Z_(1+p̃)/2 / √(n − 3)
    z_U = z + Z_(1+p̃)/2 / √(n − 3),

and Z_(1+p̃)/2 is the (1 + p̃)/2-quantile of the standard normal distribution (see Appendix D). David [100] (see also Pearson and Hartley [308]) gives tables for exact confidence intervals for ρ_XY.

In the SOI example ρ̂_SST,SOI = 0.667 and thus z = 0.805. For p̃ = 0.95, Z_(1+p̃)/2 = 1.96, so that z_L = 0.805 − 1.96/√621 = 0.727, assuming that each of the 52 × 12 months in the index series is independent.


This latter assumption is, of course, invalid, but it serves our pedagogical purposes at this point. Similarly, z_U = 0.884. Finally, from (8.6) we obtain (0.621, 0.708) as the 95% confidence interval for ρ_SST,SOI. This interval is almost symmetric about ρ̂_SST,SOI because the sample size is large; it will be less symmetric for smaller samples. Note also that this confidence interval is probably too narrow because it does not account for dependence within the data.
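The confidence interval (8.6) is straightforward to compute; the sketch below (function name ours, with the 95% quantile 1.96 supplied as a default) reproduces the SOI example, where n = 52 × 12 = 624 months:

```python
import math

def fisher_z_interval(r, n, z_quantile=1.96):
    """Approximate confidence interval (8.6) for the correlation coefficient.
    z_quantile is the (1 + p)/2-quantile of the standard normal
    distribution (1.96 for a 95% interval)."""
    z = 0.5 * math.log((1 + r) / (1 - r))    # Fisher's z-transform (8.5)
    half_width = z_quantile / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)

# SOI example: r = 0.667 from n = 624 monthly values.
lo, hi = fisher_z_interval(0.667, 52 * 12)
```

Rounded to three decimals this gives (0.621, 0.708), matching the interval quoted in the text; like that interval, it treats the 624 months as independent.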

An approximate test of H0: ρ_XY = 0 can be performed by computing

    T = |ρ̂_XY| √((n − 2)/(1 − ρ̂_XY²))                     (8.7)

and comparing T with critical values from the t distribution with n − 2 degrees of freedom (see Appendix F). The type of test, one sided or two sided, is determined by the form of the alternative hypothesis; critical values for one-sided tests are obtained analogously.
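Statistic (8.7) is easily coded; in the sketch below (function name ours) we apply it to the SOI example, where with n − 2 = 622 degrees of freedom the two-sided 5% critical value of the t distribution is approximately the normal quantile 1.96:

```python
import math

def correlation_t_statistic(r, n):
    """Test statistic (8.7) for H0: rho = 0; compare with critical values
    of the t distribution with n - 2 degrees of freedom."""
    return abs(r) * math.sqrt((n - 2) / (1 - r ** 2))

# SOI example: r = 0.667, n = 624; T is far beyond any conventional
# critical value, so H0: rho = 0 is rejected.
T = correlation_t_statistic(0.667, 624)
```

Here T ≈ 22.3, so the null hypothesis of zero correlation is rejected at any conventional significance level (subject, again, to the independence assumption).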

Confidence interval (8.6) and test (8.7) both require the normal assumption. A non-parametric approach based on ranks can be used when the observations are thought not to be normal. The sample {(X_i, Y_i): i = 1, …, n} is replaced by the corresponding sample of ranks {(R_Xi, R_Yi): i = 1, …, n}, where R_Xi is the rank of X_i amongst the Xs and R_Yi is defined similarly. The dependence between X and Y is then estimated with the Spearman rank correlation coefficient ρ̂S,

    ρ̂S = (Σ_{i=1}^n R_Xi R_Yi − N) / √((Σ_{i=1}^n R_Xi² − N)(Σ_{i=1}^n R_Yi² − N)),      (8.8)

where

    N = n((n + 1)/2)².

This is just the ordinary sample correlation coefficient (8.4) of the ranks.
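A minimal sketch of (8.8), assuming no ties in either sample (tied observations would require midranks, which this illustration omits; function names are ours):

```python
import math

def ranks(values):
    """Rank of each value amongst the sample (1 = smallest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_correlation(x, y):
    """Spearman rank correlation coefficient (8.8), the ordinary sample
    correlation (8.4) applied to the ranks."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    big_n = n * ((n + 1) / 2) ** 2    # N = n((n + 1)/2)^2
    num = sum(a * b for a, b in zip(rx, ry)) - big_n
    den = math.sqrt((sum(a * a for a in rx) - big_n) * (sum(b * b for b in ry) - big_n))
    return num / den
```

Because only ranks enter (8.8), any strictly monotone relationship between X and Y yields ρ̂S = ±1, even when the relationship is far from linear.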

Note that −1 ≤ ρ̂S ≤ 1, and that ρ̂S = +1 when the rank orders

In contrast to tests of the mean (see Section 6.6), inference about the correlation coefficient seems to be relatively weakly affected by serial correlation, at least when correlations are small [442]. A resampling scheme that further reduces the impact of serial correlation on inferences made about the correlation coefficient is described by [110].

8.2.4 More Interpretations of Correlation.

The correlation coefficient can also be interpreted as a measure of the proportion of the variance of one variable, say Y, that can be represented by constructing a linear model of the dependence of the mean of Y upon X. Assume that (X, Y) are bivariate normally distributed with joint density function f_XY(x, y) given by (2.35). We factor f_XY(x, y) into the product of the density function of Y conditional upon X = x and the marginal density function of X (see Sections 2.5 and 2.8) to obtain

    f_{Y|X=x}(y|X = x) = f_XY(x, y)/f_X(x)
                       = exp(−(y − µ_{Y|X=x})²/(2σ_Y²(1 − ρ_XY²))) / √(2π σ_Y²(1 − ρ_XY²)),

where

    µ_{Y|X=x} = µ_Y − ρ_XY (σ_Y/σ_X)(µ_X − x).

The variance of Y conditional upon X = x is σ_Y²(1 − ρ_XY²), the same factor discovered in [8.2.2] when we considered X as a predictor of Y. The conditional variance does not depend upon the specific realized value of X. The mean of Y varies linearly with the realized value of X when ρ_XY is nonzero. Note that the mean of one of the pair of variables is completely determined by the realized value of the other. The squared correlation