number of classes and equitable.

identification. From Murphy et al. [283].

18.2 The Skill of Quantitative Forecasts

18.2.1 Forecast and Predictand as Bivariate Random Variable. As we noted in [18.0.3], the forecast/predictand pair $(F, P)$ forms a bivariate random variable with a joint density function $f_{FP}$. The conditional density functions $f_{F|P=p}$ and $f_{P|F=f}$ tell us something about the performance of the forecast. (For a detailed discussion see Murphy and Winkler [286] and Murphy et al. [283].)

First, one would hope that $E(F|P = p) = p$ and that $E(P|F = f) = f$. That is, the mean of all forecasts $F$, given a predictand $P = p$, is $p$, and the mean of all predictands $P$ is $f$ when averaged over all occasions when $F = f$. If the former condition is satisfied, the forecast is called conditionally unbiased.

The conditional variances $\mathrm{Var}(F|P = p)$ and $\mathrm{Var}(P|F = f)$ are ideally small.

Note that the forecast $F$ and the predictand $P$ can be statistically associated. Let us choose $a$ and $b$ so that

$$E\left[(F - (a + bP))^2\right]$$

is minimized. The line $a + bP$ is the regression of $F$ on $P$. Two necessary conditions for the forecast to be unbiased are that $a$ is zero and $b$ is 1, so that the regression line is the 45° diagonal in the two-dimensional $(F, P)$-plane.

18.2.2 Joint distributions. The joint $(F, P)$-density may be crudely estimated by plotting a scatter diagram, in which each realization $(f, p)$ is marked by a dot. Alternatively one could group all realizations $(f, p)$ into small boxes and display the number of entries per box. An example of such a diagram is shown in Figure 18.2 (Murphy et al. [283]). The forecast is for temperature for Minneapolis (Minnesota) at a 24-hour lead during winter. Forecasts that are 'correct' or 'near correct' are marked by open circles. The maximum density estimate usually lies on the diagonal $f = p$, but for forecasts $F \le 28$ °F the corresponding observed temperatures tend to be systematically lower than the forecast by a few degrees. The conditional standard deviations of the forecast errors are of the order of 5 °F, and forecast errors larger than 20 °F never occur. Very little can be learned about the skill of forecasts below 8 °F and above 48 °F because of poor sampling.

An example of an estimated conditional distribution $f_{P|F=f}$ is the estimated distribution of Minneapolis temperature observations $P$ given the forecast $F = f$, which is shown in the upper panel of Figure 18.3. The 10%, 25%, 50%, 75%, and 90% quantiles of the observations are derived and plotted for each 5 °F bin of the forecast. Ideally, the solid curve, representing the conditional 50% quantile, will lie on the diagonal. This is not so. In particular, the mean observed temperature is about 3 °F lower when temperatures below 20 °F are forecast. When temperatures below 12 °F are forecast, about 75% of observations are actually less than the forecast. The typical forecast error is generally independent of the forecast itself.

Estimates of the density function of the forecast conditional upon a fixed observed temperature $p$ (i.e., $\hat{f}_{F|P=p}$) are displayed in the lower panel of Figure 18.3 for $p = 14$ °F, 25 °F, 34 °F. The two conditional $F$ distributions for $p = 14$ °F and $p = 25$ °F are almost symmetric, but since

18: Forecast Quality Evaluation


$E(F|P = 14$ °F$) > 14$ °F, it is evident that the forecast is biased.

Figure 18.3: Estimated conditional probability density functions of the Minneapolis temperature forecast. From Murphy et al. [283].
Top: Quantiles of the distribution of the predictand $P$ conditional on the forecast $F = f$. The frequency of the forecasts is also shown so that the credibility of the conditional quantiles can be judged.
Bottom: Distribution of the forecast $F$ conditional on the value of the predictand for $P = 14$ °F, 25 °F, 34 °F. The 'p(f|x)' in the diagram is Murphy's notation for the conditional probability density function $f_{F|P=p}$.

18.2.3 Skill Scores. Several measures are frequently used to describe the skill of quantitative forecasts. These measures include the correlation skill score, the mean squared error, the Brier skill score and the proportion of explained variance.³

The correlation between the forecast $F$ and the verifying observation $P$ is called the correlation skill score and is given by

$$\rho_{FP} = \frac{\mathrm{Cov}(F, P)}{\sqrt{\mathrm{Var}(F)\,\mathrm{Var}(P)}}. \qquad (18.3)$$

The mean squared error is the expected (i.e., long-term average) squared error, which is defined by

$$S^2_{FP} = E\left[(F - P)^2\right]. \qquad (18.4)$$

The Brier skill score is a measure of the skill of the forecast $F$ relative to a reference forecast $R$ of the same predictand $P$. The comparison is made on the basis of the mean squared error of the individual forecasts. The Brier skill score is given by

$$B_{FRP} = 1 - \frac{S^2_{FP}}{S^2_{RP}} = \frac{S^2_{RP} - S^2_{FP}}{S^2_{RP}}. \qquad (18.5)$$

The proportion of explained variance is the percentage of $P$-variance that is explained by $F$,

$$R^2_{FP} = \frac{\mathrm{Var}(P) - \mathrm{Var}(F - P)}{\mathrm{Var}(P)} = 1 - \frac{\mathrm{Var}(F - P)}{\mathrm{Var}(P)}. \qquad (18.6)$$

³ A more complex measure of skill than those defined here is the 'linear error in the probability space' (LEPS) score introduced by Ward and Folland [415].

18.2.4 Skill Score Ranges. For a perfect forecast, that is, $F = P$, the correlation skill score $\rho_{FP}$ is 1, the mean squared error $S^2_{FP}$ is zero and the percentage of explained variance $R^2_{FP}$ is 100%.

If $F$ is the climatological forecast (i.e., $F = E(P)$), then $\rho_{FP}$ and $R^2_{FP}$ are zero and $S^2_{FP} = \mathrm{Var}(P)$.

If $F$ is a random forecast with the same mean and variance as $P$, then $\rho_{FP}$ is zero and $S^2_{FP} = \mathrm{Var}(F - P) = \mathrm{Var}(F) + \mathrm{Var}(P) = 2\,\mathrm{Var}(P)$. The explained variance is $R^2_{FP} = 1 - 2\,\mathrm{Var}(P)/\mathrm{Var}(P) = -1$.

Thus, the skill scores $\rho_{FP}$ and $R^2_{FP}$ are constructed so that they have value 1 for a perfect forecast and zero or less than zero for trivial reference forecasts.

18.2.5 Skill Score Characteristics. The correlation skill score is insensitive to some types of systematic error. In particular, skill is not affected if the forecasts contain a constant bias or if the amplitudes of two forecasts differ by a constant factor. That is, if $G = aF + b$ for some constants $a > 0$ and $b$, then $F$ and $G$ have the same correlation skill score. On the other hand, the mean squared error is very sensitive to such systematic errors. The results of many years of weather forecasting have shown that the mean squared error favours forecasting schemes that avoid extremes and tend not to deviate greatly from climatology (because the penalty grows as the square of the error [179]).
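The skill scores (18.3) to (18.6) and the properties described in [18.2.4] and [18.2.5] can be checked numerically. The following is a minimal sketch, not taken from the source: it assumes synthetic Gaussian data in place of real forecasts, replaces expectations by sample averages, and all function and variable names (`correlation_skill`, `brier_skill`, `noisy`, and so on) are hypothetical choices made for illustration.

```python
import math
import random
from statistics import fmean


def covariance(x, y):
    """Sample covariance (divisor n), standing in for Cov(X, Y)."""
    mx, my = fmean(x), fmean(y)
    return fmean([(a - mx) * (b - my) for a, b in zip(x, y)])


def variance(x):
    """Sample variance (divisor n), standing in for Var(X)."""
    return covariance(x, x)


def correlation_skill(f, p):
    """Sample analogue of the correlation skill score (18.3)."""
    return covariance(f, p) / math.sqrt(variance(f) * variance(p))


def mean_squared_error(f, p):
    """Sample analogue of the mean squared error (18.4)."""
    return fmean([(a - b) ** 2 for a, b in zip(f, p)])


def brier_skill(f, r, p):
    """Sample analogue of (18.5): skill of f relative to reference r."""
    return 1.0 - mean_squared_error(f, p) / mean_squared_error(r, p)


def explained_variance(f, p):
    """Sample analogue of the proportion of explained variance (18.6)."""
    errors = [a - b for a, b in zip(f, p)]
    return 1.0 - variance(errors) / variance(p)


# Synthetic data (an assumption of this sketch, not the Minneapolis record).
rng = random.Random(0)
n = 20000
p = [rng.gauss(0.0, 1.0) for _ in range(n)]            # predictand P
noisy = [x + rng.gauss(0.0, 0.5) for x in p]           # imperfect forecast
climatology = [fmean(p)] * n                           # F = E(P)
random_fcst = [rng.gauss(0.0, 1.0) for _ in range(n)]  # independent of P
rescaled = [2.0 * x + 3.0 for x in noisy]              # G = aF + b, a > 0

# The sample correlation of the constant climatological forecast is 0/0,
# so it is not evaluated here; the text assigns it the value zero.
print("perfect     :", correlation_skill(p, p), mean_squared_error(p, p))
print("climatology :", mean_squared_error(climatology, p), variance(p))
print("random      :", correlation_skill(random_fcst, p),
      explained_variance(random_fcst, p))
print("noisy  rho, S2:", correlation_skill(noisy, p),
      mean_squared_error(noisy, p))
print("rescaled rho, S2:", correlation_skill(rescaled, p),
      mean_squared_error(rescaled, p))
print("Brier skill of noisy vs climatology:",
      brier_skill(noisy, climatology, p))
```

In this sketch the rescaled forecast keeps the correlation skill score of the original forecast while its mean squared error grows substantially, which is the contrast drawn in [18.2.5]. The random forecast gives an explained variance near −1 only approximately, because the sample moments fluctuate around their expected values.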


18.2.6 Correlation Skill Score and Probability Statements. Some appreciation for the interpre-

We may therefore write equation (18.8) as

P (P ≥ 0 and F ≥ 0) = (18.9)