be helpful for making inferences about the skill parameter (see Section 6.3).

S = (p_C − p_E)/(1 − p_E) = (0.507 − 0.505)/(1 − 0.505) = 4 × 10⁻³.

Similarly, if p_a^F = p_a^P = 0.60 for precipitation, then p_E = 0.52 and

S = (0.519 − 0.520)/(1 − 0.520) = −2 × 10⁻³.

Apparently the skill of the Farmer's Almanac is no greater than that of a forecast constructed by drawing random numbers from slightly skewed distributions.

18.1.4 Mixing Forecasts of Unequal Skill. Let us now consider a hypothetical forecasting scheme that operates throughout the year. During winter, the scheme produces random forecasts so that the number of correct forecasts in winter is p_Cw = p_Ew and S_w is zero. In summer, however, the scheme is better than chance and produces forecasts for which p_Cs = 1.5 p_Es. Then S_s = (p_Cs − p_Es)/(1 − p_Es) = 0.5. For simplicity, we assume that p_Ew = p_Es and that the numbers of winter and summer forecasts are equal. Then, over summer and winter, the Heidke skill score S_{w+s} is larger than the winter score and smaller than the summer score:

S_{w+s} = (½(p_Cw + p_Cs) − p_E)/(1 − p_E) = 0.25.

Thus, if we add random forecasts to a set of skilful forecasts, the overall skill score will be lowered. If we avoid making forecasts when the forecast scheme is unable to use the information contained in the predictor, the skill score will be enhanced.

18.1.6 Example: Prediction of Snowfall in Ontario. Burrows [76] designed a forecast scheme to predict 'lake-effect' snowfall for a number of stations leeward of Lake Huron in Ontario, Canada. The predictors were designed to be useful when the synoptic situation is favourable for the occurrence of lake-effect snow, and the only cases considered were those in which the weather map forecast a synoptic situation conducive to lake-effect snow. Categorical forecasts were prepared at 28 stations. Five categories were used, with F and P defined as follows:

Category (F and P)   Snow amount (cm)
1                    [0, trace]
2                    (trace, 5]
3                    (5, 12.5]
4                    (12.5, 22.5]
5                    > 22.5

Figure 18.1 shows a typical field of the predictand P (snow amount category actually observed) and the corresponding field of forecasts. An asterisk in the forecast field indicates that a forecast was not made at that location. The overall performance of the forecasting scheme is summarized in Table 18.2. Burrows [76] rated a forecast that was one category different from the predictand (i.e., |p − f| = 1) a better forecast than a forecast that was two categories different (i.e., |p − f| = 2), and so on. Entries on the diagonals of Table 18.2 were therefore weighted depending upon the value of |p − f|: counts in the table for which |P − F| = k were multiplied by γ_k = 1 − k/4.
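The Heidke-score arithmetic in this and the preceding example is easy to check directly. The sketch below (plain Python, no external data) reproduces the Almanac scores and the winter/summer mixing result; it takes p_E = 0.5, which is the value implied by S_s = 0.5 when p_Cs = 1.5 p_Es.

```python
# Heidke skill score: skill relative to a random reference forecast.
def heidke(p_c, p_e):
    return (p_c - p_e) / (1.0 - p_e)

# Farmer's Almanac examples from the text:
s_temp = heidke(0.507, 0.505)    # temperature: ~ 4e-3
s_precip = heidke(0.519, 0.520)  # precipitation: ~ -2e-3

# Mixing example (18.1.4): p_E = 0.5 is implied by S_s = 0.5
# when p_Cs = 1.5 * p_Es.
p_e = 0.5
p_cw = p_e            # winter forecasts are random: S_w = 0
p_cs = 1.5 * p_e      # summer forecasts beat chance: S_s = 0.5
s_winter = heidke(p_cw, p_e)
s_summer = heidke(p_cs, p_e)
# Equal numbers of winter and summer forecasts: pool the hit rates.
s_both = heidke(0.5 * (p_cw + p_cs), p_e)
```

Pooling halves the summer score: adding random forecasts to skilful ones dilutes the overall skill, exactly as the section argues.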

18: Forecast Quality Evaluation 394

                 Forecast
Observed     1    2    3    4    5   Total
    1       14   13    1    1    0      29
    2       12   26   14    2    0      54
    3        2   12   14    5    5      38
    4        0    2    4    2    1       9
    5        0    0    0    0    0       0
Total       28   53   33   10    6     130

Table 18.2: A 5 × 5 contingency table summarizing the performance of a lake-effect snow forecasting scheme. From Burrows [76].
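As a check on the transcription, the margins of Table 18.2 can be recomputed, along with the cell counts expected if F and P were independent (row total times column total over the grand total, the no-skill baseline used in the text); a minimal sketch:

```python
import numpy as np

# Table 18.2: rows = observed category 1..5, columns = forecast category 1..5.
table = np.array([
    [14, 13,  1, 1, 0],
    [12, 26, 14, 2, 0],
    [ 2, 12, 14, 5, 5],
    [ 0,  2,  4, 2, 1],
    [ 0,  0,  0, 0, 0],
], dtype=float)

row_totals = table.sum(axis=1)   # observed-category counts
col_totals = table.sum(axis=0)   # forecast-category counts
n = table.sum()                  # 130 forecasts in all

# Cell counts expected under independence of F and P:
# e_ij = (row_i total) * (col_j total) / n
expected = np.outer(row_totals, col_totals) / n
```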

Figure 18.1: An example of a categorical forecast of snow amount at 28 stations in southern Ontario leeward of Lake Huron. The observed snow category (see text) is shown in the top panel. The corresponding forecasts are shown in the lower panel. From Burrows [76].

The weighted counts were then totalled for the entire table and used as a measure, say s_b, of the number of 'correct' forecasts. A similar measure of the number of 'correct' random forecasts was computed by estimating the entries of Table 18.2 under the assumption that F and P are independent. The estimated distribution of random counts was obtained by multiplying the row total by the column total and dividing by the table total (130). Burrows then weighted and summed the entries in this new table as before to produce a corresponding measure, say s_b^random, of the number of 'correct' forecasts expected by chance. Finally, a skill score analogous to the Heidke score was computed as

S_B = (s_b − s_b^random)/(n − s_b^random),

where n is the total number of forecasts made. Note that if we set γ_0 to 1 and γ_k to zero for nonzero k, then S_B reduces to the Heidke skill score (18.1). Like the Heidke score, S_B is zero for random forecasts and 1 for perfect forecasts. The S_B value for Table 18.2 is 33%. The finding that the forecasts are skilful is also supported by two other skill scores computed by Burrows.

The critical success index is defined for each category k = 1, ..., 5 as the ratio of the number of occasions C_k on which f = p = k to the sum of the number of occasions on which either p = k or f = k, minus C_k. This index is 33% for k = 1, 32% for k = 2, 25% for k = 3, 12% for k = 4, and 0% for k = 5. The critical success index for category k is simply an estimate of the probability of a correct forecast conditional upon either forecasting or observing category k. It can be compared with that expected under no skill by recomputing the contingency table under the assumption of independence. The corresponding critical success indices expected for random forecasts are 12% for k = 1, 26% for k = 2, 16% for k = 3, 4% for k = 4, and 0% for k = 5.

The probability of detection (POD) is defined as the ratio of Σ_k C_k to the total number of forecasts T. In this case, the POD is 56/130 = 43%. The probability of detection is simply the probability of making a correct forecast. The estimated POD for a random forecast is 30% in this case.
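Burrows's weighted score and the two auxiliary measures can be reproduced from Table 18.2. The sketch below follows the definitions given in the text (weights γ_k = 1 − k/4, independence baseline from the table margins). It recovers the quoted CSI and POD values exactly; the weighted score comes out near 0.34, close to the 33% quoted, the small difference presumably reflecting rounding in the original computation.

```python
import numpy as np

# Table 18.2: rows = observed category 1..5, columns = forecast category 1..5.
table = np.array([
    [14, 13,  1, 1, 0],
    [12, 26, 14, 2, 0],
    [ 2, 12, 14, 5, 5],
    [ 0,  2,  4, 2, 1],
    [ 0,  0,  0, 0, 0],
], dtype=float)
n = table.sum()                                   # 130 forecasts

# Weights gamma_k = 1 - k/4 applied to cells with |P - F| = k.
k = np.abs(np.subtract.outer(np.arange(5), np.arange(5)))
gamma = 1.0 - k / 4.0

sb = (table * gamma).sum()                        # weighted 'correct' count
expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
sb_random = (expected * gamma).sum()              # same, expected by chance
SB = (sb - sb_random) / (n - sb_random)           # Burrows's weighted score

# Critical success index per category: C_k / (row_k + col_k - C_k).
diag = np.diag(table)
csi = diag / (table.sum(axis=1) + table.sum(axis=0) - diag)

# Probability of detection and its no-skill baseline.
pod = diag.sum() / n
pod_random = np.trace(expected) / n
```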

18.2: The Skill of Quantitative Forecasts 395

18.1.7 Comments. The skill score S_B introduced in [18.1.6] is a modified Heidke score. Barnston [25] points out that the original Heidke score has two undesirable properties. First, the Heidke score increases as the number of categories decreases. For example, for a broad range of moderately skilful forecast sets, the two-class Heidke skill score will be about double the five-class score.
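This first property can be checked against Table 18.2 itself. The sketch below collapses the five classes into two, grouping categories 1–2 versus 3–5 (an assumed grouping, chosen only for illustration, not Barnston's calculation), and compares unweighted Heidke scores:

```python
import numpy as np

# Table 18.2 again: rows = observed, columns = forecast, categories 1..5.
table = np.array([
    [14, 13,  1, 1, 0],
    [12, 26, 14, 2, 0],
    [ 2, 12, 14, 5, 5],
    [ 0,  2,  4, 2, 1],
    [ 0,  0,  0, 0, 0],
], dtype=float)

def heidke_from_table(t):
    """Unweighted Heidke score (p_C - p_E)/(1 - p_E), with p_E taken
    from the table margins under the independence assumption."""
    n = t.sum()
    p_c = np.trace(t) / n
    p_e = (t.sum(axis=1) * t.sum(axis=0)).sum() / n**2
    return (p_c - p_e) / (1.0 - p_e)

s_five = heidke_from_table(table)

# Collapse to two classes: {1, 2} vs {3, 4, 5}.
two = np.array([
    [table[:2, :2].sum(), table[:2, 2:].sum()],
    [table[2:, :2].sum(), table[2:, 2:].sum()],
])
s_two = heidke_from_table(two)
```

With this grouping the two-class score (about 0.44) is more than double the five-class score (about 0.19), in line with Barnston's observation.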

Second, if the reference forecast is the random forecast, and if classes are not observed (or forecast) with equal frequency, then the Heidke skill score is not equitable. That is, the Heidke score will favour a biased forecast unfairly. An example of this property is given in [18.4.2]. Barnston [25], and also Ward and Folland [415], designed modi-

Figure 18.2: Estimated joint distributions of forecasts and observations (F, P). All data are collected into bins of 5°F × 5°F. Values for f