Figure 18.7: The root of the mean squared error $S_{FP}$, labelled 'RMSE,' is displayed as a function of the correlation skill score $\rho_{FP}$ for the two cases discussed in [18.3.5]. Curve 'A' illustrates the case in which the forecast and the observations have the same expected value and the same variance. Curve 'B' holds for the improved forecast [18.2.7]. From Barnston [25].

consider two cases (Barnston [25]). First, suppose $\mathrm{Var}(F) = \mathrm{Var}(P)$. Then

$$S^2_{PF} = 2\,\mathrm{Var}(P) - 2\rho_{FP}\,\mathrm{Var}(P) = 2\,\mathrm{Var}(P)\,(1 - \rho_{FP}).$$

The relationship between the correlation $\rho_{FP}$ and the mean square error $S^2_{PF}$ is illustrated in Figure 18.7 as curve 'A.' When the correlation is zero then the mean square error is $2\,\mathrm{Var}(P)$, which is twice the expected error of the climatology forecast. When the correlation is negative, the mean square error becomes even larger than twice that of the climatology forecast.

Second, suppose the improved forecast [18.2.7] is $\tilde{F} = bF$, where $b = \rho_{FP}\,\sigma_P/\sigma_F$, and suppose also that $E(F) = E(P) = 0$. Then the improved forecast $\tilde{F}$ is unconditionally unbiased (i.e., $E(\tilde{F}) = E(P)$) and it is also conditionally unbiased because $\sigma_{\tilde{F}}/\sigma_P = \rho_{\tilde{F}P}$. Thus (18.19) simplifies to

$$S^2_{P\tilde{F}} = \mathrm{Var}(P)\,(1 - \rho^2_{FP}).$$

The resulting relationship between the correlation and the mean squared error is shown as curve 'B' in Figure 18.7. The improved forecast always has mean squared error that is less than, or in the case of zero correlation equal to, that of the climatology forecast.

becomes

$$B_{FCP} = A^2 - B^2 = \rho^2_{FP} - (\rho_{FP} - 1)^2 = 2\rho_{FP} - 1$$

so that

$$B_{FCP} \ge 0 \iff \rho_{FP} \ge 0.5. \tag{18.26}$$

Similarly, for the x = i case,

$$B_{FCP} \ge 0 \iff \rho^A_{FP} \ge 0.5. \tag{18.27}$$

The experience of several decades of operational weather forecasting has led weather forecasters to use a larger threshold for the anomaly correlation coefficient, namely 0.6. This choice is based on the subjective assessment that a predicted field with $\rho^A_{FP} \ge 0.6$ bears sufficient resemblance to the observed field for the forecast to be of use to at least some users of the forecast product.

18.4 Issues in the Evaluation of Forecast Skill

18.4.1 The Reference Forecast. A forecasting scheme cannot be accepted as being useful if it yields skill scores that can be obtained by means of less sophisticated forecasting procedures. That is, any forecasting scheme must be compared against a reference forecast which is easier to prepare than the forecast under consideration.

Some standard reference forecasts are:

• the random forecast, F, which is simply a random variable with the same statistical properties as the predictand P;

• the persistence forecast $F_\tau(t) = P(t - \tau)$;

• the damped persistence forecast $F_\tau(t) = \xi^\tau P(t - \tau)$ with $0 < \xi < 1$ and $E(P) = 0$;

• the climatological forecast $F_\tau(t) = C$.
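The reference forecasts listed in [18.4.1] can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the short anomaly series and the choice ξ = 0.5 are hypothetical, and the random forecast is approximated by resampling the observed values, which reproduces only the marginal distribution of the predictand.

```python
import random


def persistence(p, tau):
    """Persistence forecast F_tau(t) = P(t - tau) for t = tau, ..., len(p) - 1."""
    return [p[t - tau] for t in range(tau, len(p))]


def damped_persistence(p, tau, xi):
    """Damped persistence F_tau(t) = xi**tau * P(t - tau).

    Assumes p holds anomalies, i.e. E(P) = 0.
    """
    if not 0.0 < xi < 1.0:
        raise ValueError("xi must satisfy 0 < xi < 1")
    return [xi ** tau * p[t - tau] for t in range(tau, len(p))]


def climatology(p, tau, c=0.0):
    """Climatological forecast F_tau(t) = C, a constant supplied by the caller."""
    return [c] * (len(p) - tau)


def random_forecast(p, tau, rng):
    """Random forecast, approximated by resampling the observed values."""
    return [rng.choice(p) for _ in range(tau, len(p))]


def mean_squared_error(forecast, p, tau):
    """Mean squared error against the verifying observations P(t), t >= tau."""
    verifying = p[tau:]
    return sum((f - o) ** 2 for f, o in zip(forecast, verifying)) / len(verifying)


# Hypothetical anomaly series with zero mean, invented purely for illustration.
anomalies = [1.0, -1.0, 2.0, 0.0, -2.0]
lag = 1

mse_persistence = mean_squared_error(persistence(anomalies, lag), anomalies, lag)
mse_damped = mean_squared_error(damped_persistence(anomalies, lag, 0.5), anomalies, lag)
mse_climatology = mean_squared_error(climatology(anomalies, lag), anomalies, lag)
mse_random = mean_squared_error(
    random_forecast(anomalies, lag, random.Random(0)), anomalies, lag
)

print(mse_persistence, mse_damped, mse_climatology, mse_random)
```

Any candidate forecasting scheme would be scored on the same verifying observations and compared with these values; a scheme that cannot beat the cheapest of them has little claim to usefulness.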


Table 18.3: Finley's [116] success in predicting tornados.

                       Forecast
Observation     Tornado     No Tornado
Tornado            28           23
No Tornado         72         2680

Another reference forecast which is suitable for quasi-cyclic processes is the POP forecast (see Section 15.3).

The Heidke skill score of a categorical forecast (as defined in [18.1.1]) uses the random forecast as its reference. The Heidke score can be modified to assess skill relative to another reference forecast by defining $p_E$ in (18.1) as the success rate of this other reference (see [18.1.2]).

We illustrate the idea of a reference forecast with the following examples.

18.4.2 Example: The Old Farmer's Almanac. We again consider the forecasts of monthly mean temperature and precipitation issued by the Old Farmer's Almanac [364] (see [18.2.3]). Because the forecasting algorithm used by the Old Farmer's Almanac is unpublished, the complexity of the procedure is unknown. We might therefore ask if there exists a trivial forecasting scheme which would do better than the Old Farmer's Almanac. The answer is yes. A constant forecast has better skill than the Old Farmer's Almanac for both temperature (F = 'above normal') and precipitation (F = 'below normal').

In the case of temperature, we have $p^F_a = 1$, $p^P_a = 0.55$, $p^F_b = 0$ and $p^P_b = 0.45$ so that $p_C = (1 \times 0.55 + 0 \times 0.45) = 0.55$, $p_E = 0.505$ and consequently $S = (0.55 - 0.505)/(1 - 0.505) = 9.1 \times 10^{-2}$. In contrast, we showed in [18.1.3] that the skill of the Old Farmer's Almanac is only $4 \times 10^{-3}$.

For precipitation, the constant 'below normal' forecast yields $p^F_a = 0$, $p^P_a = 0.40$, $p^F_b = 1$ and $p^P_b = 0.60$ so that $C = 0.60 \times T$ and $S = (0.60 - 0.52)/0.48 = 0.17$. This is much larger than the Almanac's skill of $-2 \times 10^{-3}$.

This example illustrates that the Heidke skill score is inequitable [18.1.7]. In these examples two competing forecasts, both of which are statistically independent of the predictand, have different Heidke skill scores.

18.4.3 Example: Finley's Tornado Forecast. In the late nineteenth century, Finley [116] (see also Stanski, Wilson, and Burrows [355]) prepared three months of daily forecasts for 18 US districts east of the Rocky Mountains which predicted whether conditions would be favourable for the development of tornados. Daily weather maps served as the predictor. A total of n = 2803 forecasts were prepared. Tornados were observed on 51 of these occasions. The 2 × 2 contingency table describing the results of Finley's efforts is given in Table 18.3.

The number of correct forecasts, or hits, was C = 2708, whereas the number of expected random hits would be $E = (51^2 + 2752^2)/T = 2703$, where T = n is the total number of forecasts and 51 and 2752 are the numbers of occasions with and without an observed tornado. Thus, the Heidke skill score (18.2) is

$$(C - E)/(T - E) = 5.1\%.$$

Is there a simple reference forecast which does better? Consider the constant 'no tornado' forecast. Then the number of hits is equal to the number of occasions with no tornado (i.e., C = 2752) and the Heidke skill score is 49/100 = 49%. Thus, the verdict of the Heidke skill score is to abandon Finley's forecast and to use the trivial competitor F = 'no tornado' instead.

But is that a fair answer? The 'no tornado' forecast would have a false alarm rate of zero, but it would not have warned of any tornados. Finley, on the other hand, had a false alarm rate of 72/(72 + 28) = 72%, but correctly warned of a tornado on 28/(28 + 23) = 55% of all tornado days.

What we see then is that there are no universal rules that can be used to judge the performance of each and every forecast. Each case must be judged separately while keeping in mind the various pitfalls.

18.4.4 Example: The Madden-and-Julian Oscillation. We evaluate the outcome of two series of forecasts of an index of the Madden-and-Julian Oscillation using the correlation skill score $\rho$. Forecasts were prepared from 15 sets of initial conditions with the POP method⁹ and with a dynamical forecast model. The correlation skill score was calculated for the two forecasting schemes for various temporal lags $\tau$ (Figure 18.8). In these experiments the POP forecast scores better than the sophisticated dynamical model. Therefore the substantial computational cost of the dynamical model is not rewarded with increased forecast skill in this particular case. (See also [15.3.3].)

⁹ POP is an abbreviation for Principal Oscillation Pattern. See Section 15.3.
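The Heidke calculations in [18.4.2] and [18.4.3] can be retraced with a short Python sketch. It assumes, as those examples do, that the reference is a random forecast which is independent of the predictand and has the same category frequencies, so that its expected hit rate is the sum of the squared observed frequencies. The function and variable names are hypothetical; the counts are those of Table 18.3.

```python
def random_reference_rate(observed_freqs):
    """Expected hit rate p_E of a random forecast that is independent of the
    predictand and shares its category frequencies."""
    return sum(p * p for p in observed_freqs)


def heidke(hit_rate, reference_rate):
    """Heidke skill score (p_C - p_E) / (1 - p_E)."""
    return (hit_rate - reference_rate) / (1.0 - reference_rate)


# [18.4.2]: constant forecasts scored against the observed category frequencies.
temperature_freqs = [0.55, 0.45]    # observed: above normal, below normal
precipitation_freqs = [0.40, 0.60]  # observed: above normal, below normal
# Always forecasting 'above normal' is correct 55% of the time.
s_temperature = heidke(0.55, random_reference_rate(temperature_freqs))
# Always forecasting 'below normal' is correct 60% of the time.
s_precipitation = heidke(0.60, random_reference_rate(precipitation_freqs))

# [18.4.3]: Table 18.3, rows are observations and columns are forecasts.
obs_t_fc_t, obs_t_fc_n = 28, 23     # tornado observed
obs_n_fc_t, obs_n_fc_n = 72, 2680   # no tornado observed
total = obs_t_fc_t + obs_t_fc_n + obs_n_fc_t + obs_n_fc_n
obs_tornado = obs_t_fc_t + obs_t_fc_n
obs_no_tornado = obs_n_fc_t + obs_n_fc_n

p_e = random_reference_rate([obs_tornado / total, obs_no_tornado / total])
s_finley = heidke((obs_t_fc_t + obs_n_fc_n) / total, p_e)
s_no_tornado = heidke(obs_no_tornado / total, p_e)

# Measures that the single Heidke number hides.
false_alarm_ratio = obs_n_fc_t / (obs_n_fc_t + obs_t_fc_t)
warned_fraction = obs_t_fc_t / obs_tornado

print(f"temperature: {s_temperature:.3f}  precipitation: {s_precipitation:.3f}")
print(f"Finley: {s_finley:.3f}  constant 'no tornado': {s_no_tornado:.3f}")
print(f"false alarm ratio: {false_alarm_ratio:.2f}  tornado days warned: {warned_fraction:.2f}")
```

Working with rates rather than counts is only a convenience: dividing C, E and T by T leaves (C - E)/(T - E) unchanged, so the same two functions serve both examples.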