James Brown
An introduction to verifying probability forecasts
RFC Verification Workshop
1. Introduction to methods
• What methods are available?
• How do they reveal (or not) particular errors?
• Lecture now, and hands-on training later
2. Introduction to prototype software
• Ensemble Verification System (EVS)
• Part of a larger experimental project (XEFS)
• Lecture now, and hands-on training later
Goals for today
3. To establish user requirements
• EVS in very early (prototype) stage
• Pool of methods may expand or contract
• Need some input on verification products
• AND to address the pre-workshop questions…
Goals for today
How is ensemble verification done?
Same for short/long-term ensembles?
What tools, and are they operational?
Which metrics for which situations?
Simple metrics for end-users?
How best to manage the workload?
What data need to be archived/how?
Pre-workshop questions
1. Background and status
2. Overview of EVS
3. Metrics available in EVS
4. First look at the user-interface (GUI)
Contents for next hour
1. Background and status
A first look at operational needs
• Two classes of verification identified
1. High time sensitivity (‘prognostic’)
• e.g. how reliable is my live flood forecast?…
• …where should I hedge my bets?
2. Less time sensitive (‘diagnostic’)
• e.g. which forecasts do less well and why?
A verification strategy?
Prognostic example
[Figure: temperature (°C) against forecast lead day, comparing the live forecast (L) with matching historical forecasts (H) and historical observations, where μH = μL ± 1.0 °C]
Diagnostic example
[Figure: probability of warning correctly (hits) against probability of warning incorrectly (‘false alarms’), both from 0 to 1.0, e.g. for a flood warning issued when P >= 0.9, with climatology and a single-valued forecast marked for comparison]
Motivation for EVS (and XEFS)
• Demand: forecasters and their customers
• Demand for usable verification products
• …limitations of existing software
History
• Ensemble Verification Program (EVP)
• Comprised (too) many parts, lacked flexibility
• Prototype EVS begun in May 2007 for XEFS…
Motivation for EVS
Position in XEFS
[Diagram: the Ensemble Verification Subsystem (EVS) shown within the XEFS architecture alongside the Ensemble Pre-Processor (EPP3) and its EPP User Interface, the Ensemble Streamflow Prediction Subsystem (ESP2), the Ensemble Post-Processor (EnsPost), the Ensemble Product Generation Subsystem (EPG), the HMOS Ensemble Processor, the Hydrologic Ensemble Hindcaster, the Ensemble Viewer and Ensemble User Interface, IFP and OFS; the data flows include atmospheric forcing data (precip., temp., etc.), hydro-meteorological ensembles, raw and post-processed flow ensembles, streamflow and flow data, forecaster MODs, ensemble/probability products and ensemble verification products]
2. Overview of EVS
Diagnostic verification
• For diagnostic purposes (less time-sensitive)
• Prognostic built into forecasting systems
Diagnostic questions include…
• Are ensembles reliable?
• Prob[flood] = 0.9: does it occur 9/10 times?
• Are forecaster MODs working well?
• What are the major sources of uncertainty?
Scope of EVS
Verification of continuous time-series
• Temperature, precipitation, streamflow, etc.
• More than one forecast point, but not spatial products
All types of forecast times
• Any lead time (e.g. 1 day – 2 years or longer)
• Any forecast resolution (e.g. hourly, daily)
• Pairing of forecasts and observations (even in different time zones)
• Ability to aggregate across forecast points
Design goals of EVS
Flexibility to target data of interest
• Subset based on forecasts and observations
• Two conditions: 1) time; 2) variable value
• e.g. forecasts where ensemble mean < 0 °C
• e.g. max. observed flow in a 90-day window (a sketch of such subsetting follows below)
Ability to pool/aggregate forecast points
• Number of observations can be limiting
• Sometimes appropriate to pool points
Design goals of EVS
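For concreteness, here is a minimal sketch of the kind of conditional subsetting described above (this is not EVS code; the arrays, the winter months and the 0 °C condition are invented purely for illustration):

```python
import numpy as np

# Hypothetical paired data: each row of `ensembles` is one ensemble forecast of
# temperature (deg C); `observations` and `months` give the matching observation
# and valid month for each forecast.
ensembles = np.array([[-2.1, -1.5, -0.8],
                      [ 1.2,  0.4,  2.0],
                      [-0.3, -1.1,  0.2]])
observations = np.array([-1.0, 1.5, -0.5])
months = np.array([12, 7, 1])

# Condition 1 (time): winter forecasts only.
in_winter = np.isin(months, [12, 1, 2])
# Condition 2 (variable value): forecasts whose ensemble mean is below 0 deg C.
below_zero = ensembles.mean(axis=1) < 0.0

keep = in_winter & below_zero
print(ensembles[keep], observations[keep])
```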
Carefully selected metrics
• Different levels of detail on errors
• Some are more complex than others, but…
• Use cases and online documentation to assist
To be ‘user-friendly’
• Many factors determine this…
• GUI, I/O, execution speed, batch modes
Design goals of EVS
Example of workflow
How biased are my winter flows > flood level at dam A?
Coordinated across XEFS:
The forecasts
• Streamflow: ESP binary files (.CS)
• Temperature and precip: OHD datacard files
The observations
• OHD datacard files
Unlikely to be a database in the near future
Archiving requirements
3. Metrics available
Many ways to test a probability forecast
1. Tests for single-valued property (e.g. mean)
2. Tests of broader forecast distribution
• Both may involve reference forecasts (“skill”)
Caveats in testing probabilities
• Observed probabilities require many events
• Big assumption 1: we can ‘pool’ events
• Big assumption 2: observations are ‘good’
Types of metrics
Discrete/categorical forecasts
• Many metrics rely on discrete forecasts
• e.g. will it rain? {yes/no} (rain > 0.01)
• e.g. will it flood? {yes/no} (stage > flood level)
What about continuous forecasts?
• An infinite number of events
• Arbitrary event thresholds (i.e. ‘bins’)?
• Typically, yes, and the choice will affect results (see the sketch below)
Problem of cont. forecasts
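As a rough sketch of how a continuous ensemble forecast might be reduced to an event probability at an arbitrary threshold (the stage values and flood level below are invented, not taken from EVS):

```python
import numpy as np

# One ensemble forecast of river stage (metres); values are invented.
ensemble = np.array([1.8, 2.3, 2.7, 3.1, 2.0, 2.6, 3.4, 2.2])
flood_stage = 2.5  # arbitrary event threshold ('bin' edge)

# Forecast probability of the event {stage > flood_stage}: the fraction of
# ensemble members exceeding the threshold.
prob_flood = np.mean(ensemble > flood_stage)
print(prob_flood)  # 4 of 8 members exceed 2.5, so 0.5
```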
Detail varies with verification question
• e.g. inspection of ‘blown’ forecasts (most detailed)
• e.g. avg. reliability of flood forecasts (less detail)
• e.g. rapid screening of forecasts (least detail)
All included to some degree in EVS…
Metrics in EVS
Most detailed (box plot)
[Figure: box plot of ensemble forecast errors against time (days since start time, 0–20), showing the ‘errors’ for one forecast relative to the observation: the greatest positive and negative errors and the 90th, 80th, 50th, 20th and 10th percentiles]
Most detailed (box plot)
[Figure: the same box plot of ensemble forecast errors, but plotted against the observed value (in order of increasing size) rather than against time]
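A minimal sketch of the statistics behind one box in such a plot (the ensemble member values and observation are invented; EVS may compute these differently):

```python
import numpy as np

# One ensemble forecast and its verifying observation (invented numbers).
members = np.array([10.2, 11.5, 9.8, 12.1, 10.9, 11.0, 9.5, 12.6])
observation = 10.6

# 'Errors' for one forecast: member value minus the observation.
errors = members - observation

# Extremes and selected percentiles summarised by one box in the plot.
box = {
    "greatest +ve": errors.max(),
    "90th percentile": np.percentile(errors, 90),
    "80th percentile": np.percentile(errors, 80),
    "50th percentile": np.percentile(errors, 50),
    "20th percentile": np.percentile(errors, 20),
    "10th percentile": np.percentile(errors, 10),
    "greatest -ve": errors.min(),
}
print(box)
```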
Less detail (Reliability)
[Figure: reliability diagram of the observed probability given the forecast against the forecast probability (probability of flooding), both from 0 to 1.0]
“On occasions when flooding is forecast with probability 0.5, it should occur 50% of the time.”
“Forecast bias”
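A minimal sketch of the calculation behind a reliability diagram, assuming a set of paired forecast probabilities and binary outcomes (all numbers invented for illustration):

```python
import numpy as np

# Forecast probabilities of flooding and whether flooding occurred (1) or not (0).
p_forecast = np.array([0.1, 0.15, 0.5, 0.55, 0.5, 0.9, 0.85, 0.95, 0.45, 0.2])
occurred   = np.array([0,   0,    1,   0,    1,   1,   1,    1,    0,    0])

bins = np.arange(0.0, 1.1, 0.1)               # forecast-probability bins of width 0.1
which_bin = np.digitize(p_forecast, bins) - 1

for b in np.unique(which_bin):
    in_bin = which_bin == b
    # Reliability: observed frequency given forecasts in this probability bin.
    print(f"forecast {bins[b]:.1f}-{bins[b + 1]:.1f}: "
          f"observed frequency {occurred[in_bin].mean():.2f} "
          f"from {in_bin.sum()} forecasts")
```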
Less detail (C. Talagrand)
[Figure: cumulative probability against the position of the observation in the forecast distribution (0–100%)]
“If river stage <= X is forecast with probability 0.5, it should be observed 50% of the time.”
“Forecast bias”
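A minimal sketch of how the position of the observation in the forecast distribution might be computed for a handful of forecasts (the numbers are invented); over many forecasts these positions should be spread evenly if the forecasts are unbiased:

```python
import numpy as np

# Each row of `ensembles` is one ensemble forecast of river stage; `obs` holds
# the matching observations (all values invented).
ensembles = np.array([[1.1, 1.4, 1.9, 2.3, 2.8],
                      [0.6, 0.9, 1.0, 1.2, 1.5],
                      [2.0, 2.2, 2.5, 3.0, 3.3]])
obs = np.array([2.0, 1.6, 1.9])

# Position of each observation in its forecast distribution: the percentage of
# members not exceeding the observation (0% = below all members, 100% = above all).
positions = 100.0 * (ensembles <= obs[:, None]).mean(axis=1)
print(positions)  # [ 60. 100.   0.]
```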
Least detailed (a score)
[Figure: river stage (0–2.0) against time (0–30 days), with the flood stage marked and five forecast/observation pairs numbered 1–5]
Brier score = 1/5 × {(0.8 − 1.0)² + (0.1 − 1.0)² + (0.0 − 0.0)² + (0.95 − 1.0)² + (1.0 − 1.0)²}
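A minimal sketch of that Brier score calculation, using the five forecast probabilities from the slide and the outcomes implied by the formula (flooding observed in all cases except the third):

```python
import numpy as np

# Forecast probabilities of exceeding flood stage for the five forecasts on the
# slide, and the observed outcomes (1 = flooding occurred, 0 = it did not).
p_forecast = np.array([0.8, 0.1, 0.0, 0.95, 1.0])
observed   = np.array([1.0, 1.0, 0.0, 1.0,  1.0])

# Brier score: mean squared difference between forecast probability and outcome.
brier = np.mean((p_forecast - observed) ** 2)
print(brier)  # (0.04 + 0.81 + 0.0 + 0.0025 + 0.0) / 5 = 0.1705
```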
Least detailed (a score)
[Figure: cumulative probability (0–1.0) against precipitation amount (0–30) for a single forecast and its observation, with areas A and B marked on either side of the observation]
CRPS = A² + B²
Then average across multiple forecasts: small scores are better.
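As a rough sketch, the CRPS of a single ensemble forecast can also be computed from the equivalent expectation form CRPS = E|X − y| − ½·E|X − X′| (rather than from the areas A and B drawn on the slide); the precipitation values below are invented:

```python
import numpy as np

def crps_ensemble(members, obs):
    """CRPS of one ensemble forecast against one observation, treating the
    ensemble as the forecast distribution: E|X - obs| - 0.5 * E|X - X'|."""
    members = np.asarray(members, dtype=float)
    term1 = np.mean(np.abs(members - obs))
    term2 = 0.5 * np.mean(np.abs(members[:, None] - members[None, :]))
    return term1 - term2

# One invented precipitation ensemble (mm) and its verifying observation.
print(crps_ensemble([0.0, 2.0, 5.0, 7.0, 12.0], 4.0))

# In practice the score is averaged over many forecasts; smaller is better.
```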
4. First look at the GUI
Two-hour lab sessions with EVS
• Start with synthetic data (with simple errors)
• Then move on to a couple of real cases
Verification plans and feedback
• Real-time (‘prognostic’) verification
• Screening verification outputs
• Developments in EVS
• Feedback: discussion and survey
Rest of today