James Brown
An introduction to verifying probability forecasts
RFC Verification Workshop
1. Introduction to methods
• What methods are available?
• How do they reveal (or not) particular errors?
• Lecture now, and hands-on training later
2. Introduction to prototype software
• Ensemble Verification System (EVS)
• Part of a larger experimental project (XEFS)
• Lecture now, and hands-on training later
Goals for today
3. To establish user requirements
• EVS in very early (prototype) stage
• Pool of methods may expand or contract
• Need some input on verification products
• AND to address the pre-workshop questions…
Goals for today
How is ensemble verification done?
Same for short/long-term ensembles?
What tools, and are they operational?
Which metrics for which situations?
Simple metrics for end-users?
How best to manage the workload?
What data need to be archived/how?
Pre-workshop questions
1. Background and status
2. Overview of EVS
3. Metrics available in EVS
4. First look at the user-interface (GUI)
Contents for next hour
1. Background and status
A first look at operational needs
• Two classes of verification identified
1. High time sensitivity (‘prognostic’)
• e.g. how reliable is my live flood forecast?…
• …where should I hedge my bets?
2. Less time sensitive (‘diagnostic’)
• e.g. which forecasts do less well and why?
A verification strategy?
Prognostic example
[Figure: temperature (°C) against forecast lead day, comparing the live forecast (L) with matching historical forecasts (H) and historical observations, where μH = μL ± 1.0 °C]
Diagnostic example
[Figure: probability of warning correctly (hits) against probability of warning incorrectly (‘false alarms’), both from 0 to 1.0, e.g. for a flood warning issued when P >= 0.9, with climatology and a single-valued forecast marked for comparison]
Motivation for EVS (and XEFS)
• Demand: forecasters and their customers
• Demand for usable verification products
• …limitations of existing software
History
• Ensemble Verification Program (EVP)
• Comprised (too) many parts, lacked flexibility
• Prototype EVS begun in May 2007 for XEFS…
Motivation for EVS
Position in XEFS
[Diagram: the Ensemble Verification Subsystem (EVS) shown within the XEFS architecture alongside the Ensemble Pre-Processor (EPP3) and its EPP User Interface, the Ensemble Streamflow Prediction Subsystem (ESP2), the Ensemble Post-Processor (EnsPost), the Ensemble Product Generation Subsystem (EPG), the HMOS Ensemble Processor, the Hydrologic Ensemble Hindcaster, the Ensemble Viewer and Ensemble User Interface, IFP and OFS; the data flows include atmospheric forcing data (precip., temp., etc.), hydro-meteorological ensembles, raw and post-processed flow ensembles, streamflow and flow data, forecaster MODs, ensemble/probability products and ensemble verification products]
2. Overview of EVS
Diagnostic verification
• For diagnostic purposes (less time-sensitive)
• Prognostic built into forecasting systems
Diagnostic questions include…
• Are ensembles reliable?
• Prob[flood] = 0.9: does it occur 9/10 times?
• Are forecaster MODs working well?
• What are the major sources of uncertainty?
Scope of EVS
Verification of continuous time-series
• Temperature, precipitation, streamflow, etc.
• More than one forecast point, but not spatial products
All types of forecast times
• Any lead time (e.g. 1 day – 2 years or longer)
• Any forecast resolution (e.g. hourly, daily)
• Pairing of forecasts and observations (even in different time zones)
• Ability to aggregate across forecast points
Design goals of EVS
Flexibility to target data of interest
• Subset based on forecasts and observations
• Two conditions: 1) time; 2) variable value
• e.g. forecasts where ensemble mean < 0 °C
• e.g. max. observed flow in a 90-day window (a sketch of such subsetting follows below)
Ability to pool/aggregate forecast points
• Number of observations can be limiting
• Sometimes appropriate to pool points
Design goals of EVS
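For concreteness, here is a minimal sketch of the kind of conditional subsetting described above (this is not EVS code; the arrays, the winter months and the 0 °C condition are invented purely for illustration):

```python
import numpy as np

# Hypothetical paired data: each row of `ensembles` is one ensemble forecast of
# temperature (deg C); `observations` and `months` give the matching observation
# and valid month for each forecast.
ensembles = np.array([[-2.1, -1.5, -0.8],
                      [ 1.2,  0.4,  2.0],
                      [-0.3, -1.1,  0.2]])
observations = np.array([-1.0, 1.5, -0.5])
months = np.array([12, 7, 1])

# Condition 1 (time): winter forecasts only.
in_winter = np.isin(months, [12, 1, 2])
# Condition 2 (variable value): forecasts whose ensemble mean is below 0 deg C.
below_zero = ensembles.mean(axis=1) < 0.0

keep = in_winter & below_zero
print(ensembles[keep], observations[keep])
```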
Carefully selected metrics
• Different levels of detail on errors
• Some are more complex than others, but…
• Use cases and online documentation to assist
To be ‘user-friendly’
• Many factors determine this…
• GUI, I/O, execution speed, batch modes
Design goals of EVS
Example of workflow
How biased are my winter flows > flood level at dam A?
Coordinated across XEFS:
The forecasts
• Streamflow: ESP binary files (.CS)
• Temperature and precip: OHD datacard files
The observations
• OHD datacard files
Unlikely to be a database in the near future
Archiving requirements
3. Metrics available
Many ways to test a probability forecast
1. Tests for single-valued property (e.g. mean)
2. Tests of broader forecast distribution
• Both may involve reference forecasts (“skill”)
Caveats in testing probabilities
• Observed probabilities require many events
• Big assumption 1: we can ‘pool’ events
• Big assumption 2: observations are ‘good’
Types of metrics
Discrete/categorical forecasts
• Many metrics rely on discrete forecasts
• e.g. will it rain? {yes/no} (rain > 0.01)
• e.g. will it flood? {yes/no} (stage > flood level)
What about continuous forecasts?
• An infinite number of events
• Arbitrary event thresholds (i.e. ‘bins’)?
• Typically, yes, and the choice will affect results (see the sketch below)
Problem of cont. forecasts
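As a rough sketch of how a continuous ensemble forecast might be reduced to an event probability at an arbitrary threshold (the stage values and flood level below are invented, not taken from EVS):

```python
import numpy as np

# One ensemble forecast of river stage (metres); values are invented.
ensemble = np.array([1.8, 2.3, 2.7, 3.1, 2.0, 2.6, 3.4, 2.2])
flood_stage = 2.5  # arbitrary event threshold ('bin' edge)

# Forecast probability of the event {stage > flood_stage}: the fraction of
# ensemble members exceeding the threshold.
prob_flood = np.mean(ensemble > flood_stage)
print(prob_flood)  # 4 of 8 members exceed 2.5, so 0.5
```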
Detail varies with verification question
• e.g. inspection of ‘blown’ forecasts (most detailed)
• e.g. avg. reliability of flood forecasts (less detail)
• e.g. rapid screening of forecasts (least detail)
All included to some degree in EVS…
Metrics in EVS
Most detailed (box plot)
[Figure: box plot of ensemble forecast errors against time (days since start time, 0–20), showing the ‘errors’ for one forecast relative to the observation: the greatest positive and negative errors and the 90th, 80th, 50th, 20th and 10th percentiles]
Most detailed (box plot)
[Figure: the same box plot of ensemble forecast errors, but plotted against the observed value (in order of increasing size) rather than against time]
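A minimal sketch of the statistics behind one box in such a plot (the ensemble member values and observation are invented; EVS may compute these differently):

```python
import numpy as np

# One ensemble forecast and its verifying observation (invented numbers).
members = np.array([10.2, 11.5, 9.8, 12.1, 10.9, 11.0, 9.5, 12.6])
observation = 10.6

# 'Errors' for one forecast: member value minus the observation.
errors = members - observation

# Extremes and selected percentiles summarised by one box in the plot.
box = {
    "greatest +ve": errors.max(),
    "90th percentile": np.percentile(errors, 90),
    "80th percentile": np.percentile(errors, 80),
    "50th percentile": np.percentile(errors, 50),
    "20th percentile": np.percentile(errors, 20),
    "10th percentile": np.percentile(errors, 10),
    "greatest -ve": errors.min(),
}
print(box)
```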
Less detail (Reliability)
[Figure: reliability diagram of the observed probability given the forecast against the forecast probability (probability of flooding), both from 0 to 1.0]
“On occasions when flooding is forecast with probability 0.5, it should occur 50% of the time.”
“Forecast bias”
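A minimal sketch of the calculation behind a reliability diagram, assuming a set of paired forecast probabilities and binary outcomes (all numbers invented for illustration):

```python
import numpy as np

# Forecast probabilities of flooding and whether flooding occurred (1) or not (0).
p_forecast = np.array([0.1, 0.15, 0.5, 0.55, 0.5, 0.9, 0.85, 0.95, 0.45, 0.2])
occurred   = np.array([0,   0,    1,   0,    1,   1,   1,    1,    0,    0])

bins = np.arange(0.0, 1.1, 0.1)               # forecast-probability bins of width 0.1
which_bin = np.digitize(p_forecast, bins) - 1

for b in np.unique(which_bin):
    in_bin = which_bin == b
    # Reliability: observed frequency given forecasts in this probability bin.
    print(f"forecast {bins[b]:.1f}-{bins[b + 1]:.1f}: "
          f"observed frequency {occurred[in_bin].mean():.2f} "
          f"from {in_bin.sum()} forecasts")
```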
Less detail (C. Talagrand)
[Figure: cumulative probability against the position of the observation in the forecast distribution (0–100%)]
“If river stage <= X is forecast with probability 0.5, it should be observed 50% of the time.”
“Forecast bias”
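A minimal sketch of how the position of the observation in the forecast distribution might be computed for a handful of forecasts (the numbers are invented); over many forecasts these positions should be spread evenly if the forecasts are unbiased:

```python
import numpy as np

# Each row of `ensembles` is one ensemble forecast of river stage; `obs` holds
# the matching observations (all values invented).
ensembles = np.array([[1.1, 1.4, 1.9, 2.3, 2.8],
                      [0.6, 0.9, 1.0, 1.2, 1.5],
                      [2.0, 2.2, 2.5, 3.0, 3.3]])
obs = np.array([2.0, 1.6, 1.9])

# Position of each observation in its forecast distribution: the percentage of
# members not exceeding the observation (0% = below all members, 100% = above all).
positions = 100.0 * (ensembles <= obs[:, None]).mean(axis=1)
print(positions)  # [ 60. 100.   0.]
```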
Least detailed (a score)
[Figure: river stage (0–2.0) against time (0–30 days), with the flood stage marked and five forecast/observation pairs numbered 1–5]
Brier score = 1/5 × {(0.8 − 1.0)² + (0.1 − 1.0)² + (0.0 − 0.0)² + (0.95 − 1.0)² + (1.0 − 1.0)²}
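A minimal sketch of that Brier score calculation, using the five forecast probabilities from the slide and the outcomes implied by the formula (flooding observed in all cases except the third):

```python
import numpy as np

# Forecast probabilities of exceeding flood stage for the five forecasts on the
# slide, and the observed outcomes (1 = flooding occurred, 0 = it did not).
p_forecast = np.array([0.8, 0.1, 0.0, 0.95, 1.0])
observed   = np.array([1.0, 1.0, 0.0, 1.0,  1.0])

# Brier score: mean squared difference between forecast probability and outcome.
brier = np.mean((p_forecast - observed) ** 2)
print(brier)  # (0.04 + 0.81 + 0.0 + 0.0025 + 0.0) / 5 = 0.1705
```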
Least detailed (a score)
[Figure: cumulative probability (0–1.0) against precipitation amount (0–30) for a single forecast and its observation, with areas A and B marked on either side of the observation]
CRPS = A² + B²
Then average across multiple forecasts: small scores are better.
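As a rough sketch, the CRPS of a single ensemble forecast can also be computed from the equivalent expectation form CRPS = E|X − y| − ½·E|X − X′| (rather than from the areas A and B drawn on the slide); the precipitation values below are invented:

```python
import numpy as np

def crps_ensemble(members, obs):
    """CRPS of one ensemble forecast against one observation, treating the
    ensemble as the forecast distribution: E|X - obs| - 0.5 * E|X - X'|."""
    members = np.asarray(members, dtype=float)
    term1 = np.mean(np.abs(members - obs))
    term2 = 0.5 * np.mean(np.abs(members[:, None] - members[None, :]))
    return term1 - term2

# One invented precipitation ensemble (mm) and its verifying observation.
print(crps_ensemble([0.0, 2.0, 5.0, 7.0, 12.0], 4.0))

# In practice the score is averaged over many forecasts; smaller is better.
```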
4. First look at the GUI
Two-hour lab sessions with EVS
• Start with synthetic data (with simple errors)
• Then move on to a couple of real cases
Verification plans and feedback
• Real-time (‘prognostic’) verification
• Screening verification outputs
• Developments in EVS
• Feedback: discussion and survey
Rest of today