Concepts of user-focused verification
Barbara Brown, NCAR; [email protected]
July 2006, WAS*IS
Purposes of verification (Brier and Allen, 1951):
– Administrative
– Scientific
– Economic

Postulate:
– Most verification, to date, serves only the first purpose (administrative)
– This is especially true for verification of operational systems
Historical perspective: The Finley example

• U.S. Army (Signal Service/Corps), 1877 to ~1920
• Tornado predictions, 1884-1885
– Two 8-h outlooks per day
– Spotter reports (~1,000 reporters)
– 18 districts, eastern U.S.; 4 parts in each

[Photos: John Park Finley (1854-1943); tornado at Lebanon, KS, 1902]
(from Galway, 1985; BAMS)
The Finley example

• Finley's forecasts: 96.6% accurate
• "Accuracy" if no tornado forecasts had been issued at all: 98.2%

             Obs. Yes   Obs. No    Sum
Fcst. Yes        28         72     100
Fcst. No         23       2680    2703
Sum              51       2752    2803
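The arithmetic behind this paradox is easy to reproduce. Here is a minimal Python sketch (added for this write-up, not part of the original slides) that recomputes both accuracy figures from the table above:

```python
# Finley's 1884-1885 tornado forecasts as a 2x2 contingency table
hits = 28            # forecast yes, observed yes
false_alarms = 72    # forecast yes, observed no
misses = 23          # forecast no, observed yes
correct_negs = 2680  # forecast no, observed no
total = hits + false_alarms + misses + correct_negs  # 2803

# Accuracy = fraction of correct forecasts (hits + correct negatives)
accuracy = (hits + correct_negs) / total
print(f"Finley's accuracy: {accuracy:.1%}")  # 96.6%

# The paradox: a constant "no tornado" forecast is correct on every
# non-event, i.e., on all 72 + 2680 = 2752 of the 2803 cases
never_accuracy = (false_alarms + correct_negs) / total
print(f"Always forecasting 'no tornado': {never_accuracy:.1%}")  # 98.2%
```

Never forecasting a tornado scores higher than Finley did, which is precisely why accuracy alone is a poor measure for rare events.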
The Finley example: Outcomes

• The first (?) scientific discussion of verification
• Numerous verification measures developed (e.g., Equitable Threat Score, Heidke skill score; both sketched in code below)
• Many issues raised:
– Definition of forecast "event"
– Quality of observations
– Baselines of no skill
– "Dimensionality" of the verification problem
– Specifying purpose of verification
– Use and value of forecasts
– Asymmetric costs of misclassification

Source: Murphy, 1996 (WAF, 11, 3-20)
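For reference, here is a similar sketch (again mine, not from the talk) of the two skill scores just named, using their standard 2x2 contingency-table definitions and Finley's counts:

```python
# Standard 2x2 notation: a=hits, b=false alarms, c=misses, d=correct negatives
a, b, c, d = 28, 72, 23, 2680  # Finley's counts
n = a + b + c + d

# Equitable Threat Score: threat score adjusted for hits expected by chance
a_random = (a + c) * (a + b) / n
ets = (a - a_random) / (a + b + c - a_random)

# Heidke skill score: accuracy relative to the accuracy of random forecasts
hss = 2 * (a * d - b * c) / ((a + c) * (c + d) + (a + b) * (b + d))

print(f"ETS = {ets:.3f}")  # ~0.216
print(f"HSS = {hss:.3f}")  # ~0.355
```

Both scores are modestly positive, a far more honest summary of Finley's skill than 96.6% accuracy.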
Current situation

Not much has changed…
• Measures-based approaches applied in practice
• Operational verification focuses on "management" needs and model-centric applications of verification, i.e., not on diagnostic or user-focused approaches
• Focus is on:
– A few traditional measures
– Aggregated statistics
– A few parameters (e.g., 500 mb height, T, PoP)
Current situation (cont.)

• Model verification "drives" choices in model parameterizations, development, etc.
– Ex: verification of models using RMSE or anomaly correlation applied to 500 mb heights leads to particular choices in model development and evolution (which may, or may not, be intended; see the sketch below)
• Uncertainty in verification measures is rarely estimated
• Forecast use/value is rarely considered
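To make the two measures concrete, here is a minimal sketch of RMSE and (uncentered) anomaly correlation for gridded 500 mb height fields. The code and toy fields are my illustration, and `forecast`, `observed`, and `climatology` are assumed NumPy arrays on a common grid:

```python
import numpy as np

def rmse(forecast, observed):
    """Root-mean-square error over all grid points."""
    return np.sqrt(np.mean((forecast - observed) ** 2))

def anomaly_correlation(forecast, observed, climatology):
    """Correlation of forecast and observed anomalies (departures from
    climatology), a standard summary score for 500 mb height forecasts."""
    fa = forecast - climatology   # forecast anomaly
    oa = observed - climatology   # observed anomaly
    return np.sum(fa * oa) / np.sqrt(np.sum(fa ** 2) * np.sum(oa ** 2))

# Toy example on a small grid (values are illustrative only)
rng = np.random.default_rng(0)
climatology = np.full((10, 10), 5500.0)              # flat 5500 m climatology
observed = climatology + rng.normal(0, 50, (10, 10))
forecast = observed + rng.normal(0, 20, (10, 10))    # forecast with random error
print(f"RMSE = {rmse(forecast, observed):.1f} m")
print(f"AC   = {anomaly_correlation(forecast, observed, climatology):.3f}")
```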
Uncertainty in verification measures

[Figure: Equitable Threat Score (ETS) for model precipitation, shown with confidence intervals]

• Confidence intervals take into account various sources of error, including sampling and observational error
• Computation of confidence intervals for verification stats is not always straightforward (a resampling sketch follows)
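One generic way to attach confidence intervals when the sampling distribution is awkward is bootstrap resampling. The sketch below is a deliberately simplified illustration (the function names and the assumption of independent 1-D boolean arrays of forecast/observation pairs are mine):

```python
import numpy as np

def ets(fcst, obs):
    """Equitable Threat Score from paired yes/no forecasts and observations."""
    a = np.sum(fcst & obs)             # hits
    b = np.sum(fcst & ~obs)            # false alarms
    c = np.sum(~fcst & obs)            # misses
    n = fcst.size
    a_random = (a + c) * (a + b) / n   # hits expected by chance
    return (a - a_random) / (a + b + c - a_random)

def bootstrap_ci(fcst, obs, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for ETS, resampling forecast/obs pairs."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_boot):
        idx = rng.integers(0, fcst.size, fcst.size)  # resample with replacement
        scores.append(ets(fcst[idx], obs[idx]))
    lo, hi = np.percentile(scores, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi
```

Real forecast data are correlated in space and time, so naive resampling of individual pairs tends to give intervals that are too narrow; handling that dependence is part of what makes these computations "not always straightforward."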
Forecast Evaluation: Quality vs. Value

[Diagram: Forecast EVALUATION encompasses both Forecast QUALITY (verification) and Forecast VALUE (user studies; impacts)]
What's wrong with the traditional, measures-oriented approach?

Traditional verification measures (e.g., RMSE, CSI, ETS):
• Provide overall monitoring of forecast performance
But:
• Measure only limited attributes of forecast quality
• Tend to reward "smooth" forecasts
• Do not provide information about what went wrong with a forecast (they only say that it was wrong)
• Cannot diagnose how the forecast can be "fixed" or feed into the forecast development process
• Are not "informative" to users
Challenges and issues: Traditional verification approaches

[Schematic: five forecast (F) / observed (O) pairs; in the first four the forecast and observed areas do not overlap, while in the fifth a large forecast area does overlap the observation]

• The first four forecasts have POD = 0, FAR = 1, CSI = 0; i.e., all are equally "BAD"
• The fifth forecast has POD > 0, FAR < 1, CSI > 0
• The traditional verification approach thus identifies the subjectively "worst" forecast as the "best"
High vs. low resolution

Which rain forecast would you rather use?

[Figure: 24-h rain near Sydney, 21 Mar 2004: mesoscale model (5 km), RMS = 13.0; global model (100 km), RMS = 4.6; and the observed 24-h rain]

"Smooth" forecasts generally "win" according to traditional verification approaches (from E. Ebert); a toy demonstration follows.
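The effect is easy to demonstrate with a toy example (my construction, not Ebert's data): a sharp forecast that displaces a rain peak is penalized by RMSE both for missing the peak and for the false alarm, while a flat forecast of the field mean is penalized only mildly everywhere:

```python
import numpy as np

obs = np.array([0.0, 0.0, 10.0, 0.0, 0.0])    # observed: one sharp rain peak
sharp = np.array([0.0, 10.0, 0.0, 0.0, 0.0])  # sharp forecast, peak displaced by 1
smooth = np.full(5, obs.mean())               # "smooth" forecast: field mean everywhere

rmse = lambda f, o: np.sqrt(np.mean((f - o) ** 2))
print(f"RMSE, sharp but displaced: {rmse(sharp, obs):.2f}")   # 6.32
print(f"RMSE, smooth:              {rmse(smooth, obs):.2f}")  # 4.00
```

By RMSE the smooth forecast "wins", even though it never forecasts any meaningful rain at all.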
Why do users need verification information?

• To improve forecasts
• To determine whether to use a forecast or forecasting system
• Heeding/ignoring warnings
• Interpretation of forecasts ("What does a forecast of 32 really mean?")
• Input to decisions and/or decision-support systems
– Economic and forecast value implications
User-focused verification: Good forecast or Bad forecast?

[Schematic: a forecast precipitation area (F) displaced from the observed area (O)]
If I'm a water manager for this watershed, it's a pretty bad forecast…
[Schematic: the same F and O areas shown relative to a flight route from A to B]

If I'm an aviation traffic strategic planner… it might be a pretty good forecast.

Different users have different ideas about what makes a good forecast.
An initial goal: Diagnostic evaluation approaches

• Identify and evaluate meaningful attributes of the forecasts
– Example questions: What is the typical location error? Size error? Intensity error?
• Provide detailed information about forecast quality
– What went wrong? What went right?
– How can the forecast be improved?
– How do two forecasts differ from each other, and in what ways is one better than the other?
Examples of alternative (more user-focused) diagnostic approaches (spatial forecasts)

• Scale-separation approaches
– How does performance change as the resolution changes?
• Entity-based verification
– What are the major contributors to forecast error?
• "Fuzzy" approaches (see the neighborhood sketch below)
– Take into account observational error and the impacts of displacement errors
• Composite approach
– Evaluate systematic errors
• Object-based verification
– Examine forecasts' ability to reproduce certain attributes (e.g., location, shape, intensity)
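As one illustration of the "fuzzy" family referenced above, here is a minimal neighborhood-verification sketch (my construction, patterned on fractions-based neighborhood scores): instead of requiring exact grid-point matches, it compares the fraction of rainy points within a window around each point, so small displacement errors are forgiven. The arrays `fcst` and `obs` and the function name are assumptions for illustration:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def neighborhood_score(fcst, obs, threshold=1.0, window=3):
    """Compare fractional event coverage in (window x window) neighborhoods.

    Returns a skill-like score: 1 for a perfect match of neighborhood
    fractions, approaching 0 when the fractions never overlap."""
    f_frac = uniform_filter((fcst >= threshold).astype(float), size=window)
    o_frac = uniform_filter((obs >= threshold).astype(float), size=window)
    mse = np.mean((f_frac - o_frac) ** 2)
    mse_worst = np.mean(f_frac ** 2) + np.mean(o_frac ** 2)  # no-overlap reference
    return 1.0 - mse / mse_worst

# A forecast displaced by one grid cell: poor point-by-point, decent in neighborhoods
obs = np.zeros((20, 20)); obs[8:12, 8:12] = 5.0
fcst = np.roll(obs, shift=1, axis=1)
print(f"window=1: {neighborhood_score(fcst, obs, window=1):.2f}")
print(f"window=5: {neighborhood_score(fcst, obs, window=5):.2f}")
```

The score rises as the window widens past the displacement distance, which is how scale-aware methods separate "close misses" from genuinely bad forecasts.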
Object-based verification example

[Schematic: forecast objects Af, Bf, Cf, Df matched to observed objects Ao, Bo, Co, Do]

Locations: forecast objects are
• Too far north (except B)
• Too far west (except C)

Precipitation intensity:
• Median intensity is too large
• Extreme (0.90 quantile) intensity is too small

Size:
• Forecasts C and D are too small
• Forecast B is somewhat too large

Matching:
• Two small observed objects were not matched

Traditional scores for the same case: POD = 0.27, FAR = 0.75, CSI = 0.34
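Object-based methods like the one in this example begin by resolving each field into discrete rain objects and then comparing object attributes. The sketch below is a simplified illustration (real systems add smoothing, object matching, and many more attributes): it thresholds a 2-D field, labels connected regions, and reports each object's centroid and size, from which location and size errors follow:

```python
import numpy as np
from scipy.ndimage import label, center_of_mass

def find_objects(field, threshold=1.0):
    """Threshold a 2-D rain field and return (centroid, size) per object."""
    mask = field >= threshold
    labeled, n_objects = label(mask)  # connected-component labeling
    centroids = center_of_mass(mask, labeled, range(1, n_objects + 1))
    sizes = [int(np.sum(labeled == i)) for i in range(1, n_objects + 1)]
    return list(zip(centroids, sizes))

# Toy fields: the forecast object is displaced north-west of the observed one
obs = np.zeros((20, 20)); obs[10:14, 10:15] = 5.0
fcst = np.zeros((20, 20)); fcst[7:11, 6:11] = 5.0

for name, field in [("observed", obs), ("forecast", fcst)]:
    for (row, col), size in find_objects(field):
        print(f"{name}: centroid=({row:.1f}, {col:.1f}), size={size} cells")
# Comparing centroids gives the location error; comparing sizes, the size error.
```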
Composite verification example

[Figure, two panels (forecast shaded, observations contoured): average rain (mm) given an event was predicted; average rain (mm) given an event was observed]
From J. Nachamkin
A new paradigm for verification: Levels of User-focus

• Level 0
– Measures-oriented aggregated summaries of performance
– 1 or 2 traditional statistics (e.g., RMSE, CSI)
– Uncertainty in verification measures not considered
– Uses: Administrative
• Level 1
– Broad diagnostic approaches applied
– A more complete view of forecast performance
– Distributions of errors presented for meaningful subsets (temporal, spatial)
– Stratified into relevant categories
– Some uncertainty estimates
– Uses/Users: Admin, forecast developers, some users
• Level 2
– Features-based verification applied
– Detailed information about forecast attributes
– Attribute information can be tailored to meet specific types of information needs
– Results stratified into relevant categories
– Uncertainty information provided
– Uses/Users: Admin, forecast developers, broad range of users
[Object-based verification schematic, as in the earlier example]
• Level 3
– Users identify the type of forecast "quality" or "performance" information that is needed for particular decisions, or as input to a decision-support system
– Verification is tailored to meet the needs of specific users
• Level 4
– Economic or cost-loss models or survey methods are used to assess the value or benefits of particular forecasts for specific users and applications (a cost-loss sketch follows)
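To make Level 4 concrete, here is a minimal sketch of the classic static cost-loss model (a standard formulation, not material from the slides): a user can pay cost C to protect against an event or risk loss L, and the forecast's relative value compares the expense of acting on it with the expense of the best climatology-only strategy:

```python
def relative_value(a, b, c, d, cost_loss_ratio):
    """Relative economic value of a yes/no forecast for a cost-loss user.

    a, b, c, d: hits, false alarms, misses, correct negatives (counts).
    cost_loss_ratio: alpha = C/L, protection cost over avoidable loss.
    Value is 1 for a perfect forecast, 0 for the best climatology strategy.
    """
    n = a + b + c + d
    s = (a + c) / n                              # base rate of the event
    alpha = cost_loss_ratio
    expense_fcst = (a + b) / n * alpha + c / n   # protect on "yes", lose on misses
    expense_clim = min(alpha, s)                 # always protect vs. never protect
    expense_perfect = s * alpha                  # protect exactly when event occurs
    return (expense_clim - expense_fcst) / (expense_clim - expense_perfect)

# Finley's tornado forecasts, for a user with alpha = C/L = 0.1
print(f"Value: {relative_value(28, 72, 23, 2680, 0.1):.2f}")  # positive for this user
```

Value depends strongly on the user's C/L ratio; the same forecast can be worthless, or even harmful, for users with very small or very large ratios, which is exactly why value has to be assessed per user.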
To summarize…

• What makes a good forecast depends on the user and the decision to be made
– Corollary: Different users need different types of verification information
• Forecast verification measures are uncertain, and that uncertainty should be estimated and communicated
• Approaches are available (or could be developed) that more appropriately represent potential forecast value and would be useful for the process of estimating value