Concepts of user-focused verification
Barbara Brown, NCAR; [email protected]
July 2006, WAS*IS
Purposes of verification (Brier and Allen, 1951):
– Administrative
– Scientific
– Economic

Postulate:
– Most verification, to date, serves only the first purpose (administrative)
– This is especially true for verification of operational systems
Historical perspective: The Finley example

• U.S. Army (Signal Service/Corps), 1877 to ~1920
• Tornado predictions, 1884-1885
– Two 8-h outlooks per day
– Spotter reports (~1,000 reporters)
– 18 districts, eastern U.S.; 4 parts in each

[Photos: John Park Finley (1854-1943); tornado at Lebanon, KS, 1902]
(from Galway, 1985; BAMS)
The Finley example

• Finley's forecasts: 96.6% accurate
• "Accuracy" if no tornado forecasts had been issued at all: 98.2%

             Obs. Yes   Obs. No    Sum
Fcst. Yes        28         72     100
Fcst. No         23       2680    2703
Sum              51       2752    2803
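The arithmetic behind this paradox is easy to reproduce. Here is a minimal Python sketch (added for this write-up, not part of the original slides) that recomputes both accuracy figures from the table above:

```python
# Finley's 1884-1885 tornado forecasts as a 2x2 contingency table
hits = 28            # forecast yes, observed yes
false_alarms = 72    # forecast yes, observed no
misses = 23          # forecast no, observed yes
correct_negs = 2680  # forecast no, observed no
total = hits + false_alarms + misses + correct_negs  # 2803

# Accuracy = fraction of correct forecasts (hits + correct negatives)
accuracy = (hits + correct_negs) / total
print(f"Finley's accuracy: {accuracy:.1%}")  # 96.6%

# The paradox: a constant "no tornado" forecast is correct on every
# non-event, i.e., on all 72 + 2680 = 2752 of the 2803 cases
never_accuracy = (false_alarms + correct_negs) / total
print(f"Always forecasting 'no tornado': {never_accuracy:.1%}")  # 98.2%
```

Never forecasting a tornado scores higher than Finley did, which is precisely why accuracy alone is a poor measure for rare events.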
The Finley example: Outcomes

• The first (?) scientific discussion of verification
• Numerous verification measures developed (e.g., Equitable Threat Score, Heidke skill score; both sketched in code below)
• Many issues raised:
– Definition of forecast "event"
– Quality of observations
– Baselines of no skill
– "Dimensionality" of the verification problem
– Specifying purpose of verification
– Use and value of forecasts
– Asymmetric costs of misclassification

Source: Murphy, 1996 (WAF, 11, 3-20)
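For reference, here is a similar sketch (again mine, not from the talk) of the two skill scores just named, using their standard 2x2 contingency-table definitions and Finley's counts:

```python
# Standard 2x2 notation: a=hits, b=false alarms, c=misses, d=correct negatives
a, b, c, d = 28, 72, 23, 2680  # Finley's counts
n = a + b + c + d

# Equitable Threat Score: threat score adjusted for hits expected by chance
a_random = (a + c) * (a + b) / n
ets = (a - a_random) / (a + b + c - a_random)

# Heidke skill score: accuracy relative to the accuracy of random forecasts
hss = 2 * (a * d - b * c) / ((a + c) * (c + d) + (a + b) * (b + d))

print(f"ETS = {ets:.3f}")  # ~0.216
print(f"HSS = {hss:.3f}")  # ~0.355
```

Both scores are modestly positive, a far more honest summary of Finley's skill than 96.6% accuracy.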
Current situation

Not much has changed…
• Measures-based approaches applied in practice
• Operational verification focuses on "management" needs and model-centric applications of verification, i.e., not on diagnostic or user-focused approaches
• Focus is on:
– A few traditional measures
– Aggregated statistics
– A few parameters (e.g., 500 mb height, T, PoP)
Current situation (cont.)

• Model verification "drives" choices in model parameterizations, development, etc.
– Ex: verification of models using RMSE or anomaly correlation applied to 500 mb heights leads to particular choices in model development and evolution (which may, or may not, be intended; see the sketch below)
• Uncertainty in verification measures is rarely estimated
• Forecast use/value is rarely considered
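To make the two measures concrete, here is a minimal sketch of RMSE and (uncentered) anomaly correlation for gridded 500 mb height fields. The code and toy fields are my illustration, and `forecast`, `observed`, and `climatology` are assumed NumPy arrays on a common grid:

```python
import numpy as np

def rmse(forecast, observed):
    """Root-mean-square error over all grid points."""
    return np.sqrt(np.mean((forecast - observed) ** 2))

def anomaly_correlation(forecast, observed, climatology):
    """Correlation of forecast and observed anomalies (departures from
    climatology), a standard summary score for 500 mb height forecasts."""
    fa = forecast - climatology   # forecast anomaly
    oa = observed - climatology   # observed anomaly
    return np.sum(fa * oa) / np.sqrt(np.sum(fa ** 2) * np.sum(oa ** 2))

# Toy example on a small grid (values are illustrative only)
rng = np.random.default_rng(0)
climatology = np.full((10, 10), 5500.0)              # flat 5500 m climatology
observed = climatology + rng.normal(0, 50, (10, 10))
forecast = observed + rng.normal(0, 20, (10, 10))    # forecast with random error
print(f"RMSE = {rmse(forecast, observed):.1f} m")
print(f"AC   = {anomaly_correlation(forecast, observed, climatology):.3f}")
```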
Uncertainty in verification measures

[Figure: Equitable Threat Score (ETS) for model precipitation, shown with confidence intervals]

• Confidence intervals take into account various sources of error, including sampling and observational error
• Computation of confidence intervals for verification stats is not always straightforward (a resampling sketch follows)
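One generic way to attach confidence intervals when the sampling distribution is awkward is bootstrap resampling. The sketch below is a deliberately simplified illustration (the function names and the assumption of independent 1-D boolean arrays of forecast/observation pairs are mine):

```python
import numpy as np

def ets(fcst, obs):
    """Equitable Threat Score from paired yes/no forecasts and observations."""
    a = np.sum(fcst & obs)             # hits
    b = np.sum(fcst & ~obs)            # false alarms
    c = np.sum(~fcst & obs)            # misses
    n = fcst.size
    a_random = (a + c) * (a + b) / n   # hits expected by chance
    return (a - a_random) / (a + b + c - a_random)

def bootstrap_ci(fcst, obs, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for ETS, resampling forecast/obs pairs."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_boot):
        idx = rng.integers(0, fcst.size, fcst.size)  # resample with replacement
        scores.append(ets(fcst[idx], obs[idx]))
    lo, hi = np.percentile(scores, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi
```

Real forecast data are correlated in space and time, so naive resampling of individual pairs tends to give intervals that are too narrow; handling that dependence is part of what makes these computations "not always straightforward."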
Forecast Evaluation: Quality vs. Value

[Diagram: Forecast EVALUATION encompasses both Forecast QUALITY (verification) and Forecast VALUE (user studies; impacts)]
What's wrong with the traditional, measures-oriented approach?

Traditional verification measures (e.g., RMSE, CSI, ETS):
• Provide overall monitoring of forecast performance
But:
• Measure only limited attributes of forecast quality
• Tend to reward "smooth" forecasts
• Do not provide information about what went wrong with a forecast (they only say that it was wrong)
• Cannot diagnose how the forecast can be "fixed" or feed into the forecast development process
• Are not "informative" to users
Challenges and issues: Traditional verification approaches

[Schematic: five forecast (F) / observed (O) pairs; in the first four the forecast and observed areas do not overlap, while in the fifth a large forecast area does overlap the observation]

• The first four forecasts have POD = 0, FAR = 1, CSI = 0; i.e., all are equally "BAD"
• The fifth forecast has POD > 0, FAR < 1, CSI > 0
• The traditional verification approach thus identifies the subjectively "worst" forecast as the "best"
High vs. low resolution

Which rain forecast would you rather use?

[Figure: 24-h rain near Sydney, 21 Mar 2004: mesoscale model (5 km), RMS = 13.0; global model (100 km), RMS = 4.6; and the observed 24-h rain]

"Smooth" forecasts generally "win" according to traditional verification approaches (from E. Ebert); a toy demonstration follows.
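The effect is easy to demonstrate with a toy example (my construction, not Ebert's data): a sharp forecast that displaces a rain peak is penalized by RMSE both for missing the peak and for the false alarm, while a flat forecast of the field mean is penalized only mildly everywhere:

```python
import numpy as np

obs = np.array([0.0, 0.0, 10.0, 0.0, 0.0])    # observed: one sharp rain peak
sharp = np.array([0.0, 10.0, 0.0, 0.0, 0.0])  # sharp forecast, peak displaced by 1
smooth = np.full(5, obs.mean())               # "smooth" forecast: field mean everywhere

rmse = lambda f, o: np.sqrt(np.mean((f - o) ** 2))
print(f"RMSE, sharp but displaced: {rmse(sharp, obs):.2f}")   # 6.32
print(f"RMSE, smooth:              {rmse(smooth, obs):.2f}")  # 4.00
```

By RMSE the smooth forecast "wins", even though it never forecasts any meaningful rain at all.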
Why do users need verification information?

• To improve forecasts
• To determine whether to use a forecast or forecasting system
• Heeding/ignoring warnings
• Interpretation of forecasts ("What does a forecast of 32 really mean?")
• Input to decisions and/or decision-support systems
– Economic and forecast value implications
User-focused verification: Good forecast or Bad forecast?

[Schematic: a forecast precipitation area (F) displaced from the observed area (O)]
If I'm a water manager for this watershed, it's a pretty bad forecast…
[Schematic: the same F and O areas shown relative to a flight route from A to B]

If I'm an aviation traffic strategic planner… it might be a pretty good forecast.

Different users have different ideas about what makes a good forecast.
An initial goal: Diagnostic evaluation approaches

• Identify and evaluate meaningful attributes of the forecasts
– Example questions: What is the typical location error? Size error? Intensity error?
• Provide detailed information about forecast quality
– What went wrong? What went right?
– How can the forecast be improved?
– How do two forecasts differ from each other, and in what ways is one better than the other?
Examples of alternative (more user-focused) diagnostic approaches (spatial forecasts)

• Scale-separation approaches
– How does performance change as the resolution changes?
• Entity-based verification
– What are the major contributors to forecast error?
• "Fuzzy" approaches (see the neighborhood sketch below)
– Take into account observational error and the impacts of displacement errors
• Composite approach
– Evaluate systematic errors
• Object-based verification
– Examine forecasts' ability to reproduce certain attributes (e.g., location, shape, intensity)
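As one illustration of the "fuzzy" family referenced above, here is a minimal neighborhood-verification sketch (my construction, patterned on fractions-based neighborhood scores): instead of requiring exact grid-point matches, it compares the fraction of rainy points within a window around each point, so small displacement errors are forgiven. The arrays `fcst` and `obs` and the function name are assumptions for illustration:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def neighborhood_score(fcst, obs, threshold=1.0, window=3):
    """Compare fractional event coverage in (window x window) neighborhoods.

    Returns a skill-like score: 1 for a perfect match of neighborhood
    fractions, approaching 0 when the fractions never overlap."""
    f_frac = uniform_filter((fcst >= threshold).astype(float), size=window)
    o_frac = uniform_filter((obs >= threshold).astype(float), size=window)
    mse = np.mean((f_frac - o_frac) ** 2)
    mse_worst = np.mean(f_frac ** 2) + np.mean(o_frac ** 2)  # no-overlap reference
    return 1.0 - mse / mse_worst

# A forecast displaced by one grid cell: poor point-by-point, decent in neighborhoods
obs = np.zeros((20, 20)); obs[8:12, 8:12] = 5.0
fcst = np.roll(obs, shift=1, axis=1)
print(f"window=1: {neighborhood_score(fcst, obs, window=1):.2f}")
print(f"window=5: {neighborhood_score(fcst, obs, window=5):.2f}")
```

The score rises as the window widens past the displacement distance, which is how scale-aware methods separate "close misses" from genuinely bad forecasts.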
Object-based verification example

[Schematic: forecast objects Af, Bf, Cf, Df matched to observed objects Ao, Bo, Co, Do]

Locations: forecast objects are
• Too far north (except B)
• Too far west (except C)

Precipitation intensity:
• Median intensity is too large
• Extreme (0.90 quantile) intensity is too small

Size:
• Forecasts C and D are too small
• Forecast B is somewhat too large

Matching:
• Two small observed objects were not matched

Traditional scores for the same case: POD = 0.27, FAR = 0.75, CSI = 0.34
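Object-based methods like the one in this example begin by resolving each field into discrete rain objects and then comparing object attributes. The sketch below is a simplified illustration (real systems add smoothing, object matching, and many more attributes): it thresholds a 2-D field, labels connected regions, and reports each object's centroid and size, from which location and size errors follow:

```python
import numpy as np
from scipy.ndimage import label, center_of_mass

def find_objects(field, threshold=1.0):
    """Threshold a 2-D rain field and return (centroid, size) per object."""
    mask = field >= threshold
    labeled, n_objects = label(mask)  # connected-component labeling
    centroids = center_of_mass(mask, labeled, range(1, n_objects + 1))
    sizes = [int(np.sum(labeled == i)) for i in range(1, n_objects + 1)]
    return list(zip(centroids, sizes))

# Toy fields: the forecast object is displaced north-west of the observed one
obs = np.zeros((20, 20)); obs[10:14, 10:15] = 5.0
fcst = np.zeros((20, 20)); fcst[7:11, 6:11] = 5.0

for name, field in [("observed", obs), ("forecast", fcst)]:
    for (row, col), size in find_objects(field):
        print(f"{name}: centroid=({row:.1f}, {col:.1f}), size={size} cells")
# Comparing centroids gives the location error; comparing sizes, the size error.
```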
Composite verification example

[Figure, two panels (forecast shaded, observations contoured): average rain (mm) given an event was predicted; average rain (mm) given an event was observed]
From J. Nachamkin
A new paradigm for verification: Levels of User-focus

• Level 0
– Measures-oriented aggregated summaries of performance
– 1 or 2 traditional statistics (e.g., RMSE, CSI)
– Uncertainty in verification measures not considered
– Uses: Administrative
• Level 1
– Broad diagnostic approaches applied
– A more complete view of forecast performance
– Distributions of errors presented for meaningful subsets (temporal, spatial)
– Stratified into relevant categories
– Some uncertainty estimates
– Uses/Users: Admin, forecast developers, some users
• Level 2
– Features-based verification applied
– Detailed information about forecast attributes
– Attribute information can be tailored to meet specific types of information needs
– Results stratified into relevant categories
– Uncertainty information provided
– Uses/Users: Admin, forecast developers, broad range of users
[Object-based verification schematic, as in the earlier example]
• Level 3
– Users identify the type of forecast "quality" or "performance" information that is needed for particular decisions, or as input to a decision-support system
– Verification is tailored to meet the needs of specific users
• Level 4
– Economic or cost-loss models or survey methods are used to assess the value or benefits of particular forecasts for specific users and applications (a cost-loss sketch follows)
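To make Level 4 concrete, here is a minimal sketch of the classic static cost-loss model (a standard formulation, not material from the slides): a user can pay cost C to protect against an event or risk loss L, and the forecast's relative value compares the expense of acting on it with the expense of the best climatology-only strategy:

```python
def relative_value(a, b, c, d, cost_loss_ratio):
    """Relative economic value of a yes/no forecast for a cost-loss user.

    a, b, c, d: hits, false alarms, misses, correct negatives (counts).
    cost_loss_ratio: alpha = C/L, protection cost over avoidable loss.
    Value is 1 for a perfect forecast, 0 for the best climatology strategy.
    """
    n = a + b + c + d
    s = (a + c) / n                              # base rate of the event
    alpha = cost_loss_ratio
    expense_fcst = (a + b) / n * alpha + c / n   # protect on "yes", lose on misses
    expense_clim = min(alpha, s)                 # always protect vs. never protect
    expense_perfect = s * alpha                  # protect exactly when event occurs
    return (expense_clim - expense_fcst) / (expense_clim - expense_perfect)

# Finley's tornado forecasts, for a user with alpha = C/L = 0.1
print(f"Value: {relative_value(28, 72, 23, 2680, 0.1):.2f}")  # positive for this user
```

Value depends strongly on the user's C/L ratio; the same forecast can be worthless, or even harmful, for users with very small or very large ratios, which is exactly why value has to be assessed per user.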
To summarize…

• What makes a good forecast depends on the user and the decision to be made
– Corollary: Different users need different types of verification information
• Forecast verification measures are uncertain, and that uncertainty should be estimated and communicated
• Approaches are available (or could be developed) that more appropriately represent potential forecast value and would be useful for the process of estimating value