Slide 1: Forecast verification: probabilistic aspects
Anna Ghelli, ECMWF
Met Op training course – Reading, March 2004

Slide 2: Why probability forecasts?
"... the widespread practice of ignoring uncertainty when formulating and communicating forecasts represents an extreme form of inconsistency and generally results in the largest possible reductions in quality and value." -- Murphy (1993)

Slide 3: Outline
1. Basics
2. Verification measures
3. Performance
4. Signal Detection Theory: Relative Operating Characteristic (ROC)
5. Cost-loss model
6. Conclusions

Slide 4: Basics – types of forecasts
- Completely confident: rain / no rain
- Probabilistic: objective (deterministic, statistical, ensemble-based) or subjective
[Figure: probability density P(x) around the observed value x0]

Slide 5: Verification framework
- The observed value x is 0 if the event did not occur and 1 if it did: x = 0 or 1
- The forecast probability varies between 0 and 1: f = 0, ..., 1.0
- Joint distribution: p(f,x), where x = 0, 1

Slide 6: Factorization – conditional and marginal probabilities
- Calibration-refinement factorization: p(f,x) = p(x|f) p(f), where p(f) is the frequency of use of each forecast probability
- Likelihood-base rate factorization: p(f,x) = p(f|x) p(x), where p(x) is the relative frequency of a "yes" observation (e.g. the sample climatology)

Slide 7: Verification measures based on the calibration-refinement factorization
- Reliability diagram: p(x=1|f_i) vs. f_i. A plot of the observed relative frequency of an event as a function of its forecast probability; it shows the agreement between the mean forecast probability and the observed frequency.
- Sharpness diagram: p(f). It indicates the capability of the system to forecast extreme values, i.e. probabilities close to 0 or 1.
- Attributes diagram: reliability, resolution, skill/no-skill.

Slide 8: Performance measures
- Brier score: analogous to the MSE; negative orientation; BS = 0 for perfect forecasts.
- Brier skill score: analogous to the MSE skill score.

Slide 9: Decomposition of the Brier score – reliability, resolution and uncertainty
With I the total number of distinct forecast probability values:
- Resolution tells how informative the probabilistic forecast is. It varies from zero, for a system in which all forecast probabilities verify with the same frequency of occurrence, to the sample uncertainty, for a system in which the frequency of verifying occurrences takes only the values 0 or 100% (such a system perfectly resolves occurring from non-occurring events).
- Reliability tells how close the frequencies of observed occurrences are to the forecast probabilities (on average, when an event is forecast with probability p, it should occur with frequency p).
- Uncertainty varies from 0 to 0.25 and indicates how close to 50% the occurrence of the event was during the sample period (uncertainty is 0.25 when occurrences and non-occurrences are split equally).
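The formulae on slides 8 and 9 were shown as images in the original deck, so here is a minimal sketch (mine, not from the slides) of how the Brier score and its reliability/resolution/uncertainty decomposition can be computed; the function names and the NumPy-based implementation are illustrative assumptions. The per-probability observed frequencies computed in the loop are exactly the points plotted in a reliability diagram (slide 7).

```python
import numpy as np

def brier_score(p, o):
    """Brier score: mean squared difference between forecast
    probabilities p and binary outcomes o (0 or 1). Perfect = 0."""
    p, o = np.asarray(p, float), np.asarray(o, float)
    return np.mean((p - o) ** 2)

def brier_decomposition(p, o):
    """Murphy decomposition BS = reliability - resolution + uncertainty,
    computed over the I distinct forecast probability values."""
    p, o = np.asarray(p, float), np.asarray(o, float)
    obar = o.mean()                      # sample climatology
    rel = res = 0.0
    for f in np.unique(p):               # loop over distinct probabilities
        mask = (p == f)
        ok = o[mask].mean()              # observed frequency given forecast f (reliability-diagram point)
        w = mask.sum() / len(p)          # relative frequency of use of f
        rel += w * (f - ok) ** 2
        res += w * (ok - obar) ** 2
    unc = obar * (1.0 - obar)            # between 0 and 0.25
    return rel, res, unc

# toy example (hypothetical data)
p = np.array([0.1, 0.1, 0.6, 0.9, 0.9])
o = np.array([0,   0,   1,   1,   0  ])
bs = brier_score(p, o)
rel, res, unc = brier_decomposition(p, o)
print(bs, rel - res + unc)              # the two values agree
bss = 1.0 - bs / unc                    # Brier skill score with climatology as reference
```

On the toy data the identity BS = reliability - resolution + uncertainty holds exactly because the forecasts are grouped by their distinct probability values; with binned probabilities the identity is only approximate.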
Slide 10: Reliability and sharpness (from Wilks 1995)
[Figure: example reliability and sharpness diagrams illustrating climatology, minimal resolution, underforecasting, good resolution at the expense of reliability, reliable forecasts of a rare event, and small sample size]

Slide 11: Attributes diagram (from Wilks 1995)

Slide 12: Examples

Slide 13: Reliability diagram
24 h accumulated precipitation forecasts verified against observed values for different thresholds: 1mm/24h (right) and 5mm/24h (bottom). The diagrams are for Europe, for the period December 2003 to February 2004. For the 1mm/24h threshold the model is overconfident; the curve is much closer to the diagonal (perfect forecast) for the 5mm/24h threshold.

Slide 14: Reliability diagram
24 h accumulated precipitation forecasts verified against observed values for different thresholds: 10mm/24h (right) and 20mm/24h (bottom). The diagrams are for Europe, for the period December 2003 to February 2004. For the 10mm/24h threshold the model shows a very good match between forecast probabilities and observed frequencies. The 20mm/24h threshold shows the effect of small sample size.

Slide 15: Reliability diagram
T850 anomaly greater than 4K (right) and 8K (bottom). The diagrams are for Europe, for the period June 2003 to July 2003. For both anomalies the forecast is overconfident.

Slide 16: Brier Skill Score (reference is the long-term climate) for Europe at t+96 (top panel) and t+144 (bottom panel). The variable is the temperature at 850hPa. The curve shows the improvement over the reference system. Smaller anomalies are better forecast.

Slide 17: Brier Skill Score (BSS) for different thresholds, forecast range D+4
Improvements of the EPS in 1999 (increase in vertical resolution and change in the cloud scheme) and in autumn 2000 (change in horizontal resolution).

Slide 18: Signal Detection Theory (SDT)
- An approach commonly applied in medicine and other fields, brought to meteorology by Ian Mason (1982)
- Evaluates the ability of forecasts to discriminate between occurrence and non-occurrence of an event
- Summarizes characteristics of the likelihood-base rate decomposition of the framework
- Tests model performance relative to a specific threshold
- Allows comparison of categorical and probabilistic forecasts

Slide 19: ROC – basics
Based on the likelihood-base rate decomposition p(f,x) = p(f|x) p(x), using the 2x2 contingency table:

              Obs YES   Obs NO
  FC YES         a         b
  FC NO          c         d

Basic elements:
- Hit rate: H = a/(a+c), an estimate of p(f=1|x=1)
- False alarm rate: F = b/(b+d), an estimate of p(f=1|x=0)
- Relative Operating Characteristic (ROC) curve: plot of H vs. F
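To make the ROC construction of slide 19 concrete, here is a minimal sketch (assumed helper names and toy data, not from the slides): for each probability threshold the probabilistic forecast is converted to a yes/no forecast, H and F are read off the contingency table, and the area under the resulting curve is approximated with straight line segments, which, as noted later on slide 23, can underestimate the true area.

```python
import numpy as np

def hit_and_false_alarm_rates(prob, obs, threshold):
    """Turn probability forecasts into yes/no at a given threshold and
    return (H, F) from the 2x2 contingency table."""
    fc_yes = np.asarray(prob) >= threshold
    ob_yes = np.asarray(obs).astype(bool)
    a = np.sum(fc_yes & ob_yes)            # hits
    b = np.sum(fc_yes & ~ob_yes)           # false alarms
    c = np.sum(~fc_yes & ob_yes)           # misses
    d = np.sum(~fc_yes & ~ob_yes)          # correct rejections
    H = a / (a + c) if (a + c) else 0.0    # hit rate, estimate of p(f=1|x=1)
    F = b / (b + d) if (b + d) else 0.0    # false alarm rate, estimate of p(f=1|x=0)
    return H, F

def roc_area(prob, obs, thresholds):
    """Approximate the area under the ROC curve with straight line
    segments (trapezoids); this can underestimate the true area."""
    pts = [hit_and_false_alarm_rates(prob, obs, t) for t in thresholds]
    pts = [(0.0, 0.0)] + sorted(pts, key=lambda hf: hf[1]) + [(1.0, 1.0)]
    area = 0.0
    for (h0, f0), (h1, f1) in zip(pts[:-1], pts[1:]):
        area += 0.5 * (h0 + h1) * (f1 - f0)
    return area                            # > 0.5 indicates positive skill

# toy usage: ensemble probabilities and observed occurrence of an event
prob = np.array([0.05, 0.2, 0.4, 0.7, 0.9, 0.95])
obs  = np.array([0,    0,   1,   1,   1,   1   ])
print(roc_area(prob, obs, thresholds=np.arange(0.1, 1.0, 0.1)))
```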
Slide 20: ROC
24 h accumulated precipitation for Europe; DJF 2001-2002. Thresholds: > 1mm/24h and > 5mm/24h.

Slide 21: ROC
24 h accumulated precipitation for Europe; DJF 2003-2004. Thresholds: > 1mm/24h and > 5mm/24h.

Slide 22: ROC
24 h accumulated precipitation for Europe; JJA 2002. Threshold: > 5mm/24h.

Slide 23: ROC area
- The area under the ROC curve is a measure of forecast skill
- Values less than 0.5 indicate negative skill
- The area can be underestimated if the curve is approximated by straight line segments

Slide 24: ROC area
T850 verified against the analysis at t+96 (top) and t+144 (bottom). Verification area: Europe.

Slide 25: ROC area for different thresholds, forecast range D+4
Sensible improvements of the EPS since autumn 2000.

Slide 26: ROC area for different thresholds, forecast range D+7

Slide 27: Verification of ensemble forecasts – summary
Probabilistic forecasts from ensemble systems can be verified using standard approaches for probabilistic forecasts. Common methods:
- Brier score
- Reliability diagram
- Brier Skill Score
- ROC
- ROC area

Slide 28: Cost-loss basics

                        Bad weather: yes   Bad weather: no
  Protect: yes                 C                  C
  Protect: no                  L                  0

                        Event occurs: yes  Event occurs: no
  Event forecast: yes          a                  b
  Event forecast: no           c                  d

- Using the forecast all the time: expense E_f = aC + bC + cL
- Perfect forecast: expense E_p = (a+c)C
- Climate information: expense E_c = min(C, (a+c)L)
- Value of the forecast: the reduction in expense compared with climate information,
  V = (saving from using the forecast) / (saving from a perfect forecast) = (E_c - E_f) / (E_c - E_p)

Slide 29: Quality, value and the user
Value can be written in terms of H, F, C/L and the event frequency (a+c); the explicit expression takes a different form depending on whether C/L is greater or smaller than the event frequency. Value therefore depends on:
- forecast quality, through H and F
- the user, through the cost-loss ratio C/L
- the weather event, through its frequency (a+c)

Slide 30: Cost-loss model – probabilistic forecast value
Knowing the climatological probability p_clim that the adverse event happens, take action if p_clim * L is larger than C:
- p_clim > C/L: take action
- p_clim < C/L: no action

Slide 31: Cost-loss model – probabilistic forecast value
- Act when the forecast probability exceeds a certain threshold
- The choice of probability threshold is user dependent

Slide 32: Cost-loss model – deterministic vs EPS
Control forecast: red line; EPS: blue line.

Slide 33: Conclusions
- Probabilistic forecasts from ensemble systems can be verified using standard approaches for probabilistic forecasts. Common methods are the Brier score, the reliability diagram, the Brier Skill Score, the ROC and the ROC area.
- The performance of the EPS assessed with probabilistic scores shows improvements.
- Quality is not the only consideration: we should also look at the value of a forecast to its final user. A forecast has value if it helps the end user to make decisions.
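As a closing illustration of the cost-loss framework from slides 28 and 29, here is a minimal sketch (the function name and the toy numbers are my own, illustrative assumptions): the expenses E_f, E_p and E_c are computed from the normalised contingency-table entries, and the relative value V compares the saving achieved by the forecast with the saving a perfect forecast would achieve.

```python
def forecast_value(a, b, c, d, C, L):
    """Relative economic value of a yes/no forecast in the cost-loss model.
    a, b, c, d are relative frequencies (hits, false alarms, misses,
    correct rejections) summing to 1; C is the cost of protecting,
    L the loss if the event occurs unprotected."""
    s = a + c                          # climatological event frequency
    E_f = (a + b) * C + c * L          # always follow the forecast
    E_p = s * C                        # perfect forecast: protect only when needed
    E_c = min(C, s * L)                # best fixed action from climate information
    return (E_c - E_f) / (E_c - E_p)   # saving from forecast / saving from perfect forecast

# toy numbers (hypothetical), user with C/L = 0.2
print(forecast_value(a=0.08, b=0.10, c=0.02, d=0.80, C=20.0, L=100.0))
```

With these toy numbers V = 0.55, i.e. the forecast recovers a little over half of the saving a perfect forecast would give for this user.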