Recent trends in dynamical medium-range tropical cyclone track prediction and the role of resolution versus physics in the ECMWF model
Michael Fiorino National Hurricane Center
Miami FL
4 December 2008
Abstract The conventional wisdom in the tropical cyclone (TC) modeling community is that dynamical TC track and intensity prediction requires: 1) high-resolution (order 1 km); 2) the assimilation of detailed observations in the inner core; and 3) ensembles of independent deterministic and perturbed models. However, recent results from the deterministic run of the ECMWF global model challenge this wisdom and suggest that the critical and dominant factor in numerical TC prediction skill is the modeling of physical processes. In this paper we analyze how model changes, in both resolution and in the parameterization of key physical processes, impact the model forecasts of the global tropical wind field and TC tracks. Since a change in the cumulus convection scheme in November 2007, the ECMWF TC track forecasts have outperformed multi-model consensus by 20% globally in the medium-range (72 h). We review recent trends in model performance vis-à-vis consensus to demonstrate that this improvement is unprecedented and may be a harbinger of a new era in TC track prediction. Finally, some personal views are offered on the implication of these results for TC model development.
1. Introduction – TCs, Tropical Wind Scores and Model Physics
One of the more remarkable results from the global atmospheric reanalysis projects is the
strong dependence of the analysis of tropical cyclones (TC) on the model, rather than the
observations. This dependence is demonstrated in Fig. 1 below from the first ECMWF
reanalysis, ERA-15 (Serrano 1998). The ERA-15 analysis detects about 85% of
observed TCs from 1979-1994 (solid blue line), but the operational ECMWF model
(dashed blue line) at the time only reaches that detection rate in 1989. The primary
difference between the operational and ERA-15 analysis is the model and data
assimilation scheme, as the reanalysis used essentially the same observations as in
operations.
Fig. 1. Detection of tropical cyclones in ECMWF ERA-15 reanalysis (solid blue line) and operations (dashed blue line).
Thus, the poor quality of the operational model analysis of TCs was not caused by
insufficient observations but by the modeling.
Fig. 2. Tropical skill score -- time in days when the correlation between the analyzed and forecast 850 mb winds in the tropics drops to 70%.
Next consider the change in the standard ECMWF tropical forecast skill score for the
period 1980-2008. Comparing this time series (red line in Fig. 2) to the TC detection in
operations in Fig. 1 (blue dashed line), we can easily see a strong correlation of TC
detection and the 850 mb tropical wind score (ECMWF 2008). Similar correlations have
been found with the NCEP-NCAR reanalysis and the second ECMWF reanalysis ERA-
40 (Uppala et al., 2005). Thus, by understanding how the tropical wind score varies with
model changes, we may gain a better understanding of the critical modeling and
observation factors that influence TC analysis and ultimately prediction.
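The skill score in Fig. 2 is defined as the lead time at which the correlation between analyzed and forecast 850 mb tropical winds drops to 70%. A minimal sketch of that calculation follows, assuming the correlation is available at discrete lead times and that linear interpolation between them is acceptable. The function name and the sample correlations are hypothetical and are not ECMWF data or ECMWF's own procedure.

```python
import numpy as np

def days_to_threshold(lead_days, correlation, threshold=0.70):
    """Lead time (days) at which the correlation first falls below `threshold`.

    Linear interpolation is used between the two lead times that bracket
    the crossing. Returns NaN if the correlation never falls below it.
    """
    lead = np.asarray(lead_days, dtype=float)
    corr = np.asarray(correlation, dtype=float)
    below = np.nonzero(corr < threshold)[0]
    if below.size == 0:
        return float("nan")
    i = int(below[0])
    if i == 0:
        # Already below the threshold at the first lead time.
        return float(lead[0])
    frac = (corr[i - 1] - threshold) / (corr[i - 1] - corr[i])
    return float(lead[i - 1] + frac * (lead[i] - lead[i - 1]))

# Hypothetical correlations at day 1..5 (illustration only).
lead = [1, 2, 3, 4, 5]
corr = [0.95, 0.85, 0.75, 0.65, 0.55]
score = days_to_threshold(lead, corr)
print(f"tropical wind score: {score:.2f} d")
```

With these made-up values the crossing falls halfway between day 3 and day 4.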
There are a number of events in the Fig. 2 time series:
• 1980-1986 – slow, steady improvement and larger upward swings in the Boreal summer
• 1989-1993 – rapid improvement at the end of 1989, even bigger positive swings
• 1993-1999 – noticeable drop in late 1993 followed by a period of faster improvement
• 2000-2001 – less pronounced summer-time skill gain and more pronounced downward swings in the Austral summer
• 2003-2005 – skill advances to 7 d from 2 d in the early 1980s, but plateaus. Boreal summer-time skill reaches 10 d!
• 2006 – slight dip
• 2008 – positive bump
The ECMWF web site (www.ecmwf.int) has excellent documentation on model changes
and some details are extracted and put into Appendix A. The highlights of the ECMWF
model resolution and data assimilation history are given in Table 1 below:
Table 1. Major changes in the ECMWF model
year | event
1987 | T106L19 model resolution
1993 | T213L31 model resolution
1996 | 3DVAR
1997 | 4DVAR
2000 | Tl319L60 model resolution
2001 | Tl511L60 model resolution
2006 | Tl799L91 model resolution
A slight but discernible degradation in the tropical wind score can be seen for the
resolution increases in 1987, 1993, 2001 and 2006, with the most pronounced drop
associated with the largest resolution increase in 1993.
This response to resolution change is typical for areas/meteorology sensitive to physics.
The problem is that any parameterization assumes a separation of scales (large scale v
small scales of the physics) and when a model resolution change moves into the gap
between the scales, the parameterizations behave differently. Increased resolution always
violates the scale assumption so the parameters of the physics schemes need to be
“adapted” to the new resolution. This model development process is essential to
restoring skill and while resolution changes can be painful, they do have model
diagnostic value.
The main point is that the big jump in skill in 1989 was not simply due to resolution, but
is more likely the result of a reformulation of the cumulus convection scheme that
included moving to the mass flux approach (Arakawa and Schubert 1974) in that year.
Other significant physics changes include (details in Appendix A):
• June 1990: increased latent heat flux in the western Pacific warm pool at low surface wind speeds
• April 1991: revision to the convection scheme for more consistency with the cloud model and the triggering of showers.
The net result was a near doubling of the skill score, from ~2.5 d to ~4.25 d in the early
1990s, until the resolution change in 1993.
The decade of 1995- 2005 saw a remarkable improvement from 4 d to nearly 7 d that I
attribute to 3- and 4-DVAR in the mid 1990s and general model improvements to the
point that by 2004 the ECMWF deterministic TC track predictions were competitive with
the best models of the day, both global (UKMO, NOGAPS and GFS) and limited-area
(GFDL) as demonstrated for the Atlantic in Fig. 3. Details of the track data used are given
in Appendix C, the post-processing is described in Appendix B and the models are
described in Table 2.
Fig. 3. ECMWF track forecast error in the Atlantic basin for 2004, compared to GFDL (gfdl06), UKMO (ukmo06), NOGAPS (ngp06) and GFS (gfsn06). All raw model output has been post-processed in the same way as in operations, as described in Appendix B.
The strong correlation of the tropical wind score and TC track prediction has also been
found with other models and in the reanalyses. This correlation is consistent with our
physical understanding of tropical cyclone motion (Fiorino and Elsberry 1989) and its
dependence on the global/large scales of the tropical general circulation. Not only are
TCs the extreme weather event that challenges a model and pushes the physics to the
limit, but TC track prediction is a good measure of the quality of the large-scale tropical
wind field.
Table 2. Description of the model TC track data
short name | source | description
ngps06 | FNMOC NOGAPS | US Navy global model; first formal evaluation in 1994 by JTWC. Available twice daily from 1992-2001 and then four times daily from 2002-2008
gfdl06 | NCEP GFDL | GFDL hurricane model run at NCEP. Very few cases in 1992-93; run twice daily 1994-1999
ecmo06 | ECMWF | ECMWF global model, deterministic 10-d integration
ukmo06 | UKMO | UKMO Unified Model, global operational version. Available 1996-2008 in the Atlantic and 1991-2008 in WPAC. Model run twice daily at 00 and 12 UTC.
egrr06 | UKMO | Human QC’d output from the UKMO UM global model; first formal evaluation in 1998 by JTWC, available twice daily
gfsn06 | NCEP GFS | NCEP GFS global model
clip | CLIPER | CLImatology and PERsistence no-skill statistical model
bcon | NHC/JTWC | Best/Baseline model CONsensus made at the operational forecast centers; Appendix C gives details on which specific consensus model was BCON.
2.0 Medium-range TC track prediction and Multi-model Consensus
The big improvement in the ECMWF tropical wind score circa 1990 marks the beginning
of a period of unprecedented gain in dynamical TC forecasting that has resulted in a
nearly 100% improvement in official TC track prediction from the early 1980s to early
2000s, especially at the 72-h forecast time, in what I call the “medium-range.”
There are many reasons to focus on the medium-range, both operationally and in a
modeling sense, but the main reason for using 72-h in this study is because of its model
diagnostic value. Simply put, the model has to “get everything right” to make a good
medium-range track prediction.
By three days into the integration, the model has lost a strong connection with the initial
conditions and even a perfect analysis could not prevent fundamental model error growth
and chaos from causing significant error (~20%) in the solution. While persistence may
be a good forecast for TC motion in the short-range (12-24 h), by 72 h a persistence
forecast has no skill (~500 km, which is 10-15 times the initial position error of ~30 km).
TCs are observed to generally change direction and speed of motion in 3 d and this
change is often the result of an interaction of the vortex with a mid-latitude feature such
as a break in the subtropical ridge. Thus, the model track forecast depends on both the
forecast of the large-scale “steering” flow and on a scale interaction that in turn depends
on changes in the vortex, i.e., the model has to forecast both the vortex and synoptic
scales.
Fortunately for global models, the dynamics of vortex/large-scale flow interaction that is
critical to motion occurs on scales resolved by large-scale models (Fiorino and Elsberry
1989). Thus, high-resolution (dx ~10-20 km) is not a requirement for good medium-
range TC track forecasts.
Another milestone in numerical medium-range TC track prediction was when the
ECMWF tropical wind score reached 7 d in 2003. This coincided with the first
operational application of multi-model consensus forecasting in which an ensemble of
quasi-independent deterministic model runs is combined to produce a consensus
forecast. While a number of schemes for combining the forecasts have been tried, a
simple average of the tracks has proven as successful as or better than more elegant
approaches.
The success of consensus depends on two key factors: 1) the degree of decorrelation of
the errors between the individual models; and 2) all members must have skill similar to,
or close to, the best model (Goerss 2000). The result from operations is that, in the mean,
consensus generally has more skill than any of the individual models used in the
consensus.
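The simple-average consensus and the role of error decorrelation can be illustrated with a short sketch. It assumes each member supplies a forecast latitude/longitude at a single verification time, averages them arithmetically (ignoring date-line wraparound), and measures error as great-circle distance in nautical miles. The positions and function names are hypothetical; this is not the operational BCON procedure described in Appendix C.

```python
import math

EARTH_RADIUS_NM = 3440.065  # mean Earth radius in nautical miles

def great_circle_nm(lat1, lon1, lat2, lon2):
    """Great-circle distance (nm) between two points, haversine formula."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(a))

def consensus_position(positions):
    """Simple average of member (lat, lon) positions.

    Assumes all longitudes are on the same side of the date line.
    """
    lats = [lat for lat, _ in positions]
    lons = [lon for _, lon in positions]
    return sum(lats) / len(lats), sum(lons) / len(lons)

# Hypothetical 72-h verifying position and three member forecasts whose
# errors point in different directions (i.e., are largely decorrelated).
verifying = (25.0, -75.0)
members = [(26.0, -75.0), (24.5, -75.0), (25.0, -74.0)]

member_errors = [great_circle_nm(*m, *verifying) for m in members]
consensus_error = great_circle_nm(*consensus_position(members), *verifying)

print("member errors (nm):", [round(e, 1) for e in member_errors])
print("consensus error (nm):", round(consensus_error, 1))
```

Because the member errors partly cancel, the consensus error here is smaller than that of every member. If all members erred in the same direction, averaging would give no such benefit, which is the decorrelation requirement stated above.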
2.1 Trends in the Atlantic basin
All the points in the preceding discussion are demonstrated in Fig. 4 where we show a
time series of yearly mean 72-h forecast error for two “best” models, the standard no-skill
baseline aid CLIPER (CLImatology and PERsistence), the best/baseline consensus aid
BCON and the official forecast. The models are described in Table 2 and details on the
data sets, tracking and consensus are given in Appendix C.
We first consider the Atlantic basin and the most consistently skillful model with a long
record – GFDL hurricane model, but before discussing the models note the curious
variation in the CLIPER error. There is a notable downward trend in CLIPER error and
an oscillation with a 10-y period. The lower CLIPER error implies that hurricanes are
behaving in a more climatological manner. However, the CLIPER model was updated in
2000 and the forecast extended from 72 h to 120 h, so that part of the change is because
of the improved TC data bases used in the model development. Nonetheless, there seems
to be a downward trend from 2000-2007, but a rise in 2008. The significance of the rise
is that prior to 2008, the model and consensus error tended to generally follow CLIPER,
but in 2008 model error moved downward despite the TC motion being less
climatological than in previous years.
From 1992-1997 the GFDL model had lower error than the official forecast (purple line
below the red line). The number of cases was very small in 1992-93 since the model
was still experimental, but by 1995 the model had high availability to the forecaster and
was run twice daily. Since 2000, the GFDL model has been run at the same frequency at which
the official forecasts are made – four times daily. The point here is not that the GFDL
model “beat” the human forecasters, but that the model showed skill and that the
forecasters were able to successfully use the guidance.
It is also noteworthy how the model showed more year-to-year variability until 2004, when
the ECMWF tropical score reached 7 d. This skill consistency is perhaps the result of the
stability (greater run-to-run consistency) a large observing system gives the global model
(NCEP GFS). In turn, the global model provides initial and lateral boundary conditions
to the limited-area GFDL model.
Fig. 4. Medium-range (72-h) mean forecast error in the Atlantic basin for the years 1992-2008 for: a "best" dynamical model (gfdl06 - blue); the UKMO global Unified Model (ukmo06 – orchid); the best/baseline consensus (bcon – yellow); the official NHC forecast (ofcl - red); and the no-skill baseline aid CLImatology and PERsistence (clip - wheat). The solid line is a smoothed version of the time series (dashed). The dynamical model track was post-processed to be consistent with operations and the aids (see Appendix B for details). The error and number of cases (in []) are given in the table below the plot. bcon is only available from 2000-2008 and the statistics are homogeneous with clip.
Also note how both the model and the official forecast errors slowly vary in a similar
manner to CLIPER (wheat colored line), rising in the early 2000s and then falling until
2007-2008. CLIPER is also used to measure forecast difficulty as high CLIPER errors
imply that the TCs did not behave in a climatological or typical manner for that year. It
could be very telling that the model did not follow CLIPER upward in 2008; an
indication that the global model analysis has even higher quality than in previous years.
Details aside, the model and official forecast skill is much greater than CLIPER and there
has been a clear downward trend in both. Also note how the official forecast error is lower
than that of the model. From 1992 to 2008 the official forecast error has been cut from 294 nm
to 127 nm, a greater than 100% improvement in the sense that the 1992 error was more than
twice the 2008 error (a 57% reduction in error). The GFDL model improvement over that
time period was not as dramatic, but consensus (heavy gold line) was better than the
model and on par with the forecasters. The UKMO global model (orchid line) shows a
similar trend.
Fig. 5. Gain or % improvement of the GFDL model, BCON consensus, and the official OFCL forecasts relative to CLIPER. Positive values indicate lower forecast error than CLIPER.
An alternative view of the error statistics is to calculate a % gain or improvement against
some baseline as shown in Fig. 5. The standard comparison baseline is CLIPER and
positive values indicate how much better (lower) the mean forecast error is relative to the
baseline.
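The paper does not spell out the gain formula, so the sketch below assumes the conventional definition of skill relative to a baseline: the difference between baseline and aid error, normalized by the baseline error. The function name is hypothetical. The first example uses the 2008 Atlantic 72-h mean errors quoted later in the text (ECMWF 116 nm, BCON 129 nm); the second uses made-up values.

```python
def percent_gain(fe_aid, fe_baseline):
    """Percent gain of a forecast aid relative to a baseline.

    Assumed definition: 100 * (FE_baseline - FE_aid) / FE_baseline.
    Positive means the aid has lower mean forecast error than the baseline.
    """
    if fe_baseline <= 0:
        raise ValueError("baseline forecast error must be positive")
    return 100.0 * (fe_baseline - fe_aid) / fe_baseline

# ECMWF versus BCON, Atlantic 2008, 72 h (values quoted in the text).
gain_ecmwf = percent_gain(fe_aid=116.0, fe_baseline=129.0)

# Made-up example of an aid that is worse than its baseline.
gain_worse = percent_gain(fe_aid=125.0, fe_baseline=100.0)

print(f"ECMWF v BCON: {gain_ecmwf:+.1f}%")
print(f"worse aid:    {gain_worse:+.1f}%")
```

Under this definition the gain can never exceed 100%, and an aid that is worse than the baseline gets a negative value, matching the sign convention used in Figs. 5 and 6.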
The general improvement trend is less pronounced over the 17-y period, but what is more
interesting is how the model and consensus are becoming even better vis-à-vis CLIPER
from 2006-2008, to over 50%. We also see that consensus is close to or slightly better than
the official forecast, but essentially they are neck and neck.
To bring the comparison into sharper focus, we use BCON as the baseline instead of the
CLIPER baseline used in Fig. 5.
Fig. 6. as in Fig. 5 but for the % gain of the model (gfdl06 - purple) and the official NHC forecast (ofcl - red) against consensus. Negative values indicate the aid is poorer (higher forecast error) than the baseline.
The GFDL model is clearly not as skillful as consensus and is ~25% worse (Fig. 6). The
UKMO model is similar, but with much larger year-to-year swings. The +16% gain on
BCON in 2007 is not significant [1] because of very few cases/storms in the Atlantic for
that year.
[1] Statistical significance is not addressed, as the purpose is to examine broad trends and relationships (more cases), not to overly focus on year-to-year differences.
The degree of degradation varies with the model and year, but what we do not find is a
model that is consistently better than consensus at 72 h. The only way a model can
outperform consensus is if it has much lower error than its peers, i.e., skill does not come
from error compensation, but from better meteorology – good results for the physically
more correct reasons.
Note how it took five years for NHC to effectively utilize BCON, to the point that the
official forecast is now as good as, or better than, consensus. The seemingly slow
adoption of new guidance is understandable as no one forecast aid has ever been
consistently superior to all other models, especially consensus.
The situation with TC model consensus is akin to the concept of “diversification” as a
financial investment strategy. The best market sector always changes from year to year
so that the safest bet is to invest in all sectors or to form an “investment consensus.”
There is, however, one very important difference between TC track prediction and the
financial markets – the model equations are conservative and a physically most-accurate
solution does exist. Thus, it is possible for a single model to outperform consensus, but
only by producing a physically more accurate solution relative to its peers.
2.2 Trends in the western North Pacific (WPAC) basin
In some respects the WPAC basin is more “tropical” (low latitude) than the Atlantic in
that the source of TC disturbances is a large (~2000 km) near-equatorial, monsoon trough.
Atlantic TCs, in contrast, generally originate from tropical waves coming off Africa and
the mean latitude of formation and existence as a tropical cyclone is about 5-7 deg of
latitude poleward compared to WPAC. Also, there are far more “hybrid” and subtropical
cyclones in the Atlantic than in WPAC indicating a stronger mid-latitude influence.
Thus it would be expected that WPAC TC motion is in general, and over many years,
more influenced by convection and moist physics than in higher-latitude basins. This is
not to say there are no “tropical” Atlantic hurricanes, just that there are more low-latitude
typhoons in WPAC. Moreover, attention should be given to the multi-year trends and
means, and not year-to-year details in the statistics.
As in the Atlantic basin, the CLIPER model changed in 2001; nonetheless, the curious
slight downward trend can also be seen in WPAC.
Fig. 7. Same as Fig. 4 but for WPAC. Two models are shown – NOGAPS (ngps06 - blue) and the UKMO (egrr06 - orchid); both have a long history in WPAC and are generally competitive with other models used in consensus (bcon – heavy yellow). The official JTWC forecast (ofcl – red) is also given.
The performance trend of two dynamical models with a long record in WPAC – the
UKMO and NOGAPS global models – is downward, but with a leveling off or
increasing error from 2005-2008; unlike the Atlantic. However, the errors are much
higher in the 1990s in WPAC and 1999 is a transition year to markedly improved model
forecasts – the year the ECMWF tropical wind score reached 5 d. The modeling
improvements in the late 1990s at both centers appear to have paid off with a larger
impact on TC skill in WPAC than in the Atlantic. This apparent greater sensitivity to
modeling in WPAC will also be found in the ECMWF model results for 2005-2008.
Fig. 8. Same as Fig. 5 except for WPAC basin.
The comparison versus CLIPER is given in Fig. 8 for completeness. The drop off in
relative skill of BCON and NOGAPS in 2007-2008 is disconcerting.
Fig. 9. As in Fig. 6, but in WPAC.
JTWC was the first center to develop and apply consensus or “hybrid” approaches to
track prediction as far back as 1993, so it is not surprising they were able to effectively
assimilate BCON into their forecasts and in the early 2000s were on par with or better than
consensus. Even when the official forecast underperformed BCON in the mid 2000s, the
difference was only about -7% and some of the degradation may be explained by higher
personnel turnover at JTWC compared to NHC. Regardless, as BCON goes, so goes the
official forecasts; at both forecast centers.
As in the Atlantic, the UKMO and NOGAPS models are not as skillful as consensus and
are ~ 25-30% worse. The model loss against BCON shows greater variability and more
inter-model disagreement compared to the Atlantic. This is another sign of greater
sensitivity to the modeling in WPAC.
2.3 Summary of trends in track prediction and the relationship with consensus
The main result of this review of medium-range dynamical track prediction during the
period 1992-2008 is that while the models/consensus have steadily improved, no
individual model or single deterministic run has ever been more than 10% better than
consensus in any one year or one basin and on a 5-10 y time scale is typically 15-20%
worse.
3.0 Dependence of ECMWF track prediction skill on model changes – it really is the physics
We now consider two changes to the ECMWF model that would be expected to affect TC
track prediction – horizontal resolution and model physics, especially convection. In
February of 2006, model resolution increased from T511 to T799 or approximately from
dx~40 km to dx~ 25 km. The second change involved the cumulus parameterization in
November 2007.
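The quoted grid spacings can be roughly reproduced from the spectral truncation. The sketch below assumes a linear grid (the "Tl" notation in Table 1) with about 2(N+1) points around the equator for truncation N; the function name and the choice of a 6371 km Earth radius are assumptions made here for illustration.

```python
import math

EARTH_CIRCUMFERENCE_KM = 2 * math.pi * 6371.0  # about 40,030 km

def linear_grid_spacing_km(truncation):
    """Approximate equatorial grid spacing for spectral truncation N.

    Assumes a linear grid with 2 * (N + 1) longitude points at the equator.
    """
    n_lon = 2 * (truncation + 1)
    return EARTH_CIRCUMFERENCE_KM / n_lon

dx_t511 = linear_grid_spacing_km(511)
dx_t799 = linear_grid_spacing_km(799)

print(f"T511: dx ~ {dx_t511:.0f} km")
print(f"T799: dx ~ {dx_t799:.0f} km")
```

Under these assumptions the spacings come out close to the dx ~ 40 km and dx ~ 25 km quoted in the text.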
We collect about one year’s worth of tropical cyclone forecasts before and after each
model change to determine if there are detectable impacts on medium-range track
prediction. The period between the resolution increase to T799 (February 2006) and the
physics change (November 2007) is 21 mo and includes two Northern Hemisphere
(NHEM) seasons and one Southern Hemisphere (SHEM) season. However, the number of
NHEM cases in this longer period is not much larger than in the one-year periods because
of unusually weak TC activity in 2007. Details of the periods and model changes are
given in Table 3, but note that the number of verifiable model forecasts at 72 h is similar.
Thus, intercomparison between the three periods will not be overly biased by differences
in number of cases.
Table 3. The three time periods considered and main characteristics of the ECMWF model.
Time Period (yyyymm) | model changes | # cases at 72 h | color code
200502-200602 | T511 resolution (dx ~ 40 km) | 400 | red
200602-200711 | T799 resolution (dx ~ 25 km) | 460 | yellow
200711-200811 | T799 resolution + modified cumulus convection (CC) | 428 | green
The mean forecast error (FE) at the standard forecast times for the model versus best/baseline consensus,
BCON, for all TCs is given in Fig. 10 mainly for completeness. While the lower error for
the model (light-shaded bar) versus consensus after the physics change (green bars) is
apparent, showing the relative gain brings the difference into sharper focus.
Fig. 10. Mean forecast error (FE) for all TCs globally of the ECMWF model and Best/Baseline Consensus (BCON) for each time period. red - T511; yellow - T799; green - T799 + convection change. The lighter-shaded bar is for the model, the heavy shading for BCON.
The main result is shown in Fig. 11 – the % gain/loss of the model versus consensus
calculated in the same way as in Figs. 6 and 9, but here for three versions of the ECMWF
model; again for all TCs globally.
First note how the T511 version of the model (red bar) was about 15-20% worse (higher
mean FE) than BCON at all forecast times. This relationship with consensus is typical of, or
slightly better than, the best models in the Atlantic (Fig. 6), which are 15-25% poorer.
The resolution increase to T799 (yellow v red bars) made a distinct improvement in
relative skill, particularly at the medium-range, so that by 120 h the model was on par with, or
better than, consensus. The gains at the longer forecast times likely come from model
improvements, e.g., slower error growth.
Fig. 11. % gain or improvement in mean forecast error (FE) of ECMWF v BCON for all TCs globally. red - T511; yellow - T799; green - T799 + convection change.
The 15-20% gain after the physics changes in November 2007 (green v yellow bars) is
simply unprecedented and indicates a fundamental advancement in performance for the
ECMWF model. The gains at the short-range are particularly impressive and imply an
improved analysis as well as model.
One important requirement for successful data assimilation is small “innovations” or the
difference between observations and the model forecast background. A model that
makes a short-range forecast (typically 6 h) close to the observations will produce smaller
innovations and smaller innovations have a higher probability of improving the model
analysis and subsequent forecasts.
Simply put, the better the model, the smaller the innovations and the better the
analysis/forecast, especially in the short-range (12-36 h for TCs).
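The innovation can be made concrete with a textbook scalar analysis update, in which the analysis equals the background plus a weighted innovation. This is only an illustration of the definition above, not the ECMWF 4DVAR system; the function name, error variances and wind values are all hypothetical.

```python
def analysis_update(background, observation, var_b, var_o):
    """Scalar analysis step: analysis = background + gain * innovation.

    innovation = observation - background
    gain       = var_b / (var_b + var_o), the textbook optimal weight
    Returns (analysis, innovation).
    """
    innovation = observation - background
    gain = var_b / (var_b + var_o)
    return background + gain * innovation, innovation

# Hypothetical wind observation (m/s) and two 6-h background forecasts,
# one from a "good" model and one from a "poor" model, with equal error
# variances assumed for background and observation.
obs = 10.0
good_analysis, good_innov = analysis_update(9.0, obs, var_b=1.0, var_o=1.0)
poor_analysis, poor_innov = analysis_update(4.0, obs, var_b=1.0, var_o=1.0)

print("good model: innovation", good_innov, "analysis", good_analysis)
print("poor model: innovation", poor_innov, "analysis", poor_analysis)
```

The better background yields the smaller innovation and needs a smaller correction, so its analysis stays closer to both the observation and the model's own state.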
The model changes in November 2007 resulted in a fundamental improvement on
scales/meteorology significant to TC track prediction. While the tropical wind score does
show a jump in 2007 (Fig. 2), the change is not as dramatic, possibly because the gains
are regional.
Fig. 12. as in Fig. 11 except for the Atlantic (LANT) basin.
Fig. 13. As in Fig. 12 except for the western North Pacific basin (WPAC).
Figs. 12 and 13 show the same % gain/loss relative to BCON as in Fig. 11, but separately
for the Atlantic and WPAC. The pattern of change is similar to the global pattern of
improvements at the longer forecast times with increased resolution, and the model better
than consensus at all times with the physics change. The difference is that the pattern is
much stronger in WPAC and even stronger in SHEM (not shown).
The 20-30% gain in WPAC with the physics changes is more extreme than found
globally (Fig. 11) and may be partly explained by a stronger influence of convection on
the tropical large-scales in WPAC. We also found a greater response to modeling in the
two other dynamical models in reviewing the errors in Fig. 7. Finally, some of the
models used in BCON saw a drop off in skill in 2008 and that would tend to exaggerate
the % gain.
The more muted response in the Atlantic may be a consequence of approaching an
asymptote in skill as the mean 72-h FE for BCON in 2008 was very low at 129 nm
and had the greatest-ever improvement over CLIPER at 63%. Despite these high levels
of skill, the ECMWF model in 2008 achieved an even lower mean FE of 116 nm.
Another way to put these statistics in perspective is to consider predictability studies,
most notably Leslie et al. (1998), who used nonlinear systems theory and both a
barotropic and baroclinic model to estimate “inherent” predictability limits. Their
motivation was partially to define a point of diminishing returns for the development of
TC-specific NWP models. At 72 h all three techniques produced estimates of
approximately 120 nm. There was some dependence on basin, but no more than a 5%
difference, so that the 120 nm mean FE is representative of the lower bound on error.
Leslie et al. (1998) also compared the estimates to the error of NWP models circa 1995
and found that the models were within 35-40% of the inherent limit. Clearly, either this
estimate is too high or the ECMWF model is approaching the “perfect model,” as
the 2008 Atlantic model error (116 nm) is below the estimated limit (~120 nm). The weaker impact in the Atlantic is consistent with
approaching a limit.
4.0 Summary and implications for medium-range TC track prediction
We have examined recent trends in dynamical medium-range TC track prediction and the
relative role of model resolution v physics. The medium-range (72 h or day 3) was the
focus for two basic reasons: 1) any TC forecast aid must first make good track forecasts
before second-order properties such as maximum surface wind speed (intensity) can be
considered in the official forecast, i.e., all aspects of the forecast must be physically
reasonable and consistent; and 2) by 72-h into the integration, model errors become
dominant, i.e., a good analysis cannot overcome model errors.
The near halving in mean 72-h track forecast error for both the models and the official
forecasts from ~300 nm in the mid 1990s to ~150 nm in the mid 2000s is a strong
testament to the incredible advances in global model NWP, especially in the tropical
general circulation. This approximately 100% improvement in medium-range TC track
prediction is far greater than in the mid-latitudes, e.g., the 5-d 500 mb NHEM anomaly
correlation for ECMWF increased from ~0.75 to 0.88, or ~15% improvement.
A consensus or simple averaging of the track forecasts from multiple NWP center model
runs was found to have higher skill than any individual model, and the individual model was
typically 20% worse at 72 h than consensus. However, recent results from the ECMWF
global model put this long-standing relationship into question.
In November 2007 significant changes were made to the ECMWF model physics,
including the parameterization of cumulus convection. By comparing TC track forecasts
before and after changes in the ECMWF model physics, we found a dramatic
improvement in medium-range track prediction, especially in the convectively more active,
monsoon-trough TC basins of WPAC and the SHEM. The improvement in the Atlantic
was less pronounced, probably due to approaching an asymptote in skill as the model and
consensus forecasts in 2008 had the lowest errors in history and below predictability
estimates from the 1990s. We also found that an increase in the ECMWF model
resolution in February 2006 had a much smaller impact.
The implications of these ECMWF model results for the future of TC track prediction and
hurricane model forecast improvement are manifold and strongly challenge
conventional wisdom.
The first notion challenged is that high spatial resolution is a necessary or even a sufficient condition
for TC prediction. For TC simulation, the inner core must be resolved, but in terms of
motion the global model results indicate that the resolution of the ECMWF model (~ 25 km)
is adequate as this model outperformed both a higher-resolution global model (JMA) and
the limited-area models (GFDL and HWRF).
The second implication is that TC motion becomes a global problem sooner than
previously thought, so that by 72 h small changes in the large-scale, far from the storm,
have a significant effect on track, and that global-scale information must be accurately
communicated into a limited-area model. However, the lateral boundary conditions
cannot be mathematically formulated to do this communication accurately (Harrison and
Elsberry 1972) and even if it were possible, there would still be a “physics” barrier
because of differences in the parameterizations/physics between the global and limited-
area model.
The most accurate solution is a two-way interactive nest inside the global model, as
opposed to the current one-way influence of a separate global model on a different
limited-area model. The forecast time at which global-scale errors significantly degrade
the limited-area model solution could be as early as 48 h, in which case running such
one-way influence models past 48 h is contraindicated.
The consensus approach to deterministic forecasting has been very successful over the
last 9 years and has motivated the application of single- and multi-model ensemble
systems to improve consensus by adding solutions with higher skill and greater error
decorrelation. However, the ECMWF model globally was 20% better than consensus in
2008 – a staggering achievement for an NWP model. Hitherto, the models were 20%
worse. The third suggestion is that the path to better forecasts may not lie in ensembles
and that we must have better understanding of how the ECMWF model smashed through
the predictability limits from the 1990s.
Fourth and finally, the reasonable assumption that skill, especially for intensity, is
critically dependent on the analysis of the TC vortex is questionable. ECMWF is the
only operational NWP center that does nothing special to analyze the TC wind structure.
Other modeling systems use either synthetic observations or wholesale vortex
replacement. One explanation why the ECMWF “less is more” approach gives better
track forecasts is that vortex specification is distorting the larger-scale flow around the
cyclone and thereby adding error, albeit small, on scales to which vortex motion is sensitive.
In the current era of a huge observing system and accurate models, small errors do matter
and the TC analysis problem may be far more challenging than previously considered.
Acknowledgements
The TC best track and model data come from the two US operational TC forecast centers, the Joint Typhoon Warning Center (JTWC) and the National Hurricane Center (NHC). Thanks go to Ed Fukada and Angelo Alvarez of JTWC and Chris Sisko at NHC for their help in accessing the TC data.
Appendix A. Details on ECMWF model changes from www.ecmwf.int and the UKMO operational global version of the Unified Model.
A.1 ECMWF Physics Changes 1989-1990
http://www.ecmwf.int/products/data/operational_system/evolution/evolution_1989.html
2 May 1989 Forecast mode cycle 32. A set of three important modifications to the model's physics was implemented in the operational forecasting system:
1. A new parametrisation scheme for radiative fluxes and the representation of cloud optical properties.
2. A reformulation of cumulus parametrisation using the mass flux approach.
http://www.ecmwf.int/products/data/operational_system/evolution/evolution_1990.html
5 June 1990 The following changes were implemented in the forecast model cycle 36:
1. The parametrisation of surface fluxes at low wind speed over sea was modified by replacing the transfer co-efficients for heat and moisture in unstable conditions (free convection limit). The change has had a considerable impact on the latent heat flux which, over the Western Pacific (warm pool), can increase by up to 25 W/m2. Synoptically it primarily affects the tropical flow which becomes more realistic at lower levels (e.g. improved monsoon flow) as well as at upper levels (reduced zonal mean wind error);
3. The formulation for the convective cloud cover was modified to account for non-precipitating shallow cumuli. The effects of this change are an increase of total cloud cover (most noticeable in the trades and over the continents in daytime), a reduction of continental precipitation, and enhancement of evaporation over subtropical oceans (by 5-10 W/m2).
http://www.ecmwf.int/products/data/operational_system/evolution/evolution_1991.html
9 April 1991 The following modifications were implemented in the forecast model cycle 38:
3. Two revisions to the convection scheme, to ensure consistent cloud physics within the cloud model, and to introduce a cloudtop temperature check to improve the onset determination of shower precipitation.
A.2 ECMWF Resolution change February 2006 From: http://www.ecmwf.int/products/data/operational_system/evolution/evolution_2006.html
1 February 2006 Introduction of Cycle 30r1. This version includes the following changes:
• Increase in horizontal resolution to T799 for the deterministic forecast and the outer loops of 4D-Var.
• Increase in vertical resolution to 91 levels for the deterministic model. Model top raised to 0.01 hPa.
• Increase in horizontal resolution to T255 for the second inner loop of 4D-Var.
• Increase in horizontal resolution to T399 and in vertical resolution to 62 levels (model top approx. 5 hPa) for the EPS.
• Increase in horizontal resolution to 0.36 degrees for the global ocean wave model.
• Increase wave spectral resolution to 24 directions and 30 frequencies (from 12 and 25 respectively) for the EPS ocean wave model.
• Use of grid-point humidity and ozone in 4D-Var.
• Revised coefficients (version 2.3) for the linearised ozone chemistry scheme, supplied by Daniel Cariolle, CERFACS.
• Use of Jason altimeter wave height data and ENVISAT ASAR spectra in the wave model data assimilation. ERS-2 SAR spectra are no longer assimilated.
The impact of the new cycle on the performance of the deterministic forecast system has been tested on more than 360 cases between July 2004 and December 2005.
The impact on the performance of the forecasting system has been evaluated comprehensively checking out all components of the system. There are small but statistically significant improvements to deterministic 500 hPa height forecasts over Europe and for both extra-tropical hemispheres out to day 6. Overall impact on weather parameters is modest, but precipitation scores over Europe are consistently improved, in particular for large daily rainfall amounts.
Tropical temperature and wind forecasts are generally better when verified against radiosondes. Tropical cyclone position and intensity forecasts are improved in the high resolution system with some spectacular results for the intense hurricanes, such as Katrina, which hit the US Gulf Coast last autumn.
There are increases in the number of observations used in the assimilation with better analysis and background departure statistics. Extra data also had a significant beneficial impact on the ocean wave analyses. Verification against independent observations confirmed improvements in the ocean wave forecasts.
The Ensemble Prediction System has been tested for some 30 cases in July and August 2005 and again in the e-suite over the period 17 October to 30 November 2005. The overall benefit of the higher resolution for the EPS is reflected in the results for the probability scores which are consistently improved for 500hPa height anomalies. In a similar way, the evaluation of precipitation events over Europe gave better results for the e-suite with respect to the Brier and ROC area scores.
A.3 ECMWF Physics changes November 2007 From: http://www.ecmwf.int/products/data/operational_system/evolution/evolution_2007.html#6November2007
6 November 2007 Introduction of Cycle 32r3 The new cycle includes significant changes to the model physics, including the convection scheme, with a beneficial increase in model activity globally, but in particular in the tropics.
The main changes included in this cycle are:
• New formulation of convective entrainment and relaxation timescale
• Reduction in free atmosphere vertical diffusion
• New soil hydrology scheme
• New radiosonde temperature and humidity bias correction
• Increase in number of radio occultation data from COSMIC
• Assimilation of AMSR-E, TMI, SSMIS window channels (clear sky)
• Assimilation of SBUV (NOAA-17, NOAA-18) and monitoring of OMI ozone data
The impact of the new cycle on the performance of the forecasting system has been tested in research mode for the months of January to May 2007, and in pre-operational runs over the period June to September 2007. The new cycle shows improved scores for the extra-tropical southern hemisphere throughout the forecast range, and for the northern hemisphere in the first half of the forecast. There is a substantial improvement to the lower tropospheric winds in the tropics compared to observations. Surface weather parameters have improved in the new system, in particular through a reduction in the biases of 2m temperature mainly over the Americas and better precipitation forecasts over Europe.
The main changes to the EPS included in this cycle are:
• initial perturbation amplitude reduced by 30%
• singular vectors targeted on tropical cyclones are computed with the new moist physics package in the tangent-linear and adjoint models (as used operationally in 4D-Var since cycle 32r2).
The reduction of the initial perturbation amplitude resulted in a good match of spread and ensemble mean error. The magnitude of the reduction was based on preliminary experimentation over 13 summer cases and 13 winter cases. The EPS has also been evaluated in pre-operational runs during June to September 2007. Results confirm that the reduction in initial spread leads to a generally improved match between spread and average ensemble mean error throughout the forecast range for the extra-tropics, with slight over-activity towards day 15. Probabilistic skill is generally improved at mid-latitudes. Preliminary experimentation with the new moist physics package over 27 summer cases indicates an improved reliability of the tropical cyclone strike probability forecasts and neutral impact in the extra-tropics. ROC statistics indicate that these changes also had a beneficial impact on probabilistic precipitation forecasts over Europe.
A.4 UKMO global version of Unified Model Changes 1999-2008 (Heming 2008, personal communication)
29 March 1999: 3D-Var introduced and use of ATOVS data.
27 June 2000: MOSES surface scheme introduced
7 August 2002: New Dynamics package
11 December 2002: New version of MOSES and introduce QuikScat
26 May 2004: AIRS data assimilated
5 October 2004: 4D-Var introduced
18 January 2005: HadGEM physics introduced
13 December 2005: Increase resolution N320 (~40km) 50 levels
14 March 2006: Big package of satellite data and physics changes
15 May 2007: More physics changes
27 November 2007: Use of IASI & ASCAT and OSTIA SST analysis
25 November 2008 (TODAY!): Big package of changes to model, assimilation and observations (too numerous to mention). Today's change is the last science change before the move to our new supercomputer and upgrade to ~25km and 70 levels next year.
from: http://www.metoffice.gov.uk/research/nwp/publications/mosac/doc-2007-01a.pdf
G15 28 January 1998 • Enhanced resolution: 60km, 30 levels
G16 15 April 1998 • Preliminary runs introduced in place of LAM
G17 12 May 1998 • Specification of orography corrected over Antarctica.
G18 22 Sept 1998 • GWD constant = 6600 • Reduced correlation scale in Tropics • CAPE closure convection (1h adjustment scale)
G18 9 February 1999 • New mean sea level pressure diagnostic
G19 29 March 1999 • 3d VAR introduced • Use of ATOVS soundings from the AMSU instrument on NOAA15 commenced
G19 13 July 1999 • Sea-ice analysis from SSM/I installed
G20 20 July 1999 3d VAR/OBS upgrade: • Revised covariances • Use of ATOVS over Siberia • Thinned scatwinds to one per analysis grid
G21 19 October 1999 3d VAR/OBS upgrade: • SSMI windspeeds • TOVS & ATOVS radiances in VAR • Improved processing of PMSL - greater use of P_surface • Use of Antarctic obs + Use of winds at pole • Improved covariance model • Improved descent algorithm for nearly quadratic J • Aircraft obs thinning & revised obs errors
G22 17 May 2000 3d VAR/OBS upgrade: • Use of observed rather than retrieved radiance from TOVS/ATOVS • Revised background error covariance model • Use of model background at correct observation time.
G23 27 June 2000 • MOSES I surface scheme implemented
G24 13 Feb 2001 3d VAR/OBS upgrade: • Use of scatterometer winds from ERS • Improved humidity error correlations • Increased use of AMSU-A radiances in cloudy areas • Introduction of surface wind speeds from a second SSM/I satellite • Introduction of wind profiler information
G25 18 April 2001 3d VAR/OBS upgrade: • NOAA16 ATOVS data replaces NOAA14 TOVS data • Begin using data from AMSU-B instrument on NOAA15 and NOAA16.
G26 16 Oct 2001 3d VAR/OBS upgrade: • Error assumed for Satob winds doubled • Change in thinning strategy for ATOVS • Removal of a bug in the surface type associated with ATOVS radiances • Use of fractional sea ice in ATOVS processing
G27 7 Aug 2002 • Introduction of semi-implicit non-hydrostatic dynamical formulation using HADAM4 physics
G28 1 Oct 2002 • Commence assimilation of NOAA-17 ATOVS
G29 3 Dec 2002 • Merge of global and preliminary suites. Global model data cut-off now 1h 50min for all runs. • Digital filter to initialise model at T+0, rather than incremental analysis update to T+3
Cycle Date Definition
G32 26 May 2004 • Introduction of AIRS radiances. • Increased use of ATOVS in early data cut-off runs through introduction of EUMETSAT ATOVS Retransmission Service radiances over Europe and the North Atlantic. Increased use of ATOVS over land and in the tropics through more accurate radiative transfer modelling and bias correction.
G33 4 Oct 2004 • Introduction of 4D-Var
G34 18 Jan 2005 • Remaining HadGEM1 physics – 8B BL, 3C precipitation, 30m CAPE closure
G35 8 Feb 2005 • DA upgrade – LH release in PF model, MODIS winds, correct RTTOV-7 radiation calculations
G36 14 Jun 2005 • DA and satellite upgrade – updated COV stats, GOES BUFR replaces SATOB, new SST data.
G37 17 Aug 2005 • Soil nudging scheme - Assimilation of soil moisture using soil increments derived from screen level temperature and humidity increments. • Introduce NOAA-18 and withdraw AQUA AMSU.
G38 13 Dec 2005 • Resolution upgrade - 40km (N320) , 50 levels
G39 14 Mar 2006 • Re-introduce AIRS and ERS-2 Scat. Replace MeteoSat7 with MeteoSat8 satwinds. • Tropical improvements - Physics upgrade, Valley cooling fix
G40 14 June 2006 • Data assimilation upgrade • Improved soil moisture nudging • Boundary layer bug fixes
G41 26 Sep 2006 • Satellite changes - SSMIS and GPS RO
G42 5 Dec 2006 • Satellite changes - more ATOVS + add GPS-RO • Stratospheric stability package: (Spectral Gravity wave drag; Increased N-S halo; removal ramped upper diffusion) • Convective cloud decay • PMSL correction; Surface/soil and 10m wind gust diagnostics
G43 6 Mar 2007 • Data Assimilation Upgrade including first use of METOP (ATOVS)
G44 15 May 2007 Model Physics Upgrade including • Introduce biogenic aerosols climatology • bare soil albedo based on MODIS • moisture flows into level below saturated or frozen soil rather than lost as run-off • seasonal varying leaf area index (LAI) • Assimilation of COSMIC GPS radio occultation data
G45 14 Aug 2007 Data Assimilation upgrade including • Improved use of GPSRO, AIRS and ATOVS • Use of hourly SYNOPs • Improved PF model physics (convection)
2 Oct 2007 New SST analysis (OSTIA)
Appendix B. Some details on model track output post-processing for operational applications.
A numerical model is always run after the synoptic time to allow observations to reach
the center. The delay is typically 3 h, after which the data assimilation is performed to
create the model initial conditions and the forecast model is run. The total delay between the
valid time of the model analysis and availability of a 6 day forecast is about 5 h. Thus,
for a 06Z official forecast, only the 00Z model integration would be available and this
“late” model needs to be post-processed to bring the model guidance 6 h forward in time
to be coincident with the official forecast.
The typical way to move the forecast track forward is through interpolation and/or
offsetting, e.g., relabel the t=6 h forecast position as t=0, t=18 as t=12, and so on up to
t=126 as t=120. Not all model trackers produce output every 6 h, so interpolation may be needed.
Also note that a typical growth rate of mean forecast error is about 4% per 6 h. Thus,
the 6-h interpolated track will have slightly less skill (~ 4%) than the original model track.
The post-processing involves more than time interpolation (and extrapolation) of the
track, specifically:
1. linearly interpolate to 3-h intervals and then extrapolate the last position 12 h forward in time.
2. smooth the 3-h interpolated track.
3. relabel the forecast times, so that for a 6-h correction, set Mf(τ)=Ms(τ+6h), where Mf is the final model latitude/longitude, τ is the forecast time, and Ms is the smoothed 3-h track.
4. take the difference between the model initial latitude/longitude and the operational estimate and remove this “offset” from every forecast position. This step removes the initial position error and puts all models at the same starting point.
5. bias correct the τ = 12 h position by adding 1/3 of the 12 h position based on the operational current speed and motion using a rhumb line, Mmotion(12h), i.e., Mf(12h)=2/3*Mf(12h) + 1/3*Mmotion(12h).
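The five steps above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the code used for this paper or at NHC: the function names are hypothetical, the smoother is assumed to be a simple 1-2-1 filter (the text does not specify one), the offset in step 4 is assumed to be taken at the relabeled τ = 0 position, the rhumb line uses a mid-latitude approximation, and longitude wrapping at the dateline is ignored.

```python
import math

def interp_track(track, step=3, extend=12):
    """Step 1: linear interpolation to `step` h, then linear extrapolation
    `extend` h past the last time. track = {tau_h: (lat, lon)}."""
    taus = sorted(track)
    out = {}
    for t0, t1 in zip(taus, taus[1:]):
        (la0, lo0), (la1, lo1) = track[t0], track[t1]
        t = t0
        while t < t1:
            w = (t - t0) / (t1 - t0)
            out[t] = (la0 + w * (la1 - la0), lo0 + w * (lo1 - lo0))
            t += step
    out[taus[-1]] = track[taus[-1]]
    # Extrapolate with the motion of the last available segment.
    t0, t1 = taus[-2], taus[-1]
    (la0, lo0), (la1, lo1) = track[t0], track[t1]
    for k in range(step, extend + step, step):
        w = k / (t1 - t0)
        out[t1 + k] = (la1 + w * (la1 - la0), lo1 + w * (lo1 - lo0))
    return out

def smooth(track):
    """Step 2: assumed 1-2-1 filter on interior points; end points unchanged."""
    taus = sorted(track)
    out = dict(track)
    for a, b, c in zip(taus, taus[1:], taus[2:]):
        out[b] = tuple(0.25 * track[a][i] + 0.5 * track[b][i] + 0.25 * track[c][i]
                       for i in (0, 1))
    return out

def relabel(track, shift=6):
    """Step 3: Mf(tau) = Ms(tau + shift)."""
    return {t - shift: p for t, p in track.items() if t - shift >= 0}

def remove_offset(track, current):
    """Step 4: shift the whole track so that tau = 0 matches the operational position."""
    dlat = track[0][0] - current[0]
    dlon = track[0][1] - current[1]
    return {t: (la - dlat, lo - dlon) for t, (la, lo) in track.items()}

def rhumb_position(current, heading_deg, speed_kt, hours):
    """Position after `hours` on a constant heading (mid-latitude rhumb-line approximation)."""
    dist_nm = speed_kt * hours
    th = math.radians(heading_deg)
    dlat = dist_nm * math.cos(th) / 60.0          # 1 deg latitude = 60 nm
    mid_lat = math.radians(current[0] + dlat / 2.0)
    dlon = dist_nm * math.sin(th) / (60.0 * math.cos(mid_lat))
    return (current[0] + dlat, current[1] + dlon)

def bias_correct_12h(track, motion12):
    """Step 5: Mf(12h) = 2/3*Mf(12h) + 1/3*Mmotion(12h)."""
    out = dict(track)
    out[12] = tuple(2.0 / 3.0 * m + 1.0 / 3.0 * p for m, p in zip(track[12], motion12))
    return out

def postprocess(track, current, heading_deg, speed_kt, shift=6):
    """Apply steps 1-5 to a 'late' model track."""
    t = relabel(smooth(interp_track(track)), shift)
    t = remove_offset(t, current)
    return bias_correct_12h(t, rhumb_position(current, heading_deg, speed_kt, 12))
```

For a 00Z track with positions every 12 h out to 120 h, `postprocess(track, current, heading_deg, speed_kt)` returns 3-hourly positions valid relative to 06Z, from τ = 0 to τ = 126 h.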
While the post-processing used in this paper is similar to what is done
operationally at NHC, it is somewhat better in that the mean error from my post-processing
is ~1% lower than from operations, especially at τ = 12 h because of the bias
correction in step 5.
The main objective of the post-processing is to ensure an “apples-to-apples” comparison,
but it also gives results that are more likely to be found in operational forecasting.
My naming convention is to add ‘06’ to the model name/symbol to indicate a 6 h
forward-in-time post-processing.
Appendix C. Details on track data
The primary source of forecast track data is the operational ATCF
(http://www.nrlmry.navy.mil/atcf_web/docs/database/new/database.html ) data files at
both NHC and JTWC. The ECMWF model trackers are run at ECMWF; the tracks were
provided directly by ECMWF for the years 2004-2005 and, for 2006 to the present, were taken
from the BUFR files transmitted from ECMWF to the WMO tropical cyclone forecast centers via
the WMO GTS communication system.
The definition of BCON was first based on availability in the ATCF data files. The
consensus aids run by JTWC always form a consensus even if only two models are
available, whereas the first consensus (“con”) models at NHC required that all members be present. In 2004
a variable-member consensus was introduced at NHC, and from that point on I use this
consensus aid. The problem with fixed-member consensus is availability. The typical
POD (probability of detection or “coverage”) for the fixed con model GUNA is ~ 70%
whereas for CONU it is closer to 95%. For model/consensus comparisons it is fairer to
compare models with comparable POD.
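The difference between fixed- and variable-member consensus, and its effect on POD, can be illustrated with a short Python sketch. It assumes the consensus position is a simple average of the available member positions and uses hypothetical function names and member labels; it is not the ATCF implementation.

```python
def consensus(members, min_members):
    """Average the available member positions.

    members: {model_name: (lat, lon) or None if the model is missing}.
    Returns None when fewer than `min_members` models are available.
    """
    avail = [p for p in members.values() if p is not None]
    if len(avail) < min_members:
        return None
    n = len(avail)
    return (sum(p[0] for p in avail) / n, sum(p[1] for p in avail) / n)

def coverage(cases, min_members):
    """POD ('coverage'): fraction of forecast cases for which a consensus forms."""
    formed = sum(consensus(c, min_members) is not None for c in cases)
    return formed / len(cases)

# Four hypothetical forecast cases for a 4-member consensus.
cases = [
    {"A": (10.0, 20.0), "B": (12.0, 22.0), "C": (11.0, 21.0), "D": (13.0, 23.0)},
    {"A": (10.0, 20.0), "B": (12.0, 22.0), "C": None, "D": None},
    {"A": (10.0, 20.0), "B": None, "C": None, "D": None},
    {"A": (14.0, 24.0), "B": (15.0, 25.0), "C": (16.0, 26.0), "D": (17.0, 27.0)},
]
fixed_pod = coverage(cases, min_members=4)     # all members required, as for GUNA
variable_pod = coverage(cases, min_members=2)  # only 2 members required, as for CONU
```

In this toy sample the fixed-member rule forms a consensus for half of the cases and the variable-member rule for three quarters, which is the availability gap described above.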
basin       years      ATCF name  comment
WPAC/JTWC   2000-2001  NCON       first consensus aid, part of SAFA system
WPAC/JTWC   2002       CONU       7-model consensus
WPAC/JTWC   2003-2008  CONW       10-model consensus including a barotropic model and the Australian limited-area tropical model
LANT/NHC    2000       GUNS       3-model consensus requiring all models to form a consensus
LANT/NHC    2001-2003  GUNA       4-model consensus requiring all models to form a consensus
LANT/NHC    2004-2007  CONU       5-model consensus requiring only 2 models to form a consensus
LANT/NHC    2008       TVCN       6-model consensus
References
DeMaria, M., M.B. Lawrence and J.T. Kroll, 1990: An error analysis of Atlantic tropical cyclone track guidance models. Wea. Forecasting, 5, 47-61.
ECMWF, 2008: Forecast verification web: http://www.ecmwf.int/products/forecasts/d/charts/medium/verification/timeseries/z_score_monthly_mean!Tropics!run!12!pop!od!oper!w_scores!latest!/
Goerss, J.S., 2000: Tropical cyclone track forecasts using an ensemble of dynamical models. Mon. Wea. Rev., 128, 1187-1193.
Goerss, J.S., C.R. Sampson and J.M. Gross, 2004: A history of western North Pacific tropical cyclone track forecast skill. Wea. Forecasting, 19, 633-638.
Harrison, E.J., Jr. and R.L. Elsberry, 1972: A method for incorporating nested finite grids in the solution of systems of geophysical equations. J. Atmos. Sci., 29, 1235-1245.
Leslie, L.M., R.F. Abbey and G.J. Holland, 1998: Tropical cyclone track predictability. Meteorol. Atmos. Phys., 65, 223-231.
Serrano, E., 1998: Tropical cyclones. ERA-15 Project Report Series, 5, 30 pp. ECMWF. Available at: http://www.ecmwf.int/publications/library/do/references/show?id=83257
Uppala, S.M., P.W. Kållberg, A.J. Simmons, U. Andrae, V. da Costa Bechtold, M. Fiorino, J.K. Gibson, J. Haseler, A. Hernandez, G.A. Kelly, X. Li, K. Onogi, S. Saarinen, N. Sokka, R.P. Allan, E. Andersson, K. Arpe, M.A. Balmaseda, A.C.M. Beljaars, L. van de Berg, J. Bidlot, N. Bormann, S. Caires, F. Chevallier, A. Dethof, M. Dragosavac, M. Fisher, M. Fuentes, S. Hagemann, E. Hólm, B.J. Hoskins, L. Isaksen, P.A.E.M. Janssen, R. Jenne, A.P. McNally, J.-F. Mahfouf, J.-J. Morcrette, N.A. Rayner, R.W. Saunders, P. Simon, A. Sterl, K.E. Trenberth, A. Untch, D. Vasiljevic, P. Viterbo and J. Woollen, 2005: The ERA-40 Reanalysis. Q. J. R. Meteorol. Soc., 131(612), 2961-3012.