Recent trends in dynamical medium-range tropical cyclone track prediction and the role of resolution versus physics in the ECMWF model

Michael Fiorino National Hurricane Center

Miami FL

4 December 2008

Abstract

The conventional wisdom in the tropical cyclone (TC) modeling community is that dynamical TC track and intensity prediction requires: 1) high-resolution (order 1 km); 2) the assimilation of detailed observations in the inner core; and 3) ensembles of independent deterministic and perturbed models. However, recent results from the deterministic run of the ECMWF global model challenge this wisdom and suggest that the critical and dominant factor in numerical TC prediction skill is the modeling of physical processes. In this paper we analyze how model changes, in both resolution and in the parameterization of key physical processes, impact the model forecasts of the global tropical wind field and TC tracks. Since a change in the cumulus convection scheme in November 2007, the ECMWF TC track forecasts have outperformed multi-model consensus by 20% globally in the medium-range (72 h). We review recent trends in model performance vis-à-vis consensus to demonstrate that this improvement is unprecedented and may be a harbinger of a new era in TC track prediction. Finally, some personal views are offered on the implication of these results for TC model development.

1. Introduction – TCs, Tropical Wind Scores and Model Physics

One of the more remarkable results from the global atmospheric reanalysis projects is the

strong dependence of the analysis of tropical cyclones (TC) on the model, rather than the

observations. This dependence is demonstrated in Fig. 1 below from the first ECMWF

reanalysis, ERA-15 (Serrano 1998). The ERA-15 analysis detects about 85% of

observed TCs from 1979-1994 (solid blue line), but the operational ECMWF model

(dashed blue line) at the time only reaches that detection rate in 1989. The primary

difference between the operational and ERA-15 analysis is the model and data

assimilation scheme, as the reanalysis used essentially the same observations as in

operations.

Fig. 1. Detection of tropical cyclones in ECMWF ERA-15 reanalysis (solid blue line) and operations (dashed blue line).

Thus, the poor quality of the operational model analysis of TCs was not caused by

insufficient observations but by the modeling.

Fig. 2. Tropical skill score: time in days when the correlation between the analyzed and forecast 850 mb winds in the tropics drops to 70%.

Next consider the change in the standard ECMWF tropical forecast skill score for the

period 1980-2008. Comparing this time series (red line in Fig. 2) to the TC detection in

operations in Fig. 1 (blue dashed line), we can easily see a strong correlation of TC

detection and the 850 mb tropical wind score (ECMWF 2008). Similar correlations have

been found with the NCEP-NCAR reanalysis and the second ECMWF reanalysis ERA-40

(Uppala et al., 2005). Thus, by understanding how the tropical wind score varies with

model changes, we may gain a better understanding of the critical modeling and

observation factors that influence TC analysis and ultimately prediction.
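As an illustration of how a score of this kind can be computed, the sketch below finds the lead time at which a forecast/analysis correlation first drops to 70%, interpolating linearly between forecast lead times. The function name, the daily sampling and the sample correlations are assumptions made for illustration only; they are not ECMWF's procedure or data.

```python
def days_to_threshold(lead_days, correlations, threshold=0.70):
    """Lead time (days) at which the correlation first falls to `threshold`.

    Linear interpolation is used between the two bracketing lead times.
    Returns None if the correlation never drops to the threshold.
    """
    if correlations[0] <= threshold:
        # Already at or below the threshold at the first lead time.
        return lead_days[0]
    pairs = list(zip(lead_days, correlations))
    for (t0, c0), (t1, c1) in zip(pairs, pairs[1:]):
        if c0 > threshold >= c1:
            return t0 + (c0 - threshold) * (t1 - t0) / (c0 - c1)
    return None


# Hypothetical correlations at lead times of 0-5 days (not real data).
leads = [0, 1, 2, 3, 4, 5]
corr = [1.00, 0.95, 0.88, 0.80, 0.72, 0.62]
score = days_to_threshold(leads, corr)  # crossing lies between day 4 and day 5
```

With this definition a higher score means the forecast winds stay correlated with the analysis for longer.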

There are a number of events in the Fig. 2 time series:

• 1980-1986 – slow, steady improvement and larger upward swings in the Boreal summer

• 1989-1993 – rapid improvement at the end of 1989, with even bigger positive swings

• 1993-1999 – noticeable drop in late 1993 followed by a period of faster improvement

• 2000-2001 – less pronounced summer-time skill gain and more pronounced downward swings in the Austral summer

• 2003-2005 – skill advances to 7 d from 2 d in the early 1980s, but plateaus. Boreal summer-time skill reaches 10 d!

• 2006 – slight dip

• 2008 – positive bump

The ECMWF web site (www.ecmwf.int) has excellent documentation on model changes

and some details are extracted and put into Appendix A. The highlights of the ECMWF

model resolution and data assimilation history are given in Table 1 below:

Table 1. Major changes in the ECMWF model

year   event
1987   T106L19 model resolution
1993   T213L31 model resolution
1996   3DVAR
1997   4DVAR
2000   Tl319L60 model resolution
2001   Tl511L60 model resolution
2006   Tl799L91 model resolution

A slight but discernable degradation in the tropical wind score can be seen for the

resolution increases in 1987, 1993, 2001 and 2006, with the most pronounced drop

associated with the largest resolution increase in 1993.

This response to resolution change is typical for areas/meteorology sensitive to physics.

The problem is that any parameterization assumes a separation of scales (large scale v

small scales of the physics) and when a model resolution change moves into the gap

between the scales, the parameterizations behave differently. Increased resolution always

violates the scale assumption so the parameters of the physics schemes need to be

“adapted” to the new resolution. This model development process is essential to

restoring skill and while resolution changes can be painful, they do have model

diagnostic value.

The main point is that the big jump in skill in 1989 was not simply due to resolution, but

is more likely the result of a reformulation of the cumulus convection scheme that

included moving to the mass flux approach (Arakawa and Schubert 1974) in that year.

Other significant physics changes include (details in Appendix A):

• June 1990: increased latent heat flux in the western Pacific warm pool at low surface wind speeds

• April 1991: revision to the convection scheme for more consistency with the cloud model and the triggering of showers.

The net result was a near doubling of the skill score from 2.5 d to ~4.25 d in the early

1990s, until the resolution change in 1993.

The decade of 1995- 2005 saw a remarkable improvement from 4 d to nearly 7 d that I

attribute to 3- and 4-DVAR in the mid 1990s and general model improvements to the

point that by 2004 the ECMWF deterministic TC track predictions were competitive with

the best models of the day, both global (UKMO, NOGAPS and GFS) and limited-area

(GFDL) as demonstrated for the Atlantic in Fig. 3. Details of the track data used are given

in Appendix C, the post-processing is described in Appendix B and the models are described

in Table 2.

Fig. 3. ECMWF track forecast error in the Atlantic basin for 2004, compared to GFDL (gfdl06), UKMO (ukmo06), NOGAPS (ngp06) and GFS (gfsn06). All raw model output has been post-processed in the same way as in operations, as described in Appendix B.

The strong correlation of the tropical wind score and TC track prediction has also been

found with other models and in the reanalyses. This correlation is consistent with our

physical understanding of tropical cyclone motion (Fiorino and Elsberry 1989) and its

dependence on the global/large scales of the tropical general circulation. Not only are

TCs the extreme weather event that challenges a model and pushes the physics to the

limit, but TC track prediction is a good measure of the quality of the large-scale tropical

wind field.

Table 2. Description of the model TC track data

short name   source         description
ngps06       FNMOC NOGAPS   US Navy global model; first formal evaluation in 1994 by JTWC. Available twice daily from 1992-2001 and then four times daily from 2002-2008.
gfdl06       NCEP GFDL      GFDL hurricane model run at NCEP. Very few cases in 1992-93; run twice daily 1994-1999.
ecmo06       ECMWF          ECMWF global model, deterministic 10-d integration.
ukmo06       UKMO           UKMO Unified Model, global operational version. Available 1996-2008 in the Atlantic and 1991-2008 in WPAC. Model run twice daily at 00 and 12 UTC.
egrr06       UKMO           Human QC’d output from the UKMO UM global model; first formal evaluation in 1998 by JTWC, available twice daily.
gfsn06       NCEP GFS       NCEP GFS global model.
clip         CLIPER         CLImatology and PERsistence no-skill statistical model.
bcon         NHC/JTWC       Best/Baseline model CONsensus made at the operational forecast centers. Appendix C gives details on which specific consensus model was BCON.

2.0 Medium-range TC track prediction and Multi-model Consensus

The big improvement in the ECMWF tropical wind score circa 1990 marks the beginning

of a period of unprecedented gain in dynamical TC forecasting that has resulted in a

nearly 100% improvement in official TC track prediction from the early 1980s to early

2000s, especially at the 72-h forecast time, in what I call the “medium-range.”

There are many reasons to focus on the medium-range, both operationally and in a

modeling sense, but the main reason for using 72-h in this study is because of its model

diagnostic value. Simply put, the model has to “get everything right” to make a good

medium-range track prediction.

By three days into the integration, the model has lost a strong connection with the initial

conditions and even a perfect analysis could not prevent fundamental model error growth

and chaos from causing significant error (~20%) in the solution. While persistence may

be a good forecast for TC motion in the short-range (12-24 h), by 72 h a persistence

forecast has no skill (~ 500 km which is 10-15X the initial position error ~ 30 km).

TCs are observed to generally change direction and speed of motion in 3 d and this

change is often the result of an interaction of the vortex with a mid-latitude feature such

as a break in the subtropical ridge. Thus, the model track forecast depends on both the

forecast of the large-scale “steering” flow and on a scale interaction that in turn depends

on changes in the vortex, i.e., the model has to forecast both the vortex and synoptic

scales.

Fortunately for global models, the dynamics of vortex/large-scale flow interaction that is

critical to motion occurs on scales resolved by large-scale models (Fiorino and Elsberry

1989). Thus, high-resolution (dx ~10-20 km) is not a requirement for good medium-

range TC track forecasts.

Another milestone in numerical medium-range TC track prediction was when the

ECMWF tropical wind score reached 7 d in 2003. This coincided with the first

operational application of multi-model consensus forecasting in which an ensemble of

quasi-independent deterministic model runs is combined to produce a consensus

forecast. While a number of schemes for combining the forecasts have been tried, a

simple average of the tracks has proven as successful as or better than more elegant

approaches.
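A simple-average consensus of this kind can be sketched as follows. The function name and the sample positions are hypothetical, and the sketch assumes that all member positions are valid at the same forecast time and that longitudes are on a continuous scale (no wrap across the dateline or prime meridian within the set); operational consensus aids may apply additional rules that are not shown here.

```python
def consensus_position(member_positions):
    """Equal-weight average of member forecast positions.

    member_positions: list of (lat_deg, lon_deg) tuples valid at the same
    forecast time. Longitudes are assumed to be on a continuous scale.
    """
    if not member_positions:
        raise ValueError("need at least one member forecast")
    n = len(member_positions)
    mean_lat = sum(lat for lat, _ in member_positions) / n
    mean_lon = sum(lon for _, lon in member_positions) / n
    return mean_lat, mean_lon


# Hypothetical 72-h positions from three models (illustrative values only).
members = [(25.0, 285.0), (26.0, 284.0), (24.5, 286.5)]
lat, lon = consensus_position(members)
```

Repeating the average at each forecast time (12, 24, ..., 120 h) gives a consensus track.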

The success of consensus depends on two key factors: 1) the degree of decorrelation of

the errors between the individual models; and 2) all members must have skill similar to,

or close to, the best model (Goerss 2000). The result from operations is that in the mean

consensus generally has more skill than any of the individual models used in the

consensus.
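The two factors can be illustrated with a standard statistical result. If N members have unbiased errors with the same variance and a common pairwise error correlation rho, the RMS error of their simple average is sqrt((1 + (N - 1) rho) / N) times the RMS error of a single member. The sketch below evaluates that ratio; the equal-variance, equal-correlation setup is an idealization chosen for illustration and is not a statement about any operational consensus.

```python
import math


def consensus_error_ratio(n_members, rho):
    """Ratio of consensus RMS error to single-member RMS error.

    Assumes unbiased member errors with equal variance and a common
    pairwise correlation `rho` (an idealization).
    """
    if n_members < 1:
        raise ValueError("n_members must be at least 1")
    radicand = (1.0 + (n_members - 1) * rho) / n_members
    if radicand < 0.0 or rho > 1.0:
        raise ValueError("rho is not a valid common correlation for this N")
    return math.sqrt(radicand)


uncorrelated = consensus_error_ratio(4, 0.0)   # fully decorrelated errors
partly = consensus_error_ratio(4, 0.5)         # partially correlated errors
identical = consensus_error_ratio(4, 1.0)      # perfectly correlated errors
```

In this idealized setting, averaging gives no benefit when the member errors are perfectly correlated, and the benefit grows as the errors decorrelate. The equal-variance assumption stands in for the requirement that all members have similar skill.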

2.1 Trends in the Atlantic basin

All the points in the preceding discussion are demonstrated in Fig. 4 where we show a

time series of yearly mean 72-h forecast error for two “best” models, the standard no-skill

baseline aid CLIPER (CLImatology and PERsistence), the best/baseline consensus aid

BCON and the official forecast. The models are described in Table 2 and details on the

data sets, tracking and consensus are given in Appendix C.
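For readers who want to reproduce statistics of this kind, the sketch below computes a mean forecast error in nautical miles, taking the error of a single forecast to be the great-circle distance between the forecast position and the verifying best-track position. That definition, the spherical-Earth haversine formula, the radius value and the function names are assumptions for illustration; the exact procedure used here is the one described in Appendix C.

```python
import math

EARTH_RADIUS_NM = 3440.065  # mean Earth radius (6371 km) in nautical miles


def great_circle_nm(lat1, lon1, lat2, lon2):
    """Great-circle distance (nm) between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to guard against rounding slightly above 1.
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(min(1.0, a)))


def mean_forecast_error(forecasts, best_track):
    """Mean distance (nm) between paired forecast and verifying positions."""
    if len(forecasts) != len(best_track) or not forecasts:
        raise ValueError("need equal-length, non-empty position lists")
    errors = [great_circle_nm(f[0], f[1], b[0], b[1])
              for f, b in zip(forecasts, best_track)]
    return sum(errors) / len(errors)


# One degree of latitude is about 60 nm.
one_degree = great_circle_nm(0.0, 0.0, 1.0, 0.0)
```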

We first consider the Atlantic basin and the most consistently skillful model with a long

record – the GFDL hurricane model – but before discussing the models note the curious

variation in the CLIPER error. There is a notable downward trend in CLIPER error and

an oscillation with a 10-y period. The lower CLIPER error implies that hurricanes are

behaving in a more climatological manner. However, the CLIPER model was updated in

2000 and the forecast extended from 72 h to 120 h, so that part of the change is because

of the improved TC data bases used in the model development. Nonetheless, there seems

to be a downward trend from 2000-2007, but a rise in 2008. The significance of the rise

is that prior to 2008, the model and consensus error tended to generally follow CLIPER,

but in 2008 model error moved downward despite the TC motion being less

climatological than in previous years.

From 1992-1997 the GFDL model had lower error than the official forecast (purple line

below the red line). The number of cases was very small in 1992-93 because the model

was still experimental, but by 1995 the model was highly available to the forecasters and

was run twice daily. Since 2000, the GFDL model has been run at the same frequency as

the official forecasts are made – four times daily. The point here is not that the GFDL

model “beat” the human forecasters, but that the model showed skill and that the

forecasters were able to successfully use the guidance.

It is also noteworthy how the model showed more year-to-year variability until 2004, when

the ECMWF tropical score reached 7 d. This skill consistency is perhaps the result of the

stability (greater run-to-run consistency) a large observing system gives the global model

(NCEP GFS). In turn, the global model provides initial and lateral boundary conditions

to the limited-area GFDL model.

Fig. 4. Medium-range (72-h) mean forecast error in the Atlantic basin for the years 1992-2008 for: a "best" dynamical model (gfdl06 - blue); the UKMO global Unified Model (ukmo06 – orchid); the best/baseline consensus (bcon – yellow); the official NHC forecast (ofcl - red); and the no-skill baseline aid CLImatology and PERsistence (clip - wheat). The solid line is a smoothed version of the time series (dashed). The dynamical model track was post-processed to be consistent with operations and the aids (see Appendix B for details). The error and number of cases (in []) are given in the table below the plot. bcon is only available from 2000-2008 and the statistics are homogeneous with clip.

Also note how both the model and the official forecast errors slowly vary in a similar

manner as CLIPER (wheat colored line); rising in the early 2000s and then falling until

2007-2008. CLIPER is also used to measure forecast difficulty as high CLIPER errors

imply that the TCs did not behave in a climatological or typical manner for that year. It

could be very telling that the model did not follow CLIPER upward in 2008; an

indication that the global model analysis has even higher quality than in previous years.

Details aside, the model and official forecast skill is much greater than CLIPER and there

has been a clear downward trend in both. Also note how the official forecast error is lower

than, i.e., beats, the model error. From 1992 to 2008 the official forecast error has been cut from 294 nm

to 127 nm – a greater than 100% improvement. The GFDL model improvement over that

time period was not as dramatic, but consensus (heavy gold line) was better than the

model and on par with the forecasters. The UKMO global model (orchid line) shows a

similar trend.
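The size of the quoted improvement depends on which error is used as the reference. The worked arithmetic below shows both conventions for the 294 nm and 127 nm values given above; reading the "greater than 100%" figure as the second convention is an interpretation, not something stated explicitly in the text.

```latex
\frac{294 - 127}{294} \approx 0.57
  \quad \text{(a 57\% reduction relative to the 1992 error)},
\qquad
\frac{294 - 127}{127} \approx 1.31
  \quad \text{(a 131\% improvement relative to the 2008 error)}
```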

Fig. 5. Gain or % improvement of the GFDL model, BCON consensus, and the official OFCL forecasts relative to CLIPER. Positive values indicate lower forecast error than CLIPER.

An alternative view of the error statistics is to calculate a % gain or improvement against

some baseline as shown in Fig. 5. The standard comparison baseline is CLIPER and

positive values indicate how much better (lower) the mean forecast error is relative to the

baseline.
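A minimal sketch of such a gain calculation is given below. It assumes the common definition gain = 100 x (FE_baseline - FE_aid) / FE_baseline, which matches the sign convention described above (positive means the aid has lower error than the baseline); the function name is hypothetical, and the 116 nm and 129 nm inputs are the 2008 Atlantic 72-h values for ECMWF and BCON quoted later in Section 3.

```python
def percent_gain(fe_aid, fe_baseline):
    """Percent gain of an aid over a baseline.

    Positive when the aid's mean forecast error is lower than the
    baseline's. Assumes gain = 100 * (baseline - aid) / baseline.
    """
    if fe_baseline <= 0:
        raise ValueError("baseline error must be positive")
    return 100.0 * (fe_baseline - fe_aid) / fe_baseline


# Mean 72-h FE in nm: ECMWF 116 versus BCON 129 (Atlantic, 2008).
gain_vs_bcon = percent_gain(116.0, 129.0)   # about +10%

# An aid with 25% higher error than its baseline shows a loss.
loss_example = percent_gain(125.0, 100.0)   # -25.0
```

The same function applies whether the baseline is CLIPER or consensus; only the reference error changes.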

The general improvement trend is less pronounced over the 17-y period, but what is more

interesting is how the model and consensus are becoming even better vis-à-vis CLIPER

from 2006-2008 to over 50%. We also see that consensus is close to or slightly better than

the official forecast, but essentially they are neck and neck.

To bring the comparison into sharper focus, we use BCON as the baseline instead of the

CLIPER baseline used in Fig. 5.

Fig. 6. as in Fig. 5 but for the % gain of the model (gfdl06 - purple) and the official NHC forecast (ofcl - red) against consensus. Negative values indicate the aid is poorer (higher forecast error) than the baseline.

The GFDL model is clearly not as skillful as consensus and is ~25 % worse (Fig. 6). The

UKMO model is similar, but with much larger year-to-year swings. The +16% gain on

BCON in 2007 is not significant (see footnote 1) because of very few cases/storms in the Atlantic for

that year.

Footnote 1: Statistical significance is not addressed, as the purpose is to examine broad trends and relationships (more cases), not to focus overly on year-to-year differences.

The degree of degradation varies with the model and year, but what we do not find is a

model that is consistently better than consensus at 72 h. The only way a model can

outperform consensus is if it has much lower error than its peers, i.e., skill does not come

from error compensation, but from better meteorology – good results for the physically

more correct reasons.

Note how it took five years for NHC to effectively utilize BCON, to the point that the

official forecast is now as good as, or better than, consensus. The seemingly slow

adoption of new guidance is understandable as no one forecast aid has ever been

consistently superior to all other models, especially consensus.

The situation with TC model consensus is akin to the concept of “diversification” as a

financial investment strategy. The best market sector always changes from year to year

so that the safest bet is to invest in all sectors or to form an “investment consensus.”

There is, however, one very important difference between TC track prediction and the

financial markets – the model equations are conservative and a physically most-accurate

solution does exist. Thus, it is possible for a single model to outperform consensus, but

only by producing a physically more accurate solution relative to its peers.

2.2 Trends in the western North Pacific (WPAC) basin

In some respects the WPAC basin is more “tropical” (low latitude) than the Atlantic in

that the source of TC disturbances is a large (~2000 km) near-equatorial, monsoon trough.

Atlantic TCs, in contrast, generally originate from tropical waves coming off Africa and

the mean latitude of formation and existence as a tropical cyclone is about 5-7 deg of

latitude poleward compared to WPAC. Also, there are far more “hybrid” and subtropical

cyclones in the Atlantic than in WPAC indicating a stronger mid-latitude influence.

Thus it would be expected that WPAC TC motion is in general, and over many years,

more influenced by convection and moist physics than in higher-latitude basins. This is

not to say there are no “tropical” Atlantic hurricanes, just that there are more low-latitude

typhoons in WPAC. Moreover, attention should be given to the multi-year trends and

means, and not year-to-year details in the statistics.

As in the Atlantic basin, the CLIPER model changed in 2001; nonetheless, the curious

slight downward trend in WPAC can also be seen.

Fig. 7. Same as Fig. 4 but for WPAC. Two models are shown – NOGAPS (ngps06 - blue) and the UKMO (egrr06 - orchid); both have a long history in WPAC and are generally competitive with other models used in consensus (bcon – heavy yellow). The official JTWC forecast (ofcl – red) is also given.

The performance trend of two dynamical models with a long record in WPAC – the

UKMO and NOGAPS global models – is downward, but with a leveling off or

increasing error from 2005-2008; unlike the Atlantic. However, the errors are much

higher in the 1990s in WPAC and 1999 is a transition year to markedly improved model

forecasts – the year the ECMWF tropical wind score reached 5 d. The modeling

improvements in the late 1990s at both centers appear to have paid off with a larger

impact on TC skill in WPAC than in the Atlantic. This apparent greater sensitivity to

modeling in WPAC will also be found in the ECMWF model results for 2005-2008.

Fig. 8. Same as Fig. 5 except for WPAC basin.

The comparison versus CLIPER is given in Fig. 8 for completeness. The drop off in

relative skill of BCON and NOGAPS in 2007-2008 is disconcerting.

Fig. 9. As in Fig. 6, but in WPAC.

JTWC was the first center to develop and apply consensus or “hybrid” approaches to

track prediction as far back as 1993, so it is not surprising they were able to effectively

assimilate BCON into their forecasts and in the early 2000s were on par with or better than

consensus. Even when the official forecast underperformed BCON in the mid 2000s, the

difference was only about -7% and some of the degradation may be explained by higher

personnel turnover at JTWC compared to NHC. Regardless, as BCON goes, so go the

official forecasts at both forecast centers.

As in the Atlantic, the UKMO and NOGAPS models are not as skillful as consensus and

are ~ 25-30% worse. The model loss against BCON shows greater variability and more

inter-model disagreement compared to the Atlantic. This is another sign of greater

sensitivity to the modeling in WPAC.

2.3 Summary of trends in track prediction and the relationship with consensus

The main result of this review of medium-range dynamical track prediction during the

period 1992-2008 is that while the models/consensus have steadily improved, no

individual model or single deterministic run has ever been more than 10% better than

consensus in any one year or one basin, and on a 5-10 y time scale a model is typically 15-20%

worse.

3.0 Dependence of ECMWF track prediction skill on model changes – it really is the physics

We now consider two changes to the ECMWF model that would be expected to affect TC

track prediction – horizontal resolution and model physics, especially convection. In

February of 2006, model resolution increased from T511 to T799 or approximately from

dx~40 km to dx~ 25 km. The second change involved the cumulus parameterization in

November 2007.

We collect about one year’s worth of tropical cyclone forecasts before and after each

model change to determine if there are detectable impacts on medium-range track

prediction. The period between the resolution increase to T799 (February 2006) and the

physics change (November 2007) is 21 mo and includes two Northern Hemisphere

(NHEM) seasons and one Southern Hemisphere (SHEM) season. However, the number of

NHEM cases in this longer period is not much larger than in the one-year periods because

of unusually weak TC activity in 2007. Details of the periods and model changes are

given in Table 3, but note that the number of verifiable model forecasts at 72 h is similar.

Thus, intercomparison between the three periods will not be overly biased by differences

in number of cases.

Table 3. The three time periods considered and main characteristics of the ECMWF model.

time period (yyyymm)   model changes                                        # cases at 72 h   color code
200502-200602          T511 resolution (dx ~ 40 km)                         400               red
200602-200711          T799 resolution (dx ~ 25 km)                         460               yellow
200711-200811          T799 resolution + modified cumulus convection (CC)   428               green

The mean FE at the standard forecast times for the model versus best/baseline consensus,

BCON, for all TCs is given in Fig. 10 mainly for completeness. While the lower error for

the model (light-shaded bar) versus consensus after the physics change (green bars) is

apparent, showing the relative gain brings the difference into sharper focus.

Fig. 10. Mean forecast error (FE) for all TCs globally of the ECMWF model and Best/Baseline Consensus (BCON) for each time period. red - T511; yellow - T799; green - T799 + convection change. The lighter-shaded bar is for the model; the heavy shading is for BCON.

The main result is shown in Fig. 11 – the % gain/loss of the model versus consensus

calculated in the same way as in Figs. 6 and 9, but here for three versions of the ECMWF

model; again for all TCs globally.

First note how the T511 version of the model (red bar) was about 15-20% worse (higher

mean FE) than BCON at all forecast times. This relationship with consensus is typical of, or

slightly better than, the best models in the Atlantic (Fig. 6), which are 15-25% poorer.

The resolution increase to T799 (yellow v red bars) made a distinct improvement in

relative skill, particularly at the medium-range so that by 120 h the model was on par with, or

better than consensus. The gains at the longer forecast times likely come from model

improvements, e.g., slower error growth.

Fig. 11. % gain or improvement in mean forecast error (FE) of ECMWF v BCON for all TCs globally. red - T511; yellow - T799; green - T799 + convection change.

The 15-20% gain after the physics changes in November 2007 (green v yellow bars) is

simply unprecedented and indicates a fundamental advancement in performance for the

ECMWF model. The gains at the short-range are particularly impressive and imply an

improved analysis as well as model.

One important requirement for successful data assimilation is small “innovations” or the

difference between observations and the model forecast background. A model that

makes a short-range forecast (typically 6 h) close to the observations will produce smaller

innovations and smaller innovations have a higher probability of improving the model

analysis and subsequent forecasts.
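The idea can be made concrete with a toy calculation. In the sketch below an innovation is the observed value minus the model background interpolated to the observation location, and two hypothetical backgrounds are compared by the RMS of their innovations. All names and numbers are invented for illustration; a real assimilation system also weights the innovations by observation and background error statistics, which is not shown.

```python
import math


def innovations(observations, background_at_obs):
    """Observation-minus-background departures, one per observation."""
    if len(observations) != len(background_at_obs):
        raise ValueError("observations and background must be paired")
    return [y - hx for y, hx in zip(observations, background_at_obs)]


def rms(values):
    """Root-mean-square of a non-empty list of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))


# Hypothetical 850 mb wind observations (m/s) and two 6-h backgrounds.
obs = [10.0, 12.0, 8.0]
background_a = [9.0, 13.0, 8.0]    # short-range forecast close to the obs
background_b = [7.0, 15.0, 10.0]   # short-range forecast further from the obs

rms_a = rms(innovations(obs, background_a))
rms_b = rms(innovations(obs, background_b))
```

Here background A yields the smaller RMS innovation, so the analysis would need to make smaller corrections to it.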

Simply put, the better the model, the smaller the innovations and the better the

analysis/forecast, especially in the short-range (12-36 h for TCs).

The model changes in November 2007 resulted in a fundamental improvement on

scales/meteorology significant to TC track prediction. While the tropical wind score does

show a jump in 2007 (Fig. 2), the change is not as dramatic, possibly because the gains

are regional.

Fig. 12. As in Fig. 11 except for the Atlantic (LANT) basin.

Fig. 13. As in Fig. 12 except for the western North Pacific basin (WPAC).

Figs. 12 and 13 show the same % gain/loss relative to BCON as in Fig. 11, but separately

for the Atlantic and WPAC. The pattern of change is similar to the global pattern of

improvements at the longer forecast times with increased resolution, and the model better

than consensus at all times with the physics change. The difference is that the pattern is

much stronger in WPAC and even stronger in SHEM (not shown).

The 20-30% gain in WPAC with the physics changes is more extreme than found

globally (Fig. 11) and may be partly explained by a stronger influence of convection on

the tropical large-scales in WPAC. We also found a greater response to modeling in the

two other dynamical models in reviewing the errors in Fig. 7. Finally, some of the

models used in BCON saw a drop off in skill in 2008 and that would tend to exaggerate

the % gain.

The more muted response in the Atlantic may be a consequence of approaching an

asymptote in skill as the mean 72-h FE for BCON in 2008 was very low at 129 nm

and had the greatest-ever improvement over CLIPER at 63%. Despite these high levels

of skill, the ECMWF model in 2008 achieved an even lower mean FE of 116 nm.

Another way to put these statistics in perspective is to consider predictability studies,

most notably Leslie et al. (1998), who used nonlinear systems theory and both a

barotropic and baroclinic model to estimate “inherent” predictability limits. Their

motivation was partially to define a point of diminishing returns for the development of

TC-specific NWP models. At 72 h all three techniques produced estimates of

approximately 120 nm. There was some dependence on basin, but no more than a 5%

difference, so that the 120 nm mean FE is representative of the lower bound on error.

Leslie et al. (1998) also compared the estimates to the error of NWP models circa 1995

and found that the models were within 35-40% of the inherent limit. Clearly, either this

estimate is too high or the ECMWF model is approaching the “perfect model” as

the model is below the limit. The weaker impact in the Atlantic is consistent with

approaching a limit.

4.0 Summary and implications for medium-range TC track prediction

We have examined recent trends in dynamical medium-range TC track prediction and the

relative role of model resolution v physics. The medium-range (72 h or day 3) was the

focus for two basic reasons: 1) any TC forecast aid must first make good track forecasts

before second-order properties such as maximum surface wind speed (intensity) can be

considered in the official forecast, i.e., all aspects of the forecast must be physically

reasonable and consistent; and 2) by 72 h into the integration, model errors become

dominant, i.e., a good analysis cannot overcome model errors.

The near halving in mean 72-h track forecast error for both the models and the official

forecasts from ~300 nm in the mid 1990s to ~150 nm in the mid 2000s is a strong

testament to the incredible advances in global model NWP, especially in the tropical

general circulation. This approximately 100% improvement in medium-range TC track

prediction is far greater than in the mid-latitudes, e.g., the 5-d 500 mb NHEM anomaly

correlation for ECMWF increased from ~ 0.75 to 0.88 or ~ 15% improvement.

A consensus or simple averaging of the track forecasts from multiple NWP center model

runs was found to have higher skill than any individual model, and an individual model was

typically 20% worse at 72 h than consensus. However, recent results from the ECMWF

global model put this long-standing relationship into question.

In November 2007 significant changes were made to the ECMWF model physics,

including the parameterization of cumulus convection. By comparing TC track forecasts

before and after changes in the ECMWF model physics, we found a dramatic

improvement in medium-range track forecasts, especially in the convectively more active,

monsoon-trough TC basins of WPAC and the SHEM. The improvement in the Atlantic

was less pronounced, probably due to approaching an asymptote in skill as the model and

consensus forecasts in 2008 had the lowest errors in history and below predictability

estimates from the 1990s. We also found that an increase in the ECMWF model

resolution in February 2006 had a much smaller impact.

The implications of these ECMWF model results for the future of TC track prediction and

hurricane model forecast improvement are manifold and strongly challenge

conventional wisdom.

The first notion challenged is that high spatial resolution is a necessary or even a sufficient condition

for TC prediction. For TC simulation, the inner core must be resolved, but in terms of

motion the global model results indicate that the resolution of the ECMWF model (~ 25 km)

is adequate as this model outperformed both a higher-resolution global model (JMA) and

the limited-area models (GFDL and HWRF).

The second implication is that TC motion becomes a global problem sooner than

previously thought, so that by 72 h small changes in the large-scale, far from the storm,

have a significant effect on track, and that global-scale information must be accurately

communicated into a limited-area model. However, the lateral boundary conditions

cannot be mathematically formulated to do this communication accurately (Harrison and

Elsberry 1972) and even if it were possible, there would still be a “physics” barrier

because of differences in the parameterizations/physics between the global and limited-

area model.

The most accurate solution is a two-way interactive nest inside the global model, as

opposed to the current one-way influence of a separate global model on a different

limited-area model. The forecast time at which global-scale errors significantly degrade

the limited-area model solution could be as early as 48 h, in which case running such

one-way influence models past 48 h is contraindicated.

The consensus approach to deterministic forecasting has been very successful over the

last 9 years and has motivated the application of single- and multi-model ensemble

systems to improve consensus by adding solutions with higher skill and greater error

decorrelation. However, the ECMWF model globally was 20% better than consensus in

2008 – a staggering achievement for an NWP model. Hitherto, the models were 20%

worse. The third suggestion is that the path to better forecasts may not lie in ensembles

and that we must have better understanding of how the ECMWF model smashed through

the predictability limits from the 1990s.
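The role of error decorrelation in a consensus can be illustrated with a small calculation. The sketch below is not from the analysis in this paper; it assumes an idealized case in which every member model has the same error standard deviation and every pair of models has the same error correlation, and it treats the error as a scalar. The function names and the numbers are illustrative only.

```python
import math


def consensus_error_std(sigma, n_models, rho):
    """Error std. dev. of an equal-weight mean of n_models forecasts.

    Idealized assumption (not from the paper): each model has error
    standard deviation `sigma` and each pair has error correlation `rho`.
    """
    if n_models < 1:
        raise ValueError("n_models must be at least 1")
    variance = sigma ** 2 * (1.0 + (n_models - 1) * rho) / n_models
    if variance < 0.0:
        raise ValueError("rho is too negative for this many models")
    return math.sqrt(variance)


def percent_improvement(reference_error, model_error):
    """Percent by which model_error is lower than reference_error."""
    return 100.0 * (reference_error - model_error) / reference_error


# Four models with fully decorrelated errors halve the error;
# four models with perfectly correlated errors gain nothing.
decorrelated = consensus_error_std(100.0, 4, 0.0)
correlated = consensus_error_std(100.0, 4, 1.0)
```

Under these assumptions the benefit of adding members shrinks quickly as their errors become correlated, which is why a single model beating consensus by 20% is so unusual.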

Fourth and finally, the reasonable assumption that skill, especially for intensity, is critically dependent on the analysis of the TC vortex is questionable. ECMWF is the only operational NWP center that does nothing special to analyze the TC wind structure. Other modeling systems use either synthetic observations or wholesale vortex replacement. One explanation for why the ECMWF “less is more” approach gives better track forecasts is that vortex specification distorts the larger-scale flow around the cyclone and thereby adds error, albeit small, on scales to which vortex motion is sensitive.


In the current era of a huge observing system and accurate models, small errors do matter

and the TC analysis problem may be far more challenging than previously considered.


Acknowledgements

The TC best track and model data come from the two US operational TC forecast centers, the Joint Typhoon Warning Center (JTWC) and the National Hurricane Center (NHC). Thanks go to Ed Fukada and Angelo Alvarez of JTWC and Chris Sisko at NHC for their help in accessing the TC data.


Appendix A. Details on ECMWF model changes from www.ecmwf.int and the UKMO operational global version of the Unified Model.

A.1 ECMWF Physics Changes 1989-1990

http://www.ecmwf.int/products/data/operational_system/evolution/evolution_1989.html

2 May 1989 Forecast model cycle 32. A set of three important modifications to the model's physics was implemented in the operational forecasting system:

1. A new parametrisation scheme for radiative fluxes and the representation of cloud optical properties.

2. A reformulation of cumulus parametrisation using the mass flux approach.

http://www.ecmwf.int/products/data/operational_system/evolution/evolution_1990.html

5 June 1990 The following changes were implemented in the forecast model cycle 36:

1. The parametrisation of surface fluxes at low wind speed over sea was modified by replacing the transfer co-efficients for heat and moisture in unstable conditions (free convection limit). The change has had a considerable impact on the latent heat flux which, over the Western Pacific (warm pool), can increase by up to 25 W/m2. Synoptically it primarily affects the tropical flow which becomes more realistic at lower levels (e.g. improved monsoon flow) as well as at upper levels (reduced zonal mean wind error);

3. The formulation for the convective cloud cover was modified to account for non-precipitating shallow cumuli. The effects of this change are an increase of total cloud cover (most noticeable in the trades and over the continents in daytime), a reduction of continental precipitation, and enhancement of evaporation over subtropical oceans (by 5-10 W/m2).

http://www.ecmwf.int/products/data/operational_system/evolution/evolution_1991.html

9 April 1991 The following modifications were implemented in the forecast model cycle 38:

3. Two revisions to the convection scheme, to ensure consistent cloud physics within the cloud model, and to introduce a cloudtop temperature check to improve the onset determination of shower precipitation.


A.2 ECMWF Resolution change February 2006

From: http://www.ecmwf.int/products/data/operational_system/evolution/evolution_2006.html

1 February 2006 Introduction of Cycle 30r1. This version includes the following changes:

• Increase in horizontal resolution to T799 for the deterministic forecast and the outer loops of 4D-Var.

• Increase in vertical resolution to 91 levels for the deterministic model. Model top raised to 0.01 hPa.

• Increase in horizontal resolution to T255 for the second inner loop of 4D-Var.

• Increase in horizontal resolution to T399 and in vertical resolution to 62 levels (model top approx. 5 hPa) for the EPS.

• Increase in horizontal resolution to 0.36 degrees for the global ocean wave model.

• Increase wave spectral resolution to 24 directions and 30 frequencies (from 12 and 25 respectively) for the EPS ocean wave model.

• Use of grid-point humidity and ozone in 4D-Var.

• Revised coefficients (version 2.3) for the linearised ozone chemistry scheme, supplied by Daniel Cariolle, CERFACS.

• Use of Jason altimeter wave height data and ENVISAT ASAR spectra in the wave model data assimilation. ERS-2 SAR spectra are no longer assimilated.

The impact of the new cycle on the performance of the deterministic forecast system has been tested on more than 360 cases between July 2004 and December 2005.

The impact on the performance of the forecasting system has been evaluated comprehensively checking out all components of the system. There are small but statistically significant improvements to deterministic 500 hPa height forecasts over Europe and for both extra-tropical hemispheres out to day 6. Overall impact on weather parameters is modest, but precipitation scores over Europe are consistently improved, in particular for large daily rainfall amounts.

Tropical temperature and wind forecasts are generally better when verified against radiosondes. Tropical cyclone position and intensity forecasts are improved in the high resolution system with some spectacular results for the intense hurricanes, such as Katrina, which hit the US Gulf Coast last autumn.

There are increases in the number of observations used in the assimilation with better analysis and background departure statistics. Extra data also had a significant beneficial impact on the ocean wave analyses. Verification against independent observations confirmed improvements in the ocean wave forecasts.

The Ensemble Prediction System has been tested for some 30 cases in July and August 2005 and again in the e-suite over the period 17 October to 30 November 2005. The overall benefit of the higher resolution for the EPS is reflected in the results for the probability scores which are consistently improved for 500hPa height anomalies. In a similar way, the evaluation of precipitation events over Europe gave better results for the e-suite with respect to the Brier and ROC area scores.

A.3 ECMWF Physics changes November 2007

From: http://www.ecmwf.int/products/data/operational_system/evolution/evolution_2007.html#6November2007

6 November 2007 Introduction of Cycle 32r3. The new cycle includes significant changes to the model physics, including the convection scheme, with a beneficial increase in model activity globally, but in particular in the tropics.

The main changes included in this cycle are:

• New formulation of convective entrainment and relaxation timescale

• Reduction in free atmosphere vertical diffusion

• New soil hydrology scheme

• New radiosonde temperature and humidity bias correction

• Increase in number of radio occultation data from COSMIC

• Assimilation of AMSR-E, TMI, SSMIS window channels (clear sky)

• Assimilation of SBUV (NOAA-17, NOAA-18) and monitoring of OMI ozone data

The impact of the new cycle on the performance of the forecasting system has been tested in research mode for the months of January to May 2007, and in pre-operational runs over the period June to September 2007. The new cycle shows improved scores for the extra-tropical southern hemisphere throughout the forecast range, and for the northern hemisphere in the first half of the forecast. There is a substantial improvement to the lower tropospheric winds in the tropics compared to observations. Surface weather parameters have improved in the new system, in particular through a reduction in the biases of 2m temperature mainly over the Americas and better precipitation forecasts over Europe.

The main changes to the EPS included in this cycle are:

• initial perturbation amplitude reduced by 30%

• singular vectors targeted on tropical cyclones are computed with the new moist physics package in the tangent-linear and adjoint models (as used in the operational 4D-Var since cycle 32r2).

The reduction of the initial perturbation amplitude resulted in a good match of spread and ensemble mean error. The magnitude of the reduction was based on preliminary experimentation over 13 summer cases and 13 winter cases. The EPS has also been evaluated in pre-operational runs during June to September 2007. Results confirm that the reduction in initial spread leads to a generally improved match between spread and average ensemble mean error throughout the forecast range for the extra-tropics, with slight over-activity towards day 15. Probabilistic skill is generally improved at mid-latitudes. Preliminary experimentation with the new moist physics package over 27 summer cases indicates an improved reliability of the tropical cyclone strike probability forecasts and neutral impact in the extra-tropics. ROC statistics indicate that these changes also had a beneficial impact on probabilistic precipitation forecasts over Europe.

A.4 UKMO global version of Unified Model Changes 1999-2008 (Heming 2008, personal communication)

29 March 1999: 3D-Var introduced and use of ATOVS data.
27 June 2000: MOSES surface scheme introduced
7 August 2002: New Dynamics package
11 December 2002: New version of MOSES and introduce QuikScat
26 May 2004: AIRS data assimilated
5 October 2004: 4D-Var introduced
18 January 2005: HadGEM physics introduced
13 December 2005: Increase resolution N320 (~40km) 50 levels
14 March 2006: Big package of satellite data and physics changes
15 May 2007: More physics changes
27 November 2007: Use of IASI & ASCAT and OSTIA SST analysis
25 November 2008 (TODAY!): Big package of changes to model, assimilation and observations (too numerous to mention). Today's change is the last science change before the move to our new supercomputer and upgrade to ~25km and 70 levels next year.

from: http://www.metoffice.gov.uk/research/nwp/publications/mosac/doc-2007-01a.pdf

G15 28 January 1998
• Enhanced resolution: 60km, 30 levels

G16 15 April 1998
• Preliminary runs introduced in place of LAM

G17 12 May 1998
• Specification of orography corrected over Antarctica.

G18 22 Sept 1998
• GWD constant = 6600
• Reduced correlation scale in Tropics
• CAPE closure convection (1h adjustment scale)

G18 9 February 1999
• New mean sea level pressure diagnostic

G19 29 March 1999
• 3d VAR introduced
• Use of ATOVS soundings from the AMSU instrument on NOAA15 commenced

G19 13 July 1999
• Sea-ice analysis from SSM/I installed

G20 20 July 1999 3d VAR/OBS upgrade:
• Revised covariances
• Use of ATOVS over Siberia
• Thinned scatwinds to one per analysis grid

G21 19 October 1999 3d VAR/OBS upgrade:
• SSMI windspeeds
• TOVS & ATOVS radiances in VAR
• Improved processing of PMSL - greater use of P_surface
• Use of Antarctic obs + Use of winds at pole
• Improved covariance model
• Improved descent algorithm for nearly quadratic J
• Aircraft obs thinning & revised obs errors

G22 17 May 2000 3d VAR/OBS upgrade:
• Use of observed rather than retrieved radiance from TOVS/ATOVS
• Revised background error covariance model
• Use of model background at correct observation time.

G23 27 June 2000
• MOSES I surface scheme implemented

G24 13 Feb 2001 3d VAR/OBS upgrade:
• Use of scatterometer winds from ERS
• Improved humidity error correlations
• Increased use of AMSU-A radiances in cloudy areas
• Introduction of surface wind speeds from a second SSM/I satellite
• Introduction of wind profiler information

G25 18 April 2001 3d VAR/OBS upgrade:
• NOAA16 ATOVS data replaces NOAA14 TOVS data
• Begin using data from AMSU-B instrument on NOAA15 and NOAA16.

G26 16 Oct 2001 3d VAR/OBS upgrade:
• Error assumed for Satob winds doubled
• Change in thinning strategy for ATOVS
• Removal of a bug in the surface type associated with ATOVS radiances
• Use of fractional sea ice in ATOVS processing

G27 7 Aug 2002
• Introduction of semi-implicit non-hydrostatic dynamical formulation using HADAM4 physics

G28 1 Oct 2002
• Commence assimilation of NOAA-17 ATOVS

G29 3 Dec 2002
• Merge of global and preliminary suites. Global model data cut-off now 1h 50min for all runs.
• Digital filter to initialise model at T+0, rather than incremental analysis update to T+3

Cycle Date Definition

G32 26 May 2004
• Introduction of AIRS radiances.
• Increased use of ATOVS in early data cut-off runs through introduction of EUMETSAT ATOVS Retransmission Service radiances over Europe and the North Atlantic. Increased use of ATOVS over land and in the tropics through more accurate radiative transfer modelling and bias correction.

G33 4 Oct 2004
• Introduction of 4D-Var

G34 18 Jan 2005
• Remaining HadGEM1 physics – 8B BL, 3C precipitation, 30m CAPE closure


G35 8 Feb 2005
• DA upgrade – LH release in PF model, MODIS winds, correct RTTOV-7 radiation calculations

G36 14 Jun 2005
• DA and satellite upgrade – updated COV stats, GOES BUFR replaces SATOB, new SST data.

G37 17 Aug 2005
• Soil nudging scheme - Assimilation of soil moisture using soil increments derived from screen level temperature and humidity increments.
• Introduce NOAA-18 and withdraw AQUA AMSU.

G38 13 Dec 2005
• Resolution upgrade - 40km (N320), 50 levels

G39 14 Mar 2006
• Re-introduce AIRS and ERS-2 Scat. Replace MeteoSat7 with MeteoSat8 satwinds.
• Tropical improvements - Physics upgrade, Valley cooling fix

G40 14 June 2006
• Data assimilation upgrade
• Improved soil moisture nudging
• Boundary layer bug fixes

G41 26 Sep 2006
• Satellite changes - SSMIS and GPS RO

G42 5 Dec 2006
• Satellite changes - more ATOVS + add GPS-RO
• Stratospheric stability package: (Spectral Gravity wave drag; Increased N-S halo; removal ramped upper diffusion)
• Convective cloud decay
• PMSL correction; Surface/soil and 10m wind gust diagnostics

G43 6 Mar 2007
• Data Assimilation Upgrade including first use of METOP (ATOVS)

G44 15 May 2007 Model Physics Upgrade including
• Introduce biogenic aerosols climatology
• bare soil albedo based on MODIS
• moisture flows into level below saturated or frozen soil rather than lost as run-off
• seasonal varying leaf area index (LAI)
• Assimilation of COSMIC GPS radio occultation data

G45 14 Aug 2007 Data Assimilation upgrade including
• Improved use of GPSRO, AIRS and ATOVS
• Use of hourly SYNOPs
• Improved PF model physics (convection)

2 Oct 2007
• New SST analysis (OSTIA)


Appendix B. Some details on model track output post-processing for operational applications.

A numerical model is always run after the synoptic time to allow observations to reach the center. The delay is typically 3 h, after which the data assimilation is performed to create the model initial conditions and the forecast model is run. The total delay between the valid time of the model analysis and the availability of a 6-day forecast is about 5 h. Thus, for a 06Z official forecast, only the 00Z model integration would be available, and this “late” model needs to be post-processed to bring the model guidance 6 h forward in time to be coincident with the official forecast.
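The timing argument above can be written as a short calculation. This is a minimal sketch under stated assumptions, not the operational logic: the function name, the default 5 h latency, and the assumption that model cycles start at 00Z and repeat at an interval that divides 24 h are all illustrative.

```python
from datetime import datetime, timedelta


def latest_available_cycle(official_time, latency_h=5, cycle_interval_h=6):
    """Return (cycle_time, shift_h) for the newest model run that has
    finished by official_time.

    Assumptions (hypothetical): cycles start at 00Z and repeat every
    cycle_interval_h hours, which must divide 24; a run is usable
    latency_h hours after its analysis time.
    """
    midnight = official_time.replace(hour=0, minute=0, second=0, microsecond=0)
    hours_into_day = (official_time - midnight).total_seconds() / 3600.0
    n_cycles = int(hours_into_day // cycle_interval_h)
    cycle = midnight + timedelta(hours=cycle_interval_h * n_cycles)
    # Step back one cycle at a time until the run would be complete.
    while cycle + timedelta(hours=latency_h) > official_time:
        cycle -= timedelta(hours=cycle_interval_h)
    shift_h = int((official_time - cycle).total_seconds() // 3600)
    return cycle, shift_h


# A 06Z official forecast can only use the 00Z run, shifted 6 h forward.
cycle, shift = latest_available_cycle(datetime(2008, 9, 1, 6))
```

With a 12 h cycle interval the same function returns a 12 h shift for a 12Z official forecast, since the 12Z run is not yet complete.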

The typical way to move the forecast track forward is through interpolation and/or offsetting, e.g., relabel the t=6 h forecast position as t=0, t=18 as t=12, and so on to t=126 as t=120. Not all model trackers produce output every 6 h, so interpolation may be needed. Also note that a typical growth rate of mean forecast error is about 4% per 6 h. Thus, the 6-h interpolated track will have slightly less skill (~4%) than the original model track.

The post-processing involves more than time interpolation (and extrapolation) of the

track, specifically:

1. linearly interpolate to 3-h intervals and then extrapolate the last position 12 h forward in time.

2. smooth the 3-h interpolated track.

3. relabel the forecast times, so that for a 6-h correction, set Mf(τ)=Ms(τ+6h), where Mf is the final model latitude/longitude, τ is the forecast time, and Ms is the smoothed 3-h track.

4. take the difference between the model initial latitude/longitude and the operational estimate and remove this “offset” from every forecast position. This step removes the initial position error and puts all models at the same starting point.

5. bias correct the τ = 12 h position by adding 1/3 of the 12 h position based on the operational current speed and motion using a rhumb line, Mmotion(12h), i.e., Mf(12h)=2/3*Mf(12h) + 1/3*Mmotion(12h).
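The five steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code used for this paper: the function names are hypothetical, the smoother is assumed to be a 1-2-1 filter (the text does not specify one), latitude and longitude are treated as plain numbers with no dateline handling, and the rhumb-line motion position Mmotion(12h) is passed in precomputed as `motion_pos_12h`.

```python
def interpolate_track(track, step_h=3, extrap_h=12):
    """Step 1: linear interpolation to step_h, then linear extrapolation.

    track maps integer forecast hour -> (lat, lon).
    """
    taus = sorted(track)
    if len(taus) < 2:
        raise ValueError("need at least two forecast positions")
    last = taus[-1]
    out = {}
    for tau in range(taus[0], last + extrap_h + 1, step_h):
        if tau >= last:
            t0, t1 = taus[-2], last  # reuse the last segment to extrapolate
        else:
            i = max(k for k, t in enumerate(taus) if t <= tau)
            t0, t1 = taus[i], taus[i + 1]
        w = (tau - t0) / (t1 - t0)
        (lat0, lon0), (lat1, lon1) = track[t0], track[t1]
        out[tau] = (lat0 + w * (lat1 - lat0), lon0 + w * (lon1 - lon0))
    return out


def smooth_121(track):
    """Step 2: 1-2-1 smoother on interior points (assumed filter)."""
    taus = sorted(track)
    out = dict(track)
    for prev, cur, nxt in zip(taus, taus[1:], taus[2:]):
        out[cur] = tuple(
            0.25 * a + 0.5 * b + 0.25 * c
            for a, b, c in zip(track[prev], track[cur], track[nxt])
        )
    return out


def postprocess(track, operational_pos, motion_pos_12h, shift_h=6):
    """Apply steps 1-5 and return the final track Mf as {tau: (lat, lon)}."""
    smooth = smooth_121(interpolate_track(track))
    # Step 3: relabel so that Mf(tau) = Ms(tau + shift_h).
    shifted = {t - shift_h: p for t, p in smooth.items() if t >= shift_h}
    # Step 4: remove the initial position offset from every position.
    dlat = shifted[0][0] - operational_pos[0]
    dlon = shifted[0][1] - operational_pos[1]
    final = {t: (lat - dlat, lon - dlon) for t, (lat, lon) in shifted.items()}
    # Step 5: blend the 12 h position with the current-motion position.
    lat12, lon12 = final[12]
    final[12] = (
        2.0 / 3.0 * lat12 + motion_pos_12h[0] / 3.0,
        2.0 / 3.0 * lon12 + motion_pos_12h[1] / 3.0,
    )
    return final
```

For a 12-hourly input track ending at τ = 24 h, the result runs from τ = 0 to τ = 30 h in 3 h steps; a real implementation would also need to handle longitude wrapping and trim the output to the verified forecast times.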

While the post-processing used in this paper is similar to what is done operationally at NHC, it is somewhat better in that the mean error from my post-processing is ~1% lower than from operations, especially at τ = 12 h because of the bias correction in step 5.

The main objective of the post-processing is to ensure an “apples-to-apples” comparison, but it also gives results that are more likely to be found in operational forecasting.

My naming convention is to add ‘06’ to the model name/symbol to indicate a 6 h

forward-in-time post-processing.

Appendix C. Details on track data

The primary source of forecast track data is the operational ATCF (http://www.nrlmry.navy.mil/atcf_web/docs/database/new/database.html) data files at both NHC and JTWC. The ECMWF model trackers are run at ECMWF; the tracks were provided directly by ECMWF for the years 2004-2005 and, for 2006-present, were taken from the BUFR files transmitted from ECMWF to the WMO tropical cyclone forecast centers via the WMO GTS communication system.

The definition of BCON was first based on availability in the ATCF data files. The consensus aids run by JTWC always form a consensus even if only two models are available, whereas the first consensus models at NHC required that all members be present. In 2004 a variable-member consensus was introduced at NHC, and from that point on I use this consensus aid. The problem with a fixed-member consensus is availability. The typical POD (probability of detection or “coverage”) for the fixed consensus model GUNA is ~70%, whereas for CONU it is closer to 95%. For model/consensus comparisons it is fairer to compare models with comparable POD.
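The difference between a fixed-member and a variable-member consensus can be made concrete with a small sketch. The code below is illustrative only: the function names and the sample cases are hypothetical, the consensus is taken as a simple mean of latitude and longitude (no dateline handling), and "coverage" is computed as the fraction of cases in which a consensus could be formed.

```python
def consensus_position(member_positions, min_members=2):
    """Mean position of the available members, or None if too few.

    member_positions maps model name -> (lat, lon), or None when that
    model has no forecast for this case.
    """
    available = [p for p in member_positions.values() if p is not None]
    if len(available) < min_members:
        return None
    n = len(available)
    return (sum(p[0] for p in available) / n, sum(p[1] for p in available) / n)


def coverage(cases, min_members):
    """Fraction of cases for which a consensus can be formed."""
    formed = sum(consensus_position(c, min_members) is not None for c in cases)
    return formed / len(cases)


# Hypothetical three-model consensus over four forecast cases.
cases = [
    {"A": (10.0, -50.0), "B": (12.0, -52.0), "C": (11.0, -51.0)},
    {"A": (10.0, -50.0), "B": (12.0, -52.0), "C": (11.0, -51.0)},
    {"A": (10.0, -50.0), "B": (12.0, -52.0), "C": None},
    {"A": (10.0, -50.0), "B": None, "C": None},
]
fixed = coverage(cases, min_members=3)      # all members required
variable = coverage(cases, min_members=2)   # at least two members required
```

In this made-up sample the fixed-member rule forms a consensus in half of the cases and the variable-member rule in three quarters, the same qualitative gap as between GUNA and CONU.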

basin        years       ATCF name   comment
WPAC/JTWC    2000-2001   NCON        first consensus aid, part of SAFA system
WPAC/JTWC    2002        CONU        7-model consensus
WPAC/JTWC    2003-2008   CONW        10-model consensus including a barotropic model and the Australian limited-area tropical model
LANT/NHC     2000        GUNS        3-model consensus requiring all models to form a consensus
LANT/NHC     2001-2003   GUNA        4-model consensus requiring all models to form a consensus
LANT/NHC     2004-2007   CONU        5-model consensus requiring only 2 models to form a consensus
LANT/NHC     2008        TVCN        6-model consensus


References

DeMaria, M., M.B. Lawrence and J.T. Kroll, 1990: An error analysis of Atlantic tropical cyclone track guidance models. Wea. Forecasting, 5, 47-61.

ECMWF 2008: forecast verification web: http://www.ecmwf.int/products/forecasts/d/charts/medium/verification/timeseries/z_score_monthly_mean!Tropics!run!12!pop!od!oper!w_scores!latest!/

Goerss, J. S., 2000: Tropical cyclone track forecasts using an ensemble of dynamical models. Mon. Wea. Rev., 128, 1187-1193.

Goerss, J. S., C. R. Sampson and J. M. Gross, 2004: A history of western North Pacific tropical cyclone track forecast skill. Wea. Forecasting, 19, 633-638.

Harrison, E.J., Jr. and R.L. Elsberry, 1972: A method for incorporating nested finite grids in the solution of systems of geophysical equations. J. Atmos. Sci., 29, 1235-1245.

Leslie, L. M., R.F. Abbey and G.J. Holland, 1998: Tropical cyclone track predictability. Meteorol. Atmos. Phys., 65, 223-231.

Serrano, E., 1998: Tropical cyclones. ERA-15 Project Report Series, 5, 30 pp. ECMWF, available at: http://www.ecmwf.int/publications/library/do/references/show?id=83257

Uppala, S.M., P.W. Kållberg, A.J. Simmons, U. Andrae, V. da Costa Bechtold, M. Fiorino, J.K. Gibson, J. Haseler, A. Hernandez, G.A. Kelly, X. Li, K. Onogi, S. Saarinen, N. Sokka, R.P. Allan, E. Andersson, K. Arpe, M.A. Balmaseda, A.C.M. Beljaars, L. van de Berg, J. Bidlot, N. Bormann, S. Caires, F. Chevallier, A. Dethof, M. Dragosavac, M. Fisher, M. Fuentes, S. Hagemann, E. Hólm, B.J. Hoskins, L. Isaksen, P.A.E.M. Janssen, R. Jenne, A.P. McNally, J.-F. Mahfouf, J.-J. Morcrette, N.A. Rayner, R.W. Saunders, P. Simon, A. Sterl, K.E. Trenberth, A. Untch, D. Vasiljevic, P. Viterbo and J. Woollen, 2005: The ERA-40 Reanalysis. Q. J. R. Meteorol. Soc., 131(612), 2961-3012.
