This article was downloaded by: [Memorial University of Newfoundland]
On: 06 October 2014, At: 22:44
Publisher: Routledge
Informa Ltd Registered in England and Wales Registered Number: 1072954
Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

International Review of Applied Economics
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/cira20

Up Around the Bend: Linear and nonlinear models of the UK economy compared
Geraint Johnes
Published online: 21 Jul 2010.

To cite this article: Geraint Johnes (2000) Up Around the Bend: Linear and nonlinear models of the UK economy compared, International Review of Applied Economics, 14:4, 485-493

To link to this article: http://dx.doi.org/10.1080/02692170050150156
ISSN 0269-2171 print/ISSN 1465-3486 online/00/040485-09 © 2000 Taylor & Francis Ltd
International Review of Applied Economics, Vol. 14, No. 4, 2000
Geraint Johnes, The Management School, Lancaster University, Lancaster LA1 4YX, UK.
Up Around the Bend: linear and nonlinear models
of the UK economy compared
GERAINT JOHNES
ABSTRACT A variety of methods, including vector autoregression (Bayesian and
non-Bayesian) and neural networks, are used to construct models of the UK economy, and
their forecasting performance is compared.
1. Introduction
Recent work on economic time series has concerned the investigation of
nonlinearities in the data (see, for example, Mills, 1991, 1995; Byers & Peel, 1994;
Peel & Speight, 1997). Typically such studies involve the use of threshold
autoregressive estimators of the type developed by Tong (1990), and have, in the
main, involved analyses of the time series properties of a single variable. An
alternative method of analysis draws on the literature on neural networks (White,
1992; Lachtermacher & Fuller, 1995), and has appeal for two reasons. First, neural
networks do not impose any specific form of nonlinearity, but rather they 'let the
data speak'; the universal approximation property of such networks allows the
systems thus modelled to capture whatever nonlinearity exists in the data. Secondly,
multivariate analyses are especially straightforward to conduct within the framework
of a neural network. Interesting examples of the application of neural networks to
the case of time series forecasting include the work of Hill et al. (1994, 1996),
Haefke & Helmenstein (1996) and Swanson & White (1997).
The aim of the present paper is to construct both linear and neural network
models, which provide a simple representation of the UK economy. The forecasting
ability of the various models is compared.
2. Data
The main disadvantage of neural networks concerns their severe data requirements.
In engineering and artificial intelligence applications, models of this kind often use
many hundreds or thousands of data points. With the exception of financial series,
it is rare to find such rich data sources in economics, but a long run of monthly data
may be found for several key variables, and it is these series that are exploited in the
present paper. The data are all monthly series taken from the OECD database, and
cover the period January 1961 through April 1997; estimation is conducted using
data to December 1994, the remaining observations being retained for forecast
evaluation (see Note 1). The variables are defined as follows: the growth rate of
industrial production (y); the logged unemployment rate (u); price inflation (p); and
the logged rate of interest on 3 month loans (r) (see Note 2).
It might be worth commenting at this stage on the theory that underlies this
choice of variables. Consider a standard aggregate-supply aggregate-demand
model. The supply side of the economy is frequently represented by a Phillips curve,
which may differ across the short and long runs; this provides a relationship between
output, prices and unemployment. The demand side may be thought of, in the short
run at least, as depicting a series of ISLM relationships in which the interest rate
measures the stance of government demand management policy and where the level
of output responds to this stimulus. Our four variables thus fall neatly into such a
framework, although it should be noted that y is chosen as a proxy for national
output because the former is, unlike the latter, available as a monthly series.
However, to place excessive emphasis on the theory would be misleading.
Much of the appeal of the forecasting methods used here is that they are (in large
measure) atheoretical. The failure of conventional macroeconomics to arrive at a
robust model with strong microeconomic underpinnings has led many observers to
prefer purely statistical models for the purpose of forecasting.
2.1 Order of Integration of the Seasonal Data
It is appropriate at this stage to investigate the properties of the four time series by
conducting tests of seasonal integration by invoking the procedure suggested by
Osborn et al. (1988). This involves running the regression
$$\Delta_1\Delta_j x = \varphi_1(\Delta_j x)_{-1} + \varphi_2(\Delta_1 x)_{-j} + \varphi_3(\Delta_1\Delta_j x)_{-1} + \text{seasonal terms} + \text{error} \qquad (1)$$
where $j$, for monthly data, equals 12. The t-statistic on $\varphi_1$ ($t_1$, say) is the
non-seasonal unit root test, while that on $\varphi_2$ ($t_2$, say) is the seasonal unit root
test. The null hypothesis that $x$ is an I(1,1) series is rejected in favour of the
alternative that: $x$ is I(1,0) if $\varphi_1 \ge 0$ and $\varphi_2 < 0$; $x$ is I(0,1) if
$\varphi_1 < 0$ and $\varphi_2 \ge 0$; or $x$ is I(0,0) if $\varphi_1 < 0$ and $\varphi_2 < 0$
(all, of course, at the chosen level of significance). This suite of tests
is commonly referred to by the OCSB acronym.
The test statistics reported in Table 1 indicate that all four variables are I(1,0);
that is, first differencing is necessary and sufficient to produce a stationary series.
This is an extremely common finding for seasonal economic series (see, for
example, Salisu & Balasubramanyam, 1997).
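
As an illustration only, the regression in Equation (1) can be sketched with ordinary least squares in Python. The function name `ocsb_tstats`, the use of a full set of seasonal dummies for the "seasonal terms", and the NumPy implementation are assumptions made here; the paper does not say how the seasonal terms were specified, and the resulting t-statistics must still be compared with the critical values tabulated by Osborn et al. (1988).

```python
import numpy as np

def ocsb_tstats(x, j=12):
    """Return (t1, t2), the t-statistics on the first two regressors of Equation (1).

    Hypothetical helper: 'seasonal terms' are taken to be j seasonal dummies.
    """
    x = np.asarray(x, dtype=float)

    def dj(s):          # seasonal difference x_s - x_{s-j}
        return x[s] - x[s - j]

    def d1(s):          # first difference x_s - x_{s-1}
        return x[s] - x[s - 1]

    def d1dj(s):        # first difference of the seasonal difference
        return dj(s) - dj(s - 1)

    t = np.arange(j + 2, x.size)            # dates at which every regressor exists
    y = d1dj(t)                             # left-hand side of Equation (1)
    seasonal = (t[:, None] % j == np.arange(j)).astype(float)
    X = np.column_stack([dj(t - 1), d1(t - j), d1dj(t - 1), seasonal])

    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    t_stats = beta / se
    return t_stats[0], t_stats[1]

# Usage on a simulated random walk (placeholder data, not the OECD series).
rng = np.random.default_rng(0)
walk = np.cumsum(rng.normal(size=300))
t1, t2 = ocsb_tstats(walk)
```

A strongly negative `t2` together with a `t1` that is not significantly negative corresponds to the I(1,0) outcome described above.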
2.2 Testing for Cointegration
In the absence of cointegration between our four variables, a common forecasting
procedure would be to conduct a vector autoregression (VAR) on the first
differences. This is indeed the procedure pursued by Holden & Broomhead (1990)
in their study of VAR forecasts for the UK economy. It is not, however, a valid
procedure to adopt when cointegrating relationships are present in the data. To
check for this, the Johansen (1988) procedure is used. This procedure is conducted
as follows. Let X consist of variables y, u, p and r, and estimate the parameters of the
equation
$$\Delta X = \sum_i \Gamma_i \Delta X_{-i} + \Pi X_{-k} + \varepsilon \qquad (2)$$
where $\varepsilon$ is a vector of residuals and $k$ is the order of the test. The matrix
$\Pi$ can be represented as
$$\Pi = \alpha\beta' \qquad (3)$$
where $\beta$ may be referred to as the cointegrating matrix that provides information
about the long run relationships between the variables in the model. To be specific,
the columns of $\beta$ are the cointegrating vectors. Where the rank of the matrix $\Pi$,
denoted by $r$, is less than the number of variables in the VAR model, the number of
cointegrating relationships in the data equals $r$. As long as $r > 0$, therefore, at least
some subsets of the variables are cointegrated.
Johansen suggests two likelihood ratio tests that may be used to establish the
rank of $\Pi$. Denote by $\hat{\lambda}_i$ the $i$th ranked eigenvalue associated with the equation
$$\left| \lambda T^{-1}\sum_{t=1}^{T} R_{kt}R_{kt}' - T^{-1}\sum_{t=1}^{T} R_{kt}R_{0t}' \left\{ T^{-1}\sum_{t=1}^{T} R_{0t}R_{0t}' \right\}^{-1} T^{-1}\sum_{t=1}^{T} R_{0t}R_{kt}' \right| = 0 \qquad (4)$$
where $R_{0t}$ is the vector of residuals from the regressions of $\Delta X$ on its own lagged
values, $R_{kt}$ is the vector of residuals from the regression of $X_{-k}$ on lagged values of
$\Delta X$, $n$ is the number of variables and where $T$ is the number of periods in the sample.
Hence the $\hat{\lambda}_i$ may be interpreted as the squared canonical correlations between the
residuals defined above. The statistic
$$-T\sum_{i=r+1}^{n}\ln(1-\hat{\lambda}_i) \qquad (5)$$
may be used to test the null hypothesis that there are at most $r$ cointegrating vectors,
where $n$ is the number of variables, and $T$ is the number of periods in the sample.
The same null hypothesis may also be tested using the statistic
$$-T\ln(1-\hat{\lambda}_{r+1}) \qquad (6)$$
Table 1. OCSB tests of seasonal integration

| Variable | Level: t1 | Level: t2 | First difference: t1 | First difference: t2 |
|---|---|---|---|---|
| y | -1.77 | -26.08 | -10.92 | -16.38 |
| u | 3.41 | -30.49 | -5.40 | -25.55 |
| p | 6.53 | -21.62 | -8.72 | -13.66 |
| r | 0.71 | -21.31 | -9.22 | -16.69 |
These tests are commonly referred to respectively as the trace test and the
maximum eigenvalue ($\lambda_{\max}$) test. Critical values are provided by Johansen &
Juselius (1990) (see Note 3).
The values of the trace and maximal eigenvalue test statistics, obtained using
the CATS in RATS software, are reported in Table 2 for values of $r$ up to three.
Both the $\lambda_{\max}$ and the trace tests indicate that there exist two cointegrating
vectors; that is, the test values exceed the corresponding critical values when
testing $r \le 1$, but not when testing $r \le 2$, against the appropriate alternative. Since
there is therefore a cointegrating relationship between the variables in our model,
the straightforward VAR on the first differences must be eschewed in favour of a
vector error correction mechanism (VECM) or cointegrated VAR approach.
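
The sequential reading of the Johansen tests can be sketched in Python as follows. The function names `johansen_stats` and `select_rank` are hypothetical; the eigenvalues, test statistics and 5% critical values are those reported in Table 2. The effective sample size $T$ is not reported in the paper, so it is left as a parameter rather than guessed.

```python
import numpy as np

def johansen_stats(eigenvalues, T):
    """Trace (5) and maximum eigenvalue (6) statistics for nulls r <= 0, 1, ...

    `eigenvalues` must be sorted from largest to smallest; T is the sample size.
    """
    lam = np.asarray(eigenvalues, dtype=float)
    lmax = -T * np.log(1.0 - lam)               # statistic (6), entry r uses eigenvalue r+1
    trace = lmax[::-1].cumsum()[::-1]           # statistic (5), sum over i = r+1, ..., n
    return trace, lmax

def select_rank(stats, critical_values):
    """Return the first r for which the null 'at most r vectors' is not rejected."""
    for r, (stat, crit) in enumerate(zip(stats, critical_values)):
        if stat <= crit:
            return r
    return len(stats)

# Values reported in Table 2.
eigenvalues = [0.1684, 0.0613, 0.0236, 0.0112]
lmax_reported, lmax_crit = [71.35, 24.46, 9.23, 4.36], [28.17, 21.89, 15.75, 9.09]
trace_reported, trace_crit = [109.41, 38.06, 13.60, 4.36], [53.35, 35.07, 20.17, 9.09]

rank_lmax = select_rank(lmax_reported, lmax_crit)      # -> 2
rank_trace = select_rank(trace_reported, trace_crit)   # -> 2
```

Because the trace statistic for a given null is the sum of the maximum eigenvalue statistics from that null onwards, the two columns of Table 2 can be checked against each other without knowing $T$.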
3. Estimation
The parameters of the equation
$$\Delta X = \sum_i C_i \Delta X_{-i} + V X_{-1} + \varepsilon^* \qquad (7)$$
are estimated with a VAR length of four periods, that is $i = 1, \ldots, 4$. The choice of
VAR length is intended to conserve degrees of freedom (see Note 4). Note that Equation (7) is
formally equivalent to Equation (2).
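
A minimal sketch of estimating a system of the form of Equation (7) by equation-by-equation least squares is given here in Python. It is not the RATS procedure used in the paper; the function name `fit_vecm_ols` and the inclusion of an intercept are assumptions. With four variables, four lagged differences, the lagged level and an intercept there are 21 regressors per equation, or 84 coefficients in total, which agrees with the count given in Note 4.

```python
import numpy as np

def fit_vecm_ols(X, lags=4):
    """Regress the first difference of X on its own lags, the lagged level and a constant.

    X has one row per period and one column per variable.
    Returns a coefficient matrix with one column per equation.
    """
    X = np.asarray(X, dtype=float)
    dX = np.diff(X, axis=0)                       # dX[s] = X[s+1] - X[s]
    T = dX.shape[0]
    Y = dX[lags:]                                 # dependent variable
    Z = np.column_stack(
        [dX[lags - i:T - i] for i in range(1, lags + 1)]   # lagged differences, i = 1..lags
        + [X[lags:T]]                                       # level lagged one period
        + [np.ones(T - lags)]                               # intercept (assumption)
    )
    coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    return coef

# Usage on simulated random walks (placeholder data, not the OECD series).
rng = np.random.default_rng(0)
data = np.cumsum(rng.normal(size=(200, 4)), axis=0)
coef = fit_vecm_ols(data)
n_coefficients = coef.size
```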
Since the introduction to the literature of the VAR approach (Sims, 1980),
many authors have noted the danger that such systems can overparameterise.
Hence, a model that fits the data well in-sample may nevertheless be a poor
forecaster. To mitigate this problem, Doan et al. (1984) have proposed an approach
in which prior beliefs about the parameter values in the VAR systems are modified
in Bayesian fashion. These so-called BVAR systems require the researcher to set a
number of hyperparameters that govern both the priors themselves and the
tightness with which the model adheres to these prior beliefs (usually measured by
the standard deviation on own first lag). A particularly simple variant of the BVAR
approach is to assume, as a prior, a symmetric matrix of coefficients with ones along
the main diagonal and a constant, $z$, $0 \le z \le 1$, elsewhere; the hyperparameters to
be chosen by the researcher are then just two, namely tightness and $z$. A very high
value of the tightness hyperparameter reduces the BVAR to a VAR, while a less
permissive tightness hyperparameter attaches more weight to prior beliefs. In order
clearly to separate out the VAR and BVAR approaches, we adopt a fairly low value
for tightness in the following. Moreover, it would appear reasonable to adopt an
agnostic prior view about whether or not our model comprises variables that are
truly interdependent; this informs our choice of prior for $z$. Clearly, both the VAR
and BVAR approaches assume relationships between the variables in the model to
be linear.

Table 2. Johansen tests for the number of cointegration vectors

| Null hypothesis: r ≤ | Eigenvalue | λmax | 5% critical value for λmax test | Trace | 5% critical value for trace test |
|---|---|---|---|---|---|
| 0 | 0.1684 | 71.35 | 28.17 | 109.41 | 53.35 |
| 1 | 0.0613 | 24.46 | 21.89 | 38.06 | 35.07 |
| 2 | 0.0236 | 9.23 | 15.75 | 13.60 | 20.17 |
| 3 | 0.0112 | 4.36 | 9.09 | 4.36 | 9.09 |

Note: critical values are taken from Johansen & Juselius (1990, table A3).
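
One common way of turning the two hyperparameters into prior standard deviations, in the spirit of Doan et al. (1984), can be sketched as follows. This is an illustration under stated assumptions, not the paper's RATS specification: the function name `symmetric_prior_std`, the harmonic lag decay and the scale factors `scale` are introduced here for the example only. The weight matrix has ones on the main diagonal and `z` elsewhere, and the prior standard deviation on each variable's own first lag equals the tightness.

```python
import numpy as np

def symmetric_prior_std(n_vars, lags, tightness=0.1, z=0.5, decay=1.0, scale=None):
    """Prior standard deviations std[l-1, i, j] for lag l of variable j in equation i.

    `decay` and `scale` are assumptions of this sketch, not values from the paper.
    """
    scale = np.ones(n_vars) if scale is None else np.asarray(scale, dtype=float)
    weights = np.full((n_vars, n_vars), float(z))
    np.fill_diagonal(weights, 1.0)                      # ones on the main diagonal
    lag_weight = np.arange(1, lags + 1, dtype=float) ** (-decay)
    ratio = scale[:, None] / scale[None, :]             # relative scale of equation i to variable j
    return tightness * lag_weight[:, None, None] * weights[None] * ratio[None]

prior_std = symmetric_prior_std(n_vars=4, lags=4, tightness=0.1, z=0.5)
```

A larger `tightness` loosens every prior standard deviation, so the estimates move towards those of the unrestricted VAR, which is the limiting case described above.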
An alternative approach to forecasting, which has received much recent
attention, is to construct an artificial neural network (White, 1992). This method
has the considerable appeal that it allows the relationships between variables in
the system to be nonlinear, but (since some forms of network have universal
approximation properties) it does not require the imposition of assumptions
concerning the precise form of nonlinearity (White et al., 1989). Neural networks
of this kind typically comprise a layer of inputs, a layer of hidden neurodes, and
a layer of output neurodes; each input is connected to each neurode in the hidden
layer, and each neurode in the hidden layer is connected to each neurode in the
output layer. Information flows unidirectionally from the input, through the
hidden, to the output layer; hence we refer to a single hidden layer feedforward
network. The network processes data as follows: information supplied to each
neurode in the hidden and output layers is first combined as a weighted average.
This weighted average then undergoes a simple nonlinear transformation (that is,
a 'squashing' function is employed) to provide the output from the neurode; in
the present case, the logistic transformation is used. The weights used to combine
inputs into each neurode prior to squashing are estimated using the method of
backpropagation (Rumelhart et al., 1986; see Note 5), convergence to the solution being
governed by two parameters, a and b, known respectively as the momentum and
the learning constant; the backpropagation algorithm is described more fully in
the appendix. Since all inputs into this neural network should lie within the unit
interval, the data on $\Delta x_{-i}$ and $x_{-1}$, $i = 1, \ldots, 4$ and $x = y, p, u, r$, are transformed by
the rule $x^* = e^x/(1 + e^x)$. The transformed variables constitute the input layer; the
output layer consists of the one-step ahead forecasts of the (transformed) $\Delta x$,
$x = y, p, u, r$. While, in principle, a neural network ought to provide at least as good
a forecaster as a VAR, especially for long range forecasts, the possibility of
overparameterisation means that linear methods could, in practice, outperform the
network.
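
The forward pass of such a single hidden layer feedforward network can be sketched in Python. The layer sizes (20 inputs, 10 hidden neurodes, 4 outputs) follow the description in the text; the absence of bias terms, the random initial weights and the names `logistic` and `forward` are assumptions of this sketch, and the input vector is random placeholder data.

```python
import numpy as np

def logistic(a):
    """Squashing function, identical to e^a / (1 + e^a)."""
    return 1.0 / (1.0 + np.exp(-a))

def forward(inputs, W, V):
    """Propagate inputs through the hidden layer (weights W) to the output layer (weights V)."""
    hidden = logistic(W @ inputs)     # W has shape (n_hidden, n_inputs)
    output = logistic(V @ hidden)     # V has shape (n_outputs, n_hidden)
    return hidden, output

rng = np.random.default_rng(0)
raw = rng.normal(size=20)             # placeholder for 16 lagged differences and 4 lagged levels
x_star = logistic(raw)                # inputs mapped into the unit interval
W = rng.normal(scale=0.1, size=(10, 20))
V = rng.normal(scale=0.1, size=(4, 10))
hidden, output = forward(x_star, W, V)
```

Since the outputs are themselves squashed into the unit interval, a forecast of the untransformed variable would be recovered by inverting the logistic rule.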
Four forecasters are compared below. These are:
• a naïve forecast in which y, p, u and r are assumed to remain unchanged over the
forecasting period;
• a VAR of the form given by Equation (7);
• a BVAR of the same form, with z = 0.5 and tightness equal to 0.1;
• a feedforward neural network with 10 neurodes in a single hidden layer with a =
0.3 and b = 0.1 (see Note 6).
The VAR and BVAR models are estimated using RATS, while the neural
network is trained using a specially written program which may be found at
http://www.lancs.ac.uk/people/ecagj/nmodel.for.
The above forecasters represent a reasonable cross-section of system forecast-
ing models currently available. A notable absentee, however, is the family of
deterministic regularisation methods, which include splines, kernel estimation and
local regression; these have been widely discussed elsewhere (see, for example,
Young & Pedregal, 1999; Hackl & Westlund, 1996).
Table 3. Forecasting performance

| Horizon | Model | Measure | Industrial production | Price inflation | Unemployment rate | Interest rate |
|---|---|---|---|---|---|---|
| 6 months ahead | Neural network | RMSE | 3.0274 | 0.4803 | 0.6103 | 0.4478 |
| | | GRMSE | 1.9226 | 0.3234 | 0.3906 | 0.2539 |
| | VAR | RMSE | 5.1172 | 1.0141 | 0.5606 | 0.8575 |
| | | GRMSE | 3.2150 | 0.4259 | 0.4173 | 0.5398 |
| | BVAR | RMSE | 4.6491 | 0.7724 | 0.5183 | 0.9927 |
| | | GRMSE | 2.9705 | 0.3974 | 0.3837 | 0.7781 |
| | Naïve | RMSE | 3.0335 | 0.5266 | 0.6647 | 0.4454 |
| | | GRMSE | 1.8088 | 0.4281 | 0.4465 | 0.0000 |
| 12 months ahead | Neural network | RMSE | 3.8169 | 0.6193 | 1.0099 | 0.6623 |
| | | GRMSE | 2.4728 | 0.3287 | 0.8104 | 0.3087 |
| | VAR | RMSE | 5.2642 | 0.7310 | 0.7723 | 1.5271 |
| | | GRMSE | 2.7404 | 0.4497 | 0.4059 | 1.4320 |
| | BVAR | RMSE | 4.3364 | 0.6263 | 0.6438 | 1.7243 |
| | | GRMSE | 2.5956 | 0.3125 | 0.4286 | 1.6492 |
| | Naïve | RMSE | 3.8361 | 0.7662 | 1.0738 | 0.6281 |
| | | GRMSE | 2.5006 | 0.3006 | 0.8726 | 0.3573 |
| 18 months ahead | Neural network | RMSE | 4.6442 | 0.6321 | 1.4802 | 0.7096 |
| | | GRMSE | 2.2831 | 0.5147 | 1.3409 | 0.6354 |
| | VAR | RMSE | 4.9985 | 1.3957 | 0.8873 | 1.9176 |
| | | GRMSE | 2.3687 | 1.1949 | 0.4506 | 1.8827 |
| | BVAR | RMSE | 4.1682 | 1.0955 | 0.6779 | 2.0860 |
| | | GRMSE | 2.2194 | 1.0081 | 0.5044 | 2.0539 |
| | Naïve | RMSE | 4.6195 | 0.7889 | 1.5579 | 0.6507 |
| | | GRMSE | 2.0463 | 0.7119 | 1.4193 | 0.5610 |
| 24 months ahead | Neural network | RMSE | 4.8309 | 0.4269 | 2.1641 | 0.4704 |
| | | GRMSE | 2.6932 | 0.3500 | 2.1469 | 0.4458 |
| | VAR | RMSE | 5.0817 | 2.0198 | 0.4930 | 1.9638 |
| | | GRMSE | 3.6652 | 1.9958 | 0.2375 | 1.9638 |
| | BVAR | RMSE | 4.3627 | 1.4301 | 0.3031 | 2.0709 |
| | | GRMSE | 2.5253 | 1.4043 | 0.2449 | 2.0667 |
| | Naïve | RMSE | 4.8180 | 0.5449 | 2.2550 | 0.4101 |
| | | GRMSE | 2.7064 | 0.4746 | 2.2448 | 0.3690 |
4. Forecasting Performance
Statistics on the forecasting performance of the various models are provided in Table
3, and refer to 6-, 12-, 18- and 24-month ahead forecasts. The statistics report the
root mean square error (RMSE), $(\sum_i \varepsilon_i^2/n)^{1/2}$, and the geometric root mean square
error (GRMSE), $(\prod_i \varepsilon_i^2)^{1/2n}$, where $\varepsilon_i$ is the error associated with the $i$th observation,
$i = 1, \ldots, n$. These measures are discussed in detail by Fildes (1992). Before
constructing Table 3, the variables have each been transformed back to forms that
allow easy interpretation: 100 times the annual change in the log value of the
industrial production index, 100 times the annual change in the log value of the
consumer price index, the unemployment rate (%) and the interest rate (%).
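
The two accuracy measures can be computed as in the following Python sketch; the function names `rmse` and `grmse` are chosen here for illustration. The GRMSE is evaluated on the log scale to avoid overflow or underflow in the product, which is an implementation choice rather than something specified in the paper.

```python
import numpy as np

def rmse(errors):
    """Root mean square error: square root of the mean squared error."""
    e = np.asarray(errors, dtype=float)
    return float(np.sqrt(np.mean(e ** 2)))

def grmse(errors):
    """Geometric root mean square error: (product of squared errors) ** (1 / 2n)."""
    e = np.abs(np.asarray(errors, dtype=float))
    if np.any(e == 0):
        return 0.0                      # one exact hit makes the whole product zero
    return float(np.exp(np.mean(np.log(e))))

example_errors = [3.0, 4.0]             # made-up errors, for illustration only
print(rmse(example_errors), grmse(example_errors))
```

Because the GRMSE is a geometric mean, it is less sensitive than the RMSE to a single large error, but a single zero error drives it to zero; this property may be relevant when reading a GRMSE entry of 0.0000 in Table 3.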
Models based on (simple or Bayesian) VAR provide the best forecasts for
unemployment, and dominate the other methods more strongly as the length of the
forecasting period increases. The naïve model provides the best forecasts of the
interest rate, although the neural network forecaster is also considerably superior to
the VAR models in connection with this variable. In light of the literature on efficient
markets for financial variables, it is perhaps not surprising that the interest rate
should be well predicted by the naive forecaster. The neural network model is the
best forecaster of price inflation. In the case of industrial production growth there
is little to choose between the alternative models, all of which perform rather badly;
but the neural network appears to produce the best short term forecasts while the
BVAR model provides superior longer term forecasts for this variable. (The poor
performance of all models in the case of industrial production growth may be due
to the volatility of this variable during the out-of-sample period; the demeaned
variable has a Durbin-Watson statistic of 2.354.)
The results reported here all refer to a model in which all variables are
simultaneously forecast. The outcome of the forecasting competition is strikingly
different from that of studies that have focused upon a single time series (see, for
example, Johnes, 1999). In the latter, neural networks often appear to provide a very
promising forecaster, especially as the forecasting period lengthens.
5. Conclusions
That the performance of the neural network model relative to the other forecasters
in this competition should deteriorate with an increase in the number of steps ahead
being forecast is somewhat surprising and is at variance with the findings of other
studies (for example, Johnes, 1999). While it is clear that no one forecaster
unambiguously outperforms its competitors, and that neural networks offer a
potentially useful addition to the toolkit of economic forecasters, it is difficult to
escape the conclusion that, for the forecasters tested here, simple specifications
typically perform as well as, and sometimes better than, the new breed of nonlinear
models. Nevertheless, as is typically the case with econometric forecasting models,
the prediction errors resulting from all methods examined here are substantial.
Notes
Without implication, the author thanks two referees for helpful comments.
1. In neural network terminology, the 'training set' comprises data points through the end of 1994. A
referee has suggested that it would be preferable to test forecasting using a rolling window approach.
This argument has its merits, but in view of the fact that the available number of observations is small
by the standards of neural network analysis, it is a route that is eschewed here.
2. Two of these variables, y and p, are measured as the twelfth difference in logged values of the levels
(of industrial output and the consumer price index respectively). Use of the levels data prior to testing
for the order of integration of these variables is eschewed in the light of earlier work that suggests that
output and price series are typically I(2). The twelfth difference is preferred to first differences because
of the possible presence of seasonal effects.
3. The relevant critical values are also reproduced in Table 2. It should be noted that the cointegration
tests reported here assume linearity, while the neural network approach adopted in the following
accommodates linearity only as a special case. Since the aim of the paper is to compare forecasts using
the standard techniques of linear VARs and nonlinear networks, it nonetheless seems appropriate to
use the standard tests to establish the nature of the VECM which is to be used.
4. While a VAR of length 4 entails the estimation of just 84 coefficients, the corresponding neural
network, with 10 neurodes in the hidden layer, involves the estimation of some 280 weights.
5. Backpropagation provides a particularly simple algorithm. A number of alternatives, many of which
allow faster computation, are available; some of these are discussed by Sexton et al. (1999).
6. These values for momentum and the learning constant were chosen to secure reasonably fast
convergence of the backpropagation algorithm.
References
Byers, J.D. & Peel, D.A. (1994) Non-linear adjustment of real wages, employment and output in the UK, Applied Economics, 26, pp. 459-463.
Doan, T.A., Litterman, R.B. & Sims, C.A. (1984) Forecasting and conditional projection using realistic prior distributions, Econometric Reviews, 3, pp. 1-144.
Fildes, R. (1992) The evaluation of extrapolative forecasting methods, International Journal of Forecasting, 8, pp. 81-98.
Hackl, P. & Westlund, A.H. (1996) Demand for international telecommunication: time-varying price elasticity, Journal of Econometrics, 70, pp. 243-260.
Haefke, C. & Helmenstein, C. (1996) Forecasting Austrian IPOs: an application of linear and neural network error-correction models, Journal of Forecasting, 15, pp. 237-251.
Hill, T., Marquez, L., O'Connor, M. & Remus, W. (1994) Artificial neural network models for forecasting and decision making, International Journal of Forecasting, 10, pp. 5-15.
Hill, T., O'Connor, M. & Remus, W. (1996) Neural network models for time series forecasts, Management Science, 42, pp. 1082-1092.
Holden, K. & Broomhead, A. (1990) An examination of vector autoregressive forecasts for the UK economy, International Journal of Forecasting, 6, pp. 11-23.
Johansen, S. (1988) Statistical analysis of cointegration vectors, Journal of Economic Dynamics and Control, 12, pp. 231-254.
Johansen, S. & Juselius, K. (1990) The full information maximum likelihood procedure for inference on cointegration with applications to the demand for money, Oxford Bulletin of Economics and Statistics, 52, pp. 169-210.
Johnes, G. (1999) Forecasting unemployment, Applied Economics Letters, 6, pp. 605-607.
Lachtermacher, G. & Fuller, J.D. (1995) Backpropagation in time-series forecasting, Journal of Forecasting, 14, pp. 381-393.
Mills, T.C. (1991) Nonlinear time series models in economics, Journal of Economic Surveys, 5, pp. 215-242.
Mills, T.C. (1995) Are there asymmetries or nonlinearities in UK output?, Applied Economics, 27, pp. 1211-1217.
Osborn, D.R., Chui, A.P.L., Smith, J.P. & Birchenhall, C.R. (1988) Seasonality and the order of integration for consumption, Oxford Bulletin of Economics and Statistics, 50, pp. 361-377.
Peel, D.A. & Speight, A.E.H. (1997) Threshold nonlinearities in unemployment rates: models and forecasts for the UK and G3, University of Wales Swansea, Department of Economics Discussion Paper 97-02.
Rumelhart, D.E., Hinton, G.E. & Williams, R.J. (1986) Learning representations by backpropagating errors, Nature, 323, pp. 533-536.
Salisu, M.A. & Balasubramanyam, V.N. (1997) Income and price elasticities of demand for alcoholic drinks, Applied Economics Letters, 4, pp. 247-251.
Sexton, R.S., Dorsey, R.E. & Johnson, J.D. (1999) Optimisation of neural networks: a comparative analysis of the genetic algorithm and simulated annealing, European Journal of Operational Research, 114, pp. 589-601.
Sims, C.A. (1980) Macroeconomics and reality, Econometrica, 48, pp. 1-48.
Swanson, N.F. & White, H. (1997) Forecasting economic time series using flexible versus fixed specifications and linear versus nonlinear econometric models, International Journal of Forecasting, 13, pp. 439-462.
Tong, H. (1990) Non-linear Time Series: a dynamical system approach (Oxford: Oxford University Press).
White, H., Hornik, K. & Stinchcombe, M.B. (1989) Multilayer feedforward networks are universal approximators, Neural Networks, 2, pp. 359-366; this also forms chapter 3 of White (1992).
White, H. (1992) Artificial Neural Networks: approximation and learning theory (Oxford, Blackwell).
Young, P. & Pedregal, D. (1999) Recursive and en-bloc approaches to signal extraction, Journal of Applied Statistics, 26, pp. 103-128.
Appendix: The Backpropagation Algorithm
The output of each neurode in the network is found by squashing a weighted
average of the inputs into that neurode. The weights used in this averaging process
constitute the parameters of the model to be estimated. Initially these weights
assume values that are set at random. The backpropagation procedure described
below is then used repeatedly to refine the estimates of the weights until the absolute
within-sample forecasting errors for each variable lie below prescribed levels. In the
present instance, the largest absolute within-sample error for the transformed
variables is required to be less than $5 \times 10^{-5}$ (for industrial production growth and
price inflation) or $5 \times 10^{-4}$ (for rates of interest and unemployment).
Backpropagation of the errors in order to adjust the estimates of the weights
proceeds as follows. Let the weight assigned by the $b$th hidden layer neurode to the
$c$th input be denoted by $W_{bc}$; let the weight assigned by the $d$th output neurode to the
signal from the $b$th hidden layer neurode be denoted by $V_{db}$; and let the single
step-ahead forecast error attached to the $d$th output neurode be denoted by $m_d$. This last
term is backpropagated through the network so that the weights $W_{bc}$ and $V_{db}$, for all
$b$, $c$, $d$, are adjusted according to the learning laws:
$$\Delta V_{db} = p\,m_d Q_b + a\,\Delta\tilde{V}_{db}$$
$$\Delta W_{bc} = p\left\{\sum_d m_d V_{db}\right\} Q_c + a\,\Delta\tilde{W}_{bc}$$
where $Q_p$ represents the output of the $p$th neurode and where the value of a variable
in the previous round of the algorithmic process is denoted by a tilde ( ~ ).
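
A single round of weight adjustment with momentum can be sketched in Python as follows. This is a standard backpropagation step written for illustration, not the author's Fortran program. The multiplication of each error signal by the slope of the logistic function is an assumption of the sketch (the compact notation of the learning laws may absorb it into the error terms), and the names `backprop_step`, `rate` and `momentum` are hypothetical; the default values 0.1 and 0.3 are the learning constant and momentum quoted in the main text.

```python
import numpy as np

def logistic(a):
    return 1.0 / (1.0 + np.exp(-a))

def backprop_step(x, target, W, V, dW_prev, dV_prev, rate=0.1, momentum=0.3):
    """One update of hidden weights W and output weights V for a single pattern.

    dW_prev and dV_prev are the weight changes made in the previous round.
    """
    hidden = logistic(W @ x)                       # outputs of the hidden neurodes
    output = logistic(V @ hidden)                  # outputs of the output neurodes

    # Error signals, scaled by the logistic slope (assumption of this sketch).
    delta_out = (target - output) * output * (1.0 - output)
    delta_hid = (V.T @ delta_out) * hidden * (1.0 - hidden)

    # New change = learning constant * signal * input + momentum * previous change.
    dV = rate * np.outer(delta_out, hidden) + momentum * dV_prev
    dW = rate * np.outer(delta_hid, x) + momentum * dW_prev
    return W + dW, V + dV, dW, dV

# Usage on one made-up pattern (placeholder values, not the paper's data).
rng = np.random.default_rng(1)
x = rng.uniform(0.1, 0.9, size=20)
target = np.array([0.2, 0.4, 0.6, 0.8])
W = rng.normal(scale=0.1, size=(10, 20))
V = rng.normal(scale=0.1, size=(4, 10))
dW, dV = np.zeros_like(W), np.zeros_like(V)
for _ in range(500):
    W, V, dW, dV = backprop_step(x, target, W, V, dW, dV)
```

In practice the step would be cycled over every observation in the training set until the stopping rule on the largest absolute within-sample error is met.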