Up Around the Bend: Linear and nonlinear models of the UK economy compared

Publisher: Routledge

International Review of Applied Economics. Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/cira20

Geraint Johnes. Published online: 21 Jul 2010.

To cite this article: Geraint Johnes (2000) Up Around the Bend: Linear and nonlinear models of the UK economy compared, International Review of Applied Economics, 14:4, 485-493

To link to this article: http://dx.doi.org/10.1080/02692170050150156


ISSN 0269-2171 print/ISSN 1465-3486 online/00/040485-09 © 2000 Taylor & Francis Ltd

International Review of Applied Economics, Vol. 14, No. 4, 2000

Geraint Johnes, The Management School, Lancaster University, Lancaster LA1 4YX, UK.

Up Around the Bend: linear and nonlinear models of the UK economy compared

GERAINT JOHNES

ABSTRACT A variety of methods, including vector autoregression (Bayesian and non-Bayesian) and neural networks, are used to construct models of the UK economy, and their forecasting performance is compared.

1. Introduction

Recent work on economic time series has concerned the investigation of

nonlinearities in the data (see, for example, Mills, 1991, 1995; Byers & Peel, 1994;

Peel & Speight, 1997). Typically such studies involve the use of threshold

autoregressive estimators of the type developed by Tong (1990), and have, in the

main, involved analyses of the time series properties of a single variable. An

alternative method of analysis draws on the literature on neural networks (White,

1992; Lachtermacher & Fuller, 1995), and has appeal for two reasons. First, neural

networks do not impose any specific form of nonlinearity, but rather they 'let the

data speak'; the universal approximation property of such networks allows the

systems thus modelled to capture whatever nonlinearity exists in the data. Secondly,

multivariate analyses are especially straightforward to conduct within the framework

of a neural network. Interesting examples of the application of neural networks to

the case of time series forecasting include the work of Hill et al. (1994, 1996),

Haefke & Helmenstein (1996) and Swanson & White (1997).

The aim of the present paper is to construct both linear and neural network

models, which provide a simple representation of the UK economy. The forecasting

ability of the various models is compared.

2. Data

The main disadvantage of neural networks concerns their severe data requirements.

In engineering and artificial intelligence applications, models of this kind often use

many hundreds or thousands of data points. With the exception of financial series,

it is rare to find such rich data sources in economics, but a long run of monthly data

may be found for several key variables, and it is these series that are exploited in the


present paper. The data are all monthly series taken from the OECD database, and cover the period January 1961 through April 1997; estimation is conducted using data to December 1994, the remaining observations being retained for forecast evaluation.[1] The variables are defined as follows: the growth rate of industrial production (y); the logged unemployment rate (u); price inflation (p); and the logged rate of interest on 3-month loans (r).[2]
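The variable definitions above, read together with Note 2 (y and p are twelfth differences of logged levels), can be sketched as follows. This is only an illustration: the function name and argument names are hypothetical, and the raw OECD series are assumed to be supplied as equal-length monthly arrays.

```python
import numpy as np

def build_variables(ip_index, unemp_rate, cpi, interest_rate):
    """Build the columns y, u, p, r from raw monthly series (hypothetical names)."""
    ip_index = np.asarray(ip_index, dtype=float)
    cpi = np.asarray(cpi, dtype=float)
    # Twelfth difference of logs: annual growth of industrial production and prices.
    y = np.log(ip_index[12:]) - np.log(ip_index[:-12])
    p = np.log(cpi[12:]) - np.log(cpi[:-12])
    # Logged rates, trimmed by 12 months so that all four columns are aligned.
    u = np.log(np.asarray(unemp_rate, dtype=float)[12:])
    r = np.log(np.asarray(interest_rate, dtype=float)[12:])
    return np.column_stack([y, u, p, r])
```

The first twelve months are lost to the differencing, so a sample of N months yields N - 12 rows.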

It might be worth commenting at this stage on the theory that underlies this

choice of variables. Consider a standard aggregate-supply aggregate-demand

model. The supply side of the economy is frequently represented by a Phillips curve,

which may differ across the short and long runs; this provides a relationship between

output, prices and unemployment. The demand side may be thought of, in the short

run at least, as depicting a series of IS-LM relationships in which the interest rate

measures the stance of government demand management policy and where the level

of output responds to this stimulus. Our four variables thus fall neatly into such a

framework, although it should be noted that y is chosen as a proxy for national

output because the former is, unlike the latter, available as a monthly series.

However, to place excessive emphasis on the theory would be misleading.

Much of the appeal of the forecasting methods used here is that they are (in large

measure) atheoretical. The failure of conventional macroeconomics to arrive at a

robust model with strong microeconomic underpinnings has led many observers to

prefer purely statistical models for the purpose of forecasting.

2.1 Order of Integration of the Seasonal Data

It is appropriate at this stage to investigate the properties of the four time series by

conducting tests of seasonal integration by invoking the procedure suggested by

Osborn et al. (1988). This involves running the regression

$$\Delta_1 \Delta_j x = \varphi_1 (\Delta_j x)_{-1} + \varphi_2 (\Delta_1 x)_{-j} + \varphi_3 (\Delta_1 \Delta_j x)_{-1} + \text{seasonal terms} + \text{error} \tag{1}$$

where j, for monthly data, equals 12. The t-statistic on $\varphi_1$ ($t_1$, say) is the non-seasonal unit root test, while that on $\varphi_2$ ($t_2$, say) is the seasonal unit root test. The null hypothesis that x is an I(1,1) series is rejected in favour of the alternative that: x is I(1,0) if $\varphi_1 \geq 0$ and $\varphi_2 < 0$; x is I(0,1) if $\varphi_1 < 0$ and $\varphi_2 \geq 0$; or x is I(0,0) if $\varphi_1 < 0$ and $\varphi_2 < 0$ (all, of course, at the chosen level of significance). This suite of tests is commonly referred to by the OCSB acronym.
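The regression in Equation (1) can be run by ordinary least squares. The sketch below is a minimal illustration, not the procedure actually used for Table 1: it assumes that the "seasonal terms" are twelve monthly dummies (with no separate intercept), and the function name is hypothetical. The returned statistics t1 and t2 have non-standard distributions, so they would need to be compared with the critical values tabulated for the OCSB test rather than with standard t tables.

```python
import numpy as np

def ocsb_tstats(x, s=12):
    """Return (t1, t2): t-statistics on the first two regressors of Equation (1)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    d1 = np.full(n, np.nan)
    d1[1:] = x[1:] - x[:-1]                   # first difference
    ds = np.full(n, np.nan)
    ds[s:] = x[s:] - x[:-s]                   # seasonal difference
    d1ds = np.full(n, np.nan)
    d1ds[s + 1:] = ds[s + 1:] - ds[s:-1]      # first difference of the seasonal difference
    t = np.arange(s + 2, n)                   # periods for which every lagged term exists
    regressors = np.column_stack([ds[t - 1], d1[t - s], d1ds[t - 1]])
    # Assumption: the seasonal terms are s period-specific dummies.
    dummies = (t[:, None] % s == np.arange(s)[None, :]).astype(float)
    X = np.hstack([regressors, dummies])
    y = d1ds[t]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    tstats = beta / np.sqrt(np.diag(cov))
    return tstats[0], tstats[1]

# Usage on a simulated random walk (illustrative data only).
rng = np.random.default_rng(1)
walk = np.cumsum(rng.normal(size=400))
t1, t2 = ocsb_tstats(walk)
```

For a pure random walk one would expect t2 to be strongly negative and t1 to be close to zero, which is the I(1,0) pattern described in the text.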

The test statistics reported in Table 1 indicate that all four variables are I(1,0);

that is, first differencing is necessary and sufficient to produce a stationary series.

This is an extremely common finding for seasonal economic series (see, for

example, Salisu & Balasubramanyam, 1997).

2.2 Testing for Cointegration

In the absence of cointegration between our four variables, a common forecasting

procedure would be to conduct a vector autoregression (VAR) on the first

differences. This is indeed the procedure pursued by Holden & Broomhead (1990)

in their study of VAR forecasts for the UK economy. It is not, however, a valid

procedure to adopt when cointegrating relationships are present in the data. To

check for this, the Johansen (1988) procedure is used. This procedure is conducted


as follows. Let X consist of variables y, u, p and r, and estimate the parameters of the equation

$$\Delta X = \sum_i \Gamma_i \Delta X_{-i} + \Pi X_{-k} + \varepsilon \tag{2}$$

where $\varepsilon$ is a vector of residuals and k is the order of the test. The matrix $\Pi$ can be represented as

$$\Pi = \alpha\beta' \tag{3}$$

where $\beta$ may be referred to as the cointegrating matrix that provides information about the long run relationships between the variables in the model. To be specific, the columns of $\beta$ are the cointegrating vectors. Where the rank of the matrix $\Pi$, denoted by r, is less than the number of variables in the VAR model, the number of cointegrating relationships in the data equals r. As long as r > 0, therefore, at least some subsets of the variables are cointegrated.

Johansen suggests two likelihood ratio tests that may be used to establish the rank of $\Pi$. Denote by $\hat{\lambda}_i$ the ith ranked eigenvalue associated with the equation

$$\left| \lambda T^{-1}\sum_{t=1}^{T} R_{kt}R_{kt}' - T^{-1}\sum_{t=1}^{T} R_{kt}R_{0t}' \left\{ T^{-1}\sum_{t=1}^{T} R_{0t}R_{0t}' \right\}^{-1} T^{-1}\sum_{t=1}^{T} R_{0t}R_{kt}' \right| = 0 \tag{4}$$

where $R_{0t}$ is the vector of residuals from the regressions of $\Delta X$ on its own lagged values, $R_{kt}$ is the vector of residuals from the regression of $X_{-k}$ on lagged values of $\Delta X$, n is the number of variables and where T is the number of periods in the sample. Hence the $\hat{\lambda}_i$ may be interpreted as the squared canonical correlations between the residuals defined above. The statistic

$$-T \sum_{i=r+1}^{n} \ln(1 - \hat{\lambda}_i) \tag{5}$$

may be used to test the null hypothesis that there are at most r cointegrating vectors, where n is the number of variables, and T is the number of periods in the sample. The same null hypothesis may also be tested using the statistic

$$-T \ln(1 - \hat{\lambda}_{r+1}) \tag{6}$$

Table 1. OCSB tests of seasonal integration

            Level                First difference
Variable    t1        t2         t1         t2
y           -1.77     -26.08     -10.92     -16.38
u            3.41     -30.49      -5.40     -25.55
p            6.53     -21.62      -8.72     -13.66
r            0.71     -21.31      -9.22     -16.69


These tests are commonly referred to respectively as the trace test and the maximum eigenvalue ($\lambda_{\max}$) test. Critical values are provided by Johansen & Juselius (1990).[3]
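The eigenvalue problem in Equation (4) can be solved numerically from the two sets of residuals. The following is a minimal sketch under stated assumptions, not the CATS in RATS implementation: the lagged differences are taken to run over i = 1, ..., k - 1, a constant is included in both auxiliary regressions, and the function name is hypothetical.

```python
import numpy as np

def johansen_eigenvalues(X, k=2):
    """Eigenvalues of Equation (4) for an (N, n) array of levels, largest first."""
    X = np.asarray(X, dtype=float)
    N = X.shape[0]
    dX = np.diff(X, axis=0)                    # dX[j] is the change dated j + 1
    t = np.arange(k, N)                        # dates with all required lags
    Z0 = dX[t - 1]                             # current first differences
    Zk = X[t - k]                              # levels lagged k periods
    lagged = [dX[t - 1 - i] for i in range(1, k)]
    # Assumption: a constant is included alongside the lagged differences.
    Z1 = np.hstack(lagged + [np.ones((len(t), 1))])

    def residuals(Y):
        coef, *_ = np.linalg.lstsq(Z1, Y, rcond=None)
        return Y - Z1 @ coef

    R0, Rk = residuals(Z0), residuals(Zk)
    T = len(t)
    S00 = R0.T @ R0 / T
    S0k = R0.T @ Rk / T
    Skk = Rk.T @ Rk / T
    # |lambda*Skk - Sk0 S00^{-1} S0k| = 0  <=>  eigenvalues of Skk^{-1} Sk0 S00^{-1} S0k
    M = np.linalg.solve(Skk, S0k.T @ np.linalg.solve(S00, S0k))
    lam = np.linalg.eigvals(M).real
    return np.sort(lam)[::-1]
```

Because the eigenvalues are squared canonical correlations, they lie between zero and one, and the statistics in (5) and (6) follow directly from them.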

The values of the trace and maximal eigenvalue test statistics, obtained using

the CATS in RATS software, are reported in Table 2 for values of r up to three.

Both the $\lambda_{\max}$ and the trace tests indicate that there exist two cointegrating vectors; that is, the test values exceed the corresponding critical values when testing r ≤ 1, but not when testing r ≤ 2, against the appropriate alternative. Since there is therefore a cointegrating relationship between the variables in our model, the straightforward VAR on the first differences must be eschewed in favour of a vector error correction mechanism (VECM) or cointegrated VAR approach.
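The way the rank is read off Table 2 can be checked with a few lines of arithmetic. The sketch below recomputes the statistics in (5) and (6) from the reported eigenvalues and applies the sequential testing rule. The effective sample size is not stated in the paper; T = 387 is an assumption, chosen because it reproduces the reported statistics to within rounding.

```python
import numpy as np

eigenvalues = np.array([0.1684, 0.0613, 0.0236, 0.0112])   # Table 2
crit_lmax = np.array([28.17, 21.89, 15.75, 9.09])          # 5% critical values, Table 2
crit_trace = np.array([53.35, 35.07, 20.17, 9.09])
T = 387  # assumption: effective sample size backed out from Table 2

lmax = -T * np.log(1.0 - eigenvalues)          # statistic (6) for null ranks 0, 1, 2, 3
trace = lmax[::-1].cumsum()[::-1]              # statistic (5): sum over the remaining eigenvalues

def select_rank(stats, crit):
    """Raise the rank while the null is rejected; stop at the first non-rejection."""
    rank = 0
    for stat, c in zip(stats, crit):
        if stat <= c:
            break
        rank += 1
    return rank

rank_lmax = select_rank(lmax, crit_lmax)
rank_trace = select_rank(trace, crit_trace)
```

Under this assumption both rules select a rank of two, in line with the conclusion drawn in the text.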

3. Estimation

The parameters of the equation

$$\Delta X = \sum_i C_i \Delta X_{-i} + V X_{-1} + \varepsilon^{*} \tag{7}$$

are estimated with a VAR length of four periods, that is i = 1, ..., 4. The choice of VAR length is intended to conserve degrees of freedom.[4] Note that Equation (7) is formally equivalent to Equation (2).
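Equation (7) can be estimated equation by equation using least squares. The sketch below is an unrestricted illustration only: it assumes a constant in each equation, which the paper does not state explicitly, and the function name is hypothetical. With four variables, four lagged differences, the lagged levels and a constant there are 21 regressors per equation and 84 coefficients in all, which is consistent with the count given in Note 4.

```python
import numpy as np

def estimate_vecm_ols(X, lags=4):
    """Least-squares estimates of Equation (7) for an (N, n) array of levels."""
    X = np.asarray(X, dtype=float)
    N = X.shape[0]
    dX = np.diff(X, axis=0)                          # dX[j] is the change dated j + 1
    t = np.arange(lags + 1, N)                       # dates with all lags available
    target = dX[t - 1]                               # current first differences
    lagged = [dX[t - 1 - i] for i in range(1, lags + 1)]
    # Regressors: lagged differences, levels lagged once, and an assumed constant.
    Z = np.hstack(lagged + [X[t - 1], np.ones((len(t), 1))])
    coef, *_ = np.linalg.lstsq(Z, target, rcond=None)
    return coef                                      # one column of coefficients per equation
```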

Since the introduction to the literature of the VAR approach (Sims, 1980),

many authors have noted the danger that such systems can overparameterise.

Hence, a model that fits the data well in-sample may nevertheless be a poor

forecaster. To mitigate this problem, Doan et al. (1984) have proposed an approach

in which prior beliefs about the parameter values in the VAR systems are modified

in Bayesian fashion. These so-called BVAR systems require the researcher to set a

number of hyperparameters that govern both the priors themselves and the

tightness with which the model adheres to these prior beliefs (usually measured by

the standard deviation on own first lag). A particularly simple variant of the BVAR

approach is to assume, as a prior, a symmetric matrix of coefficients with ones along

the main diagonal and a constant, z, 0 ≤ z ≤ 1, elsewhere; the hyperparameters to

be chosen by the researcher are then just two, namely tightness and z. A very high

value of the tightness hyperparameter reduces the BVAR to a VAR, while a less

permissive tightness hyperparameter attaches more weight to prior beliefs. In order

clearly to separate out the VAR and BVAR approaches, we adopt a fairly low value

Table 2. Johansen tests for the number of cointegration vectors

Null hypothesis:                       5% critical value              5% critical value
r ≤              Eigenvalue   λmax     for λmax test        Trace     for trace test
0                0.1684       71.35    28.17                109.41    53.35
1                0.0613       24.46    21.89                 38.06    35.07
2                0.0236        9.23    15.75                 13.60    20.17
3                0.0112        4.36     9.09                  4.36     9.09

Note: critical values are taken from Johansen & Juselius (1990, table A3).



for tightness in the following. Moreover, it would appear reasonable to adopt an

agnostic prior view about whether or not our model comprises variables that are

truly interdependent; this informs our choice of prior for z. Clearly, both the VAR

and BVAR approaches assume relationships between the variables in the model to

be linear.
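The simple symmetric prior described above has a very compact representation. The sketch below only constructs the matrix as the text describes it (ones on the main diagonal, the constant z elsewhere); how that matrix and the tightness hyperparameter enter the estimation is handled by the BVAR software and is not shown. The function name is hypothetical.

```python
import numpy as np

def symmetric_prior(n_vars, z):
    """Symmetric matrix with ones on the main diagonal and z elsewhere."""
    if not 0.0 <= z <= 1.0:
        raise ValueError("z must lie in [0, 1]")
    prior = np.full((n_vars, n_vars), float(z))
    np.fill_diagonal(prior, 1.0)
    return prior

# The four-variable case with z = 0.5, the value used for the BVAR in this paper.
prior = symmetric_prior(4, 0.5)
```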

An alternative approach to forecasting, which has received much recent

attention, is to construct an artificial neural network (White, 1992). This method

has the considerable appeal that it allows the relationships between variables in the system to be nonlinear but, since some forms of network have universal approximation properties, it does not require the imposition of assumptions concerning the precise form of nonlinearity (White et al., 1989). Neural networks

of this kind typically comprise a layer of inputs, a layer of hidden neurodes, and

a layer of output neurodes; each input is connected to each neurode in the hidden

layer, and each neurode in the hidden layer is connected to each neurode in the

output layer. Information flows unidirectionally from the input, through the

hidden, to the output layer; hence we refer to a single hidden layer feedforward

network. The network processes data as follows: information supplied to each

neurode in the hidden and output layers is first combined as a weighted average.

This weighted average then undergoes a simple nonlinear transformation (that is, a 'squashing' function is employed) to provide the output from the neurode; in the present case, the logistic transformation is used. The weights used to combine inputs into each neurode prior to squashing are estimated using the method of backpropagation (Rumelhart et al., 1986),[5] convergence to the solution being governed by two parameters, $\alpha$ and $\beta$, known respectively as the momentum and the learning constant; the backpropagation algorithm is described more fully in the appendix. Since all inputs into this neural network should lie within the unit interval, the data on $\Delta x_{-i}$ and $x_{-1}$, i = 1, ..., 4 and x = y, p, u, r, are transformed by the rule $x^{*} = e^{x}/(1 + e^{x})$. The transformed variables constitute the input layer; the output layer consists of the one-step-ahead forecasts of the (transformed) $\Delta x$, x = y, p, u, r. While, in principle, a neural network ought to provide at least as good a forecaster as a VAR, especially for long range forecasts, the possibility of overparameterisation means that linear methods could, in practice, outperform the network.
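The forward pass of the network described above can be sketched in a few lines. This is an illustration rather than the author's program: it assumes 20 inputs (four lagged differences and one lagged level for each of the four variables), 10 hidden neurodes and 4 outputs, uses a plain weighted sum without bias terms, and draws weights and inputs at random purely for demonstration.

```python
import numpy as np

def logistic(a):
    """Logistic squashing function; identical to e^a / (1 + e^a)."""
    return 1.0 / (1.0 + np.exp(-a))

def forward(inputs, W, V):
    """Single hidden layer feedforward pass: inputs -> hidden -> outputs."""
    hidden = logistic(W @ inputs)     # combine the inputs, then squash
    output = logistic(V @ hidden)     # combine the hidden signals, then squash
    return hidden, output

rng = np.random.default_rng(0)
raw = rng.normal(size=20)             # stand-in for the untransformed data
x_star = logistic(raw)                # input transformation into the unit interval
W = rng.normal(scale=0.1, size=(10, 20))   # hidden-layer weights (random, illustrative)
V = rng.normal(scale=0.1, size=(4, 10))    # output-layer weights (random, illustrative)
hidden, forecast = forward(x_star, W, V)
```

The four outputs lie in the unit interval, so they correspond to forecasts of the transformed variables and would need to be mapped back through the inverse of the logistic rule before interpretation.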

Four forecasters are compared below. These are:

• a naïve forecast in which y, p, u and r are assumed to remain unchanged over the forecasting period;

• a VAR of the form given by Equation (7);

• a BVAR of the same form, with z = 0.5 and tightness equal to 0.1;

• a feedforward neural network with 10 neurodes in a single hidden layer with $\alpha = 0.3$ and $\beta = 0.1$.[6]

The VAR and BVAR models are estimated using RATS, while the neural

network is trained using a specially written program which may be found at

http://www.lancs.ac.uk/people/ecagj/nmodel.for.

The above forecasters represent a reasonable cross-section of system forecast-

ing models currently available. A notable absentee, however, is the family of

deterministic regularisation methods, which include splines, kernel estimation and

local regression; these have been widely discussed elsewhere (see, for example,

Young & Pedregal, 1999; Hackl & Westlund, 1996).


Table 3. Forecasting performance

                            Industrial   Price       Unemployment   Interest
                            production   inflation   rate           rate
6 months ahead
  Neural network   RMSE     3.0274       0.4803      0.6103         0.4478
                   GRMSE    1.9226       0.3234      0.3906         0.2539
  VAR              RMSE     5.1172       1.0141      0.5606         0.8575
                   GRMSE    3.2150       0.4259      0.4173         0.5398
  BVAR             RMSE     4.6491       0.7724      0.5183         0.9927
                   GRMSE    2.9705       0.3974      0.3837         0.7781
  Naïve            RMSE     3.0335       0.5266      0.6647         0.4454
                   GRMSE    1.8088       0.4281      0.4465         0.0000
12 months ahead
  Neural network   RMSE     3.8169       0.6193      1.0099         0.6623
                   GRMSE    2.4728       0.3287      0.8104         0.3087
  VAR              RMSE     5.2642       0.7310      0.7723         1.5271
                   GRMSE    2.7404       0.4497      0.4059         1.4320
  BVAR             [...]
  [...]            RMSE     3.8361       0.7662      1.0738         0.6281
                   GRMSE    2.5006       0.3006      0.8726         0.3573
18 months ahead
  Neural network   RMSE     4.6442       0.6321      1.4802         0.7096
                   GRMSE    2.2831       0.5147      1.3409         0.6354
  VAR              RMSE     4.9985       1.3957      0.8873         1.9176
                   GRMSE    2.3687       1.1949      0.4506         1.8827
  BVAR             [...]
  [...]            RMSE     4.6195       0.7889      1.5579         0.6507
                   GRMSE    2.0463       0.7119      1.4193         0.5610
24 months ahead
  Neural network   RMSE     4.8309       0.4269      2.1641         0.4704
                   GRMSE    2.6932       0.3500      2.1469         0.4458
  VAR              RMSE     5.0817       2.0198      0.4930         1.9638
                   GRMSE    3.6652       1.9958      0.2375         1.9638
  BVAR             [...]
  [...]            RMSE     4.8180       0.5449      2.2550         0.4101
                   GRMSE    2.7064       0.4746      2.2448         0.3690



4. Forecasting Performance

Statistics on the forecasting performance of the various models are provided in Table

3, and refer to 6-, 12-, 18- and 24-month ahead forecasts. The statistics report the root mean square error (RMSE), $(\sum_i \varepsilon_i^2 / n)^{1/2}$, and the geometric root mean square error (GRMSE), $(\prod_i \varepsilon_i^2)^{1/2n}$, where $\varepsilon_i$ is the error associated with the ith observation, i = 1, ..., n. These measures are discussed in detail by Fildes (1992). Before

constructing Table 3, the variables have each been transformed back to forms that

allow easy interpretation: 100 times the annual change in the log value of the

industrial production index, 100 times the annual change in the log value of the

consumer price index, the unemployment rate (%) and the interest rate (%).
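The two accuracy measures defined above can be computed as follows. This is a minimal sketch with hypothetical function names; the GRMSE is evaluated through logarithms to avoid overflow in the product, and the error values in the usage lines are invented purely for illustration.

```python
import numpy as np

def rmse(errors):
    """Root mean square error: square root of the mean squared error."""
    e = np.asarray(errors, dtype=float)
    return float(np.sqrt(np.mean(e ** 2)))

def grmse(errors):
    """Geometric root mean square error: (product of squared errors)^(1/(2n))."""
    e = np.asarray(errors, dtype=float)
    if np.any(e == 0.0):
        return 0.0                    # a single zero error makes the whole product zero
    return float(np.exp(np.mean(np.log(e ** 2)) / 2.0))

# Illustrative forecast errors (not taken from the paper).
example_errors = [1.0, -2.0, 4.0]
example_rmse = rmse(example_errors)    # sqrt((1 + 4 + 16) / 3)
example_grmse = grmse(example_errors)  # (1 * 4 * 16) ** (1 / 6)
```

Because the GRMSE is a geometric mean, it is far less sensitive than the RMSE to one large error, while a single exact hit drives it to zero; the latter would be consistent with the 0.0000 entry for the naïve interest rate forecast in Table 3.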

Models based on (simple or Bayesian) VAR provide the best forecasts for

unemployment, and dominate the other methods more strongly as the length of the

forecasting period increases. The naïve model provides the best forecasts of the interest rate, although the neural network forecaster is also considerably superior to the VAR models in connection with this variable. In light of the literature on efficient markets for financial variables, it is perhaps not surprising that the interest rate should be well predicted by the naïve forecaster. The neural network model is the

best forecaster of price inflation. In the case of industrial production growth there

is little to choose between the alternative models, all of which perform rather badly;

but the neural network appears to produce the best short term forecasts while the

BVAR model provides superior longer term forecasts for this variable. (The poor

performance of all models in the case of industrial production growth may be due

to the volatility of this variable during the out-of-sample period; the demeaned variable has a Durbin-Watson statistic of 2.354.)

The results reported here all refer to a model in which all variables are

simultaneously forecast. The outcome of the forecasting competition is strikingly

different from that of studies that have focused upon a single time series (see, for

example, Johnes, 1999). In the latter, neural networks often appear to provide a very

promising forecaster, especially as the forecasting period lengthens.

5. Conclusions

That the performance of the neural network model relative to the other forecasters

in this competition should deteriorate with an increase in the number of steps ahead

being forecast is somewhat surprising and is at variance with the findings of other

studies (for example, Johnes, 1999). While it is clear that no one forecaster

unambiguously outperforms its competitors, and that neural networks offer a

potentially useful addition to the toolkit of economic forecasters, it is difficult to

escape the conclusion that, for the forecasters tested here, simple specifications

typically perform as well as, and sometimes better than, the new breed of nonlinear

models. Nevertheless, as is typically the case with econometric forecasting models,

the prediction errors resulting from all methods examined here are substantial.

Notes

Without implication, the author thanks two referees for helpful comments.

1. In neural network terminology, the 'training set' comprises data points through the end of 1994. A

referee has suggested that it would be preferable to test forecasting using a rolling window approach.

This argument has its merits, but in view of the fact that the available number of observations is small

by the standards of neural network analysis, it is a route that is eschewed here.


2. Two of these variables, y and p, are measured as the twelfth difference in logged values of the levels

(of industrial output and the consumer price index respectively). Use of the levels data prior to testing

for the order of integration of these variables is eschewed in the light of earlier work that suggests that

output and price series are typically I(2). The twelfth difference is preferred to first differences because

of the possible presence of seasonal effects.

3. The relevant critical values are also reproduced in Table 2. It should be noted that the cointegration

tests reported here assume linearity, while the neural network approach adopted in the following

accommodates linearity only as a special case. Since the aim of the paper is to compare forecasts using

the standard techniques of linear VARs and nonlinear networks, it nonetheless seems appropriate to

use the standard tests to establish the nature of the VECM which is to be used.

4. While a VAR of length 4 entails the estimation of just 84 coefficients, the corresponding neural

network, with 10 neurodes in the hidden layer, involves the estimation of some 280 weights.

5. Backpropagation provides a particularly simple algorithm. A number of alternatives, many of which

allow faster computation, are available; some of these are discussed by Sexton et al. (1999).

6. These values for momentum and the learning constant were chosen to secure reasonably fast

convergence of the backpropagation algorithm.

References

Byers, J.D. & Peel, D.A. (1994) Non-linear adjustment of real wages, employment and output in the UK, Applied Economics, 26, pp. 459-463.

Doan, T.A., Litterman, R.B. & Sims, C.A. (1984) Forecasting and conditional projection using realistic prior distributions, Econometric Reviews, 3, pp. 1-144.

Fildes, R. (1992) The evaluation of extrapolative forecasting methods, International Journal of Forecasting, 8, pp. 81-98.

Hackl, P. & Westlund, A.H. (1996) Demand for international telecommunication: time-varying price elasticity, Journal of Econometrics, 70, pp. 243-260.

Haefke, C. & Helmenstein, C. (1996) Forecasting Austrian IPOs: an application of linear and neural network error-correction models, Journal of Forecasting, 15, pp. 237-251.

Hill, T., Marquez, L., O'Connor, M. & Remus, W. (1994) Artificial neural network models for forecasting and decision making, International Journal of Forecasting, 10, pp. 5-15.

Hill, T., O'Connor, M. & Remus, W. (1996) Neural network models for time series forecasts, Management Science, 42, pp. 1082-1092.

Holden, K. & Broomhead, A. (1990) An examination of vector autoregressive forecasts for the UK economy, International Journal of Forecasting, 6, pp. 11-23.

Johnes, G. (1999) Forecasting unemployment, Applied Economics Letters, 6, pp. 605-607.

Johansen, S. (1988) Statistical analysis of cointegration vectors, Journal of Economic Dynamics and Control, 12, pp. 231-254.

Johansen, S. & Juselius, K. (1990) The full information maximum likelihood procedure for inference on cointegration with applications to the demand for money, Oxford Bulletin of Economics and Statistics, 52, pp. 169-210.

Lachtermacher, G. & Fuller, J.D. (1995) Backpropagation in time-series forecasting, Journal of Forecasting, 14, pp. 381-393.

Mills, T.C. (1991) Nonlinear time series models in economics, Journal of Economic Surveys, 5, pp. 215-242.

Mills, T.C. (1995) Are there asymmetries or nonlinearities in UK output?, Applied Economics, 27, pp. 1211-1217.

Osborn, D.R., Chui, A.P.L., Smith, J.P. & Birchenhall, C.R. (1988) Seasonality and the order of integration for consumption, Oxford Bulletin of Economics and Statistics, 50, pp. 361-377.

Peel, D.A. & Speight, A.E.H. (1997) Threshold nonlinearities in unemployment rates: models and forecasts for the UK and G3, University of Wales Swansea, Department of Economics Discussion Paper 97-02.

Rumelhart, D.E., Hinton, G.E. & Williams, R.J. (1986) Learning representations by backpropagating errors, Nature, 323, pp. 533-536.

Salisu, M.A. & Balasubramanyam, V.N. (1997) Income and price elasticities of demand for alcoholic drinks, Applied Economics Letters, 4, pp. 247-251.

Sexton, R.S., Dorsey, R.E. & Johnson, J.D. (1999) Optimisation of neural networks: a comparative analysis of the genetic algorithm and simulated annealing, European Journal of Operational Research, 114, pp. 589-601.

Sims, C.A. (1980) Macroeconomics and reality, Econometrica, 48, pp. 1-48.



Swanson, N.F. & White, H. (1997) Forecasting economic time series using flexible versus fixed specifications and linear versus nonlinear econometric models, International Journal of Forecasting, 13, pp. 439-462.

Tong, H. (1990) Non-linear Time Series: a dynamical system approach (Oxford: Oxford University Press).

White, H., Hornik, K. & Stinchcombe, M.B. (1989) Multilayer feedforward networks are universal approximators, Neural Networks, 2, pp. 359-366; this also forms chapter 3 of White (1992).

White, H. (1992) Artificial Neural Networks: approximation and learning theory (Oxford, Blackwell).

Young, P. & Pedregal, D. (1999) Recursive and en-bloc approaches to signal extraction, Journal of Applied Statistics, 26, pp. 103-128.

Appendix: The Backpropagation Algorithm

The output of each neurode in the network is found by squashing a weighted

average of the inputs into that neurode. The weights used in this averaging process

constitute the parameters of the model to be estimated. Initially these weights

assume values that are set at random. The backpropagation procedure described

below is then used repeatedly to refine the estimates of the weights until the absolute

within-sample forecasting errors for each variable lie below prescribed levels. In the

present instance, the largest absolute within-sample error for each transformed variable is required to be less than 5 × 10^-5 (for industrial production growth and price inflation) or 5 × 10^-4 (for rates of interest and unemployment).

Backpropagation of the errors in order to adjust the estimates of the weights

proceeds as follows. Let the weight assigned by the bth hidden layer neurode to the cth input be denoted by $W_{bc}$; let the weight assigned by the dth output neurode to the signal from the bth hidden layer neurode be denoted by $V_{db}$; and let the single step-ahead forecast error attached to the dth output neurode be denoted by $\mu_d$. This last term is backpropagated through the network so that the weights $W_{bc}$ and $V_{db}$, $\forall$ b, c, d, are adjusted according to the learning laws:

$$\Delta V_{db} = p\,\mu_d\,\Theta_b + \alpha\,\Delta\tilde{V}_{db}$$

$$\Delta W_{bc} = p\,\Big\{\sum_d \mu_d V_{db}\Big\}\,\Theta_c + \alpha\,\Delta\tilde{W}_{bc}$$

where $\Theta_p$ represents the output of the pth neurode and where the value of a variable in the previous round of the algorithmic process is denoted by a tilde (~).
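One round of the learning laws can be sketched as below. The sketch follows the laws literally as printed, so it does not include any derivative of the squashing function. It assumes that the coefficient written p in the learning laws plays the role of the learning constant (0.1 in the main text) and that the momentum is 0.3; the function and argument names are hypothetical, and this is not the author's program.

```python
import numpy as np

def update_weights(W, V, inputs, hidden, mu, prev_dW, prev_dV,
                   rate=0.1, momentum=0.3):
    """Apply one round of the two learning laws and return the new weights and changes.

    W: (hidden, inputs) weights; V: (outputs, hidden) weights;
    inputs, hidden: neurode outputs; mu: forecast errors at the output neurodes;
    prev_dW, prev_dV: weight changes from the previous round (the tilde terms).
    """
    # Change in V[d, b]: rate * mu[d] * hidden[b] + momentum * previous change.
    dV = rate * np.outer(mu, hidden) + momentum * prev_dV
    # Change in W[b, c]: rate * (sum over d of mu[d] * V[d, b]) * inputs[c] + momentum * previous change.
    dW = rate * np.outer(V.T @ mu, inputs) + momentum * prev_dW
    return W + dW, V + dV, dW, dV

# Tiny worked example: 3 inputs, 2 hidden neurodes, 1 output (illustrative numbers).
W0 = np.zeros((2, 3))
V0 = np.array([[1.0, 2.0]])
W1, V1, dW1, dV1 = update_weights(
    W0, V0,
    inputs=np.array([1.0, 0.0, 2.0]),
    hidden=np.array([0.5, 0.25]),
    mu=np.array([2.0]),
    prev_dW=np.zeros((2, 3)),
    prev_dV=np.zeros((1, 2)),
)
```

In a full training loop this update would be repeated, with the returned changes fed back in as the previous-round terms, until the within-sample errors fall below the prescribed levels.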
