Spatiotemporal Forecasting of Neighborhood Median Residential Housing Prices
Mak Kaboudan School of Business, University of Redlands
1200 East Colton Avenue, Redlands, CA 92373
Tel: (909) 748-6349
Email: [email protected]
June 25, 2004
Spatiotemporal Forecasting of Neighborhood Median Residential Housing Prices
The market value of a house can change over time following movements in its neighborhood median home price even when there is no change in its own attributes. Hedonic price model specifications can capture deviations from neighborhood median prices by determining the market value of a house based on its own attributes, but they may not capture time-dependent neighborhood price variations. To forecast these variations over time, spatiotemporal model specifications may be appropriate, but their estimation is complicated by autocorrelation and heteroscedasticity. Techniques robust against these problems can be employed instead. This paper compares the accuracy of neighborhood median home price forecasts obtained using two robust computational techniques, genetic programming (GP) and neural networks (NN). GP is an optimization technique that delivers fairly reliable forecasting models. NN are universal approximators that can be trained to capture the dynamics of data well. GP and NN neighborhood median housing price forecasts are compared with those of a standard statistical method, weighted least squares.
I. INTRODUCTION
Forecasting residential housing prices is important to lending institutions, tax assessors, and homeowners or buyers, among others affected by real estate market dynamics. Yet accurate price predictions remain a challenge, perhaps because models that explain variations in prices over time as well as between neighborhoods and among houses can be rather complex. Most studies that forecast prices of residential homes rely on detailed property-level data. Details at the property level furnish housing attributes upon which hedonic pricing models have been based for decades. Applications of hedonic methods to housing markets are plentiful; see Goodman and Thibodeau (2003) for a recent application. Advances based on hedonic methods include work on geographically weighted regression models (Fotheringham et al., 2002) and work on local regression (or semi-parametric) models (Clapp et al., 2002). Bin (2004) compares parametric versus semi-parametric hedonic regression models. Hedonic models can be viewed as localized, finely detailed pricing algorithms since they focus on small-scale variations. Although these capture quantitative and qualitative characteristics of individual
properties (such as age, square footage, with garage or not, with air conditioning or not, etc.) and location
factors that impact the price of a house (Bin, 2004; Mason and Quigley, 1996), they may not explain
temporal price changes well.
Dynamic changes in housing prices are determined by supply and demand forces. Economic variables such as income and the mortgage rate are determinants of housing demand. Because they change over time, prices of all residential houses change even when attributes are held constant. Therefore, identical attributes should have different impacts on the price of a house at different time periods. This explains why appraisers of a specific property adjust "current" neighborhood prices to allow for differences among properties. If this is the case, it is reasonable to first model local patterns of dependency that capture the impact of large-scale inter-neighborhood temporal changes in economic conditions on neighborhood median prices. Hedonic models that capture the impact of intra-neighborhood variations to account for differences in age, square footage, etc. follow. A neighborhood median price becomes one of the input variables in a hedonic model. Including temporal economic changes along with differences in attributes in the same price equation is possible and has been done (see Clapp, 2004, for example). However, including variables that explain large-scale variations (due to macroeconomic changes) as well as others that explain small-scale variations (due to differences in attributes) in the same equation may create an identification problem that can be difficult to resolve. It may not be possible to isolate the impact of general economic changes from the impact of specific attributes of a house on its price. Estimating separate equations seems more appropriate. After estimating a neighborhood median price model, estimating a hedonic price model follows. Given the ubiquity of hedonic models, the focus here is only on forecasting neighborhood median prices using macroeconomic spatiotemporal specifications that include one or more explanatory variables to discriminate between geographical locations.
Delivering timely forecasts of residential housing prices is important. A model that forecasts
“known information” is of little value to decision makers. Models capable of forecasting only one-step-
ahead (a month, quarter, or year) tend to deliver known information. By the time data on explanatory
variables needed to produce a forecast become available, that one-step-ahead forecast value has
materialized. To produce an extended forecast, typically “predicted values” of explanatory variables are
used. Using such an approach may deliver inaccurate forecasts because predicted values of input variables
are most probably imprecise. The alternative is to construct models that can produce forecasts for a few
steps ahead using "actual" rather than "fitted or forecasted" values of explanatory variables. In this paper, it is shown that using actual values delivers reasonable forecasts of neighborhood residential housing prices. Attention to this problem is not new, but the solution suggested here is. Gençay and Yang (1996),
Clapp et al. (2002), and Bin (2004) emphasize evaluating out-of-sample predictions when comparing
performances of different forecasting models.
Techniques to model and forecast spatiotemporal series are far from being established. Getis and
Ord (1992), Anselin et al. (1996), and Longley and Batty (1996, pp. 227-230) discuss spatial
autocorrelation and other statistical complications encountered when analyzing or modeling spatial data.
Anselin (1998 and 1999) reviews potential solutions to some of these problems. However, further
complications occur when spatial data is taken over time. If the data exhibit low-dimensional nonlinear or chaotic dynamics, forecast errors increase rapidly over time due to sensitivity to initial conditions. This issue was addressed by Rubin (1992) and Maros-Nikolaus and Martin-González (2002).
Given that statistical problems hinder modeling and forecasting efforts when dealing with
spatiotemporal data, using modeling and forecasting techniques that circumvent statistical estimation of
model parameters is worthy of investigation. To determine their efficacy, two computational techniques,
genetic programming (GP) and artificial neural networks (NN), are tested and compared with weighted
least squares (WLS) modeling. GP is a computer algorithm that produces regression-type models (Koza,
1992). The traditional statistical calculations to estimate model coefficients and the restrictions imposed
by statistical models are totally absent. GP is a univariate modeling technique that typically delivers
nonlinear equations. They are difficult to interpret but forecast rather well. López et al. (2000) proposed
using a proper empirical orthogonal function decomposition to describe the dynamics of a system and then used GP to extract dynamical rules from the data. They applied GP to obtain one-step-ahead forecasts of confined spatiotemporal chaos. NN is also a computerized technique that delivers forecasts, but (unlike GP) without delivering an explicit model. NN architecture is loosely based on the human neural system. A network is trained through an iterative process designed to learn the dynamics of a system. NN is the more established of the two techniques, has superior power in fitting complex dynamics, and has gained the attention and acceptance of many forecasters. Gopal and Fischer (1996) used NN in spatial forecasting. Rossini (2000)
used NN in forecasting residential housing prices employing hedonic-type specifications. Attraction to
these two computational techniques may be attributed to their robustness with respect to many statistical
problems that standard econometric or statistical modeling methods face. More specifically, they are
robust against problems of multicollinearity, autocorrelation, and non-stationarity.
This investigation implements a specification strategy designed to account for the effects of
temporal variations on future neighborhood median single-family residential home prices in the City of
Cambridge, MA. What is presented here is of an exploratory nature. The main assumption is that changes in the real mortgage rate and in real per capita income have different effects on prices in different neighborhoods. Although only data available freely via the web was employed, the results reported below seem promising. Section II contains a description of the data. Section III explains the univariate model specification employed throughout. An introduction to GP and how it can be used in forecasting, as well as a review of NN, follows in Section IV. Forecast results using WLS, GP, and NN are compared in Section V. The final Section contains concluding remarks. The results reported below are mixed: NN fit the historical data best, but forecasts using GP were significantly more reasonable and more plausible than those obtained by NN or WLS.
II. INPUT DATA
The data utilized in this study was obtained via an Internet search to find relatively complete
information to use in developing a housing price model. Data on housing sales for the City of Cambridge,
MA, was found on a web page published by that city’s Community Development Department,
Community Planning Division (2003). The publication reports annual median prices of single-family
homes over the period 1993-2002 for twelve neighborhoods in Cambridge. Figure 1 shows thirteen
neighborhoods, but data was available only for twelve of them. There were no data for area 2. The
dependent variable to model and forecast is P(i)t, the real median price of homes in neighborhood i sold at time period t, where t = 1,…,T. (The consumer price index for Boston was used to deflate all nominal data.) Any neighborhood observation with fewer than three sales in a given year was deleted because it was easily identified as an outlier. Observations for 1993 and 1994 were lost as degrees of freedom because all temporal explanatory variables were taken with a minimum of two lags, so that fitted values for the two succeeding years could be obtained without using fitted or forecasted values. The remaining data after deleting outliers were then divided into two sets. The first set, containing 67 observations, uses 1995-2000 historical data to produce forecasting models. The other set contains the 21 observations reserved to evaluate and compare 2001 and 2002 out-of-sample forecasts. (Data starting in 2003 were not made available until the middle of 2004, when this paper was completed.)
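To make this lag-and-split step concrete, here is a minimal Python sketch (not part of the original study), assuming a pandas DataFrame raw with one row per neighborhood-year and hypothetical columns 'neighborhood', 'year', 'price', 'pci', 'mr', 'y', and 'ap':

```python
import pandas as pd

def build_lagged_panel(df: pd.DataFrame, lag: int = 2) -> pd.DataFrame:
    """Attach lag-2 regressors so two-step-ahead forecasts need only actual values."""
    df = df.sort_values(["neighborhood", "year"]).copy()
    for col in ["price", "pci", "mr", "y", "ap"]:
        df[f"{col}_lag{lag}"] = df.groupby("neighborhood")[col].shift(lag)
    # The 1993-94 rows lose their lagged regressors here -- the lost degrees of freedom.
    return df.dropna()

panel = build_lagged_panel(raw)                         # `raw`: one row per neighborhood-year
fit_set = panel[panel["year"].between(1995, 2000)]      # 67 observations after outlier removal
test_set = panel[panel["year"].between(2001, 2002)]     # 21 observations held out for ex post tests
```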
The explanatory variables considered to capture variations in P(i)t include:
• MRt-2 = Real mortgage rate lagged two years. This temporal variable is constant across neighborhoods.
• PCI(i)t-2 = Neighborhood real per capita income lagged two years. This spatiotemporal variable varies
over time as well as among neighborhoods. Per capita income was not available by neighborhood but
by census tract. Figure 2 shows these tracts. To obtain representative values of each neighborhood's
per capita income, an approximate congruency between maps of neighborhood and tracts in Figures 1
and 2 was used. Neighborhood 1 contains approximately three tracts (3521, 3522, and 3523), for
example. Their distribution was visually approximated. Tracts 3521, 3522, and 3523 were subjectively
assigned weights of 0.60, 0.30, and 0.10, respectively, to reflect their proportional neighborhood
shares. A weighted average per capita income was then computed for that neighborhood. Others were
similarly approximated. Using this method may be a source of measurement error. Since there is no
alternative, this solution was assumed reasonable given the contiguity of neighborhoods. The results
obtained and presented later seem to confirm that such PCI representation is adequate.
• DVPCI(i)t-2 = Twelve spatiotemporal dummy variables designed to capture the impact of changes in real per capita income in neighborhood i at time period t-2 on that neighborhood's median price in period t. DVPCI(i)t-2 = PCI(i)t-2 * Wij, where Wij = 1 if i = j and zero otherwise, for i = 1, …, n neighborhoods and j = 1, …, k neighborhoods, where n = k. Use of this type of dummy variable is atypical. Goodman and Thibodeau (2003) use dummy variables to account for time of sale. Gençay and Yang (1996) use more standard Boolean dummy variables for (all but one) neighborhoods. Boolean dummy variables capture intercept shifts; DVPCI(i) captures slope changes instead. (A construction sketch for these variables follows this list.)
• Yt-2 = Real median household income for the city of Cambridge, MA. This temporal variable is
constant across neighborhoods.
• RNPRi = Relative neighborhood price ranking. It was computed by averaging each neighborhood’s
median prices over the period 1993-2000 first, sorting these averages second, and then assigning the
numbers 1 through 12 with “1” assigned to the lowest and “12” to the highest average. This integer
spatial variable varies by neighborhood only and serves as a polygon ID.
• LATi = Latitude of the centroid of neighborhood i.
• LONi = Longitude of the centroid of neighborhood i.
• APt-2 = Average real median price of homes in the city of Cambridge, MA, lagged two years. This
temporal variable remains constant across neighborhoods.
• P(i)t-2 = Real median price lagged two periods.
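Continuing the sketch above, the three less conventional variables (the DVPCI slope dummies, the tract-weighted per capita income, and RNPR) could be built as follows; the column names and the tract_weights mapping beyond neighborhood 1 are assumptions:

```python
import pandas as pd

# Slope dummies: DVPCI(i)t-2 = PCI(i)t-2 * Wij, with Wij = 1 only when i = j.
w = pd.get_dummies(panel["neighborhood"], prefix="DVPCI").astype(float)
dvpci = w.mul(panel["pci_lag2"], axis=0)   # lagged PCI in its own neighborhood, 0 elsewhere
panel = pd.concat([panel, dvpci], axis=1)

# Tract-to-neighborhood weighted PCI: neighborhood 1 spans tracts 3521/3522/3523
# with visually approximated shares 0.60/0.30/0.10 (remaining areas analogous).
tract_weights = {1: {3521: 0.60, 3522: 0.30, 3523: 0.10}}

def neighborhood_pci(tract_pci: dict, hood: int) -> float:
    return sum(share * tract_pci[t] for t, share in tract_weights[hood].items())

# RNPR: rank each neighborhood's 1993-2000 average median price, 1 = lowest.
avg = raw.loc[raw["year"] <= 2000].groupby("neighborhood")["price"].mean()
panel["rnpr"] = panel["neighborhood"].map(avg.rank(method="first").astype(int))
```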
Given that P(i)t and a few of the explanatory variables vary spatially, there should be some
accounting for spatial autocorrelation. Classical spatial autocorrelation measurement using the Moran
coefficient or the Geary ratio is not practical here since data is taken over time. Instead, spatial
autocorrelation was estimated using the following OLS regression model:
P(i)t = α + ρ P(i-1)t        (1)
where ρ measures the degree of autocorrelation. This equation provides a simple measure of
autocorrelation between pairs of contiguous neighbors over time. Autocorrelation is present if the
estimated ρ is significantly different from zero. The resulting equation using data 1995-2002 is:
P(i)t = 138.9 + 0.452 P(i-1)t.        (2)
For both the intercept and estimate of ρ, p-value = 0.00. Equation (2) confirms the existence of spatial
autocorrelation between pairs of contiguous neighborhoods averaged over time.
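A sketch of this diagnostic regression, reusing the hypothetical panel from Section II and assuming, as a simplification, that contiguous neighbors are adjacent in the neighborhood index:

```python
import statsmodels.api as sm

# Pair each neighborhood i with contiguous neighbor i-1 within each year and
# estimate P(i)t = alpha + rho * P(i-1)t, mirroring equation (1).
df = panel.sort_values(["year", "neighborhood"]).copy()
df["price_nbr"] = df.groupby("year")["price"].shift(1)      # P(i-1)t
pairs = df.dropna(subset=["price_nbr"])
fit = sm.OLS(pairs["price"], sm.add_constant(pairs["price_nbr"])).fit()
print(fit.params, fit.pvalues)   # the paper reports alpha = 138.9, rho = 0.452, p = 0.00
```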
III. BASIC MODEL SPECIFICATION
The basic model is a univariate specification designed to capture variations in P(i)t using mainly
economic data while incorporating spatial aspects, where P(i)t is a K x 1 vector with K = n*T. Formally:
P(i)t = f(Si, Xt, Z(i)t)        (3)

where Si is a set of spatial variables that vary only among neighborhoods (i), Xt is a set of time series variables that vary only over time (t), and Z(i)t are variables that vary over both. Accordingly, set Si includes RNPRi, LATi, and LONi; set Xt includes MRt-2, Yt-2, and APt-2; and set Z(i)t includes DVPCI(i)t-2, PCI(i)t-2, and P(i)t-2.
Forecasts from WLS were selected as a base to evaluate the efficacy of forecasts from the two
nonparametric computational techniques GP and NN. Ordinary least squares (OLS) is a typical parametric
method to use. However, due to the pooling of cross-sectional with time series data and because spatial
correlation was detected, there is a good chance that the variance of the error is not constant over
observations. When this problem occurs, the model has heteroscedastic error disturbance. OLS parameter
estimators are unbiased and consistent but they are not efficient (Pindyck and Rubinfeld, 1998, p. 147).
The weighted least squares (WLS) method (which is a special case of generalized least squares, Pindyck
and Rubinfeld, 1998, p. 148) is used to correct for heteroscedasticity if it is detected. To test for
heteroscedasticity, the Goldfeld-Quandt test is applied (Goldfeld and Quandt, 1965). The null hypothesis of
homoscedasticity is tested against the alternative of heteroscedasticity after available data is first divided
into two groups. A simple regression model is fit to each group using the dependent variable and a single
explanatory variable which is thought to be related to the error variance. The regressions are completed
after ranking available observations according to the independent variable and after deleting a middle
section of the data. Thus, the price data were grouped according to the ordered RNPR (lowest to highest). The first regression, P1(i)t = f(RNPR1i), was estimated using the top 25 observations, and the second regression, P2(i)t = f(RNPR2i), was obtained using the bottom 25 observations. (The 17 middle ones were ignored.) The ratio of the residual sums of squares (RSS) from the two equations follows an F distribution. The two equations were estimated and yielded RSS1 = 23272.29 and RSS2 = 90508.77, or an F-statistic = RSS2/RSS1 = 3.89. Under the null of homoscedasticity, with 23 degrees of freedom in both the numerator and the denominator and at the 5% level of significance, the critical F value is ≈ 2. Since the F-statistic = 3.89 > critical F = 2, the null is rejected in favor of heteroscedasticity, and equation (3) is estimated using WLS instead of OLS.
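The following sketch mirrors the test and the subsequent WLS fit using statsmodels; the latlon column and the WLS weights are placeholders, since the paper does not report its weighting scheme:

```python
import statsmodels.api as sm

def rss(group):
    """Residual sum of squares from a simple OLS of price on RNPR."""
    m = sm.OLS(group["price"], sm.add_constant(group["rnpr"])).fit()
    return float((m.resid ** 2).sum())

ordered = fit_set.sort_values("rnpr")          # rank by the suspected variance driver
rss1 = rss(ordered.iloc[:25])                  # top group (lowest RNPR)
rss2 = rss(ordered.iloc[-25:])                 # bottom group (highest RNPR)
F = rss2 / rss1                                # paper: 90508.77 / 23272.29 = 3.89

# With the homoscedasticity null rejected, fit equation (3) by WLS. Inverse RNPR
# is used below purely as a placeholder variance proxy.
dv_cols = [c for c in fit_set.columns if c.startswith("DVPCI")]
X = sm.add_constant(fit_set[["ap_lag2", "mr_lag2", "latlon"] + dv_cols])
wls = sm.WLS(fit_set["price"], X, weights=1.0 / fit_set["rnpr"]).fit()
```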
Table 1 contains the estimated WLS model that forecasts P(i)t. Judging by the reported p-values, all estimated coefficients are significantly different from zero at the 5% level of significance except for the coefficient of MRt-2, which is statistically different from zero at the 10% level. Coefficients of all other explanatory
variables considered for this regression were deleted either because they were statistically equal to zero at
more than 20% levels of significance or because their sign was illogical.
The main strength in using WLS lies in the interpretation of the estimated coefficients. Although it is
difficult to tell if they are totally meaningful, their signs are at least consistent with expectations.
Estimated coefficients in the table suggest that if APt-2 increases by $1,000, P(i)t will increase by $959. A decrease of MRt-2 by one percentage point will cause P(i)t to increase by $27,484 on average. Every $1,000 increase in PCI(1) causes P(1)t to increase by $23,420, every $1,000 increase in PCI(3) causes P(3)t to increase by $31,801, and so on. The last variable in the table, Latloni = (LONi * LATi)2, was inspired by equation (15) in Clapp (2004). After a lengthy search for the proper variable(s) to represent location, it was the only variable that was not highly collinear with other variable(s) in the equation. The coefficient's
negative sign suggests that prices tend to be higher in the southern, south-western, and western
neighborhoods of Cambridge.
IV. GENETIC PROGRAMMING & NEURAL NETWORKS
a) Genetic Programming:
Foundations of GP are in Koza (1992). GP is an optimization technique in the form of a computer
program based on Darwin’s notion of survival of the fittest and designed to solve diverse problems from
many disciplines. Figure 3 depicts a basic GP architecture used to evolve thousands of model
specifications, solve them all, test their fitness, and deliver a final best-fit model to use in forecasting. To
obtain a best-fit equation, the computer program starts by randomly assembling an initial population of
equations (say 100, 1000, or even 5000 of them). The user determines the size of such a population. The user also provides data input files of the variables. To assemble equations to include as members of a population, the program randomly combines a few variables with randomly selected mathematical operators such as +, -, *, protected /, protected √, sin, and cos, among others. Protected division and square root are necessary to prevent division by zero and taking the square root of negative numbers. These protections follow standards most GP researchers agree upon to avoid computational problems. More specifically, they are programmed such that if y = 0 in (x÷y), then (x÷y) = 1, and if y < 0 in y1/2, then y1/2 = -|y|1/2. Once members are assembled, their respective fitness is computed. While there are several choices of fitness measures, the mean square error (MSE = Σ(Actualt - Fittedt)2 / T) is typically
used. An equation with the lowest MSE in a population is declared fittest. If at any time an equation
accurately replicates values of the dependent variable, the program terminates. A user-controlled
threshold minimum MSE determines what is considered reasonable. If GP does not find an equation with
the Min (MSE), which is normal, the program breeds a new population. Populations succeeding the initial
one are the outcome of a programmed breeding by cloning or self-reproduction, crossover, and mutation.
In self-reproduction, the best equations in an existing population (say the top 10 or 20%) are
simply copied into the new one. In crossover, randomly selected sections from two (usually fitter)
equations in an existing population are exchanged to breed two offspring. In mutation, a randomly
selected section from a randomly selected equation in an existing population is replaced by newly
assembled part(s) to breed one new member. Thus, GP continues to breed new generations until an
equation with Min (MSE) is found or a preset maximum number of generations is reached. The equation
with Min (MSE) in the last population bred is then reported as fittest.
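A structural sketch of this loop is given below; it is an illustration, not TSGP's actual code. The protected operators follow the definitions above, while the crossover and mutate_tree helpers are empty placeholders standing in for real subtree operators:

```python
import random
import numpy as np

def pdiv(x, y):
    """Protected division: x / 0 is defined as 1."""
    y = np.asarray(y, dtype=float)
    return np.where(y == 0, 1.0, np.asarray(x, dtype=float) / np.where(y == 0, 1.0, y))

def psqrt(y):
    """Protected square root: sqrt of y < 0 is defined as -sqrt(|y|)."""
    y = np.asarray(y, dtype=float)
    return np.sign(y) * np.sqrt(np.abs(y))

def mse(actual, fitted):
    return float(np.mean((np.asarray(actual) - np.asarray(fitted)) ** 2))

def crossover(a, b):    # placeholder: a real engine swaps random subtrees
    return a, b

def mutate_tree(a):     # placeholder: a real engine replaces a random subtree
    return a

def next_generation(population, fitness, clone_share=0.2):
    """Breed one generation: 20% cloning, 20% crossover, 60% mutation (TSGP defaults)."""
    ranked = sorted(population, key=fitness)
    n = len(ranked)
    new = list(ranked[: int(clone_share * n)])       # self-reproduction of the fittest
    while len(new) < n:
        if random.random() < 0.25:                   # crossover fills 20% of the remaining 80%
            new.extend(crossover(*random.sample(ranked[: n // 2], 2)))
        else:                                        # mutation fills the other 60%
            new.append(mutate_tree(random.choice(ranked)))
    return new[:n]
```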
TSGP (for Time Series Genetic Programming, Kaboudan, 2003) software is used to obtain GP price
models here. TSGP is a computer code written in C++ for the Windows environment that is designed specifically to obtain (or evolve, in a Darwinian sense) forecasting models. It uses two types of input: data
input files and a configuration file. Data values of the dependent and each of the independent variables
must be supplied in separate ASCII text files. The configuration file contains execution information
including name of the dependent variable, number of observations to fit, number of observations to
forecast, number of equation specifications to evolve, and other GP-specific parameters. Before executing
TSGP, run parameters must be selected. TSGP default parameter values are set at population size = 1000,
number of generations = 100, self-reproduction rate = 20%, crossover rate = 20%, mutation rate = 60%,
and number of best-fit equations to evolve in a single run = 100. The GP literature on selection of these
parameters has conflicting opinions and therefore extensive search is used to determine what is best to use
under different circumstances. For example, while Chen and Smith (1999) favor higher crossover rate,
Chellapilla (1997) favors higher mutation rate, and while Fuchs (1999) and Gathercole and Ross (1997)
favor small population sizes over few generations, O'Reilly (1999) argues differently. Discussions of parameter selection options are in Banzhaf et al. (1998). With such disagreements, trial and error helps in choosing appropriate parameters. This situation is aggravated by the fact that assembling equations in GP is random and the fittest equation is the one with the globally minimal MSE. Unfortunately, GP software typically gets trapped easily at a local rather than the global minimum MSE. It is therefore necessary to compare a large number of best-fit equations (a minimum of 100) to select only one.
TSGP produces two types of output files. One has a final model specification. The other contains
actual and fitted values as well as performance statistics such as R2, MSE, and the mean absolute percent
error, MAPE = T-1 Σ |Actualt - Fittedt| / Actualt.
GP delivers equations that may not reproduce history very well but may still forecast well. Conversely, a best-fit model fails to forecast well when the algorithm delivers outcomes that fit the training data too closely. This phenomenon is known as overfitting. (See Lo and MacKinlay, 1999, for more on overfitting.) Generally,
if an equation produces accurate out-of-sample ex post forecasts, confidence in and reliability of its ex
ante forecast increases. An ex post forecast is one produced when dependent variable outcomes are
already known but were not used in estimating the model. An ex ante forecast is one produced when the
dependent variable outcomes are unknown. While evaluating accuracy of the ex post forecast, historical
fitness of the evolved model cannot be ignored, however. If the model failed to reproduce history, most
probably it will not deliver a reliable forecast either. The best forecasting model is therefore identified in two steps. First, the final 100 fittest equations evolved are sorted according to fitting MSE. The equations with the 10 lowest MSE values (or 20; the cutoff is set arbitrarily) are then sorted according to ex post forecast MAPE. The equation among the selected 10 with the lowest forecast MAPE is selected as best to use for
ex ante forecasting. This heuristic approach seems to work well. The idea is that if a model reproduces
history reasonably well and if it simultaneously forecasts a few periods fairly accurately, then it has a
higher probability of successfully forecasting additional periods into the future.
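A compact sketch of this two-step selection, where models is a hypothetical list of callables, each mapping a set of regressors to fitted prices:

```python
import numpy as np

def mse(actual, fitted):
    return float(np.mean((actual - fitted) ** 2))

def mape(actual, fitted):
    return float(np.mean(np.abs(actual - fitted) / np.abs(actual)))

def select_best(models, X_fit, y_fit, X_test, y_test, keep=10):
    """Two-step pick: lowest training MSE first, then lowest ex post MAPE."""
    by_fit = sorted(models, key=lambda m: mse(y_fit, m(X_fit)))[:keep]
    return min(by_fit, key=lambda m: mape(y_test, m(X_test)))
```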
TSGP was executed to search for 100 best-fit equations. The fittest among the resulting 100 best-fit
equations was then identified. The final equation recognized as best is:
P(i)t = PTIt-2 * cos((APt-2 / MRt-2) / PTYt-2) + RNPRi + PTIt-2 * cos(APt-2 / PTYt-2)
        + 2 * PTIt-2 * cos(RNPRi) + 2 * PTIt-2 * cos(PTYt-2) + PTIt-2 * sin(PTYt-2)
        + cos(P(i)t-2) + P(i)t-2 + PTIt-2 * sin(P(i)t-2 + (APt-2 / PTYt-2)) + APt-2 / MRt-2        (4)
where PTYt-2 = P(i)t-2/Yt-2 and PTIt-2 = P(i)t-2/PCI(i)t-2. Its R2 = 0.83 and MSE = 1995.98. In this nonlinear equation, P(i)t is determined by prior prices, the mortgage rate, the ratio of price to city income, the ratio of price to neighborhood per capita income, and the spatial variable RNPR (relative neighborhood price ranking).
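For readers who wish to reproduce the reported model's output, equation (4) transcribes directly into the following function (units must match those of the estimation data):

```python
import numpy as np

def gp_price(p_lag2, pci_lag2, ap_lag2, mr_lag2, y_lag2, rnpr):
    """Equation (4): the reported best GP model (all series lagged two years)."""
    pty = p_lag2 / y_lag2        # PTY: price-to-city-income ratio
    pti = p_lag2 / pci_lag2      # PTI: price-to-neighborhood-PCI ratio
    return (pti * np.cos((ap_lag2 / mr_lag2) / pty)
            + rnpr
            + pti * np.cos(ap_lag2 / pty)
            + 2 * pti * np.cos(rnpr)
            + 2 * pti * np.cos(pty)
            + pti * np.sin(pty)
            + np.cos(p_lag2)
            + p_lag2
            + pti * np.sin(p_lag2 + ap_lag2 / pty)
            + ap_lag2 / mr_lag2)
```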
b) Neural Networks:
Neural networks (NN) are a more established computational technique than GP. The literature on NN is huge. Principe et al. (2000), among many others, provide a complete description of
how NN can be used in forecasting. Two structures are commonly used in constructing networks:
multilayer perceptrons (MLP) and generalized feedforward networks (GFF). MLP is a layered
feedforward network that learns nonlinear function mappings using nonlinear activation functions. It is
typically trained with static back propagation and requires differentiable, continuous nonlinear activation
functions such as hyperbolic tangent or sigmoid. Input data are presented to the network that learns to
predict. Although it trains slowly and requires large samples with which to train, it is easy to use and
approximates well. GFF is a generalization of MLP with connections that jump over one or more layers.
It is also trained with static backpropagation.
To obtain the best forecast of the neighborhood median prices, both MLP and GFF were attempted employing the same set of spatiotemporal data used by GP and WLS. First, base MLP and GFF configurations were selected as a starting point to identify the suitable network structure to use. Both start with one hidden layer, use the hyperbolic tangent transfer function, employ a learning momentum of 0.70, and train for only 100 epochs. These parameters were then changed one at a time until the best ex post forecasting network was identified. One, two, and three hidden layers were tested. Transfer functions tested under each scenario were the hyperbolic tangent and the sigmoid. The better networks were then trained using learning rules with momentum set first at 0.7 and then at 0.9. Testing of each configuration started with 100 training epochs and increased in increments of 100 until the best network was identified. The criterion used to terminate training was MSE, to be consistent with GP. Unfortunately, neural networks can overfit
when training goes on for too long. The final outcome selected is therefore one that minimizes both
training MSE as well as the impact of overfitting. The forecast of neural networks reported below was
produced using NeuroSolutions software (2002).
The final network structure selected was MLP with one hidden layer. It is depicted in Figure 4. Only
ten variables were used after experimenting with all of the available ones. Use of the twelve DVPCI(i)t-2
produced very poor forecasts; they had to be removed from the set of input variables used. The transfer
function in the selected network was hyperbolic tangent, its learning momentum = 0.90, and 1700
training epochs were used. The best NN configuration produced a better fit of P(i)t training values than
the selected GP model. Table 2 contains comparative statistics on estimation and training results using
WLS, GP, and NN. Figure 5 confirms NN’s outstanding ability in reproducing history when compared
with WLS and GP. The figure depicts the 67 observations representing the six years (1995-2000), where
observations 1-12 belong to 1995, 13-24 belong to 1996, and so on. Given these results, one would expect NN to produce the better out-of-sample (2001 and 2002) forecasts. This is not the case, as demonstrated next.
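The selected configuration was produced in NeuroSolutions, but an analogous network can be sketched with scikit-learn; the hidden-layer width and learning rate below are assumptions, as the paper does not report them:

```python
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# One hidden layer, tanh activation, SGD with momentum 0.9, 1700 epochs.
mlp = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(10,), activation="tanh", solver="sgd",
                 momentum=0.9, max_iter=1700, learning_rate_init=0.01,
                 random_state=0),
)
mlp.fit(X_fit, y_fit)            # 1995-2000 training observations
p_hat = mlp.predict(X_test)      # ex post forecasts for 2001-2002
```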
V. FORECASTING
Although NN delivered best fit in reproducing historical data used in training (1995-2000), forecasts
by the WLS model were better than those by NN, and forecasts by GP were better than those by WLS.
Table 3 contains a comparison of the forecast statistics. The Theil’s U-statistic reported in the table is a
measure of forecast performance. It is known as Theil’s inequality coefficient and is defined as:
U = √MSE / [√(k-1 Σ P̂(i)j2) + √(k-1 Σ P(i)j2)]        (5)

where the sums run over j = 1, 2, …, k (with k = 21 observations representing the forecasted 2001 and 2002 values), and P̂(i)j are forecast values of P(i)j. This statistic always falls between zero and one, where zero indicates a perfect fit (Pindyck and Rubinfeld, 1998, p. 387). NPMSE (= PMSE / variance(P(i))) is the normalized prediction MSE.
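Both forecast statistics are straightforward to compute; the following sketch is consistent with equation (5) and the NPMSE definition above:

```python
import numpy as np

def theil_u(actual, forecast):
    """Theil's inequality coefficient, equation (5); zero means a perfect fit."""
    mse = np.mean((actual - forecast) ** 2)
    return float(np.sqrt(mse) /
                 (np.sqrt(np.mean(forecast ** 2)) + np.sqrt(np.mean(actual ** 2))))

def npmse(actual, forecast):
    """Prediction MSE normalized by the variance of the actual series."""
    return float(np.mean((actual - forecast) ** 2) / np.var(actual))
```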
Figure 6 provides a visual comparison between actual and ex post predicted prices over the forecast period (2001-2002) as well as ex ante forecasts for 2003-2004 delivered by WLS, GP, and NN, respectively. Two observations are easily deduced from the comparison in Table 3 and Figure 6. First,
over the ex post forecast period, GP clearly delivers the best forecast. Second, and more importantly, the
ex ante forecast GP delivers seems more logical than that of WLS or NN. NN has underestimated prices
in areas where homes are relatively more expensive.
VI. CONCLUSION
This paper explored computational forecasting of neighborhood median residential housing prices
utilizing spatiotemporal data with GP and NN and compared their efficacy with those of a statistical
method. Explanatory variables employed included those that account for the impact of (a) temporal
variations in income and mortgage rate and (b) location differences on neighborhood residential housing
prices. Such a combination of variables prevents the use of OLS because some of the temporal variables are
highly collinear and location variables are spatially autocorrelated. Further, the homoscedasticity null was
rejected when the data was tested. Under these conditions, the two nonparametric computational
techniques GP and NN were employed in modeling and forecasting prices because these algorithms are
robust against many statistical problems. Weighted least squares (WLS) specification, applicable under
conditions of heteroscedasticity, was also used (instead of OLS) to forecast prices as a comparison base.
Over the historical or training period, NN outperformed the other two techniques. For the out-of-sample ex post forecast, GP outperformed the other two. GP's superior ex post forecast suggests that, among the three, its ex ante forecast is possibly the most reliable; it was also the most logically acceptable one.
Three ways to account for location variables in models were implemented. The first way utilized
longitude and latitude as explanatory variables in WLS. (These are standard in many prior studies.) The
second way utilized dummy variables weighted by one of the variables that vary across neighborhoods
(per capita income). These weighted dummy variables helped account for the impact of income change by
neighborhood on its median housing prices relative to other neighborhoods in the WLS model. The third
way utilized relative ranking of neighborhood prices. It was one of the variables used both by GP and NN.
When constructing the three models, emphasis was on obtaining predictions useful in decision
making. Two-year-ahead forecasts were produced without relying on any forecasted values of explanatory
variables. This was possible by setting a lag-length equal to the desired number of periods to predict.
Use of GP, weighted dummy variables, and extended lag-lengths are all new in modeling and forecasting residential home prices. Further research and experimentation with all three are clearly warranted.
Linking the type of spatiotemporal model presented here with standard hedonic models also warrants
further investigation.
REFERENCES
Anselin, L. (1998) "Exploratory spatial data analysis in a geocomputational environment," in M. Fisher, H. Scholten, and D. Unwin (eds) Geocomputation: A Primer. Chichester: Wiley.
Anselin, L. (1999) “The future of spatial analysis in the social sciences,” Geographic Information
Sciences, 5, 67-76.
Anselin, L., Bera, A., Florax, R., and Yoon, M. (1996) “Simple diagnostic tests for spatial dependence,”
Regional Science and Urban Economics, 26, 77-104.
Banzhaf, W., Nordin, P., Keller, R., and Francone, F. (1998) Genetic Programming: An Introduction.
San Francisco: Morgan Kaufmann Publishers, Inc.
Bin, O. (2004) “A prediction comparison of housing sales prices by parametric versus semi-parametric
regressions,” Journal of Housing Economics, 13, 68-84.
Cambridge Community Development Department Community Planning Division (2003) “Cambridge
demographic, socioeconomic & real estate market information”, http://www.ci.cambridge.ma.us/
~CDD/data/index.html.
Chellapilla, K. (1997) “Evolving computer programs without subtree crossover,” IEEE Transactions on
Evolutionary Computation, 1, 209-216.
Chen, S. and Smith, S. (1999) "Introducing a new advantage of crossover: Commonality-based selection," in Wolfgang Banzhaf, Jason Daida, Agoston E. Eiben, Max H. Garzon, Vasant Honavar, Mark Jakiela, and Robert E. Smith, editors, Proceedings of the Genetic and Evolutionary Computation Conference, volume 1, Orlando, FL. San Francisco: Morgan Kaufmann, 13-17.
Clapp, J. (2004) “A semiparametric method for estimating local house price indices,” Real Estate
Economics, 32, 127-160.
Clapp, J., Kim, H., and Gelfand, A. (2002) “Predicting spatial patterns of house prices using LPR and
Bayesian smoothing,” Real Estate Economics, 30, 505-532.
Fotheringham, S., Brunsdon, C., and Charlton, M. (2002) Geographically Weighted Regression: The
Analysis of Spatially varying Relationships. West Essex, England: John Wiley and Sons.
Fuchs, M. (1999) “Large populations are not always the best choice in genetic programming,” in
Wolfgang Banzhaf, Jason Daida, Agoston E. Eiben, Max H. Garzon, Vasant Honavar, Mark Jakiela,
and Robert E. Smith, editors, Proceedings of the Genetic and Evolutionary Computation Conference,
volume 2, San Francisco: Morgan Kaufmann, 1033-1038.
Gathercole, C. and Ross, P. (1997) "Small populations over many generations can beat large populations over
few generations in genetic programming,” in Genetic Programming 1997: Proceedings of the Second
Annual Conference, J. Koza, K. Deb, M. Dorigo, D. Fogel, M. Garzon, H. Iba, and R. Riolo, editors,
San Francisco: Morgan Kaufmann, 111-118.
Gençay, R. and Yang, X. (1996) “A forecast comparison of residential housing prices by parametric
versus semiparametric conditional mean estimators,” Economics Letters, 52, 129-135.
Getis, A. and Ord, K. (1992) “The analysis of spatial autocorrelation by use of distance statistics,”
Geographical Analysis, 24, 189-206.
Goldfeld, S. and Quandt, R. (1965) “Some tests for homoscedasticity,” Journal of The American
Statistical Association, 60, 539-547.
Goodman, A. and Thibodeau, T. (2003) “Housing market segmentation and hedonic prediction
accuracy,” Journal of Housing Economics, 12, 181-201.
Gopal, S. and Fischer, M. (1996) “Learning in single hidden layer feedforward neural network models:
Backpropagation in a spatial interaction modeling context,” Geographical Analysis, 28, 38-55.
Kaboudan, M. (2003) “TSGP: A Time Series Genetic Programming Software,” http://newton.uor.edu/
facultyfolder/mahmoud_kaboudan/tsgp.
Koza, J. (1992), Genetic Programming, Cambridge, MA: The MIT Press.
Lo, A. and MacKinlay, C. (1999). A Non-Random Walk Down Wall Street, Princeton, NJ: Princeton
University Press.
Longley, P., and Batty, M. (1996) Spatial Analysis: Modelling in a GIS Environment, GeoInformation
International, Cambridge, UK.
López, C., Álvarez, A., and Hernández-García, E. (2000) “Forecasting confined spatiotemporal chaos
with genetic algorithms," Physical Review Letters, 85, 2300-2303.
Maros-Nikolaus, P. and Martin-González, J. (2002) “Spatial forecasting: Detecting determinism from
single snapshots,” International Journal of Bifurcation and Chaos in Applied Sciences and
Engineering, 12, 369-376.
Mason, C. and Quigley, J. (1996) “Non-parametric hedonic housing prices,” Housing Studies, 11, 373-
385.
NeuroSolutions™ (2002) The Neural Network Simulation Environment, Version 3, NeuroDimensions,
Inc., Gainesville, FL.
O'Reilly, U. (1999) "Foundations of genetic programming: Effective population size, linking and mixing," GECCO'99 Workshop, April 1999.
Pindyck, R. and Rubinfeld, D. (1998) Econometric Models and Economic Forecasting. Boston: Irwin
McGraw-Hill.
Principe, J., Euliano, N., and Lefebvre, C. (2000). Neural and Adaptive Systems: Fundamentals Through
Simulations, New York: John Wiley & Sons, Inc.
Rossini, P. (2000) “Using expert systems and artificial intelligence for real estate forecasting,” a paper
presented at the Sixth Annual Pacific-Rim Real Estate Society Conference, Sydney, Australia, 24-27
January 2000, http://business2.unisa.edu.au/prres/Proceedings/Proceedings2000/P6A2.pdf.
Rubin, D. (1992) “Use of forecasting signatures to help distinguish periodicity, randomness, and chaos in
ripples and other spatial patterns,” Chaos, 2, 525-535.
Table 1. Weighted least squares regression estimated coefficients

Variable         Coeff.    Std. Error   t-Stat    p-value
APt-2             0.959      0.406       2.362     0.022
MRt-2           -27.484     13.987      -1.965     0.055
DVPCI(1)t-2      23.420      9.407       2.490     0.016
DVPCI(3)t-2      31.801     12.181       2.611     0.012
DVPCI(4)t-2      21.647      7.971       2.716     0.009
DVPCI(5)t-2      11.882      4.079       2.913     0.005
DVPCI(6)t-2      10.521      2.869       3.667     0.001
DVPCI(7)t-2      27.662      9.568       2.891     0.006
DVPCI(8)t-2      19.833      4.748       4.177     0.000
DVPCI(9)t-2       9.279      2.371       3.913     0.000
DVPCI(10)t-2      6.587      1.608       4.097     0.000
DVPCI(11)t-2      6.764      3.114       2.172     0.034
DVPCI(12)t-2     13.212      4.775       2.767     0.008
DVPCI(13)t-2     15.332      4.821       3.180     0.002
Latlon           -2.650      1.261      -2.102     0.040

R2 = 0.76    DW = 1.76    MSE = 1061.23
Table 2. Comparative estimation and training statistics

Statistic    WLS        GP         NN
R2           0.76       0.83       0.96
MSE          1061.23    1995.98    463.51
MAPE         0.130      0.212      0.091
Table 3. Comparative forecast statistics

Statistic    WLS       GP       NN
Theil's U    0.081     0.044    0.097
MAPE         0.153     0.079    0.156
PMSE         3331.6    981.9    4425.26
NPMSE        0.170     0.079    0.225
Figure 1. City of Cambridge: Neighborhood Boundaries with Major City Streets.
(http://www.ci.cambridge.ma.us/~CDD/commplan/neighplan/pdfmaps/neighcitymap.html.)
Figure 2. City of Cambridge: Census tracts.
(http://www.ci.cambridge.ma.us/~CDD/data/maps/1990_census_tract_map.html.)
Figure 3. Process depicting evolution of equations to select a best-fit one using GP.
Figure 4. MLP network used to generate the final forecast.
Figure 5. Comparative historical estimation and training results of WLS, GP, and NN. (Three line plots of P against the 67 training observations: Fig. 5.a Actual vs. WLS, Fig. 5.b Actual vs. GP, Fig. 5.c Actual vs. NN.) Fig. 5.c confirms NN's fitting ability. The worst fit among the three is Fig. 5.b, produced using GP.

Figure 6. WLS, GP, and NN forecast comparison. (Three line plots of P against 43 forecast observations: Fig. 6.a Actual, WLS ex post, WLS ex ante; Fig. 6.b Actual, GP ex post, GP ex ante; Fig. 6.c Actual, NN ex post, NN ex ante.) The first 21 observations are ex post forecasts while the later 22 observations are ex ante forecasts. GP's superiority is evident and most logical.