Spatiotemporal Forecasting of Neighborhood Median Residential Housing Prices
Mak Kaboudan School of Business, University of Redlands
1200 East Colton Avenue, Redlands, CA 92373
Tel: (909) 748-6349
Email: [email protected]
June 25, 2004
Spatiotemporal Forecasting of Neighborhood Median Residential Housing Prices
The market value of a house can change over time following movements in its neighborhood median home price even when there is no change in its own attributes. Hedonic price model specifications can capture deviations from neighborhood median prices by determining the market value of a house based on its own attributes, but they may not capture time-dependent neighborhood price variations. To forecast these variations over time, spatiotemporal model specifications may be appropriate, but their estimation is complicated by autocorrelation and heteroscedasticity. Techniques robust against these problems can be employed instead. This paper compares the accuracy of neighborhood median home price forecasts obtained using two robust computational techniques, genetic programming (GP) and neural networks (NN). GP is an optimization technique that delivers fairly reliable forecasting models. NN are universal approximators that can be trained to capture the dynamics of data well. GP and NN neighborhood median housing price forecasts are compared with those of a standard statistical method, weighted least squares.
I. INTRODUCTION
Forecasting residential housing prices is important to lending institutions, tax assessors, and homeowners or buyers, among others affected by real estate market dynamics. Yet accurate price predictions remain a challenge, perhaps because models that explain variations in prices over time as well as between neighborhoods and among houses can be rather complex. Most studies that forecast prices of residential homes rely on detailed property-level data. Details at the property level furnish housing attributes upon which hedonic pricing models have been based for decades. Applications of hedonic methods to housing markets are plentiful; see Goodman and Thibodeau (2003) for a recent application. Advances based on hedonic methods include work on geographically weighted regression models (Fotheringham et al., 2002) and work on local regression (or semi-parametric) models (Clapp et al., 2002). Bin (2004) compares parametric versus semi-parametric hedonic regression models. Hedonic models can be viewed as localized, finely detailed pricing algorithms since they focus on small-scale variations. Although these capture quantitative and qualitative characteristics of individual
properties (such as age, square footage, with garage or not, with air conditioning or not, etc.) and location
factors that impact the price of a house (Bin, 2004; Mason and Quigley, 1996), they may not explain
temporal price changes well.
Dynamic changes in housing prices are determined by supply and demand forces. Economic variables such as income and the mortgage rate are determinants of housing demand. Because they change over time, prices of all residential houses change even when attributes are held constant. Therefore, identical attributes should have different impacts on the price of a house at different time periods. This explains why appraisers of a specific property adjust "current" neighborhood prices to allow for differences among properties. If this is the case, it is reasonable to first model local patterns of dependency that capture the impact of large-scale inter-neighborhood temporal changes in economic conditions on neighborhood median prices. Hedonic models that capture the impact of intra-neighborhood variations to account for differences in age, square footage, etc. follow. A neighborhood median price becomes one of the input variables in a hedonic model. Including temporal economic changes along with differences in attributes in the same price equation is possible and has been done (see Clapp, 2004, for example). However, including variables that explain large-scale variations (due to macroeconomic changes) as well as others that explain small-scale variations (due to differences in attributes) in the same equation may create an identification problem that can be difficult to resolve. It may not be possible to isolate the impact of general economic changes from the impact of specific attributes of a house on its price. Estimating separate equations seems more appropriate. After estimating a neighborhood median price model, estimating a hedonic price model follows. Given the ubiquity of hedonic models, the focus here is only on forecasting neighborhood median prices using macroeconomic spatiotemporal specifications that include one or more explanatory variables to discriminate between geographical locations.
Delivering timely forecasts of residential housing prices is important. A model that forecasts
“known information” is of little value to decision makers. Models capable of forecasting only one-step-
ahead (a month, quarter, or year) tend to deliver known information. By the time data on explanatory
variables needed to produce a forecast become available, that one-step-ahead forecast value has
materialized. To produce an extended forecast, typically “predicted values” of explanatory variables are
used. Using such an approach may deliver inaccurate forecasts because predicted values of input variables
are most probably imprecise. The alternative is to construct models that can produce forecasts for a few
steps ahead using "actual" rather than "fitted or forecasted" values of explanatory variables. In this paper, it is shown that using actual values delivers reasonable forecasts of neighborhood residential housing prices. Attention to this problem is not new, but the solution suggested here is. Gençay and Yang (1996),
Clapp et al. (2002), and Bin (2004) emphasize evaluating out-of-sample predictions when comparing
performances of different forecasting models.
Techniques to model and forecast spatiotemporal series are far from being established. Getis and
Ord (1992), Anselin et al. (1996), and Longley and Batty (1996, pp. 227-230) discuss spatial
autocorrelation and other statistical complications encountered when analyzing or modeling spatial data.
Anselin (1998 and 1999) reviews potential solutions to some of these problems. However, further
complications occur when spatial data is taken over time. If the data exhibit low-dimensional nonlinear or chaotic dynamics, forecast errors increase rapidly over time due to sensitivity to initial conditions. This issue was addressed by Rubin (1992) and Maros-Nikolaus and Martin-González (2002).
Given that statistical problems hinder modeling and forecasting efforts when dealing with
spatiotemporal data, using modeling and forecasting techniques that circumvent statistical estimation of
model parameters is worthy of investigation. To determine their efficacy, two computational techniques,
genetic programming (GP) and artificial neural networks (NN), are tested and compared with weighted
least squares (WLS) modeling. GP is a computer algorithm that produces regression-type models (Koza,
1992). The traditional statistical calculations to estimate model coefficients and the restrictions imposed
by statistical models are totally absent. GP is a univariate modeling technique that typically delivers
nonlinear equations. They are difficult to interpret but forecast rather well. López et al. (2000) proposed
using a proper empirical orthogonal function decomposition to describe the dynamics of a system and then used GP to extract dynamical rules from the data. They applied GP to obtain one-step-ahead forecasts of confined spatiotemporal chaos. NN is also a computerized technique that delivers forecasts, but (unlike GP) without delivering an explicit model. NN architecture is loosely based on the human neural system. A network is trained through an iterative process designed to learn the dynamics of a system. NN is the more established of the two techniques, has superior power in fitting complex dynamics, and has gained the attention and acceptance of many forecasters. Gopal and Fischer (1996) used NN in spatial forecasting. Rossini (2000)
used NN in forecasting residential housing prices employing hedonic-type specifications. Attraction to
these two computational techniques may be attributed to their robustness with respect to many statistical
problems that standard econometric or statistical modeling methods face. More specifically, they are
robust against problems of multicollinearity, autocorrelation, and non-stationarity.
This investigation implements a specification strategy designed to account for the effects of
temporal variations on future neighborhood median single-family residential home prices in the City of
Cambridge, MA. What is presented here is of an exploratory nature. The main assumption is that changes in the real mortgage rate and in real per capita income have different effects on prices in different neighborhoods. Although only data available freely via the web was employed, the results reported below seem promising. Section II contains a description of the data. Section III explains the univariate model specification employed throughout. An introduction to GP and how it can be used in forecasting, as well as a review of NN, follows in Section IV. Forecast results using WLS, GP, and NN are compared in Section V. The final Section contains concluding remarks. The results reported below are mixed: NN fit the historical data best, but forecasts using GP were significantly more reasonable and more plausible than those obtained by NN or WLS.
II. INPUT DATA
The data utilized in this study was obtained via an Internet search to find relatively complete
information to use in developing a housing price model. Data on housing sales for the City of Cambridge,
MA, was found on a web page published by that city’s Community Development Department,
Community Planning Division (2003). The publication reports annual median prices of single-family
homes over the period 1993-2002 for twelve neighborhoods in Cambridge. Figure 1 shows thirteen
neighborhoods, but data was available only for twelve of them. There were no data for area 2. The
dependent variable to model and forecast is P(i)t, the real median price of homes in neighborhood i sold at time period t, where t = 1,…,T. (The consumer price index for Boston was used to deflate all nominal data.) Any neighborhood observation with fewer than three sales in a given year was deleted because it was easily identified as an outlier. Observations for 1993 and 1994 were lost as degrees of freedom because all temporal explanatory variables were taken with a minimum of two lags, so that fitted values for the two succeeding years could be obtained without using fitted or forecasted values. The remaining data after deleting outliers were then divided into two sets. The first set, containing 67 observations, uses 1995-2000 historical data to produce forecasting models. The other set contains the 21 observations reserved to evaluate and compare 2001 and 2002 out-of-sample forecasts. (Data starting in 2003 were not made available until the middle of 2004, when this paper was completed.)
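To make this lag-and-split step concrete, here is a minimal Python sketch (not part of the original study), assuming a pandas DataFrame raw with one row per neighborhood-year and hypothetical columns 'neighborhood', 'year', 'price', 'pci', 'mr', 'y', and 'ap':

```python
import pandas as pd

def build_lagged_panel(df: pd.DataFrame, lag: int = 2) -> pd.DataFrame:
    """Attach lag-2 regressors so two-step-ahead forecasts need only actual values."""
    df = df.sort_values(["neighborhood", "year"]).copy()
    for col in ["price", "pci", "mr", "y", "ap"]:
        df[f"{col}_lag{lag}"] = df.groupby("neighborhood")[col].shift(lag)
    # The 1993-94 rows lose their lagged regressors here -- the lost degrees of freedom.
    return df.dropna()

panel = build_lagged_panel(raw)                         # `raw`: one row per neighborhood-year
fit_set = panel[panel["year"].between(1995, 2000)]      # 67 observations after outlier removal
test_set = panel[panel["year"].between(2001, 2002)]     # 21 observations held out for ex post tests
```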
The explanatory variables considered to capture variations in P(i)t include:
• MRt-2 = Real mortgage rate lagged two years. This temporal variable is constant across neighborhoods.
• PCI(i)t-2 = Neighborhood real per capita income lagged two years. This spatiotemporal variable varies
over time as well as among neighborhoods. Per capita income was not available by neighborhood but
by census tract. Figure 2 shows these tracts. To obtain representative values of each neighborhood's
per capita income, an approximate congruency between maps of neighborhood and tracts in Figures 1
and 2 was used. Neighborhood 1 contains approximately three tracts (3521, 3522, and 3523), for
example. Their distribution was visually approximated. Tracts 3521, 3522, and 3523 were subjectively
assigned weights of 0.60, 0.30, and 0.10, respectively, to reflect their proportional neighborhood
shares. A weighted average per capita income was then computed for that neighborhood. Others were
similarly approximated. Using this method may be a source of measurement error. Since there is no
alternative, this solution was assumed reasonable given the contiguity of neighborhoods. The results
obtained and presented later seem to confirm that such PCI representation is adequate.
• DVPCI(i)t-2 = Twelve spatiotemporal dummy variables designed to capture the impact of changes in real per capita income in neighborhood i at time period t-2 on that neighborhood's median price in period t. DVPCI(i)t-2 = PCI(i)t-2 * Wij, where Wij = 1 if i = j and zero otherwise, for i = 1, …, n neighborhoods and j = 1, …, k neighborhoods, where n = k. Use of this type of dummy variable is atypical. Goodman and Thibodeau (2003) use dummy variables to account for time of sale. Gençay and Yang (1996) use more standard Boolean dummy variables for (all but one) neighborhoods. Boolean dummy variables capture intercept shifts; DVPCI(i) captures slope changes instead. (A construction sketch for these variables follows this list.)
• Yt-2 = Real median household income for the city of Cambridge, MA. This temporal variable is
constant across neighborhoods.
• RNPRi = Relative neighborhood price ranking. It was computed by averaging each neighborhood’s
median prices over the period 1993-2000 first, sorting these averages second, and then assigning the
numbers 1 through 12 with “1” assigned to the lowest and “12” to the highest average. This integer
spatial variable varies by neighborhood only and serves as a polygon ID.
• LATi = Latitude of the centroid of neighborhood i.
• LONi = Longitude of the centroid of neighborhood i.
• APt-2 = Average real median price of homes in the city of Cambridge, MA, lagged two years. This
temporal variable remains constant across neighborhoods.
• P(i)t-2 = Real median price lagged two periods.
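Continuing the sketch above, the three less conventional variables (the DVPCI slope dummies, the tract-weighted per capita income, and RNPR) could be built as follows; the column names and the tract_weights mapping beyond neighborhood 1 are assumptions:

```python
import pandas as pd

# Slope dummies: DVPCI(i)t-2 = PCI(i)t-2 * Wij, with Wij = 1 only when i = j.
w = pd.get_dummies(panel["neighborhood"], prefix="DVPCI").astype(float)
dvpci = w.mul(panel["pci_lag2"], axis=0)   # lagged PCI in its own neighborhood, 0 elsewhere
panel = pd.concat([panel, dvpci], axis=1)

# Tract-to-neighborhood weighted PCI: neighborhood 1 spans tracts 3521/3522/3523
# with visually approximated shares 0.60/0.30/0.10 (remaining areas analogous).
tract_weights = {1: {3521: 0.60, 3522: 0.30, 3523: 0.10}}

def neighborhood_pci(tract_pci: dict, hood: int) -> float:
    return sum(share * tract_pci[t] for t, share in tract_weights[hood].items())

# RNPR: rank each neighborhood's 1993-2000 average median price, 1 = lowest.
avg = raw.loc[raw["year"] <= 2000].groupby("neighborhood")["price"].mean()
panel["rnpr"] = panel["neighborhood"].map(avg.rank(method="first").astype(int))
```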
Given that P(i)t and a few of the explanatory variables vary spatially, there should be some
accounting for spatial autocorrelation. Classical spatial autocorrelation measurement using the Moran
coefficient or the Geary ratio is not practical here since data is taken over time. Instead, spatial
autocorrelation was estimated using the following OLS regression model:
P(i)t = α + ρ P(i-1)t        (1)
where ρ measures the degree of autocorrelation. This equation provides a simple measure of
autocorrelation between pairs of contiguous neighbors over time. Autocorrelation is present if the
estimated ρ is significantly different from zero. The resulting equation using data 1995-2002 is:
P(i)t = 138.9 + 0.452 P(i-1)t.        (2)
For both the intercept and estimate of ρ, p-value = 0.00. Equation (2) confirms the existence of spatial
autocorrelation between pairs of contiguous neighborhoods averaged over time.
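A sketch of this diagnostic regression, reusing the hypothetical panel from Section II and assuming, as a simplification, that contiguous neighbors are adjacent in the neighborhood index:

```python
import statsmodels.api as sm

# Pair each neighborhood i with contiguous neighbor i-1 within each year and
# estimate P(i)t = alpha + rho * P(i-1)t, mirroring equation (1).
df = panel.sort_values(["year", "neighborhood"]).copy()
df["price_nbr"] = df.groupby("year")["price"].shift(1)      # P(i-1)t
pairs = df.dropna(subset=["price_nbr"])
fit = sm.OLS(pairs["price"], sm.add_constant(pairs["price_nbr"])).fit()
print(fit.params, fit.pvalues)   # the paper reports alpha = 138.9, rho = 0.452, p = 0.00
```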
III. BASIC MODEL SPECIFICATION
The basic model is a univariate specification designed to capture variations in P(i)t using mainly
economic data while incorporating spatial aspects, where P(i)t is a K x 1 vector with K = n*T. Formally:
P(i)t = f(Si, Xt, Z(i)t)        (3)

where Si is a set of spatial variables that vary only among neighborhoods (i), Xt is a set of time series variables that vary only over time (t), and Z(i)t are variables that vary over both. Accordingly, set Si includes RNPRi, LATi, and LONi; set Xt includes MRt-2, Yt-2, and APt-2; and set Z(i)t includes DVPCI(i)t-2, PCI(i)t-2, and P(i)t-2.
Forecasts from WLS were selected as a base to evaluate the efficacy of forecasts from the two
nonparametric computational techniques GP and NN. Ordinary least squares (OLS) is a typical parametric
method to use. However, due to the pooling of cross-sectional with time series data and because spatial
correlation was detected, there is a good chance that the variance of the error is not constant over
observations. When this problem occurs, the model has heteroscedastic error disturbance. OLS parameter
estimators are unbiased and consistent but they are not efficient (Pindyck and Rubinfeld, 1998, p. 147).
The weighted least squares (WLS) method (which is a special case of generalized least squares, Pindyck
and Rubinfeld, 1998, p. 148) is used to correct for heteroscedasticity if it is detected. To test for
heteroscedasticity, the Goldfeld-Quandt test is applied (Goldfeld and Quandt, 1965). The null hypothesis of
homoscedasticity is tested against the alternative of heteroscedasticity after available data is first divided
into two groups. A simple regression model is fit to each group using the dependent variable and a single
explanatory variable which is thought to be related to the error variance. The regressions are completed
after ranking available observations according to the independent variable and after deleting a middle
section of the data. Thus, the price data were grouped according to the ordered RNPR (lowest to highest). The first regression, P1(i)t = f(RNPR1i), was estimated using the top 25 observations, and the second regression, P2(i)t = f(RNPR2i), was obtained using the bottom 25 observations. (The 17 middle ones were ignored.) The ratio of the residual sums of squares (RSS) from the two equations follows an F distribution. The two equations were estimated and yielded RSS1 = 23272.29 and RSS2 = 90508.77, or an F-statistic = RSS2/RSS1 = 3.89. Under the null of homoscedasticity, with 23 degrees of freedom in both the numerator and the denominator and at the 5% level of significance, the critical F value is ≈ 2. Since the F-statistic = 3.89 > critical F = 2, the null is rejected in favor of heteroscedasticity, and equation (3) is estimated using WLS instead of OLS.
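The following sketch mirrors the test and the subsequent WLS fit using statsmodels; the latlon column and the WLS weights are placeholders, since the paper does not report its weighting scheme:

```python
import statsmodels.api as sm

def rss(group):
    """Residual sum of squares from a simple OLS of price on RNPR."""
    m = sm.OLS(group["price"], sm.add_constant(group["rnpr"])).fit()
    return float((m.resid ** 2).sum())

ordered = fit_set.sort_values("rnpr")          # rank by the suspected variance driver
rss1 = rss(ordered.iloc[:25])                  # top group (lowest RNPR)
rss2 = rss(ordered.iloc[-25:])                 # bottom group (highest RNPR)
F = rss2 / rss1                                # paper: 90508.77 / 23272.29 = 3.89

# With the homoscedasticity null rejected, fit equation (3) by WLS. Inverse RNPR
# is used below purely as a placeholder variance proxy.
dv_cols = [c for c in fit_set.columns if c.startswith("DVPCI")]
X = sm.add_constant(fit_set[["ap_lag2", "mr_lag2", "latlon"] + dv_cols])
wls = sm.WLS(fit_set["price"], X, weights=1.0 / fit_set["rnpr"]).fit()
```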
Table 1 contains the estimated WLS model that forecasts P(i)t. Judging by the reported p-values, all estimated coefficients are significantly different from zero at the 5% level of significance except for the coefficient of MRt-2, which is statistically different from zero at the 10% level. Coefficients of all other explanatory
variables considered for this regression were deleted either because they were statistically equal to zero at
more than 20% levels of significance or because their sign was illogical.
The main strength in using WLS lies in the interpretation of the estimated coefficients. Although it is
difficult to tell if they are totally meaningful, their signs are at least consistent with expectations.
Estimated coefficients in the table suggest that if APt-2 increases by $1,000, P(i)t will increase by $959. A decrease of MRt-2 by one percentage point will cause P(i)t to increase by $27,484 on average. Every $1,000 increase in PCI(1) causes P(1)t to increase by $23,420, every $1,000 increase in PCI(3) causes P(3)t to increase by $31,801, and so on. The last variable in the table, Latloni = (LONi * LATi)2, was inspired by equation (15) in Clapp (2004). After a lengthy search for the proper variable(s) to represent location, it was the only variable that was not highly collinear with other variable(s) in the equation. The coefficient's
negative sign suggests that prices tend to be higher in the southern, south-western, and western
neighborhoods of Cambridge.
IV. GENETIC PROGRAMMING & NEURAL NETWORKS
a) Genetic Programming:
Foundations of GP are in Koza (1992). GP is an optimization technique in the form of a computer
program based on Darwin’s notion of survival of the fittest and designed to solve diverse problems from
many disciplines. Figure 3 depicts a basic GP architecture used to evolve thousands of model
specifications, solve them all, test their fitness, and deliver a final best-fit model to use in forecasting. To
obtain a best-fit equation, the computer program starts by randomly assembling an initial population of
equations (say 100, 1000, or even 5000 of them). The user determines the size of such a population. The user also provides data input files of the variables. To assemble equations to include as members of a population, the program randomly combines a few variables with randomly selected mathematical operators such as +, -, *, protected /, protected √, sin, and cos, among others. Protected division and square root are necessary to prevent division by zero and taking the square root of negative numbers. These protections follow standards most GP researchers agree upon to avoid computational problems. More specifically, they are programmed such that if y = 0 in (x÷y), then (x÷y) = 1, and if y < 0 in y1/2, then y1/2 = -|y|1/2. Once members are assembled, their respective fitness is computed. While there are several choices of fitness measures, the mean square error (MSE = Σ(Actualt - Fittedt)2 / T) is typically
used. An equation with the lowest MSE in a population is declared fittest. If at any time an equation
accurately replicates values of the dependent variable, the program terminates. A user-controlled
threshold minimum MSE determines what is considered reasonable. If GP does not find an equation with
the Min (MSE), which is normal, the program breeds a new population. Populations succeeding the initial
one are the outcome of a programmed breeding by cloning or self-reproduction, crossover, and mutation.
In self-reproduction, the best equations in an existing population (say the top 10 or 20%) are
simply copied into the new one. In crossover, randomly selected sections from two (usually fitter)
equations in an existing population are exchanged to breed two offspring. In mutation, a randomly
selected section from a randomly selected equation in an existing population is replaced by newly
assembled part(s) to breed one new member. Thus, GP continues to breed new generations until an
equation with Min (MSE) is found or a preset maximum number of generations is reached. The equation
with Min (MSE) in the last population bred is then reported as fittest.
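A structural sketch of this loop is given below; it is an illustration, not TSGP's actual code. The protected operators follow the definitions above, while the crossover and mutate_tree helpers are empty placeholders standing in for real subtree operators:

```python
import random
import numpy as np

def pdiv(x, y):
    """Protected division: x / 0 is defined as 1."""
    y = np.asarray(y, dtype=float)
    return np.where(y == 0, 1.0, np.asarray(x, dtype=float) / np.where(y == 0, 1.0, y))

def psqrt(y):
    """Protected square root: sqrt of y < 0 is defined as -sqrt(|y|)."""
    y = np.asarray(y, dtype=float)
    return np.sign(y) * np.sqrt(np.abs(y))

def mse(actual, fitted):
    return float(np.mean((np.asarray(actual) - np.asarray(fitted)) ** 2))

def crossover(a, b):    # placeholder: a real engine swaps random subtrees
    return a, b

def mutate_tree(a):     # placeholder: a real engine replaces a random subtree
    return a

def next_generation(population, fitness, clone_share=0.2):
    """Breed one generation: 20% cloning, 20% crossover, 60% mutation (TSGP defaults)."""
    ranked = sorted(population, key=fitness)
    n = len(ranked)
    new = list(ranked[: int(clone_share * n)])       # self-reproduction of the fittest
    while len(new) < n:
        if random.random() < 0.25:                   # crossover fills 20% of the remaining 80%
            new.extend(crossover(*random.sample(ranked[: n // 2], 2)))
        else:                                        # mutation fills the other 60%
            new.append(mutate_tree(random.choice(ranked)))
    return new[:n]
```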
TSGP (for Time Series Genetic Programming, Kaboudan, 2003) software is used to obtain GP price
models here. TSGP is a computer code written in C++ for the Windows environment that is designed specifically to obtain (or evolve, in a Darwinian sense) forecasting models. It uses two types of input: data
input files and a configuration file. Data values of the dependent and each of the independent variables
must be supplied in separate ASCII text files. The configuration file contains execution information
including name of the dependent variable, number of observations to fit, number of observations to
forecast, number of equation specifications to evolve, and other GP-specific parameters. Before executing
TSGP, run parameters must be selected. TSGP default parameter values are set at population size = 1000,
number of generations = 100, self-reproduction rate = 20%, crossover rate = 20%, mutation rate = 60%,
and number of best-fit equations to evolve in a single run = 100. The GP literature on selection of these
parameters has conflicting opinions and therefore extensive search is used to determine what is best to use
under different circumstances. For example, while Chen and Smith (1999) favor higher crossover rate,
Chellapilla (1997) favors higher mutation rate, and while Fuchs (1999) and Gathercole and Ross (1997)
favor small population sizes over few generations, O'Reilly (1999) argues differently. Discussions of parameter selection options are in Banzhaf et al. (1998). With such disagreements, trial and error helps in choosing appropriate parameters. This situation is aggravated by the fact that assembling equations in GP is random and the fittest equation is the one with the globally minimal MSE. Unfortunately, GP software typically gets trapped easily at a local rather than the global minimum MSE. It is therefore necessary to compare a large number of best-fit equations (a minimum of 100) to select only one.
TSGP produces two types of output files. One has a final model specification. The other contains
actual and fitted values as well as performance statistics such as R2, MSE, and the mean absolute percent
error, MAPE = T-1 Σ |Actualt - Fittedt| / Actualt.
GP delivers equations that may not reproduce history very well but may still forecast well. Conversely, a best-fit model fails to forecast well when the algorithm delivers outcomes that fit the training data too closely. This phenomenon is known as overfitting. (See Lo and MacKinlay, 1999, for more on overfitting.) Generally,
if an equation produces accurate out-of-sample ex post forecasts, confidence in and reliability of its ex
ante forecast increases. An ex post forecast is one produced when dependent variable outcomes are
already known but were not used in estimating the model. An ex ante forecast is one produced when the
dependent variable outcomes are unknown. While evaluating accuracy of the ex post forecast, historical
fitness of the evolved model cannot be ignored, however. If the model failed to reproduce history, most
probably it will not deliver a reliable forecast either. The best forecasting model is therefore identified in two steps. First, the final 100 fittest equations evolved are sorted according to fitting MSE. The equations with the 10 lowest MSE values (or 20; the cutoff is set arbitrarily) are then sorted according to ex post forecast MAPE. The equation among the selected 10 with the lowest forecast MAPE is selected as best to use for
ex ante forecasting. This heuristic approach seems to work well. The idea is that if a model reproduces
history reasonably well and if it simultaneously forecasts a few periods fairly accurately, then it has a
higher probability of successfully forecasting additional periods into the future.
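A compact sketch of this two-step selection, where models is a hypothetical list of callables, each mapping a set of regressors to fitted prices:

```python
import numpy as np

def mse(actual, fitted):
    return float(np.mean((actual - fitted) ** 2))

def mape(actual, fitted):
    return float(np.mean(np.abs(actual - fitted) / np.abs(actual)))

def select_best(models, X_fit, y_fit, X_test, y_test, keep=10):
    """Two-step pick: lowest training MSE first, then lowest ex post MAPE."""
    by_fit = sorted(models, key=lambda m: mse(y_fit, m(X_fit)))[:keep]
    return min(by_fit, key=lambda m: mape(y_test, m(X_test)))
```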
TSGP was executed to search for 100 best-fit equations. The fittest among the resulting 100 best-fit
equations was then identified. The final equation recognized as best is:
P(i)t = PTIt-2 * cos((APt-2 / MRt-2) / PTYt-2) + RNPRi + PTIt-2 * cos(APt-2 / PTYt-2)
        + 2 * PTIt-2 * cos(RNPRi) + 2 * PTIt-2 * cos(PTYt-2) + PTIt-2 * sin(PTYt-2)
        + cos(P(i)t-2) + P(i)t-2 + PTIt-2 * sin(P(i)t-2 + (APt-2 / PTYt-2)) + APt-2 / MRt-2        (4)
where PTYt-2 = P(i)t-2/Yt-2 and PTIt-2 = P(i)t-2/PCI(i)t-2. Its R2 = 0.83 and MSE = 1995.98. In this nonlinear equation, P(i)t is determined by prior prices, the mortgage rate, the ratio of price to city income, the ratio of price to neighborhood per capita income, and the spatial variable RNPR (relative neighborhood price ranking).
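For readers who wish to reproduce the reported model's output, equation (4) transcribes directly into the following function (units must match those of the estimation data):

```python
import numpy as np

def gp_price(p_lag2, pci_lag2, ap_lag2, mr_lag2, y_lag2, rnpr):
    """Equation (4): the reported best GP model (all series lagged two years)."""
    pty = p_lag2 / y_lag2        # PTY: price-to-city-income ratio
    pti = p_lag2 / pci_lag2      # PTI: price-to-neighborhood-PCI ratio
    return (pti * np.cos((ap_lag2 / mr_lag2) / pty)
            + rnpr
            + pti * np.cos(ap_lag2 / pty)
            + 2 * pti * np.cos(rnpr)
            + 2 * pti * np.cos(pty)
            + pti * np.sin(pty)
            + np.cos(p_lag2)
            + p_lag2
            + pti * np.sin(p_lag2 + ap_lag2 / pty)
            + ap_lag2 / mr_lag2)
```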
b) Neural Networks:
Neural networks (NN) are a more established computational technique than GP. The literature on NN is huge. Principe et al. (2000), among many others, provide a complete description of
how NN can be used in forecasting. Two structures are commonly used in constructing networks:
multilayer perceptrons (MLP) and generalized feedforward networks (GFF). MLP is a layered
feedforward network that learns nonlinear function mappings using nonlinear activation functions. It is
typically trained with static back propagation and requires differentiable, continuous nonlinear activation
functions such as hyperbolic tangent or sigmoid. Input data are presented to the network that learns to
predict. Although it trains slowly and requires large samples with which to train, it is easy to use and
approximates well. GFF is a generalization of MLP with connections that jump over one or more layers.
It is also trained with static backpropagation.
To obtain the best forecast of the neighborhood median prices, both MLP and GFF were attempted employing the same set of spatiotemporal data used by GP and WLS. First, base MLP and GFF configurations were selected as a starting point to identify the suitable network structure to use. Both start with one hidden layer, use the hyperbolic tangent transfer function, employ a learning momentum of 0.70, and train for only 100 epochs. These parameters were then changed one at a time until the best ex post forecasting network was identified. One, two, and three hidden layers were tested. Transfer functions tested under each scenario were the hyperbolic tangent and the sigmoid. The better networks were then trained using learning rules with momentum set first at 0.7 and then at 0.9. Testing of each configuration started with 100 training epochs and increased in increments of 100 until the best network was identified. The criterion used to terminate training was MSE, to be consistent with GP. Unfortunately, neural networks can overfit
when training goes on for too long. The final outcome selected is therefore one that minimizes both
training MSE as well as the impact of overfitting. The forecast of neural networks reported below was
produced using NeuroSolutions software (2002).
The final network structure selected was MLP with one hidden layer. It is depicted in Figure 4. Only
ten variables were used after experimenting with all of the available ones. Use of the twelve DVPCI(i)t-2
produced very poor forecasts; they had to be removed from the set of input variables used. The transfer
function in the selected network was hyperbolic tangent, its learning momentum = 0.90, and 1700
training epochs were used. The best NN configuration produced a better fit of P(i)t training values than
the selected GP model. Table 2 contains comparative statistics on estimation and training results using
WLS, GP, and NN. Figure 5 confirms NN’s outstanding ability in reproducing history when compared
with WLS and GP. The figure depicts the 67 observations representing the six years (1995-2000), where
observations 1-12 belong to 1995, 13-24 belong to 1996, and so on. Given these results, one would expect NN to produce the better out-of-sample (2001 and 2002) forecasts. This is not the case, as demonstrated next.
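The selected configuration was produced in NeuroSolutions, but an analogous network can be sketched with scikit-learn; the hidden-layer width and learning rate below are assumptions, as the paper does not report them:

```python
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# One hidden layer, tanh activation, SGD with momentum 0.9, 1700 epochs.
mlp = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(10,), activation="tanh", solver="sgd",
                 momentum=0.9, max_iter=1700, learning_rate_init=0.01,
                 random_state=0),
)
mlp.fit(X_fit, y_fit)            # 1995-2000 training observations
p_hat = mlp.predict(X_test)      # ex post forecasts for 2001-2002
```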
V. FORECASTING
Although NN delivered best fit in reproducing historical data used in training (1995-2000), forecasts
by the WLS model were better than those by NN, and forecasts by GP were better than those by WLS.
Table 3 contains a comparison of the forecast statistics. The Theil’s U-statistic reported in the table is a
measure of forecast performance. It is known as Theil’s inequality coefficient and is defined as:
U = √MSE / [√(k-1 Σ P̂(i)j2) + √(k-1 Σ P(i)j2)]        (5)

where the sums run over j = 1, 2, …, k (with k = 21 observations representing the forecasted 2001 and 2002 values), and P̂(i)j are forecast values of P(i)j. This statistic always falls between zero and one, where zero indicates a perfect fit (Pindyck and Rubinfeld, 1998, p. 387). NPMSE (= PMSE / variance(P(i))) is the normalized prediction MSE.
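Both forecast statistics are straightforward to compute; the following sketch is consistent with equation (5) and the NPMSE definition above:

```python
import numpy as np

def theil_u(actual, forecast):
    """Theil's inequality coefficient, equation (5); zero means a perfect fit."""
    mse = np.mean((actual - forecast) ** 2)
    return float(np.sqrt(mse) /
                 (np.sqrt(np.mean(forecast ** 2)) + np.sqrt(np.mean(actual ** 2))))

def npmse(actual, forecast):
    """Prediction MSE normalized by the variance of the actual series."""
    return float(np.mean((actual - forecast) ** 2) / np.var(actual))
```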
Figure 6 provides a visual comparison between actual and ex post predicted prices over the forecast period (2001-2002) as well as ex ante forecasts for 2003-2004 delivered by WLS, GP, and NN, respectively. Two observations are easily deduced from the comparison in Table 3 and Figure 6. First,
over the ex post forecast period, GP clearly delivers the best forecast. Second, and more importantly, the
ex ante forecast GP delivers seems more logical than that of WLS or NN. NN has underestimated prices
in areas where homes are relatively more expensive.
VI. CONCLUSION
This paper explored computational forecasting of neighborhood median residential housing prices
utilizing spatiotemporal data with GP and NN and compared their efficacy with those of a statistical
method. Explanatory variables employed included those that account for the impact of (a) temporal
variations in income and mortgage rate and (b) location differences on neighborhood residential housing
prices. Such a combination of variables prevents the use of OLS because some of the temporal variables are
highly collinear and location variables are spatially autocorrelated. Further, the homoscedasticity null was
rejected when the data was tested. Under these conditions, the two nonparametric computational
techniques GP and NN were employed in modeling and forecasting prices because these algorithms are
robust against many statistical problems. Weighted least squares (WLS) specification, applicable under
conditions of heteroscedasticity, was also used (instead of OLS) to forecast prices as a comparison base.
Over the historical or training period, NN outperformed the other two techniques. For the out-of-sample ex post forecast, GP outperformed the other two. GP's superior ex post forecast suggests that, among the three, its ex ante forecast is possibly the most reliable; it was also the most logically acceptable one.
Three ways to account for location variables in models were implemented. The first way utilized
longitude and latitude as explanatory variables in WLS. (These are standard in many prior studies.) The
second way utilized dummy variables weighted by one of the variables that vary across neighborhoods
(per capita income). These weighted dummy variables helped account for the impact of income change by
neighborhood on its median housing prices relative to other neighborhoods in the WLS model. The third
way utilized relative ranking of neighborhood prices. It was one of the variables used both by GP and NN.
When constructing the three models, emphasis was on obtaining predictions useful in decision
making. Two-year-ahead forecasts were produced without relying on any forecasted values of explanatory
variables. This was possible by setting a lag-length equal to the desired number of periods to predict.
Use of GP, weighted dummy variables, and extended lag-lengths are all new in modeling and forecasting residential home prices. Further research and experimentation with all three are clearly warranted.
Linking the type of spatiotemporal model presented here with standard hedonic models also warrants
further investigation.
REFERENCES
Anselin, L. (1998) "Exploratory spatial data analysis in a geocomputational environment," in M. Fisher, H. Scholten, and D. Unwin (eds) Geocomputation: A Primer. Chichester: Wiley.
Anselin, L. (1999) “The future of spatial analysis in the social sciences,” Geographic Information
Sciences, 5, 67-76.
Anselin, L., Bera, A., Florax, R., and Yoon, M. (1996) “Simple diagnostic tests for spatial dependence,”
Regional Science and Urban Economics, 26, 77-104.
Banzhaf, W., Nordin, P., Keller, R., and Francone, F. (1998) Genetic Programming: An Introduction.
San Francisco: Morgan Kaufmann Publishers, Inc.
Bin, O. (2004) “A prediction comparison of housing sales prices by parametric versus semi-parametric
regressions,” Journal of Housing Economics, 13, 68-84.
Cambridge Community Development Department Community Planning Division (2003) “Cambridge
demographic, socioeconomic & real estate market information”, http://www.ci.cambridge.ma.us/
~CDD/data/index.html.
Chellapilla, K. (1997) “Evolving computer programs without subtree crossover,” IEEE Transactions on
Evolutionary Computation, 1, 209-216.
Chen, S. and Smith, S. (1999) "Introducing a new advantage of crossover: Commonality-based selection," in Wolfgang Banzhaf, Jason Daida, Agoston E. Eiben, Max H. Garzon, Vasant Honavar, Mark Jakiela, and Robert E. Smith, editors, Proceedings of the Genetic and Evolutionary Computation Conference, volume 1, Orlando, FL. San Francisco: Morgan Kaufmann, 13-17.
Clapp, J. (2004) “A semiparametric method for estimating local house price indices,” Real Estate
Economics, 32, 127-160.
Clapp, J., Kim, H., and Gelfand, A. (2002) “Predicting spatial patterns of house prices using LPR and
Bayesian smoothing,” Real Estate Economics, 30, 505-532.
Fotheringham, S., Brunsdon, C., and Charlton, M. (2002) Geographically Weighted Regression: The
Analysis of Spatially varying Relationships. West Essex, England: John Wiley and Sons.
Fuchs, M. (1999) “Large populations are not always the best choice in genetic programming,” in
Wolfgang Banzhaf, Jason Daida, Agoston E. Eiben, Max H. Garzon, Vasant Honavar, Mark Jakiela,
and Robert E. Smith, editors, Proceedings of the Genetic and Evolutionary Computation Conference,
volume 2, San Francisco: Morgan Kaufmann, 1033-1038.
Gathercole, C. and Ross, P. (1997) "Small populations over many generations can beat large populations over
few generations in genetic programming,” in Genetic Programming 1997: Proceedings of the Second
Annual Conference, J. Koza, K. Deb, M. Dorigo, D. Fogel, M. Garzon, H. Iba, and R. Riolo, editors,
San Francisco: Morgan Kaufmann, 111-118.
Gençay, R. and Yang, X. (1996) “A forecast comparison of residential housing prices by parametric
versus semiparametric conditional mean estimators,” Economics Letters, 52, 129-135.
Getis, A. and Ord, K. (1992) “The analysis of spatial autocorrelation by use of distance statistics,”
Geographical Analysis, 24, 189-206.
Goldfeld, S. and Quandt, R. (1965) “Some tests for homoscedasticity,” Journal of The American
Statistical Association, 60, 539-547.
Goodman, A. and Thibodeau, T. (2003) “Housing market segmentation and hedonic prediction
accuracy,” Journal of Housing Economics, 12, 181-201.
Gopal, S. and Fischer, M. (1996) “Learning in single hidden layer feedforward neural network models:
Backpropagation in a spatial interaction modeling context,” Geographical Analysis, 28, 38-55.
Kaboudan, M. (2003) “TSGP: A Time Series Genetic Programming Software,” http://newton.uor.edu/
facultyfolder/mahmoud_kaboudan/tsgp.
Koza, J. (1992), Genetic Programming, Cambridge, MA: The MIT Press.
Lo, A. and MacKinlay, C. (1999). A Non-Random Walk Down Wall Street, Princeton, NJ: Princeton
University Press.
Longley, P., and Batty, M. (1996) Spatial Analysis: Modelling in a GIS Environment, GeoInformation
International, Cambridge, UK.
López, C., Álvarez, A., and Hernández-García, E. (2000) “Forecasting confined spatiotemporal chaos
with genetic algorithms," Physical Review Letters, 85, 2300-2303.
Maros-Nikolaus, P. and Martin-González, J. (2002) “Spatial forecasting: Detecting determinism from
single snapshots,” International Journal of Bifurcation and Chaos in Applied Sciences and
Engineering, 12, 369-376.
Mason, C. and Quigley, J. (1996) “Non-parametric hedonic housing prices,” Housing Studies, 11, 373-
385.
NeuroSolutions™ (2002) The Neural Network Simulation Environment, Version 3, NeuroDimensions,
Inc., Gainesville, FL.
O'Reilly, U. (1999) "Foundations of genetic programming: Effective population size, linking and mixing," GECCO'99 Workshop, April 1999.
Pindyck, R. and Rubinfeld, D. (1998) Econometric Models and Economic Forecasting. Boston: Irwin
McGraw-Hill.
Principe, J., Euliano, N., and Lefebvre, C. (2000). Neural and Adaptive Systems: Fundamentals Through
Simulations, New York: John Wiley & Sons, Inc.
Rossini, P. (2000) “Using expert systems and artificial intelligence for real estate forecasting,” a paper
presented at the Sixth Annual Pacific-Rim Real Estate Society Conference, Sydney, Australia, 24-27
January 2000, http://business2.unisa.edu.au/prres/Proceedings/Proceedings2000/P6A2.pdf.
Rubin, D. (1992) “Use of forecasting signatures to help distinguish periodicity, randomness, and chaos in
ripples and other spatial patterns,” Chaos, 2, 525-535.
Table 1. Weighted least squares regression estimated coefficients

Variable         Coeff.    Std. Error   t-Stat    p-value
APt-2             0.959      0.406       2.362     0.022
MRt-2           -27.484     13.987      -1.965     0.055
DVPCI(1)t-2      23.420      9.407       2.490     0.016
DVPCI(3)t-2      31.801     12.181       2.611     0.012
DVPCI(4)t-2      21.647      7.971       2.716     0.009
DVPCI(5)t-2      11.882      4.079       2.913     0.005
DVPCI(6)t-2      10.521      2.869       3.667     0.001
DVPCI(7)t-2      27.662      9.568       2.891     0.006
DVPCI(8)t-2      19.833      4.748       4.177     0.000
DVPCI(9)t-2       9.279      2.371       3.913     0.000
DVPCI(10)t-2      6.587      1.608       4.097     0.000
DVPCI(11)t-2      6.764      3.114       2.172     0.034
DVPCI(12)t-2     13.212      4.775       2.767     0.008
DVPCI(13)t-2     15.332      4.821       3.180     0.002
Latlon           -2.650      1.261      -2.102     0.040

R2 = 0.76    DW = 1.76    MSE = 1061.23
Table 2. Comparative estimation and training statistics

Statistic    WLS        GP         NN
R2           0.76       0.83       0.96
MSE          1061.23    1995.98    463.51
MAPE         0.130      0.212      0.091
Table 3. Comparative forecast statistics

Statistic    WLS       GP       NN
Theil's U    0.081     0.044    0.097
MAPE         0.153     0.079    0.156
PMSE         3331.6    981.9    4425.26
NPMSE        0.170     0.079    0.225
Figure 1. City of Cambridge: Neighborhood Boundaries with Major City Streets.
(http://www.ci.cambridge.ma.us/~CDD/commplan/neighplan/pdfmaps/neighcitymap.html.)
Figure 2. City of Cambridge: Census tracts.
(http://www.ci.cambridge.ma.us/~CDD/data/maps/1990_census_tract_map.html.)
Figure 3. Process depicting evolution of equations to select a best-fit one using GP.
Figure 4. MLP network used to generate the final forecast.
Figure 5. Comparative historical estimation and training results of WLS, GP, and NN. (Three line plots of P against the 67 training observations: Fig. 5.a Actual vs. WLS, Fig. 5.b Actual vs. GP, Fig. 5.c Actual vs. NN.) Fig. 5.c confirms NN's fitting ability. The worst fit among the three is Fig. 5.b, produced using GP.

Figure 6. WLS, GP, and NN forecast comparison. (Three line plots of P against 43 forecast observations: Fig. 6.a Actual, WLS ex post, WLS ex ante; Fig. 6.b Actual, GP ex post, GP ex ante; Fig. 6.c Actual, NN ex post, NN ex ante.) The first 21 observations are ex post forecasts while the later 22 observations are ex ante forecasts. GP's superiority is evident and most logical.