Latin Hypercube Sampling and the Identification of
the Foreclosure Contagion Threshold
by
Marshall Gangel
Virginia Modeling, Analysis, and Simulation Center (VMASC)
Old Dominion University
Norfolk, VA 23529
Michael J. Seiler*
Professor and Robert M. Stanton Chair of Real Estate and Economic Development
Old Dominion University
2154 Constant Hall
Norfolk, VA 23529-0223
757.683.3505 phone
757.683.3258 fax
and
Andrew J. Collins
Virginia Modeling, Analysis, and Simulation Center (VMASC)
Old Dominion University
Norfolk, VA 23529
Published in the Journal of Behavioral Finance
May 31, 2011
* Contact author
Latin Hypercube Sampling and the Identification of
the Foreclosure Contagion Threshold
Over the last several years, the U.S. economy has experienced a significant recession brought on
by the collapse of the residential real estate market. During this downturn, the number of real
estate foreclosures has risen drastically. Recent studies have empirically demonstrated a
reduction in real estate values due to neighboring foreclosures, termed the foreclosure contagion
effect. The foreclosure contagion effect impacts healthy neighboring properties that surround the
foreclosed property as a function of both time and distance.
We mathematically specify a precise equation that identifies the foreclosure contagion
threshold – the boundary that separates surviving markets from those that crash. Then, using a
technique new to our field known as Latin Hypercube Sampling, we present the results of a
large-scale sensitivity analysis and find that, beyond the foreclosure discount and disposition
time variables, the percentage of adjustable rate mortgages (ARMs) and the foreclosure distance
discount weight are the two next most significant contributors to a market collapse.
Keywords: emergent behavior, foreclosure contagion threshold, agent-based models, Latin
hypercube sampling.
Latin Hypercube Sampling and the Identification of
the Foreclosure Contagion Threshold
Introduction
The real estate market has a significant role in the nation's financial system, as evidenced during
the ongoing financial crisis. Lending practices allowed high risk individuals to obtain mortgages
they could not afford. The result was a surge in foreclosures causing great instability within the
overall financial system which quickly turned global. While a handful of studies have measured
the magnitude of the foreclosure contagion effect, only Gangel, Seiler, and Collins (2011) have
used Agent-based Modeling and Simulation (ABMS) to measure how mortgage foreclosure
contagion can impact an overall marketplace. While their study focused on two key inputs,
foreclosure discount and disposition time, they failed to mathematically specify the contagion
threshold between surviving and failing markets. Without this equation, it is difficult for
policymakers to identify points where it is necessary to intervene and restore financial stability in
the system.
The second purpose of this study is to perform a sensitivity analysis on the numerous remaining
variables included in their model. Specifically, this study seeks to identify, through the use of
Latin Hypercube Sampling (LHS) – a technique never before used in finance or real estate – the
sensitivity of additional variables on the potential collapse of the real estate market, and thus on
the overall financial markets.
In addition to successfully specifying the key equation to identify the foreclosure contagion
threshold, we are also able to identify the variables “foreclosure discount distance weight” (the
weight given to the foreclosed property based on its proximity to the subject property) and “loan
type” (adjustable rate versus fixed rate mortgages) as the next most significant contributors to a
market collapse. This finding is important because if policymakers are striving to lower the
negative impact of foreclosures, they should begin by streamlining the foreclosure process. In a
preventative sense moving forward, policymakers should also be wary of the percentage of
ARMs present in the market as an indicator of imminent collapse. These two variables are in the
direct control of policymakers, whereas foreclosure discount and foreclosure distance discount
weight are not.
Literature Review
Foreclosures within the real estate market occur when the borrower can no longer fulfill the
mortgage contract and eventually defaults. A legal process then begins which allows the creditor,
typically a bank, to gain possession of the property and then sell it to a third party. The money
received from the sale is applied to the remaining balance on the original loan. The foreclosure
process is extremely detrimental for all entities involved. Foreclosures directly cause financial
loss, personal credit decline, and property devaluation. Until recently, the most negative effects
of foreclosure were assumed to solely impact the parties directly involved with the foreclosed
properties. Although foreclosures were perceived to cause some devaluation to nearby
properties, research to define and quantify the contagion wasn't pursued until recently
(Immergluck and Smith, 2006).
Recent research has demonstrated that the negative effects of foreclosure are not felt solely by
the involved entities, but are also experienced by real estate properties that are located within
proximity to foreclosure events. This externalization of the negative consequences of foreclosure
events is known as the contagion effect. The effect is contagious because foreclosures are shown
to decrease neighboring property values which may lead to additional foreclosures. The negative
effects of foreclosed properties also spill over as a result of increased crime (Immergluck and
Smith, 2006). Harding, Rosenblatt, and Yao (2009) state the contagion effect is caused by
several factors. First, foreclosed properties are eventually listed for sale along with the other
properties that are listed in the traditional fashion. Therefore, foreclosures add to the supply of
properties that are contending for buyers, resulting in excess supply and thus lower property
values. The second factor involves the typical loss of value due to neglect, vandalism, and
abandonment, which can be observed from visual inspection of the premises of the foreclosure
property and its neighbors.
(insert Exhibit 1 here)
The datasets used in extant research are geographically focused within a single Metropolitan
Statistical Area (MSA). Exhibit 1 displays the MSAs, time period examined, and findings of the
major studies in foreclosure contagion research. Each study chose a different MSA for different
reasons. Rogers and Winter (2009) selected St. Louis because they felt its real estate market and
foreclosure activity were minimally impacted by the country's housing bubble. They support
this argument by comparing the rise of property values in St. Louis against other areas of the
country. They observed a smaller increase in value, which indicates St. Louis property values
are driven by local economic factors rather than the drastic appreciation that most American
cities experienced. Intuitively, a dataset with as few external factors as possible will produce
results with higher confidence.
Lin, Rosenblatt, and Yao (2009) use data from Chicago for two primary reasons. First, Chicago
was used as a dataset in earlier studies. This promotes the comparison of their results with the
results of earlier research efforts. Second, they chose the city of Chicago because it possessed
one of the highest foreclosure rates in the country. In general, a dataset that contains a greater
number of occurrences of the phenomena to be studied should provide more thorough results
than a dataset with fewer occurrences.
Harding, Rosenblatt, and Yao (2009) select their multiple geographic datasets by reviewing the
accuracy of the data. Their initial dataset consisted of the entire nation. They reduced their
dataset to 37 MSAs by selecting the zip codes that contained the data with the highest integrity.
Finally, they reduced their dataset to seven MSAs. The final seven MSAs were the MSAs that
the authors felt could best be controlled for local market conditions.
Immergluck and Smith (2006) and Schuetz, Been, and Ellen (2008) do not directly state why
they chose Chicago and New York, respectively, as their sources of data. Immergluck and Smith
(2006) may have chosen Chicago for reasons similar to Lin, Rosenblatt, and Yao (2009).
Schuetz, Been, and Ellen (2008) may have chosen New York because of their familiarity with
the data due to their geographic collocation with the city.
The literature hypothesizes that the negative contagion effect of foreclosures is a function of time
and distance from a foreclosed property. Immergluck and Smith (2006) was the pioneering study in
quantifying the contagion effect of foreclosures. Their research only studied the contagion effect
of foreclosures as a function of distance. This negative effect decreases as the distance between
the foreclosure and the neighboring properties increases. The research following this initial
study has included foreclosure time along with distance. The foreclosure time theory
hypothesizes a foreclosure event will have the greatest negative contagion effect on the value of
properties that immediately surround the foreclosure, at the point in time when the foreclosure
initially occurs. With the exception of Immergluck and Smith (2006), the literature hypothesizes
that the negative contagion effect of a foreclosure event on its neighboring properties' values will
decrease as time and distance from the foreclosure event increase.
When measuring the strength of the foreclosure contagion effect, the Lin, Rosenblatt, and Yao
(2009) results demonstrate a significantly higher value than the other authors. Rogers and
Winter (2009) comment on this discrepancy, but are not able to explain it. Such a discrepancy
indicates that the contagion effect of foreclosures could change based on variables not examined
by the current research or that the contagion effect is not fully understood. Schuetz, Been, and
Ellen (2008) find negative coefficients, but they do not convert these figures into percentages.
ABM Background
Agent-based Modeling and Simulation (ABMS) is a relatively new technique in the world of
quantitative analysis. ABMS has been developed over the last couple of decades to study
problems that can be described as Complex Adaptive Systems (CAS). CAS are typically
composed of individual entities that follow a set of rules. The aggregate actions of the entities
directly impact the state of a system. The behavior of the system, which is determined by the
complex aggregation of many entities, is referred to as emergent behavior. The goal of ABMS is
to understand how entity or agent level actions impact the emergent behavior of the entire system
in which the agents exist (North and Macal, 2007; Miller and Page, 2007).
ABMS provides a unique approach that differs from the traditional analytical techniques.
Techniques such as system dynamics make assumptions to represent collective effects of a
system or subsystem. The resulting system dynamics model is usually constructed as a set of
assumptions that interact dynamically over a given amount of time. Although the ABMS
modeling process includes its own assumptions, the strength of this technique is its capability to
specify agent-level behaviors and rules. This bottom-up modeling approach enables ABMS
to explore system level emergent behavior through the implementation of individual agent rules
and behaviors (Gilbert, 2008).
Agents are the key element to ABMS. An agent is defined as a decision-maker within the
model. Agents typically follow a predefined list of rules or behaviors that is determined by the
individuals constructing the model during development. The agent behaviors are usually
dependent on the environment in which the agent resides (North and Macal, 2007).
One of the first observations of Complex Adaptive Behavior was made by Adam Smith (1776) in
his writing “The Wealth of Nations”. Smith noted that individuals in a society act with self-
centered motives without considering the impact of their decisions on society. The collective
decisions of all the agents create global actions that benefit society but cannot be traced to a
single agent's decision. He referred to this principle as the Invisible Hand. Smith's Invisible
Hand also illustrates the theme of emergence. Emergent behavior is the global behavior of a
system which is determined by the aggregate actions of many agents. Some actions of some
agents may have a greater impact on the emergent behavior than other actions or agents, but a
single agent cannot manipulate the outcome of the system (North and Macal, 2007).
Although emergence and CAS were first observed hundreds of years ago, ABMS techniques
were recently developed. The recent emergence of ABMS is due to two factors: an increased
need for the technique and the capability to build ABMS. The world has always contained
complexity, but the need to understand complex systems has increased drastically in recent
decades. The fields of social sciences, biology, and business are three fields that contain
substantial complexities. Subsequently, these fields are implementing ABMS to analyze
complex systems (North and Macal, 2007). Secondly, ABMS development was possible due to
the advances in computational techniques in the field of computer science. Techniques such as
object oriented programming have enabled analysts to simulate complex systems. ABMS would
not be possible without the combination of complex problems and computational techniques
(North and Macal, 2007). ABMS does not replace or supersede other analytical techniques;
instead, it complements the robust set of tools that are used by researchers. ABMS should
continue to build its reputation as an analytic technique as individuals demonstrate how it can be
utilized to study the complex world in which we live.
Latin Hypercube Sampling (LHS) Methodology
When performing system wide sensitivity analysis, one must consider each combination of the
sampled input variable values. However, this could result in a large number of combinations.
For example, if there were six input variables under consideration and each was sampled for 50
values, then there would be approximately 16 billion combinations to consider. In a stochastic
simulation, each input variable combination would need to be run several times for statistical
significance, thus the total number of runs required to achieve this analysis would be impractical.
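The scale of the problem is easy to verify with a short calculation (the replication count of 30 used later in this study is assumed here for illustration):

```python
variables = 6        # input variables under consideration
levels = 50          # sampled values per variable
replications = 30    # stochastic repetitions per configuration

# a full factorial design needs every combination of sampled values
full_factorial = levels ** variables
print(full_factorial)  # 15625000000, i.e., roughly 16 billion combinations

# LHS instead needs only one run per sample row, plus replications
lhs_total_runs = levels * replications
print(lhs_total_runs)  # 1500 simulation runs in total
```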
Latin Hypercube Sampling (LHS) offers a way to look at the total variation of input variables
without having to conduct a large number of runs. By sampling the input variables in a certain
way, LHS produces results similar to looking at all combinations of input variables, but with only
a fraction of the runs. The process of LHS was developed by McKay, Beckman, & Conover
(1979). LHS was originally designed for use within the sensitivity analysis of computer
simulations and was developed around the same time that Agent-based modeling was first being
developed as a subject (Schelling, 1969). Though both methods were developed during a similar
time-frame, they have rarely been applied together. The advantage of the LHS approach to
sampling is that similar results1 are achieved as those from a simple random sample, but LHS
requires a much smaller sample size to achieve this. Given that each sample run requires
additional time and computing power, this is a very practical benefit. The method of using LHS
described below assumes the input variables have a continuous distribution, but the approach can
easily be adapted to the discrete distribution case as well.
Given that k input variables are of interest and N samples are required, LHS is achieved as
follows. Each input variable X is given a probability density function f(X) and a domain
[x_min, x_max]. This domain is split into N equiprobable disjoint segments, each of which has
its own minimum x^i_min and maximum x^i_max, thus:

∫_{x^i_min}^{x^i_max} f(x) dx = 1/N,  i = 1, 2, ..., N

1 By similar results, we mean that both methods produce a similar distribution of the output variables.
The values of x^i_min and x^i_max can be determined iteratively, starting from x^1_min, using
the following formulae:

x^1_min = inf{ x : F(x) > 0 }
x^i_max = F^{-1}( F(x^i_min) + 1/N ),  i = 1, 2, ..., N
x^{i+1}_min = x^i_max,  i = 1, 2, ..., N − 1

where F is the cumulative distribution function of X.
As cumulative distribution functions are used in this iteration, it only needs to be slightly adapted
for use with discrete variables. Given any i, a new random variable X^i can be defined with a
probability density function (PDF) of:

f_i(x) = N · f(x)  if x ∈ [x^i_min, x^i_max]
f_i(x) = 0         otherwise
This is a PDF as:

∫ f_i(x) dx = 1
For each input variable X, a single sample x^i is taken from each of the N new random
variables X^i. To determine the tuples of input variables for the simulation runs, a single sample
is randomly selected from the set {x^1, x^2, ..., x^N} for each of the input variables and a tuple
of input variable values is formed. This process is repeated N times without replacement to
generate the complete set of samples.
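Under the uniform-distribution assumption used for most variables later in the paper, the whole procedure reduces to a short routine. The sketch below (function and variable names are our own) stratifies each variable's domain into N equiprobable segments, draws one sample per segment, and pairs the samples across variables at random without replacement:

```python
import random

def latin_hypercube(n_samples, domains, seed=0):
    """Draw n_samples LHS tuples; domains is a list of (lo, hi) pairs,
    one per input variable, each assumed uniformly distributed."""
    rng = random.Random(seed)
    columns = []
    for lo, hi in domains:
        width = (hi - lo) / n_samples
        # one uniform draw from each of the N equiprobable strata
        strata = [lo + (i + rng.random()) * width for i in range(n_samples)]
        # shuffling pairs the strata across variables without replacement
        rng.shuffle(strata)
        columns.append(strata)
    # transpose: each tuple takes one sample from every variable
    return list(zip(*columns))

# e.g., disposition time in [1, 14] months, foreclosure discount in [0, 0.15]
samples = latin_hypercube(50, [(1, 14), (0.0, 0.15)])
```

Note that every stratum of every variable is hit exactly once, which is the property that lets 50 runs stand in for the full factorial design.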
Partial Correlation Coefficients (PCC)
Given a set of N paired samples from jointly distributed random variables R = {R_i, R_j}, the
sample correlation between the random variables can be calculated using the following formula:

c_ij = Σ_{t=1}^{N} (r_it − μ_i)(r_jt − μ_j) / √[ Σ_{t=1}^{N} (r_it − μ_i)² · Σ_{t=1}^{N} (r_jt − μ_j)² ],  i, j = 1, 2, ..., K

where {r_i1, r_i2, ..., r_iN} and {r_j1, r_j2, ..., r_jN} are the samples, with means μ_i and μ_j
respectively. This type of correlation is usually called Pearson's correlation coefficient, though
its use is so widespread that it is often just called 'correlation'. When considering rank-ordered
samples, Spearman rank-order correlation is used, which is a special case of Pearson's
correlation coefficient.
Correlation is a measure of how two variables are associated. For
correlation to have any meaning, the two samples under consideration must come from the same
source or data point. For example, an input variable and output variable from a simulation can
be measured for correlation if each input variable can be paired with an output variable (i.e., both
data sample values are from the same simulation run). However, if there is some other variable
that is also varied over the sample of simulation runs, then its variation might have an impact on
the correlation of the two variables.
To exemplify this effect, consider the following sample of tuples: {1, 2, 4}, {2, 3, 8}, {3, 4, 12},
{4, 5, 16}, {5, 8, 24} and {6, 9, 30}. The first variable appears to have a positive relationship
with the second variable, but so does the third; the extent to which the first variable is related to
the second becomes difficult to determine because of the influence of the third variable.
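Computing Pearson's coefficient for this example confirms the ambiguity; a quick check, with a helper written directly from the formula above:

```python
def pearson(a, b):
    # sample correlation per the formula above
    n = len(a)
    mu_a, mu_b = sum(a) / n, sum(b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b))
    sa = sum((x - mu_a) ** 2 for x in a) ** 0.5
    sb = sum((y - mu_b) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)

v1 = [1, 2, 3, 4, 5, 6]
v2 = [2, 3, 4, 5, 8, 9]
v3 = [4, 8, 12, 16, 24, 30]

# both pairings correlate strongly (above 0.95), so the pairwise view
# alone cannot say how much of the v1-v2 relationship is due to v3
print(pearson(v1, v2) > 0.95, pearson(v1, v3) > 0.95)
```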
The influence of other variables can be removed by using a variation of the correlation method
called the partial correlation coefficient (PCC). Just as Spearman rank-order correlation is a
special case of Pearson's correlation coefficient, so too is Kendall's partial rank correlation
coefficient (PRCC) a special case of Kendall's partial tau correlation coefficient (Kendall, 1942).
PCC works by adjusting the correlation coefficient of any pair of variables by the correlation
coefficients of the other variables.
The following steps show how the PCC γ_ij is worked out given k variables (this could include
the output variables):

1. Define the symmetric matrix C := [c_ij]
2. B := [b_ij] := C^{-1} (which exists, as the lead diagonal of C is all ones)
3. γ_ij = −b_ij / √(b_ii · b_jj)
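For three variables, the recipe can be checked against the textbook closed form for a first-order partial correlation. A minimal sketch, with the 3×3 inverse written out explicitly and a hypothetical correlation matrix C (illustrative values, not data from the model):

```python
def inv3(m):
    # explicit 3x3 matrix inverse (adjugate over determinant)
    a, b, c = m[0]
    d, e, f = m[1]
    g, h, i = m[2]
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]

def partial_corr3(C):
    # steps 1-3 above: invert the correlation matrix, then rescale
    B = inv3(C)
    return [[1.0 if i == j else -B[i][j] / (B[i][i] * B[j][j]) ** 0.5
             for j in range(3)] for i in range(3)]

C = [[1.0, 0.9, 0.8],
     [0.9, 1.0, 0.9],
     [0.8, 0.9, 1.0]]
P = partial_corr3(C)
```

For this C, the partial correlation P[0][1] of the first two variables, controlling for the third, agrees with the direct formula (c12 − c13·c23) / √((1 − c13²)(1 − c23²)).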
More information about partial correlations can be found in the introductory text by Anderson
(2003) and the canonical work by Kendall and Stuart (1958).
Statistical Significance
Kendall's partial rank correlation coefficient is based on Spearman's rank correlation coefficient
(Kendall & Gibbons, 1990), a.k.a. Spearman's rho. Ironically, though the coefficient bears his
name, Maurice Kendall originally did not approve of this approach (see Kendall, 1942). Due to
the complicated nature of the distribution of Kendall's partial rank correlation coefficient, it has
either been ignored as a method or approximated (Nelson & Yang, 1988). Thus a common
hypothesis test approximation is to use the same statistics as for Spearman's rho (Kendall &
Gibbons, 1990; Blower & Dowlatabadi, 1994; Anderson, 2003; Marino et al., 2008). If the null
hypothesis is that the partial correlation is not significant (i.e., γ_ij = 0), then the following
statistic can be used:
t_ij = γ_ij · √[ (n − 2) / (1 − γ_ij²) ]

This statistic follows a Student's t-distribution with n − 2 degrees of freedom. An underlying assumption
of this approximate test is that variables are normally distributed. There are several approaches
that could be employed to remove this normality assumption, such as Fisher's transformation
(Fisher, 1915), but considering this test is already an approximation, the approach was deemed
an unnecessary complication.
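As a worked instance of the statistic (the sample value of 0.5 is chosen purely for illustration):

```python
import math

def partial_corr_t(gamma, n):
    # t statistic for H0: partial (rank) correlation equals zero,
    # evaluated against n - 2 degrees of freedom
    return gamma * math.sqrt((n - 2) / (1 - gamma ** 2))

# e.g., a partial correlation of 0.5 over the 50 LHS samples used later:
t = partial_corr_t(0.5, 50)
print(t)  # 4.0, well above the two-sided 5% critical value (~2.01 at 48 df)
```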
Results
Recall that it is not the purpose of the current investigation to build an ABM to replicate the
residential real estate market. Instead, our first goal is to mathematically specify the boundary
between healthy and crashing markets in terms of the tradeoff between foreclosure discount and
disposition time. Using the ABM of Gangel, Seiler, and Collins (2011), we then perform
LHS analysis to identify the relative strengths of additional deterministic variables not
investigated in their model.
Foreclosure Threshold Identification
(insert Exhibit 2 here)
Foreclosure discount and disposition time are the two variables selected for foreclosure threshold
analysis. Exhibit 2 demonstrates a sample of the results of the single variable analysis with a
discrete step of -0.005 for the foreclosure discount. Each line on the exhibit represents the
average of 30 simulation runs with the same foreclosure discount value. The results of the
parametric analysis of the foreclosure discount demonstrate that the foreclosure discount has an
inverse macro-level impact on average property values. This behavior can be
considered emergent because the temporary effect of the foreclosure discount of individual
foreclosures creates a lasting impact on the value of the entire market. For relatively weak
contagion effects, the market continues to gain value as time increases. But as the contagion
discount becomes stronger, the growth of the average value of the entire market decreases,
reaches a maximum, and then (in some simulations) declines. The only differences between the
runs are the computer-generated random draws from the previously discussed underlying
distributions. Once this behavior was observed at a particular foreclosure discount, it was observed
for all configurations that were executed with larger foreclosure discount values. Once the
crashing threshold was crossed, the contagion effect of foreclosures was too strong for the
market to retain its value. The market‟s foreclosures create a positive feedback loop that causes
extreme devaluation of surrounding property values which, in turn, causes additional
foreclosures. This cycle continues until the entire market loses all value.
The second variable that was examined by using parametric analysis was disposition time.
Disposition time represents the amount of time that a property remains foreclosed on the market.
Unlike the contagion effect of foreclosures, which cannot be controlled, the amount of time that
foreclosures are on the market is usually dictated by processes mandated by state-specific laws
(Seiler et al., 2011). Because the market is impacted by the amount of time foreclosures are left
to linger on the market, the contagion effect of foreclosures can be mitigated with the
introduction of new policy.
As seen in Panel B of Exhibit 2, similar to the foreclosure discount variable, the disposition time
also has a substantial impact on the average property value of the entire market. This behavior
can be considered an emergent behavior because it was not directly implemented in the model.
Foreclosures have a temporary negative impact on their neighbors‟ value while they are present.
When a foreclosed property is resold, it can no longer directly impact its neighbors‟ values.
Indirectly, however, there is a legacy or hangover effect that persists. The contagion effect of
foreclosures can have a lasting impact if a property is sold while in the presence of foreclosed
properties. Given this principle, the results from the above figure are logical. Markets with
longer disposition times are exposed to a larger number of foreclosures, on average. Therefore,
the contagion effect can have a large negative impact on the entire market.
The single variable parametric analysis was executed numerous times with different values set
for foreclosure discount and disposition time variables. In all configurations, the crashing
market phenomenon was observed, but it occurred at different combinations of the two
foreclosure variables. For example, the crashing threshold occurred at a foreclosure discount of -
5.8% when the disposition time was set at a value of 8 months. When the disposition time was
set at a value of 7 months, the crashing threshold occurred at a foreclosure discount of -6.6%.
Once the analysis of the results revealed that the disposition time and foreclosure discount
variables share an interdependent effect on the average property value metric, multiple variable
parametric analyses were initiated to understand the relationship between the two variables.
Various combinations of the two variables were then executed in an attempt to establish a
relationship between them by inspecting the impact on the average property value metric over
numerous simulation runs. Before the simulation runs could be executed, a primary variable and
benchmark needed to be selected. The crashing threshold was established as the benchmark for
this analysis. A market with an equal probability of succeeding or failing is recognizable and
constant for any configuration of the model. The primary variable would serve as a pseudo
independent variable during the multiple variable parametric analysis. Once the value of the
primary variable was set, the second variable would be adjusted until the desired benchmark, or
crashing threshold, was observed in the output data. The disposition time variable was selected
as the primary variable because it has a limited number of discrete values that it can possess,
while the foreclosure discount variable is a continuous variable with unlimited values. In theory,
a foreclosure discount value exists for each disposition time value which subsequently produces
a crashing threshold, but not vice versa.
Thousands of combinations were executed to produce enough data to establish a relationship
between the foreclosure discount and disposition time variables. The disposition time variable
was set between 1 and 14 months with a discrete step of 1 month. For each value of disposition
time, the model was executed with numerous values for the foreclosure discount variable until
the crashing threshold was found. If the simulation runs produced all successful markets, the
foreclosure discount was set to values with greater negative intensity. If the simulation runs
produced all crashing markets, the foreclosure discount was set with values with lesser negative
intensities. Once the crashing threshold was bounded, the discrete step of the foreclosure
discount was reduced until the crashing threshold result was obtained. Exhibit 3 represents a
sample of the simulation runs produced by the multiple variable parametric analyses. Each point
represents a unique combination between foreclosure discount and disposition time. The y axis
represents values of foreclosure discount, and the x axis represents values of disposition time.
Each point or combination was executed thirty times to produce enough output data to identify
the crashing threshold.
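The search procedure described above is essentially a bisection on the magnitude of the foreclosure discount for each fixed disposition time. A sketch of the logic, in which `crash_fraction` is a hypothetical deterministic stand-in for the 30 averaged simulation runs (a toy rule, not the actual agent-based model):

```python
def crash_fraction(discount, disposition_time):
    # stand-in for 30 runs of the agent-based model: a toy rule in which
    # markets crash once discount * disposition time exceeds a fixed level
    return 1.0 if discount * disposition_time > 0.45 else 0.0

def find_threshold(disposition_time, lo=0.0, hi=0.20, tol=1e-4):
    # bisect on the (absolute) foreclosure discount until the
    # crashing boundary is pinned down to within tol
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if crash_fraction(mid, disposition_time) >= 0.5:
            hi = mid   # market crashes: try a weaker discount
        else:
            lo = mid   # market survives: try a stronger discount
    return (lo + hi) / 2
```

With the toy rule, the recovered boundary at 8 months is 0.45 / 8 ≈ 5.6%, and shorter disposition times require stronger discounts to crash, mirroring the direction of the actual results reported above (−5.8% at 8 months versus −6.6% at 7 months).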
(insert Exhibit 3 and 4 here)
The multiple variable parametric analysis reveals that the relationship between the disposition
time and foreclosure discount variables follows the below equation:

(insert equation here)

Exhibit 4 graphically represents the above equation. Each point represents a unique combination
of the two variables that result in a crashing threshold where half of the simulation runs produce
a successful market and the other half produce a market crash. Any combination of the two
variables found above the curve produces an unsuccessful or crashing market, whereas
combinations below the curve result in a successful market. This emergent behavior suggests
that the contagion effect of foreclosures can be mitigated by controlling the amount of time that
foreclosures are allowed to remain on the market. Furthermore, the relationship between these
two variables may follow a mathematical function. This relationship was not programmed into
the model, but resulted from other rules and functions that were previously described.
Latin Hypercube Sampling and Partial (Rank) Correlation Coefficients
Sensitivity analysis is split into two components: the sampling of input variables and the
determination of the effects of this variation. We use the approach of Blower & Dowlatabadi
(1994) and Marino et al. (2008). This involves using LHS for the input variable sampling and
Partial Correlation Coefficients (PCC) / Partial Rank Correlation Coefficients (PRCC) to
determine their impact.
(insert Exhibit 5 here)
LHS
To properly sample the possible input variables, certain assumptions about their underlying
distributions have to be made. Exhibit 5 shows the list of variables that where considered for the
sensitivity analysis. The first two variables, disposition time and foreclosure discount, have
already been varied in previous runs. The first new input variable is the Foreclosure Distance
Discount Weight. The simulation model currently assumes that the appraised value of a property
is equally affected by the amount of time that local foreclosed properties have been foreclosed
and the distance that these properties are from the property of interest. This would correspond to
an input value of 0.5. In the sensitivity analysis, if this variable was assigned a weight of 0, then
only the amount of time that a local property has been foreclosed would have any effect.
Conversely, if the input variable is 1, then only the distance of the foreclosed properties would be
under consideration.
The input variable Foreclosure Effect Radius indicates the maximum distance at which a foreclosed
property affects the appraisal of other properties. At present, the radius sweep is 10
homes in any direction2. The Loan Type variable indicates the fraction of mortgages which are
fixed rate. Appraisal Time indicates the amount of time that passes before each property is
appraised. Currently, the simulation appraises each property at each time-step, which is one
month.
Each of the input variables was given a probability distribution from which 50 samples could be
selected. The simplest possible probability distributions were chosen for each input variable,
given the information available. Within Bayesian statistics, the uniform distribution is used
when there is no information about the underlying distribution of the variable of interest. When
possible, we use the uniform distribution in both the discrete and continuous cases.
In the case of Foreclosure Discount, the literature suggests that foreclosures could have
approximately a 1% (Immergluck & Smith, 2006) or a 10% (Rogers & Winter, 2009) effect on
the appraisals of surrounding properties. A Chi-squared distribution was chosen here because it
is the simplest continuous positive distribution that only requires one parameter, the mean, as an
input. As it was not clear which mean value to use, two sets of sensitivity analysis simulation
runs were conducted: one with a Foreclosure Discount mean of 1% and one with a mean of 10%.
2 Using a torus grid environment, this means that up to 351 homes are considered each time an appraisal is
performed.
The Poisson distribution was chosen for Foreclosure Effect Radius and Appraisal Time because
it was the simplest positive discrete distribution that only requires one parameter. In the case of
Appraisal Time, an adjusted version of the Poisson distribution was chosen to remove the chance
that a time of zero was sampled. Once each input variable was sampled 50 times, the samples
were combined to form the inputs to 50 different simulation runs. Each run was repeated 30
times due to the stochastic nature of the simulation, and the results were then averaged from
these 30 runs. Several types of analysis can now be conducted on the collected LHS
results. As with the previous results, only the final average property
price is considered for analysis.
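The sampling scheme described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the six-variable layout follows Exhibit 5, while the chi-squared parameterization (degrees of freedom set to the mean) and the shifted-Poisson "adjustment" for Appraisal Time are assumptions.

```python
import numpy as np
from scipy.stats import qmc, uniform, chi2, poisson

# One stratified [0, 1) draw per input variable; 50 LHS draws give the
# inputs for 50 simulation runs (sketch of the paper's setup, assuming
# the six variables and distributions listed in Exhibit 5).
sampler = qmc.LatinHypercube(d=6, seed=42)
u = sampler.random(n=50)

# Map each uniform stratum to its assumed distribution via the
# inverse CDF (percent-point function).
disposition_time = np.floor(u[:, 0] * 12).astype(int) + 1      # discrete uniform, 1-12 months
discount         = chi2.ppf(u[:, 1], df=0.01)                  # chi-squared; df = mean (assumed parameterization)
distance_weight  = uniform.ppf(u[:, 2])                        # continuous uniform on [0, 1]
effect_radius    = poisson.ppf(u[:, 3], mu=50).astype(int)     # Poisson, mean 50
loan_type        = uniform.ppf(u[:, 4])                        # fraction of fixed-rate mortgages
appraisal_time   = poisson.ppf(u[:, 5], mu=1).astype(int) + 1  # shifted so zero cannot occur (assumed adjustment)

inputs = np.column_stack([disposition_time, discount, distance_weight,
                          effect_radius, loan_type, appraisal_time])
print(inputs.shape)  # (50, 6): one row of input values per simulation run
```

Each of the 50 rows would then be fed to the agent-based model, with the run repeated 30 times per row.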
Simulation runs with 1% average foreclosure discount
The first set of LHS simulation results uses an average Foreclosure Discount of 1%.
With such a low discount, none of the simulations saw a property market crash. As there are no
property crashes to consider, the average property price is used as the dependent variable.
(insert Exhibit 6 here)
Regression
Exhibit 6 shows the coefficients from standard multivariate linear regression. As the table
indicates, all independent variables, except Appraisal Time, have a significant impact on the final
average property values. The Adjusted R-squared value is 0.715, which implies that
approximately 72% of the variation in average property values can be attributed to these
variables. Each of the 30 x 50 simulation runs is considered separately in this analysis, so
some of the remaining variation is expected to come from differences among the 30 repeated
runs for a given input variable set.
The coefficient from the Foreclosure Distance Discount Weight variable indicates that the
greater the effect that the distance of a foreclosure has on the appraisal of local properties, the
lower the average property price will be. The Foreclosure Distance Discount Weight represents
the relative weight of a foreclosure distance as opposed to disposition time on a local appraisal.
A foreclosed property's distance remains constant for the duration that the foreclosed property
is on the market, whereas the effect of disposition time declines within the simulation over the
time that the property is foreclosed. Thus, it is not surprising that foreclosure distance has
a more negative effect on the prices of surrounding properties, which is reflected in the
regression coefficient.
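The regression step amounts to ordinary least squares over the 30 x 50 = 1,500 run-level observations. The sketch below uses synthetic stand-in data (the coefficient and noise values are illustrative, not the paper's); only the mechanics mirror the study's analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 1500, 6                      # 30 repetitions x 50 LHS input sets; 6 inputs

# Synthetic stand-in data: average price driven mostly by the third
# input (a hypothetical distance-weight analogue) plus noise.
X = rng.random((n, k))
y = 500_000 - 526_000 * X[:, 2] + rng.normal(0, 50_000, n)

# Ordinary least squares with an intercept column.
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

resid = y - A @ beta
r2 = 1 - resid.var() / y.var()
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(beta[3], adj_r2)  # negative distance-weight coefficient; adjusted R-squared near 0.9
```

In the study, the row of `X` values comes from the LHS draws and `y` from the averaged simulation output, yielding the coefficients reported in Exhibit 6.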
Partial Correlation Coefficient
We now focus on the relative strengths of the individual additional input variables used in the model.
The LHS output can be analyzed using the Partial Rank Correlation Coefficient
(PRCC) and the Partial Correlation Coefficient (PCC) (Kendall & Gibbons, 1990).
The reason for using Partial Correlation Coefficients for a given variable, as opposed to the
standard Pearson Correlation Coefficient, is that the variation from the other input variables is removed
from the correlation value. The removal of this external variation is
important within the analysis of an LHS sample because all the variables change at once.
Exhibit 6 shows the partial correlation values of the different input variables. The partial
correlation values are on the same scale as ordinary correlation values, ranging from -1 to +1. The only
variable that has a significant non-zero correlation is Foreclosure Distance Discount Weight.
Recall that the Foreclosure Distance Discount Weight measures how much of an impact the
distance of a foreclosed property has on the pricing of the subject property relative to the impact
of recent foreclosures (time). The mechanism of the model can explain this correlation between
Foreclosure Distance Discount Weight and average housing prices. Simply put, the distance
between the subject property and a foreclosed property remains constant over time, whereas the
'recentness' of a foreclosure decreases with time and therefore has less impact.
Because we include both discrete and continuous representations of our variables, the preferred
use of a ranked or non-ranked approach to the partial correlation analysis can be debated. For
completeness, we also report the Partial Rank Correlation Coefficients. The figures are very
similar to the results from the partial correlation coefficient. The only exception is the additional
significance of the loan type variable.
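Both coefficients can be computed by regressing the input of interest and the output on the remaining inputs and correlating the residuals; rank-transforming the data first yields the PRCC. The helper below is a generic sketch of this technique with illustrative synthetic data, not the authors' code.

```python
import numpy as np
from scipy.stats import pearsonr, rankdata

def partial_corr(x, y, Z, ranked=False):
    """Partial correlation of x and y controlling for the columns of Z.
    With ranked=True the data are rank-transformed first (PRCC)."""
    if ranked:
        x, y = rankdata(x), rankdata(y)
        Z = np.column_stack([rankdata(col) for col in Z.T])
    A = np.column_stack([np.ones(len(x)), Z])
    # Residuals of x and y after removing the linear effect of Z.
    rx = x - A @ np.linalg.lstsq(A, x, rcond=None)[0]
    ry = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
    return pearsonr(rx, ry)[0]

# Illustrative check: y depends strongly (negatively) on the first input.
rng = np.random.default_rng(1)
X = rng.random((500, 3))
y = -2.0 * X[:, 0] + rng.normal(0, 0.1, 500)
print(partial_corr(X[:, 0], y, X[:, 1:]))               # strongly negative
print(partial_corr(X[:, 0], y, X[:, 1:], ranked=True))  # rank-based analogue
```

Because the residual step removes the linear influence of the other inputs, the resulting coefficient isolates the variable of interest, which is exactly why partial measures suit LHS output where all inputs vary at once.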
Simulation runs with 10% average foreclosure discount
The previous set of results contained no simulation runs where the property market actually
crashed. No crashes occurred because the distribution of each input
variable was a conservative estimate. To induce property market crashes within the simulation
results, more extreme input values are required. Foreclosure Discount was again selected as
the influencing input variable to achieve this task because the previous results showed a very
high negative deterministic relationship with average house price. The previously discussed
results were generated using an average Foreclosure Discount of 1%. To achieve the more
extreme results presented in the right half of Exhibit 6, an average foreclosure discount of 10% is
used to be consistent with the upper range found in the literature (Immergluck & Smith, 2006).
All other input variables retain their previous value ranges.
The new output from the LHS is analyzed in a similar fashion to the previous set of results. The
difference between these analyses is that a new dependent variable is used. Instead of using the
average property price as the dependent variable, a binary indicator of whether or not the market
crashed is used. If the market did not crash, the dependent variable has a value of one, and if the
market did crash, then the dependent variable takes a value of zero. As the dependent variable is
binary, logistic regression analysis is used instead of traditional linear regression.
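A minimal logistic fit can be sketched by maximizing the Bernoulli log-likelihood directly. The data-generating process below is hypothetical, chosen only so that the coefficient signs mirror those reported (negative on the distance weight, positive on the fixed-rate fraction); it is not the study's data.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n = 1500
weight = rng.random(n)   # Foreclosure Distance Discount Weight
loan   = rng.random(n)   # fraction of fixed-rate mortgages

# Hypothetical process: survival (y = 1, no crash) is less likely at
# high distance weights and more likely at high fixed-rate fractions.
logit = -1.0 - 3.0 * weight + 4.0 * loan
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

A = np.column_stack([np.ones(n), weight, loan])

def nll(beta):
    """Numerically stable negative log-likelihood of the logistic model."""
    z = A @ beta
    return np.sum(np.logaddexp(0.0, z) - y * z)

fit = minimize(nll, x0=np.zeros(3))
print(fit.x)  # [intercept, weight coefficient (< 0), loan coefficient (> 0)]
```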
The results are very similar to those of the previous analysis in terms of the significance of
the input variables. In the regression, all the independent variables are significant, whereas the
partial correlation indicates that only two are significant: foreclosure distance discount weight
and loan type. A positive correlation between loan type and the no-crash indicator means that the
greater the percentage of fixed-rate mortgages in the marketplace, the less likely the
market is to crash.
Conclusions
The purpose of this study is to identify the equation specifying the boundary or threshold
between markets that do and do not crash based on foreclosure discount and disposition time.
After a series of analyses, we were able to specify this relationship with precision. We then
introduced a methodology new to our field, Latin Hypercube Sampling, which allows a
sensitivity analysis to be conducted when the number of input combinations is too
numerous for mainstream computational techniques to handle. We find that beyond the two
major contributors, foreclosure discount and disposition time, the foreclosure distance discount
weight and loan type are the two most significant determinants of unstable real estate markets.
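The threshold relationship reported in Exhibit 4 is a power-law fit. Recovering such a fit from threshold points can be sketched with scipy's `curve_fit`; the sample points below are generated from the reported curve itself, purely to illustrate the fitting step.

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(x, a, b):
    return a * x ** b

# Points generated from the reported threshold y = 0.4737 * x**-1.001
# (x: disposition time in months; y: critical foreclosure discount).
x = np.array([1.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0])
y = 0.4737 * x ** -1.001

(a, b), _ = curve_fit(power_law, x, y, p0=[0.5, -1.0])
print(round(a, 4), round(b, 3))  # recovers approximately 0.4737 and -1.001
```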
This study is helpful to policymakers in that it gives them a better understanding of just how
much stress a real estate market can take before it collapses. Moreover, by conducting the LHS
analysis, we further provide policymakers with suggestions on where to begin when trying to
stem the tide of a mortgage foreclosure contagion.
References
Anderson, T.W., 2003, “An Introduction to Multivariate Statistical Analysis,” 3rd ed., Wiley-
Interscience.
Blower, S.M. & H. Dowlatabadi, 1994, “Sensitivity and uncertainty analysis of complex-models
of disease transmission - an HIV model, as an example,” International Statistical Review, 62(2),
229-243.
Gangel, M., M. Seiler, and A. Collins, 2011, “Exploring the Foreclosure Contagion Effect Using
Agent-Based Modeling,” Journal of Real Estate Finance and Economics, forthcoming.
Gilbert, N., 2008, “Agent-Based Models”, in Series: Quantitative Applications in the Social
Sciences, SAGE Publications.
Fisher, R.A., 1915, “Frequency Distribution of the Values of the Correlation Coefficient in
Samples from an Indefinitely Large Population,” Biometrika, 10(4), 507-521.
Harding, J., E. Rosenblatt, and V. Yao, 2009, “The Contagion Effect of Foreclosed Properties,”
Journal of Urban Economics, 66(3), 164-178.
Immergluck, D., and G. Smith, 2006, “The External Costs of Foreclosures: The Impact of
Single-Family Mortgage Foreclosures on Property Values,” Housing Policy Debate, 17(1), 57-
79.
Kendall, M., 1942, “Partial Rank Correlation,” Biometrika, 32, 277-283.
Kendall, M., & J. Gibbons, 1990, “Rank Correlation Methods,” 5th ed., Charles Griffin and
Company.
Kendall, M., & A. Stuart, 1958, “The Advanced Theory of Statistics,” Charles Griffin and
Company.
Lin, Z., E. Rosenblatt, and V. Yao, 2009, “Spillover Effects of Foreclosures on Neighborhood
Property Values,” Journal of Real Estate Finance and Economics, 38(4), 387-407.
Marino, S., I. Hogue, C. Ray, and D. Kirschner, 2008, "A Methodology for Performing Global
Uncertainty and Sensitivity Analysis in Systems Biology," Journal of Theoretical Biology,
254(1), 178-196.
McKay, M., R. Beckman, & W. Conover, 1979, “A Comparison of Three Methods for Selecting
Values of Input Variables in the Analysis of Output from a Computer Code,” Technometrics,
21(2), 239-245.
Miller, J., and S. Page, 2007, "Complex Adaptive Systems: An Introduction to Computational
Models of Social Life," Illustrated edition, Princeton University Press.
Nelson, P. & S. Yang, 1988, “Some Properties of Kendall's Partial Rank Correlation
Coefficient,” Statistics & Probability Letters, 6(3), 147-150.
North, M., and C. Macal, 2007, “Managing Business Complexity: Discovering Strategic
Solutions with Agent-Based Modeling and Simulation”, Oxford University Press.
Rogers, W., and W. Winter, 2009, “The Impact of Foreclosures on Neighboring Housing Sales,”
Journal of Real Estate Research, 31(4), 455-479.
Schelling, T., 1969, “Models of Segregation,” American Economic Review, 59(2), 488-493.
Schuetz, J., V. Been, and I. Ellen, 2008, “Neighborhood Effects of Concentrated Mortgage
Foreclosures,” Journal of Housing Economics, 17(4), 306-319.
Seiler, M., V. Seiler, M. Lane, and D. Harrison, 2011, "Fear, Shame, and Guilt: Economic and
Behavioral Motivations for Strategic Default," working paper, Old Dominion University.
Smith, A., 1776, “The Wealth of Nations,” later published as “An Inquiry into the Nature and
Causes of the Wealth of Nations,” Edwin Cannan, ed., 1904. Library of Economics and Liberty.
Exhibit 1: Summary of the Foreclosure Contagion Literature
Author | Location | Date Range Studied | Distance Bins | Time Bins | Max Contagion (%)
Rogers and Winter | St. Louis, Missouri | 1998-2007 | 0-200, 300-400, and 500-600 yards | 0-6, 7-12, 13-18, and 19-24 months | -1.4
Lin, Rosenblatt, and Yao | Chicago, Illinois | 1990-2006 | 0-20 km with 25 bins | 0-2, 3-5, and 6-10 years | -8.7
Harding, Rosenblatt, and Yao | Atlanta, Georgia; Charlotte, North Carolina; Columbus, Ohio; Las Vegas, Nevada; Los Angeles, California; Memphis, Tennessee; St. Louis, Missouri | 1990-2007 | 0-300, 300-500, 500-1000, and 1000-2000 feet | 0-2 years with 13 bins | -1
Immergluck and Smith | Chicago, Illinois | 1997-1999 | 0-0.125 and 0.125-0.25 miles | 0-2 years | -0.09
Schuetz, Been, and Ellen | New York, New York | 2000-2005 | 0-250 and 250-1000 feet | 0-18 and 18+ months | NA
Exhibit 2. Disposition Time and Foreclosure Discount versus Home Prices

Panel A: Relationship between Foreclosure Discount and Home Prices over Time
[Line chart: average property value ($0 to $1,400,000) over months 1 to 973, with one series per foreclosure discount from -0.045 to -0.10 in increments of 0.005.]

Panel B. Relationship between Disposition Time and Home Prices
[Line chart titled "Example of Parametric Analysis of Foreclosure Time": average property value ($0 to $1,200,000) over months 1 to 961, with one series per disposition time from 3 to 12 months.]
Exhibit 3. Defining the Threshold Line between Foreclosure Discount and Disposition Time
[Chart: foreclosure discount (0 to 0.14) plotted against disposition time (0 to 14 months), tracing the boundary separating surviving markets from crashing markets.]
Exhibit 4. Specifying the Equation of the Foreclosure Contagion Threshold
[Chart: foreclosure discount (0 to 0.6) plotted against disposition time (0 to 16 months), with the fitted threshold curve y = 0.4737x^-1.001.]
Exhibit 5: Assumed Distributions Used for the Latin Hypercube Sampling

Variable | Distribution | Parameters
Disposition Time | Uniform (discrete) | 1-12 (months)
Foreclosure Discount | Chi-squared | Mean: 1% and 10%
Foreclosure Distance Discount Weight | Uniform (continuous) | 0-1
Foreclosure Effect Radius | Poisson | Mean: 50 (10 houses at size 5)
Loan Type | Uniform (continuous) | 0-1 (fraction of FRM)
Appraisal Time | Poisson (adjusted) | Mean: 1 (month)
Exhibit 6: Regression, Partial Correlation Coefficients, and Partial Rank Correlation Coefficients for Foreclosure Discounts of 1% and 10%

The dependent variable in the "Foreclosure Discount = 1%" regression is the average ending property value. The dependent variable in the "Foreclosure Discount = 10%" regression is a dummy variable equal to 1 if the market did not crash and equal to 0 if the market did crash. Disposition Time is the number of months the property is allowed to linger on the market. Foreclosure Discount is the amount by which the property value decreases due to foreclosure. Foreclosure Distance Discount Weight refers to the relationship between how close the neighboring foreclosure is to the subject property and how great its negative impact is. Foreclosure Effect Radius indicates the maximum distance that a foreclosed property can be away from the subject property and still have an impact. Loan Type indicates the fraction of mortgages that are fixed rate. Appraisal Time indicates the amount of time that passes before each property is appraised.

Foreclosure Discount = 1%
Variable | Traditional Regression | PCC | PRCC
Intercept | 4,740,000** | |
Disposition Time | -26,800** | |
Foreclosure Discount | -25,100,000** | |
Foreclosure Distance Discount Weight | -526,000** | -0.613* | -0.551*
Foreclosure Effect Radius | -2,290** | -0.187 | -0.211
Loan Type | -89,800* | -0.125 | -0.529*
Appraisal Time | 1,400 | 0.010 | 0.137
R2 | 0.715 | |

Foreclosure Discount = 10%
Variable | Logistic Regression | PCC | PRCC
Intercept | 104** | |
Disposition Time | -3.80** | |
Foreclosure Discount | -212** | |
Foreclosure Distance Discount Weight | -59.8** | -0.562* | -0.452*
Foreclosure Effect Radius | -0.261** | -0.245 | -0.265
Loan Type | 21.0** | 0.409** | 0.332**
Appraisal Time | 1.27** | 0.126 | 0.103
Nagelkerke R2 | 0.903 | |
Cox & Snell R2 | 0.591 | |

* Significance at 95%; ** Significance at 99%