SCALE IDENTIFICATION IN SPATIALLY EXPLICIT POPULATION-ENVIRONMENT MODELING
Deirdre M. Mageean1, John G. Bartlett2 and Raymond J. O’Connor3
1 Margaret Chase Smith Center for Public Policy and Department of Resource Economics and Policy, University of Maine, Orono, ME 04473
2 Southern Global Change Program, USDA Forest Service, Raleigh, NC 27606
3 Department of Wildlife Ecology, University of Maine, Orono, ME 04469-5755
Address for Correspondence:
Dr. Deirdre Mageean, Margaret Chase Smith Center for Public Policy, Coburn Hall, University of Maine, Orono, ME 04473; (207) 581-1644; (207) 581-1266 (fax); [email protected]
ABSTRACT
Human interactions with the environment are often contingent on, and constrained by, regional conditions, and the development of operational methods of quantifying these interactions has therefore proved elusive. We show from first principles that at least four structural classes of such interaction are possible. Regression tree models can identify such structuring in the influence of environmental variables on demographic responses.
We used regression trees, optimized by cross-validation, to model county-level population density in 1990 and to model relative population change over 1980-1990 in relation to climate and remotely sensed land cover variables over a 12,600 cell hexagon grid for the conterminous United States. The population density model yielded six, essentially spatially clustered, end nodes or zones. Within each zone population density was statistically associated with a unique set of environmental constraints. The model explained 59.8% of the variation in population density, mostly through elevation and a high population density flag, and remained unchanged following several post hoc model optimization tests. Constructing equivalent models for other variables relevant to population-environment relationships e.g. change in population density between 1980 and 1990, various indices of human settlement, etc., yielded models that were also functions of climate, seasonality, and certain types of land cover. Model structures for the different population-environment response variables generally approximated one particular theoretical structure, though the relative population change model conformed to a second type.
The residuals of empirical population density from the regression tree prediction within each hexagon revealed a systematic large-scale gradient across the eastern edge of the Great Plains. Within each of the six zones, on the other hand, the residuals in population densities were typically locally clustered, indicating a need for locally finer resolution models in these places. Many of these clusters of residuals coincided spatially with individual Omernik ecoregions and the amplitude of the residuals depended on which ecoregion was involved. In addition, residuals in hexagons that straddled ecoregion boundaries showed that populations there were locally higher than predicted, and conversely for interior hexagons, indicating that, even in a country as developed as the United States, population distribution was differentially centered on areas of local environmental diversity. Towns, highways, and ecoregional boundaries, typically in combination, were present where the continental scale model under-estimated population densities.
These results imply that no single scale and model structure is optimal for socio-demographic analysis over a global or continental extent. Instead, initial global models can fruitfully be regionally and locally refined in a recursive but geographically specific manner by use of hierarchical modeling techniques.
INTRODUCTION
Complex linked systems such as the interaction of human population distribution and the
natural environment require the synthesis and integration of data generated by processes that
operate on very different spatial scales. Moreover, these processes may be structured
hierarchically (Costanza et al. 1992, O'Neill et al. 1989). Spatially-extensive, aggregate factors
often emerge as important in analysis of such systems (Miller et al. 1996, Roth et al. 1996,
Wickham and Norton 1994, Hall et al. 1995).
In hierarchical systems lower-level "noise" may become locally or regionally coordinated
to the point where it constitutes a perturbation of a higher level (Norton and Ulanowicz 1992).
In such situations the global hierarchical model has to be locally modified to account for the
changed scale at which these localized processes are operating. This may be especially true of
human-environment interactions which, in general, may be concealed because population growth
interacts with other variables in location-specific ways. Deforestation, for example, has multiple
causes, with the particular mix of causes varying from place to place, and the pattern depends on
the intersection in time and space of the combined effects of various conditions (Rudel and
Roper 1996). Similarly, many population-environment analyses fail to find population correlates
of environmental degradation studied across nations but do find marked correlations between
land use change and population when analysis is restricted to regions of similar socioeconomic
characteristics. These types of regional distinctiveness argue strongly for analysis that
recognizes and allows for contingent effects between response and predictor variables and that
can determine where local development of higher resolution models is needed (Meyer and
Turner 1994). It remains possible, however, that in other situations population-environment
interactions are not hierarchically structured in this way but rather reflect the outcome of the
simultaneous action of multiple global (in the sense of long wavelength) drivers (e.g., Brown
1995).
These diverse efforts to develop a spatially explicit understanding of population-
environment interactions (e.g., Cowen and Jensen 1998; Wood and Skole 1998; LUCC 1996)
have drawn attention to the need to conceptualize the links of population-environment research
to remotely sensed data and to develop tools to combine the disparate data so as to distinguish
among these possible structures (Geoghegan et al. 1998). Censuses, because they are
comprehensive, can be aggregated to various units, normally defined by political or
administrative boundaries. However, environmental influences typically work across such units,
and identification of their particular scales and spatial domains then requires comprehensive
environmental data and analysis across spatial, temporal, and hierarchical scales (Geoghegan et
al. 1998). Remotely observed land cover data can, for example, reflect both the influences of
urbanization and road development on tropical forests (Mertens and Lambin 1997) and the
converse effects of environmental determinants of population distribution and change (Rindfuss
and Stern 1998) but need to be analyzed at appropriate resolution and spatial extent. To this end
any advances in the conceptualization and quantification of the linkages between population
distribution and dynamics and environmental data are of considerable value. In the present paper
we describe how different patterns of environmental drivers on population attributes may
interact, we present an example of how such patterns can be determined statistically, and we
develop an approach to determining the scaling of the observed phenomena.
Spatial structuring in human-environment interactions
Decisions about the appropriate scale of analysis and aggregation of data are typically
driven by considerations both of theory and of data availability (Rindfuss and Stern 1998). To
understand the potential patterning of human impacts on the environment one must also
understand the patterning of environmental influences on humans. One can conceive of the
influence of environmental factors on humans being patterned within four distinct structures,
characterized by regional patterning of the effects of different variables. In the first of three
hierarchical patterns (Table 1a), each of a set of regions is dominated by the influence of a single
environmental factor considered favorable if above some threshold, and unfavorable otherwise.
Within each region all sites examined have common environmental conditions. Then the
hierarchy of regional environmental influences will take the form shown in Table 1a. First, one
region (for spatially auto-correlated data) is separated off on the basis of all of its sites satisfying
a threshold in variable A (which will be the strongest of the five variables). These sites have a
response level R1 (in population density, change in population density, wealth, housing
conditions, or whatever other response variable is being modeled in terms of the biophysical
drivers). Of the remaining sites, all of which are unfavorable in respect of the environmental
conditions characterized by variable A, a second block of sites is separated from the remaining
sites on the basis of variable B being favorable, and the response variable takes value R2 in this
region. Similarly, among the remaining sites (all of them now unfavorable in respect both of
variable A and of variable B), a third variable may be favorable, leading to response R3. This
process continues until eventually no further variable is available to discriminate among the
remaining locations. One might expect that the levels of response R1, R2, R3, etc., should form
a monotonically declining sequence but with empirical data it will be possible for reversals to
occur, reflecting particularly localized strength of response to a variable. This relatively simple
scenario can logically be described as a regional dominance model.
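The sequential peeling-off of regions described above can be sketched as an ordered list of rules. This is a minimal illustration only: the variable names, the common threshold of 0.5, and the response labels are hypothetical placeholders, not values from the study.

```python
from typing import Dict, List, Tuple

# Hypothetical ordered rules: (variable name, threshold, response level).
# Names, thresholds, and responses are illustrative, not values from the study.
RULES: List[Tuple[str, float, str]] = [
    ("A", 0.5, "R1"),
    ("B", 0.5, "R2"),
    ("C", 0.5, "R3"),
]


def regional_dominance(site: Dict[str, float], rules=RULES, default: str = "R_rest") -> str:
    """Return the response of the first variable that is favorable at this site."""
    for variable, threshold, response in rules:
        if site[variable] >= threshold:  # favorable: this region is separated off
            return response
    return default  # no remaining variable discriminates among the sites
```

Because the rules are tested in order, a site that is favorable for A receives R1 regardless of its values for B and C, which is the defining feature of the regional dominance structure.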
An alternative structure emerges where regions are influenced by the congruence of
multiple global factors (Table 1b). (“Global” is used here to refer to the entire spatial domain of
the sample points being considered which may, but need not, be world-wide). Suppose three
factors influence the distribution of humans and that each again acts globally in a simple binary
(favorable/unfavorable) manner. If these factors are A, B, and C, then there is a total of eight
possible combinations. If for specificity we assume that the absolute strengths of variables A, B,
and C are in that order, then the eight response levels R1 through R8 form a monotonically
declining sequence. Note that each of the eight regions is defined as the set intersection of the
values of the three variables. This scenario is logically termed the global constraint intersection
model.
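The eight set intersections of three binary global factors can be enumerated directly. The sketch below is illustrative; it labels the combinations R1 through R8 in an order that is monotonically declining only under the additional assumption (ours, not stated in the text) that each stronger factor outweighs all weaker factors combined.

```python
from itertools import product


def global_intersection_regions(factors=("A", "B", "C")):
    """Enumerate every favorable/unfavorable combination of the global factors.

    Returns a list of (label, {factor: is_favorable}) pairs. Factors are
    assumed to be listed from strongest to weakest.
    """
    regions = []
    # product() varies the first factor slowest, so with True listed first the
    # sequence runs from all-favorable (R1) to all-unfavorable (last label).
    for rank, combo in enumerate(product((True, False), repeat=len(factors)), start=1):
        regions.append((f"R{rank}", dict(zip(factors, combo))))
    return regions
```

Every site is assigned to exactly one of the eight regions, and every factor is evaluated everywhere, which distinguishes this structure from the regional dominance case.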
The third situation is depicted by Table 1c. Here each region is controlled by virtue of
being in a set intersection of constraints but, in contrast to the situation in Table 1b where all
factors had global effects, here each factor along the hierarchy from left to right has only a
regional or local effect contingent on the value of stronger variables at higher levels (to its left)
in the hierarchy. Thus, if we view the regions of Table 1c in a human-environment context, each
region with a given response level (e.g., R1) results from the interaction between the
demographic variable and its chain of environmental constraints as specified in the columns of
the table. Note that these constraints 1) are only locally operative, and 2) may result in similar
values for the demographic variable across multiple regions. That is, the response variable
values R1 and R5 may be numerically very similar but the similar values occur by reason of
quite different combinations of environmental drivers. This scenario is logically a local
constraint intersection model.
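A local constraint intersection structure can be sketched as a nested set of rules in which a variable is consulted only within the branch where it is operative. The tree below is hypothetical: the variables A to D, the thresholds, and the response labels are placeholders chosen for illustration.

```python
# Hypothetical nested rules: each node is either a response label (str) or a
# tuple (variable, threshold, subtree_if_favorable, subtree_if_unfavorable).
TREE = ("A", 0.5,
        ("B", 0.5, "R1", "R2"),            # B matters only where A is favorable
        ("C", 0.5, "R3",
         ("D", 0.5, "R4", "R5")))          # D matters only where A and C are unfavorable


def local_intersection(site, node=TREE):
    """Follow the chain of locally operative constraints down to a response label.

    Returns (response label, list of (variable, is_favorable) along the chain).
    """
    path = []
    while not isinstance(node, str):
        variable, threshold, favorable, unfavorable = node
        is_favorable = site[variable] >= threshold
        path.append((variable, is_favorable))
        node = favorable if is_favorable else unfavorable
    return node, path
```

The returned path makes explicit that two regions may carry numerically similar responses while being defined by quite different chains of constraints.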
The fourth type of model is not illustrated explicitly here since it reflects merely the
global influence of a predictor variable across the domain of interest and can be approximated by
piecewise regression. That is, if the response-predictor relation is perfectly linear, the
approximation is in effect the equivalent of Table 1b where variables B and C are now replaced
by quartile and octile splits on variable A; if the relationship is curvilinear, analogous
approximation between irregularly spaced percentiles results.
These four models have very different implications as to how population and
environment interact: the first as a series of regionally single-factor influences, the second as the
varied outcomes of a small set of environmental factors interacting globally, the third as a
hierarchical structure of increasingly local effects, and the fourth as a globally dominant
environmental factor. In essence these theoretical constructs provide a basis for classification of
the structures actually present in the environmental relationships of any population variables to
be considered. This type of model development and assessment, coupled with the rich
demographic and environmental databases already available, has the potential to provide critical
information in developing a theoretical framework for human-environment interactions. For
example, Kates’ (1998) argument that the interaction of global environmental change with
economic restructuring and with population growth and migration is most readily examined at a
local level would be supported by a pattern of demographic response variables consistently
conforming to the local constraint intersection model but Brown’s (1995) macro-ecological
model of distribution would lead one to expect the global constraint intersection to be dominant.
Similarly, were the systematic nesting of ever smaller scales inherent in the regional dominance
model commonplace, it might well reflect universal scaling due to data aggregation, as
described by Costanza and Maxwell (1994).
It is tempting to think of the processes just described as playing out on an array of spatial
scales that are universal across the domain of interest and that are immediately recognizable as to
relevant resolution. However, as the interaction of ecological and socio-economic processes
shapes landscape pattern (Lee et al. 1992), the connections of population to land cover change
and environmental drivers become weaker at smaller scales because other variables are
influential locally. That is, it is necessary to acknowledge that new, higher
resolution processes may appear only locally within the domain of discourse. For example, the
processes that shape human activity within the Willamette Valley of Oregon or the Central
Valley of California are unlikely to be the same as those shaping activity in the Cascades or in
the Sierra Nevada; and these processes are also unlikely to be simply a variant of those mountain
processes from which the sub-set linked to the high-frequency structure of the mountains is
simply absent. The nexus of processes that play out at any point on the landscape will generate a
sphere of spatial influence whose dimensions are currently unpredictable and which will vary
from place to place across the landscape. Improved ability to detect the presence of relevant
scales of human influence on landscapes will therefore be of considerable value in revealing
where such variables are to be searched for, it being impossible to gather high resolution data
with uniform success globally. Meyer and Turner (1994) use the same thinking in arguing that
better data as to the spatial congruence of population density with particular land covers will
clarify the role of population as a driving force of land cover change. Hence we develop in the present paper a
method of using the residuals about the class of models discussed above to identify where the
global models need to be modified in light of local and regional factors.
In the present paper we examine the distribution of human population density and of
attributes of the Bureau of the Census data for 1990 across the conterminous United States as a
model of how scale effects might be recognized and incorporated in population-environmental
analysis.
MATERIALS AND METHODS
Mageean and Bartlett (Mageean and Bartlett 1998a,b, Jones et al. 1999) used regression
tree analysis to model population density and socio-economic variables across the conterminous
United States against a series of environmental variables. The county-level population data used
were the 1990 population data from the U.S. Bureau of the Census, mapped to a regular grid
using a geographic information system (ARC/INFO - ESRI 1996). The grid used was the U.S.
Environmental Protection Agency’s Environmental Monitoring and Assessment Program
(EMAP), a hexagonal grid with approximately 12,000 spatial units within the 48 states (White et
al. 1992). Each grid cell was approximately 635 km², with a center-to-center spacing of
approximately 25 km. Each hexagon intersected the county level data in a series of polygons
and a density was assigned to each hexagon as an area-weighted sum of the densities in these
polygons.
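The area-weighted assignment of county densities to a hexagon can be sketched as follows. The function name and the input layout are hypothetical, and the sketch assumes that the weights are each polygon's share of the total intersecting area.

```python
def hexagon_density(pieces):
    """Area-weighted density for one hexagon.

    pieces: iterable of (area_km2, density_per_km2), one entry per polygon
    formed by intersecting the hexagon with a county (hypothetical layout).
    """
    pieces = list(pieces)
    total_area = sum(area for area, _ in pieces)
    if total_area <= 0:
        raise ValueError("hexagon has no intersecting area")
    # Each polygon contributes its density weighted by its share of the area.
    return sum(area * density for area, density in pieces) / total_area
```

For example, a hexagon made up of 300 km² at 10 persons/km² and 100 km² at 50 persons/km² would be assigned a density of 20 persons/km².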
The environmental data were those used by O’Connor et al. (1996). Briefly, this data set
comprised selected climate data, land cover class data from the Loveland et al. (1991) prototype
land cover classification of the United States, extended with the addition of an urban layer from
the Digital Chart of the World (Danko 1992), various landscape metrics derived therefrom
(Hunsaker et al. 1994) and various supporting data such as elevation, stream density, road
density, and gross patterns of federal versus non-federal ownership (later formalized as Wickham
et al. 1995). In the event, only rather few variables, notably climate variables, contributed to the
predictions.
Very high population densities were a complex function of urbanization rather than of
any of the environmental variables they considered and densities of more than 100 persons/km²
were high enough to make the likelihood of any significant ecological functioning negligible
(Terborgh 1989). We therefore flagged densities above this threshold as urban densities and
used this flag as a predictor variable to handle the excessively high densities involved.
Regression tree analysis, developed by Breiman et al. (1984) and used here in the form
of the S-Plus (Stat Sci, Inc., Seattle, Washington) implementation reviewed by Clark and
Pregibon (1992), is well adapted to handle the complex and unpredictable interactions of
ecological data. Regression tree analysis recursively partitions a data set through a series of
binary nodes, at each binary node evaluating all independent variables and all threshold values
thereof for ability to dichotomize the data into two subsets that are as different as possible as to
the values of the dependent variable. The independent variable and threshold that most
distinctively dichotomizes the sample is chosen as the appropriate splitting criterion for that
node. This process is repeated separately with each of the two descendent nodes, and the process
propagated until some stopping rule is satisfied. The process is prone to over-fitting in that, in
the extreme, partitioning could proceed until each terminal node contains just one member of the
sample group (or multiple members with zero variance) and therefore be fitted exactly. This
problem is addressed by over-fitting the tree initially, then pruning it back to a smaller tree on the
basis of cross-validation.
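The exhaustive search carried out at a single binary node can be sketched in a few lines. This is not the S-Plus implementation used in the study; it is a minimal illustration that assumes the split criterion is the within-node sum of squared deviations, and it omits the recursion, stopping rule, and cross-validated pruning.

```python
def sum_sq(values):
    """Deviance of a node: sum of squared deviations from the node mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, response):
    """Find the variable and threshold that best dichotomize one node.

    rows: list of dicts mapping predictor name to value.
    response: list of floats, one per row.
    Returns (variable, threshold, deviance_after_split), or None if no
    predictor takes more than one value.
    """
    best = None
    for variable in rows[0]:
        levels = sorted({row[variable] for row in rows})
        # Candidate thresholds lie midway between consecutive observed values.
        for low, high in zip(levels, levels[1:]):
            threshold = (low + high) / 2
            left = [y for row, y in zip(rows, response) if row[variable] < threshold]
            right = [y for row, y in zip(rows, response) if row[variable] >= threshold]
            deviance = sum_sq(left) + sum_sq(right)
            if best is None or deviance < best[2]:
                best = (variable, threshold, deviance)
    return best
```

Applying the same search to each of the two descendant subsets in turn would grow the tree; pruning by cross-validation would then select the subtree that predicts held-out data best.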
For the present work the regression tree model thus created was used as the basis for
predicting population density in each of the 12,600 hexagons in the conterminous United States
and computing the residuals about those predictions for each hexagon. The resulting residuals
were exported to SYSTAT (SPSS Inc., Chicago, IL) for statistical analysis using analysis of
variance, t-test, non-parametric two-sample tests, and other conventional statistical tools. Maps
were prepared using ARCVIEW GIS (ESRI 1996). In addition to population density we
considered several other variables used by Bartlett et al. (2000), namely relative population
change over the decade, farmstead density, agricultural intensity, mean age of built structure, a
wealth index, and metropolitan status.
RESULTS
The Continental Model
Figure 1 shows the regression tree analysis for 1990 population density and for 1980-90
decadal relative change in population density. The regression tree for population density used a
logarithmic transformation of population density (persons/km²) as the dependent variable and
yielded a partitioning of the 12,600 hexagons into six end nodes (A-F in Figure 1a) by means of
a series of binary partitions of the data. The statistical fit of the model was moderately good,
accounting for 59.8 per cent of the deviance in density. The end nodes identify areas whose
environmental conditions are defined as the set intersections of the environmental constraints in
their chain of splitting conditions back to the root node, and these are mapped in Figure 2. Thus,
node A consists of those hexagons that had low population densities in cold, low elevation areas
(mean January temperature below -10.5 °C). Similar interpretations can be applied to the other
end nodes. Four of the end nodes (A, B, D, and E) are groups of largely spatially contiguous
hexagons, with scattered outliers around their edges (Figure 2). Node C, on the other hand
(almost inevitably because of its definition in terms of concentrations of high population
density), was much more diffuse, capturing most of the large metropolitan centers in the country,
though Denver was absent because of its segregation by the split on elevation. Finally, node F
contained locations with both high elevation and high annual precipitation associated with the
western mountain ranges. The national pattern was thus a function of three major, large-scale,
natural factors (elevation, January temperatures, and annual precipitation). Among the many
small-scale patterning variables considered in the analysis, only the artificial urbanization index
appeared. The large wavelengths probably reflect spatial auto-correlation in the defining
variables of January climate, annual precipitation, and elevation.
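The proportion of deviance accounted for by a regression tree of this kind can be written out explicitly. The expression below is a sketch under the assumption that deviance is the residual sum of squares about terminal-node means, as is usual for regression trees with a continuous response; the symbols are introduced here for illustration and do not appear in the original text.

```latex
% Assumed notation (not from the source):
%   y_i        log population density in hexagon i
%   \bar{y}_k  mean of y over the hexagons in terminal node k
%   \bar{y}    mean of y over all hexagons
\[
  \text{proportion of deviance explained}
  \;=\;
  1 - \frac{\sum_{k}\sum_{i \in k} \left(y_i - \bar{y}_k\right)^2}
           {\sum_{i} \left(y_i - \bar{y}\right)^2}
\]
```

On this reading, the reported 59.8 per cent corresponds to a value of 0.598 for the expression, with k running over the six end nodes.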
The model for relative population change over the decade (Figure 1b) was rather
different, and notably more asymmetric than that for population density. Population change split
initially on the basis of seasonality, with a threshold of 25.2 °C. In areas with high seasonality a
further split into end nodes occurred, again on the basis of seasonality. The effect was to divide
the hexagons across the conterminous United States into three groups, characterized by
seasonality of above 28.3 °C, seasonality levels between 25.2 and 28.3 °C, and seasonality below
25.2 °C. The least seasonal hexagons then, in turn, split off groups of various hexagons on the
basis of land cover (mean area of patches of barren land) (yielding node E), then on average
January temperature (yielding node D), then on deciduous forest patch size (yielding node C),
finally dividing into nodes A and B on the basis of relief.
The population density model of Figure 1a clearly conforms to the local constraint
intersection model rather than to the global constraint intersection model or to the regional
dominance model. Seven of the eight other variables subjected to regression tree analysis also
conformed to this model (though specific results are not presented here). Only the 1980-90
decadal population change shown in Figure 1b differed, conforming fairly closely to the regional
dominance model, the only deviation being an additional side-branch split off on seasonality.
Identifying Mesoscale Pattern
Each of the six terminal nodes for population density in Figure 1a predicted a population
density across all locations in its spatial domain (Figure 2) but for each hexagon the true
population density was in fact known. The deviations of these known values from their
respective node predictions could simply reflect statistical noise about the within-node mean i.e.,
densities within each hexagon were randomly distributed about the mean. Alternatively, local
conditions within each domain may have caused systematic local deviation of the human
population from the value predicted across the node as a whole, yielding spatial concentrations
of deviations indicative of the need to introduce mesoscale or regional models to modify the
continental one. Figure 3 shows the distribution of hexagons that were respectively either one
standard deviation above or one standard deviation below the mean for their node. The error was
computed as “error = observed - predicted”, so that a positive value was obtained where
population density was under-estimated by the model. The deviations proved to be concentrated
in space. For example, in node A the concentration was primarily centered around Minneapolis-
St. Paul (Figure 3a), indicating that the influence of this metropolis extended well beyond the
area assigned, along with the city itself, to node C. Smaller clusters of underestimates occurred
to the western edge of the node and also in Vermont. In contrast, the overestimates of population
in node A occurred in very small clusters and primarily on the western edge of the node,
excepting one cluster in Maine. Similar concentration of population around the suburbs of the
major urban areas was evident in node B for much of the eastern U.S. (Figure 3b) and for many
western cities e.g., Phoenix, San Diego, and Los Angeles. Overestimates of population were
primarily along the western half of the node, with smaller clusters in eastern Maine, Michigan,
the Florida panhandle, New Mexico, and through the Pacific states. Most of the large
metropolitan areas in node C (Figure 3c) had centers underestimated in population e.g., Boston,
New York, Philadelphia, and Washington. Node D (Figure 3d) differed from node B in having
larger blocks of especially contiguous errors of both sorts, with positive and negative clusters
distributed fairly widely across the node. A concentration of overestimates along the eastern
boundary of node D alongside the corresponding underestimates along this boundary in node B
suggests a gradient in densities from east to west that was poorly modeled by the piecewise fits of
the node constants. Node E hexagons generally underestimated the population density along the
Mexican border with western Texas (Figure 3e), probably largely reflecting the growth of the
trans-border towns in recent years (Sable 1989). In this node population density was
systematically overestimated across the deserts of southwestern Arizona and of southern
California. The remaining node (Figure 3f) captured high elevation wet hexagons in both the
eastern and western part of the U.S., with distinct regional concentrations of overestimates in the
Appalachians and in the Cascades and underestimates in eastern Idaho and western Montana.
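The residual computation and the one-standard-deviation flagging used for the maps of Figure 3 can be sketched for a single node as follows. The function name and the choice of the population standard deviation are assumptions made for illustration; the spatial clustering itself would still have to be assessed on a map or with a spatial statistic.

```python
from statistics import mean, pstdev


def flag_residuals(observed, predicted, k=1.0):
    """Residuals and flags for the hexagons of one node.

    Returns (residuals, flags), where flags[i] is +1 if the model
    under-estimated hexagon i by more than k standard deviations, -1 if it
    over-estimated by more than k standard deviations, and 0 otherwise.
    """
    # error = observed - predicted, so positive means under-estimation
    residuals = [o - p for o, p in zip(observed, predicted)]
    center = mean(residuals)
    spread = pstdev(residuals)  # assumption: population standard deviation
    flags = []
    for r in residuals:
        if r > center + k * spread:
            flags.append(1)
        elif r < center - k * spread:
            flags.append(-1)
        else:
            flags.append(0)
    return residuals, flags
```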
Ecoregional Patterning
Omernik (1987) classified land in the United States into ecoregions within which
geographical ecology was similar. The analysis of Figure 1a, based as it was on climate and land
cover attributes, potentially captured some or all of this regionalization. As there are 76
ecoregions, the best one could hope for by way of a fit was that each of the nodes of Figure 1a
would correspond to aggregations of ecoregions rather than have node boundaries overlapping
ecoregions. We therefore also calculated for each node an “ownership” value in respect of each
ecoregion, to measure the extent to which each ecoregion was contained within a node. If a node
consisted entirely and exclusively of, say, some six ecoregions, then its ownership value of each
of these six regions would be 100%. On the other hand, if one of the overlapped ecoregions had
25% of its hexagons in another node, then the first node would own only 75% of that ecoregion.
Table 2 presents the results. A minority of ecoregions had ownership values of 100%, indicating
that most ecoregions spanned more than one node.
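The ownership calculation can be sketched from a list of per-hexagon assignments. The input layout (one node label and one ecoregion label per hexagon) is an assumption made for illustration; hexagons straddling several ecoregions would need a rule for choosing a single label.

```python
from collections import Counter


def ownership(assignments):
    """Percentage of each ecoregion's hexagons that fall within each node.

    assignments: iterable of (node, ecoregion) pairs, one per hexagon.
    Returns {(node, ecoregion): percent}.
    """
    assignments = list(assignments)
    per_pair = Counter(assignments)
    per_ecoregion = Counter(ecoregion for _, ecoregion in assignments)
    return {
        (node, ecoregion): 100.0 * count / per_ecoregion[ecoregion]
        for (node, ecoregion), count in per_pair.items()
    }
```

An ecoregion lying wholly within one node yields a single entry of 100%, whereas an ecoregion with a quarter of its hexagons in a second node yields entries of 75% and 25%.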
We considered further whether the residuals about the node means might vary from
ecoregion to ecoregion within a node and conducted an analysis of variance for each of the six
nodes (Table 3). Within all six nodes there was a strong effect of ecoregion on the residual,
though it was weakest in node C (the metropolitan areas node). In the other five nodes, however,
there was strong variation from ecoregion to ecoregion in the mean value of the residual from the
continental model. Recalling that the population densities were analyzed after log-
transformation, these differences indicate proportionately large variation in actual densities
between ecoregions within nodes. Examination of the mean residuals for individual ecoregions
showed no clear pattern, in some cases with neighboring ecoregions sharing rather similar values
but in other cases with spatially distant ecoregions having similar values and neighbors having
quite different values, this within individual nodes. Thus in summary, although ecoregions were
not tightly coupled to the nodes of Figure 1a, the residuals of population density within any node
were not randomly spread across the individual ecoregions, and therefore seemed to deviate
individualistically from the global model.
Correlates of Residuals
A further ecoregional phenomenon was apparent. When all hexagons that straddled two
or more ecoregion boundaries were classed as “edge hexagons”, leaving as “interior hexagons”
those whose entire area was contained within a single ecoregion, population density residuals in
edge hexagons were significantly more positive than were those in interior hexagons (0.115 ±
0.995 versus -0.047 ± 0.979, t = 8.32, P < 0.001). We conclude, therefore, that edge hexagons
typically had higher population densities than expected for the node as a whole, while at the
same time the model tended to over-predict population density in the interior hexagons within each ecoregion.
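A two-sample comparison of edge and interior residuals can be sketched as below. The text does not state which form of t-test was used, so the unequal-variance (Welch) statistic here is an assumption, and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic for the difference in means of two samples.

    Assumption: unequal variances; the original analysis may have used a
    pooled-variance test instead.
    """
    var_a, var_b = variance(sample_a), variance(sample_b)  # sample variances
    standard_error = sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error
```

A positive statistic for (edge, interior) indicates that residuals are on average higher in edge hexagons, that is, that the continental model under-estimates density there relative to interior hexagons.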
Identifying local and regional drivers
As noted earlier, systematic clustering of residuals in population density implied that
some rather small-scale factors were working to bring about a local departure of population
density from the levels predicted by the continental model. The ecoregion boundaries
constituted one such factor, as just noted, but clearly accounted for only a small fraction of the
cases of under-estimation by the continental model. We therefore also considered as possible
correlates the extent to which towns and highways might lead to local enhancement of
population density. The presence of a town provides a social organization of humans into a
finite space which is large relative to our spatial units of analysis and a highway typically
promotes population growth along its length by virtue of the ease of travel it provides. Figure 4
shows the distribution of positive residuals with respect to the presence of ecoregion boundaries, towns, and
highways, either alone or in all possible combinations. Inspection of this map shows that the
majority of these residuals (63%; Table 4) coincided with the co-occurrence of highways and
towns; a third of these also coincided with ecoregion boundaries, notably along a chain of sites
down the eastern side of the Appalachians and in a few clusters in western states e.g., Utah,
Colorado, and Arizona. Clusters associated with highways alone were scarce (8.6%) and were
most noticeable in southern California, New Mexico, and Montana. Clusters centered only on a
town or only on an ecoregion boundary were present only as scattered instances. Finally, those
clusters that did not relate to one or more of these three variables (10.6% of the total) were
essentially confined to western and southwestern states. Thus most of the clusters of larger than
expected population densities were linked to highways and towns.
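A tabulation of this kind, counting positive residuals by each combination of towns, highways, and ecoregion boundaries, can be sketched as follows. The record layout and key names are hypothetical, and the data in the usage note are invented for illustration; they are not the values of Table 4.

```python
from collections import Counter


def tabulate_correlates(hexagons):
    """Percentage of hexagons falling in each combination of correlates.

    hexagons: iterable of dicts with boolean keys 'town', 'highway', and
    'boundary' (hypothetical names), one dict per positive-residual hexagon.
    Returns {combination label: percent}, with 'none' for no correlate.
    """
    counts = Counter()
    for hexagon in hexagons:
        present = [name for name in ("town", "highway", "boundary") if hexagon[name]]
        counts["+".join(present) if present else "none"] += 1
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}
```

With four invented records, two having a town and a highway, one a highway only, and one with none of the three, the function returns 50% for "town+highway" and 25% each for "highway" and "none".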
DISCUSSION
The methods described here constitute an extension of the approach developed by
Legendre and Fortin (1989) who used a regression surface to model long-wavelength, spatially
extensive patterns, followed by analysis of the distribution of residuals to identify smaller scale
patterns. The regression tree analysis used here has many similarities to other classes of
multivariate regression but offers the special advantages for population-environment analysis of
allowing the manifestation of contingent effects and interactions, even in circumstances where
the nature of these cannot be formulated a priori. The recursive partitioning involved allows
different rules to emerge in different places, possibly involving quite different variables in
different parts of the spatial domain. Our earlier work (Mageean and Bartlett 1998a,b, Bartlett et
al. 2000) exploited these properties to derive a continental model of the environmental
contingencies of human population density across the conterminous United States. Here, our
residuals analysis extended the approach to address the limitations of continental models and to
determine where smaller regional models needed to be introduced.
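To make the two-stage logic concrete, the sketch below finds a single regression tree split and then computes residuals about the node means. It is a toy illustration and not the authors' implementation: the data values, the function names, and the use of a single predictor with one split are all assumptions made for the example.

```python
def sse(values):
    """Sum of squared deviations of values about their mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(x, y):
    """Return the threshold on x that minimizes the summed within-group SSE of y."""
    best_threshold, best_cost = None, float("inf")
    candidates = sorted(set(x))
    # Candidate thresholds lie midway between consecutive distinct x values.
    for lo, hi in zip(candidates, candidates[1:]):
        threshold = (lo + hi) / 2.0
        left = [yi for xi, yi in zip(x, y) if xi < threshold]
        right = [yi for xi, yi in zip(x, y) if xi >= threshold]
        cost = sse(left) + sse(right)
        if cost < best_cost:
            best_threshold, best_cost = threshold, cost
    return best_threshold

# Hypothetical hexagons: mean elevation (m) and population density (per km2).
elevation = [100, 200, 300, 400, 500, 600, 700, 800]
density = [30.0, 28.0, 33.0, 29.0, 6.0, 4.0, 5.0, 25.0]

# Stage 1: the split defines two end nodes.
threshold = best_split(elevation, density)

# Stage 2: residuals about the mean of the node each hexagon falls in.
low = [d for e, d in zip(elevation, density) if e < threshold]
high = [d for e, d in zip(elevation, density) if e >= threshold]
node_mean = {True: sum(low) / len(low), False: sum(high) / len(high)}
residuals = [d - node_mean[e < threshold] for e, d in zip(elevation, density)]
```

In this invented data set the last hexagon sits far above the mean of its node, which is the kind of positive residual that the second stage is meant to surface for closer regional study.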
We found that the local constraint intersection model (Table 1c) was the pattern shown by the
regression tree for population density (Figure 1) and by all the other trees constructed except that
for population change; this last proved to parallel the regional dominance model (Table 1a).
Thus the majority of the patterns of population-environment interactions, as in previous studies
(Mageean and Bartlett 1998b, Bartlett et al. 2000), provide evidence in favor of Kates’ (1998)
generalization that it is at the local level that explanations of these interactions must be sought.
What our models provide is a quantified delineation of the spatial domains of the different
environmental drivers or combinations of drivers for which global data are available, together
with the ability to identify where specifically local data not considered in the global data set are
needed to account for a regionally cohesive departure from the global predictions. Thus our
prototype promises to allow systematic, quantified evaluation of the thinking underlying Kates’
(1998) argument. It is of particular interest in this respect that it is population change that
matches the regional dominance model. A reasonable interpretation would be that an individual
driver may be strong enough on its own to drive population decreases or to stimulate increase,
but that where it has no effect another driver then has its effects manifest. This simple notion is
one that cannot be captured by conventional regression models unless they are expressly
structured a priori to model such a phenomenon. Huston (2001) and O’Connor (2001) both
argue that the modeling of distributional data will typically be misleading if such multiple
alternative constraints are not accommodated in the analysis adopted.
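The sequentially nested logic of the regional dominance model (Table 1a) can be written as a short rule chain. The following is a hedged sketch: the function name and the boolean encoding of "favorable" are illustrative assumptions, and the response labels R1 to R6 simply mirror Table 1a.

```python
def regional_dominance_response(favorable):
    """Return the response level for an ordered list of five driver states (A to E).

    favorable[i] is True when driver i is favorable. The first favorable
    driver in the sequence sets the response; drivers after it are ignored
    (the 'Indifferent' cells of Table 1a). If none is favorable, the
    lowest response level applies.
    """
    for rank, is_favorable in enumerate(favorable, start=1):
        if is_favorable:
            return f"R{rank}"
    return f"R{len(favorable) + 1}"

# Driver A favorable: the states of B to E make no difference.
print(regional_dominance_response([True, False, True, False, True]))
# A and B unfavorable, C favorable.
print(regional_dominance_response([False, False, True, True, False]))
# Nothing favorable.
print(regional_dominance_response([False] * 5))
```

A conventional additive regression would instead let every driver contribute everywhere, which is why a structure of this kind has to be specified a priori in such a model, whereas recursive partitioning can recover it from the data.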
Regression trees assess correlation and not causation, limiting the interpretation possible
here. Nevertheless, such analyses delineate the patterns that causal explanations must account
for and provide pointers to the likely causal factors. In biogeography the Holdridge life zone
classification (Holdridge 1967) characterizes biomes on the basis of empirical relationships to
temperature and precipitation, an idea no different than our recognition of particular end nodes as
the set intersection of environmental constraints. Small and Cohen’s (2000) recent use of the
Gridded Population of the World database to determine the distribution of the world’s population
within the phase space of the environmental variables they considered is an explicitly human
distribution study in the same spirit. The fact of the end partitioning achieved in the present
study likewise implies that the regionalization deserves use in formulating hypotheses for further
study, particularly because of the extensive checking against collinearity within our suite of
potential predictor variables. It remains possible, as with all regression studies, that the variables
identified here may yet be confounded with other factors not considered as candidate predictors
but, if so, their action is necessarily limited to the spatial domain of that end node within our
regression tree. Thus our analysis limits the scope for potential misinterpretation and provides us
with “experimental” and “control” domains within which any potential confounder must display
appropriate effects.
That none of the many landscape metrics we considered appeared as predictors of
population density runs contrary to the widespread advocacy of their value in the recognition of
patterns within landscapes (Turner 1990, O'Neill et al. 1986), landscape pattern being a mixture
of natural and anthropogenic patches of varying size and configuration driven by the joint action
of physical, biological, and social forces (Burgess and Sharpe 1981, Forman and Godron 1986,
Krummel et al. 1987, Turner 1987, Turner 1990). Our residuals analysis, rather than landscape
metrics, was needed to detect regional patterns. One possibility is that high spatial variability in
population density may have prevented the regression tree analysis from detecting anything other
than long wavelength effects: the cross-validation process used is notoriously data hungry.
However, an alternative explanation may be that the linkage of landscape metrics patterns to
ecological processes breaks down under intense human use of the land. Hulshoff (1995) found
that few landscape indices were useful in the intensively managed landscapes of the Netherlands:
the dominance index was insensitive to changes in landscape, shape indices of natural areas were
dominated by the presence of their human-modified surroundings, and no index tracked
locational changes in patches. These findings suggest that it might be unwise to rely on the
spatial patterning of landscape metrics as definitive in scale recognition with demographic and
socio-economic data, and that residuals analysis of the type demonstrated here may be a more
robust approach to detecting the need for regional scale changes. The promise of earlier
exploratory work towards the use of landscape metrics in detecting varied scale patterns, notably
Wickham and Norton’s (1994) system of mapping units called Landscape Pattern Types (LPT)
and Flamm and Turner’s (1994) landscape-condition labels, may play out over smaller extents
mapped with higher resolution spatial data. Flamm and Turner’s simulations showed that
considerable differences arise between the dynamics of models using pixel-based landscape metrics
and the dynamics of models in which discrete landscape patches are treated as units. Such
results suggest that in the study of population-environment interactions it is essential to identify
where patterns in simple landscape metrics come together to form synthetic wholes before
conducting analyses of the influence of landscape composition and pattern on human
distribution. In effect, these studies and the present one support Norton and Ulanowicz’s (1992)
position as to local perturbation of spatially extensive models and meet Meyer and Turner’s
(1994) need for a middle scale between global (here continental) and local within which to
address population-environment relationships.
Overall we detected three patterns or levels of deviation superimposed on the regression
tree continental model. The first was the systematic change in sign of deviations across node
boundaries from east to west, indicating that the overall population density surface is merely
being approximated by the planes fitted within each node. The spatial contiguity within the
nodes reflects the parallel auto-correlation of the environmental variables involved (Figure 3) but
did not at the same time capture the gradients therein (cf. the steep descent in precipitation across
the eastern edge of the Great Plains). The second level at which deviations from the continental
model were detected in our analytical approach was the variation in residuals from Omernik
ecoregion to Omernik ecoregion within nodes (Table 3). These results linked the residuals
decisively to the Omernik ecoregions, indicating that the residuals about our models are in some
way associated with the characteristics and attributes that define each ecoregion as a synoptic
whole. The third level at which a pattern was detected here was in the relationship with the
boundaries of ecoregions, where we found a significant bias in most nodes toward
underestimation of the population density at the ecotone of two or more ecoregions. Since this
signal was manifest as a pattern in the residuals despite the ecoregion to ecoregion variation in
mean, it indicates that the ecotonal effect must be quite strong. Finally, we showed that the
occurrence of local clusters of under-estimates of population density was associated with the
presence of towns, highways, and ecoregions, the first two of which capture effects better
described as social than as biophysical. We are not suggesting that demonstrating that higher
densities of people are found along highways or in towns is a scientific break-through! Rather,
they are used here solely as a demonstration of how analysis of residuals in future work of this
type can be used to localize and investigate local or regional departures from models covering
large spatial extents. In that future work neither the localization of the domain of the regional or
local drivers nor the identity of those drivers will be known in advance. In the prototype set out
here a small number of attributes - towns, highways, ecoregion boundaries - proved correlated
with all but about 10% of the clusters across the conterminous U.S.; that is, the effects turned out to
be local effects that operated globally. However, it is in fact not necessary that this be the
outcome: it is entirely conceivable that local departures from global models be location-specific,
each requiring consideration of factors that are influential nowhere else in the domain of the
model.
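The standardization of residuals within end nodes (mapped in Figure 3) can be sketched as follows. The example assumes hypothetical node labels and density values, and it uses the population standard deviation; the text does not say which form of the standard deviation was used, so that choice is an assumption. Each node is also assumed to contain some variation, since a zero standard deviation would make the z-score undefined.

```python
from collections import defaultdict
from statistics import mean, pstdev

def node_z_scores(nodes, densities):
    """Standardize each observation against the mean and SD of its own end node."""
    grouped = defaultdict(list)
    for node, density in zip(nodes, densities):
        grouped[node].append(density)
    stats = {node: (mean(vals), pstdev(vals)) for node, vals in grouped.items()}
    return [(d - stats[n][0]) / stats[n][1] for n, d in zip(nodes, densities)]

def classify(z):
    """Label a hexagon following the three shading classes of Figure 3."""
    if z > 1:
        return "under-fit"   # observed density well above the node mean
    if z < -1:
        return "over-fit"    # observed density well below the node mean
    return "within 1 SD"

# Hypothetical hexagons in two end nodes.
nodes = ["A", "A", "A", "A", "B", "B", "B", "B"]
densities = [10.0, 12.0, 11.0, 31.0, 5.0, 5.5, 4.5, 1.0]

labels = [classify(z) for z in node_z_scores(nodes, densities)]
```

Because each hexagon is standardized against its own node, the labels are comparable across nodes with very different mean densities, which is what allows clusters of under-fit hexagons to be mapped on a common scale.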
The most intriguing of these three findings was the tendency for high population densities
to exceed the predicted values along the edges of ecoregions: there are more people at the
transition areas between ecoregions than in the interiors thereof. It is well known from small-scale
studies that ecotones are particularly valuable for wildlife (Leopold 1933) but, to our knowledge,
the equivalent observation has not been reported for humans in relation to ecoregions of much larger
spatial extent. It may be relevant that Pyšek (1992) analyzed the transition zones between human
settlements and adjacent rural areas in central Europe and discovered higher vegetation diversity
and plant species richness in settlement transition zones as compared to more urban or more rural
areas.
A key distinction between our work and previous investigations of how changes in scale
associated with change in the resolution (grain size) of a spatial analysis affect the conclusions
reached (Busing and White 1993, Benson and MacKenzie 1995, Moody and Woodcock 1995, Qi
and Wu 1996) is that they implicitly assume that a single scale (i.e., resolution) is optimal across
the domain of interest. A fundamental weakness in this approach is that it requires the
relationship between predictor and response variables to be identical over the whole spatial
domain of the data set at all scales. Our analyses provide an empirical example in which the
clusters at the regional and local scale do not show a regularity of pattern amenable to such
analysis, the need for finer resolution varying across space and being correlated with different
factors or combinations of factors (towns, highways, boundaries) at different locations. Global
analysis of our population density correlates at multiple levels of aggregation would have yielded
misleading results because the rules are local, not global.
The key to scale recognition in our analysis was the spatial contiguity of the hexagons
and of the residuals, which almost certainly originated in a spatial auto-correlation of the key
predictor variables, an auto-correlation which was not explicitly used in the regression tree
partitioning. Although it is well understood that statistical inference requires the members of a
sample to be mutually independent and that spatial auto-correlation lowers the effective degrees
of freedom, here our analysis can be seen as a first determination of the scale of any spatial
effects present. Borcard et al. (1992) recognize four distinct patterns of spatial variation: a
statistical, non-spatial dependence of the response variable on the environmental variables
considered; a purely spatial auto-correlation of the dependent variable over space; a dependence
of the dependent variable on a correlated response of the environmental variable over space (i.e.,
spatial auto-correlation within the predictor variable’s distribution); and a residual non-spatial
noise component. The value of a spatial regression tree analysis in any given case would then
depend on the relative magnitude of these four components.
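One hedged way to gauge the relative magnitude of such components is variation partitioning in the style of Borcard et al. (1992), which splits the variation of a response into a purely environmental fraction, a fraction shared between environment and space, a purely spatial fraction, and an unexplained remainder. The sketch below assumes a single environmental predictor and a single spatial coordinate and uses the closed-form R-squared for a regression on two predictors; the data and all names are hypothetical.

```python
from math import sqrt

def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / sqrt(var_u * var_v)

def partition_variation(y, env, space):
    """Split the variation of y into four fractions that sum to 1."""
    r_ye, r_ys, r_es = pearson(y, env), pearson(y, space), pearson(env, space)
    r2_env = r_ye ** 2
    r2_space = r_ys ** 2
    # R-squared of the regression of y on both predictors together.
    r2_both = (r_ye ** 2 + r_ys ** 2 - 2 * r_ye * r_ys * r_es) / (1 - r_es ** 2)
    return {
        "environment_only": r2_both - r2_space,
        "shared": r2_env + r2_space - r2_both,
        "space_only": r2_both - r2_env,
        "unexplained": 1 - r2_both,
    }

# Hypothetical response, environmental predictor, and spatial coordinate.
y = [2.0, 3.5, 3.0, 5.5, 6.0, 8.5, 8.0, 10.0]
env = [1.0, 2.0, 1.5, 3.0, 3.5, 5.0, 4.0, 5.5]
space = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]

fractions = partition_variation(y, env, space)
```

A large shared fraction would indicate that the predictor is itself spatially structured, which is the situation in which spatially contiguous end nodes are most likely to emerge from a tree that was given no spatial information. Note that the shared fraction is obtained by subtraction and can be slightly negative in some data sets.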
With growing appreciation of the role of spatially explicit processes in shaping the
natural environment and the interaction of human populations with them, our work provides a
systematic approach to delineating the spatial domain within which the more local investigations
must take place. We have shown that ideas suggested within the natural sciences for the
application of the principles of landscape ecology to environmental issues (Legendre and Fortin
1989, Flamm and Turner 1994) can be developed to be viable identifiers of appropriate scales,
and in particular of regionally restricted scale changes, for the characterization of the spatial
scale of human activities and their environmental implications. Our pairing of regression
tree analysis, to define the spatially extensive interaction of demography and environment,
with residuals analysis, to identify the location and extent of departures from such models, offers a
systematic approach to achieving such identification.
LITERATURE CITED
Bartlett, J.G., D.M. Mageean, and R.J. O'Connor. 2000. Residential expansion as a continental threat to U.S. coastal ecosystems. Population and Environment 21: 429-468.
Benson, B.J., and M.D. MacKenzie. 1995. Effects of sensor spatial resolution on landscape structure parameters. Landscape Ecology 10: 113-120.
Borcard, D., P. Legendre, and P. Drapeau. 1992. Partialling out the spatial component of ecological variation. Ecology 73: 1045-1055.
Breiman, L., J.H. Friedman, R.A. Olshen, and C.J. Stone. 1984. Classification and regression trees. Wadsworth, Belmont CA.
Brown, J.H. 1995. Macroecology. University of Chicago Press, Chicago, IL.
Burgess, R.L., and D.M. Sharpe, (Eds). 1981. Forest Island Dynamics in Man-Dominated Landscapes. Springer-Verlag, New York.
Busing, R.T., and P.S. White. 1993. Effects of area on old-growth forest attributes: implications for the equilibrium landscape concept. Landscape Ecology 8:119-126.
Clark, L.A., and D. Pregibon. 1992. Tree-based models. Pages 377-419 In: J.M. Chambers and T.J. Hastie, (Eds.) Statistical models in S. Wadsworth & Brooks/Cole Advanced Books & Software, Pacific Grove, California.
Costanza, R., L. Wainger, C. Folke, and K.G. Mäler. 1992. Modelling complex ecological economic systems: Towards an evolutionary, dynamic understanding of humans and nature. Beijer International Institute of Ecological Economics Reprint Series Number 17.
Costanza, R., and T. Maxwell. 1994. Resolution and predictability: an approach to the scaling problem. Landscape Ecology 9:47-57.
Cowen, D. J., and J. R. Jensen. 1998. Extraction and modelling of urban attributes using remote sensing technology, Pages 164-188 In: D.L. Liverman, E.F. Moran, R.R. Rindfuss, and P.C. Stern (Eds.) People and pixels: linking remote sensing and social science. Washington D.C.: National Academy Press.
Danko, D.M. 1992. The digital chart of the world. GeoInfo Systems 2:29-36.
ESRI - ARC/INFO Version 7.0.4. 1996. Environmental Systems Research Institute, Inc. Redlands, CA.
Flamm, R.G., and M.G. Turner. 1994. Alternative model formulations for a stochastic simulation of landscape change. Landscape Ecology 9:37-46.
Forman, R.T.T., and M. Godron. 1986. Landscape Ecology. New York: John Wiley and Sons.
Geoghegan, J., L. Pritchard, Y. Ogneva-Himmelberger, R.R. Chowdhury, S. Sanderson, and B.L. Turner II. 1998. "Socializing the pixel" and "pixelizing the social" in land-use and land-cover change, Pages 51-69, In: D.L. Liverman, E.F. Moran, R.R. Rindfuss, and P.C. Stern (Eds.), People and pixels: linking remote sensing and social science. Washington D.C.: National Academy Press.
Hall, C.A.S., Hanqin Tian, Ye Qi, G. Pontius, J. Cornell, and J. Uhlig. 1995. Spatially explicit models of land-use change and their application to the tropics. DOE Research Summary No. 31, February 1995.
Holdridge, L.R. 1967. Life Zone Ecology. Tropical Science Center, San Jose, Costa Rica.
Hulshoff, R.M. 1995. Landscape indices describing a Dutch landscape. Landscape Ecology 10:101-111.
Hunsaker, C.T., R.V. O’Neill, S.P. Timmins, B.L. Jackson, D.A. Levine, and D.J. Norton. 1994. Sampling to characterize landscape pattern. Landscape Ecology 9:207-226.
Huston, M.A. 2001. Ecological context for predicting occurrences. In: J.M. Scott, P.J. Heglund, J.B. Haufler, M.L. Morrison, M.G. Raphael, W.B. Wall, and F. Samson, (Eds.) Predicting Species Occurrences: Issues of Accuracy and Scale. Island Press.
Jones, M.T., J.G. Bartlett, and D.M. Mageean. 1998. Visualising the hierarchical organization of a spatially explicit socioeconomic system. Systems Research and Information Systems 8:137-149.
Kates, R.W. 1998. Expanding our directions. Land Use and Land Cover Change Newsletter (Special Issue: The Earth’s Changing Land Conference), Number 3, March 1998. Barcelona, Spain: Institut Cartogràfic de Catalunya.
Lee, R.G., R.O. Flamm, M.G. Turner, C. Bledsoe, P. Chandler, C. DeFarrare, R. Gottfried, R.J. Naiman, N. Schumaker, and D. Wear. 1992. Integrating sustainable development and environmental quality: a landscape ecology approach. Pages 499-521 In: R.J. Naiman (Ed.) New Perspectives in Watershed Management. Springer Verlag: New York.
Legendre, P., and M.-J. Fortin. 1989. Spatial pattern and ecological analysis. Vegetatio 80:107-138.
Leopold, A. 1933. Game Management. Charles Scribner’s Sons: New York.
Loveland, T.R., J.W. Merchant, D.J. Ohlen, and J.F. Brown. 1991. Development of a land-cover characteristics database for the conterminous U.S. Photogr. Engin. Rem. Sens. 57:1453-1463.
LUCC. 1996. Land use and cover change: Open Science Meeting Proceedings. L. Fresco, R. Leemans, B.L. Turner II, D. Skole, A.G. van Zeijl-Rozema, and V. Haarmann (Eds.). Barcelona, Spain: Institut Cartogràfic de Catalunya, 1997.
Mageean, D.M. and J.G. Bartlett. 1998a. Putting people on the map: integrating social science data with environmental data. In: Pecora 13 Proceedings: Human interactions with the environment - perspectives from space. Bethesda, MD: American Society of Photogrammetry and Remote Sensing. CD-ROM, 1 disk.
Mageean, D.M., and J.G. Bartlett. 1998b. Using population data to address the problems of human dimensions of environmental change. Pages 193-205 In: S. Morain (Ed.), GIS in natural resource management: balancing the technical-political equation. Santa Fe, NM: High Mountain Press.
Mertens, B., and E.F. Lambin. 1997. Spatial modelling of deforestation in southern Cameroon. Applied Geography 17:143-162.
Meyer, W.B., and B.L. Turner II. 1994. Global land-use and land-cover change: an overview. Pages 3-10 In: W.B. Meyer and B.L. Turner II, Eds. Changes in land use and land cover: a global perspective. Cambridge University Press: Cambridge.
Miller, J.R., L.A. Joyce, R.L. Knight, and R.M. King. 1996. Landscape patterns and road density in the Southern Rocky Mountains. Landscape Ecology 11:115-127.
Moody, A., and C.E. Woodcock. 1995. The influence of scale and the spatial characteristics of landscapes on land-cover mapping using remote sensing. Landscape Ecology 10:363-379.
Norton, B.G., and R.E. Ulanowicz. 1992. Scale and biodiversity policy: a hierarchical approach. Ambio 21:244-249.
O’Connor, R.J. 2001. The conceptual basis of species distribution modeling: time for a paradigm shift. Pages 25-33 In: Scott, J.M., P.J. Heglund, F. Samson, J. Haufler, M. Morrison, M. Raphael, and B. Wall (Eds.), Predicting Species Occurrences: Issues of Accuracy and Scale. Island Press.
O’Connor, R.J., M.T. Jones, D. White, C. Hunsaker, T. Loveland, B. Jones, and E. Preston. 1996. Spatial partitioning of environmental correlates of avian biodiversity in the conterminous United States. Biodiversity Letters 3:97-110.
Omernik, J.M. 1987. Ecoregions of the conterminous United States. Annals of the Association of American Geographers 77:118-125.
O’Neill, R.V., C.T. Hunsaker, S.P. Timmins, B.L. Jackson, K.B. Jones, K.H. Riitters, and J.D. Wickham. 1996. Scale problems in reporting landscape pattern at the regional scale. Landscape Ecology 11:169-180.
O'Neill, R.V., A.R. Johnson, and A.W. King. 1989. A hierarchical framework for the analysis of scale. Landscape Ecology 3:193-205.
Pyšek, P. 1992. Settlement outskirts - may they be considered as ecotones? Ekológia (ČSFR) 11:273-286.
Qi, Y., and J. Wu. 1996. Effects of changing spatial resolution on the results of landscape pattern analysis using spatial autocorrelation indices. Landscape Ecology 11:39-49.
Rindfuss, R.R., and P.C. Stern. 1998. Linking remote sensing and social science: the need and the challenges, Pages 1-27, In: D.L. Liverman, E.F. Moran, R.R. Rindfuss, and P.C. Stern (eds.), People and pixels: linking remote sensing and social science. Washington D.C.: National Academy Press.
Roth, N.E., J.D. Allan, and D.L. Erikson. 1996. Landscape influences on stream biotic integrity assessed at multiple spatial scales. Landscape Ecology 11:141-156.
Rudel, T.K., and T. Roper. 1996. Regional patterns and historical trends in tropical deforestation, 1976-1990: a qualitative comparative analysis. Ambio 25:160-166.
Sable, M.H., (Ed.) 1989. Las Maquiladoras : Assembly and manufacturing plants on the United States-Mexico border : An International Guide. Haworth Press: Binghamton, New York.
Small, C., and J.E. Cohen. 2000. Physiography, climate and the global distribution of human population. Available at: http://sedac.ciesin.org/plue/gpw/index.html?workshop.html&2
S-PLUS Version 3.3. 1995. StatSci, a division of MathSoft, Inc., Seattle, WA.
Terborgh, J. 1989. Where have all the birds gone? : Essays on the biology and conservation of birds that migrate to the American tropics. Princeton University Press, Princeton, New Jersey.
Turner, M.G. 1987. Spatial simulation of landscape changes in Georgia: a comparison of three transition models. Landscape Ecology 1:29-36.
Turner, M.G. 1990. Spatial and temporal analysis of landscape patterns. Landscape Ecology 4:21-30.
White, D., J. Kimmerling, and W.S. Overton. 1992. Cartographic and geometric components of a global design for environmental monitoring. Cartographic and Geographic Information Systems 19:5-22.
Wickham, J.D., and D.J. Norton. 1994. Mapping and analyzing landscape patterns. Landscape Ecology 9:7-23.
Wickham, J.D., J. Wu, and D.F. Bradford. 1995. Stressor data sets for studying species diversity at large spatial scales. US EPA 600/R-95/018. Office of Research and Development. U.S. Environmental Protection Agency, Washington, DC.
Wood, C.H. and D. Skole. 1998. Linking satellite, census, and survey data to study deforestation in the Brazilian Amazon, Pages 70-93, In: D.L. Liverman, E.F. Moran, R.R. Rindfuss, and P.C. Stern (Eds.), People and pixels: linking remote sensing and social science. Washington D.C.: National Academy Press.
Table 1a. Sequentially nested population-environment interaction (the regional dominance model), in which a favorable condition of each successive factor is influential only if all factors higher in the sequence are unfavorable.
Variable A    Variable B    Variable C    Variable D    Variable E    Response level
Favorable     Indifferent   Indifferent   Indifferent   Indifferent   R1
Unfavorable   Favorable     Indifferent   Indifferent   Indifferent   R2
Unfavorable   Unfavorable   Favorable     Indifferent   Indifferent   R3
Unfavorable   Unfavorable   Unfavorable   Favorable     Indifferent   R4
Unfavorable   Unfavorable   Unfavorable   Unfavorable   Favorable     R5
Unfavorable   Unfavorable   Unfavorable   Unfavorable   Unfavorable   R6
Table 1b. Hierarchically structured population-environment interaction (the global constraint intersection model) where each environmental factor has influence over the whole spatial domain, and where R1>R2>R3>... >R8 if the influence of factor A is greater than that of factor B which in turn is greater than that of factor C.
Variable A Variable B Variable C Response level
Favorable Favorable Favorable R1
Favorable Favorable Unfavorable R2
Favorable Unfavorable Favorable R3
Favorable Unfavorable Unfavorable R4
Unfavorable Favorable Favorable R5
Unfavorable Favorable Unfavorable R6
Unfavorable Unfavorable Favorable R7
Unfavorable Unfavorable Unfavorable R8
Table 1c. Hierarchically structured population-environment interaction (the local constraint intersection model) where the influence of local factors is contingent on conditions higher in the hierarchy.
Variable A    Variable B    Variable C    Variable D    Variable E    Variable F    Variable G    Response level
Favorable     Favorable     -             Favorable     -             -             -             R1
Favorable     Favorable     -             Unfavorable   -             -             -             R2
Favorable     Unfavorable   -             -             Favorable     -             -             R3
Favorable     Unfavorable   -             -             Unfavorable   -             -             R4
Unfavorable   -             Favorable     -             -             Favorable     -             R5
Unfavorable   -             Favorable     -             -             Unfavorable   -             R6
Unfavorable   -             Unfavorable   -             -             -             Favorable     R7
Unfavorable   -             Unfavorable   -             -             -             Unfavorable   R8
Table 2. Percent ownership of Omernik ecoregions by the six regression tree end nodes for the response variable population density in 1990 and its environmental predictors (Figure 1a). An ownership value of 100 indicates that the entire ecoregion was contained within a single end node. Only ecoregions with >50% ownership by a given end node are listed.
Node    Ecoregion (percent ownership)
A       48 (100), 49 (100), 51 (74), 50 (55)
B       28 (100), 36 (100), 37 (100), 39 (99), 35 (98), 33 (98), 31 (97), 40 (96), 73 (95), 74 (94), 71 (92), 72 (92), 65 (91), 70 (90), 54 (89), 29 (86), 63 (85), 34 (84), 3 (83), 55 (82), 75 (81), 57 (81), 32 (81), 7 (80), 68 (80), 53 (74), 47 (71), 56 (70), 61 (65), 38 (60), 2 (56), 67 (56), 1 (55)
C       64 (79), 59 (66), 76 (50)
D       12 (100), 41 (100), 42 (100), 43 (100), 44 (100), 45 (100), 18 (100), 16 (99), 20 (99), 13 (99), 21 (93), 19 (92), 9 (90), 22 (90), 11 (90), 17 (80), 23 (74), 26 (68), 25 (67), 10 (67), 15 (60), 46 (55), 27 (54)
E       24 (95), 14 (61), 30 (56)
F       4 (87), 5 (76), 62 (65), 66 (65), 69 (64)
Table 3. Analysis of variance results for standardized population density residuals (z-scores). Residuals about the predicted mean for each regression tree node (see Figure 1a) were transformed to z-scores and variation assessed across Omernik (1987) ecoregions within that node.
Node    Number of ecoregions wholly or partly in the end node    F-ratio    Significance level
A 8 77.47 0.001
B 32 48.91 0.001
C 29 3.74 0.001
D 25 39.31 0.001
E 11 78.73 0.001
F 15 70.51 0.001
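The F-ratios in Table 3 come from an analysis of variance of the z-scores across ecoregions within each node. As a minimal sketch under assumed data (the ecoregion labels and z-scores below are invented for illustration, and a simple one-way layout is assumed), the statistic can be computed as:

```python
def one_way_f(groups):
    """One-way ANOVA F-ratio: between-group mean square over within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical z-scores for hexagons in three ecoregions of one end node.
z_by_ecoregion = {
    "ecoregion_1": [-0.5, -0.25, 0.0],
    "ecoregion_2": [1.0, 1.25, 1.5],
    "ecoregion_3": [-1.0, -0.75, -0.5],
}

f_ratio = one_way_f(list(z_by_ecoregion.values()))
```

A large ratio means that ecoregion means differ by much more than the scatter of hexagons within ecoregions, which is the sense in which the residuals are linked to the ecoregions.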
Table 4. Frequency of occurrence of highways, towns, and ecoregion boundaries in hexagons with higher population density than predicted by the regression tree model of Figure 1a.
Highways (n = 1494)    Towns (n = 1410)    Ecoregion boundaries (n = 673)    Number of hexagons with this combination    Percent of total (total n = 1982)
Present Present Present 416 21.0
Present Present Absent 828 41.8
Present Absent Present 80 4.0
Present Absent Absent 170 8.6
Absent Present Present 66 3.3
Absent Present Absent 100 5.0
Absent Absent Present 111 5.6
Absent Absent Absent 211 10.6
Figure 1 a. Regression tree for population density in 1990 and its environmental correlates across the conterminous United States. b. Regression tree for relative change in population density 1980-1990 and its environmental correlates. Numbers inside the oval or rectangle are mean values for the response variable (pop. density/km2 or change in pop. density) across all hexagons associated with that branch (oval) or end node (rectangle). The environmental variable that best explains the variation in response variable for each recursive hexagon subgroup is shown at each branch and its splitting value is given.
[Figure 1a. Regression tree diagram. Response: Population Density 1990. Splits: Avg. January Temp. (-10.5 °C; 1.8 °C), Avg. Precipitation (922.8 mm), High Density? (yes/no), Mean Elevation (454 m). End nodes: A-F.]
[Figure 1b. Regression tree diagram. Response: Change in Population Density 1980-1990. Splits: Avg. Seasonality (25.2 °C; 28.3 °C), Avg. Patch Size Barren Land (1.7 km2), Avg. January Temp. (12.3 °C), Avg. Patch Size Deciduous Forest (0.5 km2), Elevation Range (484 m). End nodes: A-G.]
Figure 2. The regionalization of environmental correlates for population density in 1990 modeled by the regression tree analysis. All hexagons of one color form a common end node (labeled on Figure 1a) and thus share the same combination of predictor variables.
[Figure 2. Map legend: End Node A-F; Population Density / km2 (1990).]
Figure 3. Standard error scores (z-scores) [(observed-mean)/standard deviation] associated with regression tree end nodes for population density in 1990 and its environmental correlates across the conterminous United States. Light gray hexagons are within 1 standard deviation of the mean for that node while medium gray hexagons are greater than 1 standard deviation below the mean (over-fit areas) and dark gray hexagons are greater than 1 standard deviation above the mean (under-fit areas) for that end node.
Figure 4. The distribution of positive residuals with respect to the presence of highways, towns, and ecoregions, either alone or in all possible combinations.
[Figure 4. Map legend: three-digit codes give the presence (1) or absence (0) of highways, towns, and ecoregions, in that order, with all factors considered simultaneously. Examples: 011 = towns and ecoregions (but not highways) were important; 111 = all three were important; 000 = none were important.]