GIS and Spatial Statistics: Methods and Applications in Public Health
Marcia Castro
Assistant Professor of Demography
Harvard School of Public Health
Institute of Behavioral Science, Computing and Research Services, and the Social Sciences Data Lab
University of Colorado at Boulder - March 11, 2008
Spatial Statistics
First Law of Geography
(Tobler 1979)
“Everything is related to everything else, but near things are more related than distant things.”
Types of research questions
Spatial determinants of transmission
Spatial associations of risk factors with disease and interaction with temporal processes
Origins of diseases and outbreaks
Spatial and temporal distribution of disease and risk factors
Planning of surveillance program and targeting control activities
Improved allocation of limited resources
Types of spatial data
Points:
Events – crimes, accidents, flu cases
Sample from a surface – air quality monitors, house sales
Objects – county centroids
Area:
Aggregates of events – accidents per census tract
Summary measures – density, mean house value
Spatial Pattern Analysis
Some attributes:
Hypothesis testing
Hypothesis generation
Pattern evolution
Pattern prediction
Clustering
Test spatial regression assumptions
Cannot unequivocally determine cause and effect
Cannot assign meaning to spatial relationships
Problems / Challenges
Modifiable areal unit problem (MAUP)
Scale effect – spatial data analysis at different scales may produce different results
Zoning effect – regrouping zones at a given scale may produce different results
Optimal neighborhood size
Alternative zoning schemes
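The zoning and scale effects can be illustrated with a toy example. This is a minimal sketch: the case counts and the `aggregate` helper are hypothetical and are not taken from the slides.

```python
def aggregate(values, zone_starts):
    """Sum values within zones; zone_starts holds the first index of each zone."""
    edges = list(zone_starts) + [len(values)]
    return [sum(values[a:b]) for a, b in zip(edges[:-1], edges[1:])]

# Hypothetical case counts in eight adjacent blocks along a street.
cases = [0, 5, 5, 0, 0, 5, 5, 0]

zoning_a = aggregate(cases, [0, 2, 4, 6])     # four zones of two blocks each
zoning_b = aggregate(cases, [0, 1, 3, 5, 7])  # same blocks, boundaries shifted by one
coarse = aggregate(cases, [0, 4])             # scale change: two zones of four blocks

print(zoning_a)  # [5, 5, 5, 5]
print(zoning_b)  # [0, 10, 0, 10, 0]
print(coarse)    # [10, 10]
```

The same underlying counts look uniform under one zoning and strongly clustered under the other, which is the zoning effect; the coarser grouping hides the pattern altogether, which is the scale effect.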
Problems / Challenges
Spatial dependence: Tobler’s law
Spatial heterogeneity: uneven distributions at the global scale
Boundary problems
Missing data
Confidentiality: collection, analysis, publication, data sharing
Disclosure risk: methods to mask data
Spatial Autocorrelation
Null hypothesis:
Spatial randomness
Values observed at one location do not depend on values observed at neighboring locations
Observed spatial pattern of values is equally likely as any other spatial pattern
The location of values may be altered without affecting the information content of the data
[Figure: three example point patterns: regular, random, and aggregated]
Spatial autocorrelation
Formal test of match between locational similarity and value similarity
Locational similarity defined by spatial weights:
Binary or standardized
Types of neighborhoods:
Contiguity (common boundary)
Distance (distance band, K-nearest neighbors)
General weights (social distance, distance decay)
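The weight definitions above can be sketched in a few lines. This is an illustrative example only: the function names are hypothetical, contiguity is shown for a regular grid (rook case, shared edges), and "standardized" is assumed to mean row-standardized, so each row sums to 1.

```python
import math

def rook_weights(n_rows, n_cols):
    """Binary contiguity weights on a regular grid: cells sharing an edge are neighbors."""
    n = n_rows * n_cols
    w = [[0.0] * n for _ in range(n)]
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    w[i][rr * n_cols + cc] = 1.0
    return w

def distance_band_weights(points, d):
    """Binary weights: j is a neighbor of i when their distance is at most d."""
    n = len(points)
    return [[1.0 if i != j and math.hypot(points[i][0] - points[j][0],
                                          points[i][1] - points[j][1]) <= d else 0.0
             for j in range(n)] for i in range(n)]

def row_standardize(w):
    """Divide each row by its sum so the weights of every location add up to 1."""
    out = []
    for row in w:
        s = sum(row)
        out.append([v / s if s > 0 else 0.0 for v in row])
    return out

w_binary = rook_weights(3, 3)        # 3 x 3 grid, cells numbered row by row
w_std = row_standardize(w_binary)
w_band = distance_band_weights([(0, 0), (1, 0), (3, 0)], d=1.5)
```

With row-standardized weights, the spatial lag of a variable is simply the average of its values over the neighbors of each location.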
Spatial autocorrelation
Test for the presence of spatial autocorrelation:
Global
Local: LISA – Local Indicators of Spatial Autocorrelation
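A global test can be sketched with Moran's I and a permutation test of the spatial randomness null hypothesis. This is a minimal sketch under stated assumptions: binary rook weights on a 4 x 4 grid, made-up data, a one-sided pseudo p-value for positive autocorrelation, and hypothetical function names.

```python
import random

def rook_weights(n_rows, n_cols):
    """Binary contiguity weights on a regular grid (shared edges)."""
    n = n_rows * n_cols
    w = [[0.0] * n for _ in range(n)]
    for r in range(n_rows):
        for c in range(n_cols):
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    w[r * n_cols + c][rr * n_cols + cc] = 1.0
    return w

def morans_i(values, w):
    """Global Moran's I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in w)
    cross = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(zi * zi for zi in z)

def permutation_p_value(values, w, n_perm=999, seed=0):
    """Share of random reshufflings whose I is at least as large as the observed I."""
    rng = random.Random(seed)
    observed = morans_i(values, w)
    shuffled = list(values)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # values are reassigned to locations at random
        if morans_i(shuffled, w) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)

w = rook_weights(4, 4)
clustered = [1, 1, 0, 0] * 4                                      # high values on the left half
checkerboard = [(r + c) % 2 for r in range(4) for c in range(4)]  # alternating values

i_clustered = morans_i(clustered, w)     # positive autocorrelation
i_checker = morans_i(checkerboard, w)    # negative autocorrelation
p_clustered = permutation_p_value(clustered, w)
```

The permutation step mirrors the null hypothesis stated earlier: if location carries no information, any reassignment of values to locations is equally likely.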
Local spatial autocorrelation – LISA
Moran’s I_i, Geary’s c_i, K_i
Test of CSR (complete spatial randomness) – positive and negative autocorrelation:
Positive - similar values (either high or low) are spatially clustered
Negative - neighboring values are dissimilar
[Figure: Moran scatterplot; horizontal axis X, vertical axis Spatial Lag (X), both running from -1.5 to 1.5]
Quadrants:
-+ (Low-High): value below the mean, neighbors' average above the mean
++ (High-High): value above the mean, neighbors' average above the mean
-- (Low-Low): value below the mean, neighbors' average below the mean
+- (High-Low): value above the mean, neighbors' average below the mean
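The quadrant classification and the local Moran statistic can be sketched as follows. This is a hedged illustration rather than the author's procedure: locations are placed on a line with row-standardized weights to keep the example short, the data are invented, and the function names are hypothetical. Inference (for example conditional permutation) is omitted.

```python
def chain_weights(n):
    """Row-standardized weights for n locations on a line; neighbors are i-1 and i+1."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nbrs = [j for j in (i - 1, i + 1) if 0 <= j < n]
        for j in nbrs:
            w[i][j] = 1.0 / len(nbrs)
    return w

def local_moran(values, w):
    """Return local Moran's I_i = z_i * lag_i / m2 and the scatterplot quadrant of each location."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(zi * zi for zi in z) / n
    lag = [sum(w[i][j] * z[j] for j in range(n)) for i in range(n)]  # neighbors' average deviation
    local_i = [z[i] * lag[i] / m2 for i in range(n)]
    labels = []
    for zi, li in zip(z, lag):
        own = "High" if zi > 0 else "Low"        # values exactly at the mean count as Low here
        neighbors = "High" if li > 0 else "Low"
        labels.append(own + "-" + neighbors)
    return local_i, labels

values = [10, 9, 8, 1, 2, 1]
local_i, quadrants = local_moran(values, chain_weights(len(values)))
print(quadrants)
# ['High-High', 'High-High', 'High-Low', 'Low-Low', 'Low-Low', 'Low-Low']
```

Positive I_i values fall in the High-High and Low-Low quadrants (local clustering), and negative values fall in the High-Low and Low-High quadrants (local outliers).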
Local spatial autocorrelation – LISA
G_i(d):
Does not consider the value of location i itself
Used for spread or diffusion studies
Useful for focal clustering, e.g. cholera infection around a specific water source
G_i*(d):
Takes the value of location i into account
Most appropriate for the identification of clusters of high and low values
Choice of d is not straightforward
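The difference between the two statistics can be shown with their simple ratio forms: G_i(d) is the share of the total of all other values that lies within distance d of i, and G_i*(d) includes the value at i itself. This is a sketch under assumptions: binary distance-band weights, positive values, invented data, and a hypothetical function name. Standardized (z-score) versions are generally used for inference and are not shown.

```python
import math

def g_statistics(points, values, d):
    """Ratio forms of the Getis-Ord local statistics with binary distance-band weights."""
    n = len(points)
    total = sum(values)
    g, g_star = [], []
    for i in range(n):
        near = [j for j in range(n) if j != i and
                math.hypot(points[i][0] - points[j][0],
                           points[i][1] - points[j][1]) <= d]
        near_sum = sum(values[j] for j in near)
        g.append(near_sum / (total - values[i]))        # G_i(d): location i excluded
        g_star.append((near_sum + values[i]) / total)   # G_i*(d): location i included
    return g, g_star

# Six locations on a line, one unit apart, with a hot spot in the middle.
points = [(float(k), 0.0) for k in range(6)]
values = [1, 1, 10, 12, 1, 1]
g, g_star = g_statistics(points, values, d=1.0)
```

Rerunning the example with a different `d` changes which locations count as neighbors, which is one way to see why the choice of d matters.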
Local Statistics
[Figure: map of 24 numbered locations, (1) to (24), shown on two consecutive slides]
Local Statistics
Multiple and dependent tests
Two sources of spatial dependence:
Geometric
Between the values of nearby locations
Local Statistics
Multiple comparison correction
Conservative – Bonferroni, Sidak
Control the probability of incorrectly rejecting at least one true null hypothesis - Type I error
False Discovery Rate (FDR)
Proportion of null hypotheses incorrectly rejected among all those that were rejected
Q = V / (V + S): proportion of rejected hypotheses that are erroneously rejected, where V is the number of true null hypotheses rejected and S is the number of false null hypotheses rejected (Q = 0 when V + S = 0)
FDR defined as the mean of Q: FDR = E(Q) = E[V / (V + S)]
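The corrections can be compared on a small set of p-values. The slides do not say which FDR procedure is used; the sketch below assumes the Benjamini-Hochberg step-up rule, and the p-values are invented. Benjamini-Hochberg is derived for independent (or positively dependent) tests, so its use with dependent local statistics is itself an assumption.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject when p <= alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def sidak(p_values, alpha=0.05):
    """Reject when p <= 1 - (1 - alpha)^(1/m)."""
    m = len(p_values)
    threshold = 1 - (1 - alpha) ** (1 / m)
    return [p <= threshold for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """Find the largest rank k with p_(k) <= (k / m) * alpha and reject the k smallest p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            k_max = rank
    reject = [False] * m
    for i in order[:k_max]:
        reject[i] = True
    return reject

# Hypothetical p-values from ten local tests.
p = [0.212, 0.001, 0.041, 0.008, 0.074, 0.039, 0.216, 0.042, 0.060, 0.205]

n_unadjusted = sum(pi <= 0.05 for pi in p)   # 5 rejections
n_bonferroni = sum(bonferroni(p))            # 1 rejection
n_sidak = sum(sidak(p))                      # 1 rejection
n_fdr = sum(benjamini_hochberg(p))           # 2 rejections
```

In this example the FDR procedure sits between the unadjusted tests and the conservative corrections, which is the trade-off the slide describes.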
FDR & Local Statistics
[Figure: three panels, (a) Scenario (ii), d=3; (b) Scenario (iii), d=2; (c) Scenario (iv), d=2, each comparing four approaches (Unadjusted, Bonferroni, Bonferroni with v, FDR). Legend: clusters fully identified, clusters partially identified, clusters missed, false clusters identified]
Methods
Geostatistics: semivariogram & kriging
Weight the surrounding measured values to derive a prediction for each location
Weights are obtained from the semivariogram
Semivariogram
[Figure: semivariogram γ(h) plotted against distance h, with axis marks a, c0, and σ²_Z, and regions labeled "World of autocorrelation" and "World of independence"]
Nugget effect - A discontinuity at the origin generated by micro-scale variation and/or measurement error
Range - The distance beyond which the observed points are not correlated anymore
Sill - The limit of the semivariogram as the distance h tends to infinity
Creating the empirical semivariogram
[Figure: empirical semivariogram values]
Directional Influence (Anisotropy)
Fitting a model to the empirical semivariogram
[Figure: fitted model overlaid on the empirical values]
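The two steps above can be sketched as follows: the empirical semivariogram averages half the squared differences between pairs of observations within each distance bin, and a model is then fitted to those averages. Everything here is an assumption made for illustration: the data, the binning rule, the spherical model, the crude grid search, and the function names. The sketch is omnidirectional, so it ignores anisotropy.

```python
import math
from collections import defaultdict

def empirical_semivariogram(points, z, lag_width):
    """gamma(h) = sum of (z_i - z_j)^2 over pairs in the bin / (2 * number of pairs).
    Returns {bin index: (mean distance, gamma, number of pairs)}."""
    sq, dist, count = defaultdict(float), defaultdict(float), defaultdict(int)
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            h = math.hypot(points[i][0] - points[j][0], points[i][1] - points[j][1])
            k = int(h // lag_width)  # bin k covers [k * lag_width, (k + 1) * lag_width)
            sq[k] += (z[i] - z[j]) ** 2
            dist[k] += h
            count[k] += 1
    return {k: (dist[k] / count[k], sq[k] / (2 * count[k]), count[k]) for k in sorted(count)}

def spherical(h, nugget, partial_sill, a):
    """Spherical model: 0 at h = 0, rises to nugget + partial_sill at the range a."""
    if h == 0:
        return 0.0
    if h >= a:
        return nugget + partial_sill
    r = h / a
    return nugget + partial_sill * (1.5 * r - 0.5 * r ** 3)

def fit_spherical(lags, gammas, nuggets, partial_sills, ranges):
    """Least-squares grid search over candidate parameters; returns (sse, nugget, partial sill, range)."""
    best = None
    for c0 in nuggets:
        for c in partial_sills:
            for a in ranges:
                sse = sum((spherical(h, c0, c, a) - g) ** 2 for h, g in zip(lags, gammas))
                if best is None or sse < best[0]:
                    best = (sse, c0, c, a)
    return best

# Four hypothetical observations on a line, one unit apart.
emp = empirical_semivariogram([(0, 0), (1, 0), (2, 0), (3, 0)], [1, 3, 2, 5], lag_width=1.0)

# Fit check on synthetic values generated from a known model (nugget 1, partial sill 4, range 3).
lags = [0.5, 1.0, 2.0, 3.0, 4.0]
gammas = [spherical(h, 1.0, 4.0, 3.0) for h in lags]
best = fit_spherical(lags, gammas, [0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [2.0, 3.0, 4.0])
```

In practice the fit is usually done by weighted least squares or maximum likelihood rather than a grid search; the grid search is used here only to keep the example dependency-free.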
Kriging
BLUE (best linear unbiased estimator)
Different models, e.g. cokriging
Prediction error
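A minimal sketch of ordinary kriging shows how the weights and the prediction error come out of the semivariogram. The assumptions are loud ones: a spherical semivariogram with no nugget and made-up parameters, four invented observations, hypothetical function names, and a small Gaussian-elimination solver written out to avoid external libraries. The weights solve the system sum_j λ_j γ(x_i - x_j) + μ = γ(x_i - x_0) with sum_j λ_j = 1.

```python
import math

def spherical(h, nugget, partial_sill, a):
    """Spherical semivariogram model (0 at h = 0, flat beyond the range a)."""
    if h == 0:
        return 0.0
    if h >= a:
        return nugget + partial_sill
    r = h / a
    return nugget + partial_sill * (1.5 * r - 0.5 * r ** 3)

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x

def ordinary_kriging(points, z, target, gamma):
    """Return (prediction, kriging variance, weights) at the target location."""
    n = len(points)
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])
    lhs = [[gamma(dist(points[i], points[j])) for j in range(n)] + [1.0] for i in range(n)]
    lhs.append([1.0] * n + [0.0])                      # unbiasedness: weights sum to 1
    rhs = [gamma(dist(p, target)) for p in points] + [1.0]
    sol = solve(lhs, rhs)
    weights, mu = sol[:n], sol[n]                      # mu is the Lagrange multiplier
    prediction = sum(w * v for w, v in zip(weights, z))
    variance = sum(w * g for w, g in zip(weights, rhs[:n])) + mu
    return prediction, variance, weights

def gamma(h):
    # Assumed model parameters: no nugget, sill 1, range 5.
    return spherical(h, 0.0, 1.0, 5.0)

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 2.0)]
z = [1.0, 2.0, 3.0, 4.0]
pred_mid, var_mid, weights_mid = ordinary_kriging(points, z, (0.5, 0.5), gamma)
pred_at_data, var_at_data, _ = ordinary_kriging(points, z, (1.0, 0.0), gamma)
```

With no nugget, predicting at an observed location returns the observed value with zero kriging variance, which is a quick sanity check on the implementation.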
Methods
Multivariate analysis
The presence of spatial autocorrelation violates the independence assumption of standard linear regression models
Checking residuals – Moran’s I
Geographically weighted regression
Local estimates of regression parameters
Spatial weights – distance-decay kernel functions
Not parsimonious
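Geographically weighted regression can be sketched as a separate weighted least-squares fit at each location, with weights that decay with distance. The example below is illustrative only: it assumes a single covariate, a Gaussian kernel with a fixed bandwidth, invented data, and a hypothetical function name.

```python
import math

def gwr_simple(coords, x, y, bandwidth):
    """Local (intercept, slope) at every location from kernel-weighted least squares."""
    estimates = []
    for u in coords:
        # Gaussian distance-decay kernel centered on location u.
        w = [math.exp(-0.5 * (math.hypot(u[0] - c[0], u[1] - c[1]) / bandwidth) ** 2)
             for c in coords]
        sw = sum(w)
        x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
        y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
        sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
        slope = sxy / sxx
        estimates.append((y_bar - slope * x_bar, slope))
    return estimates

coords = [(float(k), 0.0) for k in range(10)]   # ten locations on a line
x = [1, 4, 2, 8, 5, 7, 3, 9, 6, 10]

# Case 1: the same relationship everywhere, y = 2 + 3x.
y_constant = [2 + 3 * xi for xi in x]
local_constant = gwr_simple(coords, x, y_constant, bandwidth=1.0)

# Case 2: the slope is 1 in the first five locations and 3 in the last five.
y_varying = [(1 if k < 5 else 3) * xi for k, xi in enumerate(x)]
local_varying = gwr_simple(coords, x, y_varying, bandwidth=1.0)
```

In the first case every local fit recovers the same coefficients; in the second the local slopes drift from about 1 to about 3 along the line. Estimating one set of parameters per location is what makes the method not parsimonious.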
Methods
Multivariate analysis
Spatially filtered regression
Spatial econometrics
Spatial lag model (real contagion):
Value of the dependent variable in one area is influenced by the values of that variable in the surrounding neighborhood;
A weighted average of the dependent value for the neighborhood location is introduced as an additional covariate.
Spatial error model (false contagion model):
Omitted covariates;
Autoregressive error term is included.
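The extra covariate in the spatial lag model is the spatial lag Wy, the weighted average of the dependent variable over each location's neighbors. The sketch below computes it and simulates data from y = ρWy + βx + ε. The assumptions are mine: locations on a line with row-standardized weights, made-up parameter values, hypothetical function names, and a fixed-point iteration that converges when |ρ| < 1.

```python
import random

def chain_weights(n):
    """Row-standardized weights for n locations on a line; neighbors are i-1 and i+1."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nbrs = [j for j in (i - 1, i + 1) if 0 <= j < n]
        for j in nbrs:
            w[i][j] = 1.0 / len(nbrs)
    return w

def spatial_lag(w, y):
    """Weighted average of y over the neighbors of each location (the vector Wy)."""
    return [sum(wij * yj for wij, yj in zip(row, y)) for row in w]

def simulate_lag_model(w, x, beta, rho, eps, n_iter=200):
    """Solve y = rho * Wy + beta * x + eps by repeated substitution."""
    y = [beta * xi + e for xi, e in zip(x, eps)]
    for _ in range(n_iter):
        wy = spatial_lag(w, y)
        y = [rho * lag + beta * xi + e for lag, xi, e in zip(wy, x, eps)]
    return y

rng = random.Random(1)
n = 20
w = chain_weights(n)
x = [rng.uniform(0, 10) for _ in range(n)]
eps = [rng.gauss(0, 1) for _ in range(n)]

y = simulate_lag_model(w, x, beta=2.0, rho=0.5, eps=eps)
wy = spatial_lag(w, y)   # the additional covariate of the spatial lag model
```

Because Wy depends on y itself, ordinary least squares with Wy as a regressor is biased, and such models are usually estimated by maximum likelihood or instrumental variables. The spatial error model applies the same autoregressive structure to the error term instead, u = λWu + ε, which `simulate_lag_model` can mimic by setting `beta=0`.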
Spatial Analysis & Policy Making
“…although basic science is directed at the discovery of general principles, the ultimate value of such knowledge, apart from simple curiosity, lies in our ability to apply it to local conditions and, thus, determine specific outcomes. Although such science may itself be placeless, the application of scientific knowledge in policy inevitably requires explicit attention to spatial variation, particularly when the basis of policy is local.”
(Goodchild, Anselin, Appelbaum and Harthorn 2000: 142)