GIS and Spatial Statistics: Methods and Applications in Public Health Marcia Castro Assistant Professor of Demography Harvard School of Public Health Institute

GIS and Spatial Statistics: Methods and Applications in Public Health

Marcia Castro

Assistant Professor of Demography

Harvard School of Public Health

Institute of Behavioral Science, Computing and Research Services, and the Social Sciences Data Lab

University of Colorado at Boulder - March 11, 2008

Spatial Statistics

First Law of Geography

(Tobler 1979)

“Everything is related to everything else, but near things are more related than distant things.”

Types of research questions

Spatial determinants of transmission

Spatial associations of risk factors with disease and interaction with temporal processes

Origins of diseases and outbreaks

Spatial and temporal distribution of disease and risk factors

Planning of surveillance program and targeting control activities

Improved allocation of limited resources

Types of spatial data

Points:

Events – crimes, accidents, flu cases

Sample from a surface – air quality monitors, house sales

Objects – county centroids

Area:

Aggregates of events – accidents per census tract

Summary measures – density, mean house value

Spatial Pattern Analysis

Some attributes:

Testing of hypotheses

Hypothesis generation

Pattern evolution

Pattern prediction

Clustering

Test spatial regression assumptions

Cannot unequivocally determine cause and effect

Cannot assign meaning to spatial relationships

Problems / Challenges

Modifiable areal unit problem (MAUP)

Scale effect – spatial data analysis at different scales may produce different results

Zoning effect – regrouping zones at a given scale may produce different results

Optimal neighborhood size

Alternative zoning schemes
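The zoning effect can be illustrated with a small sketch. The counts below are made up for illustration, and the names `zone_rates`, `vertical_pairs`, and `horizontal_pairs` are hypothetical. The same eight units are grouped into four zones in two different ways, and the zone-level rates change even though the underlying data do not.

```python
# Hypothetical counts for 8 small units on a 2 x 4 grid (row-major order):
#   row 0: units 0 1 2 3
#   row 1: units 4 5 6 7
cases = [1, 9, 1, 9,
         1, 9, 1, 9]
population = [100] * 8


def zone_rates(zones):
    """Aggregate units into zones; return cases per 1,000 people in each zone."""
    rates = []
    for zone in zones:
        zone_cases = sum(cases[u] for u in zone)
        zone_pop = sum(population[u] for u in zone)
        rates.append(1000 * zone_cases / zone_pop)
    return rates


# Two zoning schemes at the same scale: four zones of two adjacent units each.
vertical_pairs = [(0, 4), (1, 5), (2, 6), (3, 7)]
horizontal_pairs = [(0, 1), (2, 3), (4, 5), (6, 7)]

rates_vertical = zone_rates(vertical_pairs)
rates_horizontal = zone_rates(horizontal_pairs)
print("vertical zoning:  ", rates_vertical)
print("horizontal zoning:", rates_horizontal)
```

In this toy layout the vertical zoning shows strong contrasts between zones, while the horizontal zoning makes every zone look identical.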

Problems / Challenges

Spatial dependence: Tobler’s law

Spatial heterogeneity: uneven distributions at the global scale

Boundary problems

Missing data

Confidentiality: collection, analysis, publication, data sharing

Disclosure risk: methods to mask data

Spatial Autocorrelation

Null hypothesis:

Spatial randomness

Values observed at one location do not depend on values observed at neighboring locations

Observed spatial pattern of values is equally likely as any other spatial pattern

The location of values may be altered without affecting the information content of the data
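One common way to test this null hypothesis is a permutation test: if locations carry no information, shuffling the values across locations should produce statistics similar to the observed one. The sketch below is a minimal illustration that uses global Moran's I as the test statistic. The data, the chain-shaped neighborhood, and the function names are assumptions made for the example, not taken from the slides.

```python
import random


def morans_i(x, w):
    """Global Moran's I for values x and a spatial weights matrix w (list of lists)."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    s0 = sum(sum(row) for row in w)  # sum of all weights
    num = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(v * v for v in z)
    return (n / s0) * num / den


def permutation_p_value(x, w, n_perm=999, seed=0):
    """One-sided pseudo p-value for positive autocorrelation under random relabeling."""
    rng = random.Random(seed)
    observed = morans_i(x, w)
    shuffled = list(x)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign values to locations at random
        if morans_i(shuffled, w) >= observed:
            at_least_as_large += 1
    return observed, (at_least_as_large + 1) / (n_perm + 1)


# Hypothetical example: 10 locations along a line, neighbors are adjacent locations,
# and values increase steadily from one end to the other.
n = 10
w = [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]
x = list(range(1, n + 1))

observed, p_value = permutation_p_value(x, w)
print(observed, p_value)
```

A small pseudo p-value indicates that the observed arrangement is unusual among random rearrangements of the same values.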

Figure: three example spatial patterns (regular, random, aggregated)

Spatial autocorrelation

Formal test of match between locational similarity and value similarity

Locational similarity defined by spatial weights: binary or standardized

Types of neighborhoods:

Contiguity (common boundary)

Distance (distance band, K-nearest neighbors)

General weights (social distance, distance decay)
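As a rough sketch of how such weights might be built, the code below constructs binary distance-band and K-nearest-neighbor weights from point coordinates and then row-standardizes them. The coordinates and function names are hypothetical, and the matrices are plain lists of lists rather than any particular library's weights object.

```python
import math


def distance_band_weights(coords, d):
    """Binary weights: 1 if two distinct locations are within distance d, else 0."""
    n = len(coords)
    return [[1 if i != j and math.dist(coords[i], coords[j]) <= d else 0
             for j in range(n)] for i in range(n)]


def knn_weights(coords, k):
    """Binary weights: 1 for each of the k nearest neighbors of a location."""
    n = len(coords)
    w = [[0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(coords[i], coords[j]))
        for j in others[:k]:
            w[i][j] = 1
    return w


def row_standardize(w):
    """Divide each row by its sum so that the weights of every location add to 1."""
    out = []
    for row in w:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out


# Hypothetical coordinates for four locations.
coords = [(0, 0), (1, 0), (2, 0), (0, 1)]
w_band = distance_band_weights(coords, d=1.0)
w_knn = knn_weights(coords, k=2)
w_std = row_standardize(w_band)
print(w_band)
print(w_knn)
print(w_std)
```

Distance-band weights are symmetric, while K-nearest-neighbor weights generally are not: a location can be among another's nearest neighbors without the reverse being true.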

Spatial autocorrelation

Test for the presence of spatial autocorrelation:

Global

Local: LISA – Local Indicators of Spatial Autocorrelation

Local spatial autocorrelation – LISA

Moran’s I_i, Geary’s c_i, K_i

Test of complete spatial randomness (CSR) against positive and negative autocorrelation:

positive - similar values (either high or low) are spatially clustered

negative - neighboring values are dissimilar
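A minimal sketch of the local Moran statistic I_i follows, assuming row-standardized weights and the common scaling by the second moment of the deviations. The data and the chain-shaped neighborhood are hypothetical. Positive I_i marks a location surrounded by similar values, negative I_i a location whose neighbors are dissimilar. Local Geary's c_i and K_i are not shown.

```python
def local_morans_i(x, w):
    """Local Moran's I_i = (z_i / m2) * sum_j w_ij * z_j, with z the deviations from the mean."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    m2 = sum(v * v for v in z) / n
    return [z[i] / m2 * sum(w[i][j] * z[j] for j in range(n)) for i in range(n)]


# Hypothetical values for 6 areas along a line: three high values, then three low values.
x = [10, 9, 8, 1, 2, 1]
n = len(x)

# Row-standardized weights: each area's neighbors are the adjacent areas.
w = []
for i in range(n):
    neighbors = [j for j in (i - 1, i + 1) if 0 <= j < n]
    w.append([1 / len(neighbors) if j in neighbors else 0.0 for j in range(n)])

local_i = local_morans_i(x, w)
# With row-standardized weights and this scaling, the local values average to the global I.
global_i = sum(local_i) / n
print(local_i)
print(global_i)
```

In practice each I_i would also need a significance assessment, for example by conditional permutation, which this sketch omits.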

Figure: Moran scatterplot. Horizontal axis: X (from -1.5 to 1.5); vertical axis: Spatial Lag (X) (from -1.5 to 1.5). Quadrants:

++ (High-High): above-mean values, neighbors' mean above the mean

-+ (Low-High): below-mean values, neighbors' mean above the mean

-- (Low-Low): below-mean values, neighbors' mean below the mean

+- (High-Low): above-mean values, neighbors' mean below the mean
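The four quadrants of the Moran scatterplot can be assigned programmatically by comparing each value and its spatial lag with the overall mean. The sketch below assumes row-standardized weights, so the spatial lag is the average deviation of the neighbors. The data and function names are hypothetical.

```python
def quadrant(z_i, lag_i):
    """Label a location by the sign of its deviation and of its neighbors' mean deviation."""
    if z_i >= 0:
        return "High-High" if lag_i >= 0 else "High-Low"
    return "Low-High" if lag_i >= 0 else "Low-Low"


def moran_quadrants(x, w):
    """Return the Moran scatterplot quadrant of every location."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    lag = [sum(w[i][j] * z[j] for j in range(n)) for i in range(n)]
    return [quadrant(z_i, lag_i) for z_i, lag_i in zip(z, lag)]


# Hypothetical values for 6 areas along a line; neighbors are the adjacent areas.
x = [10, 9, 1, 9, 1, 1]
n = len(x)
w = []
for i in range(n):
    neighbors = [j for j in (i - 1, i + 1) if 0 <= j < n]
    w.append([1 / len(neighbors) if j in neighbors else 0.0 for j in range(n)])

labels = moran_quadrants(x, w)
print(labels)
```

High-High and Low-Low locations contribute to positive autocorrelation; Low-High and High-Low locations are candidate spatial outliers.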

Local spatial autocorrelation – LISA

Gi(d)

Does not consider the value of location i itself

Used for spread or diffusion studies

Useful for focal clustering, e.g. cholera infection around a specific water source

Gi*(d)

Takes the value of location i into account

Most appropriate for the identification of clusters of high and low values

Choice of d is not straightforward
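A minimal sketch of the standardized Gi*(d) statistic is given below, assuming binary weights within distance d and the usual z-score form of the statistic. The data, coordinates, and function name are hypothetical. The Gi(d) variant would instead exclude location i from its own neighborhood and from the mean and variance; it is not shown here.

```python
import math


def getis_ord_gi_star(x, coords, d):
    """Standardized Gi*(d): z-score of the weighted sum of values within distance d of i.

    Location i counts as its own neighbor (distance 0 <= d), which is what
    distinguishes Gi* from Gi.
    """
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum(v * v for v in x) / n - mean ** 2)
    scores = []
    for i in range(n):
        w = [1.0 if math.dist(coords[i], coords[j]) <= d else 0.0 for j in range(n)]
        sum_w = sum(w)
        sum_w2 = sum(v * v for v in w)
        numerator = sum(wj * xj for wj, xj in zip(w, x)) - mean * sum_w
        denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        scores.append(numerator / denominator)
    return scores


# Hypothetical example: 8 locations on a line with a group of high values at one end.
coords = [(float(i), 0.0) for i in range(8)]
x = [1, 1, 1, 1, 1, 10, 12, 11]

z_scores = getis_ord_gi_star(x, coords, d=1.0)
print(z_scores)
```

Large positive scores suggest a cluster of high values and large negative scores a cluster of low values. Rerunning with a different `d` changes each neighborhood and can change which locations stand out, which is one reason the choice of d is not straightforward.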

Local Statistics

Figure: two maps of a study region divided into 24 areas, labeled (1) to (24)


Multiple and dependent tests

Two sources of spatial dependence:

Geometric

Between the values of nearby locations

Multiple comparison correction

Conservative – Bonferroni, Sidak

Probability that a true null hypothesis is incorrectly rejected - Type I error

False Discovery Rate (FDR): proportion of null hypotheses incorrectly rejected among all those that were rejected

Q = V / (V + S): proportion of rejected hypotheses that are erroneously rejected, where V is the number of true null hypotheses rejected and S is the number of false null hypotheses rejected

FDR defined as the mean of Q: FDR = E(Q) = E[V / (V + S)]
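The corrections named above can be compared on a small set of p-values. The sketch below uses the Benjamini-Hochberg step-up procedure as one standard way of controlling the FDR; the slides do not say which FDR procedure was used, so that choice, the p-values, and the function names are assumptions. Note also that the usual guarantee for this procedure assumes independent or positively dependent tests.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject when p <= alpha / m."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def sidak(pvals, alpha=0.05):
    """Reject when p <= 1 - (1 - alpha)^(1/m)."""
    m = len(pvals)
    threshold = 1 - (1 - alpha) ** (1 / m)
    return [p <= threshold for p in pvals]


def benjamini_hochberg(pvals, alpha=0.05):
    """Reject the k smallest p-values, where k is the largest rank with p_(k) <= (k/m) * alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            k_max = rank
    reject = [False] * m
    for i in order[:k_max]:
        reject[i] = True
    return reject


# Hypothetical p-values from 10 local tests.
pvals = [0.001, 0.008, 0.012, 0.019, 0.03, 0.045, 0.2, 0.5, 0.7, 0.9]

n_unadjusted = sum(p <= 0.05 for p in pvals)
n_bonferroni = sum(bonferroni(pvals))
n_sidak = sum(sidak(pvals))
n_fdr = sum(benjamini_hochberg(pvals))
print(n_unadjusted, n_bonferroni, n_sidak, n_fdr)
```

In this example the FDR procedure rejects fewer hypotheses than the unadjusted tests but more than the conservative corrections.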

FDR & Local Statistics

Figure: clusters detected under each approach (Unadjusted, Bonferroni, Bonferroni with v, FDR), shown in three panels: (a) Scenario (ii), d=3; (b) Scenario (iii), d=2; (c) Scenario (iv), d=2. Legend: clusters fully identified, clusters partially identified, clusters missed, false clusters identified.

Methods

Geostatistics: Semivariogram & Kriging

Weight the surrounding measured values to derive a prediction for each location

Weights are obtained from the semivariogram

Semivariogram

Figure: semivariogram γ(h) plotted against distance h, marking the nugget c0, the range a, and the sill σ²Z. Distances shorter than the range are labeled “world of autocorrelation”; distances beyond it are labeled “world of independence”.

Nugget effect - A discontinuity at the origin generated by micro-scale variation and/or measurement error

Range - The distance beyond which the observed points are not correlated anymore

Sill - The limit of the semivariogram as the distance h tends to infinity
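To make the three parameters concrete, here is a sketch of one commonly used semivariogram model, the spherical model. The slides do not name a specific model, so this choice is an assumption, as are the parameter names `nugget`, `partial_sill`, and `range_a` and the example values.

```python
def spherical_semivariogram(h, nugget, partial_sill, range_a):
    """Spherical model: rises from the nugget and levels off at the sill once h reaches the range.

    sill = nugget + partial_sill
    """
    if h == 0:
        return 0.0  # the semivariogram is zero at zero distance; the nugget is a jump just after
    if h >= range_a:
        return nugget + partial_sill
    r = h / range_a
    return nugget + partial_sill * (1.5 * r - 0.5 * r ** 3)


# Hypothetical parameters: nugget 0.2, sill 1.0, range 10 distance units.
nugget, partial_sill, range_a = 0.2, 0.8, 10.0
for h in (0.0, 0.001, 5.0, 10.0, 20.0):
    print(h, spherical_semivariogram(h, nugget, partial_sill, range_a))
```

Just beyond zero distance the value is already close to the nugget, and from the range onward it stays at the sill.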

Creating the empirical semivariogram

Empirical values
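The empirical values are typically obtained by grouping pairs of observations by their separation distance and taking half the average squared difference within each group. The sketch below follows that classical estimator; the binning scheme, data, and function name are assumptions made for the example.

```python
import math


def empirical_semivariogram(coords, values, lag_width, n_lags):
    """Return (lag distance, semivariance, number of pairs) for each non-empty distance bin.

    Bin k collects pairs whose distance is closest to k * lag_width.
    """
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            h = math.dist(coords[i], coords[j])
            k = int(h / lag_width + 0.5)  # nearest lag bin
            if k < n_lags:
                sums[k] += (values[i] - values[j]) ** 2
                counts[k] += 1
    return [(k * lag_width, sums[k] / (2 * counts[k]), counts[k])
            for k in range(n_lags) if counts[k] > 0]


# Hypothetical measurements at five points along a line.
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
values = [1, 2, 4, 7, 11]

semivariogram = empirical_semivariogram(coords, values, lag_width=1.0, n_lags=3)
for lag, gamma, pairs in semivariogram:
    print(lag, gamma, pairs)
```

Bins with few pairs give unstable estimates, so the lag width is usually chosen to keep a reasonable number of pairs in each bin.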

Directional Influence (Anisotropy)

Fitting a model to the empirical semivariogram

Figure: fitted model and empirical values

Kriging

BLUE (best linear unbiased estimator)

Different models, e.g. Cokriging

Prediction error
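As a sketch of how the semivariogram yields both the weights and the prediction error, the code below implements ordinary kriging for a single target location. It assumes a spherical semivariogram with hypothetical parameters and uses a small hand-written linear solver to stay self-contained. All names and data are illustrative.

```python
import math


def spherical(h, nugget=0.0, partial_sill=1.0, range_a=5.0):
    """Assumed semivariogram model (spherical) with hypothetical default parameters."""
    if h == 0:
        return 0.0
    if h >= range_a:
        return nugget + partial_sill
    r = h / range_a
    return nugget + partial_sill * (1.5 * r - 0.5 * r ** 3)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (a assumed non-singular)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ordinary_kriging(coords, values, target, gamma=spherical):
    """Return (prediction, kriging variance, weights) at the target location."""
    n = len(values)
    # Semivariances between data points, bordered by the unbiasedness constraint.
    a = [[gamma(math.dist(coords[i], coords[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    # Semivariances between each data point and the target.
    b = [gamma(math.dist(c, target)) for c in coords] + [1.0]
    solution = solve(a, b)
    weights, lagrange = solution[:n], solution[n]
    prediction = sum(wt * v for wt, v in zip(weights, values))
    variance = sum(wt * g for wt, g in zip(weights, b[:n])) + lagrange
    return prediction, variance, weights


# Hypothetical data: two measurements, target halfway between them.
prediction, variance, weights = ordinary_kriging(
    coords=[(0.0, 0.0), (2.0, 0.0)], values=[10.0, 20.0], target=(1.0, 0.0))
print(prediction, variance, weights)
```

The constraint that the weights sum to one is what makes the predictor unbiased, and the kriging variance is the prediction error reported alongside each predicted value.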

Methods

Multivariate analysis

The presence of spatial autocorrelation violates the independence assumption of standard linear regression models

Checking residuals – Moran’s I

Geographically weighted regression

Local estimates of regression parameters

Spatial weights – distance-decay kernel functions

Not parsimonious
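A minimal sketch of geographically weighted regression with a single covariate is shown below. At every location it fits a weighted least-squares line, with weights from a Gaussian distance-decay kernel. The kernel form, the fixed bandwidth, the data, and the function name are assumptions for illustration.

```python
import math


def gwr_simple(locations, x, y, bandwidth):
    """Fit y = intercept + slope * x at each location with Gaussian kernel weights."""
    estimates = []
    for loc_i in locations:
        # Distance-decay weights: nearby observations count more than distant ones.
        w = [math.exp(-0.5 * (math.dist(loc_i, loc_j) / bandwidth) ** 2)
             for loc_j in locations]
        sum_w = sum(w)
        x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sum_w
        y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
        s_xx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
        s_xy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
        slope = s_xy / s_xx
        estimates.append((y_bar - slope * x_bar, slope))
    return estimates


# Hypothetical data: 10 locations on a line; y = x on the left half, y = 3x on the right half.
locations = [(float(i), 0.0) for i in range(10)]
x = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
y = [1, 2, 3, 4, 5, 3, 6, 9, 12, 15]

estimates = gwr_simple(locations, x, y, bandwidth=1.0)
for loc, (intercept, slope) in zip(locations, estimates):
    print(loc, round(intercept, 3), round(slope, 3))
```

The output contains one set of coefficients per location, which illustrates why the approach is not parsimonious: the number of estimated parameters grows with the number of locations.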

Methods

Multivariate analysis

Spatially filtered regression

Spatial econometrics

Spatial lag model (real contagion)

Value of the dependent variable in one area is influenced by the values of that variable in the surrounding neighborhood;

A weighted average of the dependent variable over the neighboring locations is introduced as an additional covariate.

Spatial error model (false contagion model)

Omitted covariates;

Autoregressive error term is included.
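In conventional notation the two specifications can be written as follows. The symbols are assumptions, since the slides give no formulas: W is the spatial weights matrix, ρ and λ are spatial autoregressive coefficients, X holds the covariates, and ε is an independent error term.

```latex
% Spatial lag model: the spatially lagged dependent variable Wy enters as a covariate
y = \rho W y + X\beta + \varepsilon

% Spatial error model: the error term u is spatially autoregressive
y = X\beta + u, \qquad u = \lambda W u + \varepsilon
```

With row-standardized W, each element of Wy is the weighted average of the dependent variable over the neighbors of a location, which corresponds to the additional covariate described for the lag model.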

Spatial Analysis & Policy Making

“…although basic science is directed at the discovery of general principles, the ultimate value of such knowledge, apart from simple curiosity, lies in our ability to apply it to local conditions and, thus, determine specific outcomes. Although such science may itself be placeless, the application of scientific knowledge in policy inevitably requires explicit attention to spatial variation, particularly when the basis of policy is local.”

(Goodchild, Anselin, Appelbaum and Harthorn 2000: 142)
