A Summary of An Introduction to Statistical Problem Solving in Geography Chapter 12: Inferential Spatial Statistics Prepared by W. Bullitt Fitzhugh Geography


Prepared by W. Bullitt Fitzhugh, Geography 3000 - Advanced Geographic Statistics

Dr. Paul Sutton, Winter Quarter, 2010

Inferential Spatial Statistics: Point Pattern and Area Pattern Analysis

In geography, points and areas are commonly used to represent spatial phenomena

Methods exist to determine whether a sample point pattern or sample area pattern follows a random spatial distribution

Randomly distributed spatial patterns are usually not of interest because the underlying phenomenon has “no spatial logic”

Types of Spatial Patterns

Spatial patterns may be: clustered, random, or dispersed

Clustered points or areas have high densities in some locations and low or zero densities in other locations

Dispersed points or areas are (nearly) uniformly distributed across a study area

Random points or areas do not tend towards a clustered or dispersed spatial distribution

In real life, the patterns encountered in a research problem may be ambiguous, combining elements of multiple pattern types

Spatial Patterns Example: Lakeside Trees and Condo Development

Spatial Autocorrelation

Positive spatial autocorrelation occurs when nearby points or areas have similar values (clustered patterns)

Negative spatial autocorrelation occurs when nearby points or areas have dissimilar values (dispersed patterns)

No spatial autocorrelation exists when point or area patterns are randomly distributed

Point Pattern Analysis

• Point Pattern Analyses are useful analytic tools for geographic research problems where the variable(s) at hand are represented by points on a map

• “Nearest Neighbor Analysis” is a statistical procedure for understanding the spacing of points on a map

• “Quadrat Analysis” focuses on the nature of the spatial distribution of the point pattern within the overall study area

• Both methods aim to shed light on the underlying process(es) behind the geographic results

Nearest Neighbor Analysis

• Nearest Neighbor Analysis (NNA) can be used as a descriptive statistic or as a method to test hypotheses about the population from which the sample points were taken

• NNA uses the average nearest neighbor distance as an index of point spacing (descriptive)

– If the pattern is random, the average nearest neighbor distance is NND(r) = 1/[2*SQRT(Density)]

– If the pattern is perfectly dispersed (uniform), the average nearest neighbor distance is NND(d) = 1.07453/SQRT(Density)

– If the pattern is perfectly clustered, the average nearest neighbor distance is NND(c) = 0 (all the points are at the same coordinates)

– Density = Number of Points / Area of Study Area

• A standardized nearest neighbor index is used to compare results from different data sets

– R = Observed Mean Nearest Neighbor Distance / Random Mean Nearest Neighbor Distance

– R ranges from 0 (perfectly clustered) through 1 (random) to 2.149 (perfectly dispersed)
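The expected distances and the standardized index R can be sketched in Python as follows. This is a minimal illustration, assuming the point count and study area are already known; the function names `expected_nnd` and `nearest_neighbor_index` and the 100-point example are hypothetical.

```python
import math

def expected_nnd(n_points: int, area: float) -> dict:
    """Theoretical mean nearest neighbor distances for n points in a study area."""
    density = n_points / area  # Density = Number of Points / Study Area
    return {
        "random": 1 / (2 * math.sqrt(density)),     # NND(r)
        "dispersed": 1.07453 / math.sqrt(density),  # NND(d), perfectly uniform spacing
        "clustered": 0.0,                           # NND(c), all points coincide
    }

def nearest_neighbor_index(observed_nnd: float, n_points: int, area: float) -> float:
    """Standardized index R = observed mean NND / random mean NND."""
    return observed_nnd / expected_nnd(n_points, area)["random"]

# Hypothetical data: 100 points in a 10 x 10 study area (area = 100)
expected = expected_nnd(100, 100.0)
print(expected["random"], expected["dispersed"])   # 0.5 1.07453
print(nearest_neighbor_index(0.8, 100, 100.0))      # 1.6
```

Because R divides by NND(r), a perfectly dispersed pattern gives R = 1.07453 * 2, about 2.149, regardless of density.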

Nearest Neighbor Analysis, Contd.

• The observed average nearest neighbor distance of a data set can be compared to a theoretical average nearest neighbor distance (assuming a random spatial distribution) to test the hypothesis that the points are randomly distributed

• Ho: NND = NND(r)

• Ha: NND not= NND(r) OR NND > NND(r) (dispersed) OR NND < NND(r) (clustered)

• Choice of Ha depends on having a rationale for expecting the pattern to be clustered or dispersed
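A minimal sketch of the hypothesis test, assuming the Clark-Evans standard error s(NND(r)) = 0.26136/SQRT(n*Density) and a normal approximation for z. That standard error formula is not stated in this summary, but it reproduces the standard error of 1.49 reported in the worked example. The function name `nna_z_test` and its `alternative` parameter are hypothetical.

```python
import math

def nna_z_test(observed_nnd: float, n_points: int, area: float,
               alternative: str = "two-sided"):
    """Compare the observed mean NND with the random expectation NND(r)."""
    density = n_points / area
    nnd_r = 1 / (2 * math.sqrt(density))           # expected NND under randomness
    se = 0.26136 / math.sqrt(n_points * density)    # assumed Clark-Evans standard error
    z = (observed_nnd - nnd_r) / se
    # Standard normal tail probabilities via the complementary error function
    upper = 0.5 * math.erfc(z / math.sqrt(2))       # P(Z >= z)
    lower = 0.5 * math.erfc(-z / math.sqrt(2))      # P(Z <= z)
    if alternative == "greater":    # Ha: NND > NND(r), more dispersed than random
        p = upper
    elif alternative == "less":     # Ha: NND < NND(r), more clustered than random
        p = lower
    else:                           # Ha: NND not= NND(r)
        p = 2 * min(upper, lower)
    return z, p

# Hypothetical data: 50 points in a study area of 100 square units
z, p = nna_z_test(0.75, 50, 100.0, alternative="greater")
print(f"z = {z:.2f}, one-tailed p = {p:.3f}")
```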

Nearest Neighbor Analysis, Example

Point   X    Y    Dist. to A   Dist. to B   Dist. to C   Dist. to D   Dist. to E   Dist. to F
A       27   46   0.00         36.67        28.64        19.85        6.00         15.30
B       34   10   36.67        0.00         21.47        31.83        38.28        39.81
C       15   20   28.64        21.47        0.00         13.34        26.68        39.62
D       12   33   19.85        31.83        13.34        0.00         15.81        34.00
E       21   46   6.00         38.28        26.68        15.81        0.00         21.21
F       42   49   15.30        39.81        39.62        34.00        21.21        0.00

Nearest Neighbor Analysis, Example

• Area of Study Area = (Max(X) - Min(X)) * (Max(Y) - Min(Y)) = (42 - 12) * (49 - 10) = 1,170

• Ho: NND = NND(r) (point pattern is random)

• Ha: NND > NND(r) (point pattern is more dispersed than random)

• z(NND) = 3.75 => p-value < .001, Ho rejected

NND(A): E (6.00)
NND(B): C (21.47)
NND(C): D (13.34)
NND(D): C (13.34)
NND(E): A (6.00)
NND(F): A (15.30)

NND (observed mean) 12.58
NND(r) (expected if random) 6.98
s(NND(r)) 1.49
z(NND) 3.75
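The whole worked example can be recomputed from the coordinates in the table. This sketch assumes Euclidean distance, the bounding-box study area defined above, and the Clark-Evans standard error 0.26136/SQRT(n*Density); the helper `nearest_neighbors` is hypothetical. Run on the six points, it finds nearest neighbors A→E, B→C, C→D, D→C, E→A and F→A, a mean NND of about 12.58, NND(r) of about 6.98, a standard error of about 1.49 and z of about 3.75.

```python
import math

# Coordinates of the six example points (from the table above)
points = {"A": (27, 46), "B": (34, 10), "C": (15, 20),
          "D": (12, 33), "E": (21, 46), "F": (42, 49)}

def nearest_neighbors(pts):
    """Return {label: (nearest label, distance)} using Euclidean distance."""
    result = {}
    for a, (xa, ya) in pts.items():
        result[a] = min(((b, math.hypot(xa - xb, ya - yb))
                         for b, (xb, yb) in pts.items() if b != a),
                        key=lambda pair: pair[1])
    return result

nn = nearest_neighbors(points)
n = len(points)
xs = [x for x, _ in points.values()]
ys = [y for _, y in points.values()]
area = (max(xs) - min(xs)) * (max(ys) - min(ys))  # bounding-box study area
density = n / area
nnd = sum(d for _, d in nn.values()) / n           # observed mean NND
nnd_r = 1 / (2 * math.sqrt(density))               # expected NND if random
se = 0.26136 / math.sqrt(n * density)              # assumed Clark-Evans standard error
z = (nnd - nnd_r) / se
p_upper = 0.5 * math.erfc(z / math.sqrt(2))        # one-tailed p for Ha: NND > NND(r)

for label, (nbr, d) in nn.items():
    print(f"NND({label}): {nbr} ({d:.2f})")
print(f"area={area}, NND={nnd:.2f}, NND(r)={nnd_r:.2f}, s={se:.2f}, z={z:.2f}")
```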

Quadrat Analysis

• Quadrat Analysis focuses on the frequency of points occurring in defined parts (quadrats) of the study area

• A grid of quadrats is superimposed over the study area, and the number of points in each quadrat is examined

• Based on the variability of quadrat (cell) frequencies

• The point pattern in the whole study area is described through the analysis of point frequencies in each quadrat

Quadrat Analysis, Contd.

• Variance Mean Ratio (VMR) = (Variance of Cell Frequencies)/(Mean Cell Frequency)

• Dispersed distribution of points: cell frequencies should be similar, so VMR is near zero

• Clustered distribution of points: most cells have low or zero frequencies while a few cells have many points, so VMR is large

• Random distribution of points: the variance of cell frequencies should be near the mean cell frequency, so VMR is near 1
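A small sketch of the VMR calculation, assuming the variance of cell frequencies uses the sample (m - 1) denominator, as Python's `statistics.variance` does. The two nine-cell count lists are hypothetical and illustrate the dispersed and clustered extremes.

```python
from statistics import mean, variance

def variance_mean_ratio(cell_counts):
    """VMR = (variance of cell frequencies) / (mean cell frequency)."""
    # statistics.variance divides by (m - 1), where m is the number of cells
    return variance(cell_counts) / mean(cell_counts)

# Hypothetical counts for 9 cells (quadrats), 18 points in each pattern
dispersed = [2, 2, 2, 2, 2, 2, 2, 2, 2]    # every cell holds the same number of points
clustered = [0, 0, 0, 0, 18, 0, 0, 0, 0]   # all points fall in a single cell
print(variance_mean_ratio(dispersed))      # 0.0
print(variance_mean_ratio(clustered))      # 18.0
```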

Quadrat Analysis, Contd.

• VMR can be used as an inferential test statistic

– Under Ho, the test statistic follows a chi-square distribution

– It is a function of VMR and the number of cells m: X^2 = VMR*(m-1), with m - 1 degrees of freedom

• Ho: VMR = 1 (point pattern is random)

• Ha: VMR not= 1 OR VMR > 1 (clustered) OR VMR < 1 (dispersed)

• Quadrat Analysis is a worthwhile approach only when a large number of points is spread across a large number of cells
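A minimal sketch of the chi-square statistic, assuming the cell-frequency variance uses the sample (m - 1) denominator; the function name `vmr_chi_square` and the twelve cell counts are hypothetical.

```python
from statistics import mean, variance

def vmr_chi_square(cell_counts):
    """Return (X^2, degrees of freedom) for Ho: VMR = 1, where X^2 = VMR * (m - 1)."""
    m = len(cell_counts)                              # number of cells (quadrats)
    vmr = variance(cell_counts) / mean(cell_counts)   # sample variance / mean
    return vmr * (m - 1), m - 1

# Hypothetical point counts for m = 12 cells
counts = [3, 1, 4, 0, 2, 5, 2, 1, 3, 4, 2, 3]
chi2_stat, df = vmr_chi_square(counts)
print(f"X^2 = {chi2_stat:.2f} with {df} degrees of freedom")   # X^2 = 9.20 with 11 degrees of freedom
```

The resulting X^2 is compared with chi-square critical values for m - 1 degrees of freedom; if SciPy is available, `scipy.stats.chi2.sf(chi2_stat, df)` gives the upper-tail probability, which is the one-tailed p-value for Ha: VMR > 1.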

Area Pattern Analysis

• Aspects of Area Pattern Analysis are analogous to Point Pattern Analysis

• The basic statistic for the analysis of area patterns is the “join count”

– A join is operationally defined as two areas sharing a common boundary

– Measure of spatial autocorrelation for nominal areal data

– A familiar GIS function

• Binary categories are used in the pattern analysis

– Each individual area is assigned a black or white value

– Clustering occurs if areas with the same binary value are contiguous

– Dispersion occurs if the number of black-white joins is greater than the number of same-category joins

– Randomness occurs if the number of black-white joins is close to the number expected by chance (roughly half of all joins when black and white areas are equally common)
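Counting joins by type is mechanical once each area's value and its neighbors are known. A minimal sketch, assuming the map is given as a dictionary of black/white values plus a list of area pairs that share a boundary; the function name `count_joins`, the 2 x 3 grid and both colorings are hypothetical.

```python
def count_joins(colors, adjacency):
    """Count total joins J and joins by type (BB, WW, BW) for a binary area map.

    colors: {area_id: "B" or "W"}
    adjacency: iterable of (area_id, area_id) pairs that share a common boundary
    """
    joins = {tuple(sorted(pair)) for pair in adjacency}  # each shared boundary counted once
    counts = {"BB": 0, "WW": 0, "BW": 0}
    for a, b in joins:
        if colors[a] != colors[b]:
            counts["BW"] += 1            # dissimilar (black-white) join
        else:
            counts[colors[a] * 2] += 1   # "BB" or "WW": same-category join
    return len(joins), counts

# Hypothetical 2 x 3 grid of areas (1 2 3 / 4 5 6) with shared-edge adjacency
grid = [(1, 2), (2, 3), (4, 5), (5, 6), (1, 4), (2, 5), (3, 6)]
checkerboard = {1: "B", 2: "W", 3: "B", 4: "W", 5: "B", 6: "W"}
two_blocks = {1: "B", 2: "B", 3: "B", 4: "W", 5: "W", 6: "W"}
print(count_joins(checkerboard, grid))   # (7, {'BB': 0, 'WW': 0, 'BW': 7})
print(count_joins(two_blocks, grid))     # (7, {'BB': 2, 'WW': 2, 'BW': 3})
```

The checkerboard (dispersed) coloring makes every join black-white, while the two-block (clustered) coloring leaves only 3 of the 7 joins black-white.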

Area Pattern Analysis, Contd.

• How do you determine the number of expected black-white joins?

– The “free sampling” approach relies on theoretical background to inform the probability of a given area having a black or white value

• The probability of a black or white value for a cell follows a binomial distribution, with probability p for one value and q = 1 - p for the other

– The “non-free sampling” approach does not rely on any information outside of the study area

• The probability of a black or white cell is based on the number of black and white cells in the study area

– When unsure which approach to take, use non-free sampling
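The two sampling approaches give different expected numbers of black-white joins. This sketch assumes the standard free-sampling result E(BW) = 2*J*p*q, which is not stated in this summary, alongside the non-free formula EBW = 2*J*B*W/[N*(N - 1)]. The function names and the numbers in the example are hypothetical.

```python
def expected_bw_free(J, p):
    """Free sampling: E(BW) = 2 * J * p * q, with p set from theoretical background."""
    q = 1 - p
    return 2 * J * p * q

def expected_bw_nonfree(J, B, W):
    """Non-free sampling: E(BW) = 2 * J * B * W / [N * (N - 1)], with N = B + W."""
    N = B + W
    return 2 * J * B * W / (N * (N - 1))

# Hypothetical map: 7 joins among 3 black and 3 white areas
print(expected_bw_free(7, 0.5))      # 3.5
print(expected_bw_nonfree(7, 3, 3))  # 4.2
```

The non-free value is higher here because, with only six areas, drawing one black area lowers the chance that its neighbor is also black.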

Area Pattern Analysis: Non-free Sampling Test

• Ho: Observed number of black-white joins (OBW) = expected number of random black-white joins (EBW)

• Ha: OBW not= EBW OR OBW > EBW (dispersed) OR OBW < EBW (clustered)

• Choice of Ha depends on having a rationale

– If there is no rationale for choosing a direction, use a two-tailed test

Area Pattern Analysis: Non-free Sampling Test

• EBW = 2*J*B*W / [N*(N - 1)]

– J = Total Number of Joins

– B = Number of Black Areas

– W = Number of White Areas

– N = Total Number of Areas

• Test statistic Z = (OBW – EBW)/s(BW)

• s(BW) is the standard error of the expected number of black-white joins

– Given by formula 12.12 in McGrew (messy)

• A worked example is found on pages 185-189 of the text
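The non-free test reduces to a few lines once s(BW) is known. Because formula 12.12 is not reproduced in this summary, this sketch takes s(BW) as an input rather than computing it; the function name `join_count_z_test` and the placeholder s_bw = 1.0 are hypothetical.

```python
import math

def join_count_z_test(obw, J, B, W, s_bw):
    """Non-free sampling join count test: z = (OBW - EBW) / s(BW), two-tailed p-value.

    s_bw is the standard error from formula 12.12 in McGrew; it must be supplied
    by the caller because this sketch does not implement that formula.
    """
    N = B + W
    ebw = 2 * J * B * W / (N * (N - 1))               # expected black-white joins
    z = (obw - ebw) / s_bw
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2))   # 2 * P(Z >= |z|)
    return ebw, z, p_two_tailed

# Hypothetical inputs; s_bw = 1.0 is a placeholder, not a value from the text
ebw, z, p = join_count_z_test(obw=3, J=7, B=3, W=3, s_bw=1.0)
print(f"EBW = {ebw:.2f}, z = {z:.2f}, p = {p:.3f}")
```

A negative z means fewer black-white joins than expected (clustering); a positive z means more than expected (dispersion).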
