Point Pattern Analysis using Spatial Inferential Statistics Briggs Henan University 2010 1


Page 1: Point Pattern Analysis using Spatial Inferential Statistics Briggs Henan University 2010 1

Point Pattern Analysis using

Spatial Inferential Statistics

Briggs Henan University 2010


Page 2: Point Pattern Analysis using Spatial Inferential Statistics Briggs Henan University 2010 1

Last time

• Concept of statistical inference
– Drawing conclusions about populations from samples
– Null hypothesis of no difference
– Alternative hypotheses (which we really want to accept)
• Random point pattern
• Is our observed point pattern “significantly different from random”?

Page 3: Point Pattern Analysis using Spatial Inferential Statistics Briggs Henan University 2010 1

How Point Pattern Analysis (PPA) is different

From Centrographic Statistics (previously):
– Centrographic Statistics calculates single, summary measures
– PPA analyses the complete set of points

From Spatial Autocorrelation (discussed later):
– With PPA, the points have location only; there is no “magnitude” value
– With Spatial Autocorrelation, points have different magnitudes; there is an attribute variable

Page 4: Point Pattern Analysis using Spatial Inferential Statistics Briggs Henan University 2010 1

Approaches to Point Pattern Analysis

Two primary approaches:
• Point Density using Quadrat Analysis
– Based on polygons: analyze points using polygons!
– Uses the frequency distribution or density of points within a set of grid squares
• Point Association using Nearest Neighbor Analysis
– Based on points
– Uses distances between the points

Although the above would suggest that the first approach examines first order effects and the second approach examines second order effects, in practice the two cannot be separated.

Page 5: Point Pattern Analysis using Spatial Inferential Statistics Briggs Henan University 2010 1

Quadrat Analysis: The problem of selecting quadrat size

• Too small: many quadrats with zero points
• Too big: many quadrats have a similar number of points
• O.K.: a size between these two extremes

Length of quadrat edge = [formula not preserved in this transcript], where A = study area and N = number of points

Choosing the quadrat size is an instance of the Modifiable Areal Unit Problem.

Page 6: Point Pattern Analysis using Spatial Inferential Statistics Briggs Henan University 2010 1

Types of Quadrats

Quadrats don’t have to be square, and their size has a big influence.
Multiple ways to create quadrats, and results can differ accordingly!
• Uniform grid: used for secondary data
• Random sampling: useful in field work

Frequency counts by quadrat would be:

Number of points    Census (Q = 64)        Sampling (Q = 38)
in quadrat          Count   Proportion     Count   Proportion
0                   51      0.797          29      0.763
1                   11      0.172           8      0.211
2                    2      0.031           1      0.026
3                    0      0.000           0      0.000
Total               64                     38

Q = # of quadrats
P = # of points = 15
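A minimal sketch of how a frequency table like the one above could be produced for a uniform grid. The function names, the grid parameters, and the four sample points are hypothetical and not taken from the lecture; the sketch assumes every point lies inside the grid extent.

```python
from collections import Counter

def quadrat_counts(points, x_min, y_min, edge, n_cols, n_rows):
    """Count the points falling in each cell of a uniform grid.

    Assumes all points lie inside the grid; points exactly on the
    upper or right boundary are assigned to the last row or column.
    """
    counts = Counter()
    for x, y in points:
        col = min(int((x - x_min) // edge), n_cols - 1)
        row = min(int((y - y_min) // edge), n_rows - 1)
        counts[(row, col)] += 1
    # Return one count per cell, including empty cells.
    return [counts.get((r, c), 0) for r in range(n_rows) for c in range(n_cols)]

def frequency_table(cell_counts):
    """Map 'number of points in quadrat' -> (quadrat count, proportion)."""
    q = len(cell_counts)
    freq = Counter(cell_counts)
    return {k: (freq[k], freq[k] / q) for k in range(max(cell_counts) + 1)}

# Hypothetical data: 4 points on a 4 x 4 grid of unit squares.
points = [(0.5, 0.5), (0.6, 0.4), (1.5, 0.5), (3.2, 3.9)]
cells = quadrat_counts(points, 0.0, 0.0, 1.0, 4, 4)
table = frequency_table(cells)
for k, (count, proportion) in table.items():
    print(k, count, round(proportion, 3))
```

Rerunning the same points with a different `edge` is a quick way to see how strongly the frequency table depends on quadrat size.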

Page 7: Point Pattern Analysis using Spatial Inferential Statistics Briggs Henan University 2010 1

Quadrat Analysis: Variance/Mean Ratio (VMR)

• Apply a uniform or random grid over the area (A), with the width of each square given by: [formula not preserved in this transcript], where A = area of region and n = # of points
• Treat each cell as an observation and count the number of points within it, to create the variable X
• Calculate the variance and mean of X, and create the variance to mean ratio: variance / mean
• For a uniform distribution, the variance is zero.
– Therefore, we expect a variance-mean ratio close to 0
• For a random distribution, the variance and mean are the same.
– Therefore, we expect a variance-mean ratio around 1
• For a clustered distribution, the variance is relatively large.
– Therefore, we expect a variance-mean ratio above 1

See following slide for example. See O&U p 98-100 for another example.

Page 8: Point Pattern Analysis using Spatial Inferential Statistics Briggs Henan University 2010 1

Note: N = number of quadrats = 10; Ratio = Variance / Mean

Formulae for variance:

\[ s^2 = \frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N - 1} \]

\[ s^2 = \frac{\sum_{i=1}^{N} X_i^2 - \left[ \left( \sum_{i=1}^{N} X_i \right)^2 / N \right]}{N - 1} \]

Number of points per quadrat, laid out as on the maps (2 columns x 5 rows):

RANDOM      CLUSTERED     UNIFORM/DISPERSED
3   1        0    0       2   2
5   0        0    0       2   2
2   1       10   10       2   2
1   3        0    0       2   2
3   1        0    0       2   2

             RANDOM          CLUSTERED        UNIFORM
Quadrat #    x      x^2      x      x^2       x      x^2
1            3      9        0      0         2      4
2            1      1        0      0         2      4
3            5      25       0      0         2      4
4            0      0        0      0         2      4
5            2      4        10     100       2      4
6            1      1        10     100       2      4
7            1      1        0      0         2      4
8            3      9        0      0         2      4
9            3      9        0      0         2      4
10           1      1        0      0         2      4
Sum          20     60       20     200       20     40

Variance     2.222           17.778           0.000
Mean         2.000           2.000            2.000
Var/Mean     1.111           8.889            0.000
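The worked example above can be reproduced with a few lines of code. This is a minimal sketch, not the lecture's spreadsheet: the function name is hypothetical, and the quadrat counts are the ten values per pattern listed in the tables.

```python
def variance_mean_ratio(counts):
    """Return (variance, mean, variance/mean) for a list of quadrat counts.

    Uses the sample variance with N - 1 in the denominator, as in the
    formulae for variance given on the slide.
    """
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((x - mean) ** 2 for x in counts) / (n - 1)
    return variance, mean, variance / mean

# Quadrat counts from the three example patterns (N = 10 quadrats, 20 points).
random_counts = [3, 1, 5, 0, 2, 1, 1, 3, 3, 1]
clustered_counts = [0, 0, 0, 0, 10, 10, 0, 0, 0, 0]
uniform_counts = [2] * 10

for name, counts in [("random", random_counts),
                     ("clustered", clustered_counts),
                     ("uniform", uniform_counts)]:
    variance, mean, vmr = variance_mean_ratio(counts)
    print(f"{name}: variance={variance:.3f} mean={mean:.3f} VMR={vmr:.3f}")
```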

Page 9: Point Pattern Analysis using Spatial Inferential Statistics Briggs Henan University 2010 1

Significance Test for VMR

• A significance test can be conducted based upon the chi-square frequency distribution
• The test statistic is given by: (sum of squared differences) / mean

\[ \chi^2 = \frac{\sum_{i=1}^{N} X_i^2 - \left[ \left( \sum_{i=1}^{N} X_i \right)^2 / N \right]}{\bar{X}} \]

• The test will ascertain if a pattern is significantly more clustered than would be expected by chance (but does not test for uniformity)
• The values of the test statistic in our cases would be:
– random: [60 - (20^2)/10] / 2 = 10
– uniform: [40 - (20^2)/10] / 2 = 0
– clustered: [200 - (20^2)/10] / 2 = 80
• For degrees of freedom N - 1 = 10 - 1 = 9, the value of chi-square at the 1% level is 21.666.
• Thus, there is only a 1% chance of obtaining a value of 21.666 or greater if the points had been allocated randomly. Since our test statistic for the clustered pattern is 80, we conclude that there is (considerably) less than a 1% chance that the clustered pattern could have resulted from a random process.
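As a rough illustration of the test statistic described above, the sketch below computes (sum of squared differences) / mean for the three example patterns and compares each with the 1% critical value quoted on the slide. The function and constant names are hypothetical; the critical value 21.666 is copied from the slide rather than computed.

```python
def vmr_chi_square(counts):
    """Return (test statistic, degrees of freedom) for quadrat counts.

    statistic = [sum(x^2) - (sum(x))^2 / N] / mean
    """
    n = len(counts)
    mean = sum(counts) / n
    sum_sq_diff = sum(x * x for x in counts) - sum(counts) ** 2 / n
    return sum_sq_diff / mean, n - 1

# 1% critical value of chi-square for 9 degrees of freedom, as quoted on the slide.
CHI2_CRIT_1PCT_DF9 = 21.666

patterns = {
    "random": [3, 1, 5, 0, 2, 1, 1, 3, 3, 1],
    "uniform": [2] * 10,
    "clustered": [0, 0, 0, 0, 10, 10, 0, 0, 0, 0],
}

for name, counts in patterns.items():
    stat, df = vmr_chi_square(counts)
    verdict = "significantly clustered" if stat > CHI2_CRIT_1PCT_DF9 else "not significant"
    print(f"{name}: chi-square={stat:.1f} (df={df}) -> {verdict} at 1%")
```

Note that the statistic equals (N - 1) times the variance/mean ratio, which is why the three values line up with the ratios on the previous slide.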

Page 10: Point Pattern Analysis using Spatial Inferential Statistics Briggs Henan University 2010 1

Quadrat Analysis: Frequency Distribution Comparison

• Rather than base the conclusion on the variance/mean ratio, we can compare observed frequencies in the quadrats (Q = number of quadrats) with expected frequencies that would be generated by
– a random process (modeled by the Poisson frequency distribution)
– a clustered process (e.g. one cell with P points, Q-1 cells with 0 points)
– a uniform process (e.g. each cell has P/Q points)
• The standard Kolmogorov-Smirnov test for comparing two frequency distributions can then be applied (see next slide)
• See Lee and Wong pp. 62-68 for another example and further discussion.

Page 11: Point Pattern Analysis using Spatial Inferential Statistics Briggs Henan University 2010 1

Kolmogorov-Smirnov (K-S) Test

• The test statistic “D” is simply given by:

\[ D = \max \left| \text{Cum. Observed Freq.} - \text{Cum. Expected Freq.} \right| \]

the largest difference (irrespective of sign) between observed cumulative frequency and expected cumulative frequency

• The critical value at the 5% level is given by:

\[ D_{5\%} = \frac{1.36}{\sqrt{Q}} \]

where Q is the number of quadrats

• Expected frequencies for a random spatial distribution are derived from the Poisson frequency distribution and can be calculated with:

\[ p(0) = e^{-\lambda} = \frac{1}{2.71828^{P/Q}} \qquad \text{and} \qquad p(x) = p(x - 1) \cdot \frac{\lambda}{x} \]

where x = number of points in a quadrat, p(x) = the probability of x points, P = total number of points, Q = number of quadrats, and λ = P/Q (the average number of points per quadrat)

See next slide for a worked example for the clustered case.
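The Poisson recurrence above is easy to check numerically. The following is a minimal sketch with a hypothetical function name, using P = 20 points and Q = 10 quadrats (so λ = 2) as in the lecture's example patterns.

```python
import math

def poisson_probs(lam, max_x):
    """Poisson probabilities p(0)..p(max_x) via the recurrence
    p(0) = exp(-lam), p(x) = p(x - 1) * lam / x."""
    probs = [math.exp(-lam)]
    for x in range(1, max_x + 1):
        probs.append(probs[-1] * lam / x)
    return probs

P, Q = 20, 10            # total points and number of quadrats
probs = poisson_probs(P / Q, 10)
for x, p in enumerate(probs):
    print(x, f"{p:.4f}")
```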

Page 12: Point Pattern Analysis using Spatial Inferential Statistics Briggs Henan University 2010 1

Calculation of Poisson Frequencies for Kolmogorov-Smirnov test
CLUSTERED pattern as used in lecture

A           B          C          D             E             F             G             H
Number of   Observed   Total      Observed      Cumulative    Poisson       Cumulative    Absolute
points in   quadrat    points     probability   observed      probability   Poisson       difference
quadrat     count      (=A*B)     (=B/Q)        probability                 probability   (=|E-G|)
0           8          0          0.8000        0.8000        0.1353        0.1353        0.6647
1           0          0          0.0000        0.8000        0.2707        0.4060        0.3940
2           0          0          0.0000        0.8000        0.2707        0.6767        0.1233
3           0          0          0.0000        0.8000        0.1804        0.8571        0.0571
4           0          0          0.0000        0.8000        0.0902        0.9473        0.1473
5           0          0          0.0000        0.8000        0.0361        0.9834        0.1834
6           0          0          0.0000        0.8000        0.0120        0.9955        0.1955
7           0          0          0.0000        0.8000        0.0034        0.9989        0.1989
8           0          0          0.0000        0.8000        0.0009        0.9998        0.1998
9           0          0          0.0000        0.8000        0.0002        1.0000        0.2000
10          2          20         0.2000        1.0000        0.0000        1.0000        0.0000

The Kolmogorov-Smirnov D test statistic is the largest absolute difference = largest value in Column H = 0.6647

Critical value at 5% for one sample: 1.36/sqrt(Q) = 0.4301. Since 0.6647 > 0.4301, the result is significant.
Critical value at 5% for two samples: 1.36*sqrt((Q1+Q2)/(Q1*Q2))

Number of quadrats: Q = 10 (sum of Col B)
Number of points: P = 20 (sum of Col C)
Number of points in a quadrat: x

Poisson probability: p(x) = p(x-1)*(P/Q)/x (Col F, Row 11 onwards)
If x = 0 then p(x) = p(0) = 2.71828^(-P/Q) (Col F, Row 10)
Euler's number e = 2.7183

The spreadsheet spatstat.xls contains worked examples for the Uniform/Clustered/Random data previously used, as well as for Lee and Wong’s data.
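The spreadsheet calculation for the clustered pattern can be sketched in code as follows. This is an illustrative reimplementation under stated assumptions, not the contents of spatstat.xls: the function name is hypothetical, and the input is the observed quadrat count for each value of x (8 quadrats with 0 points, 2 quadrats with 10 points).

```python
import math

def ks_against_poisson(quadrats_by_count):
    """One-sample K-S comparison of observed quadrat frequencies with Poisson.

    quadrats_by_count[x] = number of quadrats containing exactly x points.
    Returns (D statistic, 5% critical value 1.36 / sqrt(Q)).
    """
    q = sum(quadrats_by_count)                                   # number of quadrats
    p = sum(x * c for x, c in enumerate(quadrats_by_count))      # number of points
    lam = p / q
    cum_obs = cum_exp = d = 0.0
    prob = math.exp(-lam)                                        # p(0)
    for x, c in enumerate(quadrats_by_count):
        if x > 0:
            prob *= lam / x                                      # p(x) = p(x-1) * lam / x
        cum_obs += c / q
        cum_exp += prob
        d = max(d, abs(cum_obs - cum_exp))
    return d, 1.36 / math.sqrt(q)

clustered = [8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2]
d_stat, d_crit = ks_against_poisson(clustered)
print(f"D = {d_stat:.4f}, critical value (5%) = {d_crit:.4f}, significant = {d_stat > d_crit}")
```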

Page 13: Point Pattern Analysis using Spatial Inferential Statistics Briggs Henan University 2010 1

Weakness of Quadrat Analysis

• Results may depend on quadrat size and orientation (Modifiable Areal Unit Problem)
– Test different sizes (or orientations) to determine the effect of each on the results
• It is a measure of dispersion, and not really pattern, because it is based primarily on the density of points, and not their arrangement in relation to one another
– For example, quadrat analysis cannot distinguish between these two, obviously different, patterns [figure not preserved]
• It results in a single measure for the entire distribution, so variations within the region are not recognized (could have clustering locally in some areas, but not overall)
– For example, overall pattern here is dispersed, but there are some local clusters [figure not preserved]

Page 14: Point Pattern Analysis using Spatial Inferential Statistics Briggs Henan University 2010 1

Nearest-Neighbor Index (NNI) (O&U p. 100)

• Uses distances between points
• It compares:
– the mean of the distance observed between each point and its nearest neighbor
– with the expected mean distance if the distribution were random:

NNI = Observed Average Distance / Expected Average Distance

For a random pattern, NNI = 1
For a fully clustered pattern (all points at the same location), NNI = 0
For a maximally dispersed pattern, NNI = 2.149

See next slide for formulae for calculation

Page 15: Point Pattern Analysis using Spatial Inferential Statistics Briggs Henan University 2010 1

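Calculating Nearest Neighbor Index

\[ \text{NNI} = \frac{\bar{r}_{obs}}{\bar{r}_{exp}}, \qquad \bar{r}_{obs} = \frac{\sum_{i=1}^{n} d_i}{n}, \qquad \bar{r}_{exp} = 0.5 \sqrt{A / n} \]

Where:
– \bar{r}_{obs} = the average distance to nearest neighbor (d_i is the distance from point i to its nearest neighbor, n is the number of points)
– A = area of region: result very dependent on this value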

Page 16: Point Pattern Analysis using Spatial Inferential Statistics Briggs Henan University 2010 1

Significance Test for NNI

• The test statistic is calculated as follows:

Z = (Av. Distance Observed - Av. Distance Expected) / Standard Error

• It has a Normal frequency distribution.
• It tests if the observed pattern is significantly different from random.
• If Z is below -1.96 or above +1.96, we are “95% confident that the distribution is not randomly distributed.”
– Or we can say: if the observed pattern were random, there are less than 5 chances in 100 that we would have observed a Z value this large.

Note: in the worked example that follows, the fact that the NNI for uniform is 1.96 is coincidence!

Page 17: Point Pattern Analysis using Spatial Inferential Statistics Briggs Henan University 2010 1

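Calculating Test Statistic for Nearest Neighbor Index

\[ Z = \frac{\bar{r}_{obs} - \bar{r}_{exp}}{SE} \]

Where \bar{r}_{obs} and \bar{r}_{exp} are the observed and expected average nearest-neighbor distances, and

\[ SE = \frac{0.26136}{\sqrt{n^2 / A}} \qquad \text{(Standard error)} \]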

Page 18: Point Pattern Analysis using Spatial Inferential Statistics Briggs Henan University 2010 1

Source: Lembro

           RANDOM                 CLUSTERED              UNIFORM
Point      Nearest    Distance    Nearest    Distance    Nearest    Distance
           Neighbor               Neighbor               Neighbor
1          2          1           2          0.1         3          2.2
2          3          0.1         3          0.1         4          2.2
3          2          0.1         2          0.1         4          2.2
4          5          1           5          0.1         5          2.2
5          4          1           4          0.1         7          2.2
6          5          2           5          0.1         7          2.2
7          6          2.7         6          0.1         8          2.2
8          10         1           9          0.1         9          2.2
9          10         1           10         0.1         10         2.2
10         9          1           9          0.1         9          2.2
Sum                   10.9                   1                      22

                      RANDOM      CLUSTERED   UNIFORM
Mean distance (r)     1.09        0.1         2.2
Area of Region        50          50          50
Density               0.2         0.2         0.2
Expected Mean         1.118034    1.118034    1.118034
R (NNI)               0.974926    0.089443    1.96774
Z                     -0.1515     -5.508      5.85518
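The three worked examples can be checked with a short script. This is a minimal sketch with a hypothetical function name. It assumes the expected mean distance is 0.5 * sqrt(A / n), which reproduces the "Expected Mean" value 1.118034 in the tables, and the standard error 0.26136 / sqrt(n^2 / A) from the test-statistic slide. The nearest-neighbor distances are taken directly from the tables rather than computed from coordinates, and small differences in the last digits of Z are rounding.

```python
import math

def nearest_neighbor_index(nn_distances, area):
    """Return (NNI, Z) from each point's nearest-neighbor distance and the study area."""
    n = len(nn_distances)
    r_obs = sum(nn_distances) / n                 # observed mean distance
    r_exp = 0.5 * math.sqrt(area / n)             # expected mean under randomness
    se = 0.26136 / math.sqrt(n * n / area)        # standard error
    return r_obs / r_exp, (r_obs - r_exp) / se

AREA = 50  # area of region used in all three examples
examples = {
    "random": [1, 0.1, 0.1, 1, 1, 2, 2.7, 1, 1, 1],
    "clustered": [0.1] * 10,
    "uniform": [2.2] * 10,
}

for name, distances in examples.items():
    nni, z = nearest_neighbor_index(distances, AREA)
    significant = abs(z) > 1.96
    print(f"{name}: NNI={nni:.6f} Z={z:.4f} significant at 5%: {significant}")
```

Changing `AREA` while keeping the distances fixed shows how strongly both the index and Z depend on the choice of study region.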

Page 19: Point Pattern Analysis using Spatial Inferential Statistics Briggs Henan University 2010 1

Running in ArcGIS
Telecom and Software Companies

Result is very dependent on the area of the region. There is an option to insert your own value. The default value is the “minimum enclosing rectangle that encompasses all features.”

Page 20: Point Pattern Analysis using Spatial Inferential Statistics Briggs Henan University 2010 1

Results

Produced if the “Display output graphically” box is checked.
Scroll up the window to see all the results.
Note: Progress box continues to run until graphic is closed. Always close graphic window first.

Page 21: Point Pattern Analysis using Spatial Inferential Statistics Briggs Henan University 2010 1

Evaluating the Nearest Neighbor Index

• Advantages
– Unlike quadrats, the NNI considers distances between points
– No quadrat size problem
• However, NNI has problems
– Very dependent on the value of A, the area of the study region. What boundary do we use for the study area?
    • Minimum enclosing rectangle? (highly affected by a few outliers)
    • Convex hull
    • Convex hull with buffer. What size buffer?
– There is an “adjustment for edge effects” but problems remain
– Based on only the mean distance to the nearest neighbor
– Doesn’t incorporate local variations, or clustering scale
    • could have clustering locally in some areas, but not overall
– Based on point location only and does not incorporate magnitude of phenomena (quantity) at that point

Page 22: Point Pattern Analysis using Spatial Inferential Statistics Briggs Henan University 2010 1

Ripley’s K(d) Function

• Ripley’s K is calculated multiple times, each for a different distance band
– So it is represented as K(d): K is a function of distance, d
• The distance bands are placed around every point
• K(d) is the average number of points within distance d of each point, divided by the average density of points in the entire area (n/a)

\[ K(d) = \frac{a}{n^2} \sum_{i=1}^{n} \#\left[ S \in C(s_i, d) \right] \]

Where S is a point, and C(s_i, d) is a circle of radius d, centered at s_i

– If the density is high for a particular band, then clustering is occurring at that distance

Ripley B.D. 1976. The second-order analysis of stationary point processes. Journal of Applied Probability 13: 255-266
O&U p. 135-137
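A naive sketch of the K(d) calculation described above: count, for every point, the other points inside a circle of radius d, average the counts, and divide by the overall density n/a. The function name and the four sample points are hypothetical. The sketch assumes a point is not counted as its own neighbor and applies no edge correction, so it is not equivalent to the ArcGIS tool. For comparison it also reports πd², the value K(d) is expected to take for a completely random pattern.

```python
import math

def ripley_k(points, area, d):
    """Naive Ripley's K at distance d, with no edge correction.

    K(d) = (area / n^2) * sum over points i of the number of other
    points lying within distance d of point i.
    """
    n = len(points)
    total = 0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i != j and math.hypot(xi - xj, yi - yj) <= d:
                total += 1
    return area * total / (n * n)

# Hypothetical data: a tight group of three points plus one distant point
# in a 10 x 10 study area.
points = [(0, 0), (1, 0), (0, 1), (10, 10)]
for d in (1.0, 1.5):
    observed = ripley_k(points, 100.0, d)
    expected = math.pi * d * d      # expected K(d) for a random pattern
    print(f"d={d}: observed K={observed:.2f}, expected K={expected:.2f}")
```

Observed values well above the expected ones at small d indicate clustering at that distance, which matches the interpretation given later in the lecture.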

Page 23: Point Pattern Analysis using Spatial Inferential Statistics Briggs Henan University 2010 1

[Figure: distance bands (circles) drawn around every point, and K(d) curves labelled “dispersed” and “clustered”, with the annotation “Begins flat”. Source: O’Sullivan & Unwin, p.]

The distance bands are placed around every point. Note the big problem of edge effects from circles outside the study area.

• The low end (0.2) corresponds to distances within the cluster
• The high end (0.6) corresponds to distance between the clusters

Not this simple with real data!!!

Page 24: Point Pattern Analysis using Spatial Inferential Statistics Briggs Henan University 2010 1

Running in ArcGIS
Telecom and Software Companies

• Distance bands
• Weight field: number of points at that location
• Confidence band iterations: use 9 for tests; 99 takes a long time!
• Result is very dependent on area of the region. Can insert your own value.
• Again, study area has a big effect, so there are several options for this

Page 25: Point Pattern Analysis using Spatial Inferential Statistics Briggs Henan University 2010 1

Interpreting the Results

[Figure: observed and expected K plotted against distance]

• Expected: based on a random pattern
• Clustered, since observed is above expected
• Dispersed, if observed was below expected

Pattern is clustered!
Not this simple with real data!!!

Page 26: Point Pattern Analysis using Spatial Inferential Statistics Briggs Henan University 2010 1

Results for 10,000 feet Bands

• Distance bands: start 5,000 feet, size 10,000 feet
• Expected assumes random pattern
• Confidence band: 9 iterations (takes a long time for 99!)

Page 27: Point Pattern Analysis using Spatial Inferential Statistics Briggs Henan University 2010 1

Results for 20,000 feet Bands

• Distance bands: start 10,000 feet, size 20,000 feet
• Also experiment with different region (study area) boundaries.

Page 28: Point Pattern Analysis using Spatial Inferential Statistics Briggs Henan University 2010 1

Plotting the Difference Between Observed and Expected K, versus Distance

X field = ExpectedK or HiConfEnv
Y field = Diffk = ObservedK - ExpectedK

Distance between clusters: 70,000 feet, approximately 13 miles (about 21 km)

Page 29: Point Pattern Analysis using Spatial Inferential Statistics Briggs Henan University 2010 1

Problems with Ripley K(d)

• Dependent on study area boundary (edge effect)
– Circles go outside study area
– Special adjustments are available (see O&U p. 148)
– Try different options for boundary in ArcGIS
• Affected by circle radii selected
– Try different values
• Each point has unit value: no magnitude or quantity
– Weight field assumes “X” points at that location
– e.g. X = 3, then 3 points at that location

Page 30: Point Pattern Analysis using Spatial Inferential Statistics Briggs Henan University 2010 1

What have we learned?

How to measure and test if spatial patterns are clustered or dispersed.


Page 31: Point Pattern Analysis using Spatial Inferential Statistics Briggs Henan University 2010 1

Why is this important?

Is it clustered?

We can measure and test --not just look and guess!

That is science.

Page 32: Point Pattern Analysis using Spatial Inferential Statistics Briggs Henan University 2010 1

Not just GIS!

• I taught these tools to senior undergraduate geography students.
• They are also used in Earth Management.
• A former Henan University student and faculty member (now at UT-Dallas) is using Ripley’s K function for research on urban forests.

Page 33: Point Pattern Analysis using Spatial Inferential Statistics Briggs Henan University 2010 1

Next Time

• No classes next week
• Next class will be Wednesday November 17

Topic

• Spatial Autocorrelation
– Unlike PPA, in Spatial Autocorrelation points have different magnitudes; there is an attribute variable.
