
Point Pattern Analysis

Point patterns fall between two extremes: highly clustered and highly dispersed. Most tests of point patterns compare the observed pattern to complete spatial randomness (CSR).

Two measurements are used to describe a pattern:

• Density of points across the analysis area

• Distance between points within the analysis area


Distance Methods

Distance methods are becoming more common

* Do not require rasterization

* Easy to do with GIS

[Figure: Points 1 and 2 plotted on x and y axes]

Point 1: (10, 15)

Point 2: (15, 20)

$D = \sqrt{\Delta x^2 + \Delta y^2} = \sqrt{5^2 + 5^2} = 7.07$
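The straight-line distance between the two example points can be checked with a few lines of Python. This is a minimal sketch; the function name `euclidean_distance` is chosen here for illustration and does not come from the slides.

```python
import math

def euclidean_distance(p1, p2):
    """Straight-line distance between two (x, y) points."""
    dx = p2[0] - p1[0]
    dy = p2[1] - p1[1]
    return math.sqrt(dx ** 2 + dy ** 2)

point_1 = (10, 15)  # coordinates from the slide
point_2 = (15, 20)

d = euclidean_distance(point_1, point_2)
print(round(d, 2))  # 7.07
```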


Issues with Length Measurement

• Measurements in GIS are often made on horizontal projections of objects

– Length and area may be substantially lower than on a true three-dimensional surface


Be careful

• 0.25:1 – Hypotenuse = 1.03

• 0.5:1 – Hypotenuse = 1.12

• 1:1 – Hypotenuse = 1.41

• 2:1 – Hypotenuse = 2.24

• 3:1 – Hypotenuse = 3.16

• Not an issue if the gradient is uniform.
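The hypotenuse values in this list follow from the Pythagorean theorem. The sketch below assumes each ratio is rise:run with a horizontal run of 1, which is an interpretation of the slide rather than something it states. The name `surface_length` is hypothetical.

```python
import math

def surface_length(rise, run=1.0):
    """Length along a slope with the given rise over a horizontal run."""
    return math.hypot(rise, run)

# Ratios from the slide, read as rise:run with run = 1 (an assumption)
for rise in (0.25, 0.5, 1, 2, 3):
    print(f"{rise}:1 -> {surface_length(rise):.3f}")
```

The steeper the slope, the more a horizontal (projected) measurement understates the true surface length.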


Manhattan Distance

• Distance is computed between two points (cells) by moving either N-S or E-W.

[Figure: Cells 1 and 2 on a raster grid with x and y axes]

Cell 1: (15, 15)

Cell 2: (10, 20) (row, column)

$D = \text{Cellsize} \times (\Delta x + \Delta y)$
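A minimal sketch of the Manhattan distance between the two example cells follows. The slide does not give a cell size, so the 30-unit value used here is purely an assumption for illustration, as is the function name.

```python
def manhattan_distance(cell_a, cell_b, cellsize):
    """Distance between two (row, column) cells moving only N-S or E-W."""
    d_row = abs(cell_a[0] - cell_b[0])
    d_col = abs(cell_a[1] - cell_b[1])
    return cellsize * (d_row + d_col)

cell_1 = (15, 15)  # (row, column) from the slide
cell_2 = (10, 20)
cellsize = 30      # hypothetical cell size, not given in the slide

print(manhattan_distance(cell_1, cell_2, cellsize))  # 300
```

The two cells are 5 rows and 5 columns apart, so the path is 10 cells long regardless of the order of the moves.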


Distance Methods

• Nearest-Neighbor Distance (NND)

* Basic Statistics from Sample (Mean, SD)

* Compare to Expected Population Mean, SD

* Z statistic, R statistic

* Assumes a normal distribution to compute expected values

* Global estimate of pattern


Nearest Neighbor Distance

[Figure: three example point patterns labeled R < 1, R > 1, and R = 1]

$R = \frac{\text{Observed Mean NND}}{\text{Expected Mean NND for Random Points}}$


Nearest Neighbor Analysis

Nearest neighbor analysis examines the distances between each point and the closest point to it, and then compares these to expected values for a random sample of points from a CSR (complete spatial randomness) pattern. CSR is generated by means of two assumptions: 1) that all places are equally likely to be the recipient of a case (event) and 2) all cases are located independently of one another.

The mean nearest neighbor distance =

$\bar{d} = \frac{1}{N} \sum_{i=1}^{N} d_i$

where N is the number of points and di is the nearest neighbor distance for point i.
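The mean nearest neighbor distance can be sketched with a brute-force search over all pairs of points. The function names and the four sample coordinates below are made up for illustration. For large datasets a spatial index would be a better choice than this O(N²) loop.

```python
import math

def nearest_neighbor_distances(points):
    """For each point, the distance to its closest other point."""
    distances = []
    for i, (xi, yi) in enumerate(points):
        nearest = min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        distances.append(nearest)
    return distances

def mean_nnd(points):
    """Mean of the nearest neighbor distances d_i over all N points."""
    d = nearest_neighbor_distances(points)
    return sum(d) / len(d)

# Hypothetical sample points, not taken from the slides
sample = [(0, 0), (3, 0), (3, 4), (10, 4)]
print(nearest_neighbor_distances(sample))  # [3.0, 3.0, 4.0, 7.0]
print(mean_nnd(sample))                    # 4.25
```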


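The expected value of the nearest neighbor distance in a random pattern =

$E(d_i) = 0.5 \sqrt{\frac{A}{N}} + \left(0.0514 + \frac{0.041}{\sqrt{N}}\right) \frac{B}{N}$

where A is the area and B is the length of the perimeter of the study area.

The variance =

$\mathrm{Var}(d) = 0.070 \frac{A}{N^2} + 0.037 B \sqrt{\frac{A}{N^5}}$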


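And the Z statistic =

$Z = \frac{\bar{d} - E(d_i)}{\sqrt{\mathrm{Var}(d)}}$

where $\bar{d}$ is the observed mean nearest neighbor distance.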

This approach assumes:

• The study area is a regular rectangle or square. The equations for the expected mean and variance cannot be used for irregularly shaped study areas.

• Area (A) is calculated as (Xmax – Xmin) * (Ymax – Ymin), where these represent the study area boundaries.

R statistic = Observed Mean d / Expected d

R = 1: random; R → 0: clustered; R → 2+: uniform


Study area 2 x 0.5: A = 1, B = 5

E(di) = 0.05277

Var(d) = 8.85 × 10^-6

Study area 1 x 1: A = 1, B = 4

E(di) = 0.05222

Var(d) = 8.48 × 10^-6

Study area 2 x 2: E(di) = 0.10444
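The figures above can be reproduced with the perimeter-corrected expected value and variance for a rectangular study area, a form usually attributed to Donnelly (1978). The sketch below makes two assumptions that the slides do not state: that this is the formula behind the numbers, and that N = 100 points. With those assumptions the computed values match the slide. Function names are illustrative.

```python
import math

def expected_nnd(n, area, perimeter):
    """Expected mean NND under CSR in a rectangle (edge-corrected form)."""
    return 0.5 * math.sqrt(area / n) + (0.0514 + 0.041 / math.sqrt(n)) * perimeter / n

def variance_nnd(n, area, perimeter):
    """Variance of the mean NND under CSR for the same rectangle."""
    return 0.070 * area / n ** 2 + 0.037 * perimeter * math.sqrt(area / n ** 5)

def z_and_r(observed_mean, n, area, perimeter):
    """Z statistic and R ratio for an observed mean NND."""
    e = expected_nnd(n, area, perimeter)
    v = variance_nnd(n, area, perimeter)
    return (observed_mean - e) / math.sqrt(v), observed_mean / e

n = 100  # assumed; chosen because it reproduces the slide values
for label, area, perimeter in (("2 x 0.5", 1, 5), ("1 x 1", 1, 4), ("2 x 2", 4, 8)):
    print(label, expected_nnd(n, area, perimeter), variance_nnd(n, area, perimeter))
```

The 2 x 0.5 and 1 x 1 rectangles have the same area but different perimeters, so the expected distance and variance differ slightly. A negative Z means the observed mean is smaller than expected (clustering), and a positive Z means it is larger (dispersion).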


Real world study areas are complex and violate the assumptions of most equations for expected values.

[Figure: Wilderness Campsites]


Solution

• Simulate randomization using Monte Carlo methods. Compare the simulated distribution to the observed.

* If possible, use the “true” area and perimeter to compute the expected value.

* Software that does not ask for area/perimeter or a shapefile of the study area will assume a rectangle.
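A Monte Carlo comparison of this kind can be sketched as follows. Random patterns with the same number of points are generated inside the study area, the mean nearest neighbor distance is computed for each, and the observed value is ranked against the simulated ones. All names here are illustrative. The optional `inside` predicate is a hypothetical hook for an irregular study area (rejection sampling within the bounding box), and the observed pattern is an artificial 7 x 7 grid, not data from the slides.

```python
import math
import random

def mean_nnd(points):
    """Mean distance from each point to its nearest other point."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        total += min(math.hypot(xi - xj, yi - yj)
                     for j, (xj, yj) in enumerate(points) if j != i)
    return total / len(points)

def simulate_csr(n, bounds, rng, inside=None):
    """n uniform random points in the bounding box, optionally filtered."""
    xmin, xmax, ymin, ymax = bounds
    points = []
    while len(points) < n:
        x, y = rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)
        if inside is None or inside(x, y):
            points.append((x, y))
    return points

def monte_carlo_nnd(observed, bounds, n_sims=99, seed=0, inside=None):
    """Rank the observed mean NND against n_sims simulated CSR patterns."""
    rng = random.Random(seed)
    obs = mean_nnd(observed)
    sims = [mean_nnd(simulate_csr(len(observed), bounds, rng, inside))
            for _ in range(n_sims)]
    # One-sided pseudo p-values based on rank among the simulations
    p_clustered = (1 + sum(s <= obs for s in sims)) / (n_sims + 1)
    p_dispersed = (1 + sum(s >= obs for s in sims)) / (n_sims + 1)
    return obs, sims, p_clustered, p_dispersed

# Artificial, evenly spaced pattern in the unit square (illustration only)
grid = [((i + 0.5) / 7, (j + 0.5) / 7) for i in range(7) for j in range(7)]
obs, sims, p_clustered, p_dispersed = monte_carlo_nnd(grid, (0.0, 1.0, 0.0, 1.0))
print(obs, min(sims), max(sims), p_clustered, p_dispersed)
```

Because the simulated points are generated inside the actual study area, no closed-form expected value is needed and edge effects are built into the reference distribution.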


Autotheft – Within City


Autotheft - Downtown


Autotheft - Neighborhood


Nearest Neighbor - ArcMap

Method     Area         Observed NND   Expected NND   Z Score   P-Value
Euclidean  1668437432   278            729            -33.1     0.000
Euclidean  943000863    278            548            -26.3     0.000
Manhattan  1668437432   399            729            -28.6     0.000
Manhattan  57850697     227            235            -1.1      0.284
Manhattan  10743164     251            223            +1.8      0.071
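The R statistic (observed mean NND divided by expected mean NND) is not listed in the table but follows directly from its two NND columns. A small sketch, with the helper name `r_statistic` chosen for illustration:

```python
def r_statistic(observed_nnd, expected_nnd):
    """Ratio of observed to expected mean nearest neighbor distance."""
    return observed_nnd / expected_nnd

# (method, area, observed NND, expected NND) taken from the table
rows = [
    ("Euclidean", 1668437432, 278, 729),
    ("Euclidean", 943000863, 278, 548),
    ("Manhattan", 1668437432, 399, 729),
    ("Manhattan", 57850697, 227, 235),
    ("Manhattan", 10743164, 251, 223),
]

for method, area, observed, expected in rows:
    print(method, area, round(r_statistic(observed, expected), 2))
```

The ratio moves from well below 1 toward and slightly past 1 as the area shrinks, which matches the sign change in the Z scores and shows how strongly the result depends on the study area supplied.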


Distance Methods

• G Function (Revised NND)

* Same measurements as NND

* Analyzed using a CDF – Compare to Expected

* Expected CDF can be Theoretical or Generated: $E(G(d)) = 1 - e^{-\lambda \pi d^2}$, where λ is the point density

* d statistic (max distance between Observed and Expected CDF)

* Can test d statistic with the Kolmogorov-Smirnov Test


G Function

$G(d) = \frac{\text{no.}[d_{\min}(s_i) < d]}{n}$

From O’Sullivan and Unwin, Geographic Information Analysis

[Figure: example G function plot, annotated 1/12 = 0.083]
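The G function is the cumulative proportion of nearest neighbor distances shorter than d, and it can be compared with the theoretical CSR curve 1 − exp(−λπd²). The sketch below is illustrative only: the function names, the sample distances, and the density value are all assumptions, and no edge correction is applied.

```python
import math

def empirical_g(nn_distances, d):
    """Proportion of nearest neighbor distances strictly less than d."""
    return sum(1 for x in nn_distances if x < d) / len(nn_distances)

def expected_g(d, density):
    """Theoretical G(d) under CSR with the given point density."""
    return 1.0 - math.exp(-density * math.pi * d ** 2)

def ks_d_statistic(nn_distances, density):
    """Largest vertical gap between the observed and expected CDFs."""
    xs = sorted(nn_distances)
    n = len(xs)
    gap = 0.0
    for i, x in enumerate(xs, start=1):
        f = expected_g(x, density)
        # Check the gap just before and just after each step of the CDF
        gap = max(gap, abs(i / n - f), abs((i - 1) / n - f))
    return gap

# Hypothetical nearest neighbor distances and density
nn = [1.0, 1.0, 2.0, 3.0]
density = 0.1

print(empirical_g(nn, 1.5))  # 0.5
print(expected_g(1.5, density))
print(ks_d_statistic(nn, density))
```

With n points, each nearest neighbor distance raises the empirical curve by 1/n, which is why the curve is a step function.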


Distance Methods

• F Function

* Similar to G, but measures the distance from each of a set of random locations to the nearest point

* Also uses CDF and same Expected Distribution Function as G

* Harder to Interpret!!!

* I have never used it. I also do not like it!

Both G and F functions have edge and area problems. It is better to use a generated (simulated) expected distribution.


G and F Functions

[Figure: G and F functions for Clustered and Evenly Spaced patterns. From O’Sullivan and Unwin, Geographic Information Analysis]


Distance Methods

• K Function (Ripley, 1976)

* Statistic is based on the count of all the points within a distance d of each observation

$K(d) = \frac{\sum_{i=1}^{n} \text{no.}[S \in C(s_i, d)]}{n \lambda}$

where n = # of points

λ = Density (n/area)

C(si, d) = a circle with radius d centered at point si


Ripley's K counts the number of points found within distance r of each point.

The maximum r distance should be about ½ the short dimension of the extent of the input points.

If K increases more quickly than expected, the points are clustered.

If K increases more slowly than expected, the points are dispersed.


Distance Methods

• Expected K(d)

$E(K(d)) = \lambda \pi d^2 / \lambda = \pi d^2$

$L(d) = \sqrt{K(d) / \pi}$

$E(L(d)) = d$
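A naive version of K(d) and its L(d) transform can be sketched as below. It counts, for every point, the other points within distance d, then divides by nλ. No edge correction is applied, which real software normally adds. The function names, the four sample points, and the study area of 4 are assumptions for illustration.

```python
import math

def k_function(points, area, d):
    """Naive Ripley's K: neighbor counts within d, scaled by n * density."""
    n = len(points)
    density = n / area
    total = 0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i != j and math.hypot(xi - xj, yi - yj) <= d:
                total += 1
    return total / (n * density)

def l_function(points, area, d):
    """L(d) = sqrt(K(d) / pi); equals d on average under CSR."""
    return math.sqrt(k_function(points, area, d) / math.pi)

# Hypothetical pattern: corners of a unit square in a study area of 4
corners = [(0, 0), (1, 0), (0, 1), (1, 1)]
area = 4.0

print(k_function(corners, area, 0.5))  # 0.0
print(k_function(corners, area, 1.0))  # 2.0
print(k_function(corners, area, 1.5))  # 3.0
print(l_function(corners, area, 1.0))
```

Plotting L(d) against d gives a reference line with slope 1 under CSR, which is easier to read than comparing K(d) with a parabola.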


K Function

[Figure: L(d) plots for Clustered and Evenly Spaced patterns. From O’Sullivan and Unwin, Geographic Information Analysis]


There are a total of 32 points in this analysis.

New Mexico is approximately 500 km per side, so we set the maximum study distance at 250 km. We choose 25 increments so that the observed L(d) and confidence interval are calculated every 10 km.

99 permutations are used to create the confidence envelope in order to test the null hypothesis at approximately the α = 0.01 level.


Figure 2: Graph of K-Function Results


A graph of the K-function results is shown in Figure 2. The observed L(d) is 0 for 10 km and 20 km because the closest pair of points is approximately 29 km apart.

At a distance of 30 km, the observed L(d) falls within the generated confidence interval. However, for distances between 40 km and 90 km the observed L(d) lies outside the confidence interval.

This indicates that we can reject the null hypothesis of CSR. Because the observed L(d) is less than the minimum simulated L(d) over these distances, the points show a statistically significant dispersed (regular) distribution.
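The procedure in this example (32 points, a 500 km square, 25 distance steps of 10 km, 99 simulations) can be sketched as a simulation envelope for L(d). The original point locations are not available here, so the "observed" pattern below is an artificial 4 x 8 grid used only to show the mechanics. Function names are illustrative, the study area is treated as a plain square, and no edge correction is applied.

```python
import math
import random

def l_values(points, area, distances):
    """Naive L(d) at each distance, without edge correction."""
    n = len(points)
    density = n / area
    pair_d = [math.hypot(x1 - x2, y1 - y2)
              for i, (x1, y1) in enumerate(points)
              for (x2, y2) in points[i + 1:]]
    values = []
    for d in distances:
        count = 2 * sum(1 for pd in pair_d if pd <= d)  # each pair seen from both ends
        values.append(math.sqrt(count / (n * density) / math.pi))
    return values

def csr_envelope(n, side, distances, n_sims=99, seed=0):
    """Minimum and maximum simulated L(d) over n_sims random patterns."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_sims):
        pts = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n)]
        sims.append(l_values(pts, side * side, distances))
    lo = [min(col) for col in zip(*sims)]
    hi = [max(col) for col in zip(*sims)]
    return lo, hi

def classify(observed, lo, hi):
    """Label each distance by where the observed L(d) falls."""
    labels = []
    for o, low, high in zip(observed, lo, hi):
        if o < low:
            labels.append("dispersed")
        elif o > high:
            labels.append("clustered")
        else:
            labels.append("random")
    return labels

side = 500.0                                   # km, approximate side length
distances = [10.0 * k for k in range(1, 26)]   # 10 km to 250 km
# Artificial regular pattern of 32 points (illustration only)
grid = [(31.25 + 62.5 * c, 62.5 + 125.0 * r) for r in range(4) for c in range(8)]

observed = l_values(grid, side * side, distances)
lo, hi = csr_envelope(len(grid), side, distances)
for d, label in zip(distances, classify(observed, lo, hi)):
    print(int(d), label)
```

With 99 simulations, an observed value below the simulated minimum (or above the maximum) ranks first out of 100, which is where the approximate 0.01 level comes from.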