
Interaction Analysis of Spatial Point Patterns

Geog 210C

Introduction to Spatial Data Analysis

Phaedon C. Kyriakidis

www.geog.ucsb.edu/~phaedon

Department of Geography

University of California Santa Barbara

Santa Barbara, CA 93106-4060

[email protected]

Spring Quarter 2009

Spatial Point Patterns

Definition
Set of point locations with recorded “events” within a study region, e.g., locations of trees, disease or crime incidents

[Figure: two point patterns in a 100 × 100 study region. Panel titles: "N=100 clustered events in a study region" and "N=100 random events in a study region".]

• point locations could correspond to all possible events or to subsets of them (mapped versus sampled point pattern)

• attribute values could have also been measured at event locations, e.g., tree diameter (marked point pattern); not considered in this handout

Objective of this handout

• Introduce statistical tools for quantifying spatial interaction of events, e.g., clustering versus randomness or regularity

Ph. Kyriakidis (UCSB) Geog 210C Spring 2009 2 / 27

Outline

Concepts & Notation

Distance & Distance Matrices

Distances Involved in Spatial Point Patterns

Quantifying Spatial Interaction: G Function

Quantifying Spatial Interaction: F Function

Quantifying Spatial Interaction: K Function

Points To Remember


Concepts & Notation

Some Notation

Point events
Set of N locations of events occurring in a study area:

$$\{\mathbf{u}_i,\ i = 1, \ldots, N\}, \quad \mathbf{u}_i \in D \subset \mathbb{R}^K$$

$\mathbf{u}_i$ = coordinate vector of the i-th event location, e.g., in 2D $\mathbf{u}_i = (x_i, y_i)$; $\in$ = belongs to; D = study domain, a subset ($\subset$) of a K-dimensional space $\mathbb{R}^K$

Variable of interest
y(s) = number of events (a count) within an arbitrary domain or support s with measure (length, area, volume) |s|; support s is centered at an arbitrary location u and can also be denoted as s(u); in statistics, y(s) is treated as a realization of a random variable (RV) Y(s)

Objective
Quantify interaction, e.g., covariation, between outcomes of any two RVs Y(s) and Y(s'). To do so, all RVs must lie in the same “environment”; in other words, the long-term average (expectation) of RV Y(s) should be similar to that of Y(s')


Concepts & Notation

Intensity of Events

Local intensity $\lambda(\mathbf{u})$
Mean number of events per unit area at an arbitrary location or point u, formally defined as:

$$\lambda(\mathbf{u}) = \lim_{|s| \to 0} \left\{ \frac{E\{Y(s)\}}{|s|} \right\}, \quad \mathbf{u} \in D$$

where E{Y(s)} denotes the expectation (mean) of RV Y(s) within region s(u) centered at u, and |s| is the area of that region

Overall intensity $\lambda$
Estimated as: $\hat{\lambda} = N / |D|$, where N = number of events and |D| = measure (area) of study region D

First-order stationarity
Any RV Y(s) should have the same long-term average, for a fixed areal unit s. This implies a constant intensity: $\lambda(\mathbf{u}) = \lambda,\ \forall\, \mathbf{u} \in D$, and the expected number of events within a region s is just a function of |s|: $E\{Y(s)\} = \lambda |s|,\ s \subset D$
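As a minimal numerical sketch of the overall intensity estimate N/|D| and of the expected count λ|s| under first-order stationarity, consider the Python snippet below. The function names, the 100 × 100 region and the 10 × 10 support are illustrative assumptions, chosen only to match the scale of the example plots.

```python
def overall_intensity(n_events, area):
    """Estimate the overall intensity as events per unit area (N / |D|)."""
    if area <= 0:
        raise ValueError("area must be positive")
    return n_events / area


def expected_count(intensity, support_area):
    """Expected count E{Y(s)} = lambda * |s| under first-order stationarity."""
    return intensity * support_area


# Assumed example: N = 100 events in a 100 x 100 study region
lam_hat = overall_intensity(100, 100 * 100)
# Expected number of events in a 10 x 10 support s
count_in_s = expected_count(lam_hat, 10 * 10)
print(lam_hat, count_in_s)
```

Under these assumptions, a 10 × 10 window is expected to hold about one event wherever it is placed, which is exactly what a constant intensity means.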


Concepts & Notation

Interaction Between Count RVs

Second-order intensity
Long-term average (expectation) of products of counts per unit area at any two arbitrary points u and u', formally defined as:

$$\gamma(\mathbf{u}, \mathbf{u}') = \lim_{|s|, |s'| \to 0} \left\{ \frac{E\{Y(s)\,Y(s')\}}{|s|\,|s'|} \right\}, \quad \mathbf{u}, \mathbf{u}' \in D$$

Note: E{Y(s)Y(s')} is not the same as E{Y(s)} E{Y(s')}, unless the two RVs are independent

Some terminology

• second-order stationarity: expectation of all RVs is constant (first-order stationarity), and second-order intensity is a function of the separation vector between any two locations u and u'

• isotropy: only the distance (not the orientation) of the separation vector matters

Outlook
Quantifying interaction in spatial point patterns within the above assumptions or working hypotheses amounts to studying distances between events


Distance

A measure of proximity (typically along a crow's flight path) between any two locations or spatial entities

Euclidean distance
Consider two points in a 2D (geographical or other) space with coordinates $\mathbf{u}_i = (x_i, y_i)$ and $\mathbf{u}_j = (x_j, y_j)$. The Euclidean distance $d_{ij}$ between points $\mathbf{u}_i$ and $\mathbf{u}_j$ is computed via Pythagoras's theorem as:

$$d_{ij} = d(\mathbf{u}_i, \mathbf{u}_j) = \|\mathbf{u}_i - \mathbf{u}_j\| = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}$$

$\|\mathbf{u}_i - \mathbf{u}_j\|$ is called the 2-norm of vector $\mathbf{h}_{ij} = \mathbf{u}_i - \mathbf{u}_j$

locations $\mathbf{u}_i$ and $\mathbf{u}_j$ are called, respectively, the tail and head of vector $\mathbf{h}_{ij}$
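The Euclidean distance formula can be sketched in a few lines of Python. The function name and the two sample points are assumptions made for illustration.

```python
import math


def euclidean_distance(u_i, u_j):
    """2-norm of the separation vector between two 2D points given as (x, y)."""
    dx = u_i[0] - u_j[0]
    dy = u_i[1] - u_j[1]
    return math.hypot(dx, dy)  # sqrt(dx**2 + dy**2)


# A 3-4-5 right triangle: the distance is the hypotenuse
d = euclidean_distance((0.0, 0.0), (3.0, 4.0))
print(d)
```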

[Figure: points u_i = (x_i, y_i) and u_j = (x_j, y_j) in the x-y plane, separated by distance d_ij.]


Distance Metric

Formal characteristics of a distance metric
A measure $d_{ij}$ of proximity between locations $\mathbf{u}_i$ and $\mathbf{u}_j$ is a valid distance metric if it satisfies the following requirements:

• distance between a point and itself is always zero: $d_{ii} = 0$
• distance between a point and another, distinct one is always positive: $d_{ij} > 0$ for $i \neq j$
• distance between two points is the same no matter which point you consider first: $d_{ij} = d_{ji}$
• the triangular inequality holds: the sum of the lengths of two sides of a triangle cannot be smaller than the length of the third side: $d_{ij} \le d_{il} + d_{lj}$

A metric $d_{ij}$ need not always be Euclidean, hence it should be checked to ensure that it is a valid distance metric
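The four requirements can be checked mechanically on a matrix of pairwise "distances". The following is a minimal sketch, assuming NumPy is available; the function name, the tolerance and both example matrices (including the asymmetric travel-time matrix) are hypothetical.

```python
import numpy as np


def is_valid_metric(D, tol=1e-9):
    """Check the four metric requirements on a square matrix of pairwise distances."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    off_diag = ~np.eye(n, dtype=bool)
    zero_self = np.all(np.abs(np.diag(D)) <= tol)   # d_ii = 0
    positive = np.all(D[off_diag] > 0)              # d_ij > 0 for i != j
    symmetric = np.allclose(D, D.T, atol=tol)       # d_ij = d_ji
    # d_ij <= d_il + d_lj for every triple (i, l, j), via broadcasting
    triangle = np.all(D[:, None, :] <= D[:, :, None] + D[None, :, :] + tol)
    return bool(zero_self and positive and symmetric and triangle)


euclid = [[0, 5, 10], [5, 0, 5], [10, 5, 0]]     # three collinear points
travel_time = [[0, 5, 9], [7, 0, 4], [9, 4, 0]]  # hypothetical: d_12 != d_21
print(is_valid_metric(euclid), is_valid_metric(travel_time))
```

The second matrix fails only the symmetry requirement, which is the typical way travel times or perceived distances stop being formal metrics.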



Non-Euclidean Distances

Alternative “distance” measures
(i) over a road or railway, (ii) along a river, (iii) over a network

[Figure: five locations u_1, ..., u_5, contrasting Euclidean distances between locations with network distances between locations.]

Even more exotic “distance” measures
(i) travel time over a network, (ii) perceived travel time between urban landmarks, (iii) volume of exports/imports

Euclidean distances between network nodes $\neq$ actual or perceived distances on the network

the latter might not even be formal distance metrics, e.g., it is possible that $d_{ij} \neq d_{ji}$



Minkowski’s Generalized Distance

Definition
Consider two points in a K-dimensional (geographical or other) space $\mathbb{R}^K$ with coordinate vectors $\mathbf{u}_i = [u_{i1}, \ldots, u_{ik}, \ldots, u_{iK}]$ and $\mathbf{u}_j = [u_{j1}, \ldots, u_{jk}, \ldots, u_{jK}]$. The Minkowski distance of order p (with $p \ge 1$), denoted as $d^{(p)}_{ij}$, between points $\mathbf{u}_i$ and $\mathbf{u}_j$ is computed as:

$$d^{(p)}_{ij} = \left( \sum_{k=1}^{K} |u_{ik} - u_{jk}|^p \right)^{1/p}$$

Particular cases
• Manhattan or city-block distance: $d^{(1)}_{ij} = \sum_{k=1}^{K} |u_{ik} - u_{jk}|$
• Euclidean distance: $d^{(2)}_{ij} = \sqrt{\sum_{k=1}^{K} |u_{ik} - u_{jk}|^2}$
• infinity norm or Chebyshev distance, as $p \to \infty$: $d^{(\infty)}_{ij} = \max(|u_{i1} - u_{j1}|, \ldots, |u_{ik} - u_{jk}|, \ldots, |u_{iK} - u_{jK}|)$

Distances computed from points in multidimensional spaces are routinely used in statistical pattern recognition; points represent objects or cases, each described by K attribute values
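The three particular cases can be reproduced with one small function. This is a sketch in plain Python; the function name and the convention of passing `math.inf` for the Chebyshev case are assumptions of this example.

```python
import math


def minkowski_distance(u_i, u_j, p=2):
    """Minkowski distance of order p >= 1; p = math.inf gives the Chebyshev distance."""
    if p < 1:
        raise ValueError("order p must be at least 1")
    diffs = [abs(a - b) for a, b in zip(u_i, u_j)]
    if math.isinf(p):
        return max(diffs)  # limit of the formula as p grows without bound
    return sum(x ** p for x in diffs) ** (1.0 / p)


u_i, u_j = (0.0, 0.0), (3.0, 4.0)
manhattan = minkowski_distance(u_i, u_j, 1)
euclidean = minkowski_distance(u_i, u_j, 2)
chebyshev = minkowski_distance(u_i, u_j, math.inf)
print(manhattan, euclidean, chebyshev)
```

For the same pair of points the distance shrinks as p grows (here 7, 5 and 4), since larger orders give more weight to the single largest coordinate difference.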



Euclidean Distance Matrix: Single Set of Points

Definition
Consider a set of N points $\{\mathbf{u}_1, \ldots, \mathbf{u}_i, \ldots, \mathbf{u}_N\}$ in a K-dimensional (geographical or other) space. The distance matrix $\mathbf{D}$ is a square (N × N) matrix containing the distances $\{d(\mathbf{u}_i, \mathbf{u}_j),\ i = 1, \ldots, N,\ j = 1, \ldots, N\}$ between all N × N possible pairs of points in the set

u_i    u_1   u_2   u_3   u_4   u_5
x_i    x_1   x_2   x_3   x_4   x_5
y_i    y_1   y_2   y_3   y_4   y_5

by convention, u_1 is the coordinate vector of the 1st point in the set (1st entry in data file)

$$\mathbf{D} =
\begin{bmatrix}
d_{11} & d_{12} & d_{13} & d_{14} & d_{15} \\
d_{21} & d_{22} & d_{23} & d_{24} & d_{25} \\
d_{31} & d_{32} & d_{33} & d_{34} & d_{35} \\
d_{41} & d_{42} & d_{43} & d_{44} & d_{45} \\
d_{51} & d_{52} & d_{53} & d_{54} & d_{55}
\end{bmatrix}
=
\begin{bmatrix}
0 & d_{12} & d_{13} & d_{14} & d_{15} \\
d_{12} & 0 & d_{23} & d_{24} & d_{25} \\
d_{13} & d_{23} & 0 & d_{34} & d_{35} \\
d_{14} & d_{24} & d_{34} & 0 & d_{45} \\
d_{15} & d_{25} & d_{35} & d_{45} & 0
\end{bmatrix}
= [d_{ij}]$$

i-th row (or column) contains distances between the i-th point u_i and all others (including itself)

D is symmetric with zeros along its diagonal
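A distance matrix for a single set of points can be built without explicit loops. The sketch below assumes NumPy and uses three made-up collinear points; the function name is illustrative.

```python
import numpy as np


def distance_matrix(coords):
    """N x N Euclidean distance matrix for an (N, K) array of coordinates."""
    coords = np.asarray(coords, dtype=float)
    # diff[i, j, :] is the separation vector between points i and j
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


D = distance_matrix([(0, 0), (3, 4), (6, 8)])
print(D)
```

The result is symmetric with zeros along its diagonal, as stated above, so only the N(N-1)/2 entries above the diagonal carry distinct information.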



Euclidean Distance Matrix: Two Sets of Points

Definition
Consider 2 sets of points $\{\mathbf{u}_1, \ldots, \mathbf{u}_i, \ldots, \mathbf{u}_N\}$ and $\{\mathbf{t}_1, \ldots, \mathbf{t}_j, \ldots, \mathbf{t}_M\}$ in a K-dimensional (geographical or other) space. The distance matrix $\mathbf{D}$ is an (N × M) matrix containing the Euclidean distances $\{d(\mathbf{u}_i, \mathbf{t}_j),\ i = 1, \ldots, N,\ j = 1, \ldots, M\}$ between all N × M possible pairs formed by these two sets of points

u_i    u_1   u_2   u_3   u_4   u_5
x_i    x_1   x_2   x_3   x_4   x_5
y_i    y_1   y_2   y_3   y_4   y_5

t_j    t_1   t_2   t_3   t_4   t_5   t_6   t_7
x_j    x_1   x_2   x_3   x_4   x_5   x_6   x_7
y_j    y_1   y_2   y_3   y_4   y_5   y_6   y_7

by convention, u_1 is the coordinate vector of the 1st datum in data set #1, and similarly for t_1

$$\mathbf{D} =
\begin{bmatrix}
d_{11} & d_{12} & d_{13} & d_{14} & d_{15} & d_{16} & d_{17} \\
d_{21} & d_{22} & d_{23} & d_{24} & d_{25} & d_{26} & d_{27} \\
d_{31} & d_{32} & d_{33} & d_{34} & d_{35} & d_{36} & d_{37} \\
d_{41} & d_{42} & d_{43} & d_{44} & d_{45} & d_{46} & d_{47} \\
d_{51} & d_{52} & d_{53} & d_{54} & d_{55} & d_{56} & d_{57}
\end{bmatrix}
= [d_{ij}]$$

i-th row contains distances between the i-th point u_i in set #1 and all points in set #2

j-th column contains distances between the j-th point t_j in set #2 and all points in set #1

D is not symmetric, i.e., $d_{12} \neq d_{21}$: pair {u_1, t_2} is not the same as pair {u_2, t_1}
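The two-set case differs only in that the result is rectangular. A minimal sketch, assuming NumPy, with a hypothetical function name and made-up coordinates for both sets:

```python
import numpy as np


def cross_distance_matrix(u, t):
    """N x M matrix of Euclidean distances between points u (N, K) and t (M, K)."""
    u = np.asarray(u, dtype=float)
    t = np.asarray(t, dtype=float)
    diff = u[:, None, :] - t[None, :, :]  # shape (N, M, K)
    return np.sqrt((diff ** 2).sum(axis=-1))


u = [(0, 0), (10, 0)]            # set #1, N = 2
t = [(0, 3), (6, 8), (10, 4)]    # set #2, M = 3
D_ut = cross_distance_matrix(u, t)
print(D_ut.shape)
print(D_ut)
```

Row i of `D_ut` holds the distances from u_i to every t_j, so a row-wise minimum gives nearest-neighbor distances from set #1 to set #2, and a column-wise minimum gives them in the other direction.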



Distances Between Events in A Point Pattern

Event-to-event distance
Distance $d_{ij}$ between an event at location $\mathbf{u}_i$ and another event at location $\mathbf{u}_j$:

$$d_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}$$

Point-to-event distance
Distance $d_{pj}$ between a randomly chosen point at location $\mathbf{t}_p$ and an event at location $\mathbf{u}_j$:

$$d_{pj} = \sqrt{(x_p - x_j)^2 + (y_p - y_j)^2}$$

Event-to-nearest-event distance
Distance $d_{\min}(\mathbf{u}_i)$ between an event at location $\mathbf{u}_i$ and its nearest neighbor event:

$$d_{\min}(\mathbf{u}_i) = \min_{j \neq i} \{d_{ij},\ j = 1, \ldots, N\}$$

Point-to-nearest-event distance
Distance $d_{\min}(\mathbf{t}_p)$ between a randomly chosen point at location $\mathbf{t}_p$ and its nearest neighbor event:

$$d_{\min}(\mathbf{t}_p) = \min \{d_{pj},\ j = 1, \ldots, N\}$$


Event-to-Nearest-Event Distances

[Figure: pattern with N=5 events u_1, ..., u_5 in a 100 × 100 study region.]

Distance matrix

$$\begin{bmatrix}
0.00 & 76.24 & 59.81 & 92.21 & 77.70 \\
76.24 & 0.00 & 42.83 & 20.35 & 17.62 \\
59.81 & 42.83 & 0.00 & 46.03 & 30.58 \\
92.21 & 20.35 & 46.03 & 0.00 & 15.94 \\
77.70 & 17.62 & 30.58 & 15.94 & 0.00
\end{bmatrix}$$

e.g., 59.81 = $d_{\min}(\mathbf{u}_1)$, 17.62 = $d_{\min}(\mathbf{u}_2)$

Some events might be nearest neighbors of each other, e.g., u_4 and u_5, or have the same nearest neighbor, e.g., u_5 is the nearest neighbor of u_2, u_3 and u_4

Mean nearest neighbor distance

Average of all $d_{\min}(\mathbf{u}_i)$ values: $\bar{d}_{\min} = \frac{1}{N} \sum_{i=1}^{N} d_{\min}(\mathbf{u}_i)$

Drawback: a single number does not suffice to describe a point pattern
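The nearest-neighbor distances quoted for this 5-event pattern can be recovered directly from its distance matrix. The sketch below assumes NumPy and copies the matrix values from this page; the variable names are illustrative.

```python
import numpy as np

# Distance matrix of the 5-event pattern, as printed on this page
D = np.array([
    [0.00, 76.24, 59.81, 92.21, 77.70],
    [76.24, 0.00, 42.83, 20.35, 17.62],
    [59.81, 42.83, 0.00, 46.03, 30.58],
    [92.21, 20.35, 46.03, 0.00, 15.94],
    [77.70, 17.62, 30.58, 15.94, 0.00],
])

masked = D.copy()
np.fill_diagonal(masked, np.inf)      # an event is not its own neighbor
d_min = masked.min(axis=1)            # event-to-nearest-event distances
nearest = masked.argmin(axis=1) + 1   # 1-based index of each nearest neighbor
mean_d_min = d_min.mean()             # mean nearest neighbor distance
print(d_min, nearest, round(mean_d_min, 2))
```

The `nearest` vector shows u_5 as the nearest neighbor of u_2, u_3 and u_4, and u_4 as the nearest neighbor of u_5.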



The G Function

Definition
Proportion of event-to-nearest-event distances $d_{\min}(\mathbf{u}_i)$ no greater than a given distance cutoff d, estimated as:

$$\hat{G}(d) = \frac{\#\{d_{\min}(\mathbf{u}_i) \le d,\ i = 1, \ldots, N\}}{N}$$

Cumulative distribution function (CDF) of all N event-to-nearest-event distances; instead of computing the average $\bar{d}_{\min}$ of the $d_{\min}$ values, compute their CDF

For the point pattern on the previous page

[Figure: sample histogram of event-to-nearest-neighbor distances (frequency versus event-to-nearest-neighbor distance, d) and sample G function (G(d) versus d).]

for a larger number of events N, G(d) becomes smoother
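A sample G function can be sketched as an empirical CDF of event-to-nearest-event distances. The code below assumes NumPy, applies no edge correction, and uses four made-up events; the function names are hypothetical.

```python
import numpy as np


def nearest_event_distances(events):
    """Distance from each event to its nearest other event."""
    events = np.asarray(events, dtype=float)
    diff = events[:, None, :] - events[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(D, np.inf)  # exclude j = i
    return D.min(axis=1)


def sample_G(events, d_values):
    """Proportion of event-to-nearest-event distances <= d, for each cutoff d."""
    d_min = nearest_event_distances(events)
    d_values = np.asarray(d_values, dtype=float)
    return (d_min[None, :] <= d_values[:, None]).mean(axis=1)


events = [(0, 0), (3, 4), (10, 0), (10, 1)]   # illustrative pattern
G = sample_G(events, [0.5, 1.0, 5.0])
print(G)
```

Here two events have a neighbor at distance 1 and two at distance 5, so the sample G function steps from 0 to 0.5 to 1 at those cutoffs.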


Event-to-Nearest-Event (E2NE) Distance Histograms

[Figure: two point patterns in a 100 × 100 study region (one titled "N=100 random stratified events in a study region"), with the corresponding histograms of E2NE distances (frequency versus distance) for evenly-spaced events and for clustered events.]

• for evenly-spaced events, more E2NE distances are similar to the spacing of events
• for clustered events, there are more small E2NE distances and fewer large such distances



Sample G Function Examples

[Figure: two point patterns in a 100 × 100 study region, with the corresponding sample G functions (G(d) versus d) for evenly-spaced events and for clustered events.]

• for evenly-spaced events, G(d) rises gradually up to the distance at which most events are spaced, and then increases rapidly

• for clustered events, G(d) rises rapidly at short distances, and then levels off at larger d-values



The F Function

Definition
Proportion of point-to-nearest-event distances $d_{\min}(\mathbf{t}_j)$ no greater than a given distance cutoff d, estimated as:

$$\hat{F}(d) = \frac{\#\{d_{\min}(\mathbf{t}_j) \le d,\ j = 1, \ldots, M\}}{M}$$

Cumulative distribution function (CDF) of all M point-to-nearest-event distances

[Figure: pattern with N=5 events and M=100 random points in a 100 × 100 study region, and the sample F function (F(d) versus point-to-nearest-neighbor distance, d).]

for a larger number M of random points, F(d) becomes even smoother

Note: The F function provides information on event proximity to voids
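The sample F function follows the same recipe as the sample G function, except that distances are measured from arbitrary points to events. A minimal sketch, assuming NumPy and no edge correction; the function names and the tiny hand-picked point set (used here instead of random points so the result is reproducible) are assumptions.

```python
import numpy as np


def point_to_nearest_event(points, events):
    """Distance from each point t_j to its nearest event."""
    points = np.asarray(points, dtype=float)
    events = np.asarray(events, dtype=float)
    diff = points[:, None, :] - events[None, :, :]   # shape (M, N, K)
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)


def sample_F(events, points, d_values):
    """Proportion of point-to-nearest-event distances <= d, for each cutoff d."""
    d_min = point_to_nearest_event(points, events)
    d_values = np.asarray(d_values, dtype=float)
    return (d_min[None, :] <= d_values[:, None]).mean(axis=1)


events = [(0, 0), (10, 0)]                   # N = 2 illustrative events
points = [(0, 3), (5, 0), (10, 4), (4, 3)]   # M = 4 illustrative points
F = sample_F(events, points, [2.0, 4.0, 5.0])
print(F)
```

In practice the points would be drawn at random over the study region, for instance with `np.random.default_rng().uniform(...)`, and a larger M would give a smoother curve.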



Point-to-Nearest-Event (P2NE) Distance Histograms

[Figure: two point patterns in a 100 × 100 study region, with the corresponding histograms of P2NE distances (frequency versus distance) for evenly-spaced events and for clustered events.]

• for evenly-spaced events, there are more nearest events at small distances from randomly placed points

• for clustered events, P2NE distances are generally larger than in the previous case, and there are a few large such distances



Sample F Function Examples

[Figure: two point patterns in a 100 × 100 study region, with the corresponding sample F functions (F(d) versus d) for evenly-spaced events and for clustered events.]

• for evenly-spaced events, F(d) rises rapidly up to the distance at which most events are spaced, and then levels off (more nearest neighbors at small distances from randomly placed points)

• for clustered events, F(d) rises rapidly at short distances, and then levels off at larger d-values



Comparing Sample G and F Functions

[Figure: two point patterns in a 100 × 100 study region, with sample G and F functions plotted together (proportion versus distance, d) for evenly-spaced events and for clustered events.]

• for evenly-spaced events, there is more “open” space (smaller point-to-event distances), hence F(d) rises faster than G(d)
• for clustered events, the reverse is true
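The evenly-spaced case can be illustrated with a regular grid of events. The sketch below is an assumption-laden example rather than the pattern plotted above: it places 100 events on a grid with spacing 10 in a 100 × 100 region, draws M = 1000 random points with a fixed seed, ignores edge effects, and uses hypothetical function names.

```python
import numpy as np


def nearest_distances(points, events, exclude_self=False):
    """Distance from each point to its nearest event (optionally skipping itself)."""
    points = np.asarray(points, dtype=float)
    events = np.asarray(events, dtype=float)
    diff = points[:, None, :] - events[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    if exclude_self:
        np.fill_diagonal(D, np.inf)  # only valid when points and events coincide
    return D.min(axis=1)


def ecdf(values, d):
    """Proportion of values no greater than d."""
    return float(np.mean(values <= d))


# Evenly-spaced events: a 10 x 10 grid with spacing 10
ticks = np.arange(5.0, 100.0, 10.0)
events = np.array([(x, y) for x in ticks for y in ticks])

rng = np.random.default_rng(0)
points = rng.uniform(0.0, 100.0, size=(1000, 2))

e2ne = nearest_distances(events, events, exclude_self=True)  # feeds G(d)
p2ne = nearest_distances(points, events)                     # feeds F(d)
G5, F5 = ecdf(e2ne, 5.0), ecdf(p2ne, 5.0)
print(G5, F5)
```

At d = 5, no event has a neighbor that close, so G is zero, whereas most random points already have an event within that distance, so F is well above G.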



The Sample K Function

Concept building

1. construct a set of concentric circles (of increasing radius d) around each event
2. count the number of events in each distance “band”
3. cumulative number of events up to radius d around all events = sample K function K(d)

[Figure: example of K function estimation, counting events within distance h = 6 units from events at locations u_1, u_2 and u_3; counts of 3, 6 and 4 events are shown.]

Formal definition

$$K(d) = \frac{E\{\#\text{ of events within distance } d \text{ of any arbitrary event}\}}{E\{\#\text{ of events per unit area within study domain}\}} \approx \frac{1}{\hat{\lambda}}\,\frac{1}{N}\,\#\{d_{ij} \le d,\ i = 1, \ldots, N,\ j(\neq i) = 1, \ldots, N\} = \hat{K}(d)$$
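The estimator on the right-hand side can be sketched as follows, assuming NumPy and no edge correction. The function name, the four events and the area of 200 square units are made up for illustration.

```python
import numpy as np


def sample_K(events, area, d_values):
    """Sample K function: (1 / lambda_hat) * (1 / N) * #{d_ij <= d, j != i}."""
    events = np.asarray(events, dtype=float)
    n = len(events)
    diff = events[:, None, :] - events[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    pair_d = D[~np.eye(n, dtype=bool)]   # all N^2 - N ordered pairs with j != i
    d_values = np.asarray(d_values, dtype=float)
    counts = (pair_d[None, :] <= d_values[:, None]).sum(axis=1)
    lam_hat = n / area                   # overall intensity estimate N / |D|
    return counts / (lam_hat * n)


events = [(0, 0), (3, 4), (10, 0), (10, 1)]   # illustrative pattern
K = sample_K(events, area=200.0, d_values=[1.0, 5.0, 8.0])
print(K)
```

Each unordered pair is counted twice (once from each of its two events), which matches the double index i, j in the definition.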


Interpreting The Sample K Function

Re-expressing

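$$\hat{K}(d) = \frac{1}{\hat{\lambda}}\,\frac{1}{N}\,\#\{d_{ij} \le d,\ i = 1, \ldots, N,\ j(\neq i) = 1, \ldots, N\}$$

$$= \frac{|D|}{N}\,\frac{1}{N}\,\#\{d_{ij} \le d,\ i = 1, \ldots, N,\ j(\neq i) = 1, \ldots, N\}$$

$$\approx |D| \cdot (\text{proportion of event-to-event distances} \le d)$$

In other words: function $\hat{K}(d)$ is the sample cumulative distribution function (CDF) of all $N^2 - N$ event-to-event distances, scaled by |D| (up to a factor (N - 1)/N, since the count is divided by $N^2$ rather than $N^2 - N$)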
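As a worked check of this re-expression, take the 5-event pattern whose distance matrix was given earlier and assume |D| = 100 × 100 = 10000, a value read off the plot axes rather than stated in the text. For a cutoff of d = 20, only $d_{25} = 17.62$ and $d_{45} = 15.94$ qualify (the next smallest, $d_{24} = 20.35$, does not), which gives 4 ordered pairs:

$$\hat{K}(20) = \frac{|D|}{N}\,\frac{1}{N}\,\#\{d_{ij} \le 20\} = \frac{10000}{5} \cdot \frac{1}{5} \cdot 4 = 1600, \qquad \frac{\hat{K}(20)}{|D|} = \frac{4}{25} = 0.16$$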

[Figure: pattern with N=5 events u_1, ..., u_5 in a 100 × 100 study region (left); sample histogram of event-to-event distances, frequency versus event-to-event distance d (center); sample K function divided by 10000, K(d)/|A| versus d (right).]

Note: Ignore the bin at d = 0 (center plot) and the point at d = 0 (right plot)



Event-to-Event Distance Histograms

[Figure: two point patterns in a 100 × 100 study region, with the corresponding histograms of event-to-event distances (frequency versus event-to-event distance) for evenly-spaced and for clustered events.]

• for evenly-spaced events, there are more medium-sized E2E distances than small or large such distances

• for clustered events, the distribution of E2E distances is multi-modal



Event-to-Event Distance CDFs

[Figure: two point patterns in a 100 × 100 study region, with the corresponding CDFs of event-to-event distances (cumulative frequency versus distance) for evenly-spaced and for clustered events.]

• for clustered events, there are multiple bumps in the CDF of E2E distances due to the grouping of events in space



Sample K Function Examples

[Figure: two point patterns in a 100 × 100 study region, with the corresponding sample K functions (area × proportion, K(d), versus d) for evenly-spaced events and for clustered events.]

• sample K function K(d) is monotonically increasing and is a scaled (by domain measure |D|) version of the CDF of E2E distances


Points To Remember

Recap

Quantifying interaction in spatial point patterns

• event-to-nearest-event distances → use the sample G function G(d)
• point-to-nearest-event distances → use the sample F function F(d)
• event-to-event distances → use the sample K function K(d)

K function looks at information beyond nearest neighbors

Caveats
• clustering is always a function of the overall intensity of a point pattern
• clustering might occur due to local intensity variations or due to interaction; it is very difficult to disentangle each contribution

Watch out for
• boundaries and edge effects
• distance distortions due to map projections
• sampled versus mapped point patterns

