SPATIAL MODELS FOR DATA REPORTED AS COUNTS OVER GEOGRAPHIC AREAS Gary Simon, 28 APRIL 2006


SPATIAL MODELS

FOR DATA

REPORTED AS COUNTS

OVER GEOGRAPHIC AREAS

Gary Simon, 28 APRIL 2006


With special thanks…

Frank LoPresti, Academic Computing

Services, GIS Group

Kevin Tun, Stern I.T. Group


Here’s an interesting obscure formula. Consider a set of points:

Point 1: (x1 , y1)

Point 2: (x2 , y2)

….

Point n: (xn , yn)


Connect the points in order. Draw a line from point 1 to point 2, then from point 2 to point 3, …., from point n-1 to point n. Finally draw a line from point n back to point 1.

Assume that none of the segments cross, so that this is a polygon.


The area of the resulting polygon is given by

$$\text{Area} = \pm\frac{1}{2}\sum_{i=1}^{n}\left(x_i\,y_{i+1} - x_{i+1}\,y_i\right)$$

The + occurs when the perimeter is drawn counter-clockwise, the – when drawn clockwise.
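
A minimal sketch of this formula in Python may make the sign rule concrete. The function names are illustrative, and the index i + 1 is assumed to wrap around so that point n + 1 is point 1, matching the closing line drawn from point n back to point 1.

```python
def signed_area(points):
    """Half the shoelace sum; positive when the points run counter-clockwise."""
    n = len(points)
    total = 0.0
    for i in range(n):
        x_i, y_i = points[i]
        x_next, y_next = points[(i + 1) % n]  # point n+1 wraps back to point 1
        total += x_i * y_next - x_next * y_i
    return 0.5 * total


def polygon_area(points):
    """Area of a simple polygon, whatever the drawing direction."""
    return abs(signed_area(points))


square_ccw = [(0, 0), (2, 0), (2, 2), (0, 2)]
print(signed_area(square_ccw))        # 4.0
print(signed_area(square_ccw[::-1]))  # -4.0
print(polygon_area(square_ccw[::-1])) # 4.0
```

The sign of the raw sum is what tells the two drawing directions apart; taking the absolute value gives the area itself.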


The data:

K regions

Counts z1, z2, …, zK

Total count z+

Populations P1, P2 , …, PK

Total population P+


The obvious null hypothesis of uniformity is tested by

$$G^2 = 2\sum_{k=1}^{K} z_k \log\!\left(\frac{z_k}{z_+\,P_k / P_+}\right)$$
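
As a hedged illustration, the uniformity statistic can be computed as below. Under uniformity the expected count in region k is the total count times that region's share of the population. The function name and the toy numbers are made up for the example, and a region with zero count is taken to contribute nothing to the sum (the usual 0 log 0 = 0 convention, which is an assumption here).

```python
import math


def g2_uniformity(counts, populations):
    """Likelihood-ratio statistic G2 for the hypothesis of a common rate."""
    z_total = sum(counts)
    p_total = sum(populations)
    total = 0.0
    for z_k, p_k in zip(counts, populations):
        expected_k = z_total * p_k / p_total  # count expected under uniformity
        if z_k > 0:  # a zero count contributes nothing (0 * log 0 = 0)
            total += z_k * math.log(z_k / expected_k)
    return 2.0 * total


# Toy data: two regions with equal populations but unequal counts.
print(g2_uniformity([30, 10], [100, 100]))
# Counts exactly proportional to population give a statistic of zero.
print(g2_uniformity([10, 20, 30], [100, 200, 300]))
```

The statistic is referred to a chi-squared distribution on K - 1 degrees of freedom, which is where the 66 for the 67 Florida counties comes from.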


Uniformity is often rejected. What should be the alternative to uniformity?

Techniques like kriging assess covariance structure and not the structure of the expected counts.


There are also techniques that measure spatial association (Cliff and Ord, 1973, 1981) with Moran's I and with Geary's c, and these also relate to covariance notions.

Cliff, A.D. and Ord, J.K. (1973) Spatial Autocorrelation, London: Pion.

Cliff, A.D. and Ord, J.K. (1981) Spatial Processes: Models and Applications, London: Pion.

Spatial association can also be given angular interpretations (Simon, 1997).

Simon, Gary (1997) An Angular Version of Spatial Correlations, with Exact Significance Tests, Geographical Analysis, vol 29, #3, pp 267-278.


Let’s form a model for the “spatial force” and give this model a central location or hot spot.

Denote this location as $s = \begin{pmatrix} s_x \\ s_y \end{pmatrix}$.

Here sx and sy are parameters to be estimated.


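Let f(z) be the spatial force at location $z = \begin{pmatrix} x \\ y \end{pmatrix}$.

Then let

$$f(z) = \frac{c\,\alpha^2}{\alpha^2 + \lVert z - s\rVert^2} = \frac{c}{1 + \frac{1}{\alpha^2}\,(z - s)'(z - s)}$$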


Since $f(z) = \dfrac{c\,\alpha^2}{\alpha^2 + \lVert z - s\rVert^2}$, f(s) = c.

At any z with $\lVert z - s\rVert = \alpha$, f(z) = c/2.

Thus α is a “half-strength” distance.
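
The half-strength property is easy to check numerically. The sketch below assumes the hot-spot form f(z) = c α² / (α² + ‖z − s‖²); the function name and the sample coordinates are illustrative only.

```python
def hot_spot_force(z, s, alpha, c=1.0):
    """Spatial force at z = (x, y) for a hot spot at s = (s_x, s_y)."""
    dist_sq = (z[0] - s[0]) ** 2 + (z[1] - s[1]) ** 2
    return c * alpha ** 2 / (alpha ** 2 + dist_sq)


s = (3.0, 4.0)   # hypothetical hot-spot location
alpha = 5.0      # hypothetical half-strength distance
c = 2.0

print(hot_spot_force(s, s, alpha, c))                   # at the hot spot: c
print(hot_spot_force((s[0] + alpha, s[1]), s, alpha, c))  # at distance alpha: c / 2
```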


In this form, the only role of c is to assure the condition

$$\sum_{k=1}^{K} \mathrm{E}(z_k) = \sum_{k=1}^{K} z_k$$


This can be generalized to mix uniform and hot-spot features.

$$f(z) = c\left(1 + \frac{\omega\,\alpha^2}{\alpha^2 + \lVert z - s\rVert^2}\right)$$

The parameter ω assesses the strength of the hot-spot relative to uniformity.

Negative ω notes a protective effect.


The maximum likelihood expected counts { ek } will be used in the test statistic

$$G^2 = 2\sum_{k=1}^{K} z_k \log\frac{z_k}{e_k}$$


The value of ek will be computed as

Pk × “average” force on county k

scaled so that $\sum_{k=1}^{K} z_k = \sum_{k=1}^{K} e_k$.
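
A small sketch of the scaling step, under the assumption that the average force on each county has already been computed by some other means. The function name and the toy inputs are hypothetical.

```python
def expected_counts(counts, populations, average_forces):
    """e_k proportional to P_k times the average force, rescaled to match the total count."""
    raw = [p_k * f_k for p_k, f_k in zip(populations, average_forces)]
    scale = sum(counts) / sum(raw)  # plays the role of the constant c
    return [scale * r for r in raw]


counts = [12, 30, 18]
populations = [1000, 2000, 1500]
average_forces = [0.4, 1.0, 0.6]  # made-up values for illustration

e = expected_counts(counts, populations, average_forces)
print(e)
print(sum(e), sum(counts))  # the two totals agree up to rounding
```

Because the rescaling absorbs any constant multiple of the force, c never needs to be estimated separately.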


Consider cancer rates in Florida. “Age-Adjusted Death Rates for Florida, 1998 – 2002.”

http://www.stateofflorida.com


Florida has 67 counties.

There were 38,814 cases in a population of 15,982,378. The rate is 2.43 per 1,000.

The G2 statistic is 2,816.27 on 66 degrees of freedom.

The cancer rates are not uniform.


The maximum likelihood fit occurred at parameter values

sx = 375.8877

sy = 300.6793

α = 13.4375

ω = 2.325


This fit has G2 = 2,246.93 on 66 - 4 = 62 degrees of freedom (the 66 of the uniformity test, less four fitted parameters).

This is still an inadequate fit, but the reduction in G2 is 569.34 with four degrees of freedom.


The fitted values are these:

The hot spot is at (82.56° W longitude, 28.80° N latitude), in Citrus County.


Map information comes in (longitude, latitude) form that needs to be converted to (x, y) form in (say) miles.


Each degree of latitude has the same mile equivalent.

One degree of latitude cuts off the same arc length at all latitudes.

[Figure labels: North Pole; equatorial plane]


However, a degree of longitude represents a small distance near the poles and a large distance near the equator.

[Figure labels: Equator; 30° N latitude]


Problem: Find the length of one degree of longitude at latitude θ.

Solution: Form a spherical triangle with one vertex at the North Pole, an angle of one degree at that vertex, and two sides of angular length 90° - θ running from the pole down to latitude θ.


In a spherical triangle, the sides also have angle measure.

[Figure labels: Equator; 30° N latitude]


We can use the law of sines for spherical triangles:

$$\frac{\sin a}{\sin A} = \frac{\sin b}{\sin B} = \frac{\sin c}{\sin C}$$

A, B, C are the angles and a, b, c are the sides.
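
One way to turn this into a number, offered as a sketch rather than as the computation used in the talk: bisecting the one-degree angle at the pole splits the isosceles triangle into two right-angled spherical triangles, and the law of sines then gives sin(c/2) = sin(90° - θ) · sin(0.5°), where c is the great-circle side joining the two points at latitude θ. The bisection step, the function name, and the mean Earth radius of 3958.8 miles are assumptions of this example.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean radius; not taken from the slides


def miles_per_degree_longitude(latitude_deg):
    """Great-circle length of one degree of longitude at the given latitude."""
    half_polar_angle = math.radians(0.5)           # half of the one-degree angle
    colatitude = math.radians(90.0 - latitude_deg)  # side from the pole, 90 - theta
    # Law of sines in the right-angled half triangle:
    #   sin(c/2) / sin(0.5 deg) = sin(90 - theta) / sin(90 deg)
    half_side = math.asin(math.sin(colatitude) * math.sin(half_polar_angle))
    return 2.0 * half_side * EARTH_RADIUS_MILES


for lat in (0.0, 28.8, 60.0, 90.0):
    print(lat, miles_per_degree_longitude(lat))
```

Over a single degree the great-circle side and the arc along the parallel are almost the same length, so the result is very close to cos θ times the length of a degree at the equator.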


The expected count E(zk) = ek is computed as Pk × “average” force on county k.

This average force could be f(ck), where ck is the center of the county.


Instead we will use

$$\frac{1}{\operatorname{Area}(\mathcal{B})}\int_{\mathcal{B}} f(h)\,dh$$

where $\mathcal{B}$ denotes the county and h is the two-dimensional variable of integration.


The value of $\operatorname{Area}(\mathcal{B})$ can be obtained from outside sources.

The challenge comes in finding $\int_{\mathcal{B}} f(h)\,dh$.

This can be difficult even for simple figures; $\mathcal{B}$ is not simple.


Finding $\int_{\mathcal{B}} f(h)\,dh$ requires some organized description of $\partial\mathcal{B}$, the boundary of $\mathcal{B}$.

Fortunately, such descriptions are available from mapping programs.


Consider this geographical region:


The mapping program MapInfo will export an MIF file giving coordinates of (longitude, latitude) points on the boundary.

The file has this layout (the number of points, then one point per line):

    26
    -75 40.1288
    -75.0154 40.1378
    -75.1094 40.0454
    ...
    -75 40.0294
    -74.9755 40.0485
    -74.9893 40.1259
    -75 40.1288
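
A minimal sketch of reading such a listing, assuming only the layout shown here: a point count followed by that many longitude/latitude pairs. It is not a parser for the full MIF format, the function name is hypothetical, and the four sample points are invented for the example.

```python
def parse_boundary(text):
    """Read 'count, then lon lat pairs' into a list of (lon, lat) tuples."""
    tokens = text.split()
    n_points = int(tokens[0])
    values = [float(t) for t in tokens[1:]]
    if len(values) != 2 * n_points:
        raise ValueError("expected %d points, found %d numbers" % (n_points, len(values)))
    # Pair up alternating longitude and latitude values.
    return list(zip(values[0::2], values[1::2]))


sample = """4
-75 40.0
-74.9 40.0
-74.9 40.1
-75 40.0
"""  # invented coordinates; the last point repeats the first to close the ring

boundary = parse_boundary(sample)
print(boundary)
```

The resulting degree pairs still need converting to (x, y) in miles before any area or force integral is computed.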


A graph of these points:


With the boundary so identified, county $\mathcal{B}$ is a polygon, so the task of finding $\int_{\mathcal{B}} f(h)\,dh$ is equivalent to integrating over that polygon.

The mathematics can be done with Green’s theorem.


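Green’s theorem for connected region $\mathcal{B}$ and for scalar functions P and Q of two variables is

$$\oint_{\partial\mathcal{B}} \left(P\,dx + Q\,dy\right) = \iint_{\mathcal{B}} \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) dx\,dy$$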


The boundary needs to be parameterized as a function of a single variable, say t. This is possible when the boundary is made up of simple curves or, as in the MapInfo story, straight lines.


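The line connecting $\left(x_k^{\mathcal{B}}, y_k^{\mathcal{B}}\right)$ to $\left(x_{k+1}^{\mathcal{B}}, y_{k+1}^{\mathcal{B}}\right)$ is parameterized as

$$\begin{aligned}
x(t) &= x_k^{\mathcal{B}} + \left(x_{k+1}^{\mathcal{B}} - x_k^{\mathcal{B}}\right) t \\
y(t) &= y_k^{\mathcal{B}} + \left(y_{k+1}^{\mathcal{B}} - y_k^{\mathcal{B}}\right) t
\end{aligned}
\qquad 0 \le t \le 1$$

Note that dy means $\left(y_{k+1}^{\mathcal{B}} - y_k^{\mathcal{B}}\right) dt$.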


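In the statement of Green’s theorem,

$$\iint_{\mathcal{B}} \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) dx\,dy = \oint_{\partial\mathcal{B}} \left(P\,dx + Q\,dy\right)$$

let’s use $\dfrac{\partial Q}{\partial x} = 1$ and $\dfrac{\partial P}{\partial y} = 0$, so that $\dfrac{\partial Q}{\partial x} - \dfrac{\partial P}{\partial y} = 1$.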


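Green’s theorem is now

$$\iint_{\mathcal{B}} \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) dx\,dy = \iint_{\mathcal{B}} dx\,dy = \operatorname{Area}(\mathcal{B}) = \oint_{\partial\mathcal{B}} \left(P\,dx + Q\,dy\right)$$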


These conditions are satisfied by

P(x, y) = 0 and Q(x, y) = x

and then

$$\operatorname{Area}(\mathcal{B}) = \oint_{\partial\mathcal{B}} x\,dy$$


With the boundary given as a polygon, the calculation is routine.

The consequence is

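$$\operatorname{Area}(\mathcal{B}) = \pm\frac{1}{2}\sum_{k=1}^{m}\left(x_k^{\mathcal{B}}\,y_{k+1}^{\mathcal{B}} - x_{k+1}^{\mathcal{B}}\,y_k^{\mathcal{B}}\right)$$

where m is the number of boundary points of region $\mathcal{B}$.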
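
For readers who want the routine step written out, here is one way to do it, assuming a counter-clockwise boundary, the straight-line parameterization of each segment given earlier, and point m + 1 identified with point 1. The superscript on the coordinates is dropped for brevity.

$$\int_{\text{Segment } k} x\,dy = \left(y_{k+1} - y_k\right)\int_0^1 \left[x_k + \left(x_{k+1} - x_k\right)t\right] dt = \frac{1}{2}\left(x_k + x_{k+1}\right)\left(y_{k+1} - y_k\right)$$

$$\sum_{k=1}^{m} \frac{1}{2}\left(x_k + x_{k+1}\right)\left(y_{k+1} - y_k\right) = \frac{1}{2}\sum_{k=1}^{m}\left(x_k\,y_{k+1} - x_{k+1}\,y_k\right) + \frac{1}{2}\sum_{k=1}^{m}\left(x_{k+1}\,y_{k+1} - x_k\,y_k\right)$$

The last sum telescopes to zero around a closed boundary, leaving the stated formula. Traversing the boundary clockwise reverses every segment and so flips the sign.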


This calculation finds the area of region $\mathcal{B}$ and, as a side benefit, discovers whether the point ordering was clockwise or counter-clockwise.


We need also the integrated force function $\int_{\mathcal{B}} f(h)\,dh$.


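Match $\int_{\mathcal{B}} f(h)\,dh$ to Green’s theorem

$$\iint_{\mathcal{B}} \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) dx\,dy = \oint_{\partial\mathcal{B}} \left(P\,dx + Q\,dy\right)$$

with P(x, y) ≡ 0 and

$$\frac{\partial Q}{\partial x} = f(h) = f(x, y)$$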


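This means that we need to be able to find

$$Q(x, y) = \int f(z)\,dx = \int f(x, y)\,dx$$

The solution is

$$Q(x, y) = c\,x + \frac{c\,\omega\,\alpha^2}{\sqrt{\alpha^2 + (y - s_y)^2}}\,\tan^{-1}\!\left(\frac{x - s_x}{\sqrt{\alpha^2 + (y - s_y)^2}}\right)$$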


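Then

$$\int_{\mathcal{B}} f(h)\,dh = \oint_{\partial\mathcal{B}} Q(x, y)\,dy = \oint_{\partial\mathcal{B}} \left[c\,x + \frac{c\,\omega\,\alpha^2}{\sqrt{\alpha^2 + (y - s_y)^2}}\,\tan^{-1}\!\left(\frac{x - s_x}{\sqrt{\alpha^2 + (y - s_y)^2}}\right)\right] dy$$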


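Let $\left(x_1^{\mathcal{B}}, y_1^{\mathcal{B}}\right)$, $\left(x_2^{\mathcal{B}}, y_2^{\mathcal{B}}\right)$, …, $\left(x_m^{\mathcal{B}}, y_m^{\mathcal{B}}\right)$ be the boundary points of $\mathcal{B}$. Then

$$\oint_{\partial\mathcal{B}} Q(x, y)\,dy = \sum_{k=1}^{m} \int_{\text{Segment } k} Q(x, y)\,dy$$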

Segment k connects point k to point k + 1.

(Last segment goes back to point 1.)


Each segment is parameterized by t, with 0 ≤ t ≤ 1. The integral can be found by any reasonable approximation method.

If the interval is short, use the average of the integrand at the endpoints.


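In particular,

$$\begin{aligned}
\int_{\text{Segment } k} Q(x, y)\,dy
&\approx \int_{\text{Segment } k} \frac{Q\!\left(x_k^{\mathcal{B}}, y_k^{\mathcal{B}}\right) + Q\!\left(x_{k+1}^{\mathcal{B}}, y_{k+1}^{\mathcal{B}}\right)}{2}\,dy \\
&= \int_0^1 \frac{Q\!\left(x_k^{\mathcal{B}}, y_k^{\mathcal{B}}\right) + Q\!\left(x_{k+1}^{\mathcal{B}}, y_{k+1}^{\mathcal{B}}\right)}{2}\left(y_{k+1}^{\mathcal{B}} - y_k^{\mathcal{B}}\right) dt \\
&= \frac{Q\!\left(x_k^{\mathcal{B}}, y_k^{\mathcal{B}}\right) + Q\!\left(x_{k+1}^{\mathcal{B}}, y_{k+1}^{\mathcal{B}}\right)}{2}\left(y_{k+1}^{\mathcal{B}} - y_k^{\mathcal{B}}\right)
\end{aligned}$$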


The summation over k collapses to the very simple form

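$$\sum_{k=1}^{m} \frac{Q\!\left(x_k^{\mathcal{B}}, y_k^{\mathcal{B}}\right)}{2}\left(y_{k+1}^{\mathcal{B}} - y_{k-1}^{\mathcal{B}}\right)$$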

The counter k is to be interpreted mod(m).
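
The collapsed sum is short enough to sketch directly. The code below assumes the mixed uniform and hot-spot force f(z) = c (1 + ω α² / (α² + ‖z − s‖²)) and its antiderivative in x, a boundary listed counter-clockwise in (x, y) units such as miles, and hypothetical function names. Python's modulo operator handles the "k mod m" wrap-around at both ends of the list.

```python
import math


def q_antiderivative(x, y, s, alpha, omega, c=1.0):
    """Q(x, y) with dQ/dx = c * (1 + omega * alpha^2 / (alpha^2 + |z - s|^2))."""
    root = math.sqrt(alpha ** 2 + (y - s[1]) ** 2)
    return c * x + c * omega * alpha ** 2 / root * math.atan((x - s[0]) / root)


def integrated_force(boundary, s, alpha, omega, c=1.0):
    """Approximate the integral of f over the polygon by the collapsed sum."""
    m = len(boundary)
    total = 0.0
    for k in range(m):
        x_k, y_k = boundary[k]
        y_next = boundary[(k + 1) % m][1]  # point k+1, wrapping to the first point
        y_prev = boundary[(k - 1) % m][1]  # point k-1, wrapping to the last point
        total += q_antiderivative(x_k, y_k, s, alpha, omega, c) * (y_next - y_prev) / 2.0
    return total


unit_square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]  # counter-clockwise

# With omega = 0 the force is the constant c, so the integral is c times the area.
print(integrated_force(unit_square, s=(0.5, 0.5), alpha=10.0, omega=0.0, c=2.0))
```

Dividing the result by the polygon's area gives the "average" force on the county. Because the endpoint-average rule is only accurate on short segments, a coarse boundary near the hot spot may need extra points inserted along its edges.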


For any values of sx, sy, α, ω it is possible to express a Poisson likelihood and thus to get maximum likelihood estimates.

This is not easy computation.
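
To make the likelihood concrete, here is a hedged sketch of the Poisson log-likelihood for given expected counts, together with a check of its link to G2. The function names and the toy numbers are illustrative, and the search over sx, sy, α, ω (the hard part) is deliberately left out.

```python
import math


def poisson_log_likelihood(counts, expected):
    """Sum over regions of log P(Z = z_k) for Z ~ Poisson(e_k)."""
    return sum(
        z * math.log(e) - e - math.lgamma(z + 1)  # lgamma(z + 1) = log(z!)
        for z, e in zip(counts, expected)
    )


def g2(counts, expected):
    """Likelihood-ratio statistic 2 * sum z_k log(z_k / e_k)."""
    return 2.0 * sum(z * math.log(z / e) for z, e in zip(counts, expected) if z > 0)


counts = [12, 30, 18]
expected = [20.0, 20.0, 20.0]  # toy values with the same total as the counts

saturated = poisson_log_likelihood(counts, counts)  # the perfect-fit model e_k = z_k
fitted = poisson_log_likelihood(counts, expected)
print(2.0 * (saturated - fitted), g2(counts, expected))  # the two values agree
```

When the expected counts are scaled to match the total count, maximizing the log-likelihood over the four parameters is therefore the same as minimizing G2.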


Trevelyan, Smallman-Raynor, and Cliff provided a spatial analysis of the 1916 polio epidemic that hit the northeastern United States.

Trevelyan, Barry, Smallman-Raynor, Matthew, and Cliff, Andrew D. (2005) The Spatial Structure of Epidemic Emergence: Geographical Aspects of Poliomyelitis in North-eastern USA, July-October 1916, Journal of the Royal Statistical Society, Series A, vol 168, part 4, pp 701-722.


Their region of inquiry:

County-based data for 148 counties.


These counties had total population 20,532,602 and 20,777 cases of polio.

This is about 1.01 cases per thousand people.


Observed polio rates:


Test for uniformity gives

G2 = 16,713.64 on 147 degrees of freedom


Maximum likelihood estimates:

sx = 450.78

sy = 135.77

α = 56.80

ω = 15.66

The center is offshore, east of Ocean County, New Jersey.


The display of fitted rates:


Fit measure is G2 = 7,045.73 on 143 degrees of freedom.

Reduction in G2 is 9,667.91, for four degrees of freedom.


Next step: Use the integrated force function as a carrier in a Poisson regression.


The End