Applying Geostatistical Methods to Lattice Data: An Initial Examination of U.S. Presidential Elections in Iowa A.C. Thomas Statistics 225 December 14, 2004

Applying Geostatistical Methods to Lattice Data: An Initial Examination of U.S. Presidential Elections in Iowa

A.C. Thomas

Statistics 225

December 14, 2004

Sources/Guides

• Main source: “Hierarchical Models”, chapters 2 and 3 (geostatistical and spatial data)

• Data sources: http://www.sos.state.ia.us/elections/results/ (1996/2000)

• http://www.cnn.com/ (2004)

• Special thanks: Brad Carlin (UMN), Andy Gelman (Columbia), Paul Edlefsen (Harvard)

• GeoR: P.J. Ribeiro and P.J. Diggle

Motivation

• In this course, we have learned three different methods of examining spatial data, each suited to different conditions, with some interchangeability among them

• Often, we may not have the tools to examine a data set using one method (e.g. the shortcomings of R in manipulating lattice data)

• In this case, we will compare and contrast the effectiveness of a geostatistical method used on lattice data to a lattice method through self cross-validation

Interrelationship

• Geostats and kriging: using variograms and distance relationships to predict quantities across distances

• Lattices: using neighbour relationships to predict quantities across distances

• Direct similarities: some weighting schemes across distances directly resemble covariograms

Why election data?

• Why not?

• Spatial organization is well understood, constant in time (county borders have not changed across data sets), and built into R (maps library)

• While specific challengers change over time, parties are relatively constant, as are other control variables

• Ramifications are germane to the functioning of society (and the insatiable appetite of news junkies and policy wonks)

Questions:

• For this data set, does a geostatistical approximation produce a result comparable in error to a lattice model?

• If so, can we use fitted information from one election to predict the complete results of the next one? (And how much are we off?)

Chosen model: Iowa

Why Iowa?

• 99 counties which have roughly equal area, removing a possible nuisance (and are rectilinear, so easier to draw)

• Swing state, with a rough vote balance over time

• Not too big, not too small in either population or size

Simplification: No third parties

• For now, considering only the votes for Democrat and Republican candidates in presidential elections from 1996-2004

• Not so bad in 2000/2004, when the third-party vote was about 3% of the total

• Worse in 1996, when Perot’s relatively successful campaign drew up to 10% of total votes

Iowa in 1996 (Dole, Clinton)

Iowa in 2000 (Bush, Gore)

Iowa in 2004 (Bush, Kerry)

Initial impressions

• There seems to be a tendency to vote more Republican the further west we look

• (Observation, courtesy Matt Anthony: as we go east, we hit Illinois, a Democratic core.)

• What is the population distribution by county over time?

Iowa’s total voters, 1996

Iowa’s total voters, 2000

Iowa’s total voters, 2004

Quick-and-dirty non-spatial analysis

• Question: how does population size correlate with the Democratic vote?

• Correlation between blue vote and “total” vote:

• 1996: 0.18

• 2000: 0.30

• 2004: 0.29

• So population would appear to be an important covariate.
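As a rough illustration of the quantity quoted above, here is a minimal Python sketch of a Pearson correlation between county-level Democratic share and total votes. The function name `pearson_r` and the five toy counties are hypothetical; they are not the Iowa data, and the sketch does not reproduce the figures on the slide.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()  # centre both variables
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

# Hypothetical toy counties: total two-party votes and Democratic share.
total_votes = [3000, 8000, 15000, 60000, 120000]
dem_share = [0.44, 0.47, 0.46, 0.52, 0.55]
r_toy = pearson_r(total_votes, dem_share)
print(f"toy correlation: {r_toy:.2f}")
```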

Geostatistical analysis

• Locations: centroids of each county (obtained through centroid.polygon function in maps library of R)

• Data: Republican percentage of vote (arbitrarily chosen, not necessarily personal political affiliation)
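For readers without the maps library at hand, the idea of a county centroid can be sketched in Python with the standard area-weighted (shoelace) formula. This is an assumption about what a polygon centroid means here, not a description of how centroid.polygon is implemented; the function name and the rectangular toy county are hypothetical.

```python
from typing import List, Tuple

def polygon_centroid(vertices: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Area-weighted centroid of a simple polygon via the shoelace formula."""
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the polygon
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon with zero area")
    return cx / (3.0 * area2), cy / (3.0 * area2)

# Hypothetical rectangular "county" in longitude/latitude degrees.
county = [(-94.0, 42.0), (-93.5, 42.0), (-93.5, 42.4), (-94.0, 42.4)]
centroid = polygon_centroid(county)
print(centroid)
```

Because Iowa's counties are close to rectangular, the centroid of such a polygon sits near the middle of its bounding box.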

Initial data plots: Unaltered

Initial fitting

• Semivariogram appears to increase without bound, suggesting nonstationarity

• Plan: use Universal Kriging with this semivariogram

• Problem: Trend appears to be a power law, with power greater than 2 (impossible to fit with conventional definitions)

• Possible solutions: a) remove trend from data. b) don’t care.
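To make the "increases without bound" observation concrete, the following Python sketch computes a binned empirical semivariogram. It is a generic textbook estimator, not the geoR routine used for the slides; the function name, the bin count, and the simulated data (99 random points with a west-to-east trend) are all assumptions for illustration.

```python
import numpy as np

def empirical_semivariogram(coords, values, n_bins=8):
    """Average of 0.5 * (z_i - z_j)^2 over point pairs, grouped by distance bin."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)  # each unordered pair once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    half_sq = 0.5 * (values[i] - values[j]) ** 2
    edges = np.linspace(0.0, dist.max(), n_bins + 1)
    which = np.clip(np.digitize(dist, edges) - 1, 0, n_bins - 1)
    centers, gamma = [], []
    for b in range(n_bins):
        mask = which == b
        if mask.any():  # skip empty bins
            centers.append(dist[mask].mean())
            gamma.append(half_sq[mask].mean())
    return np.array(centers), np.array(gamma)

# Simulated stand-in data: share falls from west (x = 0) to east (x = 5).
rng = np.random.default_rng(0)
xy = rng.uniform(0, 5, size=(99, 2))
z = 0.55 - 0.02 * xy[:, 0] + rng.normal(0, 0.01, 99)
h, g = empirical_semivariogram(xy, z)
for lag, semivar in zip(h, g):
    print(f"lag {lag:.2f}: semivariance {semivar:.5f}")
```

With an unremoved trend like this one, the semivariance keeps growing with lag instead of levelling off at a sill, which is the pattern the slide reads as nonstationarity.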

Plan A: Remove trend from data

• What it does: lets us remove known spatial dependence, look at other trends

• Initial look: major discrepancies.
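One way to picture Plan A is an ordinary least-squares fit of a second-degree polynomial surface in the centroid coordinates, keeping the residuals for further analysis. The Python sketch below assumes that form of trend (the slides later mention a second-degree trend); the function names and simulated data are hypothetical.

```python
import numpy as np

def quadratic_design(xy):
    """Design matrix with columns 1, x, y, x^2, xy, y^2."""
    x, y = xy[:, 0], xy[:, 1]
    return np.column_stack([np.ones_like(x), x, y, x * x, x * y, y * y])

def remove_quadratic_trend(xy, z):
    """Fit a second-degree trend surface by least squares; return residuals and fit."""
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    X = quadratic_design(xy)
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    fitted = X @ beta
    return z - fitted, fitted

# Simulated stand-in data with a smooth spatial trend plus noise.
rng = np.random.default_rng(2)
xy = rng.uniform(0, 5, size=(99, 2))
z = 0.55 - 0.03 * xy[:, 0] + 0.002 * xy[:, 0] ** 2 + rng.normal(0, 0.02, 99)
residuals, trend = remove_quadratic_trend(xy, z)
print(f"variance before: {z.var():.5f}, after: {residuals.var():.5f}")
```

Because the design matrix includes a constant column, the residuals are centred at zero automatically; adding `trend` back to kriged residuals returns predictions to the original scale.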

Plan B: Don’t care.

• The goodness of fit only tails off at the end

• Preliminary results show the other option to be extremely inaccurate due to noise levels in residual data

Second trend removed, data centered

Exploratory Kriging

Meaningful Kriging

• Since we want to test the predictive power of this method, we should test it on our current data through cross-validation

• Key: remove one point, use the semivariogram fitted to the remaining points to interpolate the value at the removed centroid, and repeat for each county

• Then, return trend to data and compare with original values

• Use universal kriging with second-degree trend
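The leave-one-out procedure described above can be sketched in Python as follows. This is a simplified stand-in for the R workflow, under several assumptions: the variogram is a power model with parameters held fixed across folds (rather than refitted each time), the drift is a full second-degree polynomial, and the exponent is set to 1.5 so the model stays strictly valid. All function names, parameter values, and the simulated data are hypothetical.

```python
import numpy as np

def power_variogram(h, nugget, scale, power):
    """gamma(h) = nugget + scale * h**power for h > 0, and 0 at h = 0."""
    h = np.asarray(h, dtype=float)
    return np.where(h > 0, nugget + scale * h ** power, 0.0)

def quadratic_drift(xy):
    """Drift terms 1, x, y, x^2, xy, y^2 evaluated at each location."""
    x, y = xy[:, 0], xy[:, 1]
    return np.column_stack([np.ones_like(x), x, y, x * x, x * y, y * y])

def universal_krige(xy, z, xy0, nugget, scale, power):
    """Universal kriging prediction at a single location xy0."""
    n = len(z)
    F = quadratic_drift(xy)
    f0 = quadratic_drift(xy0[None, :])[0]
    p = F.shape[1]
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    d0 = np.linalg.norm(xy - xy0, axis=1)
    # Kriging system: semivariances plus unbiasedness constraints on the drift.
    A = np.zeros((n + p, n + p))
    A[:n, :n] = power_variogram(d, nugget, scale, power)
    A[:n, n:] = F
    A[n:, :n] = F.T
    b = np.concatenate([power_variogram(d0, nugget, scale, power), f0])
    weights = np.linalg.solve(A, b)[:n]  # drop the Lagrange multipliers
    return float(weights @ z)

def leave_one_out(xy, z, nugget, scale, power):
    """Predict each point from all the others, one at a time."""
    idx = np.arange(len(z))
    return np.array([
        universal_krige(xy[idx != i], z[idx != i], xy[i], nugget, scale, power)
        for i in idx
    ])

# Simulated stand-in: a purely quadratic surface, which the drift should recover.
rng = np.random.default_rng(1)
xy = rng.uniform(0, 5, size=(40, 2))
z = 0.5 - 0.03 * xy[:, 0] + 0.004 * xy[:, 1] ** 2
pred = leave_one_out(xy, z, nugget=0.001, scale=0.001, power=1.5)
print(f"largest cross-validation error: {np.abs(pred - z).max():.2e}")
```

On real vote shares the cross-validation errors would not vanish; the toy surface is chosen only because the unbiasedness constraints guarantee exact recovery, which makes the sketch easy to check.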

1996 Redux – Predicted Values

• In total, the prediction gives Dole 9,726 more votes than he actually received.

• Absolute error: 43,526

• Total 2-party votes: 1,112,902
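The slides do not spell out how the two error figures are tallied. A plausible reading, sketched below in Python, is that each county's predicted-minus-actual Republican share is converted to votes, with the net figure being the signed sum and the absolute error the sum of magnitudes. That reading, the function name, and the three toy counties are assumptions.

```python
def vote_errors(pred_share, actual_share, two_party_votes):
    """Return (net, absolute) error in votes from county-level share predictions."""
    diffs = [
        (p - a) * v  # share error converted to a vote count
        for p, a, v in zip(pred_share, actual_share, two_party_votes)
    ]
    net = sum(diffs)                       # over- and under-predictions cancel
    absolute = sum(abs(d) for d in diffs)  # every miss counts
    return net, absolute

# Hypothetical toy counties, not Iowa data.
net, absolute = vote_errors(
    pred_share=[0.50, 0.40, 0.60],
    actual_share=[0.48, 0.43, 0.55],
    two_party_votes=[10000, 20000, 5000],
)
print(f"net error: {net:.0f} votes, absolute error: {absolute:.0f} votes")
```

Under this reading the absolute error is always at least as large as the magnitude of the net error, which matches the pattern in the reported figures.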

Fitting variograms between models

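• For all, the power model was an appropriate choice: $\gamma(t) = \tau^2 + \sigma^2 t^{\lambda}$

• 1996: $\sigma^2$ = 9.24e-4, $\lambda$ = 1.98, $\tau^2$ = 0.031

• 2000: $\sigma^2$ = 9.93e-4, $\lambda$ = 2.00, $\tau^2$ = 0

• 2004: $\sigma^2$ = 1.16e-3, $\lambda$ = 2.00, $\tau^2$ = 0.025

• All roughly identical, even with different total averages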

2000 Predicted

• Prediction: Bush gets 26,000 more votes

• Absolute error: 181,880

• Total Bush/Gore votes: 1,272,890

2004 Prediction

• Prediction: Bush gets 32,094 more votes

• Absolute difference: 74,458

• Total votes: 1,479,702

“Naïve Neighbour”

• For a baseline comparison, take the simplest (stupidest) lattice cross-validation test – “ask your neighbour”, trivial SAR weights

• Predicted value at a square is simply the mean of border-sharing neighbours (data is Republican percentage of vote)
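The "ask your neighbour" rule is simple enough to sketch directly in Python. The adjacency structure below is a hypothetical 2x2 block of counties; the real analysis would need Iowa's border-sharing list, which is not given in the slides.

```python
from typing import Dict, List

def naive_neighbour_predict(
    values: Dict[str, float], neighbours: Dict[str, List[str]]
) -> Dict[str, float]:
    """Predict each county's value as the plain mean of its border-sharing neighbours."""
    predictions = {}
    for county, adjacent in neighbours.items():
        if not adjacent:
            continue  # an isolated county has nobody to ask
        predictions[county] = sum(values[name] for name in adjacent) / len(adjacent)
    return predictions

# Hypothetical 2x2 block of counties:  A B
#                                      C D
rep_share = {"A": 0.52, "B": 0.48, "C": 0.55, "D": 0.45}
borders = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
nn_pred = naive_neighbour_predict(rep_share, borders)
print(nn_pred)
```

Since a county's own value never enters its prediction, this rule is a leave-one-out cross-validation by construction.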

“NN” 1996

• Dole: 10,819 more predicted

• Total deviation: 40,923

“NN” 2000

• Bush gets 28,535 extra in prediction

• Total deviation: 59,670

“NN” 2004

• Bush gets 37,175 more

• Total deviation: 76,926

Cross-validation summary

Year   Geostat error   NN error   Geostat total error   NN total error   Voting pop.
1996   9,726           10,819     43,526                40,923           1,112,902
2000   26,000          28,535     61,485                59,670           1,272,890
2004   32,094          37,175     74,458                76,926           1,479,702
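To compare years with different turnout, the summary figures can be expressed as a percentage of the voting population. The short Python sketch below uses the numbers from the summary table; the dictionary keys and the helper name `as_percent` are hypothetical labels.

```python
# Figures copied from the cross-validation summary table.
summary = {
    1996: {"geostat_net": 9726, "nn_net": 10819,
           "geostat_abs": 43526, "nn_abs": 40923, "voters": 1112902},
    2000: {"geostat_net": 26000, "nn_net": 28535,
           "geostat_abs": 61485, "nn_abs": 59670, "voters": 1272890},
    2004: {"geostat_net": 32094, "nn_net": 37175,
           "geostat_abs": 74458, "nn_abs": 76926, "voters": 1479702},
}

def as_percent(count, total):
    """Express a vote count as a percentage of the total."""
    return 100.0 * count / total

relative = {
    year: {
        key: round(as_percent(value, row["voters"]), 2)
        for key, value in row.items()
        if key != "voters"
    }
    for year, row in summary.items()
}

for year, row in relative.items():
    print(year, row)
```

Seen this way, the two methods differ by a fraction of a percentage point in every year.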

Conclusions

• Data is definitely not stationary, even after removing trends

• Good kriging is about as effective as “naïve neighbour”, both without covariates

• Prediction with these tools at this simple level is not yet accurate enough

• Each method overpredicts the Republican vote

• Fitted variogram parameters for each year are very close

Future Developments and Unanswered Questions – New!

• I’ve since introduced universal co-kriging with population, past voting behavior and second-degree spatial dependences using the gstat package.

• Needed: data from the last 4 elections, conveniently packaged. Other prediction using spatial methods.