Applying Geostatistical Methods to Lattice Data: An Initial Examination of U.S. Presidential Elections in Iowa

A.C. Thomas, Statistics 225

December 14, 2004

Sources/Guides

• Main source: “Hierarchical Models”, chapters 2 and 3 (geostatistical and lattice data)

• Data sources: http://www.sos.state.ia.us/elections/results/ (1996/2000); http://www.cnn.com/ (2004)

• Special thanks: Brad Carlin (UMN), Andy Gelman (Columbia), Paul Edlefsen (Harvard)

• geoR (R package): P.J. Ribeiro and P.J. Diggle

Motivation

• In this course, we have learned three different methods of examining spatial data, each suited to particular conditions, and they are to some extent interchangeable

• Often we may lack the tools to examine a data set with one method (e.g. R's shortcomings in handling lattice data)

• Here, we compare the effectiveness of a geostatistical method applied to lattice data with that of a lattice method, using leave-one-out cross-validation on each election's own data

Interrelationship

• Geostats and kriging: using variograms and distance relationships to predict quantities across distances

• Lattices: using neighbour (adjacency) relationships to predict quantities across regions

• Direct similarities: some weighting schemes across distances directly resemble covariograms

Why election data?

• Why not?

• Spatial organization is well understood, constant in time (county borders have not changed across the data sets) and built into R (maps library)

• While the specific candidates change over time, the parties are relatively constant, as are other control variables

• Ramifications are germane to the functioning of society (and the insatiable appetite of news junkies and policy wonks)

Questions:

• For this data set, does a geostatistical approximation produce a result comparable in error to a lattice model?

• If so, can we use fitted information from one election to predict the complete results of the next? (And by how much are we off?)

Chosen model: Iowa

Why Iowa?

• 99 counties of roughly equal area, which removes a possible nuisance (and they are largely rectilinear, so easier to draw)

• Swing state, with a rough vote balance over time

• Not too big, not too small in either population or area

Simplification: No third parties

• For now, considering only the votes for Democratic and Republican candidates in presidential elections from 1996-2004

• Not so bad in 2000/2004, when the third-party vote was about 3% of the total

• Worse in 1996 (Perot’s relatively strong campaign drew a lot): up to 10% of total votes

Iowa in 1996 (Dole, Clinton)

Iowa in 2000 (Bush, Gore)

Iowa in 2004 (Bush, Kerry)

Initial impressions

• There seems to be a tendency to vote more Republican the further west we look

• (Observation, courtesy Matt Anthony: as we go east, we hit Illinois, a Democratic core.)

• What is the population distribution by county over time?

Iowa’s total voters, 1996

Quick-and-dirty non-spatial analysis

• Question: how does population size correlate with the Democratic vote?

• Correlation between the Democratic (“blue”) vote share and the “total” vote, across counties:

• 1996: ρ = 0.18

• 2000: ρ = 0.30

• 2004: ρ = 0.29

• So population would appear to be an important covariate.
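The slides do not say which correlation coefficient was used. The sketch below assumes the ordinary Pearson coefficient across counties. The numbers are made up and are not the Iowa returns.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical figures for five made-up counties (NOT the Iowa data).
dem_share = [0.42, 0.47, 0.51, 0.45, 0.55]            # Democratic vote share
total_votes = [8_000, 12_500, 40_000, 9_800, 95_000]  # two-party votes cast
r = pearson_r(dem_share, total_votes)
```

A coefficient of 0.2 to 0.3 means population explains only a modest share of the variation in the vote share. That is still enough to make it worth trying as a covariate.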

Geostatistical analysis

• Locations: the centroid of each county (obtained with the centroid.polygon function in R's maps library)

• Data: Republican percentage of the vote (an arbitrary choice, not a statement of political affiliation)
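It is not certain how the maps library's centroid.polygon computes its result. The sketch below swaps in the standard area-weighted (shoelace) centroid of a polygon. It treats longitude and latitude as planar coordinates, which is an assumption that is roughly acceptable for a county-sized polygon. The county outline is hypothetical.

```python
def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon (shoelace formula).

    vertices: (x, y) pairs in boundary order, first point not repeated.
    Works for clockwise or counter-clockwise order.
    """
    area2 = 0.0  # twice the signed area
    sum_x = sum_y = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        area2 += cross
        sum_x += (x0 + x1) * cross
        sum_y += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("polygon has zero area")
    # C = (1 / 6A) * sum(...), and 6A = 3 * area2
    return sum_x / (3 * area2), sum_y / (3 * area2)


# Hypothetical rectangular "county" in (longitude, latitude) degrees.
county = [(-93.5, 42.0), (-93.0, 42.0), (-93.0, 42.4), (-93.5, 42.4)]
lon, lat = polygon_centroid(county)
```

For a rectangle, the result is simply the midpoint of the box. For the irregular river counties along Iowa's eastern edge, the area weighting matters more.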

Initial data plots: Unaltered

Initial fitting

• Semivariogram appears to increase without bound, suggesting nonstationarity

• Plan: use universal kriging with this semivariogram

• Problem: the trend appears to follow a power law with exponent greater than 2 (impossible to fit with the conventional power variogram, which requires an exponent strictly between 0 and 2)

• Possible solutions: (a) remove the trend from the data; (b) don’t care.
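The semivariogram behaviour described above can be sketched with the classical (Matheron) binned estimator. The sketch applies it to a noise-free linear trend, showing how trend alone makes the semivariogram grow without bound. The function name and bin settings are illustrative.

```python
from math import hypot


def empirical_semivariogram(coords, values, bin_width, n_bins):
    """Binned classical (Matheron) semivariogram estimate.

    For the N pairs whose separation d falls in
    [k * bin_width, (k + 1) * bin_width),
    gamma_k = sum((z_i - z_j) ** 2) / (2 * N).
    Returns (bin midpoint, gamma_k, N) for every non-empty bin.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            d = hypot(coords[i][0] - coords[j][0], coords[i][1] - coords[j][1])
            k = int(d // bin_width)
            if k < n_bins:  # pairs beyond the last bin are ignored
                sums[k] += (values[i] - values[j]) ** 2
                counts[k] += 1
    return [((k + 0.5) * bin_width, sums[k] / (2 * counts[k]), counts[k])
            for k in range(n_bins) if counts[k] > 0]


# A pure linear trend z = x along a transect, with no noise at all.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
values = [0.0, 1.0, 2.0, 3.0]
vario = empirical_semivariogram(coords, values, bin_width=1.0, n_bins=4)
```

Here every pair at distance d contributes (z_i − z_j)² = d², so γ = d²/2. That is 0.5, 2.0 and 4.5 at distances 1, 2 and 3. A deterministic linear trend therefore looks like a power law with exponent 2, which is why an apparent exponent of about 2 or more points to trend rather than stationary spatial correlation.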

Plan A: Remove trend from data

• What it does: lets us remove the known spatial dependence and look at other trends

• Initial look: major discrepancies.
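The slides do not say how the second-degree trend was estimated. The sketch below assumes ordinary least squares on a quadratic surface in the centroid coordinates. The residuals would then feed the semivariogram, and `trend_at` adds the trend back afterwards. All function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly so")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def quad_terms(x, y):
    """Second-degree trend-surface basis: 1, x, y, x^2, xy, y^2."""
    return [1.0, x, y, x * x, x * y, y * y]


def fit_quadratic_trend(coords, values):
    """Least-squares second-degree trend surface.

    Coordinates are centred first so the normal equations stay well
    conditioned, so the coefficients refer to centred coordinates.
    Returns (beta, centre, residuals).
    """
    n = len(coords)
    centre = (sum(x for x, _ in coords) / n, sum(y for _, y in coords) / n)
    rows = [quad_terms(x - centre[0], y - centre[1]) for x, y in coords]
    p = len(rows[0])
    # Normal equations (X^T X) beta = X^T z
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xtz = [sum(r[i] * z for r, z in zip(rows, values)) for i in range(p)]
    beta = solve_linear(xtx, xtz)
    residuals = [z - sum(b * t for b, t in zip(beta, r)) for r, z in zip(rows, values)]
    return beta, centre, residuals


def trend_at(beta, centre, x, y):
    """Evaluate the fitted trend at (x, y), e.g. to add it back after kriging."""
    return sum(b * t for b, t in zip(beta, quad_terms(x - centre[0], y - centre[1])))
```

Centring matters with raw longitude and latitude (values near −93 and 42). Squaring those makes the normal equations badly conditioned.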

Plan B: Don’t care.

• The goodness of fit only tails off at the largest distances

• Preliminary results show the other option (Plan A) to be extremely inaccurate because of the noise in the residual data

Second-degree trend removed, data centered

Exploratory Kriging

Meaningful Kriging

• Since we want to test the predictive power of this method, we should test it on our current data through cross-validation

• Key: remove one point at a time, and use the semivariogram and the remaining points to interpolate the value at the removed centroid

• Then add the trend back and compare with the original values

• Use universal kriging with second-degree trend
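The talk used universal kriging with a second-degree trend. The sketch below swaps in the simpler ordinary kriging (unknown constant mean) and applies it to values assumed to be already detrended. It combines the power variogram γ(t) = τ² + σ² t^λ with the remove-one-point loop described above. The power model is a valid variogram only for 0 < λ < 2, so the code rejects λ = 2. Helper names, the grid and all numbers are hypothetical.

```python
from math import hypot


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly so")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def power_variogram(h, tau2, sigma2, lam):
    """gamma(h) = tau2 + sigma2 * h**lam for h > 0, and gamma(0) = 0."""
    if not 0 < lam < 2:
        raise ValueError("the power model is only valid for 0 < lam < 2")
    return 0.0 if h == 0 else tau2 + sigma2 * h ** lam


def ordinary_kriging(coords, values, target, gamma):
    """Ordinary-kriging prediction at `target` using semivariogram `gamma`."""
    def dist(p, q):
        return hypot(p[0] - q[0], p[1] - q[1])

    n = len(coords)
    # Variogram-form kriging system, bordered by the constraint sum(w) = 1.
    a = [[gamma(dist(coords[i], coords[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [gamma(dist(c, target)) for c in coords] + [1.0]
    weights = solve_linear(a, b)[:n]  # last entry is the Lagrange multiplier
    return sum(w * z for w, z in zip(weights, values))


def leave_one_out(coords, values, gamma):
    """Predict each site from all the others (leave-one-out cross-validation)."""
    preds = []
    for k in range(len(coords)):
        rest_c = coords[:k] + coords[k + 1:]
        rest_z = values[:k] + values[k + 1:]
        preds.append(ordinary_kriging(rest_c, rest_z, coords[k], gamma))
    return preds


# Made-up detrended residuals on a 3 x 3 grid, with made-up variogram parameters.
def gamma(h):
    return power_variogram(h, tau2=0.01, sigma2=0.001, lam=1.5)

sites = [(float(i), float(j)) for i in range(3) for j in range(3)]
resid = [0.02, -0.01, 0.00, 0.03, 0.01, -0.02, 0.00, -0.03, 0.01]
loo = leave_one_out(sites, resid, gamma)
```

In the talk's procedure, each prediction in `loo` would then have the trend added back before comparison with the observed Republican share.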

1996 Redux – Predicted Values

• In total, Dole “receives” 9,726 more votes in the prediction than he actually did.

• Absolute error: 43,526

• Total 2-party votes: 1,112,902
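The slides do not spell out how predicted percentages become vote counts. The sketch below assumes one plausible reading: each county's predicted Republican share times its actual two-party total gives predicted Republican votes. The errors are then summed with sign (net) and without sign (absolute). The county figures are made up.

```python
def vote_errors(pred_rep_share, actual_rep_votes, two_party_totals):
    """Turn predicted Republican shares into vote counts and summarise the errors.

    Assumes predicted votes = predicted share * actual two-party total in each
    county. Returns (net, absolute): net > 0 means the prediction gives the
    Republican more votes than he received; absolute sums |error| over counties.
    """
    net = 0.0
    absolute = 0.0
    for share, actual, total in zip(pred_rep_share, actual_rep_votes, two_party_totals):
        error = share * total - actual  # signed error in Republican votes
        net += error
        absolute += abs(error)
    return net, absolute


# Three made-up counties (not Iowa data).
pred = [0.55, 0.48, 0.60]
actual_rep = [5_300, 9_900, 2_950]
totals = [10_000, 20_000, 5_000]
net, absolute = vote_errors(pred, actual_rep, totals)
```

Here the county errors are +200, −300 and +50. They largely cancel in the net figure (−50) but not in the absolute figure (550), which is why the slides report both kinds of number.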

Fitted variograms across elections

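• For all three years, the power model was the appropriate choice: $\gamma(t) = \tau^2 + \sigma^2 t^{\lambda}$ for $t > 0$

• 1996: $\sigma^2 = 9.24 \times 10^{-4}$, $\lambda = 1.98$, $\tau^2 = 0.031$

• 2000: $\sigma^2 = 9.93 \times 10^{-4}$, $\lambda = 2.00$, $\tau^2 = 0$

• 2004: $\sigma^2 = 1.16 \times 10^{-3}$, $\lambda = 2.00$, $\tau^2 = 0.025$

• All roughly identical, even with different overall averages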

2000 Prediction

• Prediction: Bush gets 26,000 more votes than he actually received

• Absolute error: 181,880

• Total Bush/Gore votes: 1,272,890

2004 Prediction

• Prediction: Bush gets 32,094 more votes than he actually received

• Absolute error: 74,458

• Total Bush/Kerry votes: 1,479,702

“Naïve Neighbour”

• For a baseline comparison, use the simplest (stupidest) lattice cross-validation test, “ask your neighbour”, with trivial SAR weights

• The predicted value for a county is simply the mean over its border-sharing neighbours (data: Republican percentage of the vote)
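The “ask your neighbour” predictor can be sketched in a few lines. Here "trivial SAR weights" is read as equal weights 1/k on each of a county's k border-sharing neighbours. A regular grid is a hypothetical stand-in for the real Iowa adjacency, which would come from the county boundaries.

```python
def rook_neighbours(n_rows, n_cols):
    """Border-sharing (rook) neighbours on a regular n_rows x n_cols grid.

    Cell (r, c) has index r * n_cols + c. A stand-in for real county adjacency.
    """
    nbrs = {}
    for r in range(n_rows):
        for c in range(n_cols):
            candidates = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            nbrs[r * n_cols + c] = [rr * n_cols + cc for rr, cc in candidates
                                    if 0 <= rr < n_rows and 0 <= cc < n_cols]
    return nbrs


def naive_neighbour(values, nbrs):
    """Predict each cell as the plain mean of its neighbours' values.

    A cell's own value is never used, so this is leave-one-out by construction.
    Equivalent to multiplying by a row-standardised binary adjacency matrix.
    """
    return [sum(values[j] for j in nbrs[i]) / len(nbrs[i]) for i in range(len(values))]


# Made-up Republican shares on a 3 x 3 grid of "counties".
shares = [0.60, 0.55, 0.50,
          0.58, 0.52, 0.47,
          0.57, 0.50, 0.44]
preds = naive_neighbour(shares, rook_neighbours(3, 3))
errors = [p - s for p, s in zip(preds, shares)]
```

Because the prediction never looks at the county itself, no refitting is needed for cross-validation. That makes this baseline much cheaper than the kriging loop.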

“NN” 1996

• Dole: 10,819 more votes predicted than received

• Total absolute error: 40,923

“NN” 2000

• Bush gets 28,535 more votes in the prediction

“NN” 2004

• Bush gets 37,175 more votes in the prediction

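Cross-validation summary

Year | Geostat net error | NN net error | Geostat absolute error | NN absolute error | Two-party votes
1996 | 9,726 | 10,819 | 43,526 | 40,923 | 1,112,902
2000 | 26,000 | 28,535 | 61,485 | 59,670 | 1,272,890
2004 | 32,094 | 37,175 | 74,458 | 76,926 | 1,479,702

(Net error: extra Republican votes in the prediction.)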
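To compare the two methods on a common scale, each absolute error in the cross-validation summary table can be divided by that year's two-party vote. The sketch below does only that arithmetic on the table's figures.

```python
# Figures copied from the cross-validation summary table.
table = {
    1996: {"geo_abs": 43_526, "nn_abs": 40_923, "votes": 1_112_902},
    2000: {"geo_abs": 61_485, "nn_abs": 59_670, "votes": 1_272_890},
    2004: {"geo_abs": 74_458, "nn_abs": 76_926, "votes": 1_479_702},
}


def relative_errors(rows):
    """Absolute error of each method as a fraction of the two-party vote."""
    return {year: (r["geo_abs"] / r["votes"], r["nn_abs"] / r["votes"])
            for year, r in rows.items()}


for year, (geo, nn) in relative_errors(table).items():
    print(f"{year}: kriging {geo:.2%}, naive neighbour {nn:.2%}")
```

This prints 3.91% vs 3.68% (1996), 4.83% vs 4.69% (2000) and 5.03% vs 5.20% (2004). The two methods stay within a few tenths of a percentage point of each other every year.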

Conclusions

• Data is definitely not stationary, even after removing trends

• Careful kriging is about as effective as the “naïve neighbour” method, with neither using covariates

• Prediction with these tools at this simple level is not yet accurate enough

• Each method overpredicts the Republican vote

• The fitted variogram parameters for each year are very close

Future Developments and Unanswered Questions – New!

• I’ve since introduced universal co-kriging with population, past voting behavior and second-degree spatial dependences using the gstat package.

• Still needed: data from the last four elections, conveniently packaged; other prediction methods using spatial techniques.
