Modeling Coverage Error in Address Lists Due to Geocoding Error: The Impact on Survey Operations and Sampling
Jizhou Fu and Lee Fiorio
AAPOR 2012, Orlando
Outline
• ABS Background
• Analysis Goals
• Data and Methodology
• Results
• Discussion
• Limitations
Address-Based Sampling (ABS) Background
• Address-based frames first need geographical boundaries
• Types of address-based frames
  • US Postal Service Delivery Sequence File (DSF)
    – Purchased through market research vendors
    – Updated frequently
    – Adequate replacement for field listing in urban and suburban areas
  • Dependent or Enhanced Listing
    – Provide DSF to listers for enhancement in the field
    – Reduces cost and increases accuracy of traditional listing
• Because of costs, DSF should be used where possible
• Enhanced listing should be used where DSF is inadequate
• Evaluating DSF coverage: DSF-to-Census Ratio
DSF Geography
• Geographic information on the DSF:
  • Address, city, county, state, zip, zip4, carrier route, walk sequence
• Geographic information not on the DSF:
  • Census block, census block group, census tract, latitude or longitude
• Geocoding
  • Appends latitude and longitude as well as census geography
  • Requires commercial software
  • PO Box and Rural Route addresses not easily geocoded
  • Potential for error
Geocoding Error
[Slides 6 to 11: figures not recoverable]
Research Questions
• What are the correlates of geocoding error?
  • Logistic Model
    – Urbanicity
    – Housing unit density
    – Vacancy rates
    – Drop delivery
    – Housing unit type (single family home, apartment)
    – Home ownership
    – Adjacent to water blocks
• Does geocoding error exhibit spatial clustering?
  • Moran’s I
  • Logistic Model
    – Autocovariate
Data and Methodology
• NORC National Frame Listing effort
  • Fall 2011
  • Out of 1,516 segments (census tracts or block groups), 126 segments needed enhancement
  • Device-based listing
    – Latitude and longitude collected
    – Segment-level address list
    – Real-time QC in central office
• Selected 21 enhanced segments for analysis
  • Geocapture worked for at least 90% of addresses
  • Mix of urban and rural
  • Range of DSF-to-Census ratios: 0.31 to 0.81
• 8,560 DSF lines
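Geocoding error: over-coverage vs under-coverage
[Diagram: the DSF and the final enhanced list overlap]
• Addresses added in the field (under-coverage): 4,859
• Confirmed DSF addresses (coverage): 7,504
• Unconfirmed DSF addresses (over-coverage): 1,056
• DSF = confirmed + unconfirmed DSF addresses; final enhanced list = confirmed DSF addresses + addresses added in the field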
• 12.3% of DSF lines unconfirmed in field
• Difficult to separate causes of under-coverage
• Focus on over-coverage
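The coverage figures can be checked with a few lines of arithmetic. This is a minimal sketch: the pairing of counts with labels is inferred from the 8,560 DSF lines and the 12.3% unconfirmed share reported on the slides, and the variable names are illustrative only.

```python
# Counts read off the coverage diagram. The label-to-count pairing is an
# inference: 7,504 + 1,056 = 8,560 DSF lines, and 1,056 / 8,560 is about 12.3%.
confirmed_dsf = 7504     # DSF addresses confirmed in the field (coverage)
unconfirmed_dsf = 1056   # DSF addresses not confirmed in the field (over-coverage)
added_in_field = 4859    # addresses added by listers (under-coverage)

dsf_lines = confirmed_dsf + unconfirmed_dsf
final_enhanced_list = confirmed_dsf + added_in_field

# Share of DSF lines that listers could not confirm inside the segment.
over_coverage_rate = unconfirmed_dsf / dsf_lines
# Share of the final enhanced list that the DSF did not supply.
added_share = added_in_field / final_enhanced_list

print(f"DSF lines: {dsf_lines}")
print(f"Final enhanced list: {final_enhanced_list}")
print(f"Unconfirmed share of DSF: {over_coverage_rate:.1%}")
print(f"Field-added share of final list: {added_share:.1%}")
```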
Data and Methodology (cont’d)
• Sample drawn of 4,000 DSF lines provided for enhancement
• Dependent variable: flag if correctly geocoded into the segment
• Independent variables:
  • Address-level (DSF)
    – Drop point flag
    – Vacant flag
    – Record type indicator (high rise, rural, single family home)
  • Block-level (census)
    – DSF-to-Census ratio, four categories (<0.9, 0.9 to 1.25, 1.25 to 2, >2)
    – TEA Code Flag
    – Type of Enumeration Area
    – Principal city flag
    – Water adjacency flag
    – Housing unit density
    – Area
    – Percent Multi-unit
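The block-level DSF-to-Census ratio and its four categories could be derived roughly as follows. This is a sketch under assumptions: the slides do not say which category the boundary values 0.9, 1.25 and 2 belong to, so the inclusive upper bounds here are a guess, and the function names are hypothetical.

```python
def dsf_to_census_ratio(dsf_count, census_hu_count):
    """DSF address count divided by the census housing-unit count for a block."""
    if census_hu_count <= 0:
        return None  # ratio is undefined for blocks with no census housing units
    return dsf_count / census_hu_count


def ratio_category(ratio):
    """Map a ratio to category 1-4: <0.9, 0.9 to 1.25, 1.25 to 2, >2.

    Assumption: upper bounds are inclusive (1.25 falls in category 2,
    2.0 falls in category 3). The source does not specify this.
    """
    if ratio < 0.9:
        return 1
    if ratio <= 1.25:
        return 2
    if ratio <= 2.0:
        return 3
    return 4


# Made-up blocks: (DSF addresses, census housing units).
example_blocks = {"A": (31, 100), "B": (100, 100), "C": (150, 100), "D": (300, 100)}
for name, (dsf, census) in example_blocks.items():
    r = dsf_to_census_ratio(dsf, census)
    print(name, round(r, 2), ratio_category(r))
```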
Table 1: Logistic Model Results

Parameter                          Estimate (sign)   Significance
Intercept                          -                 ***
DSF-to-Census <0.9                 +                 ***
DSF-to-Census 1.25 to 2.0          +                 **
DSF-to-Census >2.0                 +                 ***
TEA1                               -                 ***
In Principal City Flag             -                 ***
HU Density (mean centered)         -                 ***
Drop delivery                      +                 ***
Vacant Flag                        +                 *
Record Type High-rise              -
Record Type Rural                  +                 ***
Pct Multi-Unit (mean centered)     -                 *
Area (mean centered)               +                 ***

Row groups labelled on the slide: Ratio Categories, Urbanicity, Postal Characteristics, Geographical Considerations
Significance: * p<0.05, ** p<0.01, *** p<0.001
Table 2: A closer look at impact of DSF-to-Census Ratio

Category   Parameter                    Odds Ratio   Significance
1          DSF-to-Census <0.9           2.25         ***
3          DSF-to-Census 1.25 to 2.0    2.37         **
4          DSF-to-Census >2.0           4.29         ***
• Addresses in category 1 census blocks have about the same odds of being geocoded incorrectly as addresses in category 3 blocks (odds ratios 2.25 and 2.37, both relative to category 2, 0.9 to 1.25)
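To make the odds ratios in Table 2 concrete, the sketch below converts them into probabilities. It assumes the odds ratios are relative to the omitted category 2 (0.9 to 1.25), which the table layout suggests but does not state. The 10% baseline error rate is purely hypothetical and is not a figure from the study.

```python
def apply_odds_ratio(baseline_prob, odds_ratio):
    """Return the probability implied by scaling the baseline odds by odds_ratio."""
    baseline_odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# Odds ratios as reported in Table 2.
odds_ratios = {
    "DSF-to-Census <0.9": 2.25,
    "DSF-to-Census 1.25 to 2.0": 2.37,
    "DSF-to-Census >2.0": 4.29,
}

baseline = 0.10  # hypothetical error probability in the reference category

for label, ratio in odds_ratios.items():
    implied = apply_odds_ratio(baseline, ratio)
    print(f"{label}: {implied:.3f}")
```

With this hypothetical baseline, an odds ratio of 2.25 doubles the error probability from 0.10 to 0.20. The first two categories land close together, while the >2.0 category stands apart.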
Spatial Autocorrelation
• Does geocoding error exhibit spatial clustering?
  • Do blocks with geocoding error neighbor blocks with geocoding error?

$y = \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p Wy + \varepsilon$

• Where Wy is the weighted average of neighboring values, or ‘spatial lag’
Example Segment (block layout)
1 2
3 4 5

Example Weight Matrix W
    1 2 3 4 5
1   0 1 1 0 0
2   1 0 0 1 1
3   1 0 0 1 0
4   0 1 1 0 1
5   0 1 0 1 0
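The example weight matrix can be rebuilt from a list of neighbors for each block. In this sketch the neighbor lists are read back from the matrix itself. The slides do not state which adjacency rule (shared edge, shared corner, distance) produced them, so `neighbors` is an illustrative input rather than the authors' definition.

```python
# Neighbor lists for the five example blocks, read off the example matrix W.
neighbors = {
    1: [2, 3],
    2: [1, 4, 5],
    3: [1, 4],
    4: [2, 3, 5],
    5: [2, 4],
}

blocks = sorted(neighbors)
index = {block: i for i, block in enumerate(blocks)}

# Binary contiguity matrix: W[i][j] = 1 if block j is a neighbor of block i.
W = [[0] * len(blocks) for _ in blocks]
for block, block_neighbors in neighbors.items():
    for other in block_neighbors:
        W[index[block]][index[other]] = 1

# Adjacency is mutual, so the matrix should equal its transpose.
is_symmetric = all(
    W[i][j] == W[j][i] for i in range(len(W)) for j in range(len(W))
)

for row in W:
    print(" ".join(str(value) for value in row))
print("symmetric:", is_symmetric)
```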
Moran’s I – Measure of Spatial Autocorrelation
Example variable of interest y

Block   Error
1       1
2       1
3       0
4       0
5       1
Spatial Autocorrelation
Weight Matrix W * Geocoding Error y = Weighted average of neighbors Wy

     1 2 3 4 5        y        Wy
1    0 1 1 0 0        1        1
2    1 0 0 1 1        1        2
3    1 0 0 1 0   *    0   =    1
4    0 1 1 0 1        0        2
5    0 1 0 1 0        1        1
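The product shown above can be reproduced directly. A small sketch follows: with the binary weights of the example, each entry of Wy counts the neighboring blocks that have geocoding error. Dividing each row of W by its row sum (row standardization, added here as an illustration and not shown on the slides) turns that count into an average over neighbors.

```python
W = [
    [0, 1, 1, 0, 0],
    [1, 0, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [0, 1, 1, 0, 1],
    [0, 1, 0, 1, 0],
]
y = [1, 1, 0, 0, 1]  # geocoding error flag for blocks 1 to 5


def spatial_lag(weights, values):
    """Matrix-vector product W * y, one entry per block."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]


def row_standardize(weights):
    """Scale each row to sum to 1; rows with no neighbors are left as zeros."""
    standardized = []
    for row in weights:
        total = sum(row)
        standardized.append([w / total if total else 0.0 for w in row])
    return standardized


Wy = spatial_lag(W, y)                        # neighbor counts, as on the slide
Wy_avg = spatial_lag(row_standardize(W), y)   # share of neighbors with error

print(Wy)
print([round(v, 3) for v in Wy_avg])
```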
Moran’s I and Spatial Autocorrelation Model
• Degree of linear association between observed values y and a weighted average of neighboring values Wy
  • Observed: 0.0281
  • Very significant (p < 0.0001)
  • Positive, indicating possible spatial clustering
• Add Wy to final logistic model

$y = \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p Wy + \varepsilon$
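For reference, Moran's I can be computed on the five-block example as sketched below. The code uses the standard textbook definition, I = (n / S0) * sum_ij w_ij (y_i - ybar)(y_j - ybar) / sum_i (y_i - ybar)^2, where S0 is the sum of all weights. The slides do not show the exact weighting or significance test used for the reported 0.0281, so this should not be read as the authors' procedure.

```python
def morans_i(weights, values):
    """Moran's I for a square weight matrix and one value per unit."""
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # total weight
    # Cross-products of deviations for every pair of neighboring units.
    numerator = sum(
        weights[i][j] * deviations[i] * deviations[j]
        for i in range(n)
        for j in range(n)
    )
    denominator = sum(d * d for d in deviations)
    return (n / s0) * (numerator / denominator)


W = [
    [0, 1, 1, 0, 0],
    [1, 0, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [0, 1, 1, 0, 1],
    [0, 1, 0, 1, 0],
]
y = [1, 1, 0, 0, 1]  # toy geocoding error flags from the example segment

toy_i = morans_i(W, y)
print(round(toy_i, 4))
```

On this toy segment the statistic comes out slightly negative (about -0.028), which is unrelated to the 0.0281 observed on the real data. The example only illustrates the mechanics.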
Table 3: Logistic Regression with Spatial Autocovariate

Parameter                          Estimate (sign)   Significance
Intercept                          -                 ***
DSF-to-Census <0.9                 +                 **
DSF-to-Census 1.25 to 2.0          +                 **
DSF-to-Census >2.0                 +                 ***
TEA1                               -                 ***
In Principal City Flag             -                 ***
HU Density (mean centered)         -                 ***
Drop delivery                      +                 ***
Vacant Flag                        +                 *
Record Type High-rise              -
Record Type Rural                  +                 ***
Pct Multi-Unit (mean centered)     -                 *
Area (mean centered)               +                 ***
Autocovariate (W.y)                +                 *

Row groups labelled on the slide: Ratio Categories, Urbanicity, Postal Characteristics, Geographical Considerations
Map 1: Example of Clustering [map not recoverable]
Map 2: Example of Clustering [map not recoverable]
Discussion
• Urbanicity, postal characteristics, and block-level DSF-to-Census ratio are highly correlated with geocoding error
• Addresses in low DSF-to-Census ratio blocks have similar odds of geocoding error as addresses in high DSF-to-Census ratio blocks
• Geocoding error exhibits spatial clustering
  • Problematic blocks within a segment can be used as a potential flag for larger geocoding error
• Help with address frame decisions
Limitations
• Analysis was limited to segments that already have less than acceptable DSF coverage
  • Possible that census characteristics and DSF flags behave differently above threshold
• Sample of 21 segments used in analysis not random
  • Limits the ability to generalize findings
• Definition of geocoding error limited to over-coverage error
Thank You!
Lee Fiorio