Dual System Estimation and Census Adjustment
Stephen E. Fienberg
Statistics 36-149
Department of Statistics
Carnegie Mellon University
November 27-29, 2001
What Do the Following Populations Have in Common?
fish*
penguins
homeless
prostitutes in Glasgow
Italians with diabetes*
people in the U.S.**
people with HIV virus
adolescent injuries in Pittsburgh, PA
WWW
Example 1: Diabetes Prevalence
Bruno et al. (1994) used 4 sources for ascertainment of diabetes in Casale Monferrato, Northern Italy:
s1: diabetes clinic and/or family physicians
s2: patients discharged with diagnosis from hospitals
s3: insulin or oral hypoglycaemic prescriptions
s4: requests for reimbursement for insulin and reagent strips
Example 1: Diabetes (cont.)
              s1:   Yes    Yes    No     No
              s2:   Yes    No     Yes    No
s3     s4
Yes    Yes          58     46     14     8
Yes    No           157    650    20     182
No     Yes          18     12     7      10
No     No           104    709    74     -
n = 2069
Example 2: Fish in a Lake
200 fish caught 1st time
150 fish caught 2nd time
Of 150 fish in 2nd sample, 125 were among 200 counted in 1st sample
Total number of fish caught = 200 + (150 - 125) = 225
But how many fish have gone undetected?
Example 2: Fish in a Lake
Proportion of fish in 2nd sample also in 1st = 125/150 = 5/6
Generalize from sample to population:
$(5/6)\,\hat{N} = 200$
$\hat{N} = (6/5) \times 200 = 240$
This is the method of capture-recapture due to Petersen, Lincoln, Schnabel, etc.
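The fish calculation can be sketched in a few lines of Python. The function name `lincoln_petersen` and its argument names are illustrative choices, not from the slides; the sketch assumes the two catches are independent and that at least one fish is recaptured.

```python
from fractions import Fraction


def lincoln_petersen(n1, n2, a):
    """Estimate population size from two samples.

    n1: number caught in the first sample
    n2: number caught in the second sample
    a:  number caught in both samples
    """
    if a <= 0:
        raise ValueError("need at least one recapture to form the estimate")
    # Exact rational arithmetic keeps small worked examples free of rounding.
    return Fraction(n1 * n2, a)


n_hat = lincoln_petersen(200, 150, 125)
observed = 200 + (150 - 125)  # distinct fish actually seen in either catch
print(n_hat, observed, n_hat - observed)
```

With the numbers on the slide this gives an estimate of 240 fish, of which 225 were seen, leaving an estimated 15 undetected.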
Capture-Recapture Model
                       Sample 2
                  In      Out       Total
Sample 1  In      a       b         n1
          Out     c       d = ?     N - n1
          Total   n2      N - n2    N = ?
$\hat{N} = n_1 n_2 / a$
Role of Independence
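Some Formal Details
Alternatively, we think in terms of the ratio of odds for row 1 vs. odds for row 2:
$$\frac{P\{A \text{ and } B\} / P\{A \text{ and } B^c\}}{P\{A^c \text{ and } B\} / P\{A^c \text{ and } B^c\}} = \frac{P\{A \text{ and } B\}\, P\{A^c \text{ and } B^c\}}{P\{A^c \text{ and } B\}\, P\{A \text{ and } B^c\}}$$
and under independence this equals 1.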
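As a quick numerical check of the odds-ratio statement, the sketch below builds the four joint probabilities from two independent events and evaluates the cross-product ratio. The function names and the probabilities 5/6 and 5/8 are arbitrary illustrative choices, not values from the slides.

```python
from fractions import Fraction


def odds_ratio(p_ab, p_abc, p_acb, p_acbc):
    """Cross-product ratio for a 2x2 table of joint probabilities."""
    return (p_ab * p_acbc) / (p_acb * p_abc)


def independent_cells(p_a, p_b):
    """Joint probabilities when A and B are independent."""
    return (
        p_a * p_b,              # A and B
        p_a * (1 - p_b),        # A and Bc
        (1 - p_a) * p_b,        # Ac and B
        (1 - p_a) * (1 - p_b),  # Ac and Bc
    )


cells = independent_cells(Fraction(5, 6), Fraction(5, 8))
print(odds_ratio(*cells))
```

Any other pair of marginal probabilities strictly between 0 and 1 gives the same ratio of 1, because the marginals cancel in the cross-product.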
Some Formal Details
Back to data.
We think of independence in terms of equality of odds, and we set
$ad/bc = 1$
and estimate unobserved $d$ by
$\hat{d} = bc/a$
$\hat{N} = a + b + c + bc/a = (a + b)(a + c)/a = n_1 n_2 / a$
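The two forms of the estimate can be checked against each other numerically. This is a minimal sketch using the fish counts (125 caught both times, 75 and 25 caught only once); the function names are illustrative, not from the slides.

```python
from fractions import Fraction


def estimate_missing_cell(a, b, c):
    """Estimate the unobserved cell d by setting ad/bc = 1."""
    return Fraction(b * c, a)


def estimate_total(a, b, c):
    """Observed cells plus the estimated missing cell."""
    return a + b + c + estimate_missing_cell(a, b, c)


a, b, c = 125, 75, 25
d_hat = estimate_missing_cell(a, b, c)
n_hat = estimate_total(a, b, c)

# The margin form n1 * n2 / a gives the same number.
margin_form = Fraction((a + b) * (a + c), a)
print(d_hat, n_hat, margin_form)
```

Here the estimated missing cell is 15 and both forms give 240.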
More Formal Version
$\hat{N} = n_1 n_2 / a$
                       Sample 2
                  In          Out       Total
Sample 1  In      a = 125     b = 75    n1 = 200
          Out     c = 25      d = ?     N - n1
          Total   n2 = 150    N - n2    N
$\hat{N} = 150 \times 200 / 125 = 240$
Example 1: Diabetes
Looking at Pairs of Lists
Estimated s.e.s are on the order of 100. Only 3 of 6 estimates exceed n = 2069.
Pair      $\hat{N}$
s1, s2    2,351
s1, s3    2,185
s1, s4    2,262
s2, s3    2,057
s2, s4    803
s3, s4    1,555
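The pairwise figures can be approximated by collapsing the four-list diabetes cross-classification over the other two lists and applying the simple two-sample estimate to each pair. This is a sketch, not necessarily the calculation behind the slide: the simple estimate lands within a few units of each listed value but does not match them exactly, so the slide may have used a slightly different estimator or rounding. The dictionary layout and function name are illustrative.

```python
from itertools import combinations

# Counts keyed by (s1, s2, s3, s4) membership: 1 = on the list, 0 = not.
# Transcribed from the diabetes cross-classification; the all-zero cell
# (on no list) is unobserved and therefore absent.
counts = {
    (1, 1, 1, 1): 58,  (1, 0, 1, 1): 46,  (0, 1, 1, 1): 14, (0, 0, 1, 1): 8,
    (1, 1, 1, 0): 157, (1, 0, 1, 0): 650, (0, 1, 1, 0): 20, (0, 0, 1, 0): 182,
    (1, 1, 0, 1): 18,  (1, 0, 0, 1): 12,  (0, 1, 0, 1): 7,  (0, 0, 0, 1): 10,
    (1, 1, 0, 0): 104, (1, 0, 0, 0): 709, (0, 1, 0, 0): 74,
}


def pair_estimate(i, j):
    """Two-list estimate n_i * n_j / (number on both lists i and j)."""
    n_i = sum(n for cell, n in counts.items() if cell[i])
    n_j = sum(n for cell, n in counts.items() if cell[j])
    both = sum(n for cell, n in counts.items() if cell[i] and cell[j])
    return n_i * n_j / both


estimates = {
    (i + 1, j + 1): pair_estimate(i, j)
    for i, j in combinations(range(4), 2)
}
for (i, j), n_hat in estimates.items():
    print(f"s{i}, s{j}: {n_hat:.1f}")
```

If the lists were independent, the six estimates would agree up to sampling error; the spread from roughly 800 to roughly 2,350 is what signals trouble.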
Diabetes Example: What Is Going Wrong?
Independence of lists in the pairs!
Capture-Recapture Assumptions
Random samples
Independence
Closed population
Perfect matching (no tag loss)
Homogeneity
How do we check on assumptions?
The problem of the wily trout.
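One way to see why the wily trout matters is to plug expected counts into the estimate when fish differ in how easily they are caught. The sketch below is an illustration under stated assumptions, not a calculation from the slides: the group sizes and catch probabilities are hypothetical, each fish has the same catch probability in both samples, captures are independent within a group, and expected counts stand in for random ones.

```python
from fractions import Fraction


def expected_estimate(groups):
    """Plug expected counts into n1 * n2 / a.

    groups: list of (group size, catch probability) pairs.
    """
    n1 = sum(size * p for size, p in groups)
    n2 = n1  # same catch probabilities in the second sample
    both = sum(size * p * p for size, p in groups)
    return n1 * n2 / both


# 240 fish, all equally catchable.
homogeneous = [(240, Fraction(1, 2))]
# 240 fish, half easy to catch and half wily.
mixed = [(120, Fraction(3, 4)), (120, Fraction(1, 4))]

print(expected_estimate(homogeneous), expected_estimate(mixed))
```

The homogeneous lake returns 240, while the mixed lake returns 192: the easy fish dominate the recaptures, so the estimate falls short of the true 240.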
Accuracy and Coverage Evaluation (ACE) Survey
Survey of approximately 314,000 households in 11,000 blocks.
Used to correct raw census counts using capture-recapture or dual systems estimation (DSE) methodology.
Corrects for omissions AND erroneous enumerations (EEs).
ACE Design
Two parts to ACE sample of blocks:
  sample of population -- P-sample
    used to estimate omissions
    matched records against those for census
  sample of census -- E-sample
    used to estimate erroneous enumerations
    subtract out EEs from census counts before using DSE
Dual Systems Components
                        Census
                In                    Out                Total
Sample  In      Matches               ACE Non-Matches    ACE Total
        Out     Census Non-Matches    Missed in Both
        Total   Census Total                             Population Total
                        Census
                In                    Out                Total
Sample  In      Matches               PES Non-Matches    Total PES
        Out     Census Non-Matches    Missed In Both
        Total   Total Census                             Total Population
DSE With Same Values As Fish
$n_{CEN}$ = census count - EEs
$\hat{N} = n_{CEN}\, n_{ACE} / a$
                        Census
                In             Out         Total
Sample  In      a = 125        b = 75      nACE = 200
        Out     c = 25         d = ?       N - nACE
        Total   nCEN = 150     N - nCEN    N
$\hat{N} = 150 \times 200 / 125 = 240$
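The census version of the calculation differs from the fish version only in that erroneous enumerations are removed from the census count first. A minimal sketch follows; the function name is illustrative, and the raw count of 160 with 10 EEs is a hypothetical split chosen so that nCEN comes out to the 150 used on the slide.

```python
from fractions import Fraction


def dual_system_estimate(census_count, erroneous, n_ace, matches):
    """DSE with erroneous enumerations (EEs) removed from the census count."""
    n_cen = census_count - erroneous
    if matches <= 0:
        raise ValueError("need at least one match between census and sample")
    return Fraction(n_cen * n_ace, matches)


# Hypothetical raw census count of 160 with 10 EEs, so nCEN = 150;
# nACE = 200 and a = 125 as on the slide.
n_hat = dual_system_estimate(160, 10, 200, 125)
print(n_hat)
```

Skipping the subtraction would inflate the estimate, since EEs can never be matched to a person in the sample.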
DSE Features in 2000
Excluded homeless/shelters and group quarters from calculations in 2000
Adjusted sample counts for movers
Searching in adjacent blocks
Dual Systems Assumptions
Perfect matching
  idea of probabilistic matching with variable probabilities for different individuals
Homogeneity
Dependence between sample and census
  heterogeneity and dependence get combined in what is called correlation bias
Errorless assessment of erroneous enumerations
ACE Implementation
Aggregate counts from census blocks for various demographic and racial/ethnic groups.
Apply DSE for these aggregates (called post-strata).
Generalizing from adjustments for the ACE sample of blocks and strata to the nation.
  synthetic error
Post-strata
Instead of doing DSE at the block level, we reorganize the data by grouping parts of blocks according to
  age
  race/ethnicity
  sex
  occupancy status
  mail return rate
Results in over 480 post-strata, and we apply DSE in each.
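Applying DSE within each post-stratum and then adding up might look like the sketch below. Everything here is hypothetical: the two post-stratum labels and their counts are invented for illustration, and the real computation involved over 480 post-strata along with adjustments not shown.

```python
from fractions import Fraction

# Hypothetical post-strata:
# (census count after removing EEs, ACE sample count, matches).
post_strata = {
    "ps1": (150, 200, 125),
    "ps2": (300, 120, 100),
}


def dse(n_cen, n_ace, matches):
    """Dual systems estimate for one post-stratum."""
    return Fraction(n_cen * n_ace, matches)


estimates = {name: dse(*cells) for name, cells in post_strata.items()}
total = sum(estimates.values())

for name, n_hat in estimates.items():
    print(name, n_hat)
print("total", total)
```

Estimating within groups that share similar capture probabilities is meant to make the homogeneity assumption more plausible than it would be for the population as a whole.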
What Do We Know About Dual Systems Assumptions at Post-strata Level?
Synthetic Assumption
Carrying the adjustments back to the individual blocks not in the ACE sample:
Assumes the homogeneity of all of those parts of blocks in each post-stratum.
Result is that some blocks increase and some blocks decrease in estimated population size
  decreases total 1 million
  increases total 4.3 million
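A sketch of how a synthetic adjustment could move block totals in both directions is given below. It assumes, as an illustration rather than as the documented procedure, that each post-stratum gets one adjustment factor (DSE estimate divided by census count) and that every part of every block in that post-stratum is scaled by it. The factors, block names, and counts are all hypothetical.

```python
from fractions import Fraction

# Hypothetical adjustment factors per post-stratum
# (assumed here to be DSE estimate / census count).
factors = {"ps1": Fraction(49, 50), "ps2": Fraction(21, 20)}

# Hypothetical census counts for two blocks, split by post-stratum.
blocks = {
    "block A": {"ps1": 100, "ps2": 0},
    "block B": {"ps1": 20, "ps2": 80},
}


def synthetic_estimate(block_counts):
    """Scale each part of a block by its post-stratum factor and add up."""
    return sum(factors[ps] * n for ps, n in block_counts.items())


for name, block_counts in blocks.items():
    census = sum(block_counts.values())
    adjusted = synthetic_estimate(block_counts)
    print(name, census, float(adjusted))
```

Both blocks start at a census count of 100; block A drops to 98 and block B rises to 103.6, purely because of their post-stratum mix. Whether that is right for a given block depends on the homogeneity assumption stated above.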
March 2001 Adjustment Decision
Not ready to adjust using DSE.
Concerns:
  DA (demographic analysis)
  loss functions
  counties under 100,000
  balancing error
  synthetic error
Oct. 2001 Adjustment Decision
Still not ready to adjust!
Old concerns:
  DA
  loss functions?
  balancing error - no
  synthetic error - no
New concern:
  missed EEs in ACE