Systematic and Complex Sampling
Professor Ron Fricker, Naval Postgraduate School
Monterey, California
2/1/13
Reading Assignment: Scheaffer, Mendenhall, Ott, & Gerow, Sections 7.1-7.4
Goals for this Lecture
• Define systematic sampling
  – Examples
  – Estimators (assuming SRS equivalence)
• Discuss examples of complex sampling designs
• Explain the Kish grid
• Introduce variance estimation under complex designs
What is Systematic Sampling?
• Systematic sampling: given a list of items, select every kth element in the list, where k is the sampling interval (typically k ≈ N/n)
  – Start by randomly selecting the first item from among the first k elements
• Basis for how random searches of cars coming onto a base are done
  – Often useful for things like sampling visitors to a web site
  – Recently wrote a sampling methodology for INSURV based on systematic sampling
    • See http://faculty.nps.edu/rdfricke/docs/NPS-OR-12-001.pdf
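A minimal Python sketch of the selection rule, assuming the frame is an ordinary Python list and that the caller supplies the interval k (the function name `systematic_sample` is hypothetical). When N/k is not an integer, the sample size differs by one depending on the random start.

```python
import random

def systematic_sample(frame, k, rng=None):
    """Return every kth element of `frame`, starting at a position
    chosen uniformly at random from the first k elements."""
    if k < 1:
        raise ValueError("sampling interval k must be >= 1")
    rng = rng or random.Random()
    start = rng.randrange(k)      # random start: 0-based index in 0..k-1
    return frame[start::k]        # items start, start+k, start+2k, ...

# Usage: a 1-in-10 sample from a hypothetical frame of N = 100 numbered items
frame = list(range(1, 101))
sample = systematic_sample(frame, k=10, rng=random.Random(42))
```

Slicing with a step of k does all the work once the random start is drawn, which is exactly why the method is easy to carry out in the field.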
Advantages and Disadvantages of Systematic Sampling
• Advantages:
  – Can be easier to perform in the field
  – Less subject to selection errors by fieldworkers
  – Can provide more information per unit cost than SRS
• Potential disadvantages:
  – If the list varies in a cycle of approximately every kth item, systematic sampling can introduce bias into the result
  – May be harder to estimate the variance in some situations
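To make the cyclic-list risk concrete, the sketch below builds a hypothetical population whose values repeat with period exactly k and enumerates all k possible systematic samples. Each sample lands on the same position in every cycle, so every possible sample mean is far from the population mean, even though the average over all k starts is still correct.

```python
# Hypothetical population whose values cycle with period k = 4
k = 4
pattern = [10, 20, 30, 40]
population = pattern * 25              # N = 100

pop_mean = sum(population) / len(population)

# Enumerate every possible 1-in-k systematic sample (one per random start)
sample_means = []
for start in range(k):
    sample = population[start::k]      # always the same position in the cycle
    sample_means.append(sum(sample) / len(sample))

print(pop_mean)        # 25.0
print(sample_means)    # [10.0, 20.0, 30.0, 40.0]
```

Each sample here contains 25 identical values, so its sample variance s² is zero: the SRS-based variance formula would report no uncertainty even though the estimate is off by 5 to 15. This is also why the variance can be hard to estimate from a single systematic sample.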
When To Use Systematic Sampling
• If simple random sampling is too complicated to implement in the field
  – E.g., it is unreasonable to expect INSURV inspectors either to generate a random list of items to inspect or to run around the ship/submarine to inspect a random set of items
• When generating a sampling frame list is impossible or too hard
  – Can be more effective and efficient to simply survey every kth item encountered
    • E.g., every kth visitor to a web site
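When there is no frame list, the same rule can be applied to items as they arrive. A minimal sketch, assuming the arrivals can be read as a Python iterable (e.g., web-site visits) whose total length is unknown in advance; the generator name `every_kth` is hypothetical.

```python
import random
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

def every_kth(stream: Iterable[T], k: int, rng: random.Random) -> Iterator[T]:
    """Yield every kth item from a stream of unknown length.

    The first selected item is chosen at random among the first k items;
    after that, every kth item is selected. No frame list is needed.
    """
    start = rng.randrange(k)   # 0-based position of the first selection
    for i, item in enumerate(stream):
        if i >= start and (i - start) % k == 0:
            yield item

# Usage: select every 10th of 50 hypothetical visitors as they arrive
visitors = (f"visitor-{i}" for i in range(1, 51))
selected = list(every_kth(visitors, 10, random.Random(1)))
```

The generator only needs a counter and the random start, so it can run inside a live process without storing the visitor stream.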
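Mean Estimation Summary (Assuming SRS Equivalency)
• Estimator for the mean:
  $$\bar{y}_{sy} = \frac{1}{n}\sum_{i=1}^{n} y_i$$
• Variance of $\bar{y}_{sy}$:
  $$\mathrm{Var}(\bar{y}_{sy}) = \left(1 - \frac{n}{N}\right)\frac{s^2}{n}, \qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(y_i - \bar{y}_{sy}\right)^2$$
• Bound on the error of estimation (margin of error):
  $$2\sqrt{\mathrm{Var}(\bar{y}_{sy})} = 2\sqrt{\left(1 - \frac{n}{N}\right)\frac{s^2}{n}}$$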
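The three formulas can be sketched in Python as below, treating the systematic sample as if it were an SRS (the slide's assumption). The function name and the data are hypothetical.

```python
import math
from statistics import mean, variance

def systematic_mean_estimate(sample, N):
    """Mean, estimated variance, and 2-SE bound, assuming SRS equivalency."""
    n = len(sample)
    ybar = mean(sample)
    s2 = variance(sample)                 # sample variance, divisor n - 1
    var_hat = (1 - n / N) * s2 / n        # includes the finite population correction
    bound = 2 * math.sqrt(var_hat)        # margin of error
    return ybar, var_hat, bound

# Hypothetical 1-in-10 systematic sample from a list of N = 50 items
ybar, var_hat, bound = systematic_mean_estimate([12, 15, 9, 14, 10], N=50)
```

Here the sample mean is 12, s² = 26/4 = 6.5, and the estimated variance is 0.9 × 6.5 / 5 = 1.17, giving a bound of about 2.16.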
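Estimating Totals (Assuming SRS Equivalency)
• Estimator for the total:
  $$\hat{\tau} = N\bar{y}_{sy} = \frac{N}{n}\sum_{i=1}^{n} y_i$$
• Variance of $\hat{\tau}$:
  $$\mathrm{Var}(\hat{\tau}) = \mathrm{Var}(N\bar{y}_{sy}) = N^2\left(1 - \frac{n}{N}\right)\frac{s^2}{n}$$
• Bound on the error of estimation (margin of error):
  $$2\sqrt{\mathrm{Var}(\hat{\tau})} = 2N\sqrt{\left(1 - \frac{n}{N}\right)\frac{s^2}{n}}$$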
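The total is the mean scaled by N, so its variance scales by N². A minimal sketch under the same SRS-equivalency assumption, with a hypothetical function name and made-up data:

```python
import math
from statistics import mean, variance

def systematic_total_estimate(sample, N):
    """Estimated total tau_hat = N * ybar_sy, its estimated variance,
    and the 2-SE bound, assuming SRS equivalency."""
    n = len(sample)
    tau_hat = N * mean(sample)
    var_hat = N**2 * (1 - n / N) * variance(sample) / n
    bound = 2 * math.sqrt(var_hat)        # equals 2N * sqrt((1 - n/N) * s^2 / n)
    return tau_hat, var_hat, bound

# Hypothetical data: every 20th record in a file of N = 200 records (n = 10)
tau_hat, var_hat, bound = systematic_total_estimate([3, 5, 4, 6, 2, 4, 5, 3, 4, 4], N=200)
```

With a sample mean of 4, the estimated total is 800; s² = 12/9, so the bound is 2 × 200 × √(0.95 × (12/9) / 10), roughly 142.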
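Estimating Proportions (Assuming SRS Equivalency)
• Estimator for the proportion (with $y_i = 1$ if the ith sampled item has the attribute and 0 otherwise):
  $$\hat{p} = \bar{y}_{sy} = \frac{1}{n}\sum_{i=1}^{n} y_i$$
• Variance of $\hat{p}$:
  $$\mathrm{Var}(\hat{p}) = \left(1 - \frac{n}{N}\right)\frac{\hat{p}(1 - \hat{p})}{n}$$
• Bound on the error of estimation (margin of error):
  $$2\sqrt{\mathrm{Var}(\hat{p})} = 2\sqrt{\left(1 - \frac{n}{N}\right)\frac{\hat{p}(1 - \hat{p})}{n}}$$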
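For 0/1 data the same pattern applies. The sketch below follows the slide's divisor n in the variance (some references use n − 1 instead); the function name and the inspection data are hypothetical.

```python
import math

def systematic_proportion_estimate(sample, N):
    """p_hat, estimated variance (1 - n/N) * p_hat * (1 - p_hat) / n,
    and the 2-SE bound. `sample` holds 0/1 responses."""
    n = len(sample)
    p_hat = sum(sample) / n
    var_hat = (1 - n / N) * p_hat * (1 - p_hat) / n
    return p_hat, var_hat, 2 * math.sqrt(var_hat)

# Hypothetical: every 25th of N = 1,000 items inspected (n = 40); 1 = defect found
sample = [1] * 6 + [0] * 34
p_hat, var_hat, bound = systematic_proportion_estimate(sample, N=1000)
```

Six defects in 40 inspections gives p̂ = 0.15 and a bound of about 0.11.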
Complex Sampling for Real-World Surveying
• Usually, real-world requirements and constraints result in complex sampling
  – Some combination of stratification and clustering, along with unequal sampling probabilities
• For example, geographic clustering arises with face-to-face, interviewer-based surveys
  – Often it's multi-stage clustering as well
• Stratification is often also necessary to ensure the desired representation in the sample
• When these are combined, estimation gets much more complicated
NAEP Sampling Scheme
• First stage: 96 PSUs consisting of metropolitan statistical areas (MSAs), a single non-MSA county, or a group of contiguous non-MSA counties
  – About a third of the PSUs are sampled with certainty
  – The remainder are stratified, and one PSU is selected from each stratum with probability proportional to size
• Second stage: selection of public and nonpublic schools within the PSUs
  – For the elementary, middle, and secondary samples, independent samples of schools are selected with probability proportional to measures of size
• Third and final stage: 25 to 30 eligible students per school are sampled systematically, with probabilities designed to make the overall selection probabilities approximately constant
  – Except that students from private schools and from schools with high proportions of black or Hispanic students are oversampled
• In 1996, nearly 150,000 students were tested from just over 2,000 participating schools
Source: Grading the Nation's Report Card: Evaluating NAEP and Transforming the Assessment of Educational Progress, Board on Testing and Assessment (BOTA), National Academy of Sciences, 1999.
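Why do the third-stage probabilities make the overall probabilities roughly constant? A unit's overall selection probability is the product of its conditional probabilities at each stage, and its base weight is the reciprocal. The sketch below uses made-up enrollments (not NAEP figures), ignores the first stage and the oversampling, and assumes each school's measure of size equals its enrollment.

```python
# Hypothetical frame: enrollment (measure of size) of schools within one PSU
enrollments = {"A": 400, "B": 800, "C": 1200, "D": 1600}
total_size = sum(enrollments.values())   # 4000

m_schools = 2    # schools drawn with probability proportional to size (PPS)
b_students = 25  # students drawn systematically within each selected school

def overall_student_probability(school):
    """P(student selected) = P(school selected) * P(student selected | school)."""
    size = enrollments[school]
    p_school = m_schools * size / total_size   # PPS inclusion probability; assumed <= 1
    p_within = b_students / size               # fixed take of b students per school
    return p_school * p_within

for school in enrollments:
    p = overall_student_probability(school)
    print(f"school {school}: P(student) = {p:.4f}, base weight = {1 / p:.1f}")
```

School size cancels, so every student has overall probability 2 × 25 / 4000 = 0.0125 (base weight 80), a self-weighting design. Oversampling a group deliberately breaks this equality, and students in oversampled groups then carry smaller weights.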
National Survey of a Third World Country
• First step: Stratify the sample by state/province, proportional to population
  – Oversample any state with fewer than 100 or 200 interviews to allow for state-to-state comparisons
• Second step: Within each state/province, stratify by urban and rural
  – Urban/rural stratification is used to make sure that all localities are represented
  – As a general rule, locations with 10,000 or more inhabitants are classified as urban; all others are classified as rural
• Third step: Select PSUs within each state/province and urban/rural stratum
• Fourth step: Select a starting point within each PSU for each interviewer
  – Starting points are defined as locations with sufficient public presence to be known by local residents, such as schools, markets, etc.
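The first step can be sketched as proportional allocation followed by a per-state minimum. The state names, populations, total sample size, and the floor of 100 are hypothetical (the slide says only "100 or 200"), and raising small states to the floor here adds interviews rather than taking them from other states, which is one of several reasonable choices.

```python
def allocate(populations, n_total, floor):
    """Proportional allocation with a minimum number of interviews per state."""
    total_pop = sum(populations.values())
    alloc = {}
    for state, pop in populations.items():
        proportional = round(n_total * pop / total_pop)
        alloc[state] = max(proportional, floor)   # oversample small states up to the floor
    return alloc

# Hypothetical populations for four states
populations = {"North": 12_000_000, "Central": 6_000_000,
               "Coastal": 1_500_000, "Highlands": 500_000}
alloc = allocate(populations, n_total=2000, floor=100)
```

Highlands would get 50 interviews under strict proportionality but receives 100, so its interviews must be down-weighted by a factor of one half when combining states into national estimates.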
The World Health Survey Illustration
Source: World Health Organization. The World Health Survey (WHS): Sampling Guidelines for Participating Countries. Accessed online at http://www.who.int/entity/healthinfo/survey/whssamplingguidelines.pdf.
House Selection Via Systematic Sampling!
Selection of Household in Multi-dwelling Structure!
Respondent Selection in Each House
• To select the person to interview within a household:
  – List all adult males and females aged 18 years and above in the household on a Kish grid
    • A Kish grid is essentially a table of randomly generated numbers
    • It's a pre-assigned table of random numbers used to find the person to be interviewed
    • An alternative is the next-birthday method
  – One respondent is selected using the grid
  – Once the respondent is selected, the interview is conducted with only that respondent
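A minimal sketch of grid-based respondent selection, assuming a simplified grid: for each questionnaire, a row is pre-assigned before fieldwork that gives, for every household size h, a line number drawn uniformly from 1..h. This illustrates the idea and does not reproduce Kish's published tables. The listing order (males by decreasing age, then females by decreasing age) follows Kish's convention; the names `Adult`, `make_grid_row`, and `select_respondent` are hypothetical.

```python
import random
from dataclasses import dataclass

@dataclass
class Adult:
    name: str
    sex: str   # "M" or "F"
    age: int

def make_grid_row(max_size, rng):
    """Pre-assign, before fieldwork, the line number to select for each household size."""
    return {h: rng.randint(1, h) for h in range(1, max_size + 1)}

def select_respondent(adults, grid_row):
    """List adults aged 18+ (males by decreasing age, then females by decreasing age)
    and return the person on the line the grid row gives for this household size."""
    eligible = [a for a in adults if a.age >= 18]
    if not eligible:
        raise ValueError("no eligible adults in household")
    males = sorted((a for a in eligible if a.sex == "M"), key=lambda a: a.age, reverse=True)
    females = sorted((a for a in eligible if a.sex == "F"), key=lambda a: a.age, reverse=True)
    listing = males + females
    h = min(len(listing), max(grid_row))   # households larger than the grid use its last column
    return listing[grid_row[h] - 1]

# Usage with a hypothetical household (Dee is under 18 and is not listed)
row = make_grid_row(6, random.Random(7))
household = [Adult("Ana", "F", 44), Adult("Ben", "M", 47),
             Adult("Carl", "M", 19), Adult("Dee", "F", 15)]
respondent = select_respondent(household, row)
```

Because the row is fixed in advance, the interviewer has no discretion over who is chosen. Within a household of h eligible adults, each has probability 1/h of selection, so respondents from larger households need correspondingly larger weights.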
Kish Grid (aka Kish Tables) Example
[Figure: example Kish grid, annotated "Sequentially work down the list", with a panel showing Overall Selection Probabilities]
Source: Kish, L. (1949). A Procedure for Objective Respondent Selection Within the Household, Journal of the American Statistical Association, 380-387.
Variance Estimation for Complex Designs
• Complex sampling methods require nonstandard methods to estimate variances
  – I.e., you can't just plug the data into statistical software and use its default standard errors
  – (Very rare) exception: SRS with a large population and low nonresponse
• Software for (some) complex survey designs:
  – Free: CENVAR, VPLX, CPLX, Epi Info
  – Commercial: SAS, Stata, SUDAAN, WesVar
• Two common estimation approaches: Taylor series expansion and replication methods such as the jackknife
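Variance Estimation (Taylor Series)
• Taylor series approximation: converts ratios into sums
• Example: the variance of the weighted mean
  $$\bar{y}_w = \frac{\sum_{i=1}^{n} w_i y_i}{\sum_{i=1}^{n} w_i}$$
  assuming an SRS, can be expressed as
  $$\mathrm{Var}(\bar{y}_w) \approx \frac{\mathrm{Var}\left(\sum_i w_i y_i\right) + \bar{y}_w^2\,\mathrm{Var}\left(\sum_i w_i\right) - 2\,\bar{y}_w\,\mathrm{Cov}\left(\sum_i w_i y_i, \sum_i w_i\right)}{\left(\sum_i w_i\right)^2}$$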
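The formula can be sketched in Python as below. The sketch assumes the n units are independent draws (with-replacement SRS, no finite population correction), so each variance or covariance of a sum is estimated by n times the corresponding sample variance or covariance. The function names are hypothetical.

```python
def _sample_cov(x, z):
    """Sample covariance with divisor n - 1."""
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    return sum((a - mx) * (b - mz) for a, b in zip(x, z)) / (n - 1)

def weighted_mean_taylor(y, w):
    """Weighted mean and its Taylor-linearized variance:
    [Var(sum w y) + ybar_w^2 Var(sum w) - 2 ybar_w Cov(sum w y, sum w)] / (sum w)^2
    """
    n = len(y)
    wy = [wi * yi for wi, yi in zip(w, y)]
    sum_w = sum(w)
    ybar_w = sum(wy) / sum_w
    var_wy = n * _sample_cov(wy, wy)   # estimated Var(sum of w_i * y_i)
    var_w = n * _sample_cov(w, w)      # estimated Var(sum of w_i)
    cov = n * _sample_cov(wy, w)       # estimated Cov(sum of w_i * y_i, sum of w_i)
    var = (var_wy + ybar_w**2 * var_w - 2 * ybar_w * cov) / sum_w**2
    return ybar_w, var

# With equal weights the result reduces to the familiar s^2 / n
m_eq, v_eq = weighted_mean_taylor([2, 4, 6, 8], [1, 1, 1, 1])
```

Algebraically, the numerator equals n times the sample variance of u_i = w_i(y_i − ȳ_w), the "linearized variable" form used by many survey packages; the test below checks that the two forms agree.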
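Variance Estimation (Jackknife and Balanced Repeated Replication)
• Jackknife and balanced repeated replication methods are empirical (replication-based) methods
  – Basically, resample from the data c times, computing an estimate $\bar{y}_\gamma$ from each replicate $\gamma = 1, \ldots, c$
  – Calculate the overall mean as
    $$\bar{y} = \frac{1}{c}\sum_{\gamma=1}^{c} \bar{y}_\gamma$$
    and then estimate the variance as
    $$v(\bar{y}) = \frac{1}{c(c-1)}\sum_{\gamma=1}^{c}\left(\bar{y}_\gamma - \bar{y}\right)^2$$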
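The formula above has the form of the random-group estimator, in which the c replicates are disjoint random subsamples; true jackknife and BRR replicates overlap and use different multipliers (e.g., (c − 1)/c for the delete-one-group jackknife). A minimal random-group sketch, with a hypothetical function name and made-up data:

```python
import random

def random_group_variance(y, c, rng):
    """Split the sample at random into c disjoint groups, compute each group
    mean ybar_gamma, and return ybar = mean of the group means and
    v(ybar) = sum((ybar_gamma - ybar)^2) / (c * (c - 1))."""
    if c < 2 or len(y) < c:
        raise ValueError("need at least two groups and at least one unit per group")
    idx = list(range(len(y)))
    rng.shuffle(idx)
    groups = [idx[g::c] for g in range(c)]       # c groups of (nearly) equal size
    means = [sum(y[i] for i in g) / len(g) for g in groups]
    ybar = sum(means) / c
    v = sum((m - ybar) ** 2 for m in means) / (c * (c - 1))
    return ybar, v

# Hypothetical data: 12 observations split into c = 4 random groups of 3
y = [4.1, 5.0, 3.8, 6.2, 5.5, 4.9, 5.1, 4.4, 6.0, 5.3, 4.7, 5.8]
ybar, v = random_group_variance(y, c=4, rng=random.Random(3))
```

With equal-sized groups, the mean of the group means equals the full-sample mean; only the variance estimate depends on the random split.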
What We Have Covered
• Defined systematic sampling
  – Examples
  – Estimators (assuming SRS equivalence)
• Discussed examples of complex sampling designs
• Explained the Kish grid
• Introduced variance estimation under complex designs