Complex Survey Analysis
2010 Workshop of the Association of Public Health Epidemiologists of Ontario
Toronto, Ontario
September 20-21, 2010
Susan Bondy, PhD
Dalla Lana School of Public Health
Outline
• Survey analysis in health context
• Review of survey samples
  – Complex design elements
  – Issues and implications
• Working with software
• Tips / Q&A (all around)
What we report from surveys
• Descriptive statistics
  – Means and rates (e.g., % prevalence)
  – TOTALS
• Measures of difference, association and effect
  – % diff, risk diff, OR, RR, rho, etc.
  – These test hypotheses
• Always reported with expression of variance
  – Margin of Error (MOE or +/- part)
  – Confidence intervals
Analytic concerns (health surveys)
• Representativeness, ‘representivity’
• Reliability / precision
  – Impact of design elements on precision
• Privacy and confidentiality
Understanding Complex Samples
You will need to understand the jargon to use the software
Simple Random Sample
• Selection is entirely at random
• Everyone has same selection probability
  – No unequal or over-sampling
  – No stratification
  – Independent selection; not in groups
• Self-weighting (no probability weights)
• Theoretically “With Replacement”
• Statistically efficient
  – But field costs might be a killer
  – Rare in multi-agenda public health surveys
Strata
• Mutually-exclusive categories (layers)
• COMPREHENSIVE (add up to whole pop, or universe)
• These are NOT sampled
• Sampling (of some other unit) is done WITHIN these layers
Strata - examples
• All of Ontario
  • Samples of households within EACH LHIN
  • LHIN is the stratum
• All school boards
  • Sample of classrooms within each board
  • Board is the stratum
• All ages
  • Sample separately for diff age groups
  • Age group is the stratum
Strata vs. clusters – tougher examples
• E.g., health services research
• 7 clinics offer ALL care for Ontario
• Sampling done within each of all 7 centres
• Data used to describe all Ontario care
• Clusters or strata?
A: Strata
• Because:
  – They add up to the universe to be described (“comprehensive”)
  – Not selected at random
  – Layering fixed by design
Forms of stratification
• EXplicit stratification
  – Also known as “over-sampling”
  – For planned “Domain analysis”
    • E.g., LHIN-specific results
• Example
  – LHINs not equal in true population
  – Samples equal (for same precision in each)
  – Higher sampling fraction in smaller LHINs
• Creates need for sampling weights
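A minimal sketch of why over-sampling creates a need for weights, assuming the simplest case where each stratum's weight is its population count divided by its sample count. The region names, population counts and case counts below are hypothetical, chosen only to show the arithmetic.

```python
# Hypothetical population and sample counts per stratum (illustrative only).
population = {"Toronto": 2_500_000, "North": 500_000}
sample = {"Toronto": 1000, "North": 1000}


def sampling_weights(pop_counts, sample_counts):
    """Return the design weight N_h / n_h for each stratum h."""
    return {h: pop_counts[h] / sample_counts[h] for h in pop_counts}


weights = sampling_weights(population, sample)

# Hypothetical number of respondents with the outcome in each stratum.
cases = {"Toronto": 100, "North": 200}

# Unweighted prevalence treats the two equal-sized samples as equal-sized populations.
unweighted = sum(cases.values()) / sum(sample.values())

# Weighted prevalence lets each respondent stand for N_h / n_h people.
weighted = (
    sum(weights[h] * cases[h] for h in cases)
    / sum(weights[h] * sample[h] for h in sample)
)
```

Here the over-sampled small stratum has the higher rate, so the unweighted figure overstates the provincial prevalence; the weighted figure pulls it back toward the larger stratum.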
Example
Ontario, SRS
Ends up with:
• n=3000 in Toronto
  • 1000 would have been plenty
  • Wasted money
• n=300 in North
  • Poor estimates
  • Suppressed data
  • Wasted money
Ontario, equal regional samples
On purpose:
• 1000 in Toronto
• 1000 in North
• Good, usable data
• Cost-efficient
Forms of stratification
• IMplicit stratification
  – Sampling again specific to each layer, but
  – Now the sampling is done to KEEP the sample structured like the population
  – Sampling with Probability Proportionate to Size (“PPS”)
    • Reduces need for sampling weights!
  – Done to avoid a ‘bad sample’
“Bad sample?”
• A good sample:
  – Has the same distribution of characteristics as the real population
    • E.g., same proportions by age and sex
• Large enough samples are good ‘on average’
• But random is random and you will get weird samples
Risk for ‘bad samples’
• Imagine a survey of babies
• Triplets+ rare
• VERY high rates of bad outcome
• So, number of multiples will drive this year’s estimates
• SRS 1 – No triplets+
  • Low death rate estimated
• SRS 2 – accidentally 3 times the norm
  • High death rate estimated
• Net effect?
  • High sample-to-sample random error
Implicit stratification
• FORCE a few multiples into the sample
  • Same small % as actual pop
  • Too few for a specific report
  • BUT, total survey less prone to random error, year over year
PRINCIPLE:
• Find factors strongly associated with outcome
• Force this into design and analysis to gain precision
Another bad sample – sampling natural groups of unequal size
• Example: 346 Municipalities in Ontario
  – One Toronto
  – 8 Cities > 200,000 population
  – 337 small centres
  – Systematic size bias:
    • Geo/politics
    • Smaller areas of governance where pop is spread thin
• Choosing SRS of communities would create a disproportionately rural sample!
• So selection is PPS (proportionate to size)
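A rough sketch of PPS selection of communities, assuming draws are made with replacement and that population size is the size measure. The community names and sizes are hypothetical; real surveys typically use systematic PPS methods from dedicated sampling software.

```python
import random

# Hypothetical community populations (illustrative only).
communities = {
    "Big city": 2_500_000,
    "Mid city": 250_000,
    "Town A": 20_000,
    "Town B": 5_000,
}


def pps_probabilities(sizes):
    """Single-draw selection probability for each unit: size / total size."""
    total = sum(sizes.values())
    return {name: size / total for name, size in sizes.items()}


def pps_sample(sizes, n_draws, seed=None):
    """Draw n_draws units with replacement, probability proportional to size."""
    rng = random.Random(seed)
    names = list(sizes)
    return rng.choices(names, weights=[sizes[k] for k in names], k=n_draws)


probs = pps_probabilities(communities)
draws = pps_sample(communities, n_draws=10, seed=1)
```

Under SRS of communities each of the four would have a 25% chance per draw; under PPS the chance tracks where people actually live.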
“Group” sampling
• E.g., people by FAMILY, students by CLASS, teeth by MOUTH, babies including TWINS, etc.
• Common in health studies
  – Population and clinic-based surveys
  – Also experimental designs
  – May be used naïvely
• Used because of relative cost-efficiency
“PSU” or “cluster”?
Classic WHO household surveys
• One country is divided into thousands of PSUs – close to equal population size
  • E.g., communities or parts of large cities
  – Stage 1: 50 to 100 of these “PSU” sampled
    • Note the large number!
  – Stage 2: Sub-regions sampled within each
    • Now ‘SSU’ often called “Clusters”
  – Stage 3: Households within SSUs
Better jargon
• Primary Sampling Unit (PSU)
• Secondary Sampling Unit (SSU)
• Tertiary … you get the idea
• For most complex software, ideal to understand each stage:
  – Element sampled, sampling method and fraction
Stratum or cluster?
• 7 hospitals agreed to take part in some project; not at random, out of say 24
• Discuss…
Analysis
Analysis 1: Preparation of the data set
• Field staff have to finalize the dataset
• Documenting numbers
  – Complete observations
  – Final dispositions
  – Response and participation rates
• Data cleaning and documentation
  – Acting on skip patterns, etc.
Preparation of sampling weights
• “Sampling” or “Stratification” weights
  – These undo the effects of oversampling
  – Calculated by figuring out:
    • The true proportions in categories used in sampling (known for pop and sample, before selection)
    • The raw proportions in the sample
    • Use weights to make sample apply to pop
Post-stratification weights
• Use information about people that you couldn’t know before recruitment
  – E.g., education; smoking status
  – Again work out wanted percentages
  – Add further adjustment to weights
  – Only beneficial if correctly associated with outcome of interest
Necessary? Better? Needs to be considered.
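One way the “further adjustment to weights” could look, as a sketch: scale each respondent's design weight so the weighted shares of a post-stratification variable match known population shares. The respondents, categories and target shares are hypothetical, and production surveys often adjust on several variables at once (e.g., by raking).

```python
# Hypothetical respondents: (design weight, education category).
respondents = [
    (100.0, "post-secondary"),
    (100.0, "post-secondary"),
    (100.0, "post-secondary"),
    (100.0, "secondary or less"),
]

# Hypothetical known population shares by education (e.g., from a census).
target_share = {"post-secondary": 0.5, "secondary or less": 0.5}


def post_stratify(rows, targets):
    """Scale design weights so weighted category shares match the targets."""
    total = sum(w for w, _ in rows)
    weighted = {}
    for w, cat in rows:
        weighted[cat] = weighted.get(cat, 0.0) + w
    # Adjustment factor = wanted share / current weighted share.
    factor = {cat: targets[cat] / (weighted[cat] / total) for cat in weighted}
    return [(w * factor[cat], cat) for w, cat in rows]


adjusted = post_stratify(respondents, target_share)
```

The total of the weights is unchanged; only the balance between categories moves toward the wanted percentages.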
Survey estimation – two parts
• Prevalence = 13.0 (95% CI = 10.0-16.0)
• Odds ratio = 2.1 (95% CI = 1.6-4.0)
Point estimates weighted to correct for over-sampling
Variances calculated and applied using full design information BY SMART SOFTWARE
Design Effect (DEFF)
• A statistic showing how much less efficient a complex sampling design is, relative to SRS of identical size
• DEFF = 1: Same efficiency as SRS
• DEFF > 1: Less efficient than SRS
• DEFF = 2: As efficient as SRS of ½ size
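The DEFF definition can be sketched as a ratio of variances, with the “SRS of ½ size” reading shown as an effective sample size. The prevalence, sample size and design-based variance below are hypothetical; in practice the design-based variance comes from survey software.

```python
import math


def design_effect(var_complex, var_srs):
    """DEFF: design-based variance relative to SRS variance at the same n."""
    return var_complex / var_srs


def effective_sample_size(n, deff):
    """Size of the SRS that would give the same precision as the complex sample."""
    return n / deff


# Hypothetical survey: prevalence of 10% from n = 2000 respondents.
p, n = 0.10, 2000
var_srs = p * (1 - p) / n      # textbook SRS variance of a proportion
var_complex = 0.00009          # hypothetical design-based variance

deff = design_effect(var_complex, var_srs)
n_eff = effective_sample_size(n, deff)

# The 95% CI half-width grows by sqrt(DEFF) relative to the SRS formula.
half_width_srs = 1.96 * math.sqrt(var_srs)
half_width_complex = 1.96 * math.sqrt(var_complex)
```

With these numbers DEFF comes out at about 2, so 2000 interviews carry roughly the information of 1000 SRS interviews.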
Jargon:
95% C.I.
Analysis type                         Estimate   95% CI
Model-based (assume SRS)              13%        11.0 – 15.2
Account for weights                   10%        8.0 – 12.3
Account for weights and clustering    10%        7.5 – 13.0
Point estimate: affected by weights IF population mixed
95% CI: affected by weighting AND by clustering
2 most common approaches for complex survey variance estimation
• “Taylor-Series”, aka “Linearized” variance estimation
• “Bootstrap”
  – Includes tools such as bootstrap weights
Bootstrapping approaches
• Sampling variability “observed” not calculated from a fixed formula
  – Felt to reflect “true” sampling variability
  – Chance alone if survey really repeated an infinite number of times
• Virtually free of assumptions
  – Tends to be more appropriate and conservative
• Very broadly applicable
  – E.g., to smaller sample sizes
  – Sometimes to analyses that other software can’t do
Bootstrapping
Custom-bootstrapping
• Advanced programming
• Draw many (e.g., 1000) samples from your overall N
  – Respect strat and clustering
  – Reweight each time
  – Save 1000 point estimates
• Variance in 1000 estimates is new corrected variance
Bootstrap weights files
• (example StatCan)
• Resampling done once to produce a set of resampling weights
  – 1000 weights per observation
• Point estimate calculated once with each weight var (1000 times)
• Variance within 1000 estimates is new variance
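A minimal sketch of the bootstrap-weights approach, assuming the variance is taken as the mean squared deviation of the replicate estimates around the full-sample estimate (conventions differ between agencies and packages, so check the data provider's documentation). The data are hypothetical and use 3 replicate weight sets instead of the 1000 a real file would carry.

```python
def weighted_mean(values, weights):
    """Weighted mean of values."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def bootstrap_variance(values, full_weights, replicate_weights):
    """Re-estimate once per replicate weight set and summarize the spread.

    replicate_weights is a list of weight vectors, one per bootstrap replicate.
    """
    theta = weighted_mean(values, full_weights)
    replicates = [weighted_mean(values, rw) for rw in replicate_weights]
    variance = sum((r - theta) ** 2 for r in replicates) / len(replicates)
    return theta, variance


# Hypothetical 0/1 outcome for four respondents with their final weights.
y = [1, 0, 0, 1]
final_w = [10, 10, 10, 10]

# Hypothetical bootstrap weights (a real file would supply many more sets).
bs_w = [
    [20, 0, 10, 10],
    [0, 20, 10, 10],
    [10, 10, 20, 0],
]

estimate, variance = bootstrap_variance(y, final_w, bs_w)
standard_error = variance ** 0.5
```

The point estimate always comes from the final weight; the bootstrap weights are used only to see how much the estimate moves.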
Taylor Series
Software uses complex linear equations to calculate corrected variance for every estimate
• Requires assumptions about data!
  – E.g., pretty large sample sizes
• Very difficult for user to know:
  – When limits are being pushed
• Need to tell software full sampling design
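To make the idea less of a black box, here is a sketch of a linearized variance for a weighted mean, assuming PSUs are sampled with replacement within strata at the first stage and no finite population correction. It is an illustration of the general technique, not the exact routine of any particular package; the function name and data layout are made up for this example.

```python
from collections import defaultdict


def linearized_variance_of_mean(rows):
    """Weighted mean and its Taylor-linearized variance.

    rows: iterable of (stratum, psu, weight, y) tuples.
    Assumes with-replacement sampling of PSUs within strata.
    """
    rows = list(rows)
    total_w = sum(w for _, _, w, _ in rows)
    mean = sum(w * y for _, _, w, y in rows) / total_w

    # Linearized value for each observation, summed up to the PSU level.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, psu, w, y in rows:
        psu_totals[stratum][psu] += w * (y - mean) / total_w

    variance = 0.0
    for totals in psu_totals.values():
        z = list(totals.values())
        n_h = len(z)
        if n_h < 2:
            raise ValueError("each stratum needs at least two PSUs")
        z_bar = sum(z) / n_h
        # Between-PSU variability within the stratum.
        variance += n_h / (n_h - 1) * sum((v - z_bar) ** 2 for v in z)
    return mean, variance


# Sanity check: one stratum, every observation its own PSU, equal weights.
# The result should reduce to the SRS formula s^2 / n.
srs_rows = [("all", i, 1.0, y) for i, y in enumerate([1, 2, 3, 4, 5])]
srs_mean, srs_var = linearized_variance_of_mean(srs_rows)
```

Note that the variance is driven by differences between PSU totals, which is why the software must be told the strata and PSU variables and why few PSUs per stratum is a problem.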
Software options
• Epi Info: Linearized estimation only, with very limited analysis options. NB: use only procedures for surveys.
• SPSS: Linearized estimation only (most recent versions may add!). Several analyses available. NB: use only the stand-alone module for complex surveys.
• Stata: Linearized or BS Weights (called via BRR routine). Good range of ‘canned’ complex analyses. NB: use the ‘svy’ commands provided.
• SAS: Linearized: means, prop., linear and logistic (more in v10). NB: use only “PROC SURVEY___” commands.
• WesVar: Linearized or BS Weights (called via BRR routine). Good range of ‘canned’ complex analyses.
• Statistics Canada Bootvar: BS Weights + bonuses: CV and suppression rules. Somewhat limited analysis options (can request more). NB: programs are macros for SAS or SPSS.
Tell your software
1. Clustered sampling
   Correct method WIDENS 95% CIs
2. Stratification
   Correct method might narrow CIs (a bit)
3. Weights
   Correct method WIDENS CIs
4. Finite population correction
   Never allow this to shrink your CIs
Epi / public health norms:
• Always use population-weighted analyses
  • Only these are sure to reflect the actual pop
• Never use the “finite population correction”
  • Well, it’s bloody unlikely
  • Small samples from small true groups are tough, statistically; ‘nuff said.
• Always use vetted COMMANDS specifically designed for complex samples
Using “Taylor-series” type software
1) Use syntax (or dialogue boxes) to declare:
• Weight variable
• Stratification variable
• Group unit for cluster sampling
  – Primary sampling unit (PSU)
• Usu. ignore requests for finite population info
2) Run your analysis using ONLY special commands for complex samples
Software specific
• SAS – proc survey commands
  – Declare strata, weights, cluster for the first sampling stage
  – Options are within each proc statement
• Stata – svy-utilities
  – Can set design options once
  – Can include all stages, separate post-stratification weights, and standardization weights
Software specific
• SPSS – only separate CS module!
  – Possibly least intuitive
    • Set-up profile, then analyse
  – Read examples etc.
  – Allows multi-stage
  – DO ensure first stage set at “with replacement”
Sampling method jargon
• “Sampling with replacement”
  • In theory done with SRS
  • Not actually done (we don’t interview twice)
  • Sampling WOR (without…)
• More conservative assumption is to pretend it was “WR” and from a theoretically infinitely large potential sample
Selecting your procedures
• Ratio commands
  • Create dummy vars for numerator and denominator, then use to calculate proportions
• Proportion and table commands
  • Act like table analyses, varying niceness
Selecting your procedures
• Means commands
  • Obviously for continuous vars
  • ALSO COMMON default for proportions
  • Try recoding 0/1 vars as 0/100 and spit out %s
  • Taylor series is ‘large sample technique’ so using large sample analysis to get mean (and limits) for binary vars as continuous is consistent.
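The 0/100 recoding trick is just arithmetic: the weighted mean of a 0/100 variable is the weighted percentage. A small sketch with hypothetical records (variable names and weights are made up):

```python
# Hypothetical records: (survey weight, smoker coded 0/1).
records = [(250.0, 1), (250.0, 0), (100.0, 1), (400.0, 0)]


def weighted_mean(pairs):
    """Weighted mean of the second element, using the first as the weight."""
    return sum(w * v for w, v in pairs) / sum(w for w, _ in pairs)


# Recode 0/1 as 0/100 so that a 'means' command reports a percentage directly.
recoded = [(w, 100 * flag) for w, flag in records]

proportion = weighted_mean(records)   # on the 0-1 scale
percent = weighted_mean(recoded)      # on the 0-100 scale
```

The same recoding carries through to the standard error and confidence limits, which also come out in percentage points.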
Total commands?
Wgtd % of all obs’ns
• Yes 40%
• No 40%
• DK 20%
Weighted totals:
• Yes 40,000
• No 40,000
• DK 20,000
Wgtd % of valid responses
• Yes 50%
• No 50%
• …
Are you happy reporting this as the population total?
Alternative is to apply percent estimate (and its upper and lower limits) and use this to estimate pop numbers from pop denominator.
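The alternative can be sketched as simple arithmetic: apply the percentage and its limits to a known population denominator. The 50% and the denominator of 100,000 follow the slide's figures; the confidence limits of 45% and 55% are hypothetical.

```python
def population_count(percent, lower, upper, pop_denominator):
    """Apply a percent estimate and its CI limits to a population denominator."""
    return tuple(
        round(pop_denominator * value / 100) for value in (percent, lower, upper)
    )


# 50% of valid responses said Yes; hypothetical 95% CI of 45% to 55%.
estimate, low, high = population_count(50.0, 45.0, 55.0, 100_000)
```

This reports 50,000 (with limits) rather than the weighted total of 40,000, which quietly treats the “don't know” group as a separate population.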
Survey regress/ survey logist
• Commands are least weird to look at
• A big challenge is that you can’t use favourite tests for adding/dropping vars
  – Likelihood ratio tests are now N/A
  – Have to use Wald tests to test hypotheses about coefficients
  – Some come on output; may need custom tests
Additional stuff with Stata
• Can include extra sets of weights
  – Post-stratification weights
  – Standard pop values for standardization
Sub-group analyses
• Survey stats all about LARGE samples
  • Many PSUs, many people per PSU
  • Analysis of small subsets can lack precision and result in ‘bad samples’
• Probably less harm when studying narrow age group (for example)
  • People still come from lots of PSUs
• Risky to study sub-geography
  • Too few PSUs, unless survey engineered for that level of geography
• Use “domain” option in commands
  • Not “if” or “where”
  • There are limits
Privacy and precision
Rules for release or suppression of data
• Always use confidence intervals
• Apply rules to suppress estimates that lack minimum precision
  – E.g., Statistics Canada
    • Minimum observations in numerators
    • Coefficient of Variation or Relative Standard Error cut-points (warnings and ‘do not release’)
• Rules for confidentiality
  – Usually 5+ minimum obs per cell
  – Suppress zero cells
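A sketch of how such release rules could be applied in code. The cut-points used as defaults here (warning from a CV of 16.6%, suppression above 33.3%, at least 5 observations in the numerator) are assumptions for illustration only; always take the actual thresholds from the data provider's release guidelines.

```python
def coefficient_of_variation(estimate, standard_error):
    """CV (relative standard error) expressed as a percentage of the estimate."""
    if estimate == 0:
        raise ValueError("CV is undefined for an estimate of zero")
    return 100 * standard_error / abs(estimate)


def release_decision(estimate, standard_error, numerator_n,
                     warn_cv=16.6, suppress_cv=33.3, min_numerator=5):
    """Classify an estimate as release / release with warning / suppress.

    All thresholds are illustrative defaults, not official rules.
    """
    if numerator_n < min_numerator:
        return "suppress"
    cv = coefficient_of_variation(estimate, standard_error)
    if cv > suppress_cv:
        return "suppress"
    if cv >= warn_cv:
        return "release with warning"
    return "release"


# Hypothetical prevalence estimates (%), their standard errors, and numerator counts.
decisions = [
    release_decision(10.0, 1.0, 50),   # CV = 10%
    release_decision(10.0, 2.5, 50),   # CV = 25%
    release_decision(10.0, 4.0, 50),   # CV = 40%
    release_decision(10.0, 1.0, 3),    # too few observations
]
```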
The perennial FAQs
• When/why do I have to use complex survey software?
• E.g., I have no clusters, just weights
• When/why do I have to bootstrap instead of using SAS/SPSS/Stata?
• Others?