Complex Survey Analysiscore.apheo.ca/resources/events/2010/Session2C - Bondy.pdf · Complex Survey Analysis 2010 Workshop of the Association of Public Health Epidemiologists of Ontario


Complex Survey Analysis
2010 Workshop of the Association of Public Health Epidemiologists of Ontario
Toronto, Ontario

September 20-21, 2010

Susan Bondy, PhD
Dalla Lana School of Public Health


Outline

• Survey analysis in health context
• Review of survey samples
  – Complex design elements
  – Issues and implications
• Working with software
• Tips / Q&A (all around)


What we report from surveys
• Descriptive statistics
  – Means and rates (e.g., % prevalence)
  – TOTALS
• Measures of difference, association and effect
  – % difference, risk difference, OR, RR, rho, etc.
  – These are used to test hypotheses
• Always reported with an expression of variance
  – Margin of error (MOE, the “+/-” part)
  – Confidence intervals
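A minimal sketch of the “+/-” part for a single proportion, assuming a simple random sample and the usual normal approximation. The function name and the 130-of-1000 example are hypothetical. Complex designs change how the standard error is computed, not how the interval is built from it.

```python
import math

def srs_proportion_ci(successes, n, z=1.96):
    """Proportion, margin of error and 95% CI, assuming a simple random
    sample and the normal approximation."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)  # SRS standard error of p
    moe = z * se                     # the "+/-" part
    return p, moe, (p - moe, p + moe)

# Hypothetical: 130 of 1000 respondents report the condition
p, moe, (lo, hi) = srs_proportion_ci(130, 1000)
print(f"{100 * p:.1f}% +/- {100 * moe:.1f} (95% CI {100 * lo:.1f}-{100 * hi:.1f})")
```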

Analytic concerns (health surveys)

• Representativeness, ‘representivity’
• Reliability / precision
  – Impact of design elements on precision
• Privacy and confidentiality


Understanding Complex Samples

You will need to understand the jargon to use the software


Simple Random Sample
• Selection is entirely at random
• Everyone has the same selection probability
  – No unequal selection or over-sampling
  – No stratification
  – Independent selection; not in groups
• Self-weighting (no probability weights)
• Theoretically “with replacement”
• Statistically efficient
  – But field costs might be a killer
  – Rare in multi-agenda public health surveys

Strata

• Mutually exclusive categories (layers)
• COMPREHENSIVE (together they add up to the whole population, or universe)
• The strata themselves are NOT sampled
• Sampling (of some other unit) is done WITHIN these layers

Strata - examples
• All of Ontario
  – Samples of households within EACH LHIN (Local Health Integration Network)
  – LHIN is the stratum
• All school boards
  – Sample of classrooms within each board
  – Board is the stratum
• All ages
  – Sample separately for different age groups
  – Age group is the stratum
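Stratified selection can be sketched as an independent simple random sample drawn within every layer. The frame layout, the `stratified_sample` helper and the sample sizes below are illustrative assumptions, not a production sampler.

```python
import random

def stratified_sample(frame, stratum_of, n_per_stratum, seed=None):
    """Draw an independent simple random sample within each stratum.

    frame: list of units; stratum_of: function unit -> stratum label;
    n_per_stratum: dict stratum label -> sample size for that stratum.
    """
    rng = random.Random(seed)
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for label, units in strata.items():
        # Every stratum is included; units are sampled WITHIN it
        sample.extend(rng.sample(units, n_per_stratum[label]))
    return sample

# Hypothetical frame: households tagged with their region (the stratum)
frame = [("North", i) for i in range(500)] + [("Toronto", i) for i in range(5000)]
s = stratified_sample(frame, lambda hh: hh[0], {"North": 50, "Toronto": 50}, seed=1)
```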


Strata vs. clusters – tougher examples

• E.g., health services research
  – 7 clinics offer ALL care for Ontario
  – Sampling done within each of all 7 centres
  – Data used to describe all Ontario care
• Clusters or strata?


A: Strata

• Because:
  – They add up to the universe to be described (“comprehensive”)
  – They were not selected at random
  – Layering is fixed by design


Forms of stratification

• EXplicit stratification
  – Also known as “over-sampling”
  – For planned “domain analysis”, e.g., LHIN-specific results
• Example
  – LHINs are not equal in true population size
  – Samples are equal (for the same precision in each)
  – Higher sampling fraction in smaller LHINs
• Creates the need for sampling weights
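A sketch of why equal samples in unequal strata create the need for weights. Each respondent's design weight is assumed here to be the inverse of the stratum's sampling fraction, N_h / n_h. The region names, population counts and prevalences are hypothetical.

```python
def design_weights(pop_sizes, sample_sizes):
    """Design weight per stratum = N_h / n_h (inverse sampling fraction):
    each respondent 'stands for' that many people."""
    return {h: pop_sizes[h] / sample_sizes[h] for h in pop_sizes}

# Hypothetical regions of unequal size, each given an equal sample of 1000
pop = {"A": 2_600_000, "B": 1_300_000, "C": 260_000}
n = {"A": 1000, "B": 1000, "C": 1000}
w = design_weights(pop, n)   # C is over-sampled, so its weight is smallest

# Hypothetical regional prevalences
prev = {"A": 0.10, "B": 0.12, "C": 0.25}
unweighted = sum(prev[h] * n[h] for h in n) / sum(n.values())
weighted = sum(prev[h] * n[h] * w[h] for h in n) / sum(n[h] * w[h] for h in n)
```

Pooling without weights lets the over-sampled region C pull the overall estimate up (about 15.7% here, versus 11.6% weighted).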

Example

Ontario, SRS
Ends up with:
• n=3000 in Toronto
  – 1000 would have been plenty
  – Wasted money
• n=300 in North
  – Poor estimates
  – Suppressed data
  – Wasted money

Ontario, equal regional samples
On purpose:
• 1000 in Toronto
• 1000 in North
• Good, usable data
• Cost-efficient
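To put rough numbers on “poor estimates” versus “plenty”, here is the worst-case (p = 0.5) margin of error at each sample size on the slide. This assumes simple random sampling within each region; any design effect would widen these further.

```python
import math

def moe_95(n, p=0.5):
    """Worst-case (p = 0.5) 95% margin of error for a proportion under SRS."""
    return 1.96 * math.sqrt(p * (1 - p) / n)

for region, n in [("Toronto (SRS)", 3000), ("North (SRS)", 300), ("Either (equal)", 1000)]:
    print(f"{region:16s} n={n:5d}  +/- {100 * moe_95(n):.1f} percentage points")
```

This gives roughly ±1.8, ±5.7 and ±3.1 percentage points: going from 1000 to 3000 buys little, while 300 is visibly weaker.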


Forms of stratification

• IMplicit stratification
  – Sampling again specific to each layer, but
  – Now the sampling is done to KEEP the sample structured like the population
  – Sampling with Probability Proportionate to Size (“PPS”)
    • Reduces need for sampling weights!
  – Done to avoid a ‘bad sample’


“Bad sample?”

• A good sample:
  – Has the same distribution of characteristics as the real population
  – E.g., same proportions by age and sex
• Large enough samples are good ‘on average’
• But random is random, and you will get weird samples


Risk for ‘bad samples’

• Imagine a survey of babies
  – Triplets+ are rare
  – VERY high rates of bad outcomes
  – So the number of multiples in the sample will drive this year’s estimates
• SRS 1 – no triplets+
  – Low death rate estimated
• SRS 2 – accidentally 3 times the norm
  – High death rate estimated
• Net effect?
  – High sample-to-sample random error

Implicit stratification
• FORCE a few multiples into the sample
  – Same small % as in the actual population
  – Too few for a specific report
  – BUT the total survey is less prone to random error, year over year

PRINCIPLE:
• Find factors strongly associated with the outcome
• Force these into the design and analysis to gain precision
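The principle can be checked analytically for a binary outcome. Under proportionate allocation (n_h = n·W_h, no finite population correction) the stratified variance is Σ W_h p_h(1 − p_h)/n, while SRS gives P(1 − P)/n; the difference is exactly the between-stratum term Σ W_h (p_h − P)²/n. The shares and rates below are hypothetical.

```python
def var_srs(W, p, n):
    """Variance of an estimated prevalence under SRS of size n.
    W: population share of each stratum (sums to 1); p: prevalence in each."""
    P = sum(w * ph for w, ph in zip(W, p))   # overall prevalence
    return P * (1 - P) / n

def var_proportionate(W, p, n):
    """Variance under proportionate stratified sampling (n_h = n * W_h),
    ignoring the finite population correction."""
    return sum(w * ph * (1 - ph) for w, ph in zip(W, p)) / n

# Hypothetical: 3% multiples with a 30% bad-outcome rate, 97% singletons at 0.5%
W, p, n = [0.03, 0.97], [0.30, 0.005], 2000
gain = 1 - var_proportionate(W, p, n) / var_srs(W, p, n)   # share of variance removed
```

With these made-up numbers the variance drops by roughly a fifth. The gain shrinks toward zero when the forced-in group is tiny or its outcome rate is close to everyone else's, which is why the stratifier has to be strongly associated with the outcome.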


Another bad sample – sampling natural groups of unequal size

• Example: 346 municipalities in Ontario
  – One Toronto
  – 8 cities > 200,000 population
  – 337 small centres
  – Systematic size bias:
    • Geo/politics
    • Smaller areas of governance where the population is spread thin
• Choosing an SRS of communities would create a disproportionately rural sample!
• So selection is PPS (proportionate to size)
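One common way to select PPS is the systematic method on cumulated sizes, sketched below. This is an assumed implementation for illustration; the sizes are hypothetical, and real designs usually take very large units (a Toronto, say) with certainty before selecting the rest.

```python
import random

def systematic_pps(sizes, n, seed=None):
    """Systematic PPS selection of n units from a list of size measures.

    Lays the units end to end on a line of length sum(sizes), then takes
    every (total / n)-th point from a random start. Returns selected indices.
    Units larger than the interval could be hit twice; real designs usually
    take such units with certainty first (not handled here).
    """
    total = sum(sizes)
    interval = total / n
    start = random.Random(seed).random() * interval
    targets = [start + k * interval for k in range(n)]
    selected, cum, i = [], 0.0, 0
    for idx, size in enumerate(sizes):
        cum += size
        while i < n and targets[i] < cum:   # target falls in this unit's segment
            selected.append(idx)
            i += 1
    return selected

# Hypothetical size measures (populations) for six communities; choose 2
sizes = [1200, 800, 3000, 500, 2500, 2000]
chosen = systematic_pps(sizes, 2, seed=7)
```

Each unit's inclusion probability is n × size / total (2 × 3000 / 10000 = 0.6 for the largest here), so larger communities are proportionately more likely to be picked.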


“Group” sampling
• E.g., people by FAMILY, students by CLASS, teeth by MOUTH, babies including TWINS, etc.
• Common in health studies
  – Population- and clinic-based surveys
  – Also experimental designs
  – May be used naïvely
• Used because of relative cost-efficiency

“PSU” or “cluster”? Classic WHO household surveys
• One country is divided into thousands of PSUs
  – Close to equal population size
  – E.g., communities or parts of large cities
• Stage 1: 50 to 100 of these “PSUs” sampled
  – Note the large number!
• Stage 2: Sub-regions sampled within each
  – Now ‘SSUs’, often called “clusters”
• Stage 3: Households within SSUs


Better jargon

• Primary Sampling Unit (PSU)
• Secondary Sampling Unit (SSU)
• Tertiary … you get the idea
• For most complex-survey software, it is ideal to understand each stage:
  – Element sampled, sampling method and sampling fraction
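Knowing the sampling fraction at each stage is what makes the weight computable: the overall inclusion probability is the product of the stage-specific probabilities, and the base weight is its inverse. The three-stage numbers below are hypothetical.

```python
from math import prod

def overall_weight(stage_probs):
    """Overall inclusion probability = product of the conditional selection
    probabilities at each stage; the base weight is its inverse."""
    p = prod(stage_probs)
    return p, 1.0 / p

# Hypothetical 3-stage design:
# Stage 1: PPS, 100 PSUs, this PSU has 5,000 of 5,000,000 people -> 0.1
# Stage 2: 2 of 10 SSUs within the PSU                            -> 0.2
# Stage 3: 10 of 250 households within the SSU                    -> 0.04
p, w = overall_weight([100 * 5_000 / 5_000_000, 2 / 10, 10 / 250])
```

Every household selected in that SSU would share this weight of about 1250.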


Stratum or cluster?

• 7 hospitals (out of, say, 24) agreed to take part in some project; they were not chosen at random

• Discuss…


Analysis


Analysis 1: Preparation of the data set

• Field staff have to finalize the dataset
• Documenting numbers
  – Complete observations
  – Final dispositions
  – Response and participation rates
• Data cleaning and documentation
  – Acting on skip patterns, etc.


Preparation of sampling weights

• “Sampling” or “stratification” weights
  – These undo the effects of over-sampling
  – Calculated by figuring out:
    • The true proportions in the categories used in sampling (known for population and sample, before selection)
    • The raw proportions in the sample
  – Use the weights to make the sample apply to the population
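The calculation described above (true population proportion divided by raw sample proportion) can be sketched as follows. These are relative weights that average 1 over the sample; the group labels and shares are hypothetical.

```python
from collections import Counter

def ratio_weights(sample_groups, pop_shares):
    """Weight for each group = population share / sample share.
    Weights average 1 over the sample, so weighted counts still sum to n."""
    n = len(sample_groups)
    sample_shares = {g: c / n for g, c in Counter(sample_groups).items()}
    return {g: pop_shares[g] / sample_shares[g] for g in sample_shares}

# Hypothetical: equal regional samples, but region B holds 80% of the population
groups = ["A"] * 500 + ["B"] * 500
w = ratio_weights(groups, {"A": 0.20, "B": 0.80})   # A -> 0.4, B -> 1.6
```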


Post-stratification weights
• Use information about people that you couldn’t know before recruitment
  – E.g., education; smoking status
  – Again, work out the wanted percentages
  – Add a further adjustment to the weights
  – Only beneficial if correctly associated with the outcome of interest
• Necessary? Better? Needs to be considered.
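A minimal sketch of a single post-stratification adjustment: within each post-stratum, existing weights are multiplied by the factor that makes the weighted share match the known population share. The helper name, the education categories and the targets are hypothetical; adjusting to several margins at once (raking) is a different, iterative procedure.

```python
def poststratify(weights, post_strata, targets):
    """Rescale existing weights so weighted shares match known population
    shares within each post-stratum.

    weights: current weights; post_strata: label for each record;
    targets: dict label -> population share (assumed to sum to 1).
    Assumes every post-stratum in `targets` appears in the sample.
    """
    total = sum(weights)
    current = {}
    for w, g in zip(weights, post_strata):
        current[g] = current.get(g, 0.0) + w
    factor = {g: targets[g] * total / current[g] for g in current}
    return [w * factor[g] for w, g in zip(weights, post_strata)]

# Hypothetical: 3 of 4 respondents have high school only ("hs"),
# but the population is assumed to be split 50/50
new_w = poststratify([1.0, 1.0, 1.0, 1.0], ["hs", "hs", "hs", "uni"], {"hs": 0.5, "uni": 0.5})
```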


Survey estimation – two parts

• Prevalence = 13.0 (95% CI = 10.0-16.0)
• Odds ratio = 2.1 (95% CI = 1.6-4.0)

Point estimates: weighted to correct for over-sampling
Variances: calculated and applied using full design information, BY SMART SOFTWARE

Design Effect (DEFF)
• A statistic showing how much less efficient a complex sampling design is, relative to an SRS of identical size
• DEFF = 1: same efficiency as SRS
• DEFF > 1: less efficient than SRS
• DEFF = 2: as efficient as an SRS of ½ the size
• DEFF < 1: more efficient than SRS (possible with effective stratification)
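The DEFF definition, and two widely used rules of thumb attributed to Kish for approximating it, are sketched below. The approximations are assumptions (they ignore stratification gains and assume similar cluster sizes); the variance your survey software reports is the authority.

```python
def deff(var_complex, var_srs):
    """Design effect: design-based variance / variance of an SRS of the same size."""
    return var_complex / var_srs

def effective_n(n, design_effect):
    """Size of the SRS that would give the same precision."""
    return n / design_effect

def kish_weighting_deff(weights):
    """Kish's approximate DEFF from unequal weights alone:
    1 + CV^2 of the weights = n * sum(w^2) / (sum(w))^2."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2

def kish_cluster_deff(avg_cluster_size, icc):
    """Kish's approximate DEFF from clustering: 1 + (m - 1) * rho."""
    return 1 + (avg_cluster_size - 1) * icc
```

For example, clusters of 11 respondents with an intraclass correlation of 0.1 give DEFF ≈ 2, so 2000 interviews carry the precision of an SRS of about 1000.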


Jargon: 95% C.I.

Analysis type                          Estimate   95% CI
Model-based (assume SRS)               13%        11.0 – 15.2
Account for weights                    10%        8.0 – 12.3
Account for weights and clustering     10%        7.5 – 13.0

• Point estimate: affected by weights IF the population is mixed
• 95% CI: affected by weighting AND by clustering


2 most common approaches for complex survey variance estimation
• “Taylor series”, aka “linearized” variance estimation
• “Bootstrap”
  – Includes tools such as bootstrap weights


Bootstrapping approaches
• Sampling variability is “observed”, not calculated from a fixed formula
  – Felt to reflect “true” sampling variability
  – i.e., chance alone, if the survey were really repeated an infinite number of times
• Relies on few distributional assumptions
  – Tends to be more appropriate and conservative
• Very broadly applicable
  – E.g., to smaller sample sizes
  – Sometimes to analyses that other software can’t do

Bootstrapping: custom bootstrapping
• Advanced programming
• Draw many (e.g., 1000) samples from your overall N
  – Respect stratification and clustering
  – Reweight each time
  – Save the 1000 point estimates
• The variance of the 1000 estimates is the new, corrected variance
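The slide leaves the resampling scheme open. One standard concrete choice, assumed here rather than taken from the slides, is the Rao-Wu rescaling bootstrap: within each stratum, draw n_h − 1 PSUs with replacement and rescale the weights. The record layout (`stratum`, `psu`, `w`, `y`) and the toy data are hypothetical.

```python
import random
import statistics

def weighted_mean(rows, wkey="w"):
    return sum(r[wkey] * r["y"] for r in rows) / sum(r[wkey] for r in rows)

def rao_wu_bootstrap(rows, B=1000, seed=None):
    """Replicate estimates of a weighted mean by the Rao-Wu rescaling
    bootstrap: in each stratum, draw n_h - 1 PSUs with replacement; a PSU
    drawn m times has its weights multiplied by m * n_h / (n_h - 1)."""
    rng = random.Random(seed)
    psus = {}
    for r in rows:
        psus.setdefault(r["stratum"], set()).add(r["psu"])
    psus = {h: sorted(units) for h, units in psus.items()}
    estimates = []
    for _ in range(B):
        factor = {}
        for h, units in psus.items():
            n_h = len(units)
            if n_h < 2:
                raise ValueError("each stratum needs at least 2 PSUs")
            draws = [rng.choice(units) for _ in range(n_h - 1)]  # resample within stratum
            for u in units:  # whole PSUs move together: respects clustering
                factor[(h, u)] = draws.count(u) * n_h / (n_h - 1)
        # Reweight, then save this replicate's point estimate
        replicate = [{**r, "w": r["w"] * factor[(r["stratum"], r["psu"])]} for r in rows]
        estimates.append(weighted_mean(replicate))
    return estimates

# Hypothetical toy data: 2 strata x 3 PSUs x 4 people, 0/1 outcome
rows = [{"stratum": h, "psu": c, "w": 1.0 + h, "y": float(c == 0)}
        for h in range(2) for c in range(3) for i in range(4)]
reps = rao_wu_bootstrap(rows, B=1000, seed=42)
boot_var = statistics.pvariance(reps)   # variance of the B replicate estimates
```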

Bootstrap weights files
• (Example: Statistics Canada)
• Resampling done once to produce a set of resampling weights
  – 1000 weights per observation
• Point estimate calculated once with each weight variable (1000 times)
• Variance within the 1000 estimates is the new variance
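With a bootstrap-weights file, the variance step reduces to re-running one estimate per weight column. The sketch below assumes a weighted mean and takes the variance of the replicate estimates around their own mean, as the slide describes; some survey documentation instead centres on the full-sample estimate, so check the guide that ships with the weights.

```python
import statistics

def bootstrap_weight_variance(y, full_w, rep_ws):
    """Weighted mean and its bootstrap-weight variance.

    y: outcome values; full_w: final (full-sample) weights;
    rep_ws: B replicate-weight columns, each a list aligned with y.
    """
    def wmean(w):
        return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    estimate = wmean(full_w)                # point estimate, computed once
    reps = [wmean(w_b) for w_b in rep_ws]   # one estimate per weight column
    return estimate, statistics.pvariance(reps)
```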


Taylor Series
• Software uses linear (first-order Taylor series) approximations to calculate a corrected variance for every estimate
• Requires assumptions about the data!
  – E.g., pretty large sample sizes
• Very difficult for the user to know when limits are being pushed
• Need to tell the software the full sampling design
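For a weighted mean, the linearization can be sketched in a few lines: compute each record's linearized score, total the scores by PSU, and take their with-replacement variance across PSUs within strata. This is a minimal sketch under the “with replacement” first-stage assumption, not a substitute for vetted survey commands.

```python
from collections import defaultdict

def linearized_mean_variance(y, w, strata, psus):
    """Weighted mean and its Taylor-linearized variance for a stratified,
    clustered sample, treating PSUs as sampled with replacement."""
    W = sum(w)
    mean = sum(wi * yi for wi, yi in zip(w, y)) / W
    # Linearized score of each record for the ratio (weighted mean) estimator
    z = [wi * (yi - mean) / W for wi, yi in zip(w, y)]
    psu_tot = defaultdict(float)
    for zi, h, c in zip(z, strata, psus):
        psu_tot[(h, c)] += zi            # total the scores within each PSU
    by_stratum = defaultdict(list)
    for (h, _), t in psu_tot.items():
        by_stratum[h].append(t)
    var = 0.0
    for totals in by_stratum.values():
        n_h = len(totals)
        if n_h < 2:
            raise ValueError("each stratum needs at least 2 PSUs")
        tbar = sum(totals) / n_h
        var += n_h / (n_h - 1) * sum((t - tbar) ** 2 for t in totals)
    return mean, var

y = [1, 2, 3, 4, 5]
mean, var = linearized_mean_variance(y, [1.0] * 5, [0] * 5, [0, 1, 2, 3, 4])
```

With one stratum, equal weights and one person per PSU, this reduces to the familiar s²/n (0.5 for the example), a useful sanity check.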


Software options
• Epi Info: linearized estimation only, with very limited analysis options
  – NB: use only the procedures for surveys
• SPSS: linearized estimation only (most recent versions may add more!); several analyses available
  – NB: use only the stand-alone module for complex surveys
• Stata: linearized or bootstrap (BS) weights (called via the BRR routine); good range of ‘canned’ complex analyses
  – NB: use the ‘svy’ commands provided
• SAS: linearized: means, proportions, linear and logistic regression (more in v10)
  – NB: use only “PROC SURVEY___” commands
• WesVar: linearized or BS weights (called via the BRR routine); good range of ‘canned’ complex analyses
• Statistics Canada Bootvar: BS weights + bonuses: CV and suppression rules; somewhat limited analysis options (can request more)
  – NB: programs are macros for SAS or SPSS


Tell your software
1. Clustered sampling
   – Correct method WIDENS 95% CIs
2. Stratification
   – Correct method might narrow CIs (a bit)
3. Weights
   – Correct method WIDENS CIs
4. Finite population correction
   – Never allow this to shrink your CIs
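Why the finite population correction can only shrink intervals: it multiplies the standard error by sqrt(1 − n/N), which is at most 1. The sampling fractions below are hypothetical.

```python
import math

def se_with_fpc(se, n, N):
    """Apply the finite population correction: SE * sqrt(1 - n/N).
    The factor is at most 1, so it can only shrink the standard error."""
    return se * math.sqrt(1 - n / N)

se = 0.02
print(se_with_fpc(se, 1_000, 10_000_000))  # tiny sampling fraction: almost no change
print(se_with_fpc(se, 300, 600))           # half of a small population sampled
```

With a negligible sampling fraction the factor is essentially 1; it only bites when a large share of a small population is sampled, which is the situation the Epi / public health norms warn about.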

Epi / public health norms:

• Always use population-weighted analyses
  – Only these are sure to reflect the actual population
• Never use the “finite population correction”
  – Well, it’s bloody unlikely
  – Small samples from small true groups are tough, statistically; ‘nuff said.
• Always use vetted COMMANDS specifically designed for complex samples


Using “Taylor-series” type software

1) Use syntax (or dialogue boxes) to declare:
   • Weight variable
   • Stratification variable
   • Group unit for cluster sampling
     – Primary sampling unit (PSU)
   • Usually ignore requests for finite population info
2) Run your analysis using ONLY special commands for complex samples

Software specific

• SAS – PROC SURVEY commands
  – Declare strata, weights and cluster for the first sampling stage
  – Options are within each PROC statement
• Stata – svy utilities
  – Can set design options once
  – Can include all stages, separate post-stratification weights, and standardization weights


Software specific

• SPSS – only the separate Complex Samples (CS) module!
  – Possibly the least intuitive
  – Set up a profile, then analyse
  – Read the examples, etc.
  – Allows multi-stage designs
  – DO ensure the first stage is set to “with replacement”


Sampling method jargon

• “Sampling with replacement”
  – In theory done with SRS
  – Not actually done (we don’t interview anyone twice)
  – In practice, sampling is WOR (without replacement)
• The more conservative assumption is to pretend it was “WR”, from a theoretically infinitely large potential sample


Selecting your procedures

• Ratio commands
  – Create dummy variables for the numerator and denominator, then use them to calculate proportions
• Proportion and table commands
  – Act like table analyses, of varying niceness
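The ratio approach can be sketched with two weighted sums of dummy variables. Variable names, data and weights are hypothetical; the variance of the ratio would still need design-aware software.

```python
def ratio_estimate(numerator, denominator, weights):
    """Weighted ratio R = sum(w * num) / sum(w * den).
    With 0/1 dummies, this is the weighted proportion of the
    denominator group that is also in the numerator group."""
    num = sum(w * a for w, a in zip(weights, numerator))
    den = sum(w * b for w, b in zip(weights, denominator))
    return num / den

# Hypothetical: share of current smokers among adults aged 20-34
age_20_34 = [1, 1, 0, 1, 0]         # denominator dummy
smoker_20_34 = [1, 0, 0, 0, 0]      # numerator dummy (smoker AND aged 20-34)
weights = [100.0, 300.0, 200.0, 100.0, 50.0]
r = ratio_estimate(smoker_20_34, age_20_34, weights)
```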


Selecting your procedures

• Means commands
  – Obviously for continuous variables
  – ALSO a COMMON default for proportions
  – Try recoding 0/1 variables as 0/100 so the output comes out as %s
  – Taylor series is a ‘large sample technique’, so using a large-sample analysis to get the mean (and limits) for binary variables treated as continuous is consistent.
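A small check of the 0/100 trick: the weighted mean of a 0/100 recode is the weighted percentage, and its standard error is 100 times that of the 0/1 mean. Data and weights are hypothetical.

```python
def weighted_mean(values, weights):
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

smoker = [1, 0, 0, 1, 0]                   # hypothetical 0/1 indicator
weights = [100.0, 300.0, 200.0, 100.0, 50.0]
smoker_pct = [100 * s for s in smoker]     # recode 0/1 -> 0/100
pct = weighted_mean(smoker_pct, weights)   # a mean that reads directly as a %
```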


Total commands?
Weighted % of all observations: Yes 40%, No 40%, DK (don’t know) 20%
Weighted totals: Yes 40,000, No 40,000, DK 20,000
Weighted % of valid responses: Yes 50%, No 50%, …

Are you happy reporting this as the population total? The alternative is to take the percent estimate (and its upper and lower limits) and apply it to the population denominator to estimate population numbers.
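The slide's numbers make the choice concrete. The sketch uses the weighted totals shown; the 95% limits for the valid-response percentage are hypothetical.

```python
# Weighted totals from the slide (DK = "don't know")
totals = {"Yes": 40_000, "No": 40_000, "DK": 20_000}
population = sum(totals.values())                 # 100,000

valid = totals["Yes"] + totals["No"]
pct_yes_valid = totals["Yes"] / valid             # 0.5, i.e. 50% of valid responses

direct_total = totals["Yes"]                      # 40,000: DKs counted as not-Yes
applied_total = pct_yes_valid * population        # 50,000: DKs assumed to answer like others

# Hypothetical 95% limits for the valid-response percentage
lo_pct, hi_pct = 0.46, 0.54
applied_limits = (lo_pct * population, hi_pct * population)
```

The two candidate totals differ only in what is assumed about the DK group.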

Survey regress / survey logistic
• Commands are the least weird to look at
• A big challenge is that you can’t use your favourite tests for adding/dropping variables
  – Likelihood ratio tests are now N/A
  – Have to use Wald tests to test hypotheses about coefficients
  – Some come on the output; you may need custom tests
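A single-coefficient Wald test can be sketched from a design-based coefficient and standard error alone. The normal reference distribution is a simplifying assumption: survey packages commonly use a t or adjusted F distribution with design degrees of freedom (roughly the number of PSUs minus the number of strata). The coefficient and SE are hypothetical.

```python
import math

def wald_test_1df(beta, se, null=0.0):
    """Wald test of one coefficient: z = (b - b0) / SE, with a two-sided
    p-value from the standard normal (large-sample approximation)."""
    z = (beta - null) / se
    p = math.erfc(abs(z) / math.sqrt(2))   # equals 2 * (1 - Phi(|z|))
    return z, p

# Hypothetical survey-logistic output: log-odds 0.74 with design-based SE 0.23
z, p = wald_test_1df(0.74, 0.23)
```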

Additional stuff with Stata

• Can include extra sets of weights
  – Post-stratification weights
  – Standard population values for standardization


Sub-group analyses
• Survey stats are all about LARGE samples
  – Many PSUs, many people per PSU
• Analysis of small subsets can lack precision and result in ‘bad samples’
• Probably less harm when studying a narrow age group (for example)
  – People still come from lots of PSUs
• Risky to study sub-geography
  – Too few PSUs, unless the survey was engineered for that level of geography
• Use the “domain” option in commands
  – Not “if” or “where”
  – There are limits
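Why “domain” beats “if”/“where”: the point estimate is the same either way, but subsetting throws away PSUs that happen to contain no domain members, which changes the variance calculation. A minimal sketch using Taylor linearization for a domain mean; the helper names and toy data are hypothetical.

```python
from collections import defaultdict

def psu_variance(z, strata, psus):
    """Total linearized scores by PSU, then sum the with-replacement variance
    of PSU totals within each stratum (single-PSU strata add nothing here)."""
    totals = defaultdict(float)
    for zi, h, c in zip(z, strata, psus):
        totals[(h, c)] += zi
    by_stratum = defaultdict(list)
    for (h, _), t in totals.items():
        by_stratum[h].append(t)
    var = 0.0
    for ts in by_stratum.values():
        n_h = len(ts)
        if n_h > 1:
            tbar = sum(ts) / n_h
            var += n_h / (n_h - 1) * sum((t - tbar) ** 2 for t in ts)
    return var

def domain_mean(y, w, d, strata, psus):
    """Domain approach: keep every record; out-of-domain records score 0,
    but their PSUs still count in the variance."""
    Wd = sum(wi * di for wi, di in zip(w, d))
    mean = sum(wi * di * yi for wi, di, yi in zip(w, d, y)) / Wd
    z = [wi * di * (yi - mean) / Wd for wi, di, yi in zip(w, d, y)]
    return mean, psu_variance(z, strata, psus)

def subset_mean(y, w, d, strata, psus):
    """The 'if'/'where' approach: drop out-of-domain records first."""
    keep = [i for i, di in enumerate(d) if di]
    ys, ws = [y[i] for i in keep], [w[i] for i in keep]
    W = sum(ws)
    mean = sum(wi * yi for wi, yi in zip(ws, ys)) / W
    z = [wi * (yi - mean) / W for wi, yi in zip(ws, ys)]
    return mean, psu_variance(z, [strata[i] for i in keep], [psus[i] for i in keep])

# Hypothetical: one stratum, 4 PSUs of one person each; only 2 are in the domain
y, w, d = [2.0, 4.0, 9.0, 9.0], [1.0] * 4, [1, 1, 0, 0]
strata, psus = [0] * 4, [0, 1, 2, 3]
print(domain_mean(y, w, d, strata, psus))   # variance uses all 4 PSUs
print(subset_mean(y, w, d, strata, psus))   # variance uses only 2 PSUs
```

Here subsetting happens to inflate the variance; the direction depends on the data, and subsetting can also silently drop whole strata. Either way it no longer reflects the design.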

Privacy and precision

Rules for release or suppression of data
• Always use confidence intervals
• Apply rules to suppress estimates that lack minimum precision
  – E.g., Statistics Canada:
    • Minimum observations in numerators
    • Coefficient of variation or relative standard error cut-points (warnings and ‘do not release’)
• Rules for confidentiality
  – Usually 5+ minimum observations per cell
  – Suppress zero cells
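Release rules are easy to encode once the SE is available. The 5-observation minimum follows the slide; the CV cut-points are an assumption modelled on commonly cited Statistics Canada guidance (up to 16.5% release, 16.6% to 33.3% release with caution, above 33.3% do not release), so confirm them against your survey's own documentation.

```python
def release_flag(estimate, se, n_obs, min_obs=5, cv_caution=0.166, cv_suppress=0.333):
    """Classify an estimate for release.

    min_obs follows the "5+ observations per cell" rule (which also
    suppresses zero cells). The CV cut-points are ASSUMED, modelled on
    commonly cited Statistics Canada guidance; check your survey's guide.
    """
    if n_obs < min_obs:
        return "suppress: too few observations"
    cv = se / abs(estimate) if estimate else float("inf")   # coefficient of variation
    if cv > cv_suppress:
        return "do not release: CV too high"
    if cv >= cv_caution:
        return "release with caution"
    return "release"
```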

The perennial FAQs

• When/why do I have to use complex survey software?
  – E.g., I have no clusters, just weights
• When/why do I have to bootstrap instead of using SAS/SPSS/Stata?
• Others?

