Sampling and Sample Size Calculation

Sources:
– EPIET Introductory course: Thomas Grein, Denis Coulombier, Philippe Sudre, Mike Catchpole, Denise Antona
– IDEA: Brigitte Helynck, Philippe Malfait, Institut de veille sanitaire

Modified: Viviane Bremer, EPIET 2004, Suzanne Cotter 2005, Richard Pebody 2006

Lazereto de Mahón, Menorca, Spain

September 2006



Objectives: sampling

• To understand:
  – Why we use sampling
  – Definitions in sampling
  – Sampling errors
  – Main methods of sampling
  – Sample size calculation


Why do we use sampling?

Get information from large populations with:

– Reduced costs

– Reduced field time

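– Increased accuracy (a smaller field team can be better trained and supervised)

– Enhanced methods (more detailed data collection per unit becomes feasible)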


Definition of sampling

Procedure by which some members of a given population are selected as representatives of the entire population


Definition of sampling terms

Sampling unit (element)
• Subject under observation on which information is collected
  – Example: children <5 years, hospital discharges, health events…

Sampling fraction
• Ratio between sample size and population size
  – Example: 100 out of 2000 (5%)
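As a quick check of the arithmetic, here is a minimal Python sketch of the sampling fraction for the slide's example (100 out of 2000). The function name `sampling_fraction` is a hypothetical helper, not part of any course tool.

```python
def sampling_fraction(n: int, N: int) -> float:
    """Return the sampling fraction n/N (sample size divided by population size)."""
    if not 0 < n <= N:
        raise ValueError("expected 0 < n <= N")
    return n / N


# Slide example: 100 units sampled out of a population of 2000
fraction = sampling_fraction(100, 2000)
print(f"{fraction:.0%}")  # prints 5%
```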


Definition of sampling terms

Sampling frame
• List of all the sampling units from which the sample is drawn
  – Lists: e.g. children < 5 years of age, households, health care units…

Sampling scheme
• Method of selecting sampling units from the sampling frame
  – Randomly, convenience sample…
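To make the frame/scheme distinction concrete, here is a hypothetical Python sketch: the frame is a plain list of units, and each scheme is a function that picks units from it. The household identifiers, the seed and both function names are illustrative assumptions; the convenience scheme is modelled crudely as "the first n on the list".

```python
import random

# Hypothetical sampling frame: a complete list of 200 household identifiers
frame = [f"household-{i:03d}" for i in range(1, 201)]


def random_scheme(frame: list[str], n: int, rng: random.Random) -> list[str]:
    """Probability scheme: every unit has the same known chance (n / len(frame))."""
    return rng.sample(frame, n)


def convenience_scheme(frame: list[str], n: int) -> list[str]:
    """Non-probability scheme: take whichever units are easiest to reach,
    modelled here as simply the first n entries of the list."""
    return frame[:n]


rng = random.Random(2006)  # fixed seed only so the draw is reproducible
print(random_scheme(frame, 5, rng))
print(convenience_scheme(frame, 5))
```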


Survey errors

• Systematic error (or bias): sample not typical of population
  – Inaccurate response (information bias)
  – Selection bias

• Sampling error (random error)


Representativeness (validity)

A sample should accurately reflect the distribution of relevant variables in the population:
• Person, e.g. age, sex
• Place, e.g. urban vs. rural
• Time, e.g. seasonality

Representativeness is essential to generalise results.

Ensure representativeness before starting; confirm it once the survey is completed.


Sampling and representativeness

[Figure: target population → sampling population → sample]


Sampling error

• Random difference between the sample and the population from which the sample was drawn

• Size of the error can be measured in probability samples

• Expressed as the “standard error”
  – of a mean, a proportion…

• The standard error (and hence the precision) depends on:
  – Size of the sample
  – Distribution of the characteristic of interest in the population


Sampling error

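When a simple random sample of size n is selected from a population of size N, the standard error (s) of the mean or of a proportion is:

$$s_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \qquad \text{or} \qquad s_{p} = \sqrt{\frac{p(1-p)}{n}}$$

It is used to calculate the estimated 95% confidence interval:

$$\bar{X} \pm Z \, s_{\bar{x}} \qquad \text{or, approximately,} \qquad \bar{X} \pm 2 \, s_{\bar{x}} \qquad (Z = 1.96)$$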
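A minimal Python sketch of these formulas, assuming simple random sampling and the normal approximation (the finite population correction is ignored, as in the formulas above). The survey figures (30 cases among 100 sampled children) are invented for illustration.

```python
import math


def se_mean(sigma: float, n: int) -> float:
    """Standard error of a mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def se_proportion(p: float, n: int) -> float:
    """Standard error of a proportion: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


def ci95(estimate: float, se: float, z: float = 1.96) -> tuple[float, float]:
    """95% confidence interval: estimate ± z * SE (z = 1.96, often rounded to 2)."""
    return estimate - z * se, estimate + z * se


# Hypothetical survey: 30 of 100 sampled children have the condition
p_hat = 30 / 100
se = se_proportion(p_hat, 100)
low, high = ci95(p_hat, se)
print(f"p = {p_hat:.2f}, SE = {se:.3f}, 95% CI {low:.3f} to {high:.3f}")
```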


Quality of a sampling estimate

[Figure: precision & validity. Panel 1: no precision (random error). Panel 2: precision but no validity (systematic error, bias)]


Survey errors: example

Measuring height:
• Measuring tape held differently by different investigators
  → loss of precision: large standard error
• Tape shrunk/wrong
  → systematic error: bias (cannot be corrected afterwards)

[Figure: height readings on a scale from 173 to 179]
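A small Python simulation can illustrate the two kinds of error. Everything in it is an assumption for illustration: a true height of 176 cm, random measurement error with an SD of 0.2 cm (consistent technique) or 1.5 cm (tape held differently), and a faulty tape that adds 2 cm to every reading.

```python
import random
import statistics

TRUE_HEIGHT = 176.0  # hypothetical true height (cm)
rng = random.Random(42)  # fixed seed for reproducibility


def measure(n: int, spread: float, offset: float) -> list[float]:
    """Simulate n height readings.

    spread: SD of random measurement error (e.g. tape held differently)
    offset: fixed error added to every reading (e.g. faulty tape)
    """
    return [TRUE_HEIGHT + offset + rng.gauss(0, spread) for _ in range(n)]


careful = measure(1000, spread=0.2, offset=0.0)    # consistent technique, good tape
imprecise = measure(1000, spread=1.5, offset=0.0)  # random error: loss of precision
biased = measure(1000, spread=0.2, offset=2.0)     # systematic error: faulty tape

for name, readings in [("careful", careful), ("imprecise", imprecise), ("biased", biased)]:
    print(f"{name:<9} mean = {statistics.mean(readings):.2f} cm, "
          f"SD = {statistics.stdev(readings):.2f} cm")
```

The imprecise readings scatter widely but still centre on the true value, whereas the biased readings are tightly grouped around a value about 2 cm too high; taking more readings does not remove that offset.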


Types of sampling

• Non-probability samples

• Probability samples


Non-probability samples

• Convenience samples (ease of access)

• Snowball sampling (friend of friend, etc.)

• Purposive sampling (judgemental)
  – You choose who you think should be in the study

Probability of being chosen is unknown: cheaper, but unable to generalise, and potential for bias


Probability samples

• Random sampling
  – Each subject has a known probability of being selected

• Allows application of statistical sampling theory to results to:
  – Generalise
  – Test hypotheses


Methods used in probability samples

• Simple random sampling
• Systematic sampling
• Stratified sampling
• Multi-stage sampling
• Cluster sampling


Simple random sampling

• Principle
  – Equal chance/probability of drawing each unit

• Procedure
  – Take sampling population
  – Need listing of all sampling units (“sampling frame”)
  – Number all units
  – Randomly draw units


Simple random sampling

• Advantages
  – Simple
  – Sampling error easily measured

• Disadvantages
  – Need complete list of units
  – Does not always achieve best representativeness
  – Units may be scattered and poorly accessible


Simple random sampling

Example: evaluate the prevalence of tooth decay among 1200 children attending a school

• List of children attending the school
• Children numbered from 1 to 1200
• Sample size = 100 children
• Random sampling of 100 numbers between 1 and 1200

How to randomly select?


EPITABLE: random number listing


Also possible in Excel
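The same draw can also be scripted. A minimal Python sketch, assuming the 1200 children are numbered 1 to 1200 as above: `random.sample` draws without replacement, so every child has the same chance (100/1200) of selection and nobody is picked twice. The seed is arbitrary and only makes the draw reproducible.

```python
import random

N = 1200  # children on the school list, numbered 1..1200
n = 100   # required sample size

rng = random.Random(20060901)  # arbitrary seed so the list can be re-created
selected = sorted(rng.sample(range(1, N + 1), n))  # drawn without replacement
print(selected[:10], "...")
```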


[Figure: simple random sampling]


Systematic sampling

• Principle
  – Select sample at regular intervals based on the sampling fraction

• Advantages
  – Simple
  – Sampling error easily measured

• Disadvantages
  – Need complete list of units
  – Periodicity (bias if the list has a cyclical pattern that coincides with the sampling interval)


Systematic sampling

• N = 1200 and n = 60: sampling interval = N/n = 1200/60 = 20 (sampling fraction = 60/1200 = 1/20)

• List persons from 1 to 1200

• Randomly select a number between 1 and 20 (e.g. 8)

1st person selected = the 8th on the list; 2nd person = 8 + 20 = the 28th; etc.
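A sketch of the same procedure in Python, assuming the list is numbered 1 to N and that N is an exact multiple of n (as with 1200 and 60). The interval k = N // n corresponds to the 20 above; `systematic_sample` is a hypothetical helper name.

```python
import random


def systematic_sample(N: int, n: int, start: int) -> list[int]:
    """Positions selected by systematic sampling from a list numbered 1..N.

    The sampling interval is k = N // n; `start` must lie between 1 and k.
    Assumes N is a multiple of n, as in the slide example.
    """
    k = N // n
    if not 1 <= start <= k:
        raise ValueError(f"start must be between 1 and {k}")
    return list(range(start, N + 1, k))[:n]


N, n = 1200, 60
k = N // n                     # sampling interval: 20
start = random.randint(1, k)   # random start between 1 and 20
sample = systematic_sample(N, n, start)

# With the slide's start of 8: persons 8, 28, 48, ...
print(systematic_sample(N, n, 8)[:3])  # [8, 28, 48]
```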


[Figure: systematic sampling]


Stratified sampling

• Principle:
  – Divide the sampling frame into homogeneous subgroups (strata), e.g. age group, occupation
  – Draw a random sample in each stratum


Stratified sampling

• Advantages
  – Can acquire information about the whole population and individual strata
  – Precision increased if variability within strata is less (homogeneous) than between strata

• Disadvantages
  – Can be difficult to identify strata
  – Loss of precision if small numbers in individual strata
    • resolve by sampling proportionate to stratum population
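A Python sketch of stratified random sampling with allocation proportional to stratum size. The three age strata and their sizes (600, 400 and 200 out of 1200) are invented; simple rounding is used for the allocation, so with less convenient numbers the stratum sample sizes may not add up exactly to n.

```python
import random


def proportional_allocation(stratum_sizes: dict[str, int], n: int) -> dict[str, int]:
    """Allocate n across strata in proportion to stratum size N_h / N (simple rounding)."""
    N = sum(stratum_sizes.values())
    return {h: round(n * N_h / N) for h, N_h in stratum_sizes.items()}


def stratified_sample(frame: dict[str, list[str]], n: int, rng: random.Random) -> dict[str, list[str]]:
    """Draw a simple random sample within each stratum."""
    sizes = {h: len(units) for h, units in frame.items()}
    alloc = proportional_allocation(sizes, n)
    return {h: rng.sample(frame[h], alloc[h]) for h in frame}


# Hypothetical sampling frame already divided into age strata
frame = {
    "0-4 years": [f"A{i}" for i in range(600)],
    "5-14 years": [f"B{i}" for i in range(400)],
    "15+ years": [f"C{i}" for i in range(200)],
}
sample = stratified_sample(frame, 60, random.Random(1))
print({h: len(units) for h, units in sample.items()})
```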


Multiple stage sampling

Principle:

• Sampling in consecutive stages

• Example: sampling unit = household
  – 1st stage: draw neighbourhoods
  – 2nd stage: draw buildings
  – 3rd stage: draw households
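The three-stage household example can be sketched in Python as nested random draws. The area layout (10 neighbourhoods, 8 buildings each, 12 households per building) and the numbers drawn at each stage are hypothetical.

```python
import random

# Hypothetical area: 10 neighbourhoods x 8 buildings x 12 households
area = {
    f"N{a}": {f"N{a}-B{b}": [f"N{a}-B{b}-H{h}" for h in range(12)] for b in range(8)}
    for a in range(10)
}


def multistage_sample(area, n_neigh, n_build, n_house, rng):
    """Three-stage sample: neighbourhoods, then buildings, then households."""
    households = []
    for neigh in rng.sample(sorted(area), n_neigh):          # stage 1: neighbourhoods
        buildings = area[neigh]
        for bld in rng.sample(sorted(buildings), n_build):   # stage 2: buildings
            households += rng.sample(buildings[bld], n_house)  # stage 3: households
    return households


sample = multistage_sample(area, n_neigh=3, n_build=2, n_house=4, rng=random.Random(3))
print(len(sample), "households")  # 3 x 2 x 4 = 24
```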


Cluster sampling

• Principle
  – Sample units not identified independently but in a group (or “cluster”)
  – Provides logistical advantage


Cluster sampling

• Principle
  – Whole population divided into groups, e.g. neighbourhoods
  – Random sample taken of these groups (“clusters”)
  – Within selected clusters, all units (e.g. households) included, or a random sample of these units
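A hypothetical Python sketch of cluster sampling: clusters (here neighbourhoods of 20 households each, invented for the example) are drawn at random, then either every unit in each selected cluster is taken or, optionally, a random subsample of units.

```python
import random


def cluster_sample(clusters, n_clusters, rng, per_cluster=None):
    """Randomly select n_clusters clusters.

    If per_cluster is None, every unit in each selected cluster is included;
    otherwise a simple random sample of per_cluster units is taken within each.
    """
    chosen = rng.sample(sorted(clusters), n_clusters)
    units = []
    for name in chosen:
        members = clusters[name]
        if per_cluster is None:
            units.extend(members)
        else:
            units.extend(rng.sample(members, per_cluster))
    return chosen, units


# Hypothetical population: 15 neighbourhoods of 20 households each
clusters = {
    f"neighbourhood-{i:02d}": [f"nb{i:02d}-hh{j:02d}" for j in range(20)]
    for i in range(1, 16)
}

chosen, units = cluster_sample(clusters, 4, random.Random(5))
print(chosen)
print(len(units), "households")  # 4 clusters x 20 households = 80
```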


Example: Cluster sampling

[Figure: area divided into Sections 1 to 5]


Cluster sampling

• Advantages
  – Simple, as a complete list of sampling units within the population is not required
  – Less travel/resources required

• Disadvantages
  – Members of the same cluster are more likely to be alike (homogeneous) than members of different clusters
  – This “dependence” needs to be taken into account in the sample size and the analysis (“design effect”)


Selecting a sampling method

• Population to be studied
  – Size/geographical distribution
  – Heterogeneity with respect to variable
• Availability of list of sampling units
• Level of precision required
• Resources available


Sample size estimation

• Estimate number needed to:
  – reliably measure factor of interest
  – detect significant association

• Trade-off between study size and resources

• Sample size determined by various factors:
  – significance level (“alpha”)
  – power (“1-beta”)
  – expected prevalence of factor of interest


Type 1 error

• The probability of concluding from our sample that there is a difference compared with the population when in reality there is none

• Known as the α (or “type 1 error”)

• Usually set at 5% (or 0.05)


Type 2 error

• The probability of not finding a difference between our sample and the population that actually exists

• Known as the β (or “type 2 error”)

• Power is (1 - β) and is usually set at 80%


A question?

Are the English more intelligent than the Dutch?

• H0 Null hypothesis: The English and Dutch have the same mean IQ

• Ha Alternative hypothesis: The mean IQ of the English is greater than that of the Dutch


Type 1 and 2 errors

                 Truth
Decision         H0 true             H0 false
Reject H0        Type I error        Correct decision
Accept H0        Correct decision    Type II error


Power

• The easiest ways to increase power are to:
  – increase sample size
  – increase the difference to be detected (effect size)
  – relax the significance level, i.e. increase α (e.g. from 5% to 10%)
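A rough illustration of these three levers, assuming the English vs Dutch IQ question is tested with a one-sided two-sample z-test (normal approximation, known common SD). The SD of 15 IQ points, the 3-point difference and the group sizes are assumptions chosen only for illustration; the code uses Python's standard `statistics.NormalDist`.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, SD 1


def power_two_sample(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power of a one-sided two-sample z-test for a difference in means.

    delta: true difference to detect; sigma: common SD; n_per_group: size of each group.
    """
    se = sigma * (2 / n_per_group) ** 0.5       # SE of the difference in means
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)     # one-sided critical value
    return STD_NORMAL.cdf(delta / se - z_alpha)


base = power_two_sample(delta=3, sigma=15, n_per_group=200)
print(f"baseline (n=200, diff=3, alpha=5%): {base:.2f}")
print(f"larger sample (n=400):              {power_two_sample(3, 15, 400):.2f}")
print(f"larger difference (diff=5):         {power_two_sample(5, 15, 200):.2f}")
print(f"relaxed alpha (10%):                {power_two_sample(3, 15, 200, alpha=0.10):.2f}")
```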


Steps in estimating sample size for descriptive survey

• Identify major study variable
• Determine type of estimate (%, mean, ratio, ...)
• Indicate expected frequency of factor of interest
• Decide on desired precision of the estimate
• Decide on acceptable risk that estimate will fall outside its real population value
• Adjust for estimated design effect
• Adjust for expected response rate


Sample size for descriptive survey

z: alpha risk expressed in z-score

p: expected prevalence

q: 1 - p

d: absolute precision

g: design effect

Simple random / systematic sampling:

$$n = \frac{z^2 \, p \, q}{d^2} = \frac{1.96^2 \times 0.15 \times 0.85}{0.03^2} \approx 544$$

Cluster sampling:

$$n = g \cdot \frac{z^2 \, p \, q}{d^2} = \frac{2 \times 1.96^2 \times 0.15 \times 0.85}{0.03^2} \approx 1088$$
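The formula translates directly into Python. This sketch also applies the last step of the earlier checklist, inflating n for an expected response rate by dividing by it; that adjustment, the hypothetical 80% response rate and the parameter names are assumptions of the sketch. The other values reproduce the slide's example (z = 1.96, p = 0.15, d = 0.03, g = 2).

```python
import math


def survey_sample_size(p, d, z=1.96, deff=1.0, response_rate=1.0):
    """Sample size to estimate a proportion p with absolute precision d.

    n = deff * z^2 * p * q / d^2, with q = 1 - p, then divided by the expected
    response rate (an added adjustment, not part of the slide's formula).
    """
    q = 1 - p
    n = deff * z**2 * p * q / d**2
    return n / response_rate


n_srs = survey_sample_size(p=0.15, d=0.03)              # simple random / systematic
n_cluster = survey_sample_size(p=0.15, d=0.03, deff=2)  # cluster, design effect g = 2
print(round(n_srs), round(n_cluster))  # 544 1088
print(math.ceil(n_srs))                # 545 when rounded up

# Hypothetical: only 80% of those sampled are expected to respond
n_invited = survey_sample_size(p=0.15, d=0.03, response_rate=0.8)
```

Rounding to the nearest whole number gives the slide's 544 and 1088; in practice sample sizes are usually rounded up.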


Case-control sample size: issues to consider

• Number of cases
• Number of controls per case
• Odds ratio worth detecting
• Proportion of exposed persons in source population
• Desired level of significance (α)
• Power of the study (1-β)
  – to detect at a statistically significant level a particular odds ratio


Case-control: STATCALC sample size


• Risk of alpha error: 5%
• Power: 80%
• Proportion of controls exposed: 20%
• OR to detect: > 2
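STATCALC's internal method is not shown on the slide. As a hedged cross-check, this Python sketch uses one common textbook formula for unmatched case-control studies (Fleiss, without continuity correction, two-sided α) with the slide's parameters; the result need not match STATCALC's output exactly, and `cases_needed` is a hypothetical name.

```python
import math
from statistics import NormalDist


def cases_needed(p0, odds_ratio, r=1.0, alpha=0.05, power=0.80):
    """Cases needed for an unmatched case-control study (Fleiss formula,
    no continuity correction) with r controls per case.

    p0: proportion of controls exposed; odds_ratio: OR worth detecting.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = z.inv_cdf(power)
    # Proportion of cases exposed implied by p0 and the odds ratio
    p1 = odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))
    p_bar = (p1 + r * p0) / (1 + r)  # weighted average exposure
    numerator = (
        z_alpha * math.sqrt((1 + 1 / r) * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p0 * (1 - p0) / r)
    ) ** 2
    return math.ceil(numerator / (p1 - p0) ** 2)


# Slide parameters: alpha 5%, power 80%, 20% of controls exposed, OR = 2, 1 control per case
print(cases_needed(p0=0.20, odds_ratio=2))
print(cases_needed(p0=0.20, odds_ratio=2, r=4))  # four controls per case
```

With more controls per case (`r`), fewer cases are needed for the same power.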


Statistical power of a case-control study for different control-to-case ratios and odds ratios (with 50 cases)

[Figure: power (0 to 1) against control-to-case ratio (1 to 10), one curve each for OR = 2, 3 and 4]
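Curves of this kind can be approximated by rearranging the same Fleiss formula to solve for power with 50 cases. The figure does not state the proportion of controls exposed, so this Python sketch assumes 20% (borrowed from the STATCALC example); the printed values are therefore illustrative and may not match the plotted curves.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def case_control_power(n_cases, p0, odds_ratio, r, alpha=0.05):
    """Approximate power of an unmatched case-control study with r controls per case
    (Fleiss formula rearranged for power, no continuity correction)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    p1 = odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))  # exposure among cases
    p_bar = (p1 + r * p0) / (1 + r)
    z_beta = (
        abs(p1 - p0) * math.sqrt(n_cases)
        - z_alpha * math.sqrt((1 + 1 / r) * p_bar * (1 - p_bar))
    ) / math.sqrt(p1 * (1 - p1) + p0 * (1 - p0) / r)
    return STD_NORMAL.cdf(z_beta)


# 50 cases as in the figure; 20% of controls exposed is an ASSUMPTION
for odds_ratio in (2, 3, 4):
    powers = [case_control_power(50, 0.20, odds_ratio, r) for r in range(1, 11)]
    print(f"OR={odds_ratio}: " + " ".join(f"{p:.2f}" for p in powers))
```

In this sketch power rises with each added control per case, but the gain from each additional control shrinks as the ratio grows.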


Conclusions

• Probability samples are the best (they allow generalisation and measurement of sampling error)

• Ensure:
  – Representativeness
  – Precision

• …within available constraints


Conclusions

• If in doubt…

Call a statistician !!!!