Complex sampling design & analysis. A revision
Assoc. Prof. Dr. Jamalludin Ab Rahman MD MPH
Department of Community Medicine
Kulliyyah of Medicine
(C) Jamalludin Ab Rahman. All rights reserved.

Content
• Sampling method & sample size for survey
• What is a complex sampling method
• Sampling weight
• Complex sampling analysis

6-7th April 2016

About sampling
• It is rarely feasible to measure the ENTIRE population
• A good sample should be able to represent the population
• Sampling error occurs when statistics ≠ parameters
• Sampling error is not sampling bias
• Sampling error is random; sampling bias is systematic (predictable)
• Sampling design affects sampling error
• Standard error measures sampling error

The aim of any sampling plan is to reduce sampling error and to avoid sampling bias

Describe the sample
• Target population – inferred population
• Study population – representative of the target population
• Sampling frame – list of sampling units
• Sampling unit – unit to be sampled
• Observation unit – unit to be observed/measured

Sampling method
• Random vs. non-random
  • Random selection supports representativeness
• Simple vs. complex
  • SRS = every unit has an equal chance of being selected, i.e. equal probability of selection
  • Anything not SRS is complex sampling

[Figure: Simple Random Sampling, Systematic Random Sampling, Stratified Random Sampling]

Stratified versus cluster sampling
• Stratified: for heterogeneous groups, e.g. male-female, age groups
• Cluster: for homogeneous groups, e.g. schools, districts; in practice clusters are rarely homogeneous, only in an ideal situation

Cluster                                     Stratified
• Some clusters are not selected at all     • All strata are selected
• Large variance                            • Smaller variance
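
Design Effect (deff)
• Design effect = variance of an estimate under the actual design ÷ variance under SRS of the same size, i.e. how much precision is lost compared with SRS
• Different value for different variables
• Usually deff for a complex survey > 1
• If deff = 1.5, the complex sample must be 50% larger than an SRS to give the same precision (effective sample size = n/deff)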
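
The deff arithmetic can be sketched as below. This is a minimal illustration that assumes the common relation effective n = n / deff; the function name and the sample size of 1,000 are illustrative and do not come from the slides.

```python
def effective_sample_size(n: int, deff: float) -> float:
    """Size of an SRS that would give the same precision as n units
    collected under a complex design with the given design effect."""
    if deff <= 0:
        raise ValueError("deff must be positive")
    return n / deff


# Illustrative values only: the same 1,000 respondents under three designs.
n = 1000
for deff in (1.0, 1.5, 2.0):
    print(f"deff={deff}: effective n = {effective_sample_size(n, deff):.0f}")
```

With deff = 1.5, 1,000 respondents carry about as much information as 667 respondents from an SRS.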

Design Factor (deft)
• Design factor (deft) = sqrt(deff), the effect of the sampling design on the standard error
• If deft = 2, the SE is twice as large as it would be under SRS
• deff or deft is used as a guide: a priori to adjust the calculated sample size, or post hoc to check whether an adequate sample size was achieved
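
A hedged sketch of the a priori use: inflate an SRS sample size by deff, and inflate an SRS standard error by deft. The single-proportion formula n = z²p(1-p)/d² is an assumption chosen for illustration; the slides do not state which sample size formula is used, and all function names are hypothetical.

```python
import math


def srs_sample_size(p: float, d: float, z: float = 1.96) -> int:
    """Sample size to estimate a proportion p within +/- d under SRS."""
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)


def design_adjusted_sample_size(p: float, d: float, deff: float,
                                z: float = 1.96) -> int:
    """Multiply the unrounded SRS sample size by deff, then round up."""
    return math.ceil(deff * z ** 2 * p * (1 - p) / d ** 2)


def design_adjusted_se(p: float, n: int, deft: float) -> float:
    """SE of a proportion under the complex design = deft * SE under SRS."""
    return deft * math.sqrt(p * (1 - p) / n)


print(srs_sample_size(0.5, 0.05))                   # SRS requirement
print(design_adjusted_sample_size(0.5, 0.05, 2.0))  # requirement with deff = 2
print(design_adjusted_se(0.5, 400, 2.0))            # SE with deft = 2
```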

Sampling Weight
• Also known as probability weight (PW)
• Weight = N/n (inverse of the sampling fraction)
• Two-stage: (N1/n1)*(N2/n2)
• The sum of the weights equals the population size
• Weighting can increase the standard error

Sampling weight… Why?
• There is always imperfection in sampling
• Weighting will try to correct
  1. Unequal probability of selection – base/design weight
  2. Non-response bias
  3. Stratification in population – trying to represent true characteristics of population e.g. by sex, ethnic etc. – post stratification
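
The three corrections can be chained as multiplicative factors. The sketch below assumes one common approach (final weight = base weight × non-response factor × post-stratification factor); the slides do not specify the exact procedure. The response counts and the sex distribution are made-up illustrative numbers, and the function names are hypothetical.

```python
def nonresponse_factor(n_selected: int, n_responded: int) -> float:
    """Inflate weights so respondents also stand in for non-respondents."""
    return n_selected / n_responded


def poststratification_factor(known_total: float, weighted_total: float) -> float:
    """Scale weights so a cell's weighted total matches its known population."""
    return known_total / weighted_total


# 1. Base/design weight: N / n
base_weight = 100_000 / 1_000

# 2. Non-response: assume only 800 of the 1,000 selected people responded.
nr_adjusted = base_weight * nonresponse_factor(1_000, 800)

# 3. Post-stratification by sex (hypothetical counts).
respondents = {"male": 500, "female": 300}
known_population = {"male": 50_000, "female": 50_000}

final_weight = {}
for sex, n_resp in respondents.items():
    weighted_total = nr_adjusted * n_resp
    factor = poststratification_factor(known_population[sex], weighted_total)
    final_weight[sex] = nr_adjusted * factor

total = sum(final_weight[sex] * respondents[sex] for sex in respondents)
print(final_weight, total)
```

After all three steps the weights again sum to the population size, which is a useful check on any weighting scheme.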

Example
• N = 100,000 people
• Sample (n) = 1,000
• Therefore, SW = 100,000/1,000 = 100
• Every sampled person represents 100 people in that region

Example – two stage

         Class         Students       SW1      SW2      SW
Grade    N1    n1      N2     n2      N1/n1    N2/n2    SW1*SW2
1        5     3       150    30      1.7      5.0       8.3
2        6     3       180    30      2.0      6.0      12.0
3        6     3       175    30      2.0      5.8      11.7
4        7     3       185    30      2.3      6.2      14.4
5        4     3       170    30      1.3      5.7       7.6

* Non-proportionate distribution
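
The two-stage table can be reproduced with a few lines of Python. This is a sketch of the arithmetic only; the counts are taken from the table, while the function and variable names are illustrative.

```python
def two_stage_weight(N1: int, n1: int, N2: int, n2: int) -> float:
    """Overall weight = first-stage weight * second-stage weight."""
    return (N1 / n1) * (N2 / n2)


# (grade, classes N1, classes sampled n1, students N2, students sampled n2)
grades = [
    (1, 5, 3, 150, 30),
    (2, 6, 3, 180, 30),
    (3, 6, 3, 175, 30),
    (4, 7, 3, 185, 30),
    (5, 4, 3, 170, 30),
]

for grade, N1, n1, N2, n2 in grades:
    sw1, sw2 = N1 / n1, N2 / n2
    sw = two_stage_weight(N1, n1, N2, n2)
    print(f"Grade {grade}: SW1={sw1:.1f}  SW2={sw2:.1f}  SW={sw:.1f}")
```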

Example – stratified, one-stage

Population size
             District 1          District 2
             Urban     Rural     Urban     Rural
Under 18     10000     13000     20000     15000
18-60        30000     25000     60000     45000
Above 60      5000      7000      5000     10000
Total        45000     45000     85000     70000

Sample size
             District 1          District 2
             Urban     Rural     Urban     Rural
Under 18       100       100       100       100
18-60          100       100       100       100
Above 60       100       100       100       100
Total          300       300       300       300

Sampling weight
             District 1          District 2
             Urban     Rural     Urban     Rural
Under 18       100       130       200       150
18-60          300       250       600       450
Above 60        50        70        50       100

In the under-18 stratum, 1 sample from District 1 urban represents 100 people and 1 sample from District 2 urban represents 200 people.

* Non-proportionate distribution
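
The stratified weights follow from dividing each stratum's population by its sample size. A minimal sketch using the figures in the example (the dictionary layout and names are illustrative):

```python
SAMPLE_PER_STRATUM = 100  # every stratum contributes 100 respondents

# Population size by (age group, district, location), from the example.
population = {
    ("Under 18", "District 1", "Urban"): 10_000,
    ("Under 18", "District 1", "Rural"): 13_000,
    ("Under 18", "District 2", "Urban"): 20_000,
    ("Under 18", "District 2", "Rural"): 15_000,
    ("18-60", "District 1", "Urban"): 30_000,
    ("18-60", "District 1", "Rural"): 25_000,
    ("18-60", "District 2", "Urban"): 60_000,
    ("18-60", "District 2", "Rural"): 45_000,
    ("Above 60", "District 1", "Urban"): 5_000,
    ("Above 60", "District 1", "Rural"): 7_000,
    ("Above 60", "District 2", "Urban"): 5_000,
    ("Above 60", "District 2", "Rural"): 10_000,
}

# Sampling weight per stratum = N_h / n_h
weights = {stratum: N / SAMPLE_PER_STRATUM for stratum, N in population.items()}

for stratum, w in weights.items():
    print(stratum, w)

# Check: the weights, summed over all respondents, recover the population.
weighted_total = sum(w * SAMPLE_PER_STRATUM for w in weights.values())
print(weighted_total, sum(population.values()))
```

Because the same 100 respondents are taken from strata of very different sizes, the weights range from 50 to 600, which is what makes the allocation non-proportionate.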

Complex sampling analysis
• Accommodate the sampling weight
• Adjust the standard error for the sampling design

Estimating standard error
• Linearization method (Taylor series) – approximates the estimator with a linear function
• Replication method – draw repeated sub-samples, compute the estimate in each, and take the variance across them, e.g. BRR (Balanced Repeated Replication), jackknife, bootstrapping
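
As an illustration of the replication idea, the sketch below implements a delete-one-cluster jackknife for a weighted mean. It assumes an unstratified design with clusters as PSUs and measures deviations from the full-sample estimate; survey packages offer several variants, so treat this as one possible form rather than the method used in the practical. The data layout and function names are hypothetical.

```python
def weighted_mean(records):
    """records: iterable of (cluster_id, weight, value) tuples."""
    total_weight = sum(w for _, w, _ in records)
    return sum(w * y for _, w, y in records) / total_weight


def jackknife_variance(records):
    """Delete-one-cluster jackknife variance of the weighted mean."""
    clusters = sorted({c for c, _, _ in records})
    k = len(clusters)
    if k < 2:
        raise ValueError("need at least two clusters")
    full_estimate = weighted_mean(records)
    replicates = []
    for dropped in clusters:
        kept = [r for r in records if r[0] != dropped]
        # Rescaling the kept weights by k/(k-1) would not change a mean,
        # so the rescaling step is omitted here.
        replicates.append(weighted_mean(kept))
    return (k - 1) / k * sum((rep - full_estimate) ** 2 for rep in replicates)


# Made-up data: three clusters, (cluster_id, sampling weight, measurement).
data = [
    ("A", 100, 12.0), ("A", 100, 14.0),
    ("B", 200, 10.0), ("B", 200, 11.0),
    ("C", 150, 15.0), ("C", 150, 13.0),
]
variance = jackknife_variance(data)
print(weighted_mean(data), variance ** 0.5)  # estimate and its standard error
```

When every cluster holds one equally weighted observation, this reduces to the familiar s²/n for a simple random sample, which is a convenient sanity check.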
Practical Session

Practical
• Sampling distribution
• Calculating sampling weight
• Preparing data for analysis
• Complex sample analysis (using SPSS)

Sampling distribution
• Using the 2016 adult household population by location (urban/rural) in Malaysia, prepare a sampling distribution that is representative at the national urban/rural level, given a calculated sample size of 10,000 respondents
• Take 12 living quarters (LQ) per enumeration block (EB) and 2 adults per LQ
• Allocate proportionate to size
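
One way to approach the exercise is sketched below. It assumes proportional allocation on the national urban and rural totals from the census table in this deck (14,454 and 4,533, in thousands) and rounds the number of EBs up in each stratum. Whether this matches the intended worked solution is not stated in the slides, and the function name is hypothetical.

```python
import math

LQ_PER_EB = 12
ADULTS_PER_LQ = 2
ADULTS_PER_EB = LQ_PER_EB * ADULTS_PER_LQ  # 24 respondents per EB
TARGET_RESPONDENTS = 10_000

# National totals in thousands, taken from the census table in this deck.
population_thousands = {"Urban": 14_454, "Rural": 4_533}


def allocate(target: int, population: dict, per_eb: int) -> dict:
    """Split the target proportionally, then convert to whole EBs."""
    total = sum(population.values())
    plan = {}
    for stratum, size in population.items():
        share = size / total
        ebs = math.ceil(target * share / per_eb)  # round up to whole EBs
        plan[stratum] = {
            "share": share,
            "ebs": ebs,
            "respondents": ebs * per_eb,
        }
    return plan


plan = allocate(TARGET_RESPONDENTS, population_thousands, ADULTS_PER_EB)
for stratum, row in plan.items():
    print(f"{stratum}: share={row['share']:.3f}, "
          f"EBs={row['ebs']}, respondents={row['respondents']}")
```

Rounding EBs up slightly overshoots the 10,000 target, which is usually preferable to falling short. The same function can be applied state by state if state-level representation is required.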

Population Size by census ('000)*

No.  State               Urban     Rural     Total
1    Johor               1,682       537     2,219
2    Kedah                 905       433     1,338
3    Kelantan              508       543     1,050
4    Melaka                537        47       584
5    Negeri Sembilan       492       198       690
6    Pahang                564       427       991
7    Perak               1,260       394     1,653
8    Perlis                102        66       167
9    Pulau Pinang        1,069        69     1,138
10   Sabah               1,064       597     1,661
11   Sarawak             1,009       694     1,703
12   Selangor            3,583       274     3,857
13   Terengganu            450       250       700
14   WP Kuala Lumpur     1,133         -     1,133
15   WP Labuan              50         6        57
16   WP Putrajaya           46         -        46
     Total              14,454     4,533    18,987

Calculating sampling weight

Total* = total population *; Visited = kindergartens visited; Examined = children examined

                   PSU (Kindergarten)                     SSU (Children)
                   URBAN              RURAL               URBAN                RURAL
State              Total*   Visited   Total*   Visited    Total*    Examined   Total*    Examined
FT Kuala Lumpur       471        34        -         -    10,940         687        -           -
Perlis                 65         5      222         7     1,007          97    2,557         113
Kedah                 164        19      757        69     1,913         203    9,154         846
Penang                297        21      316        24     4,845         402    4,496         366
Perak                 356        19    1,040        55     6,382         412   12,627         819
Selangor            1,051        93      607        55    22,951       2,204    7,994         815
Negeri Sembilan       206        15      420        30     2,924         253    4,850         373
Melaka                131         8      384        22     1,941         125    5,111         316
Johor                 586        42    1,121        80     9,389         779   13,594       1,163
Pahang                235        13      873        45     4,188         224   12,092         642
Terengganu            400        21      813        35     6,979         336    9,308         427
Kelantan              144         9    1,042        58     2,924         178   14,882         934
FT Putrajaya           71         4        -         -     2,170         127        -           -
Sabah                 395        32    1,230       101    10,330         998   13,837       1,006
Sarawak               590        30    1,493        67    13,395         644   14,936         725
FT Labuan              74         8        -         -     1,400         135        -           -
Total               5,236       373   10,318       648   103,678       7,804  125,438       8,545
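
The N/n ratio for each stage can be computed directly from the table. The sketch below does this for a few rows only. How the two ratios should be combined depends on whether the child totals refer to the whole stratum or only to the visited kindergartens, which the slide does not say, so the code reports the stage ratios separately. All names are illustrative.

```python
# (state, location): (kindergartens total, kindergartens visited,
#                     children total, children examined), from the table.
table = {
    ("FT Kuala Lumpur", "Urban"): (471, 34, 10_940, 687),
    ("Perlis", "Urban"): (65, 5, 1_007, 97),
    ("Perlis", "Rural"): (222, 7, 2_557, 113),
    ("Kedah", "Urban"): (164, 19, 1_913, 203),
    ("Kedah", "Rural"): (757, 69, 9_154, 846),
}


def stage_ratio(total: int, sampled: int) -> float:
    """Inverse sampling fraction N/n for one stage."""
    return total / sampled


ratios = {}
for stratum, (psu_total, psu_n, ssu_total, ssu_n) in table.items():
    ratios[stratum] = {
        "psu": stage_ratio(psu_total, psu_n),  # kindergartens: N1/n1
        "ssu": stage_ratio(ssu_total, ssu_n),  # children: N2/n2
    }

for stratum, r in ratios.items():
    print(f"{stratum}: PSU N/n = {r['psu']:.2f}, SSU N/n = {r['ssu']:.2f}")
```

If the child totals are stratum-wide, the children ratio alone already makes the weights sum to the stratum's child population. Strata with no rural area (shown as "-" in the table) simply have no entry.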

Preparing data for analysis
• Merge SW into dataset
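
Merging means attaching to every respondent the weight of the stratum they were sampled from. The practical does this in SPSS; the sketch below shows the same lookup in plain Python. The respondent records, field names, and function name are hypothetical, and the two weights reuse the under-18 urban values from the stratified example.

```python
def merge_weights(records, weights, key_fields):
    """Return copies of the records with the stratum weight added as 'sw'."""
    merged = []
    for rec in records:
        key = tuple(rec[field] for field in key_fields)
        if key not in weights:
            # A missing weight usually signals a coding mismatch between files.
            raise KeyError(f"no sampling weight for stratum {key}")
        merged.append({**rec, "sw": weights[key]})
    return merged


# Stratum-level weight table (one row per stratum).
weights = {
    ("District 1", "Urban"): 100.0,
    ("District 2", "Urban"): 200.0,
}

# Hypothetical respondent-level dataset.
records = [
    {"id": 1, "district": "District 1", "location": "Urban", "bmi": 17.2},
    {"id": 2, "district": "District 2", "location": "Urban", "bmi": 18.9},
    {"id": 3, "district": "District 1", "location": "Urban", "bmi": 16.5},
]

merged = merge_weights(records, weights, key_fields=("district", "location"))
for row in merged:
    print(row)
```

Failing loudly on an unmatched stratum is a deliberate choice: a silently missing weight would drop or distort those cases in the weighted analysis.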

Complex sample analysis
• Preparing the complex samples (CS) plan
• Analysis using SPSS