
1

Ch 5: Cluster sampling with equal probabilities

DEFN: A cluster is a group of observation units (or “elements”)

Population | Obs unit | Cluster
U.S. residents | person | household
Lincoln households | household | city block, or postal route
UNL employees | employee | department
Maple trees in Vermont | tree | 1 km x 1 km plot

2

Cluster sample

DEFN: A cluster sample is a probability sample in which a sampling unit is a cluster

Frame | SU | OU
List of phone numbers | phone number | person
List of blocks | block | household
List of UNL departments | department | faculty member
List of plots | plot | tree

3

Cluster sample – 2

1-stage cluster sampling
- Divide the population (of K elements) into N clusters (of size $M_i$ for cluster i)
  - Cluster = group of elements
  - An element belongs to 1 and only 1 cluster
- Sampling unit
  - Cluster = group of elements = PSU = primary sampling unit
  - We’ll start by assuming an SRS of clusters (equal prob)
  - Can use any design to select clusters (STS, PPS); we’ll work with other designs in Ch 6
- Data collection
  - Collect information on ALL elements in the cluster

4

1-stage CS vs. STS

[Figure: two panels, each showing a grid of cells with a sample of 40 elements]

STS: Take an SRS from every stratum
- A block of cells is a stratum
- SU is an element (or OU)
- Sample from every stratum

1-stage CS: Take an SRS of clusters; observe all elements within the clusters in the sample
- A block of cells is a cluster
- SU is a cluster
- Don’t sample from every cluster

5

Cluster vs. stratified sampling

Cluster sample
- Divide K elements into N clusters
- Cluster or PSU i has $M_i$ elements, so $K = \sum_{i=1}^{N} M_i$
- Take a sample of n clusters

Stratified sampling
- N elements divided into H strata
- An element belongs to 1 and only 1 stratum
- Take a sample of n elements, consisting of $n_h$ elements from stratum h for each of the H strata

6

Cluster sample – 3

2-stage cluster sampling (later)
- Process
  - Select PSUs (stage 1)
  - Select elements within each sampled PSU (stage 2)
- First stage sampling unit is a …
  - PSU = primary sampling unit = cluster
- Second stage sampling unit is a …
  - SSU = secondary sampling unit = element = OU
- Only collect data on the SSUs that were sampled from the cluster

7

1-stage vs. 2-stage cluster sampling

[Figure: three panels]
- Stage 1 of 2-stage cluster sample (select PSUs)
- 1-stage cluster sample (stop here): sample all SSUs in sampled PSUs
- OR Stage 2 of 2-stage cluster sample (select SSUs w/in PSUs): take an SRS of $m_i$ SSUs in sampled PSU i

8

Why use cluster sampling?

- May not have a list of OUs for a frame, but a list of clusters may be available
  - List of Lincoln phone numbers (= group of residents) is available, but a list of Lincoln residents is not available
  - List of all NE primary and secondary schools (= group of students) is available, but a list of all students in NE schools is not available
- May be cheaper to conduct the study if OUs are clustered
  - Occurs when cost of data collection increases with distance between elements
  - Household surveys using in-person interviews (household = cluster of people)
  - Field data collection (plot = cluster of plants, or animals)

9

Defining clusters due to frame limitations

- A cluster (or PSU) is a group of elements corresponding to a record (row) in the frame
- Example
  - Population = employees in McDonald’s franchises
  - Element = employee
  - Frame = list of McDonald’s stores
  - PSU = store = cluster of employees

10

Defining clusters to reduce travel costs

- A cluster (or PSU) is a group of nearby elements
- Example
  - Population = all farms
  - Element = farm
  - Frame = list of sections (1 mi x 1 mi areas) in rural area
  - PSU = section = cluster of farms

11

Cluster samples usually lead to less precise estimates

- Elements within clusters tend to be correlated due to exposure to similar conditions
  - Members of a household
  - Employees in a business
  - Plants or soil within a field plot
- We get less information than if we had selected the same number of unrelated elements
- Example: select a sample of city blocks (clusters of households) and ask each household: Should the city upgrade its storm sewer system?
  - PSU (city block) 1: no storm sewer, so households will tend to say yes
  - PSU (city block) 2: new development, so households will tend to say no

12

Defining clusters for improved precision

- Define clusters for which within-cluster variation is high (rarely possible)
  - Make each cluster as heterogeneous as possible
    - Like making each cluster a mini-population that reflects variation in the population
    - Minimizes the amount of correlation among elements in the cluster
  - Opposite of the approach to stratification
    - Large variation among strata, homogeneous within strata
- Define clusters that are relatively small
  - Extreme case is cluster = element
  - Decreases the number of correlated observations in the sample

13

Example for single-stage cluster sampling w/ equal prob (CSE1)

- Dorm has N = 100 suites (clusters)
- Each suite has $M_i = 4$ students (4 elements in cluster i, i = 1, 2, …, N)
- Note that there are $K = \sum_{i=1}^{N} M_i = 100(4) = 400$ students in the population
- Take an SRS of n = 5 suites (clusters)
- Ask each student living in each of the 5 suites: How many nights per week do you eat dinner in the dining hall?
- Will get observations from a sample of 20 students = 5 suites x 4 students/suite

14

Dorm example – 2

Student | Suite 6 | Suite 21 | Suite 28 | Suite 54 | Suite 89
1 | 5 | 3 | 6 | 5 | 1
2 | 5 | 2 | 4 | 4 | 4
3 | 4 | 4 | 4 | 6 | 3
4 | 6 | 5 | 5 | 6 | 2
Total | 20 | 14 | 19 | 21 | 10

15

Dorm example – 3

- SRS of n = 5 dorm rooms
- Data on each cluster (all students in dorm room)
  - $t_i$ = total number of dining hall dinners for dorm room i
  - $t_2 = 14$ dining hall dinners for 4 students in dorm room 2
- Estimated total number of dining hall nights for the dorm students
  - HT estimator of total = pop size x sample mean (of cluster totals)

$$\hat{t}_{unb} = N \cdot \frac{1}{n}\sum_{i=1}^{n} t_i = 100 \cdot \frac{1}{5}(20 + 14 + 19 + 21 + 10) = 100(16.8) = 1680 \text{ dining hall dinners}$$
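The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the dictionary layout and the variable names are illustrative choices, while the data values are those in the Dorm example table.

```python
# Dining-hall dinners per week for each student in the 5 sampled suites
# (values from the Dorm example table; the dict layout is an assumption).
suites = {
    6: [5, 5, 4, 6],
    21: [3, 2, 4, 5],
    28: [6, 4, 4, 5],
    54: [5, 4, 6, 6],
    89: [1, 4, 3, 2],
}
N = 100           # suites (clusters) in the population
n = len(suites)   # suites in the sample

# Cluster totals t_i, then the unbiased estimator (N/n) * sum of t_i
cluster_totals = [sum(y) for y in suites.values()]
t_hat_unb = N / n * sum(cluster_totals)

print(cluster_totals)
print(t_hat_unb)
```

The cluster totals should match the Total row of the table, and the estimate should match the 1680 dinners on the slide.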

16

Notation

- Indices
  - i = index for PSU i
  - j = index for SSU j in PSU i
- Number of PSUs (clusters) in the population: N clusters
- Number of SSUs (elements) in a PSU (cluster): $M_i$ elements
- Number of SSUs (elements) in the population: $K = \sum_{i=1}^{N} M_i$
  - In Chapters 1-4, this was designated as N elements

17

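Notation – 2

[Figure: population of N = 12 PSUs, labeled i = 1, 2, …, 12, each drawn as a group of SSUs; in PSU i = 9, the SSUs j = 1 and j = 7 are marked]

- N = 12 PSUs
- $M_1 = 20$ SSUs, $M_2 = 12$ SSUs, …, $M_{11} = 9$ SSUs, $M_{12} = 16$ SSUs
- $K = 20 + 12 + \dots + 9 + 16 = 150$ SSUs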

18

Notation – 3

- Response variable for SSU j in PSU i: $y_{ij}$
  - e.g., age of j-th resident in household i
  - e.g., whether or not dorm resident j in room i owns a computer

19

Cluster-level population parameters (for cluster i)

- Cluster size = $M_i$ elements
- Cluster population total

$$t_i = \sum_{j=1}^{M_i} y_{ij}$$

- Note that we observe the cluster population total (or mean or variance) for each sample cluster in 1-stage cluster sampling
- We will estimate cluster parameters in 2-stage cluster sampling

20

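Cluster-level population parameters (for cluster i) – 2

- Cluster population mean

$$\bar{y}_{iU} = \frac{1}{M_i}\sum_{j=1}^{M_i} y_{ij} = \frac{t_i}{M_i}$$

- Within-cluster variance

$$S_i^2 = \frac{1}{M_i - 1}\sum_{j=1}^{M_i} \left(y_{ij} - \bar{y}_{iU}\right)^2$$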

21

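[Figure: population and 1-stage cluster sample, with cluster-level parameters annotated for four clusters]

Cluster i | $M_i$ | $t_i$ | $\bar{y}_{iU}$ | $S_i^2$
1 | 20 | 99 | 4.95 | 7.00
2 | 12 boxes | 46 | 3.83 | 6.88
6 | 9 | 30 | 3.33 | 9.00
11 | 9 | 39 | 4.33 | 7.75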

22

Cluster-level population parameters (for cluster i) – 3

- For 1-stage cluster samples
  - Have a complete enumeration of the cluster elements
  - Cluster population parameters are known
- For 2-stage cluster samples
  - Observe data on a sample of elements in a cluster
  - Estimate cluster population parameters

23

Population parameters

- Same parameters as in previous chapters, rewritten in notation for cluster sampling
- Population size (** K was referred to as N in previous chapters)

$$K = \sum_{i=1}^{N} M_i \text{ elements}$$

- Population total (sum of all cluster totals)

$$t = \sum_{i=1}^{N}\sum_{j=1}^{M_i} y_{ij} = \sum_{i=1}^{N} t_i$$

24

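Population Parameters-2

- Population mean (of K elements)

$$\bar{y}_U = \frac{1}{K}\sum_{i=1}^{N}\sum_{j=1}^{M_i} y_{ij}$$

- Population variance (among K elements)

$$S^2 = \frac{1}{K - 1}\sum_{i=1}^{N}\sum_{j=1}^{M_i} \left(y_{ij} - \bar{y}_U\right)^2$$

- Variance among N cluster totals

$$S_t^2 = \frac{1}{N - 1}\sum_{i=1}^{N} \left(t_i - \frac{t}{N}\right)^2$$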

25

Data from cluster samples

- Work with element and cluster-level data
- Element data set will have columns for
  - Cluster id
  - Element id within cluster
  - Variable (y)
- Will also summarize this data set to generate cluster parameters (1-stage) or estimates of cluster parameters (2-stage)
  - Cluster id
  - Cluster total (or estimate)
  - Cluster mean (or estimate)
  - Cluster variance (or estimate)

26

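1-stage cluster sample

Element data

i | j | $y_{ij}$
1 | 1 | $y_{11}$
1 | 2 | $y_{12}$
1 | 3 | $y_{13}$
1 | 4 | $y_{14}$
2 | 1 | $y_{21}$
2 | 2 | $y_{22}$
2 | 3 | $y_{23}$
3 | 1 | $y_{31}$

Cluster summary

i | $t_i$ | $\bar{y}_{iU}$ | $S_i^2$
1 | $t_1$ | $\bar{y}_{1U}$ | $S_1^2$
2 | $t_2$ | $\bar{y}_{2U}$ | $S_2^2$
3 | $t_3$ | $\bar{y}_{3U}$ | $S_3^2$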
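One way to go from the element data set to the cluster summary is sketched below, using the dorm data as the element records. The tuple layout `(cluster id, element id, y)` and the dictionary keys are assumptions made for illustration; the slides do not prescribe a data format.

```python
from collections import defaultdict
from statistics import mean, variance

# Element-level records: (cluster id i, element id j, y_ij), dorm data.
records = [
    (6, 1, 5), (6, 2, 5), (6, 3, 4), (6, 4, 6),
    (21, 1, 3), (21, 2, 2), (21, 3, 4), (21, 4, 5),
    (28, 1, 6), (28, 2, 4), (28, 3, 4), (28, 4, 5),
    (54, 1, 5), (54, 2, 4), (54, 3, 6), (54, 4, 6),
    (89, 1, 1), (89, 2, 4), (89, 3, 3), (89, 4, 2),
]

# Group the y values by cluster id.
by_cluster = defaultdict(list)
for i, j, y in records:
    by_cluster[i].append(y)

# In a 1-stage sample every element of a sampled cluster is observed, so
# these are the cluster parameters themselves: size M_i, total t_i,
# mean ybar_iU, and variance S_i^2 (statistics.variance divides by M_i - 1).
summary = {
    i: {"M": len(ys), "t": sum(ys), "ybar": mean(ys), "S2": variance(ys)}
    for i, ys in by_cluster.items()
}

for i, row in summary.items():
    print(i, row)
```

For a 2-stage sample the same summary would hold estimates rather than known cluster parameters, since only a subset of each cluster is observed.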

27

Estimation for CSE1

- Chapter reading
  - Section 5.2.1 covers equal sized clusters ($M_i$ constant, read)
  - We’ll start with 5.2.3 (unequal sized clusters, $M_i$ varies)
  - Section 5.2.2 covers theory
- Two types of estimators
  - Unbiased: HT estimator
  - Ratio estimation
- Equal probability sample of clusters: assume SRS of clusters

28

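CSE1 unbiased estimation under SRS – total t

- Estimator for population total using data collected from a 1-stage cluster sample (SRS of clusters)

$$\hat{t}_{unb} = \frac{N}{n}\sum_{i=1}^{n} t_i$$

- Estimator of variance of $\hat{t}_{unb}$

$$\hat{V}(\hat{t}_{unb}) = N^2\left(1 - \frac{n}{N}\right)\frac{s_t^2}{n}, \quad \text{where } s_t^2 = \frac{1}{n - 1}\sum_{i=1}^{n}\left(t_i - \frac{\hat{t}_{unb}}{N}\right)^2$$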

29

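Dorm example – 4

- Estimated population total

$$\hat{t}_{unb} = \frac{N}{n}\sum_{i=1}^{n} t_i = \frac{100}{5}(20 + 14 + 19 + 21 + 10) = 100(16.8) = 1680 \text{ dining hall dinners}$$

- Estimated variance

$$s_t^2 = \frac{1}{n - 1}\sum_{i=1}^{n}\left(t_i - \frac{\hat{t}_{unb}}{N}\right)^2 = \frac{1}{5 - 1}\left[(20 - 16.8)^2 + \dots + (10 - 16.8)^2\right] = 21.7$$

$$\hat{V}(\hat{t}_{unb}) = N^2\left(1 - \frac{n}{N}\right)\frac{s_t^2}{n} = 100^2\left(1 - \frac{5}{100}\right)\frac{21.7}{5} = 41{,}230$$

$$SE(\hat{t}_{unb}) = \sqrt{41{,}230} = 203.05$$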
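The variance calculation can be reproduced as follows. This is a sketch with illustrative variable names; it relies on the fact that the sample mean of the cluster totals equals $\hat{t}_{unb}/N$, so the ordinary sample variance of the totals is $s_t^2$.

```python
from math import sqrt
from statistics import variance

N, n = 100, 5
t = [20, 14, 19, 21, 10]   # cluster totals for the 5 sampled suites

t_hat_unb = N / n * sum(t)

# s_t^2: sample variance of the cluster totals (divisor n - 1).
s2_t = variance(t)

# Estimated variance of the total, with the finite population correction.
v_hat = N**2 * (1 - n / N) * s2_t / n
se = sqrt(v_hat)

print(s2_t, v_hat, round(se, 2))
```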

30

CSE1 inclusion probability for an element

- Two events: A and B
  - Pr{A and B both occur} = Pr{A occurs} x Pr{B occurs given A occurs}
- In our setting
  - A = sample cluster i
  - B = sample element j (in cluster i)
- Inclusion probability for element j in cluster i
  - $\pi_{ij}$ = Pr{including element j and cluster i in sample} = Pr{including cluster i in sample} x Pr{including element j given cluster i has been included in sample}

31

CSE1 inclusion probability for an element – 2

- Need two pieces
  - Pr{including cluster i in sample} = n / N
  - Pr{including element j given cluster i has been included in sample} = 1
- Inclusion probability
  - $\pi_{ij}$ = Pr{including element j and cluster i in sample} = Pr{including cluster i in sample} x Pr{including element j given cluster i has been included in sample} = (n / N) x 1 = n / N

32

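CSE1 weight for an element

- Weight for element j in cluster i
  - Inverse of the element inclusion probability: $w_{ij} = 1/\pi_{ij} = N/n$
- Estimator using weights

$$\hat{t}_{unb} = \sum_{i=1}^{n}\sum_{j=1}^{M_i} w_{ij}\, y_{ij} = \frac{N}{n}\sum_{i=1}^{n}\sum_{j=1}^{M_i} y_{ij} = \frac{N}{n}\sum_{i=1}^{n} t_i$$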

33

Dorm example – 5

- Inclusion probability for student j in dorm room i
  - N = 100 dorm rooms
  - n = 5 sample dorm rooms
  - Take all 4 students in dorm room
  - $\pi_{ij} = n/N = 1/20 = 0.05$
- Weight for student j in dorm room i
  - $w_{ij} = N/n = 20$ students
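As a quick check that the weighted form of the estimator agrees with the cluster-total form, the sketch below applies the weight of 20 to every sampled student. The list-of-lists layout is an illustrative assumption.

```python
# y_ij for the 5 sampled dorm rooms (all 4 students observed in each).
y = [
    [5, 5, 4, 6],   # room 6
    [3, 2, 4, 5],   # room 21
    [6, 4, 4, 5],   # room 28
    [5, 4, 6, 6],   # room 54
    [1, 4, 3, 2],   # room 89
]
N, n = 100, 5

pi_ij = n / N   # element inclusion probability, the same for every student
w_ij = N / n    # sampling weight = 1 / pi_ij

# Weighted sum over every sampled element.
t_hat_unb = sum(w_ij * y_ij for room in y for y_ij in room)

# Each sampled student "represents" 20 students, so the weights of the
# 20 sampled students add up to the population size K.
sum_of_weights = w_ij * sum(len(room) for room in y)

print(t_hat_unb, sum_of_weights)
```

That the weights sum to K = 400 here is a consequence of the equal cluster sizes; with unequal $M_i$ the sum of weights would only estimate K.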

34

CSE1 unbiased estimation under SRS – mean

- Unbiased estimator for population mean $\bar{y}_U$
  - For SRS, estimator for total divided by number of population elements (OUs)
  - Units are y-units per element

$$\hat{\bar{y}}_{unb} = \frac{1}{K}\,\hat{t}_{unb}, \qquad \hat{V}(\hat{\bar{y}}_{unb}) = \frac{1}{K^2}\,\hat{V}(\hat{t}_{unb})$$

35

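Dorm example – 6

$$\hat{\bar{y}}_{unb} = \frac{\hat{t}_{unb}}{K} = \frac{1680}{100(4)} = 4.20 \text{ dining hall dinners per student per week}$$

$$\hat{V}(\hat{\bar{y}}_{unb}) = \frac{\hat{V}(\hat{t}_{unb})}{K^2} = \frac{41{,}230}{400^2} = 0.257688$$

$$SE(\hat{\bar{y}}_{unb}) = 0.51$$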
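The step from total to mean is a rescaling by K, which the following sketch makes explicit. The inputs 1680 and 41,230 are the values from the earlier dorm slides; the variable names are illustrative.

```python
from math import sqrt

K = 100 * 4          # students in the population
t_hat_unb = 1680     # estimated total, from the earlier dorm slide
v_hat_t = 41_230     # estimated variance of the total

y_bar_hat = t_hat_unb / K      # dinners per student per week
v_hat_y = v_hat_t / K**2       # variance scales by 1 / K^2
se_y = sqrt(v_hat_y)

print(y_bar_hat, v_hat_y, round(se_y, 2))
```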

36

Unbiased estimation – proportion p

What is y?

37

Ratio estimation

- Usually $t_i$ (cluster total) is correlated with $M_i$ (cluster size)
  - As $M_i$ (# SSUs/elements in cluster i) increases, value for $t_i$ (total of $y_{ij}$ for cluster i) increases
  - Positive correlation between $M_i$ and $t_i$
  - No intercept
- Perfect conditions for SRS ratio estimator

Notation of Ch 3 | Notation of Ch 5
$y_i$ (variable of interest) | $t_i$ (cluster total)
$x_i$ (auxiliary info) | $M_i$ (cluster size)

38

Ratio estimation for CSE1

- Estimator for population mean
  - Units are y-units per element

$$\hat{\bar{y}}_r = \frac{\sum_{i=1}^{n} t_i}{\sum_{i=1}^{n} M_i}$$

39

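Ratio estimation for CSE1 – 2

- Estimator for variance of ratio estimator of population mean

$$\hat{V}(\hat{\bar{y}}_r) = \left(1 - \frac{n}{N}\right)\frac{1}{n\bar{M}_U^2}\cdot\frac{\sum_{i=1}^{n}\left(t_i - \hat{\bar{y}}_r M_i\right)^2}{n - 1} = \left(1 - \frac{n}{N}\right)\frac{1}{n\bar{M}_U^2}\cdot\frac{\sum_{i=1}^{n} M_i^2\left(\bar{y}_{iU} - \hat{\bar{y}}_r\right)^2}{n - 1}$$

- $\bar{M}_U$ is the average cluster size for the population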

40

Ratio estimation for CSE1 – 3

- Average cluster size

$$\bar{M}_U = \frac{1}{N}\sum_{i=1}^{N} M_i = \frac{K}{N}$$

- If unknown, can estimate with sample mean of cluster sizes

$$\bar{M}_S = \frac{1}{n}\sum_{i=1}^{n} M_i$$

41

Dorm example – 7

- Estimated population mean

$$\hat{\bar{y}}_r = \frac{\sum_{i=1}^{n} t_i}{\sum_{i=1}^{n} M_i}$$

- Average cluster size

$$\bar{M}_U = \frac{1}{N}\sum_{i=1}^{N} M_i = \frac{K}{N}$$

42

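Dorm example – 8

- Estimated variance

$$\hat{V}(\hat{\bar{y}}_r) = \left(1 - \frac{n}{N}\right)\frac{1}{n\bar{M}_U^2}\cdot\frac{\sum_{i=1}^{n}\left(t_i - \hat{\bar{y}}_r M_i\right)^2}{n - 1} = \left(1 - \frac{n}{N}\right)\frac{1}{n\bar{M}_U^2}\cdot\frac{\sum_{i=1}^{n} M_i^2\left(\bar{y}_{iU} - \hat{\bar{y}}_r\right)^2}{n - 1}$$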

43

Ratio estimation for CSE1 – 4

- Estimator for population total

$$\hat{t}_r = K\,\hat{\bar{y}}_r, \qquad \hat{V}(\hat{t}_r) = K^2\,\hat{V}(\hat{\bar{y}}_r)$$

44

Dorm example – 9

- Estimated population total

$$\hat{t}_r = K\,\hat{\bar{y}}_r$$

- Estimated variance

$$\hat{V}(\hat{t}_r) = K^2\,\hat{V}(\hat{\bar{y}}_r)$$
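The ratio-estimation formulas for the 1-stage dorm sample (Dorm examples 7 to 9) can be evaluated as in the sketch below. The variable names are illustrative. Because every suite has the same size, the results should coincide with the unbiased estimates computed earlier, which is a useful sanity check.

```python
N, n = 100, 5
K = 400                     # known population size (100 suites x 4 students)
M = [4, 4, 4, 4, 4]         # cluster sizes M_i for the sampled suites
t = [20, 14, 19, 21, 10]    # cluster totals t_i

M_bar_U = K / N             # average cluster size in the population

# Ratio estimator of the population mean: sum of t_i over sum of M_i.
y_r = sum(t) / sum(M)

# Residual sum of squares around the ratio line through the origin.
ss_resid = sum((ti - y_r * Mi) ** 2 for ti, Mi in zip(t, M))

# Estimated variance of the ratio estimator of the mean.
v_y_r = (1 - n / N) * ss_resid / ((n - 1) * n * M_bar_U**2)

# Ratio estimator of the total and its estimated variance.
t_r = K * y_r
v_t_r = K**2 * v_y_r

print(y_r, v_y_r, t_r, v_t_r)
```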

45

CSE1: impact of cluster size

- If cluster sizes $M_i$ are variable across clusters, we generally estimate the population parameter with less precision
  - If $t_i$ is related to $M_i$, then we get large variation among cluster totals if $M_i$ is variable
  - Variance of the population parameter estimator (unbiased or ratio) is a function of variation among cluster totals

46

2-stage equal probability cluster sampling (CSE2)

- CSE2 has 2 stages of sampling
  - Stage 1. Select SRS of n PSUs from population of N PSUs
  - Stage 2. Select SRS of $m_i$ SSUs from $M_i$ elements in PSU i sampled in stage 1

47

2-stage cluster sampling

[Figure: panels]
- Stage 1 of 2-stage cluster sample (select PSUs)
- 1-stage alternative: sample all SSUs in sampled PSUs
- Stage 2 of 2-stage cluster sample (select SSUs w/in PSUs): take an SRS of $m_i$ SSUs in sampled PSU i

48

Motivation for 2-stage cluster samples

- Recall motivations for cluster sampling in general
  - Only have access to a frame that lists clusters
  - Reduce data collection costs by going to groups of nearby elements (cluster defined by proximity)

49

Motivation for 2-stage cluster samples – 2

- Likely that elements in a cluster will be correlated
  - May be inefficient to observe all elements in a sample PSU
  - Extra effort required to fully enumerate a PSU does not generate that much extra information
- May be better to spend resources to sample many PSUs and a small number of SSUs per PSU
  - Possible opposing force: study costs associated with going to many clusters

50

CSE2 unbiased estimation for population total t

- Have a sample of elements from a cluster
- We no longer know the value of the cluster parameter, $t_i$
- Estimate $t_i$ using data observed for the $m_i$ SSUs

$$\hat{t}_i = M_i\,\bar{y}_i = \frac{M_i}{m_i}\sum_{j=1}^{m_i} y_{ij}$$

51

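CSE2 unbiased estimation for population total – 2

- Approach is to plug estimated cluster totals into the CSE1 formula
- CSE1

$$\hat{t}_{unb} = \frac{N}{n}\sum_{i=1}^{n} t_i = \frac{N}{n}\sum_{i=1}^{n} M_i\,\bar{y}_{iU}$$

- CSE2

$$\hat{t}_{unb} = \frac{N}{n}\sum_{i=1}^{n} \hat{t}_i = \frac{N}{n}\sum_{i=1}^{n} M_i\,\bar{y}_i$$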

52

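CSE2 unbiased estimation for population total – 3

The variance of $\hat{t}_{unb}$ has 2 components associated with the 2 sampling stages
1. Variation among PSUs
2. Variation among SSUs within PSUs

$$\hat{V}(\hat{t}_{unb}) = \underbrace{N^2\left(1 - \frac{n}{N}\right)\frac{s_t^2}{n}}_{\text{among PSU}} + \underbrace{\frac{N}{n}\sum_{i=1}^{n}\left(1 - \frac{m_i}{M_i}\right)M_i^2\,\frac{s_i^2}{m_i}}_{\text{within PSU}}$$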

53

CSE2 unbiased estimation for population total – 4

- In CSE1, we observe all elements in a cluster
  - We know $t_i$
  - Have variance component 1, but no component 2
- In CSE2, we sample a subset of elements in a cluster
  - We estimate $t_i$ with $\hat{t}_i$
  - Component 2 is a function of the estimated variance for $\hat{t}_i$

$$\hat{V}(\hat{t}_i) = M_i^2\left(1 - \frac{m_i}{M_i}\right)\frac{s_i^2}{m_i}$$

54

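CSE2 unbiased estimation for population total – 5

- Estimated variance among cluster totals

$$s_t^2 = \frac{1}{n - 1}\sum_{i=1}^{n}\left(\hat{t}_i - \frac{\hat{t}_{unb}}{N}\right)^2$$

- Estimated variance among elements in a cluster

$$s_i^2 = \frac{1}{m_i - 1}\sum_{j=1}^{m_i}\left(y_{ij} - \bar{y}_i\right)^2$$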

55

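CSE2 unbiased estimation for population total – 6

$$\hat{V}(\hat{t}_{unb}) = N^2\left(1 - \frac{n}{N}\right)\frac{s_t^2}{n} + \frac{N}{n}\sum_{i=1}^{n}\left(1 - \frac{m_i}{M_i}\right)M_i^2\,\frac{s_i^2}{m_i}$$

$$s_t^2 = \frac{1}{n - 1}\sum_{i=1}^{n}\left(\hat{t}_i - \frac{\hat{t}_{unb}}{N}\right)^2$$

$$s_i^2 = \frac{1}{m_i - 1}\sum_{j=1}^{m_i}\left(y_{ij} - \bar{y}_i\right)^2$$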

56

Dorm example – 10

Stage 2: select 2 students in each room

Student | Rm 6 | Rm 21 | Rm 28 | Rm 54 | Rm 89
1 | 5 | 3 | 6 | 5 | 1
2 | 5 | 2 | 4 | 4 | 4
3 | 4 | 4 | 4 | 6 | 3
4 | 6 | 5 | 5 | 6 | 2
Total | ? | ? | ? | ? | ?

57

Dorm example – 11

- Stage 1
  - Cluster = ?
  - N = ?
  - n = ?
  - SRS
- Stage 2
  - Element = ?
  - $M_i$ = ?
  - $m_i$ = ?
  - SRS
- $\hat{t}_i$ = ?

58

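Dorm example – 12

Student (j) | Rm 6 (i=1) | Rm 21 (i=2) | Rm 28 (i=3) | Rm 54 (i=4) | Rm 89 (i=5)
1 | 5 | 3 | 4 | 5 | 4
2 | 6 | 2 | 5 | 4 | 2
$\bar{y}_i$ | ? | ? | ? | ? | ?
$\hat{t}_i = M_i\,\bar{y}_i$ | ? | ? | ? | ? | ?
$s_i^2$ | ? | ? | ? | ? | ?

$$s_i^2 = \frac{1}{m_i - 1}\sum_{j=1}^{m_i}\left(y_{ij} - \bar{y}_i\right)^2$$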

59

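Dorm example – 13

$$\hat{t}_{unb} = \frac{N}{n}\sum_{i=1}^{n} \hat{t}_i$$

$$s_t^2 = \frac{1}{n - 1}\sum_{i=1}^{n}\left(\hat{t}_i - \frac{\hat{t}_{unb}}{N}\right)^2$$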

60

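Dorm example – 14

$$\hat{V}(\hat{t}_{unb}) = N^2\left(1 - \frac{n}{N}\right)\frac{s_t^2}{n} + \frac{N}{n}\sum_{i=1}^{n}\left(1 - \frac{m_i}{M_i}\right)M_i^2\,\frac{s_i^2}{m_i}$$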

61

CSE2 unbiased estimation for population mean $\bar{y}_U$

$$\hat{\bar{y}}_{unb} = \frac{\hat{t}_{unb}}{K}, \qquad \hat{V}(\hat{\bar{y}}_{unb}) = \frac{\hat{V}(\hat{t}_{unb})}{K^2}$$

62

Dorm example – 15

$$\hat{\bar{y}}_{unb} = \frac{\hat{t}_{unb}}{K}, \qquad \hat{V}(\hat{\bar{y}}_{unb}) = \frac{\hat{V}(\hat{t}_{unb})}{K^2}$$
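The slides leave the 2-stage dorm calculations (Dorm examples 12 to 15) to be filled in. One way to carry them out is sketched below, using the 2 students sampled per room and the CSE2 formulas for the unbiased estimator. The variable names and list layout are illustrative assumptions, not part of the slides.

```python
from math import sqrt
from statistics import mean, variance

N, n = 100, 5
K = 400                      # students in the population
M = [4, 4, 4, 4, 4]          # students per sampled room (M_i)
sample = [[5, 6], [3, 2], [4, 5], [5, 4], [4, 2]]   # y_ij, rooms 6, 21, 28, 54, 89

m = [len(ys) for ys in sample]          # m_i
y_bar = [mean(ys) for ys in sample]     # sample mean per room
s2 = [variance(ys) for ys in sample]    # s_i^2, divisor m_i - 1
t_hat = [Mi * yb for Mi, yb in zip(M, y_bar)]   # estimated room totals

# Unbiased estimator of the population total.
t_hat_unb = N / n * sum(t_hat)

# Among-PSU component: variance of the estimated totals.
s2_t = variance(t_hat)
among = N**2 * (1 - n / N) * s2_t / n

# Within-PSU component: sampling variance of each estimated total.
within = N / n * sum(
    (1 - mi / Mi) * Mi**2 * s2i / mi for mi, Mi, s2i in zip(m, M, s2)
)

v_hat_t = among + within

# Mean per student and its estimated variance.
y_bar_hat = t_hat_unb / K
v_hat_y = v_hat_t / K**2

print(t_hat, t_hat_unb)
print(among, within, v_hat_t, round(sqrt(v_hat_t), 2))
print(y_bar_hat, v_hat_y)
```

In this sample the among-PSU component is far larger than the within-PSU component, which is consistent with the earlier remark that precision depends mainly on the number of PSUs sampled.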

63

CSE2 inclusion probability for an element

- Two events: A and B
  - Pr{A and B both occur} = Pr{A occurs} x Pr{B occurs | A occurs}
  - “|” denotes “given” (a condition)
- In our setting
  - A = sample cluster i
  - B = sample element j
- Inclusion probability symbols
  - $\pi_{ij}$ = Pr{including element j and cluster i in sample}
  - $\pi_i$ = Pr{including cluster i in sample}
  - $\pi_{j|i}$ = Pr{including element j | cluster i has been included in sample}

64

CSE2 inclusion probability for an element – 2

- Need two pieces
  - $\pi_i$ = Pr{including cluster i in sample} = n / N
  - $\pi_{j|i}$ = Pr{including element j | cluster i has been included in sample} = $m_i / M_i$
- Inclusion probability for element j in cluster i

$$\pi_{ij} = \pi_i\,\pi_{j|i} = \frac{n}{N}\cdot\frac{m_i}{M_i}$$

65

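CSE2 weight for an element

- Sampling weight for element j in cluster i

$$w_{ij} = \frac{1}{\pi_{ij}} = \frac{N}{n}\cdot\frac{M_i}{m_i}$$

- Estimator for population total (the inner sums run over the $m_i$ sampled elements in cluster i)

$$\hat{t}_{unb} = \sum_{i=1}^{n}\sum_{j=1}^{m_i} w_{ij}\,y_{ij} = \sum_{i=1}^{n}\sum_{j=1}^{m_i} \frac{N}{n}\cdot\frac{M_i}{m_i}\,y_{ij} = \frac{N}{n}\sum_{i=1}^{n}\frac{M_i}{m_i}\sum_{j=1}^{m_i} y_{ij} = \frac{N}{n}\sum_{i=1}^{n} M_i\,\bar{y}_i = \frac{N}{n}\sum_{i=1}^{n}\hat{t}_i$$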
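Applied to the 2-stage dorm sample, the weight formula gives each sampled student a weight of (100/5)(4/2) = 40. The sketch below, with illustrative names, checks that the weighted sum reproduces the estimate obtained from the estimated cluster totals.

```python
N, n = 100, 5
M = [4, 4, 4, 4, 4]                                  # M_i
sample = [[5, 6], [3, 2], [4, 5], [5, 4], [4, 2]]    # y_ij for sampled students

# Weight for every sampled element: w_ij = (N / n) * (M_i / m_i).
weights = [[N / n * Mi / len(ys)] * len(ys) for Mi, ys in zip(M, sample)]

# Weighted sum of the observations.
t_hat_unb = sum(
    w * y
    for ws, ys in zip(weights, sample)
    for w, y in zip(ws, ys)
)

# Same estimate from the estimated cluster totals t_hat_i = M_i * ybar_i.
t_hat_from_totals = N / n * sum(Mi * sum(ys) / len(ys) for Mi, ys in zip(M, sample))

print(weights[0], t_hat_unb, t_hat_from_totals)
```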

66

What does equal probability mean in Ch 5?

- Clusters (PSUs) sampled using SRS
- Equal inclusion probability for stage 1 PSUs (clusters)
  - $\pi_i = n/N$ is the same for all i

67

What does equal probability mean in Ch 5? – 2

- Elements (SSUs) in a given PSU are sampled using SRS
- All elements (j) in a sample PSU (i) are selected with equal probability
  - This is a conditional probability (given PSU i)
  - For a given PSU i, $\pi_{j|i} = m_i/M_i$ is the same for all elements j

68

What does equal probability mean in Ch 5? – 3

- Note that equal probability at stage 1 ($\pi_i$) plus equal probability at stage 2 given PSU i ($\pi_{j|i}$) does NOT imply equal inclusion probability for an element
- In fact, the element-level (unconditional) inclusion probability is not necessarily constant
  - Depends on cluster size $M_i$ and sample size $m_i$ for the cluster to which the element belongs

$$\pi_{ij} = \frac{n}{N}\cdot\frac{m_i}{M_i}$$
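A small numeric illustration of this point follows. The cluster sizes below are hypothetical (they do not come from the slides); the design takes the same number of SSUs, $m_i = 2$, from clusters of different sizes, and exact fractions are used so the probabilities print cleanly.

```python
from fractions import Fraction

N, n = 100, 5
M = [4, 8, 20]   # hypothetical cluster sizes M_i (assumption for illustration)
m = [2, 2, 2]    # same second-stage sample size in every cluster

pi_i = Fraction(n, N)   # stage 1: identical for every cluster

# Unconditional element inclusion probability: pi_i * (m_i / M_i).
pi_ij = [pi_i * Fraction(mi, Mi) for mi, Mi in zip(m, M)]

# Sampling weights are the inverse inclusion probabilities.
weights = [1 / p for p in pi_ij]

for Mi, p, w in zip(M, pi_ij, weights):
    print(f"M_i = {Mi:2d}: pi_ij = {p}, weight = {w}")
```

Elements in larger clusters have smaller inclusion probabilities and therefore larger weights, even though both stages use SRS.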

69

CSE2 ratio estimation for population mean $\bar{y}_U$

$$\hat{\bar{y}}_r = \frac{\sum_{i=1}^{n} \hat{t}_i}{\sum_{i=1}^{n} M_i} = \frac{\sum_{i=1}^{n} M_i\,\bar{y}_i}{\sum_{i=1}^{n} M_i}$$

70

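CSE2 ratio estimation for population mean – 2

$$\hat{V}(\hat{\bar{y}}_r) = \frac{1}{\bar{M}_U^2}\left[\left(1 - \frac{n}{N}\right)\frac{s_r^2}{n} + \frac{1}{nN}\sum_{i=1}^{n} M_i^2\left(1 - \frac{m_i}{M_i}\right)\frac{s_i^2}{m_i}\right]$$

$$s_r^2 = \frac{1}{n - 1}\sum_{i=1}^{n}\left(M_i\,\bar{y}_i - M_i\,\hat{\bar{y}}_r\right)^2 = \frac{1}{n - 1}\sum_{i=1}^{n} M_i^2\left(\bar{y}_i - \hat{\bar{y}}_r\right)^2$$

$\bar{M}_U$ can be estimated by the sample mean of the $M_i$:

$$\bar{M}_S = \frac{1}{n}\sum_{i=1}^{n} M_i$$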

71

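Dorm example – 16

Student (j) | Rm 6 (i=1) | Rm 21 (i=2) | Rm 28 (i=3) | Rm 54 (i=4) | Rm 89 (i=5)
1 | 5 | 3 | 4 | 5 | 4
2 | 6 | 2 | 5 | 4 | 2
$\bar{y}_i$ | 5.5 | 2.5 | 4.5 | 4.5 | 3.0
$\hat{t}_i = M_i\,\bar{y}_i$ | 22 | 10 | 18 | 18 | 12
$s_i^2$ | 0.5 | 0.5 | 0.5 | 0.5 | 2.0

$$s_i^2 = \frac{1}{m_i - 1}\sum_{j=1}^{m_i}\left(y_{ij} - \bar{y}_i\right)^2$$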

72

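Dorm example – 16

$$\hat{\bar{y}}_r = \frac{\sum_{i=1}^{n} \hat{t}_i}{\sum_{i=1}^{n} M_i}$$

$$s_r^2 = \frac{1}{n - 1}\sum_{i=1}^{n} M_i^2\left(\bar{y}_i - \hat{\bar{y}}_r\right)^2$$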

73

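Dorm example – 17

$$\hat{V}(\hat{\bar{y}}_r) = \frac{1}{\bar{M}_U^2}\left[\left(1 - \frac{n}{N}\right)\frac{s_r^2}{n} + \frac{1}{nN}\sum_{i=1}^{n} M_i^2\left(1 - \frac{m_i}{M_i}\right)\frac{s_i^2}{m_i}\right]$$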

74

CSE2 ratio estimation for population total t

$$\hat{t}_r = K\,\hat{\bar{y}}_r, \qquad \hat{V}(\hat{t}_r) = K^2\,\hat{V}(\hat{\bar{y}}_r)$$

75

Dorm example – 18

$$\hat{t}_r = K\,\hat{\bar{y}}_r, \qquad \hat{V}(\hat{t}_r) = K^2\,\hat{V}(\hat{\bar{y}}_r)$$
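The ratio-estimation steps for the 2-stage dorm sample (the last few dorm slides) can be worked through as in this sketch. The names are illustrative, and $\bar{M}_U = K/N = 4$ is taken as known. With equal room sizes, the results should agree with the unbiased CSE2 estimates.

```python
from statistics import mean, variance

N, n = 100, 5
K = 400
M = [4, 4, 4, 4, 4]                                  # M_i
sample = [[5, 6], [3, 2], [4, 5], [5, 4], [4, 2]]    # y_ij

m = [len(ys) for ys in sample]
y_bar = [mean(ys) for ys in sample]
s2 = [variance(ys) for ys in sample]
t_hat = [Mi * yb for Mi, yb in zip(M, y_bar)]

M_bar_U = K / N

# Ratio estimator of the population mean.
y_r = sum(t_hat) / sum(M)

# s_r^2: variation of M_i * ybar_i around M_i * y_r.
s2_r = sum(Mi**2 * (yb - y_r) ** 2 for Mi, yb in zip(M, y_bar)) / (n - 1)

# Second-stage (within-PSU) term.
within = sum(
    Mi**2 * (1 - mi / Mi) * s2i / mi for Mi, mi, s2i in zip(M, m, s2)
) / (n * N)

v_y_r = ((1 - n / N) * s2_r / n + within) / M_bar_U**2

# Ratio estimator of the total and its estimated variance.
t_r = K * y_r
v_t_r = K**2 * v_y_r

print(y_r, s2_r, v_y_r, t_r, v_t_r)
```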

76

Coots egg example

- Target pop = American coot eggs in Minnedosa, Manitoba
- PSU / cluster = clutch (nest)
- SSU / element = egg w/in clutch
- Stage 1
  - SRS of n = 184 clutches
  - N = ??? clutches, but probably pretty large
- Stage 2
  - SRS of $m_i = 2$ from $M_i$ eggs in a clutch
  - Do not know K = ??? eggs in population, also large
  - Can count $M_i$ = # eggs in sampled clutch i
- Measurement
  - $y_{ij}$ = volume of egg j from clutch i

77

Coots egg example – 2

[Figure: scatter plot of egg volume $y_{ij}$ vs. clutch id i]

- Scatter plot of volumes vs. i (clutch id)
  - Double dot pattern: high correlation among eggs WITHIN a clutch
  - Quite a bit of clutch-to-clutch variation
- Implies
  - May not have very high precision unless we sample a large number of clutches
  - Certainly lower precision than if we obtained an SRS of $\sum_{i=1}^{n} m_i = 368$ eggs
- Could use a side-by-side plot for data with larger cluster sizes (PROC UNIVARIATE w/ BY CLUSTER and PLOTS option)

78

Coots egg example – 3

[Figure: $y_{ij}$ vs. clutch i sorted by $\bar{y}_i$, with a line joining the 2 eggs in each clutch]

- Plot
  - Rank the mean egg volume for clutch i, $\bar{y}_i$
  - Plot $y_{ij}$ vs. rank for clutch i
  - Draw a line between $y_{i1}$ and $y_{i2}$ to show how close the 2 egg volumes in a clutch are
- Observations
  - Same results as Fig 5.3, but more clear
  - Small within-cluster variation
  - Large between-cluster variation
  - Also see 1 clutch with large WITHIN-clutch variation: check data (i = 88)

79

Coots egg example – 4

[Figure: $s_i$ vs. $\bar{y}_i$]

- Plot $s_i$ vs. $\bar{y}_i$ for clutch i
- Since volumes are always positive, might expect $s_i$ to increase as $\bar{y}_i$ gets larger
  - If $\bar{y}_i$ is very small, $y_{i1}$ and $y_{i2}$ are likely to be very small and close, giving a small $s_i$
  - See this to a moderate degree
- Clutch 88 has large $s_i$, as noted in previous plot

80

Coots egg example – 5

- Estimation goal
  - Estimate $\bar{y}_U$, population mean volume per coot egg in Minnedosa, Manitoba
- What estimator?
  - Unbiased estimation
    - Don’t know N = total number of clutches or K = total number of eggs in Minnedosa, Manitoba
  - Ratio estimation
    - Only requires knowledge of $M_i$, number of eggs in selected clutch i, in addition to data collected
    - May want to plot $\hat{t}_i$ versus $M_i$

81

Coots egg example – 6

Clutch | $M_i$ | $\bar{y}_i$ | $s_i^2$ | $\hat{t}_i$ | $\left(1 - \frac{2}{M_i}\right)M_i^2\frac{s_i^2}{m_i}$ | $\left(\hat{t}_i - M_i\hat{\bar{y}}_r\right)^2$
1 | 13 | 3.86 | 0.0094 | 50.23594 | 0.671901 | 318.9232
2 | 13 | 4.19 | 0.0009 | 54.52438 | 0.065615 | 490.4832
3 | 6 | 0.92 | 0.0005 | 5.49750 | 0.005777 | 89.22633
4 | 11 | 3.00 | 0.0008 | 32.98168 | 0.039354 | 31.19576
5 | 10 | 2.50 | 0.0002 | 24.95708 | 0.006298 | 0.002631
6 | 13 | 3.98 | 0.0003 | 51.79537 | 0.023622 | 377.053
7 | 9 | 1.93 | 0.0051 | 17.34362 | 0.159441 | 25.72099
8 | 11 | 2.96 | 0.0051 | 32.57679 | 0.253589 | 26.83682
9 | 12 | 3.46 | 0.0001 | 41.52695 | 0.006396 | 135.4898
10 | 11 | 2.96 | 0.0224 | 32.57679 | 1.108664 | 26.83682
… | … | … | … | … | … | …
180 | 9 | 1.95 | 0.0001 | 17.51918 | 0.002391 | 23.97106
181 | 12 | 3.45 | 0.0017 | 41.43934 | 0.102339 | 133.4579
182 | 13 | 4.22 | 0.00003 | 54.85854 | 0.002625 | 505.3962
183 | 13 | 4.41 | 0.0088 | 57.39262 | 0.630563 | 625.7549
184 | 12 | 3.48 | 0.000006 | 41.81168 | 0.000400 | 142.1994
sum | 1757 | | | 4375.947 | 42.17445 | 11,439.58
var | | | | 149.565814 | |

$\hat{\bar{y}}_r = 2.490579$

82

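Coots egg example – 7

$$\hat{\bar{y}}_r = \frac{\sum_{i \in S}\hat{t}_i}{\sum_{i \in S} M_i} = \frac{4375.947}{1757} = 2.49$$

$$s_r^2 = \frac{1}{n - 1}\sum_{i \in S}\left(\hat{t}_i - M_i\,\hat{\bar{y}}_r\right)^2 = \frac{11{,}439.58}{183} = 62.511$$

$$\bar{M}_S = \frac{1757}{184} = 9.549$$

$$\hat{V}(\hat{\bar{y}}_r) = \frac{1}{9.549^2}\left[\left(1 - \frac{184}{N}\right)\frac{62.511}{184} + \frac{1}{N}\cdot\frac{42.17}{184}\right]$$

$$SE(\hat{\bar{y}}_r) \approx \frac{1}{9.549}\sqrt{\frac{62.511}{184}} = 0.061$$

- Don’t know $\bar{M}_U$: use $\bar{M}_S$
- Don’t know N, but assumed large, so FPC ≈ 1
- 2nd term is very small, so the approximate SE ignores the 2nd term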
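The approximate standard error can be recomputed from the column sums in the coots table. The sketch below uses only those published sums (the full data set of 184 clutches is not reproduced in the slides) and follows the slide in treating the FPC as 1 and dropping the second term. Variable names are illustrative.

```python
from math import sqrt

n = 184                    # sampled clutches
sum_M = 1757               # sum of M_i over sampled clutches
sum_t_hat = 4375.947       # sum of estimated clutch totals
sum_sq_resid = 11439.58    # sum of (t_hat_i - M_i * y_r)^2

y_r = sum_t_hat / sum_M            # ratio estimate of mean egg volume
s2_r = sum_sq_resid / (n - 1)
M_bar_S = sum_M / n                # sample mean clutch size, stands in for M_bar_U

# N is unknown but assumed large: FPC ~ 1 and the within-clutch term ~ 0.
se_approx = sqrt(s2_r / n) / M_bar_S

print(round(y_r, 2), round(s2_r, 3), round(M_bar_S, 3), round(se_approx, 3))
```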

83

Coots egg example – 8

- What is the first-stage PSU inclusion probability?
- What is the conditional SSU inclusion probability at the second stage?
- What is the unconditional SSU inclusion probability?

84

CSE2: Unbiased vs. ratio estimation

- Unbiased estimator can have poor precision if
  - Cluster sizes ($M_i$) are unequal
  - $t_i$ (cluster total) is roughly proportional to $M_i$ (cluster size)
- Biased (ratio) estimator can be precise if $t_i$ is roughly proportional to $M_i$
  - This happens frequently in populations where cluster sizes ($M_i$) vary

85

CSE2: Self-weighting design

- Stage 1: Select n PSUs from N PSUs in pop using SRS
  - Inclusion probability for PSU i: $\pi_i = n/N$
- Stage 2: Choose $m_i$ proportional to $M_i$ so that $m_i/M_i$ is constant, use SRS to select sample
  - Inclusion probability for SSU j given PSU i: $\pi_{j|i} = m_i/M_i = c$
- Unconditional inclusion probability for SSU j in cluster i is constant for all elements: $\pi_{ij} = \frac{n}{N}\,c$
  - Inclusion probability may vary in practice because it may not be possible for $m_i/M_i$ to be equal to c for all clusters

86

Self-weighting designs in general

- Why are self-weighting samples appealing?
- Are dorm student or coot egg samples self-weighting 2-stage cluster samples?
- What other (non-cluster) self-weighting designs have we discussed?

87

Self-weighting designs in general – 2

- What is the caveat for variance estimation in self-weighting samples?
  - No break on variance of estimator: must use proper formula for design
- Why are self-weighting samples appealing?
  - Simple mean estimator
  - Homogeneous weights tend to make estimates more precise
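The "simple mean estimator" property can be seen in a toy case. The data below are hypothetical (not from the slides): two sampled clusters with $m_i/M_i = 1/2$ in both. Under that assumption the ratio estimator of the mean reduces to the plain average of all sampled values.

```python
from fractions import Fraction

# Hypothetical self-weighting 2-stage sample (assumption for illustration).
M = [4, 8]                           # cluster sizes M_i
samples = [[1, 3], [2, 4, 6, 8]]     # sampled y_ij, so m_i = 2 and 4

# Sampling fractions m_i / M_i are constant across clusters.
fractions_sampled = [Fraction(len(ys), Mi) for ys, Mi in zip(samples, M)]

# Ratio estimator: sum of M_i * ybar_i over sum of M_i.
ratio_est = sum(
    Mi * Fraction(sum(ys), len(ys)) for Mi, ys in zip(M, samples)
) / sum(M)

# Unweighted mean of every sampled element.
all_y = [y for ys in samples for y in ys]
simple_mean = Fraction(sum(all_y), len(all_y))

print(fractions_sampled, ratio_est, simple_mean)
```

The point estimate simplifies, but as the slide cautions, the variance still has to be computed with the cluster-sampling formula rather than the SRS formula.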

88

Return to systematic sampling (SYS)

- Have a frame, or list of N elements
- Determine sampling interval, k
  - k is the next integer after N/n
- Select first element in the list
  - Choose a random number, R, between 1 & k
  - R-th element is the first element to be included in the sample
- Select every k-th element after the R-th element
  - Sample includes element R, element R + k, element R + 2k, …, element R + (n-1)k

89

SYS example

- Telephone survey of members in an organization about the organization’s website use
  - N = 500 members
  - Have resources to do n = 75 calls
  - N / n = 500/75 = 6.67
  - k = 7
- Random number table entry: 52994
  - Rule: if pick 1, 2, …, 7, assign as R; otherwise discard #
  - Select R = 5
- Take element 5, then element 5+7 = 12, then element 12+7 = 19, 26, 33, 40, 47, …
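The selection rule can be sketched in a few lines. The random start R = 5 is taken from the slide rather than drawn at random, and the variable names are illustrative.

```python
import math

N = 500           # members on the list
n_planned = 75    # calls the budget allows

# Sampling interval: the next integer after N / n.
k = math.ceil(N / n_planned)

R = 5             # random start between 1 and k (value used on the slide)

# Elements R, R + k, R + 2k, ... up to the end of the list.
sample = list(range(R, N + 1, k))

print(k, sample[:7], sample[-1], len(sample))
```

Because k is rounded up from 6.67 to 7, the list runs out before 75 elements are reached; with this start the sketch selects 71 members.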

90

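SYS – 2

Arrange population in rows of length k = 7

R = 1 | R = 2 | R = 3 | R = 4 | R = 5 | R = 6 | R = 7 | Row i
1 | 2 | 3 | 4 | 5 | 6 | 7 | 1
8 | 9 | 10 | 11 | 12 | 13 | 14 | 2
15 | 16 | 17 | 18 | 19 | 20 | 21 | 3
22 | 23 | 24 | 25 | 26 | 27 | 28 | 4
… | | | | | | | …
491 | 492 | 493 | 494 | 495 | 496 | 497 | 71
498 | 499 | 500 | | | | | 72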

91

Relationship between SYS and cluster sampling

- Design relationships
  - Element = ?
  - Cluster = ?
  - Sampling unit(s) = ?
  - Cluster sampling design = ?
- Relationship between frame ordering and expected precision of an estimate from a cluster sample?
  - Periodic, where cycle of pattern is coincident with sampling interval k
  - Ordered by X, which is correlated with response variable Y
  - Random

92

SYS – 3

- Suppose X [age of member] is correlated with Y [use of org website]
- Sort list by X before selecting sample

k = 1 | 2 | 3 | 4 | 5 | 6 | 7 | X | Row i
1 | 2 | 3 | 4 | 5 | 6 | 7 | young | 1
8 | 9 | 10 | 11 | 12 | 13 | 14 | | 2
15 | 16 | 17 | 18 | 19 | 20 | 21 | | 3
22 | 23 | 24 | 25 | 26 | 27 | 28 | | 4
… | | | | | | | mid | …
491 | 492 | 493 | 494 | 495 | 496 | 497 | | 71
498 | 499 | 500 | | | | | old | 72