Lenka Mach, Statistics Canada Ioana Şchiopu-Kratina, Statistics Canada Philip T. Reiss, New York University Child Study Center Jean-Marc Fillion, Statistics Canada ICES III June 2007 Optimal Coordination of Samples in Business Surveys


Optimal Coordination of Samples in Business Surveys

Lenka Mach, Statistics Canada
Ioana Şchiopu-Kratina, Statistics Canada
Philip T. Reiss, New York University Child Study Center
Jean-Marc Fillion, Statistics Canada

ICES III, June 2007


OUTLINE OF THE PRESENTATION:

1. Coordinated sampling

2. Optimal Sample Coordination

2.1 Transportation Problem

2.2 Reduced Transportation Problem

2.3 Variability of the Overlap

3. Example 1: NWCR method for negative coordination of two surveys.

4. Example 2: Reduced TP for positive coordination after re-stratification.

5. Conclusion


1. COORDINATED SAMPLING

• Needed when multiple sample surveys of overlapping populations are conducted.

• Encompasses many different techniques to control the overlap of samples (the number of common units).

• Objective: obtain a higher overlap (positive coordination) or a lower overlap (negative coordination) than if the samples were selected independently.

• References: Ernst (1999), ICES II (2000), etc.


1. COORDINATED SAMPLING

First Survey:

S = set of all possible samples s

(marginal) prob. distribution on S: $P = \{p(s),\ s \in S\}$

Second Survey:

S’ = set of all possible samples s’

(marginal) prob. distribution on S’: $Q = \{p(s'),\ s' \in S'\}$

Integrated surveys:

joint prob. distribution $\{p(s, s'),\ s \in S,\ s' \in S'\}$ s.t.

$\sum_{s' \in S'} p(s, s') = p(s),\ s \in S$ and $\sum_{s \in S} p(s, s') = p(s'),\ s' \in S'$


1. COORDINATED SAMPLING

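Overlap of s and s’: $o(s, s')$ = number of units that s and s’ have in common

Expected sample overlap:

$E[o(s, s')] = \sum_{s \in S} \sum_{s' \in S'} o(s, s')\, p(s, s')$ (1)

The surveys are positively coordinated if

$\sum_{s} \sum_{s'} o(s, s')\, p(s, s') > \sum_{s} \sum_{s'} o(s, s')\, p(s)\, p(s')$

that is, if the expected overlap exceeds its value under independent selection.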


2. OPTIMAL SAMPLE COORDINATION 2.1 Transportation Problem

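We integrate two surveys so that the expected overlap is maximized (minimized):

Find the max (min) of the objective function

$E[o(s, s')] = \sum_{s \in S} \sum_{s' \in S'} o(s, s')\, p(s, s')$ (1)

over all unknowns

$X = \{p(s, s'),\ s \in S,\ s' \in S'\}$ (2)

subject to the constraints

$\sum_{s' \in S'} p(s, s') = p(s),\ s \in S$; $\sum_{s \in S} p(s, s') = p(s'),\ s' \in S'$; $\sum_{s} \sum_{s'} p(s, s') = 1$ (3)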
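To make the objective (1) and the constraints (3) concrete, here is a minimal Python sketch on a toy case that is not from the talk: both surveys draw SRSWOR samples of size 2 from the same hypothetical 4-unit population. It checks three feasible joint distributions (independent selection, reuse of the same sample, and selection of the complement) and prints the expected overlap of each.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy setting (not from the talk): both surveys draw SRSWOR
# samples of size 2 from the same 4-unit population, so S = S'.
units = range(4)
samples = [frozenset(s) for s in combinations(units, 2)]
p_marg = {s: Fraction(1, len(samples)) for s in samples}  # p(s) = p(s') = 1/6


def overlap(s, t):
    """o(s, s'): number of units the two samples have in common."""
    return len(s & t)


def satisfies_constraints(joint):
    """Check the marginal constraints (3) for a joint distribution X."""
    rows_ok = all(sum(joint[s, t] for t in samples) == p_marg[s] for s in samples)
    cols_ok = all(sum(joint[s, t] for s in samples) == p_marg[t] for t in samples)
    return rows_ok and cols_ok and sum(joint.values()) == 1


def expected_overlap(joint):
    """Objective function (1)."""
    return sum(overlap(s, t) * p for (s, t), p in joint.items())


all_units = frozenset(units)
zero = Fraction(0)
# Three joint distributions with the same marginals.
independent = {(s, t): p_marg[s] * p_marg[t] for s in samples for t in samples}
same_sample = {(s, t): p_marg[s] if t == s else zero
               for s in samples for t in samples}
complement = {(s, t): p_marg[s] if t == all_units - s else zero
              for s in samples for t in samples}

for name, joint in [("independent", independent),
                    ("same sample", same_sample),
                    ("complement", complement)]:
    print(name, satisfies_constraints(joint), expected_overlap(joint))
```

In this toy case the "same sample" joint distribution is a positive coordination (expected overlap 2 versus 1 under independence) and the "complement" one is a negative coordination (expected overlap 0).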


2. OPTIMAL SAMPLE COORDINATION 2.1 Transportation Problem

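Each cell (si, s’j) of the TP table holds the overlap o(si, s’j) and the unknown Xij = p(si, s’j):

| s \ s’ | s’1 | s’2 | s’3 | … | s’L | p(s) |
|---|---|---|---|---|---|---|
| s1 | o(s1,s’1), X11 | o(s1,s’2), X12 | o(s1,s’3), X13 | … | o(s1,s’L), X1L | p(s1) |
| s2 | o(s2,s’1), X21 | o(s2,s’2), X22 | o(s2,s’3), X23 | … | o(s2,s’L), X2L | p(s2) |
| s3 | o(s3,s’1), X31 | o(s3,s’2), X32 | o(s3,s’3), X33 | … | o(s3,s’L), X3L | p(s3) |
| … | … | … | … | … | … | … |
| sK | o(sK,s’1), XK1 | o(sK,s’2), XK2 | o(sK,s’3), XK3 | … | o(sK,s’L), XKL | p(sK) |
| p(s’) | p(s’1) | p(s’2) | p(s’3) | … | p(s’L) | 1 |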


2. OPTIMAL SAMPLE COORDINATION 2.1 Transportation Problem

TP is too large, too many variables!

Example: First survey selects SRSWOR of n = 20 from N = 40.

$K = \binom{N}{n} = \binom{40}{20} = 137{,}846{,}528{,}820$

BUT, for stratified SRSWOR designs, we can reduce TP by grouping samples!

Condition: The matrix of o(s, s’) within each group must be “symmetric”.

We use a two-stage procedure.
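The size of the full TP can be checked with a few lines of Python. This is only an illustrative sketch: the second survey's values N’ = 41 and n’ = 20 are borrowed from Example 1 of this talk, and the variable name `L` for the number of columns follows the TP table.

```python
from math import comb

# Number of possible samples per survey: rows K and columns L of the full TP.
K = comb(40, 20)   # first survey: n = 20 from N = 40
L = comb(41, 20)   # second survey: n' = 20 from N' = 41 (values from Example 1)

print(f"K = {K:,}")
print(f"L = {L:,}")
print(f"unknowns in the full TP: {K * L:,}")
```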


2. OPTIMAL SAMPLE COORDINATION 2.2 Reduced Transportation Problem

Notation:
P = frame for Survey 1, P’ = frame for Survey 2, C = P ∩ P’
c = c(s) = number of units in C ∩ s
c’ = c’(s’) = number of units in C ∩ s’

Solution - Stage 1:
• Group samples s into super-rows c
• Group samples s’ into super-columns c’
• Form a matrix of blocks (c, c’), define block optimum o(c, c’)
• Solve the reduced TP to obtain the joint probabilities p(c, c’)

Solution - Stage 2:
Distribute p(c, c’) evenly among the pairs (s, s’) that have the optimum overlap
– each row s within the block gets the same probability
– each column s’ within the block gets the same probability


2. OPTIMAL SAMPLE COORDINATION 2.2 Reduced Transportation Problem

Matrix of o(s, s’) within a block.

[Illustration not recoverable: a small matrix of overlaps o(s, s’), with row labels built from units u and d and column labels built from units u and b.]


2. OPTIMAL SAMPLE COORDINATION 2.2 Reduced Transportation Problem

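Example 1:

Survey 1: N = 40, SRSWOR n = 20

Survey 2: N’ = 41, SRSWOR n’ = 20

C = 37 (common units), D = 3 (deaths), B = 4 (births)

c = 17, 18, 19, 20 → 4 super-rows
c’ = 16, 17, 18, 19, 20 → 5 super-columns

Reduced TP has only 4 x 5 = 20 unknowns.

Constraints:

$p(c) = \binom{C}{c}\binom{D}{n-c} \Big/ \binom{N}{n}$, $p(c') = \binom{C}{c'}\binom{B}{n'-c'} \Big/ \binom{N'}{n'}$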
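The marginal constraints of the reduced TP are hypergeometric probabilities, so they can be evaluated directly. The sketch below assumes the Example 1 values above; the function names `p_row` and `p_col` are hypothetical and only used for illustration.

```python
from math import comb

N, n = 40, 20        # Survey 1
N2, n2 = 41, 20      # Survey 2 (N', n')
C, D, B = 37, 3, 4   # common units, deaths, births


def p_row(c):
    """p(c): probability that s contains exactly c common units."""
    return comb(C, c) * comb(D, n - c) / comb(N, n)


def p_col(c2):
    """p(c'): probability that s' contains exactly c' common units."""
    return comb(C, c2) * comb(B, n2 - c2) / comb(N2, n2)


rows = {c: p_row(c) for c in range(n - D, n + 1)}        # c  = 17..20
cols = {c2: p_col(c2) for c2 in range(n2 - B, n2 + 1)}   # c' = 16..20

for c, p in rows.items():
    print(f"p(c={c})  = {p:.4f}")
for c2, p in cols.items():
    print(f"p(c'={c2}) = {p:.4f}")
```

The rounded values agree with the margins p(c) and p(c’) shown in Table 1a.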


2. OPTIMAL SAMPLE COORDINATION 2.3 Variability of the Overlap

• Optimal coordination maximizes (minimizes) $E[o(s, s')]$.

• In practice, one pair of samples (s, s’) is selected, so its overlap o(s, s’) should be close to $E[o(s, s')]$!

• TP can be used in 2 steps:

– Step 1: solve the TP as described in Section 2.1 (Transportation Problem).

– Step 2:
  - Use $E[o(s, s')]$ from Step 1 as an additional constraint.
  - New objective function: for example, find the minimum of

$V[o(s, s')] = \sum_{s} \sum_{s'} \{o(s, s') - E[o(s, s')]\}^2\, p(s, s')$ (4)


3. Example 1: NWCR method for negative coordination of two surveys.

Survey 1: N = 40, SRSWOR n = 20

Survey 2: N’ = 41, SRSWOR n’ = 20; D = 3, C = 37, B = 4

Minimize $E[o(s, s')]$.

Stage 1 – Solve the Reduced TP:
• Group samples s into super-rows and s’ into super-columns.
• Order super-rows by ascending c and super-columns by descending c’, form a matrix of blocks.
• Block optimum o(c, c’) = max{0, c+c’–C} = smallest possible overlap o(s, s’) within (c, c’).
• Use the NWCR (Northwest Corner Rule) algorithm to obtain a solution.

Stage 2 - Determine p(s, s’) for each pair (s, s’):
• Distribute p(c, c’) equally among all pairs (s, s’) within the block that have o(s, s’) = o(c, c’).


3. Example 1 NWCR method for negative coordination of two surveys.

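Table 1a: Reduced TP, p(c, c’) assigned by NWCR. Each cell shows o(c, c’) / p(c, c’).

| c \ c’ | 20 | 19 | 18 | 17 | 16 | p(c) |
|---|---|---|---|---|---|---|
| 17 | 0 / 0.0591 | 0 / 0.0563 | 0 / 0 | 0 / 0 | 0 / 0 | 0.1154 |
| 18 | 1 / 0 | 0 / 0.2064 | 0 / 0.1782 | 0 / 0 | 0 / 0 | 0.3846 |
| 19 | 2 / 0 | 1 / 0 | 0 / 0.2158 | 0 / 0.1689 | 0 / 0 | 0.3846 |
| 20 | 3 / 0 | 2 / 0 | 1 / 0 | 0 / 0.0675 | 0 / 0.0478 | 0.1154 |
| p(c’) | 0.0591 | 0.2627 | 0.3940 | 0.2364 | 0.0478 | 1.0000 |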
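Stage 1 for this example can be sketched in Python as follows. This is a generic Northwest Corner Rule applied to the reduced TP, written for illustration rather than taken from the authors' implementation; it uses exact fractions so that rows and columns are exhausted without rounding error, and the function name `northwest_corner` is hypothetical.

```python
from fractions import Fraction
from math import comb

N, n, N2, n2 = 40, 20, 41, 20   # N2, n2 stand for N', n'
C, D, B = 37, 3, 4

# Super-rows by ascending c, super-columns by descending c'.
row_keys = list(range(n - D, n + 1))          # 17, 18, 19, 20
col_keys = list(range(n2, n2 - B - 1, -1))    # 20, 19, 18, 17, 16
supply = [Fraction(comb(C, c) * comb(D, n - c), comb(N, n)) for c in row_keys]
demand = [Fraction(comb(C, c2) * comb(B, n2 - c2), comb(N2, n2)) for c2 in col_keys]


def northwest_corner(supply, demand):
    """Fill cells from the top-left, exhausting a row or a column at each step."""
    supply, demand = list(supply), list(demand)
    alloc = [[Fraction(0)] * len(demand) for _ in supply]
    i = j = 0
    while i < len(supply) and j < len(demand):
        x = min(supply[i], demand[j])
        alloc[i][j] = x
        supply[i] -= x
        demand[j] -= x
        if supply[i] == 0:
            i += 1   # row exhausted: move down
        else:
            j += 1   # column exhausted: move right
    return alloc


p = northwest_corner(supply, demand)
block_min = [[max(0, c + c2 - C) for c2 in col_keys] for c in row_keys]
expected = sum(block_min[i][j] * p[i][j]
               for i in range(len(row_keys)) for j in range(len(col_keys)))

for c, row in zip(row_keys, p):
    print(c, [round(float(x), 4) for x in row])
print("E[o(s, s')] =", expected)
```

The printed rows reproduce the p(c, c’) entries of Table 1a, and all the probability mass falls in blocks with o(c, c’) = 0.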


3. Example 1 NWCR method for negative coordination of two surveys.

Stage 2 - Distribution of probabilities within blocks

Consider (c=17, c’=20) with o(c, c’)=0:

• there are $\binom{37}{17}\binom{3}{3}$ = 15,905,368,710 different samples (rows) s

• there are $\binom{37}{20}\binom{4}{0}$ = 15,905,368,710 different samples (columns) s’

The matrix of overlaps o(s, s’) is symmetric:
For each sample s, there is exactly one sample s’ such that o(s, s’)=0.
For each sample s’, there is exactly one sample s such that o(s, s’)=0.

Each sample s will get probability of 0.0591 / 15,905,368,710.
Each sample s’ will get probability of 0.0591 / 15,905,368,710.
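The counts for this block, and the "exactly one partner" property, can be illustrated with a short Python sketch. The real block is far too large to enumerate, so the second part uses a hypothetical toy block (5 common units, c = 2, c’ = 3, deaths and births ignored because they never contribute to the overlap).

```python
from itertools import combinations
from math import comb

# Real block (c = 17, c' = 20): number of rows s and columns s'.
rows_count = comb(37, 17) * comb(3, 3)
cols_count = comb(37, 20) * comb(4, 0)
print(f"{rows_count:,} rows, {cols_count:,} columns")
print("probability per sample:", 0.0591 / rows_count)

# Hypothetical toy block, small enough to enumerate: 5 common units,
# c = 2 and c' = 3, so the block optimum is max(0, 2 + 3 - 5) = 0.
common = range(5)
rows = [frozenset(s) for s in combinations(common, 2)]
cols = [frozenset(t) for t in combinations(common, 3)]
partners_per_row = [sum(1 for t in cols if not (s & t)) for s in rows]
partners_per_col = [sum(1 for s in rows if not (s & t)) for t in cols]
print(partners_per_row, partners_per_col)
```

In the toy block every row and every column has exactly one partner with zero overlap (its complement among the common units), which mirrors the symmetry condition stated above.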


3. Example 1 NWCR method for negative coordination of two surveys.

Theorem:

(a) The joint density $X_{NWCR}$ obtained by the NWCR method for negative coordination satisfies the constraints given in (3).

(b) $X_{NWCR}$ has the minimum expected overlap within the set of joint densities that satisfy (3).

(c) $X_{NWCR}$ has the minimum variance within this set of joint densities.

Proof in Mach, Reiss, Şchiopu-Kratina (2006).


3. Example 1 NWCR method for negative coordination of two surveys.

Simultaneous Selection

i) Select one block using the joint probabilities p(c, c’) in Table 1a.
ii) To draw samples s and s’, randomly select units from each set: C = common units, D = deaths, B = births.

Suppose block (19, 18) is selected in i). To select s, randomly select 19 units from the 37 in C and 1 unit from the 3 in D. To select s’, take the remaining 37-19=18 units from C and randomly select 2 units from the 4 in B.

Sequential Selection (s drawn first)

i) Select one block from the super-row c(s) using the conditional probabilities p{(c, c’)| c(s)} corresponding to the joint probabilities in Table 1b.
ii) Randomly select units from the C and B sets to form s’.
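The simultaneous selection steps can be sketched in Python as below. This is an illustration under stated assumptions, not the authors' code: unit labels are hypothetical integers, the block probabilities are the rounded values from Table 1a, and `select_pair` is a made-up helper name. It relies on the fact that every block with positive probability has c + c’ ≤ 37, so s’ can always be drawn from the common units not used by s.

```python
import random

C_units = list(range(37))        # common units (hypothetical labels)
D_units = list(range(37, 40))    # deaths: in the first frame only
B_units = list(range(40, 44))    # births: in the second frame only
n = n2 = 20

# Blocks (c, c') with positive probability, rounded values from Table 1a.
blocks = {(17, 20): 0.0591, (17, 19): 0.0563, (18, 19): 0.2064, (18, 18): 0.1782,
          (19, 18): 0.2158, (19, 17): 0.1689, (20, 17): 0.0675, (20, 16): 0.0478}


def select_pair(rng):
    # i) pick a block (c, c') with probability p(c, c')
    c, c2 = rng.choices(list(blocks), weights=list(blocks.values()), k=1)[0]
    # ii) draw s: c common units plus n - c deaths
    s = set(rng.sample(C_units, c)) | set(rng.sample(D_units, n - c))
    # draw s': c' common units not in s, plus n' - c' births
    unused = [u for u in C_units if u not in s]
    s2 = set(rng.sample(unused, c2)) | set(rng.sample(B_units, n2 - c2))
    return s, s2


rng = random.Random(0)   # arbitrary seed, for reproducibility only
s, s2 = select_pair(rng)
print(len(s), len(s2), len(s & s2))
```

Every pair drawn this way has overlap 0, which is the block optimum for all blocks that carry probability in Table 1a.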


3. Example 1 NWCR method for negative coordination of two surveys.

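[Figure: the units split into Deaths (D = 3), Common Units (C = 37) and Births (B = 4). Sample s (n = 20) contains c = 19 common units, sample s’ (n’ = 20) contains c’ = 18 common units, and o(s, s’) = 0.]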


3. Example 1 NWCR method for negative coordination of two surveys.

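Table 1b: Empirical block probabilities for Sequential SRSWOR (PRN). Each cell shows o(c, c’) / p(c, c’).

| c \ c’ | 20 | 19 | 18 | 17 | 16 | p(c) |
|---|---|---|---|---|---|---|
| 17 | 0 / 0.0083 | 0 / 0.0336 | 0 / 0.0426 | 0 / 0.0239 | 0 / 0.0043 | 0.1127 |
| 18 | 1 / 0.0225 | 0 / 0.1022 | 0 / 0.1574 | 0 / 0.0890 | 0 / 0.0160 | 0.3871 |
| 19 | 2 / 0.0210 | 1 / 0.0987 | 0 / 0.1508 | 0 / 0.0993 | 0 / 0.0182 | 0.3880 |
| 20 | 3 / 0.0052 | 2 / 0.0251 | 1 / 0.0426 | 0 / 0.0319 | 0 / 0.0074 | 0.1122 |
| p(c’) | 0.0570 | 0.2596 | 0.3934 | 0.2441 | 0.0459 | 1.0000 |

Table 1c: Expectations

| Method | E [o(s, s’)] | V [o(s, s’)] |
|---|---|---|
| NWCR | 0 | 0 |
| PRN | 0.2716 | 0.3212 |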
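The PRN row of Table 1c can be recomputed from the block probabilities of Table 1b. The sketch below assumes that the overlap in each block equals the block value o(c, c’) = max{0, c+c’–C} printed in the table cells; the function name `mean_and_variance` is hypothetical.

```python
# Block probabilities copied from Table 1b
# (rows c = 17..20, columns c' = 20..16).
p_prn = [
    [0.0083, 0.0336, 0.0426, 0.0239, 0.0043],
    [0.0225, 0.1022, 0.1574, 0.0890, 0.0160],
    [0.0210, 0.0987, 0.1508, 0.0993, 0.0182],
    [0.0052, 0.0251, 0.0426, 0.0319, 0.0074],
]
rows = [17, 18, 19, 20]
cols = [20, 19, 18, 17, 16]
C = 37


def mean_and_variance(prob):
    """Expectation (1) and variance (4) of the overlap, computed block by block."""
    cells = [(max(0, c + c2 - C), prob[i][j])
             for i, c in enumerate(rows) for j, c2 in enumerate(cols)]
    mean = sum(o * p for o, p in cells)
    var = sum((o - mean) ** 2 * p for o, p in cells)
    return mean, var


mean, var = mean_and_variance(p_prn)
print(round(mean, 4), round(var, 4))
```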


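4. Example 2: Reduced TP for positive coordination after re-stratification.

Old stratum 1: N1 = 20, n1 = 10; units in common with the new stratum: C1 = 2

Old stratum 2: N2 = 6, n2 = 3; units in common with the new stratum: C2 = 3

Old stratum 3: N3 = 10, n3 = 2; units in common with the new stratum: C3 = 10

New stratum: N’ = 15, n’ = 5

Objective: Maximize $E[o(s, s')]$.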


4. Example 2 Reduced TP for positive coordination after re-stratification.

Super-rows: $c = (c_1, c_2, c_3)$ with $c_1 = 0, 1, 2$; $c_2 = 0, 1, 2, 3$; $c_3 = 2$. → 3 x 4 x 1 = 12 super-rows

Super-columns: $c' = (c'_1, c'_2, c'_3)$ with $c'_1 + c'_2 + c'_3 = 5$:

(0, 0, 5), (0, 1, 4), (0, 2, 3), (0, 3, 2), (1, 0, 4), (1, 1, 3), (1, 2, 2), (1, 3, 1),

(2, 0, 3), (2, 1, 2), (2, 2, 1), (2, 3, 0). → 12 super-columns

Reduced TP has 12 x 12 = 144 unknowns.

Constraints:

$p(c) = \dfrac{\binom{C_1}{c_1}\binom{N_1 - C_1}{n_1 - c_1}}{\binom{N_1}{n_1}} \cdot \dfrac{\binom{C_2}{c_2}\binom{N_2 - C_2}{n_2 - c_2}}{\binom{N_2}{n_2}}$ (product of hypergeometric probabilities)

$p(c') = \dfrac{\binom{C_1}{c'_1}\binom{C_2}{c'_2}\binom{C_3}{c'_3}}{\binom{N'}{n'}}$ (multihypergeometric probabilities)
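The super-rows, super-columns and their marginal probabilities can be enumerated with a short Python sketch. It assumes the Example 2 values given earlier in the talk (N1 = 20, n1 = 10, N2 = 6, n2 = 3, N3 = 10, n3 = 2, C1 = 2, C2 = 3, C3 = 10, N’ = 15, n’ = 5); the helper names `p_row` and `p_col` are hypothetical. The product over strata includes the third stratum, whose factor is 1 when c3 = 2 and 0 otherwise.

```python
from itertools import product
from math import comb

N_old, n_old = (20, 6, 10), (10, 3, 2)   # old strata: sizes and sample sizes
C_h = (2, 3, 10)                          # common units per old stratum
N_new, n_new = 15, 5                      # new stratum


def p_row(c):
    """p(c): product of hypergeometric probabilities over the old strata."""
    prob = 1.0
    for Nh, nh, Ch, ch in zip(N_old, n_old, C_h, c):
        prob *= comb(Ch, ch) * comb(Nh - Ch, nh - ch) / comb(Nh, nh)
    return prob


def p_col(c2):
    """p(c'): multihypergeometric probability in the new stratum."""
    num = 1
    for Ch, ch in zip(C_h, c2):
        num *= comb(Ch, ch)
    return num / comb(N_new, n_new)


row_ranges = [range(min(Ch, nh) + 1) for Ch, nh in zip(C_h, n_old)]
super_rows = [c for c in product(*row_ranges) if p_row(c) > 0]
super_cols = [c2 for c2 in product(*(range(Ch + 1) for Ch in C_h))
              if sum(c2) == n_new]

print(len(super_rows), len(super_cols))
print(round(p_row((2, 3, 2)), 4), round(p_col((2, 1, 2)), 4))
```

The enumeration gives 12 super-rows and 12 super-columns, and the two printed probabilities match the margins p(c) = 0.0118 and p(c’) = 0.0450 reported in Table 2a.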


4. Example 2 Reduced TP for positive coordination after re-stratification.

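Table 2a: Block overlap and probabilities p(c, c’) (TP solution). Each cell shows o(c, c’) / p(c, c’).

| c \ c’ | 1,2,2 | 2,1,2 | 0,3,2 | … | 0,0,5 | p(c) |
|---|---|---|---|---|---|---|
| 2,3,2 | 5 / 0 | 5 / 0.0115 | 5 / 0 | … | 2 / 0 | 0.0118 |
| 2,2,2 | 5 / 0 | 5 / 0.0301 | 4 / 0 | … | 2 / 0 | 0.1066 |
| 1,3,2 | 5 / 0 | 4 / 0 | 5 / 0.0031 | … | 2 / 0 | 0.0263 |
| … | … | … | … | … | … | … |
| 0,0,2 | 2 / 0 | 2 / 0 | 2 / 0 | … | 2 / 0.0118 | 0.0118 |
| p(c’) | 0.0899 | 0.0450 | 0.0150 | … | 0.0839 | 1.0000 |

o(c, c’) = min(c1,c1’) + min(c2,c2’) + min(c3,c3’)

$E_{TP}[o(s, s')]$ = 3.6494, $V_{TP}[o(s, s')]$ = 0.7292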
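The block optimum formula is easy to check against the overlap values printed in the table cells. A minimal sketch, with `block_overlap` as a hypothetical helper name:

```python
def block_overlap(c, c2):
    """o(c, c'): largest overlap attainable in block (c, c') for positive coordination."""
    return sum(min(a, b) for a, b in zip(c, c2))


# A few cells of Table 2a, as (c, c') pairs.
for c, c2 in [((2, 3, 2), (1, 2, 2)),
              ((2, 2, 2), (0, 3, 2)),
              ((1, 3, 2), (2, 1, 2)),
              ((0, 0, 2), (0, 0, 5))]:
    print(c, c2, block_overlap(c, c2))
```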


4. Example 2 Reduced TP for positive coordination after re-stratification.

Sequential selection: Suppose c = (2,3,2), with p(c) = 0.01184.

Table 2b: Probabilities for c = (2,3,2)

| c’ | 2,1,2 | 2,3,0 | Σ |
|---|---|---|---|
| p(c, c’) | 0.01151 | 0.00033 | 0.01184 |
| p{c’ ∣ c=(2,3,2)} | 0.97213 | 0.02787 | 1 |

$E_{TP}$ {o | c=(2,3,2)} = 5

$V_{TP}$ {o | c=(2,3,2)} = 0

i) Select super-column c’ using p{c’ | c=(2,3,2)}.

ii) Suppose c’ = (2,1,2) is selected. → Randomly de-select 2 units from s ∩ C2 to form s’.
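Step i) of the sequential selection amounts to normalising the joint probabilities within the super-row and drawing a super-column from the result. A minimal sketch using the Table 2b values; the seed is arbitrary and the variable names are hypothetical.

```python
import random

# Joint probabilities p(c, c') for the super-row c = (2,3,2), from Table 2b.
joint = {(2, 1, 2): 0.01151, (2, 3, 0): 0.00033}

p_c = sum(joint.values())                               # p(c)
conditional = {c2: p / p_c for c2, p in joint.items()}  # p{c' | c}

rng = random.Random(2007)   # arbitrary seed, for reproducibility only
chosen = rng.choices(list(conditional), weights=list(conditional.values()), k=1)[0]

print(round(p_c, 5))
print({c2: round(p, 5) for c2, p in conditional.items()})
print("selected super-column:", chosen)
```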


4. Example 2 Reduced TP for positive coordination after re-stratification.

Is the matrix of overlaps o(s, s’) within a block symmetric?

Consider block {c =(2,3,2), c’ =(2,1,2)} with o(c, c’)=5:

• $\binom{2}{2}\binom{18}{8} \times \binom{3}{3}\binom{3}{0} \times \binom{10}{2}$ = 43,758 x 1 x 45 different samples (rows) s

• $\binom{2}{2} \times \binom{3}{1} \times \binom{10}{2}$ = 1 x 3 x 45 different samples (columns) s’

For each s, there are exactly 3 samples s’ such that o(s, s’)=5.
For each s’, there are exactly 43,758 samples s such that o(s, s’)=5.

Each s’ will get probability of 0.01151 / (3 x 45).
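The column count and the "exactly 3 samples s’" claim can be checked by enumeration for one fixed row. The sketch below uses hypothetical unit labels and one arbitrarily chosen sample s in super-row c = (2,3,2); the rows themselves are only counted, not enumerated.

```python
from collections import Counter
from itertools import combinations
from math import comb

# Hypothetical unit labels.
C1 = ["a1", "a2"]                       # common units from old stratum 1
C2 = ["b1", "b2", "b3"]                 # common units from old stratum 2
C3 = [f"g{i}" for i in range(10)]       # common units from old stratum 3
other1 = [f"x{i}" for i in range(18)]   # old stratum 1 units outside the new stratum

# Number of rows s in the block (counted, not enumerated).
n_rows = comb(2, 2) * comb(18, 8) * comb(3, 3) * comb(3, 0) * comb(10, 2)

# All columns s' in super-column c' = (2,1,2).
columns = [set(a) | set(b) | set(g)
           for a in combinations(C1, 2)
           for b in combinations(C2, 1)
           for g in combinations(C3, 2)]

# One fixed row s in super-row c = (2,3,2): 10 + 3 + 2 = 15 units.
s = set(C1) | set(other1[:8]) | set(C2) | set(C3[:2])
overlaps = Counter(len(s & t) for t in columns)

print(n_rows, len(columns), dict(overlaps))
print("probability per s':", 0.01151 / len(columns))
```

For this row, 3 of the 135 columns reach the block optimum of 5; the remaining columns have overlap 4 or 3.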


4. Example 2 Reduced TP for positive coordination after re-stratification.

Table 2c: Matrix of o(s, s’); block {c =(2,3,2), c’ =(2,1,2)}

[Matrix not recoverable. Rows are shown in groups of 43,758 samples s and columns in groups of 16 and 28 samples s’; the entries take the values 3, 4 and 5.]


4. Example 2 Reduced TP for positive coordination after re-stratification.

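Table 2d: Empirical block probabilities for Sequential SRSWOR (PRN). Each cell shows o(c, c’) / p(c, c’).

| c \ c’ | 1,2,2 | 2,1,2 | 0,3,2 | … | 0,0,5 | p(c) |
|---|---|---|---|---|---|---|
| 2,3,2 | 5 / 0.0022 | 5 / 0.0015 | 5 / 0.0007 | … | 2 / 0.0002 | 0.0124 |
| 2,2,2 | 5 / 0.0160 | 5 / 0.0173 | 4 / 0.0006 | … | 2 / 0.0022 | 0.1067 |
| 1,3,2 | 5 / 0.0055 | 4 / 0.0001 | 5 / 0.0025 | … | 2 / 0.0007 | 0.0254 |
| … | … | … | … | … | … | … |
| 0,0,2 | 2 / 0.0001 | 2 / 0 | 2 / 0 | … | 2 / 0.0069 | 0.0116 |
| p(c’) | 0.0897 | 0.0453 | 0.0153 | … | 0.0847 | 1.0000 |

Table 2e: Expectations

| Method | E [o(s, s’)] | V [o(s, s’)] | E{o ∣ c=(2,3,2)} | V{o ∣ c=(2,3,2)} |
|---|---|---|---|---|
| TP | 3.6494 | 0.7292 | 5 | 0 |
| PRN | 3.5602 | 0.6940 | 4.3282 | 0.5746 |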


5. CONCLUSION

Optimal sample coordination can be formulated as a transportation problem (TP).

For stratified SRSWOR, we can reduce the TP by grouping samples.

The groups must be formed so that the matrix of o(s, s’) within each group is symmetric.

The solution and the selection are done in two stages.

Different objective functions can be defined, depending on the goal of the sample coordination project.


For more information, please contact

Lenka Mach

www.statcan.ca

E-mail: [email protected]


REFERENCES

Ernst, L.R. (1999), “The Maximization and Minimization of Sample Overlap Problems: A Half Century of Results,” Bulletin of the International Statistical Institute, Proceedings, Tome LVIII, Book 2, pp 293-296.

Mach, L., Reiss, P.T., and Şchiopu-Kratina, I. (2006), “Optimizing the Expected Overlap of Survey Samples via the Northwest Corner Rule,” Journal of the American Statistical Association, Vol. 101, No. 476, Theory and Methods, pp. 1671-1679.

McKenzie, B. and Gross, B. (2000), “Synchronized Sampling,” ICES II, The Second International Conference on Establishment Surveys, American Statistical Association, pp. 237-243.

Ohlsson, E. (2000), “Coordination of PPS Samples Over Time,” ICES II, The Second International Conference on Establishment Surveys, American Statistical Association, pp. 255-264.

Royce, D. (2000), “Issues in Coordinated Sampling at Statistics Canada,” ICES II, The Second International Conference on Establishment Surveys, American Statistical Association, pp. 245-254.