Designing and Conducting Business Surveys for Official Statistics, Tbilisi, November 2016
Haraldsen & Snijkers 2016

Sampling Business Surveys


Page 2

TSE approach: Quality = 1 – ∑(Bias² + Variance)

$$\text{Quality} = 1 - \left[ (B_{\text{Specification}} + B_{\text{Frame}} + B_{\text{Nonresponse}} + B_{\text{Measurement}} + B_{\text{Processing}})^2 + (V_{\text{Sampling}} + V_{\text{Measurement}} + V_{\text{Processing}}) \right]$$

[Figure: Survey Cycle and Extended Survey Cycle. Boxes: Population, Sample frame, Sample, Respondents; Construct, Measurement instrument, Response; Edited, Adjusted; Data delivery. Error labels: Coverage errors, Sampling errors, Nonresponse errors, Adjustment errors; Validity, Measurement errors, Processing errors. Questions: WHO? WHAT? HOW? Extended cycle stages: Specification; Development & testing; Survey communication & management; Coding, cleaning & data integration; Data delivery & documentation.]

Page 3

Administrative --> Statistical Business Register. Norway 2009

[Bar chart, vertical axis 0.0 % to 100.0 %]
Identified Legal Entities   100.0 %
Errors subtracted            93.0 %
Passive subtracted           49.0 %

Page 4

$$\text{Cost Efficiency} = \frac{\text{Quality}}{\text{Cost}} = \frac{1 - \left[ (B_{\text{Specification}} + B_{\text{Frame}})^2 + V_{\text{Sampling}} \right]}{\text{Cost}_{\text{Sample Size}}}$$

• IDENTIFY
• CLASSIFY
• FRAME
• SAMPLE

• Inactive included
• Active excluded
• One counted as several (multiplicity)
• Several counted as one (clustering)
• Inaccurate classification
• Misclassification

UPDATE

Page 5

$$\text{Cost Efficiency} = \frac{\text{Quality}}{\text{Cost}} = \frac{1 - \left[ (B_{\text{Specification}} + B_{\text{Frame}})^2 + V_{\text{Sampling}} \right]}{\text{Cost}_{\text{Sample Size}}}$$

• With computerized questionnaires the relationship between cost and sample size is weaker than before

• For the individual company sample size does not affect response burden

• For the business world, however, it does

Page 6

Sampling:

• Stratification as default
• Completely enumerated strata
• Partitions based on the most important domains in estimation
• Keep the number of stratifiers (and strata) low

$$\text{Cost Efficiency} = \frac{\text{Quality}}{\text{Cost}} = \frac{1 - \left[ (B_{\text{Specification}} + B_{\text{Frame}})^2 + V_{\text{Sampling}} \right]}{\text{Cost}_{\text{Sample Size}}}$$

Page 7

Stratification = the partition of the population in such a way that the elements within a stratum are as similar as possible and the means of the strata are as different as possible.

• Decide what auxiliary variables to use as stratification variables
  = Which predict the survey outcome well
  = Industry code + no of employees?
• Determine the number of strata
  = The point where variance decrease flattens out
• Assign units to strata
• Allocate the number to be sampled from each stratum (sample allocation)
• Take constraints (like expected nonresponse) into consideration
• (Random) Sampling within strata
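The steps listed on this slide can be illustrated with a minimal Python sketch. It assumes the stratifiers and the allocation have already been decided; the frame, the `size_industry` function, the 100-employee threshold and the allocation numbers are all made up for illustration and do not come from the slides.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, allocation, seed=None):
    """Draw a simple random sample without replacement inside each stratum.

    units       -- list of unit records (any objects)
    stratum_of  -- function mapping a unit to its stratum label
    allocation  -- dict: stratum label -> number of units to draw
    """
    rng = random.Random(seed)

    # Assign units to strata.
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)

    # Random sampling within strata. An allocation at or above the
    # stratum size turns the stratum into a completely enumerated one.
    sample = {}
    for label, members in strata.items():
        n_h = min(allocation.get(label, 0), len(members))
        sample[label] = rng.sample(members, n_h)
    return sample


# Hypothetical frame of (id, industry, employees); all values are made up.
frame = [(i, "retail" if i % 2 else "construction", (i * 7) % 120)
         for i in range(200)]


def size_industry(unit):
    """Hypothetical stratifier: industry crossed with a size class."""
    _, industry, employees = unit
    return (industry, "large" if employees >= 100 else "small")


allocation = {
    ("retail", "small"): 10,
    ("construction", "small"): 10,
    ("retail", "large"): 10**6,        # completely enumerated
    ("construction", "large"): 10**6,  # completely enumerated
}
sample = stratified_sample(frame, size_industry, allocation, seed=1)
```

Capping the allocation at the stratum size is one simple way to express completely enumerated strata; constraints such as expected nonresponse would have to be built into the allocation before this function is called.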

Page 8

Measures of variability

Range

$$\text{Range}(X) = X_{\max} - X_{\min}$$

Variance

$$s^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{x})^2$$

Standard deviation

$$s = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{x})^2}$$

Coefficient of Variation

$$cv = \frac{1}{\bar{x}} \sqrt{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{x})^2}$$
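As a quick illustration, the four measures can be computed directly from their definitions. This is a minimal sketch that uses the 1/n divisor shown on the slide; the input values are made up.

```python
import math


def variability(xs):
    """Range, variance, standard deviation and cv with a 1/n divisor."""
    n = len(xs)
    mean = sum(xs) / n
    variance = sum((x - mean) ** 2 for x in xs) / n
    sd = math.sqrt(variance)
    return {
        "range": max(xs) - min(xs),
        "variance": variance,
        "sd": sd,
        "cv": sd / mean,  # undefined when the mean is zero
    }


# Made-up example values.
measures = variability([2, 4, 4, 4, 5, 5, 7, 9])
print(measures)
```

Because the coefficient of variation divides by the mean, it is unit-free and can be compared across variables measured on different scales.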

Page 9

Stratification by intuition: Dalenius-Hodges cumulative method for determining stratum boundaries

[Chart: cumulative values on an axis from 0 to 35, with the cut points marked]
t = 29.9; t/4 = 7.5; 2t/4 = 14.9; 3t/4 = 22.4

Page 10

Stratification by intuition 2: Splitting large groups

Class (first five)   1-10   >10-20   >20-30   >30-40   >40-50
Frequency (f)        44     52       23       21       17

Cumulative √f (13 classes):  6.6  13.8  18.6  23.2  27.3  35.6  39.7  43.3  43.3  45.5  45.5  45.5  47.1
Stratum (13 classes):        1    1     2     2     3     3     4     4     4     4     4     4     4

[Chart: cumulative values on an axis from 0 to 50, with the cut points marked]
t = 47.1; t/4 = 11.8; 2t/4 = 23.5; 3t/4 = 35.3
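The cumulative rule on these two slides can be sketched in Python. The sketch assumes that the row 44, 52, 23, 21, 17 holds class frequencies and that the long row holds cumulative √f values. It also assumes that each cut is placed after the class whose cumulative value lies nearest to k·t/4. That nearest-value rule is an assumption, although it does reproduce the stratum row printed on the slide.

```python
import math
from itertools import accumulate


def cum_sqrt_f(frequencies):
    """Cumulative sum of the square roots of the class frequencies."""
    return list(accumulate(math.sqrt(f) for f in frequencies))


def assign_strata(cum, n_strata):
    """Assign each class to a stratum from its cumulative sqrt(f) value.

    Cut points are k * t / n_strata for k = 1 .. n_strata - 1, with
    t = cum[-1]. Assumption: each cut is placed after the class whose
    cumulative value is nearest to the cut point.
    """
    total = cum[-1]
    cut_after = []
    for k in range(1, n_strata):
        target = k * total / n_strata
        nearest = min(range(len(cum)), key=lambda i: abs(cum[i] - target))
        cut_after.append(nearest)
    # A class belongs to stratum 1 + (number of cuts placed before it).
    return [1 + sum(1 for c in cut_after if c < i) for i in range(len(cum))]


# Frequencies of the first five classes shown on the slide.
first_five = [round(v, 1) for v in cum_sqrt_f([44, 52, 23, 21, 17])]
print(first_five)  # [6.6, 13.8, 18.6, 23.2, 27.3]

# Cumulative values for all 13 classes, as printed on the slide.
slide_cum = [6.6, 13.8, 18.6, 23.2, 27.3, 35.6, 39.7,
             43.3, 43.3, 45.5, 45.5, 45.5, 47.1]
strata = assign_strata(slide_cum, 4)
print(strata)  # [1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 4, 4, 4]
```

Splitting a large class raises the total t, because √a + √b exceeds √(a + b) for positive a and b.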

Page 11

Alternatives

Varying sample variances
Hidiroglou (1986), Lavallée – Hidiroglou (1988)

Power Allocation

Allocation = proportional to the population standard deviation estimate × stratum population size (Neyman 1934)

$$n_h = n \, \frac{s_h N_h}{\sum_h s_h N_h}$$

Cochran (1977)
= internal + external cost (response burden)

Probability Proportional to Size (PPS) Sampling: Direct rather than stratified sampling, proportional to suitable auxiliary information

0 ≤ s(eed) ≤ X/n
n1 ≈ s
n2 ≈ s + X/n
n3 ≈ s + X/n + X/n
nn ≈ ………
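Two of the alternatives above lend themselves to a short sketch: the Neyman allocation formula and systematic selection with probability proportional to size. In the sketch, X is taken to be the total of the size measure, s a random start in [0, X/n), and the selection points s, s + X/n, s + 2X/n and so on. The function names and the example numbers are illustrative assumptions, not part of the slides.

```python
import random


def neyman_allocation(n, sizes, sds):
    """n_h = n * N_h * s_h / sum(N_h * s_h); returns unrounded values."""
    weights = [N_h * s_h for N_h, s_h in zip(sizes, sds)]
    total = sum(weights)
    return [n * w / total for w in weights]


def systematic_pps(x, n, seed=None):
    """Systematic PPS selection; returns the indices of selected units.

    x -- positive size measures, one per unit; X = sum(x), step = X / n.
    A unit with x_i >= step can be hit more than once. Such units are
    normally moved to a completely enumerated stratum first.
    """
    rng = random.Random(seed)
    step = sum(x) / n
    start = rng.random() * step              # random start in [0, step)
    points = [start + k * step for k in range(n)]

    selected, cum, j = [], 0.0, 0
    for i, x_i in enumerate(x):
        cum += x_i
        # Select unit i for every selection point that falls in its interval.
        while j < n and points[j] <= cum:
            selected.append(i)
            j += 1
    # Guard against floating-point round-off at the very end of the list.
    selected.extend([len(x) - 1] * (n - j))
    return selected


# Made-up strata: equal N_h * s_h products give an equal split.
alloc = neyman_allocation(100, sizes=[1000, 200], sds=[1.0, 5.0])
print(alloc)  # [50.0, 50.0]

# Made-up frame of ten equally sized units, five to be selected.
picked = systematic_pps([1] * 10, n=5, seed=0)
print(picked)
```

In practice the allocation would be rounded and adjusted for minimum stratum sample sizes and expected nonresponse, which this sketch leaves out.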

Page 12

Hard to sample:

• A range of products (or services) that vary between none and a lot
  – Large samples or samples based on census
• Prices (transactions) of a range of products (or services)
  – Multistage sampling
• Estimates of rare characteristics
  – Satellite Registers
  – Two stage sampling/Filter questions

Page 13

Cutoff sampling

[Bar chart, vertical axis 0.0 % to 100.0 %]
                     Small (0-9 employees)   Medium (10-99 employees)   Large (100+ employees)
No of businesses     93.1 %                  6.4 %                      0.5 %
No of employees      28.5 %                  31.8 %                     39.7 %
Economic turnover    23.9 %                  24.6 %                     51.6 %

Variable rather than unit coverage:
• The large are fully enumerated
• The middle size are sampled
• The small are left out

Common in spite of immediate disadvantages:
• Estimates among the small based on assumptions
• Biased estimates of the total
• Changes among the smallest go unnoticed
• How to determine the cutoff?
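The phrase "variable rather than unit coverage" can be made concrete with a small calculation. The sketch below assumes the nine percentages in the chart are read as three variables by three size classes, so that each variable sums to about 100 %. That reading is inferred from the numbers rather than stated on the slide.

```python
# Shares in percent, read from the chart under the assumption above.
shares = {
    "businesses": {"small": 93.1, "medium": 6.4, "large": 0.5},
    "employees": {"small": 28.5, "medium": 31.8, "large": 39.7},
    "turnover": {"small": 23.9, "medium": 24.6, "large": 51.6},
}


def coverage_without(shares, excluded=("small",)):
    """Share of each variable still covered when some size classes are cut off."""
    return {
        variable: round(
            sum(v for size, v in by_size.items() if size not in excluded), 1
        )
        for variable, by_size in shares.items()
    }


coverage = coverage_without(shares)
print(coverage)  # {'businesses': 6.9, 'employees': 71.5, 'turnover': 76.2}
```

Under this reading, leaving out the small businesses drops about 93 % of the units while roughly three quarters of employment and turnover remain covered.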

Page 14

Sample rotation and overlap

Variance of change = var(t1) + var(t2) – 2 cov(t1, t2)

[Diagram labels: Panel element; Representativity; +/- Learning effects; (Perceived) Response Burden; Common, Frozen Frame]
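The variance-of-change formula shows why overlapping samples help when the goal is to measure change: a positive covariance between the two estimates is subtracted. A minimal sketch follows, with the covariance written as correlation times the two standard deviations; the numbers are made up.

```python
import math


def variance_of_change(var_t1, var_t2, corr):
    """var(t1) + var(t2) - 2 cov(t1, t2), with cov = corr * sd1 * sd2."""
    cov = corr * math.sqrt(var_t1) * math.sqrt(var_t2)
    return var_t1 + var_t2 - 2 * cov


# Made-up variances; corr = 0 mimics two independent samples,
# corr = 0.8 mimics a large panel overlap.
independent = variance_of_change(4.0, 4.0, corr=0.0)
overlapping = variance_of_change(4.0, 4.0, corr=0.8)
print(independent, overlapping)
```

With these illustrative numbers the variance of the estimated change falls from 8 to about 1.6 when the two estimates are strongly correlated.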

Page 15

Sample rotation Response Burden

• For the business world response burden is unaffected by sample rotation

Participation scenario, 2 surveys

                 (1,0)        (0,1)        (1,1)       Expected burden
Actual burden    B1           B2           B1 + B2
Allow overlap    p1(1-p2)     (1-p1)p2     p1p2        p1B1 + p2B2
No overlap       p1           p2           0           p1B1 + p2B2
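The claim that expected burden is the same with and without overlap can be checked numerically. The sketch assumes that a business taking part in both surveys carries the burden B1 + B2, which is what makes the two rows of the table agree. The probabilities and burdens are made up.

```python
import math


def expected_burden(p1, p2, b1, b2, allow_overlap):
    """Expected burden for one business across the three participation scenarios."""
    if allow_overlap:
        # Independent selection into the two surveys.
        probs = {(1, 0): p1 * (1 - p2), (0, 1): (1 - p1) * p2, (1, 1): p1 * p2}
    else:
        # Samples kept apart; requires p1 + p2 <= 1.
        probs = {(1, 0): p1, (0, 1): p2, (1, 1): 0.0}
    burden = {(1, 0): b1, (0, 1): b2, (1, 1): b1 + b2}
    return sum(probs[s] * burden[s] for s in probs)


# Made-up inclusion probabilities and burdens (for example, hours per survey).
with_overlap = expected_burden(0.3, 0.2, 2.0, 5.0, allow_overlap=True)
without_overlap = expected_burden(0.3, 0.2, 2.0, 5.0, allow_overlap=False)
print(with_overlap, without_overlap)
```

Avoiding overlap therefore changes how the burden is spread over businesses, not how much burden there is in total.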

Page 16

Sample rotation Response Burden

• For the business world response burden is unaffected by sample rotation
• Sample/population size decides the room for rotation
• Actual fairness is unrealistic

No of employees    0-4       5-9      10-19    20-49    50-99    100+
Sample no 1        0         1286     1135     1256     1124     1635
Sample no 2        0         400      947      2094     2236     1982
Sample no 3        0         948      1509     838      1124     1239
Sample no 4        3858      1606     3787     4188     2236     1982
Sample no 5        71969     12843    15140    8368     2236     1982
Sample no 6        0         386      758      1256     1124     1982
Total sampled      75828     17469    23276    17999    10081    10802
Population size    349104    25683    15140    8368     2236     1982

Page 17

Sample rotation Perceived Response Burden

• Compared to those not sampled, fairness is a weak argument
• For most businesses response burden appears slightly more evenly distributed

No of employees    0-4       5-9      10-19    20-49    50-99     100+
Sample no 4        3858      1606     3787     4188     2236      1982
Sample no 5        71969     12843    15140    8368     2236      1982
Total sampled      75828     17469    23276    17999    10081     10802
Population size    349104    25683    15140    8368     2236      1982
Neither 4 nor 5    78 %      44 %     -25 %    -50 %    -100 %    -100 %
Both 4 and 5       0.23 %    3 %      25 %     50 %     100 %     100 %
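The two percentage rows can be reproduced from the sample and population sizes. The sketch below assumes that "Neither 4 nor 5" is computed as 1 − p4 − p5 and "Both 4 and 5" as p4 · p5, where p is sample size divided by population size. These formulas are inferred from the numbers, not stated on the slide. A negative "neither" share means the two samples together exceed the population, so overlap cannot be avoided.

```python
size_classes = ["0-4", "5-9", "10-19", "20-49", "50-99", "100+"]
population = [349104, 25683, 15140, 8368, 2236, 1982]
sample_4 = [3858, 1606, 3787, 4188, 2236, 1982]
sample_5 = [71969, 12843, 15140, 8368, 2236, 1982]


def overlap_shares(n_a, n_b, N):
    """Return (neither, both) as fractions of the stratum population."""
    p_a, p_b = n_a / N, n_b / N
    neither = 1 - p_a - p_b  # negative: the samples cannot be kept apart
    both = p_a * p_b         # two independent draws
    return neither, both


shares = [overlap_shares(a, b, N)
          for a, b, N in zip(sample_4, sample_5, population)]
for label, (neither, both) in zip(size_classes, shares):
    print(f"{label:>6}: neither {neither:8.2%}  both {both:8.2%}")
```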

Page 18

Sample rotation Perceived Response Burden

• Compared to those not sampled, fairness is a weak argument
• For most businesses response burden appears slightly more evenly distributed
• Expected survey holidays are probably the most convincing argument

No of employees           0-4       5-9      10-19    20-49    50-99    100+
Sample no 5               71969     12843    15140    8368     2236     1982
Sample no 6               0         386      758      1256     1124     1982
Total sampled             75828     17469    23276    17999    10081    10802
Population size           349104    25683    15140    8368     2236     1982
Available for rotation    273276    8214
To be rotated             37914     8734

Survey holiday calculation: 7
Conditional holiday calculation: 9 2 0

Page 19

“A design that is robust to nonsampling errors will often be better than a highly optimized design that cannot be realized in practice”

Paul Smith’s summary of chapter 5