
A NOTE ON OPTIMUM STRATIFICATION IN SAMPLING WITH VARYING PROBABILITIES


Austral. J. Statist., 17 (1), 1975, 12-21

RAVINDRA SINGH
Punjab Agricultural University, Ludhiana, India

Summary

Singh and Sukhatme [4] have considered the problem of optimum stratification on an auxiliary variable $x$ when the units from the different strata are selected with probability proportional to the value of the auxiliary variable and the sample sizes for the different strata are determined by using the Neyman allocation method. The present paper considers the same problem for the proportional and equal allocation methods. The rules for finding approximately optimum strata boundaries for these two allocation methods have been given. An investigation into the relative efficiency of these allocation methods with respect to the Neyman allocation has also been made. The performance of equal allocation is found to be better than that of proportional allocation and practically equivalent to the Neyman allocation.

1. Introduction

Let the population under study be divided into $L$ strata and a stratified sample of size $n$ be drawn from this population, the sample drawn from the h-th stratum being of size $n_h$ so that $\sum_{h=1}^{L} n_h = n$. It is assumed that the units from the different strata are drawn with probabilities proportional to the values of the auxiliary variable $x$ and with replacement (ppswr). It is well known that if the study variable $y$ is highly positively correlated with the variable $x$ and the regression line of $y$ on $x$ passes (or nearly passes) through the origin then selecting the units with probability proportional to $x$ provides a more efficient estimate of the population mean $\bar{Y}$ in comparison to the simple mean estimate $\bar{y}$. It is easy to see that an unbiased stratified estimate of the mean $\bar{Y}$ is given by

$$\bar{y}_{st} = N^{-1} \sum_{h=1}^{L} \hat{Y}_h, \tag{1.1}$$

where $\hat{Y}_h$ is the usual unbiased estimate of the h-th stratum total $Y_h$ and is based on $n_h$ units drawn from that stratum. If for the h-th stratum $y_{hi}$ is the value of the study variable $y$ for the i-th unit, $x_{hi}$ is the corresponding value of the auxiliary variable $x$ and $X_h$ denotes the stratum total for $x$, the variance of the estimate $\bar{y}_{st}$ is given by
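For with-replacement pps sampling this variance has the standard Hansen-Hurwitz form, sketched here because the later references to (1.2) and (1.3) appear to point to it. The symbols $p_{hi} = x_{hi}/X_h$ (selection probability), $N_h$ (number of units in the h-th stratum) and $\sigma_h^2$ (stratum variance) are assumed notation, and the equation labels are matched to those later references rather than taken from a display in the text.

$$V(\bar{y}_{st}) = N^{-2} \sum_{h=1}^{L} \frac{\sigma_h^2}{n_h}, \tag{1.2}$$

$$\sigma_h^2 = \sum_{i=1}^{N_h} p_{hi}\left(\frac{y_{hi}}{p_{hi}} - Y_h\right)^2 = X_h \sum_{i=1}^{N_h} \frac{y_{hi}^2}{x_{hi}} - Y_h^2. \tag{1.3}$$

The second equality in (1.3) follows by expanding the square and using $\sum_i p_{hi} = 1$ and $\sum_i y_{hi} = Y_h$.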

Manuscript received April 30, 1974; revised September 4, 1974.


Whatever be the method of allocation the variance is clearly a function of strata boundaries. The question as to whether the relationship between $y$ and $x$ can also be profitably utilized to further increase the accuracy of the estimate of the population mean $\bar{Y}$ by utilizing techniques like stratification on the same variable $x$ was considered by Singh and Sukhatme [4]. For this purpose they considered the Neyman method of sample allocation and also obtained the method of determining the approximately optimum strata boundaries (AOSB) for this allocation method. In practice the use of Neyman allocation requires advance knowledge of the strata variances $\sigma_h^2$ $(h = 1, 2, \ldots, L)$ (defined in (1.3)), which is in general not available. It is, therefore, important to examine the possibility of having some other equally efficient allocation method which does not require any information about the strata parameters. One such procedure is the equal allocation method. In this paper we examine the relative performance of this allocation in comparison to the Neyman allocation. To widen the scope of the present investigation we have also considered the case of proportional allocation, which is very popular with survey statisticians.
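The three allocation methods compared in this paper can be sketched as follows. This is a minimal illustration, not the author's code: the function names are hypothetical, `sigmas` are assumed to be strata standard deviations on a per-unit scale (so that Neyman allocation takes $n_h$ proportional to $W_h \sigma_h$), and the sample sizes are left unrounded.

```python
from typing import List, Sequence


def proportional_allocation(n: int, weights: Sequence[float]) -> List[float]:
    """n_h = n * W_h, where W_h is the population proportion in stratum h."""
    return [n * w for w in weights]


def equal_allocation(n: int, num_strata: int) -> List[float]:
    """n_h = n / L for every stratum; needs no strata parameters at all."""
    return [n / num_strata] * num_strata


def neyman_allocation(
    n: int, weights: Sequence[float], sigmas: Sequence[float]
) -> List[float]:
    """n_h proportional to W_h * sigma_h; needs the strata standard deviations."""
    products = [w * s for w, s in zip(weights, sigmas)]
    total = sum(products)
    return [n * p / total for p in products]
```

Only the Neyman version takes the strata standard deviations as input, which is the practical difficulty raised above.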

2. Minimal Equations and their Approximate Solutions

Under the superpopulation model (2.1) of [4] it can be easily seen that the expected value of the variance in (1.2) with proportional and equal allocations of the sample to different strata is given by

$$V_P(\bar{y}_{st}) = n^{-1} \sum_{h=1}^{L} W_h\,(\mu_{hx}\mu_{h\theta} - \mu_{hc}^2) \tag{2.1}$$

and

$$V_E(\bar{y}_{st}) = L\,n^{-1} \sum_{h=1}^{L} W_h^2\,(\mu_{hx}\mu_{h\theta} - \mu_{hc}^2) \tag{2.2}$$

where $\theta(x) = (c^2(x) + \varphi(x))/x$ and $\mu_{h\theta}$ etc. denote the expected values of the corresponding functions of $x$ in the h-th stratum and $W_h$ is the proportion of the population falling in that stratum.
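The two expected variances (2.1) and (2.2) can be evaluated numerically for a given density and set of boundaries. The sketch below is an illustration under assumptions that are not stated in this section: $c(x)$ is read as the regression of $y$ on $x$ and $\varphi(x)$ as the conditional variance, the example takes $c(x) = x$ and $\varphi(x) = x$ on a rectangular density over $(1, 2)$, and the function names are hypothetical. Stratum moments are computed by a midpoint rule.

```python
import numpy as np


def stratum_terms(f, c, phi, lo, hi, m=2000):
    """Return (W_h, mu_hx * mu_htheta - mu_hc**2) for the stratum (lo, hi)."""
    width = (hi - lo) / m
    t = lo + (np.arange(m) + 0.5) * width      # cell midpoints
    mass = f(t) * width                        # probability mass of each cell
    W = mass.sum()
    theta = (c(t) ** 2 + phi(t)) / t           # theta(x) = (c^2(x) + phi(x)) / x
    mu_x = (mass * t).sum() / W
    mu_theta = (mass * theta).sum() / W
    mu_c = (mass * c(t)).sum() / W
    return W, mu_x * mu_theta - mu_c ** 2


def n_times_variances(f, c, phi, boundaries):
    """boundaries = [a, x_1, ..., x_{L-1}, b]; returns (n*V_P, n*V_E)."""
    terms = [
        stratum_terms(f, c, phi, lo, hi)
        for lo, hi in zip(boundaries[:-1], boundaries[1:])
    ]
    L = len(terms)
    n_vp = sum(W * s for W, s in terms)            # equation (2.1) times n
    n_ve = L * sum(W ** 2 * s for W, s in terms)   # equation (2.2) times n
    return n_vp, n_ve


uniform = lambda t: np.ones_like(t)   # rectangular density on (1, 2), assumed
c = lambda t: t                       # assumed regression function
phi_g1 = lambda t: t                  # assumed conditional variance, g = 1

n_vp_a, n_ve_a = n_times_variances(uniform, c, phi_g1, [1.0, 1.5, 2.0])
n_vp_b, n_ve_b = n_times_variances(uniform, c, phi_g1, [1.0, 1.2, 2.0])
```

With these assumed forms, $\mu_{hx}\mu_{h\theta} - \mu_{hc}^2$ reduces to $\mu_{hx}$, so `n_vp_a` and `n_vp_b` coincide: the proportional-allocation variance does not depend on the boundaries in this case.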

Proceeding on the lines of [4] it is easy to see that the minimal equations giving optimum strata boundaries corresponding to the minimum of variance in (2.1) are, for $i = h + 1$ and $h = 1, 2, \ldots, L-1$,

$$(\theta(x_h)\mu_{hx} + x_h\mu_{h\theta} - \mu_{hx}\mu_{h\theta}) - [c^2(x_h) - (c(x_h) - \mu_{hc})^2] = (\theta(x_h)\mu_{ix} + x_h\mu_{i\theta} - \mu_{ix}\mu_{i\theta}) - [c^2(x_h) - (c(x_h) - \mu_{ic})^2] \tag{2.3}$$

and their approximate solutions are given by the following cum. $\sqrt[3]{p_4(x)}$ rule:

cum. $\sqrt[3]{p_4(x)}$ rule: If the function $p_4(x) = ((c'(x))^2 - \theta'(x)) f(x)$ is bounded and possesses first two derivatives for all $x$ in $(a, b)$ then for a given value of $L$ taking equal intervals on the cum. $\sqrt[3]{p_4(x)}$ yields approximately optimum strata boundaries.
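The cum. $\sqrt[3]{p_4(x)}$ rule can be sketched numerically on a fine grid. This is an illustration only: the function name is hypothetical, the derivatives $c'$ and $\theta'$ are taken by finite differences, and it is assumed that $p_4$ keeps one sign on $(a, b)$ and is not identically zero (otherwise equal intervals on the cumulative function do not determine the boundaries). The example uses a rectangular density on $(1, 2)$ with the assumed forms $c(x) = x$ and $\varphi(x) = x^g$.

```python
import numpy as np


def cum_cbrt_p4_boundaries(f, c, theta, a, b, L, m=20000):
    """Boundaries x_1 < ... < x_{L-1} from equal intervals on cum. cbrt(p4)."""
    t = np.linspace(a, b, m + 1)
    c_prime = np.gradient(c(t), t)             # finite-difference c'(t)
    theta_prime = np.gradient(theta(t), t)     # finite-difference theta'(t)
    p4 = (c_prime ** 2 - theta_prime) * f(t)
    root = np.cbrt(p4)                         # real cube root, sign preserved
    steps = 0.5 * (root[1:] + root[:-1]) * np.diff(t)   # trapezoid pieces
    cum = np.concatenate(([0.0], np.cumsum(steps)))
    if cum[-1] < 0:                            # p4 negative: flip so cum increases
        cum = -cum
    targets = cum[-1] * np.arange(1, L) / L    # equal intervals on the cum scale
    return np.interp(targets, cum, t)


uniform = lambda t: np.ones_like(t)
c = lambda t: t
theta_g0 = lambda t: (t ** 2 + 1.0) / t        # phi(x) = 1, i.e. g = 0
theta_g2 = lambda t: (t ** 2 + t ** 2) / t     # phi(x) = x^2, i.e. g = 2

b_g0 = cum_cbrt_p4_boundaries(uniform, c, theta_g0, 1.0, 2.0, L=2)
b_g2 = cum_cbrt_p4_boundaries(uniform, c, theta_g2, 1.0, 2.0, L=3)
```

Under these assumptions $p_4(x) = x^{-2}$ for $g = 0$, giving a single boundary near 1.443, and $p_4$ is constant for $g = 2$, giving equally spaced boundaries.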

Similarly in the case of equal allocation the minimal equations are obtained as

$$W_h[\theta(x_h)\mu_{hx} + x_h\mu_{h\theta} - 2c(x_h)\mu_{hc}] = W_i[\theta(x_h)\mu_{ix} + x_h\mu_{i\theta} - 2c(x_h)\mu_{ic}] \tag{2.4}$$

where $i = h + 1$ and $h = 1, 2, \ldots, L-1$. The approximate solutions to these equations are obtained by the cum. $f\sqrt{\varphi}$ rule given below.

cum. $f\sqrt{\varphi}$ rule: If the function $f(x)\sqrt{\varphi(x)}$ is bounded and possesses first two derivatives for all $x$ in $(a, b)$ then for a given value of $L$ taking equal intervals on the cum. $f(x)\sqrt{\varphi(x)}$ yields approximately optimum strata boundaries.
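When $x$ is available only as a frequency table, the cum. $f\sqrt{\varphi}$ rule can be applied class by class. The following is a sketch under stated assumptions, not the author's procedure: the function name and arguments are hypothetical, $\varphi$ is evaluated at class mid values, and boundaries falling inside a class are found by linear interpolation. The example takes a rectangular density on $(1, 2)$ split into 1000 classes, with the assumed form $\varphi(x) = x^g$.

```python
import numpy as np


def cum_f_sqrt_phi_boundaries(edges, freqs, phi, L):
    """Boundaries from equal intervals on cum. f*sqrt(phi) over a frequency table.

    edges : class limits, length M + 1
    freqs : relative frequencies of the M classes
    phi   : conditional variance function, evaluated at class mid values
    """
    edges = np.asarray(edges, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    mids = 0.5 * (edges[:-1] + edges[1:])
    increments = freqs * np.sqrt(phi(mids))            # f_i * sqrt(phi(z_i))
    cum = np.concatenate(([0.0], np.cumsum(increments)))
    targets = cum[-1] * np.arange(1, L) / L            # equal intervals on cum scale
    return np.interp(targets, cum, edges)              # interpolate within classes


edges = np.linspace(1.0, 2.0, 1001)
freqs = np.full(1000, 1.0 / 1000)

b_g0 = cum_f_sqrt_phi_boundaries(edges, freqs, lambda z: np.ones_like(z), L=2)
b_g1 = cum_f_sqrt_phi_boundaries(edges, freqs, lambda z: z, L=2)
b_g2 = cum_f_sqrt_phi_boundaries(edges, freqs, lambda z: z ** 2, L=2)
```

For this example the single boundary moves from 1.5 ($g = 0$) to about 1.542 ($g = 1$) and about 1.581 ($g = 2$), since a larger $g$ puts more of the cumulative $f\sqrt{\varphi}$ at the upper end of the range.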

3. The Relative Efficiency

In this section we shall find approximate expressions for the relative efficiencies of proportional and equal allocations in comparison to the Neyman allocation. It will be assumed that when following any particular allocation method the strata boundaries are obtained by following the corresponding stratification rule. Thus for example the relative efficiency for equal allocation will be given by

$$\text{R.E.} = V_N(\bar{y}_{st}) / V_E(\bar{y}_{st})$$

where the variance in the numerator corresponds to the Neyman allocation and is based on strata obtained by using the stratification rules given in [4] and the variance in the denominator corresponds to equal allocation and is based on strata obtained by using the cum. $f\sqrt{\varphi}$ rule given in this paper. For finding the expressions for the relative efficiency we shall make use of the following two lemmas.

Lemma 3.1. If $(x_{h-1}, x_h)$ are the boundaries of the h-th stratum and $K_h = x_h - x_{h-1}$, then

3

- e w u w t ] (1 +o(G))

Lemma 3.2. If $(x_{h-1}, x_h)$ are the boundaries of the h-th stratum and $K_h = x_h - x_{h-1}$, then

where $p_4(t) = ((c'(t))^2 - \theta'(t)) f(t)$. Now for the strata boundaries obtained by the cum. $\sqrt[3]{p_4(x)}$ rule we have


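$$\int_{x_{h-1}}^{x_h} \sqrt[3]{p_4(t)}\,dt = \frac{1}{L}\int_a^b \sqrt[3]{p_4(t)}\,dt. \tag{3.4}$$

Therefore, if the terms of order $O(m^4)$, $m = \sup_{(a,b)}(K_h)$, are neglected we have approximately

$$nV_P(\bar{y}_{st}) = \int_a^b \varphi(t) f(t)\,dt + (12L^2)^{-1}\left(\int_a^b \sqrt[3]{p_4(t)}\,dt\right)^3. \tag{3.5}$$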
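The step from (3.4) to (3.5) can be sketched as follows. The first line is an assumed form of the small-width expansion of a single stratum's contribution to (2.1), inferred from (3.5) and the definition of $p_4$ rather than quoted from the lemma; the remaining lines only sum it over the $L$ strata using (3.4).

$$W_h(\mu_{hx}\mu_{h\theta} - \mu_{hc}^2) \approx \int_{x_{h-1}}^{x_h} \varphi(t) f(t)\,dt + \frac{1}{12}\left(\int_{x_{h-1}}^{x_h} \sqrt[3]{p_4(t)}\,dt\right)^3$$

$$\begin{aligned}
nV_P(\bar{y}_{st}) &= \sum_{h=1}^{L} W_h(\mu_{hx}\mu_{h\theta} - \mu_{hc}^2) \\
&\approx \int_a^b \varphi(t) f(t)\,dt + \frac{L}{12}\left(\frac{1}{L}\int_a^b \sqrt[3]{p_4(t)}\,dt\right)^3 \\
&= \int_a^b \varphi(t) f(t)\,dt + \frac{1}{12L^2}\left(\int_a^b \sqrt[3]{p_4(t)}\,dt\right)^3 .
\end{aligned}$$

The second term falls off as $L^{-2}$, so for large $L$ the variance approaches $\int_a^b \varphi(t) f(t)\,dt$.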

Also from the relation (4.3) of [4] we have

Thus we get from (3.5) and (3.6) the relative efficiency of proportional allocation as

If the frequency distribution of $x$ is available in the form of $M$ classes and the functional forms for $\varphi(t)$, $p_2(t)$ etc. are known then one can get the approximate value of R.E. for any $L$ by finding the values of $A$, $B$, $C$ and $D$ from the frequency table. For this purpose we can approximate $A$ by $\sum_{i=1}^{M} f_i \sqrt{\varphi(z_i)}$ where $z_i$ is the mid value of the i-th class interval, and similar expressions can be used for $B$, $C$ and $D$.

In case of equal allocation the approximately optimum strata boundaries (AOSB) are obtained by taking equal intervals on the cum. $f(x)\sqrt{\varphi(x)}$; therefore,

$$\int_{x_{h-1}}^{x_h} f(t)\sqrt{\varphi(t)}\,dt = \frac{1}{L}\int_a^b f(t)\sqrt{\varphi(t)}\,dt. \tag{3.8}$$

If the effect of the differences between AOSB for the Neyman and equal allocations on the value of $\int_{x_{h-1}}^{x_h} \sqrt[3]{p_4(t)}\,dt$ are neglected, the value of this term for the AOSB for equal allocation can be approximately taken as $\int_a^b \sqrt[3]{p_4(t)}\,dt / L$. Then we get from Lemma 3.2 and relation (3.8) after dropping the terms involving higher powers of strata widths

(3.9)

and the variance for equal allocation becomes

From the relations (3.6) and (3.10) it is seen that the variances for the two allocations are approximately the same. The two allocations are, therefore, approximately equally efficient for large values of $L$ and for the situation considered in this paper. This observation is also supported by the numerical investigation made in the next section.
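The near-equivalence of equal and Neyman allocation can be checked on a small closed-form case. The sketch below is an illustration under assumptions, not a reproduction of the paper's computations: the function names are hypothetical, the Neyman variance is taken in its usual form $nV_N = (\sum_h W_h\sigma_h)^2$ with $\sigma_h^2 = \mu_{hx}\mu_{h\theta} - \mu_{hc}^2$, and the same boundaries are used for every allocation instead of each allocation's own AOSB. The example is a rectangular density on $(1, 2)$ with the assumed forms $c(x) = x$ and $\varphi(x) = x^2$, for which $\sigma_h^2 = \mu_{hx}^2$.

```python
from math import sqrt


def n_variances(weights, variances):
    """Return (n*V_N, n*V_P, n*V_E) from strata weights W_h and variances sigma_h^2."""
    L = len(weights)
    pairs = list(zip(weights, variances))
    n_vn = sum(W * sqrt(s2) for W, s2 in pairs) ** 2   # Neyman allocation
    n_vp = sum(W * s2 for W, s2 in pairs)              # proportional, as in (2.1)
    n_ve = L * sum(W ** 2 * s2 for W, s2 in pairs)     # equal, as in (2.2)
    return n_vn, n_vp, n_ve


def percent_re(n_vn, n_v):
    """Percentage relative efficiency with respect to Neyman allocation."""
    return 100.0 * n_vn / n_v


def uniform_strata(x1):
    """Two strata (1, x1) and (x1, 2) of the rectangular density on (1, 2)."""
    weights = [x1 - 1.0, 2.0 - x1]
    means = [(1.0 + x1) / 2.0, (x1 + 2.0) / 2.0]
    return weights, [m * m for m in means]             # sigma_h^2 = mu_hx^2


# Equal allocation with the cum. f*sqrt(phi) boundary sqrt(2.5) for L = 2.
n_vn, _, n_ve = n_variances(*uniform_strata(sqrt(2.5)))
re_equal = percent_re(n_vn, n_ve)

# Proportional allocation with the equal-width boundary 1.5 for L = 2.
n_vn_p, n_vp, _ = n_variances(*uniform_strata(1.5))
re_proportional = percent_re(n_vn_p, n_vp)
```

In this example the equal-allocation efficiency is 100 per cent, while proportional allocation comes out at about 97.3 per cent.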

4. Numerical Investigation

For numerical investigation in this paper we consider the same four densities for $x$ which were considered by Singh and Sukhatme [4]. The truncation of exponential and right normal densities is also the same as in [4]. In this case also the AOSB are obtained by using the frequency distribution versions of the different stratification rules. For calculating the expected value of $x^{-1}$ (needed when $g = 0$) in different strata we have used numerical integration methods for exponential and normal distributions. Each stratum width in these cases was divided into 100 equal intervals. Since in [4] $nV_N(\bar{y}_{st})$, the variance for Neyman allocation, used expected values of $x^{-1}$ numerically calculated by dividing the strata widths into a number of equal intervals different from 100, these variances were recalculated on the basis of 100 equal width divisions of the strata widths.
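The stratum expectation of $x^{-1}$ over 100 equal subintervals can be sketched as below. The text does not name the quadrature rule, so the composite trapezoid rule used here is an assumption, as are the function name and the example strata; the density only needs to be known up to a constant because it is normalised within the stratum.

```python
import math


def stratum_mean_inverse_x(f, lo, hi, pieces=100):
    """E[1/x | lo < x < hi] for density f, using `pieces` equal subintervals."""
    h = (hi - lo) / pieces
    xs = [lo + i * h for i in range(pieces + 1)]

    def trapezoid(g):
        inner = sum(g(x) for x in xs[1:-1])
        return h * (0.5 * g(xs[0]) + inner + 0.5 * g(xs[-1]))

    # Ratio of integrals: numerator weights 1/x by f, denominator is the stratum mass.
    return trapezoid(lambda x: f(x) / x) / trapezoid(f)


# Rectangular density on the stratum (1, 2): the exact value is log(2).
uniform_value = stratum_mean_inverse_x(lambda x: 1.0, 1.0, 2.0)

# Exponential density exp(-x) on the stratum (1, 2); no closed form is used here.
expo_value = stratum_mean_inverse_x(lambda x: math.exp(-x), 1.0, 2.0)
```

For the rectangular case the result can be compared with $\log 2$, which gives a quick check on how much accuracy 100 subintervals provide.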

In the following four tables are given the strata boundaries, variances and the relative efficiencies of the proportional and equal allocations for the three values of $g$ (i.e. 0, 1 and 2) and for the four density functions. The relative efficiency of the equal allocation is found to be generally more than that for the proportional allocation. The minimum value for the relative efficiency of equal allocation is found to be 99.93 per cent and hence under the situations considered one can safely use equal allocation of the sample to different strata, with the cum. $f\sqrt{\varphi}$ rule for finding the corresponding AOSB, in place of going in for the Neyman allocation. The minimum value for the relative efficiency of proportional allocation is 82.36 per cent. The proportional allocation, therefore, does not qualify for recommendation in place of the Neyman or the equal allocation. The relative efficiency for the proportional allocation is also found to decrease with the increase in the value of $g$. In some cases the relative efficiencies are found to be more than 100 per cent. This observation has also been made earlier by Singh and Parkash [5] and Singh [6]. This happens because the variances used in calculating the relative efficiencies are based on approximate solutions of the minimal equations and not on the exact solutions. Also the relative efficiency for these values of $c(x)$ and $\varphi(x)$ does not depend on the correlation value.

In the case of proportional allocation no boundaries are given in the tables for $g = 1$ because in this case any set of boundaries is optimum.

TABLE 1
AOSB and Percentage Relative Efficiency (R.E.): Rectangular Distribution
[Columns: $g$; $L$; Proportional Allocation: AOSB, $nV_P(\bar{y}_{st})$, R.E.; Equal Allocation: AOSB, $nV_E(\bar{y}_{st})$, R.E.]

TABLE 2
AOSB and Percentage Relative Efficiency (R.E.): Right Triangular Distribution
[Columns: $g$; $L$; Proportional Allocation: AOSB, $nV_P(\bar{y}_{st})$, R.E.; Equal Allocation: AOSB, $nV_E(\bar{y}_{st})$, R.E.]

TABLE 3
AOSB and Percentage Relative Efficiency (R.E.): Exponential Distribution
[Columns: $g$; $L$; Proportional Allocation: AOSB, $nV_P(\bar{y}_{st})$, R.E.; Equal Allocation: AOSB, $nV_E(\bar{y}_{st})$, R.E.]

TABLE 4
AOSB and Percentage Relative Efficiency (R.E.): Normal Distribution
[Columns: $g$; $L$; Proportional Allocation: AOSB, $nV_P(\bar{y}_{st})$, R.E.; Equal Allocation: AOSB, $nV_E(\bar{y}_{st})$, R.E.]


Acknowledgement

The author is grateful to the referee for certain valuable suggestions.

References

[1] Singh, Ravindra, and Sukhatme, B. V. (1969). "Optimum stratification." Ann. Inst. Statist. Math., 21, 515-528.

[2] Singh, Ravindra (1971). "An expression for the variance of the estimate of mean in stratified simple random sampling." Mimeograph, Department of Mathematics and Statistics, Punjab Agricultural University, Ludhiana (Punjab).

[3] Singh, Ravindra, and Sukhatme, B. V. (1972). "A note on optimum stratification." Jour. Ind. Soc. Ag. Statist., 24, 91-98.

[4] Singh, Ravindra, and Sukhatme, B. V. (1972). "Optimum stratification in sampling with varying probabilities." Ann. Inst. Statist. Math., 24, 485-494.

[5] Singh, Ravindra, and Dev Parkash. "Optimum stratification for equal allocation." Ann. Inst. Statist. Math. (To appear.)

[6] Singh, Ravindra. "A note on optimum stratification for equal allocation with ratio and regression methods of estimation." Jour. Ind. Soc. Ag. Statist. (Submitted for publication.)