
A NOTE ON OPTIMUM STRATIFICATION FOR EQUAL ALLOCATION WITH RATIO AND REGRESSION METHODS OF ESTIMATION


Austral. J. Statist., 19(2), 1977, 96-104.


RAVINDRA SINGH Punjab Agricultural University, Ludhiana, India.

Summary

When the information on a highly positively correlated auxiliary variable $x$ is used to construct stratified regression (or ratio) estimates of the population mean of the study variable $y$, the paper considers the problem of determining approximately optimum strata boundaries (AOSB) on $x$ when the sample size in each stratum is equal. The form of the conditional variance function $V(y/x)$ is assumed to be known. A numerical investigation into the relative efficiency of equal allocation with respect to the Neyman and proportional allocations has also been made. The relative efficiency of equal allocation with respect to Neyman allocation is found to be nearly equal to one.

1. Introduction

For the case of simple random sampling, it is well known that if we have information on an auxiliary variable $x$ highly correlated with the study variable $y$, the population mean can be estimated with much greater efficiency by using a regression estimate in place of the simple mean estimate. The question of whether the information on the auxiliary variable $x$ can also be used to increase the efficiency of the estimate further, by adopting techniques like stratification, was considered by Singh and Sukhatme [5]. For allocating the sample to different strata they used the Neyman allocation method. To use this method of allocation it is essential to have numerical values for certain strata parameters. In practice this information is generally not available and, therefore, it is of interest to investigate the relative efficiency of other methods of sample allocation which do not need such strata parameter values. Two such methods are the proportional and equal allocation methods. Under the regression model considered in this paper the variance of the stratified regression (or ratio) estimate of the population mean under the proportional allocation of the sample to different strata is constant and is, therefore, not affected by changes in strata boundaries or the number of strata (Remark (iv), Section 4). It is, therefore, of interest to investigate the relative efficiency of the equal allocation method with respect to the Neyman allocation method. The present paper deals with this problem.

Manuscript received November 12, 1975; revised January 26, 1977.

For theoretical development, let us assume that the population under study is infinite and is to be divided into $L$ strata. A stratified simple random sample of size $n$ is drawn from it, the sample drawn from the $h$th ($h = 1, 2, \ldots, L$) stratum being of size $n_h$, so that $\sum_{h=1}^{L} n_h = n$. If the regression of $y$ on $x$ is linear and the regression coefficient does not vary much from stratum to stratum, the combined and separate regression estimates of the population mean are nearly equally efficient. The combined estimate has the additional advantage of smaller bias over the separate estimate, especially when the sample sizes within the strata are small. Because of these considerations, we shall consider in this paper the combined regression estimate.

The combined regression estimate of the population mean $\bar{Y}$ is given by

$$\bar{y}_{lc} = \sum_{h=1}^{L} W_h \bar{y}_h + b\left(\bar{X} - \sum_{h=1}^{L} W_h \bar{x}_h\right), \tag{1.1}$$

where for the $h$th stratum $W_h$ = proportion of units in the stratum, $\bar{y}_h$ = sample mean for the variable $y$, $\bar{x}_h$ = sample mean for the variable $x$, $s_{hxy}$ = sample covariance between $x$ and $y$, $s_{hx}^2$ = sample mean square for the variable $x$, and $\bar{X}$ = population mean for the variable $x$. We shall assume a knowledge of $\bar{X}$.

Now the large sample variance of the estimate $\bar{y}_{lc}$ is, to the first order of approximation, given by

$$V(\bar{y}_{lc}) = \sum_{h=1}^{L} W_h^2 \left(\sigma_{hy}^2 - 2\beta\sigma_{hxy} + \beta^2\sigma_{hx}^2\right)/n_h, \tag{1.2}$$

where in the $h$th stratum $\sigma_{hy}^2$ = variance of $y$, $\sigma_{hxy}$ = covariance between $x$ and $y$, $\sigma_{hx}^2$ = variance of $x$, and $\beta$ is the population counterpart of $b$.

If the sample sizes $n_h$ are equal in each stratum, then the variance in (1.2) becomes

$$V(\bar{y}_{lc})_E = L n^{-1} \sum_{h=1}^{L} W_h^2 \left(\sigma_{hy}^2 - 2\beta\sigma_{hxy} + \beta^2\sigma_{hx}^2\right). \tag{1.3}$$
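The combined regression estimate in (1.1) can be sketched in a few lines. This is only an illustration under stated assumptions: the text does not spell out how $b$ is computed, so the pooled slope used below (weighted by $W_h^2/n_h$) is an assumed, commonly used choice, and the function name and the simulated two-stratum data are hypothetical.

```python
import numpy as np

def combined_regression_estimate(strata, weights, x_pop_mean):
    """Return (estimate, b) for a list of per-stratum (x, y) samples."""
    W = np.asarray(weights, dtype=float)
    n_h = np.array([len(x) for x, _ in strata], dtype=float)
    x_bar = np.array([np.mean(x) for x, _ in strata])
    y_bar = np.array([np.mean(y) for _, y in strata])
    # Per-stratum sample covariance s_hxy and sample mean square s_hx^2.
    s_xy = np.array([np.cov(x, y, ddof=1)[0, 1] for x, y in strata])
    s_xx = np.array([np.var(x, ddof=1) for x, _ in strata])
    # ASSUMPTION: pooled slope; the source does not give the formula for b.
    b = np.sum(W**2 * s_xy / n_h) / np.sum(W**2 * s_xx / n_h)
    estimate = np.sum(W * y_bar) + b * (x_pop_mean - np.sum(W * x_bar))
    return float(estimate), float(b)

# Hypothetical data: two equal-weight strata on [1, 2], equal allocation
# (10 units each), and an exactly linear relation y = 2 + 3x.
rng = np.random.default_rng(0)
x1 = rng.uniform(1.0, 1.5, size=10)
x2 = rng.uniform(1.5, 2.0, size=10)
strata = [(x1, 2.0 + 3.0 * x1), (x2, 2.0 + 3.0 * x2)]
estimate, b = combined_regression_estimate(strata, [0.5, 0.5], x_pop_mean=1.5)
print(estimate, b)
```

Because the simulated relation is exactly linear, the estimate reproduces the true mean $2 + 3\bar{X}$ whatever sample is drawn, which is a convenient sanity check on the formula.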


The variance expression in (1.3) is clearly a function of the strata boundaries and can be reduced by suitably choosing them. The problem of determining the optimum strata boundaries was first considered by Dalenius [1] and Hayashi, Maruyama and Isida [2]. Subsequent work in this direction has been reviewed by Singh [4]. We now consider the problem of determining these boundaries for the combined regression estimate in stratified sampling with the equal allocation method.

2. Minimal Equations and their Approximate Solutions

Under the super-population model (2.1) of [5] it can be easily seen that the expected value of the variance in (1.3) with equal allocation of the sample to different strata is given by

$$V(\bar{y}_{lc})_E = L n^{-1} \sum_{h=1}^{L} W_h^2 \left(\sigma_{h\lambda}^2 + \mu_{h\varphi}\right), \tag{2.1}$$

where $\lambda(x) = c(x) - \beta x$. The variance in (2.1) is the same as the one obtained by Singh and Parkash [6] with $c(x)$ replaced by $\lambda(x)$. The minimal equations giving optimum strata boundaries and the methods of finding their approximate solutions in this case are, therefore, the same as those obtained in [6].

The regression estimate is usually employed when the regression of $y$ on $x$ is linear and is of the form

$$y = \alpha + \beta x. \tag{2.2}$$

Therefore, in such cases $c(x) = \alpha + \beta x$ and $\lambda(x) = \alpha$, so that $\sigma_{h\lambda}^2 = 0$ and the variance $V(\bar{y}_{lc})_E$ reduces to

$$V(\bar{y}_{lc})_E = L n^{-1} \sum_{h=1}^{L} W_h^2 \mu_{h\varphi}, \tag{2.3}$$

and the minimal equations giving the optimum strata boundaries become

$$W_h\left[\varphi(x_h) + \mu_{h\varphi}\right] = W_i\left[\varphi(x_h) + \mu_{i\varphi}\right], \tag{2.4}$$

where $i = h + 1$ and $h = 1, 2, \ldots, L-1$. The approximate solutions to the equations in (2.4) are given by the following cum $f\sqrt{\varphi}$ rule of Singh and Parkash [6].

Cum $f\sqrt{\varphi}$ rule. If the function $f(x)\sqrt{\varphi(x)}$ is bounded and its first two derivatives exist for all $x$ in $(a, b)$ with $(b - a) < \infty$, then the approximate solutions to the minimal equations in (2.4) are obtained by taking equal intervals on the cum $f(x)\sqrt{\varphi(x)}$.
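The rule can be sketched numerically as follows. This is a minimal illustration, not the procedure used for the tables in the paper: the cumulative integral of $f\sqrt{\varphi}$ is taken on a fine grid by the trapezoidal rule and the boundaries are read off by linear interpolation. The function name, the grid size and the choice $\varphi(x) = x^g$ with unit constant are assumptions made for the example.

```python
import numpy as np

def cum_f_sqrt_phi_boundaries(f, phi, a, b, L, n_grid=200_001):
    """Approximate strata boundaries: equal intervals on cum f*sqrt(phi)."""
    t = np.linspace(a, b, n_grid)
    g = f(t) * np.sqrt(phi(t))
    # Trapezoidal cumulative integral of f*sqrt(phi) from a to each grid point.
    cum = np.concatenate(([0.0], np.cumsum(0.5 * (g[1:] + g[:-1]) * np.diff(t))))
    # Cut the total into L equal parts and invert the cumulative function.
    targets = cum[-1] * np.arange(1, L) / L
    return np.interp(targets, cum, t)

rectangular = lambda t: np.ones_like(t)       # f(x) = 1 on [1, 2]
right_triangular = lambda t: 2.0 * (2.0 - t)  # f(x) = 2(2 - x) on [1, 2]

rect_g1_L2 = cum_f_sqrt_phi_boundaries(rectangular, lambda t: t, 1.0, 2.0, 2)
rect_g2_L3 = cum_f_sqrt_phi_boundaries(rectangular, lambda t: t**2, 1.0, 2.0, 3)
tri_g1_L2 = cum_f_sqrt_phi_boundaries(right_triangular, lambda t: t, 1.0, 2.0, 2)
print(rect_g1_L2, rect_g2_L3, tri_g1_L2)
```

The printed boundaries can be set against the corresponding entries of Table I; small differences are to be expected because the tabulated values come from a frequency distribution version of the rule.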


3. The Relative Efficiency

In order to find an expression for the relative efficiency of equal allocation with respect to the Neyman allocation, we shall make use of the following lemma.

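Lemma 3.1. If $(x_{h-1}, x_h)$ are the boundaries of the $h$th stratum and $K_h = x_h - x_{h-1}$, then

$$W_h\sqrt{\mu_{h\varphi}} = \int_{x_{h-1}}^{x_h} \sqrt{\varphi(t)}\, f(t)\, dt + \frac{1}{96}\left[\int_{x_{h-1}}^{x_h} \sqrt[3]{p_3(t)}\, dt\right]^3 + O(K_h^4), \tag{3.1}$$

where

$$p_3(t) = f(t)\, \varphi'^2(t) / (\varphi(t))^{3/2}.$$

The lemma can be easily proved by using the Taylor series expansions of different terms in (3.1) about the point $x_h$.

As in the case of Neyman allocation the AOSB are obtained by taking equal intervals on the cum $\sqrt[3]{p_3(x)}$ ([5]), therefore

$$\int_{x_{h-1}}^{x_h} \sqrt[3]{p_3(t)}\, dt = L^{-1} \int_a^b \sqrt[3]{p_3(t)}\, dt. \tag{3.2}$$

Hence we get approximately, after neglecting the terms of order $O(m^4)$, where $m = \sup_h K_h$,

$$\sum_{h=1}^{L} W_h\sqrt{\mu_{h\varphi}} = \int_a^b \sqrt{\varphi(t)}\, f(t)\, dt + \frac{1}{96L^2}\left[\int_a^b \sqrt[3]{p_3(t)}\, dt\right]^3 = A + B/96L^2,$$

where

$$A = \int_a^b \sqrt{\varphi(t)}\, f(t)\, dt$$

and

$$B = \left[\int_a^b \sqrt[3]{p_3(t)}\, dt\right]^3.$$

Thus we have for the variance under Neyman allocation

$$nV(\bar{y}_{lc})_N = \left(\sum_{h=1}^{L} W_h\sqrt{\mu_{h\varphi}}\right)^2 = (A + B/96L^2)^2. \tag{3.3}$$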

In the case of equal allocation the AOSB are obtained by taking equal intervals on the cum $f(x)\sqrt{\varphi(x)}$, therefore,

$$\int_{x_{h-1}}^{x_h} \sqrt{\varphi(t)}\, f(t)\, dt = \int_a^b \sqrt{\varphi(t)}\, f(t)\, dt \Big/ L. \tag{3.4}$$

If the effects of the difference between the AOSB for the two allocations on the value of $\int_{x_{h-1}}^{x_h} \sqrt[3]{p_3(t)}\, dt$ are neglected, the value of this term for the AOSB for equal allocation can be approximately taken as $\int_a^b \sqrt[3]{p_3(t)}\, dt / L$. Then we get, after dropping the terms involving higher powers of strata widths,

$$W_h\sqrt{\mu_{h\varphi}} = (A + B/96L^2)/L, \tag{3.5}$$

and the variance for the equal allocation becomes

$$nV(\bar{y}_{lc})_E = L \sum_{h=1}^{L} W_h^2 \mu_{h\varphi} = (A + B/96L^2)^2. \tag{3.6}$$

From the relations (3.3) and (3.6) it is seen that the variances for the two allocations are approximately the same. The two allocations are, therefore, nearly equally efficient. This observation is also supported by the numerical investigation made in Section 5 of the paper.
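The closeness of the approximation in (3.6) can be checked numerically. The sketch below is an illustration under assumptions, not part of the original derivation: it takes the rectangular density on $[1, 2]$ with $\varphi(x) = x^g$, $g = 1$ and unit constant, reads $A$ as $\int_a^b \sqrt{\varphi} f\, dt$ and $B$ as $\left[\int_a^b \sqrt[3]{p_3}\, dt\right]^3$ with $p_3 = f\varphi'^2/\varphi^{3/2}$, and compares $L\sum_h W_h^2\mu_{h\varphi}$ at the cum $f\sqrt{\varphi}$ boundaries with $(A + B/96L^2)^2$. All names are hypothetical.

```python
import numpy as np

def cumtrapz(y, t):
    """Cumulative trapezoidal integral of y over the grid t, starting at 0."""
    return np.concatenate(([0.0], np.cumsum(0.5 * (y[1:] + y[:-1]) * np.diff(t))))

# ASSUMPTIONS: rectangular density on [1, 2], phi(x) = x**g with g = 1.
a, b, g = 1.0, 2.0, 1
t = np.linspace(a, b, 200_001)
f = np.ones_like(t)
phi = t**g
dphi = g * t**(g - 1)                 # derivative of phi

cum_rule = cumtrapz(f * np.sqrt(phi), t)   # cum f*sqrt(phi)
cum_W = cumtrapz(f, t)                     # cumulative stratum weight
cum_M = cumtrapz(phi * f, t)               # cumulative integral of phi*f

A = cum_rule[-1]
p3 = f * dphi**2 / phi**1.5
B = cumtrapz(np.cbrt(p3), t)[-1] ** 3

def n_var_equal(L):
    """L * sum_h W_h^2 mu_h at the cum f*sqrt(phi) boundaries."""
    edges = np.interp(cum_rule[-1] * np.arange(L + 1) / L, cum_rule, t)
    W = np.diff(np.interp(edges, t, cum_W))
    M = np.diff(np.interp(edges, t, cum_M))   # equals W_h * mu_h
    return float(L * np.sum(W * M))

def n_var_approx(L):
    """The approximation (A + B / (96 L^2))^2."""
    return float((A + B / (96 * L**2)) ** 2)

for L in (2, 3, 4, 5, 6):
    print(L, n_var_equal(L), n_var_approx(L))
```

The two columns are scaled by the unit constant in $\varphi$, so they are comparable with each other but not directly with the entries of Table I.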

4. Some Further Remarks

(i) The various other methods of finding the approximate solutions to the equations in (2.4) can be obtained by proceeding on the lines of Singh and Sukhatme [3].

(ii) In this paper we have not considered the situation where the conditional variance function $\varphi(x)$ is to be estimated from the data.

(iii) As the cum $f\sqrt{\varphi}$ rule does not depend on the function $\lambda(x)$, the same rule can be used to determine the AOSB in situations where the regression function $c(x)$ is not linear. Also the same rule is applicable for finding the AOSB for the stratified ratio estimate with equal allocation of the sample to different strata.

(iv) When the function $c(x)$ is linear, the variance of the estimate $\bar{y}_{lc}$ with proportional allocation of the sample to different strata becomes equal to $n^{-1}$ times the expected value of the function $\varphi(x)$ in the whole population for any value of $L$. With this allocation, therefore, the stratification does not result in any reduction of the variance $V(\bar{y}_{lc})$.

(v) If $\varphi(x)$ is of the form $\varphi(x) = a x^g$, it is interesting to note that the AOSB so obtained do not depend on the extent of correlation between the two variables. They depend only on the density function of $x$ and on the value of $g$.
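Remarks (iv) and (v) can be illustrated with a short numerical sketch. It rests on assumptions that are not stated in this form in the text: the conditional variance is taken as a constant times $x^g$ (the constant is called `a_const` below), the right triangular density is used, and the variance under proportional allocation is computed from the general-allocation analogue of (2.3), $\sum_h W_h^2\mu_{h\varphi}/n_h$ with $n_h = nW_h$. All function names are hypothetical.

```python
import numpy as np

def cumtrapz(y, t):
    """Cumulative trapezoidal integral of y over the grid t, starting at 0."""
    return np.concatenate(([0.0], np.cumsum(0.5 * (y[1:] + y[:-1]) * np.diff(t))))

t = np.linspace(1.0, 2.0, 100_001)
f = 2.0 * (2.0 - t)            # right triangular density on [1, 2]

def aosb(a_const, g, L):
    """Boundaries from equal intervals on cum f*sqrt(phi), phi = a_const*x**g."""
    cum = cumtrapz(f * np.sqrt(a_const * t**g), t)
    return np.interp(cum[-1] * np.arange(1, L) / L, cum, t)

def n_var_proportional(boundaries, a_const, g):
    """n * V under proportional allocation: sum_h W_h * mu_h."""
    edges = np.concatenate(([t[0]], boundaries, [t[-1]]))
    W = np.diff(np.interp(edges, t, cumtrapz(f, t)))
    M = np.diff(np.interp(edges, t, cumtrapz(a_const * t**g * f, t)))  # W_h * mu_h
    mu = M / W
    return float(np.sum(W * mu))

# Remark (v): the constant multiplier drops out of the boundaries.
b_small = aosb(0.01, 2, 4)
b_large = aosb(5.0, 2, 4)

# Remark (iv): proportional allocation gives the same n*V for any boundaries.
v_rule = n_var_proportional(aosb(1.0, 2, 4), 1.0, 2)
v_other = n_var_proportional(np.array([1.1, 1.6]), 1.0, 2)
print(b_small, b_large, v_rule, v_other)
```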


TABLE I
AOSB and the Variance $nV(\bar{y}_{lc})_E$

| p.d.f. | L | AOSB ($g = 1$) | $nV(\bar{y}_{lc})_E$ ($g = 1$) | AOSB ($g = 2$) | $nV(\bar{y}_{lc})_E$ ($g = 2$) |
|---|---|---|---|---|---|
| 1 | 2 | 1.5416 | 0.027590 | 1.5810 | 0.027062 |
| 1 | 3 | 1.3733, 1.7012 | 0.027551 | 1.4140, 1.7319 | 0.026908 |
| 1 | 4 | 1.2852, 1.5416, 1.7782 | 0.027536 | 1.3226, 1.5810, 1.8027 | 0.026851 |
| 1 | 5 | 1.2308, 1.4418, 1.6383, 1.8236 | 0.027530 | 1.2647, 1.4831, 1.6731, 1.8438 | 0.026824 |
| 1 | 6 | 1.1940, 1.3733, 1.5416, 1.7012, 1.8536 | 0.027526 | 1.2245, 1.4140, 1.5810, 1.7319, 1.8707 | 0.026809 |
| 2 | 2 | 1.3198 | 0.018419 | 1.3474 | 0.018108 |
| 2 | 3 | 1.2039, 1.4523 | 0.018398 | 1.2262, 1.4821 | 0.018027 |
| 2 | 4 | 1.1501, 1.3198, 1.5298 | 0.018390 | 1.1684, 1.3474, 1.5582 | 0.017997 |
| 2 | 5 | 1.1191, 1.2486, 1.3965, 1.5818 | 0.018387 | 1.1342, 1.2737, 1.4261, 1.6088 | 0.017983 |
| 2 | 6 | 1.0985, 1.2039, 1.3198, 1.4523, 1.6198 | 0.018385 | 1.1116, 1.2262, 1.3474, 1.4821, 1.6453 | 0.017975 |
| 3 | 2 | 1.8794 | 0.262415 | 2.1147 | 0.237950 |
| 3 | 3 | 1.5261, 2.3514 | 0.260566 | 1.6914, 2.6587 | 0.230950 |
| 3 | 4 | 1.3815, 1.8794, 2.6737 | 0.259855 | 1.5043, 2.1147, 3.0143 | 0.228335 |
| 3 | 5 | 1.2971, 1.6590, 2.1419, 2.9179 | 0.259491 | 1.4007, 1.8514, 2.4198, 3.2808 | 0.227053 |
| 3 | 6 | 1.2421, 1.5261, 1.8794, 2.3514, 3.1135 | 0.259284 | 1.3318, 1.6914, 2.1147, 2.6587, 3.4881 | 0.226334 |
| 4 | 2 | 1.7859 | 0.118640 | 1.9045 | 0.112304 |
| 4 | 3 | 1.5177, 2.0994 | 0.118168 | 1.6159, 2.2252 | 0.110464 |
| 4 | 4 | 1.3901, 1.7859, 2.2862 | 0.117982 | 1.4738, 1.9045, 2.4231 | 0.109750 |
| 4 | 5 | 1.3139, 1.6234, 1.9639, 2.4245 | 0.117894 | 1.3855, 1.7187, 2.0919, 2.5594 | 0.109398 |
| 4 | 6 | 1.2632, 1.5177, 1.7859, 2.0994, 2.5231 | 0.117842 | 1.3266, 1.6159, 1.9045, 2.2252, 2.6663 | 0.109147 |


We now consider the numerical investigation into the relative efficiency of the equal allocation method with respect to the Neyman allocation. For the sake of completeness the relative efficiency of equal allocation with respect to proportional allocation has also been calculated.

5. Numerical Investigation

For numerical investigation in this paper we consider the same four densities for $x$ which were considered by Singh and Sukhatme [5]. These are

(i) Rectangular: $f(x) = 1$, $1 \le x \le 2$

(ii) Right Triangular: $f(x) = 2(2 - x)$, $1 \le x \le 2$

(iii) Exponential: $f(x) = e^{-x+1}$, $1 \le x < \infty$

(iv) Right Normal: $f(x) = (2/\pi)^{1/2} e^{-(x-1)^2/2}$, $1 \le x < \infty$.

The truncation of the exponential and right normal densities, the form of regression of $y$ on $x$ and the form of the conditional variance function $\varphi(x)$ are also the same as in [5]. In this case also the AOSB are obtained by using the frequency distribution version of the cum $f\sqrt{\varphi}$ rule, with the values of the function $\varphi(x)$ being evaluated at the mid points of the class intervals.

TABLE II
Percentage Relative Efficiency of Equal Allocation

| p.d.f. | L | R.E.(N), $g = 1$ | R.E.(N), $g = 2$ | R.E.(P), $g = 1$ | R.E.(P), $g = 2$ |
|---|---|---|---|---|---|
| 1 | 2 | 99.99 | 99.87 | 100.67 | 102.61 |
| 1 | 3 | 99.99 | 99.93 | 100.81 | 103.19 |
| 1 | 4 | 99.99 | 99.95 | 100.86 | 103.41 |
| 1 | 5 | 99.99 | 99.97 | 100.89 | 103.52 |
| 1 | 6 | 99.99 | 99.98 | 100.90 | 103.57 |
| 2 | 2 | 99.98 | 100.01 | 100.55 | 102.26 |
| 2 | 3 | 100.00 | 99.99 | 100.66 | 102.72 |
| 2 | 4 | 100.00 | 99.99 | 100.71 | 102.89 |
| 2 | 5 | 100.00 | 99.99 | 100.72 | 102.97 |
| 2 | 6 | 100.00 | 99.99 | 100.73 | 103.02 |
| 3 | 2 | 99.95 | 99.95 | 105.16 | 114.61 |
| 3 | 3 | 99.94 | 99.87 | 105.91 | 118.08 |
| 3 | 4 | 99.93 | 99.86 | 106.30 | 119.44 |
| 3 | 5 | 99.95 | 99.88 | 106.34 | 120.11 |
| 3 | 6 | 99.95 | 99.90 | 106.43 | 120.49 |
| 4 | 2 | 100.02 | 100.00 | 101.83 | 107.58 |
| 4 | 3 | 99.99 | 99.96 | 102.24 | 109.37 |
| 4 | 4 | 99.98 | 99.93 | 102.40 | 110.08 |
| 4 | 5 | 99.97 | 99.92 | 102.48 | 110.44 |
| 4 | 6 | 99.97 | 99.96 | 102.52 | 110.69 |

In Table I are given the AOSB and $nV(\bar{y}_{lc})_E$ for $g = 1$ and $g = 2$. In the case of $g = 0$ the variance $nV(\bar{y}_{lc})_E$ is constant and is not affected by a change in the strata boundaries or the value of $L$. This was also the case for the Neyman allocation method (Singh and Sukhatme [5]). Table II gives the relative efficiency of the equal allocation method with respect to the Neyman (R.E.(N)) and proportional allocation (R.E.(P)) methods. The variances corresponding to the Neyman and proportional allocations have been taken from [5].

From Table II it is observed that the relative efficiency of equal allocation with respect to the Neyman allocation is practically 100 percent, as the smallest value it takes is 99.86. One can, therefore, conveniently use the equal allocation of the sample to different strata with the cum $f\sqrt{\varphi}$ rule rather than going in for Neyman allocation and the corresponding stratification rule. It is also seen that there is no particular trend of change in the relative efficiency with increase in the number of strata. Also, since the relative efficiency does not depend on the constant $a$, the conclusions remain the same for all the correlation values for any given value of $g$. The relative efficiency decreases with increase in the value of $g$, as for $g = 0$ the relative efficiency is 100 percent. In some cases the relative efficiency is found to be slightly more than 100 percent. Such an observation has also been made earlier by Singh and Parkash in [6]. This happens because the relative efficiency is based not on the exactly optimum strata boundaries but on the AOSB.

The equal allocation method is found to be more efficient than the proportional allocation in all the cases. The performance of this allocation is particularly good in case of the exponential distribution. The efficiency is found to increase with the increase in the values of both L and g.
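The comparison with proportional allocation can be reproduced approximately for the rectangular density. The sketch below is an illustration under assumptions: it uses $\varphi(x) = x^g$ with unit constant (the constant cancels in the ratio), takes the proportional allocation variance as $n^{-1}E[\varphi(x)]$ as in Remark (iv), and finds the boundaries on a fine grid rather than with the frequency distribution version used for the tables. Function names are hypothetical.

```python
import numpy as np

def cumtrapz(y, t):
    """Cumulative trapezoidal integral of y over the grid t, starting at 0."""
    return np.concatenate(([0.0], np.cumsum(0.5 * (y[1:] + y[:-1]) * np.diff(t))))

t = np.linspace(1.0, 2.0, 200_001)
f = np.ones_like(t)            # rectangular density on [1, 2]

def re_proportional(L, g):
    """Percentage relative efficiency of equal vs proportional allocation."""
    phi = t**g                                  # ASSUMPTION: unit constant
    cum_rule = cumtrapz(f * np.sqrt(phi), t)
    edges = np.interp(cum_rule[-1] * np.arange(L + 1) / L, cum_rule, t)
    W = np.diff(np.interp(edges, t, cumtrapz(f, t)))
    M = np.diff(np.interp(edges, t, cumtrapz(phi * f, t)))   # W_h * mu_h
    n_var_equal = L * np.sum(W * M)             # L * sum W_h^2 mu_h
    n_var_prop = np.sum(M)                      # E[phi(x)]
    return float(100.0 * n_var_prop / n_var_equal)

re_table = {(L, g): re_proportional(L, g) for L in (2, 3, 4, 5, 6) for g in (1, 2)}
for key in sorted(re_table):
    print(key, round(re_table[key], 2))
```

The printed values can be compared with the R.E.(P) columns of Table II for p.d.f. 1; agreement is expected only up to the discretisation used in the original computation.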

In view of the relations (1.2) and (1.4) of [5] it is seen that the variance of the combined ratio estimate can be obtained from the relation (1.2) of the present note by replacing $\beta$ by the population ratio $R$. All the results obtained in this paper, therefore, also hold in the case of the combined ratio estimate.

References

[1] Dalenius, T. (1950). "The problem of optimum stratification", Skand. Aktuar., 33, 203-213.

[2] Hayashi, C., Maruyama, F. and Isida, M. D. (1951). "On some criteria for stratification", Ann. Inst. Statist. Math., 2, 77-86.

[3] Singh, Ravindra and Sukhatme, B. V. (1969). "Optimum stratification", Ann. Inst. Statist. Math., 21, 515-528.

[4] Singh, Ravindra (1971). "Determination of optimum boundaries", J. Indian Soc. Agric. Statist., 23, 115-123.

[5] Singh, Ravindra and Sukhatme, B. V. (1973). "Optimum stratification with ratio and regression methods of estimation", Ann. Inst. Statist. Math., 25, 627-633.

[6] Singh, Ravindra and Parkash, Dev (1975). "Optimum stratification for equal allocation", Ann. Inst. Statist. Math., 27, 273-280.