This article was downloaded by: [University of Newcastle (Australia)]
On: 27 August 2014, At: 22:21
Publisher: Taylor & Francis
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Journal of Applied Statistics
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/cjas20

Double Sampling Ratio-product Estimator of a Finite Population Mean in Sample Surveys
Housila P. Singh (a) & Mariano Ruiz Espejo (b)
(a) School of Studies in Statistics, Vikram University, India
(b) Departamento de Matemáticas Fundamentales, UNED, Madrid, Spain
Published online: 14 Feb 2007.

To cite this article: Housila P. Singh & Mariano Ruiz Espejo (2007) Double Sampling Ratio-product Estimator of a Finite Population Mean in Sample Surveys, Journal of Applied Statistics, 34:1, 71-85, DOI: 10.1080/02664760600994562

To link to this article: http://dx.doi.org/10.1080/02664760600994562
Double Sampling Ratio-product Estimator of a Finite Population Mean in Sample Surveys

HOUSILA P. SINGH* & MARIANO RUIZ ESPEJO**

*School of Studies in Statistics, Vikram University, India, **Departamento de Matemáticas Fundamentales, UNED, Madrid, Spain
ABSTRACT It is well known that two-phase (or double) sampling is of significant practical use when the population parameter(s) of the auxiliary variate x (say, the population mean $\bar{X}$) are not known. Keeping this in view, we suggest a class of ratio-product estimators in two-phase sampling and study its properties. The asymptotically optimum estimators (AOEs) in the class are identified, together with their variances, in two different cases. Conditions under which the proposed estimator is more efficient than the two-phase sampling ratio, product and mean per unit estimators are investigated. A comparison with single-phase sampling is also discussed. An empirical study is carried out to demonstrate the efficiency of the suggested estimator over conventional estimators.
KEY WORDS: Auxiliary variate, double sampling ratio and product estimators, finite population mean, study variate
Introduction
One of the major developments in sample surveys over the last five decades is the use of an auxiliary variable x, correlated with the study variable y, to estimate the population total or mean of the study variable. Various estimation procedures in sample surveys need advance knowledge of some auxiliary variable $x_i$, which is then used to increase the precision of the estimates. For example, the classical ratio and product estimators require advance knowledge of the population mean $\bar{X}$ of the auxiliary variable x. When $\bar{X}$ is unknown, it is sometimes estimated from a preliminary large sample on which only the auxiliary characteristic x is observed, and the value of $\bar{X}$ in the estimator is then replaced by this estimate. A smaller second-phase sample is then taken on which the variate of interest (study variate) y is observed. This technique, known as double sampling or two-phase sampling, is especially appropriate if the $x_i$ values are easily accessible and much cheaper to collect than the $y_i$ values; see Sitter (1997) and Hidiroglou & Särndal (1998). Neyman (1938) was the first to give the concept of double sampling, in connection with collecting information on the strata sizes in stratified sampling.
Journal of Applied Statistics
Vol. 34, No. 1, 71–85, January 2007
Correspondence Address: Housila P. Singh, School of Studies in Statistics, Vikram University, Ujjain - 456010,
M. P., India. Email: [email protected]
0266-4763 Print/1360-0532 Online/07/010071–15 © 2007 Taylor & Francis DOI: 10.1080/02664760600994562
The use of double sampling is necessary if the x-value is obtained by performing a non-destructive experiment whereas obtaining the y-value of a unit requires a destructive experiment; see Unnikrishan & Kunte (1995). Beyond this, there are several practical situations in which double sampling can be employed effectively. Snedecor & King (1942) mention Goodman's application of a double sampling procedure to the determination of corn yield. Goodman found that it was easier and much cheaper to count the number of ears of corn in a given unit area than to harvest the yield and obtain the dry weight of kernels. The high cost of dry-weight determination led to the use of double sampling, in which ears were counted and measured in many fields but harvested in only a portion of these. Thus, by taking advantage of the correlation between the study variate y (the dry weight of kernels) and the auxiliary variate x (length × diameter of the ear) and using the regression technique, the dry weight of kernels per ear can be estimated (see Singh, 1968).
Sukhatme (1962) pointed out that double sampling is usually used when the numbers of units required to give the desired precision on different items differ widely. The procedure is also used when the information gathered in the first phase is to serve as auxiliary information for improving the precision of the information gathered in the second phase. For example, in a survey to estimate the production of a lime crop with orchards as sampling units, a comparatively large sample is taken to obtain the acreage under the crop, while the yield rate is obtained from only a sub-sample of the orchards selected for determining acreage; the acreage information then serves as auxiliary information in estimating the population mean $\bar{Y}$ of the study variable y in two-phase sampling.
For more applications of the double sampling method, the reader is referred to Sukhatme &
Koshal (1959), Yates (1960), Singh & Singh (1965), Singh (1968), Seth et al. (1968),
Chand (1975), Mukerjee et al. (1987), Dorfman (1994), Rao & Sitter (1995), York et al.
(1995), Prasad et al. (1996), Pitt et al. (1996), Breslow & Holubkov (1997), Singh &
Ruiz Espejo (2000) and Barnett et al. (2001).
We further note that the ratio method of estimation (or the product method of estimation; see Robson (1957) and Murthy (1964)) yields a more efficient estimator than the simple unbiased estimator provided the correlation coefficient between the study variate y and the auxiliary variate x has a high positive (or high negative) value. Further, the ratio estimator is most effective, and is as efficient as the regression estimator, when the relationship between y and x is linear through the origin and the variance of y is proportional to x. In many practical situations, however, the regression line does not pass through the vicinity of the origin. Keeping this fact in view, we have made an effort to improve these estimators in double sampling. Throughout, samples are drawn by simple random sampling without replacement (SRSWOR).
For estimating the population mean $\bar{Y}$ of the study variate y, Singh & Ruiz Espejo (2003) recently considered a ratio-product type estimator given by

$$\bar{y}_{RP} = \bar{y}\left[k\,\frac{\bar{X}}{\bar{x}} + (1-k)\,\frac{\bar{x}}{\bar{X}}\right]$$

where $\bar{y}$ and $\bar{x}$ are the sample means of y and x, respectively, based on a sample of size n drawn from the population of N units, $\bar{X}$ is the known population mean of x, and k is set equal to $k = (1+C)/2$, with $C = \rho C_y/C_x$ obtained from judgement, past data or a pilot sample survey; here $\rho$ is the correlation coefficient between y and x, and $C_y$ and $C_x$ are the coefficients of variation of y and x, respectively. For details of the above estimator the reader is referred to Singh & Ruiz Espejo (2003).
In this paper we study the properties of the above estimator $\bar{y}_{RP}$ in the case of double sampling. Numerical illustrations are given in support of the present study.
72 H. P. Singh & M. Ruiz Espejo
The Double Sampling Estimator
When the population mean $\bar{X}$ of x is not known, a first-phase sample of size $n_1$ is drawn from the population, on which only the x-characteristic is measured in order to furnish a good estimate of $\bar{X}$. Then a second-phase sample of size n is drawn, on which both variates y and x are measured. Let $\bar{x}_1 = (1/n_1)\sum_{i=1}^{n_1} x_i$ denote the sample mean of x based on the first-phase sample of size $n_1$. The two-phase sampling (or double sampling) estimator is then given by

$$\bar{y}^{(d)}_{RP} = \bar{y}\left[k\,\frac{\bar{x}_1}{\bar{x}} + (1-k)\,\frac{\bar{x}}{\bar{x}_1}\right] \qquad (1)$$

where k is determined so as to minimize the variance of $\bar{y}^{(d)}_{RP}$.
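As a minimal sketch of how estimator (1) is computed in practice, the code below assumes a hypothetical simulated population and a Case I design (the second-phase sample is an SRSWOR subsample of the first-phase sample). The function name `double_sampling_rp`, the population model and the seed are illustrative assumptions; k is passed in directly rather than estimated.

```python
import random
import statistics


def double_sampling_rp(y2, x2, x1_mean, k):
    """Two-phase ratio-product estimator of equation (1).

    y2, x2  : values of y and x on the second-phase sample (size n)
    x1_mean : mean of x over the first-phase sample (size n1)
    k       : constant of the class (k = 1 ratio, k = 0 product estimator)
    """
    y_bar = statistics.fmean(y2)
    x_bar = statistics.fmean(x2)
    return y_bar * (k * x1_mean / x_bar + (1 - k) * x_bar / x1_mean)


# Hypothetical population (illustration only): y roughly linear in x, not through the origin.
rng = random.Random(2007)
N, n1, n = 500, 100, 25
xs = [rng.uniform(10, 50) for _ in range(N)]
ys = [3.0 + 2.0 * x + rng.gauss(0, 5) for x in xs]

# Case I design: SRSWOR first phase, then an SRSWOR subsample of it.
phase1 = rng.sample(range(N), n1)
phase2 = rng.sample(phase1, n)

x1_mean = statistics.fmean(xs[i] for i in phase1)
y2 = [ys[i] for i in phase2]
x2 = [xs[i] for i in phase2]

for k in (0.0, 0.5, 1.0):
    print(f"k = {k}: estimate = {double_sampling_rp(y2, x2, x1_mean, k):.3f}")
print(f"population mean = {statistics.fmean(ys):.3f}")
```

With k = 1 the function returns the double sampling ratio estimator $\bar{y}\,\bar{x}_1/\bar{x}$, and with k = 0 the product estimator $\bar{y}\,\bar{x}/\bar{x}_1$.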
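Defining

$$e_0 = (\bar{y} - \bar{Y})/\bar{Y}, \quad e_1 = (\bar{x} - \bar{X})/\bar{X} \quad \text{and} \quad e_1' = (\bar{x}_1 - \bar{X})/\bar{X},$$

we have

$$\bar{y}^{(d)}_{RP} = \bar{Y}(1+e_0)\left\{k(1+e_1')(1+e_1)^{-1} + (1-k)(1+e_1)(1+e_1')^{-1}\right\}. \qquad (2)$$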
We now assume that $|e_1| < 1$ and $|e_1'| < 1$, so that we may expand $(1+e_1)^{-1}$ and $(1+e_1')^{-1}$ as series in powers of $e_1$ and $e_1'$. Expanding, multiplying out and retaining terms in the e's up to the second degree, we obtain

$$\bar{y}^{(d)}_{RP} = \bar{Y}\left\{k(1 + e_0 - e_1 + e_1' - e_0e_1 + e_1^2 + e_0e_1' - e_1e_1' + \dots) + (1-k)(1 + e_0 + e_1 - e_1' - e_0e_1' + e_0e_1 + e_1'^2 - e_1e_1' + \dots)\right\}$$

or

$$\bar{y}^{(d)}_{RP} - \bar{Y} \cong \bar{Y}\left\{e_0 + e_1 - e_1' - e_0e_1' + e_0e_1 - e_1e_1' + e_1'^2 + k(e_1^2 - e_1'^2 - 2e_0e_1 + 2e_0e_1' - 2e_1 + 2e_1')\right\}. \qquad (3)$$
Taking expectations in equation (3) and noting that

$$E(e_0) = E(e_1) = E(e_1') = 0$$

and that the expectations of the second-degree terms are of order $n^{-1}$, we obtain

$$E\{\bar{y}^{(d)}_{RP}\} = \bar{Y} + O(n^{-1}).$$

Thus the bias of the estimator $\bar{y}^{(d)}_{RP}$ is of order $n^{-1}$, and hence its contribution to the mean square error is of order $n^{-2}$.
To find the bias and variance of $\bar{y}^{(d)}_{RP}$, let

$$C_y^2 = S_y^2/\bar{Y}^2, \quad C_x^2 = S_x^2/\bar{X}^2 \quad \text{and} \quad \rho = S_{yx}/(S_y S_x)$$
where

$$S_y^2 = \frac{1}{N-1}\sum_{i=1}^{N}(y_i - \bar{Y})^2, \quad S_x^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{X})^2$$

and

$$S_{yx} = \frac{1}{N-1}\sum_{i=1}^{N}(y_i - \bar{Y})(x_i - \bar{X}).$$
The following two cases will be considered separately.
Case I. The second-phase sample of size n is a subsample of the first-phase sample of size $n_1$.
Case II. The second-phase sample of size n is drawn independently of the first-phase sample of size $n_1$.
The case where the second sample is drawn independently of the first was considered by Bose (1943).
Case I
Bias, Variance and Optimum k
In Case I, we have

$$\left.\begin{aligned}
&E(e_0) = E(e_1) = E(e_1') = 0\\
&E(e_0^2) = \frac{1-f}{n}C_y^2, \quad E(e_1^2) = \frac{1-f}{n}C_x^2, \quad E(e_1'^2) = \frac{1-f_1}{n_1}C_x^2\\
&E(e_0e_1) = \frac{1-f}{n}CC_x^2, \quad E(e_0e_1') = \frac{1-f_1}{n_1}CC_x^2, \quad E(e_1e_1') = \frac{1-f_1}{n_1}C_x^2
\end{aligned}\right\} \qquad (4)$$

where $f = n/N$, $f_1 = n_1/N$ and $C = \rho C_y/C_x$.
Taking expectations in equation (3) and substituting the results of equation (4), we get the bias of $\bar{y}^{(d)}_{RP}$ to the first degree of approximation as

$$B\{\bar{y}^{(d)}_{RP}\} = \frac{1-f^*}{n}\,\bar{Y}C_x^2\{C + k(1-2C)\} \qquad (5)$$

where $f^* = n/n_1$. Thus $B\{\bar{y}^{(d)}_{RP}\}$ in equation (5) is zero if

$$k = \frac{C}{2C-1}.$$
Thus the estimator $\bar{y}^{(d)}_{RP}$ with $k = C/(2C-1)$ is almost unbiased. From equation (5) it also follows that the bias of $\bar{y}^{(d)}_{RP}$ is negligible if the sample size n is sufficiently large.

The variance of $\bar{y}^{(d)}_{RP}$, up to terms of order $n^{-1}$, is

$$V\{\bar{y}^{(d)}_{RP}\} \cong \bar{Y}^2 E\left[e_0^2 + (1-2k)\left\{(1-2k)(e_1 - e_1')^2 + 2(e_0e_1 - e_0e_1')\right\}\right]. \qquad (6)$$
Taking expectations of both sides of equation (6) and using the results in equation (4), we obtain the variance of $\bar{y}^{(d)}_{RP}$ to terms of order $n^{-1}$ as

$$V\{\bar{y}^{(d)}_{RP}\}_I = \bar{Y}^2\left[\frac{1-f}{n}C_y^2 + \frac{1-f^*}{n}(1-2k)(1-2k+2C)C_x^2\right] \qquad (7)$$

which is minimized when

$$k = \frac{1+C}{2} = k_0 \ \text{(say)}. \qquad (8)$$
Substituting equation (8) in equation (1), we get the asymptotically optimum estimator (AOE)

$$\bar{y}^{(d_0)}_{RP} = \frac{\bar{y}}{2}\left[(1+C)\,\frac{\bar{x}_1}{\bar{x}} + (1-C)\,\frac{\bar{x}}{\bar{x}_1}\right].$$

Putting equation (8) in equations (5) and (7), we get the bias and variance of $\bar{y}^{(d_0)}_{RP}$, respectively, as

$$B\{\bar{y}^{(d_0)}_{RP}\}_I = \frac{1-f^*}{2n}\,\bar{Y}C_x^2\{1 + C(1-2C)\}$$

and

$$V\{\bar{y}^{(d_0)}_{RP}\}_I = S_y^2\left[\frac{1-f}{n}(1-\rho^2) + \frac{1-f_1}{n_1}\rho^2\right]$$

which is the same as the variance of the linear regression estimator $\bar{y}_{dlr} = \bar{y} + b(\bar{x}_1 - \bar{x})$ in two-phase sampling, where b is the sample regression coefficient of y on x.
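The bias (5) and variance (7) are straightforward to evaluate numerically. The sketch below uses a hypothetical helper `case1_terms` that works with the relative quantities $B/\bar{Y}$ and $V/\bar{Y}^2$, plugs in the values listed for population 1 in Tables 1 and 2, and checks three statements made above: the bias vanishes at $k = C/(2C-1)$, a grid search over k lands next to $k_0 = (1+C)/2$, and the variance at $k_0$ equals the regression-estimator variance.

```python
import math


def case1_terms(N, n, n1, rho, Cy, Cx, k):
    """Relative bias B/Ybar (equation (5)) and relative variance V/Ybar^2 (equation (7)), Case I."""
    f = n / N
    f_star = n / n1
    C = rho * Cy / Cx
    rel_bias = (1 - f_star) / n * Cx**2 * (C + k * (1 - 2 * C))
    rel_var = ((1 - f) / n * Cy**2
               + (1 - f_star) / n * (1 - 2 * k) * (1 - 2 * k + 2 * C) * Cx**2)
    return rel_bias, rel_var


# Population 1 of Tables 1 and 2
N, n, n1 = 120, 20, 50
rho, Cy, Cx = 0.930, 2.00500, 1.59775
C = rho * Cy / Cx
k0 = (1 + C) / 2                 # equation (8)
k_unbiased = C / (2 * C - 1)     # k that makes the bias (5) vanish

# Grid search for the k that minimizes (7); it should land next to k0.
grid = [i / 1000 for i in range(-1000, 3001)]
k_best = min(grid, key=lambda k: case1_terms(N, n, n1, rho, Cy, Cx, k)[1])

# The AOE variance should coincide with the two-phase regression estimator's variance.
f, f1 = n / N, n1 / N
v_reg = Cy**2 * ((1 - f) / n * (1 - rho**2) + (1 - f1) / n1 * rho**2)
v_aoe = case1_terms(N, n, n1, rho, Cy, Cx, k0)[1]

print(case1_terms(N, n, n1, rho, Cy, Cx, k_unbiased)[0])
print(round(k0, 4), k_best, math.isclose(v_aoe, v_reg, rel_tol=1e-9))
```

Because (7) is a quadratic in k, the grid minimizer is simply the grid point nearest to $k_0$.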
Comparison with Ratio Estimator in Double Sampling
For k = 1, the estimator $\bar{y}^{(d)}_{RP}$ in equation (1) reduces to the usual double sampling ratio estimator

$$\bar{y}^{(d)}_{R} = \bar{y}\,\frac{\bar{x}_1}{\bar{x}}.$$

The variance of $\bar{y}^{(d)}_{R}$ can be obtained by putting k = 1 in equation (7):

$$V\{\bar{y}^{(d)}_{R}\} = \bar{Y}^2\left[\frac{1-f}{n}C_y^2 + \frac{1-f^*}{n}C_x^2(1-2C)\right]. \qquad (9)$$
From equations (7) and (9) we have

$$V\{\bar{y}^{(d)}_{RP}\} - V\{\bar{y}^{(d)}_{R}\} = -4\,\frac{1-f^*}{n}\,\bar{Y}^2C_x^2\{k(1-k+C) - C\}$$

which is negative if

$$\text{either } 1 < k < C \quad \text{or} \quad C < k < 1,$$

or equivalently,

$$\min(C, 1) < k < \max(C, 1),$$

or equivalently,

$$|k - k_0| < |1 - k_0|,$$

where $k_0 = (1+C)/2$.
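The step from the variance difference to the interval condition relies on a factorization that is left implicit above. The short derivation below is an added illustration that uses only equations (7) and (9):

$$k(1-k+C) - C = -(k^2 - k - kC + C) = -(k-1)(k-C),$$

$$V\{\bar{y}^{(d)}_{RP}\} - V\{\bar{y}^{(d)}_{R}\} = 4\,\frac{1-f^*}{n}\,\bar{Y}^2C_x^2\,(k-1)(k-C).$$

Since $(1-f^*)/n > 0$ whenever $n < n_1$, the difference is negative exactly when k lies strictly between C and 1; and because $k_0 = (1+C)/2$ is the midpoint of C and 1, this is the same as $|k - k_0| < |1 - k_0|$.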
Comparison with Product Estimator in Double Sampling
For k = 0, the estimator $\bar{y}^{(d)}_{RP}$ in (1) reduces to the usual double sampling product estimator of $\bar{Y}$,

$$\bar{y}^{(d)}_{P} = \bar{y}\,\frac{\bar{x}}{\bar{x}_1}.$$

The variance of $\bar{y}^{(d)}_{P}$ can be obtained by putting k = 0 in equation (7):

$$V\{\bar{y}^{(d)}_{P}\} = \bar{Y}^2\left[\frac{1-f}{n}C_y^2 + \frac{1-f^*}{n}C_x^2(1+2C)\right]. \qquad (10)$$

From equations (7) and (10) we have

$$V\{\bar{y}^{(d)}_{RP}\} - V\{\bar{y}^{(d)}_{P}\} = -4\bar{Y}^2\,\frac{1-f^*}{n}\,C_x^2\,k(1-k+C),$$

which is negative if

$$\text{either } 0 < k < 1+C \quad \text{or} \quad 1+C < k < 0,$$

or equivalently,

$$\min(0, 1+C) < k < \max(0, 1+C),$$

or equivalently,

$$|k - k_0| < |k_0|.$$
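The two Case I interval conditions can also be checked numerically against equations (7), (9) and (10). In the sketch below the hypothetical helpers `v_case1` and `intervals_agree` scan a grid of k values (offset so that no grid point sits on an interval endpoint) and compare the sign of each variance difference with the stated interval. The constants a = 0.0417 and d = 0.03, standing for $(1-f)/n$ and $(1-f^*)/n$, are arbitrary positive assumptions; the signs of the differences do not depend on them.

```python
def v_case1(k, C, a=0.0417, d=0.03, Cy=1.0, Cx=1.0):
    """V / Ybar^2 from equation (7); a stands for (1-f)/n and d for (1-f*)/n."""
    return a * Cy**2 + d * (1 - 2 * k) * (1 - 2 * k + 2 * C) * Cx**2


def intervals_agree(C):
    """Compare the sign of each variance difference with the stated k-interval."""
    for i in range(600):
        k = -2.497 + 0.01 * i  # offset grid: no point falls on an interval endpoint
        beats_ratio = v_case1(k, C) < v_case1(1.0, C)      # ratio estimator is k = 1, eq. (9)
        beats_product = v_case1(k, C) < v_case1(0.0, C)    # product estimator is k = 0, eq. (10)
        if beats_ratio != (min(C, 1) < k < max(C, 1)):
            return False
        if beats_product != (min(0, 1 + C) < k < max(0, 1 + C)):
            return False
    return True


# C values of populations 1, 3, 7 and 9 in Table 2, plus an arbitrary 0.35
print(all(intervals_agree(C) for C in (1.1671, 1.9221, -1.9392, -0.5467, 0.35)))
```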
Comparison with Mean Per Unit Estimator
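The variance of the sample mean $\bar{y}$ under SRSWOR is given by

$$V(\bar{y}) = \frac{1-f}{n}\,\bar{Y}^2C_y^2. \qquad (11)$$

From equations (7) and (11) we have

$$V\{\bar{y}^{(d)}_{RP}\} - V(\bar{y}) = \bar{Y}^2\,\frac{1-f^*}{n}\,C_x^2\,(1-2k)(1-2k+2C)$$

which is negative if

$$\text{either } \tfrac{1}{2} < k < \tfrac{1}{2} + C \quad \text{or} \quad \tfrac{1}{2} + C < k < \tfrac{1}{2},$$

or equivalently,

$$\min\left(\tfrac{1}{2}, \tfrac{1}{2} + C\right) < k < \max\left(\tfrac{1}{2}, \tfrac{1}{2} + C\right),$$

or equivalently,

$$|k - k_0| < \left|k_0 - \tfrac{1}{2}\right|.$$

Case II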
In Case II, we have

$$\left.\begin{aligned}
&E(e_0) = E(e_1) = E(e_1') = 0\\
&E(e_0^2) = \frac{1-f}{n}C_y^2, \quad E(e_1^2) = \frac{1-f}{n}C_x^2, \quad E(e_1'^2) = \frac{1-f_1}{n_1}C_x^2\\
&E(e_0e_1) = \frac{1-f}{n}CC_x^2, \quad E(e_0e_1') = E(e_1e_1') = 0
\end{aligned}\right\} \qquad (12)$$
Taking expectations of equation (3) and using the results in equation (12), we get the bias of $\bar{y}^{(d)}_{RP}$ up to terms of order $n^{-1}$ as

$$B\{\bar{y}^{(d)}_{RP}\}_{II} = \bar{Y}C_x^2\left[\frac{1-f}{n}\{C + k(1-2C)\} + \frac{1-f_1}{n_1}(1-k)\right] \qquad (13)$$
which vanishes if

$$k = \frac{\dfrac{1-f_1}{n_1} + \dfrac{1-f}{n}\,C}{\dfrac{1-f_1}{n_1} + \dfrac{1-f}{n}\,(2C-1)}.$$
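Taking expectations of equation (6) and using the results in equation (12), we get the variance of $\bar{y}^{(d)}_{RP}$ up to terms of order $n^{-1}$ as

$$V\{\bar{y}^{(d)}_{RP}\}_{II} = \bar{Y}^2\left[\frac{1-f}{n}C_y^2 + (1-2k)\left\{(1-2k)\left(\frac{1-f}{n} + \frac{1-f_1}{n_1}\right) + 2\,\frac{1-f}{n}\,C\right\}C_x^2\right] \qquad (14)$$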
which is minimized when

$$k = \frac{1+\theta C}{2} = k_0^* \ \text{(say)} \qquad (15)$$

where

$$\theta = \frac{\dfrac{1-f}{n}}{\dfrac{1-f}{n} + \dfrac{1-f_1}{n_1}}.$$
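Substitution of equation (15) into equation (1) yields the AOE

$$\bar{y}^{(d_0^*)}_{RP} = \frac{\bar{y}}{2}\left[(1+\theta C)\,\frac{\bar{x}_1}{\bar{x}} + (1-\theta C)\,\frac{\bar{x}}{\bar{x}_1}\right].$$

Putting equation (15) in equations (13) and (14), we get the bias and variance of $\bar{y}^{(d_0^*)}_{RP}$, respectively, as

$$B\{\bar{y}^{(d_0^*)}_{RP}\}_{II} = \frac{\bar{Y}C_x^2}{2}\left[\frac{1-f}{n}\{1 + \theta C(1-2C)\} + \frac{1-f_1}{n_1}(1-\theta C)\right]$$

and

$$V\{\bar{y}^{(d_0^*)}_{RP}\}_{II} = \frac{1-f}{n}\,S_y^2(1-\theta\rho^2).$$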
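A quick numerical check of the Case II optimum, sketched under the assumption that the inputs are those listed for population 3 in Tables 1 and 2: the hypothetical function `case2_variance` evaluates equation (14) divided by $\bar{Y}^2$, and the script confirms that the variance at $k_0^*$ matches the closed form $(1-f)S_y^2(1-\theta\rho^2)/n$ and that moving k away from $k_0^*$ increases it.

```python
import math


def case2_variance(k, N, n, n1, rho, Cy, Cx):
    """V / Ybar^2 from equation (14): Case II, finite population corrections kept."""
    a = (1 - n / N) / n      # (1-f)/n
    b = (1 - n1 / N) / n1    # (1-f1)/n1
    C = rho * Cy / Cx
    return a * Cy**2 + (1 - 2 * k) * ((1 - 2 * k) * (a + b) + 2 * a * C) * Cx**2


# Population 3 of Tables 1 and 2
N, n, n1 = 50, 8, 20
rho, Cy, Cx = 0.7418, 0.23833, 0.09198

a = (1 - n / N) / n
b = (1 - n1 / N) / n1
theta = a / (a + b)
C = rho * Cy / Cx
k0_star = (1 + theta * C) / 2                 # equation (15)

v_at_opt = case2_variance(k0_star, N, n, n1, rho, Cy, Cx)
v_closed = a * Cy**2 * (1 - theta * rho**2)   # AOE variance divided by Ybar^2
print(round(k0_star, 4), math.isclose(v_at_opt, v_closed, rel_tol=1e-9))

# Moving k away from k0_star in either direction should increase the variance.
print(all(case2_variance(k0_star + h, N, n, n1, rho, Cy, Cx) > v_at_opt for h in (-0.01, 0.01)))
```

The value of $k_0^*$ obtained this way can be compared with the last column of Table 2 for population 3.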
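Ignoring the finite population correction in equation (14), the variance of $\bar{y}^{(d)}_{RP}$ is given by

$$V\{\bar{y}^{(d)}_{RP}\}_{II} = \bar{Y}^2\left[\frac{1}{n}C_y^2 + (1-2k)\left\{(1-2k)\left(\frac{1}{n} + \frac{1}{n_1}\right) + \frac{2}{n}C\right\}C_x^2\right] \qquad (16)$$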
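which is minimized for

$$k = \frac{1+\theta^* C}{2} = k_0^{**} \ \text{(say)} \qquad (17)$$

where $\theta^* = n_1/(n+n_1)$.

Substitution of equation (17) in equation (1) yields the AOE

$$\bar{y}^{(d_0^{**})}_{RP} = \frac{\bar{y}}{2}\left[(1+\theta^* C)\,\frac{\bar{x}_1}{\bar{x}} + (1-\theta^* C)\,\frac{\bar{x}}{\bar{x}_1}\right]. \qquad (18)$$

Putting equation (17) in equation (16), we get the variance of $\bar{y}^{(d_0^{**})}_{RP}$ as

$$V\{\bar{y}^{(d_0^{**})}_{RP}\}_{II} = \frac{S_y^2}{n}(1-\theta^*\rho^2). \qquad (19)$$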
Comparison with Ratio Estimator in Double Sampling
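Putting k = 1, the estimator $\bar{y}^{(d)}_{RP}$ in equation (1) reduces to the ratio estimator $\bar{y}^{(d)}_{R}$. Thus, putting k = 1 in equation (16), we get the variance of $\bar{y}^{(d)}_{R}$ to the first degree of approximation as

$$V\{\bar{y}^{(d)}_{R}\} = \bar{Y}^2\left[\frac{1}{n}\{C_y^2 + C_x^2(1-2C)\} + \frac{1}{n_1}C_x^2\right]$$

and so

$$V\{\bar{y}^{(d)}_{RP}\} - V\{\bar{y}^{(d)}_{R}\} = \bar{Y}^2C_x^2\left[\left(\frac{1}{n} + \frac{1}{n_1}\right)\{(1-2k)^2 - 1\} + \frac{4C}{n}(1-k)\right]$$

which is negative if

$$\text{either } \theta^*C < k < 1 \quad \text{or} \quad 1 < k < \theta^*C,$$

or equivalently,

$$\min(\theta^*C, 1) < k < \max(\theta^*C, 1),$$

or equivalently,

$$|k - k_0^{**}| < |1 - k_0^{**}|.$$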
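The Case II interval follows from a similar factorization. Written out as an added step, derived only from equation (16) and the variance of $\bar{y}^{(d)}_{R}$ above:

$$(1-2k)^2 - 1 = 4k(k-1),$$

$$V\{\bar{y}^{(d)}_{RP}\} - V\{\bar{y}^{(d)}_{R}\} = 4\bar{Y}^2C_x^2(k-1)\left[\left(\frac{1}{n} + \frac{1}{n_1}\right)k - \frac{C}{n}\right] = 4\bar{Y}^2C_x^2\left(\frac{1}{n} + \frac{1}{n_1}\right)(k-1)(k-\theta^*C),$$

because $(1/n)/(1/n + 1/n_1) = n_1/(n+n_1) = \theta^*$. The difference is therefore negative exactly when k lies strictly between $\theta^*C$ and 1, an interval whose midpoint is $k_0^{**} = (1+\theta^*C)/2$.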
Comparison with Product Estimator in Double Sampling
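Setting k = 0 in equation (16), we get the variance of $\bar{y}^{(d)}_{P}$ to the first degree of approximation as

$$V\{\bar{y}^{(d)}_{P}\} = \bar{Y}^2\left[\frac{1}{n}\{C_y^2 + C_x^2(1+2C)\} + \frac{1}{n_1}C_x^2\right]. \qquad (20)$$

From equations (16) and (20) we have

$$V\{\bar{y}^{(d)}_{RP}\} - V\{\bar{y}^{(d)}_{P}\} = -4\bar{Y}^2C_x^2\left[\left(\frac{1}{n} + \frac{1}{n_1}\right)k(1-k) + \frac{kC}{n}\right]$$

which is negative if

$$\text{either } 0 < k < 1+\theta^*C \quad \text{or} \quad 1+\theta^*C < k < 0,$$

or equivalently,

$$\min(0, 1+\theta^*C) < k < \max(0, 1+\theta^*C),$$

or equivalently,

$$|k - k_0^{**}| < |k_0^{**}|.$$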
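Comparison with Mean Per Unit Estimator $\bar{y}$

Ignoring the fpc, the variance of $\bar{y}$ under SRSWOR is given by

$$V(\bar{y}) = \frac{1}{n}\,\bar{Y}^2C_y^2. \qquad (21)$$

From equations (16) and (21) we have

$$V\{\bar{y}^{(d)}_{RP}\} - V(\bar{y}) = \bar{Y}^2C_x^2\left[(1-2k)^2\left(\frac{1}{n} + \frac{1}{n_1}\right) + 2(1-2k)\,\frac{C}{n}\right]$$

which is negative if

$$\text{either } \tfrac{1}{2} + \theta^*C < k < \tfrac{1}{2} \quad \text{or} \quad \tfrac{1}{2} < k < \tfrac{1}{2} + \theta^*C,$$

or equivalently,

$$\min\left(\tfrac{1}{2}, \tfrac{1}{2} + \theta^*C\right) < k < \max\left(\tfrac{1}{2}, \tfrac{1}{2} + \theta^*C\right),$$

or equivalently,

$$|k - k_0^{**}| < \left|k_0^{**} - \tfrac{1}{2}\right|.$$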
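The three Case II conditions (against the ratio, product and mean per unit estimators) can be checked in the same way from equations (16), (20) and (21). This is a sketch: the helper names are hypothetical, the sample sizes n = 8 and $n_1$ = 18 are taken from populations 7 and 8 of Table 1, and $C_y = C_x = 1$ is an arbitrary scaling that does not affect the signs being compared.

```python
def v_case2_nofpc(k, C, n, n1, Cy=1.0, Cx=1.0):
    """V / Ybar^2 from equation (16): Case II with the fpc ignored."""
    return Cy**2 / n + (1 - 2 * k) * ((1 - 2 * k) * (1 / n + 1 / n1) + 2 * C / n) * Cx**2


def case2_intervals_agree(C, n=8, n1=18):
    t = n1 / (n + n1) * C                      # theta* C
    v_mean = 1.0 / n                           # equation (21) with Cy = 1
    v_ratio = v_case2_nofpc(1.0, C, n, n1)     # k = 1
    v_product = v_case2_nofpc(0.0, C, n, n1)   # k = 0, equation (20)
    for i in range(600):
        k = -2.497 + 0.01 * i                  # offset grid avoids interval endpoints
        v = v_case2_nofpc(k, C, n, n1)
        checks = (
            (v < v_ratio, min(t, 1) < k < max(t, 1)),
            (v < v_product, min(0, 1 + t) < k < max(0, 1 + t)),
            (v < v_mean, min(0.5, 0.5 + t) < k < max(0.5, 0.5 + t)),
        )
        if any(observed != predicted for observed, predicted in checks):
            return False
    return True


print(all(case2_intervals_agree(C) for C in (1.1671, 1.9221, -1.9392, -0.5467, 0.35)))
```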
Comparison with Linear Regression Estimator
With the fpc ignored, the variance of the linear regression estimator in double sampling, $\bar{y}_{dlr} = \bar{y} + b(\bar{x}_1 - \bar{x})$, is given by Cochran (1963) as

$$V(\bar{y}_{dlr}) = \frac{S_y^2}{n}\left[1 - \frac{n_1 - n}{n_1}\rho^2\right] = V\{\bar{y}^{(d_0)}_{RP}\}_I. \qquad (22)$$

From equations (19) and (22) we have

$$V(\bar{y}_{dlr}) - V\{\bar{y}^{(d_0^{**})}_{RP}\}_{II} = \frac{S_y^2\rho^2 n}{n_1(n+n_1)} > 0.$$

This shows that the variance of the AOE $\bar{y}^{(d_0^{**})}_{RP}$ is always less than that of $\bar{y}_{dlr}$. Thus the AOE $\bar{y}^{(d_0^{**})}_{RP}$ in Case II is uniformly more efficient than the AOE $\bar{y}^{(d_0)}_{RP}$ in Case I.
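Because both variances are rational functions of n, $n_1$ and $\rho^2$, the closed-form gap can be confirmed with exact arithmetic. The sketch below uses Python's `fractions.Fraction`; $S_y^2 = 7/2$ and $\rho^2 = 81/100$ are arbitrary illustrative values, and the (n, $n_1$) pairs are the sample sizes of Table 1.

```python
from fractions import Fraction


def v_dlr(S2y, rho2, n, n1):
    """Equation (22): regression estimator in double sampling, fpc ignored."""
    return S2y / n * (1 - Fraction(n1 - n, n1) * rho2)


def v_aoe_case2(S2y, rho2, n, n1):
    """Equation (19): AOE under Case II, fpc ignored."""
    return S2y / n * (1 - Fraction(n1, n + n1) * rho2)


S2y, rho2 = Fraction(7, 2), Fraction(81, 100)   # arbitrary illustrative values
for n, n1 in [(20, 50), (8, 20), (40, 100), (8, 18), (5, 12)]:
    gap = v_dlr(S2y, rho2, n, n1) - v_aoe_case2(S2y, rho2, n, n1)
    predicted = S2y * rho2 * n / (n1 * (n + n1))
    print(n, n1, gap, gap == predicted)
```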
Remark 4.1
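Following Singh & Ruiz Espejo (2003), one can define estimators based on estimated optimum values under Cases I and II, respectively, as

$$\hat{\bar{y}}^{(d_0)}_{RP} = \frac{\bar{y}}{2}\left[(1+\hat{C})\,\frac{\bar{x}_1}{\bar{x}} + (1-\hat{C})\,\frac{\bar{x}}{\bar{x}_1}\right]$$

and

$$\hat{\bar{y}}^{(d_0^*)}_{RP} = \frac{\bar{y}}{2}\left[(1+\theta\hat{C})\,\frac{\bar{x}_1}{\bar{x}} + (1-\theta\hat{C})\,\frac{\bar{x}}{\bar{x}_1}\right]$$

where $\hat{C} = (s_{yx}/s_x^2)(\bar{x}_1/\bar{y}) = b/\hat{R}$,

$$s_{yx} = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})(x_i - \bar{x}), \quad s_x^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 \quad \text{and} \quad \hat{R} = \bar{y}/\bar{x}_1.$$

It can easily be shown, to the first degree of approximation, that

$$V\{\hat{\bar{y}}^{(d_0)}_{RP}\} = V\{\bar{y}^{(d_0)}_{RP}\} \quad \text{(under Case I)}$$

and

$$V\{\hat{\bar{y}}^{(d_0^*)}_{RP}\} = V\{\bar{y}^{(d_0^*)}_{RP}\} \quad \text{(under Case II)}.$$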
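The estimated-optimum estimators of Remark 4.1 need only sample quantities. The following is a minimal sketch: `c_hat` and `rp_with_weight` are hypothetical helper names, and the second-phase data, first-phase mean and population size N used for θ are made-up illustrative values, not data from the paper.

```python
import statistics


def c_hat(y2, x2, x1_mean):
    """C_hat = (s_yx / s_x^2) * (x1_mean / ybar) = b / R_hat, as in Remark 4.1."""
    n = len(y2)
    y_bar, x_bar = statistics.fmean(y2), statistics.fmean(x2)
    s_yx = sum((y - y_bar) * (x - x_bar) for y, x in zip(y2, x2)) / (n - 1)
    s2_x = sum((x - x_bar) ** 2 for x in x2) / (n - 1)
    return (s_yx / s2_x) * (x1_mean / y_bar)


def rp_with_weight(y2, x2, x1_mean, w):
    """(ybar/2)[(1 + w) x1/x + (1 - w) x/x1]; w = C_hat (Case I) or theta * C_hat (Case II)."""
    y_bar, x_bar = statistics.fmean(y2), statistics.fmean(x2)
    return y_bar / 2 * ((1 + w) * x1_mean / x_bar + (1 - w) * x_bar / x1_mean)


# Hypothetical second-phase data and first-phase mean (illustration only)
y2 = [12.0, 15.5, 19.0, 24.5, 28.0]
x2 = [5.0, 6.5, 8.0, 10.5, 12.0]
x1_mean = 8.6
N, n, n1 = 200, len(y2), 20       # hypothetical sizes, used only for theta

a = (1 - n / N) / n               # (1-f)/n
b = (1 - n1 / N) / n1             # (1-f1)/n1
theta = a / (a + b)

C_est = c_hat(y2, x2, x1_mean)
print(round(C_est, 4))
print(round(rp_with_weight(y2, x2, x1_mean, C_est), 3))          # Case I
print(round(rp_with_weight(y2, x2, x1_mean, theta * C_est), 3))  # Case II
```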
Comparison with Single Phase Sampling
In this section double and single-phase sampling are compared for a fixed cost. We consider Cases I and II separately.

Case I. In this case we consider the cost function

$$c = c_1 n + c_2 n_1 \qquad (23)$$

where c is the total cost of the survey, and $c_1$ and $c_2$ are the costs per unit of collecting information on y and x, respectively.
In this case, ignoring the fpc, we write the variance of the AOE $\bar{y}^{(d_0)}_{RP}$ as

$$V = \frac{V_1}{n} + \frac{V_2}{n_1}$$

where $V_1 = S_y^2(1-\rho^2)$ and $V_2 = S_y^2\rho^2$.

The optimum values of n and $n_1$ that minimize V for the fixed cost c given by equation (23) are

$$\left.\begin{aligned}
n_{opt} &= \frac{c\sqrt{V_1/c_1}}{\sqrt{V_1c_1} + \sqrt{V_2c_2}} = \frac{c\sqrt{(1-\rho^2)/c_1}}{\sqrt{c_1(1-\rho^2)} + \rho\sqrt{c_2}}\\
n_{1,opt} &= \frac{c\sqrt{V_2/c_2}}{\sqrt{V_1c_1} + \sqrt{V_2c_2}} = \frac{c\rho\sqrt{1/c_2}}{\sqrt{c_1(1-\rho^2)} + \rho\sqrt{c_2}}
\end{aligned}\right\}$$

The variance of $\bar{y}^{(d_0)}_{RP}$ corresponding to the optimal double sampling design is

$$V_{opt}\{\bar{y}^{(d_0)}_{RP}\} = \frac{1}{c}\left(\sqrt{c_1V_1} + \sqrt{c_2V_2}\right)^2 = \frac{S_y^2}{c}\left(\sqrt{c_1(1-\rho^2)} + \rho\sqrt{c_2}\right)^2. \qquad (24)$$
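The Case I allocation formulas can be evaluated directly. In this sketch `optimum_case1` is a hypothetical helper, and the budget, unit costs, $S_y^2$ and ρ are made-up values chosen so that y is ten times dearer to observe than x. The formulas return continuous sizes; rounding to integers and enforcing $n \le n_1$ are left out.

```python
import math


def optimum_case1(c, c1, c2, S2y, rho):
    """Optimal (n, n1) and minimum variance (24) for c = c1*n + c2*n1, fpc ignored."""
    V1 = S2y * (1 - rho**2)
    V2 = S2y * rho**2
    denom = math.sqrt(V1 * c1) + math.sqrt(V2 * c2)
    n = c * math.sqrt(V1 / c1) / denom
    n1 = c * math.sqrt(V2 / c2) / denom
    v_opt = denom**2 / c
    return n, n1, v_opt


# Hypothetical budget and unit costs (illustration only)
n, n1, v_opt = optimum_case1(c=1000.0, c1=10.0, c2=1.0, S2y=1.0, rho=0.9)
print(round(n, 1), round(n1, 1), round(v_opt, 6))
```

A useful sanity check is that the returned sizes exhaust the budget exactly and that any other allocation on the same budget gives a larger V.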
Case II. In Case II, we assume that x is measured on $n^* = n + n_1$ units and y on n units. Following Srivastava (1970), we consider the simple cost function

$$c = c_1 n + c_2^* n^* \qquad (25)$$

where $c_1$ and $c_2^*$ denote the costs per unit of observing the y and x values, respectively.

Since $(1-\theta^*\rho^2)/n = (1-\rho^2)/n + \rho^2/(n+n_1)$, the variance of $\bar{y}^{(d_0^{**})}_{RP}$ in equation (19) can now be written, with $V_1$ and $V_2$ as in Case I, as

$$V^* = \frac{V_1}{n} + \frac{V_2}{n^*} \qquad (26)$$

To obtain the optimum allocation of the sample between the phases for a fixed cost c, we minimize equation (26) subject to condition (25). The minimum is easily found to be attained for

$$\frac{n}{n^*} = \left(\frac{V_1c_2^*}{V_2c_1}\right)^{1/2} = \left\{\frac{c_2^*(1-\rho^2)}{c_1\rho^2}\right\}^{1/2}.$$

Thus the minimum variance corresponding to these optimum values of n and $n_1$ is given by

$$V_{opt}\{\bar{y}^{(d_0^{**})}_{RP}\} = \frac{S_y^2}{c}\left\{\sqrt{(1-\rho^2)c_1} + \rho\sqrt{c_2^*}\right\}^2. \qquad (27)$$

Had all the resources been devoted to the study character y only, the optimum sample size would have been

$$n^{**} = c/c_1.$$
Thus, for a large population, the variance of the sample mean $\bar{y}$ for a given fixed cost c is

$$V_{opt}(\bar{y}) = \frac{c_1}{c}\,S_y^2. \qquad (28)$$

Case I. From equations (24) and (28), the proposed double sampling strategy is profitable as long as

$$V_{opt}\{\bar{y}^{(d_0)}_{RP}\} < V_{opt}(\bar{y})$$

or equivalently,

$$\frac{c_2}{c_1} < \frac{\left(1 - \sqrt{1-\rho^2}\right)^2}{\rho^2}.$$
Table 2. Description of the populations (b)

| Population | $\rho$ | $C_y$ | $C_x$ | $C$ | Optimum value $k_0 = (1+C)/2$ | Optimum value $k_0^* = (1+\theta C)/2$ |
|---|---|---|---|---|---|---|
| 1 | 0.930 | 2.00500 | 1.59775 | 1.1671 | 1.08355 | 0.95598 |
| 2 | 0.840 | 2.00500 | 1.44699 | 1.1639 | 1.08195 | 0.95473 |
| 3 | 0.7418 | 0.23833 | 0.09198 | 1.9221 | 1.46105 | 1.2474 |
| 4 | 0.5677 | 0.23833 | 0.11265 | 1.2011 | 1.10055 | 0.96709 |
| 5 | 0.720 | 1.35277 | 0.24495 | 3.9763 | 2.48815 | 2.06514 |
| 6 | 0.520 | 1.35277 | 0.46904 | 1.4998 | 1.24990 | 1.09035 |
| 7 | −0.7177 | 0.48031 | 0.17776 | −1.9392 | −0.46960 | −0.28041 |
| 8 | −0.4996 | 0.48031 | 0.74933 | −0.3202 | 0.3399 | 0.37114 |
| 9 | −0.4074 | 0.20174 | 0.15033 | −0.5467 | 0.22664 | 0.27782 |
| 10 | −0.0591 | 0.20174 | 0.27678 | −0.0431 | 0.47845 | 0.48237 |
Table 1. Description of the populations (a)

| Population | Source | Study variate y | Auxiliary variate x | N | n | $n_1$ |
|---|---|---|---|---|---|---|
| 1 | Sukhatme & Chand (1977) | Bushels of apples harvested in 1964 | Apple trees bearing age in 1964 | 120 | 20 | 50 |
| 2 | Sukhatme & Chand (1977) | Bushels of apples harvested in 1964 | Bushels of apples harvested in 1959 | 120 | 20 | 50 |
| 3 | Srivastava (1971) | Yield per plant | Height of the plant | 50 | 8 | 20 |
| 4 | Srivastava (1971) | Yield per plant | Base diameter | 50 | 8 | 20 |
| 5 | Tripathi (1980) | Persons in services | Educated persons | 225 | 40 | 100 |
| 6 | Tripathi (1980) | Persons in services | Persons employed | 225 | 40 | 100 |
| 7 | Steel & Torrie (1960) | Log of leaf burn in secs | Nitrogen percentage | 30 | 8 | 18 |
| 8 | Steel & Torrie (1960) | Log of leaf burn in secs | Chlorine percentage | 30 | 8 | 18 |
| 9 | Dobson (1990) | Total calories percentage from carbohydrate | Body weight | 20 | 5 | 12 |
| 10 | Dobson (1990) | Total calories percentage from carbohydrate | Age | 20 | 5 | 12 |
Case II. From equations (27) and (28), the double sampling estimator $\bar{y}^{(d_0^{**})}_{RP}$ has a smaller variance than the sample mean $\bar{y}$ for the same fixed cost if

$$\rho^2 > \frac{4c_1c_2^*}{(c_1 + c_2^*)^2}.$$
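The fixed-cost comparisons can be explored numerically by evaluating (24)/(27) and (28) directly. In the sketch below all function names are hypothetical; `vopt_double` serves both cases because (27) has the same form as (24) with $c_2$ replaced by $c_2^*$, and `abs(rho)` is an added assumption so that the term $\sqrt{c_2V_2}$ is also correct for negative ρ. The grid of correlations and cost ratios is illustrative and uses only cost ratios $c_2/c_1 < 1$.

```python
import math


def vopt_double(c, c1, c2, S2y, rho):
    """Equations (24)/(27): minimum variance of the optimal double sampling AOE, fpc ignored."""
    return S2y / c * (math.sqrt(c1 * (1 - rho**2)) + abs(rho) * math.sqrt(c2)) ** 2


def vopt_mean(c, c1, S2y):
    """Equation (28): the whole budget spent on observing y."""
    return c1 / c * S2y


def case1_profitable(c1, c2, rho):
    """Case I condition: c2/c1 < (1 - sqrt(1 - rho^2))^2 / rho^2."""
    return c2 / c1 < (1 - math.sqrt(1 - rho**2)) ** 2 / rho**2


def case2_profitable(c1, c2_star, rho):
    """Case II condition as stated above: rho^2 > 4 c1 c2* / (c1 + c2*)^2."""
    return rho**2 > 4 * c1 * c2_star / (c1 + c2_star) ** 2


c, c1, S2y = 1000.0, 1.0, 1.0
for rho in (0.3, 0.6, 0.9, -0.8):
    for ratio in (0.01, 0.05, 0.1, 0.3, 0.6):
        better = vopt_double(c, c1, ratio * c1, S2y, rho) < vopt_mean(c, c1, S2y)
        print(rho, ratio, better,
              case1_profitable(c1, ratio * c1, rho), case2_profitable(c1, ratio * c1, rho))
```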
Empirical Study
To observe the relative performance of different estimators of $\bar{Y}$, we consider ten natural population data sets. The populations are described in Tables 1 and 2. We have computed the percentage relative efficiency of the different estimators with respect to $\bar{y}$; the results are shown in Table 3.
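The derived columns of Table 2 can be recomputed from the tabulated ρ, $C_y$, $C_x$ and the sample sizes in Table 1. The sketch below, with a hypothetical helper `table2_row`, assumes $\theta = \{(1-f)/n\}/\{(1-f)/n + (1-f_1)/n_1\}$ (finite population corrections kept) and reproduces the entries for three of the populations up to the rounding of the inputs.

```python
def table2_row(rho, Cy, Cx, N, n, n1):
    """Return C, k0 = (1 + C)/2 and k0* = (1 + theta*C)/2 for one population."""
    C = rho * Cy / Cx
    a = (1 - n / N) / n      # (1 - f)/n
    b = (1 - n1 / N) / n1    # (1 - f1)/n1
    theta = a / (a + b)
    return C, (1 + C) / 2, (1 + theta * C) / 2


# (rho, Cy, Cx) from Table 2 and (N, n, n1) from Table 1
populations = {
    3: ((0.7418, 0.23833, 0.09198), (50, 8, 20)),
    7: ((-0.7177, 0.48031, 0.17776), (30, 8, 18)),
    10: ((-0.0591, 0.20174, 0.27678), (20, 5, 12)),
}
rows = {p: table2_row(*moments, *design) for p, (moments, design) in populations.items()}
for p, (C, k0, k0_star) in rows.items():
    print(p, round(C, 4), round(k0, 5), round(k0_star, 5))
```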
Table 3 shows that there is a considerable gain in efficiency from using the suggested estimators $\bar{y}^{(d_0)}_{RP}$ (or $\hat{\bar{y}}^{(d_0)}_{RP}$) and $\bar{y}^{(d_0^*)}_{RP}$ (or $\hat{\bar{y}}^{(d_0^*)}_{RP}$) over the conventional estimators $\bar{y}$, $\bar{y}^{(d)}_{R}$ and $\bar{y}^{(d)}_{P}$, except for population 10, where $\bar{y}$, $\bar{y}^{(d_0)}_{RP}$ (or $\hat{\bar{y}}^{(d_0)}_{RP}$) and $\bar{y}^{(d_0^*)}_{RP}$ (or $\hat{\bar{y}}^{(d_0^*)}_{RP}$) are almost equally efficient; this is due to the poor correlation between y and x. It is further observed that $\bar{y}^{(d_0^*)}_{RP}$ (or $\hat{\bar{y}}^{(d_0^*)}_{RP}$) is more efficient than $\bar{y}^{(d_0)}_{RP}$ (or $\hat{\bar{y}}^{(d_0)}_{RP}$) for all data sets. Thus, the proposed estimators $\bar{y}^{(d_0)}_{RP}$ (or $\hat{\bar{y}}^{(d_0)}_{RP}$) and $\bar{y}^{(d_0^*)}_{RP}$ (or $\hat{\bar{y}}^{(d_0^*)}_{RP}$) are to be preferred.
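As a cross-check of Table 3, the percent relative efficiencies for population 1 can be recomputed from equations (9), (11) and (14) and the AOE variances, keeping the finite population corrections. This is a sketch with a hypothetical helper `pre_population`; by Remark 4.1 the AOE columns apply equally to the estimated-optimum versions. The results agree with Table 3 to within about one percentage point, and the small discrepancies presumably stem from the rounding of ρ, $C_y$ and $C_x$ in Table 2.

```python
def pre_population(rho, Cy, Cx, N, n, n1):
    """Percent relative efficiencies w.r.t. ybar (fpc kept); variances divided by Ybar^2."""
    a = (1 - n / N) / n        # (1 - f)/n
    b = (1 - n1 / N) / n1      # (1 - f1)/n1
    d = 1 / n - 1 / n1         # (1 - f*)/n
    C = rho * Cy / Cx
    theta = a / (a + b)
    v_mean = a * Cy**2                                              # equation (11)
    variances = {
        "ratio, Case I": v_mean + d * (1 - 2 * C) * Cx**2,          # equation (9)
        "ratio, Case II": v_mean + (a + b - 2 * a * C) * Cx**2,     # equation (14), k = 1
        "AOE, Case I": Cy**2 * (a * (1 - rho**2) + b * rho**2),
        "AOE, Case II": v_mean * (1 - theta * rho**2),
    }
    return {name: 100 * v_mean / v for name, v in variances.items()}


pre = pre_population(rho=0.930, Cy=2.00500, Cx=1.59775, N=120, n=20, n1=50)
for name, value in pre.items():
    print(f"{name}: {value:.2f}")
```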
References
Barnett, V., Haworth, J. & Smith, T. M. F. (2001) A two-phase sampling scheme with applications to auditing or sed quis custodiet ipsos custodes, Journal of the Royal Statistical Society (Series A), 164, pp. 407–422.
Bose, C. (1943) Note on the sampling error in the method of double sampling, Sankhya, 6, pp. 329–330.
Breslow, N. E. & Holubkov, R. (1997) Maximum likelihood estimation of logistic regression parameters under
two-phase outcome-dependent sampling, Journal of the Royal Statistical Society (Series B), 59,
pp. 447–461.
Chand, L. (1975) Some ratio-type estimators based on two or more auxiliary variables, PhD Thesis (Ames: Iowa State University).
Cochran, W. G. (1963) Sampling Techniques, 2nd edn (New York: Wiley).
Dobson, A. J. (1990) An Introduction to Generalized Linear Models, 1st edn (New York: Chapman & Hall).
Dorfman, A. H. (1994) A note on variance estimation for the regression estimator in double sampling, Journal of
the American Statistical Association, 89, pp. 137–140.
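Table 3. Percent relative efficiencies

| Population | $\bar{y}$ | $\bar{y}^{(d)}_{R}$ Case I | $\bar{y}^{(d)}_{R}$ Case II | $\bar{y}^{(d)}_{P}$ Case I | $\bar{y}^{(d)}_{P}$ Case II | $\bar{y}^{(d_0)}_{RP}$ or $\hat{\bar{y}}^{(d_0)}_{RP}$ (Case I) | $\bar{y}^{(d_0^*)}_{RP}$ or $\hat{\bar{y}}^{(d_0^*)}_{RP}$ (Case II) |
|---|---|---|---|---|---|---|---|
| 1 | 100.00 | 256.42 | 303.14 | * | * | 265.10 | 309.04 |
| 2 | 100.00 | 199.18 | 220.41 | * | * | 203.26 | 223.09 |
| 3 | 100.00 | 143.39 | 161.57 | * | * | 164.76 | 174.82 |
| 4 | 100.00 | 128.83 | 133.23 | * | * | 129.91 | 133.46 |
| 5 | 100.00 | 119.95 | 128.06 | * | * | 160.84 | 168.95 |
| 6 | 100.00 | 121.27 | 126.25 | * | * | 124.58 | 127.05 |
| 7 | 100.00 | * | * | 142.59 | 156.51 | 163.99 | 170.83 |
| 8 | 100.00 | * | * | * | * | 123.31 | 125.14 |
| 9 | 100.00 | * | * | 104.21 | * | 114.83 | 115.47 |
| 10 | 100.00 | * | * | * | * | 100.27 | 100.29 |

*Data not applicable and percent relative efficiency less than 100%.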
Hidiroglou, M. A. & Särndal, C. E. (1998) Use of auxiliary information for two-phase sampling, Survey Methodology, 24, pp. 11–20.
Mukerjee, R., Rao, T. J. & Vijayan, K. (1987) Regression type estimator using multiple auxiliary information,
Australian Journal of Statistics, 29, pp. 244–254.
Murthy, M. N. (1964) Product method of estimation, Sankhya (Series A), 26, pp. 294–307.
Neyman, J. (1938) Contribution to the theory of sampling human populations, Journal of the American Statistical
Association, 33, pp. 101–116.
Pitt, D. G., Glover, G. R. & Jones, R. H. (1996) Two-phase sampling of woody and herbaceous plant communities using large-scale aerial photographs, Canadian Journal of Forest Research, 26, pp. 509–524.
Prasad, B., Singh, R. S. & Singh, H. P. (1996) Some chain ratio-type estimators for ratio of two population means
using two auxiliary characters in two phase sampling, Metron, 54, pp. 95–113.
Rao, J. N. K. & Sitter, R. R. (1995) Variance estimation under two phase sampling with application to imputation
for missing data, Biometrika, 82, pp. 453–460.
Robson, D. S. (1957) Applications of multivariate polykays to the theory of unbiased ratio-type estimation,
Journal of the American Statistical Association, 52, pp. 511–522.
Seth, G. R., Sukhatme, B. V. & Manwani, A. H. (1968) Sample surveys on fruit crops, in: Contributions in Statistics and Agricultural Sciences, pp. 181–190 (New Delhi: Indian Society of Agricultural Statistics).
Singh, D. (1968) Double sampling and its application in agriculture, in: Contributions in Statistics and
Agricultural Sciences, pp. 213–226 (New Delhi: Indian Society of Agricultural Statistics).
Singh, D. & Singh, B. D. (1965) Double sampling for stratification on successive occasions, Journal of the
American Statistical Association, 60, pp. 784–792.
Singh, H. P. & Ruiz Espejo, M. (2000) An improved class of chain regression estimators in two phase sampling,
Statistics & Decisions, 18, pp. 205–218.
Singh, H. P. & Ruiz Espejo, M. (2003) On linear regression and ratio-product estimation of a finite population
mean, The Statistician, 52, pp. 59–67.
Sitter, R. R. (1997) Variance estimation for the regression estimator in two-phase sampling, Journal of the
American Statistical Association, 92, pp. 780–787.
Snedecor, G. W. & King, A. J. (1942) Recent developments in sampling for agricultural statistics, Journal of the
American Statistical Association, 37, pp. 95–102.
Srivastava, S. K. (1970) A two-phase sampling estimator in sample surveys, Australian Journal of Statistics, 2,
pp. 23–27.
Srivastava, S. K. (1971) Generalized estimator for the mean of a finite population using multiauxiliary
information, Journal of the American Statistical Association, 66, pp. 404–407.
Steel, R. G. D. & Torrie, J. H. (1960) Principles and Procedures of Statistics (New York: McGraw-Hill).
Sukhatme, B. V. (1962) Some ratio-type estimators in two-phase sampling, Journal of the American Statistical
Association, 57, pp. 628–632.
Sukhatme, B. V. & Chand, L. (1977) Multivariate ratio type estimators, Proceedings of the Social Statistics
Section of the American Statistical Association, pp. 927–931.
Sukhatme, B. V. & Koshal, R. S. (1959) A contribution to double sampling, Journal of the Indian Society of
Agricultural Statistics, 11, pp. 128–144.
Tripathi, T. P. (1980) A general class of estimators of population ratio, Sankhya (Series C), 42, pp. 63–75.
Unnikrishan, N. K. & Kunte, S. (1995) Optimality of an analogue of Basu’s estimator under a double sampling
design, Sankhya (Series B), 57, pp. 103–111.
Yates, F. (1960) Sampling Methods for Censuses and Surveys, 2nd edn (London: Charles Griffin).
York, I., Madigan, D., Heuch, I. & Lie, R. T. (1995) Birth defects registered by double sampling: a Bayesian
approach incorporating covariates and model uncertainty, Applied Statistics, 44, pp. 227–242.