
METRON - International Journal of Statistics, 2009, vol. LXVII, n. 1, pp. 57-74

M. T. ALODAT – M. Y. AL-RAWWASH – I. M. NAWAJAH

Analysis of simple linear regression via median ranked set sampling

Summary - The purpose of this article is to study the simple linear regression model under the median ranked set sampling (MRSS) scheme. If the response variable is more easily ranked than quantified, or if measuring the response variable is expensive while ranking it is cheap, then MRSS is a recommended scheme for collecting data on the response variable. We obtain estimators of the regression parameters from the collected data under the assumption that the errors are symmetrically distributed, and we elaborate on some of the mathematical properties of these estimators. We also show that the new estimators are more efficient than those obtained via a simple random sample (SRS) and an extreme ranked set sample (ERSS). An application to real data is also presented.

Key Words - Extreme ranked set sample; Median ranked set sample; Ranked set sample; Simple linear regression.

1. Introduction

McIntyre (1952) was the first to introduce the so-called Ranked Set Sampling (RSS), which is considered a relatively new method of sampling compared to other sampling methods. The RSS can be summarized as follows. Randomly select $m^2$ sample units from the population of interest and allocate them as randomly as possible into $m$ sets, each of size $m$. Rank the units within each set by visual inspection. Then select the RSS of size $m$ for actual analysis: it consists of the smallest ranked unit from the first set, the second smallest ranked unit from the second set, continuing in this fashion until the largest ranked unit is selected from the last set. The cycle may be repeated $r$ times until a desired sample of size $n = rm$ is obtained. The RSS scheme can be compared to the SRS scheme via the following two criteria:

Received June 2008 and revised April 2009.


1. Relative efficiency. The relative efficiency of RSS with respect to SRS in the estimation of the population mean is defined as

$$\mathrm{RE}(\bar{X}_{RSS}, \bar{X}_{SRS}) = \frac{\mathrm{Var}(\bar{X}_{SRS})}{\mathrm{Var}(\bar{X}_{RSS})},$$

where $\bar{X}_{SRS}$ and $\bar{X}_{RSS}$ are unbiased estimators of the population mean $\mu$ based on equal-size samples obtained from the SRS and RSS sampling schemes, respectively. If $\mathrm{RE}(\bar{X}_{RSS}, \bar{X}_{SRS}) > 1$, then $\mathrm{Var}(\bar{X}_{RSS}) < \mathrm{Var}(\bar{X}_{SRS})$.

2. Relative saving.

$$\mathrm{RS} = 1 - \frac{1}{\mathrm{RE}(\bar{X}_{RSS}, \bar{X}_{SRS})},$$

which equals the proportion of sampling units saved by using the RSS method.
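These two criteria can be illustrated with a small Monte Carlo sketch (not part of the paper; set size $m = 3$, $r = 2$ cycles, a standard normal population, and perfect ranking are our illustrative assumptions):

```python
import random
import statistics

def rss_sample(m, r, draw):
    """One ranked set sample of size m*r: in each of r cycles, for each
    rank i, rank a fresh set of m units and quantify only its i-th smallest."""
    out = []
    for _ in range(r):
        for i in range(m):
            out.append(sorted(draw() for _ in range(m))[i])
    return out

def relative_efficiency(m, r, reps, draw):
    """Monte Carlo estimate of RE = Var(mean_SRS) / Var(mean_RSS)
    for equal sample sizes n = m*r (perfect ranking assumed)."""
    n = m * r
    srs_means = [statistics.fmean(draw() for _ in range(n)) for _ in range(reps)]
    rss_means = [statistics.fmean(rss_sample(m, r, draw)) for _ in range(reps)]
    return statistics.pvariance(srs_means) / statistics.pvariance(rss_means)

random.seed(2024)
re_hat = relative_efficiency(m=3, r=2, reps=30000, draw=lambda: random.gauss(0.0, 1.0))
rs_hat = 1.0 - 1.0 / re_hat  # relative saving
```

For a normal population with $m = 3$ and perfect ranking, the estimated RE lands near 1.9, consistent with the Takahasi-Wakimoto bounds quoted below.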

Ranked set sampling has been considered and modified by many authors over the last few decades. To summarize the main activities in this field, we introduce this review based on the bibliography by Kaur et al. (1995). Among the early work on RSS, McIntyre (1952) illustrated the RSS technique and proposed $\bar{X}_{RSS} = \sum_{i=1}^{n} X_{i(i)}/n$ as a trial estimator of $\mu$ versus the traditional estimator $\bar{X}_{SRS}$. McIntyre (1952) claimed, without proof, that $\bar{X}_{RSS}$ is an unbiased estimator of the population mean regardless of any error in ranking judgment. Numerous authors have tried to evaluate the performance of RSS (for more details see Halls and Dell, 1966; Martin et al., 1980; and Alodat and Al-Sagheer, 2007). On the other hand, some authors have studied RSS under certain distributional assumptions, and we briefly introduce some of them.

Takahasi and Wakimoto (1968) gave the mathematical framework for RSS. They showed that the sample mean $\bar{X}_{RSS}$ is the minimum variance unbiased estimator for the population mean. Moreover, when ranking is perfect, they established the following inequality on $\mathrm{RE}(\bar{X}_{RSS}, \bar{X}_{SRS})$:

$$1 \le \mathrm{RE}(\bar{X}_{RSS}, \bar{X}_{SRS}) \le \frac{m+1}{2},$$

where $m$ is the set size. Muttlak (1995) used the RSS to estimate the parameters of the simple linear regression model assuming that the values of the explanatory variable $X$ are known constants. Muttlak (1997) proposed the median ranked set sampling (MRSS) scheme. This scheme is similar to the RSS scheme, but we quantify the median of each set instead of quantifying the other order statistics. Muttlak (1997) showed that the MRSS is more efficient than the RSS for estimating the population mean when the population is symmetric. Yu and Lam (1997) proposed a regression-type estimator based on RSS. They demonstrated that this estimator is always more efficient than the regression estimator using SRS, and that it is more efficient than the estimator proposed by Patil et al. (1993) unless the correlation coefficient is low ($|\rho| < 0.4$).

In order to decide which sampling method is more appropriate for regression analysis, Patil et al. (1993) conducted an experiment to obtain the parameter estimates of a linear regression model. They showed that the estimator of the population mean using RSS is more efficient than the usual SRS regression estimator unless the correlation coefficient between the response $Y$ and the concomitant variable $X$ is high (say, $\rho > 0.85$). Under the assumption that the variable of interest and the concomitant variable follow a joint bivariate normal distribution, Yu and Lam (1997) found that the RSS regression estimator is more efficient than the SRS regression estimator unless the correlation between the two variables is low (say, $\rho < 0.4$).

Later, Chen and Wang (2004) proposed a general unbalanced RSS scheme in order to study the polynomial regression model by ranking on the concomitant variable $X$. They considered the regression model

$$Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \cdots + \beta_p X^p + \varepsilon,$$

where $Y$ is the response variable, $X$ is the predictor variable, the $\beta$'s are the regression parameters and $\varepsilon$ is a random error independent of $X$.

The extreme ranked set sampling (ERSS) procedure, as proposed by Samawi et al. (1996), can be described as follows:

1. Select $m$ random sets, each of size $m$, from the population of interest.
2. Rank the units within each set with respect to the variable of interest by visual inspection.
3. If $m$ is even, select the smallest unit from each of the first $m/2$ sets and the largest unit from each of the last $m/2$ sets for actual measurement. If $m$ is odd, select the smallest unit from each of the first $(m-1)/2$ sets, the largest unit from each of the next $(m-1)/2$ sets, and the median of the last set for actual measurement.

The cycle may be repeated $r$ times to get $n = mr$ units. These $n$ units represent an ERSS sample.
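The three steps above can be sketched in code (an illustrative sketch, not code from the paper; perfect ranking is assumed):

```python
import random

def erss_sample(m, r, draw):
    """One extreme ranked set sample of size m*r following steps 1-3 above
    (perfect ranking assumed)."""
    out = []
    for _ in range(r):
        sets = [sorted(draw() for _ in range(m)) for _ in range(m)]
        if m % 2 == 0:
            out += [s[0] for s in sets[:m // 2]]    # minima of the first m/2 sets
            out += [s[-1] for s in sets[m // 2:]]   # maxima of the last m/2 sets
        else:
            k = (m - 1) // 2
            out += [s[0] for s in sets[:k]]         # minima of the first (m-1)/2 sets
            out += [s[-1] for s in sets[k:2 * k]]   # maxima of the next (m-1)/2 sets
            out.append(sets[-1][m // 2])            # median of the last set
    return out

random.seed(7)
sample = erss_sample(m=5, r=3, draw=lambda: random.gauss(0.0, 1.0))
```

Each cycle inspects $m^2$ units but quantifies only $m$ of them, which is where the cost saving of ERSS comes from.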

In this paper, we introduce a new method to analyze the simple linear regression model. The measurements are collected using the median ranked set sampling scheme, where the response variable is assumed to be more easily ranked than quantified or its quantification happens to be expensive. In Section 2, we introduce the MRSS into the regression framework in order to build the estimation strategy accordingly; we then introduce the least squares estimators of the regression parameters under the MRSS scheme and discuss their properties. In Section 3, we present the ERSS in the simple linear regression setup and outline the estimation strategy as well as the properties of the resulting estimators. In Section 4, we conduct a comparison study among the three sampling schemes, namely SRS, ERSS and MRSS. In Section 5, a real example illustrates our procedure and theory regarding the advantages of the MRSS compared to SRS. Finally, we present our findings, discuss our ideas and state the final conclusions.

2. Median ranked set sample for regression

Consider the following simple linear regression model

$$Y_j = \beta_0 + \beta_1 x_j + \varepsilon_j, \quad j = 1, \ldots, r,$$

where

1. $\beta_0$ and $\beta_1$ are unknown parameters.
2. $E(Y_j \mid X_j = x_j) = \beta_0 + \beta_1 x_j$.
3. The $\varepsilon_j$'s are iid random errors with mean zero and unknown variance $\sigma^2$. The distribution of the errors is assumed to be symmetric, and for convenience we assume normality in this article.

To this end, we assume $x_1, x_2, \ldots, x_r$ to be different values of the predictor $X$ which are chosen by the experimenter. We assume that for each $x_j$ the experimenter has repeated the experiment $n = 2m - 1$ times; consequently, we let $Y_{j1}, Y_{j2}, \ldots, Y_{jn}$, $j = 1, 2, \ldots, r$, be the corresponding values of $Y$ at $x_j$, i.e.,

$$Y_{ji} = \beta_0 + \beta_1 x_j + \varepsilon_{ji}, \quad j = 1, \ldots, r, \; i = 1, \ldots, n,$$

where the $\varepsilon_{ji}$ are the corresponding errors. If the variable $Y$ seems to be more easily ranked than quantified, or if measuring the variable $Y$ is expensive while it can be ranked easily, then median ranked set sampling could be a proper scheme to collect our data on the variable $Y$. To accomplish this mission, using visual inspection, let $Y_{j(m)}$ denote the quantified median of $Y_{j1}, Y_{j2}, \ldots, Y_{jn}$. Using the collected data, $Y_{j(m)}$ can be related to $x_j$ using the following simple linear regression model

$$Y_{j(m)} = \beta_0 + \beta_1 x_j + \varepsilon_{j(m)}, \quad j = 1, \ldots, r, \tag{1}$$

where $\varepsilon_{j(m)}$ is the median of $\varepsilon_{j1}, \varepsilon_{j2}, \ldots, \varepsilon_{jn}$.
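The data-collection step of model (1) can be sketched as follows (an illustrative sketch, not code from the paper; the design points, coefficients and $m$ are made-up values):

```python
import random
import statistics

def mrss_responses(xs, beta0, beta1, sigma, m, rng):
    """For each design point x_j, run the experiment n = 2m - 1 times and
    quantify only the median response Y_{j(m)}, as in model (1)."""
    n = 2 * m - 1
    ys = []
    for x in xs:
        reps = [beta0 + beta1 * x + rng.gauss(0.0, sigma) for _ in range(n)]
        ys.append(statistics.median(reps))
    return ys

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = mrss_responses(xs, beta0=2.0, beta1=0.5, sigma=1.0, m=3, rng=random.Random(42))
```

Only one response per design point is actually quantified; the other $2m - 2$ replicates need only be ranked.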


Many authors in a wide range of research areas have reported that data collection is an expensive and time-consuming task. For example, environmental managers have highlighted this obstacle on several occasions, explaining how hard and expensive the process of obtaining estimates of species abundance is, especially if the species are rare or occur in remote locations. Potts and Elith (2006) noted that the Australian threatened plant species Leionema ralstonii is protected under both state and federal legislation due to its small geographic range and population size. Prediction of species abundance is supposed to guide management of the species and to serve as an input for population viability analysis. The model structure includes the choice of environmental characteristics (explanatory variables) that are expected to affect $\rho_j$, the species abundance (the response variable). The model of interest is

$$\log \rho_j = \beta_0 + \beta_1 x_j + \varepsilon_j,$$

where $x_j$ is an environmental factor which could be controlled by the experimenter.

Other examples can be found in the animal growth literature, such as studies of animal age. The age of an animal needs to be determined in this field of research, but aging an animal is usually time-consuming and costly, whereas variables on the physical size of an animal, which is closely related to age, can be collected easily and cheaply.

2.1. Distribution of errors

Based on the setup mentioned so far in this article, we carry out our estimation strategy for the regression parameters in (1) under the assumption that the errors $\varepsilon_{ji}$ are normally distributed $N(0, \sigma^2)$. Thus the probability density function (pdf) of $\varepsilon_{j(m)}$ is given by

$$f_{\varepsilon_{j(m)}}(\varepsilon; \sigma^2) = \frac{C_m}{\sigma}\left[\Phi\left(\frac{\varepsilon}{\sigma}\right)\right]^{m-1}\left[1 - \Phi\left(\frac{\varepsilon}{\sigma}\right)\right]^{m-1}\phi\left(\frac{\varepsilon}{\sigma}\right) = \frac{C_m}{\sigma}\left[\Phi\left(\frac{\varepsilon}{\sigma}\right)\Phi\left(\frac{-\varepsilon}{\sigma}\right)\right]^{m-1}\phi\left(\frac{\varepsilon}{\sigma}\right), \quad \varepsilon \in \mathbb{R},$$

where $C_m = \dfrac{(2m-1)!}{[(m-1)!]^2}$, and $\Phi$ and $\phi$ denote the standard normal cdf and pdf, respectively.

It can be seen that:

a. $f_{\varepsilon_{j(m)}}(\varepsilon; \sigma^2)$ is symmetric around zero.

b. $E(\varepsilon_{j(m)}) = 0$.


c. The ordered errors $\varepsilon_{j(m)}$ have a constant variance, since

$$\mathrm{Var}(\varepsilon_{j(m)}) = E\varepsilon_{j(m)}^2 - \left(E\varepsilon_{j(m)}\right)^2 = E\varepsilon_{j(m)}^2,$$

where

$$E\varepsilon_{j(m)}^2 = \frac{C_m}{\sigma}\int_{-\infty}^{\infty} \varepsilon^2\left[\Phi\left(\frac{\varepsilon}{\sigma}\right)\Phi\left(\frac{-\varepsilon}{\sigma}\right)\right]^{m-1}\phi\left(\frac{\varepsilon}{\sigma}\right)d\varepsilon = \sigma C_m\int_{-\infty}^{\infty}\left(\frac{\varepsilon}{\sigma}\right)^2\left[\Phi\left(\frac{\varepsilon}{\sigma}\right)\Phi\left(\frac{-\varepsilon}{\sigma}\right)\right]^{m-1}\phi\left(\frac{\varepsilon}{\sigma}\right)d\varepsilon.$$

Using the transformation $w = \varepsilon/\sigma$, we get

$$E\varepsilon_{j(m)}^2 = \sigma^2 C_m\int_{-\infty}^{\infty} w^2\,\Phi(w)^{m-1}\Phi(-w)^{m-1}\phi(w)\,dw = \sigma^2 D_m,$$

where $D_m = C_m\int_{-\infty}^{\infty} w^2\Phi(w)^{m-1}\Phi(-w)^{m-1}\phi(w)\,dw$. Note that $D_m$ is a constant depending only on $m$.

The following theorem, given in Arnold et al. (1992), will be used in the sequel.

Theorem 1. Let $X_{1:n}, X_{2:n}, \ldots, X_{n:n}$ be the order statistics of a random sample of size $n$ taken from a standard normal distribution. Let $\sigma_{ij} = \mathrm{cov}(X_{i:n}, X_{j:n})$. Then $\sigma_{ij} \ge 0$ and $\sum_{j=1}^{n}\sigma_{ij} = 1$ for all $i = 1, 2, \ldots, n$.

Using the above theorem with $n = 2m - 1$ and $i = j = m$, and noting that $\varepsilon_{j(m)}/\sigma$ is distributed as $X_{m:n}$, we conclude that

$$D_m = \mathrm{Var}(X_{m:n}) = \sigma_{mm} \le \sum_{j=1}^{n}\sigma_{mj} = 1,$$

since all the $\sigma_{mj}$ are nonnegative.

Table 1 reveals that Dm is a decreasing function of m.

Table 1: Values of Dm for different values of m.

m     1    2      3      4      5      6
D_m   1    0.449  0.287  0.210  0.167  0.137
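The entries of Table 1 can be reproduced numerically from the definition of $D_m$ (a sketch using Simpson's rule over a truncated range; not code from the paper):

```python
import math

def d_m(m, a=10.0, steps=4000):
    """D_m = C_m * integral of w^2 [Phi(w)Phi(-w)]^(m-1) phi(w) dw, i.e. the
    variance of the median of 2m - 1 standard normal observations,
    computed with composite Simpson's rule on [-a, a]."""
    c_m = math.factorial(2 * m - 1) / math.factorial(m - 1) ** 2
    Phi = lambda w: 0.5 * (1.0 + math.erf(w / math.sqrt(2.0)))
    phi = lambda w: math.exp(-0.5 * w * w) / math.sqrt(2.0 * math.pi)
    f = lambda w: w * w * (Phi(w) * Phi(-w)) ** (m - 1) * phi(w)
    h = 2.0 * a / steps
    acc = f(-a) + f(a)
    for i in range(1, steps):
        acc += (4.0 if i % 2 else 2.0) * f(-a + i * h)
    return c_m * acc * h / 3.0
```

The computed values agree with Table 1 to the three decimals shown, and $D_1 = 1$ recovers the variance of a single standard normal observation.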

Moreover, $Y_{j(m)}$, $j = 1, 2, \ldots, r$, are independent random variables such that for each $j$, $Y_{j(m)}$ has the following probability density function

$$f_{Y_{j(m)}}(y_j; \beta_0, \beta_1, \sigma^2) = \frac{C_m}{\sigma}\,\Phi(u_j)^{m-1}\left[1 - \Phi(u_j)\right]^{m-1}\phi(u_j),$$

where $u_j = \dfrac{y_j - \beta_0 - \beta_1 x_j}{\sigma}$. The mean and the variance of $Y_{j(m)}$ are

$$E(Y_{j(m)} \mid x_j) = \beta_0 + \beta_1 x_j,$$


and

$$\mathrm{Var}(Y_{j(m)} \mid x_j) = \mathrm{Var}(\beta_0 + \beta_1 x_j + \varepsilon_{j(m)}) = \sigma^2 D_m.$$

2.2. Least squares estimation using MRSS

To find the least squares estimators of $\beta_0$ and $\beta_1$, it is necessary to minimize the quantity

$$h_1(\beta_0, \beta_1) = \sum_{j=1}^{r}\left(Y_{j(m)} - \beta_0 - \beta_1 x_j\right)^2.$$

It can easily be seen that the least squares estimates of $\beta_0$ and $\beta_1$ are $\beta_{0M}$ and $\beta_{1M}$, where

$$\beta_{0M} = \bar{Y}_m - \beta_{1M}\bar{x}, \qquad \beta_{1M} = \frac{\sum_{j=1}^{r}\left(Y_{j(m)} - \bar{Y}_m\right)(x_j - \bar{x})}{S_{xx}},$$

and

$$\bar{x} = \frac{1}{r}\sum_{j=1}^{r} x_j, \qquad \bar{Y}_m = \frac{1}{r}\sum_{j=1}^{r} Y_{j(m)}, \qquad S_{xx} = \sum_{j=1}^{r}(x_j - \bar{x})^2.$$

Hence the best-fitting model is

$$\hat{Y}_j = \beta_{0M} + \beta_{1M} x_j.$$
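The estimators above are the ordinary least squares formulas applied to the quantified medians; a minimal sketch (illustrative, not code from the paper):

```python
import statistics

def fit_mrss(xs, ys):
    """Least squares fit of model (1) to the quantified medians Y_{j(m)}:
    returns (beta0M, beta1M)."""
    xbar, ybar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((y - ybar) * (x - xbar) for x, y in zip(xs, ys)) / sxx
    return ybar - b1 * xbar, b1

# On noiseless data the underlying line is recovered exactly.
xs = [1.0, 2.0, 3.0, 4.0]
b0, b1 = fit_mrss(xs, [2.0 + 3.0 * x for x in xs])
```

What changes under MRSS is not the formula but the error distribution of the responses, which is what drives the efficiency gains established below.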

Theorem 2. Based upon the assumptions and results stated earlier, the least squares estimators of the regression parameters have the following properties:

1. $E(\beta_{1M}) = \beta_1$.

2. $E(\beta_{0M}) = \beta_0$.

3. $\mathrm{Var}(\beta_{1M}) = \sigma^2 D_m / S_{xx}$.

4. $\mathrm{Var}(\beta_{0M}) = \sigma^2 D_m\left[\dfrac{1}{r} + \dfrac{\bar{x}^2}{S_{xx}}\right]$.

5. $\mathrm{Cov}(\bar{Y}_m, \beta_{1M}) = 0$.

6. $\mathrm{Cov}(\beta_{0M}, \beta_{1M}) = -\bar{x}\,\sigma^2 D_m / S_{xx}$.


Proof. We outline the proof only for properties 1 and 5 and leave the rest to the reader.

1. $\beta_{1M}$ is an unbiased estimator of $\beta_1$:

$$E(\beta_{1M}) = E\left[\frac{\sum_{j=1}^{r}(x_j - \bar{x})\,Y_{j(m)}}{S_{xx}}\right] = \frac{\sum_{j=1}^{r}(x_j - \bar{x})\,EY_{j(m)}}{S_{xx}} = \frac{\beta_0\sum_{j=1}^{r}(x_j - \bar{x})}{S_{xx}} + \frac{\beta_1\sum_{j=1}^{r}(x_j - \bar{x})\,x_j}{S_{xx}} + \frac{\sum_{j=1}^{r}(x_j - \bar{x})\,E\varepsilon_{j(m)}}{S_{xx}}.$$

Since $\sum_{j=1}^{r}(x_j - \bar{x}) = 0$, $\sum_{j=1}^{r}(x_j - \bar{x})x_j = S_{xx}$ and $E\varepsilon_{j(m)} = 0$, it follows that

$$E(\beta_{1M}) = \beta_1.$$

5. $\mathrm{Cov}(\bar{Y}_m, \beta_{1M}) = E(\bar{Y}_m\beta_{1M}) - E\bar{Y}_m\,E\beta_{1M}$, where $E\bar{Y}_m = \beta_0 + \beta_1\bar{x}$ and $E\beta_{1M} = \beta_1$. Hence

$$\mathrm{Cov}(\bar{Y}_m, \beta_{1M}) = E\left[\frac{\sum_{j=1}^{r}(x_j - \bar{x})\,Y_{j(m)}}{S_{xx}}\cdot\frac{\sum_{i=1}^{r} Y_{i(m)}}{r}\right] - (\beta_0 + \beta_1\bar{x})\beta_1$$

$$= \frac{1}{rS_{xx}}\sum_{i=1}^{r}\sum_{j=1}^{r}(x_j - \bar{x})\,E(Y_{j(m)}Y_{i(m)}) - (\beta_0 + \beta_1\bar{x})\beta_1$$

$$= \frac{1}{rS_{xx}}\sum_{i=1}^{r}\sum_{j=1}^{r}(x_j - \bar{x})\left\{\mathrm{Cov}(Y_{j(m)}, Y_{i(m)}) + EY_{j(m)}\,EY_{i(m)}\right\} - (\beta_0 + \beta_1\bar{x})\beta_1.$$

Since the $Y_{j(m)}$ are independent, $\mathrm{Cov}(Y_{j(m)}, Y_{i(m)})$ vanishes for $i \ne j$, and the remaining covariance terms contribute $\frac{\sigma^2 D_m}{rS_{xx}}\sum_{j=1}^{r}(x_j - \bar{x}) = 0$. Therefore

$$\mathrm{Cov}(\bar{Y}_m, \beta_{1M}) = \frac{1}{rS_{xx}}\sum_{i=1}^{r}\sum_{j=1}^{r}(x_j - \bar{x})(\beta_0 + \beta_1 x_j)(\beta_0 + \beta_1 x_i) - (\beta_0 + \beta_1\bar{x})\beta_1$$

$$= \frac{1}{rS_{xx}}\left[\beta_0\sum_{j=1}^{r}(x_j - \bar{x}) + \beta_1\sum_{j=1}^{r}(x_j - \bar{x})x_j\right]\sum_{i=1}^{r}(\beta_0 + \beta_1 x_i) - (\beta_0 + \beta_1\bar{x})\beta_1$$

$$= \frac{1}{rS_{xx}}\left[\beta_1 S_{xx}\sum_{i=1}^{r}(\beta_0 + \beta_1 x_i)\right] - (\beta_0 + \beta_1\bar{x})\beta_1$$

$$= \frac{\beta_1}{r}\sum_{i=1}^{r}(\beta_0 + \beta_1 x_i) - (\beta_0 + \beta_1\bar{x})\beta_1 = \beta_1\beta_0 + \beta_1^2\bar{x} - \beta_1\beta_0 - \beta_1^2\bar{x} = 0.$$

Theorem 3. Under the same assumptions stated in this article and using model (1), the statistic

$$\frac{1}{D_m(r-2)}\sum_{j=1}^{r}\left(Y_{j(m)} - \hat{Y}_j\right)^2$$

is an unbiased estimator for $\sigma^2$.

Proof. To show this, let

$$S_m^2 = \frac{\mathrm{SSE}}{r-2}, \qquad \mathrm{SSE} = \sum_{j=1}^{r}\left(Y_{j(m)} - \hat{Y}_j\right)^2, \qquad \hat{Y}_j = \beta_{0M} + \beta_{1M}x_j, \quad j = 1, \ldots, r.$$

It is easy to rewrite the term SSE as

$$\mathrm{SSE} = \sum_{j=1}^{r}\left(Y_{j(m)} - \bar{Y}_m\right)^2 - \beta_{1M}^2 S_{xx}.$$

Taking the expectation of the first term in the last equation yields

$$E\left[\sum_{j=1}^{r}\left(Y_{j(m)} - \bar{Y}_m\right)^2\right] = E\left[\sum_{j=1}^{r} Y_{j(m)}^2\right] - E\left(r\bar{Y}_m^2\right).$$

Since

$$E\bar{Y}_m = \beta_0 + \beta_1\bar{x} \quad \text{and} \quad \mathrm{Var}(\bar{Y}_m) = \frac{\sigma^2 D_m}{r},$$


then

$$E\bar{Y}_m^2 = \frac{\sigma^2 D_m}{r} + (\beta_0 + \beta_1\bar{x})^2,$$

and

$$E\left[\sum_{j=1}^{r} Y_{j(m)}^2\right] = r\sigma^2 D_m + \sum_{j=1}^{r}(\beta_0 + \beta_1 x_j)^2.$$

According to the above results, we get

$$E\left[\sum_{j=1}^{r} Y_{j(m)}^2\right] - E\left(r\bar{Y}_m^2\right) = r\sigma^2 D_m + \sum_{j=1}^{r}(\beta_0 + \beta_1 x_j)^2 - r\left(\frac{\sigma^2 D_m}{r} + (\beta_0 + \beta_1\bar{x})^2\right)$$

$$= r\sigma^2 D_m + \sum_{j=1}^{r}\left[\beta_0^2 + \beta_1^2 x_j^2 + 2\beta_0\beta_1 x_j\right] - \sigma^2 D_m - r\left[\beta_0^2 + \beta_1^2\bar{x}^2 + 2\beta_0\beta_1\bar{x}\right],$$

which leads us to

$$E\left[\sum_{j=1}^{r} Y_{j(m)}^2\right] - E\left(r\bar{Y}_m^2\right) = \sigma^2 D_m(r-1) + \beta_1^2 S_{xx}.$$

Note that

$$E\left[\beta_{1M}^2 S_{xx}\right] = S_{xx}\left[\frac{\sigma^2 D_m}{S_{xx}} + \beta_1^2\right] = \sigma^2 D_m + \beta_1^2 S_{xx},$$

since $E(\beta_{1M}^2) = \mathrm{Var}(\beta_{1M}) + \beta_1^2$. Therefore

$$E(\mathrm{SSE}) = E\left[\sum_{j=1}^{r}\left(Y_{j(m)} - \bar{Y}_m\right)^2\right] - E\left[\beta_{1M}^2 S_{xx}\right] = \sigma^2 D_m(r-1) + \beta_1^2 S_{xx} - \sigma^2 D_m - \beta_1^2 S_{xx} = \sigma^2 D_m(r-2).$$

Now

$$E S_m^2 = \frac{E(\mathrm{SSE})}{r-2} = \sigma^2 D_m,$$

which allows us to conclude that $\dfrac{\mathrm{SSE}}{(r-2)D_m}$ is an unbiased estimator for $\sigma^2$.

On the other hand, it is easy to see that

$$\mathrm{SSE} = \sum_{j=1}^{r}\left(Y_{j(m)} - \hat{Y}_j\right)^2 = S_{yy} - \beta_{1M}^2 S_{xx},$$

where $S_{yy} = \sum_{j=1}^{r}(Y_{j(m)} - \bar{Y}_m)^2$. We may also identify the decomposition $\mathrm{SST} = \mathrm{SSR} + \mathrm{SSE}$, where $\mathrm{SSR} = \sum_{j=1}^{r}(\hat{Y}_j - \bar{Y}_m)^2 = \beta_{1M}^2 S_{xx}$ is the sum of squares due to the regression line and $\mathrm{SST} = S_{yy}$ is the total sum of squares.

Knowing that $\mathrm{SSR} = \beta_{1M}^2 S_{xx}$ allows us to take its expectation in the following manner:

$$E(\mathrm{SSR}) = S_{xx}E(\beta_{1M}^2) = \beta_1^2 S_{xx} + \sigma^2 D_m.$$

Also,

$$E(S_{yy}) = E\left[\sum_{j=1}^{r} Y_{j(m)}^2 - r\bar{Y}_m^2\right] = \sum_{j=1}^{r} EY_{j(m)}^2 - rE\bar{Y}_m^2 = \sum_{j=1}^{r}\left[\mathrm{Var}(Y_{j(m)}) + \left(EY_{j(m)}\right)^2\right] - \left\{r\,\mathrm{Var}(\bar{Y}_m) + r\left(E\bar{Y}_m\right)^2\right\}.$$

More simplification reveals the following:

$$E(S_{yy}) = r\sigma^2 D_m + \sum_{j=1}^{r}(\beta_0 + \beta_1 x_j)^2 - r\left(\frac{\sigma^2 D_m}{r} + (\beta_0 + \beta_1\bar{x})^2\right)$$

$$= r\sigma^2 D_m + \sum_{j=1}^{r}\left(\beta_0^2 + 2\beta_0\beta_1 x_j + \beta_1^2 x_j^2\right) - \sigma^2 D_m - r\left(\beta_0^2 + 2\beta_0\beta_1\bar{x} + \beta_1^2\bar{x}^2\right).$$

This leads to concluding that

$$E(S_{yy}) = (r-1)\sigma^2 D_m + \beta_1^2 S_{xx}.$$
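The algebraic identity used throughout this proof, $\mathrm{SSE} = S_{yy} - \beta_{1M}^2 S_{xx}$, holds exactly for any data set and can be checked numerically (a sketch, not code from the paper; the data are made-up values):

```python
import statistics

def sse_two_ways(xs, ys):
    """Returns SSE computed directly from the residuals and via the
    identity SSE = Syy - beta1M^2 * Sxx used in the proof."""
    xbar, ybar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((y - ybar) * (x - xbar) for x, y in zip(xs, ys)) / sxx
    b0 = ybar - b1 * xbar
    direct = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    syy = sum((y - ybar) ** 2 for y in ys)
    return direct, syy - b1 ** 2 * sxx

direct, via_identity = sse_two_ways([1.0, 2.0, 3.0, 4.0, 5.0],
                                    [2.1, 2.9, 4.2, 3.8, 5.0])
```

The two computations agree up to floating-point rounding, reflecting that the identity is purely algebraic and does not depend on the error distribution.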


3. Extreme ranked set sampling

Let $x_{11}, x_{12}, \ldots, x_{1r}, x_{21}, x_{22}, \ldots, x_{2r}$ be different values of $x$ which are specified by the experimenter prior to the experiment. For each $x_{ij}$, $i = 1, 2$ and $j = 1, 2, \ldots, r$, assume that the experiment has been repeated an odd number $n$ of times. Let $Y_{j(1)}$ and $Y_{j(n)}$, $j = 1, 2, \ldots, r$, be the measurements on the response variable obtained using the ERSS procedure, i.e., the smallest and the largest of the $n$ responses, respectively. Then

$$Y_{j(1)} = \beta_0 + \beta_1 x_{1j} + \varepsilon_{j(1)} \quad \text{and} \quad Y_{j(n)} = \beta_0 + \beta_1 x_{2j} + \varepsilon_{j(n)},$$

where $j = 1, 2, \ldots, r$ and $\varepsilon_{j(1)}, \varepsilon_{j(n)}$ are the corresponding errors. We note that the random errors $\varepsilon_{1(1)}, \varepsilon_{2(1)}, \ldots, \varepsilon_{r(1)}$ are iid with pdf

$$f_1(\varepsilon) = \frac{n}{\sigma}\left[1 - \Phi\left(\frac{\varepsilon}{\sigma}\right)\right]^{n-1}\phi\left(\frac{\varepsilon}{\sigma}\right), \quad \varepsilon \in \mathbb{R},$$

and $\varepsilon_{1(n)}, \varepsilon_{2(n)}, \ldots, \varepsilon_{r(n)}$ are iid with pdf

$$f_2(\varepsilon) = \frac{n}{\sigma}\left[\Phi\left(\frac{\varepsilon}{\sigma}\right)\right]^{n-1}\phi\left(\frac{\varepsilon}{\sigma}\right), \quad \varepsilon \in \mathbb{R}.$$

Since the normal density is symmetric, $\varepsilon_{j(1)}$ and $-\varepsilon_{j(n)}$ have the same distribution, and

$$E(\varepsilon_{j(1)}) = -E(\varepsilon_{j(n)}) = \mu\sigma, \text{ say},$$

where

$$\mu = n\int_{-\infty}^{\infty} w\left[1 - \Phi(w)\right]^{n-1}\phi(w)\,dw.$$

It is easy to see that

$$\mathrm{Var}(\varepsilon_{j(1)}) = \mathrm{Var}(\varepsilon_{j(n)}) = \sigma^2\int_{-\infty}^{\infty}\frac{\varepsilon^2}{\sigma^2}\,\frac{n}{\sigma}\left[\Phi\left(\frac{\varepsilon}{\sigma}\right)\right]^{n-1}\phi\left(\frac{\varepsilon}{\sigma}\right)d\varepsilon - \mu^2\sigma^2.$$

Using the transformation $w = \varepsilon/\sigma$ we get

$$\mathrm{Var}(\varepsilon_{j(1)}) = \sigma^2\left[\int_{-\infty}^{\infty} n w^2\,\Phi(w)^{n-1}\phi(w)\,dw - \mu^2\right] = \sigma^2 D_m^*,$$

where $D_m^* = \int_{-\infty}^{\infty} n w^2\Phi(w)^{n-1}\phi(w)\,dw - \mu^2$. Using Theorem 1, it can be noted that $D_m^* \le 1$. Let

$$Y_{j(i)} = \beta_0 + \beta_1 x_{ij} + \varepsilon_{j(i)}, \quad i = 1, 2, \; j = 1, 2, \ldots, r,$$

where $Y_{j(i)}$ denotes the $j$th observation in the $i$th group. The least squares estimators of $\beta_0$ and $\beta_1$ can be obtained by minimizing the quantity $h_2(\beta_0, \beta_1) = \sum_{i \in \{1, n\}}\sum_{j=1}^{r}\varepsilon_{j(i)}^2$ as a function of $\beta_0$ and $\beta_1$. Since

$$h_2(\beta_0, \beta_1) = \sum_{j=1}^{r}\left(Y_{j(1)} - \beta_0 - \beta_1 x_{1j}\right)^2 + \sum_{j=1}^{r}\left(Y_{j(n)} - \beta_0 - \beta_1 x_{2j}\right)^2,$$

it is easy to see that

$$\beta_{1E} = \frac{\sum_{j=1}^{r}(x_{1j} - \bar{x}_1)\,Y_{j(1)} + \sum_{j=1}^{r}(x_{2j} - \bar{x}_2)\,Y_{j(n)}}{\sum_{j=1}^{r}(x_{1j} - \bar{x}_1)^2 + \sum_{j=1}^{r}(x_{2j} - \bar{x}_2)^2},$$

and

$$\beta_{0E} = \bar{Y} - \beta_{1E}\bar{x},$$

where $\bar{x}$ is the mean of all $2r$ covariate values and

$$\bar{x}_1 = \frac{1}{r}\sum_{j=1}^{r} x_{1j}, \qquad \bar{x}_2 = \frac{1}{r}\sum_{j=1}^{r} x_{2j}, \qquad \bar{Y} = \frac{1}{2r}\sum_{j=1}^{r}\left(Y_{j(1)} + Y_{j(n)}\right).$$

Theorem 4. The estimators $\beta_{0E}$ and $\beta_{1E}$ are unbiased estimators of $\beta_0$ and $\beta_1$, respectively, and

1. $\mathrm{Var}(\beta_{1E}) = \dfrac{\sigma^2 D_m^*}{S_{xx_1} + S_{xx_2}}$,

2. $\mathrm{Var}(\beta_{1E}) \le \mathrm{Var}(\beta_{1S})$,

where $\beta_{1S}$ is the SRS estimator of $\beta_1$ based on a sample of size $2r$, $S_{xx_1} = \sum_{j=1}^{r}(x_{1j} - \bar{x}_1)^2$ and $S_{xx_2} = \sum_{j=1}^{r}(x_{2j} - \bar{x}_2)^2$.

Note that it is complicated to find $\mathrm{Var}(\beta_{0E})$ at this stage, and we intend to look for solutions in future work.


Proof. Taking the expectation of $\beta_{1E}$ gives

$$E(\beta_{1E}) = \frac{\sum_{j=1}^{r}(x_{1j} - \bar{x}_1)\,E(Y_{j(1)}) + \sum_{j=1}^{r}(x_{2j} - \bar{x}_2)\,E(Y_{j(n)})}{\sum_{j=1}^{r}(x_{1j} - \bar{x}_1)^2 + \sum_{j=1}^{r}(x_{2j} - \bar{x}_2)^2}$$

$$= \frac{\sum_{j=1}^{r}(x_{1j} - \bar{x}_1)(\beta_0 + \beta_1 x_{1j} + \mu\sigma) + \sum_{j=1}^{r}(x_{2j} - \bar{x}_2)(\beta_0 + \beta_1 x_{2j} - \mu\sigma)}{\sum_{j=1}^{r}(x_{1j} - \bar{x}_1)^2 + \sum_{j=1}^{r}(x_{2j} - \bar{x}_2)^2}$$

$$= \beta_1\,\frac{\sum_{j=1}^{r}(x_{1j} - \bar{x}_1)^2 + \sum_{j=1}^{r}(x_{2j} - \bar{x}_2)^2}{\sum_{j=1}^{r}(x_{1j} - \bar{x}_1)^2 + \sum_{j=1}^{r}(x_{2j} - \bar{x}_2)^2} = \beta_1,$$

since $\sum_{j=1}^{r}(x_{ij} - \bar{x}_i) = 0$ for $i = 1, 2$.

This implies that $\beta_{1E}$ is an unbiased estimator of $\beta_1$. The variance of $\beta_{1E}$ is given by

$$\mathrm{Var}(\beta_{1E}) = \frac{\sum_{j=1}^{r}(x_{1j} - \bar{x}_1)^2\,\mathrm{Var}(\varepsilon_{j(1)}) + \sum_{j=1}^{r}(x_{2j} - \bar{x}_2)^2\,\mathrm{Var}(\varepsilon_{j(n)})}{\left[\sum_{j=1}^{r}(x_{1j} - \bar{x}_1)^2 + \sum_{j=1}^{r}(x_{2j} - \bar{x}_2)^2\right]^2} = \frac{\sigma^2 D_m^*}{\sum_{j=1}^{r}(x_{1j} - \bar{x}_1)^2 + \sum_{j=1}^{r}(x_{2j} - \bar{x}_2)^2}.$$

Since $D_m^* \le 1$,

$$\mathrm{Var}(\beta_{1E}) \le \frac{\sigma^2}{\sum_{j=1}^{r}(x_{1j} - \bar{x}_1)^2 + \sum_{j=1}^{r}(x_{2j} - \bar{x}_2)^2} = \mathrm{Var}(\beta_{1S}).$$
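The ERSS estimators can be sketched in code as follows (illustrative, not code from the paper; the intercept uses the mean of all $2r$ covariate values, which is our reading of $\bar{x}$ in the text, and the data are made-up values):

```python
import statistics

def fit_erss(x1, y_min, x2, y_max):
    """ERSS least squares estimates (beta0E, beta1E): y_min are the minima
    observed at design points x1, y_max the maxima observed at x2.
    The intercept uses the mean of all 2r covariate values (assumption)."""
    xb1, xb2 = statistics.fmean(x1), statistics.fmean(x2)
    num = (sum((x - xb1) * y for x, y in zip(x1, y_min))
           + sum((x - xb2) * y for x, y in zip(x2, y_max)))
    den = (sum((x - xb1) ** 2 for x in x1)
           + sum((x - xb2) ** 2 for x in x2))
    b1 = num / den
    ybar = statistics.fmean(list(y_min) + list(y_max))
    xbar = statistics.fmean(list(x1) + list(x2))
    return ybar - b1 * xbar, b1

# Noiseless check: with zero errors the extremes lie on the line y = 1 + 2x.
x1, x2 = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
b0, b1 = fit_erss(x1, [1.0 + 2.0 * x for x in x1], x2, [1.0 + 2.0 * x for x in x2])
```

Note that the opposite biases $+\mu\sigma$ and $-\mu\sigma$ of the minima and maxima cancel in $\bar{Y}$, which is why the intercept estimator remains unbiased.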

4. Comparison among SRS, MRSS and ERSS

Let $\beta_{iS}$, $i = 0, 1$, denote the least squares estimator of $\beta_i$ based on an SRS. In this section, we compare the estimators $\beta_{iS}$, $\beta_{iM}$ and $\beta_{iE}$, $i = 0, 1$, via their efficiencies.


Since

$$\mathrm{Var}(\beta_{1S}) = \sigma^2/S_{xx},$$

the efficiency of $\beta_{1M}$ with respect to $\beta_{1S}$ is

$$\mathrm{eff}(\beta_{1M}, \beta_{1S}) = \frac{\mathrm{Var}(\beta_{1S})}{\mathrm{Var}(\beta_{1M})} = \frac{1}{D_m}.$$

Since $D_m \le 1$, $\beta_{1M}$ is more efficient than $\beta_{1S}$. On the other hand, we can easily see that

$$\mathrm{Var}(\beta_{0S}) = \sigma^2\left[\frac{1}{r} + \frac{\bar{x}^2}{S_{xx}}\right],$$

hence

$$\mathrm{eff}(\beta_{0M}, \beta_{0S}) = \frac{\mathrm{Var}(\beta_{0S})}{\mathrm{Var}(\beta_{0M})} = \frac{1}{D_m} > 1,$$

which means that $\beta_{0M}$ is more efficient than $\beta_{0S}$. In order to make a fair comparison between MRSS and ERSS, we need to use the same values of the covariate, so we assume that $\beta_{1M}$ is computed using the covariate values $x_{11}, x_{12}, \ldots, x_{1r}, x_{21}, x_{22}, \ldots, x_{2r}$. This implies that $S_{xx} = S_{xx_1} + S_{xx_2}$. Finally, the efficiency of $\beta_{1M}$ with respect to $\beta_{1E}$ is

$$\mathrm{eff}(\beta_{1M}, \beta_{1E}) = \frac{\mathrm{Var}(\beta_{1E})}{\mathrm{Var}(\beta_{1M})} = \frac{D_m^*}{D_m}.$$

From the theory of order statistics, it can be shown that $D_m^* \ge D_m$ for all $m$. Hence $\mathrm{eff}(\beta_{1M}, \beta_{1E}) \ge 1$ for all $m$. Since $D_m^*$ depends only on $m$, we present in Table 2 some values of $D_m^*$ for different values of $m$.

Table 2: The values of $D_m^*$ for different values of m.

m      1    2      3      4      5      6
D*_m   1    0.51   0.32   0.25   0.182  0.152

Moreover, Table 3 gives the values of $D_m^*/D_m$ for different values of $m$. The ratio is at least one for every $m$, which means that $D_m^* \ge D_m$.

Table 3: The values of $D_m^*/D_m$ for different values of m.

m          1    2      3      4      5      6
D*_m/D_m   1    1.13   1.11   1.19   1.08   1.1


This is an indication that $\beta_{1M}$ is more efficient than $\beta_{1E}$. As a conclusion, we say that the MRSS estimators of $\beta_0$ and $\beta_1$ are more efficient than the ERSS estimators.

5. Application

The MRSS technique is illustrated here through a real example studied by Platt et al. (1988). The data set comprises 399 conifer (Pinus palustris) trees sorted according to two principal variables: $X$, the diameter in centimeters at breast height, and $Y$, the total height in feet. To compare the estimators obtained via MRSS to those obtained via the SRS setup, we consider a simple random sample of size $n = 29$ and an MRSS of size $n = 29$. For the MRSS we take $m = 3$: each set of three observations is ranked on $Y$, the median $Y$-value is quantified, and the corresponding $X$-value is recorded. Preliminary analysis shows that the data can be fitted by a simple linear regression model. Applying the theory developed in this paper to these real data, we obtain $\beta_{0M} = 12.54$ and $\beta_{1M} = 0.38$ with standard errors of 0.67 and 0.02, respectively. Using the simple random sampling scheme, the data produced the estimates $\beta_{0S} = 10.36$ and $\beta_{1S} = 0.43$ with standard errors of 1.26 and 0.05, respectively. Based on the above analysis, we may conclude that the MRSS is more efficient than the SRS for simple linear regression analysis.

6. Discussion and Conclusion

In this paper, we introduced a new method to estimate the parameters of the simple linear regression model and showed that it produces estimators that are more efficient than those obtained via both SRS and ERSS. The distribution of the random errors in this article is assumed to be symmetric. Although we directed our work to the case of the normal distribution, we may consider other symmetric distributions to which our findings apply. The list of such distributions includes many options, but we limit our investigation to the logistic, Student's t and Laplace distributions. The variance of the errors in this article is $\sigma^2 D_m$, where $D_m$ differs according to the distributional assumption. In order to show that the parameter estimates using MRSS are more efficient than those obtained under the SRS setup, we need to check the value of $D_m$ for the previously mentioned distributions, as illustrated in Table 4. Arnold et al. (1992) reported part of the results related to the logistic distribution. The results indicate that $D_m < 1$ for all $m > 2$, which agrees with the ranked set sampling literature that recommends $3 \le m \le 7$.

Analysis of simple linear regression via median ranked set sampling 73

Table 4: The values of $D_m$ for different values of m.

m    Laplace(0,1)   Student t(5)   Logistic(0,1)
2    0.6389         0.5830         1.2899
3    0.3512         0.3502         0.7899
4    0.3256         0.2498         0.5676
5    0.1751         0.1941         0.4426
6    0.1383         0.1587         0.3626
7    0.1138         0.1341         0.3071

Finally, this method may be easily extended to the multiple regression setup, especially the polynomial regression model proposed by Chen and Wang (2004).

Acknowledgments

The authors would like to thank the editor and the referees for the valuable comments that helped us bring this work into good shape.

REFERENCES

Alodat, M. T. and Al-Sagheer, O. A. (2007) Estimation of the location and scale parameters using ranked set sampling, J. Appl. Statist. Sci., 15, 245-252.

Arnold, B. C., Balakrishnan, N. and Nagaraja, H. N. (1992) A First Course in Order Statistics, John Wiley and Sons, New York.

Chen, Z. and Wang, Y. (2004) Efficient regression analysis with ranked-set sampling, Biometrics, 60, 997-1004.

Halls, L. S. and Dell, T. R. (1966) Trial of ranked set sampling for forage yields, Forest Science, 12 (1), 22-26.

Kaur, A., Patil, G. P., Sinha, A. K. and Taillie, C. (1995) Ranked set sampling: an annotated bibliography, Environ. and Ecol. Stat., 2, 25-54.

Martin, W. L., Shank, T., Oderwald, G. and Smith, D. W. (1980) Evaluation of ranked set sampling for estimating shrub phytomass in Appalachian oak forests, School of Forestry and Wildlife Resources, VPI and SU, Blacksburg, VA.

McIntyre, G. A. (1952) A method for unbiased selective sampling, using ranked sets, Australian J. Agricultural Research, 3, 385-390.

Muttlak, H. (1995) Parameters estimation in a simple linear regression using rank set sampling, Biometrical J., 37, 799-810.

Muttlak, H. (1997) Median ranked set sampling, J. Appl. Statist. Sci., 6, 245-255.

Patil, G. P., Sinha, A. K. and Taillie, C. (1993) Relative precision of ranked set sampling: comparison with the regression estimator, Environmetrics, 4, 399-412.

Platt, W. J., Evans, G. W. and Rathbun, S. L. (1988) The population dynamics of a long-lived conifer (Pinus palustris), Am. Nat., 131 (4), 491-525.

Potts, J. M. and Elith, J. (2006) Comparing species abundance models, Ecological Modelling, 199, 153-163.

Samawi, H. M., Ahmed, M. S. and Abu-Dayyeh, W. (1996) Estimating the population mean using extreme ranked set sampling, Biometrical J., 38 (5), 577-586.

Takahasi, K. and Wakimoto, K. (1968) On unbiased estimates of the population mean based on the sample stratified by means of ordering, Annals of the Institute of Statistical Mathematics, 20, 1-31.

Yu, P. L. and Lam, K. (1997) Regression estimator in ranked set sampling, Biometrics, 53, 1070-1080.

M. T. ALODAT
Department of Statistics
Yarmouk University
[email protected]

M. Y. AL-RAWWASH
Department of Statistics
Yarmouk University
[email protected]

I. M. NAWAJAH
Department of Statistics
Yarmouk University
[email protected]