
Inferences from Cross-Sectional, Stochastic Frontier Models

Léopold Simar    Paul W. Wilson∗

August 2008

Forthcoming, Econometric Reviews

Abstract

Conventional approaches for inference about efficiency in parametric stochastic frontier (PSF) models are based on percentiles of the estimated distribution of the one-sided error term, conditional on the composite error. When used as prediction intervals, coverage is poor when the signal-to-noise ratio is low, but improves slowly as sample size increases. We show that prediction intervals estimated by bagging yield much better coverages than the conventional approach, even with low signal-to-noise ratios. We also present a bootstrap method that gives confidence interval estimates for (conditional) expectations of efficiency, and which have good coverage properties that improve with sample size. In addition, researchers who estimate PSF models typically reject models, samples, or both when residuals have skewness in the “wrong” direction, i.e., in a direction that would seem to indicate absence of inefficiency. We show that correctly specified models can generate samples with “wrongly” skewed residuals, even when the variance of the inefficiency process is nonzero. Both our bagging and bootstrap methods provide useful information about inefficiency and model parameters irrespective of whether residuals have skewness in the desired direction.

∗Simar: Institut de Statistique, Université Catholique de Louvain, Voie du Roman Pays 20, B 1348 Louvain-la-Neuve, Belgium; email [email protected]. Wilson: The John E. Walker Department of Economics, 222 Sirrine Hall, Clemson University, Clemson, South Carolina 29634–1309, USA; email [email protected]. Research support from the “Inter-university Attraction Pole”, Phase V (No. P5/24) and Phase VI (No. P6/03) from the Belgian Government, and from the Clemson University Research Foundation is gratefully acknowledged. We are grateful for comments on earlier versions of this work from Peter Schmidt and from seminar participants at Florida State University, University of Kansas, University of Georgia, Georgia Institute of Technology, Michigan State University, and Texas A&M University, as well as participants at the European Productivity Workshop in Brussels, July 2005, and the Econometric Society European Meetings in Vienna, August 2006. Any remaining errors are solely our responsibility.

1 Introduction

Parametric stochastic frontier (PSF) models introduced by Aigner et al. (1977) and Meeusen and van den Broeck (1977) specify output, cost, etc. in terms of a response function and a composite error term. The composite error term consists of a two-sided error representing random effects and a one-sided term representing technical inefficiency. Since their introduction, several hundred papers, describing either methodological issues or empirical applications of these models, have appeared in the literature. Bauer (1990), Greene (1993), and Kumbhakar and Lovell (2000) provide overviews of developments in this area in varying levels of detail.

PSF models are typically estimated by the maximum likelihood (ML) method. Bauer (1990), Bravo-Ureta and Pinheiro (1993), and Coelli (1995) observe that most applied papers describe estimation of cross-sectional models with errors composed of normal and half-normal random variables. Interest typically lies in obtaining information about (i) mean technical efficiency; (ii) technical efficiency for individual firms, either real or hypothetical; or (iii) marginal effects, returns to scale, or other features of the response function. In case (i), information about mean technical efficiency is gleaned from estimates of unconditional expectations of the efficiency process. In case (ii), when cross-sectional data are used, firm-specific efficiency is unidentified; consequently, researchers typically estimate expectations of efficiency, conditional on observed residuals, and often rank firms in terms of these estimated conditional expectations. In both cases, many papers have relied only on point estimates, with no attempt at inference regarding either firm-specific technical efficiency or mean technical efficiency.

Taube (1988) and Horrace and Schmidt (1996) provide an analytic approach that yields prediction intervals for either firm-specific or mean technical inefficiency; we refer to these as conventional prediction intervals. Their idea is based on percentiles of the estimated distribution of the one-sided error term, conditional on the composite error. Horrace and Schmidt as well as others (e.g., Greene, 1999; Giannakas et al., 2003) have noted that use of the conventional prediction intervals involves replacing unknown parameters with their estimates, and hence the resulting interval estimates understate the uncertainty that is involved. Consequently, one might expect bootstrap methods to be useful in this situation. However, as shown below, the presence of the one-sided error term in PSF models creates even greater problems for conventional inference along the lines of Taube (1988) and Horrace and Schmidt (1996).¹

While we agree with the view of Horrace and Schmidt (1996, page 257) that “bootstrapping procedures are an imperfect substitute for an adequately developed distributional theory,” for reasons that become clear in the following sections, bootstrap methods offer some useful tools for inference within the framework of PSF models. In particular, in this paper we show how bootstrap aggregating, i.e., bagging, can be used to estimate prediction intervals reflecting uncertainty about model parameters that have been replaced by estimates. In addition, we obtain bootstrap estimates of confidence intervals for the quantities that are actually estimated in stochastic frontier applications, namely conditional and unconditional expectations of the inefficiency process. We find, through Monte Carlo experiments, that prediction intervals estimated by the bagging technique provide much better coverage properties than prediction intervals estimated by the conventional method. In addition, we similarly find that our bootstrap confidence intervals provide better coverage of both conditional and unconditional expectations of inefficiency than intervals relying on the Hessian of the log-likelihood function and constructed using the delta method.

The remainder of our paper unfolds as follows: notation and a PSF model are defined in Section 2, with a brief review of quantities that are typically estimated in PSF studies. In Section 3 we discuss some estimation issues that create problems for inference, particularly when conventional approaches are used. Section 4 describes the prediction intervals suggested by Taube (1988) and Horrace and Schmidt (1996) and how bagging techniques can be used to estimate intervals that reflect uncertainty surrounding estimates of model parameters. Section 5 discusses confidence intervals for the conditional and unconditional expectations that are typically estimated in PSF applications, and describes how these can be estimated by either the delta method or by bootstrap methods. Section 6 gives details of our Monte Carlo experiments and a discussion of the results, while the final section concludes.

¹Other inference-making methods have been proposed; examples include Horrace (2005) and Flores-Lagunes et al. (2007) in the panel-data context.


2 The Model

In typical cross-sectional applications where PSF models are estimated, researchers use a set of n observed pairs of input and output quantities to estimate a production frontier, technical efficiency, etc. To begin, we define a statistical model along the lines of what is typically used in applications. Our model is comprised of two assumptions:

Assumption 2.1. The set of n observed pairs of input and output quantities, S_n = {(x_i, y_i)}_{i=1}^n, with x_i ∈ R^p_+ and y_i ∈ R^1_+ for each i, results from n independent draws from a (p + 1)-dimensional distribution function F_XY(x, y).

Assumption 2.2. The distribution function F_XY(x, y) is such that, conditional on X,

Y = g(X | β) e^W, (2.1)

where

W = V − U, (2.2)

V ∼ N(0, σ²_V), (2.3)

U ∼ N⁺(0, σ²_U), (2.4)

and V and U are independent.

Additional assumptions (e.g., convexity of g(X | β)) are sometimes made, but are not important for purposes of the analysis that follows. Following standard practice, we use upper case Roman letters (e.g., U, V, W, X, Y) to denote random variables, and lower case letters (e.g., u, v, w, x, y) to denote realizations of random variables or other non-stochastic quantities. Although we consider production functions in this paper, our discussion and results extend to estimation of cost functions with some changes in notation; e.g., the minus sign on the right-hand side of (2.2) could be changed to a plus sign to consider cost functions.

Taken together, Assumptions 2.1 and 2.2 imply that for an observed realization (x_i, y_i), there are corresponding realizations v_i, u_i and hence w_i, although these are not observed. Note that even if w_i were observed, neither v_i nor u_i would be identified; e.g., v_i = 2, u_i = 1 is indistinguishable from v_i = 1.5, u_i = 0.5 as well as infinitely many other pairs of values.


The distributional assumption in (2.3) is typical in applications. The half-normal assumption in (2.4) is also common, but other assumptions are sometimes used for the distribution of the random term U, e.g., N⁺(µ_U, σ²_U) with µ_U ≠ 0, gamma, exponential, etc.² Bauer (1990), Bravo-Ureta and Pinheiro (1993), and Coelli (1995) observe that most applied papers describe estimation of cross-sectional models with errors composed of normal and half-normal random variables. Our qualitative results are unaffected by the choice of one-sided distribution for the random variable U.

Given the conditions (2.2)–(2.4) in Assumption 2.2, the density of W can be found using either the convolution theorem or characteristic functions, yielding

f(w | γ, σ²) = (1/(σ√(π/2))) [1 − Φ((w/σ) √(γ/(1 − γ)))] exp(−w²/(2σ²)), (2.5)

where σ² = σ²_V + σ²_U, γ = σ²_U σ^{−2}, and Φ(·) denotes the standard normal distribution function. Note that the variance of W is given by σ²_W = ((π − 2)/π) σ²_U + σ²_V; the term σ² is only used to re-parameterize the model.

Given an independent sample S_n = {(x_i, y_i)}_{i=1}^n as described in Assumption 2.1, maximum likelihood estimates θ̂ = [β̂ σ̂² γ̂]′ of the parameter vector θ = [β σ² γ]′ are obtained by θ̂ = argmax_θ L(θ | S_n), where L(θ | S_n) is the log-likelihood function

L(θ | S_n) = −(n/2) log(π/2) − (n/2) log σ² + Σ_{i=1}^n log[1 − Φ([log y_i − log g(x_i | β)] γ^{1/2} / (σ(1 − γ)^{1/2}))] − (1/(2σ²)) Σ_{i=1}^n [log y_i − log g(x_i | β)]². (2.6)

ML estimates of σ²_U and σ²_V can be recovered from the relationships σ̂²_U = γ̂σ̂² and σ̂²_V = σ̂² − σ̂²_U. Here, we use the parameterization of Battese and Corra (1977); alternatively, (2.5) and (2.6) may be written in terms of σ² and λ = σ_U/σ_V, which was the parameterization used by Aigner et al. (1977). Note that λ² = γ/(1 − γ). The parameterization used here has the advantage that γ ∈ [0, 1], while λ ∈ [0, ∞); using γ simplifies numerical maximization of the log-likelihood function.

²The choice of one-sided distribution is not innocuous. If the one-sided distribution for u can be made to resemble a normal distribution with certain parameter combinations, estimation can be very difficult with sample sizes commonly found in the literature. This was demonstrated for a normal-gamma convolution by Ritter and Simar (1997), but the problem could also be expected if the distribution for u was specified as truncated normal, with the location parameter to be estimated (instead of constrained to zero, as in the half-normal distribution).
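To make (2.6) concrete, the following minimal sketch (ours, not the authors' code) maximizes the log-likelihood for a log-linear response function log g(x | β) = β_1 + β_2 log x, with σ² and γ transformed so the optimizer is unconstrained; the simulated data, transformations, and starting values are illustrative assumptions.

```python
# A minimal sketch (not from the paper) of ML estimation of the
# normal/half-normal PSF model under the Battese-Corra parameterization,
# assuming a log-linear response function log g(x | b) = b0 + b1*log(x).
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def loglik(theta, logx, logy):
    """Log-likelihood (2.6); theta = (b0, b1, log_sigma2, logit_gamma)."""
    b0, b1, ls2, lg = theta
    sigma2 = np.exp(ls2)                 # sigma^2 > 0
    gamma = 1.0 / (1.0 + np.exp(-lg))    # gamma in (0, 1)
    eps = logy - (b0 + b1 * logx)        # composite residuals w_i
    n = eps.size
    z = eps * np.sqrt(gamma / (1.0 - gamma)) / np.sqrt(sigma2)
    return (-0.5 * n * np.log(np.pi / 2.0)
            - 0.5 * n * np.log(sigma2)
            + np.sum(norm.logsf(z))      # log[1 - Phi(z)], numerically stable
            - np.sum(eps ** 2) / (2.0 * sigma2))

# Illustrative data simulated from the DGP in (2.1)-(2.4):
rng = np.random.default_rng(1)
n = 200
x = np.abs(rng.normal(60.0, 10.0, n))
v = rng.normal(0.0, np.sqrt(0.5), n)
u = np.abs(rng.normal(0.0, np.sqrt(0.5), n))   # half-normal draws
logy = np.log(10.0) + 0.8 * np.log(x) + v - u

res = minimize(lambda t: -loglik(t, np.log(x), logy),
               x0=np.array([1.0, 1.0, 0.0, 0.0]), method="BFGS")
b0, b1, ls2, lg = res.x
sigma2, gamma = np.exp(ls2), 1.0 / (1.0 + np.exp(-lg))
print(b0, b1, sigma2, gamma)   # recover sigma2_U = gamma*sigma2, etc.
```

Note that the logit transform keeps γ̂ strictly inside (0, 1); an implementation allowing γ̂ = 0 exactly would need to handle the boundary cases discussed in Section 3.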


Now consider an arbitrary realization (x_0, y_0) ∈ R^p_+ × R^1_+ from the data-generating process (DGP) defined by Assumption 2.2; this point might be one of the observations in S_n, or some other point of interest, perhaps representing input and output quantities of an hypothetical firm. In any case, (x_0, y_0) is known with certainty. Associated with this point is a distance,

w_0 = ω(θ | x_0, y_0) = log y_0 − log g(x_0 | β). (2.7)

Note that while w_0 is written as a function of θ, dependence is only on the element β in θ. Clearly, w_0 gives distance in the direction of the y axis from the known point (x_0, y_0) to the (unobserved) response function; w_0 is non-stochastic, but unobserved. However, given a consistent estimator θ̂ of θ, w_0 can be estimated consistently by

ŵ_0 = ω(θ̂ | x_0, y_0), (2.8)

which amounts to replacing the unknown β on the right-hand side of (2.7) with a consistent estimator β̂.

In addition, it is straightforward to show that the conditional density of U, given the non-stochastic value w_0, is

f(u | θ, x_0, y_0) = σ_*^{−1} φ((u − µ_*)/σ_*) [1 − Φ(−µ_*/σ_*)]^{−1} ∀ u ≥ 0; 0 otherwise, (2.9)

where

µ_* = −w_0 σ²_U/σ² = −w_0 γ, (2.10)

σ²_* = σ²_U σ²_V/σ² = σ² γ(1 − γ), (2.11)

and φ(·) denotes the standard normal density function. Here, the conditioning is on θ, x_0, and y_0 due to the fact that in (2.7), w_0 is merely a function of θ, x_0, and y_0. Note also that the result in (2.9) reveals that, conditional on w_0, U is distributed truncated normal, i.e., N⁺(µ_*, σ²_*).

Battese and Coelli (1988, p. 390) remark that

“It is important to clearly define the appropriate measure of technical efficiency before reporting numerical values. It appears that there is a degree of confusion in the literature on this point.”

Consequently, we attempt in the remainder of this section to define clearly what is known, what is unknown, and what will be estimated.

Working in a panel-data context, Battese and Coelli (p. 389–390) define

“technical efficiency of a given firm as the ratio of its mean production (in original units), given its realized firm effect, to the corresponding mean production if the firm effect was zero... The suggested measure of technical efficiency for the ith firm is thus e^{−u_i}.”

Unfortunately, as noted above and as Battese and Coelli recognize, the quantity e^{−u_i} cannot be identified from the data in S_n. As an alternative, Battese and Coelli (1988, p. 389) define “the mean technical efficiency of firms in the industry that corresponds to the measure of” e^{−u_i}. For a fixed point (x_0, y_0), the measure introduced by Battese and Coelli corresponds to

τ_0 ≡ E(e^{−U} | w_0) = τ(θ | x_0, y_0) = [1 − Φ(σ_* − µ_*/σ_*)] [1 − Φ(−µ_*/σ_*)]^{−1} exp(−µ_* + σ²_*/2). (2.12)

The last line of (2.12) follows after using (2.9) in the integration required to compute the conditional expectation. The expression for τ_0 in (2.12) appears in equation (8) of Battese and Coelli.³ The quantity τ_0 represents “mean technical efficiency” of a firm operating at the point (x_0, y_0) in the space of inputs and outputs in the sense that it gives the expected value of technical efficiency e^{−U} for any firm operating at the point (x_0, y_0). Since w_0 is non-stochastic, τ_0 is non-stochastic. Replacing the unknown parameters in (2.12) with their corresponding ML estimates yields a ML estimate

τ̂_0 = τ(θ̂ | x_0, y_0) (2.13)

of τ(θ | x_0, y_0).

³Some authors have estimated exp[−E(U | w_0)] instead of E(e^{−U} | w_0). But, the expectation operator is linear, and consequently E(e^{−U} | w_0) ≠ e^{−E(U|w_0)}. The difference between exp[−E(U | w_0)] and E(e^{−U} | w_0) can be substantial. For example, if σ²_U = σ²_V = 1, then e^{−E(U|w_0)}/E(e^{−U} | w_0) = 0.6644, 0.5255, and 0.3798 for w_0 = −1, 0, and 1, respectively. Hence, we focus on the conditional expectation defined in (2.12).
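As an illustrative cross-check (ours, not from the paper), the closed form (2.12) can be verified against direct numerical integration of the truncated-normal density (2.9); the parameter values below are arbitrary.

```python
# Conditional mean efficiency tau_0 from (2.12), cross-checked against
# numerical integration of exp(-u) against the density in (2.9).
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def tau0(w0, sigma2, gamma):
    """E(exp(-U) | w0) from (2.12), with mu* and sigma* from (2.10)-(2.11)."""
    mu_s = -w0 * gamma
    s_s = np.sqrt(sigma2 * gamma * (1.0 - gamma))
    return (norm.sf(s_s - mu_s / s_s) / norm.sf(-mu_s / s_s)
            * np.exp(-mu_s + 0.5 * s_s ** 2))

def tau0_numeric(w0, sigma2, gamma):
    """Same quantity by integrating exp(-u) against the density (2.9)."""
    mu_s = -w0 * gamma
    s_s = np.sqrt(sigma2 * gamma * (1.0 - gamma))
    dens = lambda u: norm.pdf((u - mu_s) / s_s) / (s_s * norm.sf(-mu_s / s_s))
    val, _ = quad(lambda u: np.exp(-u) * dens(u), 0.0, np.inf)
    return val

# sigma2_U = sigma2_V = 1 corresponds to sigma2 = 2, gamma = 0.5:
for w0 in (-1.0, 0.0, 1.0):
    print(w0, tau0(w0, 2.0, 0.5), tau0_numeric(w0, 2.0, 0.5))
```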


In the case of mean technical efficiency in the population of firms, we have

τ = E(e^{−U}) = τ(σ²_U) = 2 e^{σ²_U/2} [1 − Φ(σ_U)], (2.14)

which follows after multiplying e^{−U} by the half-normal density in (2.4) and then integrating with respect to U over [0, ∞). The result in (2.14) is a function of the single model parameter σ²_U. The quantity τ represents the mean level of technical efficiency throughout the population, apart from the efficiency of a specific firm operating at a specific location. This quantity too can be estimated consistently by the ML method by τ̂ = τ(σ̂²_U).
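Similarly, (2.14) is easy to check by simulation (a sketch of ours, with an arbitrary σ²_U):

```python
# Quick check of the unconditional mean efficiency formula (2.14) against
# a simple simulation of half-normal draws.
import numpy as np
from scipy.stats import norm

s2u = 0.5                                                      # arbitrary
tau_closed = 2.0 * np.exp(s2u / 2.0) * norm.sf(np.sqrt(s2u))   # (2.14)
u = np.abs(np.random.default_rng(0).normal(0, np.sqrt(s2u), 10**6))
print(tau_closed, np.exp(-u).mean())   # the two should agree closely
```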

In applied studies, researchers often report point-estimates of technical efficiency corresponding to the observed firms in S_n. In other words, researchers report estimates of τ_0 defined in (2.12), replacing subscripts 0 with subscripts i. These estimates correspond to τ̂_0 defined in (2.13), again replacing subscripts 0 with subscripts i. In addition, researchers sometimes report point-estimates of population mean efficiency τ.

Note that instead of conditioning on the non-stochastic value w_0 in (2.12), one could instead consider the quantity

T ≡ E(e^{−U} | W), (2.15)

where the conditioning is on the random variable W. In contrast to τ_0 defined earlier in (2.12), the quantity T is a random variable. Replacing the non-stochastic w_0 in (2.10) with the stochastic W, and then substituting for µ_* in (2.12) yields

T = [1 − Φ(σ_* + Wγ/σ_*)] [1 − Φ(Wγ/σ_*)]^{−1} exp(Wγ + σ²_*/2), (2.16)

illustrating that the random variable T defined in (2.15) is simply a function of model parameters and the random variable W. Since the density of W is given by (2.5), it would be straightforward, if tedious, to derive the density of T in terms of model parameters; and one could then derive prediction intervals for T along the lines of Taube (1988) and Horrace and Schmidt (1996). The result, however, would be different from the intervals described by Taube and by Horrace and Schmidt. Alternatively, one could adapt the bagging method described below in Section 4.2 to estimate prediction intervals for T. However, to our knowledge, no applied studies have attempted to estimate prediction intervals for T. Moreover, no realizations of T can be observed, since T is a function of unknown model parameters.


Applying the law of iterated expectations, we have

E(T) = E_W[E(e^{−U} | W)] = E(e^{−U}), (2.17)

and therefore E(T) = τ given by (2.14). Although T cannot be estimated, because it is a random variable, its expectation τ can be estimated, as noted above; as will also be discussed below, confidence intervals for τ can be estimated.

To summarize the discussion so far, applied researchers are typically interested in estimates of θ, τ(θ | x_0, y_0) for various points (x_0, y_0) (usually corresponding to each of the observed input-output combinations in S_n), and perhaps τ, as well as confidence intervals for these quantities. We next turn to a discussion of some estimation issues that affect researchers’ ability to make inference about the quantities of interest in PSF models.

3 Boundary Problems

In order to better understand the simulation results that appear later in Section 6, it is important to realize that the presence of a boundary in the support of U poses some problems for conventional inference in PSF models. As will be seen, the bagging and bootstrap methods introduced below in Sections 4 and 5 avoid some of these problems.

Lee (1993) derives the asymptotic distribution of the maximum likelihood estimator [β̂ σ̂²]′ for the case where γ = 0 (implying σ²_U = 0), and observes (i) that the information matrix is singular in this case; (ii) that the likelihood ratio statistic for testing the null hypothesis γ = 0 is asymptotically distributed as (1/2)χ²(0) + (1/2)χ²(1), i.e., as a mixture of a degenerate chi-square distribution with a unit mass at zero and a chi-square distribution with one degree of freedom; and (iii) the convergence rates of the estimators of the intercept term and of γ are slower than the usual O(n^{−1/2}) parametric rate. For cases where γ > 0 but close to zero, Lee shows that the first-order normal approximation for γ̂ can be poor as γ approaches zero.⁴

Unfortunately, even if γ is not close to zero, similar problems arise in finite samples. Aigner et al. (1977) noted that in some finite samples, the composite residuals in (2.1) may have positive skewness. In such cases, the ML estimator γ̂ of γ will be zero, while the ML estimators of β and σ² will be equal to the corresponding OLS estimators in the case of a linear response function, as noted by Waldman (1982). This was discussed briefly in Olson et al. (1980) and confirmed in their Monte Carlo experiments. Both Aigner et al. (1977) and Olson et al. (1980) as well as others have referred to this as a “failure” of the ML estimator, but this terminology is misleading. As Lee (1993) notes, the estimators are identifiable, even though the negative of the Hessian of the log-likelihood in (2.6) cannot be inverted when γ = 0 (or when γ is replaced by an estimate γ̂ = 0) due to singularity. The problem is not a failure of the ML method per se, but rather a difficulty for inference.

⁴Lee (1993) works in terms of the parameterization of Aigner et al. (1977), using λ instead of γ. However, the results described here hold after straightforward substitution.

A simple Monte Carlo experiment illustrates that the problem can be severe for reasonable, plausible values of the model parameters. Table 1 shows, for various sample sizes n and ratios λ² = σ²_U/σ²_V, the proportion among each set of 1,000 samples drawn from the normal/half-normal convolution with density given by (2.5) that exhibit positive skewness within a given sample. The results in Table 1 show, for example, that for λ² = 1, almost 1/3 of samples of size n = 100 will have skewness in the “wrong” direction. Of course, the problem goes away asymptotically, but at a rate that depends on the value of λ². For the case where λ² = 1, even with n = 1,000 observations, about 3.5 percent of samples will have positive skewness. If λ² = 0.5—which is far from implausible in applied studies—even with 1,000 observations, one should expect that about 22.5 percent of samples will have positive skewness. For smaller values of λ² the problem becomes even more severe. At λ² = 0.1, even with 100,000 observations, about 20 percent of samples will exhibit positive skewness.
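The following condensed sketch (ours, not the authors' code for Table 1) estimates the frequency of wrongly signed skewness for given λ² and n, normalizing σ²_W = 1 as in the experimental design of Section 6:

```python
# Estimate the fraction of samples from W = V - U with positive ("wrong")
# residual skewness, for a given lambda^2 = sigma2_U/sigma2_V and n.
import numpy as np
from scipy.stats import skew

def wrong_skew_freq(lam2, n, reps=1000, seed=0):
    rng = np.random.default_rng(seed)
    # Normalize so that VAR(W) = ((pi-2)/pi)*sigma2_U + sigma2_V = 1.
    s2u = 1.0 / ((np.pi - 2.0) / np.pi + 1.0 / lam2)
    s2v = s2u / lam2
    count = 0
    for _ in range(reps):
        w = (rng.normal(0, np.sqrt(s2v), n)
             - np.abs(rng.normal(0, np.sqrt(s2u), n)))
        count += skew(w) > 0
    return count / reps

for lam2 in (0.1, 0.5, 1.0, 2.0):
    print(lam2, wrong_skew_freq(lam2, n=100))
```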

The results in Table 1 imply that in finite samples, the sampling distributions of the ML estimators for parameters in the model (2.1) are necessarily far from normal in many cases. For example, if λ² = 1 and n = 100, the sampling distribution of the ML estimator γ̂ will have a mass of approximately 0.3 at zero. Of course, the sampling distribution is normal asymptotically provided γ is not equal to either 0 or 1. But, the results in Table 1 indicate that asymptotic normal approximations are, in many cases, not particularly useful for inference about the parameters of the model when samples are finite, even if γ is not very close to 0. In fact, conventional inference based on inverting the negative Hessian of the log-likelihood in (2.6) is not possible whenever γ is replaced by an estimate equal to zero due to singularity (although one can use the outer dot-product of the gradient of the log-likelihood in such cases). Again for the case where λ² = 1, n = 100, this problem can be expected to arise in almost 1/3 of all cases. It is well-known that bootstrap methods often outperform conventional methods in such situations, due to the fact that the bootstrap takes account of higher moments, which typically matter when limiting normal distributions are a poor approximation to finite-sample distributions.

As will be seen below in Section 6, the implications of these results for the conventional methods of estimating prediction and confidence intervals are severe. Additional problems are created, however, by conditioning that is imposed by some researchers, but without accounting for the conditioning in the likelihood function. Apparently, when applied researchers encounter residuals with the “wrong” skewness, many (if not all) typically either (i) obtain a new sample, or (ii) re-specify the model. Indeed, the user’s manual for the widely-used LIMDEP software package (Greene, 1995, Section 29.3) advises users that “If this condition emerges, your model is probably not well specified or the data are inconsistent with the model.” Moreover, we know of no published papers reporting an estimate of zero for the variance parameter of the one-sided component in a stochastic frontier model.⁵ Yet, the results in Table 1 make clear that even when the PSF being estimated is the true model, one should expect to sometimes obtain estimates γ̂ = 0 (or an analogous result in models where other one-sided distributions are specified; the problem is not unique to the normal/half-normal convolution used here).

Presumably, the researcher estimating a model such as the one in (2.1)–(2.4) must believe that the model is at least a good approximation of the underlying DGP, if not the correct specification. Otherwise, he would estimate a different model. Changing the model specification simply because residuals are skewed in the wrong direction is likely to lead to a mis-specification, particularly when the DGP produces finite samples yielding positively skewed residuals with some non-zero frequency. Moreover, it is well-known that classical inference assumes that the model specification is chosen independently of any estimates that are obtained; specification-searching introduces problems of bias in both parameter estimates as well as variance-covariance estimates (see Leamer, 1978 for discussion).

Note that if one were to discard estimates γ̂ = 0 and draw a new sample without changing the model specification, until obtaining an estimate γ̂ > 0, this would be tantamount to imposing a complicated conditioning on the underlying model. This point is crucial. If the true model is the one represented by (2.1)–(2.4), where the v and u are iid, then we have seen that unless the variance ratio λ² is sufficiently large, one should expect to sometimes draw samples with positive skewness in the composite residuals with sample sizes of the same order as those used in published applications. If one were to believe that the true model can only generate finite samples with negatively-skewed residuals, then such a model would be very different from the ones that have been estimated, and different from the one in (2.1)–(2.4). Indeed, to ensure that only samples with negative skewness can be generated, one would have to impose a complicated correlation structure that would ensure that whenever a large, positive value v is drawn, a sufficiently large value of u is also drawn. We know of no instance in the literature where such a model has been estimated. On the contrary, models such as (2.1)–(2.4) have been estimated, and this model is clearly capable of generating samples with positive skewness; moreover, it will do so with a frequency that depends on λ² and n. More importantly, values of λ² and n that are plausible in applications result in relatively high frequencies of samples with positive skewness.

⁵Of course, there are many published papers that specify a production or cost function with a two-sided error term, but with no one-sided error term. Provided the expectation of the two-sided error is zero, the response functions in these specifications are interpreted as conditional mean functions. This is very different from PSF models, where the response function is interpreted as the boundary of a production set. The boundary problems described here do not exist when conditional mean functions are estimated.

If a model such as (2.1)–(2.4) is to be estimated, and the researcher has the bad luck to draw a sample that yields positively-skewed residuals, a solution would be to increase the sample size. In most applications, however, this option is likely not feasible. Nonetheless, positively skewed residuals should not be taken as evidence that the model is misspecified. Fortunately, as will be seen in Section 6 below, while conventional approaches to inference break down when γ̂ = 0, the bagging and bootstrap approaches provide useful information, provided the true γ is strictly greater than 0.

As a practical matter, two software packages are commonly used to estimate stochastic frontier models: (i) the commercial package known as LIMDEP (Greene, 1995), and (ii) a free-ware package known as FRONTIER (Coelli, 1996); see Sena (1999) and Herrero and Pascoe (2002) for reviews of these packages. The two packages are rather different in their treatment of cases where composite residuals have the “wrong” skewness. In both packages, ordinary least squares (OLS) estimates are first obtained to serve as starting values after adjusting the intercept and variance terms using the modified OLS estimator. In the case of LIMDEP, when the OLS residuals have positive skewness, the program stops with a message stating, “Stoch. Frontier: OLS residuals have wrong skew. OLS is MLE.” While it is indeed true that the ML parameter estimates of β and σ² are equivalent to the OLS estimates in such cases, the OLS standard error estimates that are reported should not be taken as estimates of the standard error of the ML estimates. The OLS standard error estimates are conditioned on γ = 0, and consequently understate the true standard errors since uncertainty about γ is ignored.⁶ Indeed, conventional standard error estimates of the ML estimates are unavailable due to singularity of the negative Hessian of the log-likelihood in this case. However, the bootstrap and bagging methods remain valid when γ̂ = 0, provided γ > 0.

With FRONTIER, estimation proceeds even if the OLS residuals have positive skewness—even though with a linear response function, the OLS estimates of slope parameters and σ² are equivalent to ML estimates as discussed by Waldman (1982). After the OLS estimates have been obtained, a grid search procedure is used to find a starting value for γ; then these starting values are used in the DFP algorithm (Davidon, 1959; Fletcher and Powell, 1963). If the OLS residuals have positive skewness, FRONTIER returns a very small estimate for γ, but typically not zero. In addition, FRONTIER does not use the inverse negative Hessian to estimate the variance-covariance matrix, but rather the DFP direction matrix, which is an approximation of the inverse negative Hessian. The DFP algorithm is based in part on approximating the objective function by a quadratic; the accuracy of the approximation of the Hessian by the DFP direction matrix will suffer if the algorithm iterates only a few times or if the objective function is far from quadratic. Inattentive users may be misled by the fact that FRONTIER returns estimates of variance for the parameter estimates in all cases, even though these are clearly invalid when the OLS residuals have positive skewness.⁷

The problems surrounding inference about model parameters also affect inferences about technical efficiency, as demonstrated in the next section.

⁶Neither the output from the LIMDEP program nor the accompanying manual suggest that the OLS standard error estimates should be taken as estimates of the standard error of the ML estimates when the OLS residuals are positively skewed. Nor do we know of any cases where one has done so; rather, as discussed earlier, we expect that many applied users are tempted to draw a new sample at this point, or to re-specify their model, even though residuals with positive skewness can easily arise when (2.1)–(2.4) is the correct specification.

⁷The discussion here is not meant to imply that FRONTIER is not a useful program; in fact, aside from the problems mentioned here, it is easy to use for a wide variety of models, and typically runs very fast; moreover, its author has been kind to make the source code available, at zero price. One should, however, compute OLS residuals and check their skewness before using FRONTIER to estimate PSF models.


4 Prediction Intervals for Technical Efficiency

4.1 Estimation by conventional methods

Given the fixed point (x_0, y_0), with corresponding distance w_0 in (2.7), the intervals proposed by Horrace and Schmidt (1996) are obtained by using the conditional density of U given in (2.9) to derive percentiles ρ_α defined by Pr(U ≤ ρ_α | θ, x_0, y_0) = ∫₀^{ρ_α} f(u | θ, x_0, y_0) du = α. Inverting the last expression yields the α-percentile

ρ_α(θ | x_0, y_0) = µ_* + σ_* Φ^{−1}[1 − (1 − α)Φ(µ_*/σ_*)] (4.1)

of the random variable (U | θ, x_0, y_0), where Φ^{−1}(·) denotes the standard normal quantile function. Since the exponential function is monotonic, it follows that

I_α(θ | x_0, y_0) = [exp(−ρ_{1−α/2}(θ | x_0, y_0)), exp(−ρ_{α/2}(θ | x_0, y_0))] (4.2)

gives a two-sided, (1 − α) × 100-percent prediction interval for the random variable (e^{−U} | θ, x_0, y_0).⁸

Horrace and Schmidt (1996) propose estimating the interval in (4.2) by I_α(θ̂ | x_0, y_0), i.e., by replacing the unknown θ in (4.2) by the ML estimate θ̂. However, as Horrace and Schmidt (1996, p. 262) and others (e.g., Greene, 1999, page 108; Giannakas et al., 2003, page 82, footnote 8) have noted, the resulting interval estimates understate uncertainty since the uncertainty inherent in parameter estimates θ̂ is not accounted for in this approach. Although this problem disappears asymptotically, it may affect coverages in finite-sample situations.

Far more importantly, however, as will be confirmed below in Section 6, the unique features of the PSF model discussed in Section 3 lead to poor coverages by intervals based on percentiles defined by (4.1). Inspecting (2.10) and (2.11), it is clear that µ̂_* = σ̂²_* = 0 if γ̂ = 0. In this case, ρ_α(θ̂ | x_0, y_0) = 0 ∀ α ∈ [0, 1], and hence I_α(θ̂ | x_0, y_0) = [1, 1]. Of course, if γ = 0, then σ²_U = 0, and there is no technical inefficiency. However, as the discussion in Section 3 makes clear, γ̂ will sometimes be zero, even when γ > 0. In such cases, estimates of the intervals defined by (4.2) collapse to a point for all values of α, leading to no coverage by the estimated intervals.

⁸Taube (1988), working with a cost function, proposed a one-sided prediction interval. In terms of the production framework used here, Taube’s idea yields the (1 − α) × 100-percent prediction interval (0, exp(−ρ_α(θ | x_0, y_0))] for (e^{−U} | θ, x_0, y_0).
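For illustration, a small sketch (ours) evaluating (4.1)–(4.2) at given parameter values makes the degeneracy explicit: when γ̂ = 0 (or 1), σ̂_* = 0 and the interval collapses to [1, 1].

```python
# Conventional prediction interval (4.2) for exp(-U) given w0, evaluated
# at (estimated) parameter values sigma2 and gamma.
import numpy as np
from scipy.stats import norm

def pred_interval(w0, sigma2, gamma, alpha=0.05):
    """Two-sided (1-alpha) prediction interval for exp(-U) | theta, x0, y0."""
    mu_s = -w0 * gamma                                # (2.10)
    s_s = np.sqrt(sigma2 * gamma * (1.0 - gamma))     # (2.11)
    if s_s == 0.0:       # gamma = 0 or 1: density degenerates, interval [1,1]
        return (1.0, 1.0)
    def rho(a):                                       # percentile (4.1)
        return mu_s + s_s * norm.ppf(1.0 - (1.0 - a) * norm.cdf(mu_s / s_s))
    return (np.exp(-rho(1.0 - alpha / 2.0)), np.exp(-rho(alpha / 2.0)))

print(pred_interval(w0=-0.5, sigma2=2.0, gamma=0.5))
print(pred_interval(w0=-0.5, sigma2=2.0, gamma=0.0))   # collapses to (1, 1)
```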


Both of these problems can be alleviated by using bagging techniques.

4.2 Estimation by bagging

The idea of bootstrap aggregating was first used by Harris (1989); the name bagging was given by Breiman (1996). The idea consists of generating replications of a predictor, and then using the replicates to obtain an aggregated predictor. Theoretical properties of bagging predictors have been examined by Friedman and Hall (1999), Bühlmann and Yu (2002), and Fushiki et al. (2005). In particular, bagging predictors have been shown to provide substantial improvements over conventional predictors (such as ML estimates of the intervals defined in Section 4.1) when the conventional predictors are unstable in the sense that perturbations of the learning set can cause significant changes in predictions. Due to the problems discussed earlier in Section 3, this is precisely the case with ML estimates of the intervals defined above in Section 4.1.

Recall that the interval I_α(θ | x_0, y_0) defined in (4.2) is determined by percentiles of the conditional density f(u | θ, x_0, y_0) given in (2.9), which is estimated by f(u | θ̂, x_0, y_0). By contrast, bagging determines a prediction interval based on percentiles of an estimate of

E_{θ̂}[f(u | θ̂, x_0, y_0)] = ∫_Θ f(u | θ̂, x_0, y_0) ψ(θ̂) dθ̂, (4.3)

where Θ is the parameter space and ψ(θ̂) is the sampling density of θ̂. The sampling density ψ(θ̂) is unknown, but it can be approximated by bootstrap methods; the bagging technique involves estimating the expectation in (4.3) by B^{−1} Σ_{b=1}^B f(u | θ̂*_b, x_0, y_0), where the θ̂*_b are bootstrap estimates of θ̂. Implementing the bagging estimator of the prediction interval in (4.2) is straightforward; Fushiki et al. (2005) refer to the resulting interval estimate as a “parametric bootstrap prediction interval.” The necessary steps are given below:

Algorithm #1:

[1] Maximize the log-likelihood function (2.6) to obtain ML estimates θ̂, I_α(θ̂ | x_0, y_0), and ŵ_0 = ω(θ̂ | x_0, y_0); recover estimates σ̂²_V, σ̂²_U from elements of θ̂.

[2] Set b = 1.

[3] For i = 1, . . . , n draw v*_i ∼ N(0, σ̂²_V) and u*_i ∼ N⁺(0, σ̂²_U), and compute y*_ib = g(x_i | β̂) e^{v*_i − u*_i}.

[4] Using the pseudo-data S*_{b,n} = {(x_i, y*_ib)}_{i=1}^n, compute bootstrap estimates θ̂*_b = argmax_θ L(θ | S*_{b,n}) after replacing log y_i with log y*_ib in (2.6).

[5] Draw u*_{bk}, k = 1, . . . , K, from f(u | θ̂*_b, x_0, y_0) obtained by replacing θ in (2.9) with θ̂*_b, and set ϕ*_{bk} = exp(−u*_{bk}) for each k = 1, . . . , K.

[6] Increment b by one.

[7] Repeat steps [3]–[6] B times, yielding a set U = {ϕ*_{bk}}_{b=1,k=1}^{B,K} of BK values.

[8] Form the set U* = {ϕ | ϕ ∈ U, ϕ ≠ 1}.

The last step is necessary due to the fact that (as discussed previously in Section 3) some bootstrap estimates in step [4] may yield estimates γ̂*_b—the bootstrap analog of γ̂—equal to zero, causing the density f(u | θ̂*_b, x_0, y_0) to collapse to a probability mass at zero in step [5]. This issue is explained further in Section 6. The realizations in U* can be sorted to find α/2 and (1 − α/2) quantiles of their empirical distribution, giving a bagging estimate of the symmetric prediction interval I_α(θ | x_0, y_0) defined in (4.2) with nominal coverage (1 − α).

As an alternative, one might be tempted in step [5] of the above algorithm to simply record quantiles of f(u | θ̂*_b, x_0, y_0), rather than drawing from this estimated density, and then after step [7] to simply average the recorded quantiles over the Monte Carlo trials. This, however, would be something other than bagging. In particular, (4.3) gives the expectation of a density, rather than quantiles of the density. The number of deviates K drawn in step [5] can be made to be very large, since draws cost little in terms of computational burden.
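A condensed sketch of Algorithm #1 might look as follows, assuming a log-linear frontier; here mle() is a hypothetical placeholder for the ML step (e.g., the maximization sketched after (2.6)), assumed to return (β̂_1, β̂_2, σ̂², γ̂).

```python
# Sketch of Algorithm #1 (bagging prediction intervals); mle() is a
# user-supplied placeholder for maximizing the log-likelihood (2.6).
import numpy as np
from scipy.stats import truncnorm

def bagging_interval(x, logy, w0_point, mle, B=1000, K=100, alpha=0.05,
                     seed=0):
    rng = np.random.default_rng(seed)
    b0, b1, sigma2, gamma = mle(x, logy)           # step [1]
    s2u, s2v = gamma * sigma2, (1 - gamma) * sigma2
    phis = []
    for _ in range(B):                             # steps [2]-[7]
        v = rng.normal(0, np.sqrt(s2v), x.size)
        u = np.abs(rng.normal(0, np.sqrt(s2u), x.size))
        logy_b = b0 + b1 * np.log(x) + v - u       # step [3]
        b0b, b1b, s2b, gb = mle(x, logy_b)         # step [4]
        mu_s = -w0_point * gb                      # (2.10) at theta*_b
        s_s = np.sqrt(s2b * gb * (1 - gb))         # (2.11) at theta*_b
        if s_s == 0:       # gamma*_b = 0: mass at zero, phi = 1 (dropped in [8])
            continue
        a = (0.0 - mu_s) / s_s                     # truncate at u >= 0
        u_draws = truncnorm.rvs(a, np.inf, loc=mu_s, scale=s_s,
                                size=K, random_state=rng)
        phis.extend(np.exp(-u_draws))              # step [5]
    phis = np.asarray(phis)
    phis = phis[phis != 1.0]                       # step [8]
    return np.quantile(phis, [alpha / 2, 1 - alpha / 2])
```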

4.3 Bootstrap-adjusted interval estimates

As an alternative to the bagging method discussed in Section 4.2, bootstrap methods might be used to adjust the nominal coverage of conventional estimates of the intervals defined in Section 4.1 in an attempt to improve the realized coverage of conventional interval estimates. The idea is similar to the approach described by Davison and Hinkley (1997, pages 243–245). Iterated bootstrap procedures were introduced by Hall (1992) for the purpose of refining the coverage of bootstrap estimates of confidence intervals. This idea has been used in the context of nonparametric efficiency estimation with panel data by Hall et al. (1995), where firm-specific efficiency was estimated as a fixed effect; the bootstrap was iterated to adjust coverages when the two largest fixed effects were near each other, and in the context of nonparametric efficiency estimation with cross-sectional data by Simar and Wilson (2001, 2004). The idea here is similar, except that conventional interval estimates are used in the first stage, so that only one level of bootstrapping is needed. This greatly reduces the computational burden that is incurred by more typical implementations of the iterated bootstrap.

To briefly recap the discussion in Sections 2 and 4.1, the applied researcher is confronted with some data in S_n and wishes to estimate the prediction interval given by (4.2) for the random variable (e^{−U} | θ, x_0, y_0) with a chosen, nominal coverage level (1 − α). Let ξ denote the random variable (e^{−U} | θ, x_0, y_0), where U has the probability density function given by (2.9). In the case of the two-sided interval I_α(θ | x_0, y_0) defined in (4.2), the goal is to find a value α̃ such that Pr(ξ ∈ I_α̃(θ̂ | x_0, y_0)) ≈ 1 − α.

Define a set of J values α_1 < . . . < α_J, with 0 < α_j < 1 for all j = 1, . . . , J. Since the nominal value chosen for α to determine the coverage of prediction intervals is typically small, α_J can be made well less than 1. Then the value α̃ can be computed using the following algorithm.

Algorithm #2:

[1] Maximize the log-likelihood function (2.6) to obtain estimates θ̂, I_α(θ̂ | x_0, y_0), and ŵ_0 = ω(θ̂ | x_0, y_0); recover estimates σ̂²_V, σ̂²_U from elements of θ̂.

[2] Set b = 1 and set C = [0 . . . 0]′, a J-vector of zeros.

[3] For i = 1, . . . , n draw v*_i ∼ N(0, σ̂²_V) and u*_i ∼ N⁺(0, σ̂²_U), and compute y*_ib = g(x_i | β̂) e^{v*_i − u*_i}.

[4] Using the pseudo-data S*_{b,n} = {(x_i, y*_ib)}_{i=1}^n, obtain bootstrap estimates θ̂*_b = argmax_θ L(θ | S*_{b,n}) after replacing log y_i with log y*_ib in (2.6).

[5] Compute J(θ̂*_b) = [I_{α_1}(θ̂*_b) . . . I_{α_J}(θ̂*_b)]′, a vector of intervals obtained by replacing unknown parameters in (4.2) with elements of θ̂*_b for each α_j.

[6] Draw u* from f(u | θ̂, x_0, y_0) obtained by replacing θ in (2.9) with θ̂, and set ϕ*_b = exp(−u*).

[7] For j = 1, . . . , J, increment the jth element of C by one if ϕ*_b falls in the interval in the jth element of J(θ̂*_b).

[8] Increment b by one.

[9] Repeat steps [3]–[8] B times, filling the vector C of counts.

At the end of the exercise, the coverage of the intervals computed in step [5] is given by the vector D = B^{−1}C (note that since α_j < α_{j+1}, necessarily D_j ≥ D_{j+1}, where D_j represents the jth element of D). Together with the grid of values for α defined above, we have a mapping α_j → D_j, which could be approximated by regressing the D_j on the α_j, or perhaps by linear or quadratic interpolation. This allows one to choose a value α̃ that yields coverage approximately equal to the nominal value α; i.e., instead of using I_α(θ̂ | x_0, y_0), the researcher can use I_α̃(θ̂ | x_0, y_0), which will have coverage closer to the desired level (1 − α) than the original interval.
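A sketch of Algorithm #2, under the same assumptions as the bagging sketch above (hypothetical helpers mle() and pred_interval() as sketched earlier; γ̂ > 0 is assumed so that the conditional density in step [6] is non-degenerate):

```python
# Coverage adjustment (Algorithm #2): estimate the bootstrap coverage D_j
# of conventional intervals at each trial alpha_j, then interpolate to find
# the alpha~ whose realized coverage matches the nominal 1 - alpha.
import numpy as np
from scipy.stats import norm

def draw_u(mu_s, s_s, rng):
    """One inverse-CDF draw from the truncated normal in (2.9) (u >= 0)."""
    lo = norm.cdf(-mu_s / s_s)
    return mu_s + s_s * norm.ppf(lo + rng.uniform() * (1.0 - lo))

def adjusted_alpha(x, logy, w0_point, mle, pred_interval,
                   alphas=(0.01, 0.05, 0.10, 0.20), alpha=0.05,
                   B=1000, seed=0):
    rng = np.random.default_rng(seed)
    b0, b1, sigma2, gamma = mle(x, logy)                  # step [1]
    s2u, s2v = gamma * sigma2, (1.0 - gamma) * sigma2
    mu_s = -w0_point * gamma                              # (2.10) at theta_hat
    s_s = np.sqrt(sigma2 * gamma * (1.0 - gamma))         # (2.11) at theta_hat
    counts = np.zeros(len(alphas))                        # step [2]
    for _ in range(B):                                    # steps [3]-[9]
        v = rng.normal(0, np.sqrt(s2v), x.size)
        u = np.abs(rng.normal(0, np.sqrt(s2u), x.size))
        b0b, b1b, s2b, gb = mle(x, b0 + b1 * np.log(x) + v - u)
        phi = np.exp(-draw_u(mu_s, s_s, rng))             # step [6]
        for j, aj in enumerate(alphas):                   # steps [5], [7]
            lo_j, hi_j = pred_interval(w0_point, s2b, gb, aj)
            counts[j] += (lo_j <= phi <= hi_j)
    D = counts / B
    # D is non-increasing in alpha_j; reverse so np.interp sees increasing xp
    return np.interp(1.0 - alpha, D[::-1], np.asarray(alphas)[::-1])
```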

5 Confidence Intervals

In addition to estimating prediction intervals for the random variable (e^{−U} | θ, x_0, y_0) as discussed in Section 4, one can also estimate confidence intervals for non-stochastic quantities such as τ_0 defined in (2.12), τ defined in (2.14), as well as the elements of θ. In applied studies, researchers typically estimate confidence intervals for the elements of θ (or simply report whether estimates are significantly different from zero) using conventional methods employing variance estimates obtained by inverting the negative Hessian of the log-likelihood function. As will be seen, this approach is problematic due to the boundary issues discussed earlier in Section 3.

Applied studies typically report point estimates τ̂_0 corresponding to observed input-output levels of firms represented in the sample S_n as discussed above in Section 2, and often rank firms according to such estimates. Some studies have in addition reported estimates of intervals defined by (4.2). A number of such papers have remarked that the estimated intervals are wide, and that firms’ efficiencies are not significantly different in a statistical sense.⁹

While prediction intervals for the random variable (e^{−U} | θ, x_0, y_0) are certainly of interest, such intervals are necessarily wider than confidence intervals for the non-stochastic value τ_0 defined in (2.12). Given that many papers report estimates of τ_0 corresponding to different firms, and that firms are typically ranked in terms of these estimates, it seems reasonable to ask whether, and how, reasonable estimates of confidence intervals for τ_0 might be obtained.

In fact, it is straightforward to estimate confidence intervals for τ_0 = τ(θ | x_0, y_0) given in (2.12), as well as for τ(σ²_U) given in (2.14), using a parametric bootstrap. For the model defined by Assumptions 2.1 and 2.2, a parametric bootstrap consists of the following steps:

Algorithm #3:

[1] Using the sample data S_n = {(x_i, y_i)}_{i=1}^n, maximize the log-likelihood in (2.6) to obtain estimates θ̂; recover σ̂²_V, σ̂²_U from elements of θ̂.

[2] For i = 1, . . . , n draw v*_i ∼ N(0, σ̂²_V) and u*_i ∼ N⁺(0, σ̂²_U), and compute y*_i = g(x_i | β̂) e^{v*_i − u*_i}.

[3] Using the pseudo-data S*_{b,n} = {(x_i, y*_i)}_{i=1}^n, compute bootstrap estimates θ̂*_b = argmax_θ L(θ | S*_{b,n}) after replacing log y_i with log y*_i in (2.6).

[4] Repeat steps [2]–[3] B times to obtain estimates B = {θ̂*_b}_{b=1}^B.

The estimates obtained in step [1] can be used to compute estimates τ̂_0 = τ(θ̂ | x_0, y_0) and τ̂ = τ(σ̂²_U) of τ_0 = τ(θ | x_0, y_0) and τ(σ²_U), respectively. Similarly, the bootstrap estimates obtained in B can be used to compute bootstrap estimates τ̂*_{0b} = τ(θ̂*_b | x_0, y_0) and τ̂*_b = τ(σ̂*²_{Ub}) after recovering σ̂*²_{Ub} from elements of θ̂*_b. If one wishes to make inference about σ²_U and σ²_V, bootstrap estimates σ̂*²_{Ub}, σ̂*²_{Vb} can also be recovered from the bootstrap estimates in B.

⁹The authors of Taube (1988) and Horrace and Schmidt (1996) are well aware that they provide prediction intervals for a random variable, even if they sometimes refer to their intervals as “confidence intervals” (Taube, writing in German, uses “Konfidenzintervall”). However, several published papers and perhaps a larger number of working papers contain clear, explicit statements indicating that the interval in (4.2) is used as a confidence interval for the non-stochastic quantity τ_0 defined in (2.12), rather than as a prediction interval for the random variable (e^{−U} | θ, x_0, y_0). At the risk of being pedantic, a confidence interval is a random set covering an unknown population parameter with a certain probability, while a prediction interval involves inference about a random variable, as opposed to a population parameter. A prediction interval is obtained from knowledge of the distribution of the random variable whose future realizations one wishes to forecast, while a confidence interval requires information about the sampling distribution of an estimator. Precise definitions and additional discussion can be found in Stuart et al. (1999, pages 739–741), Shao (1999, pages 99 and 432), Gourieroux and Monfort (1995, pages 373–374), or Neter et al. (1990, pages 78–83).

The bootstrap estimates in B (or the additional bootstrap estimates τ̂*_{0b}, τ̂*_b) can be used to estimate confidence intervals for the parameters of the model or for τ_0 or τ in any of several ways. We consider two possibilities. To illustrate, consider an initial estimate δ̂ of a quantity δ, and a corresponding set of bootstrap estimates {δ̂*_b}_{b=1}^B. For nominal size α, the percentile intervals discussed by Efron (1979, 1982) are given by (δ̂^{(α/2)}, δ̂^{(1−α/2)}), where δ̂^{(α)} denotes the α × 100-percentile of the elements of {δ̂*_b}_{b=1}^B. Alternatively, bias-corrected (BC) intervals described by Efron and Tibshirani (1993) are given (again, for nominal size α) by (δ̂^{(α_1)}, δ̂^{(α_2)}), where α_1 = Φ(2z_0 + z^{(α/2)}), α_2 = Φ(2z_0 + z^{(1−α/2)}), z_0 = Φ^{−1}(#{δ̂*_b < δ̂}/B), and z^{(α)} = Φ^{−1}(α).

As an alternative to the bootstrap, confidence intervals for τ_0 defined in (2.12) and τ defined in (2.14) can also be obtained by noting that the theory of maximum likelihood gives

√n (θ̂ − θ) →d N(0, Σ), (5.1)

where Σ is the variance-covariance matrix of θ̂. Since τ_0 is simply a function of θ and of x_0, y_0, the Slutsky theorem yields

√n (τ̂_0 − τ_0) →d N(0, σ²_{τ_0}). (5.2)

The delta method gives the first-order approximation σ²_{τ_0} ≈ ∇′_{τ_0} Σ ∇_{τ_0}, where

∇_{τ_0} = [∂τ_0/∂β  ∂τ_0/∂σ²  ∂τ_0/∂γ]′. (5.3)

Expressions for the necessary partial derivatives are given in the Appendix. In the case of mean efficiency, the Slutsky theorem again provides

√n (τ̂ − τ) →d N(0, σ²_τ), (5.4)

and the delta method gives σ²_τ ≈ ∇′_τ Σ ∇_τ, where

∇_τ = [∂τ/∂β  ∂τ/∂σ²_U  ∂τ/∂γ]′. (5.5)


Again, the necessary partial derivatives are given in the Appendix.

Asymptotic confidence intervals for τ_0 and τ can be constructed using the asymptotic results in (5.2) and (5.4). For example, in the case of τ_0, a (1 − α) × 100-percent confidence interval is given by

(τ̂_0 − σ̂_{τ_0} Φ^{−1}(1 − α/2), τ̂_0 + σ̂_{τ_0} Φ^{−1}(1 − α/2)), (5.6)

where Φ^{−1}(·) is the standard normal quantile function. A (1 − α) × 100-percent confidence interval for τ can be constructed similarly by replacing σ̂_{τ_0} with σ̂_τ in (5.6). The variances σ²_{τ_0} and σ²_τ can be estimated by replacing unknown parameters with their ML estimates.

The bounds of the intervals obtained by the delta method are functions of θ. In order to estimate the intervals, the unknown parameters must be replaced with estimates, just as was the case with conventional estimation of prediction intervals as discussed in Section 4.1. Asymptotically this presents no problems, but in finite samples, one would expect that the resulting interval estimates understate uncertainty about τ_0 and τ, due to the failure to account for uncertainty about the parameters that must be estimated. The bootstrap method outlined in Section 5 avoids this problem.
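A sketch of the delta-method interval (5.6), using numerical rather than the paper's analytic derivatives (our simplification); Sigma_hat is assumed to be an estimated covariance matrix for θ̂, e.g., the inverse negative Hessian when it exists.

```python
# Delta-method confidence interval for a smooth function tau_fn(theta),
# with the gradient in (5.3)/(5.5) approximated by finite differences.
import numpy as np
from scipy.stats import norm

def delta_ci(tau_fn, theta_hat, Sigma_hat, alpha=0.05, h=1e-6):
    """(1-alpha) CI for tau_fn(theta) via the first-order delta method."""
    theta_hat = np.asarray(theta_hat, dtype=float)
    grad = np.empty_like(theta_hat)
    for k in range(theta_hat.size):        # central finite differences
        e = np.zeros_like(theta_hat)
        e[k] = h
        grad[k] = (tau_fn(theta_hat + e) - tau_fn(theta_hat - e)) / (2 * h)
    se = np.sqrt(grad @ Sigma_hat @ grad)  # first-order standard error
    z = norm.ppf(1 - alpha / 2)
    t = tau_fn(theta_hat)
    return (t - z * se, t + z * se)
```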

6 Monte Carlo Evidence

6.1 Design of Experiments

In our Monte Carlo experiments, we take (2.1)–(2.4) as the “true” model, with g(X | β) = e^{β_1} X^{β_2} and X ∼ N⁺(60, 10) so that X > 0. After taking logs, the model can be written as

log Y = β_1 + β_2 log X + V − U, (6.1)

with V and U distributed as described in (2.3)–(2.4). We set β_1 = log(10) ≈ 2.3026 and β_2 = 0.8. Olson et al. (1980) note that only two parameters among the set {σ_U, σ_V, σ², γ, σ²_W} (where σ²_W = VAR(W) = ((π − 2)/π) σ²_U + σ²_V) are independent. Therefore, we set σ²_W = 1 without loss of generality, and consider six different values of the ratio λ² = σ²_U/σ²_V of shape parameters: λ² ∈ {0.1, 0.5, 1, 2, 10, 100}. These values correspond to γ = 0.0909, 0.3333, 0.5, 0.6667, 0.9091, and 0.9901, respectively. Figure 1 shows plots of the density of the composite error term W given in (2.5) corresponding to the six values of λ². For reference, plots of the standard normal density, given by the dashed curves in Figure 1, are also shown.


Plausible values of the dependent variable in (6.1) vary across experiments, depending on λ². For each experiment, we use the true values of γ and σ² to estimate (by Monte Carlo) the 0.1, 0.3, 0.5, 0.7, and 0.9 quantiles of W = V − U; denote these by w(q), q ∈ Q = {0.1, 0.3, 0.5, 0.7, 0.9}. We then obtain output values for five hypothetical firms by computing y(q) = exp(β_1 + β_2 log(60) + w(q)), q ∈ Q. The input/output values for these hypothetical firms are held constant across Monte Carlo trials, and across bootstrap replications within each Monte Carlo trial, reflecting the fact that in applied settings, points of interest in the input-output space are always taken as given.
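Under the stated design, the w(q) quantiles and the implied hypothetical-firm outputs could be approximated as follows (a sketch of ours, shown for λ² = 1):

```python
# Approximate the w(q) quantiles of W = V - U (with sigma2_W = 1) by Monte
# Carlo, and the implied outputs y(q) at input level 60, beta1 = log 10,
# beta2 = 0.8.
import numpy as np

rng = np.random.default_rng(0)
lam2 = 1.0
s2u = 1.0 / ((np.pi - 2.0) / np.pi + 1.0 / lam2)   # solves sigma2_W = 1
s2v = s2u / lam2
w = (rng.normal(0, np.sqrt(s2v), 10**6)
     - np.abs(rng.normal(0, np.sqrt(s2u), 10**6)))
for q in (0.1, 0.3, 0.5, 0.7, 0.9):
    wq = np.quantile(w, q)                          # w(q) by Monte Carlo
    yq = np.exp(np.log(10.0) + 0.8 * np.log(60.0) + wq)
    print(q, wq, yq)
```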

On each Monte Carlo trial, we generate n ∈ {100, 1000} observations {(x_i, y_i)}_{i=1}^n and use these to estimate the parameters of the model by maximizing the log-likelihood (2.6). We then use the parameter estimates to estimate mean efficiency τ defined in (2.14). We also estimate inefficiency for a set of five “hypothetical” firms, which do not necessarily correspond to any observations in our samples. For each of these hypothetical firms, the input level is fixed at 60.¹⁰

Each experiment consisted of M = 2048 Monte Carlo trials.¹¹ On each trial, we use the conventional, bagging, and bootstrap-adjusted methods discussed in Section 4 to estimate prediction intervals for (e^{−U} | θ, x_0 = 60, y(q)), q ∈ Q. On each Monte Carlo trial, the coverage of each estimated prediction interval can be computed exactly (in the sense of not relying on Monte Carlo approximation or estimation) since the exact distribution of (e^{−U} | θ, x_0 = 60, y(q)) is known from (2.9). Specifically, if a particular method yields prediction interval estimates (a_m, b_m) for each Monte Carlo trial m = 1, . . . , M, then the estimated coverage is computed as

M^{−1} Σ_{m=1}^M [F_q(−log(a_m) | θ, x_0 = 60, y(q)) − F_q(−log(b_m) | θ, x_0 = 60, y(q))],

where F_q(·) is the distribution function for the truncated normal distribution N⁺(µ_*, σ²_*), here with µ_* obtained by substituting w(q) for w_0 in (2.10).

In addition, confidence intervals for τ as well as for τ_0 corresponding to each (x_0, y(q)), q ∈ Q, were estimated using both the delta method and the bootstrap method described in Section 5. Estimated coverages for confidence interval estimates were computed by dividing (i) the number of trials where confidence intervals estimated by a given method include the parameter of interest by (ii) M, the total number of Monte Carlo trials.

¹⁰We experimented with other values for the input level, but found little difference among our results.

¹¹The Monte Carlo experiments were performed on a massively parallel machine, where the number of processors for a particular job are conveniently and efficiently chosen as a power of 2. Choosing 2¹¹ = 2,048 Monte Carlo trials makes efficient use of the processors.

6.2 Coverages of estimated prediction intervals for (e^{−U} | θ, x_0, y_0)

Table 2 gives estimated coverages of two-sided prediction intervals estimated by the conventional method, i.e., by replacing θ in (4.2) with ML estimates θ̂ on each Monte Carlo trial, for (e^{−U} | θ, x_0 = 60, y(q)) (where q ∈ Q as described above in Section 6.1). For each sample size n ∈ {100, 1000}, Table 2 gives estimated coverages for six different signal-to-noise ratios—corresponding to the six different values of λ²—as well as three nominal significance levels and five different values of y_0 determined by quantiles of W.

Coverages by the conventional prediction intervals reported in Table 2 generally improve when the sample size increases from n = 100 to n = 1000, but are poor for smaller values of λ², even with n = 1000. Moreover, with n = 100, coverages across the six values of λ² improve as λ² increases from 0.1 to 10, but then become worse when λ² is increased from 10 to 100. The poor performance of the conventional prediction intervals can be explained after careful inspection of the interval in (4.2). To illustrate, consider the case where λ² = 0.1; the results in Table 1 indicate roughly 46 percent of samples of size n = 1000—about 942 of 2,048 Monte Carlo trials—will have residuals with positive skewness, ensuring that γ̂ = 0 and hence µ̂_* = σ̂²_U = 0. This is further confirmed by looking in Figure 1 at the panel corresponding to λ² = 0.1, where f(w | γ, σ²) is seen to be almost the same as the standard normal density. Hence, for any value of α ∈ (0, 1), the estimate of the interval in (4.2) collapses to [1, 1] in such cases. But, simple algebra confirms that when λ² = 0.1 in our experiments, the conditional density in (2.9) has strictly positive variance. Yet, prediction interval estimates based on (4.2) will yield zero coverage in about 46 percent, or 942 of 2,048 Monte Carlo trials, regardless of the significance level that is chosen. A similar phenomenon occurs when γ̂ = 1, which occurs in a number of trials when λ² is increased to 100. Necessarily, prediction intervals of zero width have poor coverage when the true γ is neither zero nor one.¹²

[12] We know of only one other paper that has examined coverages of conventional prediction intervals similar to the interval in (4.2) proposed by Horrace and Schmidt (1996). Kumbhakar and Lothgren (1998) report, in Table 3a of their paper, coverages of e^{−U} (as opposed to e^{−U} | θ, x0, y0) for sample sizes ranging from n = 25 to n = 800, and eleven values of γ* = ((π − 2)/π)σ²_U [((π − 2)/π)σ²_U + σ²_V]^{−1} equal to 0.0, 0.1, ..., 1.0. They set σ² = 1; hence, their values of γ* correspond to λ² = 0.0, 0.306, 0.688, 1.18, 1.83, 2.75, 4.13, 6.42, 11.01, 24.77, and +∞. Inspection of the results in their Table 3a reveals some problems. For example, with γ* = 0.1, implying λ² = 0.306, Kumbhakar and Lothgren report average coverage equal to 0.743 at 95-percent nominal significance in samples of size n = 100. Yet, as our results in Table 1 indicate, with n = 100 and λ² between 0.1 and 0.5, about 40.5–48.9 percent of the samples in their Monte Carlo experiments should have yielded residuals skewed in the “wrong” direction; in all such cases σ̂²_U = 0, causing the Horrace-Schmidt intervals to collapse to a point (and hence to provide no coverage). The results in Table 1 indicate that average coverage by the Horrace-Schmidt intervals could be at most only about 0.511–0.595. Our results in Table 2 indicate that conditioning on w0 makes some difference; nonetheless, at 95-percent nominal significance, coverage by estimates of the conventional interval for e^{−U} | θ̂, x0, y0 in (4.2) ranges between 0.377 and 0.518 for n = 100 and λ² ∈ {0.1, 0.5}, which is well below the coverage of e^{−U} reported by Kumbhakar and Lothgren for n = 100 and λ² = 0.306. Other entries in their Table 3a are similarly implausible.

Analogous to Table 2, Table 3 presents estimated coverages of two-sided bagging estimates of prediction intervals for the random variable (e^{−U} | θ̂, x0 = 60, y_(q)). The bagging method described in Section 4.2 is shown to yield prediction intervals with coverages at least as good as those obtained by the conventional approach in every case considered, with large improvements in many cases. In particular, the bagging intervals appear to be far less sensitive to both sample size and signal-to-noise ratio. With n = 100 and the smallest signal-to-noise ratio, coverages obtained with the bagging intervals are often either too small or too large, depending on the quantile q of W, but they come much closer to the nominal significance levels than do the conventional intervals reported in Table 2 at similar signal-to-noise ratios.

The markedly superior performance of bagging over the conventional approach to estimation of prediction intervals can be understood after re-examining Algorithm #1 in Section 4.2. Consider again the case where λ² = 0.1; in the roughly 942 trials where composite residuals have positive skewness, it remains true that γ̂ = 0 and hence σ̂²_U = 0. Then in step [3] of the procedure, u*_i = 0 for all i = 1, ..., n; consequently, the residuals are composed only of the normally-distributed v*_i. Although the normal distribution has no skewness, when finite samples are drawn, roughly half of the samples will have small but positive skewness, while the remaining samples will have small but negative skewness. Therefore, on each Monte Carlo trial where σ̂²_U = 0, the distribution of the B bootstrap estimates σ̂*²_{U,b} obtained in step [4] will exhibit a probability mass of about 0.5 at zero. This reflects rather closely the sampling distribution of σ̂²_U, which (given the results in Table 1) necessarily must contain a probability mass of about 0.46 for the case described here. At the end of step [7] in Algorithm #1, the set U* will contain a number of elements identically equal to 1; given that γ ≠ 0 (this is typically assumed by applied researchers, perhaps after rejecting the null hypothesis H0 : γ = 0), these elements are not informative; hence they are deleted in step [8]. The elements remaining in U* provide good prediction intervals (in terms of coverages), as demonstrated by the results in Table 3.
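The mass-at-zero argument is easy to check by simulation. The following is a small sketch (ours, not part of Algorithm #1) of the step-[3] situation when σ̂²_U = 0: bootstrap residuals are pure normal noise, roughly half of the resampled data sets are negatively skewed (allowing a nonzero bootstrap estimate of σ²_U), and the positively skewed half force σ̂*²_U = 0:

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(123)
n, B, sigma_v = 1000, 2000, 1.0

# With sigma_U^2 estimated as zero, step [3] draws u*_i = 0, so each
# bootstrap sample's residuals consist only of the normal noise v*_i.
neg = sum(skew(rng.normal(0.0, sigma_v, n)) < 0 for _ in range(B))
print(f"share of bootstrap samples with negative skewness: {neg / B:.3f}")
# Approximately 0.5: the positively skewed samples force the bootstrap
# estimate of sigma_U^2 to zero, producing the mass at zero noted above.
```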

Before considering estimation of confidence intervals, we return to the conventional prediction intervals with bootstrap adjustment described in Section 4.3. Estimates from our Monte Carlo experiments for coverages achieved by this approach are shown in Table 4. The results reveal that the bootstrap adjustment in Algorithm #2 provides some improvement over coverages attained with the ordinary conventional approach in Table 2, but the coverages in Table 4 remain poor when compared to the results obtained with bagging in Table 3. The bootstrap adjustment to the conventional intervals cannot overcome the problems related to the collapse of the interval in (4.2) to a point whenever γ̂ = 0 or γ̂ = 1.

6.3 Coverages of estimated confidence intervals for τ0 and τ̄

Using the “true” values of θ in our Monte Carlo framework and the quantiles w_(q), it is straightforward to compute the “true” value τ(θ | x0, y0) corresponding to each quantile w_(q) considered in our Monte Carlo experiments from the expression in (2.12), as well as mean efficiency τ̄ from the expression in (2.14). It is equally straightforward to compute the proportion of Monte Carlo trials in which intervals estimated using the various approaches cover τ0 or τ̄.
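For reference, here is a minimal sketch (ours) of the computation of τ0, using the representation of (2.12) that is written out in (A.1) of the appendix; µ* and σ* are assumed to have been computed from θ and w0 as in (2.10):

```python
import numpy as np
from scipy.stats import norm

def tau0(mu_star, sigma_star):
    """Conditional mean efficiency tau(theta | x0, y0) via (A.1):
    Phi(mu*/sigma* - sigma*) / Phi(mu*/sigma*) * exp(-mu* + sigma*^2 / 2)."""
    z = mu_star / sigma_star
    return (norm.cdf(z - sigma_star) / norm.cdf(z)
            * np.exp(-mu_star + 0.5 * sigma_star**2))
```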

In our experiments, the delta method provides no discrimination: in every case, it yielded intervals that always cover the true values of τ0. This should be of little surprise, given that the delta method is based on a first-order Taylor expansion; here, where substantial nonlinearities exist, the higher-order terms in the Taylor expansion that are dropped by the delta method are important. This also suggests a situation where bootstrap methods can be expected to give improved performance over more conventional methods of inference.

Although it is clear that the interval Iα(θ | x0, y0) given in (4.2) is a prediction interval for the random variable (e^{−U} | θ, x0, y0), a number of authors have used estimates of Iα(θ | x0, y0) as confidence intervals for τ0. Table 5 presents estimated coverages of τ0 (corresponding to the quantiles w_(q)) by estimates of conventional (prediction) intervals based on Iα(θ | x0, y0) in (4.2). We stress that these results are obtained from a mis-use of prediction intervals (for a random variable) as confidence intervals for a non-stochastic population parameter, namely τ0; this is not what Horrace and Schmidt intended. Nonetheless, given that the conventional, Horrace and Schmidt intervals have been mis-used by some authors in this way, it is interesting to see what the consequences of the mis-use might be.

Given that the conventional intervals are prediction intervals for a random variable, as opposed to confidence intervals for its mean, one might plausibly expect them to over-cover the mean when used as confidence intervals for τ0. Indeed, this occurs in Table 5 for cases with n = 1000 and λ² ≥ 2, where realized coverages range from 0.997 to 1.000 at each significance level that was considered. But Table 5 also reveals the opposite in many cases, particularly for sample size n = 100, but also when n = 1000 and the signal-to-noise ratio is less than 1. In these cases, the intervals tend to under-cover. This is due to the fact that the conventional interval estimates collapse to zero width whenever γ̂ = 0 or 1, as discussed above in Section 6.2.

Table 6 shows estimated coverages of bootstrap confidence intervals for τ0 (again, corresponding to the quantiles w_(q)) obtained from Algorithm #3. The results in Table 6 indicate that the bootstrap method performs well in the sense that it yields coverages that are close to nominal levels, except perhaps when λ² is very small, although the coverages in Table 6 are not so bad even with λ² = 0.1 and n = 100. With the smaller values of λ², the realized coverages shown in Table 6 are larger than the nominal significance levels. The results indicate that coverages depend (not surprisingly) on the signal-to-noise ratio as reflected by λ². Coverages also depend, though less severely, on where the output value lies for a given point of interest, as reflected by the quantiles w_(q). The bootstrap intervals perform well with the larger sample size (i.e., n = 1000) when λ² ≥ 1. For smaller values of λ², coverage by the bootstrap intervals is too large for 90-percent and 95-percent nominal significance. But small values of λ² represent low signal-to-noise ratios; in other words, when λ² is small, extracting information from a given body of data is necessarily more difficult than when λ² is larger. The bootstrap is neither magic nor a panacea, but it still out-performs the conventional approach to inference regarding efficiencies of individual units.

As noted previously in Section 5, bootstrap estimates in the set B can be used to compute bootstrap estimates τ̄*_b; these can be used to construct (percentile) bootstrap estimates of confidence intervals for mean efficiency τ̄, analogous to how bootstrap confidence intervals are constructed for τ0. Recall, however, that τ̄ defined in (2.14) is a function of just one model parameter, namely σ²_U. Given the transformation-preserving property of the percentile bootstrap (e.g., see Efron and Tibshirani, 1993), the coverage of bootstrap confidence intervals for τ̄ will be identical to the coverage of bootstrap confidence intervals for σ²_U at the same significance levels. Results on coverages of bootstrap confidence intervals for σ²_U are given in Table 9 and are discussed below in Section 6.4. As will be seen, with sample size n = 100, the bootstrap intervals perform well, though at the .90 and .95 significance levels, coverages are slightly too large at smaller values of λ² and slightly too small at larger values of λ². At .99 significance, the bootstrap intervals have very good coverage. When the sample size is increased to n = 1000, coverages by the bootstrap intervals improve in almost every case.
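The following is a minimal sketch (ours) of the transformation-preserving property in this setting, assuming the usual normal-half-normal mean-efficiency formula τ̄(σ²_U) = 2 exp(σ²_U/2) Φ(−σ_U), which is monotone (decreasing) in σ²_U, so the interval endpoints simply swap order under the mapping:

```python
import numpy as np
from scipy.stats import norm

def tau_bar(sigma2_u):
    """Mean efficiency, assuming E[e^{-U}] = 2 exp(s2/2) Phi(-s) for a
    half-normal inefficiency term with variance parameter s2."""
    s = np.sqrt(sigma2_u)
    return 2.0 * np.exp(sigma2_u / 2.0) * norm.cdf(-s)

# Placeholder bootstrap draws of sigma_U^2 (in practice, the set B of
# Algorithm #3 supplies these).
draws = np.abs(np.random.default_rng(7).normal(1.0, 0.2, 2000))

lo, hi = np.quantile(draws, [0.025, 0.975])    # percentile CI for sigma_U^2
ci_tau = np.sort(tau_bar(np.array([lo, hi])))  # mapped endpoints (order swaps)
ci_tau_direct = np.quantile(tau_bar(draws), [0.025, 0.975])
print(ci_tau, ci_tau_direct)  # agree up to quantile interpolation
```

Either interval covers τ̄ exactly when the underlying percentile interval covers σ²_U, which is why the two coverages coincide.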

6.4 Coverage of Intervals for Model Parameters

The bootstrap procedure (Algorithm #3) given in Section 5 can also be used to estimate confidence intervals for the model parameters. Conventional inference about model parameters, meanwhile, relies on the asymptotic normality result from the theory of maximum likelihood, and typically employs the inverse negative Hessian of the log-likelihood function in (2.6) as an estimator of the variance-covariance matrix. Alternatively, one can estimate the variance-covariance matrix using the outer product of the per-observation gradients of the log-likelihood, sometimes called the OPG or BHHH estimator (after Berndt et al., 1974); i.e., one can estimate Σ in (5.1) by

$$\hat{\Sigma} = \left[ \nabla_\theta \nabla_\theta' \right]^{-1} \tag{6.2}$$

where

$$\nabla_\theta = \left[ \nabla_{\theta 1} \;\; \nabla_{\theta 2} \;\; \ldots \;\; \nabla_{\theta n} \right], \tag{6.3}$$

$$\nabla_{\theta i} = \left. \frac{\partial L_i(\theta \mid \mathcal{S}_n)}{\partial \theta} \right|_{\theta = \hat{\theta}}, \tag{6.4}$$

and L_i(θ | S_n) is the contribution of the i-th observation to the log-likelihood given in (2.6).

It is widely known (e.g., see Cameron and Trivedi, 2005) that using the BHHH estimator in (6.2) often leads to tests with poor finite-sample properties and large size distortions; on the other hand, it can be computed even when the negative Hessian of the log-likelihood is singular (it is this feature that leads to the guarantee of convergence in the maximization algorithm of Berndt et al., 1974).
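As a concrete illustration, here is a minimal sketch (ours) of the OPG/BHHH estimator in (6.2), taking as given an n × k matrix whose i-th row is the score ∇_{θi} of observation i evaluated at the ML estimate:

```python
import numpy as np

def opg_covariance(scores):
    """OPG/BHHH covariance estimate as in (6.2): scores is an (n, k)
    array of per-observation gradients of the log-likelihood, each
    evaluated at the ML estimate theta-hat."""
    G = np.asarray(scores)
    return np.linalg.inv(G.T @ G)  # [sum_i g_i g_i']^{-1}

# Standard errors are the square roots of the diagonal:
# se = np.sqrt(np.diag(opg_covariance(scores)))
```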

Although the model is parameterized in terms of β, σ², and γ, straightforward algebra yields the estimators σ̂²_V and σ̂²_U as functions of β̂, σ̂², and γ̂; the variances of the estimators σ̂²_V and σ̂²_U are then given by O_V′ Σ O_V and O_U′ Σ O_U, where Σ is the variance-covariance matrix for (β̂, σ̂², γ̂) and O_V and O_U are vectors of derivatives of σ²_V and σ²_U with respect to β, σ², and γ. Conventional variance estimates can be used to estimate confidence intervals in the usual way.
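A small sketch (ours) of these delta-method variances, assuming the parameterization σ²_U = γσ² and σ²_V = (1 − γ)σ² (the form consistent with the appendix formulas), so that the derivatives with respect to β vanish:

```python
import numpy as np

def delta_variances(sigma2_hat, gamma_hat, Sigma):
    """Delta-method variances of the estimators of sigma_V^2 and
    sigma_U^2 from a 3x3 covariance matrix Sigma for (beta_j, sigma^2,
    gamma), assuming sigma_U^2 = gamma * sigma^2 and
    sigma_V^2 = (1 - gamma) * sigma^2."""
    O_V = np.array([0.0, 1.0 - gamma_hat, -sigma2_hat])  # d(sigma_V^2)/d(.)
    O_U = np.array([0.0, gamma_hat, sigma2_hat])         # d(sigma_U^2)/d(.)
    return O_V @ Sigma @ O_V, O_U @ Sigma @ O_U
```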

Tables 7, 8, and 9 give estimated coverages of intervals estimated by the conventional method (using the inverse of the negative Hessian in Table 7, and the BHHH estimator in Table 8) and by the bootstrap method for model parameters, with sample sizes n = 100 and n = 1000, and significance levels .90, .95, and .99. The first row of Table 7 gives the number of trials (out of 2,048) where confidence intervals for model parameters could be estimated. Interval estimates could be computed for every trial using the BHHH estimator and the bootstrap method, but the negative Hessian of the log-likelihood function is singular whenever γ̂ = 0 or 1. Due to the problems discussed in Section 6.2, γ̂ = 0 with frequencies similar to those in Table 1. In addition, with a sample size of only 100 observations, large values of λ² (10 and 100) yield small numbers of Monte Carlo trials with estimates γ̂ = 1. Of course, the conventional confidence-interval estimates that use the inverse of the negative Hessian cannot be computed whenever the negative Hessian is singular.

Estimated coverages in Table 7 for conventional intervals were computed by dividing the number of instances where computed intervals covered the corresponding true values not by the total number of Monte Carlo trials (2,048), but by the number of cases given in the first row of the table where conventional intervals could be computed. Consequently, the coverage estimates shown for conventional intervals in Table 7 may be overly optimistic. Although coverage of the intercept and slope parameters (β1 and β2) by the conventional intervals is close to nominal significance levels in Table 7, coverage of σ², γ, σ²_V, and σ²_U is poor for all values of λ². Moreover, the coverages do not improve when sample size is increased from n = 100 to n = 1000. Similar observations can be made for coverages of intervals based on the BHHH estimator, shown in Table 8, although with n = 100, coverages of β are too large both when λ² is small and when it is large. In both Tables 7 and 8, coverages of the parameters other than β are very poor.

Coverage by the bootstrap interval estimates shown in Table 9 is similar to that of the conventional intervals based on the inverse of the negative Hessian for β1 and β2, and is more accurate than with the conventional intervals for the parameters σ², γ, σ²_V, and σ²_U. Unlike the experience with the conventional intervals in Tables 7 and 8, coverages of the bootstrap intervals in Table 9 are seen to improve when sample size is increased, except perhaps for the smallest values of λ². For the smallest values of λ², coverage by bootstrap intervals for σ²_V is very accurate in Table 9 when n = 1000, but too large for σ², γ, and σ²_U at significance levels .90 and .95 when λ² = 0.1 or 0.5. Coverages by the bootstrap intervals, however, are closer to nominal values than coverages by the conventional intervals. Once again, when the signal-to-noise ratio is low, as reflected by small values of λ², it is apparently difficult to extract information about efficiency and noise from the model in (2.1).

7 Conclusions

In the context of cross-sectional PSF models, the term representing firm-specific inefficiency is not identified; the point estimates that are typically reported are estimates of conditional expectations. While one can estimate either prediction intervals for the random variable (e^{−U} | w0) or confidence intervals for the expectation of this random variable (i.e., for τ0 defined in (2.12)), prediction intervals for (e^{−U} | w0) are likely to be much wider than confidence intervals for τ0. Where prediction intervals associated with individual firms have been compared in the literature, it should be of little surprise that few, if any, significant differences are found across firms.

Taube (1988) and Horrace and Schmidt (1996) derived prediction intervals for (e^{−U} | w0) and proposed “conventional” estimators of their intervals based on replacing unknown model parameters with consistent estimators. Unfortunately, the coverage properties of these conventional interval estimates are poor, due in large part to the bounded support of the distribution of the inefficiency term U, with the consequence that in finite samples, residuals may have the “wrong” skewness even if the model is correctly specified. By contrast, the bagging method, using Algorithm #1 in Section 4.2, yields prediction-interval estimates with much better coverage properties.

In the case of confidence intervals for τ0, inappropriate use of the conventional prediction intervals as confidence intervals leads (not surprisingly) to very poor coverage. While one might expect that prediction intervals would over-cover when used as confidence intervals, in fact the opposite often occurs, again due to the boundary issues discussed in Sections 3 and 6. Our simulation results demonstrate that a parametric bootstrap yields confidence intervals for τ0 with good coverage properties, even when the signal-to-noise ratio is low.

It has been known for some time that PSF models sometimes yield residuals with the “wrong” skewness. The implications of this phenomenon for inference, however, have not been carefully considered. This paper makes clear that, depending on the signal-to-noise ratio reflected by λ², the problem can arise with alarming frequency, even when the model is correctly specified. Consequently, “wrongly” skewed residuals in an observed sample do not provide meaningful evidence of specification error. Common practice, however, has been to change the model specification whenever “wrongly” skewed residuals are encountered.

Classical inference assumes that the model specification is chosen independently of any estimates that are obtained; specification-searching introduces problems of bias in both parameter estimates and variance-covariance estimates. The bagging procedure described in Section 4.2 and the bootstrap procedure given in Section 5 can be used to estimate prediction intervals as well as confidence intervals with good coverage properties, and can extract useful information about efficiency levels even from samples where residuals have the “wrong” skewness. In such cases, one of course relies heavily on parametric assumptions, but this is probably always true when PSF models are estimated.

A Technical Appendix

To evaluate ∇τ in (5.2) and (5.3), note that (2.12) can be rewritten as

$$\tau = \Phi\left(\frac{\mu_*}{\sigma_*} - \sigma_*\right) \Phi\left(\frac{\mu_*}{\sigma_*}\right)^{-1} \exp\left(-\mu_* + \frac{\sigma_*^2}{2}\right). \tag{A.1}$$

Also note that the first term in ∇τ is given by

$$\frac{\partial \tau}{\partial \beta} = \frac{\partial \tau}{\partial \mu_*}\,\frac{\partial \mu_*}{\partial w_0}\,\frac{\partial w_0}{\partial \beta} = \left(\frac{\partial \tau}{\partial \mu_*}\right)\left(\frac{\partial \log g(x \mid \beta)}{\partial \beta}\right). \tag{A.2}$$

Hence the terms

$$\begin{aligned}
\frac{\partial \tau}{\partial \zeta} ={}& \Phi\left(\frac{\mu_*}{\sigma_*}\right)^{-1} \exp\left(-\mu_* + \frac{\sigma_*^2}{2}\right) \frac{\partial}{\partial \zeta}\,\Phi\left(\frac{\mu_*}{\sigma_*} - \sigma_*\right) \\
&+ \Phi\left(\frac{\mu_*}{\sigma_*} - \sigma_*\right) \exp\left(-\mu_* + \frac{\sigma_*^2}{2}\right) \frac{\partial}{\partial \zeta}\,\Phi\left(\frac{\mu_*}{\sigma_*}\right)^{-1} \\
&+ \Phi\left(\frac{\mu_*}{\sigma_*} - \sigma_*\right) \Phi\left(\frac{\mu_*}{\sigma_*}\right)^{-1} \frac{\partial}{\partial \zeta}\,\exp\left(-\mu_* + \frac{\sigma_*^2}{2}\right)
\end{aligned} \tag{A.3}$$

are needed, where ζ represents either µ*, σ², or γ.

The nine different partial derivatives needed in (A.3) are given by:

$$\frac{\partial}{\partial \mu_*}\,\Phi\left(\frac{\mu_*}{\sigma_*} - \sigma_*\right) = \sigma_*^{-1}\,\phi\left(\frac{\mu_*}{\sigma_*} - \sigma_*\right), \tag{A.4}$$

$$\frac{\partial}{\partial \mu_*}\,\Phi\left(\frac{\mu_*}{\sigma_*}\right)^{-1} = -\sigma_*^{-1}\,\Phi\left(\frac{\mu_*}{\sigma_*}\right)^{-2}\phi\left(\frac{\mu_*}{\sigma_*}\right), \tag{A.5}$$

$$\frac{\partial}{\partial \mu_*}\,\exp\left(-\mu_* + \frac{\sigma_*^2}{2}\right) = -\exp\left(-\mu_* + \frac{\sigma_*^2}{2}\right), \tag{A.6}$$

$$\frac{\partial}{\partial \sigma^2}\,\Phi\left(\frac{\mu_*}{\sigma_*} - \sigma_*\right) = \frac{1}{2}\,\phi\left(\frac{\mu_*}{\sigma_*} - \sigma_*\right)\gamma^{1/2}\sigma^{-1}\left[w_0\,\sigma^{-2}\gamma^{-1}(1-\gamma)^{-1/2} - (1-\gamma)^{1/2}\right], \tag{A.7}$$

$$\frac{\partial}{\partial \sigma^2}\,\Phi\left(\frac{\mu_*}{\sigma_*}\right)^{-1} = -\frac{1}{2}\,\Phi\left(\frac{\mu_*}{\sigma_*}\right)^{-2}\phi\left(\frac{\mu_*}{\sigma_*}\right) w_0\,\gamma^{1/2}\sigma^{-3}(1-\gamma)^{-1/2}, \tag{A.8}$$

$$\frac{\partial}{\partial \sigma^2}\,\exp\left(-\mu_* + \frac{\sigma_*^2}{2}\right) = \frac{1}{2}\,\gamma(1-\gamma)\exp\left(-\mu_* + \frac{\sigma_*^2}{2}\right), \tag{A.9}$$

$$\frac{\partial}{\partial \gamma}\,\Phi\left(\frac{\mu_*}{\sigma_*} - \sigma_*\right) = \phi\left(\frac{\mu_*}{\sigma_*} - \sigma_*\right)\left[\sigma^2\gamma(1-\gamma)\right]^{-1/2} \times \left\{-w_0 + \frac{1}{2}(1-2\gamma)\left[\frac{w_0}{1-\gamma} - \sigma^2\right]\right\}, \tag{A.10}$$

$$\frac{\partial}{\partial \gamma}\,\Phi\left(\frac{\mu_*}{\sigma_*}\right)^{-1} = \frac{1}{2}\,\Phi\left(\frac{\mu_*}{\sigma_*}\right)^{-2}\phi\left(\frac{\mu_*}{\sigma_*}\right) w_0\,\sigma^{-1}\gamma^{-1/2}(1-\gamma)^{-3/2}, \tag{A.11}$$

and

$$\frac{\partial}{\partial \gamma}\,\exp\left(-\mu_* + \frac{\sigma_*^2}{2}\right) = \left[w_0 + \frac{1}{2}\,\sigma^2(1-2\gamma)\right]\exp\left(-\mu_* + \frac{\sigma_*^2}{2}\right). \tag{A.12}$$

The terms appearing in ∇τ̄ in (5.4) and (5.5) are given by

$$\frac{\partial \bar{\tau}}{\partial \beta} = 0, \tag{A.13}$$

$$\frac{\partial \bar{\tau}}{\partial \sigma^2} = \frac{1}{2}\,\gamma\bar{\tau} - e^{\frac{1}{2}\gamma\sigma^2}\,\gamma^{1/2}\sigma^{-1}\phi\left(\gamma^{1/2}\sigma\right), \tag{A.14}$$

and

$$\frac{\partial \bar{\tau}}{\partial \gamma} = \frac{1}{2}\,\sigma^2\bar{\tau} - e^{\frac{1}{2}\gamma\sigma^2}\,\gamma^{-1/2}\sigma\,\phi\left(\gamma^{1/2}\sigma\right). \tag{A.15}$$
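As a quick sanity check on (A.14), here is a small sketch (ours) comparing the analytic derivative with a central finite difference, assuming τ̄(σ², γ) = 2 exp(γσ²/2) Φ(−γ^{1/2}σ), a form consistent with (A.13)–(A.15):

```python
import numpy as np
from scipy.stats import norm

def tau_bar(sigma2, gamma):
    # Assumed mean-efficiency formula: 2 exp(gamma*sigma2/2) Phi(-sqrt(gamma*sigma2)).
    return 2.0 * np.exp(0.5 * gamma * sigma2) * norm.cdf(-np.sqrt(gamma * sigma2))

def dtau_dsigma2(sigma2, gamma):
    # Analytic derivative from (A.14).
    s = np.sqrt(sigma2)
    return (0.5 * gamma * tau_bar(sigma2, gamma)
            - np.exp(0.5 * gamma * sigma2) * np.sqrt(gamma) / s
              * norm.pdf(np.sqrt(gamma) * s))

sigma2, gamma, h = 1.5, 0.4, 1e-6
fd = (tau_bar(sigma2 + h, gamma) - tau_bar(sigma2 - h, gamma)) / (2.0 * h)
print(fd, dtau_dsigma2(sigma2, gamma))  # the two values should agree closely
```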


References

Aigner, D., Lovell, C. A. K., and Schmidt, P. (1977), “Formulation and estimation of stochastic frontier production function models,” Journal of Econometrics, 6, 21–37.

Battese, G. E., and Coelli, T. J. (1988), “Prediction of firm-level technical efficiencies with a generalized frontier production function and panel data,” Journal of Econometrics, 38, 387–399.

Battese, G. E., and Corra, G. S. (1977), “Estimation of a production frontier model: With application to the pastoral zone of eastern Australia,” Australian Journal of Agricultural Economics, 21, 169–179.

Bauer, P. W. (1990), “Recent developments in the econometric estimation of frontiers,” Journal of Econometrics, 46, 39–56.

Berndt, E., Hall, B., Hall, R., and Hausman, J. (1974), “Estimation and inference in non-linear structural models,” Annals of Economic and Social Measurement, 3, 653–665.

Bravo-Ureta, B. E., and Pinheiro, A. E. (1993), “Efficiency analysis of developing country agriculture: A review of the frontier function literature,” Agricultural and Resource Economics Review, 22, 88–101.

Breiman, L. (1996), “Bagging predictors,” Machine Learning, 24, 123–140.

Bühlmann, P., and Yu, B. (2002), “Analyzing bagging,” Annals of Statistics, 30, 927–961.

Cameron, A. C., and Trivedi, P. K. (2005), Microeconometrics: Methods and Applications, Cambridge: Cambridge University Press.

Coelli, T. (1995), “Estimators and hypothesis tests for a stochastic frontier function: A Monte Carlo analysis,” Journal of Productivity Analysis, 6, 247–268.

Coelli, T. (1996), “A guide to FRONTIER version 4.1: A computer program for stochastic frontier production and cost function estimation,” CEPA Working Paper #96/07, Centre for Efficiency and Productivity Analysis, University of New England, Armidale, NSW 2351, Australia.

Davidon, W. C. (1959), “Variable metric method for minimization,” Argonne National Laboratory Research and Development Report #5990; reprinted in SIAM Journal on Optimization, 1, 1–17 (1991).

Davison, A. C., and Hinkley, D. V. (1997), Bootstrap Methods and their Application, Cambridge, United Kingdom: Cambridge University Press.

Efron, B. (1979), “Bootstrap methods: Another look at the jackknife,” Annals of Statistics, 7, 1–16.

Efron, B. (1982), The Jackknife, the Bootstrap and Other Resampling Plans, Philadelphia: Society for Industrial and Applied Mathematics. CBMS-NSF Regional Conference Series in Applied Mathematics, #38.

Efron, B., and Tibshirani, R. J. (1993), An Introduction to the Bootstrap, London: Chapman and Hall.

Fletcher, R., and Powell, M. J. D. (1963), “A rapidly convergent descent method for minimization,” Computer Journal, 6, 163–168.

Flores-Lagunes, A., Horrace, W. C., and Schnier, K. E. (2007), “Identifying technically efficient fishing vessels: A non-empty, minimal subset approach,” Journal of Applied Econometrics, 22, 729–745.

Friedman, J. H., and Hall, P. (1999), “On bagging and nonlinear estimation,” unpublished working paper, Department of Statistics, Stanford University.

Fushiki, T., Komaki, F., and Aihara, K. (2005), “Nonparametric bootstrap prediction,” Bernoulli, 11, 293–307.

Giannakas, K., Tran, K. C., and Tzouvelekas, V. (2003), “On the choice of functional forms in stochastic frontier modeling,” Empirical Economics, 28, 75–100.

Gourieroux, C., and Monfort, A. (1995), Statistics and Econometric Models, vol. 1, Cambridge: Cambridge University Press. Translated by Quang Vuong.

Greene, W. H. (1993), “The econometric approach to efficiency analysis,” in The Measurement of Productive Efficiency, ed. H. O. Fried, C. A. K. Lovell, and S. S. Schmidt, New York: Oxford University Press, pp. 68–119.

Greene, W. H. (1995), LIMDEP Version 7.0 User's Manual, New York: Econometric Software, Inc.

Greene, W. H. (1999), “Frontier production functions,” in Handbook of Applied Econometrics, ed. M. H. Pesaran and P. Schmidt, vol. II, Oxford, United Kingdom: Blackwell Publishers Ltd., chapter 3, pp. 81–166.

Hall, P. (1992), The Bootstrap and Edgeworth Expansion, New York: Springer-Verlag.

Hall, P., Härdle, W., and Simar, L. (1995), “Iterated bootstrap with application to frontier models,” The Journal of Productivity Analysis, 6, 63–76.

Harris, I. R. (1989), “Predictive fit for natural exponential families,” Biometrika, 76, 675–684.

Herrero, I., and Pascoe, S. (2002), “Estimation of technical efficiency: A review of some of the stochastic frontier and DEA software,” Computers in Higher Education Economics Review. Virtual edition (http://www.economics.ltsn.ac.uk/cheer.htm).

Horrace, W. C. (2005), “On ranking and selection from independent truncated normal distributions,” Journal of Econometrics, 126, 335–354.

Horrace, W. C., and Schmidt, P. (1996), “Confidence statements for efficiency estimates from stochastic frontier models,” Journal of Productivity Analysis, 7, 257–282.

Kumbhakar, S. C., and Lothgren, M. (1998), “A Monte Carlo analysis of technical inefficiency predictors,” Working Paper Series in Economics and Finance No. 229, Stockholm School of Economics, P.O. Box 6501, S–113 83 Stockholm, Sweden.

Kumbhakar, S. C., and Lovell, C. A. K. (2000), Stochastic Frontier Analysis, Cambridge: Cambridge University Press.

Leamer, E. E. (1978), Specification Searches: Ad Hoc Inference with Nonexperimental Data, John Wiley & Sons, Inc.

Lee, L. F. (1993), “Asymptotic distribution of the maximum likelihood estimator for a stochastic frontier function model with a singular information matrix,” Econometric Theory, 9, 413–430.

Meeusen, W., and van den Broeck, J. (1977), “Efficiency estimation from Cobb-Douglas production functions with composed error,” International Economic Review, 18, 435–444.

Neter, J., Wasserman, W., and Kutner, M. H. (1990), Applied Linear Statistical Models, Homewood, Illinois: Richard D. Irwin, Inc.

Olson, J. A., Schmidt, P., and Waldman, D. M. (1980), “A Monte Carlo study of estimators of stochastic frontier production functions,” Journal of Econometrics, 13, 67–82.

Ritter, C., and Simar, L. (1997), “Pitfalls of normal–gamma stochastic frontier models,” Journal of Productivity Analysis, 8, 167–182.

Sena, V. (1999), “Stochastic frontier estimation: A review of the software options,” Journal of Applied Econometrics, 14, 579–586.

Shao, J. (1999), Mathematical Statistics, New York: Springer-Verlag New York, Inc.

Simar, L., and Wilson, P. W. (2001), “Aplicación del ‘bootstrap’ para estimadores D.E.A.,” in La Medición de la Eficiencia y la Productividad, ed. A. Álvarez, Madrid: Pirámide. Translation of “Performance of the Bootstrap for DEA Estimators and Iterating the Principle,” Discussion Paper No. 0002, Institut de Statistique, Université Catholique de Louvain, Louvain-la-Neuve, Belgium.

Simar, L., and Wilson, P. W. (2004), “Performance of the bootstrap for DEA estimators and iterating the principle,” in Handbook on Data Envelopment Analysis, ed. W. W. Cooper, L. M. Seiford, and J. Zhu, New York: Kluwer Academic Publishers, Inc., chapter 10, pp. 265–298.

Stuart, A., Ord, K., and Arnold, S. (1999), Kendall's Advanced Theory of Statistics, sixth ed., vol. 2A, London: Edward Arnold.

Taube, R. (1988), Möglichkeiten der Effizienzmessung von öffentlichen Verwaltungen, Berlin: Duncker & Humblot GmbH.

Waldman, D. (1982), “A stationary point for the stochastic frontier likelihood,” Journal of Econometrics, 18, 275–279.

Table 1: Proportion of 1,000 Normal-Half Normal Samples with Positive Skewness

                                    λ² = σ²_U / σ²_V
         n         .01     .05      .1      .5       1       2      10      20     100
        25       0.517   0.510   0.512   0.464   0.390   0.320   0.114   0.048   0.020
        50       0.494   0.498   0.493   0.455   0.381   0.219   0.023   0.005   0.003
       100       0.498   0.486   0.489   0.405   0.301   0.142   0.000   0.000   0.000
       200       0.475   0.509   0.465   0.372   0.228   0.060   0.000   0.000   0.000
       500       0.503   0.497   0.489   0.320   0.106   0.007   0.000   0.000   0.000
     1,000       0.510   0.488   0.460   0.220   0.032   0.000   0.000   0.000   0.000
    10,000       0.513   0.458   0.386   0.004   0.000   0.000   0.000   0.000   0.000
   100,000       0.498   0.379   0.215   0.000   0.000   0.000   0.000   0.000   0.000
 1,000,000       0.463   0.148   0.003   0.000   0.000   0.000   0.000   0.000   0.000

Table 2: Estimated Coverages of Prediction Intervals for (e^{−U} | θ, x0 = 60, y0 = y_(q)) Estimated by Conventional Method

                               n = 100                                      n = 1000
                        (λ² = σ²_U / σ²_V)                            (λ² = σ²_U / σ²_V)
quantile     0.1     0.5     1       2       10      100     0.1     0.5     1       2       10      100

(1 − α) = 0.90
  0.1      0.315   0.424   0.508   0.629   0.796   0.609   0.436   0.643   0.808   0.879   0.893   0.886
  0.3      0.410   0.480   0.557   0.677   0.815   0.610   0.461   0.660   0.829   0.890   0.894   0.886
  0.5      0.439   0.496   0.575   0.701   0.826   0.611   0.470   0.667   0.837   0.894   0.895   0.886
  0.7      0.451   0.498   0.578   0.710   0.841   0.611   0.476   0.671   0.843   0.896   0.897   0.886
  0.9      0.456   0.493   0.573   0.708   0.839   0.615   0.482   0.675   0.847   0.896   0.896   0.890

(1 − α) = 0.95
  0.1      0.382   0.469   0.556   0.686   0.857   0.644   0.478   0.686   0.864   0.934   0.944   0.938
  0.3      0.451   0.513   0.598   0.730   0.873   0.645   0.491   0.700   0.881   0.943   0.945   0.938
  0.5      0.466   0.524   0.611   0.750   0.882   0.646   0.497   0.706   0.888   0.945   0.946   0.938
  0.7      0.473   0.526   0.614   0.758   0.896   0.646   0.500   0.710   0.893   0.947   0.947   0.938
  0.9      0.476   0.523   0.611   0.756   0.892   0.654   0.503   0.714   0.897   0.947   0.947   0.942

(1 − α) = 0.99
  0.1      0.461   0.522   0.612   0.753   0.923   0.684   0.513   0.728   0.918   0.983   0.988   0.983
  0.3      0.486   0.544   0.638   0.784   0.932   0.685   0.517   0.737   0.928   0.987   0.988   0.983
  0.5      0.490   0.548   0.645   0.798   0.940   0.685   0.518   0.741   0.933   0.988   0.989   0.983
  0.7      0.492   0.550   0.648   0.803   0.950   0.685   0.520   0.744   0.936   0.988   0.989   0.983
  0.9      0.493   0.550   0.648   0.803   0.945   0.695   0.521   0.748   0.939   0.989   0.989   0.986

Table 3: Estimated Coverages of Prediction Intervals for (e^{−U} | θ, x0 = 60, y0 = y_(q)) Estimated by Bagging

                               n = 100                                      n = 1000
                        (λ² = σ²_U / σ²_V)                            (λ² = σ²_U / σ²_V)
quantile     0.1     0.5     1       2       10      100     0.1     0.5     1       2       10      100

(1 − α) = 0.90
  0.1      0.783   0.905   0.919   0.907   0.899   0.830   0.881   0.928   0.901   0.891   0.893   0.887
  0.3      0.871   0.919   0.905   0.883   0.892   0.829   0.907   0.920   0.897   0.892   0.893   0.887
  0.5      0.911   0.920   0.895   0.868   0.854   0.829   0.918   0.915   0.894   0.893   0.893   0.887
  0.7      0.932   0.913   0.885   0.866   0.827   0.693   0.926   0.909   0.893   0.894   0.895   0.887
  0.9      0.943   0.895   0.875   0.875   0.926   0.830   0.932   0.903   0.892   0.895   0.896   0.890

(1 − α) = 0.95
  0.1      0.891   0.958   0.966   0.959   0.943   0.895   0.941   0.967   0.951   0.944   0.946   0.942
  0.3      0.935   0.961   0.952   0.936   0.941   0.895   0.953   0.963   0.948   0.945   0.945   0.942
  0.5      0.956   0.962   0.946   0.927   0.908   0.896   0.959   0.960   0.947   0.946   0.945   0.942
  0.7      0.967   0.958   0.939   0.925   0.898   0.794   0.963   0.957   0.946   0.946   0.947   0.942
  0.9      0.973   0.949   0.935   0.936   0.959   0.897   0.967   0.952   0.945   0.947   0.948   0.944

(1 − α) = 0.99
  0.1      0.975   0.991   0.994   0.992   0.983   0.976   0.988   0.994   0.991   0.988   0.989   0.989
  0.3      0.987   0.992   0.990   0.983   0.981   0.979   0.991   0.994   0.990   0.988   0.988   0.989
  0.5      0.991   0.993   0.989   0.981   0.969   0.980   0.992   0.993   0.989   0.989   0.989   0.989
  0.7      0.993   0.992   0.986   0.980   0.968   0.964   0.993   0.992   0.989   0.989   0.989   0.989
  0.9      0.995   0.992   0.988   0.987   0.987   0.975   0.994   0.991   0.988   0.989   0.989   0.989

Table 4: Estimated Coverages of Prediction Intervals for (e^{−U} | θ, x0 = 60, y0 = y_(q)) Estimated by Conventional Method with Bootstrap Adjustment

                               n = 100                                      n = 1000
                        (λ² = σ²_U / σ²_V)                            (λ² = σ²_U / σ²_V)
quantile     0.1     0.5     1       2       10      100     0.1     0.5     1       2       10      100

(1 − α) = 0.90
  0.1      0.391   0.490   0.584   0.722   0.849   0.616   0.465   0.680   0.847   0.898   0.895   0.887
  0.3      0.428   0.503   0.589   0.720   0.840   0.614   0.469   0.673   0.842   0.896   0.895   0.886
  0.5      0.441   0.502   0.585   0.719   0.837   0.612   0.471   0.670   0.842   0.896   0.895   0.886
  0.7      0.453   0.498   0.578   0.714   0.845   0.611   0.477   0.671   0.843   0.896   0.897   0.886
  0.9      0.465   0.498   0.577   0.710   0.839   0.615   0.487   0.679   0.848   0.896   0.896   0.890

(1 − α) = 0.95
  0.1      0.441   0.520   0.614   0.757   0.895   0.651   0.497   0.714   0.894   0.948   0.946   0.938
  0.3      0.463   0.530   0.622   0.763   0.891   0.648   0.497   0.710   0.891   0.947   0.946   0.938
  0.5      0.469   0.529   0.620   0.764   0.890   0.647   0.497   0.708   0.891   0.947   0.946   0.938
  0.7      0.475   0.526   0.615   0.760   0.899   0.646   0.501   0.710   0.893   0.947   0.947   0.938
  0.9      0.483   0.528   0.615   0.758   0.892   0.654   0.508   0.718   0.898   0.947   0.947   0.942

(1 − α) = 0.99
  0.1      0.484   0.543   0.637   0.784   0.940   0.690   0.520   0.741   0.932   0.989   0.988   0.983
  0.3      0.491   0.551   0.648   0.800   0.941   0.687   0.520   0.742   0.933   0.989   0.988   0.983
  0.5      0.491   0.551   0.649   0.804   0.943   0.686   0.519   0.742   0.934   0.988   0.989   0.983
  0.7      0.493   0.550   0.648   0.804   0.951   0.685   0.520   0.744   0.936   0.988   0.989   0.983
  0.9      0.495   0.553   0.651   0.804   0.945   0.695   0.523   0.751   0.940   0.989   0.989   0.986

Table 5: Estimated Coverages of E(e^{−U} | θ, x0 = 60, y0 = y_(q)) by H-S Intervals

                               n = 100                                      n = 1000
                        (λ² = σ²_U / σ²_V)                            (λ² = σ²_U / σ²_V)
quantile     0.1     0.5     1       2       10      100     0.1     0.5     1       2       10      100

(1 − α) = 0.90
  0.1      0.244   0.403   0.512   0.674   0.875   0.698   0.506   0.734   0.934   0.997   1.000   1.000
  0.3      0.386   0.508   0.609   0.760   0.914   0.704   0.519   0.743   0.945   0.999   1.000   1.000
  0.5      0.467   0.544   0.644   0.797   0.938   0.709   0.520   0.746   0.947   0.999   1.000   1.000
  0.7      0.491   0.551   0.654   0.812   0.964   0.711   0.521   0.750   0.949   0.999   1.000   1.000
  0.9      0.494   0.556   0.659   0.819   0.979   0.720   0.521   0.754   0.949   1.000   1.000   1.000

(1 − α) = 0.95
  0.1      0.314   0.450   0.551   0.720   0.907   0.713   0.519   0.740   0.941   0.999   1.000   1.000
  0.3      0.438   0.528   0.631   0.779   0.935   0.714   0.521   0.747   0.947   0.999   1.000   1.000
  0.5      0.483   0.549   0.650   0.804   0.950   0.717   0.521   0.750   0.949   0.999   1.000   1.000
  0.7      0.494   0.555   0.658   0.816   0.972   0.719   0.522   0.754   0.949   0.999   1.000   1.000
  0.9      0.495   0.558   0.661   0.822   0.982   0.724   0.524   0.755   0.950   1.000   1.000   1.000

(1 − α) = 0.99
  0.1      0.415   0.507   0.614   0.768   0.938   0.723   0.523   0.748   0.947   0.999   1.000   1.000
  0.3      0.477   0.549   0.650   0.800   0.955   0.724   0.524   0.753   0.949   0.999   1.000   1.000
  0.5      0.494   0.555   0.657   0.813   0.966   0.726   0.525   0.755   0.949   0.999   1.000   1.000
  0.7      0.496   0.558   0.661   0.821   0.979   0.727   0.527   0.756   0.950   1.000   1.000   1.000
  0.9      0.497   0.561   0.663   0.826   0.983   0.731   0.527   0.759   0.951   1.000   1.000   1.000

Table 6: Estimated Coverages of E(e^{−U} | θ, x0 = 60, y0 = y_(q)) by Bootstrap Confidence Intervals

                               n = 100                                      n = 1000
                        (λ² = σ²_U / σ²_V)                            (λ² = σ²_U / σ²_V)
quantile     0.1     0.5     1       2       10      100     0.1     0.5     1       2       10      100

(1 − α) = 0.90
  0.1      0.942   0.943   0.936   0.930   0.850   0.899   0.953   0.952   0.894   0.893   0.885   0.919
  0.3      0.942   0.945   0.943   0.936   0.852   0.891   0.953   0.949   0.897   0.890   0.882   0.916
  0.5      0.942   0.948   0.948   0.933   0.853   0.883   0.953   0.950   0.897   0.887   0.883   0.916
  0.7      0.942   0.956   0.959   0.884   0.848   0.879   0.953   0.951   0.899   0.879   0.888   0.915
  0.9      0.977   0.959   0.814   0.704   0.828   0.850   0.953   0.956   0.895   0.870   0.894   0.911

(1 − α) = 0.95
  0.1      0.967   0.970   0.967   0.958   0.906   0.964   0.978   0.976   0.975   0.943   0.938   0.961
  0.3      0.967   0.971   0.970   0.962   0.911   0.954   0.979   0.976   0.971   0.943   0.939   0.960
  0.5      0.967   0.971   0.974   0.964   0.912   0.950   0.979   0.977   0.967   0.943   0.938   0.959
  0.7      0.967   0.974   0.980   0.946   0.920   0.942   0.979   0.979   0.954   0.939   0.949   0.960
  0.9      0.993   0.981   0.892   0.794   0.886   0.910   0.979   0.980   0.946   0.928   0.940   0.957

(1 − α) = 0.99
  0.1      0.988   0.991   0.992   0.982   0.971   0.991   0.998   0.996   0.993   0.987   0.984   0.983
  0.3      0.988   0.992   0.993   0.985   0.974   0.989   0.998   0.996   0.993   0.985   0.986   0.983
  0.5      0.988   0.994   0.993   0.989   0.976   0.988   0.998   0.995   0.994   0.985   0.988   0.984
  0.7      0.989   0.995   0.997   0.986   0.982   0.988   0.998   0.996   0.994   0.984   0.991   0.984
  0.9      1.000   0.997   0.958   0.918   0.950   0.962   0.998   0.997   0.990   0.976   0.982   0.989

Table 7: Estimated Coverages of Conventional Confidence Intervals for Model Parameters (using inverse of negative Hessian to estimate covariances)

                               n = 100                                      n = 1000
                        (λ² = σ²_U / σ²_V)                            (λ² = σ²_U / σ²_V)
             0.1     0.5     1       2       10      100     0.1     0.5     1       2       10      100

  #         1030    1166    1384    1715    2035    1932    1084    1572    1956    2047    2048    2048

(1 − α) = 0.90
  β1       0.882   0.887   0.889   0.884   0.876   0.873   0.903   0.915   0.904   0.900   0.890   0.880
  β2       0.876   0.872   0.881   0.887   0.873   0.874   0.898   0.900   0.899   0.891   0.891   0.884
  σ²       0.579   0.625   0.655   0.628   0.152   0.024   0.735   0.798   0.790   0.522   0.140   0.023
  γ        0.963   1.000   1.000   1.000   1.000   1.000   0.798   0.983   1.000   1.000   1.000   1.000
  σ²_V     0.979   0.992   0.998   0.989   0.996   1.000   0.967   0.973   0.955   0.995   1.000   1.000
  σ²_U     0.976   0.997   1.000   0.997   0.974   0.982   0.873   0.972   0.990   0.980   0.997   0.999

(1 − α) = 0.95
  β1       0.935   0.937   0.941   0.939   0.925   0.911   0.940   0.950   0.949   0.945   0.939   0.938
  β2       0.931   0.937   0.934   0.939   0.926   0.914   0.948   0.949   0.948   0.944   0.940   0.941
  σ²       0.620   0.662   0.686   0.668   0.185   0.031   0.784   0.842   0.823   0.612   0.159   0.027
  γ        0.998   1.000   1.000   1.000   1.000   1.000   0.891   0.999   1.000   1.000   1.000   1.000
  σ²_V     0.987   0.997   0.999   0.995   0.996   1.000   0.989   0.983   0.967   0.997   1.000   1.000
  σ²_U     0.996   1.000   1.000   0.998   0.984   0.992   0.943   0.992   0.999   0.992   0.998   1.000

(1 − α) = 0.99
  β1       0.984   0.987   0.988   0.984   0.979   0.945   0.983   0.987   0.987   0.987   0.988   0.985
  β2       0.989   0.987   0.985   0.982   0.977   0.945   0.984   0.985   0.987   0.989   0.989   0.987
  σ²       0.674   0.716   0.726   0.714   0.247   0.041   0.863   0.896   0.883   0.767   0.212   0.037
  γ        1.000   1.000   1.000   1.000   1.000   1.000   0.982   1.000   1.000   1.000   1.000   1.000
  σ²_V     0.996   0.998   1.000   0.999   0.998   1.000   1.000   0.995   0.981   0.998   1.000   1.000
  σ²_U     1.000   1.000   1.000   1.000   0.997   0.997   0.994   1.000   1.000   1.000   1.000   1.000

Table 8: Estimated Coverages of Conventional Confidence Intervals for Model Parameters (using OPG estimator)

                               n = 100                                      n = 1000
                        (λ² = σ²_U / σ²_V)                            (λ² = σ²_U / σ²_V)
             0.1     0.5     1       2       10      100     0.1     0.5     1       2       10      100

(1 − α) = 0.90
  β1       0.924   0.916   0.910   0.894   0.894   0.945   0.935   0.920   0.895   0.896   0.888   0.895
  β2       0.944   0.934   0.922   0.905   0.898   0.943   0.946   0.924   0.904   0.894   0.891   0.896
  σ²       0.175   0.238   0.263   0.217   0.110   0.050   0.141   0.259   0.270   0.238   0.099   0.021
  γ        0.001   0.004   0.008   0.015   0.097   0.785   0.000   0.001   0.001   0.000   0.011   0.082
  σ²_V     0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000
  σ²_U     0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000

(1 − α) = 0.95
  β1       0.960   0.956   0.953   0.945   0.945   0.969   0.960   0.956   0.946   0.944   0.940   0.941
  β2       0.967   0.962   0.957   0.953   0.948   0.971   0.974   0.963   0.951   0.949   0.941   0.940
  σ²       0.191   0.266   0.309   0.265   0.132   0.053   0.162   0.311   0.324   0.285   0.123   0.025
  γ        0.001   0.004   0.009   0.020   0.111   0.828   0.000   0.001   0.002   0.000   0.013   0.101
  σ²_V     0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000
  σ²_U     0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000

(1 − α) = 0.99
  β1       0.991   0.990   0.990   0.987   0.988   0.989   0.989   0.987   0.985   0.986   0.988   0.987
  β2       0.994   0.992   0.992   0.988   0.986   0.987   0.993   0.989   0.987   0.989   0.990   0.987
  σ²       0.227   0.305   0.374   0.355   0.171   0.065   0.206   0.398   0.412   0.371   0.151   0.038
  γ        0.001   0.005   0.013   0.024   0.146   0.865   0.000   0.001   0.004   0.001   0.019   0.129
  σ²_V     0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000
  σ²_U     0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000

Table 9: Estimated Coverages of Bootstrap Confidence Intervals for Model Parameters (Percentile Bootstrap)

                               n = 100                                      n = 1000
                        (λ² = σ²_U / σ²_V)                            (λ² = σ²_U / σ²_V)
             0.1     0.5     1       2       10      100     0.1     0.5     1       2       10      100

(1 − α) = 0.90
  β1       0.886   0.882   0.879   0.880   0.874   0.858   0.910   0.903   0.894   0.893   0.893   0.892
  β2       0.885   0.878   0.881   0.880   0.875   0.858   0.899   0.890   0.895   0.890   0.889   0.891
  σ²       0.938   0.961   0.957   0.905   0.868   0.867   0.938   0.960   0.897   0.885   0.891   0.895
  γ        0.942   0.941   0.938   0.928   0.838   0.720   0.953   0.952   0.893   0.894   0.888   0.913
  σ²_V     0.745   0.867   0.902   0.903   0.825   0.711   0.900   0.947   0.911   0.889   0.891   0.910
  σ²_U     0.943   0.948   0.949   0.939   0.865   0.876   0.953   0.950   0.895   0.893   0.891   0.899

(1 − α) = 0.95
  β1       0.940   0.942   0.940   0.936   0.930   0.922   0.952   0.947   0.943   0.940   0.939   0.944
  β2       0.935   0.940   0.939   0.939   0.931   0.919   0.949   0.946   0.945   0.943   0.940   0.945
  σ²       0.963   0.982   0.985   0.963   0.927   0.933   0.976   0.981   0.948   0.936   0.946   0.938
  γ        0.966   0.969   0.966   0.955   0.895   0.996   0.978   0.976   0.973   0.945   0.940   0.974
  σ²_V     0.824   0.910   0.936   0.938   0.884   0.990   0.944   0.976   0.965   0.949   0.940   0.977
  σ²_U     0.967   0.971   0.972   0.970   0.929   0.936   0.979   0.978   0.965   0.940   0.947   0.940

(1 − α) = 0.99
  β1       0.983   0.986   0.985   0.984   0.982   0.984   0.988   0.986   0.985   0.986   0.987   0.987
  β2       0.982   0.985   0.984   0.981   0.983   0.983   0.986   0.985   0.986   0.989   0.987   0.988
  σ²       0.992   0.998   0.997   0.993   0.980   0.976   0.998   0.997   0.996   0.985   0.985   0.984
  γ        0.987   0.991   0.990   0.981   0.959   0.999   0.998   0.995   0.994   0.986   0.984   0.994
  σ²_V     0.933   0.970   0.980   0.974   0.951   0.999   0.986   0.993   0.992   0.986   0.981   0.995
  σ²_U     0.988   0.992   0.993   0.989   0.984   0.980   0.998   0.996   0.994   0.986   0.984   0.983

Figure 1: Conditional Density of W, f(w | γ, σ²), Corresponding to Different Values of λ²

[Six panels, one for each of λ² = 0.1, 0.5, 1, 2, 10, and 100; in each panel the horizontal axis is epsilon (−4 to 4) and the vertical axis is density (0.0 to 0.5).]

Note: Solid curves show f(w | γ, σ²); dashed curves show the standard normal density.
