
E62: Stochastic Frontier Models and Efficiency Analysis

E62.1 Introduction

Chapters E62-E65 present LIMDEP's programs for two types of efficiency analysis,

stochastic frontier analysis (SFA) and data envelopment analysis (DEA). To a large extent, these are

competing methodologies. No formulation has yet been devised that unifies the two in a single

analytical framework. Arguably, the former is a fully parameterized model whereas the latter is

'nonparametric,' albeit also atheoretical in nature.

The stochastic frontier model is used in a large literature of studies of production, cost,

revenue, profit and other models of goal attainment. The model as it appears in the current literature

was originally developed by Aigner, Lovell, and Schmidt (1977). The canonical formulation that

serves as the foundation for other variations is their model,

\[ y = \beta'x + v - u, \]

where y is the observed outcome (goal attainment), \(\beta'x + v\) is the optimal, frontier goal (e.g., maximal production output or minimum cost) pursued by the individual, \(\beta'x\) is the deterministic part of the frontier and \(v \sim N[0, \sigma_v^2]\) is the stochastic part. The two parts together constitute the 'stochastic frontier.' The amount by which the observed individual fails to reach the optimum (the frontier) is u, where

\[ u = |U| \quad \text{and} \quad U \sim N[0, \sigma_u^2] \]

(change to v + u for a stochastic cost frontier or any setting in which the optimum is a minimum). In

this context, u is the 'inefficiency.' This is the normal-half normal model, which is the basic form of the stochastic frontier model.
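The composed error in the normal-half normal model can be illustrated with a small simulation. The following is a minimal sketch in Python, outside LIMDEP; the function names, the seed and the parameter values are assumptions made for illustration. It draws v and u = |U| and compares the sample mean of u with the half normal mean, \(\sigma_u\sqrt{2/\pi}\).

```python
import math
import random

def simulate_composed_error(n, sigma_v, sigma_u, seed=12345):
    """Draw n pairs (v, u) with v ~ N(0, sigma_v^2) and u = |U|, U ~ N(0, sigma_u^2)."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, sigma_v) for _ in range(n)]
    u = [abs(rng.gauss(0.0, sigma_u)) for _ in range(n)]
    return v, u

def half_normal_mean(sigma_u):
    """Theoretical E[u] for the half normal distribution: sigma_u * sqrt(2/pi)."""
    return sigma_u * math.sqrt(2.0 / math.pi)

# Illustrative values only: sigma_v = 1, sigma_u = 2.
v, u = simulate_composed_error(100_000, sigma_v=1.0, sigma_u=2.0)
mean_u = sum(u) / len(u)
print("sample mean of u:     ", round(mean_u, 4))
print("theoretical E[u]:     ", round(half_normal_mean(2.0), 4))
```

Because u is never negative, the composed error v - u is shifted downward and skewed to the left, which is the feature that the estimators in this chapter exploit.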

Many varieties of the stochastic frontier model have appeared in the literature. A major

survey that presents an extensive catalog of these formulations is Kumbhakar and Lovell (2000).

(See, as well, Bauer (1990), Greene (2008) and several other surveys, many of which are cited in

Kumbhakar and Lovell and in Greene.) The estimator in LIMDEP computes parameter estimates for

most single equation cross section and panel data variants of the stochastic frontier model.

A large number of variants of the stochastic frontier model based on different assumptions

about the distribution of the 'inefficiency' term, u, have been proposed in the received literature.

Most of these are available in LIMDEP, as suggested in the list below. The bulk of the received

technology centers on cross section style modeling. However, recent advances include many

extensions that take advantage of the features of panel data. LIMDEP also supports a large array of panel data estimators.


The conventional approach to deterministic frontier estimation is currently data envelopment

analysis. This is usually handled with linear programming techniques. The analysis assumes that

there is a frontier technology (in the same spirit as the stochastic frontier production model) that can

be described by a piecewise linear hull that envelops the observed outcomes. Some (efficient)

observations will be on the frontier while other (inefficient) individuals will be inside. The

technique produces a deterministic frontier that is generated by the observed data, so by construction,

some individuals are 'efficient.' This is one of the fundamental differences between DEA and SFA.

Data envelopment analysis is documented in Chapter E65.

The analysis of production, cost, etc. in the stochastic frontier framework involves two steps.

In the first, the frontier model is estimated, usually by maximum likelihood. In the second, the

estimated model is used to construct measures of inefficiency or efficiency. Individual specific

estimates are computed that provide the basis of comparison of firms either to absolute standards or

to each other. The sections of this chapter develop several model forms used in the first step.

Efficiency estimation, the second step, appears formally in Section E62.8. The general methodology is then applied to the specifications developed up to that point and to several others proposed in the sections that follow, as well as in Chapters E63 and E64.

E62.2 Stochastic Frontier Model Specifications

The stochastic frontier model is

\[ y = \beta'x + v - u, \quad u = |U|. \]

In this area of study, unlike most others, estimation of the model parameters is usually not the

primary objective. Estimation and analysis of the inefficiency of individuals in the sample and of the

aggregated sample are usually of greater interest. This part of the development will present tools for

estimation of inefficiency.

Typically, the production or cost model is based on a Cobb-Douglas, translog, or other form

of logarithmic model, so that the essential form is

\[ \log y = \beta'x + v - u \]

where the components of x are generally logs of inputs for a production model or logs of output and

input prices for a cost model, or their squares and/or cross products. In this form, then, at least for

relatively small variation, u represents the proportion by which y falls short of the goal, and has a

natural interpretation as proportional or percentage inefficiency. The numerous examples below will

demonstrate. Users are also referred to the various survey sources listed earlier.
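The reading of u as proportional inefficiency in the logarithmic model is an approximation. A minimal sketch in Python (not LIMDEP code; the function name and the values of u are illustrative assumptions) compares the exact efficiency ratio exp(-u) with the approximation 1 - u:

```python
import math

def efficiency_ratio(u):
    """Exact ratio of observed to frontier outcome in the log model: exp(-u)."""
    return math.exp(-u)

# Compare the exact ratio with the first-order approximation 1 - u.
for u in (0.01, 0.05, 0.10, 0.50):
    exact = efficiency_ratio(u)
    approx = 1.0 - u
    print(f"u = {u:4.2f}   exp(-u) = {exact:.4f}   1 - u = {approx:.4f}")
```

The two columns agree closely for small u and drift apart as u grows, which is why the percentage interpretation holds only for relatively small variation.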

The results one obtains are, of course, critically dependent on the model assumed. Thus,

specification and estimation of model parameters, while perhaps of secondary interest, are

nonetheless a major first step in the model building process. In nearly all received formulations, the

random component, v, is assumed to be normally distributed with zero mean. In some models, v may

be heteroscedastic. But, in either form, the large majority of the different frontier models that have

been proposed result from variations on the distribution of the inefficiency term, u. The range of

specifications examined in this chapter includes the following:

Distributional assumptions: half normal, exponential, gamma

Partially nonparametric frontier function

Sample selection model


The following extensions are presented in Chapter E63:

Truncated normal with nonzero, heterogeneous mean in the underlying U

Heteroscedasticity in v and/or u

Heterogeneity in the parameter of the exponential or gamma distribution

Amsler et al.'s 'scaling model'

Alvarez et al.'s model of fixed, latent management

A number of treatments for panel data are presented in Chapter E64.

E62.3 Basic Commands for Stochastic Frontier Models

The command for all specifications of the stochastic frontier model is

FRONTIER ; Lhs = y ; Rhs = one, ... ; … other specifications $

NOTE: One must be the first variable in the Rhs list in all model specifications.

The default specification is Aigner, Lovell and Schmidt's canonical normal-half normal model. The

default form is a production frontier model,

\[ y = \beta'x + v - u, \quad u = |U|. \]

That is, the right hand side of the equation specifies the maximum goal attainable. To specify a cost

frontier model or other model in which the frontier represents a minimum, so that

\[ y = \beta'x + v + u, \quad u = |U|, \]

use

; Cost

This specification is used in all forms of the stochastic frontier model. As noted below, one

additional specification you may find useful is

; Start = values for \(\beta\), \(\lambda\), \(\sigma\).

(The meanings of the parameters are developed below.) ALS also developed the normal-exponential

model, in which u has an exponential distribution rather than a half normal distribution. To request

the exponential model, use

; Model = Exponential (or ; Model = E )

in the FRONTIER command. For this model, the parameters are (\(\beta\), \(\theta\), \(\sigma_v\)). Further details appear

below. There are also several model forms, and numerous modifications such as heteroscedasticity

that are developed below.


This is the full list of general specifications that are applicable to this model estimator.

Controlling Output from Model Commands

; Par                  keeps ancillary parameters \(\lambda\), \(\sigma\), etc. with main parameter vector in b.
; OLS                  displays least squares starting values when (and if) they are computed.
; Table = name         saves model results to be combined later in output tables.

Robust Asymptotic Covariance Matrices

; Covariance Matrix    displays estimated asymptotic covariance matrix (normally not shown), same as ; Printvc.
; Choice               uses choice based sampling (sandwich with weighting) estimated matrix.
; Cluster = spec       requests computation of the cluster form of corrected covariance estimator.

Optimization Controls for Nonlinear Optimization

; Start = list         gives starting values for a nonlinear model.
; Tlg [ = value]       sets convergence value for gradient.
; Tlf [ = value]       sets convergence value for function.
; Tlb [ = value]       sets convergence value for parameters.
; Alg = name           requests a particular algorithm, Newton, DFP, BFGS, etc.
; Maxit = n            sets the maximum iterations.
; Output = n           requests technical output during iterations; the level 'n' is 1, 2, 3 or 4.
; Set                  keeps current setting of optimization parameters as permanent.

Predictions and Residuals

; List                 displays a list of fitted values with the model estimates.
; Keep = name          keeps fitted values as a new (or replacement) variable in data set.
; Res = name           keeps residuals as a new (or replacement) variable.
; Fill                 fills missing values (outside estimating sample) for fitted values.

Hypothesis Tests and Restrictions

; Test: spec           defines a Wald test of linear restrictions.
; Wald: spec           defines a Wald test of linear restrictions, same as ; Test: spec.
; CML: spec            defines a constrained maximum likelihood estimator.
; Rst = list           specifies equality and fixed value restrictions.
; Maxit = 0 ; Start = the restricted values
                       specifies Lagrange multiplier test.


E62.3.1 Predictions, Residuals and Partial Effects

Predicted values and 'residuals' for the stochastic frontier models are computed as follows. The same forms are used for cross section and panel data models. The predicted value is \(\hat{\beta}'x_i\). (These are rarely useful in this setting.) The 'residual' is computed directly as

\[ e_i = y_i - \hat{\beta}'x_i. \]

This residual is usually not of interest in itself. It is, however, the crucial ingredient in the efficiency

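estimator discussed in Section E62.8. The estimator of \(u_i\) that we will use is computed by the Jondrow, Lovell, Materov and Schmidt (JLMS) formula for \(E[u|v-u]\), or \(E[u|v+u]\) if based on a cost frontier. For the production frontier,

\[ \hat{E}[u|\varepsilon] = \frac{\sigma\lambda}{1+\lambda^2}\left[\frac{\phi(w)}{1-\Phi(w)} - w\right], \quad \varepsilon = v - u, \quad w = \frac{\varepsilon\lambda}{\sigma}, \]

\[ \sigma = \sqrt{\sigma_v^2 + \sigma_u^2}, \quad \lambda = \frac{\sigma_u}{\sigma_v}. \]

For a cost frontier, \(\varepsilon = v + u\) and the sign of \(\varepsilon\) is reversed in w.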

v

In the JLMS formula, \(e_i\) is the estimator of \(\varepsilon_i\). The formulas and computations are discussed in

Section E62.8.
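The JLMS computation for the normal-half normal model can be sketched as follows. This is a minimal illustration in Python, not LIMDEP's implementation; the function names and the `cost` flag are assumptions made here, and the parameter values in the usage lines are the Lambda and Sigma estimates reported in one of the outputs in Section E62.5.

```python
import math

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_upper_tail(z):
    """1 - Phi(z), computed with erfc for accuracy when z is large."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def jlms(e, sigma, lam, cost=False):
    """E[u | epsilon] for the normal-half normal model, evaluated at residual e.

    sigma = sqrt(sigma_v^2 + sigma_u^2) and lam = sigma_u / sigma_v.
    For a cost frontier (epsilon = v + u) the sign of the residual is reversed.
    Note: for extremely large w the upper tail underflows to zero.
    """
    eps = -e if cost else e
    w = eps * lam / sigma
    scale = sigma * lam / (1.0 + lam * lam)
    return scale * (norm_pdf(w) / norm_upper_tail(w) - w)

# Illustrative residuals; sigma and lambda taken from a reported output.
sigma_hat, lambda_hat = 0.18957, 0.94309
for e in (-0.3, 0.0, 0.3):
    print(f"e = {e:5.2f}   E[u|e] = {jlms(e, sigma_hat, lambda_hat):.5f}")
```

For a production frontier, a more negative residual yields a larger inefficiency estimate, and the estimate is positive for every residual.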

The frontier model is, save for its involved disturbance term, a linear regression model. The

conditional mean in the model is

\[ E[y_i|x_i] = \beta'x_i - E[u_i|x_i]. \]

In most cases, \(E[u_i|x_i]\) is not a function of \(x_i\), so the derivatives of \(E[y_i|x_i]\) with respect to \(x_i\) are just \(\beta\). In other cases that we will consider, the conditional mean of \(u_i\) does depend on \(x_i\) or other variables, so the partial effects in the model might be more involved than this. Once again, however, these will usually not be of direct interest in the study. But, in all cases, \(\hat{E}[u|\varepsilon]\) will be an involved function of \(x_i\) and any other variables that appear anywhere else in the model. We will examine the partial effects on the efficiency estimators in Section E62.8.

E62.3.2 Results Saved by the Frontier Estimator

The results saved by the frontier estimator are

Matrices: b = regression parameters, \(\beta\),

varb = asymptotic covariance matrix

Scalars: sy, ybar, nreg, kreg, and logl

Last Function: JLMS estimator of ui.


Use ; Par to add the ancillary parameters to these. The ancillary parameters that are estimated for

the various models are as follows, including the scalars saved by the estimation program:

Half and truncated normal: estimates \(\lambda\), \(\sigma\), saves lmda and s = \(\sigma\),

Truncated normal: same as half normal, estimates \(\mu\), saved as mu,

Exponential: estimates \(\theta\), \(\sigma_v\), saves theta and s = \(\sigma_v\),

Heteroscedastic model: average value of \(\sigma\) as s, average value of \(\lambda\) as lmda,

Heterogeneity in mean: estimates \(\lambda\), \(\sigma\), saves lmda and s = \(\sigma\).

E62.4 Data for the Analysis of Frontier Models

We will use two data sets to illustrate the frontier estimators. The first, the data on U.S.

airlines, is a panel data set that we will use primarily for illustrating the stochastic frontier model.

The second, the famous WHO data on health care attainment, will be used both for the stochastic

frontier models and for the later work on data envelopment analysis.

E62.4.1 Data on U.S. Airlines

We will develop several examples in this section using a panel data set on the U.S. airline

industry from the pre-deregulation period (airlines.dat). The observations are an unbalanced panel

on 25 airlines. The original balanced panel data set contained 15 observations (1970-1984) on each

of 25 airlines. Mergers, strikes and other data problems reduced the sample to the unbalanced panel

of 256 observations. The group sizes (number of firms) are 2 (4), 4 (1), 7 (1), 9 (3), 10 (3), 11 (1), 12 (2), 13 (1), 14 (3) and 15 (6). The variables in the data set are

firm   = ID, 1,...,25           year     = 1970,...,1984              t       = year - 1969 = 1,...,15
cost   = total cost             revenue  = revenue                    output  = total output
stage  = average stage length   points   = number of points served    loadfct = load factor
cmtl   = materials cost         mtl      = materials quantity         pm      = price of material
cfuel  = fuel cost              fuel     = fuel quantity              pf      = fuel price
ceqpt  = equipment cost         eqpt     = equipment quantity         pe      = equipment price
clabor = labor cost             labor    = labor quantity             pl      = labor price
cprop  = property cost          property = property quantity          pp      = property price
k      = capital index          pk       = capital price index

Transformed variables used in the examples are as follows:

lc    = log(cost)         cn    = cost/pp           lcn   = log(cn)
lpm   = log(pm)           lpf   = log(pf)           lpe   = log(pe)
lpl   = log(pl)           lpp   = log(pp)           lpk   = log(pk)
lpmpp = log(pm/pp)        lpfpp = log(pf/pp)        lpepp = log(pe/pp)
lplpp = log(pl/pp)        lf    = log(fuel)         lm    = log(mtl)
le    = log(eqpt)         ll    = log(labor)        lp    = log(property)
lq    = log(output)       lq2   = lq^2


E62.4.2 World Health Organization (WHO) Health Attainment Data

The data used by the WHO in their 2000 World Health Report assessment of health care

attainment by 191 countries have been used by many researchers worldwide both for developing

frontier models and for analyzing health outcomes. The data are a panel of five years, 1993-1997, on

health outcome data for 191 countries and a number of internal political units, e.g., the states of

Mexico. The main outcome variables are dale and comp (an aggregate of such measures as

efficiency and equity of health care delivery in the country). The main input variables are hexp and

educ. A variety of other variables, listed below, were observed only in 1997. The following

descriptive statistics apply to the entire data set of 840 observations:

Variable   Mean          Std. Dev.     Description
country    *             *             country number omitting internal units, 1,...,191
year       *             *             year (1993-1997)
small      *             *             internal political unit, 0 for countries, else 1,...,6
comp       75.0062726    12.2051123    composite health care attainment
dale       58.3082712    12.1442590    disability adjusted life expectancy
hexp       548.214857    694.216237    health expenditure per capita, PPP units
educ       6.31753664    2.73370613    educational attainment, years
oecd       .279761905    .449149577    OECD member country, dummy variable
gdpc       8135.10785    7891.20036    per capita GDP in PPP units
popden     953.119353    2871.84294    population density per square KM
gini       .379477914    .090206941    gini coefficient for income distribution
tropics    .463095238    .498933251    dummy variable for tropical location
pubthe     58.1553571    20.2340835    proportion of health spending paid by government
geff       .113293978    .915983955    World Bank government effectiveness measure
voice      .192624849    .952225978    World Bank measure of democratization

(The data were analyzed in Greene (2004a,b). Some of the variables, such as popden and gdpc, were

augmented from other sources in these studies.) Although the data are a five year panel – a few

countries were observed for fewer than five years – there is almost no cross year variation in any

variable. (The proportion of total variation that is within groups is less than 1% for the four time

varying variables.) We have created a cross section from these data as follows: First, we discarded the

data on internal political units. We then averaged comp, dale, hexp and educ across the five years. We

retained a sample of 191 cross sectional (country) units. The following command set creates the data set.

SAMPLE ; 1-840 $

REJECT ; small > 0 $

SETPANEL ; Group = country ; Pds = ti $

RENAME ; hc3 = educ $

CREATE ; lpubthe = log(pubthe) $

CREATE ; dalebar = Group Mean(dale, Pds = ti) $

CREATE ; compbar = Group Mean(comp, Pds = ti) $

CREATE ; educbar = Group Mean(educ, Pds = ti) $

CREATE ; hexpbar = Group Mean(hexp, Pds = ti) $

CREATE ; logdbar = Log(dalebar) ; logcbar = Log(compbar) $

CREATE ; logebar = Log(educbar) ; loghbar = Log(hexpbar) $

CREATE ; loghbar2 = loghbar^2 $

REJECT ; year # 1997 $


E62.5 Skewness of the OLS Residuals and Problems Fitting Stochastic Frontier Models

Before maximum likelihood estimation begins, the skewness of the OLS residuals in the

regression of y on x is checked. Waldman (1982) has shown that when the OLS residuals are

skewed in the wrong direction, a solution for the maximum likelihood estimator for the stochastic

frontier model is simply OLS for the slopes and for \(\sigma_v^2\) and 0.0 for \(\sigma_u^2\). If this condition is found, a

lengthy warning is issued. We emphasize, this is not a bug in the program, nor is it something to be

'fixed,' beyond changing the specification of the model or rethinking the stochastic frontier as the

modeling platform. This is our single most frequently posed question, so we offer an application to

demonstrate the effect. Consider the commands

CALC ; Ran(12345) $

SAMPLE ; 1-500 $

CREATE ; u = Abs(Rnn(0,2))

; v = Rnn(0,1)

; x = Rnn(0,1)

; y = x + v + u $

REGRESS ; Lhs = y ; Rhs = one,x

; Res = e $

FRONTIER ; Lhs = y ; Rhs = one,x $

KERNEL ; Rhs = e $

The CREATE command generates y exactly according to the model, except note that u is not

subtracted, it is added. Thus, we should expect this model to perform poorly. The estimation results

from the FRONTIER command are shown below. Note the string of warnings. Estimation is

allowed to proceed, but the results are not a 'frontier' as such. The final estimate of \(\lambda\) is essentially zero, with a huge standard error, and the reported estimate of \(\sigma_u^2\) in the box above the results is 0.0000. The other estimates are, in fact, the same as OLS. The kernel density estimator for the OLS residuals is clearly skewed in the positive, that is, the wrong, direction. Once again, we emphasize, this is a failure of the data to conform to the model.

Error 315: Stoch. Frontier: OLS residuals have wrong skew. OLS is MLE.

WARNING! OLS residuals have the wrong skewness for SFM

Other forms of the model models may also behave poorly.

In this case, one MLE for the half normal model is OLS

for beta and sigma and zero for the inefficiency term.

Warning 141: Iterations:current or start estimate of sigma nonpositive





Line search at iteration 30 does not improve fn. Exiting optimization.


-----------------------------------------------------------------------------

Limited Dependent Variable Model - FRONTIER

Dependent variable Y

Log likelihood function -921.33848

Estimation based on N = 500, K = 4

Inf.Cr.AIC = 1850.7 AIC/N = 3.701

Variances: Sigma-squared(v)= 2.33375

Sigma-squared(u)= .00000

Sigma(v) = 1.52766

Sigma(u) = .00000

Sigma = Sqr[(s^2(u)+s^2(v)]= 1.52766

Gamma = sigma(u)^2/sigma^2 = .00000

Stochastic Production Frontier, e = v-u

LR test for inefficiency vs. OLS v only

Deg. freedom for sigma-squared(u): 1

Deg. freedom for heteroscedasticity: 0

Deg. freedom for truncation mean: 0

Deg. freedom for inefficiency model: 1

LogL when sigma(u)=0 -921.33851

Chi-sq=2*[LogL(SF)-LogL(LS)] = .000

Kodde-Palm C*: 95%: 2.706, 99%: 5.412

--------+--------------------------------------------------------------------

| Standard Prob. 95% Confidence

Y| Coefficient Error z |z|>Z* Interval

--------+--------------------------------------------------------------------

|Deterministic Component of Stochastic Frontier Model

Constant| 1.61107 165.2912 .01 .9922 -322.35365 325.57580

X| 1.00746*** .07057 14.28 .0000 .86914 1.14578

|Variance parameters for compound error

Lambda| .10897D-05 135.6070 .00 1.0000 -.26578D+03 .26578D+03

Sigma| 1.52766*** .00242 630.99 .0000 1.52292 1.53241

--------+--------------------------------------------------------------------

Figure E62.1 Kernel Density for Least Squares Residuals
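The sign check behind the warning above can be sketched outside LIMDEP. The following Python fragment is an illustration only; it mimics the CREATE commands but uses Python's random number generator, so the draws differ from LIMDEP's, and it uses a larger sample than the 500 observations in the text so that the sign of the third moment is stable. The helper names are assumptions.

```python
import random

def ols_residuals(x, y):
    """Residuals from a least squares regression of y on a constant and x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    intercept = y_bar - slope * x_bar
    return [yi - intercept - slope * xi for xi, yi in zip(x, y)]

def third_moment(e):
    """Third central moment of the residuals; its sign gives the skew direction."""
    n = len(e)
    mean = sum(e) / n
    return sum((ei - mean) ** 3 for ei in e) / n

rng = random.Random(12345)
n = 5000
u = [abs(rng.gauss(0.0, 2.0)) for _ in range(n)]
v = [rng.gauss(0.0, 1.0) for _ in range(n)]
x = [rng.gauss(0.0, 1.0) for _ in range(n)]

y_wrong = [xi + vi + ui for xi, vi, ui in zip(x, v, u)]   # u added, as in the text
y_right = [xi + vi - ui for xi, vi, ui in zip(x, v, u)]   # u subtracted

m3_wrong = third_moment(ols_residuals(x, y_wrong))
m3_right = third_moment(ols_residuals(x, y_right))
print("third moment with u added:     ", m3_wrong)
print("third moment with u subtracted:", m3_right)
```

A positive third moment for a production frontier is the 'wrong' skewness; with u subtracted, the third moment is negative and the condition does not arise.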


Unfortunately, the Waldman result is a sufficient condition, not a necessary one. That is, it

has been shown that when the OLS residuals have the 'right' skewness, then the MLE for the frontier model is unique, and you will have no trouble in estimation. When they have the 'wrong' skewness,

it is only shown that the OLS results are a local stationary point of the log likelihood, not that they

are the global maximizers. There may be another point that is yet better than OLS. Our airline data

used below provide an example. Consider the following results, where we present both the

stochastic frontier estimates and OLS. (The model, itself, is developed later, so we show only the

useful results here.) As above, we receive the initial warning about the skewness of the OLS

residuals. Then, estimation proceeds and an apparently routine solution emerges that is different

from, and better than (has a higher log likelihood) OLS.

Error 315: Stoch. Frontier: OLS residuals have wrong skew. OLS is MLE.

WARNING! OLS residuals have the wrong skewness for SFM

Other forms of the model models may also behave poorly.

In this case, one MLE for the half normal model is OLS

for beta and sigma and zero for the inefficiency term.

Normal exit: 11 iterations. Status=0, F= -105.0617

-----------------------------------------------------------------------------


Dependent variable LQ

Log likelihood function 105.06169

Variances: Sigma-squared(v)= .02411


Sigma(v) = .15527

Sigma(u) = .06757


--------+--------------------------------------------------------------------


LQ| Coefficient Error z |z|>Z* Interval

--------+--------------------------------------------------------------------


Constant| -1.05847*** .02333 -45.37 .0000 -1.10419 -1.01274

LF| .38355*** .07045 5.44 .0000 .24547 .52163

LE| .21961*** .07300 3.01 .0026 .07653 .36270

LM| .71667*** .07654 9.36 .0000 .56666 .86668

LL| -.41139*** .06382 -6.45 .0000 -.53647 -.28630

LP| .18973*** .02960 6.41 .0000 .13171 .24775


Lambda| .43515** .20117 2.16 .0305 .04086 .82944

Sigma| .16933*** .00057 295.74 .0000 .16821 .17045

--------+--------------------------------------------------------------------

Ordinary least squares regression ............

Diagnostic Log likelihood = 105.05876

Standard error of e = .16244

--------+--------------------------------------------------------------------


LQ| Coefficient Error t |t|>T* Interval

--------+--------------------------------------------------------------------

Constant| -1.11237*** .01015 -109.57 .0000 -1.13227 -1.09247

LF| .38283*** .07116 5.38 .0000 .24335 .52231

LE| .21922*** .07389 2.97 .0033 .07441 .36404

LM| .71924*** .07732 9.30 .0000 .56769 .87078

LL| -.41015*** .06455 -6.35 .0000 -.53665 -.28364

LP| .18802*** .02980 6.31 .0000 .12961 .24643

--------+--------------------------------------------------------------------
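The margin by which the frontier solution improves on OLS can be checked with the likelihood ratio statistic that the program reports, Chi-sq = 2*[LogL(SF) - LogL(LS)]. The sketch below is plain Python arithmetic on the two log likelihoods shown above; the function name is an assumption, and 2.706 is the 95% Kodde-Palm critical value printed in the program output earlier in this section.

```python
def lr_statistic(logl_frontier, logl_ols):
    """Likelihood ratio statistic 2*[LogL(SF) - LogL(LS)]."""
    return 2.0 * (logl_frontier - logl_ols)

# Log likelihoods reported in the two sets of results above.
lr = lr_statistic(105.06169, 105.05876)
critical_95 = 2.706   # Kodde-Palm value shown in the program output
print("LR statistic:", round(lr, 5))
print("exceeds 95% critical value:", lr > critical_95)
```

The statistic is far below the critical value, so although the frontier solution has a higher log likelihood than OLS, the improvement is very small.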


There is no simple bulletproof strategy for handling this situation. You can try different starting values with ; Start = values for \(\beta\), \(\lambda\), \(\sigma\) that differ from OLS, but it is hard to know where

these will come from. Moreover, it is likely that you will end up at OLS anyway. As Waldman

points out, this is a potentially ill behaved log likelihood function. We offer the preceding as a

caution for the practitioner. For the particular data set used here, we can identify a specific culprit.

The 'failure' of the model emerges in the presence of the variable lm, and does not occur when lm is

omitted from the equation. We have no theory, however, for why this should be the case. Simply

deleting variables from the model until one which does not have the skewness problem emerges does

not seem like an effective strategy.

We do note, the failure might signal a misspecified model. For example, for our airlines

example, the specification above omits the capital variable. When lk = log(k) is added to the model, we

obtain the following quite routine results (albeit with the wrong signs on capital and labor inputs).


-----------------------------------------------------------------------------





Inf.Cr.AIC = -198.9 AIC/N = -.777



Sigma(v) = .13791

Sigma(u) = .13007

Sigma = Sqr[(s^2(u)+s^2(v)]= .18957


Var[u]/{Var[u]+Var[v]} = .24425







LogL when sigma(u)=0 108.07431


Kodde-Palm C*: 95%: 2.706, 99%: 5.412

--------+--------------------------------------------------------------------



--------+--------------------------------------------------------------------

| Deterministic Component of Stochastic Frontier Model

Constant| -2.98823*** .72136 -4.14 .0000 -4.40206 -1.57439

LF| .37257*** .07038 5.29 .0000 .23463 .51052

LE| 2.09473*** .68790 3.05 .0023 .74647 3.44299

LM| .69910*** .07580 9.22 .0000 .55054 .84766

LL| -.42909*** .06315 -6.79 .0000 -.55287 -.30530

LP| .44533*** .09498 4.69 .0000 .25917 .63149

LK| -2.09806*** .76556 -2.74 .0061 -3.59853 -.59759

| Variance parameters for compound error

Lambda| .94309*** .16870 5.59 .0000 .61244 1.27373

Sigma| .18957*** .00064 297.81 .0000 .18832 .19082

--------+--------------------------------------------------------------------

We emphasize, the Waldman result, and this particular theoretical outcome, is specific to the

normal-half normal model. However, when it occurs, problems of a similar sort will often, but not

always, show up in other models. Thus, in spite of a warning, your fitted exponential, or panel data

model, may be quite satisfactory.


E62.6 The Ordinary Least Squares Estimator

For the simplest specification

\[ y = \beta'x + v - u, \quad u = |U| \]

in which x contains a constant term and both v and U are homoscedastic and have zero means, i.e., in the original half normal or exponential models, the OLS estimator of all elements of \(\beta\) except the constant term is consistent. It is convenient to rewrite the model as

\[ y = \beta_0 + \beta_1'x_1 + v - u. \]

Under the assumptions, we can write the model as

\[ y = (\beta_0 - E[u]) + \beta_1'x_1 + v - (u - E[u]) \]

or

\[ y = \alpha + \beta_1'x_1 + e \]

in which e has zero mean and constant variance, and is orthogonal to \((1, x_1)\). Thus, the model as shown can be estimated consistently by OLS. The constant term estimates \(\alpha = (\beta_0 - E[u])\). Assuming that E[u] is estimable, therefore, estimation of \(\beta\) by MLE vs. OLS is a question of efficiency, not consistency. (However, we remain interested in estimation of u, so this may be a moot point.)
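The argument above can be illustrated with simulated data. The following Python sketch is not LIMDEP code; the parameter values, seed and function name are assumptions. It fits OLS to data generated from a normal-half normal production frontier and compares the estimated intercept with \(\beta_0 - E[u]\), where \(E[u] = \sigma_u\sqrt{2/\pi}\).

```python
import math
import random

def ols_fit(x, y):
    """Intercept and slope from least squares regression of y on a constant and x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - slope * x_bar, slope

# Illustrative (assumed) parameter values.
beta0, beta1, sigma_v, sigma_u = 1.0, 0.5, 0.2, 0.3
rng = random.Random(2024)
n = 20000
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma_v) - abs(rng.gauss(0.0, sigma_u))
     for xi in x]

a, b = ols_fit(x, y)
expected_u = sigma_u * math.sqrt(2.0 / math.pi)
print("OLS slope:", round(b, 4), " true slope:", beta1)
print("OLS intercept:", round(a, 4), " beta0 - E[u]:", round(beta0 - expected_u, 4))
```

The slope is recovered, while the intercept settles near \(\beta_0 - E[u]\) rather than \(\beta_0\).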

E62.6.1 Corrected Ordinary Least Squares – COLS

The COLS estimator is obtained by turning the least squares estimator into a deterministic

frontier model. This is done by shifting the intercept in the OLS estimator upward (for a production

frontier) or downward (for a cost frontier) so that all points lie either below or above the estimated

function. Figure E62.2 shows the result for estimation of a simple cost frontier for the airlines data.

The function is shifted so that it rests on the single most extreme point (residual) in the data. The

COLS estimator is requested with

FRONTIER ; Lhs = goal variable

; Rhs = one, …

; Model = COLS $

Add ; Cost if the model is a cost frontier.

Efficiency values, as discussed below, are obtained as follows:

; Eff = variable name

saves the residuals from the deterministic frontier. These are the estimates of ui. Note in Figure E62.2,

for a cost frontier, all values of ui are positive. If you fit a production frontier, then all points will lie

below the regression and all residuals will be negative. The estimated inefficiency that is saved will be

-ei. Thus, in both cases, the values saved by ; Eff = variable are the positive estimates of the size of

the deviation of the observation from the frontier. The estimator saved by ; Eff = variable name is the

inefficiency estimate, in this model, a direct estimate of ui. The estimator of technical or cost efficiency

is

\[ \text{Efficiency} = \exp(-\hat{u}_i). \]


If you fit a production frontier, use

; Techeff = variable name

to save this variable. For a cost frontier, use

; Costeff = variable name
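The COLS shift and the resulting efficiency values can be sketched as follows. This is a minimal Python illustration based on the description above, not LIMDEP's implementation; the function names and the three residuals are hypothetical.

```python
import math

def cols_inefficiency(residuals, cost=False):
    """Nonnegative deviations from the shifted (deterministic) frontier.

    Cost frontier: the function is shifted down to the smallest residual.
    Production frontier: the function is shifted up to the largest residual.
    """
    if cost:
        lowest = min(residuals)
        return [e - lowest for e in residuals]
    highest = max(residuals)
    return [highest - e for e in residuals]

def cols_efficiency(u):
    """Efficiency = exp(-u) for each inefficiency estimate."""
    return [math.exp(-ui) for ui in u]

ols_e = [-0.2, 0.1, 0.3]                      # hypothetical OLS residuals
u_cost = cols_inefficiency(ols_e, cost=True)  # deviations above a cost frontier
u_prod = cols_inefficiency(ols_e)             # deviations below a production frontier
print(u_cost, cols_efficiency(u_cost))
print(u_prod, cols_efficiency(u_prod))
```

In both cases exactly one observation sits on the frontier, with inefficiency zero and efficiency one.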

Figure E62.2 COLS Estimator of Cost Frontier Function

The following shows computation of a COLS estimator for the airlines. The FRONTIER

command requests both the inefficiency estimates, ui, and the cost efficiency estimates, eui_cost.

The kernel density estimate for the cost efficiency is shown in Figure E62.3. The results for the

estimator begin with the standard output for least squares regression. The second panel includes

some preliminary results for the stochastic frontier model, including the chi squared test for zero skewness, \(\chi^2 = (n/6)(m_3/s^3)^2\) (which is not rejected here, since the statistic of 1.94294 is below the critical value of 3.84). The standard normal statistic is the signed (based on \(m_3\)) square root of \(\chi^2\). The third panel presents descriptive statistics for \(u_i\) and \(\exp(-u_i)\).
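The skewness statistics can be reproduced from the normalized skewness and the sample size. The following Python sketch is illustrative (the function name is an assumption); the inputs n = 256 and m3/s^3 = .21340 are the values reported in the COLS output for the airlines data.

```python
import math

def skewness_test(n, normalized_skewness):
    """Chi squared statistic (n/6)(m3/s^3)^2 and its signed square root."""
    chi2 = (n / 6.0) * normalized_skewness ** 2
    z = math.copysign(math.sqrt(chi2), normalized_skewness)
    return chi2, z

chi2, z = skewness_test(256, 0.21340)
print("chi squared:", round(chi2, 4), " standard normal:", round(z, 4))
print("reject zero skewness at 5%:", chi2 > 3.84)
```

The results agree with the reported values of 1.94294 and 1.39389 up to the rounding of the normalized skewness.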

CREATE ; lc = Log(cost/pp)

; lpkp = Log(pk/pp)

; lplp = Log(pl/pp)

; lpmp = Log(pm/pp)

; lpep = Log(pe/pp)

; lpfp = Log(pf/pp) $

CREATE ; lk = Log(k) $

CREATE ; ly = Log(output) ; ly2 = .5*ly*ly $

FRONTIER ; Lhs = lc ; Rhs = one,ly,ly2,lpkp,lplp,lpmp,lpep,lpfp

; Cost ; Model = COLS

; Costeff = Eui_cost ; Eff = ui $

KERNEL ; Rhs = eui_cost

; Title = Estimated Cost Efficiency Based on COLS Estimator $


-----------------------------------------------------------------------------

Corrected OLS Deterministic Frontier Cost Function

LHS=LC Mean = 2.84024

Standard deviation = 1.09256

No. of observations = 256 Degrees of freedom

Regression Sum of Squares = 300.028 7

Residual Sum of Squares = 4.36487 248

Total Sum of Squares = 304.393 255


Fit R-squared = .98566 R-bar squared = .98526

Model test F[ 7, 248] = 2435.25310 Prob F > F* = .00000

Diagnostic Log likelihood = 157.91523 Akaike I.C. = -4.00909

Restricted (b=0) = -385.41031 Bayes I.C. = -3.89830

Chi squared [ 7] = 1086.65108 Prob C2 > C2* = .00000

--------------------------------------------------

Skewness test for inefficiency based on residuals

Normalized skewness = m3/s^3 = .21340

Chi squared test (1 degree of freedom) 1.94294 Critical value= 3.84000

Standard normal test statistic 1.39389 Test value = +/- 1.96000

Estimated Efficiency Values Based on e(i)+Min e(i)

--------+-----------------------------------------

| Mean Std.Dev. Minimum Maximum

CostInef| .357 .133 .000 .773

Cost Eff| .706 .091 .462 1.000

--------+--------------------------------------------------------------------


LC| Coefficient Error z |z|>Z* Interval

--------+--------------------------------------------------------------------

|Deterministic COLS Frontier Function

Constant| 19.4363 27.45697 .71 .4790 -34.3783 73.2510

LY| .94303*** .01809 52.12 .0000 .90757 .97849

LY2| .08248*** .01236 6.67 .0000 .05825 .10671

LPKP| 1.42385 2.14849 .66 .5075 -2.78711 5.63480

LPLP| .01915 .10169 .19 .8506 -.18016 .21847

LPMP| .04504 1.41721 .03 .9746 -2.73264 2.82272

LPEP| -.57070 .67904 -.84 .4007 -1.90159 .76019

LPFP| -.04811** .01986 -2.42 .0154 -.08704 -.00919

--------+--------------------------------------------------------------------

Note: ***, **, * ==> Significance at 1%, 5%, 10% level.

Figure E62.3 Kernel Estimator for Cost Efficiency


E62.6.2 Modified OLS and Starting Values for the MLE

Under the specific distributional assumptions of the half normal and exponential models, we

do have method of moments estimators of the underlying parameters. They are based on the moment

equations

Var[e] = Var[v] + Var[u]

and Skewness[e] = Skewness[u]

since v is symmetric. The left hand sides can be consistently estimated using the OLS residuals:

\[ m_2 = (1/n)\sum_i e_i^2 \]

and

\[ m_3 = (1/n)\sum_i e_i^3. \]

Both of the functions on the right hand side are known for the half normal and exponential models.

In particular, for the half normal model, the moment equations are

\[ m_2 = \sigma_v^2 + [1 - 2/\pi]\sigma_u^2, \]

\[ m_3 = (2/\pi)^{1/2}[1 - 4/\pi]\sigma_u^3. \]

The solutions are:

\[ \hat\sigma_u = \left[\frac{m_3\sqrt{\pi/2}}{1 - 4/\pi}\right]^{1/3} \]

and

\[ \hat\sigma_v^2 = m_2 - (1 - 2/\pi)\hat\sigma_u^2. \]

Note that there is no solution for \(\sigma_u\) if \(m_3\) is not negative, which is the problem discussed in Section E62.5. Assuming that this problem does not arise, the corrected constant term is

\[ a + \text{Est.}E[u] = a + \hat\sigma_u\sqrt{2/\pi}. \]

This is the 'modified least squares' (MOLS) estimator that is discussed in a number of sources, such as Greene (2005). These are the values used for starting values for the MLE, as well. Looking ahead, note that there is no natural method of moments estimator for the mean parameter in the truncated normal model discussed in Section E63.3. For this model, we use

\[ \mu/\sigma_u = 0. \]

For the normal-exponential model, the moment equations that correspond to the preceding are

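\[ m_2 = \sigma_v^2 + 1/\theta^2, \]

\[ m_3 = -2/\theta^3. \]

Therefore,

\[ \hat\theta = (-2/m_3)^{1/3} \quad\text{and}\quad \hat\sigma_v^2 = m_2 - 1/\hat\theta^2, \]

and the corrected constant term is \(a + 1/\hat\theta\).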
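The moment solutions above can be sketched outside LIMDEP as follows. This is a minimal illustration in Python, not LIMDEP code; the function names mols_half_normal and mols_exponential and the returned dictionary keys are our own hypothetical choices. The inputs m2 and m3 are assumed to be the residual moments defined above (with residual signs already reversed for a cost frontier), and a is the OLS constant.

```python
import math

def mols_half_normal(m2, m3, a):
    """Method of moments values for the normal-half normal model.

    m2, m3 : second and third moments of the OLS residuals
    a      : OLS constant term
    """
    if m3 >= 0:
        # The cube root below has no admissible solution in this case.
        raise ValueError("m3 is not negative: no solution for sigma_u")
    sigma_u = (m3 * math.sqrt(math.pi / 2) / (1 - 4 / math.pi)) ** (1 / 3)
    sigma_v2 = m2 - (1 - 2 / math.pi) * sigma_u ** 2
    if sigma_v2 <= 0:
        raise ValueError("implied Var[v] is not positive")
    sigma_v = math.sqrt(sigma_v2)
    return {
        "sigma_u": sigma_u,
        "sigma_v": sigma_v,
        "constant": a + sigma_u * math.sqrt(2 / math.pi),  # a + Est.E[u]
        "lambda": sigma_u / sigma_v,
        "sigma": math.sqrt(sigma_u ** 2 + sigma_v2),
    }

def mols_exponential(m2, m3, a):
    """Method of moments values for the normal-exponential model."""
    if m3 >= 0:
        raise ValueError("m3 is not negative: no solution for theta")
    theta = (-2 / m3) ** (1 / 3)
    sigma_v2 = m2 - 1 / theta ** 2
    if sigma_v2 <= 0:
        raise ValueError("implied Var[v] is not positive")
    return {
        "theta": theta,
        "sigma_v": math.sqrt(sigma_v2),
        "constant": a + 1 / theta,  # a + Est.E[u]
    }
```

Both functions raise an error rather than return a value when the third moment has the wrong sign, which mirrors the remark that no solution exists in that case.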


The header information in the results table will display the decomposition of the variance of the composed error in two parts. In the case of the half normal model,

\[ \text{Var}[u] = [(\pi - 2)/\pi]\sigma_u^2 \]

not \(\sigma_u^2\). Therefore, the estimated parameters might be a bit misleading as to the relative influence of \(u\) on the total variation in the structural disturbance.

We note, these estimators are sometimes quite far from the maximum likelihood estimators,

particularly when the sample is small. But, they are generally quite satisfactory as starting values for

the MLE. The following demonstrates these results for the airline data, where we use MOLS and

MLE to fit a normal-half normal cost frontier. (Note, the signs of the OLS residuals are reversed

because we are fitting a cost function.) In the results below, we have imposed the assumption of

linear homogeneity in prices in the cost function by normalizing the six input prices, pk, pl, pe, pp,

pm, pf, by the property price, pp. The model contains log(pj/pp). To complete the constraint, we

have also normalized total cost by pp before taking logs.

CREATE ; lpk = Log(pk) $

CREATE ; lpmpp = lpm - lpp ; lpfpp = lpf - lpp ; lpepp = lpe - lpp

; lplpp = lpl - lpp ; lpkpp = lpk - lpp $

CREATE ; lcp = lc - lpp $

NAMELIST ; x = one,ly,ly2,lpkp,lplp,lpmp,lpep,lpfp $

REGRESS ; Lhs = lc ; Rhs = x ; Res = e $

CREATE ; e = -e ; e2 = e*e ; e3 = e2*e $

CALC ; m2 = Xbr(e2) ; m3 = Xbr(e3) $

CALC ; List ; su = (m3 * Sqr(pi/2) / (1-4/pi))^(1/3)

; sv = Sqr(m2 - (1-2/pi) * su^2)

; a = b(1) + su * Sqr(2/pi) ; lambda = su/sv

; sgma = Sqr(su^2 + sv^2) $

FRONTIER ; Lhs = lc ; Rhs = x ; Cost $

The first set of results below are the OLS estimates with the correction to the constant term and the method of moments estimators of \(\sigma_u\) and \(\sigma_v\) used to start the MLE. The maximum likelihood estimators are shown next. The estimates for the stochastic frontier model include the log likelihood and the implied estimates of \(\sigma_u\), \(\sigma_v\) and their squares, based on the estimates of \(\lambda = \sigma_u/\sigma_v\) and \(\sigma^2 = \sigma_u^2 + \sigma_v^2\), which are estimated by ML. (The reverse transformations are \(\sigma_u^2 = \sigma^2\lambda^2/(1 + \lambda^2)\) and \(\sigma_v^2 = \sigma^2/(1 + \lambda^2)\).) The MLE is documented further in the next section.

-----------------------------------------------------------------------------

Ordinary least squares regression ............

LHS=LC Mean = 2.84024













--------+--------------------------------------------------------------------


LC| Coefficient Error t |t|>T* Interval

--------+--------------------------------------------------------------------

Constant| 19.7932 27.45697 .72 .4717 -34.0214 73.6079

LY| .94303*** .01809 52.12 .0000 .90757 .97849

LY2| .08248*** .01236 6.67 .0000 .05825 .10671

LPKP| 1.42385 2.14849 .66 .5081 -2.78711 5.63480

LPLP| .01915 .10169 .19 .8508 -.18016 .21847

LPMP| .04504 1.41721 .03 .9747 -2.73264 2.82272

LPEP| -.57070 .67904 -.84 .4015 -1.90159 .76019

LPFP| -.04811** .01986 -2.42 .0161 -.08704 -.00919

--------+--------------------------------------------------------------------

[CALC] SU = .1296481

[CALC] SV = .1046056

[CALC] A = 19.8966785

[CALC] LAMBDA = 1.2393989

[CALC] SGMA = .1665862

Calculator: Computed 5 scalar results

-----------------------------------------------------------------------------


Dependent variable LCN



Inf.Cr.AIC = -298.4 AIC/N = -1.166



Sigma(v) = .10103

Sigma(u) = .13746

Sigma = Sqr[(s^2(u)+s^2(v)]= .17059


Var[u]/{Var[u]+Var[v]} = .40216

Stochastic Cost Frontier Model, e = v+u







Chi-sq=2*[LogL(SF)-LogL(LS)] = 2.584

Kodde-Palm C*: 95%: 2.706, 99%: 5.412

--------+--------------------------------------------------------------------


LCN| Coefficient Error z |z|>Z* Interval

--------+--------------------------------------------------------------------


Constant| 19.8020 25.91115 .76 .4447 -30.9829 70.5869

LY| .95577*** .01781 53.68 .0000 .92088 .99067

LY2| .09086*** .01198 7.58 .0000 .06738 .11435

LPKP| 1.43400 2.02750 .71 .4794 -2.53982 5.40783

LPLP| .01242 .09676 .13 .8979 -.17722 .20205

LPMP| .05744 1.33747 .04 .9657 -2.56396 2.67883

LPEP| -.56860 .64356 -.88 .3770 -1.82995 .69275

LPFP| -.06002*** .01993 -3.01 .0026 -.09907 -.02096


Lambda| 1.36059*** .20306 6.70 .0000 .96261 1.75857

Sigma| .17059*** .00058 294.50 .0000 .16946 .17173

--------+--------------------------------------------------------------------


E62.7 Estimating the Normal-Half Normal and Normal-Exponential Models

ALS's canonical form of the model is the normal-half normal model,

\[ y = \beta'x + v - Su, \quad u = |U|, \quad S = +1 \text{ for production, } -1 \text{ for cost,} \]

\[ U \sim N[0,\sigma_u^2], \]

\[ v \sim N[0,\sigma_v^2]. \]

The command for estimating the stochastic frontier model is

FRONTIER ; Lhs = y ; Rhs = one, ... $

The default form is the normal-half normal model. In this form, model estimates consist of \(\beta\), \(\sigma = \sqrt{\sigma_v^2 + \sigma_u^2}\) and \(\lambda = \sigma_u/\sigma_v\), and the usual set of diagnostic statistics for models fit by maximum likelihood. The other basic form in the ALS model is the exponential model,

\[ f(u) = \theta\exp(-\theta u), \quad u > 0, \]

which has mean inefficiency \(E[u] = 1/\theta\) and standard deviation \(\sigma_u = 1/\theta\). The parameters estimated in the exponential specification are \((\beta, \theta, \sigma_v)\). The estimate of \(\sigma_u\) is reported in the results as well.

The following illustrate the estimator, with a normal-half normal cost frontier and a normal-

exponential production frontier. The coefficient estimates for the exponential cost frontier are shown

as well.

FRONTIER ; Cost ; Lhs = lcn ; Rhs = x $

FRONTIER ; Cost ; Lhs = lcn; Rhs = x; Model = Exponential $

The stochastic frontier results include the standard output for MLEs. The derived estimates of \(\sigma_u\), \(\sigma_v\), \(\sigma_u^2\), \(\sigma_v^2\) and \(\sigma\) are shown as well. The value of \(\gamma = \sigma_u^2/\sigma^2\) is given for comparability with other parts of the literature. This ratio, which lies in (0,1), is sometimes reported as a variance decomposition of \(\varepsilon\). However, the variance of \(u = |U|\) is \((1 - 2/\pi)\sigma_u^2\), so the appropriate decomposition is \((1 - 2/\pi)\sigma_u^2/[\sigma_v^2 + (1 - 2/\pi)\sigma_u^2]\). This is the value shown next in the results, labeled Var[u]/{Var[u]+Var[v]}.
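As a check on the two ratios just described, the following Python sketch (not LIMDEP code; the function name variance_shares is hypothetical) recovers \(\sigma_u\) and \(\sigma_v\) from estimates of \(\lambda\) and \(\sigma\), then computes both \(\gamma\) and the decomposition based on Var[u]. Using the half normal cost frontier estimates reported in this section, \(\lambda\) = 1.36059 and \(\sigma\) = .17059, it reproduces the displayed Sigma(u), Sigma(v) and Var[u]/{Var[u]+Var[v]} to the printed precision.

```python
import math

def variance_shares(lam, sigma):
    """Map (lambda, sigma) to sigma_u, sigma_v and the two variance ratios."""
    sigma_v = sigma / math.sqrt(1 + lam ** 2)
    sigma_u = lam * sigma_v
    gamma = sigma_u ** 2 / sigma ** 2           # sigma_u^2 / sigma^2
    var_u = (1 - 2 / math.pi) * sigma_u ** 2    # variance of u = |U|
    share = var_u / (sigma_v ** 2 + var_u)      # Var[u] / (Var[u] + Var[v])
    return sigma_u, sigma_v, gamma, share

sigma_u, sigma_v, gamma, share = variance_shares(1.36059, 0.17059)
print(round(sigma_u, 5), round(sigma_v, 5), round(share, 5))
```

Note that \(\gamma\) is noticeably larger than the share based on Var[u], which is the point made in the text.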

A likelihood ratio test against the hypothesis of no inefficiency follows the variance estimates. The degrees of freedom for the test are accumulated in the table. The first is for \(\sigma_u\) in the base case. The second is for the heteroscedasticity terms in Var[u] when they are introduced in the model. Heteroscedasticity is developed in Chapter E63. The third term is for the truncation parameters in the normal-truncated normal model, also developed in the next chapter. The "degrees of freedom for the inefficiency model" are the sum of these three terms. The likelihood ratio statistic is presented next. This is a nonstandard test because the null value of \(\sigma_u\) is on the boundary of the parameter space. Appropriate tables for the mixed chi squared test used here are given in Kodde and Palm (1986). (A copy of the relevant parts of the table is kept internally by the program. See also Coelli, Rao and Battese (1998) for further details.)


-----------------------------------------------------------------------------





Inf.Cr.AIC = -298.4 AIC/N = -1.166



Sigma(v) = .10103

Sigma(u) = .13746

Sigma = Sqr[(s^2(u)+s^2(v)]= .17059


Var[u]/{Var[u]+Var[v]} = .40216









Kodde-Palm C*: 95%: 2.706, 99%: 5.412

--------+--------------------------------------------------------------------



--------+--------------------------------------------------------------------


Constant| 19.8020 25.91115 .76 .4447 -30.9829 70.5869

LY| .95577*** .01781 53.68 .0000 .92088 .99067

LY2| .09086*** .01198 7.58 .0000 .06738 .11435

LPKP| 1.43400 2.02750 .71 .4794 -2.53982 5.40783

LPLP| .01242 .09676 .13 .8979 -.17722 .20205

LPMP| .05744 1.33747 .04 .9657 -2.56396 2.67883

LPEP| -.56860 .64356 -.88 .3770 -1.82995 .69275

LPFP| -.06002*** .01993 -3.01 .0026 -.09907 -.02096


Lambda| 1.36059*** .20306 6.70 .0000 .96261 1.75857

Sigma| .17059*** .00058 294.50 .0000 .16946 .17173

--------+--------------------------------------------------------------------


-----------------------------------------------------------------------------

Results for the normal-exponential model appear below. It is not possible to use a LR test to

choose between these two models. The test has zero degrees of freedom – neither model is obtained

by a restriction on the other. One possibility might be a Vuong (1989) statistic, which would be

computed as

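\[ V = \sqrt{n}\,\bar{m}/s_m, \quad m_i = \log f(y_i \mid \text{normal}) - \log f(y_i \mid \text{exponential}), \]

where \(\bar{m}\) and \(s_m\) are the sample mean and standard deviation of \(m_i\).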

Results of the test are shown below the model results. The statistic is well inside the inconclusive

region.
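For readers who want to reproduce the statistic from saved log likelihood contributions, a minimal Python sketch follows (not LIMDEP code; the function name vuong_statistic is hypothetical). It assumes the two lists hold the per-observation log densities from the two models in the same observation order, and it uses the sample standard deviation with divisor n - 1, which is an assumption about the convention and not something stated in this chapter.

```python
import math
import statistics

def vuong_statistic(logl_a, logl_b):
    """Vuong statistic sqrt(n) * mean(m) / sd(m), with m_i = logl_a[i] - logl_b[i]."""
    if len(logl_a) != len(logl_b):
        raise ValueError("the two models must use the same observations")
    m = [a - b for a, b in zip(logl_a, logl_b)]
    n = len(m)
    # statistics.stdev uses the n - 1 divisor (assumed convention).
    return math.sqrt(n) * statistics.mean(m) / statistics.stdev(m)
```

Large positive values favor the first model and large negative values favor the second; values between -1.96 and +1.96 are inconclusive at the 5% level.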


-----------------------------------------------------------------------------





Inf.Cr.AIC = -299.8 AIC/N = -1.171

Exponential frontier model



Sigma(v) = .10709

Sigma(u) = .07539









Kodde-Palm C*: 95%: 2.706, 99%: 5.412

--------+--------------------------------------------------------------------



--------+--------------------------------------------------------------------


Constant| 22.6569 25.48354 .89 .3740 -27.2899 72.6038

LY| .96069*** .01892 50.77 .0000 .92360 .99777

LY2| .09281*** .01249 7.43 .0000 .06832 .11729

LPKP| 1.65439 1.99409 .83 .4067 -2.25395 5.56272

LPLP| -.00962 .09785 -.10 .9217 -.20140 .18216

LPMP| -.06595 1.31569 -.05 .9600 -2.64465 2.51275

LPEP| -.62841 .63243 -.99 .3204 -1.86795 .61114

LPFP| -.06397*** .02033 -3.15 .0017 -.10381 -.02412


Theta| 13.2651*** 2.90719 4.56 .0000 7.5671 18.9630

Sigmav| .10709*** .00980 10.93 .0000 .08788 .12629

--------+--------------------------------------------------------------------

FRONTIER ; … half normal model $

CREATE ; fn = logl_obs $

FRONTIER ; … Model = Exponential $

CREATE ; fe = logl_obs

; mi = fn - fe $

CALC ; List

; vuong = Sqr(n) * Xbr(mi)/Sdv(mi) $

[CALC] VUONG = -.9047927


E62.7.1 Log Likelihoods for the Half Normal and Exponential Models

As will be evident below, different formulations of the log likelihood are most convenient for estimation of the different forms of the frontier models. (And, different authors sometimes parameterize the models differently.) The base case is the normal-half normal model. In this form, \(v_i \sim N[0,\sigma_v^2]\) and \(u_i = |U_i|\) where \(U_i \sim N[0,\sigma_u^2]\). It follows that \(f(u_i) = (2/\sigma_u)\phi(u_i/\sigma_u)\), \(u_i > 0\). The density of \(\varepsilon_i = v_i - u_i\) has been shown to be

\[ f(\varepsilon_i) = (2/\sigma)\,\phi(\varepsilon_i/\sigma)\,\Phi(-\varepsilon_i\lambda/\sigma). \]

The most common form of the individual term in the log likelihood function (and the one used in

LIMDEP) is

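\[ \log L_i = \tfrac{1}{2}\log(2/\pi) - \log\sigma - \tfrac{1}{2}(\varepsilon_i/\sigma)^2 + \log\Phi[-S\varepsilon_i\lambda/\sigma] \]

where \(\varepsilon_i = y_i - \beta'x_i\),

\(\lambda = \sigma_u/\sigma_v\),

\(\sigma^2 = \sigma_u^2 + \sigma_v^2\), \(\sigma_v^2 = \sigma^2/(1 + \lambda^2)\), \(\sigma_u^2 = \sigma^2\lambda^2/(1 + \lambda^2)\),

\(S = +1\) for production frontier, \(-1\) for cost frontier.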
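The individual term can be written out directly. The sketch below is in Python rather than LIMDEP, and the names norm_cdf and loglik_half_normal are hypothetical. It evaluates the log density for a given residual and, as a sanity check, one can verify numerically that the implied density integrates to one for either value of S.

```python
import math

def norm_cdf(z):
    """Standard normal CDF, written with erfc for accuracy in the lower tail."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def loglik_half_normal(eps, sigma, lam, S=1):
    """log L_i for the normal-half normal model.

    eps   : residual y_i - beta'x_i
    sigma : sqrt(sigma_u^2 + sigma_v^2)
    lam   : sigma_u / sigma_v
    S     : +1 for a production frontier, -1 for a cost frontier
    """
    z = eps / sigma
    return (0.5 * math.log(2 / math.pi) - math.log(sigma)
            - 0.5 * z * z + math.log(norm_cdf(-S * lam * z)))

# Riemann sum of exp(log L_i) over a wide grid of residuals.
h = 1e-3
total = sum(math.exp(loglik_half_normal(i * h, 0.17059, 1.36059, S=-1))
            for i in range(-2000, 2001)) * h
print(total)
```

The sample log likelihood is the sum of these terms over observations, with eps computed from the current value of the coefficient vector.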

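Olsen's transformation is used for maximizing the log likelihood. We reparameterize the function in terms of \(\theta = 1/\sigma\) and \(\gamma = (1/\sigma)\beta\). Then,

\[ \log L_i = \tfrac{1}{2}\log(2/\pi) + \log\theta - \tfrac{1}{2}\tau_i^2 + \log\Phi(-S\lambda\tau_i) \]

where \(\tau_i = \theta y_i - \gamma'x_i\).

Define the functions \(a_i = -S\lambda\tau_i\),

\(\delta_i = \phi(a_i)/\Phi(a_i)\),

\(\Delta_i = -a_i\delta_i - \delta_i^2\).

Then, the gradient and Hessian are

\[ \frac{\partial \log L_i}{\partial(\gamma', \theta, \lambda)'} = \tau_i\begin{pmatrix} x_i \\ -y_i \\ 0 \end{pmatrix} + \begin{pmatrix} 0 \\ 1/\theta \\ 0 \end{pmatrix} + S\delta_i\begin{pmatrix} \lambda x_i \\ -\lambda y_i \\ -\tau_i \end{pmatrix} \]

\[ \frac{\partial^2 \log L_i}{\partial(\gamma', \theta, \lambda)'\,\partial(\gamma', \theta, \lambda)} = \begin{pmatrix} (\lambda^2\Delta_i - 1)x_ix_i' & (1 - \lambda^2\Delta_i)y_ix_i & (S\delta_i - \lambda\tau_i\Delta_i)x_i \\ (1 - \lambda^2\Delta_i)y_ix_i' & (\lambda^2\Delta_i - 1)y_i^2 - 1/\theta^2 & -(S\delta_i - \lambda\tau_i\Delta_i)y_i \\ (S\delta_i - \lambda\tau_i\Delta_i)x_i' & -(S\delta_i - \lambda\tau_i\Delta_i)y_i & \tau_i^2\Delta_i \end{pmatrix} \]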


The log likelihood for the exponential model is

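\[ \log L_i = \log\theta + \tfrac{1}{2}\theta^2\sigma_v^2 + S\theta\varepsilon_i + \log\Phi[-S\varepsilon_i/\sigma_v - \theta\sigma_v]. \]

The parameter \(\theta\) in the exponential model is \(1/\sigma_u\). The Olsen transformation is not useful for this model. Define \(c_i = -S\varepsilon_i/\sigma_v - \theta\sigma_v\), \(\delta_i = \phi(c_i)/\Phi(c_i)\), \(\Delta_i = -c_i\delta_i - \delta_i^2\) and \(a_i = S\varepsilon_i/\sigma_v^2 - \theta\). The gradient and Hessian for the exponential model are

\[ \frac{\partial \log L_i}{\partial(\beta', \theta, \sigma_v)'} = \begin{pmatrix} S(\delta_i/\sigma_v - \theta)x_i \\ 1/\theta + \theta\sigma_v^2 + S\varepsilon_i - \sigma_v\delta_i \\ \theta^2\sigma_v + a_i\delta_i \end{pmatrix} \]

\[ \frac{\partial^2 \log L_i}{\partial(\beta', \theta, \sigma_v)'\,\partial(\beta', \theta, \sigma_v)} = \begin{pmatrix} (\Delta_i/\sigma_v^2)x_ix_i' & -S(1 + \Delta_i)x_i & (S/\sigma_v)(a_i\Delta_i - \delta_i/\sigma_v)x_i \\ -S(1 + \Delta_i)x_i' & -1/\theta^2 + \sigma_v^2(1 + \Delta_i) & 2\theta\sigma_v - \delta_i - \sigma_v a_i\Delta_i \\ (S/\sigma_v)(a_i\Delta_i - \delta_i/\sigma_v)x_i' & 2\theta\sigma_v - \delta_i - \sigma_v a_i\Delta_i & \theta^2 + a_i^2\Delta_i - 2S\varepsilon_i\delta_i/\sigma_v^3 \end{pmatrix} \]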
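The exponential log likelihood term and its first derivatives can be checked numerically. The following is a Python sketch, not LIMDEP code, and the names loglik_exponential and grad_exponential are hypothetical. The derivative with respect to the residual is returned in place of the derivative with respect to the coefficient vector; under the definition of the residual as y minus the index, the latter is the former multiplied by minus the regressor vector. Comparing the analytic values with central finite differences is a simple way to confirm the algebra.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2))

def loglik_exponential(eps, theta, sigma_v, S=1):
    """log L_i for the normal-exponential model (S = +1 production, -1 cost)."""
    c = -S * eps / sigma_v - theta * sigma_v
    return (math.log(theta) + 0.5 * theta ** 2 * sigma_v ** 2
            + S * theta * eps + math.log(norm_cdf(c)))

def grad_exponential(eps, theta, sigma_v, S=1):
    """Analytic derivatives of log L_i with respect to (eps, theta, sigma_v)."""
    c = -S * eps / sigma_v - theta * sigma_v
    delta = norm_pdf(c) / norm_cdf(c)
    a = S * eps / sigma_v ** 2 - theta          # derivative of c w.r.t. sigma_v
    d_eps = S * (theta - delta / sigma_v)       # times -x_i gives d/d(beta)
    d_theta = 1 / theta + theta * sigma_v ** 2 + S * eps - sigma_v * delta
    d_sigma_v = theta ** 2 * sigma_v + a * delta
    return d_eps, d_theta, d_sigma_v

# Central difference check of the derivative with respect to theta.
eps, theta, sigma_v, S, h = 0.05, 13.0, 0.107, -1, 1e-6
numeric = (loglik_exponential(eps, theta + h, sigma_v, S)
           - loglik_exponential(eps, theta - h, sigma_v, S)) / (2 * h)
print(numeric, grad_exponential(eps, theta, sigma_v, S)[1])
```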

E62.7.2 Alternative Parameterization

Some treatments of the normal-half normal model (e.g., Coelli (1996)) use the alternative parameterization \(\gamma = \sigma_u^2/\sigma^2\) in the formulation of the log likelihood. This does not change the model, since it is a one to one transformation of the parameters;

\[ \lambda = \sqrt{\frac{\gamma}{1 - \gamma}}. \]

The parameterization in terms of \(\lambda\) is more convenient but does not produce different results.

E62.7.3 Variance Estimator in Frontier 4.1

A number of researchers have used Tim Coelli's (1996) Frontier 4.1 program for estimation of stochastic frontier models. Frontier 4.1 and LIMDEP use different methods for computing estimators of the asymptotic covariance matrix of the ML estimator. LIMDEP uses either the BHHH estimator or the negative inverse of the Hessian. Frontier 4.1 used the weighting matrix used by the DFP algorithm to approximate the inverse Hessian during the iterations. As a general proposition, we recommend against this 'estimator,' and never use it. There is no theoretical assurance of its accuracy if convergence is reached in a finite number of iterations. Nonetheless, we have been asked about this many times. In the interest of methodological advance, LIMDEP provides a command switch,

; F41

that will invoke this estimator. (This is only provided for the stochastic frontier estimators.) No

indication is given in the output that this option has been used.


E62.8 Estimating Inefficiency and Efficiency Measures

The main objective of fitting the frontier models is to estimate the inefficiency terms in the stochastic model, \(u_i\), by observation. The Jondrow estimator of \(E[u|v - u]\) is the standard estimator. This is

\[ \hat{E}[u|\varepsilon] = \frac{\sigma\lambda}{1 + \lambda^2}\left[\frac{\phi(w)}{1 - \Phi(w)} - w\right], \quad \varepsilon = v - Su, \quad w = S\varepsilon\lambda/\sigma. \]

(This is an indirect estimator of \(u\). Unfortunately, it is not possible to estimate \(u_i\) directly from any observed sample information. The various surveys noted earlier discuss the computation and properties of this estimator.) The counterpart for the normal-exponential model is

\[ \hat{E}[u|\varepsilon] = \sigma_v\left[\frac{\phi(w)}{1 - \Phi(w)} - w\right], \quad w = (S\varepsilon/\sigma_v + \theta\sigma_v). \]

These are computed and saved as new variables in your data set with

; Eff = variable name

The ; List specification will also request a listing of this variable. This form is used for all

distributions and all variations of the stochastic frontier model.
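The two conditional mean formulas given earlier in this section can be sketched as follows. This is Python, not LIMDEP code, and the function names jlms_half_normal and jlms_exponential are hypothetical. The term 1 - Phi(w) is computed as Phi(-w), which is algebraically identical and is used here only for numerical stability.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2))

def jlms_half_normal(eps, sigma, lam, S=1):
    """E[u | eps] for the normal-half normal model.

    eps : residual y - beta'x; S = +1 production, -1 cost.
    """
    w = S * eps * lam / sigma
    sigma_star = sigma * lam / (1 + lam ** 2)
    return sigma_star * (norm_pdf(w) / norm_cdf(-w) - w)

def jlms_exponential(eps, theta, sigma_v, S=1):
    """E[u | eps] for the normal-exponential model."""
    w = S * eps / sigma_v + theta * sigma_v
    return sigma_v * (norm_pdf(w) / norm_cdf(-w) - w)

# A cost frontier residual of +0.05 with the half normal estimates in this section.
print(jlms_half_normal(0.05, 0.17059, 1.36059, S=-1))
```

Applying exp(-x) to either result gives a value on the efficiency scale used in the efficiency formula of Section E62.8.3.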

By adding ; Eff = u to the frontier command, then

KERNEL ; Rhs = u $

we obtain the results below. (We also added the title to the command with ; Title = …) Note an important element of the estimation. The 'Standard Deviation' reported below is 0.054895, whereas the estimate of \(\sigma_u\) is 0.13746. The difference arises because the 0.054895 is an estimate of the standard deviation of \(E[u|\varepsilon]\), not the standard deviation of \(u\).

+---------------------------------------+

| Kernel Density Estimator for U |

| Observations = 256 |

| Points plotted = 256 |

| Bandwidth = .016298 |

| Statistics for abscissa values---- |

| Mean = .109394 |

| Standard Deviation = .054895 |

| Minimum = .030722 |

| Maximum = .350422 |

| ---------------------------------- |

| Kernel Function = Logistic |

| Cross val. M.S.E. = .000000 |

| Results matrix = KERNEL |

+---------------------------------------+


Figure E62.4 Analysis of Estimated Inefficiencies

E62.8.1 Estimating Technical or Cost Efficiency

One might be interested in estimating the 'efficiency' of the individuals in the sample. The model is usually specified in logs, of the form

\[ \log y = \beta'x + v - u. \]

Under this assumption, the efficiency of the individual would be

\[ EFF = \exp(-u) = \frac{y}{\text{Optimal } y}. \]

This can be obtained with

; Techeff = the variable name

or ; Costeff = the variable name

if you estimate a cost frontier instead. You may compute both inefficiencies and efficiency measures

in the same command. Figure E62.5 was obtained by adding

; Costeff = ecu

to the FRONTIER command, then requesting the kernel density estimator as before (with the title

changed accordingly).


Figure E62.5 Estimated Cost Efficiencies

E62.8.2 Confidence Intervals for Inefficiency and Efficiency Estimates

Horrace and Schmidt (1996, 2000) suggest a useful extension of the Jondrow result. JLMS have shown that the distribution of \(u_i|\varepsilon_i\) is that of a \(N[\mu_i^*, \sigma^{*2}]\) random variable, truncated from the left at zero, where \(\mu_i^* = -\varepsilon_i\lambda^2/(1 + \lambda^2)\) and \(\sigma^* = \sigma\lambda/(1 + \lambda^2)\). This result and standard results for the truncated normal distribution (see, e.g., Greene (2011)) can be used to obtain the conditional mean and variance of \(u_i|\varepsilon_i\). With these in hand, one can construct some of the features of the distribution of \(u_i|\varepsilon_i\) or \(E[TE_i|\varepsilon_i] = E[\exp(-u_i)|\varepsilon_i]\). The literature on this subject, including the important contributions of Bera and Sharma (1999) and Kim and Schmidt (2000), refers generally to 'confidence intervals' for \(u_i|\varepsilon_i\). For reasons that will be clear shortly, we will not use that term, at least not until we have made more precise what we are estimating.

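For locating \(100(1 - \alpha)\)% of the conditional distribution of \(u_i|\varepsilon_i\), we use the following system of equations

\[ \sigma^2 = \sigma_v^2 + \sigma_u^2 \]

\[ \lambda = \sigma_u/\sigma_v \]

\[ \mu_i^* = -\varepsilon_i\sigma_u^2/\sigma^2 = -\varepsilon_i\lambda^2/(1 + \lambda^2) \]

\[ \sigma^* = \sigma_u\sigma_v/\sigma = \sigma\lambda/(1 + \lambda^2) \]

\[ LB_i = \mu_i^* + \sigma^*\,\Phi^{-1}\!\left[1 - \left(1 - \tfrac{\alpha}{2}\right)\Phi(\mu_i^*/\sigma^*)\right] \]

\[ UB_i = \mu_i^* + \sigma^*\,\Phi^{-1}\!\left[1 - \tfrac{\alpha}{2}\,\Phi(\mu_i^*/\sigma^*)\right] \]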

Then, if the elements were the true parameters, the region \([LB_i, UB_i]\) would encompass \(100(1 - \alpha)\)% of the distribution of \(u_i|\varepsilon_i\). For constructing 'confidence intervals' for technical efficiency, \(TE_i|\varepsilon_i\), it is necessary only to compute \(TEUB_i = \exp(-LB_i)\) and \(TELB_i = \exp(-UB_i)\).
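The bounds can be sketched in a few lines. This is Python, not LIMDEP code; the function name hs_bounds is hypothetical, and the sketch follows the equations exactly as stated in this section, which are written for a residual of the form v - u. It relies on the standard library's NormalDist for the normal CDF and its inverse.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def hs_bounds(eps, sigma, lam, alpha=0.05):
    """Lower and upper bounds for u|eps and the implied bounds for exp(-u)|eps.

    Returns (LB, UB, TELB, TEUB) for the 100(1 - alpha)% region.
    """
    mu_star = -eps * lam ** 2 / (1 + lam ** 2)
    sigma_star = sigma * lam / (1 + lam ** 2)
    p = _STD_NORMAL.cdf(mu_star / sigma_star)
    lb = mu_star + sigma_star * _STD_NORMAL.inv_cdf(1 - (1 - alpha / 2) * p)
    ub = mu_star + sigma_star * _STD_NORMAL.inv_cdf(1 - (alpha / 2) * p)
    return lb, ub, math.exp(-ub), math.exp(-lb)

lb, ub, te_lb, te_ub = hs_bounds(-0.05, 0.17059, 1.36059, alpha=0.05)
print(lb, ub, te_lb, te_ub)
```

By construction, the truncated normal distribution places probability alpha/2 below LB and alpha/2 above UB, so the lower bound is always positive.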


We note two caveats about the estimator. First, the received papers based on classical methods have labeled this a confidence interval for \(u_i\). However, it is a range that encompasses \(100(1 - \alpha)\)% of the probability in the conditional distribution of \(u_i|\varepsilon_i\). The interval is 'centered' at the estimator of the conditional mean, \(E[u_i|\varepsilon_i]\), not at an estimator of \(u_i\) itself, as a conventional 'confidence interval' would be. The estimator is actually characterizing the conditional distribution of \(u_i|\varepsilon_i\), not constructing any kind of interval that brackets a particular \(u_i\); that is not possible. Second, these limits are conditioned on known values of the parameters, so they ignore any variation in the parameter estimates used to construct them. Thus, we regard this as a minimal width interval.

You can request computation of these lower and upper bounds by adding

; CI(100(1 - \(\alpha\))) = lower, upper

where \(100(1 - \alpha)\) is one of 90, 95, or 99 and lower, upper are names for two variables that will be

created. You may use this feature with ; Eff = variable or ; Techeff = variable (or ; Costeff =

variable for a cost frontier). If you have both ; Eff and ; Techeff in the command, the confidence

intervals are computed for ; Techeff. (You can obtain the interval for ; Eff in this case by computing

the negatives of the logs with CREATE.)

We obtained these bounds for our cost function with

; Costeff = euc ; CI(95) = eucl,eucu

We followed the estimation with

PLOT ; Rhs = eucl,ecu,eucu

; Title = Upper and Lower Bound Estimates of Cost Efficiency

; Vaxis = Cost Efficiency$

to obtain Figure E62.6.

Figure E62.6 Lower and Upper Bound Estimates of Cost Efficiency


The centipede plot is also a useful device in this context. The following redraws Figure E62.6 using

a different view for the lower and upper bounds

CREATE ; Firm_i = Trn(1,1) $

PLOT ; Lhs = firm_i ; Rhs = eucl,eucu

; Centipede ; Endpoints = 0,260 ; Grid

; Title = Confidence Limits for Cost Efficiency $

Figure E62.7 Centipede Plot of Efficiency Bounds

E62.8.3 Partial Effects on Efficiencies

The variables in the production or cost frontier function begin with either the inputs for the

production model or input prices and outputs in the cost model. Analyses of how these variables

affect technical or cost efficiency are not likely to be particularly revealing. However, if the function

includes environmental variables (we call these zi), it might be of interest to examine how variation

in these impacts efficiency. For our example, we consider

\[ \log(Cost/P_p) = \alpha + \beta_q \log Q + \beta_{qq} \log^2 Q + \sum_k \beta_k \log(P_k/P_p) + \gamma_L\,\text{load factor} + \gamma_N\,\text{nodes} + \gamma_S \log \text{stage length} + v + u \]

In this case, it might be interesting to examine how increased load factor, route complexity, or stage

length impact efficiency.

Expressions for the technical inefficiency values appear at the beginning of Section E62.8.

In those expressions, we will use

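\[ \text{Efficiency} = \exp\{-\hat{E}[u|\varepsilon]\}. \]

The two expressions for the normal and exponential models are functions of a \(w(\varepsilon)\) that is specific to the model. Each may be written as

\[ \text{Efficiency} = \exp\{-\sigma_m A[w_m(\varepsilon)]\} \]

where m = half normal or exponential, \(\sigma_m = \sigma\lambda/(1 + \lambda^2)\) for the half normal and \(\sigma_v\) for the exponential, and \(w_m\) is defined earlier. We now suppose that

\[ \varepsilon = y - \beta'x - \gamma'z \]

where x is the theoretical inputs to the goal and z are the environmental variables. We require the derivatives with respect to z. For convenience, let \(W = -w\) and exploit the symmetry of the normal density. Then, \(A[w_m(\varepsilon)] = [\phi(W)/\Phi(W) + W]\). The derivative is

\[ \partial\,\text{Efficiency}/\partial z = \text{Efficiency} \times [-\sigma_m] \times [dA(W)/dW] \times [-1] \times [\partial w_m/\partial\varepsilon] \times [-\gamma]. \]

The two terms that we need to complete the derivation are \(\partial w_m/\partial\varepsilon = S\lambda/\sigma\) for the half normal model and \(S/\sigma_v\) for the exponential model and

\[ \frac{dA(W)}{dW} = 1 - W\frac{\phi(W)}{\Phi(W)} - \left[\frac{\phi(W)}{\Phi(W)}\right]^2 = D(W). \]

Collecting terms,

\[ \frac{\partial\,\text{Efficiency}}{\partial z} = \text{Efficiency} \times D(W) \times \left\{\frac{\lambda^2}{1 + \lambda^2} \text{ or } 1\right\} \times (-S\gamma). \]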

We can sign this result, though the magnitude will be empirical. The first three terms are all between

zero and one, as is their product. S is either +1 for a production frontier or -1 for a cost frontier.

Thus, in total, the derivative is a fraction of the corresponding coefficient, which takes the same sign

for a cost frontier and the opposite sign for a production frontier.
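The sign and magnitude argument can be verified numerically for the half normal case. The following Python sketch is not LIMDEP code; the function names efficiency_half_normal and d_efficiency_dz are hypothetical, and the residual is treated as a function of a single environmental variable z with coefficient gamma. The analytic derivative is the product of the efficiency, D(W), the ratio of lambda squared to one plus lambda squared, and minus S times gamma.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2))

def efficiency_half_normal(eps, sigma, lam, S=1):
    """exp(-E[u|eps]) for the normal-half normal model."""
    W = -S * eps * lam / sigma
    ratio = norm_pdf(W) / norm_cdf(W)
    return math.exp(-sigma * lam / (1 + lam ** 2) * (ratio + W))

def d_efficiency_dz(eps, gamma, sigma, lam, S=1):
    """Derivative of the efficiency with respect to z when eps = y - b'x - gamma*z."""
    W = -S * eps * lam / sigma
    ratio = norm_pdf(W) / norm_cdf(W)
    D = 1 - W * ratio - ratio ** 2          # dA(W)/dW, lies between 0 and 1
    eff = efficiency_half_normal(eps, sigma, lam, S)
    return -eff * D * lam ** 2 / (1 + lam ** 2) * S * gamma

# Cost frontier (S = -1) with a negative coefficient: the effect is negative too.
print(d_efficiency_dz(0.047, -0.99466, 0.12539, 0.95827, S=-1))
```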

Partial derivatives and simulations are computed with PARTIALS and SIMULATE. The

general approach would be

FRONTIER ; Cost (optional)

; Lhs = goal variable

; Rhs = one, x variables, z variables $

The command might also contain ; Eff = variable, ; Techeff = variable or ; Costeff = variable.

Then, you may follow it with

PARTIALS ; Effects: variables desired ; other options $

or SIMULATE ; Scenario … all options $

The function analyzed in these two commands is the technical or cost efficiency,

\[ \text{Efficiency} = \exp\{-\hat{E}[u|\varepsilon]\}. \]


The following demonstrates using the cost frontier, with variables z = (load factor, log stage length,

points served). Data on z are missing for one of the firms.

CREATE ; logstage = Log(stage) $

NAMELIST ; x = one,ly,ly2,lpkp,lplp,lpmp,lpep,lpfp

; z = loadfctr,logstage,points $

FRONTIER ; Cost ; Lhs = lc ; Rhs = x,z

; Eff = u ; Costeff = euc ; CI(95) = eucl,eucu $

SIMULATE ; Scenario: & loadfctr = .4(.025)1 ; Plot(ci) $

-----------------------------------------------------------------------------


Dependent variable LC



Inf.Cr.AIC = -404.3 AIC/N = -1.579



Sigma(v) = .09054

Sigma(u) = .08676

Sigma = Sqr[(s^2(u)+s^2(v)]= .12539


Var[u]/{Var[u]+Var[v]} = .25020









Kodde-Palm C*: 95%: 2.706, 99%: 5.412

--------+--------------------------------------------------------------------



--------+--------------------------------------------------------------------


Constant| 9.19939 21.64273 .43 .6708 -33.21957 51.61835

LY| .97398*** .01751 55.63 .0000 .93966 1.00829

LY2| .05123*** .01029 4.98 .0000 .03106 .07140

LPKP| .49455 1.69257 .29 .7701 -2.82283 3.81193

LPLP| .13721* .08121 1.69 .0911 -.02195 .29637

LPMP| .45863 1.11624 .41 .6812 -1.72915 2.64642

LPEP| -.10302 .53634 -.19 .8477 -1.15422 .94818

LPFP| -.02090 .01794 -1.16 .2441 -.05607 .01427

LOADFCTR| -.99466*** .17446 -5.70 .0000 -1.33660 -.65273

LOGSTAGE| -.17940*** .02531 -7.09 .0000 -.22902 -.12979

POINTS| .00164*** .00031 5.20 .0000 .00102 .00225


Lambda| .95827*** .16869 5.68 .0000 .62763 1.28890

Sigma| .12539*** .00039 321.29 .0000 .12463 .12616

--------+--------------------------------------------------------------------


---------------------------------------------------------------------

Model Simulation Analysis for JLMS efficiency estimator in SF model

---------------------------------------------------------------------

Simulations are computed by average over sample observations

---------------------------------------------------------------------

User Function Function Standard

(Delta method) Value Error |t| 95% Confidence Interval

---------------------------------------------------------------------

Avrg. Function .93354 .00635 147.07 .92110 .94598

LOADFCTR= .40 .95844 .00346 277.19 .95166 .96522

LOADFCTR= .43 .95502 .00344 277.54 .94827 .96176

LOADFCTR= .45 .95123 .00357 266.70 .94424 .95822

LOADFCTR= .48 .94706 .00392 241.56 .93937 .95474

LOADFCTR= .50 .94247 .00456 206.48 .93353 .95142

LOADFCTR= .53 .93746 .00552 169.87 .92664 .94828

(some rows omitted)

LOADFCTR= .83 .84622 .03145 26.91 .78458 .90786

LOADFCTR= .85 .83696 .03384 24.73 .77063 .90329

LOADFCTR= .88 .82763 .03616 22.89 .75676 .89850

LOADFCTR= .90 .81827 .03839 21.32 .74303 .89352

LOADFCTR= .93 .80892 .04053 19.96 .72947 .88836

LOADFCTR= .95 .79958 .04259 18.78 .71611 .88305

LOADFCTR= .98 .79029 .04455 17.74 .70296 .87761

Figure E62.8 Simulated Cost Efficiency Values

We have also analyzed the partial effects.

FRONTIER ; Cost ; Lhs = lcp ; Rhs = x,z $

PARTIALS ; Effects: loadfctr & loadfctr = .4(.025)1 ; Plot(ci) $

PARTIALS ; Effects: z ; Summary $


---------------------------------------------------------------------

Partial Effects Analysis for JLMS efficiency estimator in SF model

---------------------------------------------------------------------

Effects on function with respect to LOADFCTR

Results are computed by average over sample observations

Partial effects for continuous LOADFCTR computed by differentiation

Effect is computed as derivative = df(.)/dx

---------------------------------------------------------------------

df/dLOADFCTR Partial Standard

(Delta method) Effect Error |t| 95% Confidence Interval

---------------------------------------------------------------------

APE. Function -.22444 .06690 3.35 -.35557 -.09331

LOADFCTR= .40 -.13020 .02575 5.06 -.18067 -.07973

LOADFCTR= .43 -.14405 .03134 4.60 -.20547 -.08263

LOADFCTR= .45 -.15900 .03766 4.22 -.23281 -.08519

LOADFCTR= .48 -.17497 .04464 3.92 -.26246 -.08748

(Some rows omitted)

LOADFCTR= .85 -.37205 .09615 3.87 -.56051 -.18359

LOADFCTR= .88 -.37392 .09265 4.04 -.55551 -.19234

LOADFCTR= .90 -.37452 .08896 4.21 -.54887 -.20017

LOADFCTR= .93 -.37403 .08524 4.39 -.54109 -.20697

LOADFCTR= .95 -.37265 .08160 4.57 -.53259 -.21271

LOADFCTR= .98 -.37054 .07813 4.74 -.52368 -.21739

Figure E62.9 Partial Effects of Load Factor

---------------------------------------------------------------------

Partial Effects for JLMS efficiency estimator in SF model

Partial Effects Averaged Over Observations

* ==> Partial Effect for a Binary Variable

---------------------------------------------------------------------

Partial Standard


---------------------------------------------------------------------

LOADFCTR -.25723 .07389 3.48 -.40205 -.11240

LOGSTAGE -.04620 .01292 3.58 -.07153 -.02088

POINTS .00035 .00012 2.95 .00012 .00058

---------------------------------------------------------------------


E62.8.4 Partial Effects of Model Variables on Efficiencies

The preceding has examined the partial effects with respect to z in the model

\[ y = \beta'x + \gamma'z + v - Su. \]

It was noted that partial effects with respect to x are not likely to be particularly interesting.

Nonetheless, they could be computed.

NOTE: Partial effects of variables in the stochastic frontier efficiency models may be computed

with respect to any variable in any model, regardless of where those variables appear in the model.

That includes x in the original frontier model, z in the means of the truncated regression formats, and

z in the variances of the heteroscedasticity models.

To continue the earlier example, the partial effect of LogQ could be computed in the cost function

using

NAMELIST ; x = one,lq,lq^2,lpmpp,lpfpp,lpepp,lplpp,lpkpp $

NAMELIST ; z = loadfctr,logstage,points $

FRONTIER ; Cost ; Lhs = lcp ; Rhs = x,z $

PARTIALS ; Effects : lq ; summary $

Note that the specification will correctly account for the fact that the square of LogQ appears in the

cost function when it computes the partial effects.

E62.8.5 Examining Ranks of Inefficiencies

Researchers often analyze outcome data in which the absolute values of the inefficiencies are

not necessarily of interest. Rather, it is the ranking of observations that they wish to analyze. The

WHO analysis of health care attainment (see Section E62.4.2) is a prominent example. LIMDEP

provides several tools for examining ranks of inefficiencies.

First, to rank the raw observations on efficiency or inefficiency, use

CREATE ; rank variable = Rnk(variable) $

The Rnk function sorts the data for you and creates the ranking variable. The observation with the

highest value gets the rank of one. The lowest gets a rank of n. Note, tied observations do not get the

same rank. Tied observations are ranked in the order in which they appear in the data. For example, in

a sample of 100, if 10 observations are tied for third place, they will receive ranks 3 through 12.

Two CALC functions provide descriptive measures for ranks. For two sets of ranks, the Spearman rank correlation coefficient is computed as

\[ \rho = 1 - 6\sum_i d_i^2 / [n(n^2 - 1)], \]

\[ d_i = \text{variable1}_i - \text{variable2}_i. \]


The function for computing this is

CALC ; List ; Rkc(variable1,variable2) $

The rank correlation is a correlation coefficient, so it has a natural range of measurement. (See the application below.) For more than two sets of ranks, a useful statistic is Kendall's coefficient of concordance,

\[ W = 12\sum_{i=1}^{n} (S_i - \bar{S})^2 / [nK^2(n^2 - 1)] \]

where \(S_i = \sum_k \text{rank}_{k,i}\).

To compute this measure, use

CALC ; List ; Cnc(ranks1,...,ranksK) $

The concordance coefficient is not a correlation coefficient, so its magnitude is ambiguous. It can be used for a large sample test of discordance. Under the null hypothesis that the sets of ranks are independent, the statistic has a large sample chi squared distribution. In particular,

\[ K(n-1)W \to \chi^2[n-1]. \]
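The ranking rule and the two rank statistics can be sketched as follows. This is Python, not LIMDEP code, and the names rnk, spearman and kendall_w are hypothetical stand-ins that follow the descriptions given in this section: the largest value receives rank one, and ties are ranked in the order in which they appear in the data.

```python
def rnk(values):
    """Ranks with 1 = largest value; ties keep their order of appearance."""
    # sorted() is stable, so tied values stay in data order.
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for position, i in enumerate(order, start=1):
        ranks[i] = position
    return ranks

def spearman(ranks1, ranks2):
    """Spearman rank correlation, 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(ranks1)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks1, ranks2))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

def kendall_w(*rank_sets):
    """Kendall's coefficient of concordance for K sets of ranks on n observations."""
    K = len(rank_sets)
    n = len(rank_sets[0])
    S = [sum(column) for column in zip(*rank_sets)]   # S_i = sum over k of rank_{k,i}
    S_bar = sum(S) / n
    return 12 * sum((s - S_bar) ** 2 for s in S) / (n * K ** 2 * (n ** 2 - 1))

print(rnk([3.0, 9.0, 3.0, 1.0]))
```

Identical sets of ranks give a rank correlation of one and a concordance coefficient of one; exactly reversed ranks give a rank correlation of minus one.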

To illustrate these computations, we have analyzed the WHO data described in Section

E62.4.2. We have fit identical stochastic frontier models for the two attainment variables, lcomp, the

log of the composite measure, and ldale, the log of disability adjusted life expectancy. We then

computed the ranks for the 191 countries and plotted the ranks for the two measures as well as the

raw efficiency measures. The simple correlation for the efficiency measures and the rank correlation

for the ranks are displayed. The commands are as follows:

NAMELIST ; x = one,logebar,loghbar,loghbar2 $

NAMELIST ; z = gini,lpopden,lgdpc,geff,voice,oecd,lpubthe,tropics $

FRONTIER ; Lhs = logdbar ; Rhs = x,z

; Eff = udale ; Techeff = edale $

FRONTIER ; Lhs = logcbar ; Rhs = x,z

; Eff = ucomp ; Techeff = ecomp $

CREATE ; dalerank = 192 - Rnk(edale) $

CREATE ; comprank = 192 - Rnk(ecomp) $

PLOT ; Lhs = dalerank ; Rhs = comprank

; Endpoints = 0,200 ; Limits = 0,200

; Title = Ranks of Efficiencies: DALE vs. COMP $

PLOT ; Lhs = edale ; Rhs = ecomp ; Endpoints = .8,1 ; Grid

; Title = Efficiencies: DALE vs. COMP $

CALC ; List ; Rkc(dalerank,comprank) $

CALC ; List ; Cor(edale,ecomp) $


-----------------------------------------------------------------------------


Dependent variable LOGDBAR



Inf.Cr.AIC = -283.7 AIC/N = -1.485



Sigma(v) = .03808

Sigma(u) = .18134

Sigma = Sqr[(s^2(u)+s^2(v)]= .18529


Var[u]/{Var[u]+Var[v]} = .89180









Kodde-Palm C*: 95%: 2.706, 99%: 5.412

--------+--------------------------------------------------------------------


LOGDBAR| Coefficient Error z |z|>Z* Interval

--------+--------------------------------------------------------------------


Constant| 2.60812*** .18255 14.29 .0000 2.25034 2.96590

LOGEBAR| .11227*** .01869 6.01 .0000 .07564 .14891

LOGHBAR| .30118*** .05072 5.94 .0000 .20177 .40059

LOGHBAR2| -.02710*** .00455 -5.96 .0000 -.03601 -.01818

GINI| -.30417*** .10600 -2.87 .0041 -.51192 -.09642

LPOPDEN| .00213 .00402 .53 .5955 -.00574 .01001

LGDPC| .07541*** .02424 3.11 .0019 .02789 .12293

GEFF| -.00673 .01551 -.43 .6642 -.03714 .02367

VOICE| .02093* .01113 1.88 .0601 -.00089 .04275

OECD| .01608 .03055 .53 .5987 -.04381 .07596

LPUBTHE| .00974 .01497 .65 .5150 -.01959 .03908

TROPICS| -.03703** .01714 -2.16 .0307 -.07063 -.00344


Lambda| 4.76248*** 1.22054 3.90 .0001 2.37026 7.15470

Sigma| .18529*** .00086 214.30 .0000 .18360 .18698

--------+--------------------------------------------------------------------


-----------------------------------------------------------------------------


Dependent variable LOGCBAR



Inf.Cr.AIC = -468.4 AIC/N = -2.452



Sigma(v) = .03768

Sigma(u) = .09421

Sigma = Sqr[(s^2(u)+s^2(v)]= .10147


Var[u]/{Var[u]+Var[v]} = .69429









Kodde-Palm C*: 95%: 2.706, 99%: 5.412

--------+--------------------------------------------------------------------


LOGCBAR| Coefficient Error z |z|>Z* Interval

--------+--------------------------------------------------------------------


Constant| 3.21081*** .10704 30.00 .0000 3.00101 3.42060

LOGEBAR| .06590*** .01319 4.99 .0000 .04004 .09177

LOGHBAR| .18617*** .03763 4.95 .0000 .11240 .25993

LOGHBAR2| -.01509*** .00328 -4.61 .0000 -.02151 -.00867

GINI| -.25334*** .07579 -3.34 .0008 -.40189 -.10478

LPOPDEN| .00523* .00281 1.86 .0628 -.00028 .01073

LGDPC| .05747*** .01681 3.42 .0006 .02453 .09040

GEFF| .00290 .01068 .27 .7858 -.01803 .02384

VOICE| .02082** .00872 2.39 .0170 .00373 .03791

OECD| .01699 .01946 .87 .3827 -.02115 .05513

LPUBTHE| .01798** .00903 1.99 .0466 .00027 .03568

TROPICS| -.02365** .01191 -1.99 .0471 -.04700 -.00031


Lambda| 2.50000*** .41784 5.98 .0000 1.68104 3.31896

Sigma| .10147*** .00045 224.53 .0000 .10058 .10235

--------+--------------------------------------------------------------------

[CALC] *Result*= .6353076



Figure E62.10 Ranks and Estimates of Efficiency


E62.9 Partially Nonparametric Stochastic Frontier Model

The stochastic frontier is fully parametric in both the deterministic part of the frontier and

the distribution of the components of \varepsilon_i. This section examines a partially nonparametric model of

the form

y = g(x,z) + v – Su.

The estimator is based on the locally linear regression in Section E9.5. The underlying logic is the

result that in the stochastic frontier model, apart from the constant term, OLS consistently estimates

the slope parameters of the model and estimates the constant term with a known bias. For the

constant, \alpha, the bias is E[u], the unconditional mean, which in the stochastic frontier model is

E[u] = \sigma_u \sqrt{2/\pi}.

Continuing this approach, then, the least squares residuals estimate \varepsilon_i + E[u]. In addition, the least

squares residual variance, e'e/n, consistently estimates Var[\varepsilon_i] = \sigma_\varepsilon^2 = \sigma_v^2 + (1 - 2/\pi)\sigma_u^2. The

implication is that the only parameter remaining to estimate is \sigma_u^2. In Section E62.6.2, we used the

third moment of the OLS residuals and the method of moments to estimate \sigma_u, then used this

estimate to estimate \alpha, the constant term in the frontier function.

The approach proposed here uses this same method with three differences.

1. The residuals used to compute the variance estimator are based on a locally linear,

nonparametric estimator of the deterministic function.

2. The remaining parameter to be estimated in this case is \lambda rather than \sigma_u. We will base the

estimation on the result \sigma_u^2 = \sigma^2 \lambda^2 / (1 + \lambda^2).

3. The approach will be based on a maximum likelihood estimator rather than the method of

moments.

Estimation uses the following steps: We begin with estimation of the conventional normal-half

normal frontier model with a linear frontier function in order to obtain an initial estimator of \lambda and of

\sigma_\varepsilon^2. The LOWESS estimator developed in Section E9.5 is then employed to estimate g(x,z) for each

point in the sample. The residuals from the estimated functions are used with the estimate of \sigma_\varepsilon^2 for

estimation of \lambda. With \sigma_\varepsilon^2 and \lambda in hand, we can compute the constant term, a set of residuals, and the

JLMS estimators of technical or cost efficiency. Technical details appear in Section E62.9.2.


E62.9.1 Application

We have reestimated the airlines cost frontier with the semiparametric estimator. The

frontier functions differ noticeably, primarily in the parameter estimates that are statistically

insignificant. The kernel estimators suggest, however, that the differences in the estimates of

inefficiency are quite modest. The descriptive statistics suggest the same pattern. The final plot

shows more graphically how the nonparametric function has changed the estimates. The fact that

most of the estimates from the nonparametric estimator lie below the 45 degree line is consistent

with the appearance that generally, they are smaller than the parametric values. The last set of

results are the ordinary (Pearson) correlation and Kendall's tau.

FRONTIER ; Cost ; Lhs = lc ; Rhs = x,z ; Costeff = eup $

FRONTIER ; Cost ; Lhs = lc ; Rhs = x,z ; Lowess ; Costeff = eunp $

KERNEL ; Rhs = eunp,eup

; Title = Estimated Inefficiencies from Parametric and Nonparametric Frontiers $

DSTAT ; Rhs = eup,eunp $

PLOT ; Lhs = eup ; Rhs = eunp ; Rh2 = eup ; Fill ; Grid ; Vaxis = EUNP

; Title = Nonparametric vs. Parametric Estimates $

CALC ; List; Cor(eup,eunp) ; Ktr(eup,eunp) $

-----------------------------------------------------------------------------







Sigma(v) = .09054

Sigma(u) = .08676

Sigma = Sqr[(s^2(u)+s^2(v)]= .12539


Var[u]/{Var[u]+Var[v]} = .25020









Kodde-Palm C*: 95%: 2.706, 99%: 5.412

-----------------------------------------------------------------------------


--------+--------------------------------------------------------------------



--------+--------------------------------------------------------------------


Constant| 9.19939 21.64273 .43 .6708 -33.21957 51.61835

LY| .97398*** .01751 55.63 .0000 .93966 1.00829

LY2| .05123*** .01029 4.98 .0000 .03106 .07140

LPKP| .49455 1.69257 .29 .7701 -2.82283 3.81193

LPLP| .13721* .08121 1.69 .0911 -.02195 .29637

LPMP| .45863 1.11624 .41 .6812 -1.72915 2.64642

LPEP| -.10302 .53634 -.19 .8477 -1.15422 .94818

LPFP| -.02090 .01794 -1.16 .2441 -.05607 .01427

LOADFCTR| -.99466*** .17446 -5.70 .0000 -1.33660 -.65273

LOGSTAGE| -.17940*** .02531 -7.09 .0000 -.22902 -.12979

POINTS| .00164*** .00031 5.20 .0000 .00102 .00225


Lambda| .95827*** .16869 5.68 .0000 .62763 1.28890

Sigma| .12539*** .00039 321.29 .0000 .12463 .12616

--------+--------------------------------------------------------------------

+-----------------------------------------------+

| Locally linear weighted regression estimation |

| Sample size 256 |

| Model size 11 |

| Band width .500000 |

| LOESS Sum of Squared Residuals 1.69637 |

| OLS Sum of Squared Residuals 2.79975 |

| Derivatives Matrix LOCLBETA |

+-----------------------------------------------+

Reestimating lambda using residuals based on LOWESS regression


-----------------------------------------------------------------------------

Partially Nonparametric Stochastic Frontier Fit by LOWESS


Estmation based on N = 256, K = 11

Variances: Sigma-squared(u)= .00438 Sigma(u) = .06616

Sigma-squared(v)= .00504 Sigma(v) = .07096

Sigma = Sqr[(s^2(u)+s^2(v)]= .09702 Lambda = .93233


-----------------------------------------------------------------------------

Statistical results are for the sample means of the LOWESS estimated betas.

They are not moments of an asymptotic distribution.

--------+--------------------------------------------------------------------



--------+--------------------------------------------------------------------

Constant| 34.8551 23.42958 1.49 .1368 -11.0661 80.7762

LY| .98897*** .05040 19.62 .0000 .89018 1.08775

LY2| .04598*** .01677 2.74 .0061 .01310 .07885

LPKP| 2.48149 1.78813 1.39 .1652 -1.02319 5.98616

LPLP| .09976 .10851 .92 .3579 -.11292 .31244

LPMP| -.85374 1.34656 -.63 .5261 -3.49295 1.78547

LPEP| -.71103 .43514 -1.63 .1023 -1.56389 .14183

LPFP| -.02183 .03324 -.66 .5114 -.08698 .04332

LOADFCTR| -.78691 .65061 -1.21 .2265 -2.06208 .48826

LOGSTAGE| -.20490* .11308 -1.81 .0700 -.42653 .01672

POINTS| .00225 .00205 1.10 .2710 -.00176 .00627

--------+--------------------------------------------------------------------


-----------------------------------------------------------------------------


Descriptive Statistics

--------+---------------------------------------------------------------------

Variable| Mean Std.Dev. Minimum Maximum Cases Missing

--------+---------------------------------------------------------------------

EUP| .933537 .025027 .812486 .975689 256 0

EUNP| .948487 .019528 .844732 .983878 256 0

--------+---------------------------------------------------------------------



Calculator: Computed 2 scalar results

Figure E62.11 Kernel Estimators of Inefficiency Distributions

Figure E62.12 Plot of Nonparametric Estimates vs. Parametric Estimates


E62.9.2 Technical Details

The log likelihood function for the normal-half normal model is the sum of

log L_i = ½ log(2/\pi) - log \sigma - ½(\varepsilon_i/\sigma)^2 + log \Phi[-S\varepsilon_i\lambda/\sigma].

The value of \sigma_\varepsilon^2 = \sigma_v^2 + (1 - 2/\pi)\sigma_u^2 is estimated using the squared LOWESS residuals; it is the

sample variance, q^2. The LOWESS residuals themselves are estimates of \varepsilon_i + E[u_i]. With q^2 and

the residuals in hand, the log likelihood is a function only of \lambda. During the iteration, we compute

a = \lambda / (1 + \lambda^2)^{1/2},

s^2 = q^2 / [1 - (2/\pi)a^2], then s,

m = a s \sqrt{2/\pi},

e_i = residual_i - m.

These residuals and s are used to compute log L_i and the derivative with respect to \lambda. This estimation

step provides the estimator of \lambda that we need to compute the efficiencies. After estimation of \lambda,

computation of the JLMS estimates of inefficiency is done the same as in the parametric form of the

model, using the LOWESS residuals.
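The iteration described above can be sketched in a few lines. The Python below is an illustration under stated assumptions, not the LIMDEP implementation: it treats the production case (S = 1), takes the LOWESS residuals as given, and replaces the derivative-based iteration with a crude search over a user-supplied grid of candidate values. The names loglik_lambda and estimate_lambda are hypothetical.

```python
import math


def norm_cdf(x):
    """Standard normal CDF, written with erfc so the lower tail stays positive."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def loglik_lambda(lam, residuals):
    """Log likelihood as a function of lambda only, production case (S = 1).

    Returns (log likelihood, s, m) where s estimates sigma and m estimates E[u].
    """
    n = len(residuals)
    mean = sum(residuals) / n
    q2 = sum((r - mean) ** 2 for r in residuals) / n   # sample variance q^2
    a = lam / math.sqrt(1.0 + lam * lam)               # a = sigma_u / sigma
    s2 = q2 / (1.0 - (2.0 / math.pi) * a * a)          # implied sigma^2
    s = math.sqrt(s2)
    m = a * s * math.sqrt(2.0 / math.pi)               # implied E[u]
    total = 0.0
    for r in residuals:
        e = r - m                                      # e_i = residual_i - m
        total += (0.5 * math.log(2.0 / math.pi) - math.log(s)
                  - 0.5 * (e / s) ** 2
                  + math.log(norm_cdf(-e * lam / s)))
    return total, s, m


def estimate_lambda(residuals, grid):
    """Crude stand-in for the iteration: pick the best lambda on a grid."""
    return max(grid, key=lambda lam: loglik_lambda(lam, residuals)[0])
```

At lambda = 0 the function collapses to the normal log likelihood with standard deviation q, which gives a quick sanity check on the algebra.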

E62.10 The Normal-Gamma Model

The normal-gamma model is the remaining distributional form of the stochastic frontier

model. Under this specification,

f(u_i) = [\theta^P / \Gamma(P)] \exp(-\theta u_i) u_i^{P-1}, u_i \ge 0, P > 0, \theta > 0.

This model is more flexible than the half normal or exponential model in that with two parameters, it

allows both the shape and location to vary independently. (The truncation model does likewise,

but it is considerably more difficult to estimate.) To specify the gamma model, use

; Model = Gamma (or ; Model = G)

The normal-gamma model is estimated by the method of simulated maximum likelihood.

(See Greene (2000b) and the details in Section E62.10.2.) The counterpart to the JLMS estimator of

the inefficiency, E[u|\varepsilon], must also be estimated by simulation.


E62.10.1 Application of the Normal-Gamma Model

We illustrate the gamma model by fitting a cost frontier model with normal-gamma

inefficiency. For comparison, we have also fit the exponential model, which results when P is

constrained to equal one. (The exponential model is fit directly by its own log likelihood, not by

constraining P to equal one in the gamma model.) We have also computed the inefficiencies for the

two models, and plotted kernel density estimators to compare them. The commands are

FRONTIER ; Lhs = lc ; Rhs = x ; Cost ; Model = Gamma ; Costeff = eucg

; Pts = 50 ; Halton $

FRONTIER ; Lhs = lc ; Rhs = x ; Cost ; Model = Exponential ; Costeff = euce $

KERNEL ; Rhs = eucg,euce

; Title = Kernel Density Estimates for E[u|e,exponential and gamma] $

We note that, by the Wald and likelihood ratio tests, we cannot reject the hypothesis of the exponential

model: the estimate of P, .84426, lies well within one standard error (.69128) of one. The similarity of the kernel density estimators is consistent with this finding.

-----------------------------------------------------------------------------





Inf.Cr.AIC = -297.9 AIC/N = -1.164

Model estimated: Aug 22, 2011, 22:09:16

Normal-Gamma frontier model



Sigma(v) = .10814

Sigma(u) = .07399


Half Normal:u(i)=|U(i)|; frontier model








Kodde-Palm C*: 95%: 2.706, 99%: 5.412

--------+--------------------------------------------------------------------



--------+--------------------------------------------------------------------


Constant| 22.9007 27.13658 .84 .3987 -30.2860 76.0874

LY| .96086*** .02028 47.38 .0000 .92112 1.00061

LY2| .09283*** .01327 7.00 .0000 .06682 .11883

LPKP| 1.67283 2.12387 .79 .4309 -2.48987 5.83553

LPLP| -.01112 .06724 -.17 .8687 -.14290 .12066

LPMP| -.07676 1.37564 -.06 .9555 -2.77297 2.61944

LPEP| -.63376 .68533 -.92 .3551 -1.97698 .70946

LPFP| -.06405*** .02311 -2.77 .0056 -.10934 -.01876


Theta| 12.4180** 5.05037 2.46 .0139 2.5194 22.3165

P| .84426 .69128 1.22 .2220 -.51062 2.19913

Sigmav| .10814*** .01148 9.42 .0000 .08563 .13064

--------+--------------------------------------------------------------------


-----------------------------------------------------------------------------


Exponential frontier model



Sigma(v) = .10709

Sigma(u) = .07539









Kodde-Palm C*: 95%: 2.706, 99%: 5.412

--------+--------------------------------------------------------------------



--------+--------------------------------------------------------------------


Constant| 22.6569 25.48354 .89 .3740 -27.2899 72.6038

LY| .96069*** .01892 50.77 .0000 .92360 .99777

LY2| .09281*** .01249 7.43 .0000 .06832 .11729

LPKP| 1.65439 1.99409 .83 .4067 -2.25395 5.56272

LPLP| -.00962 .09785 -.10 .9217 -.20140 .18216

LPMP| -.06595 1.31569 -.05 .9600 -2.64465 2.51275

LPEP| -.62841 .63243 -.99 .3204 -1.86795 .61114

LPFP| -.06397*** .02033 -3.15 .0017 -.10381 -.02412


Theta| 13.2651*** 2.90719 4.56 .0000 7.5671 18.9630

Sigmav| .10709*** .00980 10.93 .0000 .08788 .12629

--------+--------------------------------------------------------------------

Figure E62.13 Kernel Density Estimates for Gamma and Exponential Inefficiencies


E62.10.2 Technical Details on Normal-Gamma Model

The log likelihood for this model is equal to the log likelihood for the normal-exponential

model plus a term that is produced by the difference between the exponential and the gamma

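distributions:

Log L = Log L(exponential)

+ n[(P-1) log \theta - log \Gamma(P)] + \sum_i log h(P-1, \varepsilon_i),

where h(r, \varepsilon_i) = [\int_0^\infty z^r (1/\sigma_v) \phi((z - \mu_i)/\sigma_v) dz] / [\int_0^\infty (1/\sigma_v) \phi((z - \mu_i)/\sigma_v) dz], \mu_i = -\varepsilon_i - \theta\sigma_v^2.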

The normal-exponential model results if P = 1. Computation of the function h(r, \varepsilon_i) is the obstacle to

estimation. Beckers and Hammond (1987) derived a closed form expression, but the result has never

been operationalized – it is complex in the extreme. Greene (1990) attempted estimation by using a

crude approximation with Simpson's rule, but failed to obtain reasonable results. (See Ritter and

Simar (1997).)

A satisfactory solution is produced by the technique of maximum simulated likelihood. The

integral and its derivatives can be estimated consistently by Monte Carlo simulation. The crucial

result is that h(r, \varepsilon_i) is the expectation of a random variable:

h(r, \varepsilon_i) = E[z^r | z \ge 0],

where z ~ N[\mu_i, \sigma_v^2],

\mu_i = -\varepsilon_i - \theta\sigma_v^2.

Therefore, h(r, \varepsilon_i) is the expected value of z^r, where z has a normal distribution truncated at zero.

Thus, we estimate h(r, \varepsilon_i) by using the mean of a sample of draws from this distribution. For given

values of \varepsilon_i and \mu_i (i.e., y_i, x_i, \beta, \sigma_v, \theta, r), h(r, \varepsilon_i) is consistently estimated by

\hat{h}_i = (1/Q) \sum_{q=1}^{Q} z_{iq}^r,

where z_{iq} is a random draw from the truncated normal distribution with mean parameter \mu_i and

variance parameter \sigma_v^2. This produces the simulated log likelihood function

Log L_S = Log L(exponential)

+ n[(P-1) log \theta - log \Gamma(P)] + \sum_i log \hat{h}(P-1, \varepsilon_i),

which for a given set of draws is a smooth and continuous function of the parameters.


Random draws from the truncated distribution are obtained using Geweke's method as

follows: Let

L = truncation point = 0 for this application,

\mu = the mean of the untruncated distribution = -\varepsilon_i - \theta\sigma_v^2,

\sigma = the standard deviation of the untruncated distribution = \sigma_v,

P_L = \Phi[(L - \mu)/\sigma],

F = one draw from U[0,1],

z = \mu + \sigma \Phi^{-1}[P_L + F(1 - P_L)].

Then, z = the draw from the truncated distribution.

Collecting all terms, then, this produces the simulated log likelihood function:

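Log L_S = n{log \theta + ½ \theta^2 \sigma_v^2} + \sum_i {\theta d_i + log \Phi[-(d_i/\sigma_v + \theta\sigma_v)]}

+ n[(P-1) log \theta - log \Gamma(P)]

+ \sum_i log { (1/Q) \sum_{q=1}^{Q} [\mu_i + \sigma_v \Phi^{-1}(F_{iq} + (1 - F_{iq}) \Phi(-\mu_i/\sigma_v))]^{P-1} },

where

\varepsilon_i = y_i - \beta'x_i,

\mu_i = -\varepsilon_i - \theta\sigma_v^2,

and F_{iq} is a fixed set of Q draws from U[0,1] specific to the individual. Derivatives of h(r, \varepsilon_i) and log

h(r, \varepsilon_i) are also estimated by simulation. The JLMS efficiency measure has the simple form

E[u|\varepsilon_i] = h(P, \varepsilon_i) / h(P-1, \varepsilon_i).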
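The simulation of h(r, .) and of the JLMS ratio can be illustrated with a short sketch. The Python below is not LIMDEP code; the function names are hypothetical, the "draws" in the usage note are an evenly spaced grid on (0,1) chosen for reproducibility rather than random or Halton draws, and a small positive floor on z is an added numerical guard that is not part of the method described above.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal: cdf, inv_cdf


def truncated_draw(mu, sigma, F, L=0.0):
    """Inverse-CDF draw from N[mu, sigma^2] truncated from below at L."""
    P_L = STD.cdf((L - mu) / sigma)
    return mu + sigma * STD.inv_cdf(P_L + F * (1.0 - P_L))


def h_hat(r, eps, theta, sigma_v, uniforms):
    """Simulated h(r, eps): the average of z^r over truncated normal draws."""
    mu = -eps - theta * sigma_v ** 2
    total = 0.0
    for F in uniforms:
        z = max(truncated_draw(mu, sigma_v, F), 1e-12)  # guard against rounding below zero
        total += z ** r
    return total / len(uniforms)


def jlms_gamma(eps, theta, P, sigma_v, uniforms):
    """Simulated E[u|eps] = h(P, eps) / h(P-1, eps)."""
    return (h_hat(P, eps, theta, sigma_v, uniforms)
            / h_hat(P - 1.0, eps, theta, sigma_v, uniforms))
```

With P = 1 the ratio is just the mean of the truncated normal draws, so it can be compared with the closed form for the exponential model. For example, with uniforms = [(q + 0.5) / 2000 for q in range(2000)], eps = -0.1, theta = 10 and sigma_v = 0.1, the mean parameter is zero and the result should be close to 0.1 times the square root of 2/pi.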

The final consideration is the method of obtaining the draws. The default method is to use

the random number generators. Since this is a very computation intensive model, it is usually more

efficient to use Halton draws – you can use many fewer Halton draws than random draws to obtain

the same quality results. Halton draws are discussed in Section R24.7. To use Halton draws with

this estimator, add

; Halton

to the command. The number of points for either method is specified with

; Pts = the desired number of draws

We have used this feature in the example in the previous section.
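For readers unfamiliar with Halton draws, the sketch below generates the standard Halton (radical inverse) sequence for a prime base. It is a generic Python illustration, not a description of how LIMDEP generates its draws (see Section R24.7 for that); the function names and the skip argument are hypothetical.

```python
def halton(index, base=2):
    """Radical-inverse value in (0,1) of a positive integer index in a prime base."""
    result, f = 0.0, 1.0
    while index > 0:
        f /= base
        result += f * (index % base)
        index //= base
    return result


def halton_sequence(n_points, base=2, skip=0):
    """The first n_points Halton values after discarding `skip` initial points."""
    return [halton(i, base) for i in range(1 + skip, n_points + skip + 1)]
```

In base 2 the sequence begins 0.5, 0.25, 0.75, so successive points fill the unit interval evenly. That even coverage is why far fewer Halton points than pseudorandom draws are typically needed for the same accuracy.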


E62.11 Sample Selection in a Stochastic Frontier Model

This model is a counterpart to familiar models of sample selection. See Greene (2010) for

details on the methodology. Additional results appear in Terza (2010). The model is a familiar

sample selection form

d* = \alpha'z + w, d = 1(d* > 0),

y = \beta'x + v - u,

u = |U| with U ~ N[0, \sigma_u^2],

(v,w) ~ bivariate normal with [(0,0), (\sigma_v^2, \rho\sigma_v, 1)],

(y,x) only observed when d = 1.

Thus, the selection operates through the heterogeneity component of the production model, not the

inefficiency. (Thus, observation is not viewed as a function of the level of inefficiency.)

The model is fit by maximum simulated likelihood. To request it, use LIMDEP's usual

format for sample selection models,

PROBIT ; Lhs = d ; Rhs = variables in z ; Hold $

FRONTIER ; Lhs = y ; Rhs = variables in x ; Selection $

The model must be the base case, half normal, with no panel data application, no truncation, or

heteroscedasticity, etc. You may control the simulations with ; Halton and ; Pts for the simulation.

Efficiency and inefficiency estimates are saved as with other models with ; Eff and ; Techeff.

However, observations in the nonselected part of the sample are given missing values (-999) for any

of these computations. The PARTIALS and SIMULATE commands do not inherit the selection

model – these commands are not available after fitting this model.

E62.11.1 Application

The following creates a data set that conforms exactly to the assumptions of the model.

CALC ; Ran(123457) $

SAMPLE ; 1-2000 $

CREATE ; z1 = Rnn(0,1) ; z2 = Rnn(0,1) $

CREATE ; v1 = Rnn(0,1) ; v2 = Rnn(0,1) $

CREATE ; e1 = v1 ; e2 = .7071 * (v1+v2) $

CREATE ; ds = z1 + z2 + e1 ; d = ds > 0 $

CREATE ; u = Abs(Rnn(0,1)) ; x1 = Rnn(0,1) ; x2 = Rnn(0,1) $

CREATE ; y = x1 + x2 + e2 - u $

PROBIT ; Lhs = d ; Rhs = one,z1,z2 ; Hold $

FRONTIER ; Lhs = y ; Rhs = one,x1,x2 ; Selection $
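The design of this experiment implies \sigma_u = \sigma_v = 1 and a correlation of .7071 between the selection disturbance and v, since e2 = .7071(v1 + v2) has unit variance and covariance .7071 with e1 = v1. The Python sketch below mirrors the CREATE commands to confirm this. It is an illustration only: the random number generator differs from LIMDEP's, so the seed will not reproduce the sample used above, and the function names are hypothetical.

```python
import math
import random


def simulate(n=2000, seed=123457):
    """Generate (d, y, x1, x2, z1, z2, e1, e2) rows following the CREATE commands."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        v1, v2 = rng.gauss(0, 1), rng.gauss(0, 1)
        e1 = v1                        # selection disturbance w
        e2 = 0.7071 * (v1 + v2)        # frontier disturbance v, variance about 1
        d = 1 if z1 + z2 + e1 > 0 else 0
        u = abs(rng.gauss(0, 1))       # half normal inefficiency
        x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
        y = x1 + x2 + e2 - u
        rows.append((d, y, x1, x2, z1, z2, e1, e2))
    return rows


def correlation(a, b):
    """Ordinary (Pearson) correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)
```

In a large simulated sample, correlation of the e1 and e2 columns should be near .7071 and about half of the observations should be selected.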


-----------------------------------------------------------------------------

Binomial Probit Model

Dependent variable D


--------+--------------------------------------------------------------------


D| Coefficient Error z |z|>Z* Interval

--------+--------------------------------------------------------------------

|Index function for probability

Constant| .03616 .03525 1.03 .3051 -.03294 .10525

Z1| .96314*** .04604 20.92 .0000 .87291 1.05338

Z2| 1.01534*** .04702 21.59 .0000 .92318 1.10750

--------+--------------------------------------------------------------------


Normal exit: 14 iterations. Status=0, F= 1916.202

-----------------------------------------------------------------------------


Dependent variable Y



Inf.Cr.AIC = 3844.4 AIC/N = 1.922

Variances: Sigma-squared(v)= 1.00545

Sigma-squared(u)= 1.07396

Sigma(u) = 1.03632

Sigma(v) = 1.00272

Sigma = 1.44202

Lambda = 1.03351

Sample Selection/Frontier Model

Murphy/Topel Corrected VC Matrix







Chi-sq=2*[LogL(SF)-LogL(LS)] = -507.754

Kodde-Palm C*: 95%: 2.706, 99%: 5.412

-----------------------------------------------------------------------------

--------+--------------------------------------------------------------------


Y| Coefficient Error z |z|>Z* Interval

--------+--------------------------------------------------------------------


Constant| -.04492 .10971 -.41 .6822 -.25994 .17011

X1| 1.00102*** .03357 29.82 .0000 .93522 1.06682

X2| .95627*** .03195 29.93 .0000 .89364 1.01890

Sigma(u)| 1.03632*** .13217 7.84 .0000 .77728 1.29537

Sigma(v)| 1.00272*** .05471 18.33 .0000 .89549 1.10995

Rho(w,v)| .77553*** .06187 12.54 .0000 .65427 .89679

--------+--------------------------------------------------------------------


E62.11.2 Log Likelihood and Estimation Method

Write the model structure as

d* = \alpha'z + w, w ~ N[0,1], d = 1(d* > 0),

y = \beta'x + \sigma_v v - \sigma_u u,

u = |U| with U ~ N[0,1],

(v,w) ~ bivariate normal with [(0,0), (1, \rho, 1)],

(y,x) only observed when d = 1.

(Note for convenience later, we have moved the scale parameters into the structural model.) To set

up the estimator, we now write w in its conditional on v form,

w|v = \rho v + h, where h ~ N[0, (1 - \rho^2)] and h is independent of v.

Therefore, d*|v = \alpha'z + \rho v + h, d = 1(d* > 0|v).

Then, Prob[d = 1 or 0 | z,v] = \Phi[(2d - 1)(\alpha'z + \rho v) / \sqrt{1 - \rho^2}].

For the selected observations, d = 1, conditioned on v, the joint density for y and d is the product of

the marginals since conditioned on v, y and d are independent;

f(y, d = 1|x,z,v) = f(y|x,v) Prob(d = 1|z,v).

We have the second part above. For the first part,

y|x,v = (\beta'x + \sigma_v v) - \sigma_u u,

where u is the truncation at zero of a standard normal variable, so f(u) = 2\phi(u), u > 0. The Jacobian of

the transformation from u to y is 1/\sigma_u, so by the change of variable, the conditional density is

f(y|x,v) = (2/\sigma_u) \phi[(\beta'x + \sigma_v v - y)/\sigma_u], (\beta'x + \sigma_v v - y) \ge 0.

Therefore, the joint conditional density is

f(y, d = 1|x,z,v) = (2/\sigma_u) \phi[(\beta'x + \sigma_v v - y)/\sigma_u] \Phi[(\alpha'z + \rho v)/\sqrt{1 - \rho^2}].


To obtain the unconditional density, it is necessary to integrate v out of the conditional density.

Thus,

f(y, d = 1|x,z) = \int_v (2/\sigma_u) \phi[(\beta'x + \sigma_v v - y)/\sigma_u] \Phi[(\alpha'z + \rho v)/\sqrt{1 - \rho^2}] f(v) dv.

The relevant term in the log likelihood is log f(y,d=1|x,z). For the nonselected observations, the

contribution to the log likelihood is the log of the unconditional probability of nonselection, which is

Prob(d = 0|z) = \int_v \Phi[-(\alpha'z + \rho v)/\sqrt{1 - \rho^2}] f(v) dv.

The integrals do not exist in closed form, so these terms cannot be evaluated as is. Before

proceeding, we note the additional complication, \beta'x + \sigma_v v - y = \sigma_u u > 0, so the density f(v) is not the

standard normal that intuition might suggest; it is a truncated normal.

The integrals can be computed by simulation. By construction,

\int_v (2/\sigma_u) \phi[(\beta'x + \sigma_v v - y)/\sigma_u] \Phi[(\alpha'z + \rho v)/\sqrt{1 - \rho^2}] f(v) dv = E_v{ (2/\sigma_u) \phi[(\beta'x + \sigma_v v - y)/\sigma_u] \Phi[(\alpha'z + \rho v)/\sqrt{1 - \rho^2}] },

so by sampling from the distribution of v, we can compute the function of v and average to obtain the

integrals. In order to sample the draws on v, we note the implied truncation,

v > (y - \beta'x)/\sigma_v or v > \varepsilon/\sigma_v.

Draws from the truncated normal can be obtained using result (E-1) in Greene (2011). Let A_r equal a

draw from the uniform (0,1) population. The desired draw from the truncated normal distribution

will be

v_r = \Phi^{-1}[\Phi(\varepsilon/\sigma_v) + A_r \Phi(-\varepsilon/\sigma_v)].

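Collecting all terms, then, the simulated log likelihood will be

log L_S = \sum_i log (1/R) \sum_{r=1}^{R} { d_i (2/\sigma_u) \phi[(\beta'x_i + \sigma_v v_{ir} - y_i)/\sigma_u] \Phi[(\alpha'z_i + \rho v_{ir})/\sqrt{1 - \rho^2}] + (1 - d_i) \Phi[-(\alpha'z_i + \rho v_{ir})/\sqrt{1 - \rho^2}] },

where the draws on v_{ir} are as shown above. Derivatives of this simulated log likelihood are obtained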

numerically using finite differences.
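The contribution of a single observation to the simulated log likelihood can be sketched as follows. This Python fragment is an illustration under stated assumptions, not the LIMDEP implementation: xb and za stand for the two index functions evaluated at given parameter values, the function names are hypothetical, and for nonselected observations (where y is not observed and no truncation is implied) the sketch assumes untruncated standard normal draws on v, a detail the text above does not spell out.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal: pdf, cdf, inv_cdf


def draw_v(eps, sigma_v, A):
    """Draw v from the standard normal truncated to v > eps/sigma_v."""
    lower = STD.cdf(eps / sigma_v)
    return STD.inv_cdf(lower + A * (1.0 - lower))


def simulated_contribution(d, y, xb, za, sigma_u, sigma_v, rho, uniforms):
    """log of the simulated likelihood for one observation.

    d: selection indicator; y: outcome (ignored when d == 0);
    xb: frontier index; za: selection index; uniforms: values in (0,1).
    """
    scale = math.sqrt(1.0 - rho * rho)
    total = 0.0
    for A in uniforms:
        if d == 1:
            v = draw_v(y - xb, sigma_v, A)
            gap = (xb + sigma_v * v - y) / sigma_u    # equals u >= 0 by construction
            total += (2.0 / sigma_u) * STD.pdf(gap) * STD.cdf((za + rho * v) / scale)
        else:
            v = STD.inv_cdf(A)                        # assumption: untruncated draw
            total += STD.cdf(-(za + rho * v) / scale)
    return math.log(total / len(uniforms))
```

Two checks follow from the algebra: when rho = 0 the selected contribution separates into log \Phi(za) plus a term that does not involve za, and the nonselected contribution should average to approximately log \Phi(-za) for any rho.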


E63: Heteroscedasticity and Truncation in Stochastic Frontier Models

E63.1 Introduction

This chapter develops several extensions of the stochastic frontier model presented in

Chapter E62. The four models considered here are as follows:

Heteroscedasticity in v and/or u

Truncated normal with nonzero, heterogeneous mean in the underlying U

Heterogeneity in the parameter of the exponential or gamma distribution

Amsler et al.'s 'scaling model'

E63.2 Heteroscedasticity and Heterogeneity

In the development of the frontier model, an important question concerns how to introduce

observed heterogeneity into the specification. Suppose the vector of variables zi contains the

information. For example, in the airline data, we have data on load factor, stage length and number

of points in the route map, that may also impact production, cost and efficiency. In the model

proposed thus far, the only point at which one might introduce zi appears to be in the goal function

itself, which would become

y_i = \beta'x_i + \alpha'z_i + v_i - u_i.

This is a common approach. (See, e.g., Greene (2004a,b).) In this chapter, we present two other

methods of introducing observed heterogeneity in the frontier model, in the variance parameters and

in the mean of the underlying inefficiency.

E63.2.1 Heterogeneity in the Scale Parameters

A natural departure point is to allow observable variation in \sigma_v^2 and/or \sigma_u^2. For the first of

these, the term heteroscedasticity is appropriate. (The papers by Hadri et al. (1999, 2003a,b) develop

heteroscedasticity models for frontier specifications.) For the second of these, a result which seems

routinely to be overlooked in the literature is that allowing \sigma_u^2 to vary over observations, call it \sigma_{u,i}^2,

induces more than just heteroscedasticity. Unavoidably in all model specifications, when this

parameter varies over individuals, then both the variance and the mean of u_i do also. For the half

normal model, regardless of how \sigma_{u,i} varies,

E[u_i] = \sigma_{u,i} \phi(0)/\Phi(0) = 0.79788\sigma_{u,i}.

A like result emerges in the truncated normal model. In the exponential model, the mean of u_i equals its

standard deviation, while in the gamma model, it is a multiple, P^{1/2}, of it. Thus, in all cases, as regards

u_i, the term heteroscedasticity, while not inappropriate, is nonetheless ambiguous. These models cannot

be heteroscedastic without also having a heterogeneous mean. In what follows, therefore, we continue

to use the familiar terminology, but we emphasize the nature of the model as well.
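The link between the mean and the scale of u_i is easy to verify numerically. The short Python sketch below (hypothetical function names, not LIMDEP code) evaluates the mean and standard deviation of u for the half normal case and for the gamma case, with the exponential as P = 1.

```python
import math


def half_normal_moments(sigma_u):
    """Mean and standard deviation of u = |U|, U ~ N[0, sigma_u^2]."""
    mean = sigma_u * math.sqrt(2.0 / math.pi)      # sigma_u * phi(0)/Phi(0)
    sd = sigma_u * math.sqrt(1.0 - 2.0 / math.pi)
    return mean, sd


def gamma_moments(theta, P=1.0):
    """Mean and standard deviation of a gamma(theta, P) variable; P = 1 is exponential."""
    mean = P / theta
    sd = math.sqrt(P) / theta
    return mean, sd
```

Doubling sigma_u doubles both half normal moments, so any variable that shifts the variance of u_i necessarily shifts its mean as well.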


The models of scale heterogeneity may extend either variance parameter with the

specification of the variance functions

Var[U|z_i] = \sigma_{u,i}^2 = \sigma_u^2 exp(\delta'z_i) (heteroscedastic)

Var[v|w_i] = \sigma_{v,i}^2 = \sigma_v^2 exp(\gamma'w_i) (heteroscedastic)

Var[U|z_i] = \sigma_u^2 exp(\delta'z_i) and Var[v|w_i] = \sigma_v^2 exp(\gamma'w_i) (doubly heteroscedastic)

There is no requirement that the same variables enter the two functions, and either or both may be

heterogeneous. The model specification is

; Heteroscedasticity or ; Het

and either or both of

; Hfv = variables in the variance of v

; Hfu = variables in the variance of u

If either variance is not given, it is assumed to be constant. The variance function is the exponential

format used throughout LIMDEP. If either variance is unspecified, the implied model is \sigma_{j,i}^2 = exp(constant),

j = v or u, which is the same as

; Hfv = one or ; Hfu = one

If both are unspecified, then the implied model

; Het ; Hfv = one ; Hfu = one

is the default, normal-half normal stochastic frontier model. It provides identical estimates. (Try it.)

A constant (one) is automatically inserted into both lists if you do not include it. This form may be

used with the normal-half normal and normal-truncated normal models.

E63.2.2 Exponential and Gamma Models with Heterogeneity

The one sided component of the normal-exponential and normal-gamma models is

parameterized with a scale parameter, \theta, which is thus far taken to be a constant. In these models,

E[u_i] = P/\theta = P\sigma_u,

where P = 1 in the exponential model. The exponential heteroscedasticity model for u_i is extended to

these two models by using

\theta_i = exp(-\delta'z_i).

With this parameterization, the estimates from this model will be comparable to those for the half

normal and truncated normal models. (See the examples below.) To request this form, use

; Het ; Hfu = the list of variables.


The list should not contain a constant term, one. This may be used in all implementations of the

exponential and gamma models. Note, however, that in the panel data settings, the parameter \theta_i is assumed

to be time invariant. The values for z_i are taken from the data record for the last period for firm i.

We will return to this subject below. The symmetric component, v, may also be heteroscedastic, as

in the other models, with

; Hfv = list of variables.

E63.2.3 Efficiency Estimation with Heteroscedasticity

This extension does not change the computation of measures of efficiency or inefficiency.

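The central results are the JLMS estimators,

\hat{E}[u|\varepsilon] = [\sigma\lambda/(1 + \lambda^2)] [\phi(w)/(1 - \Phi(w)) - w], w = S\varepsilon\lambda/\sigma,

for the half normal models and

\hat{E}[u|\varepsilon] = \sigma_v [\phi(w)/(1 - \Phi(w)) - w], w = S\varepsilon/\sigma_v + \theta\sigma_v,

for the exponential models, where \varepsilon = v - Su. These functions are evaluated for each observation at

\lambda_i = \sigma_{u,i} / \sigma_{v,i}

and \sigma_i^2 = \sigma_{u,i}^2 + \sigma_{v,i}^2

for the half normal model and \sigma_{v,i} and \theta_i likewise in the exponential and gamma models.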
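As a sketch of how these formulas are applied observation by observation, the Python functions below evaluate the JLMS estimator from observation-specific scale parameters. They are illustrative only: the function names are hypothetical, S = 1 denotes a production frontier and S = -1 a cost frontier, and the inputs sigma_u, sigma_v and theta would be the fitted values of the variance and scale functions for the observation in question.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal: pdf, cdf


def hazard_term(w):
    """phi(w)/(1 - Phi(w)) - w, with 1 - Phi(w) computed as Phi(-w)."""
    return STD.pdf(w) / STD.cdf(-w) - w


def jlms_half_normal(eps, sigma_u, sigma_v, S=1.0):
    """E[u|eps] for the normal-half normal model."""
    sigma = math.sqrt(sigma_u ** 2 + sigma_v ** 2)
    lam = sigma_u / sigma_v
    w = S * eps * lam / sigma
    return sigma * lam / (1.0 + lam * lam) * hazard_term(w)


def jlms_exponential(eps, theta, sigma_v, S=1.0):
    """E[u|eps] for the normal-exponential model."""
    w = S * eps / sigma_v + theta * sigma_v
    return sigma_v * hazard_term(w)
```

In the heteroscedastic models the same functions are simply called with the values for observation i, for example sigma_u equal to the square root of the fitted variance function for u. Technical efficiency then follows as exp(-E[u|eps]).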

E63.2.4 Application

The estimates below show a production frontier based on the six inputs. The second set of

results presents the heteroscedastic model, with the variance of v a function of the log of the average

stage length and the variance of u depending on the load factor and the log of the number of points

served. We examine the efficiency results, then compute the average partial effects of the

environmental variables on technical efficiency.

FRONTIER ; Lhs = lq ; Rhs = one,ll,lp,lf,le,lm,lk ; Techeff = eu $

FRONTIER ; Lhs = lq ; Rhs = one,ll,lp,lf,le,lm,lk ; Techeff = euhet

; Het ; Hfv = lstage ; Hfu = loadfctr,points $

PARTIALS ; Effects: lstage / loadfctr / points ; Summary $

KERNEL ; Rhs = eu,euhet

; Title = Kernel Estimators for Technical Efficiency $

PLOT ; Lhs = eu ; Rhs = euhet ; Rh2 = eu ; Fill ; Grid

; Title = Estimates of Technical Efficiency

; Vaxis = exp(-E[u|e]) for Heteroscedastic Model $


-----------------------------------------------------------------------------







Sigma(v) = .13791

Sigma(u) = .13007

Sigma = Sqr[(s^2(u)+s^2(v)]= .18957


Var[u]/{Var[u]+Var[v]} = .24425









Kodde-Palm C*: 95%: 2.706, 99%: 5.412

--------+--------------------------------------------------------------------



--------+--------------------------------------------------------------------


Constant| -2.98823*** .72136 -4.14 .0000 -4.40206 -1.57439

LL| -.42909*** .06315 -6.79 .0000 -.55287 -.30530

LP| .44533*** .09498 4.69 .0000 .25917 .63149

LF| .37257*** .07038 5.29 .0000 .23463 .51052

LE| 2.09473*** .68790 3.05 .0023 .74647 3.44299

LM| .69910*** .07580 9.22 .0000 .55054 .84766

LK| -2.09806*** .76556 -2.74 .0061 -3.59853 -.59759


Lambda| .94309*** .16870 5.59 .0000 .61244 1.27373

Sigma| .18957*** .00064 297.81 .0000 .18832 .19082

--------+--------------------------------------------------------------------


-----------------------------------------------------------------------------


-----------------------------------------------------------------------------





Inf.Cr.AIC = -274.6 AIC/N = -1.073



Sigma(v) = .11367

Sigma(u) = .18907

Sigma = Sqr[(s^2(u)+s^2(v)]= .22061


Var[u]/{Var[u]+Var[v]} = .50132

Variances averaged over observations









Kodde-Palm C*: 95%: 8.761, 99%: 12.483

--------+--------------------------------------------------------------------



--------+--------------------------------------------------------------------


Constant| -3.29243*** .72664 -4.53 .0000 -4.71662 -1.86824

LL| -.47507*** .08890 -5.34 .0000 -.64932 -.30083

LP| .50435*** .10452 4.83 .0000 .29950 .70920

LF| .53204*** .07550 7.05 .0000 .38406 .68003

LE| 2.36654*** .69245 3.42 .0006 1.00936 3.72372

LM| .53413*** .08670 6.16 .0000 .36419 .70406

LK| -2.43136*** .77258 -3.15 .0016 -3.94558 -.91713

|Parameters in variance of v (symmetric)

Constant| -3.97891*** .86601 -4.59 .0000 -5.67626 -2.28155

LSTAGE| -.06406 .13359 -.48 .6315 -.32590 .19777

|Parameters in variance of u (one sided)

Constant| 9.96191** 4.51238 2.21 .0273 1.11781 18.80600

LOADFCTR| -25.9711*** 9.37571 -2.77 .0056 -44.3471 -7.5950

POINTS| -.00353 .01288 -.27 .7840 -.02877 .02171

--------+--------------------------------------------------------------------


-----------------------------------------------------------------------------

The figure below displays the kernel density estimators for the two sets of estimated

inefficiencies. The upper one is for the heteroscedastic model. The figure shows clearly the

influence of the heterogeneity. The means of the two distributions are virtually the same, but the

variance in the heteroscedastic model is considerably higher.


Figure E63.1 Kernel Estimators for Density of $E[u|\varepsilon]$ with and without Heteroscedasticity

Figure E63.2 Plot of Estimated Inefficiencies, Heteroscedastic vs. Homoscedastic

---------------------------------------------------------------------

Partial Effects for JLMS Estimator in Normal/het SF Model

Partial Effects Averaged Over Observations

* ==> Partial Effect for a Binary Variable

---------------------------------------------------------------------

Partial Standard


---------------------------------------------------------------------

LSTAGE -.00034 .00071 .48 -.00174 .00105

LOADFCTR .62934 .17576 3.58 .28485 .97382

POINTS .00009 .00031 .28 -.00052 .00069

---------------------------------------------------------------------



For the models with heteroscedasticity, we revert to the original structural form of the model to form the log likelihoods. For the normal-half normal model, for example, we use

$\log L_i = \tfrac{1}{2}\log(2/\pi) - \log\sigma_i - \tfrac{1}{2}(\varepsilon_i/\sigma_i)^2 + \log\Phi[-S\varepsilon_i\lambda_i/\sigma_i]$

where

$\sigma_i = \sqrt{\sigma_{ui}^2 + \sigma_{vi}^2}$

$\lambda_i = \sigma_{ui}/\sigma_{vi}$

$\sigma_{ui}^2 = \exp(\gamma' z_i)$

$\sigma_{vi}^2 = \exp(\eta' w_i)$,

and S = +1 for a production frontier and -1 for a cost frontier. Likewise, for the truncation model,

$\log L_i = -\tfrac{1}{2}\log 2\pi - \log\sigma_i - \tfrac{1}{2}[(S\varepsilon_i + \mu)/\sigma_i]^2 + \log\Phi[(\mu/\lambda_i - S\varepsilon_i\lambda_i)/\sigma_i] - \log\Phi(\mu/\sigma_{u,i})$.

We build the structure of the model with two freely varying variance parameters, $\sigma_{u,i}$ and $\sigma_{v,i}$, rather than the reduced form parameters $\lambda$ and $\sigma$. The use of $\lambda_i$ as a free parameter would not be appropriate because the numerator and denominator of $\lambda_i$ must be allowed to vary freely and independently. A like consideration rules out the composed parameter $\sigma_i$. The formulation of the log likelihood and its derivatives follows the results given earlier for the homogeneous cases. Where the derivatives with respect to $\lambda$ and $\sigma$ emerge, we use the chain rule to differentiate with respect to $\sigma_{u,i}$ and $\sigma_{v,i}$ first. Note that the independent parameters $\sigma_u$ and $\sigma_v$ have been absorbed into the exponential functions. Thus, $\sigma_v^2$ is $\exp(\eta_0)$, where $\eta_0$ is the constant term in $\eta' w_i$. This ensures that the variances are always positive.
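The structural form above can be evaluated directly for a single observation. The following is a minimal Python sketch, not LIMDEP's implementation; the function and argument names (`loglik_half_normal_het`, `coef_u`, `coef_v`) are illustrative assumptions. It takes the composed residual `eps`, the variable vectors for the two variance functions, and their coefficient vectors, with the constant terms included in the lists.

```python
import math

def norm_cdf(x):
    """Standard normal CDF, written with erfc for accuracy in the lower tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def dot(coefs, values):
    """Inner product of a coefficient list and a data list."""
    return sum(c * v for c, v in zip(coefs, values))

def loglik_half_normal_het(eps, z, w, coef_u, coef_v, S=1):
    """Log likelihood of one observation in the normal-half normal model with
    sigma_u_i^2 = exp(coef_u'z) and sigma_v_i^2 = exp(coef_v'w).
    S = +1 for a production frontier, -1 for a cost frontier."""
    var_u = math.exp(dot(coef_u, z))      # always positive
    var_v = math.exp(dot(coef_v, w))      # always positive
    sigma = math.sqrt(var_u + var_v)      # composed standard deviation
    lam = math.sqrt(var_u / var_v)        # sigma_u_i / sigma_v_i
    return (0.5 * math.log(2.0 / math.pi)
            - math.log(sigma)
            - 0.5 * (eps / sigma) ** 2
            + math.log(norm_cdf(-S * eps * lam / sigma)))

# With only constant terms equal to zero, both variances equal one.
value = loglik_half_normal_het(0.0, [1.0], [1.0], [0.0], [0.0])
print(value)
```

Because both variances are built from exponential functions, no sign restriction on the coefficients is needed during optimization, which is the point made in the text.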

The normal-gamma and normal-exponential models are not reparameterized. The log likelihood for the exponential model with variance heterogeneity is

$\log L_i = \log\theta_i + \tfrac{1}{2}\theta_i^2\sigma_{i,v}^2 + \theta_i S\varepsilon_i + \log\Phi[-S\varepsilon_i/\sigma_{i,v} - \theta_i\sigma_{i,v}]$

where $\theta_i = \exp(-\gamma' z_i)$

and $\sigma_{i,v} = \sigma_v \exp(\eta' w_i)$.

The sign change in $\theta_i$ is used to make the normal-exponential model comparable to the normal-half normal model, since $\mathrm{Var}[u_i] = 1/\theta_i^2$.
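As a check on the normal-exponential expression, the sketch below evaluates it for given values of the rate parameter and the standard deviation of v, then confirms numerically that the implied density integrates to one. It is an illustration under stated assumptions, not the program's code: the names `loglik_exponential_het` and `total_probability` are hypothetical, and the observation-specific values are passed in directly rather than built from the variable lists.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via erfc (stable in the lower tail)."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def loglik_exponential_het(eps, theta, sigma_v, S=1):
    """Log likelihood of one observation in the normal-exponential model.
    theta is the exponential rate, so Var[u] = 1/theta^2.
    Arguments far in the lower tail of the CDF would underflow to log(0)."""
    arg = -S * eps / sigma_v - theta * sigma_v
    return (math.log(theta)
            + 0.5 * theta ** 2 * sigma_v ** 2
            + theta * S * eps
            + math.log(norm_cdf(arg)))

def total_probability(theta, sigma_v, lo=-12.0, hi=6.0, n=18000):
    """Trapezoid-rule integral of exp(log L) over eps; should be close to 1."""
    h = (hi - lo) / n
    total = 0.0
    for k in range(n + 1):
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * math.exp(loglik_exponential_het(lo + k * h, theta, sigma_v))
    return total * h

print(total_probability(theta=2.0, sigma_v=0.5))
```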


E63.3 The Normal-Truncated Normal Model

The normal-truncated normal model relaxes an implicit restriction in the normal-half normal model, that the mean of the underlying inefficiency variable is zero. The extended model is obtained by allowing $\mu$, the mean of U, to be nonzero;

$y = \beta' x + v - u, \quad u = |U|$

$U \sim N[\mu, \sigma_u^2]$

$v \sim N[0, \sigma_v^2]$

(With a constant term in the model, no similar parameter can be introduced into the distribution of v.)

The command for estimating this model is

FRONTIER ; Lhs = dependent variable

; Rhs = one, other independent variables

; Model = Truncated Normal $ (or ; Model = T)

The specification of the cost frontier and the estimator of technical inefficiency are requested in the

same fashion,

; Cost

and ; Eff = variable name

Other optional parts of the command are the same as that for the normal-half normal model.

We note that this model is extremely volatile, owing to the rather weak identification of the parameter $\mu$. It is difficult to distinguish the mean from the variance parameter in this model. In the truncation model,

$E[u_i] = \mu + \sigma_u \phi(\mu/\sigma_u)/\Phi(\mu/\sigma_u)$.

This implies that $\sigma_u$ and $\mu$ can covary so as to produce little or no variation in the expectation of $u_i$. The likelihood is not a function of the square of $u_i$, so this mean is the only source of information about these two parameters. (By totally differentiating the expected value, one can solve for the implicit relationship, $d\mu/d\sigma_u$, that produces $dE[u_i] = 0$.) The example below suggests how this aspect of the model influences (or fails to influence) the estimates of inefficiency. For purposes of the JLMS estimator for the half normal model, when the mean of U is a nonzero $\mu$, the argument to the function is replaced with

$w = S\varepsilon_i\lambda/\sigma - \mu/(\sigma\lambda)$.

The remaining part of the computation is the same.
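The weak identification described above can be seen numerically. The sketch below is an illustration only; the function names and the bisection search are assumptions introduced here, not part of LIMDEP. It computes the expected inefficiency, the location parameter plus the scale parameter of U times the ratio of the standard normal density to the standard normal CDF evaluated at location/scale, and then finds, for several values of the scale parameter, the location parameter that reproduces the same expected value. The scale values 0.13007 and 1.57738 are the Sigma(u) estimates reported in the application in Section E63.3.1.

```python
import math

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def mean_inefficiency(mu, sigma_u):
    """E[u] for the truncation model: mu + sigma_u * pdf(a) / cdf(a), a = mu/sigma_u."""
    a = mu / sigma_u
    return mu + sigma_u * norm_pdf(a) / norm_cdf(a)

def mu_giving_mean(target, sigma_u):
    """Bisection for the mu that yields E[u] = target at a given sigma_u.
    E[u] is increasing in mu, so the bracket below is sufficient for targets
    between roughly sigma_u/30 and 30*sigma_u."""
    lo, hi = -30.0 * sigma_u, 30.0 * sigma_u
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mean_inefficiency(mid, sigma_u) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Target: the half normal mean (mu = 0) at sigma_u = 0.13007.
target = mean_inefficiency(0.0, 0.13007)
pairs = [(mu_giving_mean(target, s), s) for s in (0.13007, 0.5, 1.57738)]
for mu, s in pairs:
    print(f"sigma_u = {s:8.5f}  mu = {mu:10.5f}  E[u] = {mean_inefficiency(mu, s):.6f}")
```

Very different pairs of values give the same expected inefficiency, which is why a large negative estimate of the mean paired with a large estimate of the scale can fit about as well as the half normal model.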


E63.3.1 Application

The results below show estimates of a stochastic production frontier with the half normal and then the truncated normal specifications. The additional parameterization appears to have had a large impact on the results; the estimates are noticeably different. The plot of the two sets of inefficiency estimates suggests that the effect of the new specification has been little more than to double the estimated values from the model; the dashed line in the figure shows the function $u_{TN} = 2u_{HN}$. The extremely large estimates of $\mu$ and its standard error do suggest that something is amiss with the model, however.

The commands are:

FRONTIER ; Lhs = lq ; Rhs = one,ll,lp,lf,le,lm,lk ; Techeff = u $

FRONTIER ; Lhs = lq ; Rhs = one,ll,lp,lf,le,lm,lk ; Techeff = ut ; Model = T $

PLOT ; Lhs = u ; Rhs = ut ; Rh2 = u ; Fill ; Grid

; Title = Truncated Normal Inefficiencies vs. Half Normal $

DSTAT ; Rhs = u,ut $

-----------------------------------------------------------------------------






Sigma(v) = .13791

Sigma(u) = .13007

Sigma = Sqr[(s^2(u)+s^2(v)]= .18957


Var[u]/{Var[u]+Var[v]} = .24425









Kodde-Palm C*: 95%: 2.706, 99%: 5.412

--------+--------------------------------------------------------------------



--------+--------------------------------------------------------------------


Constant| -2.98823*** .72136 -4.14 .0000 -4.40206 -1.57439

LL| -.42909*** .06315 -6.79 .0000 -.55287 -.30530

LP| .44533*** .09498 4.69 .0000 .25917 .63149

LF| .37257*** .07038 5.29 .0000 .23463 .51052

LE| 2.09473*** .68790 3.05 .0023 .74647 3.44299

LM| .69910*** .07580 9.22 .0000 .55054 .84766

LK| -2.09806*** .76556 -2.74 .0061 -3.59853 -.59759


Lambda| .94309*** .16870 5.59 .0000 .61244 1.27373

Sigma| .18957*** .00064 297.81 .0000 .18832 .19082

--------+--------------------------------------------------------------------


-----------------------------------------------------------------------------


-----------------------------------------------------------------------------







Sigma(v) = .13771

Sigma(u) = 1.57738

Sigma = Sqr[(s^2(u)+s^2(v)]= 1.58338


Var[u]/{Var[u]+Var[v]} = .97946










Kodde-Palm C*: 95%: 2.706, 99%: 5.412

--------+--------------------------------------------------------------------



--------+--------------------------------------------------------------------


Constant| -3.11541*** .77143 -4.04 .0001 -4.62739 -1.60343

LL| -.44532*** .07797 -5.71 .0000 -.59814 -.29249

LP| .46908*** .11368 4.13 .0000 .24628 .69188

LF| .37437*** .07465 5.02 .0000 .22807 .52068

LE| 2.20830*** .73883 2.99 .0028 .76023 3.65637

LM| .67741*** .09341 7.25 .0000 .49433 .86048

LK| -2.20620*** .82402 -2.68 .0074 -3.82126 -.59115

|Offset [mean=mu(i)] parameters in one sided error

Mu| -31.5468 5061.203 -.01 .9950 -9951.3228 9888.2292


Lambda| 11.4545 907.8501 .01 .9899 -1767.8991 1790.8081

Sigma| 1.58338 124.7546 .01 .9899 -242.93113 246.09790

--------+--------------------------------------------------------------------


-----------------------------------------------------------------------------


--------+---------------------------------------------------------------------


--------+---------------------------------------------------------------------

U| .902312 .035500 .703534 .963108 256 0

UT| .925474 .039335 .608274 .972355 256 0

--------+---------------------------------------------------------------------


Figure E63.3 Inefficiency Estimates from Truncated Normal Model

E63.3.2 Battese and Coelli (1995) Formulation

There are (apparently) two formulations of the normal-truncated normal model in the literature. The one formulated above,

$y = \beta' x + v - u, \quad u = |U|$

$U \sim N[\mu, \sigma_u^2]$

$v \sim N[0, \sigma_v^2]$

is due to Stevenson (1980). Note that the inefficiency term is the absolute value of a normally distributed variable with a nonzero mean. Battese and Coelli proposed an apparently different formulation of the truncation model;

$u = \mu + w$

where w is a truncated normal, such that

$w > -\mu$.

This is actually the same model. You can obtain the estimates using this alternative formulation with

; Model = BC95

in place of ; Model = T. The log likelihood for this formulation involves a one to one

reparameterization of the Stevenson model, which has slightly different numerical properties. You

can see this in the application below. The estimated inefficiency and efficiency values produced by

the two models are the same to five or six digits, however.


-----------------------------------------------------------------------------






Sigma(v) = .13850

Sigma(u) = 1.50235

Sigma = Sqr[(s^2(u)+s^2(v)]= 1.50872


Var[u]/{Var[u]+Var[v]} = .97715


Battese/Coelli 1995 truncated normal model








Kodde-Palm C*: 95%: 5.138, 99%: 8.273

--------+--------------------------------------------------------------------



--------+--------------------------------------------------------------------


Constant| -3.09929*** .76919 -4.03 .0001 -4.60687 -1.59172

LL| -.44370*** .07771 -5.71 .0000 -.59600 -.29140

LP| .46535*** .11351 4.10 .0000 .24288 .68781

LF| .37430*** .07432 5.04 .0000 .22863 .51997

LE| 2.18991*** .73664 2.97 .0030 .74613 3.63369

LM| .67921*** .09322 7.29 .0000 .49651 .86191

LK| -2.18647*** .82171 -2.66 .0078 -3.79700 -.57594

|Offset [mean=z(i)*delta] parameters in one sided error

Constant| -29.6062 4821.053 -.01 .9951 -9478.6972 9419.4848


Gamma| .99157 1.34377 .74 .4606 -1.64216 3.62531

SigmaSqd| 2.27624 363.5754 .01 .9950 -710.31839 714.87086

--------+--------------------------------------------------------------------

(Stevenson formulation)


--------+--------------------------------------------------------------------


Constant| -3.11541*** .77143 -4.04 .0001 -4.62739 -1.60343

LL| -.44532*** .07797 -5.71 .0000 -.59814 -.29249

LP| .46908*** .11368 4.13 .0000 .24628 .69188

LF| .37437*** .07465 5.02 .0000 .22807 .52068

LE| 2.20830*** .73883 2.99 .0028 .76023 3.65637

LM| .67741*** .09341 7.25 .0000 .49433 .86048

LK| -2.20620*** .82402 -2.68 .0074 -3.82126 -.59115


Mu| -31.5468 5061.203 -.01 .9950 -9951.3228 9888.2292


Lambda| 11.4545 907.8501 .01 .9899 -1767.8991 1790.8081

Sigma| 1.58338 124.7546 .01 .9899 -242.93113 246.09790
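The ancillary parameters in the two tables are on different scales. Assuming the usual Battese and Coelli parameterization, in which Gamma is the share of the variance of U in the sum of the two variances and SigmaSqd is that sum (an assumption about the output labels, not something stated in the text), the sketch below converts the reported Gamma and SigmaSqd to the two standard deviations. The function name `bc95_to_sigmas` is hypothetical.

```python
import math

def bc95_to_sigmas(gamma, sigma_sq):
    """Convert (Gamma, SigmaSqd) to (sigma_u, sigma_v, lambda), assuming
    Gamma = sigma_u^2 / sigma^2 and SigmaSqd = sigma_u^2 + sigma_v^2."""
    sigma_u = math.sqrt(gamma * sigma_sq)
    sigma_v = math.sqrt((1.0 - gamma) * sigma_sq)
    return sigma_u, sigma_v, sigma_u / sigma_v

# Estimates reported for the Battese/Coelli 1995 formulation above.
sigma_u, sigma_v, lam = bc95_to_sigmas(0.99157, 2.27624)
print(sigma_u, sigma_v, lam)
```

Under this assumption the converted values agree, up to rounding, with the Sigma(u) and Sigma(v) values printed in the header of the Battese and Coelli results.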


E63.3.3 Technical Details on the Truncated Normal Model

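The individual term in the log likelihood for the normal-truncated normal model is

$\log L_i = -\tfrac{1}{2}\log 2\pi - \log\sigma - \tfrac{1}{2}[(S\varepsilon_i + \mu)/\sigma]^2 - \log\Phi(\mu/\sigma_u) + \log\Phi[(\mu/\lambda - S\varepsilon_i\lambda)/\sigma]$.

The definitions above imply that

$\sigma_u = \sigma\lambda/\sqrt{1+\lambda^2}$.

Using this and the reparameterization

$\alpha = \mu/(\sigma\lambda)$

produces the log likelihood for this model,

$\log L_i = -\tfrac{1}{2}\log 2\pi - \log\sigma - \tfrac{1}{2}(S\varepsilon_i/\sigma + \alpha\lambda)^2 - \log\Phi(\alpha\sqrt{1+\lambda^2}) + \log\Phi(\alpha - S\varepsilon_i\lambda/\sigma)$.

The function is then maximized with respect to $\beta$, $\sigma$, $\lambda$ and $\alpha$. After optimization, the structural parameter $\mu$ is recovered from the result $\mu = \alpha\lambda\sigma$. For the model with heterogeneity in the mean presented in Section E63.3.4,

$\mu_i = \delta' z_i$,

we simply replace $\alpha$ with $\alpha_i = \alpha' z_i$, then recover the parameter vector $\delta$ from the same transformation as before, $\delta = \alpha\lambda\sigma$.

For purposes of the JLMS estimator for the half normal model, when the mean of U is a nonzero $\mu$, the argument to the function is replaced with

$w = S\varepsilon_i\lambda/\sigma - \mu/(\sigma\lambda)$.

The remaining part of the computation is the same.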
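The equivalence of the structural and reparameterized forms of the log likelihood can be verified numerically. The sketch below is illustrative only (function names and test values are assumptions). It evaluates the structural form in terms of the mean of U and the two standard deviations, and the reparameterized form in terms of the composed standard deviation, the ratio of the standard deviation of U to that of v, and the mean of U divided by their product.

```python
import math

def log_norm_cdf(x):
    """Log of the standard normal CDF (adequate for moderate arguments)."""
    return math.log(0.5 * math.erfc(-x / math.sqrt(2.0)))

def loglik_structural(eps, mu, sigma_u, sigma_v, S=1):
    """Normal-truncated normal log likelihood in (mu, sigma_u, sigma_v)."""
    sigma = math.hypot(sigma_u, sigma_v)
    lam = sigma_u / sigma_v
    return (-0.5 * math.log(2.0 * math.pi) - math.log(sigma)
            - 0.5 * ((S * eps + mu) / sigma) ** 2
            - log_norm_cdf(mu / sigma_u)
            + log_norm_cdf((mu / lam - S * eps * lam) / sigma))

def loglik_reparameterized(eps, alpha, sigma, lam, S=1):
    """Same log likelihood in (alpha, sigma, lambda), alpha = mu/(sigma*lambda)."""
    return (-0.5 * math.log(2.0 * math.pi) - math.log(sigma)
            - 0.5 * (S * eps / sigma + alpha * lam) ** 2
            - log_norm_cdf(alpha * math.sqrt(1.0 + lam ** 2))
            + log_norm_cdf(alpha - S * eps * lam / sigma))

# Hypothetical parameter values chosen only for the comparison.
mu, sigma_u, sigma_v, eps = 0.2, 0.3, 0.15, -0.1
sigma = math.hypot(sigma_u, sigma_v)
lam = sigma_u / sigma_v
alpha = mu / (sigma * lam)

a = loglik_structural(eps, mu, sigma_u, sigma_v)
b = loglik_reparameterized(eps, alpha, sigma, lam)
mu_recovered = alpha * lam * sigma
print(a, b, mu_recovered)
```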

E63.3.4 Heterogeneity in the Mean in the Truncation Model

The models listed above are all ‘homogeneous.’ Both the means and the variances of the

underlying disturbance distributions are constant. There are several models of heterogeneity

available as well. Use

; Model = T ; Rh2 = list of variables that enter the mean

to specify the heterogeneity in mean model, $U_i \sim N[\delta' z_i, \sigma_u^2]$. In formulating this model, though it is

not required, you should include a constant in zi (the Rh2 variables) so that the homogeneous model

becomes a special case. Also, if you are fitting a panel data version of this, note that the assumption

underlying the model is that the same ui occurs in every period. Therefore, the zi should be the

same in every period. LIMDEP will assume this is the case, and only use the Rh2 variables provided

for the first period.


E63.3.5 Truncation and Heteroscedasticity

The doubly heteroscedastic model is also available for the truncated normal stochastic frontier model. In

$y_i = \beta' x_i + v_i - u_i$

you may specify ; Model = Truncated Normal ; Rh2 = list of variables

and $\mathrm{Var}[u_i] = \sigma_u^2 \exp(\gamma' z_i)$ with

; Het ; Hfu = list of variables in zi

and/or $\mathrm{Var}[v_i] = \sigma_v^2 \exp(\eta' w_i)$ with

; Het ; Hfv = list of variables in wi

Note that since both variance functions have a free multiplicative constant, you should not include

one in either variable list.

In the absence of the Rh2 list, the mean of the underlying truncated variable is taken to be a

constant to be estimated. This formulation encompasses all of Stevenson (1980), Reifschneider and

Stevenson (1991), Huang and Liu (1994), and Battese and Coelli (1995). (Notwithstanding the

assertion in the Battese and Coelli paper, the latter is not a panel data treatment as observations are

still assumed to be independent.)

To illustrate the truncated normal estimator, we have refit the stochastic frontier production

function with a complete set of firm dummy variables (less the last one) and the load factor variable

in the mean of the underlying distribution. In the second model below, we have made the variance

of v a function of the log of the average stage length. The command set begins with a small repair to

the data set. One of the firms has no observations for the load factor, stage length or points served

variables – they are coded as zero in the data. These observations are bypassed, then the firm

dummies for the fixed effects model are assembled.

SAMPLE ; All $

REJECT ; loadfctr = 0 $

CREATE ; i = Seq(firm) $

CREATE ; Expand(i,0) $

CREATE ; lk = Log(k) $

NAMELIST ; xp = one,lf,lm,le,ll,lp,lk $

FRONTIER ; Lhs = lq ; Rhs = xp ; Model = T ; Rh2 = loadfctr,_i_ $

FRONTIER ; Lhs = lq ; Rhs = xp ; Model = T ; Rh2 = loadfctr,_i_

; Het ; Hfv = lstage $

(These are ‘true fixed effects’ models.)


-----------------------------------------------------------------------------





Inf.Cr.AIC = -324.4 AIC/N = -1.267




Sigma(v) = .09799

Sigma(u) = .06241

Sigma = Sqr[(s^2(u)+s^2(v)]= .11618


Var[u]/{Var[u]+Var[v]} = .12845










Kodde-Palm C*: 95%:38.301, 99%: 45.026

--------+--------------------------------------------------------------------



--------+--------------------------------------------------------------------


Constant| -2.92400*** .68225 -4.29 .0000 -4.26118 -1.58682

LF| .31938*** .09026 3.54 .0004 .14246 .49629

LM| .81647*** .08387 9.73 .0000 .65209 .98086

LE| 1.99934*** .64368 3.11 .0019 .73776 3.26092

LL| -.42790*** .10954 -3.91 .0001 -.64260 -.21321

LP| .42291*** .10529 4.02 .0001 .21654 .62929

LK| -2.07145*** .72267 -2.87 .0042 -3.48786 -.65503


LOADFCTR| -.83124 6.87337 -.12 .9037 -14.30280 12.64031

I01| .63250 4.90139 .13 .8973 -8.97405 10.23904

I02| .58118 4.27763 .14 .8919 -7.80282 8.96519

(Firms 3-21 omitted)

I22| .45249 4.00889 .11 .9101 -7.40480 8.30977

I23| .64687 99.45841 .01 .9948 -194.28803 195.58176

I24| -.19804 7.26011 -.03 .9782 -14.42760 14.03152


Lambda| .63686** .28984 2.20 .0280 .06879 1.20494

Sigma| .11618*** .01008 11.53 .0000 .09643 .13593

--------+--------------------------------------------------------------------


-----------------------------------------------------------------------------


-----------------------------------------------------------------------------







Sigma(u) = .10183

Sigma(v) = .07961

Sigma = Sqr[(s^2(u)+s^2(v)]= .12926









Kodde-Palm C*: 95%:38.301, 99%: 45.026

--------+--------------------------------------------------------------------



--------+--------------------------------------------------------------------


Constant| -1.98442* 1.05055 -1.89 .0589 -4.04346 .07463

LF| .45669*** .11002 4.15 .0000 .24105 .67233

LM| .59013*** .10421 5.66 .0000 .38589 .79437

LE| 1.11856 1.00928 1.11 .2677 -.85959 3.09671

LL| -.29237*** .10923 -2.68 .0074 -.50646 -.07827

LP| .31311** .14333 2.18 .0289 .03220 .59402

LK| -1.14743 1.10875 -1.03 .3007 -3.32054 1.02568

|Mean of underlying truncated distribution

LOADFCTR| -2.20067*** .42161 -5.22 .0000 -3.02701 -1.37433

I01| 1.44767*** .25736 5.63 .0000 .94326 1.95208

I02| 1.39624*** .22401 6.23 .0000 .95718 1.83529


I24| 1.29355*** .24998 5.17 .0000 .80360 1.78349

|Scale parms. for random components of e(i)

ln_sgmaU| -2.28443*** .02100 -108.79 .0000 -2.32559 -2.24328

ln_sgmaV| -3.22203*** 1.20573 -2.67 .0075 -5.58522 -.85884

|Heteroscedasticity in variance of symmetric v(i)

LSTAGE| .11855 .19755 .60 .5485 -.26865 .50574

--------+--------------------------------------------------------------------


-----------------------------------------------------------------------------


E63.4 Alvarez et al. – Equality Constrained Scaling Model

Alvarez, Amsler, Orea and Schmidt (2006) have suggested a form of the truncation model which encompasses a number of ideas in stochastic frontier modeling. Their formulation is a normal-truncated normal frontier model with

$\mu_i = \mu\,\delta' z_i$ and $\sigma_{u,i} = \sigma_u\,\delta' z_i$.

The mean and standard deviation of the underlying truncated normal variable $u_i$ are scaled by the same linear function of the data. We are skeptical of the linear scaling of the variance, and propose our usual exponential form instead. The linear form may be natural for the mean, but it allows the variance to be negative, which is unacceptable. The model used here is

$\mu_i = \mu \exp(\delta' z_i)$ and $\sigma_{u,i} = \sigma_u \exp(\gamma' z_i)$.

The Alvarez model results if $\delta = \gamma$. Otherwise, we allow these to be free and to produce another variant of the frontier model. Note that as stated, this model is now merely a variant of the normal-truncated normal model with heteroscedasticity in which the variables enter the truncation mean function in the exponential function rather than linearly.
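One consequence of the equality constraint is that the ratio of the mean to the standard deviation of the underlying truncated variable is the same for every observation, so the data only rescale the distribution of inefficiency. The sketch below illustrates this with the exponential form; the function name, coefficient values and data values are hypothetical and are not taken from the application.

```python
import math

def scaled_moments(z, mu, sigma_u, coef_mean, coef_sd):
    """Return (mu_i, sigma_u_i) with exponential scaling:
    mu_i = mu * exp(coef_mean'z), sigma_u_i = sigma_u * exp(coef_sd'z)."""
    mean_scale = math.exp(sum(c * v for c, v in zip(coef_mean, z)))
    sd_scale = math.exp(sum(c * v for c, v in zip(coef_sd, z)))
    return mu * mean_scale, sigma_u * sd_scale

mu, sigma_u = 0.5, 0.2                                   # hypothetical base parameters
data = [[0.55, 6.1], [0.62, 7.0], [0.48, 5.4]]           # hypothetical z_i values
common = [-0.6, 0.05]                                    # same coefficients in both functions

# Equality constrained case: the ratio mu_i / sigma_u_i does not vary with z_i.
ratios_constrained = []
for z in data:
    mu_i, sd_i = scaled_moments(z, mu, sigma_u, common, common)
    ratios_constrained.append(mu_i / sd_i)

# Unconstrained case: different coefficient vectors, so the ratio varies.
ratios_free = []
for z in data:
    mu_i, sd_i = scaled_moments(z, mu, sigma_u, common, [-1.5, 0.3])
    ratios_free.append(mu_i / sd_i)

print(ratios_constrained)
print(ratios_free)
```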

The equality constrained scaling model is requested with

FRONTIER ; Lhs = y ; Rhs = one, x...

; Model = Scaling

; Heteroscedasticity

; Rh2 = variables in mean of truncated distribution

; Hfu = the same list of variables $

Note that in this case, Rh2 and Hfu give the same list. To obtain the scaling model without forcing the equality of the mean and variance coefficient vectors, $\delta$ and $\gamma$, use


; Model = S


; Rh2 = variables in mean of truncated distribution

; Hfu = the same list of variables $

Note, use ; Model = Scaling in the equality constrained case and ; Model = S when the equality constraint is relaxed. (In this formulation, the variable lists could differ.) To constrain the coefficients in the mean function to zero, $\delta = 0$, which just produces the heteroscedasticity model, use


; Model = T


; Hfu = list of variables $


To constrain the coefficients in the variance function to zero, $\gamma = 0$, you would use the available setup for the truncated normal form, but with ; Model = S rather than ; Model = T to obtain the exponential scaling of the mean.


; Model = S

; Rh2 = variables in mean of truncated distribution $

Finally, with both $\delta = 0$ and $\gamma = 0$ (no scaling in either the mean or the variance), this is just the standard normal-truncated normal model.

Technical Details

The implementation of the scaling model in LIMDEP is just a version of the truncation

model with heteroscedasticity. The modifications of that model are:

The constant terms in the mean and variance are enforced by the program.

The mean function is exponential.

In the first form of the model, a constraint is imposed that the coefficients in the mean and

variance functions are the same.

As Alvarez et al. note in their paper, this model is not supported by any particular theory of the

frontier framework. They suggest it as a natural extension of the familiar model with truncation.

Rather, they argue that the unnatural form of the model would be the one with different scaling

factors in the mean and variance functions.

Application

To illustrate the scaling model, we use the airlines cost data. The cost function is fit with

truncation mean and variance functions that depend on the load factor and (log of) the average stage

length. The equality constraint is imposed in the first model and relaxed in the second.

FRONTIER ; Lhs = lc ; Cost ; Rhs = x

; Model = Scaling ; Het

; Rh2 = loadfctr,lstage

; Hfu = loadfctr,lstage $

FRONTIER ; Lhs = lc ; Cost ; Rhs = x

; Model = S ; Het

; Rh2 = loadfctr,lstage

; Hfu = loadfctr,lstage $


-----------------------------------------------------------------------------







Sigma(v) = .12361

Sigma(u) = .00169

Sigma = Sqr[(s^2(u)+s^2(v)]= .12363

Stochastic Frontier Scaling Model

Mean scale factor for E[u] = .6996

Mean scale factor for V[u] = .6996








Kodde-Palm C*: 95%:10.371, 99%: 14.325

--------+--------------------------------------------------------------------



--------+--------------------------------------------------------------------


Constant| 18.9477 27.00668 .70 .4829 -33.9844 71.8798

LY| .95234*** .02117 44.98 .0000 .91084 .99383

LY2| .07740*** .01534 5.04 .0000 .04733 .10747

LPKP| 1.50434 1.86479 .81 .4198 -2.15058 5.15926

LPLP| .12682 .08328 1.52 .1278 -.03640 .29003

LPMP| -.16640 1.21907 -.14 .8914 -2.55574 2.22294

LPEP| -.52809 .60356 -.87 .3816 -1.71105 .65488

LPFP| .00151 .02141 .07 .9436 -.04045 .04348

|Mean of Truncated Distribution, Mu then scale

Mu_0| 2.50985 11.12070 .23 .8214 -19.28633 24.30603

LOADFCTR| -.56559 3.85231 -.15 .8833 -8.11597 6.98479

LSTAGE| -.00823 .05624 -.15 .8837 -.11845 .10200

|Standard Deviation of u: Sigma(u) then scale

Sigmau_0| .00241 9.18604 .00 .9998 -18.00191 18.00673

LOADFCTR| -.56559 3.85231 -.15 .8833 -8.11597 6.98479

LSTAGE| -.00823 .05624 -.15 .8837 -.11845 .10200

|Standard deviation of v

Sigma(v)| .12361 .08711 1.42 .1559 -.04713 .29435

--------+--------------------------------------------------------------------


-----------------------------------------------------------------------------


-----------------------------------------------------------------------------







Sigma(v) = .11551

Sigma(u) = .03476

Sigma = Sqr[(s^2(u)+s^2(v)]= .19230

Stochastic Frontier Scaling Model

Mean scale factor for E[u] = .3459

Mean scale factor for V[u] = .2261








Kodde-Palm C*: 95%:10.371, 99%: 14.325

--------+--------------------------------------------------------------------



--------+--------------------------------------------------------------------


Constant| 11.6452 24.94703 .47 .6406 -37.2501 60.5405

LY| .94078*** .02140 43.97 .0000 .89884 .98272

LY2| .06680*** .01579 4.23 .0000 .03585 .09776

LPKP| .85146 1.94378 .44 .6614 -2.95828 4.66120

LPLP| .16345** .07956 2.05 .0399 .00751 .31939

LPMP| .25417 1.26886 .20 .8412 -2.23275 2.74109

LPEP| -.34167 .62932 -.54 .5872 -1.57511 .89178

LPFP| .00164 .02164 .08 .9395 -.04078 .04406

|Mean of Truncated Distribution, Mu then scale

Mu_0| 1.92288*** .44030 4.37 .0000 1.05991 2.78584

LOADFCTR| -1.74305 4.08382 -.43 .6695 -9.74720 6.26110

LSTAGE| -.01930 .04649 -.42 .6781 -.11042 .07182

|Standard Deviation of u: Sigma(u) then scale

Sigmau_0| .15374 1.11571 .14 .8904 -2.03301 2.34049

LOADFCTR| -14.5014 10.21457 -1.42 .1557 -34.5216 5.5188

LSTAGE| 1.02454 1.26499 .81 .4180 -1.45479 3.50388

|Standard deviation of v

Sigma(v)| .11551*** .00793 14.56 .0000 .09996 .13106

--------+--------------------------------------------------------------------


-----------------------------------------------------------------------------


E64: Panel Data Stochastic Frontier Models

E64.1 Introduction

The stochastic frontier model as it appears in the current literature was originally developed by Aigner, Lovell, and Schmidt (1977). The canonical formulation that serves as the foundation for other variations is their model,

$y = \beta' x + v - u$,

where y is the observed outcome (goal attainment), $\beta' x + v$ is the optimal, frontier goal (e.g., maximal production output or minimum cost) pursued by the individual, $\beta' x$ is the deterministic part of the frontier and $v \sim N[0, \sigma_v^2]$ is the stochastic part. The two parts together constitute the ‘stochastic frontier.’ The amount by which the observed individual fails to reach the optimum (the frontier) is u, where

$u = |U|$ and $U \sim N[0, \sigma_u^2]$

(change to v + u for a stochastic cost frontier or any setting in which the optimum is a minimum). In this context, u is the ‘inefficiency.’ This is the normal-half normal model, which is the basic form of the stochastic frontier model. Chapters E62 and E63 developed several versions of the stochastic frontier model suitable for cross section and pooled data sets. This chapter will develop versions of the model constructed specifically for panel data.

E64.2 Panel Data Estimators for Stochastic Frontier Models

The stochastic frontiers literature has steadily evolved since the developments of basic

random and fixed effects models by Pitt and Lee (1981) and by Cornwell, Schmidt and Sickles

(1990). All of the generally used forms of panel data models are supported in LIMDEP. The

following will document them in detail. These sections are arranged as follows:

Pitt and Lee – Time Invariant Inefficiency, Random Effects,

Cornwell, Schmidt and Sickles – Time Invariant Inefficiency, Fixed Effects,

Battese and Coelli – Time Dependent Inefficiency Models,

True Fixed Effects Models with Time Varying Inefficiency,

True Random Effects Models with Time Varying Inefficiency,

Random Parameters Stochastic Frontier Models,

Alvarez et al. – Fixed Management (Random Parameters) Model,

Latent Class Stochastic Frontier Models.

The panel models developed here will share features with other panel models in LIMDEP, as

presented in Chapters R22-R25. As in other settings, panels in all models may be unbalanced. Panels

are identified by

SETPANEL ; … $

then ; Panel

in the command, or ; Pds = group count


Nearly all of the models to be presented here actually require panel data, but a few will work, albeit not as well as otherwise, with ; Pds = 1, i.e., with a cross section. This will be specifically noted below when it is the case. In all models, the cost form as opposed to the production form is requested with

; Cost

This and other model specifications are generally the same as the cross sectional cases.

E64.3 Pitt and Lee – Time Invariant Inefficiency, Random Effects

The panel data, random effects specifications based on the model of Pitt and Lee (1981) are

$y_{it} = \alpha + \beta' x_{it} + v_{it} - S u_i$

with S = +1 for a production model and -1 for a cost model. The inefficiency component is assumed to be time invariant. The base case is the normal-half normal model

$u_i = |U_i|, \quad U_i \sim N[0, \sigma_u^2]$.

This is a direct extension of the cross section variant discussed earlier. Several model formulations

are grouped in this class. The command for the Pitt and Lee group of models is given by changing

the base case specifications to

FRONTIER ; Lhs = y ; Rhs = one, ... ; Panel $

Pitt and Lee is the default panel data model. The only necessary change for the default case is

specification of the panel with ; Panel. As in the cross section case, the normal-exponential case is

requested with

; Model = Exponential

while the normal-truncated normal is requested with

; Rh2 = one or ; Rh2 = one, additional variables

(The ; Model = T is not needed.) The truncation model may not be combined with the exponential

specification; it is only supported for the normal-truncated normal form.

NOTE: The gamma model does not have a random effects (panel data) version. The model

extensions, such as the scaling model and sample selection described in Chapter E63 likewise do not

support a Pitt and Lee style random effects version.

There is an important consideration for the truncation version with heterogeneous mean. If

you are fitting a panel data version of this model, note that the assumption underlying the model is

that the same ui occurs in every period. Therefore, the zi must be the same in every period.

LIMDEP will assume this is the case, and only use the Rh2 variables provided for the first period.


When the random effects model is estimated, maximum likelihood estimates of the cross

section models are always computed first to obtain the starting values. This will produce a full set of

results which will ignore the panel nature of the data set. A second full set of results will then follow

for the random effects model.

The model estimates retained for all cases are

b = regression parameters, $\alpha$, $\beta$,

varb = asymptotic covariance matrix.

Use ; Par to retain the additional parameters in b and varb. As seen in the applications below, the

parameters estimated in each case will differ depending on the model formulation. The ancillary

parameters that are estimated for the various models are the same ones saved by the cross section

versions. All models save sy, ybar, nreg, kreg, and logl as well as s, b, varb, etc.

WARNING: Numerous experiments and applications have suggested that the normal-truncated

normal model is a difficult one to estimate. Identification appears to be highly variable, and small

variations in the data can produce large variation in the results. The model often fails to converge

even when convergence of the restricted model with zero underlying mean is routine.

E64.3.1 Model Specifications

There are many different combinations of the components of the random effects model listed

above. The following shows the different possibilities for the Pitt and Lee model. (There are also

many combinations of these that do not use the panel data random effects form.):

NAMELIST ; x = one, … $

CREATE ; y = the outcome variable $

SETPANEL ; … $

Model 1 = pooled

FRONTIER ; Lhs = y ; Rhs = x $

Model 2 = random effects half normal

FRONTIER ; Lhs = y ; Rhs = x ; Panel $

Model 3 = random effects exponential

FRONTIER ; Lhs = y ; Rhs = x ; Panel ; Model = Exponential $

Model 4 = random effects normal heteroscedastic in u or v only

FRONTIER ; Lhs = y ; Rhs = x ; Panel ; Het ; Hfv = … $

FRONTIER ; Lhs = y ; Rhs = x ; Panel ; Het ; Hfu = … $

Model 5 = random effects normal doubly heteroscedastic

FRONTIER ; Lhs = y ; Rhs = x ; Panel ; Het ; Hfv = … ; Hfu = … $

Model 6 = random effects truncated normal

FRONTIER ; Lhs = y ; Rhs = x ; Panel ; Rh2 = one, … $

Model 7 = random effects truncated normal, singly or doubly heteroscedastic

FRONTIER ; Lhs = y ; Rhs = x ; Panel ; Rh2 = one, …

; Het ; Hfv = … ; Hfu = … $

The Pitt and Lee model forms assume that the inefficiency is time invariant. Thus, the

estimate of ui is repeated for each observation in the group. An example below illustrates.
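The repetition of a time invariant estimate within each group can be pictured with a short sketch. The code below only illustrates the data layout for an unbalanced panel; it is not how the program stores its results. The function name and the numbers are hypothetical.

```python
def repeat_by_group(group_estimates, group_sizes):
    """Expand one estimate per firm into one value per observation,
    repeating each firm's estimate T_i times (panels may be unbalanced)."""
    expanded = []
    for estimate, size in zip(group_estimates, group_sizes):
        expanded.extend([estimate] * size)
    return expanded

# Two hypothetical firms with 3 and 2 observations respectively.
per_observation = repeat_by_group([0.10, 0.30], [3, 2])
print(per_observation)
```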


E64.3.2 Applications

The following illustrates a few of the numerous formats of the random effects frontiers. The

data set used is the Swiss railroad data used in Greene (2011, Table F19.1). These data are provided

with the program as swissrailroads.lpj. The variables used here are

ct = total cost

pk = capital price

pe = electricity price

pl = labor price

q2 = passenger output – passenger km

q3 = freight output – ton km

rack = dummy variable for 'rack rail' in network

tunnel = dummy variable for network with tunnels over 300 meters on average

virage = dummy variable for networks with narrow radius curvature

narrow_t = dummy variable for narrow track (1m as opposed to standard 1.435m).

Preparing the data set includes bypassing one firm for which there is only a single year of data. For
the remaining 49 firms, Ti is a mixture of 3, 7, 10, 12 or 13. Figure E64.1 details the distribution of
group sizes.

Figure E64.1 Group Sizes for Swiss Railroad Sample

Descriptive statistics for the data are shown below. Variables with names beginning with 'M' are
firm means, repeated for each year for the firm.

We fit four models to illustrate the estimator, the pooled normal-half normal, pooled normal-

truncated (heterogeneous), basic Pitt and Lee and a full model with time invariant inefficiency,

truncation (heterogeneous) and double heteroscedasticity.


The commands are as follows:

SETPANEL ; Group = id ; Pds = ti $

REJECT ; ti = 1 $

CREATE ; lple = Log(pl/pe) ; lpke = Log(pk/pe) ; lnc = Log(ct/pe)$

NAMELIST ; x = one,lnq2,lnq3,lple,lpke $

FRONTIER ; Lhs = lnc ; Cost ; Rhs = x ; Costeff = eusfpool $

FRONTIER ; Lhs = lnc ; Cost ; Rhs = x $

FRONTIER ; Lhs = lnc ; Cost ; Rhs = x ; Panel ; Costeff = eusfp_l $

FRONTIER ; Lhs = lnc ; Cost ; Rhs = x ; Rh2 = rack,tunnel

; Het ; Hfu = virage ; Hfv = virage ; Costeff = eushet_t $

FRONTIER ; Lhs = lnc ; Cost ; Rhs = x ; panel ; Rh2 = rack,tunnel

; Het ; Hfu = virage ; Hfv = virage ; Costeff = fullmodl $

--------+---------------------------------------------------------------------

Variable| Mean Std.Dev. Minimum Maximum Cases Missing

--------+---------------------------------------------------------------------

ID| 25.48760 14.60037 1.0 51.0 605 0

YEAR| 90.91570 3.692372 85.0 97.0 605 0

NI| 12.58347 1.305259 1.0 13.0 605 0

STOPS| 20.42479 18.48285 4.0 121.0 605 0

NETWORK| 39431.66 56642.38 3898.0 376997.0 605 0

LABOREXP| 12801.95 26232.69 951.0 173549.0 605 0

STAFF| 170.3810 333.0317 11.0 1934.0 605 0

ELECEXP| 968.1521 1944.830 14.0 14737.0 605 0

KWH| 7602.221 15608.39 82.0 104923.0 605 0

TOTCOST| 22470.44 42283.57 1534.0 280871.0 605 0

NARROW_T| .676033 .468375 0.0 1.0 605 0

RACK| .234711 .424169 0.0 1.0 605 0

TUNNEL| .188430 .391379 0.0 1.0 605 0

T| 5.915702 3.692372 0.0 12.0 605 0

Q1| 813914.0 1083923 61000.0 6409000 605 0

Q2| .308145D+08 .550599D+08 409000.0 .311000D+09 605 0

Q3| .101934D+08 .527303D+08 150.0 .477000D+09 605 0

CT| 26728.37 49883.51 2120.968 307433.4 605 0

PL| 86051.77 6484.535 60932.91 104930.4 605 0

PE| .157485 .022766 .076344 .265182 605 0

PK| 4534.491 2128.307 1040.323 14466.06 605 0

VIRAGE| .715702 .451452 0.0 1.0 605 0

LABOR| 52.40245 9.598136 20.03025 73.11581 605 0

ELEC| 4.044504 1.422098 .568412 9.311660 605 0

CAPITAL| 43.55305 9.461303 23.88916 77.33154 605 0

LNCT| 11.30622 1.101691 9.462956 14.57019 605 0

LNQ1| 13.06322 1.010039 11.01863 15.67321 605 0

LNQ2| 16.31759 1.339167 12.92147 19.55500 605 0

LNQ3| 12.49439 2.716709 5.010635 19.98343 605 0

LNNET| 3.200860 .908512 1.360464 5.932237 605 0

LNPL| 13.21935 .163565 12.60449 13.77599 605 0

LNPE| -1.859557 .152870 -2.572503 -1.327338 605 0

LNPK| 10.17950 .438886 8.740266 11.37466 605 0


LNSTOP| 2.775052 .655071 1.386294 4.795791 605 0

LNCAP| 3.137572 .328311 2.123893 3.850147 604 1

MLNQ1| 13.06322 1.005089 11.16747 15.59433 605 0

MLNQ2| 16.31759 1.333346 13.20185 19.45679 605 0

MLNQ3| 12.49439 2.648475 7.734539 19.68075 605 0

MLNNET| 3.200860 .906363 1.360464 5.927817 605 0

MLNPL| 13.21935 .126548 12.89796 13.61620 605 0

MLNPK| 10.17950 .396797 8.938699 11.03543 605 0

MLNSTOP| 2.775052 .651059 1.386294 4.789402 605 0

LPLE| 13.21943 .163692 12.60449 13.77599 604 1

LPKPE| 10.16419 .576094 1.0 11.37466 605 0

LNC| 11.30305 1.099836 9.462957 14.57019 604 1

--------+---------------------------------------------------------------------

This is the pooled normal-half normal model.

-----------------------------------------------------------------------------


Dependent variable LNC



Inf.Cr.AIC = 432.8 AIC/N = .717



Sigma(v) = .27077

Sigma(u) = .35119

Sigma = Sqr[(s^2(u)+s^2(v)]= .44345


Var[u]/{Var[u]+Var[v]} = .37937









Kodde-Palm C*: 95%: 2.706, 99%: 5.412

--------+--------------------------------------------------------------------


LNC| Coefficient Std.Error z Prob.|z|>Z* 95% Confidence Interval

--------+--------------------------------------------------------------------


Constant| -10.0907*** 1.14284 -8.83 .0000 -12.3306 -7.8507

LNQ2| .64179*** .01371 46.80 .0000 .61491 .66867

LNQ3| .06855*** .00655 10.46 .0000 .05570 .08139

LPLE| .53971*** .08858 6.09 .0000 .36610 .71333

LPKE| .26045*** .03260 7.99 .0000 .19655 .32435


Lambda| 1.29697*** .13854 9.36 .0000 1.02545 1.56850

Sigma| .44345*** .00056 789.05 .0000 .44235 .44455

--------+--------------------------------------------------------------------


-----------------------------------------------------------------------------
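The summary lines in the pooled output above (Lambda, Sigma and the variance share) can be reproduced from Sigma(u) and Sigma(v) alone. The following is a minimal sketch, not LIMDEP code: the function name `half_normal_summary` is hypothetical, and it assumes that the reported share uses the half normal variance Var[u] = (1 - 2/pi) times the squared Sigma(u), which matches the printed figures to the digits shown.

```python
import math

def half_normal_summary(sigma_u, sigma_v):
    """Return (lambda, sigma, Var[u]/(Var[u]+Var[v])) for a normal-half normal model."""
    lam = sigma_u / sigma_v                          # Lambda = sigma_u / sigma_v
    sigma = math.sqrt(sigma_u ** 2 + sigma_v ** 2)   # Sigma = sqrt(s^2(u) + s^2(v))
    var_u = (1.0 - 2.0 / math.pi) * sigma_u ** 2     # variance of |U|, U ~ N(0, sigma_u^2)
    var_v = sigma_v ** 2
    return lam, sigma, var_u / (var_u + var_v)

# Sigma(u) and Sigma(v) as printed for the pooled and the Pitt and Lee models
pooled = half_normal_summary(0.35119, 0.27077)
pitt_lee = half_normal_summary(0.96071, 0.07879)
print("pooled:   lambda=%.4f sigma=%.5f share=%.5f" % pooled)
print("Pitt-Lee: lambda=%.4f sigma=%.5f share=%.5f" % pitt_lee)
```

Note that the share is based on the variance of the one sided term, which is smaller than the squared Sigma(u), so it is not simply the ratio of the squared scale parameters.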


This is the original Pitt and Lee normal-half normal model with time invariant inefficiency.
In comparison to the pooled model above, σu has nearly tripled and σv has decreased by roughly
two thirds. The assumption of time invariance of the inefficiency produces a large reallocation of
the random components between noise and inefficiency. This is evident in the kernel estimate
below as well.

-----------------------------------------------------------------------------





Inf.Cr.AIC = -1040.2 AIC/N = -1.722

Stochastic frontier based on panel data

Estimation based on 49 individuals



Sigma(v) = .07879

Sigma(u) = .96071

Sigma = Sqr[(s^2(u)+s^2(v)]= .96394


Var[u]/{Var[u]+Var[v]} = .98183









Kodde-Palm C*: 95%: 2.706, 99%: 5.412

--------+--------------------------------------------------------------------



--------+--------------------------------------------------------------------


Constant| -7.25643*** .24767 -29.30 .0000 -7.74185 -6.77101

LNQ2| .36259*** .01503 24.12 .0000 .33312 .39205

LNQ3| .01902*** .00240 7.94 .0000 .01432 .02372

LPLE| .64148*** .02112 30.38 .0000 .60009 .68287

LPKE| .30842*** .00700 44.08 .0000 .29471 .32214


Lambda| 12.1932** 5.55909 2.19 .0283 1.2975 23.0888

Sigma(u)| .96071*** .13303 7.22 .0000 .69998 1.22145

--------+--------------------------------------------------------------------


-----------------------------------------------------------------------------


This is the pooled normal-truncated and doubly heteroscedastic normal model.

-----------------------------------------------------------------------------





Inf.Cr.AIC = 148.9 AIC/N = .246



Sigma(u) = .02720

Sigma(v) = .26729

Sigma = Sqr[(s^2(u)+s^2(v)]= .26867









Kodde-Palm C*: 95%: 8.761, 99%: 12.483

--------+--------------------------------------------------------------------



--------+--------------------------------------------------------------------


Constant| -13.4218*** 1.01232 -13.26 .0000 -15.4059 -11.4377

LNQ2| .62859*** .01404 44.79 .0000 .60108 .65610

LNQ3| .09670*** .00669 14.46 .0000 .08359 .10981

LPLE| .68419*** .07646 8.95 .0000 .53433 .83405

LPKE| .39946*** .03301 12.10 .0000 .33476 .46415


RACK| .62333*** .05632 11.07 .0000 .51293 .73372

TUNNEL| -.35607*** .05500 -6.47 .0000 -.46387 -.24828


ln_sgmaU| -2.54850*** .96756 -2.63 .0084 -4.44488 -.65212

ln_sgmaV| -1.36799*** .06507 -21.02 .0000 -1.49551 -1.24046

|Heteroscedasticity in variance of truncated u(i)

VIRAGE| -1.47329 2.86559 -.51 .6072 -7.08975 4.14316


VIRAGE| .06774 .08094 .84 .4026 -.09090 .22638

--------+--------------------------------------------------------------------


-----------------------------------------------------------------------------


This is the same model as immediately above, with the additional assumption that the
inefficiency is time invariant. Compared to the previous specification, σu has now increased by a
factor of 30 while σv has nearly vanished, falling from 0.27 to 0.005, that is, by a factor of 50.

-----------------------------------------------------------------------------





Inf.Cr.AIC = -1043.9 AIC/N = -1.728



Sigma(u) = .87314

Sigma(v) = .00543

Sigma = Sqr[(s^2(u)+s^2(v)]= .87316











Kodde-Palm C*: 95%: 8.761, 99%: 12.483

--------+--------------------------------------------------------------------



--------+--------------------------------------------------------------------


Constant| -7.26117*** .25317 -28.68 .0000 -7.75738 -6.76496

LNQ2| .36162*** .01558 23.20 .0000 .33107 .39216

LNQ3| .01947*** .00257 7.58 .0000 .01444 .02451

LPLE| .64342*** .02165 29.72 .0000 .60099 .68584

LPKE| .30730*** .00727 42.24 .0000 .29305 .32156


RACK| .81356 .52427 1.55 .1207 -.21399 1.84112

TUNNEL| 1.46353*** .47072 3.11 .0019 .54094 2.38613


ln_sgmaU| -.17921 .21781 -.82 .4106 -.60611 .24769

ln_sgmaV| -4.94678*** .20426 -24.22 .0000 -5.34711 -4.54644

|Heteroscedasticity in variance of truncated u(i)

VIRAGE| .06076 .04703 1.29 .1964 -.03142 .15294


VIRAGE| -.37544 .44206 -.85 .3957 -1.24185 .49097

--------+--------------------------------------------------------------------


-----------------------------------------------------------------------------


The kernel estimator compares the estimated cost efficiency distributions for the pooled and

basic Pitt and Lee model. The pattern suggested earlier is clearly evident. The same comparison

appears for the truncated normal/heteroscedasticity models. (The estimated cost efficiency results

for the basic Pitt and Lee model and the expanded one are the same to three or four digits.) The

partial listing below shows the estimates for the four models, noting the time invariance of the Pitt

and Lee estimates.

Figure E64.2 Kernel Estimators for Cost Efficiency

Figure E64.3 Estimated Cost Efficiency



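For the three forms of the normal mixture models, we use the following notation. Let

\[
\begin{aligned}
\lambda &= \sigma_u^2 / \sigma_v^2,\\
\omega_i &= \mu_i / \sigma_u,\\
\mu_i &= \delta' z_i \ \text{for the heterogeneous mean model},\\
\mu_i &= \mu, \ \text{a constant, for the simple truncated normal model } (\mu = 0 \text{ for the half normal model}),\\
A_i &= 1 + \lambda T_i,\\
h_i &= \omega_i / A_i - S \lambda T_i \bar\varepsilon_i / (\sigma_u A_i),\\
\bar\varepsilon_i &= (1/T_i) \sum_{t=1}^{T_i} (y_{it} - \beta' x_{it}),
\end{aligned}
\]

where S = +1 for a production frontier and -1 for a cost frontier. Then, the contribution of
individual i to the log likelihood function for the normal-half normal and normal-truncated
normal models is

\[
\begin{aligned}
\log L_i = {} & -(T_i/2) \log 2\pi - T_i \log \sigma_u - \tfrac{1}{2} \log A_i + (T_i/2) \log \lambda
 - \tfrac{1}{2} (\lambda / \sigma_u^2) \sum_{t=1}^{T_i} \varepsilon_{it}^2 \\
 & + \tfrac{1}{2} A_i h_i^2 + \log \Phi(h_i \sqrt{A_i}) - \tfrac{1}{2} \omega_i^2 - \log \Phi(\omega_i).
\end{aligned}
\]

For the normal-exponential model, where θ is the parameter of the exponential distribution, let

\[
h_i = -(\theta \sigma_v / T_i + S \bar\varepsilon_i / \sigma_v).
\]

Then,

\[
\begin{aligned}
\log L_i = {} & -\tfrac{1}{2} \log T_i - \tfrac{1}{2}(T_i - 1) \log 2\pi + \log \theta - (T_i - 1) \log \sigma_v
 - \tfrac{1}{2} (1/\sigma_v^2) \sum_{t=1}^{T_i} \varepsilon_{it}^2 \\
 & + \tfrac{1}{2} T_i h_i^2 + \log \Phi(h_i \sqrt{T_i}).
\end{aligned}
\]

The Jondrow estimator, as formulated in Battese and Coelli (1988), is as follows: Let

\[
\begin{aligned}
\gamma_i &= 1 / (1 + T_i \sigma_u^2 / \sigma_v^2),\\
\sigma_i^2 &= \sigma_u^2 \gamma_i,\\
E_i &= \gamma_i \mu_i + (1 - \gamma_i)(-S \bar\varepsilon_i),\\
\bar\varepsilon_i &= (1/T_i) \sum_t \varepsilon_{it}.
\end{aligned}
\]

Then,

\[
E[u_i \mid \varepsilon_{i1}, \varepsilon_{i2}, \ldots] = E_i + \sigma_i \left[ \phi(E_i/\sigma_i) / \Phi(E_i/\sigma_i) \right].
\]

For the exponential model, replace $\sigma_i$ with $\sigma_v/\sqrt{T_i}$ and $E_i$ with
$(-S \bar\varepsilon_i - \theta \sigma_v^2 / T_i)$.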
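The formulas of this section can be checked numerically for a single group. The sketch below is illustrative only and is not LIMDEP's implementation: the function name `pitt_lee_group`, the residual values, and the parameter values are all hypothetical. It assumes the convention that the composed error is v - S*u with S = +1 for a production frontier and -1 for a cost frontier, and it treats the half normal model as the truncated normal model with mu = 0.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def pitt_lee_group(eps, sigma_u, sigma_v, mu=0.0, S=1):
    """Log likelihood contribution and E[u_i | eps_i] for one group.

    eps holds the T_i residuals y_it - b'x_it of the group; mu = 0 gives
    the half normal model; S = 1 (production) or -1 (cost).
    """
    T = len(eps)
    lam = sigma_u ** 2 / sigma_v ** 2          # ratio of the two variances
    A = 1.0 + lam * T
    ebar = sum(eps) / T
    omega = mu / sigma_u
    h = omega / A - S * lam * T * ebar / (sigma_u * A)
    ss = sum(e * e for e in eps)
    logl = (-(T / 2.0) * math.log(2.0 * math.pi) - T * math.log(sigma_u)
            + (T / 2.0) * math.log(lam) - 0.5 * math.log(A)
            - 0.5 * (lam / sigma_u ** 2) * ss
            + 0.5 * A * h * h + math.log(norm_cdf(h * math.sqrt(A)))
            - 0.5 * omega ** 2 - math.log(norm_cdf(omega)))
    # Conditional mean of u_i (Jondrow et al. / Battese and Coelli form)
    gamma_i = 1.0 / A
    sigma_i = sigma_u * math.sqrt(gamma_i)
    E_i = gamma_i * mu + (1.0 - gamma_i) * (-S * ebar)
    z = E_i / sigma_i
    u_hat = E_i + sigma_i * norm_pdf(z) / norm_cdf(z)
    return logl, u_hat

# Hypothetical residuals for a three period group
logl, u_hat = pitt_lee_group([0.30, -0.10, 0.20], sigma_u=0.5, sigma_v=0.2)
print(logl, u_hat, math.exp(-u_hat))
```

Because u_hat depends on the data only through the group mean residual, the same value is attached to every observation in the group. The sketch makes no provision for numerical underflow of the normal cdf in the far left tail, which a production implementation would need.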


E64.4 Cornwell, Schmidt and Sickles – Time Invariant Inefficiency, Fixed Effects

Cornwell, Schmidt and Sickles (1990) suggested a modification of the familiar fixed effects
linear regression,

\[
y_{it} = \alpha_i + \beta' x_{it} + v_{it}.
\]

The estimated model is

\[
\begin{aligned}
y_{it} &= a_i + b' x_{it} + v_{it}\\
 &= \max_i(a_i) + b' x_{it} + v_{it} + [a_i - \max_i(a_i)]\\
 &= a + b' x_{it} + v_{it} - u_i,
\end{aligned}
\]

where $u_i = \max_i(a_i) - a_i \ge 0$.

(To change this to a cost frontier, change ui to [ai - min(ai)].) This bears resemblance to a stochastic
frontier model, though in fact, it is a 'deterministic' frontier model. The signature feature is that ui
equals zero for the 'most efficient' firm in the sample. A natural interpretation of this is that what
we measure with the model is not the absolute inefficiency, but inefficiency of firm i relative to the
other firms in the sample. From the modeler's point of view, this approach has several substantive
advantages and disadvantages. The main advantage is:

It is distribution free. It requires only the assumptions of the linear model.

The disadvantages are:

It does not allow any time invariant variables in the model.

It labels as inefficiency any and all omitted time invariant effects.

It can only measure firms relative to each other.

As illustrated in the results below, this approach tends to produce very large estimates of ui.

The invariance assumption about ui has been criticized elsewhere. Attempts to relax this assumption

are a recurrent theme in the literature, including the Battese and Coelli and true fixed and random

effects approaches described later. Other early work on the model suggested direct manipulation of
the fixed effects, for example,

\[
\alpha_{it} = \alpha_{i0} + \alpha_{i1} t + \alpha_{i2} t^2.
\]

Other more recent research (Han, Orea and Schmidt (2005)) has proposed factor analytic forms for
$\alpha_{it}$. The sections to follow will include several of these different approaches.
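The transformation from estimated fixed effects to relative inefficiency described in this section is simple enough to sketch outside the program. The following is an illustration only: the function name `css_efficiency` and the three fixed effect values are hypothetical, not estimates from any data set used in this chapter.

```python
import math

def css_efficiency(alpha_hat, cost=False):
    """Map estimated fixed effects a_i to (u_i, exp(-u_i)).

    Production frontier: u_i = max(a) - a_i.
    Cost frontier:       u_i = a_i - min(a).
    """
    values = alpha_hat.values()
    if cost:
        ref = min(values)
        u = {k: a - ref for k, a in alpha_hat.items()}
    else:
        ref = max(values)
        u = {k: ref - a for k, a in alpha_hat.items()}
    eff = {k: math.exp(-ui) for k, ui in u.items()}
    return u, eff

# Hypothetical fixed effects for three firms
alpha_hat = {"A": 1.20, "B": 0.85, "C": 1.05}
u_prod, eff_prod = css_efficiency(alpha_hat)             # firm A is the benchmark
u_cost, eff_cost = css_efficiency(alpha_hat, cost=True)  # firm B is the benchmark
print(eff_prod)
print(eff_cost)
```

By construction one firm always has an efficiency of exactly one, which is the sense in which the measure is relative to the sample rather than absolute.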


Application

This Cornwell, Schmidt and Sickles (CSS) approach requires only a linear fixed effects
regression and a few instructions to manipulate the fixed effects. The following analyzes the Swiss
railroad data with this approach, computing the CSS estimates and comparing them to the
unstructured pooled estimates (using the normal-half normal model from Chapter E62) and the Pitt
and Lee model introduced above. The commands for the analysis are as follows:

SAMPLE ; All $

CREATE ; Railroad = id $

CREATE ; If(railroad > 20)railroad = railroad - 1 $ (There is a gap in the data)

HISTOGRAM ; Rhs = railroad

; Title = Number of Observations for Firms in Swiss Railroad Sample $

SETPANEL ; Group = id ; Pds = ti $

REJECT ; ti = 1 $


CREATE ; pooled = Group Mean(eusfpool, Pds = ti) $

FRONTIER ; Lhs = lnc ; Cost ; Rhs = x ; Panel ; Costeff = pittlee $

REGRESS ; Lhs = lnc ; Rhs = x ; Panel ; Fixed Effects $

CREATE ; ai = alphafe(railroad) $

CALC ; minai = Min(ai) $

CREATE ; css = Exp((minai - ai)) $

CREATE ; Period = Ndx(id,1) $

REJECT ; period#1 $

PLOT ; Lhs = railroad ; Rhs = pooled,css ; Grid ; Fill ; Limits = 0,1

; Vaxis = Estimated Cost Efficiency

; Title = Half Normal vs. Cornwell, Schmidt, Sickles FE Cost Efficiencies $

PLOT ; Lhs = railroad ; Rhs = css,pittlee ; Grid ; Fill ; Limits = 0,1

; Vaxis = Estimated Cost Efficiency

; Title = Pitt and Lee RE vs. Cornwell, Schmidt, Sickles FE Cost Efficiencies $

The results below show the considerable differences in the parameter estimates produced by the

three models. Figure E64.4 demonstrates the expected quite large differences between the time

varying estimates (using the group means) and the time invariant results based on the CSS model.

Figure E64.5 also shows a striking, albeit commonly observed result – the CSS and Pitt and Lee

estimates are virtually identical.


-----------------------------------------------------------------------------

LSDV least squares with fixed effects ....

LHS=LNC Mean = 11.30305












Estd. Autocorrelation of e(i,t) = .668792

--------------------------------------------------

Panel:Groups Empty 0, Valid data 49

Smallest 3, Largest 13

Average group size in panel 12.33

Variances Effects a(i) Residuals e(i,t)

.423441 .006192

--------+--------------------------------------------------------------------



--------+--------------------------------------------------------------------

LNQ2| .29374*** .02850 10.31 .0000 .23789 .34959

LNQ3| .01612*** .00543 2.97 .0030 .00547 .02676

LPLE| .66452*** .03580 18.56 .0000 .59434 .73469

LPKE| .31777*** .01863 17.05 .0000 .28125 .35430

--------+--------------------------------------------------------------------

(These are the estimated parameters in the estimated pooled stochastic frontier model.)

Constant| -10.0907*** 1.14284 -8.83 .0000 -12.3306 -7.8507

LNQ2| .64179*** .01371 46.80 .0000 .61491 .66867

LNQ3| .06855*** .00655 10.46 .0000 .05570 .08139

LPLE| .53971*** .08858 6.09 .0000 .36610 .71333

LPKE| .26045*** .03260 7.99 .0000 .19655 .32435


Lambda| 1.29697*** .13854 9.36 .0000 1.02545 1.56850

Sigma| .44345*** .00056 789.05 .0000 .44235 .44455

(These are the estimated parameters in the estimated Pitt and Lee model.)

|Deterministic Component of Stochastic Frontier Model

Constant| -7.25643*** .24767 -29.30 .0000 -7.74185 -6.77101

LNQ2| .36259*** .01503 24.12 .0000 .33312 .39205

LNQ3| .01902*** .00240 7.94 .0000 .01432 .02372

LPLE| .64148*** .02112 30.38 .0000 .60009 .68287

LPKE| .30842*** .00700 44.08 .0000 .29471 .32214


Lambda| 12.1932** 5.55909 2.19 .0283 1.2975 23.0888

Sigma(u)| .96071*** .13303 7.22 .0000 .69998 1.22145


Figure E64.4 Cornwell et al. Estimates vs. Normal-Half Normal

Figure E64.5 Estimated Inefficiencies from Cornwell et al. and Pitt and Lee Models


E64.5 Battese and Coelli – Time Dependent Inefficiency Models

Battese and Coelli (1992) proposed a series of models that can be collected in the general
form

\[
\begin{aligned}
y_{it} &= \beta' x_{it} + v_{it} - u_{it},\\
u_{it} &= g(z_{it}) \, |U_i|,
\end{aligned}
\]

where Ui is half normal or truncated normal. Several formulations are available. In Battese and
Coelli's original formulation, the distribution was half normal and the base specification was

\[
g(z_{it}) = \exp[-\eta (t - T)],
\]

where T is the number of periods in their balanced panel. (Here it would be Ti.) They also suggested

\[
g(z_{it}) = \exp[-\eta_1 (t - T) - \eta_2 (t - T)^2].
\]

The first (linear) form is taken to be the default case for this model. The second is not provided in
this package. The BC92 model is requested with

FRONTIER ; Lhs = ... ; Rhs = one,...

; Model = BC

; Panel $

A truncated normal version is requested by adding

; Rh2 = list of variables which may (generally should) include one

(The ; Model = T is not needed here.)

We note a warning to practitioners. When the data are very consistent with the model, the

Battese and Coelli model produces quite satisfactory results. The framework has been employed in

many recent empirical applications. But, when the data are not of particularly good quality, or this

is the wrong model, extreme results can emerge. The airline data examined in Chapter E63 (and the

WHO data), for example, are a poor fit to this model.

We have labeled this model as 'time dependent' rather than time varying. While the
inefficiency component in the model does vary through time, the variation is systematic with respect
to time. A question pursued in the ongoing literature is the extent to which this model actually
moves away from the time invariant specification of Pitt and Lee. Since there is actual variation, the
result is clearly somewhere between Pitt and Lee and what we have labeled the unstructured 'pooled'
model. If η equals zero, Pitt and Lee emerges, so it depends entirely on this parameter. We have
found in some investigations that the end result is actually closer to Pitt and Lee than it is to the
pooled model; that is, there is quite a lot of structure involved in the BC92 model. The example
below illustrates.
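To see how little time variation a small decay parameter produces, the scale factor exp[-eta(t - Ti)] can be tabulated directly. The sketch below is illustrative, with a hypothetical function name `bc92_scale`. It uses the value -0.00248 reported for Eta in the railroad application that follows, and assumes a firm observed in all 13 periods.

```python
import math

def bc92_scale(t, T_i, eta):
    """Battese-Coelli (1992) time decay factor exp[-eta * (t - T_i)]."""
    return math.exp(-eta * (t - T_i))

eta_hat = -0.00248   # Eta as printed in the application output of this section
T_i = 13             # assumed: a firm observed in every period
scales = [bc92_scale(t, T_i, eta_hat) for t in range(1, T_i + 1)]
for t, s in enumerate(scales, start=1):
    print(t, round(s, 5))
```

With this value the factor moves only between roughly 0.97 and 1 over the whole sample period, so the time path of the inefficiency is close to the constant path of the Pitt and Lee model. Setting eta to zero gives a factor of exactly one in every period.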


E64.5.1 Application

To illustrate the Battese and Coelli models, we return to the railroad data used previously.
The base case is the pooled data stochastic cost frontier. This is followed by the Pitt and Lee model
and, finally, by the original Battese Coelli 'time decay' model,

\[
g(z_{it}) = \exp[-\eta (t - T_i)].
\]

The commands are

SAMPLE ; All $

REJECT ; ti = 1 $


FRONTIER ; Lhs = lnc ; Cost ; Rhs = x ; Model = BC ; Panel ; Costeff = eucbc92 $

DSTAT ; Rhs = eucbc92,eusfpool $

KERNEL ; Rhs = eucbc92,eusfpool

; Title = Estimated Cost Efficiencies - Battese-Coelli 1992 vs. Pooled $

KERNEL ; Rhs = eucbc92,pittlee

; Title = Estimated Cost Efficiencies - Battese-Coelli 1992 vs. Pitt and Lee $

The kernel density estimators are used to compare the efficiency estimates from the pooled data
model to those from the Battese and Coelli model. The estimates of E[uit|εi] from the Battese and
Coelli model are far larger than those from the pooled model, so the implied efficiencies,
exp(-E[uit|εi]), are far lower (a mean of 0.51 compared to 0.76 in the descriptive statistics below).
The assumption of time invariance of the random term is a major component of this model. The
second kernel estimator below compares Battese-Coelli to Pitt-Lee. The correspondence of the two
results is striking, albeit to be expected given the small estimated value of η.

-----------------------------------------------------------------------------





Inf.Cr.AIC = 432.8 AIC/N = .717



Sigma(v) = .27077

Sigma(u) = .35119

Sigma = Sqr[(s^2(u)+s^2(v)]= .44345


Var[u]/{Var[u]+Var[v]} = .37937









Kodde-Palm C*: 95%: 2.706, 99%: 5.412


--------+--------------------------------------------------------------------



--------+--------------------------------------------------------------------


Constant| -10.0907*** 1.14284 -8.83 .0000 -12.3306 -7.8507

LNQ2| .64179*** .01371 46.80 .0000 .61491 .66867

LNQ3| .06855*** .00655 10.46 .0000 .05570 .08139

LPLE| .53971*** .08858 6.09 .0000 .36610 .71333

LPKE| .26045*** .03260 7.99 .0000 .19655 .32435


Lambda| 1.29697*** .13854 9.36 .0000 1.02545 1.56850

Sigma| .44345*** .00056 789.05 .0000 .44235 .44455

--------+--------------------------------------------------------------------

-----------------------------------------------------------------------------





Inf.Cr.AIC = -1044.3 AIC/N = -1.729





Sigma(v) = .07828

Sigma(u) = .98783

Sigma = Sqr[(s^2(u)+s^2(v)]= .99093


Var[u]/{Var[u]+Var[v]} = .98301


Battese-Coelli Models: Time Varying uit

Time dependent uit=exp[-eta(t-T)]*|U(i)|








Kodde-Palm C*: 95%: 2.706, 99%: 5.412

--------+--------------------------------------------------------------------



--------+--------------------------------------------------------------------


Constant| -6.83502*** .27362 -24.98 .0000 -7.37130 -6.29873

LNQ2| .35459*** .01636 21.68 .0000 .32254 .38665

LNQ3| .02183*** .00238 9.17 .0000 .01716 .02649

LPLE| .61516*** .02092 29.40 .0000 .57415 .65617

LPKE| .30931*** .00701 44.09 .0000 .29556 .32306


Lambda| 12.6195*** .01188 1062.18 .0000 12.5962 12.6428

Sigma(u)| .98783*** .15275 6.47 .0000 .68845 1.28721

|Eta parameter for time varying inefficiency

Eta| -.00248*** .00086 -2.89 .0039 -.00416 -.00080

--------+--------------------------------------------------------------------


--------+---------------------------------------------------------------------


--------+---------------------------------------------------------------------

EUCBC92| .514566 .231680 .085140 .982112 604 0

EUSFPOOL| .760991 .095229 .478178 .906348 604 0

--------+---------------------------------------------------------------------

Figure E64.6 Kernel Density Estimates for Inefficiencies from Battese and Coelli Model

Figure E64.7 Kernel Density Estimates for Inefficiencies



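To form the log likelihood function for the model, we use Battese and Coelli's
parameterization of the model, $\sigma^2 = \sigma_u^2 + \sigma_v^2$ and $\gamma = \sigma_u^2 / \sigma^2$.
The contribution of the ith individual (firm, group, etc.) to the log likelihood is

\[
\begin{aligned}
\log L_i = {} & -\frac{T_i}{2} (\log 2\pi + \log \sigma^2) - \frac{T_i - 1}{2} \log(1 - \gamma)
 - \frac{1}{2} \log\left[ 1 + \gamma \left( \sum_{t=1}^{T_i} g_{it}^2 - 1 \right) \right] \\
 & - \frac{\sum_{t=1}^{T_i} \varepsilon_{it}^2}{2 \sigma^2 (1 - \gamma)}
 - \frac{1}{2} \left( \frac{\mu_i}{\sigma_u} \right)^2 - \log \Phi\left( \frac{\mu_i}{\sigma_u} \right)
 + \frac{1}{2} A_i^2 + \log \Phi(A_i),
\end{aligned}
\]

where

\[
\begin{aligned}
\mu_i &= 0 \ \text{or} \ \mu \ \text{or} \ \delta' w_i,\\
g_{it} &= \exp[-\eta (t - T_i)] \ \text{or} \ \exp(\eta' z_{it}),\\
S &= 1 \ \text{for a production model and } -1 \text{ for a cost model},\\
\varepsilon_{it} &= y_{it} - \beta' x_{it},\\
A_i &= \frac{(1 - \gamma) \mu_i - S \gamma \sum_{t=1}^{T_i} g_{it} \varepsilon_{it}}
 {\left\{ \gamma (1 - \gamma) \sigma^2 \left[ 1 + \gamma \left( \sum_{t=1}^{T_i} g_{it}^2 - 1 \right) \right] \right\}^{1/2}}.
\end{aligned}
\]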

Derivatives of this function are complicated in the extreme, and are omitted here. (Some useful

results for obtaining them are found in Battese and Coelli (1992, 1995).)

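The Jondrow estimator of uit is

\[
\begin{aligned}
E[u_{it} \mid \varepsilon_{i1}, \varepsilon_{i2}, \ldots] &= g_{it} \, E[U_i \mid \varepsilon_{i1}, \varepsilon_{i2}, \ldots]\\
 &= g_{it} \left[ \mu_i^* + \sigma_i^* \frac{\phi(\mu_i^* / \sigma_i^*)}{\Phi(\mu_i^* / \sigma_i^*)} \right],
\end{aligned}
\]

where

\[
\mu_i^* = \frac{(1 - \gamma) \mu_i - \gamma \sum_{t=1}^{T_i} g_{it} (S \varepsilon_{it})}
 {(1 - \gamma) + \gamma \sum_{t=1}^{T_i} g_{it}^2},
\qquad
\sigma_i^{*2} = \frac{\gamma (1 - \gamma) \sigma^2}{(1 - \gamma) + \gamma \sum_{t=1}^{T_i} g_{it}^2}.
\]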
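The conditional mean estimator for this model can be sketched for one group as follows. This is an illustration under stated assumptions, not LIMDEP's code: the function name `bc_inefficiency`, the residuals and the scale factors are hypothetical. It assumes the Battese and Coelli parameterization in which sigma2 is the sum of the two variances and gamma is the share of the variance of U in that sum, and S = +1 for a production frontier, -1 for a cost frontier.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def bc_inefficiency(eps, g, sigma2, gamma, mu=0.0, S=1):
    """Return the list of E[u_it | eps_i] = g_it * E[U_i | eps_i] for one group.

    eps : residuals y_it - b'x_it for the group
    g   : scale factors g_it, same length as eps
    """
    sum_g_eps = sum(gt * et for gt, et in zip(g, eps))
    sum_g2 = sum(gt * gt for gt in g)
    denom = (1.0 - gamma) + gamma * sum_g2
    mu_star = ((1.0 - gamma) * mu - S * gamma * sum_g_eps) / denom
    sigma_star = math.sqrt(gamma * (1.0 - gamma) * sigma2 / denom)
    z = mu_star / sigma_star
    e_U = mu_star + sigma_star * norm_pdf(z) / norm_cdf(z)
    return [gt * e_U for gt in g]

# Hypothetical cost frontier group: three periods, mild time variation in g
eps = [0.20, -0.10, 0.30]
g = [0.9, 1.0, 1.1]
u_hat = bc_inefficiency(eps, g, sigma2=0.29, gamma=0.25 / 0.29, mu=0.0, S=-1)
print([round(math.exp(-u), 4) for u in u_hat])
```

When every g_it equals one, the result is the same in each period, which is the time invariant Pitt and Lee case. The sketch does not guard against underflow of the normal cdf for very negative arguments.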


E64.6 Time Varying Inefficiency in the Battese Coelli Model

The general form of the Battese and Coelli model is

\[
\begin{aligned}
y_{it} &= \beta' x_{it} + v_{it} - u_{it},\\
u_{it} &= g(z_{it}) \, |U_i|,
\end{aligned}
\]

where Ui is half normal or truncated normal. The default form used earlier is
$g(z_{it}) = \exp[-\eta (t - T_i)]$. You may also use a more general form,

\[
g(z_{it}) = \exp(\eta' z_{it}),
\]

where zit contains any desired set of variables. For this extension, use


; Model = BC ; Hfu = the variables in z

; Pds = the panel specification $

As before, the truncated normal version of the model is also supported. For an example, we have

used

FRONTIER ; Lhs = lnc ; Cost ; Rhs = x ; Model = BC ; Panel ; Costeff = eucbc92h

; Hfu = rack,virage,tunnel $

The estimates of cost efficiency produced by this model are identical to those from the base model in

the previous section.

-----------------------------------------------------------------------------








Sigma(v) = .07840

Sigma(u) = .97369

Sigma = Sqr[(s^2(u)+s^2(v)]= .97685


Var[u]/{Var[u]+Var[v]} = .98247



Time varying uit=exp[eta*z(i,t)]*|U(i)|








Kodde-Palm C*: 95%: 8.761, 99%: 12.483


--------+--------------------------------------------------------------------



--------+--------------------------------------------------------------------


Constant| -6.89845*** .32923 -20.95 .0000 -7.54374 -6.25316

LNQ2| .35751*** .01591 22.47 .0000 .32632 .38870

LNQ3| .02149*** .00236 9.10 .0000 .01686 .02613

LPLE| .61741*** .02430 25.40 .0000 .56977 .66504

LPKE| .30892*** .00759 40.71 .0000 .29405 .32380


Lambda| 12.4202*** .01108 1120.76 .0000 12.3984 12.4419

Sigma(u)| .97369*** .13513 7.21 .0000 .70884 1.23855

|Coefficients in u(i,t)=[exp{eta*z(i,t)}]*|U(i)|

RACK| .00024 .01743 .01 .9889 -.03392 .03441

VIRAGE| -.02096 .01321 -1.59 .1126 -.04685 .00493

TUNNEL| .00219 .01625 .14 .8926 -.02966 .03405

--------+--------------------------------------------------------------------

(Parameter estimates from base case Battese and Coelli)

--------+--------------------------------------------------------------------


Constant| -6.83502*** .27362 -24.98 .0000 -7.37130 -6.29873

LNQ2| .35459*** .01636 21.68 .0000 .32254 .38665

LNQ3| .02183*** .00238 9.17 .0000 .01716 .02649

LPLE| .61516*** .02092 29.40 .0000 .57415 .65617

LPKE| .30931*** .00701 44.09 .0000 .29556 .32306


Lambda| 12.6195*** .01188 1062.18 .0000 12.5962 12.6428

Sigma(u)| .98783*** .15275 6.47 .0000 .68845 1.28721


Eta| -.00248*** .00086 -2.89 .0039 -.00416 -.00080

--------+--------------------------------------------------------------------
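The coefficients on RACK, VIRAGE and TUNNEL reported above are all close to zero, which implies a scale factor exp(eta'z) close to one for every firm. The short sketch below tabulates the factor for each combination of the three dummy variables. It is illustrative only; the names `eta` and `g_scale` are hypothetical, and the coefficient values are copied from the output above.

```python
import math
from itertools import product

# Coefficients in u(i,t) = exp{eta * z(i,t)} * |U(i)| as printed above
eta = {"rack": 0.00024, "virage": -0.02096, "tunnel": 0.00219}

def g_scale(z):
    """Scale factor exp(eta'z) for a dict z of 0/1 dummy variables."""
    return math.exp(sum(eta[name] * z[name] for name in eta))

# One entry per (rack, virage, tunnel) combination
scales = {combo: g_scale(dict(zip(eta, combo)))
          for combo in product((0, 1), repeat=3)}
for combo, s in sorted(scales.items()):
    print(combo, round(s, 5))
```

Every factor lies within about two percent of one, which is consistent with efficiency estimates that barely differ from those of the base model.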

E64.7 True Fixed Effects Models

The received applications of fixed effects to the stochastic frontier model, primarily
Cornwell, Schmidt and Sickles, have actually been reinterpretations of the linear regression model
with fixed effects, not frontier models of the sort considered here. The estimators described below
apply the fixed effects to the stochastic frontier. We label these 'true fixed effects models' to
distinguish them from the linear regression models discussed in Section E64.4. (This is not meant
to imply that the alternatives are 'false fixed effects models.' Had we used 'real fixed effects models,'
then the contrasting 'unreal fixed effects models' would arise, which is likewise problematic. We use
this purely as a concise term of art, not a characterization of the types of estimators considered.)

The stochastic frontier model with fixed effects may be fit in several forms. The base case
applies the heterogeneity to the normal-half normal production function model:

\[
y_{it} = \alpha_i + \beta' x_{it} + v_{it} - S u_{it},
\]

where S = +1 for a production frontier and -1 for a cost frontier, and

\[
u_{it} = | N[0, \sigma_u^2] |.
\]

This model (as are the others) is fit by maximum likelihood, not least squares. The normal-half
normal model is applied to the stochastic part of the model. Note that the inefficiency term in this
model is time varying. The heterogeneity may appear in Stevenson's truncated normal model as
follows. This is a true fixed effects, normal-truncated normal model.

\[
\begin{aligned}
y_{it} &= \alpha_i + \beta' x_{it} + v_{it} - u_{it},\\
u_{it} &= | N[\mu_i, \sigma_u^2] |,\\
\mu_i &= \delta' z_i.
\end{aligned}
\]

In this form, the heterogeneity is still retained in the production function part of the model. Another
possibility is to allow the heterogeneity to enter the mean of the inefficiency distribution rather than
the production function; this seems the most natural of the three forms. In this case,

\[
\begin{aligned}
y_{it} &= \beta' x_{it} + v_{it} - u_{it},\\
u_{it} &= | N[\mu_{it}, \sigma_u^2] |,\\
\mu_{it} &= \alpha_i + \mu \ \text{(nonzero) or} \ \delta' z_{it}.
\end{aligned}
\]

The mean of the inefficiency distribution shifts in time, but also has a firm specific component.
Finally, the heterogeneity may be shifted to the variance of the inefficiency distribution. In this
form, we have

\[
\begin{aligned}
u_{it} &= | N[0, \sigma_{u,it}^2] |,\\
\sigma_{u,it}^2 &= \sigma_u^2 \exp(\alpha_i + \delta' z_{it}).
\end{aligned}
\]

The variables in the variance term may be omitted if only a groupwise heteroscedastic model is
desired. Note this is a half normal model. A model with nonzero underlying mean and variation in
the variance appears to be inestimable. Note that in order to secure identification, this model must
have time varying inefficiency, induced by time variation in the variance.

NOTE: We have had extremely limited success with the second and third forms of the model. The
likelihood function is quite volatile in the parameters of the underlying mean of the truncated
distribution, with the result that the estimated variance parameters σu and σv generally become
negative in the early iterations and estimation must be halted. This occurs even when very good
starting values are used, which suggests that estimation of this model as stated is likely to be
extremely problematic in all but the most favorable of cases. An alternative approach which is
simple, but can be used only with small panels (up to 100 groups), is suggested below.

In terms of implementation, we note that these forms of the models, though they are new

with LIMDEP, have long been feasible. The panels typically used by researchers in this setting are

often fairly small – our airline data for example have only 25 units and the Swiss railroad data has 49

firms. It would always have been possible to create these models simply by adding dummy variables

to the familiar model. However, LIMDEP's implementation of the model obviates this by using the

methodology described in Chapter R23. In principle, this allows up to 100,000 firms in the data set.


Results that are kept for this model are

Matrices: b = estimate of β

varb = asymptotic covariance matrix for estimate of β

alphafe = estimated fixed effects (if ; Par is in the command)

Scalars: kreg = number of variables in Rhs

nreg = number of observations

logl = log likelihood function

Last Model: b_variables

The upper limit on the number of groups is 100,000.

E64.7.1 Commands for the Fixed Effects Stochastic Frontier Model

The command for fitting the normal-half normal model with fixed effects is as follows:

FRONTIER ; Lhs = ... ; Rhs = one,... $

FRONTIER ; Lhs = ... ; Rhs = one,...

; FEM ; Pds = specification $

The model must be fit twice. The first model is a pooled data model which provides the starting values

for the second. The second command is identical to the first save for the addition of the panel data

specification. In order to set up the initial values correctly, it is essential that your initial model include

the constant term first in the Rhs list and that the second model specification be identical to the first.

Other options and specifications for the fixed effects models are the same as in other applications. (See

Chapter R23 for details.) The fixed effects command also contains the constant term, but this will be

removed by the command processor later. See the example below for the operation of the command.

NOTE: Starting values must be provided by the first estimator. The specification ; Start = list of

values is not available for this model. You must fit both models each time you fit an FEM. The

starting values are not retained after the FEM is estimated.

All fixed effects forms are estimated by maximum likelihood. You may also fit a two way
fixed effects model

\[
\begin{aligned}
y_{it} &= \alpha_i + \gamma_t + \beta' x_{it} + v_{it} - u_{it} \quad \text{(change to } v + u \text{ for a stochastic cost frontier)},\\
u_{it} &= | N[0, \sigma_u^2] |,
\end{aligned}
\]

where $\gamma_t$ is an additional, time (period) specific effect. The time specific effect is requested by adding

; Time

to the command if the panel is balanced, and

; Time = variable name

if the panel is unbalanced.


For the unbalanced panel, we assume that overall, the sample observation period is

t = 1,2,..., Tmax and that the time variable gives for the specific group, the particular values of t that

apply to the observations. Thus, suppose your overall sample is five periods. The first group is three

observations, periods 1, 2, 4, while the second group is four observations, 2, 3, 4, 5. Then, your

panel specification would be

; Pds = Ti, for example, where Ti = 3, 3, 3, 4, 4, 4, 4

and ; Time = Pd, for example, where Pd = 1, 2, 4, 2, 3, 4, 5.
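The layout of the two panel variables in the unbalanced case can be sketched in a few lines of Python. This is only an illustration of the data arrangement described above; the helper name panel_columns is hypothetical and is not part of LIMDEP.

```python
# Hypothetical helper: expand per-group period lists into the two
# observation-level columns described above (group count and period index).
def panel_columns(periods_by_group):
    """Return (ti, pd) lists, one entry per observation, groups stacked in order."""
    ti, pd = [], []
    for periods in periods_by_group:
        ti.extend([len(periods)] * len(periods))  # group size repeated on each row
        pd.extend(periods)                        # the periods actually observed
    return ti, pd

# The two groups from the text: periods 1, 2, 4 and periods 2, 3, 4, 5.
ti, pd = panel_columns([[1, 2, 4], [2, 3, 4, 5]])
print(ti)  # [3, 3, 3, 4, 4, 4, 4]
print(pd)  # [1, 2, 4, 2, 3, 4, 5]
```

The first list corresponds to the count variable given in ; Pds and the second to the period variable given in ; Time.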

E64.7.2 Model Specifications for Fixed Effects Stochastic Frontier Models



; Par keeps ancillary parameter in main results vector b.




same as ; Printvc.



; Tlg[ = value] sets convergence value for gradient.












E64.7.3 Application of the True Fixed Effects Model

We have fit the fixed effects model with the airline data used in the previous chapter. These

are simple models that do not use the observed heterogeneity in load factor, stage length or number

of points served. Additional variables which vary over time can also be included in the function. The

commands employed for the example are

SETPANEL ; Group = firm ; Pds = ti $

FRONTIER ; Lhs = lq ; Rhs = one,lf,lm,le,ll,lp,lk $

FRONTIER ; Lhs = lq ; Rhs = one,lf,lm,le,ll,lp,lk

; FEM ; Panel ; Techeff = euitfe ; Par $

REGRESS ; Lhs = lq ; Rhs = one,lf,lm,le,ll,lp,lk

; Panel ; Fixed Effects $

CREATE ; ai = alphafe(firm) $

CALC ; maxai = Max(ai) $

CREATE ; euicss = exp(-(maxai - ai)) $

CREATE ; meuitfe = Group Mean(euitfe, Pds = ti) $

SAMPLE ; All $

CREATE ; Period = Ndx(firm,1) $

PLOT ; For[period=1] ; Lhs = firm ; Rhs = meuitfe,euicss

; Fill ; Symbols ; Limits = 0,1 ; Grid

; Title = Technical Efficiency Estimates, CSS vs. True Fixed Effects

(Group Means)

; Vaxis = Estimated Technical Efficiency $

The REGRESS and CREATE commands recover the estimated fixed effects from the Cornwell et al. model, then replicate them for each year in the data set. The results are used to create the plot of the two sets of estimates of ui shown below.
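The transformation applied in the CREATE command for euicss, exp(-(maxai - ai)), can be illustrated with a minimal Python sketch. The fixed effect values below are made up for illustration and do not come from the airline data; the function name css_efficiency is likewise hypothetical.

```python
import math

def css_efficiency(alpha):
    """Map estimated firm effects a_i to exp(-(max_j a_j - a_i))."""
    a_max = max(alpha)
    return [math.exp(-(a_max - a)) for a in alpha]

# Made-up fixed effects for three firms (illustrative values only).
alpha_hat = [-1.10, -0.95, -1.30]
eff = css_efficiency(alpha_hat)
for a, e in zip(alpha_hat, eff):
    print(a, round(e, 4))
```

The firm with the largest estimated effect is treated as fully efficient (its value is 1), and every other firm is measured relative to it. This is why the regression based estimates tend to be smaller than those from the true fixed effects model.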

-----------------------------------------------------------------------------





Inf.Cr.AIC = -198.9 AIC/N = -.777




Sigma(v) = .13791

Sigma(u) = .13007

Sigma = Sqr[(s^2(u)+s^2(v)]= .18957


Var[u]/{Var[u]+Var[v]} = .24425









Kodde-Palm C*: 95%: 2.706, 99%: 5.412


--------+--------------------------------------------------------------------



--------+--------------------------------------------------------------------


Constant| -2.98823*** .72136 -4.14 .0000 -4.40206 -1.57439

LF| .37257*** .07038 5.29 .0000 .23463 .51052

LM| .69910*** .07580 9.22 .0000 .55054 .84766

LE| 2.09473*** .68790 3.05 .0023 .74647 3.44299

LL| -.42909*** .06315 -6.79 .0000 -.55287 -.30530

LP| .44533*** .09498 4.69 .0000 .25917 .63149

LK| -2.09806*** .76556 -2.74 .0061 -3.59853 -.59759


Lambda| .94309*** .16870 5.59 .0000 .61244 1.27373

Sigma| .18957*** .00064 297.81 .0000 .18832 .19082

--------+--------------------------------------------------------------------

Normal exit from iterations. Exit status=0.

-----------------------------------------------------------------------------

FIXED EFFECTS Frontr Model




Inf.Cr.AIC = -344.1 AIC/N = -1.344


Unbalanced panel has 25 individuals

Skipped 0 groups with inestimable ai

Half normal stochastic frontier

Sigma( u) (1 sided) = .11713

Sigma( v) (symmetric)= .08347

--------+--------------------------------------------------------------------



--------+--------------------------------------------------------------------

|Production / Cost parameters

LF| .20090** .09879 2.03 .0420 .00727 .39453

LM| .78173*** .07495 10.43 .0000 .63483 .92863

LE| .56626 .62357 .91 .3638 -.65591 1.78843

LL| -.16687 .11488 -1.45 .1464 -.39204 .05830

LP| .17273* .09414 1.83 .0665 -.01177 .35724

LK| -.29167 .69055 -.42 .6728 -1.64513 1.06179

|Variance parameter for v +/- u

Sigma| .14383*** .00045 317.51 .0000 .14294 .14472

|Asymmetry parameter, lambda

Lambda| 1.40326*** .21468 6.54 .0000 .98248 1.82403

--------+--------------------------------------------------------------------


-----------------------------------------------------------------------------


-----------------------------------------------------------------------------

LSDV least squares with fixed effects ....

LHS=LQ Mean = -1.11237












Estd. Autocorrelation of e(i,t) = .575211

--------------------------------------------------

Panel:Groups Empty 0, Valid data 25

Smallest 2, Largest 15

Average group size in panel 10.24

Variances Effects a(i) Residuals e(i,t)

.030410 .013550

--------+--------------------------------------------------------------------


LQ| Coefficient Error t |t|>T* Interval

--------+--------------------------------------------------------------------

LF| .14860 .09677 1.54 .1259 -.04107 .33828

LM| .80497*** .07843 10.26 .0000 .65125 .95868

LE| .68672 .67075 1.02 .3069 -.62792 2.00136

LL| -.15977 .11829 -1.35 .1780 -.39162 .07208

LP| .16227 .09973 1.63 .1050 -.03320 .35774

LK| -.37897 .74689 -.51 .6123 -1.84284 1.08490

--------+--------------------------------------------------------------------


-----------------------------------------------------------------------------

Figure E64.8 plots the Jondrow estimates of exp(-E[u_{it}|\varepsilon_{it}]) from the true fixed effects model and the estimates of u_i from the Cornwell, Schmidt and Sickles model of Section E64.4 for each firm. Since the true FE estimates vary by period, we have plotted the group means. The implication of the regression based model is clear in the figure. The estimates of technical efficiency from the true FEM are generally considerably larger than those from the deterministic model.

Figure E64.8 True Fixed Effects vs. Fixed Effects Estimates of ui
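For readers who want to see what the Jondrow computation involves, the following Python sketch evaluates the standard Jondrow et al. conditional mean for the normal-half normal production frontier, E[u|e] = s*[phi(z)/Phi(-z) - z] with z = e(lambda)/sigma and s* = su sv/sigma. It is a sketch under that textbook formula, not LIMDEP's internal code. The function names are hypothetical, and the two standard deviations are simply copied from the fixed effects output above.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    # erfc keeps accuracy in the lower tail
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def jlms_half_normal(eps, sigma_u, sigma_v):
    """E[u | eps] for eps = v - u with u half normal (production frontier)."""
    sigma = math.hypot(sigma_u, sigma_v)      # sqrt(su^2 + sv^2)
    lam = sigma_u / sigma_v
    z = eps * lam / sigma
    sigma_star = sigma_u * sigma_v / sigma
    return sigma_star * (norm_pdf(z) / norm_cdf(-z) - z)

def technical_efficiency(eps, sigma_u, sigma_v):
    """exp(-E[u | eps]), the quantity plotted for the true fixed effects model."""
    return math.exp(-jlms_half_normal(eps, sigma_u, sigma_v))

su, sv = 0.11713, 0.08347   # Sigma(u), Sigma(v) from the FEM output above
for eps in (-0.2, 0.0, 0.2):
    print(eps, round(technical_efficiency(eps, su, sv), 4))
```

A more negative residual implies a larger conditional mean of u and therefore a lower efficiency value.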


E64.7.4 Fixed Effects in the Normal-Truncated Normal Model

The preceding may be extended to the truncated normal (with earlier caveats) as follows: For

a model with heterogeneity appearing in the production (or cost) function,

y_{it} = \alpha_i + \beta' x_{it} + v_{it} - u_{it},

u_{it} = | N[\mu_{it}, \sigma_u^2] |

\mu_{it} = \mu (nonzero) or \delta' z_{it},

use FRONTIER ; Lhs = ... ; Rhs = one, ... ; Rh2 = one, ...

; Model = T $

FRONTIER ; Lhs = ... ; Rhs = one, ... ; Rh2 = one, ...

; FEM ; Panel $

The Rh2 is optional in the first command if you have only a constant term in the mean of the truncated distribution. But, you should include it nonetheless so as to ensure the match between the first and second commands. Also, it is essential that both Rhs and Rh2 include constant terms in the first positions.

To move the heterogeneity to the mean of the underlying truncated normal distribution,


u_{it} = | N[\mu_{it}, \sigma_u^2] |

\mu_{it} = \alpha_i + \delta' z_{it},

use FRONTIER ; Lhs = ... ; Rhs = one, ... ; Rh2 = one, ...

; Model = T $

FRONTIER ; Lhs = ... ; Rhs = one, ... ; Rh2 = one, ...

; Model = T

; FEM ; Panel $

Note that this version differs from the earlier one only in that ; Model = T now appears in the second command as well as the first. Again, the variable specifications in the two commands must be identical, and both must include constant terms in the first position in both lists. As before, you may use ; Rh2 = one if you do not require variables z_{it} in the mean. (This constant term will be removed from the fixed effects model, but this common value is used as the starting value for the firm specific estimates.)

We note that we have had scant success with this model even with a carefully constructed data set and good starting values. The problem appears to be Newton's method, which must be used for the general fixed effects program of which this is part. If you have a small panel with no more than 100 groups, an alternative approach appears to work better. You may provide a stratification variable in the cross section template to request that a set of dummy variables be inserted directly into the function.


To fit a model of the first form above, use

FRONTIER ; Lhs = ... ; Rhs = one, ...

; Model = T [ ; Rh2 = list is optional ]

; Str = a variable which provides a group indicator for the panel $

The stratification variable must take every integer value from 1 to N, where N is at most 100, and every group must have at least two observations. For the second form, with the heterogeneity embedded in the mean of the truncated normal distribution, add

; Mean

to the command.

This provides four possible forms of the model, which we illustrate with the airline data:

NAMELIST ; x = one,lf,lm,le,ll,lp,lk $

This is a true fixed effects model with normal-truncated normal structure for uit.

FRONTIER ; Lhs = lq ; Rhs = x

; Model = T

; Str = firm $

This model is the same as the preceding one except now \mu_i = \delta_1 + \delta_2 loadfctr_i.

FRONTIER ; Lhs = lq ; Rhs = x

; Model = T

; Rh2 = one,loadfctr

; Str = firm $

This is a true fixed effects model with the fixed effects appearing in \mu_i rather than in the production function.

FRONTIER ; Lhs = lq ; Rhs = x

; Model = T

; Mean

; Str = firm $

This model is the same as the preceding model except that loadfctr now also appears in the mean of the truncated variable.

FRONTIER ; Lhs = lq ; Rhs = x

; Model = T

; Rh2 = one,loadfctr ; Mean

; Str = firm $


E64.7.5 Fixed Effects in the Heteroscedasticity Model

The firmwise heteroscedasticity model,


u_{it} = | N[0, \sigma_{uit}^2] |

\sigma_{uit}^2 = \sigma_u^2 \exp(\alpha_i + \delta' z_{it})

is requested in the same fashion as the normal-truncated normal model, using a stratification variable

in the cross section formulation. (This likelihood function is likewise quite ill behaved, though less

so than the truncation form.) The command is

FRONTIER ; Lhs = ... ; Rhs = one, ...

; Het

; Hfu = list of variables ; Hfv = one

; Str = stratification variable $

This model also allows for the doubly heteroscedastic form,


u_{it} = | N[0, \sigma_{uit}^2] |

\sigma_{uit}^2 = \sigma_u^2 \exp(\alpha_i + \delta' z_{it})

v_{it} \sim N[0, \sigma_{vit}^2]

\sigma_{vit}^2 = \sigma_v^2 \exp(\gamma' w_{it})

The command would be

FRONTIER ; Lhs = ... ; Rhs = one, ...

; Het

; Hfu = list of variables ; Hfv = list of variables

; Str = stratification variable $

To continue the earlier example, the following fits a model of heteroscedasticity to the

airline data. The first model has heteroscedasticity and the fixed effects in the variance of ui. The

second is doubly heteroscedastic, again with the fixed effects in the variance of ui.



FRONTIER ; Lhs = lq ; Rhs = x

; Het ; Hfu = one,loadfctr ; Hfv = one ; Str = firm $

FRONTIER ; Lhs = lq ; Rhs = x

; Het ; Hfu = one,loadfctr ; Hfv = one,loadfctr ; Str = firm $


-----------------------------------------------------------------------------






Sigma(v) = .09357

Sigma(u) = .22182

Sigma = Sqr[(s^2(u)+s^2(v)]= .24075


Var[u]/{Var[u]+Var[v]} = .67126



Stratified by FIRM , 25 groups

--------+--------------------------------------------------------------------



--------+--------------------------------------------------------------------


Constant| -3.70847*** .75902 -4.89 .0000 -5.19612 -2.22081

LF| .38142*** .08642 4.41 .0000 .21204 .55079

LM| .57659*** .09175 6.28 .0000 .39676 .75642

LE| 2.78934*** .72692 3.84 .0001 1.36459 4.21408

LL| -.41646*** .08641 -4.82 .0000 -.58582 -.24710

LP| .59190*** .11704 5.06 .0000 .36251 .82129

LK| -2.87861*** .80566 -3.57 .0004 -4.45767 -1.29956


Constant| -4.73798*** .21921 -21.61 .0000 -5.16764 -4.30833


Constant| 8.11346 7.80244 1.04 .2984 -7.17903 23.40596

LOADFCTR| -23.6678*** 6.88328 -3.44 .0006 -37.1588 -10.1768

FIRM001| 1.35540 7.37739 .18 .8542 -13.10403 15.81482

FIRM002| .25791 7.25149 .04 .9716 -13.95476 14.47057

FIRM003| .68176 7.22190 .09 .9248 -13.47290 14.83643


FIRM021| .73089 7.21226 .10 .9193 -13.40488 14.86666

FIRM022| -.38963 7.46091 -.05 .9584 -15.01274 14.23347

FIRM023| -.63171 7.53984 -.08 .9332 -15.40952 14.14610

FIRM024| -7.77451 41.07339 -.19 .8499 -88.27688 72.72786

--------+--------------------------------------------------------------------

Note: nnnnn.D-xx or D+xx => multiply by 10 to -xx or +xx.


-----------------------------------------------------------------------------


-----------------------------------------------------------------------------





Inf.Cr.AIC = -310.6 AIC/N = -1.213




Sigma(v) = .09519

Sigma(u) = .20307

Sigma = Sqr[(s^2(u)+s^2(v)]= .22427


Var[u]/{Var[u]+Var[v]} = .62318



Stratified by FIRM , 25 groups

--------+--------------------------------------------------------------------



--------+--------------------------------------------------------------------


Constant| -3.00340*** .65319 -4.60 .0000 -4.28364 -1.72316

LF| .24071*** .07721 3.12 .0018 .08938 .39204

LM| .60992*** .07600 8.03 .0000 .46096 .75887

LE| 2.19046*** .62677 3.49 .0005 .96202 3.41890

LL| -.38679*** .07314 -5.29 .0000 -.53015 -.24344

LP| .49345*** .09820 5.03 .0000 .30098 .68591

LK| -2.09638*** .69385 -3.02 .0025 -3.45631 -.73646


Constant| -13.5487*** 2.64897 -5.11 .0000 -18.7406 -8.3569

LOADFCTR| 15.5221*** 4.48367 3.46 .0005 6.7343 24.3099


Constant| 8.01865 5.60084 1.43 .1522 -2.95879 18.99609

LOADFCTR| -23.3031*** 6.88508 -3.38 .0007 -36.7976 -9.8086

FIRM001| .88200 5.06220 .17 .8617 -9.03972 10.80373

FIRM002| -.83198 4.67591 -.18 .8588 -9.99660 8.33264

FIRM003| -.18608 4.65296 -.04 .9681 -9.30573 8.93356


FIRM021| .35047 4.63405 .08 .9397 -8.73210 9.43303

FIRM022| -.68781 4.83235 -.14 .8868 -10.15903 8.78342

FIRM023| -.96206 4.88186 -.20 .8438 -10.53033 8.60622

FIRM024| -2.86357 4.82675 -.59 .5530 -12.32383 6.59670

--------+--------------------------------------------------------------------


E64.8 True Random Effects Models

We call the stochastic frontier model with a random as opposed to a fixed effect term a 'true random effects' model. The structure is the normal-half normal stochastic frontier model,

y_{it} = w_i + \alpha + \beta' x_{it} + v_{it} - u_{it}

v_{it} \sim N[0, \sigma_v^2]

u_{it} = |U_{it}|, U_{it} \sim N[0, \sigma_u^2]

w_i \sim N[0, \sigma_w^2].

At first look, this appears to be a model with a three part disturbance, which would surely be

inestimable. But, that is incorrect. It is a model with a traditional random effect, but with the

additional feature that the time varying disturbance is not normally distributed. Specifically, the

model may be written in our familiar form for the stochastic frontier model,

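y_{it} = \alpha + \beta' x_{it} + \varepsilon_{it} + w_i

\varepsilon_{it} \sim (2/\sigma) \phi(\varepsilon_{it}/\sigma) \Phi(-\lambda \varepsilon_{it}/\sigma)

w_i \sim N[0, \sigma_w^2].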

The model is estimable by maximum simulated likelihood, as shown below. Contrast this to the Pitt

and Lee form,

y_{it} = \alpha + \beta' x_{it} + v_{it} - u_i

v_{it} \sim N[0, \sigma_v^2]

u_i = |U_i|, U_i \sim N[0, \sigma_u^2].

In this form, u_i, the time invariant effect, is the inefficiency. In the true random effects model, u_{it} is the inefficiency, and it is time varying. The latent heterogeneity, the random effect, is w_i. Thus, in the Pitt and Lee model, the 'inefficiency' term also contains all other time invariant unmeasured sources of heterogeneity. In the true random effects model, these effects appear in w_i, and u_{it} picks up the inefficiency. By this interpretation, we will expect (and always find) that estimated inefficiencies from the Pitt and Lee model are larger than those from the true random effects model, sometimes far larger. The same result is at work in the difference between the Cornwell et al. fixed effects model and the true fixed effects model. Figure E64.8 clearly shows the effect at work.

The true random effects model is estimated as a form of random parameters (RP) model, in

which the only random parameter in the model is the constant term. Thus, we write the model in the

canonical RP form

y_{it} = \alpha_i + \beta' x_{it} + v_{it} - u_{it}

v_{it} \sim N[0, \sigma_v^2]

u_{it} = |U_{it}|, U_{it} \sim N[0, \sigma_u^2]

\alpha_i = \alpha + w_i

w_i \sim N[0, \sigma_w^2]


Details on estimating random parameters models appear in Chapter R24, so they will be omitted

here.
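As a brief orientation only, the simulated log likelihood for a random constant model of this kind is usually written as below. This is the standard textbook form, stated here as an assumption about the general approach rather than a description of LIMDEP's exact computations. R is the number of draws, w_{ir} is the r-th standard normal draw for individual i, \sigma^2 = \sigma_u^2 + \sigma_v^2 and \lambda = \sigma_u / \sigma_v.

```latex
\log L_S = \sum_{i=1}^{N} \log \left\{ \frac{1}{R} \sum_{r=1}^{R} \prod_{t=1}^{T_i}
  \frac{2}{\sigma}\,
  \phi\!\left( \frac{\varepsilon_{it,r}}{\sigma} \right)
  \Phi\!\left( \frac{-\lambda\, \varepsilon_{it,r}}{\sigma} \right) \right\},
\qquad
\varepsilon_{it,r} = y_{it} - \alpha - \sigma_w w_{ir} - \beta' x_{it}.
```

The inner product is the usual normal-half normal density evaluated at the residual implied by a given draw of the random effect. Averaging over the draws integrates the effect out of the likelihood.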

The command structure for the true random effects model is similar to that for the true fixed

effects model. The frontier model must be fit twice, first with no effects to generate the starting

values, then with the effect specified. The commands are

FRONTIER ; Lhs = ... ; Rhs = one,... ; Par $

FRONTIER ; Lhs = ... ; Rhs = one,... ; Pds = specification

; RPM ; Fcn = one(n) $

If desired, the Jondrow estimates are requested as usual with

; Eff = the variable name

The computation of random parameters models is fairly time consuming because of the simulations.

You can control this in part with

; Pts = the number of replications

For exploratory work (or for examples in program documentation), small values such as 25 or 50 are

sufficient. For final results destined for publication, larger values, in the range of several hundred

are advisable. Also, we advise using Halton sequences rather than pseudorandom numbers for the

simulations (see Chapter R24). The parameter is

; Halton
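To show what a Halton sequence is, the Python sketch below generates the base 2 sequence with the usual radical inverse construction and maps it to standard normal draws through the inverse normal CDF. It is a generic illustration, not LIMDEP's implementation. The function names and the choice to skip the first ten points are assumptions made for the example.

```python
from statistics import NormalDist

def halton(index, base):
    """Radical inverse of a positive integer index in the given prime base."""
    value, frac = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        value += digit * frac
        frac /= base
    return value

def halton_normal_draws(n, base=2, skip=10):
    """Map Halton points to standard normal draws via the inverse CDF.

    Skipping a few initial points is a common convention, assumed here.
    """
    inv = NormalDist().inv_cdf
    return [inv(halton(i, base)) for i in range(skip + 1, skip + n + 1)]

print([halton(i, 2) for i in range(1, 5)])   # [0.5, 0.25, 0.75, 0.125]
draws = halton_normal_draws(50)
print(round(sum(draws) / len(draws), 4))     # sample mean of the 50 draws
```

Because the points fill the unit interval evenly, a small number of Halton draws typically covers the distribution more uniformly than the same number of pseudorandom draws.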

The random parameters formulation also allows a variety of specifications for the mean of the

underlying uit – the normal-truncated normal model – and for heteroscedasticity. These are

discussed in Section E64.9.

Application

To illustrate the true random effects model, we continue the analysis of the airline data. The

commands below estimate the pooled model, then the true RE model. In like fashion to the analysis

of fixed effects, we then compare the true random effects estimates of inefficiency to the Pitt and Lee

estimates. Figure E64.8 illustrates the general result that the estimated inefficiencies in the true fixed

effects model will differ considerably from those produced by the Cornwell et al. approach to fixed

effects. Figure E64.9 shows the same result for the two approaches to random effects. Numerous

studies in the literature (see Greene (2005) for discussion) have documented the similarity of the

random and fixed approaches – when the same overall structure is used. Thus, Figure E64.10 shows

similar results for the true fixed and random effects models and for the Pitt and Lee and Cornwell et

al. models.


The commands used for this application are as follows:


FRONTIER ; Lhs = lq ; Rhs = x ; Panel ; Eff = uplre $

FRONTIER ; Lhs = lq ; Rhs = x ; Par $

FRONTIER ; Lhs = lq ; Rhs = x ; Panel ; RPM ; Eff = utre

; Fcn = one(n) ; Pts = 50 ; Halton $

FRONTIER ; Lhs = lq ; Rhs = x ; Par $

FRONTIER ; Lhs = lq ; Rhs = x ; Panel ; FEM ; Eff = utfe $

DSTAT ; Rhs = uplre,utre $

CREATE ; utrebar = Group Mean(utre, Str = firm) $

PLOT ; Lhs = uplre ; Rhs = utrebar ; Grid

; Title = Group Means of u(i,t) vs. Time Invariant u(i) $

PLOT ; Lhs = utfe ; Rhs = utre ; Grid

; Title = Time Varying FE u(i) vs. Time Varying RE u(i) $

-----------------------------------------------------------------------------









Sigma(v) = .11582

Sigma(u) = .25552

Sigma = Sqr[(s^2(u)+s^2(v)]= .28054


Var[u]/{Var[u]+Var[v]} = .63879









Kodde-Palm C*: 95%: 2.706, 99%: 5.412

--------+--------------------------------------------------------------------



--------+--------------------------------------------------------------------


Constant| -1.70327*** .41761 -4.08 .0000 -2.52176 -.88477

LF| .19534** .09759 2.00 .0453 .00407 .38662

LM| .81312*** .06954 11.69 .0000 .67682 .94941

LE| 1.12741*** .34589 3.26 .0011 .44947 1.80534

LL| -.32931*** .07230 -4.55 .0000 -.47102 -.18760

LP| .22206*** .06265 3.54 .0004 .09927 .34485

LK| -.86072** .42646 -2.02 .0436 -1.69657 -.02488


Lambda| 2.20605* 1.31249 1.68 .0928 -.36639 4.77849

Sigma(u)| .25552** .10148 2.52 .0118 .05661 .45442

--------+--------------------------------------------------------------------


-----------------------------------------------------------------------------







Sigma(v) = .13791

Sigma(u) = .13007

Sigma = Sqr[(s^2(u)+s^2(v)]= .18957


Var[u]/{Var[u]+Var[v]} = .24425









Kodde-Palm C*: 95%: 2.706, 99%: 5.412

--------+--------------------------------------------------------------------



--------+--------------------------------------------------------------------


Constant| -2.98823*** .72136 -4.14 .0000 -4.40206 -1.57439

LF| .37257*** .07038 5.29 .0000 .23463 .51052

LM| .69910*** .07580 9.22 .0000 .55054 .84766

LE| 2.09473*** .68790 3.05 .0023 .74647 3.44299

LL| -.42909*** .06315 -6.79 .0000 -.55287 -.30530

LP| .44533*** .09498 4.69 .0000 .25917 .63149

LK| -2.09806*** .76556 -2.74 .0061 -3.59853 -.59759


Lambda| .94309*** .16870 5.59 .0000 .61244 1.27373

Sigma| .18957*** .00064 297.81 .0000 .18832 .19082

--------+--------------------------------------------------------------------


-----------------------------------------------------------------------------

The estimates of the true random effects model appear below. Note that the variation of the random terms in the model has been rearranged. In the pooled model, sv = 0.138 and su = 0.130. In the random effects model, we have sv = 0.099 and su = 0.100. But, sw = 0.117, the reported scale parameter of the random constant. (The value 0.140 reported as Sigma is the combined standard deviation of u and v, not sw.) The proportional allocation of the total to u and v has stayed roughly the same, but some additional variation is now attributed to the random effect. Note that the production function parameters have changed substantially as well.
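The summary quantities printed with the frontier results can be checked directly from the two standard deviations. The Python sketch below assumes the definitions shown in the output headers: Sigma is the square root of the sum of the two variances, Lambda is su/sv, and the variance share uses Var[u] = (1 - 2/pi)su^2 for the half normal. The function name frontier_summary is hypothetical.

```python
import math

def frontier_summary(sigma_u, sigma_v):
    """Return (sigma, lambda, Var[u] / (Var[u] + Var[v])) for the half normal model."""
    sigma = math.sqrt(sigma_u**2 + sigma_v**2)
    lam = sigma_u / sigma_v
    var_u = (1.0 - 2.0 / math.pi) * sigma_u**2   # variance of |N(0, su^2)|
    share = var_u / (var_u + sigma_v**2)
    return sigma, lam, share

pooled = frontier_summary(0.13007, 0.13791)    # Sigma(u), Sigma(v), pooled model
true_re = frontier_summary(0.09962, 0.09857)   # Sigma(u), Sigma(v), true RE model
print([round(x, 5) for x in pooled])
print([round(x, 5) for x in true_re])
```

Up to rounding of the inputs, the results agree with the reported Sigma and Lambda in both sets of output, and with the reported variance ratio for the pooled model.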


-----------------------------------------------------------------------------

Random Coefficients Frontier Model



Restricted log likelihood .00000

Chi squared [ 1 d.f.] 321.16131

Significance level .00000


Inf.Cr.AIC = -301.2 AIC/N = -1.176



Stochastic frontier (half normal model)

Simulation based on 50 Halton draws

Sigma( u) (1 sided) = .09962

Sigma( v) (symmetric) = .09857

--------+--------------------------------------------------------------------



--------+--------------------------------------------------------------------

|Production / Cost parameters, nonrandom first

LF| .20387*** .05183 3.93 .0001 .10229 .30545

LM| .79450*** .04660 17.05 .0000 .70318 .88583

LE| 1.10745*** .33573 3.30 .0010 .44943 1.76547

LL| -.32691*** .04277 -7.64 .0000 -.41074 -.24308

LP| .22812*** .05403 4.22 .0000 .12223 .33401

LK| -.84947** .38344 -2.22 .0267 -1.60101 -.09794

|Means for random parameters

Constant| -1.83727*** .35442 -5.18 .0000 -2.53191 -1.14263

|Scale parameters for dists. of random parameters

Constant| .11729*** .00934 12.56 .0000 .09898 .13559


Sigma| .14015*** .01373 10.21 .0000 .11325 .16705


Lambda| 1.01064** .43792 2.31 .0210 .15234 1.86895

--------+--------------------------------------------------------------------


-----------------------------------------------------------------------------


--------+---------------------------------------------------------------------


--------+---------------------------------------------------------------------

UPLRE| .221170 .117670 .016992 .435912 256 0

UTRE| .078815 .031677 .026405 .305595 256 0

--------+---------------------------------------------------------------------


Figure E64.9 Time Varying vs. Time Invariant Estimates of u(i)

Figure E64.10 Comparison of Time Varying Fixed and Random Effects Estimates


E64.9 Random Parameters Stochastic Frontier Models

The random parameters stochastic frontier model in LIMDEP is very general, and embodies

all three of the formulations discussed in the preceding sections on fixed and random effects.

y_{it} = \beta_i' x_{it} + v_{it} - u_{it},

u_{it} = | N[\mu_{it}, \sigma_{uit}^2] |

\mu_{it} = \mu_i' m_{it},

\sigma_{uit}^2 = \sigma_u^2 \exp(\delta_i' w_{it}).

The model allows, all at once, a half normal or truncated normal distribution for u_{it} and firmwise and/or timewise heteroscedasticity in u_{it}. The model form allows parameters to be random in all three parts of the specification with the single restriction noted below. (Only the variance of the 'disturbance,' v_{it}, is assumed to be constant. In addition, this model form does not accommodate heteroscedasticity in v_{it}.) As will be clear in what follows, the true random effects model developed in the previous section is a special case of this model with nonrandom parameters in \mu_{it} and \sigma_{uit}^2 and only a random constant term in \beta_i.

NOTE: The random parameters normal-truncated normal model with heteroscedasticity (in uit) at

the same time is not identified. Only one of these two should be specified. The command parser

will not prevent you from specifying such a model, but it will ultimately be impossible to obtain the

parameter estimates.

The general structure of the random parameters stochastic frontier model is based on the

conditional density

f(y_{it} | x_{it}, \beta_i) = f(\beta_i' x_{it}), i = 1,...,N, t = 1,...,T_i

where \beta_i = \beta + \Delta z_i + \Gamma v_i

and f(.) is the density for the stochastic frontier regression model. The model assumes that parameters are randomly distributed with possibly heterogeneous (across individuals) means

E[\beta_i | z_i] = \beta + \Delta z_i,

(the second term is optional; the mean may be constant), and

Var[\beta_i | z_i] = \Sigma.

As noted earlier, the heterogeneity term is optional. In addition, it may be assumed that some of the parameters are nonrandom by placing rows of zeros in the appropriate places in \Delta and \Gamma. The general form of the random parameter vector \beta_i is also extended to \mu_i and \delta_i. The general aspects of random parameters model estimation in LIMDEP are described in Chapter R24.


Command for the Random Parameters Model

The model command for the random parameters form of the stochastic frontier model is as

follows. The first FRONTIER command is mandatory, and is needed to obtain the starting values.

This is a pooled data version of the model. Note that it does not include the heteroscedasticity or

truncation specification, even if the second command does.

FRONTIER ; Lhs = dependent variable ; Rhs = independent variables

; Parameters $

FRONTIER ; Lhs = dependent variable

; Rhs = independent variables

[ ; Rh2 = list is optional for the truncated normal model ]

[ ; Hfn = list is optional for the heteroscedasticity model ]

; Pds = fixed periods or count variable

; RPM (may include = variables in z)

; Fcn = random parameters specification $

(Note, again, only one of the two optional specifications noted should be specified.)

NOTE: For this model, your Rhs list must include a constant term. Though not strictly necessary,

you should also include constants in Rh2 or Hfn if they are specified.

Specifying Random Parameters

The ; Fcn = specification is used to define the random parameters. It is constructed from

the list of Rhs names as follows: Suppose your model is specified by

; Rhs = one, x1, x2, x3, x4

This involves five coefficients. Any or all of them may be random; any not specified as random are

assumed to be constant. For those that you wish to specify as random, use the following for

production (cost, profit) function parameters,

; Fcn = variable name (distribution),

variable name (distribution), ...

There are two other sets of parameters in the model, in the mean of and variance of the one sided

disturbance. To specify random parameters in the underlying mean of the truncated normal variable,

use the following:

; Fcn = variable name [distribution],

variable name [distribution], ...

(Note square brackets designate the terms in \mu_{it}.) For parameters in the computation of the variance of u_{it}, use

; Fcn = variable name <distribution>,

variable name <distribution>, ...


The difference in the three formulations is in the enclosures, ( ) for production function, [ ] for mean

of the truncated distribution, and <> for the variance of the one sided disturbance. This distinction

is necessary because the lists might have variables in common, and this is the only way to distinguish

them. In particular, it is likely that all three lists would include one, so this device is used to

distinguish the three functions.

Three distributions may be specified. All random variables have mean 0.

n = standard normal distribution, variance = 1,

t = triangular (tent shaped) distribution in [-1,+1], variance = 1/6,

u = standard uniform distribution [-1,1], variance = 1/3.

Note that each of these is scaled as it enters the distribution, so the variance is only that of the

random draw before multiplication. (See Chapter R23 for discussion of this computation and for

other distributions that can be specified.) The latter two distributions are provided as one may wish

to reduce the amount of variation in the tails of the distribution of the parameters across individuals

and to limit the range of variation. (See Train (2010) for discussion.) For example, to specify that

the constant term and the coefficient on x1 are normally distributed with fixed mean and variance,

and a normally distributed constant in the mean of the truncated distribution, you might use

; Fcn = one(n), x1(n), one[n]

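This specifies that the first and second production function coefficients are random while the remainder are not, and that the constant in the mean of the truncated distribution is random as well. The parameters estimated will include the means and standard deviations of the distributions of the two random production function parameters and the fixed values of the other three.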

NOTE: If you use the wrong enclosures for the variables, a diagnostic will appear that the program

does not recognize a variable. For example:

FRONTIER ; Lhs = lq ; Rhs = one,lf,lm,le,ll,lp

; Hfn = one,lf ; RPM ; Pds = ni

; Fcn = one(n),lf(n),lf[n] $

Variable in FCN=name[type] is not in RHS/RH2/HFN list.

The reason for the diagnostic is that the lf[n] would indicate a specification for the truncation model,

using ; Rh2 = list. But, this command specifies only heteroscedasticity, which is denoted with <>

enclosures. Hence, when the lf[n] is encountered, LIMDEP searches for lf in an Rh2 list, and finding

no such list, issues the diagnostic.
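The matching rule behind this diagnostic can be mimicked with a small Python sketch. It only illustrates the logic described above, in which each enclosure points to one list and the name must be found there. It is not LIMDEP's parser; the function parse_fcn and the dictionary layout are hypothetical.

```python
import re

# Enclosure -> the command list it refers to, as described in the text.
PART_BY_OPEN = {"(": "Rhs", "[": "Rh2", "<": "Hfn"}
CLOSE_BY_OPEN = {"(": ")", "[": "]", "<": ">"}
TOKEN = re.compile(r"\s*(\w+)\s*([(\[<])\s*(\w)\s*([)\]>])\s*")

def parse_fcn(spec, lists):
    """Split a Fcn-style string and check each name against the list its enclosure implies."""
    parsed = []
    for item in spec.split(","):
        m = TOKEN.fullmatch(item)
        if m is None or CLOSE_BY_OPEN[m.group(2)] != m.group(4):
            raise ValueError(f"cannot parse {item.strip()!r}")
        name, part, dist = m.group(1), PART_BY_OPEN[m.group(2)], m.group(3)
        if name not in lists.get(part, []):
            raise ValueError(f"{name} is not in the {part} list")
        parsed.append((name, part, dist))
    return parsed

# The lists from the example command: an Rhs list and an Hfn list, but no Rh2 list.
lists = {"Rhs": ["one", "lf", "lm", "le", "ll", "lp"], "Hfn": ["one", "lf"]}
print(parse_fcn("one(n),lf(n),lf<n>", lists))
```

With these lists, the string one(n),lf(n),lf[n] raises an error because lf[n] refers to an Rh2 list that was never given, which parallels the diagnostic shown above.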


Correlated Random Parameters

The stochastic frontier model does not support correlated random parameters. The model is

not identified with this extension.

Heterogeneity in the Means

The preceding examples have specified that the mean of the random variable is fixed over individuals. If there is measured heterogeneity in the means, in the form of

E[\beta_{ki}] = \beta_k + \sum_m \delta_{km} z_{mi}

where z_{mi} is a variable that is measured for each individual, then the command may be modified to

; RPM = list of variables in z

In the data set, these variables must be repeated for each observation in the group. Since the

coefficients are assumed to be time invariant, the variables in zi must be also.

The Parameter Vector and Retained Results

The variances of the underlying random variables are given earlier, 1 for the normal distribution, 1/3 for the uniform, and 1/6 for the tent distribution. The \sigma_k parameters are only the standard deviations for the normal distribution. For the other two distributions, \sigma_k is a scale parameter. The standard deviation is obtained as \sigma_k / \sqrt{3} for the uniform distribution and \sigma_k / \sqrt{6} for the triangular distribution. When the parameters are correlated, the implied covariance matrix is adjusted accordingly. The correlation matrix is unchanged by this.
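The conversion from an estimated scale parameter to an implied standard deviation is a one line calculation, sketched here in Python. The base variances are those listed above (1, 1/3 and 1/6). The function name implied_sd is hypothetical, and the scale value is used for all three distributions purely for illustration.

```python
import math

# Variance of the base draw before scaling: normal, uniform[-1,1], triangular[-1,1].
VARIANCE_OF_BASE_DRAW = {"n": 1.0, "u": 1.0 / 3.0, "t": 1.0 / 6.0}

def implied_sd(scale, dist):
    """Standard deviation of scale * (base draw) for distribution code n, u or t."""
    return abs(scale) * math.sqrt(VARIANCE_OF_BASE_DRAW[dist])

scale = 0.11729  # illustrative scale parameter
for dist in ("n", "u", "t"):
    print(dist, round(implied_sd(scale, dist), 5))
```

The same scale estimate therefore implies less parameter variation under the uniform or triangular distribution than under the normal.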

Results saved by this estimator are:

Matrices: b = estimate of the full parameter vector

varb = asymptotic covariance matrix for the estimate in b.

beta_i = individual specific parameters, if ; Par is requested.

Scalars: kreg = number of variables in Rhs

nreg = number of observations

logl = log likelihood function

Last Model: b_variables

Last Function: None


Standard Model Specifications for the Stochastic Frontier Random Parameters Model



; Par keeps individual specific parameter estimates.




same as ; Printvc.

; Robust requests a 'sandwich' estimator or robust covariance matrix for TSCS

and several discrete choice models.


















Application

We continue the earlier application by fitting the stochastic frontier model with random

parameters. The random parameters truncation model appears to be unidentified in these data, so the

second model fit is with heteroscedasticity. In the first model, the constant and one of the production

coefficients are specified to be random. In the second, these two coefficients and the parameter on the

variable that enters the variance function are all taken to be random. The kernel density estimators

compare the efficiency estimates from the random parameters model to those from the simplest

pooled estimator.


The commands are:


FRONTIER ; Lhs = lq ; Rhs = x ; Eff = u $

FRONTIER ; Lhs = lq ; Rhs = x

; RPM ; Panel ; Pts = 50 ; Halton ; Fcn = one(n),lf(n) ; Eff = urp1 $

KERNEL ; Rhs = urp1,u $

FRONTIER ; Lhs = lq ; Rhs = x $

FRONTIER ; Lhs = lq ; Rhs = x ; Hfn = one,loadfctr

; RPM ; Panel ; Pts = 50 ; Halton

; Fcn = one(n),lf(n),loadfctr<n> $

-----------------------------------------------------------------------------

Random Coefficients Frontier Model



Restricted log likelihood .00000

Chi squared [ 2 d.f.] 322.66392

Significance level .00000


Inf.Cr.AIC = -300.7 AIC/N = -1.174





Sigma( u) (1 sided) = .10598


--------+--------------------------------------------------------------------



--------+--------------------------------------------------------------------


LM| .81447*** .04526 18.00 .0000 .72577 .90317

LE| 1.16342*** .31391 3.71 .0002 .54817 1.77867

LL| -.33712*** .04111 -8.20 .0000 -.41769 -.25654

LP| .24213*** .04782 5.06 .0000 .14841 .33585

LK| -.94502*** .35520 -2.66 .0078 -1.64119 -.24886


Constant| -1.89056*** .33140 -5.70 .0000 -2.54009 -1.24103

LF| .21430*** .05277 4.06 .0000 .11088 .31773


Constant| .12526*** .00926 13.53 .0000 .10711 .14341

LF| .04979*** .00823 6.05 .0000 .03366 .06592


Sigma| .14165*** .01265 11.20 .0000 .11686 .16645


Lambda| 1.12768*** .42335 2.66 .0077 .29792 1.95743

--------+--------------------------------------------------------------------


-----------------------------------------------------------------------------

Figure E64.11 shows the distributions of the estimates of inefficiencies from the random parameters

model and the simple, pooled fixed parameters model. The figure suggests that the RP formulation

is moving some of the variation of the outcome variable out of the inefficiency term and into the

production model, in the form of parameter variation.


Figure E64.11 Kernel Density Estimator for Random Parameters Model Inefficiencies

-----------------------------------------------------------------------------

Random Coefficients FrntrTrn Model





Stochastic frontier, truncation/hetero.


Estimated parameters of efficiency dstn

s(u) = .189842 s(v)= .07165

avgE[u|e]= .10986 avgE[TE|e]= .90303

Lambda = su/sv = 2.64974

--------+--------------------------------------------------------------------



--------+--------------------------------------------------------------------

|Nonrandom parameters

LM| .62243*** .04223 14.74 .0000 .53966 .70521

LE| .38353 .28063 1.37 .1717 -.16649 .93355

LL| -.36579*** .03589 -10.19 .0000 -.43614 -.29544

LP| .15282*** .04217 3.62 .0003 .07017 .23547

LK| -.16125 .31392 -.51 .6075 -.77652 .45401

suONE| 9.05239*** 1.65934 5.46 .0000 5.80014 12.30464


Constant| -1.17144*** .29799 -3.93 .0001 -1.75549 -.58739

LF| .49011*** .04904 9.99 .0000 .39398 .58623

suLOADFC| -16.4160*** 3.47560 -4.72 .0000 -23.2281 -9.6039


Constant| .12591*** .00859 14.65 .0000 .10906 .14275

LF| .01186** .00593 2.00 .0456 .00023 .02350

suLOADFC| 1.47653*** .36192 4.08 .0000 .76718 2.18589

|Sigma(v) from symmetric disturbance.

Sigma(v)| .07165*** .00670 10.69 .0000 .05851 .08478

--------+--------------------------------------------------------------------


-----------------------------------------------------------------------------


E64.10 Alvarez et al. – Fixed Management Model

Alvarez, Arias and Greene (2006) suggested a production model in which an unobserved factor enters as a latent variable. The core production model is

\[ y_{it} = f(x_{it,1}, x_{it,2}, \dots, x_{it,K}, m_i) \]

where the unobservable, time invariant factor, $m_i$, is labeled 'management' in their paper. By treating the unobserved factor as a random component in the model, the authors develop a stochastic frontier model in which the resultant functional form is such that all random parameters are functions of the same single random effect, $w_i$, and $w_i$ appears in squared form in the equation as well. In generic terms, this model is a random parameters stochastic frontier model with random constant term and first order terms, and nonrandom second order terms in a translog model. The functional form is

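\[
\begin{aligned}
\ln y_{it} &= \alpha_i + \sum_{k=1}^{K} \beta_{k,i} \ln x_{it,k} + \tfrac{1}{2} \sum_{k=1}^{K} \sum_{m=1}^{K} \beta_{km} \ln x_{it,k} \ln x_{it,m} + v_{it} - u_{it} \\
\alpha_i &= \alpha + \theta_{\alpha} w_i + \tfrac{1}{2} \theta_{\alpha\alpha} w_i^2 \\
\beta_{k,i} &= \beta_k + \theta_k w_i \\
w_i &\sim N[0,1] \\
v_{it} &\sim N[0, \sigma_v^2] \\
u_{it} &= |N[0, \sigma_u^2]|
\end{aligned}
\]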

This model is specified simply by creating the necessary variables, then building a random

parameters model with the two additional specifications,

; Common ; Mgt

The ; Common specification alone is generic, and applies to all random parameters models. Use it

to specify that the same random component appears in all random parameters. The ; Mgt

specification has no function outside the frontier model. It is used only with the frontier model to

specify this particular form. For example, consider the following three factor translog model:

FRONTIER ; Lhs = yit ; Rhs = one,x1,x2,x3,x11,x12,x13,x22,x23,x33 $

FRONTIER ; Lhs = yit ; Rhs = one,x1,x2,x3,x11,x12,x13,x22,x23,x33

; RPM ; Pds = the panel specification ; Halton

; Fcn = one(n),x1(n),x2(n),x3(n)

; Common ; Mgt $

(It is always necessary to fit the frontier model with fixed parameters first to generate the starting

values.)


An extension of this model that the authors considered was intended to ameliorate the probable correlation between the random effect $w_i$ and the independent variables (factors). The Mundlak approach to this problem is to incorporate the group means of the variables in the model. For this model, they proposed

\[ w_i = \sum_{k=1}^{K} \tau_k \, \overline{\log x}_{i,k} + f_i \]

where $f_i$ is now the structural random variable that drives the random parameters. This extension is requested with

; Means

(The program deduces internally which variables are nonconstant and should be used.)
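To make concrete what the group means in the Mundlak extension are, the following Python sketch computes, for each individual in a panel, the mean of every regressor over that individual's observations. It is an illustration only, not LIMDEP code, and it does not reproduce the program's internal computations; the function name `group_means` and the small data set are hypothetical.

```python
from collections import defaultdict

def group_means(ids, rows):
    """Return {id: [mean of each column]} for panel rows grouped by id."""
    sums = {}
    counts = defaultdict(int)
    for gid, row in zip(ids, rows):
        if gid not in sums:
            sums[gid] = [0.0] * len(row)
        for k, value in enumerate(row):
            sums[gid][k] += value
        counts[gid] += 1
    return {gid: [s / counts[gid] for s in total] for gid, total in sums.items()}

# Hypothetical unbalanced panel: two farms, two (log) inputs each.
ids = [1, 1, 1, 2, 2]
rows = [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0], [10.0, 0.0], [20.0, 2.0]]
means = group_means(ids, rows)
print(means)
```

Each individual's vector of means is constant over time, which is why it can enter the equation for the time invariant random effect.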

Application

The following is the Alvarez, Arias and Greene application. The data consist of six years of observations on 247 Spanish dairy farms. The output, $y_{it}$, is milk production. The four inputs, x1, x2, x3 and x4, are feed, land, labor and cows. The commands for fitting the model are as follows. (We have restricted the number of iterations and the number of replications for the purposes of this numerical illustration.) Both models (with and without the Mundlak adjustment) are shown.

FRONTIER ; Lhs = yit

; Rhs = one,x1,x2,x3,x4,x11,x12,x13,x14,x23,x24,x34,x44 ; Par $


FRONTIER ; Lhs = yit

; Rhs = one,x1,x2,x3,x4,x11,x12,x13,x14,x23,x24,x34,x44

; RPM ; Halton ; Pts = 25 ; Pds = 6 ; Maxit = 25 ; Common ; Mgt

; Fcn = one(n),x1(n),x2(n),x3(n),x4(n) $


FRONTIER ; Lhs = yit

; Rhs = one,x1,x2,x3,x4,x11,x12,x13,x14,x23,x24,x34,x44 ; Par $


FRONTIER ; Lhs = yit

; Rhs = one,x1,x2,x3,x4,x11,x12,x13,x14,x23,x24,x34,x44

; RPM ; Halton ; Pts = 25 ; Pds = 6 ; Maxit = 25

; Common ; Mgt ; Means

; Fcn = one(n),x1(n),x2(n),x3(n),x4(n) $

The first set of results is the pooled stochastic frontier model with no extensions or

modifications.


-----------------------------------------------------------------------------


Dependent variable YIT





Sigma(v) = .09359

Sigma(u) = .16825

Sigma = Sqr[(s^2(u)+s^2(v)]= .19253


Var[u]/{Var[u]+Var[v]} = .54012









Kodde-Palm C*: 95%: 2.706, 99%: 5.412

--------+--------------------------------------------------------------------


YIT| Coefficient Error z |z|>Z* Interval

--------+--------------------------------------------------------------------


Constant| 11.6942*** .00529 2209.86 .0000 11.6838 11.7046

X1| .60483*** .02133 28.35 .0000 .56302 .64664

X2| .02246** .01140 1.97 .0489 .00011 .04480

X3| .02336* .01245 1.88 .0606 -.00104 .04776

X4| .44945*** .01172 38.34 .0000 .42647 .47242

X11| .59297*** .13525 4.38 .0000 .32789 .85806

X12| -.17183*** .04842 -3.55 .0004 -.26673 -.07693

X13| .20033*** .06903 2.90 .0037 .06502 .33563

X14| -.32993*** .07299 -4.52 .0000 -.47297 -.18688

X23| .00386 .04203 .09 .9268 -.07852 .08624

X24| .06473** .03009 2.15 .0314 .00576 .12369

X34| -.07096* .03853 -1.84 .0655 -.14648 .00455

X44| .20854*** .04328 4.82 .0000 .12373 .29336


Lambda| 1.79780*** .10292 17.47 .0000 1.59608 1.99951

Sigma| .19253*** .00011 1715.95 .0000 .19231 .19275

--------+--------------------------------------------------------------------


-----------------------------------------------------------------------------
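The derived quantities in the header of these results can be checked from the two reported standard deviations. The Python sketch below is our illustration, not program output. It assumes the normal-half normal model, for which Var[u] = (1 - 2/π)σu², and the function name `frontier_summary` is hypothetical.

```python
import math

def frontier_summary(sigma_u, sigma_v):
    """Return (sigma, lambda, variance share of u) for the half normal model."""
    sigma = math.sqrt(sigma_u ** 2 + sigma_v ** 2)   # Sqr[s^2(u) + s^2(v)]
    lam = sigma_u / sigma_v                          # Lambda = su / sv
    var_u = (1.0 - 2.0 / math.pi) * sigma_u ** 2     # variance of |N[0, su^2]|
    share = var_u / (var_u + sigma_v ** 2)           # Var[u] / {Var[u] + Var[v]}
    return sigma, lam, share

# Values reported above for the pooled dairy farm model.
sigma, lam, share = frontier_summary(0.16825, 0.09359)
print(round(sigma, 5), round(lam, 4), round(share, 4))
```

Up to rounding of the displayed inputs, the three values agree with the reported Sigma, Lambda and Var[u]/{Var[u]+Var[v]}. Note that the variance share uses Var[u], not σu², so it is smaller than λ²/(1 + λ²).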


This is the fixed management model without the Mundlak correction.

+---------------------------------------------+

| Random Coefficients Frontier Model |

| Dependent variable YIT |

| Log likelihood function 1327.58807 |

| Estimation based on N = 1482, K = 21 |

| Sample is 6 pds and 247 individuals |

+---------------------------------------------+

-----------------------------------------------------------------------------

All parameters have the same random effect

Alvarez/Arias/Greene Fixed Mgt. SF Model



Sigma( u) (1 sided) = .09355


--------+--------------------------------------------------------------------



--------+--------------------------------------------------------------------


X11| .19550** .08392 2.33 .0198 .03101 .35999

X12| -.00410 .02903 -.14 .8876 -.06100 .05279

X13| -.03972 .04116 -.96 .3346 -.12039 .04095

X14| -.08681** .04220 -2.06 .0397 -.16952 -.00410

X23| .02377 .02534 .94 .3483 -.02590 .07344

X24| -.01893 .01743 -1.09 .2775 -.05310 .01524

X34| .02550 .02305 1.11 .2684 -.01967 .07067

X44| .09988*** .02339 4.27 .0000 .05403 .14572


Constant| 11.6506*** .00445 2620.80 .0000 11.6418 11.6593

X1| .65048*** .01227 53.03 .0000 .62643 .67452

X2| .03525*** .00681 5.17 .0000 .02190 .04861

X3| .04531*** .00759 5.97 .0000 .03043 .06019

X4| .40147*** .00646 62.16 .0000 .38881 .41413

|Coefficients on unobservable fixed management

Constant| .12579*** .00238 52.96 .0000 .12114 .13045

X1| -.02248* .01218 -1.85 .0649 -.04635 .00139

X2| .00767 .00851 .90 .3676 -.00902 .02436

X3| .00794 .00939 .85 .3979 -.01047 .02635

X4| -.00967 .00657 -1.47 .1410 -.02255 .00320

Alpha_mm| -.02835*** .00414 -6.85 .0000 -.03646 -.02024


Sigma| .11007*** .00289 38.04 .0000 .10439 .11574


Lambda| 1.61332*** .11959 13.49 .0000 1.37893 1.84771

--------+--------------------------------------------------------------------


-----------------------------------------------------------------------------


+---------------------------------------------+

| Random Coefficients Frontier Model |

| Dependent variable YIT |

| Log likelihood function 1273.63070 |

| Sample is 6 pds and 247 individuals |

+---------------------------------------------+

-----------------------------------------------------------------------------

All parameters have the same random effect

Alvarez/Arias/Greene Fixed Mgt. SF Model



Sigma( u) (1 sided) = .12577


--------+--------------------------------------------------------------------



--------+--------------------------------------------------------------------


X11| -.06957 .08521 -.82 .4142 -.23658 .09743

X12| .00164 .02989 .05 .9562 -.05693 .06022

X13| .31592*** .04339 7.28 .0000 .23087 .40097

X14| -.08946* .04767 -1.88 .0606 -.18289 .00398

X23| -.02088 .02784 -.75 .4533 -.07545 .03369

X24| -.04357** .01912 -2.28 .0227 -.08103 -.00610

X34| -.15581*** .02350 -6.63 .0000 -.20187 -.10975

X44| .16310*** .02763 5.90 .0000 .10895 .21725


Constant| 11.6829*** .00449 2601.72 .0000 11.6741 11.6917

X1| .60260*** .02198 27.41 .0000 .55951 .64569

X2| .05221*** .01636 3.19 .0014 .02015 .08427

X3| .10728*** .02775 3.87 .0001 .05290 .16166

X4| .39780*** .01047 38.00 .0000 .37728 .41832

|Coefficients on unobservable fixed management

Constant| .11398*** .00235 48.52 .0000 .10937 .11858

X1| -.05393*** .01134 -4.76 .0000 -.07616 -.03171

X2| .03061*** .00916 3.34 .0008 .01265 .04857

X3| .01309 .01202 1.09 .2760 -.01046 .03665

X4| .01621** .00707 2.29 .0218 .00236 .03007

Alpha_mm| -.03575*** .00368 -9.72 .0000 -.04296 -.02855


Sigma| .13678*** .00368 37.19 .0000 .12957 .14399


Lambda| 2.33925*** .14491 16.14 .0000 2.05524 2.62326

|Variable Means in Unobserved Management

X1_bar| -.12466 .22073 -.56 .5722 -.55728 .30796

X2_bar| .00045 .15758 .00 .9977 -.30839 .30930

X3_bar| .01632 .25437 .06 .9489 -.48224 .51487

X4_bar| .15107 .11332 1.33 .1825 -.07102 .37316

--------+--------------------------------------------------------------------


-----------------------------------------------------------------------------


E64.11 Latent Class Stochastic Frontier Models

The latent class framework discussed in Chapter E20 is available for the stochastic frontier

model. The structural equations of the basic model are

\[
\begin{aligned}
y_{it} \mid j &= \beta_j' x_{it} + v_{it} - u_{it}, \\
v_{it} \mid j &\sim N[0, \sigma_{vj}^2], \\
u_{it} \mid j &= |N[0, \sigma_{uj}^2]|,
\end{aligned}
\]

where 'j' indicates class j. The truncation and heteroscedasticity models are not supported by this estimator. However, the Battese and Coelli model, in which

\[ u_{it} \mid j = g(z_{it}) \mid j \cdot |U_i| \]

is available for both forms of $g(z_{it})$.

The estimation command for the latent class stochastic frontier model is


FRONTIER ; Lhs = dependent variable

; Rhs = one, remaining variables ; Parameters $


FRONTIER ; Lhs = dependent variable

; Rhs = one, remaining variables

; Pds = fixed periods or count variable

; LCM ; Pts = number of classes (2, 3, ..., 9) $

(As in other panel data settings, it is necessary to fit the pooled model first to compute the starting

values.)

The Battese and Coelli models may be specified here with

; Model = BC

for the decay model and

; Model = BC

; Hfu = one, heteroscedasticity variables

For this model, you must fit the identical Battese and Coelli model without the latent class

specification first. The application below demonstrates.

The basic form of the latent class model assumes that the class probabilities are fixed values.

You may make them dependent on time invariant variables, $w_i$, with

; LCM = list of variables in w

Do not include one in the list.


Some particular variables computed for the latent class model are

; Group = the index of the most likely latent class

; Cprob = estimated probability for the most likely latent class

You can obtain a listing of these two results by using

; List

An example appears below. You can also use the ; Rst = list option to structure the latent class

model so that different variables appear in different classes or that certain coefficients are equal

across classes. Examples are given in Chapter E20.
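The two saved results, the index of the most likely class and its probability, are simple functions of an individual's estimated class probabilities. The Python sketch below shows the rule. It is illustrative only; the function name `most_likely_class` is hypothetical, and the probabilities are taken from the listing in the application later in this section.

```python
def most_likely_class(posteriors):
    """Return (1-based class index, probability) of the most probable class."""
    cprob = max(posteriors)
    group = posteriors.index(cprob) + 1   # classes are numbered from 1
    return group, cprob

# Class probabilities for the first two farms in the listing below.
farm1 = most_likely_class([0.889, 0.111])
farm2 = most_likely_class([0.295, 0.705])
print(farm1, farm2)
```

For the first farm this returns class 1 with probability .889, and for the second farm class 2 with probability .705, which matches the J* values shown in the listing.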

Estimates retained by this model include:

Matrices: b = full parameter vector, [$\beta_1'$, $\sigma_1$, $\lambda_1$, $\beta_2'$, $\sigma_2$, $\lambda_2$, ..., F1,...,FJ]

varb = full covariance matrix

beta_i = individual specific parameters, if ; Par is requested

Note that b and varb involve J(K+2) estimates. Two additional matrices are created,

b_class = a J×K matrix with each row equal to the corresponding $\beta_j$

class_pr = a J×1 vector containing the estimated class probabilities

Scalars: kreg = number of variables in Rhs list

nreg = total number of observations used for estimation

logl = maximized value of the log likelihood function

exitcode = exit status of the estimation procedure

Standard Model Specifications for the Latent Class Stochastic Frontier Model



; Par keeps individual specific parameter estimates.

; Partial Effects displays marginal effects, same as ; Marginal Effects.

; OLS displays least squares starting values when (and if) they are computed.




same as ; Printvc.

; Robust requests a 'sandwich' estimator or robust covariance matrix for TSCS and several discrete choice models.















; Fill fills missing values (outside estimating sample) for fitted values.






Application

The airline data used in the preceding examples are clearly not compatible with this model;

no configuration of the equation produces meaningful results. To illustrate the estimator, we have

borrowed the Spanish dairy data used in the previous section. The following commands fit a two

class, Battese and Coelli decay model.

NAMELIST ; x = one,x1,x2,x3,x4 $

FRONTIER ; Lhs = yit ; Rhs = x

; Model = BC

; Pds = 6 $

FRONTIER ; Lhs = yit ; Rhs = x

; Model = BC

; LCM ; Pts = 2 ; Pds = 6 ; List $


These are the initial results from the first command.

-----------------------------------------------------------------------------








Sigma(v) = .07413

Sigma(u) = .19848

Sigma = Sqr[(s^2(u)+s^2(v)]= .21187


Var[u]/{Var[u]+Var[v]} = .72263



Time dependent uit=exp[-eta(t-T)]*|U(i)|








Kodde-Palm C*: 95%: 2.706, 99%: 5.412

--------+--------------------------------------------------------------------



--------+--------------------------------------------------------------------


Constant| 11.7882*** .00716 1646.05 .0000 11.7742 11.8022

X1| .62230*** .01365 45.59 .0000 .59555 .64905

X2| .06001*** .01069 5.61 .0000 .03905 .08096

X3| .05708*** .01454 3.93 .0001 .02858 .08557

X4| .35510*** .00700 50.69 .0000 .34137 .36883


Lambda| 2.67761*** .02351 113.88 .0000 2.63152 2.72369

Sigma(u)| .19848*** .00060 332.72 .0000 .19731 .19965


Eta| .08030*** .00432 18.60 .0000 .07184 .08877

--------+--------------------------------------------------------------------


Warning 141: Iterations:current or start estimate of sigma is nonpositive

Normal exit from iterations. Exit status=0.

-----------------------------------------------------------------------------

Latent Class / Panel Frontier Model




Sample is 6 pds and 247 individuals

Stoch. frontier (B&C,time varying U)

Ineff=u(i,t)=exp(-eta*(t-T))|U(i)|

Model fit with 2 latent classes.

--------+--------------------------------------------------------------------



--------+--------------------------------------------------------------------

|Model parameters for latent class 1

Constant| 11.8355*** .02201 537.84 .0000 11.7923 11.8786

X1| .60324*** .03499 17.24 .0000 .53467 .67181

X2| .13327*** .04014 3.32 .0009 .05459 .21195

X3| .10581*** .03248 3.26 .0011 .04216 .16947

X4| .33560*** .01392 24.11 .0000 .30832 .36288

|Square root of variance sum, sqr(s2u + s2v)

Sigma| .71161** .35935 1.98 .0477 .00730 1.41591

|Asymmetry parameter in compound distn, su/sv

Lambda| .02071 .02565 .81 .4194 -.02956 .07098

|Scale factor in time varying inefficiency

Eta| .19551*** .01986 9.84 .0000 .15658 .23444

|Model parameters for latent class 2

Constant| 11.7611*** .01279 919.62 .0000 11.7360 11.7862

X1| .61866*** .01873 33.04 .0000 .58196 .65536

X2| .05041*** .01289 3.91 .0001 .02514 .07567

X3| .06232*** .01830 3.40 .0007 .02645 .09820

X4| .30614*** .01029 29.76 .0000 .28598 .32631

|Square root of variance sum, sqr(s2u + s2v)

Sigma| .92839*** .02938 31.60 .0000 .87081 .98597

|Asymmetry parameter in compound distn, su/sv

Lambda| .05084 .22185 .23 .8187 -.38398 .48566

|Scale factor in time varying inefficiency

Eta| .07059*** .00475 14.87 .0000 .06129 .07990

|Estimated prior probabilities for class membership

Class1Pr| .30612*** .05178 5.91 .0000 .20463 .40760

Class2Pr| .69388*** .05178 13.40 .0000 .59240 .79537

--------+--------------------------------------------------------------------


-----------------------------------------------------------------------------


+---------------------------------------------------+

| Stochastic Frontier Model Variance Parameters |

| Class Lambda Sigma Sigma(u) Sigma(v) |

| 1 .020709 .711607 .014734 .711454 |

| 2 .050840 .928393 .047139 .927195 |

+---------------------------------------------------+
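The variance parameters in this table are related by λ = σu/σv and σ² = σu² + σv², so Sigma(u) and Sigma(v) can be recovered from the estimated Lambda and Sigma for each class. The following Python sketch performs that conversion for the two classes. It is our illustration rather than program output, and the function name `split_sigma` is hypothetical.

```python
import math

def split_sigma(lam, sigma):
    """Recover (sigma_u, sigma_v) from lambda = su/sv and sigma = sqrt(su^2 + sv^2)."""
    sigma_v = sigma / math.sqrt(1.0 + lam ** 2)
    sigma_u = lam * sigma_v
    return sigma_u, sigma_v

# Lambda and Sigma as reported in the table above.
class1 = split_sigma(0.020709, 0.711607)
class2 = split_sigma(0.050840, 0.928393)
print([round(v, 6) for v in class1], [round(v, 6) for v in class2])
```

The results reproduce the Sigma(u) and Sigma(v) columns to the precision shown. In both classes, nearly all of the estimated variation is attributed to the symmetric disturbance.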

=============================================================================

Predictions computed for the group with the largest posterior probability

Obs. Periods Estimated inefficiencies, E[u|v -/+ u]

=============================================================================

Ind.= 1 J* = 1 P(j)= .889 .111

01-06 .3105 .2554 .2100 .1727 .1421 .1168

Ind.= 2 J* = 2 P(j)= .295 .705

01-06 .0813 .0757 .0706 .0657 .0613 .0571

Ind.= 3 J* = 2 P(j)= .012 .988

01-06 .2254 .2100 .1957 .1824 .1699 .1584

Ind.= 4 J* = 1 P(j)= .955 .045

01-06 .1778 .1463 .1203 .0989 .0814 .0669

Ind.= 5 J* = 1 P(j)= .650 .350

01-06 .2453 .2018 .1659 .1365 .1122 .0923

Ind.= 6 J* = 2 P(j)= .138 .862

01-06 .0517 .0482 .0449 .0418 .0390 .0363

Ind.= 7 J* = 1 P(j)= .985 .015

01-06 .3010 .2476 .2036 .1674 .1377 .1132

Ind.= 8 J* = 2 P(j)= .165 .835

01-06 .0561 .0523 .0487 .0454 .0423 .0394

Ind.= 9 J* = 2 P(j)= .450 .550

01-06 .0134 .0125 .0116 .0108 .0101 .0094

Ind.= 10 J* = 1 P(j)= .999 .001

01-06 .1039 .0855 .0703 .0578 .0475 .0391

(Farms 11-247 omitted)
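The time profile of each farm's estimates in this listing follows the decay function shown in the header, u(i,t) = exp(-eta*(t-T))|U(i)|. The Python sketch below rebuilds the path for the first farm from its final period value and the estimated Eta for its most likely class (class 1). It is a check on the pattern only, not a reproduction of the estimator; the function name `decay_path` is hypothetical.

```python
import math

def decay_path(u_last, eta, periods):
    """Return u_t = u_T * exp(-eta * (t - T)) for t = 1, ..., T."""
    return [u_last * math.exp(-eta * (t - periods)) for t in range(1, periods + 1)]

# Farm 1: last period estimate .1168, class 1 Eta = .19551, T = 6 periods.
path = decay_path(0.1168, 0.19551, 6)
print([round(u, 4) for u in path])
```

With a positive Eta, the values decline toward the last period. The computed path agrees with the six listed values for farm 1 to within rounding of the displayed inputs.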


E65: Data Envelopment Analysis

E65.1 Introduction

There are two broad paradigms used by researchers to analyze efficiency in production,

stochastic frontier analysis (SFA) and data envelopment analysis (DEA). No formulation has yet

been devised that unifies SFA and DEA in a single analytical framework. Arguably, the former is a

fully parameterized model whereas the latter is 'nonparametric,' albeit also atheoretical in nature.

DEA is currently the conventional approach to deterministic frontier estimation. This is usually

handled with linear programming techniques. The analysis assumes that there is a frontier

technology (in the same spirit as the stochastic frontier production model) that can be described by a

piecewise linear hull that envelops the observed outcomes. Some (efficient) observations will be on

the frontier while other (inefficient) individuals will be inside. The technique produces a

deterministic frontier that is generated by the observed data, so by construction, some individuals are

'efficient.' This is one of the fundamental differences between DEA and SFA. This chapter presents

LIMDEP's programs for data envelopment analysis (DEA).

E65.2 Data Envelopment Analysis

Stochastic frontier modeling is based on maximum likelihood or other classical or Bayesian,

parametric econometric techniques. In contrast, DEA is based on nonparametric, linear programming

methods. Both paradigms are based on an underlying construct of the efficient production frontier that

relates maximal output to inputs for the 'firm' (decision making unit, or DMU). Using SFA methods,

the analyst defines, then estimates a continuous, regular relationship that defines the frontier. DEA

uses linear programming methods to fit a piecewise linear 'hull' around the data, under the assumption

that the hull adequately approximates the underlying frontier, the more so as the number of

observations increases. (Since the technique is nonstatistical, this is difficult to establish analytically.)

There is a vast literature on the two techniques and comparisons, none of which will be reviewed here.

Our purpose here is only to document the estimator. We recommend, as a departure point in the

literature, a working paper by Coelli (1996a), which describes the techniques documented here and

introduces some of the theoretical notions. He also provides several useful citations.

E65.2.1 Input and Output Oriented Efficiency

The discussion of DEA efficiency measurement begins with the notion of a measure of the ratio of outputs to inputs for firm 'i,'

\[ \text{Ratio}_i = \mu' y_i / \nu' x_i, \quad i = 1,\dots,N, \]

where $y_i$ is the vector of M outputs, $x_i$ is the vector of K inputs, and $\mu$ and $\nu$ are the corresponding vectors of output and input weights. The optimal weights are defined by the programming problem,

\[
\begin{aligned}
\text{Maximize wrt } \mu, \nu: \quad & \mu' y_i / \nu' x_i \\
\text{Subject to} \quad & \mu' y_s / \nu' x_s \le 1, \quad s = 1,\dots,N \\
& \mu_m \ge 0, \quad m = 1,\dots,M \\
& \nu_k \ge 0, \quad k = 1,\dots,K.
\end{aligned}
\]

The optimization program seeks the optimal weights to maximize the 'efficiency' of firm i subject to the restriction that the efficiencies of all firms are less than or equal to one, and that all weights are nonnegative. Because the objective function is homogeneous of degree zero (any multiple of the weights produces the same solution), it is normalized with a restriction such as $\nu' x_i = 1$.

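Transforming and simplifying the problem a bit produces the equivalent program,

\[
\begin{aligned}
\text{Maximize wrt } \mu, \nu: \quad & \mu' y_i \\
\text{Subject to} \quad & \nu' x_i = 1 \\
& \mu' y_s - \nu' x_s \le 0, \quad s = 1,\dots,N \\
& \mu \ge 0 \\
& \nu \ge 0.
\end{aligned}
\]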

An equivalent form of the problem is the envelopment form (hence the name),

\[
\begin{aligned}
\text{Minimize wrt } \theta_i, \lambda: \quad & \theta_i \\
\text{Subject to} \quad & \sum_s \lambda_s y_s - y_i \ge 0 \\
& \theta_i x_i - \sum_s \lambda_s x_s \ge 0 \\
& \lambda_s \ge 0.
\end{aligned}
\]

The value of $\theta_i$ is the input oriented technical efficiency score for the ith firm,

\[ TE_{INPUT,i} = \theta_i. \]

It measures the extent to which the firm could reduce inputs to obtain the same output, relative to other firms in the sample. Note that the program is solved for each firm in the sample, so an efficiency score $\theta_i$ is generated for each firm. For some firms in the sample, the efficiency score will be 1.0. This indicates firms deemed to be technically efficient. Otherwise, $\theta_i < 1$.
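In the special case of one output, one input and constant returns to scale, the envelopment program can be solved by inspection: the cheapest way to produce firm i's output is to scale the firm with the largest output/input ratio, so the input oriented score is firm i's ratio divided by the largest ratio in the sample. The Python sketch below uses this shortcut. It is an illustration of the definition only (LIMDEP solves the general linear program), and the function name and data are hypothetical.

```python
def crs_input_efficiency(outputs, inputs):
    """Input oriented CRS scores for one output and one input per firm."""
    ratios = [y / x for y, x in zip(outputs, inputs)]
    best = max(ratios)                      # the firm(s) that define the frontier
    return [r / best for r in ratios]

# Hypothetical sample of three firms.
scores = crs_input_efficiency([2.0, 6.0, 5.0], [2.0, 4.0, 5.0])
print(scores)
```

Here the second firm defines the frontier and receives a score of 1.0. The other two could, by this standard, produce their output with two thirds of the input they actually use.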

The preceding formulation includes an implicit assumption of constant returns to scale (CRS). The assumption is relaxed to variable returns to scale (VRS) by adding a restriction

\[ \sum_s \lambda_s = 1. \]

Variable returns to scale is the standard assumption in contemporary applications. This provides a means by which the 'scale efficiency' of the firm can be measured. Let $\theta_i^C$ denote the technical efficiency measure obtained assuming constant returns and $\theta_i^V$ be the variable returns to scale counterpart. Then, the 'scale efficiency' may be measured by

\[ SE_i = \theta_i^C / \theta_i^V. \]

This can be computed using the results of the two different programs after computation. A 'nonincreasing returns to scale' (NRS) version of the program can be obtained by changing the adding up restriction to

\[ \sum_s \lambda_s \le 1. \]
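Since the scale efficiency measure is computed after the fact from the two sets of scores, a small helper is enough to obtain it. The Python sketch below is illustrative; the function name is hypothetical. It relies on the fact that the constant returns score cannot exceed the variable returns score, because the adding up restriction only shrinks the feasible set of the minimization.

```python
def scale_efficiency(theta_crs, theta_vrs):
    """Return SE = theta_crs / theta_vrs for one firm's input oriented scores."""
    if not (0.0 < theta_crs <= theta_vrs <= 1.0):
        raise ValueError("expected 0 < theta_crs <= theta_vrs <= 1")
    return theta_crs / theta_vrs

# Hypothetical scores for two firms.
se_small = scale_efficiency(0.6, 0.8)   # some inefficiency is due to scale
se_one = scale_efficiency(0.9, 0.9)     # operating at efficient scale
print(se_small, se_one)
```

A value of one indicates that the two frontiers coincide at the firm's scale of operation. With scores computed numerically, a small tolerance in the range check may be needed.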


An alternative view of the optimization process is to consider the extent to which outputs could conceivably be increased using the same inputs, again relative to the standard of other firms in the sample. The linear program which produces this solution is

\[
\begin{aligned}
\text{Maximize wrt } \phi_i, \lambda: \quad & \phi_i \\
\text{Subject to} \quad & \sum_s \lambda_s y_s - \phi_i y_i \ge 0 \\
& x_i - \sum_s \lambda_s x_s \ge 0 \\
& \lambda_s \ge 0.
\end{aligned}
\]

Once again, this assumes constant returns to scale. The variable returns to scale form is obtained by adding the constraint $\sum_s \lambda_s = 1$. In this solution, $1 \le \phi_i < \infty$. The technical efficiency measure is

\[ 0 < TE_{OUTPUT,i} = 1/\phi_i \le 1. \]

As before, some firms in the sample (the same firms) will be found to be technically efficient by this output oriented efficiency measure.

E65.2.2 Economic and Allocative Efficiency

With input price information, $w_i$, (and assuming cost minimization) a cost minimization program to find the optimal inputs given the input prices is

\[
\begin{aligned}
\text{Minimize wrt } x_i^*, \lambda: \quad & w_i' x_i^* \\
\text{Subject to} \quad & \sum_s \lambda_s y_s - y_i \ge 0 \\
& x_i^* - \sum_s \lambda_s x_s \ge 0 \\
& \lambda_s \ge 0.
\end{aligned}
\]

As before, to allow for variable returns to scale (VRS), we add $\sum_s \lambda_s = 1$. In this program, $x_i^*$ gives the cost minimizing vector of inputs for output $y_i$ and input prices $w_i$. The cost efficiency for the ith firm is then the ratio

\[ 0 < CE_i = w_i' x_i^* / w_i' x_i \le 1. \]

Allocative efficiency may be measured using

\[ 0 < AE_i = CE_i / TE_{INPUT,i} \le 1. \]
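Once the cost minimizing input vector is in hand, the cost and allocative efficiency measures are simple ratios. The Python sketch below computes both for a single firm. It is an illustration with hypothetical prices, inputs and scores; the cost minimizing inputs are taken as given here rather than computed by the linear program.

```python
def cost_and_allocative_efficiency(prices, inputs, optimal_inputs, te_input):
    """Return (CE, AE) for one firm.

    CE = minimum cost / observed cost, AE = CE / input oriented TE.
    """
    observed_cost = sum(w * x for w, x in zip(prices, inputs))
    minimum_cost = sum(w * x for w, x in zip(prices, optimal_inputs))
    ce = minimum_cost / observed_cost
    ae = ce / te_input
    return ce, ae

# Hypothetical firm: two inputs, observed cost 10, minimum cost 6, TE = .8.
ce, ae = cost_and_allocative_efficiency([1.0, 2.0], [4.0, 3.0], [2.0, 2.0], 0.8)
print(round(ce, 4), round(ae, 4))
```

In this example the firm's cost efficiency is .6, of which technical inefficiency accounts for a factor of .8 and the wrong input mix for the remaining factor of .75.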

E65.2.3 Solutions to the Optimization Problems

We note briefly the mathematical form of LIMDEP's solutions to the linear programs above. The programming problem is defined in terms of

Activity vector, $\gamma$ = the solution vector

Coefficient vector, $c$, so that the objective function is $c'\gamma$

Constraint matrix, $A$

Lower and upper limits for constraints, $b_L$ and $b_U$

Lower and upper limits for activities, $d_L$ and $d_U$

The linear program solution, in general, is then

\[
\begin{aligned}
\text{Optimize wrt } \gamma: \quad & c'\gamma \\
\text{Subject to} \quad & b_L \le A\gamma \le b_U \\
& d_L \le \gamma \le d_U.
\end{aligned}
\]

We will define the components for the three programs defined earlier. Note, first, for convenience, we define the data matrices, Y and X. Y is an N×M matrix of outputs whose ith row is the vector of outputs for firm i; X is the N×K matrix of inputs, defined likewise. For an individual firm, we define $y_i$ to be the M×1 column vector of outputs for firm i; thus, $y_i$ is the transpose of the ith row of Y. Likewise, $x_i$ is the column vector of K inputs for firm i, the transpose of the ith row of X. Finally, the column vector of weights is $\lambda = (\lambda_1,\dots,\lambda_N)'$. Thus,

\[ \sum_s \lambda_s y_s = Y'\lambda \quad \text{and} \quad \sum_s \lambda_s x_s = X'\lambda. \]

Finally, we note once again, the programs about to be defined are solved for each firm to obtain the efficiency scores. (In fact, $\lambda$ should be indexed by firm, since it is recomputed each time. For convenience, we have omitted this subscript.) We use the symbols $\infty_K$ and $\infty_M$ to indicate a vector each of whose elements equals infinity (or sometimes minus infinity) and boldface 1 or 0 to indicate a vector of ones or zeros with a subscript to indicate the number of elements. Finally, our tableaus include the VRS restriction, which may be suppressed by the user for the CRS form.

With all this in place, we can define the solutions to the optimization problems just by identifying the components of the linear programming problems. These are as follows:

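Input Oriented Technical Efficiency

\[
\gamma = \begin{bmatrix} \theta_i \\ \lambda \end{bmatrix}, \quad
d_L = \begin{bmatrix} 0 \\ \mathbf{0}_N \end{bmatrix}, \quad
c = \begin{bmatrix} 1 \\ \mathbf{0}_N \end{bmatrix}, \quad
d_U = \begin{bmatrix} 1 \\ \mathbf{1}_N \end{bmatrix},
\]
\[
b_L = \begin{bmatrix} -\infty_K \\ y_i \\ 1 \end{bmatrix}, \quad
A = \begin{bmatrix} -x_i & X' \\ \mathbf{0}_M & Y' \\ 0 & \mathbf{1}_N' \end{bmatrix}, \quad
b_U = \begin{bmatrix} \mathbf{0}_K \\ \infty_M \\ 1 \end{bmatrix}.
\]

Output Oriented Technical Efficiency

\[
\gamma = \begin{bmatrix} \phi_i \\ \lambda \end{bmatrix}, \quad
d_L = \begin{bmatrix} 1 \\ \mathbf{0}_N \end{bmatrix}, \quad
c = \begin{bmatrix} 1 \\ \mathbf{0}_N \end{bmatrix}, \quad
d_U = \begin{bmatrix} \infty \\ \mathbf{1}_N \end{bmatrix},
\]
\[
b_L = \begin{bmatrix} -\infty_K \\ \mathbf{0}_M \\ 1 \end{bmatrix}, \quad
A = \begin{bmatrix} \mathbf{0}_K & X' \\ -y_i & Y' \\ 0 & \mathbf{1}_N' \end{bmatrix}, \quad
b_U = \begin{bmatrix} x_i \\ \infty_M \\ 1 \end{bmatrix}.
\]

Allocative Efficiency

\[
\gamma = \begin{bmatrix} x_i^* \\ \lambda \end{bmatrix}, \quad
d_L = \begin{bmatrix} \mathbf{0}_K \\ \mathbf{0}_N \end{bmatrix}, \quad
c = \begin{bmatrix} w_i \\ \mathbf{0}_N \end{bmatrix}, \quad
d_U = \begin{bmatrix} \infty_K \\ \mathbf{1}_N \end{bmatrix},
\]
\[
b_L = \begin{bmatrix} -\infty_K \\ y_i \\ 1 \end{bmatrix}, \quad
A = \begin{bmatrix} -I_K & X' \\ \mathbf{0}_{M \times K} & Y' \\ \mathbf{0}_K' & \mathbf{1}_N' \end{bmatrix}, \quad
b_U = \begin{bmatrix} \mathbf{0}_K \\ \infty_M \\ 1 \end{bmatrix}.
\]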
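As a concrete illustration of how such a tableau is assembled, the Python sketch below builds the components for the input oriented program with the variable returns to scale restriction for one firm. The activities are the efficiency score followed by the N firm weights; the K input constraints, the M output constraints and the adding up restriction follow in that order. This row ordering and the function name are our assumptions for the example, not a description of LIMDEP's internal storage, and the data are hypothetical.

```python
import math

def input_oriented_tableau(Y, X, i):
    """Build (c, A, b_lower, b_upper, d_lower, d_upper) for firm i.

    Y is N x M (outputs), X is N x K (inputs), both as lists of rows.
    Activities are [theta, lambda_1, ..., lambda_N].
    """
    N, M, K = len(Y), len(Y[0]), len(X[0])
    c = [1.0] + [0.0] * N                  # objective: minimize theta
    d_lower = [0.0] * (N + 1)
    d_upper = [1.0] * (N + 1)
    A, b_lower, b_upper = [], [], []
    for k in range(K):                     # sum_s lambda_s x_sk - theta x_ik <= 0
        A.append([-X[i][k]] + [X[s][k] for s in range(N)])
        b_lower.append(-math.inf)
        b_upper.append(0.0)
    for m in range(M):                     # sum_s lambda_s y_sm >= y_im
        A.append([0.0] + [Y[s][m] for s in range(N)])
        b_lower.append(Y[i][m])
        b_upper.append(math.inf)
    A.append([0.0] + [1.0] * N)            # VRS: sum_s lambda_s = 1
    b_lower.append(1.0)
    b_upper.append(1.0)
    return c, A, b_lower, b_upper, d_lower, d_upper

# Hypothetical data: three firms, one output, two inputs.
Y = [[2.0], [6.0], [5.0]]
X = [[2.0, 1.0], [4.0, 2.0], [5.0, 3.0]]
c, A, b_lower, b_upper, d_lower, d_upper = input_oriented_tableau(Y, X, 0)
```

The resulting constraint matrix has K+M+1 rows and N+1 columns. Dropping the last row gives the constant returns to scale form; the arrays can then be passed to any linear programming routine that accepts two sided constraints.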

One final note: DEA requires a fair amount of computation. The linear program involves M+K+1 constraints and N+1 activities, and it is computed once for each of the N firms in the sample. The amount of computation increases with the square of N. The particular computations are quite fast, however.

E65.3 Confidence Limits for Efficiency Scores

A major shortcoming of the DEA approach to modeling production is the absence of a statistical underpinning. One approach that has been used to try to produce some statistical characterization of the estimator is to use bootstrapping to obtain confidence limits for the estimated efficiency scores. A popular method used is that of Simar and Wilson (1998). In brief, their method amounts to the following: We have in hand for each firm a $\theta_i$ estimated using the linear program defined above. To carry out the bootstrap, we use the following experiment. The data on $x_m$ for all firms, including this one, are proportionally scaled using a randomly generated (see their paper for the algorithm) scale factor, $\theta_i/\theta_{mb}$ for replication b. Then, $\theta_{i,b}$ is recomputed using the revised data, with the same method. The experiment is repeated B times. The 5th and 95th percentiles of the B observations provide the confidence limits. The B replications are carried out for each firm. To obtain bootstrapped confidence limits, use the command syntax described below, with the simple addition of the request for the number of bootstrap replications.
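The last step of the procedure, reading the confidence limits off the B recomputed scores, can be sketched in a few lines of Python. The sketch below covers only this percentile step; it does not implement the Simar and Wilson resampling algorithm. It assumes the nearest rank definition of a percentile, which is one of several conventions and may differ from the one used by the program, and the function name and replicate values are hypothetical.

```python
def percentile_limits(draws, lower=5, upper=95):
    """Return (lower, upper) percentile limits of the bootstrap draws.

    Uses the nearest rank rule: the p-th percentile is the value at
    position ceil(p * B / 100) in the sorted sample (1-based).
    """
    ordered = sorted(draws)
    n = len(ordered)

    def nearest_rank(p):
        rank = max(1, (p * n + 99) // 100)   # integer ceiling of p * n / 100
        return ordered[rank - 1]

    return nearest_rank(lower), nearest_rank(upper)

# Hypothetical replicates: 100 recomputed scores .01, .02, ..., 1.00.
draws = [k / 100 for k in range(1, 101)]
limits = percentile_limits(draws)
print(limits)
```

With B = 50 replications, this rule selects the 3rd and 48th smallest of the recomputed scores for each firm.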

It should be noted that bootstrapping adds considerably to the amount of computation. In general, the analysis requires the computation of 2N linear programs, two for each firm, to compute the input and output oriented efficiency scores, plus one more for each firm if input prices are supplied for the allocative efficiency computation. Bootstrapping adds B×N more programs. Each program involves N+1 activities and K+M+1 constraints, so overall, the amount of computation is considerable. Nonetheless, each component of each linear program is very fast. In the example below, we have 123 observations. We requested 50 bootstrap replications, so we computed altogether 53 × 123 = 6,519 programs, each with 123 activities. The LP computations plus all the ancillary computations and the display took altogether only 3.84 seconds on our desktop computer.
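The count of linear programs follows directly from the description above: two programs per firm, one more per firm when input prices are supplied, and one per firm for each bootstrap replication. The Python sketch below reproduces the arithmetic for the example; the function name is hypothetical.

```python
def dea_program_count(n_firms, n_boot=0, has_prices=False):
    """Number of linear programs solved for a DEA run."""
    per_firm = 2 + (1 if has_prices else 0) + n_boot
    return per_firm * n_firms

# The example: 123 firms, input prices supplied, 50 bootstrap replications.
total = dea_program_count(123, n_boot=50, has_prices=True)
print(total)
```

For the example this gives 53 programs per firm and 6,519 in total. Without bootstrapping, the same data require only 369.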


E65.4 Command Structure

The command for the data envelopment analysis routine is simply

FRONTIER ; Lhs = output variables

; Rhs = input variables (will never include one)

; Alg = DEA $

The following is the full list of specifications for this command.

The default specification uses the variable returns to scale form. If you wish to use the

constant returns to scale form, add

; CRS

to the command. The nonincreasing returns to scale form ($\sum_i \lambda_i \le 1$) is requested with

; NRS

If you wish to analyze input price data, add

; Rh2 = input price variables

The program computes the DEA efficiency scores (input and output oriented, and economic

efficiency), and stores them as variables and as matrices. (See the description in the next section.) If

you would like to see a listing of the scores on your screen, in the output window, add

; List

to the command. The list of 'peer' firms for each observation (see Section E65.5.1 below) may be

requested by adding

; Peers

to the command. Finally, to obtain bootstrapped confidence limits for the estimator, add

; Nbt = the desired number of replications


E65.5 DEA Results

This estimator by default computes both the input and output oriented technical efficiency

scores. Descriptive statistics for the results are the visible output from the estimator. The following

shows an example, using the sample of 1,482 observations on Spanish dairy farms that was

examined in Section E64. This is a one output, four input process.

FRONTIER ; Lhs = milk

; Rhs = cows,land,labor,feed

; Alg = DEA $

+---------------------------------------------------------------------------+

| Data Envelopment Analysis |

| Output Variables: MILK |

| Input Variables: COWS LAND LABOR FEED |

| Underlying Technology assumes VARIABLE Returns to Scale. |

+---------------------------------------------------------------------------+

| Estimated Efficiencies: Mean Std.Deviation Minimum Maximum |

| Technical Efficiency ======= ============= ======= ======= |

| Input Oriented .8301 .1416 .4823 1.0000 |

| Output Oriented .7388 .1268 .3875 1.0000 |

| Sample Size: 1482 Observations. 1482 Complete observations |

| Efficiencies saved as variables DEAEFF_O, DEAEFF_I and DEAEFF_E |

| Efficiencies saved as matrices DEA_EFFO, DEA_EFFI and DEA_EFFE |

| Incomplete observations are filled with zeros for efficiency values. |

+---------------------------------------------------------------------------+

As noted, the computed efficiency scores are saved in two places. In the data area, they are stored as
the variables deaeff_i and deaeff_o, and also as deaeff_e if you provide input prices for the economic
efficiency analysis. The same results are saved as the matrices dea_effi, dea_effo and dea_effe. In both
cases, the estimator bypasses missing and bad (nonpositive) data. If any of the variables used in the
analysis is missing, the observation is assigned an efficiency score of 0.0. The matrices have row
dimension equal to the original sample size, before the bypass of missing values.
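
Because bypassed observations carry a placeholder score of 0.0, any summary computed from the saved scores outside the program should drop those zeros first. The following is a minimal Python sketch of that step, not part of LIMDEP. It assumes the scores have been exported to a plain list; the function name and the data values are hypothetical.

```python
from statistics import mean

def valid_scores(scores):
    """Drop the 0.0 placeholders assigned to incomplete observations."""
    return [s for s in scores if s > 0.0]

# Hypothetical exported scores; 0.0 marks a bypassed observation.
deaeff_i = [1.0, 0.0, 0.8, 0.6, 0.0]
kept = valid_scores(deaeff_i)
print(len(kept), "complete observations, mean score", round(mean(kept), 4))
```

A genuine DEA score is strictly positive, so filtering on zero does not discard any computed result.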

The example below includes a listing of the efficiency scores. The observation identifier

shows I = the sequence number of the observation used in the analysis. The R = value shows,

instead, the actual location of the observation in the raw data set. I will not equal R if you have used

a subset of the data (e.g., with SAMPLE or REJECT), or if the program has bypassed missing data

– the listing will only show the complete observations. If you have included observation labels, e.g.,

firm names, in your data set, these observation and row identifiers will be replaced with the

observation names for your data set.

For a second example, the following analyzes the Christensen and Greene (1976) electricity

generation data. For these data, we have the input prices, so we do the full analysis.

FRONTIER ; Alg = DEA ; List ; Nbt = 50

; Lhs = output

; Rhs = labor,capital,fuel

; Rh2 = lprice,cprice,fprice $


+---------------------------------------------------------------------------+


| Output Variables: OUTPUT |

| Input Variables: LABOR CAPITAL FUEL |

| Price Variables: LPRICE CPRICE FPRICE |


+---------------------------------------------------------------------------+



| Input Oriented .7692 .1390 .3464 1.0000 |

| Output Oriented .7657 .1467 .2960 1.0000 |

| Economic Efficiency .4331 .1965 .1411 1.0000 |

| Allocative Effic. .5473 .1754 .1796 1.0000 |





| Compute allocative efficiency as technical divided by economic efficiency |

+---------------------------------------------------------------------------+

Estimated Efficiency Values for Individual Decision Making Units

(Results are listed only for complete observations)

===============================================================================

Observation | Input Oriented| Output Oriented| Economic | Allocative

Sample Data | Rank Value| Rank Value| Rank Value| Rank Value

================+===============+================+===============+=============

I= 1 R= 1| 1 1.00000| 1 1.00000| 1 1.00000| 1 1.00000

I= 2 R= 2| 13 .98446| 16 .92501| 53 .43644| 87 .44333

I= 3 R= 3| 16 .96243| 28 .88393| 119 .17287| 123 .17962

I= 4 R= 4| 46 .79469| 83 .73593| 96 .29127| 103 .36652

I= 5 R= 5| 115 .57426| 118 .44224| 47 .44703| 15 .77845

I= 6 R= 6| 120 .44307| 122 .35608| 103 .26194| 43 .59120

I= 7 R= 7| 80 .73356| 100 .64826| 101 .26996| 102 .36801

I= 8 R= 8| 123 .34637| 123 .29601| 121 .15388| 85 .44425

I= 9 R= 9| 106 .62517| 110 .57829| 109 .21689| 111 .34692

I= 10 R= 10| 103 .63852| 107 .59578| 66 .38812| 39 .60783

(Remaining observations are omitted.)
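
The values in the listing are consistent with allocative efficiency being the ratio of economic efficiency to input oriented technical efficiency. The short Python check below, written outside LIMDEP, reproduces the printed allocative values for observations 2, 4 and 5 from the other two columns. The function name is hypothetical; the numbers are copied from the listing above.

```python
def allocative_efficiency(economic, technical_input):
    """Ratio of economic to input oriented technical efficiency."""
    if technical_input <= 0.0:
        raise ValueError("technical efficiency must be positive")
    return economic / technical_input

# (input oriented technical, economic, allocative) as printed for I = 2, 4, 5
listed = [
    (0.98446, 0.43644, 0.44333),
    (0.79469, 0.29127, 0.36652),
    (0.57426, 0.44703, 0.77845),
]
for te, ee, ae in listed:
    print(round(allocative_efficiency(ee, te), 5), "listed:", ae)
```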

----------------------------------------------------------------------------

Results of Bootstrap analysis of technical efficiency. 50 replications

----------------------------------------------------------------------------

Technical Estimated Corrected Standard Confid. Limits

Observation_____ Efficiency Bias Tech.Eff. Deviation Lower Upper

I= 1 R= 1 1.0000 .0000 1.0000 .0000 1.0000 1.0000

I= 2 R= 2 .9845 -.0634 1.0479 .1008 .6583 1.0000

I= 3 R= 3 .9624 -.0898 1.0522 .1391 .5023 1.0000

I= 4 R= 4 .7947 .1091 .6856 .0953 .7222 1.0000

I= 5 R= 5 .5743 .3006 .2737 .1215 .6007 1.0000

I= 6 R= 6 .4431 .4318 .0113 .1246 .5785 1.0000

I= 7 R= 7 .7336 .1086 .6250 .1131 .6609 1.0000

I= 8 R= 8 .3464 .5317 -.1853 .0979 .6977 1.0000

I= 9 R= 9 .6252 .2154 .4097 .1265 .5131 1.0000

I= 10 R= 10 .6385 .2267 .4118 .1062 .6645 1.0000
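
In the bootstrap table, the corrected technical efficiency equals the estimated efficiency minus the estimated bias. The Python sketch below, written outside LIMDEP, checks this against three printed rows. The function name is hypothetical. As the rows show, the corrected value is not constrained to the unit interval.

```python
def bias_corrected(estimate, bias):
    """Bias-corrected efficiency: the estimate minus the bootstrap bias."""
    return estimate - bias

# (efficiency, estimated bias, corrected) as printed for I = 2, 4, 8
rows = [
    (0.9845, -0.0634, 1.0479),
    (0.7947, 0.1091, 0.6856),
    (0.3464, 0.5317, -0.1853),
]
for eff, bias, corrected in rows:
    print(round(bias_corrected(eff, bias), 4), "listed:", corrected)
```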


It is always interesting to compare the DEA results with those obtained using the stochastic

frontier model. The following fits a translog stochastic frontier production function for the

Christensen and Greene data, computes the technical efficiencies, and plots them against the DEA

efficiency scores. As has been widely documented, the results are not so close to each other as one

might hope.

FRONTIER ; Lhs = logq

; Rhs = one,logcap,loglabor,logfuel,

loglsq,logksq,logfsq,logklogl,logklogf,logllogf

; Techeff = tesf $

PLOT ; Lhs = tesf ; Rhs = deaeff_i

; Grid ; Title = DEA Efficiencies vs. Stochastic Frontier JLMS $

Figure E65.1 Comparison of SFA and DEA Efficiency Estimates

E65.5.1 Analysis of Peers

Part of the solution for the technical efficiency of the ith firm is the set of activity multipliers, λi,m.
This vector of N values gives the weights that produce the point on the efficient frontier for the firm.
The firms with nonzero values of λi,m (there will typically be only a few, or just one) define the
'peers' for firm i. The listing of the peer firms can be requested by adding ; Peers to the command.
The first few observations for the sample above are shown below.

===============================================================================

Peers - By Firm

===============================================================================

Firm Orient. TechEff Peers

--------------------- ------- ------- --------------------------------------

1 Inputs 1.00000 3 14 101

Outputs 1.00000 1 14 101

2 Inputs .98446 4 71

Outputs .92501 1 71

3 Inputs .96243 3 71

Outputs .88393 1 71

4 Inputs .79469 4 14

Outputs .73593 1 14

5 Inputs .57426 4 71 118

Outputs .44224 1 71
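
The peer definition can be illustrated with a few lines of Python. This is a sketch under stated assumptions, not the program's internal code: the multiplier vector is made up, the function name is hypothetical, and a small tolerance stands in for an exact test of "nonzero."

```python
def peers(weights, tol=1e-9):
    """Return 1-based indices of firms whose activity multiplier is nonzero."""
    return [j + 1 for j, w in enumerate(weights) if abs(w) > tol]

# Hypothetical multipliers for one firm in a six-firm sample.
lam = [0.0, 0.35, 0.0, 0.0, 0.0, 0.65]
print(peers(lam))  # [2, 6]
```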


E65.5.2 Application

The following uses all the features of the routine save for the Malmquist TFP computation

and the allocative efficiency routine. The sample data are in an Excel spreadsheet:

IMPORT ; File = … testdea.csv $

FRONTIER ; Lhs = cameras,video,warranty

; Rhs = floor,staff

; Alg = DEA ; CRS

; Peers

; Nbt = 50 $

Figure E65.2 Sample Data for Data Envelopment Analysis

+---------------------------------------------------------------------------+


| Output Variables: CAMERAS VIDEO WARRANTY |

| Input Variables: FLOOR STAFF |

| Underlying Technology assumes CONSTANT Returns to Scale. |

+---------------------------------------------------------------------------+



| Input Oriented .9132 .1270 .6387 1.0000 |

| Output Oriented .9132 .1270 .6387 1.0000 |





+---------------------------------------------------------------------------+


Estimated Efficiency Values for Individual Decision Making Units

===============================================================================

Observation | Input Oriented| Output Oriented| Economic | Allocative

Sample Data | Rank Value| Rank Value| Rank Value| Rank Value

================+===============+================+===============+=============

Bury | 9 .79126| 9 .79126| 0 .00000| 0 .00000

London | 1 1.00000| 1 1.00000| 0 .00000| 0 .00000

Glasgow | 7 .95227| 7 .95227| 0 .00000| 0 .00000

Bath | 1 1.00000| 1 1.00000| 0 .00000| 0 .00000

Chippenham | 11 .63869| 11 .63869| 0 .00000| 0 .00000

Liverpool | 1 1.00000| 1 1.00000| 0 .00000| 0 .00000

Tunbridge | 8 .90635| 8 .90635| 0 .00000| 0 .00000

Leicester | 1 1.00000| 1 1.00000| 0 .00000| 0 .00000

Malmesbury | 1 1.00000| 1 1.00000| 0 .00000| 0 .00000

Kendal | 10 .75714| 10 .75714| 0 .00000| 0 .00000

Bristol | 1 1.00000| 1 1.00000| 0 .00000| 0 .00000

===============================================================================

Peers - By Firm

Firm Orient. TechEff Peers

--------------------- ------- ------- --------------------------------------

1 Bury Inputs .79126 6 11

Outputs .79126 6 11

2 London Inputs 1.00000 2

Outputs 1.00000 2

3 Glasgow Inputs .95227 2 6 11

Outputs .95227 2 6 11

4 Bath Inputs 1.00000 2 4 8 9

Outputs 1.00000 2 4

5 Chippenham Inputs .63869 6 11

Outputs .63869 6 11

6 Liverpool Inputs 1.00000 6 11

Outputs 1.00000 6

7 Tunbridge Inputs .90635 4 8 9

Outputs .90635 4 8 9

8 Leicester Inputs 1.00000 2 8 9

Outputs 1.00000 2 8

9 Malmesbury Inputs 1.00000 4 6 9

Outputs 1.00000 2 6 9

10 Kendal Inputs .75714 2 4

Outputs .75714 2 4

11 Bristol Inputs 1.00000 2 11

Outputs 1.00000 2 11

===============================================================================

----------------------------------------------------------------------------

Results of Bootstrap analysis of technical efficiency. 50 replications

----------------------------------------------------------------------------

Technical Estimated Corrected Standard Confid. Limits

Observation_____ Efficiency Bias Tech.Eff. Deviation Lower Upper

Bury .7913 .0404 .7509 .0374 .7931 .9074

London 1.0000 .0000 1.0000 .0000 1.0000 1.0000

Glasgow .9523 .0353 .9170 .0143 .9570 1.0000

Bath 1.0000 .0000 1.0000 .0000 1.0000 1.0000

Chippenham .6387 .0392 .5995 .0309 .6411 .7293

Liverpool 1.0000 .0000 1.0000 .0000 1.0000 1.0000

Tunbridge .9064 .0630 .8433 .0333 .9138 1.0000

Leicester 1.0000 .0000 1.0000 .0000 1.0000 1.0000

Malmesbury 1.0000 .0000 1.0000 .0000 1.0000 1.0000

Kendal .7571 .0389 .7183 .0551 .7614 .9307

Bristol 1.0000 .0000 1.0000 .0000 1.0000 1.0000

----------------------------------------------------------------------------
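
The input and output oriented scores coincide in this application because the technology is constant returns to scale. The simplest case shows why: with one input and one output, the CRS score under either orientation is the firm's output/input ratio relative to the best ratio in the sample. The Python sketch below covers only that one-input, one-output case, with made-up data and a hypothetical function name. The application above has several inputs and outputs and requires the linear programming solution.

```python
def crs_efficiency(inputs, outputs):
    """CRS efficiency for one input and one output: ratio relative to the best ratio."""
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)
    return [r / best for r in ratios]

# Hypothetical data for three firms.
x = [2.0, 4.0, 5.0]
y = [2.0, 3.0, 5.0]
print(crs_efficiency(x, y))  # [1.0, 0.75, 1.0]
```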


E65.6 Comparing Efficiency Values and Rankings – SFA vs. DEA

In many settings, the efficiency ratings themselves are less interesting than the ranks of the
observations. The WHO study used in numerous examples throughout this chapter is a case in point:
the objective of the efficiency analysis was to rank the countries in terms of their measured
efficiency. A perennial question in the efficiency analysis literature is whether the two methodologies
produce the same qualitative results. We return to the WHO data to provide an illustration.

The data used are the country means of the output, dale, and two inputs, health expenditure, hexp,
and education, educ. After the raw data are input, we use the following commands:

SAMPLE ; All $

REJECT ; Small > 0 $

CREATE ; dalebar = Group Mean(dale, Str = country) $

CREATE ; hexpbar = Group Mean(hexp, Str = country) $

CREATE ; educbar = Group Mean(educ, Str = country) $

REJECT ; year # 1997 $

CREATE ; logdbar = Log(dalebar) $

CREATE ; loghbar = Log(hexpbar) $

CREATE ; logebar = Log(educbar) $

FRONTIER ; Lhs = logdbar ; Rhs = one,loghbar,logebar ; Techeff = effsfa $

FRONTIER ; Lhs = dalebar ; Rhs = hexpbar,educbar ; Alg = DEA $

DSTAT ; Rhs = effsfa,deaeff_i,deaeff_o ; Output = 2 $

PLOT ; Lhs = effsfa ; Rhs = deaeff_i ; Grid

; Title = SFA Efficiencies vs. DEA Input Efficiencies $

PLOT ; Lhs = effsfa ; Rhs = deaeff_o ; Limits=.4,1.1 ; Grid

; Title = SFA Efficiencies vs. DEA Output Efficiencies $

CREATE ; sfarank = Rnk(effsfa) $

CREATE ; dearanki = Rnk(deaeff_i) $

CREATE ; dearanko = Rnk(deaeff_o) $

CALC ; List ; Rkc(sfarank,dearanki)

; Rkc(sfarank,dearanko)

; Rkc(dearanki,dearanko) $

PLOT ; Lhs = sfarank ; Rhs = dearanki

; Endpoints = 0,200 ; Limits = 0,200 ; Grid

; Title = Ranks of SFA Efficiencies vs. DEA Input Efficiencies $

PLOT ; Lhs = sfarank ; Rhs = dearanko

; Endpoints = 0,200 ; Limits = 0,200 ; Grid

; Title = Ranks of SFA Efficiencies vs. DEA Output Efficiencies $



-----------------------------------------------------------------------------


Dependent variable LOGDBAR



Inf.Cr.AIC = -256.8 AIC/N = -1.344



Sigma(v) = .03744

Sigma(u) = .20989

Sigma = Sqr[(s^2(u)+s^2(v)]= .21320


Var[u]/{Var[u]+Var[v]} = .91947









Kodde-Palm C*: 95%: 2.706, 99%: 5.412

--------+--------------------------------------------------------------------


LOGDBAR| Coefficient Error z |z|>Z* Interval

--------+--------------------------------------------------------------------


Constant| 3.57889*** .04980 71.87 .0000 3.48129 3.67649

LOGHBAR| .06480*** .00824 7.86 .0000 .04864 .08096

LOGEBAR| .15292*** .01852 8.26 .0000 .11662 .18923


Lambda| 5.60534*** 1.46657 3.82 .0001 2.73091 8.47977

Sigma| .21320*** .00101 211.97 .0000 .21123 .21517

--------+--------------------------------------------------------------------


-----------------------------------------------------------------------------

+---------------------------------------------------------------------------+


| Output Variables: DALEBAR |

| Input Variables: HEXPBAR EDUCBAR |


+---------------------------------------------------------------------------+



| Input Oriented .6138 .2089 .2059 1.0000 |

| Output Oriented .8794 .1124 .5061 1.0000 |





+---------------------------------------------------------------------------+

DSTAT ; Rhs = effsfa,deaeff_i,deaeff_o ; Output = 2 $



--------+---------------------------------------------------------------------


--------+---------------------------------------------------------------------

EFFSFA| .882053 .059219 .801579 .982272 191 0

DEAEFF_I| .613836 .208905 .205870 1.0 191 0

DEAEFF_O| .879363 .112447 .506133 1.0 191 0

--------+---------------------------------------------------------------------

--------+--------------------------

Cor.Mat.| EFFSFA DEAEFF_I DEAEFF_O

--------+--------------------------

EFFSFA| 1.00000 .70610 .75911

DEAEFF_I| .70610 1.00000 .72559

DEAEFF_O| .75911 .72559 1.00000

Figure E65.3 Plot of SFA Efficiency Values vs. DEA Values


Figure E65.4 Plot of Ranks of SFA Efficiency Scores vs. Ranks of DEA Scores
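
The rank comparison can be sketched in Python for readers who want to reproduce it outside the program. The code computes Spearman's rank correlation from the difference-of-ranks formula, which assumes no ties. DEA scores usually include several values of exactly 1.0, so real data would need a tie-aware method. The function names and data are hypothetical, and nothing here is claimed about how Rnk or Rkc are implemented.

```python
def ranks(values):
    """Rank 1 = smallest value; assumes all values are distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def rank_correlation(a, b):
    """Spearman rank correlation, 1 - 6*sum(d^2) / (n(n^2 - 1)), valid without ties."""
    n = len(a)
    d2 = sum((p - q) ** 2 for p, q in zip(ranks(a), ranks(b)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

# Hypothetical efficiency scores for four units.
sfa = [0.90, 0.85, 0.95, 0.80]
dea = [0.70, 0.75, 0.98, 0.40]
print(rank_correlation(sfa, dea))
```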


E65.7 Malmquist Index of Total Factor Productivity

Once again, the user is referred to the relevant literature, such as the numerous papers by Fare
and Grosskopf, for background details. Fare's 1994 output based Malmquist productivity change index
may be written

\[
M_i^{O}(t,t+1) = \sqrt{\frac{TE_i(t+1\,|\,t)\times TE_i(t+1\,|\,t+1)}{TE_i(t\,|\,t)\times TE_i(t\,|\,t+1)}}
\]

where TE(r|s) indicates the earlier defined output oriented technical efficiency index for firm i, using

inputs xi,r and producing outputs yi,r relative to production (and input usage) for firms based in period s.

This index is computed using the following program:

\[
\max_{\theta_{ir},\,\boldsymbol{\lambda}}\ \theta_{ir}
\quad\text{subject to}\quad
\mathbf{X}_s\boldsymbol{\lambda} \le \mathbf{x}_{i,r},\qquad
\mathbf{Y}_s\boldsymbol{\lambda} - \theta_{ir}\,\mathbf{y}_{i,r} \ge \mathbf{0}_M,\qquad
\boldsymbol{\lambda} \ge \mathbf{0}_N .
\]

This uses the constant returns to scale form. Also, since the period r output and input vectors for firm i
will not appear in Ys and Xs when r does not equal s, θir need not be larger than one. Note that this
requires solution of four linear programs for each firm in each period, so the total number of programs to
solve will be 4NT. Each is quite fast, so overall, the computations do not take long. In the sample of
247 firms and six periods, the nearly 6,000 programs (4 × 247 × 6 = 5,928), each involving 248 activities
and six constraints, took about 10 seconds.

These computations are carried out for each firm in each period save the last one, and produce an
N×T matrix of TFP values, one row for each firm, one column for each period. The TFP value for the last
period is recorded as 1.0, though this is just a space filler.
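
Given the four technical efficiency values for a firm, the index is a short calculation. The Python sketch below assumes the geometric mean (square root) form of the Fare et al. (1994) index. The function and argument names are hypothetical, the input values are made up, and this is not the program's internal code.

```python
from math import sqrt

def malmquist(te_next_given_t, te_next_given_next, te_t_given_t, te_t_given_next):
    """Output based Malmquist index between periods t and t+1.

    te_a_given_b is TE(a|b): the efficiency of the period a observation
    measured against the period b frontier.
    """
    numerator = te_next_given_t * te_next_given_next
    denominator = te_t_given_t * te_t_given_next
    return sqrt(numerator / denominator)

# Hypothetical values; cross-period scores such as TE(t+1|t) may exceed one.
print(round(malmquist(1.10, 0.90, 0.80, 0.75), 4))
```

A value above one indicates productivity growth between the two periods; a value below one indicates decline.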

To compute the Malmquist TFP indices, you will require a panel of data, at least two periods, for

each of N firms. Unlike other panel data routines in LIMDEP, this computation always requires a

balanced panel. Every firm must be observed in the same T periods. Also, this routine has no procedures

for avoiding missing or invalid data such as zero values for inputs or outputs. The balanced panel must be

'clean' before computation begins. To request the computations, just add

; Pds = t, the fixed number of periods.

Nothing else need be changed. There is no bootstrap feature (; Nbt = 0); the computations assume

constant returns to scale (; CRS is the default and cannot be changed) and no allocative efficiency (; Rh2

is ignored).
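
Since the routine does not screen the data, it can help to verify the panel before running the command. The following Python sketch shows one way to do that outside LIMDEP. It assumes the data have been exported as (firm, period, values) records with periods coded 1, ..., T. The function name, record layout and sample rows are all hypothetical.

```python
from collections import defaultdict

def check_balanced_panel(rows, n_periods):
    """Return a list of problems; an empty list means the panel is balanced and clean."""
    periods = defaultdict(set)
    problems = []
    for firm, period, values in rows:
        if period in periods[firm]:
            problems.append(f"firm {firm}: duplicate period {period}")
        periods[firm].add(period)
        # Missing (None) or nonpositive inputs/outputs are not allowed.
        if any(v is None or v <= 0 for v in values):
            problems.append(f"firm {firm}, period {period}: missing or nonpositive value")
    expected = set(range(1, n_periods + 1))
    for firm, seen in periods.items():
        if seen != expected:
            problems.append(f"firm {firm}: observed periods {sorted(seen)}, expected 1..{n_periods}")
    return problems

# Hypothetical two-period panel: firm 2 lacks period 2 and has a zero input.
sample = [
    (1, 1, [10.0, 2.0]), (1, 2, [11.0, 2.5]),
    (2, 1, [8.0, 0.0]),
]
for message in check_balanced_panel(sample, n_periods=2):
    print(message)
```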


Malmquist TFP Index Application

To illustrate the Malmquist computations, we reexamine the sample of 247 Spanish dairy farms

observed for six years. The output is milk production. Inputs are cows, land, labor and feed.

FRONTIER ; Lhs = milk

; Rhs = cows,land,labor,feed

; Alg = DEA ; Pds = 6

; List $

The following results are displayed. In addition, a matrix containing the full table, named malmqist, is
created.

==============================================================================

Malmquist TFP Index for Productivity Change

Panel contained 247 firms each observed in 6 periods

Full Results saved as matrix MALMQIST

==============================================================================

Average results across firms, by period:

==============================================================================

Period: 1 2 3 4 5

TFP 1.0476 1.0233 1.0247 1.0298 1.0349

==============================================================================

Individual calculations by firm

(Only 8 periods can be displayed. TFP for the final period is not computed.)

==============================================================================

Observation 1 2 3 4 5 6 7 8

Firm = 1 1.1301 1.1002 .9736 1.0291 1.0901 1.

Firm = 2 1.0528 1.0343 1.0212 1.0109 1.0416 1.

Firm = 3 1.0525 1.0383 .9477 1.0465 1.0395 1.

Firm = 4 1.1418 1.0129 1.0079 .9829 1.0476 1.

Firm = 5 1.1192 1.0240 1.0082 1.0245 1.0641 1.

Firm = 6 .9871 1.0073 .9785 1.0322 1.0464 1.

Firm = 7 .9851 1.1484 1.1599 .8054 1.1110 1.

Firm = 8 1.0746 .9796 .9636 1.0671 .9753 1.

Firm = 9 .8977 1.1496 .9818 1.0500 .9867 1.

Firm = 10 1.0105 1.1507 .9751 1.0055 1.0469 1.

Firm = 11 1.1276 .9867 .9636 1.0826 .9873 1.

Firm = 12 1.0310 1.1020 .9822 1.0438 .9914 1.

Firm = 13 1.0549 1.1263 .9221 1.0723 1.1945 1.

Firm = 14 .9408 1.0740 .9938 .9739 1.0336 1.

Firm = 15 .8952 .7156 1.5056 .8614 .9204 1.

(Rows 16 – 247 omitted).
