

Journal of Statistical Computation and Simulation, Vol. 75, No. 4, April 2005, 263-286

Least absolute value regression: recent contributions

TERRY E. DIELMAN

M. J. Neeley School of Business, TCU, P.O. Box 298530, Fort Worth, TX 76129, USA

(Revised 7 October 2003; in final form 21 March 2004)

This article provides a review of research involving least absolute value (LAV) regression. The review is concentrated primarily on research published since the survey article by Dielman (Dielman, T. E. (1984). Least absolute value estimation in regression models: An annotated bibliography. Communications in Statistics - Theory and Methods, 4, 513-541.) and includes articles on LAV estimation as applied to linear and non-linear regression models and in systems of equations. Some topics included are computation of LAV estimates, properties of LAV estimators and inferences in LAV regression. In addition, recent work in some areas related to LAV regression will be discussed.

Keywords: Linear regression models; Non-linear regression models; Systems of equations; L1-norm regression; Minimum absolute deviation regression; Least absolute deviation regression; Minimum sum of absolute errors regression

1. Introduction

This article provides a review of research on least absolute value (LAV) regression. It includes articles on LAV estimation as applied to linear and non-linear regression models and in systems of equations. Some references to the LAV method as applied in approximation theory are also included. In addition, recent work in areas related to LAV regression will be discussed. I have attempted to include major contributions to LAV regression not included in Dielman [1]. My apologies in advance for any omissions.

Additional survey articles on LAV estimation include the annotated bibliography of Dielman [1] as well as survey articles by Dodge [2], Narula [3] and Pynnonen and Salmi [4]. The paper by Dodge [2] served as an introduction to three special issues of Computational Statistics and Data Analysis (CSDA) entitled 'Statistical Data Analysis Procedures Based on the L1-Norm and Related Methods' (Computational Statistics and Data Analysis, Volume 5, Number 4, 1987; Volume 6, Numbers 3 and 4, 1988). An earlier version of the paper appears in Statistical Data Analysis Based on the L1-Norm and Related Methods, Y. Dodge, editor, Amsterdam: North-Holland, 1987. This is a collection of articles from the First International Conference on the L1-Norm and Related Methods held in Neuchatel, Switzerland. I reference a number of the articles in the CSDA collection, but


not their earlier versions in the conference proceedings since these are essentially repeats. There are three other collections of articles worth mentioning. These collections contain selected papers from the Second, Third and Fourth International Conferences on the L1-Norm and Related Methods, held in Neuchatel in 1992, 1997 and 2002, respectively. These collections are published as L1-Statistical Analysis and Related Methods, Y. Dodge, editor, Amsterdam: North-Holland, 1992; L1-Statistical Procedures and Related Topics, Y. Dodge, editor, Institute of Mathematical Statistics Lecture Notes - Monograph Series, Volume 31, 1997; and Statistical Data Analysis Based on the L1-Norm and Related Methods, Y. Dodge, editor, Birkhauser, 2002. Selected papers from these collections will be referenced in this article.

In addition to survey articles, there are books or chapters in books that provide information on LAV regression. Birkes and Dodge [ref. 5, see Chap. 4], Bloomfield and Steiger [6] and Sposito [ref. 7, see Chap. 5] provide technical detail about LAV regression, whereas Farebrother [8] presents a discussion of the historical development of LAV and least squares (LS) methods.

The primary emphasis of this article is LAV linear regression. To motivate the discussion, consider the multiple linear regression model

$$y_i = \beta_0 + \sum_{k=1}^{K} \beta_k x_{ik} + \varepsilon_i \quad \text{for } i = 1, 2, \ldots, n, \qquad (1)$$

where $y_i$ is the $i$th value of the response variable; $x_{ik}$, the $i$th observation on the $k$th explanatory variable; $\beta_0$, the constant in the equation; $\beta_k$, the coefficient of the $k$th explanatory variable; and $\varepsilon_i$ is the $i$th value of the disturbance. Additional assumptions about the model are reserved until later sections. The LAV regression involves finding estimates of $\beta_0, \beta_1, \beta_2, \ldots, \beta_K$, denoted $b_0, b_1, b_2, \ldots, b_K$, that minimize the sum of the absolute values of the residuals, $\sum_{i=1}^{n} |y_i - \hat{y}_i|$, where $\hat{y}_i = b_0 + \sum_{k=1}^{K} b_k x_{ik}$ represent predicted values.

This problem can be restated as a linear programming problem:

$$\text{minimize} \quad \sum_{i=1}^{n} (d_i^+ + d_i^-) \qquad (2)$$

$$\text{subject to} \quad y_i - \left( b_0 + \sum_{k=1}^{K} b_k x_{ik} + d_i^+ - d_i^- \right) = 0 \quad \text{for } i = 1, 2, \ldots, n, \qquad (3)$$

where $d_i^+, d_i^- \geq 0$, and the $b_k$, $k = 0, 1, 2, \ldots, K$, are unrestricted in sign. The $d_i^+$ and $d_i^-$ are, respectively, the positive and negative deviations (residuals) associated with the $i$th observation.
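As an illustration (not one of the specialized algorithms surveyed in section 2), the primal problem (2)-(3) can be handed to any general-purpose LP solver. The sketch below assumes SciPy's linprog is available; the variable layout [b, d+, d-] follows the formulation above.

```python
import numpy as np
from scipy.optimize import linprog

def lav_fit(X, y):
    """Fit a LAV regression by solving the primal LP (2)-(3).

    X: (n, K) matrix of explanatory variables (intercept added here).
    y: (n,) response vector.
    Returns the (K+1,) estimates [b0, b1, ..., bK].
    """
    n, K = X.shape
    Xa = np.column_stack([np.ones(n), X])                # intercept column
    p = K + 1
    # Decision variables z = [b (p), d_plus (n), d_minus (n)]
    c = np.concatenate([np.zeros(p), np.ones(2 * n)])    # min sum(d+ + d-)
    A_eq = np.hstack([Xa, np.eye(n), -np.eye(n)])        # Xa b + d+ - d- = y
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)  # b free, deviations >= 0
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:p]

# Example with simulated heavy-tailed (Laplace) disturbances
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 1.0 + X @ np.array([2.0, -3.0]) + rng.laplace(size=100)
print(lav_fit(X, y))
```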

The dual problem is stated most conveniently as

$$\text{maximize} \quad \sum_{i=1}^{n} (f_i y_i - y_i) \qquad (4)$$

$$\text{subject to} \quad \sum_{i=1}^{n} f_i x_{ik} = \sum_{i=1}^{n} x_{ik} \quad \text{for } k = 1, 2, \ldots, K, \qquad (5)$$

where $0 \leq f_i \leq 2$, $i = 1, 2, \ldots, n$. In the dual formulation, the dual variables are the $f_i$, $i = 1, 2, \ldots, n$. See Wagner [9] for a discussion of this form of the dual problem.

The LAV regression is also known by several other names, including L1-norm regression, minimum absolute deviation regression, least absolute deviation regression and minimum sum of absolute errors regression.


2. History and computation of least absolute value regression

Boscovich [10, 11] explicitly discussed minimizing the sum of the absolute errors as a criterion for fitting a line to observational data. This is the first recognized use of the LAV criterion for a regression application, and it predates Legendre's announcement of the principle of LS in 1805. After that announcement, LAV estimation took a secondary role in the solution of regression problems, likely due to the uniqueness of LS solutions, the relative computational simplicity of LS and the thorough reformulation and development of the method of LS by Gauss [12-14] and Laplace [15] in terms of the theory of probability.

Farebrother [8] provides a history of the development of LAV and LS procedures for fitting linear relationships. I highly recommend this book for the reader interested in obtaining a clear perspective of the historical development of these methods. The other published work of Farebrother [16-19] also provides a variety of historical information. The 1993 article discusses the work of Boscovich.

Koenker [20] provides an interesting review of historical connections to state-of-the-art aspects of LAV and quantile regression. He discusses Edgeworth's work on computation of LAV regression and its relationship to the simplex approach as well as Edgeworth's comment that LAV computations could be made as simple as those of LS. He also relates the work of Bowley [21] to that of Rousseeuw and Hubert [22] on regression depth and that of Frisch [23] to interior point methods.

Stigler [24] includes LAV regression as a part of the discussion in his book on statistics prior to 1900. In addition, Stigler [25] discusses a manuscript fragment that shows that Thomas Simpson and Roger Boscovich met in 1760, and that Boscovich posed a LAV regression problem to Simpson.

Charnes et al. [26] are credited with first using the simplex method to solve a LAV regression problem. They used the simplex method to solve the primal linear programming problem directly. It was quickly recognized, however, that computational efficiencies could be gained by taking account of the special structure of the type of problem being solved. Until the 1990s, most research on algorithms/programs to solve the LAV regression problem involved variations of the simplex method.

Barrodale and Roberts [27, 28] (BR) provided a very efficient algorithm based on the primal formulation of the problem. This algorithm is the one used in the IMSL package of subroutines. It was considered to be the fastest algorithm available at the time it was published, and is still often used as a benchmark today. Armstrong and Kung [29] (AK) specialized the BR algorithm for simple regression. Bloomfield and Steiger [30] (BS) modified the BR algorithm by employing a steepest edge criterion to determine a pivot.

Armstrong et al. [31] (AFK) used the revised simplex method with LU decomposition of the basis matrix to develop a very fast algorithm for multiple regression. The algorithm is similar to that of BR but is more efficient due to its use of the LU decomposition for maintaining the current basis and requiring less sorting.

Josvanger and Sposito [32] (JS) presented an algorithm for LAV simple regression that used a descent approach rather than linear programming. Many early timing comparisons of the algorithms mentioned so far are summarized in Dielman and Pfaffenberger [34].

Gentle et al. [33] examined the performance of LAV algorithms for simple and multiple regression. The study used openly available codes. For simple regression, the codes of AK, JS, AFK, BS and Abdelmalek [35] (A) were compared. The JS program performed well. In multiple regression, the AFK program performed well. The BS program performed well in both cases for smaller sample sizes, but failed to produce correct answers when sample size was large (1000 or more in multiple regression).

Page 4: Least absolute value regression: recent contributions

266 T E. Oielman

Gentle et al. [36] examined the performance of LAV algorithms for simple regression. Again, the study used openly available codes. For simple regression, the codes of AK, JS, AFK and BS were compared. The JS program performed well. When there was a perfect fit, the AFK program outperformed the JS program.

Narula et al. [37] performed a timing comparison for the codes of JS, AFK, BS and BR for simple regression. The JS algorithm performed best when sample size was 300 or less, the BS algorithm when sample size was 750 or more, and the two performed similarly for intermediate sample sizes. The previous four algorithms and the algorithm of AK were compared when LS estimates were used as starting values in the LAV algorithms. The AFK and BS algorithms performed best overall.

Sunwoo and Kim [38] (SK) developed an algorithm that used a direct descent method with the LS estimate as a starting point. SK showed their algorithm to be faster than both AK and JS. (It should be noted that the timing results of SK for the JS algorithm differed considerably from other published results.) Although the timing comparisons favor the SK algorithm, it is unclear whether the computational time involved in finding the LS estimate is included. The JS and AK algorithms employ advanced starts but not LS, so it is unclear whether the timing comparisons include LS starts for all three procedures.

Soliman et al. [39] proposed an algorithm that used LS residuals to identify the observations whose LAV residuals are equal to zero. In this way, they claimed that the LAV regression could be determined and the resulting computational time would be faster than algorithms utilizing simplex solutions. Herce [40] showed that the algorithm does not necessarily produce LAV estimates. Christensen et al. [41] proposed a modification of the original algorithm in which the original method is implemented only after first discarding observations with large LS residuals. Herce [42] responded, but did not point out the remaining problems with the revised algorithm. Bassett and Koenker [43] showed that the modified algorithm would not produce estimates that are necessarily identical or even close to the LAV estimates, and recommended that the algorithm not be used for LAV estimation.

Dielman [44] summarized the computational algorithms and timing comparisons for LAV regression, including many of those presented so far.

There have been a number of other algorithms or modifications to algorithms suggested in the literature. Seneta and Steiger [45] suggested a LAV algorithm that is faster than the BS algorithm when the number of parameters is large relative to the number of observations. Farebrother [46] presented a version of the algorithm of Sadovski [47] for LAV simple regression that incorporated several improvements over the original. Farebrother [48] proposed three variants of this procedure, along with timing comparisons, and suggestions for improvements to the code of JS. Rech et al. [49] described an algorithm for fitting the LAV simple regression line that is based on a labeling technique derived from linear programming. No timing comparisons were given. Madsen and Nielsen [50] described an algorithm for solving the linear LAV problem based on smoothing the non-differentiable LAV function. Numerical tests suggested that the algorithm might be superior to the BR code. Hong and Choi [51] proposed a method of finding the LAV regression coefficient estimates by defining the estimates in terms of the convergent weighted medians of the slopes from each data point to the point that is assumed to be on a predicted regression line. The method is similar to that of JS. Sklar [52] provided extensions to available LAV best subset algorithms. Narula and Wellington [53] provided a single efficient algorithm to solve both the LAV and the Chebychev regression problems, rather than using separate algorithms for each. Planitz and Gates [54] used a quadratic programming method to select the unique best LS solution from the convex set of all best LAV solutions. They suggested this approach as a solution for cases when a unique LAV solution does not exist.

Adcock and Meade [55] compared three methods for computing LAV estimates in the linear model: the BR algorithm, the modification of the BR algorithm due to Bloomfield and


Steiger [30] and an iteratively reweighted least squares (IRLS) algorithm. They found the IRLS algorithm to be faster when the number of observations was large relative to the number of parameters (for example, in a simple regression with more than 1500 observations and in a five-variable multiple regression with more than 5000 observations). This is in contrast to previous comparisons involving IRLS algorithms.
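To make the IRLS idea concrete, here is a minimal sketch (a generic scheme with an assumed guard constant delta, not the specific implementation compared by Adcock and Meade): each pass solves a weighted LS problem whose weights $1/\max(|e_i|, \delta)$ turn the quadratic criterion into an approximation of the absolute-value criterion.

```python
import numpy as np

def lav_irls(X, y, n_iter=50, delta=1e-8):
    """Approximate LAV estimates by iteratively reweighted least squares."""
    n = X.shape[0]
    Xa = np.column_stack([np.ones(n), X])       # intercept column
    w = np.ones(n)
    for _ in range(n_iter):
        sw = np.sqrt(w)
        # Weighted LS step: minimize sum_i w_i * e_i^2
        b, *_ = np.linalg.lstsq(sw[:, None] * Xa, sw * y, rcond=None)
        e = y - Xa @ b
        w = 1.0 / np.maximum(np.abs(e), delta)  # reweight; delta avoids division by zero
    return b
```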

Portnoy and Koenker [56] surveyed recent developments on the computation of LAV estimates, primarily the interior point algorithms for solving linear programs. A simple pre-processing approach for data is described that, together with the use of interior point algorithms, provides dramatic time improvements in computing LAV estimates. The authors note that simplex-based algorithms will produce LAV regression solutions in less time than LS for problems with a few hundred observations, but for very large numbers of observations can be much slower. The pre-processing of the data involves choosing two subsets of the data such that the observations in one subset are known to fall above the optimal LAV plane and the observations in the other will fall below the plane. Using these subsets as observations effectively reduces the number of observations in the LAV regression problem and therefore the time required to produce a solution. The authors obtain a 10- to 100-fold increase in computational speed over current, simplex-based algorithms in large problems (10,000-200,000 observations). The authors note a number of avenues for future research that may refine such an approach. Hopefully, such algorithms might soon be available in commercial software. See also Coleman and Li [57], Koenker [58] and Portnoy [59].

3. Properties of least absolute value regression estimators

Rewrite the model in equation (1) in matrix form as

$$Y = X\beta + \varepsilon, \qquad (6)$$

where $Y$ is the $n \times 1$ vector of observations on the dependent variable; $X$, the $n \times (K+1)$ matrix of observations on the independent variables; $\beta$, the $(K+1) \times 1$ vector of regression coefficients to be estimated; and $\varepsilon$ is the $n \times 1$ vector of disturbances. Assume that the distribution function, $F$, of the disturbances has median zero, that $F$ is continuous and has continuous and positive density $f$ at the median. Also assume that $(1/n)X'X \to Q$, a positive definite matrix, as $n \to \infty$. Under these assumptions, Bassett and Koenker [60] are recognized as the first to provide a proof that $\sqrt{n}(\hat{\beta} - \beta)$ converges in distribution to a $(K+1)$-dimensional Gaussian random vector with mean 0 and covariance matrix $\lambda^2 Q^{-1}$, where $\lambda^2/n$ is the asymptotic variance of the sample median from random samples from distribution $F$. Koenker and Bassett [61] also proved asymptotic normality of Boscovich's estimator (LAV subject to the constraint that the mean residual is zero).

Phillips [62] used generalized functions of random variables and generalized Taylor series expansions to provide quick demonstrations of the asymptotic theory for the LAV estimator. Assuming the errors are independent and identically distributed (iid) with zero median and probability density that is positive and analytic at zero, and that $(1/n)X'X \to Q$, a positive definite limit, as $n \to \infty$, Phillips proceeds as if the objective function were differentiable. Justification is provided for proceeding with Taylor series expansion of the first-order conditions to arrive at the asymptotic theory.

Pollard [63] provided a direct proof of the asymptotic normality for the LAV estimator. The author points out that previous proofs depended on some sort of stochastic equicontinuity argument: they required uniform smallness for the changes in some sequence of stochastic processes due to small perturbations of the parameters. The technique in this article depends on


the convexity property of the criterion function and results in a simpler proof. Pollard proves convergence in distribution under a variety of assumptions. He assumes that the disturbances are iid with median zero and a continuous, positive density in a neighborhood of zero and proves convergence when: (1) the independent variables are deterministic; (2) the independent variables are random and (3) the data generation process is autoregressive (AR) with either finite or infinite variance disturbance distributions.

Wu [64] provides conditions under which LAV estimates are strongly consistent. There are a number of different assumptions and conditions under which weak consistency of the LAV coefficient estimator has been proved. Bai and Wu [65] provide a summary of a variety of cases. Here is a list of possible assumptions used in various combinations to prove weak consistency for the regression model:

(A1) The disturbances are independent and come from distribution functions $F_i$, each with median zero.

(A2) The disturbances are independent and come from a common distribution function $F$ with median zero.

(B1) There exist positive constants $\theta \in (0, 1/2)$ and $\delta > 0$ such that for each $i = 1, 2, \ldots$, $\min[P(\varepsilon_i \geq \delta), P(\varepsilon_i \leq -\delta)] \geq \theta$.

(B2) There exist positive constants $\theta$ and $\Delta$ such that for each $i = 1, 2, \ldots$,

$\max[P(-u < \varepsilon_i < 0), P(0 < \varepsilon_i < u)] \leq \theta u$ for $0 < u < \Delta$.

(B3) There exist positive constants $\theta_1$ and $\theta_2$, and $\Delta$ such that for each $i = 1, 2, \ldots$,

$\theta_2 |u| \leq P_i(u) \leq \theta_1 |u|$ for $|u| \leq \Delta$,

where

$P_i(u) = P(0 < \varepsilon_i < u)$ if $u > 0$,
$P_i(u) = P(u < \varepsilon_i < 0)$ if $u < 0$.

(B4) There exist positive constants $B$ and $\Delta$ such that for each $i = 1, 2, \ldots$, $\varepsilon_i$ has a density $f_i$ with $f_i(u) \leq B$ for $-\Delta < u < \Delta$.

There are also various sets of conditions on the explanatory variables, where $x_i$ is a $K$-vector:

(C1) $S_n^{-1} \to 0$, where $S_n = \sum_{i=1}^{n} x_i x_i'$
(C2) $\inf_{|\beta| = 1} \sum_{i=1}^{\infty} |x_i'\beta| = \infty$
(C3) $\sum_{i=1}^{\infty} |x_i|^2 = \infty$
(C4) $\sum_{i=1}^{\infty} |x_i| = \infty$

Chen et al. [66] show that C1 is a necessary condition under assumptions A2 and B3. Chen and Wu [67] assume A1 and B4 and show that C4 is a necessary condition for consistency. Chen et al. [68] show that C3 is a necessary condition for consistency under A1 and B2. Bai and Wu [65] assume A1 and B1 and show that C2 is a necessary condition for consistency.

Andrews [69] showed that LAV estimators are unbiased if the conditional distribution of the vector of errors is symmetric given the matrix of regressors. In certain cases of LAV estimation, there may not be a unique solution, so a tie-breaking rule might be needed to ensure unbiasedness. This rule may take the form of a computational algorithm as discussed in Farebrother [70]. When disturbance distributions are not symmetric, Withers [71] provides approximations for the bias and skewness of the coefficient estimator.


Bassett [72] notes that a well-known property of the LAV estimator is that, for a $p$-variable linear model, $p$ observations will be fitted exactly. He shows that certain subsets of $p$ observations will not be fit by the LAV estimate for any realization of the dependent variables. This identifies subsets of the data that seem to be unimportant. The author considers this property of LAV estimation mainly because it seems so strange.

Bai [73] developed the asymptotic theory for LAV estimation of a shift in a linear regression. Caner [74] developed the asymptotic theory for the more general threshold model.

He et al. [75] introduced a finite-sample measure of performance of regression estimators based on tail behavior. For heavy-tailed error distributions, the measure introduced is essentially the same as the finite-sample concept of breakdown point. The LS, LAV and least median-of-squares estimators are examined using the new measure, with results mirroring those that would be obtained using breakdown point.

Ellis and Morgenthaler [76] introduced a leverage indicator that is appropriate for LAV regression. For the LAV case, the leverage indicator tells us about the breakdown and/or exactness of fit.

Ellis [77] developed a measure of instability that shows that LAV estimators are frequently unstable. In a comment, Portnoy and Mizera [ref. 78, pp. 344-347] suggest that LAV estimators do not exhibit extreme forms of sensitivity or instability and find fault with the measure developed by Ellis. (Ellis responds in ref. 79, pp. 347-350.)

Dielman [1] summarized early small-sample comparisons of efficiency of LAV and LS estimators. The results of these early studies confirmed that the LAV regression estimator is more efficient than LS when disturbances are heavy-tailed. This fact was later confirmed analytically as well. The analytic results show that LAV will be preferred to LS whenever the median is more efficient than the mean as a measure of location.

Pfaffenberger and Dielman [80] used Monte Carlo simulation to compare LS, LAV and ridge regression along with a weighted ridge regression estimator (WRID) and an estimator that combines ridge and LAV regression (RLAV). The simulation used normal, contaminated normal, Laplace and Cauchy disturbances. The RLAV estimator performed well for the outlier-producing distributions and high multicollinearity.

Lind et al. [81] examined the performance of several estimators when regression disturbances are asymmetric. Estimators included LS, LAV and an asymptotically optimal M-estimator. The LAV estimator performs well when the percentage of observations in one tail is not too large and also provides a good starting point for the optimal M-estimator.

McDonald and White [82] used a Monte Carlo simulation to compare LS, LAV and several other robust and partially adaptive estimators. Disturbances used were normal, contaminated normal, a bimodal mixture of normals and lognormal. Sample size was 50. Adaptive procedures appeared to be superior to other methods for most non-normal error distributions. They found the bimodal error distribution to be difficult for any method.

4. Least absolute value regression with dependent errors

When time-series data are used in a regression, it is not unusual to find that the errors in the model are correlated. There is a long history of what to do in this case when the estimation method is LS. Several articles deal with this problem when LAV estimation is used.

For the following discussion, the regression model in equation (1) is considered. Thedisturbances are generated by a first-order AR process:

$$\varepsilon_t = \rho \varepsilon_{t-1} + \eta_t, \qquad (7)$$


where $\rho$ is the first-order autocorrelation coefficient ($|\rho| < 1$) and the $\eta_t$ are iid disturbances, but not necessarily normally distributed.

Two procedures, both two-stage and based on a generalized LS approach, are typically employed to correct for autocorrelation in the least squares regression context. These are the Prais-Winsten (PW) and Cochrane-Orcutt (CO) procedures. Both procedures transform the data using the autocorrelation coefficient, $\rho$, after which the transformed data are used in estimation. The procedures differ in their treatment of the first observation, $(x_1, y_1)$. Using the model of equation (6), the PW transformation matrix can be written:

$$M = \begin{pmatrix} \sqrt{1-\rho^2} & 0 & 0 & \cdots & 0 & 0 \\ -\rho & 1 & 0 & \cdots & 0 & 0 \\ 0 & -\rho & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & -\rho & 1 \end{pmatrix}. \qquad (8)$$

Pre-multiplying the model in equation (6) by $M$ yields

$$MY = MX\beta + M\varepsilon \qquad (9)$$

or

$$Y^* = X^*\beta + \eta, \qquad (10)$$

where $Y^*$ contains the transformed dependent variable values and $X^*$ is the matrix of transformed independent variable values, so

$$Y^* = \begin{pmatrix} \sqrt{1-\rho^2}\, y_1 \\ y_2 - \rho y_1 \\ \vdots \\ y_n - \rho y_{n-1} \end{pmatrix} \qquad (11)$$

and

$$X^* = \begin{pmatrix} \sqrt{1-\rho^2} & \sqrt{1-\rho^2}\, x_{11} & \cdots & \sqrt{1-\rho^2}\, x_{1K} \\ 1-\rho & x_{21} - \rho x_{11} & \cdots & x_{2K} - \rho x_{1K} \\ \vdots & \vdots & & \vdots \\ 1-\rho & x_{n1} - \rho x_{n-1,1} & \cdots & x_{nK} - \rho x_{n-1,K} \end{pmatrix}. \qquad (12)$$

In equation (10), $\eta$ is the vector of serially uncorrelated $\eta_t$ errors.

The CO transformation matrix is the $(T-1) \times T$ matrix obtained by removing the first row of the $M$ transformation matrix. The use of the CO transformation means that $T-1$ observations, rather than $T$, are used to estimate the model. The CO transformation omits the first observation, whereas the PW transformation includes the transformed first observation. Asymptotically, the loss of this single observation is probably of minimal concern. However, for small samples, omitting the first observation has been shown to result in a LS estimator inferior to that obtained when the first observation is retained and transformed. LSCO and LAVCO will be used to indicate the method in which a CO transformation is used with LS or LAV, respectively. Similarly, LSPW and LAVPW will be used to indicate the method in which a PW transformation is used with LS or LAV.
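A small sketch of the two transformations, assuming $\rho$ is known or has been estimated in a first stage; the quasi-differencing follows equations (11) and (12).

```python
import numpy as np

def prais_winsten_transform(X, y, rho):
    """PW transform of equations (11)-(12): quasi-difference all rows and
    keep a rescaled first observation."""
    n = len(y)
    Xa = np.column_stack([np.ones(n), X])       # constant column included
    s = np.sqrt(1.0 - rho ** 2)
    y_star = np.empty(n)
    X_star = np.empty_like(Xa)
    y_star[0], X_star[0] = s * y[0], s * Xa[0]  # transformed first observation
    y_star[1:] = y[1:] - rho * y[:-1]           # y_t - rho * y_{t-1}
    X_star[1:] = Xa[1:] - rho * Xa[:-1]
    return X_star, y_star

def cochrane_orcutt_transform(X, y, rho):
    """CO transform: identical quasi-differencing, first row dropped."""
    X_star, y_star = prais_winsten_transform(X, y, rho)
    return X_star[1:], y_star[1:]
```

Note that X_star already carries the transformed constant column, so the second-stage LAV (or LS) fit should be run on these data without adding another intercept; doing so with a LAV routine gives the LAVPW or LAVCO estimates.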

Coursey and Nyquist [83] used a Monte Carlo simulation to compare four types of estimators when disturbances are subject to first-order autocorrelation: LAV, LS, LSCO and LAVCO. Sample sizes of 15 and 30 were used with disturbances generated from the class of symmetric stable distributions. They find that the LAVCO estimator can perform worse than LAV in certain cases. Prior research had shown that LS can outperform LSCO as well. The omission of the first observation may result in an inferior estimator in small samples.

Weiss [84] examined LAV estimation in a regression with first-order serially correlated errors. He considered a LAVCO estimator to correct for first-order serial correlation. He


shows that the LAVCO coefficient estimator is asymptotically normal, but this result assumes existence of at least second moments of the error distribution, a stronger assumption than that required for independent disturbances. A Monte Carlo simulation using normal, lognormal and contaminated disturbances, with sample sizes of 25, 49 and 81, was performed. Results suggested the LAVCO estimator would perform better than the LSCO estimator in heavy-tailed distributions, but is negatively affected by trending regressors (as in the LS case). Tests for first-order serial correlation were also examined. This examination found that using the Durbin-Watson test with LAV residuals substituted for LS residuals was a reasonable procedure. See Davis and Dunsmuir [85] for asymptotic results when regression disturbances follow ARMA errors.

Nyquist [86] showed that the LAVCO procedure is unreliable due to the search procedure and properties of LAV estimation. It is more likely that the LAVCO procedure will not converge to the proper value. An alternative non-linear procedure is suggested.

Dielman and Rose [87] used a Monte Carlo simulation to compare LAV, LS, LSPW, LAVPW, LSCO and LAVCO. Disturbances used were normal, contaminated normal, Laplace and Cauchy with a sample size of 20. The results suggest that: (1) LSCO and LAVCO should be avoided; (2) correction for autocorrelation using the PW transformation improves LAV and LS estimates for moderate-to-high levels of autocorrelation; (3) LAVPW appears to be the recommended approach when error distributions are fat-tailed and autocorrelation is present and (4) the estimates are not appreciably worse after autocorrelation correction, regardless of the degree of autocorrelation. Dielman and Rose [88] used an identical simulation design to compare LAV, LAVCO, LAVPW and two pre-test estimators that transform with either PW or CO when a pre-test suggested that autocorrelation was present. Again, the PW transformation was found preferable to the CO. There was little difference between always correcting and correcting only when suggested by a pre-test.

Nyquist [89] proposed a LAV-based Lagrange multiplier (LM) test for first-order autocorrelation. As the error variance increases, the asymptotic relative efficiency for the LAV-based test becomes more favorable relative to the LS-based test.

5. Forecasting using least absolute value regression

Dielman [90] used Monte Carlo simulation to compare forecasts from LAV and LS estimated regression equations with 30 observations. In error distributions that are prone to outliers (Cauchy, Laplace, contaminated normal), the LAV forecasts were shown to be superior to LS. Use of LAV (or some other robust technique) was suggested as an adjunct to LS. The comparison of forecasts from the two methods would provide a way of assessing whether outliers have adversely affected the LS forecasts. See also Dielman [91] for a correction to the original article.

Dielman and Rose [92] investigated the forecasting performance of LAV and LS estimated regressions using Monte Carlo simulation when disturbances are subject to first-order autocorrelation. Four estimators were compared: LAV, LS and both LAVPW and LSPW (see the previous section for definitions of the LSPW and LAVPW estimators). Out-of-sample root mean square forecast errors were the basis for comparison. Disturbance distributions used were normal, contaminated normal, Laplace and Cauchy, and the sample size was 20. The results suggested that: (1) correction for autocorrelation improves forecasts for moderate-to-high levels of autocorrelation; (2) LAVPW appears to be the recommended approach when error distributions are fat-tailed and autocorrelation is present and (3) the forecasts are not appreciably worse after autocorrelation correction, regardless of the degree of autocorrelation.


6. Inferences in least absolute value estimated regressions

For purposes of this section, we re-express equation (6) in the following matrix form:

$$Y = X_1\beta_1 + X_2\beta_2 + \varepsilon. \qquad (13)$$

In equation (13), the coefficient vector $\beta$ and the data matrix $X$ have been partitioned; $\beta_1$ is a $k_1 \times 1$ vector of coefficients to remain in the model and $X_1$ is the associated part of the original data matrix, $X$; $\beta_2$ represents the $k_2 \times 1$ vector of coefficients to be included in a hypothesis test and $X_2$ is the associated part of the original data matrix, $X$. The test we will consider is the basic test for coefficient significance, i.e. $H_0$: $\beta_2 = 0$. The covariance matrix of the LAV coefficient estimator can be written as $\lambda^2 (X'X)^{-1}$ with the scale parameter, $\lambda$, defined as $\lambda = 1/[2f(m)]$, where $f(m)$ is the p.d.f. of the disturbance distribution evaluated at the median.

The LM test statistic for the test of the null hypothesis $H_0$: $\beta_2 = 0$ is given by

$$LM = g_2' D g_2, \qquad (14)$$

where $g_2$ is the appropriate portion of the normalized gradient of the unrestricted LAV objective function, evaluated at the restricted estimate, and $D$ is the appropriate block of the $(X'X)^{-1}$ matrix to be used in the test.

The WALD test statistic is given by

$$WALD = \frac{b_2' D^{-1} b_2}{\lambda^2}, \qquad (15)$$

where $D$ is as previously defined and $b_2$ is the LAV estimate of $\beta_2$. The likelihood ratio (LR) test statistic (assuming the disturbances follow a Laplace distribution) is

$$LR = \frac{2(SAD_1 - SAD_2)}{\lambda}, \qquad (16)$$

where $SAD_1$ is the sum of the absolute deviations of the residuals in the restricted or reduced model (i.e. $\beta_2 = 0$) and $SAD_2$ is the sum of the absolute deviations of the residuals in the unrestricted model.

The WALD, LR and LM statistics each have, asymptotically, a chi-square distribution with $k_2$ degrees of freedom. See Koenker and Bassett [93] and Bai et al. [94] for further details on these test statistics.
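In code form (a sketch using the notation above; taking the gradient portion as $g_2 = X_2'\,\mathrm{sgn}(\tilde{e})$, with $\tilde{e}$ the restricted residuals, is one common concrete choice, assumed here rather than prescribed by the text):

```python
import numpy as np

def lav_test_statistics(X1, X2, y, b_unres, b_res, lam):
    """Sketch of the WALD, LR and LM statistics (14)-(16).

    X1      : (n, k1) columns kept under H0 (include the intercept here).
    X2      : (n, k2) columns whose coefficients are tested.
    b_unres : unrestricted LAV estimates ordered as [beta1, beta2].
    b_res   : restricted LAV estimates of beta1 (with beta2 = 0 imposed).
    lam     : scale estimate, e.g. the SECI estimator of equation (17).
    """
    X = np.hstack([X1, X2])
    k1 = X1.shape[1]
    D = np.linalg.inv(X.T @ X)[k1:, k1:]        # block of (X'X)^{-1} for beta2

    e_unres = y - X @ b_unres                   # unrestricted residuals
    e_res = y - X1 @ b_res                      # restricted residuals

    b2 = b_unres[k1:]
    wald = b2 @ np.linalg.inv(D) @ b2 / lam ** 2                    # eq. (15)
    lr = 2.0 * (np.abs(e_res).sum() - np.abs(e_unres).sum()) / lam  # eq. (16)
    g2 = X2.T @ np.sign(e_res)                  # gradient portion at b_res
    lm = g2 @ D @ g2                            # eq. (14)
    return wald, lr, lm
```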

Note that both the WALD and LR test statistics require the estimation of the scale parameter $\lambda$, whereas the LM test statistic does not. One estimator often suggested can be computed as follows:

$$\hat{\lambda} = \frac{\sqrt{n'}\,[e_{(n'-m+1)} - e_{(m)}]}{2 z_{\alpha/2}}, \quad \text{where } m = \frac{n'+1}{2} - z_{\alpha/2}\sqrt{\frac{n'}{4}}, \qquad (17)$$

where the $e_{(i)}$ are ordered residuals from the LAV-fitted model, and $n' = n - r$, where $r$ is the number of zero residuals. A value of $\alpha = 0.05$ is usually suggested. This estimator will be referred to as the SECI estimator.
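A direct transcription of equation (17) (a sketch; rounding $m$ to an integer order-statistic index is an implementation choice not fixed by the formula):

```python
import numpy as np
from scipy.stats import norm

def seci_lambda(residuals, alpha=0.05, tol=1e-10):
    """SECI estimate of the scale parameter lambda from LAV residuals."""
    e = np.sort(np.asarray(residuals))
    e = e[np.abs(e) > tol]                    # drop the r zero residuals
    n_prime = len(e)                          # n' = n - r
    z = norm.ppf(1.0 - alpha / 2.0)           # z_{alpha/2}
    m = int(round((n_prime + 1) / 2.0 - z * np.sqrt(n_prime / 4.0)))
    m = max(m, 1)                             # guard for very small samples
    spread = e[n_prime - m] - e[m - 1]        # e_(n'-m+1) - e_(m), 1-based
    return np.sqrt(n_prime) * spread / (2.0 * z)
```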

McKean and Schrader [95] used Monte Carlo simulation to compare several methods of studentizing the sample median. The methods included the SECI estimator and several others that could be extended for use in LAV regression hypothesis tests. SECI performed well and the value of $\alpha = 0.05$ seemed to produce the best results. McKean and Schrader [96] again suggest this estimator and provide an example of its use.


Sposito and Tveite [97] used Monte Carlo simulation to study the SECI estimator. Good estimates were obtained for the finite-range error distributions (triangle and uniform) considered and for the normal distribution. For the Laplace and Cauchy error distributions, larger sample sizes were needed: $n = 100$ for Laplace, $n = 300$ for Cauchy.

Sheather [98] summarizes the results of a Monte Carlo simulation to compare the SECI estimator and several other estimators for $\lambda$, including some that do not extend easily to the regression application. The conclusion was that the SECI estimator provides a good, quick point estimate of the standard error. Dielman and Pfaffenberger [99, 100] and Dielman and Rose [101, 102] also noted that this estimator performs well when used to compute the LR test statistic.

Liu [103] proposed several non-parametric estimators of $\lambda$ and proved strong consistency. Niemiro [104] suggested kernel-smoothing methods for estimation of the nuisance parameter in LAV regression. Consistency of the suggested estimator is shown and bounds are obtained for the rate of convergence. Rao [105] and Bai et al. [94] suggest additional alternatives. Small-sample comparisons with the estimator in equation (17) would help to determine the efficacy of these estimators.

Bootstrap methodology provides an alternative to the WALD, LR and LM tests. The bootstrap approach was developed by Efron [106], and has been shown to be useful in many instances where traditional approaches for testing and estimation are either undeveloped or suspect. When $k_2 = 1$, a bootstrap test statistic for $H_0$: $\beta_2 = 0$ in the LAV regression context can be computed as follows. The model shown as equation (13) is estimated using LAV estimation procedures, and residuals are obtained. The test statistic, $|b_2 - 0|/\text{se}(b_2)$, is computed from the regression on the original data, where $\text{se}(b_2)$ represents the standard error of the coefficient estimate $b_2$, computed as

$$\text{se}(b_2) = \hat{\lambda} D^{1/2}, \qquad (18)$$

where $\hat{\lambda}$ is defined in equation (17) and $D$ is defined in equation (14). The residuals, $e_i$ ($i = 1, 2, \ldots, n$), from this regression are saved, centered and resampled (with replacement) to obtain a new sample of pseudo-disturbances, $e^*$. The $e^*$ values are used to create pseudo-data:

$$Y^* = b_1 X_1 + b_2 X_2 + e^*, \qquad (19)$$

where $b_1$ and $b_2$ are the initial LAV estimates of the two vectors of regression coefficients. The $T \times 1$ vector, $e^*$, is the vector of $e^*$ values. The coefficients in equation (19) are re-estimated to obtain new parameter estimates, $b_1^*$ and $b_2^*$. The bootstrap test statistic $|b_2^* - b_2|/\text{se}(b_2^*)$ is computed and saved, and the process is repeated a large number of times. For a test to be performed at a particular level of significance, $\alpha$, the critical value is the $(1-\alpha)$th percentile from the ordered values of $|b_2^* - b_2|/\text{se}(b_2^*)$. If the original test statistic, $|b_2 - 0|/\text{se}(b_2)$, is larger than this critical value, then the null hypothesis that $\beta_2 = 0$ is rejected. This procedure follows the guidelines suggested by Hall and Wilson [107], including the use of bootstrap pivoting, which results in increased power for the test and a more accurate level of significance.
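The recipe above can be transcribed directly; the sketch below handles the $k_2 = 1$ case for a single coefficient $j$ and leans on the illustrative lav_fit and seci_lambda helpers sketched earlier (both assumptions of this example, not routines from the surveyed literature).

```python
import numpy as np

def lav_bootstrap_test(X, y, j, n_boot=999, alpha=0.05, seed=None):
    """Bootstrap test of H0: beta_j = 0 using the pivoted statistic
    |b_j* - b_j| / se(b_j*), following the Hall-Wilson guidelines."""
    rng = np.random.default_rng(seed)
    n = len(y)
    Xa = np.column_stack([np.ones(n), X])
    D_jj = np.linalg.inv(Xa.T @ Xa)[j, j]          # diagonal element of (X'X)^{-1}

    b = lav_fit(X, y)                              # fit on original data
    e = y - Xa @ b
    t_obs = abs(b[j]) / (seci_lambda(e) * np.sqrt(D_jj))   # eq. (18)

    e_centered = e - e.mean()
    t_star = np.empty(n_boot)
    for r in range(n_boot):
        y_star = Xa @ b + rng.choice(e_centered, size=n, replace=True)
        b_star = lav_fit(X, y_star)                # refit on pseudo-data, eq. (19)
        e_star = y_star - Xa @ b_star
        se_star = seci_lambda(e_star) * np.sqrt(D_jj)
        t_star[r] = abs(b_star[j] - b[j]) / se_star  # pivot about b_j
    crit = np.quantile(t_star, 1.0 - alpha)        # (1 - alpha)th percentile
    return t_obs, crit, t_obs > crit               # True = reject H0
```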

When $k_2 > 1$, a modified approach is necessary to produce a statistic similar to the LS F-statistic. De Angelis et al. [108] described analytical and bootstrap approximations to the estimator distributions in LAV regression. The consistency of the bootstrap in the LAV regression setting is established. The rate of convergence is slow in the case of the unsmoothed bootstrap. The authors show how the rate of convergence can be improved by using either a smoothed bootstrap approach or a normal approximation based on kernel estimation of the error density. Suggestions are given for the choice of smoothing bandwidth necessary in the latter two cases.


The LR, LM and WALD tests are distributed asymptotically as chi-square with degrees of freedom equal to the number of coefficients included in the test (denoted $k_2$). However, this fact does not indicate how the test statistics will perform in small samples. Several Monte Carlo studies have been performed to try to shed light on the small-sample performance. These studies will now be summarized.

Koenker [109] used Monte Carlo simulation to examine the performance of the WALD, LM and LR tests for LAV regression coefficients. The true value of the nuisance parameter was used in computing the WALD and LR statistics. Sample sizes were 30, 60 and 120, and normal, Laplace and Cauchy error distributions were used. Comparisons were based on levels of significance and power, but power comparisons were not adjusted for differences in levels of significance. The LM test performed comparatively well, although room for improvement was noted.

Schrader and McKean [110] examined the LR, WALD and bootstrap tests. A Monte Carlo simulation was used to study test performance. Error distributions were normal, contaminated normal and slash. Sample sizes ranged from 15 to 300. Comparisons were based on levels of significance and power, but power comparisons were not adjusted for differences in levels of significance. The study found the WALD test inadequate. The LR test performed reasonably well. Best performance was with the bootstrap test.

Dielman and Pfaffenberger [99, 100, 111] compared the WALD, LR and LM test statistics under a variety of conditions using Monte Carlo simulation. Their results suggest that the LR test using the SECI estimator and the LM test are both superior to the WALD test. Comparisons were based on levels of significance and power, but power comparisons were not adjusted for differences in levels of significance. One version of the WALD and LR tests used the SECI estimator of the nuisance parameter; another used a bootstrap estimate of this parameter (not a true bootstrap test, but a bootstrapping procedure was used to estimate the nuisance parameter). Bootstrap estimation of the nuisance parameter did not improve any of the test results.

Dielman and Rose [101] compared the LR test using the SECI estimator of the nuisance parameter, the LM test and the bootstrap test. They used normal, contaminated normal, Laplace and Cauchy disturbances with sample sizes of 10, 15, 20 and 25. The observed levels of significance were closer to nominal for the LM and bootstrap tests. The power of the bootstrap test was generally better than that of the LM test, although somewhat lower than that of the LR test. Results for power were not adjusted for differences in levels of significance.

Dielman and Rose [102] compared the WALD, LR and LM tests in LAV multiple regression. They used normal, contaminated normal, Laplace and Cauchy disturbances with sample sizes of 14, 20 and 30. Empirical levels of significance and power of the test procedures were compared. Power results were adjusted using the procedure suggested by Zhang and Boos [112]. The performance of individual coefficient tests and an overall-fit test was examined. Results suggest that empirical levels of significance are closer to nominal for the LM test, but that the LR test is preferred on the basis of power. Both are preferable to the WALD test.

Dielman and Rose [113] used Monte Carlo simulation to compare level of significance and power of tests in LAV regression when disturbances are subject to first-order serial correlation. The test procedures considered are the WALD, LR and LM tests. The LAV regressions are estimated both with and without correction for autocorrelation. Two corrections are applied: the CO (omit first observation) transformation and the PW (retain first observation) transformation. Results indicate that correction for autocorrelation is important for large values of the autocorrelation coefficient. The CO transformation and the WALD test seem to be the preferred pair when level of significance is considered; when power is considered, the CO and LR combination is preferred. The preference for the CO transformation for testing is in contrast to the results for estimation. When estimator efficiency is of interest, the PW transformation produces superior results. This result does not suggest that we should disregard the


PW transformation. Further examination suggests that test procedures do not perform particularly well with either the PW or the CO transformation when the level of autocorrelation is high. Perhaps alternative approaches such as a bootstrap test would serve better in this situation.

Stangenhaus [114], Stangenhaus and Narula [115] and Stangenhaus et al. [116] used Monte Carlo simulation to study the performance of confidence intervals for coefficients in a LAV regression. Their findings include:

• Fairly accurate results are obtained with small samples (sample size 10-15) for normal and contaminated normal error distributions, but much larger samples are needed (sample size 100 or more) for Cauchy and Laplace error distributions. However, the difference between nominal and actual coverage rates was small in all cases. See also Dielman and Pfaffenberger [117].
• Intervals computed using the bootstrap sampling distribution (percentile bootstrap = PB) were superior to those constructed using the bootstrap standard deviation (standard bootstrap = SB) in samples of size 50 or less. (This is consistent with the hypothesis testing results of Dielman and Pfaffenberger discussed earlier.) Little difference was found with sample sizes greater than 50. The SB intervals were constructed using 200 bootstrap repetitions; the PB intervals used 1000.

Gutenbrunner et al. [118] considered tests of a general linear hypothesis for linear regression based on regression rank scores. The tests are robust to outliers and there is no need to estimate a nuisance parameter. The regression rank scores arise as solutions to the dual form of the linear program required to compute regression quantiles. When sign scores are used, the test statistic coincides with the LAV LM test.

Cade and Richards [119] suggested permutation procedures (resampling without replacement) for hypothesis tests about the parameters in LAV linear regression. A Monte Carlo study showed that the permutation test performs better than LS-based tests in cases where the disturbances are fat-tailed or asymmetric.

Horowitz [120] noted that the LAV estimator does not satisfy the standard conditions for obtaining asymptotic refinements through use of the bootstrap because the LAV objective function is not smooth. He proposed a smoothed LAV estimator that is asymptotically equivalent to the standard LAV estimator. For the smoothed estimator, refinements for tests of hypotheses about the parameters are possible. The results extend to censored LAV and models with or without heteroskedasticity.

Weiss [121] developed a generalized method of moments (GMM) test for comparing LAV and LS regressions. The GMM test is equivalent to the Hausman test.

Furno [122] considered different versions of the LM test for autocorrelation and/or conditional heteroskedasticity. The use of LS versus LAV residuals, as well as squared residuals versus their absolute values, was compared. Furno showed that LM tests based on LAV residuals were distributed asymptotically as chi-square and were robust to non-normality. Monte Carlo simulation suggested using the absolute value of LAV residuals for the tests discussed.

7. Time-series models

An and Chen [123] considered an AR model with stable disturbances and proved convergence in probability of the LAV estimates of the AR parameters.

Dunsmuir and Spencer [124] proved strong consistency of the LAV estimator in ARMA models under general conditions. They also proved asymptotic normality. Dunsmuir [125] used Monte Carlo simulation to study LAV estimation applied to a seasonal moving average model. The model examined is essentially the airline model of Box and Jenkins. Results suggest that the normal distribution is a poor approximation to the small-sample distributions of the coefficients, that LAV estimation does provide some benefits over LS when disturbance


distributions are heavy-tailed, and that no clear preference can be given for backcasting over zero pre-period values when LAV estimation is used. Dunsmuir and Murtagh [126] formulated LAV models for ARMA processes as non-linear programming problems and suggested techniques for estimation. Examples of applications of LAV versus LS estimation using real data are shown, with support for the use of LAV.

Olsen and Ruzinsky [127] studied the convergence of the LAV estimator for an AR(p) process when the process generating the data has been incorrectly identified as an ARMA(p, q), MA(q) or higher-order AR process. Ruzinsky and Olsen [128] proved strong consistency of the LAV AR parameter estimator when the process disturbance has zero mean.

Pino and Morettin [129] proved the LAV estimates of ARMA model parameters to bestrongly consistent. They provided a stationarity and invertibility condition avoiding the usualassumption of finite variance.

Knight [130, 131] shows that LAV estimators of autoregressive parameters are asymptotically normal if the distribution function of the errors has positive first derivative evaluated at zero. Knight derives limiting distributions of LAV estimators in AR models under more general assumptions on the distribution function of the errors. Herce [132] derives the asymptotic distribution of the LAV estimator of the AR parameter under the unit root hypothesis when errors have finite variances. Rogers [133] also derives asymptotic results for LAV regression models with certain assumptions relaxed. Specifically, his work deals with time-series models with deterministic trends and random walks and is therefore related to the work of Knight and Herce.

8. Non-linear least absolute value regression

Gonin and Money [134] presented a brief discussion of algorithms for solving the non-linear LAV regression problem. The algorithms are grouped into three categories: those using only first-derivative information, those using second-derivative information and those which linearize the model functions but incorporate quadratic approximations to take curvature effects into account. They present a much more detailed discussion in Chapter 2 of Gonin and Money [135].

Oberhofer [136] shows conditions for consistency of the LAV estimator in non-linear regression. Richardson and Bhattacharyya [137] extended Oberhofer's result. Their results apply to a more general parameter space.

Soliman et al. [138] proposed a modification of the algorithm discussed in Christensen et al. [41] to produce non-linear LAV estimates with non-linear equality constraints. Bassett and Koenker [43] show that this algorithm will not produce estimates that are necessarily identical or even close to the LAV estimates.

Weiss [139] examined an estimator designed for non-linear estimation of the regression coefficients and the autocorrelation coefficient simultaneously in time-series regression models with autocorrelated disturbances. The basic requirement for consistency is that errors have conditional median zero. Asymptotic normality is shown. Estimation of the asymptotic covariance matrix is also considered, and LM, WALD and LR tests are developed.

Shi and Li [140, 141] discussed a LAV estimator for a model with linear and non-linear components. A piecewise polynomial is used to approximate the non-linear component. The convergence rate of the estimator was studied. They find convergence rates for estimators of the regression coefficients under mild conditions.

Kim and Choi [142] provided conditions for strong consistency and asymptotic normality for the non-linear LAV regression estimator. As with linear regression results, the non-linear


LAV estimator is shown to be more efficient than non-linear LS for any error distributionfor which the sample median is more efficient than the sample mean as an estimator oflocation.

Andre et al. [143] proposed using a non-linear LAV regression to estimate a multistage dose response model. They used simulation to examine the performance of the model.

Zwanzig [144] shows conditions for strong consistency of the LAV estimator of the parameters in the non-linear regression model and in the non-linear error-in-variables model.

9. Applications

In this section, a few of the applications of LAV regression that have appeared in the literature will be discussed. Coleman and Larsen [145] used equations estimated by LAV, LS and Chebychev to predict housing prices. The results showed little difference between forecasts from the three methods. A more structured experiment might clarify the results further.

Corrado and Schatzberg [146] used LAV, LS and a non-parametric rank regression estimator to estimate the systematic risk (slope parameter) in a regression of daily stock returns on an index. Their results provide some evidence that both the LS and the non-parametric estimator will be superior to LAV under the conditions examined. They used only large firms in the analysis, so the regression disturbances are likely to be close to normal. In another financial application, Chan and Lakonishok [147] used LS, LAV, trimmed quantile estimators (TRQ) and two estimators that are linear combinations of regression quantiles to estimate market model parameters. The study used simulated return data and actual returns. When a t-distribution with 3 degrees of freedom is used as the disturbance distribution, the robust methods were more efficient than LS. The TRQ estimators appeared more efficient than LAV. In a similar analysis, Butler et al. [148] applied LS and LAV estimation, as well as a partially adaptive estimator, to market model data and compared the results for two examples. Draper and Paudyal [149] applied LS and LAV to actual returns and suggest that robust methods may be superior when daily data are used. Mills et al. [150] apply LS and various robust estimators (including LAV) to an event study. Their findings suggest the results based on cumulative average residuals from the regression can vary depending on the estimator used.

Bassett [151] examined the rating of sports teams based on scores of previous games by estimating the parameters of the standard linear model. He applied both LAV and LS to data from the 1993 NFL season. In terms of correct predictions, LAV outperformed LS.

Kaergard [152] used LAV regression to estimate a Danish investment growth model, while Sengupta and Okamura [153] applied LAV regression to a cost frontier model for analyzing the impact of scale economies in the South Korean manufacturing sector. Westland [154] applied LAV, LS and two-stage LAV and LS estimation techniques to systems of equations with partial structural variability. Preference among estimators and predictions depended to some extent on the criteria used to judge performance, but, in general, little difference was found for the example in this article.

Gonin and Money [155] used non-linear examples from pharmacokinetics to illustrate an adaptive algorithm for the choice of p in non-linear Lp-estimation. They compared estimates from the adaptive algorithm to LAV estimates. See also Roy [156].

Another area that might be thought of as an application of LAV regression is to provide initial estimates for other robust algorithms. This was done, for example, in Rousseeuw and van Zomeren [157]. They used LAV regression as part of a robust regression algorithm. Leverage points are first eliminated from the data set by various methods, and then LAV regression is applied to the remaining data.


10. Further uses of least absolute value estimation methods

Powell [158] examined two-stage LAV estimators for the parameters of a structural equation in a simultaneous equation model. He demonstrated the asymptotic normality of the estimators for very general disturbance distributions. Powell [159] proposed an estimator for the parameters of the censored regression model that is a generalization of LAV estimation for the standard linear model; for censoring at zero, the estimator minimizes the sum of absolute deviations of the observed responses from max(0, x'b). The estimator is consistent and asymptotically normal for a wide class of error distributions. Rao and Zhao [160] also studied the asymptotic behavior of the LAV estimator in censored regression models. Using weaker conditions than those of Powell [159], Pollard [63] and Chen and Wu [67], they prove consistency and asymptotic normality of the estimator.
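A minimal numerical sketch of that censored criterion follows. It is illustrative only: the data are simulated, censoring is at zero, and a general-purpose derivative-free optimizer from scipy is applied to the nonsmooth objective, whereas serious implementations of this estimator rely on iterative linear programming.

    # Sketch of the censored-LAV criterion: minimize sum_i |y_i - max(0, x_i'b)|
    # for data censored at zero. Simulated data; optimizer choice is illustrative.
    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(1)
    n = 400
    x = rng.normal(size=(n, 2))
    beta_true = np.array([1.0, -0.5])
    y_latent = x @ beta_true + rng.laplace(size=n)   # latent response
    y = np.maximum(y_latent, 0.0)                    # observed, censored at zero

    def censored_lav_loss(b):
        return np.abs(y - np.maximum(x @ b, 0.0)).sum()

    fit = minimize(censored_lav_loss, x0=np.zeros(2), method="Nelder-Mead")
    print("censored LAV estimate:", np.round(fit.x, 3))   # near (1.0, -0.5)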

Koenker and Portnoy [161] proposed robust alternatives to the seemingly unrelated regression estimator of Zellner. The LAV estimator is considered as a special case. Asymptotic normality of the coefficient estimators is shown.

Honoré [162] proposed trimmed LS and trimmed LAV estimators of truncated and censored regression models with fixed effects and panel data, where the panel was of length two. The estimators were shown to be consistent and asymptotically normal.

Wang and Scott [163] proposed a hybrid method that combines non-parametric regression and LAV estimation: LAV regressions are fitted over local neighborhoods. The method generalizes to several dimensions and is proved to be consistent, so a procedure with relative computational ease and supporting asymptotic theory is obtained.
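The following one-dimensional sketch conveys the local-fitting idea: an LAV (median) line is fitted to the points in a window around each evaluation point, and the fitted value at the window centre is taken as the smoothed value. The window width and simulated data are arbitrary illustrations, not the choices of the cited method.

    # Sketch of local LAV smoothing: fit a median-regression line within a
    # moving window and evaluate it at the window centre. Settings are invented.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    x = np.sort(rng.uniform(0, 10, 300))
    y = np.sin(x) + 0.3 * rng.standard_t(df=3, size=x.size)

    def local_lav(x0, h=1.0):
        """LAV line fitted to points within distance h of x0, evaluated at x0."""
        mask = np.abs(x - x0) < h
        X = sm.add_constant(x[mask] - x0)   # centring makes the intercept the fit at x0
        return sm.QuantReg(y[mask], X).fit(q=0.5).params[0]

    grid = np.linspace(1, 9, 9)
    print(np.round([local_lav(g) for g in grid], 2))   # roughly tracks sin(grid)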

Dodge [164] introduced an estimation procedure which is a convex combination of LAV and LS. The method performs well in comparison to LS, LAV and other robust methods; choosing the parameters that define the convex combination is the trick that must be mastered in practice. Dodge and Jurečková [165] show that the estimation of regression parameters by a convex combination of LS and LAV estimators can be adapted so that the resulting estimator achieves the minimum asymptotic variance in the model under consideration. Dodge et al. [166] discuss computation of this estimator. Dodge and Jurečková [167] introduce an estimator that is a convex combination of an M-estimator and the LAV estimator. Dodge and Jurečková [168] summarize results based on estimation procedures that involve convex combinations of LAV and some other estimation procedure.
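A bare-bones sketch of a combined criterion of this type is given below, with the mixing weight delta held fixed. The adaptive, data-driven choice of the weight is precisely the contribution of the papers cited above, so this fixed-weight version should not be read as their method.

    # Sketch: minimize  delta * sum|r_i| + (1 - delta) * sum r_i**2  over b.
    # delta is fixed here; the cited estimators choose the weight adaptively.
    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(3)
    n = 200
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    y = X @ np.array([2.0, 1.0]) + rng.laplace(size=n)

    def combined_loss(b, delta=0.5):
        r = y - X @ b
        return delta * np.abs(r).sum() + (1.0 - delta) * (r ** 2).sum()

    fit = minimize(combined_loss, x0=np.zeros(2), method="Nelder-Mead")
    print("combined LAV/LS estimate:", np.round(fit.x, 3))   # near (2.0, 1.0)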

Rao [105] examined extensions of LAV estimation to a multivariate linear model. Mathew and Nordström [169] examined procedures for a linear model with an additive bias vector that is bounded; they called this an approximately linear model. When the regression coefficients are estimated by minimizing the maximum of a weighted sum of squared deviations, the criterion to be minimized is a linear combination of the LS and LAV criteria for the ideal linear model. When the regression coefficients are estimated by minimizing the maximum of a weighted sum of absolute deviations, the estimate turns out to be independent of the assumed bounds.

Narula and Karhonen [170] suggest the LAV criterion for estimating parameters in multivariate regression models. The problem can be formulated as a multiple objective linear programming problem.

Morgenthaler [171] examined the consequences of replacing LS by LAV in the derivation of quasi-likelihoods. The LAV-type criterion was found to be applicable and to lead to alternatives to maximum likelihood fits.

Puig and Stephens [172] studied goodness-of-fit tests for the Laplace distribution. Asymptotic theory is given, as well as critical values derived from Monte Carlo simulations for finite samples. Power studies suggest that the Watson U² statistic is the most powerful. Examples using LAV regression are provided. They stress the importance of such tests in conjunction with LAV estimation, since the LAV estimator is maximum likelihood when the disturbances follow a Laplace distribution.
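That maximum likelihood connection is a one-line calculation. For i.i.d. Laplace disturbances with scale sigma, the log-likelihood of the linear model is (a standard derivation, stated here for completeness rather than taken from the cited paper):

    \ell(\beta,\sigma) \;=\; -n\log(2\sigma) \;-\; \frac{1}{\sigma}\sum_{i=1}^{n}\bigl|y_i - x_i^{\top}\beta\bigr|,

so for any fixed \sigma > 0, maximizing \ell over \beta is equivalent to minimizing \sum_i |y_i - x_i^{\top}\beta|, which is exactly the LAV criterion.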


Bradu [173] provides a sufficient condition for the exact fit property (EFP) which is easier to check than the condition discussed in Ellis and Morgenthaler [76], and discusses the use of the EFP for outlier identification in specific circumstances. Dodge [174] proposed algorithms for detecting outliers in both the y- and x-variables. The algorithms utilize regressions involving the y-variable and each x-variable individually as the dependent variable.

Morgenthaler [175] discusses the behavior of residuals from LAV linear models fitted to designed experiments. Shortcomings of the use of LAV residuals are noted. Sheather and McKean [176] examined the usefulness of residual plots from LAV regression. Since the LAV residuals and the fitted values can be negatively correlated, cases exist where examination of residual plots may lead to judgments of model inadequacy for properly chosen models.

Hurvich and Tsai [177] developed a criterion for the selection of LAV regression models. A new measure specifically oriented toward LAV regression is developed and compared to their corrected Akaike Information Criterion (cAIC). Both measures perform well, although the cAIC is computationally less intensive.

Hušková [178] proposed L1-type test procedures for detection of a change in linear models. The model examined is one where the regression parameters take on specific values up to a certain point in time and change from that time point on. The problem is to test whether the change point occurs at a certain time period.

References

[1] Dielman, T.E., 1984, Least absolute value estimation in regression models: An annotated bibliography. Communications in Statistics - Theory and Methods, 4, 513-541.
[2] Dodge, Y., 1987, An introduction to L1-norm based statistical data analysis. Computational Statistics and Data Analysis, 5, 239-253.
[3] Narula, S.C., 1987, The minimum sum of absolute errors regression. Journal of Quality Technology, 19, 37-45.
[4] Pynnönen, S. and Salmi, T., 1994, A report on least absolute deviation regression with ordinary linear programming. Liiketaloudellinen Aikakauskirja, 43, 36-49.
[5] Birkes, D. and Dodge, Y., 1993, Alternative Methods of Regression (New York: John Wiley).
[6] Bloomfield, P. and Steiger, W., 1983, Least Absolute Deviations: Theory, Applications, and Algorithms (Boston: Birkhäuser).
[7] Sposito, V.A., 1989, Linear Programming with Statistical Applications (Ames, Iowa: Iowa State University Press).
[8] Farebrother, R.W., 1999, Fitting Linear Relationships: A History of the Calculus of Observations 1750-1900 (New York: Springer) (Springer Series in Statistics).
[9] Wagner, H.M., 1959, Linear programming techniques for regression analysis. Journal of the American Statistical Association, 54, 206-212.
[10] Boscovich, R.J., 1757, De litteraria expeditione per pontificiam ditionem, et synopsis amplioris operis, ac habentur plura ejus ex exemplaria etiam sensorum impressa. Bononiensi Scientiarum et Artum Instituto Atque Academia Commentarii, 4, 353-396.
[11] Boscovich, R.J., 1760, De recentissimis graduum dimensionibus, et figura, ac magnitudine terrae inde derivanda. Philosophiae Recentioris, a Benedicto Stay in Romano Archigymnasio Publico Eloquentiae Professore, versibus traditae, Libri X, cum adnotationibus et Supplementis P. Rogerii Joseph Boscovich, S.J., 2, 406-426.
[12] Gauss, C.F., 1809, Theoria Motus Corporum Coelestium in Sectionibus Conicis Solem Ambientium. Hamburg: Perthes et Besser. Translated, 1857, as Theory of Motion of the Heavenly Bodies Moving about the Sun in Conic Sections, trans. Davis, C.H., Little, Brown, Boston. Reprinted, 1963 (New York: Dover).
[13] Gauss, C.F., 1823, Theoria Combinationis Observationum Erroribus Minimis Obnoxiae. Commentationes Societatis Regiae Scientiarum Gottingensis Recentiores, 5; German summary, Göttingische Gelehrte Anzeigen (1820), 321-327 and (1823), 313-318.
[14] Gauss, C.F., 1828, Supplementum Theoriae Combinationis Observationum Erroribus Minimis Obnoxiae. Commentationes Societatis Regiae Scientiarum Gottingensis Recentiores, 6; German summary, Göttingische Gelehrte Anzeigen (1826), 1521-1527.
[15] Laplace, P.S., 1812, Théorie Analytique des Probabilités, Mme Courcier, Paris, 1812. (Reprinted as Oeuvres Complètes de Laplace, 7, Gauthier-Villars, Paris, 1847.)
[16] Farebrother, R.W., 1987, The historical development of the L1 and L∞ estimation procedures: 1793-1930. In: Dodge, Y. (Ed.), Statistical Data Analysis Based on the L1-Norm and Related Methods (Amsterdam: North-Holland), pp. 37-63.
[17] Farebrother, R.W., 1990, Studies in the history of probability and statistics XLII: Further details of contacts between Boscovich and Simpson in June 1760. Biometrika, 77, 397-400.


[18] Farebrother, R.W., 1993, Boscovich's method for correcting discordant observations. In: Bursill-Hall, P. (Ed.), R. J. Boscovich: His Life and Scientific Work (Rome: Instituto della Encyclopedia Italiana), pp. 255-261.
[19] Farebrother, R.W., 1997, Notes on the early history of elemental set methods. L1-Statistical Procedures and Related Topics, IMS Lecture Notes - Monograph Series, 31, 161-170.
[20] Koenker, R., 2000, Galton, Edgeworth, Frisch, and prospects for quantile regression in econometrics. Journal of Econometrics, 95, 347-374.
[21] Bowley, A.L., 1902, Applications to wage statistics and other groups. Journal of the Royal Statistical Society, 65, 331-342.
[22] Rousseeuw, P. and Hubert, M., 1998, Regression depth. Journal of the American Statistical Association, 94, 388-402.
[23] Frisch, R., 1956, La résolution des problèmes de programme linéaire par la méthode du potentiel logarithmique. Cahiers du Séminaire d'Économétrie, 4, 7-20.
[24] Stigler, S.M., 1986, The History of Statistics: The Measurement of Uncertainty before 1900 (Cambridge, MA: The Belknap Press of Harvard University).
[25] Stigler, S.M., 1984, Studies in the history of probability and statistics XL: Boscovich, Simpson and a 1760 manuscript note on fitting a linear relation. Biometrika, 71, 615-620.
[26] Charnes, A., Cooper, W.W. and Ferguson, R., 1955, Optimal estimation of executive compensation by linear programming. Management Science, 1, 138-151.
[27] Barrodale, I. and Roberts, F.D.K., 1973, An improved algorithm for discrete l1 linear approximation. SIAM Journal on Numerical Analysis, 10, 839-848.
[28] Barrodale, I. and Roberts, F.D.K., 1974, Algorithm 478: Solution of an overdetermined system of equations in the l1 norm. Communications of the ACM, 17, 319-320.
[29] Armstrong, R.D. and Kung, M.T., 1978, Algorithm AS 132: Least absolute values estimates for a simple linear regression problem. Applied Statistics, 27, 363-366.
[30] Bloomfield, P. and Steiger, W., 1980, Least absolute deviations curve-fitting. SIAM Journal on Scientific and Statistical Computing, 1, 290-301.
[31] Armstrong, R.D., Frome, E.L. and Kung, D.S., 1979, A revised simplex algorithm for the absolute deviation curve-fitting problem. Communications in Statistics - Simulation and Computation, B8, 175-190.
[32] Josvanger, L.A. and Sposito, V.A., 1983, L1-norm estimates for the simple regression problem. Communications in Statistics - Simulation and Computation, 12, 215-221.
[33] Gentle, J.E., Narula, S.C. and Sposito, V.A., 1987, Algorithms for unconstrained L1 linear regression. In: Dodge, Y. (Ed.), Statistical Data Analysis Based on the L1-Norm and Related Methods (Amsterdam: North-Holland), pp. 83-94.
[34] Dielman, T.E. and Pfaffenberger, R., 1984, Computational algorithms for calculating least absolute value and Chebyshev estimates for multiple regression. American Journal of Mathematical and Management Sciences, 4, 169-197.
[35] Abdelmalek, N.N., 1980, L1 solution of overdetermined systems of linear equations. ACM Transactions on Mathematical Software, 6, 220-227.
[36] Gentle, J.E., Sposito, V.A. and Narula, S.C., 1988, Algorithms for unconstrained L1 simple linear regression. Computational Statistics and Data Analysis, 6, 335-339.
[37] Narula, S.C., Sposito, V.A. and Gentle, J.E., 1991, Comparison of computer programs for simple linear L1 regression. Journal of Statistical Computation and Simulation, 39, 63-68.
[38] Sunwoo, H.S. and Kim, B.C., 1987, L1 estimation for the simple linear regression model. Communications in Statistics - Theory and Methods, 16, 1703-1715.
[39] Soliman, S.A., Christensen, G.S. and Rouhi, A.H., 1988, A new technique for curve fitting based on minimum absolute deviations. Computational Statistics and Data Analysis, 6, 341-351.
[40] Herce, M.A., 1990, An example showing that a new technique for LAV estimation breaks down in certain cases. Computational Statistics and Data Analysis, 9, 197-202.
[41] Christensen, G.S., Soliman, S.A. and Rouhi, A., 1990, Discussion of an example showing that a new technique for LAV estimation breaks down in certain cases. Computational Statistics and Data Analysis, 9, 203-213.
[42] Herce, M.A., 1990, Some comments on Christensen, Soliman and Rouhi's Discussion. Computational Statistics and Data Analysis, 9, 215-216.
[43] Bassett, G.W. and Koenker, R.W., 1992, A note on recent proposals for computing l1 estimates. Computational Statistics and Data Analysis, 14, 207-211.
[44] Dielman, T.E., 1992, Computational algorithms for least absolute value regression. In: Dodge, Y. (Ed.), L1-Statistical Analyses and Related Methods (Amsterdam: North-Holland), pp. 311-326.
[45] Seneta, E. and Steiger, W.L., 1984, A new LAD curve fitting algorithm: Slightly over-determined equation systems in L1. Discrete Applied Mathematics, 7, 79-91.
[46] Farebrother, R.W., 1988, A simple recursive procedure for the L1 norm fitting of a straight line. Applied Statistics, 37, 457-465.
[47] Sadovski, A.N., 1974, Algorithm AS 74: L1-norm fit of a straight line. Applied Statistics, 23, 244-248.
[48] Farebrother, R.W., 1992, Least squares initial values for the L1-norm fitting of a straight line - a remark on Algorithm AS 238: A simple recursive procedure for the L1 norm fitting of a straight line. Applied Statistics.
[49] Rech, P., Schmidbauer, P. and Eng, J., 1989, Least absolute regression revisited: A simple labeling method for finding a LAR line. Communications in Statistics - Simulation and Computation, 18, 943-955.


[50] Madsen, K. and Nielsen, H.B., 1993, A finite smoothing algorithm for linear l1 estimation. SIAM Journal on Optimization, 3, 223-235.
[51] Hong, C.S. and Choi, H.J., 1997, On L1 regression coefficients. Communications in Statistics - Simulation and Computation, 26, 531-537.
[52] Sklar, M.G., 1988, Extensions to a best subset algorithm for least absolute value estimation. American Journal of Mathematical and Management Sciences, 8, 1-58.
[53] Narula, S.C. and Wellington, J.F., 1988, An efficient algorithm for the MSAE and MMAE regression problems. SIAM Journal on Scientific and Statistical Computing, 9, 717-727.
[54] Planitz, M. and Gates, J., 1991, Strict discrete approximation in the L1 and L∞ norms. Applied Statistics, 40, 113-122.
[55] Adcock, C.J. and Meade, N., 1997, A comparison of two LP solvers and a new IRLS algorithm for L1 estimation. L1-Statistical Procedures and Related Topics, IMS Lecture Notes - Monograph Series, 31, 119-132.
[56] Portnoy, S. and Koenker, R., 1997, The Gaussian hare and the Laplacian tortoise: Computability of squared-error versus absolute-error estimators. Statistical Science, 12, 279-296 (comments and rejoinder, 296-300).
[57] Coleman, T.F. and Li, Y., 1992, A globally and quadratically convergent affine scaling method for linear L1 problems. Mathematical Programming, 56, 189-222.
[58] Koenker, R., 1997, L1 computation: An interior monologue. L1-Statistical Procedures and Related Topics, IMS Lecture Notes - Monograph Series, 31, 15-32.
[59] Portnoy, S., 1997, On computation of regression quantiles: Making the Laplacian tortoise faster. L1-Statistical Procedures and Related Topics, IMS Lecture Notes - Monograph Series, 31, 187-200.
[60] Bassett, G.W. and Koenker, R.W., 1978, Asymptotic theory of least absolute error regression. Journal of the American Statistical Association, 73, 618-622.
[61] Koenker, R. and Bassett, G., 1985, On Boscovich's estimator. The Annals of Statistics, 13, 1625-1628.
[62] Phillips, P.C.B., 1991, A shortcut to LAD estimator asymptotics. Econometric Theory, 7, 450-463.
[63] Pollard, D., 1991, Asymptotics for least absolute deviation regression estimators. Econometric Theory, 7, 186-199.
[64] Wu, Y., 1988, Strong consistency and exponential rate of the minimum L1-norm estimates in linear regression models. Computational Statistics and Data Analysis, 6, 285-295.
[65] Bai, Z.D. and Wu, Y., 1997, On necessary conditions for the weak consistency of minimum L1-norm estimates in linear models. Statistics & Probability Letters, 34, 193-199.
[66] Chen, X.R., Zhao, L. and Wu, Y., 1993, On conditions of consistency of ML1N estimates. Statistica Sinica, 3, 9-18.
[67] Chen, X.R. and Wu, Y., 1993, On a necessary condition for the consistency of the L1 estimates in linear regression models. Communications in Statistics - Theory and Methods, 22, 631-639.
[68] Chen, X.R., Wu, Y. and Zhao, L., 1995, A necessary condition for the consistency of L1 estimates in linear models. Sankhyā: The Indian Journal of Statistics, Series A, 57, 384-392.
[69] Andrews, D.W.K., 1986, A note on the unbiasedness of feasible GLS, quasi-maximum likelihood, robust, adaptive, and spectral estimators of the linear model. Econometrica, 54, 687-698.
[70] Farebrother, R.W., 1985, Unbiased L1 and L∞ estimation. Communications in Statistics - Theory and Methods, 14, 1941-1962.
[71] Withers, C.S., 1987, The bias and skewness of L1-estimates in regression. Computational Statistics and Data Analysis, 5, 301-303.
[72] Bassett, G.W., 1988, A p-subset property of L1 and regression quantile estimates. Computational Statistics and Data Analysis, 6, 297-304.
[73] Bai, J., 1995, Least absolute deviation estimation of a shift. Econometric Theory, 11, 403-436.
[74] Caner, M., 2002, A note on least absolute deviation estimation of a threshold model. Econometric Theory, 18, 800-814.
[75] He, X., Jurečková, J., Koenker, R. and Portnoy, S., 1990, Tail behavior of regression estimators and their breakdown points. Econometrica, 58, 1195-1214.
[76] Ellis, S.P. and Morgenthaler, S., 1992, Leverage and breakdown in L1 regression. Journal of the American Statistical Association, 87, 143-148.
[77] Ellis, S.P., 1998, Instability of least squares, least absolute deviation and least median of squares linear regression. Statistical Science, 13, 337-344.
[78] Portnoy, S. and Mizera, I., 1998, Comment. Statistical Science, 13, 344-347.
[79] Ellis, S.P., 1998, Rejoinder. Statistical Science, 13, 347-350.
[80] Pfaffenberger, R.C. and Dielman, T.E., 1989, A comparison of regression estimators when both multicollinearity and outliers are present. In: Lawrence, K.D. and Arthur, J.L. (Eds.), Robust Regression: Analysis and Applications (New York: Marcel Dekker, Inc.), pp. 243-270.
[81] Lind, J.C., Mehra, K.L. and Sheahan, J.N., 1992, Asymmetric errors in linear models: Estimation-theory and Monte Carlo. Statistics, 23, 305-320.
[82] McDonald, J.B. and White, S.B., 1993, A comparison of some robust, adaptive, and partially adaptive estimators of regression models. Econometric Reviews, 12, 103-124.
[83] Coursey, D. and Nyquist, H., 1983, On least absolute error estimation of linear regression models with dependent stable residuals. The Review of Economics and Statistics, 65, 687-692.
[84] Weiss, A.A., 1990, Least absolute error estimation in the presence of serial correlation. Journal of Econometrics, 44, 127-158.


[85] Davis, R.A. and Dunsmuir, W.T.M., 1997, Least absolute deviation estimation for regression with ARMA errors. Journal of Theoretical Probability, 10, 481-497.
[86] Nyquist, H., 1992, L1-norm estimation of regression models with serially dependent error terms. In: Dodge, Y. (Ed.), L1-Statistical Analysis and Related Methods (Amsterdam: North-Holland), pp. 253-264.
[87] Dielman, T.E. and Rose, E.L., 1994, Estimation in least absolute value regression with autocorrelated errors. Journal of Statistical Computation and Simulation, 50, 29-43.
[88] Dielman, T.E. and Rose, E.L., 1995, Estimation after pre-testing in least absolute value regression with autocorrelated errors. Journal of Business and Management, 2, 74-95.
[89] Nyquist, H., 1997, A Lagrange multiplier approach to testing for serially dependent error terms. L1-Statistical Procedures and Related Topics, IMS Lecture Notes - Monograph Series, 31, 329-336.
[90] Dielman, T.E., 1986, A comparison of forecasts from least absolute value and least squares regression. Journal of Forecasting, 5, 189-195.
[91] Dielman, T.E., 1989, Corrections to: A comparison of forecasts from least absolute value and least squares regression. Journal of Forecasting, 8, 419-420.
[92] Dielman, T.E. and Rose, E.L., 1994, Forecasting in least absolute value regression with autocorrelated errors: A small-sample study. International Journal of Forecasting, 10, 539-547.
[93] Koenker, R. and Bassett, G., 1982, Tests of linear hypotheses and L1 estimation. Econometrica, 50, 1577-1583.
[94] Bai, Z.D., Rao, C.R. and Yin, Y.Q., 1990, Least absolute deviations analysis of variance. Sankhyā: The Indian Journal of Statistics, Series A, 52, 166-177.
[95] McKean, J. and Schrader, R., 1984, A comparison of methods for studentizing the sample median. Communications in Statistics - Simulation and Computation, 13, 751-773.
[96] McKean, J. and Schrader, R., 1987, Least absolute errors analysis of variance. In: Dodge, Y. (Ed.), Statistical Data Analysis Based on the L1-Norm and Related Methods (Amsterdam: North-Holland), pp. 297-305.
[97] Sposito, V.A. and Tveite, M.D., 1986, On the estimation of the variance of the median used in L1 linear inference procedures. Communications in Statistics - Theory and Methods, 15, 1367-1375.
[98] Sheather, S.J., 1987, Assessing the accuracy of the sample median: Estimated standard errors versus interpolated confidence intervals. In: Dodge, Y. (Ed.), Statistical Data Analysis Based on the L1-Norm and Related Methods (Amsterdam: North-Holland), pp. 203-215.
[99] Dielman, T.E. and Pfaffenberger, R., 1990, Tests of linear hypotheses in LAV regression. Communications in Statistics - Simulation and Computation, 19, 1179-1199.
[100] Dielman, T.E. and Pfaffenberger, R., 1992, A further comparison of tests of hypotheses in LAV regression. Computational Statistics and Data Analysis, 14, 375-384.
[101] Dielman, T.E. and Rose, E.L., 1995, A bootstrap approach to hypothesis testing in least absolute value regression. Computational Statistics and Data Analysis, 20, 119-130.
[102] Dielman, T.E. and Rose, E.L., 1996, A note on hypothesis testing in LAV multiple regression: A small-sample comparison. Computational Statistics and Data Analysis, 21, 463-470.
[103] Liu, Z.J., 1992, Non-parametric estimates of the nuisance parameter in the LAD tests. Communications in Statistics - Theory and Methods, 21, 861-881.
[104] Niemiro, W., 1995, Estimation of nuisance parameters for inference based on least absolute deviations. Applicationes Mathematicae, 22, 515-529.
[105] Rao, C.R., 1988, Methodology based on the L1-norm in statistical inference. Sankhyā: The Indian Journal of Statistics, Series A, 50, 289-313.
[106] Efron, B., 1979, Bootstrap methods: Another look at the jackknife. Annals of Statistics, 7, 1-26.
[107] Hall, P. and Wilson, S.R., 1991, Two guidelines for bootstrap hypothesis testing. Biometrics, 47, 757-762.
[108] De Angelis, D., Hall, P. and Young, G.A., 1993, Analytical and bootstrap approximations to estimator distributions in l1 regression. Journal of the American Statistical Association, 88, 1310-1316.
[109] Koenker, R., 1987, A comparison of asymptotic testing methods for l1-regression. In: Dodge, Y. (Ed.), Statistical Data Analysis Based on the L1-Norm and Related Methods (Amsterdam: North-Holland), pp. 287-295.
[110] Schrader, R.M. and McKean, J.W., 1987, Small-sample properties of least absolute errors analysis of variance. In: Dodge, Y. (Ed.), Statistical Data Analysis Based on the L1-Norm and Related Methods (Amsterdam: North-Holland), pp. 307-321.
[111] Dielman, T.E. and Pfaffenberger, R., 1988, Bootstrapping in least absolute value regression: An application to hypothesis testing. Communications in Statistics - Simulation and Computation, 17, 843-856.
[112] Zhang, J. and Boos, D.D., 1994, Adjusted power estimates in Monte Carlo experiments. Communications in Statistics - Simulation and Computation, 23, 165-173.
[113] Dielman, T.E. and Rose, E.L., 1997, Estimation and testing in least absolute value regression with serially correlated disturbances. Annals of Operations Research, 74, 239-257.
[114] Stangenhaus, G., 1987, Bootstrap and inference procedures for L1 regression. In: Dodge, Y. (Ed.), Statistical Data Analysis Based on the L1-Norm and Related Methods (Amsterdam: North-Holland), pp. 323-332.
[115] Stangenhaus, G. and Narula, S.C., 1991, Inference procedures for the L1 regression. Computational Statistics and Data Analysis, 12, 79-85.
[116] Stangenhaus, G., Narula, S.C. and Ferreira, P., 1991, Bootstrap confidence intervals for the minimum sum of absolute errors regression. Journal of Statistical Computation and Simulation, 46, 127-133.
[117] Dielman, T.E. and Pfaffenberger, R., 1988, Least absolute value regression: Necessary sample sizes to use normal theory inference procedures. Decision Sciences, 19, 734-743.


[118] Gutenbrunner, C., Jurečková, J., Koenker, R. and Portnoy, S., 1993, Tests of linear hypotheses based on regression rank scores. Nonparametric Statistics, 2, 307-331.
[119] Cade, B.S. and Richards, J.D., 1996, Permutation tests for least absolute deviation regression. Biometrics, 52, 886-902.
[120] Horowitz, J.L., 1998, Bootstrap methods for median regression models. Econometrica, 66, 1327-1351.
[121] Weiss, A.A., 1988, A comparison of ordinary least squares and least absolute error estimation. Econometric Theory, 4, 511-521.
[122] Furno, M., 2000, LM tests in the presence of non-normal error distributions. Econometric Theory, 16, 249-261.
[123] An, H. and Chen, Z., 1982, On convergence of LAD estimates in autoregression with infinite variance. Journal of Multivariate Analysis, 12, 335-345.
[124] Dunsmuir, W.T.M. and Spencer, N.M., 1991, Strong consistency and asymptotic normality of l1 estimates of the autoregressive moving-average model. Journal of Time Series Analysis, 12, 95-104.
[125] Dunsmuir, W.T.M., 1992, A simulation study of l1 estimation of a seasonal moving average time-series model. Communications in Statistics - Simulation and Computation, 21, 519-531.
[126] Dunsmuir, W.T.M. and Murtagh, B.A., 1993, Least absolute deviation estimation of stationary time-series models. European Journal of Operational Research, 67, 272-277.
[127] Olsen, E.T. and Ruzinsky, S.A., 1989, Characterization of the LAD (L1) AR parameter estimator when applied to stationary ARMA, MA, and higher order AR processes. IEEE Transactions on Acoustics, Speech, and Signal Processing, 37, 1451-1454.
[128] Ruzinsky, S.A. and Olsen, E.T., 1989, Strong consistency of the LAD (L1) estimator of parameters of stationary autoregressive processes with zero mean. IEEE Transactions on Acoustics, Speech, and Signal Processing, 37, 597-600.
[129] Pino, F.A. and Morettin, P.A., 1993, The consistency of the L1-norm estimates in ARMA models. Communications in Statistics - Theory and Methodology, 22, 2185-2206.
[130] Knight, K., 1997, Some limit theory for L1-estimators in autoregressive models under general conditions. L1-Statistical Procedures and Related Topics, IMS Lecture Notes - Monograph Series, 31, 315-328.
[131] Knight, K., 1998, Limiting distributions for L1 regression estimators under general conditions. Annals of Statistics, 26, 755-770.
[132] Herce, M.A., 1996, Asymptotic theory of LAD estimation in a unit root process with finite variance errors. Econometric Theory, 12, 129-153.
[133] Rogers, A.J., 2001, Least absolute deviations regression under nonstandard conditions. Econometric Theory, 17, 820-852.
[134] Gonin, R. and Money, A.H., 1987, A review of computational methods for solving the non-linear L1-norm estimation problem. In: Dodge, Y. (Ed.), Statistical Data Analysis Based on the L1-Norm and Related Methods (Amsterdam: North-Holland), pp. 117-129.
[135] Gonin, R. and Money, A.H., 1989, Nonlinear Lp-Norm Estimation (New York: Marcel Dekker Inc.).
[136] Oberhofer, W., 1982, The consistency of non-linear regression minimizing the L1-norm. Annals of Statistics, 10, 316-319.
[137] Richardson, G.D. and Bhattacharyya, B.B., 1987, Consistent L1-estimators in non-linear regression for a noncompact parameter space. Sankhyā: The Indian Journal of Statistics, Series A, 49, 377-387.
[138] Soliman, S.A., Christensen, G.S. and Rouhi, A., 1991, A new algorithm for nonlinear L1-norm minimization with nonlinear equality constraints. Computational Statistics and Data Analysis, 11, 97-109.
[139] Weiss, A.A., 1991, Estimating non-linear dynamic models using least absolute error estimation. Econometric Theory, 7, 46-68.
[140] Shi, P. and Li, G., 1994, Asymptotics of the minimum L1-norm estimates in a partly linear model. Systems Science and Mathematical Sciences, 7, 67-77.
[141] Shi, P. and Li, G., 1994, On the rates of convergence of minimum L1-norm estimates in a partly linear model. Communications in Statistics - Theory and Methods, 23, 175-196.
[142] Kim, H.K. and Choi, S.H., 1995, Asymptotic properties of non-linear least absolute deviation estimators. Journal of the Korean Statistical Society, 24, 127-139.
[143] André, C.D.S., Narula, S.C., Peres, C.A. and Ventura, G.A., 1997, Asymptotic properties of the minimum sum of absolute errors estimators in a dose-response model. Journal of Statistical Computation and Simulation, 58, 361-379.
[144] Zwanzig, S., 1997, On L1-norm estimators in nonlinear regression and in nonlinear error-in-variables models. L1-Statistical Procedures and Related Topics, IMS Lecture Notes - Monograph Series, 31, 101-118.
[145] Coleman, J.W. and Larsen, J.E., 1991, Alternative estimation techniques for linear appraisal models. The Appraisal Journal, 59, 526-532.
[146] Corrado, C.J. and Schatzberg, J.D., 1991, Estimating systematic risk with daily security returns: A note on the relative efficiency of selected estimators. The Financial Review, 26, 587-599.
[147] Chan, L.K.C. and Lakonishok, J., 1992, Robust measurement of beta risk. Journal of Financial and Quantitative Analysis, 27, 265-282.
[148] Butler, R.J., McDonald, J.B., Nelson, R.D. and White, S.B., 1990, Robust and partially adaptive estimation of regression models. The Review of Economics and Statistics, 72, 321-327.
[149] Draper, P. and Paudyal, K., 1995, Empirical irregularities in the estimation of beta: The impact of alternative estimation assumptions and procedures. Journal of Business Finance & Accounting, 22, 157-177.


[150] Mills, T.C., Coutts, J.A. and Roberts, J., 1996, Misspecification testing and robust estimation of the market model and their implications for event studies. Applied Economics, 28, 559-566.
[151] Bassett, G., 1997, Robust sports ratings based on least absolute errors. The American Statistician, 51, 99-105.
[152] Kærgård, N., 1987, Estimation criterion, residuals and prediction evaluation. Computational Statistics and Data Analysis, 5, 443-450.
[153] Sengupta, J.K. and Okamura, K., 1993, Scale economies in manufacturing: Problems of robust estimation. Empirical Economics, 18, 469-480.
[154] Westland, A.H., 1989, Robust estimation and prediction of economic systems: The case of partial structural variability. Quality & Quantity, 23, 61-73.
[155] Gonin, R. and Money, A.H., 1987, Outliers in physical processes: L1- or adaptive Lp-norm estimation? In: Dodge, Y. (Ed.), Statistical Data Analysis Based on the L1-Norm and Related Methods (Amsterdam: North-Holland), pp. 447-454.
[156] Roy, T., 1993, Estimating shelf-life using L1 regression methods. Journal of Pharmaceutical & Biomedical Analysis, 11, 843-845.
[157] Rousseeuw, P.J. and van Zomeren, B.C., 1992, A comparison of some quick algorithms for robust regression. Computational Statistics and Data Analysis, 14, 107-116.
[158] Powell, J.L., 1983, The asymptotic normality of two-stage least absolute deviations estimators. Econometrica, 51, 1569-1575.
[159] Powell, J.L., 1984, Least absolute deviations estimation for the censored regression model. Journal of Econometrics, 25, 303-325.
[160] Rao, C.R. and Zhao, L.C., 1993, Asymptotic normality of LAD estimator in censored regression models. Mathematical Methods of Statistics, 2, 228-239.
[161] Koenker, R. and Portnoy, S., 1990, M estimation of multivariate regressions. Journal of the American Statistical Association, 85, 1060-1068.
[162] Honoré, B.E., 1992, Trimmed LAD and least squares estimation of truncated and censored regression models with fixed effects. Econometrica, 60, 533-565.
[163] Wang, F.T. and Scott, D.W., 1994, The L1 method for robust non-parametric regression. Journal of the American Statistical Association, 89, 65-76.
[164] Dodge, Y., 1984, Robust estimation of regression coefficients by minimizing a convex combination of least squares and least absolute deviations. Computational Statistics Quarterly, 1, 139-153.
[165] Dodge, Y. and Jurečková, J., 1987, Adaptive combination of least squares and least absolute deviations estimators. In: Dodge, Y. (Ed.), Statistical Data Analysis Based on the L1-Norm and Related Methods (Amsterdam: North-Holland), pp. 275-284.
[166] Dodge, Y., Antoch, J. and Jurečková, J., 1991, Computational aspects of adaptive combination of least squares and least absolute deviations estimators. Computational Statistics and Data Analysis, 12, 87-99.
[167] Dodge, Y. and Jurečková, J., 1988, Adaptive combination of M-estimator and L1-estimator in the linear model. In: Dodge, Y., Fedorov, V.V. and Wynn, H.P. (Eds.), Optimal Design and Analysis of Experiments (Amsterdam: Elsevier Science Publishers), pp. 167-176.
[168] Dodge, Y. and Jurečková, J., 1992, A class of estimators based on adaptive convex combinations of two estimation procedures. In: Dodge, Y. (Ed.), L1-Statistical Analysis and Related Methods (Amsterdam: North-Holland), pp. 31-45.
[169] Mathew, T. and Nordström, K., 1993, Least squares and least absolute deviation procedures in approximately linear models. Statistics & Probability Letters, 16, 153-158.
[170] Narula, S.C. and Karhonen, P.J., 1994, Multivariate multiple linear regression based on the minimum sum of absolute errors criterion. European Journal of Operational Research, 73, 70-75.
[171] Morgenthaler, S., 1992, Least-absolute-deviations fits for generalized linear models. Biometrika, 79, 747-754.
[172] Puig, P. and Stephens, M.A., 2000, Tests of fit for the Laplace distribution, with applications. Technometrics, 42, 417-424.
[173] Bradu, D., 1997, Identification of outliers by means of L1 regression: Safe and unsafe configurations. Computational Statistics and Data Analysis, 24, 271-281.
[174] Dodge, Y., 1997, LAD regression for detecting outliers in response and explanatory variables. Journal of Multivariate Analysis, 61, 144-158.
[175] Morgenthaler, S., 1997, Properties of L1 residuals. L1-Statistical Procedures and Related Topics, IMS Lecture Notes - Monograph Series, 31, 79-90.
[176] Sheather, S.J. and McKean, J.W., 1992, The interpretation of residuals based on L1 estimation. In: Dodge, Y. (Ed.), L1-Statistical Analysis and Related Methods (Amsterdam: North-Holland), pp. 145-155.
[177] Hurvich, C.M. and Tsai, C., 1990, Model selection for least absolute deviations regression in small samples. Statistics & Probability Letters, 9, 259-265.
[178] Hušková, M., 1997, L1-test procedures for detection of change. L1-Statistical Procedures and Related Topics, IMS Lecture Notes - Monograph Series, 31, 57-70.


Summary of Additional Papers on Least Absolute Value Estimation Not Cited in Text

Armstrong, R.D. and Kung, M.T., 1984, A linked list data structure for a simple linear regression algorithm. Computational and Operational Research, 11, 295-305. Presents a special purpose algorithm to solve simple LAV regression problems. The algorithm is a specialization of the linear programming approach developed by Barrodale and Roberts, but requires considerably less storage.

Bassett, G., 1992, The Gauss Markov property for the median. In: Dodge, Y. (Ed.), L1-Statistical Analyses and Related Methods (Amsterdam: North-Holland), pp. 23-31. A Gauss Markov type theorem for the median is proved. The author shows that such a result says more about the restrictions on the class of estimators considered than about the optimality of the estimator.

Brennan, J.J. and Seiford, L.M., 1987, Linear programming and l1 regression: A geometric interpretation. Computational Statistics and Data Analysis, 5, 263-276. Provides a geometric interpretation of the solution process of the LAV regression problem.

Danao, R.A., 1983, Regression by minimum sum of absolute errors: A note on perfect multicollinearity. Philippine Review of Economics and Business, 20, 125-133. When there is perfect multicollinearity among the regressors in a LAV regression, the simplex algorithm will choose one maximal set of linearly independent regressors from the equation by setting the coefficients of the other variables equal to zero. In essence, the variables with zero coefficients are dropped from the equation. There will be multiple optimal solutions possible in such a case.

Dodge, Y. and Roenko, N., 1992, Stability of L1-norm regression under additional observations. Computational Statistics and Data Analysis, 14, 385-390. A test is provided to determine whether the introduction of an additional observation will lead to a new set of LAV regression estimates or whether the original solution remains optimal.

Dupačová, J., 1992, Robustness of L1 regression in the light of linear programming. In: Dodge, Y. (Ed.), L1-Statistical Analysis and Related Methods (Amsterdam: North-Holland), pp. 47-61. Uses linear programming results to examine the behavior of LAV estimates in the linear regression model. Properties of LAV estimates are explained through the use of LP.

Farebrother, R.W., 1987, Mechanical representations of the L1 and L2 estimation problems. In: Dodge, Y. (Ed.), Statistical Data Analysis Based on the L1-Norm and Related Methods (Amsterdam: North-Holland), pp. 455-464.

Ha, C.D. and Narula, S.C., 1989, Perturbation analysis for the minimum sum of absolute errors regression. Communications in Statistics - Simulation and Computation, 18, 957-970. Used sensitivity analysis to investigate the amount by which the values of the response variable for the non-defining observations (those with nonzero residuals) can change without changing the parameter estimates in a LAV regression.

Harris, T., 1950, Regression using minimum absolute deviations. American Statistician, 4, 14-15. Brief discussion of LAV regression as an answer to a contributed question.

Huber, P., 1987, The place of the L1-norm in robust estimation. Computational Statistics and Data Analysis, 5, 255-262. Discussed the place of the LAV estimator in robust estimation. Huber states the two main purposes of LAV estimation as (1) providing estimates with minimal bias if the observations are asymmetrically contaminated and (2) furnishing convenient starting values for estimates based on iterative procedures.

Koenker, R. and Bassett, G.W., 1984, Four (pathological) examples in asymptotic statistics. The American Statistician, 38, 209-212. The authors present four examples illustrating varieties of pathological asymptotic behavior. The examples are presented in the context of LAV regression. The article provides insight into the consequences of the failure of standard conditions.

McConnell, C.R., 1987, On computing a best discrete L1 approximation using the method of vanishing Jacobians. Computational Statistics and Data Analysis, 5, 277-288. Shows that the method of vanishing Jacobians can be used to solve the L1 linear programming problem.

McKean, J.W. and Sievers, G.L., 1987, Coefficients of determination for least absolute deviation analysis. Statistics & Probability Letters, 5, 49-54. Desirable properties for a coefficient of determination are developed and then possible choices in the case of LAV regression are discussed. A measure linked to the test statistic for significance of LAV regression coefficients seems to be a good choice.

Müller, C., 1997, L1-tests in linear models: Tests with maximum relative power. L1-Statistical Procedures and Related Topics, IMS Lecture Notes - Monograph Series, 31, 91-99. Showed that Wald-type tests on coefficients in LAV linear regression maximize the relative power (power relative to the bias).

Müller, C., 1992, L1-estimation and testing in conditionally contaminated linear models. In: Dodge, Y. (Ed.), L1-Statistical Analysis and Related Methods (Amsterdam: North-Holland), pp. 69-76. Considered regression models with disturbances that may come from different contaminated normal distributions and examined situations where LAV estimators and tests are not always the most robust.


Narula, S.C. and Wellington, J.F., 1985, Interior analysis for the minimum sum of absolute errors regression. Technometrics, 27, 181-188. Shows that there exists an interval for the value of the response variable of a nondefining observation (observation with non-zero residual) such that, if the value of the observation is in this interval, the LAV parameter estimates will not change. Also develops a procedure to determine an interval for the value of the response variable for a defining observation (observation with zero residual) such that, if the value of the response variable for the observation is in this interval, the set of defining observations does not change.

Saleh, A.K.M.E. and Sen, P.K., 1987, On the asymptotic distributional risk properties of pre-test and shrinkage L1-estimators. Computational Statistics and Data Analysis, 5, 289-299. This paper considered LAV estimation of a subset of coefficients when another subset of coefficients is included in the model but may be unnecessary. The LAV estimator, a shrinkage estimator and a pre-test estimator are examined. On the basis of asymptotic distributional risk, the shrinkage estimator may dominate the LAV estimator, but not the preliminary test estimator.
