This article was downloaded by: [Case Western Reserve University] On: 22 November 2014, At: 17:05 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK Journal of Applied Statistics Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/cjas20 Residuals for log-Burr XII regression models in survival analysis Giovana O. Silva a , Edwin M.M. Ortega a & Gilberto A. Paula b a Department of Exact Sciences , Universidade de São Paulo (USP) , Av. Pádua Dias 11 - Caixa Postal 9, Piracicaba, SP, 13418-900, Brazil b Department of Statistics , Universidade de São Paulo (USP) , Rua do Matão 1010, São Paulo, SP, 05508-090, Brazil Published online: 30 Sep 2010. To cite this article: Giovana O. Silva , Edwin M.M. Ortega & Gilberto A. Paula (2011) Residuals for log-Burr XII regression models in survival analysis, Journal of Applied Statistics, 38:7, 1435-1445, DOI: 10.1080/02664763.2010.505950 To link to this article: http://dx.doi.org/10.1080/02664763.2010.505950 PLEASE SCROLL DOWN FOR ARTICLE Taylor & Francis makes every effort to ensure the accuracy of all the information (the “Content”) contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. 
Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content. This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http://www.tandfonline.com/page/terms-and-conditions

Residuals for log-Burr XII regression models in survival analysis


Journal of Applied Statistics, Vol. 38, No. 7, July 2011, 1435–1445


Giovana O. Silva^a, Edwin M.M. Ortega^a,∗ and Gilberto A. Paula^b

^a Department of Exact Sciences, Universidade de São Paulo (USP), Av. Pádua Dias 11 - Caixa Postal 9, Piracicaba, SP 13418-900, Brazil; ^b Department of Statistics, Universidade de São Paulo (USP), Rua do Matão 1010, São Paulo, SP 05508-090, Brazil

(Received 8 September 2008; final version received 18 June 2010)

In this paper, we compare three residuals to assess departures from the error assumptions as well as to detect outlying observations in log-Burr XII regression models with censored observations. These residuals can also be used for the log-logistic regression model, which is a special case of the log-Burr XII regression model. For different parameter settings, sample sizes and censoring percentages, various simulation studies are performed and the empirical distribution of each residual is displayed and compared with the standard normal distribution. These studies suggest that the residual analysis usually performed in normal linear regression models can be straightforwardly extended to the modified martingale-type residual in log-Burr XII regression models with censored data.

Keywords: censoring data; Burr XII distribution; residual analysis; survival data analysis

1. Introduction

The assessment of the fitted model is an important part of data analysis, particularly in regression models, and residual analysis is a helpful tool to validate the fitted model. Examination of residuals can be used, for instance, to detect the presence of outlying observations, the absence of components in the systematic part of the model and departures from the error and variance assumptions. However, finding appropriate residuals in non-normal regression models has been an important topic of research, particularly under censoring. The preference would be for residuals whose empirical distribution is close to normality. Thus, many of the residual analyses usually applied in normal linear regression can be straightforwardly extended to non-normal models.

In survival analysis applications, the failure rate function frequently has a unimodal shape. In such cases, the log-normal or log-logistic regression models [7,8] are used. Recently, the log-Burr XII regression model was proposed by Silva et al. [13]. Additionally, the log-logistic regression model is a special case of the log-Burr XII regression model. So, the log-Burr XII regression model can be widely applied in the area of survival analysis as an alternative to the log-logistic and log-normal regression models. In addition, the Burr XII distributions have been applied in various survival analyses [12], for which studies can be extended. We feel that this new model will attract wider applications in reliability studies and biology as well as in other areas of research.

∗Corresponding author. Email: [email protected]

ISSN 0266-4763 print/ISSN 1360-0532 online. © 2011 Taylor & Francis. DOI: 10.1080/02664763.2010.505950. http://www.informaworld.com

In this paper, we propose a residual for the log-Burr XII regression model whose empirical distribution is close to normality. To find this residual, we perform a study considering three residuals. The lack of residual studies for the log-logistic regression model reinforces the need for the studies developed in this paper.

The first residual is the martingale residual [3]. The second residual is a martingale-type residual based on the deviance component residual for Cox's proportional hazards model with no time-dependent explanatory variables, introduced by Therneau et al. [14]. The last is a modification of the martingale-type residual. We verify, after intensive Monte Carlo studies, that the modified martingale-type residual has, for the majority of the cases studied, an empirical distribution closer to normality than the other two residuals.

The paper is organized as follows. In Section 2, the log-Burr XII regression models are presented as well as some inferential results. Additionally, a brief study is presented for the log-Burr XII distribution. Section 3 deals with the definition and discussion of the residuals and presents and comments on the results from various simulation studies. In Section 4, a real data set is analyzed and Section 5 concludes.

2. Log-Burr XII regression models

We consider the following log-Burr XII regression model:

$$y_i = x_i^T \beta + \sigma z_i, \quad i = 1, \ldots, n, \tag{1}$$

where $y_i$ is the observed log-lifetime or log-censoring time for the $i$th individual, $\beta = (\beta_1, \ldots, \beta_p)^T$ ($-\infty < \beta_j < \infty$, $j = 1, \ldots, p$) is a vector of unknown parameters, $\sigma > 0$ is an unknown scale parameter, $x_i = (x_{i1}, \ldots, x_{ip})^T$ contains values of explanatory variables and $z_i$ follows a distribution with probability density function (pdf) given by

$$f(z; k) = k\{1 + \exp(z)\}^{-(k+1)} \exp(z), \quad -\infty < z < \infty, \tag{2}$$

where $k > 0$ is an unknown parameter and $z = (y - x^T\beta)/\sigma$.

It is important to know the cumulants of the pdf (2), which are defined by the cumulant generating function. The theorem below presents the cumulant generating function for this variable.

Theorem 1 For the variable $Z$, the cumulant generating function is given by

$$K_Z(\xi) = \log(k) + \log[B(k - \xi, \xi + 1)], \quad \text{if } k > \xi,$$

where $B(a, b)$ is the complete beta function [8]. The cumulants of $Z$ are obtained by differentiating $K_Z(\xi)$, so that $(d^j/d\xi^j) K_Z(\xi)|_{\xi=0} = \kappa_j$, with $\kappa_0 = K_Z(0) = 0$. The first two cumulants are the mean and variance, respectively.

Corollary 1 The mean and variance of $Z$ are given, respectively, by

$$E(Z) = \psi(1) - \psi(k) \quad \text{and} \quad V(Z) = \psi'(1) + \psi'(k),$$

where $\psi(\cdot)$ is the digamma function [8] and $\psi'(\cdot)$ is its first derivative.
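Corollary 1 can be evaluated numerically with a few lines of code. The sketch below is only an illustration: the helper names (`digamma`, `trigamma`, `mean_z`, `var_z`) are hypothetical, and the digamma and trigamma functions are approximated with the standard recurrence plus a truncated asymptotic series rather than taken from a numerical library.

```python
import math


def digamma(x: float) -> float:
    """Approximate psi(x) for x > 0: shift x upward, then use an asymptotic series."""
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x          # recurrence psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    tail = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return acc + math.log(x) - 0.5 / x - tail


def trigamma(x: float) -> float:
    """Approximate psi'(x) for x > 0 with the same shift-and-series idea."""
    acc = 0.0
    while x < 6.0:
        acc += 1.0 / (x * x)    # recurrence psi'(x) = psi'(x + 1) + 1/x^2
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    tail = inv * inv2 * (1.0 / 6.0 - inv2 * (1.0 / 30.0 - inv2 / 42.0))
    return acc + inv + 0.5 * inv2 + tail


def mean_z(k: float) -> float:
    """E(Z) = psi(1) - psi(k), as in Corollary 1."""
    return digamma(1.0) - digamma(k)


def var_z(k: float) -> float:
    """V(Z) = psi'(1) + psi'(k), as in Corollary 1."""
    return trigamma(1.0) + trigamma(k)


# The three values of k used later in the simulation studies.
for k in (0.27, 1.0, 2.0):
    print(k, mean_z(k), var_z(k))
```

For $k = 1$ the error distribution is the standard logistic, so the mean is 0 and the variance is $\pi^2/3$, which gives a quick sanity check of the approximation.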


The third cumulant is a measure of skewness for the distribution of $Z$, in the sense that $\kappa_3 = 0$ when $Z$ is distributed symmetrically. The fourth cumulant is a measure of kurtosis for the distribution of $Z$.

Corollary 2 The third and fourth cumulants of $Z$ are, respectively, given by

$$\kappa_3 = \psi''(1) - \psi''(k)$$

and

$$\begin{aligned}
\kappa_4 ={}& 6[\psi(1) - \psi(k)]^2[\psi(k) + \psi'(1)] + 6[\psi(1) - \psi(k)]^2 g_i - 3g_i^2 \\
& - [\psi(1) - \psi(k)]\{[\Gamma''(1) + 13\psi(1)][\psi'(k) - \psi^2(k)] - 10\Gamma''(1)\psi(k) + 2\psi^3(k) + 4\Gamma'''(1)\} \\
& + \psi'''(k) + 3(\psi'(k))^2 + 3\psi'(k)\psi^3(k) + 9\psi^2(k)\psi'(k) + \psi^4(k) \\
& - 18\psi(1)\psi(k)\psi'(k) - 4\psi(1)\psi^3(k) + 6\Gamma''(1)[\psi'(k) + \psi^2(k)] - 4\Gamma'''(1)\psi(k) - \Gamma''''(1),
\end{aligned}$$

where $g_i = [\psi'(k) + \psi^2(k) - 2\psi(1)\psi(k) + \Gamma''(1)]$, $\Gamma(\cdot)$ is the gamma function, $\Gamma''(1) = \psi^2(1) + \psi'(1)$, $\Gamma'''(1) = \psi''(1) + 3\psi(1)\psi'(1) + \psi^3(1)$ and $\Gamma''''(1) = \psi'''(1) + 3\psi(1)\psi''(1) + 3\Gamma''(1)\psi'(1) + \Gamma'''(1)\psi(1)$.
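As a cross-check on Corollaries 1 and 2, the cumulants can also be read off by differentiating the cumulant generating function of Theorem 1 directly. The short derivation below assumes only the identity $B(a, b) = \Gamma(a)\Gamma(b)/\Gamma(a + b)$ and writes $\psi^{(m)}$ for the $m$th derivative of the digamma function; it is offered as a sketch rather than as a replacement for the expressions above.

$$K_Z(\xi) = \log k + \log\Gamma(k - \xi) + \log\Gamma(\xi + 1) - \log\Gamma(k + 1),$$

$$\kappa_j = \left.\frac{d^j}{d\xi^j} K_Z(\xi)\right|_{\xi = 0} = \psi^{(j-1)}(1) + (-1)^j\,\psi^{(j-1)}(k), \quad j \geq 1.$$

Setting $j = 1, 2, 3$ recovers $E(Z) = \psi(1) - \psi(k)$, $V(Z) = \psi'(1) + \psi'(k)$ and $\kappa_3 = \psi''(1) - \psi''(k)$. Setting $j = 4$ gives $\kappa_4 = \psi'''(1) + \psi'''(k)$, so the longer expression for $\kappa_4$ should reduce to this compact form.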

The analysis of cumulants is very important since cumulants of order $j \geq 3$ vanish for the normal distribution. Note that when $k = 1$, one obtains $\kappa_3 = 0$, but this is not a sufficient condition for the variable $Z$ to be symmetrically distributed. Besides, the standardized cumulants are given by

$$\rho_r = \frac{\kappa_r}{\kappa_2^{r/2}}, \quad r = 1, 2, \ldots, s. \tag{3}$$

A detailed discussion of the theory of cumulants can be found, for instance, in [9].

Consider now the censored case with the assumption of noninformative censoring and the regression model given in Equation (1). Let $y_i$ be either the observed log-lifetime or log-censoring time for the $i$th individual. The sets of individuals for which $y_i$ is a log-lifetime or a log-censoring time will be denoted by $F$ and $C$, respectively. The total log-likelihood function of the model (1) for $\theta = (k, \sigma, \beta^T)^T$ is given by

$$l(\theta) = \sum_{i \in F} \log[f(y_i; \theta)] + \sum_{i \in C} \log[S(y_i; \theta)],$$

where $f(y_i; \theta)$ and $S(y_i; \theta)$ denote the pdf and survivor function, respectively, which assume the following forms:

$$f(y_i; \theta) = \frac{k}{\sigma}\left[1 + \exp\left(\frac{y_i - x_i^T\beta}{\sigma}\right)\right]^{-(k+1)} \exp\left(\frac{y_i - x_i^T\beta}{\sigma}\right)$$

and

$$S(y_i; \theta) = \left[1 + \exp\left(\frac{y_i - x_i^T\beta}{\sigma}\right)\right]^{-k},$$

where $-\infty < y_i < \infty$. Note that when $k = 1$, the log-logistic regression model appears as a special case of the log-Burr XII regression model.

Then, the log-likelihood function for $\theta$ can be expressed as

$$l(\theta) = r\log(k) - r\log(\sigma) + \sum_{i \in F} z_i - (k + 1)\sum_{i \in F} \log[1 + \exp(z_i)] - k\sum_{i \in C} \log[1 + \exp(z_i)], \tag{4}$$

where $r$ is the number of uncensored observations (failures) and $z_i = (y_i - x_i^T\beta)/\sigma$.
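Equation (4) translates almost line by line into code. The following sketch is illustrative only: the function names and the toy inputs are assumptions, and the helper `log1pexp` is introduced here merely to evaluate $\log[1 + \exp(z)]$ without overflow. It also evaluates the log-density and log-survivor function separately, so the closed form can be compared with the generic sum over $F$ and $C$.

```python
import math


def log1pexp(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    if z > 0:
        return z + math.log1p(math.exp(-z))
    return math.log1p(math.exp(z))


def log_pdf(y, mu, sigma, k):
    """log f(y; theta) for the log-Burr XII model, with mu = x^T beta."""
    z = (y - mu) / sigma
    return math.log(k) - math.log(sigma) + z - (k + 1.0) * log1pexp(z)


def log_surv(y, mu, sigma, k):
    """log S(y; theta) = -k * log(1 + exp(z))."""
    z = (y - mu) / sigma
    return -k * log1pexp(z)


def log_likelihood(y, X, delta, beta, sigma, k):
    """Equation (4); delta[i] = 1 for a failure (i in F), 0 for a censored case (i in C)."""
    r = sum(delta)
    total = r * (math.log(k) - math.log(sigma))
    for yi, xi, di in zip(y, X, delta):
        mu = sum(b * x for b, x in zip(beta, xi))
        zi = (yi - mu) / sigma
        h = log1pexp(zi)
        total += (zi - (k + 1.0) * h) if di == 1 else (-k * h)
    return total


# Toy data (hypothetical): an intercept plus one covariate.
y = [1.2, 0.4, 2.0]
X = [[1.0, 0.3], [1.0, 0.9], [1.0, 0.5]]
delta = [1, 0, 1]
print(log_likelihood(y, X, delta, beta=[1.0, 0.5], sigma=0.5, k=2.0))
```

A function of this kind could be passed, with its sign reversed, to any general-purpose numerical optimizer to obtain the maximum-likelihood estimates.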


The maximum-likelihood estimates of the regression coefficients and of the parameters $k$ and $\sigma$ are obtained by maximizing Equation (4). We use in this step the MaxSQP subroutine in the matrix programming language available in [5]. As initial values, we have considered the estimates under the location-scale regression model based on the log-logistic model [7,8].

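The asymptotic inference for the parameter vector $\theta$ can be based on the normal approximation of the maximum-likelihood estimator (MLE) $\hat\theta$, given by $\hat\theta^T \sim N_{(p+2)}\{\theta^T; L(\theta)^{-1}\}$, where $L(\theta)$ is the $(p + 2) \times (p + 2)$ observed information matrix, obtained from

$$L(\theta) = \begin{pmatrix} L_{kk} & L_{k\sigma} & L_{k\beta_j} \\ \cdot & L_{\sigma\sigma} & L_{\sigma\beta_j} \\ \cdot & \cdot & L_{\beta_j\beta_s} \end{pmatrix}$$

with the sub-matrices given in Silva et al. [13].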

3. Residuals analysis

In order to study departures from the error assumption as well as the presence of outlying observations, we will consider the martingale residual [3,7] and transformations of this residual [10].

In addition to these residuals, another possibility is to use the deviance component residual defined by $d_i = \mathrm{sgn}(y_i - \hat\mu_i)\sqrt{2}\{l_i(\tilde\theta) - l_i(\hat\theta)\}^{1/2}$, where $\tilde\theta$ is the MLE of $\theta$ under the saturated model (with $n$ parameters), $\hat\theta$ is the MLE under the model of interest (with $p + 2$ parameters) obtained by maximizing Equation (4), and $\mathrm{sgn}(y_i - x_i^T\hat\beta)$ denotes the sign of $(y_i - x_i^T\hat\beta)$. In the log-likelihood for the saturated model, the $y_i$ are the estimates of $x_i^T\beta$ in Equation (4). However, it is not possible to estimate $\sigma$ and $k$ under the saturated model. So one should replace $k$ and $\sigma$ by their respective maximum-likelihood estimates, $\hat k$ and $\hat\sigma$, when computing the residual proposed above. For censored data, Davison and Gigli [4] defined the deviance component residual as

$$\mathrm{sign}(y_i - \hat\mu_i)\{-2\log[\hat S(y_i, \hat\theta)]\}^{1/2}, \tag{5}$$

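where $\hat S(y_i, \hat\theta)$ is the MLE of the survival function. Thus, the deviance component residual for log-Burr XII regression models can be expressed as

$$r_{DC_i} = \begin{cases}
\mathrm{sgn}(y_i - x_i^T\hat\beta)\,\sqrt{2}\left\{-(\hat k + 1)\log(2) - \left(\dfrac{y_i - x_i^T\hat\beta}{\hat\sigma}\right) + (\hat k + 1)\log\left[1 + \exp\left(\dfrac{y_i - x_i^T\hat\beta}{\hat\sigma}\right)\right]\right\}^{1/2}, & \text{if } i \in F; \\[2ex]
\mathrm{sign}(y_i - x_i^T\hat\beta)\left\{-2\log\left[\left\{1 + \exp\left(\dfrac{y_i - x_i^T\hat\beta}{\hat\sigma}\right)\right\}^{-\hat k}\right]\right\}^{1/2}, & \text{if } i \in C.
\end{cases}$$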

Numerical results not presented here showed that $\{l_i(\tilde\theta) - l_i(\hat\theta)\}$ may be negative, thus preventing computation of the corresponding deviance residual. In what follows, we do not consider the deviance residual.

McCullagh [9] showed that

$$T(Z) = 2\sqrt{k^*(z)}, \quad \text{where } k^*(z) = \sup_{\xi}\{\xi z - K_Z(\xi)\},$$

has an approximately normal distribution with mean $-\rho_3/6$ and variance $1 + (14\rho_3^2 - 9\rho_4)/36$, where $K_Z(\xi)$ is given by Theorem 1 and $\rho_3$ and $\rho_4$ are defined in Equation (3). Hence, the variable $T(Z)$ could be used as a possible residual, since it can be regarded as a transformation of the variable $Z$, which represents the error for the regression model, with the aim of obtaining a variable with standard normal distribution. However, to obtain $k^*(z)$ for the pdf given in Equation (2), it is necessary to solve the equation $-\psi(k - \xi) + \psi(\xi + 1) = z$ for $\xi$, which is very hard, discouraging the use of this residual.

3.1 Martingale-type residual

In parametric lifetime models, the martingale residual can be expressed as $r_{M_i} = \delta_i + \log[\hat S(y_i, \hat\theta)]$, with $\delta_i = 1$ indicating that the observation is uncensored and $\delta_i = 0$ indicating that the observation is censored. Thus, the martingale residual for log-Burr XII regression models assumes the form

$$r_{M_i} = \begin{cases} 1 - k\log[1 + \exp(z_i)], & \text{if } i \in F; \\ -k\log[1 + \exp(z_i)], & \text{if } i \in C, \end{cases}$$

where $z_i = (y_i - x_i^T\beta)/\sigma$.

Another possibility is to use a transformation of the martingale residual based on the deviance component residual for the Cox proportional hazards model with no time-dependent explanatory variables, as introduced by Therneau et al. [14] and defined as

$$r_{D_i} = \mathrm{sign}(r_{M_i})\{-2[r_{M_i} + \delta_i\log(\delta_i - r_{M_i})]\}^{1/2},$$

where $r_{M_i}$ is the martingale residual. Since $\delta_i - r_{M_i} = k\log[1 + \exp(z_i)]$, a martingale-type residual for log-Burr XII regression models can be expressed as

$$r_{D_i} = \begin{cases}
\mathrm{sign}\{1 - k\log[1 + \exp(z_i)]\}\{-2[1 - k\log[1 + \exp(z_i)] + \log(k\log[1 + \exp(z_i)])]\}^{1/2}, & \text{if } i \in F; \\
\mathrm{sign}\{-k\log[1 + \exp(z_i)]\}\{2k\log[1 + \exp(z_i)]\}^{1/2}, & \text{if } i \in C.
\end{cases}$$

We use this transformation in order to obtain a new residual symmetrically distributed around zero [10]. In the sequel, we discuss the results from various simulation studies of the empirical distribution of the residuals proposed in this section.
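The two residuals of this section can be computed from the standardized value $z_i$, the censoring indicator $\delta_i$ and the shape parameter $k$. The sketch below is a minimal illustration with hypothetical function names; it works with the estimated cumulative hazard $H_i = k\log[1 + \exp(z_i)] = \delta_i - r_{M_i}$ and assumes $H_i > 0$.

```python
import math


def log1pexp(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    if z > 0:
        return z + math.log1p(math.exp(-z))
    return math.log1p(math.exp(z))


def martingale_residual(z: float, delta: int, k: float) -> float:
    """r_M = delta + log S = delta - k*log(1 + exp(z))."""
    return delta - k * log1pexp(z)


def martingale_type_residual(z: float, delta: int, k: float) -> float:
    """r_D = sign(r_M) * {-2[r_M + delta*log(delta - r_M)]}^(1/2)."""
    hazard = k * log1pexp(z)              # cumulative hazard, equal to delta - r_M
    r_m = delta - hazard
    if delta == 1:
        dev = -2.0 * (r_m + math.log(hazard))
    else:
        dev = 2.0 * hazard
    # max(..., 0.0) only guards against tiny negative rounding errors
    return math.copysign(math.sqrt(max(dev, 0.0)), r_m)


# Example values (hypothetical): one failure and one censored observation.
print(martingale_residual(0.0, 1, 1.0), martingale_type_residual(0.0, 1, 1.0))
print(martingale_residual(0.0, 0, 1.0), martingale_type_residual(0.0, 0, 1.0))
```

Note that $r_{M_i}$ is bounded above by 1 while $r_{D_i}$ is unbounded in both directions, which is one reason the transformed residual is closer to symmetric.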

3.1.1 Simulation studies

In order to investigate the empirical distributions of the residuals $r_{M_i}$ and $r_{D_i}$ for the values $n = 50$ and $n = 100$, $k = 0.27$, 1.00 and 2.00, $\sigma = 0.5$ and 0.8 (for which the failure rate function is unimodal) and censoring percentages 0%, 10% and 30%, we performed a small simulation study, described in the sequel. The lifetimes denoted by $T_1, \ldots, T_n$ were generated from the Burr XII distribution given by Zimmer et al. [15], considering the reparametrization $c = 1/\sigma$ and $s = \exp(\mu)$ and assuming $\mu_i = \beta_0 + \beta_1 x_i$, with $x_i$ generated from a uniform distribution on $[0, 1]$ and with $\beta_0$ and $\beta_1$ fixed. The censoring times denoted by $C_1, \ldots, C_n$ were generated from a uniform distribution on $[0, \theta]$, where $\theta$ was adjusted until the censoring percentages 0%, 10% or 30% were reached. Table 1 presents the values of $\theta$ for each combination of $n$, $k$, $\sigma$ and censoring percentage.

The lifetimes considered in each fit were calculated as $\min\{C_i, T_i\}$. For each setting of $n$, $k$, $\sigma$ and censoring percentage, 1000 samples were generated, each one fitted under the log-Burr XII regression model (1) with $\mu_i = \beta_0 + \beta_1 x_i$. For each fit, the residuals $r_{M_i}$ and $r_{D_i}$ were calculated and stored. Then we constructed normal probability plots of the mean quantiles of the residuals against the expected quantiles of the standard normal distribution.

Table 1. Values of the parameter $\theta$ of the uniform censoring distribution.

| $\sigma$ | $k$ | Censoring 10%, $n = 50$ | Censoring 10%, $n = 100$ | Censoring 30%, $n = 50$ | Censoring 30%, $n = 100$ |
|---|---|---|---|---|---|
| 0.50 | 0.27 | 450 | 300 | 80 | 50 |
| 0.50 | 1.00 | 50 | 50 | 15 | 25 |
| 0.50 | 2.00 | 20 | 25 | 10 | 15 |
| 0.80 | 0.27 | 850 | 1500 | 150 | 200 |
| 0.80 | 1.00 | 50 | 50 | 20 | 25 |
| 0.80 | 2.00 | 20 | 20 | 10 | 10 |
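The data-generating scheme of this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the study: the function names are hypothetical, the values of $\beta_0$ and $\beta_1$ are arbitrary because the text does not report them, and the error $z_i$ is drawn by inverting the survivor function $S(z) = \{1 + \exp(z)\}^{-k}$, which is equivalent to drawing $T_i$ from a Burr XII distribution with $c = 1/\sigma$ and $s = \exp(\mu_i)$.

```python
import math
import random


def draw_z(k: float, rng: random.Random) -> float:
    """Inverse-transform draw from the pdf (2): solve 1 - (1 + e^z)^(-k) = u for z."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)   # keep u strictly inside (0, 1)
    return math.log(math.expm1(-math.log1p(-u) / k))


def simulate_sample(n, beta0, beta1, sigma, k, theta, rng):
    """Return (x, y, delta): covariates, log observed times and failure indicators."""
    xs, ys, deltas = [], [], []
    for _ in range(n):
        x = rng.random()                              # x_i ~ U[0, 1]
        t = math.exp(beta0 + beta1 * x + sigma * draw_z(k, rng))
        c = theta * (1.0 - rng.random())              # C_i uniform on (0, theta]
        xs.append(x)
        ys.append(math.log(min(t, c)))                # log of min{C_i, T_i}
        deltas.append(1 if t <= c else 0)             # 1 = failure, 0 = censored
    return xs, ys, deltas


rng = random.Random(0)
# beta0 = 1.0 and beta1 = 0.5 are hypothetical choices for illustration only.
xs, ys, deltas = simulate_sample(100, 1.0, 0.5, 0.5, 1.0, 50.0, rng)
print("censored fraction:", 1.0 - sum(deltas) / len(deltas))
```

Because the regression coefficients here are arbitrary, the censored fraction printed for a given $\theta$ will not in general match the percentages in Table 1; in practice $\theta$ would be tuned, as described above, until the target censoring percentage is reached.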

Tables 2 and 3 give a numerical summary of the simulations. We can observe that the residual $r_{D_i}$ has mean systematically greater than zero, small skewness, kurtosis close to 3 and standard deviation (SD) near 1. The residual $r_{M_i}$ presents kurtosis that differs from 3 and skewness clearly different from zero (between about $-0.7$ and $-1.9$ in Table 2). The distributional form of $r_{M_i}$ seems to be skewed (it has maximum value $+1$ and minimum value $-\infty$). The empirical distribution of $r_{D_i}$ appears to present similar agreement with the standard normal distribution for values of $\sigma < 1.0$ for each $n$, $k$, $\sigma$ and censoring proportion given. Hence, we have proposed a change in the martingale-type residual for uncensored observations in order to obtain a mean of approximately 0.

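Table 2. Estimated distributional measures for the martingale residual ($r_M$) in log-Burr XII regression models.

| $\sigma$ | $k$ | Measure | Censoring 0%, $n = 50$ | Censoring 0%, $n = 100$ | Censoring 10%, $n = 50$ | Censoring 10%, $n = 100$ | Censoring 30%, $n = 50$ | Censoring 30%, $n = 100$ |
|---|---|---|---|---|---|---|---|---|
| 0.50 | 0.27 | Mean | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| 0.50 | 0.27 | SD | 0.9779 | 0.9887 | 0.8774 | 0.8892 | 0.6797 | 0.6863 |
| 0.50 | 0.27 | Skewness | −1.7962 | −1.9025 | −1.2478 | −1.4025 | −0.7258 | −0.9190 |
| 0.50 | 0.27 | Kurtosis | 7.4183 | 8.1942 | 4.5822 | 5.7999 | 2.8766 | 3.9953 |
| 0.50 | 1.00 | Mean | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| 0.50 | 1.00 | SD | 0.9744 | 0.9902 | 0.8808 | 0.8911 | 0.6791 | 0.6917 |
| 0.50 | 1.00 | Skewness | −1.7237 | −1.8778 | −1.1385 | −1.4522 | −0.8608 | −0.8200 |
| 0.50 | 1.00 | Kurtosis | 6.6780 | 7.8461 | 4.8780 | 5.6602 | 3.1532 | 3.0237 |
| 0.50 | 2.00 | Mean | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| 0.50 | 2.00 | SD | 0.9793 | 0.9913 | 0.8811 | 0.8927 | 0.6820 | 0.6937 |
| 0.50 | 2.00 | Skewness | −1.7311 | −1.8669 | −1.3912 | −1.4983 | −0.8742 | −0.8807 |
| 0.50 | 2.00 | Kurtosis | 6.5616 | 7.6331 | 5.0676 | 5.8461 | 3.1151 | 3.2560 |
| 0.80 | 0.27 | Mean | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| 0.80 | 0.27 | SD | 0.9764 | 0.9895 | 0.8748 | 0.8872 | 0.6793 | 0.6862 |
| 0.80 | 0.27 | Skewness | −1.7912 | −1.9080 | −1.2692 | −1.3372 | −0.7083 | −0.7100 |
| 0.80 | 0.27 | Kurtosis | 7.3948 | 8.2353 | 4.8867 | 5.4181 | 2.8978 | 3.0452 |
| 0.80 | 1.00 | Mean | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| 0.80 | 1.00 | SD | 0.9743 | 0.9904 | 0.8797 | 0.8916 | 0.6805 | 0.6904 |
| 0.80 | 1.00 | Skewness | −1.7230 | −1.8799 | −1.3035 | −1.4246 | −0.7295 | −0.7460 |
| 0.80 | 1.00 | Kurtosis | 6.6731 | 7.8619 | 4.7302 | 5.6377 | 2.6435 | 2.8459 |
| 0.80 | 2.00 | Mean | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| 0.80 | 2.00 | SD | 0.9791 | 0.9913 | 0.8814 | 0.8924 | 0.6825 | 0.6924 |
| 0.80 | 2.00 | Skewness | −1.7302 | −1.8672 | −1.3299 | −1.4451 | −0.7386 | −0.8326 |
| 0.80 | 2.00 | Kurtosis | 6.5575 | 7.6363 | 4.7674 | 5.6021 | 2.5888 | 3.2125 |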


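Table 3. Estimated distributional measures for the martingale-type residual ($r_D$) in log-Burr XII regression models.

| $\sigma$ | $k$ | Measure | Censoring 0%, $n = 50$ | Censoring 0%, $n = 100$ | Censoring 10%, $n = 50$ | Censoring 10%, $n = 100$ | Censoring 30%, $n = 50$ | Censoring 30%, $n = 100$ |
|---|---|---|---|---|---|---|---|---|
| 0.50 | 0.27 | Mean | 0.3405 | 0.3408 | 0.2713 | 0.2721 | 0.1435 | 0.1480 |
| 0.50 | 0.27 | SD | 1.0340 | 1.0352 | 1.2006 | 1.1829 | 1.2415 | 1.1635 |
| 0.50 | 0.27 | Skewness | −0.0186 | −0.0201 | −0.18690 | −0.1267 | 0.2820 | 0.3967 |
| 0.50 | 0.27 | Kurtosis | 3.0743 | 3.1418 | 3.8918 | 3.7468 | 3.4828 | 3.2464 |
| 0.50 | 1.00 | Mean | 0.3404 | 0.3413 | 0.2770 | 0.2773 | 0.1589 | 0.1528 |
| 0.50 | 1.00 | SD | 1.0327 | 1.0358 | 1.1217 | 1.1257 | 1.0908 | 1.1880 |
| 0.50 | 1.00 | Skewness | −0.0247 | −0.0260 | −0.0588 | −0.0539 | 0.3526 | 0.2212 |
| 0.50 | 1.00 | Kurtosis | 3.0240 | 3.1292 | 3.4946 | 3.5723 | 3.0836 | 3.5948 |
| 0.50 | 2.00 | Mean | 0.3407 | 0.3414 | 0.2790 | 0.2789 | 0.1612 | 0.1564 |
| 0.50 | 2.00 | SD | 1.0332 | 1.0358 | 1.0881 | 1.1002 | 1.0793 | 1.1498 |
| 0.50 | 2.00 | Skewness | −0.0300 | −0.0285 | 0.0117 | 0.0013 | 0.3338 | 0.2517 |
| 0.50 | 2.00 | Kurtosis | 3.0408 | 3.1303 | 3.2538 | 3.4227 | 3.1005 | 3.5015 |
| 0.80 | 0.27 | Mean | 0.3402 | 0.3407 | 0.2699 | 0.2702 | 0.1366 | 0.1346 |
| 0.80 | 0.27 | SD | 1.0332 | 1.0348 | 1.1931 | 1.2115 | 1.2755 | 1.3173 |
| 0.80 | 0.27 | Skewness | −0.0172 | −0.0217 | −0.1100 | −0.1814 | 0.3390 | 0.2883 |
| 0.80 | 0.27 | Kurtosis | 3.0702 | 3.1412 | 3.6318 | 3.8860 | 3.4694 | 3.6413 |
| 0.80 | 1.00 | Mean | 0.3404 | 0.3413 | 0.2734 | 0.0039 | 0.1441 | 0.1420 |
| 0.80 | 1.00 | SD | 1.0327 | 1.0359 | 1.1532 | 1.0761 | 1.2158 | 1.2604 |
| 0.80 | 1.00 | Skewness | −0.0241 | −0.0258 | −0.0719 | 0.0239 | 0.3167 | 0.2566 |
| 0.80 | 1.00 | Kurtosis | 3.0253 | 3.1313 | 3.5148 | 3.1586 | 3.4139 | 3.6398 |
| 0.80 | 2.00 | Mean | 0.3407 | 0.3414 | 0.2746 | 0.2749 | 0.1461 | 0.1458 |
| 0.80 | 2.00 | SD | 1.0332 | 1.0358 | 1.1345 | 1.1386 | 1.2033 | 1.2090 |
| 0.80 | 2.00 | Skewness | −0.0298 | −0.0285 | −0.0416 | −0.0322 | 0.2984 | 0.3153 |
| 0.80 | 2.00 | Kurtosis | 3.0406 | 3.1299 | 3.4366 | 3.5178 | 3.4308 | 3.5002 |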

3.2 Modified martingale-type residual

The new residual is obtained by subtracting from the martingale-type residual of each uncensored observation its expected value, which, based on the simulation studies, is approximately 0.34. Thus, writing $k\log[1 + \exp(z_i)] = \delta_i - r_{M_i}$ as before, the modified martingale-type residual becomes

$$r_{MD_i} = \begin{cases}
\mathrm{sign}\{1 - k\log[1 + \exp(z_i)]\}\{-2[1 - k\log[1 + \exp(z_i)] + \log(k\log[1 + \exp(z_i)])]\}^{1/2} - 0.34, & \text{if } i \in F; \\
\mathrm{sign}\{-k\log[1 + \exp(z_i)]\}\{2k\log[1 + \exp(z_i)]\}^{1/2}, & \text{if } i \in C.
\end{cases}$$
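The size of the shift can be illustrated with a small Monte Carlo experiment. The sketch below rests on an assumption that differs from the simulation design of the paper: it uses the fact that, under the true model, the cumulative hazard $H_i = k\log[1 + \exp(z_i)]$ of an uncensored observation follows a unit exponential distribution, and therefore ignores parameter estimation. The function names and the seed are hypothetical.

```python
import math
import random

SHIFT = 0.34  # approximate mean of r_D for uncensored cases, as reported in the text


def martingale_type(hazard: float, delta: int) -> float:
    """r_D written in terms of the cumulative hazard H = k*log(1 + exp(z)) > 0."""
    hazard = max(hazard, 1e-300)          # avoid log(0) for a degenerate draw
    r_m = delta - hazard
    if delta == 1:
        dev = 2.0 * (hazard - 1.0 - math.log(hazard))
    else:
        dev = 2.0 * hazard
    return math.copysign(math.sqrt(max(dev, 0.0)), r_m)


def modified_martingale_type(hazard: float, delta: int) -> float:
    """r_MD: subtract the shift for failures, leave censored cases unchanged."""
    r_d = martingale_type(hazard, delta)
    return r_d - SHIFT if delta == 1 else r_d


rng = random.Random(2011)
draws = [martingale_type(rng.expovariate(1.0), 1) for _ in range(20000)]
mc_mean = sum(draws) / len(draws)
print("Monte Carlo mean of r_D for uncensored cases:", round(mc_mean, 3))
```

Under this simplification the Monte Carlo mean should land close to the value 0.34 used in the definition of $r_{MD_i}$, in line with the means reported in Table 3 for the uncensored case.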

3.2.1 Simulation studies

The simulation studies for the modified martingale-type residual were conducted in the same way as in Section 3.1.1. The results are summarized in Table 4 and present values similar to those shown in Table 3, except that now the mean is near zero.

Table 4. Estimated distributional measures for the modified martingale-type residual ($r_{MD}$) in log-Burr XII regression models.

| $\sigma$ | $k$ | Measure | Censoring 0%, $n = 50$ | Censoring 0%, $n = 100$ | Censoring 10%, $n = 50$ | Censoring 10%, $n = 100$ | Censoring 30%, $n = 50$ | Censoring 30%, $n = 100$ |
|---|---|---|---|---|---|---|---|---|
| 0.50 | 0.27 | Mean | 0.0005 | 0.0008 | −0.0346 | −0.0339 | −0.0945 | −0.0900 |
| 0.50 | 0.27 | SD | 1.0340 | 1.0352 | 1.0917 | 1.0819 | 1.0407 | 0.9916 |
| 0.50 | 0.27 | Skewness | −0.0186 | −0.0201 | −0.0131 | 0.0107 | 0.2965 | 0.3379 |
| 0.50 | 0.27 | Kurtosis | 3.0743 | 3.1418 | 3.1772 | 3.1596 | 2.7256 | 2.6794 |
| 0.50 | 1.00 | Mean | 0.0004 | 0.0013 | −0.0290 | −0.0287 | −0.0792 | −0.0852 |
| 0.50 | 1.00 | SD | 1.0327 | 1.0358 | 1.0423 | 1.0459 | 0.9426 | 1.0047 |
| 0.50 | 1.00 | Skewness | −0.0247 | −0.0260 | 0.0194 | 0.0241 | 0.2747 | 0.2332 |
| 0.50 | 1.00 | Kurtosis | 3.0240 | 3.1292 | 3.0320 | 3.1112 | 2.5660 | 2.8119 |
| 0.50 | 2.00 | Mean | 0.0007 | 0.0014 | −0.0270 | −0.0271 | −0.0768 | −0.0816 |
| 0.50 | 2.00 | SD | 1.0332 | 1.0358 | 1.0216 | 1.0303 | 0.9348 | 0.9803 |
| 0.50 | 2.00 | Skewness | −0.0300 | −0.0285 | 0.0422 | 0.0438 | 0.2564 | 0.2362 |
| 0.50 | 2.00 | Kurtosis | 3.0408 | 3.1303 | 2.9407 | 3.0690 | 2.5781 | 2.7885 |
| 0.80 | 0.27 | Mean | 0.0002 | 0.0007 | −0.0361 | −0.0358 | −0.1015 | −0.1034 |
| 0.80 | 0.27 | SD | 1.0332 | 1.0348 | 1.0885 | 1.1020 | 1.0639 | 1.0910 |
| 0.80 | 0.27 | Skewness | −0.0172 | −0.0217 | 0.0288 | −0.0058 | 0.3492 | 0.3329 |
| 0.80 | 0.27 | Kurtosis | 3.0702 | 3.1412 | 3.0643 | 3.1894 | 2.7248 | 2.8112 |
| 0.80 | 1.00 | Mean | 0.0004 | 0.0013 | −0.0326 | −0.0321 | −0.0939 | −0.0960 |
| 0.80 | 1.00 | SD | 1.0327 | 1.0359 | 1.0629 | 1.0665 | 1.0239 | 1.0529 |
| 0.80 | 1.00 | Skewness | −0.0241 | −0.0258 | 0.0279 | 0.0319 | 0.3087 | 0.2875 |
| 0.80 | 1.00 | Kurtosis | 3.0253 | 3.1313 | 3.0264 | 3.1137 | 2.7051 | 2.8235 |
| 0.80 | 2.00 | Mean | 0.0007 | 0.0014 | −0.0314 | −0.0311 | −0.0918 | −0.0922 |
| 0.80 | 2.00 | SD | 1.0332 | 1.0358 | 1.0511 | 1.0550 | 1.0153 | 1.2020 |
| 0.80 | 2.00 | Skewness | −0.0298 | −0.0285 | 0.0360 | 0.0439 | 0.2904 | 0.3040 |
| 0.80 | 2.00 | Kurtosis | 3.0406 | 3.1299 | 3.0050 | 3.0939 | 2.7147 | 2.7908 |

In addition, we can extract the following interpretations from the simulation studies:

• The empirical distribution of the modified martingale-type residual presents good agreement with the standard normal distribution for each $n$, $k$, $\sigma$ and censoring proportion given.

• As the censoring percentage decreases, the empirical distribution of the modified martingale-type residual seems to have a better agreement with the standard normal distribution.

• The empirical distribution of the residual seems to present similar agreement with the standard normal distribution for values of $\sigma < 1.0$ for each $n$, $k$, $\sigma$ and censoring proportion given.

Thus, we recommend the use of normal probability plots for $r_{MD_i}$ with a simulated envelope, as suggested by Atkinson [1]. This can be constructed as follows: (i) fit the model and generate a sample of $n$ independent observations using the fitted model as if it were the true model; (ii) fit the model to the generated sample using $(\delta_i, x_i)$ of the data set, and compute the values of the residuals; (iii) repeat steps (i) and (ii) $m$ times; (iv) obtain ordered values of the residuals, $r^*_{(i)v}$, $i = 1, 2, \ldots, n$ and $v = 1, 2, \ldots, m$; (v) consider the $n$ sets of the $m$ order statistics; (vi) for each set compute its average, minimum and maximum values; and (vii) plot these values and the ordered residuals of the original sample against the normal scores. The minimum and maximum values of the $m$ order statistics form the envelope. Observations corresponding to residuals outside the limits provided by the simulated envelope are worthy of further investigation. Additionally, if a considerable proportion of points fall outside the envelope and/or some systematic tendency appears, then one has evidence against the adequacy of the fitted model. Graphs of such residuals against the fitted values may also be useful.
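The bookkeeping in the envelope steps above can be sketched in a few lines. In this illustration the callable `simulate_residuals` is a hypothetical stand-in for the first two steps (simulating from the fitted model, refitting and computing $r_{MD_i}$), which depend on the fitting routine and are not reproduced here; the normal scores use Blom's approximation, which is an assumption since the text does not specify a formula.

```python
import random
from statistics import NormalDist, fmean


def normal_scores(n: int):
    """Approximate expected normal order statistics (Blom's formula, an assumption)."""
    nd = NormalDist()
    return [nd.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def simulated_envelope(simulate_residuals, n: int, m: int, rng: random.Random):
    """Minimum, average and maximum of each order statistic over m simulated fits."""
    sims = [sorted(simulate_residuals(rng)) for _ in range(m)]
    lower = [min(s[i] for s in sims) for i in range(n)]
    center = [fmean(s[i] for s in sims) for i in range(n)]
    upper = [max(s[i] for s in sims) for i in range(n)]
    return lower, center, upper


n, m = 30, 19   # illustrative sizes only


def fake_refit_residuals(rng: random.Random):
    """Placeholder: standard normal draws instead of residuals from a refitted model."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


rng = random.Random(42)
lower, center, upper = simulated_envelope(fake_refit_residuals, n, m, rng)
scores = normal_scores(n)
print(scores[0], lower[0], center[0], upper[0])
```

Plotting `lower`, `center`, `upper` and the ordered residuals of the original fit against `scores` then gives the envelope plot; points of the original fit falling outside `[lower[i], upper[i]]` are the ones to investigate.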

4. Application

As an illustration, we consider the data from the Veterans Administration lung cancer trial given in Prentice [11] and reported in Kalbfleisch and Prentice [6] and Bennett [2]. This data set considered males with advanced inoperable lung cancer who received chemotherapy. Bennett [2] used only the subgroup of patients with no prior therapy and only the most important explanatory variables for the $i$th patient: tumor type (large, adeno, small, squamous) and performance status (PS), a measure of general fitness on a scale from 10 to 90, where 10, 20, 30 = completely hospitalized, 40, 50, 60 = partial hospital confinement and 70, 80, 90 = able to care for self. Based on these variables, he provided an application of a log-logistic regression model for survival data. The data set presents the survival times, in days, for 97 patients, of which six are right-censored. The aim of the study was to relate the survival/censoring time of patients with advanced inoperable lung cancer who received chemotherapy to a number of prognostic variables.

As the log-Burr XII regression model is an alternative to the log-logistic regression model [13], we used this data set to provide an application of the results derived in the previous sections.

Thus, we consider the following regression model:

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_4 x_{i4} + \sigma z_i, \tag{6}$$

where the $z_i$ are independent random variables with density given in Equation (2) and the response $y_i$ denotes the logarithm of the lifetime or of the censoring time.

To obtain the maximum-likelihood estimates for the parameters in the model, we used the MaxSQP subroutine in the matrix programming language Ox. The results are given in Table 5.

Note that the PS and tumor type are important factors. The parameter estimates and standard errors given here are qualitatively very similar to those obtained by the log-logistic regression model. The conclusions that can be drawn are similar to those obtained by Bennett [2].

In order to assess whether the model is appropriate, Figure 1 compares the empirical (Kaplan–Meier) survival function of the modified martingale-type residuals for the log-Burr XII regression model with the survival function of the standard normal distribution. This figure shows that the log-Burr XII model presents a satisfactory fit.

In order to detect possible outlying observations as well as departures from the assumptions of the log-Burr XII regression model, we present in Figure 2(a) the normal probability plot for the modified martingale-type residual ($r_{MD_i}$) with generated envelope. As can be seen, the graph in Figure 2(a) indicates that the log-Burr XII distribution with $k = 1.4287$ does not seem to be unsuitable to fit the data. In addition, the graph of $r_{MD_i}$ against the fitted values, shown in Figure 2(b), suggests that the residuals are randomly scattered around zero. This graph also indicates that case 59 has a residual value greater than 3, so this observation can be considered an outlier. Case 59 has the lowest failure time. When this observation is removed, the parameter estimates do not change much and the inference remains unaltered. Influential observations can also be identified by using the procedures developed by Silva et al. [13]. Therefore, looking at Figure 2 we can conclude that the log-Burr XII regression model seems to fit this data set adequately.

Table 5. Maximum-likelihood estimates for the parameters from the log-Burr XII regression model fitted to the lung cancer data set.

Parameter   Estimate   SE       p-Value    Covariate
k            1.4287    0.6812   –
σ            0.6023    0.0930   –
β0           3.2603    0.6611   <0.00001
β1           0.1147    0.3153   0.71615    Squamous vs. large
β2          −0.7406    0.2775   0.0076     Small vs. large
β3          −0.7195    0.2960   0.0151     Adeno vs. large
β4           0.0282    0.0052   <0.0001    PS
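The p-values in Table 5 can be checked approximately from the estimates and standard errors. The sketch below assumes they are two-sided Wald tests based on the standard normal distribution, which this section does not state explicitly; the function name is illustrative.

```python
import math

def wald_p_value(estimate, se):
    """Two-sided p-value for z = estimate / se under a standard normal reference."""
    z = estimate / se
    # P(|Z| > |z|) = erfc(|z| / sqrt(2)) for Z ~ N(0, 1)
    return math.erfc(abs(z) / math.sqrt(2.0))

# Estimates and standard errors copied from Table 5
table5 = {
    "beta1": (0.1147, 0.3153),
    "beta2": (-0.7406, 0.2775),
    "beta3": (-0.7195, 0.2960),
    "beta4": (0.0282, 0.0052),
}

for name, (est, se) in table5.items():
    print(f"{name}: z = {est / se:.3f}, p = {wald_p_value(est, se):.5f}")
```

Under that assumption the computed values agree closely with the tabulated ones; small differences are expected because the estimates and standard errors are rounded to four decimal places.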


Figure 1. Estimated survival function of the modified martingale-type residual for the lung cancer data set by fitting log-Burr XII regression models: (a) Kaplan–Meier and (b) standard normal distribution.

Figure 2. (a) Normal probability plot of modified martingale-type residual (rMDi) and (b) plot of the modified martingale-type residual against the fitted values.

5. Concluding remarks

Various simulation studies developed in this work indicate that the distribution of the modified martingale-type residual agrees more closely with the standard normal distribution than that of the other residuals considered, namely the martingale and martingale-type residuals. This suggests that many residual analyses usually applied in normal linear regression models, such as normal probability plots and index plots of residuals, may be extended to log-Burr XII regression models. In addition, together with the results obtained by Silva et al. [13], who derived normal curvatures under various perturbation schemes for log-Burr XII regression models, we can perform a general model-checking analysis, which makes this model very attractive for modeling censored and uncensored lifetime data.

Acknowledgements

This work was partially supported by grants from CAPES – Brazil and was developed when the first author was a doctoral student at the Departamento de Ciências Exatas, Escola Superior de Agricultura “Luiz de Queiroz”, Universidade de São Paulo, Piracicaba. The authors are grateful to one anonymous referee for valuable comments that helped to improve an earlier version of this paper.

References

[1] A.C. Atkinson, Plots, Transformations and Regression: An Introduction to Graphical Methods of Diagnostics Regression Analysis, 2nd ed., Clarendon Press, Oxford, 1985.

[2] S. Bennett, Log-logistic regression models for survival data, Appl. Stat. 32 (1983), pp. 165–171.

[3] D. Collett, Modelling Survival Data in Medical Research, Chapman and Hall, London, 1994.

[4] A.C. Davison and A. Gigli, Deviance residuals and normal scores plots, Biometrika 76 (1989), pp. 211–221.

[5] J. Doornik, An Object-oriented Matrix Programming Language Ox 5, 5th ed., Timberlake Consultants Press, London, 2007.

[6] J.D. Kalbfleisch and R.L. Prentice, The Statistical Analysis of Failure Time Data, 2nd ed., John Wiley, New York, 2002.

[7] J.P. Klein and M.L. Moeschberger, Survival Analysis: Techniques for Censored and Truncated Data, Springer, New York, 1997.

[8] J.F. Lawless, Statistical Models and Methods for Lifetime Data, 2nd ed., Wiley, New York, 2003.

[9] P. McCullagh, Tensor Methods in Statistics, Chapman and Hall, London, 1987.

[10] E.M.M. Ortega, G.A. Paula, and H. Bolfarine, Deviance residuals in generalized log-gamma regression models with censored observations, J. Stat. Comput. Simul. 78 (2008), pp. 747–764.

[11] R.L. Prentice, Exponential survival with censoring and explanatory variables, Biometrika 60 (1973), pp. 279–288.

[12] Q. Shao and X. Zhou, A new parametric model for survival data with long-term survivors, Stat. Med. 23 (2004), pp. 3525–3543.

[13] G.O. Silva, E.M.M. Ortega, V.G. Cancho, and M.L. Barreto, Log-Burr XII regression models with censored data, Comput. Statist. Data Anal. 52 (2008), pp. 3820–3842.

[14] T.M. Therneau, P.M. Grambsch, and T.R. Fleming, Martingale-based residuals for survival models, Biometrika 77 (1990), pp. 147–160.

[15] W.J. Zimmer, J.B. Keats, and F.K. Wang, The Burr XII distribution in reliability analysis, J. Qual. Technol. 30 (1998), pp. 389–394.
