The transmuted log-logistic regression model: a new model for time up to first calving of cows


Statistical Papers, ISSN 0932-5026, Stat Papers, DOI 10.1007/s00362-015-0671-5


Francisco Louzada & Daniele C. T. Granzotto


REGULAR ARTICLE


Received: 8 January 2014 / Revised: 2 February 2015 / © Springer-Verlag Berlin Heidelberg 2015

Abstract In this paper we introduce a general class of survival regression models, the transmuted log-logistic regression model, which is obtained by applying a quadratic rank transmutation map to the usual log-logistic model. We provide a comprehensive description of the properties of the proposed distribution along with a study of its hazard function. Closed-form expressions are provided for several probabilistic measures, such as the probability density function, hazard function, moments, quantile function, mean, variance and median. Inference is maximum likelihood based. Simulation studies are performed in order to evaluate the asymptotic properties of the parameter estimates. The usefulness of the transmuted log-logistic regression distribution for modeling survival data is illustrated on data on the time up to first calving of polled Tabapua cows.

Keywords Log-logistic distribution · Maximum likelihood estimation · Survival analysis · Transmuted map

1 Introduction

Generating new lifetime distributions is a permanent concern of survival researchers. The challenge is the derivation of more complex and flexible statistical survival probability models which can represent more consistently the random behavior of experimental observations in real-world lifetime data. The literature proposing new survival distributions is rich and growing rapidly, and many papers extend standard survival distributions to serve as statistical survival models for the wide range of real lifetime phenomena that do not follow any of the standard survival distributions. Interested readers can refer to Ghitany (2001), Marshall and Olkin (2007), Sarabia and Prieto (2009), Louzada et al. (2012) and Lai (2013), and the references therein, which discuss some common approaches for constructing lifetime distributions. These range from simple procedures for constructing variables, transformations of distribution, reliability or hazard functions, and mixtures of two or more lifetime distributions, to convolutions, compound (infinite mixture) distributions, and probability integral transforms. Moreover, some recent research proposes models that accommodate intrinsic heterogeneity, such as Hanagal (2009), who proposed a Weibull extension of the bivariate exponential regression model with different frailty distributions.

F. Louzada (corresponding author), SME-ICMC-USP, Universidade de São Paulo, CP 668, São Carlos, São Paulo CEP 13560-970, Brazil. e-mail: louzada@icmc.usp.br

D. C. T. Granzotto, PPGEs-UFSCar, Universidade Federal de São Carlos, CP 676, São Carlos, São Paulo 13565-905, Brazil. e-mail: danigranzotto@outlook.com

An alternative to the above procedures is the so-called transmutation map, which is regarded as a convenient way of constructing new distributions. According to Shaw and Buckley (2007), transmutation maps comprise the functional composition of the cumulative distribution function of one distribution with the inverse cumulative distribution (quantile) function of another. Motivated by the need for parametric families of rich and yet tractable distributions in financial mathematics, Shaw and Buckley (2007) used a transmutation map which is the functional composition of the cumulative distribution function of one distribution with the inverse cumulative distribution (quantile) function of a non-Gaussian distribution. Since then, studies involving the quadratic rank transmutation map have appeared in other application areas such as survival analysis and reliability. For instance, Aryal and Tsokos (2009) proposed a generalization of the extreme value distribution by using the quadratic rank transmutation map and applied the new distribution to snowfall data from Midway Airport in the state of Illinois, USA; Aryal and Tsokos (2011) proposed the transmuted Weibull distribution, illustrating its use on two published data sets; and Aryal (2013) proposed the transmuted log-logistic distribution. These papers, however, do not provide a regression approach.

In this paper we introduce a general class of survival regression models, the transmuted log-logistic regression model, conceived by considering a quadratic rank transmutation map (Shaw and Buckley 2007) in order to include a third parameter in the usual log-logistic model. Covariates are introduced in an accelerated lifetime test fashion, as in Mackensie (1997) and Tojeiro and Louzada-Neto (2011), by regressing the scale parameter. A direct advantage of our modeling is that the log-logistic distribution has a larger range of choices for the shape of the hazard function than the Weibull or extreme value distributions do (Chen and Sinha 2001; Bennett 1983). For instance, unlike its competitors, the log-logistic distribution can accommodate a unimodal hazard function.

From the practical point of view, such a hazard shape is frequently observed. For instance, consider a long-term study involving cows of the Tabapua breed, which was held at EMBRAPA, a Brazilian agricultural research institute, in order to make inferences on the time up to first calving in polled Tabapua cows. Note that very substantial and continued genetic improvement has been made in livestock over the past several decades, and these changes have a direct impact on the economy and the environment. In addition, increasingly sophisticated statistical models, such as the transmuted models, have been built in order to demonstrate the productive effectiveness of these genetic modifications. Hill (2014) discussed some of the history and genetic issues as applied to the science of livestock improvement, which has had and continues to have major spin-offs into ideas and applications in other areas. The author notes that much of this understanding can be attributed to Wright and was formalized in models such as Fisher's infinitesimal model.

Fig. 1 TTT plots by covariate: prp (left panel; prp until one year vs. prp after one year) and period (right panel; calves after 2000 vs. calves before 2000). Horizontal axis: r/n

In order to verify the possible shape of the hazard function, Fig. 1 shows the TTT plots according to two covariates: the time when the calf was born, after or before 2000 (period), and the age at which the first oestrus of the cow occurred (prp), until one year or after one year. Interested readers can refer to Barlow and Campo (1975) for more information on TTT plotting. Overall, if the TTT plot is concave, convex and then concave again, it indicates a unimodal hazard, which is our case.
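The empirical scaled TTT transform behind such plots can be sketched in a few lines. This is a minimal illustration for complete (uncensored) samples using the usual textbook definition, not the authors' code; the function name `scaled_ttt` and the example lifetimes are hypothetical.

```python
def scaled_ttt(times):
    """Return the points (r/n, G(r/n)) of the empirical scaled TTT transform.

    Assumes a complete (uncensored) sample of positive lifetimes.
    """
    t = sorted(times)
    n = len(t)
    total = sum(t)
    points = [(0.0, 0.0)]
    running = 0.0  # sum of the r smallest observations
    for r, t_r in enumerate(t, start=1):
        running += t_r
        # total time on test up to the r-th failure, scaled by the overall total
        g = (running + (n - r) * t_r) / total
        points.append((r / n, g))
    return points


# Hypothetical lifetimes, used only to show the shape of the output.
example = [1.0, 2.0, 3.0, 4.0]
pts = scaled_ttt(example)
for r_over_n, g in pts:
    print(f"{r_over_n:.2f}  {g:.2f}")
```

Plotting the second coordinate against the first gives one curve per covariate level, as in Fig. 1.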

The paper is organized as follows. The transmuted log-logistic distribution is presented in Sect. 2. The transmuted regression model and its reliability and hazard functions, moments and quantiles are presented in Sect. 3. Inference for the model is presented in Sect. 4. The results of a simulation study performed in order to evaluate the asymptotic properties of the parameter estimates are presented in Sect. 5. The new distribution is illustrated in Sect. 6 on a real data set on the time up to first calving of polled Tabapua cows, considering the presence of covariates. Final remarks are presented in Sect. 7.

2 Model formulation

Let $X$ be a nonnegative random variable denoting the lifetime of an individual in some population. The random variable $X$ is said to be transmuted log-logistically distributed with parameters $\mu$, $\beta$ and $\lambda$ if its cumulative distribution function (cdf) is given by

$$F(x) = \frac{e^{\mu}x^{\beta}\left(1 + e^{\mu}x^{\beta} + \lambda\right)}{\left(1 + e^{\mu}x^{\beta}\right)^{2}}, \tag{1}$$

where $\beta > 0$, $-\infty < \mu < +\infty$ and $|\lambda| \le 1$ are the shape, scale and skewness parameters, respectively.

Fig. 2 Pdf (left panel) and cdf (right panel) of the transmuted log-logistic distribution, plotted against $x$, for $\mu = 1$ and different values of $\lambda$ and $\beta$ (legend values: $\beta$ = 0.5, 0.75, 1, 1.5, 2, 3; $\lambda$ = −1, −0.5, 0, 0.5, 1, 1)

The corresponding probability density function (pdf) is given by

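$$f(x) = \frac{e^{\mu}\beta x^{\beta-1}\left[\left(1 + e^{\mu}x^{\beta}\right) - \lambda\left(e^{\mu}x^{\beta} - 1\right)\right]}{\left(1 + e^{\mu}x^{\beta}\right)^{3}}. \tag{2}$$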

The results in Appendix A show that (2) is a pdf.

Note that the transmuted log-logistic distribution is an extended model for analyzing more complex data and that it generalizes some of the important distributions in reliability analysis. The log-logistic distribution is clearly the special case $\lambda = 0$. Some of the possible shapes of the transmuted log-logistic pdf and cdf are illustrated in the left and right panels of Fig. 2, respectively, for selected values of the parameters $\lambda$ and $\beta$ and for $\mu = 1$. The parameter $\lambda$ is responsible for introducing skewness into the log-logistic distribution. This is in full agreement with Shaw and Buckley (2007), who pointed out that the introduction of skewness into a distribution is a direct effect of the transmutation maps.
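As a quick numerical sanity check of (1) and (2), the sketch below evaluates both functions and confirms that integrating the pdf over an interval reproduces the corresponding cdf difference. The function names, the midpoint rule and the parameter values are illustrative assumptions, not part of the original paper.

```python
import math


def tll_cdf(x, mu, beta, lam):
    """Transmuted log-logistic cdf, Eq. (1)."""
    z = math.exp(mu) * x ** beta
    return z * (1.0 + z + lam) / (1.0 + z) ** 2


def tll_pdf(x, mu, beta, lam):
    """Transmuted log-logistic pdf, Eq. (2)."""
    z = math.exp(mu) * x ** beta
    numerator = math.exp(mu) * beta * x ** (beta - 1.0) * ((1.0 + z) - lam * (z - 1.0))
    return numerator / (1.0 + z) ** 3


def midpoint_integral(f, a, b, n=20000):
    """Plain midpoint rule on [a, b]; accurate enough for a smooth integrand."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


# Hypothetical parameter values chosen only for illustration.
mu, beta, lam = 1.0, 2.0, -0.5
a, b = 0.2, 1.5
area = midpoint_integral(lambda x: tll_pdf(x, mu, beta, lam), a, b)
gap = area - (tll_cdf(b, mu, beta, lam) - tll_cdf(a, mu, beta, lam))
print(abs(gap))
```

Setting `lam = 0.0` in either function recovers the ordinary log-logistic distribution.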

The genesis of the transmuted log-logistic distribution (1) is as follows. Let $X$ be a nonnegative random variable denoting the lifetime of an individual in some population. The random variable $X$ is said to be log-logistically distributed with scale parameter $\mu$ and shape parameter $\beta$ if its cdf is given by

$$G(x) = \frac{e^{\mu}x^{\beta}}{1 + e^{\mu}x^{\beta}}. \tag{3}$$

According to Shaw and Buckley (2007), a quadratic rank transmutation map has the following simple form, $F_2\left(F_1^{-1}(u)\right) = u + \lambda u(1 - u)$, for $|\lambda| \le 1$, from which it follows that a cumulative distribution function $F_2(x)$ satisfies the following relationship with a baseline cumulative distribution function $F_1(x)$: $F_2(x) = (1 + \lambda)F_1(x) - \lambda F_1(x)^2$. The effect of the quadratic rank transmutation map is to introduce skewness to a base distribution but preserving the moments.

Then, a random variable $X$ is said to have a transmuted distribution if its cumulative distribution function is given by

$$F(x) = (1 + \lambda)G(x) - \lambda G(x)^{2}, \tag{4}$$

for $|\lambda| \le 1$, where $G(x)$ is the cumulative distribution function of the baseline distribution. Observe that at $\lambda = 0$ we have only the baseline cdf of the random variable. Following this idea, Aryal and Tsokos (2009) proposed the transmuted extreme value distribution, while Aryal and Tsokos (2011) proposed the transmuted Weibull one. However, neither of these distributions can accommodate data with a unimodal hazard curve, which is a prerogative of the log-logistic distribution.

Hence, from (3) and (4), the cdf of the transmuted log-logistic distribution (1) is directly obtained.

3 The transmuted log-logistic regression model

Let $T$ be a random variable denoting the lifetime, with pdf (2) and $\mu = \gamma(\mathbf{x})$ a parameter depending on a covariate vector $\mathbf{x} = (1, x_1, \ldots, x_p)'$ such that

$$\gamma(\mathbf{x}) = \gamma_0 + \gamma_1 x_1 + \cdots + \gamma_p x_p.$$

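Then, the pdf (2) may be written as

$$f(t \mid \gamma(\mathbf{x})) = \frac{e^{\gamma(\mathbf{x})}\beta t^{\beta-1}\left[\left(1 + e^{\gamma(\mathbf{x})}t^{\beta}\right) - \lambda\left(e^{\gamma(\mathbf{x})}t^{\beta} - 1\right)\right]}{\left(1 + e^{\gamma(\mathbf{x})}t^{\beta}\right)^{3}}, \tag{5}$$

where $|\lambda| \le 1$, $\beta > 0$ and $\gamma(\mathbf{x})$ is the regression function defined above.

Consequently, the reliability function may be written as

$$R(t \mid \gamma(\mathbf{x})) = \frac{1 + e^{\gamma(\mathbf{x})}t^{\beta}(1 - \lambda)}{\left(1 + e^{\gamma(\mathbf{x})}t^{\beta}\right)^{2}}. \tag{6}$$

The reliability function for some parameter values is presented in the upper left panel of Fig. 3, with $\gamma(\mathbf{x}) = \gamma_0 = 1$.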

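The hazard function of the transmuted log-logistic regression model, $h(t \mid \mathbf{x})$, which denotes the hazard at time $t$ for an individual with covariate vector $\mathbf{x}$, is given by

$$h(t \mid \gamma(\mathbf{x})) = \frac{\beta e^{\gamma(\mathbf{x})}t^{\beta-1}\left[1 + e^{\gamma(\mathbf{x})}t^{\beta} - \lambda\left(e^{\gamma(\mathbf{x})}t^{\beta} - 1\right)\right]}{\left(1 + e^{\gamma(\mathbf{x})}t^{\beta}\right)\left(1 + e^{\gamma(\mathbf{x})}t^{\beta} - \lambda e^{\gamma(\mathbf{x})}t^{\beta}\right)}, \tag{7}$$

where $|\lambda| \le 1$, $\beta > 0$ and $\gamma(\mathbf{x})$ is the regression function defined above.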

Fig. 3 Survival $R(t)$ (upper left panel), hazard $h(t)$ with (upper right panel) and without (lower left panel) $T_{max}$, and cumulative hazard $H(t)$ (lower right panel) of the transmuted log-logistic distribution, plotted against $t$, for $\gamma(\mathbf{x}) = \gamma_0 = 1$ and different values of $\lambda$ and $\beta$

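The hazard function (7) has the following properties.

If $\lambda = 0$ we have the log-logistic regression hazard as a particular case, given by

$$h(t \mid \gamma(\mathbf{x})) = \frac{\beta e^{\gamma(\mathbf{x})}t^{\beta-1}}{1 + e^{\gamma(\mathbf{x})}t^{\beta}}. \tag{8}$$

It can be easily shown that (8) is decreasing for $\beta \le 1$ and that, for $\beta > 1$, the hazard function initially increases, reaches its maximum at $T_{max} = e^{-\gamma(\mathbf{x})/\beta}(\beta - 1)^{1/\beta}$ and tends to zero as $t \to \infty$. The upper right panel of Fig. 3 shows the hazard function for $\lambda = -1$ and $\lambda = 1$ with $\gamma(\mathbf{x}) = \gamma_0 = 1$.

If $\lambda = -1$ we have

$$h(t \mid \gamma(\mathbf{x})) = \frac{2\beta e^{2\gamma(\mathbf{x})}t^{2\beta-1}}{\left(1 + e^{\gamma(\mathbf{x})}t^{\beta}\right)\left(1 + 2e^{\gamma(\mathbf{x})}t^{\beta}\right)},$$

which is decreasing for $\beta \le 1/2$ and unimodal for $\beta > 1/2$, with the maximum at

$$T_{max} = \left[\frac{3(\beta - 1)}{4e^{\gamma(\mathbf{x})}} + \frac{\sqrt{9\beta^{2} - 2\beta + 1}}{4e^{\gamma(\mathbf{x})}}\right]^{1/\beta}.$$

The lower left panel of Fig. 3 shows the hazard function for some fixed parameter values.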
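The two closed-form expressions for $T_{max}$ can be checked against a brute-force search over the hazard (7). The sketch below is only an illustration: the function names, the grid search and the values $\gamma(\mathbf{x}) = 1$, $\beta = 1.5$ are assumptions chosen so that both special cases are unimodal.

```python
import math


def tll_hazard(t, gamma_x, beta, lam):
    """Hazard of the transmuted log-logistic regression model, Eq. (7)."""
    z = math.exp(gamma_x) * t ** beta
    num = beta * math.exp(gamma_x) * t ** (beta - 1.0) * (1.0 + z - lam * (z - 1.0))
    return num / ((1.0 + z) * (1.0 + z - lam * z))


def tmax_lambda_zero(gamma_x, beta):
    """Mode of the log-logistic hazard (lambda = 0), valid for beta > 1."""
    return math.exp(-gamma_x / beta) * (beta - 1.0) ** (1.0 / beta)


def tmax_lambda_minus_one(gamma_x, beta):
    """Mode of the hazard for lambda = -1, valid for beta > 1/2."""
    root = math.sqrt(9.0 * beta ** 2 - 2.0 * beta + 1.0)
    return ((3.0 * (beta - 1.0) + root) / (4.0 * math.exp(gamma_x))) ** (1.0 / beta)


def argmax_on_grid(f, lo, hi, n=100000):
    """Brute-force maximiser on an equally spaced grid (illustration only)."""
    best_t, best_v = lo, f(lo)
    for i in range(1, n + 1):
        t = lo + (hi - lo) * i / n
        v = f(t)
        if v > best_v:
            best_t, best_v = t, v
    return best_t


gamma_x, beta = 1.0, 1.5  # hypothetical values
grid_zero = argmax_on_grid(lambda t: tll_hazard(t, gamma_x, beta, 0.0), 0.001, 3.0)
grid_minus = argmax_on_grid(lambda t: tll_hazard(t, gamma_x, beta, -1.0), 0.001, 3.0)
print(grid_zero, tmax_lambda_zero(gamma_x, beta))
print(grid_minus, tmax_lambda_minus_one(gamma_x, beta))
```

For other values of $\lambda$ no closed form is given in the text, so a numerical search of this kind is one way to locate the mode of the hazard.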


Another important relation between the hazard function and the reliability function is given by the cumulative hazard rate function $H(t)$. From the relationship $H(t) = -\ln R(t)$ we obtain

$$H(t) = 2\ln\left[1 + \exp[\gamma(\mathbf{x})]\,t^{\beta}\right] - \ln\left[1 + \exp[\gamma(\mathbf{x})]\,t^{\beta} - \lambda\exp[\gamma(\mathbf{x})]\,t^{\beta}\right]. \tag{9}$$

The cumulative hazard rate function has the following properties: $H(0) = 0$; $H(t)$ is increasing for all $t \ge 0$; and $\lim_{t\to\infty} H(t) = \infty$. Some examples of these curves are shown in the lower right panel of Fig. 3.

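Next, we present the moments and quantiles of the transmuted log-logistic regression model. The $r$th order moment of a transmuted log-logistic random variable $T$, which exists for $r < \beta$, is given by

$$E(T^{r}) = \frac{\pi r}{\beta^{2}}\, e^{-\gamma(\mathbf{x})r/\beta}\,(\beta - r\lambda)\csc\left[\frac{\pi r}{\beta}\right].$$

So, the expected value $E(T)$, the variance $Var(T)$ and the median of a transmuted log-logistic random variable $T$ in the presence of covariates are, respectively, given by

$$E(T) = e^{-\gamma(\mathbf{x})/\beta}\,\frac{\pi}{\beta^{2}}\,(\beta - \lambda)\csc\left[\frac{\pi}{\beta}\right], \tag{10}$$

$$Var(T) = e^{-2\gamma(\mathbf{x})/\beta}\,\frac{\pi}{\beta^{2}}\left[2(\beta - 2\lambda)\csc\left(\frac{2\pi}{\beta}\right) - \frac{\pi}{\beta^{2}}(\beta - \lambda)^{2}\csc^{2}\left(\frac{\pi}{\beta}\right)\right] \tag{11}$$

and

$$t_{0.5} = e^{-\gamma(\mathbf{x})/\beta}\left[\sqrt{1 + \lambda^{2}} - \lambda\right]^{1/\beta}. \tag{12}$$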
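The moment formulas can be cross-checked numerically. The sketch below compares the closed-form mean with $\int_0^\infty R(t)\,dt$ (truncated at a finite upper limit), verifies that (11) equals $E(T^2) - E(T)^2$, and confirms that the median (12) has reliability one half. Function names, the truncation point and the parameter values are assumptions made for this illustration.

```python
import math


def tll_moment(r, gamma_x, beta, lam):
    """Closed-form r-th moment; requires 0 < r < beta."""
    a = r / beta
    scale = math.exp(-gamma_x * a)
    return (math.pi * r / beta ** 2) * scale * (beta - r * lam) / math.sin(math.pi * a)


def tll_variance(gamma_x, beta, lam):
    """Variance as written in Eq. (11); requires beta > 2."""
    c = math.pi / beta ** 2
    term1 = 2.0 * (beta - 2.0 * lam) / math.sin(2.0 * math.pi / beta)
    term2 = c * (beta - lam) ** 2 / math.sin(math.pi / beta) ** 2
    return math.exp(-2.0 * gamma_x / beta) * c * (term1 - term2)


def tll_median(gamma_x, beta, lam):
    """Median, Eq. (12)."""
    return math.exp(-gamma_x / beta) * (math.sqrt(1.0 + lam ** 2) - lam) ** (1.0 / beta)


def tll_reliability(t, gamma_x, beta, lam):
    """Reliability function, Eq. (6)."""
    z = math.exp(gamma_x) * t ** beta
    return (1.0 + z * (1.0 - lam)) / (1.0 + z) ** 2


def mean_by_integration(gamma_x, beta, lam, upper=50.0, n=200000):
    """E(T) as the integral of R(t) on [0, upper], midpoint rule (tail ignored)."""
    h = upper / n
    return h * sum(tll_reliability((i + 0.5) * h, gamma_x, beta, lam) for i in range(n))


gamma_x, beta, lam = 0.0, 4.0, -0.5  # hypothetical values with beta > 2
mean_closed = tll_moment(1, gamma_x, beta, lam)
mean_numeric = mean_by_integration(gamma_x, beta, lam)
var_closed = tll_variance(gamma_x, beta, lam)
median = tll_median(gamma_x, beta, lam)
print(mean_closed, mean_numeric, var_closed, median)
```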

We can generate a continuous random variable $T$ from its cdf by using uniform draws $U$ on $(0, 1)$ and setting $T = F^{-1}(U)$. Thus, by using this method, it is possible to generate random numbers from the transmuted log-logistic distribution when the parameters $\gamma_0, \gamma_1, \ldots, \gamma_p, \beta, \lambda$ are known, as follows:

$$t = \left[\frac{\sqrt{(1 + \lambda)^{2} - 4u\lambda} + 2u - (1 + \lambda)}{2\exp[\gamma(\mathbf{x})](1 - u)}\right]^{1/\beta}. \tag{13}$$
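The inversion in (13) translates directly into a sampler. The following sketch is an illustration under assumed names (`tll_quantile`, `tll_sample`) and arbitrary parameter values; it checks the quantile function against the cdf and draws a seeded sample.

```python
import math
import random


def tll_cdf(t, gamma_x, beta, lam):
    """cdf (1) with mu replaced by gamma(x)."""
    z = math.exp(gamma_x) * t ** beta
    return z * (1.0 + z + lam) / (1.0 + z) ** 2


def tll_quantile(u, gamma_x, beta, lam):
    """Quantile function, Eq. (13), for 0 < u < 1."""
    z = (math.sqrt((1.0 + lam) ** 2 - 4.0 * u * lam) + 2.0 * u - (1.0 + lam)) / (2.0 * (1.0 - u))
    return (z * math.exp(-gamma_x)) ** (1.0 / beta)


def tll_sample(n, gamma_x, beta, lam, rng):
    """Draw n lifetimes by inverse-transform sampling."""
    return [tll_quantile(rng.random(), gamma_x, beta, lam) for _ in range(n)]


rng = random.Random(2015)  # fixed seed so the illustration is reproducible
gamma_x, beta, lam = 1.0, 1.5, -0.5  # hypothetical parameter values
draws = tll_sample(5000, gamma_x, beta, lam, rng)
median = tll_quantile(0.5, gamma_x, beta, lam)
frac_below_median = sum(t <= median for t in draws) / len(draws)
print(frac_below_median)
```

With covariates, `gamma_x` is evaluated per individual as $\gamma_0 + \gamma_1 x_1 + \cdots + \gamma_p x_p$ before calling the sampler.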

4 Inference

Let $y_1, y_2, \ldots, y_n$ be a sample of size $n$ from a transmuted log-logistic distribution and let $\mathbf{x}_i = (x_{i0}, x_{i1}, \ldots, x_{ip})'$, with $x_{i0} = 1$, be the vector of covariates of the $i$th observation. Also, consider the following relationship between the vector of covariates and the parameters: $\gamma(\mathbf{x}) = \gamma_0 + \gamma_1 x_1 + \cdots + \gamma_p x_p$.

Hence, the log-likelihood function is given by

$$
\begin{aligned}
l = {} & n\ln\beta + \sum_{i=1}^{n}\sum_{j=0}^{p}\gamma_j x_{ij} + \sum_{i=1}^{n}\ln\left[1 + e^{\sum_{j=0}^{p}\gamma_j x_{ij}}\,y_i^{\beta} - \lambda\left(e^{\sum_{j=0}^{p}\gamma_j x_{ij}}\,y_i^{\beta} - 1\right)\right] \\
& + (\beta - 1)\sum_{i=1}^{n}\ln y_i - 3\sum_{i=1}^{n}\ln\left(1 + e^{\sum_{j=0}^{p}\gamma_j x_{ij}}\,y_i^{\beta}\right).
\end{aligned}
\tag{14}
$$
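A direct transcription of (14) might look as follows. This is a sketch with assumed names and a tiny made-up data set; the check at the end confirms that it agrees with summing the logarithm of the pdf (5) over the observations.

```python
import math


def linear_predictor(gammas, x_row):
    """gamma(x_i) = sum_j gamma_j * x_ij, with x_i0 = 1 for the intercept."""
    return sum(g * x for g, x in zip(gammas, x_row))


def tll_loglik(gammas, beta, lam, y, X):
    """Log-likelihood (14) for complete (uncensored) observations."""
    total = len(y) * math.log(beta)
    for y_i, x_row in zip(y, X):
        eta = linear_predictor(gammas, x_row)
        z = math.exp(eta) * y_i ** beta
        total += eta + math.log(1.0 + z - lam * (z - 1.0))
        total += (beta - 1.0) * math.log(y_i) - 3.0 * math.log(1.0 + z)
    return total


def tll_pdf(t, gamma_x, beta, lam):
    """pdf (5), used here only to cross-check the log-likelihood."""
    z = math.exp(gamma_x) * t ** beta
    return math.exp(gamma_x) * beta * t ** (beta - 1.0) * ((1.0 + z) - lam * (z - 1.0)) / (1.0 + z) ** 3


# Hypothetical data: three lifetimes, intercept plus one binary covariate.
y = [0.8, 1.3, 2.1]
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 0.0]]
gammas, beta, lam = [-0.5, 0.3], 1.8, -0.4

ll = tll_loglik(gammas, beta, lam, y, X)
ll_check = sum(math.log(tll_pdf(t, linear_predictor(gammas, x), beta, lam)) for t, x in zip(y, X))
print(ll, ll_check)
```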

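Therefore, the likelihood equations are obtained by setting the following partial derivatives to zero, where $\gamma(\mathbf{x}_i) = \sum_{k=0}^{p}\gamma_k x_{ik}$:

$$
\frac{\partial l}{\partial\gamma_j} = \sum_{i=1}^{n}\left[\frac{(1 - \lambda)\,y_i^{\beta}\,x_{ij}\,e^{\gamma(\mathbf{x}_i)}}{1 + e^{\gamma(\mathbf{x}_i)}y_i^{\beta} - \lambda\left[e^{\gamma(\mathbf{x}_i)}y_i^{\beta} - 1\right]}\right] + \sum_{i=1}^{n}x_{ij} - 3\sum_{i=1}^{n}\left[\frac{x_{ij}\,y_i^{\beta}\,e^{\gamma(\mathbf{x}_i)}}{1 + y_i^{\beta}e^{\gamma(\mathbf{x}_i)}}\right], \quad j = 0, \ldots, p,
$$

$$
\frac{\partial l}{\partial\beta} = \frac{n}{\beta} + \sum_{i=1}^{n}\left[\frac{(1 - \lambda)\,y_i^{\beta}\ln(y_i)\,e^{\gamma(\mathbf{x}_i)}}{1 + y_i^{\beta}e^{\gamma(\mathbf{x}_i)} - \lambda\left[y_i^{\beta}e^{\gamma(\mathbf{x}_i)} - 1\right]}\right] + \sum_{i=1}^{n}\ln(y_i) - 3\sum_{i=1}^{n}\left[\frac{y_i^{\beta}\ln(y_i)\,e^{\gamma(\mathbf{x}_i)}}{1 + y_i^{\beta}e^{\gamma(\mathbf{x}_i)}}\right],
$$

$$
\frac{\partial l}{\partial\lambda} = \sum_{i=1}^{n}\frac{1 - y_i^{\beta}e^{\gamma(\mathbf{x}_i)}}{1 + y_i^{\beta}e^{\gamma(\mathbf{x}_i)} - \lambda\left[y_i^{\beta}e^{\gamma(\mathbf{x}_i)} - 1\right]}.
$$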

The maximum likelihood estimator $\hat{\theta} = (\hat{\gamma}_0, \ldots, \hat{\gamma}_p, \hat{\beta}, \hat{\lambda})'$ can be obtained by solving the above nonlinear system of equations. It is usually more convenient to use nonlinear optimization algorithms, such as quasi-Newton or Newton-Raphson algorithms, to numerically maximize the log-likelihood function given in (14).
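To make the idea of numerical maximization concrete without relying on any external optimizer, the sketch below fits an intercept-only model with a simple derivative-free pattern search. This is explicitly not the quasi-Newton or Newton-Raphson scheme mentioned in the text, just a dependency-free stand-in; all names, the starting point, the seed and the simulated data are assumptions.

```python
import math
import random


def quantile(u, gamma0, beta, lam):
    """Quantile function (13), used to simulate data."""
    z = (math.sqrt((1 + lam) ** 2 - 4 * u * lam) + 2 * u - (1 + lam)) / (2 * (1 - u))
    return (z * math.exp(-gamma0)) ** (1 / beta)


def loglik(theta, y):
    """Log-likelihood (14) for an intercept-only model, theta = (gamma0, beta, lam)."""
    gamma0, beta, lam = theta
    if beta <= 0 or abs(lam) > 1:
        return -math.inf  # outside the parameter space
    total = 0.0
    for t in y:
        z = math.exp(gamma0) * t ** beta
        total += (gamma0 + math.log(beta) + (beta - 1) * math.log(t)
                  + math.log(1 + z - lam * (z - 1)) - 3 * math.log(1 + z))
    return total


def pattern_search(f, start, step=0.5, tol=1e-3, max_iter=10000):
    """Coordinate-wise pattern search: accept any improving move, else halve the step."""
    x, fx = list(start), f(start)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                cand = list(x)
                cand[i] += delta
                fc = f(cand)
                if fc > fx:
                    x, fx, improved = cand, fc, True
                    break
        if not improved:
            step /= 2
    return x, fx


rng = random.Random(1)
y = [quantile(rng.random(), 1.0, 1.5, -0.5) for _ in range(300)]  # simulated data
start = [0.0, 1.0, 0.0]
theta_hat, ll_hat = pattern_search(lambda th: loglik(th, y), start)
print(theta_hat, ll_hat)
```

In practice a gradient-based routine supplied with the score functions above would normally converge faster; the search here only illustrates the objective being maximized and the parameter constraints $\beta > 0$, $|\lambda| \le 1$.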

Following Aryal and Tsokos (2011), in order to compute the standard errors and asymptotic confidence intervals we use the usual large-sample approximation, in which the MLE of $\theta$ can be treated as being approximately $(p + 3)$-variate normal. Hence, as $n \to \infty$, the asymptotic distribution of the MLE $(\hat{\gamma}_0, \ldots, \hat{\gamma}_p, \hat{\beta}, \hat{\lambda})$ is given by

$$
\begin{pmatrix}\hat{\gamma}_0\\ \hat{\gamma}_1\\ \vdots\\ \hat{\gamma}_p\\ \hat{\beta}\\ \hat{\lambda}\end{pmatrix}
\sim N\left[
\begin{pmatrix}\gamma_0\\ \gamma_1\\ \vdots\\ \gamma_p\\ \beta\\ \lambda\end{pmatrix},
\begin{pmatrix}
\hat{V}_{11} & \hat{V}_{12} & \cdots & \hat{V}_{1(p+3)}\\
\hat{V}_{21} & \hat{V}_{22} & \cdots & \hat{V}_{2(p+3)}\\
\vdots & \vdots & \ddots & \vdots\\
\hat{V}_{(p+3)1} & \hat{V}_{(p+3)2} & \cdots & \hat{V}_{(p+3)(p+3)}
\end{pmatrix}
\right], \tag{15}
$$

where $\hat{V}_{ij} = V_{ij}|_{\theta = \hat{\theta}}$, which is determined by the inverse of the Hessian matrix.

Thereby, the approximate $100(1 - \alpha)\,\%$ two-sided confidence intervals for $\gamma_j$, $\beta$ and $\lambda$ are, respectively, given by

$$
\hat{\gamma}_j \pm z_{\alpha/2}\sqrt{\hat{V}_{(j+1)(j+1)}}, \qquad \hat{\beta} \pm z_{\alpha/2}\sqrt{\hat{V}_{(p+2)(p+2)}} \qquad \text{and} \qquad \hat{\lambda} \pm z_{\alpha/2}\sqrt{\hat{V}_{(p+3)(p+3)}},
$$

where $j = 0, \ldots, p$ and $z_{\alpha}$ is the upper $\alpha$th percentile of the standard normal distribution. The required Hessian matrix is shown in Appendix B.
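As a small worked example of these Wald-type intervals, the sketch below reproduces the interval for $\beta$ from the estimate and standard error reported for the transmuted model in Table 3 (2.954 and 0.128). The helper name `wald_interval` is an assumption; `statistics.NormalDist` is part of the Python standard library.

```python
from statistics import NormalDist


def wald_interval(estimate, std_error, level=0.95):
    """Two-sided interval: estimate +/- z_{alpha/2} * standard error."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return estimate - z * std_error, estimate + z * std_error


# Estimate and standard error of beta for the transmuted model (Table 3).
lower, upper = wald_interval(2.954, 0.128)
print(round(lower, 3), round(upper, 3))
```

The rounded limits agree with the 2.703 and 3.205 reported in Table 3. Note that nothing in this construction keeps an interval for $\lambda$ inside $[-1, 1]$ or an interval for $\beta$ above zero.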

Different models can be compared, while penalizing over-fitting, by using the Akaike information criterion (Akaike 1973), which is intended to minimize the Kullback-Leibler divergence between the true distribution and the estimate from a candidate model. It is given by $AIC = -2l(\hat{\theta}) + 2\,size(\theta)$, where $l(\hat{\theta})$ denotes the log-likelihood function evaluated at the maximum and $size(\theta)$ is the number of model parameters. The model with the lowest value of this criterion (among all considered models) is regarded as the preferred model for describing the given dataset.
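The AIC values reported in Table 3 follow directly from this formula. In the sketch below, the $-2l$ values are taken from Table 3, and the parameter counts (five for the transmuted model, four for the log-logistic model) are read off the rows of that table; the function name `aic` is an assumption.

```python
def aic(minus_two_loglik, n_params):
    """AIC = -2 l(theta_hat) + 2 * (number of parameters)."""
    return minus_two_loglik + 2 * n_params


aic_transmuted = aic(6925.3, 5)   # gamma0, gamma1, gamma2, beta, lambda
aic_loglogistic = aic(6936.8, 4)  # gamma0, gamma1, gamma2, beta
print(round(aic_transmuted, 1), round(aic_loglogistic, 1))
```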

5 Simulation study

In order to study the behavior of the MLEs, this section presents the results of a Monte Carlo experiment on finite samples. For this study we consider six different sets of parameters and sample sizes n = 50, 100, 150, 300, 500 and 1000, with data generated according to a transmuted log-logistic regression distribution in the presence of two covariates. All results were obtained from 1000 Monte Carlo replications.

The results are summarised in two tables. Table 1 shows the relative difference between the generated and estimated parameters and the respective standard errors over the 1000 MLEs, which are observed to decay as the sample size increases. Table 2 shows the coverage probability of the 95 % two-sided confidence intervals for the model parameters, which is close to the nominal coverage for large sample sizes and usually differs from it by less than 5 % even for the smallest sample size considered.

6 Application

In this section we illustrate the usefulness of the transmuted log-logistic regression distribution for modeling the polled Tabapua breed data, which were briefly introduced in Sect. 1.

This breed is distinguished by meekness, sexual precocity, high milk production, fertility, good meat quality and adaptability to various regions and climates. However, it is raised mainly for beef, because it also has very favorable characteristics for this purpose. It is also widely used in crosses with other breeds, producing hardy animals with good productivity (Pereira 2000).

Given the characteristics mentioned, and especially its sexual precocity, the polled Tabapua breed is economically important for beef cattle production, since it helps to increase the production of kilograms of meat per hectare in a given time and at lower cost.


Table 1 MLEs for the samples generated with different parameter values and sample sizes

Columns: generated parameters (α1, α2, β, λ); sample size n; estimated relative difference (%) for α1, α2, β, λ; estimated standard error for α1, α2, β, λ

(−2,−1,0.5,−0.8) 50 11.136 20.499 3.644 12.094 0.707 0.634 0.070 0.527

100 7.479 14.098 2.158 7.631 0.518 0.463 0.053 0.372

150 6.029 10.908 1.175 5.175 0.431 0.379 0.044 0.291

300 4.068 6.587 0.747 3.721 0.320 0.278 0.034 0.197

500 2.430 3.919 0.270 1.781 0.248 0.213 0.027 0.142

1000 1.244 2.397 0.228 0.987 0.173 0.149 0.019 0.096

(−2,−0.5,1.5,−0.5) 50 6.667 29.552 0.051 15.737 0.727 0.643 0.210 0.598

100 4.237 21.604 0.860 2.789 0.535 0.466 0.155 0.437

150 4.479 19.244 0.609 4.435 0.465 0.400 0.134 0.386

300 3.155 10.868 0.466 1.911 0.347 0.294 0.098 0.291

500 2.710 8.546 0.202 4.239 0.282 0.237 0.079 0.240

1000 1.516 5.446 0.133 3.468 0.204 0.169 0.057 0.173

(−2,0.5,1,−0.5) 50 6.570 27.020 0.015 14.991 0.724 0.623 0.140 0.595

100 4.188 21.382 0.880 3.046 0.534 0.447 0.103 0.438

150 4.341 18.216 0.657 3.862 0.464 0.377 0.089 0.386

300 3.155 10.491 0.466 1.911 0.347 0.274 0.065 0.291

500 2.710 7.654 0.202 4.239 0.282 0.217 0.053 0.240

1000 1.516 4.722 0.133 3.468 0.204 0.152 0.038 0.173

(−3,−1,0.5,0.8) 50 2.204 0.718 3.115 20.596 0.677 0.592 0.069 0.491

100 1.009 0.442 1.984 14.506 0.471 0.416 0.052 0.364

150 1.031 0.397 1.570 10.668 0.378 0.333 0.046 0.294

300 0.829 0.079 0.646 6.547 0.263 0.231 0.033 0.197

500 0.730 0.094 0.284 4.466 0.201 0.176 0.026 0.147

1000 0.423 0.430 0.087 2.725 0.141 0.123 0.019 0.098

(−2,−0.5,1.5,0.5) 50 2.453 1.889 0.544 27.549 0.653 0.618 0.207 0.538

100 2.413 1.101 0.610 30.890 0.473 0.455 0.156 0.454

150 1.615 2.003 0.328 29.270 0.386 0.379 0.134 0.403

300 1.139 2.257 0.479 19.601 0.272 0.273 0.098 0.310

500 0.300 1.557 0.330 13.072 0.207 0.213 0.080 0.248

1000 0.056 0.477 0.181 7.248 0.144 0.152 0.058 0.179

(−3,0.5,1,0.5) 50 3.048 0.123 0.512 28.512 0.713 0.631 0.138 0.536

100 1.663 1.233 0.596 31.439 0.514 0.471 0.104 0.455

150 1.112 1.809 0.349 29.053 0.417 0.400 0.089 0.403

300 0.645 0.515 0.479 19.601 0.292 0.293 0.066 0.310

500 0.137 0.336 0.342 12.917 0.220 0.233 0.053 0.248

1000 0.015 0.247 0.181 7.248 0.152 0.168 0.039 0.179


Table 2 Coverage probability of the 95 % confidence intervals

Columns: generated parameters (α1, α2, β, λ); sample size n; coverage probability for α1, α2, β, λ

(−2,−1,0.5,−0.8) 50 0.916 0.914 0.941 0.498

100 0.923 0.927 0.934 0.640

150 0.911 0.925 0.911 0.721

300 0.925 0.944 0.915 0.849

500 0.938 0.940 0.930 0.888

1000 0.951 0.947 0.954 0.930

(−2,−0.5,1.5,−0.5) 50 0.914 0.909 0.937 0.634

100 0.920 0.908 0.919 0.716

150 0.900 0.913 0.922 0.792

300 0.899 0.931 0.924 0.838

500 0.916 0.937 0.932 0.889

1000 0.940 0.952 0.930 0.921

(−2,0.5,1,−0.5) 50 0.913 0.919 0.932 0.626

100 0.921 0.918 0.916 0.715

150 0.902 0.922 0.920 0.790

300 0.899 0.939 0.924 0.838

500 0.916 0.950 0.932 0.889

1000 0.940 0.955 0.930 0.921

(−3,−1,0.5,0.8) 50 0.933 0.934 0.933 0.494

100 0.952 0.942 0.924 0.674

150 0.945 0.959 0.925 0.763

300 0.952 0.956 0.923 0.841

500 0.940 0.950 0.925 0.894

1000 0.947 0.957 0.930 0.916

(−2,−0.5,1.5,0.5) 50 0.918 0.915 0.918 0.625

100 0.931 0.917 0.935 0.742

150 0.932 0.927 0.934 0.788

300 0.955 0.937 0.925 0.843

500 0.952 0.941 0.929 0.894

1000 0.944 0.951 0.932 0.929

(−3,0.5,1,0.5) 50 0.917 0.922 0.919 0.625

100 0.937 0.919 0.934 0.743

150 0.938 0.915 0.933 0.786

300 0.953 0.913 0.925 0.843

500 0.948 0.933 0.929 0.894

1000 0.947 0.955 0.932 0.929


Heifers raised in tropical regions experience their first calving at approximately 4 years of age, twice as late as heifers from temperate regions (Paro et al. 2013; Pereira 2000).

In this context, a long-term study involving cows of the Tabapua breed was held at EMBRAPA, a Brazilian agricultural research institute, in order to make inferences on the time up to first calving in polled Tabapua cows. The Tabapua breed is strong, rustic and chunky, with excellent mothering ability and potential to gain weight. It originated from crosses made in the municipality of Tabapua, in the state of São Paulo, Brazil, with crossbred zebu, which showed characteristics of spontaneously polled cattle coming from European temperate regions with high genetic potential for meat and milk production (Paro et al. 2013).

The sample data consist of the time up to first calving, in days, of 500 animals observed from 1983 to 2007 and two covariates: the time when the calf was born, after or before 2000 (period), and the age at which the first oestrus of the cow occurred (prp), until one year or after one year.

Firstly, a brief descriptive analysis is provided. The minimum observed time was 730 days, or approximately 24 months; the maximum observed time was 4323 days, or 144 months. The median time to first calving is 1128 days (37.6 months), and the first and third quartiles are 1053 and 1328 days, respectively.

By using the nonparametric log-rank test to compare the survival distributions of two samples, we observed p < 0.0001 and p = 0.0248 for the covariates prp and period, respectively (see the survival curves in Fig. 6). These p-values show the significance of these covariates in describing the time to first calving. Another nonparametric test (the Peto-Peto test) gave almost the same results (p < 0.0001 and p = 0.0006, respectively).

As pointed out in Sect. 1, the TTT plots in Fig. 1 indicate a unimodal hazard. Then, the transmuted log-logistic and the simple log-logistic distributions were fitted to the data. Table 3 provides the MLEs, their corresponding standard errors and 95 % confidence intervals, as well as the computed −2 log-likelihood (−2l) and the AIC. Both criteria provide evidence in favor of the transmuted log-logistic regression distribution. Note that, as pointed out by Shaw and Buckley (2007), the transmutation map introduces skewness into the log-logistic distribution, which is already an asymmetric distribution, leading to a more "peaked" distribution, with a mode higher than the one provided by the usual log-logistic distribution. In terms of our application this makes sense. In general, the crossbreedings take place in similar periods, which increases the chance of first calvings occurring around the same period of time, thus increasing the mode.

Furthermore, following Burnham and Anderson (2002), we verify whether the MLE values are influenced by specific experimental observations, which can be regarded as outliers. We considered a leave-one-out approach based on a cross-validation scheme, in the sense that the parameters of the transmuted log-logistic regression were re-estimated 500 times, each time with one specific observed time withdrawn. We also estimated the standard errors, confidence intervals and the −2 log-likelihood and AIC criteria. Moreover, the differences $\Delta_i$, given by $\Delta_i = AIC_i - AIC_{min}$, were calculated. Then the Akaike weights, given by $\omega_i = \exp(-\Delta_i/2)$, were obtained and plotted in the left panel of Fig. 4. The presence of outliers is clear. We decided to remove all experimental points with $\omega_i > 0.20$, corresponding to observations 15, 143, 231, 242, 289 and 360. The Akaike weights in the sample without the outliers are shown in the right panel of Fig. 4.

Table 3 MLEs considering the log-logistic and transmuted log-logistic regression models

| Model | Parameter | Estimate | Standard error | 95 % CI lower | 95 % CI upper | −2l | AIC |
|---|---|---|---|---|---|---|---|
| Transmuted | γ0 | −17.092 | 0.801 | −18.667 | −17.860 | 6925.3 | 6935.3 |
| | γ1 | −0.364 | 0.137 | −0.632 | −0.095 | | |
| | γ2 | 3.281 | 0.794 | 1.722 | 4.840 | | |
| | β | 2.954 | 0.128 | 2.703 | 3.205 | | |
| | λ | −0.764 | 0.084 | −0.930 | −0.598 | | |
| Log-Logistic | γ0 | −19.319 | 0.742 | −20.777 | −17.860 | 6936.8 | 6944.8 |
| | γ1 | 3.213 | 0.809 | 1.623 | 4.804 | | |
| | γ2 | −0.419 | 0.154 | −0.722 | −0.117 | | |
| | β | 3.208 | 0.122 | 2.968 | 3.448 | | |

Fig. 4 Cross validation results: Akaike weights, plotted against the observation index, for the complete sample (left panel) and without influential points (right panel)
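The bookkeeping for the leave-one-out screening can be sketched as follows. The AIC values below are invented purely for illustration (the paper's 500 refitted AICs are not reproduced in the text), and the function names are assumptions; the weight definition and the 0.20 threshold are those stated above.

```python
import math


def akaike_weights(aic_values):
    """Unnormalised weights omega_i = exp(-Delta_i / 2), Delta_i = AIC_i - AIC_min."""
    aic_min = min(aic_values)
    return [math.exp(-(a - aic_min) / 2.0) for a in aic_values]


def flag_above(weights, threshold=0.20):
    """1-based indices of the observations whose weight exceeds the threshold."""
    return [i for i, w in enumerate(weights, start=1) if w > threshold]


# Hypothetical leave-one-out AIC values: entry i is the AIC after dropping observation i.
loo_aic = [6920.0, 6935.0, 6934.5, 6921.0]
weights = akaike_weights(loo_aic)
flagged = flag_above(weights)
print([round(w, 4) for w in weights], flagged)
```

With these made-up numbers, dropping observation 1 or 4 lowers the AIC markedly relative to the other refits, so those two are the ones flagged.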

Table 4 shows the MLEs for the parameters of the transmuted log-logistic regression model in the sample without the outliers. There seem to be no important changes in the MLEs, though the −2l and AIC values are much smaller than those obtained when the complete sample is considered.

Moreover, as a goodness-of-fit procedure, we performed a residual analysis for both models (the transmuted log-logistic and the simple log-logistic regression models) by using the Cox-Snell residuals (Cox and Snell 1968). The Cox-Snell residuals are defined as $e_i = \Lambda(t_i \mid \mathbf{x}_i)$, where $\Lambda(\cdot)$ is the cumulative hazard function of the fitted model. For the log-logistic and transmuted log-logistic models, the Cox-Snell residuals are, respectively, given by

$$e_i^{LL} = \ln\left[1 + \exp(\gamma(\mathbf{x}))\,t_i^{\beta}\right] \tag{16}$$

and

$$e_i^{TLL} = 2\ln\left[1 + \exp(\gamma(\mathbf{x}))\,t_i^{\beta}\right] - \ln\left[1 + \exp(\gamma(\mathbf{x}))\,t_i^{\beta} - \lambda\exp(\gamma(\mathbf{x}))\,t_i^{\beta}\right], \tag{17}$$

where $\gamma(\mathbf{x}) = \gamma_0 + \gamma_1 x_1 + \cdots + \gamma_p x_p$.

Table 4 MLEs, standard errors and 95 % confidence intervals considering the transmuted log-logistic regression model in the sample without outliers

| Model | Parameter | Estimate | Standard error | 95 % CI lower | 95 % CI upper | −2l | AIC |
|---|---|---|---|---|---|---|---|
| Transmuted without outliers | γ0 | −20.611 | 1.065 | −20.704 | −18.518 | 6366.9 | 6376.9 |
| | γ1 | −0.430 | 0.145 | −0.552 | −0.145 | | |
| | γ2 | 3.646 | 0.913 | 1.851 | 5.440 | | |
| | β | 3.549 | 0.168 | 3.218 | 3.880 | | |
| | λ | −0.710 | 0.118 | −0.943 | −0.478 | | |

Fig. 5 shows the estimated residuals versus the estimated empirical survival for the residuals. This criterion provides clear evidence in favor of the transmuted log-logistic regression model.
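Residuals (16) and (17) are straightforward to compute once a model has been fitted. The sketch below uses assumed function names and arbitrary parameter values; it checks that (17) reduces to (16) at $\lambda = 0$ and that $\exp(-e_i)$ equals the fitted reliability (6), which is the property the residual plot relies on.

```python
import math


def cox_snell_ll(t, gamma_x, beta):
    """Cox-Snell residual for the log-logistic model, Eq. (16)."""
    return math.log(1.0 + math.exp(gamma_x) * t ** beta)


def cox_snell_tll(t, gamma_x, beta, lam):
    """Cox-Snell residual for the transmuted log-logistic model, Eq. (17)."""
    z = math.exp(gamma_x) * t ** beta
    return 2.0 * math.log(1.0 + z) - math.log(1.0 + z - lam * z)


def reliability(t, gamma_x, beta, lam):
    """Reliability function, Eq. (6)."""
    z = math.exp(gamma_x) * t ** beta
    return (1.0 + z * (1.0 - lam)) / (1.0 + z) ** 2


# Hypothetical values for one observation and one set of fitted parameters.
t_i, gamma_x, beta, lam = 1.4, 0.3, 2.0, -0.6
e_ll = cox_snell_ll(t_i, gamma_x, beta)
e_tll = cox_snell_tll(t_i, gamma_x, beta, lam)
print(e_ll, e_tll)
```

If the fitted model is adequate, the residuals should behave like a sample from the unit exponential distribution, so a plot of the residuals against minus the logarithm of their estimated survival function should lie close to the identity line.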

The Fig. 6 show the estimated hazard curve and the estimated most probable timefor the first calf (Tmax) according to both covariates. The Tmax is equals to 29.52 monthswith a 95 % confidence interval equals to (26.56, 32.49) if the prp occurs before thefirst year. For calves born until the year 2000, the Tmax is equals to 42.97 months witha 95 % confidence interval equal to (41.43, 44.50). For calves born after the year 2000or prp occurring after the first year, the Tmax is equals to 40.77 months with a 95 %confidence interval equal to (39.35, 42.18).

Considering the period prior to the year 2000 we can observe an older age at thefirst calving, occurring close to 4 years. This time decreases for less than 3.5 yearsshowing that the current reproductive technics of livestock are more efficient than inthe period before. As the reproductive life begins earlier, the cow has an increasingservice number/conception and consequently increase its reproductive life by reducingthe number of cow breeding cull.

7 Concluding remarks

In this paper we have introduced a new generalization of the log-logistic distribution, the transmuted log-logistic regression model. Moreover, we included covariates in order to account for heterogeneity. The proposed distribution is generated by using the quadratic rank transmutation map and taking the two-parameter log-logistic distribution (which has a unimodal hazard function) as the base distribution.


Fig. 5 Estimated residuals versus the estimated empirical survival for the residuals: log-logistic (left panel) and transmuted log-logistic (right panel) distributions. In both panels the horizontal axis is e_i and the vertical axis is −log(S(e_i)).

Fig. 6 Estimated hazard curves, with the Tmax. Panels: prp until one year versus prp after one year; calves after 2000 versus calves before 2000. Axes: time (days) and h(t).

We have provided closed-form expressions for several probabilistic measures, including the probability density function, hazard function, moments, quantile function, mean, variance and median. Likelihood-based inference is provided. A simulation study was performed, from which we learned that the biases and standard errors of the maximum likelihood estimates decrease as the sample size increases, and that the coverage probabilities of the 95 % two-sided confidence intervals for the model parameters approach the nominal level as the sample size increases. In the real dataset on the time up to first calving in polled Tabapua cows, the new model provides a better fit than the usual log-logistic model according to several criteria.

Finally, following Hill (2014), at least in principle, the proposed model has potential use in quantitative genetics, particularly for the evaluation of the genetic merit of animals to support selection programs.

Acknowledgments The authors are grateful to the referees for their useful comments and suggestions.The research is partially supported by the Brazilian organizations CNPq and FAPESP.


Appendix A: Probability density function proof

According to Mood et al. (1974), any function $f(\cdot)$ with domain the real line and counterdomain $[0, \infty)$ is defined to be a pdf if and only if (i) $f(x) \ge 0$ for all $x$ and (ii) $\int_{-\infty}^{\infty} f(x)\,dx = 1$.

Considering X a non-negative random variable distributed according to a transmuted log-logistic distribution with pdf given by (2), it is easy to see that the first property is satisfied for all x > 0.

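The second property is shown as follows. Firstly,

$$
\begin{aligned}
\int_{0}^{+\infty} f(x)\,dx
&= \int_{0}^{+\infty} \frac{e^{\mu}\beta x^{\beta-1}\left[\left(1 + e^{\mu}x^{\beta}\right) - \lambda\left(e^{\mu}x^{\beta} - 1\right)\right]}{\left(1 + e^{\mu}x^{\beta}\right)^{3}}\,dx \\
&= \int_{0}^{+\infty} e^{\mu}\beta x^{\beta-1}\left(1 + e^{\mu}x^{\beta}\right)^{-3}dx
 + \int_{0}^{+\infty} e^{2\mu}\beta x^{2\beta-1}\left(1 + e^{\mu}x^{\beta}\right)^{-3}dx \\
&\quad - \int_{0}^{+\infty} \lambda e^{2\mu}\beta x^{2\beta-1}\left(1 + e^{\mu}x^{\beta}\right)^{-3}dx
 + \int_{0}^{+\infty} \lambda e^{\mu}\beta x^{\beta-1}\left(1 + e^{\mu}x^{\beta}\right)^{-3}dx .
\end{aligned}
$$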

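Replacing $y = x^{\beta}e^{\mu}$, $x = e^{-\mu/\beta}y^{1/\beta}$ and $dy = e^{\mu}\beta x^{\beta-1}dx$ in the equation above, it follows that

$$\int_{0}^{+\infty} f(x)\,dx = (\lambda + 1)\int_{0}^{+\infty} (1 + y)^{-3}\,dy + (1 - \lambda)\int_{0}^{+\infty} y(1 + y)^{-3}\,dy.$$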

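We know that

$$B(\upsilon, \omega) = \int_{0}^{+\infty} \frac{z^{\omega-1}}{(1 + z)^{\upsilon+\omega}}\,dz, \tag{18}$$

and, in terms of the Gamma function, for positive integers $\upsilon$ and $\omega$ we have

$$B(\upsilon, \omega) = \frac{\Gamma(\upsilon)\Gamma(\omega)}{\Gamma(\upsilon + \omega)} = \frac{(\upsilon - 1)!\,(\omega - 1)!}{(\upsilon + \omega - 1)!}.$$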

Then the integrals are given by

$$(\lambda + 1)\int_{0}^{+\infty} (1 + y)^{-3}\,dy = \frac{\lambda + 1}{2} \tag{19}$$

and

$$(1 - \lambda)\int_{0}^{+\infty} y(1 + y)^{-3}\,dy = \frac{1 - \lambda}{2}. \tag{20}$$
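As a worked check of (19) and (20), each integral can be matched to the Beta integral (18) by reading $\omega$ and $\upsilon$ off the exponents. The pairs $(\upsilon, \omega) = (2, 1)$ and $(1, 2)$ used below are our identification and are not spelled out in the original derivation.

```latex
\begin{align*}
\int_{0}^{+\infty} (1+y)^{-3}\,dy
  &= \int_{0}^{+\infty} \frac{y^{1-1}}{(1+y)^{2+1}}\,dy
   = B(2,1) = \frac{1!\,0!}{2!} = \frac{1}{2}, \\
\int_{0}^{+\infty} y\,(1+y)^{-3}\,dy
  &= \int_{0}^{+\infty} \frac{y^{2-1}}{(1+y)^{1+2}}\,dy
   = B(1,2) = \frac{0!\,1!}{2!} = \frac{1}{2}.
\end{align*}
```

Multiplying the first line by $(\lambda + 1)$ and the second by $(1 - \lambda)$ gives the right-hand sides of (19) and (20).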


Finally, adding the results of (19) and (20), we have

$$\frac{\lambda + 1}{2} + \frac{1 - \lambda}{2} = 1.$$

Therefore, Eq. (2) is a pdf.
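The analytic result can also be checked numerically. The following is a minimal sketch, assuming the pdf has the form of the integrand written at the start of this proof; the function names, the change of variable x = u/(1 − u) and the parameter values are our own illustrative choices, not part of the original article.

```python
import math

def tll_pdf(x, mu, beta, lam):
    """Transmuted log-logistic pdf, as written in the integrand of Appendix A."""
    c = math.exp(mu) * x ** beta
    return (math.exp(mu) * beta * x ** (beta - 1)
            * ((1.0 + c) - lam * (c - 1.0)) / (1.0 + c) ** 3)

def integrate_pdf(mu, beta, lam, n=100_000):
    """Midpoint rule on (0, 1) after mapping x = u / (1 - u), dx = du / (1 - u)^2."""
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        u = (k + 0.5) * h
        x = u / (1.0 - u)
        total += tll_pdf(x, mu, beta, lam) / (1.0 - u) ** 2 * h
    return total

# Hypothetical parameter values, chosen only for illustration.
total_mass = integrate_pdf(mu=-1.0, beta=2.0, lam=-0.7)
print(f"numerical integral of the pdf: {total_mass:.6f}")
```

The printed value should be close to 1 for any admissible λ in [−1, 1]; a simple midpoint rule is used here only to keep the sketch free of external dependencies.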

Appendix B: Hessian matrix

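The Hessian matrix is given by

$$
A = \begin{pmatrix}
A_{11} & \cdots & A_{1(p+3)} \\
A_{21} & \cdots & A_{2(p+3)} \\
\vdots & \ddots & \vdots \\
A_{(p+3)1} & \cdots & A_{(p+3)(p+3)}
\end{pmatrix}
$$

where

$$
\begin{pmatrix}
V_{11} & \cdots & V_{1(p+3)} \\
V_{21} & \cdots & V_{2(p+3)} \\
\vdots & \ddots & \vdots \\
V_{(p+3)1} & \cdots & V_{(p+3)(p+3)}
\end{pmatrix}
=
\begin{pmatrix}
A_{11} & \cdots & A_{1(p+3)} \\
A_{21} & \cdots & A_{2(p+3)} \\
\vdots & \ddots & \vdots \\
A_{(p+3)1} & \cdots & A_{(p+3)(p+3)}
\end{pmatrix}^{-1}
$$

and

$$
A_{(j+1)(j+1)} = -\frac{\partial^{2} l}{\partial \gamma_j^{2}}
= \sum_{i=1}^{n}\left[\frac{x_{ij}C_{ij}(1-\lambda)}{1 + C_{ij} - \lambda(C_{ij} - 1)} - \left[\frac{x_{ij}C_{ij}(1-\lambda)}{1 + C_{ij} - \lambda(C_{ij} - 1)}\right]^{2}\right]
- 3\sum_{i=1}^{n}\left[\frac{x_{ij}^{2}C_{ij}}{1 + C_{ij}} - \frac{x_{ij}^{2}C_{ij}^{2}}{(1 + C_{ij})^{2}}\right]
$$

$$
A_{(p+2)(p+2)} = -\frac{\partial^{2} l}{\partial \beta^{2}}
= \sum_{i=1}^{n}\left[\frac{\ln(y_i)^{2}C_{ij}(1-\lambda)}{1 + C_{ij} - \lambda(C_{ij} - 1)} - \frac{\left[\ln(y_i)C_{ij}(1-\lambda)\right]^{2}}{\left[1 + C_{ij} - \lambda(C_{ij} - 1)\right]^{2}}\right]
- \frac{n}{\beta^{2}}
- 3\sum_{i=1}^{n}\left[\frac{\ln(y_i)^{2}C_{ij}}{1 + C_{ij}} - \left(\frac{\ln(y_i)C_{ij}}{1 + C_{ij}}\right)^{2}\right]
$$

$$
A_{(p+3)(p+3)} = -\frac{\partial^{2} l}{\partial \lambda^{2}}
= \sum_{i=1}^{n}\left[\frac{-\left(1 - C_{ij}\right)^{2}}{\left[1 + C_{ij} - \lambda\left(C_{ij} - 1\right)\right]^{2}}\right]
$$

where $C_{ij} = y_i^{\beta}\exp\left[\sum_{j=1}^{j}\gamma_j x_{ij}\right]$, $j = 0, 1, \ldots, p$, and $A_{l \times m}$ ($l = m$) are the partial derivatives.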
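The link between A and V can be illustrated numerically. The sketch below is not the authors' implementation: it assumes uncensored data and no covariates (so the parameter vector is (γ0, β, λ)), builds the log-likelihood from the pdf used in Appendix A with μ = γ0, replaces the closed-form derivatives by central finite differences, and uses hypothetical parameter values. The quantile formula comes from solving the quadratic rank transmutation map (1 + λ)G − λG² = u for the baseline cdf value G, which is our own derivation.

```python
import math

def tll_quantile(u, g0, beta, lam):
    """Quantile for lam != 0: solve (1 + lam) G - lam G^2 = u, then invert G."""
    G = ((1 + lam) - math.sqrt((1 + lam) ** 2 - 4 * lam * u)) / (2 * lam)
    return (G / (1 - G) * math.exp(-g0)) ** (1 / beta)

def neg_loglik(theta, times):
    """Minus the log-likelihood for theta = (gamma0, beta, lambda)."""
    g0, beta, lam = theta
    total = 0.0
    for t in times:
        c = math.exp(g0) * t ** beta
        dens = (math.exp(g0) * beta * t ** (beta - 1)
                * ((1 + c) - lam * (c - 1)) / (1 + c) ** 3)
        total -= math.log(dens)
    return total

def hessian(f, theta, h=1e-3):
    """Central finite-difference Hessian of f at theta."""
    k = len(theta)
    H = [[0.0] * k for _ in range(k)]
    for a in range(k):
        for b in range(k):
            def shifted(da, db):
                p = list(theta)
                p[a] += da
                p[b] += db  # when a == b both shifts hit the same coordinate
                return f(p)
            H[a][b] = (shifted(h, h) - shifted(h, -h)
                       - shifted(-h, h) + shifted(-h, -h)) / (4 * h * h)
    return H

def invert(M):
    """Gauss-Jordan inverse with partial pivoting."""
    k = len(M)
    aug = [list(row) + [float(i == j) for j in range(k)]
           for i, row in enumerate(M)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        d = aug[col][col]
        aug[col] = [v / d for v in aug[col]]
        for r in range(k):
            if r != col:
                fct = aug[r][col]
                aug[r] = [v - fct * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

# Hypothetical parameters and a deterministic pseudo-sample built from quantiles.
theta = (-1.0, 2.0, -0.5)
n = 200
times = [tll_quantile((i + 0.5) / n, *theta) for i in range(n)]

A = hessian(lambda p: neg_loglik(p, times), theta)  # A = -d^2 l / d theta^2
V = invert(A)                                       # V = A^{-1}
std_err = [math.sqrt(V[k][k]) if V[k][k] > 0 else float("nan")
           for k in range(3)]
print("approximate standard errors:", std_err)
```

In an actual analysis A would be evaluated at the maximum likelihood estimates, and the square roots of the diagonal of V would give the asymptotic standard errors used for Wald-type confidence intervals.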


References

Akaike H (1973) Information theory and the maximum likelihood principle. In: Petrov V, Csáki F (eds) International symposium on information theory. Budapest, Akadémiai Kiadó

Aryal GR (2013) Transmuted log-logistic distribution. J Stat Appl Probab 2:11–20

Aryal GR, Tsokos CP (2009) On the transmuted extreme value distribution with applications. Nonlinear Anal 71:1401–1407

Aryal GR, Tsokos CP (2011) Transmuted Weibull distribution: a generalization of the Weibull probability distribution. Eur J Pure Appl Math 4(2):89–102

Barlow RE, Campo RA (1975) Total time on test processes and applications to failure data analysis. In: Reliability and fault tree analysis, pp 451–481

Bennett S (1983) Log-logistic regression models for survival data. J R Stat Soc Ser C 32:165–171

Burnham KP, Anderson DR (2002) Model selection and multimodel inference: a practical information-theoretic approach, 2nd edn. Springer, Berlin

Chen MH, Ibrahim JG, Sinha D (2001) Bayesian survival analysis. Springer, New York

Cox DR, Snell EJ (1968) A general definition of residuals. J R Stat Soc 30:248–275

Ghitany ME (2001) A compound Rayleigh survival model and its application to randomly censored data. Stat Pap 42:437–450

Hanagal DD (2009) Weibull extension of bivariate exponential regression model with different frailty distributions. Stat Pap 50:29–49

Hill WG (2014) Applications of population genetics to animal breeding, from Wright, Fisher and Lush to genomic prediction. Genetics 196(1):1–16

Lai CD (2013) Constructions and applications of lifetime distributions. Appl Stoch Models Bus Ind 29:127–140

Louzada F, Bereta EMP, Franco MAP (2012) On the distribution of the minimum or maximum of a random number of i.i.d. lifetime random variables. Appl Math 3(4):350–353

Mackensie G (1997) Regression models for survival data: the generalized time-dependent logistic family. J R Stat Soc 45(1):21–34

Marshall A, Olkin I (2007) Life distributions: structure of nonparametric, semiparametric and parametric families. Springer, New York

Mood AM, Graybill FA, Boes DC (1974) Introduction to the theory of statistics, 3rd edn. McGraw Hill, New York

Paro PAZ, Santos ALQ, Maximiniano-Neto A, Paro JLN, Rodrigues DC, Cruz GC, Malta TS, Ribeiro FM, Andrade MA (2013) Anatomic study of the vascular casts of the testicular arteries in bovines of Tabapua race. Biosci J 1:306–318

Pereira JCC (2000) Contribuição genética do Zebu na pecuária bovina do Brasil. Inf Agropecu 21:30–38

Sarabia JM, Prieto F (2009) The Pareto-positive stable distribution: a new descriptive method for city size data. Phys A 388(19):4179–4191

Shaw WT, Buckley IRC (2007) The alchemy of probability distributions: beyond Gram–Charlier expansions, and a skew–kurtotic–normal distribution from a rank transmutation map. UCL Discovery Repository, pp 1–16

Tojeiro CAV, Louzada-Neto F (2011) The log-logistic regression model with a threshold stress. Trend Math Appl Comput 1:67–77
