Upload
lamphuc
View
222
Download
2
Embed Size (px)
Citation preview
Maximum Likelihood Based Estimation of Hazard Function under Shape
Restrictions and Related Statistical Inference
by
Desale Habtzghi
(Under the direction of Somnath Datta and Mary Meyer )
Abstract
The problem of estimation of a hazard function has received considerable attention in
the statistical literature. In particular, assumptions of increasing, decreasing, concave and
bathtub-shaped hazard function are common in literature, but practical solutions are not well
developed. In this dissertation, we introduce a new nonparametric method for estimation of
hazard function under shape restrictions to handle the above problem. This is an important
topic of practical utility because often, in survival analysis and reliability applications, one
has a prior notion about the physical shape of underlying hazard rate function. At the
same time, it may not be appropriate to assume a totally parametric form for it. We adopt
a nonparametric approach in assuming that the density and hazard rate have no specific
parametric form with the assumption that the shape of the underlying hazard rate is known
( either decreasing, increasing, concave, convex or bathtub-shaped). We present an efficient
algorithm for computing the shape restricted estimator. The theoretical justification for the
algorithm is provided. We also show how the estimation procedures can be used when dealing
with right censored data. We evaluate the performance of the estimator via simulation studies
and illustrate it on some real data sets.
We also consider testing the hypothesis that the lifetimes come from a population with a
parametric hazard rate such as Weibull against a shape restricted alternative which comprises
a broad range of hazard rate shapes. The alternative may be appropriate when the shape of
the parametric hazard is not constant and monotone. We use appropriate resampling based
computation to conduct our tests since the asymptotic distributions of the test statistics in
these problems are mostly intractable.
Index words: Survival Analysis, Hazard Function, Survival Function, Right CensoredData, Nonparametric, Estimation, Parametric, increasing, decreasing,Bathtub-Shaped, Concave, Shape Restricted Estimator, Simulation,Testing, Resampling.
Maximum Likelihood Based Estimation of Hazard Function under Shape
Restrictions and Related Statistical Inference
by
Desale Habtzghi
B.S., University of Asmara, Eritrea, 1996
M.S., Southern Illinois University, U.S.A, 2001
M.S., University of Georgia, U.S.A, 2003
A Dissertation Submitted to the Graduate Faculty
of The University of Georgia in Partial Fulfillment
of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
Athens, Georgia
2006
Maximum Likelihood Based Estimation of Hazard Function under Shape
Restrictions and Related Statistical Inference
by
Desale Habtzghi
Approved:
Major Professor: Somnath Datta and Mary Meyer
Committee: Ishwar Basawa
Daniel Hall
Lynne Seymour
Electronic Version Approved:
Maureen Grasso
Dean of the Graduate School
The University of Georgia
May 2006
Acknowledgments
Writing acknowledgments is a time to reflect upon the glorious struggle that has just taken
place and remember each step along the way. At every turn there are many who have given
their time, energy and expertise and I wish to thank each for the help.
I would like to express my sincere appreciation to my major professors, Dr. Somnath
Datta and Dr. Mary Meyer, who provided not only the direction for the project, but also an
enthusiasm and personal concern which greatly contributed to its progress. Dr. Meyer’s inno-
vative ideas have provided me with a new research avenue and a desire to learn more about
the nonparametric function estimation using shape restrictions. I appreciate her endless help
in pushing me to fully understand the concepts of shape restrictions, without her open door,
open mind and potential it is impossible to complete this project. Dr. Datta broadened my
horizons, I particularly would like to thank him for helping to open my eyes to biostatistics
discipline. I really appreciate all the inputs, advice and encouragement I got from him. He
is always there for me when I call him.
I would like to thank Dr. Ishwar Basawa, Dr. Daniel Hall and Dr. Lynne Seymour for
serving on my committee as well as for their comments and enhancing my professional
development. I am grateful to have spent five years with most knowledgeable professors and
the most friendly staff as well as fellow students, building my solid professional background.
In particular, I would like to thank Dr. Seymour for teaching me Fortran 90 while I was
taking Stat 8060.
v
vi
I would like to express my appreciation to Dr. Robert Lund, Dr. Robert Taylor, Dr.
Tharuvai Sriram and Dr. John Stufken for allowing me to teach in the department of statis-
tics. I would like to Thank Dr. Pike for always wishing me the best.
I am especially appreciative of the support and love of several friends including Mehari,
Thomas, Tesfay, Ron, Musie, Simon, Mebrahtu, Aman, Abel, J. Park, Ross, Archan, Haitao,
Lin Lin, Ghenet, Helen, Dipankar and others who made it easy to live away from home.
I thank my parents for always being there for me. Finally, I would like to express my
sincere thanks to my relatives for their endless love and support. Above all, my highest
gratitude to my God.
I would like to dedicate this dissertation to the memory of my brother, Hagos, who has
passed away because of a tragic accident in 2002.
Table of Contents
Page
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Chapter
1 INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2 LITERATURE REVIEW . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.1 Distribution of failure time . . . . . . . . . . . . . . . . . . 6
2.2 Censoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.4 Shape Restricted Regression . . . . . . . . . . . . . . . . . . 15
3 ESTIMATION OF HAZARD FUNCTION UNDER SHAPE RESTRIC-
TIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.1 Uncensored Sample . . . . . . . . . . . . . . . . . . . . . . . . 27
3.2 Computing the Estimator . . . . . . . . . . . . . . . . . . . . 31
3.3 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.4 Right Censored Sample . . . . . . . . . . . . . . . . . . . . . 43
4 SIMULATION STUDIES AND APPLICATION TO REAL DATA
SETS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
vii
viii
4.1 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . 49
4.2 Application To Real Data Sets . . . . . . . . . . . . . . . . . 52
5 TESTING FOR SHAPE RESTRICTED HAZARD FUNCTION
USING RESAMPLING TECHNIQUES . . . . . . . . . . . . . . . . . 62
5.1 Test Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.2 Resampling Approach . . . . . . . . . . . . . . . . . . . . . . . 66
5.3 Bootstrap based tests . . . . . . . . . . . . . . . . . . . . . . 68
5.4 Simulation Studies and Results . . . . . . . . . . . . . . . . . 70
6 CONCLUSIONS AND FUTURE RESEARCH . . . . . . . . . . . . . 77
6.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
6.2 Bayesian Approach To Shape Restricted Hazard Function 77
6.3 Marginal Estimation of Hazard Function Under Shape
Restriction in Presence of Dependent Censoring . . . . . 79
6.4 Hazard Function Estimation Using Splines Under Shape
Restrictions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
Appendix
A Head and Neck Cancer data for Arm A . . . . . . . . . . . . . . . 89
B Bone Marrow Transplantation for leukemia data . . . . . . . . 92
C Data for Leukemia Survival Patients . . . . . . . . . . . . . . . . . 94
D Generator fans failure data . . . . . . . . . . . . . . . . . . . . . . 97
List of Tables
2.1 Parametric Distributions with increasing and decreasing hazard rates . . . . 9
4.1 Comparison of SRE, Kaplan Meier and kernel estimators using OMSE when
the underlying hazard function is increasing convex. . . . . . . . . . . . . . . 51
4.2 Comparison of SRE, Kaplan Meier and kernel estimators using OMSE when
the underlying hazard function is convex. . . . . . . . . . . . . . . . . . . . . 51
4.3 Comparison of Direct and Weighted approaches for estimating increasing
convex hazard function. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.4 Simulation results of bias and mean square error for SRE, kernel and Kaplan
Meier estimators at 0, 25 and 50 percent censoring with n=25 from increasing
convex hazard function (Weibull distribution with α = 3, λ = 6). . . . . . . . 54
4.5 Simulation results of bias and mean square error for SRE, kernel and Kaplan
Meier estimators at 0, 25 and 50 percent censoring with n=50 from increasing
convex hazard function (Weibull distribution with α = 3, λ = 6). . . . . . . . 55
4.6 Simulation results of bias and mean square error for SRE, kernel and Kaplan
Meier estimators at 0, 25 and 50 percent censoring with n=25 from bathtub
shaped hazard function (exponentiated Weibull distribution with α = 3, λ =
10 and θ = 0.2). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.7 Simulation results of bias and mean square error for SRE, kernel and Kaplan
Meier estimators at 0, 25 and 50 percent censoring with n=50 from bathtub
shaped hazard function (exponentiated Weibull distribution with α = 3, λ =
10 and θ = 0.2). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
ix
x
5.1 Power values for specific values of η, nominal level 0.05, and n =25, 50 and
100 based on log rank (LR), Kolmogorov’s goodness of fit (KS) at 0 and 25
level of censoring. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
5.2 Size-power comparison for shape constrained and unconstrained tests for spe-
cific values of η, nominal level 0.05 based on LR and KS without censoring. . 73
A.1 Survival times (in days) for patients in Arm A of the Head and Neck Cancer
Trial. The 0 denotes observations lost to follow up. . . . . . . . . . . . . . . 90
B.1 Bone Marrow Transplantation for acute lymphoblastic leukemia (ALL) group,
status=0 indicates alive or disease free, and status=1 indicates dead or relapsed.) 93
C.1 Data for Leukemia Patients, status=0 indicates still alive and status=1 indi-
cates dead. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
D.1 Generator fan failure data in thousands of hours of running time; status=1
indicates failure, and status=0 indicates censored. . . . . . . . . . . . . . . . 98
List of Figures
1.1 Typical Hazard Shapes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.1 Examples of fits to scatterplot. (a) The solid curve is convex fit, the dashed
curve is quadratic fit and the dotted curve is the underlying convex func-
tion.(b) The solid curve is convex fit, the dashed curve is linear fit and the
dotted curve is the underlying quadratic function. . . . . . . . . . . . . . . . 26
3.1 Estimation results using percentiles as data. The failure times are quantiles of
exponentiated Weibull distribution with parameters α = 4, η = 1 and λ = 10.
The thin solid curve is the underlying hazard rate, the thick solid curve is
SRE estimate, the dotted curve is kernel estimate, and the dashed curve is
Kaplan Meier estimate. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.2 Estimation results using percentiles as data. The failure times are quantiles
of exponentiated Weibull distribution with parameters α = 3, η = 0.2 and
λ = 10. The thin solid curve is the underlying hazard rate estimate, the
thick solid curve is SRE estimate, the dotted curve is kernel estimate, and the
dashed curve is Kaplan Meier estimate. . . . . . . . . . . . . . . . . . . . . . 42
3.3 Estimation results using percentiles as data. The failure times are quantiles
of a distribution function with quadratic hazard function. The thin curve is
the underlying hazard rate, the thick solid curve is SRE estimate, the dotted
curve is kernel estimate, and the dashed curve is Kaplan Meier estimate. . . 44
xi
xii
3.4 Comparison of Survival functions estimated by different methods. The thin
solid curve is the underlying survival function, the thick solid curve is the
shape restricted estimate, the dotted curve is Kaplan Meier estimate and the
dashed curve is kernel estimate. . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.1 Estimates of hazard rates for the head and neck cancer data based on kernel
(dashed curve), SRE (solid curve) and parametric (dotted curve) estimators. 58
4.2 Estimates of hazard rates for the bone marrow transplantation data based
on SRE (thick solid curve), kernel (dashed curve) and PMLE (dotted curve)
estimators. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.3 Estimates of hazard rates for the Leukemia Survival Data based on SRE (solid
curve), kernel (dotted curve) and Kaplan Meier (short dashed curve) and
PMLE (long dashed curve) estimators. . . . . . . . . . . . . . . . . . . . . . 60
5.1 Graph of hazard function for the model (5.3.1) when α = 6, λ = 10 and η = 1,
0.75 and 0.5 (solid curves) from lowest to highest, η =0.025 and 0.01 (dashed
curves) from lowest to highest, and α = 1, η = 1 (dotted curve). . . . . . . . 70
5.2 Power at selected η values for nominal level 0.05, for log-rank test for 25 ( solid
curve), 50 (dotted curve) and 100 (short dashed curve) sample sizes, while the
long dashed curve represents the nominal level α = 0.05. . . . . . . . . . . . 74
6.1 The edges for convex piecewise quadratic when K=5, with equally spaced knots. 81
6.2 Comparison of SRE and quadratic spline, the failure times are generated from
Weibull distribution with shape and scale parameters 3 and 0.03. The dotted
curve is the underlying hazard rate, the dashed curve is SRE estimate and
the solid curve is shape restricted quadratic spline estimate. . . . . . . . . . 82
Chapter 1
INTRODUCTION
The problem of analyzing time to event data arises in many fields. In the biomedical sci-
ences, the event of interest is most often the time of death of an individual, measured from
the time of disease onset, diagnosis, or the time when a particular treatment was applied. In
social sciences, events of interest might include the timing of arrests, divorces, revolutions,
etc. Time-to-event data are also common in engineering, where the focus is most often on
analyzing the time until a piece of equipment fails. All the above fields use different terms
for the analysis of the occurrence and the timing of events. For example, the terms survival
analysis, event-history analysis and failure-time analysis are used in biomedical, social sci-
ences and engineering, respectively. We will use the term survival analysis throughout this
dissertation.
Let T be the duration of time when the subject is alive or doesn’t fail. In survival
analysis there are three functions that characterize the distribution of T . These are, the
survival function, which is the probability of an individual surviving beyond time t; the
probability density (probability mass) function, which is the unconditional probability of
the event occurring at time t; and the hazard rate (function) which is the probability an
individual dies in the time interval t ≤ T < t + ∆ no matter how small ∆ is, provided that
the individual has survived to time t. If we know one of these functions, then the other two
can be uniquely determined.
1
2
The hazard function is a fundamental quantity in survival analysis. It is also termed
as the failure rate, the instantaneous death rate, or the force of mortality and is defined
mathematically as,
h(t) = lim∆t→0
p(t ≤ T < t + ∆|T ≥ t)
∆t.
The hazard function is usually more informative about the underlying mechanism of
failure than the survival function. For this reason, modeling the hazard function is an impor-
tant method for summarizing survival data. Hazard functions have various shapes, some of
them are increasing, decreasing, constant, bathtub shaped, hump-shaped or possessing other
characteristics. See Figure 1.1 for a picture of typical hazard shapes occurring in practice.
For instance, model (a) has an increasing hazard rate. This may arise when there is natural
aging or wear. Model (b) has a bathtub shaped hazard. Most population mortality data
follow this type of hazard function where, during an early period, deaths result primarily
from infant diseases, after which the death rate stabilizes, followed by an increasing hazard
rate due to the natural aging process. Model (c) has a constant hazard rate. Individuals from
a population whose only risks of death are accidents or rare illness show a constant hazard
rate. Model (d) has a decreasing hazard rate. Decreasing hazard functions are less common
but find occasional use when there is an elevated likelihood of early failure, such as certain
types of electronic devices.
The problem of estimation of hazard function has received considerable attention in the
statistical literature. For discussions of some parametric and nonparametric hazard estima-
tors see Chapter 2. Estimations and inferences based on nonparametric methods have been
shown to be less efficient than those based on suitably chosen parametric models (Miller,
1981).
Hence, in the absence of any distributional assumptions about h(t) other than the shape
constraints to make estimation and related inferences of h(t) based on nonparametric method
3
Time
Haza
rd
0 2 4 6 8 10
0.00.2
0.40.6
0.8
a) increaseb) constant b) bathtubd) decrease
Figure 1.1: Typical Hazard Shapes.
can be even less efficient. So when the only information we have is that the underlying hazard
function is decreasing, increasing, concave, convex or bathtub, the shape restricted estimate
may provide a more acceptable estimate.
In this dissertation, we introduce a new nonparametric method for estimation of hazard
functions under shape restrictions to handle the above problem. This is an important topic
of practical utility because often, in survival analysis and reliability applications, one has a
prior notion about the physical shape of underlying hazard rate function. At the same time,
it may not be safe or appropriate to assume a totally parametric form for it. In such cases,
the prior notion may translate into a restriction on its shape. Furthermore, we show how the
estimation procedures can be used when dealing with right censored data.
4
We also study the problem of testing whether survival times can be modeled by certain
parametric families which are often assumed in applications. Instead of omnibus tests, we
compare hazard rates derived nonparametrically but under similar shape restrictions as
the parametric hazard. We use appropriate resampling-based computation to conduct our
tests since the asymptotic distributions of the test statistics in these problems are largely
intractable.
Estimation and inference for tests involving shape restriction are not easy but methods
for their numerical computation exists (Robertson, Wright, and Dykstra 1988; Fraser and
Massam 1989; Meyer 1999a). We review this issue in detail in Chapter 2, section 2.4.
In our approach, we consider the maximum likelihood technique for estimating the con-
strained hazard function. The shape restricted estimator can be obtained through iteratively
reweighted least squares. This technique has been used in a variety of contexts. Meyer (1999b)
used iteratively reweighted least squares to estimate the maximum likelihood of constrained
potency curve. Meyer and Lund (2003) also applied this technique on time series data for
estimating shape restricted trend models. In addition, Fraser and Massam (1989) applied
the weighted least squares method to obtain the least square estimate of concave regression.
The problem of finding the least square estimator of the concave and convex function
over the constraint space is a quadratic programming problem. There is no known closed
form solution, but it can be obtained by the hinge algorithm of Meyer (1999) or the mixed
primal-dual bases algorithm of Fraser and Massam (1989). These algorithms are given in
section 2.4.
The dissertation is organized as follows: In Chapter 2 we begin with a review of the
literature. We discuss various estimation methods proposed for the hazard rate. This Chapter
also presents a summary review of shape restricted regression and the constraint cone, over
which we maximize the likelihood or minimize the sum of squared errors. In Chapter 3, the
general formulation and some theoretical properties of our method are discussed. Section
5
3.1 deals with construction of the new estimator for uncensored data and section 3.2 deals
with the problem of estimation of hazard function for right censored data. For the right
censored data case, two approaches of obtaining the shape restricted estimator for hazard
are discussed. Simulation results and some real examples are given in Chapter 4. Chapter 5 is
devoted to testing for shape restricted hazard function using resampling technique. Finally,
Chapter 6 deals with future research:
1. Bayesian approaches to the shape restricted hazard function,
2. Marginal estimation of hazard function under shape restriction in presence of depen-
dent censoring, and
3. Hazard function estimation using splines under shape restrictions.
Chapter 2
LITERATURE REVIEW
In this chapter we give basic definitions of functions related to lifetimes. We also review
some pre-existing methods used in the estimation of the hazard function and provide some
background of shape restricted regression.
2.1 Distribution of failure time
Let T denote a nonnegative random variable representing the lifetime of an individual in
some population. Suppose that the lifetime T has the distribution function F and density
f . We would then define the survival function of T as
S(t) = P (T > t) = 1 − F (t).
If T is a continuous random variable, then
h(t) =f(t)
S(t)= lim
∆t→0
p(t ≤ T < t + ∆|T ≥ t)
∆t.
A related quantity is the cumulative hazard function H(t), defined by
H(t) =∫ t
0h(u)du = − log(S(t)).
Thus, for continuous lifetimes we have the following relationships:
1. S(t) = exp(−H(t)) = exp−∫ t0 h(u)du;
2. h(t) = −log S(t)′;
6
7
3. f(t) = −S ′(t);
4. f(t) = h(t) exp−H(t).
2.1.1 Some Parametric Distributions
The models discussed in this section are the most frequently used lifetime models. Reasons
for the popularity of these models include their ability to fit different types of lifetime data
and their mathematical and statistical tractability.
1. Weibull distribution with parameters α and λ
f(t) =α
λ
(
t
λ
)α−1
exp[
−(
t
λ
)α]
h(t) =α
λ
(
t
λ
)α−1
, S(t) = exp[
−(
t
λ
)α]
2. Exponentiated Weibull Family
The exponentiated Weibull distribution with parameters λ, η and α has:
f(t) =αη
λ[1 − exp(−(t/λ)α]η−1 exp (−(t/λ)α) (t/λ)α−1,
S(t) = 1 − [1 − exp (−(t/λ)α)]η ,
h(t) =αη [1 − exp(−(t/λ)α)]η−1 exp (−(t/λ)α) (t/λ)α−1
λ (1 − [1 − exp (−(t/λ)α)]η).
when η = 1 the exponentiated Weibull distribution will be reduced to the familiar
Weibull distribution with scale and shape parameters λ and α, respectively.
3. Gompertz-Makeham distribution with parameters θ, η and α has
f(t) = θeαt exp[−θ
α
(
1 − eαt)
],
h(t) = θeαt, S(t) = e[−θα
(1−eαt)].
8
4. Rayleigh distribution with parameters λ0, and λ1 has
f(t) = (λ0 + λ1t) exp(
−λ0t − 0.5λ1t2)
h(t) = λ0 + λ1t, S(t) = exp(
−λ0t − 0.5λ1t2)
.
5. Pareto distribution with parameters λ, and α has
f(t) =θλθ
tθ+1,
h(t) =θ
t, S(t) =
λθ
tθ.
From the different models we can see that hazard functions can be quite different in functional
form. It is hard to choose the appropriate model from these different parametric models of
no theoretical basis. In the absence of any strong distributional assumptions about h(·) other
than its shape, it may not be appropriate to use a totally parametric form of the hazard func-
tion. For example, the concepts of a distribution functions with increasing hazard function
are useful in engineering applications (Miller, 1981). However, we have many distributions
that have an increasing hazard function; this makes it difficult to select one without an
appropriate theoretical basis (see the Table 2.1). In addition to that, these models are not
capable of giving different shapes of hazard function such as U-shape hazard function, and
bimodal hazard function. For such conditions when the only information available is the
shape (decreasing, increasing, concave, convex or bathtub) of the underlying hazard func-
tion, a new nonparametric estimator that considers shape is introduced in this dissertation
to provide more acceptable estimates.
In Table 2.1 IFR and DFR stands for an increasing hazard rate and a decreasing hazard
rate, respectively.
9
Table 2.1: Parametric Distributions with increasing and decreasing hazard rates
Constant IFR DFRExponential Weibull(α > 1) Weibull (α < 1)
Gamma(α > 1) Gamma (α < 1)Rayleigh (λ > 0) Rayleigh (λ < 0)Gampertz (θ, α > 0)
Pareto (t > θ)
2.2 Censoring
What distinguishes survival analysis from other fields of statistics is that censoring and
truncation are common. A censored observation contains only partial information about the
random variable of interest. In this dissertation we considered the problem of estimating
and testing the constrained maximum likelihood estimator when the data may be subject
to right censoring. Right censoring means that not all of a set of independent survival times
or life times are observed, so that for some of them it is only known that they are larger
than given values. This is the most common type of censoring. Right censoring arises often
in medical studies. For example in clinical trials, patients may enter the study at different
times, then each is treated with one of the several possible therapies. If someone wants to
observe their lifetimes, but censoring occurs when subject is lost to follow up, drops out, dies
due to another cause, or the patient is still alive at the end of the study.
Let T1, T2, . . . , Tn denote iid lifetimes (times to failure) from the continuous distribution
function F , and Z1, Z2, . . . , Zn be the iid corresponding censoring times from continuous
distribution G. The times Ti and Zi are usually assumed to be independent. The observed
random variables are then Xi and δi where Xi = min(Ti, Zi) and δi = I(Ti ≤ Zi). Based on
10
this assumption and the distribution of Z does not involve any parameters of interest, we
derived the maximum likelihood function of the lifetimes in the next section.
2.3 Estimation
2.3.1 Parametric Procedures
Parametric methods rest on the assumption that h(t) is a member of some family of dis-
tributions h(t, θ), where h is known but depends on an unknown parameter θ, possibly
vector-valued. In general, θ is estimated in some optimal fashion, and its estimator θ is used
in h(t, θ) to obtain a parametric estimator of h(t) (Lawless, 1982; Miller, 1981). The Weibull
distribution is considered as illustrative of the parametric approach. Because of its flexibility
the Weibull distribution has been widely used as a model in fitting lifetimes data. Various
problems associated with this distribution have been considered by Cohen (1965) and many
other authors.
The likelihood function:
Here we concentrate on methods based on the likelihood function for a right censored sample.
We derive the general form of the likelihood function. Let T denote a lifetime with distribu-
tion function F , probability density function (pdf) f and survival function Sf ; and Z denote
a random censoring time with distribution function G, pdf g, and survival function Sg.
The derivation of the likelihood is as follows:
P (X = x, δ = 0) = P (Z = x, Z < T ) = P (Z = x, x < T )
= P (Z = x)P (x < T ) = g(x)Sf(x) by independence
P (X = x, δ = 1) = P (T = x, T < Z)
= P (T = x, x < Z) = f(x)Sg(x) by independence
11
Hence, the joint pdf of the pairs (Xi, δi) is a mixed distribution as X is continuous and δ
discrete. It is given by the single expression
P (x, δ) = g(x)Sf(x)1−δ · f(x)Sg(x)δ.
Then the likelihood function of the n iid pairs (Xi, δi) is given by
L =∏
f(xi)Sg(xi)δi · g(xi)Sf(xi)
1−δi
L =n∏
i=1
g(xi)1−δiSg(xi)
δi ·n∏
i=1
f(xi)δiSf(xi)
1−δi .
If the distribution of Z does not involve any parameters of interest, then the first factor
plays no role in the maximization process. Hence, the likelihood function can be taken to be
L =∏
f(xi)δiSf (xi)
1−δi
or
L =∏
h(xi)δiSf(xi) (2.3.1)
since f(xi) = h(xi)Sf(xi).
The log-likelihood function is
ℓ = log(L) =n∑
i=1
δi log h(xi) + log Sf (xi).
Replacing Sf (x) by exp(−H(x)), the log likelihood becomes,
ℓ =n∑
i=1
δi log h(xi) − H(xi) =n∑
i=1
δi log h(xi) −∫ xi
0h(u)du. (2.3.2)
For the uncensored case, all δi = 1, so
ℓ =n∑
i=1
log h(xi) − H(xi) =n∑
i=1
log h(xi) −∫ xi
0h(u)du. (2.3.3)
12
The maximum likelihood estimation for Weibull distribution: The hazard and cumulative
hazard functions of the Weibull distribution are h(t) = (α/λ) (t/λ)α−1 and H(t) = (t/λ)α,
respectively, with unknown scale λ and shape α parameters. The log-likelihood function from
a right censored sample can be written in the following form:
ℓ(λ, α) =n∑
i=1
[δi log h(ti) − H(ti)]
=n∑
i=1
[
δi log
(
α
λ
(
t
λ
)α−1)
−(
t
λ
)α]
=n∑
i=1
[
δi log(
α
λ
)
+ (α − 1)δi log(
tiλ
)
−(
tiλ
)α]
=n∑
i=1
δi log α −n∑
i=1
δiα log λ + (α − 1)n∑
i=1
δi log ti −n∑
i=1
(
tiλ
)α
Taking the first derivative of ℓ with respect to λ and equating it to 0, we obtain
∂ℓ
∂λ= −
αd
λ+ α
(
1
λ
)α+1 n∑
i=1
tαi = 0
λα =1
d
n∑
i=1
tαi (2.3.4)
Similarly, equating the derivative of ℓ with respect to α to 0, gives
∂ℓ
∂α=
d
α− d log λ +
n∑
i=1
δi log ti −(
1
λ
)α n∑
i=1
tαi log(
tiλ
)
= 0 (2.3.5)
Substituting (2.3.4) in (2.3.5), the following equation is obtained,
d
α+
n∑
i=1
δi log ti − d
n∑
i=1(tαi log ti)
n∑
i=1tαi
= 0, (2.3.6)
13
where d is the number of uncensored values.
If the shape parameter α is known, then the maximum likelihood estimator (MLE) of λ
can be obtained explicitly using (2.3.4). However, if α is unknown, then we cannot have an
explicit form of the MLE. Equation (2.3.6) can be solved for α using the Newton-Raphson
iterative method. Then the associated estimator of h(α, λ) is h(α, λ), where α, λ are the
MLEs of α, λ, respectively.
2.3.2 Nonparametric Procedures
Nonparametric procedures, on the other hand, do not require any distributional assumptions
about h(t). Thus, they are more flexible than their parametric counterparts, and as a result
they are widely used in the analysis of failure times (Kouassi and Singh, 1997). For discus-
sions of some nonparametric hazard estimators see Aalen (1978); Cox (1972); Watson and
Leadbetter (1964b); Antoniadis et al. (1999); Liu and Van Ryzin (1984); Ramlau-Hansen
(1983); and Kouassi and Singh (1997). For the present discussion we next review several of
these nonparametric approaches:
a) Kernel Hazard Estimator: Kernel smoothing for general non-parametric function estima-
tion is widely used in statistical applications, particularly for density, hazard and regression
functions. Kernel estimation of the hazard in the uncensored situation was first proposed and
studied by Watson and Leadbetter (1964). Then Ramlau-Hansen (1983), and Tanner and
Wong (1983) extended the idea for right censored data. They described a fixed bandwidth
Kernel-smoothed estimator of the hazard rate function as follows,
h(t) =1
b
n∑
i=1
K(
t − tib
)
δi
n − i + 1(2.3.7)
14
where K(·) is a kernel function, b is the bandwidth which determines the degree of smooth-
ness. In this dissertation the Epanechnikov kernel K(x) = 0.75(1 − x2) for −1 ≤ x ≤ 1 was
used throughout the examples and simulation studies.
b) Kaplan-Meier Type Estimate: Smith (2002), among many authors, discuss the following
estimates of the hazard function. Let ti denote a distinct ordered death time, i = 1, . . . , r ≤ n,
then the hazard rate function is estimated by h(ti) = di/ni and h(t) = di/ni(ti+1 − ti) at
an observed death ti and in the interval ti ≤ t < ti+1, respectively. Here di is the number of
deaths at ith death time and ni is the number of individuals at risk of death at time ti.
c) Semiparametric Approach to Hazard Estimation: Kouassi and Singh (1997) proposed
a mixture of parametric and nonparametric hazard rate estimators, instead of using either
exclusively.
Let
hαt(t, θ) = αth(t, θ) + (1 − αt)h(t), (2.3.8)
where h(t, θ) and h(t) are parametric and nonparametric estimators, respectively and αt is
estimated by minimizing the mean square error of hαt(t, θ).
d) Cox’s Proportional Hazard Model: Introduced by Cox (1972), this approach was devel-
oped in order to estimate the effects of different covariates influencing the times to failure
of a system. The proportional hazards model assumes that the hazard rate of a unit is the
product of an unspecified baseline failure rate, which is a function of time only and a pos-
itive function g(Z, A), independent of time, which incorporates the effects of a number of
covariates. The failure rate of a unit is then given by,
h(t, Z) = h0(t)g(Z, A)
15
where h0 is the baseline hazard rate, Z is a row vector consisting of the covariates, A is a
column vector consisting of the unknown parameters (also called regression parameters) of
the model. It can be assumed that the form of g(Z, A) is known and t is unspecified.
2.4 Shape Restricted Regression
In this section before we introduce our new nonparametric shape restricted estimator, we
review some fundamental concepts that can help us to lay groundwork for the construction
of the shape restricted estimator. The definitions, results, and their proofs along with more
details about the properties of the constraint cone and polar cones can be found in Rockafellar
(1970), Robertson et al. (1988), Fraser and Massam (1989), and Meyer (1999a).
Suppose we have the following model
yi = f(xi) + σǫi, i = 1, · · · , n.
In this model the errors ǫi’s are independent and have standard normal distribution, f ∈ Λ,
and Λ is a class of regression functions sharing a qualitative property such as monotonicity,
convexity or concavity.
The constrained set over which we maximize the likelihood or minimize the sum of squared
errors is constructed as follows: let θi = f(xi) and xi’s are known, distinct and ordered for
1 ≤ i ≤ n. The monotone nondecreasing constraints can be written as
θ1 ≤ θ2 ≤ . . . ≤ θn
If we consider piecewise linear approximations to the regression function with knots at
x−values, the nondecreasing convex, nondecreasing concave and convex shape restrictions
can be written as a set of linear inequality constraints. For example, if we are considering
convex, then we have
16
θ2 − θ1
x2 − x1≤
θ3 − θ2
x3 − x2≤ . . . ≤
θn − θn−1
xn − xn−1.
The constraints for nondecreasing convex can be written as
θ2 − θ1
x2 − x1
≤θ3 − θ2
x3 − x2
≤ . . . ≤θn − θn−1
xn − xn−1
, θ1 ≤ θ2,
and the constraints for nondecreasing concave are given by,
θ2 − θ1
x2 − x1≥
θ3 − θ2
x3 − x2≥ . . . ≥
θn − θn−1
xn − xn−1, θn−1 ≤ θn.
Any of these sets of inequalities defines m half spaces in IRn, and their intersection forms
a closed polyhedral convex cone in Rn. The cone is designated by C = θ : Aθ ≥ 0 for
m × n constraint matrix A (see Rockafellar, 1970, p. 170). For monotone, nondecreasing
convex we have m = n − 1, and for convex m = n − 2.
The nonzero elements of the m × n dimensional A:
1. For monotone constraints, Ai,i = −1 and Ai,i+1 = 1 for 1 ≤ i ≤ n − 1.
2. For nondecreasing convex, A1,1 = −1, A1,2 = 1, Ai,i−1 = xi+1 − xi, Ai,i = xi−1 − xi+1,
and Ai,i+1 = xi − xi−1, for 2 ≤ i ≤ n − 1.
3. For nondecreasing concave, Ai,i = −(xi+2 − xi+1), Ai,i+1 = −(xi − xi+2), Ai,i+2 =
−(xi+1 − xi), An−1,n−1 = −1 and An−1,n = 1 for 1 ≤ i ≤ n − 2.
4. For convex, Ai,i = xi+2−xi+1, Ai,i+1 = xi−xi+2 and Ai,i+2 = xi+1−xi for 1 ≤ i ≤ n−2.
For example if n = 5, the monotone constraint matrix A is given by
A =
−1 1 0 0 0
0 −1 1 0 0
0 0 −1 1 0
0 0 0 −1 1
17
If n = 5 and the x−coordinates are equally spaced, the nondecreasing convex, nondecreasing
concave and convex constraints are given by the following constraint matrices, respectively:
A =
−1 1 0 0 0
1 −2 1 0 0
0 1 −2 1 0
0 0 1 −2 1
,
A =
−1 2 −1 0 0
0 −1 2 −1 0
0 0 −1 2 −1
0 0 0 −1 1
,
and
A =
1 −2 1 0 0
0 1 −2 1 0
0 0 1 −2 1
.
2.4.1 Projection on a closed convex set
The ordinary least-squares regression estimator is the projection of the data vector y on to
a lower-dimensional linear subspace of Rn, whereas the shape restricted estimator can be
obtained through the projection of y on to an m dimensional polyhedral convex cone in
Rn (Meyer, 2003). We have the following useful proposition which shows the existence and
uniqueness of the projection of the vector y on a closed convex set (see Rockafellar, 1970, p.
332 ).
Proposition 1 Let C be a closed convex subset of IRn.
1. For y ∈ IRn and θ ∈ C, the following properties are equivalent:
18
(a) ||y − θ|| = minθ∈C ||y − θ||
(b) 〈y − θ, θ − θ〉 ≤ 0 for all θ ∈ C
2. For every y ∈ IRn, there exists a unique point where θ ∈ C satisfies (a) and (b). θ is
said to be the projection of y onto C,
where the notation 〈y, x〉 =∑
xiyi refers to the vector inner product of x and y. If C is
also a cone, it is easy to see that (b) of Proposition 1 becomes
〈y − θ, θ〉 = 0 and 〈y − θ, θ〉 ≤ 0, ∀θ ∈ C,
which are the necessary and sufficient conditions for θ to minimize ||y − θ||2 over C (see
Robertson et al. 1988, p. 17).
For monotone regression there is a closed form solution, (see Robertson et al. 1988, p.23).
As for nondecreasing convex, nondecreasing concave and convex regression, the problem of
finding the least-squares estimator θ is a quadratic programming problem. There is no known
closed-form solution. But θ can be found using the mixed primal-dual bases algorithm (Fraser
and Massam, 1989) or the hinge algorithm (Meyer, 1999a).
2.4.2 Constraint Cone
Let V be the space spanned by 1 = (1, . . . , 1)T for a monotone, nondecreasing convex,
and nondecreasing concave, and let V be linear space spanned by 1 = (1, . . . , 1)T and
x = (x1, . . . , xn)T for convex regression. Note that V ⊂ C and V is perpendicular to the
rows of the corresponding constraint matrix.
Let Ω be the set such that Ω = C ∩ V ⊥, where V ⊥ is the orthogonal complement of
V . This implies C = Ω ∪ V . We refer to Ω as the “constraint cone”. By partitioning C
into two orthogonal spaces Ω and V , the projection of a vector y ∈ Rn onto C is the sum
19
of the projection of y onto Ω and V , which simplifies the computation. Besides, the edges
of Ω are unique up to multiplicative factor. The edges are a set of vectors in the constraint
cone such that any vector in Ω can be written as nonnegative linear combination of edges,
and no edge is itself a nonnegative linear combination of other edges. For a more detailed
discussion, see Meyer (1999) or Fraser and Massam (1989).
2.4.3 Edges of constraint cone and Polar cone
The constraint space can be specified by a set of linearly independent vectors δ1, . . . , δm.
So that Ω = θ : θ =∑m
j=1 bjδj : b1, . . . , bm ≥ 0 and the constraint set C = θ : θ =
∑mj=1 bjδ
j + ν : b1, . . . , bm, bj ≥ 0 and ν ∈ V , where m = n − 1 for monotone,
nondecreasing concave, nondecreasing convex and m = n − 2 for convex.
For example, if Ω is the set of all nondecreasing concave, nondecreasing convex, or convex
vectors in IRn, it can be specified using the vectors δj . The vectors δj can be obtained from
the formula ∆′ = (AA′)−1A = [δ1, . . . , δm]′.
For n = 5 and equally spaced x values , ∆′ is given by:
for convex,
∆′ =
2 −2 −1 0 1
4 −1 −6 −1 4
1 0 −1 −2 2
,
nondecreasing convex,
∆′ =
−10 −5 0 5 10
−6 −6 −1 4 9
−3 −3 −3 2 7
−1 −1 −1 −1 4
,
nondecreasing concave,
20
∆′ =
−4 1 1 1 1
−7 −2 3 3 3
−9 −4 1 6 6
−10 −5 0 5 10
,
and monotone
∆′ =
−4 1 1 1 1
−3 −3 2 2 2
−2 −2 −2 3 3
−1 −1 −1 −1 4
.
For convenience of presentation, the smallest possible multiplicative factors are chosen so that
all entries of ∆ are integers. Any convex vector θ ∈ C is a nonnegative linear combination
of the columns of the corresponding ∆ plus a linear combination of 1 and x.
If C is the set of all convex vectors in IRn we can also define the vectors δj to be the
rows of the following matrix:
0 0 x3−x2
xn−x2· · · · · · xn−1−x2
xn−x21
0 0 0 x4−x3
xn−x3· · · xn−1−x3
xn−x31
......
......
......
0 · · · · · · · · · · · · 0 1
1 · · · · · · · · · · · · 1 1
x1 · · · · · · · · · · · · xn−1 xn
For a large data set it is better to use the above vectors δj because the previous method of
obtaining the edges is computationally intensive. Another advantage is that the computations
of the inner products with the second approach are faster because of all the zero entries in
the vectors.
21
The polar cone of the constraint cone Ω is (Rockafellar, 1979, p. 121)
Ω0 = ρ : 〈ρ, θ〉 ≤ 0, ∀θ ∈ Ω .
Geometrically, the polar cone is the set of points in Rn which make an obtuse angle with all
points in Ω.
Let us note some straightforward properties of Ω0:
1. Ω0 is a closed convex cone
2. The only possible element in Ω⋂
Ω0 is 0,
3. γ1, . . . , γm ∈ Ω0.
where γj is negative rows of A, i.e., [γ1, . . . , γm] = −A′. The relationship between δj and
γi is (Fraser and Massam, 1989)
〈δj , γi〉 =
−1 if i = j
0 if i 6= j
These vectors are generators of the polar cone. That is, each ρ ∈ Ω0 can be written as a
nonnegative linear combination of the γj’s. To see this, let K be the cone generated by γi,
i.e., each κ ∈ K can be written as a nonnegative linear combination of the γi,
K = κ : κ =m∑
i=1
aiγi, ai ≥ 0,
then for any θ ∈ Ω, we have
〈θ, κ〉 =m∑
i=1
ai〈θ, γi〉 ≤ 0, ∀κ ∈ K.
This shows that Ω ⊆ K0, where K0 is the polar cone of K. For any ζ ∈ K0, we have
〈ζ, γi〉 ≤ 0, i = 1, · · · , m,
which shows that K0 ⊆ Ω. Therefore, Ω = K0. Since K00 = K (Rockafellar, 1970, p.121),
we have Ω0 = K00 = K.
22
Faces and Sectors
The faces of the constraint cone are constructed by subsets of the constraint cone edges.
Any subset J ⊆ 1, · · · , m defines a face of the constraint cone; i.e., a face consists of all
nonnegative linear combinations of constraint cone edges δj , j ∈ J . Note that Ω itself is a
face for J = 1, · · · , m. The subsets J also define sectors which are themselves a polyhedral
convex cone.
Let the sector CJ be the set of all y′s in IRn such that
y =∑
j∈J
bjδj +
∑
j /∈J
bjγj + ν (2.4.1)
where bj ≥ 0 for j ∈ J ; bj > 0 for j /∈ J , ν ∈ V .
The CJ partition Rn, with J = ∅ corresponding to the interior of the polar cone,
and the sector with J = 1, 2, · · · , m coinciding with the constrained cone. Further, the
representation of y ∈ CJ given in (2.4.1) is unique (Meyer 1999).
The following propositions are useful tools for finding the constrained least squares esti-
mator. Their proofs are discussed indetail by Meyer (1999a).
Proposition 2 Given y ∈ IRn such that y =∑
j∈Jbjδ
j +∑
j /∈Jbjγ
j + ν, the projection of y
onto the constraint set Ω is
θ =∑
j∈J
bjδj + ν. (2.4.2)
and the residual vector ρ = y − θ =∑
j /∈Jbjγ
j is the projection of y onto the polar cone Ω0.
Proposition 3 If y ∈ CJ , then θ is the projection of y onto the linear space spanned by
the vectors δj, j ∈ J , plus the projection of y onto V . Similarly, ρ is the projection of y
onto the linear space spanned by the vectors γj , j /∈ J .
If the set J is determined, using Propositions 2 and 3, the constrained least squares
estimate, θ, can be found through ordinary least-squares regression (OLS), using ν ∈ V
23
and δj for j ∈ J as regressors. Alternatively, ρ can be obtained through OLS using γj, for
j /∈ J as regressors, then θ = y − ρ. To find the set J and θ, Fraser and Massam (1989),
and Meyer (1999) proposed the mixed primal-dual bases algorithm and the hinge algorithm,
respectively. The method chosen in this paper is the hinge algorithm for it is fast, useful for
iterative projection algorithm and computationally more efficient.
2.4.4 The hinge algorithm
This algorithm uses a set of vectors δ1, · · · , δm and ν to characterize the constraint space.
The algorithm finds θ by finding J through a series of guesses Jk. At a typical iteration,
the current estimate θk can be obtained by the least-squares regression of y on the δj , for
j ∈ Jk and ν. We call δj the “hinges” since for the convex regression problem, the points
(xj , θj), j ∈ J , are the bending points at which the line segments change slope, and there is
only one way that the bends are allowed to go. The initial guess J0 is set to be empty.
The algorithm can be summarized in four steps:
1. Using ν as regressors to obtain a least-squares estimate θ0, for a convex, ν = 1, x
and for monotone, nondecreasing convex and nondecreasing concave, ν = 1.
Loop
2. At the kth iteration, compute 〈y−θk, δj〉 for each j /∈ Jk. If these are all non-positive,
then stop. If not, then add the vector δj to the model for which this inner product is
largest.
3. Get the least-squares fit with the new set of δ-vectors.
4. Check to see if the regression function satisfies the constraints on the coefficients, i.e.
is bj ≥ 0, for j ∈ J and j /∈ J0
24
. If yes, go to step 2.
. If no, choose the hinge with the largest negative coefficient and remove it from
the current set J . Go to step 3.
At each stage, the new hinge is added where it is “most needed”, and other hinges are
removed if the new fit does not satisfy the constraints. It is clear that if the algorithm ends,
it gives the correct solution and the algorithm does end. See Meyer (1999) for proof.
2.4.5 The mixed primal-dual bases algorithm
The mixed primal-dual bases algorithm is used to find the projection onto a closed convex
cone. In this algorithm, the γj ’s are the primal vectors and δj ’s are the dual vectors. The
mixed primal-dual bases algorithm finds the correct set J by moving along a line segment
connecting the point z0 =m∑
j=1δj with z, where z is the projection of the data y on the
subspace spanned by δj, j = 1, · · · , m. At the kth iteration, the point zk on the line segment
is reached, such that the distance between zk and z is strictly decreasing in k. This point is
also on a face of ΩJk. The next iteration finds zk+1 farther along the segment, on a face of
ΩJk+1. At the beginning of the iteration, both z and zk are expressed in the basis defined
by Jk, such as
z =∑
j∈Jk
bjδj +
∑
j /∈J k
bjγj ,
and
zk =∑
j∈J k
ajδj +
∑
j /∈J k
ajγj ,
where aj ≥ 0 for j ∈ Jk and aj > 0 for j /∈ Jk. If bj ≥ 0 for j ∈ Jk and bj > 0 for j /∈ Jk, the
algorithm stops. Otherwise, find
zk+1 = zk + αk+1(z − zk),
25
where αk+1 ∈ (0, 1) is as large as possible while the coefficients of zk+1 are all positive or
nonnegative as they are in Jk or not, respectively. The point zk+1 is on the face of ΩJ k,
which divides ΩJkand ΩJk+1
. The algorithm terminates at the face of the sector containing
z. It clearly takes a finite number of iterations since there are a finite number of sectors.
Example of Shape Restricted Regression
The following are two examples of shape restricted fit. In Figure 2.1 (a), the data were
generated from convex function f(xi) = 2xi+1/xi with independent zero-mean normal errors,
and fitted by convex and quadratic regressions. The solid curve is convex fit, the dashed curve
is quadratic fit and the dotted curve is the underlying convex function. In Figure 2.1 (b)
the data were generated from quadratic functions f(xi) = x2i with independent zero-mean
normal errors, and fitted by convex and linear regressions. The solid curve is convex fit, the
dashed curve is linear fit and the dotted curve is the underlying quadratic function. For both
cases, it can be clearly seen that the shape restricted regressions fit the data better.
26
•
•
•• •
•
•
•
•
•
•
•
•
••
•
• •
•
•
••
•
• •
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
X
Y
0.0 0.2 0.4 0.6 0.8 1.0
−20
24
68
Convex
•
•
• •
• •
•
•
•
•
•
•
•
•
•
•
• •
•
•
••
•
• •
•
•
•
•
•
•
•
•
•
•
•
•
••
X
Y
0.0 0.5 1.0 1.5 2.0
−20
24
Quadratic
Figure 2.1: Examples of fits to scatterplot. (a) The solid curve is convex fit, the dashed curveis quadratic fit and the dotted curve is the underlying convex function.(b) The solid curveis convex fit, the dashed curve is linear fit and the dotted curve is the underlying quadraticfunction.
Chapter 3
ESTIMATION OF HAZARD FUNCTION UNDER SHAPE RESTRICTIONS
In this chapter, we introduce a new nonparametric method for estimation of hazard
function that imposes shape restrictions on the hazard function, such as increasing, concave,
convex, nondecreasing concave or nondecreasing convex or concave-convex. We derive shape
restricted estimator of hazard rate based on maximum likelihood method from uncensored
and right censored samples. We also examine how the estimated hazard function behaves
for a Weibull distribution, an exponentiated Weibull distribution and a distribution with a
polynomial hazard function, with different parameters using the new estimator and some
pre-existing estimators.
3.1 Uncensored Sample
Suppose X1, X2, . . . , Xn be a random sample of lifetimes from the distribution with density
f , and let F and S = 1 − F be the corresponding distribution and survival functions,
respectively. The associated hazard rate is h = f/S for F (x) < 1. The problem is to estimate
f , S or h by maximizing
log(
∏
f(xi))
=n∑
i=1
log f(xi)
subject to h ∈ Λ where Λ is a class of hazard functions sharing a qualitative property such
as monotonicity, convexity, or concavity.
Let 0 = x0 < x1 < . . . < xn be the order statistics of random sample of lifetimes,
27
28
recall that f(x) can be written as
f(x) = h(x)S(x) = h(x)exp−∫ x
0h(u) du,
then the log-likelihood function is
ℓ =n∑
i=1
log f(xi) =n∑
i=1
log h(xi) −n∑
i=1
∫ xi
0h(u) du. (3.1.1)
3.1.1 Numerical Integration
If h(t) is approximated by a piecewise linear function with knots at the data, the integral of
h(t) is the sum of trapezoid areas, and (3.1.1) becomes,
ℓ =n∑
i=1
log h(xi) −n∑
i=1
i∑
j=1
1
2[h(xj) + h(xj−1)](xj − xj−1).
Expanding the summation, the expression can be simplified to the following,
ℓ =n∑
i=1
log h(xi) −n∑
i=1
cih(xi), (3.1.2)
where the ci depend on the xj . They can be derived by applying the trapezoidal rule to each
segment and summing the results as follows:
n∑
i=1
∫ xi
0h(u)du = x1
h(0) + h(x1)
2
+x1h(0) + h(x1)
2+ (x2 − x1)
h(x1) + h(x2)
2
+x1h(0) + h(x1)
2+ (x2 − x1)
h(x1) + h(x2)
2+ (x3 − x2)
h(x3) + h(x2)
2...
+x1h(0) + h(x1)
2+ (x2 − x1)
h(x1) + h(x2)
2+ . . . + (xn − xn−1)
h(xn) + h(xn−1)
2
29
Collecting h(xi) terms and simplifying yields the following:
2n∑
i=1
∫ xi
0h(u)du = nx1h(0)
+(x1 + (n − 1)x2)h(x1)
+(x2 + (n − 2)x3 − (n − 1)x1)h(x2)
+(x3 + (n − 3)x4 − (n − 2)x2)h(x3)
...
+(xn − xn−1)h(xn) (3.1.3)
Note that h(0) must be a function of the elements of the vector (h(x1), · · · , h(xn)) in
accordance with shape restrictions. For example, if we are assuming an increasing hazard
function, it is clear that h(0) = 0 is the choice that satisfies the shape restriction and
maximizes the likelihood. If h is constrained to be convex, then we define
h(0) = max0,h(x1)x2
x2 − x1−
h(x2)x1
x2 − x1 (3.1.4)
as the choice that preserves the convex shape and maximizes the likelihood over the assump-
tions. If (h(x1)x2/(x2 − x1) − h(x2)x1/(x2 − x1)) > 0 then plugging Equation. (3.1.4) into
Equation (3.1.3) gives
2n∑
i=1
∫ xi
0h(u)du = nx1(
h(x1)x2
x2 − x1−
h(x2)x1
x2 − x1)
+(x1 + (n − 1)x2)h(x1)
+(x2 + (n − 2)x3 − (n − 1)x1)h(x2)
+(x3 + (n − 3)x4 − (n − 2)x2)h(x3)
30
...
+(xn − xn−1)h(xn)
(3.1.5)
Finally, taking the coefficients of each h(xi) gives ci.
c1 =1
2
(
x1 + (n − 1)x2 +nx1x2
x2 − x1
)
, (3.1.6)
c2 =1
2
(
x2 + (n − 2)x3 − (n − 1)x1 −nx2
1
x2 − x1
)
,
ci =1
2(xi + (n − i)xi+1 − (n − i + 1)xi) ,
cn =1
2(xn − xn−1) ,
for 3 ≤ i ≤ n − 1.
On the other hand, if (h(x1)x2/(x2 − x1) − h(x2)x1/(x2 − x1)) ≤ 0, then ci are given by,
c1 =1
2(x1 + (n − 1)x2) , (3.1.7)
ci =1
2(xi + (n − i)xi+1 − (n − i + 1)xi) ,
cn =1
2(xn − xn−1) ,
for 2 ≤ i ≤ n − 1.
For concave and nondecreasing convex, h(0) is given by,
h(0) = min
[
h(x1), max0,h(x1)x2
x2 − x1−
h(x2)x1
x2 − x1
]
.
31
3.2 Computing the Estimator
Let θi = h(xi). Then the log-likelihood of the survival function given in expression (3.1.2)
will become ℓ(θ) =∑n
i=1 log(θi) −∑n
i=1 ciθi. The shape restrictions can be written as a set
of linear inequality constraints, as with shape restricted regression. Then shape constraints
for h(t) can be imposed by restricting θ to be in the closed convex polyhedral cone in IRn
defined by C = θ : Aθ ≥ 0 for an m×n constraint matrix A, where A is one of constraint
matrices given in section 2.4.
Weighted Least Squares and Constrained Maximum Likelihood
In this section before the discussion of the construction of the shape restricted estimator,
the basic idea of weighted least squares is reviewed. As we have seen in section 2.4, the least
squares estimator, θ, of θ is the projection of y onto the cone C with the smallest Euclidean
distance from y.
Let
yi = θi + ǫi, i = 1, · · · , n
where ε ∼ N(0, σ2Σ), Σ = diag(1/w1, · · · , 1/wn), and θi = f(xi). The θ which minimizes
the sum of squares
n∑
i=1
wi (yi − θi)2
over all θ ∈ C, is called the weighted projection of y onto C with weights w.
The solution of the weighted least squares, θ, is characterized by
n∑
i=1
wi(yi − θi)θi = 0,
n∑
i=1
wi(yi − θi)θi ≤ 0,
32
for all θ ∈ C.
In other words, the constrained weighted least squares estimator θ is found by minimizing∥
∥
∥y − θ∥
∥
∥
2under the restriction C∗ = θ : Aθ ≥ 0. Using the methods given in sec-
tion 2.4, where y = Σ−1/2y, θ = Σ−1/2θ, A = AΣ1/2. Then the inverse transformation
θ = Σ1/2θ provides the solution. This projection in the transformed space can be found
using primal-dual base algorithm of Fraser and Massam (1989).
The method for maximizing ℓ over C involves a sequence of iteratively reweighted least
squares estimates. So the MLE is found by iteratively projecting onto the cone, using an
efficient projection algorithm involving the generators of the cone. Since ℓ is strictly con-
cave and C is a closed convex set, hence a maximum likelihood estimate, θ exists. It is
characterized by Kuhn-Tucker conditions:
∇ℓ(θ)′
θ = 0 (3.2.1)
∇ℓ(θ)′
θ ≤ 0 (3.2.2)
for all θ ∈ C, where ∇ℓ(θ) =(
1/θ1 − c1, . . . , 1/θn − cn
)′. We can rewrite conditions (3.2.1)
and (3.2.2) in the following form:
∇ℓ(θ)′
θ =n∑
i=1
(
1
θi
− ci
)
θi = 0
∇ℓ(θ)′
θ =n∑
i=1
(
1
θi
− ci
)
θi ≤ 0.
We write the Kuhn-Tucker conditions (3.2.1) and (3.2.2) in a form to facilitate iteratively
reweighted least squares as follows:
∇ℓ(θ)′
θ =n∑
i=1
wi(yi − θi)θi = 0
33
∇ℓ(θ)′
θ =n∑
i=1
wi(yi − θi)θi ≤ 0
where wi = ci/θi, yi = 1/ci. and the ci are given by (3.1.6) or (3.1.7) for convex constraint
depends of the value of θ0. Hence, weighted least squares can be used if ci > 0 for i = 1, · · · , n.
The problem of finding the estimator θ over C is an iterative quadratic programming
problem, it can be found using primal-dual base algorithm of Fraser and Massam (1989)
or hinge algorithm of Meyer (1999). The algorithm starts with an initial guess θ0 ∈ C.
The point θ1 is found by moving in the direction of the projection of y on C with weights
w0i = ci
θ0 , so that θ1 is the point along the path between θ0 and θ
0that maximizes ℓ,
where θ0
is the projection of y on C with weights w0i = ci
θ0 . Then θ2 is found using weights
w1i = ci
θ1 and the algorithm continues in this way until conditions (3.2.1) and (3.2.2) are
satisfied. The proof of the convergence of the algorithm is in Proposition 4. Before proving
the convergence of the algorithm, we make use of the following lemma.
Lemma 1 Let S = θ|θ ∈ Rn, ℓ(θ) ≥ ℓ(θ0) and θ0 ∈ Rn be fixed, then S is convex and
compact set.
Proof. i) For the convex part, we want to show that for any θ1, θ2 ∈ S,
λθ1 + (1 − λ)θ2 ∈ S for all λ ∈ [0, 1], that is ℓ(
λθ1 + (1 − λ)θ2)
≥ ℓ(θ0). By concavity of ℓ
for all λ ∈ [0, 1] and θ1, θ2 ∈ S, we have
ℓ(
λθ1 + (1 − λ)θ2)
≥ λℓ(θ1) + (1 − λ)ℓ(θ2)
≥ λℓ(θ0) + (1 − λ)ℓ(θ0)
= ℓ(θ0)
34
Hence λθ1 + (1 − λ)θ2 ∈ S and S is convex.
ii) To prove the compactness of S, we want to show that S is a closed and bounded
set.
Let θm ∈ S such that θm → θ. Since ℓ is continuous, we have ℓ(θ) = limm→∞
(θm) ≥ ℓ(θ0)
Hence, S is closed.
To show that S is bounded, suppose there exist θ^k ∈ S such that ‖θ^k‖ → ∞, where θ^k = (θ_1^k, …, θ_n^k)′. This implies that there is at least one θ_j^k such that |θ_j^k| → ∞. Now write

ℓ(θ^k) = ∑_{i≠j} (log θ_i^k − c_i θ_i^k) + log θ_j^k − c_j θ_j^k ≤ ∑_{i≠j} (log(1/c_i) − 1) + log θ_j^k − c_j θ_j^k. (3.2.3)

For the first term on the right-hand side of the latter inequality, we use the relation log θ_i − c_i θ_i ≤ log(1/c_i) − 1, since log θ_i − c_i θ_i attains its maximum value at θ_i = 1/c_i. As |θ_j^k| → ∞, we have log θ_j^k − c_j θ_j^k → −∞, since the c_i are positive (see Lemma 2) and the right-hand side is dominated by the −c_j θ_j^k term. It follows that ℓ(θ^k) → −∞, which contradicts ℓ(θ^k) ≥ ℓ(θ^0). Hence S is bounded, and therefore S is compact.
Proposition 4 The algorithm defined above converges; i.e., θ^k → θ̂ as k → ∞.

Proof: The proposition will follow if we show that θ̂ is the only fixed point of the algorithm, that ℓ is strictly increasing at θ^k in the direction of θ^{k+1} except at θ^k = θ̂, and that the θ^k fall in a compact set. As a result, all subsequences of the sequence {θ^k} converge to θ̂.
Let G(θ) represent the mapping of the algorithm; that is, G(θ^k) = θ^{k+1}. Let a^{k+1} be the projection of y onto C with weights w_i^k = c_i/θ_i^k, i = 1, …, n, and let θ^{k+1} be the maximum of ℓ along the line segment connecting θ^k with a^{k+1}. Since ℓ is strictly concave over the line segment, a unique maximum exists. It can easily be seen that G has only one fixed point: if G(θ^k) = θ^k, then (3.2.1) and (3.2.2) hold, and by uniqueness of the constrained maximum, θ^k = θ̂.
The log-likelihood is increasing in the direction of a^{k+1}, with strict inequality if θ^k ≠ θ̂. Since a^{k+1} is the projection,

∑ w_i^k (y_i − a_i^{k+1})^2 ≤ ∑ w_i^k (y_i − θ_i)^2   for all θ ∈ C,

and

∑ w_i^k (y_i − a_i^{k+1})^2 = ∑ w_i^k (y_i − θ_i^k)^2 + ∑ w_i^k (θ_i^k − a_i^{k+1})^2 + 2 ∑ w_i^k (y_i − θ_i^k)(θ_i^k − a_i^{k+1}).

So ∑ w_i^k (y_i − θ_i^k)(θ_i^k − a_i^{k+1}) ≤ 0, or ∇ℓ(θ^k)′(θ^k − a^{k+1}) ≤ 0, with strict inequality if θ^k ≠ a^{k+1}, i.e., θ^k ≠ θ̂.
Now let S̄ = S ∩ C; using Lemma 1, it is straightforward to show that S̄ is compact. By compactness of S̄, there exist a subsequence θ^{k_n} and a point θ^a ∈ C such that θ^{k_n} → θ^a. If G(θ^a) = θ^b ≠ θ^a, then ℓ(θ^b) > ℓ(θ^a), so for large enough n, ℓ(θ^{k_n + 1}) > ℓ(θ^a), which contradicts the fact that the likelihood increases in k. Therefore all subsequences converge to the same point, which must be θ̂. This completes the proof of the proposition.
Requirements for the Coefficients of the θi
An important point should be made concerning the sign of the c_i. If one of the c_i is negative, then the iteratively reweighted least squares method cannot be employed to find the estimator. For an increasing hazard function it can easily be shown that the c_i are positive, but for the convex, concave, and increasing convex constraints there is a possibility of obtaining a negative coefficient for h(x_2). The proof is given in the next lemma.
Lemma 2 Let c_i be given by (3.1.6) for convex constraints. Then c_i is positive for 1 ≤ i ≤ n and i ≠ 2; however, c_2 can be negative.
Proof. To show that c_i ≥ 0 for i ≥ 3, recall that c_i is given by

c_i = x_i + (n − i)x_{i+1} − (n − i + 1)x_{i−1},

where x_1, x_2, …, x_n denote the ordered values of the random sample of lifetimes. From x_i ≤ x_{i+1} it follows that

(n − i)x_{i+1} + x_i ≥ (n − i)x_i + x_i = (n − i + 1)x_i ≥ (n − i + 1)x_{i−1},

hence x_i + (n − i)x_{i+1} − (n − i + 1)x_{i−1} ≥ 0. For c_1 it is straightforward to show that c_1 = x_1 + (n − 1)x_2 + n(x_1 x_2)/(x_2 − x_1) ≥ 0. Therefore, Lemma 2 holds.
Using the same argument as in Lemma 2, it can easily be shown that for an increasing hazard function c_i ≥ 0 for 1 ≤ i ≤ n.
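A quick numerical check of the lemma for 3 ≤ i ≤ n − 1 (the displayed formula involves x_{i+1}, so the last index is excluded here; the simulated sample is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.sort(rng.weibull(3.0, size=20) * 6.0)      # ordered lifetimes (illustrative)
n = len(x)
i = np.arange(3, n)                               # 1-based indices 3, ..., n-1
# c_i = x_i + (n - i) x_{i+1} - (n - i + 1) x_{i-1}
c = x[i - 1] + (n - i) * x[i] - (n - i + 1) * x[i - 2]
assert np.all(c >= 0)                             # Lemma 2: nonnegative for i >= 3
```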
However, c_2 can be negative for the convex constraint. In this case θ_0 = h(0) is given by linear extrapolation using the points x_1 and x_3, that is,

θ_0 = θ_1 x_3/(x_3 − x_1) − θ_3 x_1/(x_3 − x_1), (3.2.4)

and the likelihood function is maximized by forcing θ_2 to be collinear with θ_1 and θ_3. For c_2 < 0, the log-likelihood obtained by this method is not less than the one given by (3.1.2); the proof is given in Proposition 5.
Let θ̃_2 be given by linear interpolation, i.e.,

θ̃_2 = [(x_2 − x_1)/(x_3 − x_1)] θ_3 + [(x_3 − x_2)/(x_3 − x_1)] θ_1, (3.2.5)

and let θ̃_i = θ_i for i ≠ 2. Furthermore, let ℓ̃(θ̃) represent the log-likelihood function maximized over C subject to (3.2.4) and (3.2.5). Hence ℓ̃(θ̃) can be written as

ℓ̃(θ̃) = ∑_{i=1}^{n} log θ̃_i − ∑_{i≠2} c̃_i θ̃_i, (3.2.6)

where the c̃_i are obtained by substituting (3.2.4) and (3.2.5) into (3.1.5) and then collecting the coefficient of each θ̃_i for 1 ≤ i ≤ n.
Now, in order to apply iteratively reweighted least squares, the new coefficients c̃_i must be positive, but there is a possibility of obtaining a negative coefficient for θ_3. In that scenario, θ_0 is given by linear extrapolation using x_1 and x_4, and the likelihood is maximized with θ_2 and θ_3 collinear with θ_1 and θ_4. This technique continues until the coefficient of h(x_k) is positive for some 5 ≤ k ≤ n − 1: after h(0) is replaced by the linear extrapolation using the points x_1 and x_k, the values θ_1, θ_2, …, θ_k are assumed to be collinear across x_1, x_2, …, x_k, and h(x_j) for 2 ≤ j ≤ k − 1 is given by linear interpolation. If c̃_k < 0, the likelihood maximized over C subject to θ_1, …, θ_{k−1}, θ_k being collinear across x_1, …, x_k is again no less than (3.1.2); the proof for k ≥ 4 proceeds in a similar way, using Proposition 5 and recursive arguments.
Proposition 5 Let θ be any vector in C, and let c ∈ R^n be such that c_2 ≤ 0. Define θ̃_1 = θ_1 and θ̃_i = θ_i for 3 ≤ i ≤ n, and let θ̃_1, θ̃_2, and θ̃_3 be collinear, with θ̃_2 given by (3.2.5). Then ℓ̃(θ̃) ≥ ℓ(θ).
Proof: We first show that ℓ(θ̃) ≥ ℓ(θ), where ℓ(θ) = ∑_{i=1}^{n} log θ_i − ∑_{i=1}^{n} c_i θ_i; we then show that ℓ̃(θ̃) = ℓ(θ̃).

Substituting (3.2.5) into (3.1.2), i.e., ℓ(θ̃) = ∑_{i≠2} log θ̃_i + log θ̃_2 − ∑_{i≠2} c_i θ̃_i − c_2 θ̃_2, and taking the difference of ℓ(θ̃) and ℓ(θ), we obtain

ℓ(θ̃) − ℓ(θ) = log θ̃_2 − log θ_2 − c_2 θ̃_2 + c_2 θ_2 = log(θ̃_2/θ_2) + c_2(θ_2 − θ̃_2) ≥ 0,

since θ̃_2 ≥ θ_2 by convexity and c_2 ≤ 0. It follows that

ℓ(θ̃) ≥ ℓ(θ). (3.2.7)
Next, to show that ℓ̃(θ̃) = ℓ(θ̃), let a_1 = x_1 + (n − 1)x_2 and a_i = x_i + (n − i)x_{i+1} − (n − i + 1)x_{i−1} for 2 ≤ i ≤ n. After substituting (3.2.4) and (3.2.5), the coefficients can be expressed as follows: c̃_i = c_i = a_i for 4 ≤ i ≤ n,

c̃_1 = a_1 + n x_1 x_3/(x_3 − x_1) + [(x_3 − x_2)/(x_3 − x_1)] a_2,

c_1 = a_1 + n x_1 x_2/(x_2 − x_1) + [(x_3 − x_2)/(x_3 − x_1)] c_2,

c_2 = a_2 − n x_1^2/(x_2 − x_1),

c_3 = a_3 + [(x_2 − x_1)/(x_3 − x_1)] c_2,

c̃_3 = a_3 − n x_1^2/(x_3 − x_1) + [(x_2 − x_1)/(x_3 − x_1)] a_2,

where the c_2 contributions to the coefficients of θ̃_1 and θ̃_3 arise from substituting (3.2.5) for θ̃_2. Since the log terms of ℓ̃(θ̃) and ℓ(θ̃) coincide, it suffices to show that c̃_1 = c_1 and c̃_3 = c_3. Replacing c_2 by a_2 − n x_1^2/(x_2 − x_1), so that a_2 − c_2 = n x_1^2/(x_2 − x_1), we obtain

c̃_1 − c_1 = n x_1 x_3/(x_3 − x_1) − n x_1 x_2/(x_2 − x_1) + n x_1^2 (x_3 − x_2)/[(x_3 − x_1)(x_2 − x_1)]

= [n x_1 x_3 (x_2 − x_1) − n x_1 x_2 (x_3 − x_1) + n x_1^2 (x_3 − x_2)]/[(x_3 − x_1)(x_2 − x_1)]

= [n x_1^2 (x_2 − x_3) + n x_1^2 (x_3 − x_2)]/[(x_3 − x_1)(x_2 − x_1)] = 0,

and

c̃_3 − c_3 = −n x_1^2/(x_3 − x_1) + [(x_2 − x_1)/(x_3 − x_1)] · n x_1^2/(x_2 − x_1) = −n x_1^2/(x_3 − x_1) + n x_1^2/(x_3 − x_1) = 0.

Therefore,

ℓ̃(θ̃) = ℓ(θ̃). (3.2.8)
Combining (3.2.7) and (3.2.8) completes the proof of the proposition.
3.3 Examples
3.3.1 Increasing and convex hazard functions from the exponentiated Weibull distribution
Let the underlying hazard function be given by

h(t) = αη [1 − exp(−(t/λ)^α)]^{η−1} exp(−(t/λ)^α) (t/λ)^{α−1} / (λ (1 − [1 − exp(−(t/λ)^α)]^η)),
where λ is a scale parameter and α and η are shape parameters. We chose the exponentiated Weibull hazard function because it is flexible enough to accommodate:

1. an increasing hazard rate for α ≥ 1 and αη ≥ 1,

2. a decreasing hazard rate for α ≤ 1 and αη ≤ 1,

3. a bathtub-shaped hazard rate for α > 1 and αη < 1, or

4. a constant hazard rate for α = 1 and η = 1.
A detailed analysis of the exponentiated Weibull family is found in Mudholkar et al. (1996).

In Figure 3.1, we obtained percentiles from an increasing convex, exponentiated Weibull distribution with parameters α = 4, λ = 10, and η = 1, using the quantile function

Q(p) = λ[−log(1 − p^{1/η})]^{1/α}. (3.3.1)
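As a sketch of how such pseudo-data can be produced (the function names are our own, and the computation simply transcribes the hazard and quantile formulas above):

```python
import numpy as np

def ew_hazard(t, alpha, lam, eta):
    """Exponentiated Weibull hazard rate, as displayed above."""
    u = 1.0 - np.exp(-(t / lam) ** alpha)
    f = (alpha * eta / lam) * u ** (eta - 1) * np.exp(-(t / lam) ** alpha) \
        * (t / lam) ** (alpha - 1)
    return f / (1.0 - u ** eta)

def ew_quantile(p, alpha, lam, eta):
    """Quantile function Q(p) of equation (3.3.1)."""
    return lam * (-np.log(1.0 - p ** (1.0 / eta))) ** (1.0 / alpha)

# Forty percentiles used as pseudo failure times (alpha = 4, eta = 1, lam = 10).
p = (np.arange(1, 41) - 0.5) / 40
times = ew_quantile(p, alpha=4, lam=10, eta=1)
```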
The hazard rate was estimated from these quantiles at selected time points by the newly proposed shape restricted estimator, the Kaplan-Meier estimator, and the kernel estimator. The SRE was obtained by maximizing the likelihood function over the increasing convex constraint. In Figure 3.1, the thin solid curve is the underlying hazard rate, the thick solid curve is the shape restricted estimate (SRE), the dotted curve is the kernel estimate, and the dashed curve is the Kaplan-Meier estimate.

To estimate the hazard rate θ using the SRE, we set y_i = 1/c_i and w_i = c_i/θ_i and applied iteratively reweighted least squares until conditions (3.2.1) and (3.2.2) were satisfied. For the kernel estimator, we used equation (2.3.7). Although optimal bandwidth selection is important, we used the data-adaptive fixed bandwidth b = (T_max − T_min)/(8 n_u^{0.2}) recommended by Muller and Wang (1994), where T_max and T_min are the maximum and minimum times used in estimation, respectively, and n_u is the number of uncensored observations (for an uncensored sample, n_u = n). In this example the SRE appears to be closer to the underlying hazard rate than the kernel and Kaplan-Meier estimators.
Figure 3.1: Estimation results using percentiles as data. The failure times are quantiles of an exponentiated Weibull distribution with parameters α = 4, η = 1, and λ = 10. The thin solid curve is the underlying hazard rate, the thick solid curve is the SRE, the dotted curve is the kernel estimate, and the dashed curve is the Kaplan-Meier estimate.

For our next example, quantiles from an exponentiated Weibull, bathtub-shaped hazard rate with parameters α = 3, η = 0.2, and λ = 10 were used to examine the performance of the constrained estimator. Since we assumed that the underlying hazard function is convex (bathtub shaped), the likelihood was maximized over the convex constraint set, and the constrained estimates were obtained using our algorithm. The results are given in Figure 3.2.
From Figure 3.2, one can see that the SRE fits the bathtub-shaped hazard rate better than the kernel and Kaplan-Meier estimators, and it agrees reasonably well with the underlying hazard function.
Figure 3.2: Estimation results using percentiles as data. The failure times are quantiles of an exponentiated Weibull distribution with parameters α = 3, η = 0.2, and λ = 10. The thin solid curve is the underlying hazard rate, the thick solid curve is the SRE, the dotted curve is the kernel estimate, and the dashed curve is the Kaplan-Meier estimate.
3.3.2 Quadratic hazard function
For our third example, we considered a distribution with a polynomial hazard function,

h(t) = β_0 + β_1 t + ··· + β_{m−1} t^{m−1}.

The survival function for this distribution is S(t) = exp[−H(t)] and the p.d.f. is f(t) = h(t)S(t), where the parameters β_1, …, β_{m−1} must satisfy certain constraints since H(0) = 0 and H(∞) = ∞.
Distributions with m = 2 were discussed by Bain (1974); more recently, polynomial hazard functions with m = 2, 3, and 4 were discussed by Hess et al. (1999). For comparison purposes and convenience, we chose a quadratic, concave-up hazard function. We followed the methods used by Hess et al. (1999) to obtain the coefficients of the polynomial hazard function: they simulated lifetimes over [0, 100] and specified h(t) = λ_0 h_0(t), where λ_0 was set so that S(90) = 0.1 for n = 100. These values correspond to leaving about 10 patients at risk at t = 90. For the quadratic concave-up hazard, β_0, β_1, and β_2 were selected to achieve h_0(0) = 1, h_0(50) = 0, and h_0(100) = 1. The inverse function t_i = S^{−1}(p_i) was then used to obtain the percentiles of failure times. Based on the computed failure times, estimates of the underlying hazard function were obtained using the SRE, kernel, and Kaplan-Meier estimators. The results are shown in Figure 3.3; our estimator agreed better with the underlying quadratic hazard function than the kernel and Kaplan-Meier estimators, except at the end points.
3.4 Right Censored Sample
The Direct Approach
Recall the random right censoring setup of chapter 2, section 2.2: on each of n individuals we observe the pair (X_i, δ_i), where X_i = min(T_i, Z_i) and δ_i = I(T_i ≤ Z_i). The problem considered here is estimation of f, F, or h by maximizing the likelihood based on data from experiments involving right censoring. Recall that the likelihood for right censored data is given by

L = ∏ f(x_i)^{δ_i} S(x_i)^{1−δ_i},
Figure 3.3: Estimation results using percentiles as data. The failure times are quantiles of a distribution function with a quadratic hazard function. The thin curve is the underlying hazard rate, the thick solid curve is the SRE, the dotted curve is the kernel estimate, and the dashed curve is the Kaplan-Meier estimate.
and the log-likelihood is

ℓ = ∑_{i=1}^{n} [δ_i log h(x_i) − H(x_i)] = ∑_{i=1}^{n} (δ_i log h(x_i) − ∫_0^{x_i} h(u) du) ≈ ∑_{i=1}^{n} δ_i log h(x_i) − ∑_{i=1}^{n} c_i h(x_i).
Letting h(x_i) = θ_i, the log-likelihood becomes ℓ = ∑_{i=1}^{n} δ_i log θ_i − ∑_{i=1}^{n} c_i θ_i, where the c_i are obtained as in section 3.1. By conditions (3.2.1) and (3.2.2), the log-likelihood ℓ is maximized if

∇ℓ(θ̂)′θ̂ = ∑_{i=1}^{n} (δ_i/θ̂_i − c_i) θ̂_i = ∑_{i=1}^{n} w_i(y_i − θ̂_i) θ̂_i = 0 (3.4.1)

∇ℓ(θ̂)′θ = ∑_{i=1}^{n} (δ_i/θ̂_i − c_i) θ_i = ∑_{i=1}^{n} w_i(y_i − θ̂_i) θ_i ≤ 0 (3.4.2)

where w_i = c_i/θ̂_i and y_i = δ_i/c_i.
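Under these conditions, the uncensored IRLS sketch given earlier carries over with the working response y_i = δ_i/c_i. A minimal adaptation (assuming numpy and the pava_weighted helper from that sketch) might look as follows:

```python
def irls_mle_censored(c, delta, tol=1e-8, max_iter=500):
    """IRLS sketch for l(theta) = sum delta_i log theta_i - sum c_i theta_i,
    the right-censored analogue of the earlier uncensored iteration."""
    c, delta = np.asarray(c, float), np.asarray(delta, float)
    y = delta / c                                  # working response y_i = delta_i / c_i
    ll = lambda t: np.sum(delta * np.log(t)) - np.sum(c * t)
    theta = np.maximum(pava_weighted(y, c), 1e-10)
    for _ in range(max_iter):
        a = np.maximum(pava_weighted(y, c / theta), 1e-10)
        lams = np.linspace(0.0, 1.0, 201)
        new = max([(1 - lam) * theta + lam * a for lam in lams], key=ll)
        if np.max(np.abs(new - theta)) < tol:
            return new
        theta = new
    return theta
```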
3.4.1 Survival Function Example
In this example, we estimate the survival function using the Kaplan-Meier, kernel, and our methods. In Figure 3.4, the failure times were randomly generated from a Weibull distribution with parameters (λ = 10, α = 6), and the corresponding censoring times were independently generated from a Weibull distribution with parameters (λ = 12, α = 6), where the parameters for the censoring times were selected to achieve 25 percent censoring. It is clear from Figure 3.4 that our method of survival function estimation does a decent job compared to the other methods.
The Weighted Approach
Let X_i = min(T_i, Z_i) and δ_i = I(T_i ≤ Z_i), and let y_i be the number of persons at risk at time x_i, that is, the number of persons censored or failed at or after x_i. Furthermore, let d_i be the number of persons who fail at time x_i and let g_i be the number of persons censored at x_i. Then the Kaplan-Meier estimator of S(t) is given by

Ŝ(t) = ∏_{x_i ≤ t} (1 − d_i/y_i),

or

Ŝ(t) = ∏_{x_i ≤ t} (1 − 1/(n − i + 1))^{δ_i},

where x_1 < x_2 < … < x_n are the order statistics of the x_i's and δ_i is the censoring indicator associated with x_i. Here each d_i is 1 or 0 and each g_i is 0 or 1.

Figure 3.4: Comparison of survival functions estimated by different methods. The thin solid curve is the underlying survival function, the thick solid curve is the shape restricted estimate, the dotted curve is the Kaplan-Meier estimate, and the dashed curve is the kernel estimate.
Satten and Datta (2000) showed that the Kaplan-Meier estimator can also be expressed as an inverse probability of censoring weighted estimator; this approach was introduced in Koul et al. (1981) and Robins and Rotnitzky (1992).
Let the survival function for censoring be K(t) = P(Z_i > t). Using the Kaplan-Meier approach, K(t) is estimated by

K̂(t) = ∏_{x_i ≤ t} (1 − g_i/y_i),

where now the true failure times are treated as censoring times and the corresponding censoring times as true failure times.
The inverse probability of censoring weighted representation of the Kaplan-Meier estimator is then given by

Ŝ(t) = (1/n) ∑_{i=1}^{n} δ_i I(x_i > t) / K̂(x_i−). (3.4.3)

If there were no censoring, the survival function could be estimated by the empirical survival function

S̃(t) = (1/n) ∑_{i=1}^{n} I(x_i > t).

The estimator given by (3.4.3) is an average of iid terms I(x_i > t), each multiplied by δ_i = I(t_i ≤ z_i) and weighted inversely by an estimate of P(Z_i ≥ t_i); it is analogous to S̃(t). See Satten and Datta (2000) for more discussion.
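A small computational sketch of (3.4.3), with function names of our own choosing and assuming distinct observation times:

```python
import numpy as np

def ipcw_survival(x, delta, t):
    """IPCW form (3.4.3) of the Kaplan-Meier estimator, assuming distinct
    observation times. delta_i = 1 for failures, 0 for censoring."""
    order = np.argsort(x)
    xs = np.asarray(x, float)[order]
    ds = np.asarray(delta, float)[order]
    n = len(xs)
    at_risk = n - np.arange(n)                     # y_i at the ordered times
    # Kaplan-Meier for the censoring distribution: events are censorings
    K = np.cumprod(1.0 - (ds == 0) / at_risk)
    K_minus = np.concatenate(([1.0], K[:-1]))      # K(x_i-)
    return np.mean(ds * (xs > t) / K_minus)
```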
Recall that the log-likelihood for the hazard function is

ℓ = ∑_{i=1}^{n} δ_i log h(x_i) − ∑_{i=1}^{n} c_i h(x_i).

Using the same approach as for the survival function, the likelihood for the censored data can be approximated by

∑_{i=1}^{n} [δ_i/K̂(x_i−)] log h(x_i) − ∑_{i=1}^{n} [c_i/K̂(x_i−)] h(x_i) = ∑_{i=1}^{n} d̃_i log h(x_i) − ∑_{i=1}^{n} ẽ_i h(x_i),

where d̃_i = δ_i/K̂(x_i−) and ẽ_i = c_i/K̂(x_i−).
Conditions (3.2.1) and (3.2.2) become

∑_{i=1}^{n} [1/K̂(x_i−)] (δ_i/θ̂_i − c_i) θ̂_i = ∑_{i=1}^{n} w_i(y_i − θ̂_i) θ̂_i = 0 (3.4.4)

∑_{i=1}^{n} [1/K̂(x_i−)] (δ_i/θ̂_i − c_i) θ_i = ∑_{i=1}^{n} w_i(y_i − θ̂_i) θ_i ≤ 0 (3.4.5)

where w_i = c_i/[K̂(x_i−) θ̂_i] and y_i = δ_i/c_i.
Comparing the direct and weighted approaches, we see that they are equivalent except for the weights. The latter form is suitable for extending the methodology to the case of dependent censoring. Dependent censoring occurs, for example, if there are covariates that affect both the hazard of failure and the hazard of being censored; Satten et al. (2001) is a good reference for this case. We plan to extend our results to incorporate dependent censoring in the future, and this can be handled easily by the weighted approach.
Chapter 4
SIMULATION STUDIES AND APPLICATION TO REAL DATA SETS
4.1 Simulation Results
In this chapter we evaluate the performance of the shape restricted estimator (SRE) using simulation studies. This allows us to compare the estimates obtained by the kernel, the Kaplan-Meier estimator, and our estimator with the values of the underlying hazard function. The simulation was done for two different shapes of hazard function, namely nondecreasing convex and convex (bathtub shaped).
For the increasing convex hazard function, the failure times were generated from a Weibull distribution with shape and scale parameters 3 and 6, respectively. To implement random censorship, we independently generated corresponding censoring times from a Weibull distribution with shape and scale parameters α and λ, where α and λ were selected to achieve a given proportion of censoring: 0, 25, or 50 percent. The kernel, Kaplan-Meier, and SRE estimates were then computed from the generated observed lifetimes for sample sizes 25, 50, and 75 at the different levels of censoring.

Similarly, for the convex (bathtub shaped) hazard function, the failure times were generated from an exponentiated Weibull distribution with parameters α = 3, λ = 6, and η = 0.20, and the corresponding censoring times were generated independently from an exponentiated Weibull distribution with parameters α, λ, and η chosen to yield about 0, 25, and 50 percent censoring. The simulation was repeated N = 1000 times for each setting.
To summarize our simulation results, we computed the overall mean squared error (OMSE), with averages taken over the 1000 simulations and the n observed time points. The OMSE was computed without the largest and smallest observed times, because our estimator has spikes at the end points. Results from the simulations are presented in Tables 4.1 and 4.2; all results are rounded to the nearest thousandth. Tables 4.1 and 4.2 show the OMSE of the SRE, kernel, and Kaplan-Meier estimators for different levels of censoring and sample sizes 25, 50, and 75.

We also computed pointwise mean squared errors (MSE) and biases for our estimator (SRE), the kernel, and the Kaplan-Meier estimators at selected grid points, between 1.5 and 6.5 in increments of 0.5 for the increasing convex case, and between 1.0 and 6.5 in increments of 0.5 for the convex case. The results are given in Tables 4.4-4.7.
The bias, mean squared error (MSE), and standard deviation (SD) at a grid point t are computed as

Bias(t) = (1/N) ∑_{i=1}^{N} ĥ_i(t) − h(t),

MSE(t) = (1/N) ∑_{i=1}^{N} (ĥ_i(t) − h(t))^2,

SD(t) = √[ ∑_{i=1}^{N} (ĥ_i(t) − h̄(t))^2 / (N − 1) ],

respectively, where ĥ_i(t) denotes the estimate from the i-th simulated sample and h̄(t) the average of the N estimates.
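For reference, a direct transcription of these summary measures (a helper of our own; the ĥ_i(t) estimates are stacked in an array of length N):

```python
import numpy as np

def pointwise_summary(h_hats, h_true):
    """Bias(t), MSE(t), and SD(t) for N simulated estimates at one grid point."""
    h_hats = np.asarray(h_hats, float)
    N = len(h_hats)
    bias = h_hats.mean() - h_true
    mse = np.mean((h_hats - h_true) ** 2)
    sd = np.sqrt(np.sum((h_hats - h_hats.mean()) ** 2) / (N - 1))
    return bias, mse, sd
```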
The results in Tables 4.1 and 4.2 demonstrate that the SRE performed fairly well in all cases. It is better than the kernel and Kaplan-Meier estimators when the hazard function is estimated without spikes. For a high rate of censoring, the performance of the SRE and the kernel method is nearly the same. The Kaplan-Meier estimator performed poorly in all cases. From Tables 4.4-4.7 we find that the SRE has a smaller MSE than the Kaplan-Meier estimator. When the level of censoring is low and the MSE is computed at grid points that are not close
Table 4.1: Comparison of SRE, Kaplan-Meier, and kernel estimators using OMSE when the underlying hazard function is increasing convex.

n    Level of censoring (%)   SRE     Kaplan-Meier   Kernel
25   0                        0.122   0.292          0.132
25   25                       0.158   0.343          0.174
25   50                       0.202   0.363          0.213
50   0                        0.121   0.379          0.123
50   25                       0.150   0.415          0.164
50   50                       0.199   0.428          0.215
75   0                        0.109   0.418          0.113
75   25                       0.137   0.447          0.156
75   50                       0.187   0.442          0.202
Table 4.2: Comparison of SRE, Kaplan-Meier, and kernel estimators using OMSE when the underlying hazard function is convex.

n    Level of censoring (%)   SRE     Kaplan-Meier   Kernel
25   0                        0.063   0.095          0.066
25   25                       0.066   0.117          0.073
25   50                       0.089   0.141          0.086
50   0                        0.062   0.146          0.064
50   25                       0.066   0.162          0.071
50   50                       0.086   0.178          0.087
75   0                        0.051   0.170          0.058
75   25                       0.055   0.181          0.065
75   50                       0.082   0.195          0.087
to the end points, the SRE performed better than the kernel for both increasing convex and convex hazard functions. For higher censoring rates, the kernel has smaller bias and MSE than the SRE, in particular when the MSE and bias are calculated at the end points; this is not surprising, because the SRE spikes at the end points. It can also be noted that the MSE and bias become relatively larger as the rate of censoring increases. If we investigate the performance of the estimators in terms of bias, the judgment as to which estimator is better is not so clear cut at 0 and 25 percent censoring, while at the end points and for higher levels of censoring the kernel performs better than the SRE. We also see that the MSE and bias decrease as n increases, as expected.
To compare the performance of the weighted and direct approaches (see Chapter 3, pp. 43-46), we computed the overall MSE and average bias for two sample sizes and two levels of censoring. The results are given in Table 4.3; the averages were taken over 1000 simulations and the selected time points. From the simulation results, the weighted approach performed slightly better than the direct approach in all cases.
4.2 Application To Real Data Sets
In this section we illustrate how the proposed methodology works by using different data
sets.
Head and Neck Example
For our first real-data example we used a data set from a clinical trial conducted by the Northern California Oncology Group, discussed by Efron (1988). The data represent the survival times of 51 head and neck cancer patients under treatment A, who were given radiation therapy. Nine patients were lost to follow-up and were treated as censored. The data from Efron (1988) are reproduced in Table A.1 in the appendices. Efron observed that the empirical hazard
Table 4.3: Comparison of the direct and weighted approaches for estimating an increasing convex hazard function.

n    Level of censoring   Method     Average bias   Average MSE
25 25 percent Direct 0.568 5.544
Weighted 0.536 5.456
50 percent Direct 0.941 6.716
Weighted 0.857 6.394
50 25 percent Direct 0.495 1.305
Weighted 0.464 1.216
50 percent Direct 1.013 5.422
Weighted 0.935 5.126
function starts near 0, suggesting an initial high-risk period at the beginning, a decline for a while, and then stabilization after one year. He developed and illustrated a methodology for analyzing the data using a combination of quantal response analysis and spline regression. Mudholkar et al. (1995) analyzed the data set with the exponentiated Weibull model to estimate the hazard rate function, and Kouassi and Singh (1997) also used this data set to estimate the hazard function with a semiparametric approach.

To compare with the new estimator we also report estimates of the hazard function by the kernel estimator and a parametric estimator. After transforming the data into months, our estimator was computed by maximizing the likelihood function over the concave-convex constraint set; the inflection point was found by maximizing ℓ. The kernel estimate was computed using equation (2.3.7) with the data-adaptive fixed bandwidth b = (T_max − T_min)/(8 n_u^{0.2}), and the parametric estimate was computed using the parameter estimates of Mudholkar et al. (1995). Figure 4.1 gives
Table 4.4: Simulation results of bias and mean squared error for the SRE, kernel, and Kaplan-Meier estimators at 0, 25, and 50 percent censoring with n = 25, from an increasing convex hazard function (Weibull distribution with α = 3, λ = 6).

Time   True haz   Bias SRE   Bias KM   Bias kernel   MSE SRE   MSE KM   MSE kernel

0 percent censoring
1.500  0.031   0.001  -0.012   0.002   0.000   0.002   0.001
2.000  0.056  -0.011  -0.012   0.001   0.001   0.011   0.001
2.500  0.087  -0.019  -0.006   0.001   0.002   0.023   0.002
3.000  0.125  -0.019  -0.006   0.003   0.002   0.044   0.004
3.500  0.170  -0.015  -0.007   0.004   0.004   0.088   0.006
4.000  0.222  -0.011   0.002   0.004   0.005   0.209   0.009
4.500  0.281  -0.006  -0.007   0.002   0.007   0.249   0.013
5.000  0.347   0.000  -0.015   0.001   0.011   0.212   0.019
5.500  0.420   0.009  -0.020   0.002   0.018   0.446   0.029
6.000  0.500   0.026  -0.047   0.005   0.034   0.458   0.046
6.500  0.587   0.062  -0.028   0.009   0.086   1.073   0.074

25 percent censoring
1.500  0.031  -0.001  -0.012   0.002   0.000   0.002   0.001
2.000  0.056  -0.013  -0.014   0.001   0.001   0.009   0.001
2.500  0.087  -0.020  -0.006   0.002   0.002   0.024   0.002
3.000  0.125  -0.019  -0.001   0.003   0.002   0.058   0.004
3.500  0.170  -0.015   0.005   0.004   0.003   0.138   0.005
4.000  0.222  -0.010  -0.003   0.004   0.005   0.160   0.008
4.500  0.281  -0.005  -0.020   0.003   0.007   0.295   0.013
5.000  0.347   0.002   0.020   0.003   0.012   0.494   0.021
5.500  0.420   0.017  -0.029   0.000   0.024   0.433   0.032
6.000  0.500   0.052  -0.057  -0.007   0.102   0.753   0.051
6.500  0.587   0.173  -0.012  -0.014   0.791   1.756   0.089

50 percent censoring
1.500  0.031  -0.006  -0.004   0.004   0.000   0.004   0.001
2.000  0.056  -0.015   0.006   0.002   0.001   0.022   0.001
2.500  0.087  -0.018  -0.023   0.001   0.002   0.014   0.002
3.000  0.125  -0.017  -0.025   0.001   0.003   0.037   0.004
3.500  0.170  -0.014   0.013   0.003   0.004   0.251   0.006
4.000  0.222  -0.007  -0.023   0.002   0.007   0.206   0.010
4.500  0.281   0.004  -0.053  -0.002   0.011   0.140   0.018
5.000  0.347   0.025   0.152   0.003   0.026   5.565   0.039
5.500  0.420   0.115  -0.029   0.003   0.284   6.729   0.068
6.000  0.500   0.169   0.837  -0.018   0.503  18.894   0.090
6.500  0.587   3.352   2.049  -0.109   5.390  33.112   0.105
Table 4.5: Simulation results of bias and mean squared error for the SRE, kernel, and Kaplan-Meier estimators at 0, 25, and 50 percent censoring with n = 50, from an increasing convex hazard function (Weibull distribution with α = 3, λ = 6).

Time   True haz   Bias SRE   Bias KM   Bias kernel   MSE SRE   MSE KM   MSE kernel

0 percent censoring
1.500  0.031  -0.005  -0.006   0.002   0.000   0.003   0.001
2.000  0.056  -0.011  -0.003   0.002   0.001   0.009   0.001
2.500  0.087  -0.011   0.002   0.003   0.001   0.041   0.002
3.000  0.125  -0.008  -0.001   0.003   0.002   0.050   0.003
3.500  0.170  -0.005  -0.014   0.002   0.003   0.034   0.004
4.000  0.222  -0.002   0.002   0.002   0.004   0.118   0.006
4.500  0.281  -0.001  -0.009   0.002   0.006   0.175   0.010
5.000  0.347   0.001   0.005   0.002   0.009   0.355   0.014
5.500  0.420   0.004   0.004   0.000   0.014   0.674   0.020
6.000  0.500   0.010  -0.009   0.004   0.021   0.593   0.030
6.500  0.587   0.020   0.005   0.007   0.034   0.719   0.051

25 percent censoring
1.500  0.031  -0.007  -0.006   0.002   0.000   0.003   0.000
2.000  0.056  -0.012  -0.002   0.002   0.001   0.011   0.001
2.500  0.087  -0.011   0.001   0.003   0.001   0.043   0.002
3.000  0.125  -0.007   0.002   0.003   0.002   0.058   0.003
3.500  0.170  -0.004  -0.010   0.002   0.003   0.062   0.004
4.000  0.222  -0.002  -0.007   0.002   0.004   0.122   0.006
4.500  0.281   0.000  -0.006   0.003   0.006   0.239   0.010
5.000  0.347   0.002  -0.003   0.002   0.010   0.356   0.015
5.500  0.420   0.006  -0.035   0.000   0.015   0.425   0.022
6.000  0.500   0.018  -0.053   0.001   0.025   0.504   0.036
6.500  0.587   0.050   0.013   0.005   0.060   1.034   0.071

50 percent censoring
1.500  0.031  -0.007   0.003   0.003   0.000   0.005   0.001
2.000  0.056  -0.008   0.007   0.002   0.001   0.024   0.001
2.500  0.087  -0.007  -0.009   0.002   0.001   0.020   0.002
3.000  0.125  -0.005   0.008   0.003   0.002   0.070   0.003
3.500  0.170  -0.004   0.021   0.004   0.003   0.107   0.005
4.000  0.222  -0.001  -0.017   0.004   0.005   0.115   0.008
4.500  0.281   0.002   0.067   0.001   0.008   0.995   0.012
5.000  0.347   0.010   0.048   0.001   0.015   1.893   0.022
5.500  0.420   0.032  -0.027   0.001   0.042   0.489   0.039
6.000  0.500   0.077  -0.124   0.005   0.108   0.377   0.066
6.500  0.587   0.320   0.152  -0.036   1.564   4.632   0.105
Table 4.6: Simulation results of bias and mean squared error for the SRE, kernel, and Kaplan-Meier estimators at 0, 25, and 50 percent censoring with n = 25, from a bathtub shaped hazard function (exponentiated Weibull distribution with α = 3, λ = 10, and η = 0.2).

Time   True haz   Bias SRE   Bias KM   Bias kernel   MSE SRE   MSE KM   MSE kernel

0 percent censoring
1.000  0.201   0.011  -0.011   0.018   0.006   0.073   0.007
1.500  0.188   0.004  -0.008   0.005   0.004   0.091   0.007
2.000  0.183   0.000  -0.018   0.002   0.004   0.044   0.007
2.500  0.183  -0.003  -0.015  -0.001   0.004   0.053   0.008
3.000  0.185  -0.004  -0.029  -0.002   0.004   0.043   0.009
3.500  0.189  -0.004  -0.015  -0.001   0.004   0.058   0.011
4.000  0.195  -0.004  -0.003   0.000   0.004   0.331   0.013
4.500  0.202  -0.002  -0.001  -0.001   0.005   0.257   0.013
5.100  0.213   0.002  -0.013   0.003   0.007   0.126   0.016
5.500  0.220   0.005   0.024   0.003   0.009   0.790   0.019
6.000  0.231   0.012  -0.025   0.002   0.013   0.078   0.022
6.500  0.243   0.023  -0.027  -0.001   0.034   0.149   0.027

25 percent censoring
1.000  0.201   0.010  -0.020   0.018   0.008   0.097   0.007
1.500  0.188   0.002  -0.022   0.006   0.005   0.062   0.007
2.000  0.183  -0.004  -0.023   0.003   0.005   0.061   0.008
2.500  0.183  -0.007  -0.022   0.001   0.005   0.065   0.009
3.000  0.185  -0.009  -0.018   0.002   0.005   0.138   0.011
3.500  0.189  -0.008  -0.008   0.003   0.006   0.089   0.014
4.000  0.195  -0.005   0.004   0.002   0.011   0.703   0.017
4.500  0.202  -0.004   0.013  -0.003   0.009   0.413   0.018
5.100  0.213   0.003   0.006  -0.006   0.012   0.502   0.021
5.500  0.220   0.012   0.007  -0.007   0.018   0.483   0.026
6.000  0.231   0.038  -0.030  -0.003   0.060   0.340   0.034
6.500  0.243   0.135  -0.004  -0.004   0.883   0.522   0.045

50 percent censoring
1.000  0.201   0.004  -0.029   0.017   0.010   0.188   0.011
1.500  0.188  -0.006  -0.015   0.007   0.008   0.102   0.011
2.000  0.183  -0.011  -0.019   0.002   0.008   0.110   0.013
2.500  0.183  -0.012  -0.025  -0.002   0.009   0.106   0.014
3.000  0.185  -0.008   0.028  -0.002   0.012   2.240   0.018
3.500  0.189   0.006  -0.020   0.000   0.033   0.132   0.024
4.000  0.195   0.015   0.123   0.000   0.053   7.381   0.029
4.500  0.202   0.016   0.126  -0.007   0.037   6.809   0.032
5.100  0.213   0.054   0.241  -0.016   0.133   9.193   0.040
5.500  0.220   0.053   0.370  -0.025   0.139  10.459   0.044
6.000  0.231   0.055   0.621  -0.036   0.357  14.773   0.050
6.500  0.243   1.115   0.760  -0.054   8.024  16.088   0.057
Table 4.7: Simulation results of bias and mean squared error for the SRE, kernel, and Kaplan-Meier estimators at 0, 25, and 50 percent censoring with n = 50, from a bathtub shaped hazard function (exponentiated Weibull distribution with α = 3, λ = 10, and η = 0.2).

Time   True haz   Bias SRE   Bias KM   Bias kernel   MSE SRE   MSE KM   MSE kernel

0 percent censoring
1.000  0.201   0.005   0.000   0.010   0.005   0.091   0.007
1.500  0.188   0.001  -0.002   0.003   0.005   0.124   0.007
2.000  0.183  -0.001   0.000   0.002   0.005   0.071   0.007
2.500  0.183  -0.003  -0.013   0.001   0.004   0.040   0.007
3.000  0.185  -0.004  -0.019   0.000   0.004   0.040   0.007
3.500  0.189  -0.005  -0.017   0.000   0.004   0.052   0.008
4.000  0.195  -0.005  -0.017   0.000   0.005   0.108   0.009
4.500  0.202  -0.004  -0.014  -0.002   0.005   0.117   0.010
5.100  0.213  -0.003  -0.016   0.001   0.006   0.086   0.011
5.500  0.220  -0.002   0.008   0.003   0.007   0.216   0.014
6.000  0.231   0.000  -0.007   0.004   0.008   0.085   0.016
6.500  0.243   0.002  -0.009   0.000   0.009   0.200   0.018

25 percent censoring
1.000  0.201   0.004  -0.027   0.013   0.007   0.065   0.008
1.500  0.188  -0.001  -0.038   0.003   0.005   0.040   0.007
2.000  0.183  -0.005  -0.023   0.000   0.005   0.046   0.007
2.500  0.183  -0.008  -0.015  -0.002   0.005   0.083   0.007
3.000  0.185  -0.009  -0.024  -0.002   0.004   0.059   0.008
3.500  0.189  -0.009  -0.007   0.001   0.005   0.098   0.009
4.000  0.195  -0.009   0.015   0.002   0.005   0.362   0.012
4.500  0.202  -0.008   0.000  -0.003   0.006   0.240   0.012
5.100  0.213  -0.005  -0.016  -0.004   0.007   0.111   0.014
5.500  0.220  -0.003  -0.015  -0.004   0.009   0.139   0.019
6.000  0.231   0.003  -0.021  -0.001   0.013   0.102   0.025
6.500  0.243   0.019  -0.050  -0.001   0.067   0.084   0.031

50 percent censoring
1.000  0.201   0.002  -0.009   0.017   0.015   0.171   0.018
1.500  0.188  -0.005  -0.025   0.005   0.012   0.085   0.015
2.000  0.183  -0.009  -0.031   0.001   0.011   0.082   0.014
2.500  0.183  -0.013  -0.036  -0.002   0.011   0.058   0.015
3.000  0.185  -0.014   0.066  -0.003   0.011   1.431   0.016
3.500  0.189  -0.014  -0.025  -0.003   0.012   0.112   0.018
4.000  0.195  -0.012   0.051  -0.002   0.014   1.085   0.024
4.500  0.202  -0.008  -0.011  -0.007   0.018   0.191   0.028
5.100  0.213   0.002  -0.030  -0.009   0.025   0.182   0.032
5.500  0.220   0.025   0.056  -0.015   0.092   2.820   0.036
6.000  0.231   0.041   0.072  -0.019   0.116   2.787   0.045
6.500  0.243   0.129   0.143  -0.027   0.316   3.464   0.060
Figure 4.1: Estimates of hazard rates for the head and neck cancer data based on the kernel (dashed curve), SRE (solid curve), and parametric (dotted curve) estimators.

the estimated hazard rates at the observed failure times, as estimated by the kernel, the parametric method, and
our method. From Figure 4.1 we can reach different conclusions with each of the different estimators. According to our estimator, the hazard rate for head and neck cancer patients under treatment A increases during roughly the first 6 months, decreases between about 6 and 10 months, and then stabilizes after one year. This agrees with Efron's analysis (see the Arm A data in Efron, 1988). According to the parametric estimator, the rate seems to increase during the first 3 months, then decreases and roughly stabilizes after about 2 years; the results obtained by the parametric estimator look somewhat like those obtained by Efron's analysis. The kernel estimator performed poorly relative to Efron's expectations.
Bone Marrow Transplantation Example
Figure 4.2: Estimates of hazard rates for the bone marrow transplantation data based on the SRE (thick solid curve), kernel (dashed curve), and PMLE (dotted curve) estimators.
Our second example concerns bone marrow transplants for leukemia. The data set, given in Table B.1, can be found in Klein and Moeschberger (2003, p. 3). Of their three groups of leukemia patients, we considered the patients with acute lymphoblastic leukemia (ALL). Transplantation is considered a failure when a patient's leukemia returns (relapse) or when he or she dies while in remission (treatment-related death). In this data set there were 23 uncensored and 14 censored observations. Klein and Moeschberger (2003) used the data set to illustrate the construction of hazard function estimates by the kernel estimator. For comparison, we estimated the hazard function using the SRE, kernel, and parametric estimators. The SRE was computed under the assumption that the hazard function is decreasing convex, since during an early period there is a high risk of relapse or death, after which, especially if the patients respond to the treatment, the failure rate decreases and remains constant for some time.
The results are displayed in Figure 4.2, and it can be seen that both our estimator and the parametric estimator indicate an early high risk of relapse or death, after which the hazard rate decreases. After 1000 days, the SRE shows that the estimated hazard rate is nearly zero, while the kernel estimator indicates that the risk of relapse or death increases in the first 250 days after the transplant and then decreases.
Leukemia Survival Data Example
Figure 4.3: Estimates of hazard rates for the leukemia survival data based on the SRE (solid curve), kernel (dotted curve), Kaplan-Meier (short dashed curve), and PMLE (long dashed curve) estimators.
This data set is taken from Lee (1980, Table 3.3, p. 72). Seventy-one adult patients with acute lymphoblastic leukemia (ALL) and acute myeloblastic leukemia (AML) were studied at the M.D. Anderson Hospital and Tumor Institute (Hart et al., 1977). The data set is reproduced in Table C.1. There were some covariates in the data set; in this study we consider the survival of the patients from diagnosis of acute myeloblastic leukemia (AML) and their survival status (alive or dead). There were 51 observations, of which 6 are censored. Based on the nature of patients with leukemia, it can be assumed that a bathtub-shaped hazard function may better describe the shape of the hazard. Muller and Wang (1994) used this data set to estimate the hazard function with a kernel estimator with varying bandwidths, and their result seems to indicate a bathtub-shaped hazard function. Based on the assumption that the data have a bathtub shape, estimates of the hazard function were obtained using the SRE. For comparison we also report estimates of the hazard function by the kernel, parametric, and Kaplan-Meier estimators. The results are displayed in Figure 4.3.
Chapter 5
TESTING FOR SHAPE RESTRICTED HAZARD FUNCTION USING
RESAMPLING TECHNIQUES
In this chapter, we study the problem of testing whether survival times can be modeled by a certain parametric family, such as the Weibull, when shape restrictions such as monotonicity, concavity, or convexity are imposed on the hazard rate function. The Weibull is commonly used in the statistical analysis of lifetimes (Cohen, 1965) and is generally adequate for modeling monotone hazard rate functions. On the other hand, if we can reject the null hypothesis that the hazard comes from this parametric family, then the resulting parametric hazard estimate is inaccurate. For instance, the Weibull family is inappropriate when the hazard rate is indicated to be unimodal or bathtub shaped (Mudholkar et al., 1996).

In other words, we consider testing the hypothesis that the lifetimes come from a population with a parametric hazard rate function, such as the Weibull, against a shape restricted alternative that comprises a broad range of hazard functions. The alternative may be appropriate when the shape of the parametric hazard is neither constant nor monotone.
A number of approaches have been used to test the equality of hazard or survival functions for uncensored and censored data. There is a large literature on parametric methods, which lean heavily on likelihood methods for exponential distributions, for two- or three-parameter Weibull models, and for other distributions such as the log-logistic, log-normal, and gamma; these approaches are summarized in Lawless (1988) and Kalbfleisch and Prentice (1996). The first nonparametric statistics to gain widespread usage in comparing two or more hazard or survival functions for censored data were those proposed by Gehan (1965), who
generalized the Wilcoxon rank sum statistic, and by Mantel (1966), whose log-rank statistic became the most commonly used two-sample test statistic for censored data. Many other authors have also considered nonparametric tests for the equality of two or more hazard or survival functions. For example, Chikkagoudar and Shuster (1974) proposed a rank test for the comparison of failure rates, and Bickel and Rosenblatt (1973) suggested Kolmogorov-Smirnov type test statistics based on maximal absolute deviations and mean squared errors for censored data.
Most papers in the shape restricted hazard rate estimation literature, however, focus on testing the null hypothesis of a constant hazard rate versus a nondecreasing alternative. For example, in the uncensored case, testing for a constant versus a nondecreasing hazard rate was considered by Bickel and Doksum (1969), and Barlow and Proschan (1969) extended these tests to handle Type I censored data. Hall et al. (2000) proposed a nonparametric test based on evaluating the distance between the monotonized estimator and the standard kernel estimator. More recently, Xiong et al. (2004) presented a likelihood ratio test of the null hypothesis that the hazard rate is constant against the alternative that it is increasing.
Hypotheses involving maximum likelihood estimates are usually tested by means of the likelihood ratio statistic, which under some regularity conditions is asymptotically chi-square distributed. In our case we cannot apply this scheme, since we avoid a complete specification of the likelihood. Instead we compare the hazard function estimates obtained under shape restriction, as described in chapter 3, with their parametric counterparts using log-rank and Kolmogorov goodness-of-fit statistics.

We use resampling-based computation to conduct our tests, since the asymptotic distributions of the test statistics in these problems are largely intractable. The form of the resampling scheme depends on the null hypothesis to be tested. There are many ways to use a resampling scheme for hypothesis testing; in this dissertation we use the bootstrap p-value approach, since it is a convenient way to perform bootstrap inference (Davidson and MacKinnon, 1996).
In this chapter, we investigate a number of testing problems, then discuss resampling techniques and describe the bootstrap method. The last section presents simulation studies and applications to real data sets to check the performance of the different test statistics under different alternatives.
5.1 Test Statistics
1. Consider testing the hypotheses

H_0 : h(t) = h_par(t)   against   H_A : h(t) ≠ h_par(t)

for all 0 ≤ t ≤ τ, where τ could be a fixed (nonrandom) time point, although in practice it is often taken to be the largest observed study time. Here h_par is the hypothesized parametric hazard function. The following log-rank type test statistic can be used:

Z(τ) = ∫_0^τ W(s)[ĥ(s) − h_par(s)] ds, (5.1.1)

where W(t) is a weight function. The most popular choices are W(t) = Y(t) and W_FH(t) = S_0(t)^p [1 − S_0(t)]^q; the latter was proposed by Harrington and Fleming (1982). Here S_0 is the null-hypothesis survival function, p ≥ 0 and q ≥ 0, and Y(t_i) is the number of surviving individuals under study just prior to the observed event time t_i. The above test statistic is discussed in Klein and Moeschberger (2003, chapter 7).

The weight function for the test statistic (5.1.1) was taken to be W(t) = S_0(t)(1 − S_0(t)). Since our estimator may have spikes at the end points, this assigns less weight to early and late differences between ĥ(t) and h_par(t); in other words, it puts more weight on the differences between ĥ(t) and h_par(t) in the mid-range.
2. Consider Kolmogorov’s goodness-of-fit type testing problems where one tests a speci-
fied null hypothesis,
H0 : S(t) = S0(t)
against
HA : S(t) 6= S0(t).
In a general sense, this problem is the same as the previous one if indeed S0 has
unspecified parameters that need to be estimated except that the hypotheses are spec-
ified in terms of survival functions rather than the hazard rates. However, we may
consider a different test statistic that is in the form of a Kolmogorov’s goodness-of-fit
test. The test evaluates the closeness of the constrained survival function S(t) to the
hypothesized S0(t):
Dn = sup|S(t) − S0(t)|. (5.1.2)
3. For an important special case one can indeed use a likelihood ratio approach: when the hazard function under H_0 is increasing linear and the alternative is increasing convex. In the case of regression data, Meyer (2003) showed that the likelihood ratio test of a linear versus a convex regression function has a null distribution given by a mixture of beta random variables; for further details see Meyer (2003).
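As a computational sketch of the two statistics referenced above (a discretized stand-in on a time grid; the trapezoidal rule and function names are our own choices, not part of the original formulation):

```python
import numpy as np

def logrank_stat(grid, h_hat, h_par, S0):
    """Discretized version of (5.1.1) with W(t) = S0(t)(1 - S0(t)),
    evaluated on a common time grid by the trapezoidal rule."""
    integrand = S0 * (1.0 - S0) * (h_hat - h_par)
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(grid)))

def ks_stat(S_hat, S0):
    """Kolmogorov-type statistic (5.1.2): sup_t |S_hat(t) - S0(t)| over the grid."""
    return float(np.max(np.abs(S_hat - S0)))
```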
5.2 Resampling Approach
The asymptotic distributions of the test statistics proposed in the previous section are not only expected to be complicated but are also intractable, since they involve parameters of the censoring distribution and we have only an approximated likelihood function. In such situations, resampling is a practical and appealing alternative. The form of the resampling scheme depends on the null hypothesis to be tested. Although there are many ways to use resampling for hypothesis testing, in this dissertation we emphasize its use to compute bootstrap p-values; the p-value approach is a convenient way to perform bootstrap inference (Davidson and MacKinnon, 1996). This method has been used by several authors for testing models with shape constraints; for example, Geyer (1991) used a parametric bootstrap to calculate p-values of a likelihood ratio test for convex logistic regression.
5.2.1 Bootstrap
Before discussing the applications of resampling methodology and the p-value approach, the basic ideas of the bootstrap are reviewed in this section.

The concept of the bootstrap was first introduced by Efron (1979) for calculating approximate biases, standard deviations, confidence intervals, p-values, and so forth. Efron (1979) considered two types of bootstrap procedures, useful for nonparametric and parametric inference, respectively. The nonparametric bootstrap depends on the discrete empirical distribution generated by a random sample of size n from an unknown distribution F; this empirical distribution F_n assigns equal probability to each sample item. In the parametric bootstrap setting, F is considered to be a member of some prescribed parametric family, and F_n is obtained by estimating the family parameter(s) from the data. In each case, by generating an iid random sequence, called a resample or pseudo-sequence, from the distribution F_n, new estimates of various parameters or nonparametric characteristics of the original distribution F can be obtained. This simple idea is the root of the bootstrap methodology.
5.2.2 Bootstrap P-value
Some of the discussion here is abridged from Davidson and MacKinnon (1996) and Hall (1992). Let Z be the test statistic given by equation (5.1.1) or (5.1.2). We can use bootstrapping either to calculate a critical value for Z or to calculate the p-value associated with Z_obs, the realized value of Z. The latter approach is preferred, because knowing the p-value associated with a test statistic is more informative than simply knowing whether or not the statistic exceeds some critical value.

To estimate the p-value, we may use either a parametric or a nonparametric bootstrap to draw the bootstrap samples. In the parametric case, we generate the samples from the model itself, using a vector of parameter estimates under the null, say μ̂; this approach is appropriate when the model is fully specified under H_0. In the nonparametric case, we sample from something like the empirical distribution function of the data; this approach is appropriate when the model is not fully specified under H_0.
Suppose we generate B bootstrap samples, each of size n, and use them to calculate B test statistics Z*_j, j = 1, …, B. Then the bootstrap p-value for a two-tailed test is calculated as

p(Z) = (1/B) ∑_{j=1}^{B} I(|Z*_j| ≥ |Z_obs|), (5.2.1)
where I(·) is an indicator function, equal to 1 when its argument is true and equal to zero
otherwise.
5.3 Bootstrap based tests
For testing problems 1 and 2, the following resampling scheme is used to compute the bootstrap p-value of our test statistics.

Step 1. Estimate the parameters of the parametric model under the null hypothesis from the original data and denote the estimated hazard by h_μ̂(t), say.

Step 2. Generate iid samples of failure times T*_1, …, T*_n from h_μ̂(t) and independently generate iid samples of censoring times C*_1, …, C*_n from the Kaplan-Meier estimator of the censoring distribution. Construct X*_i = min(T*_i, C*_i) and δ*_i = I(T*_i ≤ C*_i), 1 ≤ i ≤ n.

Step 3. Repeat Step 2 a large number of times, say B, and for each bootstrap sample {X*_i, δ*_i : 1 ≤ i ≤ n} compute the test statistic Z. Denote the resulting values by Z*_1, …, Z*_B.

Step 4. Let Z_obs be the observed value of the test statistic computed from the original sample. Then a bootstrap approximation to the p-value is given by

p(Z_obs) = ∑_{j=1}^{B} I(|Z*_j| ≥ |Z_obs|) / B.

Reject the null hypothesis for small values of the p-value.
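The scheme can be summarized in a few lines of code. This is a sketch in which the four callables (the null fit, the null failure-time sampler, the censoring-time sampler based on the Kaplan-Meier estimate, and the statistic) are placeholders the analyst supplies:

```python
import numpy as np

def bootstrap_pvalue(z_obs, x, delta, fit_null, sample_null, sample_cens,
                     statistic, B=1000):
    """Steps 1-4 above. fit_null(x, delta) returns the null parameter
    estimates; sample_null(mu, n) draws failure times under H0;
    sample_cens(n) draws censoring times from the Kaplan-Meier estimate
    of the censoring distribution; statistic(x, delta) computes Z."""
    n = len(x)
    mu = fit_null(x, delta)                          # Step 1
    z_star = np.empty(B)
    for b in range(B):                               # Steps 2 and 3
        t, c = sample_null(mu, n), sample_cens(n)
        xb, db = np.minimum(t, c), (t <= c).astype(float)
        z_star[b] = statistic(xb, db)
    return np.mean(np.abs(z_star) >= abs(z_obs))     # Step 4
```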
5.3.1 Bootstrap power
The powers of the tests given in section 5.1 are compared by simulation, using the bootstrap method, for samples from exponentiated Weibull distributions with parameters λ, η, and α, with and without censoring. We generated random samples from

f(t) = (αη/λ)[1 − exp(−(t/λ)^α)]^{η−1} exp[−(t/λ)^α] (t/λ)^{α−1}. (5.3.1)

The parameters λ, α, and η were chosen to be λ = 10, α = 6, and η = 1 under the null, and λ = 10, α = 6, and η = 1, 0.75, 0.5, 0.25 under the alternative. In these cases the hazard function under the null hypothesis is monotone, while the hazard functions under the alternative are monotone or bathtub shaped. For the censored case, we generated the corresponding censoring times from a Weibull distribution with parameters α and λ (and η = 1) chosen to achieve a desired expected percentage of censoring in the data. The hazard function is then estimated by our estimator under the alternative, the parametric hazard function is calculated under the null, and the test statistics given by equations (5.1.1) or (5.1.2) are computed; we denote by Z_obs the test statistic associated with these estimates.
For each sample, we replaced the true parameters by their estimates in the parametric model under the null hypothesis; H_0 is rejected when the p-value is less than the nominal level. We then applied the bootstrap-based test procedure described in section 5.2, and the whole process is repeated N times. Fixing the sample size and repeating the resampling process N times yields N independent data sets, from which we calculate the percentage of rejections of H_0 in those N simulations; this percentage is an estimate of the power at the specified sample size. Using the same idea, we can obtain powers at different sample sizes and η values.
Figure 5.1 shows the graph of the hazard function for several values of the parameters of the density (5.3.1), namely η = 1, 0.75, 0.5, 0.025, 0.01. One can easily see that when η is less than 0.333 the hazard function is bathtub shaped.
Figure 5.1: Graph of the hazard function for the model (5.3.1) when α = 6, λ = 10, and η = 1, 0.75, and 0.5 (solid curves, lowest to highest), η = 0.025 and 0.01 (dashed curves, lowest to highest), and α = 1, η = 1 (dotted curve).
5.4 Simulation Studies and Results
In this section, we investigate the performance of these bootstrap-based tests under a variety of settings, primarily through simulation studies. We conducted simulation studies for testing problems 1 and 2, described in section 5.1.

For the first example, the simulations are based on 1000 samples of sizes n = 25, 40, and 50; for each sample, B = 1000 bootstrap replications were used for both censored and uncensored samples. P-value estimation with the scheme described in section 5.3 was carried out for N = 1000 Monte Carlo replications for each setting, and the power was computed as the proportion of rejections out of the 1000 Monte Carlo replications. The results are displayed in Table 5.1. Power for the log-rank (LR) and Kolmogorov goodness-of-fit (KS) tests under different alternatives is given as a function of η, based on the nominal level 0.05; note that η = 1 corresponds to the null hypothesis. From Table 5.1 it can easily be seen that KS has higher power under the alternative than LR in all cases, for both censored and uncensored samples. When the parametric and shape restricted hazard rates cross at some points, the log-rank test may not have optimal power to detect the difference in the hazard rates; this could be one reason that KS has higher power than LR under the alternative. In general, when the hazard rates are proportional, the log-rank test may have higher power to detect differences in the hazard rates.

For both tests the results clearly show that the bootstrap size under the null gets closer and closer to the nominal level for the uncensored sample as n increases. On the other hand, for small sample sizes under censoring the sizes of LR and KS are far from the nominal level.
In the second example, we examine the power gains obtained by restricting the alternative, by comparing with a test of the hypothesis that the lifetimes come from a population with a parametric hazard rate against a general, not necessarily shape restricted, alternative. The test statistics used for this purpose are those discussed in section 5.1, with a slight modification to the log-rank test:

Z(τ) = ∑_{i=1}^{D} W(t_i) d_i/Y(t_i) − ∫_0^τ W(s) h_par(s) ds, (5.4.1)

where d_i is the number of events at the observed event times t_1, …, t_D and Y(t_i) is the number of individuals under study just prior to the observed event time t_i. The quantity d_i/Y(t_i) gives a crude estimate of the hazard rate at event time t_i. As for the KS test, we evaluated the closeness of the Kaplan-Meier survival function estimator to the hypothesized S_0(t).
To investigate the comparison, the power of the tests is computed numerically based on the bootstrapped p-value for several choices of sample size, n = 25, 50, 75, and 100. The results are displayed in Table 5.2, where LR_SR and KS_SR stand for the log-rank and Kolmogorov goodness-of-fit tests computed under the shape restricted alternative, and LR_UR and KS_UR represent the tests computed under a general, not necessarily shape restricted, alternative.

When the LR test statistic is used, there is a sizable percentage power gain under the shape restricted alternative. A power gain for the shape restricted KS test over the Kaplan-Meier-based KS test was also observed in all cases, although the extent was marginal. A plausible explanation is that the maximum difference between the estimated and target survival functions is achieved in a region that is relatively unaffected by the shape restriction on the estimate. Also, the KS tests seem to have better power than the LR tests in this example.

As n increases, the power increases as expected in all cases, and the size of the test approaches the nominal level under the null for both the shape restricted and unrestricted tests. For small samples the shape restricted test has better size than the unrestricted test under the null.

Figure 5.2 shows power as a function of η for sample sizes 25, 50, and 100, based on 1000 replications and 1000 bootstrap samples for each replication. The power curves are constructed using the log-rank test statistic; from the graph we can see that the rejection probabilities increase with sample size, as expected.
Table 5.1: Power values for specific values of η at nominal level 0.05 and n = 25, 50, and 100, based on the log-rank (LR) and Kolmogorov goodness-of-fit (KS) tests at 0 and 25 percent censoring.

          n = 25          n = 50          n = 100
η         LR      KS      LR      KS      LR      KS

0 percent censoring
1.00      0.055   0.057   0.054   0.049   0.047   0.053
0.75      0.077   0.140   0.089   0.282   0.124   0.578
0.50      0.152   0.671   0.263   0.963   0.471   0.997
0.25      0.308   0.987   0.714   0.997   0.945   0.999

25 percent censoring
1.00      0.041   0.066   0.039   0.057   0.056   0.047
0.75      0.055   0.178   0.078   0.278   0.145   0.551
0.50      0.070   0.765   0.227   0.957   0.387   0.992
0.25      0.159   0.997   0.647   0.996   0.929   0.999
Table 5.2: Size-power comparison of shape constrained and unconstrained tests for specific values of η at nominal level 0.05, based on LR and KS without censoring.

η      n     LR_SR   LR_UR   % power gain   KS_SR   KS_UR   % power gain
1.00   25    0.055   0.035   -              0.057   0.056   -
0.75         0.077   0.056   37.5           0.140   0.132   6.1
0.50         0.152   0.123   23.6           0.671   0.667   0.6
0.25         0.308   0.256   20.3           0.987   0.978   0.9
1.00   50    0.054   0.038   -              0.049   0.056   -
0.75         0.089   0.055   61.8           0.282   0.266   6.0
0.50         0.263   0.171   53.8           0.963   0.902   6.8
0.25         0.714   0.375   90.4           0.997   0.992   0.5
1.00   75    0.054   0.043   -              0.051   0.053   -
0.75         0.113   0.081   40.0           0.421   0.413   1.99
0.50         0.289   0.195   48.2           0.983   0.982   0.1
0.25         0.862   0.397   117.1          0.999   0.999   0.0
1.00   100   0.047   0.055   -              0.052   0.053   -
0.75         0.124   0.094   31.9           0.578   0.535   7.9
0.50         0.471   0.213   121.1          0.999   0.998   0.1
0.25         0.945   0.411   129.9          1.000   1.000   0.0
Figure 5.2: Power at selected η values for nominal level 0.05 for the log-rank test, for sample sizes 25 (solid curve), 50 (dotted curve), and 100 (short dashed curve); the long dashed line represents the nominal level α = 0.05.
5.4.1 Head and Neck Example Reanalysis
For illustration, we apply our test to the head and neck data set discussed earlier
in Chapter 4, Section 4.2. We are interested in testing the hypothesis that the lifetimes come
from a Weibull distribution against a shape-restricted alternative. After transforming the
data into months, the maximum likelihood estimates of the Weibull parameters,
α = 0.988 and λ = 13.580, were obtained by the Newton-Raphson method. The
algorithm described in Section 5.3 was used, with B = 1000 bootstrap replications and the
test statistics given in Section 5.1, to compute the p-values. Bootstrapped p-values of 0.000 and 0.007 for
the LR and KS tests, respectively, were obtained. Thus, the null hypothesis that the lifetimes
can be modeled by a Weibull distribution was rejected at the 0.05 significance level. The
p-values indicate the inadequacy of the Weibull fit, supporting the view that the Weibull
distribution is not appropriate when the hazard function is not monotone.
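For readers who wish to reproduce the null-model fit, the following sketch maximizes the right-censored Weibull likelihood directly. It assumes the parameterization S(t) = exp{-(t/λ)^α}, which may differ from the convention used elsewhere in this dissertation, and it uses a generic simplex optimizer rather than the Newton-Raphson iteration described above; the data shown are toy values.

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik(params, t, delta):
    """Negative Weibull log-likelihood for right-censored data, assuming
    S(t) = exp(-(t/lam)**a); events contribute the density, censored
    observations contribute the survival function."""
    a, lam = params
    if a <= 0 or lam <= 0:
        return np.inf
    z = t / lam
    loglik = np.sum(delta * (np.log(a / lam) + (a - 1.0) * np.log(z))) - np.sum(z ** a)
    return -loglik

# Toy data: observed times in months; delta = 1 for events, 0 for censored.
t = np.array([2.0, 5.5, 7.1, 12.0, 20.3])
delta = np.array([1.0, 1.0, 0.0, 1.0, 0.0])
fit = minimize(neg_loglik, x0=[1.0, t.mean()], args=(t, delta), method="Nelder-Mead")
alpha_hat, lambda_hat = fit.x
```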
5.4.2 Fan Generators Example
For our second example, we apply our tests to the fan generators data set which can be
found in Therneau and Grambsch (2000, p. 8). This data set originally appeared in Nelson
(1969). The data came from a field engineering study of the time to failure of diesel generator
fans. The data are replicated in Table D.1. The ultimate goal was to decide whether or not
to replace the working fans with a higher quality fan to prevent future failures. Seventy
generators were studied. For each generator, the number of hours of running time from first
being put into service until fan failure or until the end of the study (whichever came first)
was recorded. In this data set there were 11 uncensored and 59 censored observations. From
the data set, the parameters of the hazard function under the null hypothesis were estimated
as α = 1.059 and λ = 262.968.
We assume that this data set has a decreasing convex hazard, since the engineering
problem was to determine whether the failure rate was decreasing over time (see Therneau
and Grambsch, 2000, p. 8). It was possible that the initial failures removed the weaker fans,
so that the failure rate on the remaining fans would be tolerably low.
We use the bootstrap-based p-value method with B = 1000 replications to test the hypothesis
that the data come from a Weibull distribution. Both tests gave very high p-values,
0.923 for LR and 0.957 for KS. We conclude that the data can be modeled by a Weibull
distribution with an approximately constant hazard function, the estimated shape parameter being close to 1.
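The bootstrap p-values in these examples all follow the same parametric resampling recipe. The outline below is only a schematic of that recipe: stat_fn, fit_fn and gen_fn are hypothetical placeholders for the shape-restricted test statistic, the null Weibull fit, and the null resampling step of Section 5.3.

```python
import numpy as np

def bootstrap_pvalue(t, delta, stat_fn, fit_fn, gen_fn, B=1000, seed=1):
    """Generic parametric bootstrap p-value.
    stat_fn(t, delta, params): observed value of the test statistic;
    fit_fn(t, delta): null (Weibull) parameter estimates;
    gen_fn(params, n, rng): a resample (t*, delta*) drawn under the null.
    All three are user-supplied placeholders."""
    rng = np.random.default_rng(seed)
    params = fit_fn(t, delta)
    observed = stat_fn(t, delta, params)
    exceed = 0
    for _ in range(B):
        tb, db = gen_fn(params, len(t), rng)
        pb = fit_fn(tb, db)                   # refit the null on each resample
        if stat_fn(tb, db, pb) >= observed:   # large values indicate departure
            exceed += 1
    return (exceed + 1) / (B + 1)             # add-one Monte Carlo correction
```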
5.4.3 Bone Marrow Transplantation for Leukemia Data Example Revisited
In Chapter 4 we estimated the hazard function for the bone marrow transplant for
leukemia data using our method and other estimators. In this example, our main interest is to test
whether the data can be modeled by a parametric hazard function against a shape-restricted alternative.
From the data set, the parameters of the hazard function under the null hypothesis were
estimated as α = 0.658 and λ = 1097.514. The bootstrapped p-values for the LR and KS
tests were 0.190 and 0.658, respectively. From these results, we cannot reject the null
hypothesis that the hazard function can be modeled by a Weibull distribution with a
decreasing hazard rate.
Chapter 6
CONCLUSIONS AND FUTURE RESEARCH
6.1 Summary
In this dissertation, we have introduced a nonparametric method for estimating the
hazard or survival function when shape restrictions are imposed on the hazard function, for
both uncensored and right-censored data.
The results from the simulations and real data sets demonstrate the flexibility and
practicality of our estimator. The benefit of the shape-restricted approach is that it can be
used to model any lifetime data whose hazard shape is known, whereas a parametric approach
requires complete knowledge of the underlying hazard function.
Although other nonparametric methods, such as kernel and Kaplan-Meier based estimators,
can be applied to estimate the hazard or survival function, they may waste
important information about the true underlying hazard function. Moreover, such
nonparametric methods may require user-defined tuning parameters, such as the bandwidth. Hence,
when the only available information is that the underlying hazard function is decreasing,
increasing, concave, convex or bathtub-shaped, the shape-restricted method provides a more
acceptable estimate.
6.2 Bayesian Approach To Shape Restricted Hazard Function
The shape-constrained maximum likelihood estimate may be difficult to interpret because
it is not smooth. In this work we suggest using a Bayesian approach, with a prior on the
shape-restricted hazard function, to serve as a smoothing technique. We will then investigate
the estimation of the hazard function and the related hypothesis tests.
Before going into the methods of finding Bayesian tests, we first review the Bayesian
approach to statistics. In the classical approach, the parameter θ is thought of as an
unknown but fixed quantity. A random sample t1, . . . , tn is drawn from a population indexed
by θ, and knowledge about the value of θ is obtained from the observed values in the sample.
In the Bayesian approach, θ is considered to be a quantity whose variation
can be described by a probability distribution, called the prior distribution. A sample
t1, . . . , tn is then taken from a population indexed by θ, and the prior distribution is updated
with this sample information. The updated prior is called the posterior distribution; the
updating is done using Bayes' rule. All inferences about θ are then based on the posterior
distribution.
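In symbols, writing π(θ) for the prior and L(θ; t1, . . . , tn) for the likelihood, the updating step is

π(θ | t1, . . . , tn) = L(θ; t1, . . . , tn) π(θ) / ∫ L(θ′; t1, . . . , tn) π(θ′) dθ′ ∝ L(θ; t1, . . . , tn) π(θ).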
Prior Distributions To use the Bayesian approach in our case, we place priors
on all the unknown parameters. The unknown parameters in the shape-restricted model
are the coefficients of the edge vectors and the coefficients of the linear vectors. For example, if
we consider the convex constraint,

θ = ∑_{j=1}^{n−2} b_j δ_j + c_1 ν_1 + c_2 ν_2,

where b_j ≥ 0 for j = 1, . . . , n − 2, and c_1 and c_2 are arbitrary real numbers.
Meyer and Laud (2005) and Jiang (2005) addressed the problem of incorporating prior
information on the unknown parameters of a shape-restricted model. Following their
procedure, we will assign gamma priors to the coefficients of the edge vectors, because those
coefficients must be nonnegative.
We will place normal priors, with means µ1, µ2 and variances M1, M2, on the
linear coefficients c1 and c2; the normal prior is suitable because these coefficients can
take any real value.
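Under these choices, and writing a_j and β_j for hypothetical gamma hyperparameters (not specified in this outline), the joint prior would take the form

π(b_1, . . . , b_{n−2}, c_1, c_2) ∝ ∏_{j=1}^{n−2} b_j^{a_j−1} e^{−β_j b_j} × exp{−(c_1 − µ_1)²/(2M_1)} × exp{−(c_2 − µ_2)²/(2M_2)}, with b_j ≥ 0.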
Selection of the prior parameters can be difficult. Jiang (2005) discussed some ways
of choosing prior parameters. In this work, we will extend these methods and derive the
posterior distribution, from which the hazard function will be estimated. The posterior
density is typically complicated and difficult to obtain directly; however, numerical
techniques such as Gibbs sampling will be used to alleviate this difficulty.
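To make the sampling step concrete, the skeleton below sketches a generic one-coordinate-at-a-time sampler for a posterior known only up to a normalizing constant. It is a stand-in for the Gibbs scheme mentioned above, not the scheme itself: where closed-form full conditionals are available, they would replace the random-walk proposals, and log_post is a hypothetical user-supplied function.

```python
import numpy as np

def metropolis_within_gibbs(log_post, theta0, n_iter=5000, step=0.1, seed=2):
    """One-coordinate-at-a-time random-walk sampler for a target density
    known only up to a constant through log_post. log_post should return
    -np.inf outside the constraint cone (e.g. when some b_j < 0)."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    draws = np.empty((n_iter, theta.size))
    lp = log_post(theta)
    for it in range(n_iter):
        for j in range(theta.size):
            prop = theta.copy()
            prop[j] += step * rng.normal()            # random-walk proposal
            lp_prop = log_post(prop)
            if np.log(rng.uniform()) < lp_prop - lp:  # Metropolis accept step
                theta, lp = prop, lp_prop
        draws[it] = theta
    return draws
```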
6.3 Marginal Estimation of the Hazard Function Under Shape Restrictions in the
Presence of Dependent Censoring
A crucial assumption in the construction of our likelihood function was that censoring
is independent of failure. Although this is a standard and classical assumption in the
vast majority of the right-censoring literature, it is still less than desirable for certain
applications. Dependent censoring can occur if, for example, there are covariates affecting both the
censoring and failure time distributions. Such covariates could be internal as well as
external. In a sense, the situation with internal covariates is more crucial, since there is no
way to avoid them in some situations. A simple example demonstrates
this. Consider individuals moving through a succession of stages 1, 2 and 3, and suppose we
are interested in estimating the waiting-time distribution in stage 2. In the presence of right
censoring by a variable C, one has induced dependent censoring for the stage 2 waiting time
even if C is completely independent of the stage mechanisms, because the time available to
observe stage 2 depends on the time spent in stage 1.
If one could measure the covariates responsible for dependent censoring, then there are
ways of accounting for them in the construction of the marginal hazard rate function. We plan
to pursue this along with our shape-restricted inference. This can be accomplished via
the weighted approach: one models the censoring hazard in terms of the covariates inducing
dependent censoring, and in all the preceding formulas K is replaced by the corresponding
estimate that incorporates the covariates. We will follow the approach laid out by Satten
et al. (2001) to this end. One flexible way of modeling the censoring hazard
is Aalen's linear model (Aalen, 1989), which produces estimates of K in closed form
using matrices. The rest of the computation should go through without any change.
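As a simplified illustration of the weighting idea in the independent-censoring case, the sketch below estimates the censoring survival function K by a Kaplan-Meier computation with the roles of failure and censoring reversed, and then forms the weights δ_i / K(T_i−). It ignores ties and covariates; under dependent censoring, the Kaplan-Meier step would be replaced by a covariate-dependent model such as Aalen's linear model, as discussed above.

```python
import numpy as np

def censoring_km(t, delta):
    """Kaplan-Meier estimate of the censoring survival function K,
    treating censoring (delta == 0) as the event of interest and
    evaluating K at each observed time's left limit. Ties are ignored."""
    order = np.argsort(t)
    d_sorted = delta[order]
    n = len(t)
    K_sorted = np.ones(n)
    surv = 1.0
    for i in range(n):
        K_sorted[i] = surv              # K(t_(i)-), the left limit
        if d_sorted[i] == 0:            # a censoring "event" occurs here
            surv *= 1.0 - 1.0 / (n - i)
    K = np.empty(n)
    K[order] = K_sorted
    return K

# Inverse-probability-of-censoring weights: w_i = delta_i / K(T_i-).
t = np.array([2.0, 3.5, 4.0, 6.1, 7.3])
delta = np.array([1, 0, 1, 1, 0])
w = delta / censoring_km(t, delta)
```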
6.4 Hazard Function Estimation Using Splines Under Shape Restrictions
The constrained maximum likelihood estimator can be inconsistent at the endpoints, where
"spiking" occurs. This spiking can affect the critical values of the test statistics, and the
estimator does not yield smooth estimates of the hazard function. In this work, a flexible
nonparametric method using splines will be introduced to estimate the hazard rate function
when a shape restriction is imposed on it. We will discuss the method for estimating convex,
concave and other shape-restricted hazard rate functions using the spline approach. A
method for testing the null hypothesis that the hazard function can be modeled by a
parametric family of distributions against a shape-restricted alternative will also be investigated.
Construction of the Spline-Estimator
Recall the log likelihood function for the piecewise linear estimator is,
ℓ =n∑
i=1
log δih(xi) −n∑
i=1
cih(xi) (6.4.1)
where the ci depends on xi.
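For concreteness, a direct transcription of (6.4.1), assuming the hazard values h(x_i) and the weights c_i have already been computed:

```python
import numpy as np

def loglik_hazard(h, delta, c):
    """Evaluate (6.4.1): sum_i delta_i*log h(x_i) - sum_i c_i*h(x_i),
    where h holds the candidate hazard values at the data points and
    c the corresponding weights."""
    h, delta, c = np.asarray(h), np.asarray(delta), np.asarray(c)
    with np.errstate(divide="ignore"):
        event_terms = np.where(delta == 1, np.log(h), 0.0)
    return np.sum(event_terms) - np.sum(c * h)
```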
In this work, for a convex spline hazard rate function, we restrict the estimator to be
piecewise quadratic; the coefficients for the weights can then be obtained by numerical
integration.
Let the knots occur at x_{i_1}, . . . , x_{i_K}; for convenience we set i_1 = 1 and i_K = n, so that
there are K − 2 interior knots. A convex spline hazard rate function can be constructed
by taking a nonnegative linear combination of the edges. The cone edges will be derived
from the c-splines at the data points, using integrated I-splines; for details on I-splines,
see Ramsay (1988). Any piecewise quadratic convex function with
these knots is a nonnegative linear combination of the cone edges plus the functions 1 and x.
For a convex spline, the c-splines are given by:
1. δ^(k)_j = 0 for j = 1, . . . , i_k − 1;

2. δ^(k)_j = (x_j − x_{i_k})² / [2(x_{i_{k+1}} − x_{i_k})] for j = i_k, . . . , i_{k+1};

3. δ^(k)_j = x_j − (x_{i_{k+1}} + x_{i_k})/2 for j = i_{k+1}, . . . , n;

for k = 1, . . . , K − 1.
Figure 6.1: The edges for the convex piecewise quadratic c-splines when K = 5, with equally spaced knots.
C-splines Example 1
Figure 6.1 shows the c-spline edges for the quadratic case with K = 5 equally spaced knots.
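A minimal sketch of the edge construction, transcribing the three-piece formula above (0-based indices; the function name is ours):

```python
import numpy as np

def convex_cspline_edges(x, knot_idx):
    """Edge vectors delta^(k), k = 1, ..., K-1, for the convex
    piecewise-quadratic c-splines, following the three-piece formula
    above. x must be sorted; knot_idx holds the 0-based indices
    i_1 < ... < i_K of the knot positions among the data points."""
    x = np.asarray(x, dtype=float)
    edges = []
    for k in range(len(knot_idx) - 1):
        ik, ik1 = knot_idx[k], knot_idx[k + 1]
        d = np.zeros_like(x)                       # zero before the knot
        mid = slice(ik, ik1 + 1)                   # quadratic piece
        d[mid] = 0.5 * (x[mid] - x[ik]) ** 2 / (x[ik1] - x[ik])
        d[ik1 + 1:] = x[ik1 + 1:] - 0.5 * (x[ik1] + x[ik])  # linear piece
        edges.append(d)
    return np.vstack(edges)

# Example mirroring Figure 6.1: K = 5 equally spaced knots.
x = np.linspace(0.0, 1.0, 9)
edges = convex_cspline_edges(x, knot_idx=[0, 2, 4, 6, 8])
```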
Quadratic Splines Example 2
To illustrate the flexibility of the method, failure times were generated from a Weibull
distribution with parameters α = 3 and λ = 0.03. The results, displayed in Figure 6.2,
show that the shape-restricted quadratic spline is smoother than the SRE and spikes
much less.
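For reference, the simulated failure times can be generated as follows; we read Weibull(3, 0.03) as shape 3 and scale 0.03, i.e. S(t) = exp{−(t/0.03)³}, which matches the time range in Figure 6.2, though the text's parameterization is not spelled out.

```python
import numpy as np

# Shape 3, scale 0.03 reading of Weibull(3, 0.03): S(t) = exp(-(t/0.03)**3).
rng = np.random.default_rng(3)
failure_times = 0.03 * rng.weibull(3.0, size=50)
```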
Figure 6.2: Comparison of the SRE and the quadratic spline; the failure times are generated from a Weibull distribution with shape and scale parameters 3 and 0.03. The dotted curve is the underlying hazard rate, the dashed curve is the SRE estimate, and the solid curve is the shape-restricted quadratic spline estimate.
Bibliography
[1] Aalen, O. O. (1978). Nonparametric inference for a family of counting processes, The Annals
of Statistics, 6, 701-726.
[2] Aalen, O. O. (1989). A linear regression model for the analysis of lifetimes, Statistics in
Medicine, 8, 907-925.
[3] Bickel, P.J, and Doksum, K.A. (1969). Tests for monotone failure rate based on nor-
malized spacings, The Annals of Mathematical Statistics, 40, 1216-1235.
[4] Bickel, P. and Rosenblatt, M. (1973). On some global measures of the deviations of
density function estimates, The Annals of Statistics, 1, 1071-1095.
[5] Barlow, R.E, and Proschan, F. (1969). A note on tests for monotone failure rate based
on incomplete data, The Annals of Mathematical Statistics, 40, 595-600.
[6] Carlin B. P. and Polson N. G. (1991). Inference for nonconjugate Bayesian models using
the Gibbs sampler, Canadian Journal of Statistics, 19, 399-405.
[7] Cox, D. R. (1972). Regression models and life tables, Journal of the Royal Statistical
Society, 34, 187-220.
[8] Davidson, R. and MacKinnon, J. G. (1998). Graphical methods for investigating the size
and power of hypothesis tests, The Manchester School, 66, 1-26.
[9] Efron, B. (1979). Bootstrap methods: another look at the jackknife, The Annals of Statis-
tics, 7, 1-26.
[10] Efron, B. (1981). Censored data and the bootstrap, Journal of the American Statistical
Association, 76, 312-319.
[11] Efron, B. (1987). Better bootstrap confidence intervals, Journal of the American Sta-
tistical Association, 82, 171-200.
[12] Efron, B. (1988). Logistic regression, survival analysis, and the Kaplan-Meier Curve,
Journal of the American Statistical Association, 83, 414-425.
[13] Efron, B. and Tibshirani, R. (1986). Bootstrap methods for standard errors, confidence
intervals, and other measures of statistical accuracy, Statistical Science, 1, 54-75.
[14] Epanechnikov, V.A. (1969). Nonparametric estimation of a multidimensional probability
density, Theory of Probability and its Applications, 14, 153-158.
[15] Gerlach, B. (1987). Testing exponentially against increasing failure rate with randomly
censored data, Statistics, 18, 268-275.
[16] Fleming, T.R, and Harrington, D.P. (1991). Counting Processes and Survival Analysis,
Wiley, New York.
[17] Florenzano, M. and Le Van, C. (2001). Finite Dimensional Convexity and Optimization,
Springer-Verlag, Berlin, Germany.
[18] Fraser, D. A. S. and Massam, H. (1989). A mixed primal-dual bases algorithm for
regression under inequality constraints: application to convex regression, Scandinavian
Journal of Statistics, 16, 65-74.
[19] Gehan, E.A. (1965). A generalized Wilcoxon test for comparing arbitrarily singly-
censored samples, Biometrika, 52, 203-223.
[20] Gelfand, A. E., Hills, S. E., Racine-Poon, A. and Smith, A. F. M. (1990). Illustration of
Bayesian inference in normal data models using Gibbs sampling, Journal of the American
Statistical Association, 85, 972-985.
[21] Hall, P, and Heckman, N. (2000). Testing for monotonicity of a regression mean by
calibrating for linear functions, The Annals of Statistics, 28, 20-39.
[22] Hall, P. and Horowitz, J. L. (1996). Bootstrap critical values for tests based on gener-
alized-method-of-moments estimators, Econometrica, 64, 891-916.
[23] Hess, K. R, Serachitopol, D. M. and Brown, B. W. (1999). Hazard function estimation:
A simulation study, Statistics in Medicine, 18, 3075-3088.
[24] Kalbfleisch, J.D. and Prentice, R.L. (1980). The Statistical Analysis of Failure Time
Data, Wiley, New york.
[25] Kaplan, E. L. and Meier, P. (1958). Nonparametric estimation from incomplete observations,
Journal of the American Statistical Association, 53, 457-481.
[26] Jiang, Y. (2005). Semiparametric ANCOVA using shape restrictions, unpublished PhD
dissertation, University of Georgia, Athens, Georgia.
[27] Klein, J. P, and Moeschberger, M. L. (1997). Survival Analysis: Techniques for Censored
and Truncated Data, Springer-Verlag, New York.
[28] Koul, H., Susarla,V., Van Ryzin, J. (1981). Regression analysis of randomly right-
censored data, Annals of Statistics, 9, 1276-1288.
[29] Kouassi, D. A, and Singh, J. M. (1997). A Semiparametric approach to hazard estima-
tion with randomly censored observations, Journal of the American Statistical Associ-
ation, 92, 1351-1355.
[30] Liu, R. Y. C and Van Ryzin, J. (1985). A Histogram estimator of the hazard rate with
censored data, Annals of Statistics, 13, 592-605.
[31] Meyer, M. C. (1999a). An extension of the mixed primal-dual bases algorithm to the
case of more constraints than dimensions, Journal of Statistical Planning and Inference,
81, 13-31.
[32] Meyer, M. C. (1999b). A comparison of nonparametric shape constrained bioassay esti-
mators, Statistics and Probability letters, 42, 267-274.
[33] Meyer, M. C. (2003). A test for linear versus convex regression function using shaped-
restricted regression, Biometrika, 90, 223-232.
[34] Meyer, M. C. (2006). Shape- Restricted Regression Splines, preprint.
[35] Meyer, M. C., and Lund, R. (2003). Inference in shape-restricted regression with time-
series data, preprint.
[36] Meyer, M. C., and Laud, P. W. (2005). A Bayesian approach to shape-restricted regres-
sion, Department of Statistics Technical Report, the University of Georgia.
[37] Meyer, M. C., and Woodroofe, M. (2004). Estimation of a unimodal density using shape
restrictions, Canadian Journal of Statistics, 32, 85-100.
[38] Miller, R. G. (1981). Survival Analysis, John Wiley & Sons, New York.
[39] Miller, R. G. (1983). What price Kaplan-Meier? Biometrics, 39, 1077-1081.
[40] Muller, H. G. and Wang, J. L. (1990). Nonparametric analysis of changes in hazard
rates with censored survival data: an alternative to change-point models, Biometrika, 77,
305-314.
[41] Mudholkar, G. S., Srivastava, D. K. and Freimer, M. (1995). The exponentiated Weibull
family: a reanalysis of the bus-motor failure data, Technometrics, 37, 436-445.
[42] Proschan, F. and Pyke, R. (1967). Tests for monotone failure rate, Fifth Berkeley Sym-
posium, 3, 293-313.
[43] Peto, R. and Peto, J. (1972). Asymptotically efficient rank invariant test procedures,
Journal of the Royal Statistical Society, 135, 185-206.
[44] Ramlau-Hansen, H.(1983). Smoothing counting process intensities by means of kernel
functions, The Annals of Statistics, 11, 435-466.
[45] Ramsay, J. O. (1988). Monotone Regression Splines in action, Statistical Science, 3,
425-461.
[46] Robertson, T., Wright, F. T., and Dykstra, R. L. (1988). Order Restricted Statistical
Inference, John Wiley & Sons, New York.
[47] Rockafellar, R. T. (1970). Convex Analysis, Princeton University Press, New Jersey.
[48] Satten, G. A. and Datta, S. (2001). The Kaplan-Meier estimator as inverse-probability-
of-censoring weighted average, The American Statistician, 55, 207-210.
[49] Satten, G. A., Datta, S. and Robins, J. (2001). Estimating the marginal survival func-
tion in the presence of time-dependent covariates, Statistics and Probability Letters, 54,
397-403.
[50] Silverman, B.W. (1986). Density Estimation for Statistics and Data Analysis, Chapman
and Hall, New York.
[51] Smith, P.J. (2002). Analysis of Failure and Survival Data, Chapman and Hall, Boca
Raton, Florida.
[52] Tanner, M. A.(1983). A note on the variable kernel estimator of the hazard function
from censored data, The Annals of Statistics, 11, 994-998.
[53] Tanner, M. A. and Wong, W. H. (1983). The estimation of the hazard function from
randomly censored data by the kernel method, The Annals of Statistics, 11, 989-993.
[54] Therneau, T, and Grambsch, P. M. (2000). Modeling survival data : extending the Cox
model, Springer, New York.
[55] Watson, G.S, and Leadbetter, M. R. (1964a) Hazard analysis I, Biometrika, 51, 175-184.
[56] Watson, G.S, and Leadbetter, M. R. (1964b) Hazard analysis II, Sankhya, 26, 101-116.
[57] Wegman, E. J. and Wright, I. W. (1983). Splines in statistics, Journal of the American
Statistical Association, 78, 351-365.
[58] Xiong, J. P., Miller, F. G. and Yan, Y. (2004). Testing increasing hazard rate for the progres-
sion time of dementia, Discrete and Continuous Dynamical Systems Series B, 4, 813-821.
Appendix A
Head and Neck Cancer Data for Arm A
The head and neck data set is from a clinical trial conducted by the Northern California
Oncology Group and discussed by Efron (1988). The data represent the survival times of 51
head and neck cancer patients under treatment A, who were given radiation therapy. Nine
patients were lost to follow-up and were treated as censored.
Table A.1: Survival times (in days) for patients in Arm A of the Head and Neck Cancer Trial; status 1 denotes death, status 0 denotes observations lost to follow-up.

  Time  Status   Time  Status   Time  Status
     7    1       146    1       297    1
    34    1       149    1       319    0
    42    1       154    1       405    1
    63    1       157    1       417    1
    64    1       160    1       420    1
    74    0       160    1       440    1
    83    1       165    1       523    1
    84    1       173    1       523    0
    91    1       176    1       583    1
   108    1       185    0       594    1
   112    1       218    1      1101    1
   129    1       225    1      1116    1
   133    1       241    1      1146    1
   133    1       248    1      1226    0
   139    1       273    1      1349    0
   140    1       277    1      1412    0
   140    1       279    0      1417    1
Appendix B
Bone Marrow Transplantation for Leukemia Data
The bone marrow transplant for leukemia data set is taken from Klein and Moeschberger (2003, p.
3). Of their three groups of leukemia patients, we considered the patients in the group
with acute lymphoblastic leukemia (ALL). Transplantation is considered a failure when a
patient's leukemia returns (relapse) or when he or she dies while in remission (treatment-
related death). In this data set there were 23 uncensored and 14 censored observations.
Table B.1: Bone marrow transplantation data for the acute lymphoblastic leukemia (ALL) group; status 0 indicates alive or disease-free, status 1 indicates dead or relapsed.

  Time  Status   Time  Status   Time  Status
     1    1       226    0       996    0
    55    1       230    1      1111    0
    74    1       276    1      1167    0
    86    1       332    1      1182    0
   104    1       383    1      1199    0
   107    1       418    1      1330    0
   109    1       466    1      1377    0
   110    1       487    1      1433    0
   122    1       526    1      1462    0
   129    1       530    0      1496    0
   172    1       609    1      1602    0
   192    1       662    1      2081    0
   194    1
Appendix C
Survival Data for Leukemia Patients
This data set is taken from Lee (1980, Table 3.3, p. 72); it originally appeared
in Hart et al. (1977). Seventy-one adult patients with acute lymphoblastic leukemia (ALL) and acute
myeloblastic leukemia (AML) were studied at the M.D. Anderson Hospital and Tumor Institute.
Although the data set contains several covariates, in this study we consider only the survival
times from diagnosis of the AML patients and their survival status (alive or
dead). There were 51 observations, of which 6 are censored.
Table C.1: Survival times from diagnosis for leukemia patients; status 0 indicates still alive and status 1 indicates dead.

  Time  Status   Time  Status   Time  Status   Time  Status
     1    1         1    1         4    1        12    1
     1    1         1    1         5    1        13    1
     1    1         2    1         5    1        13    1
     1    1         2    1         7    1        13    1
     1    1         2    1         8    1        14    1
     1    1         3    1         8    1        15    1
     1    1         3    1         9    1        18    1
     1    1         3    1         9    1        18    1
     1    1         4    1         9    1        20    0
     1    1         4    1        12    1        24    1
Appendix D
Generator Fans Failure Data
This data set is found in Therneau and Grambsch (2000, p. 8) and originally appeared in Nelson (1969). The
data come from a field engineering study of the time to failure of diesel generator fans. The
ultimate goal was to decide whether or not to replace the working fans with a higher quality
fan to prevent future failures. Seventy generators were studied. For each one, the number of
hours of running time from its first being put into service until fan failure or until the end
of the study (whichever came first) was recorded. In this data set there were 11 uncensored
and 59 censored observations.
Table D.1: Generator fan failure data, in thousands of hours of running time; status 1 indicates failure and status 0 indicates censored.

  Hours  Status   Hours  Status   Hours  Status
    4.5    1       20.3    0       41.5    0
    4.6    0       20.7    1       41.5    0
   11.5    1       20.7    1       41.5    0
   11.5    1       20.8    1       41.5    0
   15.6    0       22.0    0       43.0    0
   16.0    1       30.0    0       43.0    0
   16.6    0       30.0    0       43.0    0
   18.5    0       30.0    0       43.0    0
   18.5    0       30.0    0       46.0    1
   18.5    0       31.0    1       48.5    0
   18.5    0       32.0    0       48.5    0
   18.5    0       34.5    1       48.5    0
   20.3    0       37.5    0       48.5    0
   20.3    0       37.5    0       50.0    0