
A Bayesian nonparametric estimator based on left censored data


J. Ital. Statist. Soc. (1996) 2, pp. 285-295

A BAYESIAN NONPARAMETRIC ESTIMATOR BASED ON LEFT CENSORED DATA

Stephen Walker Imperial College, London, UK

Pietro Muliere* Università di Pavia, Italy

Summary

This paper introduces a Bayesian nonparametric estimator for an unknown distribution function based on left censored observations.

Hjort (1990)/Lo (1993) introduced Bayesian nonparametric estimators derived from beta/beta-neutral processes which allow for right censoring. These processes are taken as priors from the class of neutral to the right processes (Doksum, 1974). The Kaplan-Meier nonparametric product limit estimator can be obtained from these Bayesian nonparametric estimators in the limiting case of a vague prior.

The present paper introduces what can be seen as the corresponding left beta/beta-neutral process prior, which allows for left censoring. The Bayesian nonparametric estimator is obtained, as is the corresponding product limit estimator based on left censored data.

Keywords: Beta-neutral process; Dirichlet process; Neutral to the left process; Neutral to the right process; Product limit estimator.

1. Introduction

Ware and DeMets (1976) introduced a nonparametric estimator for a distribution function based on arbitrary left censored observations. They obtained their estimator by considering a reversal of time, treating the resulting data as being right censored and then using the theory of Kaplan and Meier (1958). The aim of this paper is to generalise and derive the estimator of Ware and DeMets by working within a Bayesian nonparametric framework. A new process is introduced, which is shown to be neutral to the left, and which is taken as a prior on the space of distribution functions. This prior is updated to the posterior given arbitrary left censored observations, from which the estimators are derived.

* Address for correspondence: Dipartimento di Economia Politica e Metodi Quantitativi, Università di Pavia, Via S. Felice, 27100 Pavia, Italy. E_mail: [email protected]

Definition 1. (Doksum, 1974) The random distribution function $F(t)$ is said to be neutral to the right if the normalised increments

$$F(t_1),\ \frac{F(t_2)-F(t_1)}{1-F(t_1)},\ \ldots,\ \frac{F(t_k)-F(t_{k-1})}{1-F(t_{k-1})}$$

are independent for all $t_1 < \ldots < t_k$. The random distribution function $F(t)$ is said to be neutral to the left if

$$F(t_k),\ \frac{F(t_k)-F(t_{k-1})}{F(t_k)},\ \ldots,\ \frac{F(t_2)-F(t_1)}{F(t_2)}$$

are independent.
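To make the neutral to the left increments concrete, the following Python sketch computes them from the values of a distribution function at $t_1 < \ldots < t_k$ and then inverts the map. It is an illustration only, not taken from the paper: the function names are hypothetical and every $F(t_j)$ is assumed to be strictly positive so that the ratios are defined.

```python
def left_normalised_increments(cdf_values):
    """Return F(t_k) and the ratios [F(t_j) - F(t_{j-1})] / F(t_j) for j = k, ..., 2."""
    top = cdf_values[-1]
    ratios = [
        (cdf_values[j] - cdf_values[j - 1]) / cdf_values[j]
        for j in range(len(cdf_values) - 1, 0, -1)
    ]
    return top, ratios


def rebuild_from_left(top, ratios):
    """Invert the map: F(t_{j-1}) = F(t_j) * (1 - ratio_j), working down from t_k."""
    values = [top]
    for r in ratios:
        values.append(values[-1] * (1.0 - r))
    return values[::-1]  # back to increasing time order
```

The inversion shows why a neutral to the left process is naturally written as a product of terms of the form (1 - ratio) taken over times above $t$.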

The Dirichlet process is conjugate to exact observations whereas a neutral to the right process is conjugate (that is, also neutral to the right) to arbitrary right censored observations. It is anticipated therefore that a neutral to the left process will be conjugate to arbitrary left censored data. In particular we define a neutral to the left process which can be seen as corresponding to the beta process (a neutral to the right process) introduced by Hjort (1990). For further discussion and cases involving left censored observations, see Andersen et al. (1993).

2. Preliminaries

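Let $\Omega = (0,\infty)$ and let $\mathcal{B}$ be the Borel $\sigma$-field in $\Omega$. Let $\alpha(\cdot)$ be a finite non-null measure defined on $(\Omega, \mathcal{B})$ and $\mu_\alpha(\cdot)$ a gamma process with shape measure $\alpha(\cdot)$. That is,

$$\mu_\alpha(0,t] - \mu_\alpha(0,s] \sim \mathrm{gamma}(\alpha(0,t] - \alpha(0,s],\ 1), \quad \text{for all } s < t,$$

and $\mu_\alpha(\cdot)$ is an independent increment process. Ferguson (1973) defined the Dirichlet process on $(\Omega, \mathcal{B})$ by

$$F_\alpha(t) = \mu_\alpha(0,t] / \mu_\alpha(0,\infty). \qquad (1)$$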
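Construction (1) can be sketched on a finite grid by normalising independent gamma increments. The Python code below is a minimal illustration under stated assumptions, not the authors' implementation: the grid cells are assumed to partition $(0,\infty)$, `alpha_masses` is a hypothetical list giving the mass $\alpha$ assigns to each cell, and each mass must be strictly positive.

```python
import random


def dirichlet_cdf_on_grid(alpha_masses, rng):
    """Draw F at the right endpoints of the grid cells from a Dirichlet process.

    alpha_masses[i] is the (assumed) mass that alpha gives to cell i; the cells
    partition (0, inf), so the last value returned equals 1.
    """
    # Independent increments: mu_alpha(cell i) ~ gamma(alpha(cell i), 1).
    increments = [rng.gammavariate(a, 1.0) for a in alpha_masses]
    total = sum(increments)
    cdf, running = [], 0.0
    for g in increments:
        running += g
        cdf.append(running / total)  # mu_alpha(0, t_i] / mu_alpha(0, inf)
    cdf[-1] = 1.0  # guard against floating-point rounding
    return cdf
```

For example, `dirichlet_cdf_on_grid([0.5, 1.0, 2.0, 0.5], random.Random(0))` returns one random nondecreasing vector ending at 1.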

Lo (1993) defined a (right) beta-neutral process based on the beta process of Hjort (1990). These processes are generalisations of the Dirichlet process and can be seen as arising by rewriting (1) in the form

$$F_\alpha(t) = 1 - \prod_{y:\, y \le t} \frac{\mu_\alpha[y^*,\infty)}{\mu_\alpha[y,\infty)}, \qquad (2)$$


where the product is over all (random) $y$ such that

$$\Delta\mu_\alpha(y) = \mu_\alpha[y,\infty) - \mu_\alpha[y^*,\infty) > 0 \qquad (3)$$

and $y < y^*$ are the locations of two consecutive jumps. The beta-neutral generalisation of the Dirichlet process follows by introducing another gamma process $\mu_\beta(\cdot)$, independent of $\mu_\alpha(\cdot)$, where $\beta(\cdot)$ is a finite measure defined on $(\Omega, \mathcal{B})$. Then (2) is replaced by

$$F_{\alpha,\beta}(t) = 1 - \prod_{y:\, y \le t} \left[ \frac{\mu_\alpha[y^*,\infty) + \mu_\beta[y,\infty)}{\mu_\alpha[y,\infty) + \mu_\beta[y,\infty)} \right], \qquad (4)$$

written $F_{\alpha,\beta} \sim \mathrm{RBN}(\alpha,\beta)$. Now it is seen that (4) can be written as

$$F_{\alpha,\beta}(t) = 1 - \prod_{y:\, y \le t} \left\{ 1 - \frac{\Delta\mu_\alpha(y)}{\mu_\alpha[y,\infty) + \mu_\beta[y,\infty)} \right\},$$

which is more convenient to write as

$$F_{\alpha,\beta}(t) = 1 - \prod_{[0,t]} (1 - dV(s)),$$

where $\prod_{[0,t]}$ represents a product-integral (Gill and Johansen, 1990) and $V(\cdot)$ is a beta process (Hjort, 1990); that is, an independent increment process such that at an infinitesimal level the jumps follow a beta distribution,

$$dV(s) \sim \mathrm{beta}(d\alpha(s),\ \alpha[s,\infty) + \beta[s,\infty) - d\alpha(s)).$$

Definition 2. (Hjort, 1990) Let $V_0$ be a cumulative hazard with a finite number of jumps taking place at $t_1 < t_2 < \ldots$ and let $c(\cdot)$ be a piecewise continuous, nonnegative function on $[0,\infty)$. Say that the Lévy process $V$ is a beta process with parameters $c(\cdot)$ and $V_0(\cdot)$, and write

$$V \sim \mathrm{beta}\{c(\cdot), V_0(\cdot)\}$$

to indicate this, if the following hold: $V$ has Lévy representation

$$E(\exp(-\theta V(t))) = \left[ \prod_{j:\, t_j \le t} E \exp(-\theta S_j) \right] \exp\left( -\int_0^1 (1 - e^{-\theta s})\, dL_t(s) \right),$$


where the $S_j$ are independent beta variables

$$S_j \sim \mathrm{beta}(c(t_j) V_0\{t_j\},\ c(t_j)(1 - V_0\{t_j\}))$$

and

$$dL_t(s) = \int_0^t c(z)\, s^{-1} (1-s)^{c(z)-1}\, dV_{0,c}(z)\, ds$$

in which $V_{0,c}(t) = V_0(t) - \sum_{t_j \le t} V_0\{t_j\}$ is $V_0$ with the jumps removed.

The beta-neutral generalisation of the Dirichlet process is seen to be more restrictive than that of the beta process. The beta process allows

$$dV(s) \sim \mathrm{beta}(c(s)\,dV_0(s),\ c(s)(1 - dV_0(s)))$$

for an arbitrary nonnegative function c(.). This is a useful specification since then

$$E F(t) = 1 - \prod_{[0,t]} (1 - dV_0(s)),$$

so that if $dV_0(s) = dG(s)/G[s,\infty)$, where $G(\cdot)$ is a distribution function, then $EF = G$. However, for such a parameterisation with the beta-neutral process it is necessary for

$$c(s) = \alpha[s,\infty) + \beta[s,\infty),$$

implying that $c(\cdot)$ must be nonincreasing. The beta-neutral representation in terms of a Lévy process $Z$, $F(t) = 1 - \exp(-Z(t))$, is detailed in Walker and Muliere (1997).

3. A Left Beta-neutral Process

In this section we introduce a neutral to the left process based on the beta process. Another way of writing (1) is given by

$$F_\alpha(t) = \prod_{y:\, y > t} \frac{\mu_\alpha(0,y^*]}{\mu_\alpha(0,y]}, \qquad (5)$$


where now the product is over all (random) $y$ such that

$$\Delta\mu_\alpha(y) = \mu_\alpha(0,y] - \mu_\alpha(0,y^*] > 0 \qquad (6)$$

and now $y^* < y$ are the locations of two consecutive jumps. A generalisation of (5), similar to the generalisation given by (4), again follows from the introduction of the gamma process $\mu_\beta(\cdot)$. Then (5) is replaced by

$$F_{\alpha,\beta}(t) = \prod_{y:\, y > t} \left[ \frac{\mu_\alpha(0,y^*] + \mu_\beta(0,y]}{\mu_\alpha(0,y] + \mu_\beta(0,y]} \right], \qquad (7)$$

which can be written as

$$F_{\alpha,\beta}(t) = \prod_{y:\, y > t} \left\{ 1 - \frac{\Delta\mu_\alpha(y)}{\mu_\alpha(0,y] + \mu_\beta(0,y]} \right\}.$$

Therefore $F_{\alpha,\beta}(t)$ can easily be identified as a neutral to the left process and is called a left beta-neutral process, written $F_{\alpha,\beta} \sim \mathrm{LBN}(\alpha,\beta)$. The neutral to the left property can be seen clearly by writing

$$F_{\alpha,\beta}(t) = \prod_{(t,\infty)} (1 - dW(s)), \qquad (8)$$

where $W(\cdot)$ is a beta process with

$$dW(s) \sim \mathrm{beta}(d\alpha(s),\ \alpha(0,s] + \beta(0,s] - d\alpha(s)).$$

Note that here $dW(s)$ represents $W(s-\epsilon, s]$ and likewise for $\alpha(\cdot)$ and so on. Next we show that $F_{\alpha,\beta}$ is almost surely a random distribution function.
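Representation (8) suggests a simple grid approximation: draw one beta variable per cell and multiply the complements over the cells above $t$. The Python sketch below is an assumption-laden discretisation, not the authors' construction. The names `alpha_masses` and `beta_masses` are hypothetical cell masses of $\alpha$ and $\beta$, the cells are assumed to partition $(0,\infty)$, each $\alpha$ mass must be positive, and a beta variable whose second parameter is zero is read as the point mass at 1.

```python
import random


def left_beta_neutral_on_grid(alpha_masses, beta_masses, rng):
    """Draw a discretised LBN(alpha, beta) distribution function.

    Returns (F below the first cell, [F(t_1), ..., F(t_m)]), where t_j is the
    right endpoint of cell j and F(t_m) = 1.
    """
    w, alpha_cum, beta_cum = [], 0.0, 0.0
    for a, b in zip(alpha_masses, beta_masses):
        beta_cum += b                    # beta(0, t_j]
        second = alpha_cum + beta_cum    # alpha(0, t_j] + beta(0, t_j] - alpha(cell j)
        # W_j ~ beta(alpha(cell j), second); a zero second parameter is read as W_j = 1.
        w.append(rng.betavariate(a, second) if second > 0 else 1.0)
        alpha_cum += a                   # alpha(0, t_j]
    cdf, running = [0.0] * len(w), 1.0
    for j in range(len(w) - 1, -1, -1):
        cdf[j] = running                 # product of (1 - W_i) over cells i > j
        running *= 1.0 - w[j]
    return running, cdf
```

When `beta_masses` are all zero the first beta variable is degenerate at 1, so no mass is left below the first cell, which matches the Dirichlet special case.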

Lemma.

$$\lim_{t\to\infty} F_{\alpha,\beta}(t) = \lim_{t\to\infty} \prod_{(t,\infty)} (1 - dW(s)) = 1 \quad \text{a.s.}$$

Proof. It is easy to see that $F_{\alpha,\beta}(t)$ is nondecreasing a.s. and that $F_{\alpha,\beta}(t) \le 1$ a.s.


Additionally

$$E F_{\alpha,\beta}(t) = \prod_{(t,\infty)} \left( 1 - \frac{d\alpha(s)}{\alpha(0,s] + \beta(0,s]} \right)$$

which, assuming $\alpha(\cdot)$ to be continuous, gives

$$E F_{\alpha,\beta}(t) = \exp\left[ -\int_t^\infty \frac{d\alpha(s)}{\alpha(0,s] + \beta(0,s]} \right].$$

Therefore

$$E F_{\alpha,\beta}(t) \ge \exp\left[ -\int_t^\infty \frac{d\alpha(s)}{\alpha(0,s]} \right] = \frac{\alpha(0,t]}{\alpha(0,\infty)}$$

and, since $E F_{\alpha,\beta}(t) \le 1$, this implies that $E F_{\alpha,\beta}(t) \to 1$ as $t \to \infty$. From the bounded convergence theorem it follows that $F_{\alpha,\beta}(t) \to C$ a.s. and $E F_{\alpha,\beta}(t) \to EC$, where $C \le 1$ a.s. Therefore $EC = 1$, which ensures that $C = 1$ a.s., completing the proof.
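The last equality in the lower bound is stated without its intermediate step. A short check, under the same continuity assumption on $\alpha(\cdot)$ and additionally assuming $\alpha(0,t] > 0$ (an assumption made here so that the logarithm is defined), might run as follows:

```latex
\begin{align*}
\int_t^\infty \frac{d\alpha(s)}{\alpha(0,s]}
  &= \Bigl[\log \alpha(0,s]\Bigr]_{s=t}^{s=\infty}
   = \log \alpha(0,\infty) - \log \alpha(0,t], \\
\exp\Bigl[-\int_t^\infty \frac{d\alpha(s)}{\alpha(0,s]}\Bigr]
  &= \frac{\alpha(0,t]}{\alpha(0,\infty)} .
\end{align*}
```

The inequality itself only uses $\beta(0,s] \ge 0$: adding $\beta(0,s]$ enlarges the denominator, so the integral shrinks and its negative exponential grows.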

For a general beta process to work here, say

$$dW(s) \sim \mathrm{beta}(c(s)\,dW_0(s),\ c(s)(1 - dW_0(s))),$$

it is required that

$$\lim_{t\to\infty} \int_t^\infty dW_0(s) = 0.$$

This condition is satisfied if $dW_0(s) = dG(s)/G(0,s]$ for some probability distribution $G(\cdot)$ defined on $(0,\infty)$. This is a particularly useful specification since it implies that $EF = G$.

4. Posterior Distributions

Let $Y_1, \ldots, Y_n$ be iid observations from $F$, an unknown distribution on $(0,\infty)$. The $Y_i$s are subject to random left censoring by $L_1, \ldots, L_n$, assumed to be iid from some distribution $H$ and independent of the $Y_i$s. Observed are $X_1, \ldots, X_n$ where

$$X_i = \max\{Y_i, L_i\}, \quad i = 1, \ldots, n.$$


If $X_i = Y_i$ then put $\delta_i = 0$, else $\delta_i = 1$, so that the data can be represented by

$$(X_1, \delta_1), \ldots, (X_n, \delta_n).$$
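The observation scheme can be mimicked in a few lines of Python. This is a sketch for illustration only: the helper names are hypothetical, and the exponential distributions used for $F$ and $H$ are placeholder choices, since the paper leaves both unspecified.

```python
import random


def left_censor(y, l):
    """Return (x, delta): x = max(y, l); delta = 0 if x equals y, otherwise 1."""
    x = max(y, l)
    return x, (0 if x == y else 1)


def simulate_left_censored(n, rng):
    """Simulate n pairs (X_i, delta_i) under placeholder distributions."""
    data = []
    for _ in range(n):
        y = rng.expovariate(1.0)  # Y_i from F (assumed exponential with rate 1)
        l = rng.expovariate(2.0)  # L_i from H (assumed exponential with rate 2)
        data.append(left_censor(y, l))
    return data
```

For instance, `left_censor(1.0, 2.0)` gives `(2.0, 1)`: only the censoring time 2.0 is seen, and all that is known is that the true value lies below it.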

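In the following let $\tilde{F}$ represent the posterior process of $F$ given the data. According to Hjort (1990, Corollary 4.1), and reiterated by Lo (1993), if $F_{\alpha,\beta} \sim \mathrm{RBN}(\alpha,\beta)$ then

$$\tilde{F}_{\alpha,\beta} \sim \mathrm{RBN}\Bigl(\alpha + \sum_{Y_u} \delta_{Y_u},\ \beta + \sum_{Y_r} \delta_{Y_r}\Bigr), \qquad (9)$$

where $\{Y_u\}$ ($\{Y_r\}$) is the set of uncensored (right censored) observations and $\delta_y$ here denotes the unit point mass at $y$, not the censoring indicator $\delta_i$. The corresponding result for the left beta-neutral process is given by the following theorem.

Theorem. If $F_{\alpha,\beta} \sim \mathrm{LBN}(\alpha,\beta)$ then

$$\tilde{F}_{\alpha,\beta} \sim \mathrm{LBN}\Bigl(\alpha + \sum_{Y_u} \delta_{Y_u},\ \beta + \sum_{Y_l} \delta_{Y_l}\Bigr),$$

where $\{Y_u\}$ ($\{Y_l\}$) is the set of uncensored (left censored) observations.

Proof. The most convenient proof is to consider $F'_{\alpha,\beta}(-\infty, -t] = 1 - F_{\alpha,\beta}(0,t]$, $t > 0$, where $F_{\alpha,\beta}$ here is $\mathrm{LBN}(\alpha,\beta)$. Then $F'_{\alpha,\beta}$ is $\mathrm{RBN}(\alpha,\beta)$ on $(-\infty, 0)$ and left censored observations in $(0,\infty)$ correspond to right censored observations in $(-\infty, 0)$.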

Corollary. The Bayes estimate for $F$ with respect to a quadratic loss function is given by

$$E(F(t) \mid (X_1,\delta_1), \ldots, (X_n,\delta_n)) = \prod_{(t,\infty)} \left\{ 1 - \frac{d\alpha(s) + du(s)}{\alpha(0,s] + u(s) + \beta(0,s] + l(s)} \right\}, \qquad (10)$$

where $u(s)$ ($l(s)$) is the uncensored (censored) process, that is,

$$u(t) = \sum_{i=1}^n I(X_i \le t,\ \delta_i = 0), \qquad l(t) = \sum_{i=1}^n I(X_i \le t,\ \delta_i = 1),$$

and $\delta_i = 0$ indicates that no censoring occurred for $X_i$ but $\delta_i = 1$ indicates that $X_i$ is left censored.


In particular, if $\alpha(\cdot)$ and $\beta(\cdot)$ are taken to be null measures, corresponding to vague prior information, then (10) becomes

$$\hat{F}_n(t) = \prod_{(t,\infty)} \left\{ 1 - \frac{du(s)}{u(s) + l(s)} \right\}, \qquad (11)$$

which can be identified as the nonparametric maximum likelihood estimate. Now let $t_1 < t_2 < \ldots < t_k$ ($k \le n$) represent the uncensored observations. If $t_{k+1}$ is taken to be $\infty$ then (11) can be written as

$$\hat{F}_n(t) = \prod_{j:\, t_j > t} \left\{ 1 - \frac{u(t_j) - u(t_{j-1})}{u(t_j) + l(t_j)} \right\}, \qquad (12)$$

with $u(t_0) = 0$, so that $\hat{F}_n(t) = 1$ for $t > t_k$. If $u(t_1) = 0$ then define $\hat{F}_n(t) = 0$ for $t < t_1$. Here (12) is the nonparametric estimator given by Ware and DeMets (1976). Ware and DeMets obtained their estimator by considering a reversal of time and using the Kaplan and Meier results for right censored data. We have derived the estimator by considering a generalisation of the Dirichlet process, and in particular (10) generalises the Ware and DeMets estimator for left censored data in the same way that Hjort and Lo generalised the Kaplan-Meier estimator for right censored data. Finally in this section we mention that we do not believe it is possible to introduce a neutral process which is conjugate to both left and right censored observations.
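The vague-prior estimator (11) only changes at uncensored observations, so it can be computed with two counts per distinct uncensored time. The Python sketch below is a hedged illustration, not code from the paper: the function name is hypothetical, censored observations tied with an uncensored time are counted in the denominator (the $\le$ convention for $l$), and no special value is imposed below $t_1$.

```python
def left_product_limit(x, delta):
    """Product limit estimate of F from left censored data.

    x: observed values X_i; delta: 0 for uncensored, 1 for left censored.
    Returns the distinct uncensored times and a function t -> F_hat(t).
    """
    times = sorted({xi for xi, di in zip(x, delta) if di == 0})
    factors = []
    for t in times:
        jump = sum(1 for xi, di in zip(x, delta) if xi == t and di == 0)  # du at t
        at_or_below = sum(1 for xi in x if xi <= t)                       # u(t) + l(t)
        factors.append(1.0 - jump / at_or_below)

    def f_hat(t):
        value = 1.0
        for tj, fj in zip(times, factors):
            if tj > t:  # only uncensored times strictly above t contribute
                value *= fj
        return value

    return times, f_hat
```

With `x = [1, 2, 2, 3, 4]` and `delta = [0, 1, 0, 0, 1]` the factors at times 1, 2 and 3 are 0, 2/3 and 3/4, so the estimate is 0 below 1, 1/2 on [1, 2), 3/4 on [2, 3) and 1 from 3 onwards.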

5. Example

Wagner and Altmann (1973) report an estimate for the distribution of the time at which baboons descend from their sleeping site in the Amboseli Reserve, Kenya. The estimation is complicated by some of the observations being left censored due to the observers arriving after the descent. The data are given in Ware and DeMets (1976, Table 1). Figure 1 gives our estimate for the distribution of descent times using the estimator (11).

A nonparametric estimate of the mean is given by

$$\hat{\mu} = \sum_{i=1} (t_i - t_{i-1}) \times (1 - \hat{F}_n(t_i)),$$

Fig. 1 - Estimated descent time distribution of baboons in Amboseli Reserve, Kenya. (Horizontal axis: time of day.)

which, for the above example, is evaluated as 8.28. If it is required to obtain a full Bayesian analysis then it is useful to be able to sample from the approximate posterior distribution of some functional of $F$, say $[\theta(F) \mid \text{data}]$. Here we propose a Bayesian bootstrap for left censored data which can be seen as corresponding to the Bayesian bootstrap for right censored data introduced by Lo (1993). Define, for $j = 1, \ldots, k$,

$$U_j = \sum_{i=1}^n I(X_i = t_j,\ \delta_i = 0) \quad \text{and} \quad T_j = \sum_{i=1}^n I(X_i \le t_j).$$

Then define the independent beta random variables, for $j = 1, \ldots, k$,

$$W_j \sim \mathrm{beta}(U_j,\ T_j - U_j).$$

The left censored data Bayesian bootstrap is given by the following algorithm:

1. simulate $W_1, \ldots, W_k$,

2. define the random distribution $F^*(t) = \prod_{j:\, t_j > t} (1 - W_j)$,

3. evaluate $\theta^* = \theta(F^*)$ and

4. repeat 1, 2 and 3 $B$ times to get $\theta^*_1, \ldots, \theta^*_B$ and use the empirical distribution function to approximate the posterior distribution $[\theta(F) \mid \text{data}]$.
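The four steps above can be sketched in Python using only the standard library. This is a minimal illustration under stated assumptions rather than the authors' code: all function names are hypothetical, a beta variable with $T_j - U_j = 0$ is read as the point mass at 1, and the functional is taken to be the mean of the discrete distribution whose jumps are those of $F^*$ at the $t_j$ (any mass $F^*$ leaves below $t_1$ is not assigned to a location).

```python
import random


def bootstrap_counts(x, delta):
    """Return the distinct uncensored times t_j with the counts U_j and T_j."""
    times = sorted({xi for xi, di in zip(x, delta) if di == 0})
    u = [sum(1 for xi, di in zip(x, delta) if xi == t and di == 0) for t in times]
    big_t = [sum(1 for xi in x if xi <= t) for t in times]
    return times, u, big_t


def draw_f_star(u, big_t, rng):
    """Steps 1 and 2: return (F* below t_1, [F*(t_1), ..., F*(t_k)])."""
    w = [
        rng.betavariate(uj, tj - uj) if tj > uj else 1.0  # beta(U_j, 0) read as W_j = 1
        for uj, tj in zip(u, big_t)
    ]
    f_star, running = [0.0] * len(w), 1.0
    for j in range(len(w) - 1, -1, -1):
        f_star[j] = running       # product of (1 - W_i) over i > j
        running *= 1.0 - w[j]
    return running, f_star


def step_mean(times, f_below, f_star):
    """Step 3 for the mean: sum of t_j times the jump of F* at t_j (assumed functional)."""
    previous, total = f_below, 0.0
    for t, f in zip(times, f_star):
        total += t * (f - previous)
        previous = f
    return total


def left_bayesian_bootstrap(x, delta, n_draws, seed=0):
    """Step 4: repeat the draw n_draws times and collect the functional values."""
    rng = random.Random(seed)
    times, u, big_t = bootstrap_counts(x, delta)
    draws = []
    for _ in range(n_draws):
        f_below, f_star = draw_f_star(u, big_t, rng)
        draws.append(step_mean(times, f_below, f_star))
    return draws
```

A call such as `left_bayesian_bootstrap(x, delta, 1000)` returns a list whose empirical distribution plays the role of the approximate posterior of the mean; other functionals can be swapped in by replacing `step_mean`.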

This procedure was used to obtain a sample of size B = 1,000 from the approximate posterior distribution for the mean of F. A histogram representation of the sample is given in Figure 2.

Fig. 2 - Estimated posterior distribution of the mean of descent time distribution of baboons in Amboseli Reserve, Kenya. (Horizontal axis: time of day, 8.0 to 8.5.)


The mean of this bootstrap sample is 8.28 which is identical to the value obtained analytically.

Acknowledgements

The work of the first author is financed by an EPSRC ROPA. The work was completed during a visit by the second author to Imperial College, London. The authors are grateful for the detailed comments and suggestions of two referees.

REFERENCES

ANDERSEN, P.K., BORGAN, O., GILL, R.D. and KEIDING, N. (1993), Statistical Models Based on Counting Processes. Springer-Verlag.

DOKSUM, K.A. (1974), Tailfree and neutral random probabilities and their posterior distributions. Ann. Probab. 2, 183-201.

FERGUSON, T.S. (1973), A Bayesian analysis of some nonparametric problems. Ann. Statist. 1, 209-230.

GILL, R.D. and JOHANSEN, S. (1990), A survey of product integration with a view toward application in survival analysis. Ann. Statist. 18, 1501-1555.

HJORT, N.L. (1990), Nonparametric Bayes estimators based on beta processes in models for life history data. Ann. Statist. 18, 1259-1294.

KAPLAN, E.L. and MEIER, P. (1958), Nonparametric estimation from incomplete observations. J. Amer. Statist. Assoc. 53, 457-481.

LO, A.Y. (1993), A Bayesian bootstrap for censored data. Ann. Statist. 21, 100-123.

WAGNER, S.S. and ALTMANN, S.A. (1973), What time do the baboons come down from the trees? (An estimation problem). Biometrics 29, 623-635.

WALKER, S.G. and MULIERE, P. (1997), Beta-Stacy processes and a generalisation of the Polya-urn scheme. Ann. Statist. 25, 1762-1780.

WARE, J.H. and DEMETS, D.L. (1976), Reanalysis of some baboon descent data. Biometrics 32, 459-463.
