
Posterior concentration rates for counting processes with Aalen multiplicative intensities

Sophie Donnet, Vincent Rivoirard, Judith Rousseau and Catia Scricciolo

Les Cahiers de la Chaire / N°A-32


Sophie Donnet*, Vincent Rivoirard†, Judith Rousseau‡ and Catia Scricciolo§

Abstract. We provide general conditions to derive posterior concentration rates for Aalen counting processes. The conditions are designed to resemble those proposed in the literature for the problem of density estimation, for instance in Ghosal et al. (2000), so that existing results on density estimation can be adapted to the present setting. We apply the general theorem to some prior models, including Dirichlet process mixtures of uniform densities to estimate monotone non-increasing intensities, and log-splines.

Keywords: Aalen model, counting processes, Dirichlet process mixtures, posterior concentration rates

1 Introduction

Estimation of the intensity function of a point process is an important statistical problem with a long history. Most methods were initially employed for estimating intensities assumed to be of parametric or nonparametric form in Poisson point processes. However, in many fields such as genetics, seismology and neuroscience, the probability of observing a new occurrence of the studied temporal process may depend on covariates and, in this case, the intensity of the process is random, so that such a feature is not captured by a classical Poisson model. Aalen models constitute a natural extension of Poisson models that allows taking this aspect into account. Aalen (1978) revolutionized point process analysis by developing a unified theory for frequentist nonparametric inference in multiplicative intensity models which, besides the Poisson model and other classical models such as right-censoring and Markov processes with finite state space, described in Section 1.1, encompass birth and death processes as well as branching processes. We refer the reader to Andersen et al. (1993) for a presentation of Aalen processes including various other illustrative examples. Classical probabilistic and statistical results about Aalen processes can be found in Karr (1991), Andersen et al. (1993) and Daley and Vere-Jones (2003, 2008). Recent nonparametric frequentist methodologies based on penalized least-squares contrasts have been proposed by Brunel and Comte (2005, 2008), Comte et al. (2011) and Reynaud-Bouret (2006). In the high-dimensional setting, more specific results have been established by Gaïffas and Guilloux (2012) and Hansen et al. (2012), who consider Lasso-type procedures.

*MIA, INRA, UMR0518, AgroParisTech, [email protected]
†CEREMADE, Université Paris Dauphine, [email protected]
‡ENSAE-CREST, [email protected]
§Department of Decision Sciences, Bocconi University, [email protected]

Bayesian nonparametric inference for inhomogeneous Poisson point processes has been considered by Lo (1982), who develops a prior-to-posterior analysis for weighted gamma process priors to model intensity functions. In the same spirit, Kuo and Ghosh (1997) employ several classes of nonparametric priors, including the gamma, the beta and the extended gamma processes. Extension to multiplicative counting processes has been treated in Lo and Weng (1989), who model intensities as kernel mixtures with mixing measure distributed according to a weighted gamma measure on the real line. Along the same lines, Ishwaran and James (2004) develop computational procedures for Bayesian non- and semi-parametric multiplicative intensity models using kernel mixtures of weighted gamma measures. Other papers have mainly focussed on exploring prior distributions on intensity functions with the aim of showing that Bayesian nonparametric inference for inhomogeneous Poisson processes can give satisfactory results in applications; see, e.g., Kottas and Sansó (2007).

Surprisingly, leaving aside the recent work of Belitser et al. (2013), which deals with optimal convergence rates for estimating intensities in inhomogeneous Poisson processes, there are no results in the literature concerning aspects of the frequentist asymptotic behaviour of posterior distributions, such as consistency and rates of convergence, for intensity estimation in general Aalen models. In this paper, we extend their results to general Aalen multiplicative intensity models. Quoting Lo and Weng (1989), "the idea of our approach is that estimating a density and estimating a hazard rate are analogous affairs, and a successful attempt of one generally leads to a feasible approach for the other". Thus, in deriving general sufficient conditions for assessing posterior contraction rates in Theorem 2.1 of Section 2, we attempt to give conditions that resemble those proposed by Ghosal et al. (2000) for density estimation with independent and identically distributed (i.i.d.) observations. This allows us to then derive, in Section 3, posterior contraction rates for different families of prior distributions, such as Dirichlet mixtures of uniform densities to estimate monotone non-increasing intensities and log-splines, by an adaptation of existing results on density estimation. Detailed proofs of the main results are reported in Section 4. Auxiliary results concerning the control of the Kullback-Leibler divergence for intensities in Aalen models and the existence of tests, which, to the best of our knowledge, are derived here for the first time and can also be of independent interest, are presented in Section 5 and in Section 6.

1.1 Notation and set-up

We observe a counting process $N$ and denote by $(\mathcal{G}_t)_t$ its adapted filtration. Let $\Lambda$ be the compensator of $N$. We assume it satisfies $\Lambda_t < \infty$ almost surely for every $t$. Recall that $(N_t - \Lambda_t)_t$ is a zero-mean $(\mathcal{G}_t)_t$-martingale. We assume that $N$ obeys the Aalen multiplicative intensity model

$$\mathrm{d}\Lambda_t = Y_t\lambda(t)\,\mathrm{d}t,$$

where $\lambda$ is a non-negative deterministic function, called intensity function in the sequel, and $(Y_t)_t$ is a non-negative predictable process. Informally,

$$\mathbb{P}[N[t, t+\mathrm{d}t] \geq 1 \mid \mathcal{G}_{t^-}] = Y_t\lambda(t)\,\mathrm{d}t, \qquad (1.1)$$


see Andersen et al. (1993), Chapter III. In this paper, we are interested in asymptotic results: both $N$ and $Y$ depend on an integer $n$ and we study estimation of $\lambda$ (not depending on $n$) when $T$ is kept fixed and $n \to \infty$. The following special cases motivate the interest in this model.

Inhomogeneous Poisson processes

We observe $n$ independent Poisson processes with common intensity $\lambda$. This model is equivalent to the model where we observe a Poisson process with intensity $n \times \lambda$, so it corresponds to the case $Y_t \equiv n$.
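As an illustration (not from the paper), a path from this model can be simulated by the standard thinning algorithm for an inhomogeneous Poisson process with intensity $n\lambda$; the particular intensity $\lambda(t) = 2e^{-2t}$ below is an arbitrary choice.

```python
import numpy as np

def simulate_poisson(lam, lam_max, T, n, rng):
    """Simulate a Poisson process with intensity n * lam on [0, T]
    by thinning a homogeneous process of rate n * lam_max."""
    rate = n * lam_max
    # Candidate points from a homogeneous Poisson process of rate `rate`.
    num = rng.poisson(rate * T)
    cand = np.sort(rng.uniform(0.0, T, size=num))
    # Keep each candidate t independently with probability lam(t) / lam_max.
    keep = rng.uniform(size=num) < lam(cand) / lam_max
    return cand[keep]

rng = np.random.default_rng(0)
lam = lambda t: 2.0 * np.exp(-2.0 * t)   # arbitrary intensity, sup = 2 on [0, 1]
pts = simulate_poisson(lam, lam_max=2.0, T=1.0, n=50, rng=rng)
print(len(pts))
```

The expected number of points is $n\int_0^T \lambda(t)\,\mathrm{d}t \approx 43$ here.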

Survival analysis with right-censoring

This model is popular in biomedical problems. We have $n$ patients and, for each patient $i$, we observe $(Z_i, \delta_i)$, with $Z_i = \min\{X_i, C_i\}$, where $X_i$ represents the lifetime of the patient, $C_i$ is the independent censoring time and $\delta_i = 1_{X_i \leq C_i}$. In this case, we set $N^i_t = \delta_i \times 1_{Z_i \leq t}$, $Y^i_t = 1_{Z_i \geq t}$ and $\lambda$ is the hazard rate of the $X_i$'s: if $f$ is the density of $X_1$, then $\lambda(t) = f(t)/\mathbb{P}(X_1 \geq t)$. Thus, $N$ (respectively $Y$) is obtained by aggregating the $n$ independent processes $N^i$ (respectively the $Y^i$'s): for any $t \in [0, T]$, $N_t = \sum_{i=1}^n N^i_t$ and $Y_t = \sum_{i=1}^n Y^i_t$.
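For instance, the aggregated processes $N_t = \sum_i N^i_t$ and $Y_t = \sum_i Y^i_t$ can be computed as follows (a sketch with simulated exponential lifetimes and censoring times, which are illustrative assumptions not specified in the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
n, T = 200, 3.0
X = rng.exponential(1.0, n)      # lifetimes (constant hazard rate 1)
C = rng.exponential(2.0, n)      # independent censoring times
Z = np.minimum(X, C)             # observed times Z_i = min(X_i, C_i)
delta = (X <= C).astype(int)     # censoring indicators

def N(t):
    """Aggregated counting process: uncensored events observed up to time t."""
    return int(np.sum(delta * (Z <= t)))

def Y(t):
    """Aggregated at-risk process: individuals still under observation at time t."""
    return int(np.sum(Z >= t))

print(N(T), Y(0.0), Y(1.0))
```

By construction $Y$ is non-increasing with $Y(0) = n$, and $N$ is a non-decreasing step process counting at most $n$ jumps.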

Finite state Markov processes

Let $X = (X(t))_t$ be a Markov process with finite state space $S$ and right-continuous sample paths. We assume the existence of integrable transition intensities $\lambda_{hj}$ from state $h$ to state $j$ for $h \neq j$. We assume we are given $n$ independent copies of the process $X$, denoted by $X^1, \dots, X^n$. For any $i \in \{1, \dots, n\}$, let $N^{i,hj}_t$ be the number of direct transitions for $X^i$ from $h$ to $j$ in $[0, t]$, for $h \neq j$. Then, the intensity of the multivariate counting process $N^i = (N^{i,hj})_{h \neq j}$ is $(\lambda_{hj}Y^{i,h})_{h \neq j}$, with $Y^{i,h}_t = 1_{\{X^i(t^-) = h\}}$. As before, we can consider $N^{hj}$ (respectively $Y^h$) by aggregating the processes $N^{i,hj}$ (respectively the $Y^{i,h}$'s): $N^{hj}_t = \sum_{i=1}^n N^{i,hj}_t$, $Y^h_t = \sum_{i=1}^n Y^{i,h}_t$, for $t \in [0, T]$. The intensity of each component $(N^{hj}_t)_t$ of $(N_t)_t$ is then $(\lambda_{hj}(t)Y^h_t)_t$. We refer the reader to Andersen et al. (1993), p. 126, for more details. In this case, $N$ is either one of the $N^{hj}$'s or the aggregation of some processes for which the $\lambda_{hj}$'s are equal.

We now state some conditions concerning the asymptotic behavior of $Y_t$ under the true intensity function $\lambda_0$. Define $\mu_n(t) := \mathbb{E}^{(n)}_{\lambda_0}[Y_t]$ and $\tilde\mu_n(t) := n^{-1}\mu_n(t)$. We assume the existence of a non-random set $\Omega \subseteq [0, T]$ such that there are constants $m_1$ and $m_2$ satisfying

$$m_1 \leq \inf_{t \in \Omega}\tilde\mu_n(t) \leq \sup_{t \in \Omega}\tilde\mu_n(t) \leq m_2 \quad \text{for every } n \text{ large enough}, \qquad (1.2)$$

and there exists $\alpha \in (0, 1)$ such that, if $\Gamma_n := \{\sup_{t \in \Omega}|n^{-1}Y_t - \tilde\mu_n(t)| \leq \alpha m_1\} \cap \{\sup_{t \in \Omega^c} Y_t = 0\}$, where $\Omega^c$ is the complement of $\Omega$ in $[0, T]$, then

$$\lim_{n \to \infty}\mathbb{P}^{(n)}_{\lambda_0}(\Gamma_n) = 1. \qquad (1.3)$$

We only consider estimation over $\Omega$ ($N$ is almost surely empty on $\Omega^c$) and define the parameter space as $\mathcal{F} = \{\lambda : \Omega \to \mathbb{R}^+ \mid \int_\Omega \lambda(t)\,\mathrm{d}t < \infty\}$. Let $\lambda_0 \in \mathcal{F}$.


For inhomogeneous Poisson processes, conditions (1.2) and (1.3) are trivially satisfied for $\Omega = [0, T]$ since $Y_t \equiv \mu_n(t) \equiv n$. For right-censoring models, with $Y^i_t = 1_{Z_i \geq t}$, $i = 1, \dots, n$, we denote by $\Omega$ the support of the $Z_i$'s and set $M_\Omega = \max\Omega \in \mathbb{R}^+$. Then, (1.2) and (1.3) are satisfied if $M_\Omega > T$, or if $M_\Omega \leq T$ and $\mathbb{P}(Z_1 = M_\Omega) > 0$ (the concentration inequality is implied by an application of the DKW inequality).

We denote by $\|\cdot\|_1$ the $L_1$-norm over $\mathcal{F}$: for all $\lambda, \lambda' \in \mathcal{F}$, $\|\lambda - \lambda'\|_1 = \int_\Omega |\lambda(t) - \lambda'(t)|\,\mathrm{d}t$.

2 Posterior contraction rates for Aalen counting processes

In this section, we present the main result providing general sufficient conditions for assessing concentration rates of posterior distributions of intensities in general Aalen models. Before stating the theorem, we need to introduce some more notation.

For any � 2 F , we introduce the following parametrization � = M� ⇥ �, whereM� =

R⌦ �(t)dt and � 2 F1, with F1 = {� 2 F :

R⌦ �(t)dt = 1}. For the sake

of simplicity, in this paper we restrict attention to the case where M� and � are a

priori independent so that the prior probability measure ⇡ on F is the product measure⇡1 ⌦ ⇡M , where ⇡1 is a probability measure on F1 and ⇡M is a probability measure onR+. Let vn be a positive sequence such that vn ! 0 and nv2n ! 1. For every j 2 N,we define

Sn,j =�� 2 F1 : k�� �0k1 2(j + 1)vn/M�0

,

where M�0 =R⌦ �0(t)dt and �0 = M�1

�0�0. For H > 0 and k � 2, if k[2] = min{2` : ` 2

N, 2` � k}, we define

Bk,n(�0; vn, H) =

⇢� 2 F1 : h2(�0, �) v2n/(1 + log k�0/�k1),

max2jk[2]

Ej(�0; �) v2n, k�0/�k1 nH ,�����1

H

�,

where h2(�0, �) =R⌦(p�0(t)�

p�(t))2dt is the squared Hellinger distance between �0

and �, k · k1

stands for the sup-norm and Ej(�0; �) :=R⌦ �0(t)| log �0(t)� log �(t)|jdt.

In what follows, for any set $\Theta$ equipped with a semi-metric $d$ and any real number $\epsilon > 0$, we denote by $D(\epsilon, \Theta, d)$ the $\epsilon$-packing number of $\Theta$, that is, the maximal number of points in $\Theta$ such that the $d$-distance between every pair is at least $\epsilon$. Since $D(\epsilon, \Theta, d)$ is bounded above by the $(\epsilon/2)$-covering number, namely, the minimal number of balls of $d$-radius $\epsilon/2$ needed to cover $\Theta$, with abuse of language, we will just speak of covering numbers. We denote by $\pi(\cdot \mid N)$ the posterior distribution of the intensity function $\lambda$, given the observations of the process $N$.
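As a toy illustration of these definitions (not from the paper), a greedy construction on $\Theta = [0,1]$ with $d(x,y) = |x-y|$ produces a maximal $\epsilon$-separated set whose size obeys the packing-number bound $D(\epsilon, [0,1], |\cdot|) \leq \lfloor 1/\epsilon\rfloor + 1$:

```python
import numpy as np

def greedy_packing(points, eps):
    """Greedily select an eps-separated subset of `points` (scanned in order)."""
    packing = []
    for x in points:
        if all(abs(x - y) >= eps for y in packing):
            packing.append(x)
    return packing

eps = 0.1
grid = np.linspace(0.0, 1.0, 10001)
pack = greedy_packing(grid, eps)
print(len(pack))
```

Because each selected point blocks an interval of length at least $\epsilon$, the packing has at most $\lfloor 1/\epsilon\rfloor + 1 = 11$ points here.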

Theorem 2.1. Assume that conditions (1.2) and (1.3) are satisfied and that, for some $k \geq 2$, there exists a constant $C_{1k} > 0$ such that

$$\mathbb{E}^{(n)}_{\lambda_0}\Bigg[\bigg(\int_\Omega [Y_t - \mu_n(t)]^2\,\mathrm{d}t\bigg)^k\Bigg] \leq C_{1k}\,n^k. \qquad (2.1)$$


Assume that the prior $\pi_M$ on the mass $M_\lambda$ is absolutely continuous with respect to Lebesgue measure and has positive and continuous density on $\mathbb{R}^+$, while the prior $\pi_1$ on $\tilde\lambda$ satisfies the following conditions for some constant $H > 0$:

(i) there exists $\mathcal{F}_n \subseteq \mathcal{F}_1$ such that, for a positive sequence $v_n = o(1)$ with $v_n^2 \geq (n/\log n)^{-1}$,

$$\pi_1(\mathcal{F}_n^c) \leq e^{-(\kappa_0+2)nv_n^2}\,\pi_1(B_{k,n}(\tilde\lambda_0; v_n, H)),$$

with

$$\kappa_0 = m_2^2 M_{\lambda_0}\Bigg\{\frac{4}{m_1}\Big[1 + \log\Big(\frac{m_2}{m_1}\Big)\Big]\Big(1 + \frac{m_2^2}{m_1^2}\Big) + \frac{m_2(2M_{\lambda_0}+1)^2}{m_1^2 M_{\lambda_0}^2}\Bigg\}, \qquad (2.2)$$

and, for any $\xi, \delta > 0$,

$$\log D(\xi, \mathcal{F}_n, \|\cdot\|_1) \leq n\delta \quad \text{for all } n \text{ large enough};$$

(ii) for all $\zeta, \delta > 0$, there exists $J_0 > 0$ such that, for every $j \geq J_0$,

$$\frac{\pi_1(S_{n,j})}{\pi_1(B_{k,n}(\tilde\lambda_0; v_n, H))} \leq e^{\delta(j+1)^2 nv_n^2}$$

and
$$\log D(\zeta j v_n, S_{n,j} \cap \mathcal{F}_n, \|\cdot\|_1) \leq \delta(j+1)^2 nv_n^2.$$

Then, there exists a constant $J_1 > 0$ such that

$$\mathbb{E}^{(n)}_{\lambda_0}\big[\pi(\lambda : \|\lambda - \lambda_0\|_1 > J_1 v_n \mid N)\big] = O\big((nv_n^2)^{-k/2}\big).$$

The proof of Theorem 2.1 is reported in Section 4. To the best of our knowledge, the only other paper dealing with posterior concentration rates in related models is that of Belitser et al. (2013), where inhomogeneous Poisson processes are considered. Theorem 2.1 differs in two aspects from their Theorem 1. Firstly, we do not confine ourselves to inhomogeneous Poisson processes. Secondly and more importantly, our conditions are different: we do not assume that $\lambda_0$ is bounded below away from zero and we do not need to bound from below the prior mass in sup-norm neighborhoods of $\lambda_0$, but rather the prior mass in Hellinger-type neighborhoods of $\lambda_0$, as in Theorem 2.2 of Ghosal et al. (2000). In Theorem 2.1, our aim is to propose conditions to assess posterior concentration rates for intensity functions resembling those used in the density model, obtained by parameterizing $\lambda$ as $\lambda = M_\lambda \times \tilde\lambda$, with $\tilde\lambda$ a probability density on $\Omega$.

Remark 2.1. If $\tilde\lambda \in B_{2,n}(\tilde\lambda_0; v_n, H)$ then, for every integer $j > 2$, $E_j(\tilde\lambda_0; \tilde\lambda) \leq H^{j-2}v_n^2(\log n)^{j-2}$, so that, using Proposition 4.1, if we replace $B_{k,n}(\tilde\lambda_0; v_n, H)$ with $B_{2,n}(\tilde\lambda_0; v_n, H)$ in the assumptions of Theorem 2.1, we obtain the same type of conclusion: for any $k \geq 2$ such that condition (2.1) is satisfied, we have

$$\mathbb{E}^{(n)}_{\lambda_0}\big[\pi(\lambda : \|\lambda - \lambda_0\|_1 > J_1 v_n \mid N)\big] = O\big((nv_n^2)^{-k/2}(\log n)^{k(k[2]-2)/2}\big),$$

with an extra $(\log n)$-term on the right-hand side of the above equality.


Remark 2.2. Condition (2.1) is satisfied for the above considered examples: it is verified for inhomogeneous Poisson processes since $Y_t = n$ for every $t$. For the censoring model, $Y_t = \sum_{i=1}^n 1_{Z_i \geq t}$. For every $i = 1, \dots, n$ and fixed $t$, set $V_i = 1_{Z_i \geq t} - \mathbb{P}(Z_1 \geq t)$. Then, for $k \geq 2$,

$$\mathbb{E}^{(n)}_{\lambda_0}\Bigg[\bigg(\int_\Omega [Y_t - \mu_n(t)]^2\,\mathrm{d}t\bigg)^k\Bigg] = \mathbb{E}^{(n)}_{\lambda_0}\Bigg[\bigg(\int_0^T \Big(\sum_{i=1}^n V_i\Big)^2\,\mathrm{d}t\bigg)^k\Bigg] \lesssim \int_0^T \mathbb{E}^{(n)}_{\lambda_0}\Bigg[\Big(\sum_{i=1}^n V_i\Big)^{2k}\Bigg]\,\mathrm{d}t \lesssim \int_0^T \Bigg(\sum_{i=1}^n \mathbb{E}^{(n)}_{\lambda_0}[V_i^{2k}] + \Big(\sum_{i=1}^n \mathbb{E}^{(n)}_{\lambda_0}[V_i^2]\Big)^k\Bigg)\,\mathrm{d}t \lesssim n^k$$

by Hölder and Rosenthal inequalities (see, for instance, Theorem C.2 of Härdle et al. (1998)). Under mild conditions, similar computations can be performed for finite state Markov processes.

Conditions of Theorem 2.1 are very similar to those considered for density estimation in the case of i.i.d. observations. In particular,

$$\tilde B_n = \Big\{\tilde\lambda : h^2(\tilde\lambda_0, \tilde\lambda)\Big\|\frac{\tilde\lambda_0}{\tilde\lambda}\Big\|_\infty \leq v_n^2,\ \Big\|\frac{\tilde\lambda_0}{\tilde\lambda}\Big\|_\infty \leq n^H,\ \|\tilde\lambda_0\|_\infty \leq H\Big\}$$

is included in $B_{k,n}\big(\tilde\lambda_0; v_n(\log n)^{1/2}, H\big)$ as a consequence of Theorem 5.1 of Wong and Shen (1995). Apart from the mild constraints $\|\tilde\lambda_0/\tilde\lambda\|_\infty \leq n^H$ and $\|\tilde\lambda_0\|_\infty \leq H$, the set $\tilde B_n$ is the same as the one considered in Theorem 2.2 of Ghosal et al. (2000). The other conditions are essentially those of Theorem 2.1 in Ghosal et al. (2000).

3 Illustrations with different families of priors

As discussed in Section 2, the conditions of Theorem 2.1 to derive posterior contraction rates are very similar to those considered in the literature for density estimation, so that existing results involving different families of prior distributions can be adapted to Aalen multiplicative intensity models. Some applications are presented below.

3.1 Monotone non-increasing intensity functions

In this section, we deal with estimation of monotone non-increasing intensity functions, which is equivalent to considering monotone non-increasing density functions $\tilde\lambda$ in the above described parametrization. To construct a prior on the set of monotone non-increasing densities over $[0, T]$, we use their representation as mixtures of uniform densities, as in Williamson (1956), and consider a Dirichlet process as a prior on the mixing distribution:

$$\tilde\lambda(\cdot) = \int_0^\infty \frac{1_{(0,\theta)}(\cdot)}{\theta}\,\mathrm{d}P(\theta), \qquad P \mid A, G \sim \mathrm{DP}(AG), \qquad (3.1)$$

where $G$ is a distribution on $[0, T]$ having density $g$ with respect to Lebesgue measure. This prior has been studied by Salomond (2013) for estimating monotone non-increasing densities. Here, we extend his results to the case of monotone non-increasing intensity functions of Aalen processes. We consider the same assumption on $G$ as in Salomond (2013): there exist $a_1, a_2 > 0$ such that

$$\theta^{a_1} \lesssim g(\theta) \lesssim \theta^{a_2} \quad \text{for all } \theta \text{ in a neighbourhood of } 0. \qquad (3.2)$$
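A draw from the prior (3.1) can be sketched with a truncated stick-breaking representation of the Dirichlet process; the truncation level and the choice $G = \mathrm{Uniform}(0, T)$ (which satisfies (3.2) with $a_1 = a_2 = 0$... here simply an illustrative assumption) are ours, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(2)
A, T, K = 1.0, 1.0, 500             # DP precision, horizon, truncation level

# Truncated stick-breaking weights for DP(A G).
v = rng.beta(1.0, A, K)
w = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
w /= w.sum()                         # renormalize after truncation
theta = rng.uniform(0.0, T, K)       # atoms drawn from G = Uniform(0, T)

def lam_tilde(t):
    """Mixture of Uniform(0, theta_k) densities: non-increasing on [0, T]."""
    t = np.atleast_1d(t)
    return ((t[:, None] < theta[None, :]) / theta[None, :] * w[None, :]).sum(axis=1)

grid = np.linspace(0.0, T, 5001)
dens = lam_tilde(grid)
mass = float(np.sum(dens[:-1] * np.diff(grid)))   # left Riemann sum of the density
print(mass)
```

Any mixture of uniform densities anchored at $0$ is a non-increasing step function, which is exactly Williamson's (1956) characterization exploited by the prior.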

The following result holds.

Corollary 3.1. Assume that the counting process $N$ verifies conditions (1.2) and (1.3) and that inequality (2.1) is satisfied for some $k \geq 2$. Consider a prior $\pi_1$ on $\tilde\lambda$ satisfying conditions (3.1) and (3.2) and a prior $\pi_M$ on $M_\lambda$ that is absolutely continuous with respect to Lebesgue measure with positive and continuous density on $\mathbb{R}^+$. Suppose that $\lambda_0$ is monotone non-increasing and bounded on $\mathbb{R}^+$. Let $\epsilon_n = (n/\log n)^{-1/3}$. Then, there exists a constant $J_1 > 0$ such that

$$\mathbb{E}^{(n)}_{\lambda_0}\big[\pi(\lambda : \|\lambda - \lambda_0\|_1 > J_1\epsilon_n \mid N)\big] = O\big((n\epsilon_n^2)^{-k/2}(\log n)^{k(k[2]-2)/2}\big).$$

The proof is reported in Section 4.

3.2 Log-spline and log-linear priors on $\tilde\lambda$

For simplicity of presentation, we set $T = 1$. We consider a log-spline prior of order $q$, as in Section 4 of Ghosal et al. (2000). In other words, $\tilde\lambda$ is parameterized as

$$\log\tilde\lambda_\theta(\cdot) = \theta^t B_J(\cdot) - c(\theta), \quad \text{with} \quad \exp(c(\theta)) = \int_0^1 e^{\theta^t B_J(x)}\,\mathrm{d}x,$$

where $B_J = (B_1, \dots, B_J)$ is the $q$-th order B-spline basis defined in de Boor (1978), associated with $K$ fixed knots, so that $J = K + q - 1$; see Ghosal et al. (2000) for more details. Consider a prior on $\theta$ of the form $J = J_n = \lfloor n^{1/(2\alpha+1)}\rfloor$, $\alpha \in [1/2, q]$, and, conditionally on $J$, the prior is absolutely continuous with respect to Lebesgue measure on $[-M, M]^J$ with density bounded from below and above by $c^J$ and $C^J$, respectively. Consider an absolutely continuous prior with positive and continuous density on $\mathbb{R}^+$ on $M_\lambda$. We then have the following posterior concentration result.

Corollary 3.2. For the above prior, if $\|\log\tilde\lambda_0\|_\infty < \infty$ and $\tilde\lambda_0$ is Hölder with regularity $\alpha \in [1/2, q]$, then, under condition (2.1), there exists a constant $J_1 > 0$ so that

$$\mathbb{E}^{(n)}_{\lambda_0}\big[\pi(\lambda : \|\lambda - \lambda_0\|_1 > J_1 n^{-\alpha/(2\alpha+1)} \mid N)\big] = O\big(n^{-k/(4\alpha+2)}(\log n)^{k(k[2]-2)/2}\big).$$


Proof. Set $\epsilon_n = n^{-\alpha/(2\alpha+1)}$. Using Lemma 4.1, there exists $\theta_0 \in \mathbb{R}^J$ such that $h(\tilde\lambda_{\theta_0}, \tilde\lambda_0) \lesssim \|\log\tilde\lambda_{\theta_0} - \log\tilde\lambda_0\|_\infty \lesssim J^{-\alpha}$, which, combined with Lemma 4.4, leads to

$$\pi_1\big(B_{k,n}(\tilde\lambda_0; \epsilon_n, H)\big) \geq e^{-C_1 n\epsilon_n^2}.$$

Lemma 4.5, together with Theorem 4.5 of Ghosal et al. (2000), controls the entropy of $S_{n,j}$ and its prior mass for $j$ larger than some fixed constant $J_0$.

With such families of priors, it is more interesting to work with the non-normalized $\lambda_\theta$. We can write

$$\lambda_{A,\theta}(\cdot) = A\exp\big(\theta^t B_J(\cdot)\big), \quad A > 0,$$

so that a prior on $\lambda$ is defined through a prior on $A$, say $\pi_A$, absolutely continuous with respect to Lebesgue measure with positive and continuous density, together with the same type of prior on $\theta$ as above. The same result then holds. It is not a direct consequence of Theorem 2.1, since $M_{\lambda_{A,\theta}} = A\exp(c(\theta))$ is not a priori independent of $\tilde\lambda_{A,\theta}$. However, introducing $A$ allows to adapt Theorem 2.1 to this case. The practical advantage of the latter representation is that it avoids computing the normalizing constant $c(\theta)$.

In a similar manner, we can replace the spline basis with other orthonormal bases, as considered in Rivoirard and Rousseau (2012), leading to the same posterior concentration rates as in density estimation. More precisely, consider intensities parameterized as

$$\tilde\lambda_\theta(\cdot) = e^{\sum_{j=1}^J \theta_j\varphi_j(\cdot) - c(\theta)}, \qquad e^{c(\theta)} = \int_0^1 e^{\sum_{j=1}^J \theta_j\varphi_j(x)}\,\mathrm{d}x,$$

where $(\varphi_j)_{j=1}^\infty$ is an orthonormal basis of $L_2([0,1])$, with $\varphi_1 = 1$. Write $\eta = (A, \theta)$, with $A > 0$, and

$$\lambda_\eta(\cdot) = Ae^{\sum_{j=1}^J \theta_j\varphi_j(\cdot)} = Ae^{c(\theta)}\tilde\lambda_\theta(\cdot).$$

Let $A \sim \pi_A$ and consider the same family of priors as in Rivoirard and Rousseau (2012):

$$J \sim \pi_J, \qquad j^\beta\theta_j/\tau_0 \stackrel{\mathrm{ind}}{\sim} g,\ j \leq J, \quad \text{and} \quad \theta_j = 0,\ \forall\, j > J,$$

where $g$ is a positive and continuous density on $\mathbb{R}$ and there exist $s \geq 0$ and $p > 0$ such that

$$\log\pi_J(J) \asymp -J(\log J)^s, \qquad \log g(x) \asymp -|x|^p, \qquad s = 0, 1,$$

when $J$ and $|x|$ are large. Rivoirard and Rousseau (2012) prove that this prior leads to minimax adaptive posterior concentration rates over collections of positive and Hölder classes of densities in the density model. Their proof easily extends to prove assumptions (i) and (ii) of Theorem 2.1.
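To make the parametrization concrete, here is a sketch using the cosine basis $\varphi_1 = 1$, $\varphi_{j+1}(x) = \sqrt{2}\cos(j\pi x)$, which is orthonormal in $L_2([0,1])$; the basis choice and the coefficient values are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def phi(j, x):
    """Orthonormal cosine basis on [0, 1]: phi_1 = 1, phi_{j+1}(x) = sqrt(2) cos(j pi x)."""
    return np.ones_like(x) if j == 1 else np.sqrt(2.0) * np.cos((j - 1) * np.pi * x)

def lam_tilde(theta, x):
    """Normalized log-linear density exp(sum_j theta_j phi_j(x) - c(theta))."""
    s = sum(t * phi(j + 1, x) for j, t in enumerate(theta))
    grid = np.linspace(0.0, 1.0, 20001)
    sg = sum(t * phi(j + 1, grid) for j, t in enumerate(theta))
    # c(theta) = log int_0^1 exp(theta' phi(x)) dx, via the trapezoidal rule.
    h = grid[1] - grid[0]
    integral = np.sum((np.exp(sg[:-1]) + np.exp(sg[1:])) / 2.0) * h
    return np.exp(s - np.log(integral))

theta = np.array([0.0, 0.8, -0.3])      # arbitrary coefficients
x = np.linspace(0.0, 1.0, 5001)
dens = lam_tilde(theta, x)
h = x[1] - x[0]
mass = float(np.sum((dens[:-1] + dens[1:]) / 2.0) * h)
print(mass)
```

The exponential form keeps the density strictly positive, and $c(\theta)$ enforces $\int_0^1 \tilde\lambda_\theta = 1$; the non-normalized parametrization $\lambda_\eta = Ae^{\theta^t\varphi}$ simply skips the computation of $c(\theta)$.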

Corollary 3.3. Consider the above described prior on an intensity function $\lambda$ on $[0, 1]$. Assume that $\tilde\lambda_0$ is positive and belongs to a Sobolev class with smoothness $\alpha > 1/2$. Under condition (2.1), if $\beta < 1/2 + \alpha$, there exists a constant $J_1 > 0$ so that

$$\mathbb{E}^{(n)}_{\lambda_0}\big[\pi(\lambda : \|\lambda - \lambda_0\|_1 > J_1(n/\log n)^{-\alpha/(2\alpha+1)}(\log n)^{(1-s)/2} \mid N)\big] = O\big(n^{-k/(4\alpha+2)}(\log n)^{k(k[2]-2)/2}\big).$$


Note that the constraint $\beta < \alpha + 1/2$ is satisfied for all $\alpha > 1/2$ as soon as $\beta \leq 1$ and, as in Rivoirard and Rousseau (2012), the prior leads to adaptive minimax posterior concentration rates over collections of Sobolev balls.

4 Proofs

To prove Theorem 2.1, we use the following intermediate results, whose proofs are postponed to Section 5. The first one controls the Kullback-Leibler divergence and absolute moments of $\ell_n(\lambda_0) - \ell_n(\lambda)$, where $\ell_n(\lambda)$ is the log-likelihood for Aalen processes evaluated at $\lambda$, whose expression is given by

$$\ell_n(\lambda) = \int_0^T \log(\lambda(t))\,\mathrm{d}N_t - \int_0^T \lambda(t)Y_t\,\mathrm{d}t,$$

see Andersen et al. (1993).

Proposition 4.1. Let $v_n$ be a positive sequence such that $v_n \to 0$ and $nv_n^2 \to \infty$. For any $k \geq 2$ and $H > 0$, define the set

$$\bar B_{k,n}(\lambda_0; v_n, H) = \big\{\lambda : \tilde\lambda \in B_{k,n}(\tilde\lambda_0; v_n, H),\ |M_\lambda - M_{\lambda_0}| \leq v_n\big\}.$$

Under assumptions (1.2) and (2.1), for all $\lambda \in \bar B_{k,n}(\lambda_0; v_n, H)$, we have

$$\mathrm{KL}(\lambda_0; \lambda) \leq \kappa_0 nv_n^2 \quad \text{and} \quad V_k(\lambda_0; \lambda) \leq \kappa(nv_n^2)^{k/2},$$

where $\kappa_0$ and $\kappa$ depend only on $k$, $C_{1k}$, $H$, $\lambda_0$, $m_1$ and $m_2$. An expression of $\kappa_0$ is given in (2.2).

The second result establishes the existence of tests that are used to control the numerator of the posterior distribution. We use that, under assumption (1.2), on the set $\Gamma_n$,

$$\forall\, t \in \Omega, \quad (1-\alpha)\tilde\mu_n(t) \leq \frac{Y_t}{n} \leq (1+\alpha)\tilde\mu_n(t). \qquad (4.1)$$

Proposition 4.2. Assume that conditions (i) and (ii) of Theorem 2.1 are satisfied. For any $j \in \mathbb{N}$, define

$$S_{n,j}(v_n) = \big\{\lambda : \tilde\lambda \in \mathcal{F}_n,\ jv_n < \|\lambda - \lambda_0\|_1 \leq (j+1)v_n\big\}.$$

Then, under assumption (1.2), there are constants $J_0, \rho, c > 0$ such that, for every integer $j \geq J_0$, there exists a test $\phi_{n,j}$ so that, for a positive constant $C$,

$$\mathbb{E}^{(n)}_{\lambda_0}[1_{\Gamma_n}\phi_{n,j}] \leq Ce^{-cnj^2v_n^2}, \quad \sup_{\lambda \in S_{n,j}(v_n)}\mathbb{E}_\lambda[1_{\Gamma_n}(1-\phi_{n,j})] \leq Ce^{-cnj^2v_n^2}, \qquad J_0 \leq j \leq \frac{\rho}{v_n},$$

and

$$\mathbb{E}^{(n)}_{\lambda_0}[1_{\Gamma_n}\phi_{n,j}] \leq Ce^{-cnjv_n}, \quad \sup_{\lambda \in S_{n,j}(v_n)}\mathbb{E}_\lambda[1_{\Gamma_n}(1-\phi_{n,j})] \leq Ce^{-cnjv_n}, \qquad j > \frac{\rho}{v_n}.$$


In what follows, the symbols "$\lesssim$" and "$\gtrsim$" are used to denote inequalities valid up to constants that are universal or fixed throughout.

Proof of Theorem 2.1. Given Proposition 4.1 and Proposition 4.2, the proof of Theorem 2.1 is similar to that of Theorem 1 in Ghosal and van der Vaart (2007). Let $U_n = \{\lambda : \|\lambda - \lambda_0\|_1 > J_1v_n\}$. Write

$$\pi(U_n \mid N) = \frac{\int_{U_n} e^{\ell_n(\lambda) - \ell_n(\lambda_0)}\,\mathrm{d}\pi(\lambda)}{\int_{\mathcal{F}} e^{\ell_n(\lambda) - \ell_n(\lambda_0)}\,\mathrm{d}\pi(\lambda)} = \frac{N_n}{D_n}.$$

We have

$$\mathbb{P}^{(n)}_{\lambda_0}\Big(D_n \leq e^{-(\kappa_0+1)nv_n^2}\pi_1(B_{k,n}(\tilde\lambda_0; v_n, H))\Big) \leq \mathbb{P}^{(n)}_{\lambda_0}\Bigg(\int_{\bar B_{k,n}(\lambda_0; v_n, H)} \frac{e^{\ell_n(\lambda) - \ell_n(\lambda_0)}}{\pi(\bar B_{k,n}(\lambda_0; v_n, H))}\,\mathrm{d}\pi(\lambda) \leq \exp\bigg\{-(\kappa_0+1)nv_n^2 + \log\bigg(\frac{\pi_1(B_{k,n}(\tilde\lambda_0; v_n, H))}{\pi(\bar B_{k,n}(\lambda_0; v_n, H))}\bigg)\bigg\}\Bigg).$$

By the assumption on the positivity and continuity of the Lebesgue density of the prior $\pi_M$ and the requirement that $v_n^2 \geq (n/\log n)^{-1}$,

$$\pi(\bar B_{k,n}(\lambda_0; v_n, H)) \gtrsim \pi_1(B_{k,n}(\tilde\lambda_0; v_n, H))\,v_n \gtrsim \pi_1(B_{k,n}(\tilde\lambda_0; v_n, H))\,e^{-nv_n^2/2},$$

so that, using Proposition 4.1 and Markov's inequality,

$$\mathbb{P}^{(n)}_{\lambda_0}\Big(D_n \leq e^{-(\kappa_0+1)nv_n^2}\pi_1(B_{k,n}(\tilde\lambda_0; v_n, H))\Big) \lesssim (nv_n^2)^{-k/2}.$$

Note that inequality (5.6) implies that $\pi(S_{n,j}(v_n)) \leq \pi_1(S_{n,j})$. Using the tests $\phi_{n,j}$ of Proposition 4.2 and mimicking the proof of Theorem 1 of Ghosal and van der Vaart (2007), we have that, for $J_1 \geq J_0$,

$$\mathbb{E}^{(n)}_{\lambda_0}\big[1_{\Gamma_n}\pi(\lambda : \|\lambda - \lambda_0\|_1 > J_1v_n \mid N)\big] \leq \sum_{j \geq J_1}\mathbb{E}^{(n)}_{\lambda_0}[1_{\Gamma_n}\phi_{n,j}] + \sum_{j=\lceil J_1\rceil}^{\lfloor\rho/v_n\rfloor}\frac{e^{(\kappa_0+1)nv_n^2}\pi_1(S_{n,j})e^{-cnj^2v_n^2}}{\pi_1(B_{k,n}(\tilde\lambda_0; v_n, H))} + \sum_{j > \rho/v_n}\frac{e^{(\kappa_0+1)nv_n^2}\pi_1(S_{n,j})e^{-cnjv_n}}{\pi_1(B_{k,n}(\tilde\lambda_0; v_n, H))} + \frac{e^{(\kappa_0+1)nv_n^2}\pi_1(\mathcal{F}_n^c)}{\pi_1(B_{k,n}(\tilde\lambda_0; v_n, H))} + \mathbb{P}^{(n)}_{\lambda_0}\Big(D_n \leq e^{-(\kappa_0+1)nv_n^2}\pi_1(B_{k,n}(\tilde\lambda_0; v_n, H))\Big) \lesssim (nv_n^2)^{-k/2},$$

which proves the result since $\mathbb{P}^{(n)}_{\lambda_0}(\Gamma_n^c) = o(1)$.


Proof of Corollary 3.1. Without loss of generality, we can assume that $\Omega = [0, T]$. At several places, using (1.1) and (4.1), we use that, under $\mathbb{P}^{(n)}_\lambda(\cdot \mid \Gamma_n)$, for any interval $I$, the number of points of $N$ falling in $I$ is controlled by the number of points of a Poisson process with intensity $n(1+\alpha)m_2\lambda$ falling in $I$. Recall that $\epsilon_n = (n/\log n)^{-1/3}$. For $\kappa_0$ as in (2.2), we control $\mathbb{P}^{(n)}_{\lambda_0}(\ell_n(\lambda) - \ell_n(\lambda_0) \leq -(\kappa_0+2)n\epsilon_n^2)$. We follow most of the computations of Salomond (2013). Let $e_n = (n\epsilon_n^2)^{-k/2}$,

$$\tilde\lambda_{0n}(t) = \frac{\tilde\lambda_0(t)1_{t \leq \theta_n}}{\int_0^{\theta_n}\tilde\lambda_0(u)\,\mathrm{d}u}, \quad \text{with} \quad \theta_n = \inf\Big\{\theta : \int_0^\theta \tilde\lambda_0(t)\,\mathrm{d}t \geq 1 - \frac{e_n}{n}\Big\},$$

and $\lambda_{0n} = M_{\lambda_0}\tilde\lambda_{0n}$. Define the event $A_n = \{\forall\, X \in N : X \leq \theta_n\}$. We make use of the following result. Let $N$ be a Poisson process with intensity $n(1+\alpha)m_2\tilde\lambda_0$. If $N(T) = k$, write $N = \{X_1, \dots, X_k\}$. Conditionally on $N(T) = k$, the random variables $X_1, \dots, X_k$ are i.i.d. with density $\tilde\lambda_0$. So,

$$\mathbb{P}^{(n)}_{\lambda_0}(A_n^c \mid \Gamma_n) \leq \sum_{k=1}^\infty \mathbb{P}^{(n)}_{\lambda_0}(\exists\, X_i > \theta_n \mid N(T) = k)\,\mathbb{P}^{(n)}_{\lambda_0}(N(T) = k) \leq \sum_{k=1}^\infty \bigg(1 - \Big(1 - \frac{e_n}{n}\Big)^k\bigg)\mathbb{P}^{(n)}_{\lambda_0}(N(T) = k) = O\Big(\frac{e_n}{n}\mathbb{E}^{(n)}_{\lambda_0}[N(T)]\Big) = O(e_n) = O\big((n\epsilon_n^2)^{-k/2}\big).$$

Now,

$$\mathbb{P}^{(n)}_{\lambda_0}\big(\ell_n(\lambda) - \ell_n(\lambda_0) \leq -(\kappa_0+2)n\epsilon_n^2 \mid \Gamma_n\big) \leq \mathbb{P}^{(n)}_{\lambda_0}\big(\ell_n(\lambda) - \ell_n(\lambda_0) \leq -(\kappa_0+2)n\epsilon_n^2 \mid A_n, \Gamma_n\big) + \mathbb{P}^{(n)}_{\lambda_0}(A_n^c \mid \Gamma_n).$$

We now deal with the first term on the right-hand side. On $\Gamma_n \cap A_n$,

$$\ell_n(\lambda_0) = \ell_n(\lambda_{0n}) + \int_0^{\theta_n}\log\Big(\frac{\tilde\lambda_0(t)}{\tilde\lambda_{0n}(t)}\Big)\,\mathrm{d}N_t - \int_0^T[\lambda_0(t) - \lambda_{0n}(t)]Y_t\,\mathrm{d}t$$
$$= \ell_n(\lambda_{0n}) + N(T)\log\bigg(\int_0^{\theta_n}\tilde\lambda_0(t)\,\mathrm{d}t\bigg) - M_{\lambda_0}\int_0^T\tilde\lambda_0(t)Y_t\,\mathrm{d}t + M_{\lambda_0}\frac{\int_0^{\theta_n}\tilde\lambda_0(t)Y_t\,\mathrm{d}t}{\int_0^{\theta_n}\tilde\lambda_0(t)\,\mathrm{d}t}$$
$$\leq \ell_n(\lambda_{0n}) + M_{\lambda_0}\frac{\int_{\theta_n}^T\tilde\lambda_0(t)\,\mathrm{d}t\int_0^{\theta_n}\tilde\lambda_0(t)Y_t\,\mathrm{d}t}{\int_0^{\theta_n}\tilde\lambda_0(t)\,\mathrm{d}t} - M_{\lambda_0}\int_{\theta_n}^T\tilde\lambda_0(t)Y_t\,\mathrm{d}t$$
$$\leq \ell_n(\lambda_{0n}) + M_{\lambda_0}\frac{e_n(1+\alpha)m_2}{1 - e_n/n}.$$

So, for every $\lambda$ and any $n$ large enough,

$$\mathbb{P}^{(n)}_{\lambda_0}\big(\ell_n(\lambda) - \ell_n(\lambda_0) \leq -(\kappa_0+2)n\epsilon_n^2 \mid A_n, \Gamma_n\big) \leq \mathbb{P}^{(n)}_{\lambda_0}\big(\ell_n(\lambda) - \ell_n(\lambda_{0n}) \leq -(\kappa_0+1)n\epsilon_n^2 \mid A_n, \Gamma_n\big) = \mathbb{P}^{(n)}_{\lambda_{0n}}\big(\ell_n(\lambda) - \ell_n(\lambda_{0n}) \leq -(\kappa_0+1)n\epsilon_n^2 \mid \Gamma_n\big)$$


because $\mathbb{P}^{(n)}_{\lambda_0}(\cdot \mid A_n) = \mathbb{P}^{(n)}_{\lambda_{0n}}(\cdot)$. Let $H > 0$ be fixed. For all $\lambda \in \bar B_{k,n}(\lambda_{0n}; \epsilon_n, H)$, using Proposition 4.1, we obtain

$$\mathbb{P}^{(n)}_{\lambda_{0n}}\big(\ell_n(\lambda) - \ell_n(\lambda_{0n}) \leq -(\kappa_0+1)n\epsilon_n^2 \mid \Gamma_n\big) = O\big((n\epsilon_n^2)^{-k/2}\big).$$

Mimicking the proof of Lemma 8 in Salomond (2013), we have that, for some constant $C_k > 0$,

$$\pi_1\big(B_{k,n}(\tilde\lambda_{0n}; \epsilon_n, H)\big) \geq e^{-C_k n\epsilon_n^2} \quad \text{when } n \text{ is large enough},$$

so that the first part of condition (ii) of Theorem 2.1 is verified. As in Salomond (2013), we set $\mathcal{F}_n = \{\tilde\lambda : \tilde\lambda(0) \leq M_n\}$, with $M_n = \exp(c_1n\epsilon_n^2)$ and $c_1$ a positive constant. From Lemma 9 of Salomond (2013), there exists $a > 0$ such that $\pi_1(\mathcal{F}_n^c) \leq e^{-c_1(a+1)n\epsilon_n^2}$ for $n$ large enough, and the first part of condition (i) is satisfied. It is known from Groeneboom (1985) that the $\epsilon$-entropy of $\mathcal{F}_n$ is of order $(\log M_n)/\epsilon$, which is $o(n)$ for all $\epsilon > 0$, so the second part of (i) holds. The second part of (ii) is a consequence of equation (22) of Salomond (2013).

5 Auxiliary results

This section reports the proofs of Proposition 4.1 and Proposition 4.2 that have beenstated in Section 4. Proofs of intermediate results are deferred to Section 6.

We use the fact that, for any pair of densities $f$ and $g$, $\|f - g\|_1 \leq 2h(f, g)$.
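This classical bound follows from Cauchy-Schwarz: $\|f-g\|_1 = \int|\sqrt f - \sqrt g|(\sqrt f + \sqrt g) \leq h(f,g)\,\|\sqrt f + \sqrt g\|_2 \leq 2h(f,g)$, since $\int(\sqrt f + \sqrt g)^2 \leq 4$. A quick numerical check on two arbitrary Beta densities (an illustration, not from the paper):

```python
import numpy as np
from math import gamma

def beta_pdf(x, a, b):
    """Density of the Beta(a, b) distribution on (0, 1)."""
    return gamma(a + b) / (gamma(a) * gamma(b)) * x**(a - 1) * (1.0 - x)**(b - 1)

x = np.linspace(0.0, 1.0, 100001)[1:-1]   # avoid the endpoints
h = 1.0 / 100000
f = beta_pdf(x, 2.0, 5.0)
g = beta_pdf(x, 3.0, 3.0)

l1 = np.sum(np.abs(f - g)) * h                                  # ~ ||f - g||_1
hell = np.sqrt(np.sum((np.sqrt(f) - np.sqrt(g))**2) * h)        # ~ h(f, g)
print(l1, 2 * hell)
```

For distinct densities the inequality is strict, so the Riemann-sum approximations comfortably satisfy it as well.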

Proof of Proposition 4.1. Recall that the log-likelihood evaluated at $\lambda$ is given by $\ell_n(\lambda) = \int_0^T\log(\lambda(t))\,\mathrm{d}N_t - \int_0^T\lambda(t)Y_t\,\mathrm{d}t$. Since, on $\Omega^c$, $N$ is empty and $Y_t \equiv 0$ almost surely, we can assume, without loss of generality, that $\Omega = [0, T]$. Define

$$M_n(\lambda) = \int_0^T\lambda(t)\mu_n(t)\,\mathrm{d}t, \qquad M_n(\lambda_0) = \int_0^T\lambda_0(t)\mu_n(t)\,\mathrm{d}t,$$

and

$$\tilde\lambda_n(\cdot) = \frac{\lambda(\cdot)\mu_n(\cdot)}{M_n(\lambda)} = \frac{\lambda(\cdot)\mu_n(\cdot)}{\int_0^T\lambda(t)\mu_n(t)\,\mathrm{d}t}, \qquad \tilde\lambda_{0,n}(\cdot) = \frac{\lambda_0(\cdot)\mu_n(\cdot)}{M_n(\lambda_0)} = \frac{\lambda_0(\cdot)\mu_n(\cdot)}{\int_0^T\lambda_0(t)\mu_n(t)\,\mathrm{d}t}.$$

By straightforward computations,

$$\mathrm{KL}(\lambda_0; \lambda) = \mathbb{E}^{(n)}_{\lambda_0}[\ell_n(\lambda_0) - \ell_n(\lambda)] = M_n(\lambda_0)\bigg[\mathrm{KL}(\tilde\lambda_{0,n}; \tilde\lambda_n) + \frac{M_n(\lambda)}{M_n(\lambda_0)} - 1 - \log\Big(\frac{M_n(\lambda)}{M_n(\lambda_0)}\Big)\bigg] = M_n(\lambda_0)\bigg[\mathrm{KL}(\tilde\lambda_{0,n}; \tilde\lambda_n) + \phi\Big(\frac{M_n(\lambda)}{M_n(\lambda_0)}\Big)\bigg] \leq nm_2M_{\lambda_0}\bigg[\mathrm{KL}(\tilde\lambda_{0,n}; \tilde\lambda_n) + \phi\Big(\frac{M_n(\lambda)}{M_n(\lambda_0)}\Big)\bigg], \qquad (5.1)$$


where $\phi(x) = x - 1 - \log x$ and

$$\mathrm{KL}(\tilde\lambda_{0,n}; \tilde\lambda_n) = \int_0^T\log\Big(\frac{\tilde\lambda_{0,n}(t)}{\tilde\lambda_n(t)}\Big)\tilde\lambda_{0,n}(t)\,\mathrm{d}t.$$

We control $\mathrm{KL}(\tilde\lambda_{0,n}; \tilde\lambda_n)$ for $\tilde\lambda \in B_{k,n}(\tilde\lambda_0; v_n, H)$. By using Lemma 8.2 of Ghosal et al. (2000), we have

$$\mathrm{KL}(\tilde\lambda_{0,n}; \tilde\lambda_n) \leq 2h^2(\tilde\lambda_{0,n}, \tilde\lambda_n)\bigg(1 + \log\Big\|\frac{\tilde\lambda_{0,n}}{\tilde\lambda_n}\Big\|_\infty\bigg) \leq 2h^2(\tilde\lambda_{0,n}, \tilde\lambda_n)\bigg[1 + \log\Big(\frac{m_2}{m_1}\Big) + \log\Big\|\frac{\tilde\lambda_0}{\tilde\lambda}\Big\|_\infty\bigg] \leq 2\bigg[1 + \log\Big(\frac{m_2}{m_1}\Big)\bigg]h^2(\tilde\lambda_{0,n}, \tilde\lambda_n)\bigg(1 + \log\Big\|\frac{\tilde\lambda_0}{\tilde\lambda}\Big\|_\infty\bigg) \qquad (5.2)$$

because $1 + \log(m_2/m_1) \geq 1$. We now deal with $h^2(\tilde\lambda_{0,n}, \tilde\lambda_n)$. Writing $\tilde\lambda_n = \tilde\lambda\tilde\mu_n/\int_0^T\tilde\lambda(u)\tilde\mu_n(u)\,\mathrm{d}u$ (the factors $n$ and $M_\lambda$ cancel) and using $\tilde\mu_n \leq m_2$, we have

$$h^2(\tilde\lambda_{0,n}, \tilde\lambda_n) = \int_0^T\Big(\sqrt{\tilde\lambda_{0,n}(t)} - \sqrt{\tilde\lambda_n(t)}\Big)^2\,\mathrm{d}t = \int_0^T\Bigg(\sqrt{\frac{\tilde\lambda_0(t)\tilde\mu_n(t)}{\int_0^T\tilde\lambda_0(u)\tilde\mu_n(u)\,\mathrm{d}u}} - \sqrt{\frac{\tilde\lambda(t)\tilde\mu_n(t)}{\int_0^T\tilde\lambda(u)\tilde\mu_n(u)\,\mathrm{d}u}}\Bigg)^2\,\mathrm{d}t$$
$$\leq 2m_2\int_0^T\Bigg(\sqrt{\frac{\tilde\lambda_0(t)}{\int_0^T\tilde\lambda_0(u)\tilde\mu_n(u)\,\mathrm{d}u}} - \sqrt{\frac{\tilde\lambda_0(t)}{\int_0^T\tilde\lambda(u)\tilde\mu_n(u)\,\mathrm{d}u}}\Bigg)^2\,\mathrm{d}t + 2m_2\int_0^T\Bigg(\sqrt{\frac{\tilde\lambda_0(t)}{\int_0^T\tilde\lambda(u)\tilde\mu_n(u)\,\mathrm{d}u}} - \sqrt{\frac{\tilde\lambda(t)}{\int_0^T\tilde\lambda(u)\tilde\mu_n(u)\,\mathrm{d}u}}\Bigg)^2\,\mathrm{d}t$$
$$\leq 2m_2U_n + \frac{2m_2}{m_1}h^2(\tilde\lambda_0, \tilde\lambda),$$

with

$$U_n = \Bigg(\frac{1}{\sqrt{\int_0^T\tilde\lambda_0(t)\tilde\mu_n(t)\,\mathrm{d}t}} - \frac{1}{\sqrt{\int_0^T\tilde\lambda(t)\tilde\mu_n(t)\,\mathrm{d}t}}\Bigg)^2.$$

We denote by

$$\epsilon_n := \frac{1}{\int_0^T\tilde\lambda_0(u)\tilde\mu_n(u)\,\mathrm{d}u}\int_0^T[\tilde\lambda(t) - \tilde\lambda_0(t)]\tilde\mu_n(t)\,\mathrm{d}t,$$

so that

$$|\epsilon_n| \leq \frac{1}{m_1}\int_0^T|\tilde\lambda(t) - \tilde\lambda_0(t)|\tilde\mu_n(t)\,\mathrm{d}t \leq \frac{2m_2}{m_1}h(\tilde\lambda_0, \tilde\lambda).$$

Then,

$$U_n = \frac{1}{\int_0^T\tilde\lambda_0(t)\tilde\mu_n(t)\,\mathrm{d}t}\bigg(1 - \frac{1}{\sqrt{1+\epsilon_n}}\bigg)^2 \leq \frac{\epsilon_n^2}{4m_1} \leq \frac{m_2^2}{m_1^3}h^2(\tilde\lambda_0, \tilde\lambda).$$


Finally,

$$h^2(\tilde\lambda_{0,n}, \tilde\lambda_n) \leq \frac{2m_2}{m_1}\bigg(\frac{m_2^2}{m_1^2} + 1\bigg)h^2(\tilde\lambda_0, \tilde\lambda). \qquad (5.3)$$

It remains to bound $\phi(M_n(\lambda)/M_n(\lambda_0))$. We have

$$|M_n(\lambda_0) - M_n(\lambda)| \leq \int_0^T|\lambda(t) - \lambda_0(t)|\mu_n(t)\,\mathrm{d}t \leq nm_2\int_0^T|\lambda(t) - \lambda_0(t)|\,\mathrm{d}t \leq \frac{m_2}{m_1M_{\lambda_0}}M_n(\lambda_0)\big[M_{\lambda_0}\|\tilde\lambda - \tilde\lambda_0\|_1 + |M_\lambda - M_{\lambda_0}|\big] \leq \frac{m_2}{m_1M_{\lambda_0}}M_n(\lambda_0)\big[2M_{\lambda_0}h(\tilde\lambda, \tilde\lambda_0) + |M_\lambda - M_{\lambda_0}|\big] \leq \frac{m_2}{m_1M_{\lambda_0}}M_n(\lambda_0)(2M_{\lambda_0} + 1)v_n.$$

Since $\phi(u+1) \leq u^2$ if $|u| \leq 1/2$, we have

$$\phi\bigg(\frac{M_n(\lambda)}{M_n(\lambda_0)}\bigg) \leq \frac{m_2^2}{m_1^2M_{\lambda_0}^2}(2M_{\lambda_0} + 1)^2v_n^2 \quad \text{for } n \text{ large enough}. \qquad (5.4)$$

Combining (5.1), (5.2), (5.3) and (5.4), we have $\mathrm{KL}(\lambda_0; \lambda) \leq \kappa_0 nv_n^2$ for $n$ large enough, with $\kappa_0$ as in (2.2). We now deal with

$$V_{2k}(\lambda_0; \lambda) = \mathbb{E}^{(n)}_{\lambda_0}\Big[\big|\ell_n(\lambda_0) - \ell_n(\lambda) - \mathbb{E}^{(n)}_{\lambda_0}[\ell_n(\lambda_0) - \ell_n(\lambda)]\big|^{2k}\Big], \qquad k \geq 1.$$

We begin by considering the case $k > 1$. In the sequel, we denote by $C$ a constant that may change from line to line. Straightforward computations lead to

$$V_{2k}(\lambda_0; \lambda) = \mathbb{E}^{(n)}_{\lambda_0}\Bigg[\bigg|\int_0^T\Big[\lambda_0(t) - \lambda(t) - \lambda_0(t)\log\Big(\frac{\lambda_0(t)}{\lambda(t)}\Big)\Big][Y_t - \mu_n(t)]\,\mathrm{d}t + \int_0^T\log\Big(\frac{\lambda_0(t)}{\lambda(t)}\Big)[\mathrm{d}N_t - Y_t\lambda_0(t)\,\mathrm{d}t]\bigg|^{2k}\Bigg] \leq 2^{2k-1}(A_{2k} + B_{2k}),$$

with

$$B_{2k} := \mathbb{E}^{(n)}_{\lambda_0}\Bigg[\bigg|\int_0^T\log\Big(\frac{\lambda_0(t)}{\lambda(t)}\Big)[\mathrm{d}N_t - Y_t\lambda_0(t)\,\mathrm{d}t]\bigg|^{2k}\Bigg]$$


and, by (2.1),

$$A_{2k} := \mathbb{E}^{(n)}_{\lambda_0}\Bigg[\bigg|\int_0^T\Big[\lambda_0(t) - \lambda(t) - \lambda_0(t)\log\Big(\frac{\lambda_0(t)}{\lambda(t)}\Big)\Big][Y_t - \mu_n(t)]\,\mathrm{d}t\bigg|^{2k}\Bigg] \leq \Bigg(\int_0^T\Big[\lambda_0(t) - \lambda(t) - \lambda_0(t)\log\Big(\frac{\lambda_0(t)}{\lambda(t)}\Big)\Big]^2\,\mathrm{d}t\Bigg)^k \times \mathbb{E}^{(n)}_{\lambda_0}\Bigg[\bigg(\int_0^T[Y_t - \mu_n(t)]^2\,\mathrm{d}t\bigg)^k\Bigg] \leq 2^{2k-1}C_{1k}n^k(A_{2k,1} + A_{2k,2}),$$

where, for $\tilde\lambda \in B_{k,n}(\tilde\lambda_0; v_n, H)$,

$$A_{2k,1} := \Bigg[\int_0^T\lambda_0^2(t)\log^2\Big(\frac{\lambda_0(t)}{\lambda(t)}\Big)\,\mathrm{d}t\Bigg]^k \leq M_{\lambda_0}^{2k}\|\tilde\lambda_0\|_\infty^k\Bigg[\int_0^T\tilde\lambda_0(t)\log^2\Big(\frac{M_{\lambda_0}\tilde\lambda_0(t)}{M_\lambda\tilde\lambda(t)}\Big)\,\mathrm{d}t\Bigg]^k \leq 2^{2k-1}M_{\lambda_0}^{2k}\|\tilde\lambda_0\|_\infty^k\Bigg[E_2^k(\tilde\lambda_0; \tilde\lambda) + \bigg|\log\Big(\frac{M_\lambda}{M_{\lambda_0}}\Big)\bigg|^{2k}\Bigg] \leq C\big[E_2^k(\tilde\lambda_0; \tilde\lambda) + |M_\lambda - M_{\lambda_0}|^{2k}\big] \leq Cv_n^{2k}$$

and

$$A_{2k,2} := \Bigg(\int_0^T[\lambda_0(t) - \lambda(t)]^2\,\mathrm{d}t\Bigg)^k = \Bigg(\int_0^T\big[(M_{\lambda_0} - M_\lambda)\tilde\lambda_0(t) - M_\lambda[\tilde\lambda(t) - \tilde\lambda_0(t)]\big]^2\,\mathrm{d}t\Bigg)^k$$
$$\leq 2^{2k-1}\|\tilde\lambda_0\|_\infty^k(M_{\lambda_0} - M_\lambda)^{2k} + 2^{2k-1}M_\lambda^{2k}\Bigg[\int_0^T\Big(\sqrt{\tilde\lambda_0(t)} - \sqrt{\tilde\lambda(t)}\Big)^2\Big(\sqrt{\tilde\lambda_0(t)} + \sqrt{\tilde\lambda(t)}\Big)^2\,\mathrm{d}t\Bigg]^k$$
$$\leq 2^{2k-1}\|\tilde\lambda_0\|_\infty^k(M_{\lambda_0} - M_\lambda)^{2k} + 2^kM_\lambda^{2k}\big(\|\tilde\lambda_0\|_\infty + \|\tilde\lambda\|_\infty\big)^kh^{2k}(\tilde\lambda_0, \tilde\lambda) \leq Cv_n^{2k}.$$

Therefore, $A_{2k} \leq C(nv_n^2)^k$.

To deal with $B_{2k}$, for any $T > 0$, we set

$$M_T := \int_0^T\log\Big(\frac{\lambda_0(t)}{\lambda(t)}\Big)[\mathrm{d}N_t - Y_t\lambda_0(t)\,\mathrm{d}t],$$

so that $(M_T)_T$ is a martingale. Using the Burkholder-Davis-Gundy inequality (see Theorem B.15 in Karr (1991)), there exists a constant $C(k)$ depending only on $k$ such that, since $2k > 1$,

$$\mathbb{E}^{(n)}_{\lambda_0}\big[|M_T|^{2k}\big] \leq C(k)\,\mathbb{E}^{(n)}_{\lambda_0}\Bigg[\bigg|\int_0^T\log^2\Big(\frac{\lambda_0(t)}{\lambda(t)}\Big)\,\mathrm{d}N_t\bigg|^k\Bigg].$$

Therefore, for $k > 1$,

$$B_{2k} = \mathbb{E}^{(n)}_{\lambda_0}\big[|M_T|^{2k}\big] \leq 3^{k-1}C(k)\Bigg(\mathbb{E}^{(n)}_{\lambda_0}\Bigg[\bigg|\int_0^T\log^2\Big(\frac{\lambda_0(t)}{\lambda(t)}\Big)[\mathrm{d}N_t - Y_t\lambda_0(t)\,\mathrm{d}t]\bigg|^k\Bigg] + \mathbb{E}^{(n)}_{\lambda_0}\Bigg[\bigg|\int_0^T\log^2\Big(\frac{\lambda_0(t)}{\lambda(t)}\Big)[Y_t - \mu_n(t)]\lambda_0(t)\,\mathrm{d}t\bigg|^k\Bigg] + \bigg|\int_0^T\log^2\Big(\frac{\lambda_0(t)}{\lambda(t)}\Big)\mu_n(t)\lambda_0(t)\,\mathrm{d}t\bigg|^k\Bigg) = 3^{k-1}C(k)\big(B^{(0)}_{k,2} + B^{(1)}_{k,2} + B^{(2)}_{k,2}\big),$$

with

$$B^{(0)}_{k,2} = \mathbb{E}^{(n)}_{\lambda_0}\Bigg[\bigg|\int_0^T\log^2\Big(\frac{\lambda_0(t)}{\lambda(t)}\Big)[\mathrm{d}N_t - Y_t\lambda_0(t)\,\mathrm{d}t]\bigg|^k\Bigg],$$
$$B^{(1)}_{k,2} = \mathbb{E}^{(n)}_{\lambda_0}\Bigg[\bigg|\int_0^T\log^2\Big(\frac{\lambda_0(t)}{\lambda(t)}\Big)[Y_t - \mu_n(t)]\lambda_0(t)\,\mathrm{d}t\bigg|^k\Bigg],$$
$$B^{(2)}_{k,2} = \bigg|\int_0^T\log^2\Big(\frac{\lambda_0(t)}{\lambda(t)}\Big)\mu_n(t)\lambda_0(t)\,\mathrm{d}t\bigg|^k.$$

This can be iterated: we set $J=\min\{j\in\mathbb{N}:2^j\ge k\}$, so that $1<k2^{1-J}\le 2$, and, for $j\ge 1$,
\[
B^{(1)}_{k2^{1-j},2^j}=E_{\lambda_0}^{(n)}\Bigg[\bigg|\int_0^T\log^{2^j}\Big(\frac{\lambda_0(t)}{\lambda(t)}\Big)[Y_t-n\mu_n(t)]\lambda_0(t)\,dt\bigg|^{k2^{1-j}}\Bigg]
\]
and
\[
B^{(2)}_{k2^{1-j},2^j}=\bigg|n\int_0^T\log^{2^j}\Big(\frac{\lambda_0(t)}{\lambda(t)}\Big)\mu_n(t)\lambda_0(t)\,dt\bigg|^{k2^{1-j}}.
\]
Then there exists a constant $C_k$, depending only on $k$, such that
\[
\begin{aligned}
B_{2k}&\le C_k\Bigg(E_{\lambda_0}^{(n)}\Bigg[\bigg|\int_0^T\log^{2^J}\Big(\frac{\lambda_0(t)}{\lambda(t)}\Big)[dN_t-Y_t\lambda_0(t)\,dt]\bigg|^{k2^{1-J}}\Bigg]+\sum_{j=1}^{J}\big(B^{(1)}_{k2^{1-j},2^j}+B^{(2)}_{k2^{1-j},2^j}\big)\Bigg)\\
&\le C_k\Bigg\{\Bigg(E_{\lambda_0}^{(n)}\Bigg[\bigg|\int_0^T\log^{2^J}\Big(\frac{\lambda_0(t)}{\lambda(t)}\Big)[dN_t-Y_t\lambda_0(t)\,dt]\bigg|^{2}\Bigg]\Bigg)^{k2^{-J}}+\sum_{j=1}^{J}\big(B^{(1)}_{k2^{1-j},2^j}+B^{(2)}_{k2^{1-j},2^j}\big)\Bigg\}\\
&=C_k\Bigg\{\Bigg(E_{\lambda_0}^{(n)}\Bigg[\int_0^T\log^{2^{J+1}}\Big(\frac{\lambda_0(t)}{\lambda(t)}\Big)Y_t\lambda_0(t)\,dt\Bigg]\Bigg)^{k2^{-J}}+\sum_{j=1}^{J}\big(B^{(1)}_{k2^{1-j},2^j}+B^{(2)}_{k2^{1-j},2^j}\big)\Bigg\}\\
&=C_k\Bigg\{\Bigg(n\int_0^T\log^{2^{J+1}}\Big(\frac{\lambda_0(t)}{\lambda(t)}\Big)\mu_n(t)\lambda_0(t)\,dt\Bigg)^{k2^{-J}}+\sum_{j=1}^{J}\big(B^{(1)}_{k2^{1-j},2^j}+B^{(2)}_{k2^{1-j},2^j}\big)\Bigg\}\\
&=C_k\Bigg[B^{(2)}_{k2^{-J},2^{J+1}}+\sum_{j=1}^{J}\big(B^{(1)}_{k2^{1-j},2^j}+B^{(2)}_{k2^{1-j},2^j}\big)\Bigg].
\end{aligned}
\]
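The second and third steps in the chain above combine two standard facts, recorded here for clarity: Jensen's inequality for the concave map $x\mapsto x^{k2^{-J}}$ (recall $k2^{1-J}\le 2$), and the martingale isometry for compensated counting-process integrals, together with the identification of $n\mu_n(t)$ as the mean of $Y_t$ used throughout this proof:

```latex
% Jensen: for any random variable X and 0 < p <= 2,
\[
E|X|^{p}\le\big(E|X|^{2}\big)^{p/2},\qquad\text{applied here with } p=k2^{1-J}\in(1,2].
\]
% Isometry: for a deterministic integrand f, the compensated integral satisfies
\[
E_{\lambda_0}^{(n)}\Bigg[\bigg|\int_0^T f(t)\,[dN_t-Y_t\lambda_0(t)\,dt]\bigg|^{2}\Bigg]
=E_{\lambda_0}^{(n)}\Bigg[\int_0^T f^2(t)\,Y_t\lambda_0(t)\,dt\Bigg],
\]
% applied with f = log^{2^J}(lambda_0/lambda), so that f^2 = log^{2^{J+1}}(lambda_0/lambda).
```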

Note that, for any $1\le j\le J$,
\[
\begin{aligned}
B^{(1)}_{k2^{1-j},2^j}&\le\Bigg[\int_0^T\log^{2^{j+1}}\Big(\frac{\lambda_0(t)}{\lambda(t)}\Big)\lambda_0^2(t)\,dt\Bigg]^{k2^{-j}}\times E_{\lambda_0}^{(n)}\Bigg[\Bigg(\int_0^T[Y_t-n\mu_n(t)]^2\,dt\Bigg)^{k2^{-j}}\Bigg]\\
&\le C\big(M_{\lambda_0}^2\|\tilde\lambda_0\|_\infty\big)^{k2^{-j}}\Bigg[\int_0^T\log^{2^{j+1}}\Big(\frac{M_{\lambda_0}\tilde\lambda_0(t)}{M_\lambda\tilde\lambda(t)}\Big)\tilde\lambda_0(t)\,dt\Bigg]^{k2^{-j}}\times n^{k2^{-j}}\\
&\le C\Bigg[\log^{2^{j+1}}\Big(\frac{M_{\lambda_0}}{M_\lambda}\Big)+E_{2^{j+1}}(\tilde\lambda_0;\tilde\lambda)\Bigg]^{k2^{-j}}\times n^{k2^{-j}}
\le C(nv_n^2)^{k2^{-j}}\le C(nv_n^2)^{k},
\end{aligned}
\]

where we have used (2.1). Similarly, for any $j\ge 1$,
\[
\begin{aligned}
B^{(2)}_{k2^{1-j},2^j}&\le(nm_2M_{\lambda_0})^{k2^{1-j}}\Bigg[\int_0^T\log^{2^j}\Big(\frac{M_{\lambda_0}\tilde\lambda_0(t)}{M_\lambda\tilde\lambda(t)}\Big)\tilde\lambda_0(t)\,dt\Bigg]^{k2^{1-j}}\\
&\le C\Bigg[\log^{2^j}\Big(\frac{M_{\lambda_0}}{M_\lambda}\Big)+E_{2^j}(\tilde\lambda_0;\tilde\lambda)\Bigg]^{k2^{1-j}}\times n^{k2^{1-j}}
\le C(nv_n^2)^{k2^{1-j}}\le C(nv_n^2)^{k}.
\end{aligned}
\]
Therefore, for any $k>1$,
\[
V_{2k}(\lambda_0;\lambda)\le C(nv_n^2)^{k},
\]
where $C$ depends on $C_{1k}$, $k$, $H$, $\lambda_0$, $m_1$ and $m_2$. Using the previous computations, the case $k=1$ is straightforward. So, we obtain the result for $V_k(\lambda_0;\lambda)$ for every $k\ge 2$.

To prove Proposition 4.2, we use the following lemma, whose proof is reported in Section 6.

Lemma 5.1. Under condition (1.2), there exist constants $\xi,K>0$, depending only on $M_{\lambda_0}$, $\alpha$, $m_1$ and $m_2$, such that, for any non-negative function $\lambda_1$, there exists a test $\phi_{\lambda_1}$ so that
\[
E_{\lambda_0}^{(n)}[\mathbf{1}_{\Gamma_n}\phi_{\lambda_1}]\le 2\exp\big(-Kn\|\lambda_1-\lambda_0\|_1\times\min\{\|\lambda_1-\lambda_0\|_1,\,m_1\}\big)
\]
and
\[
\sup_{\lambda:\,\|\lambda-\lambda_1\|_1<\xi\|\lambda_1-\lambda_0\|_1}E_{\lambda}^{(n)}[\mathbf{1}_{\Gamma_n}(1-\phi_{\lambda_1})]\le 2\exp\big(-Kn\|\lambda_1-\lambda_0\|_1\times\min\{\|\lambda_1-\lambda_0\|_1,\,m_1\}\big).
\]
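The minimum in the exponent of Lemma 5.1 is what produces the two regimes in the proof of Proposition 4.2 below: applying the lemma to centers $\lambda_1=\lambda_{l,j}$ satisfying $\|\lambda_1-\lambda_0\|_1\asymp jv_n$ on the $j$-th shell, the exponent behaves as

```latex
\[
n\|\lambda_1-\lambda_0\|_1\times\min\{\|\lambda_1-\lambda_0\|_1,\ m_1\}\asymp
\begin{cases}
n j^2 v_n^2, & \text{if } jv_n\lesssim m_1,\\
m_1\, n j v_n, & \text{if } jv_n\gtrsim m_1,
\end{cases}
\]
```

which is the origin of the threshold $j\le\rho/v_n$ versus $j>\rho/v_n$ used below, with $\rho$ of the order of $m_1$.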

Proof of Proposition 4.2. We consider the setting of Lemma 5.1 and a covering of $S_{n,j}(v_n)$ with $L_1$-balls of radius $\xi jv_n$ and centers $(\lambda_{l,j})_{l=1,\dots,D_j}$, where $D_j$ is the covering number of $S_{n,j}(v_n)$ by such balls. We set $\phi_{n,j}=\max_{l=1,\dots,D_j}\phi_{\lambda_{l,j}}$, where the $\phi_{\lambda_{l,j}}$'s are defined in Lemma 5.1. So, there exists a constant $\rho>0$ such that
\[
E_{\lambda_0}^{(n)}[\mathbf{1}_{\Gamma_n}\phi_{n,j}]\le 2D_je^{-Knj^2v_n^2}\quad\text{and}\quad\sup_{\lambda\in S_{n,j}(v_n)}E_{\lambda}^{(n)}[\mathbf{1}_{\Gamma_n}(1-\phi_{n,j})]\le 2e^{-Knj^2v_n^2},\qquad\text{if }j\le\frac{\rho}{v_n},
\]
and
\[
E_{\lambda_0}^{(n)}[\mathbf{1}_{\Gamma_n}\phi_{n,j}]\le 2D_je^{-Knjv_n}\quad\text{and}\quad\sup_{\lambda\in S_{n,j}(v_n)}E_{\lambda}^{(n)}[\mathbf{1}_{\Gamma_n}(1-\phi_{n,j})]\le 2e^{-Knjv_n},\qquad\text{if }j>\frac{\rho}{v_n},
\]

where $K$ is a constant (see Lemma 5.1). We now bound $D_j$. First note that, for any $\lambda=M_\lambda\tilde\lambda$ and $\lambda'=M_{\lambda'}\tilde\lambda'$,
\[
\|\lambda-\lambda'\|_1\le M_\lambda\|\tilde\lambda-\tilde\lambda'\|_1+|M_\lambda-M_{\lambda'}|. \tag{5.5}
\]
Assume that $M_\lambda\ge M_{\lambda_0}$. Then,
\[
\begin{aligned}
\|\lambda-\lambda_0\|_1&\ge\int_{\tilde\lambda>\tilde\lambda_0}[M_\lambda\tilde\lambda(t)-M_{\lambda_0}\tilde\lambda_0(t)]\,dt
=M_\lambda\int_{\tilde\lambda>\tilde\lambda_0}[\tilde\lambda(t)-\tilde\lambda_0(t)]\,dt+(M_\lambda-M_{\lambda_0})\int_{\tilde\lambda>\tilde\lambda_0}\tilde\lambda_0(t)\,dt\\
&\ge M_\lambda\int_{\tilde\lambda>\tilde\lambda_0}[\tilde\lambda(t)-\tilde\lambda_0(t)]\,dt=\frac{M_\lambda}{2}\|\tilde\lambda-\tilde\lambda_0\|_1.
\end{aligned}
\]
Similarly, if $M_\lambda<M_{\lambda_0}$,
\[
\|\lambda-\lambda_0\|_1\ge\int_{\tilde\lambda_0>\tilde\lambda}[M_{\lambda_0}\tilde\lambda_0(t)-M_\lambda\tilde\lambda(t)]\,dt
\ge M_{\lambda_0}\int_{\tilde\lambda_0>\tilde\lambda}[\tilde\lambda_0(t)-\tilde\lambda(t)]\,dt=\frac{M_{\lambda_0}}{2}\|\tilde\lambda-\tilde\lambda_0\|_1.
\]
So, $2\|\lambda-\lambda_0\|_1\ge(M_\lambda\vee M_{\lambda_0})\|\tilde\lambda-\tilde\lambda_0\|_1$ and, since also $\|\lambda-\lambda_0\|_1\ge|M_\lambda-M_{\lambda_0}|$, we finally have
\[
\|\lambda-\lambda_0\|_1\ge\max\big\{(M_\lambda\vee M_{\lambda_0})\|\tilde\lambda-\tilde\lambda_0\|_1/2,\ |M_\lambda-M_{\lambda_0}|\big\}. \tag{5.6}
\]
So, for all $\lambda=M_\lambda\tilde\lambda\in S_{n,j}(v_n)$,
\[
\|\tilde\lambda-\tilde\lambda_0\|_1\le\frac{2(j+1)v_n}{M_{\lambda_0}}\quad\text{and}\quad|M_\lambda-M_{\lambda_0}|\le(j+1)v_n. \tag{5.7}
\]
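The factor $\frac{1}{2}\|\tilde\lambda-\tilde\lambda_0\|_1$ appearing in the two displays above is the usual Scheffé-type identity for probability densities: since $\tilde\lambda$ and $\tilde\lambda_0$ both integrate to one,

```latex
\[
\int_{\tilde\lambda>\tilde\lambda_0}[\tilde\lambda(t)-\tilde\lambda_0(t)]\,dt
=\int_{\tilde\lambda_0>\tilde\lambda}[\tilde\lambda_0(t)-\tilde\lambda(t)]\,dt
=\frac{1}{2}\int_\Omega|\tilde\lambda(t)-\tilde\lambda_0(t)|\,dt
=\frac{1}{2}\|\tilde\lambda-\tilde\lambda_0\|_1.
\]
```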

Therefore, $S_{n,j}(v_n)\subseteq(S_{n,j}\cap F_n)\times\{M:|M-M_{\lambda_0}|\le(j+1)v_n\}$ and any covering of $(S_{n,j}\cap F_n)\times\{M:|M-M_{\lambda_0}|\le(j+1)v_n\}$ will give a covering of $S_{n,j}(v_n)$. So, to bound $D_j$, we have to build a convenient covering of $(S_{n,j}\cap F_n)\times\{M:|M-M_{\lambda_0}|\le(j+1)v_n\}$. We distinguish two cases.

• We assume that $(j+1)v_n\le 2M_{\lambda_0}$. Then (5.7) implies that $M_\lambda\le 3M_{\lambda_0}$. Moreover, if
\[
\|\tilde\lambda-\tilde\lambda'\|_1\le\frac{\xi jv_n}{3M_{\lambda_0}+1}\quad\text{and}\quad|M_\lambda-M_{\lambda'}|\le\frac{\xi jv_n}{3M_{\lambda_0}+1},
\]
then, by (5.5),
\[
\|\lambda-\lambda'\|_1\le(M_\lambda+1)\frac{\xi jv_n}{3M_{\lambda_0}+1}\le\xi jv_n.
\]
By assumption (ii) of Theorem 2.1, this implies that, for any $\delta>0$, there exists $J_0$ such that, for $j\ge J_0$,
\[
D_j\le D\big((3M_{\lambda_0}+1)^{-1}\xi jv_n,\ S_{n,j}\cap F_n,\ \|\cdot\|_1\big)\times\bigg(\frac{2(j+1)v_n(3M_{\lambda_0}+1)}{\xi jv_n}+1\bigg)\lesssim\exp\big(\delta(j+1)^2nv_n^2\big).
\]

• We assume that $(j+1)v_n>2M_{\lambda_0}$. If
\[
\|\tilde\lambda-\tilde\lambda'\|_1\le\frac{\xi}{4}\quad\text{and}\quad|M_\lambda-M_{\lambda'}|\le\frac{\xi(M_\lambda\vee M_{\lambda'})}{4},
\]
then, using again (5.5) and (5.7),
\[
\|\lambda-\lambda'\|_1\le\frac{\xi M_\lambda}{4}+\frac{\xi(M_\lambda+M_{\lambda'})}{4}\le\frac{3\xi M_{\lambda_0}}{4}+\frac{\xi(j+1)v_n}{2}\le\frac{7\xi(j+1)v_n}{8}\le\xi jv_n,
\]
for $n$ large enough. By assumption (i) of Theorem 2.1, this implies that, for any $\delta>0$,
\[
D_j\lesssim D(\xi/4,\ F_n,\ \|\cdot\|_1)\times\log((j+1)v_n)\lesssim\log(jv_n)\exp(\delta n).
\]

It is enough to choose $\delta$ small enough to obtain the result of Proposition 4.2.


6 Appendix

Proof of Lemma 5.1. For any $\lambda$, we denote by $E_{\lambda,\Gamma_n}^{(n)}[\cdot]=E_{\lambda}^{(n)}[\mathbf{1}_{\Gamma_n}\times\cdot]$. For any $\lambda$, $\lambda'$, we define
\[
\|\lambda-\lambda'\|_{\mu_n}:=\int_\Omega|\lambda(t)-\lambda'(t)|\mu_n(t)\,dt.
\]
On $\Gamma_n$ we have
\[
m_1\|\lambda-\lambda'\|_1\le\|\lambda-\lambda'\|_{\mu_n}\le m_2\|\lambda-\lambda'\|_1. \tag{6.1}
\]

The main tool for building convenient tests is Theorem 3 of Hansen et al. (2012) (and its proof), applied in the univariate setting. By mimicking the proof of this theorem from Inequality (7.5) to Inequality (7.7), if $H$ is a deterministic function bounded by $b$, we have that, for any $u\ge 0$,
\[
P_{\lambda}^{(n)}\Bigg(\bigg|\int_0^T H_t(dN_t-d\Lambda_t)\bigg|\ge\sqrt{2vu}+\frac{bu}{3}\ \text{ and }\ \Gamma_n\Bigg)\le 2e^{-u}, \tag{6.2}
\]
where we recall that $\Lambda_t=\int_0^t Y_s\lambda(s)\,ds$ and $v$ is a deterministic constant such that, on $\Gamma_n$, $\int_0^T H_t^2Y_t\lambda(t)\,dt\le v$ almost surely. For any non-negative function $\lambda_1$, we define the sets
\[
A:=\{t\in\Omega:\lambda_1(t)\ge\lambda_0(t)\}\quad\text{and}\quad A^c:=\{t\in\Omega:\lambda_1(t)<\lambda_0(t)\}
\]
and the following pseudo-metrics
\[
d_A(\lambda_1,\lambda_0):=\int_A[\lambda_1(t)-\lambda_0(t)]\mu_n(t)\,dt\quad\text{and}\quad d_{A^c}(\lambda_1,\lambda_0):=\int_{A^c}[\lambda_0(t)-\lambda_1(t)]\mu_n(t)\,dt.
\]
Note that $\|\lambda_1-\lambda_0\|_{\mu_n}=d_A(\lambda_1,\lambda_0)+d_{A^c}(\lambda_1,\lambda_0)$. For $u>0$, if $d_A(\lambda_1,\lambda_0)\ge d_{A^c}(\lambda_1,\lambda_0)$, define the test
\[
\phi_{\lambda_1,A}(u):=\mathbf{1}\bigg\{N(A)-\int_A\lambda_0(t)Y_t\,dt\ge\rho_n(u)\bigg\},\quad\text{with}\quad\rho_n(u):=\sqrt{2nv(\lambda_0)u}+\frac{u}{3},
\]
where, for any non-negative function $\lambda$,
\[
v(\lambda):=(1+\alpha)\int_\Omega\lambda(t)\mu_n(t)\,dt. \tag{6.3}
\]
Similarly, if $d_A(\lambda_1,\lambda_0)<d_{A^c}(\lambda_1,\lambda_0)$, define
\[
\phi_{\lambda_1,A^c}(u):=\mathbf{1}\bigg\{N(A^c)-\int_{A^c}\lambda_0(t)Y_t\,dt\le-\rho_n(u)\bigg\}.
\]

Since, for any non-negative function $\lambda$, on $\Gamma_n$, by (4.1),
\[
(1-\alpha)\int_\Omega\lambda(t)\mu_n(t)\,dt\le\int_\Omega\lambda(t)\frac{Y_t}{n}\,dt\le(1+\alpha)\int_\Omega\lambda(t)\mu_n(t)\,dt, \tag{6.4}
\]
inequality (6.2) applied with $H=\mathbf{1}_A$ or $H=\mathbf{1}_{A^c}$, $b=1$ and $v=nv(\lambda_0)$ implies that, for any $u>0$,
\[
E_{\lambda_0,\Gamma_n}^{(n)}[\phi_{\lambda_1,A}(u)]\le 2e^{-u}\quad\text{and}\quad E_{\lambda_0,\Gamma_n}^{(n)}[\phi_{\lambda_1,A^c}(u)]\le 2e^{-u}. \tag{6.5}
\]

We now state a useful lemma whose proof is given below.

Lemma 6.1. Assume that condition (1.2) is verified. Let $\lambda$ be a non-negative function such that
\[
\|\lambda-\lambda_1\|_{\mu_n}\le\frac{1-\alpha}{4(1+\alpha)}\|\lambda_1-\lambda_0\|_{\mu_n}.
\]
We set $M_n(\lambda_0)=\int_\Omega\lambda_0(t)\mu_n(t)\,dt$ and we distinguish two cases.

1. Assume that $d_A(\lambda_1,\lambda_0)\ge d_{A^c}(\lambda_1,\lambda_0)$. Then,
\[
E_{\lambda,\Gamma_n}^{(n)}[1-\phi_{\lambda_1,A}(u_A)]\le 2\exp(-u_A),
\]
where
\[
u_A=\begin{cases}u_{0A}\,nd_A^2(\lambda_1,\lambda_0),&\text{if }\|\lambda_1-\lambda_0\|_{\mu_n}\le 2M_n(\lambda_0),\\
u_{1A}\,nd_A(\lambda_1,\lambda_0),&\text{if }\|\lambda_1-\lambda_0\|_{\mu_n}>2M_n(\lambda_0),\end{cases}
\]
and $u_{0A}$, $u_{1A}$ are two constants depending only on $\alpha$, $M_{\lambda_0}$, $m_1$ and $m_2$.

2. Assume that $d_A(\lambda_1,\lambda_0)<d_{A^c}(\lambda_1,\lambda_0)$. Then,
\[
E_{\lambda,\Gamma_n}^{(n)}[1-\phi_{\lambda_1,A^c}(u_{A^c})]\le 2\exp(-u_{A^c}),
\]
where
\[
u_{A^c}=\begin{cases}u_{0A^c}\,nd_{A^c}^2(\lambda_1,\lambda_0),&\text{if }\|\lambda_1-\lambda_0\|_{\mu_n}\le 2M_n(\lambda_0),\\
u_{1A^c}\,nd_{A^c}(\lambda_1,\lambda_0),&\text{if }\|\lambda_1-\lambda_0\|_{\mu_n}>2M_n(\lambda_0),\end{cases}
\]
and $u_{0A^c}$, $u_{1A^c}$ are two constants depending only on $\alpha$, $M_{\lambda_0}$, $m_1$ and $m_2$.

Note that, if $d_A(\lambda_1,\lambda_0)\ge d_{A^c}(\lambda_1,\lambda_0)$, then, by (6.1) and by virtue of Lemma 6.1,
\[
\begin{aligned}
u_A&\ge\min\{u_{0A}nd_A^2(\lambda_1,\lambda_0),\ u_{1A}nd_A(\lambda_1,\lambda_0)\}
\ge nd_A(\lambda_1,\lambda_0)\times\min\{u_{0A}d_A(\lambda_1,\lambda_0),\ u_{1A}\}\\
&\ge\frac{1}{2}nm_1\|\lambda_1-\lambda_0\|_1\times\min\Big\{\frac{1}{2}u_{0A}m_1\|\lambda_1-\lambda_0\|_1,\ u_{1A}\Big\}\\
&\ge K_An\|\lambda_1-\lambda_0\|_1\times\min\{\|\lambda_1-\lambda_0\|_1,\ m_1\},
\end{aligned}
\]
for $K_A$ a positive constant small enough, depending only on $\alpha$, $M_{\lambda_0}$, $m_1$ and $m_2$ (we have used that $d_A(\lambda_1,\lambda_0)\ge\|\lambda_1-\lambda_0\|_{\mu_n}/2\ge m_1\|\lambda_1-\lambda_0\|_1/2$). Similarly, if $d_A(\lambda_1,\lambda_0)<d_{A^c}(\lambda_1,\lambda_0)$,
\[
u_{A^c}\ge\frac{1}{2}nm_1\|\lambda_1-\lambda_0\|_1\times\min\Big\{\frac{1}{2}u_{0A^c}m_1\|\lambda_1-\lambda_0\|_1,\ u_{1A^c}\Big\}
\ge K_{A^c}n\|\lambda_1-\lambda_0\|_1\times\min\{\|\lambda_1-\lambda_0\|_1,\ m_1\},
\]
for $K_{A^c}$ a positive constant small enough, depending only on $\alpha$, $M_{\lambda_0}$, $m_1$ and $m_2$. Now, we set
\[
\phi_{\lambda_1}=\phi_{\lambda_1,A}(u_A)\mathbf{1}_{\{d_A(\lambda_1,\lambda_0)\ge d_{A^c}(\lambda_1,\lambda_0)\}}+\phi_{\lambda_1,A^c}(u_{A^c})\mathbf{1}_{\{d_A(\lambda_1,\lambda_0)<d_{A^c}(\lambda_1,\lambda_0)\}},
\]

so that, with $K=\min\{K_A,K_{A^c}\}$, by using (6.5),
\[
\begin{aligned}
E_{\lambda_0,\Gamma_n}^{(n)}[\phi_{\lambda_1}]&=E_{\lambda_0,\Gamma_n}^{(n)}[\phi_{\lambda_1,A}(u_A)]\mathbf{1}_{\{d_A(\lambda_1,\lambda_0)\ge d_{A^c}(\lambda_1,\lambda_0)\}}+E_{\lambda_0,\Gamma_n}^{(n)}[\phi_{\lambda_1,A^c}(u_{A^c})]\mathbf{1}_{\{d_A(\lambda_1,\lambda_0)<d_{A^c}(\lambda_1,\lambda_0)\}}\\
&\le 2e^{-u_A}\mathbf{1}_{\{d_A(\lambda_1,\lambda_0)\ge d_{A^c}(\lambda_1,\lambda_0)\}}+2e^{-u_{A^c}}\mathbf{1}_{\{d_A(\lambda_1,\lambda_0)<d_{A^c}(\lambda_1,\lambda_0)\}}\\
&\le 2\exp\big(-Kn\|\lambda_1-\lambda_0\|_1\times\min\{\|\lambda_1-\lambda_0\|_1,\ m_1\}\big).
\end{aligned}
\]
If $\|\lambda-\lambda_1\|_1<\xi\|\lambda_1-\lambda_0\|_1$, with $\xi=m_1(1-\alpha)/[4m_2(1+\alpha)]$, then
\[
\|\lambda-\lambda_1\|_{\mu_n}\le\frac{1-\alpha}{4(1+\alpha)}\|\lambda_1-\lambda_0\|_{\mu_n}
\]
and Lemma 6.1 shows that
\[
E_{\lambda,\Gamma_n}^{(n)}[1-\phi_{\lambda_1}]\le 2e^{-u_A}\mathbf{1}_{\{d_A(\lambda_1,\lambda_0)\ge d_{A^c}(\lambda_1,\lambda_0)\}}+2e^{-u_{A^c}}\mathbf{1}_{\{d_A(\lambda_1,\lambda_0)<d_{A^c}(\lambda_1,\lambda_0)\}}
\le 2\exp\big(-Kn\|\lambda_1-\lambda_0\|_1\times\min\{\|\lambda_1-\lambda_0\|_1,\ m_1\}\big),
\]
which completes the proof of Lemma 5.1.

Proof of Lemma 6.1. We only consider the case where $d_A(\lambda_1,\lambda_0)\ge d_{A^c}(\lambda_1,\lambda_0)$; the case $d_A(\lambda_1,\lambda_0)<d_{A^c}(\lambda_1,\lambda_0)$ can be dealt with using similar arguments. So, we assume that $d_A(\lambda_1,\lambda_0)\ge d_{A^c}(\lambda_1,\lambda_0)$. On $\Gamma_n$ we have
\[
\begin{aligned}
\int_A[\lambda_1(t)-\lambda_0(t)]Y_t\,dt&\ge n(1-\alpha)\int_A[\lambda_1(t)-\lambda_0(t)]\mu_n(t)\,dt
\ge\frac{n(1-\alpha)}{2}\|\lambda_1-\lambda_0\|_{\mu_n}\\
&\ge 2n(1+\alpha)\|\lambda-\lambda_1\|_{\mu_n}
\ge 2n(1+\alpha)\int_A|\lambda(t)-\lambda_1(t)|\mu_n(t)\,dt\ge 2\int_A|\lambda(t)-\lambda_1(t)|Y_t\,dt.
\end{aligned}
\]

Therefore,
\[
\begin{aligned}
E_{\lambda,\Gamma_n}^{(n)}[1-\phi_{\lambda_1,A}(u_A)]&=P_{\lambda,\Gamma_n}^{(n)}\bigg(N(A)-\int_A\lambda(t)Y_t\,dt<\rho_n(u_A)+\int_A(\lambda_0-\lambda)(t)Y_t\,dt\bigg)\\
&=P_{\lambda,\Gamma_n}^{(n)}\bigg(N(A)-\int_A\lambda(t)Y_t\,dt<\rho_n(u_A)-\int_A(\lambda_1-\lambda_0)(t)Y_t\,dt+\int_A(\lambda_1-\lambda)(t)Y_t\,dt\bigg)\\
&\le P_{\lambda,\Gamma_n}^{(n)}\bigg(N(A)-\int_A\lambda(t)Y_t\,dt<\rho_n(u_A)-\frac{1}{2}\int_A(\lambda_1-\lambda_0)(t)Y_t\,dt\bigg).
\end{aligned}
\]


Assume that $\|\lambda_1-\lambda_0\|_{\mu_n}\le 2M_n(\lambda_0)$. This assumption implies that $d_A(\lambda_1,\lambda_0)\le\|\lambda_1-\lambda_0\|_{\mu_n}\le 2M_n(\lambda_0)\le 2m_2M_{\lambda_0}$. Since $v(\lambda_0)=(1+\alpha)M_n(\lambda_0)$, with $u_A=u_{0A}nd_A^2(\lambda_1,\lambda_0)$, where $u_{0A}\le 1$ is a constant depending on $\alpha$, $m_1$ and $m_2$ chosen later, we have
\[
\rho_n(u_A)\le nd_A(\lambda_1,\lambda_0)\sqrt{2u_{0A}(1+\alpha)M_n(\lambda_0)}+\frac{u_{0A}nd_A^2(\lambda_1,\lambda_0)}{3}\le K_1\sqrt{u_{0A}}\,nd_A(\lambda_1,\lambda_0)
\]
as soon as $K_1\ge[2(1+\alpha)M_n(\lambda_0)]^{1/2}+2M_n(\lambda_0)\sqrt{u_{0A}}/3$. Note that the definition of $v(\lambda)$ in (6.3) gives
\[
\begin{aligned}
v(\lambda)&=(1+\alpha)\int_\Omega\lambda_0(t)\mu_n(t)\,dt+(1+\alpha)\int_\Omega[\lambda(t)-\lambda_0(t)]\mu_n(t)\,dt\\
&\le v(\lambda_0)+(1+\alpha)\|\lambda-\lambda_0\|_{\mu_n}
\le v(\lambda_0)+(1+\alpha)\big[\|\lambda-\lambda_1\|_{\mu_n}+\|\lambda_1-\lambda_0\|_{\mu_n}\big]\\
&\le v(\lambda_0)+\frac{5+3\alpha}{4}\|\lambda_1-\lambda_0\|_{\mu_n}\le C_1,
\end{aligned}
\]

where $C_1$ only depends on $\alpha$, $M_{\lambda_0}$, $m_1$ and $m_2$. Combined with (6.4), this implies that, on $\Gamma_n$, if $K_1\le(1-\alpha)/[4\sqrt{u_{0A}}]$, which is true for $u_{0A}$ small enough,
\[
\frac{1}{2}\int_A(\lambda_1-\lambda_0)(t)Y_t\,dt-\rho_n(u_A)\ge\frac{(1-\alpha)n}{2}d_A(\lambda_1,\lambda_0)\bigg[1-\frac{2K_1\sqrt{u_{0A}}}{1-\alpha}\bigg]
\ge\frac{(1-\alpha)n}{4}d_A(\lambda_1,\lambda_0)\ge\sqrt{2nC_1r}+\frac{r}{3}\ge\sqrt{2nv(\lambda)r}+\frac{r}{3},
\]
with
\[
r=n\min\bigg\{\frac{(1-\alpha)^2}{128C_1}d_A^2(\lambda_1,\lambda_0),\ \frac{3(1-\alpha)}{8}d_A(\lambda_1,\lambda_0)\bigg\}.
\]
Inequality (6.2) then leads to
\[
E_{\lambda,\Gamma_n}^{(n)}[1-\phi_{\lambda_1,A}(u_A)]\le 2e^{-r}. \tag{6.6}
\]

For $u_{0A}$ small enough, only depending on $M_{\lambda_0}$, $\alpha$, $m_1$ and $m_2$, we have
\[
\frac{1-\alpha}{4\sqrt{u_{0A}}}\ge\sqrt{2(1+\alpha)M_n(\lambda_0)}+\frac{2M_n(\lambda_0)\sqrt{u_{0A}}}{3},
\]
so (6.6) is true. Since $r\ge u_A$ for $u_{0A}$ small enough, we then obtain
\[
E_{\lambda,\Gamma_n}^{(n)}[1-\phi_{\lambda_1,A}(u_A)]\le 2e^{-u_A}.
\]

Assume now that $\|\lambda_1-\lambda_0\|_{\mu_n}>2M_n(\lambda_0)$. We take $u_A=u_{1A}nd_A(\lambda_1,\lambda_0)$, where $u_{1A}\le 1$ is a constant depending on $\alpha$ chosen later, and we still consider the same test $\phi_{\lambda_1,A}(u_A)$. Observe now that, since $d_A(\lambda_1,\lambda_0)\ge\frac{1}{2}\|\lambda_1-\lambda_0\|_{\mu_n}\ge M_n(\lambda_0)$,
\[
\begin{aligned}
\rho_n(u_A)&=\sqrt{2nu_Av(\lambda_0)}+\frac{u_A}{3}
\le n\sqrt{2(1+\alpha)u_{1A}M_n(\lambda_0)d_A(\lambda_1,\lambda_0)}+\frac{nu_{1A}}{3}d_A(\lambda_1,\lambda_0)\\
&\le\bigg[\sqrt{2(1+\alpha)}+\frac{1}{3}\bigg]n\sqrt{u_{1A}}\,d_A(\lambda_1,\lambda_0)
\end{aligned}
\]
and, under the assumptions of the lemma,
\[
v(\lambda)\le(1+\alpha)M_n(\lambda_0)+(1+\alpha)\big[\|\lambda-\lambda_1\|_{\mu_n}+\|\lambda_1-\lambda_0\|_{\mu_n}\big]\le C_2d_A(\lambda_1,\lambda_0), \tag{6.7}
\]

where $C_2$ only depends on $\alpha$. Therefore,
\[
\begin{aligned}
\frac{1}{2}\int_A(\lambda_1-\lambda_0)(t)Y_t\,dt-\rho_n(u_A)
&\ge\frac{n(1-\alpha)}{2}\int_A[\lambda_1(t)-\lambda_0(t)]\mu_n(t)\,dt-\bigg[\sqrt{2(1+\alpha)}+\frac{1}{3}\bigg]\sqrt{u_{1A}}\,nd_A(\lambda_1,\lambda_0)\\
&\ge\bigg[\frac{1-\alpha}{2}-\bigg(\sqrt{2(1+\alpha)}+\frac{1}{3}\bigg)\sqrt{u_{1A}}\bigg]nd_A(\lambda_1,\lambda_0)
\ge\frac{1-\alpha}{4}nd_A(\lambda_1,\lambda_0),
\end{aligned}
\]
where the last inequality is true for $u_{1A}$ small enough, depending only on $\alpha$. Finally, using (6.7), since $u_A=u_{1A}nd_A(\lambda_1,\lambda_0)$, we have
\[
\frac{1-\alpha}{4}nd_A(\lambda_1,\lambda_0)\ge\sqrt{2nC_2d_A(\lambda_1,\lambda_0)\,u_{1A}nd_A(\lambda_1,\lambda_0)}+\frac{1}{3}u_{1A}nd_A(\lambda_1,\lambda_0)\ge\sqrt{2nv(\lambda)u_A}+\frac{u_A}{3}
\]
for $u_{1A}$ small enough, depending only on $\alpha$. We then obtain
\[
E_{\lambda,\Gamma_n}^{(n)}[1-\phi_{\lambda_1,A}(u_A)]\le 2e^{-u_A},
\]
which completes the proof.

References

Aalen, O. (1978). Nonparametric inference for a family of counting processes. Ann. Statist., 6: 701–726.

Andersen, P. K., Borgan, Ø., Gill, R. D., Keiding, N. (1993). Statistical Models Based on Counting Processes. Springer Series in Statistics. Springer-Verlag, New York.

Belitser, E., Serra, P., van Zanten, J. H. (2013). Rate optimal Bayesian intensity smoothing for inhomogeneous Poisson processes. http://arxiv.org/pdf/1304.6017v2.

Brunel, E., Comte, F. (2005). Penalized contrast estimation of density and hazard rate with censored data. Sankhyā, 3: 441–475.

Brunel, E., Comte, F. (2008). Adaptive estimation of hazard rate with censored data. Comm. Statist. Theory Methods, 37: 1284–1305.

Comte, F., Gaïffas, S., Guilloux, A. (2011). Adaptive estimation of the conditional intensity of marker-dependent counting processes. Ann. Inst. Henri Poincaré Probab. Stat., 47: 1171–1196.

Daley, D. J., Vere-Jones, D. (2003). An Introduction to the Theory of Point Processes. Vol. I: Elementary Theory and Methods, 2nd Edition. Probability and its Applications (New York). Springer-Verlag, New York.

Daley, D. J., Vere-Jones, D. (2008). An Introduction to the Theory of Point Processes. Vol. II: General Theory and Structure, 2nd Edition. Probability and its Applications (New York). Springer, New York.

de Boor, C. (1978). A Practical Guide to Splines. Springer, New York.

Gaïffas, S., Guilloux, A. (2012). High-dimensional additive hazards models and the Lasso. Electron. J. Stat., 6: 522–546.

Ghosal, S., Ghosh, J. K., van der Vaart, A. W. (2000). Convergence rates of posterior distributions. Ann. Statist., 28: 500–531.

Ghosal, S., van der Vaart, A. (2007). Convergence rates of posterior distributions for noniid observations. Ann. Statist., 35: 192–223.

Groeneboom, P. (1985). Estimating a monotone density. In: Proceedings of the Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer. Wadsworth Statist./Probab. Ser., Wadsworth, Belmont, CA, pp. 539–555.

Hansen, N. R., Reynaud-Bouret, P., Rivoirard, V. (2012). Lasso and probabilistic inequalities for multivariate point processes. arXiv:1208.0570.

Härdle, W., Kerkyacharian, G., Picard, D., Tsybakov, A. (1998). Wavelets, Approximation, and Statistical Applications. Vol. 129 of Lecture Notes in Statistics. Springer-Verlag, New York.

Ishwaran, H., James, L. F. (2004). Computational methods for multiplicative intensity models using weighted gamma processes: proportional hazards, marked point processes, and panel count data. Journal of the American Statistical Association, 99: 175–190.

Karr, A. F. (1991). Point Processes and Their Statistical Inference, 2nd Edition. Vol. 7 of Probability: Pure and Applied. Marcel Dekker Inc., New York.

Kottas, A., Sansó, B. (2007). Bayesian mixture modeling for spatial Poisson process intensities, with applications to extreme value analysis. J. Statist. Plann. Inference, 137: 3151–3163.

Kuo, L., Ghosh, S. K. (1997). Bayesian nonparametric inference for nonhomogeneous Poisson processes. Tech. rep., University of Connecticut, Department of Statistics.

Lo, A., Weng, C.-S. (1989). On a class of Bayesian nonparametric estimates: II. Hazard rate estimates. Annals of the Institute of Statistical Mathematics, 41: 227–245.

Lo, A. Y. (1982). Bayesian nonparametric statistical inference for Poisson point processes. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete, 59: 55–66.

Reynaud-Bouret, P. (2006). Penalized projection estimators of the Aalen multiplicative intensity. Bernoulli, 12: 633–661.

Rivoirard, V., Rousseau, J. (2012). Posterior concentration rates for infinite dimensional exponential families. Bayesian Analysis, 7: 311–334.

Salomond, J. B. (2013). Concentration rate and consistency of the posterior under monotonicity constraints. Tech. rep., Univ. Paris Dauphine.

Williamson, R. E. (1956). Multiply monotone functions and their Laplace transforms. Duke Math. J., 23: 189–207.

Wong, W. H., Shen, X. (1995). Probability inequalities for likelihood ratios and convergence rates of sieve MLEs. Ann. Statist., 23: 339–362.