Bin I'.UillOUU, and Jon A. Wellnerexplicit calculations for two special cases: case cohort studies and exposure-stratifiedcase-cohort studies. Details of the calculations in Section

Bin I'.UillOUU, and Jon A. Wellner

TECHNICAL REPORT No. 378

August 2000

1l""Hu·tm"nt of Statistics

Box 354322

Universiltv of Washington

;:)caun:, Wlllsllington, 98195 USA

Information Bounds for Regression Models withMissing Data

Bin Nan. Mary Emond; and Jon A. \Vellner2

University of Washington

August 25, 2000

Abstract

In this paper we revisit the information bound calculations in Robins, Rotnitzky, and Zhao(1994) and Robins, Hsieh, and Newey (1995) for regression models with missing covariates. Wepresent an approach to their calculations based on score operators which verifies the boundsestablished in sections 2-5 of RRZ and in section 3 of RHN (up to typos in the case of RHN).In the case of the Cox regression model for survival data treated in RRZ section 8, we obtaindifferent results than those of RRZ. The integral equation we present for the least favorabledirection simplifies to the known information bound for the Cox model with complete data.

\Ne give examples of our information calculations for case-cohort designs and errors invariables regression models for survival data.

1 Introduction.

missing data been the intense over the decade. In particular,li:tllUI,ll(:!,l"K paper ROTNITZKY AND ZHAO ( RRZ) nr(,vi,'1p<:: thf"oretl,c:al

results for information bounds in some covariates mIssIngat random. RRZ studied extensively the case where the model for the complete data isrestricted only by specification of its mean, conditional on covariates. They then gave a morebrief treatment of the case where the full data model is the Cox regression modeL In related work,ROBINS, HSIEH, AND NEWEY (1995) (hereafter RHN) provided information bounds for classicalregression models with missing covariate data. Meanwhile, case-cohort and stratified case-cohortdesigns have become increasingly important and popular in epidemiology since the basic work ofPRENTICE (1986) and SELF AND PRENTICE (1988). These study designs correspond to missing datamodels, since complete data are collected only on a sub-sample of the study cohort. The currentlyused estimators for these designs are not known to be efficient, being based on pseudo-likelihoodsor various ad-hoc estimating equations. Because of the sheer volume of studies using these designs,it is becoming increasingly important to better understand:

(1) \Vhat are the information bounds for these types of designs and models?

(2) How much information is being lost by use of the inefficient estimators.

(3) Is it possible to construct reasonable, easily computable estimators which achieve theinformation bounds?

Our goal here is to address parts of the first two of these issues by revisiting the results of RRZand RHN with the following aims:

• To present proofs of the RRZ information bound result for the restricted mean model, and theRHN information bound for classical regression models using an alternative approach basedon score operators (as opposed to the focus on influence functions in RRZ and RHN).

• To use this alternative approach to obtain new information bounds for censored survival datamodels.

• To illustrate the new bounds with explicit calculations for case-cohort and stratified casecohort (or "errors-in-variables") versions of the Cox modeL

\Ve believe this alternative approach focussed on scores and score operators is simpler than thatRRZ and additional insights. this approach leads us to bound in the case

of data in the bound by RRZ. The mt,:::gri'llfor

mtOrJrl1atlCll1 bounds SASIENI, SASIENI came seven or eight yearsdeveJlop1l11ient of estimators for "partly linear" extensions of the Cox model in HUANG

Construction of estimators for 2.3 based on the calculations in Section 5will be by the see NAN (2001).

relllaln(1er of the paper is as follows: Section 2 the background somegenelral results for missing data models. The three key examples for this paper are also introducedin Section 2. Section 3 presents our approach to efficiency calculations for the restricted meanregression model treated by RRZ (1994). Our results in this section confirm those ofRRZ. [Sections2 and 3 were developed by EMOND AND WELLNER (1995); these were used by NAN (2001) as a startingpoint in his study of Example 2.3.J Section 4 presents our approach to efficiency calculations forthe classical semiparametric regression model treated by RHN. Our results in this section confirmthose of RHN (to within several typographical errors). Section 5 presents our general results oninformation bounds for the Cox model with missing covariates. The results in this section seem todiffer from those obtained by RRZ. In Section 6 we exemplify the general results of Section 5 withexplicit calculations for two special cases: case cohort studies and exposure-stratified case-cohortstudies. Details of the calculations in Section 6 are given in the Appendix (Section 7).

While our focus here is on information bounds rather than on construction of estimators, wecomment briefly here on work on the estimation side of the problem. Most of the recent workon estimators for missing data in the Cox model focus on improvements of the pseudo-likelihoodestimators of SELF AND PRENTICE (1988); see e.g. BORGAN, LANGHOLZ, SAMUELSEN, GOLDSTEIN,AND POGODA (2000), CHEN AND Lo (1999), and CHATTERJEE AND BRESLOW (2000). For simplerregression models with only one infinite-dimensional nuisance parameter (namely the distributionof the covariate vector) there is a considerable body of recent work on the computation and behaviorof (semiparametric) MLE's; see e.g. SCOTT AND WILD (1997), SCOTT AND WILD (1998), LAWLESS,VVILD, AND KALBFLEISCH (1999), McNENEY, VAN DER VAART, AND VVELLNER (2000), and BRESLOW,McNENEY, AND WELLNER (2000).

2 Missing data: background and some general facts.

2.1 A general approach and examples

\Ve calculate information bounds in this article by projecting the score for our parameter of interestonto the orthocomplement of the nuisance parameter tangent space. We first give an intuitiveoverview of this approach along with the specific models for which we provide information boundcalculations. The general setup in this article is as follows: we suppose that UO is a random vectorwith distribution Q in the model Q: the or . The or"full data" model Q may but in our ex,am,ph~s

will be selni~)ararrletJric:

interest

theij.d.

is toTheon observation of

parameter and '/ is a nuisance parallleterestimation of e when '/ is unknown

eE8c isinformation bound

as U'"" E P.Central to both our Q, model that would

hold if there were no random allcomponents are always observed. Following RRZ, we refer to model Q as the model for the "fu11"data and to model P as the model for the "observed" data.

Example 2.1. restricted mean regression model of RRZ.) The observed data consist of aresponse variable Y and covariate (X, V) E where X is sometimes missing. The probability thatX is missing does not depend on X (i.e. X is Missing At Random, or MAR). The variable R takesa value of either 1 or 0, according as X is observed or not observed, respectively. As in sections 2-5ofRRZ, it is assumed that Jr(Y,v) == P(R 11Y = Y, V = , is known and that Jr(Y, 2:. (j > O.The full data are UO (Y, X, V), and we use the notation (Y, RX, V, R) for the observed data, U.Hence,

R=lR 0

The sample spaces for Y, X, and V, denoted by Y, X and V, respectively, depend on the particularcontext but are allowed to be general (in particular the random variables Y, X, or V may bediscrete or continuous). A case of particular interest occurs when X is an expensive or hard-tomeasure covariate and V is a surrogate for this X. The covariate V may be discrete even if X iscontinuous. The model for the full data is the collection Q of all distributions on Y x X x V withdensities of the form

@,hh(Y'X, v) = 12(y - g(x, v; O)lx, v)h(x, v), (2.1 )

whereg(x, v; 0) = E(YIX x, V = v);

so that E== E(O) Y - g(X, V; 0) satisfies

oE 8 c (2.2)

E(EIX, V) = o. (2.3)

The regression e is of and the nuisance parameter TJ is thepair of densities (12, , the conditional density 12 of E X, V, and the density of (X,respectively. The model P for the observed data consists of all densities of the

.1:. 1 ./ :r.

,1oIlunatulg measure.

depends on S: P(R 1 = rr(8). ROBINS, HSIEH, AND NEWEY ( allow the function rrto be unknown. Here, for simplicity, we &<;sume rr is fixed and known. (Note Lemma 1 ofROBINS, AND NEWEY ( which asserts that the information bounds are the same in thetwo

The full data are UO = (Y, X), and we use the notation U for observed data, U(Y, X, R)R(8, R)l-R. (Note that this fits in the framework of Section 2 if we include S in Un:with UO = (Y, X, S), U (Y, X, S, R)R(RY, RX, 8, R)l-R; but the notation is simpler with UO aschosen The sample spaces for Y, X, and S denoted by Y, X, and 5, respectively, depend onthe particular context but are allowed to be general (in particular the random variables Y, X, orS may be discrete or continuous). A case of particular interest occurs when X is an expensive orhard-to-measure covariate and S is a surrogate for this X. The covariate S may be discrete evenif X is continuous. Another case of interest (see e.g. LAWLESS, 'WILD, AND KALBFLEISCH (1999),SCOTT AND WILD (1998), and BRESLOW, McNENEY, AND WELLNER (2000)) occurs when S is discretewith S I:f=lj1{(Y,X) E 5 j } for a partition {5j }f=1 of Y x X.

The model for the full data is the collection Q of all distributions on Y x X with densities ofthe form

where

qo,g(Y,x) = f(ylx;B)g(x) ,

{f(ylx; B): BEe c JRP}

(2.5)

(2.6)

is a regular parametric family of densities with respect to a dominating measure v for each fixedx E X, and 9 is an arbitrary density of x with respect to some dominating measure Ji. Theregression parameter B is the parameter of interest, and the nuisance parameter T/ is the pair ofdensity g.

The model P for the observed data consists of all densities of the form

PO,g(U) = (rr(s)qo,g(Y, x)r ((1 -rr(s))J J. . qe,g(y, X)dJi(X)dV(Y)) l-r (2.7)(y,x):S(y.x)=s

whereu = (y, x, ryr(s, r)l-r, r E {O, I}, and Ji and v are dominating measures.

Example 2.3. (The Cox model with missing covariates.) Let T be a failure time, C be a censoringtime, and Z (X, V) E be a covariate vector which is not time dependent. X is missing atrandom, while Y == T /\ C, Do == l(T :::; C'), V, and R are always observed, where R is an indicatorof missingness &<; in Example 2.1. The full data are UO = (Y, Do, X, V), and the observed dataare U = (Y, Do, RX, V, R) in the general notation introduced above. Note that X may be missingby design also for Example 2.1), as in studies. In a study (Y, Do, IS

nh",pl'1TP,rJ for in 1 of the 2, X is on a subsample thesul)sBllll'ple may on what

covariate Z = Let

where A is 1\. with cienSJlty 9 where

---,:--'--- =1

and lc

AC(t()

dt.

non-i.nt,onnati\re censoring). LetvVe assume that T and C are conditionally independent ZZ rv H with density h. Then Q is the set of all ciells1t:les of the form

Or X ,

[f(Y r. \ dt] <5 [g(y IlCY,co;

(A(y)e01z) <5 exp ( _eO'

z 1$y)) (AC(Y I

r.. f(ti dt]1-0lCY,co)

1-<5 exp (-Ac(Y I z)) h(z). (2.8)

The regression parameter () is the parameter of interest, and the nuisance parameter '/ is (A, AC) h).This is basically the model introduced by Cox (1972). The model P for the observed data is theset of all distributions with densities of the form

p(r,y,o,(r ·x),v) = (Jr(y,rS,v)q(y,rS,x,v)r ((1-Jr(y,rS,v)) Jq(y,O,x,V)dfL(X)) 1-r (2.9)

where Jr(Y, Do, V) = P(R = 1 IUP), with Jr(Y, Do, V) 2: a > 0, and fL is a dominating measure on X.o

Now we give a brief introduction to efficiency calculations. For more details see BICKEL,

KLAASSEN, RITOV, AND 'WELLNER (1993).The information bound for estimation of e in the model P is equal to 1;-1. Here, If; is the

efficient information matrix for e in P, given by

1* E (1*I*T) E.p(I~02$0= P 00 u (2.10)

where Ie is the efficient score for e. The efficient score for e in a model P = {POrry : e E e,,] E H}with nuisance parameters TI is the (componentwise) projection of the vector of scores to for eonto the orthogonal complement of the (closure of the linear of all scores for the nuisanceparameters, PI). Intuitively, when T/ is unknown, information about e can only come from thatcomponent of to that is statistically independent of variability in the data controlled by the nuisanceparameter. This component is leo Formally, each component of the efficient score for e is orthogonalto of the score for the nuisance where is relative to

Ep the space of all mean-zero square

are assumed known, respectively. Analogous definitions hold for Qe and (1)' Then Po + P1) C P,and we may assume for our purposes that Po + P11 P; see BKR\V, page 76, for a discussion.orthogonality condition described above for lo is

(2.1

i.e. = 0 for all bE Pry. Thus, our approach to calculating lo and the resulting efficiencybound is to project io onto the orthocomplement P,f of PrJ in L~(P):

ro= II(ioIP·t), (2.12)

where II denotes the projection operator; see e.g. BICKEL, KLAASSEN, RITOV, AND WELLNER (1993),

Appendix A.2. (Here the ..L (orthogonal complement) denotes the ortho-complement in Lg(P) orLg(Q), depending on the context.) The space Pf is of further importance, because it contains all

influence functions for all regular estimators of e in P. Note that lo IIo(ioIPf) = b* if and only if

(b, io - b*) = 0 for all b E P;. (2.13)

This implies that we can find the desired projection by proposing a guess b* for lo and then showingthat (2.13) holds. This requires some understanding of Pf. However, this last requirement can be

relaxed somewhat. Since ioE P Po + P1), we have

II(ioIP.~) = II(iolP n p;) = II(iol.lVl n p;)

for any (closed) subspace .M such that P C M c Lg(P). So it is sufficient to be able to identifyall b in some set M n P-;" which might not be all of P¢. Note that Pn Pf c M n P~ c P¢. We

will identify an "intermediate" set K = A1 n P¢' and then knowing lo E K will give us a generalform for fo' An expression for lo will be obtained by finding the specific element of Kthat is theprojection of io onto K. The next subsection provides the formal results necessary to carry out thisapproach.

R.

2.2 Some facts about missing data

Throughout this subsection we take the complete data to be UO (UP, r-v Q E Q, with U~

Missing At Random (MAR) in the observed data. R is an indicator of whether U~ is observed, as inExamples 2.1 2.3, with P(R 1 P(R = 1 I with:2: 0' > O. If 7l' 7l'J IS

allowed to unknown via parameters E we denote the model for the7l'. The observed data are U , R)l-R ==

U. space for Q x R cantailgEmt space Q n Q x R is

parameters, R would be included in2.2 below.

Q would be replaced Q+ R in Lemmas 2.1o

For a E the operator A :

Lemma 2.1. {Aa(U) : a E Q} c P, and the adjoint AT : Lg Lg(Q) of A is byATb(UO) E{b(U)IUO} for bE Lg(P).

Proof. This is the content of BICKEL, KLAASSEN, RITOV, AND "WELLNER (1993), section 6.6, pages271-272; see also VAN DER VAART (1998), Lemma 25.34, page 375. However, it is instructive toreview here the properties of conditional expectation that establish AT:

(b,Aa)Lg(p) E{b(U)E(a(UO)IU)} = E{E(b(U)a(UO)IU)}

= E{b(U)a(UO)} EE{b(U)a(UO)IUO}

E{a(UO)E(b(U)IUO)} = (E(b(U)IUO), a(UO)) Lg(Q)

(ATb, ahg(Q) .

Note that E{ATb(UO)} = 0 and E{[ATb(U°)F} <x by the conditional version of Jensen'sinequality since b E Lg(P). Thus ATb(UO) E Lg(Q). 0

The next lemma is central to our approach since it identifies P); once Q* is known.

Lemma 2.2. Suppose that r.(UP) ;:::: (T > O. Then:(i) Range(A) is closed (and so is Range(AI Q) or the range of A restricted to any other closedsubspace of Lg(Q)).(ii) Let bE Lg(P). Then bE P); if and only if ATb E Q* .Proof. To show that (i) holds, first note that A-I exists, and is given on Range(A) by

A-Ib(Uo) = E{Rb(U)IUO}.r.

Since r.(UP) ;:::: (T, it follows that

{ R bfU\ r/2

- \ Jr.

< .!. {E{E[b(U)i }}(T

< .!.{ }}(T

1(T

Hence to prove b .L PrJ we need to show that

Lz(P) = 0 for all a E Qry.

But this equality holds if and only if

=0 for all

k(U)

T . il.e. AbE Q-ry. 0

Now suppose that Q* is known. Then for the subspace M in the discussion earlier in this

section we will take Pry + J( where J( consists of the closed subspace of all functions k(U) of theform

where ((UO) E Q*. Moreover, let J be the set of all .j (U) such that

j(U)

where ((UO) E Q* and 4>(UP) is any function in Lg(Pup). The set J is discussed in a generalcontext by VAN DER VAART (1998), page 383, and in the setting of Example 2.1 by RRZ. As notedby RRZ, the particular function 4>(UP) in the definition of k yields the smallest variance for a givenfunction (.

The next lemma and three propositions characterize J and J( in terms of P and P;' and showthat J( has the desired properties. These propositions form the basis for our specific informationbound calculations for Examples 2.1 - 2.3 in the sections to follow.

The next lemma shows that every b = Aa E Lg(P) for a E Lg(Q) can be decomposed into theform (Rj'iI) (ATA)a II((Rj1r)(ATA)aIJ(2)) where J(2) is the subspace of Lg(P) with form of thesecond term in the definition of the class J.

Lemma 2.3. Suppose a(UO) E Lg(Q). Then Aa.L R-;;r. ¢(UP) for all a, 4> E Lg(Q); equivalently

R TAa = -(A A)1r

,} ,

where

E }.

Proof. pr().1e(~tlOin IS

ortholSon.allty rE;latl0l1lshllP IS

all the assumption of MAR.

Proposition 2.1. K c J cProof: K J is obvious. By Lemma 2.2, to show J c

.1 E . This holds since

it is suthclient to show that .1 E J

.1)

by assumption.

Proposition 2.2. P~ n P c K.

Proof. Suppose that

o

b E P~ n PCP = { Aa : a E Q} .Then we know that b Aa for some a E Q. Therefore,

b Aa = R (ATA)(a) + Aa _ R (ATA)(a) = R (ATA)(a)~ ~ ~

(2.15)

by Lemma 2.3. Because Aa E Prf, applying Lemma 2.2 gives (ATA)(a) E Q*. Thus there existssome «(UO) E L8(Q) such that (ATA)(a) = «(UO). Therefore,

b= R(_rr(R(IJ(2)).~ ~

It is easily shown that

rr (R (IJ(2)) = R - ~E(IUf).~ ~

Hence bE K. To see that (2.16) holds, note that

rr (R (IJ(2)) = R- ~71 7r

(2.16)

for some E L2(Q). For this we have

i.e. all

o

R R ~-(---0~ ~

} }

and implies

by nested conditioning using = 0

Remark: From influence function perspective of RRZ, any function b E P~ is an mtJlueGcefunction for estimation of () in the model P for the observed data, and the decomposition of b givenin (2.15) shows how b Aa E P~ is related to the influence function ATAa for estimation of () inthe model Q for the complete data. Note that (ATA)a is then the influence function of aninverse-weighted estimator of () in P of the basic type proposed by HORVITZ AND THOMPSON (1952).

Lemma 2.4. Given a subspace /v1 of Lg(P) there is a unique projection map II(·\A1) from Lg(P)onto .;\.1. In particular:

A. For a* E Pro hE Lg(P) we have a* = II(hIPT)) if and only if

(h a*, a) = 0 for all a E PT)'

B. For b* E Pt, hE Lg(P) we have b* = II(hIPt) if and only if

'.1(b, h - b*) 0 for all b E PrJ .

C. For b* E Pt n P, hE P, we have b* = II(hIP~ n p) if and only if

(b, h b*) = 0 for all b E Pt n P.

D. Suppose that h E .;\.1 with P C J\.1 c Lg(P), M a closed subspace. Then for b* E Pt n M,

b* = II(hiPt n M) if and only if

(b, h b*) = 0 for all b E Pt n A1 .

E. For hE P, the projection II(hlPrf n .;\.1) = II(hiPt n p) E P.Proof. Parts A - D are simply restatements of the basic Hilbert space projection theorem: seeRUDIN (1966), page 79, and BKRW (1993), Theorem 1 in section A.2, page 425. To prove E,note that since h E P we may write h + a2 some Cl, a2 E P

r"and it follows

II(hiPt n p) = But then we have

h =0

for all b E J\A, and this Iml)lIE~s, by D,

Jvt)

toHowmg prClposItJlOn IS

Proposition 2.3.

t'rclpO:SItH)IlS 2.

e

3 The Restricted l\!Iean Regression l\!Iodel with l\tIissingCovariates

Now we apply the of Section 2 to mean reE:re~;sioin

in Example 2.1. vVe that le is by the solution to a particular Fredholm integral equationas shown in RRZ equations (23) and . Our proof is simpler than of RRZand, hence, more easily understood and adapted to other models, as we show in Section 4. In thissection TJ (12, , and we use Q2 and Q3 to denote the tangent spaces corresponding to handh, respectively. Hence, Q17 = (Q2 + Q3) in this section. Recall that our strategy is to compute theefficient score le as le II(ieIK), and that the collection K is completely determined once we knowQ*. Next, we identify the relevant tangent spaces and orthocomplements.

vVe calculate the tangent space Q in terms of the component spaces Qe, Q2, and Q3corresponding to (), h, and 13. The first of these is easily seen to be

J:/

Qe [i2] [-~~(EIX, V)\7eg(X, V;())], (3.1 )

where i~ = i~(Y, X, V) is the score function for () in the model Q, and [i~] is the linear span of thesescores in Lg(Q). For Q2 we have

(3.2)

where the last restriction comes from the assumption that E(EIX, V) = O. (That Q2 is contained inthe right side in (3.2) is easy to prove; containment of the right side of (3.2) in Q is more difficult,and tY'Pically involves some additional restrictions as in Example 3.2.3, BICKEL, KLAASSEN, RITOV,

AND 'WELLNER (1993), page 53. Also see the discussion in BICKEL, KLAASSEN, RITOV, AND \VELLNER

(1993), page 76. As in RRZ we will assume that equality holds in (3.2).) Finally,

(3.3)

Applying Lemma 2.1 to the above, we have

.. ~ . '. ~

P == [le] = [Rle + (1 - R)E(leIY,

and1

Proposition 3.1. for E

where

Proof. Let

To prove the result, we will show that Tb E (Q2 + Q3) -L and that b - Tb E (Q2 + Q3)' For any02 E Q2:

by (3.2). And, for any 0;) E Q3:

E( . ) = E {E(EbIX, V)03(X, V)E(EIX, V) } = 0Lg(Q) Tb03 (T2(X, V) ,

by (2.3). Now E(b - TbIX, V)shows the desired result.

Proposition 3.2.

o and E[(b - Tb)EIX, V] = 0, so b - Tb E Q2 C Q2 + Q3, whicho

Proof. Take 02 E Q2 and 03 E Q3. Then E[02C(X, V)EIX, V] = °and E[03C(X, V)EIX, V] 0, asin the proof of Proposition 3.1, which shows {c(X, V)E: E[E2C2(X, V)] < oo} C (Q2 + Q3)-L. Toshow the reverse inclusion, apply Proposition 3.1 to see that

E[c(X, V)E2 IX, V](TZ(X, V) E

c(X, V)E.

o

Using the characterization of Q* (Q2 + Q3)-L in Proposition 3.2, we now identify explicitlyfor Example 2.1 the sets J and K of Section 2.

X,V,R): x, V, R) = R c(X,Jr

V)IY, V]},

and

IS res'tricted

Proposition 3.3.

X.v. }.

E and taking conditional eXj)ectatjOIlssatisfies

Y,V, we

Proof of Proposition 3.3. By Lemma 2.4.D, k* is the desired projection if and only if

for all C = c(X, V) with E(C2E2 ) <x. Now the first inner product in this last display is, by Lemma2.1,

by MAR

k*

Ai~!Lg(p) (ATkc , i~!Lg(Q) = (c(X, V)E, i~!Lg(Q) = E[c(X, V)'Veg(X, V; B)]

since E[-EUVh)(E[X, V)IX, V] = 1. To compute the second inner product, first rewrite k* kc*

in terms of the operator A as

R R-7f ,-c*(X, V)E --E[EC*(X, V)IY, V]7f 7f

RC*(X, V) E + (1 _ R)E[E c*(X, V) IY, V]7f 7f

{ (I - R) + R - 7f} E[EC*(X, V)IY, V]7f 7f

A[c*(X, V)~] - 1 - 7f (X, V)EIY, V].7f 7f

Hence the second inner product may be computed as

--.UIL (X, V)EIY,

}

R-7f

--~UIL (X, V)EIY )

V)E

(X,

IR{\1T

A[c*(X, V) E=

Combining the two terms yields

V])(X,

- E{c(X,

+ E{c(X,'iT

E {C(X, [VOg(X, V· 0) - (X, V)E( -iX,1r

+ E{E1: IT E[c*(X, V)EiY, V])] } .

= E{c(X, V)veg(X, V;

1

lo

Because this is 0 for all c c(X, V) with EC2E2 < 00, we have

vOg(X, V; 0) -(

E

2

) {1 IT }(X, V)E IT IX, V + E E-:;;:-E[c*(X, V)EIY, V] = 0

almost surely; i.e. c* solves (3.5). 0

Proposition 3.4. If there is no missing data, i.e. IT(Y, v) = 1 for all y, v, then the solution of theintegral equation (3.5) is c*(X, V) = vOg(X, V;O)jE (£2IX, V), and hence the information for 0with no missing data is

1* = E{veg(X, V;O)02}o E(~IX,V)'

(3.7)

Proof. This follows immediately from (3.5). The information formula (3.7) agrees with the resultsof RRZ proposition 3.1. 0

4 The Classical Regression Model with Missing Covariates.

Now we apply the results of Section 2 to the semiparametric regression model studied by ROBINS,

HSIEH, AND NEWEY (1995).

For this model, Qry = Lg(G) where X '" G with density g, and hence

Thus for b E Lg

In this case the opl=rator A :

b - E(biX).

cE

Proposition 4.1. The efficient score Ie = II(ieIS Ie = == k* where is given

for the "rd''''''l''rt>rl model of Example 2.2

('0

= It Ie- .R 1)) (1- 'S' E (1 It ("I)-It -;-Ec 1)

and \vhere E(c* is the unique solution of

(4.2)

Proof. Computing as in the proof of Proposition 3.3, k* == ke• is the projection of ie onto p~ ifand only if

0= le-

for all ke E K. As in Section 3 we split up the calculation of the inner product in the previousdisplay: the first term is

On the other hand, by writing

* R * R - It .( *IS) ( * I) 1 - It (*1 )k = -c - --E c = A C lit - --E c S ,It It It

the second term is

(ke, k*) = (ke , A(c* lit) - 1 - ItE(c*IS)) LO(P\It 2 I

T R R-It 1-7f(A ke,c* - (-c(X, Y) - --E(c(X, Y)IS), --E(c*

7f 7f 7f

1-7f- E(c--E(c*!S)).It I

Combining we find, using the self-adjointness of the projection operator IICI Q* ), that

o Ia

c* 1 7f,T

rr 1T)

+ (1-

this holds for all bE Lg(Q), we conclude that

(c*) (1 -~+ 1TE - X + 1T --"E(c*IS)-1T 1T

{ (c* I) .(1 - 1T . *+1T E -X -E --E(c1T1 1T

IX))Ix)} .

Computing the conditional expectation given X across this equality, and using c* E Q.* = Lg(G)L,yields

O=E(c*jX) = E(1Ti~IX)+ E((1-1T)E(c*IS)IX)

+ {E (~ Ix) -E (1 1T 1T E(c*IS)lx) }

where 1T* == E(1TIX), and hence

E(i~IX,R 1) + E (1 1T 1T E(c*IS)IX, R = 1)E(i~1TIX) E ((1 1T)E(c*IS)IX)--"--*- + *1T 1T

_ {E (~ Ix) - E (1 : 1T E(c*IS)lx) } 0

Substitution of (4.4) into (4.3) yields

(4.4)

Letting

c* 1Ti~ + (1 - 1T)E(c*IS)

- 1T { E(i~IX, R = 1) + E (1 1T 1T E(c*IS)IX, R I)}1T (i~ - E(i~IX, R = 1)) + (1 1T)E(c*IS)

- 1TE (~E(C* IX,R 1) 0

7f

and taking conditional expectations given S across yields the equation

~E (Zoo= n . e R 1)

{E(_l)-,+ (1

R

our andformula should

rer)la(~eC1 by our(in their notation),

. Note the tYJ?ol:;ra,p.b,lccll errors in their

IX,V,.6. 1]+(1

'When S is discrete, as in the stratified two-phase sampling designs studied by LAWLESS, WILD,

AND KALBFLEISCH ( and BRESLOW, McNENEY. AND \VELLNER , the equation canbe solved explicitly in terms of certain matrices; see e.g. Sections 2 and 3 of BRESLOW, McNENEY,

AND WELLNER (2000).

5 The Cox model with missing data.

Example 2.3 introduced in Section 2 is very useful in epidemiology studies, especially in two-phasedesigns where the probabilities 7r are determined by the investigator. Equation (2.8) gives thejoint density of the complete data (Y,.6., X, V) (Y,.6., Z), and (2.9) gives the joint density of theobserved data (Y,.6., RX, V, R). The finite-dimensional parameter () is the parameter of interest, andthe nuisance parameter 'f7 = (A, AC, h) is a vector of three infinite-dimensional nuisance parameters.As in Section 3, we will use Proposition 2.3 to obtain the efficient score 18in the observed model Pfor Example 2.3. Hence we first need to characterize the space Q*. Then the space K is completelydetermined.

5.1 Tangent Space Q of the Model Q

The score for the parameter of interest () and the score operators for the nuisance parametersA, AC, and h in the "full" model Q given in (2.8) are:

i~(Y,.6., Z) ig(Y,.6., Z) =.6.Z ZA(Y)eo'z = JZ d1\lI(t) , (5.1)

Y

iga(Y,.6., Z) == ig(Y,.6., Z) .6.a(Y) - .In a(t) dA(t) Ja(t) dM(t), (5.2)

i£b(Y,.6., Z) == i~(Y, Z) = (1 .6.)b(Y, Z) - rY

b(t, Z) dAc(tIZ)./0

J.6..

"/'1,""1';n,1' CI)lHltirtg processes

< >

(1- ~)l(Y::; t) lt l(Y 2:: s)dAc(s

= 3 logAx(t),

ab(t, Z) = 3'1j: logAcw

3c(Z) 3~ loghK(Z),

for regular parametric submodels {A~J, {AG1j;L and {hK }. Here we abuse notation slightly bywriting A(-) for the baseline cumulative hazard function, and A('IZ) for the cumulative hazardfunction conditional on Z. Under the proportional hazard assumption, A('IZ) = A(-)e8'z.

Then the scores in the observed model (2.9) are:

ii(Y,~, RX, V, R) = Ri?(Y,~, Z) + (1 - R)E [t?(Y,~, Z)/Y,~,VJ, i = 1"",4.

Hence the tangent spaces for the parameters in the two models are:

(5.6)

By definition, all the elements in Qi, i = 1, .. ',4, are square integrable. It is easy to see thatQ2, Q3, and Q4 are mutually orthogonal. Since they are closed (by definition), the nuisance tangentspace is Qry = Q2 + Q3 + Q4.

Let WI and W 2 be sub-distributions on JRd+1 defined by

WI(Y, z) == Q(Y ::; Y, Z ::; z, ~ = 1) =1 r (1 - G(tlz')) dF(tlz') dH(z'),(-oo,zj i(o,y]

W2 (y, z) == Q(Y ::; Y, Z ::; z, ~ = 0) =1 r (1 F(tJz')) dG(tlz') dH(z') ,(-oo,z] i(o,y]

corresponding to the uncensored and censored data, respectively. Hence W vVI + W2 is themarginal distribution of (Y, Z). Then we can define L 2 spaces corresponding to the sub-probabilitymeasures:

These spaces willL 2

<X}, i = 1,2.

is eaw to see that L 2

. Here QyZ,QE Q.

Lemma 5.1.

and similarly the censoring counting process maxtlng-~tle,

/Proof. \rYe only show the first of these two assertions using the methods of Chapter 6 of SHORACK

AND 'WELLNER ( . The second one can be verified "llJ.1l1<1l1.)

E { (J h, Z) dM(tJ)'} ~ E { E [(J h j Z)dM(t))' Z]}

=E{E[/hi Z)d(M)(t)!Z]}

= E { E [/ hi(t, Z)I(Y 2: t) dA(tIZ) Iz] }E {/ hi(t, Z)P(Y 2: tlZ) dA(tIZ)} (by Fubini)

E {/ hi(t, Z)(l - F(tIZ))(l - G(tIZ)) 1~~J~1) }= / / hI(t,S)dWI(t,S). (by Fubini)

So (5.7) is true.

\rYe define the space Q* by

o

Q* == {ho(Z) + / hl(t, Z) di\f(t) (5.9)

+ / h2(t, Z) di\fc(t): ho E Lg(H), hI E L2(Wd, h2 E L 2(W2)} .

Then from (5.1)-(5.4) we know that Q c Q*. .The following proposition plays a key role in characterizing the space Q~ which will be used to

derive the efficient score 19 for e in the model P in the next subsection.

Proposition 5.1. Q* = Lg(Q).

Proof. Q* C

Ll,Z)Efollows from Lemma 5.1. \rYe only need to show the reverse inclusion. For any

, we can write it as

, Ll, (Y. +

1- co.

Now we operators R I : as follows:

1') \~)

11)+---"----.,..--.,..--+---"-----.,..--

here is the conditional distribution of Y given Z, while WIsub-distributions with

are conditional

W(ylz) = l Y

(1 - dF(tlz) + l Y

(1 -

1 - (1 F(ylz))(l - G(ylz)),

WI(ylz) l Y

(1 - G(tlz)) dF(tlz),

W2(ylz) l Y

(1 F(tlz)) dG(tlz).

dG(tlz)

We will show that

b(Y, L1, Z) JRIb(t, Z) dM(t) +JR2b(t, Z) dMc(t) + E(bIZ), (5.13)

and

l Y

b1

-[{~-+~-}

RIb(Y, Z) E L2(Wd, R2b(Y, Z) E L2(W2). (5.14)

Then by (5.7), (5.8), and (5.9) as well as the fact that E(bIZ) E Lg(H), it follows that b(Y, L1, Z) EQ*.

Equation (5.13) can be verified directly by the definitions of R I and R2 in (5.11) and (5.12):

JRIb(t, Z) divl(t) +JR2b(t, Z) dlVlc(t)

L1RIb(Y, Z) + (1 - L1)R2b(Y, Z) l Y

RIb(t, Z) dA(tIZ) -lY

R2b(t, Z) dAc(tIZ)

bi Z) dWI(tIZ) b2(t, Z) dW2(tIZ)= L1b l (Y, Z) + (1 L1)b2(Y, Z) + 1 + ---"--1----.,..--.,..--.,..--

}

(5.16)

= .lY it -:-----=-':-:-:---:-'-~

iY{lY~}bl

= iY{-l1_= ~ /Y bl dW1

1 - Jo 1 -

Z) dW1(sIZ) loyb1(s, Z) dA(sIZ).

.0

Similarly

(5.17)

iY

1- ~~(tIZ) {it b2(s,Z) dW2(SIZ)} (dA(tIZ) +dAc;(tIZ))

JoY b2(s, Z) dW2(sIZ) /Y1 vV(YIZ) Jo b2(s, Z) dAC;(sIZ).

Substituting (5.16) and (5.17) in (5.15) yields (5.13).Now we show that (5.14) is true. Let m(y,z) = R1b(y,z) - b1(y,z) from (5.11) or equivalently

from (5.12) by changing Rl to R2 and b1 to b2. If we can show that E[m2(y, Z)J < OC, then (5.14)will be true. Since

we have

So

m(Y,Z) __1....,.--..,.. [= b1 (t, Z) dW1(tiZ) + b2(t, Z) dW2

- -1-_-=1:-:-:-c-=- .lx {b1 Z) }

where

It is easy to see 0 ::; 0: ::; 1 and 0 ::; ,8 ::; 1. Then by the same argument as on page 423 ofBICKEL, KLAASSEN, RITOV AND \VELLNER we have

< 4 J{b l

< 8J< 8Jbi

Zl dWI(tIZ) -J- b, Z' dlcV2(t IZ)}2 dW(tlZ'J dW(tJZ) ,2 ) dW(tIZ) ., }

Z)o:(t, Z) dWI(tIZ) + b~(t, Z) dW2

Then by Fubini's theorem we have

This concludes the proof of Proposition 5.1.

It is easy to verify that (5.11) and (5.12) can be written as

o

fyOO bet, 1, z) dW1(tlz) + bet, 0, z) dW2(tlz)

bey, 1, z) - 1 _ W(ylz) ,

fyOO b(t, 1, z) dW1(tlz) + b(t, 0, z) dW2(tlz)

bey, 0, z) - 1 W(ylz) .

(5.18)

(5.19)

From (5.18) and (5.19) we can see that the operators R1 and R2 are similar to the R operatordiscussed by RITOV AND WELLNER (1988), EFRON AND JOHNSTONE (1990), and BICKEL, KLAASSEN,

RITOV, AND WELLNER (1993). We can also rewrite (5.18) and (5.19), using conditional expections,in the following forms which will be used later:

RI(b)(y,z)

R2(b)(y, z)

bey, 5, Z)15=1 - E[b(Y,~, Z)IY > y, Z = z] ,

b(y,5, - E[b(Y,~, Z)IY > y, Z = z] .

(5.20)

(5.21)

Corollary 5.1. For any b E Lg(Q), the decomposition (5.13) is unique in the sense that RIb isunique a.e. WI and R2b is unique a.e. lcV2.

Our proof of Corollary 5.1 will use the following lemma:

Lemma 5.2. For any tUll.ctr,ons x

E

2.

E J 1

Proof. This toll,ows from Lemma 1 SASIENI o

Proof of Corollary 5.1. By Proposition 5.1 we

b I hI Z)dlv!(t) +I h2

Let

+

b = I h~ Z)d.lVf(t) I h; Z)d1v!e;(t) + h~(Z).

Then the difference of above two equalities is

°= I [hI Z) - h~ (t, Z)] dlH(t) +I [h2(t, Z) - h;(t, Z)] dAJ(t) + ho(Z) - h~(Z).

Taking expectations of the squares of both sides of the equality and using the orthogonality of iY!,lIIle;, and any function of Z, by Lemma 5.2 we have

Thus

J(h l hD2dWI = 0, l(h2 - h;)2dW2 = 0, and l(ho - h~)2dH 0,

which imply h~ ho a.s. H, h~ = hI a.e. WI, and h; = h2 a.e. W2.

Now we discuss several properties of the tangent spaces of the model Q.

Proposition 5.2. For any function s(Y, Z) E L 2 (Wr), define

B(s) I {s(t, Z) - E[s(Y, Z)IY = t, Ll I]} dA!(t).

Then:(i) B(s) 1.. Qr) in Lg(Q).

For any bE we have II (biQ*) = B(RI

(iii) Q* = (Q2 + Q3 + Q4)-L {B(s) : s E L 2 n.

o

(5.22)

Proof. Since Q2, Q3 and Q4 are 111U'"U,:H1.i orthogonal, and lIIl 1.. lvle;, we have

II =II

r.J

E

E

for any E L 2(Wd. Now by Lemma 5.2, the LHS of the above equation is

and hence

E[.6.{s(Y, Z) = E[{E[.6.s(Y, (Y)E[.6.IYJ

"a

So B(s) ..l Q2. Since B(s) ..l Q3 + Q4, this yields B(s) ..l Q2 + Q3 + Q4 QT}'

(ii) From Proposition 5.1 and Corollary 5.1 we know that for any b E Lg(Q), we have thedecomposition (5.13). Hence, from the proof of part (i) we know that

II(bl Q2) + II(bl Q:)) + II(bl Q4)

JE[R1(b)(Y, Z)IY t,.6. 1J d1Vf(t) +JR2 (b)(t, Z) d1Vlc (t) + E(bIZ).

Thus,

(iii) This is an immediate consequence of (i) and (ii). 0

If we choose s(t, Z) Z, then B (s) is the efficient score for e in the ("full" or "complete" data)model Q.

5.2 Efficient Score for (j in the Model P

In this subsection we will use the results in Section 2 and the previous subsection to derive theefficient score Ie ofP. Since from Proposition 5.2 we know that Q* = {B(s), s E L 2 (Wr)}, for themodel P, with PEP, we define the class K of all functions with the form

k(Y,.6., RX, V, R) = R B(s(Y, X, V)) - R - Jr [B(s(Y, X, V))IY,.6., VJ,1T 7f

where s E L 2 (W1 . Note that we can rewrite the functions k in terms of the operator D : L 2 (Wr} -+

Lg(Q) defined by

j'

(Y,.6., Z) Z)

we

Proposition 5.3.

Y R·X.iT

where satish,es the operator equation

- { (Y, Z) - --::---,----:.--,---,-..:-.---: 1'.1 11\

,1,

where the operator K is a linear integral operator defined as

(5,24)

K(u*)(Y, Z) = l x

l(Y 2:: Z) dA(sIZ)

T) . (D(U*)(Y"b.,z)" T)+ 'iT(Y, 1," E 'iT(Y', b., Z) IY >}, Z

+ (1 - 'iT(Y, 1, V))E[D(u*)IY, b., V]

_ (~f. 1TT\E (1 -'iT(Y', b., V) E[D( *\(Y' ,0" Z)IY' b.. V] !y' -" y z)'iT \1., ,v J 'iT(Y', b., V) u J , , I , , I ./, '

Our proof of Proposition 5,3 will use the following lemma:

Then we have

(Da, b) L~(Q) = EQ {J a(t, Z) dlvI(t) JR1b(t, Z) dAl(t)}

EQ {b.a(Y, Z)R1b(Y, Z)}

JJa(t, RIb(t, dWj

R 1

oSo DT RIo

Proof of Proposition 5.3.= Ie if and if it satlsh!~s

o

E

To calculate the second inner product, we have

R Jr}E7r-{~

IIJr

R (*' R Jr-Bs )---E7r 1r

11,-- + (1 - R)E

A [B~*)l- 1 - Jr E [B(s*)!Y, L1, V],II J Jr

so that the second inner product is

k*T B(s*),

== (A ks , --)£2(Q)Jr

_ (R B(s) _ R - Jr E [B(s)IY, L1, V], 1 - Jr E [B(s*)IY, L1, V1h2(.p'J'Jr Jr Jr

E [~B (s) B (s*)] - E {I Jr Jr B (s)E [B (s*) jY, L1, V] } .

Combining these two calculations, it follows that

(ks , i1 - k*h2(P) E {B(S) (B(Z) - ~B(s*) + 1: Jr E [B(s*)IY, L1, V1) }

= E {B(S) (B(Z) - ~D(U*) + 1: Jr E [D(u*)jY, L1, V1) }= / S, B T (B(Z) - 2.D (u*) + 1 - Jr E [D(u*)jY, L1, V1) )

\ Jr Jr L2(WIi

= 0,

for all s(Y, Z) E L2(WI), where

u*(Y, Z) s*(Y, Z) - E[s*(Y, Z)jY, L1 11 E L 2(WI),

satisfies (Y, Z)IY, L1 = 11 = O. Hence we must solve the pair of equations

o = B T (B(Z) - 2. D (u*) + 1 - Jr E [D(u*)!Y, L1, V1)IT iT

o ,Z)IYL1 = 1]

(5.25)

(5.26)

D Z Z-

Hence the equation can be rewritten &e;

Now

Z D. = 11

1 Jr. ( *' I- --E(D u JIY, Do,1r

iY, D., V)} Y, Do = 1}

} - iu* (Y) . (5.27)

and

1-Jr(Yl.V) [*)1 1(Y 'V) E D(u Y, D., V 16=1 (5.29)Jr ,1,

E (1 -Jr(Y', D., Z) E[D( *)(y' Do Z)IY' D. V1IY' > Y z)Jr(Y', D., Z) u " " ,.

Substituting (5.28) and (5.29) into (5.27) yields

Z - E[ZIY,D. = 1] + iu*(Y) D(u*)b=l _E(D(u*)(y"D.,Z)IY' yz)Jr(Y, 1, V) Jr(Y', D., Z) >,

_ 1 - Jr(Y, 1, V) E[D(u*)IY, D., V11Jr(Y, 1, V) 6=1

+ E (1 :/i,~~~~)Z) E[D(u*)IY', D., V11Y' > Y, z)1

(Y, Z) K(u*)(Y, (5.30),1,

we use

and the operator K is defined as

u*(Y, Z) 100

l(Y 2:

. Then we

$, Z)dA($

(Y,Z) 1\) } .

Y = 1 0

Do=

Solving this for

Note that

yields

E[Jr(Y, 1, V}ZIY, ~ = ~J E[ZIY, ~ 11E[Jr(Y, 1, V )IY, ~ = l)J j

E[ZIY,~ = 1, R = 1J E[ZIY, ~ = 1J . (5.33)

combining (5.32) and (5.33) with (5.31), we obtain (5.23). o

The reason that we solve for u* instead of s* is that there is no unique solution if we solvefor : if s*(Y, Z) satisfies (5.25) with u* replaced by s* - E[s*IY, ~ 1], then s*(Y, Z) + f(Y)will satisfy (5.25) for any function f(Y). From the form of k E K we know that k is determinedby u(Y,Z) == s(Y,Z) - E[s(Y,Z)IY,~ = 1]. So we only need to solve for u*(Y,Z) == s*(Y,Z)-E[s*(Y, Z)IY, ~ 1J and then compute Ie k*.

The parts of equation (5.23) involving K can be viewed as an integral operator on the unknownfunction u*. So this equation is an operator equation of the second kind; see e.g. KRESS (1999) andRUDIN (1973), Chapter 4. Note, however, that the terms of K involving conditional expectationgiven Y' > Y correspond to non-compact operators (see e.g. RUDIN (1973), Problem 17, page 107).vVe will not explore existence and uniqueness issues connected with (5.23) here; both will be evidentin the examples treated in Section 6.

Proposition 5.4. If there is no missing data, i.e. Jr(Y, 5, v) = 1 for all Y, 5, v, then the solution ofthe integral equation (5.23) is

u*(Y, Z) s*(Y, Z) - E[s*(Y, Z)IY, ~ = 1] = Z - E[ZIY, ~ 1],

yielding the efficient score for complete data, D(u*).

Proof. vVhen Jr == 1, the operator K is

Z)dA(sIZ) + E[D(u*)(Y',~, Z)IY' > Y, Z)

+ (Y',~, Z)iY' > Y, Z)

o.

Remark: Note that the integral equation corresponding to our equation by ROBINS,

ROTNITZKY, AND ZHAO ( in their section 8.3, page 862, fails to have the reduction propertydescribed in Proposition and hence is apparently incorrect, at least according to ourinterrlrei:,ation of the notation of ROTNITZKY, AND ZHAO ( Here is a counterexample:Using our notation, when 15, 1 the integral equation of RRZ can simplified to

Z) = Z + E[D(u) Y?:: y].

Since 1f(Y, 15, 1 means that we have complete data, the solution of this equation should beu(y, Z) = Z - E[ZIY y,.6- 1], so that D(u) is the efficient score function. Let us considerthe example that the survival time is exponentially distributed with rate'\ 1, subjects arefollowed up to either failure or time t 1, and the mass function of the covariate Z E {O, I} ish(z 0) = h(z 1) 1/2. Under the proportional hazard assumption, the density of the data isgiven by

___ { e{)z-yeBz

h(z), 15 = 1, °:S y :S 1;q(y,b,,,,) - h(z),. 15 = 0, y = 1.

When e= 0, the distribution is even simpler:

{e-Yh(z), 15 = 1, O:S y :S 1;

q(y, 8, z) = e-1h(z), 8 0, y = 1.

Then we have

So for the efficient score of the complete data we have

u(y, Z) = Z

Hence from equation (5.34) we should have

12

E[D(Z - 1 IZ, Y ?:: yJ2

Since

1 ~ '.6- iY 1

- - - - dt =2 2 1

.0 2

12

1

2

1

2o. >

But since E[LlIZ 0, Y 2: y] ::; 1 and E[YIZ = 0, Y 2: y] > °for any °::; y < 1, we have

E[LlIZ = 0, Y 2: y] E[YIZ 0, Y 2: y] < 1.

So

This yields a contradiction.

E[D(Z _ 12

1= 0. Y > yl > --.

• - J 2

o

5.3 Alternative Models for Missing Data with Proportional Hazards

We now consider some alternative scenarios for the models involved in the two-phase designs.'When Z (X, V) and we are only interested in estimating the effect of X, we may assume that

V and T are conditionally independent given X; i.e. V affects T only through X. Then we replace0' z by e'x in the model (2.8). After going through the same procedure as deriving equation (5.23),we obtain

u*(Y,Z) - {K(U*)(Y,Z) - E[IT(y~i~~~I;)Ll = I]E[K(u*)(Y,Z)IY,Ll I]}IT(Y, 1, V) {X - E[XIY, RLl = I]} , (5.35)

where the operator K is as defined in (5.24).If we are not able to observe Y in the first phase, we will observe the data:

and the efficient score will have the form

k* = R D(u*(Y, X, V))IT

R-IT[ ( *( /)'1"]--EDu Y,X,t ),u,V.IT

By the same method used to derive (5.23), we find, for estimation of the coefficient of Z, that u*satisfies (5.23) where

K(u*) Z) = faoo I(Y 2:

+ IT(Y, 1,

+

(s, Z) dA(sIZ)

---'---'---'---'-/y' > Y, Z )

>y.

X

6 Examples of Information Bound Calculations: Cox lTIodel

The case-cohort design, studied by PRENTICE ( and SELF AND PRENTICE ( , and theexposure stratified studied BORGAN. SAMUELSEN, GOLDSTEIN, AND

POGODA are two special cases in In the case-cohort ue.'l!SJ.l,

the complete information is essentially observed for all the failures and a simple random subsampleof the non-failures. exposure stratified case-cohort design is a modification classicalcase-cohort design, in which complete covariate data is observed for all failures and a stratifiedrandom sarnple of the non-failures, and the stratification is based upon a correlate (or surrogatevariable, available for everyone) of the true exposure (or prognostic factor) of interest. In thissection, as in our formulation of Example 2.3 and subsequent development in Section 5, we treatthe simplified ij.d. versions of these sampling designs.

Pseudo-likelihood type (inefficient) estimators have been proposed by PRENTICE (1986) forcase-cohort designs, and by BORGAN, LANGHOLZ, SAMUELSEN, GOLDSTEIN, AND PaGODA (2000)

for exposure stratified case-cohort designs. Efficient estimators for these designs apparentlyremain unknown. But information bound calculations can tell us how much information we couldpotentially gain from fully efficient estimators, and which methods use the observed data moreefficiently, if efficient estimators were available. Here we give two examples in which the informationbound calculations can be carried out analytically.

6.1 The Ll.D. Version of the Case-cohort Study

vVe assume that the true distribution has exponentially distributed failure times and a singlebinary covariate Z taking values °and 1. Let h(z) = P(Z = z) be the probability that a subjecthas covariate value z E {O, I}; thus h(O) + h(l) 1. The censoring time is distributed with pointmass 1 at t = 1, which means that all subjects in the cohort are followed from time zero to eitherfailure or to the end of the study at t = 1. This is discussed as an example by SELF AND PRENTICE

(1988). The density of the complete data can be written as

= { wI(yiz)h(z) =

w2(y!z)h(z) =h(z), 0 1, °~ y ~ 1;

<5 = 0, y = 1.(6.1)

In the ij.d. version the case-cohort study, a simple random sample is taken from the non-failureswith sampling (inclusion) probability ?ro. The values of the covariate Z are measured for all thefailures and only the sampled non-failures. Hence it is a special case of the two-phase designdiscussed in the sections with Ll, := P(R = 1 Ll, ?r(Ll) only, and where

= 1, . To simplify the we drop the from u*. the K opl:~rator

reduces to

+E > y,Z

E Z

and we need to solve equation

to obtain the function

E[K(u)(Y, Z)IY = y, Do =

Thus the information for () is

= z - E[ZIY = y, Do = 1]

Iii EP{[~D(U)

The detailed derivation of u u* and the calculation of To are given in the Appendix (Section 7).'When we assume h(O) h(l) = 1/2, we have, for 0 :::; y :::; 1,

where

{

u(y, 1) = al(Y) + C {e>.ye8 + },

u(y,O) = -b1(y) + C {-e>.Y + --,-~~-n};(6.5)

e->'y

al(Y) = e->'y + eO->.ye8'

eo->.ye8

b1(y) = e->'Y + eO->.ye8 '

and C is a constant determined by the parameters in (6.1). In our example calculations theexponential failure rate parameter A is selected to give pre-specified failure probability prior tothe end of the study in the group with Z 0, the baseline failure probability P(T :::; liZ = 0).When () = 0, the above results can be simplified even further and the information calculation isstraightforward. If () #- 0, numerical integration is needed.

,...

Baseline failureproba~i1ity priorto t=1.

0.010.050.10.20.3

0.0 0.2 0.4 0.6Sampling Fraction

0.8 1.0

Figure 1: Variance ratios for the Self and Prentice pseudo-likelihood estimator (SP) for differentsampling fractions and baseline failure probabilities P(T :::; liZ = 0), e= In(2).

~,...

00::::::;-0::::lU.

-0<00)0

j~jQ._0

UJa::«C\J

o

Baseline failureprobability priorto t=1 :

0.010.050.10.20.3

oo.....,..---......,....----.-----.-----r-----,--J

0.0 0.2 0.4 0.6Samplinq Fraction

0.8 1.0

Figure 1 displays the ratios of asymptotic variance (SP Variance) of the SELF AND PRENTICE

( pseudo-likelihood estimator to the information lower bound for "observed derived inthis paper. Figure 1 shows that when the disease is rare, i.e. the baseline failure probability is verylow, the pseudo-likelihood estimator is close to fully efficient. As the failure probability increases,the pseudo-likelihood estimator loses more efficiency, especially when the subcohort fraction issmall. So there is some room for developing more efficient estimators in the situation of moderatefailure probability for case-cohort designs.

Figure 2 displays the ratios of the information lower bound for estimation of 0 based on the"observed data" (1//0with 10given in (6.4)), and the asymptotic variance of the partial-likelihoodestimator for of 0 based on "complete" (or "full") data, under different baseline failure probabilitiesand subsample fractions when e(j 2. Figure 2 shows that the case-cohort design loses moreinformation relative to complete data (supposing that an efficient estimator is available), as thefailure probability increases and as the subcohort fraction decreases. Actually, when the baselinefailure probability is above 0.5, the curves in Figure 2 move toward the upper left again as the failureprobability increases, but there is less interest in these high failure probability cases in practice.From Figure 2 we can see that there may be much more room for developing more efficient designsand corresponding efficient estimators. (For example, an alternative design might be an "exposurestratified case-cohort design" as in our second example.)

·When 0 = 0, the corresponding figures (not shown) have the same patterns as Figures 1 and 2,but with slightly different magnitude.

6.2 The Ll.D. Version of the Exposure Stratified Case-cohort Study

Assume that X is the variable of interest, and that V is a surrogate variable for X, or measurementof X with error. We suppose that V can be observed for everyone in the entire cohort, but X is onlyobserved for subjects in the subcohort and failures. Suppose that V is conditionally independentof T given X. Then under the Cox proportional hazard assumption, we can model

1 F(j,),(ylz) = exp( -A(y)e(jIX), so

The model for this type of data is essentially the same as the model we discussed in previoussection. Replacing 0' z by 0'x in the derivation in Section 5 yields the integral equation (5.35),which is just the equation (5.23) with Z E[ZIY, D. = 1] at the right hand side replaced byX - E[XIY, D. IJ.

The i.i.d. version of the exposure stratified case-cohort design studied by BORGAN, LANGHOLZ,

SAMUELSEN, GOLDSTEIN. AND POGODA is a special case model. Here we discuss aneX<imple with a binary covariate X E a VEl}. Then we

2 2 X and V

1

~{O,x,

at time t = 1. For our calculations the failure rate pal'aulettor A will be setbaseilIle failure probability as in subsection 6.1. mass function

;];, we have the for the data

of theto ""rI1P1T{>

of (X,

0= 1, °::; Y ::; 1;

o 0, Y = 1.

The cohort is categorized into three strata: {b. = I} {b. = 0, V O}, and {b. = 0, V = I}. Weobserve complete information of all the subjects in the first stratum, and of 7fO, 7fl fractions of thesubjects in the second, third strata. vVe only observe (Y, b., V) for other subjects. In probabilitylanguage we have P(R lib. 1) == 7f(Y, 1, V) = 1, P(R lib. = 0, V 0) == 7f(Y, 0, 0) JrO, andP(R lib. = 0, V = 1) Jr(Y, 0, 1) = Jrl. Then the K operator in (5.24) is reduced to

K(u)(y, x, v) ~ . (D~) )io u(t, x, v)eOX Adt + E -:;-IY > y, X = x, V = v

(1 Jr Jr E[D(u) IY, b., V] IY > y, X = x, V = v) , (6.7)

and we need to solve the equation

u(y, x, v) (K(u)(y, x, v) - E[K(u)(Y, X, V)IY = y, b. = 1]) x - E[XIY = y, b. = 1] (6.8)

to obtain the function u(y, x, v). Thus the information for () is

(6.9)

The details of the derivation of u u* and calculation of Ie are given in the Appendix (Section 7).Let ho P(X = 0) = h(O, 0) + h(O, 1) and hI == P(X = 1) = h(I,O) + h(l, 1). Then we have, for°::; y ::; 1,

u(y, 1, 1) a2(Y) + C1l3hoeAyeB + fC1.C2(y),

u(y, 1,0) = a2(Y) + C2(1- hoeAyeB + fC1.C2(Y)'

0,1) = -b2(y) C1(1- a)h1eAY + fC1.C2

'u(y, 0, 0) = -b2(y) - C2ah1eAY + fC1.C2(Y);

(6.10)

where

~

"~'-''-, '-, "-,"\ "'.:', .........

',,, ..............." ......." ...' , ...............................

...... '...... ...........--............................ -----------

---<~:---------~~--------(alpha, beta) ------ _

0.5, 0.5 ..._.:.:-:-.~:.-:~004,0040.3,0.30.2,0.20.1,0.1

ooe,-----,.------,....------.-------r-1

0.0 0.05 0.10 0.15 0.20Baseline Failure Probability Prior to t=1

Figure 3: Asymptotic relative efficiency of optimal variance with a surrogate variable and differentbaseline failure probabilities P(T ::; llX 0), B = In(2), P(X = 0) = 0.9, 11'0 = 11'1 0.1 (withoutstratification) .

or-----------------------,

(alpha, beta)0.5,0.50.4,0.40.3,0.30.2,0.20.1,0.1

------ ---------

o0 ........----...,-----...,-----...,-----...,--1

0.0 0.05 0.10 0.15 0.20Baseline Failure Probability Prior to t=1

Figure 4: Asymptotic relative efficiency of optimal variance with a surrogate variable and differentbaseline failure probabilities P(T ::; llX = 0), e = In(2), P(X = 0) = 0.9, 110 =J 111 (withstratification) .

We can calculate Ie for different 0, (3, P(X = 0), 110, 111, e, and A by using numerical integration.\Vhen 0 ,6 0.5, the exposure stratified case-cohort design is equivalent to the classical casecohort design (previous example) since V is not correlated with X. Figures 3 and 4 show thecomparisons of the asymptotic relative efficiency (ARE) of fully efficient estimators (if they exist)for the exposure stratified (at 110 111 and 110 =J 111) and classical case-cohort designs at eO 2 andP(X = 0) = 0.9. \Vhen e = 0, the corresponding figures (not shown) have similar patterns butslightly different magnitude. In Figure 3, the sampling probabilities in the last two strata are equal,i.e. 110 111 0.1. In Figure 4, 110 and 111 are different, but such that the numbers of sampledsubjects in strata {il = 0, V O} and {il = 0, V I} are the same (or approximately same),and the fraction of sampled subjects from the two strata all together is 0.1 (or approximately 0.1).We can see from Figure 3 that the efficiency increases as the sensitivity (1 and specificity(1 - increase as a result of incorporating the surrogate variable into the model, even without

\Vhen we do sampling =J 1It), Figure 4 shows that the efficiencyare even So information and sampling will increasethe Note that illustrated here is

choose their optimal sampling fractions and a very small failure rate ,\ = 0.01 which we believe issmall enough to be able to compare to their Please note that the in Table 1 areunder the condition that the subcohort fraction is equal to the failure probability which is a verysmall number.

Table 1. Comparisons of asymptotic relative (ARE, relative to a fullcohort the aproximate AREs of pseudo-likelihood estimators and the AREs ofinformation bounds for a stratified case-cohort design, which has one binary covariateX and binary surrogate covariate V with specificity 1 - ;3 and sensitivity 1 Q andthe subcohort size equals the expected number of cases.

ARE(PLO), ARE(IBb), % ARE(PL)

ARE(IB) ,

1 (3 1-,8 1 (31 a 0.50 0.70 0.90 0.50 0.70 0.90 0.50 0.70 0.90

0.50 35.5 36.5 40.8 36.0 37.8 45.9 98.6 96.6 88.90.70 36.5 39.6 47.3 37.7 43.0 55.9 96.8 92.1 84.60.90 40.8 47.3 60.5 43.9 53.0 70.3 92.9 89.2 86.1

(b) P(X = 1) = 0.50

ARE(PL), ARE(IB) , ARE(PL)ARE(IB) ,

1-,8 1-,8 1-(3I-a 0.50 0.70 0.90 0.50 0.70 0.90 0.50 0.70 0.90

0.50 52.9 54.0 58.4 53.5 55.5 63.2 98.9 97.3 92.40.70 54.0 57.3 64.7 55.5 61.1 72.0 97.3 93.8 89.90.90 58.4 64.7 75.8 62.6 71.2 83.6 93.3 90.9 90.7

a. PL Pseudo-Likelihood. This part is taken from Table 1 of BORGAN, LANGHOLZ,

SAMUELSEN, GOLDSTEIN, AND POGODA (2000). b. IE Information Bound.

ACKNOWLEDGEMENTS: We owe thanks to Norman Breslow, Nilanjan Chatterjee, and theother participants in the l\1Iissing Data 'Working Group at the University of Washington during theperiod 1995 - 1998, for many discussions about missing data subject of this paper.

counting processes: a large

References

Andersen, P.K. and Gill, R. D. Cox's reg:re:::,sion modelsample Statist. 10, 1100-1120.

Begun, J. M., \V. J., Huang, \V J. A. . Information and asymptoticertlcHmcy in parametric-nonparametric models. Ann. Statist. 11 432-452.

Bickel, P . .1., Klaassen, C. A. .1., Ritov, Y. and Wellner, J. A. (1993). Efficient and AdaptiveEstimation for Semiparametric Models. Johns Hopkins University Press, Baltimore.

Borgan, 0., Langholz, B., Samuelsen, S.O., Goldstein, 1., and Pogoda, J. (2000). Exposurestratified case-cohort designs. Lifetime Data Analysis 6, 39-58.

Breslow, N.E., McNeney, B., and "Wellner, J.A. (2000). Large sample theory for semiparametricregression models with two-phase sampling. Technical Report ??, Department of Statistics,University of "Washington.

Chatterjee, N. and Breslow, N.E. (2000). A pseudo score estimator for regression problems withtwo phase sampling. Submitted to Biometrika.

Chen, K. and Lo, S. (1999). Case-cohort and case-control analysis with Cox's model. Biometrika86, 755-764.

Cox, D.R. (1972). Regression models and life tables (with discussion). Journal of the RoyalStatistical Society. B 34, 187-220.

Efron, B. (1977). The efficiency of Cox's likelihood function for censored data. Journal of theAmerican Statistical Association 72, 557-565.

Efron, B. and Johnstone, 1. (1990). Fisher's information in terms of the hazard rate. Ann. Statist.18, 38-62.

Emond, M. and Wellner, J.A. (1995). Missing data: expansion on BKRW (1993) andcomments on/computations for Robins, Rotnitzky, and Zhao. Technical Report, Universityof \Vashington.

Horvitz, D.G., Thompson, D.J. (1952). A generalization of sampling without replacement froma finite universe. Journal of the American Statistical Association 47, 663-685.

Huang, J. (1999). Efficient estimation of the partly linear additive Cox model. Ann. Statist. 27,1536-1563.

Kress. R. 1999). Linear Integral Equations, Second Edition. Springer-Verlag, New York.

LaWHcSS, J. F .. \VilcL C. .1., and Kalbfleisch, J. D. (1999). Semiparametric methods for response-selective and m J. Roy. B 61. 413-438.

B. (2001) Information Bounds and Efficient Estimation for Two-Phase Designs withLifetime Data. Ph. D. dissertation (in progress), University of Washington, DepartmentBiostatistics.

Prentice, RL. (1986). A case-cohort design for epidemiologic cohort studies and diseaseprevention trials. Biometrika 73, I-II.

Ritov, Y. and Wellner, J.A. (1988). Censoring, martingales, and the Cox model. ContemporaryMathematics: Statistical Inference for Stochastic Processes 80, ed. N. U. Prabhu, Providence,RI: American Mathematical Society, 191-220.

Robins, J. M., Hsieh, F., and Newey, W. (1995). Semiparametric efficient estimation of aconditional density with missing or mismeasured covariates. J. Roy Statist. Soc. B 57, 409424.

Robins, J.M., Rotnitzky, A., and Zhao, L.P. (1994). Estimation of regression coefficients whensome regressors are not always observed. Journal of the American Statistical Association 89,846-866.

Rudin, VV. (1966). Real and Complex Analysis. McCraw-Hill, New York.

Rudin, W. (1973). Functional Analysis. McCraw-Hill, New York.

Sasieni, P. (1992a). Information bounds for the conditional hazard ratio in a nested family ofregression models. Journal of the Royal Statistical Society B 54, 617-635.

Sasieni, P. (1992b). Nonorthogonal projections and their application to calculating theinformation in a partly linear Cox model. Scand. J. Statist. 19, 215-233.

Scott, A. J. and vVild, C. J. (1997). Fitting regression models to case-control data by maximumlikelihood. Biometrika 84, 57-7I.

Scott, A. J. and Wild, C. J. (1998). Maximum likelihood for generalised case-control studies.Preprint, University of Auckland.

Self, S.C. and Prentice, RL. (1988). Asymptotic distribution theory and efficiency results forcase-cohort studies. Annals of Statistics 16, 64-8I.

Shorack, C.R and Wellner, J.A. (1986). Empirical Processes with Applications to Statistics.Wiley, New York.

Van del' Vaart, A. VV. (1998). Asymptotic Statistics. Cambridge University Press, Cambridge.

UNIVERSITY OF VVASHINGTON

DEPARTMENT OF BIOSTATISTICS

Box 3,15723219,15-72:32

UNIVERSITY OF vVASHINGTON

DEPARTMENT OF BIOSTATISTICS

Box

U.S

DEPARTMENT OF STATISTICS

UNIVERSITY OF \;VASHINGTON

P.O. Box 354322SEATTLE, \;VASHINGTON 98195-4322U.S.A.e-mail: washington.edll

7 Appendix.

7.1 Details of the Calculation for Section 6.1.

For the density function (6.1),

WI

VV2(yIZ)

conditional sub-distributions are

} J.IU<v<ll + { 1 -

1

and W(ylz) = W1(ylz) + W2(yjz).For the case-cohort design, 1T(Y,~) == 1 and 1T(Y, ~)IL'>=o 1To. Thus the K operator

defined in (5.24) reduces to (6.2) and the integral equation (5.23) reduces to (6.3). Let

Then we have

a(Y,~,Z)D(u) ~u(Y, Z) 1 l Y

(. Z) oz \d--- - ut e /\ t1T - 1T(Y,~) 1T(Y,~) 0' .

(7.1)

Now let

IY,~J.

and thus we

}Al

, f..,j,j

I

I

>y,z=

1 {1°O

{t, ----:.........:~ 11

Let h(O) h(l) Since we have

11

[It z)eBZ,\ds] '\eBz->.teeZdt

= 1Y [11

'\eBz-Ateez dt] '\eOZu(s,

= (e->.yeez e->.eez) 1Y,\eoZu(t,

ds +11 [1 1

'\eOz->.teeZ

dt] '\eOZcu(s, z) ds

dt+11(e- Ateez _e->.eez)'\eBZu(t,z)dt,

the operator K in (6.2) simplifies to

K(n)(y, z) ~ {M'\e>.yeO {eBJ~ u(t, 1) dt - J~l u(t, 0) dt}

1\11'\e>'Y {eO Jo1u( t, 1) dt - J~l u( t, 0) dt}

z= 1,

where

1\11

c

1 "0 e>.yeO '\e->.->.ee

"0 e->' +

1\11'\ {eB11

u(t, 1) dt 11

u(t, 0) dt} . (7.2)

qy,b.,Z(Y, 1, z)

qy,dy, l)11

JE[K(u)(Y, IY y,

Then the following calculation is straightforward

1

LK(u)z=o

Thus . The COIlst,ant C can

dt .D1 = 111_---::-

to calculate the information bound for e.

tal dt, B1= t b1 (t))"dt,.fo .fo

the expression , we areefficient score for e is

= R D(u) _ R - " Y: ill.7r 7r

Once we\Ve know that

Then

Ie Ep

Ep [~D2(u)] +Ep [(R 2,,)2 E 2[D(u) I Y,ilJl" " J

+ Ep [- 2R(~;,,)D(u)E[D(u) I y,il]]

I; + 12+ I;,

where

I; = EP [~ ( t.u(Y, Z) {u(t, Z)eoz -\ dty]~ tottal' ;~ (JU(Y, z) - 1"u(t, z )e'·-\ dt) ~(r IJ)q"(y, J, Z )q'-"(y, J)1'(dy)

1 {ill ( 1Y )2= L 2 - u(t,z)e8z )"dt "Oq(y,o=O,z)J1(dy)z=O 0 "0 0

+ l ' (U(Y, z) -1Y

u(t, z)eo.-\ dt)' q(y, J ~ 1, z)p(dy), }

1* E r(R-,,)2 E2rD( 'I YA 1]2 p L 2 l U) I ,uJ

"1 1 1

L L 1 [D(u) I Y = y, il = oJq(r Io)q(y, c5)J1(dy) ,&=0 r=O 0

1*3 E p [- ( ilu(Y,

1 1 1

L ~L 1----'-;::--'-0=0

[Y dt).fo

L

7.2 Details of the Calculation for Section 6.2.

For the ex~mlple of exposure str''Ltitied case-cohort study, the conditional sub-distributions are

WI {I } 1

W2 = 1

and W(ylx, v) = WI + W2(y!X,Since the effect of V is not included in the model, this example is not exactly but essentially

the same as the model discussed mainly in Section 5. Instead of using the equation (5.23), we use(5.35) to solve for u == u*. Since 1T(Y,~, V) == 1, the operator K can be reduced to (6.7) andthe integral equation for u(y, reduces to (6.8). Let

a(Y, L\X, V)D(u)

1T~u(Y, X, V) 1 rY ex1T(Y,~,V) - n(Y,~, V) io u(t, X, V)e )"dt.

Let 1T(Y,~,V)!u=o 1To(V) = 1To if V °or 1TI if V = 1. Then we have

dt

1

E[a(Y,~, X, V)IY > y, X = x, V v]

= 1 VV~Ylx, v) {ioo

a(t, 1, x, v) dWI(tlx, v) +ioo

a(t, 0, x, v) dW2(tlx, v)}

= 1 W~YIX,V) {ioo

[u(t,x,V) -itU(s,x,v)eex)"dS] dWI(t\x,v)

+ 100

[_~(\ rt

u(s,x,V)eex)"dS] dW2(t1x,V)}

y 1Tov;io

{I I u(t,.x,v)dt- (r tu(s,;r;,v)eex)"dS~JI. y i y Lio

.lI u(t, x, dt} .

Now let

thus we have

}E[b(Y,.6., V)iY > y,X = x, V =

1 _ 1 {1°O 0,

L~ -h(-O,--'----'------,..11

Since we have

11 [l tu(S, ;1:, v)eOX .\ dS] .\eOx-Ate8Xdt

= 1Y [1 1

.\eOx-Ate8X dt] .\eOXu(s, x, v) ds +11 [11

.\eOx-Ate8Xdt] .\eOXu(s, x, v) ds

= (e-Aye8x _ e- Ae8X ) 1Y.\eOXu(t,x,v)dt+11

(e-Ate8X _ e- Ae8X ) .\eOXu(t,x,v)dt,

the operator K in (6.7) simplifies to

K(u)(y, x, v) 1 - ?To(v) Aye8x {11\ OX-Ae8x ( ) d- () e Ae u t, x, v t

?To V 0

[

1 h(x'. v)e-Ae8X

' 11, ,] 8X}

-" . 8 u(t, x, v).\eOX dt e-Ae~ h(O, v)e- A+ h(l, v)e- Ae 0

-A{6ho.\eAye8 {eO Jo1 u(t, 1, 1) dt - J~ u(t, 0,1) dt} , x = 1, v = 1;

-N(1 (3)ho.\eAye8 {eO Jo1 u(t, 1,0) dt - Jo1 u(t, 0, 0) dt}, x = 1, v = 0;

.1\:1(1 - a)h1.\eAy {eO J~l u(t, 1, 1) dt J~l u(t, 0,1) dt} , x 0, v = 1;

N ah1.\eAy {eO J~ 1,0) dt - Jo1u(t, 0, 0) dt} , x = 0, v 0.

Cl/3hoeAye8, x = 1, v = 1;

;1: 1. v 0:

(1 ;1: = 0, 1:

= 0. = O.

HereI

and

{ -iV1A. {eO 1,1) dt

-VA.! 1,0) dt. l

0,1) dt} ,

0, dt}.

Then we

(Y,X, 1Y=:y,~ 1]1 1

= L L K(u)(y, X,

x=Ov=O

1 _>..y +1{K(u)(y, 1, l)h(l, l)eO->..yee + K(u)(y, 1, O)h(l, O)eo->..yee

~oe

+ K(u)(y, 0, l)h(O, l)e->"y + K(u)(y, 0, O)h(O, O)e->"Y}

hoh1(1 - eO)[(l a)f3C\ + a(l f3)C2]hoe->"y + h1eo->..yee

Thus from (6.8) we obtain (6.10). Plugging (6.10) into (7.4), we obtain the following linear systemfor C1 and C2

{I + M [f3ho(e>..ee 1) + (1 - a)h1(e>" - 1) - (1 eO)2(1 - a)f3D2]} C1

- {2\11(1 eO)2a (1 - f3)D2} C2 -A!eO A2 M B 2,

{N(l - eO)2(1 a)f3D2} C1 { 1 + N [(1 f3)ho(e>..ee - 1) + ah1 (e>" - 1)

- (1 eO)2a (1 - f3)D2] } C2 N eO A 2+ N B 2

where

dt.

Solving above linear system we obtain the values of the constants C1 and C2 .

Now we are able to calculate the information for B. By the same decomposition as in theprevious example, we have re 1'1 + + 1:3,

0,0.

1

x, .lY

+11

(u(y, 1,0) -lY

I,O)eO,\dt) 2 8 I,x = I,v = O){L(dy)

+11

(u(y, 1, 1) 1Y

I,I)eO'\dt)2 8 l.x I,v=

+: (_ ri

0'0)'\dt)2 q(I'8 O,x=O,v=O)"0 Jo

+ ~ (- (u(t,O, I)'\dt)2

q(I, 8 = O,X O,v = 1)7TI Jo

+ :0 (-11u(t, 1, O)eo'\dt) 2q(I, 8 = 0, x l. v 0)

+ :1 (-11U(t,I,I)e8'\dt)2 q(I,8 O,x=I,v=I).

1 1 1 1 ( )2

12 = L L L 1 r :27T

E 2 [D(u)IY y, ~ = 8, V = v]q(rI8, v)q(y, 8, v){L(dy)v=O T=O 6=0 0

= 1 - 7TO (_ h(O,O)e-'\ ( U(t °0)'\ dt7TO h(O, O)e-'\ + h(I, O)e-,\eB Jo "

2h(I,O)e-'\e

B r1 8)- h(O, O)e-'\ + h(I, O)e-,\eB Jo u(t, 1, O)e ,\dt q(I,8 = 0, v 0)

1 - 7TI ( h(O,I)e-'\ ri ° '\d+~ h(O, I)e-'\ + h(I, I)e-,\eB Jo u(t, ,1) t

h(I,I)e-'\eB ri 8)2

- h(O, I)e-'\ + h(I, I)e-,\eB Jo

u(t, 1, I)e ,\dt q(I,8 = 0, vI).

1 1 1 1 ri { 2 (1 ~)Ii =~~~t;Jo T 7T2 II D(u)E[D(u)1Y = y,~

·q(rI8, v)qT (y, 8, x, (y, 8, v){L(dy)

0)

o,v = 0)

O,:r 1,

0, :r

1 ~ = O.

= I.~ = 0. V1.

o.

O,O)'\dtE[D(u)IY = I,~ 0, V = 0]q(I,8

+ --'----'-

Documents

Bin I'.UillOUU, and Jon A. Wellnerexplicit calculations for two special cases: case cohort studies and exposure-stratifiedcase-cohort studies. Details of the calculations in Section