Department of Biostatistics University of Copenhagen · Research Report 13/03 Department of Biostatistics University of Copenhagen. ... this article we discuss and formalize an earlier

Concordance for prognostic models with competing risks

Marcel Wolbers

Michael T. Koller

Jacquline C. M. Witteman

Thomas A. Gerds

Research Report 13/03

Department of Biostatistics University of Copenhagen

Concordance for prognostic models with

competing risks

MARCEL WOLBERS∗1, MICHAEL T. KOLLER2,JACQUELINE C.M. WITTEMAN3, and THOMAS A. GERDS4

1Oxford University Clinical Research Unit, Wellcome Trust MajorOverseas Programme, Ho Chi Minh City, Viet Nam, and Centre

for Tropical Medicine, Nuffield Department of Medicine, Universityof Oxford, Oxford, UK

2Basel Institute for Clinical Epidemiology and Biostatistics,University Hospital Basel, 4031 Basel, Switzerland

3 Erasmus Medical Center, 3015 Rotterdam, The Netherlands4Department of Biostatistics, University of Copenhagen, 1014

Copenhagen K, Denmark

February 19, 2013

Abstract

The concordance probability is a widely used measure to assess dis-crimination of prognostic models with binary and survival endpoints. Inthis article we discuss and formalize an earlier definition of the concor-dance probability for regression models with competing risks endpoints(Wolbers et al, Epidemiology 2009;20: 555-561). For right censored data,we investigate inverse probability of censoring weighted (IPCW) estimatesof a truncated concordance index. The estimates are based on a workingmodel for the censoring distribution. The simplest model assumes thatthe censoring distribution is independent of the predictor variables. If theworking model is correctly specified then the IPCW estimate consistentlyestimates the concordance probability. The small sample properties of theestimates are assessed in a simulation study also against misspecificationof the working model. We further illustrate the methods by computing theconcordance probability for a prognostic model of coronary heart disease(CHD) events in the presence of the competing risk of non-CHD death. Cindex; Competing risks; Concordance probability; Coronary heart disease;Prognostic models.

1

1 Introduction

Clinical decision-making and cost-effectiveness analyses often rely on prognosticmodels that quantify a subject’s absolute risk of a disease event of interest overtime. Traditionally, these models are based on survival analysis and a large bodyof research discusses the development and validation of prognostic models fortime-to-event data [9, 22, 24]. However, study populations of common diseasesincreasingly consist of elderly individuals with varying degrees of co-morbiditywho are likely to experience one of several disease endpoints other than theendpoint of main interest [14]. As an example, prediction of coronary heartdisease (CHD) events in elderly subjects is complicated by the fact that subjectsmay die from other causes prior to the observation of the disease event of interest[25, 13].

Another example is the quantification of the benefit from implantable cardio-verter-defibrillators (ICDs) in patients at risk of sudden cardiac death. Patientswith an ICD profit from implantation if the device appropriately treats a life-threatening cardiac arrhythmia event (e.g. ventricular fibrillation). However,any benefit from ICD implantation is abolished if a patient dies prior to appro-priate ICD therapy. It is thus of interest whether death without prior appro-priate ICD therapy occurs in a relevant proportion of patients and how suchpatients could be identified prospectively [15].

It is well known that the naive application of standard survival analysis leadsto bias and risk over-estimation if competing risks are present [8, 18]. Prognos-tic models that take competing risk events into account are thus of growingimportance. In many applications, including the examples mentioned above,the parameter of main interest for medical decision-making is the absolute riskof the event of interest as quantified by its (covariate-dependent) cumulativeincidence function [4]. Thus, regression models are particularly attractive whenthey provide subject specific estimates of the absolute risks based on a set ofcovariates [7].

Several measures for quantifying the accuracy of prognostic models havebeen adapted from the standard survival setting with only one failure cause tocompeting risks. Measures include prediction error curves [21], time-dependentsensitivity, specificity and AUC [19, 27], and reclassification methods [25, 13].Commonly these measures assess prognoses for the event status at a fixed follow-up duration t. In contrast, the concordance index assesses prognoses for the or-der of the event times. For survival data, the concordance index [10] is the mostwidely reported measure of discrimination and we have previously presented asimple adaptation of the concordance index to the competing risks setting [25].

In the present paper, we formally define a cause-specific concordance index inthe presence of competing risks and derive estimators suitable for right censoreddata. Notably, the proposed concordance index depends only on the cumula-tive incidence function of the event of interest. To illustrate the definition, wedescribe the behaviour of the concordance index for data generated accordingto a proportional cause-specific hazards model as well as for data from a directmodel for the cumulative incidence function.

2

We then study estimation of a truncated concordance index in the presenceof right-censoring. We introduce an inverse probability of censoring weighted(IPCW) estimator and demonstrate its consistency if the working model for thecensoring distribution is correctly specified. The empirical bias and mean-squareerror are examined in a simulation study. Finally, we illustrate the methods foran example of coronary risk prediction in older woman using data from theRotterdam Study [12].

2 Definition of concordance

2.1 Assessing scores in uncensored data

Competing risks data without censoring are given by pairs (T,D) of data whereT is the time to the event and D is the event type. For the purpose of dis-cussing the definition and estimation of the cause-specific concordance indexit is sufficient to assume that there are only two competing events. Thus, forsimplicity of presentation we let D = 1 denote the event of interest and D = 2the occurrence of any competing event. In applications it may be important tomodel all competing events separately.

The concordance index is defined for any prognostic score M(X) which canbe used to order subjects with respect to the risk of an event of type 1. Forexample M(X) could be the linear predictor of a regression model for the eventof interest derived on a training data set. We assume that higher values of M(X)are associated with higher risk of the event 1. The concordance probabilityassesses the discriminatory power of the prognostic score. Intuitively, M(X)discriminates well if it assigns high risks to individuals experiencing the event ofinterest early, lower risks to individuals experiencing the event of interest late,and negligible risk to those who never experience the event of interest. Thelatter occurs when the individual experiences a competing event [25].

To formally define the concordance index for uncensored data we assumean independent test data set consisting of i.i.d. realizations of (Xi, Ti, Di) (i =1, ..., n) from the joint distribution of predictor variables and competing risksdata. We then define

C1 = P (M(Xi) > M(Xj)|Di = 1 and (Ti < Tj or Dj = 2)) (1)

for any randomly chosen pair of subjects i, j from the joint distribution of pre-dictor variables and competing risks data. According to this definition, we counttwo types of pairs of individuals for which the ranking of the prognostic scorecan be concordant or discordant. The first type are pairs where individual iexperiences the event of interest while individual j is still event free. The sec-ond type are pairs where individual i experiences the event of interest whileindividual j experiences the competing event at any time. For both types itis intuitive to call the pair concordant when the prognostic score is higher forindividual i. Pairs of individuals which both experience the competing event

3

are not evaluable for C1 but are counted for the cause-specific concordance C2

for the competing event which is defined analogously.It is important to discuss modifications of definition (1) for tied data [see

26]. For example, it may happen that M(Xi) = M(Xj). Depending on theapplication, it may then be sensible to count such pairs with a weight of 1/2:

C1 = P (M(Xi) > M(Xj)|Di = 1 and (Ti < Tj or Dj = 2))

+1

2P (M(Xi) = M(Xj)|Di = 1 and (Ti < Tj or Dj = 2)).

To simplify notation, we used definition (1) as the basis for our further devel-opments.

2.2 Assessing prediction models in right censored data

We now generalize the concordance index defined in Section 2.1 in two ways.First, we replace the simple prognostic score M(X) by a more general predictionmodel M(t,X) for the risk of event 1 until time t. In this way we includeregression models for the cumulative incidence of the event of interest at timet, i.e. estimates of F1(t|X) = P (T ≤ t,D = 1|X), which can be obtained bycombining cause-specific hazards models, by fitting a Fine and Gray regressionmodel or by direct binomial regression [3, 20]. Second, to include the typicalapplication where individuals have a limited duration of follow-up (i.e. in thepresence of right censoring) we define a truncated version of the concordanceindex. Following [23] and [6] we define

C1(t) := P (M(t,Xi) > M(t,Xj)|Di = 1 and Ti ≤ t and (Ti < Tj or Dj = 2)).(2)

The parameter C1(t) can be interpreted as the ability of the model to correctlyrank events of interest up to time t and to distinguish them from competingevents. The truncation is necessary to enable estimation of C1(t) from rightcensored data. Note that C1 = C1(∞) in the special case where the order of apair of predictions does not depend on time, i.e. M(t,Xi) = M(s,Xi) for all s, t.However, C1(∞) is only identifiable when the cumulative incidence F1(·|X) canbe identified on [0,∞) for all X. But in most real world applications even themarginal Aalen-Johansen estimator for F1 (which is independent of X) is onlydefined on [0, tmax) where tmax is the maximal follow-up length of the study.

Indeed, the truncated concordance (2) can be written as a functional of F1

and of the marginal distribution of the predictor values of a pair of individuals(Xi, Xj). To see this, we introduce notation for the order of the predicted risksfor a pair of individuals,

Qij(t) = I{M(t,Xi) > M(t,Xj)}.

4

We also note that I({s < Tj or Dj = 2}) = 1− I({Tj ≤ s,Dj = 1}). Thus,

P{Di = 1 and Ti ≤ t and (Ti < Tj or Dj = 2)}

= EXi,Xj

∫ t

0

(1− F1(s|Xj))dF1(s|Xi) (3)

and we can rewrite the truncated concordance index (2) as

C1(t) =EXi,Xj

(Qij(t)∫ t0(1− F1(s|Xj))dF1(s|Xi))

EXi,Xj(∫ t0(1− F1(s|Xj))dF1(s|Xi))

. (4)

Of note, in the form given in equation (4) the cause-specific concordance forevent 1 does not depend on the cumulative incidence F2 of the competing event.This feature is not obvious in formula (2) but desirable when the aim is to assessthe discriminative ability of a model for F1.

3 Illustration for proportional cause-specific haz-ards models and direct models for the cumu-lative incidence

Cause-specific Cox regression models and direct models for the cumulative inci-dence are the most frequently used regression models for competing risks [18].In this section, we illustrate the properties of the concordance probability in set-ting where the true data-generating mechanism is either a set of cause-specificproportional hazards models or a Fine-Gray model. In both situations, we assesthe concordance of a time-independent prognostic score. Specifically, we simu-late a prognostic index M(X) = X by drawing values from the standard normaldistribution. Using a Monte Carlo approach we then approximated C1 = C1(∞)based on simulated data sets, each consisting of n = 10, 000 uncensored i.i.d.realizations drawn from the respective data-generating mechanism.

3.1 Proportional cause-specific hazards models

Data were generated under proportional cause-specific hazards models with con-stant baseline hazards:

Event 1: λ1(t|X) = λ01 exp(β1X)

Event 2: λ2(t|X) = λ02 exp(β2X)

We set λ01 = 1 and considered three values for λ02 = {0, 0.5, 2}. Note thatλ02 = 0 corresponds to no competing risks. Four different scenarios for β2 wereconsidered: (1) β2 = 0, (2) β2 = −β1, (3) β2 = β1, (4) β2 = 2β1.

For all scenarios, we calculated the concordance probability C1 as dependingon the strength of association of the covariate with the cause-specific hazard of

5

the event of interest by varying β1 from 0 to 10. The results are displayed inFigure 1.

Figure 1 shows only weak dependence of the concordance probability on thecause-specific hazard of the competing risk if the latter is not affected by thecovariate (black solid lines). If the covariate is associated with an increasedhazard of the event of interest but a decreased hazard of the competing event,the discrimination ability improves (black dashed lines lines). In contrast, thediscrimination ability is hampered if covariates affect both cause-specific haz-ards with regression coefficients of the same sign (gray lines). This occurs inparticular if there is a strong association with the competing risk (gray dashedlines) or if the baseline competing hazard is high (right panel).

These results can be explained by the fact that the overall effect of a covariateon the cumulative incidence function of the event of interest depends on the sizeof the cause-specific baseline hazard of the event of interest and the competingevent as well as on the size of both cause-specific hazard ratios [1, 14]. In partic-ularly, a covariate that is positively associated with the cause-specific hazard ofthe event of interest can simultaneously be negatively associated with the corre-sponding cumulative incidence function of that event if the cause-specific hazardof the competing event is larger and shows a stronger positive association withthe covariate. This explains the observed concordance probabilities < 0.5. Im-portantly, these findings illustrate that a prognostic model for the absolute riskof the event of interest with good discrimination requires predictors which areonly weakly or, even better, reversely associated with the cause-specific hazardof the competing event.

3.2 Direct models for the cumulative incidence

We generated data from the following Fine-Gray model [3]:

P (T ≤ t,D = 1|X) = 1− [1− p(1− exp(−t))]exp(βX).

Note that when X = 0, this is a unit exponential mixture with mass 1 − p at∞, such that the probability of an event of type 1 in the interval [0,∞) is p.We varied the value of p between values 0.1, 0.5, and 0.9. As before we variedthe strength of association β from 0 to 10. Results are displayed in Figure 2.In addition to the expected positive association between β and the concordanceprobability there is also a weak dependence on p.

4 Estimation of concordance in the presence ofright censoring

4.1 Right censored data

To indicate the end of follow up for subject i we introduce a subject specificcensoring time Ci. Thus, we observe only (Xi, Ti, Di,∆i) (i = 1 . . . n) where

6

Ti = min(Ti, Ci), ∆i = I{Ti ≤ Ci} and Di = ∆iDi. We also use the followingnotation:

N1i (t) = I{Ti ≤ t, Di = 1} N1

i (t) = I{Ti ≤ t,Di = 1}N2i (t) = I{Ti ≤ t, Di = 2} N2

i (t) = I{Ti ≤ t,Di = 2}Aij = I{Ti < Tj} Aij = I{Ti < Tj}Bij = I{Ti ≥ Tj and Dj = 2} Bij = I{Ti ≥ Tj and Dj = 2}

The event-free survival probability conditional on the covariate Xi is then givenby

S(t|Xi) = 1− E(N1i (t)|Xi)− E(N2

i (t)|Xi) = 1− F1(t|Xi)− F2(t|Xi).

We allow the censoring distribution to depend on the covariates Xi but assumethroughout that Ci is conditionally independent of (Ti, Di) given Xi. Thisimplies

P(Ti > t|Xi) = G(t|Xi)S(t|Xi), (5)

whereG(t|Xi) = P (Ci > t|Xi) is the conditional probability of being uncensoredat time t. Noting G(t− |Xi) = P (Ci ≥ t|Xi) we also have for k = 1, 2:

E(Nki (t)|Xi) = E(∆iN

ki (t)|Xi) =

∫ t

0

E(Ci ≥ s|Xi) E(dNki (s)|Xi)

=

∫ t

0

G(s− |Xi) dFk(s|Xi). (6)

4.2 Ignoring non-evaluable pairs

An asymptotically biased estimate of C1(t) is given by

C1,naive(t) =

∑ni=1

∑nj=1(Aij + Bij)Q

ij(t)N1i (t)∑n

i=1

∑nj=1(Aij + Bij)N1

i (t)

whereQij(t) = I{M(t,Xi) > M(t,Xj)} is an indicator for the order of predictedrisks at time t.

This can be interpreted as the proportion of definitely concordant pairsamongst evaluable pairs, i.e. pairs for which one individual experiences theevent of interest and concordance can be decided based on the observed (po-tentially censored) data. The evaluable pairs are all pairs of the form ((Di =1, Ti,M(t,Xi)), (Dj , Tj ,M(t,Xj))) with Ti ≤ t and Tj > Ti or Dj = 2. A”good” prognostic model should assign a lower risk to an individual that ex-periences an event (of any type) late (Tj > Ti) or never experiences the event

of interest (Dj = 2). Thus, concordant pairs are those evaluable pairs withM(t,Xi) > M(t,Xj), i.e. Qij(t) = 1. Pairs for which both individuals areeither censored or experience the competing event as well as pairs for which

7

the first individual experiences the event of interest and the second is censoredbefore that time point are considered non-evaluable because it is impossible todecide whether the predictions for such a pair are concordant or not. This sim-ple estimate evaluated at the time tmax of the maximum follow-up duration hasbeen previously defined in (author?) [25]. While simple, a major problem ofthis estimator is that by ignoring non-evaluable pairs without any correction,bias is introduced. It is well known that Harrel’s C depends on the censoringdistribution [23, 6] and C1,naive(t) shares this limitation.

4.3 IPCW estimate

We use a “working model” to derive an estimate G for G, the conditional prob-ability of being uncensored given the predictors. For example, we could assumea Cox regression model:

G(t|Xi) = exp{−∫ t

0

exp(γTXi)Γ0(s)ds}

where Γ0 is an unknown baseline hazard function and γ a vector of regressioncoefficients that quantify the effect of the predictors on the censoring hazard.Alternatively, we could assume that the censoring is independent of the predic-tors and estimate G with the Kaplan-Meier estimate for the censoring times.

Based on G we define weights

Wij,1 = G(Ti − |Xi)G(Ti|Xj) and Wij,2 = G(Ti − |Xi)G(Tj − |Xj).

and derive the following inverse probability of censoring weighted (IPCW) esti-mate of C1(t):

C1(t) =

∑ni=1

∑nj=1

(AijW

−1ij,1 + BijW

−1ij,2

)Qij(t)N1

i (t)∑ni=1

∑nj=1

(AijW

−1ij,1 + BijW

−1ij,2

)N1i (t)

(7)

Lemma 1. Under conditionally independent censoring, if the working modelis correctly specified in the sense that G consistently estimates G, then C1(t)consistently estimates C1(t) for all t with G(t|x) > ε > 0 uniformly over x.

A proof of the lemma is given in the supplementary material.

5 Simulation study

A simulation study was performed to assess bias and root mean square error(RMSE) of the proposed IPCW estimator of C1(t). As in Section 3.1, the com-peting risks data were simulated based on a standard normal covariate X andCox proportional hazards models with constant cause-specific baseline hazards.Specifically, we chose either cause-specific hazards λ01(t) = 1, λ02(t) = 2 andregression coefficients β1 = β2 = 1 (scenario CR1) or λ01(t) = 1, λ02(t) = 0.5,

8

and β1 = 2, β2 = −1 (scenario CR2). The truncation time point t was chosenas either the median or the upper quartile q3 of the time to event distribution.The true concordance C1(t) between X and failure cause 1 ranged from 0.62 to0.86 for the different competing risks scenarios and truncation points (Table 1).

We simulated both independent and dependent right-censoring for the data.Independent censoring was based on an exponential censoring distribution withthe rate of the exponential distribution chosen such that (on average) 40% ofevents prior to t were censored. Dependent censoring was chosen according toan exponential distribution with rate λcens ·exp(X) with λcens chosen such that30% of events prior to t were censored. Finally, we varied the sample size Nbetween 250 and 1000 observations per simulated data set.

Three different estimators of C(t) were investigated: the naive estimatorC1,naive(t) ignoring non-evaluable pairs, an IPCW estimator C1,KM(t) based ona Kaplan-Meier estimate of the censoring distribution, and an IPCW estimatorC1,Cox(t) which estimated the censoring distribution with a Cox proportionalhazards model depending on X.

Results for all 16 simulated scenarios based on 1000 simulations per scenarioare displayed in Table 1. Lower empirical bias for the IPCW-based estimatorsand reduced bias for the Cox-model based IPCW estimator compared to theKaplan-Meier based estimator in situations with dependent censoring are evi-dent for almost all scenarios. The IPCW estimates also tended to have lowerRMSE with substantial improvements for some scenarios. No consistent im-provements in RMSE of the Cox-model based estimator over the Kaplan-Meierestimator were apparent in the situation of dependent censoring.

6 Application to coronary risk prediction

Specialist medical societies recommend initiation of preventive treatment forcoronary heart disease (CHD) based on a subjects’ predicted 10-year risk forCHD [17]. To accurately predict the absolute risk of CHD in older people,prognostic models for CHD need to account for the competing risk of non-CHD death [13]. In this section, we revisit the example of [25] on coronaryrisk prediction based on data of elderly women from the Rotterdam Study, aprospective, population-based cohort of elderly subjects living in a suburb areaof Rotterdam, the Netherlands [12].

We analysed data from 10 years of follow-up of 4144 women aged between 55and 90 years who were free of CHD at baseline. During that follow-up period,389 women experienced a CHD event and 921 women died without prior CHDevent. Only 41 women of those event-free had less than 10 years of follow-up.We randomly split the data set into a training data set (2763 women with 249CHD events) and a validation data set (1381 women with 140 CHD events).

Using the training set we estimated the parameters of a Fine-Gray regres-sion model which included the ”traditional” baseline risk factors for CHD: age,treatment for high blood pressure (yes versus no), systolic blood pressure (sep-arate slopes depending on whether the subject was on blood pressure treatment

9

or not), diabetes mellitus, log-transformed total cholesterol to HDL cholesterolratio, and smoking status (current versus never or former smoker). All these riskfactors were associated with an increased CHD risk and, except for diabetes,all reached conventional significance (p < 0.05). We also fitted a second Fine-Gray model with age, the strongest predictor, as the only covariate. Finally weestimated two Cox regression models, one for the cause-specific hazard of CHDand one for the cause-specific hazard of non-CHD death. In both Cox analy-ses the same factors were included as for the multivariable Fine-Gray model.Here all risk factors were significantly associated with an increased hazard ofCHD. Also, the factors age, smoking and diabetes were significantly associ-ated with an increased hazards of non-CHD death whereas the log-transformedtotal cholesterol to HDL cholesterol ratio showed a significant inverse associ-ation. Treatment for hypertension and blood pressure were not significantlyassociated with the hazard of non-CHD death. The two Cox analyses yieldedcause-specific hazard predictions λ1(t|x) and λ2(t|x), respectively, which weresubsequently combined into a prediction of the cumulative incidence function ofcause j based on the formula:

Fj(t|x) = 1−∫ t

0

exp

{−∫ s

0

(λ1(u|x) + λ2(u|x))du

}λj(s|x)ds.

Concordance estimates were obtained for these models in the validation set.The dependence of the censoring distribution on the covariates was investigatedwith a Cox regression model which yielded no trends and non-significant Waldtests for all variables. Thus, all IPCW estimates of concordance were basedon the marginal Kaplan-Meier estimator for the censoring distribution from thevalidation set.

The left panel of Figure 3 shows the discrimination ability of the two Fine-Gray models for varying time horizons between 1 and 10 years. The multivari-able model shows higher discriminative ability compared to the model basedon age alone. For both models, concordance estimates stabilize after about 2.5years of follow-up and remain fairly stable though slightly decreasing. The de-crease may occur because earlier events are easier to predict than later events.The multivariable cause-specific Cox models showed a similar trend but slightlylower c-index in the first years compared to the multivariable Fine-Gray model(curve not shown).

The right panel of Figure 3 shows corresponding prediction error curves[21, 7] in the validation set which assesses the overall performance of the models(i.e. both discrimination and calibration) whereas the concordance assessesdiscrimination only.

The upper half of Table 2 shows the estimated concordance for the cumu-lative incidences of CHD and non-CHD death, respectively, after 1, 5 and 10years, as were obtained with the single split into training and validation data.The multivariable models were comparable and showed superior discriminationcompared to the the age only model for the prediction of CHD whereas im-provements were much smaller for the prediction of non-CHD death. This is

10

not surprising as these additional covariates are established CHD-specific riskfactors which would not be expected to strongly affect non-CHD death (exceptfor other deaths related to the cardiovascular system).

The lower half of Table 2 shows bootstrap-crossvalidation estimates [6] ofconcordance obtained by averaging the results of 1000 splits into a bootstraptraining set (size 4144, same as the original data) and a corresponding validationset (size n ∼ 1525). The estimated concordance for CHD at the 1 year horizonwas higher as compared to the results of the single split for all models. Onepossible explanation for this result is that the models where trained in bootstrapsamples which contains more information than the training sample used for thesingle split. Another possible explanation for the differences is that the singlesplit analysis is highly dependent on how the data are split.

7 Discussion

We have presented a formal definition of the concordance probability for prog-nostic models in the presence of competing risks. Like the concordance prob-ability for survival or binary data, it provides a simple numeric measure ofdiscrimination. To deal with right censored data we derived a consistent IPCWestimator of the truncated concordance probability.

The fact that we estimate a truncated version of concordance rather thanthe unconstrained concordance probability should not be seen as a limitation ofour approach. Indeed it is impossible to assess the performance of prognosticmodels beyond the maximum follow-up duration without strong and untestableassumptions. This is not a particularity of the competing risks situation andalso known for survival data without competing risks [23, 6].

Consistency of the proposed estimator relies on the assumption that thecensoring distribution is correctly specified. In many applications it will be rea-sonable to assume that the censoring mechanism is independent of the predictorvariables and then the marginal Kaplan-Meier estimate (ignoring the predictorvariables) of the censoring distribution can be used to derive a consistent IPCWestimate.

The work presented here is related to the work of [19] and, especially, [27] ontime-dependent sensitivity, specificity and ROC curves for competing risks. Fortheir definition of sensitivity and specificity, these authors defined (cumulative)cases and (dynamic) controls at time t as observations with T ≤ t and D = 1(cases) and those with T > t (controls). In addition and similar to the presentwork, [27] also consider alternative controls defined as observations with T > tor D = 2. With this alternative definition of controls, the area under the ROCcurve at time t from [27] is similar to our definition of concordance. However,the estimators in [27] rely on cause-specific (smoothed) Cox regression modelsand are different from the IPCW estimator proposed here. The advantage ofthe IPCW estimator is that it does not require a model for the competing risksprocess but only an estimate of the cenroring distribution.

We assumed that the prognostic model was derived on an independent train-

11

ing data set. While we have not analytically derived formulas for the asymptoticvariance of the estimator, confidence intervals based on bootstrapping the testdata set are readily available in this situation. Clearly, independent trainingdata are not always available, and even if they are, a joint analysis of all datawill be more efficient. However, some form of internal cross-validation is neededto develop and assess a prognostic model with a single data set [2, 9, 11, 5, 22].

It is important to emphasize that our definition of concordance assesses aprognostic model for the absolute risk of the event of interest in the presenceof competing risks. In line with earlier work [4, 25] we regard this risk as cru-cial for medical decision-making in the competing risks setting. However, inmany instances explicit consideration of competing events will also be impor-tant and modeling the entire competing risks multi-state process will providefurther insights [1]. As an example, we saw in Section 3.1 that discrimination ofprognostic models for the event of interest is hampered if covariates affect bothcause-specific hazards with regression coefficients of the same sign, especially ifthere is a strong association with the competing risk or if the baseline competinghazard is high. This indicates that to achieve high discrimination ability oneneeds predictors which are only weakly or, even better, reversely associated withthe cause-specific hazard of the competing event. Moreover, in settings whereall competing events are of similar importance, joint accuracy criteria for theentire competing risks multi-state process are needed and their development isan important area for future research.

Acknowledgments

Marcel Wolbers was supported by the Wellcome Trust and the Li Ka ShingFoundation University of Oxford Global Health Programme. Conflict of Inter-est: None declared.

References

[1] J. Beyersmann, M. Dettenkofer, H. Bertz, and M. Schumacher. A com-peting risks analysis of bloodstream infection after stem-cell transplanta-tion using subdistribution hazards and cause-specific hazards. Statistics inMedicine, 26:5360–5369, 2007.

[2] Bradley Efron and Robert Tibshirani. Improvements on cross-validation:The .632+ bootstrap method. Journal of the American Statistical Associ-ation, 92:548–560, 1997.

[3] J. P. Fine and R. J. Gray. A proportional hazards model for the subdistri-bution of a competing risk. Journal of the American Statistical Association,446:496–509, 1999.

[4] M. H. Gail and R. M. Pfeiffer. On criteria for evaluating models of absoluterisk. Biostatistics, 6:227–239, 2005.

12

[5] T. A. Gerds, T. Cai, and M. Schumacher. The performance of risk predic-tion models. Biometrical Journal, 50:457–479, 2008.

[6] T. A. Gerds, M. W. Kattan, M. Schumacher, and C. Yu. Estimating atime-dependent concordance index for survival prediction models with co-variate dependent censoring. Statistics in Medicine, page Early view: DOI:10.1002/sim.5681, 2012.

[7] T.A. Gerds, T.H. Scheike, and P.K. Andersen. Absolute risk regression forcompeting risks: interpretation, link functions, and prediction. Statisticsin Medicine, 31:3921–3930, 2012.

[8] G. L. Grunkemeier, R. Jin, M. J. Eijkemans, and J. J. Takkenberg. Actualand actuarial probabilities of competing risks: apples and lemons. Annalsof Thoracic Surgery, 83:1586–1592, 2007.

[9] F. E. Harrell. Regression Modeling Strategies. Springer Series in Statistics,New York, 2001.

[10] F.E. Harrell, R. M. Califf, D. B. Pryor, K. L. Lee, and R. A. Rosati.Evaluating the yield of medical tests. Journal of the American MedicalAssociation, 247:2543–2546, 1982.

[11] T. Hastie, R. Tibshirani, and J. H. Friedman. The Elements of StatisticalLearning (Second Edition). Springer Series in Statistics, New York, 2009.

[12] A. Hofman, C. M. van Duijn, O. H. Franco, M. A. Ikram, H. L. Janssen,C. C. Klaver, E. J. Kuipers, T. E. Nijsten, B. H. Stricker, H. Tiemeier, A. G.Uitterlinden, M. W. Vernooij, and J. C. Witteman. The Rotterdam Study:2012 objectives and design update. European Journal of Epidemiology,26:657–686, Aug 2011.

[13] M. T. Koller, M. J. Leening, M. Wolbers, E. W. Steyerberg, M. G. Hunink,R. Schoop, A. Hofman, H. C. Bucher, B. M. Psaty, D. M. Lloyd-Jones, andJ. C. Witteman. Development and validation of a coronary risk predictionmodel for older U.S. and European persons in the cardiovascular healthstudy and the Rotterdam Study. Annals of Internal Medicine, 157:389–397, Sep 2012.

[14] M. T. Koller, H. Raatz, E. W. Steyerberg, and M. Wolbers. Competingrisks and the clinical community: irrelevance or ignorance? Stat Med,31:1089–1097, May 2012.

[15] M. T. Koller, B. Schaer, M. Wolbers, C. Sticherling, H. C. Bucher, andS. Osswald. Death without prior appropriate implantable cardioverter-defibrillator therapy: a competing risk study. Circulation, 117:1918–1926,2008.

13

[16] Ulla B. Mogensen, Hemant Ishwaran, and Thomas A. Gerds. Evaluatingrandom forests for survival analysis using prediction error curves. Journalof Statistical Software, 50:1–23, 2012.

[17] NCEP. Third Report of the National Cholesterol Education Program(NCEP) Expert Panel on Detection, Evaluation, and Treatment of HighBlood Cholesterol in Adults (Adult Treatment Panel III) final report. Cir-culation, 106:3143–3421, 2002.

[18] H. Putter, M. Fiocco, and R. B. Geskus. Tutorial in biostatistics: compet-ing risks and multi-state models. Stat Med, 26:2389–2430, 2007.

[19] P. Saha and P. J. Heagerty. Time-dependent predictive accuracy in thepresence of competing risks. Biometrics, 66:999–1011, 2010.

[20] Thomas Scheike, M.J. Zhang, and Thomas Alexander Gerds. Predictingcumulative incidence probability by direct binomial regression. Biometrika,95:205–220, 2008.

[21] R. Schoop, J. Beyersmann, M. Schumacher, and H. Binder. Quantifying thepredictive accuracy of time-to-event models in the presence of competingrisks. Biometric Journal, 53:88–112, 2011.

[22] E. W. Steyerberg. Clinical Prediction Models: A Practical Approach toDevelopment, Validation, and Updating. Springer, New York, 2009.

[23] H. Uno, T. Cai, M. J. Pencina, R. B. D’Agostino, and L. J. Wei. On theC-statistics for evaluating overall adequacy of risk prediction procedureswith censored survival data. Statistics in Medicine, 30:1105–1117, 2011.

[24] H. van Houwelingen and H. Putter. Dynamic Prediction in Clinical SurvivalAnalysis. Chapman & Hall/CRC Monographs on Statistics & AppliedProbability, 2011.

[25] M. Wolbers, M. T. Koller, J. C. Witteman, and E. W. Steyerberg. Prog-nostic models with competing risks: methods and application to coronaryrisk prediction. Epidemiology, 20:555–561, 2009.

[26] G. Yan and T. Greene. Investigating the effects of ties on measures ofconcordance. Statistics in medicine, 27:4190–4206, 2008.

[27] Y. Zheng, T. Cai, Y. Jin, and Z. Feng. Evaluating prognostic accuracy ofbiomarkers under competing risk. Biometrics, 68:388–396, 2012.

14

Supplementary Materials

Section A of the supplementary material illustrates the software implementationof the proposed concordance estimator in the function cindex of the R packagepec [16]. Section B provides a proof of Lemma 1.

A Software implementation

The presented concordance estimators for competing risks have been imple-mented in the function cindex of the R package pec [16]. The following boxshows R code for evaluating C1(t) at t = 1, 5, and10 simultaneously for a Fine-Gray regression model and the combination of cause-specific Cox regressionwith age as the only covariate based on a marginal Kaplan-Meier model for thecensoring distribution.

1 library(pec)

2 library(riskRegression)

3 library(cmprsk)

45 # Fit Fine-Gray and cause-specific models for failure cause 1 to the

training data

67 fg.train <- FGR(Hist(ttevent,evtype)~age,data=chd.train,cause=1)

8 cox.train <- CSC(Hist(ttevent,evtype)~age,data=chd.train,cause=1)

910 # Calculate truncated concordance at t=1,5,10 years

11 # for CHD in the test data

1213 cindex(list(fg.train,cox.train),

14 formula=Hist(ttevent,evtype)~1,

15 cens.model="marginal",

16 data=chd.test,eval.times=c(1,5,10),cause=1)

1718 # Fit Fine-Gray and cause-specific models for failure cause 1 in all

data

1920 fg <- FGR(Hist(ttevent,evtype)~age,data=chd,cause=1)

21 cox <- CSC(Hist(ttevent,evtype)~age,data=chd,cause=1)

2223 # Calculate truncated concordance at t=1,5,10 years

24 # for CHD using bootstrap cross-validation

2526 cindex(list(fg.train,cox.train),

27 formula=Hist(ttevent,evtype)~1,

28 cens.model="marginal",

29 splitMethod="bootcv",

30 B=1000,

31 data=chd,eval.times=c(1,5,10),cause=1)

15

B Proof of Lemma 4.1

It is assumed that G is consistent for G. Thus, Slutsky’s lemma shows that theweights converge in probability to

Wij,1 = G(Ti − |Xi)G(Ti|Xj) and Wij,2 = G(Ti − |Xi)G(Tj − |Xj)

as n → ∞. By the law of large numbers and Slutsky’s Lemma it follows thatC1(t) converges in probability to

EXi,Xj

[Qij(t)

(E{AijN1

i (t)W−1ij,1|Xi, Xj}+ E{BijN1

i (t)W−1ij,2|Xi, Xj}

)]EXi,Xj

[(E{AijN1

i (t)W−1ij,1|Xi, Xj}+ E{BijN1

i (t)W−1ij,2|Xi, Xj}

)] . (8)

By applying equations (4.4) and (4.5) we have

E(AijN1i (t)|Xi, Xj) =

∫ t

0

E(I{Tj > s|Xj}) E(dN1i (s)|Xi)

=

∫ t

0

G(s|Xj)S(s|Xj)G(s− |Xi)dF1(s|Xi)

and hence

E(AijN1i (t)W−1

ij,1|Xi, Xj) =

∫ t

0

S(s|Xj)dF1(s|Xi).

Similarly by equation (4.5) we have

E(BijN1i (t)|Xi, Xj) = E(∆i∆jBijN

1i (t)|Xi, Xj)

=

∫ t

0

∫ s

0

G(v − |Xj)G(s− |Xi)dF2(v|Xj) dF1(s|Xi)

which yields

E(BijNi(t)Wij,2|Xi, Xj) =

∫ t

0

F2(s|Xj)dF1(s|Xi).

Inserting shows that expression (8) equals the expression for C1(t) given in (2.3):

EXi,Xj

[Qij(t)

∫ t0{S(s|Xj) + F2(s|Xj)}dF1(s|Xi)

]EXi,Xj

[∫ t0{S(s|Xj) + F2(s|Xj)}dF1(s|Xi)

] =EXi,Xj

[Qij(t)

∫ t0{1− F1(s|Xj)}dF1(s|Xi)

]EXi,Xj

[∫ t0{1− F1(s|Xj)}dF1(s|Xi)

] .

16

Tables and Figures

Figure 1: Concordance for cause-specific hazards models depending on the effectsize β1 as well as on the baseline hazard λ02(t) and the regression coefficientfor the competing risk. Black solid lines refer to β2 = 0, black dashed lines toβ2 = −β1, gray solid lines to β2 = β1, and gray dashed lines to β2 = 2β1.

0 2 4 6 8 10

0.2

0.4

0.6

0.8

1.0

β1

Con

cord

ance

C~

1

λ02(t) = 0

(no competing risks)

0 2 4 6 8 10

0.2

0.4

0.6

0.8

1.0

β1

Con

cord

ance

C~

1

λ02(t) = 0.5

0 2 4 6 8 10

0.2

0.4

0.6

0.8

1.0

β1

Con

cord

ance

C~

1

λ2(t) = 2

17

Figure 2: Concordance for Fine-Gray models depending on the effect size β andthe baseline probability p of an event of interest in [0,∞). Lines correspond top = 0.1 (gray dashed line), 0.5 (black line), and 0.9 (gray dash-dotted line).

0 2 4 6 8 10

0.5

0.6

0.7

0.8

0.9

1.0

β

Con

cord

ance

C~

1

18

Tab

le1:

Sim

ula

tion

sett

ings

,b

ias

and

root

mea

nsq

uare

erro

r(R

MS

E)

for

the

16

scen

ari

osof

the

sim

ula

tion

stud

y.B

ias

an

dR

MS

Efo

rea

chsc

enar

ioar

eb

ased

onre

sult

sfr

om

1000

sim

ula

ted

data

sets

.C

R=

com

pet

ing

risk

sp

roce

ss.

Sim

ula

tion

sett

ing

Bia

s(×

100)

RM

SE

(×100)

CR

tce

nso

rin

gN

C 1(t

)C 1,naive(t

)C 1,K

M(t

)C 1,C

ox(t

)C 1,naive(t

)C 1,K

M(t

)C 1,C

ox(t

)1

med

ian

ind

ep.4

025

00.

671.8

5-0

.05

-0.0

25.6

15.4

65.4

61

med

ian

ind

ep.4

010

000.

672.1

20.0

70.0

83.3

72.7

52.7

51

med

ian

dep

.30

250

0.67

0.0

6-0

.73

-0.5

34.8

04.7

54.9

61

med

ian

dep

.30

1000

0.67

0.3

3-0

.40

0.0

72.3

22.3

42.4

61

q3

ind

ep.4

025

00.

623.8

5-0

.04

0.0

45.8

64.9

74.9

61

q3

ind

ep.4

010

00

0.62

4.0

20.1

10.1

34.6

02.4

52.4

61

q3

dep

.30

250

0.62

1.1

2-0

.45

-0.2

24.3

74.2

44.4

11

q3

dep

.30

1000

0.62

1.2

0-0

.44

-0.0

72.3

32.0

22.1

42

med

ian

ind

ep.4

025

00.

861.4

80.0

20.0

72.7

22.5

32.5

02

med

ian

ind

ep.4

010

000.

861.5

0-0

.00

0.0

11.8

61.2

51.2

52

med

ian

dep

.30

250

0.86

0.6

6-0

.13

-0.0

12.3

02.2

92.1

32

med

ian

dep

.30

1000

0.86

0.6

6-0

.12

-0.0

01.3

01.1

81.0

92

q3

ind

ep.4

025

00.

851.6

10.2

20.2

62.5

52.1

22.1

02

q3

ind

ep.4

010

00

0.85

1.4

8-0

.04

-0.0

31.7

81.0

51.0

52

q3

dep

.30

250

0.85

0.8

8-0

.07

0.1

02.1

22.0

31.8

72

q3

dep

.30

1000

0.85

0.8

4-0

.14

-0.0

11.2

60.9

90.9

1

19

Table 2: Estimated concordance in the coronary heart disease study truncatedafter 1, 5 and 10 years.

Single split into training/validation dataCHD non-CHD death

1y 5y 10y 1y 5y 10yFine-Gray (age only) 57.5 65.8 63.7 81.9 79.6 75.7Fine-Gray 73.7 74.6 71.6 82.6 79.9 76.2Cause-specific Cox 70.7 73.9 71.2 82.2 80.0 76.2

Average of 1000 splits into bootstrap training data andcorresponding validation data

CHD non-CHD death1y 5y 10y 1y 5y 10y

Fine-Gray (age only) 72.5 69.5 66.8 81.5 78.7 76.1Fine-Gray 77.8 74.4 71.6 81.9 78.9 76.7Cause-specific Cox 77.9 74.4 71.6 81.7 79.0 76.7

Figure 3: Left panel: IPCW estimates of C1(t) for the multivariable Fine andGray model (solid line) and the model with age as the only covariate (dashedline) for a follow-up duration of 1−10 years in the validation data. Error barsat 2.5, 5, 7.5 and 10 years of follow-up correspond to boostrap standard errors.Right panel: Prediction error curves for the multivariable Fine and Gray model(black solid curve) and the model with age as the only covariate (black dashedline). The gray line shows the prediction error of the covariate-free nonpara-metric estimate of the cumulative incidence function as a benchmark.

Time t (years)

Con

cord

ance

C1(

t)

0.0 2.5 5.0 7.5 10.0

0.50

0.60

0.70

0.80

●

●●

●

●

●●

●

Time t (years)

Pre

dict

ion

erro

r (B

rier

scor

e)

0.0 2.5 5.0 7.5 10.0

0.00

0.02

0.04

0.06

0.08

20

Research Reports available from Department of Biostatistics http://www.pubhealth.ku.dk/bs/publikationer ________________________________________________________________________________ Department of Biostatistics University of Copenhagen Øster Farimagsgade 5 P.O. Box 2099 1014 Copenhagen K Denmark 11/1 Kreiner, S. Is the foundation under PISA solid? A critical look at the scaling model

underlying international comparisons of student attainment. 11/2 Andersen, P.K. Competing risks in epidemiology: Possibilities and pitfalls. 11/3 Holst, K.K. Model diagnostics based on cumulative residuals: The R-package gof. 11/4 Holst, K.K. & Budtz-Jørgensen, E. Linear latent variable models: The lava-package. 11/5 Holst, K.K., Budtz-Jørgensen, E. & Knudsen, G.M. A latent variable model with mixed

binary and continuous response variables. 11/6 Cortese, G., Gerds, T.A., &Andersen, P.K. Comparison of prediction models for

competing risks with time-dependent covariates. 11/7 Scheike, T.H. & Sun, Y. On Cross-Odds Ratio for Multivariate Competing Risks Data. 11/8 Gerds, T.A., Scheike T.H. & Andersen P.K. Absolute risk regression for competing risks:

interpretation, link functions and prediction. 11/9 Scheike, T.H., Maiers, M.J., Rocha, V. & Zhang, M. Competing risks with missing

covariates: Effect of haplotypematch on hematopoietic cell transplant patients. 12/01 Keiding, N., Hansen, O.K.H., Sørensen, D.N. & Slama, R. The current duration approach

to estimating time to pregnancy. 12/02 Andersen, P.K. A note on the decomposition of number of life years lost according to

causes of death.

12/03 Martinussen, T. & Vansteelandt, S. A note on collapsibility and confounding bias in Cox and Aalen regression models.

12/04 Martinussen, T. & Pipper, C.B. Estimation of Odds of Concordance based on the Aalen

additive model. 12/05 Martinussen, T. & Pipper, C.B. Estimation of Causal Odds of Concordance based on the

Aalen additive model. 12/06 Binder, N., Gerds, T.A. & Andersen, P.K. Pseudo-observations for competing risks with

covariate dependent censoring. 12/07 Keiding, N. & Clayton, D. Standardization and control for confounding in observational

studies: a historical perspective. 12/08 Hilden, J. & Gerds, T.A. Evaluating the impact of novel biomarkers: Do not rely on IDI

and NRI. 12/09 Gerster, M., Madsen, M. & Andersen, P.K. Matched Survival Data in a Co-twin Control

Design. 12/10 Scheike, T.H., Holst, K.K. & Hjelmborg, J.B. Estimating heritability for cause specific

mortality based on twin studies. 12/11 Jensen, A.K.G., Gerds, T.A., Weeke, P., Torp-Pedersen C. & Andersen, P.K. A note on the

validity of the case-time-control design for autocorrelated exposure histories. 13/01 Forman, J.L. & Sørensen M. A new approach to multi-modal diffusions with applications

to protein folding. 13/02 Mogensen, U.B., Hansen, M., Bjerrum, J.T., Coskun, M., Nielsen, O.H., Olsen, J. & Gerds,

T.A. Microarray based classification of inflammatory bowel disease: A comparison of modelling tools and classification scales.

13/03 Wolbers, M., Koller, M.T., Witteman, J.C.M. & Gerds, T.A. Concordance for prognostic

models with competing risks.

Documents

Department of Biostatistics University of Copenhagen · Research Report 13/03 Department of Biostatistics University of Copenhagen. ... this article we discuss and formalize an earlier