Upload
others
View
3
Download
0
Embed Size (px)
Citation preview
Part [1.0] – Introductionto
Development and Evaluation of Dynamic Predictions
• A Bansal & PJ Heagerty• University of Washington
1 Biomarkers
2 Biomarkers
3 Biomarkers
Outline
• Introductions• Evaluation – time-dependent accuracy
• Breast cancer biomarker• Mayo PBC Data• Illustration: Multiple Myeloma
• Development of dynamic predictions• End Stage Renal Disease Study
• Decision theoretic evaluation
• Practice with R
4 Biomarkers
Some Context – Armitage lecture 2010
“Whereas measurements and event data are very oftencollected together, the methods of analyzing them belong totwo different fields, survival analysis and longitudinal analysis,which rarely connect.”
“The separation between event data and longitudinalmeasurements is artificial.”
“In epidemiology, there is a strong push towards the analysisof life-history data because such data are now increasinglybecoming available due to the development of informationtechnology.”
“To develop statistical tools for an integrated analysis of suchdata is a big challenge.”
O. Aalen (2012)
5 Biomarkers
Part [1.1] – Measures of Classification Accuracyfor the
Prediction of Survival Times
6 Biomarkers
Session Outline
• Examples• Breast Cancer Cytology Data• Mayo PBC Data• Cystic Fibrosis Foundation Registry Data
• ROC overview
• TP, FP for survival outcomes
• ROCC/Dt (p)
• ROCI/Dt (p), AUC (t), and concordance, C τ
• Further work
7 Biomarkers
Example: Comparing cytometry measures
Breast Cancer among Younger Women
• N = 253 women BC diagnosed aged 20 to 44.
• Endpoint: time-until-death (any cause)• Cytometry measurements:
• “old” (S-phase ungated)• “new” (S-phase gated)
• Goal: compare measures as predictors of mortality
• Heagerty, Lumley & Pepe (2000)
8 Biomarkers
0 1 2 3 4 5 6 7
01
23
45
67
%S Ungated
%S
Gat
ed
New versus Old Method
9 Biomarkers
Predictive Survival Models
Mayo PBC Data
• N = 312 subjects, 125 deaths, 1974-1986
• Baseline measurements: bilirubin, prothrombin time,albumin...
• Goal: predict mortality; “medical management”
10 Biomarkers
11 Biomarkers
Predictive Survival Models
Cystic Fibrosis Data
• N = 23, 530 subjects, 4, 772 deaths, 1986-2000
• n = 160, 005 longitudinal observations
• Longitudinal measurements: FEV1, weight, height
• Goal: predict mortality; transplantation selection
12 Biomarkers
age
fev1
10 20 30 40 50
020
60100
907762
•••••
•
••••••••
agefev1
10 20 30 40 50
020
60100
914421
•
•••••••
age
fev1
10 20 30 40 50
020
60100
931885
••••••••••
age
fev1
10 20 30 40 50
020
60100
930200
••••••••
age
fev1
10 20 30 40 50
020
60100
918755
•
•
age
fev1
10 20 30 40 50
020
60100
904750
• •
age
fev1
10 20 30 40 50
020
60100
906580
•••••••••••••
age
fev1
10 20 30 40 50
020
60100
922747
••••
age
fev1
10 20 30 40 50
020
60100
913076
•••••
age
fev1
10 20 30 40 50
020
60100
923143
•••••••••••••
age
fev1
10 20 30 40 50
020
60100
922295
•••••••
age
fev1
10 20 30 40 50
020
60100
922858
••••••••••••
age
fev1
10 20 30 40 50
020
60100
928943
•••••••••••
age
fev1
10 20 30 40 50
020
60100
925274
••
•••••••
age
fev1
10 20 30 40 50
020
60100
905506
••
age
fev1
10 20 30 40 500
2060
100
925198
••
••••
age
fev1
10 20 30 40 50
020
60100
912910
•••••••••
age
fev1
10 20 30 40 50
020
60100
930735
••••
age
fev1
10 20 30 40 50
020
60100
927419
•••••••••••
age
fev1
10 20 30 40 50
020
60100
908675
••••••••
age
fev1
10 20 30 40 50
020
60100
916766
••
age
fev1
10 20 30 40 50
020
60100
926609
•••
age
fev1
10 20 30 40 500
2060
100
920366
•••••
age
fev1
10 20 30 40 50
020
60100
927349
•••••••
13 Biomarkers
Components of Accuracy
• Calibration• Bias – does observed match predicted?• Evaluated graphically and formally.
• Discrimination• Does prediction separate subjects with different risks?• Evaluated qualitatively based on K-M plots.
Harrell, Lee and Mark (1996); Hosmer and Lemeshow (2013)
14 Biomarkers
Calibration: example from Levy et al. (2006)
15 Biomarkers
BC Survival: New Measurement (gated)
0 50 100 150
0.0
0.2
0.4
0.6
0.8
1.0
%S: [0.0, 2 ), N= 86%S: [ 2 , 6 ), N= 86%S: [ 6 , 100), N= 81
(a) Survival for %S, Gated
16 Biomarkers
BC Survival: Old Measurement (ungated)
0 50 100 150
0.0
0.2
0.4
0.6
0.8
1.0
%S: [0.0, 1.1 ), N= 86%S: [ 1.1 , 3.5 ), N= 86%S: [ 3.5 , 100), N= 81
(b) Survival for %S, Ungated
17 Biomarkers
Discrimination
• Common to use ROC curves for logistic regression / binaryclassification.
Q: Extend classification error rate concepts to survival data?
18 Biomarkers
Binary Classification
Sensitivity “True Positive”
BINARY TEST : P(T + | D = 1)
CONTINUOUS MARKER : P(M > c | D = 1)
Specificity “True Negative”
BINARY TEST : P(T− | D = 0)
CONTINUOUS MARKER : P(M ≤ c | D = 0)
19 Biomarkers
ROC Curve
An ROC curve plots the True Positive Rate, TP(c), versus theFalse Positive Rate, FP(c) for all possible cutpoints, c:
FP(c) = P(M > c | D = 0)
TP(c) = P(M > c | D = 1)
ROC Curve : [ FP(c), TP(c) ] ∀c
20 Biomarkers
-2-1
01
2
0 1
Marker versus Disease status
1-specificityse
nsiti
vity
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
ROC curve
21 Biomarkers
-2-1
01
2
0 1
Marker versus Disease status
1-specificityse
nsiti
vity
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
ROC curve
22 Biomarkers
-2-1
01
2
0 1
Marker versus Disease status
1-specificityse
nsiti
vity
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
ROC curve
23 Biomarkers
-2-1
01
2
0 1
Marker versus Disease status
1-specificityse
nsiti
vity
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
ROC curve
24 Biomarkers
-2-1
01
2
0 1
Marker versus Disease status
1-specificityse
nsiti
vity
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
ROC curve
25 Biomarkers
-2-1
01
2
0 1
Marker versus Disease status
1-specificityse
nsiti
vity
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
ROC curve
26 Biomarkers
-2-1
01
2
0 1
Marker versus Disease status
1-specificityse
nsiti
vity
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
ROC curve
27 Biomarkers
-2-1
01
2
0 1
Marker versus Disease status
1-specificityse
nsiti
vity
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
ROC curve
28 Biomarkers
-2-1
01
2
0 1
Marker versus Disease status
1-specificityse
nsiti
vity
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
ROC curve
29 Biomarkers
-2-1
01
2
0 1
Marker versus Disease status
1-specificityse
nsiti
vity
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
ROC curve
30 Biomarkers
ROC Curves
1. Compare different markers over full spectrum of errorcombinations.
2. Compare sensitivity when controlling specificity (eg. TP whenFP=10%).
3. AUC interpretation:“For a randomly chosen case and control, the area under theROC curve is the probability that the marker for the case isgreater than the marker for the control.”
4. AUC is a marker-outcome concordance summary.
31 Biomarkers
Does a (repeated) measurement predict onset?
• Q: Can a measurement accurately predict which CASESwill experience an event (soon)?
• e.g. FEV1• e.g. death time
• Q: Can a measurement be used to accurately guidelongitudinal treatment decisions?
• e.g. lung transplantation
32 Biomarkers
Classification Errors
• True Positive Rate
P( high measurement | CASE )
• False Positive Rate
P( high measurement | CONTROL )
33 Biomarkers
Issues Related to Time
• Q: When is the measurement taken?• At baseline: Y (0)• At a follow-up time t: Y (t)
• Q: What time is used to determine when someone is aCASE or a CONTROL?
• Case = event (disease, death) before time t.• Case = event (disease, death) at time t.• Control = event-free through time t.• Control = event-free through a large follow-up time t?.
34 Biomarkers
Why study time-varying accuracy?
• Marker used to guide decisions at multiple subsequent time points
• However, performance of marker may vary over time as anindividual’s underlying clinical status changes
• Set of subjects still alive (risk set) changes over time
• Could be used for guiding therapy - target highly aggressive diseaseearly on, other longer outcome targeting therapies later
35 Biomarkers
Sensitivity and Specificity for Survival
• Let T denote the survival time.• Possible definitions:
• Cases(t):
1. Cumulative: event before time t: T ≤ t2. Incident: event at time t: T = t
• Controls(t):
◦ Dynamic: event-free through time t: T > t
36 Biomarkers
Sensitivity and Specificity for Survival
• Let T denote the survival time.• Possible definitions:
• Cases(t):
1. Cumulative: event before time t: T ≤ t
2. Incident: event at time t: T = t
• Controls(t):
◦ Dynamic: event-free through time t: T > t
36 Biomarkers
Sensitivity and Specificity for Survival
• Let T denote the survival time.• Possible definitions:
• Cases(t):
1. Cumulative: event before time t: T ≤ t2. Incident: event at time t: T = t
• Controls(t):
◦ Dynamic: event-free through time t: T > t
36 Biomarkers
Sensitivity and Specificity for Survival
• Let T denote the survival time.• Possible definitions:
• Cases(t):
1. Cumulative: event before time t: T ≤ t2. Incident: event at time t: T = t
• Controls(t):◦ Dynamic: event-free through time t: T > t
36 Biomarkers
Cumulative Cases, Dynamic Controls
At time t,
• Cumulative (prevalent) cases:
0 ≤ T ≤ t
• Dynamic controls:T > t
(Heagerty, Lumley, Pepe, 2000)
37 Biomarkers
Cumulative Cases, Dynamic Controls
Study Time
Sub
ject
0 2 4 6 8
48
1216
20
●●
●●●
●●
●●●
●●
●●●
●●
●●●
●●
●●
●●
●●●
●●
●●●
●
●●
●
●
DiedCensored
38 Biomarkers
Cumulative Cases, Dynamic Controls
Example: t = 2 years
0 2 4 6 8
48
1216
20
●●
●●●
●●
●●●
●●
●●●
●●
●●●
●●
●●
●●
●●
●●●
●●
●●●
●
●●
Example: Identify subjects at high risk in the next 2 years to informdecisions regarding intervention- More intense screening, preventative treatment
39 Biomarkers
Cumulative Cases, Dynamic Controls
Example: t = 2 years
0 2 4 6 8
48
1216
20
●●
●●●
●●
●●●
●●
●●●
●●
●●●
●●
●●
●●
●●
●●●
●●
●●●
●
●●
0 2 4 6 8
40 Biomarkers
Cumulative Cases, Dynamic Controls
0 2 4 6 8
Poor marker performance:
Excellent marker performance:
41 Biomarkers
Cumulative/Dynamic ROC
• More formally,
sensitivityC (c, t) = P(M > c|case(t)) = P(M > c |T ≤ t)
specificityD(c, t) = P(M ≤ c|control(t)) = P(M > c |T > t)
• Using these definitions, define ROC(t) curve for any time t.
• Area Under the Curve:
AUCC/D(t) = P(Mj > Mk |j = case(t), k = control(t))
= P(Mj > Mk |Tj ≤ t,Tk > t)
42 Biomarkers
Cumulative/Dynamic ROC
• More formally,
sensitivityC (c, t) = P(M > c|case(t)) = P(M > c |T ≤ t)
specificityD(c, t) = P(M ≤ c|control(t)) = P(M > c |T > t)
• Using these definitions, define ROC(t) curve for any time t.
• Area Under the Curve:
AUCC/D(t) = P(Mj > Mk |j = case(t), k = control(t))
= P(Mj > Mk |Tj ≤ t,Tk > t)
42 Biomarkers
Cumulative/Dynamic ROC
• More formally,
sensitivityC (c, t) = P(M > c|case(t)) = P(M > c |T ≤ t)
specificityD(c, t) = P(M ≤ c|control(t)) = P(M > c |T > t)
• Using these definitions, define ROC(t) curve for any time t.
• Area Under the Curve:
AUCC/D(t) = P(Mj > Mk |j = case(t), k = control(t))
= P(Mj > Mk |Tj ≤ t,Tk > t)
42 Biomarkers
Cumulative Cases, Dynamic Controls
t = 2
Study Time
Sub
ject
0 2 4 6 8
48
1216
20
●●
●●●
●●
●●●
●●
●●●
●●
●●●
●●
●●
●●
●●
●●●
●●
●●●
●
●●
43 Biomarkers
Cumulative Cases, Dynamic Controls
t = 4
Study Time
Sub
ject
0 2 4 6 8
48
1216
20
●●
●●●
●●
●●●
●●
●●●
●●
●●●
●●
●●●
●●
●
●●
●●
●●
●●●
●●
●●●
●
●●
44 Biomarkers
Cumulative Cases, Dynamic Controls
t = 6
Study Time
Sub
ject
0 2 4 6 8
48
1216
20
●●
●●●
●●
●●●
●●
●●●
●●
●●●
●●
●●●
●●
●●●
●●
●
●●
●●
●●
●●●
●●
●●●
●
●●
45 Biomarkers
Cumulative Cases, Dynamic Controls
0 2 4 6 8
0 2 4 6 8
0 2 4 6 8
46 Biomarkers
Estimation of Cumulative/Dynamic ROC
• Main issue: Censoring. Cannot classify everyone.
• Valid ROC estimator using nonparametric nearest neighborestimation proposed by Heagerty, Lumley, Pepe (2000)
• Bootstrap resampling method for obtaining confidenceintervals
47 Biomarkers
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
1−specificity
sensitivity
(a) ROC(40m), NNE
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
1−specificity
sensitivity
(d) ROC(40m), Kaplan−Meier
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
1−specificity
sensitivity
(b) ROC(60m), NNE
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
1−specificity
sensitivity
(e) ROC(60m), Kaplan−Meier
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
1−specificity
sensitivity
(c) ROC(100m), NNE
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
1−specificity
sensitivity
(f) ROC(100m), Kaplan−Meier
48 Biomarkers
Accuracy Comparisons
Sensitivity
• Using gated (new) measurement:◦ P[M1 ≤ 5.4 | N(60m) = 0] = 0.71◦ P[M1 > 5.4 | N(60m) = 1] = 0.82
• Using ungated (old) measurement:◦ P[M2 ≤ 3.5 | N(60m) = 0] = 0.71
◦ P[M2 > 3.5 | N(60m) = 1] = 0.54• Therefore, controlling M1 and M2 to have equal specificity, thenew measure, M1, has a greater sensitivity.
49 Biomarkers
Accuracy Comparisons
AUC Calculations• AUC for gated (new): 0.80,
Bootstrap 95% CI: (0.72,0.89)• AUC for ungated (old): 0.68,
Bootstrap 95% CI: (0.56,0.77)• 95% CI for difference in areas (new-old): (0.03, 0.26)
50 Biomarkers
Recall our goal
How do we evaluate time-varying performance?
51 Biomarkers
Time-varying performance?
0 2 4 6 8
0 2 4 6 8
0 2 4 6 8
Performance averaged over previous time points.52 Biomarkers
Extension to assess time-varying trends
0 2 4 6 8
0 2 4 6 8
0 2 4 6 8
0 2 4 6 853 Biomarkers
Extension to assess time-varying trends
At time t,
• Cumulative (prevalent) cases:
t ≤ T ≤ t ′
• Dynamic controls:T > t ′
• Subset data at t to include only subjects with T ≥ t
• We set t ′ = t + 1
• Look at cumulative incidence over 1-year span, i.e. T ≤ t + 1
54 Biomarkers
Cumulative/Dynamic ROC for time-varying trends
Study Time
Sub
ject
0 2 4 6 8
48
1216
20
●●
●●●
●●
●●●
●●
●●●
●●
●●●
●●
●●
●●
●●●
●●
●●●
●
●●
●
●
DiedCensored
55 Biomarkers
Cumulative/Dynamic ROC for time-varying trends
t = 0
Study Time
Sub
ject
0 2 4 6 8
48
1216
20
●●
●●●
●●
●●●
●●
●●●
●●
●●●
●●
●●
●●
●●●
●●
●●●
●
●●
56 Biomarkers
Cumulative/Dynamic ROC for time-varying trends
t = 2
Study Time
Sub
ject
0 2 4 6 8
48
1216
20
●●●
●●
●●●
●●
●●●
●●
●●●
●●
●●
●●●
●●
●●●
●
●●
57 Biomarkers
Cumulative/Dynamic ROC for time-varying trends
t = 4
Study Time
Sub
ject
0 2 4 6 8
48
1216
20
●●
●●
●●●
●●
●●●
●●
●●
●●●
●
●●
58 Biomarkers
Cumulative Cases, Dynamic Controls
0 2 4 6 8
0 2 4 6 8
0 2 4 6 8
Can incorporate updated biomarker measurements!59 Biomarkers
Summary
• Define time-dependent ROC curves based on prospective dataand cumulative occurence of events.
• Local survival estimation handles censored event times.
• Tool to evaluate a marker or a survival regression model score.
• R package: survivalROC (see web for doc/validation)
• Can be extended to general longitudinal marker scenario
• Heagerty, Lumley and Pepe (2000) “Time-dependent ROCCurves for Censored Survival Data and a Diagnostic Marker”Biometrics
60 Biomarkers
Incident Cases, Dynamic Controls
At time t,
• Incident cases:T = t
• Dynamic controls:T > t
(Heagerty & Zheng, 2005)
⇒ At time t, only those still at risk classified as cases or controls
61 Biomarkers
Incident Cases, Dynamic Controls
Study Time
Sub
ject
0 2 4 6 8
48
1216
20
●●
●●●
●●
●●●
●●
●●●
●●
●●●
●●
●●
●●
●●●
●●
●●●
●
●●
●
●
DiedCensored
62 Biomarkers
Incident Cases, Dynamic Controls
t = 4
Study Time
Sub
ject
0 2 4 6 8
48
1216
20
●●
●●●
●●
●●●
●●
●●●
●●
●●●
●●
●●
●●
●●●
●●
●●●
●
●●
63 Biomarkers
Incident Cases, Dynamic Controls
t = 4
Study Time
Sub
ject
0 2 4 6 8
48
1216
20
●●
●●●
●●
●●●
●●
●●●
●●
●●●
●●
●●
●●
●●●
●●
●●●
●
●●
64 Biomarkers
Incident/Dynamic ROC to assess time-varying trends
t = 0
Study Time
Sub
ject
0 2 4 6 8
48
1216
20
●●
●●●
●●
●●●
●●
●●●
●●
●●●
●●
●●
●●
●●●
●●
●●●
●
●●
65 Biomarkers
Incident/Dynamic ROC to assess time-varying trends
t = 2
Study Time
Sub
ject
0 2 4 6 8
48
1216
20
●●
●●●
●●
●●●
●●
●●●
●●
●●●
●●
●●
●●
●●●
●●
●●●
●
●●
66 Biomarkers
Incident/Dynamic ROC to assess time-varying trends
t = 4
Study Time
Sub
ject
0 2 4 6 8
48
1216
20
●●
●●●
●●
●●●
●●
●●●
●●
●●●
●●
●●
●●
●●●
●●
●●●
●
●●
67 Biomarkers
Incident/Dynamic ROC to assess time-varying trends
t = 6
Study Time
Sub
ject
0 2 4 6 8
48
1216
20
●●
●●●
●●
●●●
●●
●●●
●●
●●●
●●
●●
●●
●●●
●●
●●●
●
●●
68 Biomarkers
Incident Cases, Dynamic Controls
0 2 4 6 8
0 2 4 6 8
0 2 4 6 8
0 2 4 6 869 Biomarkers
70 Biomarkers
Incident/Dynamic ROC
• More formally,
sensitivity I (c, t) = P(M > c|case(t)) = P(M > c |T = t)
specificityD(c, t) = P(M ≤ c |control(t)) = P(M > c |T > t)
• Using these definitions, define ROC(t) curve for any time t.
• Area Under the Curve:
AUC I/D(t) = P(Mj > Mk |j = case(t), k = control(t))
= P(Mj > Mk |Tj = t,Tk > t)
71 Biomarkers
Incident/Dynamic ROC
• More formally,
sensitivity I (c, t) = P(M > c|case(t)) = P(M > c |T = t)
specificityD(c, t) = P(M ≤ c |control(t)) = P(M > c |T > t)
• Using these definitions, define ROC(t) curve for any time t.
• Area Under the Curve:
AUC I/D(t) = P(Mj > Mk |j = case(t), k = control(t))
= P(Mj > Mk |Tj = t,Tk > t)
71 Biomarkers
Incident/Dynamic ROC
• More formally,
sensitivity I (c, t) = P(M > c|case(t)) = P(M > c |T = t)
specificityD(c, t) = P(M ≤ c |control(t)) = P(M > c |T > t)
• Using these definitions, define ROC(t) curve for any time t.
• Area Under the Curve:
AUC I/D(t) = P(Mj > Mk |j = case(t), k = control(t))
= P(Mj > Mk |Tj = t,Tk > t)
71 Biomarkers
AUC(t) and Concordance
Q: The I/D ROC curve and AUC(t) provide time-specificsummaries of accuracy, but is there a single global summary?Concordance:
C = P(Mj > Mk | Tj < Tk)
C =
∫t
AUC (t) · w(t) dt
with w(t) = 2 · f (t) · S(t)
72 Biomarkers
AUC(t) and Concordance
• Time can be restricted to (0, τ) to obtain:
C τ = P(Mj > Mk | Tj < Tk ,Tj < τ)
=
∫ τ
0AUC (t) · w τ (t) dt
with w τ (t) = w(t)/[1− S2(τ)]
73 Biomarkers
Estimation: Issues
• FPDt (c) = P(M > c | T > t) can be estimated
non-parametrically for times when∑
i Ri (t) moderate-to-large,where Ri (t) = 1(T ∗i > t), “at-risk” indicator.
• However, estimation of TPIt (c) = P(M > c | T = t) requires
some sort of smoothing since the observed subset with Ti = tmay only contain one observation.
• Semiparametric method smoothes by fitting a hazard model(Heagerty & Zheng, 2005)
• Nonparametric method uses kernel-based smoothing -preferable due to fewer assumptions (Saha-Chaudhuri &Heagerty, 2013; Bansal & Heagerty, in progress)
• Bootstrap resampling method for obtaining confidenceintervals
74 Biomarkers
Estimation: Issues
• FPDt (c) = P(M > c | T > t) can be estimated
non-parametrically for times when∑
i Ri (t) moderate-to-large,where Ri (t) = 1(T ∗i > t), “at-risk” indicator.
• However, estimation of TPIt (c) = P(M > c | T = t) requires
some sort of smoothing since the observed subset with Ti = tmay only contain one observation.
• Semiparametric method smoothes by fitting a hazard model(Heagerty & Zheng, 2005)
• Nonparametric method uses kernel-based smoothing -preferable due to fewer assumptions (Saha-Chaudhuri &Heagerty, 2013; Bansal & Heagerty, in progress)
• Bootstrap resampling method for obtaining confidenceintervals
74 Biomarkers
Estimation: Issues
• FPDt (c) = P(M > c | T > t) can be estimated
non-parametrically for times when∑
i Ri (t) moderate-to-large,where Ri (t) = 1(T ∗i > t), “at-risk” indicator.
• However, estimation of TPIt (c) = P(M > c | T = t) requires
some sort of smoothing since the observed subset with Ti = tmay only contain one observation.
• Semiparametric method smoothes by fitting a hazard model(Heagerty & Zheng, 2005)
• Nonparametric method uses kernel-based smoothing -preferable due to fewer assumptions (Saha-Chaudhuri &Heagerty, 2013; Bansal & Heagerty, in progress)
• Bootstrap resampling method for obtaining confidenceintervals
74 Biomarkers
Nonparametric Estimation of AUCI/D(t)
Recall:
• AUCI/D(t) = P(Mj > Mk |j = case, k = control)
• Interpretation: Given a random case and a random control,the probability that the case has a higher marker value thanthe control
75 Biomarkers
Nonparametric Estimation of AUCI/D(t)
●
●● ●●
●
●
●
●
●
●
●
●
●●
●
●●
●●
●●
●
●
●●
●
●
●
●●●
●●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●●●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●●●
●
●
●
●
●
●●
●●●
●
●
●
●
●
●●
●●
●
●
●●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●●●
●
●
●
●
●●
●
●
●●●
●
●
●
●●●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●●●
●
●●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●●
●
●●
●
●
●●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
● ●
●
●
●
7 8 9 10 11 12
0.0
0.2
0.4
0.6
0.8
1.0
Survival Time
Ran
k
At time t,
• Calculate a rank for each case in the risk set relative to the controlsin the risk set
• Mean rank at time t is calculated as the mean of the ranks for allcases in the risk set at t
• For window around t, AUC(t) = local average of the mean ranksacross all observed event times in that window
76 Biomarkers
Nonparametric Estimation of AUCI/D(t)
●
●● ●●
●
●
●
●
●
●
●
●
●●
●
●●
●●
●●
●
●
●●
●
●
●
●●●
●●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●●●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●●●
●
●
●
●
●
●●
●●●
●
●
●
●
●
●●
●●
●
●
●●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●●●
●
●
●
●
●●
●
●
●●●
●
●
●
●●●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●●●
●
●●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●●
●
●●
●
●
●●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
● ●
●
●
●
7 8 9 10 11 12
0.0
0.2
0.4
0.6
0.8
1.0
Survival Time
Ran
k
At time t,
• Calculate a rank for each case in the risk set relative to the controlsin the risk set
• Mean rank at time t is calculated as the mean of the ranks for allcases in the risk set at t
• For window around t, AUC(t) = local average of the mean ranksacross all observed event times in that window
76 Biomarkers
Nonparametric Estimation of AUCI/D(t)
●
●● ●●
●
●
●
●
●
●
●
●
●●
●
●●
●●
●●
●
●
●●
●
●
●
●●●
●●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●●●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●●●
●
●
●
●
●
●●
●●●
●
●
●
●
●
●●
●●
●
●
●●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●●●
●
●
●
●
●●
●
●
●●●
●
●
●
●●●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●●●
●
●●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●●
●
●●
●
●
●●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
● ●
●
●
●
7 8 9 10 11 12
0.0
0.2
0.4
0.6
0.8
1.0
Survival Time
Ran
k
At time t,
• Calculate a rank for each case in the risk set relative to the controlsin the risk set
• Mean rank at time t is calculated as the mean of the ranks for allcases in the risk set at t
• For window around t, AUC(t) = local average of the mean ranksacross all observed event times in that window
76 Biomarkers
Motivation: Treatment Prioritization
• Organ transplantation seeks to prioritize limited donor organsby identifying those subjects who are at risk of death withoutintervention (and who would do well if transplanted).
• Lung Allocation Score (see Gries et al. 2010)• MELD Score (Model for Endstage Liver Disease)
• The scientific goal is one where over time a goodmodel/marker would identify those subjects at risk of death(from among those still at-risk).
• Q: Where do diseased subjects who die rank among those inthe risk set?
77 Biomarkers
78 Biomarkers
Additional Comments
• For a baseline marker the C-index can also be estimated as theglobal weighted average:
C =
∫A(t) · 2f (t)S(t) dt
• We can also directly apply the WMR estimator to time-dependentcovariates, M(t), since the method is based on risk-sets and thecase rank within the riskset.
• Time-dependent Covariate Example:
• Cystic Fibrosis Data• FEV1, height, and weight are time-dependent
79 Biomarkers
Time
AU
C(t
) / R
ank
10 20 30 40 50
0.0
0.2
0.4
0.6
0.8
1.0
AUC Based on Risk Set Rank
80 Biomarkers
Some Current Extensions
• Q: How often does the CASE marker rank in the top 10% of therisk set (or among controls)?
• This concept is directly connected to sensitivity:
E {1[A(t) > (1− p)]} = CASE(t) > (1-p)% of CONTROLS(t)
= TP I/D(p, t)
(Saha-Chaudhuri & Heagerty, 2018)
81 Biomarkers
Some Current Extensions
• Non-Parametric TPI/D
• Select a value of false-positive rate: p• Derive the indicators:
H(t, p) = 1[ A(t) > (1− p) ]
• Locally weighted averages to obtain smooth curve in time:
TPI/Dhn (t, p) =
∑j
K∗hn(t − tj) · H(t, p)
82 Biomarkers
Summary
• The Case rank is a descriptive summary that is clinicallymeaningful.
• Using the case-rank provide a basis for non-parametricestimation of time-dependent accuracy summaries.
• WMR provides non-parametric estimation with analyticalexpressions for standard errors.
• Methods extend to allow time-dependent markers.
• Methods extend to estimation of time-dependent sensitivity.
83 Biomarkers
Illustration: Multiple Myeloma
Overall survival from registration in study
• Survival curves for Multiple Myeloma have differently shapedsegments
• Are different segments governed by different baseline prognosticvariables (due to different disease biology)?
• Variables representing disease aggressiveness more dominant earlier
84 Biomarkers
Illustration: Multiple Myeloma
• Goal: To compare different prognostic markers with respect to theirperformance over time
• Their approach: Examine associations of variables with survivalusing Cox proportional hazards regression
• Our approach: Classification
(Bansal & Heagerty, 2019)85 Biomarkers
Illustration: Multiple Myeloma
• N = 775
• Prospective randomized trial comparing high-dosechemoradiotherapy to standard chemotherapy
• Median follow-up = 8.2 years
• Median survival = 4.0 years
• Survival similar in both study arms. Pooled together forprognostic marker analysis
86 Biomarkers
Candidate Prognostic Markers
8 baseline variables investigated:
• Age
• Albumin
• Calcium
• Creatinine
• Hemoglobin
• Lactic Hydrogenase (LDH)
• Platelet Count
• Serum beta-2-microglobulin (SB2M)
87 Biomarkers
Methods
AUCCumulative/Dynamic(t)
0 2 4 6 8
0 2 4 6 8
0 2 4 6 8
0 2 4 6 8
0 2 4 6 8
0 2 4 6 8
0 2 4 6 8
AUCIncident/Dynamic(t)
0 2 4 6 8
0 2 4 6 8
0 2 4 6 8
0 2 4 6 8
0 2 4 6 8
0 2 4 6 8
0 2 4 6 8
88 Biomarkers
Results
0 1 2 3 4 5 6
0.40
0.50
0.60
0.70
AUCC/D(t)
Time (years)
AU
C(t
)
Platelet CountSB2MCreatinineAge
0 1 2 3 4 5 6
0.40
0.50
0.60
0.70
AUCI/D(t)
Time (years)
AU
C(t
)
Platelet CountSB2MCreatinineAge
0 1 2 3 4 5 6
0.40
0.50
0.60
0.70
AUCC/D(t)
Time (years)
AU
C(t
)
AlbuminCalciumLDHHemoglobin
0 1 2 3 4 5 6
0.40
0.50
0.60
0.70
AUCI/D(t)
Time (years)
AU
C(t
)AlbuminCalciumLDHHemoglobin
89 Biomarkers
Summary
Accuracy summary
ROCI/Dt (p) : vary (M,t)
AUC (t) : vary (t)
C : global
90 Biomarkers
91 Biomarkers