
• 1

Active learning based survival regression for censored data

Bhanukiran Vinzamuri

Yan Li

Chandan K. Reddy

• 2

Index of presentation

Introduction

Problem Description

Cox Regression

Regularized Cox Regression

Coordinate Majorization Descent solver for EN-COX.

KEN-COX

Model discriminative Gradient based Sampling

Active learning based Regularized Cox Regression

Experimental Evaluation

Conclusions and Future Work

References

• 3

Introduction

Censored data arise in many real-world applications, such as clinical health informatics, genomic analysis, and finance.

Mining censored data poses a unique challenge to the data mining community because standard regression models cannot be applied to them directly.

Censored data consist of two entries associated with each feature vector:

A time-to-event variable that records the time.

A binary indicator variable that records the censored status.
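A minimal sketch of this representation, assuming NumPy and hypothetical toy values (none of the numbers come from the slides). The coding of the status flag, with 1 for an observed event and 0 for a censored instance, is also an assumption, since the slides do not state it.

```python
import numpy as np

# Hypothetical toy data: 4 instances, 2 features each.
X = np.array([[63.0, 1.0],
              [51.0, 0.0],
              [70.0, 1.0],
              [45.0, 0.0]])
# Time-to-event, or time of last follow-up for censored instances (days).
T = np.array([5.0, 12.0, 3.0, 8.0])
# Status indicator (assumed coding): 1 = event observed, 0 = right-censored.
delta = np.array([1, 0, 1, 0])

def split_by_status(X, T, delta):
    """Return (features, times) for observed events and for censored instances."""
    observed = delta == 1
    return (X[observed], T[observed]), (X[~observed], T[~observed])

(events_X, events_T), (cens_X, cens_T) = split_by_status(X, T, delta)
```

For a censored instance the recorded time is only a lower bound on the true event time, which is why an ordinary regression on T would be biased.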

• 4

Problem description

We observe 10 patients for 12 days after discharge from their index hospitalization.

Patients (2,3,5) have the event of interest observed. Patients (1,8,9) do not experience the event of interest before the end of the 12th day. Finally, patients (4,7) drop out of the study.

• 5

Notations used in this paper

Name     Description
X        n x m matrix of feature vectors
T        n x 1 vector of failure times
K        number of unique failure times
δ        n x 1 binary vector of censored status
R_i      set of all patients at risk at time t_i
β        m x 1 regression coefficient vector
L(β)     partial log-likelihood
h(t|X)   conditional hazard probability
h_0(t)   base hazard rate
S_0(t)   base survival rate
S(t|X)   conditional survival probability
Ke       column-wise kernel matrix

• 6

Cox regression

Cox regression models the effect of covariates on the hazard rate but leaves the baseline hazard rate unspecified.

It does not assume knowledge of absolute risk; it estimates relative rather than absolute risk.

It relies on the proportional hazards assumption, which states that the hazard for any individual is a fixed proportion of the hazard for any other individual.
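The proportional hazards assumption can be checked numerically with a small sketch. The baseline hazard, the coefficients, and the two covariate vectors below are hypothetical and chosen only for illustration; the point is that the ratio of two individuals' hazards does not depend on time.

```python
import numpy as np

def cox_hazard(t, x, beta, baseline_hazard):
    """Cox hazard h(t|x) = h0(t) * exp(x . beta)."""
    return baseline_hazard(t) * np.exp(np.dot(x, beta))

# Hypothetical baseline hazard and coefficients, for illustration only.
def baseline(t):
    return 0.01 * (1.0 + t)

beta = np.array([0.5, -0.2])
x_a = np.array([1.0, 2.0])
x_b = np.array([0.0, 1.0])

times = np.array([1.0, 5.0, 10.0])
ratios = np.array([cox_hazard(t, x_a, beta, baseline) /
                   cox_hazard(t, x_b, beta, baseline) for t in times])

# The baseline cancels, so the ratio depends only on the covariates.
expected_ratio = np.exp(np.dot(x_a - x_b, beta))
```

Because the baseline hazard cancels in the ratio, the model can rank patients by relative risk without ever specifying h0(t).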

• 7

Mathematical formulation of Cox regression

Baseline Hazard Function

Baseline Survival Function

Conditional Survival Probability
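For reference, the quantities named on this slide have the following standard textbook forms. This is a sketch in common notation, not a transcription of the slide. Assumptions: R(t) denotes the set of patients at risk at time t, d_k the number of events at the k-th unique failure time t_k, δ_i = 1 marks an observed event, and the baseline hazard is estimated with the Breslow estimator.

```latex
\begin{align}
h(t \mid X_i) &= h_0(t)\,\exp(X_i \beta) \\
L(\beta) &= \sum_{i:\,\delta_i = 1} \Big[ X_i \beta - \log \sum_{j \in R(T_i)} \exp(X_j \beta) \Big] \\
\hat{h}_0(t_k) &= \frac{d_k}{\sum_{j \in R(t_k)} \exp(X_j \hat{\beta})}, \qquad k = 1, \dots, K \\
S_0(t) &= \exp\Big( -\sum_{k:\, t_k \le t} \hat{h}_0(t_k) \Big) \\
S(t \mid X) &= S_0(t)^{\exp(X \beta)}
\end{align}
```

The last line follows from the first: under proportional hazards the cumulative hazard is scaled by exp(Xβ), so the survival function is the baseline survival raised to that power.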

• 8

Hazard probabilities predicted for EHR data

• 9

Regularized Cox Regression

Cox regression models tend to overfit, which limits how well they generalize to different scenarios.

LASSO provides a sparse solution but cannot handle correlation. Elastic net provides sparsity and is effective at handling correlation.

We look at a coordinate majorization descent (CMD) based solver for elastic net Cox (EN-COX).
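The two penalties can be compared with a short sketch. The parameterization below, λ(α‖β‖₁ + (1−α)/2 ‖β‖₂²), is one common convention and is an assumption here; the slides do not give the exact form.

```python
import numpy as np

def lasso_penalty(beta, lam):
    """L1 penalty: lam * sum |beta_j|."""
    return lam * np.sum(np.abs(beta))

def elastic_net_penalty(beta, lam, alpha):
    """Elastic net penalty (assumed form): lam * (alpha * L1 + (1 - alpha) / 2 * L2^2)."""
    l1 = np.sum(np.abs(beta))
    l2 = 0.5 * np.sum(beta ** 2)
    return lam * (alpha * l1 + (1.0 - alpha) * l2)

beta = np.array([0.0, 1.5, -2.0])
```

With alpha = 1 the elastic net reduces to LASSO; lowering alpha shifts weight to the quadratic term, which tends to spread coefficients across correlated features rather than picking one of them.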

• 10

CMD solver for EN-COX

Composite log likelihood function to be minimized.

Coordinate wise component of the composite function.

Pre-computed term to accelerate the computation

Coordinate wise update for regression coefficient vector.

Soft threshold operator.
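A minimal sketch of the soft threshold operator and of one majorized coordinate update follows. The update is derived by minimizing a quadratic upper bound of the smooth loss plus the elastic net penalty along one coordinate. The argument names (grad_j, d_j) and the penalty parameterization are assumptions, not the slide's own notation.

```python
import numpy as np

def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def cmd_coordinate_update(beta_j, grad_j, d_j, lam, alpha):
    """One majorized coordinate step for an elastic-net penalized loss.

    grad_j : partial derivative of the smooth loss at the current beta
    d_j    : upper bound on the second derivative along coordinate j
    Minimizes grad_j*(b - beta_j) + d_j/2*(b - beta_j)^2
              + lam*alpha*|b| + lam*(1 - alpha)/2*b^2 over b.
    """
    numerator = soft_threshold(d_j * beta_j - grad_j, lam * alpha)
    return numerator / (d_j + lam * (1.0 - alpha))
```

Because d_j bounds the curvature from above, each such step cannot increase the composite objective, which is what makes the coordinate-wise scheme converge monotonically.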

• 11

Regularized Cox Regression Algorithm

Algorithm 1 Regularized Cox Regression (RegCox)

Require: Feature set X, censored variable δ, time-to-event T, regularization parameter λ

1: Initialize β

2: repeat

3: Compute from using Equations (4),(5)

4: for j = 1, …, m do

5: Set the objective function and apply the CMD procedure

6: Compute the updating factor for computing β_j using Equation (6)

7:

8: end for

9: Update =

10: until convergence of β

11: Output β

12: Output base hazard function using and
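The overall loop of Algorithm 1 can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' implementation: the curvature bound D_j uses the inequality that a weighted variance is at most (max − min)²/4, the penalty is λ(α‖β‖₁ + (1−α)/2 ‖β‖₂²), δ = 1 marks an observed event, and the baseline hazard output of step 12 is omitted. All function names are hypothetical.

```python
import numpy as np

def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def neg_log_partial_likelihood(beta, X, T, delta):
    """Negative Cox partial log-likelihood scaled by 1/n (Breslow-style ties)."""
    eta = X @ beta
    total = 0.0
    for i in np.flatnonzero(delta == 1):
        at_risk = T >= T[i]
        total += eta[i] - np.log(np.sum(np.exp(eta[at_risk])))
    return -total / len(T)

def objective(beta, X, T, delta, lam, alpha):
    """Composite function: scaled negative log-likelihood plus elastic net penalty."""
    penalty = alpha * np.sum(np.abs(beta)) + 0.5 * (1.0 - alpha) * np.sum(beta ** 2)
    return neg_log_partial_likelihood(beta, X, T, delta) + lam * penalty

def gradient_j(beta, X, T, delta, j):
    """Partial derivative of the scaled negative log-likelihood w.r.t. beta_j."""
    w = np.exp(X @ beta)
    g = 0.0
    for i in np.flatnonzero(delta == 1):
        at_risk = T >= T[i]
        g += X[i, j] - np.sum(w[at_risk] * X[at_risk, j]) / np.sum(w[at_risk])
    return -g / len(T)

def reg_cox(X, T, delta, lam=0.05, alpha=0.5, max_iter=200, tol=1e-6):
    """Cyclic coordinate majorization descent for elastic-net Cox (sketch)."""
    n, m = X.shape
    # Majorization constants: the weighted variance of feature j over a risk
    # set is at most (max - min)^2 / 4, which bounds the second derivative.
    D = np.zeros(m)
    for i in np.flatnonzero(delta == 1):
        at_risk = T >= T[i]
        spread = X[at_risk].max(axis=0) - X[at_risk].min(axis=0)
        D += spread ** 2 / (4.0 * n)
    D = np.maximum(D, 1e-12)  # guard against division by zero
    beta = np.zeros(m)
    for _ in range(max_iter):
        beta_old = beta.copy()
        for j in range(m):  # coordinate-wise update
            g = gradient_j(beta, X, T, delta, j)
            z = soft_threshold(D[j] * beta[j] - g, lam * alpha)
            beta[j] = z / (D[j] + lam * (1.0 - alpha))
        if np.max(np.abs(beta - beta_old)) < tol:
            break
    return beta
```

The sketch recomputes the gradient from scratch for every coordinate, which is simple but slow; a practical solver would cache the risk-set sums, which is the role of the pre-computed term mentioned on the CMD slide.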

• 12

Extending solver for KEN-COX

The original elastic net is extended with an additional term that incorporates a column-wise kernel similarity matrix into the computation.

This additional term makes the regularization more effective at handling correlation and grouped correlation.
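One way such a kernel term could look is sketched below. Both the RBF kernel between feature columns and the penalty form λ(α‖β‖₁ + (1−α) βᵀ Ke β) are assumptions made for illustration; the slide does not give the formula, and the function names are hypothetical.

```python
import numpy as np

def column_kernel_matrix(X, sigma=1.0):
    """m x m RBF similarity between the columns (features) of X."""
    cols = X.T  # shape (m, n): one row per feature
    sq_dists = np.sum((cols[:, None, :] - cols[None, :, :]) ** 2, axis=2)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def ken_penalty(beta, Ke, lam, alpha):
    """Assumed kernel elastic net penalty: L1 term plus a kernel-weighted quadratic."""
    l1 = np.sum(np.abs(beta))
    quad = beta @ Ke @ beta
    return lam * (alpha * l1 + (1.0 - alpha) * quad)
```

When Ke is the identity the quadratic term is the usual ridge term; off-diagonal similarities couple the coefficients of features that behave alike across patients.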

• 13

Model discriminative gradient based sampling

This sampling method chooses the instance that maximizes the sampling criterion. The criterion has two components. The first is the hazard probability computed at each unique time-to-event value.

The second component is the absolute value of the gradient of the log-likelihood function computed at the given point.
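A possible reading of this criterion is sketched below. It is an assumed interpretation, not the authors' formula: for each unique event time t_k, the candidate's hazard h(t_k|x) weights the absolute gradient of the log partial likelihood term that would arise if the candidate had its event at t_k, and the products are summed. The function names and the way the baseline hazard values h0 are passed in are hypothetical.

```python
import numpy as np

def sampling_score(x, beta, X_train, T_train, event_times, h0):
    """Assumed criterion: sum_k h(t_k|x) * ||gradient of log-lik term at t_k||_1."""
    w_train = np.exp(X_train @ beta)
    w_x = np.exp(x @ beta)
    score = 0.0
    for t_k, h0_k in zip(event_times, h0):
        at_risk = T_train >= t_k
        # Gradient of the term if x had its event at t_k: x minus the
        # risk-weighted mean of features over the risk set including x.
        denom = w_x + np.sum(w_train[at_risk])
        mean_x = (w_x * x + w_train[at_risk] @ X_train[at_risk]) / denom
        grad = x - mean_x
        score += h0_k * w_x * np.sum(np.abs(grad))
    return score

def select_instance(pool, beta, X_train, T_train, event_times, h0):
    """Return the index of the pool instance with the largest score, and all scores."""
    scores = np.array([sampling_score(x, beta, X_train, T_train, event_times, h0)
                       for x in pool])
    return int(np.argmax(scores)), scores
```

Under this reading, instances that the current model considers high-risk and that would move the coefficients the most are queried first.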

• 14

Active Regularized Cox Regression (ARC)

Algorithm 2 ARC Algorithm

Require: Training set Train, unlabelled pool Pool, time-to-event T, censored status δ, active learning rounds max

1:

2: repeat

3:

4: for each instance in Pool do

5: Use model discriminative gradient sampling for each instance in Pool

6: end for

7:

8: Query the domain expert for the label (time-to-event) of the selected instance

9:

10:

11:

12: until
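The structure of Algorithm 2 resembles a generic pool-based active learning loop, sketched below. Several steps of the algorithm are not legible in the slides, so this follows the usual fit, score, query, and augment pattern as an assumption. The callables fit, score, and oracle are hypothetical placeholders for the regularized Cox solver, the sampling criterion, and the domain expert.

```python
import numpy as np

def arc_loop(train_X, train_T, train_delta, pool_X, fit, score, oracle, max_rounds):
    """Generic pool-based active learning loop (hypothetical callables).

    fit(X, T, delta) -> model
    score(model, x)  -> float, larger means more informative
    oracle(x)        -> (time_to_event, status) supplied by the domain expert
    """
    pool = [np.asarray(x) for x in pool_X]
    X, T, delta = list(train_X), list(train_T), list(train_delta)
    for _ in range(max_rounds):
        if not pool:
            break
        model = fit(np.array(X), np.array(T), np.array(delta))
        # Score every pool instance and pick the most informative one.
        best = int(np.argmax([score(model, x) for x in pool]))
        x_star = pool.pop(best)
        t_star, d_star = oracle(x_star)
        # Move the newly labelled instance from the pool to the training set.
        X.append(x_star)
        T.append(t_star)
        delta.append(d_star)
    model = fit(np.array(X), np.array(T), np.array(delta))
    return model, np.array(X), np.array(T), np.array(delta)
```

Passing the solver and the criterion in as callables keeps the loop independent of whether LASSO, elastic net, or the kernel variant is used inside.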

• 15

Flowchart for ARC with KEN-COX

• 16

Metrics for evaluation in survival analysis

Survival AUC is also called the concordance index.

In survival analysis, physicians and researchers are often more interested in the relative risk of a disease between patients with different covariates than in the absolute survival times of these patients.

The root mean squared error (RMSE) measures the goodness of fit obtained using the survival model.
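As an illustration of the first metric, the concordance index can be computed with a direct pairwise count (Harrell's C). This is a minimal O(n²) sketch; the function name, the convention that a higher risk score should mean an earlier event, and the coding δ = 1 for observed events are assumptions.

```python
import numpy as np

def concordance_index(T, delta, risk):
    """Harrell's C: fraction of comparable pairs ordered correctly by risk."""
    concordant, comparable = 0.0, 0
    n = len(T)
    for i in range(n):
        if delta[i] != 1:
            continue  # a pair is comparable only if the earlier time is an event
        for j in range(n):
            if T[i] < T[j]:  # i had the event before j was last observed
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # ties in predicted risk count half
    return concordant / comparable
```

A value of 1.0 means every comparable pair is ranked correctly, 0.5 corresponds to random ranking, and the function assumes at least one comparable pair exists.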

• 17

Experimental Setup

We conduct experiments to evaluate our ARC framework on EHR data from Henry Ford Hospital in Detroit, Michigan. In addition, publicly available censored datasets are used for our evaluation.

Time-to-event (30-day readmission) values are calculated from the prior admission and discharge dates, and patients are right censored at the end of the 30-day readmission study period. For the other survival datasets, right-censoring information is inherently provided.
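The readmission labelling described here can be sketched with the standard library. The function name, the argument layout, and the status coding (1 for a readmission within the window, 0 for right-censored) are assumptions made for illustration.

```python
from datetime import date

def readmission_time(discharge, next_admission, window_days=30):
    """Return (time_to_event_days, status); status 1 = readmitted within the window."""
    if next_admission is not None:
        gap = (next_admission - discharge).days
        if 0 <= gap <= window_days:
            return gap, 1
    # No readmission inside the study period: right-censor at the window end.
    return window_days, 0
```

For example, a patient discharged on 1 January and readmitted on 11 January gets a time of 10 days with status 1, while a patient with no readmission on record is censored at 30 days.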

We also generate synthetic datasets using the normal distribution to generate the feature vectors and a Weibull distribution to generate the synthetic response times.
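A sketch of how such synthetic data could be generated follows. The slides only state that features are normal and response times are Weibull; the proportional-hazards link between features and times, the fixed censoring time, and all parameter values below are assumptions.

```python
import numpy as np

def make_synthetic(n, m, shape=1.5, scale=10.0, censor_time=12.0, seed=0):
    """Normal features with Weibull proportional-hazards event times (sketch)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, m))
    beta = rng.normal(scale=0.5, size=m)  # hypothetical true coefficients
    # Inverse transform sampling from S(t|x) = exp(-(t/scale)^shape * exp(x.beta)).
    u = rng.uniform(size=n)
    event_time = scale * (-np.log(u) / np.exp(X @ beta)) ** (1.0 / shape)
    # Administrative right censoring at a fixed time (assumed).
    T = np.minimum(event_time, censor_time)
    delta = (event_time <= censor_time).astype(int)
    return X, T, delta, beta

X, T, delta, beta = make_synthetic(500, 15)
```

Knowing the generating coefficients makes it possible to check whether a fitted model recovers the direction of each effect, which is not possible on the real datasets.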

• 18

# Instances, # Features and Active Learning Sampling Size in Dataset

Dataset  #Inst  #Feat  Train (Samp Size)
Breast   686    10     100 (20)
Colon    311    19     50 (10)
PBC      888    15     200 (20)
EHR 1    5675   98     500 (100)
EHR 2    4379   98     500 (100)
EHR 3    3543   98     500 (100)
EHR 4    2826   98     500 (50)
Syn1     500    15     100 (15)
Syn2     500    50     100 (15)
Syn3     100    50     50 (1)

• 19

Experimental Results

Dataset  L-COX  EN-COX  C-Boost  RSF    GBCI   ARC-L   ARC-EN  ARC-KEN
Breast   0.61   0.63    0.67     0.68   0.69   0.65    0.6856  0.734
Colon    0.651  0.65    0.62     0.60   0.64   0.738   0.735   0.859
PBC      0.735  0.759   0.86     0.863  0.79   0.81    0.825   0.862
EHR 1    0.54   0.55    0.59     0.58   0.59   0.60    0.64    0.671
EHR 2    0.56   0.5822  0.60     0.61   0.601  0.66    0.68    0.71
EHR 3    0.533  0.553   0.59     0.59   0.58   0.575   0.58    0.601
EHR 4    0.54   0.55    0.58     0.569  0.56   0.585   0.581   0.645
Syn1     0.59   0.628   0.60     0.61   0.589  0.7823  0.838   0.92
Syn2     0.801  0.815   0.86     0.94   0.93   0.86    0.867   0.921
Syn3     0.67   0.688   0.64     0.64   0.664  0.73    0.78    0.81

• 20

Comparison of RMSE std values of ARC

Dataset ARC(LASSO) ARC(EN) ARC(KEN)

Breast

Colon

PBC

EHR 1

EHR 2

EHR 3

EHR 4

Syn1

Syn2

Syn3

• 21

Active learning curves (panels: Breast, Colon, EHR 1, EHR 2)

• 22

Active learning curves contd. (panels: PBC, Synthetic1, Synthetic2, Synthetic3)

• 23

Conclusions and Future Work

We present an active learning extension to the Cox regression framework using a novel model discriminative gradient based sampling procedure.

The proposed method uses a fast and scalable optimization method to converge efficiently. Experimental results on EHR data and censored datasets indicate that the proposed models have good discriminative ability and outperform other competing survival regression methods.

Future work includes extending this active learning model to transfer learning using regularized Cox regression and accelerated failure time models.

• 24

References

J. P. Klein and M. L. Moeschberger. Survival analysis: techniques for censored and truncated data. Springer, 2003.

P. Sasieni. Cox regression model. Encyclopedia of Biostatistics, 1999.

N. Simon, J. Friedman, T. Hastie, and R. Tibshirani. Regularization paths for Cox proportional hazards model via coordinate descent. Journal of Statistical Software, 39(5):1–13, 2011.

B. Vinzamuri and C. K. Reddy. Cox regression with correlation based regularization for electronic health records. In Proceedings of the IEEE International Conference on Data Mining (ICDM), pages 757–767. IEEE, 2013.

Y. Chen, Z. Jia, D. Mercola, and X. Xie. A gradient boosting algorithm for survival analysis via direct optimization of concordance index. Computational and Mathematical Methods in Medicine, 2013.

B. Settles. Active learning literature survey. University of Wisconsin, Madison, 2010.

Y. Yang and H. Zou. A cocktail algorithm for solving the elastic net penalized Cox regression in high dimensions. Statistics and Its Interface, 2012.

http://dmkd.cs.wayne.edu/survival
