A Data-Driven Nomogram for Breast Cancer Survival
Capstone by Lisa Federer, Bret Gaulin, Michal Haskell, Andy Pollack, and Carmen Smiley
The Problem
Image source: http://cdn3.chartsbin.com/chartimages/l_yq6_22d511a9fafd3dee85027e451151ed18
1 in 8 US women will be diagnosed with breast cancer during their lifetime.
Image source: http://www.pathophys.org/wp-content/uploads/2012/12/breastcancer-copy.png
Our hypothesis
Certain demographic, diagnostic, and treatment parameters can reliably predict survival time for women with breast cancer. Using this knowledge, we could build a “calculator” to estimate survival time for individuals.
Data exploration and variable selection
146 variables in SEER database
13 variables of interest:
• Survival time (months)
• Age at diagnosis
• Year of birth
• Race
• Origin (Hispanic recode)
• Stage
• Histology
• Tumor extent
• Number of primary tumors
• Laterality
• ER Status
• PR Status
• Radiation therapy
Classification by survival time
Classification model results

Model                Class    Precision  Recall  F-score  Support
Logistic Regression  0        0.65       0.43    0.23     146
                     1        0.55       0.37    0.44     416
                     2        0.55       0.76    0.63     471
                     Average  0.56       0.56    0.54     1033
Naïve Bayes          0        0.39       0.16    0.23     162
                     1        0.63       0.3     0.4      402
                     2        0.5        0.84    0.63     466
                     Average  0.54       0.84    0.63     1030
Decision Tree        0        0.3        0.39    0.34     131
                     1        0.76       0.75    0.75     416
                     2        0.8        0.75    0.78     485
                     Average  0.72       0.7     0.71     1032
Random Forest        0        0.58       0.43    0.5      183
                     1        0.73       0.75    0.74     414
                     2        0.71       0.78    0.74     440
                     Average  0.7        0.7     0.7      1037
K Neighbors          0        0.53       0.33    0.41     150
                     1        0.83       0.82    0.82     425
                     2        0.77       0.88    0.82     472
                     Average  0.76       0.77    0.76     1047
Regression and survival models
Cox proportional hazards model
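For reference, a Cox model assumes the hazard for a patient with covariates x is h(t | x) = h0(t) · exp(β·x), which implies the survival curve S(t | x) = S0(t)^exp(β·x). The sketch below only illustrates that relationship; the coefficients, covariate names, and baseline survival value are made-up placeholders, not fitted values from this project.

```python
import math

def relative_hazard(coefs, covariates):
    """exp(beta . x): hazard relative to a baseline patient (all covariates 0)."""
    linear_predictor = sum(coefs[name] * covariates[name] for name in coefs)
    return math.exp(linear_predictor)

def survival_probability(baseline_survival, coefs, covariates):
    """S(t | x) = S0(t) ** exp(beta . x) under the proportional hazards assumption."""
    return baseline_survival ** relative_hazard(coefs, covariates)

# Hypothetical coefficients and patient (placeholders, not from the capstone)
coefs = {"age_decades_over_50": 0.2, "stage": 0.5}
patient = {"age_decades_over_50": 1.0, "stage": 2.0}

hr = relative_hazard(coefs, patient)            # exp(0.2 * 1.0 + 0.5 * 2.0)
s = survival_probability(0.9, coefs, patient)   # assumes baseline S0(t) = 0.9 at some t
print(f"relative hazard: {hr:.3f}, survival probability: {s:.3f}")
```

A relative hazard above 1 pulls the patient's survival curve below the baseline curve at every time point, which is the property a nomogram built on this model relies on.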
Aalen’s additive model
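Unlike the multiplicative Cox form, Aalen's additive model lets each covariate shift the hazard additively, with coefficients that may vary over time. In generic textbook notation (the symbols are not taken from the slides), the hazard and the cumulative coefficients that are usually estimated and plotted are:

```latex
\lambda(t \mid x) = \beta_0(t) + \sum_{j=1}^{p} \beta_j(t)\, x_j,
\qquad
B_j(t) = \int_0^t \beta_j(s)\, ds
```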
Nomogram demonstration
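A nomogram typically converts each variable's contribution to a model's linear predictor into points on a common 0 to 100 scale, so that a reader can add up points by hand. The sketch below shows one common way to do that rescaling, assuming a linear predictor on the hazard scale (more points means higher risk). The function name, coefficients, and variable ranges are invented for illustration and are not the capstone's fitted values.

```python
def nomogram_points(coefs, ranges, patient):
    """Map each variable's contribution beta_j * x_j onto a 0-100 point scale.

    The variable whose contribution spans the widest range gets 100 points
    at its highest-risk end; all other variables are scaled relative to it.
    """
    spans = {v: abs(coefs[v]) * (ranges[v][1] - ranges[v][0]) for v in coefs}
    max_span = max(spans.values())
    points = {}
    for v in coefs:
        low, high = ranges[v]
        # Reference value is the lowest-risk end of the variable's range
        ref = low if coefs[v] >= 0 else high
        points[v] = 100.0 * coefs[v] * (patient[v] - ref) / max_span
    return points

# Hypothetical coefficients and ranges (placeholders only)
coefs = {"age": 0.03, "stage": 0.5, "er_positive": -0.4}
ranges = {"age": (30, 90), "stage": (1, 4), "er_positive": (0, 1)}
patient = {"age": 60, "stage": 3, "er_positive": 0}

points = nomogram_points(coefs, ranges, patient)
total = sum(points.values())
print({v: round(p, 1) for v, p in points.items()}, round(total, 1))
```

The total point score is a linear rescaling of the model's linear predictor, so it can be read against a second axis that maps total points to an estimated survival time or probability.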
Limitations
Not a substitute for medical advice
Missing or incomplete data in SEER database
Questions?