Changing Lives in Healthcare through Machine Learning
By Douglas Barr (@1Geek)
VP, Innovation, RedBird, a Havas Health Company
ML Checklist
• Step 1. Pick an industry. Any industry.
• Step 2. Find a problem that can be formulated as a function.
• Step 3. Is that function non-trivial? If not, go back to Step 1.
• Step 4. List all the input parameters for that function.
• Step 5. Is each of them accurately observable? If not, focus on the parameter that isn't and go back to Step 2.
• Step 6. Apply some ML to it. (The choice of technique won't matter that much.)
• Step 7. Did you get some improvement?
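Step 7 is easiest to answer against a trivial baseline. The sketch below assumes a classification problem with labelled outcomes and compares a model's accuracy with a predictor that always outputs the most common label; the function names are illustrative and not from the talk.

```python
from collections import Counter
from typing import Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def improvement_over_baseline(y_true: Sequence[Hashable],
                              y_pred: Sequence[Hashable]) -> float:
    """Accuracy gain of a model over always predicting the majority label."""
    model_acc = accuracy(y_true, y_pred)  # also validates the inputs
    majority_label, _ = Counter(y_true).most_common(1)[0]
    baseline_acc = accuracy(y_true, [majority_label] * len(y_true))
    return model_acc - baseline_acc
```

A positive result means the model beats "always guess the most common class"; on imbalanced outcomes such as readmission, that baseline can already look deceptively accurate.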
Why Am I On-stage?
• My primary focus for the past ~4 years has been on ML algorithms
• Bachelor of Science, CS/EE, M.I.T. (IFHTP)
• Developed some very good models for tackling issues in healthcare
• Developed a super-cool conversational AI bot to help patients with diabetes (shameless plug)
ABOUT US
MEASUREMENT, ANALYSIS, & IMPROVEMENT.
“Trust us, it worked” is something you will never hear at REDBIRD.
Providing rich data analysis is invaluable in today's environment.
At REDBIRD, we help brands break through the clutter to understand not only what to measure and why, but also what to do with the information.
We Are REDBIRD
Machine Learning & Healthcare
• Medtronic partnered with IBM on the Sugar.IQ app
• Adherence: AiCure
• AI Coach: RedBird HealthBot (shameless plug… AGAIN!)
• Healthy Behavior: Welltok partnered with IBM Watson
Problem
The readmission rate is one of the key quality indicators for hospitals. In 2014, Medicare fined a record 2,610 hospitals for having too many patients return within a month.
Source: http://khn.org/news/medicare-readmissions-penalties-2015/
Objective
Predict the readmission class of patients discharged to home:
1. Readmitted within 30 days of discharge
2. Readmitted more than 30 days after discharge
3. No readmission on record (1999-2008)
Predicting readmission within 30 days is critical not only for hospitals but for patients as well.
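The three classes above can be derived from a discharge date and the date of the next admission, if any. A minimal sketch, assuming those two dates are available per encounter and that a gap of exactly 30 days counts as "within 30 days" (the slide does not state the boundary); `readmission_class` is a hypothetical helper returning the class number used in the list.

```python
from datetime import date
from typing import Optional


def readmission_class(discharge: date, next_admission: Optional[date]) -> int:
    """Return 1 (readmitted within 30 days), 2 (readmitted later) or 3 (none on record).

    Assumption: a gap of 30 days or fewer is treated as class 1.
    """
    if next_admission is None:
        return 3
    gap_days = (next_admission - discharge).days
    if gap_days < 0:
        raise ValueError("next admission precedes discharge")
    return 1 if gap_days <= 30 else 2
```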
About The Data
• 101,766 patient hospitalization records
• The Health Facts data is an extract representing 10 years (1999-2008) of clinical care at 130 hospitals
Source: http://www.hindawi.com/journals/bmri/2014/781670/
Feature Extraction
• To obtain a high degree of predictive accuracy, we identified the following 24 features for training the model:
'race', 'gender', 'ages', 'admission', 'discharge', 'admsource',
'time in hospital', 'payer code', 'num lab procedures', 'num procedures',
'num medications', 'number outpatient', 'number emergency',
'number inpatient', 'diag1', 'diag2', 'number diagnoses',
'max glu serum', 'A1Cresult', 'insulin', 'change', 'diabetesMed'
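Feature names in the importance list later in the deck (for example gender_male and diag2_circulatory) suggest that categorical inputs were one-hot encoded and that diagnosis codes were grouped by ICD-9 range. A minimal sketch of both steps under that assumption: the ranges are standard ICD-9-CM chapter boundaries used as a coarse, hypothetical grouping, not the 40 categories the model actually used, and `icd9_group` and `one_hot` are illustrative names.

```python
from typing import Dict, Iterable, List

# Coarse ICD-9-CM chapter ranges (hypothetical grouping, NOT the talk's 40 categories).
ICD9_RANGES = [
    ("neoplasms", 140, 239),
    ("circulatory", 390, 459),
    ("respiratory", 460, 519),
    ("digestive", 520, 579),
    ("genitourinary", 580, 629),
    ("musculoskeletal", 710, 739),
    ("injury", 800, 999),
]


def icd9_group(code: str) -> str:
    """Map a raw ICD-9 code string such as '250.83' or '428' to a coarse group."""
    code = code.strip().upper()
    if not code or code[0] in "EV":  # missing, or supplementary E/V codes
        return "other"
    major_str = code.split(".")[0]
    if not major_str.isdigit():  # e.g. '?' used as a missing-value marker
        return "other"
    major = int(major_str)
    if major == 250:  # 250.xx is diabetes mellitus
        return "diabetes"
    for name, low, high in ICD9_RANGES:
        if low <= major <= high:
            return name
    return "other"


def one_hot(rows: Iterable[Dict[str, str]], columns: List[str]) -> List[Dict[str, int]]:
    """Expand categorical columns into 0/1 indicators named '<column>_<value>'."""
    rows = list(rows)
    # Distinct values observed for each column, in sorted order.
    levels = {c: sorted({row[c] for row in rows}) for c in columns}
    encoded = []
    for row in rows:
        features = {}
        for c in columns:
            for value in levels[c]:
                features[f"{c}_{value}"] = int(row[c] == value)
        encoded.append(features)
    return encoded
```

For example, a male patient whose secondary diagnosis is '428' gets gender_male = 1 and diag2_circulatory = 1. In a real pipeline the category levels should be learned from the training split only.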
Model
• Diagnoses were grouped into 40 categories based on their ICD-9 codes.
• The data were split 4:1:5 (training : validation : testing)
• Tested the following classifiers:
  • Random Forest
  • K-Nearest Neighbors (KNN)
  • Logistic Regression (LR) with Lasso and Ridge regularization
  • Naïve Bayes
  • Support Vector Machine (SVM)
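A 4:1:5 split assigns 40% of the records to training, 10% to validation and 50% to testing. A minimal sketch of such a split, assuming records are independent and a plain shuffled (not stratified) split is acceptable; `split_by_ratio` and its seed are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_by_ratio(items: Sequence[T], ratios: Sequence[int] = (4, 1, 5),
                   seed: int = 0) -> Tuple[List[T], ...]:
    """Shuffle items and cut them into consecutive parts proportional to `ratios`."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    total = sum(ratios)
    parts, start, cumulative = [], 0, 0
    for r in ratios:
        cumulative += r
        # Round cumulative boundaries so the last part always ends at len(items).
        end = round(len(shuffled) * cumulative / total)
        parts.append(shuffled[start:end])
        start = end
    return tuple(parts)


train, val, test = split_by_ratio(range(101766))
```

Applied to 101,766 record indices, this yields 40,706 training, 10,177 validation and 50,883 test records. Rounding the cumulative boundaries, rather than each part separately, guarantees every record lands in exactly one part.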
Top 18 Features by Importance
1. num_lab_procedures: 0.0463
2. num_medications: 0.0454
3. number_inpatient: 0.0442
4. time_in_hospital: 0.0400
5. ages: 0.0391
6. number_diagnoses: 0.0365
7. num_procedures: 0.0325
8. gender_male: 0.0239
9. number_outpatient: 0.0203
10. number_emergency: 0.0186
11. insulin_steady: 0.0161
12. payer_code_MC: 0.0150
13. race_caucasian: 0.0150
14. diag2_circulatory: 0.0142
15. change (medication change): 0.0137
16. admission_urgent: 0.0118
17. diag3_neoplasms: 0.0104
18. diag2_diabetes: 0.0103
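The slide does not say which classifier produced these scores or how they are normalized. If they are normalized importances that sum to 1 over all features (an assumption, as with a random forest's impurity-based importances), a running total shows how concentrated the signal is. A small sketch using the scores copied from the list; `cumulative_share` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

# Scores copied from the top-18 list above.
TOP_18: Dict[str, float] = {
    "num_lab_procedures": 0.0463, "num_medications": 0.0454,
    "number_inpatient": 0.0442, "time_in_hospital": 0.0400,
    "ages": 0.0391, "number_diagnoses": 0.0365,
    "num_procedures": 0.0325, "gender_male": 0.0239,
    "number_outpatient": 0.0203, "number_emergency": 0.0186,
    "insulin_steady": 0.0161, "payer_code_MC": 0.0150,
    "race_caucasian": 0.0150, "diag2_circulatory": 0.0142,
    "change": 0.0137, "admission_urgent": 0.0118,
    "diag3_neoplasms": 0.0104, "diag2_diabetes": 0.0103,
}


def cumulative_share(importances: Dict[str, float]) -> List[Tuple[str, float, float]]:
    """Return (feature, score, running total) in descending order of score."""
    running = 0.0
    out = []
    # sorted() is stable, so tied scores keep their original order.
    for name, score in sorted(importances.items(), key=lambda kv: kv[1], reverse=True):
        running += score
        out.append((name, score, running))
    return out
```

The running total for all 18 features is about 0.4533, so under the normalization assumption these features carry roughly 45% of the total importance and the remaining features share the rest.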
Objective
Inform females of their chances of being a carrier of
Duchenne Muscular Dystrophy (DMD) based on
serum markers and family pedigree
About The Data
• The data were obtained from M. Percy, Vanderbilt University, 1985
• 209 observations correspond to blood samples from 192 patients (17 patients have two samples)
• Collected as part of a screening program for female
relatives of boys with DMD
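Because 17 of the 192 patients contributed two samples, splitting by sample could put one patient's two samples on opposite sides of the train/test boundary and inflate test scores. The slide does not say how the split was done; the sketch below assumes each sample carries a patient identifier and splits by patient instead. `group_split` is a hypothetical helper.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def group_split(patient_ids: Sequence[Hashable], test_fraction: float = 0.2,
                seed: int = 0) -> Tuple[List[int], List[int]]:
    """Split sample indices so all samples from one patient land on the same side."""
    by_patient: Dict[Hashable, List[int]] = defaultdict(list)
    for index, pid in enumerate(patient_ids):
        by_patient[pid].append(index)
    patients = list(by_patient)
    random.Random(seed).shuffle(patients)
    n_test = round(len(patients) * test_fraction)
    test_patients = set(patients[:n_test])
    train = [i for i, pid in enumerate(patient_ids) if pid not in test_patients]
    test = [i for i, pid in enumerate(patient_ids) if pid in test_patients]
    return train, test
```

With 192 patients and the default 20% test fraction, 38 patients (and all of their samples) are held out.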
The Data (cont.)
• Enzyme levels were measured in known carriers
(75 samples) and in a group of non-carriers (134
samples)
• Of note: the first two serum markers, creatine kinase and hemopexin (ck, h), are inexpensive to obtain, while pyruvate kinase and lactate dehydrogenase (pk, ld) are more expensive
• It is of interest to measure how much pk and ld add to predicting carrier status
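One way to quantify how much pk and ld add is an ablation: score the same model with only the cheap markers, then with each expensive marker added, then with both. A minimal sketch, assuming a caller-supplied `evaluate` function (hypothetical) that trains and scores a model on the named features using the same held-out data for every feature set.

```python
from typing import Callable, Dict, Sequence

CHEAP = ("ck", "h")       # inexpensive serum markers
EXPENSIVE = ("pk", "ld")  # more expensive serum markers


def marker_gains(evaluate: Callable[[Sequence[str]], float],
                 base: Sequence[str] = CHEAP,
                 extra: Sequence[str] = EXPENSIVE) -> Dict[str, float]:
    """Score gain over the `base` markers from adding each `extra` marker, then all of them."""
    baseline = evaluate(list(base))
    gains = {}
    for marker in extra:
        gains[marker] = evaluate(list(base) + [marker]) - baseline
    gains["+".join(extra)] = evaluate(list(base) + list(extra)) - baseline
    return gains
```

If the combined gain is small relative to the model's variance across folds, the cheaper two-marker panel may be sufficient.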
Result
• Using a Two-Class Decision Forest algorithm, our predictive model achieved 95% accuracy with 87.5% precision
• Further stats:
  • True Positives: 14
  • False Positives: 2
  • False Negatives: 0
  • True Negatives: 26
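The headline figures follow directly from the four counts. A small sketch, assuming "positive" means "carrier"; the `ConfusionMatrix` class is illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int
    fp: int
    fn: int
    tn: int

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / self.total

    @property
    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def recall(self) -> float:  # sensitivity
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


result = ConfusionMatrix(tp=14, fp=2, fn=0, tn=26)
```

With these counts there are 42 test samples: accuracy is 40/42 ≈ 0.952, precision 14/16 = 0.875, recall 14/14 = 1.0 and specificity 26/28 ≈ 0.929. Zero false negatives means no carrier in this test set was missed, which matters more for screening than the two false alarms.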
Improving Future Care through Machine Learning
• For healthcare, the problem really isn't regulation, it's data
• Can we truly base health decisions on black-box computations?
• We need to begin thinking seriously about the ramifications
• With great power comes great responsibility