Havas Health presentation at the Chief Analytics Officer, Fall 2016

Changing Lives in Healthcare

through Machine Learning

By Douglas Barr

VP, Innovation, RedBird, a Havas Health Company

@1Geek

ML Checklist

• Step 1. Pick an industry. Any industry.

• Step 2. Find a problem that can be formulated as a function.

• Step 3. Is that function non-trivial? If not, go back to Step 1.

• Step 4. List all the input parameters for that function.

• Step 5. Are all of them accurately observable? If not, focus on the parameter that isn't and go back to Step 2.

• Step 6. Apply some ML to it. (Choice of the tech wouldn't matter that much.)

• Step 7. Did you get some improvement?

Why Am I On-stage?

• Primary focus for the past ~4 years has been ML algorithms

• Bachelor of Science, CS/EE M.I.T(IFHTP)

• Developed some very good models for tackling issues in healthcare

• Developed a super-cool conversational AI bot to help patients with diabetes (shameless plug)

ABOUT US

MEASUREMENT, ANALYSIS, & IMPROVEMENT.

“Trust us, it worked” is something you will never hear at REDBIRD.

Providing rich data analysis is invaluable in today's environment.

At REDBIRD, we help brands break through the clutter to understand not only what to measure and why, but also what to do with the information.

We Are REDBIRD

Yay! ML Library Buzzwords!

• Python, C++

• Libraries: Torch, Keras

• My own ConvoNet for NLP

Life is short. Do stuff that matters.

- Paul Graham, Y Combinator

What A $%*# Mess!

Patient Life

We Are More Connected Than Ever

Machine Learning & Healthcare

• Medtronic partnered with IBM for Sugar.IQ app

• Adherence: AiCure

• AI Coach: RedBird HealthBot (shameless plug… AGAIN!)

• Healthy Behavior: Welltok partnered with IBM Watson

Predictive Medicine

Don’t Be Evil

• ML can be used for good, hopefully not bad

• ANNs in a regulated industry? Hmmm…..

The most important commodity I know of is information.

- Gordon Gekko, Wall Street (1987)

CASE STUDY: PREDICT READMISSION CLASSES OF PATIENTS DISCHARGED TO HOME

Problem

Readmission rate is one of the key indicators hospitals use to maintain their quality. In 2014, Medicare fined a record 2,610 hospitals for having too many patients return within a month.

Source: http://khn.org/news/medicare-readmissions-penalties-2015/

Objective

Predict the readmission class of patients discharged to home:

1. Readmitted within 30 days of discharge

2. Readmitted more than 30 days after discharge

3. No readmission (between 1999-2008)

Predicting readmission within 30 days is critical not only for hospitals but for patients as well
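The three target classes above can be written down as a small label mapping. This is only a sketch: it assumes the raw labels look like the public release of the diabetes readmission dataset, where the `readmitted` column holds "<30", ">30", or "NO". The function name and the class numbering are illustrative, not taken from the talk.

```python
# Assumption: raw labels follow the public dataset release, where the
# `readmitted` column holds "<30", ">30", or "NO".
READMISSION_CLASSES = {
    "<30": 1,  # readmitted within 30 days of discharge
    ">30": 2,  # readmitted more than 30 days after discharge
    "NO": 3,   # no readmission on record
}


def readmission_class(raw_label: str) -> int:
    """Map a raw `readmitted` value to one of the three target classes."""
    key = raw_label.strip().upper()
    if key not in READMISSION_CLASSES:
        raise ValueError(f"unexpected readmitted value: {raw_label!r}")
    return READMISSION_CLASSES[key]


if __name__ == "__main__":
    print([readmission_class(v) for v in ["<30", ">30", "NO"]])  # [1, 2, 3]
```

Failing loudly on an unknown label is a deliberate choice here: a silent default would quietly inflate the "no readmission" class.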

About The Data

• 101,766 patient hospitalization records

• Health Facts data was an extract representing 10 years (1999-2008) of clinical care at 130 hospitals

http://www.hindawi.com/journals/bmri/2014/781670/

Feature Extraction

• To obtain a high degree of predictive accuracy, our model learned and identified the following 24 features for training:

'race', 'gender', 'ages', 'admission', 'discharge', 'admsource', 'time in hospital', 'payer code', 'num lab procedures', 'num procedures', 'num medications', 'number outpatient', 'number emergency', 'number inpatient', 'diag1', 'diag2', 'number diagnoses', 'max glu serum', 'A1Cresult', 'insulin', 'change', 'diabetesMed'
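Names in the feature ranking later in the deck, such as Gender_male and Insulin_steady, suggest that categorical columns were expanded into indicator columns. A minimal sketch of that kind of one-hot encoding follows. The helper `one_hot` and the toy records are hypothetical, and the talk does not say which tool was actually used.

```python
def one_hot(records, categorical):
    """Expand categorical columns into 0/1 columns named `<column>_<value>`.

    Columns not listed in `categorical` (e.g. counts) pass through unchanged.
    """
    # Collect the distinct levels of each categorical column, sorted so the
    # generated column order is stable from run to run.
    levels = {
        col: sorted({str(r[col]).lower() for r in records})
        for col in categorical
    }
    encoded = []
    for r in records:
        row = {}
        for col, value in r.items():
            if col in levels:
                for level in levels[col]:
                    row[f"{col}_{level}"] = int(str(value).lower() == level)
            else:
                row[col] = value
        encoded.append(row)
    return encoded


if __name__ == "__main__":
    # Toy records, made up for illustration only.
    toy = [
        {"gender": "Male", "insulin": "Steady", "num_medications": 12},
        {"gender": "Female", "insulin": "No", "num_medications": 7},
    ]
    for row in one_hot(toy, ["gender", "insulin"]):
        print(row)
```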

Model

• Diagnoses were grouped into 40 categories based on ICD-9 codes.

• Data was split 4:1:5 (training: validation: testing set)

• Tested the following classifiers:

• Random Forest

• KNN

• LR (Lasso and Ridge regularization)

• Naïve Bayes

• SVM
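Two of the preparation steps above, grouping by ICD-9 code and the 4:1:5 split, can be sketched in plain Python. Both functions are hypothetical illustrations. The ranges shown are a handful of standard ICD-9 chapter ranges, not the 40-category scheme from the talk, and the shuffle seed is an arbitrary assumption.

```python
import random

# Illustrative subset of ICD-9 chapter ranges. This is NOT the 40-category
# scheme used in the talk; everything unmatched falls into "other".
ICD9_RANGES = [
    (140, 239, "neoplasms"),
    (390, 459, "circulatory"),
    (460, 519, "respiratory"),
    (520, 579, "digestive"),
]


def icd9_category(code: str) -> str:
    """Map an ICD-9 code such as '428' or '250.83' to a coarse category."""
    code = code.strip().upper()
    if code.startswith(("E", "V")):
        return "other"  # supplementary E and V codes
    try:
        major = int(float(code))
    except (ValueError, OverflowError):
        return "other"  # missing or malformed code
    if major == 250:
        return "diabetes"  # 250.xx is diabetes mellitus
    for low, high, name in ICD9_RANGES:
        if low <= major <= high:
            return name
    return "other"


def split_4_1_5(records, seed=0):
    """Shuffle, then split into 40% training, 10% validation, 50% testing."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 4 // 10
    n_val = len(shuffled) // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


if __name__ == "__main__":
    print(icd9_category("428"), icd9_category("250.83"), icd9_category("V45"))
    train, val, test = split_4_1_5(range(101_766))
    print(len(train), len(val), len(test))  # 40706 10176 50884
```

Holding out half the records for testing, as the 4:1:5 ratio does, leaves less data for training but gives a more stable estimate of how each classifier generalizes.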

Classifiers

Random Forest – F1

Top 18 Important Features

1. num_lab_procedures: 0.0463

2. num_medications: 0.0454

3. number_inpatient: 0.0442

4. time_in_hospital: 0.0400

5. ages: 0.0391

6. number_diagnoses: 0.0365

7. num_procedures: 0.0325

8. gender_male: 0.0239

9. number_outpatient: 0.0203

10. number_emergency: 0.0186

11. insulin_steady: 0.0161

12. payer_code_MC: 0.0150

13. race_caucasian: 0.0150

14. diag2_circulatory: 0.0142

15. medication_change: 0.0137

16. admission_urgent: 0.0118

17. diag3_neoplasms: 0.0104

18. diag2_diabetes: 0.0103

Conclusion

With 98% accuracy, our model is a good indicator of what could be done with more data

CASE STUDY: DUCHENNE MUSCULAR DYSTROPHY

Predicting carrier diagnosis using Machine Learning

Objective

Inform females of their chances of being a carrier of

Duchenne Muscular Dystrophy (DMD) based on

serum markers and family pedigree

About The Data

• The data was obtained from M. Percy, Vanderbilt University, 1985

• 209 observations corresponded to blood samples on

192 patients (17 patients have two samples)

• Collected as part of a screening program for female

relatives of boys with DMD

The Data (cont.)

• Enzyme levels were measured in known carriers (75 samples) and in a group of non-carriers (134 samples)

• Of note: The first two serum markers, creatine kinase and hemopexin (ck, h), are inexpensive to obtain, while pyruvate kinase and lactate dehydrogenase (pk, ld) are more expensive

• It is of interest to measure how much pk and ld add toward predicting the carrier status

Result

• Using a Two-Class Decision Forest algorithm, we obtained 95% accuracy in our predictive model, with 87% precision

• Further stats:

• True Positives: 14

• False Positives: 2

• False Negatives: 0

• True Negatives: 26
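The headline figures can be checked against the four counts above. The sketch below uses the standard definitions of accuracy, precision, recall, and F1; the function name `binary_metrics` is illustrative. Recall and F1 are not quoted on the slide and are derived here from the same counts.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute standard binary-classification metrics from confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Counts as reported on the slide: 42 evaluated samples in total.
    m = binary_metrics(tp=14, fp=2, fn=0, tn=26)
    for name, value in m.items():
        print(f"{name}: {value:.3f}")
    # accuracy: 0.952, precision: 0.875, recall: 1.000, f1: 0.933
```

With zero false negatives, recall is 100%: no carrier in this evaluation set was missed, which is arguably the more important number for a screening program.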

Improving Future Care

through Machine Learning

• For healthcare, the problem really isn’t regulation, it’s data

• Can we truly base health decisions on some black-box computations?

• We need to really begin thinking about ramifications

• With great power comes great responsibility

Thank you.

Data & Analytics
