pierre-gutierrez
Machine Learning and Internet of Things: The Future of Medical Prevention
Introduction
Dataiku
• Founded in 2013
• 60+ employees
• Paris, New York, London, San Francisco
• Data science software publisher; maker of Dataiku DSS
DESIGN
• PREPARE: Load and prepare your data
• MODEL: Build your models
• ANALYSE: Visualize and share your work
PRODUCTION
• AUTOMATE: Re-execute your workflow at ease
• MONITOR: Follow your production environment
• SCORE: Get predictions in real time
A data science workflow: Six steps to a predictive model
Business/Problem Understanding
1. Data Acquisition (Dataset 1, Dataset 2, Dataset n)
2. Data Exploration & Understanding
3. Data Preparation
4. Model Creation
5. Evaluation
6. Deployment (Scored dataset, Model as an API)
Iteration 1, Iteration 2, Iteration n
Creating a predictive model is a highly iterative process. Data Science Studio lets its users create and manage these projects end to end. The process is not industry specific and can be applied to many use cases.
Adapted from the CRISP-DM methodology
Epilepsy: Stats and figures
1-3% of the population
15.5 billion euros / year spent treating seizures
6 types of epilepsy
Dozens of existing treatments
Days to weeks of hospital time required to diagnose
Epilepsy + IoT: Faster, more comfortable diagnosis
EEG: Seizures on an EEG
EEG: Spikes on an EEG
Goals: Improve Epilepsy Diagnosis
1. Allow at-home EEG recording via wearable device
2. Detect seizures automatically
3. Detect spikes automatically
4. Shorten time-to-diagnosis for patients with epilepsy
Ageing: Stats and figures
x3 over the last 60 years
12 million fall every year in the U.S.
700 million people older than 60 in 2006
28.5% of this population live alone in the EU
Third leading cause of death: strokes
3/4 of all strokes happen to people over 65
Safe Aging with Sphere: Improve Fall Detection
Credit: Aakansh Gupta, http://datascience.blog.uhuru.co.jp/machine-learning/safe-aging-with-iot-and-machine-learning/
Goals: Improve Fall Detection
1. Predict falls and detect strokes so that help may be summoned
2. Analyse eating behaviour, including whether people are taking prescribed medication
3. Detect periods of depression or anxiety and intervene using computer-based therapy
Data Acquisition / Preparation
The concept: We're taking a digital sample of an analog signal
[Figure legend: Data Collected, True Signal]
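As a rough illustration of the sampling idea, here is a minimal sketch. The 5 Hz sine wave standing in for the "true signal" is purely an assumption, as are the function names; the 1024 Hz rate is the EEG sampling rate quoted in this deck.

```python
import math

def true_signal(t, freq_hz=5.0):
    """Hypothetical analog signal: a pure sine wave (assumption for illustration)."""
    return math.sin(2 * math.pi * freq_hz * t)

def sample(signal, rate_hz, duration_s):
    """Return (time, value) pairs taken every 1/rate_hz seconds."""
    n = int(rate_hz * duration_s)
    return [(i / rate_hz, signal(i / rate_hz)) for i in range(n)]

# One second of the continuous signal becomes 1024 discrete points.
samples = sample(true_signal, rate_hz=1024, duration_s=1.0)
print(len(samples))
```

Anything the signal does between two sample instants is lost, which is why the sampling rate has to be chosen with the fastest event of interest in mind.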
The Data (Epilepsy)
1024 Hz x 24 channels = 353 MB / (hour x patient)
20 patients x 24 hours = 170 GB
Nightly transfers of data from device to cloud (via Wi-Fi)
We want to scale to hundreds of patients with days of data
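The volume figures above can be reproduced with a few lines of arithmetic. This sketch assumes 4 bytes per sample, which the slide does not state; it is simply the sample size that makes the quoted numbers work out (about 353.9 MB per patient-hour and about 170 GB for the cohort).

```python
SAMPLE_RATE_HZ = 1024
CHANNELS = 24
BYTES_PER_SAMPLE = 4       # assumption: 32-bit samples, not stated on the slide
SECONDS_PER_HOUR = 3600

def bytes_per_patient_hour():
    """Raw bytes recorded for one patient during one hour."""
    return SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE * SECONDS_PER_HOUR

def total_bytes(patients, hours):
    """Raw bytes for a whole cohort over a recording period."""
    return bytes_per_patient_hour() * patients * hours

per_hour_mb = bytes_per_patient_hour() / 1e6
total_gb = total_bytes(patients=20, hours=24) / 1e9
print(f"{per_hour_mb:.1f} MB per patient-hour")
print(f"{total_gb:.1f} GB for 20 patients x 24 hours")
```

Scaling the same function to hundreds of patients and several days shows quickly why nightly transfers and purpose-built storage matter.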
The Data (Safe Aging)
• Accelerometer: sampled at 20 Hz
• RGB-D: bounding box information
• Environmental: the values of passive infrared (PIR) sensors
The Data Needs
• Interpolation: missing data, synchronization failures
• Smart sampling: zoom at different frequency levels
• Different sensors -> different frequencies: how to merge?
• Aggregation
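One possible answer to the "how to merge?" question is to resample every sensor onto a shared time grid. The sketch below does this with plain linear interpolation via `numpy.interp`; the two sensors, their rates (20 Hz and 5 Hz) and the 10 Hz target grid are made-up values, and no anti-aliasing filter is applied.

```python
import numpy as np

def resample_to_grid(timestamps, values, grid):
    """Linearly interpolate one sensor onto a shared time grid."""
    return np.interp(grid, timestamps, values)

# Hypothetical sensors covering the same second at different rates.
t_fast = np.arange(20) / 20          # 20 Hz
t_slow = np.arange(5) / 5            # 5 Hz
fast = np.sin(2 * np.pi * t_fast)
slow = t_slow * 10.0

# Shared 10 Hz grid, limited to the span both sensors cover (0.0 to 0.8 s).
grid = np.arange(9) / 10
merged = np.column_stack([
    grid,
    resample_to_grid(t_fast, fast, grid),   # downsampled
    resample_to_grid(t_slow, slow, grid),   # upsampled
])
print(merged.shape)
```

Each row of `merged` now holds one timestamp and one value per sensor, which is the shape most modelling tools expect.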
Time Series as Relational Data
Time Stamp   10001   10002   10003   10004   10005
Sensor1         40       -       -      43      42
Sensor2          -      50      55      20       -
Sensor3         30      34      60       -      40
Aggregation
Resampling
Interpolation
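The three operations can be tried directly on the relational table from this slide. This is a sketch using pandas, which the slides do not name; the choice to leave leading and trailing gaps unfilled and to resample by keeping every second timestamp are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# The table from the slide; "-" becomes NaN.
wide = pd.DataFrame(
    {
        "sensor1": [40, np.nan, np.nan, 43, 42],
        "sensor2": [np.nan, 50, 55, 20, np.nan],
        "sensor3": [30, 34, 60, np.nan, 40],
    },
    index=pd.Index([10001, 10002, 10003, 10004, 10005], name="timestamp"),
)

# Interpolation: fill interior gaps linearly along the timestamp index;
# gaps at the start or end of a column stay NaN.
filled = wide.interpolate(method="index", limit_area="inside")

# Resampling (simplest form): keep every second timestamp.
resampled = filled.iloc[::2]

# Aggregation: mean of the observed values per sensor.
means = wide.mean()

print(filled)
print(means)
```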
Time Series as JSON
{ "sensor1": { "10001": 40, "10004": 43, "10005": 42 }, "sensor2": { "10002": 50, "10003": 55, "10004": 20 } … }
Aggregation
Resampling
Interpolation
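To work with the JSON layout, the nested document usually has to be flattened or pivoted first. The sketch below uses only the two sensors spelled out on the slide and writes timestamps as strings, since JSON object keys must be strings; the helper names are hypothetical.

```python
import json

raw = (
    '{"sensor1": {"10001": 40, "10004": 43, "10005": 42},'
    ' "sensor2": {"10002": 50, "10003": 55, "10004": 20}}'
)

def to_rows(doc):
    """Flatten {sensor: {timestamp: value}} into sorted (timestamp, sensor, value) rows."""
    rows = [
        (int(ts), sensor, value)
        for sensor, series in doc.items()
        for ts, value in series.items()
    ]
    return sorted(rows)

def to_wide(doc):
    """Pivot to {timestamp: {sensor: value}}; missing readings are simply absent."""
    wide = {}
    for ts, sensor, value in to_rows(doc):
        wide.setdefault(ts, {})[sensor] = value
    return wide

doc = json.loads(raw)
rows = to_rows(doc)
wide = to_wide(doc)
print(rows[0])
print(wide[10004])
```

Unlike the relational table, the JSON form stores nothing for missing readings, so aggregation, resampling and interpolation all start with this kind of alignment step.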
Time Series as Time Series: Time Series Database
Aggregation
Resampling
Interpolation
Signal Processing: Lots of libraries, lots of options
Rename, Generate, Rolling mean, Rolling max, Rolling min, Rolling median, Wavelet decomposition, STL decomposition, Peak detection, Low pass filter, High pass filter, Convolution, Correlation, Short-time FFT
Implemented with a common interface
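A "common interface" could look like the sketch below, where every transform is a callable that takes a 1-D array. This is not the speaker's implementation; the registry, the function names, the window size and the threshold are all assumptions, and only three of the listed operations are shown.

```python
import numpy as np

def rolling_mean(x, window):
    """Mean over each full window of consecutive samples."""
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="valid")

def rolling_max(x, window):
    """Maximum over each full window of consecutive samples."""
    x = np.asarray(x, dtype=float)
    return np.array([x[i:i + window].max() for i in range(len(x) - window + 1)])

def peak_indices(x, threshold):
    """Indices of samples above both neighbours and above the threshold."""
    x = np.asarray(x, dtype=float)
    mid = x[1:-1]
    mask = (mid > x[:-2]) & (mid > x[2:]) & (mid > threshold)
    return np.flatnonzero(mask) + 1

# Common interface: name -> callable taking a 1-D array.
TRANSFORMS = {
    "rolling_mean": lambda x: rolling_mean(x, window=3),
    "rolling_max": lambda x: rolling_max(x, window=3),
    "peaks": lambda x: peak_indices(x, threshold=2.0),
}

signal = np.array([0.0, 1.0, 5.0, 1.0, 0.0, 3.0, 0.0])
results = {name: fn(signal) for name, fn in TRANSFORMS.items()}
print(results)
```

With a registry like this, adding a filter or a decomposition means adding one entry rather than changing the code that applies the transforms.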
Model Creation and Evaluation
EEG: Spikes on an EEG
[Figure: EEG segments with labels 0, 0, 0, 1]
Sphere: Movement detection
Machine learning: Features
• Descriptive features
Epilepsy: patient information (EMR, …)
Safe aging: EMR, age, height
• Time series features
Epilepsy: current values, previous values, correlations, Fourier, wavelets, …
Safe aging: current values, previous values, rolling averages, …
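The "current values, previous values, rolling averages" features can be sketched in a few lines. The function name, the number of lags and the window size below are illustrative assumptions, and the input is a toy series.

```python
def lag_features(series, n_lags, window):
    """Build one feature row per time step that has enough history."""
    rows = []
    start = max(n_lags, window - 1)
    for t in range(start, len(series)):
        row = {"current": series[t]}
        for k in range(1, n_lags + 1):
            row[f"lag_{k}"] = series[t - k]          # previous values
        recent = series[t - window + 1 : t + 1]
        row["rolling_mean"] = sum(recent) / window   # rolling average
        rows.append(row)
    return rows

values = [40, 41, 42, 43, 42]
features = lag_features(values, n_lags=2, window=3)
for row in features:
    print(row)
```

Each row can then be joined with the descriptive features (age, height, and so on) to form the model input for that time step.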
Machine learning: Features
• More data means less feature engineering
Safe aging (lots of values): XGBoost on current and previous values; let the model find the interactions
Epilepsy (millions of lines): RNN, LSTM. Network architecture = feature engineering
Training, Testing, Validation: 4 patients, 4 readings from each patient
Split 1: Awesome performance
[Figure: Training / Testing split]
AUC = 0.94
Split 2: OK performance
[Figure: Training / Testing split]
AUC = 0.70
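The split diagrams are not recoverable from the text, so which layout produced which AUC is an assumption here. A plausible reading is that one split holds out readings while every patient stays in the training set, and the other holds out a whole patient. The sketch below builds both layouts for 4 patients with 4 readings each; all names are hypothetical.

```python
from itertools import product

PATIENTS = ["p1", "p2", "p3", "p4"]
READINGS = list(product(PATIENTS, range(4)))   # 16 (patient, reading) pairs

def split_by_reading(readings, test_reading_idx):
    """Hold out one reading per patient; every patient is seen in training."""
    train = [x for x in readings if x[1] != test_reading_idx]
    test = [x for x in readings if x[1] == test_reading_idx]
    return train, test

def split_by_patient(readings, test_patient):
    """Hold out all readings of one patient; the test patient is never seen."""
    train = [x for x in readings if x[0] != test_patient]
    test = [x for x in readings if x[0] == test_patient]
    return train, test

def patients_in(split):
    return {patient for patient, _ in split}

train_r, test_r = split_by_reading(READINGS, test_reading_idx=3)
train_p, test_p = split_by_patient(READINGS, test_patient="p4")

print(patients_in(train_r) & patients_in(test_r))   # patients shared by both sets
print(patients_in(train_p) & patients_in(test_p))   # empty set
```

A model scored on the first layout can look strong simply because it has already seen each test patient; the second layout is the one that measures generalization to new patients.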
Worries: About our spike detection model
Poor generalization to new patients
What about new devices?
What about different doctors creating annotations?
Solution: more patients, more doctors, more devices
Worries: About our position detection model
Average generalization to new patients
What about new devices?
What about different homes / rooms?
Data solution: more patients, more houses, more devices
Practical solution: warm-start the model for each house and person (expensive)
Deploy
Deploy: Theory
• Model deployment
Epilepsy diagnosis: batch scoring on all records after X days
Epilepsy spike detection: batch scoring (used for diagnosis)
Epilepsy seizure detection: real-time scoring
Safe aging: real-time scoring every second
• Maintain your feature flow!
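The difference between batch and real-time scoring, and the point about maintaining the feature flow, can be sketched as follows. The scoring function is a hypothetical stand-in for a trained model (mean absolute amplitude), and the class and function names are assumptions; the key idea is that both modes call the same function on the same kind of window.

```python
from collections import deque

def score_window(window):
    """Hypothetical stand-in for a trained model: mean absolute amplitude."""
    return sum(abs(v) for v in window) / len(window)

def batch_score(recording, window_size):
    """Batch mode: score every non-overlapping window of a stored recording."""
    return [
        score_window(recording[i:i + window_size])
        for i in range(0, len(recording) - window_size + 1, window_size)
    ]

class StreamScorer:
    """Real-time mode: keep a rolling buffer and score as each sample arrives."""

    def __init__(self, window_size):
        self.buffer = deque(maxlen=window_size)

    def push(self, value):
        self.buffer.append(value)
        if len(self.buffer) < self.buffer.maxlen:
            return None                      # not enough history yet
        return score_window(self.buffer)     # same feature/score code as batch

recording = [1, -1, 2, -2, 3, -3]
batch_scores = batch_score(recording, window_size=2)

scorer = StreamScorer(window_size=2)
stream_scores = [scorer.push(v) for v in recording]
print(batch_scores)
print(stream_scores)
```

Sharing `score_window` between the two paths is one way to keep features computed in production identical to those used during training.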
Deploy: Practice
• Don't underestimate real-life conditions
• Anomalies
• Headset in wrong position
• Bracelet on wrong hand
• Hardware / sensor failures
• Challenge: go beyond clinical experiments
Summary
• IoT devices can improve early detection (epilepsy, falls, …)
• IoT devices produce lots of data – use databases made for IoT
• Standard workflow – acquire, visualize, prepare, model – can be replicated for IoT devices using open source software
• Differences between patients remain a challenge for prediction algorithms
IoT devices for medical applications
@prrgutierrez