Applying Machine Learning to Live Patient Data
Carol McDonald (@caroljmcdonald) & Joseph Blue (@joebluems)
March 15, 2017
© 2017 MapR Technologies. MapR Confidential.
Data-Driven Experience
The Promise of Big Data in Healthcare
SMARTER BIGGER FASTER
"Life moves pretty fast. If you don't stop and look around once in a while, you could miss it." (Ferris Bueller, fictional high school student)
Reading an EKG
[Figure: EKG waveform with the P, Q, R, S, and T waves labeled: atrial depolarization (P wave), ventricular depolarization (QRS complex), and ventricular repolarization (T wave).]
Windowing the EKG for Clustering
window length = 32, step size = 2
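The windowing step can be sketched in plain Scala. This is a minimal illustration, not the project's code: it assumes the signal is an `Array[Double]` of samples, that window length and step are counted in samples, and that an incomplete window at the end of the signal is dropped. `Windowing.slidingWindows` is a hypothetical helper.

```scala
object Windowing {
  /** Cut a signal into overlapping windows.
    * windowLength and step are counted in samples (assumed units);
    * an incomplete window at the end of the signal is discarded.
    */
  def slidingWindows(signal: Array[Double], windowLength: Int, step: Int): Seq[Array[Double]] = {
    require(windowLength > 0 && step > 0, "windowLength and step must be positive")
    // start offsets 0, step, 2 * step, ... as long as a full window still fits
    (0 to signal.length - windowLength by step).map { start =>
      signal.slice(start, start + windowLength)
    }
  }
}

// Example: windows of 32 samples taken every 2 samples
// val windows = Windowing.slidingWindows(ekgSamples, 32, 2)
```

With a step of 2, each sample away from the edges appears in 16 different windows, so the clustering sees every waveform shape at many small shifts.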
Displaying Centroids
Showing 25 of K=400 centroids
Begin reconstruction
Reconstructing the Signal
[Figure: overlapping windows 1 and 2 are added together (+) to rebuild the signal.]
window length = 32, step size = 16
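A sketch of overlap-add reconstruction under stated assumptions. The slide does not say how windows are weighted, so this assumes a sine-squared taper, whose weights sum to exactly 1 when windows of length 32 overlap with a step of 16 (sin² + cos² = 1), and it assumes the centroids in the shape catalog were learned from tapered windows. `Reconstruction`, `taper`, `nearest`, and `reconstruct` are hypothetical names.

```scala
object Reconstruction {
  /** Sine-squared taper of length n; with step n / 2, overlapping tapers sum to 1. */
  def taper(n: Int): Array[Double] =
    Array.tabulate(n)(i => math.pow(math.sin(math.Pi * i / n), 2))

  /** Squared Euclidean distance between two equal-length vectors. */
  private def dist2(a: Array[Double], b: Array[Double]): Double =
    a.indices.map(i => (a(i) - b(i)) * (a(i) - b(i))).sum

  /** The centroid from the shape catalog closest to a tapered window. */
  def nearest(window: Array[Double], centroids: Seq[Array[Double]]): Array[Double] =
    centroids.minBy(c => dist2(window, c))

  /** Replace each tapered window with its nearest centroid and add the centroids
    * back at their original offsets (overlap-add). Assumes step == windowLength / 2.
    */
  def reconstruct(signal: Array[Double], centroids: Seq[Array[Double]],
                  windowLength: Int = 32, step: Int = 16): Array[Double] = {
    val w = taper(windowLength)
    val out = new Array[Double](signal.length)
    for (start <- 0 to signal.length - windowLength by step) {
      val tapered = Array.tabulate(windowLength)(i => signal(start + i) * w(i))
      val shape = nearest(tapered, centroids)
      for (i <- 0 until windowLength) out(start + i) += shape(i)
    }
    out
  }
}
```

Because the weights of overlapping windows sum to 1, any mismatch between the rebuilt and original signal comes from the centroid substitution rather than from the windowing; only the first and last half-window are partially covered.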
Diagnosing the Anomalies
[Figure: residuals between the original signal and its reconstruction]
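The residuals are the sample-by-sample differences between the original signal and its reconstruction. A sketch, assuming both signals are aligned arrays of equal length; scoring each window by the root-mean-square of its residuals is an assumption made for illustration, since the slide does not say how the residuals are summarized. `Residuals` and its methods are hypothetical.

```scala
object Residuals {
  /** Sample-by-sample difference between the original and reconstructed signal. */
  def residuals(original: Array[Double], reconstructed: Array[Double]): Array[Double] = {
    require(original.length == reconstructed.length, "signals must have equal length")
    original.zip(reconstructed).map { case (x, r) => x - r }
  }

  /** Root-mean-square of the residuals in each window: one error score per window. */
  def windowErrors(res: Array[Double], windowLength: Int, step: Int): Seq[Double] =
    (0 to res.length - windowLength by step).map { start =>
      val w = res.slice(start, start + windowLength)
      math.sqrt(w.map(e => e * e).sum / windowLength)
    }
}
```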
Putting it all together…
[Figure: pipeline from input through the encoder (using the shape catalog) to the reconstruction; the reconstruction error feeds a t-digest quantile estimator.]
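In this pipeline the t-digest estimates quantiles of the reconstruction error so that an alarm threshold can be set from the data. The sketch below swaps in an exact, sort-based quantile (nearest-rank method) to stay dependency-free; a t-digest approximates the same quantity in bounded memory over a stream. Using a high quantile of reference errors as the threshold, and the names `QuantileAlarm`, `quantile`, and `anomalies`, are assumptions for illustration.

```scala
object QuantileAlarm {
  /** Exact empirical q-quantile (nearest-rank method) of a non-empty sample.
    * Stand-in for a t-digest, which estimates this in bounded memory.
    */
  def quantile(values: Seq[Double], q: Double): Double = {
    require(values.nonEmpty && q >= 0.0 && q <= 1.0, "need data and 0 <= q <= 1")
    val sorted = values.sorted
    val rank = math.ceil(q * sorted.length).toInt // 1-based nearest rank
    sorted(math.max(rank, 1) - 1)
  }

  /** Indices of errors that exceed the q-quantile of the reference errors. */
  def anomalies(reference: Seq[Double], errors: Seq[Double], q: Double): Seq[Int] = {
    val threshold = quantile(reference, q)
    errors.indices.filter(i => errors(i) > threshold)
  }
}

// Example (hypothetical values): flag windows whose error exceeds the 99.9th percentile
// val flagged = QuantileAlarm.anomalies(trainingErrors, liveErrors, 0.999)
```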
Use Case Architecture
Lots of Things Are Producing Streaming Data
Data collection devices: smart machinery, phones and tablets, home automation, RFID systems, digital signage, security systems, and medical devices.
[Figure: producers write events through the Kafka API to the topic "Admission" on a MapR cluster, whose partitions 1, 2, and 3 reside on servers 1, 2, and 3; consumers read each partition from the oldest message to the newest.]
Streams capture unbounded sequences of events.
Within each partition, events are delivered in the order they are received, like a queue.
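In Kafka-style streams, delivery order is guaranteed within a partition. A toy model in plain Scala, not MapR Streams or Kafka client code: each event is routed to a partition by hashing its key, and each partition is an append-only sequence read oldest first. Routing with `key.hashCode` is a simplifying assumption (Kafka's default partitioner hashes the key bytes with murmur2), and `ToyTopic` is a hypothetical name.

```scala
import scala.collection.mutable.ArrayBuffer

/** Toy topic: a fixed number of append-only partitions. */
final class ToyTopic(numPartitions: Int) {
  private val partitions = Vector.fill(numPartitions)(ArrayBuffer.empty[String])

  /** Same key, same partition: this is what preserves per-key ordering. */
  def partitionFor(key: String): Int = Math.floorMod(key.hashCode, numPartitions)

  /** Append an event; returns (partition, offset) of the new message. */
  def send(key: String, value: String): (Int, Int) = {
    val p = partitionFor(key)
    partitions(p) += value
    (p, partitions(p).length - 1)
  }

  /** Messages of one partition, oldest first. */
  def read(partition: Int): Seq[String] = partitions(partition).toList
}
```

Ordering across partitions is not defined, which is why events that must stay in sequence (for example, all readings from one patient) share a key.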
Stream Topics Organize Events into Categories
[Figure: producers and consumers connect through the Kafka API to stream topics stored in MapR-FS.]
Unlike a queue, messages are not deleted when they are read, which allows the same events to be processed for different views.
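A toy model of this retention behavior in plain Scala, not MapR Streams or Kafka code: messages stay in the log after they are read, and each consumer group tracks its own offset, so two groups can process the same events independently. `RetainedLog` and its methods are hypothetical.

```scala
import scala.collection.mutable

/** Toy log: messages are retained after reading; each consumer group keeps its own offset. */
final class RetainedLog {
  private val messages = mutable.ArrayBuffer.empty[String]
  private val offsets = mutable.Map.empty[String, Int].withDefaultValue(0)

  def append(msg: String): Unit = messages += msg

  /** Return the messages this group has not seen yet and advance only this group's offset. */
  def poll(group: String): Seq[String] = {
    val batch = messages.drop(offsets(group)).toList
    offsets(group) = messages.length
    batch
  }
}
```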
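Predictive Analytics
[Figure: model building: historical data is featurized and used with machine learning algorithms to build a model, and the model's test predictions are used for model evaluation. Model scoring: new data from a stream topic is featurized and scored by the predictive model to produce predictions.]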
Stream Processing Architecture
[Figure: data sources (devices, images, HL7, social media, lab) are collected into a stream topic; stream processing performs feature extraction and applies the machine-learning models; batch processing derives features to build and update the model; results are served through a second stream topic.]
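Build Model
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// put each tab-separated line of samples into a dense vector
val vrdd = rdd.map(line => Vectors.dense(line.split('\t').map(_.toDouble)))
// window and normalize each record ....
// call KMeans.train (k = 300 clusters, at most 10 iterations), which returns the model
val model = KMeans.train(processed, 300, 10)
model.save(sc, "/user/user01/data/anomaly-detection-master")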
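To show what the training call computes, here is a small single-machine version of Lloyd's algorithm (assign each point to its nearest centroid, then move each centroid to the mean of its points). It is an illustration, not Spark's implementation: MLlib's `KMeans` distributes the work and uses k-means|| initialization by default, whereas this sketch simply seeds with the first k points. `LocalKMeans` is a hypothetical name.

```scala
object LocalKMeans {
  type Point = Array[Double]

  private def dist2(a: Point, b: Point): Double =
    a.indices.map(i => (a(i) - b(i)) * (a(i) - b(i))).sum

  /** Index of the closest centroid (the cluster id). */
  def predict(centroids: Seq[Point], p: Point): Int =
    centroids.indices.minBy(j => dist2(centroids(j), p))

  /** Lloyd's algorithm: seed with the first k points, then alternate
    * assignment and mean update for maxIterations rounds.
    */
  def train(data: Seq[Point], k: Int, maxIterations: Int): Seq[Point] = {
    require(k > 0 && data.length >= k, "need at least k points")
    var centroids: Seq[Point] = data.take(k).map(_.clone())
    for (_ <- 1 to maxIterations) {
      val clusters = data.groupBy(p => predict(centroids, p))
      centroids = centroids.indices.map { j =>
        clusters.get(j) match {
          case Some(members) =>
            val dim = members.head.length
            Array.tabulate(dim)(d => members.map(_(d)).sum / members.length)
          case None => centroids(j) // an empty cluster keeps its centroid
        }
      }
    }
    centroids
  }
}
```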
Use the Model with Streaming Data
Use Case: Real-Time Anomaly Detection
[Figure: Spark Streaming reads EKG data from a stream topic, enriches each record with its cluster and the normalized data, and writes the result to a second stream topic for real-time monitoring.]
Input record: 17.9200 12.8000 38.4000
Enriched message: {"c":120,"colA":[17.92, 12.88, ..],"colB":[17.91, 12.89, 0...]}
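The enriched message is JSON with a cluster id `c` and two arrays. The slide does not define `colA` and `colB`, so this sketch assumes `colA` holds the input values and `colB` the values of the assigned centroid. It parses a whitespace-separated input record and builds the JSON string by hand to stay dependency-free (non-finite doubles would produce invalid JSON). `EnrichedMessage` and its methods are hypothetical, not the project's code.

```scala
object EnrichedMessage {
  /** Parse a whitespace- or tab-separated record such as "17.9200 12.8000 38.4000". */
  def parse(line: String): Array[Double] = line.trim.split("\\s+").map(_.toDouble)

  /** Format doubles as a JSON array, e.g. [17.92,12.88]. */
  private def jsonArray(xs: Seq[Double]): String = xs.mkString("[", ",", "]")

  /** Build {"c":<cluster>,"colA":[...],"colB":[...]}.
    * Assumption: colA = input values, colB = assigned centroid values.
    */
  def toJson(cluster: Int, input: Seq[Double], centroid: Seq[Double]): String =
    s"""{"c":$cluster,"colA":${jsonArray(input)},"colB":${jsonArray(centroid)}}"""
}
```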
Create a DStream
DStream: a sequence of RDDs representing a stream of data
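import org.apache.spark.mllib.clustering.KMeansModel
// Apache Spark's Kafka 0.10 integration; a MapR Streams build may ship these classes in a different package
import org.apache.spark.streaming.kafka010.{KafkaUtils, LocationStrategies}

// load the K-means model saved in the model-building step
val model = KMeansModel.load(ssc.sparkContext, modelpath)
// read (key, value) records from the input topic as a direct stream
val messagesDStream = KafkaUtils.createDirectStream[String, String](
  ssc, LocationStrategies.PreferConsistent, consumerStrategy)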
[Figure: records from a stream topic are read into the DStream as one RDD per batch interval (0 to 1, 1 to 2, 2 to 3); each batch is stored in memory as an RDD.]
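A single-machine analogy of the DStream model, not Spark code: grouping timestamped events into fixed batch intervals (0 to 1, 1 to 2, 2 to 3) mirrors how a DStream turns a stream into one RDD per batch, and applying the same function to every batch mirrors `DStream.map`. `MicroBatches`, `Event`, `toBatches`, and `mapBatches` are hypothetical names.

```scala
object MicroBatches {
  final case class Event(timeSec: Double, value: String)

  /** Group events into batches of batchSec seconds; batch n covers [n * batchSec, (n + 1) * batchSec). */
  def toBatches(events: Seq[Event], batchSec: Double): Seq[(Int, Seq[String])] =
    events.groupBy(e => math.floor(e.timeSec / batchSec).toInt)
      .toSeq
      .sortBy(_._1)
      .map { case (n, es) => n -> es.sortBy(_.timeSec).map(_.value) }

  /** Apply the same transformation to every batch, as DStream.map does for each RDD. */
  def mapBatches[A, B](batches: Seq[(Int, Seq[A])])(f: A => B): Seq[(Int, Seq[B])] =
    batches.map { case (n, xs) => n -> xs.map(f) }
}
```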
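Process DStream
import org.apache.kafka.clients.producer.ProducerRecord
import org.apache.spark.streaming.dstream.DStream

// get the message values from the (key, value) records
val valuesDStream: DStream[String] = messagesDStream.map(_.value())
valuesDStream.foreachRDD { rdd =>
  // KafkaProducerFactory is not part of the Kafka API; it is a helper in the application code
  val producer = KafkaProducerFactory.getOrCreateProducer(conf)
  ....
  // enrich the message with the model's cluster prediction
  val cluster = model.predict(processed)
  ....
  val record = new ProducerRecord(topicp, "key", message)
  // send the enriched message
  producer.send(record)
}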
Process DStream
[Figure: for each batch interval (0 to 1, 1 to 2, 2 to 3), map transforms the DStream RDD read from the stream topic into the corresponding RDD of valuesDStream.]
Use Case: Real-Time Anomaly Detection
[Figure: Spark Streaming enriches each record with its cluster and the normalized data and writes it to a stream topic; a Vert.x HTTP server reads the topic and pushes each message over the Vert.x event bus, using the WebSocket Event Bus framework, to a real-time monitoring dashboard.]
{"c":120,"colA":[17.92, 12.88, ..],"colB":[17.91, 12.89, 0...]}
Resources
• EKG basics: http://en.wikipedia.org/wiki/Electrocardiography
• Source data: http://physionet.org/physiobank/database/apnea-ecg/
• K-means basics: http://www.coursera.org/learn/machine-learning/lecture/93VPG/k-means-algorithm
• Code repositories
  – Streaming: http://github.com/caroljmcdonald/sparkml-streaming-ekg
  – UI: http://github.com/caroljmcdonald/mapr-streams-vertx-dashboard
• t-digest for anomalies: http://github.com/tdunning/t-digest
A New Look at Anomaly Detection by Ted Dunning and Ellen Friedman (published by O’Reilly)
E-book available courtesy of MapR: https://www.mapr.com/practical-machine-learning-new-look-anomaly-detection
MapR Blog mapr.com/blog
Q & A
@mapr
Engage with us!
mapr-technologies
Carol McDonald (@caroljmcdonald) & Joseph Blue (@joebluems)