LetsHang
Ali S. AlShehab
Insight Data Engineering, 2015
MOTIVATION & DEMO
“Wish I knew you were there”
A tool that helps friends hang out
Real-time location map
Batch Analysis: When are my friends available?
PIPELINE
User Data
Message Broker
Data Store
UI / API
Camus / HDFS
Batch Processing
Stream Processing
DATA FLOW
Generated Data:
Cassandra Table (Batch):
Cassandra Table (Streaming):
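The slide does not preserve the generated record layout, so the following is only a minimal sketch of what one generated location update might look like before it is handed to the message broker. The field names (`user_id`, `timestamp`, `lat`, `lon`), the JSON encoding, and the helper names are all assumptions made for illustration, not the author's schema.

```python
import json
import random
import time


def make_location_event(user_id, rng, now=None):
    """Build one hypothetical location update (field names are assumed)."""
    return {
        "user_id": user_id,
        "timestamp": int(now if now is not None else time.time()),
        "lat": round(rng.uniform(-90.0, 90.0), 6),
        "lon": round(rng.uniform(-180.0, 180.0), 6),
    }


def encode_event(event):
    """Serialize an event to UTF-8 JSON bytes, a common broker payload format."""
    return json.dumps(event, sort_keys=True).encode("utf-8")


# Seeded generator so the sample is reproducible.
rng = random.Random(42)
sample_event = make_location_event(user_id=1, rng=rng, now=1_440_000_000)
sample_payload = encode_event(sample_event)
print(sample_payload)
```

Keeping the payload as flat JSON would let both the batch and the streaming side parse the same record, but the real project may have used a different format.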
METRICS
Kafka Manager [Throughput]:
Rate             Mean      1 Min
Bytes in /sec    1.25 m    1.3 m
Bytes out /sec   3.75 m    3.9 m
[4.68 GB/Hr]

Storm [Real-time Latency]:
Process ID    Latency (ms)
Kafka Spout   18.476
_Acker        0.007
My Bolt       6.863
Total         25.339
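As a quick check on the metrics, the sketch below recomputes the hourly volume and the latency total from the reported figures. It assumes that "m" in the Kafka Manager readout means 10^6 bytes and that 1 GB is 1000 MB; under those assumptions the 1-minute inbound rate of 1.3 m per second works out to the bracketed 4.68 GB/Hr. The reported latency total of 25.339 ms appears to equal the Kafka Spout plus My Bolt figures; adding the _Acker latency as well would give 25.346 ms.

```python
# Reported 1-minute inbound rate; "m" is assumed to mean 10^6 bytes.
BYTES_IN_MB_PER_SEC = 1.3
SECONDS_PER_HOUR = 3600
MB_PER_GB = 1000  # decimal units, an assumption

gb_per_hour = BYTES_IN_MB_PER_SEC * SECONDS_PER_HOUR / MB_PER_GB

# Per-component latencies in milliseconds, as reported on the slide.
latency_ms = {"Kafka Spout": 18.476, "_Acker": 0.007, "My Bolt": 6.863}

spout_plus_bolt = latency_ms["Kafka Spout"] + latency_ms["My Bolt"]
all_components = sum(latency_ms.values())

print(f"Inbound volume: {gb_per_hour:.2f} GB/hr")
print(f"Spout + bolt latency: {spout_plus_bolt:.3f} ms")
print(f"All components: {all_components:.3f} ms")
```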
CLUSTER SETUP
HDFS Name Node, Kafka Broker, Spark Master, Storm Nimbus, Data Consumer
HDFS Data Node 1, Kafka Broker, Data Producer, Spark Worker, Storm Supervisor
HDFS Data Node 2, Kafka Broker, Spark Worker, Storm Supervisor
HDFS Data Node 3, Kafka Broker, Spark Worker, Storm Supervisor
Cassandra Seed
Cassandra
Cassandra
m4.large
m3.medium
CHALLENGES
• Stitching technologies together:
  • Pyleus Framework [Kafka – Storm]
  • Cassandra Driver
  • PySpark TargetHolding Package
• Memory monitoring and allocation
• Front-End rendering optimization
ABOUT ME
B.Sc. in EECS – MIT
M.Eng. in EECS – MIT