Insight Recent Demo


  1. Crowd Detector. Reza Asad, Insight Data Engineering, June 2015
  2. Motivation: Avoid waiting time in crowded areas.
  3. Data: Let's imagine we had data about people's locations. This could be collected from people's cell phones. How can we use such data?
  4. Naive Approach
  5. Demo
  6. Data: But such data is not available to me... Solution: engineer the data! Take data from Yelp and perform a random walk.
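The "perform a random walk" step could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code used in the talk: the function name `random_walk`, the seed coordinates, and the step size are all hypothetical. Each simulated person starts at a seed location (for example, one taken from the Yelp data) and moves by a small uniform random offset at every step.

```python
import random


def random_walk(start, n_steps, step_size=0.0005, seed=None):
    """Simulate one person's path as a 2-D random walk.

    start     -- (latitude, longitude) seed point (hypothetical input)
    n_steps   -- number of moves to simulate
    step_size -- maximum change per coordinate per step, in degrees
                 (an assumed value, not taken from the slides)
    seed      -- optional seed so the walk is reproducible
    """
    rng = random.Random(seed)
    lat, lon = start
    path = [(lat, lon)]
    for _ in range(n_steps):
        # Move each coordinate by an independent uniform offset.
        lat += rng.uniform(-step_size, step_size)
        lon += rng.uniform(-step_size, step_size)
        path.append((lat, lon))
    return path


# Hypothetical seed point in San Francisco.
path = random_walk((37.7749, -122.4194), n_steps=10, seed=42)
```

Each point in `path` could then be emitted as one location message for the rest of the pipeline.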
  7. Pipeline Data
  8. Engineering Challenges: Choosing k?
  9. Engineering Challenges: The area of SF is 46.87 mi². For the purpose of this project, each cluster covers 0.09 mi². This means k is roughly 500 (46.87 / 0.09 ≈ 521).
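The estimate of k can be reproduced with a few lines of arithmetic. The sketch below assumes both figures on the slide are areas in square miles and that clusters tile the city without overlap; the variable names are illustrative.

```python
import math

SF_AREA_SQ_MI = 46.87       # area of San Francisco, from the slide
CLUSTER_AREA_SQ_MI = 0.09   # area covered by one cluster, from the slide

# Number of clusters needed if equal-sized clusters tile the city
# without overlap (an assumption made for this estimate).
k_exact = SF_AREA_SQ_MI / CLUSTER_AREA_SQ_MI
k = math.ceil(k_exact)
```

The result is a little over 520, which the slide rounds to "roughly 500".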
  10. Engineering Challenges: Parameters to tune: (1) the time it takes to produce the messages; (2) the processing time for k-means in Spark Streaming; (3) the update interval for a fixed data point in the database.
  11. Goal: Tune the parameters in order to have a stable system. The total delay after processing each batch must be constant and comparable to the batch interval. You can check this in the Spark API.
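One way to turn the stability goal into a check is sketched below. It is a hypothetical helper, not part of Spark: `is_stable`, `slack`, and `growth_tolerance` are names and thresholds assumed for illustration. It takes a list of per-batch total delays (however they were collected) and tests the two conditions from the slide: the delay stays comparable to the batch interval, and it does not keep growing.

```python
def is_stable(total_delays, batch_interval, slack=2.0, growth_tolerance=0.25):
    """Heuristic stability check for a streaming job.

    total_delays     -- per-batch total delay in seconds, oldest first
    batch_interval   -- batch interval in seconds
    slack            -- how many batch intervals of delay still count as
                        "comparable" (assumed threshold)
    growth_tolerance -- allowed increase in mean delay between the first and
                        second half of the window, as a fraction of the
                        batch interval (assumed threshold)
    """
    if len(total_delays) < 2:
        raise ValueError("need at least two batches to judge stability")
    half = len(total_delays) // 2
    early = sum(total_delays[:half]) / half
    late = sum(total_delays[half:]) / (len(total_delays) - half)
    comparable = late <= slack * batch_interval
    not_growing = (late - early) <= growth_tolerance * batch_interval
    return comparable and not_growing


steady = is_stable([2.1, 2.0, 2.2, 2.1], batch_interval=2.0)
falling_behind = is_stable([2.0, 4.0, 8.0, 16.0], batch_interval=2.0)
```

Here `steady` is `True` because the delay hovers around the batch interval, while `falling_behind` is `False` because each batch waits longer than the last.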
  12. Tackling Challenges: Having multiple producers and consumers. Kafka is fast at sending messages and is not the bottleneck. Establishing some safe limits: using spark.streaming.receiver.maxRate to control the input rate. Understanding the complexity of the process in Spark Streaming. Choosing the right batch interval.
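Since spark.streaming.receiver.maxRate caps the number of records per second that each receiver accepts, it also bounds how many records a single batch can contain. The sketch below works out that bound; the helper name `max_records_per_batch` and the example numbers are assumptions for illustration, not values from the talk.

```python
def max_records_per_batch(max_rate, batch_interval_s, num_receivers=1):
    """Upper bound on records in one batch.

    max_rate         -- value of spark.streaming.receiver.maxRate
                        (records per second, per receiver)
    batch_interval_s -- batch interval in seconds
    num_receivers    -- number of receivers (e.g. one per consumer)
    """
    return max_rate * batch_interval_s * num_receivers


# Illustrative settings only: 1000 records/s per receiver,
# a 2-second batch interval, and 3 receivers.
spark_conf = {"spark.streaming.receiver.maxRate": "1000"}
bound = max_records_per_batch(
    int(spark_conf["spark.streaming.receiver.maxRate"]),
    batch_interval_s=2,
    num_receivers=3,
)
```

If k-means cannot process that many points within one batch interval, either the rate limit or the batch interval needs to change.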
  13. Raw Data
  14. Data Process: Data filtering in Spark Streaming
  15. Data Process
  16. About Me: Long time ago - B.S. in pure math, University of Toronto. More recent - M.S. in applied math, University of British Columbia. The exciting now - a data engineer who wants to go camping with other data engineers.