
Strata lightning talk



DESCRIPTION

This is a lightning talk for Metamarkets' real-time rave party. It's a teaser talk to start the conversation.

Text of Strata lightning talk

  • 1. Data Insights in Netflix. Danny Yuan (@g9yuayon), Jae Bae. Friday, March 1, 13
  • 2-5. Who Am I? Member of Netflix's Platform Engineering team, working on very large-scale data infrastructure (@g9yuayon). Built and operated Netflix's cloud crypto service. Worked with Jae Bae on querying multi-dimensional data in real time.
  • 7. No Monitoring Metrics Today. Developers usually think about monitoring metrics when real-time data is mentioned. We have powerful monitoring systems that track millions of metrics per second, but I'm not going to talk about that today. Monitoring metrics are crucial data; that topic alone would warrant another multi-hour talk by our monitoring team. :-)
  • 8. Instead, I'm going to talk about logs. Why is that interesting at all? (photo credit: http://www.flickr.com/photos/decade_null/142235888/sizes/o/in/photostream/)
  • 9. 1,500,000. During peak hours, our data pipeline collects over 1.5 million log events per second.
  • 10. 70,000,000,000. Or 70 billion a day on average.
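As a quick sanity check on how the two headline numbers relate (a 1.5 million events/second peak vs. 70 billion events/day on average):

```python
# Relate the deck's two numbers: 70 billion events/day vs. a 1.5M/s peak.
events_per_day = 70_000_000_000
seconds_per_day = 24 * 60 * 60  # 86,400

average_rate = events_per_day / seconds_per_day
print(f"average: {average_rate:,.0f} events/sec")  # ~810,185 events/sec

peak_rate = 1_500_000
print(f"peak/average: {peak_rate / average_rate:.2f}x")  # ~1.85x
```

So the quoted peak is a little under twice the all-day average rate, which is consistent for traffic with strong daily peaks.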
  • 12. Highly Reliable Data Pipeline. Diagram: Server Farms send log events through Kafka to Log Collectors, each running a Log Filter and Sink Plugins that feed Hadoop, Druid, and ElasticSearch. We have tens of thousands of machines, all of which send log data over a robust data pipeline to highly reliable data collectors. The collectors then filter the data, transform it, and dispatch it to different destinations for further processing. (photo credit: http://www.flickr.com/photos/decade_null/142235888/sizes/m/in/photostream/)
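The filter / transform / dispatch flow the notes describe can be sketched as follows. This is only an illustration of the idea, not Netflix's actual implementation (which is a JVM-based pipeline); every class and method name here is hypothetical.

```python
# A minimal sketch of the collector flow described above: each collector
# filters incoming log events, transforms them, and fans them out to one
# or more sinks (e.g. Hadoop, Druid, ElasticSearch). All names are
# hypothetical, for illustration only.

class SinkPlugin:
    """A destination for log events, standing in for Hadoop/Druid/ElasticSearch."""
    def __init__(self, name):
        self.name = name
        self.received = []

    def write(self, event):
        self.received.append(event)


class LogCollector:
    def __init__(self, log_filter, transform, sinks):
        self.log_filter = log_filter  # predicate: should we keep this event?
        self.transform = transform    # e.g. parse or enrich the event
        self.sinks = sinks            # downstream sink plugins

    def collect(self, event):
        if not self.log_filter(event):
            return
        event = self.transform(event)
        for sink in self.sinks:  # dispatch to every destination
            sink.write(event)


# Usage: drop debug noise, tag each event, fan out to two sinks.
hadoop = SinkPlugin("hadoop")
druid = SinkPlugin("druid")
collector = LogCollector(
    log_filter=lambda e: e["level"] != "DEBUG",
    transform=lambda e: {**e, "pipeline": "collector-1"},
    sinks=[hadoop, druid],
)
collector.collect({"level": "INFO", "msg": "user played a title"})
collector.collect({"level": "DEBUG", "msg": "cache miss"})
print(len(hadoop.received), len(druid.received))  # 1 1
```

The point of the shape is that filtering and transformation happen once at the collector, while the sink plugins keep each destination's delivery details isolated.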
  • 13. A Humble Beginning. We didn't build everything in one night. Actually, we had a humble start. I did a lot of log scraping like these. I also used R to analyze logs. But these are specific tasks, and at some point...
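The slides showed the scraping itself; since that content didn't survive the transcript, here is a hypothetical example of the kind of ad-hoc log scraping the notes are referring to: pull matching lines out of a log and tally them by hour. The log format is made up for illustration.

```python
# Hypothetical ad-hoc log scraping: count ERROR lines per hour bucket.
# The timestamp/level format here is invented for the example.
import re
from collections import Counter

LINE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}):\d{2}:\d{2} ERROR (.+)$")

def scrape_errors(lines):
    """Return a Counter of ERROR lines per hour bucket."""
    per_hour = Counter()
    for line in lines:
        m = LINE.match(line)
        if m:
            per_hour[m.group(1)] += 1
    return per_hour

sample = [
    "2013-02-28 10:15:02 ERROR timeout talking to service A",
    "2013-02-28 10:47:55 ERROR timeout talking to service A",
    "2013-02-28 11:03:10 INFO request served",
    "2013-02-28 11:30:41 ERROR bad request",
]
print(scrape_errors(sample))  # Counter({'2013-02-28 10': 2, '2013-02-28 11': 1})
```

Scripts like this work fine for one question about one service's logs, which is exactly why they stopped scaling once the number of applications exploded.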
  • 19. Diagram: ten "Application" boxes. Something happened. Our traffic turned into a hockey stick, and the number of applications exploded. So log traffic also exploded. Simple log scraping wouldn't cut it any more.
  • 20. So We Evolved. So we evolved. One thing we built was a Hadoop grep. This tool searches TBs of data. It is much more useful than the one provided by the Apache Hadoop distribution, because it supports many more grep options.
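The actual tool is a Hadoop job, and the talk doesn't show its code; the sketch below only illustrates the underlying idea of a distributed grep: a pure per-shard function that each worker runs over its own slice of the logs (the mapper, in MapReduce terms), with a few grep-style options (-i, -v, -c) layered on top. The function name and options chosen are assumptions for the example.

```python
# Sketch of the idea behind a distributed grep: each worker greps its own
# shard of the logs independently, so the work parallelizes across TBs.
import re

def grep_shard(lines, pattern, ignore_case=False, invert=False, count=False):
    """Grep one shard of log lines; in a Hadoop job this would be the mapper."""
    flags = re.IGNORECASE if ignore_case else 0
    rx = re.compile(pattern, flags)
    # invert flips the match, like grep -v
    matched = [l for l in lines if bool(rx.search(l)) != invert]
    return len(matched) if count else matched

shard = ["GET /play 200", "get /browse 500", "POST /login 200"]
print(grep_shard(shard, "get", ignore_case=True))  # ['GET /play 200', 'get /browse 500']
print(grep_shard(shard, "200", invert=True))       # ['get /browse 500']
print(grep_shard(shard, "200", count=True))        # 2
```

Because each shard is processed independently, the per-shard results (or counts) only need a cheap merge step at the end, which is what makes grepping terabytes tractable.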
