09-08-2016
Big Data Analytics
Dr. Umesh R. Hodeghatta, Ph.D
http://www.mytechnospeak.com
Email: [email protected]
8/9/2016 1
Outline
• Introduction to Big Data Processing
– What is Big Data
– Big Data Challenges
– How to handle Big Data
• Big Data Applications
• Big Data Architecture
• Big Data Tools
What is Big Data
• Every day, we create billions of bytes of data
– Sensors used to gather climate information
– Social media posts
– Digital pictures and videos
– Transaction records
– Web logs
– Cell phones, WhatsApp messages
1. Structured data
2. Unstructured data
3. Semi-structured data
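Example (an illustrative sketch, not part of the original slides): the three kinds of data listed above can be contrasted with tiny samples. The records, field names, and text below are hypothetical and chosen only to show the difference in shape.

```python
import csv
import io
import json

# Structured: fixed columns, fits a relational table
# (hypothetical transaction record).
structured_csv = "txn_id,customer_id,amount\n1001,42,19.99\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Semi-structured: self-describing fields that may vary from record
# to record (hypothetical web log event).
semi_structured_json = '{"user": "alice", "event": "click", "tags": ["promo", "mobile"]}'
event = json.loads(semi_structured_json)

# Unstructured: free text with no predefined schema
# (hypothetical social media post).
unstructured_text = "Loving the rainy weather in Bangalore today!"
word_count = len(unstructured_text.split())

print(rows[0]["amount"])  # value is read by column name
print(event["tags"])      # value is read by key; nested list allowed
print(word_count)         # only generic text processing applies
```

Structured data can be queried by column, semi-structured data by key, while unstructured data needs text, image, or audio processing before it can be analysed.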
Examples of Big Data
• Facebook handles 40-50 billion photos from its site members every month.
• Walmart handles more than 2 million customer transactions every hour.
– Its databases are estimated to contain more than 2.5 petabytes of data.
• The FICO Falcon Credit Card Fraud Detection System protects 2.1 billion active accounts worldwide.
• 235 terabytes of data collected by the US Library of Congress in 2011.
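Example (a back-of-the-envelope sketch, not part of the original slides): the figures quoted above can be put on a common scale. The code assumes decimal (SI) units, 1 TB = 10^12 bytes and 1 PB = 10^15 bytes; the variable names are illustrative.

```python
# Decimal (SI) unit sizes in bytes; this is an assumption, since the
# slides do not say whether decimal or binary units are meant.
TB = 10**12
PB = 10**15

# Figures as quoted on the slide.
walmart_txn_per_hour = 2_000_000
walmart_bytes = 2.5 * PB
loc_bytes = 235 * TB

# Transactions per second implied by the hourly figure.
txn_per_second = walmart_txn_per_hour / 3600

# How many times larger the Walmart figure is than the
# Library of Congress figure.
ratio = walmart_bytes / loc_bytes

print(f"{txn_per_second:.0f} transactions per second")
print(f"{ratio:.1f}x the Library of Congress figure")
```

Under these assumptions the hourly figure works out to several hundred transactions every second, sustained around the clock.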
What do you do with DATA?
• What can you do with DATA?
• Why do you need BIG DATA?
What is BIG-DATA
• DATA is BIG in volume
• Challenges are different from SMALL data
– How to handle BIG DATA processing?
– CPU?
– Memory?
– Storage?
• New tools and techniques required
Characteristics of BIG DATA
• Structured and unstructured data
• Volume: terabytes to zettabytes
• Batch processing to streaming data
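Example (a minimal sketch, not part of the original slides): the difference between batch processing and streaming data can be shown by computing an average two ways. The sensor readings and function names are hypothetical.

```python
from typing import Iterable, Iterator, Sequence


def batch_mean(values: Sequence[float]) -> float:
    """Batch: all data is available up front and processed in one pass."""
    return sum(values) / len(values)


def streaming_mean(stream: Iterable[float]) -> Iterator[float]:
    """Streaming: update the result as each value arrives, storing only
    a count and the current mean instead of the whole data set."""
    count = 0
    mean = 0.0
    for x in stream:
        count += 1
        mean += (x - mean) / count  # incremental mean update
        yield mean


# Hypothetical temperature readings.
readings = [21.0, 23.0, 22.0, 26.0]

batch_result = batch_mean(readings)
running = list(streaming_mean(iter(readings)))

print(batch_result)  # one answer after all data is collected
print(running)       # an up-to-date answer after every reading
```

Both approaches agree once all readings have been seen; the streaming version never needs to hold the full data set in memory, which matters as volume grows.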
Characteristics of Big Data
• Volume
– The size of the data
• Variety
– Documents, messages, Facebook posts, YouTube videos, etc.
• Velocity
– The speed at which data is generated and processed
• Complexity
– Multiple sources, large volumes, less processing time
Challenges
• Capturing data
• Curation
• Storage
• Searching
• Sharing
• Transfer
• Analysis
• Presentation
Enablers of Big Data
• Increase in storage capacity
• Increase in processing power
• Abundance of data
Big Data and Analytics
• "Big Data" refers not only to the storage of data but also to the practice of analysing this data and using the results:
– To derive strategic decisions, such as introducing a new category of products, restructuring the organization, and improving customer care services by analysing customer care recordings/logs
– To learn the outcome of marketing campaigns
– To plan pricing strategy
Big Data Technologies
• Processing
– Infrastructure to process and store huge volumes of structured and unstructured data in real time
• Amazon, IBM, Microsoft, etc.
• Analytics
– Capabilities to analyze the data, build models, and perform complex analysis
• MapReduce, MongoDB, NoSQL, Cassandra
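Example (a single-machine sketch, not part of the original slides): MapReduce, named above, splits a job into a map step that emits key-value pairs, a shuffle step that groups values by key, and a reduce step that combines each group. The word count below only illustrates the idea in plain Python; it does not use Hadoop or any distributed framework, and the function names and documents are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values of each key, here by summing."""
    return {key: sum(values) for key, values in groups.items()}


# Hypothetical input; in a real cluster each document could be
# mapped on a different machine.
documents = ["big data tools", "big data analytics", "data"]

mapped = chain.from_iterable(map_phase(d) for d in documents)
counts = reduce_phase(shuffle_phase(mapped))

print(counts)
```

Because each document is mapped independently and each key is reduced independently, both steps can be spread across many machines, which is what makes the model suitable for large data volumes.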
Magic Quadrant for Business Intelligence and Analytics Platforms
Hype Cycle for Emerging Technologies, 2015
Big Data Tools
• NoSQL Database – MongoDB, Cassandra, HBase, ZooKeeper, Redis
• MapReduce – Hadoop, Hive, Pig, Kafka, Flume, MapR, Oozie, Greenplum
• Processing – R, Lucene, Solr, Elasticsearch, Google
• Storage – HDFS, S3
• Servers – Elastic Beanstalk, Google App Engine
BIG DATA Players
BIG DATA Applications
Recommendation Systems
Recommendation Engine
Online Advertisement
• Online Users
• Users Profile
• News Articles: Reuters, NYTimes, LATimes, Associated Press
• Trend System
• Trends and Segments
• Stock Market, Jobs, City Events
• Online Advertisement
Telecommunication
• Telecom Network (~25,000 devices)
• Alarms
• Provisioning
• Fault Analysis: Fault Detection, Prediction
• New Customers, New Services
Social Network Analysis
Useful Websites
• http://www.kdnuggets.com/
• http://www.kaggle.com
Assignment
• Two applications of Big Data
– From two different areas/domains
– Total size of data
– Tools and technologies used
End Of Session