Transcript
Page 1: Session 1  introduction to Big Data

09-08-2016

Big Data Analytics

Dr. Umesh R. Hodeghatta, Ph.D

http://www.mytechnospeak.com

Email: [email protected]

8/9/2016 1

Outline

• Introduction to Big Data Processing

– What is Big Data

– Big Data Challenges

– How to handle Big Data

• Big Data Applications

• Big Data Architecture

• Big Data Tools

What is Big Data

• Every day, we create billions of bytes of data

– Sensors used to gather climate information

– Social media posts

– Digital pictures and videos

– Transaction records

– Web logs

– Cell phones, WhatsApp messages

• Types of data

1. Structured data

2. Unstructured data

3. Semi-structured data
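The three data types named on this slide can be illustrated with a minimal Python sketch. The sample records below are hypothetical and are not from the session; they only show how each type is typically read.

```python
import csv
import io
import json

# Structured: fixed columns that fit a relational table (hypothetical rows).
structured_csv = "txn_id,amount\n1,19.99\n2,5.00\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Semi-structured: self-describing fields, but records may differ in shape.
semi_structured = json.loads('{"user": "a", "tags": ["x", "y"]}')

# Unstructured: free text with no predefined schema.
unstructured = "Running late, see you at 6!"
word_count = len(unstructured.split())
```

Structured data can be queried by column name straight away, semi-structured data has to be navigated field by field, and unstructured data needs some processing (here, a simple word split) before anything can be measured.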

Examples of Big Data

• Facebook handles 40-50 billion photos from its site members every month

• Walmart handles more than 2 million customer transactions every hour

– Its databases are estimated to contain more than 2.5 petabytes of data

• FICO Falcon Credit Card Fraud Detection System protects 2.1 billion active accounts worldwide

• 235 terabytes of data collected by the US Library of Congress in 2011

What do you do with DATA?

• What can you do with DATA?

• Why do you need BIG DATA?

What is BIG-DATA

• DATA is BIG in volume

• Challenges are different from SMALL data

– How to handle BIG DATA processing?

– CPU?

– Memory?

– Storage?

• New tools and techniques required

Characteristics of BIG DATA (diagram)

• Structured and unstructured data

• Volume: terabytes to zettabytes

• Batch processing and streaming data

Characteristics of Big Data

• Volume

– The size of the data

• Variety

– Documents, messages, Facebook posts, YouTube videos, etc.

• Velocity

– Speed at which data is generated and processed

• Complexity

– Multiple sources, large volumes, less processing time
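To give a feel for the Volume characteristic, the sketch below converts between storage units. It assumes decimal (SI) prefixes, where each step up is a factor of 1000; the helper name `to_bytes` is illustrative and not part of the session material.

```python
# Decimal (SI) prefixes are assumed: each step up is a factor of 1000.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]


def to_bytes(value, unit):
    """Convert a value in the given unit to bytes."""
    return value * 1000 ** UNITS.index(unit)


# How many terabytes fit in one zettabyte?
tb_per_zb = to_bytes(1, "ZB") // to_bytes(1, "TB")

# The 235 terabytes quoted for the US Library of Congress, in bytes.
library_bytes = to_bytes(235, "TB")
```

Under this assumption a zettabyte is a billion terabytes, which is why tools sized for terabyte-scale data do not simply scale up.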

Challenges

• Capturing data

• Curation

• Storage

• Searching

• Sharing

• Transfer

• Analysis

• Presentation

Enablers of Big Data

• Increase in storage capacities

• Increase in processing power

• Abundance of data

Big Data and Analytics

• "Big Data" refers not only to the storage of data but also to the practice of analysing that data and using the results:

– To make strategic decisions, such as introducing a new product category, restructuring the organization, or improving customer care services by analysing customer care recordings and logs

– To learn the outcome of marketing campaigns

– To plan pricing strategy

Big Data Technologies

• Processing

– Infrastructure to process and store huge volumes of structured and unstructured data in real time

• Amazon, IBM, Microsoft, etc.

• Analytics

– Capabilities to analyze the data, build models, and perform complex analysis

• MapReduce, MongoDB, NoSQL, Cassandra
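Of the technologies listed, MapReduce is the easiest to show in a few lines. The sketch below imitates its three phases (map, shuffle, reduce) in plain single-process Python to count words. It is an illustration of the idea only, not Hadoop's API, and all function names are hypothetical.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield (word, 1)


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data", "big tools"])
```

In a real cluster the map and reduce calls would run in parallel on different machines, with the framework handling the shuffle between them; here everything runs in one process so the data flow is easy to follow.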

Magic Quadrant for Business Intelligence and Analytics Platforms

Hype Cycle for Emerging Technologies, 2015

Big Data Tools

• NoSQL Database – MongoDB, Cassandra, HBase, ZooKeeper, Redis

• MapReduce – Hadoop, Hive, Pig, Kafka, Flume, MapR, Oozie, Greenplum

• Processing – R, Lucene, Solr, Elasticsearch, Google

• Storage – HDFS, S3

• Servers – Elastic Beanstalk, Google App Engine

BIG DATA Players

BIG DATA Applications

Recommendation Systems

Recommendation Engine

Online Advertisement

Online Users

User Profiles

News Articles - Reuters - NYTimes - LATimes - Associated Press

Trend System

Trends and Segments

Stock Market Jobs City Events

Telecommunication

Telecom Network (~25,000 devices)

Alarms

Provisioning

Fault Analysis - Fault Detection

Prediction - New Customers - New Services

Social Network Analysis

Useful Websites

• http://www.kdnuggets.com/

• http://www.kaggle.com

Assignment

• Two Applications of Big Data

– Two different Areas/Domains

– Total Size of Data

– Tools and Technologies Used

End Of Session
