Are you Ready for Big Data? Dr. Putchong Uthayopas Department of Computer Engineering, Faculty of Engineering, Kasetsart University. [email protected]

Page 1: Are you ready for BIG DATA?

Are you Ready for Big Data?

Dr. Putchong Uthayopas

Department of Computer Engineering, Faculty of Engineering, Kasetsart University. [email protected]

Page 2: Are you ready for BIG DATA?

We are living in the world of Data

Geophysical Exploration

Medical Imaging

Video Surveillance

Mobile Sensors

Gene Sequencing

Smart Grids

Social Media

Page 3: Are you ready for BIG DATA?

Big Data

“Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it.”

Reference: “What is big data? An introduction to the big data landscape.”, Edd Dumbill, http://radar.oreilly.com/2012/01/what-is-big-data.html

Page 4: Are you ready for BIG DATA?

The Value of Big Data

• Analytical use
  – Big data analytics can reveal insights hidden previously by data too costly to process.
    • For example, peer influence among customers, revealed by analyzing shoppers’ transactions, social and geographical data.
  – Being able to process every item of data in reasonable time removes the troublesome need for sampling and promotes an investigative approach to data.
• Enabling new products
  – Facebook has been able to craft a highly personalized user experience and create a new kind of advertising business.

Page 5: Are you ready for BIG DATA?

3 Characteristics of Big Data

Page 6: Are you ready for BIG DATA?

Big Data Challenge

• Volume
  – How to process data so big that it cannot be moved or stored.
• Velocity
  – A lot of data arrives so fast that it cannot all be stored, such as Web usage logs, Internet traffic, and mobile messages. Stream processing is needed to filter out unused data or extract knowledge in real time.
• Variety
  – There are so many types of unstructured data formats that a conventional database becomes useless.
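The velocity point can be illustrated with a minimal sketch of stream processing. It assumes a hypothetical log format of `client path status` per line; the function names and sample records are made up for illustration. Each record is inspected once, malformed or uninteresting records are dropped, and only a small running summary is kept, so the raw stream is never stored.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def parse_requests(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (path, status) for each well-formed log line, one at a time."""
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # filter out malformed records instead of storing them
        _, path, status = parts
        if status.isdigit():
            yield path, int(status)


def count_errors(lines: Iterable[str]) -> Counter:
    """Keep only a running count of server errors per path."""
    errors: Counter = Counter()
    for path, status in parse_requests(lines):
        if status >= 500:
            errors[path] += 1
    return errors


# Hypothetical sample stream; a real source would be a socket or log tail.
sample_log = [
    "10.0.0.1 /index.html 200",
    "10.0.0.2 /checkout 500",
    "garbage line",
    "10.0.0.3 /checkout 503",
    "10.0.0.4 /search 200",
]
print(count_errors(iter(sample_log)))
```

Because `parse_requests` is a generator, memory use depends on the size of the summary rather than on the length of the stream.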

Page 7: Are you ready for BIG DATA?

How to deal with big data

• Integration of
  – Storage
  – Processing
  – Analysis Algorithm
  – Visualization

[Diagram labels: Massive Data, Stream, Stream processing, Processing, Storage, Analysis, Visualize]

Page 8: Are you ready for BIG DATA?

A New Approach For Distributed Big Data

Storage Islands (L.A., BOSTON, LONDON)

• Disparate Systems
• Manual Administration
• One Tenant, Many Systems
• IT Provisioned Storage

Single Storage Pool (L.A., BOSTON, LONDON)

• Single System Across Locations
• Automated Policies
• Many Tenants, One System
• Self-Service Access

Page 9: Are you ready for BIG DATA?

Hadoop

• Hadoop is a platform for distributing computing problems across a number of servers. First developed and released as open source by Yahoo.
  – Implements the MapReduce approach pioneered by Google in compiling its search indexes.
  – Distributing a dataset among multiple servers and operating on the data: the “map” stage. The partial results are then recombined: the “reduce” stage.
• Hadoop utilizes its own distributed filesystem, HDFS, which makes data available to multiple computing nodes.
• Hadoop usage pattern involves three stages:
  – loading data into HDFS,
  – MapReduce operations, and
  – retrieving results from HDFS.
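The map and reduce stages described above can be sketched in a single Python process. This is only an illustration of the idea and does not use Hadoop or HDFS; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample chunks are assumptions made for the example. Each string in `chunks` stands in for a piece of the dataset that would live on a different server.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(chunk: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one chunk of the dataset."""
    for word in chunk.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key so each key can be reduced on its own."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: recombine the partial results for one key into a total."""
    return key, sum(values)


def word_count(chunks: List[str]) -> Dict[str, int]:
    mapped = (pair for chunk in chunks for pair in map_phase(chunk))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


chunks = ["big data is big", "data moves fast"]
print(word_count(chunks))
```

In a real cluster the map calls would run in parallel next to the data and the grouping step would move pairs across the network; here everything runs sequentially to keep the sketch short.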

Page 10: Are you ready for BIG DATA?

WHAT FACEBOOK KNOWS

http://www.facebook.com/data

Cameron Marlow calls himself Facebook's "in-house sociologist." He and his team can analyze essentially all the information the site gathers.

Page 11: Are you ready for BIG DATA?

Study of Human Society

• Facebook, in collaboration with the University of Milan, conducted an experiment that involved
  – the entire social network as of May 2011
  – more than 10 percent of the world's population.
• Analyzing the 69 billion friend connections among those 721 million people showed that
  – four intermediary friends are usually enough to introduce anyone to a random stranger.
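To make "steps between people" concrete, here is a minimal sketch that measures the average shortest-path length in a tiny, made-up friendship graph using breadth-first search. The graph, names, and functions are hypothetical, and this is not the method used in the study; exact search over every pair is only practical for small graphs, and a network of hundreds of millions of people would need distributed or approximate techniques.

```python
from collections import deque
from typing import Dict, List


def distances_from(graph: Dict[str, List[str]], start: str) -> Dict[str, int]:
    """Breadth-first search: number of friendship steps from start to each person."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for friend in graph[person]:
            if friend not in dist:
                dist[friend] = dist[person] + 1
                queue.append(friend)
    return dist


def average_distance(graph: Dict[str, List[str]]) -> float:
    """Average number of steps over all ordered pairs of connected people."""
    total = 0
    pairs = 0
    for person in graph:
        for other, steps in distances_from(graph, person).items():
            if other != person:
                total += steps
                pairs += 1
    return total / pairs if pairs else 0.0


# Hypothetical four-person chain: ann - bob - cat - dan
friends = {
    "ann": ["bob"],
    "bob": ["ann", "cat"],
    "cat": ["bob", "dan"],
    "dan": ["cat"],
}
print(average_distance(friends))
```

Note that a distance of three steps, as between `ann` and `dan` here, corresponds to two intermediary friends.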

Page 12: Are you ready for BIG DATA?

The Links of Love

• Often young women specify that they are “in a relationship” with their “best friend forever”.
  – Roughly 20% of all relationships for the 15-and-under crowd are between girls.
  – This number dips to 15% for 18-year-olds and is just 7% for 25-year-olds.
• Among anonymous US users who were over 18 at the start of the relationship:
  – The average of the shortest number of steps to get from any one U.S. user to any other individual is 16.7.
  – This is much higher than the 4.74 steps you’d need to go from any Facebook user to another through friendship, as opposed to romantic, ties.

http://www.facebook.com/notes/facebook-data-team/the-links-of-love/10150572088343859

Figure: graph showing the relationships of anonymous US users who were over 18 at the start of the relationship.

Page 13: Are you ready for BIG DATA?

Why?

• Facebook can improve users' experience
  – make useful predictions about users' behavior
  – make better guesses about which ads you might be more or less open to at any given time
• Right before Valentine's Day this year, a blog post from the Data Science Team listed the songs most popular with people who had recently signaled on Facebook that they had entered or left a relationship.

Page 14: Are you ready for BIG DATA?

How does Facebook handle Big Data?

• Facebook built its data storage system using open-source software called Hadoop.
  – Hadoop spreads the data across many machines inside a data center.
  – Facebook uses Hive, open-source software that acts as a translation service, making it possible to query vast Hadoop data stores using relatively simple code.
• Much of Facebook's data resides in one Hadoop store more than 100 petabytes (one petabyte is a million gigabytes) in size, says Sameet Agarwal, a director of engineering at Facebook who works on data infrastructure, and the quantity is growing exponentially. "Over the last few years we have more than doubled in size every year."

Page 15: Are you ready for BIG DATA?

The Journey To Big Data

1. Big Data Infrastructure (Technology Focus): All Data, Faster Answers, Elastic & Scalable
2. Agile Analytics (People & Productivity Focus): Data Science Collaboration, Self-Service
3. Predictive Enterprise (Application Focus): Real Time Decisions, New Applications, Data Monetization

[Diagram labels: Cloud Infrastructure, Analytic Engines, Analytic Productivity Platform, Agile Process & Tools, Big Data Enabled Apps]

Page 16: Are you ready for BIG DATA?

Data Tsunami

• The data flood is coming, and there is nowhere to run!
  – Data is being generated anytime, anywhere, by anyone
  – Data is moving in fast
  – Data is too big to move, too big to store
• Better be prepared
  – Use this to enhance your business and offer better services to customers

Page 17: Are you ready for BIG DATA?

Thank you