
BIG DATA Challenges & Opportunities


Lei Chen


Outline

Background: “Big data” is a term acknowledging the exponential growth, availability and use of …

Challenges: “Big data” poses grand challenges for data capture, storage, analysis …

Opportunities: Many applications can benefit from “Big data” …

Background

We are capturing more data

Satellite imagery, mobile stations, distributed sensor networks, geographical plotting …

Super exponential growth in data volume

Copyright belongs to “Data Analysis Challenges”, JSR-08-142, Dec


We are using more data

Intelligent transportation

Digital health care


We need quick processing of the data

Volcano monitoring

Hurricane path prediction


We are exploring the unknown through different means of data measurement

Ocean science

Exploring the universe


We are discovering new rules from data

The well-formed.eigenfactor project visualizes information flow in science.

This diagram shows the citation links of the journal Nature.

Copyright belongs to http://well-formed.eigenfactor.org


Defining Big Data

Wikipedia: Big data are datasets that grow so large that they become awkward to work with using on-hand database management tools. Difficulties include capture, storage, search, sharing, analytics and visualizing.

Gartner (2011): Big data is a popular term used to acknowledge the exponential growth, availability and use of information in the data-rich landscape of tomorrow.


Features of Big Data

3V: Variety, Velocity and Volume

Challenges

Challenge areas (figure):

• Applications

• Data Processing (processing language, optimization, visualization)

• Data Extraction (acquisition, integration, representation)

• Data Model (interpretation, representation): <key,vals>, Object, E-R, Hierarchical

• Storage (reliability, scalability, availability)

• Network Topology


Data model challenges

Candidate data models: <key,vals>, Object, E-R, Hierarchical

• Volume: scale up, scale out, and scale in

• Velocity: “interactive” properties to facilitate processing

• Variety: simple but unified, to adapt to heterogeneity

Existing data models are not satisfactory: functionality vs. simplicity


Storage challenges

Storage concerns:

• Reliability: data is safe and trustworthy

• Availability: data is accessible

• Scalability: the performance of data operations does not degrade as data size grows

However, the CAP theorem is a fundamental bottleneck: no one-size-fits-all solution exists.


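Storage challenges

CAP Theorem

• Consistency

• Availability

• Partition tolerance

A distributed data store cannot guarantee all three at once: when a network partition occurs, it must give up either consistency or availability.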
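A minimal Python sketch of the trade-off, assuming a hypothetical two-replica key-value store (the `Replica` class, the `write` function and the `mode` flag are illustrative names, not part of the original slides). While the replicas are partitioned, a consistency-first (CP) configuration rejects writes it cannot replicate, whereas an availability-first (AP) configuration accepts them on one side and lets the copies diverge.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of a hypothetical replicated key-value store."""
    name: str
    data: dict = field(default_factory=dict)


def write(primary: Replica, peer: Replica, key, value, partitioned: bool, mode: str) -> bool:
    """Write key=value through `primary`, replicating to `peer` when it is reachable.

    mode="CP": refuse the write during a partition (stay consistent, lose availability).
    mode="AP": accept the write locally during a partition (stay available, lose consistency).
    Returns True if the write was accepted.
    """
    if not partitioned:
        primary.data[key] = value
        peer.data[key] = value
        return True
    if mode == "CP":
        return False  # the peer is unreachable, so reject rather than diverge
    if mode == "AP":
        primary.data[key] = value  # the peer stays stale until the partition heals
        return True
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    for mode in ("CP", "AP"):
        a, b = Replica("a"), Replica("b")
        write(a, b, "x", 1, partitioned=False, mode=mode)
        accepted = write(a, b, "x", 2, partitioned=True, mode=mode)
        consistent = a.data.get("x") == b.data.get("x")
        print(mode, "accepted:", accepted, "consistent:", consistent)
```

Neither mode is wrong in general; the choice depends on whether the application can better tolerate refused writes or stale reads.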


Storage challenges


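ACID vs. BASE

RDBMS (ACID): Atomic, Consistent, Isolated, Durable

NoSQL (BASE): Basically Available, Soft-state, Eventually consistent

Systems positioned on the CAP triangle (figure):

• C + A: RDBMS

• C + P: BigTable, HyperTable, HBase, MongoDB, Redis, Scalaris, etc.

• A + P: Dynamo, CouchDB, Cassandra, SimpleDB, Tokyo Cabinet, Riak, Voldemort, etc.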
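“Soft state” and “eventually consistent” mean that replicas may disagree for a while and reconcile later. A minimal Python sketch, assuming a last-writer-wins (LWW) merge rule keyed on a per-write timestamp; LWW is only one common reconciliation policy, and the `put`/`merge` helpers are hypothetical names chosen for illustration.

```python
from typing import Dict, Tuple

# Each replica maps key -> (timestamp, value); the timestamp orders writes.
Store = Dict[str, Tuple[int, str]]


def put(store: Store, key: str, value: str, ts: int) -> None:
    """Apply a write locally only if it is newer than what the replica already holds.

    Ties on equal timestamps keep the existing value; a real system would also
    need a deterministic tie-breaker (for example, a node id).
    """
    if key not in store or ts > store[key][0]:
        store[key] = (ts, value)


def merge(a: Store, b: Store) -> Store:
    """Anti-entropy step: combine two replicas, keeping the newest write per key."""
    merged = dict(a)
    for key, (ts, value) in b.items():
        put(merged, key, value, ts)
    return merged


if __name__ == "__main__":
    r1: Store = {}
    r2: Store = {}
    put(r1, "cart", "book", ts=1)      # accepted by replica 1
    put(r2, "cart", "book+pen", ts=2)  # a later, concurrent write accepted by replica 2
    print("replicas agree:", r1 == r2)  # soft state: they temporarily disagree
    synced = merge(r1, r2)              # once they exchange state, both converge
    r1, r2 = dict(synced), dict(synced)
    print("replicas agree:", r1 == r2, r1["cart"])
```

The price of staying available is visible here: the earlier write (`"book"`) is silently overwritten during reconciliation.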


Management challenges


“Solving 'Big Data' Challenge Involves More Than Just Managing Volumes of Data”, Gartner (2011)

Big data management

• Functionality: indexing & partition

• Flexibility: adaptation to new requirements and new components


Management challenges


E.g., indexing over big data

• Volume: a large volume of data is captured every time unit. This requires a distributed adaptive index, which leads to significant cost in metadata exchange.

• Variety: data is captured from different sources. This requires a distributed adaptive index, which leads to ambiguity in indexing the same object.


Challenges on processing


• New query language (algebra)

Desired property: sacrifices & overhead

• Flexibility: complexity in data modeling

• “Relational” support: poor scalability

• “Uncertain” support: poor scalability and significant computing overhead

• Scalability: less functionality

• Efficiency & effectiveness: poor scalability


Challenges on processing


• New computing paradigm for processing

Distributed computing paradigm: limitations

• Message passing: poor scalability and fault tolerance

• Unified access: efficiency breaks down across a large number of computing nodes

• MapReduce: poor functionality
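To show what the MapReduce model offers, here is a single-process Python sketch of the classic word-count job. It only imitates the map, shuffle and reduce phases; a real framework such as Hadoop distributes them across machines and handles failures. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, List, Tuple


def map_phase(doc: str) -> List[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input split."""
    return [(word, 1) for word in doc.lower().split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key (done by the framework in a real system)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: aggregate all values collected for one key."""
    return key, sum(values)


def word_count(docs: List[str]) -> Dict[str, int]:
    """Run map over every split, shuffle the pairs, then reduce each key group."""
    mapped = chain.from_iterable(map_phase(d) for d in docs)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())


if __name__ == "__main__":
    splits = ["big data big challenges", "big opportunities"]
    print(word_count(splits))
    # {'big': 3, 'data': 1, 'challenges': 1, 'opportunities': 1}
```

Anything that fits this map/group/aggregate shape is easy to express and parallelize. Joins, iterative algorithms and multi-step queries must be chained from several such jobs, which is one reason the model is often described as limited in functionality.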


Challenges on processing


• New optimization methodology

• Load balance vs. data locality

• High parallelism vs. merging cost

• Less network I/O vs. replicated computing
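Load balance and data locality often pull in opposite directions: running a task where its input lives avoids network transfer, but it can overload that node. A minimal Python sketch, assuming a hypothetical cluster where each task's input block lives on a single home node. The greedy `schedule` function and its `slack` parameter are illustrative, not a method from the slides. It keeps a task on its home node unless that node is more than `slack` tasks busier than the least-loaded node.

```python
from typing import Dict, List, Tuple


def schedule(tasks: List[Tuple[str, str]], nodes: List[str], slack: int):
    """Greedy placement trading data locality against load balance.

    tasks: (task_id, home_node) pairs, where home_node stores the task's input.
    slack: how many extra tasks a data-local node may carry beyond the
           least-loaded node before a task is shipped elsewhere.
    Returns (placement, load, local_count).
    """
    load: Dict[str, int] = {n: 0 for n in nodes}
    placement: Dict[str, str] = {}
    local = 0
    for task_id, home in tasks:
        least = min(nodes, key=lambda n: (load[n], n))  # ties broken by node name
        if load[home] - load[least] <= slack:
            target = home   # run where the data lives: no network transfer
            local += 1
        else:
            target = least  # ship the input to a less busy node
        placement[task_id] = target
        load[target] += 1
    return placement, load, local


if __name__ == "__main__":
    nodes = ["n1", "n2", "n3"]
    tasks = [(f"t{i}", "n1") for i in range(6)]  # all input blocks live on n1
    for slack in (0, 10):
        _, load, local = schedule(tasks, nodes, slack)
        print(f"slack={slack}: max load={max(load.values())}, data-local tasks={local}")
    # slack=0: max load=2, data-local tasks=2
    # slack=10: max load=6, data-local tasks=6
```

A small slack keeps the cluster balanced but moves most inputs over the network; a large slack avoids transfers but piles all the work onto the node that holds the data.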

Opportunities

Why “Big Data”?


• We are empowered to gain knowledge and process information more accurately, effectively and efficiently.

Big Data for: natural science study, fundamental scientific research, social civilization, and daily life


Big Data for natural science study

• E.g., natural disaster forecasting and management

Disasters: flood, earthquake, extreme weather

Tasks: forecasting and management

Data: meteorological data, geographic data; population, transportation and urban design data; economic data


Big Data for fundamental scientific research

• E.g., bioinformatics and medicine

Gene technology and clinical medicine mutually promote each other


Big Data for social civilization

• Light-speed information spreading & enormous knowledge

Quick event detection

Easy collaboration

Wondering where to get a really good cup of coffee?

JUST tweet your question!!


Big Data for daily life


• Our life can be much easier with more data. E.g., trip planning

Request (predefined): travel to Beijing; 3-day stay; budget < $1000; visit the Forbidden City; 10am meeting every day

Real-world incidents (updating): traffic jam; luggage delay; bad weather

→ Adaptive agenda


Opportunity highlights


• Volume: capturing, storing and analyzing data helps us better understand the world

• Velocity: guaranteed effective & efficient data processing

• Variety: handling heterogeneous sources of data

Considering all the challenges and constraints, there is perhaps no one-size-fits-all solution.

However, application-dependent “Big Data” solutions are promising.


Applications


Heterogeneous data management

• Search doctors

• Search universities (ongoing)

Search Doctors (figure): web pages on the Internet, hospital databases, search results from general-purpose search engines, and news / rumors → data extraction → data integration → integrated database (~500,000 doctors & ~30,000 hospitals from a 50+GB source) → OLAP query processing → Search Doctors
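The slide does not say how records are matched across sources. As a minimal Python sketch of one simple approach, the code below groups records by a normalized (name, hospital) key and merges their fields. The record fields, the `normalize` and `integrate` helpers and the sample data are all hypothetical. A production system would need fuzzier entity resolution than exact matching on a normalized key.

```python
import re
import unicodedata
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def normalize(text: str) -> str:
    """Lowercase, drop accents and a leading 'Dr'/'Dr.' title, keep only alphanumeric words."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in text if not unicodedata.combining(c)).lower()
    text = re.sub(r"^\s*dr\.?\s+", "", text)
    return " ".join(re.findall(r"[a-z0-9]+", text))


def integrate(records: List[Dict[str, str]]) -> Dict[Tuple[str, str], Dict[str, Any]]:
    """Merge records describing the same (doctor, hospital) pair from different sources."""
    merged: Dict[Tuple[str, str], Dict[str, Any]] = defaultdict(lambda: {"sources": set()})
    for rec in records:
        key = (normalize(rec["name"]), normalize(rec["hospital"]))
        entry = merged[key]
        entry["sources"].add(rec["source"])
        for field_name, value in rec.items():
            if field_name not in ("name", "hospital", "source"):
                entry.setdefault(field_name, value)  # first source seen wins on conflicts
    return dict(merged)


if __name__ == "__main__":
    # Hypothetical records for one doctor, as three different sources might present them.
    records = [
        {"source": "hospital_db", "name": "Dr. Jane Doe", "hospital": "General Hospital",
         "specialty": "Cardiology"},
        {"source": "web_page", "name": "JANE DOE", "hospital": "General Hospital.",
         "clinic_hours": "Mon-Fri"},
        {"source": "search_engine", "name": "Dr Jane  Doe", "hospital": "general hospital",
         "specialty": "Cardiology (adult)"},
    ]
    for key, entry in integrate(records).items():
        print(key, entry)
```

The "first source wins" rule is the simplest possible conflict policy; ranking sources by trustworthiness (for example, preferring hospital databases over news or rumors) would be a natural refinement.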