BIG DATA Challenges & Opportunities
Lei Chen
Outline
Background: “Big data” is a term acknowledging the exponential growth, availability and use of …
Challenges: “Big data” poses grand challenges in data capture, storage, analysis …
Opportunities: Many applications can benefit from “Big data” …
Background
We are capturing more data
Satellite imagery, mobile stations, distributed sensor networks, geographical plotting …
Super exponential growth in data volume
Copyright belongs to “Data Analysis Challenges”, JSR-08-142, Dec
We are using more data
Intelligent transportation
Digital health care
We need quick processing of the data
Volcano monitoring
Hurricane path prediction
We are exploring the unknown with different means of data measurement
Ocean science
Exploring the universe
We are discovering new rules from data
The well-formed.eigenfactor project visualizes information flow in science.
This diagram shows the citation links of the journal Nature.
Copyright belongs to http://well-formed.eigenfactor.org
Defining Big Data
Wiki: Big data are datasets that grow so large that they become awkward to work with using on-hand database management tools. Difficulties include capture, storage, search, sharing, analytics and visualizing.
Gartner (2011): Big data is a popular term used to acknowledge the exponential growth, availability and use of information in the data-rich landscape of tomorrow.
Features of Big Data
3V: Variety, Velocity and Volume
Challenges
Layers between the applications and the network topology:
• Data Extraction (Acquisition, Integration, Representation)
• Data Processing (Processing language, Optimization, Visualization)
• Data Model (Interpretation, Representation): <key,vals>, Object, E-R, Hierarchical
• Storage (Reliability, Scalability, Availability)
Data model challenges
Existing data models: <key,vals>, Object, E-R, Hierarchical
• Volume: scale up, scale out, and scale in
• Velocity: “interactive” properties to facilitate processing
• Variety: simple but unified, to adapt to heterogeneity
Existing data models are not satisfactory: functionality vs. simplicity
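The functionality vs. simplicity trade-off can be illustrated with a minimal sketch that stores the same records in a key-value model and in a relational model. The sample records, the `sensor` table and both function names are invented for illustration; they do not come from the slides.

```python
import sqlite3

# Hypothetical sample records, invented for illustration only.
records = [
    {"id": 1, "name": "sensor-a", "city": "Hong Kong"},
    {"id": 2, "name": "sensor-b", "city": "Beijing"},
    {"id": 3, "name": "sensor-c", "city": "Beijing"},
]

# Key-value model: simple and easy to partition by key,
# but only lookups by key are cheap.
kv_store = {r["id"]: r for r in records}


def kv_names_in_city(store, city):
    # No secondary access path: every value has to be scanned.
    return sorted(v["name"] for v in store.values() if v["city"] == city)


# Relational model: richer query functionality, at the price of a fixed schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sensor (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany("INSERT INTO sensor VALUES (:id, :name, :city)", records)


def sql_names_in_city(connection, city):
    rows = connection.execute(
        "SELECT name FROM sensor WHERE city = ? ORDER BY name", (city,)
    )
    return [name for (name,) in rows]
```

Both functions return the same names for a given city. The key-value version needs hand-written scanning code for every new question, while the relational version expresses it declaratively but requires the schema up front.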
Storage challenges
Storage concerns:
• Reliability: data is safe and trustworthy
• Availability: data is accessible
• Scalability: data operation performance does not degrade as data size grows
However, the CAP theorem is the limiting constraint: no one-size-fits-all solution exists
CAP Theorem
• Consistency
• Availability
• Partition tolerance
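The CAP theorem says that when the network partitions, a distributed store must give up either consistency or availability. The toy model below is a sketch of that choice, not a real storage system: `TwoReplicaStore`, `Replica`, `Unavailable` and the `partitioned` flag are hypothetical names introduced here.

```python
class Unavailable(Exception):
    """Raised when a replica refuses a request in order to stay consistent."""


class Replica:
    def __init__(self):
        self.data = {}


class TwoReplicaStore:
    """Toy store with two replicas and a switchable network partition."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = [Replica(), Replica()]
        self.partitioned = False

    def write(self, replica_id, key, value):
        if not self.partitioned:
            # Healthy network: replicate synchronously to both replicas.
            for replica in self.replicas:
                replica.data[key] = value
            return
        if self.mode == "CP":
            # Keep consistency: refuse the write, giving up availability.
            raise Unavailable("cannot reach the other replica")
        # AP: stay available and accept the write locally; replicas may diverge.
        self.replicas[replica_id].data[key] = value

    def read(self, replica_id, key):
        return self.replicas[replica_id].data.get(key)
```

In "CP" mode a write during a partition raises `Unavailable`, so both replicas keep agreeing. In "AP" mode the write succeeds on one replica only, so reads from the two replicas can return different values until the partition heals.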
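ACID vs. BASE
RDBMS (ACID): Atomic, Consistent, Isolated, Durable
NoSQL (BASE): Basically Available, Soft-state, Eventually consistent
Systems on the CAP triangle:
• Consistency + Availability: RDBMS
• Consistency + Partition tolerance: BigTable, HyperTable, HBase, MongoDB, Redis, Scalaris, etc.
• Availability + Partition tolerance: Dynamo, CouchDB, Cassandra, SimpleDB, Tokyo Cabinet, Riak, Voldemort, etc.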
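"Eventually consistent" means replicas may disagree for a while but converge once they exchange state. A minimal sketch follows, assuming a last-write-wins merge rule with caller-supplied timestamps. That rule is one common choice picked here for brevity, not something the slides prescribe, and `LwwReplica` and `sync` are hypothetical names.

```python
class LwwReplica:
    """Replica storing key -> (timestamp, value), merged by last-write-wins."""

    def __init__(self):
        self.store = {}

    def put(self, key, value, timestamp):
        current = self.store.get(key)
        # Keep the entry with the newest timestamp (ties keep the existing one).
        if current is None or timestamp > current[0]:
            self.store[key] = (timestamp, value)

    def get(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]

    def merge_from(self, other):
        # Anti-entropy step: pull every entry the other replica knows about.
        for key, (timestamp, value) in other.store.items():
            self.put(key, value, timestamp)


def sync(a, b):
    """Exchange state in both directions so that a and b converge."""
    a.merge_from(b)
    b.merge_from(a)
```

Before `sync`, two replicas that accepted different writes for the same key answer differently (soft state). After `sync`, both return the value with the larger timestamp. Equal timestamps are not resolved deterministically across replicas in this sketch.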
Management challenges
“Solving 'Big Data' Challenge Involves More Than Just Managing Volumes of Data” (Gartner, 2011)
Big data management
• Functionality: indexing & partitioning
• Flexibility: adaptation to new requirements and new components
E.g., indexing over big data
• Volume: a large volume of data is captured every time unit. This requires a distributed adaptive index, which leads to significant cost for metadata exchange.
• Variety: data is captured from different sources. This requires a distributed adaptive index, which leads to ambiguity when indexing the same object.
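One way to see where the metadata exchange cost comes from is a toy range-partitioned index that splits a partition whenever it overflows. This is a sketch under stated assumptions: keys are distinct, every node caches the full routing table, and each split costs one message per node. `AdaptiveRangeIndex` and its fields are hypothetical names.

```python
import bisect


class AdaptiveRangeIndex:
    """Toy range-partitioned index that splits partitions as they fill up."""

    def __init__(self, num_nodes, capacity):
        self.num_nodes = num_nodes      # nodes that each cache the routing table
        self.capacity = capacity        # max keys per partition before a split
        self.boundaries = []            # sorted split keys (the routing metadata)
        self.partitions = [[]]          # partitions[i] holds sorted keys
        self.metadata_messages = 0      # routing updates sent so far

    def _locate(self, key):
        # Partition i covers keys in [boundaries[i-1], boundaries[i]).
        return bisect.bisect_right(self.boundaries, key)

    def insert(self, key):
        i = self._locate(key)
        bisect.insort(self.partitions[i], key)
        if len(self.partitions[i]) > self.capacity:
            self._split(i)

    def _split(self, i):
        keys = self.partitions[i]
        mid = len(keys) // 2
        self.partitions[i:i + 1] = [keys[:mid], keys[mid:]]
        self.boundaries.insert(i, keys[mid])
        # Every node must learn the new boundary to keep routing correctly.
        self.metadata_messages += self.num_nodes

    def contains(self, key):
        part = self.partitions[self._locate(key)]
        j = bisect.bisect_left(part, key)
        return j < len(part) and part[j] == key
```

The faster data arrives, the more often partitions split, and each split multiplies into one routing update per node. That product is the metadata cost the Volume bullet refers to.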
Challenges on processing
• New query language (algebra)
Desired: Sacrifices & overhead
o Flexibility: complexity in data modeling
o “Relational” support: poor scalability
o “Uncertain” support: poor scalability and significant computing overhead
o Scalability: less functionality
o Efficiency & effectiveness: poor scalability
• New computing paradigm for processing
Distributed computing paradigm: Limitations
o Message Passing: poor scalability and fault tolerance
o Unified Access: invalidated efficiency over large numbers of computing nodes
o MapReduce: poor functionality
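For readers unfamiliar with the MapReduce paradigm, here is a minimal single-process sketch of its three phases using the classic word-count task. The function names are illustrative and do not correspond to the API of any particular framework; a real framework would run the map and reduce calls in parallel across machines.

```python
from collections import defaultdict


def map_phase(document):
    # Emit one (word, 1) pair per word occurrence.
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    # Group all values by key, as the framework would between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Combine all values emitted for one key.
    return key, sum(values)


def word_count(documents):
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    groups = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())
```

The rigid map, shuffle, reduce shape is what makes the paradigm easy to scale, and it is also the source of the "poor functionality" limitation: work that does not fit this shape, such as multi-way joins or iterative algorithms, has to be expressed as chains of separate jobs.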
• New optimization methodology
o Load balance vs. data locality
o High parallelism vs. merging cost
o Less network I/O vs. replicated computing
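The tension between load balance and data locality can be sketched with a toy scheduler. It assumes unit-cost tasks, each with a "home" node that stores its input, and counts a remote read whenever a task runs elsewhere. `schedule` and its parameters are hypothetical names introduced for this illustration.

```python
def schedule(task_homes, num_nodes, prefer_locality):
    """Assign unit-cost tasks to nodes; return (max_load, remote_reads).

    task_homes[i] is the node that stores the input of task i.
    """
    load = [0] * num_nodes
    remote_reads = 0
    for home in task_homes:
        if prefer_locality:
            # Run the task where its data lives, whatever the current load.
            node = home
        else:
            # Run the task on the least-loaded node (first one on ties).
            node = min(range(num_nodes), key=lambda n: load[n])
        load[node] += 1
        if node != home:
            remote_reads += 1
    return max(load), remote_reads
```

With skewed input such as `task_homes = [0, 0, 0, 0, 1, 2]` on three nodes, the locality-first policy performs no remote reads but leaves one node with four tasks, while the balance-first policy caps every node at two tasks at the price of two remote reads.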
Opportunities
Why “Big Data”?
• We are empowered to acquire knowledge and process information more accurately, effectively and efficiently.
Big Data benefits four areas:
• Natural science study
• Fundamental scientific research
• Social civilization
• Daily life
Big Data for natural science study
• E.g., natural disaster forecasting and management
Disasters: flood, earthquake, extreme weather
Tasks: forecasting, management
Data sources: meteorological data; geographic data; population, transportation and urban design data; economic data
Big Data for fundamental scientific research
• E.g., bioinformatics and medicine
Gene technology and clinical medicine promote each other
Big Data for social civilization
• Light-speed information spreading & enormous knowledge
Quick event detection
Easy collaboration
Wondering where to get a really good cup of coffee?
Just tweet your question!
Big Data for daily life
• Our lives can be much easier with more data… E.g., trip planning
Request: travel to Beijing
o 3-day stay
o Budget < $1000
o Forbidden City
o 10am meeting every day
Real-world incidents
o Traffic jam
o Luggage delay
o Bad weather
Adaptive agenda: predefined from the request, then updated as incidents occur
Opportunity highlights
• Volume
o Capturing, storing and analyzing data helps us better understand the world
• Velocity
o Guaranteed effective & efficient data processing
• Variety
o Handling heterogeneous sources of data
Considering all the challenges and constraints, perhaps there is no one-size-fits-all solution.
However, application-dependent “Big Data” solutions are promising.
Applications
Heterogeneous data management
• Search doctors
• Search universities (ongoing)
…
Search Doctors
• Sources: web pages on the Internet, hospital databases, search results from general-purpose search engines, news / rumors
• Data extraction
• Data integration
• Integrated database: ~500,000 doctors & ~30,000 hospitals from a 50+ GB source
• OLAP query processing