COMP9321 Web Application Engineering, Semester 1, 2017
Dr. Amin Beheshti, Service Oriented Computing Group,
CSE, UNSW Australia
Week 11 (Part II)
http://webapps.cse.unsw.edu.au/webcms2/course/index.php?cid=2457
http://www.cse.unsw.edu.au/~sbeheshti/
COMP9321, 17s1, Week 11
Big Data: Challenges and Opportunities
http://www.intelli3.com/
We are Generating Vast Amounts of Data !!
Healthcare: remote patient monitoring
Manufacturing: product sensors
Location-Based Services: real-time location data
Retail: social media…
Digitalization of Artefacts: books, music, videos, etc.
Airbus A380: generates 10 TB every 30 minutes.
Twitter: generates approximately 12 TB of data per day.
Facebook: data grows by over 500 TB daily.
New York Stock Exchange: 1 TB of data every day.
We are Generating Vast Amounts of Meta-data !!
Data
Versioning
Provenance
Security
Privacy
…
We are Tracing everything: Who did What? When? Where? …
e.g. Twitter handles ~1.6 billion search queries per day.
Reading a book, e.g. Kindle tracks: what you are reading, when you are reading it, how often you read it, etc.
Listening to music, e.g. mp3 player tracks: what you are listening to, when and how often, in what order, etc.
Smart phones, e.g. iPhone tracks: our location, our speed, what apps we are using, who we are ringing, etc.
Big Data and Big Meta-Data
share, comment, review, crowdsource, etc.
So, What is Big Data?
Big data refers to our ability to collect and analyse the ever expanding amounts of data and meta-data that we are generating every second!
Challenges: Capture, Storage, Search, Sharing, Transfer, Analysis, Visualization, etc.
What Makes it Big Data?
Volume: the vast amounts of data generated every second.
Velocity: the speed at which new data is generated and moves around.
Variety: the increasingly different types of data.
Veracity: the quality of data, e.g. the messiness of the data. Noisy and inconsistent data needs to be detected and corrected.
Value: Statistical, Events, Correlation, Hypothetical
Challenges: How to Store and Process?
Big data is high volume, high velocity, and/or high variety information assets.
Require new forms of storage and processing.
On-hand database management tools?
Traditional data processing applications?
Challenges: Big Data Storage
NoSQL databases:
• Employ less constrained consistency models.
• Simple retrieval and appending operations.
• Significant performance benefits.
Examples:
• Key–value Store
• Document Store
• Graph Database
• …
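The "simple retrieval and appending operations" of a key–value store can be sketched in a few lines. This is only an in-memory illustration: the class name `ToyKeyValueStore`, its methods, and the sample keys are assumptions made for this example and do not correspond to any particular NoSQL product.

```python
from collections import defaultdict


class ToyKeyValueStore:
    """In-memory stand-in for a key-value store (hypothetical, for illustration)."""

    def __init__(self):
        # Each key maps to an append-only list of values.
        self._data = defaultdict(list)

    def append(self, key, value):
        """Append a value under a key; no schema or cross-key constraint is checked."""
        self._data[key].append(value)

    def get(self, key):
        """Return a copy of all values stored under a key (empty list if absent)."""
        return list(self._data.get(key, []))


store = ToyKeyValueStore()
store.append("user:42", {"event": "login"})
store.append("user:42", {"event": "search", "query": "big data"})
events = store.get("user:42")   # both events, in insertion order
missing = store.get("user:7")   # unknown key yields an empty list
```

Note that the two values under the same key have different fields; nothing in the store enforces a common schema, which is the trade-off the slide refers to.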
Challenges: Big Data Storage (Graphs are Everywhere)
User–Movie graph (Netflix): Collaborative Filtering
Docs–Words graph (Wiki): Text Analysis
Social Network: Probabilistic Analysis
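As a rough illustration of the user–movie case, a bipartite graph can be stored as a mapping from users to the movies they liked, and a naive form of collaborative filtering can walk that graph. The data, the `recommend` function, and the "shares at least one movie" rule are all assumptions made for this sketch; real recommenders weight and rank their suggestions.

```python
# Hypothetical user -> movies edges of a bipartite graph.
LIKES = {
    "alice": {"Alien", "Brazil"},
    "bob": {"Alien", "Casablanca"},
    "carol": {"Dune"},
}


def recommend(user, likes):
    """Suggest movies liked by users who share at least one movie with `user`."""
    seen = likes[user]
    suggestions = set()
    for other, movies in likes.items():
        # A shared movie means the two users are two hops apart in the graph.
        if other != user and seen & movies:
            suggestions |= movies - seen
    return suggestions


alice_suggestions = recommend("alice", LIKES)
```

Here alice and bob both liked "Alien", so bob's other movie is suggested to alice, while carol shares nothing with alice and contributes no suggestion.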
Challenges: Big Data Processing
Apache Hadoop: Hadoop is an open source framework that uses a simple programming model to enable distributed processing of large data sets on clusters of computers.
Who Uses Hadoop?
Amazon, Facebook, Google, IBM, New York Times, Yahoo!, …
Apache Hadoop solution:
• Distributed File System (HDFS)
• MapReduce
• Pig
• HCatalog
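The MapReduce programming model listed above can be sketched on a single machine with the classic word count. This is plain Python, not the Hadoop API: the function names `map_phase`, `shuffle_phase`, and `reduce_phase` and the sample documents are assumptions for illustration. On a real cluster the map and reduce calls would run in parallel on different nodes, with the framework performing the shuffle.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


documents = ["big data is big", "data about data"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
word_counts = reduce_phase(shuffle_phase(pairs))
```

Because each document is mapped independently and each key is reduced independently, both phases can be spread across many machines without changing the functions themselves.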
Apache Spark:
Fast and Expressive Cluster Computing Engine, Compatible with Apache Hadoop
• Efficient: in-memory storage
• Usable: rich APIs in Java, Scala, Python
Resilient Distributed Dataset (RDD): Spark's data storage model
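The idea behind an RDD (a partitioned collection whose transformations are recorded lazily and only executed when a result is requested) can be mimicked in plain Python. The `ToyRDD` class below is a hypothetical miniature written for this sketch; it is not Spark's implementation and runs on one machine, although the method names `parallelize`, `map`, `filter`, and `collect` deliberately mirror those of Spark's RDD API.

```python
class ToyRDD:
    """Hypothetical miniature of an RDD: partitions plus a recorded chain of transformations."""

    def __init__(self, partitions, lineage=()):
        self._partitions = partitions  # source data, split into chunks
        self._lineage = lineage        # transformations to apply, in order

    @classmethod
    def parallelize(cls, items, num_partitions=2):
        items = list(items)
        # Deal the items round-robin into partitions.
        return cls([items[i::num_partitions] for i in range(num_partitions)])

    def map(self, fn):
        # Lazy: only record the step; nothing is computed yet.
        return ToyRDD(self._partitions, self._lineage + (("map", fn),))

    def filter(self, predicate):
        # Lazy as well: returns a new ToyRDD with a longer lineage.
        return ToyRDD(self._partitions, self._lineage + (("filter", predicate),))

    def _compute(self, partition):
        # Replay the lineage on one partition; a lost partition could be rebuilt the same way.
        for kind, fn in self._lineage:
            if kind == "map":
                partition = [fn(x) for x in partition]
            else:
                partition = [x for x in partition if fn(x)]
        return partition

    def collect(self):
        # Action: triggers the computation over all partitions.
        return [x for part in self._partitions for x in self._compute(part)]


numbers = ToyRDD.parallelize(range(1, 7))
even_squares = numbers.map(lambda x: x * x).filter(lambda x: x % 2 == 0)
result = sorted(even_squares.collect())
```

Keeping the lineage rather than the intermediate results is what makes the dataset "resilient": any partition can be recomputed from its source data and the recorded steps.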
Challenges: Big Data Integration
Example Scenario: Business Processes (BPs)
People, Web Services, IT Systems, Workflows
BPs Execution Log
Messy, schema-less and complex Big Data world.
Less than 10% of the Big Data world is genuinely relational.
e.g. Linked Data
Big Data-as-a-Service:
• Effective processing of big data within acceptable processing time
• Easy access to the big data and the big data analysis results
API Engineering
• ProgrammableWeb - APIs, Mashups and the Web as Platform;
• www.programmableweb.com/
• DataSift….open data sources
Challenges: Big data requires a broad set of skills
Math and Operations Research Expertise: develop analytic algorithms
Visualization Expertise: interpret data sets, determine correlations and present in meaningful ways
Tool Developers: mask complexity and analytics to lower skills boundaries
Industry Vertical Domain Expertise: develop hypotheses, identify relevant business issues, ask the right questions
Data Experts: data architecture, management, governance, policy
Decision Making Executive and Management: apply information to solve business issues
Challenges: Big Data Analytics
Analytics can be defined in many ways, but what matters is the purpose of analytics.
Most definitions agree on the following: Analytics is used to gain insights from data in order to make better decisions, using mathematical or scientific methods.
Data → Analyse → Insight → Decide → Action
Manage the Data, Understand the Data, Act on the Data
Example:
• Beheshti et al., “Scalable Graph-based OLAP Analytics over Process Execution Data”, DAPD Journal (2015).
• Beheshti et al., “A Framework and a Language for On-Line Analytical Processing on Graphs”, WISE Conference (2012).
OLAP is an approach to answering multi-dimensional analytical queries swiftly.
Problem:
• Extension of existing OLAP techniques to analysis of graphs is not straightforward.
• Key business insights remain hidden in the interactions among objects.
Solution:
• On-Line Analytical Processing on Graphs
Big Data Analytics benefits from:
• NLP
• Machine Learning
• Pattern recognition, Learning, Extraction, Classification, Enrichment, Linking, etc.
Examples:
• Healthcare
• Social Networks
  • e.g. Twitter
• Education
• Finance
• …
Beheshti et al., “Big data and cross-document coreference resolution: Current state and future opportunities”...
Big Data Leadership!
Industry has been in the lead:
• Google, Amazon, Yahoo!, etc.
University researchers have been left behind:
• due to lack of access to large-scale cluster computing facilities
Government agencies are making heavy investments:
• Investments in big-data computing will have extraordinary near-term and long-term benefits.
• Cloud computing must be considered a strategic resource.
Big Data: Opportunities
• Varieties of Data
  • Text
  • Social Media
  • Networks
  • Multimedia
  • Machine Data
  • Sensors
• Analytics
  • Organizing Big Data
  • Navigating through data
  • Summarizing Big Data
  • Process Data Analytics
  • Support decision-making
• Integration
  • Integrating enterprise and public data
  • Linking data/context
  • Entity Extraction and Integration
  • Knowledge Graph
• Big Data Performance
  • In memory
  • New Benchmarks and Architecture
• User Experience
  • Automation and intelligent guidance
  • Visualizing with Analytics
  • Interacting with Analytics
  • Storytelling
Conclusion
Why is Big Data different from past Very Large Datasets? Meta-Data!
Having the ability to analyse Big Data is of limited value if users cannot understand the analysis.
How can industry and academia collaborate towards solving Big Data challenges?
What is big today may not be big tomorrow!
Thank you!