olafur-andri-ragnarsson
LECTURE L21 Big Data and Analytics
Big Data
With the computer revolution, digital data became possible
Over the years, data has grown exponentially
“Big Data” has become a platform in itself, with new possibilities
Global Data is Growing Fast
Data in Digital Universe vs. Data Storage Cost, 2010-2015
Source: Mary Meeker, KPCB
Data is a New Growth Platform
The Network
Large investments in fiber optic and last-mile cable created the connectivity that facilitated early Internet growth
The Software
Optimising the network with software became far more capital efficient than additional capital expenditure buildouts, ultimately resulting in the creation of pervasive networks (Siloed DCs -> AWS) and pervasive software (Siebel -> Salesforce)
The Infrastructure
Emergence of pervasive software created the need to optimise the performance of the network and store extraordinary amounts of data at extremely low prices
The Data
Next Big Wave: Leveraging this unlimited connectivity and storage to collect / aggregate / correlate / interpret all of this data to improve people’s lives and enable enterprises to operate more efficiently
Evolution of Data Platform
Source: Mary Meeker, KPCB
Data Generators
Source: Mary Meeker, KPCB
Improve people’s lives and enable enterprises to operate more efficiently
“Data is moving from something you use outside the workstream to becoming a part of the business app itself.”
Frank Bien, CEO of Looker
Big Data Examples
Macy's Inc. and real-time pricing
The retailer adjusts pricing in near-real time for 73 million items, based on demand and inventory.
Source: Ten big data case studies in a nutshell
Big Data Examples
Tipp24 AG, a platform for placing bets
The company uses software to analyse billions of transactions and hundreds of customer attributes, and to develop predictive models that target customers and personalise marketing messages on the fly.
Source: Ten big data case studies in a nutshell
Big Data Examples
Wal-Mart Stores Inc. and search
The mega-retailer's latest search engine for Walmart.com includes semantic data. The platform, which was designed in-house, relies on text analysis, machine learning and even synonym mining to produce relevant search results.
Wal-Mart says adding semantic search has increased the share of online shoppers completing a purchase by 10% to 15%.
Source: Ten big data case studies in a nutshell
Big Data Examples
PredPol Inc. and repurposing
The Los Angeles and Santa Cruz police departments, a team of educators and a company called PredPol have taken an algorithm used to predict earthquakes, tweaked it and started feeding it crime data.
The software can predict where crimes are likely to occur down to 500 square feet. In LA, there's been a 33% reduction in burglaries and 21% reduction in violent crimes in areas where the software is being used.
Source: Ten big data case studies in a nutshell
Big Data Examples
American Express and business intelligence
AmEx started looking for indicators that could really predict loyalty, and developed sophisticated predictive models that analyse historical transactions and 115 variables to forecast potential churn.
The company believes it can now identify 24% of Australian accounts that will close within the next four months.
Source: Ten big data case studies in a nutshell
Big Data Examples
A Bank and IBM
A large US bank uses IBM machine learning technologies to analyse credit card transactions.
Using machine learning and stream computing to detect financial fraud
TEDxUofM - Jameson Toole - Big Data for Tomorrow
What is Big Data?
What is Big Data?
Big data is high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation.
Gartner
What is Big Data?
Big data refers to a process that is used when traditional data mining and handling techniques cannot uncover the insights and meaning of the underlying data. Data that is unstructured or time sensitive or simply very large cannot be processed by relational database engines. This type of data requires a different processing approach called big data, which uses massive parallelism on readily-available hardware.
Techopedia
What is Big Data?
“Big data is the oil of the 21st century and analytics is the combustion engine.”
Peter Sondergaard, Gartner Research
What is Big Data?
Byte: one grain of rice
Kilobyte: handful of rice
Megabyte: big pot of rice
Gigabyte: truck full of rice
Terabyte: container ship full of rice
Petabyte: covers Manhattan
Exabyte: covers the west coast of the US
Zettabyte: fills the Pacific
Yottabyte: Earth-sized rice ball
David Wellman: What is Big Data?
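To put numbers on the rice analogy, the short Python sketch below converts a raw byte count into the named units. It assumes decimal (SI) prefixes, where each unit is 1,000 times the previous one; the function name human_size is illustrative and does not come from the lecture.

```python
# Decimal (SI) prefixes: each unit is 1,000 times the previous one (an assumption;
# binary prefixes with a factor of 1,024 are also in common use).
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte", "yottabyte"]

def human_size(n_bytes):
    """Return (value, unit) using the largest unit that keeps value >= 1."""
    index = 0
    # Step up one unit at a time while the count still fills the next unit.
    while index < len(UNITS) - 1 and n_bytes >= 1000 ** (index + 1):
        index += 1
    return n_bytes / 1000 ** index, UNITS[index]

print(human_size(5 * 10**12))  # (5.0, 'terabyte')
```

Each step up the list is a factor of 1,000, so a yottabyte is 1,000^8 = 10^24 bytes.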
Big Data
Internet
Computers
Early computers
What is Big Data?
Big Data is not about the size of the data, it’s about the value within the data
This value can be used for marketing, business optimisation, getting insights, improving health, security etc.
Data Analytics
Why Big Data Analytics?
Understand the data the company has
Process data to see patterns, correlations and information that can be used to make better decisions
Obtain insights that are otherwise not known
Data Analytics
TRADITIONAL APPROACH: Structured and Repeatable Analysis
Business users: Determine what questions to ask
IT: Structures the data to answer the question
BIG DATA APPROACH: Iterative and Exploratory Analysis
IT: Delivers a platform to enable creative discovery
Business users: Explore what questions could be asked
Tools for Data Analytics
NoSQL databases: MongoDB, Cassandra, HBase, Hypertable
Storage: S3, Hadoop Distributed File System
Servers: EC2, Google App Engine, Heroku
MapReduce: Hadoop, Hive, Pig, Cascading, S4, MapR
Processing: R, Yahoo! Pipes, Solr/Lucene, BigSheets
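The MapReduce idea behind several of these tools can be illustrated without any of them. The sketch below is a single-machine word count in plain Python, written only to show the map, shuffle and reduce steps. It does not use the Hadoop API, and all function names are illustrative.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key (the word)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}

def word_count(documents):
    pairs = []
    # On a cluster, each document could be mapped on a different machine.
    for doc in documents:
        pairs.extend(map_phase(doc))
    return reduce_phase(shuffle(pairs))

counts = word_count(["big data", "Big insights from data"])
print(counts)
```

Because the map step treats each document independently, the work can be spread across many machines. This is the parallelism that frameworks such as Hadoop provide at scale.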
Two Types of Data Analysis Problems
Supervised Learning: Learn from data where we have labels for all the data we’ve seen so far
Example: Determining Spam Emails
Unsupervised Learning: Learn from data but we don’t have any labels
Example: Grouping Emails
Learning is about discovering hidden patterns in data
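As a rough illustration of the supervised case, the sketch below learns from a handful of labelled emails and then labels new ones. The training emails are made up for this example, and the word-count scoring is a deliberately simple stand-in for a real spam filter, not a method taken from the lecture.

```python
from collections import Counter

# Hypothetical labelled training data: (email text, label).
TRAINING = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lecture notes for monday", "ham"),
]

def train(examples):
    """Count how often each word appears under each label."""
    counts = {"spam": Counter(), "ham": Counter()}
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts

def classify(counts, text):
    """Pick the label whose training emails used the text's words most often."""
    words = text.lower().split()
    scores = {label: sum(c[w] for w in words) for label, c in counts.items()}
    return max(scores, key=scores.get)  # ties go to the first label, "spam"

model = train(TRAINING)
print(classify(model, "cheap money"))        # spam
print(classify(model, "notes for meeting"))  # ham
```

The labels in TRAINING are what make this supervised: without them, the best we could do is group similar emails together.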
Clustering
One of the oldest problems in unsupervised data analysis
In clustering the goal is to group data according to similarity
Algorithms such as K-means are used for clustering
Clustering
For each artifact found, its distance north (N) and east (E) of the marker is recorded
That is a Data Set
Before the dig, a historian has said that three families lived in the location
Clustering
Similar: close in physical distance
You assign each data point to one and only one group
The groups are called clusters
Clustering
Clustering is the unsupervised learning problem where you take your data and assign each data point to exactly one group, or cluster
Uses unlabelled data
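The dig example can be sketched with K-means, the algorithm named earlier. The code below is a minimal pure-Python version of the usual assign-then-update loop (Lloyd's algorithm). The artifact coordinates and the starting centroids are invented for illustration, and K = 3 follows the historian's claim that three families lived at the site.

```python
import math

def kmeans(points, initial_centroids, iterations=10):
    """Alternate between assigning points and moving centroids."""
    centroids = list(initial_centroids)  # copy so the caller's list is untouched
    assignment = []
    for _ in range(iterations):
        # Assignment step: each point joins exactly one cluster, the nearest centroid.
        assignment = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assignment, centroids

# Hypothetical artifact locations as (north, east) distances from the marker.
artifacts = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9), (1, 9), (2, 9), (1, 8)]
# Starting guesses for the three clusters (an assumption; real runs often pick these at random).
labels, centers = kmeans(artifacts, [artifacts[0], artifacts[3], artifacts[6]])
print(labels)
```

Here "similar" means close in physical distance, so math.dist (Euclidean distance, Python 3.8+) serves as the dissimilarity measure.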
Clustering
We may have collected data but we don’t know what to do with it
We might want to explore the data without a particular end goal in mind
Perhaps the data will suggest interesting avenues for further analysis
In this case, we say that we're performing exploratory data analysis
Exploratory data analysis
We don’t know what we are looking for
Data point = color of pixel and location of pixel
Dissimilarity is the distance in color
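One possible reading of "distance in color" is sketched below. It assumes each pixel stores an (R, G, B) color and uses Euclidean distance between colors. Both choices are assumptions, since the slide does not name a color space or a distance measure.

```python
import math

def color_dissimilarity(pixel_a, pixel_b):
    """Euclidean distance between two (R, G, B) colors; location is ignored."""
    return math.dist(pixel_a["color"], pixel_b["color"])

# Hypothetical pixels: each data point has a location and a color.
red = {"location": (0, 0), "color": (255, 0, 0)}
dark_red = {"location": (40, 7), "color": (200, 0, 0)}
blue = {"location": (0, 1), "color": (0, 0, 255)}

# Dark red is far away on the image but close in color, so it is more similar to red.
print(color_dissimilarity(red, dark_red) < color_dissimilarity(red, blue))  # True
```

Grouping pixels with this measure clusters the image by color regardless of where the pixels sit.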
Exploratory data analysis
In some cases labelling is too expensive
For example, news changes every day and there is too much of it to label
Using Big Data to Influence People
Alexander Nix, CEO Cambridge Analytica
Data Analysis as a Platform
THEN: Complex tools operated by data analysts; chaos of data silos across the company
NOW: Real-time data analytics platforms like Looker
Customer Data as a Platform
THEN: Difficult to customise, lack of automated customer insights
NOW: Real-time intelligence that automatically tracks and analyses interactions with customers
Mapping Data as a Platform
THEN: Difficult and expensive to collect data; limited in-app digital map usage
NOW: Mapping platforms like Mapbox
Cloud Data Monitoring as a Platform
THEN: Expensive and clunky point solutions; lengthy implementation cycles; only used by system administrators
NOW: Cloud monitoring platforms like Datadog
Next
Games and gamification
Why you should play video games and why it is important that your kids play video games