L21 Big Data and Analytics

Page 1: L21 Big Data and Analytics

LECTURE L21

Big Data and Analytics

Page 2: L21 Big Data and Analytics

Big Data

With the computer revolution, digital data became possible

Over the years, data has grown exponentially

“Big Data” has become a platform by itself with new possibilities

Page 3: L21 Big Data and Analytics

Global Data is Growing Fast

Data in Digital Universe vs. Data Storage Cost, 2010-2015

Source: Mary Meeker, KPCB

Page 4: L21 Big Data and Analytics

Data is a New Growth Platform

The Network

The Software

The Infrastructure

The Data

Large investments in fiber optic & last-mile cable created connectivity that facilitated the early Internet growth

Optimising the network with software became far more capital efficient than additional capital expenditure buildouts, ultimately resulting in the creation of pervasive networks (Siloed DCs -> AWS) and pervasive software (Siebel -> Salesforce)

Emergence of pervasive software created the need to optimise the performance of the network and store extraordinary amounts of data at extremely low prices

Next Big Wave: Leveraging this unlimited connectivity and storage to collect / aggregate / correlate / interpret all of this data to improve people’s lives and enable enterprises to operate more efficiently

Page 5: L21 Big Data and Analytics

Evolution of Data Platform

Source: Mary Meeker, KPCB

Page 6: L21 Big Data and Analytics

Data Generators

Source: Mary Meeker, KPCB

Page 7: L21 Big Data and Analytics

Improve people’s lives and enable enterprises to operate more efficiently

Page 8: L21 Big Data and Analytics

“Data is moving from something you use outside the workstream to becoming a part of the business app itself.”

Frank Bien, CEO of Looker

Page 9: L21 Big Data and Analytics

Big Data Examples

Page 10: L21 Big Data and Analytics

Big Data Examples

Macy's Inc. and real-time pricing

The retailer adjusts pricing in near-real time for 73 million items, based on demand and inventory.

Source: Ten big data case studies in a nutshell

Page 11: L21 Big Data and Analytics

Big Data Examples

Tipp24 AG, a platform for placing bets

The company uses software to analyse billions of transactions and hundreds of customer attributes, and to develop predictive models that target customers and personalise marketing messages on the fly.

Source: Ten big data case studies in a nutshell

Page 12: L21 Big Data and Analytics

Big Data Examples

Wal-Mart Stores Inc. and search

The mega-retailer's latest search engine for Walmart.com includes semantic data. The platform, which was designed in-house, relies on text analysis, machine learning and even synonym mining to produce relevant search results.

Wal-Mart says adding semantic search has increased the rate of online shoppers completing a purchase by 10% to 15%.

Source: Ten big data case studies in a nutshell

Page 13: L21 Big Data and Analytics

Big Data Examples

PredPol Inc. and repurposing

The Los Angeles and Santa Cruz police departments, a team of educators and a company called PredPol have taken an algorithm used to predict earthquakes, tweaked it and started feeding it crime data.

The software can predict where crimes are likely to occur down to 500 square feet. In LA, there's been a 33% reduction in burglaries and 21% reduction in violent crimes in areas where the software is being used.

Source: Ten big data case studies in a nutshell

Page 14: L21 Big Data and Analytics

Big Data Examples

American Express and business intelligence

AmEx started looking for indicators that could really predict loyalty, and developed sophisticated predictive models that analyse historical transactions and 115 variables to forecast potential churn.

The company believes it can now identify 24% of Australian accounts that will close within the next four months.

Source: Ten big data case studies in a nutshell

Page 15: L21 Big Data and Analytics

Big Data Examples

A Bank and IBM

A large US bank uses IBM machine learning technologies to analyse credit card transactions.

Using machine learning and stream computing to detect financial fraud

Page 16: L21 Big Data and Analytics

TEDxUofM - Jameson Toole - Big Data for Tomorrow

Page 17: L21 Big Data and Analytics
Page 18: L21 Big Data and Analytics

What is Big Data?

Page 19: L21 Big Data and Analytics

What is Big Data?

Big data is high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation.

Gartner

Page 20: L21 Big Data and Analytics

What is Big Data?

Big data refers to a process that is used when traditional data mining and handling techniques cannot uncover the insights and meaning of the underlying data. Data that is unstructured or time sensitive or simply very large cannot be processed by relational database engines. This type of data requires a different processing approach called big data, which uses massive parallelism on readily-available hardware.

Techopedia

Page 21: L21 Big Data and Analytics

What is Big Data?

“Big data is the oil of the 21st century and analytics is the combustion engine.”

Peter Sondergaard, Gartner Research

Page 30: L21 Big Data and Analytics

What is Big Data?

Byte: one grain of rice

Kilobyte: handful of rice

Megabyte: big pot of rice

Gigabyte: truck full of rice

Terabyte: container ship full of rice

Petabyte: covers Manhattan

Exabyte: covers the west coast of the US

Zettabyte: fills the Pacific

Yottabyte: Earth-sized rice ball

David Wellman: What is Big Data?

Big Data

Internet

Computers

Early computers
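
Each step on the rice scale is a factor of 1000 in bytes. A minimal sketch of that ladder follows, assuming decimal (SI) units; some systems use powers of 1024 instead. The names `bytes_in` and `humanize` are hypothetical helpers, not from the lecture.

```python
# Decimal (SI) storage units, each 1000 times larger than the previous one.
UNITS = ["B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def bytes_in(unit: str) -> int:
    """Return how many bytes one of the given unit holds (powers of 1000)."""
    return 1000 ** UNITS.index(unit)

def humanize(n_bytes: int) -> str:
    """Express a byte count in the largest unit that keeps the value at 1 or more."""
    value = float(n_bytes)
    for unit in UNITS:
        # Stop at the first unit where the value drops below 1000,
        # or at the last unit if the number is larger than the scale.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= 1000

print(bytes_in("PB"))     # bytes in one petabyte
print(humanize(10**21))   # a zettabyte-sized count
```

One exabyte is therefore a million terabytes, which is why the analogy jumps from a container ship to a coastline.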

Page 31: L21 Big Data and Analytics

What is Big Data?

Big Data is not about the size of the data, it’s about the value within the data

This value can be used for marketing, business optimisation, getting insights, improving health, security, etc.

Page 32: L21 Big Data and Analytics

Data Analytics

Page 33: L21 Big Data and Analytics

Why Big Data Analytics?

Understand the data the company has

Process data to see patterns, correlations and information that can be used to make better decisions

Obtain insights that are otherwise not known

Page 34: L21 Big Data and Analytics

Data Analytics

TRADITIONAL APPROACH: Structured and Repeatable Analysis

Business users: Determine what questions to ask

IT: Structures the data to answer the question

BIG DATA APPROACH: Iterative and Exploratory Analysis

IT: Delivers a platform to enable creative discovery

Business users: Explore what questions could be asked

Page 35: L21 Big Data and Analytics

Tools for Data Analytics

NoSQL databases: MongoDB, Cassandra, HBase, Hypertable

Storage: S3, Hadoop Distributed File System

Servers: EC2, Google App Engine, Heroku

MapReduce: Hadoop, Hive, Pig, Cascading, S4, MapR

Processing: R, Yahoo! Pipes, Solr/Lucene, BigSheets
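
The MapReduce entry in the list above names a programming model as well as a set of tools. The sketch below shows the idea with a word count, run in a single Python process. It is not the Hadoop API: the function names `map_phase`, `shuffle` and `reduce_phase` are hypothetical, and a real cluster would run the map and reduce steps in parallel on many machines.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document: str):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    # Each document could be mapped on a different machine;
    # here the map outputs are simply chained together.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

counts = word_count(["big data is big", "data is value"])
print(counts)
```

Because every map call sees only its own document and every reduce call sees only one key, the work can be split across readily available hardware without the steps needing to coordinate.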

Page 36: L21 Big Data and Analytics

Two Types of Data Analysis Problems

Supervised Learning: Learn from data where we have labels for all the data we’ve seen so far

Example: Determining Spam Emails

Unsupervised Learning: Learn from data where we don’t have any labels

Example: Grouping Emails

Learning is about discovering hidden patterns in data
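
To make the spam example concrete, here is a minimal sketch of supervised learning: a small naive Bayes classifier trained on labelled emails. The lecture does not name a specific algorithm, so naive Bayes is an assumption chosen for brevity, and the four training emails are made up for illustration.

```python
import math
from collections import Counter

def train(labelled_emails):
    """Count words per label from (text, label) pairs."""
    word_counts = {}          # label -> Counter of words seen with that label
    label_counts = Counter()  # label -> number of emails with that label
    for text, label in labelled_emails:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, label_counts

def predict(text, word_counts, label_counts):
    """Return the label with the highest log-probability (add-one smoothing)."""
    vocabulary = set().union(*word_counts.values())
    total_emails = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        score = math.log(label_counts[label] / total_emails)
        total_words = sum(counts.values())
        for word in text.lower().split():
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical labelled data: the labels are what make this supervised.
emails = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lecture notes for monday", "ham"),
]
model = train(emails)
print(predict("win cheap money", *model))
```

In the unsupervised case the second element of each pair would be missing, and the task becomes grouping similar emails rather than predicting a known label.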

Page 37: L21 Big Data and Analytics

Clustering

One of the oldest problems in unsupervised data analysis

In clustering the goal is to group data according to similarity

Algorithms such as K-means are used for clustering

Page 38: L21 Big Data and Analytics

Clustering

For each artifact found, its distance north (N) and east (E) of the marker is recorded

That is a Data Set

Before the dig, a historian has said that three families lived in the location

Page 39: L21 Big Data and Analytics

Clustering

Similar: close in physical distance

You assign each data point to one and only one group

The groups are called clusters
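
The dig example can be sketched with K-means. The code below is a plain implementation of the standard assign-and-update loop (Lloyd's algorithm), with k = 3 because the historian expects three families. The artifact coordinates are hypothetical, and the helper name `kmeans` is not from the lecture.

```python
import math
import random

def kmeans(points, k, iterations=20, seed=0):
    """Alternate between assigning points to centres and moving the centres."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)  # start from k data points chosen at random
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centre.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centres[i]))
            clusters[nearest].append(p)
        # Update step: move each centre to the mean of its cluster.
        for i, members in enumerate(clusters):
            if members:  # keep the old centre if a cluster ends up empty
                centres[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centres, clusters

# Hypothetical artifact locations as (metres north, metres east) of the marker.
artifacts = [
    (1.0, 1.2), (1.5, 0.8), (0.8, 1.0),
    (10.0, 10.5), (10.4, 9.8), (9.7, 10.1),
    (1.0, 10.0), (1.3, 9.6), (0.7, 10.3),
]
centres, clusters = kmeans(artifacts, k=3)
for centre, members in zip(centres, clusters):
    print(centre, members)
```

Every artifact lands in exactly one cluster, as the slide requires. Which clusters come out can depend on the random starting centres, so in practice the algorithm is often run several times with different seeds.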

Page 40: L21 Big Data and Analytics

Clustering

Clustering is the unsupervised learning problem where you take your data and assign each data point to exactly one group, or cluster

Uses unlabelled data

Page 41: L21 Big Data and Analytics

Clustering

We may have collected data but we don’t know what to do with it

We might want to explore the data without a particular end goal in mind

Perhaps the data will suggest interesting avenues for further analysis

In this case, we say that we're performing exploratory data analysis

Page 42: L21 Big Data and Analytics

Exploratory data analysis

We don’t know what we are looking for

Data point = color of pixel and location of pixel

Dissimilarity is the distance in color
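
A minimal sketch of this pixel representation follows, assuming RGB colours and Euclidean distance as the measure of "distance in color". The slide does not specify either, and the pixel values and palette are hypothetical.

```python
import math

def colour_distance(a, b):
    """Dissimilarity of two pixels: Euclidean distance between their RGB colours."""
    # Each pixel is ((x, y), (r, g, b)); only the colour part is compared.
    return math.dist(a[1], b[1])

def nearest_colour(pixel, palette):
    """Index of the palette colour closest to the pixel's colour."""
    return min(range(len(palette)), key=lambda i: math.dist(pixel[1], palette[i]))

# Hypothetical pixels: a location (x, y) and a colour (r, g, b).
pixels = [
    ((0, 0), (250, 10, 10)),   # reddish
    ((0, 1), (240, 20, 15)),   # reddish, right next to the first pixel
    ((5, 5), (10, 10, 245)),   # bluish
]
palette = [(255, 0, 0), (0, 0, 255)]  # pure red and pure blue

print(colour_distance(pixels[0], pixels[1]))
print([nearest_colour(p, palette) for p in pixels])
```

Grouping pixels by their nearest colour is the same assignment step that clustering uses, with colour distance in place of physical distance.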

Page 43: L21 Big Data and Analytics

Exploratory data analysis

In some cases labelling is too expensive

For example, news changes every day and there is too much of it

Page 44: L21 Big Data and Analytics

Using Big Data to Influence People

Page 45: L21 Big Data and Analytics

Alexander Nix, CEO Cambridge Analytica

Page 46: L21 Big Data and Analytics
Page 47: L21 Big Data and Analytics

Data Analysis as a Platform

THEN: Complex tools operated by Data Analysts

THEN: Chaos of data silos across the company

NOW: Real-time data analytics platform like Looker

Page 48: L21 Big Data and Analytics

Customer Data as a Platform

THEN: Difficult to customise, lack of automated customer insights

NOW: Real-time intelligence that automatically tracks and analyses interactions with customers

Page 49: L21 Big Data and Analytics

Mapping Data as a Platform

THEN: Difficult and expensive to collect data

THEN: Limited in-app digital map usage

NOW: Mapping platforms like Mapbox

Page 50: L21 Big Data and Analytics

Cloud Data Monitoring as a Platform

THEN: Expensive and clunky point solutions

THEN: Lengthy implementation cycles

THEN: Only used by System Administrators

NOW: Cloud monitoring platforms like Datadog

Page 51: L21 Big Data and Analytics

Next

Games and gamification

Why you should play video games and why it is important that your kids play video games