An introduction to big data, the cloud, the Internet of Things, predictive analytics, data science, and behaviour change
Prof Richard Vidgen, Hull University Business School
January 2014
Big data: an introduction
Internet of things
Ubiquitous computing
Big data
Data management
Data science
Better decisions
Big data in context
Social media
Data generation
Data storage and management
The cloud
Data analysis
Data visualization
Data analysis and presentation
Vidgen, R. (2014). Big data: an introduction. The BigDataScience blog. http://datasciencebusiness.wordpress.com/
Big data
• Big data is a general term used to describe the voluminous amount of unstructured and semi-structured data a company creates -- data that would take too much time and cost too much money to load into a relational database for analysis
• Although big data doesn't refer to any specific quantity, the term is often used when speaking about petabytes and exabytes of data
http://searchcloudcomputing.techtarget.com/definition/big-data-Big-Data
Data volumes
• 1 Gigabyte = 1000 megabytes
• 1 Terabyte = 1000 gigabytes
• 1 Petabyte = 1000 terabytes
• 1 Exabyte = 1000 petabytes
• 1 Zettabyte = 1000 exabytes
• 1 Yottabyte = 1000 zettabytes
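The factor-of-1000 ladder above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the `UNITS` list and the `convert` helper are hypothetical names, and the example value is the 15 petabytes per year quoted in this deck for the Large Hadron Collider.

```python
# Decimal storage units as listed above: each step up is a factor of 1000.
UNITS = ["megabyte", "gigabyte", "terabyte", "petabyte",
         "exabyte", "zettabyte", "yottabyte"]

def convert(value, from_unit, to_unit):
    """Convert a quantity between the decimal units listed above.

    `convert` is a hypothetical helper written for this illustration.
    """
    steps = UNITS.index(from_unit) - UNITS.index(to_unit)
    return value * 1000 ** steps

# The Large Hadron Collider figure quoted in this deck: 15 petabytes per year.
lhc_gigabytes = convert(15, "petabyte", "gigabyte")
print(f"15 petabytes = {lhc_gigabytes:,} gigabytes")
```

Moving two rungs down the ladder (petabyte to gigabyte) multiplies by 1000 twice, so 15 petabytes is 15 million gigabytes.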
Big data
The Large Hadron Collider generates 15 petabytes of data p.a.
Big is only big in context. It is not just about gigabytes: what counts is how data can be used to create value for individuals, organisations and society
but …
“The ‘big’ there is purely marketing,” Mr. Reed said. “This is all fear … This is about you buying big expensive servers and whatnot.” “The exciting thing is you can get a lot of this stuff done just in Excel,” he said. “You don’t need these big platforms. You don’t need all this big fancy stuff. If anyone says ‘big’ in front of it, you should look at them very skeptically … You can tell charlatans when they say ‘big’ in front of everything.”
http://chronicle.com/blogs/wiredcampus/big-data-is-bunk-obama-campaigns-tech-guru-tells-university-leaders/47885
Hype?
Inter-connectedness
Big data is not just a technical problem – it is part of a complex sociotechnical entanglement …
Regulatory and legal aspects
Technologies
Ethical implications
Stakeholders
Problems and “solutions”
Socio-political-economic factors
… with unintended consequences
http://www.timeshighereducation.co.uk/news/big-data-could-create-dystopian-future-for-students/2010061.article
“I fear that as we move into the big data age … this argument will not hold much currency any more. Then I worry that the predictions will take over, and schools, universities and colleges will not take any risks any more.” Professor Mayer-Schönberger, Oxford Internet Institute
Big data – what’s special about it?
• Zikopoulos et al. (2012), in an IBM publication, describe ‘Big Data’ as consisting of:
– Volume - increasing amounts of data compared with traditional settings
– Velocity - information is being generated at a rate that exceeds that of traditional systems
– Variety - multiple emerging forms of data that are of interest to enterprises, such as social media data
Zikopoulos P, Eaton C, DeRoos D, Deutsch T, Lapis G. 2012. Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data. McGraw-Hill.
A technical challenge
• “As data is increasingly becoming more varied, more complex and less structured, it has become imperative to process it quickly. Meeting such demanding requirements poses an enormous challenge for traditional databases and scale-up infrastructures. . . . Big Data refers to new scale-out architectures that address these needs. Big Data is fundamentally about massively distributed architectures and massively parallel processing using commodity building blocks to manage and analyze data.”
EMC. 2012. Big data-as-a-service: a market and technology perspective, http://www.emc.com/collateral/software/white-papers/h10839-big-data-as-a-service-perspt.pdf, July (accessed January 2013).
Solution - the cloud
• Cloud computing is a general term for anything that involves delivering hosted services over the Internet
• A cloud service has three distinct characteristics that differentiate it from traditional hosting:
– It is sold on demand, typically by the minute or the hour
– It is elastic -- a user can have as much or as little of a service as they want at any given time
– The service is fully managed by the provider (the consumer needs nothing but a personal computer and Internet access)
• These services are broadly divided into three categories:
– Infrastructure-as-a-Service (IaaS)
– Platform-as-a-Service (PaaS)
– Software-as-a-Service (SaaS)
• The cloud can be public or private
http://searchcloudcomputing.techtarget.com/definition/cloud-computing
http://www.bbc.co.uk/news/business-25773266
“IBM believes the cloud services market could be worth $200bn by 2020. Businesses are increasingly leasing data storage, computing power and web hosting services from a growing number of specialist cloud companies - effectively outsourcing their IT needs to cut costs and improve efficiency.”
Internet of Things (IoT)
• Although the concept wasn't named until 1999, the Internet of Things has been in development for decades
• The first Internet appliance was a Coke machine at Carnegie Mellon University in the early 1980s. The programmers could connect to the machine over the Internet, check the status of the machine and determine whether or not there would be a cold drink awaiting them, should they decide to make the trip down to the machine
http://whatis.techtarget.com/definition/Internet-of-Things
Internet of Things (IoT)
• The Internet of Things (IoT) is a scenario in which objects, animals or people are provided with unique identifiers and the ability to automatically transfer data over a network without requiring human-to-human or human-to-computer interaction
• So far, the Internet of Things has been most closely associated with machine-to-machine (M2M) communication in manufacturing and power, oil and gas utilities. Products built with M2M communication capabilities are often referred to as being smart (e.g., smart meter)
http://whatis.techtarget.com/definition/Internet-of-Things
Things
• A thing, in the Internet of Things, can be:
– a person with a heart monitor implant (physio sensing)
– a person with a brain scanner (neuro sensing)
– a farm animal with a biochip transponder
– an automobile that has built-in sensors to alert the driver when tire pressure is low
– … or any other natural or man-made object that can be assigned an IP address and provided with the ability to transfer data over a network
http://whatis.techtarget.com/definition/Internet-of-Things
http://consumertechnik.wordpress.com/2013/03/20/why-things-matter/
Mr Cameron said the UK and Germany could find themselves on the forefront of a new "industrial revolution". "I see the internet of things as a huge transformative development - a way of boosting productivity, of keeping us healthier, making transport more efficient, reducing energy needs, tackling climate change," he said.
BBC NEWS 9 March 2014
Ubiquitous computing
• Ubiquitous computing is the growing trend towards embedding microprocessors in everyday objects so they can communicate information
• Ubiquitous means “existing everywhere” - ubiquitous computing devices are completely connected and constantly available
• Ubiquitous computing relies on the convergence of wireless technologies, advanced electronics and the Internet
• The goal of researchers working in ubiquitous computing is to create smart products that communicate unobtrusively (e.g., wearable computers, Google Glass, smart meters)
http://searchnetworking.techtarget.com/definition/pervasive-computing
http://www.droid-life.com/2013/04/09/this-is-how-google-glass-works-infographic/
Big data
Data science
Better decisions
Analysis and outcomes
Data analysis
Data visualization
Data analysis and presentation
Using big data
http://www.slideshare.net/datasciencelondon/big-data-sorry-data-science-what-does-a-data-scientist-do
Better decisions - predictive analytics
• A predictive model that calculates strawberry purchases based on:
– Weather forecast
– Store temperature
– Freezer sensor data
– Remaining stock per shelf life
– Sales transaction point of sale feeds
– Web searches, social mentions
http://www.slideshare.net/datasciencelondon/big-data-sorry-data-science-what-does-a-data-scientist-do
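To make the idea of a predictive model concrete, here is a minimal sketch using only one of the inputs listed above (the weather forecast) and ordinary least squares regression. It is an illustration under stated assumptions, not the model from the cited presentation: the numbers are made up, and `fit_line`, `forecast_temp_c` and `punnets_sold` are hypothetical names. A real model would combine many more of the listed data feeds.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical, made-up history (NOT from the source): forecast temperature
# in degrees Celsius and punnets of strawberries sold on that day.
forecast_temp_c = [12, 15, 18, 21, 24]
punnets_sold = [40, 55, 70, 85, 100]

slope, intercept = fit_line(forecast_temp_c, punnets_sold)

# Predict sales for a day with a forecast of 20 degrees.
predicted = slope * 20 + intercept
print(f"Predicted sales at 20 C: {predicted:.0f} punnets")
```

With this toy history the fitted line gains 5 punnets per degree, so a 20 degree forecast predicts 80 punnets.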
Predictive analytics
• For example, what data might help us predict which students will drop out?
– Assessment grades at University
– Prior educational attainment
– Social background
– Distance of home from University
– Friendship circles and networks (e.g., sports club memberships)
– Attendance at lectures and tutorials
– Interaction in lectures and tutorials
– Time spent on campus
– Time spent in library
– Number of accesses to electronic learning resources
– Text books purchased
– Engagement in subject-related forums
– Sentiment of social media posts
– Etc.
http://www.slideshare.net/datasciencelondon/big-data-sorry-data-science-what-does-a-data-scientist-do
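A sketch of how two of the indicators above could feed a simple classifier. The technique shown is k-nearest neighbours, chosen here only because it is short; the source does not say which method would be used. All records are made up for illustration, and `past_students` and `predict` are hypothetical names.

```python
from collections import Counter
from math import dist

# Hypothetical, made-up records (NOT from the source). Each record is
# (lecture attendance rate, average assessment grade as a fraction),
# followed by the known outcome for that past student.
past_students = [
    ((0.95, 0.72), "stayed"),
    ((0.90, 0.65), "stayed"),
    ((0.85, 0.58), "stayed"),
    ((0.40, 0.45), "dropped out"),
    ((0.30, 0.50), "dropped out"),
    ((0.55, 0.38), "dropped out"),
]

def predict(student, k=3):
    """Return the majority outcome among the k most similar past students."""
    nearest = sorted(past_students,
                     key=lambda record: dist(record[0], student))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(predict((0.35, 0.48)))  # low attendance, middling grades
print(predict((0.92, 0.70)))  # high attendance, good grades
```

A prediction like this is only a signal for offering support; the deck's earlier slide on unintended consequences applies to how such output is used.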
Who works with the big data?
Some of the techniques data scientists use
• Classification
• Clustering
• Association rules
• Decision trees
• Regression
• Genetic algorithms
• Neural networks and support vector machines
• Machine learning
• Natural language processing
• Sentiment analysis
• Artificial intelligence
• Time series analysis
• Simulations
• Social network analysis
Technologies for data analysis: usage rates
King, J., & R. Magoulas (2013). Data Science Salary Survey. O’Reilly Media.
R and Python programming languages come above Excel
Enterprise products are at the bottom of the heap
Data visualization
Correlation matrix based on MPG, horsepower, engine size, number of cylinders, weight, etc.
https://boraberan.wordpress.com/2013/12/09/creating-a-correlation-matrix-in-tableau-using-r-or-table-calculations/
(Maserati is like a Ferrari; Lotus is not like a Cadillac)
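The numbers behind a correlation matrix can be sketched with the Pearson correlation coefficient. This is an illustration only: the cited post builds its matrix in Tableau and R, and compares car models with each other, whereas this sketch correlates attributes. The same `pearson` function applies to any two equal-length numeric series. The car figures are made up, and `pearson`, `correlation_matrix` and `cars` are hypothetical names.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def correlation_matrix(columns):
    """Correlate every column with every other column (and with itself)."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names}
            for a in names}

# Hypothetical, made-up figures for four cars (NOT from the source).
cars = {
    "mpg": [30, 25, 20, 15],
    "horsepower": [70, 100, 130, 160],
    "weight_tonnes": [1.0, 1.5, 2.5, 3.0],
}

matrix = correlation_matrix(cars)
for name, row in matrix.items():
    print(name.ljust(14), "  ".join(f"{value:+.2f}" for value in row.values()))
```

Values near +1 mean two series rise together, values near -1 mean one falls as the other rises. A visualization tool then maps these values to colours.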
“According to a recent Gartner report, 64% of enterprises surveyed indicate that they're deploying or planning Big Data projects. Yet even more acknowledge that they still don't know what to do with Big Data.”
Gartner On Big Data: Everyone's Doing It, No One Knows Why
Challenges of big data
http://readwrite.com/2013/09/18/gartner-on-big-data-everyones-doing-it-no-one-knows-why#awesm=~ost43oe8yXjDzr
Big data: it's about iteration
• Start small when tackling big data
• Use open source software
• Train existing employees who know the business rather than hunt for data talent
• Iterate on your project as you learn which data sources are valuable, and which questions yield real insights
• You don't have to know the end from the beginning, but you should have a clearer view of what you hope to achieve with Big Data than the Gartner report seems to indicate most have
http://readwrite.com/2013/09/18/gartner-on-big-data-everyones-doing-it-no-one-knows-why#awesm=~ost43oe8yXjDzr
Resources
McKinsey (2011). Big data: The next frontier for innovation, competition, and productivity http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation
Sogeti. Various reports on data analytics, privacy, legal aspects, predicting behaviour http://vint.sogeti.com/download-big-data-reports/
The Economist (2012). Big data: Lessons from the leaders http://www.economistinsights.com/sites/default/files/downloads/EIU_SAS_BigData_4.pdf