Prof Richard Vidgen Hull University Business School January 2014 Big data: an introduction

Introduction to big data


DESCRIPTION

An introduction to big data, the cloud, the Internet of Things, predictive analytics, data science, and behaviour change


Page 1: Introduction to big data

Prof Richard Vidgen Hull University Business School

January 2014

Big data: an introduction

Page 2: Introduction to big data

Big data in context

[Diagram labels: Internet of things; Ubiquitous computing; Social media; Data generation; Big data; Data management; The cloud; Data storage and management; Data science; Data analysis; Data visualization; Data analysis and presentation; Better decisions]

Vidgen, R. (2014). Big data: an introduction. The BigDataScience blog. http://datasciencebusiness.wordpress.com/

Page 3: Introduction to big data

Big data

•  Big data is a general term used to describe the voluminous amount of unstructured and semi-structured data a company creates -- data that would take too much time and cost too much money to load into a relational database for analysis

•  Although Big data doesn't refer to any specific quantity, the term is often used when speaking about petabytes and exabytes of data

http://searchcloudcomputing.techtarget.com/definition/big-data-Big-Data

Page 4: Introduction to big data

Data volumes

•  1 Gigabyte = 1000 megabytes

•  1 Terabyte = 1000 gigabytes

•  1 Petabyte = 1000 terabytes

•  1 Exabyte = 1000 petabytes

•  1 Zettabyte = 1000 exabytes

•  1 Yottabyte = 1000 zettabytes
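The unit ladder above can be sketched as a small conversion helper. This is a minimal illustration that assumes the decimal (factor of 1000) convention used on this slide; the function and variable names are made up for the example. It uses the figure of 15 petabytes per year that this deck quotes for the Large Hadron Collider.

```python
# Decimal storage units as listed on the slide: each step up is a factor of 1000.
UNITS = ["megabyte", "gigabyte", "terabyte", "petabyte",
         "exabyte", "zettabyte", "yottabyte"]


def convert(value: float, from_unit: str, to_unit: str) -> float:
    """Convert between the units above using powers of 1000."""
    steps = UNITS.index(from_unit) - UNITS.index(to_unit)
    return value * 1000 ** steps


# The Large Hadron Collider figure quoted in this deck: 15 petabytes per year.
lhc_terabytes = convert(15, "petabyte", "terabyte")
lhc_gigabytes = convert(15, "petabyte", "gigabyte")
print(lhc_terabytes, lhc_gigabytes)
```

Keeping the units in an ordered list means the conversion factor is just the distance between two positions, so adding a unit needs no new arithmetic.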

Big data

The Large Hadron Collider generates 15 petabytes of data per year

Big is only big in a context: it is not just about gigabytes – what counts is how data can be used to create value for individuals, organisations and society

but …

Page 5: Introduction to big data

Hype?

“The ‘big’ there is purely marketing,” Mr. Reed said. “This is all fear … This is about you buying big expensive servers and whatnot.” “The exciting thing is you can get a lot of this stuff done just in Excel,” he said. “You don’t need these big platforms. You don’t need all this big fancy stuff. If anyone says ‘big’ in front of it, you should look at them very skeptically … You can tell charlatans when they say ‘big’ in front of everything.”

http://chronicle.com/blogs/wiredcampus/big-data-is-bunk-obama-campaigns-tech-guru-tells-university-leaders/47885

Page 6: Introduction to big data

Inter-connectedness

Big data is not just a technical problem – it is part of a complex sociotechnical entanglement …

[Diagram labels: Regulatory and legal aspects; Technologies; Ethical implications; Stakeholders; Problems and “solutions”; Socio-political-economic factors]

… with unintended consequences

Page 7: Introduction to big data

“I fear that as we move into the big data age … this argument will not hold much currency any more. Then I worry that the predictions will take over, and schools, universities and colleges will not take any risks any more.” Professor Mayer-Schönberger, Oxford Internet Institute

http://www.timeshighereducation.co.uk/news/big-data-could-create-dystopian-future-for-students/2010061.article

Page 8: Introduction to big data

Big data – what’s special about it?

•  Zikopoulos et al. (2012), in an IBM publication, describe ‘Big Data’ as consisting of:
   – Volume - increasing amounts of data compared with traditional settings.
   – Velocity - information is being generated at a rate that exceeds that of traditional systems.
   – Variety - multiple emerging forms of data that are of interest to enterprises, such as social media data

Zikopoulos P, Eaton C, DeRoos D, Deutsch T, Lapis G. 2012. Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data. McGraw-Hill.

Page 9: Introduction to big data

A technical challenge

•  “As data is increasingly becoming more varied, more complex and less structured, it has become imperative to process it quickly. Meeting such demanding requirements poses an enormous challenge for traditional databases and scale-up infrastructures. . . . Big Data refers to new scale-out architectures that address these needs. Big Data is fundamentally about massively distributed architectures and massively parallel processing using commodity building blocks to manage and analyze data.”

EMC. 2012. Big data-as-a-service: a market and technology perspective, http://www.emc.com/collateral/software/white-papers/h10839-big-data-as-a-service-perspt.pdf, July (accessed January 2013).
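The scale-out idea in the quote above (split the data, process each piece independently, then combine the results) can be sketched on a single machine. This is only an illustration under stated assumptions: threads stand in for separate commodity machines, the word-count task and the sample lines are made up, and none of this is taken from the EMC paper or from any particular platform.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(partition: list[str]) -> Counter:
    """Map step: count words in one partition, independently of the others."""
    counts = Counter()
    for line in partition:
        counts.update(line.lower().split())
    return counts


def merge(partials) -> Counter:
    """Reduce step: combine the per-partition counts into one result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def parallel_word_count(lines: list[str], workers: int = 4) -> Counter:
    # Deal the lines out into one partition per worker.
    partitions = [lines[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_words, partitions))
    return merge(partials)


# Made-up sample input for illustration.
sample_lines = ["big data is big", "data science uses big data"]
print(parallel_word_count(sample_lines))
```

Because each partition is counted without reference to any other, adding workers (or, in a real system, machines) does not change the logic, only how the partitions are distributed.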

Page 10: Introduction to big data

Solution - the cloud

•  Cloud computing is a general term for anything that involves delivering hosted services over the Internet

•  A cloud service has three distinct characteristics that differentiate it from traditional hosting:
   –  It is sold on demand, typically by the minute or the hour
   –  It is elastic -- a user can have as much or as little of a service as they want at any given time
   –  The service is fully managed by the provider (the consumer needs nothing but a personal computer and Internet access)

•  These services are broadly divided into three categories:
   –  Infrastructure-as-a-Service (IaaS)
   –  Platform-as-a-Service (PaaS)
   –  Software-as-a-Service (SaaS)

•  The cloud can be public or private

http://searchcloudcomputing.techtarget.com/definition/cloud-computing
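The "on demand" and "elastic" characteristics above can be made concrete with a small arithmetic sketch. Everything numeric here is hypothetical: the price, the demand profile, and the simplification that fixed capacity costs the same per server-hour as on-demand capacity are assumptions for illustration, not real provider figures.

```python
# Hypothetical price, in pence per server per hour; real pricing differs.
PRICE_PER_SERVER_HOUR = 10


def on_demand_cost(servers_needed_per_hour: list[int]) -> int:
    """Elastic service: pay only for the servers used in each hour."""
    return sum(servers_needed_per_hour) * PRICE_PER_SERVER_HOUR


def fixed_capacity_cost(servers_needed_per_hour: list[int]) -> int:
    """Fixed hosting: provision for the peak hour and pay for it every hour."""
    peak = max(servers_needed_per_hour)
    return peak * len(servers_needed_per_hour) * PRICE_PER_SERVER_HOUR


# Made-up demand profile: servers needed in each of six consecutive hours.
demand = [2, 2, 3, 10, 4, 2]
print(on_demand_cost(demand), fixed_capacity_cost(demand))
```

The gap between the two totals comes entirely from the hours in which demand sits below the peak, which is the case elasticity is meant to address.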

Page 11: Introduction to big data

“IBM believes the cloud services market could be worth $200bn by 2020. Businesses are increasingly leasing data storage, computing power and web hosting services from a growing number of specialist cloud companies - effectively outsourcing their IT needs to cut costs and improve efficiency.”

http://www.bbc.co.uk/news/business-25773266

Page 12: Introduction to big data

Internet of Things (IoT)

•  Although the concept wasn't named until 1999, the Internet of Things has been in development for decades

•  The first Internet appliance was a Coke machine at Carnegie Mellon University in the early 1980s. The programmers could connect to the machine over the Internet, check the status of the machine and determine whether or not there would be a cold drink awaiting them, should they decide to make the trip down to the machine

http://whatis.techtarget.com/definition/Internet-of-Things

Page 13: Introduction to big data

Internet of Things (IoT)

•  The Internet of Things (IoT) is a scenario in which objects, animals or people are provided with unique identifiers and the ability to automatically transfer data over a network without requiring human-to-human or human-to-computer interaction

•  So far, the Internet of Things has been most closely associated with machine-to-machine (M2M) communication in manufacturing and power, oil and gas utilities. Products built with M2M communication capabilities are often referred to as being smart (e.g., smart meter)

http://whatis.techtarget.com/definition/Internet-of-Things
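The two ingredients in the definition above, a unique identifier and automatic data transfer, can be sketched as a message that a smart meter might emit. The field names, the JSON format and the sample values are assumptions for illustration; real M2M protocols and meter data formats vary.

```python
import json
import uuid
from dataclasses import dataclass, asdict


@dataclass
class MeterReading:
    device_id: str   # unique identifier for the "thing"
    kwh: float       # energy used since the last reading (hypothetical field)
    timestamp: str   # time of the reading, ISO 8601


def encode(reading: MeterReading) -> str:
    """Serialise a reading so it can be sent over a network without human input."""
    return json.dumps(asdict(reading), sort_keys=True)


def decode(payload: str) -> MeterReading:
    """What the receiving machine would do with the message."""
    return MeterReading(**json.loads(payload))


# A random UUID stands in for an identifier assigned when the device is made.
device_id = str(uuid.uuid4())
reading = MeterReading(device_id=device_id, kwh=1.25,
                       timestamp="2014-01-15T09:00:00Z")
payload = encode(reading)
print(payload)
```

The network transport itself is left out; the point is that both ends agree on a machine-readable message, so no person has to read or re-key the meter.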

Page 14: Introduction to big data

Things

•  A thing, in the Internet of Things, can be:
   –  a person with a heart monitor implant (physio sensing)
   –  a person with a brain scanner (neuro sensing)
   –  a farm animal with a biochip transponder
   –  an automobile that has built-in sensors to alert the driver when tire pressure is low
   –  … or any other natural or man-made object that can be assigned an IP address and provided with the ability to transfer data over a network

http://whatis.techtarget.com/definition/Internet-of-Things

Page 15: Introduction to big data

http://consumertechnik.wordpress.com/2013/03/20/why-things-matter/

Page 16: Introduction to big data

Mr Cameron said the UK and Germany could find themselves on the forefront of a new "industrial revolution". "I see the internet of things as a huge transformative development - a way of boosting productivity, of keeping us healthier, making transport more efficient, reducing energy needs, tackling climate change," he said.

BBC NEWS 9 March 2014

Page 17: Introduction to big data

Ubiquitous computing

•  Ubiquitous computing is the growing trend towards embedding microprocessors in everyday objects so they can communicate information

•  Ubiquitous means "existing everywhere" - ubiquitous computing devices are completely connected and constantly available

•  Ubiquitous computing relies on the convergence of wireless technologies, advanced electronics and the Internet

•  The goal of researchers working in ubiquitous computing is to create smart products that communicate unobtrusively (e.g., wearable computers, Google Glass, smart meters)

http://searchnetworking.techtarget.com/definition/pervasive-computing

Page 18: Introduction to big data

http://www.droid-life.com/2013/04/09/this-is-how-google-glass-works-infographic/

Page 19: Introduction to big data

Analysis and outcomes

[Diagram labels: Big data; Data science; Data analysis; Data visualization; Data analysis and presentation; Better decisions]

Vidgen, R. (2014). Big data: an introduction. The BigDataScience blog. http://datasciencebusiness.wordpress.com/

Page 20: Introduction to big data

Using big data

http://www.slideshare.net/datasciencelondon/big-data-sorry-data-science-what-does-a-data-scientist-do

Page 21: Introduction to big data

Better decisions - predictive analytics

•  A predictive model that calculates strawberry purchases based on:
   – Weather forecast
   – Store temperature
   – Freezer sensor data
   – Remaining stock per shelf life
   – Sales transaction point of sale feeds
   – Web searches, social mentions

http://www.slideshare.net/datasciencelondon/big-data-sorry-data-science-what-does-a-data-scientist-do
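A heavily simplified version of such a model can be sketched with a single input. The example below fits a straight line relating forecast temperature to strawberry sales using ordinary least squares; the data are made up for illustration, only one of the listed inputs is used, and a real model would combine many inputs and would need validating against held-out sales data.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for one input: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up history: forecast high (degrees C) and punnets sold on that day.
forecast_temp = [12.0, 16.0, 20.0, 24.0]
punnets_sold = [46.0, 58.0, 70.0, 82.0]

slope, intercept = fit_line(forecast_temp, punnets_sold)


def predict(temp: float) -> float:
    """Predicted punnets sold for a given forecast temperature."""
    return intercept + slope * temp


print(round(predict(28.0)))
```

The fitted slope is the extra sales expected per additional degree in the forecast, which is the kind of quantity a store could use when deciding how much stock to order.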

Page 22: Introduction to big data

Predictive analytics

•  For example, what data might help us predict which students will drop out?
   –  Assessment grades at University
   –  Prior education attainment
   –  Social background
   –  Distance of home from University
   –  Friendship circles and networks (e.g., sports club memberships)
   –  Attendance at lectures and tutorials
   –  Interaction in lectures and tutorials
   –  Time spent on campus
   –  Time spent in library
   –  Number of accesses to electronic learning resources
   –  Text books purchased
   –  Engagement in subject-related forums
   –  Sentiment of social media posts
   –  Etc.
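One common way to turn indicators like these into a prediction is a logistic model, which converts a weighted sum of inputs into a score between 0 and 1. The sketch below uses three of the listed indicators; the weights, the bias and the two example students are invented by hand for illustration, whereas a real model would estimate the weights from historical student records.

```python
import math

# Hypothetical weights chosen by hand; negative means "lowers dropout risk".
WEIGHTS = {
    "attendance_rate": -3.0,        # share of lectures and tutorials attended, 0 to 1
    "average_grade": -0.04,         # assessment grade as a percentage, 0 to 100
    "resource_accesses_per_week": -0.2,  # accesses to electronic learning resources
}
BIAS = 4.0


def dropout_risk(student: dict[str, float]) -> float:
    """Logistic model: squash a weighted sum of indicators into a 0-1 score."""
    z = BIAS + sum(WEIGHTS[name] * student[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


# Two made-up students for comparison.
engaged = {"attendance_rate": 0.9, "average_grade": 65,
           "resource_accesses_per_week": 5}
disengaged = {"attendance_rate": 0.3, "average_grade": 45,
              "resource_accesses_per_week": 1}

print(round(dropout_risk(engaged), 2), round(dropout_risk(disengaged), 2))
```

The output is a score, not a verdict: how such scores are acted on is exactly the concern raised in the Mayer-Schönberger quote earlier in this deck.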

Page 23: Introduction to big data

http://www.slideshare.net/datasciencelondon/big-data-sorry-data-science-what-does-a-data-scientist-do

Who works with the big data?

Page 24: Introduction to big data

Some of the techniques data scientists use

•  Classification

•  Clustering

•  Association rules

•  Decision trees

•  Regression

•  Genetic algorithms

•  Neural networks and support vector machines

•  Machine learning

•  Natural language processing

•  Sentiment analysis

•  Artificial intelligence

•  Time series analysis

•  Simulations

•  Social network analysis

Page 25: Introduction to big data

Technologies for data analysis: usage rates

King, J., & R. Magoulas (2013). Data Science Salary Survey. O’Reilly Media.

R and Python programming languages come above Excel

Enterprise products bottom of the heap

Page 26: Introduction to big data

Data visualization

Correlation matrix based on MPG, horsepower, engine size, number of cylinders, weight, etc.

(Maserati is like a Ferrari; Lotus is not like a Cadillac)

https://boraberan.wordpress.com/2013/12/09/creating-a-correlation-matrix-in-tableau-using-r-or-table-calculations/
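The calculation behind a correlation matrix can be sketched in a few lines. Note the assumptions: the slide's matrix appears to compare cars with one another, whereas this simpler sketch correlates the attributes (MPG, horsepower, weight) across cars; the four cars and their values are made up for illustration and are not taken from the cited post.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def correlation_matrix(columns: dict[str, list[float]]) -> dict[str, dict[str, float]]:
    """Correlate every column with every other column (and with itself)."""
    return {a: {b: pearson(columns[a], columns[b]) for b in columns}
            for a in columns}


# Made-up values for four hypothetical cars.
cars = {
    "mpg": [15.0, 20.0, 30.0, 10.0],
    "horsepower": [330.0, 180.0, 110.0, 210.0],
    "weight": [3.6, 2.8, 1.5, 5.2],
}

matrix = correlation_matrix(cars)
for name, row in matrix.items():
    print(name, [round(row[other], 2) for other in cars])
```

To compare cars rather than attributes, the same `pearson` function could be applied to each pair of cars after putting the attributes on a common scale, since raw values such as horsepower and MPG are not directly comparable.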

Page 27: Introduction to big data

Challenges of big data

“According to a recent Gartner report, 64% of enterprises surveyed indicate that they're deploying or planning Big Data projects. Yet even more acknowledge that they still don't know what to do with Big Data.”

Gartner On Big Data: Everyone's Doing It, No One Knows Why

http://readwrite.com/2013/09/18/gartner-on-big-data-everyones-doing-it-no-one-knows-why#awesm=~ost43oe8yXjDzr

Page 28: Introduction to big data

Big data: it's about iteration

•  Start small when tackling big data

•  Go open source software

•  Train existing employees who know the business rather than hunt for data talent

•  Iterate on your project as you learn which data sources are valuable, and which questions yield real insights

•  You don't have to know the end from the beginning, but you should have a clearer view of what you hope to achieve with Big Data than the Gartner report seems to indicate most have

http://readwrite.com/2013/09/18/gartner-on-big-data-everyones-doing-it-no-one-knows-why#awesm=~ost43oe8yXjDzr

Page 29: Introduction to big data

Resources

McKinsey (2011). Big data: The next frontier for innovation, competition, and productivity http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation

Sogeti. Various reports on data analytics, privacy, legal aspects, predicting behaviour http://vint.sogeti.com/download-big-data-reports/

The Economist (2012). Big data: Lessons from the leaders http://www.economistinsights.com/sites/default/files/downloads/EIU_SAS_BigData_4.pdf