Big Data – so what’s the big deal?

Jevin D. West iSchool, University of Washington

jevinw@uw.edu

What is big data?

Agenda
•  Introductions
•  Big data – why should you care?
•  Introduction to big data
•  Examples
•  Nuances in big data
•  Big data at UW and in Seattle
•  An exercise in big data
•  Concerns
•  Big data skills
•  Homework
•  References


MGH 330D

[Figure: map of citation flow among scientific fields. Fields shown: Molecular & Cell Biology, Medicine, Physics, Ecology & Evolution, Economics, Geosciences, Psychology, Chemistry, Psychiatry, Environmental Chemistry & Microbiology, Mathematics, Computer Science, Analytic Chemistry, Business & Marketing, Political Science, Fluid Mechanics, Medical Imaging, Material Engineering, Sociology, Probability & Statistics, Astronomy & Astrophysics, Gastroenterology, Law, Chemical Engineering, Education, Telecommunication, Control Theory, Operations Research, Ophthalmology, Crop Science, Geography, Anthropology, Computer Imaging, Agriculture, Parasitology, Dentistry, Dermatology, Urology, Rheumatology, Applied Acoustics, Pharmacology, Pathology, Otolaryngology, Electromagnetic Engineering, Circuits, Power Systems, Tribology, Neuroscience, Orthopedics, Veterinary, Environmental Health. Legend, for two fields A and B: citation flow within field; citation flow out of field; citation flow from A to B; citation flow from B to A.]

Why should you care about big data?

A shortage of 1.5 million jobs!

Universities are going big

What is big data?

“Yes, some of the best theorizing comes after collecting data because then you become aware of another reality…”

Robert Shiller, Nobel Prize in Economics (2013)

Data Exhaust

Data Exhaust: by-product of human activity

Examples: cell phone locations, purchase transactions, social media

Applications: predicting human behavior, spread of infectious disease

Barabási et al., Nature (2008); Ginsberg et al., Nature (2009)

Why big data?

•  Cheaper sensors (climate research, astronomy, high energy physics, high-throughput gene sequencing, cell phones)
•  Cheaper storage (4 TB, $168)
•  People willing to share their personal information (Facebook, social media)
•  Faster communication (internet, cell phones)
•  Other reasons?
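To put the storage bullet in perspective, the quoted price can be turned into a unit cost. This is a minimal sketch that assumes decimal units (1 TB = 1000 GB); the function name is illustrative and not from the slides.

```python
def cost_per_gb(price_usd: float, capacity_tb: float) -> float:
    """Unit cost in dollars per gigabyte, assuming 1 TB = 1000 GB."""
    return price_usd / (capacity_tb * 1000)


# Figures quoted on the slide: a 4 TB drive for $168.
price_usd, capacity_tb = 168, 4

print(f"${price_usd / capacity_tb:.2f} per TB")
print(f"${cost_per_gb(price_usd, capacity_tb):.3f} per GB")
```

At roughly four cents per gigabyte, keeping data is often cheaper than deciding what to throw away.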

The Four A’s

•  Architecture
•  Acquisition
•  Analysis
•  Archiving

The Four V’s

•  Volume
•  Velocity
•  Variety
•  Veracity

Big Data is messy

Correlation versus Causation
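A small simulation can make the distinction concrete. The sketch below is an illustration, not data from the lecture: it assumes a hidden variable z (a confounder) that drives two otherwise unrelated quantities x and y. All variable names and noise levels are made up for the example.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000

# Hypothetical confounder: x and y each depend on z, but not on each other.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 0.5) for zi in z]
y = [zi + rng.gauss(0, 0.5) for zi in z]

r_raw = pearson(x, y)

# Remove the confounder's contribution. We can subtract z directly only
# because this simulation fixes its coefficient at 1; with real data the
# coefficient would have to be estimated, for example by regression.
x_res = [xi - zi for xi, zi in zip(x, z)]
y_res = [yi - zi for yi, zi in zip(y, z)]
r_adjusted = pearson(x_res, y_res)

print(f"correlation of x and y:           {r_raw:.2f}")
print(f"correlation after removing z:     {r_adjusted:.2f}")
```

With these settings the theoretical correlation between x and y is 0.8, even though neither causes the other; once z is accounted for, the remaining correlation is close to zero.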

Sampling

Big Data at UW

•  LSST
•  CS (Farecast)
•  Libraries (digital content)
•  Oceanography
•  Neuroscience

Is there a secondary market for the data that companies are collecting?

Big Data in action

DJ Patil

If you had access to the personal calendars of 200 million people, what could you do with them? What products could you create?

The power of meta data…

http://qz.com/140357/what-your-facebook-friends-list-reveals-about-your-love-life/

It’s only meta data…

http://kieranhealy.org/blog/archives/2013/06/09/using-metadata-to-find-paul-revere/

Betweenness centrality

Eigenvector centrality
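The two measures above can be computed from nothing more than membership lists, which is the point of the Paul Revere example. The sketch below is an illustration under stated assumptions: the people, clubs, and function names are invented, the graph links any two people who share a club, betweenness uses Brandes' algorithm on the unweighted graph, and eigenvector centrality uses plain power iteration. It is not the code or the data from the linked post.

```python
from collections import deque

# Hypothetical metadata: who belongs to which organization.
memberships = {
    "Alice": {"ClubA"},
    "Bob": {"ClubA"},
    "Carol": {"ClubA", "ClubB"},
    "Dave": {"ClubB"},
    "Eve": {"ClubB"},
}


def co_membership_graph(memberships):
    """Link two people whenever they share at least one organization."""
    people = list(memberships)
    adj = {p: set() for p in people}
    for i, p in enumerate(people):
        for q in people[i + 1:]:
            if memberships[p] & memberships[q]:
                adj[p].add(q)
                adj[q].add(p)
    return adj


def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes' algorithm, unweighted)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        order = []                       # nodes in order of distance from s
        preds = {v: [] for v in adj}     # predecessors on shortest paths
        sigma = {v: 0 for v in adj}      # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):        # accumulate from farthest to nearest
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: b / 2 for v, b in bc.items()}


def eigenvector_centrality(adj, iterations=100):
    """Power iteration: a node scores highly if its neighbors score highly."""
    score = {v: 1.0 for v in adj}
    for _ in range(iterations):
        new = {v: sum(score[w] for w in adj[v]) for v in adj}
        norm = sum(x * x for x in new.values()) ** 0.5
        score = {v: x / norm for v, x in new.items()}
    return score


graph = co_membership_graph(memberships)
bc = betweenness(graph)
ec = eigenvector_centrality(graph)
for person in graph:
    print(f"{person:6s} betweenness={bc[person]:.1f} eigenvector={ec[person]:.3f}")
```

In this toy network Carol is the only person in both clubs, so every shortest path between the two groups runs through her and she ranks first on both measures, without any message content being examined.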


Big data is about asking good questions

Concerns

•  Privacy
•  Probabilistic Models
•  Correlation
•  The big players own the big data
•  NSA
•  Reproducibility
•  What else?

Enjoy the wave but be cautious…

‘The Data Scientist’

•  Communication skills
•  Ethical Reasoning
•  Information/Data Management
•  Personnel Management
•  Interdisciplinary
•  Adaptable

Big Data involves people

Homework

Find an example of big data and explain (1) why you think it is big data and (2) how it involves people.


References

“Data is increasingly digital air: the oxygen we breathe and the carbon dioxide that we exhale. It can be a source of both sustenance and pollution.” -- danah boyd

D. Boyd & K. Crawford (2011). Six Provocations for Big Data. SSRN.

Why should you care about big data?

Jobs

Privacy

jevinw@uw.edu

UW, MGH 330D

Data Lab (http://datalab.ischool.uw.edu)
