
Big Data: Predictive Analytics VS Causal Inference

Wharton Data Camp Sessions 7

Agenda
1) What’s the deal here?
2) Why should you be aware?
3) What kind of development is going on right now?
4) “Big Data and You”

Big Data! Some videos

The Joy of Stats - Hans Rosling!
http://www.youtube.com/watch?v=CiCQepmcuj8
http://www.ted.com/playlists/56/making_sense_of_too_much_data.html

http://motherboard.vice.com/blog/big-data-explained-brilliantly-in-one-short-video (broad overview; stop at 4:26)

http://www.intel.com/content/www/us/en/big-data/big-data-101-animation.html (more relevant; stop at 2:10)

“This is the caveman era of big data”

What’s cool is cool because we are looking at these data for the first time, and sometimes even correlation is cool! Mashing up different big data sources can make things scary (CMU Face app).

The scientific process always begins with correlation, then moves on to causality as the field matures.

The Rise of Predictive Models

Statistics & Computer Science

With the rise of data + computational power:
- Better prediction
- Model free: no theory backing
- Blackbox algorithms
- Statistical algorithms

Goal: Predict well (with big enough data, it works)

Techniques: MANY
- Take CIS 520: Machine Learning for a basic intro. At least audit it! It will open your eyes.
- Stat 9XX: Statistical Learning Theory, if offered! Also great; expect a lot of probability/stat theory.
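To make “predict well” concrete: a predictive model is judged only by its error on data it has not seen, usually through an error function such as RMSE (root mean squared error). The sketch below is a minimal illustration under our own assumptions: synthetic data from a sine curve with noise of standard deviation 0.3, a k-nearest-neighbour regressor standing in for a “model free” learner, and hypothetical helper names. As n grows, the held-out RMSE should approach the noise floor of about 0.3, which is the “with big enough data, it works” point.

```python
import math
import random

def knn_predict(train, x, k=5):
    """Predict y at x as the average y of the k nearest training points (no theory, no fitted parameters)."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def rmse(pairs):
    """Root mean squared error over (prediction, actual) pairs."""
    return math.sqrt(sum((p - a) ** 2 for p, a in pairs) / len(pairs))

def holdout_rmse(n, k=5, seed=0):
    """Simulate y = sin(x) + noise (sd 0.3), train on 70%, and report RMSE on the held-out 30%."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(0, 6)
        data.append((x, math.sin(x) + rng.gauss(0, 0.3)))
    rng.shuffle(data)
    cut = int(0.7 * n)
    train, test = data[:cut], data[cut:]
    return rmse([(knn_predict(train, x, k), y) for x, y in test])

for n in (50, 500, 2000):
    print(n, round(holdout_rmse(n), 3))
```

Note that nothing here says *why* y depends on x; the model is scored purely on prediction error.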

Good Old Causal Inference

Statistics & Econometrics

Explore -> Develop Theory -> Test with Statistical Inference models (Linear Models / Graphical Models / etc.)

Requirements for “X causes Y”:
1) X must temporally come before Y (NOT required in a predictive model)
2) X must have a significant statistical relation to Y
3) The association between X and Y must not be due to some other omitted variable (NOT required in a predictive model)
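The omitted-variable requirement above is exactly where a predictive model and a causal claim part ways. Below is a minimal simulation with made-up data and hypothetical function names: a hidden variable Z drives both X and Y, and X has no effect on Y at all. A regression of Y on X alone still finds a strong slope (so X is a perfectly good *predictor*), but once Z is controlled for, the coefficient on X drops to about zero.

```python
import random

def simulate(n=20000, seed=1):
    """Z is a confounder that drives both X and Y; X has NO effect on Y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.gauss(0, 1)
        x = z + rng.gauss(0, 1)
        y = 2 * z + rng.gauss(0, 1)
        rows.append((x, y, z))
    return rows

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

def slope_y_on_x(xs, ys):
    """Simple OLS slope of Y on X: what a purely predictive fit 'sees'."""
    return cov(xs, ys) / cov(xs, xs)

def coef_x_controlling_z(xs, ys, zs):
    """OLS coefficient on X in y ~ x + z, solved from the 2x2 normal equations on centered data."""
    sxx, szz, sxz = cov(xs, xs), cov(zs, zs), cov(xs, zs)
    sxy, szy = cov(xs, ys), cov(zs, ys)
    return (szz * sxy - sxz * szy) / (sxx * szz - sxz ** 2)

rows = simulate()
xs, ys, zs = (list(col) for col in zip(*rows))
print("slope of Y on X alone:  ", round(slope_y_on_x(xs, ys), 2))           # about 1: strong association
print("coef on X controlling Z:", round(coef_x_controlling_z(xs, ys, zs), 2))  # about 0: no causal effect
```

In real observational data Z is often unmeasured, which is why theory and research design, not just more data, are needed to rule it out.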

Theory comes from economics, sociology, psychology, etc.

Predictive Analytics VS Causal Inference

Predictive analytics (Machine Learning, Algorithms)
- Art of prediction
- RMSE/Error functions

Causal Inference (Rubin Causal Model, Structural)
- Theory building
- Testing theory with statistical tools and robust design of experiments, or with techniques for dealing with observational data

Statistics/Comp Sci (Algorithms and Data mining, Machine Learning)

Statistics/Econometrics (Causality: different schools of thought even within causal inference groups. For a brief fun intro, see http://leedokyun.com/obs.pdf)

Paradigm-building (Kuhnian sense) and falsifying existing beliefs (Popperian sense): causal inference can do both. Predictive models cannot.
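The Rubin Causal Model frames causality with potential outcomes: every unit has an outcome Y(0) if untreated and Y(1) if treated, but we only ever observe one of them. The sketch below is a simulation with invented numbers and hypothetical names; because it is simulated, both potential outcomes are known and the true average treatment effect (ATE) can be computed. It shows why design matters: with coin-flip (randomized) assignment the simple difference in means recovers the ATE, while letting high-baseline units select into treatment badly inflates it.

```python
import random
import statistics

def potential_outcomes(n, seed=2):
    """Each simulated unit carries BOTH potential outcomes; real data reveals only one per unit."""
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        y0 = rng.gauss(10, 2)            # outcome if untreated
        y1 = y0 + rng.gauss(1.5, 0.5)    # outcome if treated (average effect 1.5)
        units.append((y0, y1))
    return units

def diff_in_means(units, treated):
    """Observed-data estimate: mean observed Y among treated minus mean observed Y among controls."""
    yt = [y1 for (y0, y1), t in zip(units, treated) if t]
    yc = [y0 for (y0, y1), t in zip(units, treated) if not t]
    return statistics.mean(yt) - statistics.mean(yc)

units = potential_outcomes(10000)
true_ate = statistics.mean(y1 - y0 for y0, y1 in units)

rng = random.Random(3)
randomized = [rng.random() < 0.5 for _ in units]   # coin-flip assignment
self_selected = [y0 > 10 for y0, _ in units]       # units with a high baseline opt in

print("true ATE:           ", round(true_ate, 2))
print("randomized estimate:", round(diff_in_means(units, randomized), 2))
print("self-selected:      ", round(diff_in_means(units, self_selected), 2))
```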

Arguments About the Big data Movement

A great portion of start-ups & many big data firms these days: http://xkcd.com/882/

You’d be surprised how naïve some industry people are. Some are great.

Companies are trying to collect everything about everyone. It becomes an unwieldy beast!
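The linked xkcd comic (No. 882, “Significant”) is about multiple comparisons: test 20 jelly bean colours at p < 0.05 and, even if none has any real effect, one is likely to come out “significant”. Collecting everything and testing everything makes this worse. A minimal sketch (function names are ours) computes the chance of at least one false positive among m independent true-null tests, checks it by simulation (under a true null a p-value is uniform on [0, 1]), and shows the effect of a Bonferroni correction.

```python
import random

ALPHA = 0.05
N_TESTS = 20  # e.g. 20 jelly bean colours, none of which has any real effect

def familywise_error_exact(alpha, m):
    """P(at least one p < alpha) when all m null hypotheses are true and the tests are independent."""
    return 1 - (1 - alpha) ** m

def familywise_error_simulated(alpha, m, trials=20000, seed=4):
    """Draw m uniform p-values per 'study' and count studies with at least one false positive."""
    rng = random.Random(seed)
    hits = sum(any(rng.random() < alpha for _ in range(m)) for _ in range(trials))
    return hits / trials

print(round(familywise_error_exact(ALPHA, N_TESTS), 3))            # -> 0.642
print(round(familywise_error_simulated(ALPHA, N_TESTS), 3))        # close to 0.642
print(round(familywise_error_exact(ALPHA / N_TESTS, N_TESTS), 3))  # Bonferroni-corrected -> 0.049
```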

Arguments About the Big data Movement

Read these interesting pieces featuring the dynamic duo of marketing (Prof Eric Bradlow and Prof Peter Fader) and Prof Eric Clemons of OPIM.

http://www.sas.com/resources/asset/SAS_BigData_final.pdf

http://knowledge.wharton.upenn.edu/article.cfm?articleid=2186

http://www.datanami.com/datanami/2012-05-03/wharton_professor_pokes_hole_in_big_data_balloon.html

Predictive vs Causal

[Figure: diagram relating Statistics, Econometrics and Machine Learning to Causal Inference and Predictive Analytics]

Some notable examples

[Figure: notable examples placed on two axes, Not Causal vs Causal and Small vs Large. Labels: RevolutionR; Hal Varian (Google); Susan Athey (MS); Angrist, Krueger; UCLA Stat book; Targeted learning; Economists; Machine Learning; Netflix; Google; Data Mining; Structural Modeling; Information Systems Management; Lab experiment; Fraud detection; Hans Rosling; Marketing; Association rule; and the callout “What are you doing here?”]

Causal inference without data mining is myopic, and data mining without causal inference is blind.

As academics/economists/scientists, we need to embrace the rise of predictive models in the big data era and use them to extract information from unstructured data, but never forget that our goal is to build paradigms and contribute to the knowledge/theory base.

Uses of Predictive Analytics/Machine Learning

Exploration (Unsupervised learning/Clustering/Anomaly detection)

Data Extraction From Unstructured Data (NLP + Supervised Learning)
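For the exploration use, clustering lets the data suggest groups without any labels or theory. One standard algorithm is k-means (Lloyd’s algorithm): assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. The sketch below runs it on two synthetic blobs; the data and the `kmeans` helper are our own illustration, not a recommended production implementation.

```python
import math
import random

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid re-estimation."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = coordinate-wise mean; an empty cluster keeps its old centroid.
        new = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new == centroids:  # assignments stopped changing
            break
        centroids = new
    return centroids, clusters

rng = random.Random(42)
blob_a = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(100)]
blob_b = [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(100)]
centroids, clusters = kmeans(blob_a + blob_b, k=2)
print(sorted(centroids))  # one centroid near (0, 0), one near (5, 5)
```

The catch for research is the same as on the previous slides: k-means finds groups, but saying what the groups *mean* still takes theory.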

Some people have started to incorporate machine learning techniques into causal inference:
- Machine learning in propensity score matching (PSM)
- Targeted Learning, 2012 Springer Series (http://www.targetedlearningbook.com/)
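Propensity score matching shows where machine learning slots into causal inference: the propensity score P(treated | covariates) is a pure prediction problem, so any classifier can estimate it, and the causal step is comparing each treated unit with the control whose score is closest. The sketch below assumes simulated data with a single observed confounder x, uses a hand-rolled logistic regression as a stand-in for whatever ML classifier one might prefer, and matches nearest neighbours with replacement to estimate the average treatment effect on the treated (ATT). All names are hypothetical. With one covariate, matching on the score is equivalent to matching on x; the pipeline is what matters here.

```python
import bisect
import math
import random

def sigmoid(v):
    return 1 / (1 + math.exp(-v))

def simulate(n=2000, seed=5):
    """x raises both the chance of treatment and the outcome; the true treatment effect is 2."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        t = rng.random() < sigmoid(x)
        y = 2.0 * t + 3.0 * x + rng.gauss(0, 1)
        data.append((x, t, y))
    return data

def fit_logistic(xs, ts, lr=1.0, iters=500):
    """Propensity model P(t=1 | x) fitted by gradient ascent on the average log-likelihood."""
    a = b = 0.0
    n = len(xs)
    for _ in range(iters):
        ga = gb = 0.0
        for x, t in zip(xs, ts):
            r = t - sigmoid(a + b * x)
            ga += r
            gb += r * x
        a += lr * ga / n
        b += lr * gb / n
    return lambda x: sigmoid(a + b * x)

def att_by_matching(data, score):
    """Match each treated unit to the control with the closest propensity score (with replacement)."""
    controls = sorted((score(x), y) for x, t, y in data if not t)
    keys = [s for s, _ in controls]
    diffs = []
    for x, t, y in data:
        if not t:
            continue
        s = score(x)
        i = bisect.bisect_left(keys, s)
        candidates = [j for j in (i - 1, i) if 0 <= j < len(keys)]
        best = min(candidates, key=lambda j: abs(keys[j] - s))
        diffs.append(y - controls[best][1])
    return sum(diffs) / len(diffs)

data = simulate()
score = fit_logistic([x for x, _, _ in data], [t for _, t, _ in data])
treated_y = [y for _, t, y in data if t]
control_y = [y for _, t, y in data if not t]
naive = sum(treated_y) / len(treated_y) - sum(control_y) / len(control_y)
print("naive difference:", round(naive, 2))                         # biased upward by x
print("matched ATT:     ", round(att_by_matching(data, score), 2))  # close to 2
```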

Great blogs for causal inference

Andrew Gelman (Bayesian statistician at Columbia University): http://andrewgelman.com/
This is where a lot of the action is happening, e.g., that great fight of 2009 between the Pearlians and the Rubinians!

“Boy, these academic disputes are fun! Such vitriol! Such personal animosity! It's better than reality TV. Did Rubin slap Pearl's mom, or perhaps vice versa?”

“With all due respect, I think you are wrong that Judea does not understand the Rubin approach.” – Larry Wasserman

Judea Pearl “Causality”

Books on observational studies by Paul Rosenbaum

Miguel Hernán and Jamie Robins, “Causal Inference” (free now): http://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/

Intro to Practical Natural Language Processing

Wharton Data Camp Sessions 7

Agenda
1) Brief light-hearted intro to NLP (What is it and why should I care?)
2) Basic ideas in NLP
3) Usage in business research

Quick Overview
- What is Natural (Spoken) Language Processing (NLP)?
- Examples
- How this technology may affect:
  - Industry
  - Academics

Natural Language Processing

Natural Language Processing is an interdisciplinary field composed of techniques and ideas from computer science, statistics and linguistics that are concerned with making computers able to parse, understand (knowledge representation), store (knowledge database), and ultimately interact (convey information) in natural language (human language such as English)

Methods: machine learning, Bayesian statistics, algorithms, higher-order logic, linguistics.

Subcategories of NLP

Information Retrieval: Google. Optimizing text database search.

Information Extraction: the crude basic form is web crawling + regex; the really sophisticated form (Thomson Reuters) you’ll see later.

Machine Translation

Sentiment Analysis and more
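Information retrieval boils down to ranking documents by relevance to a query. A classic textbook baseline (not a claim about how Google actually ranks) is an inverted index, mapping each term to the documents that contain it, scored with tf-idf: a term counts more the more often it appears in a document and the rarer it is across the collection. The `TfidfIndex` class and the three toy documents below are our own illustration.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase and keep runs of letters/digits."""
    return re.findall(r"[a-z0-9]+", text.lower())

class TfidfIndex:
    """Toy inverted index (term -> {doc_id: term count}) ranked by summed tf-idf."""

    def __init__(self, docs):
        self.n_docs = len(docs)
        self.postings = defaultdict(dict)
        for doc_id, text in enumerate(docs):
            for term, count in Counter(tokenize(text)).items():
                self.postings[term][doc_id] = count

    def idf(self, term):
        df = len(self.postings.get(term, {}))
        return math.log(self.n_docs / df) if df else 0.0

    def search(self, query, top_k=3):
        scores = Counter()
        for term in tokenize(query):
            for doc_id, tf in self.postings.get(term, {}).items():
                scores[doc_id] += tf * self.idf(term)
        return [doc_id for doc_id, score in scores.most_common(top_k) if score > 0]

docs = [
    "Bank announces third quarter net income",
    "New phone announced with better camera",
    "Quarterly net income falls at savings bank",
]
index = TfidfIndex(docs)
print(index.search("savings bank income"))  # doc 2 first: it alone contains the rare term "savings"
```

Notice that “announces” and “announced” are different tokens here; handling that is the morphology problem that comes up later in the Lake Shore example.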

Cool Applications

NSA - uses NLP to detect anomalous activity in internet traffic and phone calls that may indicate terrorist activity

Lie detection via spoken language processing

Automatic plagiarism detector

ETS Testing - since 1999 “e-rater” automatic essay scoring on GMAT, GRE, TOEFL.

Shazam – song discovery (application of spoken language processing)

News aggregators based on topic

Entertainment - Cleverbot (in a Turing test, judged human 59.3% of the time vs 63.3% for real humans). It really evolved from dumb predecessors such as ELIZA, SmarterChild, etc.

Business Applications

Marketing - sentiment analysis and demand analysis of products from reviews and blogs, e.g., movies, consumer products

Marketing – Opinion Mining/Subjectivity analysis/Emotion Detection/Opinion Spam Detection etc

Finance - Quantitative Qualitative high frequency trading (Thomson Reuters, Bloomberg)

Management – Resume filtering and firm-employee matching

Legal Studies – legal document search engines

E-Commerce – help chat bots
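Sentiment analysis of reviews, the first marketing application above, is usually set up as supervised text classification: humans label example texts, and a model learns which words signal which label. One classic baseline is multinomial Naive Bayes with add-one (Laplace) smoothing. The sketch below trains it on four invented reviews, which is far too few for real use; the class and data are hypothetical illustrations, not any particular vendor’s method.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.labels = sorted(set(labels))
        self.prior = {c: math.log(labels.count(c) / len(labels)) for c in self.labels}
        self.counts = {c: Counter() for c in self.labels}
        for text, label in zip(texts, labels):
            self.counts[label].update(tokenize(text))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.labels}
        return self

    def log_posterior(self, text, c):
        """log P(c) + sum of log P(word | c), up to a constant shared by all classes."""
        score = self.prior[c]
        for word in tokenize(text):
            if word in self.vocab:  # words never seen in training are ignored
                score += math.log((self.counts[c][word] + 1) / (self.totals[c] + len(self.vocab)))
        return score

    def predict(self, text):
        return max(self.labels, key=lambda c: self.log_posterior(text, c))

reviews = [
    "great movie, loved the acting",
    "wonderful story and great cast",
    "terrible plot, boring and too long",
    "awful acting, I hated it",
]
labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayes().fit(reviews, labels)
print(model.predict("loved the great cast"))  # pos
print(model.predict("boring and awful"))      # neg
```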

Mainstream Applications

Siri (dumb) - preprogrammed. No learning

IBM Watson / Wolfram Alpha (smart):

semantic representation of concepts

acquisition of knowledge

logical inference machine

As of 2011, Watson had knowledge equivalent to that of a second-year medical student (which isn’t saying much, but it is still cool given the speed at which Watson learns)

Watson gets an attitude

IBM Watson learned urban dictionary in 2013…

“Watson couldn't distinguish between polite language and profanity -- which the Urban Dictionary is full of. Watson picked up some bad habits from reading Wikipedia as well. In tests it even used the word "bullshit" in an answer to a researcher's query.

Ultimately, Brown's 35-person team developed a filter to keep Watson from swearing and scraped the Urban Dictionary from its memory.”

Well no $@!# Sherlock! You mea@#$%s can bite my shiny metal !@$

Some fun facts

15,000:

Average number of words spoken by an average person per day (various sociology and linguistics studies); approximately 15 words per minute, assuming 8 hours of sleep.

100 million ~ 300 million:

Average number of words spoken by an average person in a lifetime.

100 TRILLION:

Approximate number of words on the internet in 2007, according to Peter Norvig (AI scientist and head of research at Google).
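A quick check of the arithmetic behind these figures. The per-minute rate follows directly from the daily count; the lifetime conversion on the second line is our own back-of-the-envelope reading, not a figure from the cited studies.

```latex
\frac{15{,}000\ \text{words/day}}{(24 - 8)\ \text{h} \times 60\ \text{min/h}}
  = \frac{15{,}000}{960} \approx 15.6\ \text{words per waking minute}

15{,}000 \times 365 \approx 5.5 \times 10^{6}\ \text{words/year},
\qquad
\frac{1 \times 10^{8}}{5.5 \times 10^{6}} \approx 18\ \text{years},
\quad
\frac{3 \times 10^{8}}{5.5 \times 10^{6}} \approx 55\ \text{years}
```

So the 100 to 300 million lifetime range corresponds to roughly 18 to 55 years of talking at the daily rate above.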

Reasons why you should at least acknowledge NLP and keep it in mind for the rest of your life

1. It will definitely be a disruptive technology and a large part of everyday life, affecting most types of business (it has already disrupted finance, marketing, management, etc.)

2. Text data: the explosion of the web, company performance reports, news, securities filings, etc.

3. Even in business research outside of Information Systems Management and Marketing, more and more researchers are utilizing NLP

Example Focus (Finance)

Thomson Reuters (Automation Team) and Bloomberg

Business Wire: 60 stories per second

“Apple also announced that Scott Forstall will be leaving Apple next year and will serve as an advisor to CEO Tim Cook in the interim”

Lake Shore Bancorp, Inc. (the “Company”) (NASDAQ Global Market: LSBK), the holding company for Lake Shore Savings Bank (the “Bank”), announced third quarter 2012 net income of $863,000, or $0.15 per diluted share, compared to net income of $1.2 million, or $0.20 per diluted share, for third quarter 2011. The Company had net income of $2.8 million, or $0.48 per diluted share, for the nine months ended September 30, 2012, compared to net income of $3.1 million, or $0.54 per diluted share for the same period in 2011.

Extract relevant information -> a computer-readable format such as XML/JSON

KEY: where the information is located and how to extract it are not preprogrammed. The NLP engine learns as new information comes in. Initially, it learns what is important and how to extract it from many articles tagged by humans (semi-supervised learning).
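For contrast with that learned engine, here is the “crude basic form” of information extraction (hand-written regex) applied to the Lake Shore release and emitted as JSON. Everything preprogrammed is an assumption of ours: the pattern, the JSON field names, and above all the `PERIODS` list, which hard-codes the order in which this one release happens to report its figures. A release phrased slightly differently would break it, which is exactly the brittleness the learned approach is meant to avoid.

```python
import json
import re

RELEASE = (
    "Lake Shore Bancorp, Inc. (the “Company”) (NASDAQ Global Market: LSBK), the holding "
    "company for Lake Shore Savings Bank (the “Bank”), announced third quarter 2012 net "
    "income of $863,000, or $0.15 per diluted share, compared to net income of $1.2 million, "
    "or $0.20 per diluted share, for third quarter 2011. The Company had net income of $2.8 "
    "million, or $0.48 per diluted share, for the nine months ended September 30, 2012, "
    "compared to net income of $3.1 million, or $0.54 per diluted share for the same period in 2011."
)

# Hand-written pattern: "net income of $<amount>[ million], or $<eps> per diluted share"
PATTERN = re.compile(r"net\s+income of \$([\d,.]+)(\s+million)?, or \$([\d.]+) per diluted share")

# ASSUMPTION: the order in which this particular release reports its four figures.
PERIODS = ["Q3 2012", "Q3 2011", "9M 2012", "9M 2011"]

def to_dollars(amount, million):
    value = float(amount.replace(",", ""))
    return round(value * 1_000_000) if million else round(value)

def extract(text):
    records = []
    for period, (amount, million, eps) in zip(PERIODS, PATTERN.findall(text)):
        records.append({"period": period,
                        "net_income": to_dollars(amount, million),
                        "eps_diluted": float(eps)})
    return {"company": "Lake Shore Bancorp, Inc.", "ticker": "LSBK", "results": records}

print(json.dumps(extract(RELEASE), indent=2))
```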


Named Entity Recognition: has to realize that “Lake Shore Bancorp, Inc.” is the name of a company
Coreference resolution: “the Company” is Lake Shore Bancorp, Inc.
Morphological segmentation: breaking words into their basic parts and meaning (the “lexeme”), e.g., “announced” is the past tense of the lexeme “announce” with the inflection rule -ed
Part of Speech Tagging and Grammar Parsing
Chunking and Breaking: e.g., “A and B of X and Y” means is(A,X) and is(B,Y)
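Coreference in general is hard, but press releases use one convention that has a simple rule-based baseline: an alias is defined inline as Name (the “Alias”), and later mentions use “the Alias”. The sketch below is a hand-written rule of ours (not a learned coreference model, and only covering this one convention): it collects the defined aliases, then rewrites later “the Company” / “the Bank” mentions with the full names.

```python
import re

TEXT = (
    "Lake Shore Bancorp, Inc. (the “Company”) (NASDAQ Global Market: LSBK), the holding "
    "company for Lake Shore Savings Bank (the “Bank”), announced third quarter 2012 net "
    "income of $863,000. The Company had net income of $2.8 million."
)

# A run of capitalised words immediately followed by a defined term such as (the “Company”).
DEFINED_TERM = re.compile(r"([A-Z][\w.,]*(?: [A-Z][\w.,]*)*) \(the “(\w+)”\)")

def find_aliases(text):
    """Map each defined short name to the full name it abbreviates."""
    return {alias: name for name, alias in DEFINED_TERM.findall(text)}

def resolve(text):
    """Drop the inline definitions, then replace later 'the <Alias>' mentions with full names."""
    aliases = find_aliases(text)
    body = DEFINED_TERM.sub(lambda m: m.group(1), text)
    for alias, name in aliases.items():
        body = re.sub(rf"\b[Tt]he {re.escape(alias)}\b", lambda _m: name, body)
    return body

print(find_aliases(TEXT))
print(resolve(TEXT))
```

Note that “the holding company” is left alone, since the rule only fires on the exact defined alias.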

Example Focus (Finance)

<company name="Lake Shore Bancorp">
  <Alias>The Company</Alias>
  <Holds>Lake Shore Savings Bank</Holds>
  <Holds>The Bank</Holds>
  <Q year="2012" period="third">863000</Q>
  <Q year="2011" period="third">1.2 Million</Q>
  <Net year="2012" month="9">2.8 Million</Net>
  <Net year="2011" month="9">3.1 Million</Net>
  .........
</company>

Example Focus (Finance)


Real example XML from Thomson Reuters

Bottom Line

NLP can do lots of cool stuff

Unstructured text data is huge and growing faster than ever, and it will continue to grow as the online population increases.

NLP is an important tool for anyone to be aware of

Jurafsky & Martin “Speech and Language Processing” for deep theory

Bing Liu’s two books: http://www.cs.uic.edu/~liub/

Practical NLTK books: an NLTK cookbook by Jacob Perkins and “Natural Language Processing with Python” by Steven Bird et al.

Next Session

You’ll see NLP in action (specific tasks)

Actual code using NLTK (install this!)

Example case based on my research: uses NLP and cutting-edge machine learning techniques (similar to the Netflix Prize-winning algorithm) to content-code large-scale social media data.