The Big Data Revolution
Learning from Big Data: Text, Feelings and Machine Learning
Cases
Conclusions

Big Data

Daniel Hardt

IT Management, CBS

Supply Chain Leaders Forum
3 September 2015

Daniel Hardt Big Data

Outline

1 The Big Data Revolution
    Watson
    Google Self-Driving Car

2 Learning from Big Data: Text, Feelings and Machine Learning
    Sentiment Analysis
    Mining Facebook for Feelings

3 Cases
    Big Transportation and Trade Data Analytics
    Big Data at Roskilde Festival

4 Conclusions

Increasing Availability of Data

Data Challenges

Characteristics

Lots of Photos

fstoppers.com

Big Data: Definition

Big data – data sets so large or complex that traditional data processing applications are inadequate. (Wikipedia)

Is this Surprising?

Moore’s law: computing power doubles every 2 years (roughly)

forums.xkcd.com
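
As a rough worked example of what "doubles every 2 years" implies, the sketch below computes the cumulative growth factor over a span of years. The function name `growth_factor` and the fixed two-year doubling period are assumptions taken from the rule of thumb above, not measured figures.

```python
def growth_factor(years: float, doubling_period: float = 2.0) -> float:
    """Cumulative growth after `years` if capacity doubles every `doubling_period` years."""
    return 2.0 ** (years / doubling_period)


# Illustrative spans only; the doubling period is the slide's rule of thumb.
for span in (2, 10, 20):
    print(f"{span:>2} years -> x{growth_factor(span):,.0f}")
```

Under this assumption, ten years gives a factor of 32 and twenty years a factor of about a thousand, which is the kind of compounding the slide is pointing at.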

Is this True?

Big data – data sets so large or complex that traditional data processing applications are inadequate. (Wikipedia)

• The increase in data is keeping pace with processing power
• In fact, the increase in data is itself supporting new ways to process data – Artificial Intelligence

Watson: The Jeopardy Challenge

Watson: Jeopardy

Jeopardy is Hard!

Watson: Health Care

IBM – Evolution of Computing

Cognitive Computing

Google Self-Driving Car

The Google Car’s View of the World

Madrigal (2014), Atlantic

The Trick: “Crawling” the World

• Google wants to make the self-driving car problem into a Big Data problem
• Car has ultra-detailed map for every road it travels on, “down to tiny details like the position and height of every single curb . . . a precision measured in inches”
• Google has mapped 2,000 miles of road. The US road network has 4 million miles of road. “It is work,” Urmson added, shrugging, “but it is not intimidating work.”

Madrigal (2014), Atlantic
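
To put the two mileage figures side by side, the short calculation below works out what share of the US road network the mapped roads represent. It uses only the two numbers quoted on the slide; the variable and function names are illustrative.

```python
MAPPED_MILES = 2_000        # miles mapped, as quoted on the slide
US_ROAD_MILES = 4_000_000   # US road network, as quoted on the slide


def mapped_fraction(mapped: float, total: float) -> float:
    """Share of the total road network that has been mapped."""
    return mapped / total


fraction = mapped_fraction(MAPPED_MILES, US_ROAD_MILES)
scale_up = US_ROAD_MILES // MAPPED_MILES

print(f"Mapped so far: {fraction:.2%} of the US road network")
print(f"Remaining scale-up: about {scale_up}x the current coverage")
```

On these figures the mapped share is a small fraction of one percent, which is why the quote frames the remaining effort as work rather than as an unsolved problem.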

Liu, Bing. "Sentiment analysis and subjectivity." Handbook of natural language processing 2 (2010): 568.

• Sentiment analysis or opinion mining is the computational study of opinions, sentiments and emotions expressed in text
• Lots of Buzz!

Business

Machine Learning Methods

Pang, Bo, Lillian Lee, and Shivakumar Vaithyanathan. "Thumbs up?: sentiment classification using machine learning techniques." Proceedings of the ACL-02 conference on Empirical methods in natural language processing-Volume 10. Association for Computational Linguistics, 2002.

• Bag of words: With lexicon of m words, each document d is represented by the document vector (n_1(d), n_2(d), ..., n_m(d))
• Machine Learning: Naive Bayes, Maximum Entropy, Support Vector Machines
• Naive Bayes:
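
The bag-of-words vector and a Naive Bayes classifier over it can be sketched in a few lines. This is a minimal illustration, assuming the common multinomial form with add-one smoothing; the slide's own formula did not survive extraction, so the exact variant used in the talk is not known. The toy documents, labels, and function names are all hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text, lexicon):
    """Return the count vector (n_1(d), ..., n_m(d)) for one document."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in lexicon]


def train_naive_bayes(docs, labels, lexicon):
    """Estimate log P(c) and log P(word_i | c), with add-one smoothing (an assumption)."""
    log_prior, log_likelihood = {}, {}
    for c in sorted(set(labels)):
        vectors = [bag_of_words(d, lexicon) for d, y in zip(docs, labels) if y == c]
        log_prior[c] = math.log(len(vectors) / len(docs))
        totals = [sum(column) for column in zip(*vectors)]  # word counts within class c
        denom = sum(totals) + len(lexicon)
        log_likelihood[c] = [math.log((t + 1) / denom) for t in totals]
    return log_prior, log_likelihood


def classify(text, lexicon, log_prior, log_likelihood):
    """Pick the class maximising log P(c) + sum_i n_i(d) * log P(word_i | c)."""
    vector = bag_of_words(text, lexicon)
    scores = {
        c: log_prior[c] + sum(n * ll for n, ll in zip(vector, log_likelihood[c]))
        for c in log_prior
    }
    return max(scores, key=scores.get)


# Hypothetical toy data, not from the talk.
docs = ["great fun film", "great acting", "boring film", "boring and slow"]
labels = ["pos", "pos", "neg", "neg"]
lexicon = sorted({word for d in docs for word in d.split()})
model = train_naive_bayes(docs, labels, lexicon)

print(lexicon)
print(bag_of_words("great great film", lexicon))
print(classify("great film", lexicon, *model))
```

Word order is discarded entirely: "great great film" and "film great great" map to the same vector, which is the simplification that the bag-of-words name refers to.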

Data: Facebook Feelings

Arousal and Valence: Data

Five Basic Feelings: Data

Animated
    Excited        155291
    Pumped           2979
    Surprised         752
    Amused          14993

Joy
    Happy          114259
    Wonderful       54691
    Awesome         22351
    Super            5794
    Great           55180
    Fantastic        3596
    Delighted         805
    Satisfied        1349
    Content           628
    Hopeful         21399

Angry
    Angry           12680
    Pissed           3851
    Annoyed         16839
    Frustrated       1145
    Disappointed     2534
    Disgusted        1566

Fearful
    Worried          3274
    Scared           2075
    Anxious          1002
    Shocked          1391
    Confused         3904

Empowered
    Determined      29850
    Confident        2341
    Accomplished     6570
    Proud           31363
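
The per-feeling counts can be rolled up into totals for the five basic feelings, which shows how unevenly the data is spread. The sketch below assumes each number is a count of posts carrying that feeling tag and that each post has exactly one tag; neither is stated on the slide. The dictionary layout and variable names are illustrative.

```python
# Counts copied from the slide; their interpretation as post counts is an assumption.
FEELINGS = {
    "Animated": {"Excited": 155291, "Pumped": 2979, "Surprised": 752, "Amused": 14993},
    "Joy": {
        "Happy": 114259, "Wonderful": 54691, "Awesome": 22351, "Super": 5794,
        "Great": 55180, "Fantastic": 3596, "Delighted": 805, "Satisfied": 1349,
        "Content": 628, "Hopeful": 21399,
    },
    "Angry": {
        "Angry": 12680, "Pissed": 3851, "Annoyed": 16839, "Frustrated": 1145,
        "Disappointed": 2534, "Disgusted": 1566,
    },
    "Fearful": {"Worried": 3274, "Scared": 2075, "Anxious": 1002, "Shocked": 1391, "Confused": 3904},
    "Empowered": {"Determined": 29850, "Confident": 2341, "Accomplished": 6570, "Proud": 31363},
}

totals = {category: sum(counts.values()) for category, counts in FEELINGS.items()}
grand_total = sum(totals.values())

# Largest category first, with its share of all examples.
for category, n in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{category:<10} {n:>7} {n / grand_total:6.1%}")

# Accuracy of always guessing the largest category: a reference point, not a result from the talk.
majority_share = max(totals.values()) / grand_total
print(f"Majority-class share: {majority_share:.3f}")
```

Under those assumptions, always guessing the largest category would be right a little under half the time, which gives a simple reference point for reading any 5-way accuracy figure on this data.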

Classifier

Basic Feelings (5-way classification)
    Classifier: MaxEnt
    Training Accuracy: .87
    Testing Accuracy: .75 (10-fold validation)

Arousal (2-way classification)
    Classifier: MaxEnt
    Training Accuracy: .99
    Testing Accuracy: .80 (10-fold validation)

Valence (2-way classification)
    Classifier: MaxEnt
    Training Accuracy: .99
    Testing Accuracy: .83 (10-fold validation)
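
The gap between training and testing accuracy comes from how 10-fold validation works: each example is held out once and scored by a model that never saw it. The sketch below shows that procedure only. It deliberately uses a simple word-vote classifier as a stand-in, not MaxEnt, and the toy examples and all function names are hypothetical.

```python
from collections import Counter, defaultdict


def k_fold_indices(n, k):
    """Split range(n) into k folds by striding; fold sizes differ by at most one."""
    return [list(range(i, n, k)) for i in range(k)]


def train(examples):
    """Stand-in classifier (NOT MaxEnt): count how often each word occurs with each label."""
    votes, label_counts = defaultdict(Counter), Counter()
    for text, label in examples:
        label_counts[label] += 1
        for word in text.split():
            votes[word][label] += 1
    return votes, label_counts


def predict(model, text):
    """Label with the most word votes; fall back to the most frequent training label."""
    votes, label_counts = model
    tally = Counter()
    for word in text.split():
        tally.update(votes.get(word, {}))
    source = tally if tally else label_counts
    return source.most_common(1)[0][0]


def accuracy(model, examples):
    return sum(predict(model, text) == label for text, label in examples) / len(examples)


def cross_validate(examples, k=10):
    """Return (mean training accuracy, mean held-out accuracy) over k folds."""
    train_scores, test_scores = [], []
    for held_out in k_fold_indices(len(examples), k):
        held = set(held_out)
        train_set = [ex for i, ex in enumerate(examples) if i not in held]
        test_set = [examples[i] for i in held_out]
        model = train(train_set)
        train_scores.append(accuracy(model, train_set))
        test_scores.append(accuracy(model, test_set))
    return sum(train_scores) / k, sum(test_scores) / k


# Hypothetical toy posts: even indices are "high" arousal, odd are "low".
# Posts 4 and 5 contain only a unique token, so they can be memorised but not generalised.
examples = []
for i in range(20):
    label = "high" if i % 2 == 0 else "low"
    if i in (4, 5):
        text = f"post{i}"
    elif label == "high":
        text = f"so excited about item{i}"
    else:
        text = f"feeling content with item{i}"
    examples.append((text, label))

train_acc, test_acc = cross_validate(examples, k=10)
print(f"training accuracy: {train_acc:.2f}, testing accuracy: {test_acc:.2f}")
```

The two posts that carry only a unique token are classified correctly when they are in the training set and missed when they are held out, which is a small-scale version of the training/testing gap reported above.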

Two-D Classification: Valence and Arousal

Two-D Classification: Comparisons

Feeling Meter: Manual Assessment

• Test Set: 160 examples from different sources
• Manual Task: Order Feelings Expressed (1 is most expressed, 5 least; 0 not expressed at all)
• Results: Binary Decision – is feeling expressed or not?
    (Ignore examples where 1st coder notes no feelings expressed – leaves 92 examples)
• Agreement on Feelings Expressed
    1st coder vs 2nd coder: 0.797 (366 out of 459 in 92 cases)
    1st coder vs System: 0.734 (337 out of 459 in 92 cases)
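
The agreement figures are simple proportions of matching yes/no decisions. The sketch below shows one way to compute them, assuming each coder supplies one rank per feeling per example and that a rank of 0 means "not expressed"; the exact data layout used in the study is not given, and the toy rankings are hypothetical.

```python
def expressed(ranks):
    """Turn ranks into binary decisions: 0 means not expressed, 1-5 means expressed."""
    return [rank > 0 for rank in ranks]


def agreement(coder_a, coder_b):
    """Return (proportion, matches, total) over all example/feeling decisions."""
    matches = total = 0
    for ranks_a, ranks_b in zip(coder_a, coder_b):
        for a, b in zip(expressed(ranks_a), expressed(ranks_b)):
            matches += int(a == b)
            total += 1
    return matches / total, matches, total


# Hypothetical rankings: 2 examples, 5 feelings each.
first_coder = [[1, 0, 0, 2, 0], [0, 1, 0, 0, 0]]
second_coder = [[1, 2, 0, 0, 0], [0, 1, 0, 0, 0]]
print(agreement(first_coder, second_coder))

# The proportions reported on the slide, recomputed from the raw counts.
print(round(366 / 459, 3), round(337 / 459, 3))
```

This is raw agreement; it does not correct for agreement expected by chance, which is a separate question from what the slide reports.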

Assessment of Country Logistics Systems

• What are logistics and supply chain costs in different countries?
• Specific transportation system cost categories like road, rail, air etc.
• Interaction of these costs with each other and with information and communication systems
• Relevant to the investment decision-making considerations of firms

An Analysis based on Reports

Assessing Relevant Factors from Reports

A New Analysis using Language Technology

• Extract relevant factors based on distribution of words and terms in reports
• Use metrics like TFIDF, which finds terms that are likely to be characteristic of a given text
• With automatic analysis, can consider 10 or 100 times larger quantities of data – reports over a ten year period, with dozens of countries
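
A small sketch can make the TFIDF idea concrete: a term scores high in a report when it is frequent there and rare across the other reports. This uses one common variant, term frequency times log(N / document frequency); the talk does not say which variant was used. The three "reports" and the function names are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per document, scoring tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def top_terms(score, n=3):
    """Highest-scoring terms, ties broken alphabetically."""
    ranked = sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n]]


# Hypothetical one-line "reports", not from the study.
reports = [
    "road costs are high and rail costs are low",
    "port delays are long and customs delays are costly",
    "rail links are good and road links are poor",
]
scores = tf_idf(reports)
for report, score in zip(reports, scores):
    print(top_terms(score, 1), "<-", report)
```

Words that occur in every report, such as "are" and "and", get a score of zero, so the characteristic terms of each report rise to the top without a hand-built stop list.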

Roskilde slides

from Per Østergaard Jacobsen (CBS) and Henrik Hammer Eliassen (IBM)

Big Data Analysis

• Where do people go?
• What do they buy?
• Machine Learning and AI can predict: under a given set of conditions (weather, previous movements, age, gender, etc), what is the probability of a given purchase?
• Watson technology is being brought to bear on such questions
• Relevant for supply chain
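
The question "what is the probability of a given purchase under a given set of conditions?" can be illustrated with the simplest possible estimator: counting. The sketch below is a baseline under stated assumptions, not the Watson-based approach mentioned above; the records, the condition fields, and the function name are all hypothetical, and add-one smoothing is an assumed choice for handling unseen conditions.

```python
from collections import Counter


def fit_purchase_rates(records, alpha=1.0):
    """Estimate P(purchase | conditions) from counts, with add-alpha smoothing."""
    seen, bought = Counter(), Counter()
    for conditions, purchased in records:
        seen[conditions] += 1
        bought[conditions] += int(purchased)

    def probability(conditions):
        # Unseen conditions fall back to 0.5 because both counts are zero.
        return (bought[conditions] + alpha) / (seen[conditions] + 2 * alpha)

    return probability


# Hypothetical records: ((weather, age group), bought a rain poncho?)
records = [
    (("rain", "18-25"), True),
    (("rain", "18-25"), True),
    (("rain", "18-25"), False),
    (("sun", "18-25"), False),
    (("sun", "18-25"), False),
    (("sun", "18-25"), True),
    (("sun", "18-25"), False),
]
purchase_probability = fit_purchase_rates(records)

for conditions in [("rain", "18-25"), ("sun", "18-25"), ("rain", "26-35")]:
    print(conditions, round(purchase_probability(conditions), 3))
```

Counting breaks down quickly as more conditions are added, since most combinations are never observed; that sparsity is the usual reason for moving to a learned model that shares information across conditions.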

Conclusions

• All industries will be fundamentally transformed by Big Data
• Many changes in areas like transportation, consumer forecasting that are crucial for supply chain management
• Lots of things happening at CBS!
