Advanced Political Analysis through “Big Data” Elections 2012

Political Opinion Mining, Sentiment Analysis and Technology


DESCRIPTION

http://www.zd8a.com Slide Deck Focus: Sentiment analysis through Facebook and Twitter, leveraging Hadoop, MongoDB, Mahout, Greenplum, and Solr. This slide deck was a product of developing a sentiment and text analytics engine. We leveraged Facebook Connect, the Twitter Firehose, and web scraping to gather text and store it in both MongoDB and Hadoop. Once it was stored, we used Mahout and Solr for text search and analytics to determine trends within the data. Although our dataset was not large enough to need it, we used Greenplum as a test MPP database to tie all three of those technologies into one dashboard using Pentaho.


Page 1: Political Opinion Mining, Sentiment Analysis and Technology

Advanced Political Analysis through “Big Data”

Elections 2012

Page 2: Political Opinion Mining, Sentiment Analysis and Technology

Z DATA’S AGILE ANALYSIS – THE “BIG DATA STACK”

• How do we leverage the “Big Data” stack?

– Technology
  • Don’t back your problem into available technologies; leave your toolset open.
  • Organically grow new skillsets and hire the right individuals.

– Development
  • Be agile in your approach.
  • Comparative analysis using both new mathematical methods and open source technologies.

– Embrace the shift into a data-driven world
  • Empower your Engineering and Science team to be creative.
  • Let the data lead your direction.
  • Use new data types previously unavailable to drive insights.

“Associating structured and unstructured data at relevant points is where the most value is gained and where the highest level of challenge is presented.” – Ryan Abo, PhD – Z Data Inc.

Page 3: Political Opinion Mining, Sentiment Analysis and Technology

ANALYZING THE POLITICAL LANDSCAPE

Phase 1

• Location based Google Search and Twitter mentions

• Word pair mentions

Phase 2

• Facebook and Twitter Sentiment and Geospatial Analysis

Page 4: Political Opinion Mining, Sentiment Analysis and Technology

Structured Data
• Standard data warehouse – finance, sales
• Geospatial – locations, places
• Technologies – Greenplum, Netezza, Teradata

Unstructured Data
• Textual objects – social media, blogs, forums
• Bitmap objects – images, video, audio
• Technologies – Hadoop, Cassandra, Solr, NoSQL

UNSTRUCTURED AND STRUCTURED DATA

COMPLEMENTING YOUR TECHNOLOGIES

Page 5: Political Opinion Mining, Sentiment Analysis and Technology

Identifying Unstructured Data Sources

Facebook

- User Likes and Favorites

- Article/Video/Link Shares

- Views

- Comments

- Location / Geospatial

Twitter

Tweet Characteristics

- Length

- Language Model

- Semantics

- Emoticons

- Location / Geospatial

Google / YouTube

- Blogs

- Comments

- Search Statistics

- Likes vs Dislikes

- Shares / Views / Comments

Objective: Identify and leverage social media outlets to better predict the overall sentiment across political candidates.

Page 6: Political Opinion Mining, Sentiment Analysis and Technology

Search Engine Data

• Number of Searches for a candidate or political party

• Word pair / combination analysis

Why should we care?

• Determine the most successful candidate online

• Effectiveness of campaigns and conversion to online competitive content

SEARCH, MENTION AND WORD PAIR ANALYSIS
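The word pair / combination analysis named on this slide can be sketched as a simple co-occurrence count. This is a minimal illustration only: the deck does not say how pairs were computed, and the function name `word_pairs`, the tokenizer regex, and the sample queries are all assumptions made up for the example.

```python
import re
from collections import Counter
from itertools import combinations

def word_pairs(texts, stopwords=frozenset()):
    """Count unordered word pairs that co-occur in the same text."""
    counts = Counter()
    for text in texts:
        # Lowercase, keep word tokens and hashtags, drop stopwords.
        # A set is used so each pair is counted once per text.
        tokens = {t for t in re.findall(r"#?\w+", text.lower())
                  if t not in stopwords}
        # sorted() gives every pair a canonical order: (a, b) == (b, a).
        counts.update(combinations(sorted(tokens), 2))
    return counts

# Hypothetical search queries, invented for illustration.
queries = [
    "romney economy plan",
    "romney health care",
    "obama health care",
]
pair_counts = word_pairs(queries)

# Show the most frequent pairs first.
for pair, n in pair_counts.most_common(3):
    print(pair, n)
```

Counting each pair once per text keeps a single long post from dominating the totals; whether that matches the original analysis is not stated in the deck.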

Page 7: Political Opinion Mining, Sentiment Analysis and Technology

What is this sentiment they speak of?

• Unstructured Text Data

• Using computational linguistics to accurately determine the attitude of a writer with respect to a topic.

Why should we care?

• Use “Opinion Mining” to predict political bias

ADVANCED SENTIMENT ANALYSIS

Page 8: Political Opinion Mining, Sentiment Analysis and Technology

Zdata Unstructured Cluster

Customer Data

Relational and Unstructured

Analytics / BI

ELT

Z DATA ADVANCED ANALYTICS SOLUTION

Page 9: Political Opinion Mining, Sentiment Analysis and Technology

Agile Analysis - Mathematical Methods

Prediction and Machine Learning

- Unigram and Bigram Features

- Bayesian Probability

- Maximum Entropy

- Distant Supervision

- Support Vector Machines
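As a rough illustration of the unigram and bigram features listed above, the sketch below turns one post into a bag of single tokens plus adjacent token pairs. The tokenizer regex and the names `tokenize` and `ngram_features` are assumptions for the example, not the pipeline the deck used.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split into word tokens, keeping # and @ prefixes."""
    return re.findall(r"[#@]?\w+", text.lower())

def ngram_features(text):
    """Return counts of unigrams and bigrams found in the text."""
    tokens = tokenize(text)
    # A bigram is each pair of adjacent tokens, joined with an underscore.
    bigrams = [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]
    return Counter(tokens + bigrams)

features = ngram_features("#Romney speech was horrible")
print(sorted(features))
```

Bigrams such as `was_horrible` keep a little word order that single tokens lose, which is the usual reason to combine the two feature types.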

Page 10: Political Opinion Mining, Sentiment Analysis and Technology

POLITICAL OPINION MINING

#obama #Kardashian #iran #bieber #biglove #romney #palin #healthcare #stimulus #nexttopmodel #bigdata #teaparty

#obama #iran #romney #palin #stimulus #teaparty

Unstructured Analysis – Naïve Bayes classifier

Political Classification – Unstructured + Structured

Political Relevance

Erica – Wow I love cookies in the morning, check out my new batch

Daria – #Romney speech was horrible that guy knows nothing

Daria – #Romney speech was horrible that guy knows nothing
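The political relevance filter shown on this slide can be sketched as a small multinomial Naïve Bayes classifier with add-one smoothing. This is a from-scratch illustration under stated assumptions, not the Mahout job the deck describes: the class name `NaiveBayes`, the labels `political` and `other`, and every training post except the two quoted on the slide are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes over word tokens with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # smoothing constant
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter()              # label -> number of posts
        self.vocab = set()

    @staticmethod
    def tokenize(text):
        return re.findall(r"#?\w+", text.lower())

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = self.tokenize(text)
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # log P(label) + sum of log P(token | label)
            score = math.log(self.doc_counts[label] / total_docs)
            denom = (sum(self.word_counts[label].values())
                     + self.alpha * len(self.vocab))
            for tok in self.tokenize(text):
                if tok in self.vocab:  # tokens never seen in training are skipped
                    count = self.word_counts[label][tok]
                    score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Two posts come from the slide; the rest are invented for illustration.
texts = [
    "#Romney speech was horrible that guy knows nothing",
    "#obama healthcare stimulus debate tonight",
    "#teaparty rally about the stimulus",
    "Wow I love cookies in the morning, check out my new batch",
    "#bieber concert was amazing",
    "new episode of #nexttopmodel tonight",
]
labels = ["political"] * 3 + ["other"] * 3

model = NaiveBayes().fit(texts, labels)
print(model.predict("the #romney debate on healthcare"))
print(model.predict("I love my new cookies"))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the smoothing constant keeps a single unseen word from zeroing out a class.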

[Two charts, each titled NAÏVE BAYES: accuracy, precision, and recall on a 0%–100% axis]
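The three measures reported for the classifier (accuracy, precision, and recall) can be computed from predicted and true labels as in the sketch below. The function name `evaluate` and the sample labels are hypothetical; the deck's actual scores are not recoverable from the extracted text and are not reproduced here.

```python
def evaluate(y_true, y_pred, positive="political"):
    """Return (accuracy, precision, recall) for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    tn = sum(t != positive and p != positive for t, p in pairs)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

# Hypothetical labels, invented for illustration.
y_true = ["political"] * 4 + ["other"] * 4
y_pred = ["political", "political", "other", "other",
          "political", "other", "other", "other"]

accuracy, precision, recall = evaluate(y_true, y_pred)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```

Precision answers how many posts flagged as political really were; recall answers how many political posts were caught. Reporting both alongside accuracy matters when political posts are a small share of the stream.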

Page 11: Political Opinion Mining, Sentiment Analysis and Technology

ELECTIONS 2012 DASHBOARD

[Sentiment panels: Positive, Neutral, Negative, each broken down by Education, Economy, Foreign Policy, Health Care]

[Chart: Sentiment vs. Actuals for Romney, Paul, Huntsman, Gingrich, and Santorum on a 0–10 axis; Orange County (January 2011 – May 2011)]

FILTER BY: Facebook, Twitter, Google

[Panel: Mitt Romney, Republican Primary: Democratic Vote, Republican Vote, Democratic Sentiment, Republican Sentiment]

Page 12: Political Opinion Mining, Sentiment Analysis and Technology

SOCIAL SOLUTIONS WITH BIG DATA

ENOUGH TALK…