Automatic Hierarchy Discovery and Opinion Mining of Political Blogs Amit Goyal Kristi McBurnie November 28, 2007



Outline

Introduction
Previous Work
Our Approach
Example
Challenges and Future Work
Milestones
Conclusion


Introduction

The Web contains a wealth of opinions about products, politics, and more, in newsgroup posts, review sites, and elsewhere

Our interest: to mine opinions expressed in user-generated content


Applications

Businesses and Organizations
  Market Intelligence: a huge amount of money is spent to find consumer sentiments and opinions
  Opinion polls, surveys
Individuals interested in others' opinions when
  Purchasing a product
  Finding opinions on political topics
  Using a service, etc.
Smart Ads
  Place an ad when one praises a product
  Place an ad from a competitor if one criticizes a product
Opinion Search
  Provide search for opinions
  Give me opinions on “gmail”
  Give me comparisons between “gmail vs yahoomail”


Types of opinions

Direct Opinions: sentiment expressions on objects, e.g. policies, politicians, movies, products
  E.g. “I find myself in support of the Senate Judiciary Committee, which approved legislation that clears the way for millions of undocumented workers to continue working in America and seek citizenship.”
Comparisons: relations expressing similarities or differences of more than one object
  E.g. “I think Bush will beat Kerry in the presidential elections” or “The lens quality of Camera A is better than Camera B”


Problem Statement

Given an object and a collection of reviews on it, the task is:
  Identification of features
  Making a hierarchy of features
  Sentiment Analysis: determining the orientation and strength
  Providing a visualization (summary)


Previous Work

Mainly focused on product and movie reviews
Feature Extraction
  Opinion Observer (Hu and Liu, 2004)
  Opine (Popescu and Etzioni, 2005)
  Red Opal (Scaffidi, 2007)
Hierarchical Discovery
  To be filled by Kristi


Previous Work

Opinion Observer
  By Bing Liu and Minqing Hu
  Feature Extraction
    Identify nouns using POS tagging
    Identify noun phrases by Association Rule Mining
    Compactness pruning, redundancy pruning
    Opinion word extraction
    Infrequent feature identification

72% precision and 80% recall


Previous Work

OPINE
  Feature Extraction
    First, extracts nouns and noun phrases, retaining those with frequency greater than some threshold
    Evaluates each noun phrase by computing the PMI (point-wise mutual information) scores between the phrase and meronymy discriminators associated with the product class
      E.g. “of scanner”, “scanner has”, “scanner comes with”, etc. for the Scanner class
      PMI(f, d) = Hits(d + f) / (Hits(d) * Hits(f))
    Then, PMI scores are converted to binary features for a Naïve Bayes classifier, which outputs a probability associated with each fact
  Compared to Hu and Liu's work, 22% better precision and 3% lower recall
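The PMI score on this slide can be sketched in a few lines. This is a minimal illustration, assuming the hit counts are already available as plain integers. The function name `pmi`, the example counts, and the threshold are hypothetical and are not taken from OPINE.

```python
def pmi(hits_joint: int, hits_discriminator: int, hits_feature: int) -> float:
    """PMI(f, d) = Hits(d + f) / (Hits(d) * Hits(f)), as written on the slide."""
    if hits_discriminator == 0 or hits_feature == 0:
        return 0.0  # assumption: unseen terms carry no association
    return hits_joint / (hits_discriminator * hits_feature)


# Hypothetical hit counts for one candidate feature and the
# discriminator "of scanner".
score = pmi(hits_joint=50, hits_discriminator=1000, hits_feature=200)

# Assumed threshold for turning the score into a binary classifier feature.
THRESHOLD = 1e-4
binary_feature = score > THRESHOLD
```

How the binary features are actually derived and fed to the Naïve Bayes classifier is not specified on the slide, so the thresholding step here is only one plausible reading.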


Previous Work

Red Opal
  3 components:
    Feature Extractor
    Product Scorer
    User Interface

Performs better than Opinion Observer


Previous Work

Red Opal
  Feature Extraction
    POS tagging; takes nouns and noun phrases as potential features
    Uses lemma frequency to rank the features
  Product Scoring: score of feature f of product p
    o(r,f) is the number of occurrences of feature f in review r
    w(r,f) is the weight of feature f in review r


Previous Work

Clustering

Conceptual clustering
  CLUSTER/2
    Places object descriptions and attributes together to obtain domain-dependent goals
  COBWEB
    Favours classes that maximize the information that can be predicted from knowledge of class membership
Hierarchical clustering
  BIRCH
    Hierarchically cluster elements in a dataset
    Level of clustering quality = level in the hierarchy


Previous Work

Hierarchy Discovery

Han and Fu
  Define it formally as “A sequence of mappings from a set of lower-level concepts to their higher-level correspondences”
  DBLearn automatically discovered a hierarchy of concepts for the purpose of data mining
  E.g. birthplace may have the following hierarchy: city, province, country
Foreman et al.
  Trains categorizers and automatically constructs a hierarchy of categories using human trainers
  Good GUI
  Difficult for novice users and hard to optimize



Previous Work

Hierarchy Discovery

Sanderson and Croft
  Automatically develop hierarchy in web documents
  Organize extracted words/phrases using subsumption
  No clustering or training techniques
Yang and Lee
  Hierarchies of web directories
  Text mining to discover relationships between documents and between words
  Cluster them into document and word maps


Previous Work

Sentiment Analysis

Esuli and Sebastiani
  3 stages:
    Determine subjective/objective polarity
    Determine positive/negative polarity
    Determine strength of the positive/negative polarity
  Uses SentiWordNet to assign 3 scores to each word (objectivity, positivity, negativity)


Previous Work

Sentiment Analysis

Pang and Lee
  Only subjective sections of the movie review
  Machine learning techniques
    Pair-wise relations between extracts to build an undirected graph
    Minimum cut
  Efficient and results in higher accuracy rates
Agarwal and Bhattacharyya
  SVM classifier
  Determine strength of polarity of subjective adjectives in good vs bad classification based on WordNet’s synonymy graph
  Applied cut-based graph similar to Pang et al.
  Reached accuracies of 84%-95.6%


Our proposal

Apply feature extraction and opinion mining in the political domain
Applications in the political domain:
  Automatic opinion polls
  Identification of local/global issues in elections
  Targeted campaigning in elections
  Impact of speech
Output: <politician, topic, opinion, polarity>
  Objects are politicians
  Categories are political organizations
  Topics may be policies, issues, etc.
In this project, we focus mainly on feature extraction and hierarchy discovery
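One way to picture the output tuple is as a small record type. The sketch below is only an illustration. The class name `OpinionRecord` and the extra `organization` field are assumptions, and the sample values are taken from the deck's own Iraq war example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class OpinionRecord:
    politician: str            # the object
    topic: str                 # a policy or issue
    opinion: Tuple[str, ...]   # opinion words or phrases
    polarity: str              # e.g. "positive" or "negative"
    organization: str = ""     # the category (assumed optional field)


record = OpinionRecord(
    politician="George Bush",
    topic="War in Iraq",
    opinion=("absurd", "killing", "freeing"),
    polarity="negative",
    organization="Republicans",
)
```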


Our Approach

Observations

Two kinds of opinions:
  Direct: talks about a single object
  Comparison: talks about multiple objects
Two kinds of information:
  Facts (objective)
  Opinions (subjective)
Sentiment Analysis can be done only on subjective information
Although features occur in both categories, subjective sentences are noisy


Comparison to product domain

                  Product Domain                    Political Domain
Category          Product Category (e.g. Camera)    Political Organizations (e.g. Democrats)
Object            Product (e.g. Camera A)           Leaders (e.g. Bush)
Features/Topics   Properties (e.g. lens)            Policies (e.g. Immigration)



Our Approach

Perform feature extraction
Split into objective and subjective phrases
Hierarchy discovery on features from objective sentences
Sentiment analysis on features from subjective sentences


Our Approach

Feature Extraction

Extract the features
  Extract nouns from POS tagging
  Extract noun phrases from Association Rule Mining
  Pruning
  Rank the features based on lemma frequency
Identify the subjectivity of all sentences
  Mine the opinion words (adjectives)
  Use key phrases dictionary (e.g. “can you believe”, “I think”, “I recommend”, etc.)
  Visual differences: factual data is often represented in quotes
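The subjectivity step could be sketched as a dictionary lookup. This is a minimal illustration, not the authors' implementation. The names `KEY_PHRASES`, `OPINION_WORDS`, and `is_subjective` are hypothetical, and the word lists are tiny placeholders standing in for mined opinion words and a fuller key-phrase dictionary.

```python
import re

# Assumed, illustrative resources; a real system would mine these from data.
KEY_PHRASES = ("can you believe", "i think", "i recommend")
OPINION_WORDS = {"absurd", "excellent", "poor"}


def is_subjective(sentence: str) -> bool:
    """Flag a sentence as subjective if it has a key phrase or an opinion word."""
    text = sentence.lower()
    if any(phrase in text for phrase in KEY_PHRASES):
        return True
    words = set(re.findall(r"[a-z']+", text))
    return bool(words & OPINION_WORDS)


sentences = [
    "The economic cost of the war in Iraq is estimated to total $1.3 trillion.",
    "I think this is an absurd amount of money to be spending.",
]
labels = [is_subjective(s) for s in sentences]
```

A plain substring match is crude (it ignores word boundaries and negation), which is consistent with the deck's own remark that opinion words also occur in objective sentences.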


Our Approach

Hierarchy Discovery

3 approaches:
  Subsumption (Sanderson and Croft)
    Look at every pair of terms and apply subsumption
    X subsumes Y if the documents in which Y occurs are a subset of the documents in which X occurs
    P(X|Y) = 1 and P(Y|X) < 1
  Clustering
  Use DBpedia and/or YAGO

[Figure: diagram of X and Y]
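The subsumption test stated on this slide can be written directly over document sets. This is a minimal sketch, assuming each term is already mapped to the set of document ids it occurs in. The function names and the toy index are hypothetical.

```python
from itertools import permutations
from typing import Dict, List, Set, Tuple


def subsumes(docs_x: Set[int], docs_y: Set[int]) -> bool:
    """X subsumes Y when P(X|Y) = 1 and P(Y|X) < 1.

    Equivalent to: Y's documents are a proper subset of X's documents.
    """
    return bool(docs_y) and docs_y < docs_x


def subsumption_pairs(index: Dict[str, Set[int]]) -> List[Tuple[str, str]]:
    """Return (parent, child) pairs over every ordered pair of terms."""
    return [
        (x, y)
        for x, y in permutations(index, 2)
        if subsumes(index[x], index[y])
    ]


# Hypothetical term -> document-id index.
index = {
    "war in Iraq": {1, 2, 3, 4},
    "economic cost": {2, 3},
    "immigration": {5, 6},
}
pairs = subsumption_pairs(index)
```

Here "war in Iraq" becomes the parent of "economic cost", while "immigration" stays unrelated. Checking every ordered pair is quadratic in the number of terms, which is acceptable for a pruned feature list.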


Our Approach

Hierarchy Discovery

3 approaches:
  Subsumption
  Clustering (Yang and Lee)
    Cluster phrases by co-occurrence
    Using an unsupervised learning algorithm: SOM networks
      Organizes phrases into a 2D map of neurons
      According to similarity of vectors
    3 steps:
      Training process
      Assigning phrases to a neuron
      Labelling process
  Use DBpedia and/or YAGO
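Before any SOM training, phrases need vector representations and a similarity measure. The sketch below covers only that preparatory step, using document-level co-occurrence counts and cosine similarity. It is not a SOM, and the encoding Yang and Lee actually use may differ. All names and the toy documents are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def cooccurrence_vectors(docs: List[List[str]]) -> Dict[str, Counter]:
    """Map each phrase to counts of the phrases sharing a document with it."""
    vectors: Dict[str, Counter] = {}
    for phrases in docs:
        unique = set(phrases)
        for p in unique:
            vec = vectors.setdefault(p, Counter())
            for q in unique:
                if q != p:
                    vec[q] += 1
    return vectors


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical documents, each reduced to its extracted phrases.
docs = [
    ["war in Iraq", "economic cost", "oil fields"],
    ["war in Iraq", "economic cost"],
    ["immigration", "citizenship"],
]
vectors = cooccurrence_vectors(docs)
related = cosine(vectors["war in Iraq"], vectors["economic cost"])
unrelated = cosine(vectors["war in Iraq"], vectors["immigration"])
```

Phrases that share neighbours get a positive similarity, and phrases from disjoint documents get zero. A SOM would then place similar vectors on nearby neurons.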


Our Approach

Hierarchy Discovery

3 approaches:
  Subsumption
  Clustering
    Find a group of dominating clusters (neurons)
    Make these superclusters and put neighbours one level down
    Repeat for lower levels of the hierarchy under each subcluster
  Use DBpedia and/or YAGO


Our Approach

Hierarchy Discovery

3 approaches:
  Subsumption
  Clustering
  Use DBpedia and/or YAGO
    DBpedia provides 3 classification schemes:
      Wikipedia categories
      YAGO classification
      WordNet synset links



Our Approach

Sentiment Analysis

2 ways to approach this:
  Subjective phrases
    What does the public think about each policy
  Objective phrases
    What is the policy
    Rank parties for each policy on a scale from right-wing to left-wing


Our Approach

Sentiment Analysis

Subjective phrases
  What does the public think about each policy
  Cut-based classification (Pang and Lee)
    Individual scores
    Association scores
    Partition cost
  A cut (S, T) of G is a partition of its nodes into sets S = {s} ∪ S' and T = {t} ∪ T', where s is not contained in S' and t is not contained in T'. Its cost, cost(S, T), is the sum of the weights of all edges crossing from S to T.
  A minimum cut of G is one of minimum cost.
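To make the partition cost concrete, the sketch below scores every two-way labelling of a few items and keeps the cheapest one. In graph terms, an item's individual scores are the weights of its edges to s and t, and association scores are the weights of item-to-item edges. This exhaustive search is a stand-in for a proper max-flow/min-cut solver and is only feasible for tiny inputs. The function names and all scores are hypothetical.

```python
from itertools import product
from typing import Dict, Tuple


def partition_cost(
    labels: Dict[str, int],
    ind: Dict[str, Tuple[float, float]],
    assoc: Dict[Tuple[str, str], float],
) -> float:
    """Cost of a partition: individual scores given up, plus association
    scores of pairs that end up in different classes."""
    # An item placed in class c pays its individual score for the other class.
    cost = sum(ind[x][1 - c] for x, c in labels.items())
    cost += sum(w for (a, b), w in assoc.items() if labels[a] != labels[b])
    return cost


def min_cost_partition(ind, assoc):
    """Exhaustive search over all labellings (assumption: few items)."""
    items = list(ind)
    best = None
    for combo in product((0, 1), repeat=len(items)):
        labels = dict(zip(items, combo))
        cost = partition_cost(labels, ind, assoc)
        if best is None or cost < best[0]:
            best = (cost, labels)
    return best


# Hypothetical scores: ind[x] = (score for class 0, score for class 1).
ind = {"s1": (0.9, 0.1), "s2": (0.6, 0.4), "s3": (0.2, 0.8)}
assoc = {("s1", "s2"): 0.5, ("s2", "s3"): 0.1}
cost, labels = min_cost_partition(ind, assoc)
```

With these scores, the strong association between s1 and s2 keeps them in the same class, and s3 is split off because its individual preference outweighs its weak link to s2.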


Our Approach

Sentiment Analysis

Subjective phrases
  What does the public think about each policy
  Agarwal and Bhattacharyya
    Determine adjective strength
    Cut-based classification between sentences (Pang and Lee)
    Cut-based classification between documents
    Improved accuracy


Our Approach

Sentiment Analysis

Objective phrases
  What is the policy
  Rank parties for each policy on a scale from right-wing to left-wing
Definition of polarity would be left/right, using a comparison of left-wing and right-wing policies/ideals
Instead of the traditional positive/negative, using the ideal words ‘poor’ and ‘excellent’

[Figure: scale from Left-wing (Liberal) to Right-wing (Conservative)]


Example

The economic cost of the war in Iraq is estimated to total $1.3 trillion – roughly double the amount the White House has requested thus far, according to a new report by Democrats on Congress’ Joint Economic Committee. I think this is an absurd amount of money to be spending on killing people and freeing oil fields.

Ideal case:
  Political Organization = Republicans
  Politician = George Bush
  Topic = War in Iraq
  Sub-topic = cost
  Opinion words = absurd, killing, freeing
  Polarity = negative


Feature Extraction:
  Noun phrases: economic cost, war in Iraq, amount, report, amount, money, people, oil fields
  Proper nouns: White House, Democrats on Congress Joint Economic Committee
  Frequent features: economic cost, war in Iraq, money, oil fields, White House


Identification of Subjectivity:
  Opinion words: think, absurd
  1st sentence is objective, and 2nd is subjective
  Interesting features: economic cost, war in Iraq


Hierarchy Discovery (step 1):
  Identification of category/object for proper nouns using DBpedia
  Category = Republicans
  Object = George Bush



Hierarchy Discovery (step 2):
  Identification of policy hierarchy using subsumption and clustering
  Policies are derived from interesting features: economic cost, war in Iraq



Sentiment Analysis:
  Opinion is the subjective sentence
  Polar words: absurd, spending, killing, freeing
  Polarity: Negative


Challenges

Difficult to distinguish between objective and subjective information
Opinion words also occur in objective sentences
Identification of spam blogs
Identification of implicit features
Mapping politician to the policy in comparison blogs
Deciding on a distance measurement for clustering


Future Work

Implementation of algorithms
Summarization of opinions
Visualization
Refinements


Milestones

Decide on domain
Read previous works
Decide on an approach that is best for the domain
Write up an example to illustrate it
Challenges and future work
Presentation
Write the paper


Questions?


Previous Work

OPINE (Backup Slide)

Overall Process


Previous Work

Opinion Observer (Backup Slide)

By Bing Liu and Minqing Hu
