Unsupervised Approaches

Aditya M Joshi
Center for Indian Language Technologies (CFILT), IIT Bombay
20th June, 2016
[email protected] | [email protected]


Page 1

Unsupervised Approaches

Aditya M Joshi
Center for Indian Language Technologies (CFILT), IIT Bombay
20th June, 2016

[email protected] [email protected]

Page 2

Images from Wikimedia Commons

Page 3

Page 4

Unsupervised Approaches

• A technique that infers a function describing hidden structure in unlabelled data

• Uses unlabelled data for prediction tasks

Page 5

Popular Approaches

• Clustering

• Latent Dirichlet Allocation (LDA) Model

Page 6

clustering

Page 7

Clustering

• Find clusters in a set of data points

Page 8

Clustering

• Find clusters in a set of data points

Page 9

k-means Clustering

• Dataset {x1, x2, …, xn}

• Goal: partition the n observations into k clusters

• Membership of point xn in cluster k is indicated by rnk ∈ {0, 1}

• Goal, redefined: minimize the distortion J = Σn Σk rnk ‖xn − μk‖², where μk is the mean of cluster k

Page 10

Algorithm

• Initialisation: pick k of the data points to be the initial means.

• Assignment: go over each data point and assign it to the closest mean, e.g. if data point xn is closest to the second mean, assign it to that mean.

• Update: recompute each of the means as the average of all the points assigned to it.

• Repeat the assignment and update steps until the assignments stop changing.
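The steps above can be sketched in a few lines of NumPy (an illustrative sketch, not code from the lecture; the two-blob toy data is made up):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Minimal k-means on an (n, d) array; returns (assignments, means)."""
    rng = np.random.default_rng(seed)
    # Initialisation: pick k of the data points to be the initial means.
    means = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assignment: each point goes to the closest mean (this is r_nk).
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # Update: recompute each mean as the average of its assigned points.
        new_means = np.array([X[assign == j].mean(axis=0)
                              if (assign == j).any() else means[j]
                              for j in range(k)])
        if np.allclose(new_means, means):   # converged: means are stable
            break
        means = new_means
    return assign, means

# Toy data: two well-separated 2-D blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(5.0, 1.0, (50, 2))])
assign, means = kmeans(X, k=2)
```

Each iteration can only decrease the distortion J, which is why the loop terminates once the means stop moving.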

Page 11

Illustration

Page 12

latent dirichlet allocation models

Page 13

Outline

• Motivation and Introduction (Blei (2011))

• Building blocks of LDA: Dirichlet and Multinomials (Kulis (2012))

• Estimation using LDA (Heinrich (2004))

• Evaluation of LDA (Wallach (2009))

• Plugging in sentiment (Jo & Oh (2011), Lin & He (2009))

• Experimentation

Page 14

Outline

• Motivation and Introduction (Blei (2011))

• Building blocks of LDA: Dirichlet and Multinomials (Kulis (2012))

• Estimation using LDA (Heinrich (2004))

• Evaluation of LDA (Wallach (2009))

• Plugging in sentiment (Jo & Oh (2011), Lin & He (2009))

• Experimentation

Page 15

Revisiting classifiers

What did Prof. Pushpak Bhattacharyya talk about in the "Topics in NLP" lecture today?

Lecture transcript → Classifier → NLP, Databases, Compilers (or finer labels: SA, MT, Wordnet)

Topic models can do much more than this, using only an unlabeled corpus.

Page 16

"Topic-document distribution"

Lectures from 2008 to 2013 → Topic Modeler

[Figure: words grouped into three clusters labelled NLP, Academic, and Cultural, e.g. "strong AI", "parser", "alignment", "ACL", "thwarting", "co-reference resolution"; "demo", "RPC", "MTP"; "Krishna", "Raag", "Mahabharat", "Swar-sandhya"]

*Hypothetical example

NLP = 0.7, Academic = 0.2, Cultural = 0.1

Proportion of each topic in a document: "multiple membership"

* And in context of sentiment analysis?

Page 17

"Word-topic distribution"

"Aaditya, you are not making sense." / "Let's study word sense disambiguation."

Lectures from 2008 to 2013 → Topic Modeler

[Figure: the word "sense" appears in different topic clusters, alongside words such as "logic", "explanation", "confused", "wordnet", "polysemy", "iterative", "word"]

*Hypothetical example

"Relevance of each word to a topic": words across "topics" actually indicate different senses in which a word occurs.

Page 18

Definition

• Topic models are a suite of algorithms that discover thematic structures in a data collection (Blei (2011)).

• What is a thematic structure? A topic: a collection of words.

• Used for a wide variety of tasks such as author recognition, aspect extraction, and sentiment modelling.

Page 19

Black box

Unlabeled corpus → Topic Modeler → document-topic distribution + overall word-topic distribution

FAQs:
• Can you predict a test document directly? Not directly.
• Is there only one way to construct a topic model? No. By intelligently structuring the model, you can derive useful information.

Page 20

LDA Model

• The Latent Dirichlet Allocation (LDA) model is a basic probabilistic topic model.

• This presentation focuses on LDA and its adaptations, with sentiment as the goal.

Page 21

Plate Notation (1/2)

[Diagram: a word node w inside a plate of size Nd, nested in a plate of size D, represents an unlabeled corpus; adding an outer label plate L gives a labeled corpus; attaching a node z to each word introduces the topic.]

w: word; z: topic (latent)

Page 22

Plate Notation (2/2)

[Diagram: three variants of attaching the latent topic z to the word plate — per word, per document, and per sentence (an extra plate Ns).]

Word-level topics; document-level topics; sentence-level topics.

Page 23

Growing LDA further

[Diagram: plate model with w inside plate Nd inside plate D, topic z, and two parameters: θ per document and ϕ inside a plate Z.]

θ(z): NLP = 0.7, culture = 0.2, motivation = 0.1

ϕ(z, word): (NLP, sense) = 0.7, (culture, sense) = 0.1, (motivation, sense) = 0.2

Let us now focus on these two multinomial distributions.

Page 24

Outline

• Motivation and Introduction (Blei (2011))

• Building blocks of LDA: Dirichlet and Multinomials (Kulis (2012))

• Estimation using LDA (Heinrich (2004))

• Evaluation of LDA (Wallach (2009))

• Plugging in sentiment (Jo & Oh (2011), Lin & He (2009))

• Experimentation

Page 25

Multinomial distribution

• Training an LDA model implies learning the parameters of its multinomial distributions (θ and ϕ).

• We now focus on a multinomial distribution and the way it is modelled in the case of LDA.

Page 26

Parameter estimation (Heinrich (2004))

Posterior: P(θ|x) = P(x|θ) P(θ) / P(x)
(likelihood × prior, divided by the marginal likelihood)

Marginal likelihood: P(x) = ∫ P(x|θ) P(θ) dθ

Since P(x) does not depend on θ: P(θ|x) ∝ P(x|θ) P(θ)

Why estimate the posterior P(θ|x)? Goal: to estimate θd and ϕwz as accurately as possible, given the data (documents). The two are categorical distributions.

Page 27

Binomial distribution & MLE

• Toss of a biased coin: P(X=1) = q, P(X=0) = 1 − q

• Data: X = {x1, x2, ..., xn}, with P(xi|q) = q^xi (1 − q)^(1 − xi)

MLE = argmax_q P(X|q)
    = argmax_q P(x1|q) · P(x2|q) · ... · P(xn|q)
    = argmax_q q^(x1 + x2 + ... + xn) (1 − q)^(n − (x1 + x2 + ... + xn))
    = argmax_q q^m (1 − q)^(n − m),  where m = Σ xi

Taking logs: argmax_q (m log q + (n − m) log(1 − q))

Equating the derivative to zero: m/q = (n − m)/(1 − q), hence q = m/n

m and n are the "sufficient statistics" of a binomial distribution.
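The closed form q = m/n is easy to check numerically (an illustrative sketch; the simulated coin and the grid search are not part of the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
q_true = 0.3
x = rng.random(100_000) < q_true        # n tosses of a biased coin
m, n = int(x.sum()), len(x)             # sufficient statistics m, n

q_mle = m / n                           # the closed-form MLE derived above

# Sanity check: q = m/n really maximises the log-likelihood
# m*log(q) + (n-m)*log(1-q) over a grid of candidate q values.
grid = np.linspace(0.01, 0.99, 99)
loglik = m * np.log(grid) + (n - m) * np.log(1 - grid)
q_grid = grid[loglik.argmax()]
```

With 100,000 tosses, both the closed form and the grid maximiser land very close to the true q = 0.3.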

Page 28

MAP of Binomial distribution (1/2)

MAP = argmax_q P(q|X) = argmax_q P(X|q) P(q) = argmax_q q^m (1 − q)^(n − m) P(q)

Problem! Strictly speaking, P(q) can be any distribution — computationally difficult!

Assume P(q) is a beta distribution: P(q) = q^(α − 1) (1 − q)^(β − 1) / B(α, β)

Page 29

MAP of Binomial distribution (2/2)

MAP = argmax_q q^m (1 − q)^(n − m) P(q)
    ∝ argmax_q q^m (1 − q)^(n − m) · q^(α − 1) (1 − q)^(β − 1)
    ∝ argmax_q q^(m + α − 1) (1 − q)^(n − m + β − 1)

Taking logs: argmax_q ((m + α − 1) log q + (n − m + β − 1) log(1 − q))

Equating the derivative to zero:
(m + α − 1)/q = (n − m + β − 1)/(1 − q)
(m + α − 1) − q(m + α − 1) = q(n − m + β − 1)
(m + α − 1) = q(m + α − 1 + n − m + β − 1)
q = (m + α − 1)/(n + α + β − 2)

The beta distribution is the conjugate prior of the binomial distribution.

Page 30

Conjugate prior

• A prior distribution is conjugate to a likelihood if the resulting posterior has the same form as the prior.

• "Algebraic convenience"

The beta distribution is a conjugate prior of the binomial distribution. What is it for the categorical distribution?
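A quick numeric check of beta-binomial conjugacy (an illustrative sketch; the prior α = β = 2 and the toss counts are made-up toy values):

```python
# Prior Beta(alpha, beta); data: m heads out of n tosses (toy values).
alpha, beta, m, n = 2.0, 2.0, 7, 10

# Conjugacy: the posterior is again a Beta, with updated parameters.
post_a, post_b = alpha + m, beta + (n - m)

# MAP from the closed form derived on the previous slide ...
q_map = (m + alpha - 1) / (n + alpha + beta - 2)
# ... equals the mode of Beta(post_a, post_b), which is (a-1)/(a+b-2).
q_mode = (post_a - 1) / (post_a + post_b - 2)
```

Both routes give 8/12 ≈ 0.667: maximising likelihood × prior and taking the mode of the conjugate posterior are the same computation.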

Page 31

Categorical distribution

• Roll of a (biased) die:
P(X=1) = q1, P(X=2) = q2, ..., P(X=6) = q6; in general, P(xi = k|q) = qk

• Data: X = {x1, ..., xN} ~ Cat(q)

P(X|q) = P(x1|q) · P(x2|q) · ... · P(xN|q) = Π_j qj^cj,  where cj is the count of outcome j

MAP = argmax_q P(X|q) P(q) = argmax_q Π_j qj^cj P(q)

Assume a Dirichlet prior: P(q) ∝ Π_j qj^(αj − 1)

MAP ∝ argmax_q Π_j qj^cj qj^(αj − 1) ∝ argmax_q Π_j qj^(αj + cj − 1)

The Dirichlet distribution is the conjugate prior of the categorical distribution.
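The MAP estimate for the categorical case normalises the posterior exponents αj + cj − 1, mirroring the binomial result. A tiny check with made-up die counts (illustrative only):

```python
# Toy die rolls: c[j] is the count of face j+1; Dirichlet prior alpha_j = 2.
c = [10, 4, 6, 8, 2, 10]
alpha = [2.0] * 6

# MAP of a categorical with a Dirichlet prior: normalise (c_j + alpha_j - 1).
num = [cj + aj - 1 for cj, aj in zip(c, alpha)]
q_map = [x / sum(num) for x in num]
```

The prior acts like pseudo-counts of αj − 1 extra observations per face, smoothing the raw frequencies.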

Page 32

Binomial & Categorical distribution

Binomial:    q ~ Beta(α, β),  x ~ Binomial(q)
Categorical: θ ~ Dir(α),      z ~ Categorical(θ),  with P(z|θ) = θz

In each case: hyper-parameters → distribution → random variable assignments.

Does the name Latent Dirichlet Allocation seem justifiable now?

Page 33

Our first LDA model

[Plate diagram: hyper-parameter α → θ → z → w, with w inside plate Nd inside plate D; hyper-parameter β → ϕ, with ϕ inside plate Z, feeding w.]

Page 34

Outline

• Motivation and Introduction (Blei (2011))

• Building blocks of LDA: Dirichlet and Multinomials (Kulis (2012))

• Estimation using LDA (Heinrich (2004))

• Evaluation of LDA (Wallach (2009))

• Plugging in sentiment (Jo & Oh (2011), Lin & He (2009))

• Experimentation

Page 35

Estimation of LDA model

P(θ, ϕ|w) = P(w|θ, ϕ) P(θ, ϕ) / P(w)

The denominator is computationally intractable. Hence, Gibbs sampling is used.

We now describe the generative story.

Every LDA paper has:
• a plate notation
• a generative story
• Gibbs sampling formulas

Page 36

Generative story

[Plate diagram as on the previous slide: α → θ → z → w inside plates Nd and D; β → ϕ inside plate Z.]

For each topic,
    sample ϕ ~ Dir(β)
For each document,
    generate θ ~ Dir(α)
    For each word,
        sample z ~ Multinomial(θ)
        sample w ~ ϕ(z)
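The generative story can be run directly as simulation code (an illustrative sketch; all sizes and hyper-parameters are made-up toy values):

```python
import numpy as np

rng = np.random.default_rng(0)
V, Z, D, Nd = 50, 3, 10, 20        # vocabulary, topics, documents, words/doc
alpha, beta = 0.5, 0.1             # Dirichlet hyper-parameters

# For each topic, sample a word distribution phi_z ~ Dir(beta).
phi = rng.dirichlet(np.full(V, beta), size=Z)

docs = []
for _ in range(D):
    theta = rng.dirichlet(np.full(Z, alpha))        # theta ~ Dir(alpha)
    words = []
    for _ in range(Nd):
        z = rng.choice(Z, p=theta)                  # z ~ Multinomial(theta)
        words.append(int(rng.choice(V, p=phi[z])))  # w ~ phi(z)
    docs.append(words)
```

Running the story forward like this produces a synthetic corpus; inference (next slides) is the reverse problem of recovering θ and ϕ from such a corpus.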

Page 37

Implementing topic models

[Plate diagram as before: α → θ → z → w inside plates Nd and D; β → ϕ inside plate Z.]

For each topic,
    sample ϕ ~ Dir(β)
For each document,
    generate θ ~ Dir(α)
    For each word,
        sample z ~ Multinomial(θ)
        sample w ~ ϕ(z)

Page 38

Sampling from multinomial

Input: θ with P(z=0) = 0.1, P(z=1) = 0.3, P(z=2) = 0.6

Goal: sample a z given this distribution.

Draw u ~ Uniform(0, 1) and pick the interval of the cumulative distribution it falls in:
z=0: [0, 0.1), z=1: [0.1, 0.4), z=2: [0.4, 1]
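Picking the interval that a uniform draw falls into is inverse-CDF sampling; a minimal sketch using the distribution from this slide:

```python
import random

def sample_categorical(theta, u=None):
    """Inverse-CDF sampling: draw u ~ Uniform(0, 1) and return the
    index of the cumulative-distribution interval that u falls in."""
    u = random.random() if u is None else u
    cum = 0.0
    for z, p in enumerate(theta):
        cum += p                # boundaries for this slide: 0, 0.1, 0.4, 1
        if u < cum:
            return z
    return len(theta) - 1       # guard against floating-point round-off

theta = [0.1, 0.3, 0.6]         # P(z=0), P(z=1), P(z=2) from the slide
```

For example, u = 0.2 lands in [0.1, 0.4) and returns z = 1; over many draws each z appears with frequency close to its probability.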

Page 39

Implementing topic models

[Plate diagram as before: α → θ → z → w inside plates Nd and D; β → ϕ inside plate Z.]

For each topic,
    sample ϕ ~ Dir(β)
For each document,
    generate θ ~ Dir(α)
    For each word,
        sample z ~ Multinomial(θ)
        sample w ~ ϕ(z)

Page 40

Gibbs sampling

Initialize all word positions to random z's; compute θ and ϕ accordingly.
For each iteration,
    For each document,
        For each word,
            generate a z based on θ
            generate a w based on ϕ(w|z)
    Compute θ and ϕ
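One common concrete realisation of this loop is collapsed Gibbs sampling, in which θ and ϕ are integrated out and each word's topic z is resampled from count statistics; the sketch below follows that variant (an assumption on my part — the slide's pseudocode is more schematic) with made-up toy documents:

```python
import numpy as np

def lda_gibbs(docs, V, Z, alpha=0.5, beta=0.1, iters=50, seed=0):
    """Collapsed Gibbs sampling for LDA; docs is a list of word-id lists."""
    rng = np.random.default_rng(seed)
    D = len(docs)
    ndz = np.zeros((D, Z))            # per-document topic counts
    nzw = np.zeros((Z, V))            # per-topic word counts
    nz = np.zeros(Z)                  # words assigned to each topic
    # Initialise all word positions to random z's and fill the counts.
    z_assign = [rng.integers(Z, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):
        for w, z in zip(doc, z_assign[d]):
            ndz[d, z] += 1; nzw[z, w] += 1; nz[z] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = z_assign[d][i]
                # Remove this position's assignment from the counts ...
                ndz[d, z] -= 1; nzw[z, w] -= 1; nz[z] -= 1
                # ... and resample z from its full conditional.
                p = (ndz[d] + alpha) * (nzw[:, w] + beta) / (nz + V * beta)
                z = rng.choice(Z, p=p / p.sum())
                z_assign[d][i] = z
                ndz[d, z] += 1; nzw[z, w] += 1; nz[z] += 1
    # Point estimates of the two multinomials from the final counts.
    theta = (ndz + alpha) / (ndz + alpha).sum(axis=1, keepdims=True)
    phi = (nzw + beta) / (nzw + beta).sum(axis=1, keepdims=True)
    return theta, phi

docs = [[0, 1, 0, 2], [3, 4, 3, 4], [0, 2, 1, 0]]
theta, phi = lda_gibbs(docs, V=5, Z=2)
```

After enough iterations, θ and ϕ read off from the counts are the document-topic and word-topic distributions the black-box slide promised.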

Page 41

Outline

• Motivation and Introduction (Blei (2011))

• Building blocks of LDA: Dirichlet and Multinomials (Kulis (2012))

• Estimation using LDA (Heinrich (2004))

• Evaluation of LDA (Wallach (2009))

• Plugging in sentiment (Jo & Oh (2011), Lin & He (2009))

• Experimentation

Page 42

Evaluation

• Qualitative evaluation (understanding topic cohesion) (Mukherjee et al. (2012))

• Classification accuracy based on the topics uncovered

• Held-out likelihood (likelihood of data given parameters) (Wallach et al. (2009))

A naïve addition:
• Measuring sentiment cohesion: count of positive and negative words in each topic
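The naïve sentiment-cohesion idea could be implemented as a simple word count (a hypothetical sketch; the word lists and the cohesion formula are my own illustration, not from the slides):

```python
# Hypothetical sentiment word lists (illustrative, not from the slides).
positive = {"amazing", "hilarious", "great", "fans"}
negative = {"scary", "disappoint", "problem", "gore"}

def sentiment_cohesion(topic_words):
    """Naive cohesion: of the sentiment-bearing words in a topic,
    what fraction agrees with the majority polarity?"""
    pos = sum(w in positive for w in topic_words)
    neg = sum(w in negative for w in topic_words)
    total = pos + neg
    return max(pos, neg) / total if total else 0.0

topic_38 = ["horror", "killer", "scary", "house", "gore"]
```

A cohesion of 1.0 means every sentiment word in the topic shares one polarity; 0.5 means the topic mixes polarities evenly.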

Page 43

Outline

• Motivation and Introduction (Blei (2011))

• Building blocks of LDA: Dirichlet and Multinomials (Kulis (2012))

• Estimation using LDA (Heinrich (2004))

• Evaluation of LDA (Wallach (2009))

• Plugging in sentiment (Jo & Oh (2011), Lin & He (2009))

• Experimentation

Page 44

Experiments with LDA

• Goal: understand topic models and obtain sentiment-coherent topics from an LDA model

• Implementation:
  – Topic model implementation using Gibbs sampling
  – Hyper-parameter estimation as given in Heinrich (2009)
  – "Left to right" likelihood algorithm by Wallach (2009)

Page 45

Data set

• Movie review data set from Amazon by McAuley & Leskovec (2013):
  – Training data set: 11000 movie reviews
  – Test data set: 2000 movie reviews

• Average length of a review: ~140 words

Page 46

Effect of hyper-parameter estimation

Page 47

Discovering sentiment-coherent topics

Modify basic LDA in one of the following ways:

1) Bootstrapping sentiment priors with word lists

2) Modifying the structure of the topic model

Page 48

Existing topic models

• Lin & He (2009) present a Joint Sentiment-Topic Model with sentiment as a latent variable.

• Jo & Oh (2011) extract senti-aspects: (sentiment, feature) pairs.

• Titov & McDonald (2008) use a sliding window model to incorporate discourse nature of reviews.

• Mukherjee & Liu (2012b) identify words belonging to six types of review comment expressions from an unlabeled corpus.

Page 49

Discovering sentiment-coherent topics

Modify basic LDA in one of the following ways:

1) Bootstrap sentiment priors with word lists

2) Modifying the structure of the topic model

Page 50

Discovering sentiment: Use of priors

• Induce positive and negative words to belong to certain topics (based on Lin & He (2009)):
  – For negative words, set beta(word, z = 0 to Z/2) = 2·beta and beta(word, z = Z/2 to Z) = 0.
  – Set the corresponding beta values for positive words.
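The prior scheme above can be written out concretely (a hypothetical sketch; the vocabulary, word lists, and matrix layout are illustrative assumptions):

```python
import numpy as np

V, Z = 6, 4                      # toy vocabulary size and topic count
vocab = ["good", "great", "bad", "awful", "movie", "plot"]
positive_words = {"good", "great"}
negative_words = {"bad", "awful"}
beta = 0.1

# Start from a symmetric prior over (topic, word) ...
B = np.full((Z, V), beta)
# ... then skew it: negative words get extra mass in the first Z/2 topics
# and zero in the rest; positive words get the mirror image.
for w, word in enumerate(vocab):
    if word in negative_words:
        B[: Z // 2, w] = 2 * beta
        B[Z // 2 :, w] = 0.0
    elif word in positive_words:
        B[: Z // 2, w] = 0.0
        B[Z // 2 :, w] = 2 * beta
```

Neutral words keep the symmetric prior, so only the seeded sentiment words pull topics toward one polarity.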

Page 51

Use of Priors: Results (1/2)

• "Basic": imposing priors on only 12 sentiment words

• Leads to more sentiment words being identified in the correct topics

Page 52

Use of Priors: Results(2/2)

Qualitative evaluation: some topics are positive while others are negative, depending on the priors.

Topic 38: 7.330 horror, 2.392 killer, 2.248 scary, 2.147 house, 2.072 gore

Topic 13: 6.931 michael, 3.929 fans, 3.423 live, 2.379 amazing, 2.354 concert

Page 53

Discovering sentiment-coherent topics

Modify basic LDA in one of the following ways:

1) Bootstrap sentiment priors with word lists

2) Modifying the structure of the topic model

Page 54

Discovering sentiment: Modifying structure

• Sentiment is explicitly modelled as a latent variable (based on the joint sentiment/topic model by Lin & He (2009)).

[Diagrams: SLDA and SLDA-Split]

Page 55

Sentiment as a Variable: Results (1/2)

Parameters: Z = 70; S = 2

Page 56

Sentiment as a Variable: Results (2/2)

• SLDA

• SLDA-Split

For S = 3:

Topic 13, s = 0: 9.551 show, 9.254 humor, 7.166 comedy, 4.846 watch, 4.680 hilarious
Topic 13, s = 2: 6.964 rock, 5.547 children, 5.38 school, 4.636 remember, 3.432 learn
No equivalence between topic 13 for s = 0 and s = 2.

Topic 31, s = 0: 8.006 product, 6.277 received, 5.244 amazon, 4.119 condition, 4.043 seller
Topic 31, s = 1: 5.206 return, 4.661 problem, 4.412 disappoint, 3.654 case, 3.616 copy
Topic 31, s = 2: 10.358 amazon, 9.213 play, 7.068 player, 3.651 dvds, 3.594 purchased
A topic essentially implies "different polarities" in the same "context".

Page 57

Conclusion

• Unsupervised approaches rely on unlabelled data

• We looked at k-means clustering

• We also looked at unsupervised/semi-supervised approaches like LDA

Page 58

References (1/2)

• Balamurali, A., Joshi, A., & Bhattacharyya, P. (2011). Harnessing wordnet senses for supervised sentiment classification. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 1081–1091). Association for Computational Linguistics.
• Balamurali, A., Joshi, A., & Bhattacharyya, P. (2012). Cross-lingual sentiment analysis for Indian languages using linked wordnets. In COLING (Posters) (pp. 73–82).
• Balamurali, A., Khapra, M. M., & Bhattacharyya, P. (2013). Lost in translation: viability of machine translation for cross language sentiment analysis. In Computational Linguistics and Intelligent Text Processing (pp. 38–49). Springer.
• Banea, C., Mihalcea, R., Wiebe, J., & Hassan, S. (2008). Multilingual subjectivity analysis using machine translation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 127–135). Association for Computational Linguistics.
• Blei, D. M. (2011). Introduction to probabilistic topic models.
• Blei, D. M., Ng, A. Y., Jordan, M. I., & Lafferty, J. (2003). Latent dirichlet allocation. Journal of Machine Learning Research, 3, 2003.
• Boyd-Graber, J., Chang, J., Gerrish, S., Wang, C., & Blei, D. (2009). Reading tea leaves: How humans interpret topic models. In Neural Information Processing Systems (NIPS).
• Brody, S. & Elhadad, N. (2010). An unsupervised aspect-sentiment model for online reviews. In HLT-NAACL (pp. 804–812). The Association for Computational Linguistics.
• Carl, M. (2012). Translog-II: a program for recording user activity data for empirical reading and writing research. In LREC (pp. 4108–4112).
• Dragsted, B. (2010). Coordination of reading and writing processes in translation. Translation and Cognition, American Translators Association Scholarly Monograph Series. Amsterdam/Philadelphia: Benjamins, 41–62.
• Duh, K., Fujino, A., & Nagata, M. (2011). Is machine translation ripe for cross-lingual sentiment classification? In ACL (Short Papers) (pp. 429–433).
• Fellbaum, C. (2010). Wordnet: An electronic lexical database. 1998. WordNet is available from http://www.cogsci.princeton.edu/wn.
• Jo, Y. & Oh, A. (2011). Aspect and sentiment unification model for online review analysis. In Proceedings of the fourth ACM international conference on Web search and data mining (pp. 815–824). ACM.

Page 59

References (2/2)

• Joshi, S., Kanojia, D., & Bhattacharyya, P. (2013). More than meets the eye: Study of human cognition in sense annotation. In Proceedings of NAACL-HLT (pp. 733–738).
• Kulis, B. (2012). Conjugate priors.
• Lin, C. & He, Y. (2009). Joint sentiment/topic model for sentiment analysis. In Cheung, D. W.-L., Song, I.-Y., Chu, W. W., Hu, X., & Lin, J. J. (Eds.), CIKM (pp. 375–384). ACM.
• Lu, B., Tan, C., Cardie, C., & Tsou, B. K. Joint bilingual sentiment classification with unlabeled parallel corpora.
• McAuley, J. J. & Leskovec, J. (2013). From amateurs to connoisseurs: modeling the evolution of user expertise through online reviews. In Proceedings of the 22nd international conference on World Wide Web (pp. 897–908). International World Wide Web Conferences Steering Committee.
• McCallum, A. (2002). MALLET: A machine learning for language toolkit.
• Meng, X., Wei, F., Liu, X., Zhou, M., Xu, G., & Wang, H. (2012). Cross-lingual mixture model for sentiment classification. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers-Volume 1 (pp. 572–581). Association for Computational Linguistics.
• Mukherjee, A. & Liu, B. (2012a). Aspect extraction through semi-supervised modeling. In ACL (1) (pp. 339–348). The Association for Computer Linguistics.
• Mukherjee, A. & Liu, B. (2012b). Modeling review comments. In ACL (1) (pp. 320–329). The Association for Computer Linguistics.
• Mukherjee, A. & Liu, B. (2013). Discovering user interactions in ideological discussions. In ACL (1) (pp. 671–681). The Association for Computer Linguistics.
• Mukherjee, S. & Bhattacharyya, P. (2012). WikiSent: Weakly supervised sentiment analysis through extractive summarization with Wikipedia. In Machine Learning and Knowledge Discovery in Databases (pp. 774–793). Springer.
• Nallapati, R., Ahmed, A., Xing, E. P., & Cohen, W. W. (2008). Joint latent topic models for text and citations. In Li, Y., Liu, B., & Sarawagi, S. (Eds.), KDD (pp. 542–550). ACM.
• Pang, B. & Lee, L. (2004). A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In Proceedings of the 42nd annual meeting on Association for Computational Linguistics (p. 271). Association for Computational Linguistics.
• Pang, B. & Lee, L. (2008). Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1-2), 1–135.
• Prettenhofer, P. & Stein, B. (2010). Cross-language text classification using structural correspondence learning. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (pp. 1118–1127). Association for Computational Linguistics.
• Rosen-Zvi, M., Griffiths, T., Steyvers, M., & Smyth, P. (2004). The author-topic model for authors and documents. In 20th Conference on Uncertainty in Artificial Intelligence, volume 21, Banff Park Lodge, Banff, Canada.
• Scott, G. G., O'Donnell, P. J., & Sereno, S. C. (2012). Emotion words affect eye fixations during reading. Journal of Experimental Psychology: Learning, Memory, and Cognition, 38(3), 783.
• Searle, J. R. (1992). The rediscovery of the mind. The MIT Press.
• Titov, I. & McDonald, R. T. (2008a). A joint model of text and aspect ratings for sentiment summarization. In McKeown, K., Moore, J. D., Teufel, S., Allan, J., & Furui, S. (Eds.), ACL (pp. 308–316). The Association for Computer Linguistics.
• Titov, I. & McDonald, R. T. (2008b). Modeling online reviews with multi-grain topic models. CoRR, abs/0801.1063.
• Wallach, H. M., Mimno, D. M., & McCallum, A. (2009). Rethinking LDA: Why priors matter. In NIPS, volume 22 (pp. 1973–1981).
• Wallach, H. M., Murray, I., Salakhutdinov, R., & Mimno, D. (2009). Evaluation methods for topic models. In Proceedings of the 26th Annual International Conference on Machine Learning (pp. 1105–1112). ACM.
• Wang, X., McCallum, A., & Wei, X. (2007). Topical n-grams: Phrase and topic discovery, with an application to information retrieval. In Proceedings of the 7th IEEE International Conference on Data Mining (ICDM), Nebraska, USA.
• Yin, Y., Zhou, C., & Zhu, J. (2010). A pipe route design methodology by imitating human imaginal thinking. CIRP Annals-Manufacturing Technology, 59(1), 167–170.