Joint Author Sentiment Topic Model. Subhabrata Mukherjee (Max Planck Institute for Informatics); Gaurab Basu and Sachindra Joshi (IBM India Research Lab). April 25, 2014

Joint Author Sentiment Topic Model, Subhabrata Mukherjee, Gaurab Basu and Sachindra Joshi, In Proc. of the SIAM International Conference on Data Mining (SDM 2014), Pennsylvania, USA, Apr 24-26, 2014 [http://people.mpi-inf.mpg.de/~smukherjee/jast.pdf]

Page 1: Joint Author Sentiment Topic Model

Joint Author Sentiment Topic Model

Subhabrata Mukherjee
Max Planck Institute for Informatics

Gaurab Basu and Sachindra Joshi
IBM India Research Lab

April 25, 2014

Page 6: Joint Author Sentiment Topic Model

Aspect Rating and Review Rating

“ [ This film is based on a true-life incident. It sounds like a great plot and the director makes a decent attempt in narrating a powerful story. ] [ However, the film does not quite make the mark due to sloppy acting. ] ”

Identify topics - direction, story and acting
Story has facets - plot and narration

Identify facet sentiments - great (plot), powerful (story), sloppy (acting) etc.

Overall review rating - aggregation of facet-specific sentiments

Why joint modeling ?
Sentiment words help in locating topic words and vice versa
Neighboring words establish semantics / sentiment of terms

Page 11: Joint Author Sentiment Topic Model

Why Author-Specificity ?

“ [ This film is based on a true-life incident. It sounds like a great plot and the director makes a decent attempt in narrating a powerful story. ] [ However, the film does not quite make the mark due to sloppy acting. ] ”

Overall rating varies for authors with different topic preferences
Positive for those with greater preference for story and narration
Negative for those with greater preference for acting

Affective sentiment value varies for authors
How negative is “does not quite make the mark” for me ?

Author writing style helps in locating / associating facets and sentiments
E.g. topic switch, verbosity, use of content and function words etc.
The author makes a topic switch in the above review using the function word “however”

Traditional works learn a global model independent of the review author
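The effect of author-specific topic preferences on the overall rating can be illustrated with a small sketch. The facet sentiment scores and the preference weights below are made-up values for the example review, and the preference-weighted average is an assumed aggregation rule used only for illustration, not the inference procedure of the model.

```python
# Hypothetical facet sentiments read off the example review, on a [-1, 1] scale.
facet_sentiment = {"story": 0.8, "narration": 0.5, "acting": -0.7}

def overall_rating(preferences, sentiments):
    """Preference-weighted average of facet sentiments (assumed aggregation)."""
    total = sum(preferences.values())
    return sum(preferences[f] * sentiments[f] for f in sentiments) / total

# Two hypothetical authors with different topic preferences.
story_lover = {"story": 0.5, "narration": 0.4, "acting": 0.1}
acting_lover = {"story": 0.1, "narration": 0.1, "acting": 0.8}

print(overall_rating(story_lover, facet_sentiment))   # positive score
print(overall_rating(acting_lover, facet_sentiment))  # negative score
```

The same review text yields opposite overall polarity for the two authors, which is the behaviour a global, author-independent model cannot capture.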

Page 12: Joint Author Sentiment Topic Model

Why care about writing style or coherence?

Better association of facets to topics by detecting semantic-syntactic class transitions and topic switch

Semantic dependencies - association between facets and topics

Syntactic dependencies - connection between facets and background words required to make the review coherent and grammatically correct

Page 17: Joint Author Sentiment Topic Model

Contributions

Show that author identity helps in rating prediction

Author-specific generative model of a review that incorporates author-specific
topic and facet preferences
grading style
writing style
and maintains coherence in reviews

Page 22: Joint Author Sentiment Topic Model

Topic Models

1. LDA Model
2. Author-Topic Model
3. Joint Sentiment Topic Model
4. Topic Syntax Model

Page 28: Joint Author Sentiment Topic Model

Generative Process for a Review

Visit Restaurant

Overall Impression
…I think I will give overall rating +4

Topics to write on
I will write about food, ambience and …

Topic Ratings
I will give food +5. It makes awesome Pasta … my favorite !!!
But the ambience is loud … I will give it +2. But I do not care about it much

Topic Opinion
It makes awesome Pasta. But the ambience is loud.

How to write it ?

Page 33: Joint Author Sentiment Topic Model

Generative Process for a Review

How to write it ?

Topic Word ? Background Word ?

New Topic ? Current Topic ?

Topic Label ?

Word

Page 35: Joint Author Sentiment Topic Model

JAST Model

1. For each document d, author a chooses overall rating r ~ Multinomial(Ω) from author-specific overall document rating distribution

Page 36: Joint Author Sentiment Topic Model

JAST Model

2. For each topic z and each sentiment label l, draw ξz,l ~ Dirichlet(γ)
3. For each class c and each sentiment label l = 0, draw ξc,l ~ Dirichlet(δ)

Page 37: Joint Author Sentiment Topic Model

JAST Model

4. Choose author-specific class transition distribution π

Author Writing Style

Page 38: Joint Author Sentiment Topic Model

JAST Model

5. Author a chooses author-rating specific topic-label distribution ϕa, r ~ Dirichlet(α)

Author-Topic Preference
Author Emotional Attachment to Topics
Author Grading Style

Page 42: Joint Author Sentiment Topic Model

JAST Model

6. For each word w in the document

b. If c = 1, draw z, l ~ Multinomial(ϕa,r). Draw w ~ Multinomial(ξz,l).

Semantic Dependencies and Review Coherence

Page 44: Joint Author Sentiment Topic Model

JAST Model

6. For each word w in the document

b. If c = 1, draw z, l ~ Multinomial(ϕa,r). Draw w ~ Multinomial(ξz,l).

d. If c ≠ 1, 2, draw w ~ Multinomial(ξc,l).

Review Coherence and Syntactic Dependencies
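The generative steps on the JAST Model slides can be sketched as a small simulation. This is a toy sketch under stated assumptions, not the authors' implementation: the sizes and hyperparameter values are arbitrary, classes are 0-indexed with class 1 starting a new topic-label pair, and the two sub-steps that do not survive in this transcript (how the class c is drawn, and what happens when c = 2) are filled with assumed behaviour that is marked in the comments. Background classes use a single word distribution each, ignoring the label.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (all hypothetical): vocabulary, topics, labels, classes, ratings.
V, T, L, C, R = 50, 3, 2, 4, 5
gamma, delta, alpha = 0.5, 0.5, 0.5

# Steps 2-3: word distributions for topic-label pairs and for classes.
xi_topic = rng.dirichlet(np.full(V, gamma), size=(T, L))  # shape (T, L, V)
xi_class = rng.dirichlet(np.full(V, delta), size=C)       # shape (C, V)

# Author-specific parameters for one author a.
omega = rng.dirichlet(np.ones(R))                    # rating distribution (step 1)
pi = rng.dirichlet(np.ones(C), size=C)               # class transitions (step 4)
phi = rng.dirichlet(np.full(T * L, alpha), size=R)   # topic-label dist. per rating (step 5)

def generate_review(n_words):
    r = rng.choice(R, p=omega)              # step 1: overall rating
    c_prev, z, l = 0, 0, 0
    words = []
    for _ in range(n_words):
        # ASSUMPTION: the class is drawn from the row of pi for the previous class.
        c = rng.choice(C, p=pi[c_prev])
        if c == 1:
            # Step b: draw a new topic-label pair, then a word from it.
            z, l = divmod(rng.choice(T * L, p=phi[r]), L)
            w = rng.choice(V, p=xi_topic[z, l])
        elif c == 2:
            # ASSUMPTION: c = 2 continues the current topic and label.
            w = rng.choice(V, p=xi_topic[z, l])
        else:
            # Step d: background word drawn from the class distribution.
            w = rng.choice(V, p=xi_class[c])
        words.append(int(w))
        c_prev = c
    return int(r), words

rating, words = generate_review(20)
print(rating, words)
```

Conditioning the topic-label distribution on both the author and the rating is what lets one set of parameters express topic preference, emotional attachment and grading style at once.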

Page 52: Joint Author Sentiment Topic Model

Inference

We use collapsed Gibbs sampling for estimating the parameters

The latent variables are updated jointly by sampling from their conditional distribution
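The conditional distribution used by JAST is not reproduced in this transcript. As a rough illustration of the count-based update pattern that collapsed Gibbs sampling follows, here is a minimal sketch for plain LDA, the first of the topic models listed earlier. It is not the JAST sampler; the function name, hyperparameter values and toy corpus are assumptions made for the example.

```python
import numpy as np

def lda_collapsed_gibbs(docs, V, K, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Collapsed Gibbs sampling for plain LDA (illustration only, not JAST)."""
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), K))  # topic counts per document
    n_kw = np.zeros((K, V))          # word counts per topic
    n_k = np.zeros(K)                # total word count per topic
    z = [rng.integers(K, size=len(doc)) for doc in docs]
    # Initialise the count tables from the random assignments.
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            n_dk[d, k] += 1
            n_kw[k, w] += 1
            n_k[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current assignment from the counts.
                n_dk[d, k] -= 1
                n_kw[k, w] -= 1
                n_k[k] -= 1
                # p(z_i = k | all other assignments, words), up to a constant.
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + V * beta)
                k = rng.choice(K, p=p / p.sum())
                # Record the new assignment.
                z[d][i] = k
                n_dk[d, k] += 1
                n_kw[k, w] += 1
                n_k[k] += 1
    return z, n_dk, n_kw

# Toy corpus of word ids (hypothetical).
docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 4], [0, 2, 1, 5, 3]]
z, n_dk, n_kw = lda_collapsed_gibbs(docs, V=6, K=2)
print(n_dk)
```

A sampler for a richer model keeps the same remove, sample, restore loop but draws several latent variables together and tracks more count tables.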

Page 63: Joint Author Sentiment Topic Model

Dataset for Evaluation

IMDB movie review dataset
TripAdvisor restaurant review dataset

Page 64: Joint Author Sentiment Topic Model

Baselines

Lexical classification using majority voting
Joint Sentiment Topic Model [1]
Author-Topic LR Model [2]

Model Prior
A sentiment lexicon is used to initialize the prior polarity of words in ξT x L[w]

1. Chenghua Lin and Yulan He, Joint sentiment/topic model for sentiment analysis, CIKM '09, pp. 375-384.
2. Subhabrata Mukherjee, Gaurab Basu, and Sachindra Joshi, Incorporating author preference in sentiment rating prediction of reviews, WWW 2013.
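The first baseline, lexical classification using majority voting, can be sketched in a few lines. The word lists below are a tiny hypothetical lexicon, and the tokenization and tie handling are assumptions; the lexicon and rules used in the actual experiments are not given in these slides.

```python
# Tiny hypothetical lexicon; a real system would load a full sentiment lexicon.
POSITIVE = {"great", "decent", "powerful", "awesome"}
NEGATIVE = {"sloppy", "loud", "bad"}

def majority_vote(text):
    """Label a review by counting positive vs. negative lexicon words."""
    tokens = [t.strip(".,!?;:\"'()[]").lower() for t in text.split()]
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"

review = ("It sounds like a great plot and the director makes a decent attempt "
          "in narrating a powerful story. However, the film does not quite make "
          "the mark due to sloppy acting.")
print(majority_vote(review))  # prints "positive"
```

The vote comes out positive for every reader, even though an author who cares mostly about acting may well rate this film negatively; that gap is what the author-specific model targets.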

Page 67: Joint Author Sentiment Topic Model

Model Initialization Parameters

Minimize Model Perplexity
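Perplexity is commonly defined as the exponential of the negative mean per-word log-likelihood on held-out text, with lower values being better. The sketch below uses that standard definition; how the per-word probabilities are obtained from the model, and how candidate parameter settings were compared, are not shown in the slides and are left out here.

```python
import math

def perplexity(word_log_probs):
    """exp of the negative mean per-word log-likelihood."""
    n = len(word_log_probs)
    return math.exp(-sum(word_log_probs) / n)

# A model that assigns every word probability 1/8 has perplexity 8
# (up to floating-point rounding).
uniform = [math.log(1 / 8)] * 100
print(perplexity(uniform))
```

Among several candidate settings, such as different numbers of topics, the one with the lowest held-out perplexity would be preferred.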

Page 70: Joint Author Sentiment Topic Model

Model Comparison with Baselines

IMDB Movie Review Dataset

TripAdvisor Restaurant Review Dataset

Page 71: Joint Author Sentiment Topic Model

Comparison with Top Performing Models in IMDB Dataset

Page 74: Joint Author Sentiment Topic Model

Snapshot of Topic-Label-Word Extraction by JAST

Page 77: Joint Author Sentiment Topic Model

Snapshot of Author-Rating-Topic-Label Distribution Extracted by JAST - TripAdvisor

Page 85: Joint Author Sentiment Topic Model

Snapshot of Author-Rating-Topic-Label Distribution Extracted by JAST - IMDB

Page 93: Joint Author Sentiment Topic Model

Conclusions

Sentiment classification and aspect rating prediction models can be improved if the author is known

Authorship information helps in identifying author topic preferences, and author writing style to maintain review coherence
Semantic-syntactic class transition and topic switch

JAST is unsupervised, with the overhead of knowing author identity
Performs better than all unsupervised/semi-supervised models and some supervised models

It will be interesting to use JAST for the authorship attribution task

Page 94: Joint Author Sentiment Topic Model

QUESTIONS ???
