A Statistical Parsing Framework for Sentiment Classification

Li Dong

ProbModels@ILCC

Input: sentence

Output: polarity label (e.g., positive / negative)

Sentence-Level Sentiment Classification

One of the most challenging problems is:

Sentiment composition

Lexicon-based

Lexicons (funny, dislike) + Rules (not *, * but *)

Pros: simple, interpretable

Cons: scalability

Classifier-based

Classifier (SVM, MaxEnt) + Features (n-gram, POS)

Pros: data-driven, coverage

Cons: needs ad hoc tricks to handle sentiment composition

Two Mainstream Methods

Two key components

Lexicon

Rule

Q1: Can we learn them from data?

Revisit Sentiment Composition

The movie is just so-so, but I still like it.

[The movie is just so-so]: Neg; [I still like it]: Pos

Neg but Pos -> Pos

Sentiment composition is not only about polarity

P(Pos|Very good) > P(Pos|Good)

Q2: Can we model polarity strength?

Latent sentiment structure

Q3: How do we define the sentiment structure?

Q4: Can we learn the latent sentiment structure from (sentence, polarity) pairs alone?

Q1: learn lexicons and rules?

Q2: calculate polarity strength?

Q3: define sentiment structure?

Q4: learn structure from (sentence, polarity) pairs?

Overview

Sentiment parsing | Semantic parsing
sentiment lexicons | lexical triggers
(latent) sentiment tree | (latent) meaning representation
(sentence, polarity) pairs | (question, answer) pairs
calculate polarity strength | execute query

Comparison with Semantic Parsing

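Sentiment parsing:

$p(\text{label} \mid \text{sentence}, \text{model}) = p(\text{label} \mid \text{tree}, \text{polarity model}) \, p(\text{tree} \mid \text{sentence}, \text{parsing model})$

Semantic parsing:

$p(\text{answer} \mid \text{question}, \text{model}) = p(\text{answer} \mid \text{representation}, \text{database}) \, p(\text{representation} \mid \text{question}, \text{parsing model})$

Annotations on the slide: deterministic, probabilistic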

[Figure: overview diagram; surviving labels: Sentiment Grammar, Latent]

Built upon Context-Free Grammar

$G_s = \langle V_s, \Sigma_s, S, R_s \rangle$

$V_s = \{N, P, S, E\}$: non-terminal set

$\Sigma_s$: terminal set

$S$: start symbol

$R_s$: rewrite rule set

Sentiment Grammar
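
The 4-tuple above can be held in a small data structure. The following is a minimal sketch, not the author's implementation: the class names `Rule` and `SentimentGrammar`, the helper `rules_for`, and the tiny terminal set are all assumptions made for illustration, and the example rules are borrowed from the rule examples that appear elsewhere in the deck.

```python
from dataclasses import dataclass

NONTERMINALS = {"N", "P", "S", "E"}  # V_s as listed on the slide


@dataclass(frozen=True)
class Rule:
    lhs: str               # left-hand side non-terminal
    rhs: tuple[str, ...]   # sequence of terminals (words) and non-terminals


@dataclass
class SentimentGrammar:
    terminals: set[str]    # Sigma_s (a toy vocabulary here)
    rules: list[Rule]      # R_s
    start: str = "S"       # start symbol

    def rules_for(self, lhs: str) -> list[Rule]:
        """Return every rewrite rule whose left-hand side is `lhs`."""
        return [r for r in self.rules if r.lhs == lhs]


# Hypothetical toy grammar built from rule examples shown in the deck.
grammar = SentimentGrammar(
    terminals={"good", "not", "but", "very"},
    rules=[
        Rule("P", ("good",)),          # P -> good
        Rule("N", ("not", "P")),       # N -> not P
        Rule("P", ("N", "but", "P")),  # P -> N but P
        Rule("P", ("N", "P")),         # P -> N P
    ],
)

print(grammar.rules_for("N"))
```

Keeping the right-hand side as a tuple of symbols lets one rule type cover words, non-terminals, and mixtures of the two.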

Dictionary rules

P -> I still like it

P -> good

Sentiment Grammar

Combination rules

P -> N but P

P -> not P

P -> very P

Sentiment Grammar

Glue rules

P -> N P

N -> N N

Sentiment Grammar

OOV rules

Out-Of-Vocabulary text spans

Sentiment Grammar

Auxiliary rules

Combine out-of-vocabulary span and non-terminal

Sentiment Grammar

Start rules

Sentiment Grammar

Present inference rules using deductive proof systems (Shieber, Schabes, and Pereira 1995; Goodman 1999)

[Figure: schema of an inference rule; surviving labels: If, Then, Rule, Item]

Parsing Model

Inference Rules

P -> good

N -> not P

P -> N but P

P -> N P

E -> ,

P -> E P

P -> P E

Calculate polarity strength from subspans

Polarity model:

Polarity Model

Two constraints for polarity strength

Non-negative

Normalized to 1

Polarity Model
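
One way to write the two constraints, assuming the polarity model assigns strengths P(Pos | w) and P(Neg | w) to a text span w (the symbol w and this exact form are assumptions, since the slide's own formula did not survive extraction):

```latex
P(\mathrm{Pos} \mid w) \ge 0, \qquad
P(\mathrm{Neg} \mid w) \ge 0, \qquad
P(\mathrm{Pos} \mid w) + P(\mathrm{Neg} \mid w) = 1
```

Under these constraints a single number per span is enough, because the other strength is its complement.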

Dictionary rules

P -> I still like it

P -> good

Estimated from data

Polarity Model

Glue rules

P -> N P

P -> P N

P -> P P

N -> N N

Polarity Model

Auxiliary rules

Combine OOV span and non-terminal

OOV span is ignored

Polarity Model

Combination rules

P -> N but P

P -> not P

P -> very P

Logistic regression

Polarity Model

Negation (N -> not P)

Switch negation (Choi and Cardie 2008; Sauri 2008)

• Simply reverse strength

• P(Neg|not good) = P(Pos|good)

Shift negation (Taboada et al. 2011)

• Problem with switch negation: it implies P(Neg|not very good) > P(Neg|not good)

• Solution: P(Neg|not good) = P(Pos|good) – fixed_value

Why is logistic regression good?
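
The two hand-written negation schemes can be compared numerically. This is a minimal sketch of the formulas exactly as written on the slide; the function names, the strengths 0.8 and 0.9, the default `fixed_value`, and the clipping to [0, 1] are all assumptions added for illustration.

```python
def switch_negation(p_pos: float) -> float:
    """Switch negation: P(Neg | not x) = P(Pos | x)."""
    return p_pos


def shift_negation(p_pos: float, fixed_value: float = 0.2) -> float:
    """Shift negation: P(Neg | not x) = P(Pos | x) - fixed_value.

    Clipping to [0, 1] is an added assumption so the result stays a
    valid polarity strength.
    """
    return min(1.0, max(0.0, p_pos - fixed_value))


# Hypothetical strengths: "very good" is more positive than "good".
p_good, p_very_good = 0.8, 0.9

for name, negate in [("switch", switch_negation), ("shift", shift_negation)]:
    print(name, "not good:", negate(p_good), "not very good:", negate(p_very_good))
```

With switch negation, "not very good" comes out more negative than "not good", which is the problem the slide points out. Both schemes also depend on a hand-picked rule or constant, which is part of the motivation for learning the composition instead.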

Intensification (P -> extremely P)

Fixed intensification (Polanyi and Zaenen 2006; Kennedy and Inkpen 2006)

• P(Pos|very good) = P(Pos|good) + fixed_value (>0)

Percentage intensification (Taboada et al. 2011)

• P(Pos|very good) = P(Pos|good) * fixed_value (!=1)

Why is logistic regression good?
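
The two fixed intensification schemes can be written the same way. Again a minimal sketch of the slide's formulas: the function names, the default constants, and the clipping to [0, 1] are assumptions for illustration.

```python
def fixed_intensification(p_pos: float, fixed_value: float = 0.1) -> float:
    """Fixed intensification: P(Pos | very x) = P(Pos | x) + fixed_value."""
    return min(1.0, max(0.0, p_pos + fixed_value))  # clipping is an assumption


def percentage_intensification(p_pos: float, fixed_value: float = 1.25) -> float:
    """Percentage intensification: P(Pos | very x) = P(Pos | x) * fixed_value."""
    return min(1.0, max(0.0, p_pos * fixed_value))  # clipping is an assumption


for p in (0.6, 0.95):  # hypothetical strengths for a mild and a strong word
    print(p, fixed_intensification(p), percentage_intensification(p))
```

For an already strong word such as the 0.95 case, both schemes overshoot and only the clipping keeps the result in range. A model whose output is bounded by construction avoids that.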

Reasons

Smoothly maps polarity strength into (0, 1)

Can learn various types of negation and intensification

Can handle contrast (P -> N but P)

Why is logistic regression good?
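
A sketch of how a logistic-regression polarity model could cover these cases, assuming the strength of a combined span is a sigmoid of a linear function of the child strengths. The exact parameterization used in the talk is not recoverable from the slides, so the function `compose` and every parameter value below are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def compose(child_strengths, theta0, thetas):
    """Parent strength = sigmoid(theta0 + sum_k theta_k * child_strength_k)."""
    return sigmoid(theta0 + sum(t * p for t, p in zip(thetas, child_strengths)))


# Two hypothetical parameter settings for a rule such as N -> not P.
# Input: P(Pos | child). Output: P(Neg | "not" + child).
increasing = {"theta0": -2.0, "thetas": [4.0]}   # stronger child, stronger negation
decreasing = {"theta0": 2.0, "thetas": [-2.0]}   # stronger child, weaker negation

for p_child in (0.8, 0.9):
    print(p_child, compose([p_child], **increasing), compose([p_child], **decreasing))

# Contrast rule P -> N but P takes two child strengths (left Neg, right Pos).
contrast = compose([0.7, 0.8], theta0=-1.0, thetas=[-1.0, 3.0])
print(contrast)
```

The sign and size of the learned weights decide which behaviour a rule gets, and the sigmoid keeps every output strictly inside (0, 1) without clipping.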

Inference Rules (w/ Polarity Model)

P -> good

N -> not P

P -> N but P

P -> N P

E -> ,

P -> E P

P -> P E

Add side condition C for inference rules

Constraints (in Parsing Model)

Polarity should be consistent with non-terminal

Avoid improperly using combination rules for neutral phrase

Example: do not apply the rule P -> a lot of P to the neutral phrase "a lot of people"

Constraints (in Parsing Model)

Inference Rules (w/ Polarity Model and Constraints)

P -> good

N -> not P

P -> N but P

P -> N P

E -> ,

P -> E P

P -> P E

The parsing model generates many candidate trees T(s)

Use a log-linear model to score and rank T(s)

Ranking Model

Feature

Weight

Log-partition function
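
A log-linear ranker over candidate trees can be sketched as follows. The slide's own formula is lost, so this uses the standard log-linear form (score = weights · features, normalized by the log-partition function over T(s)); the function names, feature vectors, and weights are hypothetical.

```python
import math


def log_partition(scores):
    """Numerically stable log(sum(exp(score))) over all candidate trees."""
    m = max(scores)
    return m + math.log(sum(math.exp(s - m) for s in scores))


def tree_probabilities(features, weights):
    """p(t | s) = exp(weights . phi(s, t) - log_partition) for each candidate t."""
    scores = [sum(w * f for w, f in zip(weights, phi)) for phi in features]
    log_z = log_partition(scores)
    return [math.exp(s - log_z) for s in scores]


# Hypothetical feature vectors for three candidate trees of one sentence.
candidates = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = [0.5, -0.2]

probs = tree_probabilities(candidates, weights)
best = max(range(len(probs)), key=probs.__getitem__)
print(probs, best)
```

Subtracting the maximum score before exponentiating leaves the probabilities unchanged and avoids overflow when scores are large.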

Ranking features decompose along trees

The CYK algorithm can be used for decoding

Dynamic programming

Predicted polarity

Bottom-Up Decoding
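
Because the features decompose along the tree, the best tree can be found span by span. Below is a minimal CYK-style Viterbi sketch over a toy binarized grammar. The helper non-terminals `NOT` and `VERY`, the rule scores, and the function name `cyk_best` are assumptions for illustration and are not the grammar from the talk.

```python
from collections import defaultdict

# Toy rules with hypothetical scores. NOT and VERY are helper symbols
# introduced here only to keep every rule unary-lexical or binary.
LEXICAL = {  # word -> [(symbol, score)]
    "good": [("P", 1.0)],
    "bad": [("N", 1.0)],
    "not": [("NOT", 0.0)],
    "very": [("VERY", 0.0)],
}
BINARY = {  # (left symbol, right symbol) -> [(parent symbol, score)]
    ("NOT", "P"): [("N", 0.5)],
    ("NOT", "N"): [("P", 0.5)],
    ("VERY", "P"): [("P", 0.3)],
    ("VERY", "N"): [("N", 0.3)],
}


def cyk_best(words):
    """Return {symbol: best score} for the whole sentence, built bottom-up."""
    n = len(words)
    chart = defaultdict(dict)  # (i, j) -> {symbol: best score for words[i:j]}
    for i, word in enumerate(words):
        for symbol, score in LEXICAL.get(word, []):
            chart[i, i + 1][symbol] = score
    for width in range(2, n + 1):          # widen spans step by step
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):      # split point between sub-spans
                for left, ls in chart[i, k].items():
                    for right, rs in chart[k, j].items():
                        for parent, score in BINARY.get((left, right), []):
                            total = ls + rs + score
                            if total > chart[i, j].get(parent, float("-inf")):
                                chart[i, j][parent] = total
    return chart[0, n]


root = cyk_best("not very good".split())
predicted = max(root, key=root.get)  # predicted polarity = best root symbol
print(root, predicted)
```

Each chart cell keeps only the best score per symbol, which is the dynamic-programming step that makes exhaustive search over trees unnecessary.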

Maximize probability of decoding correct labels

Example: (a b c d, P)

Ranking Model Training

Regularizer

Log-likelihood of trees obtaining the correct polarity label

[Figure: candidate sentiment trees over the span "a b c d", with nodes labeled N, P, and S]

EM-like training (Liang, Jordan, and Klein 2013)

Iteration 1: sentence -> beam search and score candidates -> K-best trees (Tree1, Tree2, Tree3, Tree4) -> update parameters (AdaGrad) -> model

…

Iteration N: sentence -> beam search and score candidates -> K-best trees (Tree3, Tree5, Tree8, Tree4) -> update parameters (AdaGrad) -> model

Learning Ranking Model
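
One parameter update in such a loop can be sketched as below. It assumes a log-linear ranker, an objective equal to the log-probability of the candidate trees that yield the gold label, and an L2 regularizer; the regularizer type, the learning rate, and all names are assumptions, and the beam search that would produce the K-best list is replaced by a fixed hypothetical candidate list.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def expected_features(features, probs):
    dim = len(features[0])
    return [sum(p * phi[d] for p, phi in zip(probs, features)) for d in range(dim)]


def adagrad_step(weights, grad_sq, candidates, gold, lr=0.1, l2=0.01, eps=1e-8):
    """One ascent step on log P(gold label | sentence) - (l2 / 2) * ||w||^2.

    candidates: list of (feature_vector, predicted_label) for the K-best trees.
    """
    features = [phi for phi, _ in candidates]
    scores = [sum(w * f for w, f in zip(weights, phi)) for phi in features]
    correct = [(phi, s) for (phi, label), s in zip(candidates, scores) if label == gold]
    if not correct:
        return weights, grad_sq  # no candidate reaches the gold label; skip
    e_all = expected_features(features, softmax(scores))
    e_correct = expected_features(
        [phi for phi, _ in correct], softmax([s for _, s in correct])
    )
    new_w, new_g = [], []
    for w, g2, ec, ea in zip(weights, grad_sq, e_correct, e_all):
        grad = ec - ea - l2 * w              # gradient of the objective
        g2 += grad * grad                    # AdaGrad accumulator
        new_w.append(w + lr * grad / (math.sqrt(g2) + eps))
        new_g.append(g2)
    return new_w, new_g


# Hypothetical K-best list for one sentence whose gold label is "P".
candidates = [([1.0, 0.0], "P"), ([0.0, 1.0], "N"), ([1.0, 1.0], "P")]
weights, grad_sq = [0.0, 0.0], [0.0, 0.0]
for _ in range(20):
    weights, grad_sq = adagrad_step(weights, grad_sq, candidates, gold="P")
print(weights)
```

In a full loop the candidate list would be regenerated by beam search with the current weights at every iteration, which is why the K-best trees differ between iteration 1 and iteration N.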

Dictionary rules (P -> good, N -> I dislike this)

Mine frequent fragments as candidates

Prune them using polarity strength

Learning Sentiment Grammar and Polarity Model

Problem

This is not a good movie. (negative)

Solution

Consider negation rules when learning polarity model for dictionary rules
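
The mine-then-prune idea can be sketched with simple n-gram counting. This is not the procedure from the talk: the window-based negation check stands in for the learned negation rules, and the negator list, thresholds, function name, and toy corpus are all hypothetical.

```python
from collections import Counter

NEGATORS = {"not", "never"}  # hypothetical negator list


def mine_dictionary_rules(corpus, min_count=2, min_strength=0.7, max_len=3, window=2):
    """Mine frequent fragments, then prune the ones with weak polarity.

    corpus: list of (sentence, label) with label "P" or "N".
    Returns {fragment: (non_terminal, strength)}.
    """
    pos, neg = Counter(), Counter()
    for sentence, label in corpus:
        tokens = sentence.lower().split()
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                fragment = " ".join(tokens[i:i + n])
                # Crude stand-in for negation rules: a negator shortly before
                # the fragment flips the label this occurrence is counted under.
                negated = any(t in NEGATORS for t in tokens[max(0, i - window):i])
                is_pos = (label == "P") != negated
                (pos if is_pos else neg)[fragment] += 1
    rules = {}
    for fragment in set(pos) | set(neg):
        total = pos[fragment] + neg[fragment]
        if total < min_count:
            continue                      # not frequent enough
        p_pos = pos[fragment] / total
        strength = max(p_pos, 1.0 - p_pos)
        if strength >= min_strength:      # prune weak-polarity fragments
            rules[fragment] = ("P" if p_pos >= 0.5 else "N", strength)
    return rules


corpus = [
    ("a good movie", "P"),
    ("really good", "P"),
    ("this is not a good movie", "N"),
    ("bad plot", "N"),
    ("bad acting", "N"),
]
rules = mine_dictionary_rules(corpus)
print(rules.get("good"), rules.get("bad"), "movie" in rules)
```

Without the negation check, "good" in "this is not a good movie" would be counted as negative evidence, which is the problem the slide describes. On a corpus this small, function words such as "a" also pass the thresholds, so real use would need far more data or a stop-word filter.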

Combination rules (N -> not P)

Generalize dictionary rules

Polarity model: logistic regression

Learning Sentiment Grammar and Polarity Model

Learning Sentiment Grammar and Polarity Model

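[Figure: iteration process. Dictionary rules are generalized into combination rules. The polarity model for dictionary rules is estimated by counting, with negation rules fed back into the counts; the polarity model for combination rules is fit by logistic regression.]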

Datasets

[Table not recoverable; surviving labels: user reviews, critic reviews]

Experiments

[Figure: results chart; y-axis ticks 55 to 90; datasets: RT-C, PL05-C, SST, RT-U, IMDB-U, MPQA; methods: SVM, MNB, LM, Voting-w/Rev, HardRule, TreeCRF, RAE, MVRNN, s.parser]

Experiments

[Figure: ablation chart; y-axis ticks 70 to 90; datasets: RT-C, PL05-C, SST, RT-U, IMDB-U, MPQA; settings: LongMatch, w/o Comb, s.parser]

LongMatch: heuristic ranking of trees

w/o Comb: without combination rules

Ablation

Combination Rules

Switch negation (Choi and Cardie 2008; Sauri 2008)

Simply reverse strength

P(Neg|i do not like) = P(Pos|like)

Shift negation (Taboada et al. 2011)

P(Neg|is not good) = P(Pos|good) – fixed_value

Conclusion

[Figure: overview diagram; surviving labels: Sentiment Grammar, Latent]

Many Thanks

P -> thanks

P -> many P