Document-level Semantic Orientation and Argumentation

Presented by Marta Tatu, CS7301
March 15, 2005

Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews

Peter D. Turney, ACL-2002
Overview
An unsupervised learning algorithm for classifying reviews as recommended (thumbs up) or not recommended (thumbs down)
The classification is based on the average semantic orientation of the phrases in the review that contain adjectives or adverbs
Algorithm
Input: a review
1. Identify phrases that contain adjectives or adverbs by using a part-of-speech tagger
2. Estimate the semantic orientation of each extracted phrase
3. Assign a class to the review based on the average semantic orientation of its phrases
Output: classification (thumbs up or thumbs down)
Step 1
Apply Brill's part-of-speech tagger to the review
Adjectives are good indicators of subjective sentences, but in isolation their orientation depends on context: "unpredictable steering" (negative, for a car) vs. "unpredictable plot" (positive, for a movie)
Extract two consecutive words, where one is an adjective or adverb and the other provides the context (see the sketch after the table):

   First Word           Second Word             Third Word (not extracted)
1. JJ                   NN or NNS               anything
2. RB, RBR, or RBS      JJ                      not NN nor NNS
3. JJ                   JJ                      not NN nor NNS
4. NN or NNS            JJ                      not NN nor NNS
5. RB, RBR, or RBS      VB, VBD, VBN, or VBG    anything
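A minimal sketch of this extraction step, assuming NLTK's default tagger as a stand-in for Brill's tagger (the tokenizer and tagger choice are implementation assumptions, not part of the paper):

```python
# Sketch only: NLTK's perceptron tagger stands in for Brill's tagger.
import nltk  # assumes the tokenizer and tagger models are downloaded

# (first-word tags, second-word tags, tags the THIRD word must NOT have)
PATTERNS = [
    ({"JJ"}, {"NN", "NNS"}, set()),                              # pattern 1
    ({"RB", "RBR", "RBS"}, {"JJ"}, {"NN", "NNS"}),               # pattern 2
    ({"JJ"}, {"JJ"}, {"NN", "NNS"}),                             # pattern 3
    ({"NN", "NNS"}, {"JJ"}, {"NN", "NNS"}),                      # pattern 4
    ({"RB", "RBR", "RBS"}, {"VB", "VBD", "VBN", "VBG"}, set()),  # pattern 5
]

def extract_phrases(review):
    """Return the two-word phrases matching the five POS patterns."""
    tagged = nltk.pos_tag(nltk.word_tokenize(review))
    phrases = []
    for i in range(len(tagged) - 1):
        (w1, t1), (w2, t2) = tagged[i], tagged[i + 1]
        t3 = tagged[i + 2][1] if i + 2 < len(tagged) else ""
        for first, second, third_not in PATTERNS:
            if t1 in first and t2 in second and t3 not in third_not:
                phrases.append(f"{w1} {w2}")
                break
    return phrases
```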
Step 2
Estimate the semantic orientation (SO) of the extracted phrases using PMI-IR (Turney, 2001)
Pointwise Mutual Information (Church and Hanks, 1989):

  \mathrm{PMI}(word_1, word_2) = \log_2 \frac{p(word_1 \wedge word_2)}{p(word_1)\, p(word_2)}

Semantic Orientation:

  \mathrm{SO}(phrase) = \mathrm{PMI}(phrase, \text{"excellent"}) - \mathrm{PMI}(phrase, \text{"poor"})

PMI-IR estimates PMI by issuing queries to a search engine (AltaVista, ~350 million pages); using hit counts and the NEAR operator, the estimate becomes:

  \mathrm{SO}(phrase) = \log_2 \frac{\mathrm{hits}(phrase\ \mathrm{NEAR}\ \text{"excellent"}) \cdot \mathrm{hits}(\text{"poor"})}{\mathrm{hits}(phrase\ \mathrm{NEAR}\ \text{"poor"}) \cdot \mathrm{hits}(\text{"excellent"})}
Step 2 – continued
0.01 is added to the hit counts to avoid division by zero
If both hits(phrase NEAR "excellent") and hits(phrase NEAR "poor") are ≤ 4, the phrase is eliminated
"AND (NOT host:epinions)" is added to the queries so that pages from the Epinions website itself are excluded (a sketch of this step follows)
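A sketch of the SO estimate, with the obvious caveat that AltaVista and its NEAR operator no longer exist; hits() is a hypothetical hit-count function standing in for the search-engine queries:

```python
import math

def semantic_orientation(phrase, hits):
    """PMI-IR estimate of SO(phrase); `hits(query)` is hypothetical."""
    near_exc = hits(f'"{phrase}" NEAR "excellent"')
    near_poor = hits(f'"{phrase}" NEAR "poor"')
    if near_exc <= 4 and near_poor <= 4:
        return None  # phrase too rare: eliminate it
    # 0.01 smoothing avoids division by zero
    return math.log2(((near_exc + 0.01) * hits('"poor"')) /
                     ((near_poor + 0.01) * hits('"excellent"')))
```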
Step 3
Calculate the average semantic orientation of the phrases in the given review
If the average is positive, the review is classified as recommended (thumbs up)
If the average is negative, it is classified as not recommended (thumbs down)

Example (from a recommended review):

  Phrase                   POS tags   SO
  direct deposit           JJ NN       1.288
  local branch             JJ NN       0.421
  small part               JJ NN       0.053
  online service           JJ NN       2.780
  well other               RB JJ       0.237
  low fees                 JJ NNS      0.333
  …
  true service             JJ NN      -0.732
  other bank               JJ NN      -0.850
  inconveniently located   RB VBN     -1.541
  Average Semantic Orientation        0.322
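Tying the three steps together, again using the hypothetical helpers sketched above:

```python
def classify_review(review, hits):
    """Average the SO of the extracted phrases and threshold at zero."""
    sos = [so for p in extract_phrases(review)
           if (so := semantic_orientation(p, hits)) is not None]
    average = sum(sos) / len(sos) if sos else 0.0
    return "recommended" if average > 0 else "not recommended"
```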
Experiments
410 reviews from Epinions: 170 (41%) not recommended, 240 (59%) recommended
Average number of phrases per review: 26
Baseline accuracy (always guessing the majority class, recommended): 59%

  Domain                Accuracy   Correlation
  Automobiles           84.00%     0.4618
  Banks                 80.00%     0.6167
  Movies                65.83%     0.3608
  Travel Destinations   70.53%     0.4155
  All                   74.39%     0.5174
Discussion
What makes movies hard to classify?
The average SO tends to classify a recommended movie as not recommended
Evil characters can make a good movie: the whole is not necessarily the sum of the parts
Good beaches do not necessarily add up to a good vacation
But good automobile parts usually do add up to a good automobile
Applications
Summary statistics for search engines
Summarization of reviews: pick out the sentence with the highest positive/negative semantic orientation in a positive/negative review
Filtering "flames" in newsgroups: when the semantic orientation of a message drops below a threshold, the message may be a flame
Questions? Comments? Observations?
Thumbs up? Sentiment Classification using Machine Learning Techniques

Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan
EMNLP-2002
Overview
Considers the problem of classifying documents by overall sentiment
Three machine learning methods are compared against human-generated lists of words:
  Naïve Bayes
  Maximum Entropy
  Support Vector Machines
Experimental Data
Movie-review domain; source: Internet Movie Database (IMDb)
Star or numerical ratings were converted into positive, negative, or neutral labels, so there was no need to hand-label the data for training or testing
Maximum of 20 reviews per author per sentiment category
752 negative reviews, 1301 positive reviews, 144 reviewers
List-of-Words Baseline
Intuition: maybe there are certain words that people tend to use to express strong sentiments
Classification is done by counting the number of positive and negative words in the document
Random-choice baseline: 50%
Machine Learning Methods
Bag-of-features framework:
  {f_1, …, f_m} is a predefined set of m features
  n_i(d) is the number of times f_i occurs in document d
  each document d is represented by the vector \vec{d} := (n_1(d), n_2(d), …, n_m(d))

Naïve Bayes:

  c_{NB} = \arg\max_c P(c \mid d), \quad P(c \mid d) = \frac{P(c)\, P(d \mid c)}{P(d)} = \frac{P(c) \prod_{i=1}^{m} P(f_i \mid c)^{n_i(d)}}{P(d)}
Machine Learning Methods – continued
Maximum Entropy:

  P_{ME}(c \mid d) := \frac{1}{Z(d)} \exp\Big( \sum_i \lambda_{i,c}\, F_{i,c}(d, c) \Big)

where F_{i,c} is a feature/class function:

  F_{i,c}(d, c') := \begin{cases} 1 & n_i(d) > 0 \text{ and } c' = c \\ 0 & \text{otherwise} \end{cases}

Support Vector Machines: find the hyperplane \vec{w} that maximizes the margin. The solution of the constrained optimization problem has the form

  \vec{w} := \sum_j \alpha_j c_j \vec{d}_j, \qquad \alpha_j \geq 0,\ c_j \in \{1, -1\}

where c_j is the correct class of document d_j
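For illustration only: the paper used its own Naïve Bayes and maximum-entropy implementations plus SVM-light, but the setup can be approximated with scikit-learn (an assumption, as is the hypothetical load_reviews() loader), using binary presence features:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB
from sklearn.linear_model import LogisticRegression  # a maximum-entropy model
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

docs, labels = load_reviews()  # hypothetical: review texts + pos/neg labels

# binary=True records feature *presence*, which the paper found most effective;
# min_df=4 mirrors the "appearing at least 4 times" unigram cutoff
X = CountVectorizer(binary=True, min_df=4).fit_transform(docs)

for clf in (BernoulliNB(), LogisticRegression(max_iter=1000), LinearSVC()):
    scores = cross_val_score(clf, X, labels, cv=3)  # 3 folds, as in the paper
    print(type(clf).__name__, scores.mean())
```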
Evaluation
700 positive-sentiment and 700 negative-sentiment documents, split into 3 equal-sized folds
The tag "NOT_" was added to every word between a negation word ("not", "isn't", "didn't") and the first punctuation mark, so that "good" is kept distinct from "not very good" (see the sketch below)
Features:
  16,165 unigrams appearing at least 4 times in the 1400-document corpus
  the 16,165 most frequently occurring bigrams in the same data
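A minimal sketch of the negation tagging; the tokenizer and the exact negation list are assumptions:

```python
import re

NEGATIONS = {"not", "isn't", "didn't", "no", "never"}  # illustrative list

def add_not_tags(text):
    """Prefix NOT_ to every word between a negation word and the next
    punctuation mark."""
    tokens = re.findall(r"[\w']+|[.,!?;:]", text.lower())
    out, negating = [], False
    for tok in tokens:
        if tok in ".,!?;:":
            negating = False
            out.append(tok)
        elif negating:
            out.append("NOT_" + tok)
        else:
            out.append(tok)
            if tok in NEGATIONS:
                negating = True
    return out

# add_not_tags("This isn't very good, really.")
# -> ['this', "isn't", 'NOT_very', 'NOT_good', ',', 'really', '.']
```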
Results
POS information was added to differentiate between uses such as "I love this movie" and "This is a love story"
Conclusion
Results produced by the machine learning techniques are better than the human-generated baselines
SVMs tend to do the best
Unigram presence information is the most effective feature set
Frequency vs. presence: because of "thwarted expectations", a review may contain many words indicative of the opposite sentiment to that of the entire review
Some form of discourse analysis is necessary
Questions? Comments? Observations?
Summarizing Scientific Articles: Experiments with Relevance and Rhetorical Status

Simone Teufel and Marc Moens
CL-2002
Overview
Summarization of scientific articles: restore the discourse context of extracted material by adding the rhetorical status of each sentence in the document
Gold-standard data: computational linguistics articles annotated with the rhetorical status and relevance of each sentence
A supervised learning algorithm that classifies sentences into 7 rhetorical categories
Why?
Knowledge about the rhetorical status of each sentence enables tailoring summaries to the user's expertise and task:
  Nonexpert summary: background information and the general purpose of the paper
  Expert summary: no background; instead, differences between this approach and similar ones
Contrasts or complementarity among articles can be expressed
Rhetorical Status
Generalizations about the nature of scientific texts, plus information that enables the construction of better summaries:
Problem structure: problems (research goals), solutions (methods), and results
Intellectual attribution: what the new contribution is, as opposed to previous work and background (generally accepted statements)
Scientific argumentation
Attitude toward other people's work: a rival approach, a prior approach with a fault, or an approach contributing parts of the authors' own solution
Metadiscourse and Agentivity
Metadiscourse is an aspect of scientific argumentation and a way of expressing attitude toward previous work: "we argue that", "in contrast to common belief, we"
Agent roles in argumentation: rivals, contributors of part of the solution (they), the entire research community, or the authors of the paper (we)
Citations and Relatedness
Just knowing that one article cites another is often not enough
One needs to read the context of the citation to understand the relation between the articles:
  an article may be cited negatively or contrastively
  an article may be cited positively, or the authors may state that their own work originates from the cited work
Rhetorical Annotation Scheme
Seven categories: AIM, TEXTUAL, OWN, BACKGROUND, CONTRAST, BASIS, and OTHER
Only one category is assigned to each full sentence
Nonoverlapping, nonhierarchical scheme
The rhetorical status is determined on the basis of the global context of the paper
Relevance
Selecting important content from text is highly subjective, which leads to low human agreement
Here, a sentence is considered relevant if it describes the research goal or states a difference from a rival approach
Other definitions exist, e.g., a sentence is relevant if it shows a high level of similarity to a sentence in the abstract
Corpus
80 conference articles from:
  Association for Computational Linguistics (ACL)
  European Chapter of the Association for Computational Linguistics (EACL)
  Applied Natural Language Processing (ANLP)
  International Joint Conference on Artificial Intelligence (IJCAI)
  International Conference on Computational Linguistics (COLING)
XML markup was added
The Gold Standard
3 task-trained annotators
17 pages of guidelines, 20 hours of training
No communication between annotators
Evaluation measures of the annotation: stability and reproducibility
Results of Annotation
Agreement measured with the Kappa coefficient K (Siegel and Castellan, 1988):

  K = \frac{P(A) - P(E)}{1 - P(E)}

where P(A) = pairwise agreement and P(E) = random agreement
Stability: K = .82, .81, .76 (N = 1,220 and k = 2)
Reproducibility: K = .71
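A one-line implementation of the formula, with illustrative numbers (not from the paper):

```python
def kappa(p_agree, p_chance):
    """Agreement above chance, normalized by the maximum improvement."""
    return (p_agree - p_chance) / (1 - p_chance)

# e.g., 90% observed agreement with 50% expected by chance:
# kappa(0.90, 0.50) -> 0.8
```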
The System
Supervised machine learning: a Naïve Bayes classifier over the features described below
Features
Absolute location of the sentence in the document: limitations of the authors' own method can be expected toward the end, while limitations of other researchers' work are discussed in the introduction
Features – continued
Section structure: relative and absolute position of the sentence within its section: first, last, second or third, second-to-last or third-to-last, or somewhere in the first, second, or last third of the section
Paragraph structure: relative position of the sentence within its paragraph: initial, medial, or final
Features – continued
Headlines: type of headline of the current section: Introduction, Implementation, Example, Conclusion, Result, Evaluation, Solution, Experiment, Discussion, Method, Problems, Related Work, Data, Further Work, Problem Statement, or Non-Prototypical
Sentence length: longer or shorter than a 12-word threshold
Features – continued
Title word contents: does the sentence contain words that also occur in the title?
TF*IDF word contents: TF*IDF gives high values to words that occur frequently in one document but rarely in the overall collection; do any of the 18 highest-scoring TF*IDF words occur in the sentence?
Verb syntax: voice, tense, and modal linguistic features
Features – continued
Citations: citation (self), citation (other), author name, or none, plus the location of the citation in the sentence (beginning, middle, or end)
History: the most probable previous category (e.g., AIM tends to follow CONTRAST), calculated in a second pass during training
Features – continued
Formulaic expressions: a list of phrases described by regular expressions, divided into 18 classes comprising a total of 644 patterns; clustering into classes prevents data sparseness
Features – continued
Agent: 13 types, 167 patterns; the placeholder WORK_NOUN can be replaced by any of a set of 37 nouns including theory, method, prototype, algorithm
Agent classes whose distribution was very similar to the overall distribution of target categories were excluded
Features – continued
Action: 365 verbs clustered into 20 classes based on semantic concepts such as similarity and contrast:
  PRESENTATION_ACTIONs: present, report, state
  RESEARCH_ACTIONs: analyze, conduct, define, observe
Negation is considered (a simplified feature-extraction sketch follows)
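A much-simplified, hypothetical sketch of a few of the features above (location, sentence length, title words, formulaic expressions); the patterns and inputs are illustrative stand-ins, not the authors' 644-pattern lists:

```python
import re

# Illustrative stand-ins for a handful of the paper's formulaic patterns
FORMULAIC = [r"\bin this paper\b", r"\bwe argue that\b", r"\bin contrast to\b"]

def sentence_features(sentence, position, title_words):
    """position: fraction of the document (0.0-1.0) where the sentence sits;
    title_words: lowercased set of words from the article title."""
    words = re.findall(r"\w+", sentence.lower())
    return {
        "location": position,              # absolute location feature
        "long_sentence": len(words) > 12,  # 12-word length threshold
        "title_word": bool(set(words) & title_words),
        "formulaic": any(re.search(p, sentence.lower()) for p in FORMULAIC),
    }
```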
System Evaluation
Evaluated using 10-fold cross-validation
Feature Impact
The most distinctive single feature is Location, followed by SegAgent, Citations, Headlines, Agent, and Formulaic
Questions? Comments? Observations?
Thank You!