Document-level Semantic Orientation and Argumentation

Presented by Marta Tatu, CS7301
March 15, 2005

Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews

Peter D. Turney, ACL-2002
Overview
An unsupervised learning algorithm for classifying reviews as recommended (thumbs up) or not recommended (thumbs down)
The classification is based on the average semantic orientation of the phrases in the review that contain adjectives or adverbs
Algorithm
Input: a review
1. Identify phrases that contain adjectives or adverbs by using a part-of-speech tagger
2. Estimate the semantic orientation of each extracted phrase
3. Assign a class to the review based on the average semantic orientation of its phrases
Output: classification (thumbs up or thumbs down)
Step 1
Apply Brill's part-of-speech tagger to the review
Adjectives are good indicators of subjective sentences, but in isolation their orientation depends on context: "unpredictable steering" (negative, for a car) vs. "unpredictable plot" (positive, for a movie)
Extract two consecutive words, where one is an adjective or adverb and the other provides the context (see the sketch after the table):

   First Word           Second Word             Third Word (not extracted)
1. JJ                   NN or NNS               anything
2. RB, RBR, or RBS      JJ                      not NN nor NNS
3. JJ                   JJ                      not NN nor NNS
4. NN or NNS            JJ                      not NN nor NNS
5. RB, RBR, or RBS      VB, VBD, VBN, or VBG    anything
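A minimal sketch of this extraction step, assuming NLTK's default tagger as a stand-in for Brill's tagger (the tokenizer and tagger choice are implementation assumptions, not part of the paper):

```python
# Sketch only: NLTK's perceptron tagger stands in for Brill's tagger.
import nltk  # assumes the tokenizer and tagger models are downloaded

# (first-word tags, second-word tags, tags the THIRD word must NOT have)
PATTERNS = [
    ({"JJ"}, {"NN", "NNS"}, set()),                              # pattern 1
    ({"RB", "RBR", "RBS"}, {"JJ"}, {"NN", "NNS"}),               # pattern 2
    ({"JJ"}, {"JJ"}, {"NN", "NNS"}),                             # pattern 3
    ({"NN", "NNS"}, {"JJ"}, {"NN", "NNS"}),                      # pattern 4
    ({"RB", "RBR", "RBS"}, {"VB", "VBD", "VBN", "VBG"}, set()),  # pattern 5
]

def extract_phrases(review):
    """Return the two-word phrases matching the five POS patterns."""
    tagged = nltk.pos_tag(nltk.word_tokenize(review))
    phrases = []
    for i in range(len(tagged) - 1):
        (w1, t1), (w2, t2) = tagged[i], tagged[i + 1]
        t3 = tagged[i + 2][1] if i + 2 < len(tagged) else ""
        for first, second, third_not in PATTERNS:
            if t1 in first and t2 in second and t3 not in third_not:
                phrases.append(f"{w1} {w2}")
                break
    return phrases
```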
Step 2
Estimate the semantic orientation (SO) of the extracted phrases using PMI-IR (Turney, 2001)
Pointwise Mutual Information (Church and Hanks, 1989):

  \mathrm{PMI}(word_1, word_2) = \log_2 \frac{p(word_1 \wedge word_2)}{p(word_1)\, p(word_2)}

Semantic Orientation:

  \mathrm{SO}(phrase) = \mathrm{PMI}(phrase, \text{"excellent"}) - \mathrm{PMI}(phrase, \text{"poor"})

PMI-IR estimates PMI by issuing queries to a search engine (AltaVista, ~350 million pages); using hit counts and the NEAR operator, the estimate becomes:

  \mathrm{SO}(phrase) = \log_2 \frac{\mathrm{hits}(phrase\ \mathrm{NEAR}\ \text{"excellent"}) \cdot \mathrm{hits}(\text{"poor"})}{\mathrm{hits}(phrase\ \mathrm{NEAR}\ \text{"poor"}) \cdot \mathrm{hits}(\text{"excellent"})}
Step 2 – continued
0.01 is added to the hit counts to avoid division by zero
If both hits(phrase NEAR "excellent") and hits(phrase NEAR "poor") are ≤ 4, the phrase is eliminated
"AND (NOT host:epinions)" is added to the queries so that pages from the Epinions website itself are excluded (a sketch of this step follows)
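A sketch of the SO estimate, with the obvious caveat that AltaVista and its NEAR operator no longer exist; hits() is a hypothetical hit-count function standing in for the search-engine queries:

```python
import math

def semantic_orientation(phrase, hits):
    """PMI-IR estimate of SO(phrase); `hits(query)` is hypothetical."""
    near_exc = hits(f'"{phrase}" NEAR "excellent"')
    near_poor = hits(f'"{phrase}" NEAR "poor"')
    if near_exc <= 4 and near_poor <= 4:
        return None  # phrase too rare: eliminate it
    # 0.01 smoothing avoids division by zero
    return math.log2(((near_exc + 0.01) * hits('"poor"')) /
                     ((near_poor + 0.01) * hits('"excellent"')))
```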
Step 3
Calculate the average semantic orientation of the phrases in the given review
If the average is positive, the review is classified as recommended (thumbs up)
If the average is negative, it is classified as not recommended (thumbs down)

Example (from a recommended review):

  Phrase                   POS tags   SO
  direct deposit           JJ NN       1.288
  local branch             JJ NN       0.421
  small part               JJ NN       0.053
  online service           JJ NN       2.780
  well other               RB JJ       0.237
  low fees                 JJ NNS      0.333
  …
  true service             JJ NN      -0.732
  other bank               JJ NN      -0.850
  inconveniently located   RB VBN     -1.541
  Average Semantic Orientation        0.322
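Tying the three steps together, again using the hypothetical helpers sketched above:

```python
def classify_review(review, hits):
    """Average the SO of the extracted phrases and threshold at zero."""
    sos = [so for p in extract_phrases(review)
           if (so := semantic_orientation(p, hits)) is not None]
    average = sum(sos) / len(sos) if sos else 0.0
    return "recommended" if average > 0 else "not recommended"
```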
Experiments
410 reviews from Epinions: 170 (41%) not recommended, 240 (59%) recommended
Average number of phrases per review: 26
Baseline accuracy (always guessing the majority class, recommended): 59%

  Domain                Accuracy   Correlation
  Automobiles           84.00%     0.4618
  Banks                 80.00%     0.6167
  Movies                65.83%     0.3608
  Travel Destinations   70.53%     0.4155
  All                   74.39%     0.5174
Discussion
What makes movies hard to classify?
The average SO tends to classify a recommended movie as not recommended
Evil characters can make a good movie: the whole is not necessarily the sum of the parts
Good beaches do not necessarily add up to a good vacation
But good automobile parts usually do add up to a good automobile
Applications
Summary statistics for search engines
Summarization of reviews: pick out the sentence with the highest positive/negative semantic orientation in a positive/negative review
Filtering "flames" in newsgroups: when the semantic orientation of a message drops below a threshold, the message may be a flame
Questions? Comments? Observations?
Thumbs up? Sentiment Classification using Machine Learning Techniques

Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan
EMNLP-2002
Overview
Considers the problem of classifying documents by overall sentiment
Three machine learning methods are compared against human-generated lists of words:
  Naïve Bayes
  Maximum Entropy
  Support Vector Machines
Experimental Data
Movie-review domain; source: Internet Movie Database (IMDb)
Star or numerical ratings were converted into positive, negative, or neutral labels, so there was no need to hand-label the data for training or testing
Maximum of 20 reviews per author per sentiment category
752 negative reviews, 1301 positive reviews, 144 reviewers
List-of-Words Baseline
Intuition: maybe there are certain words that people tend to use to express strong sentiments
Classification is done by counting the number of positive and negative words in the document
Random-choice baseline: 50%
Machine Learning Methods
Bag-of-features framework:
  {f_1, …, f_m} is a predefined set of m features
  n_i(d) is the number of times f_i occurs in document d
  each document d is represented by the vector \vec{d} := (n_1(d), n_2(d), …, n_m(d))

Naïve Bayes:

  c_{NB} = \arg\max_c P(c \mid d), \quad P(c \mid d) = \frac{P(c)\, P(d \mid c)}{P(d)} = \frac{P(c) \prod_{i=1}^{m} P(f_i \mid c)^{n_i(d)}}{P(d)}
Machine Learning Methods – continued
Maximum Entropy:

  P_{ME}(c \mid d) := \frac{1}{Z(d)} \exp\Big( \sum_i \lambda_{i,c}\, F_{i,c}(d, c) \Big)

where F_{i,c} is a feature/class function:

  F_{i,c}(d, c') := \begin{cases} 1 & n_i(d) > 0 \text{ and } c' = c \\ 0 & \text{otherwise} \end{cases}

Support Vector Machines: find the hyperplane \vec{w} that maximizes the margin. The solution of the constrained optimization problem has the form

  \vec{w} := \sum_j \alpha_j c_j \vec{d}_j, \qquad \alpha_j \geq 0,\ c_j \in \{1, -1\}

where c_j is the correct class of document d_j
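For illustration only: the paper used its own Naïve Bayes and maximum-entropy implementations plus SVM-light, but the setup can be approximated with scikit-learn (an assumption, as is the hypothetical load_reviews() loader), using binary presence features:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB
from sklearn.linear_model import LogisticRegression  # a maximum-entropy model
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

docs, labels = load_reviews()  # hypothetical: review texts + pos/neg labels

# binary=True records feature *presence*, which the paper found most effective;
# min_df=4 mirrors the "appearing at least 4 times" unigram cutoff
X = CountVectorizer(binary=True, min_df=4).fit_transform(docs)

for clf in (BernoulliNB(), LogisticRegression(max_iter=1000), LinearSVC()):
    scores = cross_val_score(clf, X, labels, cv=3)  # 3 folds, as in the paper
    print(type(clf).__name__, scores.mean())
```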
Evaluation
700 positive-sentiment and 700 negative-sentiment documents, split into 3 equal-sized folds
The tag "NOT_" was added to every word between a negation word ("not", "isn't", "didn't") and the first punctuation mark, so that "good" is kept distinct from "not very good" (see the sketch below)
Features:
  16,165 unigrams appearing at least 4 times in the 1400-document corpus
  the 16,165 most frequently occurring bigrams in the same data
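A minimal sketch of the negation tagging; the tokenizer and the exact negation list are assumptions:

```python
import re

NEGATIONS = {"not", "isn't", "didn't", "no", "never"}  # illustrative list

def add_not_tags(text):
    """Prefix NOT_ to every word between a negation word and the next
    punctuation mark."""
    tokens = re.findall(r"[\w']+|[.,!?;:]", text.lower())
    out, negating = [], False
    for tok in tokens:
        if tok in ".,!?;:":
            negating = False
            out.append(tok)
        elif negating:
            out.append("NOT_" + tok)
        else:
            out.append(tok)
            if tok in NEGATIONS:
                negating = True
    return out

# add_not_tags("This isn't very good, really.")
# -> ['this', "isn't", 'NOT_very', 'NOT_good', ',', 'really', '.']
```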
Results
POS information was added to differentiate between uses such as "I love this movie" and "This is a love story"
Conclusion
Results produced by the machine learning techniques are better than the human-generated baselines
SVMs tend to do the best
Unigram presence information is the most effective feature set
Frequency vs. presence: because of "thwarted expectations", a review may contain many words indicative of the opposite sentiment to that of the entire review
Some form of discourse analysis is necessary
Questions? Comments? Observations?
Summarizing Scientific Articles: Experiments with Relevance and Rhetorical Status

Simone Teufel and Marc Moens
CL-2002
Overview
Summarization of scientific articles: restore the discourse context of extracted material by adding the rhetorical status of each sentence in the document
Gold-standard data: computational linguistics articles annotated with the rhetorical status and relevance of each sentence
A supervised learning algorithm that classifies sentences into 7 rhetorical categories
Why?
Knowledge about the rhetorical status of each sentence enables tailoring summaries to the user's expertise and task:
  Nonexpert summary: background information and the general purpose of the paper
  Expert summary: no background; instead, differences between this approach and similar ones
Contrasts or complementarity among articles can be expressed
Rhetorical Status
Generalizations about the nature of scientific texts, plus information that enables the construction of better summaries:
Problem structure: problems (research goals), solutions (methods), and results
Intellectual attribution: what the new contribution is, as opposed to previous work and background (generally accepted statements)
Scientific argumentation
Attitude toward other people's work: a rival approach, a prior approach with a fault, or an approach contributing parts of the authors' own solution
Metadiscourse and Agentivity
Metadiscourse is an aspect of scientific argumentation and a way of expressing attitude toward previous work: "we argue that", "in contrast to common belief, we"
Agent roles in argumentation: rivals, contributors of part of the solution (they), the entire research community, or the authors of the paper (we)
Citations and Relatedness
Just knowing that one article cites another is often not enough
One needs to read the context of the citation to understand the relation between the articles:
  an article may be cited negatively or contrastively
  an article may be cited positively, or the authors may state that their own work originates from the cited work
Rhetorical Annotation Scheme
Seven categories: AIM, TEXTUAL, OWN, BACKGROUND, CONTRAST, BASIS, and OTHER
Only one category is assigned to each full sentence
Nonoverlapping, nonhierarchical scheme
The rhetorical status is determined on the basis of the global context of the paper
Relevance
Selecting important content from text is highly subjective, which leads to low human agreement
Here, a sentence is considered relevant if it describes the research goal or states a difference from a rival approach
Other definitions exist, e.g., a sentence is relevant if it shows a high level of similarity to a sentence in the abstract
Corpus
80 conference articles from:
  Association for Computational Linguistics (ACL)
  European Chapter of the Association for Computational Linguistics (EACL)
  Applied Natural Language Processing (ANLP)
  International Joint Conference on Artificial Intelligence (IJCAI)
  International Conference on Computational Linguistics (COLING)
XML markup was added
The Gold Standard
3 task-trained annotators
17 pages of guidelines, 20 hours of training
No communication between annotators
Evaluation measures of the annotation: stability and reproducibility
Results of Annotation
Agreement measured with the Kappa coefficient K (Siegel and Castellan, 1988):

  K = \frac{P(A) - P(E)}{1 - P(E)}

where P(A) = pairwise agreement and P(E) = random agreement
Stability: K = .82, .81, .76 (N = 1,220 and k = 2)
Reproducibility: K = .71
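A one-line implementation of the formula, with illustrative numbers (not from the paper):

```python
def kappa(p_agree, p_chance):
    """Agreement above chance, normalized by the maximum improvement."""
    return (p_agree - p_chance) / (1 - p_chance)

# e.g., 90% observed agreement with 50% expected by chance:
# kappa(0.90, 0.50) -> 0.8
```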
The System
Supervised machine learning: a Naïve Bayes classifier over the features described below
Features
Absolute location of the sentence in the document: limitations of the authors' own method can be expected toward the end, while limitations of other researchers' work are discussed in the introduction
Features – continued
Section structure: relative and absolute position of the sentence within its section: first, last, second or third, second-to-last or third-to-last, or somewhere in the first, second, or last third of the section
Paragraph structure: relative position of the sentence within its paragraph: initial, medial, or final
Features – continued
Headlines: type of headline of the current section: Introduction, Implementation, Example, Conclusion, Result, Evaluation, Solution, Experiment, Discussion, Method, Problems, Related Work, Data, Further Work, Problem Statement, or Non-Prototypical
Sentence length: longer or shorter than a 12-word threshold
Features – continued
Title word contents: does the sentence contain words that also occur in the title?
TF*IDF word contents: TF*IDF gives high values to words that occur frequently in one document but rarely in the overall collection; do any of the 18 highest-scoring TF*IDF words occur in the sentence?
Verb syntax: voice, tense, and modal linguistic features
Features – continued
Citations: citation (self), citation (other), author name, or none, plus the location of the citation in the sentence (beginning, middle, or end)
History: the most probable previous category (e.g., AIM tends to follow CONTRAST), calculated in a second pass during training
Features – continued
Formulaic expressions: a list of phrases described by regular expressions, divided into 18 classes comprising a total of 644 patterns; clustering into classes prevents data sparseness
Features – continued
Agent: 13 types, 167 patterns; the placeholder WORK_NOUN can be replaced by any of a set of 37 nouns including theory, method, prototype, algorithm
Agent classes whose distribution was very similar to the overall distribution of target categories were excluded
Features – continued
Action: 365 verbs clustered into 20 classes based on semantic concepts such as similarity and contrast:
  PRESENTATION_ACTIONs: present, report, state
  RESEARCH_ACTIONs: analyze, conduct, define, observe
Negation is considered (a simplified feature-extraction sketch follows)
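A much-simplified, hypothetical sketch of a few of the features above (location, sentence length, title words, formulaic expressions); the patterns and inputs are illustrative stand-ins, not the authors' 644-pattern lists:

```python
import re

# Illustrative stand-ins for a handful of the paper's formulaic patterns
FORMULAIC = [r"\bin this paper\b", r"\bwe argue that\b", r"\bin contrast to\b"]

def sentence_features(sentence, position, title_words):
    """position: fraction of the document (0.0-1.0) where the sentence sits;
    title_words: lowercased set of words from the article title."""
    words = re.findall(r"\w+", sentence.lower())
    return {
        "location": position,              # absolute location feature
        "long_sentence": len(words) > 12,  # 12-word length threshold
        "title_word": bool(set(words) & title_words),
        "formulaic": any(re.search(p, sentence.lower()) for p in FORMULAIC),
    }
```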
System Evaluation
Evaluated using 10-fold cross-validation
Feature Impact
The most distinctive single feature is Location, followed by SegAgent, Citations, Headlines, Agent, and Formulaic
Questions? Comments? Observations?
Thank You!