Sentiment and Subjectivity Analysis - EuroLAN 2017 (eurolan.info.uaic.ro/html/profs/materials/Ide.pdf)

Sentiment and Subjectivity Analysis: An Overview

Alina Andreevskaia
Department of Computer Science
Concordia University
Montreal, Quebec, Canada

Nancy Ide
Department of Computer Science
Vassar College
Poughkeepsie, New York, USA

Definition

• Sentiment Analysis
  – Also called Opinion Mining
  – Classify words/senses, texts, documents according to the opinion, emotion, or sentiment they express
• Applications
  – Determining critics’ opinions of products
  – Tracking attitudes toward political candidates
  – etc.

Sub-Tasks

• Determine Subjective-Objective polarity
  – Is the text or language factual or an expression of an opinion?
• Determine Positive-Negative polarity
  – Does the subjective text express a positive or negative opinion of the subject matter?
• Determine the strength of the opinion
  – Is the opinion weakly positive/negative, strongly positive/negative, or neutral?

History

• Current work stems from
  – Content analysis
    • “analysis of the manifest and latent content of a body of communicated material (as a book or film) through classification, tabulation, and evaluation of its key symbols and themes in order to ascertain its meaning and probable effect” (Webster’s Dictionary of the English Language, 1961)
• Long history
  – Quantitative newspaper analysis (1890-on)
  – Lasswell, 1941: study of political symbols in editorials and public speeches
  – Gerbner, 1969: establish “violence profiles” for different TV networks; trace trends; see how various groups are portrayed

Content Analysis

• Psychology and sociology
  – Analysis of verbal patterns to determine
    • motivational, mental, personal characteristics (1940s)
    • group processes
    • cultural commonalities/differences (Osgood, Suci, and Tannenbaum, 1957, semantic differential scales)
  – Major contribution: General Inquirer
    • http://www.wjh.harvard.edu/~inquirer/
• Anthropology
  – Study of myths, riddles, folktales
  – Analysis of kinship terminology (Goodenough, 1972)
• Literary and Rhetorical analysis
  – Stylistic analysis (Sedelow and Sedelow, 1966)
  – Thematic analysis (Smith, 1972; Ide, 1982, 1989)

Other Roots

• Point of view tracking in narrative (Banfield, 1982; Uspensky, 1973; Wiebe, 1994)
  – Subjectivity analysis
• Affective Computing (Picard, 1997)
  – Develop means to enable computers to detect and appropriately respond to users' emotions
• Directionality (Hearst, 1992)
  – Determine if author is positive, neutral, or negative toward some part of a document

History

• Late ’90s: first automatic systems implemented for NLP
  – Spertus, 1997
  – Wiebe and Bruce, 1995
  – Wiebe et al., 1999
  – Bruce and Wiebe, 2000

• Now a major research stream

Current work in NLP

• Sentiment tagging
  – Assignment of positive, negative, or neutral values/tags to texts and their components
  – Began with focus on binary (positive-negative) classification
  – More recent work also includes neutrals
• Little work on other types of affect
  – Still a focus in much content analysis work in other fields

Other Work

• Assignment of fine-grained affect labels based on various psychological theories (Valitutti et al., 2004; Strapparava and Mihalcea, 2007)
• Detection of
  – opinion holders (Kim and Hovy, 2004; Kim and Hovy, 2005; Kim and Hovy, 2006; Choi et al., 2005; Bethard et al., 2004; Kobayashi et al., 2007)
  – opinion targets (Hurst and Nigam, 2004; Gamon and Aue, 2005; Hu and Liu, 2004; Popescu and Etzioni, 2005; Kim and Hovy, 2006; Kobayashi et al., 2007)
  – perspective (Lin et al., 2006)
  – pros and cons in reviews (Kim and Hovy, 2006a)
  – bloggers’ mood (Mishne and Glance, 2006; Mishne, 2005; Leshed and Kaye, 2006)
  – happiness (Mihalcea and Liu, 2006)
  – politeness (Roman et al., 2005)
• Assignment of ratings to movie reviews (Pang and Lee, 2005)
• Identification of support/opposition in congressional debates (Thomas et al., 2006)
• Prediction of election results (Kim and Hovy, 2007)

Subjectivity

• Focuses on determining subjective words and texts that mark the presence of opinions and evaluations vs. objective words and texts, used to present factual information (Wiebe, 2000; Wiebe et al., 2004; Wiebe and Riloff, 2005)

Terminology

• Many terms
  – Sentiment classification, sentiment analysis
  – Semantic orientation
  – Opinion analysis, opinion mining
  – Valence
  – Polarity
  – Attitude

• Here, use the term sentiment

Theories of Emotion and Affect

• Osgood's semantic differential (Osgood, Suci, and Tannenbaum, 1957)
  – Three recurring attitudes that people use to evaluate words and phrases
    • Evaluation (good-bad)
    • Potency (strong-weak)
    • Activity (active-passive)
• Ortony's salience-imbalance theory (Ortony, 1979)
  – defines metaphors in terms of particular relationships between topic and vehicle


• Martin’s Appraisal Framework
  – http://www.grammatics.com/appraisal/
• Three sub-types of attitude
  – Affect (emotion)
    • evaluation of emotional disposition
  – Judgment (ethics)
    • normative assessments of human behavior
  – Appreciation (aesthetics)
    • assessments of form, appearance, composition, impact, significance, etc. of human artefacts and individuals


• Elliott’s Affective Reasoner
  – http://condor.depaul.edu/~elliott/ar.html

GROUP | SPECIFICATION | CATEGORY LABEL AND EMOTION TYPE
Well-Being | appraisal of a situation as an event | joy, distress
Fortunes-of-Others | presumed value of a situation as an event affecting another | happy, gloating, resentment, jealousy, envy, sorry-for
Prospect-based | appraisal of a situation as a prospective event | hope, fear
Confirmation | appraisal of a situation as confirming or disconfirming an expectation | satisfaction, relief, fears-confirmed, disappointment
Attribution | appraisal of a situation as an accountable act of some agent | pride, admiration, shame, reproach
Attraction | appraisal of a situation as containing an attractive or unattractive object | liking, disliking
Well-being/Attribution | compound emotions | gratitude, anger, gratification, remorse
Attraction/Attribution | compound emotion extensions | love, hate


• Ekman’s basic emotions
  – Ekman’s work revealed that facial expressions of emotion are not culturally determined, but universal to human culture and thus biological in origin
  – Found expressions of anger, disgust, fear, joy, sadness, and surprise to be universal
    • Some evidence for contempt

Resource Development

• Semantic properties of individual words are good predictors of semantic characteristics of a phrase or a text that contains them

• Requires development of lists of words indicative of sentiment

Manually-created Lists

• General Inquirer (GI)
  – Best-known extensive list of words tagged for various categories
  – Developed as part of content-analysis project (Stone et al., 1966; Stone et al., 1997)
  – Three main word lists:
    • Harvard IV-4 dictionary of content-analysis categories (includes Osgood’s three dimensions of value, power and activity)
    • Lasswell’s dictionary
      – eight basic value categories (WEALTH, POWER, RESPECT, RECTITUDE, SKILL, ENLIGHTENMENT, AFFECTION, WELLBEING) plus other info
    • Five categories based on social cognition work of Semin and Fiedler (1988)
      – Verb and adjective types
  – Some words tagged for sense
  – Recognized as a gold standard for evaluation of automatically produced lists

WordNetAffect

• Developed semi-automatically by Strapparava and colleagues
  – assigned affect labels to words in WordNet
  – expanded the lists using WordNet relations such as synonymy, antonymy, entailment, hyponymy
• Includes
  – semantic labels based on psychological and social science theories (Ortony, Elliott, Ekman)
  – valence (positive or negative)
  – arousal (strength of emotion)
• 2004 version covers 1314 synsets, 3340 words
• Part of WordNet Domains
  – http://wndomains.itc.it/

Others

• Whissell’s Dictionary of Affect in Language (DAL) (Sweeney and Whissell, 1984; Whissell, 1989; Whissell and Charuk, 1985)
• Affective Norms for English Words (ANEW) (Bradley and Lang, 1999)
• Sentiment-bearing adjectives by Hatzivassiloglou and McKeown (1997)
• Sentiment and subjectivity clues from work by Wiebe

Caution

• Limitations of manually annotated lists
  – limited coverage
  – low inter-annotator agreement
  – diversity of the tags used

Automatically-created Lists

Corpus-based methods

• Hatzivassiloglou and McKeown (1997) (HM)
  – builds on the observation that some linguistic constructs, such as conjunctions, impose constraints on the semantic orientation of their constituents
  – clustered adjectives from the Wall Street Journal in a graph into positive and negative sets based on the type of conjunction between them
  – cluster with higher average frequency was deemed to contain positive adjectives; the cluster with lower average frequency was deemed negative
• Limitations
  – algorithm limited to adjectives (also adverbs -- Turney and Littman, 2002)
  – requires large amounts of hand-labeled data to produce accurate results
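The conjunction idea can be illustrated with a much-simplified sketch. It is not the HM algorithm itself: it assumes each adjective pair has already been marked as same-orientation (e.g. joined by "and") or opposite-orientation (e.g. joined by "but"), splits the graph into two groups by simple propagation, and then applies the frequency heuristic from the slide. The function name, the toy links, and the frequency counts are all hypothetical.

```python
from collections import deque


def split_by_conjunctions(links, frequency):
    """Split adjectives into (positive, negative) sets.

    links: iterable of (adj_a, adj_b, same_orientation) triples (toy input).
    frequency: dict mapping each adjective to a corpus frequency (toy input).
    Assumes a connected, conflict-free graph; the first assignment wins.
    """
    graph = {}
    for a, b, same in links:
        graph.setdefault(a, []).append((b, same))
        graph.setdefault(b, []).append((a, same))

    side = {}  # adjective -> 0 or 1
    for start in graph:
        if start in side:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour, same in graph[node]:
                if neighbour not in side:
                    # Same-orientation links keep the side, others flip it.
                    side[neighbour] = side[node] if same else 1 - side[node]
                    queue.append(neighbour)

    groups = ([w for w, s in side.items() if s == 0],
              [w for w, s in side.items() if s == 1])
    averages = [sum(frequency[w] for w in g) / len(g) if g else 0.0
                for g in groups]
    # Heuristic from the slide: the more frequent group is taken as positive.
    if averages[0] >= averages[1]:
        return set(groups[0]), set(groups[1])
    return set(groups[1]), set(groups[0])


TOY_LINKS = [("good", "nice", True), ("nice", "rude", False), ("rude", "nasty", True)]
TOY_FREQUENCY = {"good": 100, "nice": 60, "rude": 10, "nasty": 5}
positive, negative = split_by_conjunctions(TOY_LINKS, TOY_FREQUENCY)
```

On the toy data the two groups are {good, nice} and {rude, nasty}; real conjunction data is noisy, which is one reason the original work relied on labeled data and clustering rather than plain propagation.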

Web As Corpus

• Peter Turney (Turney, 2002; Turney and Littman, 2002; Turney and Littman, 2003)
  – More general method, does not require previously annotated data for training
  – Induce sentiment of a word from the strength of its association with 14 seed words with known positive or negative semantic orientation
  – Two methods for association:
    • Point-wise mutual information (PMI)
    • Latent Semantic Analysis (LSA)
  – Used web as data
    • ran queries on AltaVista using NEAR operator to acquire co-occurrence statistics with the 14 seed words
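A minimal sketch of the PMI-based orientation score, assuming co-occurrence counts have already been collected (the slides obtained them from AltaVista NEAR queries, which are no longer available). The seed list is the one recalled from Turney and Littman (2003) and should be checked against the paper; the function name, the input dictionaries, and the smoothing constant are assumptions made for illustration.

```python
import math

# Seed words as recalled from Turney and Littman (2003); verify before reuse.
POSITIVE_SEEDS = ["good", "nice", "excellent", "positive",
                  "fortunate", "correct", "superior"]
NEGATIVE_SEEDS = ["bad", "nasty", "poor", "negative",
                  "unfortunate", "wrong", "inferior"]


def so_pmi(near_hits, seed_hits, smoothing=0.01):
    """Semantic orientation of one word from co-occurrence counts.

    near_hits[seed]: count of the target word occurring near the seed.
    seed_hits[seed]: count of the seed word on its own.
    The target word's own count cancels out because there are as many
    positive seeds as negative ones, so it is not needed here.
    """
    score = 0.0
    for seed in POSITIVE_SEEDS:
        score += math.log2(near_hits.get(seed, 0) + smoothing)
        score -= math.log2(seed_hits[seed] + smoothing)
    for seed in NEGATIVE_SEEDS:
        score -= math.log2(near_hits.get(seed, 0) + smoothing)
        score += math.log2(seed_hits[seed] + smoothing)
    return score


# Toy counts: a word seen ten times more often near positive seeds.
toy_seed_hits = {s: 1000 for s in POSITIVE_SEEDS + NEGATIVE_SEEDS}
toy_near_hits = {s: 50 for s in POSITIVE_SEEDS}
toy_near_hits.update({s: 5 for s in NEGATIVE_SEEDS})
toy_score = so_pmi(toy_near_hits, toy_seed_hits)  # greater than zero
```

A positive score suggests positive orientation and a negative score the opposite; the magnitude can serve as a rough strength estimate.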

Turney, cont’d

• Results evaluated against GI on a variety of test settings gave up to 97.11% accuracy for top 25% of words
• Size of the corpus had a considerable effect
  – 10-million word corpus instead of the full Web content reduced accuracy to 61.26-68.74%
• LSA performed relatively better than PMI on 10m word corpus
  – LSA more complex, harder to implement

End of An Era

• Due to its simplicity, high accuracy and domain independence, the PMI method became popular

• In 2005, AltaVista discontinued support for NEAR operator, upon which the method relied

• Attempts to substitute NEAR with AND led to considerable deterioration in system performance

Other Approaches

• Bethard et al. (2004)
  – Used two different methods to acquire opinion words from corpora
    • calculated frequency of co-occurrence with seed words taken from Hatzivassiloglou and McKeown, computed the log-likelihood ratio
    • computed relative frequencies of words in subjective and objective documents
  – First method produced better results for adverbs and nouns, gave higher precision but lower recall for adjectives
  – Second method worked best for verbs

Other Approaches

• Kim and Hovy (2005)
  – separated opinion words from non-opinion words by computing their relative frequency in subjective (editorial) and objective (non-editorial) texts from TREC data
• Riloff et al. (2003), Grefenstette et al. (2006)
  – Used syntactic patterns
  – Learn lexico-syntactic expressions characteristic of subjective nouns
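The relative-frequency idea used by Bethard et al. and by Kim and Hovy can be sketched as a smoothed frequency ratio between a subjective and an objective collection. This is an illustration under stated assumptions, not either paper's exact procedure: whitespace tokenization, add-one smoothing, and the function name are choices made here, and the toy documents are invented.

```python
from collections import Counter


def relative_frequency_ratio(subjective_docs, objective_docs, smoothing=1.0):
    """Return {word: P(word | subjective) / P(word | objective)}.

    Ratios well above 1 suggest an opinion-bearing word; ratios well
    below 1 suggest a word typical of factual text.
    """
    subj = Counter(tok for doc in subjective_docs for tok in doc.lower().split())
    obj = Counter(tok for doc in objective_docs for tok in doc.lower().split())
    vocab = set(subj) | set(obj)
    subj_total = sum(subj.values()) + smoothing * len(vocab)
    obj_total = sum(obj.values()) + smoothing * len(vocab)
    return {
        word: ((subj[word] + smoothing) / subj_total)
        / ((obj[word] + smoothing) / obj_total)
        for word in vocab
    }


toy_subjective = ["this is a terrible awful decision", "an awful plan"]
toy_objective = ["the plan was announced on monday",
                 "the decision was made on monday"]
toy_ratios = relative_frequency_ratio(toy_subjective, toy_objective)
```

In the toy data "awful" receives a ratio above 1 and "monday" a ratio below 1; with real corpora a threshold on the ratio would have to be tuned.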

Automatically-created Lists

Dictionary-Based Methods

• Address some of the limitations of corpus-based methods
• Use semantic resources such as WordNet, thesauri
• Two approaches:
  – Rely on thesaural relations between words (synonymy, antonymy, hyponymy, hypernymy) to find similarity between seed words and other words
  – Exploit information contained in definitions and glosses

Use of WordNet

• Kim and Hovy (2004, 2005)
  – Extended word lists by using WordNet synsets
  – Ranked lists based on sentiment polarity assigned to each word in synset based on WordNet distance from positive and negative seed words
• Similar approach used by Hu and Liu (2004)

SentiWordNet

• Developed by Esuli and Sebastiani
  – Trained several classifiers to give rating for positive, negative, objective for each synset in WordNet 2.0
    • Scores from 0 to 1
  – Freely available
    • http://sentiwordnet.isti.cnr.it

Beyond Synsets

• Kamps et al. (2004)
  – Tagged words with Osgood’s three semantic dimensions
  – Computed shortest path through WordNet relations connecting a word to words representative of the three categories (e.g., good and bad for evaluation)
• Esuli and Sebastiani (2005)
  – Classified words in WordNet into positive and negative based on synsets, glosses and examples
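The shortest-path idea attributed to Kamps et al. can be sketched on a small hand-made synonymy graph. The scoring formula, (d(word, bad) - d(word, good)) / d(good, bad), is the evaluation measure as recalled from their paper and should be treated as an assumption; the graph below is a toy stand-in for WordNet, and all names are hypothetical.

```python
from collections import deque


def build_graph(edges):
    """Turn a list of undirected (a, b) edges into an adjacency dict."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path_length(graph, start, goal):
    """Breadth-first search; returns None when no path exists."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def evaluation_score(graph, word, positive="good", negative="bad"):
    """Score in [-1, 1]: closer to `positive` gives a higher value."""
    d_pos = shortest_path_length(graph, word, positive)
    d_neg = shortest_path_length(graph, word, negative)
    d_poles = shortest_path_length(graph, positive, negative)
    if d_pos is None or d_neg is None or not d_poles:
        return None  # word not connected to both poles
    return (d_neg - d_pos) / d_poles


# Toy synonymy links (invented for illustration, not taken from WordNet).
TOY_EDGES = [("good", "fine"), ("fine", "okay"), ("okay", "poor"),
             ("poor", "bad"), ("good", "great"), ("bad", "awful")]
toy_graph = build_graph(TOY_EDGES)
```

With these links `evaluation_score(toy_graph, "great")` is 1.0, `"awful"` scores -1.0, and `"okay"`, which sits halfway between the poles, scores 0.0. The same function could be reused for potency and activity by swapping the pole words.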

Further Beyond Synsets

• Andreevskaia and Bergler (2006)
  – Take advantage of the semantic similarity between glosses and head words
  – Start with a list of manually annotated words, expand with synonyms and antonyms, search WordNet glosses for occurrences of seed words
  – If a gloss contains a word with known sentiment, head word deemed to have same sentiment
  – Suggest that overlap measure reflects the centrality of the word in the sentiment category
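The gloss-matching step can be sketched as follows. This is a simplified illustration, not the authors' system: it skips the synonym/antonym expansion, ignores negation inside glosses, and resolves conflicting evidence by a simple majority vote, which is an assumption made here. The glosses are invented examples rather than actual WordNet entries.

```python
def tag_by_gloss(glosses, seeds):
    """Assign a sentiment label to head words whose gloss mentions a seed.

    glosses: dict head_word -> gloss text (toy input).
    seeds: dict word -> "positive" or "negative".
    Head words with no seed evidence, or with tied evidence, stay untagged.
    """
    tagged = {}
    for head, gloss in glosses.items():
        if head in seeds:
            continue  # already annotated
        votes = {"positive": 0, "negative": 0}
        for raw in gloss.lower().split():
            label = seeds.get(raw.strip(".,;:()"))
            if label is not None:
                votes[label] += 1
        if votes["positive"] != votes["negative"]:
            tagged[head] = max(votes, key=votes.get)
    return tagged


TOY_SEEDS = {"good": "positive", "bad": "negative"}
TOY_GLOSSES = {
    "superb": "of surpassing excellence; extremely good",
    "dreadful": "exceptionally bad or displeasing",
    "table": "a piece of furniture with a flat top",
}
toy_tags = tag_by_gloss(TOY_GLOSSES, TOY_SEEDS)
```

Running the tagger repeatedly with different seed subsets and counting how often a word is retrieved would be one way to approximate the overlap measure mentioned above, though that step is not shown here.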

Role of Neutrals

• Most work cited so far classifies words as positive or negative
• Results vary between 60-80% agreement with GI as gold standard
• Adding neutrals reduces accuracy by 10-20%, depending on part of speech

Problems

• Words without strong positive or negative connotations are difficult to categorize accurately
  – Strength of +/- affinity can be used as a measure
  – Highest accuracy for words on extremes of the +/- poles
• Many words have both sentiment-bearing and neutral senses
  – E.g., great is typically tagged as positive, but according to statistics in WordNet it is used neutrally in 75% of occurrences
  – Solve this by using sense-tagged word lists

Sense-tagged lists

• The need for sense-level sentiment annotation has recently attracted considerable attention

• Development of methods to devise sense-tagged word lists

Sense-tagged Word Lists

• Andreevskaia and Bergler (2006)
  – applied gloss-based sentiment tagging to sense level
  – extended their system by adding a word sense disambiguation module (Senti-Sense)
    • used syntactic patterns to disambiguate between sentiment-bearing and neutral senses
  – learned generalized adjective-noun patterns for sentiment-bearing adjectives from unambiguous data
  – abstracted learned patterns to higher levels of hypernym hierarchies using predetermined propagation rules
  – applied learned patterns to disambiguate adjectives with multiple senses in order to locate senses that bear sentiment

Sense-tagged Word Lists

• Esuli and Sebastiani (2007)
  – Application of random-walk PageRank algorithm to sentiment tagging of synsets
  – Takes advantage of the graph-like structure of the WordNet hierarchy

Sense-tagged Word Lists

• Wiebe and Mihalcea (2006)
  – Sense-level tagging for subjectivity (i.e., neutrals vs. sentiment-bearing words)
  – Automatic method for sense-level sentiment tagging based on Lin’s (1998) similarity measure
  – Acquire a list of top-ranking distributionally similar words
  – Compute WordNet-based measure of semantic similarity for each word in the list

Beyond Positive and Negative

• Ide (2006)
  – Bootstrapped sense-tagged word lists for semantic categories based on
    • FrameNet frames (e.g., commitment, reasoning, etc.)
    • GI categories (hostile/friendly, weak/strong, submit/dominate, power/cooperation)
  – Treated lexical units associated with a given frame as a “bag of words”
  – Used WordNet::Similarity to compute similarity among senses of the words
    • Relation set consisted of different pairwise combinations of synsets, glosses, examples, hypernyms, and hyponyms
  – Based on results, compute “suresenses” (strongest association) and retain; iterate
  – Augment with “suresenses” for synsets, hypernyms, hyponyms
• Resulting lists very high on precision (98%), lower on recall
• Used hierarchical clustering to group senses in a given category into positive and negative and finer-grained distinctions

Results for “judgment-communication”

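[Figure: clusters of word senses in the “judgment-communication” category. Clusters recoverable from the figure:]
• deride1, ridicule1, gibe2, scoff1, mock1, scoff2, remonstrate2
• accuse1, denigrate2
• accuse2, charge2, recriminate1
• condemn1, decry1, excoriate1, deprecate1
• belittle2, disparage1, reprehend1, censure1, denounce1, remonstrate3, blame2, castigate1
• denigrate1, deprecate2, execrate2
• acclaim1, extol1, laud1, commend4, commend1, praise1, cite2
Cluster labels: ridicule, accuse, charge, condemn, belittle, denigrate, acclaim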

Manually Annotated Corpora

• Scarcity of manually annotated resources for system training and evaluation
• Some work uses user-created rankings in online product, book, or movie reviews
  – ranking scale (good-bad, liked-disliked, etc.)
  – easily available, fast to collect
  – Drawbacks
    • contain significant amount of noise
      – erroneous ratings
      – misspellings
      – phrases in different languages
    • exacerbated when researchers automatically break reviews into sentences or snippets

Available Corpora

Corpus | Level of annotation | Annotation type(s) | Corpus size | Link
MPQA | Phrases and sentences | Private states | 535 documents (10,657 sentences) | http://www.cs.pitt.edu/wiebe/mpqa
Opinion corpus | Expressions and sentences | Subjectivity and objectivity | 2 sets of documents, 500 sentences each, from WSJ Treebank | http://www.cs.pitt.edu/wiebe/pub4.html
Product reviews | Product features | Sentiment features | 500 reviews | http://www.cs.uic.edu/liub/FBS/FBS.html
SemEval-07 Task 14 dataset | Headlines | Sentiment (-100 to +100 polarity scale), 6 basic emotions | 2225 news headlines | http://www.cse.unt.edu/~rada/affectivetext/

Inter-annotator Agreement

• Rate of agreement among human annotators may reveal important insights about the task, provide a critical baseline for system evaluation

• Few inter-annotator agreement studies conducted to date

• Inter-annotator agreement on popular sentiment genres such as movie reviews, blogs, so far unexplored

• Agreement depends on
  – unit annotated (sentence or text)
  – annotation type (subjectivity or sentiment)
  – domain and genre

IA Studies

• Sentence subjectivity labels (Wiebe et al., 1999)
  – annotators classified sentence as subjective if it contained any significant expression of subjectivity
  – multiple rounds of training and annotation instructions adjustment
  – pairwise Kappa over WSJ test set ranged from 0.59 to 0.76
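For readers unfamiliar with the statistic, pairwise Kappa can be computed as below. The sketch assumes Cohen's kappa for two annotators labeling the same items; the function name and the toy labels are illustrative only.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators over the same items.

    kappa = (observed agreement - chance agreement) / (1 - chance agreement)
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    # Chance agreement from each annotator's own label distribution.
    expected = sum(counts_a[label] * counts_b[label]
                   for label in set(counts_a) | set(counts_b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1.0 - expected)


toy_a = ["subj", "subj", "obj", "obj", "subj", "obj", "subj", "obj", "subj", "obj"]
toy_b = ["subj", "subj", "obj", "obj", "subj", "obj", "subj", "obj", "obj", "subj"]
toy_kappa = cohen_kappa(toy_a, toy_b)
```

In the toy example the annotators agree on 8 of 10 sentences and chance agreement is 0.5, so kappa comes out at about 0.6, at the low end of the range reported above.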

Variable Results

• Kim and Hovy (2004)
  – Relatively high agreement (κ=0.91) between two annotators who assigned positive, negative, and n/a labels to 100 newspaper sentences
• Gamon and Aue (2005)
  – Similar study using car reviews produced pairwise Kappa of 0.70 - 0.80

Granularity

• Strapparava and Mihalcea (2007)
  – Study suggests inter-annotator agreement substantially lower on fine-grained types of annotation
  – Six annotators assigned sentiment score and Ekman’s six basic emotions scores to news headlines
  – Agreement (Pearson correlation) 78.01 for sentiment
  – Agreement for emotion labels ranged from 36.07 (surprise) to 68.19 (sadness)
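Agreement on numeric scores of this kind can be measured with Pearson correlation. The sketch below assumes one plausible aggregation over several annotators: correlate each annotator with the mean of the others and average the results. Whether the study aggregated exactly this way is an assumption, as is the reading of the reported figures as correlations scaled by 100; the function names and toy scores are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


def leave_one_out_agreement(annotations):
    """Average correlation of each annotator with the mean of the others.

    annotations: one list of scores per annotator, all the same length.
    """
    results = []
    for i, own in enumerate(annotations):
        others = [a for j, a in enumerate(annotations) if j != i]
        mean_others = [sum(column) / len(column) for column in zip(*others)]
        results.append(pearson(own, mean_others))
    return sum(results) / len(results)


# Toy polarity scores on the -100..+100 scale for four headlines.
toy_annotations = [[10, -40, 75, 0], [20, -60, 90, 5], [0, -30, 60, -10]]
toy_agreement = leave_one_out_agreement(toy_annotations)
```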

IA for texts

• Wiebe et al., 2001
  – Two genres
    • flames (hostile, inflammatory messages)
      – κ=0.78
    • opinion pieces from WSJ
      – κ=0.94-0.95
  – Domain and genre may affect level of inter-annotator agreement

Sentiment Analysis

• Ultimate goal of sentiment and subjectivity annotation is the analysis of clauses, sentences, and texts
• Resources:
  – Lists
  – Annotated (validated) corpora
• Subjectivity analysis uses MPQA corpus
  – reliable gold standard for training and evaluation on newspaper texts
• Sentiment analysis must rely on small manually annotated test sets
  – created ad hoc
  – seldom made publicly available
  – too small for machine learning methods

Sentiment Analysis

• Single text often includes both positive and negative sentences
• Sentences and clauses regarded as the most natural units for sentiment/subjectivity annotation
  – Sentiment usually more homogeneous in a sentence than in a whole text
  – But harder to identify due to a limited number of clues
  – Work on improving system performance by performing analysis simultaneously at different levels
    • Sentence-level analysis has high precision
    • Text-level analysis has high recall

Words vs. Other Features

• Words/unigrams provide good accuracy in sentence-level sentiment tagging
  – Yu and Hatzivassiloglou (2003)
    • no significant improvement when bigrams or trigrams added to feature set
  – Kim and Hovy (2004)
    • presence of sentiment-bearing words works better than more sophisticated scoring methods
  – Riloff et al. (2006)
    • similar results for unigrams vs. unigrams plus bigrams, extraction patterns
    • subsumption hierarchy and feature selection improved accuracy by less than 1%

Other Features vs. Words

• Some studies show gains with use of other features
  – Wilson et al. (2005), Andreevskaia et al. (2007)
    • syntactic properties of the phrase (role in sentence structure, presence of modifiers and valence shifters) yield statistically significant increases in system performance for sentiment and subjectivity analysis
  – Gains in subjectivity analysis:
    • presence of complex adjectival phrases (Bethard et al., 2004)
    • similarity scores (Yu and Hatzivassiloglou, 2003)
    • position in the paragraph (Wiebe et al., 1999)
  – Gains in sentiment analysis:
    • syntactic patterns and negation (Hurst and Nigam, 2004; Mulder et al., 2004; Andreevskaia et al., 2007)
    • knowledge about the opinion holder (Kim and Hovy, 2004)
    • target of the sentiment (Hu and Liu, 2004)

Multiple Features

• Some experiments suggest improvement gained when multiple feature sets combined
  – Hatzivassiloglou and Wiebe, 2000
    • combination of lists of adjectives tagged with dynamic, polarity, and gradability labels best predictor of sentence subjectivity
  – Riloff et al. (2003)
    • accuracy of subjectivity tagging results improved with addition of each new feature (25 in all)
  – Gamon and Aue (2005)
    • sentiment tagging using larger number of features increased average precision and recall

SemEval-2007 Affective Text Task

• Opportunity to compare systems for sentiment analysis run on the same dataset of 2000 manually annotated news headlines

• Machine learning methods had highest recall at the cost of low precision
  – naïve Bayes-based CLaC-NB (Andreevskaia and Bergler, 2007)
  – word-space model-based SICS (Sahlgren et al., 2007)
• Knowledge-based unsupervised approaches had highest precision, but low recall because few sentiment clues per headline
  – CLaC (Andreevskaia and Bergler, 2007)
  – UPAR7 (Chaumartin, 2007)

General Conclusion (so far)

• Subjectivity classification
  – statistical approaches that use naïve Bayes or SVM significantly outperform non-statistical techniques
• Binary (positive-negative) sentiment classification
  – non-statistical methods that rely on presence of sentiment markers in a sentence, or on strength of sentiment associated with these markers, yield as good or better accuracy than statistical approaches
• BUT: both methods show some strengths in both tasks
• Need LOTS more research

Text-level Analysis

• Features
  – Choice of features used by a sentiment annotation system is a critical factor
  – Wide range of features used:
    • lists of words
    • lemmas or unigrams
    • bigrams
    • higher order n-grams
    • part-of-speech
    • syntactic properties of surrounding context
    • etc.

Use of Features

• Use of multiple models can improve the performance of both sentiment and subjectivity classifiers
  – Use of different features within the same classifier or in a community of several classifiers improves system performance

Features

• Studies show improvement when using n-grams vs. unigrams (Dave et al., 2003; Cui et al., 2006; Aue and Gamon, 2005; Wiebe et al., 2001; Riloff et al., 2006)
• Improvement when n-gram models / word lists augmented with
  – context information (Wiebe et al., 2001b; Andreevskaia et al., 2007)
  – features associated with syntactic structure (Gamon, 2004)
  – combination of words annotated for semantic categories related to sentiment or subjectivity (Whitelaw et al., 2005; Fletcher and Patrick, 2005; Mullen and Collier, 2004)

Feature Selection

• Costly or computationally infeasible to use all features
  – Aue and Gamon (2005)
    • n-grams selected based on Expectation Maximization algorithm
  – Riloff et al. (2006)
    • Use a subsumption hierarchy for n-grams (unigrams and bigrams) and extraction patterns
      – if a feature’s words and dependencies are a superset of a more general ancestor in the hierarchy, discard
      – only features with higher information gain were allowed to subsume less informative ones
    • combined use of subsumption and traditional feature selection improves performance of both subjective/objective and positive/negative text-level classification
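The subsumption rule described above can be sketched for n-gram features only. This is a simplified reading, not the procedure of Riloff et al. (2006): features are represented as word tuples, an "ancestor" is any feature whose words form a proper subset of another's, information-gain values are supplied as hypothetical inputs rather than computed from data, and the `delta` tolerance is an assumption.

```python
def apply_subsumption(info_gain, delta=0.0):
    """Drop complex features that a simpler, equally informative one covers.

    info_gain: dict mapping a feature (tuple of words) to its information
    gain (toy values here). A feature is discarded when some ancestor,
    i.e. a feature whose words are a proper subset of its own, has
    information gain of at least its own gain minus `delta`.
    """
    kept = []
    for feature, gain in info_gain.items():
        ancestors = [other for other in info_gain
                     if set(other) < set(feature)]
        subsumed = any(info_gain[a] >= gain - delta for a in ancestors)
        if not subsumed:
            kept.append(feature)
    return sorted(kept)


TOY_INFO_GAIN = {
    ("good",): 0.30,
    ("very",): 0.01,
    ("not",): 0.10,
    ("very", "good"): 0.25,  # adds nothing beyond "good": discarded
    ("not", "good"): 0.45,   # more informative than its parts: kept
}
toy_kept = apply_subsumption(TOY_INFO_GAIN)
```

The bigram "very good" is removed because the unigram "good" is at least as informative, while "not good" survives because it carries more information than either of its parts.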

Part of Speech

• Subjectivity
  – Adjectives best predictors of subjectivity (Hatzivassiloglou and Wiebe, 2000)
  – Modals, pronouns, adverbs, cardinal numbers also used as subjectivity clues (Wiebe et al., 1999; Bruce and Wiebe, 2000)
• Sentiment
  – Combined use of words from all parts-of-speech produced more accurate tags (Blair et al., 2004; Salvetti et al., 2004)

Feature Generation

• Most sentiment classifiers use standard machine learning techniques to learn and select features from labeled corpora
  – Works well when large labeled corpora available for training and validation (e.g., movie reviews)
  – Falls short when training data is
    • scarce
    • from a different domain or topic
    • from a different time period
• Led to increased interest in unsupervised and semi-supervised approaches to feature generation

Some Promising Research

• Systems trained on a small number of labeled examples and large quantities of unlabelled in-domain data perform relatively well (Aue and Gamon, 2005)

• Structural correspondence learning applied to small number of labeled examples sufficient to adapt to new domain (Blitzer et al., 2007)

• So far performance of these methods inferior to supervised approaches and knowledge-based methods

• Availability of word lists and clues makes knowledge-based approaches an attractive alternative to supervised machine learning when labeled data is scarce

Domain Effects

• Variety of domains used in sentiment analysis
  – movie, music, book and other entertainment reviews
  – product reviews
  – blogs
  – dream corpus
  – etc.

• Choice of domain can have a major impact on results

Movie Reviews

• Popular domain in sentiment research
• Positive and negative words/expressions do not necessarily convey the opinion holder’s attitude
  – E.g., “evil” used in movie reviews when referring to characters or plot, does not convey sentiment toward the movie itself (Turney, 2002)
• Simple counting of positive and negative clues in movie review texts insufficient
• Clues acquired from out-of-domain sources often fail

Product Reviews

• Sentiment towards the whole product is the sum of the sentiment towards its parts, components, and attributes (Turney, 2002)

• General word lists perform better on product reviews (Turney, 2002; Kennedy and Inkpen, 2006)

Attitude Influence

• Texts with positive sentiment easier to classify than negative ones
  – Kennedy and Inkpen, 2006; Hurst and Nigam, 2004; Dave et al., 2003; Koppel and Schler, 2006; Chaovalit and Zhou, 2005
• Possible explanations:
  – positive documents more uniform (Dave et al., 2003)
  – positive clues have higher discriminant value (Koppel and Schler, 2006)
  – negative texts characterized by extensive use of negations and other valence shifters that reverse the sentiment conveyed by individual words (e.g., not bad) (Pang and Lee, 2004)

Attitude Influence

• Improvement in accuracy when valence shifters taken into account (Kennedy and Inkpen, 2006; Andreevskaia et al., 2007)
  – but negative impact reported when negation included in feature set (Dave et al., 2003)

• Use of balanced evaluation sets with equal number of positive and negative documents has become a standard in sentiment research
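The effect of negation as a valence shifter (e.g., not bad) can be shown with a small keyword-counting sketch. It is an illustration only: the word lists are tiny invented samples, and the rule that a negator flips the next sentiment word within a two-token window is an assumption, not the rule used in any of the cited systems.

```python
POSITIVE = {"good", "great", "excellent"}
NEGATIVE = {"bad", "poor", "boring"}
NEGATORS = {"not", "never", "no"}


def score_with_shifters(text, window=2):
    """Count sentiment clues, flipping polarity after a nearby negator.

    Returns a positive number for net positive text, negative for net
    negative text, and 0 when clues cancel out or are absent.
    """
    total = 0
    remaining = 0  # tokens still inside the scope of the last negator
    for raw in text.lower().split():
        token = raw.strip(".,!?;:")
        if token in NEGATORS:
            remaining = window
            continue
        polarity = (token in POSITIVE) - (token in NEGATIVE)
        if polarity and remaining > 0:
            polarity = -polarity  # the negator reverses this clue
            remaining = 0
        elif remaining > 0:
            remaining -= 1
        total += polarity
    return total


toy_scores = {
    "not bad": score_with_shifters("not bad"),
    "not very good": score_with_shifters("not very good"),
}
```

Without the shifter rule "not bad" would count as negative and "not very good" as positive; with it the scores become 1 and -1 respectively.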

Classification Algorithms

• Wide variety of classification approaches used:
  – simple keyword counting methods, with or without scoring
  – rule-based methods
  – content analytical methods (statistical)
  – SVM, naïve Bayes and other statistical classifiers used alone, sequentially, or as a community
• Comparison of results does not provide a definite answer as to which of these methods is the best for sentiment or subjectivity tagging
  – choice of features and training domain have more impact on accuracy than choice of classification algorithm
  – comparison of performance of systems evaluated on different domains or different feature sets not conclusive

Summary

• Sentiment and subjectivity analysis has evolved into a strong research stream in NLP

• State-of-the-art systems can reach up to 90% accuracy on certain domains
  – But need a generally applicable method

Research Directions

• Development of semi-supervised machine-learning approaches that will maximize the usefulness of the available resources and ensure domain adaptation with limited in-domain data
• Creation of reliable and extensive resources such as lists of words and expressions, syntactic patterns, combinatorial rules, and annotated corpora
• Creation of uniform ways to denote and represent sentiment and subjectivity annotation

Bibliography• Andreevskaia, A. and S. Bergler: 2007, ‘CLaC and CLaC-NB: Knowledge-

based and Corpus-based Approaches to Sentiment Tagging’. In: 4th International Workshop on Semantic Evaluations (SemEval 2007). Prague, Czech Republic.

• Andreevskaia, A., S. Bergler, and M. Urseanu: 2007, ‘All Blogs are not made Equal’. In: International Conference on Weblogs and Social Media (ICWSM-2007). Boulder, Colorado.

• Aue, A. and M. Gamon: 2005, ‘Customizing Sentiment Classifiers to New Domains: a Case Study’. In: RANLP-05, the International Conference on Recent Advances in Natural Language Processing. Borovets, Bulgaria.

• Bethard, S., H. Yu, A. Thornton, V. Hatzivassiloglou, and D. Jurafsky: 2004, ‘ Automatic Extraction of Opinion Propositions and their Holders’. In: Exploring Attitude and Affect in Text: theories and application (AAAI-EAAT 2004) .Stanford University.

• Blair, L., A. Jaharria, S. Lewis, T. Oda, C. Reichenbach, J. Rueppel, and F. Salvetti: 2004, ‘Impact of Lexical Filtering on Semantic Orientation’. In: Exploring Attitude and Affect in Text: theories and application (AAAI-EAAT 2004). Stanford University.

Bibliography• Breck, E., Y. Choi, and C. Cardie: 2007, ‘Identifying expressions of

opinion in context’. In: Proceedings of the Twentieth International Joint Conference on Artificial Intelligence (IJCAI-2007). Hyderebad, India.

• Bruce, R. and J. Wiebe: 2000, ‘Recognizing subjectivity: A case study of manual processing’. Natural Language Engineering 5(2), 187–205.

• Chaovalit, P. and L. Zhou: 2005, ‘Movie Review Mining: a Comparison between Supervised and Unsupervised Classification Approaches’. In: Proceedings of HICSS-05, the 38th Hawaii International Conference on System Sciences.

• Chaumartin, F.-R.: 2007, ‘ UPAR7: A Knowledge-based System for Headline Sentiment Tagging’. In: 4th International Workshop on Semantic Evaluations (SemEval 2007). Prague, Czech Republic.

• Cui, H., V. Mittal, and M. Datar: 2006, ‘Comparative Experiments on Sentiment Classification for Online Product Reviews’. In: Proceedings of the Twenty-First National Conference on Artificial Intelligence (AAAI-2006). Boston, MA.

Bibliography• Das, S. R. and M. Y. Chen: 2001, ‘Yahoo! For Amazon: Sentiment

extraction from small talk on the Web’. In: Asia Pacific Finance Association Annual Conference (APFA01).

• Dave, K., S. Lawrence, and D. M. Pennock: 2003, ‘Mining the Peanut gallery: opinion extraction and semantic classification of product reviews’. In: (WWW03). Budapest, Hungary, pp. 519–528.

• Drezde, M., J. Blitzer, and F. Pereira: 2007, ‘Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification’. In: 45th Annual Meeting of the Association for Computational Linguistics (ACL 2007). Prague, Czech Republic.

• Dunning, T.: 1993, ‘Accurate Methods for the Statistics of Surprise and Coincidence’. Computational Linguistics 19, 61–74.

• Ekman, P.: 1993, ‘Facial expression of emotion.’. American Psychologist 48, 384–392.

• Fletcher, J. and J. Patrick: 2005, ‘Evaluating the Utility of Appraisal Hierarchies as a Method for Sentiment Classification’. In: Proceedings of Australian Language Technology Workshop 2005. Sydney, Australia, pp. 134–142.

• Gamon, M.: 2004, ‘Sentiment classification on customer feedback data: noisy data, large feature vectors, and the role of linguistic analysis’. In: Proceedings of COLING-04, the 20th International Conference on Computational Linguistics. Geneva, CH, pp. 841–847.

• Gamon, M. and A. Aue: 2005, ‘Automatic identification of sentiment vocabulary: exploiting low association with known sentiment terms’. In: Proceedings of the ACL-05 Workshop on Feature Engineering for Machine Learning in Natural Language Processing. Ann Arbor, US.

• Goldberg, A. and J. Zhu: 2006, ‘Seeing stars when there aren’t many stars: Graph-based semi-supervised learning for sentiment categorization’. In: Proceedings of the HLT-NAACL 2006 Workshop on Textgraphs: Graph-based Algorithms for Natural Language Processing. Boston, MA.

• Hatzivassiloglou, V. and J. Wiebe: 2000, ‘Effects of Adjective Orientation and Gradability on Sentence Subjectivity’. In: 18th International Conference on Computational Linguistics (COLING-2000).

• Hu, M. and B. Liu: 2004, ‘Mining and summarizing customer reviews’. In: KDD-04. pp. 168–177.

• Hurst, M. and K. Nigam: 2004, ‘Retrieving topical sentiments from Online document collection’. In: Exploring Attitude and Affect in Text: theories and application (AAAI-EAAT 2004). Stanford University.

• Kennedy, A. and D. Inkpen: 2006, ‘Sentiment Classification of Movie Reviews Using Contextual Valence Shifters’. Computational Intelligence 22(2), 110–125.

• Kim, S.-M. and E. Hovy: 2004, ‘Determining the Sentiment of Opinions’. In: Proceedings COLING-04, the Conference on Computational Linguistics. Geneva, CH, pp. 1367–1373.

• Kim, S.-M. and E. Hovy: 2005a, ‘Automatic Detection of Opinion Bearing Words and Sentences’. In: Companion Volume to the Proceedings of IJCNLP-05, the Second International Joint Conference on Natural Language Processing. Jeju Island, KR, pp. 61–66.

• Kim, S.-M. and E. Hovy: 2005b, ‘Identifying Opinion Holders for Question Answering in Opinion Texts’. In: Proceedings of AAAI-05 Workshop on Question Answering in Restricted Domains. Pittsburgh, US.

• Kim, S.-M. and E. Hovy: 2006, ‘Extracting Opinions, Opinion Holders, and Topics Expressed in Online News Media Text’. In: Proceedings of the ACL/COLING Workshop on Sentiment and Subjectivity in Text. Sydney, Australia.

• Koppel, M. and J. Schler: 2006, ‘The importance of neutral examples for learning sentiment’. Computational Intelligence 22(2), 100–116.

• Mao, Y. and G. Lebanon: 2006, ‘Sequential Models for Sentiment Prediction’. In: Proceedings of the ICML workshop on Learning in Structured Output Spaces.

• McDonald, R., K. Hannan, T. Neylon, M. Wells, and J. Reynar: 2007, ‘Structured Models for Fine-to-Coarse Sentiment Analysis’. In: Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL-2007). Prague, Czech Republic.

• Mulder, M., A. Nijholt, M. den Uyl, and P. Terpstra: 2004, ‘A Lexical Grammatical Implementation of Affect’. In: Proceedings of TSD-04, the 7th International Conference Text, Speech and Dialogue, Vol. 3206 of Lecture Notes in Computer Science. Brno, CZ, pp. 171–178.

• Mullen, T. and N. Collier: 2004, ‘Sentiment analysis using support vector machines with diverse information sources’. In: Proceedings of EMNLP-04, 9th Conference on Empirical Methods in Natural Language Processing. Barcelona, Spain.

• Pang, B. and L. Lee: 2004, ‘A sentiment education: Sentiment analysis using subjectivity summarization based on minimum cuts’. In: ACL. pp. 271–278. arXiv:cs.CL/040958.

• Pang, B., L. Lee, and S. Vaithyanathan: 2002, ‘Thumbs up? Sentiment classification using machine learning techniques’. In: Conference on Empirical Methods in Natural Language Processing (EMNLP-2002). pp. 79–86.

• Read, J.: 2005, ‘Using emoticons to reduce dependency in machine learning techniques for sentiment classification’. In: Proceedings of the ACL-2005 Student Research Workshop. Ann Arbor, MI.

• Riloff, E., S. Patwardhan, and J. Wiebe: 2006, ‘Feature Subsumption for Opinion Analysis’. In: Proceedings of EMNLP-06, the Conference on Empirical Methods in Natural Language Processing. Sydney, AUS, pp. 440–448.

• Riloff, E., J. Wiebe, and T. Wilson: 2003, ‘Learning subjective nouns using extraction pattern bootstrapping’. In: W. Daelemans and M. Osborne (eds.): Proceedings of CONLL-03, 7th Conference on Natural Language Learning. Edmonton, CA, pp. 25–32.

• Sahlgren, M., J. Karlgren, and G. Eriksson: 2007, ‘SICS: Valence Annotation Based on Seeds in Word Space’. In: 4th International Workshop on Semantic Evaluations (SemEval 2007). Prague, Czech Republic.

• Salvetti, F., S. Lewis, and C. Reichenbach: 2004, ‘Impact of lexical filtering on overall polarity identification’. In: Exploring Attitude and Affect in Text: theories and application (AAAI-EAAT 2004). Stanford University.

• Snyder, B. and R. Barzilay: 2007, ‘Multiple Aspect Ranking using the Good Grief Algorithm’. In: Proceedings of NAACL-2007. Washington, DC.

• Stoyanov, V., C. Cardie, D. Litman, and J. Wiebe: 2004, ‘Evaluating an Opinion Annotation Scheme Using a New Multi-Perspective Question and Answer Corpus’. In: J. G. Shanahan, J. Wiebe, and Y. Qu (eds.): Proceedings of the AAAI Spring Symposium on Exploring Attitude and Affect in Text: Theories and Applications. Stanford, US.

• Strapparava, C. and R. Mihalcea: 2007, ‘SemEval-2007 Task 14: Affective Text’. In: 4th International Workshop on Semantic Evaluations (SemEval 2007). Prague, Czech Republic.

• Turney, P.: 2002, ‘Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews’. In: 40th Annual Meeting of the Association for Computational Linguistics (ACL02). pp. 417–424.

• Whitelaw, C., N. Garg, and S. Argamon: 2005, ‘Using Appraisal Taxonomies for Sentiment Analysis’. In: Proceedings of CIKM-05, the ACM SIGIR Conference on Information and Knowledge Management. Bremen, Germany.

• Wiebe, J.: 2002, ‘Instructions for Annotating Opinions in Newspaper Artic’. Technical Report TR-01-101, University of Pittsburgh, Department of Computer Science, Pittsburgh, PA.

• Wiebe, J., E. Breck, C. Buckley, C. Cardie, P. Davis, B. Fraser, D. Litman, D. Pierce, E. Riloff, T. Wilson, D. Day, and M. Maybury: 2003, ‘Recognizing and Organizing Opinions Expressed in World Press’. In: Proceedings of the AAAI Spring Symposium on New Directions in Question Answering.

• Wiebe, J., R. Bruce, M. Bell, M. Martin, and T. Wilson: 2001a, ‘A corpus study of Evaluative and Speculative Language’. In: Proceedings of the 2nd ACL SIGDial Workshop on Discourse and Dialogue. Aalborg, Denmark.

• Wiebe, J. and E. Riloff: 2005, ‘Creating Subjective and Objective Sentence Classifiers from Unannotated Texts’. In: Proceedings of CICLing-05, International Conference on Intelligent Text Processing and Computational Linguistics, Vol. 3406 of Lecture Notes in Computer Science. Mexico City, MX, pp. 475–486.

• Wiebe, J., T. Wilson, and M. Bell: 2001b, ‘Identifying Collocations for Recognizing Opinions’. In: Proceedings of the ACL Workshop on collocation. Toulouse, France.

• Wiebe, J., T. Wilson, and C. Cardie: 2005, ‘Annotating expressions of opinions and emotions in language’. Language Resources and Evaluation 39(2–3), 165–210.

• Wiebe, J. M., R. F. Bruce, and T. P. O’Hara: 1999, ‘Development and Use of a Gold-Standard Data Set for Subjectivity Classifications’. In: 37th Annual Meeting of the Association for Computational Linguistics (ACL-99). pp. 246–253.

• Wilson, T. and J. Wiebe: 2003, ‘Annotating Opinions in the World Press’. In: 4th SIGdial Workshop on Discourse and Dialogue (SIGdial-03).

• Wilson, T., J. Wiebe, and R. Hwa: 2006, ‘Recognizing Strong and Weak Opinion Clauses’. Computational Intelligence 22(2), 73–99.

• Yu, H. and V. Hatzivassiloglou: 2003, ‘Towards Answering Opinion Questions: Separating Facts from Opinions and Identifying the Polarity of Opinion Sentences’. In: M. Collins and M. Steedman (eds.): Proceedings of EMNLP-03, 8th Conference on Empirical Methods in Natural Language Processing. Sapporo, Japan, pp. 129–136.
