XIV° Doctorate Course in Communication Science
Department of Political, Social and Communication Science
University of Salerno

Doctoral Thesis in Computational Linguistics

Detecting Subjectivity through Lexicon-Grammar Strategies
Databases, Rules and Apps for the Italian Language

Serena Pelosi
Matr. 8880300067

Supervisor: Prof. Annibale Elia
Coordinator: Prof. Alessandro Laudanna

A.A. 2015/2016



To Maria
she is in every
elementary sentence
of my mind

Contents

1 Introduction
  1.1 Research Context
  1.2 Filtering Information Automatically
  1.3 Subjectivity, Emotions and Opinions
  1.4 Structure of the Thesis

2 Theoretical Background
  2.1 The Lexicon-Grammar Theoretical Framework
    2.1.1 Distributional and Transformational Analysis
    2.1.2 Formal Representation of Elementary Sentences
      2.1.2.1 The Continuum from Simple Sentences to Polyrematic Structures
      2.1.2.2 Against the Bipartition of the Sentence Structure
    2.1.3 Lexicon-Grammar Binary Matrices
    2.1.4 Syntax-Semantics Interface
      2.1.4.1 Frame Semantics
      2.1.4.2 Semantic Predicates Theory
  2.2 The Italian Module of Nooj
    2.2.1 Electronic Dictionaries
    2.2.2 Local grammars and Finite-state Automata

3 SentIta: Lexicon-Grammar based Sentiment Resources
  3.1 Lexicon-Grammar of Sentiment Expressions
  3.2 Literature Survey on Subjectivity Lexicons
    3.2.1 Polarity Lexicons
    3.2.2 Affective Lexicons
    3.2.3 Subjectivity Lexicons for Other Languages
      3.2.3.1 Italian Resources
      3.2.3.2 Sentiment Lexical Resources for Different Languages
  3.3 SentIta and its Manually-built Resources
    3.3.1 Semantic Orientations and Prior Polarities
    3.3.2 Adjectives of Sentiment
      3.3.2.1 A Special Remark on Negative Words in Dictionaries and Positive Biases in Corpora
    3.3.3 Psychological and other Opinion Bearing Verbs
      3.3.3.1 A Brief Literature Survey on the Classification of Psych Verbs
      3.3.3.2 Semantic Predicates from the LG Framework
      3.3.3.3 Psychological Predicates
      3.3.3.4 Other Verbs of Sentiment
    3.3.4 Psychological Predicates’ Nominalizations
      3.3.4.1 The Presence of Support Verbs
      3.3.4.2 The Absence of Support Verbs
      3.3.4.3 Ordinary Verbs Structures
      3.3.4.4 Standard-Crossed Structures
    3.3.5 Psychological Predicates’ Adjectivalizations
      3.3.5.1 Support Verbs
      3.3.5.2 Nominal groups with appropriate nouns Napp
      3.3.5.3 Verbless Structures
      3.3.5.4 Evaluation Verbs
    3.3.6 Vulgar Lexicon
  3.4 Towards a Lexicon of Opinionated Compounds
    3.4.1 Compound Adverbs and their Polarities
      3.4.1.1 Compound Adjectives of Sentiment
    3.4.2 Frozen Expressions of Sentiment
      3.4.2.1 The Lexicon-Grammar Classes Under Examination: PECO, CEAC, EAA, ECA and EAPC
      3.4.2.2 The Formalization of Sentiment Frozen Expressions
  3.5 The Automatic Expansion of SentIta
    3.5.1 Literature Survey on Sentiment Lexicon Propagation
      3.5.1.1 Thesaurus Based Approaches
      3.5.1.2 Corpus Based Approaches
      3.5.1.3 The Morphological Approach
    3.5.2 Deadjectival Adverbs in -mente
      3.5.2.1 Semantics of the -mente formations
      3.5.2.2 The Superlative of the -mente Adverbs
    3.5.3 Deadjectival Nouns of Quality
    3.5.4 Morphological Semantics

4 Shifting Lexical Valences through FSA
  4.1 Contextual Valence Shifters
  4.2 Polarity Intensification and Downtoning
    4.2.1 Related Works
    4.2.2 Intensification and Downtoning in SentIta
    4.2.3 Excess Quantifiers
  4.3 Negation Modeling
    4.3.1 Related Works
    4.3.2 Negation in SentIta
  4.4 Modality Detection
    4.4.1 Related Works
    4.4.2 Modality in SentIta
  4.5 Comparative Sentences Mining
    4.5.1 Related Works
    4.5.2 Comparison in SentIta
    4.5.3 Interaction with the Intensification Rules
    4.5.4 Interaction with the Negation Rules

5 Specific Tasks of the Sentiment Analysis
  5.1 Sentiment Polarity Classification
    5.1.1 Related Works
    5.1.2 DOXA: a Sentiment Classifier based on FSA
    5.1.3 A Dataset of Opinionated Reviews
    5.1.4 Experiment and Evaluation
      5.1.4.1 Sentence-level Sentiment Analysis
      5.1.4.2 Document-level Sentiment Analysis
  5.2 Feature-based Sentiment Analysis
    5.2.1 Related Works
    5.2.2 Experiment and Evaluation
    5.2.3 Opinion Visual Summarization
  5.3 Sentiment Role Labeling
    5.3.1 Related Works
    5.3.2 Experiment and Evaluation

6 Conclusion

A Italian Lexicon-Grammar Symbols

B Acronyms

Bibliography

Chapter 1

Introduction

Consumers, as Internet users, can freely share their thoughts with huge and geographically dispersed groups of people, competing in this way with the traditional power of marketing and advertising channels. Unlike traditional word-of-mouth, which is usually limited to private conversations, Internet user-generated content can be directly observed and described by researchers. Pang and Lee (2008a, p. 7) identified 2001 as the “beginning of widespread awareness of the research problems and opportunities that Sentiment Analysis and Opinion Mining raise, and subsequently there have been literally hundreds of papers published on the subject”.

The context that facilitated the growth of interest in the automatic treatment of opinions and emotions can be summarized in the following points:

• the expansion of e-commerce (Matthews et al., 2001);

• the growth of user-generated content (forums, discussion groups, blogs, social media, review websites, aggregation sites), which can constitute large-scale databases for training machine learning algorithms (Chin, 2005; Pang and Lee, 2008a);


• the rise of machine learning methods (Pang and Lee, 2008a);

• the importance of on-line Word of Mouth (eWOM) (Gruen et al., 2006; Lee and Youn, 2009; Park and Lee, 2009; Chu and Kim, 2011);

• customer empowerment (Vollero, 2010);

• the large volume, high velocity and wide variety of unstructured data (Russom et al., 2011; Villars et al., 2011; McAfee et al., 2012).

The same preconditions caused a parallel rise of attention in the economics and marketing literature. Basically, through user-generated content, consumers share positive or negative information that can influence purchase decisions in many different ways and can shape buyers’ expectations, above all with regard to experience goods (Nakayama et al., 2010), such as hotels (Ye et al., 2011; Nelson, 1970), restaurants (Zhang et al., 2010), movies (Duan et al., 2008; Reinstein and Snyder, 2005), books (Chevalier and Mayzlin, 2006) or videogames (Zhu and Zhang, 2006; Bounie et al., 2005).

1.1 Research Context

Experience goods are products or services whose qualities cannot be observed prior to purchase. They are distinguished from search goods (e.g. clothes, smartphones) on the basis of the possibility of testing them before purchase and depending on the information costs (Nelson, 1970). Akerlof (1970, pp. 488-490) clearly exemplified the problems caused by the interaction of quality differences and uncertainty:

“There are many markets in which buyers use some market statistic to judge the quality of prospective purchases. [...] The automobile market is used as a finger exercise to illustrate and develop these thoughts. [...] The individuals in this market buy a new automobile without knowing whether the car they buy will be good or a lemon¹. [...] After owning a specific car, however, for a length of time, the car owner can form a good idea of the quality of this machine. [...] An asymmetry in available information has developed: for the sellers now have more knowledge about the quality of a car than the buyers. [...] The bad cars sell at the same price as good cars since it is impossible for a buyer to tell the difference between a good and a bad car; only the seller knows.”

Asymmetric information could damage both the customers, when they are not able to recognize good quality products and services, and the market itself, when it is negatively affected by the presence of a smaller quantity of quality products than would have been offered under the condition of symmetrical information (Lavanda and Rampa, 2001; von Ungern-Sternberg and von Weizsäcker, 1985).

However, for experience goods it has been deemed possible to base an evaluation of their quality on the past experiences of customers who, in previous “periods”, had already experienced the same goods:

“In most real world situations suppliers stay on the market for considerably longer than one ‘period’. They then have the possibility to build up reputations or goodwill with the consumers. This is due to the following mechanism. While the consumers cannot directly observe a product’s quality at the time of purchase, they may try to draw inferences about this quality from the past experience they (or others) have had with this supplier’s products.” (von Ungern-Sternberg and von Weizsäcker, 1985, p. 531)

¹ A defective car, discovered to be bad only after it has been bought.

Possible sources of information for buyers of experience goods are word of mouth and the good reputation of the sellers (von Ungern-Sternberg and von Weizsäcker, 1985; Diamond, 1989; Klein and Leffler, 1981; Shapiro, 1982, 1983; Dewally and Ederington, 2006).

The possibility of surveying other customers’ experiences increases the chance of selecting a good quality service and the customer’s expected utility. It is worthwhile to continue this search until its cost equals its expected marginal return. This is the “optimum amount of search” defined by Stigler (1961).

The rapid growth of the Internet drew the attention of managers and business academics to the possible influences that this medium can exert on customers’ information search behaviors and acquisition processes. The hypothesis that Web 2.0, as an interactive medium, could make the experience goods market transparent by transforming it into a search goods market has been explored by Klein (1998), who based his idea on three factors:

• lower search costs;

• different evaluation of some product (attributes);

• possibility to experience products (attributes) virtually, without physically inspecting them.

Actually, this idea seems too groundbreaking to be entirely true. Indeed, the real outcome of web searches depends on the quantity of available information, on its quality, and on the possibility of making fruitful comparisons among alternatives (Alba et al., 1997; Nakayama et al., 2010).

Therefore, from this perspective, it is true that the growth of user-generated content and eWOM can reduce information search costs. On the other hand, the distance increased by e-commerce, the content explosion and the information overload typical of the Big Data age can seriously hinder the achievement of a symmetrical distribution of information, affecting not only the market of experience goods but also that of search goods.


1.2 Filtering Information Automatically

The last remarks on the influence of the Internet on information searches inevitably reopen the long-standing problems connected to web information filtering (Hanani et al., 2001) and to the possibility of automatically “extracting value from chaos” (Gantz and Reinsel, 2011).

An appropriate management of online corporate reputation requires careful monitoring of the new digital environments, which strengthen the stakeholders’ influence and independence and give support during the decision making process. For example, analyzing them in depth can indicate to a company the areas that need improvement. Furthermore, they can be a valid support in price determination or in demand prediction (Bloom, 2011; Pang and Lee, 2008a).

It would indeed be difficult for humans to read and summarize such a huge volume of data about customer opinions. However, in other respects, introducing machines to the semantic dimension of human language remains an ongoing challenge.

In this context, business and intelligence applications would play a crucial role in the ability to automatically analyze and summarize not only databases, but also raw data in real time.

The largest amount of on-line data is semi-structured or unstructured and, as a result, its monitoring requires sophisticated Natural Language Processing (NLP) tools that must be able to pre-process the data from a linguistic point of view and then automatically access their semantic content.

Traditional NLP tasks consist of Data Mining, Information Retrieval and Extraction of facts, i.e. objective information about entities, from semi-structured and unstructured texts.

In any case, it is of crucial importance for both customers and companies to have at their disposal automatically extracted, analyzed and summarized data which include not only factual information, but also opinions regarding any kind of good they offer (Liu, 2010; Bloom, 2011).

Companies could take advantage of concise and comprehensive customer opinion overviews that automatically summarize the strengths and weaknesses of their products and services, with evident benefits in terms of reputation management and customer relationship management.

Customer information search costs could be decreased through the same overviews, which offer the opportunity to evaluate and compare the positive and negative experiences of other consumers who have already tested the same products and services.

Obviously, the advantages of sophisticated NLP methods and software, and their ability to distinguish factual from opinionated language, are not limited to the ones discussed so far; they are spread across different tasks and domains, such as:

Ads placement: possibility to avoid websites that are irrelevant, but also unfavorable, when placing advertisements (Jin et al., 2007);

Question-answering: chance to distinguish between factual and speculative answers (Carbonell, 1979);

Text summarization: potential of summarizing different opinions and perspectives in addition to factual information (Seki et al., 2005);

Recommendation systems: opportunity to recommend only the items with positive feedback (Terveen et al., 1997);

Flame and cyberbullying detection: ability to find semantically and syntactically complex offensive contents on the web (Xiang et al., 2012; Chen et al., 2012; Reynolds et al., 2011);

Literary reputation tracking: possibility to distinguish disapproving from supporting citations (Piao et al., 2007; Taboada et al., 2006);

Political texts analysis: opportunity to better understand positive or negative orientations of both voters and politicians (Laver et al., 2003; Mullen and Malouf, 2006).


1.3 Subjectivity, Emotions and Opinions

Sentiment Analysis, Opinion Mining, subjectivity analysis, review mining, appraisal extraction and affective computing are all terms that refer to the computational treatment of opinion, sentiment and subjectivity in raw text. Their ultimate shared goal is enabling computers to recognize and express emotions (Pang and Lee, 2008a; Picard and Picard, 1997).

The first two terms are definitely the most used in the literature. Although they basically refer to the same subject, they have met with varying degrees of success in different research communities. While Opinion Mining is more popular among web information retrieval researchers, Sentiment Analysis is more used in NLP environments (Pang and Lee, 2008a). Therefore, coherence with the approach of our work is the only reason for favoring the term Sentiment Analysis. All the terms mentioned above could be used and interpreted nearly interchangeably.

Subjectivity is directly connected to the expression of private states: personal feelings or beliefs toward entities, events and their properties (Liu, 2010). The expression private state has been defined by Quirk et al. (1985) and Wiebe et al. (2004) as a general covering term for opinions, evaluations, emotions and speculations.

The main difference from objectivity regards the impossibility of directly observing or verifying subjective language, but neither subjective nor objective implies truth: “whether or not the source truly believes the information, and whether or not the information is in fact true, are considerations outside the purview of a theory of linguistic subjectivity” (Wiebe et al., 2004, p. 281).

Opinions are defined as positive or negative views, attitudes, emotions or appraisals about a topic, expressed by an opinion holder at a given time. They are represented by Liu (2010) as the following quintuple:

$$(o_j, f_{jk}, oo_{ijkl}, h_i, t_l)$$

where $o_j$ represents the object of the opinion, $f_{jk}$ its features, $oo_{ijkl}$ the positive or negative semantic orientation of the opinion, $h_i$ the opinion holder and $t_l$ the time at which the opinion is expressed.

Because the time can almost always be deduced from structured data, we focused our work on the automatic detection and annotation of the other elements of the quintuple.

As regards the opinion holder, it must be underlined that “In the case of product reviews and blogs, opinion holders are usually the authors of the posts. Opinion holders are more important in news articles because they often explicitly state the person or organization that holds a particular opinion” (Liu, 2010, p. 2).

The Semantic Orientation (SO) gives a measure to opinions by weighing their polarity (positive/negative/neutral) and strength (intense/weak) (Taboada et al., 2011; Liu, 2010). The polarity can be assigned to words and phrases outside or inside a sentence or discourse context. In the first case it is called prior polarity (Osgood, 1952); in the second case, contextual or posterior polarity (Gatti and Guerini, 2012).

We can refer to both opinion objects and features with the term target (Liu, 2010), represented by the following function:

$$T = O(f)$$

where the object can take the shape of products, services, individuals, organizations, events, topics, etc., and the features are components or attributes of the object.

Each object $O$, represented as a “special feature” which is defined by a subset of features, is formalized in the following way:

$$F = \{f_1, f_2, \dots, f_n\}$$

Targets can be automatically discovered in texts through both synonym words and phrases $W_i$ or indicators $I_i$ (Liu, 2010):

$$W_i = \{w_{i1}, w_{i2}, \dots, w_{im}\}$$

$$I_i = \{i_{i1}, i_{i2}, \dots, i_{iq}\}$$
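As an illustration of the quintuple and of the target notation above, the following is a minimal Python sketch and not the implementation used in this work. The class `Opinion`, the helpers `summarize` and `find_features`, the integer encoding of the orientation (+1, -1, 0) and all the review data are hypothetical names and values introduced only for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Opinion:
    """One opinion quintuple (o_j, f_jk, oo_ijkl, h_i, t_l)."""
    obj: str                  # o_j: the object of the opinion
    feature: Optional[str]    # f_jk: a feature of the object (None = whole object)
    orientation: int          # oo_ijkl: +1 positive, -1 negative, 0 neutral (assumed encoding)
    holder: str               # h_i: the opinion holder
    time: str                 # t_l: when the opinion was expressed

    def target(self) -> str:
        """The target: the object alone or the object with one of its features."""
        return self.obj if self.feature is None else f"{self.obj}/{self.feature}"


# Hypothetical review data, invented for illustration only.
OPINIONS = [
    Opinion("hotel", "room", +1, "reviewer_1", "2015-06-01"),
    Opinion("hotel", "breakfast", -1, "reviewer_1", "2015-06-01"),
    Opinion("hotel", None, +1, "reviewer_2", "2015-07-12"),
]

# W_i: hypothetical synonym sets through which each feature can be found in text.
SYNONYMS = {
    "room": {"room", "bedroom", "suite"},
    "breakfast": {"breakfast", "buffet"},
}


def summarize(opinions: List[Opinion]) -> Dict[str, int]:
    """Sum the orientations expressed on each target."""
    totals: Dict[str, int] = {}
    for op in opinions:
        totals[op.target()] = totals.get(op.target(), 0) + op.orientation
    return totals


def find_features(tokens: List[str]) -> List[str]:
    """Return the features whose synonym set W_i contains one of the tokens."""
    return sorted(f for f, words in SYNONYMS.items() if words & set(tokens))


summary = summarize(OPINIONS)
features = find_features(["the", "suite", "was", "clean"])
```

Keeping the holder and the time as separate fields mirrors the observation that the time can usually be read from structured data, while the remaining elements have to be detected in the text.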

Emotions constitute the research object of Emotion Detection (Strapparava et al., 2004; Fragopanagos and Taylor, 2005; Alm et al., 2005; Strapparava et al., 2006; Neviarouskaya et al., 2007; Strapparava and Mihalcea, 2008; Whissell, 2009; Argamon et al., 2009; Neviarouskaya et al., 2009b, 2011; de Albornoz et al., 2012), a very active NLP subfield that encroaches on Speech Recognition (Buchanan et al., 2000; Lee et al., 2002; Pierre-Yves, 2003; Schuller et al., 2003; Bhatti et al.; Schuller et al., 2004) and Facial Expression Recognition (Essa et al., 1997; Niedenthal et al., 2000; Pantic and Patras, 2006; Pal et al., 2006; De Silva et al., 1997). In this NLP-based work we focus on Emotion Detection from written raw texts.

In order to provide a formal definition of sentiments, we refer to the functions of Gross (1995):

$$P(sent, h)$$

$$Caus(s, sent, h)$$

In the first one, the experiencer of the emotion ($h$) is a function of a sentiment ($sent$) through a predicative relationship; the latter expresses a sentiment that is caused by a stimulus ($s$) on a person ($h$).

1.4 Structure of the Thesis

In this Chapter we introduced the research field of Sentiment Analysis, placing it in the wider framework of Computational Linguistics and Natural Language Processing. We tried to clarify some terminological issues raised by the proliferation and overlap of terms in the literature on subjectivity. The choice of these topics has been justified through a brief presentation of the research context, which poses numerous challenges to all Internet stakeholders.

The main aim of the thesis has been identified with the necessity to treat both facts and opinions, expressed on the Web in the form of raw data, with the same accuracy and velocity as the data stored in database tables.

Details about the contribution provided by this work to Sentiment Analysis studies are structured as follows.

Chapter 2 introduces the theoretical framework on which the whole work has been grounded, the Lexicon-Grammar (LG) approach, with an in-depth examination of distributional and transformational analysis and of the semantic predicates theory (Section 2.1).

The assumption that elementary sentences are the basic discourse units endowed with meaning, and the necessity to replace a unique grammar with plenty of lexicon-dependent local grammars, are the founding ideas from the LG approach on which the research has been designed and realized.

The formalization of the sentiment lexical resources and tools has been performed through the tools described in Section 2.2.

Coherently with the LG method, our approach to Sentiment Analysis is lexicon-based. Chapter 3 provides a detailed description of the sentiment lexicon built ad hoc for the Italian language. It includes manually annotated resources for both simple (Section 3.3) and multiword (Section 3.4) units, and automatically derived annotations (Section 3.5) for the adverbs in -mente and for the nouns of quality.

Because a word’s meaning cannot be considered out of context, Chapter 3 explores the influences of some elementary sentence structures on the sentiment words occurring in them, while Chapter 4 investigates the effects on sentence polarities of the so-called Contextual Valence Shifters (CVS), namely Intensification and Downtoning (Section 4.2), Negation (Section 4.3), Modality (Section 4.4) and Comparison (Section 4.5).

Chapter 5 focuses on the three most challenging subtasks of Sentiment Analysis: the sentiment polarity classification of sentences and documents (Section 5.1), the Sentiment Analysis based on features (Section 5.2) and the sentiment-based semantic role labeling (Section 5.3). Experiments on these tasks are conducted through our resources and rules, respectively, on a multi-domain corpus of customer reviews, on a dataset of hotel comments and on a dataset that mixes tweets and news headlines.

Concluding remarks and limitations of the present work are reported in Chapter 6.

Due to the large number of challenges faced in this work, and in order to avoid penalizing the clarity and readability of the thesis, we preferred to mention the state-of-the-art works related to a given topic directly in the sections in which the topic is discussed. For easy reference, we anticipate that the literature survey on the existing sentiment lexicons is reported in Section 3.2; the methods tested in the literature for lexicon propagation are presented in Section 3.5.1; works related to intensification and negation modeling, to modality detection and to comparative sentence mining are respectively mentioned in Sections 4.2.1, 4.3.1, 4.4.1 and 4.5.1; finally, state-of-the-art approaches to sentiment classification, feature-based Sentiment Analysis and Sentiment Role Labeling (SRL) are cited in Sections 5.1.1, 5.2.1 and 5.3.1.

Some Sections of this thesis refer to works that have in part already been presented to the international computational linguistics community. These are the cases of Chapter 3, briefly introduced in Pelosi (2015a,b) and Pelosi et al. (2015), and Chapter 4, anticipated in Pelosi et al. (2015). Furthermore, the results presented in Section 5.1 have been discussed in Maisto and Pelosi (2014b), as well as the results of Section 5.2 in Maisto and Pelosi (2014a) and the ones of Section 5.3 in Elia et al. (2015).

Chapter 2

Theoretical Background

2.1 The Lexicon-Grammar Theoretical Framework

With Lexicon-Grammar we mean the method and practice of formal description of natural language introduced by Maurice Gross in the second half of the 1960s, who, while verifying some procedures of transformational-generative grammar (TGG) (Chomsky, 1965), laid the foundations for a brand new theoretical framework.

More specifically, during the description of French complement clause verbs through a transformational-generative approach, Gross (1975) realized that the English descriptive model of Rosenbaum (1967) was not enough to account for the huge number of irregularities he found in the French language.

LG introduced important changes in the way in which the relationship between lexicon and syntax was conceived (Gross, 1971, 1975). For the first time, it underlined the necessity of providing linguistic descriptions grounded on the systematic testing of syntactic and semantic rules along the whole lexicon, and not only on a limited set of speculative examples.

Actually, in order to define what can be considered a rule and what must be considered an exception, a small sample of the lexicon of the analyzed language, collected in arbitrary ways, can be truly misleading. It is assumed to be impossible to make generalizations without verifying or falsifying the hypotheses. According to Gross (1997, p. 1):

“Grammar, or as it has now been called, linguistic theory, has always been driven by a quest for complete generalizations, resulting invariably in recent times in the production of abstract symbolism, often semantic, but also algorithmic. This development contrasts with that of the other Natural Sciences such as Biology or Geology, where the main stream of activity was and is still now the search and accumulation of exhaustive data. Why the study of language turned out to be so different is a moot question. One could argue that the study of sentences provides an endless variety of forms and that the observer himself can increase this variety at will within his own production of new forms; that would seem to confirm that an exhaustive approach makes no sense. [...] But grammarians operating at the level of sentences seem to be interested only in elaborating general rules and do so without performing any sort of systematic observation and without a methodical accumulation of sentence forms to be used by further generations of scientists.”

In the LG methodology, the collection and analysis of a large quantity of linguistic facts is crucial, together with their continuous comparison with the reality of linguistic usage through examples and counter-examples.

The collected linguistic information is constantly registered in LG tables that cross-check the lexicon and the syntax of any given language. The ultimate purpose of this procedure is the definition of wide-ranging classes of lexical items associated with transformational, distributional and structural rules (D’Agostino, 1992).

What emerges from LG studies is that, when more than five or six properties are associated with lexical entries, each entry shows an individual behavior that distinguishes it from any other lexical item. However, it is always possible to organize a classification around at least one definitional property that is simultaneously accepted by all the items belonging to the same LG class and, for this reason, is promoted to distinctive feature of the class.

Having established this, it becomes easier to understand the necessity of replacing “the” grammar with thousands of lexicon-dependent local grammars.

The choice of this paradigm is due to its compatibility with the purposes of the present work and with computational linguistics in general, which, in order to achieve high performance, requires a large amount of linguistic data that must be as exhaustive, reproducible and well organized as possible.

The huge linguistic datasets produced over the years by the international LG community provide fine-grained semantic and syntactic descriptions of thousands of lexical entries, also referred to as “lexically exhaustive grammars” (D’Agostino, 2005; D’Agostino et al., 2007; Guglielmo, 2009), available for reuse in any kind of NLP task¹.

The LG classification and description of Italian verbs² (Elia et al., 1981; Elia, 1984; D’Agostino, 1992) is grounded on the differentiation of three macro-classes: transitive verbs, intransitive verbs and complement clause verbs. Every LG class has its own definitional structure, which corresponds to the syntactic structure of the nuclear sentence selected by a given number of verbs (e.g. V for piovere “to rain” and all the verbs of class 1; N0 V for bruciare “to burn” and the other verbs of class 3; N0 V da N1 for provenire “to come from” and the verbs belonging to class 6; etc.). All the lexical entries are then differentiated from one another in each class by taking into account all the transformational, distributional and structural properties accepted or rejected by every item.

¹ LG descriptions are available for 4,000+ nouns that enter into support verb constructions, 7,000+ verbs, 3,000+ multiword adverbs and almost 1,000 phrasal verbs (Elia, 2014a).

² LG tables are freely available at the address http://dsc.unisa.it/composti/tavole/combo/tavole.asp
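The organization of an LG table described above (lexical entries crossed with the properties they accept or reject) can be sketched as a small binary matrix. This is only an illustrative sketch under stated assumptions: apart from the verb bruciare and the structure N0 V, which come from the text, the entries `verb_x` and `verb_y`, the properties `prop_a`, `prop_b`, `prop_c` and every +/- value are placeholders invented for the example and do not reproduce any actual LG table.

```python
# Columns of the toy table. "N0 V" is the definitional structure quoted in the
# text for class 3; prop_a, prop_b and prop_c are hypothetical placeholders.
PROPERTIES = ["N0 V", "prop_a", "prop_b", "prop_c"]

# Rows of the toy table: True stands for "+" (property accepted),
# False for "-" (property rejected). All values are invented.
TABLE = {
    "bruciare": (True, True, False, True),
    "verb_x":   (True, False, False, True),
    "verb_y":   (True, True, True, False),
}


def definitional_properties(table, properties):
    """Return the properties accepted by every entry of the class."""
    return [
        prop
        for i, prop in enumerate(properties)
        if all(row[i] for row in table.values())
    ]


def distinct_profiles(table):
    """Count the different +/- rows; identical rows mean identical behavior."""
    return len(set(table.values()))


definitional = definitional_properties(TABLE, PROPERTIES)
profiles = distinct_profiles(TABLE)
```

In this toy table only N0 V is accepted by all the entries, so it plays the role of definitional property, while the three rows remain pairwise different, which mirrors the observation that entries tend to behave individually once several properties are taken into account.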

2.1.1 Distributional and Transformational Analysis

The Lexicon-Grammar theory lays its foundations on the Operator-argument grammar of Zellig S. Harris, the combinatorial system that supports the generation of utterances in natural language.

Saying that the operators store information regarding the sentence structures means assuming the nuclear sentence to be the minimum discourse unit endowed with meaning (Gross, 1992b).

This premise is shared with the LG theory, together with the centrality of distributional analysis, a method from structural linguistics that was formulated for the first time by Bloomfield (1933) and then perfected by Harris (1970). The insight that some categories of words can somehow control the functioning of a number of actants through a dependency relationship called valency comes instead from Tesnière (1959).

The distribution of an element A is defined by Harris (1970) as the sum of the contexts of A, where the context of A is the actual disposition of its co-occurring elements. Distributional analysis consists in the systematic substitution of lexical items with the aim of verifying the semantic or transformational reactions of the sentences. All of this is governed by verisimilitude rules that involve a gradation in the acceptability of the item combinations.

Although the native speakers of a language generally think that sentence elements can be combined arbitrarily, they actually choose a set of items among the classes that regularly appear together. The selection depends on the likelihood that an element co-occurs with elements of one class rather than another³.

When two sequences can appear in the same context, they have an equivalent distribution. When they do not share any context, they have a complementary distribution.

Obviously, the possibility of combining sentence elements is related to many different levels of acceptability. Basically, the utterances produced by the co-occurrence of words can vary in terms of plausibility. In the operator-argument grammar of Harris, the verisimilitude of occurrence of a word under an operator (or an argument) is an approximation of the likelihood or of the frequency of this word with respect to a fixed number of occurrences of the operator (or argument) (Harris, 1988).

Concerning transformations (Harris, 1964), we refer to the phenomenon in which the sentences A and B, despite a different combination of words, are equivalent from a semantic point of view (e.g. Floria ha letto quel romanzo = quel romanzo è stato letto da Floria “Floria read that novel = that novel has been read by Floria” (D’Agostino, 1992)). Therefore, a transformation is defined as a function T that ties a set of variables representing the sentences that are in paraphrastic equivalence (A and B are partial or full synonyms) and that follow the morphemic invariance (A and B possess the same lexical morphemes).

Transformational relations must not be mixed up with any derivation process of the sentence B from A and vice versa. A and B are just two correlated variants in which equivalent grammatical relations are realized (D’Agostino, 1992)⁴. That is why in this work we do not use directed arrows to indicate transformed sentences.

³ Among the traits that mark the co-occurrence classes we mention, by way of example, human and non-human nouns, abstract nouns, verbs with concrete or abstract usages, etc. (Elia et al., 1981).

⁴ The concept of transformation is different in generative grammar, which envisages an oriented passage from a deep structure to a surface one: “The central idea of transformational grammar is that they are, in general, distinct and that the surface structure is determined by repeated application of certain formal operations called ‘grammatical transformations’ to objects of a more elementary sort” (Chomsky, 1965, p. 16).

Among the transformational manipulations considered in this thesis, while allying the Lexicon-Grammar theoretical framework to the Sentiment Analysis field, we mention the following (Elia et al., 1981; Vietri, 2004):

Substitution: The substitution of polar elements in sentences is the starting point of this work, which explores the mutations in terms of acceptability and semantic orientation in the transformed sentences. It can be found in almost every chapter of this thesis.

Paraphrases with support verbs,⁵ nominalizations and adjectivalizations: This manipulation is essential, for example, in Sections 3.3.4 and 3.3.5, where we discuss the nominalization (e.g. amare = amore, “to love = love”) and the adjectivalization (e.g. amare = amoroso, “to love = loving”) of the Italian psychological verbs, or in Section 3.5.2, when we account for the relation among a set of sentiment adjectives and adverbs (e.g. appassionatamente = in modo appassionato, “passionately = in a passionate way”).

Repositioning: Emphasis, permutation, dislocation and extraction are all transformations that, by shifting the focus onto a specific element of the sentence, influence its strength when the element is polarized.

Extension: We are above all interested in the expansion of nominal groups with opinionated modifiers (see Section 3.3.2).

Restructuring: The complement restructuring plays a crucial role in the Feature-based Sentiment Analysis, for the correct annotation of the features and the objects of the opinions (see Sections 3.3.4 and 5.2).

Passivization: In our system of rules, passive sentences simply possess the same semantic orientation as the corresponding active sentences.

Systematic correlation: Examples are the standard/crossed structures mentioned in Section 3.3.4.

⁵ The concept of operator does not depend on a specific part of speech, so nouns, adjectives and prepositions can also possess the power to determine the nature and the number of the sentence arguments (D’Agostino, 1992). Because only verbs carry morpho-grammatical information regarding mood, tense, person and aspect, they must give this kind of support to non-verbal operators. The so-called Support Verbs (Vsup) are different from auxiliaries (Aux), which instead support other verbs. Support verbs (e.g. essere “to be”, avere “to have”, fare “to do”) can be substituted, case by case, by stylistic, aspectual and causative equivalents.
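
The notions of equivalent and complementary distribution introduced earlier in this section can be made concrete with a small sketch. It is only an illustration under simplifying assumptions: a context is reduced to the pair of immediately adjacent words, the three toy sentences are built from the examples used in this chapter, and the label "overlapping" for the intermediate case is ours, not part of Harris's terminology.

```python
from typing import List, Set, Tuple

# A context is simplified to (left neighbour, right neighbour).
Context = Tuple[str, str]

# Toy sentences (illustrative only, not a corpus used in the thesis).
SENTENCES: List[List[str]] = [
    ["Maria", "odia", "il", "fumo"],
    ["Maria", "ama", "il", "fumo"],
    ["il", "fumo", "tormenta", "Maria"],
]


def distribution(item: str, sentences: List[List[str]]) -> Set[Context]:
    """Return the set of contexts in which `item` occurs."""
    contexts: Set[Context] = set()
    for tokens in sentences:
        for i, token in enumerate(tokens):
            if token == item:
                left = tokens[i - 1] if i > 0 else "<s>"
                right = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
                contexts.add((left, right))
    return contexts


def compare(a: str, b: str, sentences: List[List[str]]) -> str:
    """Classify the distributional relation between two items."""
    dist_a, dist_b = distribution(a, sentences), distribution(b, sentences)
    if dist_a and dist_a == dist_b:
        return "equivalent"       # same contexts
    if not (dist_a & dist_b):
        return "complementary"    # no shared context
    return "overlapping"          # some, but not all, contexts shared


print(compare("odia", "ama", SENTENCES))
print(compare("odia", "tormenta", SENTENCES))
```

In this toy setting odia and ama share their only context, while odia and tormenta share none; a realistic analysis would of course rely on classes of co-occurring elements rather than on adjacent word forms.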

2.1.2 Formal Representation of Elementary Sentences

While the generative grammar focused its studies on complex sentences, both the Operator grammar and the LG approach assumed the so-called nuclear or elementary sentences, respectively, to be the minimum discourse units endowed with meaning, above all because complex sentences are made from elementary ones, as stated by Chomsky (1957, p. 92) himself:

“In particular, in order to understand a sentence it is necessary to know the kernel sentences from which it originates (more precisely, the terminal strings underlying these kernel sentences) and the phrase structure of each of these elementary components, as well as the transformational history of development of the given sentence from these kernel sentences. The general problem of analyzing the process of ‘understanding’ is thus reduced, in a sense, to the problem of explaining how kernel sentences are understood, these being considered the basic ‘content elements’ from which the usual, more complex sentences of real life are formed by transformational development.”

This assumption shifts the definition of creativity from “recursivity” in complex sentences (Chomsky, 1957) to “combinatory possibility” at the level of elementary sentences (Vietri, 2004), but it also affects the way in which the lexicon is conceived. Gross (1981, p. 48) made the following observation in this regard:⁶

⁶ On this topic, Chomsky also agrees that:


“Les entrées du lexique ne sont pas des mots, mais des phrases simples. Ce principe n’est en contradiction avec les notions traditionnelles de lexique que de façon apparente. En effet, dans un dictionnaire, il n’est pas possible de donner le sens d’un mot sans utiliser une phrase, ni de contraster des emplois différents d’un même mot sans le placer dans des phrases.”⁷

As will be seen in the following sections, the idea that the entries of the lexicon can be sentences rather than isolated words is a principle that has been fully adopted in the sentiment lexical resources built for the present work.
With regard to formalisms, Gross (1992b) represented all the elementary sentences through the following generic shape:

N0 V W

where N0 stands for the formal subject of the sentence, V stands for the verb and W can indicate all kinds of essential complements, including an empty one. While N0 V represents a great generality, W raises more complex classification problems (Gross, 1992b). Complements indicated by W can be direct (Ni), prepositional (Prep Ni) or sentential (Ch F).
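
The generic shape N0 V W can be sketched as a small data structure. The sketch below is only an illustration: the class and field names (`ElementarySentence`, `Complement`, `kind`) are hypothetical and do not reproduce any Lexicon-Grammar resource; only the symbols N0, V, Ni, Prep Ni and Ch F come from the text above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Complement:
    """One essential complement of W (hypothetical representation)."""
    kind: str              # "direct" (Ni), "prepositional" (Prep Ni) or "sentential" (Ch F)
    text: str
    preposition: str = ""  # only meaningful for prepositional complements


@dataclass
class ElementarySentence:
    """An elementary sentence of shape N0 V W."""
    n0: str                                            # formal subject
    v: str                                             # verb
    w: List[Complement] = field(default_factory=list)  # W may be empty

    def formula(self) -> str:
        """Build the structural formula, numbering nominal complements from 1."""
        parts = ["N0", "V"]
        index = 1
        for complement in self.w:
            if complement.kind == "direct":
                parts.append(f"N{index}")
                index += 1
            elif complement.kind == "prepositional":
                parts.append(f"Prep N{index}")
                index += 1
            elif complement.kind == "sentential":
                parts.append("Ch F")
            else:
                raise ValueError(f"unknown complement kind: {complement.kind}")
        return " ".join(parts)


sentence = ElementarySentence("Maria", "odia", [Complement("direct", "il fumo")])
print(sentence.formula())
```

Keeping W as a possibly empty list mirrors the remark that N0 V is fully general, while the classification effort concentrates on the complements.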

“In describing the meaning of a word it is often expedient, or necessary, to refer to the syntactic framework in which this word is usually embedded; e.g., in describing the meaning of ‘hit’ we would no doubt describe the agent and object of the action in terms of the notions ‘subject’ and ‘object’, which are apparently best analyzed as purely formal notions belonging to the theory of grammar.” (Chomsky, 1957, p. 104)

But, without any exhaustive application of syntactic rules to lexical items, he feels compelled to point out that:

“(...) to generalize from this fairly systematic use and to assign ‘structural meanings’ to grammatical categories or constructions, just as ‘lexical meanings’ are assigned to words or morphemes, is a step of very questionable validity.” (Chomsky, 1957, p. 104)

⁷ “The entries of the lexicon are not words, but simple sentences. This principle contradicts the traditional notions of lexicon only in an apparent way. Indeed, in a dictionary, it is impossible to provide the meaning of a word without a sentence, or to contrast different uses of the same word without placing it into sentences.” Author’s translation.


Information about the nature and the number of complements is contained in the verbs (or in other parts of speech that, case by case, play a predicative function).
Gross (1992b) presented the following typology of complements, each approached for the Italian language by the LG researchers mentioned below:

frozen arguments (C): Vietri (1990, 2011, 2014d)

free arguments (N): D’Agostino (1992), Bocchino (2007)

sentential arguments (Ch F): Elia (1984)

2.1.2.1 The Continuum from Simple Sentences to Polyrematic Structures

Gross (1988, 1992b, 1993) recognized and formalized the concept of phrase figée, “frozen sentence”, which refers to groups of words that can be tied to one another by different degrees of variability and cohesion (Elia and De Bueriis, 2008). They can take the shape of verbal, nominal, adjectival or adverbial structures.
The continuum along which frozen expressions can be classified varies according to higher or lower levels of compositionality and idiomaticity, and goes from completely free structures to totally frozen expressions, as summarized below (D’Agostino and Elia, 1998; Elia and De Bueriis, 2008):

distributionally free structures: high levels of co-occurrence variability among words

distributionally restricted structures: limited levels of co-occurrence variability

frozen structures: (almost) null levels of co-occurrence variability

proverbs: no variability of co-occurrence


Idiomatic interpretations can be attributed to the last two elements of the continuum, due to their low levels of compositionality.
The fact that the meaning of these expressions cannot be computed by simply summing up the meanings of the words that compose them has a direct impact on every kind of Sentiment Analysis application that does not correctly approach the linguistic phenomenon of idiomaticity.
We will show the role of multiword expressions in Section 3.4.
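
The impact of non-compositionality on a lexicon-based polarity computation can be illustrated with a minimal sketch. Everything in it is an assumption made for the example: the two toy lexicons, the polarity scores, and the greedy left-to-right matching strategy are hypothetical and are not the resources or the method described in this thesis.

```python
from typing import Dict, List, Tuple

# Hypothetical word-level polarity scores (unlisted words count as 0).
WORD_POLARITY: Dict[str, int] = {"felice": 2, "odia": -2}

# Hypothetical polarity of a frozen expression, stored as a whole unit:
# "al settimo cielo" ("over the moon") is positive as an expression.
FROZEN_POLARITY: Dict[Tuple[str, ...], int] = {("al", "settimo", "cielo"): 3}


def word_sum(tokens: List[str]) -> int:
    """Naive score: sum the polarities of the single words."""
    return sum(WORD_POLARITY.get(token, 0) for token in tokens)


def idiom_aware(tokens: List[str]) -> int:
    """Score that first tries to match frozen expressions as single units."""
    score, i = 0, 0
    while i < len(tokens):
        for expression, polarity in FROZEN_POLARITY.items():
            if tuple(tokens[i:i + len(expression)]) == expression:
                score += polarity
                i += len(expression)
                break
        else:
            # No frozen expression starts here: fall back to the word score.
            score += WORD_POLARITY.get(tokens[i], 0)
            i += 1
    return score


tokens = "Maria è al settimo cielo".split()
print(word_sum(tokens), idiom_aware(tokens))
```

With these toy values the word-by-word sum misses the positive orientation of the sentence entirely, whereas the lookup of the frozen expression as a unit recovers it.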

2.1.2.2 Against the Bipartition of the Sentence Structure

In the operator grammar, differently from the generative grammar, sentences are not supposed to have a bipartite structure

S → NP + VP

VP → V + NP ⁸

but are represented as predicative functions in which a central element, the operator, influences the organization of its variables, the arguments. The role of operator need not be played by verbs: it can also be played by other lexical items that possess a predicative function, such as nouns or adjectives.
This idea, shared also by the LG framework, is particularly important in the Sentiment Analysis field, especially if the task is the detection of the semantic roles involved in sentiment expressions (Section 5.3). Sentences like (1) and (2) are examples in which the semantic roles of experiencer of the sentiment and stimulus of the sentiment are respectively played by “Maria” and il fumo “the smoke”. It is intuitive that, from a syntactic point of view, these semantic roles are differently distributed in the direct (the experiencer is the subject, as in 1) and the reverse (the experiencer is the object, as in 2) sentence structures (see Section 3.3.3 for further explanations).

⁸ Sentence = Noun Phrase + Verb Phrase; Verb Phrase = Verb + Noun Phrase.


(1) [Sentiment [Experiencer Maria] odia [Stimulus il fumo]]⁹
“Maria hates the smoke”

(2) [Sentiment [Stimulus Il fumo] tormenta [Experiencer Maria]]
“The smoke harasses Maria”

The indissoluble bond between the sentiment and the experiencer,¹⁰ which does not differ from (1) to (2), is perfectly expressed by the Operator-argument grammar function Onn for both sentences, by the LG representation N0 V N1, or by the function of Gross (1995) Caus(s, sent, h) (described in Section 1.3 and examined in depth in Section 3):

(1) Caus(il fumo, odiare, Maria)
(2) Caus(il fumo, tormentare, Maria)

The same cannot be said of the bipartite sentence structure which, by representing the experiencer “Maria” inside (2) and outside (1) the verb phrase, unreasonably brings it closer to or further from the sentiment, to which it should be attached in just the same way (Figure 2.1).

Moreover, it does not explain how the verbs of (1) and (2) can exercise distributional constraints on the noun phrase indicating the experiencer (which must be human, or at least animate) both in the case in which it is under the dependence of the same verb phrase (experiencer object, see 2a) and when it is not (experiencer subject, see 1a).

(1a) [Sentiment [Experiencer (Maria + Il mio gatto + *Il televisore)] odia [Stimulus il fumo]]
“(Maria + My cat + *The television) hates the smoke”

⁹ For the sentence annotation we used the notation of Jackendoff (1990): [Event ].

¹⁰ Un sentiment est toujours attaché à la personne qui l’éprouve, “A sentiment is always attached to the person that feels it” (Gross, 1995, p. 1).


Figure 2.1: Syntactic Trees of the sentences (1) and (2)

(2a) [Sentiment [Stimulus Il fumo] tormenta [Experiencer (Maria + Il mio gatto + *Il televisore)]]
“The smoke harasses (Maria + My cat + *The television)”
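
The distributional constraint at work in (1a) and (2a) can be sketched as a check that depends on the verb and not on the position of the noun phrase in a syntactic tree. The sketch is a simplification under explicit assumptions: the trait sets, the table that records which slot hosts the experiencer, and the function name `acceptable` are hypothetical, and acceptability is reduced to a yes/no judgement.

```python
from typing import Dict, Set

# Hypothetical semantic traits of the nouns used in examples (1a) and (2a).
NOUN_TRAITS: Dict[str, Set[str]] = {
    "Maria": {"human", "animate"},
    "il mio gatto": {"animate"},
    "il televisore": set(),
    "il fumo": set(),
}

# Hypothetical lexical information: the slot in which each verb places its experiencer.
EXPERIENCER_SLOT: Dict[str, str] = {
    "odia": "N0",      # direct structure: experiencer subject
    "tormenta": "N1",  # reverse structure: experiencer object
}


def acceptable(n0: str, verb: str, n1: str) -> bool:
    """Accept the sentence only if the experiencer slot hosts an animate noun."""
    experiencer = n0 if EXPERIENCER_SLOT[verb] == "N0" else n1
    return "animate" in NOUN_TRAITS.get(experiencer, set())


print(acceptable("Maria", "odia", "il fumo"))
print(acceptable("il televisore", "odia", "il fumo"))
print(acceptable("il fumo", "tormenta", "il televisore"))
```

The same constraint is stated once, in the lexical entry of the verb, and applies identically whether the experiencer is the subject or the object.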

According to Gross (1979, p. 872), the problem is that:

“rewriting rules (e.g. S → NP VP, VP → V NP) describe only LOCAL dependencies. [...] But there are numerous syntactic situations that involve non-local constraints.”


While the Lexicon-Grammar and the operator grammar representations do not change, the paraphrases of (1, 2) with support verbs (1b, 2b) further complicate the generative grammar solution, which still does not correctly represent the relationships among the operators and the arguments (see Figure 2.2). It fails to determine the correct number of arguments (which are still two, and not three) and does not recognize the predicative function, which is played by the nouns odio “hate” and tormento “harassment” and not by the verbs provare “to feel” and dare “to give”.

(1b) [Sentiment [Experiencer Maria] prova odio per [Stimulus il fumo]]
“Maria feels hate for the smoke”

(2b) [Sentiment [Stimulus Il fumo] dà il tormento a [Experiencer Maria]]
“The smoke gives harassment to Maria”

For all these reasons, for the computational treatment of subjectivity, emotions and opinions in raw text, we preferred in this work the Lexicon-Grammar framework to other linguistic approaches, despite their popularity in the NLP literature.

2.1.3 Lexicon-Grammar Binary Matrices

We said that the difference between the LG language description and any other linguistic approach regards the systematic formalization of a very broad quantity of data. This means that, given a set of properties P that define a class of lexical items L, the LG method collects and formalizes every single item that enters such a class (extensional classification), while other approaches limit their analyses to reduced sets of examples (intensional classification) (Elia et al., 1981).
Because the presentation of the information must be as transparent, rigorous and formalized as possible, LG represents its classes through binary matrices like the one exemplified in Table 2.1.

Figure 2.2: Syntactic Trees of the sentences (1b) and (2b)

They are called “binary” because of the mathematical symbols “+” and “–”, respectively used to indicate the acceptance or the rejection of the properties P by each lexical entry L.
The symbol X unequivocally denotes an LG class and is always associated with a class definitional property, simultaneously accepted by all the items of the class (in the example of Table 2.1, P1). Subclasses of X can be easily built by choosing another P as definitional property (e.g. L2, L3 and L4 under the definitional substructure P2).
The sequences of “+” and “–” attributed to each entry constitute the “lexico-syntactic profiles” of the lexical items (Elia, 2014a).

X     P1   P2   P3   P4   Pn
L1    +    –    –    –    +
L2    +    +    –    –    –
L3    +    +    +    +    +
L4    +    +    +    +    +
Ln    +    –    –    –    +

Table 2.1: Example of a Lexicon-Grammar Binary Matrix

The properties on which the classification is arranged can regard the following aspects (Elia et al., 1981):

distributional properties, by which all the co-occurrences of the lexical items L with distributional classes and subclasses are tested inside acceptable elementary sentence structures;

structural and transformational properties, through which all the combinatorial transformational possibilities inside elementary sentence structures are explored (see Section 2.1.1 for more in-depth explanations).

It is possible for a lexical entry to appear in more than one LG matrix. In this case we are not referring to a single verb that accepts divergent syntactic and semantic properties, but to two different verb usages attributable to the same morpho-phonological entry (Elia et al., 1981; Vietri, 2004; D’Agostino, 1992). Verb usages possess the same syntactic and semantic autonomy as two verbs that do not present any morpho-phonological relation.
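
The binary matrix of Table 2.1 can be encoded and queried with a few lines of code. This is a minimal sketch that assumes a plain dictionary as storage and uses the ASCII characters "+" and "-" for acceptance and rejection; the function names (`accepts`, `subclass`, `profile`) are hypothetical, while the entries and the values are those of Table 2.1.

```python
from typing import Dict, List

# Properties (columns) and lexical entries (rows) of the abstract class X in Table 2.1.
PROPERTIES: List[str] = ["P1", "P2", "P3", "P4", "Pn"]
MATRIX: Dict[str, str] = {
    "L1": "+---+",
    "L2": "++---",
    "L3": "+++++",
    "L4": "+++++",
    "Ln": "+---+",
}


def accepts(entry: str, prop: str) -> bool:
    """True if the lexical entry accepts the given property."""
    return MATRIX[entry][PROPERTIES.index(prop)] == "+"


def subclass(prop: str) -> List[str]:
    """Entries that accept `prop`, i.e. the subclass defined by that property."""
    return [entry for entry in MATRIX if accepts(entry, prop)]


def profile(entry: str) -> Dict[str, bool]:
    """Lexico-syntactic profile of an entry as a property -> accepted mapping."""
    return {prop: accepts(entry, prop) for prop in PROPERTIES}


print(subclass("P1"))  # the whole class: P1 is the definitional property
print(subclass("P2"))  # the subclass under the definitional substructure P2
print(profile("L2"))
```

A verb with two usages would simply appear as two distinct rows, possibly in two different matrices, each with its own profile.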

2.1.4 Syntax-Semantics Interface

In the sixties, the heated debate about the syntax-semantics interface opposed theories supporting the autonomy of syntax from semantics to approaches sustaining the close link between these two linguistic dimensions.
Chomsky (1957) postulated the independence of syntax from semantics, due to the vagueness of semantics and because of the correspondence mismatches between these two linguistic dimensions, but recognized the existence of a relation between the two that deserves to be explored:

“There is, however, little evidence that ‘intuition about meaning’ is at all useful in the actual investigation of linguistic form. [...]
It seems clear, then, that undeniable, though only imperfect, correspondences hold between formal and semantic features in language. The fact that the correspondences are so inexact suggests that meaning will be relatively useless as a basis for grammatical description. [...]
The fact that correspondences between formal and semantic features exist, however, cannot be ignored. [...] Having determined the syntactic structure of the language, we can study the way in which this syntactic structure is put to use in the actual functioning of language. An investigation of the semantic function of level structure (...) might be a reasonable step towards a theory of the interconnections between syntax and semantics.” (Chomsky, 1957, pp. 94-102)

The contributions that come from the generative grammar, with particular reference to the theta-theory (Chomsky, 1993), assume the possibility of a mapping between syntax and semantics:

“Every content-bearing major phrasal constituent of a sentence (S, NP, AP, PP, etc.) corresponds to a conceptual constituent of some major conceptual category.¹¹” (Jackendoff, 1990, p. 44)

¹¹ Object, Event, State, Action, Place, Path, Property, Amount.


But their autonomy is not negated through this assumption: thematic roles are still considered to be part of the level of conceptual structure and not part of syntax (Jackendoff, 1990). This is confirmed again by Chomsky (1993, p. 17) when stating that “(...) the mapping of S-structures onto PF and LF are independent from one another”. Here, the S-structure, generated by the rules of the syntax, is associated by means of different interpretative rules with representations in phonetic form (PF) and in logical form (LF).
Many contributions from the linguistic community define the concept of thematic roles through the linking problem between syntax (syntactic realization of predicate-argument structures that determine the roles) and semantics (semantic representation of such structures) (Dowty, 1989; Grimshaw, 1990; Rappaport Hovav and Levin, 1998). Among the canonical thematic roles we mention the agent (Gruber, 1965), the patient (Baker, 1988), the experiencer (Grimshaw, 1990), the theme (Jackendoff, 1972), etc.

2.1.4.1 Frame Semantics

“Some words exist in order to provide access to knowledge of such frames to the participants in the communication process, and simultaneously serve to perform a categorization which takes such framing for granted.” (Fillmore, 2006, p. 381)

With these words, Fillmore (2006) depicted Frame Semantics, which describes sentences on the basis of predicators able to bring to mind the semantic frames (inference structures linked through linguistic convention to the meaning of the lexical items) and the frame elements (participants and props in the frame) involved in these frames (Fillmore, 1976; Fillmore and Baker, 2001; Fillmore, 2006).
A frame semantic description starts from the identification of the lexical items that carry a given meaning and then explores the ways in which the frame elements and their constellations are realized around the structures that have such items as head (Fillmore et al., 2002).
This theoretical framework is based on Case grammar, a study on the combination of the deep cases chosen by verbs and on the definition of case frames, which have the function of providing “a bridge between descriptions of situations and underlying syntactic representations” by attributing “semantico-syntactic roles to particular participants in the (real or imagined) situation represented by the sentence” (Fillmore, 1977, p. 61). A direct reference for Fillmore’s work is the valency theory of Tesnière (1959).
Based on these principles, the FrameNet research project produced a lexicon of English for both human use and NLP applications (Baker et al., 1998; Fillmore et al., 2002; Ruppenhofer et al., 2006). Its purpose is to provide a large amount of semantically and syntactically annotated sentences, endowed with information about the valences (combinatorial possibilities) of the items, derived from annotated contemporary English corpora. Among the semantic domains covered there are also emotion and cognition (Baker et al., 1998).
For the Italian language, Lenci et al. (2012) developed LexIt, a tool that, following the FrameNet approach, automatically explores syntactic and semantic properties of Italian predicates in terms of distributional profiles. It performs frame semantic analyses using both the La Repubblica corpus and the Wikipedia taxonomy.

2.1.4.2 Semantic Predicates Theory

The Lexicon-Grammar framework offers the opportunity to create matches between sets or subsets of lexico-syntactic structures and their semantic interpretations. The basis of such matches is the connection between the arguments selected by a given predicative item listed in our tables and the actants involved by the same semantic predicate. According to Gross (1981, p. 9):


“Soit Sy l’ensemble des formes syntaxiques d’une langue. [...] Soit Se un ensemble d’éléments de sens; les éléments de Sy seront associés à ceux de Se par des règles d’interprétation. [...] nous appellerons actants syntaxiques les sujet et complément(s) du verbe tels qu’ils sont décrits dans Sy; nous appelons arguments les variables des prédicats sémantiques. Dans certains exemples, il y a correspondance biunivoque entre actants et arguments, entre phrase simple et prédicat.”¹²

This is the basic assumption on which the Semantic Predicates theory has been built within the LG framework. It postulates a complex parallelism between Sy and Se that, differently from other approaches, cannot ignore the idiosyncrasies of the lexicon, since “il n’existe pas deux verbes ayant le même ensemble de propriétés syntaxiques” (“there are no two verbs with the same set of syntactic properties”; Gross, 1981, p. 10). Actually, the large number of structures in which words can enter underlines the importance of a sentiment lexical database that includes both semantic and (lexicon-dependent) syntactic classification parameters.

Therefore, the possibility of creating intuitive semantic macro-classifications within the LG framework does not contrast with the autonomy of semantics from syntax (Elia, 2014b).
In this thesis we will refer to the following three predicates, all of them connected to the expression of subjectivity:

Sentiment (e.g. odiare, “to hate”), which involves the actants experiencer and stimulus;

Opinion (e.g. difendere, “to stand up for”), which implicates the actants opinion holder and target of the opinion;

Physical act (e.g. baciare, “to kiss”), which evokes the actants patient and agent.

¹² “Let Sy be the set of syntactic forms of a language. [...] Let Se be the set of semantic elements; the elements of Sy will be associated with the ones of Se through interpretation rules. [...] We will call syntactic actants the subject and the complement(s) of the verb, in the way in which they are described in Sy; we will call arguments the variables of semantic predicates. In some examples there is a one-to-one correspondence between actants and arguments, between simple sentence and predicate.” Author’s translation.

Figure 2.3: LG syntax-semantic interface

As an example, the predicate in sentence (1), from the LG class 43, will be associated with a Predicate with two variables, described by the mathematical function Caus(s, sent, h):

(1) [Sentiment [Experiencer Maria] odia [Stimulus il fumo]]
“Maria hates the smoke”

The rules of interpretation that have been followed are:

1. the experiencer (h in the Se) corresponds to the formal subject (N0 in Sy);

2. the sentiment stimulus (t in the Se) is the human complement (N1 in Sy).

As shown in the few examples below, the syntactic transformations in which the same predicate is involved preserve the role played by its arguments.

Nominalization

(1b) [Sentiment [Experiencer Maria] prova odio per [Stimulus il fumo]]
“Maria feels hate for the smoke”

Adjectivalization

(1c) [Sentiment [Stimulus il fumo] è odioso per [Experiencer Maria]]
“The smoke is hateful for Maria”

Passivization

(1d) [Sentiment [Stimulus il fumo] è odiato da [Experiencer Maria]]
“The smoke is hated by Maria”

Repositioning

(1e) [Sentiment è [Stimulus il fumo] che [Experiencer Maria] odia]
“It is the smoke that Maria hates”
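
The preservation of roles across transformations can be checked with a small sketch that maps each annotated variant of sentence (1) onto the same Caus(s, sent, h) triple. The two tables and the function name `caus` are hypothetical conveniences for the example: the role order of each variant is copied from the annotations above, and nothing here reproduces the actual rule system used in this work.

```python
from typing import Dict, Tuple

# Surface order of the two annotated arguments in each variant of sentence (1).
SURFACE_ARGUMENTS: Dict[str, Tuple[str, str]] = {
    "1":  ("Maria", "il fumo"),   # Maria odia il fumo
    "1b": ("Maria", "il fumo"),   # Maria prova odio per il fumo
    "1c": ("il fumo", "Maria"),   # il fumo è odioso per Maria
    "1d": ("il fumo", "Maria"),   # il fumo è odiato da Maria
    "1e": ("il fumo", "Maria"),   # è il fumo che Maria odia
}

# Role carried by the first and the second argument in each variant.
ROLE_ORDER: Dict[str, Tuple[str, str]] = {
    "1":  ("Experiencer", "Stimulus"),
    "1b": ("Experiencer", "Stimulus"),
    "1c": ("Stimulus", "Experiencer"),
    "1d": ("Stimulus", "Experiencer"),
    "1e": ("Stimulus", "Experiencer"),
}


def caus(label: str, predicate: str = "odiare") -> Tuple[str, str, str]:
    """Return the triple (s, sent, h) for the variant identified by `label`."""
    roles = dict(zip(ROLE_ORDER[label], SURFACE_ARGUMENTS[label]))
    return (roles["Stimulus"], predicate, roles["Experiencer"])


for label in SURFACE_ARGUMENTS:
    print(label, caus(label))
```

All five variants yield the same triple, which is the sense in which the transformations leave the semantic predicate and its arguments untouched.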

As concerns the link between actants and arguments, the background of the research is the LEG-Semantic Role Labeling system of Elia (2014b), with particular reference to its psychological, evaluative and bodily predicates. The granularity of the verb properties in this SRL system is evidence of the strong dependence of syntax on the lexicon and proof of its absolute independence from semantics.
Special kinds of Semantic Predicates have already been used in NLP applications in a Lexicon-Grammar context: we mention Vietri (2014a), Elia et al. (2010) and Elia and Vietri (2010), who formalized and tested the Transfer Predicates on the Italian Civil Code; Elia et al. (2013), who focused on the Spatial Predicates; and Maisto and Pelosi (2014b), Pelosi (2015b) and Elia et al. (2015), who exploited the Psychological Semantic Predicates for Sentiment Analysis purposes.


2.2 The Italian Module of Nooj

Nooj is the NLP tool used in this work for both the language formalization and the corpora pre-processing and processing at the orthographical, lexical, morphological, syntactic and semantic levels (Silberztein, 2003).
Among the Nooj modules that have been developed for more than twenty languages by the international Nooj community, the Italian Lingware has been built by the Maurice Gross Laboratory of the University of Salerno.
The Italian module for NooJ can be integrated at any time with other ad hoc resources, in the form of electronic dictionaries and local grammars.
The freely downloadable Italian resources¹³ include:

bull electronic dictionaries of simple and compound words

bull an electronic dictionary of proper names

bull an electronic dictionary of toponyms

bull a set of morphological grammars

bull samples of syntactic grammars

This knowledge-based and linguistically grounded environment is not in contrast with hybrid and statistical approaches, which often require annotated corpora for their testing stages.
The Nooj annotations can cover the description of simple word forms, multiword units and even discontinuous expressions.
For a detailed description of the potential of the Italian module of Nooj, see Vietri (2014c).

¹³ www.nooj-association.org


2.2.1 Electronic Dictionaries

If we typed the key-phrase “electronic dictionary” into a search engine, the first results that would appear in the SERP would concern CD-ROM dictionaries or machine translators.
Actually, electronic dictionaries (usable with all computerized documents for text recognition, information retrieval and machine translation) should be conceived as lexical databases intended to be used only by computer applications, rather than by a wide audience, who probably would not be able to interpret data codes formalized in a complex way.
In addition to the target, the differences between the two types of dictionary can be summarized in three points:

Completeness: human dictionaries may overlook information that human users can extract from their encyclopedic knowledge. Electronic dictionaries, instead, must necessarily be exhaustive. They cannot leave anything to chance: the computer must not have any possibility of error.

Explanation: human dictionaries can provide part of the information implicitly, because they can rely on the skills of human users (intuition, adaptation and deduction). Electronic dictionaries, on the contrary, must make everything explicit: the computer can only process fully explicit symbols and instructions.

Coding: unlike in human dictionaries, all the information provided in electronic dictionaries, being related to the use of software for the automatic processing of data and texts, must be accurate, consistent and codified.

Combining all these features enlarges the size and increases the complexity of monolingual and bilingual electronic dictionaries, which are, for this reason, always bigger than the human ones.
The Italian electronic dictionary system, based on Maurice Gross’s Lexicon-Grammar, has been developed at the University of Salerno by the Department of Communication Science; it is grounded on the European standard RELEX.

2.2.2 Local grammars and Finite-state Automata

Local grammars are algorithms that, through grammatical, morphological and lexical instructions, are used to formalize linguistic phenomena and to parse texts. They are called “local” because, despite any generalization, they can be used only in the description and analysis of limited linguistic phenomena.
The reliability of “finite state Markov processes” (Markov, 1971) had already been excluded by Chomsky (1957, pp. 23-24), who argued that:

“(...) there are processes of sentence-formation that finite state grammars are intrinsically not equipped to handle. If these processes have no finite limit, we can prove the literal inapplicability of this elementary theory. If the processes have a limit, then the construction of a finite state grammar will not be literally out of the question, since it will be possible to list the sentences, and a list is essentially a trivial finite state grammar. But this grammar will be so complex that it will be of little use or interest.”

and remarked that:

“If a grammar does not have recursive devices (...) it will be prohibitively complex. If it does have recursive devices of some sort, it will produce infinitely many sentences.”

Gross (1993), instead, grasped the potential of finite state machines, especially for those linguistic segments that are more limited from a combinatorial point of view.
The Nooj software, actually, is not limited to finite-state machines and regular grammars, but relies on the four types of grammars of the Chomsky hierarchy. In fact, as we will show in this research, it is capable of processing context-free and context-sensitive grammars and of performing transformations in cascade.
Finite-State Automata (FSA) are abstract devices characterized by a finite set of nodes or “states” (Si) connected to one another by transitions (tj) that allow us to determine sequences of symbols related to a particular path (see Figure 2.4).
These graphs are read from left to right, or rather from the initial state (S0) to the final state (S3) (Gross, 1993). FSA can be deterministic, such as the one in Figure 2.4, or non-deterministic, if they activate more than one path, as in Figure 2.5. A Finite-State Transducer (FST) transduces the input symbols (Sii) into the output ones (Sio), as in Figure 2.6. A Recursive Transition Network (RTN) is a grammar that contains embedded graphs (the S3 node in Figure 2.7). An Enhanced Recursive Transition Network (ERTN) is a more complex grammar that includes variables (V) and constraints (C) (Silberztein, 2003; see Figure 2.7).

Figure 2.4: Deterministic Finite-State Automaton

Figure 2.5: Non-deterministic Finite-State Automaton

Figure 2.6: Finite-State Transducer

Figure 2.7: Enhanced Recursive Transition Network

For simplicity, from now on we will just use the acronym FSA to indicate all the kinds of local grammars used to formalize different linguistic phenomena.
In general, the nodes of the graphs can at any time recall entries or semantic and syntactic information from the electronic dictionaries. FSA are inflectional or derivational if the symbols in the nodes are morphemes, and syntactic if the symbols are words.
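
The behaviour of a deterministic automaton and of a transducer can be sketched with a plain transition table. The sketch is illustrative only: the state names S0 to S3 echo those used in the text, but the transitions, the Italian words and the output tags (NEG, INT, ADJ) are hypothetical and do not reproduce Figures 2.4 to 2.7, nor the graph format used by Nooj.

```python
from typing import Dict, List, Optional, Tuple

# (current state, input symbol) -> (next state, output symbol).
# Each pair has at most one target, so the machine is deterministic.
TRANSITIONS: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("S0", "non"): ("S1", "NEG"),
    ("S0", "molto"): ("S2", "INT"),
    ("S0", "bello"): ("S3", "ADJ"),
    ("S1", "molto"): ("S2", "INT"),
    ("S1", "bello"): ("S3", "ADJ"),
    ("S2", "bello"): ("S3", "ADJ"),
}
INITIAL_STATE, FINAL_STATE = "S0", "S3"


def transduce(tokens: List[str]) -> Optional[List[str]]:
    """Follow the path from S0; return the output symbols, or None if the path fails."""
    state = INITIAL_STATE
    outputs: List[str] = []
    for token in tokens:
        step = TRANSITIONS.get((state, token))
        if step is None:
            return None  # no transition for this symbol: the sequence is rejected
        state, output = step
        outputs.append(output)
    return outputs if state == FINAL_STATE else None


def accepts(tokens: List[str]) -> bool:
    """Plain recognition: ignore the outputs and only check that a final state is reached."""
    return transduce(tokens) is not None


print(transduce(["non", "molto", "bello"]))
print(accepts(["bello", "molto"]))
```

Used as a recognizer, the table behaves as a deterministic automaton; reading the second element of each transition turns the same table into a transducer that annotates the recognized sequence.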

Chapter 3

SentIta: Lexicon-Grammar based Sentiment Resources

3.1 Lexicon-Grammar of Sentiment Expressions

The LG theoretical framework, with its huge collections and categorization of linguistic data in the form of LG tables, built and refined over the years by its researchers, offers the opportunity to systematically relate a large number of semantic and syntactic properties to the lexicon. The complex intersection between semantics and syntax, the greatest benefit of this approach, corrects the worst deficiency of other semantic strategies proposed in the Sentiment Analysis literature: the tendency to group together, only on the basis of their meaning, words that have nothing in common from the syntactic point of view (Le Pesant and Mathieu-Colas 1998).1

1 On this point, Elia (2013, p. 286) illustrates that “(...) the semantic intuition that drives us to put together certain predicates and their argument is not correlated to the set of syntactic properties of the verbs, nor is it «helped» by it, except in a very superficial way; indeed, we might say that the granular nature of the syntax in the lexicon would be an obstacle for a mind that is organized in an efficient and rigorous logic way.”

“Classes d’objets” is the expression introduced by Gross (1992a) to indicate the linguistic device that allows language to be formalized from both a syntactic and a semantic point of view. Arguments and Predicates are classified into subsets that are semantically consistent and that also share lexical and grammatical properties (De Bueriis and Elia 2008).
According to Gross (2008, p. 1):

“Si un mot ne peut pas être défini en lui-même, c’est-à-dire hors contexte, mais seulement dans un environnement syntaxique donné, alors le lexique ne peut pas être séparé de la syntaxe, c’est-à-dire de la combinatoire des mots. La sémantique n’est pas autonome non plus : elle est le résultat des éléments lexicaux organisés d’une façon déterminée (distribution). Qu’il en soit ainsi est confirmé par les auteurs de dictionnaires eux-mêmes qui, timidement et sans aucune méthode, notent pour un prédicat donné le ou les arguments qui permettent de séparer un emploi d’un autre. Pas de sémantique sans syntaxe donc, c’est-à-dire sans contexte.”2

2 “If a word cannot be defined in itself, namely out of context, but only in a specific syntactic environment, then the lexicon cannot be separated from the syntax, that is to say, from word combinatorics. Semantics is not autonomous either: it is the result of lexical elements organized in a determined way (distribution). This is confirmed by the dictionary authors themselves, who timidly and without any method note, for a specific predicate, the argument(s) that allow one use to be separated from another. No semantics without syntax, that is, without context.” Author’s translation.

SentIta is a sentiment lexical database that directly aims to apply the Lexicon-Grammar theory, starting from its basic hypothesis: the minimum semantic units are the elementary sentences, not the words (Gross 1975).
Therefore, in this work, the lemmas collected in the dictionaries and their Semantic Orientations are systematically recalled and computed in a specific context through the use of local grammars. On the basis of their combinatorial features and co-occurrence contexts, the SentIta lexical items can take the following shapes (Buvet et al. 2005; Elia 2014a):

operators: the predicates, which can be verbs, nouns, adjectives, adverbs, multiword expressions, prepositions and conjunctions;

arguments: the predicate complements, which can be nominal and prepositional groups or entire clauses.

As already specified in Section 2.2, the sentiment words and other oriented Atomic Linguistic Units (ALU) are listed in Nooj electronic dictionaries, while the local grammars used to manipulate their polarities are formalized through Finite-State Automata.
Electronic dictionaries have thus been exploited to list the sentiment lexical resources and to classify them, semantically and syntactically, in a machine-readable format. The computational power of Nooj graphs has instead been used to represent the acceptance or refusal of the semantic and syntactic properties through the use of constraints and restrictions.
Chapter 2 underlined the role played in elementary sentences by the Predicates and the one assumed by their arguments. According to Le Pesant and Mathieu-Colas (1998):

“La structure prédicat/arguments semble plus opératoire que les découpages binaires classiques (sujet/prédicat). [...] C’est le prédicat qui détermine le nombre de positions constitutives de la phrase.”3

Predicates must not be identified only with the category of verbs, but also with predicative nouns and adjectives that enter into support verb structures.
Within the subgroup of sentiment expressions, starting from their syntactic structures, Gross identified two semantic types:

3 “The predicate/argument structure seems to be more operational than the classical binary subdivision (subject/predicate). [...] It is the predicate that determines the number of positions that constitute the sentence.” Author’s translation.


1. sentences with two arguments, where one is the person and the other is the feeling (e.g. Maria è angosciata “Maria is anguished”);

2. sentences with three arguments, which differ from the first type in the presence of a stimulus that triggers the feeling in another person (e.g. Gianni angoscia Maria “Gianni anguishes Maria”).

The notation for their representation is the same as the one used for the Semantic Predicates (Gross 1981). Type 1 can be formalized by the following formula:

1. P(sent, h), e.g. P(angoscia, Maria)

in which the variable h is a function of a sentiment sent through a predicative relationship.
Type 2, instead, can be expressed by the following formula:

2. Caus(s, sent, h), e.g. Caus(Gianni, angoscia, Maria)

in which a sentiment sent is caused by a stimulus s in a person h.
Both of them derive from the following assumption: “un sentiment est toujours attaché à la personne qui l’éprouve”4 (Gross 1995, p. 1). At first glance, the problem connected to sentiment expressions seems easy to define and solve. Upon deeper analysis, however, it presents challenging aspects: the quantity and the variability of sentiment expressions, which can vary both from a transformational and from a morpho-syntactic point of view. As displayed below, a sentiment expressed by a word can take the shape of verbs, nouns, adjectives or adverbs (Gross 1995; D’Agostino 2005; D’Agostino et al. 2007):

V: angosciare “to anguish”, angosciarsi “to become anguished”

N: angoscia “anguish”

A: angosciante “anguishing”, angosciato, angoscioso “anguished”

ADV: angosciatamente, angosciosamente “with anguish”

4 “A sentiment is always attached to the person that feels it.” Author’s translation.


Such groups of words, related to one another by derivational morphology under a common root (e.g. angosc-), represent, from the LG point of view, classes of paraphrastic equivalence, or paraphrastic constellations (D’Agostino 1992). These concepts are crucial when applied to the Sentiment Analysis field, because the connection among the words displayed above rests on a formal principle, which regards the sentence structure, and on a semantic principle, which concerns the distributional properties and the semantic roles of the arguments selected by the predicates (D’Agostino 1992).
Nevertheless, from a syntactic point of view, Gross (1995) noticed that the terms of sentiment can occupy all the different nominal, adjectival, verbal or adverbial positions within the same sentence structure.
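The two semantic types discussed in this section, P(sent, h) and Caus(s, sent, h), and the idea of a paraphrastic constellation can be made concrete with a small Python sketch. The class names, the `experienced` helper and the `CONSTELLATION` mapping are hypothetical names introduced only for illustration; they are not part of the LG resources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class P:
    """Type 1: a sentiment `sent` attached to the person `h` who feels it."""
    sent: str
    h: str


@dataclass(frozen=True)
class Caus:
    """Type 2: a stimulus `s` causes the sentiment `sent` in the person `h`."""
    s: str
    sent: str
    h: str

    def experienced(self) -> P:
        # A sentiment is always attached to the person who feels it,
        # so every Caus(s, sent, h) contains a P(sent, h).
        return P(self.sent, self.h)


# Hypothetical constellation: surface forms sharing the root angosc-
# are mapped to the same sentiment label.
CONSTELLATION = {
    "angosciare": "angoscia",
    "angosciarsi": "angoscia",
    "angoscia": "angoscia",
    "angosciante": "angoscia",
    "angosciato": "angoscia",
    "angoscioso": "angoscia",
    "angosciatamente": "angoscia",
    "angosciosamente": "angoscia",
}

# "Gianni angoscia Maria" and "Maria è angosciata" share the same P(sent, h).
caused = Caus("Gianni", CONSTELLATION["angosciare"], "Maria")
stated = P(CONSTELLATION["angosciato"], "Maria")
```

Under these assumptions, `caused.experienced()` and `stated` are equal, which mirrors the claim that both structures attach the same sentiment to the same person.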

3.2 Literature Survey on Subjectivity Lexicons

The most widely used approaches in the Sentiment Analysis field can be summarized in three main lines of research: lexicon-based methods, learning and statistical methods, and hybrid methods.
The first coincides with the strategy chosen to carry out the present work.
Lexicon-based approaches always start from the following assumption: the sentiment orientation of a text comes from the semantic orientations of the words and phrases contained in it.
The most commonly used SO indicators are adjectives or adjective phrases (Hatzivassiloglou and McKeown 1997; Hu and Liu 2004; Taboada et al. 2006), but the use of adverbs (Benamara et al. 2007), nouns (Vermeij 2005; Riloff et al. 2003) and verbs (Neviarouskaya et al. 2009a) has recently become common as well.
Hand-built lexicons are definitely more accurate than automatically built ones, especially in cross-domain Sentiment Analysis tasks. Nevertheless, manually drawing up a dictionary is considered a painstakingly time-consuming activity (Taboada et al. 2011; Bloom 2011). This is why a large number of studies on the automatic creation and propagation of polarity lexicons can be found in the literature. On this topic, it must be said that automatic dictionaries seem to be more unstable, but usually larger, than the manually built ones (see Tables 3.1 and 3.2). Size, however, does not always mean quality: it is common for these large dictionaries to carry scarcely detailed information. A large number of entries can go with fewer details in the description (e.g. the Maryland dictionary (Mohammad et al. 2009) does not specify the entries’ part of speech) or can instead mean more noise.
Among the state-of-the-art methods used to build and test those dictionaries, we mention Latent Semantic Analysis (LSA) (Landauer and Dumais 1997); bootstrapping algorithms (Riloff et al. 2003); graph propagation algorithms (Velikovich et al. 2010; Kaji and Kitsuregawa 2007); conjunctions (and, or, but) and morphological relations between adjectives (Hatzivassiloglou and McKeown 1997); Context Coherency (Kanayama and Nasukawa 2006a); distributional similarity (Wiebe 2000); etc.5
Word Similarity is a very frequently used method for dictionary propagation in thesaurus-based approaches. Examples are the Maryland dictionary, created thanks to a Roget-like thesaurus and a handful of affixes (Mohammad et al. 2009), and other lexicons based on WordNet, like SentiWordNet, built on the basis of a quantitative analysis of the glosses associated with synsets (Esuli and Sebastiani 2005, 2006a), or other lexicons based on the computation of a distance measure on WordNet (Kamps et al. 2004; Esuli and Sebastiani 2005).
Pointwise Mutual Information (PMI) using Seed Words has been applied to sentiment lexicon propagation by Turney (2002), Turney and Littman (2003), Rao and Ravichandran (2009), Velikovich et al. (2010) and Gamon and Aue (2005). Seed words are words that are strongly associated with a positive/negative meaning, such as eccellente (“excellent”) or orrendo (“horrible”), from which it is possible to build a bigger lexicon by detecting other words that frequently occur alongside them. It has been observed, indeed, that positive words often occur close to positive seed words, whereas negative words are likely to appear around negative seed words (Turney 2002; Turney and Littman 2003).
In the past, PMI could be easily calculated using the Web as a corpus, thanks to AltaVista’s NEAR operator. Today one must be content with the Google AND operator.
Learning and statistical methods for Sentiment Analysis usually make use of Support Vector Machines (Pang et al. 2002; Mullen and Collier 2004; Ye et al. 2009) or Naïve Bayes classifiers (Tan et al. 2009; Kang et al. 2012).
Finally, as regards the hybrid methods, the works of Read and Carroll (2009), Li et al. (2010), Andreevskaia and Bergler (2008), Dang et al. (2010), Dasgupta and Ng (2009), Goldberg and Zhu (2006), Prabowo and Thelwall (2009) and Wan (2009) must be cited.6
A brief summary of the nature and the size of the most popular lexicons for Sentiment Analysis is given in Tables 3.1 and 3.2.
A clarification is needed about the distinction between Polarity lexicons and Affect lexicons. The former are used in subjectivity detection or in polarity and intensity classification systems, while the latter, going beyond the positive/negative dichotomy, refer to a more fine-grained emotion classification.
In the following paragraphs we will first describe some of the most famous Polarity lexicons and then briefly cite some databases for affect classification.
Another distinction will be made between the resources conceived for the English language and the ones created for other languages. The Italian language will be given special consideration.

5 More details about lexicon propagation will be given in Section 3.5.1.

6 These contributions will be discussed in more depth in Section 5.1.1.
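The seed-word propagation based on Pointwise Mutual Information mentioned in this section can be sketched as follows. This is a minimal illustration in the spirit of the semantic orientation measure of Turney (2002), not a reproduction of any cited system: search-engine hit counts are replaced by co-occurrence counts over a tiny hypothetical corpus, and the function name, the corpus and the smoothing constant are assumptions.

```python
import math
from typing import Iterable, List, Set

# Hypothetical toy corpus: each context is the set of words of one text window.
CONTEXTS: List[Set[str]] = [
    {"film", "eccellente", "splendido"},
    {"attore", "eccellente", "splendido"},
    {"film", "orrendo", "pessimo"},
    {"trama", "orrendo", "pessimo"},
]
POS_SEEDS = {"eccellente"}
NEG_SEEDS = {"orrendo"}


def so_pmi(word: str, contexts: List[Set[str]],
           pos_seeds: Iterable[str], neg_seeds: Iterable[str],
           smoothing: float = 0.01) -> float:
    """Semantic orientation as PMI(word, positive seeds) minus PMI(word, negative seeds)."""
    pos, neg = set(pos_seeds), set(neg_seeds)
    # Contexts where the word co-occurs with at least one seed of each polarity.
    near_pos = sum(1 for c in contexts if word in c and c & pos)
    near_neg = sum(1 for c in contexts if word in c and c & neg)
    # Contexts containing at least one seed of each polarity.
    hits_pos = sum(1 for c in contexts if c & pos)
    hits_neg = sum(1 for c in contexts if c & neg)
    # The smoothing constant avoids division by zero and log(0).
    ratio = ((near_pos + smoothing) * (hits_neg + smoothing)) / (
        (near_neg + smoothing) * (hits_pos + smoothing))
    return math.log2(ratio)
```

With this toy corpus, a word that only co-occurs with the positive seed receives a positive score, a word that only co-occurs with the negative seed receives a negative one, and a word that is equally distributed between the two, such as film, receives a score of zero.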


| Dictionary | Author | Year | Entries | Language |
|---|---|---|---|---|
| SO-CAL lexicon | Taboada | 2011 | 5000+ | Eng |
| AFINN-111 | Hansen | 2011 | 2400+ | Eng |
| Sentilex | Silva | 2010 | 7000+ | Port |
| Q-WordNet 3.0 | Agerri et al. | 2010 | 15500+ | Eng |
| DAL-2009 | Whissell | 2009 | 8700+ | Eng |
| MPQA lexicon | Wilson et al. | 2005 | 8000+ | Eng |
| Hu-Liu lexicon | Hu et al. | 2004 | 6800+ | Eng |
| Hatzivassiloglou lexicon | Hatzivassiloglou | 1997 | 15400+ pairs | Eng |
| General Inquirer | Stone | 1961 | 9800+ | Eng |

Table 3.1: Manually built Polarity Lexicons

| Dictionary | Author | Year | Entries | Language |
|---|---|---|---|---|
| Sentix | Basile, Nissim | 2013 | 59700+ | Ita |
| SentiWordNet | Baccianella et al. | 2010 | 38000+ | Eng |
| Web-generated lexicon | Velikovich et al. | 2010 | 178000+ | Eng |
| Maryland dictionary | Mohammad et al. | 2009 | 76000+ | Eng |

Table 3.2: (Semi-)Automatically built Polarity Lexicons

3.2.1 Polarity Lexicons

SO-CAL dictionary. Taboada et al. (2011), due to the low stability of automatically generated lexical databases, manually developed the SO-CAL dictionary. They hand-tagged, on an evaluation scale ranging from +5 to -5, the semantically oriented words they found in a variety of sources:

• the multi-domain collection of 400 reviews belonging to different categories described in Taboada and Grieve (2004) and Taboada et al. (2006);

• 100 movie reviews from the Polarity Dataset of Pang et al. (2002) and Pang and Lee (2004);

• the whole General Inquirer dictionary (Stone et al. 1966).

The result was a dictionary of 2252 adjectives, 1142 nouns, 903 verbs and 745 adverbs. The adverb list was automatically generated by matching adverbs ending in “-ly” to their potentially corresponding adjectives. Moreover, a set of multiword expressions (152 phrasal verbs, e.g. “to fall apart”, and 35 intensifier expressions, e.g. “a little bit”) has been taken into account. In case of overlap between a simple word (e.g. “fun”, +2) and a multiword expression (e.g. “to make fun of”, -1) with different polarity, the latter has the higher priority in the annotation process.
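The priority of multiword expressions over simple words can be sketched as a longest-match-first lookup. This is only an illustration of the principle, not the SO-CAL implementation: the two lexicon entries reuse the “fun” and “to make fun of” example above, while the function name and the tokenized representation are assumptions.

```python
from typing import Dict, List, Tuple

# Toy lexicon keyed by token tuples; scores follow the example in the text.
LEXICON: Dict[Tuple[str, ...], int] = {
    ("fun",): 2,
    ("make", "fun", "of"): -1,
}


def annotate(tokens: List[str]) -> List[Tuple[str, int]]:
    """Tag tokens with lexicon scores, trying the longest expression first."""
    max_len = max(len(key) for key in LEXICON)
    result: List[Tuple[str, int]] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate starting at position i, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in LEXICON:
                result.append((" ".join(key), LEXICON[key]))
                i += n  # skip the tokens covered by the matched expression
                break
        else:
            i += 1  # no entry starts here: move to the next token
    return result
```

In this sketch, once “make fun of” has been matched, the inner word “fun” is never looked up on its own, so its positive score cannot override the negative score of the expression.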

AFINN. It is a list of English words that associates 1446 words with a valence between +5 (positive) and -5 (negative) (Hansen et al. 2011). AFINN-111, its newest version, includes 2477 words and phrases. The compilation of the list started from a set of obscene words and was then gradually extended by examining Twitter posts. This has led to a bias of 65% towards negative words compared to positive words (Nielsen 2011).

Q-WordNet. It is a lexical resource consisting of WordNet senses automatically annotated with positive and negative polarity (Agerri and García-Serrano 2010). It is grounded on the hypothesis that if a positive synset is matched in a gloss, then its synset can also be annotated as positive. Exceptions are the cases under the scope of a negation, which are classified as the opposite of the matched lemma.
What differentiates Q-WordNet from SentiWordNet is the fact that positive/negative labels are attributed to word senses at a certain level by means of a graded classification.

Multi-Perspective Question Answering Lexicon. The MPQA subjectivity lexicon (Wiebe et al. 2004; Wilson et al. 2005) comprises 8000+ subjectivity clues (words and phrases that may be used to express private states), annotated by type (strongsubj or weaksubj) and prior polarity. Subjectivity clues and contextual features have been learned from two different kinds of corpora: a small amount of data manually annotated at the expression level, from the Wall Street Journal and newsgroup data, and a large amount of data with existing document-level annotations, from the Wall Street Journal. The focus is on three types of subjectivity clues:

• hapax legomena: words that appeared just once in the corpus;

• collocations, identified through fixed n-grams;

• adjective and verb features, identified using the results of a method for clustering words according to distributional similarity.

Hu-Liu lexicon. The Hu and Liu (2004) wordlist comprises around 6800 English items (2006 positive and 4783 negative), including slang, misspellings, morphological variants and social media mark-up.

Hatzivassiloglou Lexicon. Hatzivassiloglou and McKeown (1997), starting from a list of 1336 manually labeled adjectives (657 positive and 679 negative terms), demonstrated that conjunctions between adjectives can provide indirect information about their orientation. In detail, the authors extracted the conjunctions through a two-level finite-state grammar, able to cover modification patterns and noun-adjective apposition, and tested their method on the 21-million-word Wall Street Journal corpus. In this way they collected 13426 conjunctions of adjectives, further expanded to a total of 15431 conjoined adjective pairs.

Dictionary of Affect in Language. The Dictionary of Affect in Language (DAL) (Whissell 1989, 2009) lists about 4500 English words. Its more recent version (Whissell 2009) provides 8742 words manually annotated for their Activation, Evaluation and Imagery.

General Inquirer. It is an IBM 7090 program system developed at Harvard in the spring of 1961 for content analysis research problems in the behavioral sciences (Stone et al. 1966; Stone and Hunt 1963). It is based on the idea that, through the analysis of many variables at once, it is often possible to discover trends that can appear “latent” to casual observation.
The description of the “systematic” content analysis processes must be as “objective” as possible, the features of texts must be “manifest” and the results “quantitative”.
The lexicon attaches syntactic, semantic and pragmatic information to part-of-speech tagged words. Among others, the categories related to polarity measures are the ones pertaining to the three semantic dimensions of Osgood (1952):

Positive: 1915 words of positive outlook

Negative: 2291 words of negative outlook

Strong: 1902 words implying strength

Weak: 755 words implying weakness

Active: 2045 words implying an active orientation

Passive: 911 words indicating a passive orientation.7

SentiWordNet. Esuli and Sebastiani (2006a) developed the SentiWordNet 1.0 lexicon, based on WordNet 2.0 (Miller 1995), by automatically associating each WordNet synset with three scores: Obj(s) for objective terms, and Pos(s) and Neg(s) for positive and negative terms.

7 Table 3.1 refers to the General Inquirer size only in terms of sentiment words.


Every score ranges from 0.0 to 1.0, and their sum is 1.0 for each synset. The values are determined on the basis of the proportion, among eight ternary classifiers (with similar accuracy levels but different classification behavior) that quantitatively analyze the glosses associated with every synset, of those that assign it the corresponding label.
What differentiates SentiWordNet from other sentiment lexicons is the fact that the weighting process focuses on synonym sets rather than on simple terms. The researchers observe that different senses of the same term can have different opinion-related properties.
SentiWordNet 3.0 (Baccianella et al. 2010) improves on SentiWordNet 1.0. The main differences between them are the version of WordNet they annotate (3.0 for SentiWordNet 3.0) and the algorithm used to annotate WordNet, which in the 3.0 version includes, along with the semi-supervised learning step, a random-walk step that refines the scores.
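The way three scores summing to 1.0 can arise from a committee of ternary classifiers is shown in the following sketch. It only illustrates the proportion-based scoring described above, under the assumption that each classifier casts one vote per synset; the function name and the label strings are hypothetical and the classifiers themselves are not modeled.

```python
from collections import Counter
from typing import Dict, List

LABELS = ("Pos", "Neg", "Obj")


def synset_scores(votes: List[str]) -> Dict[str, float]:
    """Turn one label per classifier into Pos(s), Neg(s) and Obj(s) scores."""
    unknown = set(votes) - set(LABELS)
    if not votes or unknown:
        raise ValueError("votes must be a non-empty list of Pos, Neg or Obj labels")
    counts = Counter(votes)
    # Each score is the share of classifiers that chose the label,
    # so the three scores always sum to 1.0.
    return {label: counts[label] / len(votes) for label in LABELS}


# Hypothetical committee of eight classifiers voting on one synset.
example = synset_scores(["Pos"] * 6 + ["Obj"] * 2)
```

With six votes for Pos and two for Obj out of eight, the synset would receive Pos(s) = 0.75, Neg(s) = 0.0 and Obj(s) = 0.25.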

Velikovich Web-generated lexicon. This sentiment dictionary, built through graph propagation algorithms and lexical graphs, counts 178104 entries, which include not only simple words but also spelling variations, slang, vulgarity and multiword expressions (Velikovich et al. 2010).

Maryland dictionary. The procedure through which Mohammad et al. (2009) built the Maryland lexicon can be split into two steps: the identification of a seed set of positive and negative words, and the positive and negative annotation of their synonyms in a Roget-like thesaurus.
Eleven antonym-generating affix patterns have been used to generate overtly marked words and their unmarked counterparts. In this way, such patterns generated 2692 pairs of valid English words listed in the Macquarie Thesaurus.
The main innovation introduced by Mohammad et al. (2009) is the independence from the manual annotation of any term in the dictionary, even if manual annotation can be used, when available, to improve both the correctness and the coverage of its entries.
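The affix-based seed generation can be illustrated with a short sketch. The three prefixes below are illustrative examples only and are not claimed to be among the eleven patterns actually used by Mohammad et al. (2009); the function name and the toy vocabulary are likewise assumptions. A pair is kept only when both members are attested in the vocabulary, in analogy with the validation against a thesaurus described above.

```python
from typing import Iterable, List, Tuple

# Illustrative antonym-generating prefixes (assumed, not the original list).
PREFIXES = ("un", "dis", "in")


def affix_pairs(vocabulary: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (unmarked, overtly marked) pairs attested in the vocabulary."""
    vocab = set(vocabulary)
    pairs: List[Tuple[str, str]] = []
    for word in sorted(vocab):
        for prefix in PREFIXES:
            marked = prefix + word
            # Keep the pair only if the marked form is itself a listed word.
            if marked in vocab:
                pairs.append((word, marked))
    return pairs


pairs = affix_pairs({"happy", "unhappy", "honest", "dishonest", "under"})
```

A purely string-based pattern can still produce false pairs when a word merely begins with one of the prefixes, which is one reason why a validation step against a curated word list matters.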

3.2.2 Affective Lexicons

SentiSense. SentiSense (de Albornoz et al. 2012) is a semi-automatically developed affective lexicon, which consists of 5496 words and 2190 synsets.
Similarly to SentiWordNet or WordNet-Affect, it associates emotional meanings or categories with concepts (instead of terms) from the WordNet lexical database, and can be used for polarity and intensity classification as well as for emotion identification.
Synsets are labeled with a set of 14 emotional categories, which are also related by an antonymy relationship (examples are disgust/like, love/hate and sadness/joy): ambiguous, anger, anticipation, calmness, despair, disgust, fear, hate, hope, joy, like, love, sadness, surprise.

SentiFul and the Affect Database. Neviarouskaya et al. (2009b, 2011) developed the SentiFul lexicon by automatically expanding the core of the sentiment lexicon through several different strategies:

• synonymy relation;

• direct antonymy relation;

• hyponymy relations;

• derivation and scoring of morphologically modified words;

• compounding of sentiment-carrying base components.

In summary, 4190 new words connected to the known ones through semantic relations have been automatically extracted from WordNet, and 4029 lemmas have been automatically derived with the morphological method. The starting point of the work was the Affect Database of Neviarouskaya et al. (2007), which contained 364 emoticons, 337 popular acronyms and abbreviations (e.g. bl, “belly laughing”), words standing for communicative functions (e.g. “wow”, “ouch”, etc.), 112 modifiers (e.g. “very”, “slightly”, etc.) and 2438 direct and indirect emotion-related entries (1627 of which were taken from WordNet-Affect).
In the Affect Database, emotion categories (anger, disgust, fear, guilt, interest, joy, sadness, shame and surprise) and intensity (from 0.0 to 1.0) were manually assigned to each entry by three independent annotators.

Appraisal Lexicon. The appraisal lexicon (Argamon et al. 2009) has been built through semi-supervised learning on a set of WordNet glosses containing adjectives and adverbs, of which only a small subset of the training data was manually labelled.
The construction of the lexicon is grounded on Appraisal Theory, a linguistic approach to the analysis of subjective language. In this framework, several sentiment-related features are assigned to relevant lexical items. The feature set, which allows more subtle distinctions than the most widely used binary Positive/Negative classification, includes the labels orientation (Positive or Negative), attitude type (Affect, Appreciation, Judgment) and force (Low, Median, High or Max).

WordNet-Affect. WordNet-Affect is another extension of WordNet for the lexical representation of affective knowledge (Strapparava et al. 2004). Thanks to the synset model, it relates a set of affective concepts to a number of affective words. As a first step in the creation of WordNet-Affect, the affective core was created, starting from almost 2000 terms directly or indirectly referring to mental states. Lexical and affective information was then added to each item by creating specific frames for it. Finally, the senses of the terms were mapped to their respective synsets in WordNet. The synsets have been specialized with a hierarchically organized set of emotional categories (namely emotion, mood, trait, cognitive state, physical state, edonic signal, emotion-eliciting situation, emotional response, behaviour, attitude, sensation). Furthermore, the emotional valences have been distinguished by four additional a-labels: positive, negative, ambiguous and neutral (Strapparava et al. 2006).
WordNet-Affect contains 2874 synsets and 4787 words (Strapparava et al. 2004).

3.2.3 Subjectivity Lexicons for Other Languages

3.2.3.1 Italian Resources

The largest part of the state-of-the-art work on polarity lexicons for Sentiment Analysis focuses on the English language. Thus, Italian lexical databases are mostly created by translating and adapting the English ones, SentiWordNet and WordNet-Affect among others. Basile and Nissim (2013) merged the semantic information belonging to existing lexical resources in order to obtain an annotated lexicon of senses for Italian: Sentix (Sentiment Italian Lexicon). Basically, MultiWordNet (Pianta et al. 2002), the Italian counterpart of WordNet (Miller 1995; Fellbaum 1998), has been used to transfer the polarity information associated with English synsets in SentiWordNet to Italian synsets, thanks to the multilingual ontology BabelNet (Navigli and Ponzetto 2012).
Every Sentix entry is described by information concerning its part of speech, its WordNet synset ID, a positive and a negative score from SentiWordNet, a polarity score (from -1 to 1) and an intensity score (from 0 to 1).
The dictionary contains 59742 entries for 16043 synsets. Details about its composition are reported in Table 3.3.

| PoS | Lemmas | Synsets |
|---|---|---|
| noun | 52257 | 12228 |
| adjective | 3359 | 1805 |
| verb | 2775 | 1260 |
| adverb | 1351 | 750 |
| all | 59742 | 16043 |

Table 3.3: Sentix (Basile and Nissim 2013)

Moreover, Steinberger et al. (2012) verified a triangulation hypothesis for the creation of sentiment dictionaries in many languages, namely English, Spanish, Arabic, Czech, French, German, Russian and also Italian.
The starting point is the semi-automatic generation of two pilot sentiment dictionaries, for English and Spanish, which have been automatically transposed into the other languages by using the overlap of the translations (triangulation), and then manually filtered and expanded. As regards the Italian lexicon, the authors collected 918 correctly triangulated terms and 1500 correctly translated terms from the starting dictionaries.
Hernandez-Farias et al. (2014) achieved good results in the SentiPolC 2014 task by semi-automatically translating different lexicons into Italian, namely SentiWordNet, the Hu-Liu Lexicon, the AFINN Lexicon, the Whissell Dictionary, etc.
Baldoni et al. (2012) proposed an ontology-driven approach to Sentiment Analysis, in which tags and tagged resources are related to emotions. They selected representative Italian emotional words and used them to query MultiWordNet. The synsets connected to these lemmas were then processed with WordNet-Affect, in order to populate the emotion ontology only with the words belonging to synsets that represented affective information. The resulting ontology contains about 450 Italian words referring to the 87 emotional categories of OntoEmotion, an ontology conceived for the categorization of emotion-denoting words. Furthermore, thanks to the SentiWordNet database, every synset has been associated with neutral, positive or negative scores.
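The triangulation idea attributed above to Steinberger et al. (2012), namely keeping the target-language terms on which the translations of the two pilot dictionaries overlap, can be sketched as a set intersection. The translation tables, the function name and the example words are hypothetical; the manual filtering and expansion steps are not modeled.

```python
from typing import Dict, Iterable, Set


def triangulate(en_terms: Iterable[str], es_terms: Iterable[str],
                en_to_it: Dict[str, Set[str]],
                es_to_it: Dict[str, Set[str]]) -> Set[str]:
    """Keep the Italian candidates reached from both pilot dictionaries."""
    from_en: Set[str] = set()
    for term in en_terms:
        from_en |= en_to_it.get(term, set())
    from_es: Set[str] = set()
    for term in es_terms:
        from_es |= es_to_it.get(term, set())
    # The overlap of the two translation sets is the triangulated lexicon.
    return from_en & from_es


# Hypothetical bilingual dictionaries for one positive entry.
EN_TO_IT = {"happy": {"felice", "contento"}}
ES_TO_IT = {"feliz": {"felice", "fortunato"}}
italian_terms = triangulate(["happy"], ["feliz"], EN_TO_IT, ES_TO_IT)
```

In this toy case only felice survives, because it is the only candidate proposed by both the English and the Spanish translations.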

3.2.3.2 Sentiment Lexical Resources for Different Languages

SentiLex-PT (Silva et al. 2010, 2012) is a sentiment lexicon for Portuguese made up of 7014 lemmas, distributed as follows:

• 4779 adjectives

– e.g. castigado.PoS=Adj;TG=HUM:N0;POL:N0=-1;ANOT=JALC

• 1081 nouns

– e.g. aberração.PoS=N;TG=HUM:N0;POL:N0=-1;ANOT=MAN

• 489 verbs

– e.g. enganar.PoS=V;TG=HUM:N0:N1;POL:N0=-1;POL:N1=0;ANOT=MAN

• 666 verb-based idiomatic expressions

– e.g. engolir em seco.PoS=IDIOM;TG=HUM:N0;POL:N0=-1;ANOT=MAN

The sentiment entries are all modifiers of human nouns, compiled from different publicly available corpora and dictionaries.
The tagset used to describe the attributes of the lemmas is explained below:

• part-of-speech (ADJ, N, V and IDIOM);

• polarity (POL), which ranges between 1 (positive) and -1 (negative);

• target of polarity (TG), which corresponds to a human subject (HUM);

• polarity assignment (ANOT), which specifies whether the annotation has been performed manually (MAN) or automatically, by the Judgment Analysis Lexicon Classifier (JALC).
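The attributes listed above can be read from a SentiLex-PT entry with a few lines of Python. The sketch assumes that an entry has the shape lemma.PoS=...;TG=...;POL:N0=...;ANOT=..., with “.”, “;” and “:” as separators; this layout is an assumption made for the example, and the function name and the returned keys are hypothetical.

```python
from typing import Dict, List, Union

Entry = Dict[str, Union[str, List[str], Dict[str, int]]]


def parse_entry(entry: str) -> Entry:
    """Split an entry into lemma, part of speech, target, polarities and annotation."""
    # The lemma (possibly a multiword idiom) precedes the first dot.
    lemma, _, attributes = entry.partition(".")
    fields: Dict[str, str] = {}
    for item in attributes.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    # One polarity per argument position, e.g. POL:N0=-1 and POL:N1=0.
    polarity = {key.split(":")[1]: int(value)
                for key, value in fields.items() if key.startswith("POL:")}
    return {
        "lemma": lemma,
        "pos": fields["PoS"],
        "target": fields["TG"].split(":"),
        "polarity": polarity,
        "annotation": fields["ANOT"],
    }


verb = parse_entry("enganar.PoS=V;TG=HUM:N0:N1;POL:N0=-1;POL:N1=0;ANOT=MAN")
```

For the verb enganar, the parsed record keeps two polarity values, one for each human argument position (N0 and N1), together with the MAN annotation flag.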


Unlike Silva et al. (2010, 2012), who focused on the domain-specific features of the opinion lexicon of social judgment, Souza et al. (2011) proposed a domain-independent lexicon for the Portuguese language, composed of 7077 polar words (adjectives, verbs and nouns) and expressions.
As regards the population of the dictionary, the authors applied three different methods:

• a corpus-based approach, applied to 1317 documents annotated using Pointwise Mutual Information;

• a thesaurus-based approach, on 44077 words and their semantic relations from the TEP thesaurus (Maziero et al. 2008);

• a translation-based approach, starting from Liu’s English Opinion Lexicon (Hu and Liu 2004).

Other contributions that deserve to be cited are Perez-Rosas et al. (2012) for Spanish; Mathieu (1995, 1999a, 2006) for French; Clematide and Klenner (2010), Waltinger (2010) and Remus et al. (2010) for German; Abdul-Mageed et al. (2011) and Abdul-Mageed and Diab (2012) for Arabic; and Chetviorkin and Loukachevitch (2012) for Russian.
As regards Multilingual Sentiment Analysis, we can mention Balahur and Turchi (2012), Steinberger et al. (2012), Pak and Paroubek (2011) and Denecke (2008), among others.

3.3 SentIta and its Manually-built Resources

In this Section we describe in depth SentIta, the LG-based sentiment lexicon for the Italian language, which has been semi-automatically built on the basis of the richness of both the Italian module of Nooj and the Italian Lexicon-Grammar resources.
The tagset used for the Prior Polarity (Osgood 1952) annotation of the resources is composed of four tags:


POS: positive

NEG: negative

FORTE: intense

DEB: weak

| Lexical Item | Translation | Tag | Description | Score |
|---|---|---|---|---|
| meraviglioso | wonderful | +POS+FORTE | Intensely Positive | +3 |
| colto | cultured | +POS | Positive | +2 |
| rasserenante | calming | +POS+DEB | Weakly Positive | +1 |
| nuovo | new | - | Neutral | 0 |
| insapore | flavourless | +NEG+DEB | Weakly Negative | -1 |
| cafone | boorish | +NEG | Negative | -2 |
| disastroso | disastrous | +NEG+FORTE | Intensely Negative | -3 |

Table 3.4: SentIta Evaluation tagset

| Lexical Item | Translation | Tag | Description | Score |
|---|---|---|---|---|
| grande | big | +FORTE | Intense | + |
| velato | veiled | +DEB | Weak | - |

Table 3.5: SentIta Strength tagset

Such labels, if combined together, can generate an evaluation scale that goes from -3 to +3 (see Table 3.4) and a strength scale that ranges from -1 to +1 (see Table 3.5).
Neutral words (e.g. nuovo “new”, with score 0 in the evaluation scale) have been excluded from the lexicon.
The main difference between the words listed in the two scales is the possibility of using them as indicators for the subjectivity detection task. Basically, the words belonging to the evaluation scale are “anchors” that trigger the identification of polarized phrases or sentences, while the ones belonging to the strength scale are used only as intensity modifiers (see Section 4.2).
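The way the four tags combine into the two scales can be summarized in a short Python sketch. It simply encodes the correspondences of the evaluation and strength tagsets (+POS+FORTE = +3, +POS = +2, +POS+DEB = +1, and symmetrically for +NEG; +FORTE = +1 and +DEB = -1 when used alone); the function names are hypothetical and the sketch is not the Nooj implementation.

```python
from typing import Optional, Set


def _labels(tags: str) -> Set[str]:
    """Split a tag string such as '+POS+FORTE' into its labels."""
    return {label for label in tags.split("+") if label}


def evaluation_score(tags: str) -> Optional[int]:
    """Score on the evaluation scale (-3 to +3), or None without POS or NEG."""
    labels = _labels(tags)
    if "POS" in labels:
        sign = 1
    elif "NEG" in labels:
        sign = -1
    else:
        return None  # strength-only words are not polarity anchors
    # FORTE strengthens the base value, DEB weakens it.
    magnitude = 3 if "FORTE" in labels else 1 if "DEB" in labels else 2
    return sign * magnitude


def strength_score(tags: str) -> Optional[int]:
    """Score on the strength scale (+1 intense, -1 weak) for modifier-only words."""
    labels = _labels(tags)
    if labels == {"FORTE"}:
        return 1
    if labels == {"DEB"}:
        return -1
    return None
```

Keeping the two functions separate mirrors the distinction drawn above: only words with an evaluation score act as anchors, while strength-only words modify the intensity of a nearby anchor.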

Details about the lexical assets available for the Italian language are summarized in Table 3.6. Here we report all the items contained in SentIta, including the ones automatically derived (namely adverbs and nouns), which will be described in detail in Section 3.5.

Category       Entries   Example             Translation
Adjectives     5381      allegro             cheerful
Adverbs        3693      tristemente         sadly
Compound Adv   774       a gonfie vele       full steam ahead
Idioms         577       essere in difetto   to be in fault
Nouns          3215      eccellenza          excellence
Psych Verbs    604       amare               to love
LG Verbs       651       prendersela         to feel offended
Bad words      182       leccaculo           arse licker
Tot            15077     -                   -

Table 3.6: Composition of SentIta

Because hand-built lexicons are more accurate than the automatically built ones (Taboada et al. 2011; Bloom 2011), we started the creation of the SentIta database with the manual tagging of part of the lemmas contained in the Nooj Italian dictionaries. In detail, adjectives and bad words have been manually extracted and evaluated starting from the Nooj Italian electronic dictionary of simple words, preserving their inflectional (FLX) and derivational (DRV) properties. Moreover, compound adverbs (Elia 1990), idioms (Vietri 1990, 2011), psych verbs and other LG verbs (Elia et al. 1981; Elia 1984) have been weighted starting from the Italian LG tables, in order to maintain the syntactic, semantic and transformational properties connected to each one of them.

3.3.1 Semantic Orientations and Prior Polarities

In the Sentiment Analysis field, the Semantic Orientation (SO) is a subjectivity and opinion measure that weights the polarity (positive/negative) and the strength (intense/weak) of an opinion. Other concepts used to refer to SO are sentiment orientation, polarity of opinion or opinion orientation (Taboada et al. 2011; Liu 2010).
The Prior Polarity, instead, refers to the individual words' Semantic Orientation and differs from the SO because it is always independent from the context (Osgood 1952). The second concept, generally used in sentiment dictionary population, has also been exploited to manually evaluate the words contained in the simple word dictionary of the Italian module of Nooj (Sdic_it.dic).
Before the description of the annotation process begins, it must be clarified that, at this stage of the work, the multiple meanings that a single word can possess have not been taken into account. In this sense, SentIta is different from the resources in which Prior Polarities are assigned to "concepts" rather than to mere words (e.g. SentiWordNet). Such a procedure gives rise to the need to reconstruct the Prior Polarities starting from the various word meanings, actually transforming them into Posterior Polarities (Gatti and Guerini 2012). An example is the word "cold", which possesses different meanings and, consequently, different polarities if associated with the words "beer" (with a low temperature, neutral) or "person" (emotionless, weakly negative) (Gatti and Guerini 2012).
From our point of view, this distinction is redundant in a work that includes a syntactic module. We strongly support the idea that only the Prior Polarities must be contained in the lexicon and that the context should be used to compute them in a parallel module, that is, the syntactic one.
Indeed, our purpose is to reduce, at the same time, the presence of ambiguity, the computational time and the size of the electronic dictionaries. Therefore, the annotators of SentIta simply labeled the words of Sdic_it according to the meaning they thought dominant for each word. If, out of any context, a word such as "cold" cannot be unequivocally associated with a positive or negative orientation, we preferred to attribute it a neutral polarity.


3.3.2 Adjectives of Sentiment

The annotation of Sdic_it started with 34,000+ adjectives, by considering each word apart from any context. The first decision is binary and regards the choice between the positive or negative polarity, so the first tags, +POS or +NEG (which respectively correspond to the scores +2 and -2), are attached to the adjective under exam, as shown in the following examples:

colto,A+FLX=N88+DRV=ISSIMO:N88+POS

cafone,A+FLX=N88+DRV=ISSIMO:N88+NEG

A fine-grained classification approach must include, during the Prior Polarity attribution, also the determination of the intensity of the oriented words. The strength of the SO is therefore defined with the combined use of the POS/NEG tags and the FORTE/DEB labels. The last two respectively increase or decrease by one point the intensity of the sentiment items. This way, as shown in Table 3.4, we could take into account three different degrees of positivity and as many levels of negativity.
As we will explain better in Section 4.2, we selected among the adjectives of Sdic_it a list of about 650 words that did not receive any polarity tag, but have just been labeled with information about their intensity. This is the case of the items reported below, which are just examples from the SentIta Strength list:

grande,A+FLX=N79+DRV=GRANDE:N88+DRV=GRANDE-C:N79+FORTE

velato,A+FLX=N88+DRV=ISSIMO:N88+DEB
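To make the structure of these dictionary lines explicit, the following sketch splits an entry into lemma, category, bare tags and key=value properties. It is a minimal illustration and not part of NooJ: the function name `parse_entry` and the returned field names are hypothetical, and we assume that the lemma is separated from its information by a comma and that the features are joined by "+", as in NooJ-style dictionaries.

```python
def parse_entry(line):
    """Split 'lemma,CAT+tag+key=value...' into its components."""
    lemma, _, info = line.strip().partition(",")
    category, *features = info.split("+")
    flags, props = [], {}
    for feature in features:
        key, sep, value = feature.partition("=")
        if sep:
            # A key may occur more than once (e.g. several DRV paradigms),
            # so the values are collected in a list.
            props.setdefault(key, []).append(value)
        else:
            flags.append(feature)
    return {"lemma": lemma, "category": category,
            "flags": flags, "props": props}


entry = parse_entry("grande,A+FLX=N79+DRV=GRANDE:N88+DRV=GRANDE-C:N79+FORTE")
```

In this example `entry["flags"]` contains only the sentiment tag FORTE, while the inflectional and derivational paradigms are kept apart under `entry["props"]`.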

Such words are not used to locate subjectivity in texts, but they become really advantageous when the task is to compute the actual orientation of a phrase. The words endowed with the tag +FORTE are called intensifiers and the ones marked with +DEB downtoners (Taboada et al. 2011). Their function is, respectively, to increase or decrease the score of the polarized words.
Details about the distribution of sentiment tags in the adjective dictionary are reported in Table 3.7.
Table 3.8 presents a summary, in terms of percentage values, of the composition of the adjective dictionary in SentIta. In the upper part of the Table, percentages are computed on the basis of SentIta, which accounts for 5381 adjectives (indicated in the Table by tot oriented adj). The differences between the positive and negative adjectives and the intensity modifiers are shown here.
The coverage of the SentIta adjectives with respect to the whole number of adjectives contained in Sdic_it seems, at first glance, to be very low, but it becomes more significant when compared with that of the other parts of speech.
For a LG treatment of adjectives see Section 3.3.5.

Score   Adjectives    Entries
+3      +POS+FORTE    111
+2      +POS          1137
+1      +POS+DEB      110
-1      +NEG+DEB      770
-2      +NEG          2375
-3      +NEG+FORTE    240
+       +FORTE        512
-       +DEB          126

Table 3.7: SentIta tag distribution in the adjective list

Adjectives         Entries   Percentage (%)
tot pos            1358      25 (SentIta)
tot neg            3385      63 (SentIta)
tot int            638       12 (SentIta)
tot oriented adj   5381      16 (Sdic_it)
tot neutral adj    28664     84 (Sdic_it)
tot adj Sdic_it    34045     100

Table 3.8: Adjectives percentage values in SentIta
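The percentage values above follow directly from the tag distribution of the adjective list. The sketch below reproduces the arithmetic; the variable and function names are our own (hypothetical), while the counts are the ones reported in the two tables.

```python
# Entries per tag in the SentIta adjective list.
counts = {
    "+POS+FORTE": 111, "+POS": 1137, "+POS+DEB": 110,
    "+NEG+DEB": 770, "+NEG": 2375, "+NEG+FORTE": 240,
    "+FORTE": 512, "+DEB": 126,
}
SDIC_IT_ADJECTIVES = 34045  # total number of adjectives in Sdic_it


def group(tag):
    """Map a tag combination to positive, negative or intensity-only."""
    if "POS" in tag:
        return "pos"
    if "NEG" in tag:
        return "neg"
    return "int"


totals = {"pos": 0, "neg": 0, "int": 0}
for tag, entries in counts.items():
    totals[group(tag)] += entries

oriented = sum(totals.values())
# Shares are computed on the SentIta adjectives, coverage on Sdic_it.
shares = {name: round(100 * n / oriented) for name, n in totals.items()}
coverage = round(100 * oriented / SDIC_IT_ADJECTIVES)
```

The upper rows of the table are thus shares of the 5381 tagged adjectives, whereas the coverage is computed on the whole adjective dictionary.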


3.3.2.1 A Special Remark on Negative Words in Dictionaries and Positive Biases in Corpora

The most important aspect to notice in Table 3.8 is the fact that the positive items cover just 25% of the total amount of oriented adjectives while, with a percentage of 63%, negative adjectives predominate in the dictionary.
Just as in SentIta, other sentiment lexicons also present an excess of negative items, such as the SO-CAL lexicon (Taboada et al. 2011), AFINN-111 (Nielsen 2011) and the Maryland dictionary (Mohammad et al. 2009).
In texts, instead, exactly the opposite trend can be observed. The Pollyanna hypothesis asserts the universal human tendency to use positive words more frequently than negative ones, through the popular claim "people tend to look on (and talk about) the bright side of life" (Boucher and Osgood 1969, p. 1).
The experimental results of Montefinese et al. (2014) and Warriner et al. (2013) confirmed the so-called linguistic positivity bias, the prevalence of positive words in different languages.
Garcia et al. (2012) also measured this bias by quantifying, in the context of online written communication, both the emotional content of words in terms of valence and the frequency of word usage in the whole indexable web. They provided strong evidence that words with a positive emotional connotation are used more often.
As regards the experiments in the Sentiment Analysis field that tried to solve the positive bias, we mention Taboada et al. (2011) who, upon noticing the presence of almost twice as many positive as negative words in their corpus, addressed the problem by increasing the SO of any negative expression by a fixed amount (of 50%). Voll and Taboada (2007) overcame the same problem by shifting the numerical cut-off point between positive and negative reviews.
In this research we chose to avoid the introduction of unpredictable errors due to the difficulty of accurately evaluating to what extent positive and negative items are differently distributed in written texts.

3.3.3 Psychological and other Opinion Bearing Verbs

In the present Section we will go through the more complex annotation of verbs of sentiment from the available Lexicon-Grammar matrices. We will return to the adjectives of sentiment in Section 3.3.5, where the issue of the adjectivalization of the psychological predicates will be addressed.

3.3.3.1 A Brief Literature Survey on the Classification of Psych Verbs

The difference between the direct (Experiencer subject) and reverse (Experiencer object) representations of thematic roles is underlined in the extensive body of scientific literature on psych verbs, e.g. Chomsky (1965), Belletti and Rizzi (1988), Kim and Larson (1989), Jackendoff (1990), Whitley (1995), Pesetsky (1996), Filip (1996), Baker (1997), Martín (1998), Arad (1998), Klein and Kutscher (2002), Bennis (2004), Landau (2010).
The thematic roles involved are the following:

Experiencer: the entity or the person that experiences the reaction;

Cause: "the thing, person, state of affairs that causes the reaction" (Whitley 1995) "or cognitive judgment in the experiencer" (Klein and Kutscher 2002).

These two main classes⁸ correspond to three classes proposed by Belletti and Rizzi (1988), in which the reverse representation is split according to the presence of a direct or indirect object mapped with the role of experiencer:

Class I: verbs which have the experiencer as subject and the cause as direct object (e.g. temere "to be afraid of");

Class II: verbs which have the cause as subject and the experiencer as direct object (e.g. preoccupare "to worry");

Class III: verbs which have the cause as subject and the experiencer as indirect object in dative case (e.g. piacere "to like"⁹).

⁸ These classes of psych verbs are also known as the Fear class (direct) and the Frighten class (reverse) (Jackendoff 1990; Pesetsky 1996; Baker 1997), on the basis of the syntactic positions occupied by the thematic roles of the experiencer and the cause (theme).

Transitivity is also taken into account by Whitley (1995), who instead identified four classes of psych verbs for Spanish:

Class I: transitive-direct (e.g. amar "to love");

Class II: intransitive-direct (e.g. gozar de "to enjoy");

Class III: intransitive-reverse (e.g. gustar "to like");

Class IV: transitive-reverse (e.g. tranquilizar "to appease").

3.3.3.2 Semantic Predicates from the LG Framework

Shifting the focus to the works realized within the Lexicon-Grammar framework, we find again the distinction between two types of possible constructions, on the basis of the syntactic position (subject or complement) of the person who feels the sentiment (Mathieu 1999a; Ruwet 1994) and, of course, the distinction between transitive and intransitive verbs.
The (reverse) French verbs of sentiment that enter into the transitive structure N0 V N1 have been examined by Mathieu (1995)¹⁰. In this structure N0, the stimulus of the sentiment, can be a human, concrete or abstract noun, or a completive or infinitive clause, and N1, the experiencer of the sentiment, is a human noun or another animate noun, such as animals or gods (Ruwet 1994; see example 3).

(3) (Luc + Ce tableau + Le mensonge + Que Luc soit venu + Trembler) irrite Marie (Mathieu 1995)
"(Luc + This table + The lie + That Luc has come + To tremble) irritates Marie"

⁹ Note that, while the Italian verb piacere is classified in this class of Belletti and Rizzi (1988), its English translation "to like" belongs to the first one.

¹⁰ The analysis of Mathieu (1995) is based on the syntactic model of the LG French Table 4.

Nevertheless, Ruwet (1994) argued that a number of predicates from this class¹¹ select a Num in position N1 that does not play the role of Experiencer, as happens in example (5), differently from what happens in (4). According to Ruwet (1994), le ministre of (5) does not feel any sentiment; he is instead the object of a negative (public) judgment caused by the stimulus expressed by N0.

(4) La lecture d'Homère passionne Maxime (Ruwet 1994)
"Reading Homer moves Maxime"

(5) Ses déclarations intempestives discréditent le ministre (Ruwet 1994)
"His untimely declarations discredit the Minister"

One of the main purposes of this thesis is to provide a Lexicon-Grammar based database for Sentiment Analysis tools. We decided to start from the LG works on psychological predicates because of their richness, but we do not look at the Ruwet (1994) objection as a limit for our data collection. It only offers a wider range of information, which we associate to our data. Sentiment Analysis does not focus only on mental states, but searches for every kind of positive or negative statement expressed in raw texts; therefore, a positive or negative (public) judgment also represents a valuable resource in this field.
The hypothesis is that the Predicates under examination could be distinguished into three subsets: the ones indicating a mental state, which select the roles experiencer and causer (e.g. amare "to love", odiare "to hate"); the ones connected to the manifestation of judgments, which (following the Sentiment Analysis trend in terminology) involve the roles of opinion holder and target (e.g. difendere "to defend", condannare "to condemn"); and the ones related to physical acts, which select the roles patient and agent (e.g. baciare "to kiss", schiaffeggiare "to slap")¹². The mapping between the syntactic representations of actants and the semantic representations of arguments is feasible and valuable but, from a LG point of view, it becomes a little more complex if compared with the generalizations described in the previous Paragraph. The big problem raised by the LG method is that the Syntax-Semantics mapping cannot ignore the idiosyncrasies of the lexicon.
In the following paragraphs we will show the distribution of Psychological Predicates among the classes 41, 42, 43 and 43B (3.3.3.3), and we will demonstrate that a large number of predicates evoking judgment and positive/negative physical acts, but also psychological states, are distributed among 28 other LG classes, every one of which possesses its own definitional syntactic structure. The large number of structures in which these verbs can enter underlines the importance of a sentiment database that includes both lexicon-based semantic and syntactic classification parameters.

¹¹ Among the verbs pointed out by Ruwet (1994) there are abuser, aigrir, anoblir, aplatir, avilir, berner, civiliser, compromettre, consacrer, corrompre, couler, dédouaner, démasquer, dénaturer, etc.

3.3.3.3 Psychological Predicates

Among the verbs chosen for our sentiment lexicon, a prominent position is occupied by the Predicates belonging to the Italian LG classes 41, 42, 43 and 43B, which constitute 47% of the total amount of verbs in SentIta (Table 3.11). From the lexical items contained in such LG classes, a list of 602 entries has been evaluated and hand-tagged with the same labels used to evaluate the adjectives.
In the following Paragraphs we will describe some of the peculiarities of the psych verbs from the classes 41, 42, 43 and 43B, questioning, in many cases, their psychological nature. We will then briefly examine the presence of subjective verbs in 28 other LG classes.

¹² An experiment realized on these frames is reported in Section 5.3.

Psych Verb   Translation    LG Class   Score
angosciare   "to anguish"   41         -3
piacere      "to like"      42         +2
amare        "to love"      43         +3
biasimare    "to blame"     43B        -2

Table 3.9: Examples from the Psych Predicates dictionary

LG Class   Tot Class   Psych Verbs   Positive items   Negative items   Intensifiers   Downtoners   Percentage (%)
41         599         525           136              325              45             19           88
42         114         8             4                4                0              0            7
43         298         39            19               20               0              0            13
43B        142         32            11               21               0              0            23
Tot        1153        604           170              368              45             19           52

Table 3.10: Psych Predicates chosen for SentIta

LG Class 41 Ch F V N1

The verbs from this LG class have N0 V N1 as definitional structure, in which:

• N0 is generally an infinitive or completive clause (direct or introduced by the phrase il fatto (Ch F + di) "the fact (that S + of)") but, with few exceptions, it can also be a human, a concrete or an abstract noun;

• N1 is a human noun (Num).

SentIta Verbs   Tot Items   Positive items   Negative items   Intensifiers   Downtoners
Psych Verbs     604         170              368              45             19
Other Verbs     651         189              350              79             33
Tot             1255        359              718              124            52

Table 3.11: All the Semantic Predicates from SentIta

A great part of these verbs (436 entries), with homogeneous syntactic behavior, can be grouped together also because of a semantic homogeneity connected to their psychological nature (Elia 1984). Examples of this subset are reported below in their dictionary form:

addolorare,V+NEG+Sent+FLX=V3+DRV=RI+41

ingelosire,V+NEG+DEB+Sent+FLX=V201+41

perturbare,V+NEG+Sent+FLX=V3+41

rallegrare,V+POS+Sent+FLX=V3+DRV=RI+41

rincuorare,V+POS+DEB+Sent+FLX=V3+41

tormentare,V+NEG+FORTE+Sent+FLX=V3+DRV=RI+41 ¹³

This Psychological subgroup, identified in the electronic dictionary through the tag +Sent, exactly like the French verbs from the LG class 4, enters into a reverse transitive psych-verb structure by selecting an Experiencer as object N1 and a Stimulus as N0, as exemplified in (6).

¹³ "To grieve, to make jealous, to upset, to cheer up, to reassure, to torment"

(6) [Sentiment [Stimulus Le sue parole] (addolorano[-2] + ingelosiscono[-1] + perturbano[-2] + rallegrano[+2] + rincuorano[+1] + tormentano[-3]) [Experiencer Maria]]
"His words (grieve + make jealous + upset + cheer up + reassure + torment) Maria"

The objection of Ruwet (1994) to the psychological semantic homogeneity of this class can also be applied to the Italian LG class 41. The extract of the dictionary reported below explains it better than words:

danneggiare,V+NEG+Op+FLX=V4+41

esautorare,V+NEG+Op+FLX=V3+41

glorificare,V+POS+Op+FORTE+FLX=V8+41

ledere,V+NEG+Op+FLX=V34+41

miticizzare,V+POS+Op+FLX=V3+41

smascherare,V+NEG+Op+DEB+FLX=V3+41 ¹⁴

122 items from the class 41 have been selected and marked with the tag +Op to indicate, in accordance with Ruwet (1994), that they select an N1 that does not play the role of the Experiencer, but represents just the Target of a positive or negative judgment. Differently from other verbs also tagged with the +Op label (e.g. condannare "to condemn"), here N0 is not the Opinion Holder, but just the Cause that justifies the opinion conveyed by the verb (see example 7). The Opinion Holder is never expressed by the arguments of the predicates from the class 41, because it can be played in turn by the public opinion or the people (Ruwet 1994).

¹⁴ "To damage, to reduce or deprive of the authority, to glorify, to wrong, to make into heroes, to unmask"

(7) [Opinion [Cause Le sue parole] (danneggiano[-2] + esautorano[-2] + glorificano[+3] + ledono[-2] + miticizzano[+2] + smascherano[-2]) [Target Maria]]
"His words (damage + reduce or deprive of the authority + glorify + wrong + make into heroes + unmask) Maria"

Another subgroup of the verbs from this class, marked in the LG table as accepting the property V = uso concreto ("V = concrete use"), also possesses a concrete meaning. This is the case of abbattere "to demolish" (fig. "to discourage"), accendere "to light" (fig. "to fire up"), colpire "to hit" (fig. "to move"), etc.
Of course, the semantic properties of the figurative verbs are not shared with their concrete homographs.
The automatic disambiguation of these metaphors is sometimes very simple, because it can be based on the divergent properties of the different LG classes to which the concrete and figurative words belong (8, 9).

Concrete usage (Class 20NR):

(8) Carla accende il fiammifero (Elia 1984)
"Carla lights the match"

Figurative usage (Class 41):

(9) Parlare di rivoluzione accende Arturo (Elia 1984)
"Talking about revolution fires up Arturo"

As exemplified, the verb accendere also belongs to the class 20NR, in which it accepts neither a completive as N0 (Parlare di rivoluzione accende il fiammifero "Talking about revolution lights the match") nor a human noun as N1 (Carla accende il fratello "Carla lights her brother") or a body part noun (Carla accende la gamba del fratello "Carla lights her brother's leg").
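The class-based disambiguation just described can be summarized as a small rule. The sketch below is only illustrative: the argument type labels ("human", "clause", "body_part", "concrete") and the function name are hypothetical, and we assume that a previous analysis step has already assigned a type to N0 and N1.

```python
def accendere_reading(n0_type, n1_type):
    """Choose between the concrete and the figurative use of accendere.

    Class 41 requires a human N1; class 20NR refuses a completive or
    infinitive clause as N0 and a human or body part noun as N1.
    """
    if n1_type == "human":
        return "figurative (class 41)"
    if n0_type == "clause" or n1_type == "body_part":
        # Refused by class 20NR and not licensed by class 41.
        return None
    return "concrete (class 20NR)"


match = accendere_reading("human", "concrete")   # Carla accende il fiammifero
arturo = accendere_reading("clause", "human")    # Parlare di rivoluzione accende Arturo
```

Combinations refused by both classes, such as a clause as N0 with a concrete N1, return `None` instead of being forced into one of the two readings.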

In other cases it proves more difficult, but it can be solved as well with the help of semantically annotated dictionaries. This happens in the examples below, in which the fact that this verb is marked in the class 20NR as accepting the property Prep N2strum makes it possible to automatically distinguish its concrete use (10) from the figurative one (11).

Concrete usage:

(10) Carla accende il fuoco della (stufa + griglia + cucina)
"Carla lights the fire of the (stove + grill + cooker)"

Figurative usage:

(11) Carla accende il fuoco del(la) (rivoluzione + cambiamento + passione)
"Carla ignites the fire of (revolution + change + passion)"

LG Class 42 Ch F V Prep N1

The verbs from the class 42 enter into an intransitive reverse psych-verb structure and are defined by the LG formula N0 V Prep N1, where:

• N0 is generally an infinitive or completive clause (direct or introduced by the phrase il fatto (Ch F + di) "the fact (that S + of)"), but in few cases it can also be a human (Num) or a non-restricted noun (Nnr);

• N1 is a human noun (Num), but it can also be a body part or an abstract noun of the kind discorso "discourse", mente "mind", memoria "memory" (Elia 1984).

Here the selection of Psychological Predicates is so small that it can be reported in full:


aggradare,V+POS+Sent+42+FLX=V403+Prep=a

arridere,V+POS+Sent+42+FLX=V34+Prep=a

dispiacere,V+NEG+Sent+42+FLX=V37+Prep=a

spiacere,V+NEG+Sent+42+FLX=V37+Prep=a

piacere,V+POS+Sent+42+FLX=V37+DRV=RI+Prep=a ¹⁵

The importance of this LG class in SentIta is not given by the (very small) number of entries it accounts for, but depends on the extremely high frequency of the verb piacere "to like".
Here the prepositions (Prep) have been associated directly in the dictionary, in order to avoid ambiguity with other verb usages.
The semantic annotation does not raise any problem, as can be deduced from example (12).

(12) [Sentiment [Stimulus (Il fatto che si fumi + Che si fumi + Fumare + Il fumare + Il fumo)] (aggrada[+2] + dispiace[-2] + piace[+2]) a [Experiencer Maria]]
"(The fact that people smoke + That people smoke + Smoking + The smoke) (pleases + regrets + is appreciated by) Maria"

The main difference from the class 41, in addition to its intransitivity, is the fact that in Italian the more easily accepted form is the permuted one (Elia et al. 1981; see example 13).

(13) [Sentiment A [Experiencer Maria] piace[+2] [Stimulus fumare]]
"Maria likes smoking"

Again, a subset of verbs from this class can be marked as opinion indicators.

Both N0 and N1 can be human nouns:

incombere,V+NEG+Op+42+FLX=V21+Prep=loc+N0um+N1um

primeggiare,V+POS+Op+FLX=V4+Prep=su+N0um+N1um

contrastare,V+NEG+Op+FLX=V3+Prep=con+N0um+N1um

contare,V+POS+Op+FLX=V3+Prep=per+N0um+N1um

Just N1 can be a human noun:

addirsi,V+42+POS+Op+FLX=V263+PRX+Prep=a+N1um

sconvenire,V+NEG+Op+FLX=V236+Prep=a+N1um

convenire,V+POS+Op+FLX=V236+Prep=a+N1um

costare,V+NEG+Op+FLX=V3+Prep=a+N1um

Neither N0 nor N1 can be human nouns:

degenerare,V+NEG+FORTE+Op+42+FLX=V3+DRV=BILE:N79+Prep=in

stridere,V+NEG+Op+FLX=V2+Prep=con ¹⁶

¹⁵ "to please, to favor, to regret, to like"

Not every verb of this subset accepts a human noun as N1 (see examples 14 and 15).

(14) [Opinion [Target Queste idee] stridono[-2] con (i miei obiettivi + la tua immagine + *Maria)]
"These ideas are at odds with (my purposes + your image + Maria)"

(15) [Opinion [Target La situazione] degenera[-3] in un (dramma + conflitto + *Maria)]
"The situation degenerates into a (drama + conflict + Maria)"

In such cases the Opinion Holder remains implicit, but the judgment negatively affects the subject of the sentence instead of the N1.
Sentences like (16) and (17) raise the doubt whether N1 can be the Holder of the opinion expressed by the sentence or not.

¹⁶ "to loom over, to excel, to hinder, to be worth, to suit, to be inconvenient, to be convenient, to cost, to degenerate, to clash"

(16) [Opinion [Target (Maria + Il fidanzamento + Che si sposti la riunione)] conta[+2] per [Opinion Holder Arturo]]
"(Maria + The engagement + That the meeting is moved) matters to Arturo"

(17) [Opinion [Target (*Maria + Il fidanzamento + Che si sposti la riunione)] (conviene[+2] + costa[-2]) [Opinion Holder ad Arturo]]
"(Maria + The engagement + That the meeting is moved) (is convenient + costs) for Arturo"

Actually, the verb does not provide any information regarding the active or passive role of N1 in the expression of the opinion. The writer can be convinced about what "matters, is convenient or costs" for Arturo, but it remains a belief that is not confirmed in the text (16, 17). Therefore, we chose to assign the Opinion Holder role to the N1 of the class 42 only in the case in which the object corresponds to the personal pronouns "me" or "us" (18, 19).

(18) [Opinion [Target (Maria + Il fidanzamento + Che si sposti la riunione)] conta[+2] per [Opinion Holder me]]
"(Maria + The engagement + That the meeting is moved) matters to me"

(19) [Opinion [Target (*Maria + Il fidanzamento + Che si sposti la riunione)] [Opinion Holder ci] (conviene[+2] + costa[-2])]
"(Maria + The engagement + That the meeting is moved) (is convenient + costs) for us"


LG Classes 43 and 43B N0 V Ch F and N0 V il fatto Ch F

The sentence structure that defines these classes is N0 V N1:

• N0, in both the classes 43 and 43B, is generally a human noun (Num);

• N1 is an infinitive clause or a completive one, which is direct for the 43 and introduced by il fatto (Ch F + di) "the fact (that S + of)" for the 43B.

The psychological (Sent) and also the opinion (Op) verbs from the class 43 share, in many cases, the acceptance or the refusal of a set of LG properties:

deprecare,V+Op+NEG+Class=43+FLX=V8

bestemmiare,V+Op+NEG+FORTE+Class=43+FLX=V11

criticare,V+Op+NEG+Class=43+FLX=V8

maledire,V+Op+NEG+FORTE+Class=43+FLX=V216

tollerare,V+Op+NEG+DEB+Class=43+FLX=V3

adorare,V+Sent+POS+FORTE+Class=43+FLX=V3

amare,V+Sent+POS+FORTE+Class=43+FLX=V3

ammirare,V+Sent+POS+Class=43+FLX=V3

odiare,V+Sent+NEG+FORTE+Class=43+FLX=V12

rimpiangere,V+Sent+NEG+DEB+Class=43+FLX=V31 ¹⁷

As regards the distribution of N0, it must be noticed that all these verbs accept a human noun, but only two of them (condannare "to condemn" and meritare "to deserve") also accept a non-restricted noun (Nnr, not active) or a subjective clause introduced by il fatto (Ch F + di). This fact is perfectly consistent with the animate nature of the Experiencer with respect to the Psychological verbs (20). However, it underlines that this class of verbs also selects an Opinion Holder as N0 (except for the verb meritare), as in (21).

¹⁷ "to deprecate, to blaspheme, to criticize, to curse, to tolerate, to adore, to love, to admire, to hate"

(20) [Sentiment [Experiencer Maria] (adora[+3] + odia[-3]) [Stimulus (Arturo + che Arturo fumi + il fatto che Arturo fumi + il fumo)]]
"Maria (adores + hates) (Arturo + that Arturo smokes + the fact that Arturo smokes + the smoke)"

(21) [Opinion [Opinion Holder Maria] (critica[-2] + tollera[-1]) [Target (Arturo + che Arturo fumi + il fatto che Arturo fumi + il fumo)]]
"Maria (criticizes + tolerates) (Arturo + that Arturo smokes + the fact that Arturo smokes + the smoke)"

As far as the distribution of N1 is concerned, all the psych and opinion items accept the pronoun ciò "this" and the Ppv lo "it/him/her" (apart from inveire "to inveigh"). The only verbs that refuse Num as complement are auspicare "to hope for", inveire "to inveigh", lamentare "to lament", malignare "to speak/think ill of", meritare "to deserve", agognare "to yearn for", anelare "to long for", delirare "to talk nonsense" and gustare "to taste". The verbs that instead do not accept a N-um are fewer: inveire and delirare.
The property N1 dal fatto Ch F is refused by every verb except for apprezzare "to admire" which, together with sopportare "to suffer", desiderare "to desire" and bramare "to crave", also accepts the property N1 da N2. Other properties refused in bulk are Ch N0 V essere Comp (save malignare, sospettare "to suspect", temere "to be afraid of"); N1 Agg (save sospettare, temere); N1 = Agg Ch F (save sospettare); N0 V essere Cong (save sospettare and malignare); N1 = se F o se F (save benedire "to bless", apprezzare and gustare); Ch F a N2um (save approvare "to approve").
Similar observations, but with an even higher homogeneity, can be made on the class 43B which, differently from the 43, has the N1 always introduced by il fatto (Ch F + di).
The Psychological (Sent) and the opinionated (Op) verbs from the class 43B sometimes share the acceptance or the refusal of a set of LG properties:

attaccare,V+Op+NEG+Class=43B+FLX=V8

biasimare,V+Op+NEG+Class=43B+FLX=V3

lodare,V+Op+POS+Class=43B+FLX=V3

osannare,V+Op+POS+FORTE+Class=43B+FLX=V3

snobbare,V+Op+NEG+Class=43B+FLX=V3

assaporare,V+Sent+POS+Class=43B+FLX=V3

pregustare,V+Sent+POS+Class=43B+FLX=V3

schifare,V+Sent+NEG+FORTE+Class=43B+FLX=V3

sdegnare,V+Sent+NEG+FORTE+Class=43B+FLX=V16

spregiare,V+Sent+NEG+FORTE+Class=43B+FLX=V4 ¹⁸

Here again, an N0 different from a human noun can be accepted by just three verbs (banalizzare "to trivialize", demitizzare "to unidealize" and pregiudicare "to compromise").
The pronoun ciò "this", the Ppv lo "it/him/her" and a non-human noun are accepted by all the Sent and Op verbs in complement position, while the N1um is refused by 2 psychological and 5 opinionated verbs.
As regards the properties refused by almost all the verbs, we count N1 dal fatto Ch F, N1 da N2 (save difendere "to defend") and Ch F a N2um (save apologizzare "to make an apology for" and vantare "to praise").
Differently from the class 43, in which the passive transformation is not accepted in 3 cases, in the class 43B it is accepted by all the items.

¹⁸ "to attack, to blame, to praise, to snub, to taste, to foretaste, to be disgusted by, to be indignant, to despise"

3.3.3.4 Other Verbs of Sentiment

We stated in Paragraph 3.3.3.2 that almost half of the SentIta verbs do not coincide with what is defined in literature as “psych verb”. Moreover, a lot of psych verbs and other opinion-bearing items can be found in LG classes different from the ones classically identified as containing psychological predicates, namely 41, 42, 43, 43B (Elia 2014a, 2013). In detail, a large number of verbs that can be used as sentiment and opinion indicators is classified over 28 other LG classes, described by as many definitional structures. This fact could be interpreted as the failure of the syntactic-semantic mapping hypothesis, but actually, in the LG framework, the semantic entropy of the lexicon can be decreased through the identification of classes of words that share sets of lexical-syntactic behaviors.
The main Lexicon-Grammar verbal subclasses are the ones mentioned below:

Intransitive verbs: LG classes from 1 to 15, on which LG researchers tested the acceptance or the refusal of 545 combinatorial and distributional properties through 23827 elementary sentences;

Transitive verbs: LG classes from 16 to 40, tested on 361 properties among 19453 sentences;

Complement clause verbs (transitive and intransitive): LG classes from 41 to 60, tested on 443 properties and 57242 sentences (Elia 2014a).

Table 3.12 shows the percentage values of opinionated verbs in each one of the Lexicon-Grammar verbal subclasses mentioned above. As we can see, the complement clause group contains the largest number of oriented items. When inserted in our electronic dictionaries, the lemmas from these LG classes are enriched with semantic information concerning the Types (Sent: sentiments, emotions, psychological states; Op: opinions, judgments, cognitive states; Phy: physical acts), the Orientations (POS, NEG) and the Intensities (STRONG, WEAK) of the described lemmas.
Similarly to SentiLex (Silva et al. 2010, 2012), SentIta is provided with the specification of the argument(s) Ni semantically affected by the orientation of the verbs.
The presence of oriented verbs and the distribution of semantic tags are respectively described in Tables 3.13 and 3.14. For a complete presentation of LG verbal classes, their composition and their granularity, see Elia (2014a).

LG Class                   Verb usages   Oriented verbs   Percentages (%)
Intransitive verbs         401           138              21
Transitive verbs           605           205              31
Complement clause verbs    760           308              47
Tot                        1766          651              100

Table 3.12: Polar verbs in the main Lexicon-Grammar verbal subclasses
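As a worked check of the figures in Table 3.12, the sketch below recomputes the percentage column under the assumption that each value is the share of a subclass over the 651 oriented verbs, rounded to the nearest integer. The variable names are illustrative only.

```python
# (verb usages, oriented verbs) per main LG subclass, copied from Table 3.12
counts = {
    "Intransitive verbs": (401, 138),
    "Transitive verbs": (605, 205),
    "Complement clause verbs": (760, 308),
}

total_usages = sum(usages for usages, _ in counts.values())
total_oriented = sum(oriented for _, oriented in counts.values())

# Share of each subclass over all oriented verbs, rounded to whole percent
shares = {
    name: round(100 * oriented / total_oriented)
    for name, (_, oriented) in counts.items()
}

# Share of oriented verbs over all verb usages (the "Tot" row of Table 3.13)
overall = round(100 * total_oriented / total_usages)
```

Under this reading the three rounded shares add up to 99 rather than 100, which is an ordinary rounding effect and does not contradict the table.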

Main LG Class             LG Class   Verb usages   Oriented verbs   Percentages (%)
Intransitive Verbs        2          122           34               28
                          2A         40            9                23
                          4          31            12               39
                          9          72            35               49
                          10         56            19               34
                          11         80            29               36
Transitive Verbs          18         40            8                20
                          20I        25            1                4
                          20NR       83            15               18
                          20UM       145           65               45
                          21         29            2                7
                          21A        65            44               68
                          22         63            28               44
                          23R        34            13               38
                          24         72            18               25
                          27         37            10               27
                          28ST       12            1                8
Complement Clause Verbs   44         33            18               55
                          44B        44            20               45
                          45         31            23               74
                          45B        21            16               76
                          47         313           120              38
                          47B        34            12               35
                          49         96            12               13
                          50         51            41               80
                          51         36            19               53
                          53         28            9                32
                          56         73            18               25
Tot                       -          1766          651              37

Table 3.13: Polar verbs in other LG verb classes which contain at least one oriented item

Main LG Class             LG Class   Oriented Verbs   Sent   Op    POS   NEG   FORTE   DEB
Intransitive Verbs        2          34               7      24    8     26    0       0
                          2A         9                4      2     1     8     0       0
                          4          12               0      11    5     7     0       0
                          9          35               10     25    11    24    0       0
                          10         19               4      15    9     10    0       0
                          11         29               25     4     0     29    0       0
Transitive Verbs          18         8                0      0     4     4     0       0
                          20I        1                0      0     1     0     0       0
                          20NR       15               5      8     4     11    0       0
                          20UM       65               16     25    11    54    0       0
                          21         2                0      2     0     2     0       0
                          21A        44               9      14    17    6     11      10
                          22         28               0      28    21    7     0       0
                          23R        13               0      0     4     9     0       0
                          24         18               0      0     0     1     15      2
                          27         10               0      10    10    0     0       0
                          28ST       1                0      0     0     1     0       0
Complement Clause Verbs   44         18               0      18    16    2     0       0
                          44B        20               2      18    14    6     0       0
                          45         23               6      16    7     15    1       0
                          45B        16               5      10    7     9     0       0
                          47         120              0      120   5     54    43      18
                          47B        12               0      34    4     8     0       0
                          49         12               0      12    4     8     0       0
                          50         41               4      33    15    22    4       0
                          51         19               1      14    10    9     0       0
                          53         9                0      9     0     9     0       0
                          56         18               1      9     1     9     5       3
Tot                       -          651              99     461   189   350   79      33

Table 3.14: Distribution of semantic labels into the LG verb classes that contain at least one oriented item

3.3.4 Psychological Predicates’ Nominalizations

The nominal entities of SentIta that will be described in this Section are all predicative forms, due to their correlation with the verbal predicates described above, e.g. angosciare “to anguish”, angoscia “anguish” (D’Agostino 2005).
The complex phenomenon related to the combinatorial constraints between sentiment V-n and specific verbs in simple sentences has already been explored by Balibar-Mrabti (1995) and Gross (1995) for the French language. In this regard, Gross (1995) specified that “les restrictions de noms de sentiment à certains verbes sont nombreuses et tellement inattendues que, dans un premier temps, nous les avions traitées comme des expressions figées, c’est-à-dire comme de simples listes”¹⁹.
As regards Italian works in the LG framework, we mention the works of D’Agostino (2005), D’Agostino et al. (2007) and Guglielmo (2009), that respectively explored the lexical micro-classes of the sentiments ira “anger”, angoscia “anguish” and vanità “vanity”, studying their nominal manifestation in support verb constructions.
Differently from ordinary verbs (discussed here in Section 3.3.3), support verbs do not carry any semantic information: this role is played here just by the sentiment nouns, but the support verbs can modulate it to a certain extent.
In the present research, the Nominalizations of Psychological verbs have been used to manually start the formalization of the SentIta dictionary dedicated to nouns. It comprehends 1000+ entries and includes a list of 80+ human nouns, evaluated and added to this dictionary in order to use them in sentences like N0um essere un N1sent (e.g. Max è un attaccabrighe “Max is a cantankerous person”).
Table 3.15 shows one example of Psych V-n for each LG class taken into account during the annotation of Psychological Predicates.

Psych verb   V-n        V-n Translation   LG Class   Score
angosciare   angoscia   “anguish”         41         -3
piacere      piacere    “pleasure”        42         +2
amare        amore      “love”            43         +3
biasimare    biasimo    “blame”           43B        -2

Table 3.15: Description of the Nominalizations of the Psychological Semantic Predicates

19 “The restrictions of nouns of sentiment to certain verbs are numerous and so unexpected that, at first, we treated them as frozen expressions, that is to say, as mere lists”. Author’s translation.

A sample of the entries from this dictionary is given below. The LG class is associated to a given item by the property “Class”, which lets the grammar recognize the proper syntactic structures in which the nominalizations occur in real texts.

LG Class 41
bellezza,N+POS+Class=41+DASOLO+FLX=N41
calore,N+POS+Class=41+FLX=N5
contentezza,N+POS+Class=41+DASOLO+FLX=N41
dolcezza,N+POS+Class=41+DASOLO+FLX=N41
fascino,N+POS+Class=41+DASOLO+FLX=N5 ²⁰

LG Class 42
dispiacere,N+FLX=N5+42+NEG+DASOLO
dispiacevolezza,N+FLX=N41+42+NEG+DASOLO
dispiacimento,N+FLX=N5+42+NEG+DEB+DASOLO
piacere,N+FLX=N5+42+POS+DASOLO
piacevolezza,N+FLX=N41+42+POS+DASOLO ²¹

LG Class 43
approvazione,N+FLX=N46+43+POS+DASOLO
benedizione,N+FLX=N46+43+POS+DASOLO
bestemmia,N+FLX=N41+43+NEG+DASOLO
bramosia,N+FLX=N41+43+NEG+DEB+DASOLO
condanna,N+FLX=N41+43+NEG+DASOLO ²²

LG Class 43B
denigrazione,N+FLX=N46+43B+NEG+FORTE+DASOLO
derisione,N+FLX=N46+43B+NEG+DASOLO
dileggio,N+FLX=N12+43B+NEG+DASOLO
disdegno,N+FLX=N5+43B+NEG+FORTE+DASOLO
disprezzo,N+FLX=N5+43B+NEG+FORTE+DASOLO ²³

20 “Beauty, warmth, happiness, kindness, charm”
21 “Sorrow, displeasure, regret, pleasure, pleasantness”
22 “Approval, blessing, curse, yearning, condemnation”
23 “Detraction, derision, mockery, disdain, despise”

Table 3.16 presents the details about the distribution of sentiment tags into the Psych V-n dictionary. The size of the nouns dictionary is larger than the verbs’ one. This happens because the same predicate can possess more than one nominalization (e.g. the verb amare from the LG class 43 is related with the V-n amabilità, amore, amorevolezza).

LG Class   Nouns   Positive Items   Negative Items   Intensifiers   Downtoners   Percentages (%)
41         856     245              542              41             28           77
42         11      5                6                0              0            1
43         170     57               113              0              0            15
43B        74      21               53               0              0            7
Tot        1111    328              714              41             28           100
(%)        100     30               64               4              3            -

Table 3.16: Nominalizations of the Psych Predicates chosen from SentIta

3.3.4.1 The Presence of Support Verbs

Including V-n in a sentiment lexicon is particularly risky. In many cases it is misleading to use them out of any context, because they often have a specific SO only if used with particular support verbs. Among the Vsup considered, we mention the following, with their equivalents (D’Agostino 2005, D’Agostino et al. 2007):

• essere in “to be in”, patire di, soffrire di “to suffer from”, stare in “to be in”, andare in “to go in”;

• avere “to have”, avvertire “to perceive”, patire, soffrire “to suffer”, tenere “to have”, sentire, provare “to feel”, subire “to undergo”, nutrire, covare “to harbor”, provare un (senso + sentimento) di “to feel a (sense + sentiment) of”;

• causare “to cause”, dare “to give”, suscitare “to raise”, provocare “to provoke”, creare “to create”, alimentare “to instigate”, stimolare “to inspire”, sviluppare “to develop”, incutere “to instill”, sollecitare “to urge to”, donare, regalare, offrire “to offer”.

Sometimes it is even possible to have a SO switch by changing the word context. For example, the noun pietà “pity” and “compassion” is strongly negative in a fixed expression with the support verb fare (22), but it becomes positive when it co-occurs with the sentiment expression of (23).

(22) È proprio la sceneggiatura che fa pietà [-3]
“It is really the script that is pitiful”

(23) Mary prova un sentimento di pietà [+2]
“Mary feels a sentiment of compassion”

Systematically evaluating, through LG tables, the relationships that exist between each one of the psych nouns and each one of the support verbs that can be combined with them leads to an important conclusion: the fact that the support verb variants can influence the polarity of the V-n with which they occur is not an exception. For example, extensions of Vsup like subire “to undergo”, covare “to harbor”, incutere “to instill” bring a negative orientation with them; donare, regalare, offrire “to offer” are positively connoted; perdere “to lose”, abbandonare “to abandon”, lasciare “to leave” cause a polarity switch for whatever psych noun orientation.
Support verbs like the ones cited above cannot be simply included into a Vsup list, but must be inserted in specific paths of a nouns grammar able to correctly manage their influence on V-n.
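The three behaviors just listed can be sketched as a small lookup. This is an illustration only, not the NooJ grammar used in the thesis: the numeric reading of "bring a negative orientation" (force the sign to negative while keeping the magnitude) is an assumption, and the function name vsup_adjusted_score is hypothetical.

```python
# Support verb extensions named in the text, grouped by their effect
NEGATIVE_VSUP = {"subire", "covare", "incutere"}
POSITIVE_VSUP = {"donare", "regalare", "offrire"}
SWITCHING_VSUP = {"perdere", "abbandonare", "lasciare"}


def vsup_adjusted_score(vsup: str, noun_score: int) -> int:
    """Return the score of a 'Vsup + V-n' pair given the prior score of the noun."""
    if vsup in SWITCHING_VSUP:
        return -noun_score            # polarity switch, whatever the noun orientation
    if vsup in NEGATIVE_VSUP:
        return -abs(noun_score)       # assumption: sign forced to negative
    if vsup in POSITIVE_VSUP:
        return abs(noun_score)        # assumption: sign forced to positive
    return noun_score                 # plain support verbs leave the prior score as is


# amore is scored +2 and angoscia -3 elsewhere in this chapter
lost_love = vsup_adjusted_score("perdere", 2)
plain_anguish = vsup_adjusted_score("avere", -3)
```

A table of this kind only captures the general tendency; fixed expressions such as fare pietà in (22) still need their own dedicated path.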

3.3.4.2 The Absence of Support Verbs

Another problem arises when a psych predicative noun from our collection presents different semantic behaviors when occurring without support verbs in real texts. This is the case of calore “warmth”, from the verb riscaldare “to warm up” of the LG class 41, that possesses its psychological figurative meaning only together with specific structures (24), but assumes a physical interpretation when it occurs alone (26). Different is the case of contentezza “happiness”, that preserves its psychological meaning in both cases (25, 27).

(24) Le tue parole invadono[+] Maria di calore[+2] [+3]
“Your words warm up Mary”

(25) Le tue parole invadono[+] Maria di contentezza[+2] [+3]
“Your words overwhelm Mary with happiness”

(26) Che calore[+2] [0]
“What a heat”

(27) Che contentezza[+2] [+2]
“What a joy”

We solved the problem in a simple way, by distinguishing in the dictionaries, and recalling in the grammars, the cases in which the nouns can occur alone from the cases in which a specific Vsup context is required to have a particular SO. The distinction is made by means of the special tag +DASOLO “alone”.
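The effect of the +DASOLO tag on examples (26) and (27) can be sketched as follows. The function name and the boolean has_vsup_context are hypothetical simplifications: in the thesis the licensing context is recognized by the grammar paths, not passed as a flag.

```python
def contextual_score(flags: set, prior_score: int, has_vsup_context: bool) -> int:
    """Keep the prior score only if the noun is licensed to carry it in this context."""
    if has_vsup_context or "DASOLO" in flags:
        return prior_score
    return 0  # the noun occurs alone but is not tagged +DASOLO: no sentiment reading


# Tags taken from the class 41 sample: calore lacks +DASOLO, contentezza has it
calore_flags = {"POS"}
contentezza_flags = {"POS", "DASOLO"}

che_calore = contextual_score(calore_flags, 2, has_vsup_context=False)            # (26)
che_contentezza = contextual_score(contentezza_flags, 2, has_vsup_context=False)  # (27)
```

The intensification contributed by invadono in (24) and (25) is a separate step and is deliberately left out of this sketch.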

3.3.4.3 Ordinary Verbs Structures

In order to examine in greater depth how the presence of different verbs can affect the polarity of the Psychological V-n, we translated the list of 61 verbs of Balibar-Mrabti (1995) from the French and annotated them with semantic information. The sentence structure which they have been conceived for is N0sent V N1um (e.g. una gioia intensa riempie Max “an intense joy fills Max”), in which the subject N0 plays the role of Nsent. Then they have been tested in the structure N0 V N1um di N2sent (e.g. le tue parole riempiono Max di una gioia intensa “your words fill Max with an intense joy”) that, in turn, is related with the passive Max è pieno di una gioia intensa “Max is full of an intense joy”.
This list of corresponding Italian verbs contains 47 entries, because of the exclusion of the duplicates from the LG class 41²⁴. Among them, 11 give to the Nsent a negative connotation (e.g. abbattere “to lose heart”, offuscare “to confuse”, corrodere “to corrode”); 6 confer it a positive orientation (e.g. risplendere “to shine”, cullare “to cradle”, illuminare “to light up”); 18 intensify its polarity (e.g. travolgere “to overwhelm”, attraversare “to undergo”, riempire “to fill”) and 12 simply possess a neutral orientation.
Their role can become crucial in differently oriented sentences that make use of the same Nsent, like (28a, b, c).

(28a) L’amore[+2] tormenta[-3] Maria [-3]
(28b) L’amore[+2] illumina[+2] Maria [+2]
(28c) L’amore[+2] travolge[+] Maria [+3]

“The love (torments + lights up + overwhelms) Maria”

Therefore, for the sake of simplicity, we decided to put them into a FSA that assigns as sentence polarity the score of the verbs with tags NEG/POS (28a, b) and that uses the verbs with the tag FORTE as intensifiers (28c). A simplified version of this grammar is displayed in Figure 3.1. Here the paths marked with the number (1) represent the negative rule (28a), the paths (2) represent the positive rule (28b) and the paths marked with (3) exemplify the intensification rule (28c).
In this dictionary/grammar pair we did not mark the verbs with special tags; consequently, other Psychological Predicates endowed with polarity and intensity tags can be matched in the corresponding paths.
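The three rules can be restated in a few lines of plain code. This is a sketch and not the FSA of Figure 3.1: the function names are hypothetical, and the intensification step (one point added to the magnitude, capped at 3) is an assumption chosen because it reproduces the scores of (28a), (28b) and (28c).

```python
MAX_SCORE = 3  # assumed upper bound of the scale, as suggested by the examples


def intensify(score: int) -> int:
    """Push a non-zero score one step away from zero, without exceeding the cap."""
    if score == 0:
        return 0
    sign = 1 if score > 0 else -1
    return sign * min(abs(score) + 1, MAX_SCORE)


def sentence_score(noun_score: int, verb_flags: set, verb_score: int = 0) -> int:
    """Score of 'Nsent V N1um' from the noun prior score and the verb annotation."""
    if "NEG" in verb_flags or "POS" in verb_flags:
        return verb_score             # rules (1) and (2): the verb score wins
    if "FORTE" in verb_flags:
        return intensify(noun_score)  # rule (3): the verb acts as an intensifier
    return noun_score                 # neutral verbs leave the noun score untouched


score_28a = sentence_score(2, {"NEG"}, -3)   # L'amore tormenta Maria
score_28b = sentence_score(2, {"POS"}, 2)    # L'amore illumina Maria
score_28c = sentence_score(2, {"FORTE"})     # L'amore travolge Maria
```

Checking the orientation tags before the intensity tag mirrors the order in which the rules are stated in the text.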

24 The verbs from this class are all compatible with this structure, because they accept an abstract noun in the subject position and always require a human noun in position N1.

Figure 3.1: FSA for the interaction between Nsent and semantically annotated verbs

3.3.4.4 Standard-Crossed Structures

To conclude the work on sentiment V-n, we mention the case of the structures of the type “standard” (N0sent V Loc N1um) / “crossed” (N0um V di N1sent), exemplified below (D’Agostino 2005, Elia et al. 1981).

(29) l’amore[+2] brilla[+] negli occhi di Maria [+3]
“love shines in Maria’s eyes”

(30) gli occhi di Maria brillano[+] d’amore[+2] [+3]
“Maria’s eyes shine with love”

To formalize this group of sentiment expressions we made use of 12 verbs from the LG class 12, interpreted in a figurative way, in which the N1um “standard” position and the N0um “crossed” position are designated by their body parts Npc through a metonymic process (Elia et al. 1981).
As happens in (29) and (30), such verbs can work as intensifiers (e.g. bruciare “to burn”) or can also have a negative polarity influence (only in the case of dolere “to hurt”).

3.3.5 Psychological Predicates’ Adjectivalizations

Paragraph 3.3.2 described the manually-built collection of opinion bearing adjectives, which counts more than 5000 items.
In this list, a subset of items in morpho-phonological relation with our predicates from the classes 41, 42, 43 and 43B has been automatically associated to the verbs of origin and to their respective LG classes. The result of this work is an electronic dictionary of 757 entries, realized with a morphological grammar of Nooj (see Figure 3.2). It is displayed in the sample below.

angosciante,A+FLX=N79+DRV=ISSIMO:N88+NEG+Va+V=angosciare+Class=41
angosciato,A+FLX=N88+DRV=ISSIMO:N88+NEG+Va+V=angosciare+Class=41
angoscioso,A+FLX=N88+DRV=ISSIMO:N88+NEG+Va+V=angosciare+Class=41
piacente,A+FLX=N79+DRV=ISSIMO:N88+POS+Va+V=piacere+Class=42
piacevole,A+FLX=N79+DRV=ISSIMO:N88+POS+Va+V=piacere+Class=42
amato,A+FLX=N88+DRV=ISSIMO:N88+POS+Va+V=amare+Class=43
amorevole,A+FLX=N79+DRV=ISSIMO:N88+POS+Va+V=amare+Class=43
amoroso,A+FLX=N88+DRV=ISSIMO:N88+POS+Va+V=amare+Class=43
biasimevole,A+FLX=N79+DRV=ISSIMO:N88+NEG+Va+V=biasimare+Class=43B ²⁵

Table 3.17 gives information about the distribution of its semantic tags, while Table 3.18 compares the sentiment selection of V-a with the whole dictionary of sentiment adjectives of SentIta in which it is contained.

Tag          41    42   43   43B   Tot
POS+FORTE    8     0    0    9     17
POS          133   3    35   11    182
POS+DEB      14    0    0    0     14
NEG+DEB      66    0    5    1     72
NEG          307   2    24   25    358
NEG+FORTE    49    0    6    2     57
FORTE        46    1    0    0     47
DEB          9     0    0    1     10
Tot          632   6    70   49    757

Table 3.17: Distribution of semantic tags in the dictionary of Adjectival Psych Predicates

25 “Anguishing, anguished, anguishous, attractive, pleasing, beloved, loving, amorous, blameworthy”

Adjectives    Entries   Percentages (%) on Adj Psych   Percentages (%) on SentIta
Tot pos       213       28                             4
Tot neg       487       64                             9
Tot int       57        8                              1
Tot Psych     757       100                            14
Tot SentIta   5381      -                              100

Table 3.18: Percentage values of Psych Adjectivalizations in SentIta

The grammar resembles the ones used to derive the orientation of adverbs in -mente and quality nouns from the semantic information listed in the sentiment adjectives, which will be respectively treated in Sections 3.5.2 and 3.5.3. The process is similar: we copy and paste the dictionary that has to be semantically enriched into a Nooj text and we annotate it through the instructions of the grammar. We then export the annotations and compile a brand new dictionary on their base.
Because the operations involve just the two dictionaries, we exclude any other lexical (.nod) and morphological (.nom) resource from the Nooj Preferences before applying the grammar to the dictionary-text.
The grammar recognizes each verbal stem, checks if it belongs to the LG class of interest and then adds the adjectival suffixes listed in a .dic file (see Table 3.19). The difference from the grammars for the quick population of the adverbs in -mente and the quality nouns is that here the morphological strategy is not exploited to discover the prior polarity of the words, since the adjective dictionary was already tagged with sentiment information. In fact, in the final annotation the adjectives preserve all their semantic ($2S), inflectional and derivational ($2F) properties. The new information they acquire regards their nature of V-a (+Va), the connection with the psychological verb (+V=$1L) and the relative LG class of the predicate (e.g. +Class=41).
Regarding the adjectival suffixes for the deverbal derivation, we examined the ones that follow (Ricca 2004a):

• morphemes for the adjectives formation (-bile, -(t)orio, -evole, -(t)ivo, -(t)ore, -bondo, -ereccio, -iccio, -ido, -ile, -ndo, -tario, -ulo);

• morphemes for the formation of adjectives that coincide with the verb’s past or present participle (-to, -nte);

• not typically deverbal morphemes (-oso, -istico, -(t)ico, -aneo, -ardo, -ale, -ario);

• not typically adjectival morphemes (-(t)ore, -trice, -ista, -one).

Figure 3.2: Extract of the morphological FSA that associates psych verbs to their adjectivalizations

SFX               Adjectives
-t-o              309
-nt-e             159
-(t)or-e          100
-(t)iv-o          38
-os-o             38
-(t)ori-o         35
-evol-e           26
-o (conversion)   20
-(t)ic-o          8
-on-e             6
-ist-a            4
-istic-o          3
-al-e             3
-il-e             2
-bil-e            1
-bond-o           1
-nd-o             1
-tric-e           1
-icci-o           1
-ari-o            1
-erecci-o         0
-id-o             0
-tari-o           0
-ul-o             0
-ane-o            0
-ard-o            0

Table 3.19: Suffixes from the dictionary of Psych Adjectivalizations

As can be seen in Table 3.19, some suffixes produced a large number of connections, while others were less productive. Six morphemes did not produce any connection.
Furthermore, a set of adjectives derived by conversion has also been automatically connected to the respective verbs:

LG class 43: maligno = malignare “mean = to speak ill of”
LG class 41: schifo = schifare “disgusting = to loathe”

Psych verb   Psych V-n   Psych V-a     V-a Translation   LG Class   Score
angosciare   angoscia    angosciante   “anguishing”      41         -3
                         angosciato    “anguished”
                         angoscioso    “anguishous”
piacere      piacere     piacente      “attractive”      42         +2
                         piacevole     “pleasing”
amare        amore       amato         “beloved”         43         +3
                         amorevole     “loving”
                         amoroso       “amorous”
biasimare    biasimo     biasimevole   “blameworthy”     43B        -2

Table 3.20: Adjectivalizations of the Psych Predicates
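The association step (verbal stem, LG class check, suffix attachment, lookup in the adjective dictionary) can be approximated outside NooJ as follows. This is a plain re-implementation for illustration, not the morphological grammar of Figure 3.2: the miniature dictionaries, the suffix subset and the function names are all hypothetical.

```python
# Hypothetical miniature resources: psych verbs with their LG class, and
# adjectives that already carry inflectional and sentiment tags
VERBS = {"angosciare": "41", "piacere": "42", "biasimare": "43B"}
ADJECTIVES = {
    "angosciante": "A+FLX=N79+NEG",
    "angosciato": "A+FLX=N88+NEG",
    "angoscioso": "A+FLX=N88+NEG",
    "piacevole": "A+FLX=N79+POS",
    "biasimevole": "A+FLX=N79+NEG",
    "amaro": "A+FLX=N88+NEG",  # unrelated adjective, must stay unlinked
}
SUFFIXES = ("to", "nte", "oso", "evole")  # small subset of Table 3.19


def candidates(verb: str):
    """Yield stem+suffix forms, with and without the thematic vowel of the infinitive."""
    stem, thematic_vowel = verb[:-3], verb[-3]
    for suffix in SUFFIXES:
        yield stem + suffix
        yield stem + thematic_vowel + suffix


def link_adjectives(verbs: dict, adjectives: dict) -> list:
    """Return enriched entries for the adjectives that match a generated candidate."""
    linked = set()
    for verb, lg_class in verbs.items():
        for form in candidates(verb):
            if form in adjectives:
                # Existing tags are preserved; only +Va, +V= and +Class= are added
                linked.add(f"{form},{adjectives[form]}+Va+V={verb}+Class={lg_class}")
    return sorted(linked)


linked_entries = link_adjectives(VERBS, ADJECTIVES)
```

Forms such as amorevole or amoroso would not be reached by this naive stem-plus-suffix generation from amare, which hints at why the automatically produced list still needs manual revision.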

The average precision reached in the automatic association process is 97%. The list produced by Nooj has then been manually corrected, in order to add it to the SentIta resources without any possible source of errors.
Table 3.20 shows examples of the connection between the verbs from the tables 41, 42, 43 and 43B and their respective nominalizations and adjectivalizations.
Actually, nouns can be associated to adjectives also independently from verbs (Gross 1996), as happens in (31). Section 3.5.3 will deal with such cases.

(31) Luc (est généreux + a de la générosité) envers ses ennemis (Gross 1996)
“Luc (is generous + has generosity) towards his enemies”


Basically, we chose to associate the psych adjectives to the respective verbs in order to preserve their argument/actant selection for Semantic Role Labeling purposes. This does not exclude that other adjectives that do not belong to this subset can also have the function of predicates in the sentences in which they occur, as happens in (31).

3.3.5.1 Support Verbs

As exemplified above, the sentiment expressions in which we inserted the adjectives from SentIta are of the kind N0 essere Agg val, where Agg val represents an adjective that expresses an evaluation (Elia et al. 1981) and the support verb for these predicates is essere “to be”.
This verb gives its support also to the expression N0 essere un N1-class (e.g. Questo film è una porcheria “This movie is a mess”) that, without it, together with the adjectives, would not possess any mark of tense (Elia et al. 1981).
A great part of the compound adverbs also possess an adjectival function (e.g. a fin di bene “for good”, tutto rose e fiori “all peace and light”, see Section 3.4.1), so with good reason they have been included in this support verb construction as well.
The support verbs’ equivalents included in this case are the following (Gross 1996):

• aspectual equivalents: stare “to stay”, diventare “to become”, rimanere, restare “to remain”;

• causative equivalents: rendere “to make”;

• stylistic equivalents: sembrare “to seem”, apparire “to appear”, risultare “to result”, rivelarsi “to reveal to be”, dimostrarsi, mostrarsi “to show oneself to be”.

Among the Italian LG structures that include adjectives²⁶ we selected the following, in which polar and intensive adjectives can occur with the support verb essere (Vietri 2004, Meunier 1984):

• Sentences with polar adjectives

– N0 Vsup Agg Val²⁷
L’idea iniziale era accettabile[+1]
“The initial idea was acceptable”

– V-inf Vsup Agg Val
Vedere questo film è demoralizzante[-2]
“Watching this movie is demoralizing”

– N0 Vsup Agg Val di V-Inf
La polizia sembra incapace[-2] di fare indagini
“The police seem unable to investigate”

– N0 Vsup Agg Val a N1
La giocabilità è inferiore[-2] alla serie precedente
“The playability is worse than the preceding series”

– N0 Vsup Agg Val Per N1
Per me questo film è stato noioso[-2]
“In my opinion this movie was boring”

• Sentences with adjectives as nouns intensifiers and downtoners

– N0 Vsup Agg Int di N1
Una trama piena[+] di falsità[-2]
“A plot filled with mendacity”

26 For the LG study of adjectives in French see Picabia (1978), Meunier (1999, 1984).
27 We preferred to use in these examples the notation Agg Val rather than Psych V-a, because they can refer both to adjectivalizations of psych verbs and to other SentIta evaluative adjectives.

3.3.5.2 Nominal groups with appropriate nouns Napp

The support verb avere “to have” (and its equivalent tenere) has been observed in a transformation (Nb Vsup Na V-a) that involves a special kind of GN subject, which contains noms appropriés “appropriate nouns” Napp (Harris 1970, Guillet and Leclère 1981, Laporte 1997, 2012, Meydan 1996, 1999).
Citing (Laporte 2012, p. 1)

“A sequence is said to be appropriate to a given context if it has the highest plausibility of occurrence in that context, and can therefore be reduced to zero. In French, the notion of appropriateness is often connected with a metonymical restructuration of the subject”

and (Mathieu 1999b, p. 122)

“On considère comme substantif approprié tout substantif Na pour lequel, dans une position syntaxique donnée, Na de Nb = Nb”²⁸

we can clarify that “the notion of highest plausibility of occurrence of a term in a given context” (Laporte 2012) should not be interpreted in statistic terms or proved by searches in corpora, but just intuitively defined through the paraphrastic relation Na di Nb = Nb.
According to (Meydan 1996, p. 198) “the adjectival transformations with Napp (nb (Na di Nb)Q essere V-a = Il fisico di Lea è attraente “The body of Lea is attractive”) can be put in relation through four types of transformations”, which correspond also to the structures included in our network of sentiment FSA. The obligatoriness of the modifiers and the appropriateness of the nouns are reflected in these transformations (Laporte 1997):

• a nominal construction Vsup Napp
Nb Vsup Na V-a
Lea ha un fisico attraente “Lea has an attractive body”

• a restructured sentence in which the GN subject is exploded into two independent constituents
Nb essere V-a Prep Na
Lea è attraente (per il suo + di) fisico “Lea is attractive for her body”

• a metonymic sentence in which the Napp is erased
Nb essere V-a
Lea è attraente “Lea is attractive”

• a construction in which the Napp is adverbalized
Nb essere Na-mente V-a
Lea è fisicamente attraente “Lea is physically attractive”

28 “It is considered to be an appropriate substantive each substantive for which, in a given syntactic position, Na of Nb = Nb”. Author’s translation.

The concept of appropriate noun has a crucial importance not only in these adjectival expressions, but also when it appears as object of psychological predicates (see Section 3.3.4), as exemplified by (Mathieu 1999b, p. 122):

N0 V (Dét N de Nhum)1 = N0 V (Nhum)1

Les soucis rongent l’esprit de Marie = Les soucis rongent Marie
“Worries torment the spirit of Mary = Worries torment Mary”

Moreover, in the Sentiment Analysis field, where the identification and the classification of the features of the opinion object even constitute a whole subfield of research (see Section 5.2), the Napp becomes a very advantageous linguistic device for the automatic feature analysis. See for example (32), in which Na (Napp) is the feature and the Nb (N-um) is the object of the opinion.

(32) [Opinion [Feature (La risoluzione + L’obiettivo)] della [Object fotocamera] è eccellente[+3]]
(32a) [Opinion [Object La fotocamera] ha una [Feature (risoluzione + obiettivo)] eccellente[+3]]
(32b) [Opinion [Object La fotocamera] è eccellente[+3]]
(32c) [Opinion [Object La fotocamera] è eccellente[+3] per [Feature (la sua risoluzione + il suo obiettivo)]]

“(The image resolution + The lens) of the camera is excellent”
“The camera has an excellent (image resolution + lens)”
“The camera is excellent”
“The camera is excellent for its (image resolution + lens)”
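To illustrate how the relation Na di Nb can be exploited for feature extraction, the sketch below matches only the surface pattern of (32), that is "article Na di+article Nb è Agg". It is a regular-expression approximation and not the FSA network of the thesis: PATTERN, POLARITY and extract_opinion are hypothetical names, and the one-word lexicon is taken from the example.

```python
import re

# Hypothetical one-entry polarity lexicon, scored as in example (32)
POLARITY = {"eccellente": 3}

# article + Na + articulated preposition "di" + Nb + è + adjective
PATTERN = re.compile(
    r"^(?:il\s+|lo\s+|la\s+|l')(?P<feature>\w+)\s+"
    r"(?:dello\s+|della\s+|dell'|del\s+)(?P<object>\w+)\s+"
    r"è\s+(?P<adjective>\w+)$",
    re.IGNORECASE,
)


def extract_opinion(sentence: str):
    """Return feature, object and score for 'Na di Nb è Agg', or None if no match."""
    match = PATTERN.match(sentence.strip())
    if match is None:
        return None
    adjective = match.group("adjective").lower()
    if adjective not in POLARITY:
        return None
    return {
        "feature": match.group("feature").lower(),
        "object": match.group("object").lower(),
        "score": POLARITY[adjective],
    }


resolution = extract_opinion("La risoluzione della fotocamera è eccellente")
lens = extract_opinion("L'obiettivo della fotocamera è eccellente")
no_feature = extract_opinion("La fotocamera è eccellente")  # (32b): Napp erased
```

The variants (32a) and (32c) would need their own patterns, which is where a grammar with shared sub-paths becomes more convenient than a list of regular expressions.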

A useful classification of the nouns that can play the role of Napp, for both human and not human nouns, and that can be exploited for the opinion feature classification, is the following (Meydan 1996):

Napp of Num
    Npc = body parts
    Npabs = personality trait
    Ncomport = attitudes
    Nprédr = intentions

Napp of N-um
    Npo = shape
    Nprop = ability
    Dnom = model

3.3.5.3 Verbless Structures

Predicativity is not a property necessarily possessed by a particular class of morpho-syntagmatic structures (e.g. verbs that carry information concerning person, tense, mood, aspect); instead, it is determined by the connection between elements (Giordano and Voghera 2008, De Mauro and Thornton 1985).
Also on the base of their frequency in written and spoken corpora and in informal and formal speech, together with Giordano and Voghera (2008) we consider verbless expressions as syntactically and semantically autonomous sentences, which can be coordinated and juxtaposed and which can introduce subordinate clauses, just like verbal sentences.
Among the verbless sentences available in the Italian language, we are interested here in those involving adjectives indicating appreciation (Agg val), e.g. Bella questa! “Good one!” (Meillet 1906, De Mauro and Thornton 1985).
Below we report a selection of customer reviews from the dataset described in Section 5.1.3, that can give an idea of the diffusion of the verbless constructions in user generated contents.

• Movie Reviews: [Molto bravi gli interpreti] [la mia preferita la sorella combina guai] [Bella la fotografia] [i colori]²⁹

• Car Reviews: [Ottimo l’impianto radio cd] [comodissimi i comandi al volante]³⁰

• Hotel Reviews: [Posizione fantastica] si raggiungono a piedi molti luoghi strategici di Londra [molto curato nel servizio e nel soddisfare le nostre richieste] [ottimo servizio in camera]³¹

• Videogame Reviews: [Provato una volta] e [subito scartato] [odioso il modo in cui viene recensito]³²

• Smartphone Reviews: [Utilissimo il sistema di spegnimento e riaccensione ad orari programmati] [Telefonia e sensibilità OK]³³

• Book Reviews: [Azzeccatissima la descrizione anche psicologica e introspettiva dei personaggi]³⁴

29 httpwwwmymoviesitfilm2013abouttimepubblicoid=680439
30 httpautociaoitNuova_Fiat_500__Opinione_1336287SortOrder1
31 httpwwwtripadvisoritShowUserReviews-g186338-d651511-r120264867-Haymarket_Hotel-London_Englandhtml

3.3.5.4 Evaluation Verbs

In this Paragraph we mention the use of the verbs of evaluation Vval (Elia et al. 1981), which represent a subclass of the LG class 43, grouped together through the acceptance of at least one of the properties N1 = N1 Agg1 and N1 = Agg1 Ch F. Examples are giudicare “to judge”, trovare “to consider”, avvertire “to notice”, valutare “to evaluate”, etc.
Of course, the N1 Agg here can include an Napp that takes the shape of (Na di Nb)1 Agg, just as happens with the psychological predicates of Mathieu (1999b).
They present a different role attribution with respect to support verb constructions, since the N0 of the evaluative verbs is always an Opinion Holder (33).

(33) [Opinion [Opinion Holder Maria] considera ([Target Arturo] affascinante[+2] + affascinante[+2] [Target che Arturo venga])]
“Maria considers (Arturo fascinating + enchanting that Arturo comes)”

32 httpwwwamazonitreviewRK6CGST4C9LXI
33 httpwwwamazonitAlcatel-Touch-Smartphone-Dual-Italiaproduct-reviewsB00F621PPGpageNumber=3
34 httpwwwamazonitproduct-reviewsB007IZ1Q5SpageNumber=3

The N0 of a support verb construction, instead, is always the target of the opinion (34). The Opinion Holder remains unexpressed here.

(34) [Opinion [Target Arturo] (è + rimane + sembra) davvero[+] affascinante[+2]]
“Arturo (is + remains + seems) truly fascinating”

In causative constructions, the roles are different again (35).

(35) [Opinion [Cause (Maria + L’acconciatura)] rende [Target Arturo] davvero[+] affascinante[+2]]
“(Maria + The hairstyle) makes Arturo truly fascinating”
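The three role configurations in (33), (34) and (35) can be summarized in a small sketch. The names `OpinionFrame` and `assign_roles` and the three construction labels are hypothetical and not part of SentIta; leaving the holder unset in causative constructions is also an assumption, since the text only names a Cause and a Target there.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OpinionFrame:
    """Semantic roles of an opinionated elementary sentence (illustrative only)."""
    holder: Optional[str]
    target: Optional[str]
    cause: Optional[str] = None


def assign_roles(construction: str, n0: str, n1: Optional[str] = None) -> OpinionFrame:
    """Map the syntactic positions N0/N1 onto roles, by construction type."""
    if construction == "evaluative":   # N0 Vval N1 Agg: N0 is the Opinion Holder
        return OpinionFrame(holder=n0, target=n1)
    if construction == "support":      # N0 Vsup Agg: N0 is the Target, holder unexpressed
        return OpinionFrame(holder=None, target=n0)
    if construction == "causative":    # N0 Vcaus N1 Agg: N0 is the Cause, N1 the Target
        return OpinionFrame(holder=None, target=n1, cause=n0)
    raise ValueError(f"unknown construction type: {construction}")


print(assign_roles("evaluative", "Maria", "Arturo"))
print(assign_roles("support", "Arturo"))
print(assign_roles("causative", "L'acconciatura", "Arturo"))
```

The sketch only encodes the mapping stated in the text; recognizing which construction a sentence instantiates is the job of the grammars.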

3.3.6 Vulgar Lexicon

The fact that “taboo language is emotionally powerful” (Zhou, 2010, p. 8) cannot be ignored during the collection of lexical and grammatical resources for Sentiment Analysis purposes.

SentIta is provided with a collection of 182 bad words that includes the following grammatical and semantic subcategories:

• Nouns: 115 entries, among which

  – 30 are human nouns (e.g. rompiballe “pain in the ass”)

  – 9 are nouns of body parts (e.g. cazzo “dick”)

• Verbs: 45 entries, among which

  – 17 are pro-complementary and pronominal verbs (4 +PRXCOMP, e.g. fottersene “to not give a fuck”, and 13 +PRX, e.g. incazzarsi “to get mad”)

  – 17 allow the derivation of forms in -bile “-ble” (e.g. trombabile “bangable”, from trombare “to screw”)

  – 4 are already included in the LG class 41 (e.g. fregare “to matter”)

• Adjectives: 16 entries, among which 10 are from SentIta (5 +POS, e.g. cazzuto “die-hard”, and 5 +NEG, e.g. scoglionato “annoyed”)

• Adverbs: 4 entries (e.g. incazzosamente “grumpily”)

• Exclamations: 8 entries (e.g. vaffanculo “fuck off”)

Figure 3.3: Frozen and semi-frozen expressions that involve the vulgar word cazzo and its euphemisms

Although some vulgar terms and expressions, e.g. those functioning as interjections or discourse markers, can be semantically empty, most of them have metaphorical functions that are relevant for the correct analysis of sentiment in real texts. Examples are offense, imprecation, intensification of negation, etc.

The scatological and sexual semantic fields are among the most frequent in our lexicon. The masculine and feminine nouns derived from human body parts can have different kinds of orientations and functions, ranging from insult (coglione “asshole”) to praise (figata “cool thing”).

In accordance with the observations made by Velikovich et al. (2010) on web corpora, our lexicon includes a set of racial or homophobic derogatory terms (e.g. terrone³⁵ “southern Italian”, culattone “faggot”).

The vulgar words in the SentIta database, especially the ones with an uncertain polarity, can be part of frozen or semi-frozen expressions that make clear, for each occurrence, the actual semantic orientation of the words in context. A very typical Italian example is cazzo “dick”, with its more or less vulgar regional variants (e.g. minchia, pirla) and euphemisms (e.g. cavolo “cabbage”, cacchio “dang”, mazza “stick”, tubo “pipe”, corno “horn”, etc.). Figure 3.3 exemplifies the cases in which the context gives the word(s) under examination a negative connotation. In detail, the FSA includes negative adverbial and adjectival functions (paths 1, 2, 5, 7), exclamations (paths 3, 6, 9), interrogative forms (path 6) and intensification of negations (paths 8, 9). Path (4) also exemplifies the frozen sentence from the LG class CAN (N0 V (C1 di N) = N0 V C1 a N2) that presents cazzo as C1 frozen complement (Vietri, 2011). This frozen sentence, in particular, is also related to some monorematic compound nouns and adjectives included in the bad word list of SentIta.

However, a Sentiment Analysis tool that works on user-generated content must be aware of the fact that, in colloquial and informal situations, a word like cazzo can work simply as a regular intensifier, also in positive sentences (e.g. è così bello[+2] cazzo[+] [+3] “it’s fucking nice”). This is why it received in the dictionary just the tag attributed to intensifiers (+FORTE). In order to be annotated in texts with other orientations, it must occur as a component of specific expressions.

As we can see in Table 3.21, although the majority of the entries in the bad words dictionary possess a negative connotation, 10% of them are positive (e.g. strafigo “supercool”). Moreover, the manual annotation of the whole list of bad words with sentiment tags confirmed the emerging idea of the emotional power of taboo language: 84% of the vulgar lexicon is endowed with a defined semantic orientation (see Table 3.22).

35 Term used with insulting purposes only by Northern Italians.
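The intensifier reading of cazzo in è così bello[+2] cazzo[+] [+3] can be illustrated with a minimal sketch. The function `intensify` and its one-step rule are assumptions made for illustration only, built on the score scale from -3 to +3 and on the +FORTE/+DEB tags used in the dictionary; they are not the actual SentIta grammar rules.

```python
def intensify(score: int, tag: str) -> int:
    """Apply an intensity tag to a prior polarity score on the -3..+3 scale.

    Assumed rule: +FORTE strengthens and +DEB weakens the score by one step,
    without changing its sign and without leaving the 1..3 magnitude range.
    """
    if score == 0:
        return 0  # a neutral word has no polarity to strengthen or weaken
    step = {"FORTE": 1, "DEB": -1}[tag]
    magnitude = min(3, max(1, abs(score) + step))
    return magnitude if score > 0 else -magnitude


# bello[+2] followed by the intensifier cazzo[+FORTE]
print(intensify(2, "FORTE"))
```

Because the intensifier carries no polarity of its own, the same word strengthens positive and negative sentences alike, which is why a +FORTE tag alone is enough in the dictionary.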

Score  Bad Words Tag  Entries
+3     +POS+FORTE     6
+2     +POS           12
+1     +POS+DEB       0
-1     +NEG+DEB       8
-2     +NEG           125
-3     +NEG+FORTE     6
+      +FORTE         3
-      +DEB           0

Table 3.21: SentIta tag distribution in the list of bad words

Bad Words     Entries  Percentage (%)
tot pos       18       10
tot neg       132      73
tot int       3        2
tot oriented  153      84
tot neutral   29       16
tot BW        182      100

Table 3.22: Bad Words percentage values in SentIta
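The percentages of Table 3.22 follow directly from the entry counts. The short sketch below recomputes them; the variable names are illustrative, and the counts are the ones reported in the table itself.

```python
# Entry counts as reported in Table 3.22
counts = {"pos": 18, "neg": 132, "int": 3, "neutral": 29}

total = sum(counts.values())                               # all bad words
oriented = counts["pos"] + counts["neg"] + counts["int"]   # entries with a sentiment tag


def pct(n: int, whole: int) -> int:
    """Percentage rounded to the nearest integer, as in the table."""
    return round(100 * n / whole)


for label, n in counts.items():
    print(label, n, pct(n, total))
print("oriented", oriented, pct(oriented, total))
```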

Collateral Benefits of Exhaustive Vulgar Lexical and Grammatical Resources

The Web 2.0 offers Internet users the opportunity to freely share almost everything in online communities, hiding their own identity behind the shelter of anonymity.

Flaming, trolling, harassment, cyberbullying, cyberstalking and cyberthreats are all terms used to refer to offensive content on the web. The shapes can be different and the focus can be on various topics, e.g. physical appearance, ethnicity, sexuality, social acceptance, etc. (Singhal and Bansal, 2013), but they are all annoying (when not even dangerous, if the victims are children or teenagers) for the same reason: they make the online experience unpleasant.

The detection and filtering of offensive language on the web is strongly connected to the Sentiment Analysis field when one decides to handle it automatically. The occurrence of derogatory terms and phrases, or the unjustified repetition of words with a strongly negative connotation, can effectively help with this task.

Although taboo language is generally considered to be the strongest clue of harassment on the web (Xiang et al., 2012; Chen et al., 2012; Reynolds et al., 2011; Xu and Zhu, 2010; Yin et al., 2009; Mahmud et al., 2008), it must be clarified that the presence of bad words in posts does not necessarily indicate the presence of offensive behaviors.

We have already explained that the words in our collection of vulgar terms are in some cases neutral or even positive. Profanity can be used with comical or satirical purposes, and bad words are often just the expression of strong emotions (Yin et al., 2009).

Thus, the well-known censoring approaches based only on keywords, which simply match the words appearing in text messages with offensive words stored in blacklists, cannot be expected to reach high levels of accuracy.

Consistent with this idea, in recent years many studies on offensive language, cyberbullying and flame detection have integrated the bad words’ context into their methods and tools.

Chen et al. (2012) exploited a Lexical Syntactic Feature (LSF) architecture to detect offensive content and identify potentially offensive users in social media, introducing the concept of syntactic intensifier to adjust words’ offensiveness levels based on their context.

Xiang et al. (2012) learned topic models from a dataset of tweets through the Latent Dirichlet Allocation (LDA) algorithm. They combined topical and lexical features into a single compact feature space.

Xu and Zhu (2010) proposed a sentence-level semantic filtering approach that combined grammatical relations with offensive words in order to semantically remove all the offensive content from sentences.

Insulting phrases (e.g. get-a-life, get-lost, etc.) and derogatory comparisons of human beings with insulting items or animals (e.g. donkey, dog) were the clues used by Mahmud et al. (2008) to locate flames and insults in text.
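The limit of keyword-only censoring discussed above can be made concrete with a toy comparison. Everything in this sketch is illustrative: the word lists are tiny, the function names are hypothetical, and the NEG tag on coglione is an assumption (the text only describes it as an insult), whereas the +FORTE tag for cazzo and the positive orientation of strafigo come from the text.

```python
# A plain blacklist treats every vulgar word as offensive.
BLACKLIST = {"cazzo", "coglione", "strafigo"}

# A tagged lexicon keeps the orientation of each word.
TAGS = {
    "cazzo": "FORTE",    # bare intensifier, no polarity of its own
    "strafigo": "POS",   # vulgar but positive
    "coglione": "NEG",   # insult (tag assumed for this sketch)
}


def keyword_filter(tokens):
    """Flag a message if any token is in the blacklist."""
    return any(t in BLACKLIST for t in tokens)


def lexicon_filter(tokens):
    """Flag a message only if it contains a negatively oriented vulgar word."""
    return any(TAGS.get(t) == "NEG" for t in tokens)


positive = "è così bello cazzo".split()
insult = "sei un coglione".split()
print(keyword_filter(positive), lexicon_filter(positive))
print(keyword_filter(insult), lexicon_filter(insult))
```

Even this lexicon-based filter remains a simplification: as the studies cited above show, the syntactic context of the bad word also has to be taken into account.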

3.4 Towards a Lexicon of Opinionated Compounds

3.4.1 Compound Adverbs and their Polarities

Elia (1990) adapted to the Italian language the formal definition of “adverb” of Jespersen (1965), who for the first time also included special kinds of phrases and sentences in the category of adverbs. In detail, the structures included in this category are the following:

• traditional adverbs with a monolithic shape (e.g. volentieri “gladly”)

• traditional adverbs in -mente (e.g. impetuosamente “impetuously”)

• prepositional phrases (e.g. a vanvera “randomly”)

• phrases derived from sentences with finite or non-finite verbs (e.g. da levare il fiato “to take the breath away”)

The first two types of adverbs will be treated separately in Section 3.5.2, because they have been automatically derived from the SentIta adjective dictionary.

Compound adverbs, or frozen or idiomatic adverbs (the last two points of Jespersen’s list), can be defined as

“(...) adverbs that can be separated into several words, with some or all of their words frozen, that is, semantically and/or syntactically noncompositional” (Gross, 1986, p. 2)

Therefore, just as happens in polyrematic expressions, the syntactic elements that compose such adverbs cannot be freely shifted, substituted or deleted. This holds despite the fact that, from a syntactic point of view, they work just like simple adverbs.

Following the LG paradigm, and according to their internal morphosyntactic structure, compound adverbs have been classified in many languages, namely English (Gross, 1986), French (Gross, 1990; Laporte and Voyatzi, 2008; Tolone and Stavroula, 2011), Italian (Elia, 1990; De Gioia, 1994a,b, 2001), Modern Greek (Voyatzi, 2006), Portuguese (Baptista, 2003), etc.

The LG description and classification of compound adverbs represents the intersection of two criteria, corresponding to the phenomenon of multiword expressions and to the adverbial functions (Tolone and Stavroula, 2011; Laporte and Voyatzi, 2008).

According to Laporte and Voyatzi (2008, p. 32):

“(...) a phrase composed of several words is considered to be a multiword expression if some or all of its elements are frozen together, that is, if their combination does not obey productive rules of syntactic and semantic compositionality”

The adverbial functions, instead, pertain to the adverbial roles, which can take the shape of circumstantial complements or of complements which are not objects of the predicate of the clause in which they appear.

In LG descriptions, compound adverbs are formalized as a sequence of parts of speech and syntactic categories. Their morpho-syntactic structures are described on the basis of the number, the category and the position of the frozen and free components of the adverbials.

The Italian classification system for compound adverbs follows the same logic as the French one, with which it shares a strong similarity in terms of syntactic structures.

In this research, among the 16 classes of idiomatic adverbs selected by De Gioia (1994a,b), we worked on the ones reported in Table 3.23. The comparative structures with come “like”, treated by Vietri (1990, 2011) as idiomatic sentences, will be described in Section 3.4.2.

The adverbs belonging to the classes described in the first 9 rows of Table 3.23 have been manually selected and evaluated starting from the electronic dictionaries and the LG tables of Elia (1990), while the classes PF, PV and PCPN come from the LG tables of De Gioia (2001). In the first classes, Elia (1990) recognized the following three abstract structures:

1. Prep (E+Det) (E+Agg) C1 (E+Agg)

(a) PC

(b) PDETC

(c) PAC

(d) PCA

2. Prep (E+Det+Agg) C1 (E+Agg) Prep (E+Det+Agg) C2 (E+Agg)

(a) PCDC

(b) PCPC

Class  Structure          Example                  Translation
PC     Prep C             contro voglia            unwillingly
PDETC  Prep Det C         con le cattive           by any means necessary
PAC    Prep Agg C         a pieni voti             with flying colors
PCA    Prep C Agg         a testa alta             with your head held high
PCDC   Prep C di C        con cognizione di causa  with full knowledge
PCPC   Prep C Prep C      di bene in meglio        from good to better
PCONG  (Prep) C Cong C    d’amore e d’accordo      in harmony
CC     C C                niente male              nothing bad
CPC    C Prep C           l’ira di Dio             a king’s ransom
PF     Compound Sentence  si salvi chi può         every man for himself
PV     Prep V W           seduta stante            on the spot
PCPN   Prep C Prep N      in onta a                in spite of

Table 3.23: Classes of Compound Adverbs in SentIta

3. (E+Prep) (E+Det) C1 (E+Cong+Prep) (E+Det+Agg) C2

(a) PCONG

(b) CC

(c) CPC
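In the LG notation used in these abstract structures, a slot such as (E+Det) means “either the empty element E or a determiner”. A minimal sketch, assuming a list-of-alternatives encoding that is not part of the original resources (`expand` and `structure_1` are hypothetical names), shows how the first structure unfolds into concrete sequences, among which the four classes PC, PDETC, PAC and PCA can be found.

```python
from itertools import product


def expand(pattern):
    """Expand an abstract LG structure into its concrete sequences.

    `pattern` is a list of slots; each slot lists its alternatives,
    with "E" standing for the empty element.
    """
    sequences = []
    for choice in product(*pattern):
        sequences.append(" ".join(item for item in choice if item != "E"))
    return sequences


# Prep (E+Det) (E+Agg) C1 (E+Agg)
structure_1 = [["Prep"], ["E", "Det"], ["E", "Agg"], ["C1"], ["E", "Agg"]]

for sequence in expand(structure_1):
    print(sequence)
```

The expansion yields more combinations than there are classes; which of them actually correspond to attested classes is an empirical matter settled by the LG tables, not by the notation.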

Table 3.24 shows the distribution of each of the evaluation (POS, NEG), intensity (FORTE, DEB) and discourse tags³⁶ (SINTESI, NEGAZIONE, etc.) in every class of compound adverbs.

As we can see in the two lower sections of Table 3.24, the compound adverbs that possess an orientation, or that are significant for Sentiment Analysis purposes, amount to 51% of the total number of compound adverbs under examination. Moreover, it must be observed that about 60% of the labels are connected to intensity, while only about 30% are evaluation tags.

The presence of discourse tags seems to be marginal but, if compared with other parts of speech (or even with the -mente and monolithic adverbs, see Section 3.5.2), it reveals its significance.
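The shares of the three tag families can be recomputed from the row totals of Table 3.24. The sketch below assumes the totals read from the Tot column of that table; the dictionary layout and the name `share` are illustrative.

```python
# Row totals (Tot column) of Table 3.24, grouped by tag family
evaluation = {"POS+FORTE": 21, "POS": 57, "POS+DEB": 15,
              "NEG+DEB": 20, "NEG": 121, "NEG+FORTE": 12}
intensity = {"FORTE": 377, "DEB": 73}
discourse = {"SINTESI": 30, "CONFERMA": 13, "INVERTE": 22,
             "COMP": 5, "NEGAZIONE": 8}

families = {"evaluation": evaluation, "intensity": intensity, "discourse": discourse}
all_tags = sum(sum(rows.values()) for rows in families.values())


def share(rows):
    """Percentage of all tagged entries covered by one tag family."""
    return round(100 * sum(rows.values()) / all_tags)


for name, rows in families.items():
    print(name, sum(rows.values()), share(rows))
```

The rounded shares agree with the last three rows of the table and with the approximate 60% and 30% figures quoted in the text.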

3.4.1.1 Compound Adjectives of Sentiment

A large part of the polyrematic units with an adverbial function can also play the role of adjectives.

Adverbial function: agire a fin di bene “to act for good”

Adjectival function: una bugia a fin di bene “a white lie”

For simplicity, we did not distinguish the cases in which they possess just one function from the cases in which they can play both roles. Moreover, separating these two groups is often impossible if one also considers their invariability and the fact that their function depends on semantic features rather than on morphological ones (Voghera, 2004).

36 These tags annotate 70 discourse operators able to summarize (e.g. in parole povere “in simple terms”), invert (e.g. nonostante ciò “nevertheless”), confirm (e.g. in altri termini “in other terms”), compare (e.g. allo stesso modo “in the same way”) and negate (e.g. neanche per sogno “in your dreams”) the opinion expressed in the previous sentences of the text.

AVV+AVVC+PC+PN without adjectival function: in conclusione “in conclusion”

AVV+AVVC+PC+PN with adjectival function: a vanvera (e.g. parlare a vanvera “to prattle”, un discorso a vanvera “a prattle”)

Therefore, we did not make any distinction in the dictionaries, in order to avoid the duplication of entries, but we also recalled these units in the grammars dedicated to adjectives as noun modifiers and in constructions of the kind N0 (essere + E) Agg Val (see Section 3.3.5).

3.4.2 Frozen Expressions of Sentiment

In this thesis, oriented words are always considered in context. In this paragraph we focus on the importance that frozen sentences can have in an LG-based module for Sentiment Analysis.

Traditional and generative grammars generally treat idioms as aberrations, and many taggers and corpus linguistics tools just ignore multiword units and frozen expressions (Vietri, 2004; Silberztein, 2008).

With Gross (1982), the LG paradigm started the systematic and formal study of frozen sentences, underlining their non-exceptional nature from both the lexical and the syntactic point of view. Indeed, idioms occupy in the lexicon a volume that is comparable with that of free forms (Gross, 1982).

This kind of approach opens plenty of possibilities in the field of Sentiment Analysis, in which the collection of lexical resources endowed with a defined polarity cannot stop at simple words, but needs to be opened also to different kinds of oriented multiword expressions and frozen sentences.

Tag                   PC   CC  CPC  PAC  PCA  PCDC  PCONG  PCPC  PDETC   PF   PV  PCPN   Tot
POS+FORTE              0    6    0    4    2     0      2     1      5    1    0     0    21
POS                    1    5    2    9   15    10      1     0     14    0    0     0    57
POS+DEB                0    0    0    3    2     2      2     0      5    1    0     0    15
NEG+DEB                0    0    1    0    4     0      1     3      7    3    1     0    20
NEG                   36   15    6   15    5     6      2     5     19    7    0     5   121
NEG+FORTE              0    0    0    1    1     1      2     2      3    1    1     0    12
FORTE                102   40    5   49   26    13     35    18     76    8    5     0   377
DEB                   23   13    1   14    4     4      5     2      6    1    0     0    73
SINTESI               13    2    0    6    1     2      0     0      1    0    5     0    30
CONFERMA               5    2    0    0    0     0      0     0      5    0    0     1    13
INVERTE                4    2    0    8    0     0      0     0      7    0    1     0    22
COMP                   3    0    0    1    0     0      0     0      1    0    0     0     5
NEGAZIONE              3    0    0    0    0     0      0     2      1    0    1     1     8
Tot                  190   85   15  110   60    38     50    33    150   22   14     7   774
Tot Class            872  346  104  316  320   248    132   127    621  202  124    53  3321
Percentage (%)        22   25   14   35   19    15     38    26     23   11   11    13    51
Evaluation tags (%)   19   31   60   29   48    50     20    33     35   59   14    71    32
Intensity tags (%)    66   62   40   57   50    45     80    61     55   41   36     0    58
Discourse tags (%)    15    7    0   14    2     5      0     6     10    0   50    29    10

Table 3.24: Composition of the compound adverb dictionary

According to Vietri (2014b, p. 32):

“(...) a verb, when co-occurring with a certain noun (or a set of nouns), produces a “special meaning” that it would not have been assigned if the noun was substituted. This type of meaning is conventionally defined “non-compositional”, but this does not automatically imply, from a syntactic point of view, that idioms are “units”.”

On the basis of this statement, strictly connected to the distributional properties of idiomatic sentences, we included in our sentiment database more than 500 Italian frozen sentences, which have been manually evaluated and then formalized with dictionary-grammar pairs.

This choice is connected to the purpose of lemmatizing them in lexical databases, taking into account, through FSA, also their syntactic flexibility and lexical variation. Treating idioms computationally means dealing with the problems they pose in terms of the relationship between “interpretation” and “syntactic constructions” (Vietri, 2014b).

The source of the work is the Lexicon-Grammar description of idioms, which takes the shape of binary tables in which lexical entries are described on the basis of their syntactic and semantic properties.

The formal notation is the traditional one used in the LG framework. The symbol C is used in frozen sentences to indicate a fixed nominal position that cannot be substituted by different items belonging to the same class or semantic field and that cannot be morpho-grammatically modified (Vietri, 2004).

The Lexicon-Grammar classification of Italian idioms includes more than 30 classes of idioms with ordinary verbs and support verbs. The most common support verbs are avere “to have”, essere “to be” and fare “to make”. They differ from their ordinary counterparts because of their semantic emptiness. When they are involved in the formation of idiomatic constructions, they present a very high lexical and syntactic flexibility, which can take the shape of the following phenomena (Vietri, 2014d):

• the alternation of support verbs with aspectual variants (e.g. rimanere “to remain”, diventare “to become”)

• the production of causative constructions (e.g. with verbs like rendere “to make”)

• the deletion of the support verb (e.g. N0 Agg come C1: una donna astuta come una volpe “a woman as sharp as a tack”)

Because of the complexity of their lexical and syntactic variability, which often affects the intensity of the idiom itself, we chose to restrict the sample of idioms under examination for Sentiment Analysis purposes to the frozen sentences with the support verb essere that include adjectives in their structure, namely PECO, CEAC, EAA, ECA and EAPC. Since a large part of the idioms belonging to this subgroup contains adjectives evaluated in SentIta, we found it interesting to compare the prior polarity of the adjectives involved in idioms with the polarity assigned to the idiom itself. The figures on which this comparison is based are reported in Table 3.25.

                          PECO  CEAC  EAA  ECA  EAPC  Tot
Idioms in the LG Class     274    36   40  153   162  665
Idioms in SentIta          245    27   32  134   139  577
Adj of SentIta in Idioms   133    12   15   37    56  253

Table 3.25: Polar adjectives in frozen sentences

3.4.2.1 The Lexicon-Grammar Classes Under Examination: PECO, CEAC, EAA, ECA and EAPC

Vietri (1990) showed that the comparative frozen sentences of the type N0 Agg come C1 (PECO) usually intensify the polarity of the adjective of sentiment they contain, as happens in (36), in which the SentIta adjective bello, endowed with a polarity score of +2, changes its polarity to +3 when it occurs in the idiom essere bello come il sole.

(36) Maria è bella[+2] come il sole [+3]
“Maria is as beautiful as the sun”

Actually, it is also possible for an idiom of this sort to be polarized when the adjective contained in it is neutral (e.g. bianco “white” in (37)), or even to reverse the polarity of the adjective, as happens in (38) (agile “agile” is positive).

(37) Maria è bianca[0] come un cadavere [-2]
“Maria is as white as a dead body” (Maria is pale)

(38) Maria è agile[+2] come una gatta di piombo [-2]
“Maria is as agile as a lead cat” (Maria is not agile)

In this regard, it is interesting to notice that 84% of the idioms have a clear SO, while just 36% of the adjectives they contain are polarized (see the examples in Table 3.26). This confirms the significance of a sentiment lexicon that also includes frozen expressions in its list. Indeed, a resource of this sort is able to disambiguate the cases in which the polarity of sentiment (or neutral) words is changed by the ironic nature of the idioms in which they occur (38), (39), (40).

There is only one disambiguation rule, which consists in always annotating the longest match in the text analysis.

(39) Maria è alta[0] come un soldo di cacio [-2]
“Maria is knee-high to a grasshopper” (Maria is short)

(40) Maria è asciutta[0] come l’esca [-2]
“Maria is on the bones of her arse” (Maria is penniless)
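The longest-match rule mentioned above can be sketched as follows. This is a simplified illustration over inflected word forms, with a hypothetical `LEXICON` that mixes single adjectives and frozen sequences; the actual resources work on lemmas through NooJ dictionaries and grammars and also handle discontinuous idioms.

```python
# Toy lexicon: token sequences mapped to polarity scores taken from examples (37)-(39)
LEXICON = {
    ("agile",): 2,
    ("bianca",): 0,
    ("alta",): 0,
    ("agile", "come", "una", "gatta", "di", "piombo"): -2,
    ("bianca", "come", "un", "cadavere"): -2,
    ("alta", "come", "un", "soldo", "di", "cacio"): -2,
}
MAX_LEN = max(len(key) for key in LEXICON)


def annotate(tokens):
    """Annotate a token list, always preferring the longest lexicon match."""
    annotations = []
    i = 0
    while i < len(tokens):
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            span = tuple(tokens[i:i + n])
            if span in LEXICON:
                annotations.append((" ".join(span), LEXICON[span]))
                i += n
                break
        else:
            i += 1  # no entry starts at this position
    return annotations


print(annotate("Maria è agile come una gatta di piombo".split()))
print(annotate("Maria è agile".split()))
```

Because the idiom is tried before the bare adjective, agile receives its prior polarity only when it does not open a longer frozen sequence.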

Frozen Sentence                          Translation                        Adj Polarity Score  Idiom Polarity Score
essere agile come una gatta di piombo    “to be as agile as a lead cat”     POS +2              NEG -2
essere agile come una gazzella           “to be as agile as a gazelle”      POS +2              POS +2
essere agile come uno scoiattolo         “to be as agile as a squirrel”     POS +2              POS +2
essere astuto come il demonio            “to be as clever as the devil”     - 0                 POS +2
essere astuto come una volpe             “to be as sharp as a tack”         - 0                 POS +2
essere astuto come uno zingaro           “to be as cunning as a gypsy”      - 0                 NEG -2

Table 3.26: Examples of idioms that maintain, switch or shift the prior polarity of the adjectives they contain

Other idioms included in our resources are the ones belonging to the class EAPC, which involve the presence of an adjective, or of a verb in its past participle form, and a prepositional complement. These frozen expressions usually intensify the polarity of the adjective/past participle (see the example below) but, just like the PECO ones, they can also switch it or shift it.

(41) Maria è matta[-2] da legare [-3]
“Maria is so crazy she should be locked up”

The LG class EAA, with definitional structure N0 essere Agg e Agg, includes two adjectives that can be polarized or not. Example (42) shows that, in this case as well, the sentence polarity can be the opposite of that of the adjectives.

(42) Maria è bella[+2] e fritta[0] [-2]
“Maria is cooked”

An example from the class CEAC is reported in (43).

(43) L’anima è nera[0] come il carbone [-3]
“The soul is as black as coal”

Among the transformations that these idioms can undergo, we included N0 avere C Agg (44), which is also the most frequent one.

(44) Maria ha l’anima nera[0] come il carbone [-3]
“Maria has a soul as black as coal”

Finally, we formalized the LG class ECA, whose frozen elements include the nominal element C1 and the adjective.

(45) Maria è una gatta morta[0] [-2]
“Maria is a cock tease”

3.4.2.2 The Formalization of Sentiment Frozen Expressions

Due to their syntactic and lexical variation and to their discontinuity, frozen expressions need to be formalized in an electronic context able to systematically list them in electronic databases and to take into account their syntactic variability. Basically, the final purpose is to take advantage of the large amount of information listed in the LG matrices while recognizing and annotating idioms in real texts.

One of the best solutions is to link dictionaries, which contain the lists of Atomic Linguistic Units (ALU) and their syntactic and semantic properties, with syntactic grammars, which can make such words and properties interact in an FSA context (Silberztein, 2008). ALU are “the smallest elements that make up the sentence, i.e. the non-analyzable units of the language” (Silberztein, 2003). They can take the shape of simple words, affixes, multiword expressions and frozen expressions.

What distinguishes frozen expressions from the other ALU is discontinuity: as displayed in Figure 3.4, it is in fact possible for adverbs to occur between the verb and the direct object (Vietri, 2004). That is why frozen sentences need both dictionaries and FSA to be recognized and annotated correctly: the dictionaries allow the recognition of idioms thanks to their characteristic components (word forms or sequences that occur every time the expressions occur and that trigger the recognition of the frozen expressions in texts (Silberztein, 2008)), and the FSA let the machine compute them despite the many different forms that they can assume in real texts.

Figure 3.4 represents a simplified version of the main graph of the FSA for the recognition and the sentiment annotation of idioms of the LG class PECO. It includes a metanode that allows the recognition of a nominal group in subject position (N0), an embedded graph for the verb and six nodes related to different Semantic Orientations.

Figure 3.4: Extract of the FSA for the SO identification of idioms

The instruction XREF is used to exclude the linguistic material that can occur between the support verb and the idiom object when the annotation is performed. Figure 3.5 shows, as an example, the content of the metanode that annotates frozen expressions with +3 as Semantic Orientation (MOLTOPOS). Here we observe the interactions between the idiom dictionary and the restrictions reported in the syntactic grammar. In its structure, the grammar under examination looks exactly the same as the grammar for the identification of free sentences with the same syntactic shape. The difference lies in the syntactic and distributional constraints, which in the grammar recall only the properties and the lexical items associated in the dictionary with a specific characteristic component. To be more precise, the PECO dictionary lists, among others, the three entries shown below, which have been labeled with the semantic tags POS and FORTE (+3), as happens in the other subsets of SentIta.

Essere (forte + coraggioso) come un leone “To be (strong + brave) like a lion”

leone,N+Class=PECO
+A=coraggioso+A=forte+AVV=come+AVV=quanto+DET=un
+Sent=POS+Int=FORTE
+N0um
+Vsup=essere+Vsup=diventare+Vsup=restare+Vsup=rimanere
+Vcaus=rendere
+N0essereunC1+N0esserecomeC1+N0esserepiu

Essere buono come il pane “To be as good as bread”

pane,N+Class=PECO
+A=buono+AVV=come+AVV=quanto+DET=il+PREP=di
+Sent=POS+Int=FORTE
+N0um
+Vsup=essere+Vsup=diventare+Vsup=restare+Vsup=rimanere
+Vcaus=rendere
+N0esserediC1+N0esserecomeC1+N0esserepiu

Essere (astuto + furbo) come il demonio “To be as clever as the devil”

demonio,N+Class=PECO
+A=astuto+A=furbo+AVV=come+AVV=quanto+DET=il
+Sent=POS+Int=FORTE
+N0um
+Vsup=essere+Vsup=diventare+Vsup=restare+Vsup=rimanere
+Vcaus=rendere
+N0esserepiu

As shown above, the characteristic components C (e.g. leone “lion”, pane “bread”, demonio “devil”) have been used as NooJ entries in the electronic dictionary and have been associated with the other components of the frozen expressions (e.g. +A, +AVV, +DET). Moreover, they have also been described with other information, referring to the following properties borrowed from the already cited LG tables:

distributional: e.g. +N0um, +Vsup, +Vcaus

transformational: e.g. +N0essereunC1, +N0esserecomeC1, +N0esserediC1, +N0esserepiu
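To make the structure of these dictionary entries explicit, the following sketch splits one entry into its lemma, category, valued properties and bare flags. It assumes that an entry is written on a single line as `lemma,CATEGORY+feature+...`, with the comma separating lemma and category; the function name `parse_entry` and the returned layout are hypothetical and do not belong to NooJ.

```python
def parse_entry(line: str) -> dict:
    """Split a dictionary line into lemma, category, valued properties and flags."""
    lemma, rest = line.split(",", 1)
    category, *features = rest.split("+")
    properties = {}   # e.g. {"A": ["coraggioso", "forte"], ...}
    flags = set()     # e.g. {"N0um", "N0essereunC1", ...}
    for feature in features:
        if "=" in feature:
            name, value = feature.split("=", 1)
            properties.setdefault(name, []).append(value)
        else:
            flags.add(feature)
    return {"lemma": lemma, "category": category,
            "properties": properties, "flags": flags}


LEONE = ("leone,N+Class=PECO+A=coraggioso+A=forte+AVV=come+AVV=quanto+DET=un"
         "+Sent=POS+Int=FORTE+N0um"
         "+Vsup=essere+Vsup=diventare+Vsup=restare+Vsup=rimanere"
         "+Vcaus=rendere+N0essereunC1+N0esserecomeC1+N0esserepiu")

entry = parse_entry(LEONE)
print(entry["properties"]["A"])
print(sorted(entry["flags"]))
```

Repeated properties such as +A or +Vsup are collected into lists, which mirrors the fact that one characteristic component can combine with several adjectives and support verbs.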

All these properties are recalled in the idiom syntactic grammar in the form of lexical and grammatical constraints.

The restrictions +N0um and +Vsup, +Vcaus are respectively meant for the distributional properties of the sentence subject, which can or cannot be human, and of the support verb essere, with its aspectual (Vsup) or causative (Vcaus) variants. While the support verbs restare and rimanere are always acceptable, diventare and rendere are unacceptable only in a few cases (Vietri, 1990).

We chose to formalize the distributional restrictions in the syntactic grammars with the following instruction, written under each involved node and inside a related variable (var):

<$var=$C$var>

It represents a constraint on the words that are allowed to appear in the nodes it restricts, which can only be the ones indicated in the dictionary by the same variable var for the entry C. As an example, the idiom essere (forte + coraggioso) come un leone, which in the dictionary has as values for the variable +A only the adjectives forte “strong” and coraggioso “brave”, cannot accept in the variable A of the grammar any other adjective, if the variable goes with the restriction shown below:

<$A=$C$A>

The same happens with the aspectual and causative variants of the verb (Vietri, 1990), with similar restrictions:

<$Vvar=$C$Vvar>
<$Vcaus=$C$Vcaus>
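The effect of a constraint such as <$A=$C$A> can be emulated outside NooJ with a simple membership test. The sketch below is only an approximation: the `LEONE` dictionary and the function `satisfies` are hypothetical, words are compared as lemmas, and inflection is not handled.

```python
# Values stored in the dictionary for the characteristic component leone
LEONE = {
    "A": ["coraggioso", "forte"],
    "Vsup": ["essere", "diventare", "restare", "rimanere"],
    "Vcaus": ["rendere"],
}


def satisfies(entry: dict, var: str, word: str) -> bool:
    """True if `word` is among the values that the entry lists for `var`."""
    return word in entry.get(var, [])


print(satisfies(LEONE, "A", "forte"))
print(satisfies(LEONE, "A", "bello"))
print(satisfies(LEONE, "Vcaus", "rendere"))
```

In this way the grammar stays generic, while each idiom carries its own list of admissible adjectives and verbs in the dictionary.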

The symbols ldquo=rdquo is used in Figure 35 when the inflection of theword in the variable is permitted (eg for the adjective essere furbicome il demonio) Otherwise it is used the symbol ldquo=rdquoWe preferred to treat transformational properties instead differentlyfrom Vietri (1990) In order to reduce the dimension of the gram-mar that contains here also the polarity and intensity informationwe wrote transformational constraints before the path starts in the fol-lowing form

lt$C$vargt

This instructs the FSA that the specific path is allowed only for the idioms that present, in the dictionary, the variable var for the entry C. In the FSA in Figure 3.5, the standard PECO structure Essere Agg come C1 is recognized by path (2). Path (1) deletes the adjective if the idiom recognized by C possesses the specific transformational rule +N0esserecomeC1. Examples of idioms tested against this constraint are reported below; the ones preceded by an asterisk do not satisfy the restriction.

essere come un leone “to be like a lion”
essere come il pane “to be like the bread”
essere come il demonio “to be like the devil”

In path (3) the grammar performs the deletion of both the adjective and the adverb come for the idioms endowed with the property +N0essereunC1 in the electronic dictionary.

essere un leone (di uomo) “to be a lion (man)”
*essere un pane “to be a bread”
essere un demonio (di uomo) “to be a devil (man)”

Figure 3.5: Extract of the FSA for the extraction of PECO idioms

Unlike essere un pane, which is not acceptable at all, essere come il demonio and essere un demonio (di uomo) are acceptable sentences that change the polarity of the basic idiom, probably because they are related to another comparative sentence with opposite meaning and orientation, e.g. essere (brutto + cattivo) come il demonio “to be (ugly + evil) like the devil” (Vietri, 1990).

The last two paths refer, respectively, to the correlation of PECO with other sentence structures, such as N0 essere di C1 (4), indicated by the property +N0esserediC1, and N0 essere più Agg di C, summarized in +N0esserepiu in the dictionary and in path (5).

essere (fatto + E) di pane “to be made of bread”
*essere (fatto + E) di leone “to be made of lion”
*essere (fatto + E) di demonio “to be made of devil”

essere più (forte + coraggioso) di un leone “to be (stronger + braver) than a lion”
essere più buono del pane “to be better than bread”
essere più (astuto + furbo) del demonio “to be cleverer than the devil”

While only the first idiom is allowed to go through path (4), path (5) implies a property accepted by all the exemplified idioms.
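The way transformational properties gate the paths of the FSA can be summarized with a small Python sketch. It is only an illustration, not the NooJ grammar: the data layout and function name are assumptions, and the property sets include only what the discussion above states explicitly (whether pane carries +N0esserecomeC1 is not recoverable from the text, so it is left out here).

```python
# Property required by each optional path; path (2), the standard PECO
# structure, needs no transformational property.
PATH_PROPERTY = {
    1: "N0esserecomeC1",  # adjective deleted
    3: "N0essereunC1",    # adjective and "come" deleted
    4: "N0esserediC1",    # N0 essere di C1
    5: "N0esserepiu",     # N0 essere più Agg di C
}

# Illustrative entries, limited to what the text states about these idioms.
IDIOM_PROPERTIES = {
    "demonio": {"N0esserecomeC1", "N0essereunC1", "N0esserepiu"},
    "pane": {"N0esserediC1", "N0esserepiu"},
}


def allowed_paths(noun):
    """Return the sorted list of FSA paths open to the idiom built on `noun`."""
    properties = IDIOM_PROPERTIES.get(noun, set())
    paths = [2]  # the standard structure is always available
    paths += [p for p, prop in PATH_PROPERTY.items() if prop in properties]
    return sorted(paths)
```

With these entries, the idiom built on demonio can go through paths 1, 2, 3 and 5, while the one built on pane is limited to paths 2, 4 and 5.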

3.5 The Automatic Expansion of SentIta

The present Section discusses the automatic enlargement of the manually built electronic dictionaries of SentIta.

Our work took advantage of derivational linguistic clues that relate semantically oriented adjectives to quality nouns and to adverbs in -mente. The purpose is to make new words automatically inherit the semantic information associated with the adjectives to which they are morpho-phonologically related.

Furthermore, we used as morphological Contextual Valence Shifters (mCVS) a list of prefixes able to negate or to intensify/downtone the orientation of the words with which they occur.

We clarify in advance that the morphological method could also have been applied to Italian verbs, but we chose to avoid this solution because of the complexity of their argument structures. We decided instead to manually evaluate all the verbs described in the Italian Lexicon-Grammar binary tables, so that we could preserve the different lexical, syntactic and transformational rules connected to each of them.

3.5.1 Literature Survey on Sentiment Lexicon Propagation

The works on sentiment lexicon propagation follow three main research lines. The first one is grounded on the richness of already existing thesauri, WordNet (Miller, 1995) among others. Although WordNet does not include semantic orientation information for its lemmas, semantic relations such as synonymy or antonymy are commonly used to automatically propagate the polarity, starting from a manually annotated set of seed words. However, this approach presents some drawbacks, such as the lack of scalability, the unavailability of enough resources for many languages, including Italian, and the difficulty of handling newly coined words, which are not yet contained in the thesauri. Furthermore, homographs which belong to different synsets could present a diversity of meanings and polarities (Silva et al., 2010).

The second line of research is based on the hypothesis that words conveying the same polarity appear close to each other in the same corpus, so the propagation can be performed on the basis of co-occurrence algorithms.

Finally, the morphological approach employs morphological structures and relations for the assignment of prior sentiment polarities to unknown words, on the basis of the manipulation of the morphological structures of known lemmas.

In the following Paragraphs we will briefly describe the most interesting contributions pertaining to the research areas just described.

3.5.1.1 Thesaurus Based Approaches

Kamps et al. (2004) investigated the graph-theoretic model of WordNet’s synonymy relation and, using elementary notions from graph theory, proposed a measure for the automatic attribution of semantic orientation to adjectives. All the words listed in WordNet were collected and related on the basis of their occurrence in the same synset. Although current WordNet-based measures of distance or similarity usually focus on taxonomic relations and, in general, include the antonymy relation in their algorithms, the authors preferred to use only the synonymy relation.

Hu and Liu (2004) chose to limit the lexicon construction to the adjectives occurring in sentences that contained one or more product features, given their specific interest in product features. In order to predict the semantic orientations of such adjectives, the authors proposed a simple and effective method grounded on the adjective synonym and antonym sets discovered by traversing the WordNet graphs.

Kim and Hovy (2004) proposed a system for the automatic detection of opinion holders, topics and features for each opinion. It contains two modules, one for the determination of word sentiment and another for sentence sentiment. In their system architecture, the stage of interest for the task at hand is the one in which the word sentiment classifier individually calculates the polarity of all the sentiment-bearing words. Their basic approach started from a small number of seed words, sorted by polarity into a positive and a negative list. WordNet was used to lengthen the initial lists on the basis of two very intuitive insights: synonyms of polar words usually preserve the orientation of the original lemma, and antonyms of polar words reverse it.

Esuli and Sebastiani (2005) presented a semi-supervised learning method for identifying the orientation of subjective terms, based on the classification of their glosses. The authors started from a representative seed set built for both the positive and negative categories and from some lexical relations defined in WordNet (e.g. synonymy, hypernymy and hyponymy for the annotation of terms with the same orientation, and direct and indirect antonymy for the terms with opposite orientation). Every time a new term is added to the original ones, it becomes part of the training set for the learning phase, performed with a binary text classifier. All the glosses of a given term found in a machine-readable dictionary are converted into a vectorial form thanks to standard text indexing techniques.

Esuli and Sebastiani (2006b) tested the following approaches:

• a two-stage method, in which a first classifier groups the terms into the Subjective or Objective categories and another classifier tags the Subjective terms as Positive or Negative;

• two other binary classifiers based on learning algorithms, which categorise the terms belonging to the Positive/not-Positive categories and the ones fitting the Negative/not-Negative categories;

• a method that simply considers Positive, Negative and Objective as three categories with equal status.

Argamon et al. (2009) grounded on Appraisal Theory a method for the automatic determination of complex sentiment-related attributes (e.g. type and force). The authors applied supervised learning to WordNet glosses. The method started from small training sets of (positive or negative) seed words and then added to them new sets of terms collected in the WordNet graph along the synonymy and antonymy relations. In this way, based on the WordNet glosses, each term can receive a vectorial representation thanks to standard text indexing techniques.

Dragut et al. (2010) based their research on a few basic assumptions:

• the semantic relations between words with known polarities can be used to enlarge sentiment word dictionaries;

• two words are related if they share at least one synset;

• each synset has a unique polarity;

• the deduction can be performed on just a few WordNet semantic relations, e.g. hyponymy, antonymy and similarity.

Thanks to these intuitions, given a sentiment seed dictionary, they deduced approximately an additional 50% of words endowed with a specific polarity on top of the electronic dictionary WordNet.

Hassan and Radev (2010) applied a Markov random walk model to a large word relatedness graph, with the purpose of producing a polarity estimate for subjective words. In detail, two nodes of their network are linked if they are semantically related. Among the sources of information used as indicators of the relatedness of words, we mention WordNet. The hypothesis of the work is the following: a random walk that starts from a given word is more likely to hit first a word with the same semantic orientation than words with a different orientation.

Paulo-Santos et al. (2011), with a semi-supervised approach, created a small polarity lexicon (3000+ entries, 84.86% accuracy) for the Portuguese language, having as input only a limited collection of resources: a set of ten seed words labelled as positive or negative, a common online dictionary, a set of extraction rules and an intuitive polarity propagation algorithm. The authors implemented their task by converting the online dictionary into a directed graph, in which nodes correspond to words and edges correspond to synonyms, antonyms or other semantic relations between words. They then propagated the polarities of the known seed set to unlabelled words by applying a breadth-first graph traversal.

In their research, Maks and Vossen (2011) explored two approaches for the annotation of the polarity (positive, negative and neutral) of adjective synsets in Dutch, using WordNet as lexical resource. The first approach is based on the translation of the English SentiWordNet 1.0 (Esuli and Sebastiani, 2006b) into Dutch and the respective transfer of the lemmas’ polarity values. In the second approach, WordNet is used in combination with a propagation algorithm that starts from a seed list of synsets of known sentiment and extends the polarity values through WordNet, making use of its lexical relations.
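The general idea shared by several of the thesaurus based works, namely that synonyms preserve the polarity of a seed and antonyms reverse it, can be sketched as a breadth-first propagation over a word graph. The sketch below is a generic illustration and does not reproduce the algorithm of any of the cited authors; the toy graph, the relation labels and the function name are assumptions.

```python
from collections import deque


def propagate_polarity(graph, seeds):
    """Breadth-first propagation of polarity from seed words.

    graph: {word: [(neighbour, relation), ...]} with relation "syn" or "ant"
    seeds: {word: +1 or -1}
    """
    polarity = dict(seeds)
    queue = deque(seeds)
    while queue:
        word = queue.popleft()
        for neighbour, relation in graph.get(word, []):
            if neighbour in polarity:
                continue  # keep the first label assigned (closest to a seed)
            sign = 1 if relation == "syn" else -1
            polarity[neighbour] = sign * polarity[word]
            queue.append(neighbour)
    return polarity


# Toy graph with made-up relations, for illustration only.
TOY_GRAPH = {
    "good": [("nice", "syn"), ("bad", "ant")],
    "bad": [("awful", "syn")],
}
LABELS = propagate_polarity(TOY_GRAPH, {"good": 1})
```

Starting from the single positive seed good, the traversal labels nice as positive and bad and awful as negative.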

3.5.1.2 Corpus Based Approaches

Baroni and Vegnaduzzo (2004) observed the co-occurrences of polar adjectives in subjective texts. The ranking of a large list of adjectives according to a subjectivity score was implemented without any knowledge-intensive external resource (e.g. lexical databases, human annotation, etc.). The authors made use only of an unannotated list of adjectives and a small seed set of manually selected subjective adjectives. The main idea of this work is taken from the work of Turney (2001) on co-occurrence statistics applied to the web as a corpus, through the Web-based Mutual Information (WMI) method. Baroni and Vegnaduzzo (2004) obtained their subjectivity scores by computing the Mutual Information of pairs of adjectives taken from each set, using frequency and co-occurrence frequency counts on the World Wide Web, collected through queries to the AltaVista search engine.

Kanayama and Nasukawa (2006b) proposed an unsupervised method of lexicon construction for the annotation of polar clauses for domain-oriented Sentiment Analysis. They grounded their work on unannotated corpora and anchored their research on context coherency (the tendency for equal polarities to appear successively in contexts), achieving very satisfying results in terms of Precision.

Qiu et al. (2009) proposed a double propagation method, which involved different kinds of relations between both sentiment words and features (words modified by the sentiment words). With this method it was possible to locate specific sentiment words in relevant reviews, starting from a small set of seed sentiment words. Their research emphasized the fact that, in reviews, sentiment words always occur in association with features. Double propagation basically means that both sentiment words and features can be reused to extract new sentiment words and new features, until no more sentiment words or features can be identified to continue the process. The relations are described above all by Dependency Grammars and trees (Tesnière, 1959), which represented the basis for the design of the extraction rules. In detail, the polarity of the new words is inferred using the following rules:

• Heterogeneous rule: the same polarity of known words is assigned to the novel words;

• Homogeneous rule: the same polarity of known words is assigned to the novel words, unless contrary words occur between them;

• Intra-review rule: in the cases in which the polarity cannot be inferred from dependency cues, it is assumed that the sentiment word takes the polarity of the whole review.

Maks et al. (2012) proposed a method for the Dutch language based on the idea that words expressing different types of subjectivity are distributed differently depending on the text type; therefore, they grounded their work on the comparison between three corpora: Wikipedia, News and News comments. In order to perform their task, they used and tested two different calculations generally employed in Keyword Extraction tasks to identify the words that are significantly frequent in a corpus with respect to other corpora: the Log-likelihood Technique and the Percentage Difference Calculation (DIFF). Their research revealed the better performance of the DIFF calculation.

Wawer (2012) introduced for the Polish language a novel iterative technique of sentiment lexicon expansion, which involved rule mining and contrast sets discovery. The research took advantage of two different textual resources: in a first stage, a small corpus endowed with morpho-syntactic annotations (The National Corpus of Polish), used to acquire candidates for emotive patterns and to evaluate the vocabulary; then the whole web as a corpus, used to perform the lexical expansion. The key idea of this work is that, if a word appears in the same extraction patterns as the seeds, then they belong to the same semantic class.
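As a reminder of how co-occurrence counts can be turned into an association score, the following Python sketch computes the standard pointwise mutual information of two words from frequency counts. It is a textbook formulation, not the exact WMI procedure of the cited works; the function name and the counts used in the comments are made up for illustration.

```python
import math


def pmi(hits_x, hits_y, hits_xy, total):
    """Pointwise mutual information, log2(p(x, y) / (p(x) * p(y))),
    estimated from raw counts over `total` documents (or web pages)."""
    if min(hits_x, hits_y, hits_xy) <= 0:
        return float("-inf")  # no evidence of co-occurrence
    return math.log2((hits_xy * total) / (hits_x * hits_y))


# Independent words score 0; words that always co-occur score high.
independent = pmi(hits_x=100, hits_y=100, hits_xy=10, total=1000)
associated = pmi(hits_x=10, hits_y=10, hits_xy=10, total=1000)
```

A candidate adjective that co-occurs with the seed adjectives more often than chance would predict receives a positive score, which can then be used to rank the candidates.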

3.5.1.3 The Morphological Approach

Hatzivassiloglou and McKeown (1997) proposed a method for sentiment lexicon expansion based on morphological relations between adjectives. They demonstrated that adjectives related in form almost always have different semantic orientations (e.g. “adequate-inadequate”, “thoughtful-thoughtless”), achieving 97% accuracy.

Moilanen and Pulman (2008) proposed five methods for the assignment of prior sentiment polarities to unknown words, based on known sentiment carriers. They started from a core sentiment lexicon, which contained 41109 entries tagged with positive (+), neutral (N) or negative (-) prior polarities, and then employed a classifier society of five rule-driven classifiers, each of which adopted a specific analytical strategy:

Conversion: estimated zero-derived paronyms by retagging the unknown words with different POS tags.

Morphological derivation: relied on derivational and inflectional morphology and was based on the progressive transformation of words into shorter paronymic aliases by using affixes and neo-classical combining forms. Some polarity reversal affixes (e.g. -less and not-so-) were also supported.

Affix-like Polarity Markers: computed affix-like sentiment markers (e.g. well-built, badly-behaving), which usually propagate their sentiment orientation across the whole word.

Syllables: split unknown words into individual syllables with a rule-based syllable chunker and then processed them in order to connect them with the words contained in the original lexicon.

Shallow Parsing: split unknown words into virtual sentences and POS-tagged them.

Ku et al. (2009) employed morphological structures and relations for opinion analysis on Chinese words and sentences. They classified the words into eight morphological types through Support Vector Machine (SVM) and Conditional Random Field (CRF) classifiers. Chinese morphological formative structures consist of three major processes: compounding, affixation and conversion. Every word is composed of one or more Chinese characters, on the basis of which the word’s meaning can be interpreted. The authors demonstrated that the injection of morphological information can truly improve the performance of the word polarity detection task.

Neviarouskaya (2010) proposed the expansion of the SentiFul lexicon, grounding the task on the manipulation of the morphological structure (base and affixes) of known lemmas (Plag, 2003). The derivation process is a linguistic device for the creation of new lemmas from known ones by adding prefixes or suffixes. The application of this method produced 4000+ new derived and scored sentiment words (1400+ adjectives, almost 500 adverbs, 1800 nouns and almost 350 verbs). The author distinguished four types of affixes on the basis of their roles in the sentiment feature attribution:

Propagating: the sentiment features are preserved and inherited by the new derived lexical units, which often belong to different grammatical categories (e.g. en-, -ous, -fy). The scoring function transfers the original polarity score to the new word without any variation.

Reversing: the affixes have the effect of changing the semantic orientation of the original lexeme (e.g. dis-, -less). The scoring function simply switches the original score of the starting lemma.

Intensifying: the sentiment features are strengthened (e.g. over-, super-). The scoring function increases the score of the new word with respect to the score of the original lemma.

Weakening: the sentiment features are decreased (e.g. semi-), so the score of the derived word is proportionally reduced by the scoring function.
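The four affix roles can be illustrated with a small Python sketch of a scoring function. The affix inventory comes from the examples above, but the numeric factor, the score range of [-1, 1] and all names are assumptions introduced only for illustration; they are not the values used in SentiFul.

```python
# Affix roles taken from the examples in the list above.
AFFIX_ROLE = {
    "en-": "propagating", "-ous": "propagating", "-fy": "propagating",
    "dis-": "reversing", "-less": "reversing",
    "over-": "intensifying", "super-": "intensifying",
    "semi-": "weakening",
}

FACTOR = 1.5  # hypothetical strengthening/weakening factor


def derived_score(base_score, affix):
    """Score of a derived word, given the score of its base in [-1, 1]."""
    role = AFFIX_ROLE[affix]
    if role == "propagating":
        return base_score
    if role == "reversing":
        return -base_score
    if role == "intensifying":
        return max(-1.0, min(1.0, base_score * FACTOR))
    return base_score / FACTOR  # weakening
```

For a base word scored 0.4, a propagating affix leaves the score unchanged, a reversing affix yields -0.4, and the intensifying and weakening affixes move the score away from or towards zero.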

In the approach of Neviarouskaya (2010), if a word is not already contained in the SentiFul lexicon, the algorithm checks the presence of the missing lemma in the WordNet database: if the word actually exists there, it is taken into account for a future inclusion in the SentiFul lexicon. Its sentiment orientation is always deduced from the features of the starting word and the affixes used to derive it.

Neviarouskaya et al. (2011) described methods to automatically generate and score a new sentiment lexicon, SentiFul, and to expand it through direct synonymy and antonymy relations, hyponymy relations, derivation and compounding with known lexical units. The authors distinguished four types of affixes used to derive new words, depending on the role they play with regard to sentiment features: propagating, reversing, intensifying and weakening. They elaborated the algorithm for the automatic extraction of new sentiment-related compounds from WordNet by using words from SentiFul as seeds for sentiment-carrying base components and applying the patterns of compound formation.

Wang et al. (2011) proposed a morpheme-based fine-to-coarse strategy for Chinese sentence-level sentiment classification, which used pre-existing sentiment dictionaries in order to extract sentiment morphemes and calculate their polarity intensity. Such morphemes are then reused to evaluate the sentence-level semantic orientation on the basis of the sentiment phrases and their relevant polarity scores. The authors preferred morphemes to words as the basic tokens for sentiment classification because they are much less numerous than words. Wang et al. (2011) derived the semantic orientation of unknown sentiment words on the basis of their component morphemes (Yuen et al., 2004; Ku et al., 2009), thanks to a specific morpheme segmentation module, which applied the Forward Maximum Matching (FMM) word segmentation technique for the decomposition of words into strings of morphemes. Their approach can manage both unknown lexical sentiments and contextual sentiments for sentence-level sentiment classification by taking into consideration sentiments at multiple levels of granularity (e.g. sentiment morphemes, sentiment words and sentiment phrases) in a morpheme-based framework.
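Forward Maximum Matching is a simple greedy segmentation technique: at each position it takes the longest lexicon entry that matches, and falls back to a single character when nothing matches. The sketch below is a generic implementation under these assumptions; the toy lexicon uses Latin characters instead of Chinese morphemes purely for readability, and the names are illustrative.

```python
def forward_maximum_matching(text, lexicon, max_len=None):
    """Greedy left-to-right segmentation with longest-match preference."""
    if max_len is None:
        max_len = max((len(entry) for entry in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest window first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


# Toy morpheme lexicon, for illustration only.
TOY_LEXICON = {"un", "happi", "ness"}
SEGMENTS = forward_maximum_matching("unhappiness", TOY_LEXICON)
```

With the toy lexicon, the string unhappiness is decomposed into un, happi and ness, each of which could then be looked up in a morpheme-level sentiment dictionary.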

3.5.2 Deadjectival Adverbs in -mente

As anticipated, this Section aims to enlarge SentIta on the basis of the morphological relations that connect words and their meanings. In a first stage of the work, more than 5000 labeled adjectives were used to predict the orientation of the adverbs to which they are morphologically related. This Section explores the rules formalized to perform the task.

Adverbs are morphologically invariable and, consequently, do not present any inflection. However, the majority of them are characterized by a complex structure that includes an adjective base and the derivational morpheme -mente “-ly”:

[[veloce]A -mente]AVV “fast-ly”

[[fragile]A -mente]AVV “delicate-ly”

[[rapido]A -mente]AVV “rapid-ly”

Therefore, thanks to a morphological FSA, the SentIta dictionary of sentiment adverbs was derived from the dictionary of adjectives (see Figure 3.6). All the adverbs contained in the Italian dictionary of simple words were put in a Nooj text, and the above-mentioned grammar was used to quickly populate the new dictionary by extracting the words ending with the suffix -mente “-ly” (Ricca, 2004b) and by making such words inherit the adjectives’ polarity.

The Nooj annotations consisted of a list of 3200+ adverbs that, at a later stage, was manually checked in order to correct the grammar’s mistakes (e.g. vigorosamente “vigorously” and pazzamente “madly”, which respectively come from the positive adjective “vigorous” and the negative adjective “mad”, become intensifiers as adverbs) and to add the Prior Polarity to the adverbs that did not end with the suffixes used in the grammar (e.g. volentieri “gladly”, controvoglia “unwillingly”).

The manual check produced a set of 3600+ adverbs of sentiment, which was completed with 126 adverbs that did not end in -mente. In detail, the Precision achieved in this task is 99% and the Recall is 88%. The distribution of semantic tags in this dictionary is reported in Table 3.27.

Tag            Automatically generated   Manually adjusted
POS+FORTE      82                        89
POS            743                       779
POS+DEB        61                        55
NEG+DEB        436                       456
NEG            1273                      1466
NEG+FORTE      160                       158
FORTE          366                       482
DEB            84                        163
SINTESI        0                         9
CONFERMA       0                         17
INVERTE        0                         19
Tot            3205                      3693
Tot Int        450                       645
Tot Sent       2755                      3003
Tot Discorso   0                         42

Table 3.27: Composition of the adverb dictionary

Figure 3.6 shows the local grammar in which we formalized the rules for adverb formation. We started the word recognition by anchoring it on the localization of an adjective stem (see the first node on the left, in the variable Agg). Then the automaton continues the word analysis by checking that the adjective stem belongs to the sentiment dictionary (e.g. $Agg=A+POS+DEB). Figure 3.6 represents just an extract of the bigger grammar, which includes not only the positive (POS) adverbs but also the negative ones.

Figure 3.6: Extract of the FSA for the automatic annotation of sentiment adverbs

Notice, in the center of the FSA, a multiplication of paths. This depends on the different inflectional classes of the adjectives to which the suffix -mente is attached (Table 3.28). The rules used in the grammar to derive the adverbs are given below (see also Tables 3.28 and 3.29):

• class 2, first path: nothing in the adjective changes (e.g. veloc-e, veloce-mente);

• class 2, second path: in the adjectives ending in -re, -le, the -e is deleted [e] (e.g. fragil-e, fragil-mente);

• class 1, third path: the -o is deleted [o] and substituted by the thematic vowel -a (e.g. rapid-o, rapid-a-mente).

Adj Class   Number=s              Number=p
            Gender=m   Gender=f   Gender=m   Gender=f
1           -o         -a         -i         -e
2           -e         -e         -i         -i

Table 3.28: Adjective’s inflectional classes (Salvi and Vanelli, 2004)

Adj Class   Adjective   Deletion   Adverb stem   Thematic vowel   Suffix
1           rapid-o     o          rapid-        -a-              -mente
2           veloc-e     -          veloce-       -                -mente
2           fragil-e    e          fragil-       -                -mente

Table 3.29: Rules for the adverb derivation

Actually, almost nothing about the inflection of the base adjectives has been specified in the FSA. Because the adverbs of sentiment must be identified within the whole list of Nooj adverbs and semantically annotated (not generated from scratch), we did not find it necessary to recall all the inflectional classes of the adjectives: just the deletion or the conservation of the final vowel was used to select the correct derivational rule.

Mistakes in the annotation concern those exceptions that were not productive enough to deserve a specific path in the local grammar. Examples are the adjectives in -lento: although they have -o as last vowel, they do not require the thematic vowel -a- but -e- (e.g. violento becomes violent-e-mente rather than violent-a-mente).
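The three derivational rules, and the way existing adverbs inherit a tag from their base adjective, can be mimicked with a short Python sketch. It is only an approximation of the local grammar: the function names are invented, and the polarity tags attached to the sample adjectives are hypothetical values chosen for illustration.

```python
# Hypothetical tags for a few adjectives (illustrative values only).
ADJECTIVE_TAGS = {
    "veloce": "POS",
    "fragile": "NEG+DEB",
    "rapido": "POS",
    "violento": "NEG",
}


def expected_adverb(adjective):
    """Build the -mente form predicted by the three paths of the grammar."""
    if adjective.endswith("o"):            # class 1: -o replaced by -a-
        return adjective[:-1] + "amente"
    if adjective.endswith(("le", "re")):   # class 2 in -le/-re: -e deleted
        return adjective[:-1] + "mente"
    if adjective.endswith("e"):            # class 2: adjective unchanged
        return adjective + "mente"
    return None


def annotate(adverbs, adjective_tags):
    """Tag the adverbs of an existing list that match a predicted form."""
    predicted = {expected_adverb(adj): tag for adj, tag in adjective_tags.items()}
    return {adv: predicted[adv] for adv in adverbs if adv in predicted}


ADVERBS = ["velocemente", "fragilmente", "rapidamente", "violentemente", "volentieri"]
TAGGED = annotate(ADVERBS, ADJECTIVE_TAGS)
```

In this sketch velocemente, fragilmente and rapidamente receive the tag of their base adjective, whereas violentemente (an exception in -lento) and volentieri (not a -mente formation) are missed, which mirrors the cases that required the manual check.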

3.5.2.1 Semantics of the -mente formations

The meaning of the deadjectival adverbs in -mente is not always predictable from the base adjectives from which they are derived. The syntactic structures in which they occur also influence their interpretation. Depending on their position in sentences, the deadjectival adverbs can be described as follows:

Adjective modifiers: they modify adjectives or other adverbs, even though two adverbs in -mente almost never appear next to each other.

• degree modifiers

– intensifiers, e.g. altamente “highly”, enormemente “enormously”

– downtoners, e.g. moderatamente “moderately”, parzialmente “partially”

Predicate modifiers: they can be rendered with the paraphrase in an A way, in which A is the adjective base.

• verb arguments, e.g. Maria si comporta perfettamente “Maria acts perfectly”

• extranuclear elements, e.g. Maria si reca settimanalmente a Milano “Maria goes weekly to Milan”

Sentence modifiers: they usually do not modify the sentence content, but they often give it coherence or work as discourse signals.

• evaluative adverbs, e.g. stupidamente “foolishly”

• modal adverbs, e.g. necessariamente “necessarily”

• circumstantial adverbs of time, e.g. ultimamente “lately”

• quantifiers over time, e.g. frequentemente “often”

• domain adverbs, e.g. politicamente “politically”

The French LG description of adverbs in -ment includes 3200+ items represented in LG tables, which classify the adverbs in a similar way. They distinguish sentential adverbs (e.g. conjuncts, style disjuncts and attitude disjuncts, which include evaluative adverbs, adverbs of habit, modal adverbs and subject-oriented attitude adverbs) from adverbs integrated into the sentence (e.g. adverbs of subject-oriented manner, adverbs of verbal manner, adverbs of quantity, adverbs of time, viewpoint adverbs and focus adverbs) (Molinier and Levrier, 2000; Sagot and Fort, 2007).

3.5.2.2 The Superlative of the -mente Adverbs

As concerns the superlative forms of the adverbs in -mente, it must be underlined that they have been treated as adverbs derived from the superlative form of the adjectives. In fact, the rule for adverb formation is selected by the inflectional paradigm of the superlative form and not by the inflection of the adjective. Moreover, the semantic orientation inherited by the superlative adverb is again the one belonging to the superlative adjective and not the one of the adjective itself. Therefore, the FSA for superlative adverbs is almost the same as the one shown in Figure 1: the only difference is in the recognition of the base adjective, which is A+SUP rather than just A.

3.5.3 Deadjectival Nouns of Quality

The derivation of quality nouns (QN) from qualifier adjectives is another derivational phenomenon of which we took advantage for the automatic enlargement of SentIta. These nouns make it possible to treat as entities the qualities expressed by the base adjectives.

A morphological FSA (Figure 3.7), following the same idea as the adverb grammar, matches in a list of abstract nouns the stems that are in morpho-phonological relation with our list of hand-tagged adjectives. Because the nouns, unlike the adverbs, need their inflection information to be specified, we associated with each suffix entry in the electronic dictionary of QN suffixes the inflectional paradigm that it gives to the words with which it occurs (see Table 3.30).

As regards the suffixes used to form the quality nouns (Rainer 2004)it must be said that they generally make the new words simply inheritthe orientation of the derived adjectives Exceptions are -edine and -eria that almost always shift the polarity of the quality nouns into theweakly negative one (-1) eg faciloneria ldquoslapdash attitude Also thesuffix -mento differs from the others in so far it belongs to the deriva-tional phenomenon of the deverbal nouns of action (Gaeta 2004) Ithas been possible to use it into our grammar for the deadjectival nounderivation by using the past participles of the verbs listed in the ad-jective dictionary of sentiment (eg Vsfinire ldquoto wear out Asfinitoldquoworn out Nsfinimento ldquoweariness) Basically we chose to includeit in our list of suffixes because of its productivity This caused an over-lap of 475 nouns derived by both the psychological predicates and thequalifier adjectives of sentiment Just 185 psych nominalizations ex-ceeded the coverage of our mophological FSA proving the power ofour methodology also in terms of RecallIn about half the cases the annotations differed just in terms of inten-

sity the other mistakes affected the orientation also (see Table 333)In general the Precision achieved in this task is 93 while the Preci-sion performances of the automatic tagging of the QN compared withthe manual tagging made on the corresponding nominalizations ofthe psych verbs is summarized in Table 334Tables 331 and 332 show the differences in productivity of all the suf-

fixes (SFX) among the electronic dictionary of abstract nouns (Abstr)the sublists of QN that have their counterparts in the Psy V-n list (PsyV-n=QN) and the whole collection of QN (QN tot)As regards the suffixes productivity Abstr the suffixes that achievedthe higher percentage values are -itagrave -ia and -(zione) The most pro-

35 THE AUTOMATIC EXPANSION OF SENTITA 143

Fig

ure

37

Ext

ract

oft

he

FSA

for

the

auto

mat

ican

no

tati

on

ofq

ual

ity

no

un

s

Suffix (SFX)   Inflectional paradigm (FLX)   Correct matches   Errors   Precision (%)
-edin-e        N46                           0                 0        -
-età           N602                          0                 0        -
-izie          N602                          0                 0        -
-el-a          N41                           0                 1        0
-udine         N46                           5                 2        71.43
-or-e          N5                            36                9        80
-zion-e        N46                           359               59       85.89
-anz-a         N41                           57                9        86.36
-itudin-e      N46                           13                2        86.67
-ur-a          N41                           142               20       87.65
-ment-o        N5                            514               58       89.86
-izi-a         N41                           14                1        93.33
-enz-a         N41                           148               10       93.67
-eri-a         N41                           71                4        94.67
-ietà          N602                          27                1        96.43
-aggin-e       N46                           72                2        97.3
-i-a           N41                           145               3        97.97
-ità           N602                          666               13       98.09
-ezz-a         N41                           305               2        99.35
-igi-a         N41                           3                 0        100
-z-a           N41                           2                 0        100
tot            -                             2579              196      92.94

Table 3.30: Error analysis of the automatic QN annotation

Suffix (SFX)   Suffix in Abstr   Suffix in Psy V-n=QN   Suffix in QN
-ezz-a         498               73                     305
-aggin-e       149               13                     72
-itudin-e      31                1                      13
-ietà          65                2                      27
-igi-a         9                 0                      3
-izi-a         40                2                      14
-enz-a         461               11                     148
-anz-a         189               7                      57
-eri-a         267               13                     71
-ità           3263              53                     666
-or-e          297               6                      36
-zion-e        2866              87                     359
-ur-a          1700              11                     142
-udin-e        39                3                      5
-i-a           3545              13                     145
-el-a          15                0                      0
-edin-e        4                 0                      0
-età           66                0                      0
-izie          3                 0                      0
-z-a           30                2                      2
-ment-o        2520              150                    514

Table 3.31: Presence of the QN suffixes in different dictionaries

Suffix (SFX)   Suffix in Abstr (%)   Suffix in Psy V-n=QN (%)   Suffix in QN (%)
-ezz-a         3.1                   16.33                      6.28
-aggin-e       0.93                  2.91                       1.48
-itudin-e      0.19                  0.22                       0.27
-ietà          0.4                   0.45                       0.56
-igi-a         0.06                  0                          0.06
-izi-a         0.25                  0.45                       0.29
-enz-a         2.87                  2.46                       3.05
-anz-a         1.18                  1.57                       1.17
-eri-a         1.66                  2.91                       1.46
-ità           20.32                 11.86                      13.72
-or-e          1.85                  1.34                       0.74
-zion-e        17.85                 19.46                      7.4
-ur-a          10.59                 2.46                       2.93
-udin-e        0.24                  0.67                       0.1
-i-a           22.08                 2.91                       2.99
-el-a          0.09                  0                          0
-edin-e        0.02                  0                          0
-età           0.41                  0                          0
-izie          0.02                  0                          0
-z-a           0.19                  0.45                       0.04
-ment-o        15.69                 33.56                      10.59

Table 3.32: Productivity of the QN suffixes in different dictionaries

Correspondences   Differences   Different Orientation   Only Different Intensity
475               185           93                      92

Table 3.33: About the overlap among Psych V-n dictionary and QN dictionary

                  Precision   Precision (Orientation only)
QN <> Psy V-n     0.92        -
QN = Psy V-n      0.61        0.80

Table 3.34: Precision measure about the overlap of QN and Psy V-n

ductive suffixes for the Psy V-n=QN formation are -(z)ione, -mento and -ezza, while the most productive ones for the QN tot are -ità, -(z)ione and -mento.
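The Precision values reported in Tables 3.30 and 3.34 can be re-derived from the raw counts. The following Python fragment is only a minimal sketch: the function and variable names are illustrative, and reading the two figures of Table 3.34 as agreement ratios over the 475 overlapping nouns of Table 3.33 is our own assumption, although it is consistent with the published values.

```python
# Counts copied from Table 3.30: (correct matches, errors).
COUNTS = {
    "-zion-e": (359, 59),
    "-ment-o": (514, 58),
    "-ezz-a": (305, 2),
    "tot": (2579, 196),
}


def precision(correct, errors):
    """Percentage of correct annotations among all the annotations produced."""
    return round(100 * correct / (correct + errors), 2)


per_suffix = {sfx: precision(c, e) for sfx, (c, e) in COUNTS.items()}

# Figures copied from Table 3.33.
overlap, diff_orientation, diff_intensity = 475, 93, 92

# Assumed reading of Table 3.34: agreement on orientation and intensity.
full_agreement = round((overlap - diff_orientation - diff_intensity) / overlap, 2)
# Assumed reading of Table 3.34: agreement on orientation only.
orientation_agreement = round((overlap - diff_orientation) / overlap, 2)

print(per_suffix, full_agreement, orientation_agreement)
```

Under these assumptions the sketch returns 92.94 for the total row of Table 3.30, and 0.61 and 0.8 for the two overlap measures.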

3.5.4 Morphological Semantics

Our morphological FSA can moreover interact with a list of prefixes able to negate (e.g. anti-, contra-, non-, etc.) or to intensify/downtone (e.g. arci-, semi-, etc.) the orientation of the words in which they appear (Iacobini, 2004).

While the suffixes for the creation of quality nouns interact with the pre-existing dictionaries of the Italian module of Nooj in order to automatically tag them with new semantic descriptions, the prefixes treated in this Section work directly on opinionated documents, so that the machine can understand the actual orientation of the words occurring in real texts, even when their morphological context shifts the polarity of the words listed in the dictionaries.

The collection of these prefixes, which from now on we will call Morphological Contextual Valence Shifters (mCVS), is reported in Tables 3.35 and 3.36.

Prefixes   Negation types
non-       CDD
mis-       CR
a-         CR+PRV
dis-       CR+PRV
in-        CR+PRV
s-         CR+PRV
anti-      OPP
contra-    OPP
contro-    OPP
de-        PRV
di-        PRV
es-        PRV

Table 3.35: Negation prefixes

They are endowed with special tags that specify the way in which they alter the meaning of the sentiment words with which they occur:

• FORTE “strong” intensifies the Semantic Orientation of the words, making their polarity increase by one position on the evaluation scale (first path in Figure 3.8);

• DEB “weak” downtones the Semantic Orientation of the words, making their polarity decrease by one position on the evaluation scale (second path in Figure 3.8);

• NEGAZIONE “negation” works following the same rules as the negation formalised in the syntactic grammars (third path in Figure 3.8):

– CR “contrary”

– OPP “opposition”

– PRV “deprivation”

– CDD “contradiction”

Intensifiers   Downtoners
iper-          bis-
macro-         fra-
maxi-          infra-
mega-          intra-
multi-         ipo-
oltre-         micro-
pluri-         mini-
poli-          para-
sopra-         semi-
sovra-         sotto-
stra-          sub-
super-         -
sur-           -
ultra-         -

Table 3.36: Intensifier/downtoner prefixes

Figure 3.8 shows the morphological FSA that combines the polarity and intensity of the adjectives of sentiment with the meaning carried by the mCVS.

The shifting rules in terms of polarity score, which are the same exploited in the syntactic grammar net (see Section 4), are summarized in this grammar through the green comments on the right side of the automaton.

As regards the manipulation of the annotations of the resulting words, we followed three parallel solutions:

• when the score does not change (e.g. the case of the intensification of words with starting polarity +3, or the case in which words with polarity -1 are decreased), the resulting word simply inherits the inflectional ($2F) and the syntactic and semantic ($2S) information of the original word;

• when the intensity changes and the polarity remains steady (e.g. when the words with polarity ±2 are intensified or downtoned), the resulting word inherits the inflectional, syntactic and semantic information of the original word, but also obtains the tags FORTE/DEB;

• when both the intensity and the polarity change (in almost every case in which the words are negated), the resulting word just inherits the inflectional information, while the semantic tags are added from scratch.

Figure 3.8: Morphological Contextual Valence Shifting of Adjectives
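Outside the Nooj formalism, the recognition step performed by the morphological FSA can be approximated as a prefix-stripping lookup. The fragment below is only a sketch under stated assumptions: the prefix inventory is a small excerpt of Tables 3.35 and 3.36, the way the negation types are appended to the NEGAZIONE tag is our own convention, and the adjective entries with their prior polarities are hypothetical stand-ins for the SentIta dictionary.

```python
# Excerpt of the mCVS of Tables 3.35 and 3.36; the tag strings are illustrative.
MCVS = {
    "stra": "FORTE", "super": "FORTE", "ultra": "FORTE", "iper": "FORTE",
    "semi": "DEB", "mini": "DEB", "micro": "DEB",
    "anti": "NEGAZIONE+OPP", "dis": "NEGAZIONE+CR+PRV",
    "in": "NEGAZIONE+CR+PRV", "non": "NEGAZIONE+CDD",
}

# Hypothetical dictionary excerpt: adjective -> assumed prior polarity.
ADJECTIVES = {"ricco": 2, "resistente": 2, "utile": 2}


def split_mcvs(word):
    """Return (prefix, tag, base, prior polarity) or None if nothing matches.

    Longer prefixes are tried first, and a split is accepted only when the
    remainder is a known sentiment adjective.
    """
    for prefix in sorted(MCVS, key=len, reverse=True):
        base = word[len(prefix):]
        if word.startswith(prefix) and base in ADJECTIVES:
            return prefix, MCVS[prefix], base, ADJECTIVES[base]
    return None


print(split_mcvs("straricco"))
print(split_mcvs("inutile"))
```

The tag returned for the prefix then selects which of the three annotation strategies listed above applies to the resulting word.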

We used the list of the sentiment adjectives as a Nooj text in order to check the performance of our grammar. The discovery that the words containing the mCVS lemmatised in the dictionary are very few (just 29 adjectives, e.g. straricco “very rich”, ultraresistente “heavy duty”, stramaledetto “damned”, and 9 adverbs, e.g. strabene “very well”, ultrapiattamente “very dully”) increases the importance of an FSA like the one described in this Paragraph.

It is also important to underline that all the synonyms of poco “few”, including the ones that take the shape of morphemes (e.g. ipo-, sotto-, sub-), do not seem to be proper downtoners, but resemble the behaviour of the negation words that transform the sentiment words with which they occur into weakly negative ones, see Section 4.2 (e.g. ipodotato “subnormal”, ipofunzionante “hypofunctioning”). Therefore, we excluded them from the dictionary of mCVS and included them in a dedicated metanode of the morphological FSA, able to correctly compute their meaning.

Chapter 4

Shifting Lexical Valences through FSA

4.1 Contextual Valence Shifters

In Section 3.3.1 we provided the definitions of Semantic Orientation, the measure of the polarity and intensity of an opinion (Taboada et al., 2011; Liu, 2010), and of Prior Polarity, the context-independent orientation of individual words (Osgood, 1952). While the second concept is relevant during the population of sentiment dictionaries, the first one refers to the final semantic annotation that can be automatically or manually attributed to whole sentences and documents.

The need to make a distinction between these two measures is due to the semantic modifications that can come from the words’ surrounding contexts (Pang and Lee, 2008a). Contextual Valence Shifters are linguistic devices able to change the prior polarity of words when co-occurring in the same context (Kennedy and Inkpen, 2006; Polanyi and Zaenen, 2006).

From a computational point of view, Contextual Valence Shifting is nothing more than an enhanced version of the term-counting method first proposed by Turney (2002), but from the distributional analysis perspective it raises a number of challenges that deserve to be examined (Neviarouskaya et al., 2009a).

In this work we handle contextual shifting by generalizing over all the polar words that possess the same prior polarity. A network of local grammars has been designed on a set of rules that compute the words’ individual polarity scores according to the contexts in which they occur.

In general, the sentence annotation is performed using six different metanodes that, working as containers for the sentiment expressions, assign equal labels to the patterns embedded in the same graphs.

We used the FSA editor of Nooj to formalize our CVS rules because of its ease of use and computational power, but nothing prevents the use of other strategies and tools for their application to texts. Consequently, in this chapter we will limit the discussion to the shifting rules, without any further mention of their actual formalization, which can be trivially summarized as the treatment of embedded graphs as score boxes.

4.2 Polarity Intensification and Downtoning

4.2.1 Related Works

Amplifiers and downtoners (Quirk et al., 1985), intensifiers and diminishers (Polanyi and Zaenen, 2006), overstatements and understatements (Kennedy and Inkpen, 2006), intensifiers and downtoners (Taboada et al., 2011) are all pairs of concepts that refer to the increasing or decreasing effects of special kinds of words on oriented lexical items.

Although the changes of the polarity scores are very intuitive, the simple addition and subtraction of a score to/from the base valence of a term (Polanyi and Zaenen, 2006; Kennedy and Inkpen, 2006) has been criticized by Taboada et al. (2011), who instead associated different percentage values (positive for intensifiers and negative for downtoners) with each strength word.

Differently from both of these works, we decided to restrain the complexity of the rules by avoiding the distinction of degrees. In compensation, we did not limit the intensifying function to degree adverbs and adjectives, but also took verbs and nouns into consideration.

4.2.2 Intensification and Downtoning in SentIta

The lexical items for the computational treatment of intensification in SentIta are the ones denoted by the tags FORTE and DEB.

As stated in the previous Chapter, Section 3.3, the lexical items endowed with these labels do not possess any orientation, but can modify the intensity of semantically oriented words.

The strength indicators included in the intensification FSA that account for (but are not limited to) such modifiers are the following:

• Morphological Rules

– the use of the absolute superlative suffixes -issim-o and -errim-o, which always increase the words’ semantic orientations;

– the use of intensifying or downtoning prefixes (e.g. super-, stra-, semi-, micro-), shown in Section 3.5.4.

• Syntactic Rules

– the repetition of more than one negative or positive word, which always increases the words’ semantic orientations (e.g. bello[+2] bello[+2] bello[+2] [+3] “nice nice nice”);

– the co-occurrence of words belonging to the strength scale (tags FORTE/DEB) with the sentiment words listed in the evaluation scale (tags POS/NEG).

As generally assumed in the literature, adverbs intensify or attenuate adjectives (46), verbs (48) and other adverbs (49), while adjectives modify the intensity of nouns (47). We adopted the simplest strategy for polarity modification: the addition/subtraction of one point on the evaluation scale (Polanyi and Zaenen, 2006; Kennedy and Inkpen, 2006). Because this scale does not exceed ±3, words that start from this polarity cannot be increased further (e.g. davvero[+] eccezionale[+3] [+3] “truly exceptional”).

(46) Parzialmente[-] deludente[-2] anche il reparto degli attori [-1]
“Partially unsatisfying also the actor staff”

(47) Ciò che ne deriva (...) è una terribile[-2] confusione[-2] narrativa [-3]
“What comes from it is a terrible narrative chaos”

(48) Alla guida ci si diverte[+2] molto[+] [+3]
“In the driver’s seat you have a lot of fun”

(49) Ne sono rimasta molto[+] favorevolmente[+2] colpita [+3]
“I have been very favourably affected of it”
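The one-point strategy illustrated by examples (46), (48) and (49) can be sketched in Python as follows. This is not the Nooj grammar itself: the function name is illustrative, and leaving neutral words untouched is an assumption of the sketch. The lower bound of one point reflects the observation made in Section 3.5.4 that words with polarity -1 are not changed by downtoning.

```python
def shift_intensity(score, tag):
    """Move a polarity score one step along the -3..+3 evaluation scale.

    tag is "FORTE" (intensification) or "DEB" (downtoning).
    The sign of the score is never changed by this rule.
    """
    if score == 0:
        return 0  # assumption: neutral words are left untouched here
    sign = 1 if score > 0 else -1
    magnitude = abs(score) + (1 if tag == "FORTE" else -1)
    # The scale is bounded: never beyond 3, never below 1.
    return sign * max(1, min(3, magnitude))


print(shift_intensity(-2, "DEB"))    # (46) parzialmente deludente
print(shift_intensity(2, "FORTE"))   # (48) diverte molto
print(shift_intensity(3, "FORTE"))   # davvero eccezionale
```

The three calls return -1, 3 and 3, matching the scores annotated in the corresponding examples.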

However, as already discussed in Section 3.3.4, special subsets of verbs also possess the power of intensifying or downtoning the polarity of the nominal groups or complement clauses that they affect, case by case, in subject (50) or object (51) position (see Figure 3.1, Section 3.3.4).

(50) L’amore[+2] travolge[+] Maria [+3]
“Love overwhelms Maria”

(51) Le tue parole invadono[+] Maria di contentezza[+2] [+3]
“Your words overwhelm Maria with happiness”

Examples of intensifying/downtoning verbs, with the LG classes they belong to, are reported below:

• the list of entries translated from the French work of Balibar-Mrabti (1995), e.g. travolgere[+] “to overwhelm”, risplendere[+] “to shine”;

• verbs from LG class 24 (15 intense, 2 weak), e.g. aumentare[+] “to increase”, gonfiare[+] “to amplify”, calare[-] “to go down”, diminuire[-] “to decrease”;

• verbs from LG class 41 (45 intense, 19 weak), e.g. rivoluzionare[+] “to revolutionize”, sconquassare[+] “to smash”, moderare[-] “to moderate”, placare[-] “to appease”;

• verbs from LG class 47 (43 intense, 18 weak), e.g. strillare[+] “to shout”, sventolare[+] “to show off”, mormorare[-] “to murmur”, sussurrare[-] “to whisper”;

• verbs from LG class 50 (4 intense, 0 weak), e.g. assicurare[+] “to assure”, garantire[-] “to guarantee”;

• verbs from LG class 56 (5 intense, 3 weak), e.g. abbandonarsi[+] “to abandon yourself”, affrettarsi[+] “to rush”, accennare[-] “to touch upon”, esitare[-] “to hesitate”.

As regards intensifying or downtoning nouns, we refer to the ones automatically derived from degree adjectives (220 intense, 95 weak), e.g. sfrenatezza[+] “wildness”, abbondanza[+] “abundance”, labilità[-] “evanescence”, fugacità[-] “fugacity”, and to the nominalizations of intensifying or downtoning verbs (41 intense, 95 weak), e.g. rafforzamento[+] “strengthening”, fervore[+] “fervor”, affievolimento[-] “weakening”, diminuzione[-] “decrease”.

They can modify both other nouns (52) and verbs (53).

(52) La sfrenatezza[+] dell’odio[-3] di Maria [-3]
“The wildness of Maria’s hate”

(53) Luca difendeva[+2] Maria con fervore[+] [+3]
“Luca was defending Maria with fervour”

Intensification, negation, modality and comparison can appear together in the same sentence. We will describe these cases in the following paragraphs.

4.2.3 Excess Quantifiers

Words that at first glance seem to be intensifiers, but at a deeper analysis reveal a more complex behavior, are abbastanza “enough”, troppo “too much” and poco “not much”. Because of their particular influence on polar words, their meaning, especially when associated with polar adjectives, has been related in the literature to other contextual valence shifters.¹ An example is Meier (2003, p. 69), who proposed an interpretation that includes a comparison between extents:

“Enough, too, and so are quantifiers that relate an extent predicate and the incomplete conditional (expressed by the sentential complement) and are interpreted as comparisons between two extents. The first extent is the maximal extent that satisfies the extent predicate. The second extent is the minimal or maximal extent of a set of extents that satisfy the (hidden) conditional, where the sentential complement supplies the consequent and the main clause the antecedent. [...] Intuitively, the value an object has on a scale associated with the meaning of the adjective or adverb is related to some standard of comparison that is determined by the sentential complement.” (Meier, 2003, p. 69)

Another is Schwarzschild (2008, p. 316), who noticed the possibility of an implicit negation:

¹ When quantifiers like the ones under examination modify polar words, they compare such words with a threshold value that is defined by an infinitive clause in position N1 (e.g. The food is too good to throw (it) away) (Meier, 2003; Schwarzschild, 2008).


“In excessives, the infinitival complement, while it does not form a threshold description, it does contain an implicit negation. So, like other negative statements, excessives support let alone rejoinders.²”

And again Meier (2003, p. 71), who transformed the infinitive clause into a modal expression:

“In this paper I will propose that the sentential complements of constructions with too and enough implicitly or explicitly contain a modal expression. [...] Evidence for this move is the fact that a modal expression (with existential force) can be added or omitted without changing the intuitive meaning of the sentences. [...] An explicit modal expression like be able or be allowed in the sentential complement makes them more precise, but it does not make them unacceptable.”

In this research we noticed as well that the co-occurrence of troppo, poco and abbastanza with polar lexical items can provoke, in their semantic orientation, effects that can be associated with other contextual valence shifters. The ad hoc rules dedicated to these words (see Table 4.1) are not actually new, but refer to other contextual valence shifting rules that have been discussed in this Paragraph and/or will be explored in this Chapter.

Troppo and poco can also co-occur or be negated; the rules to be applied in each case are reported in Table 4.2.

Troppo, poco and abbastanza are also able to give a polarity to words that in SentIta possess a neutral Prior Polarity. An exception is the case of non troppo + neutral words (Table 4.2), which does not produce generalizable effects (e.g. non troppo decorato “not so decorated” could have a weakly positive orientation, while the same cannot be said of non

² This fuel is too volatile to use in a car engine = This fuel is too volatile to use in a car engine, let alone a lawn mower engine = This fuel should not be used in a car engine, let alone a lawn mower engine. Example from Schwarzschild (2008).

             Positive Word            Neutral Word             Negative Word
Examples     bello                    decorato                 brutto
troppo       Intensification          Negative Switch          Intensification
poco         Weakly Negative Switch   Weakly Negative Switch   Weak Negation
abbastanza   Downtoning               Weakly Positive Switch   Intensification

Table 4.1: The effects of troppo, poco and abbastanza on polar lexical items

                 Positive Word     Neutral Word             Negative Word
Examples         bello             decorato                 brutto
troppo poco      Strong Negation   Negative Switch          Downtoning
un poco troppo   Intensification   Negative Switch          Intensification
non troppo       Negative Switch   -                        Weak Negation
non poco         Intensification   Weakly Positive Switch   Intensification

Table 4.2: The effects of the co-occurrence and the negation of troppo and poco with reference to polar lexical items

troppo nuovo “not so new”). Therefore, it has been excluded from the rules in order to avoid additional errors.
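Since the rules of Tables 4.1 and 4.2 only redirect each combination to an already defined shifting rule, they can be read as a simple lookup table. The sketch below transcribes the two tables into a Python dictionary; the data structure and the names (EFFECTS, effect) are illustrative assumptions and do not reproduce the Nooj graphs. The missing rule for non troppo + neutral words is represented by None.

```python
# Transcription of Tables 4.1 and 4.2: quantifier -> class of the word -> rule.
EFFECTS = {
    "troppo": {"pos": "Intensification", "neu": "Negative Switch", "neg": "Intensification"},
    "poco": {"pos": "Weakly Negative Switch", "neu": "Weakly Negative Switch", "neg": "Weak Negation"},
    "abbastanza": {"pos": "Downtoning", "neu": "Weakly Positive Switch", "neg": "Intensification"},
    "troppo poco": {"pos": "Strong Negation", "neu": "Negative Switch", "neg": "Downtoning"},
    "un poco troppo": {"pos": "Intensification", "neu": "Negative Switch", "neg": "Intensification"},
    "non troppo": {"pos": "Negative Switch", "neu": None, "neg": "Weak Negation"},
    "non poco": {"pos": "Intensification", "neu": "Weakly Positive Switch", "neg": "Intensification"},
}


def effect(quantifier, prior_polarity):
    """Name of the shifting rule to apply, or None when no rule is defined."""
    if prior_polarity > 0:
        word_class = "pos"
    elif prior_polarity < 0:
        word_class = "neg"
    else:
        word_class = "neu"
    return EFFECTS[quantifier][word_class]


print(effect("troppo", 2))       # troppo bello
print(effect("poco", -2))        # poco brutto
print(effect("non troppo", 0))   # non troppo decorato: no rule
```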

4.3 Negation Modeling

4.3.1 Related Works

Negation modeling is an NLP research area that includes the automatic detection of negation expressions in raw texts and the determination of the negation scopes, which are the parts of the meaning modified by negation (Jia et al., 2009).

Negation meets the Sentiment Analysis need to determine the correct polarity of opinionated words when shifted by the context. That is why we included it in our set of Contextual Valence Shifters (Polanyi and Zaenen, 2006; Kennedy and Inkpen, 2006).

In this paragraph we will survey the literature on negation modeling, going in depth through the state-of-the-art methods for automatic negation detection, from the pioneering work of Pang and Lee (2004) to more sophisticated techniques, such as some Semantic Composition methods, which even define the syntactic contexts of the polar expressions (Choi and Cardie, 2008).

Despite the large interest in negation in the biomedical domain (Harkema et al., 2009; Desclés et al., 2010; Dalianis and Skeppstedt, 2010; Vincze, 2010), maybe due to the availability of free annotated corpora (Vincze et al., 2008), this paragraph focuses on the most significant works on sentiment analysis.

Bag of Words: Pang and Lee (2004) demonstrated that using contextual information can improve polarity-classification accuracy. By means of a standard bag-of-words representation, they handled negation modeling by adding artificial words to texts, without any explicit knowledge of polar expressions (e.g. I do not NOT_like NOT_this NOT_new NOT_Nokia NOT_model). The problem with this method is the duplication of the features, which are always represented in both their plain and negated occurrences (Wiegand et al., 2010).

Contextual Valence Shifters: Polanyi and Zaenen (2006) and Kennedy and Inkpen (2006) proposed a more sophisticated method that either switches (e.g. clever[+2] = not clever[-2]) or shifts (efficient[+2] = rather efficient[+1]) the initial scores of words when negated or downtoned in context. As a general rule, polarized expressions are considered to be negated if the negation words immediately precede them.

Wilson et al. (2005, 2009) distinguished the following three types of binary relationship polarity features for negation modeling:

• negation features: negation expressions that negate polar expressions;

• shifter features: expressions that can be approximately equated with downtoners;

• positive and negative polarity modification features: special kinds of polar expressions that modify other polar expressions, turning their polarity into the positive or negative one.

Semantic Composition: Moilanen and Pulman (2007) exploited the syntactic representations of sentences in a compositional semantics framework, with the purpose of computing the polarity of headlines and complex noun phrases. In this work, composition rules are incrementally applied to constituents, depending on the negation scope, by means of negation words, shifters and negative polar expressions.

A similar approach is the one of Shaikh et al. (2007), who used verb frames as a more abstract level of representation.

Choi and Cardie (2008) computed the polarity of phrases in two steps: the assessment of the polarity of the constituents and the subsequent application of a set of previously defined inference rules. For example, the rule [Polarity([NP1]- [IN] [NP2]-) = +] can be applied to phrases like [[lack]-_NP1 [of]_IN [crime]-_NP2 in rural areas].

Liu and Seneff (2009) and Taboada et al. (2011) proposed compositional models able to mix together intensifiers, downtoners, polarity shifters and negation words, always starting from the words’ prior polarities and intensities.

The work of Benamara et al. (2012) goes beyond the studies mentioned above, since they specifically account for negative polarity items and for multiple negatives.

Socher et al. (2013) exploited a Recursive Neural Tensor Network (RNTN) to compute two types of negation:

• negation of positive sentences, where the negation is supposed to change the overall sentiment of a sentence from positive to negative;

• negation of negative sentences, where the overall sentiment is considered to become less negative (but not necessarily positive).


Negation Scope Detection: Jia et al. (2009) introduced the concept of scope of the negation term t. The scope identification³ goes through the computation of a candidate scope (a subset of the words appearing after t in the sentence, which represents the minimal logical units of the sentence containing the scope) and then a pruning of those words. The computational procedure that approximates the candidate scope considers the following elements: static delimiters, e.g. because or unless, which signal the beginning of another clause; dynamic delimiters, e.g. like and for; and heuristic rules focused on polar expressions that involve sentimental verbs, adjectives and nouns.

Morphological Negation modeling: In order to complete the literature survey on negation modeling, we cannot fail to mention morphological negation modeling as well. However, due to the fact that the scope of any morphological negation remains included in simple words (Councill et al., 2010), we examined this aspect of negation in Sections 3.5.1 and 3.5.4 of Chapter 3. We mention again here only the most significant contributions on this topic: Hatzivassiloglou and McKeown (1997), Plag (2003), Yuen et al. (2004), Moilanen and Pulman (2008), Ku et al. (2009), Neviarouskaya (2010), Neviarouskaya et al. (2011) and Wang et al. (2011).
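Among the approaches surveyed above, the artificial-word strategy attributed to Pang and Lee (2004) is the easiest to reproduce. The fragment below is a minimal sketch of that idea: the list of negation cues and the choice of closing the negated span at the first punctuation mark are assumptions of this sketch, not details taken from the cited work.

```python
# Assumed cue list and clause delimiters, for illustration only.
NEGATION_CUES = {"not", "no", "never", "n't"}
CLAUSE_END = {".", ",", ";", ":", "!", "?"}


def mark_negation(tokens):
    """Prefix NOT_ to every token that follows a negation cue in its clause."""
    marked, in_scope = [], False
    for token in tokens:
        if token in CLAUSE_END:
            in_scope = False          # punctuation closes the negated span
            marked.append(token)
        elif token.lower() in NEGATION_CUES:
            in_scope = True           # the cue itself is kept unchanged
            marked.append(token)
        elif in_scope:
            marked.append("NOT_" + token)
        else:
            marked.append(token)
    return marked


print(" ".join(mark_negation("I do not like this new Nokia model".split())))
```

On the example sentence of the survey the sketch yields I do not NOT_like NOT_this NOT_new NOT_Nokia NOT_model, which also makes the feature duplication noted by Wiegand et al. (2010) visible: like and NOT_like become two unrelated features.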

4.3.2 Negation in SentIta

As exemplified in the following sentences, negation indicators do not always change the polarity of a sentence into its positive or negative counterpart (54); they often have the effect of increasing or decreasing the sentence score (55). That is why we prefer to talk about valence “shifting” rather than “switching”.

³ Scope Identification concerns the determination, at a sentence level, of the tokens that are affected by negation cues (Jia et al., 2009).


(54) Citroen non[neg] produce auto valide[+2] [-2]
“Citroen does not produce efficient cars”

(55) Grafica non proprio[neg] spettacolare[+3] [-1]
“The graphic not quite spectacular”

Taboada et al. (2011, p. 277) made comparable considerations:

“Consider excellent, a +5 adjective. If we negate it, we get not excellent, which intuitively is a far cry from atrocious, a -5 adjective. In fact, not excellent seems more positive than not good, which would negate to a -3. In order to capture these pragmatic intuitions, we implemented another method of negation, a polarity shift (shift negation). Instead of changing the sign, the SO value is shifted toward the opposite polarity by a fixed amount (in our current implementation, 4). Thus a +2 adjective is negated to a -2, but the negation of a -3 adjective (for instance, sleazy) is only slightly positive, an effect we could call ‘damning with faint praise’. [...] it is very difficult to negate a strongly positive word without implying that a less positive one is to some extent true, and thus our negator becomes a downtoner.”

Just as Benamara et al. (2012, p. 11):

“(...) negation cannot be reduced to reversing polarity. For example, if we assume that the score of the adjective “excellent” is +3, then the opinion score in “this student is not excellent” cannot be -3. It rather means that the student is not good enough. Hence, dealing with negation requires to go beyond polarity reversal.”
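The difference between the two strategies discussed in the quotation of Taboada et al. (2011) can be made explicit with a few lines of Python. This is only a sketch based on the description quoted above: the function names are ours, and the treatment of a zero score is an assumption.

```python
def switch_negation(so):
    """Plain polarity reversal: only the sign of the score is changed."""
    return -so


def shift_negation(so, amount=4):
    """Shift the score toward the opposite polarity by a fixed amount."""
    if so > 0:
        return so - amount
    if so < 0:
        return so + amount
    return 0  # assumption: neutral scores are left unchanged


for word, so in [("excellent", 5), ("good", 3), ("sleazy", -3)]:
    print(word, switch_negation(so), shift_negation(so))
```

With the shift strategy not excellent (+1) stays more positive than not good (-1), whereas plain reversal would map them to -5 and -3.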

Even though the subtraction of a fixed amount of polarity points⁴ seems to us a risky generalization, the authors confirmed Horn’s

⁴ Other strategies for automatic negation modeling have been proposed by Yessenalina and Cardie (2011), who combined words using iterated matrix multiplication, and by Benamara et al. (2012), who, more in line with our research, computed negation on the basis of its types.


ideas on the asymmetry between affirmative and negative sentences in natural language, differently from standard logic (Horn, 1989). Such an assumption complicates any attempt to automatically extract the meaning of negated phrases and sentences by simply switching their affirmative scores.

Our hypothesis, which can easily be placed among the compositional semantics approaches, is that the final polarity of a negated expression can be easily modulated, but only by taking into account, at the same time, both the prior polarity of the opinionated expressions and the strength of the negation indicators (see Tables 4.3, 4.5 and 4.6).

Despite any complexity, finite-state technology offered us the opportunity to easily compute the influence of negation on polarized expressions and to formalize the rules to automatically annotate real texts.

As regards the full list of negation indicators, we included in our grammar three types of negation (Benamara et al., 2012; Godard, 2013), which respectively include the following items. As we will show, each type has a different impact on the opinion expressions in terms of polarity and intensity.

Negation operators

• simple adverbs of negation
no,AVV+NEGAZIONE
non,AVV+NEGAZIONE
mica,AVV+NEGAZIONE
affatto,AVV+NEGAZIONE+FORTE
neanche,AVV+NEGAZIONE
neppure,AVV+NEGAZIONE
manco,AVV+NEGAZIONE
senza,PREP+NEGAZIONE ⁵

• compound adverbs of negation
neanche per sogno,AVV+AVVC+PC+PN+NEGAZIONE+FORTE
per niente al mondo,AVV+AVVC+PCPC+PNPN+NEGAZIONE+FORTE
per nulla al mondo,AVV+AVVC+PCPC+PNPN+NEGAZIONE+FORTE
in nessun modo,AVV+AVVC+PDETC+PDETN+NEGAZIONE+FORTE
per niente,AVV+AVVC+PC+PAVV+NEGAZIONE+FORTE
per nulla,AVV+AVVC+PC+PAVV+NEGAZIONE+FORTE
niente affatto,AVV+AVVC+CC+AVVAVV+NEGAZIONE+FORTE ⁶

⁵ “not”, “at all”, “by no means”, “neither”

Negation operator Rules: Negative operators can appear alone (56) or, with a strengthening function, together with another negation adverb (57).

(56) {Mica / Affatto / Per niente} bello l’hotel
“Totally not cool the hotel”

(57) L’hotel non è {mica / affatto / per niente} bello
“The hotel is not cool at all”

The general rules concerning negation operators, formalized in the grammars in the form of FSA, are summarized in Tables 4.3, 4.4, 4.5 and 4.6. The local grammar principle is to put in the same embedded graph all the structures that are supposed to receive the same score. Such score is indicated in the column Shifted Polarity.
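Because every combination of operator strength and prior polarity receives a fixed score, the content of Tables 4.3 to 4.6 can be read as four lookup tables. The following Python sketch transcribes them; the labels of the operator types and the function name are illustrative, and returning the prior polarity unchanged for the combinations that the tables do not list (e.g. a neutral word under a plain operator) is an assumption of the sketch.

```python
# Prior polarity -> shifted polarity, transcribed from Tables 4.3 to 4.6.
NEGATION_RULES = {
    "plain": {3: -1, 2: -2, 1: -2, -1: 1, -2: 1, -3: -1},          # Table 4.3 (non)
    "repeated": {3: -2, 2: -3, 1: -3, -1: 2, -2: 2, -3: 1},        # Table 4.4 (non ... mica)
    "strong": {3: -2, 2: -3, 1: -3, -1: 2, -2: 2, -3: 1},          # Table 4.5 (per niente)
    "weak": {3: -1, 2: -1, 1: -1, 0: -1, -1: 1, -2: 1, -3: -1},    # Table 4.6 (poco)
}


def negate(prior_polarity, operator_type):
    """Shifted polarity of a sentiment word in the scope of a negation operator."""
    rules = NEGATION_RULES[operator_type]
    # Assumption: combinations absent from the tables keep their prior polarity.
    return rules.get(prior_polarity, prior_polarity)


print(negate(2, "plain"))    # non ... valide, example (54)
print(negate(3, "plain"))    # non proprio spettacolare, example (55)
print(negate(0, "weak"))     # poco nuovo
```

The first two calls return -2 and -1, the scores annotated in examples (54) and (55).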

Negative Quantifiers

• Adjectives
nessuno,A+FLX=A114+NEGAZIONE
niente,A+FLX=A605+NEGAZIONE ⁷

⁶ “no way”, “for anything in the world”, “there’s no way”, “for nothing”, “not at all”
⁷ “nobody”, “nothing”

Negation Operator   Sentiment Word   Word Polarity   Shifted Polarity
non                 fantastico       +3              -1
                    bello            +2              -2
                    carino           +1              -2
                    scialbo          -1              +1
                    brutto           -2              +1
                    orribile         -3              -1

Table 4.3: Negation rules

Negation Operator   Negation Operator   Sentiment Word   Word Polarity   Shifted Polarity
non                 mica                fantastico       +3              -2
                                        bello            +2              -3
                                        carino           +1              -3
                                        scialbo          -1              +2
                                        brutto           -2              +2
                                        orribile         -3              +1

Table 4.4: Negation rules with the repetition of negative operators

Strong Negation Operator   Sentiment Word   Word Polarity   Shifted Polarity
per niente                 fantastico       +3              -2
                           bello            +2              -3
                           carino           +1              -3
                           scialbo          -1              +2
                           brutto           -2              +2
                           orribile         -3              +1

Table 4.5: Negation rules with strong operators

Weak Negation Operator   Sentiment Word   Word Polarity   Shifted Polarity
poco                     fantastico       +3              -1
                         bello            +2              -1
                         carino           +1              -1
                         nuovo            0               -1
                         scialbo          -1              +1
                         brutto           -2              +1
                         orribile         -3              -1

Table 4.6: Negation rules with weak operators

• Adverbs
niente,AVV+NEGAZIONE
nulla,AVV+NEGAZIONE
poco,AVV+NEGAZIONE+DEB
poco o niente,AVV+PCONG+NKN+NEGAZIONE+DEB
poco o nulla,AVV+PCONG+NKN+NEGAZIONE+DEB
mai,AVV+NEGAZIONE ⁸

• Pronouns
nessuno,PRON+FLX=PRO717+NEGAZIONE
niente,PRON+FLX=PRO719+NEGAZIONE
poco,PRON+FLX=PRO721+NEGAZIONE+DEB ⁹

Negative Quantifiers Rules: As indicated in the dictionary extract above, and as exemplified in the sentences below, negative quantifiers, which express both a negation and a quantification, can take the shape of different parts of speech and, consequently, can assume different syntactic positions in sentences.

We can directly observe that they change their function on the basis of the different roles they assume.

Negative quantifiers as Adjectives: they tend to be pragmatically

⁸ “nothing”, “little”, “little or nothing”, “never”
⁹ “not one”, “few”


negative when occurring as adjectival modifiers (58, 59), just in the same way as lexical negation indicators (Potts, 2011).

(58) Cortesia nessuna [-2]
“No kindness”

(59) Nessun servizio nelle stanze [-2]
“No room service”

Negative quantifiers as Adverbs: they work exactly as poco when they occupy the adverbial position (60, 61), following the rules formalized in the next paragraph (see Table 4.6). As adverbs, they can occur together with negation operators (61) or not (60).

(60) Costa quasi niente [+1]
“It costs almost nothing”

(61) Non costa nulla [+2]
“It doesn’t cost anything”

Negative quantifiers as Pronouns: they seem to possess the same strengthening function as negative operators when occurring as pronouns (62, 63); the rules of reference are in Table 4.5.

(62) Non lo consiglio a nessuno [-3]
“I don’t recommend it to anyone”

(63) Non gliene frega niente a nessuno [-3]
“Nobody cares”

Negative quantifiers in verbless sentences: in elliptical sentences (64, 65) they resemble the negation function of negative operators, following the rules of Table 4.3.

(64) Niente di veramente innovativo [-1]
“Nothing truly innovative”

(65) Nulla da eccepire [+1]
“No objections”


[Figure: Lexical Negation. Nouns: mancanza, assenza, carenza; Verbs: mancare, difettare, scarseggiare; Adjectives: mancante, carente, privo; preposition: di]¹⁰

Lexical negation Rules: lexical negation regards content-word negators that always possess an implicit negative influence. Therefore, when they occur in texts, their polarity is assumed to be negative. Because they can co-occur with both negative operators and quantifiers, their polarity can also be shifted to positive by simply applying the negation rules conceived for those first two negation indicators.

4.4 Modality Detection

4.4.1 Related Works

Speculative Language Detection (Dalianis and Skeppstedt 2010, Desclés et al. 2010, Vincze et al. 2008), Hedge Detection (Lakoff 1973, Ganter and Strube 2009, Zhao et al. 2010), Irrealis Blocking (Taboada et al. 2011) and Uncertainty Detection (Rubin 2010) are all tasks that completely or partially overlap with the analysis of modal constructions.

¹⁰ “lack”, “to lack in”, “to be lacking of”


In order to account for the most popular annotated corpora in these NLP fields, we mention the BioScope corpus (Vincze et al. 2008), which has been annotated with negation and speculation cues and their scopes; FactBank, which has been annotated with event factuality (Saurí and Pustejovsky 2009); and the CoNLL 2010 Shared Task (Farkas et al. 2010), which focused on hedge detection in natural language texts.

Modality has been organized by Kobayakawa et al. (2009) into a taxonomy that includes request, recommendation, desire, will, judgment, etc.

Taboada et al. (2011) did not consider words in the scope of irrealis markers to be reliable for the purposes of sentiment analysis. Therefore, they simply ignored the semantic orientation of any word in that scope. Their list of irrealis markers includes modals, conditional markers (e.g. “if”), negative polarity items (e.g. “any” and “anything”), some verbs (“to expect”, “to doubt”), questions, and words enclosed in quotes (which may not necessarily reflect the author’s opinion).

According to Benamara et al. (2012), modality can be used to express possibility, necessity, permission, obligation or desire through grammatical cues such as adverbial phrases (e.g. “maybe”, “certainly”), conditional verbal moods, some verbs (e.g. “must”, “can”, “may”), and some adjectives and nouns (e.g. “a probable cause”).

Relying on the categories of Larreya (2004) and Portner (2009), they distinguished the following three types of modality, each of which is said to have a specific effect on the opinion expression in its scope, in terms of strength or degree of certainty:

Buletic modality: it indicates the speaker’s desires/wishes.

Cues: verbs denoting hope (e.g. “to wish”).

Epistemic modality: it refers to the speaker’s personal beliefs and affects the strength and the certainty degree of opinion expressions.

Cues: doubt, possibility or necessity adverbs (e.g. “perhaps”, “definitely”) and modal verbs (e.g. “have to”, “may/can”).

Deontic modality: it indicates possibility/impossibility or obligation/permission.

Cues: the same modal verbs as the epistemic modality, but with a deontic reading (e.g. “You must go see the movie”).

4.4.2 Modality in SentIta

When computing the Prior Polarities of the SentIta items in their textual context, we considered that modality can also have a significant impact on the SO of sentiment expressions. In line with the literature, but without specifically focusing on the modality categories of Benamara et al. (2012), we included the following linguistic cues in the FSA dedicated to modality and made them interact with the SentIta expressions:

Sharpening and Softening Adverbs:

• simple adverbs (32 sharp and 12 soft)
inequivocabilmente,AVV+FORTE+Focus=SHARP
necessariamente,AVV+FORTE+Focus=SHARP
relativamente,AVV+DEB+Focus=SOFT
suppergiù,AVV+DEB+Focus=SOFT

• compound adverbs (38 sharp and 26 soft)
con ogni probabilità,AVV+AVVC+PDETC+PDETN+Focus=SHARP
senza alcun dubbio,AVV+AVVC+PDETC+PDETN+Focus=SHARP
fino a un certo punto,AVV+AVVC+PAC+PDETAGGN+Focus=SOFT
in qualche modo,AVV+AVVC+PDETC+PDETN+Focus=SOFT

Modal Verbs: from the LG class 56 (dovere “have to”, potere “may/can”, volere “want”), which have as definitional structure N0 V (E + Prep) Vinf W and accept in position N1 a direct or prepositional infinitive clause (all of them) and, only in the cases of potere and volere, a N1-um.

Conditional and Imperfect Tenses: tenses are located after the preprocessing of the texts, through the annotations Tempo=IM and Tempo=C.

Dealing with modality is very challenging in the Sentiment Analysis field because, in this research area, it is not enough to simply detect its indicators: it is also necessary to manage its effects in terms of polarity. We did not find it appropriate to simplify the problem by ignoring the semantic orientation of all the phrases and sentences in which modality can be detected (Taboada et al. 2011). In fact, as exemplified below, there are cases in which modality significantly affects the intensity of an opinion and other cases in which it regularly shifts the sentence polarity in specific directions.

Uncertainty Degrees: the words annotated with the labels +SHARP and +SOFT described above, just as intensifiers and downtoners, have the power of modifying the intensity of the words endowed with positive or negative prior polarities, with the same effects described in Section 4.2. We selected them, respectively, among the words tagged with the labels FORTE and DEB, with the sole aim of providing an exhaustive module that can work well for both Modality Detection and Sentiment Analysis purposes.

“Potere” + Indicative Imperfect + Oriented Item: modal verbs occurring with an imperfect tense can turn a sentence into a weakly negative one when combined with positive words (66), or into a weakly positive one when occurring with negative items (67). The effect is very close to that of weak negation (see Table 4.6 in Section 4.3). This can be explained by the fact that the meaning of sentences of this kind always concerns subverted expectations.

(66) Poteva[Modal+IM] essere una trama interessante[+2] [-1]
“It could be an interesting plot”

(67) Poteva[Modal+IM] essere una trama noiosa[-2] [+1]
“It could be a boring plot”

“Potere” + Indicative Imperfect + Comparative + Oriented Items: also in this case we rely on the explanation of subverted expectations. The terms of comparison are, obviously, the expected opinion object and the real one. If the expected object is better than the real one, the sentence score is negative (68); if it is worse, the sentence score is positive (69). The rules for the polarity score attribution are the ones displayed in Section 4.5.

(68) Poteva[Modal+IM] essere più[I-CW] dettagliata[+2] [-1]
“It might have been more detailed”

(69) Poteva[Modal+IM] andare peggio[I-OpW +2] [-1]
“It might have gone worse”

“Dovere” + Indicative Imperfect: here again we face the disappointment of expectations, but with a stronger effect: the negation does not regard the possibility of the opinion object being good, but its necessity. Therefore, we assumed the final score in these cases to follow the negation rules shown in Table 4.3 in Section 4.3 (70).

(70) Questo doveva[Modal+IM] essere un film di sfumature[+1] [-2]
“This one was supposed to be a nuanced movie”

“Dovere” + “Potere” + Past Conditional: past conditional sentences (Narayanan et al. 2009) also have a particular impact on the opinion polarity, especially with the modal verb dovere (“have to”). In such cases, as exemplified in (71), the sentence polarity is always negative in our corpus.

(71) Non[Negation] avrei[Aux+C] dovuto[Modal+PP] buttare via i miei soldi [-2]
“I should not have burnt my money”

The verb potere, instead, follows the same rules described above for the imperfect tense.
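The behaviour of potere with the imperfect tense, as exemplified in (66) and (67), can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name potere_imperfect_score is hypothetical, the treatment of neutral items (left at 0) is an assumption not stated in the text, and the rules for dovere are omitted because they depend on a table not reproduced here.

```python
def potere_imperfect_score(prior_polarity: int) -> int:
    """Sentence score for 'poteva + oriented item', following examples (66) and (67).

    Hypothetical helper: it is not part of SentIta or Nooj.
    """
    if prior_polarity > 0:
        return -1  # positive item -> weakly negative sentence, as in (66)
    if prior_polarity < 0:
        return +1  # negative item -> weakly positive sentence, as in (67)
    return 0       # assumption: neutral items are left neutral


if __name__ == "__main__":
    print(potere_imperfect_score(+2))  # interessante[+2]
    print(potere_imperfect_score(-2))  # noiosa[-2]
```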

4.5 Comparative Sentences Mining

4.5.1 Related Works

Comparative Sentence Mining and Comparison Mining Systems are NLP techniques and tools that only in part coincide with the Sentiment Analysis field.

Sentences that express a comparison generally carry along with them opinions about two or more entities, with regard to their shared features or attributes (Ganapathibhotla and Liu 2008). While the main purpose of Sentiment Analysis is to identify details about such opinions, the main goals of Information Extraction regard comparative sentence detection and classification. Therefore, although the object of the analysis and the basic information units are shared by the two approaches, their purposes can sometimes coincide and in other cases differ.

Among the few studies on the computational treatment of comparison (Jindal and Liu 2006a,b, Ganapathibhotla and Liu 2008, Fiszman et al. 2007, Yang and Ko 2011), only Ganapathibhotla and Liu (2008) tried to extract the authors’ preferred entities and to identify the Semantic Orientation of comparative sentences after their identification and classification.

With the following two examples, Ganapathibhotla and Liu (2008) showed the difficulty of determining the polarity of context-dependent opinion comparatives, which always require domain knowledge for a correct interpretation: e.g. the word “longer” can assume a positive (72) or negative (73) orientation, depending on the product feature with which it is associated.

(72) “The battery life of Camera X is longer than Camera Y”
(73) “Program X’s execution time is longer than Program Y”

Since this work aims to provide a general-purpose lexical and grammatical database for Sentiment Analysis and Opinion Mining, we did not address here the treatment of this special kind of opinionated sentences, just as we did not include in the lexicon items with domain-dependent meaning.

Concerning the comparative classification, the main types that have been identified in the literature are the following:

• Gradable (direct comparison)
  – Non-equal
      Increasing
      Decreasing
  – Equative
  – Superlative
• Non-gradable (implicit comparisons)

According to Ganapathibhotla and Liu (2008) and Jindal and Liu (2006a,b), a comparative relation can be represented in the following way:

(ComparativeWord, Features, EntityS1, EntityS2, Type)

where ComparativeWord is the keyword used to express a comparative relation in a sentence, Features is the set of features being compared, EntityS1 is the first term of comparison and EntityS2 is the second one. This way, examples (72) and (73) are represented as follows:


(72) (longer, battery life, Camera X, Camera Y, Non-equal Gradable)

(73) (longer, execution time, Program X, Program Y, Non-equal Gradable)

Our work shares with Ganapathibhotla and Liu (2008) the following basic rules:

Increasing comparative + Negative item → Negative opinion → EntityS2 is preferred

Increasing comparative + Positive item → Positive opinion → EntityS1 is preferred

Decreasing comparative + Negative item → Positive opinion → EntityS1 is preferred

Decreasing comparative + Positive item → Negative opinion → EntityS2 is preferred
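The comparative relation and the four basic rules above can be sketched in Python as follows. This is an illustrative reading, not the implementation of Ganapathibhotla and Liu (2008) nor of the FSA used in this work: the class ComparativeRelation, the function preferred_entity and the string values "increasing" and "decreasing" are all hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ComparativeRelation:
    """(ComparativeWord, Features, EntityS1, EntityS2, Type); field names are assumptions."""
    comparative_word: str
    features: Tuple[str, ...]
    entity_s1: str
    entity_s2: str
    relation_type: str


def preferred_entity(relation: ComparativeRelation, direction: str,
                     item_polarity: int) -> Tuple[str, str]:
    """Apply the four basic rules; return (opinion, preferred entity)."""
    if direction not in ("increasing", "decreasing"):
        raise ValueError("direction must be 'increasing' or 'decreasing'")
    if item_polarity == 0:
        raise ValueError("the rules are only stated for polarized items")
    positive_item = item_polarity > 0
    # Increasing + positive and decreasing + negative yield a positive opinion;
    # the two mixed combinations yield a negative one.
    positive_opinion = (direction == "increasing") == positive_item
    opinion = "positive" if positive_opinion else "negative"
    entity = relation.entity_s1 if positive_opinion else relation.entity_s2
    return opinion, entity


if __name__ == "__main__":
    rel = ComparativeRelation("longer", ("battery life",), "Camera X", "Camera Y",
                              "Non-equal Gradable")
    # Hypothetical inputs: an increasing comparative with a positive item.
    print(preferred_entity(rel, "increasing", +2))
```

The direction and the item polarity are passed in explicitly because, as noted above, context-dependent comparatives such as “longer” cannot be resolved without domain knowledge.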

However, differently from our work, the authors simply switched the polarity of the opinions (and, consequently, the preferred entity) in the cases in which comparative sentences contained negation words or phrases. They underlined that this operation could sometimes appear problematic: e.g. “not longer” does not mean “shorter”. Actually, the problem is that the authors did not take into account different degrees of positivity and negativity. As we will show in the next paragraphs of this Section, the co-occurrences of weakly positive/negative and intensely positive/negative Prior Polarities with comparison indicators can generate different final sentence scores.

As regards the automatic analysis of superlatives, we mention the work of Bos and Nissim (2006), who grounded the semantic interpretation of superlatives on the characterization of a comparison set (the set of entities that are compared to each other with respect to a certain dimension). The authors took into consideration the superlative modification performed by ordinals, cardinals or adverbs (e.g. intensifiers or modals), and also the fact that the superlative can manifest itself in both predicative and attributive form.

Their superlative parser, able to recognise a superlative expression and its comparison set with high accuracy, provides a Combinatory Categorial Grammar (CCG) (Clark and Curran 2004) derivation of the input sentence, which is then used to construct a Discourse Representation Structure (DRS) (Kamp and Reyle 1993).

In this thesis we did not go through a semantic and logical analysis of comparative and superlative sentences; as explained in the next paragraph, we just used them as Contextual Valence Shifters (CVS) in order to accurately modify the polarity of the entities they involve.

4.5.2 Comparison in SentIta

In general, this work, which focuses on Gradable comparison, includes the analysis of the following comparative structures:

• equative comparative frozen sentences of the type N0 Agg come C1, from the LG class PECO (see Section 3.4.2);

• opinionated comparative sentences that involve the expressions meglio di, migliore di “better than”, peggio di, peggiore di “worse than”, superiore a “superior to”, inferiore a “less than” (74);

• adjectival, adverbial and nominal comparative structures that respectively involve oriented adjectives, adverbs and nouns from SentIta (75);

• absolute superlative (75);

• comparative superlative (76).

The comparison with other products (74) has been evaluated with the same measures as the other sentiment expressions, so the polarity can range from -3 to +3.

The superlative, which may (76) or may not (75) imply a comparison with a whole class of items (Farkas and Kiss 2000), confers to the first term of the comparison the highest polarity score, so it always increases the strength of the opinion. Its polarity can be -3 or +3.

(74) L’S3 è complessivamente superiore all’Iphone5 [+2]
“The S3 is on the whole superior to the iPhone5”

(75) Il suo motore era anche il più brioso[+2] [+3]
“Its engine was also the most lively”

(76) Un film peggiore di qualsiasi telefilm [-3]
“A film worse than any TV series”

We did not include the equative comparative in the Contextual Valence Shifters Section because it does not change the polarity expressed by the opinion words (77).

(77) [Non[Negation] entusiasmante[+3]][-1] come i PES degli anni precedenti
“Not as exciting as the PES of the past years”

In the next paragraphs we will go through the rules used in this FSA network to compute the Semantic Orientations of minority and majority comparative structures. As will be observed, the variety and the complexity of co-occurrence rules can be easily managed using finite-state technology.

4.5.3 Interaction with the Intensification Rules

Rule I: increasing comparative sentences which occur with positive items from SentIta generate a positive opinion. Comparative indicators are the following:

• Increasing Comparative Words (I-CW), e.g. più “more”

• Intensifiers + Increasing Comparative Words (Int + I-CW), e.g. molto più “a lot more”

The sentence score depends on the interaction of I-CWs, intensifiers and the prior polarity scores attributed to the SentIta words in the electronic dictionaries (see Table 4.7).

Rule II: increasing comparative sentences with increasing opinionated comparative items listed in SentIta with positive Prior Polarities generate a positive opinion. Comparative indicators are the following:

• Increasing Opinionated Comparative Words (I-Op-CW), e.g. meglio “better” = +2

• Intensifiers + Increasing Opinionated Comparative Words (Int + I-Op-CW), e.g. molto meglio “a lot better” = +3

The sentence score just depends on the Prior Polarity of the I-Op-CW.

Increasing comparative      Examples        Positive Prior Polarities
                                            carino    buono     prodigioso
                                            +1        +2        +3
                                            Entity 1 Polarities
I-CW + OpW[POS]             più             +1        +2        +3
Int + I-CW + OpW[POS]       molto più       +2        +3        +3

Table 4.7: Rule I, increasing comparative sentences with positive orientation

Rule III: decreasing comparative sentences which occur with positive items from SentIta generate a negative opinion (see Table 4.8). Comparative indicators are the following:

• Decreasing Comparative Words (D-CW), e.g. meno “less”

• Intensifiers + Decreasing Comparative Words (Int + D-CW), e.g. molto meno “much less”

Decreasing comparative      Examples        Positive Prior Polarities
                                            carino    buono     prodigioso
                                            +1        +2        +3
                                            Entity 1 Polarities
D-CW + OpW[POS]             meno            -1        -2        -1
Int + D-CW + OpW[POS]       molto meno      -2        -3        -1

Table 4.8: Rule III, decreasing comparative sentences with negative orientation

Increasing comparative      Examples        Negative Prior Polarities
                                            distratto cafone    osceno
                                            -1        -2        -3
                                            Entity 1 Polarities
I-CW + OpW[NEG]             più             -1        -2        -3
Int + I-CW + OpW[NEG]       molto più       -2        -3        -3

Table 4.9: Rule IV, increasing comparative sentences with negative orientation

Decreasing comparative      Examples        Negative Prior Polarities
                                            distratto cafone    osceno
                                            -1        -2        -3
                                            Entity 1 Polarities
D-CW + OpW[NEG]             meno            +1        +1        +1
Int + D-CW + OpW[NEG]       molto meno      +2        +2        +1

Table 4.10: Rule VI, decreasing comparative sentences with positive orientation

Rule IV: this rule mirrors Rule I exactly. Increasing comparative sentences which occur with negative items from SentIta generate a negative opinion. The sentence score depends again on the interaction of I-CWs, intensifiers and the prior polarity scores attributed to the SentIta words that appear in the same sentence (see Table 4.9).

Rule V: decreasing comparative sentences with decreasing opinionated comparative items listed in SentIta with negative Prior Polarities generate a negative opinion. Comparative indicators are the following:

• decreasing opinionated comparative words (I-Op-CW), e.g. peggio “worse” = -2

• Intensifier + decreasing opinionated comparative word (Int + I-Op-CW), e.g. molto peggio “much worse” = -3

Just as in Rule II, the sentence score just depends on the Prior Polarity of the I-Op-CWs.

Rule VI: decreasing comparative sentences which occur with negative items from SentIta generate a positive opinion (see Table 4.10). Rule VI shares the comparative indicators with Rule III.

Increasing comparative          Examples          Positive Prior Polarities
                                                  carino    buono     prodigioso
                                                  +1        +2        +3
                                                  Entity 1 Polarities
N + I-CW + OpW[POS]             non più           -1        -1        -1
N + Int + I-CW + OpW[POS]       non molto più     -1        -1        -1

Table 4.11: Negation of Rule I, negated increasing comparative sentences with negative orientation

Decreasing comparative          Examples          Positive Prior Polarities
                                                  carino    buono     prodigioso
                                                  +1        +2        +3
                                                  Entity 1 Polarities
N + D-CW + OpW[POS]             non meno          +1        +2        +3
N + Int + D-CW + OpW[POS]       non molto meno    +1        +1        +2

Table 4.12: Negation of Rule III, negated decreasing comparative sentences with positive orientation
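The interaction between comparative words, intensifiers and Prior Polarities described by Rules I, III, IV and VI can be summarised as a lookup. The Python sketch below only transcribes the values reported in the corresponding tables; the dictionary COMPARATIVE_SCORES, the function comparative_score and the labels "increasing" and "decreasing" are hypothetical names, and the actual resource is a finite-state network, not a table lookup.

```python
# (direction, intensified) -> {prior polarity of the SentIta item: sentence score}
# Values transcribed from the tables for Rules I, III, IV and VI.
COMPARATIVE_SCORES = {
    ("increasing", False): {+1: +1, +2: +2, +3: +3, -1: -1, -2: -2, -3: -3},  # più
    ("increasing", True):  {+1: +2, +2: +3, +3: +3, -1: -2, -2: -3, -3: -3},  # molto più
    ("decreasing", False): {+1: -1, +2: -2, +3: -1, -1: +1, -2: +1, -3: +1},  # meno
    ("decreasing", True):  {+1: -2, +2: -3, +3: -1, -1: +2, -2: +2, -3: +1},  # molto meno
}


def comparative_score(direction: str, prior_polarity: int,
                      intensified: bool = False) -> int:
    """Return the sentence score for a comparative word plus an oriented item."""
    try:
        return COMPARATIVE_SCORES[(direction, intensified)][prior_polarity]
    except KeyError:
        raise ValueError("unsupported direction or prior polarity") from None


if __name__ == "__main__":
    print(comparative_score("increasing", +2, intensified=True))  # molto più buono
    print(comparative_score("decreasing", -3))                    # meno osceno
```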

4.5.4 Interaction with the Negation Rules

Negation of Rule I: increasing comparative sentences which occur with positive items from SentIta, when negated, generate a negative opinion. Comparative indicators are, of course, the same as in Rule I, with the addition of negation indicators (N), e.g. non “not”. The sentence score is always turned into the weakly negative one (see Table 4.11).

Negation of Rule II: increasing comparative sentences in which increasing opinionated comparative items listed in SentIta with positive Prior Polarities occur, when negated, always generate a negative opinion. Comparative indicators are the same as in Rule II:

• Negation markers + Increasing Opinionated Comparative Words (N + I-Op-CW), e.g. non meglio “not better” = -2

• Negation markers + Intensifiers + Increasing Opinionated Comparative Words (N + Int + I-Op-CW), e.g. non molto meglio “not much better” = -1

Negation of Rule III: the negation of Rule III, which involves decreasing comparative sentences with positive items from SentIta, always generates positive opinions. The different sentence scores, computed on the basis of the interaction of comparative words and SentIta’s Prior Polarities, are reported in Table 4.12.

Negation of Rule IV: increasing comparative sentences which occur with negative items from SentIta, when negated, generate positive opinions in most cases (see Table 4.13). The sentence score is influenced by the interaction of negation markers, I-CWs, intensifiers and negative prior polarity scores.

Negation of Rule V: decreasing comparative sentences in which increasing opinionated comparative items listed in SentIta with negative Prior Polarities occur, when negated, generate a weakly negative opinion (non (molto + E) peggio “not (much + E) worse”).

Negation of Rule VI: decreasing comparative sentences which occur with negative items from SentIta generate negative opinions if negated (see Table 4.14).

Increasing comparative          Examples          Negative Prior Polarities
                                                  distratto cafone    osceno
                                                  -1        -2        -3
                                                  Entity 1 Polarities
N + I-CW + OpW[NEG]             non più           +1        +1        -1
N + Int + I-CW + OpW[NEG]       non molto più     +1        +1        -2

Table 4.13: Negation of Rule IV, negated increasing comparative sentences with mostly positive orientation

Decreasing comparative          Examples          Negative Prior Polarities
                                                  distratto cafone    osceno
                                                  -1        -2        -3
                                                  Entity 1 Polarities
N + D-CW + OpW[NEG]             non meno          -1        -2        -3
N + Int + D-CW + OpW[NEG]       non molto meno    -1        -2        -3

Table 4.14: Negation of Rule VI, negated decreasing comparative sentences with negative orientation
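To make concrete why a plain polarity switch under negation is not adequate for graded scores, the following Python sketch contrasts it with the tabulated scores for positive items, in the non-intensified case only. The names BASE, NEGATED and compare_negation are hypothetical, and the "naive" switch is only an illustrative reading of the polarity-switching strategy discussed in the related works, not a reimplementation of it.

```python
# Non-intensified scores for positive SentIta items, transcribed from the tables:
# BASE holds the affirmative scores (più, meno), NEGATED the negated ones (non più, non meno).
BASE = {
    "increasing": {+1: +1, +2: +2, +3: +3},   # più + positive item
    "decreasing": {+1: -1, +2: -2, +3: -1},   # meno + positive item
}
NEGATED = {
    "increasing": {+1: -1, +2: -1, +3: -1},   # non più + positive item
    "decreasing": {+1: +1, +2: +2, +3: +3},   # non meno + positive item
}


def compare_negation(direction: str, prior_polarity: int) -> tuple:
    """Return (naive switched score, tabulated score) for a negated comparative."""
    naive = -BASE[direction][prior_polarity]      # simple sign switch
    tabulated = NEGATED[direction][prior_polarity]
    return naive, tabulated


if __name__ == "__main__":
    # "non più prodigioso": the sign switch would give a strongly negative score,
    # while the table assigns a weakly negative one.
    print(compare_negation("increasing", +3))
```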

Chapter 5

Specific Tasks of the Sentiment Analysis

5.1 Sentiment Polarity Classification

Differently from traditional topic-based text classification, sentiment classification aims to categorize documents through the classes positive and negative. The objective can be a simple binary classification or a more complex classification with a larger number of classes in the continuum between positive and negative (Pang and Lee 2008a). The rating-inference problem (Pang and Lee 2005, Leung et al. 2006, 2008, Shimada and Endo 2008) regards the mapping of automatically detected document polarity scores onto the fine-grained rating scales (e.g. 5/10) of existing Collaborative Filtering algorithms (Breese et al. 1998, Herlocker et al. 1999, Sarwar et al. 2001, Linden et al. 2003). The challenging aspect of sentiment classification is the fact that the sentiment classes, far from being unrelated to one another, represent opposing (binary classification) or ordinal categories (multi-class and rating-inference).


5.1.1 Related Works

Many techniques have been discussed in the literature to perform Sentiment Polarity Classification. In Section 3.2 we divided them into lexicon-based methods, learning methods and hybrid methods¹.

The first, which calculate the sentiment orientation of sentences and documents starting from the polarity of the words and phrases they contain, are the methods that have been chosen for the present work. This is why they received an in-depth analysis in Chapter 3, where we also discussed the pros and cons of both the manual and the automatic drafting of sentiment lexical resources. In this respect, we mention again the major contributions related to knowledge-based solutions: Landauer and Dumais (1997), Hatzivassiloglou and McKeown (1997), Wiebe (2000), Turney (2002), Turney and Littman (2003), Riloff et al. (2003), Hu and Liu (2004), Kamps et al. (2004), Vermeij (2005), Gamon and Aue (2005), Esuli and Sebastiani (2005, 2006a), Kanayama and Nasukawa (2006a), Benamara et al. (2007), Kaji and Kitsuregawa (2007), Rao and Ravichandran (2009), Mohammad et al. (2009), Neviarouskaya et al. (2009a), Velikovich et al. (2010), Taboada et al. (2006, 2011).

Rule-based approaches that take into account the syntactic dimension of Sentiment Analysis are the ones used by Mulder et al. (2004), Moilanen and Pulman (2007) and Nasukawa and Yi (2003).

The most used classification algorithms in learning and statistical methods are Support Vector Machines, which, trained on specific datasets, map a space of unstructured data points onto a new structured space through a kernel function, and Naïve Bayes, which are probabilistic classifiers that connect the presence/absence of a class feature to the absence/presence of other features, given the class variable. Effective sentiment features exploited in supervised machine learning applications are terms, their frequencies and TF-IDF, parts of speech, sentiment words and phrases, sentiment shifters, syntactic dependency, etc.

Tan et al. (2009) and Kang et al. (2012) adapted Naïve Bayes algorithms for sentiment analysis.

Pang et al. (2002), with the purpose of testing the hypothesis that sentiment classification can be treated just as a special case of topic-based categorization, with the positive and negative sentiment considered as topics, used SVM, Naïve Bayes and Maximum Entropy classifiers with diverse sets of features.

Mullen and Collier (2004) employed the values potency, activity and evaluative from a variety of sources and created a feature space classified by means of SVM.

Ye et al. (2009) compared SVM with other statistical approaches, proving that SVM and n-gram outperform the Naïve Bayes approach.

The minimum-cut framework, working on graphs, has been used by Pang and Lee (2004).

Bespalov et al. (2011) performed sentiment classification via latent n-gram analysis.

Nakagawa et al. (2010) presented a dependency tree-based classification method grounded on conditional random fields.

Compositionality algorithms have been tested by Yessenalina and Cardie (2011) and Socher et al. (2013). Socher et al. (2013) proposed a Recursive Neural Tensor Network able to compute compositional vector representations for phrases of variable length or syntactic type, outperforming standard recursive neural networks, matrix-vector recursive neural networks, Naïve Bayes, bi-gram Naïve Bayes and SVM. Their output is the annotated corpus Stanford Sentiment Treebank, which represents a precious resource for the analysis of the compositional effects of sentiment in language.

Sentiment indicators can be used for sentiment classification in an unsupervised manner: this is the case of Turney (2002) and the other lexicon-based methods (Liu 2012).

Finally, as regards the hybrid methods, we mention the works of Andreevskaia and Bergler (2008), who integrated the accuracy of a corpus-based system with the portability of a lexicon-based system in a single ensemble of classifiers; Aue and Gamon, who combined nine SVM-based classifiers; Dasgupta and Ng (2009), who first mined the unambiguous reviews through spectral techniques and afterwards exploited them to classify the ambiguous reviews by combining active learning, transductive learning and ensemble learning; Goldberg and Zhu (2006), who presented an adaptation of graph-based learning algorithms for rating-inference problems; and Prabowo and Thelwall (2009), who combined rule-based and machine learning classification algorithms.

¹ In general, while hybrid methods are characterized by the application of classifiers in sequence, approaches based on rules consist of an if-then relation between an antecedent and its associated consequent (Prabowo and Thelwall 2009).

5.1.2 DOXA: a Sentiment Classifier based on FSA

Using the command-line program noojapply.exe, we built Doxa, a prototype written in Java by which users can automatically apply the Nooj lexicon-grammatical resources from Section 3 and the rules from Section 4 to every kind of text, getting back statistics on the opinions expressed in each case.

Doxa sums up the values corresponding to every sentiment expression and then normalizes the result by the total number of sentiment expressions contained in each review. Neutral sentences are simply ignored by both the FSA and Doxa.

The automatically attributed polarity scores are compared with the stars that the opinion holder gave to his review. Then, through the statistics about the opinions expressed in every domain, Doxa tests the consistency of the sentiment lexical databases and tools with respect to the rating-inference problem, in each document and in the whole corpus.

Figure 5.1 reports an example of the analysis on the domain of hotel reviews.
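The scoring step described above, that is, summing the values of the sentiment expressions and dividing by their number, can be sketched as follows. This is not Doxa's Java code: the function names, the threshold at zero for the document label, and the rule used to compare the label with the star rating (stars_threshold) are all assumptions introduced for illustration, since the exact mapping is not specified here.

```python
from typing import Iterable, Optional


def document_score(expression_scores: Iterable[int]) -> Optional[float]:
    """Average the scores of the sentiment expressions found in one review.

    Neutral (zero) scores are ignored; None is returned when nothing polar is found.
    """
    polar = [s for s in expression_scores if s != 0]
    if not polar:
        return None
    return sum(polar) / len(polar)


def polarity_label(score: float) -> str:
    """Assumption: a positive average means a positive document, and vice versa."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def agrees_with_stars(score: float, stars: int, stars_threshold: int = 3) -> bool:
    """Hypothetical check: ratings above stars_threshold count as positive reviews."""
    expected = "positive" if stars > stars_threshold else "negative"
    return polarity_label(score) == expected


if __name__ == "__main__":
    scores = [+3, -1, +2, 0]          # hypothetical annotations for one review
    avg = document_score(scores)
    print(avg, polarity_label(avg), agrees_with_stars(avg, stars=4))
```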


Figure 5.1: Doxa, the LG sentiment classifier based on the finite-state technology

5.1.3 A Dataset of Opinionated Reviews

Despite the fact that, for the English language, there is a large availability of corpora for sentiment analysis (Hu and Liu 2004, Pang and Lee 2004, 2005, Wiebe et al. 2005, Wilson et al. 2005, Thomas et al. 2006, Jindal and Liu 2007, Snyder and Barzilay 2007, Blitzer et al. 2007, Seki et al. 2007, Macdonald et al. 2007, Pang and Lee 2008b, among others), the Italian language presents fewer contributions in this sense (Basile and Nissim 2013, Bosco et al. 2013, 2014).

The dataset used to evaluate the lexical and grammatical resources of the present work has been built from scratch, using Italian opinionated texts in the form of users’ reviews and comments from e-commerce and opinion websites. It contains 600 text units and refers to six different domains, for all of which different websites have been exploited:


Cars: www.ciao.it

Smartphones: www.tecnozoom.it, www.ciao.it, www.amazon.it, www.alatest.it

Books: www.amazon.it, www.qlibri.it

Movies: www.mymovies.it, www.cinemalia.it, www.filmtv.it, www.filmscoop.it

Hotels: www.tripadvisor.it, www.expedia.it, www.venere.com, it.hotels.com, www.booking.com

Videogames: www.amazon.it

Details about the corpus are shown in Table 5.1.

Text Features   Cars     Smartphones   Books    Movies   Hotels   Games   Tot
Neg Docs        50       50            50       50       50       50      300
Pos Docs        50       50            50       50       50       50      300
Text Files      20       20            20       20       20       20      120
Word Forms      17163    19226         8903     37213    12553    5597    101655
Tokens          21663    24979         10845    45397    16230    7070    126184

Table 5.1: Dataset of opinionated online customer reviews

Adjectives, as is commonly recognized in the literature (Hatzivassiloglou and McKeown 1997, Hu and Liu 2004, Taboada et al. 2006), seem to be the most reliable SO indicators, considering that 17% of the adjectives occurring in the corpus are polarized, compared to 3% of the adverbs, 2% of the nouns and 7% of the verbs.

Moreover, considering all the opinion-bearing words in the corpus, the adjectives’ patterns cover 81% of the total number of occurrences (almost 5000 matches), while the adverbs, the nouns and the verbs reach, respectively, a percentage of 4%, 6% and 2%. The remaining 7% is covered by the other sentiment expressions that, in any case, contribute to the achievement of satisfactory levels of Recall.

As regards the Semantic Orientation, while in the dictionaries almost 70% of the words are connoted by a negative Prior Polarity, the numbers of positive and negative expressions in our perfectly balanced corpus lead to almost antipodal results: 64% of the sentiment expressions possess a positive polarity. In particular, 61% of the adjective expressions, 77% of the adverb expressions, 65% of the noun expressions and 51% of the verb expressions are positive.

The domain-independent expressions have diametrically opposite percentages: only 34% of the occurrences are positive. This result is due to the presence of the very productive negative metanodes troppo “too much” and poco “not much”. When they are deactivated, in fact, the positive percentage reaches 58% of occurrences. Thus, it could be stated that, although the lexicon makes available a greater number of negative items, the real occurrences of the opinion words tip the balance in the direction of the positive items (see Section 3.3.2.1 for more on the phenomenon of positive biases).

5.1.4 Experiment and Evaluation

In the sentiment classification task, the information unit is represented by an opinionated document as a whole. The purpose of our sentiment tool is to classify such documents on the basis of their overall Semantic Orientation.

To ground this task on a lexicon means to hypothesize that the polarities of opinion words can be considered indicators of the polarity of the document in which the words and the expressions are contained.

Our Lexicon-Grammar based sentiment lexical database confers on elementary sentences the status of minimum semantic units of the language. Therefore, the indicators on which we grounded our classification go beyond the limit of mere words, to match entire elementary sentences or polar phrases from complex sentences.

Because the inference of the whole document SO depends on the correct annotation of its sentences and phrases, the evaluation phase involves both the sentence-level and the document-level performance of the tool.

5.1.4.1 Sentence-level Sentiment Analysis

This Paragraph presents the results pertaining to the Nooj output, which has been produced by applying the SentIta LG sentiment resources to our corpus of customer reviews.

For the sake of clarity and for the reproducibility of the results, it must be pointed out that some of the lexical resources have been applied with higher priority attributions. Details about the preferences are given below:

• H1 to the sentiment adverb and noun dictionaries;

• H2 to the sentiment adjective dictionary;

• H3 to the freely available dictionary Contrazioni, which belongs to the official Italian module of Nooj;

• H4 to a high-level priority dictionary that contains what is commonly called a “stop word list”;

• H5 to a dictionary of compound adverbs.

In order to avoid ambiguity as much as possible, the dictionary of the verbs of sentiment must have the same priority as the Italian standard dictionary (sdic), lower than any other mentioned resource.

Because our lexical and grammatical resources are not domain-specific, we observed their interaction with every single part of the corpus, which is composed of many different domains, each of them characterized by its own peculiarities.
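The effect of such priority attributions can be pictured with a small lookup sketch. It is not Nooj code: the resource names, the sample entries, the tags and the integer ranks are hypothetical, and the rule that a smaller rank is consulted first is an assumption made only for illustration (the actual ordering of the Nooj priority levels should be checked in the Nooj documentation).

```python
from collections import defaultdict

# Hypothetical resources: (name, rank, entries). A smaller rank is consulted
# first. Names, ranks, entries and tags are illustrative, not Nooj data.
RESOURCES = [
    ("sentiment_adjectives", 1, {"bello": "A+POS"}),
    ("sentiment_verbs", 9, {"amo": "V+POS"}),
    ("sdic", 9, {"bello": "A", "amo": "N"}),
]


def lookup(word):
    """Return the analyses of `word` from the best-ranked resources containing it."""
    by_rank = defaultdict(list)
    for name, rank, entries in RESOURCES:
        if word in entries:
            by_rank[rank].append((name, entries[word]))
    if not by_rank:
        return []
    # Only the analyses of the best (smallest) rank survive; resources that
    # share that rank all contribute their analyses.
    return by_rank[min(by_rank)]
```

In this toy setting the sentiment adjective entry hides the standard one for bello, while for amo the two equally ranked resources both return an analysis.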


Moreover, in order to verify the performance of every part of speech (and of the expressions connected to them), we checked, as shown in Table 5.2, the Precision and the Recall by separately applying every single metanode (A, ADV, N, V, D-ind²) of the main graph of the sentiment grammar.

Nodes      Cars    Smartphones   Movies   Books    Hotels   Videogames   Average
A          88      90            79       87.5     91.5     83.5         86.6
ADV        80.4    75.8          87.9     92.3     92       50           79.7
N          81.8    85.7          82.8     77.8     85.3     85.7         83.2
V          88.2    57.1          84.8     89.5     57.1     100          79.5
D-ind      87.9    83.5          78.1     76.7     90       94.7         87.9
Average    85.3    78.4          82.5     84.8     83.2     82.8         82.8

Table 5.2: Precision measure in sentence-level classification

² Domain-independent sentiment expressions (D-ind) refer to all the frozen and semi-frozen expressions that have been directly formalized in the FSA, and also include the vulgar expressions.

The sentence-level Precision has been manually calculated by determining whether the polarity and the intensity score of the expressions were correctly assigned in all the concordances produced by Nooj. We made an exception for the adjective expressions which, due to the extremely large number of concordances, have been evaluated by extracting a sample of 1200 matches (200 for each domain).

The values marked by the asterisks have been reported for the sake of completeness, but they are not really relevant because of the small number of concordances on which they have been calculated.

The best performance has been reached, on average, by the cars dataset, while the lowest values have been assigned to the movies dataset. The performance of the adjective grammar is really satisfying, especially if compared with the high number of matches it produced. Even though the domain-independent node covers a small part of the matches (7%), it reached the best Precision results if compared with the other metanodes.

Anyway, in many cases, just evaluating the polarity and the intensity of the expressions contained in a document is not enough to determine whether such expressions truly contribute to the identification of the polarity and the intensity of the document itself. Sometimes polarized sentences and phrases are not opinion indicators, especially in the reviews of movies and books, in which polarized sentences can just refer to the plot (78).

(78) Le belle case sono dimora della degenerazione più bieca [-3]
“Pretty houses host the grimmest degeneracy”

An opinionated sentence can also be considered not pertinent when it does not directly refer to the object of the opinion, but mentions aspects that are just indirectly connected with the product. Examples of this are (79), which does not refer to the product but to the delivery service, and (80), which in a book review does not refer to the object but to another media product related to the book.

(79) Apprezzo[+2] molto[+] Amazon per questo [+3]
“I really appreciate Amazon for this”

(80) Il film non[Negation] mi era piaciuto[+2] [-2]
“I didn’t like the movie”

In this work we chose to exclude the second sentence from the correct matches, but we considered it worthwhile to include the first one, because of the high influence that this feature has on the numerical value that the opinion holder confers on his own opinion.

Table 5.3 corrects the results reported in Table 5.2 by excluding from the correct matches the ones that do not contribute to the identification of the correct document SO. This makes the average sentence-level Precision drop by 8.7 points, reaching 74.1%, which we considered adequate.

Nodes      Cars    Smartphones   Movies   Books    Hotels   Videogames   Average
A          -5      -5.5          -28      -10.5    -1       -1.5         -8.6
ADV        -2.2    0             -29.7    -7.7     0        0            -6.6
N          -11.4   -14.3         -39.5    -14.8    -5.9     -14.3        -16.7
V          0       0             -17.6    -15.8    0        0            -5.6
D-ind      -8.6    0             -13.3    -6.7     -2.5     -5.3         -6.1
Average    -5.4    -4            -25.6    -11.1    -1.9     -4.2         -8.7

Table 5.3: Irrelevant matches in sentence-level classification

We chose to present these two results separately because, in this way, it is possible to distinguish the domains that are more influenced by the pertinence problem (e.g. movies and books) from the ones that are less affected by it (e.g. hotels and smartphones).
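The correction can be retraced with a few lines of arithmetic. The sketch below assumes that the figures of Tables 5.2 and 5.3 are percentages with one decimal digit (e.g. 85.3 and -5.4 for the cars domain) and simply adds, for every domain, the correction of Table 5.3 to the average Precision of Table 5.2; the variable and function names are ours.

```python
# Average rows of Table 5.2 (Precision, %) and Table 5.3 (correction, points),
# read under the assumption of one decimal digit per value.
DOMAINS = ["Cars", "Smartphones", "Movies", "Books", "Hotels", "Videogames"]
PRECISION = [85.3, 78.4, 82.5, 84.8, 83.2, 82.8]
IRRELEVANT = [-5.4, -4.0, -25.6, -11.1, -1.9, -4.2]


def corrected_precision():
    """Add the (negative) correction of Table 5.3 to the Precision of Table 5.2."""
    return {d: round(p + c, 1) for d, p, c in zip(DOMAINS, PRECISION, IRRELEVANT)}


corrected = corrected_precision()
# Average column of the two tables: 82.8 - 8.7
overall = round(82.8 - 8.7, 1)
```

Under this reading the overall value is 74.1 and the corrected Precision of the hotel domain is 81.3, while the movie domain is the one that loses the most.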

5.1.4.2 Document-level Sentiment Analysis

As far as the document-level performance is concerned (Table 5.4), we calculated the Precision twice: in the first case, by considering as true positives the reviews correctly classified by Doxa on the basis of their polarity and, in the second case, by considering as true positives the documents that received from our tool exactly the same stars specified by the Opinion Holder.

In detail, in the Polarity only row, the True Positives are the documents that have been correctly classified by Doxa, with a polarity attribution that corresponds to the one specified by the Opinion Holder.

In the Intensity also row, the True Positives are the documents that received from Doxa exactly the same stars specified by the Opinion Holder.

Precision        Cars    Smartphones   Movies   Books    Hotels   Videogames   Average
Polarity only    71.0    72.0          63.0     74.0     91.0     72.0         74.0
Intensity also   32.0    45.0          25.0     33.0     49.0     34.0         36.3

Table 5.4: Precision measure in document-level classification

As we can see, the latter seems to have a very low precision but, upon a deeper analysis, we discovered that it is really common for the Opinion Holders to write, in their reviews, texts that do not perfectly correspond to the stars they specified. That increases the importance of a software like Doxa, which does not stop the analysis at the structured data, but enters the semantic dimension of texts written in natural language.
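The two ways of counting the true positives can be sketched as follows. This is only an illustration, not the Doxa implementation: the 1-5 star scale, the neutral midpoint used to derive the polarity and the toy ratings are all assumptions.

```python
def polarity(stars, neutral=3):
    """Map a star rating to +1, -1 or 0; the neutral midpoint is an assumption."""
    return (stars > neutral) - (stars < neutral)


def document_precision(predicted, gold):
    """Return the (polarity only, intensity also) shares of correct reviews."""
    pairs = list(zip(predicted, gold))
    polarity_hits = sum(polarity(p) == polarity(g) for p, g in pairs)
    exact_hits = sum(p == g for p, g in pairs)
    return polarity_hits / len(pairs), exact_hits / len(pairs)


# Toy data, not taken from the corpus: stars given by the tool vs. the reviewer.
predicted = [5, 4, 2, 1, 4]
gold = [4, 4, 1, 1, 2]
polarity_only, intensity_also = document_precision(predicted, gold)
```

As in Table 5.4, the second figure can never exceed the first, because every exact match of the stars is also a match of the polarity.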

Recall           Cars    Smartphones   Movies   Books    Hotels   Videogames   Average
Sentence-level   72.7    79.6          64.8     65.7     72.1     58.8         69.0
Document-level   100     98.6          100      96.1     98.9     91.2         97.5

Table 5.5: Recall in both the sentence-level and the document-level classifications

The Recall pertaining to the sentence-level performance of our tool has been manually calculated on a sample of 150 opinionated documents (25 from each domain), by considering as false negatives the sentiment indicators which have not been annotated by our grammar.

The document-level Recall, instead, has been automatically checked with Doxa, by considering as true positives all the opinionated documents which contained at least one appropriate sentiment indicator. So, the documents in which the Nooj grammar did not annotate any pattern were considered false negatives. Because we assumed that all the texts of our corpus were opinionated, we also considered as false negatives the cases in which the value of the document was 0.

Taking the F-measure into account, the best results were achieved with the smartphone domain (77.0) in the sentence-level task and with the hotel dataset (94.8) in the document-level task.
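As a check on how these values combine, the sketch below computes the balanced F-measure (the harmonic mean of Precision and Recall). That the balanced variant is the one used here is an assumption on our part; it is supported by the fact that the document-level figures of the hotel domain, read as 91.0 (Table 5.4) and 98.9 (Table 5.5), give back the reported 94.8.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of Precision and Recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hotels, document level: Precision 91.0 (Table 5.4), Recall 98.9 (Table 5.5).
hotels_document_level = round(f_measure(91.0, 98.9), 1)
```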

5.2 Feature-based Sentiment Analysis

The purpose of the sentiment analysis based on features is to provide companies with customer opinion overviews, which automatically summarize the strengths and the weaknesses of the products and services they offer. In Section 1 we connected the definition of the opinion features to the following function (Liu, 2010):

$T = O(f)$

where the features ($f$) of an object ($O$) (e.g. hotels) can be brand names (e.g. Artemide, Milestone), properties (e.g. cleanness, location), parts (e.g. rooms, apartments), related concepts (e.g. city, toponyms) or parts of the related concepts (e.g. city centre, tube, museum, etc.) (Popescu and Etzioni, 2007). Liu (2010) represented every object as a “special feature” ($F$) defined by a subset of features ($f_i$):

$F = \{f_1, f_2, \ldots, f_n\}$

Both $F$ and $f_i$, grouped together under the noun target, can be automatically discovered in texts through both synonym words and phrases $W_i$, or other direct or indirect indicators $I_i$:

$W_i = \{w_{i1}, w_{i2}, \ldots, w_{im}\}$

$I_i = \{i_{i1}, i_{i2}, \ldots, i_{iq}\}$
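The sets above translate naturally into a small data structure. The following sketch is only an illustration of the definitions: the class, the function and the sample terms are hypothetical, and the naive substring matching stands in for the real discovery of targets in texts.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """A feature f_i with its synonym set W_i and its indicator set I_i."""
    name: str
    synonyms: set = field(default_factory=set)    # W_i
    indicators: set = field(default_factory=set)  # I_i

    def mentioned_in(self, text):
        """True if the name, a synonym or an indicator occurs in the text."""
        lowered = text.lower()
        terms = {self.name} | self.synonyms | self.indicators
        return any(term in lowered for term in terms)


# Hypothetical object F = {f_1, ..., f_n} for a hotel; terms are illustrative.
HOTEL = [
    Feature("staff", {"personnel", "receptionist"}, {"welcoming"}),
    Feature("location", {"position"}, {"close to the tube"}),
]


def find_targets(text):
    """List the features of HOTEL that are mentioned in the text."""
    return [feature.name for feature in HOTEL if feature.mentioned_in(text)]
```

For instance, `find_targets("The receptionist was kind and the hotel is close to the tube")` finds the staff through a synonym and the location through an indirect indicator.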


It is essential to discover not only the overall polarity of an opinionated document, but also its topic and features, in order to discern the aspect of a product that must be improved, or whether the opinions extracted by the Sentiment Analysis applications are relevant to the product or not. For example, (81), an extract from a hotel review, expresses two opinions on two different kinds of features: the behavior of the staff (81a) and the beauty of the city in which the hotel is located (81b).

(81a) [Opinion [Feature Personale] meravigliosamente accogliente]
“Staff wonderfully welcoming”

(81b) [Opinion [Feature Roma] è bellissima]
“Rome is beautiful”

It is crucial to distinguish between the two topics, because just one of them (81a) refers to an aspect that can be influenced or improved by the company.

5.2.1 Related Works

Pioneering works on feature-based opinion summarization are Hu and Liu (2004, 2006), Carenini et al. (2005), Riloff et al. (2006) and Popescu and Etzioni (2007). Both Popescu and Etzioni (2007) and Hu and Liu (2004) first identified the product features on the basis of their frequency and then calculated the Semantic Orientation of the opinions expressed on these features. In order to find the most important features commented on in reviews, Hu and Liu (2004) used association rule mining, thanks to which frequent itemsets can be extracted from free texts. Redundant and meaningless items are removed during a Feature Pruning phase. Experiments have been carried out on the first 100 reviews belonging to five classes of electronic products (www.amazon.com and www.C|net.com). Hu and Liu (2006) presented the algorithm ClassPrefix-Span, which aimed to find special kinds of patterns, the Class Sequential Rules (CSR), using fixed targets and classes. Their method was based on sequential pattern mining.

Carenini et al. (2005) struck a balance between supervised and unsupervised approaches. They mapped crude (learned) features into a User-Defined taxonomy of the entity’s Features (UDF), which provided a conceptual organization for the information extracted. This method took advantage of a similarity matching, in which the UDF reduced the redundancies by grouping together identical features, and then organized and presented information by using hierarchical relations. Experiments have been conducted on a testing set containing Pros, Cons and detailed free reviews of five products.

Riloff et al. (2006) used the subsumption hierarchy in order to identify complex features and then to reduce a feature set by removing useless features which have, for example, a more general counterpart in the subsumption hierarchy. The feature representations used for opinion analysis are n-grams (unigrams, bigrams) and lexico-syntactic extraction patterns. Popescu and Etzioni (2007) presented OPINE, an unsupervised feature and opinion extraction system that used the Web as a corpus to identify explicit and implicit features, and relaxation-labelling methods to infer the Semantic Orientation of words. The system draws on WordNet’s semantic relations and hierarchies for the identification of the features (parts, properties and related concepts) and the creation of clusters of words.

Ferreira et al. (2008) made a comparison between the likelihood ratio test approach (Yi et al., 2003) and the association mining approach (Hu and Liu, 2004). The e-product dataset (topical documents) and the annotation scheme (improved with some revisions) are the ones used by Hu and Liu (2004). In particular, the revised annotation scheme comprised:

• part-of relationship with the product (e.g. battery is a part of a digital camera);

• attribute-of relationship with the product (weight and design are attributes of a camera);

• attribute-of relationship with the product’s feature (battery life is an attribute of the battery, which is in turn a camera’s feature).

Double Propagation (Qiu et al., 2009) focuses on the natural relation between opinion words and features. Because opinion words are often used to modify features, such relations can be identified thanks to the dependency grammar.

Because these methods have good results only for medium-size corpora, they must be supported by other feature mining methods. The strategy proposed by Zhang et al. (2010) is based on “no” patterns and part-whole patterns (meronymy):

• NP + Prep + CP: “battery of the camera”

• CP + with + NP: “mattress with a cover”

• NP CP or CP NP: “mattress pad”

• CP Verb NP: “the phone has a big screen”
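Two of the part-whole templates listed above, together with the “no” pattern, can be approximated with regular expressions. The sketch below is a deliberate simplification and not the method of Zhang et al. (2010): it works on raw lower-cased English strings, reduces a noun phrase to a single word, and uses an illustrative list of fixed expressions; the remaining templates would need POS tagging or chunking.

```python
import re

# Fixed "no" expressions to be discarded; the list is purely illustrative.
FIXED_NO = {"no problem"}


def feature_candidates(sentence, concept):
    """Collect words linked to `concept` by simplified part-whole or "no" patterns."""
    s = sentence.lower()
    c = re.escape(concept)
    found = []
    # NP + Prep + CP, e.g. "battery of the camera"
    found += re.findall(rf"\b(\w+) (?:of|in|on) (?:the |a )?{c}\b", s)
    # CP + with + NP, e.g. "mattress with a cover"
    found += re.findall(rf"\b{c} with (?:the |a )?(\w+)", s)
    # "no" pattern, e.g. "no noise", unless it is a fixed expression
    for match in re.finditer(r"\bno (\w+)", s):
        if match.group(0) not in FIXED_NO:
            found.append(match.group(1))
    return found
```

With the class concept “camera”, the sentence “The battery of the camera is weak” yields the candidate “battery”, while “no problem” is filtered out wherever it occurs.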

In these patterns, NP is a noun phrase and CP a class concept phrase. The verbs used are “has”, “have”, “include”, “contain”, “consist”, etc. “No” patterns are feature indicators as well. Examples of such patterns are “no noise” or “no indentation”. Clearly, a manually built list of fixed “no” expressions, like “no problem”, is filtered out from the feature candidates. Every sentence is POS-tagged using Brill’s tagger (Brill, 1995) and used as system input.

In order to find important features and rank them high, Zhang et al. (2010) used a web page ranking algorithm named HITS (Kleinberg, 1999).

Somprasertsri and Lalitrojwong (2010) used a dependency-based approach for the opinion summarization task. A central stage in their work is the extraction of relations between product features (“the topic of the sentiment”) and opinions (“the subjective expression of the product feature”) from online customer reviews. Adjectives and verbs have been used in this study as opinion words. The maximum entropy model has been used in order to predict the opinion-relevant product feature relation. In general, in a dependency tree, a feature can be of differing kinds:

• Child: the product feature is the subject or object of the verbs, and the opinion word is a verb or a complement of a copular verb (opinions depend on the product features);

e.g. I like(opinion) this camera(feature).

• Parent: the opinion words are in the modifiers of product features, which include adjectival modifier, relative clause modifier, etc. (product features depend on the opinions);

e.g. I have found that this camera takes incredible(opinion) pictures(feature).

• Sibling: the opinion word may also be in an adverbial modifier, a complement of the verb or a predicative (both opinions and product features depend on the same words);

e.g. The pictures(feature) some time turn out blurry(opinion).

• Grand Parent: the opinion words are adjectival complements of modifiers of product features (in the following example the word “good” is the adverbial complement of the relative clause modifier of the noun phrase “movie mode”) (opinions depend on the words which depend on the product features);

e.g. It has movie mode(feature) that works good(opinion) for a digital camera.

• Grand Child: the product feature is the subject or object of the complements, and the opinion word is a verb or a complement of a copular verb (product features depend on the words which depend on the opinions);

e.g. It’s great(opinion) having the LCD display(feature).

• Indirect: none of the above relations.

The Stanford Parser has been used by Somprasertsri and Lalitrojwong (2010) to parse documents in the pre-processing phase, before the dependency analysis stage. Definite linguistic filtering (POS-based) patterns and the General Inquirer dictionary (Stone et al., 1966) have been used to locate noun phrases and to extract product feature candidates. The generalized iterative scaling algorithm (Darroch and Ratcliff, 1972) has been used to estimate parameters or weights of the selected features. Because it is possible to refer to a particular feature using several synonyms, Somprasertsri and Lalitrojwong (2010) used semantic information encoded into a product ontology, manually built by integrating manufacturer product descriptions and terminologies in customer reviews.

Wei et al. (2010) proposed a semantic-based method that made use of a list of positive and negative adjectives defined in the General Inquirer to recognize opinion words, and then extracted the related product features in consumer reviews.

Xia and Zong (2010) performed the feature extraction and selection tasks using word relation features, which seem to be effective features for sentiment classification because they encode relation information between words. They can be of different kinds: Uni (unigrams), WR-Bi (traditional bigrams), WR-Dp (word pairs of dependency relation), GWR-Bi (generalized bigrams) and GWR-Dp (dependency relations).

Gutiérrez et al. (2011) exploited Relevant Semantic Trees (RST) for word-sense disambiguation and measured the association between concepts at the sentence level using the association ratio measure.

Mejova and Srinivasan (2011) explored different feature definition and selection techniques (stemming, negation-enriched features, term frequency versus binary weighting, n-grams and phrases) and approaches (frequency-based vocabulary trimming, part-of-speech and lexicon selection, and expected Mutual Information). Mejova’s experimental results confirmed the fact that adjectives are important for polarity classification. Furthermore, they revealed that stemming and using binary instead of term frequency feature vectors do not have any impact on performance, and that the helpfulness of certain techniques depends on the nature of the dataset (e.g. size and class balance).

Concordance-based Feature Extraction (CFE) is the technique used by Khan et al. (2012). After a traditional pre-processing step, regular expressions are used to extract candidate features. Evaluative adjectives, collected on the basis of a seed list from Hu and Liu (2004), are helpful in the feature extraction task. In the end, a grouping phase finds the appropriate features for the opinion’s topic, grouping together all the related features and removing the useless ones. The algorithm used in this phase is based on the co-occurrence of features and uses the left and right feature’s context.

Khan et al. selected candidate product features by employing noun phrases that appear in texts close to subjective adjectives. This intuition is shared with Wei et al. (2010) and Zhang and Liu (2011). The centrepiece of Khan’s method is represented by hybrid patterns, Combined Pattern Based Noun Phrases (cBNP), that are grounded on the dependency relation between subjective adjectives (opinionated terms) and nouns (product features). Nouns and adjectives can sometimes be connected by linking verbs (e.g. “camera produces fantastically good pictures”). Preposition-based noun phrases (e.g. “quality of photo”, “range of lenses”) often represent entity-to-entity or entity-to-feature relations. The last stage is the proper feature extraction phase in which, using an ad hoc module, the noun phrases of the cBNP patterns have been designated as product features.

5.2.2 Experiment and Evaluation

In order to contribute to the resolution of the opinion feature extraction and selection tasks, we exploited both the SentIta database and other preexisting Italian lexical resources of Nooj, which consist of an objective dictionary that describes its lemmas with appropriate semantic properties; it has been used to define and summarize the features on which the opinion holder expressed his opinions.

In order to identify the relations between opinions and features, the lexical items contained in the just mentioned dictionaries have been systematically recalled into a Nooj local grammar that provides, as feedback, annotations about the (positive or negative) nature of the opinion and about the semantic category of the features.

The objective lexicon consists of a Nooj dictionary of concrete nouns (Table 5.6) that has been manually built and checked by seven linguists. It counts almost 22000 nouns, described from a semantic point of view by the property Hyper, which specifies every lemma’s hyperonym. The tags it includes are shown in Table 5.6. Through the software Nooj it has been possible to create a connection between these dictionaries and an ad hoc FSA for the feature analysis.

Label                            Tag       Entries   Example     Translation
body part                        Npc       598       labbro      lip
organism part                    Npcorg    627       colon       colon
text                             Ntesti    1168      rubrica     address book
(part of) article of clothing    Nindu     1009      vestaglia   nightgown
cosmetic product                 Ncos      66        rossetto    lipstick
food                             Ncibo     1170      agnello     lamb
liquid substance                 Nliq      341       urina       urine
potable liquid substance         Nliqbev   29        sidro       cider
money                            Nmon      216       rublo       ruble
(part of) building               Nedi      1580      torre       tower
place                            Nloc      1018      valle       valley
physical substance               Nmat      1727      agata       agate
(part of) plant                  Nbot      2187      zucca       pumpkin
medicine                         Nfarm     415       sedativo    sedative
drug                             Ndrug     69        mescalina   mescaline
chemical element                 Nchim     1212      cloruro     chloride
(part of) electronic device      Ndisp     1191      motosega    chainsaw
(part of) vehicle                Nvei      1082      gondola     gondola
(part of) furniture              Narr      450       tavolo      table
instrument                       Nstrum    3666      ventaglio   fan
Tot                              -         19946     -           -

Table 5.6: Semantically Labeled Dictionary of Concrete Nouns

As we will demonstrate below, the feature classification is strongly dependent on the review domains and topics, on the basis of which specific local grammars need to be adapted (Engström, 2004; Andreevskaia and Bergler, 2008).

As shown in Figure 5.2, in the Feature Extraction phase the sentences or the noun phrases expressing the opinions on specific features are found in free texts. Annotations regarding their polarity (BENEFIT/DRAWBACK) and their intensity (SCORE=[-3,+3]) are attributed to each sentence on the basis of the opinion words on which they are anchored.

Figure 5.2: Extract of the Feature Extraction grammar

Obviously, the graph reported in Figure 5.2 represents just an extract of the more complex FSA built to perform the task, which includes more than 100 embedded graphs and thousands of different paths, able to describe not only the sentences and the phrases anchored on sentiment adjectives, but also the ones that contain sentiment adverbs, nouns, verbs, idioms and other lexicon-independent opinion-bearing expressions (e.g. essere (dotato + fornito + provvisto) di “to be equipped with”, essere un (aspetto + nota + cosa + lato) negativo “to be a negative side”).

The Contextual Valence Shifters that have been considered are the ones described in Chapter 4.

Differently from the other approaches proposed in the literature, the Feature Pruning task (Figure 5.3), in which the features extracted are grouped together on the basis of their semantic nature, is performed at the same time as the Extraction phase, just by applying the dictionaries-grammar pair to free texts once. This happens because the grammar shown in Figure 5.3 is contained in the metanode NG (Nominal Group) of the feature extraction grammar.

Thanks to the instructions contained in this subgraph, the words described in our dictionary of concrete nouns as belonging to the same hyperonym receive the same annotation. The words labeled with the tags Narr (furniture, e.g. “bed”) and Nindu (article of clothing, e.g. “bath towel”) are annotated as “furnishings” (first path); the words labeled with the tags Ncibo (food, e.g. “cake”) and NliqBev (potable liquid substance, e.g. “coffee”) receive the annotation “foodservice” (second path), and so on.

Figure 5.3: Extract of the Feature Pruning metanode

In order to perfect the results, we also directly included in the nodes of the grammar the words that were important for the feature selection but that were not contained in the dictionary of concrete nouns (abstract nouns, e.g. pranzo “lunch”, vista “view”) and the hyperonyms themselves, which in general correspond to each feature class name (e.g. arredamento “furnishings”, luogo “location”).

Finally, we disambiguated terms that, in the objective dictionary, were associated with more than one tag, by excluding the wrong interpretations from the corresponding node. For example, in the domain of hotel reviews, if one comments on the cucina “cooking” (first path) or the soggiorno “sojourn” (third path), he is clearly not talking about rooms (in Italian cucina also means “kitchen” and soggiorno “living room”), but respectively about the food service and the whole holiday experience.

The annotations provided by Nooj, once the lexical and grammatical resources are applied to texts written in natural language, can be in the form of XML (eXtensible Markup Language) tags. For this reason, they can be effortlessly recalled into an application that automatically applies the Nooj ad hoc resources and makes statistical analyses on them. Therefore, we dedicated a specific module of Doxa to the feature-based sentiment analysis, which has been tested on a corpus composed of 1000 hotel reviews collected from websites like www.tripadvisor.it, www.expedia.it and www.booking.com.

An extract of the automatically annotated corpus is reported below.

Review

Sono stato in questo albergo quasi per caso, dopo aver subito un incidente, e devo ammettere che, se non fosse stato per la disponibilità, la cortesia e la professionalità di Mario, Pino e Rosa, non avrei potuto risolvere una serie di cose. L’hotel è fantastico, silenzioso e decisamente pulito. La colazione ottima ed il responsabile della sala colazione gentilissimo.

“I happened to be in this hotel by accident and I must admit that, if it hadn’t been for the willingness, the kindness and the competence of Mario, Pino and Rosa, I couldn’t have solved a number of things. The hotel is fantastic, silent and definitely clean. The breakfast was great and the person in charge of the breakfast room very kind.”

Annotation

Sono stato in questo albergo quasi per caso, dopo aver subito un incidente, e devo ammettere che, se non fosse stato per <BENEFIT TYPE=ATTITUDES SCORE=2> la disponibilità </BENEFIT>, <BENEFIT TYPE=ATTITUDES SCORE=2> la cortesia </BENEFIT> e <BENEFIT TYPE=ATTITUDES SCORE=2> la professionalità </BENEFIT> di Mario, Pino e Rosa, non avrei potuto risolvere una serie di cose.
<BENEFIT TYPE=ACCOMMODATIONS SCORE=3> L’hotel è fantastico </BENEFIT>, <BENEFIT TYPE=LOCATION SCORE=2> silenzioso </BENEFIT> e <BENEFIT SCORE=3 TYPE=CLEANLINESS> decisamente pulito </BENEFIT>.
<BENEFIT TYPE=FOODSERVICE SCORE=3> La colazione ottima </BENEFIT> ed <BENEFIT TYPE=ATTITUDES SCORE=3> il responsabile della sala colazione gentilissimo </BENEFIT>.
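Annotations of this kind are easy to consume programmatically. The sketch below is not part of Doxa: it assumes that the annotations are exported as BENEFIT and DRAWBACK tags carrying TYPE and SCORE attributes (quoted or unquoted, in any order), and it averages the scores of each feature type; the function name and the sample string are ours.

```python
import re
from collections import defaultdict

# Opening tag, attribute string, annotated span, matching closing tag.
TAG = re.compile(r"<(BENEFIT|DRAWBACK)\s+([^>]*)>(.*?)</\1>", re.S)
# NAME=value pairs, with or without double quotes around the value.
ATTR = re.compile(r'(\w+)="?([^"\s>]+)"?')


def summarize(annotated):
    """Average the SCORE of the annotated spans for each feature TYPE."""
    scores = defaultdict(list)
    for _tag, attrs, _span in TAG.findall(annotated):
        values = dict(ATTR.findall(attrs))
        scores[values.get("TYPE", "UNKNOWN")].append(int(values["SCORE"]))
    return {ftype: sum(v) / len(v) for ftype, v in scores.items()}


sample = (
    "<BENEFIT TYPE=ATTITUDES SCORE=2> la cortesia </BENEFIT> e "
    "<BENEFIT SCORE=3 TYPE=CLEANLINESS> decisamente pulito </BENEFIT> "
    "<BENEFIT TYPE=ATTITUDES SCORE=3> il responsabile gentilissimo </BENEFIT>"
)
summary = summarize(sample)
```

Averaging is only one possible way of condensing the scores of a feature type; it is chosen here because it keeps the result within the [-3, +3] range of the single annotations.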

The evaluation presented in Table 5.7 shows that the method proposed in the present work performs very well with the hotel domain. We manually calculated, on a sample of 100 documents from the original corpus, the Precision on the Semantic Orientation Identification task (which regards the annotation of the sentences’ polarity and intensity) and on the Feature Classification task (which regards the annotation of the types of the features).

Method                 Precision                              Recall   F-score
                       SO Identification   Classification
No residual category   93.3                92.4               72.3     81.1
Residual category      92.9                82.6               78.3     80.4

Table 5.7: F-score of Feature Extraction and Pruning

Because the classes used to group the features have been identified a priori in our grammar, we also checked the results of our tool with the introduction of a residual category to annotate the features that do not belong to these classes. Even though this causes an increase in the Recall, a commensurate decrease of the Precision in the Classification task makes the value of the F-score similar to the one reached when the residual category is not taken into account. Significant examples of the sentences that have been automatically annotated in our corpus are given below.

(82) Il responsabile della sala colazione gentilissimo
“Very kind the responsible of the breakfast room”
BENEFIT TYPE=ATTITUDES SCORE=3

(83) Le camere erano sporche
“The rooms were dirty”
DRAWBACK TYPE=CLEANLINESS SCORE=-2

(84) L’hotel è vicinissimo alla metro
“The hotel is really close to the tube”
BENEFIT TYPE=LOCATION SCORE=3

In (82), the subjective adjective gentilissimo “very kind”, recognised by the grammar as the superlative form (SCORE=+3) of the positive adjective gentile “kind” (SCORE=+2), makes the grammar recognize the whole sentiment expression and provides the information regarding its polarity and intensity. This is why the FSA associates the tag BENEFIT and the highest score with the feature, which in this case is represented by a noun.

Example (83) shows that the “type” of the features can be indicated not only by nouns, but also by specific adjectives, as happens with sporche “dirty”, which is both an opinion word, specifying that the whole expression is negative, and a feature class indicator, letting the grammar classify it in the group of “cleanliness” rather than in the group of “accommodations” (which instead would be selected by the noun camera “room”, described in our objective dictionary with the tags Conc “concrete” and Nedi “building”).

Finally, sentence (84) attests the strength of the influence of the review domains: sometimes the lexicon is not enough to correctly extract and classify the product features. Domain-specific, lexicon-independent patterns are often required in order to make the analyses adequate (Engström, 2004; Andreevskaia and Bergler, 2008).
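The grouping logic of the Feature Pruning metanode can be pictured as a lookup from hyperonym tags to feature classes, with a few words classified directly. The sketch below is a simplified stand-in for the Nooj grammar, not a transcription of it: the tag and word mappings are the ones mentioned in the text, whereas the function name and the residual label are hypothetical.

```python
# Hyperonym tags of the concrete-noun dictionary mapped to feature classes,
# as described for the first two paths of the Feature Pruning metanode.
TAG_TO_CLASS = {
    "Narr": "furnishings",
    "Nindu": "furnishings",
    "Ncibo": "foodservice",
    "NliqBev": "foodservice",
}

# Words whose class is fixed directly in the grammar nodes, whatever their tag.
WORD_TO_CLASS = {
    "cucina": "foodservice",   # "cooking", not "kitchen", in hotel reviews
    "sporche": "cleanliness",  # adjective acting as feature class indicator
}


def feature_class(word, tag=None, residual=None):
    """Return the feature class of a word, or `residual` when no rule applies."""
    if word in WORD_TO_CLASS:
        return WORD_TO_CLASS[word]
    return TAG_TO_CLASS.get(tag, residual)
```

Passing a label such as `residual="OTHER"` (a hypothetical name) mimics the residual category tested in Table 5.7, while leaving it unset corresponds to discarding the features that belong to none of the a priori classes.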

5.2.3 Opinion Visual Summarization

We said in Section 5.1 that Doxa is a prototype able to analyze opinionated documents from a semantic point of view. It automatically applies the Nooj resources to free texts, sums up the values corresponding to every sentiment expression, standardizes the results by the total number of sentiment expressions contained in the reviews, and then provides statistics about the opinions expressed.
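The aggregation just described can be illustrated with a minimal Python sketch. It assumes that "standardizing" means dividing the summed scores by the number of sentiment expressions found in a review; the function name and the toy review (the scores of examples (82), (83) and (84) taken together) are illustrative and are not part of Doxa.

```python
def normalized_review_score(expression_scores):
    """Sum the scores of all sentiment expressions in a review and
    divide by their number (assumed reading of "standardizes")."""
    if not expression_scores:
        # No sentiment expression found: treat the review as neutral.
        return 0.0
    return sum(expression_scores) / len(expression_scores)


# Scores of examples (82), (83) and (84), used here as a toy review.
toy_review = [3, -2, 3]
print(normalized_review_score(toy_review))
```

Dividing by the number of expressions keeps long and short reviews comparable, which is presumably the reason for the standardization step.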


In this Section we show the improved Doxa functionalities dedicated to sentiment analysis based on opinion features. In the Hotel domain, this deeper analysis considerably increases the Precision on the sentence-level SO identification task, which goes from 81.3 to 93.3, without any loss in Recall, which holds steady, rising by just 0.2 percentage points, from 72.1 to 72.3 (see Table 5.7).

The feature-based module of Doxa, completely grounded on the automatic analysis of free texts, allows the comparison between more than one object on the basis of the different kinds of aspects that characterize them.

The regular decagon in the center of the spider graphs of Figure 5.4 represents the threshold value (zero) with which every group of opinions on the same object is compared: when a figure is larger than it, the opinion is positive; when it is smaller, the opinion is negative; when a feature’s value is zero, it means that the consumers did not express any judgment on that topic.

The automatic transformation of non-structured reviews into structured visual summaries has a strong impact on the work of company managers, who must ground their marketing strategies on the analysis of a large amount of information, and on the individual consumers, who need to quickly compare the Pros and Cons of the products or services they want to buy.
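The reading of the spider graph against the zero threshold can be sketched as follows. This is only an illustration under stated assumptions: the per-feature value is taken to be the plain sum of the scores annotated for that feature type, and the function names and sample annotations are hypothetical rather than Doxa's actual implementation.

```python
from collections import defaultdict


def summarize_by_feature(annotations):
    """Add up the scores of all annotations sharing a feature type.

    `annotations` is an iterable of (feature_type, score) pairs,
    e.g. the TYPE and SCORE values produced by the grammar.
    """
    totals = defaultdict(int)
    for feature_type, score in annotations:
        totals[feature_type] += score
    return dict(totals)


def read_against_threshold(value, threshold=0):
    """Interpret a feature value as the text reads the spider graph."""
    if value > threshold:
        return "positive"
    if value < threshold:
        return "negative"
    # In this simplification a sum that cancels out to zero is
    # indistinguishable from a feature nobody commented on.
    return "no judgment"


# TYPE/SCORE pairs of examples (82), (83) and (84).
summary = summarize_by_feature(
    [("ATTITUDES", 3), ("CLEANLINESS", -2), ("LOCATION", 3)]
)
for feature_type, value in summary.items():
    print(feature_type, value, read_against_threshold(value))
```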

5.3 Sentiment Role Labeling

Semantic Roles, also called theta or thematic roles (Chomsky 1993; Jackendoff 1990; see Section 2.1.4), are linguistic devices able to identify in texts “Who did What to Whom, and How, When and Where” (Palmer et al. 2010, p. 1).

The main task of an SRL system is to map the syntactic elements of free text sentences to their semantic representation, through the identification of all the syntactic functions of clause-complements and their labeling with appropriate semantic roles (Rahimi Rastgar and Razavi 2013; Monachesi 2009).

Figure 5.4 Doxa module for Feature-based Sentiment Analysis

To work, an SRL system needs machine readable lexical resources in which all the predicates and their relative arguments are specified. FrameNet (Baker et al. 1998; Fillmore et al. 2002; Ruppenhofer et al. 2006), VerbNet (Schuler 2005) and the PropBank Frame Files (Kingsbury and Palmer 2002) provide the basis for large scale semantic annotation of corpora (Giuglea and Moschitti 2006; Palmer et al. 2010).

In this Section, in accordance with the Semantic Predicates theory (Gross 1981), we performed a matching between the definitional syntactic structures attributed to each class of verbs and the semantic information we attached in the database to every lexical entry.

This way, we could create a strict connection between the arguments selected by a given Predicate listed in our tables and the actants involved in the same verb’s Semantic Frame (Fillmore 1976; Fillmore and Baker 2001; Fillmore 2006). In the experiment that we are going to describe, we focused on the following frames:

• “Sentiment” (e.g. odiare “to hate”), which involves the roles experiencer and stimulus;

• “Opinion” (e.g. difendere “to stand up for”), which implicates the roles opinion holder, target of the opinion and cause of the opinion;

• “Physical act” (e.g. baciare “to kiss”), which evokes the roles patient and agent.

5.3.1 Related Works

A group of studies in the field of Semantic Role Labeling focused on the automatic annotation of the roles which are relevant to Sentiment Analysis and opinion mining, such as source/opinion holder and target/theme/topic.


Bethard et al. (2004), with question answering purposes, proposed an extension of semantic parsing techniques, coupled with additional lexical and syntactic features, for the automatic extraction of propositional opinions. The algorithm used to identify and label semantic constituents is the one developed by Gildea and Jurafsky (2002) and Pradhan et al. (2003).

Kim and Hovy (2006) presented a Semantic Role Labeling system based on the Frame Semantics of Fillmore (1976), which aimed to tag the frame elements semantically related to an opinion bearing word expressed in a sentence, considered as the key indicator of an opinion. The learning algorithm used for the classification model is Maximum Entropy (Berger et al. 1996; Kim and Hovy 2005). FrameNet II (Ruppenhofer et al. 2006) has been used to collect frames related to opinion words.

Other works based on FrameNet are Houen (2011), which used Frame-Frame relations in order to expand the subset of polarity scored Frames found in the FrameNet database, and Ruppenhofer et al. (2008), Ruppenhofer and Rehbein (2012) and Ruppenhofer (2013), which added opinion frames with slots for source, target, polarity and intensity to the FrameNet database to build SentiFrameNet, a lexical database enriched with inherently evaluative items associated to opinion frames.

Lexicon-based approaches are the one of Kim et al. (2008), which made use of a collection of communication and appraisal verbs, the SentiWordNet lexicon, a syntactic parser and a named entity recognizer for the extraction of opinion holders from texts, and the one of Bloom et al. (2007), which extracted domain-dependent opinion holders through a combination of heuristic shallow parsing and dependency parsing.

Conditional Random Fields (Lafferty et al. 2001) is the method used by Choi et al. (2005), who, interpreting the semantic role labeling task as an information extraction problem, exploited a hybrid approach that combines two very different learning-based methods: CRF for the named entity recognition and a variation of AutoSlog (Riloff 1996) for the information extraction. CRF have also been used by Choi et al. (2006), who presented a global inference approach to jointly extract entities and relations in the context of opinion oriented information extraction, through two separate token-level sequence-tagging classifiers for opinion expression extraction and source extraction via linear chain CRF (Lafferty et al. 2001).

5.3.2 Experiment and Evaluation

The starting point of our experiment on SRL is the set of LG tables of the Italian verbs developed at the Department of Communication Science of the University of Salerno. Among the lexical entries listed in such matrices, we manually extracted all the verbs endowed with a defined positive or negative semantic orientation. Such verbs could be expressions of emotions, opinions or physical acts. Details of this classification can be found in Section 3.3.3.

As an example, in Tables 5.8 and 5.9 we show a small group of verbs belonging to the Lexicon-Grammar class 45. This class includes all the verbs that can enter into a syntactic structure such as N0 V di N1, in which the “subject” (N0) selected by the verb (V) is generally a human noun (Num) and the complement (N1) is a completive (Ch F) or infinitive (V-inf comp) clause, usually introduced by the preposition “di” (see Table 5.8). As shown in Table 5.9, our databases also contain semantic information concerning the nature, the semantic orientation and the strength of the Predicates under examination.

In detail, the tagset used in this experiment is the following:

1 Type

(a) SENT sentiment

(b) OP opinion

(c) PHY physical act

2 Orientation

(a) POS positive

(b) NEG negative

3 Intensity

(a) FORTE intense

(b) DEB feeble

N0=Num  N0=il fatto Ch F  Verb            N1=che F  N1=che Fcong  di N1um  di N1-um  di N1 contro N2  prep V-inf comp
+       -                 profittare      +         +             +        +         +                -
+       -                 ridersene       +         +             +        +         -                -
+       -                 risentirsi      +         +             +        +         -                +
+       -                 strafottersene  +         +             +        +         -                -
+       -                 vergognarsi     +         +             +        +         -                +

Table 5.8 Extract of the LG table of the verb Class 45

Speaking in terms of Frame Semantics, we identified in the Opinion Mining and Sentiment Analysis fields three Frames of interest, recalled by specific Predicates: Sentiment, Opinion and Physical act. The frame elements evoked by such frames are described below.

Sentiment

It refers to the expression of any given frame of mind or affective state.

The “sentiment” words can be put in connection with some WordNet-Affect categories (Strapparava et al. 2004), such as emotion, mood, hedonic signal.

Examples are sdegnarsi “to be indignant” (class 10), odiare “to hate” (class 20), affezionarsi “to grow fond” (class 44B), flirtare “to flirt” (class 9), disprezzare “to despise” (class 20), gioire “to rejoice” (class 45).

Predicates of that kind evoke, as frame elements, an experiencer that feels the emotion or other internal states and a stimulus, an event or a person that instigates such states (Gross 1995; Gildea and Jurafsky 2002; Swier and Stevenson 2004; Palmer et al. 2005; Elia 2013). This semantic frame summarizes the FrameNet ones connected to emotions, such as Sensation, Emotions_of_mental_activity, Cause_to_experience, Emotion_active, Emotions, Cause_emotion, etc.

Opinion

The type “Opinion”, instead, is the expression of positive or negative viewpoints, beliefs or judgments that can be personal or shared by most people. It comprises, among the WN-Affect categories, trait, cognitive state, behavior, attitude. Examples are ignorare “to neglect” (class 20), premiare “to reward” (class 20), difendere “to defend” (class 27), esaltare “to exalt” (class 22), dubitare “to doubt” (class 45), condannare “to condemn” (class 49), deridere “to make fun of” (class 50).

The frame elements they evoke are an opinion holder, who states an opinion about an object or an event, and an opinion target, which represents the event or the object about which the opinion is expressed (Kim and Hovy 2006; Liu 2012). Some predicates also imply the presence of a cause that motivates the opinion (see Section 3.3.3).

In the FrameNet frame Opinion and Judgment, the opinion holder is called Cognizer, but we preferred to use a word which is more common in the Sentiment Analysis and Opinion Mining literature.

Physical act

The type “Physical act” comprises verbs like baciare “to kiss” (class 18), suicidarsi “to commit suicide” (class 2), vomitare “to vomit” (class 2A), schiaffeggiare “to slap” (class 20), sparare “to shoot” (class 4), palpeggiare “to grope” (class 18).

For this group of predicates, the selected frame elements are a patient, that is the victim (for the negative actions) or the beneficiary (for the positive ones) of the physical act, carried out by an agent (Carreras and Màrquez 2005; Màrquez et al. 2008). It includes a large number of FrameNet frames, such as Cause_bodily_experience, Shoot_projectiles, Killing, Cause_harm, Rape, Sex, Violence, etc.

In order to perform the orientation and the intensity attribution, we manually explored the Italian LG tables of verbs and the SentIta semantic labels.

Thanks to lexical resources of this kind, it is possible to automatically extract and semantically describe real occurrences of sentences like (85):

(85) [Sentiment [Experiencer Renzi(N0)] si vergogna(V) [Stimulus di parlare di energia in Europa(N1)]]
“Renzi feels ashamed of talking about energy in Europe”

in which the syntactic structure of the verb vergognarsi “to feel ashamed”, N0 V (di) Ch F, is matched by means of interpretation rules (e.g. V = N0 V (di) Ch F = Caus(s, sent, h); N0 = h; N1 = s) to the semantic function Caus(s, sent, h), which puts in relation an experiencer h and a stimulus s thanks to a Sentiment Semantic Predicate sent (Gross 1995; Section 2.1.4).

Moreover, we provided our LG databases with the specification of the arguments (N0, N1, N2, etc.) that are semantically influenced by the semantic orientation of the verbs. The purpose is to correctly identify them as features of the opinionated sentences and to rely on them also in feature-based sentiment analysis tasks.
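The link between lexical entries and interpretation rules can be pictured with a small Python sketch. It is only an illustration: the record layout, the names `VerbEntry`, `ROLE_RULES` and `label_roles` are hypothetical, the five entries are the class 45 verbs described in this Section, and the argument-to-role mapping for the "opinion" type is an assumption (the text gives the rule N0 = h, N1 = s only for the Sentiment predicate).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VerbEntry:
    lemma: str
    lg_class: str
    frame_type: str           # "sentiment" or "opinion"
    orientation: str          # "positive" or "negative"
    intensity: Optional[str]  # None where no intensity is recorded
    influence: str            # argument influenced by the verb orientation


# Semantic descriptions of the class 45 verbs used as examples.
LEXICON = {
    entry.lemma: entry
    for entry in [
        VerbEntry("profittare", "45", "opinion", "negative", None, "N0"),
        VerbEntry("ridersene", "45", "opinion", "negative", "weak", "N1"),
        VerbEntry("risentirsi", "45", "sentiment", "negative", None, "N0"),
        VerbEntry("strafottersene", "45", "opinion", "negative", "strong", "N1"),
        VerbEntry("vergognarsi", "45", "sentiment", "negative", "strong", "N0"),
    ]
}

# Interpretation rules: syntactic slot -> semantic role.
# "sentiment" follows N0 = h (experiencer), N1 = s (stimulus);
# the "opinion" mapping is an assumption made for illustration.
ROLE_RULES = {
    "sentiment": {"N0": "experiencer", "N1": "stimulus"},
    "opinion": {"N0": "opinion holder", "N1": "target of the opinion"},
}


def label_roles(lemma, arguments):
    """Map the syntactic arguments of a verb to its frame roles."""
    entry = LEXICON[lemma]
    rules = ROLE_RULES[entry.frame_type]
    return {rules[slot]: text for slot, text in arguments.items() if slot in rules}


# Sentence (85): Renzi si vergogna di parlare di energia in Europa.
roles = label_roles(
    "vergognarsi",
    {"N0": "Renzi", "N1": "di parlare di energia in Europa"},
)
print(roles)
print(LEXICON["vergognarsi"].influence)
```

Keeping the rules in a table keyed by predicate type mirrors the idea that the interpretation is attached to the verb class and its semantic label, not to the individual sentence.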

Verb            LG Class  Type       Orientation  Intensity  Influence
profittare      45        opinion    negative     -          N0
ridersene       45        opinion    negative     weak       N1
risentirsi      45        sentiment  negative     -          N0
strafottersene  45        opinion    negative     strong     N1
vergognarsi     45        sentiment  negative     strong     N0

Table 5.9 Examples of semantic descriptions of the opinionated verbs from the LG class 45

The reliability of the LG method on Semantic Role Labeling in the Sentiment Analysis task has been tested on three different datasets, two of which have been extracted from social networks or web resources.

In detail, the first two datasets come from Twitter and the third is a free web news headings dataset provided by DataMediaHub (www.datamediahub.it, www.humanhighway.it).

The tweets have been downloaded using the hashtag that groups together the user comments on the election of the Italian President Mattarella (1) and the one that collects the comments on the Masterchef Italian TV show (2).

• Tweets: 46393 tweets

1. Mattarellapresidente: 10000 tweets

2. Masterchefit: 36393 tweets

• News Headings: 80651 titles

The LG based approach on which this experiment has been built includes the following basic steps:

• a pre-processing pipeline that includes two phases:

– a cleaning up phase, carried out with Python routines, that aims to distinguish in the datasets linguistic elements from structural elements (e.g. markup information, web specific elements);

– an automatic linguistic analysis phase, with the goal of linguistically standardizing the relevant elements obtained from the cleaned datasets; in this phase texts are tokenized, lemmatized and POS tagged using TreeTagger (Schmid 1994; Schmid et al. 2007) and then parsed using DeSR, a dependency parser (Attardi et al. 2009);

• an LG based automatic analysis, in which the raw data are semantically labeled according to the syntactic/semantic rules of interpretation connected with each LG verb class.
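The cleaning up phase of the pipeline above can be sketched in a few lines of Python. The routines actually used in the experiment are not reported, so the regular expressions, the function name `clean_text` and the sample tweet are assumptions; the tagging and parsing steps (TreeTagger, DeSR) are external tools and are not reproduced here.

```python
import html
import re

# Patterns for web-specific and markup elements (illustrative choices).
TAG_RE = re.compile(r"<[^>]+>")
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")


def clean_text(raw):
    """Separate linguistic material from structural elements.

    Returns the cleaned text and a dict with the structural
    elements that were found, so that nothing is silently lost.
    """
    text = TAG_RE.sub(" ", raw)   # drop markup tags
    text = html.unescape(text)    # decode entities such as &amp;
    structural = {
        "urls": URL_RE.findall(text),
        "mentions": MENTION_RE.findall(text),
        "hashtags": HASHTAG_RE.findall(text),
    }
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_RE.sub(r"\1", text)  # keep the hashtag word, drop '#'
    text = re.sub(r"\s+", " ", text).strip()
    return text, structural


cleaned, found = clean_text("@chef Che bravo! #Masterchefit http://t.co/abc")
print(cleaned)
print(found)
```

Returning the structural elements separately, instead of discarding them, is a design choice of this sketch: it keeps hashtags and links available for later filtering of the datasets.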

Figure 5.5 Examples of syntactically and semantically annotated sentences


Figure 5.5 presents three headline examples processed with both the dependency syntactic parser and the LG-based semantic analyzer.

Notice that the elements of the traditional grammar automatically identified by DeSR, such as subjects and complements, have been renamed according to the Lexicon-Grammar tradition.

The corpus, which counts 127044 short texts, has been analyzed and semantically and syntactically annotated.

The representative sample on which the human evaluation has been performed, instead, has 42348 texts.

The evaluation of the performances of our tool proved the effectiveness of the Lexicon-Grammar approach. The average F-scores achieved in the different datasets are 0.71 in the Twitter corpus and 0.76 in the Heading corpus.
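For reference, the evaluation figures can be read through the usual definitions of Precision, Recall and F-score. The sketch below assumes the balanced F1 variant (the harmonic mean of Precision and Recall), which the text does not state explicitly, and uses made-up counts that have no relation to the datasets of this experiment.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Compute Precision, Recall and balanced F-score from raw counts."""
    predicted = true_positives + false_positives
    expected = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / expected if expected else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 8 correct labels, 2 wrong labels, 4 missed labels.
p, r, f = precision_recall_f1(true_positives=8, false_positives=2, false_negatives=4)
print(round(p, 3), round(r, 3), round(f, 3))
```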

Chapter 6

Conclusion

The present research handled the detection of linguistic phenomena connected to subjectivity, emotions and opinions from a computational point of view.

The necessity to quickly monitor huge quantities of semistructured and unstructured data from the web poses several challenges to Natural Language Processing, which must provide strategies and tools to analyze their structures from a lexical, syntactic and semantic point of view.

The general aim of Sentiment Analysis, shared with the broader fields of NLP, Data Mining, Information Extraction, etc., is the automatic extraction of value from chaos (Gantz and Reinsel 2011); its specific focus, instead, is on opinions rather than on factual information. This is the aspect that differentiates it from other computational linguistics subfields.

The majority of the sentiment lexicons have been manually or automatically created for the English language; therefore, existing Italian lexicons are mostly built through the translation and adaptation of the English lexical databases, e.g. SentiWordNet and WordNet-Affect.

Unlike many other Italian and English sentiment lexicons, SentIta, built on the interaction of electronic dictionaries and lexicon dependent local grammars, is able to manage simple and multiword structures that can take the shape of distributionally free structures, distributionally restricted structures and frozen structures.

Moreover, differently from other lexicon-based Sentiment Analysis methods, our approach has been grounded on the solidity of the Lexicon-Grammar resources and classifications, which provide fine-grained semantic but also syntactic descriptions of the lexical entries. Such lexically exhaustive grammars distance themselves from the tendency of other sentiment resources to classify together words that have nothing in common from the syntactic point of view.

Furthermore, through the Semantic Predicate theory it has been possible to postulate a complex parallelism between syntax and semantics that does not ignore the numerous idiosyncrasies of the lexicon.

In accordance with the major contributions in the Sentiment Analysis literature, we did not consider polar words in isolation. We computed their elementary sentence contexts, with the allowed transformations, and then their interaction with contextual valence shifters, the linguistic devices that are able to modify the prior polarity of the words from SentIta when occurring with them in the same sentences.

In order to do so, we took advantage of the computational power of the finite-state technology. We formalized a set of rules that work for the intensification, downtoning and negation modeling, the modality detection and the analysis of comparative forms.

Here the difference with other state-of-the-art strategies consists in the elimination of complex mathematical calculation in favor of the easier use of embedded graphs as containers for the expressions designed to receive the same annotations within a compositional framework.

With regard to the applicative part of the research, we conducted, with satisfactory results, three experiments on the same number of Sentiment Analysis subtasks: the sentiment
classification of documents and sentences, the feature-based Sentiment Analysis and the semantic role labeling based on sentiments.

This work exploited a method based on knowledge, but this does not run counter to statistical strategies. All the semantically annotated datasets produced by our fine-grained analysis can indeed be reused as testing sets for learning machines.

As concerns the limitations of this research, we mention, among others, the cases of irony, sarcasm and cultural stereotypes, which still remain open problems for NLP in general and for Sentiment Analysis in particular, since they can completely overturn the polarity of the sentences.

Sarcasm, hyperbole, jocularity, etc. are all rhetorical devices used to express verbal irony, a figurative linguistic process used to intentionally deny what is literally expressed (Gibbs 2000; Curcó 2000). Therefore, irony could be interpreted as a sort of indirect negation (Giora 1995; Giora et al. 1998) in which some expectations are violated (Clark and Gerrig 1984; Wilson and Sperber 1992). On this point, Reyes et al. (2013, p. 243) affirm that irony is perceived “at the boundaries of conflicting frames of reference in which an expectation of one frame has been inappropriately violated in a way that is appropriate in the other”.

This linguistic device has been explored both in computational linguistics studies and in Opinion Mining works. Sets of potential indicators have been tested on large corpora through different approaches, obtaining encouraging results over the state-of-the-art in some respects.

In detail, Tepperman et al. (2006) used prosodic, spectral and contextual cues; Carvalho et al. (2009) selected as irony cues emoticons, onomatopoeic expressions for laughter, heavy punctuation marks, quotation marks and positive interjections; Davidov et al. (2010) exploited the meta-data provided by Amazon to locate gaps between the star ratings and the sentiments expressed in the raw reviews; Reyes and Rosso (2011) chose n-grams, part of speech n-grams, funny profiling, positive/negative profiling, affective profiling and
pleasantness profiling; González-Ibánez et al. (2011) identified sarcasm through punctuation marks, emoticons, quotes, capitalized words, discursive terms that hint at opposition or contradiction in a text, temporal and contextual imbalance, character-grams, skip-grams and emotional scenarios concerning activation, imagery and pleasantness; Maynard and Greenwood (2014) exploited the hashtag tokenization.

Among others, Veale and Hao (2009) carried out a very interesting study on creative irony, focusing on ironic comparisons of the kind “A Topic is as Ground as a Vehicle” (Fishelov 1993; Moon 2008), which can consist of pre-fabricated similes, e.g. “as strong as an ox” (Taylor 1954; Norrick 1986; Moon 2008), or of new, more emphatic and creative variations of the frozen similes, e.g. “as dangerous as a toothless wolf”. The authors found out that 2% of these elaborations subvert the original simile to achieve an ironic effect. The problem is that sarcasm in general is a phenomenon inherently difficult to analyze for machines, and often for humans as well (Justo et al. 2014; Maynard and Greenwood 2014). In addition, the barriers to the creation of brand new comparisons are very low (Veale and Hao 2009).

In any case, the solutions to verbal irony proposed in the literature seem far from being accurate and generalizable. They frequently lead to contradictory results, such as with the use of punctuation marks, which made Carvalho et al. (2009) and Tepperman et al. (2006) reach relatively high precision but served as very weak predictors for Justo et al. (2014) and Tsur et al. (2010).

In this work, speaking in cost-benefit terms, we realized that the quantity of errors introduced by the use of irony cues would have exceeded the errors caused by their absence. Therefore, we limited the irony analysis to the cases of frozen comparative sentences described in Section 3.4.2, which can attribute a semantic orientation to neutral words (86) or reverse the orientation of polar words (87):

(86) Arturo è bianco[0] come un cadavere [-2]
“Arturo is as white as a dead body” (Arturo is pale)


(87) Maria è agile[+2] come una gatta di piombo [-2]
“Maria is as agile as a lead cat” (Maria is not agile)
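The effect of frozen comparisons on the prior polarity of an adjective, as in (86) and (87), can be pictured with the following Python sketch. It replaces the finite-state grammars of this work with a plain dictionary lookup, so it is a simplification and not the method actually implemented; the two small dictionaries and the function name are hypothetical.

```python
# Prior polarities of a few adjectives (toy values taken from the examples).
PRIOR_POLARITY = {"bianco": 0, "agile": 2, "comodi": 2}

# Frozen comparisons: (adjective, vehicle) -> score of the whole expression.
FROZEN_COMPARISONS = {
    ("bianco", "un cadavere"): -2,          # neutral word becomes negative
    ("agile", "una gatta di piombo"): -2,   # positive word is reversed
}


def score_comparison(adjective, vehicle):
    """Score an 'adjective come vehicle' comparison.

    A listed frozen comparison overrides the prior polarity of the
    adjective; any other comparison keeps the prior polarity.
    """
    frozen_score = FROZEN_COMPARISONS.get((adjective, vehicle))
    if frozen_score is not None:
        return frozen_score
    return PRIOR_POLARITY.get(adjective, 0)


print(score_comparison("bianco", "un cadavere"))
print(score_comparison("agile", "una gatta di piombo"))
print(score_comparison("comodi", "una berlina"))  # free comparison
```

A lookup of this kind only covers comparisons that have been listed in advance, which is exactly why creative, non-frozen ironic comparisons remain out of reach.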

Anyway, as can be noticed in (88) and (89), ironic comparisons can vary considerably from one case to another (Veale and Hao 2009) and cannot be distinguished automatically from non-ironic comparisons (90), as long as we do not include (domain-dependent) encyclopedic knowledge in the analysis tools.

(88) La ripresa è degna[+2] di un trattore con aratro inserito
“The pickup is worthy of a tractor with an inserted plough”

(89) E quel tocco di piccante (...) è gradevole[+2] quanto lo sarebbe una spruzzata di pepe su un gelato alla panna
“And that touch of piquancy (...) is as pleasant as a spattering of pepper on a cream flavoured ice-cream”

(90) Gli spazi interni sono comodi[+2] come una berlina [+2]
“Inside spaces are as comfortable as a sedan”

Similar comments on the required encyclopedic knowledge can be made about cultural stereotypes as well (91, 92):

(91) La nuova fiat 500 è consigliabile[+2] molto di più ad una ragazza[-2]
“The new Fiat 500 is recommended a lot more to a girl”

(92) Un gioco per bambini di 12 anni [-2]
“A game for 12-year-old children”

Appendix A

Italian Lexicon-Grammar Symbols

Agg Val Evaluative adjective

Agg Adjective

Avv Adverb

Ci Constrained nominal group

Ch F Completive complement without specification of mood

Det Determiner

N0 Sentence formal subject

N1, N2, N3 Sentence complements

Ni Nominal group

N Noun

Napp Appropriate noun

Nconc Concrete noun


Npc Noun of body part

Nsent Noun of Sentiment

Num Human noun

N-um Not human noun

Ppv Pronoun in preverbal position

Prep Preposition

V Verb

V-a Adjectivalization

Vcaus Causative verb

V-n Nominalization

Vsup Support verb

Vval Evaluative verb

Vvar Support verb variant

Appendix B

Acronyms

ALU Atomic Linguistic Units

cBNP Combined Pattern Based Noun Phrases

CCG Combinatory Categorial Grammar

CFE Concordance-based Feature Extraction

CRF Conditional Random Field

CSR Class Sequential Rule

CVS Contextual Valence Shifters

DRS Discourse Representation Structure

ERTN Enhanced Recursive Transition Network

eWOM electronic Word of Mouth

FMM Forward Maximum Matching

FSA Finite State Automata

FST Finite State Transducers


LDA Latent Dirichlet Allocation

LG Lexicon-Grammar

LSA Latent Semantic Analysis

LSF Lexical Syntactic Feature

mCVS Morphological Contextual Valence Shifters

MPQA Multi Perspective Question Answering

NLP Natural Language Processing

PMI Pointwise Mutual Information

POS Part of Speech

PP Prior Polarity

QN Quality Noun

RNTN Recursive Neural Tensor Network

RST Relevant Semantic Tree

RTN Recursive Transition Network

SO Semantic Orientation

SVM Support Vector Machine

TGG Transformational-Generative Grammar

UDF User Defined taxonomy of entity Feature

WMI Web-based Mutual Information

XML eXtensible Markup Language

List of Figures

2.1 Syntactic Trees of the sentences (1) and (2)
2.2 Syntactic Trees of the sentences (1b) and (2b)
2.3 LG syntax-semantic interface
2.4 Deterministic Finite-State Automaton
2.5 Non deterministic Finite-State Automaton
2.6 Finite-State Transducer
2.7 Enhanced Recursive Transition Network

3.1 FSA for the interaction between Nsent and semantically annotated verbs
3.2 Extract of the morphological FSA that associates psych verbs to their adjectivalizations
3.3 Frozen and semi-frozen expressions that involve the vulgar word cazzo and its euphemisms
3.4 Extract of the FSA for the SO identification of idioms
3.5 Extract of the FSA for the extraction of PECO idioms
3.6 Extract of the FSA for the automatic annotation of sentiment adverbs
3.7 Extract of the FSA for the automatic annotation of quality nouns
3.8 Morphological Contextual Valence Shifting of Adjectives

5.1 Doxa, the LG sentiment classifier based on the finite-state technology
5.2 Extract of the Feature Extraction grammar
5.3 Extract of the Feature Pruning metanode
5.4 Doxa module for Feature-based Sentiment Analysis
5.5 Examples of syntactically and semantically annotated sentences

List of Tables

2.1 Example of a Lexicon-Grammar Binary Matrix

3.1 Manually built Polarity Lexicons
3.2 (Semi-)Automatically built Polarity Lexicons
3.3 Sentix (Basile and Nissim 2013)
3.4 SentIta Evaluation tagset
3.5 SentIta Strength tagset
3.6 Composition of SentIta
3.7 SentIta tag distribution in the adjective list
3.8 Adjectives percentage values in SentIta
3.9 Examples from the Psych Predicates dictionary
3.10 Psych Predicates chosen for SentIta
3.11 All the Semantic Predicates from SentIta
3.12 Polar verbs in the main Lexicon-Grammar verbal subclasses
3.13 Polar verbs in other LG verb classes which contain at least one oriented item
3.14 Distribution of semantic labels into the LG verb classes that contain at least one oriented item
3.15 Description of the Nominalizations of the Psychological Semantic Predicates
3.16 Nominalizations of the Psych Predicates chosen from SentIta
3.17 Distribution of semantic tags in the dictionary of Adjectival Psych Predicates
3.18 Percentage values of Psych Adjectivalizations in SentIta
3.19 Suffixes from the dictionary of Psych Adjectivalizations
3.20 Adjectivalizations of the Psych Predicates
3.21 SentIta tag distribution in the list of bad words
3.22 Bad Words percentage values in SentIta
3.23 Classes of Compound Adverbs in SentIta
3.24 Composition of the compound adverb dictionary
3.25 Polar adjectives in frozen sentences
3.26 Examples of idioms that maintain, switch or shift the prior polarity of the adjectives they contain
3.27 Composition of the adverb dictionary
3.28 Adjective’s inflectional classes (Salvi and Vanelli 2004)
3.29 Rules for the adverb derivation
3.30 Error analysis of the automatic QN annotation
3.31 Presence of the QN suffixes in different dictionaries
3.32 Productivity of the QN suffixes in different dictionaries
3.33 About the overlap among Psych V-n dictionary and QN dictionary
3.34 Precision measure about the overlap of QN and Psy V-n
3.35 Negation suffixes
3.36 Intensifier/downtoner suffixes

4.1 The effects of troppo, poco and abbastanza on polar lexical items
4.2 The effects of the co-occurrence and the negation of troppo, poco with reference to polar lexical items
4.3 Negation rules
4.4 Negation rules with the repetition of negative operators
4.5 Negation rules with strong operators
4.6 Negation rules with weak operators
4.7 Rule I: increasing comparative sentences with positive orientation
4.8 Rule III: decreasing comparative sentences with negative orientation
4.9 Rule IV: increasing comparative sentences with negative orientation
4.10 Rule VI: decreasing comparative sentences with positive orientation
4.11 Negation of Rule I: Negated increasing comparative sentences with negative orientation
4.12 Negation of Rule III: Negated decreasing comparative sentences with positive orientation
4.13 Negation of Rule IV: Negated increasing comparative sentences with negative orientation
4.14 Negation of Rule VI: Negated decreasing comparative sentences with positive orientation

5.1 Dataset of opinionated online customer reviews
5.2 Precision measure in sentence-level classification
5.3 Irrelevant matches in sentence-level classification
5.4 Precision measure in document-level classification
5.5 Recall in both the sentence-level and the document-level classifications
5.6 Semantically Labeled Dictionary of Concrete Nouns
5.7 F-score of Feature Extraction and Pruning
5.8 Extract of the LG table of the verb Class 45
5.9 Examples of semantic descriptions of the opinionated verbs from the LG class 45

Bibliography

Abdul-Mageed, M. and Diab, M. (2012). Toward building a large-scale Arabic sentiment lexicon. In Proceedings of the 6th International Global WordNet Conference, pages 18–22.

Abdul-Mageed, M., Diab, M. T., and Korayem, M. (2011). Subjectivity and sentiment analysis of Modern Standard Arabic. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers-Volume 2, pages 587–591. Association for Computational Linguistics.

Agerri, R. and García-Serrano, A. (2010). Q-WordNet: Extracting polarity from WordNet senses. In Proceedings of the International Conference on Language Resources and Evaluation, pages 2300–2305.

Akerlof, G. A. (1970). The market for “lemons”: Quality uncertainty and the market mechanism. In The quarterly journal of economics, pages 488–500. JSTOR.

Alba, J., Lynch, J., Weitz, B., Janiszewski, C., Lutz, R., Sawyer, A., and Wood, S. (1997). Interactive home shopping: consumer, retailer, and manufacturer incentives to participate in electronic marketplaces. In The Journal of Marketing, pages 38–53. JSTOR.

Alm, C. O., Roth, D., and Sproat, R. (2005). Emotions from text: machine learning for text-based emotion prediction. In Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, pages 579–586. Association for Computational Linguistics.

Andreevskaia, A. and Bergler, S. (2008). When specialists and generalists work together: Overcoming domain dependence in sentiment tagging. In ACL, pages 290–298.

Arad, M. (1998). Psych-notes. In UCL Working Papers in Linguistics, volume 10, pages 1–22.

Argamon, S., Bloom, K., Esuli, A., and Sebastiani, F. (2009). Automatically determining attitude type and force for sentiment analysis. In Human Language Technology. Challenges of the Information Society, pages 218–231. Springer.

Attardi, G., Dell’Orletta, F., Simi, M., and Turian, J. (2009). Accurate dependency parsing with a stacked multilayer perceptron. In Proceedings of EVALITA, volume 9.

Aue, A. and Gamon, M. Customizing sentiment classifiers to new domains: A case study. In Proceedings of recent advances in natural language processing (RANLP).

Baccianella, S., Esuli, A., and Sebastiani, F. (2010). SentiWordNet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining. In LREC, volume 10, pages 2200–2204.

Baker, C. F., Fillmore, C. J., and Lowe, J. B. (1998). The Berkeley FrameNet project. In Proceedings of the 17th international conference on Computational linguistics-Volume 1, pages 86–90. Association for Computational Linguistics.

Baker, M. (1988). Theta theory and the syntax of applicatives in Chichewa. In Natural Language & Linguistic Theory, volume 6, pages 353–389. Springer.

Baker, M. C. (1997). Thematic roles and syntactic structure. In Elements of grammar, pages 73–137. Springer.

Balahur, A. and Turchi, M. (2012). Multilingual sentiment analysis using machine translation. In Proceedings of the 3rd Workshop in Computational Approaches to Subjectivity and Sentiment Analysis, pages 52–60. Association for Computational Linguistics.

Baldoni, M., Baroglio, C., Patti, V., and Rena, P. (2012). From tags to emotions: Ontology-driven sentiment analysis in the social semantic web. In Intelligenza Artificiale, volume 6, pages 41–54.

Balibar-Mrabti, A. (1995). Une étude de la combinatoire des noms de sentiment dans une grammaire locale. In Langue française, pages 88–97. JSTOR.

Baptista, J. (2003). Some families of compound temporal adverbs in portuguese. In Proceedings of the Workshop on Finite-State Methods for Natural Language Processing.

Baroni, M. and Vegnaduzzo, S. (2004). Identifying subjective adjectives through web-based mutual information. In Proceedings of KONVENS, volume 4, pages 17–24.

Basile, V. and Nissim, M. (2013). Sentiment analysis on italian tweets. In Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 100–107.

Belletti, A. and Rizzi, L. (1988). Psych-verbs and θ-theory. In Natural Language & Linguistic Theory, volume 6, pages 291–352. Springer.

Benamara, F., Cesarano, C., Picariello, A., Recupero, D. R., and Subrahmanian, V. S. (2007). Sentiment analysis: Adjectives and adverbs are better than adjectives alone. In ICWSM.

Benamara, F., Chardon, B., Mathieu, Y., Popescu, V., and Asher, N. (2012). How do negation and modality impact on opinions? In Proceedings of the Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics, pages 10–18. Association for Computational Linguistics.

Bennis, H. (2004). Unergative adjectives and psych verbs. In Studies in unaccusativity: The syntax-lexicon interface, pages 84–113.

Berger, A. L., Pietra, V. J. D., and Pietra, S. A. D. (1996). A maximum entropy approach to natural language processing. In Computational linguistics, volume 22, pages 39–71. MIT Press.

Bespalov, D., Bai, B., Qi, Y., and Shokoufandeh, A. (2011). Sentiment classification based on supervised latent n-gram analysis. In Proceedings of the 20th ACM international conference on Information and knowledge management, pages 375–382. ACM.

Bethard, S., Yu, H., Thornton, A., Hatzivassiloglou, V., and Jurafsky, D. (2004). Automatic extraction of opinion propositions and their holders. In 2004 AAAI Spring Symposium on Exploring Attitude and Affect in Text, volume 2224.

Bhatti, M. W., Wang, Y., and Guan, L. A neural network approach for human emotion recognition in speech. In Circuits and Systems, 2004. ISCAS’04, volume 2.

Blitzer, J., Dredze, M., Pereira, F., et al. (2007). Biographies, bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. In ACL, volume 7, pages 440–447.

Bloom, K. (2011). Sentiment analysis based on appraisal theory and functional local grammars. PhD thesis, Illinois Institute of Technology.

Bloom, K., Garg, N., and Argamon, S. (2007). Extracting appraisal expressions. In HLT-NAACL, volume 2007, pages 308–315.

Bloomfield, L. (1933). Language. Holt, New York. In Edizione italiana (1974) Il linguaggio, Il Saggiatore, Milano.

Bocchino, F. (2007). Lessico-Grammatica dell’italiano. Le costruzioni intransitive. PhD thesis, Università degli studi di Salerno.

Bos, J. and Nissim, M. (2006). An empirical approach to the interpretation of superlatives. In Proceedings of the 2006 conference on empirical methods in natural language processing, pages 9–17. Association for Computational Linguistics.

Bosco, C., Allisio, L., Mussa, V., Patti, V., Ruffo, G., Sanguinetti, M., and Sulis, E. (2014). Detecting happiness in italian tweets: Towards an evaluation dataset for sentiment analysis in felicitta. In Proc. of ES-SSLOD, pages 56–63.

Bosco, C., Patti, V., and Bolioli, A. (2013). Developing corpora for sentiment analysis: The case of irony and senti-tut. In IEEE Intelligent Systems, number 2, pages 55–63. IEEE.

Boucher, J. and Osgood, C. E. (1969). The pollyanna hypothesis. In Journal of Verbal Learning and Verbal Behavior, volume 8, pages 1–8. Elsevier.

Bounie, D., Bourreau, M., Gensollen, M., and Waelbroeck, P. (2005). The effect of online customer reviews on purchasing decisions: The case of video games. In Retrieved July, volume 8, page 2009. Citeseer.

Breese, J. S., Heckerman, D., and Kadie, C. (1998). Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the Fourteenth conference on Uncertainty in artificial intelligence, pages 43–52. Morgan Kaufmann Publishers Inc.

Brill, E. (1995). Transformation-based error-driven learning and natural language processing: A case study in part-of-speech tagging. In Computational linguistics, volume 21, pages 543–565. MIT Press.

Buchanan, T. W., Lutz, K., Mirzazade, S., Specht, K., Shah, N. J., Zilles, K., and Jäncke, L. (2000). Recognition of emotional prosody and verbal components of spoken language: an fmri study. In Cognitive Brain Research, volume 9, pages 227–238. Elsevier.

Buvet, P.-A., Girardin, C., Gross, G., and Groud, C. (2005). Les prédicats d’affect. In LIDIL, number 32, pages pp–125.

Carbonell, J. G. (1979). Subjective understanding: Computer models of belief systems. Technical report, DTIC Document.

Carenini, G., Ng, R. T., and Zwart, E. (2005). Extracting knowledge from evaluative text. In Proceedings of the 3rd international conference on Knowledge capture, pages 11–18. ACM.

Carreras, X. and Màrquez, L. (2005). Introduction to the conll-2005 shared task: Semantic role labeling. In Proceedings of the Ninth Conference on Computational Natural Language Learning, pages 152–164. Association for Computational Linguistics.

Carvalho, P., Sarmento, L., Silva, M. J., and De Oliveira, E. (2009). Clues for detecting irony in user-generated contents: oh...!! it’s “so easy” ;-). In Proceedings of the 1st international CIKM workshop on Topic-sentiment analysis for mass opinion, pages 53–56. ACM.

Chen, Y., Zhou, Y., Zhu, S., and Xu, H. (2012). Detecting offensive language in social media to protect adolescent online safety. In Privacy, Security, Risk and Trust (PASSAT), 2012 International Conference on and 2012 International Conference on Social Computing (SocialCom), pages 71–80. IEEE.

Chetviorkin, I. and Loukachevitch, N. V. (2012). Extraction of russian sentiment lexicon for product meta-domain. In COLING, pages 593–610.

Chevalier, J. A. and Mayzlin, D. (2006). The effect of word of mouth on sales: Online book reviews. In Journal of marketing research, volume 43, pages 345–354. American Marketing Association.

Chin, P. (2005). The value of user-generated contents. In Intranet Journal. www.intranetjournal.com.

Choi, Y., Breck, E., and Cardie, C. (2006). Joint extraction of entities and relations for opinion recognition. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, pages 431–439. Association for Computational Linguistics.

Choi, Y. and Cardie, C. (2008). Learning with compositional semantics as structural inference for subsentential sentiment analysis. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 793–801.

Choi, Y., Cardie, C., Riloff, E., and Patwardhan, S. (2005). Identifying sources of opinions with conditional random fields and extraction patterns. In Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, pages 355–362. Association for Computational Linguistics.

Chomsky, N. (1957). Syntactic Structures. Mouton and Co., The Hague-Paris.

Chomsky, N. (1965). Aspects of the Theory of Syntax. Number 11. MIT press.

Chomsky, N. (1993). Lectures on government and binding: The Pisa lectures. Number 9. Walter de Gruyter.

Chu, S.-C. and Kim, Y. (2011). Determinants of consumer engagement in electronic word-of-mouth (ewom) in social networking sites. In International journal of Advertising, volume 30, pages 47–75. Taylor & Francis.

Clark, H. H. and Gerrig, R. J. (1984). On the pretense theory of irony. In Journal of Experimental Psychology: General, volume 113, pages 121–126.

Clark, S. and Curran, J. R. (2004). The importance of supertagging for wide-coverage ccg parsing. In Proceedings of the 20th international conference on Computational Linguistics, page 282. Association for Computational Linguistics.

Clematide, S. and Klenner, M. (2010). Evaluation and extension of a polarity lexicon for german. In Proceedings of the First Workshop on Computational Approaches to Subjectivity and Sentiment Analysis, pages 7–13.

Councill, I. G., McDonald, R., and Velikovich, L. (2010). What’s great and what’s not: learning to classify the scope of negation for improved sentiment analysis. In Proceedings of the workshop on negation and speculation in natural language processing, pages 51–59. Association for Computational Linguistics.

Curcó, C. (2000). Irony: Negation, echo and metarepresentation. In Lingua, volume 110, pages 257–280. Elsevier.

D’Agostino, E. (1992). Analisi del discorso: metodi descrittivi dell’italiano d’uso. Loffredo.

D’Agostino, E. (2005). Grammatiche lessicalmente esaustive delle passioni: il caso dell’io collerico. Le forme nominali. In Quaderns d’Italià, pages 149–169.

D’Agostino, E., De Bueriis, G., Cicalese, A., Monteleone, M., Vellutino, D., Messina, S., Langella, A., Santonicola, S., Longobardi, F., and Guglielmo, D. (2007). Lexicon-grammar classifications. Or better: to get rid of anguish. In 26th International Conference on Lexis and Grammar (LGC’07).

D’Agostino, E. and Elia, A. (1998). Il significato delle frasi: un continuum dalle frasi semplici alle forme polirematiche. In AA. VV., Ai limiti del linguaggio, Laterza, Bari, pages 287–310.

Dalianis, H. and Skeppstedt, M. (2010). Creating and evaluating a consensus for negated and speculative words in a swedish clinical corpus. In Negation and Speculation in Natural Language Processing (NeSp-NLP 2010), page 5.

Dang, Y., Zhang, Y., and Chen, H. (2010). A lexicon-enhanced method for sentiment classification: An experiment on online product reviews. In Intelligent Systems, IEEE, volume 25, pages 46–53. IEEE.

Darroch, J. N. and Ratcliff, D. (1972). Generalized iterative scaling for log-linear models. In The annals of mathematical statistics, pages 1470–1480. JSTOR.

Dasgupta, S. and Ng, V. (2009). Mine the easy, classify the hard: a semi-supervised approach to automatic sentiment classification. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2-Volume 2, pages 701–709. Association for Computational Linguistics.

Davidov, D., Tsur, O., and Rappoport, A. (2010). Semi-supervised recognition of sarcastic sentences in twitter and amazon. In Proceedings of the Fourteenth Conference on Computational Natural Language Learning, pages 107–116. Association for Computational Linguistics.

de Albornoz, J. C., Plaza, L., and Gervás, P. (2012). Sentisense: An easily scalable concept-based affective lexicon for sentiment analysis. In LREC, pages 3562–3567.

De Bueriis, G. and Elia, A. (2008). Lessici elettronici e descrizioni lessicali, sintattiche, morfologiche ed ortografiche. Plectica.

De Gioia, M. (1994a). Chiaro come il sole a mezzogiorno: una classe di avverbi idiomatici dell’italiano. In The Linguist, volume 33, pages 220–225. Newnorth Print Ltd.

De Gioia, M. (1994b). Problemi di rappresentazione et traduzione degli avverbi idiomatici. Micromégas.

De Gioia, M. (2001). Avverbi idiomatici dell’italiano: analisi lessico-grammaticale. L’Harmattan Italia.

De Mauro, T. and Thornton, A. M. (1985). La predicazione: teoria e applicazione all’italiano. In Sintassi e morfologia della lingua italiana d’uso: teorie ed applicazioni descrittive, pages 487–519.

De Silva, L. C., Miyasato, T., and Nakatsu, R. (1997). Facial emotion recognition using multi-modal information. In Information, Communications and Signal Processing, 1997. ICICS., Proceedings of 1997 International Conference on, volume 1, pages 397–401. IEEE.

Denecke, K. (2008). Using SentiWordNet for multilingual sentiment analysis. In Data Engineering Workshop, 2008. ICDEW 2008. IEEE 24th International Conference on, pages 507–512. IEEE.

Desclés, J., Makkaoui, O., and Hacène, T. (2010). Automatic annotation of speculation in biomedical texts: new perspectives and large scale evaluation. In Negation and Speculation in Natural Language Processing (NeSp-NLP 2010), page 32.

Dewally, M. and Ederington, L. (2006). Reputation, certification, warranties, and information as remedies for seller-buyer information asymmetries: Lessons from the online comic book market. In The Journal of Business, volume 79, pages 693–729. JSTOR.

Diamond, D. W. (1989). Reputation acquisition in debt markets. In The journal of political economy, pages 828–862. JSTOR.

Dowty, D. R. (1989). On the semantic content of the notion of ‘thematic role’. In Properties, types and meaning, pages 69–129. Springer.

Dragut, E. C., Yu, C., Sistla, P., and Meng, W. (2010). Construction of a sentimental word dictionary. In Proceedings of the 19th ACM international conference on Information and knowledge management, pages 1761–1764.

Duan, W., Gu, B., and Whinston, A. B. (2008). The dynamics of online word-of-mouth and product sales – an empirical investigation of the movie industry. In Journal of retailing, volume 84, pages 233–242. Elsevier.

Elia, A. Per filo e per segno: la struttura degli avverbi composti. In D’Agostino (a cura di), Tra sintassi e semantica. Descrizioni e metodi di elaborazione automatica della lingua d’uso, pages 167–263. Napoli, ESI.

Elia, A. (1984). Le verbe italien. Les complétives dans les phrases à un complément. Fasano di Puglia, Schena-Nizet.

Elia, A. (1990). Chiaro e tondo: Lessico-Grammatica degli avverbi composti in italiano. Segno Associati.

Elia, A. (2013). On lexical, semantic and syntactic granularity of italian verbs. In Penser le Lexique Grammaire, Perspectives Actuelles, Honoré Champion, Paris, pages 277–286.

Elia, A. (2014a). Lessico e sintassi tra tempo e massa parlante. In Marchese M.P., Nocentini A., Il lessico nella teoria e nella storia linguistica, pages 15–47. Edizioni il Calamo.

Elia, A. (2014b). Operatori, argomenti e il sistema leg-semantic role labelling dell’italiano. In Mirto I. (a cura di), Le relazioni irresistibili, pages 105–118. Edizioni Ets.

Elia, A. and De Bueriis, G. (2008). Lessici elettronici e descrizioni lessicali, sintattiche, morfologiche ed ortografiche. Plectica.

Elia, A., Guglielmo, D., Maisto, A., and Pelosi, S. (2013). A linguistic-based method for automatically extracting spatial relations from large non-structured data. In Algorithms and Architectures for Parallel Processing, pages 193–200. Springer.

Elia, A., Martinelli, M., and D’Agostino, E. (1981). Lessico e Strutture sintattiche. Introduzione alla sintassi del verbo italiano. Napoli, Liguori.

Elia, A., Pelosi, S., Maisto, A., and Raffaele, G. (2015). Towards a lexicon-grammar based framework for nlp: an opinion mining application. In RANLP 2015, pages 160–167.

Elia, A. and Vietri, S. (2010). Lexis-grammar & semantic web. In INFOtheca, volume 11, pages 15a–38a.

Elia, A., Vietri, S., Postiglione, A., Monteleone, M., and Marano, F. (2010). Data mining modular software system. In SWWS, pages 127–133.

Engström, C. (2004). Topic dependence in sentiment classification. In Unpublished MPhil Dissertation. University of Cambridge.

Essa, I., Pentland, A. P., et al. (1997). Coding, analysis, interpretation, and recognition of facial expressions. In Pattern Analysis and Machine Intelligence, IEEE Transactions on, volume 19, pages 757–763. IEEE.

Esuli, A. and Sebastiani, F. (2005). Determining the semantic orientation of terms through gloss classification. In Proceedings of the 14th ACM international conference on Information and knowledge management, pages 617–624. ACM.

Esuli, A. and Sebastiani, F. (2006a). Determining term subjectivity and term orientation for opinion mining. In EACL, volume 6, page 2006.

Esuli, A. and Sebastiani, F. (2006b). SentiWordNet: A publicly available lexical resource for opinion mining. In Proceedings of LREC, volume 6, pages 417–422.

Farkas, D. F. and Kiss, K. É. (2000). On the comparative and absolute readings of superlatives. In Natural Language & Linguistic Theory, volume 18, pages 417–455. Springer.

Farkas, R., Vincze, V., Móra, G., Csirik, J., and Szarvas, G. (2010). The conll-2010 shared task: learning to detect hedges and their scope in natural language text. In Proceedings of the Fourteenth Conference on Computational Natural Language Learning – Shared Task, pages 1–12. Association for Computational Linguistics.

Fellbaum, C. (1998). WordNet. Wiley Online Library.

Ferreira, L., Jakob, N., and Gurevych, I. (2008). A comparative study of feature extraction algorithms in customer reviews. In Semantic Computing, 2008 IEEE International Conference on, pages 144–151. IEEE.

Filip, H. (1996). Psychological predicates and the syntax-semantics interface. In Conceptual Structure, Discourse and Language. Stanford, Center for the Study of Language and Information.

Fillmore, C. J. (1976). Frame semantics and the nature of language. In Annals of the New York Academy of Sciences, volume 280, pages 20–32. Wiley Online Library.

Fillmore, C. J. (1977). The case for case reopened. In Syntax and semantics, volume 8, pages 59–82.

Fillmore, C. J. (2006). Frame semantics. In Cognitive linguistics: Basic readings, volume 34, pages 373–400. Berlin, Mouton de Gruyter. [Originally published in Linguistics in the Morning Calm, Seoul, Hanshin Publishing Co., 1982].

Fillmore, C. J. and Baker, C. F. (2001). Frame semantics for text understanding. In Proceedings of WordNet and Other Lexical Resources Workshop, NAACL.

Fillmore, C. J., Baker, C. F., and Sato, H. (2002). The framenet database and software tools. In LREC.

Fishelov, D. (1993). Poetic and non-poetic simile: structure, semantics, rhetoric. In Poetics Today, pages 1–23. JSTOR.

Fiszman, M., Demner-Fushman, D., Lang, F. M., Goetz, P., and Rindflesch, T. C. (2007). Interpreting comparative constructions in biomedical text. In Proceedings of the Workshop on BioNLP 2007: Biological, Translational, and Clinical Language Processing, pages 137–144. Association for Computational Linguistics.

Fragopanagos, N. and Taylor, J. G. (2005). Emotion recognition in human–computer interaction. In Neural Networks, volume 18, pages 389–405. Elsevier.

Gaeta, L. (2004). Nomi d’azione. In La formazione delle parole in italiano. Tübingen, Max Niemeyer Verlag, pages 314–51.

Gamon, M. and Aue, A. (2005). Automatic identification of sentiment vocabulary: exploiting low association with known sentiment terms. In Proceedings of the ACL Workshop on Feature Engineering for Machine Learning in Natural Language Processing, pages 57–64.

Ganapathibhotla, M. and Liu, B. (2008). Mining opinions in comparative sentences. In Proceedings of the 22nd International Conference on Computational Linguistics-Volume 1, pages 241–248. Association for Computational Linguistics.

Ganter, V. and Strube, M. (2009). Finding hedges by chasing weasels: Hedge detection using wikipedia tags and shallow linguistic features. In Proceedings of the ACL-IJCNLP 2009 Conference Short Papers, pages 173–176. Association for Computational Linguistics.

Gantz, J. and Reinsel, D. (2011). Extracting value from chaos. In IDC iview, number 1142, pages 9–10.

Garcia, D., Garas, A., and Schweitzer, F. (2012). Positive words carry less information than negative words. In EPJ Data Science, volume 1. EPJ.

Gatti, L. and Guerini, M. (2012). Assessing sentiment strength in words prior polarities. In arXiv preprint arXiv:1212.4315.

Gibbs, R. W. (2000). Irony in talk among friends. In Metaphor and symbol, volume 15, pages 5–27. Taylor & Francis.

Gildea, D. and Jurafsky, D. (2002). Automatic labeling of semantic roles. In Computational linguistics, volume 28, pages 245–288. MIT Press.

Giora, R. (1995). On irony and negation. In Discourse processes, volume 19, pages 239–264. Taylor & Francis.

Giora, R., Fein, O., and Schwartz, T. (1998). Irony: Grade salience and indirect negation. In Metaphor and Symbol, volume 13, pages 83–101. Taylor & Francis.

Giordano, R. and Voghera, M. (2008). Frasi senza verbo: il contributo della prosodia. In Sintassi storica e sincronica dell’italiano. Atti del Conv. Intern. SILFI, Basilea.

Giuglea, A.-M. and Moschitti, A. (2006). Semantic role labeling via framenet, verbnet and propbank. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics, pages 929–936. Association for Computational Linguistics.

Godard, D. (2013). Les négateurs. In La grande Grammaire du français. Actes Sud.

Goldberg, A. B. and Zhu, X. (2006). Seeing stars when there aren’t many stars: graph-based semi-supervised learning for sentiment categorization. In Proceedings of the First Workshop on Graph Based Methods for Natural Language Processing, pages 45–52. Association for Computational Linguistics.

González-Ibánez, R., Muresan, S., and Wacholder, N. (2011). Identifying sarcasm in twitter: a closer look. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers-Volume 2, pages 581–586. Association for Computational Linguistics.

Grimshaw, J. (1990). Argument structure. the MIT Press.

Gross, G. (1992a). Un outil pour le FLE: les classes d’objets. In Actes du colloque FLE, pages 169–192.

Gross, G. (2008). Les classes d’objets. In Lalies, number 28, pages 111–165.

Gross, M. (1971). Transformational Analysis of French Verbal Constructions. University of Pennsylvania.

Gross, M. (1975). Méthodes en syntaxe. Hermann.

Gross, M. (1979). On the failure of generative grammar. In Language, pages 859–885. JSTOR.

Gross, M. (1981). Les bases empiriques de la notion de prédicat sémantique. In Langages, pages 7–52. JSTOR.

Gross, M. (1982). Une classification des phrases «figées» du français. In Revue québécoise de linguistique, volume 11, pages 151–185. Université du Québec à Montréal.

Gross, M. (1986). Lexicon-grammar: the representation of compound words. In Proceedings of the 11th conference on Computational linguistics, pages 1–6. Association for Computational Linguistics.

Gross, M. (1988). Les limites de la phrase figée. In Langages, pages 7–22. JSTOR.

Gross, M. (1990). La caractérisation des adverbes dans un lexique-grammaire. In Langue française, pages 90–102. JSTOR.

Gross, M. (1992b). The argument structure of elementary sentences. In Language Research, volume 28, pages 699–716.

Gross, M. (1993). Les phrases figées en français. In L’information grammaticale, volume 59, pages 36–41. Peeters.

Gross, M. (1995). Une grammaire locale de l’expression des sentiments. In Langue française, pages 70–87. JSTOR.

Gross, M. (1996). Les verbes supports d’adjectifs et le passif. In Langages, volume 30, pages 8–18. Armand Colin.

Gross, M. (1997). The construction of local grammars. In Finite-state language processing, volume 11, page 329. MIT Press.

Gruber, J. S. (1965). Studies in lexical relations. PhD thesis, Massachusetts Institute of Technology.

Gruen, T. W., Osmonbekov, T., and Czaplewski, A. J. (2006). ewom: The impact of customer-to-customer online know-how exchange on customer value and loyalty. In Journal of Business research, volume 59, pages 449–456. Elsevier.

Guglielmo, D. (2009). “Guardarsi allo specchio”: forme lessicali e sintattiche della vanità. In Atti del II Convegno Interdipartimentale del Centro Studi sulle Rappresentazioni Linguistiche.

Guillet, A. and Leclère, C. (1981). Restructuration du groupe nominal. In Langages, pages 99–125. JSTOR.

Gutiérrez, Y., Vázquez, S., and Montoyo, A. (2011). Sentiment classification using semantic features extracted from WordNet-based resources. In Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis, pages 139–145. Association for Computational Linguistics.

Hanani, U., Shapira, B., and Shoval, P. (2001). Information filtering: Overview of issues, research and systems. In User Modeling and User-Adapted Interaction, volume 11, pages 203–259. Kluwer Academic Publishers.

Hansen, L. K., Arvidsson, A., Nielsen, F. Å., Colleoni, E., and Etter, M. (2011). Good friends, bad news-affect and virality in twitter. In Future information technology, pages 34–43. Springer.

Harkema, H., Dowling, J. N., Thornblade, T., and Chapman, W. W. (2009). Context: An algorithm for determining negation, experiencer, and temporal status from clinical reports. In Journal of biomedical informatics, volume 42, page 839. NIH Public Access.

Harris, Z. (1988). Language and information. Columbia University Press.

Harris, Z. S. (1964). Transformations in linguistic structure. In Proceedings of the American Philosophical Society, pages 418–422.

Harris, Z. S. (1970). Discourse analysis. In Papers in structural and transformational linguistics, pages 313–347.

Hassan, A. and Radev, D. (2010). Identifying text polarity using random walks. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 395–403.

Hatzivassiloglou, V. and McKeown, K. R. (1997). Predicting the semantic orientation of adjectives. In Proceedings of the 35th annual meeting of the association for computational linguistics and eighth conference of the european chapter of the association for computational linguistics, pages 174–181. Association for Computational Linguistics.

Herlocker, J. L., Konstan, J. A., Borchers, A., and Riedl, J. (1999). An algorithmic framework for performing collaborative filtering. In Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, pages 230–237. ACM.

Hernandez-Farias, I., Buscaldi, D., and Priego-Sánchez, B. (2014). Iradabe: Adapting english lexicons to the italian sentiment polarity classification task. In First Italian Conference on Computational Linguistics (CLiC-it 2014) and the fourth International Workshop EVALITA2014, pages 75–81.

Horn, L. (1989). A natural history of negation. University of Chicago Press.

Houen, S. (2011). Opinion mining with semantic analysis. In Online: www.diku.dk/forskning/Publikationer/specialer/2011/specialerapport_final_Soren_Houen.pdf.

Hu, M. and Liu, B. (2004). Mining opinion features in customer reviews. In AAAI, volume 4, pages 755–760.

Hu, M. and Liu, B. (2006). Opinion feature extraction using class sequential rules. In AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs, pages 61–66.

Iacobini, C. (2004). Prefissazione. In La formazione delle parole in italiano. Tübingen, Max Niemeyer Verlag, pages 97–161.

Jackendoff, R. (1972). Semantic interpretation in generative grammar. MIT press, Cambridge, MA.

Jackendoff, R. (1990). Semantic structures, volume 18. MIT press.

Jespersen, O. (1965). The philosophy of grammar. Number 307. University of Chicago Press.

Jia, L., Yu, C., and Meng, W. (2009). The effect of negation on sentiment analysis and retrieval effectiveness. In Proceedings of the 18th ACM conference on Information and knowledge management, pages 1827–1830. ACM.

Jin, X., Li, Y., Mah, T., and Tong, J. (2007). Sensitive webpage classification for content advertising. In Proceedings of the 1st international workshop on Data mining and audience intelligence for advertising, pages 28–33. ACM.

Jindal, N. and Liu, B. (2006a). Identifying comparative sentences in text documents. In Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, pages 244–251. ACM.

Jindal, N. and Liu, B. (2006b). Mining comparative sentences and relations. In AAAI, volume 22, pages 1331–1336.

Jindal, N. and Liu, B. (2007). Review spam detection. In Proceedings of the 16th international conference on World Wide Web, pages 1189–1190. ACM.

Justo, R., Corcoran, T., Lukin, S. M., Walker, M., and Torres, M. I. (2014). Extracting relevant knowledge for the detection of sarcasm and nastiness in the social web. In Knowledge-Based Systems, volume 69, pages 124–133. Elsevier.

Kaji, N. and Kitsuregawa, M. (2007). Building lexicon for sentiment analysis from massive collection of html documents. In EMNLP-CoNLL, pages 1075–1083.

Kamp, H. and Reyle, U. (1993). From discourse to logic: Introduction to modeltheoretic semantics of natural language, formal logic and discourse representation theory. Number 42. Springer Science & Business Media.

Kamps, J., Marx, M., Mokken, R. J., and De Rijke, M. (2004). Using wordnet to measure semantic orientations of adjectives. In LREC, volume 4, pages 1115–1118.

Kanayama, H. and Nasukawa, T. (2006a). Fully automatic lexicon expansion for domain-oriented sentiment analysis. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, pages 355–363. Association for Computational Linguistics.

Kanayama, H. and Nasukawa, T. (2006b). Fully automatic lexicon expansion for domain-oriented sentiment analysis. In COLING, ACL 2006, page 355.

Kang, H., Yoo, S. J., and Han, D. (2012). Senti-lexicon and improved naïve bayes algorithms for sentiment analysis of restaurant reviews. In Expert Systems with Applications, volume 39, pages 6000–6010. Elsevier.

Kennedy, A. and Inkpen, D. (2006). Sentiment classification of movie reviews using contextual valence shifters. In Computational intelligence, volume 22, pages 110–125. Wiley Online Library.

Khan, K., Baharudin, B., and Khan, A. Identifying product features from customer reviews using hybrid dependency patterns. Citeseer.

Khan, K., Baharudin, B. B., and City, T. (2012). Identifying product features from customer reviews using lexical concordance. In Research Journal of Applied Sciences Engineering and Technology, volume 4, pages 833–839.

Kim, S.-M. and Hovy, E. (2004). Determining the sentiment of opinions. In Proceedings of the 20th international conference on Computational Linguistics, page 1367.

Kim, S.-M. and Hovy, E. (2005). Identifying opinion holders for question answering in opinion texts. In Proceedings of AAAI-05 Workshop on Question Answering in Restricted Domains, pages 1367–1373.

Kim, S.-M. and Hovy, E. (2006). Extracting opinions, opinion holders, and topics expressed in online news media text. In Proceedings of the Workshop on Sentiment and Subjectivity in Text, pages 1–8. Association for Computational Linguistics.

Kim, Y., Kim, S., and Myaeng, S.-H. (2008). Extracting topic-related opinions and their targets in NTCIR-7. In Proceedings of the 7th NTCIR Workshop Meeting, Tokyo, Japan.

Kim, Y.-j. and Larson, R. (1989). Scope interpretation and the syntax of psych-verbs. In Linguistic Inquiry, pages 681–688. JSTOR.

Kingsbury, P. and Palmer, M. (2002). From treebank to propbank. In LREC.

Klein B and Leffler K B (1981) The role of market forces in assuringcontractual performance In The Journal of Political Economy pages615ndash641 JSTOR

Klein K and Kutscher S (2002) Psych-verbs and lexical economy InTheorie des Lexikons volume 122 pages 1ndash41

Klein L R (1998) Evaluating the potential of interactive mediathrough a new lens Search versus experience goods In Journal ofbusiness research volume 41 pages 195ndash203 Elsevier

Kleinberg J M (1999) Authoritative sources in a hyperlinked environment In Journal of the ACM (JACM) volume 46 pages 604–632 ACM

Kobayakawa T S Kumano T Tanaka H Okazaki N Kim J-D and Tsujii J (2009) Opinion classification with tree kernel svm using linguistic modality analysis In Proceedings of the 18th ACM conference on Information and knowledge management pages 1791–1794 ACM

Ku L-W Huang T-H and Chen H-H (2009) Using morphological and syntactic structures for chinese opinion analysis In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing Volume 3-Volume 3 pages 1260–1269

Lafferty J McCallum A and Pereira F C (2001) Conditional random fields Probabilistic models for segmenting and labeling sequence data

Lakoff G (1973) Hedges A study in meaning criteria and the logic of fuzzy concepts In Journal of philosophical logic volume 2 pages 458–508 Springer

Landau I (2010) The locative syntax of experiencers volume 34 MITpress Cambridge


Landauer T K and Dumais S T (1997) A solution to Plato's problem The latent semantic analysis theory of acquisition induction and representation of knowledge In Psychological review volume 104 page 211 American Psychological Association

Laporte E (1997) L'analyse de phrases adjectivales par rétablissement de noms appropriés In Langages volume 31 pages 79–104 Armand Colin

Laporte E (2012) Appropriate nouns with obligatory modifiers In arXiv preprint arXiv:1207.4625

Laporte E and Voyatzi S (2008) An electronic dictionary of french multiword adverbs In Language Resources and Evaluation Conference Workshop Towards a Shared Task for Multiword Expressions pages 31–34

Larreya P (2004) L'expression de la modalité en français et en anglais (domaine verbal) In Revue belge de philologie et d'histoire volume 82 pages 733–762 Société pour le progrès des études philologiques et historiques

Lavanda I and Rampa G (2001) Microeconomia scelte individuali e benessere sociale Carocci

Laver M Benoit K and Garry J (2003) Extracting policy positions from political texts using words as data In American Political Science Review volume 97 pages 311–331 Cambridge Univ Press

Le Pesant D and Mathieu-Colas M (1998) Introduction aux classes d'objets In Langages pages 6–33 JSTOR

Lee C M Narayanan S S and Pieraccini R (2002) Combining acoustic and language information for emotion recognition In INTERSPEECH


Lee M and Youn S (2009) Electronic word of mouth (ewom) how ewom platforms influence consumer product judgement In International Journal of Advertising volume 28 pages 473–499 Taylor & Francis

Lenci A Lapesa G and Bonansinga G (2012) Lexit A computational resource on italian argument structure In LREC pages 3712–3718

Leung C W Chan S C and Chung F-l (2006) Integrating collaborative filtering and sentiment analysis A rating inference approach In Proceedings of the ECAI 2006 workshop on recommender systems pages 62–66

Leung C W Chan S C and Chung F-L (2008) Evaluation of a rating inference approach to utilizing textual reviews for collaborative recommendation In Cooperative Internet Computing World Scientific Publisher World Scientific

Li F Huang M and Zhu X (2010) Sentiment analysis with global topics and local dependency In AAAI

Linden G Smith B and York J (2003) Amazon.com recommendations Item-to-item collaborative filtering In Internet Computing IEEE volume 7 pages 76–80 IEEE

Liu B (2010) Sentiment analysis and subjectivity In Handbook of natural language processing volume 2 pages 627–666 Chapman & Hall Goshen CT

Liu B (2012) Sentiment analysis and opinion mining In Synthesis Lectures on Human Language Technologies volume 5 pages 1–167 Morgan & Claypool Publishers

Liu J and Seneff S (2009) Review sentiment scoring via a parse-and-paraphrase paradigm In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing Volume 1-Volume 1 pages 161–169 Association for Computational Linguistics


Macdonald C Ounis I and Soboroff I (2007) Overview of the trec 2007 blog track In TREC volume 7 pages 31–43

Mahmud A Ahmed K Z and Khan M (2008) Detecting flames and insults in text In Proceedings of the Sixth International Conference on Natural Language Processing BRAC University

Maisto A and Pelosi S (2014a) Feature-based customer review summarization In On the Move to Meaningful Internet Systems OTM 2014 Workshops pages 299–308

Maisto A and Pelosi S (2014b) A lexicon-based approach to sentiment analysis the italian module for nooj In Proceedings of the International Nooj 2014 Conference University of Sassari Italy Cambridge Scholars Publishing

Maks I and Vossen P (2011) Different approaches to automatic polarity annotation at synset level In Proceedings of the First International Workshop on Lexical Resources WoLeR pages 62–69

Maks I Vossen P et al (2012) Building a fine-grained subjectivity lexicon from a web corpus In LREC pages 3070–3076

Markov A (1971) Extension of the limit theorems of probability theory to a sum of variables connected in a chain John Wiley & Sons Inc

Màrquez L Carreras X Litkowski K C and Stevenson S (2008) Semantic role labeling an introduction to the special issue In Computational linguistics volume 34 pages 145–159 MIT Press

Martín J (1998) The lexico-syntactic interface psych verbs and psych nouns In Hispania pages 619–631 JSTOR

Mathieu Y Y (1995) Verbes psychologiques et interprétation sémantique In Langue française pages 98–106 JSTOR

Mathieu Y Y (1999a) Les prédicats de sentiment In Langages pages 41–52 JSTOR


Mathieu Y Y (1999b) Un classement sémantique des verbes psychologiques In Cahiers du CIEL Publications Paris 7

Mathieu Y Y (2006) A computational semantic lexicon of french verbs of emotion In Computing attitude and affect in text Theory and applications pages 109–124 Springer

Matthews H Hendrickson C and Soh D (2001) Environmental and economic effects of e-commerce A case study of book publishing and retail logistics In Transportation Research Record Journal of the Transportation Research Board number 1763 pages 6–12 Transportation Research Board of the National Academies

Maynard D and Greenwood M A (2014) Who cares about sarcastic tweets investigating the impact of sarcasm on sentiment analysis In Proceedings of LREC

Maziero E G Pardo T A Di Felippo A and Dias-da Silva B C (2008) A base de dados lexical e a interface web do tep 2.0 thesaurus eletrônico para o português do brasil In Companion Proceedings of the XIV Brazilian Symposium on Multimedia and the Web pages 390–392 ACM

McAfee A Brynjolfsson E Davenport T H Patil D and Barton D (2012) Big data In The management revolution Harvard Bus Rev volume 90 pages 61–67

Meier C (2003) The meaning of too enough and so that In Natural Language Semantics volume 11 pages 69–107 Springer

Meillet A (1906) La Phrase nominale en indoeuropéen Société de linguistique

Mejova Y and Srinivasan P (2011) Exploring feature definition and selection for sentiment classifiers In ICWSM


Meunier A (1984) La sémantique locative de certaines structures N0 être adj In Revue québécoise de linguistique volume 13 pages 95–121 Université du Québec à Montréal

Meunier A (1999) Une construction complexe N0hum être Adj de V0-inf W caractéristique de certains adjectifs à sujet humain In Langages pages 12–44 JSTOR

Meydan M (1996) Constructions adjectivales substantifs appropriés et verbes supports In Linx volume 34 pages 197–210 Centre de recherches linguistiques de Paris 10

Meydan M (1999) La restructuration du gn sujet dans les phrases adjectivales à substantif approprié In Langages pages 59–80 JSTOR

Miller G A (1995) WordNet a lexical database for english In Communications of the ACM volume 38 pages 39–41 ACM

Mohammad S Dunne C and Dorr B (2009) Generating high-coverage semantic orientation lexicons from overtly marked words and a thesaurus In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing Volume 2-Volume 2 pages 599–608 Association for Computational Linguistics

Moilanen K and Pulman S (2007) Sentiment composition In Proceedings of the Recent Advances in Natural Language Processing International Conference pages 378–382

Moilanen K and Pulman S (2008) The good the bad and the unknown morphosyllabic sentiment tagging of unseen words In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies Short Papers pages 109–112

Molinier C and Levrier F (2000) Grammaire des adverbes description des formes en -ment volume 33 Librairie Droz


Monachesi P (2009) Annotation of semantic roles In Utrecht University Netherlands

Montefinese M Ambrosini E Fairfield B and Mammarella N (2014) The adaptation of the affective norms for english words (anew) for italian In Behavior research methods volume 46 pages 887–903 Springer

Moon R (2008) Conventionalized as-similes in english a problem case In International Journal of Corpus Linguistics volume 13 pages 3–37 John Benjamins Publishing Company

Mulder M Nijholt A Den Uyl M and Terpstra P (2004) A lexical grammatical implementation of affect In Text Speech and Dialogue pages 171–177

Mullen T and Collier N (2004) Sentiment analysis using support vector machines with diverse information sources In EMNLP volume 4 pages 412–418

Mullen T and Malouf R (2006) A preliminary investigation into sentiment analysis of informal political discourse In AAAI Spring Symposium Computational Approaches to Analyzing Weblogs pages 159–162

Nakagawa T Inui K and Kurohashi S (2010) Dependency tree-based sentiment classification using crfs with hidden variables In Human Language Technologies The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics pages 786–794 Association for Computational Linguistics

Nakayama M Sutcliffe N and Wan Y (2010) Has the web transformed experience goods into search goods In Electronic Markets volume 20 pages 251–262 Springer

Narayanan R Liu B and Choudhary A (2009) Sentiment analysis of conditional sentences In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing Volume 1-Volume 1 pages 180–189

Nasukawa T and Yi J (2003) Sentiment analysis Capturing favorability using natural language processing In Proceedings of the 2nd international conference on Knowledge capture pages 70–77 ACM

Navigli R and Ponzetto S P (2012) BabelNet The automatic construction evaluation and application of a wide-coverage multilingual semantic network volume 193 pages 217–250

Nelson P (1970) Information and consumer behavior In The Journal of Political Economy pages 311–329 JSTOR

Neviarouskaya A (2010) Compositional Approach for Automatic Recognition of Fine-Grained Affect Judgment and Appreciation in Text PhD thesis University of Tokyo

Neviarouskaya A Prendinger H and Ishizuka M (2007) Textual affect sensing for sociable and expressive online communication In Affective Computing and Intelligent Interaction pages 218–229 Springer

Neviarouskaya A Prendinger H and Ishizuka M (2009a) Compositionality principle in recognition of fine-grained emotions from text In ICWSM

Neviarouskaya A Prendinger H and Ishizuka M (2009b) Sentiful Generating a reliable lexicon for sentiment analysis In Affective Computing and Intelligent Interaction and Workshops 2009 ACII 2009 3rd International Conference on pages 1–6 IEEE

Neviarouskaya A Prendinger H and Ishizuka M (2011) Sentiful A lexicon for sentiment analysis In Affective Computing IEEE Transactions on volume 2 pages 22–36 IEEE


Niedenthal P M Halberstadt J B Margolin J and Innes-Ker Å H (2000) Emotional state and the detection of change in facial expression of emotion In European Journal of Social Psychology volume 30 pages 211–222 Wiley Online Library

Nielsen F Å (2011) A new anew Evaluation of a word list for sentiment analysis in microblogs In arXiv preprint arXiv:1103.2903

Norrick N R (1986) Stock similes In Journal of Literary Semantics volume 15 pages 39–52

Osgood C E (1952) The nature and measurement of meaning In Psychological bulletin volume 49 page 197 American Psychological Association

Pak A and Paroubek P (2011) Twitter for sentiment analysis When language resources are not available In Database and Expert Systems Applications (DEXA) 2011 22nd International Workshop on pages 111–115 IEEE

Pal P Iyer A N and Yantorno R E (2006) Emotion detection from infant facial expressions and cries In Acoustics Speech and Signal Processing 2006 ICASSP 2006 Proceedings 2006 IEEE International Conference on volume 2 pages II–II IEEE

Palmer M Gildea D and Kingsbury P (2005) The proposition bank An annotated corpus of semantic roles In Computational linguistics volume 31 pages 71–106 MIT press

Palmer M Gildea D and Xue N (2010) Semantic role labeling In Synthesis Lectures on Human Language Technologies volume 3 pages 1–103 Morgan & Claypool Publishers

Pang B and Lee L (2004) A sentimental education Sentiment analysis using subjectivity summarization based on minimum cuts In Proceedings of the 42nd annual meeting on Association for Computational Linguistics page 271 Association for Computational Linguistics


Pang B and Lee L (2005) Seeing stars Exploiting class relationships for sentiment categorization with respect to rating scales In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics pages 115–124 Association for Computational Linguistics

Pang B and Lee L (2008a) Opinion mining and sentiment analysis In Foundations and trends in information retrieval volume 2 pages 1–135 Now Publishers Inc

Pang B and Lee L (2008b) Using very simple statistics for review search An exploration In COLING (Posters) pages 75–78

Pang B Lee L and Vaithyanathan S (2002) Thumbs up sentiment classification using machine learning techniques In Proceedings of the ACL-02 conference on Empirical methods in natural language processing-Volume 10 pages 79–86 Association for Computational Linguistics

Pantic M and Patras I (2006) Dynamics of facial expression recognition of facial actions and their temporal segments from face profile image sequences In Systems Man and Cybernetics Part B Cybernetics IEEE Transactions on volume 36 pages 433–449 IEEE

Park C and Lee T M (2009) Information direction website reputation and ewom effect A moderating role of product type In Journal of Business Research volume 62 pages 61–67 Elsevier

Paulo-Santos A Ramos C and Marques N C (2011) Determining the polarity of words through a common online dictionary In Progress in Artificial Intelligence pages 649–663 Springer

Pelosi S (2015a) Morphological relations for the automatic expansion of italian sentiment lexicons In Proceedings of the International Scientific Conference on the Automatic Processing of Natural-Language Electronic Texts NooJ 2015


Pelosi S (2015b) Sentita and doxa Italian databases and tools for sentiment analysis purposes In Proceedings of the Second Italian Conference on Computational Linguistics CLiC-it 2015 pages 226–231 Accademia University Press

Pelosi S Maisto A and Elia A (2015) Sentiment analysis through finite state automata In the 12th International Conference on Finite-State Methods and Natural Language Processing 2015 - FSMNLP 2015

Perez-Rosas V Banea C and Mihalcea R (2012) Learning sentiment lexicons in spanish In LREC pages 3077–3081

Pesetsky D (1996) Zero syntax Experiencers and cascades Number 27 MIT press

Pianta E Bentivogli L and Girardi C (2002) MultiWordNet developing an aligned multilingual database In Proceedings of the first international conference on global WordNet volume 152 pages 55–63

Piao S Ananiadou S Tsuruoka Y Sasaki Y and McNaught J (2007) Mining opinion polarity relations of citations In International Workshop on Computational Semantics (IWCS) pages 366–371

Picabia L (1978) Les constructions adjectivales en français systématique transformationnelle volume 11 Librairie Droz

Picard R W and Picard R (1997) Affective computing volume 252 MIT press Cambridge

Pierre-Yves O (2003) The production and recognition of emotions in speech features and algorithms In International Journal of Human-Computer Studies volume 59 pages 157–183 Elsevier

Plag I (2003) Word-formation in English Cambridge University Press


Polanyi L and Zaenen A (2006) Contextual valence shifters In Computing attitude and affect in text Theory and applications pages 1–10 Springer

Popescu A-M and Etzioni O (2007) Extracting product features and opinions from reviews In Natural language processing and text mining pages 9–28 Springer

Portner P (2009) Modality OUP Oxford

Potts C (2011) On the negativity of negation In Semantics and Linguistic Theory number 20 pages 636–659

Prabowo R and Thelwall M (2009) Sentiment analysis A combined approach In Journal of Informetrics volume 3 pages 143–157 Elsevier

Pradhan S Hacioglu K Ward W Martin J H and Jurafsky D (2003) Semantic role parsing Adding semantic structure to unstructured text In Data Mining 2003 ICDM 2003 Third IEEE International Conference on pages 629–632 IEEE

Qiu G Liu B Bu J and Chen C (2009) Expanding domain sentiment lexicon through double propagation In IJCAI volume 9 pages 1199–1204

Quirk R Crystal D and Education P (1985) A comprehensive grammar of the English language volume 397 Cambridge Univ Press

Rahimi Rastgar S and Razavi N (2013) A System for Building Corpus Annotated With Semantic Roles PhD thesis

Rainer F (2004) Derivazione nominale deaggettivale In La formazione delle parole in italiano pages 293–314

Rao D and Ravichandran D (2009) Semi-supervised polarity lexicon induction In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics pages 675–682 Association for Computational Linguistics


Rappaport Hovav M and Levin B (1998) Building verb meanings In The projection of arguments Lexical and compositional factors pages 97–134

Read J and Carroll J (2009) Weakly supervised techniques for domain-independent sentiment classification In Proceedings of the 1st international CIKM workshop on Topic-sentiment analysis for mass opinion pages 45–52 ACM

Reinstein D A and Snyder C M (2005) The influence of expert reviews on consumer demand for experience goods A case study of movie critics In The journal of industrial economics volume 53 pages 27–51 Wiley Online Library

Remus R Quasthoff U and Heyer G (2010) Sentiws-a publicly available german-language resource for sentiment analysis In LREC

Reyes A and Rosso P (2011) Mining subjective knowledge from customer reviews a specific case of irony detection In Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis pages 118–124 Association for Computational Linguistics

Reyes A Rosso P and Veale T (2013) A multidimensional approach for detecting irony in twitter In Language Resources and Evaluation volume 47 pages 239–268 Springer

Reynolds K Kontostathis A and Edwards L (2011) Using machine learning to detect cyberbullying In Machine Learning and Applications and Workshops (ICMLA) 2011 10th International Conference on volume 2 pages 241–244 IEEE

Ricca D (2004a) Aggettivi deverbali In La formazione delle parole in italiano Tübingen Niemeyer pages 419–444


Ricca D (2004b) Derivazione avverbiale In La formazione delle parole in italiano pages 472–489

Riloff E (1996) An empirical study of automated dictionary construction for information extraction in three domains In Artificial intelligence volume 85 pages 101–134 Elsevier

Riloff E Patwardhan S and Wiebe J (2006) Feature subsumption for opinion analysis In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing pages 440–448 Association for Computational Linguistics

Riloff E Wiebe J and Wilson T (2003) Learning subjective nouns using extraction pattern bootstrapping In Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003-Volume 4 pages 25–32 Association for Computational Linguistics

Rosenbaum P S (1967) The grammar of English predicate complement constructions Research monograph 47 Cambridge MA MIT Press

Rubin V L (2010) Epistemic modality From uncertainty to certainty in the context of information seeking as interactions with texts In Information Processing & Management volume 46 pages 533–540 Elsevier

Ruppenhofer J (2013) Anchoring sentiment analysis in frame semantics In Veredas pages 66–81

Ruppenhofer J Ellsworth M Petruck M R Johnson C R and Scheffczyk J (2006) FrameNet II Extended theory and practice

Ruppenhofer J and Rehbein I (2012) Semantic frames as an anchor representation for sentiment analysis In Proceedings of the 3rd Workshop in Computational Approaches to Subjectivity and Sentiment Analysis pages 104–109 Association for Computational Linguistics


Ruppenhofer J Somasundaran S and Wiebe J (2008) Finding the sources and targets of subjective expressions In LREC

Russom P et al (2011) Big data analytics In TDWI Best Practices Report Fourth Quarter

Ruwet N (1994) Être ou ne pas être un verbe de sentiment In Langue française pages 45–55 JSTOR

Sagot B and Fort K (2007) Améliorer un lexique syntaxique à l'aide des tables du lexique-grammaire adverbes en -ment In 26e Colloque International sur le Lexique et la grammaire 2007

Salvi G and Vanelli L (2004) Nuova grammatica italiana Bologna Il Mulino

Sarwar B Karypis G Konstan J and Riedl J (2001) Item-based collaborative filtering recommendation algorithms In Proceedings of the 10th international conference on World Wide Web pages 285–295 ACM

Saurí R and Pustejovsky J (2009) Factbank A corpus annotated with event factuality In Language resources and evaluation volume 43 pages 227–268 Springer

Schmid H (1994) Probabilistic part-of-speech tagging using decision trees In Proceedings of the international conference on new methods in language processing volume 12 pages 44–49

Schmid H Baroni M Zanchetta E and Stein A (2007) The enriched treetagger system In proceedings of the EVALITA 2007 workshop

Schuler K K (2005) VerbNet A broad-coverage comprehensive verb lexicon PhD thesis Computer and Information Science Dept University of Pennsylvania


Schuller B Rigoll G and Lang M (2003) Hidden markov model-based speech emotion recognition In Acoustics Speech and Signal Processing 2003 Proceedings (ICASSP'03) 2003 IEEE International Conference on volume 2 pages II–1 IEEE

Schuller B Rigoll G and Lang M (2004) Speech emotion recognition combining acoustic features and linguistic information in a hybrid support vector machine-belief network architecture In Acoustics Speech and Signal Processing 2004 Proceedings (ICASSP'04) IEEE International Conference on volume 1 pages I–577 IEEE

Schwarzschild R (2008) The semantics of comparatives and other degree constructions In Language and Linguistics Compass volume 2 pages 308–331 Wiley Online Library

Seki Y Eguchi K Kando N and Aono M (2005) Multi-document summarization with subjectivity analysis at duc 2005 In Proceedings of the Document Understanding Conference (DUC)

Seki Y Evans D K Ku L-W Chen H-H Kando N and Lin C-Y (2007) Overview of opinion analysis pilot task at ntcir-6 In Proceedings of NTCIR-6 Workshop Meeting pages 265–278

Shaikh M A M Prendinger H and Mitsuru I (2007) Assessing sentiment of text by semantic dependency and contextual valence analysis In Affective Computing and Intelligent Interaction pages 191–202 Springer

Shapiro C (1982) Consumer information product quality and seller reputation In The Bell Journal of Economics pages 20–35 JSTOR

Shapiro C (1983) Premiums for high quality products as returns to reputations In The quarterly journal of economics pages 659–679 JSTOR

Shimada K and Endo T (2008) Seeing several stars A rating inference task for a document containing several evaluation criteria In Advances in Knowledge Discovery and Data Mining pages 1006–1014 Springer

Silberztein M (2003) Nooj manual In Available for download at www.nooj4nlp.net

Silberztein M (2008) Complex annotations with nooj In Proceedings of the 2007 International NooJ Conference page 214 Cambridge Scholars Publishing

Silva M J Carvalho P Costa C and Sarmento L (2010) Automatic expansion of a social judgment lexicon for sentiment analysis Technical Report TR 1008

Silva M J Carvalho P and Sarmento L (2012) Building a sentiment lexicon for social judgement mining In Computational Processing of the Portuguese Language pages 218–228 Springer

Singhal P and Bansal A (2013) Improved textual cyberbullying detection using data mining In International Journal of Information and Computation Technology ISSN 0974-2239

Snyder B and Barzilay R (2007) Multiple aspect ranking using the good grief algorithm In HLT-NAACL pages 300–307

Socher R Perelygin A Wu J Y Chuang J Manning C D Ng A Y and Potts C (2013) Recursive deep models for semantic compositionality over a sentiment treebank In Proceedings of the conference on empirical methods in natural language processing (EMNLP) pages 1631–1642

Somprasertsri G and Lalitrojwong P (2010) Mining feature-opinion in online customer reviews for opinion summarization In J UCS volume 16 pages 938–955

Souza M Vieira R Busetti D Chishman R Alves I M et al (2011) Construction of a portuguese opinion lexicon from multiple resources In STIL


Steinberger J Ebrahim M Ehrmann M Hurriyetoglu A Kabadjov M Lenkova P Steinberger R Tanev H Vázquez S and Zavarella V (2012) Creating sentiment dictionaries via triangulation In Decision Support Systems volume 53 pages 689–694 Elsevier

Stigler G J (1961) The economics of information In The journal of political economy pages 213–225 JSTOR

Stone P J Dunphy D C and Smith M S (1966) The General Inquirer A Computer Approach to Content Analysis MIT press

Stone P J and Hunt E B (1963) A computer approach to content analysis studies using the general inquirer system In Proceedings of the May 21-23 1963 spring joint computer conference pages 241–256 ACM

Strapparava C and Mihalcea R (2008) Learning to identify emotions in text In Proceedings of the 2008 ACM symposium on Applied computing pages 1556–1560 ACM

Strapparava C Valitutti A et al (2004) WordNet-Affect an affective extension of WordNet In LREC volume 4 pages 1083–1086

Strapparava C Valitutti A and Stock O (2006) The affective weight of lexicon In Proceedings of the fifth international conference on language resources and evaluation pages 423–426

Swier R S and Stevenson S (2004) Unsupervised semantic role labelling In Proceedings of EMNLP pages 95–102

Taboada M Anthony C and Voll K (2006) Methods for creating semantic orientation dictionaries In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC) Genova Italy pages 427–432

Taboada M Brooke J Tofiloski M Voll K and Stede M (2011) Lexicon-based methods for sentiment analysis In Computational linguistics volume 37 pages 267–307 MIT Press


Taboada M and Grieve J (2004) Analyzing appraisal automatically In Proceedings of AAAI Spring Symposium on Exploring Attitude and Affect in Text (AAAI Technical Report SS-04-07) Stanford University CA pages 158–161 AAAI Press

Tan S Cheng X Wang Y and Xu H (2009) Adapting naive bayes to domain adaptation for sentiment analysis In Advances in Information Retrieval pages 337–349 Springer

Taylor A (1954) Proverbial comparisons and similes from California volume 3 University of California Press

Tepperman J Traum D R and Narayanan S (2006) yeah right sarcasm recognition for spoken dialogue systems In INTERSPEECH

Terveen L Hill W Amento B McDonald D and Creter J (1997) Phoaks A system for sharing recommendations In Communications of the ACM volume 40 pages 59–62 ACM

Tesnière L (1959) Éléments de syntaxe structurale Librairie C Klincksieck

Thomas M Pang B and Lee L (2006) Get out the vote Determining support or opposition from congressional floor-debate transcripts In Proceedings of the 2006 conference on empirical methods in natural language processing pages 327–335 Association for Computational Linguistics

Tolone E and Stavroula V (2011) Extending the adverbial coverage of a nlp oriented resource for french In arXiv preprint arXiv:1111.3462

Tsur O Davidov D and Rappoport A (2010) Icwsm-a great catchy name Semi-supervised recognition of sarcastic sentences in online product reviews In ICWSM

Turney P D (2001) Mining the web for synonyms Pmi-ir versus lsa on toefl In Machine Learning ECML 2001 pages 491–502 Springer


Turney P D (2002) Thumbs up or thumbs down semantic orientation applied to unsupervised classification of reviews In Proceedings of the 40th annual meeting on association for computational linguistics pages 417–424

Turney P D and Littman M L (2003) Measuring praise and criticism Inference of semantic orientation from association In ACM Transactions on Information Systems (TOIS) volume 21 pages 315–346

Veale T and Hao Y (2009) Support structures for linguistic creativity A computational analysis of creative irony in similes In Proceedings of CogSci pages 1376–1381

Velikovich L Blair-Goldensohn S Hannan K and McDonald R (2010) The viability of web-derived polarity lexicons In Human Language Technologies The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics pages 777–785 Association for Computational Linguistics

Vermeij M (2005) The orientation of user opinions through adverbs verbs and nouns In 3rd Twente Student Conference on IT Enschede June

Vietri S (1990) On some comparative frozen sentences in italian In Lingvisticæ Investigationes volume 14 pages 149–174 John Benjamins Publishing Company

Vietri S (2004) Lessico-grammatica dell'italiano Metodi descrizioni e applicazioni UTET Università

Vietri S (2011) On a class of italian frozen sentences In Lingvisticæ Investigationes volume 34 pages 228–267 John Benjamins Publishing Company

Vietri S (2014a) The construction of an annotated corpus for the analysis of italian transfer predicates In Lingvisticæ Investigationes volume 37 pages 69–105 John Benjamins Publishing Company


Vietri S (2014b) Idiomatic Constructions in Italian A Lexicon-grammar Approach volume 31 John Benjamins Publishing Company

Vietri S (2014c) The italian module for nooj In Proceedings of the First Italian Conference on Computational Linguistics CLiC-it 2014 Pisa University Press

Vietri S (2014d) The lexicon-grammar of italian idioms In Workshop on Lexical and Grammatical Resources for Language Processing page 137

Villars R L Olofson C W and Eastwood M (2011) Big data What it is and why you should care In White Paper IDC

Vincze V (2010) Speculation and negation annotation in natural language texts what the case of bioscope might (not) reveal In Negation and Speculation in Natural Language Processing (NeSp-NLP 2010) page 28

Vincze V Szarvas G Farkas R Móra G and Csirik J (2008) The bioscope corpus biomedical texts annotated for uncertainty negation and their scopes In BMC bioinformatics volume 9 page S9 BioMed Central Ltd

Voghera M (2004) Polirematiche In Grossmann M Rainer F(a curadi) La formazione delle parole in italiano Tbingen Niemeyer pages56ndash69

Voll K and Taboada M (2007) Not all words are created equal Ex-tracting semantic orientation as a function of adjective relevance InAI 2007 Advances in Artificial Intelligence pages 337ndash346 Springer

Vollero A (2010) E-marketing e Web communication Verso la gestionedella corporate reputation online Giappichelli Torino

von Ungern-Sternberg, T. and von Weizsäcker, C. C. (1985). The supply of quality on a market for experience goods. In The Journal of Industrial Economics, pages 531–540. JSTOR.

Voyatzi, S. (2006). Description morphosyntaxique et sémantique des adverbes figés en vue d'un système d'analyse automatique des textes grecs. PhD thesis, Université Paris-Est.

Waltinger, U. (2010). GermanPolarityClues: A lexical resource for German sentiment analysis. In LREC.

Wan, X. (2009). Co-training for cross-lingual sentiment classification. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1, pages 235–243. Association for Computational Linguistics.

Wang, X., Zhao, Y., and Fu, G. (2011). A morpheme-based method to Chinese sentence-level sentiment classification. In Int. J. of Asian Lang. Proc., volume 21, pages 95–106.

Warriner, A. B., Kuperman, V., and Brysbaert, M. (2013). Norms of valence, arousal, and dominance for 13,915 English lemmas. In Behavior Research Methods, volume 45, pages 1191–1207. Springer.

Wawer, A. (2012). Extracting emotive patterns for languages with rich morphology. In International Journal of Computational Linguistics and Applications, volume 3, pages 11–24.

Wei, C.-P., Chen, Y.-M., Yang, C.-S., and Yang, C. C. (2010). Understanding what concerns consumers: a semantic approach to product feature extraction from consumer reviews. In Information Systems and E-Business Management, volume 8, pages 149–167. Springer.

Whissel, C. (1989). The dictionary of affect in language. In Emotion: Theory, research and experience, vol. 4: The measurement of emotions, R. Plutchik and H. Kellerman, Eds. New York: Academic.

Whissell, C. (2009). Using the revised dictionary of affect in language to quantify the emotional undertones of samples of natural language. In Psychological Reports, volume 105, pages 509–521. Ammons Scientific.

Whitley, M. S. (1995). Gustar and other psych verbs: a problem in transitivity. In Hispania, pages 573–585. JSTOR.

Wiebe, J. (2000). Learning subjective adjectives from corpora. In AAAI/IAAI, pages 735–740.

Wiebe, J., Wilson, T., Bruce, R., Bell, M., and Martin, M. (2004). Learning subjective language. In Computational Linguistics, volume 30, pages 277–308. MIT Press.

Wiebe, J., Wilson, T., and Cardie, C. (2005). Annotating expressions of opinions and emotions in language. In Language Resources and Evaluation, volume 39, pages 165–210. Springer.

Wiegand, M., Balahur, A., Roth, B., Klakow, D., and Montoyo, A. (2010). A survey on the role of negation in sentiment analysis. In Proceedings of the Workshop on Negation and Speculation in Natural Language Processing, pages 60–68. Association for Computational Linguistics.

Wilson, D. and Sperber, D. (1992). On verbal irony. In Lingua, volume 87, pages 53–76. Elsevier.

Wilson, T., Wiebe, J., and Hoffmann, P. (2005). Recognizing contextual polarity in phrase-level sentiment analysis. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, pages 347–354. Association for Computational Linguistics.

Wilson, T., Wiebe, J., and Hoffmann, P. (2009). Recognizing contextual polarity: An exploration of features for phrase-level sentiment analysis. In Computational Linguistics, volume 35, pages 399–433. MIT Press.

Xia, R. and Zong, C. (2010). Exploring the use of word relation features for sentiment classification. In Proceedings of the 23rd International Conference on Computational Linguistics: Posters, pages 1336–1344. Association for Computational Linguistics.

Xiang, G., Fan, B., Wang, L., Hong, J., and Rose, C. (2012). Detecting offensive tweets via topical feature discovery over a large scale Twitter corpus. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pages 1980–1984. ACM.

Xu, Z. and Zhu, S. (2010). Filtering offensive language in online communities using grammatical relations. In Proceedings of the Seventh Annual Collaboration, Electronic Messaging, Anti-Abuse and Spam Conference.

Yang, S. and Ko, Y. (2011). Extracting comparative entities and predicates from texts using comparative type classification. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1, pages 1636–1644. Association for Computational Linguistics.

Ye, Q., Law, R., Gu, B., and Chen, W. (2011). The influence of user-generated content on traveler behavior: An empirical investigation on the effects of e-word-of-mouth to hotel online bookings. In Computers in Human Behavior, volume 27, pages 634–639. Elsevier.

Ye, Q., Zhang, Z., and Law, R. (2009). Sentiment classification of online reviews to travel destinations by supervised machine learning approaches. In Expert Systems with Applications, volume 36, pages 6527–6535. Elsevier.

Yessenalina, A. and Cardie, C. (2011). Compositional matrix-space models for sentiment analysis. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 172–182. Association for Computational Linguistics.

Yi, J., Nasukawa, T., Bunescu, R., and Niblack, W. (2003). Sentiment analyzer: Extracting sentiments about a given topic using natural language processing techniques. In Data Mining, 2003. ICDM 2003. Third IEEE International Conference on, pages 427–434.

Yin, D., Xue, Z., Hong, L., Davison, B. D., Kontostathis, A., and Edwards, L. (2009). Detection of harassment on Web 2.0. In Proceedings of the Content Analysis in the WEB, volume 2, pages 1–7.

Yuen, R. W., Chan, T. Y., Lai, T. B., Kwong, O. Y., and T'sou, B. K. (2004). Morpheme-based derivation of bipolar semantic orientation of Chinese words. In Proceedings of the 20th International Conference on Computational Linguistics, page 1008. Association for Computational Linguistics.

Zhang, L. and Liu, B. (2011). Identifying noun product features that imply opinions. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers, Volume 2, pages 575–580. Association for Computational Linguistics.

Zhang, L., Liu, B., Lim, S. H., and O'Brien-Strain, E. (2010). Extracting and ranking product features in opinion documents. In Proceedings of the 23rd International Conference on Computational Linguistics: Posters, pages 1462–1470. Association for Computational Linguistics.

Zhao, Q., Sun, C., Liu, B., and Cheng, Y. (2010). Learning to detect hedges and their scope using CRF. In Proceedings of the Fourteenth Conference on Computational Natural Language Learning: Shared Task, pages 100–105. Association for Computational Linguistics.

Zhou, N. (2010). Taboo Language on the Internet: An Analysis of Gender Differences in Using Taboo Language. PhD thesis, Kristianstad University.

Chapter 1

Introduction

Consumers, as Internet users, can freely share their thoughts with huge and geographically dispersed groups of people, competing in this way with the traditional power of marketing and advertising channels. Unlike traditional word of mouth, which is usually limited to private conversations, user-generated content on the Internet can be directly observed and described by researchers. Pang and Lee (2008a, p. 7) identified 2001 as the “beginning of widespread awareness of the research problems and opportunities that Sentiment Analysis and Opinion Mining raise, and subsequently there have been literally hundreds of papers published on the subject”.

The context that facilitated the growth of interest in the automatic treatment of opinions and emotions can be summarized in the following points:

• the expansion of e-commerce (Matthews et al., 2001);

• the growth of user-generated content (forums, discussion groups, blogs, social media, review websites, aggregation sites), which can constitute large-scale databases for training machine learning algorithms (Chin, 2005; Pang and Lee, 2008a);

• the rise of machine learning methods (Pang and Lee, 2008a);

• the importance of on-line Word of Mouth (eWOM) (Gruen et al., 2006; Lee and Youn, 2009; Park and Lee, 2009; Chu and Kim, 2011);

• customer empowerment (Vollero, 2010);

• the large volume, the high velocity and the wide variety of unstructured data (Russom et al., 2011; Villars et al., 2011; McAfee et al., 2012).

The same preconditions caused a parallel rise of attention in the economics and marketing literature. Basically, through user-generated content, consumers share positive or negative information that can influence purchase decisions in many different ways and can shape buyer expectations, above all with regard to experience goods (Nakayama et al., 2010), such as hotels (Ye et al., 2011; Nelson, 1970), restaurants (Zhang et al., 2010), movies (Duan et al., 2008; Reinstein and Snyder, 2005), books (Chevalier and Mayzlin, 2006) or videogames (Zhu and Zhang, 2006; Bounie et al., 2005).

1.1 Research Context

Experience goods are products or services whose qualities cannot be observed prior to purchase. They are distinguished from search goods (e.g. clothes, smartphones) on the basis of the possibility to test them before the purchase and of the information costs involved (Nelson, 1970). Akerlof (1970, pp. 488–490) clearly exemplified the problems caused by the interaction of quality differences and uncertainty:

“There are many markets in which buyers use some market statistic to judge the quality of prospective purchases. [...] The automobile market is used as a finger exercise to illustrate and develop these thoughts. [...] The individuals in this market buy a new automobile without knowing whether the car they buy will be good or a lemon¹. [...] After owning a specific car, however, for a length of time, the car owner can form a good idea of the quality of this machine. [...] An asymmetry in available information has developed: for the sellers now have more knowledge about the quality of a car than the buyers. [...] The bad cars sell at the same price as good cars since it is impossible for a buyer to tell the difference between a good and a bad car; only the seller knows.”

Asymmetric information could damage both the customers, when they are not able to recognize good quality products and services, and the market itself, when the quantity of quality products on offer is smaller than the quantity that would have been offered under the condition of symmetrical information (Lavanda and Rampa, 2001; von Ungern-Sternberg and von Weizsäcker, 1985).

However, it has been deemed possible, for experience goods, to base an evaluation of their quality on the past experiences of customers who, in previous “periods”, had already experienced the same goods:

“In most real world situations suppliers stay on the market for considerably longer than one ‘period’. They then have the possibility to build up reputations or goodwill with the consumers. This is due to the following mechanism. While the consumers cannot directly observe a product's quality at the time of purchase, they may try to draw inferences about this quality from the past experience they (or others) have had with this supplier's products.” (von Ungern-Sternberg and von Weizsäcker, 1985, p. 531)

Possible sources of information for buyers of experience goods are word of mouth and the good reputation of the sellers (von Ungern-Sternberg and von Weizsäcker, 1985; Diamond, 1989; Klein and Leffler, 1981; Shapiro, 1982, 1983; Dewally and Ederington, 2006).

¹ A defective car, discovered to be bad only after it has been bought.

The possibility to survey other customers' experiences increases the chance of selecting a good quality service and the customer's expected utility. It is worthwhile to continue this search until its cost equals its expected marginal return. This is the “optimum amount of search” defined by Stigler (1961).

The rapid growth of the Internet drew the attention of managers and business academics to the possible influences that this medium can exert on customers' information search behaviors and acquisition processes. The hypothesis that the Web 2.0, as an interactive medium, could make the experience goods market transparent by transforming it into a search goods market has been explored by Klein (1998), who based his idea on three factors:

• lower search costs;

• different evaluation of some product (attributes);

• possibility to experience products (attributes) virtually, without physically inspecting them.

Actually, this idea seems too groundbreaking to be entirely true. Indeed, the real outcome of web searches depends on the quantity of available information, on its quality and on the possibility to make fruitful comparisons among alternatives (Alba et al., 1997; Nakayama et al., 2010).

Therefore, from this perspective, it is true that the growth of user-generated content and eWOM can reduce information search costs. On the other hand, the distance increased by e-commerce, the content explosion and the information overload typical of the Big Data age can seriously hinder the achievement of a symmetrical distribution of information, affecting not only the market of experience goods but also that of search goods.

1.2 Filtering Information Automatically

The last remarks on the influence of the Internet on information searches inevitably reopen the long-standing problems connected to web information filtering (Hanani et al., 2001) and to the possibility of automatically “extracting value from chaos” (Gantz and Reinsel, 2011).

An appropriate management of online corporate reputation requires careful monitoring of the new digital environments that strengthen the stakeholders' influence and independence and give support during the decision-making process. For example, analyzing them in depth can show a company the areas that need improvement. Furthermore, they can be a valid support in price determination or in demand prediction (Bloom, 2011; Pang and Lee, 2008a).

It would indeed be difficult for humans to read and summarize such a huge volume of data about customer opinions. However, in other respects, introducing machines to the semantic dimension of human language remains an ongoing challenge.

In this context, business and intelligence applications would play a crucial role in the ability to automatically analyze and summarize not only databases but also raw data in real time.

The largest amount of on-line data is semistructured or unstructured and, as a result, its monitoring requires sophisticated Natural Language Processing (NLP) tools that must be able to pre-process the data from a linguistic point of view and then automatically access their semantic content.

Traditional NLP tasks consist in Data Mining, Information Retrieval and Extraction of facts, that is, objective information about entities, from semistructured and unstructured texts.

In any case, it is of crucial importance for both customers and companies to have at their disposal automatically extracted, analyzed and summarized data, which include not only factual information but also opinions regarding any kind of good they offer (Liu, 2010; Bloom, 2011).

Companies could take advantage of concise and comprehensive customer opinion overviews that automatically summarize the strengths and the weaknesses of their products and services, with evident benefits in terms of reputation management and customer relationship management.

Customer information search costs could be decreased through the same overviews, which offer the opportunity to evaluate and compare the positive and negative experiences of other consumers who have already tested the same products and services.

Obviously, the advantages of sophisticated NLP methods and software, and their ability to distinguish factual from opinionated language, are not limited to the ones discussed so far, but are dispersed and specialized among different tasks and domains, such as:

Ads placement: possibility to avoid websites that are irrelevant, but also unfavorable, when placing advertisements (Jin et al., 2007).

Question-answering: chance to distinguish between factual and speculative answers (Carbonell, 1979).

Text summarization: potential of summarizing different opinions and perspectives in addition to factual information (Seki et al., 2005).

Recommendation systems: opportunity to recommend only the items with positive feedback (Terveen et al., 1997).

Flame and cyberbullying detection: ability to find semantically and syntactically complex offensive contents on the web (Xiang et al., 2012; Chen et al., 2012; Reynolds et al., 2011).

Literary reputation tracking: possibility to distinguish disapproving from supporting citations (Piao et al., 2007; Taboada et al., 2006).

Political texts analysis: opportunity to better understand positive or negative orientations of both voters and politicians (Laver et al., 2003; Mullen and Malouf, 2006).

1.3 Subjectivity, Emotions and Opinions

Sentiment Analysis, Opinion Mining, subjectivity analysis, review mining, appraisal extraction and affective computing are all terms that refer to the computational treatment of opinion, sentiment and subjectivity in raw text. Their ultimate shared goal is enabling computers to recognize and express emotions (Pang and Lee, 2008a; Picard and Picard, 1997).

The first two terms are definitely the most used ones in the literature. Although they basically refer to the same subject, they have met with varying degrees of success in different research communities. While Opinion Mining is more popular among web information retrieval researchers, Sentiment Analysis is more used in NLP environments (Pang and Lee, 2008a). Therefore, consistency with the approach of our work is the only reason for favoring the term Sentiment Analysis. All the terms mentioned above could be used and interpreted nearly interchangeably.

Subjectivity is directly connected to the expression of private states: personal feelings or beliefs toward entities, events and their properties (Liu, 2010). The expression private state has been defined by Quirk et al. (1985) and Wiebe et al. (2004) as a general covering term for opinions, evaluations, emotions and speculations.

The main difference from objectivity regards the impossibility to directly observe or verify subjective language, but neither subjective nor objective implies truth: “whether or not the source truly believes the information, and whether or not the information is in fact true, are considerations outside the purview of a theory of linguistic subjectivity” (Wiebe et al., 2004, p. 281).

Opinions are defined as positive or negative views, attitudes, emotions or appraisals about a topic, expressed by an opinion holder at a given time. They are represented by Liu (2010) as the following quintuple:

$(o_j, f_{jk}, oo_{ijkl}, h_i, t_l)$


Where $o_j$ represents the object of the opinion, $f_{jk}$ its features, $oo_{ijkl}$ the positive or negative semantic orientation of the opinion, $h_i$ the opinion holder and $t_l$ the time at which the opinion is expressed.

Because the time can almost always be deduced from structured data, we focused our work on the automatic detection and annotation of the other elements of the quintuple.

As regards the opinion holder, it must be underlined that “In the case of product reviews and blogs, opinion holders are usually the authors of the posts. Opinion holders are more important in news articles because they often explicitly state the person or organization that holds a particular opinion” (Liu 2010, p. 2).

The Semantic Orientation (SO) gives a measure to opinions by weighing their polarity (positive/negative/neutral) and strength (intense/weak) (Taboada et al. 2011; Liu 2010). The polarity can be assigned to words and phrases outside or inside of a sentence or discourse context. In the first case it is called prior polarity (Osgood 1952); in the second case, contextual or posterior polarity (Gatti and Guerini 2012).

We can refer to both opinion objects and features with the term target (Liu 2010), represented by the following function:

$T = O(f)$

Where the object can take the shape of products, services, individuals, organizations, events, topics, etc., and the features are components or attributes of the object.

Each object $O$, represented as a “special feature” which is defined by a subset of features, is formalized in the following way:

$F = \{f_1, f_2, \ldots, f_n\}$

Targets can be automatically discovered in texts through both synonym words and phrases $W_i$ or indicators $I_i$ (Liu 2010):

$W_i = \{w_{i1}, w_{i2}, \ldots, w_{im}\}$

$I_i = \{i_{i1}, i_{i2}, \ldots, i_{iq}\}$
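
As an illustration only, the quintuple and the target definitions above can be sketched as plain data structures. The class and field names (`Target`, `Opinion`, `polarity`, `strength`), the numeric scale and the sample hotel review values are assumptions made for this sketch; they are not part of the formalization of Liu (2010) nor of the resources described in this thesis.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Target:
    """Target T = O(f): an object, optionally narrowed to one of its features."""
    obj: str                        # o_j, e.g. a product or a service
    feature: Optional[str] = None   # f_jk; None means the object as a whole


@dataclass(frozen=True)
class Opinion:
    """Quintuple (o_j, f_jk, oo_ijkl, h_i, t_l)."""
    target: Target                  # covers both o_j and f_jk
    polarity: int                   # +1 positive, -1 negative, 0 neutral (assumed coding)
    strength: float                 # assumed scale: close to 0 = weak, close to 1 = intense
    holder: str                     # h_i
    time: Optional[str] = None      # t_l, often recoverable from structured data

    def semantic_orientation(self) -> float:
        # Polarity weighted by strength, one possible way to combine the two.
        return self.polarity * self.strength


def mentions(terms: Set[str], tokens: List[str]) -> bool:
    """True if any synonym (W_i) or indicator (I_i) occurs among the tokens."""
    return any(tok.lower() in terms for tok in tokens)


# Hypothetical example: a negative opinion on the room of a hotel.
room = Target(obj="hotel", feature="camera")
op = Opinion(target=room, polarity=-1, strength=0.8, holder="review author")
w_room = {"camera", "stanza"}       # invented synonym set W_i for the feature
found = mentions(w_room, "La stanza era sporca".split())
```

Keeping the object and the feature together in a single `Target` value mirrors the choice of treating both under the same term, while the time slot is left optional because it is not the focus of the annotation.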

Emotions constitute the research object of Emotion Detection (Strapparava et al. 2004; Fragopanagos and Taylor 2005; Alm et al. 2005; Strapparava et al. 2006; Neviarouskaya et al. 2007; Strapparava and Mihalcea 2008; Whissell 2009; Argamon et al. 2009; Neviarouskaya et al. 2009b, 2011; de Albornoz et al. 2012), a very active NLP subfield that encroaches on Speech Recognition (Buchanan et al. 2000; Lee et al. 2002; Pierre-Yves 2003; Schuller et al. 2003; Bhatti et al.; Schuller et al. 2004) and Facial Expression Recognition (Essa et al. 1997; Niedenthal et al. 2000; Pantic and Patras 2006; Pal et al. 2006; De Silva et al. 1997). In this NLP-based work we focus on Emotion Detection from written raw texts.

In order to provide a formal definition of sentiments, we refer to the functions of Gross (1995):

$P(sent, h)$

$Caus(s, sent, h)$

In the first one, the experiencer of the emotion ($h$) is a function of a sentiment ($sent$) through a predicative relationship; the latter expresses a sentiment that is caused by a stimulus ($s$) on a person ($h$).

1.4 Structure of the Thesis

In this Chapter we introduced the research field of Sentiment Analysis, placing it in the wider framework of Computational Linguistics and Natural Language Processing. We tried to clarify some terminological issues raised by the proliferation and the overlap of terms in the literature on subjectivity. The choice of these topics has been justified through the brief presentation of the research context, which poses numerous challenges to all Internet stakeholders.

The main aim of the thesis has been identified with the necessity to treat both facts and opinions expressed on the Web in the form of raw data with the same accuracy and speed as the data stored in database tables.

Details about the contribution provided by this work to Sentiment Analysis studies are structured as follows.

Chapter 2 introduces the theoretical framework on which the whole work is grounded, the Lexicon-Grammar (LG) approach, with an in-depth examination of the distributional and transformational analysis and of the semantic predicates theory (Section 2.1).

The assumption that elementary sentences are the basic discourse units endowed with meaning, and the necessity to replace a unique grammar with plenty of lexicon-dependent local grammars, are the founding ideas of the LG approach on which the research has been designed and realized.

The formalization of the sentiment lexical resources and tools has been performed through the tools described in Section 2.2.

Coherently with the LG method, our approach to Sentiment Analysis is lexicon-based. Chapter 3 provides a detailed description of the sentiment lexicon built ad hoc for the Italian language. It includes manually annotated resources for both simple (Section 3.3) and multiword (Section 3.4) units, and automatically derived annotations (Section 3.5) for the adverbs in -mente and for the nouns of quality.

Because the meaning of a word cannot be considered out of context, Chapter 3 explores the influence of some elementary sentence structures on the sentiment words occurring in them, while Chapter 4 investigates the effects on sentence polarity of the so-called Contextual Valence Shifters (CVS), namely Intensification and Downtoning (Section 4.2), Negation (Section 4.3), Modality (Section 4.4) and Comparison (Section 4.5).

Chapter 5 focuses on the three most challenging subtasks of Sentiment Analysis: the sentiment polarity classification of sentences and documents (Section 5.1), the feature-based Sentiment Analysis (Section 5.2) and the sentiment-based semantic role labeling (Section 5.3). Experiments on these tasks are conducted through our resources and rules on, respectively, a multi-domain corpus of customer reviews, a dataset of hotel comments and a dataset that mixes tweets and news headlines.

Concluding remarks and limitations of the present work are reported in Chapter 6.

Due to the large number of challenges faced in this work, and in order to avoid penalizing the clarity and the readability of the thesis, we preferred to mention the state-of-the-art works related to a given topic directly in the sections in which the topic is discussed. For easy reference, we anticipate that the literature survey on the existing sentiment lexicons is reported in Section 3.2; the methods tested in the literature for lexicon propagation are presented in Section 3.5.1; works related to intensification and negation modeling, to modality detection and to comparative sentence mining are respectively mentioned in Sections 4.2.1, 4.3.1, 4.4.1 and 4.5.1; finally, state-of-the-art approaches to sentiment classification, feature-based Sentiment Analysis and Sentiment Role Labeling (SRL) are cited in Sections 5.1.1, 5.2.1 and 5.3.1.

Some Sections of this thesis refer to works that have in part already been presented to the international computational linguistics community. These are the cases of Chapter 3, briefly introduced in Pelosi (2015a,b) and Pelosi et al. (2015), and Chapter 4, anticipated in Pelosi et al. (2015). Furthermore, the results presented in Section 5.1 have been discussed in Maisto and Pelosi (2014b), those of Section 5.2 in Maisto and Pelosi (2014a), and those of Section 5.3 in Elia et al. (2015).

Chapter 2

Theoretical Background

2.1 The Lexicon-Grammar Theoretical Framework

By Lexicon-Grammar we mean the method and the practice of formal description of natural language introduced by Maurice Gross in the second half of the 1960s. While verifying some procedures of transformational-generative grammar (TGG) (Chomsky 1965), he laid the foundations for a brand new theoretical framework.

More specifically, during the description of French complement clause verbs through a transformational-generative approach, Gross (1975) realized that the English descriptive model of Rosenbaum (1967) was not sufficient to account for the huge number of irregularities he found in the French language.

LG introduced important changes in the way in which the relationship between lexicon and syntax was conceived (Gross 1971, 1975). For the first time, it underlined the necessity to provide linguistic descriptions grounded on the systematic testing of syntactic and semantic rules across the whole lexicon, and not only on a limited set of speculative examples.

Indeed, in order to define what can be considered a rule and what must be considered an exception, a small sample of the lexicon of the analyzed language, collected in arbitrary ways, can be truly misleading. It is assumed to be impossible to make generalizations without verifying or falsifying the hypotheses. According to Gross (1997, p. 1):

“Grammar, or, as it has now been called, linguistic theory, has always been driven by a quest for complete generalizations, resulting invariably in recent times in the production of abstract symbolism, often semantic, but also algorithmic. This development contrasts with that of the other Natural Sciences such as Biology or Geology, where the main stream of activity was and is still now the search and accumulation of exhaustive data. Why the study of language turned out to be so different is a moot question. One could argue that the study of sentences provides an endless variety of forms and that the observer himself can increase this variety at will within his own production of new forms; that would seem to confirm that an exhaustive approach makes no sense [...]

But grammarians operating at the level of sentences seem to be interested only in elaborating general rules and do so without performing any sort of systematic observation and without a methodical accumulation of sentence forms to be used by further generations of scientists.”

In the LG methodology, the collection and the analysis of a large quantity of linguistic facts, and their continuous comparison with the reality of linguistic usage through examples and counter-examples, are crucial.

The linguistic information collected is constantly recorded in LG tables that cross-check the lexicon and the syntax of any given language. The ultimate purpose of this procedure is the definition of wide-ranging classes of lexical items associated with transformational, distributional and structural rules (D’Agostino 1992).

What emerges from the LG studies is that, when more than five or six properties are associated with a lexical entry, each such entry shows an individual behavior that distinguishes it from any other lexical item. However, it is always possible to organize a classification around at least one definitional property that is simultaneously accepted by all the items belonging to the same LG class and that, for this reason, is promoted to distinctive feature of the class.

Having established this, it becomes easier to understand the necessity to replace “the” grammar with thousands of lexicon-dependent local grammars.

The choice of this paradigm is due to its compatibility with the purposes of the present work and with computational linguistics in general, which, in order to reach high performance, requires a large amount of linguistic data that must be as exhaustive, reproducible and well organized as possible.

The huge linguistic datasets produced over the years by the international LG community provide fine-grained semantic and syntactic descriptions of thousands of lexical entries, also referred to as “lexically exhaustive grammars” (D’Agostino 2005; D’Agostino et al. 2007; Guglielmo 2009), available for reuse in any kind of NLP task [1].

The LG classification and description of the Italian verbs [2] (Elia et al. 1981; Elia 1984; D’Agostino 1992) is grounded on the differentiation of three macro-classes: transitive verbs, intransitive verbs and complement clause verbs. Every LG class has its own definitional structure, which corresponds to the syntactic structure of the nuclear sentence selected by a given number of verbs (e.g. V for piovere “to rain” and all the verbs of class 1; $N_0$ V for bruciare “to burn” and the other verbs of class 3; $N_0$ V da $N_1$ for provenire “to come from” and the verbs belonging to class 6; etc.). All the lexical entries are then differentiated from one another within each class by taking into account all the transformational, distributional and structural properties accepted or rejected by every item.

[1] LG descriptions are available for 4000+ nouns that enter into verb support constructions, 7000+ verbs, 3000+ multiword adverbs and almost 1000 phrasal verbs (Elia 2014a).

[2] LG tables are freely available at the address http://dsc.unisa.it/composti/tavole/combo/tavole.asp

2.1.1 Distributional and Transformational Analysis

The Lexicon-Grammar theory is founded on the Operator-argument grammar of Zellig S. Harris, the combinatorial system that supports the generation of utterances in natural language.

Saying that the operators store information regarding the sentence structures means assuming the nuclear sentence to be the minimum discourse unit endowed with meaning (Gross 1992b).

This premise is shared with the LG theory, together with the centrality of distributional analysis, a method from structural linguistics that was first formulated by Bloomfield (1933) and then perfected by Harris (1970). The insight that some categories of words can somehow control the functioning of a number of actants through a dependency relationship called valency comes instead from Tesnière (1959).

The distribution of an element A is defined by Harris (1970) as the sum of the contexts of A, where the context of A is the actual disposition of its co-occurring elements. Distributional analysis consists in the systematic substitution of lexical items with the aim of verifying the semantic or the transformational reactions of the sentences. All of this is governed by verisimilitude rules that involve a gradation in the acceptability of the item combinations.

Although the native speakers of a language generally think that sentence elements can be combined arbitrarily, they actually choose a set of items among the classes that regularly appear together. The selection depends on the likelihood that an element co-occurs with elements of one class rather than another [3].

When two sequences can appear in the same context, they have an equivalent distribution. When they do not share any context, they have a complementary distribution.

Obviously, the possibility of combining sentence elements is related to many different levels of acceptability. Basically, the utterances produced by the co-occurrence of words can vary in terms of plausibility. In the operator-argument grammar of Harris, the verisimilitude of occurrence of a word under an operator (or an argument) is an approximation of the likelihood or of the frequency of this word with respect to a fixed number of occurrences of the operator (or argument) (Harris 1988).

By transformations (Harris 1964) we refer to the phenomenon in which the sentences A and B, despite a different combination of words, are equivalent from a semantic point of view (e.g. Floria ha letto quel romanzo = quel romanzo è stato letto da Floria “Floria read that novel = that novel has been read by Floria” (D’Agostino 1992)). Therefore, a transformation is defined as a function T that ties a set of variables representing the sentences that are in paraphrastic equivalence (A and B are partial or full synonyms) and that follow the morphemic invariance (A and B possess the same lexical morphemes).

Transformational relations must not be confused with any derivation process of the sentence B from A, or vice versa. A and B are just two correlated variants in which equivalent grammatical relations are realized (D’Agostino 1992) [4]. That is why in this work we do not use directed arrows to indicate transformed sentences.

[3] Among the traits that mark the co-occurrence classes we mention, by way of example, human and non-human nouns, abstract nouns, verbs with concrete or abstract usages, etc. (Elia et al. 1981).

[4] The concept of transformation is different in generative grammar, which envisages an oriented passage from a deep structure to a surface one: “The central idea of transformational grammar is that they are, in general, distinct and that the surface structure is determined by repeated application of certain formal operations called ‘grammatical transformations’ to objects of a more elementary sort” (Chomsky 1965, p. 16).

Among the transformational manipulations considered in this thesis while applying the Lexicon-Grammar theoretical framework to the Sentiment Analysis field, we mention the following (Elia et al. 1981; Vietri 2004):

Substitution: The substitution of polar elements in sentences is the starting point of this work, which explores the mutations in terms of acceptability and semantic orientation in the transformed sentences. It can be found in almost every chapter of this thesis.

Paraphrases with support verbs [5], nominalizations and adjectivalizations: This manipulation is essential, for example, in Sections 3.3.4 and 3.3.5, where we discuss the nominalization (e.g. amare = amore “to love = love”) and the adjectivalization (e.g. amare = amoroso “to love = loving”) of the Italian psychological verbs, or in Section 3.5.2, when we account for the relation among a set of sentiment adjectives and adverbs (e.g. appassionatamente = in modo appassionato “passionately = in a passionate way”).

Repositioning: Emphasis, permutation, dislocation and extraction are all transformations that, by shifting the focus onto a specific element of the sentence, influence its strength when the element is polarized.

Extension: We are above all interested in the expansion of nominal groups with opinionated modifiers (see Section 3.3.2).

Restructuring: The complement restructuring plays a crucial role in the Feature-based Sentiment Analysis, for the correct annotation of the features and the objects of the opinions (see Sections 3.3.4 and 5.2).

Passivization: In our system of rules, passive sentences simply possess the same semantic orientation as the corresponding active sentences.

Systematic correlation: Examples are the standard/crossed structures mentioned in Section 3.3.4.

[5] The concept of operator does not depend on a specific part of speech, so nouns, adjectives and prepositions can also possess the power to determine the nature and the number of the sentence arguments (D’Agostino 1992). Because only verbs carry morpho-grammatical information regarding mood, tense, person and aspect, they must give this kind of support to non-verbal operators. The so-called Support Verbs (Vsup) are different from auxiliaries (Aux), which instead support other verbs. Support verbs (e.g. essere “to be”, avere “to have”, fare “to do”) can be substituted, case by case, by stylistic, aspectual and causative equivalents.
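
The notions of distribution, equivalent distribution and complementary distribution introduced earlier in this section can be sketched on a toy scale. In the minimal example below, a context is simplified to the pair of immediately adjacent words, and "equivalent" is read strictly as "identical context sets"; both choices, like the four-sentence corpus and the function names, are assumptions made for illustration and are far coarser than the manual substitution tests used in LG practice.

```python
from collections import defaultdict


def contexts(corpus):
    """Map each word to the set of (left neighbour, right neighbour) pairs it occurs in."""
    dist = defaultdict(set)
    for sentence in corpus:
        toks = ["<s>"] + sentence.lower().split() + ["</s>"]
        for i in range(1, len(toks) - 1):
            dist[toks[i]].add((toks[i - 1], toks[i + 1]))
    return dist


def compare(dist, a, b):
    """Classify the distributional relation between two words."""
    ctx_a, ctx_b = dist.get(a, set()), dist.get(b, set())
    if ctx_a and ctx_a == ctx_b:
        return "equivalent"        # same contexts
    if not (ctx_a & ctx_b):
        return "complementary"     # no shared context
    return "overlapping"           # some, but not all, contexts shared


# Invented toy corpus.
corpus = [
    "Maria odia il fumo",
    "Maria detesta il fumo",
    "Luca odia il traffico",
    "Luca detesta il traffico",
]
dist = contexts(corpus)
verbs = compare(dist, "odia", "detesta")
noun_pair = compare(dist, "fumo", "maria")
```

Here `odia` and `detesta` fill exactly the same slots, while `fumo` and `maria` never share a context; on real data the intermediate "overlapping" case is by far the most frequent, which is one reason why graded acceptability matters.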

2.1.2 Formal Representation of Elementary Sentences

While generative grammar focused its studies on complex sentences, both the Operator grammar and the LG approach assumed the so-called nuclear or elementary sentences, respectively, to be the minimum discourse units endowed with meaning, above all because complex sentences are made from elementary ones, as stated by Chomsky (1957, p. 92) himself:

“In particular, in order to understand a sentence it is necessary to know the kernel sentences from which it originates (more precisely, the terminal strings underlying these kernel sentences) and the phrase structure of each of these elementary components, as well as the transformational history of development of the given sentence from these kernel sentences. The general problem of analyzing the process of ‘understanding’ is thus reduced, in a sense, to the problem of explaining how kernel sentences are understood, these being considered the basic ‘content elements’ from which the usual, more complex sentences of real life are formed by transformational development.”

This assumption shifts the definition of creativity from “recursivity” in complex sentences (Chomsky 1957) to “combinatory possibility” at the level of elementary sentences (Vietri 2004), but it also affects the way in which the lexicon is conceived. Gross (1981, p. 48) made the following observation in this regard [6]:

“Les entrées du lexique ne sont pas des mots, mais des phrases simples. Ce principe n’est en contradiction avec les notions traditionnelles de lexique que de façon apparente. En effet, dans un dictionnaire, il n’est pas possible de donner le sens d’un mot sans utiliser une phrase, ni de contraster des emplois différents d’un même mot sans le placer dans des phrases” [7]

As will be shown in the following Sections, the idea that the entries of the lexicon can be sentences rather than isolated words is a principle shared by all the sentiment lexical resources built for the present work.

With regard to formalisms, Gross (1992b) represented all the elementary sentences through the following generic shape:

$N_0\ V\ W$

Where $N_0$ stands for the formal subject of the sentence, V stands for the verb and W can indicate all kinds of essential complements, including an empty one. While $N_0$ V represents a great generality, W raises more complex classification problems (Gross 1992b). Complements indicated by W can be direct ($N_i$), prepositional (Prep $N_i$) or sentential (Ch F).

[6] On this topic, Chomsky also agrees that:

“In describing the meaning of a word it is often expedient, or necessary, to refer to the syntactic framework in which this word is usually embedded; e.g., in describing the meaning of hit we would no doubt describe the agent and object of the action in terms of the notions subject and object, which are apparently best analyzed as purely formal notions belonging to the theory of grammar” (Chomsky 1957, p. 104)

But, without any exhaustive application of syntactic rules to lexical items, he feels compelled to point out that:

“(...) to generalize from this fairly systematic use and to assign ‘structural meanings’ to grammatical categories or constructions, just as ‘lexical meanings’ are assigned to words or morphemes, is a step of very questionable validity” (Chomsky 1957, p. 104)

[7] “The entries of the lexicon are not words, but simple sentences. This principle contradicts the traditional notions of lexicon only apparently. Indeed, in a dictionary, it is impossible to provide the meaning of a word without using a sentence, or to contrast different uses of the same word without placing it into sentences”. Author's translation.


Information about the nature and the number of complements is contained in the verbs (or in other parts of speech that, case by case, play a predicative function).

Gross (1992b) presented the following typology of complements, respectively approached for the Italian language by the LG researchers mentioned below:

frozen arguments (C): Vietri (1990, 2011, 2014d)

free arguments (N): D’Agostino (1992), Bocchino (2007)

sentential arguments (Ch F): Elia (1984)

2.1.2.1 The Continuum from Simple Sentences to Polyrematic Structures

Gross (1988, 1992b, 1993) recognized and formalized the concept of phrase figée “frozen sentence”, which refers to groups of words that can be tied to one another by different degrees of variability and cohesion (Elia and De Bueriis 2008). They can take the shape of verbal, nominal, adjectival or adverbial structures.

The continuum along which frozen expressions can be classified varies according to higher or lower levels of compositionality and idiomaticity, and goes from completely free structures to totally frozen expressions, as summarized below (D’Agostino and Elia 1998; Elia and De Bueriis 2008):

distributionally free structures: high levels of co-occurrence variability among words

distributionally restricted structures: limited levels of co-occurrence variability

frozen structures: (almost) null levels of co-occurrence variability

proverbs: no variability of co-occurrence


Idiomatic interpretations can be attributed to the last two elements of the continuum, due to their low levels of compositionality.

The fact that the meaning of these expressions cannot be computed by simply summing up the meanings of the words that compose them has a direct impact on every kind of Sentiment Analysis application that does not correctly approach the linguistic phenomenon of idiomaticity.

We will show the role of multiword expressions in Section 3.4.
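
The practical consequence of non-compositionality can be made concrete with a small sketch that contrasts a word-by-word sum of polarities with a lookup that first tries to match frozen expressions as whole units. The scores, the two Italian idioms (al settimo cielo "in seventh heaven", al verde "broke") and the function names are invented for illustration; they are not entries or rules of the lexicon described in this thesis.

```python
# Invented prior polarities for single words (illustrative values only).
WORD_SCORES = {"bravo": 2, "triste": -2}

# Invented scores for frozen expressions, stored as whole token sequences.
MWE_SCORES = {
    ("al", "settimo", "cielo"): 3,
    ("al", "verde"): -2,
}
MAX_LEN = max(len(key) for key in MWE_SCORES)


def naive_score(tokens):
    """Sum of single-word polarities: blind to idiomatic meaning."""
    return sum(WORD_SCORES.get(tok, 0) for tok in tokens)


def mwe_aware_score(tokens):
    """Greedy longest match on frozen expressions, falling back to single words."""
    score, i = 0, 0
    while i < len(tokens):
        for n in range(min(MAX_LEN, len(tokens) - i), 1, -1):
            key = tuple(tokens[i:i + n])
            if key in MWE_SCORES:
                score += MWE_SCORES[key]
                i += n
                break
        else:
            # No multiword entry starts here: score the single word.
            score += WORD_SCORES.get(tokens[i], 0)
            i += 1
    return score


happy = "maria è al settimo cielo".split()
mixed = "luca è bravo ma al verde".split()
```

On `happy` the word-by-word sum sees only neutral words, while the multiword lookup recovers the positive orientation of the idiom; on `mixed` the naive sum misses the negative contribution of al verde altogether.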

2.1.2.2 Against the Bipartition of the Sentence Structure

In the operator grammar, differently from the generative grammar, sentences are not supposed to have a bipartite structure

$S \rightarrow NP + VP$

$VP \rightarrow V + NP$ [8]

but are represented as predicative functions in which a central element, the operator, influences the organization of its variables, the arguments. The role of operator need not be played by verbs only: it can also be played by other lexical items that possess a predicative function, such as nouns or adjectives.

This idea, shared also by the LG framework, is particularly important in the Sentiment Analysis field, especially if the task is the detection of the semantic roles involved in sentiment expressions (Section 5.3). Sentences like (1) and (2) are examples in which the semantic roles of experiencer of the sentiment and stimulus of the sentiment are respectively played by “Maria” and il fumo “the smoke”. It is intuitive that, from a syntactic point of view, these semantic roles are differently distributed in the direct (the experiencer is the subject, as in 1) and the reverse (the experiencer is the object, as in 2) sentence structures (see Section 3.3.3 for further explanations).

[8] Sentence = Noun Phrase + Verb Phrase; Verb Phrase = Verb + Noun Phrase.


(1) [Sentiment [Experiencer Maria] odia [Stimulus il fumo]] [9]
“Maria hates the smoke”

(2) [Sentiment [Stimulus Il fumo] tormenta [Experiencer Maria]]
“The smoke harasses Maria”

The indissoluble bond between the sentiment and the experiencer [10], which does not differ from (1) to (2), is perfectly expressed by the Operator-argument grammar function $O_{nn}$ for both sentences, by the LG representation $N_0$ V $N_1$, or by the function of Gross (1995) $Caus(s, sent, h)$ (described in Section 1.3 and examined in more depth in Section 3):

(1) Caus(il fumo, odiare, Maria)
(2) Caus(il fumo, tormentare, Maria)
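
The two formulas above can be read as the output of a mapping from syntactic positions to semantic roles that depends on the verb. The following minimal sketch assumes a hypothetical two-entry table (`EXPERIENCER_SLOT`) recording whether each psychological verb places the experiencer in subject ($N_0$) or object ($N_1$) position; the names are illustrative and do not reproduce the rules of the system described in this thesis.

```python
from typing import NamedTuple


class Caus(NamedTuple):
    """Caus(s, sent, h): a stimulus s causes a sentiment sent in an experiencer h."""
    stimulus: str
    sentiment: str
    experiencer: str


# Hypothetical mini-lexicon: syntactic slot that hosts the experiencer.
EXPERIENCER_SLOT = {
    "odiare": "N0",       # direct structure: the experiencer is the subject
    "tormentare": "N1",   # reverse structure: the experiencer is the object
}


def to_caus(n0: str, verb: str, n1: str) -> Caus:
    """Map an N0 V N1 sentence onto the Caus predicate."""
    if EXPERIENCER_SLOT[verb] == "N0":
        return Caus(stimulus=n1, sentiment=verb, experiencer=n0)
    return Caus(stimulus=n0, sentiment=verb, experiencer=n1)


s1 = to_caus("Maria", "odiare", "il fumo")       # sentence (1)
s2 = to_caus("il fumo", "tormentare", "Maria")   # sentence (2)
```

Although the subject and the object are swapped between the two sentences, both calls return the same stimulus and the same experiencer, which is the invariance that a purely positional representation does not capture.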

The same cannot be said of the bipartite sentence structure which, by representing the experiencer “Maria” inside (2) and outside (1) the verb phrase, unreasonably brings it closer to or further from the sentiment to which it should be attached in exactly the same way (Figure 2.1).

Moreover, it does not explain how the verbs of (1) and (2) can exercise distributional constraints on the noun phrase indicating the experiencer (which must be human or at least animate) both when it is under the dependence of the same verb phrase (experiencer object, see 2a) and when it is not (experiencer subject, see 1a).

(1a) [Sentiment [Experiencer (Maria + Il mio gatto + *Il televisore)] odia [Stimulus il fumo]]
“(Maria + My cat + *The television) hates the smoke”

[9] For the sentence annotation we used the notation of Jackendoff (1990): [Event ].

[10] Un sentiment est toujours attaché à la personne qui l’éprouve, “A sentiment is always attached to the person that feels it” (Gross 1995, p. 1).


Figure 2.1: Syntactic trees of the sentences (1) and (2)

(2a) [Sentiment [Stimulus Il fumo] tormenta [Experiencer (Maria + Il mio gatto + *Il televisore)]]
“The smoke harasses (Maria + My cat + *The television)”

According to Gross (1979, p. 872), the problem is that:

“rewriting rules (e.g. $S \rightarrow NP\ VP$, $VP \rightarrow V\ NP$) describe only LOCAL dependencies. [...] But there are numerous syntactic situations that involve non-local constraints”


While the Lexicon-Grammar and the operator grammar representations do not change, the representations of the paraphrases of (1, 2) with support verbs (1b, 2b) further complicate the generative grammar solution, which still does not correctly represent the relationships among the operators and the arguments (see Figure 2.2). It fails to determine the correct number of arguments (which are still two, and not three) and does not recognize the predicative function, which is played by the nouns odio “hate” and tormento “harassment”, and not by the verbs provare “to feel” and dare “to give”.

(1b) [Sentiment [Experiencer Maria] prova odio per [Stimulus il fumo]]
“Maria feels hate for the smoke”

(2b) [Sentiment [Stimulus Il fumo] dà il tormento a [Experiencer Maria]]
“The smoke gives harassment to Maria”

For all these reasons, for the computational treatment of subjectivity, emotions and opinions in raw text, in this work we preferred the Lexicon-Grammar framework to other linguistic approaches, despite their popularity in the NLP literature.

2.1.3 Lexicon-Grammar Binary Matrices

We said that the difference between the LG language description and any other linguistic approach lies in the systematic formalization of a very broad quantity of data. This means that, given a set of properties P that define a class of lexical items L, the LG method collects and formalizes every single item that enters into such a class (extensional classification), while other approaches limit their analyses to reduced sets of examples (intensional classification) (Elia et al. 1981).

Because the presentation of the information must be as transparent, rigorous and formalized as possible, LG represents its classes through binary matrices like the one exemplified in Table 2.1.

Figure 2.2: Syntactic trees of the sentences (1b) and (2b)

They are called “binary” because of the mathematical symbols “+” and “–”, respectively used to indicate the acceptance or the rejection of the properties P by each lexical entry L.

The symbol X unequivocally denotes an LG class and is always associated with a class definitional property, simultaneously accepted by all the items of the class (P1 in the example of Table 2.1). Subclasses of X can be easily built by choosing another P as definitional property (e.g. L2, L3 and L4 under the definitional substructure P2).

The sequences of “+” and “–” attributed to each entry constitute the “lexico-syntactic profiles” of the lexical items (Elia 2014a).

X     P1    P2    P3    P4    Pn
L1    +     –     –     –     +
L2    +     +     –     –     –
L3    +     +     +     +     +
L4    +     +     +     +     +
Ln    +     –     –     –     +

Table 2.1: Example of a Lexicon-Grammar Binary Matrix

The properties on which the classification is arranged can regard the following aspects (Elia et al. 1981):

distributional properties: by which all the co-occurrences of the lexical items L with distributional classes and subclasses are tested inside acceptable elementary sentence structures;

structural and transformational properties: through which all the combinatorial transformational possibilities inside elementary sentence structures are explored (see Section 2.1.1 for more in-depth explanations).

It is possible for a lexical entry to appear in more than one LG matrix. In this case we are not referring to a single verb that accepts divergent syntactic and semantic properties, but instead to two different verb usages attributable to the same morpho-phonological entry (Elia et al. 1981; Vietri 2004; D’Agostino 1992). Verb usages possess the same syntactic and semantic autonomy as two verbs that do not present any morpho-phonological relation.
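
As a minimal sketch of how such a matrix can be handled computationally, the abstract example of Table 2.1 can be encoded as a mapping from entries to strings of signs. The encoding (with an ASCII "-" standing for "–") and the helper names `profile` and `subclass` are assumptions made for illustration, not the storage format of the actual LG tables.

```python
# Columns and rows of the abstract matrix in Table 2.1.
PROPERTIES = ["P1", "P2", "P3", "P4", "Pn"]
TABLE = {
    "L1": "+---+",
    "L2": "++---",
    "L3": "+++++",
    "L4": "+++++",
    "Ln": "+---+",
}


def profile(entry):
    """Lexico-syntactic profile of an entry: property -> accepted (True) or rejected (False)."""
    return {prop: sign == "+" for prop, sign in zip(PROPERTIES, TABLE[entry])}


def subclass(prop):
    """Entries that accept the given property, i.e. the subclass it defines."""
    return [entry for entry in TABLE if profile(entry)[prop]]


whole_class = subclass("P1")   # P1 is the definitional property of class X
sub_p2 = subclass("P2")        # subclass built on P2
```

Selecting on P1 returns every entry, as expected of a definitional property, while selecting on P2 isolates L2, L3 and L4, the subclass mentioned in the description of the table.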

2.1.4 Syntax-Semantics Interface

In the sixties, the heated debate about the syntax-semantics interface opposed theories supporting the autonomy of syntax from semantics to approaches sustaining the close link between these two linguistic dimensions.

Chomsky (1957) postulated the independence of syntax from semantics, due to the vagueness of semantics and to the correspondence mismatches between these two linguistic dimensions, but recognized the existence of a relation between the two that deserves to be explored:

“There is, however, little evidence that ‘intuition about meaning’ is at all useful in the actual investigation of linguistic form. [...]
It seems clear, then, that undeniable, though only imperfect correspondences hold between formal and semantic features in language. The fact that the correspondences are so inexact suggests that meaning will be relatively useless as a basis for grammatical description. [...]
The fact that correspondences between formal and semantic features exist, however, cannot be ignored. [...] Having determined the syntactic structure of the language, we can study the way in which this syntactic structure is put to use in the actual functioning of language. An investigation of the semantic function of level structure (...) might be a reasonable step towards a theory of the interconnections between syntax and semantics.” (Chomsky 1957, p. 94-102)

The contributions that come from generative grammar, with particular reference to theta-theory (Chomsky 1993), assume the possibility of a mapping between syntax and semantics:

“Every content-bearing major phrasal constituent of a sentence (S, NP, AP, PP, etc.) corresponds to a conceptual constituent of some major conceptual category¹¹.” (Jackendoff 1990, p. 44)

11 Object, Event, State, Action, Place, Path, Property, Amount.


But their autonomy is not negated through this assumption: thematic roles are still considered to be part of the level of conceptual structure and not part of syntax (Jackendoff 1990). This is confirmed again by Chomsky (1993, p. 17) when stating that “(...) the mapping of S-structures onto PF and LF are independent from one another”. Here, the S-structure, generated by the rules of the syntax, is associated by means of different interpretative rules with representations in phonetic form (PF) and in logical form (LF).
Many contributions from the linguistic community define the concept of thematic roles through the linking problem between syntax (syntactic realization of predicate argument structures that determine the roles) and semantics (semantic representation of such structures) (Dowty 1989, Grimshaw 1990, Rappaport Hovav and Levin 1998). Among the canonical thematic roles we mention the agent (Gruber 1965), the patient (Baker 1988), the experiencer (Grimshaw 1990), the theme (Jackendoff 1972), etc.

2.1.4.1 Frame Semantics

“Some words exist in order to provide access to knowledge of such frames to the participants in the communication process, and simultaneously serve to perform a categorization which takes such framing for granted.” (Fillmore 2006, p. 381)

With these words, Fillmore (2006) depicted Frame Semantics, which describes sentences on the basis of predicators able to bring to mind the semantic frames (inference structures linked through linguistic convention to the meaning of the lexical items) and the frame elements (participants and props in the frame) involved in these frames (Fillmore 1976, Fillmore and Baker 2001, Fillmore 2006).
A frame semantic description starts from the identification of the lexical items that carry a given meaning and then explores the ways in which the frame elements and their constellations are realized around the structures that have such items as head (Fillmore et al. 2002).
This theoretical framework is based on Case Grammar, a study on the combination of the deep cases chosen by verbs and on the definition of case frames, which have the function of providing “a bridge between descriptions of situations and underlying syntactic representations” by attributing “semantico-syntactic roles to particular participants in the (real or imagined) situation represented by the sentence” (Fillmore 1977, p. 61). A direct reference of Fillmore’s work is the valency theory of Tesnière (1959).
Based on these principles, the FrameNet research project produced a lexicon of English for both human use and NLP applications (Baker et al. 1998, Fillmore et al. 2002, Ruppenhofer et al. 2006). Its purpose is to provide a large number of semantically and syntactically annotated sentences, endowed with information about the valences (combinatorial possibilities) of the items, derived from an annotated corpus of contemporary English. Among the semantic domains covered there are also emotion and cognition (Baker et al. 1998).
For the Italian language, Lenci et al. (2012) developed LexIt, a tool that, following the FrameNet approach, automatically explores syntactic and semantic properties of Italian predicates in terms of distributional profiles. It performs frame semantic analyses using both the La Repubblica corpus and the Wikipedia taxonomy.

2.1.4.2 Semantic Predicates Theory

The Lexicon-Grammar framework offers the opportunity to create matches between sets or subsets of lexico-syntactic structures and their semantic interpretations. The basis of such matches is the connection between the arguments selected by a given predicative item listed in our tables and the actants involved by the same semantic predicate. According to Gross (1981, p. 9):


“Soit Sy l’ensemble des formes syntaxiques d’une langue [...]. Soit Se un ensemble d’éléments de sens, les éléments de Sy seront associés à ceux de Se par des règles d’interprétation [...] nous appellerons actants syntaxiques les sujet et complément(s) du verbe tels qu’ils sont décrits dans Sy; nous appelons arguments les variables des prédicats sémantiques. Dans certains exemples, il y a correspondance biunivoque entre actants et arguments, entre phrase simple et prédicat.”¹²

This is the basic assumption on which the Semantic Predicates theory has been built into the LG framework. It postulates a complex parallelism between Sy and Se that, differently from other approaches, cannot ignore the idiosyncrasies of the lexicon, since “il n’existe pas deux verbes ayant le même ensemble de propriétés syntaxiques” (“no two verbs have the same set of syntactic properties”; Gross 1981, p. 10). Actually, the large number of structures in which the words can enter underlines the importance of a sentiment lexical database that includes both semantic and (lexicon-dependent) syntactic classification parameters.

Therefore, the possibility of creating intuitive semantic macro-classifications within the LG framework does not contrast with the autonomy of semantics from syntax (Elia 2014b).
In this thesis we will refer to the following three predicates, all of them connected to the expression of subjectivity:

Sentiment (e.g. odiare, “to hate”), which involves the actants experiencer and stimulus;

Opinion (e.g. difendere, “to stand up for”), which implicates the actants opinion holder and target of the opinion;

Physical act (e.g. baciare, “to kiss”), which evokes the actants patient and agent.

Figure 2.3: LG syntax-semantic interface

12 “Let Sy be the set of syntactic forms of a language [...]. Let Se be a set of semantic elements; the elements of Sy will be associated with the ones of Se through interpretation rules [...] we will call syntactic actants the subject and the complement(s) of the verb, in the way in which they are described in Sy; we will call arguments the variables of semantic predicates. In some examples there is a one-to-one correspondence between actants and arguments, between simple sentence and predicate.” Author’s translation.

As an example, the predicate in sentence (1), from the LG class 43, will be associated with a Predicate with two variables, described by the mathematical function Caus(s, sent, h):

(1) [Sentiment [Experiencer Maria] odia [Stimulus il fumo]]
“Maria hates the smoke”

The rules of interpretation that have been followed are:

1. the experiencer (h in Se) corresponds to the formal subject (N0 in Sy);

2. the sentiment stimulus (s in Se) is the human complement (N1 in Sy).

As shown in the few examples below, the syntactic transformations in which the same predicate is involved preserve the role played by its arguments:

Nominalization

(1b) [Sentiment [Experiencer Maria] prova odio per [Stimulus il fumo]]
“Maria feels hate for the smoke”

Adjectivalization

(1c) [Sentiment [Stimulus il fumo] è odioso per [Experiencer Maria]]
“The smoke is hateful for Maria”

Passivization

(1d) [Sentiment [Stimulus il fumo] è odiato da [Experiencer Maria]]
“The smoke is hated by Maria”

Repositioning

(1e) [Sentiment è il [Stimulus il fumo] che [Experiencer Maria] odia]
“It is the smoke that Maria hates”
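The invariance of the roles across the transformations above can be sketched in a few lines. This is only a toy illustration: the pattern strings paraphrase examples (1)-(1e), and the names `PATTERNS`, `realize` and `roles` are hypothetical, not part of the LEG-SRL system or of any LG table.

```python
# Interpretation rules from the text: N0 -> experiencer, N1 -> stimulus.
ROLES = {"N0": "Experiencer", "N1": "Stimulus"}

# Surface patterns paraphrasing examples (1)-(1e); keys are illustrative labels.
PATTERNS = {
    "base": "{N0} odia {N1}",
    "nominalization": "{N0} prova odio per {N1}",
    "adjectivalization": "{N1} è odioso per {N0}",
    "passivization": "{N1} è odiato da {N0}",
    "repositioning": "è {N1} che {N0} odia",
}


def realize(construction, n0, n1):
    """Build the surface sentence for a given construction."""
    return PATTERNS[construction].format(N0=n0, N1=n1)


def roles(n0, n1):
    """Apply the interpretation rules; the result does not depend on the construction."""
    return {ROLES["N0"]: n0, ROLES["N1"]: n1}


for name in PATTERNS:
    print(f"{name:18} {realize(name, 'Maria', 'il fumo'):32} {roles('Maria', 'il fumo')}")
```

Whatever the word order produced by `realize`, the role assignment stays the same, which is the point made by the examples.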

As concerns the link between actants and arguments, the background of the research is the LEG-Semantic Role Labeling system of Elia (2014b), with particular reference to its psychological, evaluative and bodily predicates. The granularity of the verb properties in this SRL system is evidence of the strong dependence of syntax on the lexicon and proof of its absolute independence from semantics.
Special kinds of Semantic Predicates have already been used in NLP applications in a Lexicon-Grammar context: we mention Vietri (2014a), Elia et al. (2010) and Elia and Vietri (2010), who formalized and tested the Transfer Predicates on the Italian Civil Code; Elia et al. (2013), who focused on the Spatial Predicates; and Maisto and Pelosi (2014b), Pelosi (2015b) and Elia et al. (2015), who exploited the Psychological Semantic Predicates for Sentiment Analysis purposes.


2.2 The Italian Module of Nooj

Nooj is the NLP tool used in this work for both the language formalization and the corpora pre-processing and processing, at the orthographical, lexical, morphological, syntactic and semantic levels (Silberztein 2003).
Among the Nooj modules that have been developed for more than twenty languages by the international Nooj community, the Italian Lingware has been built by the Maurice Gross Laboratory of the University of Salerno.
The Italian module for Nooj can be integrated at any time with other ad hoc resources, in the form of electronic dictionaries and local grammars. The freely downloadable Italian resources¹³ include:

• electronic dictionaries of simple and compound words;

• an electronic dictionary of proper names;

• an electronic dictionary of toponyms;

• a set of morphological grammars;

• samples of syntactic grammars.

This knowledge- and linguistic-based environment is not in contrast with hybrid and statistical approaches, which often require annotated corpora for their testing stages.
The Nooj annotations can cover simple word forms, multiword units and even discontinuous expressions.
For a detailed description of the potential of the Italian module of Nooj, see Vietri (2014c).

13 www.nooj-association.org


2.2.1 Electronic Dictionaries

If we typed the key-phrase “electronic dictionary” into a search engine, the first results appearing in the SERP would concern CD-ROM dictionaries or machine translators.
Actually, electronic dictionaries (usable with all computerized documents for text recognition, information retrieval and machine translation) should be conceived as lexical databases intended to be used only by computer applications, rather than by a wide audience, who probably would not be able to interpret data codes formalized in a complex way.
In addition to the target, the differences between the two types of dictionary can be summarized in three points:

Completeness: human dictionaries may overlook information that human users can extract from their encyclopedic knowledge. Electronic dictionaries, instead, must necessarily be exhaustive. They cannot leave anything to chance: the computer must be left no possibility of error.

Explanation: human dictionaries can provide part of the information implicitly, because they can rely on the human user’s skills (intuition, adaptation and deduction). Electronic dictionaries, on the contrary, must state everything explicitly: the computer can only process fully explained symbols and instructions.

Coding: unlike human dictionaries, all the information provided in the electronic dictionaries, being related to the use of software for the automatic processing of data and texts, must be accurate, consistent and codified.

Joining all these features enlarges the size and magnifies the complexity of monolingual and bilingual electronic dictionaries, which are for this reason always bigger than the human ones.
The Italian electronic dictionaries system, based on Maurice Gross’s Lexicon-Grammar, has been developed at the University of Salerno by the Department of Communication Science; it is grounded on the European standard RELEX.
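The three requirements listed above (completeness, explicitness, coding) can be illustrated with a minimal sketch of a machine-readable entry and a consistency check. The field names, category codes and feature names below are assumptions made up for this example; they do not reproduce the Nooj or RELEX dictionary format.

```python
from dataclasses import dataclass, field

# Hypothetical closed inventories: only known codes are admitted.
CATEGORIES = {"V", "N", "A", "ADV"}
REQUIRED_FEATURES = {"inflection", "polarity"}


@dataclass
class Entry:
    lemma: str
    category: str
    features: dict = field(default_factory=dict)


def validate(entry):
    """List the problems that would make the entry unusable by a program."""
    problems = []
    if entry.category not in CATEGORIES:
        problems.append(f"unknown category code: {entry.category}")
    # Nothing can be left implicit: every required feature must be present.
    for name in sorted(REQUIRED_FEATURES - set(entry.features)):
        problems.append(f"missing feature: {name}")
    return problems


good = Entry("odiare", "V", {"inflection": "V3", "polarity": -3})
bad = Entry("odio", "X")
print(validate(good))
print(validate(bad))
```

A human reader would easily guess the part of speech of a noun listed without it; the check above rejects the entry, which is the difference the section describes.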

2.2.2 Local Grammars and Finite-State Automata

Local grammars are algorithms that, through grammatical, morphological and lexical instructions, are used to formalize linguistic phenomena and to parse texts. They are defined “local” because, despite any generalization, they can be used only in the description and analysis of limited linguistic phenomena.
The reliability of “finite state Markov processes” (Markov 1971) had already been excluded by Chomsky (1957, p. 23-24), who argued that

“(...) there are processes of sentence-formation that finite state grammars are intrinsically not equipped to handle. If these processes have no finite limit, we can prove the literal inapplicability of this elementary theory. If the processes have a limit, then the construction of a finite state grammar will not be literally out of the question, since it will be possible to list the sentences, and a list is essentially a trivial finite state grammar. But this grammar will be so complex that it will be of little use or interest.”

and remarked that

“If a grammar does not have recursive devices (...) it will be prohibitively complex. If it does have recursive devices of some sort, it will produce infinitely many sentences.”

Instead, Gross (1993) caught the potential of finite state machines, especially for those linguistic segments which are more limited from a combinatorial point of view.
The Nooj software, actually, is not limited to finite-state machines and regular grammars, but relies on the four types of grammars of the Chomsky hierarchy. In fact, as we will show in this research, it is capable of processing context-free and context-sensitive grammars and of performing transformations in cascade.

Figure 2.4: Deterministic Finite-State Automaton

Figure 2.5: Non-deterministic Finite-State Automaton

Figure 2.6: Finite-State Transducer

Figure 2.7: Enhanced Recursive Transition Network

Finite-State Automata (FSA) are abstract devices characterized by a finite set of nodes or “states” (Si), connected to one another by transitions (tj) that allow us to determine sequences of symbols related to a particular path (see Figure 2.4).
These graphs are read from left to right, or rather from the initial state (S0) to the final state (S3) (Gross 1993). FSA can be deterministic, such as the one in Figure 2.4, or non-deterministic, if they activate more than one path, as in Figure 2.5. A Finite-State Transducer (FST) transduces the input symbols (Sii) into the output ones (Sio), as in Figure 2.6. A Recursive Transition Network (RTN) is a grammar that contains embedded graphs (the S3 node in Figure 2.7). An Enhanced Recursive Transition Network (ERTN) is a more complex grammar that includes variables (V) and constraints (C) (Silberztein 2003; see Figure 2.7).
For simplicity, from now on we will just use the acronym FSA to indicate all the kinds of local grammars used to formalize different linguistic phenomena.
In general, the nodes of the graphs can at any time recall entries or semantic and syntactic information from the electronic dictionaries. FSA are inflectional or derivational if the symbols in the nodes are morphemes, syntactic if the symbols are words.
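To make the difference between a deterministic FSA and an FST concrete, here is a minimal sketch. It does not reproduce the graphs of Figures 2.4-2.7 or the Nooj graph format: the states S0-S3 follow the naming used in the text, while the word labels and output tags are invented for illustration.

```python
# Deterministic FSA: at most one transition per (state, symbol) pair.
TRANSITIONS = {
    ("S0", "Maria"): "S1",
    ("S1", "odia"): "S2",
    ("S1", "ama"): "S2",
    ("S2", "Gianni"): "S3",
}
INITIAL, FINAL = "S0", {"S3"}


def accepts(words):
    """Follow the transitions from the initial state; accept only in a final state."""
    state = INITIAL
    for word in words:
        state = TRANSITIONS.get((state, word))
        if state is None:
            return False
    return state in FINAL


# Finite-State Transducer: each transition also emits an output symbol.
FST = {
    ("S0", "Maria"): ("S1", "N0"),
    ("S1", "odia"): ("S2", "V"),
    ("S2", "Gianni"): ("S3", "N1"),
}


def transduce(words):
    """Return the output sequence for an accepted input, or None if it is rejected."""
    state, output = INITIAL, []
    for word in words:
        step = FST.get((state, word))
        if step is None:
            return None
        state, symbol = step
        output.append(symbol)
    return output if state in FINAL else None


print(accepts("Maria odia Gianni".split()))
print(transduce("Maria odia Gianni".split()))
```

A non-deterministic automaton would map a `(state, symbol)` pair to a set of target states instead of a single one, so that more than one path can be active at the same time.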

Chapter 3

SentIta: Lexicon-Grammar based Sentiment Resources

3.1 Lexicon-Grammar of Sentiment Expressions

The LG theoretical framework, with its huge collections and categorization of linguistic data in the form of LG tables, built and refined over the years by its researchers, offers the opportunity to systematically relate a large number of semantic and syntactic properties to the lexicon. The complex intersection between semantics and syntax, the greatest benefit of this approach, corrects the worst deficiency of other semantic strategies proposed in the Sentiment Analysis literature: the tendency to group together, only on the basis of their meaning, words that have nothing in common from the syntactic point of view (Le Pesant and Mathieu-Colas 1998).¹

1 On this subject, Elia (2013, p. 286) illustrates that “(...) the semantic intuition that drives us to put together certain predicates and their argument is not correlated to the set of syntactic properties of the verbs, nor is it «helped» by it, except in a very superficial way; indeed, we might say that the granular nature of the syntax in the lexicon would be an obstacle for a mind that is organized in an efficient and rigorous logic way”.

“Classes d’objets” is the expression introduced by Gross (1992a) to indicate the linguistic device that allows the language formalization from both a syntactic and a semantic point of view. Arguments and Predicates are classified into subsets which are semantically consistent and that also share lexical and grammatical properties (De Bueriis and Elia 2008).
According to Gross (2008, p. 1):

“Si un mot ne peut pas être défini en lui-même, c’est-à-dire hors contexte, mais seulement dans un environnement syntaxique donné, alors le lexique ne peut pas être séparé de la syntaxe, c’est-à-dire de la combinatoire des mots. La sémantique n’est pas autonome non plus: elle est le résultat des éléments lexicaux organisés d’une façon déterminée (distribution). Qu’il en soit ainsi est confirmé par les auteurs de dictionnaires eux-mêmes qui, timidement et sans aucune méthode, notent pour un prédicat donné le ou les arguments qui permettent de séparer un emploi d’un autre. Pas de sémantique sans syntaxe donc, c’est-à-dire sans contexte.”²

2 “If a word cannot be defined in itself, namely out of context, but only in a specific syntactic environment, then the lexicon cannot be separated from the syntax, that is to say from the word combinatorics. Semantics is not autonomous either: it is the result of lexical elements organized in a determined way (distribution). This is confirmed by the dictionary authors themselves, who timidly and without any method note, for a specific predicate, the argument(s) that allow one use to be separated from another. No semantics without syntax, that is, without context.” Author’s translation.

SentIta is a sentiment lexical database that directly aims to apply the Lexicon-Grammar theory, starting from its basic hypothesis: the minimum semantic units are the elementary sentences, not the words (Gross 1975).
Therefore, in this work, the lemmas collected in the dictionaries and their Semantic Orientations are systematically recalled and computed in a specific context through the use of local grammars. On the basis of their combinatorial features and co-occurrence contexts, the SentIta lexical items can take the following shapes (Buvet et al. 2005, Elia 2014a):

operators: the predicates, which can be verbs, nouns, adjectives, adverbs, multiword expressions, prepositions and conjunctions;

arguments: the predicate complements, which can be nominal and prepositional groups or entire clauses.

As already specified in Section 2.2, the sentiment words and other oriented Atomic Linguistic Units (ALU) are listed in Nooj electronic dictionaries, while the local grammars used to manipulate their polarities are formalized thanks to Finite State Automata.
Electronic dictionaries have then been exploited in order to list and to semantically and syntactically classify, in a machine-readable format, the sentiment lexical resources. The computational power of Nooj graphs has instead been used to represent the acceptance/refusal of the semantic and syntactic properties through the use of constraints and restrictions.
Chapter 2 underlined the role played in the elementary sentences by the Predicates and the one assumed by their arguments. According to Le Pesant and Mathieu-Colas (1998):

“La structure prédicat/arguments semble plus opératoire que les découpages binaires classiques (sujet/prédicat). [...] C’est le prédicat qui détermine le nombre de positions constitutives de la phrase.”³

Predicates must not be identified only with the category of verbs, but also with predicative nouns and adjectives that enter into support verb structures.
In the subgroup of sentiment expressions, starting from their syntactic structures, Gross identified two semantic types:

3 “The predicate/argument structure seems to be more operational than the classical binary subdivision (subject/predicate). [...] It is the predicate that determines the number of positions that constitute the sentence.” Author’s translation.


1. sentences with two arguments, where one is the person and the other is the feeling (e.g. Maria è angosciata, “Maria is anguished”);

2. sentences with three arguments, which differ from the first type in the presence of a stimulus that triggers the feeling in another person (e.g. Gianni angoscia Maria, “Gianni anguishes Maria”).

The notation for their representation is the same as the one used for the Semantic Predicates (Gross 1981). Type 1 can be formalized by the following formula:

1. P(sent, h), e.g. P(angoscia, Maria)

in which the variable h is a function of a sentiment Sent through a predicative relationship.
Type 2, instead, can be expressed by the following formula:

2. Caus(s, sent, h), e.g. Caus(Gianni, angoscia, Maria)

in which a sentiment Sent is caused by a stimulus s on a person h.
Both of them are outcomes of the following assumption: “un sentiment est toujours attaché à la personne qui l’éprouve”⁴ (Gross 1995, p. 1). At first glance, the problem connected to sentiment expressions seems to be easy to define and solve. Actually, upon deeper analysis, it presents challenging aspects: the quantity and the variability of sentiment expressions, which can vary both from a transformational and from a morpho-syntactic point of view. As displayed below, a sentiment expressed by a word can take the shape of verbs, nouns, adjectives or adverbs (Gross 1995, D’Agostino 2005, D’Agostino et al. 2007):

V: angosciare “to anguish”, angosciarsi “to become anguished”

N: angoscia “anguish”

A: angosciante “anguishing”, angosciato, angoscioso “anguished”

ADV: angosciatamente, angosciosamente “with anguish”

4 “A sentiment is always attached to the person that feels it”. Author’s translation.


Such groups of words, related to one another by derivational morphology under a common root (e.g. angosc-), represent, from the LG point of view, classes of paraphrastic equivalence, or paraphrastic constellations (D’Agostino 1992). These concepts are crucial when applied to the Sentiment Analysis field, because the connection among the words displayed above is related to a formal principle, which regards the sentence structure, and also to a semantic principle, which concerns the distributional properties and the semantic roles of the arguments selected by the predicates (D’Agostino 1992).
Nevertheless, from a syntactic point of view, Gross (1995) noticed that the terms of sentiment can occupy all the different nominal, adjectival, verbal or adverbial positions within the same sentence structure.
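The two formulas and the notion of paraphrastic constellation can be put together in a small sketch. The word forms are those listed in this section; the tuple types `P` and `Caus` and the helper `predicate` are hypothetical names used only for illustration, not part of SentIta.

```python
from collections import namedtuple

P = namedtuple("P", ["sent", "h"])             # type 1: P(sent, h)
Caus = namedtuple("Caus", ["s", "sent", "h"])  # type 2: Caus(s, sent, h)

# Paraphrastic constellation of the root "angosc-", as listed in the text.
CONSTELLATION = {
    "V": ["angosciare", "angosciarsi"],
    "N": ["angoscia"],
    "A": ["angosciante", "angosciato", "angoscioso"],
    "ADV": ["angosciatamente", "angosciosamente"],
}

# Every form of the constellation points to the same sentiment.
SENTIMENT_OF = {
    form: "angoscia" for forms in CONSTELLATION.values() for form in forms
}


def predicate(form, h, s=None):
    """Map any member of the constellation to the same sentiment predicate."""
    sent = SENTIMENT_OF[form]
    return P(sent, h) if s is None else Caus(s, sent, h)


print(predicate("angosciato", "Maria"))                # Maria è angosciata
print(predicate("angosciare", "Maria", s="Gianni"))    # Gianni angoscia Maria
```

Whether the sentiment surfaces as a verb, a noun, an adjective or an adverb, the sketch returns the same predicate, which differs only in the presence of a stimulus.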

3.2 Literature Survey on Subjectivity Lexicons

The most used approaches in the Sentiment Analysis field can be divided and summarized in three main lines of research: lexicon-based methods, learning and statistical methods, and hybrid methods.
The first ones coincide with the strategy that has been chosen to carry out the present work.
Lexicon-based approaches always start from the following assumption: the text sentiment orientation comes from the semantic orientations of the words and phrases contained in it.
The most commonly used SO indicators are adjectives or adjective phrases (Hatzivassiloglou and McKeown 1997, Hu and Liu 2004, Taboada et al. 2006), but recently the use of adverbs (Benamara et al. 2007), nouns (Vermeij 2005, Riloff et al. 2003) and verbs (Neviarouskaya et al. 2009a) has become common as well.
Hand-built lexicons are definitely more accurate than automatically built ones, especially in cross-domain Sentiment Analysis tasks. Nevertheless, to manually draw up a dictionary is


considered a painstakingly time-consuming activity (Taboada et al. 2011, Bloom 2011). This is why a large number of studies on automatic polarity lexicon creation and propagation can be observed in the literature. On this topic, it must be said that automatic dictionaries seem to be more unstable, but usually larger, than the manually built ones (see Tables 3.1 and 3.2). Size, anyway, does not always mean quality: it is common for these large dictionaries to have scarcely detailed information. A large number of entries could denote fewer details in description (e.g. the Maryland dictionary (Mohammad et al. 2009) does not specify the entries’ part of speech) or, instead, could mean more noise.
Among the state-of-the-art methods used to build and test those dictionaries we mention Latent Semantic Analysis (LSA) (Landauer and Dumais 1997), bootstrapping algorithms (Riloff et al. 2003), graph propagation algorithms (Velikovich et al. 2010, Kaji and Kitsuregawa 2007), conjunctions (and, or, but) and morphological relations between adjectives (Hatzivassiloglou and McKeown 1997), Context Coherency (Kanayama and Nasukawa 2006a), distributional similarity (Wiebe 2000), etc.⁵

5 More details about the lexicon propagation will be given in Section 3.5.1.

Word Similarity is a very frequently used method in dictionary propagation over thesaurus-based approaches. Examples are the Maryland dictionary, created thanks to a Roget-like thesaurus and a handful of affixes (Mohammad et al. 2009), and other lexicons based on WordNet, like SentiWordNet, built on the basis of a quantitative analysis of the glosses associated with synsets (Esuli and Sebastiani 2005, 2006a), or other lexicons based on the computing of a distance measure on WordNet (Kamps et al. 2004, Esuli and Sebastiani 2005).
Pointwise Mutual Information (PMI) using Seed Words has been applied to sentiment lexicon propagation by Turney (2002), Turney and Littman (2003), Rao and Ravichandran (2009), Velikovich et al. (2010) and Gamon and Aue (2005). Seed words are words which are strongly associated with a positive/negative meaning, such as eccellente (“excellent”) or orrendo (“horrible”), by which it is possible to build a bigger lexicon, detecting other words that frequently occur alongside them. It has been observed, indeed, that positive words often occur close to positive seed words, whereas negative words are likely to appear around negative seed words (Turney 2002, Turney and Littman 2003).
PMI could be easily calculated in the past using the Web as a corpus, thanks to AltaVista’s NEAR operator. Today one must be content with the Google AND operator.
Learning and statistical methods for Sentiment Analysis usually make use of Support Vector Machines (Pang et al. 2002, Mullen and Collier 2004, Ye et al. 2009) or Naïve Bayes classifiers (Tan et al. 2009, Kang et al. 2012).
Finally, as regards hybrid methods, we cite the work of Read and Carroll (2009), Li et al. (2010), Andreevskaia and Bergler (2008), Dang et al. (2010), Dasgupta and Ng (2009), Goldberg and Zhu (2006), Prabowo and Thelwall (2009) and Wan (2009).⁶
A brief summary of the nature and the size of the most popular lexicons for Sentiment Analysis is given in Tables 3.1 and 3.2.
A clarification needs to be made about the distinction between Polarity lexicons and Affect lexicons. The former are used in subjectivity detection or in polarity and intensity classification systems, while the latter, going beyond the positive/negative dichotomy, refer to a more fine-grained emotion classification.
In the following paragraphs we will first describe some of the most famous Polarity lexicons, and then we will briefly cite some databases for affect classification.
Another distinction will be made between the resources conceived for the English language and the ones created for other languages. The Italian language will especially be considered.
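The seed-word propagation based on PMI mentioned above can be sketched as follows, in the spirit of Turney (2002): the orientation of a word is the difference between its PMI with a positive seed and its PMI with a negative seed. All the counts are invented stand-ins for search-engine hit counts, and the smoothing constant and the candidate words are assumptions of this sketch.

```python
import math

# Invented counts standing in for search-engine hit counts of the seed words.
HITS = {"eccellente": 1000, "orrendo": 800}

# Invented co-occurrence counts (word NEAR seed).
NEAR = {
    ("squisito", "eccellente"): 120,
    ("squisito", "orrendo"): 4,
    ("disgustoso", "eccellente"): 3,
    ("disgustoso", "orrendo"): 90,
}


def so_pmi(word, pos="eccellente", neg="orrendo", smoothing=0.01):
    """Semantic orientation as PMI(word, pos) - PMI(word, neg), in bits.

    The frequency of the word itself cancels out in the difference, so only
    the co-occurrence counts and the seed frequencies are needed.
    """
    near_pos = NEAR.get((word, pos), 0) + smoothing  # smoothing avoids log(0)
    near_neg = NEAR.get((word, neg), 0) + smoothing
    return math.log2((near_pos * HITS[neg]) / (near_neg * HITS[pos]))


for candidate in ("squisito", "disgustoso"):
    print(candidate, round(so_pmi(candidate), 2))
```

A word that co-occurs mostly with the positive seed receives a positive score and can be added to the lexicon with that orientation; the opposite holds for words attracted by the negative seed.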

6 These contributions will be examined in more depth in Section 5.1.1.


Dictionary                 Author             Year   Entries        Language
SO-CAL lexicon             Taboada            2011   5000+          Eng
AFINN-111                  Hansen             2011   2400+          Eng
Sentilex                   Silva              2010   7000+          Port
Q-WordNet 3.0              Agerri et al.      2010   15500+         Eng
DAL-2009                   Whissell           2009   8700+          Eng
MPQA lexicon               Wilson et al.      2005   8000+          Eng
Hu-Liu lexicon             Hu et al.          2004   6800+          Eng
Hatzivassiloglou lexicon   Hatzivassiloglou   1997   15400+ pairs   Eng
General Inquirer           Stone              1961   9800+          Eng

Table 3.1: Manually built Polarity Lexicons

Dictionary              Author               Year   Entries   Language
Sentix                  Basile, Nissim       2013   59700+    Ita
SentiWordNet            Baccianella et al.   2010   38000+    Eng
Web-generated lexicon   Velikovich et al.    2010   178000+   Eng
Maryland dictionary     Mohammad et al.      2009   76000+    Eng

Table 3.2: (Semi-)Automatically built Polarity Lexicons

3.2.1 Polarity Lexicons

SO-CAL dictionary: Taboada et al. (2011), due to the low stability of automatically generated lexical databases, manually developed the SO-CAL dictionary. They hand-tagged, on an evaluation scale ranging from +5 to -5, the semantically oriented words they found in a variety of sources:

• the multi-domain collection of 400 reviews belonging to different categories described in Taboada and Grieve (2004) and Taboada et al. (2006);

• 100 movie reviews from the Polarity Dataset of Pang et al. (2002) and Pang and Lee (2004);

• the whole General Inquirer dictionary (Stone et al. 1966).

The result was a dictionary of 2252 adjectives, 1142 nouns, 903 verbs and 745 adverbs. The adverb list has been automatically generated by matching adverbs ending in “-ly” to their potentially corresponding adjective. Moreover, a set of multiword expressions (152 phrasal verbs, e.g. “to fall apart”, and 35 intensifier expressions, e.g. “a little bit”) has also been taken into account. In case of overlap between a simple word (e.g. “fun”, +2) and a multiword expression (e.g. “to make fun of”, -1) with different polarity, the latter has higher priority in the annotation process.
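The priority of multiword expressions over the simple words they contain can be sketched with a longest-match lookup. This is not the SO-CAL implementation: the two scored items “fun” (+2) and “make fun of” (-1) come from the example in the text, while the entry “boring”, the function names and the choice of summing the scores are assumptions of this sketch.

```python
# Toy lexicon: "fun" and "make fun of" follow the text, "boring" is invented.
SIMPLE = {"fun": 2, "boring": -2}
MULTIWORD = {("make", "fun", "of"): -1}
MAX_LEN = max(len(key) for key in MULTIWORD)


def annotate(tokens):
    """Left-to-right longest match: a multiword entry overrides its component words."""
    found, i = [], 0
    while i < len(tokens):
        for n in range(MAX_LEN, 1, -1):  # try the longest expressions first
            chunk = tuple(tokens[i:i + n])
            if chunk in MULTIWORD:
                found.append((" ".join(chunk), MULTIWORD[chunk]))
                i += n
                break
        else:
            # No multiword expression starts here: fall back to the simple word.
            if tokens[i] in SIMPLE:
                found.append((tokens[i], SIMPLE[tokens[i]]))
            i += 1
    return found


def orientation(tokens):
    """Text orientation as the sum of the orientations of the matched items."""
    return sum(score for _, score in annotate(tokens))


print(annotate("they make fun of him".split()))
print(annotate("it was fun".split()))
```

In the first sentence “fun” is never scored on its own, because the whole expression has already been consumed by the multiword match.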

AFINN: It is a list of English words that associates 1446 words with a valence between +5 (positive) and -5 (negative) (Hansen et al. 2011). AFINN-111, its newest version, includes 2477 words and phrases. The drawing up of the list started from a set of obscene words, which was then gradually extended by examining Twitter posts. This has led to a bias of 65% towards negative words compared to positive words (Nielsen 2011).

Q-WordNet: It is a lexical resource consisting of WordNet senses automatically annotated with positive and negative polarity (Agerri and García-Serrano 2010). It is grounded on the hypothesis that if a positive synset is matched in a gloss, then its synset can also be annotated as positive. Exceptions are the cases under the scope of a negation, which are classified as the opposite of the matched lemma.
What differentiates Q-WordNet from SentiWordNet is the fact that positive/negative labels are attributed to word senses at a certain level by means of a graded classification.

Multi-Perspective Question Answering Lexicon: The MPQA subjectivity lexicon (Wiebe et al., 2004; Wilson et al., 2005) comprises 8000+ subjectivity clues (words and phrases that may be used to express private states), annotated by type (strongsubj or weaksubj) and prior polarity. Subjectivity clues and contextual features have been learned from two different kinds of corpora: a small amount of data manually annotated at the expression level, from the Wall Street Journal and newsgroup data, and a large amount of data with existing document-level annotations from the Wall Street Journal. The focus is on three types of subjectivity clues:

• hapax legomena, i.e. words that appeared just once in the corpus;

• collocations, identified through fixed n-grams;

• adjective and verb features, identified using the results of a method for clustering words according to distributional similarity.

Hu-Liu lexicon: The Hu and Liu (2004) wordlist comprises around 6800 English items (2006 positive and 4783 negative), including slang, misspellings, morphological variants and social media mark-up.

Hatzivassiloglou Lexicon: Hatzivassiloglou and McKeown (1997), starting from a list of 1336 manually labeled adjectives (657 positive and 679 negative terms), demonstrated that conjunctions between adjectives can provide indirect information about their orientation. In detail, the authors extracted the conjunctions through a two-level finite-state grammar able to cover modification patterns and noun-adjective apposition, and tested their method on the 21-million-word Wall Street Journal corpus. In this way, they collected 13426 conjunctions of adjectives, further expanded to a total of 15431 conjoined adjective pairs.

Dictionary of Affect in Language: The Whissel (1989), Whissell (2009) Dictionary of Affect (DAL) lists about 4500 English words. Its more recent version (Whissell, 2009) provides 8742 words manually annotated for their Activation, Evaluation and Imagery.

General Inquirer: It is an IBM 7090 program system developed at Harvard in the spring of 1961 for content analysis research problems in the behavioral sciences (Stone et al., 1966; Stone and Hunt, 1963). It is based on the idea that, through the analysis of many variables at once, it is often possible to discover trends that can appear “latent to casual observation”.
The description of the “systematic” content analysis processes must be as “objective” as possible; the features of texts must be “manifest” and the results “quantitative”.
The lexicon attaches syntactic, semantic and pragmatic information to part-of-speech tagged words. Among others, the categories related to polarity measures are the ones pertaining to the three semantic dimensions of Osgood (1952):

Positive: 1915 words of positive outlook

Negative: 2291 words of negative outlook

Strong: 1902 words implying strength

Weak: 755 words implying weakness

Active: 2045 words implying an active orientation

Passive: 911 words indicating a passive orientation⁷

SentiWordNet: Esuli and Sebastiani (2006a) developed the SentiWordNet 1.0 lexicon, based on WordNet 2.0 (Miller, 1995), by automatically associating each WordNet synset with three scores: Obj(s) for objective terms, and Pos(s) and Neg(s) for positive and negative terms.

⁷ Table 3.1 refers to the General Inquirer size only in terms of sentiment words.


Every score ranges from 0.0 to 1.0, and their sum is 1.0 for each synset. The values are determined on the basis of the proportion of eight ternary classifiers (with similar accuracy levels but different classification behavior) that, after quantitatively analyzing the gloss associated with a synset, assign it a given label.
What differentiates SentiWordNet from other sentiment lexicons is the fact that the weighting process focuses on synonym sets rather than on simple terms. The researchers intuitively observe that different senses of the same term can have different opinion-related properties.
SentiWordNet 3.0 (Baccianella et al., 2010) improves on SentiWordNet 1.0. The main differences between them are the version of WordNet they annotate (3.0 for SentiWordNet 3.0) and the algorithm used to annotate WordNet, which in the 3.0 version, along with the semi-supervised learning step, also includes a random-walk step that refines the scores.
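As a purely illustrative sketch of the score derivation described above, the three values of a synset can be computed as the proportion of classifiers in the committee that voted for each label. The votes below are invented for the example, and the function name synset_scores is hypothetical; the actual classifiers of Esuli and Sebastiani (2006a) are not reproduced here.

```python
from collections import Counter

LABELS = ("Pos", "Neg", "Obj")


def synset_scores(votes):
    """Turn the labels assigned by a committee of ternary classifiers
    into Pos(s), Neg(s) and Obj(s) scores that sum to 1.0."""
    counts = Counter(votes)
    total = len(votes)
    return {label: counts[label] / total for label in LABELS}


# Hypothetical votes of eight classifiers on the gloss of one synset.
votes = ["Pos", "Pos", "Pos", "Obj", "Obj", "Obj", "Obj", "Neg"]
scores = synset_scores(votes)
print(scores)
```

Since every classifier assigns exactly one of the three labels, the scores necessarily sum to 1.0 for each synset.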

Velikovich Web-generated lexicon: This sentiment dictionary, built through graph propagation algorithms and lexical graphs, counts 178104 entries, which include not only simple words but also spelling variations, slang, vulgarity and multiword expressions (Velikovich et al., 2010).

Maryland dictionary: The procedure with which Mohammad et al. (2009) built the Maryland lexicon can be split into two steps: the identification of a seed set of positive and negative words, and the positive and negative annotation of the words synonymous with them in a Roget-like thesaurus.
Eleven antonym-generating affix patterns have been used to generate overtly marked words and their unmarked counterparts. In this way, such patterns generated 2692 pairs of valid English words listed in the Macquarie Thesaurus.
The main innovation introduced by Mohammad et al. (2009) is the independence from the manual annotation of any term in the dictionary, even though manual annotation can be used, when available, to improve both the correctness and the coverage of its entries.

3.2.2 Affective Lexicons

SentiSense: SentiSense (de Albornoz et al., 2012) is a semi-automatically developed affective lexicon, which consists of 5496 words and 2190 synsets.
Similarly to SentiWordNet or WordNet-Affect, it associates emotional meanings or categories with concepts (instead of terms) from the WordNet lexical database, and can be used for both polarity and intensity classification and for emotion identification.
Synsets are labeled with a set of 14 emotional categories, which are also related by an antonymy relationship (examples are disgust/like, love/hate, sadness/joy): ambiguous, anger, anticipation, calmness, despair, disgust, fear, hate, hope, joy, like, love, sadness, surprise.

SentiFul and the Affect Database: Neviarouskaya et al. (2009b, 2011) developed the SentiFul lexicon by automatically expanding the core of the sentiment lexicon through several strategies:

• synonymy relation;

• direct antonymy relation;

• hyponymy relations;

• derivation and scoring of morphologically modified words;

• compounding of sentiment-carrying base components.

In summary, 4190 new words, connected to the known ones through semantic relations, have been automatically extracted from WordNet, and 4029 lemmas have been automatically derived with the morphological method. The starting point of the work was the Affect Database of Neviarouskaya et al. (2007), which contained 364 emoticons, 337 popular acronyms and abbreviations (e.g. bl “belly laughing”), words standing for communicative functions (e.g. “wow”, “ouch”, etc.), 112 modifiers (e.g. “very”, “slightly”, etc.) and 2438 direct and indirect emotion-related entries (1627 of which were taken from WordNet-Affect).
In the Affect Database, emotion categories (anger, disgust, fear, guilt, interest, joy, sadness, shame and surprise) and intensity (from 0.0 to 1.0) were manually assigned to each entry by three independent annotators.

Appraisal Lexicon: The appraisal lexicon (Argamon et al., 2009) has been built through semi-supervised learning on a set of WordNet glosses containing adjectives and adverbs, of which only a small subset of the training data was manually labelled.
The construction of the lexicon is grounded on Appraisal Theory, a linguistic approach to the analysis of subjective language. In this framework, several sentiment-related features are assigned to relevant lexical items. The feature set, which allows more subtle distinctions than the commonly used binary classification Positive/Negative, includes the labels orientation (Positive or Negative), attitude type (Affect, Appreciation, Judgment) and force (Low, Median, High or Max).

WordNet-Affect: WordNet-Affect is another extension of WordNet for the lexical representation of affective knowledge (Strapparava et al., 2004). Thanks to the synset model, it relates a set of affective concepts to a number of affective words. In the first step of the creation of WordNet-Affect, the affective core has been built starting from almost 2000 terms directly or indirectly referring to mental states. Lexical and affective information has then been added to each item by creating specific frames for them. In the end, the senses of the terms have been mapped to their respective synsets in WordNet. The synsets have been specialized with a hierarchically organized set of emotional categories (namely emotion, mood, trait, cognitive state, physical state, hedonic signal, emotion-eliciting situation, emotional response, behaviour, attitude, sensation). Furthermore, the emotional valences have been distinguished by four additional a-labels: positive, negative, ambiguous and neutral (Strapparava et al., 2006).
WordNet-Affect contains 2874 synsets and 4787 words (Strapparava et al., 2004).

3.2.3 Subjectivity Lexicons for Other Languages

3.2.3.1 Italian Resources

The largest part of the state-of-the-art works on polarity lexicons for Sentiment Analysis purposes focuses on the English language. Thus, Italian lexical databases are mostly created by translating and adapting the English ones, SentiWordNet and WordNet-Affect among others. Basile and Nissim (2013) merged the semantic information belonging to existing lexical resources in order to obtain an annotated lexicon of senses for Italian, Sentix (Sentiment Italian Lexicon). Basically, MultiWordNet (Pianta et al., 2002), the Italian counterpart of WordNet (Miller, 1995; Fellbaum, 1998), has been used to transfer the polarity information associated with English synsets in SentiWordNet to Italian synsets, thanks to the multilingual ontology BabelNet (Navigli and Ponzetto, 2012).
Every Sentix entry is described by information concerning its part of speech, its WordNet synset ID, a positive and a negative score from SentiWordNet, a polarity score (from -1 to 1) and an intensity score (from 0 to 1).
The dictionary contains 59742 entries for 16043 synsets. Details about its composition are reported in Table 3.3.

PoS         Lemmas   Synsets
noun        52257    12228
adjective   3359     1805
verb        2775     1260
adverb      1351     750
all         59742    16043

Table 3.3: Sentix (Basile and Nissim, 2013)

Moreover, Steinberger et al. (2012) verified a triangulation hypothesis for the creation of sentiment dictionaries in many languages, namely English, Spanish, Arabic, Czech, French, German, Russian and also Italian.
The starting point is the semi-automatic generation of two pilot sentiment dictionaries for English and Spanish, which have been automatically transposed into the other languages by using the overlap of the translations (triangulation) and then manually filtered and expanded. As regards the Italian lexicon, the authors collected 918 correctly triangulated terms and 1500 correctly translated terms from the starting dictionaries.
Hernandez-Farias et al. (2014) achieved good results in the SentiPolC 2014 task by semi-automatically translating into Italian different lexicons, namely SentiWordNet, the Hu-Liu Lexicon, the AFINN Lexicon, the Whissel Dictionary, etc.
Baldoni et al. (2012) proposed an ontology-driven approach to Sentiment Analysis, in which tags and tagged resources are related to emotions. They selected representative Italian emotional words and used them to query MultiWordNet. The synsets connected to these lemmas were then processed with WordNet-Affect, in order to populate the emotion ontology only with the words belonging to synsets that represent affective information. The resulting ontology contains about 450 Italian words referring to the 87 emotional categories of OntoEmotion, an ontology conceived for the categorization of emotion-denoting words. Furthermore, thanks to the SentiWordNet database, every synset has been associated with neutral, positive or negative scores.

3.2.3.2 Sentiment Lexical Resources for Different Languages

SentiLex-PT (Silva et al., 2010, 2012) is a sentiment lexicon for Portuguese made up of 7014 lemmas, distributed as follows:

• 4779 adjectives

– e.g. castigado.PoS=Adj;TG=HUM:N0;POL:N0=-1;ANOT=JALC

• 1081 nouns

– e.g. aberração.PoS=N;TG=HUM:N0;POL:N0=-1;ANOT=MAN

• 489 verbs

– e.g. enganar.PoS=V;TG=HUM:N0:N1;POL:N0=-1;POL:N1=0;ANOT=MAN

• 666 verb-based idiomatic expressions

– e.g. engolir em seco.PoS=IDIOM;TG=HUM:N0;POL:N0=-1;ANOT=MAN

The sentiment entries are all modifiers of human nouns, compiled from different publicly available corpora and dictionaries.
The tagset used to describe the attributes of the lemmas is explained below:

• part-of-speech (ADJ, N, V and IDIOM);

• polarity (POL), which ranges between 1 (positive) and -1 (negative);

• target of polarity (TG), which corresponds to a human subject (HUM);

• polarity assignment (ANOT), which specifies whether the annotation has been performed manually (MAN) or automatically, by the Judgment Analysis Lexicon Classifier (JALC).
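The attributes listed above can be read programmatically. The following sketch is only an illustration: it assumes that an entry has the shape lemma.PoS=...;TG=...;POL:N0=...;ANOT=... (separators reconstructed by us), and the function name parse_entry and the keys of the returned dictionary are hypothetical.

```python
def parse_entry(line):
    """Split a SentiLex-PT style entry into its attributes.

    Assumed layout: lemma.PoS=<pos>;TG=<target>;POL:<arg>=<score>;ANOT=<method>
    """
    lemma, _, attributes = line.partition(".PoS=")
    fields = attributes.split(";")
    entry = {"lemma": lemma, "pos": fields[0], "polarity": {}}
    for field in fields[1:]:
        key, _, value = field.partition("=")
        if key.startswith("POL:"):
            # One polarity per argument position (N0, N1, ...).
            entry["polarity"][key[4:]] = int(value)
        elif key == "TG":
            entry["target"] = value
        elif key == "ANOT":
            entry["annotation"] = value
    return entry


entry = parse_entry("enganar.PoS=V;TG=HUM:N0:N1;POL:N0=-1;POL:N1=0;ANOT=MAN")
print(entry)
```

Keeping one polarity value per argument position preserves the distinction, visible in the verb example, between the polarity projected on the subject (N0) and on the object (N1).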


Differently from Silva et al. (2010, 2012), who focused on the domain-specific features of the opinion lexicon of social judgment, Souza et al. (2011) proposed, for the Portuguese language, a domain-independent lexicon composed of 7077 polar words (adjectives, verbs and nouns) and expressions.
As regards the dictionary population, the authors applied three different methods:

• a corpus-based approach, applied to 1317 documents annotated using Pointwise Mutual Information;

• a thesaurus-based approach, applied to 44077 words and their semantic relations from the TEP thesaurus (Maziero et al., 2008);

• a translation-based approach, starting from Liu’s English Opinion Lexicon (Hu and Liu, 2004).

Other contributions that deserve to be cited are Perez-Rosas et al. (2012) for Spanish; Mathieu (2006, 1999a, 1995) for French; Clematide and Klenner (2010), Waltinger (2010) and Remus et al. (2010) for German; Abdul-Mageed et al. (2011) and Abdul-Mageed and Diab (2012) for Arabic; and Chetviorkin and Loukachevitch (2012) for Russian.
As regards Multilingual Sentiment Analysis, we can mention Balahur and Turchi (2012), Steinberger et al. (2012), Pak and Paroubek (2011) and Denecke (2008), among others.

3.3 SentIta and its Manually-built Resources

In this Section we will describe in depth SentIta, the LG-based sentiment lexicon for the Italian language, which has been semi-automatically built on the basis of the richness of both the Italian module of Nooj and the Italian Lexicon-Grammar resources.
The tagset used for the Prior Polarity (Osgood, 1952) annotation of the resources is composed of four tags:

POS: positive

NEG: negative

FORTE: intense

DEB: weak

Lexical Item   Translation   Tag          Description          Score
meraviglioso   wonderful     +POS+FORTE   Intensely Positive   +3
colto          cultured      +POS         Positive             +2
rasserenante   calming       +POS+DEB     Weakly Positive      +1
nuovo          new           -            Neutral              0
insapore       flavourless   +NEG+DEB     Weakly Negative      -1
cafone         boorish       +NEG         Negative             -2
disastroso     disastrous    +NEG+FORTE   Intensely Negative   -3

Table 3.4: SentIta Evaluation tagset

Lexical Item   Translation   Tag      Description   Score
grande         big           +FORTE   Intense       +
velato         veiled        +DEB     Weak          -

Table 3.5: SentIta Strength tagset

Such labels, if combined, generate an evaluation scale that goes from -3 to +3 (see Table 3.4) and a strength scale that ranges from -1 to +1 (see Table 3.5).
Neutral words (e.g. nuovo “new”, with score 0 on the evaluation scale) have been excluded from the lexicon.
The main difference between the words listed in the two scales is the possibility of using them as indicators for the subjectivity detection task. Basically, the words belonging to the evaluation scale are “anchors” that trigger the identification of polarized phrases or sentences, while the ones belonging to the strength scale are used only as intensity modifiers (see Section 4.2).
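The way the four tags combine into the two scales can be summarized with a small sketch. It only mirrors the correspondences of Tables 3.4 and 3.5 and is not part of the SentIta or Nooj resources; the function names are hypothetical.

```python
# Base scores of the two polarity tags (Table 3.4).
BASE_SCORE = {"POS": 2, "NEG": -2}


def parse_tags(tag_string):
    """Split a tag string such as '+POS+FORTE' into ['POS', 'FORTE']."""
    return [tag for tag in tag_string.split("+") if tag]


def evaluation_score(tags):
    """Score on the evaluation scale (-3..+3), or None if the item
    carries no polarity tag and is therefore only a strength modifier."""
    tags = set(tags)
    polarity = tags & set(BASE_SCORE)
    if len(polarity) != 1:
        return None
    score = BASE_SCORE[polarity.pop()]
    sign = 1 if score > 0 else -1
    if "FORTE" in tags:
        score += sign  # one point further from zero
    elif "DEB" in tags:
        score -= sign  # one point closer to zero
    return score


def strength_score(tags):
    """Score on the strength scale (+1 intensifier, -1 downtoner),
    defined only for items without a polarity tag."""
    tags = set(tags)
    if tags & set(BASE_SCORE):
        return None
    if "FORTE" in tags:
        return 1
    if "DEB" in tags:
        return -1
    return None


for tag_string in ("+POS+FORTE", "+POS", "+NEG+DEB", "+FORTE"):
    tags = parse_tags(tag_string)
    print(tag_string, evaluation_score(tags), strength_score(tags))
```

Items that return None on the evaluation scale are exactly the ones that cannot act as anchors for subjectivity detection.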

Details about the lexical assets available for the Italian language are summarized in Table 3.6. Here we report all the items contained in SentIta, including the ones automatically derived (namely adverbs and nouns), which will be described in detail in Section 3.5.

Category       Entries   Example         Translation
Adjectives     5381      allegro         cheerful
Adverbs        3693      tristemente     sadly
Compound Adv   774       a gonfie vele   full steam ahead
Idioms         577       essere in difetto   to be in fault
Nouns          3215      eccellenza      excellence
Psych Verbs    604       amare           to love
LG Verbs       651       prendersela     to feel offended
Bad words      182       leccaculo       arse licker
Tot            15077     -               -

Table 3.6: Composition of SentIta

Because hand-built lexicons are more accurate than automatically built ones (Taboada et al., 2011; Bloom, 2011), we started the creation of the SentIta database with the manual tagging of part of the lemmas contained in the Nooj Italian dictionaries. In detail, adjectives and bad words have been manually extracted and evaluated starting from the Nooj Italian electronic dictionary of simple words, preserving their inflectional (FLX) and derivational (DRV) properties. Moreover, compound adverbs (Elia, 1990), idioms (Vietri, 1990, 2011), psych verbs and other LG verbs (Elia et al., 1981; Elia, 1984) have been weighted starting from the Italian LG tables, in order to maintain the syntactic, semantic and transformational properties connected to each of them.

3.3.1 Semantic Orientations and Prior Polarities

In the Sentiment Analysis field, the Semantic Orientation (SO) is a measure of subjectivity and opinion that weights the polarity (positive/negative) and the strength (intense/weak) of an opinion. Other concepts used to refer to SO are sentiment orientation, polarity of opinion or opinion orientation (Taboada et al., 2011; Liu, 2010).
The Prior Polarity, instead, refers to the Semantic Orientation of individual words and differs from the SO because it is always independent of the context (Osgood, 1952). The latter concept, generally used in sentiment dictionary population, has also been exploited to manually evaluate the words contained in the simple word dictionary of the Italian module of Nooj (Sdic_it.dic).
Before describing the annotation process, it must be clarified that, at this stage of the work, the multiple meanings that a single word can possess have not been taken into account. In this sense, SentIta differs from the resources in which Prior Polarities are assigned to “concepts” rather than to mere words (e.g. SentiWordNet). Such a procedure gives rise to the need to reconstruct the Prior Polarities starting from the various word meanings, actually transforming them into Posterior Polarities (Gatti and Guerini, 2012). An example is the word “cold”, which possesses different meanings and, consequently, different polarities when associated with the words “beer” (with a low temperature, neutral) or “person” (emotionless, weakly negative) (Gatti and Guerini, 2012).
From our point of view, this distinction is redundant in a work that includes a syntactic module. We strongly support the idea that only the Prior Polarities must be contained in the lexicon, and that the context should be used to compute them in a parallel module, that is, the syntactic one.
Indeed, our purpose is to reduce, at the same time, the presence of ambiguity, the computational time and the size of the electronic dictionaries. Therefore, the annotators of SentIta simply labeled the words of Sdic_it according to the meaning they considered dominant for each word. Since, out of any context, the word “cold” cannot be unequivocally associated with a positive or negative orientation, we preferred to attribute it a neutral polarity.


3.3.2 Adjectives of Sentiment

The annotation of Sdic_it started with 34000+ adjectives, considering each word apart from any context. The first decision is binary and regards the choice between positive and negative polarity, so the first tags, +POS or +NEG (which correspond to the scores +2 and -2, respectively), are attached to the adjective under examination, as shown in the following examples:

colto,A+FLX=N88+DRV=ISSIMO:N88+POS

cafone,A+FLX=N88+DRV=ISSIMO:N88+NEG

A fine-grained classification approach must also include, during the Prior Polarity attribution, the determination of the intensity of the oriented words. The strength of the SO is therefore defined through the combined use of the POS/NEG tags and the FORTE/DEB labels. The last two respectively increase or decrease by one point the intensity of the sentiment items. In this way, as shown in Table 3.4, we could take into account three different degrees of positivity and as many levels of negativity.
As we will explain in more detail in Section 4.2, we selected among the adjectives of Sdic_it a list of about 650 words that did not receive any polarity tag, but that were labeled only with information about their intensity. This is the case of the items reported below, which are just examples from the SentIta Strength list:

grande,A+FLX=N79+DRV=GRANDE:N88+DRV=GRANDE-C:N79+FORTE

velato,A+FLX=N88+DRV=ISSIMO:N88+DEB

Such words are not used to locate subjectivity in texts, but they become really advantageous when the task is to compute the actual orientation of a phrase. The words endowed with the tag +FORTE are called intensifiers and the ones marked with +DEB downtoners (Taboada et al., 2011). Their function is, respectively, to increase or decrease the score of the polarized words.
Details about the distribution of sentiment tags in the adjective dictionary are reported in Table 3.7.
Table 3.8 presents a summary, in terms of percentage values, of the composition of the adjective dictionary in SentIta. In the upper part of the Table, percentages are computed on the basis of SentIta, which counts 5381 adjectives (indicated in the Table by tot oriented adj). The differences between the positive and negative adjectives and the intensity modifiers are shown here.
The coverage of the SentIta adjectives with respect to the whole number of adjectives contained in Sdic_it at first glance seems to be very low, but it becomes more significant when compared with the other parts of speech.
For an LG treatment of adjectives, see Section 3.3.5.

Score   Adjectives    Entries
+3      +POS+FORTE    111
+2      +POS          1137
+1      +POS+DEB      110
-1      +NEG+DEB      770
-2      +NEG          2375
-3      +NEG+FORTE    240
+       +FORTE        512
-       +DEB          126

Table 3.7: SentIta tag distribution in the adjective list

Adjectives         Entries   Percentage (%)
tot pos            1358      25 (SentIta)
tot neg            3385      63 (SentIta)
tot int            638       12 (SentIta)
tot oriented adj   5381      16 (Sdic_it)
tot neutral adj    28664     84 (Sdic_it)
tot adj Sdic_it    34045     100

Table 3.8: Adjectives percentage values in SentIta
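The percentage values of Table 3.8 can be recomputed from the tag distribution of Table 3.7. The short sketch below is only a worked check of this arithmetic; the variable names are ours, and rounding to the nearest integer is assumed.

```python
# Tag distribution of the adjective list (Table 3.7).
tag_counts = {
    "+POS+FORTE": 111, "+POS": 1137, "+POS+DEB": 110,
    "+NEG+DEB": 770, "+NEG": 2375, "+NEG+FORTE": 240,
    "+FORTE": 512, "+DEB": 126,
}
TOT_ADJ_SDIC_IT = 34045  # adjectives in Sdic_it (Table 3.8)

positive = sum(n for tag, n in tag_counts.items() if "POS" in tag)
negative = sum(n for tag, n in tag_counts.items() if "NEG" in tag)
modifiers = tag_counts["+FORTE"] + tag_counts["+DEB"]
oriented = positive + negative + modifiers
neutral = TOT_ADJ_SDIC_IT - oriented


def pct(part, whole):
    """Percentage rounded to the nearest integer."""
    return round(100 * part / whole)


print("tot pos", positive, pct(positive, oriented))
print("tot neg", negative, pct(negative, oriented))
print("tot int", modifiers, pct(modifiers, oriented))
print("tot oriented adj", oriented, pct(oriented, TOT_ADJ_SDIC_IT))
print("tot neutral adj", neutral, pct(neutral, TOT_ADJ_SDIC_IT))
```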


3.3.2.1 A Special Remark on Negative Words in Dictionaries and Positive Biases in Corpora

The most important aspect to notice in Table 3.8 is the fact that the positive items cover just 25% of the total amount of oriented adjectives, while negative adjectives, with a percentage of 63%, dominate the dictionary.
Just as in SentIta, other sentiment lexicons also present an excess of negative items, such as the SO-CAL lexicon (Taboada et al., 2011), AFINN-111 (Nielsen, 2011) and the Maryland dictionary (Mohammad et al., 2009).
In texts, instead, exactly the opposite trend can be observed. The Pollyanna hypothesis asserts the universal human tendency to use positive words more frequently than negative ones, through the popular claim “people tend to look on (and talk about) the bright side of life” (Boucher and Osgood, 1969, p. 1).
The experimental results of Montefinese et al. (2014) and Warriner et al. (2013) confirmed the so-called linguistic positivity bias, i.e. the prevalence of positive words in different languages.
Garcia et al. (2012) also measured this bias by quantifying, in the context of online written communication, both the emotional content of words in terms of valence and the frequency of word usage in the whole indexable web. They provided strong evidence that words with a positive emotional connotation are used more often.
As regards the experiments in the Sentiment Analysis field that tried to solve the positive bias, we mention Taboada et al. (2011), who, upon noticing the presence of almost twice as many positive as negative words in their corpus, addressed the problem by increasing the SO of any negative expression by a fixed amount (of 50%). Voll and Taboada (2007) overcame the same problem by shifting the numerical cut-off point between positive and negative reviews.
In this research, we chose to avoid the introduction of unpredictable errors, due to the difficulty of accurately evaluating to what extent positive and negative items are differently distributed in written texts.
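For illustration only, the fixed-amount correction attributed above to Taboada et al. (2011) can be sketched as follows. This is not a strategy adopted in SentIta, and it is not the SO-CAL code: the function name text_so is hypothetical, and aggregating the expression scores by averaging is our assumption.

```python
NEGATIVE_WEIGHT = 1.5  # negative SO values increased by 50%


def text_so(expression_scores, negative_weight=NEGATIVE_WEIGHT):
    """Average SO of a text after re-weighting its negative expressions."""
    adjusted = [
        score * negative_weight if score < 0 else score
        for score in expression_scores
    ]
    return sum(adjusted) / len(adjusted) if adjusted else 0.0


# One positive and one equally strong negative expression.
print(text_so([2, -2]))                       # negative side prevails
print(text_so([2, -2], negative_weight=1.0))  # no correction: balanced
```

The example makes visible the risk mentioned above: the outcome depends entirely on a weight that must be estimated from the assumed distribution of positive and negative items in texts.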

3.3.3 Psychological and other Opinion Bearing Verbs

In the present Section we will go through the more complex annotation of verbs of sentiment from the available Lexicon-Grammar matrices. We will return to the adjectives of sentiment in Section 3.3.5, where the issue of the adjectivalization of psychological predicates will be addressed.

3.3.3.1 A Brief Literature Survey on the Classification of Psych Verbs

The difference between the direct (Experiencer subject) and reverse (Experiencer object) representations of thematic roles is underlined in the extensive body of scientific literature on psych verbs, e.g. Chomsky (1965), Belletti and Rizzi (1988), Kim and Larson (1989), Jackendoff (1990), Whitley (1995), Pesetsky (1996), Filip (1996), Baker (1997), Martín (1998), Arad (1998), Klein and Kutscher (2002), Bennis (2004), Landau (2010).
The thematic roles involved are the following:

Experiencer: the entity or the person that experiences the reaction;

Cause: “the thing, person, state of affairs that causes the reaction” (Whitley, 1995) “or cognitive judgment in the experiencer” (Klein and Kutscher, 2002).

These two main classes⁸ correspond to three classes proposed by Belletti and Rizzi (1988), in which the reverse representation is split according to the presence of a direct or indirect object mapped to the role of experiencer:

⁸ These classes of psych verbs are also known as the Fear class (direct) and the Frighten class (reverse) (Jackendoff, 1990; Pesetsky, 1996; Baker, 1997), on the basis of the syntactic positions occupied by the thematic roles of the experiencer and the cause (theme).


Class I: verbs which have the experiencer as subject and the cause as direct object (e.g. temere “to be afraid of”);

Class II: verbs which have the cause as subject and the experiencer as direct object (e.g. preoccupare “to worry”);

Class III: verbs which have the cause as subject and the experiencer as indirect object in the dative case (e.g. piacere “to like”⁹).

Transitivity is also taken into account by Whitley (1995), who instead identified four classes of psych verbs for Spanish:

Class I: transitive-direct (e.g. amar “to love”);

Class II: intransitive-direct (e.g. gozar de “to enjoy”);

Class III: intransitive-reverse (e.g. gustar “to like”);

Class IV: transitive-reverse (e.g. tranquilizar “to appease”).

3.3.3.2 Semantic Predicates from the LG Framework

Shifting the focus to the works carried out within the Lexicon-Grammar framework, we find again the distinction between two types of possible constructions, on the basis of the syntactic position (subject or complement) of the person who feels the sentiment (Mathieu, 1999a; Ruwet, 1994), and, of course, the distinction between transitive and intransitive verbs.
The (reverse) French verbs of sentiment that enter into the transitive structure N0 V N1 have been examined by Mathieu (1995)¹⁰. In this structure, N0, the stimulus of the sentiment, can be a human, concrete or abstract noun, or a completive or infinitive clause; N1, the experiencer of the sentiment, is a human noun or another animate noun, such as animals or gods (Ruwet, 1994; see example 3).

⁹ Note that, while the Italian verb piacere is classified in this class of Belletti and Rizzi (1988), its English translation “to like” belongs to the first one.

¹⁰ The analysis of Mathieu (1995) is based on the syntactic model of the LG French Table 4.

(3) (Luc + Ce tableau + Le mensonge + Que Luc soit venu + Trembler) irrite Marie (Mathieu, 1995)
“(Luc + This table + The lie + That Luc has come + To tremble) irritates Marie”

Nevertheless, Ruwet (1994) argued that a number of predicates from this class¹¹ select an Num in position N1 that does not play the role of Experiencer, as happens in example (5), differently from what happens in (4). According to Ruwet (1994), le ministre of (5) does not feel any sentiment; he is instead the object of a negative (public) judgment caused by the stimulus expressed by N0.

(4) La lecture d’Homère passionne Maxime (Ruwet, 1994)
“Reading Homer moves Maxime”

(5) Ses déclarations intempestives discréditent le ministre (Ruwet, 1994)
“His untimely declarations discredit the Minister”

One of the main purposes of this thesis is to provide a Lexicon-Grammar based database for Sentiment Analysis tools. We decided to start from the LG works on psychological predicates because of their richness, but we do not regard the Ruwet (1994) objection as a limit for our data collection. It only offers a wider range of information, which we associate with our data. Sentiment Analysis does not focus only on mental states, but searches for every kind of positive or negative statement expressed in raw texts; therefore, a positive or negative (public) judgment also represents a valuable resource in this field.
The hypothesis is that the Predicates under examination can be divided into three subsets: the ones indicating a mental state, which select the roles experiencer and causer (amare “to love”, odiare “to hate”); the ones connected to the manifestation of judgments, which (following the Sentiment Analysis trend in terminology) involve the roles of opinion holder and target (e.g. difendere “to defend”, condannare “to condemn”); and the ones related to physical acts, which select the roles patient and agent (e.g. baciare “to kiss”, schiaffeggiare “to slap”)¹². The mapping between the syntactic representations of actants and the semantic representations of arguments is feasible and valuable but, from an LG point of view, it becomes somewhat more complex than the generalizations described in the previous Paragraph. The main problem raised by the LG method is that the Syntax-Semantics mapping cannot ignore the idiosyncrasies of the lexicon.
In the following paragraphs we will show the distribution of Psychological Predicates among the classes 41, 42, 43 and 43B (3.3.3.3), and we will demonstrate that a large number of predicates evoking judgment, physical positive/negative acts, but also psychological states, are distributed among 28 other LG classes, each of which possesses its own definitional syntactic structure. The large number of structures in which these verbs can enter underlines the importance of a sentiment database that includes both lexicon-based semantic and syntactic classification parameters.

¹¹ Among the verbs pointed out by Ruwet (1994) there are abuser, aigrir, anoblir, aplatir, avilir, berner, civiliser, compromettre, consacrer, corrompre, couler, dédouaner, démasquer, dénaturer, etc.

3.3.3.3 Psychological Predicates

Among the verbs chosen for our sentiment lexicon a prominent posi-tion is occupied by the Predicates belonging to the Italian LG classes41 42 43 and 43B that constitute 47 of the total amount of verbsin SentIta (Table 311) From the lexical items contained in such LGclasses a list of 602 entries has been evaluated and hand-tagged withthe same labels used to evaluate the adjectives

12 An experiment carried out on these frames is reported in Section 5.3.


In the following Paragraphs we will describe some of the peculiarities of the psych verbs from the classes 41, 42, 43 and 43B, questioning in many cases their psychological nature. We will then briefly examine the presence of subjective verbs in 28 other LG classes.

Psych Verb   Translation     LG Class   Score
angosciare   “to anguish”    41         -3
piacere      “to like”       42         +2
amare        “to love”       43         +3
biasimare    “to blame”      43B        -2

Table 3.9: Examples from the Psych Predicates dictionary

LG Class   Tot Class   Psych Verbs   Positive items   Negative items   Intensifiers   Downtoners   Percentage (%)
41         599         525           136              325              45             19           88
42         114         8             4                4                0              0            7
43         298         39            19               20               0              0            13
43B        142         32            11               21               0              0            23
Tot        1153        604           170              368              45             19           52

Table 3.10: Psych Predicates chosen for SentIta

LG Class 41: Ch F V N1

The verbs from this LG class have N0 V N1 as their definitional structure, in which

• N0 is generally an infinitive or completive clause (direct or introduced by the phrase il fatto (Ch F + di) “the fact (that S + of)”) but, with few exceptions, it can also be a human, a concrete or an abstract noun;


SentIta Verbs   Tot Items   Positive items   Negative items   Intensifiers   Downtoners
Psych Verbs     604         170              368              45             19
Other Verbs     651         189              350              79             33
Tot             1255        359              718              124            52

Table 3.11: All the Semantic Predicates from SentIta

• N1 is a human noun (Num).

A great part of these verbs (436 entries), with homogeneous syntactic behavior, can also be grouped together because of a semantic homogeneity connected to their psychological nature (Elia 1984). Examples of this subset are reported below in their dictionary form:

addolorare,V+NEG+Sent+FLX=V3+DRV=RI+41
ingelosire,V+NEG+DEB+Sent+FLX=V201+41
perturbare,V+NEG+Sent+FLX=V3+41
rallegrare,V+POS+Sent+FLX=V3+DRV=RI+41
rincuorare,V+POS+DEB+Sent+FLX=V3+41
tormentare,V+NEG+FORTE+Sent+FLX=V3+DRV=RI+41 ¹³

This Psychological subgroup, identified in the electronic dictionary through the tag +Sent, exactly like the French verbs from the LG class 4, enters into a reverse transitive psych-verb structure by selecting an Experiencer as object N1 and a Stimulus as N0, as exemplified in (6).

13 “To grieve, to make jealous, to upset, to cheer up, to reassure, to torment”.


(6) [Sentiment [Stimulus Le sue parole] (addolorano[-2] + ingelosiscono[-1] + perturbano[-2] + rallegrano[+2] + rincuorano[+1] + tormentano[-3]) [Experiencer Maria]]
“His words (grieve + make jealous + upset + cheer up + reassure + torment) Maria”
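The dictionary entries above can be read mechanically. The following is a minimal Python sketch, not part of the SentIta tooling, assuming the usual NooJ layout in which a comma separates the lemma from a `+`-delimited feature list. The names `Entry`, `parse_entry` and `prior_score` are hypothetical, and the tag-to-score mapping (POS/NEG = ±2, FORTE = ±3, DEB = ±1) is only inferred from the scores shown in example (6).

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    lemma: str
    category: str
    flags: set = field(default_factory=set)    # bare tags, e.g. {"NEG", "Sent", "41"}
    attrs: dict = field(default_factory=dict)  # key=value tags, e.g. {"FLX": "V3"}


def parse_entry(line: str) -> Entry:
    """Split 'lemma,CAT+tag+key=value' into lemma, category, flags and attributes."""
    lemma, _, info = line.strip().partition(",")
    category, *features = info.split("+")
    entry = Entry(lemma, category)
    for feat in features:
        if "=" in feat:
            key, value = feat.split("=", 1)
            entry.attrs[key] = value
        else:
            entry.flags.add(feat)
    return entry


def prior_score(entry: Entry) -> int:
    """Assumed mapping inferred from example (6): POS/NEG = +/-2, FORTE = +/-3, DEB = +/-1."""
    if "POS" in entry.flags:
        sign = 1
    elif "NEG" in entry.flags:
        sign = -1
    else:
        return 0  # no orientation tag: treated here as unscored
    magnitude = 3 if "FORTE" in entry.flags else 1 if "DEB" in entry.flags else 2
    return sign * magnitude


for raw in ["tormentare,V+NEG+FORTE+Sent+FLX=V3+DRV=RI+41",
            "ingelosire,V+NEG+DEB+Sent+FLX=V201+41",
            "rallegrare,V+POS+Sent+FLX=V3+DRV=RI+41"]:
    e = parse_entry(raw)
    print(e.lemma, sorted(e.flags), prior_score(e))
```

Keeping bare tags and `key=value` tags apart makes it easy to filter by semantic type (`Sent`, `Op`) without touching the inflectional information.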

The objection of Ruwet (1994) to the psychological semantic homogeneity of this class can also be applied to the Italian LG class 41. The extract of the dictionary reported below illustrates the point:

danneggiare,V+NEG+Op+FLX=V4+41
esautorare,V+NEG+Op+FLX=V3+41
glorificare,V+POS+Op+FORTE+FLX=V8+41
ledere,V+NEG+Op+FLX=V34+41
miticizzare,V+POS+Op+FLX=V3+41
smascherare,V+NEG+Op+DEB+FLX=V3+41 ¹⁴

122 items from the class 41 have been selected and marked with the tag +Op to indicate, in accordance with Ruwet (1994), that they select an N1 that does not play the role of the Experiencer but just represents the Target of a positive or negative judgment. Differently from other verbs also tagged with the +Op label (e.g. condannare “to condemn”), here N0 is not the Opinion Holder but just the Cause that justifies the opinion conveyed by the verb (see example 7). The Opinion Holder is never expressed by the arguments of the predicates from the class 41, because this role can be played, in turn, by public opinion or by people in general (Ruwet 1994).

14 “To damage, to reduce or deprive of authority, to glorify, to wrong, to make into heroes, to unmask”.


(7) [Opinion [Cause Le sue parole] (danneggiano[-2] + esautorano[-2] + glorificano[+3] + ledono[-2] + miticizzano[+2] + smascherano[-2]) [Target Maria]]
“His words (damage + reduce or deprive of authority + glorify + wrong + make into heroes + unmask) Maria”

Another subgroup of the verbs from this class, marked in the LG table as accepting the property V= uso concreto (“V= concrete use”), also possesses a concrete meaning. This is the case of abbattere “to demolish” (fig. “to discourage”), accendere “to light” (fig. “to fire up”), colpire “to hit” (fig. “to move”), etc.
Of course, the semantic properties of the figurative verbs are not shared with their concrete homographs.
The automatic disambiguation of these metaphors is sometimes very simple, because it can be based on the divergent properties of the different LG classes to which the concrete and figurative words belong (8, 9).

Concrete usage (Class 20NR)

(8) Carla accende il fiammifero (Elia 1984)
“Carla lights the match”

Figurative usage (Class 41)

(9) Parlare di rivoluzione accende Arturo (Elia 1984)
“Talking about revolution fires up Arturo”

As exemplified, the verb accendere also belongs to the class 20NR, in which it accepts neither a completive as N0 (Parlare di rivoluzione accende il fiammifero “Talking about revolution lights the match”), nor a human noun as N1 (Carla accende il fratello “Carla lights her brother”) or a body part noun (Carla accende la gamba del fratello “Carla lights her brother’s leg”).
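The class-based disambiguation just described can be illustrated with a small Python sketch. It is an assumption-laden toy, not the NooJ grammar used in the thesis: the table `USAGES`, the function `compatible_usages` and the argument type labels (`human`, `clause`, `concrete`, `abstract`, `body_part`) are hypothetical, and the argument types are supposed to be supplied by an upstream analysis. The constraints encoded for accendere are the ones stated above for the classes 20NR and 41.

```python
# Hypothetical constraint table for one verb; labels and layout are assumptions.
USAGES = {
    "accendere": [
        # Class 20NR: no completive N0, no human or body-part N1.
        {"lg_class": "20NR", "reading": "concrete",
         "n0": {"human"}, "n1": {"concrete"}},
        # Class 41: N0 is typically a clause (or a human/concrete/abstract noun), N1 is human.
        {"lg_class": "41", "reading": "figurative",
         "n0": {"clause", "human", "concrete", "abstract"}, "n1": {"human"}},
    ],
}


def compatible_usages(verb, n0_type, n1_type):
    """Return the (LG class, reading) pairs whose constraints accept both arguments."""
    return [(u["lg_class"], u["reading"])
            for u in USAGES.get(verb, [])
            if n0_type in u["n0"] and n1_type in u["n1"]]


# (8) Carla accende il fiammifero
print(compatible_usages("accendere", "human", "concrete"))
# (9) Parlare di rivoluzione accende Arturo
print(compatible_usages("accendere", "clause", "human"))
# Parlare di rivoluzione accende il fiammifero: no class accepts this combination
print(compatible_usages("accendere", "clause", "concrete"))
```

When exactly one usage survives, the reading is decided by the class constraints alone; an empty list corresponds to the combinations rejected in the text.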


In other cases it proves more difficult, but it can be solved as well with the help of semantically annotated dictionaries. This happens in the examples below, in which the fact that this verb is marked in the class 20NR as accepting the property Prep N2strum makes it possible to automatically distinguish its concrete use (10) from the figurative one (11).

Concrete usage

(10) Carla accende il fuoco della (stufa + griglia + cucina)
“Carla lights the fire of the (stove + grill + cooker)”

Figurative usage

(11) Carla accende il fuoco del(la) (rivoluzione + cambiamento + passione)
“Carla ignites the fire of (revolution + change + passion)”

LG Class 42: Ch F V Prep N1

The verbs from the class 42 enter into an intransitive reverse psych-verb structure and are defined by the LG formula N0 V Prep N1, where

• N0 is generally an infinitive or completive clause (direct or introduced by the phrase il fatto (Ch F + di) “the fact (that S + of)”), but in few cases it can also be a human (Num) or a non-restricted noun (Nnr);

• N1 is a human noun (Num), but can also be a body part or an abstract noun of the kind discorso “discourse”, mente “mind”, memoria “memory” (Elia 1984).

Here we find a selection of Psychological Predicates so small that it can be reported in full:


aggradare,V+POS+Sent+42+FLX=V403+Prep=a
arridere,V+POS+Sent+42+FLX=V34+Prep=a
dispiacere,V+NEG+Sent+42+FLX=V37+Prep=a
spiacere,V+NEG+Sent+42+FLX=V37+Prep=a
piacere,V+POS+Sent+42+FLX=V37+DRV=RI+Prep=a ¹⁵

The importance of this LG class in SentIta does not derive from the (very small) number of entries it accounts for, but from the extremely high frequency of the verb piacere “to like”.
Here the prepositions (Prep) have been associated directly in the dictionary, in order to avoid ambiguity with other verb usages.
The semantic annotation does not raise any problem, as can be deduced from example (12).

(12) [Sentiment [Stimulus (Il fatto che si fumi + Che si fumi + Fumare + Il fumare + Il fumo)] (aggrada[+2] + dispiace[-2] + piace[+2]) a [Experiencer Maria]]
“(The fact that people smoke + That people smoke + Smoking + The smoke) (pleases + regrets + is appreciated by) Maria”

The main difference with respect to the class 41, besides its intransitivity, is the fact that in Italian the permuted form is more easily accepted (Elia et al. 1981; see example 13).

(13) [Sentiment A [Experiencer Maria] piace[+2] [Stimulus fumare]]
“Maria likes smoking”

Again, a subset of verbs from this class can be marked as opinion indicators:

15 “To please, to favor, to regret, to like”.


Both N0 and N1 can be human nouns:
incombere,V+NEG+Op+42+FLX=V21+Prep=loc+N0um+N1um
primeggiare,V+POS+Op+FLX=V4+Prep=su+N0um+N1um
contrastare,V+NEG+Op+FLX=V3+Prep=con+N0um+N1um
contare,V+POS+Op+FLX=V3+Prep=per+N0um+N1um

Just N1 can be a human noun:
addirsi,V+42+POS+Op+FLX=V263+PRX+Prep=a+N1um
sconvenire,V+NEG+Op+FLX=V236+Prep=a+N1um
convenire,V+POS+Op+FLX=V236+Prep=a+N1um
costare,V+NEG+Op+FLX=V3+Prep=a+N1um

Neither N0 nor N1 can be human nouns:
degenerare,V+NEG+FORTE+Op+42+FLX=V3+DRV=BILE:N79+Prep=in
stridere,V+NEG+Op+FLX=V2+Prep=con ¹⁶

Not every verb of this subset accepts a human noun as N1 (see examples 14 and 15).

(14) [Opinion [Target Queste idee] stridono[-2] con (i miei obiettivi + la tua immagine + *Maria)]
“These ideas are at odds with (my purposes + your image + Maria)”

(15) [Opinion [Target La situazione] degenera[-3] in un (dramma + conflitto + *Maria)]
“The situation degenerates into a (drama + conflict + Maria)”

In such cases the Opinion Holder remains implicit, but the judgment negatively affects the subject of the sentence instead of the N1.
Sentences like (16) and (17) raise the doubt whether N1 can be the Holder of the opinion expressed by the sentence or not.

16 “To loom over, to excel, to hinder, to be worth, to suit, to be inconvenient, to be convenient, to cost, to degenerate, to clash”.


(16) [Opinion [Target (Maria + Il fidanzamento + Che si sposti la riunione)] conta[+2] per [Opinion Holder Arturo]]
“(Maria + The engagement + That the meeting is moved) matters to Arturo”

(17) [Opinion [Target (*Maria + Il fidanzamento + Che si sposti la riunione)] (conviene[+2] + costa[-2]) [Opinion Holder ad Arturo]]
“(Maria + The engagement + That the meeting is moved) (is convenient + costs) for Arturo”

Actually, the verb does not provide any information regarding the active or passive role of N1 in the expression of the opinion. The writer can be convinced about what “matters, is convenient or costs” for Arturo, but it remains a belief that is not confirmed in the text (16, 17). Therefore, we chose to assign the Opinion Holder role to the N1 of the class 42 only when the object corresponds to the personal pronouns “me” or “us” (18, 19).

(18) [Opinion [Target (Maria + Il fidanzamento + Che si sposti la riunione)] conta[+2] per [Opinion Holder me]]
“(Maria + The engagement + That the meeting is moved) matters to me”

(19) [Opinion [Target (*Maria + Il fidanzamento + Che si sposti la riunione)] [Opinion Holder ci] (conviene[+2] + costa[-2])]
“(Maria + The engagement + That the meeting is moved) (is convenient + costs) for us”
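The role-assignment choice made for the opinion verbs of the class 42 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function `annotate_class42_opinion` is hypothetical, and the set of first-person forms extends the pronouns attested in (18) and (19) (me, ci) with mi and noi, which are not taken from the examples.

```python
# Assumed first-person forms; only "me" and "ci" are attested in examples (18)-(19).
FIRST_PERSON = {"me", "mi", "noi", "ci"}


def annotate_class42_opinion(n0: str, verb: str, n1: str) -> dict:
    """Label N0 as Target; label N1 as Opinion Holder only if it is first person."""
    roles = {"Target": n0, "Verb": verb}
    if n1.lower() in FIRST_PERSON:
        roles["Opinion Holder"] = n1
    # Otherwise the holder stays implicit: the text does not confirm that
    # N1 (e.g. Arturo) actually holds the opinion.
    return roles


print(annotate_class42_opinion("Il fidanzamento", "conta", "me"))
print(annotate_class42_opinion("Il fidanzamento", "conta", "Arturo"))
```

Leaving the holder unassigned in the second call mirrors the cautious reading of (16) and (17), where the writer's belief about Arturo is not confirmed by the sentence itself.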


LG Classes 43 and 43B: N0 V Ch F and N0 V il fatto Ch F

The sentence structure that defines these classes is N0 V N1, where

• N0, in both the classes 43 and 43B, is generally a human noun (Num);

• N1 is an infinitive clause or a completive one, which is direct for the 43 and introduced by il fatto (Ch F + di) “the fact (that S + of)” for the 43B.

The psychological (Sent) but also the opinion (Op) verbs from the class 43 share in many cases the acceptance or the refusal of a set of LG properties.

deprecare,V+Op+NEG+Class=43+FLX=V8
bestemmiare,V+Op+NEG+FORTE+Class=43+FLX=V11
criticare,V+Op+NEG+Class=43+FLX=V8
maledire,V+Op+NEG+FORTE+Class=43+FLX=V216
tollerare,V+Op+NEG+DEB+Class=43+FLX=V3
adorare,V+Sent+POS+FORTE+Class=43+FLX=V3
amare,V+Sent+POS+FORTE+Class=43+FLX=V3
ammirare,V+Sent+POS+Class=43+FLX=V3
odiare,V+Sent+NEG+FORTE+Class=43+FLX=V12
rimpiangere,V+Sent+NEG+DEB+Class=43+FLX=V31 ¹⁷

As regards the distribution of N0, it must be noticed that all these verbs accept a human noun, but only two of them (condannare “to condemn” and meritare “to deserve”) also accept a non-restricted noun (Nnr, not active) or a subject clause introduced by il fatto (Ch F + di). This fact is perfectly consistent with the animate nature of the Experiencer of the Psychological verbs (20). However, it underlines that this class of verbs also selects an Opinion Holder as N0 (except for the verb meritare).

17 “To deprecate, to blaspheme, to criticize, to curse, to tolerate, to adore, to love, to admire, to hate”.


(20) [Sentiment [Experiencer Maria] (adora[+3] + odia[-3]) [Stimulus (Arturo + che Arturo fumi + il fatto che Arturo fumi + il fumo)]]
“Maria (adores + hates) (Arturo + that Arturo smokes + the fact that Arturo smokes + the smoke)”

(21) [Opinion [Opinion Holder Maria] (critica[-2] + tollera[-1]) [Target (Arturo + che Arturo fumi + il fatto che Arturo fumi + il fumo)]]
“Maria (criticizes + tolerates) (Arturo + that Arturo smokes + the fact that Arturo smokes + the smoke)”

As far as the distribution of N1 is concerned, all the psych and opinion items accept the pronoun ciò “this” and the Ppv lo “it/him/her” (apart from inveire “to inveigh”). The only verbs that refuse Num as complement are auspicare “to hope for”, inveire “to inveigh”, lamentare “to lament”, malignare “to speak/think ill of”, meritare “to deserve”, agognare “to yearn for”, anelare “to long for”, delirare “to talk nonsense” and gustare “to taste”. The verbs that instead do not accept an N-um are fewer: inveire and delirare.
The property N1 dal fatto Ch F is refused by every verb except for apprezzare “to admire” which, together with sopportare “to suffer”, desiderare “to desire” and bramare “to crave”, also accepts the property N1 da N2. Other properties refused in bulk are Ch N0 V essere Comp (save malignare, sospettare “to suspect”, temere “to be afraid of”), N1 Agg (save sospettare, temere), N1 = Agg Ch F (save sospettare), N0 V essere Cong (save sospettare and malignare), N1 = se F o se F (save benedire “to bless”, apprezzare and gustare), Ch F a N2um (save approvare “to approve”).
Similar observations, but with an even higher homogeneity, can be


made on the class 43B which, differently from the 43, has the N1 always introduced by il fatto (Ch F + di).
The Psychological (Sent) and the opinionated (Op) verbs from the class 43B sometimes share the acceptance or the refusal of a set of LG properties.

attaccare,V+Op+NEG+Class=43B+FLX=V8
biasimare,V+Op+NEG+Class=43B+FLX=V3
lodare,V+Op+POS+Class=43B+FLX=V3
osannare,V+Op+POS+FORTE+Class=43B+FLX=V3
snobbare,V+Op+NEG+Class=43B+FLX=V3
assaporare,V+Sent+POS+Class=43B+FLX=V3
pregustare,V+Sent+POS+Class=43B+FLX=V3
schifare,V+Sent+NEG+FORTE+Class=43B+FLX=V3
sdegnare,V+Sent+NEG+FORTE+Class=43B+FLX=V16
spregiare,V+Sent+NEG+FORTE+Class=43B+FLX=V4 ¹⁸

Here again, an N0 different from a human noun can be accepted by just three verbs (banalizzare “to trivialize”, demitizzare “to unidealize” and pregiudicare “to compromise”).
The pronoun ciò “this”, the Ppv lo “it/him/her” and a non-human noun are accepted by all the Sent and Op verbs in complement position, while the N1um is refused by 2 psychological and 5 opinionated verbs.
As regards the properties refused by almost all the verbs, we count N1 dal fatto Ch F, N1 da N2 (save difendere “to defend”) and Ch F a N2um (save apologizzare “to make an apology for” and vantare “to praise”).
Differently from the class 43, in which the passive transformation is not accepted in 3 cases, in the class 43B it is accepted by all the items.

18 “To attack, to blame, to praise, to snub, to taste, to foretaste, to be disgusted by, to be indignant, to despise”.


3.3.3.4 Other Verbs of Sentiment

We stated in Paragraph 3.3.3.2 that almost half of the SentIta verbs do not coincide with what is defined in the literature as “psych verbs”. Moreover, a lot of psych verbs and other opinion-bearing items can be found in LG classes different from the ones classically identified as containing psychological predicates, namely 41, 42, 43, 43B (Elia 2014a, 2013). In detail, a large number of verbs that can be used as sentiment and opinion indicators is classified over 28 other LG classes, described by as many definitional structures. This fact could be interpreted as the failure of the syntactic-semantic mapping hypothesis; actually, within the LG framework, the semantic entropy of the lexicon can be decreased through the identification of classes of words that share sets of lexical-syntactic behaviors.
The main Lexicon-Grammar verbal subclasses are the ones mentioned below.

Intransitive verbs: LG classes from 1 to 15, on which LG researchers tested the acceptance or the refusal of 545 combinatorial and distributional properties through 23827 elementary sentences;

Transitive verbs: LG classes from 16 to 40, tested on 361 properties among 19453 sentences;

Complement clause verbs (transitive and intransitive): LG classes from 41 to 60, tested on 443 properties and 57242 sentences (Elia 2014a).

Table 3.12 shows the percentage values of opinionated verbs in each one of the Lexicon-Grammar verbal subclasses mentioned above. As we can see, the complement clause group contains the largest number of oriented items. When inserted in our electronic dictionaries, the lemmas from these LG classes are enriched with semantic information concerning the Types (Sent: sentiments, emotions, psychological states; Op: opinions, judgments, cognitive states; Phy: physical acts), the Orientations (POS, NEG) and the Intensities (STRONG, WEAK) of the described lemmas.

LG Class                  Verb usages   Oriented verbs   Percentages (%)
Intransitive verbs        401           138              21
Transitive verbs          605           205              31
Complement clause verbs   760           308              47
Tot                       1766          651              100

Table 3.12: Polar verbs in the main Lexicon-Grammar verbal subclasses

Similarly to SentiLex (Silva et al. 2010, 2012), SentIta is provided with the specification of the argument(s) Ni semantically affected by the orientation of the verbs.
The presence of oriented verbs and the distribution of semantic tags are respectively described in Tables 3.13 and 3.14. For a complete presentation of LG verbal classes, their composition and their granularity, see Elia (2014a).
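As a quick arithmetic check on Table 3.12, the short Python sketch below recomputes its percentage column from the raw counts. It suggests that the percentages are shares of the 651 oriented verbs, not ratios of oriented verbs to verb usages within each group; the second ratio, computed here only for comparison, is not reported in Table 3.12. The dictionary names are hypothetical.

```python
# Counts copied from Table 3.12.
usages = {"intransitive": 401, "transitive": 605, "complement clause": 760}
oriented = {"intransitive": 138, "transitive": 205, "complement clause": 308}

total_oriented = sum(oriented.values())

# Share of each group among all oriented verbs (the column reported in Table 3.12).
share = {k: round(100 * v / total_oriented) for k, v in oriented.items()}

# Density of oriented verbs inside each group (not reported in Table 3.12).
density = {k: round(100 * oriented[k] / usages[k]) for k in oriented}

print(total_oriented)
print(share)
print(density)
```

The per-class percentages of Table 3.13 follow the second convention (oriented verbs over verb usages of the class), so the two tables should not be compared column by column.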

3.3.4 Psychological Predicates’ Nominalizations

The nominal entities of SentIta that will be described in this Section are all predicative forms, due to their correlation with the verbal predicates described above, e.g. angosciare “to anguish”, angoscia “anguish” (D’Agostino 2005).
The complex phenomenon related to the combinatorial constraints between sentiment V-n and specific verbs in simple sentences has already been explored by Balibar-Mrabti (1995) and Gross (1995) for the French language. In this regard, Gross (1995) specified that “les


Main LG Class             LG Class   Verb usage   Oriented verbs   Percentages (%)
Intransitive Verbs        2          122          34               28
                          2A         40           9                23
                          4          31           12               39
                          9          72           35               49
                          10         56           19               34
                          11         80           29               36
Transitive Verbs          18         40           8                20
                          20I        25           1                4
                          20NR       83           15               18
                          20UM       145          65               45
                          21         29           2                7
                          21A        65           44               68
                          22         63           28               44
                          23R        34           13               38
                          24         72           18               25
                          27         37           10               27
                          28ST       12           1                8
Complement Clause Verbs   44         33           18               55
                          44B        44           20               45
                          45         31           23               74
                          45B        21           16               76
                          47         313          120              38
                          47B        34           12               35
                          49         96           12               13
                          50         51           41               80
                          51         36           19               53
                          53         28           9                32
                          56         73           18               25
Tot                       -          1766         651              37

Table 3.13: Polar verbs in other LG verb classes which contain at least one oriented item


Main LG Class             LG Class   Oriented Verbs   Sent   Op    POS   NEG   FORTE   DEB
Intransitive Verbs        2          34               7      24    8     26    0       0
                          2A         9                4      2     1     8     0       0
                          4          12               0      11    5     7     0       0
                          9          35               10     25    11    24    0       0
                          10         19               4      15    9     10    0       0
                          11         29               25     4     0     29    0       0
Transitive Verbs          18         8                0      0     4     4     0       0
                          20I        1                0      0     1     0     0       0
                          20NR       15               5      8     4     11    0       0
                          20UM       65               16     25    11    54    0       0
                          21         2                0      2     0     2     0       0
                          21A        44               9      14    17    6     11      10
                          22         28               0      28    21    7     0       0
                          23R        13               0      0     4     9     0       0
                          24         18               0      0     0     1     15      2
                          27         10               0      10    10    0     0       0
                          28ST       1                0      0     0     1     0       0
Complement Clause Verbs   44         18               0      18    16    2     0       0
                          44B        20               2      18    14    6     0       0
                          45         23               6      16    7     15    1       0
                          45B        16               5      10    7     9     0       0
                          47         120              0      120   5     54    43      18
                          47B        12               0      34    4     8     0       0
                          49         12               0      12    4     8     0       0
                          50         41               4      33    15    22    4       0
                          51         19               1      14    10    9     0       0
                          53         9                0      9     0     9     0       0
                          56         18               1      9     1     9     5       3
Tot                       -          651              99     461   189   350   79      33

Table 3.14: Distribution of semantic labels in the LG verb classes that contain at least one oriented item


restrictions de noms de sentiment à certains verbes sont nombreuses et tellement inattendues que, dans un premier temps, nous les avions traitées comme des expressions figées, c’est-à-dire comme de simples listes”.¹⁹
As regards Italian works in the LG framework, we mention the works of D’Agostino (2005), D’Agostino et al. (2007) and Guglielmo (2009), which respectively explored the lexical micro-classes of the sentiments ira “anger”, angoscia “anguish” and vanità “vanity”, studying their nominal manifestation in support verb constructions.
Differently from ordinary verbs (discussed here in Section 3.3.3), support verbs do not carry any semantic information: this role is played here only by the sentiment nouns, although support verbs can modulate it to a certain extent.
In the present research, the Nominalizations of Psychological verbs have been used to manually start the formalization of the SentIta dictionary dedicated to nouns. It comprises 1000+ entries and includes a list of 80+ human nouns, evaluated and added to this dictionary in order to use them in sentences like N0um essere un N1sent (e.g. Max è un attaccabrighe “Max is a cantankerous person”).
Table 3.15 shows one example of Psych V-n for each LG class taken into account during the annotation of Psychological Predicates.

Psych verb   V-n        V-n Translation   LG Class   Score
angosciare   angoscia   “anguish”         41         -3
piacere      piacere    “pleasure”        42         +2
amare        amore      “love”            43         +3
biasimare    biasimo    “blame”           43B        -2

Table 3.15: Description of the Nominalizations of the Psychological Semantic Predicates

A sample of the entries from this dictionary is given below. The LG class is associated with a given item by the property “Class”, which lets

19 “The restrictions of nouns of sentiment to certain verbs are numerous and so unexpected that at first we treated them as frozen expressions, that is to say, as mere lists”. Author’s translation.


the grammar recognize the proper syntactic structures in which the nominalizations occur in real texts.

LG Class 41
bellezza,N+POS+Class=41+DASOLO+FLX=N41
calore,N+POS+Class=41+FLX=N5
contentezza,N+POS+Class=41+DASOLO+FLX=N41
dolcezza,N+POS+Class=41+DASOLO+FLX=N41
fascino,N+POS+Class=41+DASOLO+FLX=N5 ²⁰

LG Class 42
dispiacere,N+FLX=N5+42+NEG+DASOLO
dispiacevolezza,N+FLX=N41+42+NEG+DASOLO
dispiacimento,N+FLX=N5+42+NEG+DEB+DASOLO
piacere,N+FLX=N5+42+POS+DASOLO
piacevolezza,N+FLX=N41+42+POS+DASOLO ²¹

LG Class 43
approvazione,N+FLX=N46+43+POS+DASOLO
benedizione,N+FLX=N46+43+POS+DASOLO
bestemmia,N+FLX=N41+43+NEG+DASOLO
bramosia,N+FLX=N41+43+NEG+DEB+DASOLO
condanna,N+FLX=N41+43+NEG+DASOLO ²²

LG Class 43B
denigrazione,N+FLX=N46+43B+NEG+FORTE+DASOLO
derisione,N+FLX=N46+43B+NEG+DASOLO
dileggio,N+FLX=N12+43B+NEG+DASOLO
disdegno,N+FLX=N5+43B+NEG+FORTE+DASOLO
disprezzo,N+FLX=N5+43B+NEG+FORTE+DASOLO ²³

Table 3.16 presents the details about the distribution of sentiment tags in the Psych V-n dictionary. The size of the nouns dictionary is

20 “Beauty, warmth, happiness, kindness, charm”.
21 “Sorrow, displeasure, regret, pleasure, pleasantness”.
22 “Approval, blessing, curse, yearning, condemnation”.
23 “Detraction, derision, mockery, disdain, contempt”.


LG Class   Nouns   Positive Items   Negative Items   Intensifiers   Downtoners   Percentages (%)
41         856     245              542              41             28           77
42         11      5                6                0              0            1
43         170     57               113              0              0            15
43B        74      21               53               0              0            7
Tot        1111    328              714              41             28           100
%          100     30               64               4              3            -

Table 3.16: Nominalizations of the Psych Predicates chosen from SentIta

larger than the verbs’ one. This happens because the same predicate can possess more than one nominalization (e.g. the verb amare, from the LG class 43, is related to the V-n amabilità, amore, amorevolezza).

3.3.4.1 The Presence of Support Verbs

Including V-n in a sentiment lexicon is particularly risky. In many cases it is misleading to use them out of context, because they often have a specific SO only if used with particular support verbs. Among the Vsup considered, we mention the following, with their equivalents (D’Agostino 2005, D’Agostino et al. 2007):

• essere in “to be in”, patire di, soffrire di “to suffer from”, stare in “to be in”, andare in “to go in”;

• avere “to have”, avvertire “to perceive”, patire, soffrire “to suffer”, tenere “to have”, sentire, provare “to feel”, subire “to undergo”, nutrire, covare “to harbor”, provare un (senso + sentimento) di “to feel a (sense + sentiment) of”;

• causare “to cause”, dare “to give”, suscitare “to raise”, provocare “to provoke”, creare “to create”, alimentare “to instigate”, stimolare “to inspire”, sviluppare “to develop”, incutere “to instill”, sollecitare “to urge to”, donare, regalare, offrire “to offer”.

Sometimes it is even possible to have an SO switch by changing the word context. For example, the noun pietà (“pity” and “compassion”) is strongly negative in a fixed expression with the support verb fare (22), but it becomes positive when it co-occurs with the sentiment expression of (23).

(22) È proprio la sceneggiatura che fa pietà [-3]
“It is precisely the script that is pitiful”

(23) Mary prova un sentimento di pietà [+2]
“Mary feels a sentiment of compassion”

Systematically evaluating, through LG tables, the relationships that exist between each psych noun and each support verb that can be combined with it leads to an important conclusion: the fact that the support verb variants can influence the polarity of the V-n with which they occur is not an exception. For example, extensions of Vsup like subire “to undergo”, covare “to harbor”, incutere “to instill” bring a negative orientation with them; donare, regalare, offrire “to offer” are positively connoted; perdere “to lose”, abbandonare “to abandon”, lasciare “to leave” cause a polarity switch, whatever the psych noun orientation.
Support verbs like the ones cited above cannot simply be included in a Vsup list, but must be inserted in specific paths of a noun grammar able to correctly manage their influence on V-n.
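The three groups of Vsup variants mentioned above can be turned into a small rule table. The Python sketch below is only one possible reading, stated here as an assumption: negatively connoted variants force a negative score, positively connoted variants force a positive one, and switching variants invert the sign of the noun. The source does not specify how the scores are combined, and the function name `apply_vsup` is hypothetical.

```python
# Verb lists taken from the paragraph above; the combination rules are assumptions.
NEGATIVE_VSUP = {"subire", "covare", "incutere"}
POSITIVE_VSUP = {"donare", "regalare", "offrire"}
SWITCHING_VSUP = {"perdere", "abbandonare", "lasciare"}


def apply_vsup(vsup: str, noun_score: int) -> int:
    """Adjust the prior score of a psych noun according to its support verb."""
    if vsup in SWITCHING_VSUP:
        return -noun_score        # polarity switch, whatever the noun orientation
    if vsup in NEGATIVE_VSUP:
        return -abs(noun_score)   # assumed: the negative connotation prevails
    if vsup in POSITIVE_VSUP:
        return abs(noun_score)    # assumed: the positive connotation prevails
    return noun_score             # neutral support verbs leave the score untouched


print(apply_vsup("perdere", 2))   # e.g. a positive noun under a switching verb
print(apply_vsup("avere", 2))     # a neutral support verb
```

In a grammar, each of the three sets would correspond to a dedicated path, which is why a flat Vsup list is not sufficient.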

3.3.4.2 The Absence of Support Verbs

Another problem arises when a psych predicative noun from our collection presents different semantic behaviors when occurring without support verbs in real texts. This is the case of calore “warmth”, from the


verb riscaldare “to warm up” of the LG class 41, which possesses its psychological figurative meaning only together with specific structures (24), but assumes a physical interpretation when it occurs alone (26). Different is the case of contentezza “happiness”, which preserves its psychological meaning in both cases (25, 27).

(24) Le tue parole invadono[+] Maria di calore[+2] [+3]
“Your words warm up Mary”

(25) Le tue parole invadono[+] Maria di contentezza[+2] [+3]
“Your words overwhelm Mary with happiness”

(26) Che calore[+2] [0]
“What a heat”

(27) Che contentezza[+2] [+2]
“What a joy”

We solved the problem in a simple way, by means of the special tag +DASOLO “alone”: in the dictionaries it distinguishes, and in the grammars it recalls, the cases in which the nouns can occur alone from the cases in which a specific Vsup context is required to have a particular SO.
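The effect of the +DASOLO tag can be sketched as a simple guard on the prior score of a noun. This is a minimal Python illustration under assumptions: the function `noun_score` and the boolean `has_vsup_context` are hypothetical, and the scores are the ones shown in examples (26) and (27).

```python
def noun_score(flags: set, prior: int, has_vsup_context: bool) -> int:
    """Return the prior score only if the noun is licensed to carry it.

    A noun tagged DASOLO keeps its orientation when it occurs alone;
    an untagged noun is scored only inside a suitable verb context.
    """
    if has_vsup_context or "DASOLO" in flags:
        return prior
    return 0


calore = {"N", "POS"}                  # no DASOLO tag in the dictionary sample
contentezza = {"N", "POS", "DASOLO"}

print(noun_score(calore, 2, has_vsup_context=False))       # Che calore
print(noun_score(contentezza, 2, has_vsup_context=False))  # Che contentezza
```

The same check leaves both nouns scored in sentences like (24) and (25), where a verb context is present.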

3.3.4.3 Ordinary Verbs Structures

In order to examine more deeply how the presence of different verbs can affect the polarity of the Psychological V-n, we translated the list of 61 verbs of Balibar-Mrabti (1995) from French and annotated them with semantic information. The sentence structure which they have been conceived for is N0sent V N1um (e.g. una gioia intensa riempie Max “an intense joy fills Max”), in which the subject N0 plays the role of Nsent. Then they have been tested in the structure N0 V N1um di N2sent (e.g. le tue parole riempiono Max di una gioia intensa “your words fill Max with an intense joy”), which in turn is related to the


passive Max è pieno di una gioia intensa “Max is full of an intense joy”.
This list of corresponding Italian verbs contains 47 entries, because of the exclusion of the duplicates from the LG class 41.²⁴ Among them, 11 give the Nsent a negative connotation (e.g. abbattere “to lose heart”, offuscare “to confuse”, corrodere “to corrode”); 6 confer a positive orientation on it (e.g. risplendere “to shine”, cullare “to cling”, illuminare “to light up”); 18 intensify its polarity (e.g. travolgere “to overwhelm”, attraversare “to undergo”, riempire “to fill”); and 12 simply possess a neutral orientation.
Their role can become crucial in differently oriented sentences that make use of the same Nsent, like (28a, b, c).

(28a) L’amore[+2] tormenta[-3] Maria [-3]
(28b) L’amore[+2] illumina[+2] Maria [+2]
(28c) L’amore[+2] travolge[+] Maria [+3]
“Love (torments + lights up + overwhelms) Maria”

Therefore, for the sake of simplicity, we decided to put them into an FSA that assigns as sentence polarity the score of the verbs with the tags NEG/POS (28a, b) and that uses the verbs with the tag FORTE as intensifiers (28c). A simplified version of this grammar is displayed in Figure 3.1. Here the paths marked with the number (1) represent the negative rule (28a), the paths marked with (2) represent the positive rule (28b), and the paths marked with (3) exemplify the intensification rule (28c).
In this dictionary/grammar pair we did not mark the verbs with special tags; consequently, other Psychological Predicates endowed with polarity and intensity tags can be matched in the corresponding paths.

24 The verbs from this class are all compatible with this structure, because they accept an abstract noun in the subject position and always require a human noun in position N1.
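The three rules of the grammar can be mimicked outside NooJ with a few lines of Python. The sketch below is an approximation under stated assumptions, not the FSA of Figure 3.1: the helper names are hypothetical, the tag-to-score mapping (POS/NEG = ±2, FORTE = ±3, DEB = ±1) is inferred from the examples, and intensification is modelled as one step away from zero on a scale capped at ±3, which reproduces the +2 to +3 shift of (28c).

```python
def prior_from_tags(tags: set) -> int:
    """Assumed mapping: POS/NEG = +/-2, with FORTE = +/-3 and DEB = +/-1."""
    if "POS" in tags:
        sign = 1
    elif "NEG" in tags:
        sign = -1
    else:
        return 0
    magnitude = 3 if "FORTE" in tags else 1 if "DEB" in tags else 2
    return sign * magnitude


def intensify(score: int) -> int:
    """Move one step away from zero, capped at the assumed [-3, +3] scale."""
    if score == 0:
        return 0
    step = 1 if score > 0 else -1
    return max(-3, min(3, score + step))


def sentence_score(nsent_score: int, verb_tags: set) -> int:
    """Rules (1)-(2): an oriented verb imposes its own score; rule (3): FORTE intensifies."""
    if "NEG" in verb_tags or "POS" in verb_tags:
        return prior_from_tags(verb_tags)
    if "FORTE" in verb_tags:
        return intensify(nsent_score)
    return nsent_score  # neutral verbs leave the noun score unchanged


amore = 2
print(sentence_score(amore, {"NEG", "FORTE"}))  # (28a) tormentare
print(sentence_score(amore, {"POS"}))           # (28b) illuminare
print(sentence_score(amore, {"FORTE"}))         # (28c) travolgere
```

Because the rules look only at generic polarity and intensity tags, any other predicate carrying those tags would be handled by the same branches, in line with the remark above about unmarked verbs.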


Figure 3.1: FSA for the interaction between Nsent and semantically annotated verbs


3.3.4.4 Standard-Crossed Structures

To conclude the work on sentiment V-n, we mention the case of the structures of the type “standard” (N0sent V Loc N1um) / “crossed” (N0um V di N1sent), exemplified below (D’Agostino 2005, Elia et al. 1981).

(29) l’amore[+2] brilla[+] negli occhi di Maria [+3]
“love shines in Maria’s eyes”

(30) gli occhi di Maria brillano[+] d’amore[+2] [+3]
“Maria’s eyes shine with love”

To formalize this group of sentiment expressions, we made use of 12 verbs from the LG class 12, interpreted in a figurative way, in which the N1um “standard” position and the N0um “crossed” position are designated by their body parts Npc through a metonymic process (Elia et al. 1981).
As happens in (29) and (30), such verbs can work as intensifiers (e.g. bruciare “to burn”) or can also have a negative polarity influence (only in the case of dolere “to hurt”).

3.3.5 Psychological Predicates’ Adjectivalizations

Paragraph 3.3.2 described the manually-built collection of opinion-bearing adjectives, which counts more than 5,000 items. In this list, a subset of items in morpho-phonological relation with our predicates from the classes 41, 42, 43 and 43B has been automatically associated to the verbs of origin and to their respective LG classes. The result of this work is an electronic dictionary of 757 entries, realized with a morphological grammar of Nooj (see Figure 3.2). It is displayed in the sample below.


angosciante,A+FLX=N79+DRV=ISSIMO:N88+NEG+Va+V=angosciare+Class=41
angosciato,A+FLX=N88+DRV=ISSIMO:N88+NEG+Va+V=angosciare+Class=41
angoscioso,A+FLX=N88+DRV=ISSIMO:N88+NEG+Va+V=angosciare+Class=41
piacente,A+FLX=N79+DRV=ISSIMO:N88+POS+Va+V=piacere+Class=42
piacevole,A+FLX=N79+DRV=ISSIMO:N88+POS+Va+V=piacere+Class=42
amato,A+FLX=N88+DRV=ISSIMO:N88+POS+Va+V=amare+Class=43
amorevole,A+FLX=N79+DRV=ISSIMO:N88+POS+Va+V=amare+Class=43
amoroso,A+FLX=N88+DRV=ISSIMO:N88+POS+Va+V=amare+Class=43
biasimevole,A+FLX=N79+DRV=ISSIMO:N88+NEG+Va+V=biasimare+Class=43B²⁵
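To make the layout of these entries explicit, the following Python sketch splits one dictionary line into its lemma, category, bare tags and key=value features. It assumes the usual layout in which a comma separates the lemma from its category and `+` separates the features; the function name `parse_entry` and the returned structure are illustrative choices, not part of Nooj.

```python
def parse_entry(line: str) -> dict:
    """Split a dictionary line into lemma, category, flags and key=value features."""
    lemma, _, info = line.partition(",")
    category, *fields = info.split("+")
    entry = {"lemma": lemma, "category": category, "flags": [], "features": {}}
    for field in fields:
        key, sep, value = field.partition("=")
        if sep:
            entry["features"][key] = value   # e.g. V=angosciare, Class=41
        else:
            entry["flags"].append(key)       # e.g. NEG, Va
    return entry

sample = "angosciante,A+FLX=N79+DRV=ISSIMO:N88+NEG+Va+V=angosciare+Class=41"
entry = parse_entry(sample)
print(entry["lemma"], entry["flags"], entry["features"]["V"], entry["features"]["Class"])
```

Read this way, the polarity tag (NEG), the V-a marker, the verb of origin and its LG class are all available as separate fields of the same entry.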

Table 3.17 gives information about the distribution of its semantic tags, while Table 3.18 compares the sentiment selection of V-a with the whole dictionary of sentiment adjectives of SentIta in which it is contained.

Tag           41    42    43    43B   Tot
POS+FORTE      8     0     0      9    17
POS          133     3    35     11   182
POS+DEB       14     0     0      0    14
NEG+DEB       66     0     5      1    72
NEG          307     2    24     25   358
NEG+FORTE     49     0     6      2    57
FORTE         46     1     0      0    47
DEB            9     0     0      1    10
Tot          632     6    70     49   757

Table 3.17: Distribution of semantic tags in the dictionary of Adjectival Psych Predicates
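As a cross-check, the grouped counts and percentages for positive, negative and intensity-only adjectives can be recomputed from the row totals of the tag distribution table. The Python sketch below is only illustrative: the grouping function and the rounding to whole percentages are assumptions, and 5381 is the size of the SentIta adjective dictionary reported in this section.

```python
# Row totals of the tag distribution table (tag -> number of entries)
TAG_TOTALS = {
    "POS+FORTE": 17, "POS": 182, "POS+DEB": 14,
    "NEG+DEB": 72, "NEG": 358, "NEG+FORTE": 57,
    "FORTE": 47, "DEB": 10,
}
SENTITA_ADJECTIVES = 5381  # whole dictionary of sentiment adjectives

def group_of(tag: str) -> str:
    """Assign a tag to the positive, negative or intensity-only group."""
    if tag.startswith("POS"):
        return "pos"
    if tag.startswith("NEG"):
        return "neg"
    return "int"  # bare FORTE / DEB

def summarize(tag_totals: dict) -> dict:
    """Return (entries, % on psych adjectives, % on SentIta) for each group."""
    groups = {"pos": 0, "neg": 0, "int": 0}
    for tag, count in tag_totals.items():
        groups[group_of(tag)] += count
    psych_total = sum(groups.values())
    return {
        name: (count,
               round(100 * count / psych_total),
               round(100 * count / SENTITA_ADJECTIVES))
        for name, count in groups.items()
    }

summary = summarize(TAG_TOTALS)
print(summary)  # {'pos': (213, 28, 4), 'neg': (487, 64, 9), 'int': (57, 8, 1)}
```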

The grammar resembles the ones used to derive the orientation of adverbs in -mente and quality nouns from the semantic information listed in the sentiment adjectives, which will be treated in Sections 3.5.2 and 3.5.3 respectively. The process is similar: we copy and paste the dictionary that has to be semantically enriched into a Nooj text and we annotate it through the instructions of the grammar. We then export

25 “Anguishing, anguished, anguishous, attractive, pleasing, beloved, loving, amorous, blameworthy”


Adjectives     Entries   Percentage (%)   Percentage (%)
                         on Adj Psych     on SentIta
Tot pos            213               28                4
Tot neg            487               64                9
Tot int             57                8                1
Tot Psych          757              100               14
Tot SentIta       5381                -              100

Table 3.18: Percentage values of Psych Adjectivalizations in SentIta

the annotations and compile a brand new dictionary on their basis. Because the operations involve just the two dictionaries, we exclude any other lexical (.nod) and morphological (.nom) resource from the Nooj Preferences before applying the grammar to the dictionary-text. The grammar recognizes each verbal stem, checks if it belongs to the LG class of interest and then adds the adjectival suffixes listed in a .dic file (see Table 3.19). The difference from the grammars for the quick population of the adverbs in -mente and the quality nouns is that here the morphological strategy is not exploited to discover the prior polarity of the words, since the adjective dictionary was already tagged with sentiment information. In fact, in the final annotation the adjectives preserve all their semantic ($2S), inflectional and derivational ($2F) properties. The new information they acquire regards their nature of V-a (+Va), the connection with the psychological verb (+V=$1L) and the relative LG class of the predicate (+Class=41). Regarding the adjectival suffixes for the deverbal derivation, we examined the ones that follow (Ricca 2004a):

• morphemes for the adjectives formation (-bile, -(t)orio, -evole, -(t)ivo, -(t)ore, -bondo, -ereccio, -iccio, -ido, -ile, -ndo, -tario, -ulo)

• morphemes for the formation of adjectives that coincide with the verb’s past or present participle (-to, -nte)

• not typically deverbal morphemes (-oso, -istico, -(t)ico, -aneo, -ardo, -ale, -ario)


Figure 3.2: Extract of the morphological FSA that associates psych verbs to their adjectivalizations


• not typically adjectival morphemes (-(t)ore, -trice, -ista, -one)

SFX               Adjectives
-t-o                     309
-nt-e                    159
-(t)or-e                 100
-(t)iv-o                  38
-os-o                     38
-(t)ori-o                 35
-evol-e                   26
-o (conversion)           20
-(t)ic-o                   8
-on-e                      6
-ist-a                     4
-istic-o                   3
-al-e                      3
-il-e                      2
-bil-e                     1
-bond-o                    1
-nd-o                      1
-tric-e                    1
-icci-o                    1
-ari-o                     1
-erecci-o                  0
-id-o                      0
-tari-o                    0
-ul-o                      0
-ane-o                     0
-ard-o                     0

Table 3.19: Suffixes from the dictionary of Psych Adjectivalizations

As can be seen in Table 3.19, some suffixes produced a large number of connections, while others were less productive. Six morphemes did not produce any connection. Furthermore, a set of adjectives derived by conversion has also been automatically connected to the respective verbs.
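The association step described in this section (recognize a verbal stem, check its LG class, attach an adjectival suffix) can be approximated outside Nooj with plain string matching. The Python sketch below is a simplified illustration and not the morphological grammar used for SentIta: the function names, the reduced suffix list and the treatment of the thematic vowel are all assumptions.

```python
# Subset of the suffix endings examined in the text; plain string
# matching is an assumption made for illustration only.
SUFFIX_ENDINGS = ("to", "nte", "ore", "ivo", "oso", "orio", "evole")

def verb_stem(verb: str) -> str:
    """Strip the infinitive ending (-are, -ere, -ire) from an Italian verb."""
    for ending in ("are", "ere", "ire"):
        if verb.endswith(ending):
            return verb[: -len(ending)]
    return verb

def link_adjectives(verbs: dict, adjectives: list) -> dict:
    """Map each adjective to (verb, LG class) when stem and suffix both match."""
    links = {}
    for adjective in adjectives:
        for verb, lg_class in verbs.items():
            stem = verb_stem(verb)
            remainder = adjective[len(stem):]
            if adjective.startswith(stem) and remainder.endswith(SUFFIX_ENDINGS):
                links[adjective] = (verb, lg_class)
                break
    return links

psych_verbs = {"angosciare": "41", "piacere": "42", "amare": "43", "biasimare": "43B"}
adjectives = ["angosciante", "piacevole", "amato", "biasimevole", "bello"]
links = link_adjectives(psych_verbs, adjectives)
print(links)
```

A matcher this loose would overgenerate on a full dictionary (short stems such as am- match many unrelated words), so its output would need manual checking.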

Psych verb   Psych V-n   Psych V-a     V-a Translation   LG Class   Score
angosciare   angoscia    angosciante   “anguishing”      41         -3
                         angosciato    “anguished”
                         angoscioso    “anguishous”
piacere      piacere     piacente      “attractive”      42         +2
                         piacevole     “pleasing”
amare        amore       amato         “beloved”         43         +3
                         amorevole     “loving”
                         amoroso       “amorous”
biasimare    biasimo     biasimevole   “blameworthy”     43B        -2

Table 3.20: Adjectivalizations of the Psych Predicates

LG class 43: maligno = malignare “mean = to speak ill of”
LG class 41: schifo = schifare “disgusting = to loathe”

The average precision reached in the automatic association process is 97%. The list produced by Nooj has then been manually corrected, in order to add it to the SentIta resources without any possible source of errors. Table 3.20 shows examples of the connection between the verbs from the tables 41, 42, 43 and 43B and their respective nominalizations and adjectivalizations. Actually, nouns can be associated to adjectives also independently from verbs (Gross 1996), as happens in (31). Section 3.5.3 will deal with such cases.

(31) Luc (est généreux + a de la générosité) envers ses ennemis (Gross 1996)
“Luc (is generous + has generosity) for his enemies”


Basically, we chose to associate the psych adjectives to the respective verbs in order to preserve their argument/actant selection for Semantic Role Labeling purposes. This does not exclude that other adjectives that do not belong to this subset can also have the function of predicates in the sentences in which they occur, as happens in (31).

3.3.5.1 Support Verbs

As exemplified above, the sentiment expressions in which we inserted the adjectives from SentIta are of the kind N0 essere Agg val, where Agg val represents an adjective that expresses an evaluation (Elia et al. 1981) and the support verb for these predicates is essere “to be”. This verb gives its support also to the expression N0 essere un N1-class (e.g. Questo film è una porcheria “This movie is a mess”) that, without it, together with the adjectives, would not possess any mark of tense (Elia et al. 1981). A great part of the compound adverbs also possess an adjectival function (e.g. a fin di bene “for good”, tutto rose e fiori “all peace and light”, see Section 3.4.1), so with good reason they have been included in this support verb construction as well. The support verbs’ equivalents included in this case are the following (Gross 1996):

• aspectual equivalents: stare “to stay”, diventare “to become”, rimanere/restare “to remain”

• causative equivalents: rendere “to make”

• stylistic equivalents: sembrare “to seem”, apparire “to appear”, risultare “to result”, rivelarsi “to reveal to be”, dimostrarsi/mostrarsi “to show oneself to be”

Among the Italian LG structures that include adjectives²⁶, we selected the following, in which polar and intensive adjectives can occur

26 For the LG study of adjectives in French, see Picabia (1978) and Meunier (1999, 1984).


with the support verb essere (Vietri 2004; Meunier 1984):

• Sentences with polar adjectives

  – N0 Vsup Agg Val²⁷
    L’idea iniziale era accettabile[+1]
    “The initial idea was acceptable”

  – V-inf Vsup Agg Val
    Vedere questo film è demoralizzante[-2]
    “Watching this movie is demoralizing”

  – N0 Vsup Agg Val di V-Inf
    La polizia sembra incapace[-2] di fare indagini
    “The police seem unable to investigate”

  – N0 Vsup Agg Val a N1
    La giocabilità è inferiore[-2] alla serie precedente
    “The playability is worse than in the preceding series”

  – N0 Vsup Agg Val Per N1
    Per me questo film è stato noioso[-2]
    “In my opinion this movie was boring”

• Sentences with adjectives as noun intensifiers and downtoners

  – N0 Vsup Agg Int di N1
    Una trama piena[+] di falsità[-2]
    “A plot filled with mendacity”

27 We preferred to use in these examples the notation Agg Val rather than Psych V-a, because they can refer both to adjectivalizations of psych verbs and to other SentIta evaluative adjectives.


3.3.5.2 Nominal groups with appropriate nouns Napp

The support verb avere “to have” (and its equivalent tenere) has been observed in a transformation (Nb Vsup Na V-a) that involves a special kind of GN subject containing noms appropriés “appropriate nouns” Napp (Harris 1970; Guillet and Leclère 1981; Laporte 1997, 2012; Meydan 1996, 1999). Citing (Laporte 2012, p. 1)

“A sequence is said to be appropriate to a given context if it has the highest plausibility of occurrence in that context, and can therefore be reduced to zero. In French, the notion of appropriateness is often connected with a metonymical restructuration of the subject.”

and (Mathieu 1999b, p. 122)

“On considère comme substantif approprié tout substantif Na pour lequel, dans une position syntaxique donnée, Na de Nb = Nb²⁸.”

we can clarify that “the notion of highest plausibility of occurrence of a term in a given context” (Laporte 2012) should not be interpreted in statistical terms or proved by searches in corpora, but just intuitively defined through the paraphrastic relation Na di Nb = Nb. According to (Meydan 1996, p. 198), “the adjectival transformations with Napp (nb (Na di Nb)Q essere V-a = Il fisico di Lea è attraente “The body of Lea is attractive”) can be put in relation through four types of transformations”, which also correspond to the structures included in our network of sentiment FSA. The obligatoriness of the modifiers and the appropriateness of the nouns are reflected in these transformations (Laporte 1997).

28 “Each substantive for which, in a given syntactic position, Na of Nb = Nb is considered to be an appropriate substantive.” Author’s translation.


• a nominal construction Vsup Napp

  Nb Vsup Na V-a

  Lea ha un fisico attraente “Lea has an attractive body”

• a restructured sentence in which the GN subject is exploded into two independent constituents

  Nb essere V-a Prep Na

  Lea è attraente (per il suo + di) fisico “Lea is attractive for her body”

• a metonymic sentence in which the Napp is erased

  Nb essere V-a

  Lea è attraente “Lea is attractive”

• a construction in which the Napp is adverbialized

  Nb essere Na-mente V-a

  Lea è fisicamente attraente “Lea is physically attractive”
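The paraphrase network built around an appropriate noun can be illustrated with a small Python sketch that generates the base adjectival sentence and its four related constructions from a single frame. The class `NappFrame`, its field names and the hard-coded masculine singular articles are assumptions made for this example; a real implementation would need agreement rules and would live in the Nooj grammars.

```python
from dataclasses import dataclass

@dataclass
class NappFrame:
    nb: str         # the entity the noun belongs to, e.g. "Lea"
    na: str         # the appropriate noun (Napp), e.g. "fisico"
    adjective: str  # the evaluative adjective (V-a), e.g. "attraente"
    adverb: str     # the adverb in -mente related to Na, e.g. "fisicamente"

def paraphrases(frame: NappFrame) -> dict:
    """Build the base sentence and the four related constructions."""
    return {
        "adjectival": f"Il {frame.na} di {frame.nb} è {frame.adjective}",
        "nominal": f"{frame.nb} ha un {frame.na} {frame.adjective}",
        "restructured": f"{frame.nb} è {frame.adjective} di {frame.na}",
        "metonymic": f"{frame.nb} è {frame.adjective}",
        "adverbial": f"{frame.nb} è {frame.adverb} {frame.adjective}",
    }

lea = paraphrases(NappFrame("Lea", "fisico", "attraente", "fisicamente"))
for name, sentence in lea.items():
    print(f"{name}: {sentence}")
```

Because all five sentences share the same frame, the polarity assigned to the adjective can be propagated to each of them, whether or not the Napp is expressed.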

The concept of appropriate noun has a crucial importance not only in these adjectival expressions, but also when it appears as object of psychological predicates (see Section 3.3.4), as exemplified by (Mathieu 1999b, p. 122):

N0 V (Dét N de Nhum)1 = N0 V (Nhum)1

Les soucis rongent l’esprit de Marie = Les soucis rongent Marie
“Worries torment the spirit of Mary = Worries torment Mary”

Moreover, in the Sentiment Analysis field, where the identification and the classification of the features of the opinion object even constitute a whole subfield of research (see Section 5.2), the Napp becomes a very advantageous linguistic device for the automatic feature analysis. See for example (32), in which Na (Napp) is the feature and Nb (N-um) is the object of the opinion.


(32) [Opinion [Feature (La risoluzione + L’obiettivo)] della [Object fotocamera] è eccellente[+3]]

(32a) [Opinion [Object La fotocamera] ha una [Feature (risoluzione + obiettivo)] eccellente[+3]]

(32b) [Opinion [Object La fotocamera] è eccellente[+3]]

(32c) [Opinion [Object La fotocamera] è eccellente[+3] per [Feature (la sua risoluzione + il suo obiettivo)]]

“(The image resolution + The lens) of the camera is excellent”
“The camera has an excellent (image resolution + lens)”
“The camera is excellent”
“The camera is excellent for its (image resolution + lens)”

A useful classification of the nouns that can play the role of Napp, for both human and non-human nouns, and that can be exploited for the opinion feature classification is the following (Meydan 1996):

Napp of Num

Npc = body parts

Npabs = personality trait

Ncomport = attitudes

Nprédr = intentions

Napp of N-um

Npo = shape

Nprop = ability

Dnom = model


3.3.5.3 Verbless Structures

Predicativity is not a property necessarily possessed by a particular class of morpho-syntagmatic structures (e.g. verbs, which carry information concerning person, tense, mood and aspect); instead, it is determined by the connection between elements (Giordano and Voghera 2008; De Mauro and Thornton 1985). Also on the basis of their frequency in written and spoken corpora and in informal and formal speech, together with Giordano and Voghera (2008) we consider verbless expressions as syntactically and semantically autonomous sentences, which can be coordinated and juxtaposed and can introduce subordinate clauses, just like verbal sentences. Among the verbless sentences available in the Italian language, we are interested here in those involving adjectives indicating appreciation (Agg val), e.g. Bella questa “Good one” (Meillet 1906; De Mauro and Thornton 1985). Below we report a selection of customer reviews from the dataset described in Section 5.1.3, which can give an idea of the diffusion of verbless constructions in user-generated content.

• Movie Reviews: [Molto bravi gli interpreti] [la mia preferita la sorella combina guai] [Bella la fotografia] [i colori]²⁹

• Car Reviews: [Ottimo l’impianto radio cd] [comodissimi i comandi al volante]³⁰

• Hotel Reviews: [Posizione fantastica] si raggiungono a piedi molti luoghi strategici di Londra [molto curato nel servizio e nel soddisfare le nostre richieste] [ottimo servizio in camera]³¹

• Videogame Reviews: [Provato una volta] e [subito scartato]

29 httpwwwmymoviesitfilm2013abouttimepubblicoid=680439
30 httpautociaoitNuova_Fiat_500__Opinione_1336287SortOrder1
31 httpwwwtripadvisoritShowUserReviews-g186338-d651511-r120264867-Haymarket_Hotel-London_Englandhtml


[odioso il modo in cui viene recensito]³²

• Smartphone Reviews: [Utilissimo il sistema di spegnimento e riaccensione ad orari programmati] [Telefonia e sensibilità OK]³³

• Book Reviews: [Azzeccatissima la descrizione anche psicologica e introspettiva dei personaggi]³⁴

3.3.5.4 Evaluation Verbs

In this Paragraph we mention the use of the verbs of evaluation Vval (Elia et al. 1981), which represent a subclass of the LG class 43, grouped together through the acceptance of at least one of the properties N1 = N1 Agg1 and N1 = Agg1 Ch F. Examples are giudicare “to judge”, trovare “to consider”, avvertire “to notice”, valutare “to evaluate”, etc. Of course, the N1 Agg here can include an Napp that takes the shape of (Na di Nb)1 Agg, just as happens with the psychological predicates of Mathieu (1999b). They present a different role attribution with respect to support verb constructions, since the N0 of the evaluative verbs is always an Opinion Holder (33).

(33) [Opinion [Opinion Holder Maria] considera ([Target Arturo] affascinante[+2] + affascinante[+2] [Target che Arturo venga])]

“Maria considers (Arturo fascinating + enchanting that Arturo comes)”

32 httpwwwamazonitreviewRK6CGST4C9LXI
33 httpwwwamazonitAlcatel-Touch-Smartphone-Dual-Italiaproduct-reviewsB00F621PPGpageNumber=3
34 httpwwwamazonitproduct-reviewsB007IZ1Q5SpageNumber=3


The N0 of support verb constructions, instead, is always the target of the opinion (34). The Opinion Holder remains unexpressed here.

(34) [Opinion [Target Arturo] (è + rimane + sembra) davvero[+] affascinante[+2]]

“Arturo (is + remains + seems) truly fascinating”

In causative constructions the roles are again different (35).

(35) [Opinion [Cause (Maria + L’acconciatura)] rende [Target Arturo] davvero[+] affascinante[+2]]

“(Maria + The hairstyle) makes Arturo truly fascinating”
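The three role attributions illustrated in (33), (34) and (35) can be summarized in a small lookup. The Python sketch below is only an illustration: the verb lists collect the lemmas mentioned in this section, while the function name `n0_role` and the "Unknown" fallback are assumptions.

```python
# Verb lemmas mentioned in this section, grouped by construction type
EVALUATION_VERBS = {"giudicare", "trovare", "avvertire", "valutare", "considerare"}
SUPPORT_VERBS = {"essere", "stare", "diventare", "rimanere", "restare", "sembrare",
                 "apparire", "risultare", "rivelarsi", "dimostrarsi", "mostrarsi"}
CAUSATIVE_VERBS = {"rendere"}

def n0_role(verb_lemma: str) -> str:
    """Return the semantic role of the subject (N0) for a given verb lemma."""
    if verb_lemma in EVALUATION_VERBS:
        return "Opinion Holder"  # (33) Maria considera Arturo affascinante
    if verb_lemma in SUPPORT_VERBS:
        return "Target"          # (34) Arturo è davvero affascinante
    if verb_lemma in CAUSATIVE_VERBS:
        return "Cause"           # (35) Maria rende Arturo davvero affascinante
    return "Unknown"

for lemma in ("considerare", "sembrare", "rendere"):
    print(lemma, "->", n0_role(lemma))
```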

3.3.6 Vulgar Lexicon

The fact that “taboo language is emotionally powerful” (Zhou 2010, p. 8) cannot be ignored during the collection of lexical and grammatical resources for Sentiment Analysis purposes. SentIta is provided with a collection of 182 bad words that include the following grammatical and semantic subcategories:

• Nouns: 115 entries, among which

  – 30 are human nouns (e.g. rompiballe “pain in the ass”)

  – 9 are nouns of body parts (e.g. cazzo “dick”)

• Verbs: 45 entries, among which

  – 17 are pro-complementary and pronominal verbs (e.g. 4 +PRXCOMP, fottersene “to give not a fuck”, and 13 +PRX, incazzarsi “to get mad”)


  – 17 allow the derivation of nouns in -bile “-ble” (e.g. trombabile “bangable” from trombare “to screw”)

  – 4 are already included in the LG class 41 (e.g. fregare “to matter”)

• Adjectives: 16 entries, among which 10 are from SentIta (e.g. 5 +POS, cazzuto “die-hard”, and 5 +NEG, scoglionato “annoyed”)

• Adverbs: 4 entries (e.g. incazzosamente “grumpily”)

• Exclamations: 8 entries (e.g. vaffanculo “fuck off”)

Figure 3.3: Frozen and semi-frozen expressions that involve the vulgar word cazzo and its euphemisms


Although some vulgar terms and expressions, e.g. those with the functions of interjections or discourse markers, can be semantically empty, the great part of them have metaphorical functions that are relevant for the correct analysis of the sentiment in real texts. Examples are offense, imprecation, intensification of negation, etc. The scatological and sexual semantic fields are among the most frequent in our lexicon. The male and female nouns derived from human body parts can have different kinds of orientations and functions, which range from insult (coglione “asshole”) to praise (figata “cool thing”). In accordance with the observations made by Velikovich et al. (2010) on web corpora, our lexicon involves a set of racial or homophobic derogatory terms (e.g. terrone³⁵ “southern Italian”, culattone “faggot”). The vulgar words in the SentIta database, especially the ones with an uncertain polarity, can be part of frozen or semi-frozen expressions that make clear, for each occurrence, the actual semantic orientation of the words in context. A very typical Italian example is cazzo “dick”, with its more or less vulgar regional variants (e.g. minchia, pirla) and euphemisms (e.g. cavolo “cabbage”, cacchio “dang”, mazza “stick”, tubo “pipe”, corno “horn”, etc.). Figure 3.3 exemplifies the cases in which the context gives to the word(s) under examination a negative connotation. In detail, the FSA includes negative adverbial and adjectival functions (paths 1, 2, 5, 7), exclamations (paths 3, 6, 9), interrogative forms (6) and intensification of negations (paths 8, 9). Path (4) also exemplifies the frozen sentence from the LG class CAN (N0 V (C1 di N) = N0 V C1 a N2) that presents cazzo as C1 frozen complement (Vietri 2011). This frozen sentence, in particular, is also related to some monorematic compound nouns and adjectives included in the bad word list of SentIta. However, a Sentiment Analysis tool that works on user-generated content must be aware of the fact that, in colloquial and informal situations, a word like cazzo can work simply as a regular intensifier, also

35 Term used with insulting purposes only by Northern Italians.


for positive sentences (e.g. è così bello[+2] cazzo[+] [+3] “it’s fucking nice”). This is why it received in the dictionary just the tag attributed to intensifiers (+FORTE). In order to be annotated in texts with other orientations, it must occur as a component of specific expressions. As we can see in Table 3.21, although the majority of the entries in the bad words dictionary possess a negative connotation, 10% of them are positive (e.g. strafigo “supercool”). Moreover, the manual annotation of the whole list of bad words with sentiment tags confirmed the budding concept of the emotional power of taboo language: 84% of the vulgar lexicon is endowed with a defined semantic orientation (see Table 3.22).

Score   Bad Words     Entries
+3      +POS+FORTE          6
+2      +POS               12
+1      +POS+DEB            0
-1      +NEG+DEB            8
-2      +NEG              125
-3      +NEG+FORTE          6
+       +FORTE              3
-       +DEB                0

Table 3.21: SentIta tag distribution in the list of bad words

Bad Words      Entries   Percentage (%)
tot pos             18               10
tot neg            132               73
tot int              3                2
tot oriented       153               84
tot neutral         29               16
tot BW             182              100

Table 3.22: Bad Words percentage values in SentIta


Collateral Benefits of Exhaustive Vulgar Lexical and Grammatical Resources

The Web 2.0 offers Internet users the opportunity to freely share almost everything in online communities, hiding their own identity behind the shelter of anonymity. Flaming, trolling, harassment, cyberbullying, cyberstalking and cyberthreats are all terms used to refer to offensive content on the web. The shapes can be different and the focus can be on various topics, e.g. physical appearance, ethnicity, sexuality, social acceptance, etc. (Singhal and Bansal 2013), but they are all annoying (when not even dangerous, as when the victims are children or teenagers) for the same reason: they make the online experience unpleasant. The detection and filtering of offensive language on the web is strongly connected to the Sentiment Analysis field, when one decides to handle it automatically. The occurrence of derogatory terms and phrases or the unjustified repetition of words with a strongly negative connotation can effectively help with this task. Although taboo language is generally considered to be the strongest clue of harassment on the web (Xiang et al. 2012; Chen et al. 2012; Reynolds et al. 2011; Xu and Zhu 2010; Yin et al. 2009; Mahmud et al. 2008), it must be clarified that the presence of bad words in posts does not necessarily indicate the presence of offensive behaviors. We already explained that the words in our collection of vulgar terms are, in some cases, neutral or even positive. Profanity can be used with comical or satirical purposes, and bad words are often just the expression of strong emotions (Yin et al. 2009). Thus, the well-known censoring approaches based only on keywords, which simply match words appearing in text messages with offensive words stored in blacklists, clearly cannot reach high levels of accuracy. Consistent with this idea, in recent years many studies on offensive language, cyberbullying and flame detection integrated the bad words’ context in their methods and tools.


Chen et al. (2012) exploited a Lexical Syntactic Feature (LSF) architecture to detect offensive content and identify potential offensive users in social media, introducing the concept of syntactic intensifier to adjust words’ offensiveness levels based on their context. Xiang et al. (2012) learned topic models from a dataset of tweets through the Latent Dirichlet Allocation (LDA) algorithm. They combined topical and lexical features into a single compact feature space. Xu and Zhu (2010) proposed a sentence-level semantic filtering approach that combined grammatical relations with offensive words in order to semantically remove all the offensive content from sentences. Insulting phrases (e.g. get-a-life, get-lost, etc.) and derogatory comparisons of human beings with insulting items or animals (e.g. donkey, dog) were clues used by Mahmud et al. (2008) to locate flames and insults in text.

3.4 Towards a Lexicon of Opinionated Compounds

3.4.1 Compound Adverbs and their Polarities

Elia (1990) adapted to the Italian language the formal definition of “adverb” of Jespersen (1965), who for the first time also included special kinds of phrases and sentences in the category of the adverbs. In detail, the structures included in this category are the following:

• traditional adverbs with a monolithic shape (e.g. volentieri “gladly”)

• traditional adverbs in -mente (e.g. impetuosamente “impetuously”)

• prepositional phrases (e.g. a vanvera “randomly”)


• phrases derived from sentences with finite or non-finite verbs (e.g. da levare il fiato “to take the breath away”)

The first two types of adverbs will be treated separately in Section 3.5.2, due to the fact that they have been automatically derived from the SentIta adjective dictionary. Compound adverbs, or frozen or idiomatic adverbs (the last two points of Jespersen’s list), can be defined as

“(...) adverbs that can be separated into several words, with some or all of their words frozen, that is, semantically and/or syntactically noncompositional” (Gross 1986, p. 2)

Therefore, just as happens in polyrematic expressions, the syntactic elements that compose such adverbs cannot be freely shifted, substituted or deleted, despite the fact that, from a syntactic point of view, they work just as simple adverbs. Following the LG paradigm and according to their internal morphosyntactic structure, the compound adverbs have been classified in many languages, namely English (Gross 1986), French (Gross 1990; Laporte and Voyatzi 2008; Tolone and Stavroula 2011), Italian (Elia 1990; De Gioia 1994a,b, 2001), Modern Greek (Voyatzi 2006), Portuguese (Baptista 2003), etc. The LG description and classification of compound adverbs represents the intersection of two criteria, corresponding to the phenomenon of multiword expressions and to the adverbial functions (Tolone and Stavroula 2011; Laporte and Voyatzi 2008). According to (Laporte and Voyatzi 2008, p. 32)

“(...) a phrase composed of several words is considered to be a multiword expression if some or all of its elements are frozen together, that is, if their combination does not obey productive rules of syntactic and semantic compositionality”


The adverbial functions, instead, pertain to the adverbial roles, which can take the shape of circumstantial complements or of complements which are not objects of the predicate of the clause in which they appear. In LG descriptions, compound adverbs are formalized as a sequence of parts of speech and syntactic categories. Their morpho-syntactic structures are described on the basis of the number, the category and the position of the frozen and free components of the adverbials. The Italian classification system for compound adverbs follows the same logic as the French one, with which it shares a strong similarity in terms of syntactic structures. In this research, among the 16 classes of idiomatic adverbs selected by De Gioia (1994a,b), we worked on the ones reported in Table 3.23. The comparative structures with come “like”, treated by Vietri (1990, 2011) as idiomatic sentences, will be described in Section 3.4.2. The adverbs belonging to the classes described in the first 9 rows of Table 3.23 have been manually selected and evaluated starting from the electronic dictionaries and the LG tables of Elia (1990), while the classes PF, PV and PCPN come from the LG tables of De Gioia (2001). In the first classes, Elia (1990) recognized the following three abstract structures:

1. Prep (E+Det) (E+Agg) C1 (E+Agg)

(a) PC

(b) PDETC

(c) PAC

(d) PCA

2. Prep (E+Det+Agg) C1 (E+Agg) Prep (E+Det+Agg) C2 (E+Agg)

(a) PCDC

(b) PCPC


Class   Structure          Example                  Translation
PC      Prep C             controvoglia             unwillingly
PDETC   Prep Det C         con le cattive           by any means necessary
PAC     Prep Agg C         a pieni voti             with flying colors
PCA     Prep C Agg         a testa alta             with your head held high
PCDC    Prep C di C        con cognizione di causa  with full knowledge
PCPC    Prep C Prep C      di bene in meglio        from good to better
PCONG   (Prep) C Cong C    d’amore e d’accordo      in harmony
CC      C C                niente male              nothing bad
CPC     C Prep C           l’ira di Dio             a king’s ransom
PF      Compound Sentence  si salvi chi può         every man for himself
PV      Prep V W           seduta stante            on the spot
PCPN    Prep C Prep N      in onta a                in spite of

Table 3.23: Classes of Compound Adverbs in SentIta


3. (E+Prep) (E+Det) C1 (E+Cong+Prep) (E+Det+Agg) C2

(a) PCONG

(b) CC

(c) CPC

Table 3.24 shows the distribution of each one of the evaluation (POS/NEG), intensity (FORTE/DEB) and discourse tags³⁶ (SINTESI, NEGAZIONE, etc.) in every class of compound adverbs. As we can see in the two lower sections of Table 3.24, the compound adverbs that possess an orientation or that are significant for Sentiment Analysis purposes amount to 51% of the total amount of compound adverbs under examination. Moreover, it must be observed that 60% of the labels are connected to intensity, while evaluation tags account for only 30%. The presence of discourse tags seems to be marginal, but if compared with other parts of speech (or even with the -mente and monolithic adverbs, see Section 3.5.2) it reveals its significance.

3.4.1.1 Compound Adjectives of Sentiment

A great part of the polyrematic units with an adverbial function can also play the role of adjectives:

Adverbial function: agire a fin di bene “to act for good”

Adjectival function: una bugia a fin di bene “a white lie”

For simplicity, we did not distinguish the cases in which they possess just one function from the cases in which they can have both roles. However, separating these two groups is often impossible

36 These tags annotate 70 discourse operators able to summarize (e.g. in parole povere “in simple terms”), invert (e.g. nonostante ciò “nevertheless”), confirm (e.g. in altri termini “in other terms”), compare (e.g. allo stesso modo “in the same way”) and negate (e.g. neanche per sogno “in your dreams”) the opinion expressed in the previous sentences of the text.


if one considers also their invariability and the fact that their function depends on semantic features rather than on morphological ones (Voghera 2004).

AVV+AVVC+PC+PN without adjectival function: in conclusione “in conclusion”

AVV+AVVC+PC+PN with adjectival function: a vanvera (e.g. parlare a vanvera “to prattle”, un discorso a vanvera “a prattle”)

Therefore, we did not make any distinction in the dictionaries, in order to avoid duplicating the entries, but we also recalled them in the grammars dedicated to adjectives as noun modifiers and in constructions of the kind N0 (essere + E) Agg Val (see Section 3.3.5).

Table 3.24: Composition of the compound adverb dictionary. Columns: the compound adverb classes PC, CC, CPC, PAC, PCA, PCDC, PCONG, PCPC, PDETC, PF, PV and PCPN, plus their total. Rows: the tags POS+FORTE, POS, POS+DEB, NEG+DEB, NEG, NEG+FORTE, FORTE, DEB, SINTESI, CONFERMA, INVERTE, COMP and NEGAZIONE, followed by Tot, Tot Class, Percentage (%), Evaluation tags (%), Intensity tags (%) and Discourse tags (%).

3.4.2 Frozen Expressions of Sentiment

In this thesis, oriented words are always considered in context. In this paragraph we will focus on the importance that frozen sentences can have in an LG-based module for Sentiment Analysis.

Traditional and generative grammars generally treat idioms as aberrations, and many taggers and corpus linguistic tools simply ignore multiword units and frozen expressions (Vietri 2004, Silberztein 2008).

The LG paradigm started, with Gross (1982), the systematic and formal study of frozen sentences, underlining their non-exceptional nature from both the lexical and the syntactic point of view. Indeed, idioms occupy in the lexicon a volume that is comparable with that of the free forms (Gross 1982).

This kind of approach opens plenty of possibilities in the field of Sentiment Analysis, in which the collection of lexical resources endowed with a defined polarity cannot stop at simple words, but needs to be opened also to different kinds of oriented multiword expressions and frozen sentences.

According to Vietri (2014b, p. 32):

(...) a verb, when co-occurring with a certain noun (or a set of nouns), produces a “special meaning” that it would not have been assigned if the noun was substituted. This type of meaning is conventionally defined “non-compositional”, but this does not automatically imply, from a syntactic point of view, that idioms are “units”.

On the basis of this statement, strictly connected to the distributional properties of idiomatic sentences, we included in our sentiment database more than 500 Italian frozen sentences that have been manually evaluated and then formalized with pairs of dictionaries and grammars.

This choice is connected to the purpose of lemmatizing them in lexical databases, taking into account, through FSA, also their syntactic flexibility and lexical variation. Treating idioms computationally means dealing with the problems they pose in terms of the relationship between “interpretation” and “syntactic constructions” (Vietri 2014b).

The source of the work is the Lexicon-Grammar description of idioms, which takes the shape of binary tables in which lexical entries are described on the basis of syntactic and semantic properties.

The formal notation is the traditional one used in the LG framework. The symbol C is used in frozen sentences to indicate a fixed nominal position that cannot be filled by different items belonging to the same class or semantic field and that cannot be morpho-grammatically modified (Vietri 2004).

The Lexicon-Grammar classification of Italian idioms includes more than 30 classes of idioms with ordinary verbs and support verbs. The most common support verbs are avere “to have”, essere “to be” and fare “to make”. They differ from their ordinary counterparts because of their semantic emptiness. When they are involved in the formation of idiomatic constructions, they present a very high lexical and syntactic flexibility, which can take the shape of the following phenomena (Vietri 2014d):

• the alternation of support verbs with aspectual variants (e.g. rimanere “to remain”, diventare “to become”);

• the production of causative constructions (e.g. with verbs like rendere “to make”);

• the deletion of the support verb (e.g. N0 Agg come C1: una donna astuta come una volpe “a woman as sharp as a tack”).

Because of the complexity of their lexical and syntactic variability, which often affects the intensity of the idiom itself, we chose to restrict the sample of idioms under examination for Sentiment Analysis purposes to the frozen sentences with the support verb essere that include adjectives in their structure, namely PECO, CEAC, EAA, ECA and EAPC. Since a large part of the idioms belonging to this subgroup contains adjectives evaluated in SentIta, we found it interesting to compare the prior polarity of the adjectives involved in idioms with the polarity assigned to the idioms themselves. The results of this comparison are reported in Table 3.25.

                           PECO  CEAC  EAA  ECA  EAPC  Tot
Idioms in the LG Class      274    36   40  153   162  665
Idioms in SentIta           245    27   32  134   139  577
Adj. of SentIta in Idioms   133    12   15   37    56  253

Table 3.25: Polar adjectives in frozen sentences

3.4.2.1 The Lexicon-Grammar Classes Under Examination: PECO, CEAC, EAA, ECA and EAPC

Vietri (1990) showed that the comparative frozen sentences of the type N0 Agg come C1 (PECO) usually intensify the polarity of the sentiment adjective they contain, as happens in (36), in which the SentIta adjective bello, endowed with a polarity score of +2, occurs in the idiom essere bello come il sole and changes its polarity to +3.

(36) Maria è bella[+2] come il sole [+3]
“Maria is as beautiful as the sun”

Actually, an idiom of this sort can also be polarized when the adjective it contains is neutral, as with bianco “white” in (37), or can even reverse the polarity of the adjective, as happens in (38), where agile “agile” is positive.

(37) Maria è bianca[0] come un cadavere [-2]
“Maria is as white as a dead body” (Maria is pale)

(38) Maria è agile[+2] come una gatta di piombo [-2]
“Maria is as agile as a lead cat” (Maria is not agile)

In this regard, it is interesting to notice that 84% of the idioms have a clear SO, while just 36% of the adjectives they contain are polarized (see the examples in Table 3.26). This confirms the significance of a sentiment lexicon that also includes frozen expressions. Indeed, a resource of this sort is able to disambiguate the cases in which the polarity of sentiment (or neutral) words is changed by the ironic nature of the idioms in which they occur (38), (39), (40).

There is a single disambiguation rule, which consists in always annotating the longest match in the text analysis.

(39) Maria è alta[0] come un soldo di cacio [-2]
“Maria is knee-high to a grasshopper” (Maria is short)

(40) Maria è asciutta[0] come l’esca [-2]
“Maria is on the bones of her arse” (Maria is penniless)
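The longest-match rule can be illustrated with a minimal Python sketch. It is not the NooJ implementation used for SentIta: the lexicon is a hypothetical in-memory dictionary, matching is done on exact word forms, and only the scores quoted in examples (36), (38) and (39) are taken from the text.

```python
# Toy lexicon (hypothetical structure): tuples of word forms -> polarity score.
# Single adjectives and whole idioms live side by side.
LEXICON = {
    ("bella",): 2,
    ("agile",): 2,
    ("bella", "come", "il", "sole"): 3,
    ("agile", "come", "una", "gatta", "di", "piombo"): -2,
    ("alta", "come", "un", "soldo", "di", "cacio"): -2,
}
MAX_LEN = max(len(key) for key in LEXICON)


def annotate(tokens):
    """Scan left to right, always keeping the longest lexicon match."""
    annotations = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, then progressively shorter ones.
        for size in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + size])
            if candidate in LEXICON:
                annotations.append((" ".join(candidate), LEXICON[candidate]))
                i += size
                break
        else:
            i += 1  # no lexicon entry starts at this token
    return annotations


print(annotate("Maria è agile come una gatta di piombo".split()))
print(annotate("Maria è agile".split()))
```

Because the idiom is tried before the bare adjective, agile contributes its prior polarity only when it is not part of a longer frozen expression.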

Frozen Sentence                        Translation                       Adj Polarity  Adj Score  Idiom Polarity  Idiom Score
essere agile come una gatta di piombo  “to be as agile as a lead cat”    POS           +2         NEG             -2
essere agile come una gazzella         “to be as agile as a gazelle”     POS           +2         POS             +2
essere agile come uno scoiattolo       “to be as agile as a squirrel”    POS           +2         POS             +2
essere astuto come il demonio          “to be as clever as the devil”    -             0          POS             +2
essere astuto come una volpe           “to be as sharp as a tack”        -             0          POS             +2
essere astuto come uno zingaro         “to be as cunning as a gypsy”     -             0          NEG             -2

Table 3.26: Examples of idioms that maintain, switch or shift the prior polarity of the adjectives they contain

Other idioms included in our resources are the ones belonging to the class EAPC, which involves the presence of an adjective, or a verb in its past participle form, and a prepositional complement. These frozen expressions usually intensify the polarity of the adjective/past participle (see example below) but, just like the PECO ones, they can also switch it or shift it.

(41) Maria è matta[-2] da legare [-3]
“Maria is so crazy she should be locked up”

The LG class EAA, with definitional structure N0 essere Agg e Agg, includes two adjectives that can be polarized or not. Example (42) shows that in this case as well the polarity of the sentence can be opposite to that of the adjectives.

(42) Maria è bella[+2] e fritta[0] [-2]
“Maria is cooked”

An example from the class CEAC is reported in (43).

(43) L’anima è nera[0] come il carbone [-3]
“The soul is as black as coal”

Among the transformations that these idioms can undergo we included N0 avere C Agg (44), which is also the most frequent one.

(44) Maria ha l’anima nera[0] come il carbone [-3]
“Maria has a soul as black as coal”

Finally, we formalized the LG class ECA, whose frozen elements are the nominal element C1 and the adjective.

(45) Maria è una gatta morta[0] [-2]
“Maria is a cock tease”

3.4.2.2 The Formalization of Sentiment Frozen Expressions

Due to their syntactic and lexical variation and to their discontinuity, frozen expressions need to be formalized in an electronic context able to list them systematically in electronic databases and to take into account their syntactic variability. Basically, the final purpose is to take advantage of the large amount of information listed in the LG matrices while recognizing and annotating idioms in real texts.

One of the best solutions is to link dictionaries, which contain the lists of Atomic Linguistic Units (ALU) and their syntactic and semantic properties, with syntactic grammars, which can make such words and properties interact in an FSA context (Silberztein 2008). ALU are “the smallest elements that make up the sentence, i.e. the non-analyzable units of the language” (Silberztein 2003). They can take the shape of simple words, affixes, multiword expressions and frozen expressions.

What distinguishes frozen expressions from the other ALU is discontinuity: as displayed in Figure 3.4, it is possible for adverbs to occur between the verb and the direct object (Vietri 2004). That is why frozen sentences need both dictionaries and FSA to be recognized and annotated correctly: the dictionaries allow the recognition of idioms thanks to the characteristic components (word forms or sequences that occur every time the expressions occur and trigger the recognition of the frozen expressions in texts (Silberztein 2008)), and the FSA let the machine compute them in spite of the many different forms that they can assume in real texts.

Figure 3.4 represents a simplified version of the main graph of the FSA for the recognition and the sentiment annotation of idioms of the LG class PECO. It includes a metanode that allows the recognition of a nominal group in subject position (N0), an embedded graph for the verb and six nodes related to different Semantic Orientations.

Figure 3.4: Extract of the FSA for the SO identification of idioms

The instruction XREF is used to exclude the linguistic material that can occur between the support verb and the idiom object when the annotation is performed. Figure 3.5 shows, as an example, the content of the metanode that annotates frozen expressions with +3 as Semantic Orientation (MOLTOPOS). Here we observe the interactions between the idiom dictionary and the restrictions reported in the syntactic grammar. In its structure, the grammar under examination looks exactly the same as the grammar for the identification of free sentences with the same syntactic shape. The difference lies in the syntactic and distributional constraints, which in the grammar recall only the properties and the lexical items associated in the dictionary with a specific characteristic component. To be more precise, the PECO dictionary lists, among others, the three entries shown below, which have been labeled with the semantic tags POS and FORTE (+3), as happens in the other subsets of SentIta.

Essere (forte+coraggioso) come un leone “To be (strong+brave) like a lion”

leone,N+Class=PECO
+A=coraggioso+A=forte+AVV=come+AVV=quanto+DET=un
+Sent=POS+Int=FORTE
+N0um
+Vsup=essere+Vsup=diventare+Vsup=restare+Vsup=rimanere
+Vcaus=rendere
+N0essereunC1+N0esserecomeC1+N0esserepiu

Essere buono come il pane “To be as good as bread”

pane,N+Class=PECO
+A=buono+AVV=come+AVV=quanto+DET=il+PREP=di
+Sent=POS+Int=FORTE
+N0um
+Vsup=essere+Vsup=diventare+Vsup=restare+Vsup=rimanere
+Vcaus=rendere
+N0esserediC1+N0esserecomeC1+N0esserepiu

Essere (astuto+furbo) come il demonio “To be as clever as the devil”

demonio,N+Class=PECO
+A=astuto+A=furbo+AVV=come+AVV=quanto+DET=il
+Sent=POS+Int=FORTE
+N0um
+Vsup=essere+Vsup=diventare+Vsup=restare+Vsup=rimanere
+Vcaus=rendere
+N0esserepiu


As shown above, the characteristic components C (e.g. leone “lion”, pane “bread”, demonio “devil”) have been used as NooJ entries in the electronic dictionary and have been associated with the other components of the frozen expressions (e.g. +A, +AVV, +DET). Moreover, they have also been described with further information referring to the following properties, borrowed from the already cited LG tables:

distributional: e.g. +N0um, +Vsup, +Vcaus

transformational: e.g. +N0essereunC1, +N0esserecomeC1, +N0esserediC1, +N0esserepiu
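To make the structure of these entries concrete, the following Python sketch splits one of them into lemma, category, multi-valued features and bare properties. It is only an illustration: it assumes that the entry is written on a single line with a comma between lemma and category, and the function name and the returned structure are hypothetical, not part of NooJ.

```python
def parse_entry(line):
    """Split 'lemma,CAT+name=value+property' into its components."""
    lemma, rest = line.split(",", 1)
    category, *features = rest.split("+")
    values = {}    # features with a value, e.g. A, AVV, Vsup (may repeat)
    flags = set()  # bare properties, e.g. N0um or N0esserecomeC1
    for feature in features:
        if "=" in feature:
            name, value = feature.split("=", 1)
            values.setdefault(name, []).append(value)
        else:
            flags.add(feature)
    return {"lemma": lemma, "category": category, "values": values, "flags": flags}


ENTRY = (
    "leone,N+Class=PECO+A=coraggioso+A=forte+AVV=come+AVV=quanto+DET=un"
    "+Sent=POS+Int=FORTE+N0um+Vsup=essere+Vsup=diventare+Vsup=restare"
    "+Vsup=rimanere+Vcaus=rendere+N0essereunC1+N0esserecomeC1+N0esserepiu"
)

leone = parse_entry(ENTRY)
print(leone["values"]["A"])     # adjectives admitted by the idiom
print(sorted(leone["flags"]))   # distributional and transformational properties
```

Keeping repeated features such as +A or +Vsup as lists preserves the alternatives that the grammar constraints later check against.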

All these properties are recalled in the idiom syntactic grammar in the form of lexical and grammatical constraints.

The restrictions +N0um and +Vsup, +Vcaus are meant, respectively, for the distributional properties of the sentence subject, which can or cannot be human, and of the support verb essere, with its aspectual (Vsup) or causative (Vcaus) variants. While the support verbs restare and rimanere are always acceptable, diventare and rendere are unacceptable in a few cases (Vietri 1990).

We chose to formalize the distributional restrictions in the syntactic grammars with the following instruction, written under each involved node and referring to a related variable (var):

<$var=$C$var>

It represents a constraint on the words that are allowed to appear in the nodes it restricts: these can only be the ones indicated in the dictionary by the same variable var for the entry C. As an example, the idiom Essere (forte+coraggioso) come un leone, which in the dictionary has only the adjectives forte “strong” and coraggioso “brave” as values for the variable +A, cannot accept in the grammar, in the variable A, adjectives other than these two if the variable comes with the restriction shown below:

<$A=$C$A>


The same happens with the aspectual and causative variants of the verb (Vietri 1990), which are handled with similar restrictions:

<$Vvar=$C$Vvar>
<$Vcaus=$C$Vcaus>
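The effect of these constraints can be approximated outside NooJ with a simple lookup. The sketch below is an assumption-laden illustration: the dictionary view and the function `satisfies` are hypothetical, and only the values listed for the entry leone come from the text.

```python
# Hypothetical in-memory view of the PECO entry for "leone".
DICTIONARY = {
    "leone": {
        "A": {"coraggioso", "forte"},
        "Vsup": {"essere", "diventare", "restare", "rimanere"},
        "Vcaus": {"rendere"},
    }
}


def satisfies(c_lemma, variable, matched_lemma):
    """True if the lemma matched in `variable` is listed for the entry C."""
    allowed = DICTIONARY.get(c_lemma, {}).get(variable, set())
    return matched_lemma in allowed


print(satisfies("leone", "A", "forte"))        # adjective listed for leone
print(satisfies("leone", "A", "bello"))        # adjective not listed
print(satisfies("leone", "Vsup", "rimanere"))  # aspectual variant listed
```

A path of the grammar would be followed only when every constrained variable passes a check of this kind.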

The symbol “=” is used in Figure 3.5 when the inflection of the word in the variable is permitted (e.g. for the adjective in essere furbi come il demonio). Otherwise, the symbol “=” is used.

Differently from Vietri (1990), we preferred instead to treat transformational properties in another way. In order to reduce the size of the grammar, which here also contains the polarity and intensity information, we wrote the transformational constraints before the path starts, in the following form:

<$C$var>

This instructs the FSA that the specific path is allowed only for the idioms that present in the dictionary the variable var for the entry C.

In the FSA in Figure 3.5, the PECO standard structure Essere Agg come C1 is recognized by the path (2). The path (1) deletes the adjective if the idiom recognized by C possesses the specific transformational property +N0esserecomeC1. Examples of idioms that satisfy this constraint are reported below. The ones preceded by the asterisk do not satisfy the restrictions.

essere come un leone “to be like a lion”
essere come il pane “to be like the bread”
*essere come il demonio “to be like the devil”

In the path (3), the grammar performs the deletion of both the adjective and the adverb come for the idioms endowed with the property +N0essereunC1 in the electronic dictionary.

essere un leone (di uomo) “to be a lion (man)”
*essere un pane “to be a bread”
*essere un demonio (di uomo) “to be a devil (man)”

Figure 3.5: Extract of the FSA for the extraction of PECO idioms

Differently from essere un pane, which is not acceptable at all, essere come il demonio and essere un demonio (di uomo) are acceptable sentences that change the polarity of the basic idiom, probably because they are related to another comparative sentence with opposite meaning and orientation, e.g. essere (brutto + cattivo) come il demonio “to be (ugly + evil) like the devil” (Vietri 1990).

The last two paths refer, respectively, to the correlation of PECO with other sentence structures, such as N0 essere di C1 (4), indicated by the property +N0esserediC1, and N0 essere più Agg di C, summarized by +N0esserepiu in the dictionary and recognized in the path (5).

essere (fatto + E) di pane “to be made of bread”
*essere (fatto + E) di leone “to be made of lion”
*essere (fatto + E) di demonio “to be made of devil”

essere più (forte + coraggioso) di un leone “to be (stronger + braver) than a lion”
essere più buono del pane “to be better than bread”
essere più (astuto + furbo) del demonio “to be cleverer than the devil”

While only the first idiom is allowed to go through the path (4), the path (5) implies a property accepted by all the exemplified idioms.
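The way the transformational properties gate the five paths can be summarized with a short Python sketch. The data structures and the function `licensed_paths` are hypothetical; the properties are copied from the three dictionary entries quoted in the text, and path (2) is treated as the base structure that needs no extra property.

```python
# Transformational property required by each path of the grammar.
PATH_PROPERTY = {
    1: "N0esserecomeC1",  # essere come C1
    2: None,              # essere Agg come C1 (base structure)
    3: "N0essereunC1",    # essere un C1
    4: "N0esserediC1",    # essere di C1
    5: "N0esserepiu",     # essere più Agg di C
}

# Properties listed in the dictionary for each characteristic component.
IDIOM_PROPERTIES = {
    "leone": {"N0essereunC1", "N0esserecomeC1", "N0esserepiu"},
    "pane": {"N0esserediC1", "N0esserecomeC1", "N0esserepiu"},
    "demonio": {"N0esserepiu"},
}


def licensed_paths(c_lemma):
    """Return the paths that the idiom built on `c_lemma` may follow."""
    owned = IDIOM_PROPERTIES[c_lemma]
    return [
        path
        for path, prop in sorted(PATH_PROPERTY.items())
        if prop is None or prop in owned
    ]


for lemma in IDIOM_PROPERTIES:
    print(lemma, licensed_paths(lemma))
```

With these entries, path (4) is open only to pane and path (5) to all three idioms, in line with the remarks above.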

3.5 The Automatic Expansion of SentIta

The present Section discusses the automatic enlargement of the manually built electronic dictionaries of SentIta.

Our work took advantage of derivational linguistic clues that relate semantically oriented adjectives to quality nouns and to adverbs in -mente. The purpose is to make new words automatically inherit the semantic information associated with the adjectives to which they are morpho-phonologically related.

Furthermore, we used as morphological Contextual Valence Shifters (mCVS) a list of prefixes able to negate or to intensify/downtone the orientation of the words with which they occur.

We clarify in advance that the morphological method could also have been applied to Italian verbs, but we chose to avoid this solution because of the complexity of their argument structures. We decided instead to manually evaluate all the verbs described in the Italian Lexicon-Grammar binary tables, so that we could preserve the different lexical, syntactic and transformational rules connected to each of them.
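The general idea of this expansion can be sketched in Python as follows. The sketch is a rough illustration under several assumptions, not the procedure implemented for SentIta: the seed scores, the prefix list and the way the score is shifted (sign flip for negation, one step up or down on a -3..+3 scale for intensification and downtoning) are all hypothetical, and only the regular rule for -mente adverbs is modelled.

```python
# Hypothetical seed adjectives with prior polarity scores on a -3..+3 scale.
SEED_ADJECTIVES = {"felice": 2, "gentile": 2, "chiaro": 1}

# Hypothetical prefixes acting as morphological valence shifters.
PREFIXES = {"in": "negate", "stra": "intensify", "semi": "downtone"}


def adverb_in_mente(adjective):
    """Build the -mente adverb with the regular Italian rule."""
    if adjective.endswith("o"):
        return adjective[:-1] + "amente"   # chiaro -> chiaramente
    if (len(adjective) > 2 and adjective.endswith(("le", "re"))
            and adjective[-3] in "aeiou"):
        return adjective[:-1] + "mente"    # gentile -> gentilmente
    return adjective + "mente"             # felice -> felicemente


def shift(score, effect):
    """Negate, intensify or downtone a score, staying within -3..+3."""
    if score == 0:
        return 0
    if effect == "negate":
        return -score
    step = 1 if effect == "intensify" else -1
    magnitude = max(0, min(3, abs(score) + step))
    return magnitude if score > 0 else -magnitude


def expand(seeds):
    """Derive candidate adverbs and prefixed adjectives from the seeds."""
    derived = {}
    for adjective, score in seeds.items():
        derived[adverb_in_mente(adjective)] = score
        for prefix, effect in PREFIXES.items():
            derived[prefix + adjective] = shift(score, effect)
    return derived


print(expand(SEED_ADJECTIVES))
```

A generator of this kind overproduces: the candidates would still have to be checked against a dictionary or a corpus before entering the lexicon.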

3.5.1 Literature Survey on Sentiment Lexicon Propagation

The works on sentiment lexicon propagation follow three main research lines. The first one is grounded on the richness of already existing thesauri, WordNet (Miller 1995) among others. Although WordNet does not include semantic orientation information for its lemmas, semantic relations such as synonymy or antonymy are commonly used to automatically propagate the polarity, starting from a manually annotated set of seed words. However, this approach presents some drawbacks, such as the lack of scalability, the unavailability of sufficient resources for many languages, including Italian, and the difficulty of handling newly coined words, which are not yet contained in the thesauri. Furthermore, homographs that belong to different synsets can present a diversity of meanings and polarities (Silva et al. 2010).

The second line of research is based on the hypothesis that words conveying the same polarity appear close to each other in the same corpus, so the propagation can be performed on the basis of co-occurrence algorithms.

Finally, the morphological approach employs morphological structures and relations for the assignment of prior sentiment polarities to unknown words, on the basis of the manipulation of the morphological structures of known lemmas.


In the following Paragraphs we will briefly describe the most interesting contributions pertaining to the research areas just described.

3.5.1.1 Thesaurus Based Approaches

Kamps et al. (2004) investigated the graph-theoretic model of WordNet’s synonymy relation and, using elementary notions from graph theory, proposed a measure for the automatic attribution of semantic orientation to adjectives. All the words listed in WordNet have been collected and related on the basis of their occurrence in the same synset.

Although current WordNet-based measures of distance or similarity usually focus on taxonomic relations and, in general, include the antonymy relation in their algorithms, the authors preferred to use only the synonymy relation.

Hu and Liu (2004), given their specific interest in product features, chose to limit the lexicon construction to the adjectives occurring in sentences that contained one or more product features. In order to predict the semantic orientations of such adjectives, the authors proposed a simple and effective method grounded on the adjective synonym and antonym sets discovered by navigating the WordNet graphs.

Kim and Hovy (2004) proposed a system for the automatic detection of opinion holders, topics and features for each opinion, which contains two modules: one for the determination of the word sentiment and another for the sentence sentiment.

In their system architecture, the stage of interest for the task at hand is the one in which the word sentiment classifier individually calculates the polarity of all the sentiment-bearing words. Their basic approach started from a small number of seed words sorted by polarity into a positive and a negative list. WordNet has been used to extend the initial list on the basis of two very intuitive insights: synonyms of polar words usually preserve the orientation of the original lemma, and antonyms of polar words reverse the orientation of the original lemmas.

Esuli and Sebastiani (2005) presented a semi-supervised learning method for the identification of the orientation of subjective terms, based on the classification of their glosses.

The authors started from a representative seed set built for both the positive and the negative categories and from some lexical relations defined in WordNet (e.g. synonymy, hypernymy and hyponymy for the annotation of terms with the same orientation, and direct antonymy and indirect antonymy for the terms with opposite orientation). Every time a new term is added to the original ones, it becomes part of the training set for the learning phase, performed with a binary text classifier. All the glosses of a given term found in a machine-readable dictionary are converted into a vectorial form thanks to standard text indexing techniques.

Esuli and Sebastiani (2006b) tested the following approaches:

• a two-stage method in which a first classifier groups the terms into the Subjective or Objective categories and another classifier tags the Subjective terms as Positive or Negative;

• two other binary classifiers based on learning algorithms, which categorise the terms belonging to the Positive/not-Positive categories and the ones fitting the Negative/not-Negative categories;

• a method that simply considers Positive, Negative and Objective as three categories with equal status.

Argamon et al. (2009) grounded on the Appraisal Theory a method for the automatic determination of complex sentiment-related attributes (e.g. type and force). The authors applied supervised learning to WordNet glosses. The method started from small training sets of (positive or negative) seed words and then added to them new sets of terms collected in the WordNet graph along the synonymy and antonymy relations. This way, based on the WordNet glosses, each term can receive a vectorial representation thanks to standard text indexing techniques.

Dragut et al. (2010) based their research on a few basic assumptions:

• the semantic relations between words with known polarities can be used to enlarge sentimental word dictionaries;

• two words are related if they share at least a synset;

• each synset has a unique polarity;

• the deduction can be performed on just a few WordNet semantic relations, e.g. hyponymy, antonymy and similarity.

Thanks to these intuitions, given a sentimental seed dictionary, they deduced approximately an additional 50% of words endowed with a specific polarity on top of the electronic dictionary WordNet.

Hassan and Radev (2010) applied a Markov random walk model to a large word relatedness graph, with the purpose of producing a polarity estimate for subjective words.

In detail, two nodes of their network are linked if they are semantically related. Among the sources of information used as indicators of the relatedness of words we mention WordNet.

The hypothesis of the work is the following: a random walk that starts from a given word is more likely to first hit a word with the same semantic orientation than words with a different orientation.

Paulo-Santos et al. (2011), with a semi-supervised approach, created a small polarity lexicon (3000+ entries, 84.86% accuracy) for the Portuguese language, having as input only a limited collection of resources: a set of ten seed words labelled as positive or negative, a common online dictionary, a set of extraction rules and an intuitive polarity propagation algorithm.

The authors implemented their task by converting the online dictionary into a directed graph in which nodes correspond to words and edges correspond to synonymy, antonymy or other semantic relations between words. Thus, they propagated the polarities of the known seed set to unlabelled words by applying a breadth-first graph traversal.

In their research, Maks and Vossen (2011) explored two approaches for the annotation of the polarity (positive, negative and neutral) of adjective synsets in Dutch, using WordNet as a lexical resource. The first approach is based on the translation of the English SentiWordNet 1.0 (Esuli and Sebastiani 2006b) into Dutch and the respective transfer of the lemmas’ polarity values. In the second approach, WordNet is used in combination with a propagation algorithm that starts with a seed list of synsets of known sentiment and extends the polarity values through WordNet, making use of its lexical relations.
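The thesaurus-based propagation schemes surveyed above share a common core, which can be sketched in Python as a breadth-first traversal in which synonymy preserves the polarity and antonymy reverses it. The sketch does not reproduce any of the cited systems: the tiny Italian word graph, the edge labels and the function `propagate` are hypothetical, and for simplicity the edges are treated as undirected.

```python
from collections import deque

# Hypothetical word graph: (word, word, relation).
EDGES = [
    ("buono", "ottimo", "syn"),
    ("buono", "cattivo", "ant"),
    ("cattivo", "pessimo", "syn"),
]


def propagate(seeds, edges):
    """Spread seed polarities (+1/-1) over the graph, breadth first."""
    graph = {}
    for left, right, relation in edges:
        sign = 1 if relation == "syn" else -1
        graph.setdefault(left, []).append((right, sign))
        graph.setdefault(right, []).append((left, sign))

    polarity = dict(seeds)
    queue = deque(seeds)
    while queue:
        word = queue.popleft()
        for neighbour, sign in graph.get(word, []):
            if neighbour not in polarity:  # first label reached wins
                polarity[neighbour] = polarity[word] * sign
                queue.append(neighbour)
    return polarity


print(propagate({"buono": 1}, EDGES))
```

Since a word keeps the first label that reaches it, conflicting paths are silently ignored here; a real system would need an explicit policy for homographs and contradictory relations.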

3.5.1.2 Corpus Based Approaches

Baroni and Vegnaduzzo (2004) observed the co-occurrences of polar adjectives in subjective texts. The ranking of a large list of adjectives according to a subjectivity score has been implemented without any knowledge-intensive external resource (e.g. lexical databases, human annotation, etc.). The authors made use only of an unannotated list of adjectives and a small seed set of manually selected subjective adjectives.

The main idea of this work is taken from the work of Turney (2001) on co-occurrence statistics applied to the web as a corpus through the Web-based Mutual Information (WMI) method. Baroni and Vegnaduzzo (2004) obtained their subjectivity scores by computing the Mutual Information of pairs of adjectives taken from each set, using frequency and co-occurrence frequency counts on the World Wide Web collected through queries to the AltaVista search engine.

Kanayama and Nasukawa (2006b) proposed an unsupervised method of lexicon construction for the annotation of polar clauses for domain-oriented Sentiment Analysis.

They grounded their work on unannotated corpora and anchored their research on context coherency (the tendency for equal polarities to appear successively in contexts), achieving very satisfying results in terms of Precision.

Qiu et al. (2009) proposed a double propagation method which involved different kinds of relations between both sentiment words and features (words modified by the sentiment words).

With this method it has been possible to locate specific sentiment words in relevant reviews, starting from a small set of seed sentiment words. Their research emphasized the fact that, in reviews, sentiment words always occur in association with features.

Double propagation basically means that both sentiment words and features can be reused to extract new sentiment words and new features, until no more sentiment words or features can be identified to continue the process. The relations are described above all by Dependency Grammars and trees (Tesnière 1959), which represented the basis for the design of the extraction rules.

In detail, the polarity of the new words is inferred using the following rules:

• Heterogeneous rule: the same polarity of known words is assigned to the novel words;

• Homogeneous rule: the same polarity of known words is assigned to the novel words, unless contrary words occur between them;

• Intra-review rule: in the cases in which the polarity cannot be inferred from dependency cues, it is assumed that the sentiment word takes the polarity of the whole review.
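The mutual-information scoring mentioned earlier in this Paragraph for Baroni and Vegnaduzzo (2004) can be illustrated with the standard pointwise formulation computed from raw counts. The Python sketch below is only indicative: the exact score used by the authors may differ, the way the pairwise values are summed is an assumption, and all the counts are invented for the example.

```python
import math


def pmi(joint, count_a, count_b, total):
    """Pointwise mutual information (base 2) from raw counts."""
    if joint == 0 or count_a == 0 or count_b == 0:
        return float("-inf")
    return math.log2(joint * total / (count_a * count_b))


def subjectivity_score(candidate, seeds, counts, joint_counts, total):
    """Sum the PMI of a candidate adjective with every seed adjective."""
    return sum(
        pmi(joint_counts.get((candidate, seed), 0),
            counts[candidate], counts[seed], total)
        for seed in seeds
    )


# Invented frequency counts, standing in for search-engine hits.
TOTAL = 1_000_000
COUNTS = {"splendido": 1000, "quadrato": 1200, "orribile": 800}
JOINT = {("splendido", "orribile"): 40, ("quadrato", "orribile"): 3}

for adjective in ("splendido", "quadrato"):
    print(adjective, subjectivity_score(adjective, ["orribile"], COUNTS, JOINT, TOTAL))
```

With these invented counts, the candidate that co-occurs more often with the subjective seed receives the higher score, which is the ranking principle described above.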

Maks et al. (2012) proposed a method for the Dutch language based on the idea that the words expressing different types of subjectivity are distributed differently depending on the text type; therefore, they grounded their work on the comparison between three corpora: Wikipedia, News and News comments.

In order to perform their task, they used and tested two different calculations generally employed in Keyword Extraction tasks to identify the words that are significantly frequent in a corpus with respect to other corpora: the Log-likelihood Technique and the Percentage Difference Calculation (DIFF). Their research revealed the better performance of the DIFF calculation.

Wawer (2012) introduced, for the Polish language, a novel iterative technique of sentiment lexicon expansion which involved rule mining and contrast sets discovery. The research took advantage of two different textual resources: in a first stage, a small corpus endowed with morpho-syntactic annotations (The National Corpus of Polish), used to acquire candidates for emotive patterns and to evaluate the vocabulary, and then the whole web as a corpus, used to perform the lexical expansion.

The key idea of this work is that, if a word appears in the same extraction patterns as the seeds, then it belongs to the same semantic class.
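The two keyness calculations named above for Maks et al. (2012) can be illustrated in their commonly used two-corpus formulations. The Python sketch below assumes these standard definitions (log-likelihood computed from observed and expected frequencies, percentage difference computed on frequencies normalised by corpus size); whether the cited work used exactly these variants is not stated in the text, and the function names are hypothetical.

```python
import math


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Two-corpus log-likelihood of a word (study corpus a, reference b)."""
    combined = freq_a + freq_b
    expected_a = size_a * combined / (size_a + size_b)
    expected_b = size_b * combined / (size_a + size_b)
    score = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score


def percent_diff(freq_a, size_a, freq_b, size_b):
    """Percentage difference between normalised frequencies."""
    norm_a = freq_a / size_a
    norm_b = freq_b / size_b
    if norm_b == 0:
        return float("inf")  # the word is absent from the reference corpus
    return (norm_a - norm_b) * 100 / norm_b


# A word occurring 30 times in 1000 tokens versus 10 times in 1000 tokens.
print(log_likelihood(30, 1000, 10, 1000))
print(percent_diff(30, 1000, 10, 1000))
```

The first measure expresses how reliable the frequency difference is, while the second expresses how large it is, which is why the two can rank candidate words differently.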

3.5.1.3 The Morphological Approach

Hatzivassiloglou and McKeown (1997) proposed a method for sentiment lexicon expansion based on morphological relations between adjectives. They demonstrated that adjectives related in form almost always have different semantic orientations (e.g. “adequate-inadequate”, “thoughtful-thoughtless”), achieving 97% accuracy.

Moilanen and Pulman (2008) proposed five methods for the assignment of prior sentiment polarities to unknown words, based on known sentiment carriers. They started from a core sentiment lexicon, which contained 41109 entries tagged with positive (+), neutral (N) or negative (-) prior polarities, and then employed a classifier society of five rule-driven classifiers, each of which adopted a specific analytical strategy:

Conversion: estimated zero-derived paronyms by retagging the unknown words with different POS tags.

Morphological derivation: relied on derivational and inflectional morphology and was based on the progressive transformation of words into shorter paronymic aliases, by using affixes and neo-classical combining forms. Some polarity reversal affixes (e.g. -less and not-so-) were also supported.

Affix-like Polarity Markers: computed affix-like sentiment markers (e.g. well-built, badly-behaving), which usually propagate their sentiment orientation across the whole word.

Syllables: split unknown words into individual syllables with a rule-based syllable chunker and then processed them in order to connect them with the words contained in the original lexicon.

Shallow Parsing: split unknown words into virtual sentences and POS-tagged them.

Ku et al. (2009) employed morphological structures and relations for opinion analysis on Chinese words and sentences. They classified the words into eight morphological types through Support Vector Machine (SVM) and Conditional Random Field (CRF) classifiers. Chinese morphological formative structures consist of three major processes: compounding, affixation and conversion. Every word is composed of one or more Chinese characters, on the basis of which the word's meaning can be interpreted.

The authors demonstrated that the injection of morphological information can truly improve the performance of the word polarity detection task.

Neviarouskaya (2010) proposed the expansion of the SentiFul lexicon, grounding the task on the manipulation of the morphological structure (base and affixes) of known lemmas (Plag, 2003). The method relies on the derivation process, a linguistic device for the creation of new lemmas from known ones by adding prefixes or suffixes.

The application of this method resulted in 4000+ new derived and scored sentiment words (1400+ adjectives, almost 500 adverbs, 1800 nouns and almost 350 verbs).

The author distinguished four types of affixes on the basis of their roles in the sentiment feature attribution:

Propagating: the sentiment features are preserved and are inherited by the new derived lexical units, which often belong to different grammatical categories (e.g. en-, -ous, -fy). The scoring function transfers the original polarity score to the new word without any variation.

Reversing: the affixes have the effect of changing the semantic orientation of the original lexeme (e.g. dis-, -less). The scoring function simply switches the original score of the starting lemma.

Intensifying: the sentiment features are strengthened (e.g. over-, super-). The scoring function increases the score of the new word with respect to the score of the original lemma.

Weakening: the sentiment features are decreased (e.g. semi-), so the score of the derived word is proportionally reduced by the scoring function.
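The four affix roles listed above can be pictured as four small scoring functions. The sketch below is only an illustration under stated assumptions: it uses a single signed score in [-1, 1], it models reversal as a sign flip, and it uses a hypothetical `factor` of 0.5 for intensification and weakening. None of these choices is taken from Neviarouskaya (2010).

```python
def score_derived_word(base_score, affix_type, factor=0.5):
    """Score of a derived word, given the signed score of its base
    (assumed to lie in [-1, 1]) and the role of the attached affix."""
    if affix_type == "propagating":
        # The derived word inherits the score unchanged.
        return base_score
    if affix_type == "reversing":
        # The orientation is switched (modelled here as a sign flip).
        return -base_score
    if affix_type == "intensifying":
        # The score grows, but stays inside the assumed [-1, 1] range.
        boosted = base_score * (1 + factor)
        return max(-1.0, min(1.0, boosted))
    if affix_type == "weakening":
        # The score is proportionally reduced.
        return base_score * (1 - factor)
    raise ValueError(f"unknown affix type: {affix_type}")
```

For instance, with a base score of 0.5 the four roles yield 0.5, -0.5, 0.75 and 0.25 respectively.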

In the approach of Neviarouskaya (2010), if a word is not already contained in the SentiFul lexicon, the algorithm checks the presence of the missing lemma in the WordNet database; if the word actually exists there, it is taken into account for a future inclusion into the SentiFul lexicon. Its sentiment orientation is always deduced from the features of the starting word and of the affixes used to derive it.

Neviarouskaya et al. (2011) described methods to automatically generate and score a new sentiment lexicon, SentiFul, and to expand it through direct synonymy and antonymy relations, hyponymy relations, derivation and compounding with known lexical units.

The authors distinguished four types of affixes used to derive new words, depending on the role they play with regard to sentiment features: propagating, reversing, intensifying and weakening. They elaborated the algorithm for the automatic extraction of new sentiment-related compounds from WordNet by using words from SentiFul as seeds for sentiment-carrying base components and by applying the patterns of compound formation.

Wang et al. (2011) proposed a morpheme-based fine-to-coarse strategy for Chinese sentence-level sentiment classification, which used pre-existing sentiment dictionaries in order to extract sentiment morphemes and calculate their polarity intensity. Such morphemes are then reused to evaluate the sentence-level semantic orientation on the basis of the sentiment phrases and their relevant polarity scores.

The authors preferred morphemes to words as the basic tokens for sentiment classification, because they are much less numerous than words.

Wang et al. (2011) derived the semantic orientation of unknown sentiment words on the basis of their component morphemes (Yuen et al., 2004; Ku et al., 2009), thanks to a specific morpheme segmentation module, which applied the Forward Maximum Matching (FMM) word segmentation technique for the decomposition of words into strings of morphemes. Their approach can manage both unknown lexical sentiments and contextual sentiments for sentence-level sentiment classification, by taking into consideration multiple granularity-level sentiments (e.g. sentiment morphemes, sentiment words and sentiment phrases) in a morpheme-based framework.
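Forward Maximum Matching, mentioned above, is a greedy left-to-right segmentation: at each position the longest dictionary entry is taken. The sketch below is a generic version of the technique, not the module of Wang et al. (2011); the function name, the `max_len` parameter and the toy morpheme sets used in the usage note are hypothetical.

```python
def forward_maximum_matching(word, morphemes, max_len=4):
    """Split `word` from left to right, always taking the longest entry
    found in `morphemes`; unknown characters become one-character units."""
    segments = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, then shorter ones.
        for size in range(min(max_len, len(word) - i), 0, -1):
            candidate = word[i:i + size]
            if size == 1 or candidate in morphemes:
                segments.append(candidate)
                i += size
                break
    return segments
```

With the toy set {"un", "happi", "ness"} and `max_len=5`, the string "unhappiness" is split into "un", "happi" and "ness". A sentiment score could then be attached to each segment by looking it up in a morpheme lexicon.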

3.5.2 Deadjectival Adverbs in -mente

As anticipated, this Section aims to enlarge the size of SentIta on the basis of the morphological relations that connect the words and their meanings. In a first stage of the work, more than 5000 labeled adjectives have been used to predict the orientation of the adverbs to which they were morphologically related. This Section explores the rules formalized to perform the task.

Adverbs are morphologically invariable and, consequently, they do not present any inflection. However, the majority of them is characterized by a complex structure that includes an adjective base and the derivational morpheme -mente "-ly":

[[veloce]A -mente]AVV "fast-ly"

[[fragile]A -mente]AVV "delicate-ly"

[[rapido]A -mente]AVV "rapid-ly"

Therefore, thanks to a morphological FSA, in SentIta the dictionary of sentiment adverbs has been derived from the dictionary of adjectives (see Figure 3.6). All the adverbs contained in the Italian dictionary of simple words have been put in a Nooj text, and the above-mentioned grammar has been used to quickly populate the new dictionary by extracting the words ending with the suffix -mente "-ly" (Ricca, 2004b) and by making such words inherit the adjectives' polarity.

The Nooj annotations consisted of a list of 3200+ adverbs that, at a later stage, has been manually checked in order to correct the grammar's mistakes (e.g. vigorosamente "vigorously" and pazzamente "madly", which respectively come from the positive adjective "vigorous" and the negative adjective "mad", become intensifiers as adverbs) and to add the Prior Polarity to the adverbs that did not end with the suffixes used in the grammar (e.g. volentieri "gladly", controvoglia "unwillingly").

The manual check produced a set of 3600+ adverbs of sentiment, which has been completed with 126 adverbs that did not end in -mente.

In detail, the Precision achieved in this task is 99% and the Recall is 88%. The distribution of semantic tags in this dictionary is reported in Table 3.27.

Tag            Automatically generated   Manually adjusted
POS+FORTE                           82                  89
POS                                743                 779
POS+DEB                             61                  55
NEG+DEB                            436                 456
NEG                               1273                1466
NEG+FORTE                          160                 158
FORTE                              366                 482
DEB                                 84                 163
SINTESI                              0                   9
CONFERMA                             0                  17
INVERTE                              0                  19
Tot                               3205                3693
Tot Int                            450                 645
Tot Sent                          2755                3003
Tot Discorso                         0                  42

Table 3.27 Composition of the adverb dictionary

Figure 3.6 Extract of the FSA for the automatic annotation of sentiment adverbs

Figure 3.6 shows the local grammar in which we formalized the rules for the adverb formation. We started the word recognition by anchoring it on the localization of an adjective stem (see the first node on the left, in the variable Agg). Then the automaton continues the word analysis by checking that the adjective stem belongs to the sentiment dictionary (e.g. $Agg=A+POS+DEB). Figure 3.6 represents just an extract of the bigger grammar, which includes not only the positive (POS) adverbs but also the negative ones.

Notice, in the center of the FSA, a multiplication of paths. This depends on the different inflectional classes of the adjectives to which the suffix -mente is attached (Table 3.28).

The rules used in the grammar to derive the adverbs are given in the following (see Tables 3.28 and 3.29):

• class 2, first path: nothing in the adjective changes (e.g. veloc-e, veloce-mente);

• class 2, second path: in the adjectives ending in -re or -le, the -e is deleted [e] (e.g. fragil-e, fragil-mente);

• class 1, third path: the -o is deleted [o] and substituted by the thematic vowel -a- (e.g. rapid-o, rapid-a-mente).

Adj Class   Number=s               Number=p
            Gender=m   Gender=f    Gender=m   Gender=f
1           -o         -a          -i         -e
2           -e         -e          -i         -i

Table 3.28 Adjective's inflectional classes (Salvi and Vanelli, 2004)

Adj Class   Adjective   Deletion   Adverb stem   Thematic vowel   Suffix
1           rapid-o     o          rapid-        -a-              -mente
2           veloc-e     -          veloce-       -                -mente
2           fragil-e    e          fragil-       -                -mente

Table 3.29 Rules for the adverb derivation

Actually, almost nothing about the inflection of the base adjectives has been specified in the FSA. Because the adverbs of sentiment must be identified among the whole list of Nooj adverbs and semantically annotated (and not generated from scratch), we did not find it necessary to recall all the inflectional classes of the adjectives: just the deletion or the conservation of the final vowels was used to select the correct derivational rule.

Mistakes in the annotation concern those exceptions that were not productive enough to deserve a specific path in the local grammar. Examples are the adjectives in -lento: although they have -o as their last vowel, they do not require the thematic vowel -a- but -e- (e.g. violento becomes violent-e-mente rather than violent-a-mente).
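The three derivation paths and the polarity inheritance can be sketched in a few lines of Python. This is a simplified illustration, not the Nooj grammar itself: the function names, the optional `lento_exception` flag and the tags used in the usage note are hypothetical, and the string tests on the final letters stand in for the paths of the FSA.

```python
def derive_mente_adverb(adjective, lento_exception=False):
    """Predict the -mente adverb of a masculine singular adjective,
    following the three paths described in the text."""
    if adjective.endswith("o"):
        # Class 1: delete -o and insert the thematic vowel -a-.
        thematic = "a"
        if lento_exception and adjective.endswith("lento") and adjective != "lento":
            # Optional path for the -lento exception (thematic vowel -e-).
            thematic = "e"
        return adjective[:-1] + thematic + "mente"
    if adjective.endswith(("le", "re")):
        # Class 2, adjectives in -le or -re: delete the final -e.
        return adjective[:-1] + "mente"
    if adjective.endswith("e"):
        # Class 2, other adjectives: nothing changes.
        return adjective + "mente"
    return None

def tag_adverbs(adverbs, adjective_tags):
    """Annotate the adverbs already present in `adverbs` with the tag of
    the adjective they are predicted to derive from."""
    known = set(adverbs)
    tagged = {}
    for adjective, tag in adjective_tags.items():
        adverb = derive_mente_adverb(adjective)
        if adverb in known:
            tagged[adverb] = tag
    return tagged
```

As in the grammar, adverbs are only annotated when they already exist in the adverb list; `tag_adverbs(["velocemente", "volentieri"], {"veloce": "POS"})` tags velocemente and leaves volentieri for the manual check. Without the optional flag, violento yields violentamente, which reproduces the kind of mistake discussed above.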

3.5.2.1 Semantics of the -mente formations

The meaning of the deadjectival adverbs in -mente is not always predictable starting from the base adjectives from which they are derived. The syntactic structures in which they occur also influence their interpretation. Depending on their position in sentences, the deadjectival adverbs can be described as follows:

Adjective modifiers: they modify adjectives or other adverbs, even though two adverbs in -mente almost never appear next to each other.

• degree modifiers

  – intensifiers, e.g. altamente "highly", enormemente "enormously"

  – downtoners, e.g. moderatamente "moderately", parzialmente "partially"

Predicate modifiers: they can be paraphrased as "in a A way", in which A is the adjective base.

• verb arguments, e.g. Maria si comporta perfettamente "Maria acts perfectly"

• extranuclear elements, e.g. Maria si reca settimanalmente a Milano "Maria goes weekly to Milan"

Sentence modifiers: they usually do not modify the sentence content, but they often give it coherence or work as discourse signals.

• evaluative adverbs, e.g. stupidamente "foolishly"

• modal adverbs, e.g. necessariamente "necessarily"

• circumstantial adverbs of time, e.g. ultimamente "lately"

• quantifiers over time, e.g. frequentemente "often"

• domain adverbs, e.g. politicamente "politically"


The French LG description of adverbs in -ment includes 3200+ items represented in LG tables, which classify the adverbs in a similar way. They refer to sentential adverbs (e.g. conjuncts, style disjuncts and attitude disjuncts, which include evaluative adverbs, adverbs of habit, modal adverbs and subject-oriented attitude adverbs) and to adverbs integrated into the sentence (e.g. adverbs of subject-oriented manner, adverbs of verbal manner, adverbs of quantity, adverbs of time, viewpoint adverbs, focus adverbs) (Molinier and Levrier, 2000; Sagot and Fort, 2007).

3.5.2.2 The Superlative of the -mente Adverbs

As concerns the superlative forms of the adverbs in -mente, it must be underlined that they have been treated as adverbs derived from the superlative form of the adjectives. In fact, the rule for the adverb formation is selected by the inflectional paradigm of the superlative form and not by the adjective inflection. Moreover, the semantic orientation inherited by the superlative adverb is again the one belonging to the superlative adjective, and not the one of the adjective itself. Therefore, the FSA of the superlative adverbs is almost the same as the one shown in Figure 3.6; the only difference is in the recognition of the base adjective, which is A+SUP rather than only A.

3.5.3 Deadjectival Nouns of Quality

The derivation of quality nouns (QN) from qualifier adjectives is another derivation phenomenon of which we took advantage for the automatic enlargement of SentIta. These kinds of nouns make it possible to treat as entities the qualities expressed by the base adjectives.

A morphological FSA (Figure 3.7), following the same idea as the adverb grammar, matches in a list of abstract nouns the stems that are in morpho-phonological relation with our list of hand-tagged adjectives. Because the nouns, differently from the adverbs, need their inflection information to be specified, we associated with each suffix entry in the electronic dictionary of QN suffixes the inflectional paradigm that it gives to the words with which it occurs (see Table 3.30).

As regards the suffixes used to form the quality nouns (Rainer, 2004), it must be said that they generally make the new words simply inherit the orientation of the adjectives from which they are derived. Exceptions are -edine and -eria, which almost always shift the polarity of the quality nouns into the weakly negative one (-1), e.g. faciloneria "slapdash attitude". The suffix -mento also differs from the others, in so far as it belongs to the derivational phenomenon of the deverbal nouns of action (Gaeta, 2004). It has been possible to use it in our grammar for the deadjectival noun derivation by using the past participles of the verbs listed in the adjective dictionary of sentiment (e.g. the verb sfinire "to wear out", the adjective sfinito "worn out", the noun sfinimento "weariness"). Basically, we chose to include it in our list of suffixes because of its productivity. This caused an overlap of 475 nouns derived from both the psychological predicates and the qualifier adjectives of sentiment. Just 185 psych nominalizations exceeded the coverage of our morphological FSA, proving the power of our methodology also in terms of Recall.

In about half the cases the annotations differed just in terms of intensity; the other mistakes also affected the orientation (see Table 3.33). In general, the Precision achieved in this task is 93%, while the Precision of the automatic tagging of the QN, compared with the manual tagging made on the corresponding nominalizations of the psych verbs, is summarized in Table 3.34.

Tables 3.31 and 3.32 show the differences in the productivity of all the suffixes (SFX) among the electronic dictionary of abstract nouns (Abstr), the sublist of QN that have their counterparts in the Psy V-n list (Psy V-n=QN) and the whole collection of QN (QN tot).

As regards the productivity of the suffixes in Abstr, the suffixes that achieved the highest percentage values are -ità, -ia and -(z)ione. The most productive suffixes for the Psy V-n=QN formation are -(z)ione, -mento and -ezza, while the most productive ones for the QN tot are -ità, -(z)ione and -mento.

Figure 3.7 Extract of the FSA for the automatic annotation of quality nouns

Suffix (SFX)   Inflectional paradigm (FLX)   Correct matches   Errors   Precision (%)
-edin-e        N46                                         0        0   -
-età           N602                                        0        0   -
-izie          N602                                        0        0   -
-el-a          N41                                         0        1   0
-udin-e        N46                                         5        2   71.43
-or-e          N5                                         36        9   80
-zion-e        N46                                       359       59   85.89
-anz-a         N41                                        57        9   86.36
-itudin-e      N46                                        13        2   86.67
-ur-a          N41                                       142       20   87.65
-ment-o        N5                                        514       58   89.86
-izi-a         N41                                        14        1   93.33
-enz-a         N41                                       148       10   93.67
-eri-a         N41                                        71        4   94.67
-ietà          N602                                       27        1   96.43
-aggin-e       N46                                        72        2   97.3
-i-a           N41                                       145        3   97.97
-ità           N602                                      666       13   98.09
-ezz-a         N41                                       305        2   99.35
-igi-a         N41                                         3        0   100
-z-a           N41                                         2        0   100
tot            -                                        2579      196   92.94

Table 3.30 Error analysis of the automatic QN annotation

Suffix (SFX)   Suffix in Abstr   Suffix in Psy V-n=QN   Suffix in QN
-ezz-a                     498                     73            305
-aggin-e                   149                     13             72
-itudin-e                   31                      1             13
-ietà                       65                      2             27
-igi-a                       9                      0              3
-izi-a                      40                      2             14
-enz-a                     461                     11            148
-anz-a                     189                      7             57
-eri-a                     267                     13             71
-ità                      3263                     53            666
-or-e                      297                      6             36
-zion-e                   2866                     87            359
-ur-a                     1700                     11            142
-udin-e                     39                      3              5
-i-a                      3545                     13            145
-el-a                       15                      0              0
-edin-e                      4                      0              0
-età                        66                      0              0
-izie                        3                      0              0
-z-a                        30                      2              2
-ment-o                   2520                    150            514

Table 3.31 Presence of the QN suffixes in different dictionaries

Suffix (SFX)   Suffix in Abstr (%)   Suffix in Psy V-n=QN (%)   Suffix in QN (%)
-ezz-a                        3.1                      16.33               6.28
-aggin-e                      0.93                      2.91               1.48
-itudin-e                     0.19                      0.22               0.27
-ietà                         0.4                       0.45               0.56
-igi-a                        0.06                      0                  0.06
-izi-a                        0.25                      0.45               0.29
-enz-a                        2.87                      2.46               3.05
-anz-a                        1.18                      1.57               1.17
-eri-a                        1.66                      2.91               1.46
-ità                         20.32                     11.86              13.72
-or-e                         1.85                      1.34               0.74
-zion-e                      17.85                     19.46               7.4
-ur-a                        10.59                      2.46               2.93
-udin-e                       0.24                      0.67               0.1
-i-a                         22.08                      2.91               2.99
-el-a                         0.09                      0                  0
-edin-e                       0.02                      0                  0
-età                          0.41                      0                  0
-izie                         0.02                      0                  0
-z-a                          0.19                      0.45               0.04
-ment-o                      15.69                     33.56              10.59

Table 3.32 Productivity of the QN suffixes in different dictionaries

Correspondences   Differences   Different Orientation   Only Different Intensity
            475           185                      93                         92

Table 3.33 About the overlap among Psych V-n dictionary and QN dictionary

                 Precision   Precision (Orientation only)
QN ≠ Psy V-n     0.92        -
QN = Psy V-n     0.61        0.80

Table 3.34 Precision measure about the overlap of QN and Psy V-n
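The percentages of Table 3.30 are consistent with the usual ratio between correct matches and all matches, which can be verified with a short calculation. The second part of the sketch rests on an assumption of ours: that the two values reported for QN = Psy V-n in Table 3.34 are obtained from the counts of Table 3.33 as agreement rates over the 475 overlapping nouns. This reading reproduces the reported figures, but it is not stated explicitly in the text; all the variable names are hypothetical.

```python
def precision(correct, errors):
    """Share of correct matches over all the matches produced."""
    return correct / (correct + errors)

# Table 3.30: total row and the -udin-e row.
qn_total = precision(2579, 196)
udine = precision(5, 2)

# Table 3.33 counts, read as agreement rates (assumption, see lead-in).
correspondences = 475
differences = 185
different_orientation = 93
full_agreement = (correspondences - differences) / correspondences
orientation_agreement = (correspondences - different_orientation) / correspondences

print(round(qn_total * 100, 2), round(udine * 100, 2))
print(round(full_agreement, 2), round(orientation_agreement, 2))
```

Rounded to two decimals, the four values are 92.94, 71.43, 0.61 and 0.8, in line with Tables 3.30 and 3.34.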

3.5.4 Morphological Semantics

Our morphological FSA can, moreover, interact with a list of prefixes able to negate (e.g. anti-, contra-, non-, etc.) or to intensify/downtone (e.g. arci-, semi-, etc.) the orientation of the words in which they appear (Iacobini, 2004).

While the suffixes for the creation of quality nouns interact with the preexisting dictionaries of the Italian module of Nooj in order to automatically tag them with new semantic descriptions, the prefixes treated in this Section work directly on opinionated documents, so that the machine can understand the actual orientation of the words occurring in real texts, also when their morphological context shifts the polarity of the words listed in the dictionaries.

The collection of the prefixes, which from now on we will call Morphological Contextual Valence Shifters (mCVS), is reported in Tables 3.35 and 3.36.

Prefixes   Negation types
non-       CDD
mis-       CR
a-         CR+PRV
dis-       CR+PRV
in-        CR+PRV
s-         CR+PRV
anti-      OPP
contra-    OPP
contro-    OPP
de-        PRV
di-        PRV
es-        PRV

Table 3.35 Negation prefixes

They are endowed with special tags that specify the way in which they alter the meaning of the sentiment words with which they occur:

• FORTE "strong": intensifies the Semantic Orientation of the words, making their polarity increase by one position in the evaluation scale (first path in Figure 3.8);

• DEB "weak": downtones the Semantic Orientation of the words, making their polarity decrease by one position in the evaluation scale (second path in Figure 3.8);

• NEGAZIONE "negation": works following the same rules as the negation formalised in the syntactic grammars (third path in Figure 3.8):

  – CR "contrary"

  – OPP "opposition"

  – PRV "deprivation"

  – CDD "contradiction"

Intensifiers   Downtoners
iper-          bis-
macro-         fra-
maxi-          infra-
mega-          intra-
multi-         ipo-
oltre-         micro-
pluri-         mini-
poli-          para-
sopra-         semi-
sovra-         sotto-
stra-          sub-
super-         -
sur-           -
ultra-         -

Table 3.36 Intensifier/downtoner prefixes

Figure 3.8 shows the morphological FSA that combines the polarity and intensity of the adjectives of sentiment with the meaning carried by the mCVS.

The shifting rules in terms of polarity score, which are the same as the ones exploited in the syntactic grammar net (see Section 4), are summarized in this grammar through the green comments on the right side of the automaton.

As regards the manipulation of the annotations of the resulting words, we followed three parallel solutions:

• when the score does not change (e.g. the case of the intensification of words with starting polarity +3, or the case in which words with polarity -1 are decreased), the resulting word simply inherits the inflectional ($2F) and the syntactic and semantic ($2S) information of the original word;

• when the intensity changes and the polarity remains steady (e.g. when the words with polarity ±2 are intensified or downtoned), the resulting word inherits the inflectional, syntactic and semantic information of the original word, but also obtains the tags FORTE/DEB;

• when both the intensity and the polarity change (in almost every case in which the words are negated), the resulting word just inherits the inflectional information, while the semantic tags are added from scratch.

Figure 3.8 Morphological Contextual Valence Shifting of Adjectives

We used the list of the sentiment adjectives as a Nooj text in order to check the performance of our grammar. The discovery that the words containing the mCVS that are lemmatised in the dictionary are very few (just 29 adjectives, e.g. straricco "very rich", ultraresistente "heavy duty", stramaledetto "damned", and 9 adverbs, e.g. strabene "very well", ultrapiattamente "very dully") increases the importance of an FSA like the one described in this Paragraph.

It is also important to underline that all the synonyms of poco "few", including the ones that take the shape of morphemes (e.g. ipo-, sotto-, sub-), do not seem to be proper downtoners, but resemble the behaviour of the negation words that transform the sentiment words with which they occur into weakly negative ones, see Section 4.2 (e.g. ipodotato "subnormal", ipofunzionante "hypofunctioning"). Therefore, we excluded them from the dictionary of mCVS and we included them in a dedicated metanode of the morphological FSA, able to correctly compute their meaning.
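The behaviour of the intensifying and downtoning mCVS can be approximated outside Nooj with a simple prefix lookup. The sketch below is an illustration under several assumptions: the prefix lists are taken from Table 3.36 without ipo-, sotto- and sub-, the mapping between scores and tags is inferred from the scale used in this Section, the floor at ±1 for downtoning is inferred from the -1 example given above, and the prior polarities used in the usage note are hypothetical. Negation prefixes are deliberately left out.

```python
INTENSIFYING = ("iper", "macro", "maxi", "mega", "multi", "oltre", "pluri",
                "poli", "sopra", "sovra", "stra", "super", "sur", "ultra")
DOWNTONING = ("bis", "fra", "infra", "intra", "micro", "mini", "para", "semi")

# Assumed correspondence between polarity scores and SentIta-style tags.
TAGS = {3: "POS+FORTE", 2: "POS", 1: "POS+DEB",
        -1: "NEG+DEB", -2: "NEG", -3: "NEG+FORTE"}

def shift_by_prefix(word, lexicon):
    """Return (base, score, tag) for a prefixed word whose base is listed in
    `lexicon` (base -> prior polarity in -3..+3, never 0), otherwise None."""
    prefixes = [(p, +1) for p in INTENSIFYING] + [(p, -1) for p in DOWNTONING]
    # Longer prefixes are tried first, as a precaution against partial matches.
    for prefix, step in sorted(prefixes, key=lambda item: -len(item[0])):
        base = word[len(prefix):]
        if word.startswith(prefix) and base in lexicon:
            prior = lexicon[base]
            sign = 1 if prior > 0 else -1
            # One position up or down, kept between 1 and 3 in absolute value.
            magnitude = min(3, max(1, abs(prior) + step))
            score = sign * magnitude
            return base, score, TAGS[score]
    return None
```

With a hypothetical lexicon `{"ricco": 2, "maledetto": -2}`, the word straricco is analysed as ricco with score +3 (POS+FORTE) and stramaledetto as maledetto with score -3 (NEG+FORTE), while unprefixed words return None.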

Chapter 4

Shifting Lexical Valences through FSA

4.1 Contextual Valence Shifters

In Section 3.3.1 we provided the definitions of Semantic Orientation, the measure of the polarity and intensity of an opinion (Taboada et al., 2011; Liu, 2010), and of Prior Polarity, the context-independent orientation of individual words (Osgood, 1952). While the second concept is relevant during the population of sentiment dictionaries, the first one refers to the final semantic annotation that can be automatically or manually attributed to whole sentences and documents.

The need to make a distinction between these two measures is due to the semantic modifications that can come from the words' surrounding contexts (Pang and Lee, 2008a). Contextual Valence Shifters are linguistic devices able to change the prior polarity of words when co-occurring in the same context (Kennedy and Inkpen, 2006; Polanyi and Zaenen, 2006).

From a computational point of view, Contextual Valence Shifting is nothing more than an enhanced version of the term-counting method first proposed by Turney (2002), but from the distributional analysis perspective it raises a number of challenges that deserve to be examined (Neviarouskaya et al., 2009a).

In this work we handle the contextual shifting by generalizing all the polar words that possess the same prior polarity. A network of local grammars has been designed on a set of rules that compute the words' individual polarity scores according to the contexts in which they occur.

In general, the sentence annotation is performed using six different metanodes that, working as containers for the sentiment expressions, assign equal labels to the patterns embedded in the same graphs.

We used the FSA editor of Nooj to formalize our CVS rules because of its ease of use and its computational power, but nothing prevents the use of other strategies and tools for their application to texts. Consequently, in this chapter we will limit the discussion to the shifting rules, without any further mention of their actual formalization, which can be trivially summarized as the treatment of embedded graphs as score boxes.

4.2 Polarity Intensification and Downtoning

4.2.1 Related Works

Amplifiers and downtoners (Quirk et al., 1985), intensifiers and diminishers (Polanyi and Zaenen, 2006), overstatements and understatements (Kennedy and Inkpen, 2006), intensifiers and downtoners (Taboada et al., 2011) are all pairs of concepts that refer to the increasing or decreasing effects of special kinds of words on oriented lexical items.

Although the changes of the polarity scores are very intuitive, the simple addition and subtraction of a score to/from the base valence of a term (Polanyi and Zaenen, 2006; Kennedy and Inkpen, 2006) has been criticized by Taboada et al. (2011), who instead associated different percentage values (positive for intensifiers and negative for downtoners) with each strength word.

Differently from both of these works, we decided to restrain the complexity of the rules by avoiding the distinction of degrees. In compensation, we did not limit the intensive function only to degree adverbs and adjectives, but we also took verbs and nouns into consideration.
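The difference between the two strategies from the literature can be made concrete with a small comparison. The sketch below is a schematic reading of the two approaches, not a reimplementation of the cited systems; the function names and the step and percentage values used in the usage note are hypothetical.

```python
def additive_shift(valence, step):
    """Additive strategy: move the valence by a fixed step, away from
    zero for intensifiers (step > 0), towards zero for downtoners (step < 0)."""
    sign = 1 if valence > 0 else -1
    return sign * (abs(valence) + step)

def percentage_shift(valence, percent):
    """Percentage strategy: scale the valence by a modifier-specific
    percentage (positive for intensifiers, negative for downtoners)."""
    return valence * (1 + percent / 100)
```

With a base valence of -2, an additive intensifier of one point gives -3, while a hypothetical +50% intensifier gives -3.0 and a hypothetical -50% downtoner gives -1.0. In the percentage strategy each strength word can carry its own value, which is the distinction of degrees that the present work chooses to avoid.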

4.2.2 Intensification and Downtoning in SentIta

The lexical items for the computational treatment of intensification in SentIta are the ones denoted by the tags FORTE and DEB.

As stated in the previous Chapter, Section 3.3, the lexical items endowed with these labels do not possess any orientation, but can modify the intensity of semantically oriented words.

The strength indicators included in the intensification FSA that account for (but are not limited to) such modifiers are the following:

• Morphological Rules

  – the use of the absolute superlative suffixes -issim-o and -errim-o, which always increase the words' semantic orientations;

  – the use of intensifying or downtoning prefixes (e.g. super-, stra-, semi-, micro-), shown in Section 3.5.4.

• Syntactic Rules

  – the repetition of more than one negative or positive word, which always increases the words' semantic orientations (e.g. bello[+2] bello[+2] bello[+2] [+3] "nice nice nice");

  – the co-occurrence of words belonging to the strength scale (tags FORTE/DEB) with the sentiment words listed in the evaluation scale (tags POS/NEG).

As generally assumed in the literature, adverbs intensify or attenuate adjectives (46), verbs (48) and other adverbs (49), while adjectives modify the intensity of nouns (47). We adopted the simplest strategy for the polarity modification: the addition/subtraction of one point in the evaluation scale (Polanyi and Zaenen, 2006; Kennedy and Inkpen, 2006). Because this scale does not exceed ±3, the words that start from this polarity cannot be increased any further (e.g. davvero[+] eccezionale[+3] [+3] "truly exceptional").

(46) Parzialmente[-] deludente[-2] anche il reparto degli attori [-1]
"Partially unsatisfying also the actor staff"

(47) Ciò che ne deriva (...) è una terribile[-2] confusione[-2] narrativa [-3]
"What comes from it is a terrible narrative chaos"

(48) Alla guida ci si diverte[+2] molto[+] [+3]
"In the driver's seat you have a lot of fun"

(49) Ne sono rimasta molto[+] favorevolmente[+2] colpita [+3]
"I have been very favourably affected by it"
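The one-point rule illustrated by examples (46)-(49) can be written down as follows. This is a minimal sketch that assumes one polar word per expression and a list of FORTE/DEB modifiers around it; the function names are hypothetical, and the lower bound of ±1 for downtoning is our reading of the rule rather than a statement of the text.

```python
def intensify(score):
    """Add one point to the intensity without exceeding +3 or -3."""
    sign = 1 if score > 0 else -1
    return sign * min(3, abs(score) + 1)

def downtone(score):
    """Subtract one point from the intensity without going below +1 or -1
    (assumed lower bound)."""
    sign = 1 if score > 0 else -1
    return sign * max(1, abs(score) - 1)

def phrase_score(polar_score, modifiers):
    """Apply each strength modifier ("FORTE" or "DEB") to the prior
    polarity of the polar word and return the resulting orientation."""
    score = polar_score
    for modifier in modifiers:
        if modifier == "FORTE":
            score = intensify(score)
        elif modifier == "DEB":
            score = downtone(score)
        else:
            raise ValueError(f"unknown modifier: {modifier}")
    return score
```

Under these assumptions, `phrase_score(-2, ["DEB"])` gives -1 as in (46), `phrase_score(2, ["FORTE"])` gives +3 as in (48) and (49), and `phrase_score(3, ["FORTE"])` stays at +3 as in davvero eccezionale.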

However, as already discussed in Section 3.3.4, special subsets of verbs also possess the power of intensifying or downtoning the polarity of the nominal groups or complement clauses that they affect, case by case, in subject (50) or object (51) position (see Figure 3.1, Section 3.3.4).

(50) L'amore[+2] travolge[+] Maria [+3]
"The love overwhelms Maria"

(51) Le tue parole invadono[+] Maria di contentezza[+2] [+3]
"Your words overwhelm Maria with happiness"

Examples of intensifying/downtoning verbs, with the LG classes to which they belong, are reported below:

• the list of entries translated from the French work of Balibar-Mrabti (1995), e.g. travolgere[+] "to overwhelm", risplendere[+] "to shine";

• verbs from LG class 24 (15 intense, 2 weak), e.g. aumentare[+] "to increase", gonfiare[+] "to amplify", calare[–] "to go down", diminuire[–] "to decrease";

• verbs from LG class 41 (45 intense, 19 weak), e.g. rivoluzionare[+] "to revolutionize", sconquassare[+] "to smash", moderare[–] "to moderate", placare[–] "to appease";

• verbs from LG class 47 (43 intense, 18 weak), e.g. strillare[+] "to shout", sventolare[+] "to show off", mormorare[–] "to murmur", sussurrare[–] "to whisper";

• verbs from LG class 50 (4 intense, 0 weak), e.g. assicurare[+] "to assure", garantire[–] "to guarantee";

• verbs from LG class 56 (5 intense, 3 weak), e.g. abbandonarsi[+] "to abandon yourself", affrettarsi[+] "to rush", accennare[–] "to touch upon", esitare[–] "to hesitate".

As regards intensifying or downtoning nouns, we refer to the ones automatically derived from degree adjectives (220 intense, 95 weak), e.g. sfrenatezza[+] "wildness", abbondanza[+] "abundance", labilità[–] "evanescence", fugacità[–] "fugacity", and to the nominalizations of intensifying or downtoning verbs (41 intense, 95 weak), e.g. rafforzamento[+] "strengthening", fervore[+] "fervor", affievolimento[–] "weakening", diminuzione[–] "decrease".

They can modify both other nouns (52) and verbs (53).

(52) La sfrenatezza[+] dell'odio[-3] di Maria [-3]
"The wildness of Maria's hate"

(53) Luca difendeva[+2] Maria con fervore[+] [+3]
"Luca was defending Maria with fervour"

Intensification, negation, modality and comparison can appear together in the same sentence. We will describe these cases in the following paragraphs.

4.2.3 Excess Quantifiers

Words that at first glance seem to be intensifiers, but at a deeper analysis reveal a more complex behavior, are abbastanza "enough", troppo "too much" and poco "not much". Because of their particular influence on polar words, their meaning, especially when associated with polar adjectives, has been associated in the literature with other contextual valence shifters.¹ Examples are Meier (2003, p. 69), who proposed an interpretation that includes a comparison between extents:

"Enough, too, and so are quantifiers that relate an extent predicate and the incomplete conditional (expressed by the sentential complement) and are interpreted as comparisons between two extents. The first extent is the maximal extent that satisfies the extent predicate. The second extent is the minimal or maximal extent of a set of extents that satisfy the (hidden) conditional, where the sentential complement supplies the consequent and the main clause the antecedent. [...] Intuitively, the value an object has on a scale associated with the meaning of the adjective or adverb is related to some standard of comparison that is determined by the sentential complement." (Meier, 2003, p. 69)

Schwarzschild (2008, p. 316), who noticed the possibility of an implicit negation:

¹ When quantifiers like the ones under examination modify polar words, they compare such words with a threshold value that is defined by an infinitive clause in position N1 (e.g. The food is too good to throw (it) away) (Meier, 2003; Schwarzschild, 2008).

“In excessives, the infinitival complement, while it does not form a threshold description, it does contain an implicit negation. So, like other negative statements, excessives support let alone rejoinders2.”

And again Meier (2003, p. 71), who transformed the infinitive clause into a modal expression:

“In this paper, I will propose that the sentential complements of constructions with too and enough implicitly or explicitly contain a modal expression. [...] Evidence for this move is the fact that a modal expression (with existential force) can be added or omitted without changing the intuitive meaning of the sentences. [...] An explicit modal expression like be able or be allowed in the sentential complement makes them more precise, but it does not make them unacceptable.”

In this research we also noticed that the co-occurrence of troppo, poco and abbastanza with polar lexical items can produce effects on their semantic orientation that can be associated with other contextual valence shifters. The ad hoc rules dedicated to these words (see Table 4.1) are not actually new, but refer to other contextual valence shifting rules that have been discussed in this Paragraph and/or will be explored in this Chapter.
Troppo and poco can also co-occur or be negated; the rules to be applied in each case are reported in Table 4.2.
Troppo, poco and abbastanza are also able to give a polarity to words that in SentIta possess a neutral Prior Polarity. An exception is the case of non troppo + neutral words (Table 4.2), which does not produce generalizable effects (e.g. non troppo decorato “not so decorated” could have a weakly positive orientation, while the same cannot be said of non

2 This fuel is too volatile to use in a car engine = This fuel is too volatile to use in a car engine, let alone a lawn mower engine = This fuel should not be used in a car engine, let alone a lawn mower engine. Example from Schwarzschild (2008).

             Positive Word            Neutral Word             Negative Word
Examples     bello                    decorato                 brutto
troppo       Intensification          Negative Switch          Intensification
poco         Weakly Negative Switch   Weakly Negative Switch   Weak Negation
abbastanza   Downtoning               Weakly Positive Switch   Intensification

Table 4.1: The effects of troppo, poco and abbastanza on polar lexical items

                 Positive Word     Neutral Word             Negative Word
Examples         bello             decorato                 brutto
troppo poco      Strong Negation   Negative Switch          Downtoning
un poco troppo   Intensification   Negative Switch          Intensification
non troppo       Negative Switch   -                        Weak Negation
non poco         Intensification   Weakly Positive Switch   Intensification

Table 4.2: The effects of the co-occurrence and the negation of troppo and poco with reference to polar lexical items

troppo nuovo “not so new”). Therefore, it has been excluded from the rules in order to avoid additional errors.
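The effects listed in Tables 4.1 and 4.2 can be read as a lookup keyed by the quantifier pattern and by the polarity class of the modified word. The following Python fragment is only an illustrative sketch of that reading: the dictionary layout and the names `QUANTIFIER_EFFECTS` and `quantifier_effect` are assumptions introduced here, whereas the actual rules are formalized as FSA in the grammars.

```python
# Effect labels are copied from Tables 4.1 and 4.2. The data structure and the
# function name are hypothetical; they only illustrate how the tables are read.
QUANTIFIER_EFFECTS = {
    # Table 4.1: single quantifiers
    "troppo": {"positive": "Intensification",
               "neutral": "Negative Switch",
               "negative": "Intensification"},
    "poco": {"positive": "Weakly Negative Switch",
             "neutral": "Weakly Negative Switch",
             "negative": "Weak Negation"},
    "abbastanza": {"positive": "Downtoning",
                   "neutral": "Weakly Positive Switch",
                   "negative": "Intensification"},
    # Table 4.2: co-occurrence and negation of troppo and poco
    "troppo poco": {"positive": "Strong Negation",
                    "neutral": "Negative Switch",
                    "negative": "Downtoning"},
    "un poco troppo": {"positive": "Intensification",
                       "neutral": "Negative Switch",
                       "negative": "Intensification"},
    # "non troppo" + neutral word is excluded from the rules (no general effect)
    "non troppo": {"positive": "Negative Switch",
                   "neutral": None,
                   "negative": "Weak Negation"},
    "non poco": {"positive": "Intensification",
                 "neutral": "Weakly Positive Switch",
                 "negative": "Intensification"},
}


def quantifier_effect(pattern, polarity_class):
    """Return the effect label for a quantifier pattern, or None if no rule applies."""
    return QUANTIFIER_EFFECTS[pattern][polarity_class]
```

For instance, `quantifier_effect("troppo", "neutral")` returns the label "Negative Switch", while `quantifier_effect("non troppo", "neutral")` returns `None`, mirroring the case excluded from the rules.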

4.3 Negation Modeling

4.3.1 Related Works

Negation modeling is an NLP research area that includes the automatic detection of negation expressions in raw texts and the determination of the negation scopes, that is, the parts of the meaning modified by negation (Jia et al., 2009).
Negation meets the Sentiment Analysis need of determining the correct polarity of opinionated words when shifted by the context. That is why we included it in our set of Contextual Valence Shifters (Polanyi and Zaenen, 2006; Kennedy and Inkpen, 2006).
In this paragraph we will survey the literature on negation modeling, going in depth through the state-of-the-art methods for the automatic

43 NEGATION MODELING 161

negation detection, from the pioneering work of Pang and Lee (2004) to more sophisticated techniques, such as some Semantic Composition methods, which even define the syntactic contexts of the polar expressions (Choi and Cardie, 2008).
Despite the large interest in negation in the biomedical domain (Harkema et al., 2009; Desclés et al., 2010; Dalianis and Skeppstedt, 2010; Vincze, 2010), maybe due to the availability of free annotated corpora (Vincze et al., 2008), this paragraph focuses on the most significant works on sentiment analysis.

Bag of Words: Pang and Lee (2004) demonstrated that using contextual information can improve the polarity-classification accuracy. By means of a standard bag-of-words representation, they handled negation modeling by adding artificial words to texts, without any explicit knowledge of polar expressions (e.g. I do not NOT_like NOT_this NOT_new NOT_Nokia NOT_model). The problem with this method is the duplication of the features, which are always represented in both their plain and negated occurrences (Wiegand et al., 2010).
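The artificial-word strategy can be illustrated with a few lines of Python. This is a minimal sketch rather than the implementation used in the cited work: the small list of negation cues and the choice of closing the negated span at the next punctuation mark are assumptions made here for the sake of the example.

```python
# Hypothetical cue list and scope delimiter: both are assumptions of this sketch.
NEGATION_CUES = {"not", "no", "never"}
PUNCTUATION = {".", ",", ";", ":", "!", "?"}


def mark_negation(tokens):
    """Prefix with NOT_ every token between a negation cue and the next punctuation."""
    marked = []
    negated = False
    for token in tokens:
        if token in PUNCTUATION:
            negated = False          # punctuation closes the negated span
            marked.append(token)
        elif token.lower() in NEGATION_CUES:
            negated = True           # the cue itself is left unchanged
            marked.append(token)
        elif negated:
            marked.append("NOT_" + token)
        else:
            marked.append(token)
    return marked


print(" ".join(mark_negation("I do not like this new Nokia model".split())))
# prints: I do not NOT_like NOT_this NOT_new NOT_Nokia NOT_model
```

The output makes the feature duplication visible: like and NOT_like become two unrelated entries in the bag-of-words vocabulary.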

Contextual Valence Shifters: Polanyi and Zaenen (2006) and Kennedy and Inkpen (2006) proposed a more sophisticated method that either switches (e.g. clever[+2] = not clever[-2]) or shifts (efficient[+2] = rather efficient[+1]) the initial scores of words when negated or downtoned in context. As a general rule, polarized expressions are considered to be negated if the negation words immediately precede them.
Wilson et al. (2005, 2009) distinguished the following three types of binary relationship polarity features for negation modeling:

• negation features: negation expressions that negate polar expressions;

• shifter features: expressions that can be approximately equated with downtoners;

• positive and negative polarity modification features: a special kind of polar expressions that modify other polar expressions, turning their polarity into the positive or negative one.

Semantic Composition: Moilanen and Pulman (2007) exploited the syntactic representations of sentences in a compositional semantics framework, with the purpose of computing the polarity of headlines and complex noun phrases. In this work, composition rules are incrementally applied to constituents, depending on the negation scope, by means of negation words, shifters and negative polar expressions.
A similar approach is the one of Shaikh et al. (2007), who used verb frames as a more abstract level of representation.
Choi and Cardie (2008) computed the polarity of phrases in two steps: the assessment of the polarity of the constituents and the subsequent application of a set of previously defined inference rules. For example, the rule [Polarity([NP1]- [IN] [NP2]-) = +] can be applied to phrases like [[lack]-NP1 [of]IN [crime]-NP2 in rural areas].
Liu and Seneff (2009) and Taboada et al. (2011) proposed compositional models able to mix together intensifiers, downtoners, polarity shifters and negation words, always starting from the prior polarities and intensities of words.
The work of Benamara et al. (2012) goes beyond the studies mentioned above, since they specifically account for negative polarity items and for multiple negatives.
Socher et al. (2013) exploited a Recursive Neural Tensor Network (RNTN) to compute two types of negation:

• negation of positive sentences, where the negation is supposed to change the overall sentiment of a sentence from positive to negative;

• negation of negative sentences, where the overall sentiment is considered to become less negative (but not necessarily positive).

Negation Scope Detection: Jia et al. (2009) introduced the concept of the scope of the negation term t. The scope identification3 goes through the computation of a candidate scope (a subset of the words appearing after t in the sentence, which represents the minimal logical units of the sentence containing the scope) and then a pruning of those words. The computational procedure that approximates the candidate scope considers the following elements: static delimiters, e.g. because or unless, that signal the beginning of another clause; dynamic delimiters, e.g. like and for; and heuristic rules focused on polar expressions that involve sentimental verbs, adjectives and nouns.

Morphological Negation Modeling: In order to complete the literature survey on negation modeling, we cannot fail to mention morphological negation modeling as well. However, because the scope of any morphological negation remains confined within simple words (Councill et al., 2010), we examined this aspect of negation in Sections 3.5.1 and 3.5.4 of Chapter 3. We mention again here only the most significant contributions on this topic: Hatzivassiloglou and McKeown (1997), Plag (2003), Yuen et al. (2004), Moilanen and Pulman (2008), Ku et al. (2009), Neviarouskaya (2010), Neviarouskaya et al. (2011) and Wang et al. (2011).

4.3.2 Negation in SentIta

As exemplified in the following sentences, negation indicators do not always change a sentence polarity into its positive or negative counterpart (54); they often have the effect of increasing or decreasing the sentence score (55). That is why we prefer to talk about valence “shifting” rather than “switching”.

3 Scope Identification concerns the determination, at sentence level, of the tokens that are affected by negation cues (Jia et al., 2009).

(54) Citroen non[neg] produce auto valide[+2] [-2]
“Citroen does not produce efficient cars”

(55) Grafica non proprio[neg] spettacolare[+3] [-1]
“The graphic not quite spectacular”

Taboada et al. (2011, p. 277) made comparable considerations:

“Consider excellent, a +5 adjective. If we negate it, we get not excellent, which intuitively is a far cry from atrocious, a -5 adjective. In fact, not excellent seems more positive than not good, which would negate to a -3. In order to capture these pragmatic intuitions, we implemented another method of negation, a polarity shift (shift negation). Instead of changing the sign, the SO value is shifted toward the opposite polarity by a fixed amount (in our current implementation, 4). Thus a +2 adjective is negated to a -2, but the negation of a -3 adjective (for instance, sleazy) is only slightly positive, an effect we could call ‘damning with faint praise’. [...] it is very difficult to negate a strongly positive word without implying that a less positive one is to some extent true, and thus our negator becomes a downtoner.”
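The difference between the two strategies mentioned in the quotation can be made concrete with a short Python sketch. The function names are hypothetical, and the treatment of a neutral score (left at 0) is an assumption of this example; only the fixed amount of 4 and the sample values come from the quoted passage.

```python
def switch_negation(so):
    """Polarity reversal: the sign of the semantic orientation (SO) is flipped."""
    return -so


def shift_negation(so, amount=4):
    """Shift negation: move the SO value toward the opposite polarity by a fixed amount."""
    if so > 0:
        return so - amount
    if so < 0:
        return so + amount
    return 0  # assumption of this sketch: a neutral score stays neutral


for so in (5, 2, -3):
    print(so, switch_negation(so), shift_negation(so))
# prints:
# 5 -5 1
# 2 -2 -2
# -3 3 1
```

With the shift strategy, not excellent (+5) ends at +1 instead of -5, and the negation of a -3 adjective is only slightly positive (+1), as described by the authors.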

Just as Benamara et al. (2012, p. 11):

“(...) negation cannot be reduced to reversing polarity. For example, if we assume that the score of the adjective “excellent” is +3, then the opinion score in “this student is not excellent” cannot be -3. It rather means that the student is not good enough. Hence, dealing with negation requires to go beyond polarity reversal.”

Even though the subtraction of a fixed amount of polarity points4 seems to us a risky generalization, the authors confirmed Horn’s ideas on the asymmetry between affirmative and negative sentences in natural language, differently from standard logic (Horn, 1989).
Such an assumption complicates any attempt to automatically extract the meaning of negated phrases and sentences by simply switching their affirmative scores.
Our hypothesis, which fits easily into the compositional semantics approaches, is that the final polarity of a negated expression can be easily modulated, but only by taking into account at the same time both the prior polarity of the opinionated expressions and the strength of the negation indicators (see Tables 4.3, 4.5 and 4.6).
Despite this complexity, the finite-state technology offered us the opportunity to easily compute the negation influence on polarized expressions and to formalize the rules to automatically annotate real texts.
As regards the full list of negation indicators, we included in our grammar three types of negation (Benamara et al., 2012; Godard, 2013), which respectively include the following items. As we will show, each type has a different impact on the opinion expressions in terms of polarity and intensity.

4 Other strategies for automatic negation modeling have been proposed by Yessenalina and Cardie (2011), who combined words using iterated matrix multiplication, and by Benamara et al. (2012), who, more in line with our research, computed negation on the basis of its types.

Negation operators

• simple adverbs of negation:
no,AVV+NEGAZIONE
non,AVV+NEGAZIONE
mica,AVV+NEGAZIONE
affatto,AVV+NEGAZIONE+FORTE
neanche,AVV+NEGAZIONE
neppure,AVV+NEGAZIONE
manco,AVV+NEGAZIONE
senza,PREP+NEGAZIONE5

• compound adverbs of negation:
neanche per sogno,AVV+AVVC+PC+PN+NEGAZIONE+FORTE
per niente al mondo,AVV+AVVC+PCPC+PNPN+NEGAZIONE+FORTE
per nulla al mondo,AVV+AVVC+PCPC+PNPN+NEGAZIONE+FORTE
in nessun modo,AVV+AVVC+PDETC+PDETN+NEGAZIONE+FORTE
per niente,AVV+AVVC+PC+PAVV+NEGAZIONE+FORTE
per nulla,AVV+AVVC+PC+PAVV+NEGAZIONE+FORTE
niente affatto,AVV+AVVC+CC+AVVAVV+NEGAZIONE+FORTE6

5 “not”, “at all”, “by no means”, “neither”

Negation operator Rules: Negative operators can appear alone (56) or, with a strengthening function, together with another negation adverb (57).

(56) {Mica / Affatto / Per niente} bello l’hotel
“Totally not cool the hotel”

(57) L’hotel non è {mica / affatto / per niente} bello
“The hotel is not cool at all”

The general rules concerning negation operators, formalized in the grammars in the form of FSA, are summarized in Tables 4.3, 4.4, 4.5 and 4.6. The local grammar principle is to put in the same embedded graph all the structures that are supposed to receive the same score. Such score is indicated in the column Shifted Polarity.
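Outside the finite-state formalism, the rules of Tables 4.3 to 4.6 can be approximated by plain lookup tables that map a prior polarity to its shifted polarity. The Python fragment below is a sketch under that assumption: the scores are copied from the tables, while the names `NEGATION_RULES` and `negate` and the three strength labels are hypothetical and do not belong to the grammars.

```python
# Prior polarity -> shifted polarity, copied from Tables 4.3-4.6.
# The labels "simple", "strong" and "weak" are illustrative assumptions.
NEGATION_RULES = {
    # Table 4.3: simple operator (e.g. non)
    "simple": {3: -1, 2: -2, 1: -2, -1: 1, -2: 1, -3: -1},
    # Tables 4.4 and 4.5: repeated operators (e.g. non ... mica)
    # and strong operators (e.g. per niente) share the same scores
    "strong": {3: -2, 2: -3, 1: -3, -1: 2, -2: 2, -3: 1},
    # Table 4.6: weak operator (e.g. poco); it also covers neutral words
    "weak": {3: -1, 2: -1, 1: -1, 0: -1, -1: 1, -2: 1, -3: -1},
}


def negate(prior_polarity, strength="simple"):
    """Return the shifted polarity of a word negated by an operator of the given strength."""
    rules = NEGATION_RULES[strength]
    if prior_polarity not in rules:
        raise ValueError(f"no {strength} negation rule for polarity {prior_polarity}")
    return rules[prior_polarity]


print(negate(2))            # non + valide[+2], as in example (54): prints -2
print(negate(3))            # non proprio + spettacolare[+3], as in example (55): prints -1
print(negate(3, "strong"))  # per niente + fantastico[+3]: prints -2
```

Keeping one table per operator strength mirrors the local grammar principle stated above, where structures that receive the same score share the same embedded graph.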

Negative Quantifiers

• Adjectives:
nessuno,A+FLX=A114+NEGAZIONE
niente,A+FLX=A605+NEGAZIONE7

6 “no way”, “for anything in the world”, “there’s no way”, “for nothing”, “not at all”
7 “nobody”, “nothing”

Negation Operator   Sentiment Word   Word Polarity   Shifted Polarity
non                 fantastico       +3              -1
                    bello            +2              -2
                    carino           +1              -2
                    scialbo          -1              +1
                    brutto           -2              +1
                    orribile         -3              -1

Table 4.3: Negation rules

Negation Operator   Negation Operator   Sentiment Word   Word Polarity   Shifted Polarity
non                 mica                fantastico       +3              -2
                                        bello            +2              -3
                                        carino           +1              -3
                                        scialbo          -1              +2
                                        brutto           -2              +2
                                        orribile         -3              +1

Table 4.4: Negation rules with the repetition of negative operators

Strong Negation Operator   Sentiment Word   Word Polarity   Shifted Polarity
per niente                 fantastico       +3              -2
                           bello            +2              -3
                           carino           +1              -3
                           scialbo          -1              +2
                           brutto           -2              +2
                           orribile         -3              +1

Table 4.5: Negation rules with strong operators

Weak Negation Operator   Sentiment Word   Word Polarity   Shifted Polarity
poco                     fantastico       +3              -1
                         bello            +2              -1
                         carino           +1              -1
                         nuovo             0              -1
                         scialbo          -1              +1
                         brutto           -2              +1
                         orribile         -3              -1

Table 4.6: Negation rules with weak operators

• Adverbs:
niente,AVV+NEGAZIONE
nulla,AVV+NEGAZIONE
poco,AVV+NEGAZIONE+DEB
poco o niente,AVV+PCONG+NKN+NEGAZIONE+DEB
poco o nulla,AVV+PCONG+NKN+NEGAZIONE+DEB
mai,AVV+NEGAZIONE8

• Pronouns:
nessuno,PRON+FLX=PRO717+NEGAZIONE
niente,PRON+FLX=PRO719+NEGAZIONE
poco,PRON+FLX=PRO721+NEGAZIONE+DEB9

Negative Quantifiers Rules: As indicated in the dictionary extract above and as exemplified in the sentences below, negative quantifiers, which express both a negation and a quantification, can take the shape of different parts of speech and, consequently, can assume different syntactic positions in sentences.
We can directly observe that they change their function on the basis of the different roles they assume.

Negative quantifiers as Adjectives: they tend to be pragmatically negative when occurring as adjectival modifiers (58, 59), just in the same way as lexical negation indicators (Potts, 2011).

8 “nothing”, “little”, “little or nothing”, “never”
9 “not one”, “few”

(58) Cortesia nessuna [-2]
“No kindness”
(59) Nessun servizio nelle stanze [-2]
“No room service”

Negative quantifiers as Adverbs: they work exactly as poco when they occupy the adverbial position (60, 61), following the rules formalized in the next paragraph (see Table 4.6). As adverbs, they can occur together with negation operators (61) or not (60).

(60) Costa quasi niente [+1]
“It costs almost nothing”
(61) Non costa nulla [+2]
“It doesn’t cost anything”

Negative quantifiers as Pronouns: they seem to possess the same strengthening function as negative operators when occurring as pronouns (62, 63); the rules of reference are in Table 4.5.

(62) Non lo consiglio a nessuno [-3]
“I don’t recommend it to anyone”
(63) Non gliene frega niente a nessuno [-3]
“Nobody cares”

Negative quantifiers in verbless sentences: in elliptical sentences (64, 65) they resemble the negation function of negative operators, following the rules of Table 4.3.

(64) Niente di veramente innovativo [-1]
“Nothing truly innovative”
(65) Nulla da eccepire [+1]
“No objections”

Lexical Negation

Nouns: mancanza, assenza, carenza
Verbs: mancare, difettare, scarseggiare
Adjectives: mancante, carente, privo
+ di10

Lexical negation Rules: lexical negation regards content word negators, which always possess an implicit negative influence. Therefore, when they occur in texts, their polarity is assumed to be negative.
Because they can co-occur with both the negative operators and the quantifiers, their polarity can also be shifted to the positive one by simply applying the negation rules conceived for the first two types of negation indicators.

4.4 Modality Detection

4.4.1 Related Works

Speculative Language Detection (Dalianis and Skeppstedt, 2010; Desclés et al., 2010; Vincze et al., 2008), Hedge Detection (Lakoff, 1973; Ganter and Strube, 2009; Zhao et al., 2010), Irrealis Blocking (Taboada et al., 2011) and Uncertainty Detection (Rubin, 2010) are all tasks that completely or partially overlap with the treatment of modal constructions.

10 “lack”, “to lack in”, “to be lacking of”

44 MODALITY DETECTION 171

In order to account for the most popular annotated corpora in these NLP fields, we mention the BioScope corpus (Vincze et al., 2008), which has been annotated with negation and speculation cues and their scopes; FactBank, which has been annotated with event factuality (Saurí and Pustejovsky, 2009); and the CoNLL Shared Task 2010 (Farkas et al., 2010), which focused on hedge detection in natural language texts.
Modality has been organized by Kobayakawa et al. (2009) into a taxonomy that includes request, recommendation, desire, will, judgment, etc.
Taboada et al. (2011) did not consider irrealis markers to be reliable for the purposes of sentiment analysis. Therefore, they simply ignored the semantic orientation of any word in their scope. Their list of irrealis markers includes modals, conditional markers (e.g. “if”), negative polarity items (e.g. “any” and “anything”), some verbs (“to expect”, “to doubt”), questions, and words enclosed in quotes (which could not necessarily reflect the author’s opinion).
According to Benamara et al. (2012), modality can be used to express possibility, necessity, permission, obligation or desire through grammatical cues such as adverbial phrases (e.g. “maybe”, “certainly”), conditional verbal moods, some verbs (e.g. “must”, “can”, “may”), and some adjectives and nouns (e.g. “a probable cause”).
Relying on the categories of Larreya (2004) and Portner (2009), they distinguished the following three types of modality, each of which has been indicated to possess a specific effect on the opinion expression in its scope, in terms of strength or degree of certainty:

Buletic Modality: it indicates the speaker’s desires/wishes.
Cues: verbs denoting hope (e.g. “to wish”).

Epistemic Modality: it refers to the speaker’s personal beliefs and affects the strength and the certainty degree of opinion expressions.
Cues: doubt, possibility or necessity adverbs (e.g. “perhaps”, “definitely”) and modal verbs (e.g. “have to”, “may/can”).

Deontic Modality: it indicates possibility/impossibility or obligation/permission.
Cues: the same modal verbs of the epistemic modality, but with a deontic reading (e.g. “You must go see the movie”).

4.4.2 Modality in SentIta

When computing the Prior Polarities of the SentIta items in their textual context, we considered that modality can also have a significant impact on the SO of sentiment expressions.
In line with the literature trends, but without specifically focusing on the modality categories of Benamara et al. (2012), we included in the FSA dedicated to modality the following linguistic cues, and we made them interact with the SentIta expressions:

Sharpening and Softening Adverbs

• simple adverbs (32 sharp and 12 soft):
inequivocabilmente,AVV+FORTE+Focus=SHARP
necessariamente,AVV+FORTE+Focus=SHARP
relativamente,AVV+DEB+Focus=SOFT
suppergiù,AVV+DEB+Focus=SOFT

• compound adverbs (38 sharp and 26 soft):
con ogni probabilità,AVV+AVVC+PDETC+PDETN+Focus=SHARP
senza alcun dubbio,AVV+AVVC+PDETC+PDETN+Focus=SHARP
fino a un certo punto,AVV+AVVC+PAC+PDETAGGN+Focus=SOFT
in qualche modo,AVV+AVVC+PDETC+PDETN+Focus=SOFT

Modal Verbs from the LG class 56 (dovere “have to”, potere “may/can”, volere “want”), which have as definitional structure N0 V (E + Prep) Vinf W and accept in position N1 a direct or prepositional infinitive clause (all) and, only in the cases of potere and volere, an N1-um.

Conditional and Imperfect Tenses: tenses are located, after the preprocessing of the texts, through the annotations Tempo=IM and Tempo=C.

Dealing with modality is very challenging in the Sentiment Analysis field because, in this research area, it is not enough to simply detect its indicators: it is also necessary to manage its effects in terms of polarity. We did not find it appropriate to simplify the problem by ignoring the semantic orientation of all the phrases and sentences in which modality can be detected (Taboada et al., 2011). In fact, as exemplified below, there are cases in which modality significantly affects the intensity of a sentence and other cases in which it regularly shifts the sentence polarity in specific directions.

Uncertainty Degrees: the words annotated with the labels +SHARP and +SOFT described above, just like intensifiers and downtoners, have the power of modifying the intensity of the words endowed with positive or negative prior polarities, with the same effects described in Section 4.2. We selected them, respectively, among the words tagged with the labels FORTE and DEB, with the only aim of providing an exhaustive module that can work well for both Modality Detection and Sentiment Analysis purposes.

“Potere” + Indicative Imperfect + Oriented Item: modal verbs occurring with an imperfect tense can turn a sentence into a weakly negative one (66) when combined with positive words, or into a weakly positive one when occurring with negative items (67). The effect is really close to the weak negation (see Table 4.6 in Section 4.3). This can be explained by the fact that the meaning of the sentences of this kind always concerns subverted expectations.

(66) Poteva[Modal+IM] essere una trama interessante[+2] [-1]
“It could have been an interesting plot”

(67) Poteva[Modal+IM] essere una trama noiosa[-2] [+1]
“It could have been a boring plot”

“Potere” + Indicative Imperfect + Comparative + Oriented Items: also in this case we use the explanation of subverted expectations. The terms of comparison are obviously the expected opinion object and the real one. If the expected object is better than the real one, the sentence score is negative (68); if it is worse, the sentence score is positive (69). The rules for the polarity score attribution are the ones displayed in Section 4.5.

(68) Poteva[Modal+IM] essere più[I-CW] dettagliata[+2] [-1]
“It might have been more detailed”

(69) Poteva[Modal+IM] andare peggio[I-OpW +2] [-1]
“It might have gone worse”

“Dovere” + Indicative Imperfect: Here again we face the disappointment of expectations, but with a stronger effect: the negation does not regard the possibility of the opinion object being good, but a necessity. Therefore, we assumed the final score in these cases to follow the negation rules shown in Table 4.3 in Section 4.3 (70).

(70) Questo doveva[Modal+IM] essere un film di sfumature[+1] [-2]
“This one was supposed to be a nuanced movie”

“Dovere” + “Potere” + Past Conditional: Past conditional sentences (Narayanan et al., 2009) also have a particular impact on the opinion polarity, especially with the modal verb dovere (“have to”). In such cases, as exemplified in (71), the sentence polarity is always negative in our corpus.

45 COMPARATIVE SENTENCES MINING 175

(71) Non[Negation] avrei[Aux+C] dovuto[Modal+PP] buttare via i miei soldi [-2]
“I should not have burnt my money”

The verb potere, instead, follows the same rules described above for the imperfect tense.
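The imperfect-tense rules described in this section reuse the negation tables of Section 4.3: potere follows the weak negation scores (Table 4.6) and dovere the simple negation scores (Table 4.3). The following Python sketch restates that mapping; the function name `modal_imperfect_score` and the dictionary layout are assumptions made for illustration, and the comparative and past conditional cases are deliberately left out.

```python
# Scores copied from Table 4.3 (simple negation) and Table 4.6 (weak negation).
SIMPLE_NEGATION = {3: -1, 2: -2, 1: -2, -1: 1, -2: 1, -3: -1}
WEAK_NEGATION = {3: -1, 2: -1, 1: -1, 0: -1, -1: 1, -2: 1, -3: -1}

# Hypothetical mapping from the modal verb (in the imperfect tense) to its rule table.
IMPERFECT_RULES = {"potere": WEAK_NEGATION, "dovere": SIMPLE_NEGATION}


def modal_imperfect_score(modal, prior_polarity):
    """Score of a sentence with modal verb + imperfect tense + oriented item."""
    if modal not in IMPERFECT_RULES:
        raise ValueError(f"no imperfect-tense rule for modal verb {modal!r}")
    return IMPERFECT_RULES[modal][prior_polarity]


print(modal_imperfect_score("potere", 2))   # example (66): prints -1
print(modal_imperfect_score("potere", -2))  # example (67): prints 1
print(modal_imperfect_score("dovere", 1))   # example (70): prints -2
```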

4.5 Comparative Sentences Mining

4.5.1 Related Works

Comparative Sentence Mining and Comparison Mining Systems are NLP techniques and tools that only in part coincide with the Sentiment Analysis field.
Sentences that express a comparison generally carry along with them opinions about two or more entities with regard to their shared features or attributes (Ganapathibhotla and Liu, 2008). While the main purpose of Sentiment Analysis is to identify details about such opinions, the main goals of Information Extraction regard comparative sentence detection and classification. Therefore, while the object of the analysis and the basic information units are shared by the two approaches, their purposes can sometimes coincide but in other cases differ.
Among the few studies on the computational treatment of comparison (Jindal and Liu, 2006a,b; Ganapathibhotla and Liu, 2008; Fiszman et al., 2007; Yang and Ko, 2011), only Ganapathibhotla and Liu (2008) tried to extract the authors’ preferred entities and to identify the Semantic Orientation of comparative sentences after their identification and classification.
With the following two examples, Ganapathibhotla and Liu (2008) showed the difficulty of determining the polarity of context-dependent opinion comparatives, which always require domain knowledge for a correct interpretation: e.g. the word “longer” can assume a positive (72) or negative (73) orientation, depending on the product feature with which it is associated.

(72) “The battery life of Camera X is longer than Camera Y”
(73) “Program X’s execution time is longer than Program Y”

Since this work aims to provide a general-purpose lexical and grammatical database for Sentiment Analysis and Opinion Mining, we did not address here the treatment of this special kind of opinionated sentences, just as we did not include in the lexicon items with domain-dependent meaning.
Concerning the comparative classification, the main types that have been identified in the literature are the following:

• Gradable (direct comparison)
  – Non-equal
      Increasing
      Decreasing
  – Equative
  – Superlative

• Non-gradable (implicit comparisons)

According to Ganapathibhotla and Liu (2008) and Jindal and Liu (2006a,b), a comparative relation can be represented in the following way:

ComparativeWord, Features, EntityS1, EntityS2, Type

where ComparativeWord is the keyword used to express a comparative relation in a sentence, Features is a set of features being compared, EntityS1 is the first term of comparison and EntityS2 is the second one.
This way, examples (72) and (73) are represented as follows:

(72) longer, battery life, Camera X, Camera Y, Non-equal Gradable

(73) longer, execution time, Program X, Program Y, Non-equal Gradable

Our work shares with Ganapathibhotla and Liu (2008) the following basic rules:

Increasing comparative + Negative item → Negative opinion → EntityS2 is preferred

Increasing comparative + Positive item → Positive opinion → EntityS1 is preferred

Decreasing comparative + Negative item → Positive opinion → EntityS1 is preferred

Decreasing comparative + Positive item → Negative opinion → EntityS2 is preferred
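The relation format and the four basic rules above can be sketched in Python as follows. The class `ComparativeRelation`, the function `preferred_entity` and the string labels for the comparative direction are hypothetical names introduced for this illustration; the polarity passed in the usage example (a positive value for a long battery life) is a domain assumption, since the orientation of “longer” is context-dependent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComparativeRelation:
    """(ComparativeWord, Features, EntityS1, EntityS2, Type)."""
    comparative_word: str
    features: tuple
    entity_s1: str
    entity_s2: str
    type: str


def preferred_entity(relation, direction, item_polarity):
    """Apply the four basic rules: return the entity preferred by the author.

    direction is "increasing" or "decreasing"; item_polarity is the (non-zero)
    polarity of the item that co-occurs with the comparative word.
    """
    if direction not in ("increasing", "decreasing"):
        raise ValueError("direction must be 'increasing' or 'decreasing'")
    if item_polarity == 0:
        raise ValueError("the rules only cover positive or negative items")
    # Increasing + positive and decreasing + negative yield a positive opinion.
    positive_opinion = (direction == "increasing") == (item_polarity > 0)
    return relation.entity_s1 if positive_opinion else relation.entity_s2


relation = ComparativeRelation(
    "longer", ("battery life",), "Camera X", "Camera Y", "Non-equal Gradable"
)
# Assumption: a long battery life is a positive item in the camera domain.
print(preferred_entity(relation, "increasing", +1))  # prints Camera X
```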

However, differently from our work, the authors simply switched the polarity of the opinions (and, consequently, the preferred entity) in the cases in which comparative sentences contained negation words or phrases. They underlined that sometimes this operation could appear problematic, e.g. “not longer” does not mean “shorter”.
Actually, the problem is that the authors did not take into account different degrees of positivity and negativity. As we will show in the next paragraphs of this Section, the co-occurrences of weakly positive/negative and intensely positive/negative Prior Polarities with comparison indicators can generate different final sentence scores.
As regards the automatic analysis of superlatives, we mention the work of Bos and Nissim (2006), who grounded the semantic interpretation of superlatives on the characterization of a comparison set (the set of entities that are compared to each other with respect to a certain dimension). The authors took into consideration the superlative modification performed by ordinals, cardinals or adverbs (e.g. intensifiers or modals), and also the fact that the superlative can manifest itself in both predicative and attributive form.
Their superlative parser, able to recognise a superlative expression and its comparison set with high accuracy, provides a Combinatory Categorial Grammar (CCG) (Clark and Curran, 2004) derivation of the input sentence, which is then used to construct a Discourse Representation Structure (DRS) (Kamp and Reyle, 1993).
In this thesis we did not go through a semantic and logical analysis of comparative and superlative sentences; as explained in the next paragraph, we just used them as CVS in order to accurately modify the polarity of the entities they involve.

4.5.2 Comparison in SentIta

In general, this work, which focuses on Gradable comparison, includes the analysis of the following comparative structures:

• equative comparative frozen sentences of the type N0 Agg come C1 from the LG class PECO (see Section 3.4.2);

• opinionated comparative sentences that involve the expressions meglio di, migliore di “better than”, peggio di, peggiore di “worse than”, superiore a “superior to”, inferiore a “less than” (74);

• adjectival, adverbial and nominal comparative structures that respectively involve oriented adjectives, adverbs and nouns from SentIta (75);

• absolute superlative (75);

• comparative superlative (76).

The comparison with other products (74) has been evaluated with the same measures of the other sentiment expressions, so the polarity can range from -3 to +3.
The superlative, which can imply (76) or not (75) a comparison with a whole class of items (Farkas and Kiss, 2000), confers on the first term of the comparison the highest polarity score, so it always increases the strength of the opinion. Its polarity can be -3 or +3.

(74) L’S3 è complessivamente superiore all’Iphone5 [+2]
“The S3 is on the whole superior to the iPhone5”

(75) Il suo motore era anche il più brioso[+2] [+3]
“Its engine was also the most lively”

(76) Un film peggiore di qualsiasi telefilm [-3]
“A film worse than whatever tv series”

We did not include the equative comparative in the Contextual Valence Shifters Section because it does not change the polarity expressed by the opinion words (77).

(77) [Non[Negation] entusiasmante[+3]][-1] come i PES degli anni precedenti
“Not exciting as the past years’ PES”

In the next paragraphs we will go through the rules used in this FSA network to compute the Semantic Orientations of minority and majority comparative structures. As will be observed, the variety and the complexity of co-occurrence rules can be easily managed using the finite-state technology.

4.5.3 Interaction with the Intensification Rules

Rule I: increasing comparative sentences which occur with positive items from SentIta generate a positive opinion.
Comparative indicators are the following:

• Increasing Comparative Words (I-CW), e.g. più “more”

• Intensifiers + Increasing Comparative Words (Int + I-CW), e.g. molto più “a lot more”

The sentence score depends on the interaction of I-CWs, intensifiers and the prior polarity scores attributed to the SentIta words in the electronic dictionaries (see Table 4.7).

Rule II: increasing comparative sentences with increasing opinionated comparative items, listed in SentIta with positive Prior Polarities, generate a positive opinion.
Comparative indicators are the following:

• Increasing Opinionated Comparative Words (I-Op-CW), e.g. meglio “better” = +2

• Intensifiers + Increasing Opinionated Comparative Words (Int + I-Op-CW), e.g. molto meglio “a lot better” = +3

The sentence score just depends on the Prior Polarity of the I-Op-CW.

Positive Prior PolaritiesIncreasing comparative Examples carino buono prodigioso

+1 +2 +3Entity 1 Polarities

I-CW + OpW[POS] piugrave +1 +2 +3Int + I-CW + OpW[POS] molto piugrave +2 +3 +3

Table 47 Rule I increasing comparative sentences with positive orientation

Positive Prior PolaritiesDecreasing comparative Examples carino buono prodigioso

+1 +2 +3Entity 1 Polarities

D-CW + OpW[POS] meno -1 -2 -1Int + D-CW + OpW[POS] molto meno -2 -3 -1

Table 48 Rule III decreasing comparative sentences with negative orientation

Rule III decreasing comparative sentences which occur with positiveitems from SentIta generate a negative opinion (see Table 48)Comparative indicators are the following

bull Decreasing Comparative Words (D-CW)eg meno lessrdquo

45 COMPARATIVE SENTENCES MINING 181

Negative Prior PolaritiesIncreasing comparative Examples distratto cafone osceno

-1 -2 -3Entity 1 Polarities

I-CW + OpW[NEG] piugrave -1 -2 -3Int + I-CW + OpW[NEG] molto piugrave -2 -3 -3

Table 49 Rule IV increasing comparative sentences with Negative Orientation

Negative Prior PolaritiesDecreasing comparative Examples distratto cafone osceno

-1 -2 -3Entity 1 Polarities

D-CW + OpW[NEG] meno +1 +1 +1Int + D-CW + OpW[NEG] molto meno +2 +2 +1

Table 410 Rule VI decreasing comparative sentences with positive orientation

bull Intensifier + Decreasing comparative word (Int + D-CW)eg molto meno much lessrdquo

Rule IV this rule is perfectly specular to Rule I Increasing compara-tive sentences which occur with Negative items from SentIta gen-erate a negative opinionThe sentence score depends again on the interaction of I-CW In-tensifiers and the prior polarity scores attributed to the SentItawords that appear in the same sentence (see Table 49)

Rule V decreasing comparative sentences with decreasing opinion-ated comparative items listed in SentIta with negative Prior Po-larities generate a negative opinionComparative indicators are the following

bull decreasing opinionated comparative words (I-Op-CW)eg peggio worse = -2

bull Intensifier + decreasing opinionated comparative word (Int+ I-Op-CW)

182 CHAPTER 4 SHIFTING LEXICAL VALENCES THROUGH FSA

Positive Prior PolaritiesIncreasing comparative Examples carino buono prodigioso

+1 +2 +3Entity 1 Polarities

N + I-CW + OpW[POS] non piugrave -1 -1 -1N + Int + I-CW + OpW[POS] non molto piugrave -1 -1 -1

Table 411 Nagation of Rule I Negated increasing comparative sentences with neg-ative orientation

Positive Prior PolaritiesDecreasing comparative Examples carino buono prodigioso

+1 +2 +3Entity 1 Polarities

N + D-CW + OpW[POS] non meno +1 +2 +3N+ Int + D-CW + OpW[POS] non molto meno +1 +1 +2

Table 412 Negation of Rule III Negated decreasing comparative sentences withpositive orientation

eg molto peggio much worse = -3

Just as in Rule II the sentence score just depends on the PriorPolarity of the I-Op-CWs

Rule VI decreasing comparative sentences which occur with neg-ative items from SentIta generate a positive opinion (see Table410)Rule VI shares the comparative indicators with Rule III

4.5.4 Interaction with the Negation Rules

Negation of Rule I: increasing comparative sentences which occur with positive items from SentIta, when negated, generate a negative opinion.
Comparative indicators are, of course, the same as those of Rule I, with the addition of negation indicators (N), e.g. non “not”. The sentence score is always turned into the weakly negative one (see Table 4.11).

Negation of Rule II: increasing comparative sentences in which increasing opinionated comparative items listed in SentIta with positive Prior Polarities occur, when negated, always generate a negative opinion.
Comparative indicators are the same as those of Rule II:

• Negation markers + increasing opinionated comparative words (N + I-Op-CW), e.g. non meglio “not better” = -2;

• Negation markers + intensifiers + increasing opinionated comparative words (N + Int + I-Op-CW), e.g. non molto meglio “not much better” = -1.

Negation of Rule III: the negation of Rule III, which involves decreasing comparative sentences with positive items from SentIta, always generates positive opinions.
The different sentence scores, computed on the basis of the interaction of comparative words and SentIta’s Prior Polarities, are reported in Table 4.12.

Negation of Rule IV: negated increasing comparative sentences which occur with negative items from SentIta generate, in most cases, positive opinions (see Table 4.13). The sentence score is influenced by the interaction of negation markers, I-CWs, intensifiers and negative prior polarity scores.

| Increasing comparative | Examples | Entity 1 Polarity: distratto (-1) | cafone (-2) | osceno (-3) |
|---|---|---|---|---|
| N + I-CW + OpW[NEG] | non più | +1 | +1 | -1 |
| N + Int + I-CW + OpW[NEG] | non molto più | +1 | +1 | -2 |

Table 4.13: Negation of Rule IV: negated increasing comparative sentences with positive orientation

Negation of Rule V: decreasing comparative sentences in which increasing opinionated comparative items listed in SentIta with negative Prior Polarities occur, when negated, generate a weakly negative opinion (non (molto + E) peggio “not (much + E) worse”).

Negation of Rule VI: decreasing comparative sentences which occur with negative items from SentIta generate negative opinions if negated (see Table 4.14).

| Decreasing comparative | Examples | Entity 1 Polarity: distratto (-1) | cafone (-2) | osceno (-3) |
|---|---|---|---|---|
| N + D-CW + OpW[NEG] | non meno | -1 | -2 | -3 |
| N + Int + D-CW + OpW[NEG] | non molto meno | -1 | -2 | -3 |

Table 4.14: Negation of Rule VI: negated decreasing comparative sentences with negative orientation
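The co-occurrence rules of Sections 4.5.3 and 4.5.4 amount to a lookup over four factors: negation, intensification, the direction of the comparative word and the prior polarity of the opinion word. The following Python sketch encodes the values of Tables 4.7 to 4.14 in a dictionary. It is only an illustration: the names `COMPARATIVE_SCORES` and `comparative_score` are hypothetical, the actual resources are finite-state grammars, and the opinionated comparatives of Rules II and V (meglio, peggio) are not covered.

```python
# Key: (negated, intensified, comparative word type, sign of the prior polarity).
# Value: Entity 1 polarity for prior polarities of strength 1, 2 and 3.
COMPARATIVE_SCORES = {
    (False, False, "I", +1): (+1, +2, +3),  # Table 4.7, più
    (False, True, "I", +1): (+2, +3, +3),   # Table 4.7, molto più
    (False, False, "D", +1): (-1, -2, -1),  # Table 4.8, meno
    (False, True, "D", +1): (-2, -3, -1),   # Table 4.8, molto meno
    (False, False, "I", -1): (-1, -2, -3),  # Table 4.9, più
    (False, True, "I", -1): (-2, -3, -3),   # Table 4.9, molto più
    (False, False, "D", -1): (+1, +1, +1),  # Table 4.10, meno
    (False, True, "D", -1): (+2, +2, +1),   # Table 4.10, molto meno
    (True, False, "I", +1): (-1, -1, -1),   # Table 4.11, non più
    (True, True, "I", +1): (-1, -1, -1),    # Table 4.11, non molto più
    (True, False, "D", +1): (+1, +2, +3),   # Table 4.12, non meno
    (True, True, "D", +1): (+1, +1, +2),    # Table 4.12, non molto meno
    (True, False, "I", -1): (+1, +1, -1),   # Table 4.13, non più
    (True, True, "I", -1): (+1, +1, -2),    # Table 4.13, non molto più
    (True, False, "D", -1): (-1, -2, -3),   # Table 4.14, non meno
    (True, True, "D", -1): (-1, -2, -3),    # Table 4.14, non molto meno
}


def comparative_score(prior, cw_type, intensified=False, negated=False):
    """Look up the Entity 1 polarity of a comparative built on an opinion word.

    prior is the prior polarity of the opinion word (-3..-1 or +1..+3);
    cw_type is "I" for increasing (più) or "D" for decreasing (meno).
    """
    if prior == 0 or abs(prior) > 3:
        raise ValueError("prior polarity must be a non-zero value between -3 and +3")
    sign = 1 if prior > 0 else -1
    return COMPARATIVE_SCORES[(negated, intensified, cw_type, sign)][abs(prior) - 1]
```

For example, `comparative_score(2, "I", intensified=True)` corresponds to molto più buono, and `comparative_score(-3, "I", negated=True)` to non più osceno.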

Chapter 5

Specific Tasks of the Sentiment Analysis

5.1 Sentiment Polarity Classification

Unlike traditional topic-based text classification, sentiment classification aims to categorize documents through the classes positive and negative. The objective can be a simple binary classification, or it can come after a more complex classification with a larger number of classes in the continuum between positive and negative (Pang and Lee, 2008a). The rating-inference problem (Pang and Lee, 2005; Leung et al., 2006, 2008; Shimada and Endo, 2008) regards the mapping of automatically detected document polarity scores onto the fine-grained rating scales (e.g. 5/10) of existing Collaborative Filtering algorithms (Breese et al., 1998; Herlocker et al., 1999; Sarwar et al., 2001; Linden et al., 2003). The challenging aspect of sentiment classification is that the sentiment classes, far from being unrelated to one another, represent opposing (binary classification) or ordinal categories (multi-class and rating-inference).


5.1.1 Related Works

Many techniques have been discussed in the literature to perform Sentiment Polarity Classification. In Section 3.2 we divided them into lexicon-based methods, learning methods and hybrid methods.¹
The first, which calculate the sentiment orientation of sentences and documents starting from the polarity of the words and phrases they contain, are the methods that have been chosen for the present work. This is why they received an in-depth analysis in Chapter 3, where we also discussed the pros and cons of both the manual and the automatic drafting of sentiment lexical resources. In this respect, we mention again the major contributions related to knowledge-based solutions: Landauer and Dumais (1997), Hatzivassiloglou and McKeown (1997), Wiebe (2000), Turney (2002), Turney and Littman (2003), Riloff et al. (2003), Hu and Liu (2004), Kamps et al. (2004), Vermeij (2005), Gamon and Aue (2005), Esuli and Sebastiani (2005, 2006a), Kanayama and Nasukawa (2006a), Benamara et al. (2007), Kaji and Kitsuregawa (2007), Rao and Ravichandran (2009), Mohammad et al. (2009), Neviarouskaya et al. (2009a), Velikovich et al. (2010), Taboada et al. (2006, 2011).
Rule-based approaches that take into account the syntactic dimension of Sentiment Analysis are the ones used by Mulder et al. (2004), Moilanen and Pulman (2007) and Nasukawa and Yi (2003).
The most used classification algorithms in learning and statistical methods are Support Vector Machines, which, trained on specific data-sets, map a space of unstructured data points onto a new structured space through a kernel function, and Naïve Bayes, which are probabilistic classifiers that assume the presence/absence of a class feature to be independent of the absence/presence of the other features, given the class variable. Effective sentiment features exploited in supervised machine learning applications are terms, their frequencies and TF-IDF, parts of speech, sentiment words and phrases, sentiment shifters, syntactic dependency, etc.
Tan et al. (2009) and Kang et al. (2012) adapted Naïve Bayes algorithms for sentiment analysis.
Pang et al. (2002), with the purpose of testing the hypothesis that sentiment classification can be treated just as a special case of topic-based categorization, with the positive and negative sentiment considered as topics, used SVM, Naïve Bayes and Maximum Entropy classifiers with diverse sets of features.
Mullen and Collier (2004) employed the values potency, activity and evaluative from a variety of sources and created a feature space classified by means of SVM.
Ye et al. (2009) compared SVM with other statistical approaches, proving that SVM and n-gram outperform the Naïve Bayes approach.
The minimum-cut framework, working on graphs, has been used by Pang and Lee (2004).
Bespalov et al. (2011) performed sentiment classification via latent n-gram analysis.
Nakagawa et al. (2010) presented a dependency tree-based classification method grounded on conditional random fields.
Compositionality algorithms have been tested by Yessenalina and Cardie (2011) and Socher et al. (2013). Socher et al. (2013) proposed the Recursive Neural Tensor Network, able to compute compositional vector representations for phrases of variable length and syntactic type, obtaining better performances than standard recursive neural networks, matrix-vector recursive neural networks, Naïve Bayes, bi-gram Naïve Bayes and SVM. Their output is the annotated corpus Stanford Sentiment Treebank, which represents a precious resource for the analysis of the compositional effects of sentiment in language.
Sentiment indicators can be used for sentiment classification in an unsupervised manner: this is the case of Turney (2002) and the other lexicon-based methods (Liu, 2012).
Finally, as regards the hybrid methods, we mention the works of Andreevskaia and Bergler (2008), who integrated the accuracy of a corpus-based system with the portability of a lexicon-based system in a single ensemble of classifiers; Aue and Gamon, who combined nine SVM-based classifiers; Dasgupta and Ng (2009), who first mined the unambiguous reviews through spectral techniques and afterwards exploited them to classify the ambiguous reviews by combining active learning, transductive learning and ensemble learning; Goldberg and Zhu (2006), who presented an adaptation of graph-based learning algorithms for rating-inference problems; and Prabowo and Thelwall (2009), who combined rule-based and machine learning classification algorithms.

¹ In general, while hybrid methods are characterized by the application of classifiers in sequence, approaches based on rules consist of an if-then relation between an antecedent and its associated consequent (Prabowo and Thelwall, 2009).

5.1.2 DOXA: a Sentiment Classifier based on FSA

Using the command-line program noojapply.exe, we built Doxa, a prototype written in Java with which users can automatically apply the Nooj lexicon-grammatical resources from Section 3 and the rules from Section 4 to every kind of text, getting back statistics about the opinions expressed in each case.
Doxa sums up the values corresponding to every sentiment expression and then standardizes the result by the total number of sentiment expressions contained in each review. Neutral sentences are simply ignored by both the FSA and Doxa.
The automatically attributed polarity scores are compared with the stars that the opinion holder gave to his review. Then, through the statistics about the opinions expressed in every domain, Doxa tests the consistency of the sentiment lexical databases and tools with respect to the rating-inference problem, in each document and in the whole corpus.
Figure 5.1 reports an example of the analysis on the domain of hotel reviews.


Figure 5.1: Doxa, the LG sentiment classifier based on the finite-state technology
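The aggregation step performed by Doxa can be illustrated with a minimal Python sketch. It is not the Java code of the prototype: the function names are hypothetical, and the input is assumed to be the list of scores (from -3 to +3) that the grammars attached to the sentiment expressions of one review, with 0 standing for a neutral match.

```python
def document_score(expression_scores):
    """Sum the scores of the sentiment expressions and divide by their number.

    Neutral matches (score 0) are ignored, as described above.
    """
    polar = [score for score in expression_scores if score != 0]
    if not polar:
        return 0.0  # no sentiment expression was annotated in the review
    return sum(polar) / len(polar)


def polarity(score):
    """Map a document score onto a polarity class."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A review whose annotated expressions score +3, -1 and +2 would receive a document score of 4/3 and would be classified as positive. How the score is then mapped onto the stars of a given website is not covered by this sketch.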

5.1.3 A Dataset of Opinionated Reviews

Although a large number of corpora for sentiment analysis are available for the English language (Hu and Liu, 2004; Pang and Lee, 2004, 2005; Wiebe et al., 2005; Wilson et al., 2005; Thomas et al., 2006; Jindal and Liu, 2007; Snyder and Barzilay, 2007; Blitzer et al., 2007; Seki et al., 2007; Macdonald et al., 2007; Pang and Lee, 2008b, among others), the Italian language presents fewer contributions in this sense (Basile and Nissim, 2013; Bosco et al., 2013, 2014).
The dataset used to evaluate the lexical and grammatical resources of the present work has been built from scratch using Italian opinionated texts in the form of users’ reviews and comments from e-commerce and opinion websites. It contains 600 text units and refers to six different domains, for all of which different websites have been exploited:


Cars: www.ciao.it

Smartphones: www.tecnozoom.it, www.ciao.it, www.amazon.it, www.alatest.it

Books: www.amazon.it, www.qlibri.it

Movies: www.mymovies.it, www.cinemalia.it, www.filmtv.it, www.filmscoop.it

Hotels: www.tripadvisor.it, www.expedia.it, www.venere.com, it.hotels.com, www.booking.com

Videogames: www.amazon.it

Details about the corpus are shown in Table 5.1.

| Text Features | Cars | Smartphones | Books | Movies | Hotels | Games | Tot |
|---|---|---|---|---|---|---|---|
| Neg Docs | 50 | 50 | 50 | 50 | 50 | 50 | 300 |
| Pos Docs | 50 | 50 | 50 | 50 | 50 | 50 | 300 |
| Text Files | 20 | 20 | 20 | 20 | 20 | 20 | 120 |
| Word Forms | 17163 | 19226 | 8903 | 37213 | 12553 | 5597 | 101655 |
| Tokens | 21663 | 24979 | 10845 | 45397 | 16230 | 7070 | 126184 |

Table 5.1: Dataset of opinionated online customer reviews

Adjectives, as is commonly recognized in the literature (Hatzivassiloglou and McKeown, 1997; Hu and Liu, 2004; Taboada et al., 2006), seem to be the most reliable SO indicators, considering that 17% of the adjectives occurring in the corpus are polarized, compared to 3% of the adverbs, 2% of the nouns and 7% of the verbs.
Moreover, considering all the opinion bearing words in the corpus, the adjectives’ patterns cover 81% of the total number of occurrences (almost 5000 matches), while the adverbs, the nouns and the verbs reach respectively a percentage of 4%, 6% and 2%. The remaining 7% is covered by the other sentiment expressions, which in any case contribute to the achievement of satisfactory levels of Recall.
As regards the Semantic Orientation, while in the dictionaries almost 70% of the words are connoted by a negative Prior Polarity, the numbers of positive and negative expressions in our perfectly balanced corpus lead to almost antipodal results: 64% of the sentiment expressions possess a positive polarity. In particular, 61% of the adjective expressions, 77% of the adverb expressions, 65% of the noun expressions and 51% of the verb expressions are positive.
The domain-independent expressions have diametrically opposite percentages: only 34% of the occurrences are positive. This result is due to the presence of the really productive negative metanodes troppo “too much” and poco “not much”. Deactivating them, in fact, the positive percentage reaches 58% of the occurrences. Thus, it could be stated that, although the lexicon makes available a greater number of negative items, the real occurrences of the opinion words tip the balance in the direction of the positive items (see Section 3.3.2.1 for more on the phenomenon of positive biases).

5.1.4 Experiment and Evaluation

In the sentiment classification task the information unit is represented by an opinionated document as a whole. The purpose of our sentiment tool is to classify such documents on the basis of their overall Semantic Orientation.
To ground this task on a lexicon means to hypothesize that the polarities of opinion words can be considered indicators of the polarity of the document in which the words and the expressions are contained. Our Lexicon-Grammar based sentiment lexical database confers on elementary sentences the status of minimum semantic units of the language. Therefore, the indicators on which we grounded our classification go beyond the limit of mere words, to match entire elementary sentences or polar phrases from complex sentences.
Because the inference of the whole document SO depends on the correct annotation of its sentences and phrases, the evaluation phase involves both the sentence-level and the document-level performances of the tool.

5.1.4.1 Sentence-level Sentiment Analysis

This Paragraph presents the results pertaining to the Nooj output, which has been produced by applying the SentIta LG sentiment resources to our corpus of customer reviews.
For the sake of clarity and for the reproducibility of the results, it must be pointed out that some of the lexical resources have been applied with higher priority attributions. Details about the preferences are given below:

• H1 to the sentiment adverb and noun dictionaries;

• H2 to the sentiment adjective dictionary;

• H3 to the freely available dictionary Contrazioni, which belongs to the official Italian module of Nooj;

• H4 to a high-level priority dictionary that contains what is commonly called a “stop word list”;

• H5 to a dictionary of compound adverbs.

In order to avoid ambiguity as much as possible, the dictionary of the verbs of sentiment must have the same priority as the Italian standard dictionary (sdic), lower than any other mentioned resource.
Because our lexical and grammatical resources are not domain-specific, we observed their interaction with every single part of the corpus, which is composed of many different domains, each one of them characterized by its own peculiarities.


Moreover, in order to verify the performances of every part of speech (and of the expressions connected to them), we checked, as shown in Table 5.2, the Precision and the Recall, applying separately every single metanode (A, ADV, N, V, D-ind²) of the main graph of the sentiment grammar.

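| Nodes | Cars | Smartphones | Movies | Books | Hotels | Videogames | Average |
|---|---|---|---|---|---|---|---|
| A | 88 | 90 | 79 | 87.5 | 91.5 | 83.5 | 86.6 |
| ADV | 80.4 | 75.8 | 87.9 | 92.3 | 92 | 50 | 79.7 |
| N | 81.8 | 85.7 | 82.8 | 77.8 | 85.3 | 85.7 | 83.2 |
| V | 88.2 | 57.1 | 84.8 | 89.5 | 57.1 | 100 | 79.5 |
| D-ind | 87.9 | 83.5 | 78.1 | 76.7 | 90 | 94.7 | 87.9 |
| Average | 85.3 | 78.4 | 82.5 | 84.8 | 83.2 | 82.8 | 82.8 |

Table 5.2: Precision measure in sentence-level classification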

² Domain-independent sentiment expressions (D-ind) refer to all the frozen and semi-frozen expressions that have been directly formalized in the FSA, and also include the vulgar expressions.

The sentence-level Precision has been manually calculated by determining whether the polarity and the intensity score of the expressions were correctly assigned in all the concordances produced by Nooj. We made an exception for the expressions of the adjectives, which, due to the extremely large number of concordances, have been evaluated by extracting a sample of 1200 matches (200 for each domain).
The values marked by the asterisks have been reported for the sake of completeness, but they are not really relevant, because of the small number of concordances on which they have been calculated.
The best performance has been reached, on average, by the cars dataset, while the lowest values have been assigned to the movies dataset. The performances of the adjectives grammar are really satisfying, especially if compared with the high number of matches they produced. Even though the domain-independent node covers a small part of the matches (7%), it reached the best Precision results if compared with the other metanodes.
However, in many cases just evaluating the polarity and the intensity of the expressions contained in a document is not enough to determine whether such expressions truly contribute to the identification of the polarity and the intensity of the document itself. Sometimes polarized sentences and phrases are not opinion indicators, especially in the reviews of movies and books, in which polarized sentences can just refer to the plot (78).

(78) Le belle case sono dimora della degenerazione più bieca [-3]
“Pretty houses host the grimmest degeneracy”

An opinionated sentence can also be considered not pertinent when it does not directly refer to the object of the opinion, but mentions aspects that are just indirectly connected with the product. Examples of this are (79), which refers not to the product but to the delivery service, and (80), which, in a book review, refers not to the book but to another media product related to it.

(79) Apprezzo[+2] molto[+] Amazon per questo [+3]
“I really appreciate Amazon for this”

(80) Il film non[Negation] mi era piaciuto[+2] [-2]
“I didn’t like the movie”

In this work we chose to exclude the second sentence from the correct matches, but we considered it worthwhile to include the first one, because of the strong influence that this feature has on the numerical value that the opinion holder confers on his own opinion.
Table 5.3 corrects the results reported in Table 5.2 by excluding from the correct matches the ones that do not contribute to the identification of the correct document SO. This makes the average sentence-level Precision drop by 8.7 points, reaching 74.1%, which we considered adequate.
We chose to present these two results separately because this way it is possible to distinguish the domains that are more influenced by the pertinence problem (e.g. movies and books) from the ones that are less affected by it (e.g. hotels and smartphones).

| Nodes | Cars | Smartphones | Movies | Books | Hotels | Videogames | Average |
|---|---|---|---|---|---|---|---|
| A | -5 | -5.5 | -28 | -10.5 | -1 | -1.5 | -8.6 |
| ADV | -2.2 | 0 | -29.7 | -7.7 | 0 | 0 | -6.6 |
| N | -11.4 | -14.3 | -39.5 | -14.8 | -5.9 | -14.3 | -16.7 |
| V | 0 | 0 | -17.6 | -15.8 | 0 | 0 | -5.6 |
| D-ind | -8.6 | 0 | -13.3 | -6.7 | -2.5 | -5.3 | -6.1 |
| Average | -5.4 | -4 | -25.6 | -11.1 | -1.9 | -4.2 | -8.7 |

Table 5.3: Irrelevant matches in sentence-level classification
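The correction applied by Table 5.3 is a plain subtraction, which the following Python sketch reproduces on the “Average” rows of Tables 5.2 and 5.3. The per-domain corrected values are not reported in the text: they are derived here under the assumption that the same subtraction applies to each column, and the variable names are illustrative only.

```python
DOMAINS = ["Cars", "Smartphones", "Movies", "Books", "Hotels", "Videogames", "Average"]

# "Average" row of Table 5.2 (sentence-level Precision before the correction).
RAW_PRECISION = [85.3, 78.4, 82.5, 84.8, 83.2, 82.8, 82.8]

# "Average" row of Table 5.3 (points lost because of irrelevant matches).
IRRELEVANT_MATCHES = [5.4, 4.0, 25.6, 11.1, 1.9, 4.2, 8.7]

# Corrected Precision: raw value minus the share of non-pertinent matches.
corrected = {
    domain: round(raw - drop, 1)
    for domain, raw, drop in zip(DOMAINS, RAW_PRECISION, IRRELEVANT_MATCHES)
}
```

Under this assumption the movies domain is the most penalized and the hotels domain the least, in line with the observation above.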

5.1.4.2 Document-level Sentiment Analysis

As far as the document-level performance is concerned (Table 5.4), we calculated the Precision twice.
In the Polarity only row, the True Positives are the documents that have been correctly classified by Doxa, with a polarity attribution that corresponds to the one specified by the Opinion Holder.
In the Intensity also row, the True Positives are the documents that received from Doxa exactly the same stars specified by the Opinion Holder.


| Precision | Cars | Smartphones | Movies | Books | Hotels | Videogames | Average |
|---|---|---|---|---|---|---|---|
| Polarity only | 71.0 | 72.0 | 63.0 | 74.0 | 91.0 | 72.0 | 74.0 |
| Intensity also | 32.0 | 45.0 | 25.0 | 33.0 | 49.0 | 34.0 | 36.3 |

Table 5.4: Precision measure in document-level classification

As we can see, the latter seems to have a very low Precision, but upon a deeper analysis we discovered that it is really common for the Opinion Holders to write review texts that do not perfectly correspond to the stars they specified. This increases the importance of software like Doxa, which does not stop the analysis at the structured data, but enters the semantic dimension of texts written in natural language.

| Recall | Cars | Smartphones | Movies | Books | Hotels | Videogames | Average |
|---|---|---|---|---|---|---|---|
| Sentence-level | 72.7 | 79.6 | 64.8 | 65.7 | 72.1 | 58.8 | 69.0 |
| Document-level | 100 | 98.6 | 100 | 96.1 | 98.9 | 91.2 | 97.5 |

Table 5.5: Recall in both the sentence-level and the document-level classifications

The Recall pertaining to the sentence-level performance of our tool has been manually calculated on a sample of 150 opinionated documents (25 from each domain), by considering as false negatives the sentiment indicators which have not been annotated by our grammar.
The document-level Recall, instead, has been automatically checked with Doxa, by considering as true positives all the opinionated documents which contained at least one appropriate sentiment indicator.


So, the documents in which the Nooj grammar did not annotate any pattern were considered false negatives. Because we assumed that all the texts of our corpus were opinionated, we also considered as false negatives the cases in which the value of the document was 0.
Taking the F-measure into account, the best results were achieved with the smartphones domain (77.0%) in the sentence-level task and with the hotels dataset (94.8%) in the document-level performance.
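The F-measure mentioned above combines Precision and Recall in a single value. The text does not spell out the formula, so the following Python sketch assumes the usual balanced F-measure (the harmonic mean of Precision and Recall); the function name is illustrative. Applied to the document-level figures of the hotels dataset (Precision 91.0 in Table 5.4, Recall 98.9 in Table 5.5), it returns a value that rounds to the 94.8 reported above.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of Precision and Recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Document-level figures for the hotels dataset (Tables 5.4 and 5.5).
hotels_document_level = f_measure(91.0, 98.9)
```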

5.2 Feature-based Sentiment Analysis

The purpose of the sentiment analysis based on features is to provide companies with overviews of customer opinions, which automatically summarize the strengths and the weaknesses of the products and services they offer. In Section 1 we connected the definition of the opinion features to the following function (Liu, 2010):

$T = O(f)$

Where the features ($f$) of an object ($O$) (e.g. hotels) can be brand names (e.g. Artemide, Milestone), properties (e.g. cleanness, location), parts (e.g. rooms, apartments), related concepts (e.g. city, toponyms) or parts of the related concepts (e.g. city centre, tube, museum, etc.) (Popescu and Etzioni, 2007). Liu (2010) represented every object as a “special feature” ($F$), defined by a subset of features ($f_i$):

$F = \{f_1, f_2, \ldots, f_n\}$

Both $F$ and $f_i$, grouped together under the noun target, can be automatically discovered in texts through both synonym words and phrases $W_i$ and other direct or indirect indicators $I_i$:

$W_i = \{w_{i1}, w_{i2}, \ldots, w_{im}\}$

$I_i = \{i_{i1}, i_{i2}, \ldots, i_{iq}\}$
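The sets above map naturally onto a small data structure. The following Python sketch is only an illustration of the definitions: the class names, the sample hotel and its word lists are hypothetical, and the naive substring matching stands in for the dictionaries and grammars actually used to locate features in texts.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Feature:
    """A feature f_i with its synonyms W_i and its direct or indirect indicators I_i."""
    name: str
    synonyms: Set[str] = field(default_factory=set)
    indicators: Set[str] = field(default_factory=set)

    def mentioned_in(self, text: str) -> bool:
        # Simplification: case-insensitive substring search over all the terms.
        lowered = text.lower()
        terms = {self.name} | self.synonyms | self.indicators
        return any(term.lower() in lowered for term in terms)


@dataclass
class OpinionObject:
    """An object O represented by its set of features F = {f_1, ..., f_n}."""
    name: str
    features: List[Feature] = field(default_factory=list)

    def features_in(self, text: str) -> List[str]:
        return [f.name for f in self.features if f.mentioned_in(text)]


# Hypothetical example data for the hotel domain.
hotel = OpinionObject(
    name="hotel",
    features=[
        Feature("staff", synonyms={"personale"}, indicators={"reception"}),
        Feature("location", synonyms={"posizione"}, indicators={"roma", "centro"}),
    ],
)
```

With this toy lexicon, `hotel.features_in("Personale meravigliosamente accogliente")` returns the staff feature only, while a sentence mentioning Roma is attached to the location feature.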


It is essential to discover not only the overall polarity of an opinionated document, but also its topic and features, in order to discern the aspects of a product that must be improved, or whether the opinions extracted by the Sentiment Analysis applications are relevant to the product or not. For example, (81), an extract from a hotel review, expresses two opinions on two different kinds of features: the behavior of the staff (81a) and the beauty of the city in which the hotel is located (81b).

(81a) [Opinion [Feature Personale] meravigliosamente accogliente]
“Staff wonderfully welcoming”

(81b) [Opinion [Feature Roma] è bellissima]
“Rome is beautiful”

It is crucial to distinguish between the two topics, because just one of them (81a) refers to an aspect that can be influenced or improved by the company.

5.2.1 Related Works

Pioneering works on feature-based opinion summarization are Hu and Liu (2004, 2006), Carenini et al. (2005), Riloff et al. (2006) and Popescu and Etzioni (2007). Both Popescu and Etzioni (2007) and Hu and Liu (2004) first identified the product features on the basis of their frequency and then calculated the Semantic Orientation of the opinions expressed on these features. In order to find the most important features commented on in reviews, Hu and Liu (2004) used association rule mining, thanks to which frequent itemsets can be extracted from free texts. Redundant and meaningless items are removed during a Feature Pruning phase. Experiments have been carried out on the first 100 reviews belonging to five classes of electronic products (www.amazon.com and www.C|net.com). Hu and Liu (2006) presented the algorithm ClassPrefix-Span, which aimed to find a special kind of pattern, the Class Sequential Rules (CSR), using fixed targets and classes. Their method was based on sequential pattern mining.
Carenini et al. (2005) struck a balance between supervised and unsupervised approaches. They mapped crude (learned) features into a User-Defined taxonomy of the entity’s Features (UDF) that provided a conceptual organization for the information extracted. This method took advantage of a similarity matching, in which the UDF reduced the redundancies by grouping together identical features and then organized and presented information by using hierarchical relations. Experiments have been conducted on a testing set containing Pros, Cons and detailed free reviews of five products.
Riloff et al. (2006) used the subsumption hierarchy in order to identify complex features and then to reduce a feature set by removing useless features, which have, for example, a more general counterpart in the subsumption hierarchy. The feature representations used for opinion analysis are n-grams (unigrams, bigrams) and lexico-syntactic extraction patterns. Popescu and Etzioni (2007) presented OPINE, an unsupervised feature and opinion extraction system that used the Web as a corpus to identify explicit and implicit features, and relaxation-labelling methods to infer the Semantic Orientation of words. The system draws on WordNet’s semantic relations and hierarchies for the identification of the features (parts, properties and related concepts) and the creation of clusters of words.
Ferreira et al. (2008) made a comparison between the likelihood ratio test approach (Yi et al., 2003) and the association mining approach (Hu and Liu, 2004). The e-product dataset (topical documents) and the annotation scheme (improved with some revisions) are the ones used by Hu and Liu (2004). In particular, the revised annotation scheme comprised:

• part-of relationship with the product (e.g. battery is a part of a digital camera);

• attribute-of relationship with the product (weight and design are attributes of a camera);

• attribute-of relationship with the product’s feature (battery life is an attribute of the battery, which is in turn a camera’s feature).

Double Propagation (Qiu et al., 2009) focuses on the natural relation between opinion words and features. Because opinion words are often used to modify features, such relations can be identified thanks to the dependency grammar.
Because these methods have good results only for medium-size corpora, they must be supported by other feature mining methods. The strategy proposed by Zhang et al. (2010) is based on “no” patterns and part-whole patterns (meronymy):

• NP + Prep + CP: “battery of the camera”;

• CP + with + NP: “mattress with a cover”;

• NP CP or CP NP: “mattress pad”;

• CP Verb NP: “the phone has a big screen”.

Where NP is a noun phrase and CP a class concept phrase. The verbs used are “has”, “have”, “include”, “contain”, “consist”, etc. “No” patterns are feature indicators as well. Examples of such patterns are “no noise” or “no indentation”. Clearly, a manually built list of fixed “no” expressions, like “no problem”, is filtered out from the feature candidates. Every sentence is POS-tagged using Brill’s tagger (Brill, 1995) and used as system input.
In order to find important features and rank them high, Zhang et al. (2010) used a web page ranking algorithm named HITS (Kleinberg, 1999).
Somprasertsri and Lalitrojwong (2010) used a dependency-based approach for the opinion summarization task. A central stage in their work is the extraction of relations between product features (“the topic of the sentiment”) and opinions (“the subjective expression of the product feature”) from online customer reviews. Adjectives and verbs have been used in this study as opinion words. The maximum entropy model has been used in order to predict the opinion-relevant product feature relation. In general, in a dependency tree, a feature can be of differing kinds:

• Child: the product feature is the subject or object of the verbs, and the opinion word is a verb or a complement of a copular verb (opinions depend on the product features)

e.g. I like(opinion) this camera(feature)

• Parent: the opinion words are in the modifiers of product features, which include adjectival modifier, relative clause modifier, etc. (product features depend on the opinions)

e.g. I have found that this camera takes incredible(opinion) pictures(feature)

• Sibling: the opinion word may also be in an adverbial modifier, a complement of the verb or a predicative (both opinions and product features depend on the same words)

e.g. The pictures(feature) some time turn out blurry(opinion)

• Grand Parent: the opinion words are adjectival complements of modifiers of product features (in the following example the word “good” is the adverbial complement of the relative clause modifier of the noun phrase “movie mode”) (opinions depend on the words which depend on the product features)

e.g. It has movie mode(feature) that works good(opinion) for a digital camera

• Grand Child: the product feature is the subject or object of the complements, and the opinion word is a verb or a complement of a copular verb (product features depend on the words which depend on the opinions)

e.g. It’s great(opinion) having the LCD display(feature)

• Indirect: none of the above relations
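The six relation kinds listed above can be sketched as a lookup over the head of each token. This is only an illustrative reading of the list, not the implementation of Somprasertsri and Lalitrojwong (2010): the function name `relation`, the `heads` dictionary and the hand-written head assignments in the usage note are assumptions, and tokens are assumed to be unique within a sentence.

```python
from typing import Dict, Optional


def relation(heads: Dict[str, Optional[str]], feature: str, opinion: str) -> str:
    """Name the position of `feature` relative to `opinion` in a dependency tree.

    `heads` maps each token to its head token (None for the root).
    Hypothetical helper: token strings are assumed to be unique.
    """
    f_head = heads.get(feature)
    o_head = heads.get(opinion)
    if f_head == opinion:
        return "child"          # the feature depends directly on the opinion word
    if o_head == feature:
        return "parent"         # the opinion word modifies the feature
    if f_head is not None and f_head == o_head:
        return "sibling"        # both depend on the same word
    if o_head is not None and heads.get(o_head) == feature:
        return "grandparent"    # opinion -> intermediate word -> feature
    if f_head is not None and heads.get(f_head) == opinion:
        return "grandchild"     # feature -> intermediate word -> opinion
    return "indirect"
```

For instance, with the hand-assigned heads `{"like": None, "camera": "like"}` for “I like this camera”, `relation(heads, "camera", "like")` yields `"child"`; with `{"mode": None, "works": "mode", "good": "works"}` for “movie mode that works good”, `relation(heads, "mode", "good")` yields `"grandparent"`.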

The Stanford Parser has been used by Somprasertsri and Lalitrojwong (2010) to parse documents in the pre-processing phase, before the dependency analysis stage. Definite linguistic filtering (POS-based) patterns and the General Inquirer dictionary (Stone et al. 1966) have been used to locate noun phrases and to extract product feature candidates. The generalized iterative scaling algorithm (Darroch and Ratcliff 1972) has been used to estimate parameters or weights of the selected features. Because it is possible to refer to a particular feature using several synonyms, Somprasertsri and Lalitrojwong (2010) used semantic information encoded into a product ontology, manually built by integrating manufacturer product descriptions and terminologies in customer reviews.
Wei et al. (2010) proposed a semantic-based method that made use of a list of positive and negative adjectives defined in the General Inquirer to recognize opinion words, and then extracted the related product features in consumer reviews.
Xia and Zong (2010) performed the feature extraction and selection tasks using word relation features, which seem to be effective features for sentiment classification because they encode relation information between words. They can be of different kinds: Uni (unigrams), WR-Bi (traditional bigrams), WR-Dp (word pairs of dependency relation), GWR-Bi (generalized bigrams) and GWR-Dp (dependency relations).
Gutiérrez et al. (2011) exploited Relevant Semantic Trees (RST) for word-sense disambiguation and measured the association between concepts at the sentence level using the association ratio measure.
Mejova and Srinivasan (2011) explored different feature definition and selection techniques (stemming, negation-enriched features, term frequency versus binary weighting, n-grams and phrases) and approaches (frequency-based vocabulary trimming, part-of-speech and lexicon selection, and expected Mutual Information). Mejova’s experimental results confirmed that adjectives are important for polarity classification. Furthermore, they revealed that stemming and using binary instead of term frequency feature vectors do not have any impact on performance, and that the helpfulness of certain techniques depends on the nature of the dataset (e.g. size and class balance).
Concordance-based Feature Extraction (CFE) is the technique used by Khan et al. (2012). After a traditional pre-processing step, regular expressions are used to extract candidate features. Evaluative adjectives, collected on the basis of a seed list from Hu and Liu (2004), are helpful in the feature extraction task. In the end, a grouping phase finds the appropriate features for the opinion’s topic, grouping together all the related features and removing the useless ones. The algorithm used in this phase is based on the co-occurrence of features and uses the left and right feature’s context.
Khan et al. selected candidate product features by employing noun phrases that appear in texts close to subjective adjectives. This intuition is shared with Wei et al. (2010) and Zhang and Liu (2011). The centrepiece of Khan’s method is represented by hybrid patterns, Combined Pattern Based Noun Phrases (cBNP), that are grounded on the dependency relation between subjective adjectives (opinionated terms) and nouns (product features). Nouns and adjectives can sometimes be connected by linking verbs (e.g. “camera produces fantastically good pictures”). Preposition-based noun phrases (e.g. “quality of photo”, “range of lenses”) often represent entity-to-entity or entity-to-feature relations. The last stage is the proper feature extraction phase in which, using an ad hoc module, the noun phrases of the cBNP patterns are designated as product features.
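As a rough illustration of the pattern-based strategies surveyed in this section, the part-whole and “no” patterns attributed to Zhang et al. (2010) can be approximated with regular expressions. The sketch below is a deliberate simplification and not their system: it treats every noun phrase as a single word, it does not use POS tags, and the names `candidate_features`, `CLASS_CONCEPT` and `FIXED_NO_EXPRESSIONS`, as well as the choice of prepositions, are assumptions made for the example.

```python
import re

CLASS_CONCEPT = "camera"                # hypothetical class concept word (CP)
FIXED_NO_EXPRESSIONS = {"no problem"}   # illustrative list of fixed "no" expressions


def candidate_features(sentence: str, cp: str = CLASS_CONCEPT) -> set:
    """Collect single-word feature candidates from simplified patterns."""
    text = sentence.lower()
    cp_re = re.escape(cp)
    found = set()
    # NP + Prep + CP, e.g. "battery of the camera"
    for m in re.finditer(rf"\b(\w+) (?:of|in|on) (?:the |a )?{cp_re}\b", text):
        found.add(m.group(1))
    # CP + with + NP, e.g. "camera with a viewfinder"
    for m in re.finditer(rf"\b{cp_re} with (?:a |an |the )?(\w+)", text):
        found.add(m.group(1))
    # CP Verb NP, restricted to a few of the verbs quoted in the text
    for m in re.finditer(rf"\b{cp_re} (?:has|have|includes?|contains?) (?:a |an |the )?(\w+)", text):
        found.add(m.group(1))
    # "no" patterns, minus the fixed expressions
    for m in re.finditer(r"\bno (\w+)", text):
        if m.group(0) not in FIXED_NO_EXPRESSIONS:
            found.add(m.group(1))
    return found
```

Because noun phrases are reduced to one word, a sentence such as “the camera has a big screen” would return the adjective rather than the noun; a realistic version would run over POS-tagged input, as the original authors did.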

5.2.2 Experiment and Evaluation

In order to contribute to the resolution of the opinion feature extraction and selection tasks, we exploited both the SentIta database and other preexisting Italian lexical resources of Nooj. The latter consist of an objective dictionary that describes its lemmas with appropriate semantic properties; it has been used to define and summarize the features on which the opinion holder expressed his opinions.
In order to identify the relations between opinions and features, the lexical items contained in the just mentioned dictionaries have been systematically recalled into a Nooj local grammar that provides, as feedback, annotations about the (positive or negative) nature of the opinion and about the semantic category of the features.
The objective lexicon consists of a Nooj dictionary of concrete nouns (Table 5.6) that has been manually built and checked by seven linguists. It counts almost 22000 nouns, described from a semantic point of view by the property Hyper, which specifies every lemma’s hyperonym. The tags it includes are shown in Table 5.6. Through the software Nooj it has been possible to create a connection between these dictionaries and an ad hoc FSA for the feature analysis.

| Label | Tag | Entries | Example | Translation |
|---|---|---|---|---|
| body part | Npc | 598 | labbro | lip |
| organism part | Npcorg | 627 | colon | colon |
| text | Ntesti | 1168 | rubrica | address book |
| (part of) article of clothing | Nindu | 1009 | vestaglia | nightgown |
| cosmetic product | Ncos | 66 | rossetto | lipstick |
| food | Ncibo | 1170 | agnello | lamb |
| liquid substance | Nliq | 341 | urina | urine |
| potable liquid substance | Nliqbev | 29 | sidro | cider |
| money | Nmon | 216 | rublo | ruble |
| (part of) building | Nedi | 1580 | torre | tower |
| place | Nloc | 1018 | valle | valley |
| physical substance | Nmat | 1727 | agata | agate |
| (part of) plant | Nbot | 2187 | zucca | pumpkin |
| medicine | Nfarm | 415 | sedativo | sedative |
| drug | Ndrug | 69 | mescalina | mescaline |
| chemical element | Nchim | 1212 | cloruro | chloride |
| (part of) electronic device | Ndisp | 1191 | motosega | chainsaw |
| (part of) vehicle | Nvei | 1082 | gondola | gondola |
| (part of) furniture | Narr | 450 | tavolo | table |
| instrument | Nstrum | 3666 | ventaglio | fan |
| Tot | - | 19946 | - | - |

Table 5.6: Semantically Labeled Dictionary of Concrete Nouns

As we will demonstrate below, the feature classification is strongly dependent on the review domains and topics, on the basis of which specific local grammars need to be adapted (Engström 2004; Andreevskaia and Bergler 2008).
As shown in Figure 5.2, in the Feature Extraction phase the sentences or the noun phrases expressing the opinions on specific features are found in free texts. Annotations regarding their polarity (BENEFIT/DRAWBACK) and their intensity (SCORE=[-3, +3]) are attributed to each sentence on the basis of the opinion words on which they are anchored.
Obviously, the graph reported in Figure 5.2 represents just an extract of the more complex FSA built to perform the task, which includes more than 100 embedded graphs and thousands of different paths, able to describe not only the sentences and the phrases anchored on sentiment adjectives, but also the ones that contain sentiment adverbs, nouns, verbs, idioms and other lexicon-independent opinion-bearing expressions (e.g. essere (dotato + fornito + provvisto) di “to be equipped with”, essere un (aspetto + nota + cosa + lato) negativo “to be a negative side”).
The Contextual Valence Shifters that have been considered are the ones described in Chapter 4.
Unlike the other approaches proposed in the literature, the Feature Pruning task (Figure 5.3), in which the extracted features are grouped together on the basis of their semantic nature, is performed at the same time as the Extraction phase, by applying the dictionary-grammar pair to free texts just once. This happens because the grammar shown in Figure 5.3 is contained in the metanode NG (Nominal Group) of the feature extraction grammar.
Thanks to the instructions contained in this subgraph, the words described in our dictionary of concrete nouns as belonging to the same

hyperonym receive the same annotation. The words labeled with the tags Narr (furniture, e.g. “bed”) and Nindu (article of clothing, e.g. “bath towel”) are annotated as “furnishings” (first path); the words labeled with the tags Ncibo (food, e.g. “cake”) and NliqBev (potable liquid substance, e.g. “coffee”) receive the annotation “foodservice” (second path); and so on.

Figure 5.2: Extract of the Feature Extraction grammar

In order to perfect the results, we also directly included in the nodes of the grammar the words that were important for the feature selection but that were not contained in the dictionary of concrete nouns (abstract nouns, e.g. pranzo “lunch”, vista “view”) and the hyperonyms themselves, which in general correspond to each feature class name (e.g. arredamento “furnishings”, luogo “location”).
Finally, we disambiguated terms that in the objective dictionary were associated with more than one tag, by excluding the wrong interpretations from the corresponding node. For example, in the domain of

hotel reviews, if one comments on the cucina “cooking” (first path) or the soggiorno “sojourn” (third path), he is clearly not talking about rooms (in Italian cucina also means “kitchen” and soggiorno “living room”), but respectively about the food service and the whole holiday experience.

Figure 5.3: Extract of the Feature Pruning metanode

The annotations provided by Nooj, once the lexical and grammatical resources are applied to texts written in natural language, can take the form of XML (eXtensible Markup Language) tags. For this reason, they can be effortlessly recalled into an application that automatically applies the Nooj ad hoc resources and performs statistical analyses on them. Therefore, we dedicated a specific module of Doxa to the feature-based sentiment analysis, which has been tested on a corpus composed of 1000 hotel reviews collected from websites like www.tripadvisor.it, www.expedia.it and www.booking.com.
An extract of the automatically annotated corpus is reported below.

Review

Sono stato in questo albergo quasi per caso dopo aver subito un incidente e devo ammettere che, se non fosse stato per la disponibilità, la cortesia e la professionalità di Mario Pino e Rosa, non avrei potuto risolvere una serie di cose. L’hotel è fantastico, silenzioso e decisamente pulito. La colazione ottima ed il responsabile della sala colazione gentilissimo.

“I ended up in this hotel almost by chance after having an accident, and I must admit that, if it hadn’t been for the willingness, the kindness and the competence of Mario Pino and Rosa, I couldn’t have solved a number of things. The hotel is fantastic, silent and definitely clean. Great the breakfast and very kind the responsible of the breakfast room”

Annotation

Sono stato in questo albergo quasi per caso dopo aver subito un incidente e devo ammettere
che se non fosse stato per <BENEFIT TYPE=ATTITUDES SCORE=2> la disponibilità
</BENEFIT> <BENEFIT TYPE=ATTITUDES SCORE=2> la cortesia </BENEFIT> e
<BENEFIT TYPE=ATTITUDES SCORE=2> la professionalità </BENEFIT> di Mario Pino
e Rosa non avrei potuto risolvere una serie di cose
<BENEFIT TYPE=ACCOMMODATIONS SCORE=3> L’hotel è fantastico </BENEFIT>
<BENEFIT TYPE=LOCATION SCORE=2> silenzioso </BENEFIT> e
<BENEFIT SCORE=3 TYPE=CLEANLINESS> decisamente pulito </BENEFIT>
<BENEFIT TYPE=FOODSERVICE SCORE=3> La colazione ottima </BENEFIT> ed
<BENEFIT TYPE=ATTITUDES SCORE=3> il responsabile della sala colazione gentilissimo
</BENEFIT>
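Annotations of this kind are easy to read back into a program. The following is a minimal sketch, not the Doxa code: it assumes that spans are delimited by `<BENEFIT …>`/`<DRAWBACK …>` opening tags and matching `</…>` closing tags, and that attributes may or may not be quoted (the exact serialization produced by Nooj is not shown in the source). The function name `parse_annotations` and the record keys are illustrative.

```python
import re

# Opening tag, attribute string, annotated span, matching closing tag.
TAG = re.compile(r"<(BENEFIT|DRAWBACK)\b([^>]*)>(.*?)</\1>", re.DOTALL)
# NAME=value pairs, with optional double quotes around the value.
ATTR = re.compile(r'(\w+)="?([^"\s>]+)"?')


def parse_annotations(text):
    """Return one record per annotated sentiment expression."""
    records = []
    for polarity, attrs, span in TAG.findall(text):
        fields = dict(ATTR.findall(attrs))
        records.append({
            "polarity": polarity,
            "type": fields.get("TYPE"),      # feature class, if present
            "score": int(fields["SCORE"]),   # assumed to be always present
            "text": " ".join(span.split()),  # normalise whitespace
        })
    return records
```

Applied to the extract above, this would yield ten records, e.g. one with type `ACCOMMODATIONS` and score 3 for “L’hotel è fantastico”.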

The evaluation presented in Table 5.7 shows that the method proposed in the present work performs very well in the hotel domain. On a sample of 100 documents from the original corpus, we manually calculated the Precision on the Semantic Orientation Identification task (which regards the annotation of the sentences’ polarity and intensity) and on the Feature Classification task (which regards the annotation of the types of the features).

| Method | Precision (SO Identification) | Precision (Classification) | Recall | F-score |
|---|---|---|---|---|
| No residual category | 93.3 | 92.4 | 72.3 | 81.1 |
| Residual category | 92.9 | 82.6 | 78.3 | 80.4 |

Table 5.7: F-score of Feature Extraction and Pruning

Because the classes used to group the features have been identified a priori in our grammar, we also checked the results of our tool with the introduction of a residual category to annotate the features that do not belong to these classes. Even though this causes an increase in Recall, a commensurate decrease of the Precision in the Classification task makes the value of the F-score similar to the one reached when the residual category is not taken into account. The reported F-scores correspond to the harmonic mean of Recall and Classification Precision (e.g. 2 · 92.4 · 72.3 / (92.4 + 72.3) ≈ 81.1). Significant examples of the sentences that have been automatically annotated in our corpus are given below.

(82) Il responsabile della sala colazione gentilissimo
“Very kind the responsible of the breakfast room”
BENEFIT TYPE=ATTITUDES SCORE=3

(83) Le camere erano sporche
“The rooms were dirty”
DRAWBACK TYPE=CLEANLINESS SCORE=-2

(84) L’hotel è vicinissimo alla metro
“The hotel is really close to the tube”
BENEFIT TYPE=LOCATION SCORE=3

In (82) the subjective adjective gentilissimo “very kind”, recognised by the grammar as the superlative form (SCORE=+3) of the positive adjective gentile “kind” (SCORE=+2), makes the grammar recognize the whole sentiment expression and provides the information regarding its polarity and intensity. This is why the FSA associates the tag BENEFIT and the higher score to the feature, which in this case is represented by a noun.
Example (83) shows that the “type” of the features can be indicated not only by nouns, but also by specific adjectives, as happens with sporche “dirty”, which is both an opinion word specifying that the whole expression is negative and a feature class indicator that lets the grammar classify it in the group of “cleanliness” rather than in the group of “accommodations” (which instead would be selected by the noun camera “room”, described in our objective dictionary with the tags Conc “concrete” and Nedi “building”).
In the end, sentence (84) attests the strength of the influence of the review domains: sometimes the lexicon is not enough to correctly extract and classify the product features. Domain-specific lexicon-independent patterns are often required in order to make the analyses adequate (Engström 2004; Andreevskaia and Bergler 2008).
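The grouping of features by hyperonym described in this section can be pictured as a two-step lookup: words placed directly in the grammar nodes win over the tags of the concrete-noun dictionary. The sketch below is a plain Python approximation of that idea, not the Nooj grammar itself. The dictionary excerpt reuses lemmas and tags from Table 5.6 and the class names quoted in the text; the names `TAG_TO_CLASS`, `DICTIONARY`, `NODE_WORDS` and `feature_class` are hypothetical, and tags are compared case-insensitively because the source spells one of them both as Nliqbev and NliqBev.

```python
# Hyperonym tag -> feature class, following the two paths named in the text.
TAG_TO_CLASS = {
    "narr": "furnishings",     # furniture
    "nindu": "furnishings",    # article of clothing
    "ncibo": "foodservice",    # food
    "nliqbev": "foodservice",  # potable liquid substance
}

# Toy excerpt of the concrete-noun dictionary (lemma -> tags), from Table 5.6.
DICTIONARY = {
    "tavolo": ["Narr"],
    "vestaglia": ["Nindu"],
    "agnello": ["Ncibo"],
    "sidro": ["Nliqbev"],
    "rublo": ["Nmon"],
}

# Words written directly in the grammar nodes: class-name hyperonyms and
# domain-specific readings for hotel reviews.
NODE_WORDS = {
    "arredamento": "furnishings",
    "luogo": "location",
    "cucina": "foodservice",   # "cooking", not "kitchen"
}


def feature_class(lemma):
    """Return the feature class of a lemma, or None if it is not covered."""
    if lemma in NODE_WORDS:
        return NODE_WORDS[lemma]
    for tag in DICTIONARY.get(lemma, []):
        cls = TAG_TO_CLASS.get(tag.lower())
        if cls is not None:
            return cls
    return None
```

Returning `None` for uncovered lemmas mirrors the setting without a residual category; mapping them to a catch-all label would correspond to the second row of Table 5.7.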

5.2.3 Opinion Visual Summarization

We said in Section 5.1 that Doxa is a prototype able to analyze opinionated documents from a semantic point of view. It automatically applies the Nooj resources to free texts, sums up the values corresponding to every sentiment expression, standardizes the results by the total number of sentiment expressions contained in the reviews, and then provides statistics about the opinions expressed.
In this Section we show the improved Doxa functionalities dedicated to sentiment analysis based on the opinion features. In the Hotel domain this deeper analysis considerably increases the Precision on the sentence-level SO identification task, which goes from 81.3 to 93.3, without any loss in Recall, which holds steady, rising by just 0.2 percentage points, from 72.1 to 72.3 (see Table 5.7).
The feature-based module of Doxa, completely grounded on the automatic analysis of free texts, allows the comparison of more than one object on the basis of the different kinds of aspects that characterize them.
The regular decagon in the center of the spider graphs of Figure 5.4 represents the threshold value (zero) with which every group of opinions on the same object is compared: when a figure is larger than it, the opinion is positive; when it is smaller, the opinion is negative; when a feature’s value is zero, it means that the consumers did not express any judgment on that topic.
The automatic transformation of non-structured reviews into structured visual summaries has a strong impact on the work of company managers, who must ground their marketing strategies on the analysis of a large amount of information, and on individual consumers, who need to quickly compare the Pros and Cons of the products or services they want to buy.
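The summarization step can be illustrated with a few lines of Python. This is a sketch under assumptions rather than the Doxa implementation: scores are summed per feature class and divided by the total number of sentiment expressions, which is one plausible reading of “standardizes the results by the total number of sentiment expressions”; the names `summarize` and `reading` are made up for the example.

```python
from collections import defaultdict


def summarize(expressions):
    """Aggregate (feature_type, score) pairs into one value per feature type.

    Scores are expected in the range [-3, +3]; the normalisation by the
    total number of expressions is an assumption about the original tool.
    """
    totals = defaultdict(int)
    count = 0
    for feature_type, score in expressions:
        totals[feature_type] += score
        count += 1
    if count == 0:
        return {}
    return {feature_type: total / count for feature_type, total in totals.items()}


def reading(value):
    """Interpret a value against the zero threshold of the spider graph."""
    if value > 0:
        return "positive"
    if value < 0:
        return "negative"
    return "no judgment"
```

With the pairs `[("CLEANLINESS", 3), ("CLEANLINESS", -2), ("LOCATION", 3), ("ATTITUDES", 3)]`, the cleanliness axis gets (3 − 2) / 4 = 0.25 and is read as positive.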

5.3 Sentiment Role Labeling

Semantic Roles, also called theta or thematic roles (Chomsky 1993; Jackendoff 1990; see Section 2.1.4), are linguistic devices able to identify in texts “Who did What to Whom, and How, When and Where” (Palmer et al. 2010, p. 1).
The main task of an SRL system is to map the syntactic elements of free text sentences to their semantic representation, through the identification of all the syntactic functions of clause complements and their labeling with appropriate semantic roles (Rahimi Rastgar and Razavi 2013; Monachesi 2009).

Figure 5.4: Doxa module for Feature-based Sentiment Analysis

To work, an SRL system needs machine-readable lexical resources in which all the predicates and their relative arguments are specified. FrameNet (Baker et al. 1998; Fillmore et al. 2002; Ruppenhofer et al. 2006), VerbNet (Schuler 2005) and the PropBank Frame Files (Kingsbury and Palmer 2002) provide the basis for large-scale semantic annotation of corpora (Giuglea and Moschitti 2006; Palmer et al. 2010).
In this Section, in accordance with the Semantic Predicates theory (Gross 1981), we performed a matching between the definitional syntactic structures attributed to each class of verbs and the semantic information we attached in the database to every lexical entry.
This way, we could create a strict connection between the arguments selected by a given Predicate listed in our tables and the actants involved in the same verb’s Semantic Frame (Fillmore 1976; Fillmore and Baker 2001; Fillmore 2006). In the experiment that we are going to describe, we focused on the following frames:

• “Sentiment” (e.g. odiare “to hate”), which involves the roles experiencer and stimulus

• “Opinion” (e.g. difendere “to stand up for”), which implicates the roles opinion holder, target of the opinion and cause of the opinion

• “Physical act” (e.g. baciare “to kiss”), which evokes the roles patient and agent

5.3.1 Related Works

A group of studies in the field of Semantic Role Labeling focused on the automatic annotation of the roles that are relevant to Sentiment Analysis and opinion mining, such as source/opinion holder and target/theme/topic.
Bethard et al. (2004), with question answering purposes, proposed an extension of semantic parsing techniques, coupled with additional lexical and syntactic features, for the automatic extraction of propositional opinions. The algorithm used to identify and label semantic constituents is the one developed by Gildea and Jurafsky (2002) and Pradhan et al. (2003).
Kim and Hovy (2006) presented a Semantic Role Labeling system based on the Frame Semantics of Fillmore (1976) that aimed to tag the frame elements semantically related to an opinion-bearing word expressed in a sentence, which is considered the key indicator of an opinion. The learning algorithm used for the classification model is Maximum Entropy (Berger et al. 1996; Kim and Hovy 2005). FrameNet II (Ruppenhofer et al. 2006) has been used to collect frames related to opinion words.
Other works based on FrameNet are Houen (2011), which used Frame-Frame relations in order to expand the subset of polarity-scored Frames found in the FrameNet database, and Ruppenhofer et al. (2008), Ruppenhofer and Rehbein (2012) and Ruppenhofer (2013), which added opinion frames with slots for source, target, polarity and intensity to the FrameNet database, to build SentiFrameNet, a lexical database enriched with inherently evaluative items associated with opinion frames.
Lexicon-based approaches include the one of Kim et al. (2008), which made use of a collection of communication and appraisal verbs, the SentiWordNet lexicon, a syntactic parser and a named entity recognizer for the extraction of opinion holders from texts, and the one of Bloom et al. (2007), which extracted domain-dependent opinion holders through a combination of heuristic shallow parsing and dependency parsing.
Conditional Random Fields (Lafferty et al. 2001) is the method used by Choi et al. (2005), who, interpreting the semantic role labeling task as an information extraction problem, exploited a hybrid approach that combines two very different learning-based methods: CRF for the named entity recognition and a variation of AutoSlog (Riloff 1996) for the information extraction. CRF have also been used by Choi et al. (2006), who presented a global inference approach to jointly extract entities and relations in the context of opinion-oriented information extraction, through two separate token-level sequence-tagging classifiers for opinion expression extraction and source extraction via linear-chain CRF (Lafferty et al. 2001).

5.3.2 Experiment and Evaluation

The starting point of our experiment on SRL is the set of LG tables of the Italian verbs developed at the Department of Communication Science of the University of Salerno. Among the lexical entries listed in such matrices, we manually extracted all the verbs endowed with a defined positive or negative semantic orientation. Such verbs can express emotions, opinions or physical acts. Details of this classification can be found in Section 3.3.3.
As an example, in Tables 5.8 and 5.9 we show a small group of verbs belonging to the Lexicon-Grammar class 45. This class includes all the verbs that can enter into a syntactic structure such as N0 V di N1, in which the “subject” (N0) selected by the verb (V) is generally a human noun (Num) and the complement (N1) is a completive (Ch F) or infinitive (V-inf comp) clause, usually introduced by the preposition “di” (see Table 5.8). As shown in Table 5.9, our databases also contain semantic information concerning the nature, the semantic orientation and the strength of the Predicates under examination.
In detail, the tagset used in this experiment is the following:

1. Type

(a) SENT: sentiment

(b) OP: opinion

(c) PHY: physical act

2. Orientation

(a) POS: positive

(b) NEG: negative

3. Intensity

(a) FORTE: intense

(b) DEB: feeble

| N0 = Num | N0 = il fatto Ch F | Verb | N1 = che F | N1 = che Fcong | di N1um | di N1-um | di N1 contro N2 | prep V-inf comp |
|---|---|---|---|---|---|---|---|---|
| + | - | profittare | + | + | + | + | + | - |
| + | - | ridersene | + | + | + | + | - | - |
| + | - | risentirsi | + | + | + | + | - | + |
| + | - | strafottersene | + | + | + | + | - | - |
| + | - | vergognarsi | + | + | + | + | - | + |

Table 5.8: Extract of the LG table of the verb Class 45
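A Lexicon-Grammar table such as Table 5.8 is a binary matrix, so it can be queried both by row (which structures a verb accepts) and by column (which verbs accept a structure). The snippet below is a small sketch of that data structure in Python; the column order follows the reconstruction of Table 5.8 given above, and the names `COLUMNS`, `ROWS`, `accepted` and `verbs_accepting` are illustrative, not part of the LG databases.

```python
# Property columns of Table 5.8 (the verb column itself is left out).
COLUMNS = [
    "N0 = Num", "N0 = il fatto Ch F", "N1 = che F", "N1 = che Fcong",
    "di N1um", "di N1-um", "di N1 contro N2", "prep V-inf comp",
]

# One string of +/- values per verb, aligned with COLUMNS.
ROWS = {
    "profittare":     "+-+++++-",
    "ridersene":      "+-++++--",
    "risentirsi":     "+-++++-+",
    "strafottersene": "+-++++--",
    "vergognarsi":    "+-++++-+",
}


def accepted(verb):
    """List the properties marked '+' for a verb."""
    return [col for col, value in zip(COLUMNS, ROWS[verb]) if value == "+"]


def verbs_accepting(column):
    """List, in alphabetical order, the verbs marked '+' for a property."""
    index = COLUMNS.index(column)
    return sorted(verb for verb, row in ROWS.items() if row[index] == "+")
```

For example, `verbs_accepting("prep V-inf comp")` returns `["risentirsi", "vergognarsi"]`, the two verbs of the extract that take an infinitive complement.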

Speaking in terms of Frame Semantics, we identified in the Opinion Mining and Sentiment Analysis fields three Frames of interest, recalled by specific Predicates: Sentiment, Opinion and Physical act.
The frame elements evoked by such frames are described below.

Sentiment

It refers to the expression of any given frame of mind or affective state.
The “sentiment” words can be connected with some WordNet-Affect categories (Strapparava et al. 2004), such as emotion, mood, hedonic signal.
Examples are sdegnarsi “to be indignant” (class 10), odiare “to hate” (class 20), affezionarsi “to grow fond” (class 44B), flirtare “to flirt” (class 9), disprezzare “to despise” (class 20), gioire “to rejoice” (class 45).
Predicates of that kind evoke as frame elements an experiencer, who feels the emotion or other internal states, and a stimulus, an event or a person that instigates such states (Gross 1995; Gildea and Jurafsky 2002; Swier and Stevenson 2004; Palmer et al. 2005; Elia 2013). This semantic frame summarizes the FrameNet ones connected to emotions, such as Sensation, Emotions_of_mental_activity, Cause_to_experience, Emotion_active, Emotions, Cause_emotion, etc.

Opinion

The type “Opinion”, instead, is the expression of positive or negative viewpoints, beliefs or judgments that can be personal or shared by most people. Among the WN-Affect categories, it comprises trait, cognitive state, behavior, attitude. Examples are ignorare “to neglect” (class 20), premiare “to reward” (class 20), difendere “to defend” (class 27), esaltare “to exalt” (class 22), dubitare “to doubt” (class 45), condannare “to condemn” (class 49), deridere “to make fun of” (class 50).
The frame elements they evoke are an opinion holder, who states an opinion about an object or an event, and an opinion target, which represents the event or the object on which the opinion is expressed (Kim and Hovy 2006; Liu 2012). Some predicates also imply the presence of a cause that motivates the opinion (see Section 3.3.3).
In the FrameNet frames Opinion and Judgment the opinion holder is called Cognizer, but we preferred to use a word which is more common in the Sentiment Analysis and Opinion Mining literature.

Physical act

The type “Physical act” comprises verbs like baciare “to kiss” (class 18), suicidarsi “to commit suicide” (class 2), vomitare “to vomit” (class 2A), schiaffeggiare “to slap” (class 20), sparare “to shoot” (class 4), palpeggiare “to grope” (class 18).
For this group of predicates the selected frame elements are a patient, that is the victim (for the negative actions) or the beneficiary (for the positive ones) of the physical act, carried out by an agent (Carreras and Màrquez 2005; Màrquez et al. 2008). It includes a large number of FrameNet frames, such as Cause_bodily_experience, Shoot_projectiles, Killing, Cause_harm, Rape, Sex, Violence, etc.

In order to perform the orientation and the intensity attribution, we manually explored the Italian LG tables of verbs and the SentIta semantic labels.
Thanks to lexical resources of this kind, it is possible to automatically extract and semantically describe real occurrences of sentences like (85):

(85) [Sentiment [Experiencer Renzi(N0)] si vergogna(V) [Stimulus di parlare di energia in Europa(N1)]]

“Renzi feels ashamed of talking about energy in Europe”

in which the syntactic structure of the verb vergognarsi “to feel ashamed”, N0 V (di) Ch F, is matched by means of interpretation rules (e.g. V = N0 V (di) Ch F = Caus(s, sent, h); N0 = h; N1 = s) to the semantic function Caus(s, sent, h), which puts in relation an experiencer h and a stimulus s thanks to a Sentiment Semantic Predicate sent (Gross 1995; see Section 2.1.4).
Moreover, we provided our LG databases with the specification of the arguments (N0, N1, N2, etc.) that are semantically influenced by the semantic orientation of the verbs. The purpose is to correctly identify them as features of the opinionated sentences, and to use them also in feature-based sentiment analysis tasks.
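The way an interpretation rule turns syntactic slots into semantic roles can be sketched as follows. The code is an illustration under assumptions, not the system used in the experiment: the lexicon entry for vergognarsi uses the values reported for class 45, only the Sentiment rule quoted above (N0 = experiencer, N1 = stimulus) is encoded, and the arguments are assumed to have been already isolated by a parser. The names `LEXICON`, `RULES` and `label` are hypothetical.

```python
# Semantic description of a class 45 verb (type, orientation, intensity).
LEXICON = {
    "vergognarsi": {
        "lg_class": "45",
        "type": "sentiment",
        "orientation": "negative",
        "intensity": "strong",
    },
}

# Interpretation rules: (LG class, type) -> syntactic slot -> semantic role.
# Only the rule quoted in the text, Caus(s, sent, h) with N0 = h and N1 = s.
RULES = {
    ("45", "sentiment"): {"N0": "Experiencer", "N1": "Stimulus"},
}


def label(verb, arguments):
    """Map the syntactic arguments of `verb` to the roles of its frame."""
    entry = LEXICON[verb]
    rule = RULES[(entry["lg_class"], entry["type"])]
    roles = {rule[slot]: text for slot, text in arguments.items() if slot in rule}
    return {
        "frame": entry["type"].capitalize(),
        "orientation": entry["orientation"],
        "intensity": entry["intensity"],
        "roles": roles,
    }
```

Calling `label("vergognarsi", {"N0": "Renzi", "N1": "di parlare di energia in Europa"})` reproduces the bracketing of example (85): the frame is Sentiment, Renzi is the Experiencer and the infinitive clause is the Stimulus.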

| Verb | LG Class | Type | Orientation | Intensity | Influence |
|---|---|---|---|---|---|
| profittare | 45 | opinion | negative | - | N0 |
| ridersene | 45 | opinion | negative | weak | N1 |
| risentirsi | 45 | sentiment | negative | - | N0 |
| strafottersene | 45 | opinion | negative | strong | N1 |
| vergognarsi | 45 | sentiment | negative | strong | N0 |

Table 5.9: Examples of semantic descriptions of the opinionated verbs from the LG class 45

The reliability of the LG method on Semantic Role Labeling in the Sentiment Analysis task has been tested on three different datasets, two of which have been extracted from social networks or web resources.
In detail, the first two datasets come from Twitter, and the third is a free web news headings dataset provided by DataMediaHub (www.datamediahub.it, www.humanhighway.it).
The tweets have been downloaded using the hashtag that groups together the user comments on the election of the Italian President Mattarella (1) and the one that collects the comments on the Italian TV show Masterchef (2).

• Tweets: 46393 tweets

1. Mattarellapresidente: 10000 tweets

2. Masterchefit: 36393 tweets

• News Headings: 80651 titles

The LG-based approach on which this experiment has been built includes the following basic steps:

• a pre-processing pipeline that includes two phases:

– a cleaning-up phase, carried out with Python routines, that aims to distinguish, in the datasets, linguistic elements from structural elements (e.g. markup information, web-specific elements)

– an automatic linguistic analysis phase, with the goal of linguistically standardizing the relevant elements obtained from the cleaned datasets; in this phase, texts are tokenized, lemmatized and POS-tagged using TreeTagger (Schmid 1994; Schmid et al. 2007) and then parsed using DeSR, a dependency parser (Attardi et al. 2009)

• an LG-based automatic analysis, in which the raw data are semantically labeled in accordance with the syntactic/semantic rules of interpretation connected with each LG verb class
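The cleaning-up phase of the pipeline above can be pictured with a short routine based on the Python standard library. The original routines are not published in the text, so this is only a sketch of what such a step might look like: the regular expressions, the decision to drop mentions and to keep hashtag words, and the function name `clean` are all assumptions. Tokenization, tagging and parsing are left to the external tools named above and are not reproduced here.

```python
import html
import re

MARKUP = re.compile(r"<[^>]+>")               # residual HTML/XML tags
URL = re.compile(r"https?://\S+|www\.\S+")    # web addresses
RETWEET = re.compile(r"^RT\s+@\w+:?\s*")      # retweet prefix
MENTION = re.compile(r"@\w+")                 # user mentions


def clean(text):
    """Strip structural and web-specific elements from a short text."""
    text = html.unescape(text)          # "&amp;" -> "&"
    text = MARKUP.sub(" ", text)
    text = URL.sub(" ", text)
    text = RETWEET.sub("", text.strip())
    text = MENTION.sub(" ", text)
    text = text.replace("#", "")        # keep the hashtag word, drop the symbol
    return " ".join(text.split())       # collapse whitespace
```

For instance, `clean("RT @user: Che vergogna! #Masterchefit <b>stasera</b> https://t.co/abc &amp; basta")` gives `"Che vergogna! Masterchefit stasera & basta"`.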

Figure 5.5: Examples of syntactically and semantically annotated sentences


Figure 5.5 presents three headline examples processed both with the dependency syntactic parser and with the LG-based semantic analyzer.
Notice that the elements of the traditional grammar automatically identified by DeSR, such as subjects and complements, have been renamed in accordance with the Lexicon-Grammar tradition.
The corpus, which counts 127044 short texts (46393 tweets plus 80651 headings), has been analyzed and semantically and syntactically annotated.
The representative sample on which the human evaluation has been performed, instead, has 42348 texts.
The evaluation of the performance of our tool proved the effectiveness of the Lexicon-Grammar approach. The average F-scores achieved in the different datasets are 0.71 in the Twitter corpus and 0.76 in the Heading corpus.

Chapter 6

Conclusion

The present research addressed the detection of linguistic phenomena connected to subjectivity, emotions and opinions from a computational point of view.

The need to quickly monitor huge quantities of semi-structured and unstructured data from the web poses several challenges to Natural Language Processing, which must provide strategies and tools to analyze their structures from the lexical, syntactic and semantic points of view.

The general aim of Sentiment Analysis, shared with the broader fields of NLP, Data Mining, Information Extraction, etc., is the automatic extraction of value from chaos (Gantz and Reinsel, 2011); its specific focus, instead, is on opinions rather than on factual information. This is the aspect that differentiates it from other computational linguistics subfields.

The majority of sentiment lexicons have been manually or automatically created for the English language; therefore, existing Italian lexicons are mostly built through the translation and adaptation of English lexical databases, e.g. SentiWordNet and WordNet-Affect.

Unlike many other Italian and English sentiment lexicons, SentIta, built on the interaction of electronic dictionaries and lexicon-dependent local grammars, is able to manage simple and multiword structures that can take the shape of distributionally free structures, distributionally restricted structures and frozen structures.

Moreover, differently from other lexicon-based Sentiment Analysis methods, our approach has been grounded on the solidity of the Lexicon-Grammar resources and classifications, which provide fine-grained semantic, but also syntactic, descriptions of the lexical entries.

Such lexically exhaustive grammars distance themselves from the tendency of other sentiment resources to classify together words that have nothing in common from the syntactic point of view.

Furthermore, through the Semantic Predicates theory, it has been possible to postulate a complex parallelism between syntax and semantics that does not ignore the numerous idiosyncrasies of the lexicon.

In line with the major contributions in the Sentiment Analysis literature, we did not consider polar words in isolation. We computed their elementary sentence contexts, with the allowed transformations, and then their interaction with contextual valence shifters, the linguistic devices that are able to modify the prior polarity of the words from SentIta when occurring with them in the same sentences.

In order to do so, we took advantage of the computational power of finite-state technology. We formalized a set of rules that work for intensification, downtoning and negation modeling, modality detection and the analysis of comparative forms.

Here, the difference from other state-of-the-art strategies consists in the elimination of complex mathematical calculation in favor of the easier use of embedded graphs as containers for the expressions designed to receive the same annotations within a compositional framework.

With regard to the applicative part of the research, we conducted, with satisfactory results, three experiments on the same number of Sentiment Analysis subtasks: the sentiment classification of documents and sentences, the feature-based Sentiment Analysis and the semantic role labeling based on sentiments.

This work exploited a knowledge-based method, but this does not run counter to statistical strategies. All the semantically annotated datasets produced by our fine-grained analysis can indeed be reused as testing sets for learning machines.

As concerns the limitations of this research, we mention, among others, the cases of irony, sarcasm and cultural stereotypes, which still remain open problems for NLP in general and for Sentiment Analysis in particular, since they can completely overturn the polarity of the sentences.

Sarcasm, hyperbole, jocularity, etc. are all rhetorical devices used to express verbal irony, a figurative linguistic process used to intentionally deny what is literally expressed (Gibbs, 2000; Curcó, 2000). Therefore, irony could be interpreted as a sort of indirect negation (Giora, 1995; Giora et al., 1998) in which some expectations are violated (Clark and Gerrig, 1984; Wilson and Sperber, 1992). In this regard, Reyes et al. (2013, p. 243) affirm that irony is perceived “at the boundaries of conflicting frames of reference, in which an expectation of one frame has been inappropriately violated in a way that is appropriate in the other”.

This linguistic device has been explored both in computational linguistics studies and in Opinion Mining works. Sets of potential indicators have been tested on large corpora through different approaches, obtaining encouraging results over the state of the art in some respects.

In detail, Tepperman et al. (2006) used prosodic, spectral and contextual cues; Carvalho et al. (2009) selected as irony cues emoticons, onomatopoeic expressions for laughter, heavy punctuation marks, quotation marks and positive interjections; Davidov et al. (2010) exploited the meta-data provided by Amazon to locate gaps between the star ratings and the sentiments expressed in the raw reviews; Reyes and Rosso (2011) chose n-grams, part-of-speech n-grams, funny profiling, positive/negative profiling, affective profiling and pleasantness profiling; González-Ibánez et al. (2011) identified sarcasm through punctuation marks, emoticons, quotes, capitalized words, discursive terms that hint at opposition or contradiction in a text, temporal and contextual imbalance, character-grams, skip-grams and emotional scenarios concerning activation, imagery and pleasantness; Maynard and Greenwood (2014) exploited hashtag tokenization.

Among others, Veale and Hao (2009) carried out a very interesting study on creative irony, focusing on ironic comparisons of the kind A Topic is as Ground as a Vehicle (Fishelov, 1993; Moon, 2008), which can consist of pre-fabricated similes, e.g. “as strong as an ox” (Taylor, 1954; Norrick, 1986; Moon, 2008), or of new, more emphatic and creative variations on the frozen similes, e.g. “as dangerous as a toothless wolf”. The authors found out that the 2 of these elaborations subvert the original simile to achieve an ironic effect. The problem is that sarcasm in general is a phenomenon inherently difficult to analyze for machines, and often for humans as well (Justo et al., 2014; Maynard and Greenwood, 2014). In addition, the barriers to the creation of brand new comparisons are very low (Veale and Hao, 2009).

In any case, the solutions to verbal irony proposed in the literature seem far from being accurate and generalizable. They frequently lead to contradictory results, as with the use of punctuation marks, which made Carvalho et al. (2009) and Tepperman et al. (2006) reach relatively high precision but served as very weak predictors for Justo et al. (2014) and Tsur et al. (2010).

In this work, speaking in cost-benefit terms, we realized that the quantity of errors introduced by the use of irony cues would have exceeded the errors caused by their absence. Therefore, we limited the irony analysis to the cases of frozen comparative sentences described in Section 3.4.2, which can attribute a semantic orientation to neutral words (86) or reverse the orientation of polar words (87).

(86) Arturo è bianco[0] come un cadavere [-2]
“Arturo is as white as a dead body (Arturo is pale)”

(87) Maria è agile[+2] come una gatta di piombo [-2]
“Maria is as agile as a lead cat (Maria is not agile)”
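The treatment of frozen comparative sentences in (86) and (87) can be sketched as a lookup in which the orientation of the frozen comparison overrides the prior polarity of the adjective. This is only a minimal Python illustration, not the NooJ finite-state grammars used in this work: the dictionary names, the function name and the two-entry toy lexicon are assumptions introduced here, with the scores copied from the two examples.

```python
from typing import Optional

# Prior polarities of the adjectives, as annotated in examples (86) and (87).
ADJECTIVE_PRIOR = {"bianco": 0, "agile": 2}

# Toy lexicon of frozen comparisons (hypothetical structure): each key pairs an
# adjective with its frozen comparative complement; the value is the
# orientation annotated in the examples.
FROZEN_COMPARISONS = {
    ("bianco", "come un cadavere"): -2,
    ("agile", "come una gatta di piombo"): -2,
}


def orientation(adjective: str, comparison: Optional[str] = None) -> int:
    """Return the frozen comparison's score if listed, else the adjective's prior."""
    if comparison is not None and (adjective, comparison) in FROZEN_COMPARISONS:
        return FROZEN_COMPARISONS[(adjective, comparison)]
    # Adjectives missing from the toy lexicon are treated as neutral here.
    return ADJECTIVE_PRIOR.get(adjective, 0)


print(orientation("bianco", "come un cadavere"))          # neutral word receives an orientation
print(orientation("agile", "come una gatta di piombo"))   # polar word is reversed
print(orientation("agile"))                               # no comparison: prior polarity
```

Any comparison missing from the lexicon falls back to the adjective's prior polarity, so a lookup of this kind cannot detect novel ironic comparisons.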

Anyway, as can be noticed in (88) and (89), ironic comparisons can vary considerably from one case to another (Veale and Hao, 2009) and cannot be distinguished automatically from non-ironic comparisons (90) as long as we do not include (domain-dependent) encyclopedic knowledge in the analysis tools.

(88) La ripresa è degna[+2] di un trattore con aratro inserito
“The pickup is worthy of a tractor with an inserted plough”

(89) E quel tocco di piccante (...) è gradevole[+2] quanto lo sarebbe una spruzzata di pepe su un gelato alla panna
“And that touch of piquancy (...) is as pleasant as a sprinkling of pepper on a cream-flavoured ice cream”

(90) Gli spazi interni sono comodi[+2] come una berlina [+2]
“Inside spaces are as comfortable as a sedan”

Similar comments on the required encyclopedic knowledge can be made about cultural stereotypes as well (91, 92).

(91) La nuova fiat 500 è consigliabile[+2] molto di più ad una ragazza [-2]
“The new Fiat 500 is recommended a lot more to a girl”

(92) Un gioco per bambini di 12 anni [-2]
“A game for 12-year-old children”

Appendix A

Italian Lexicon-Grammar Symbols

Agg Val Evaluative adjective

Agg Adjective

Avv Adverb

Ci Constrained nominal group

Ch F Completive complement without specification of mood

Det Determiner

N0 Sentence formal subject

N1,2,3 Sentence complements

Ni Nominal group

N Noun

Napp Appropriate noun

Nconc Concrete noun

Npc Noun of body part

Nsent Noun of Sentiment

Num Human noun

N-um Non-human noun

Ppv Pronoun in preverbal position

Prep Preposition

V Verb

V-a Adjectivalization

Vcaus Causative verb

V-n Nominalization

Vsup Support verb

Vval Evaluative verb

Vvar Support verb variant

Appendix B

Acronyms

ALU Atomic Linguistic Units

cBNP Combined Pattern Based Noun Phrases

CCG Combinatory Categorial Grammar

CFE Concordance-based Feature Extraction

CRF Conditional Random Field

CSR Class Sequential Rule

CVS Contextual Valence Shifters

DRS Discourse Representation Structure

ERTN Enhanced Recursive Transition Network

eWOM electronic Word of Mouth

FMM Forward Maximum Matching

FSA Finite State Automata

FST Finite State Transducers

LDA Latent Dirichlet Allocation

LG Lexicon-Grammar

LSA Latent Semantic Analysis

LSF Lexical Syntactic Feature

mCVS Morphological Contextual Valence Shifters

MPQA Multi Perspective Question Answering

NLP Natural Language Processing

PMI Pointwise Mutual Information

POS Part of Speech

PP Prior Polarity

QN Quality Noun

RNTN Recursive Neural Tensor Network

RST Relevant Semantic Tree

RTN Recursive Transition Network

SO Semantic Orientation

SVM Support Vector Machine

TGG Transformational-Generative Grammar

UDF User Defined taxonomy of entity Feature

WMI Web-based Mutual Information

XML eXtensible Markup Language

List of Figures

2.1 Syntactic Trees of the sentences (1) and (2)
2.2 Syntactic Trees of the sentences (1b) and (2b)
2.3 LG syntax-semantic interface
2.4 Deterministic Finite-State Automaton
2.5 Non-deterministic Finite-State Automaton
2.6 Finite-State Transducer
2.7 Enhanced Recursive Transition Network

3.1 FSA for the interaction between Nsent and semantically annotated verbs
3.2 Extract of the morphological FSA that associates psych verbs to their adjectivalizations
3.3 Frozen and semi-frozen expressions that involve the vulgar word cazzo and its euphemisms
3.4 Extract of the FSA for the SO identification of idioms
3.5 Extract of the FSA for the extraction of PECO idioms
3.6 Extract of the FSA for the automatic annotation of sentiment adverbs
3.7 Extract of the FSA for the automatic annotation of quality nouns
3.8 Morphological Contextual Valence Shifting of Adjectives

5.1 Doxa, the LG sentiment classifier based on the finite-state technology
5.2 Extract of the Feature Extraction grammar
5.3 Extract of the Feature Pruning metanode
5.4 Doxa module for Feature-based Sentiment Analysis
5.5 Examples of syntactically and semantically annotated sentences

List of Tables

2.1 Example of a Lexicon-Grammar Binary Matrix

3.1 Manually built Polarity Lexicons
3.2 (Semi-)Automatically built Polarity Lexicons
3.3 Sentix (Basile and Nissim, 2013)
3.4 SentIta Evaluation tagset
3.5 SentIta Strength tagset
3.6 Composition of SentIta
3.7 SentIta tag distribution in the adjective list
3.8 Adjectives percentage values in SentIta
3.9 Examples from the Psych Predicates dictionary
3.10 Psych Predicates chosen for SentIta
3.11 All the Semantic Predicates from SentIta
3.12 Polar verbs in the main Lexicon-Grammar verbal subclasses
3.13 Polar verbs in other LG verb classes which contain at least one oriented item
3.14 Distribution of semantic labels into the LG verb classes that contain at least one oriented item
3.15 Description of the Nominalizations of the Psychological Semantic Predicates
3.16 Nominalizations of the Psych Predicates chosen from SentIta
3.17 Distribution of semantic tags in the dictionary of Adjectival Psych Predicates
3.18 Percentage values of Psych Adjectivalizations in SentIta
3.19 Suffixes from the dictionary of Psych Adjectivalizations
3.20 Adjectivalizations of the Psych Predicates
3.21 SentIta tag distribution in the list of bad words
3.22 Bad Words percentage values in SentIta
3.23 Classes of Compound Adverbs in SentIta
3.24 Composition of the compound adverb dictionary
3.25 Polar adjectives in frozen sentences
3.26 Examples of idioms that maintain, switch or shift the prior polarity of the adjectives they contain
3.27 Composition of the adverb dictionary
3.28 Adjective’s inflectional classes (Salvi and Vanelli, 2004)
3.29 Rules for the adverb derivation
3.30 Error analysis of the automatic QN annotation
3.31 Presence of the QN suffixes in different dictionaries
3.32 Productivity of the QN suffixes in different dictionaries
3.33 About the overlap among Psych V-n dictionary and QN dictionary
3.34 Precision measure about the overlap of QN and Psy V-n
3.35 Negation suffixes
3.36 Intensifier/downtoner suffixes

4.1 The effects of troppo, poco and abbastanza on polar lexical items
4.2 The effects of the co-occurrence and the negation of troppo poco with reference to polar lexical items
4.3 Negation rules
4.4 Negation rules with the repetition of negative operators
4.5 Negation rules with strong operators
4.6 Negation rules with weak operators
4.7 Rule I: increasing comparative sentences with positive orientation
4.8 Rule III: decreasing comparative sentences with negative orientation
4.9 Rule IV: increasing comparative sentences with negative orientation
4.10 Rule VI: decreasing comparative sentences with positive orientation
4.11 Negation of Rule I: negated increasing comparative sentences with negative orientation
4.12 Negation of Rule III: negated decreasing comparative sentences with positive orientation
4.13 Negation of Rule IV: negated increasing comparative sentences with negative orientation
4.14 Negation of Rule VI: negated decreasing comparative sentences with positive orientation

5.1 Dataset of opinionated online customer reviews
5.2 Precision measure in sentence-level classification
5.3 Irrelevant matches in sentence-level classification
5.4 Precision measure in document-level classification
5.5 Recall in both the sentence-level and the document-level classifications
5.6 Semantically Labeled Dictionary of Concrete Nouns
5.7 F-score of Feature Extraction and Pruning
5.8 Extract of the LG table of the verb Class 45
5.9 Examples of semantic descriptions of the opinionated verbs from the LG class 45

Bibliography

Abdul-Mageed, M. and Diab, M. (2012). Toward building a large-scale Arabic sentiment lexicon. In Proceedings of the 6th International Global WordNet Conference, pages 18–22.

Abdul-Mageed, M., Diab, M. T., and Korayem, M. (2011). Subjectivity and sentiment analysis of modern standard Arabic. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers-Volume 2, pages 587–591. Association for Computational Linguistics.

Agerri, R. and García-Serrano, A. (2010). Q-WordNet: Extracting polarity from WordNet senses. In Proceedings of the International Conference on Language Resources and Evaluation, pages 2300–2305.

Akerlof, G. A. (1970). The market for “lemons”: Quality uncertainty and the market mechanism. In The quarterly journal of economics, pages 488–500. JSTOR.

Alba, J., Lynch, J., Weitz, B., Janiszewski, C., Lutz, R., Sawyer, A., and Wood, S. (1997). Interactive home shopping: consumer, retailer, and manufacturer incentives to participate in electronic marketplaces. In The Journal of Marketing, pages 38–53. JSTOR.

Alm, C. O., Roth, D., and Sproat, R. (2005). Emotions from text: machine learning for text-based emotion prediction. In Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, pages 579–586. Association for Computational Linguistics.

Andreevskaia, A. and Bergler, S. (2008). When specialists and generalists work together: Overcoming domain dependence in sentiment tagging. In ACL, pages 290–298.

Arad, M. (1998). Psych-notes. In UCL Working Papers in Linguistics, volume 10, pages 1–22.

Argamon, S., Bloom, K., Esuli, A., and Sebastiani, F. (2009). Automatically determining attitude type and force for sentiment analysis. In Human Language Technology. Challenges of the Information Society, pages 218–231. Springer.

Attardi, G., Dell’Orletta, F., Simi, M., and Turian, J. (2009). Accurate dependency parsing with a stacked multilayer perceptron. In Proceedings of EVALITA, volume 9.

Aue, A. and Gamon, M. Customizing sentiment classifiers to new domains: A case study. In Proceedings of recent advances in natural language processing (RANLP).

Baccianella, S., Esuli, A., and Sebastiani, F. (2010). SentiWordNet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining. In LREC, volume 10, pages 2200–2204.

Baker, C. F., Fillmore, C. J., and Lowe, J. B. (1998). The Berkeley FrameNet project. In Proceedings of the 17th international conference on Computational linguistics-Volume 1, pages 86–90. Association for Computational Linguistics.

Baker, M. (1988). Theta theory and the syntax of applicatives in Chichewa. In Natural Language & Linguistic Theory, volume 6, pages 353–389. Springer.

Baker, M. C. (1997). Thematic roles and syntactic structure. In Elements of grammar, pages 73–137. Springer.

Balahur, A. and Turchi, M. (2012). Multilingual sentiment analysis using machine translation. In Proceedings of the 3rd Workshop in Computational Approaches to Subjectivity and Sentiment Analysis, pages 52–60. Association for Computational Linguistics.

Baldoni, M., Baroglio, C., Patti, V., and Rena, P. (2012). From tags to emotions: Ontology-driven sentiment analysis in the social semantic web. In Intelligenza Artificiale, volume 6, pages 41–54.

Balibar-Mrabti, A. (1995). Une étude de la combinatoire des noms de sentiment dans une grammaire locale. In Langue française, pages 88–97. JSTOR.

Baptista, J. (2003). Some families of compound temporal adverbs in Portuguese. In Proceedings of the Workshop on Finite-State Methods for Natural Language Processing.

Baroni, M. and Vegnaduzzo, S. (2004). Identifying subjective adjectives through web-based mutual information. In Proceedings of KONVENS, volume 4, pages 17–24.

Basile, V. and Nissim, M. (2013). Sentiment analysis on Italian tweets. In Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 100–107.

Belletti, A. and Rizzi, L. (1988). Psych-verbs and θ-theory. In Natural Language & Linguistic Theory, volume 6, pages 291–352. Springer.

Benamara, F., Cesarano, C., Picariello, A., Recupero, D. R., and Subrahmanian, V. S. (2007). Sentiment analysis: Adjectives and adverbs are better than adjectives alone. In ICWSM.

Benamara, F., Chardon, B., Mathieu, Y., Popescu, V., and Asher, N. (2012). How do negation and modality impact on opinions? In Proceedings of the Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics, pages 10–18. Association for Computational Linguistics.

Bennis, H. (2004). Unergative adjectives and psych verbs. In Studies in unaccusativity: The syntax-lexicon interface, pages 84–113.

Berger, A. L., Pietra, V. J. D., and Pietra, S. A. D. (1996). A maximum entropy approach to natural language processing. In Computational linguistics, volume 22, pages 39–71. MIT Press.

Bespalov, D., Bai, B., Qi, Y., and Shokoufandeh, A. (2011). Sentiment classification based on supervised latent n-gram analysis. In Proceedings of the 20th ACM international conference on Information and knowledge management, pages 375–382. ACM.

Bethard, S., Yu, H., Thornton, A., Hatzivassiloglou, V., and Jurafsky, D. (2004). Automatic extraction of opinion propositions and their holders. In 2004 AAAI Spring Symposium on Exploring Attitude and Affect in Text, volume 2224.

Bhatti, M. W., Wang, Y., and Guan, L. A neural network approach for human emotion recognition in speech. In Circuits and Systems, 2004. ISCAS’04, volume 2.

Blitzer, J., Dredze, M., Pereira, F., et al. (2007). Biographies, Bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. In ACL, volume 7, pages 440–447.

Bloom, K. (2011). Sentiment analysis based on appraisal theory and functional local grammars. PhD thesis, Illinois Institute of Technology.

Bloom, K., Garg, N., and Argamon, S. (2007). Extracting appraisal expressions. In HLT-NAACL, volume 2007, pages 308–315.

Bloomfield, L. (1933). Language. Holt, New York. In Edizione italiana (1974), Il linguaggio. Il Saggiatore, Milano.

Bocchino, F. (2007). Lessico-Grammatica dell’italiano. Le costruzioni intransitive. PhD thesis, Università degli studi di Salerno.

Bos, J. and Nissim, M. (2006). An empirical approach to the interpretation of superlatives. In Proceedings of the 2006 conference on empirical methods in natural language processing, pages 9–17. Association for Computational Linguistics.

Bosco, C., Allisio, L., Mussa, V., Patti, V., Ruffo, G., Sanguinetti, M., and Sulis, E. (2014). Detecting happiness in Italian tweets: Towards an evaluation dataset for sentiment analysis in Felicitta. In Proc. of ESSSLOD, pages 56–63.

Bosco, C., Patti, V., and Bolioli, A. (2013). Developing corpora for sentiment analysis: The case of irony and Senti-TUT. In IEEE Intelligent Systems, number 2, pages 55–63. IEEE.

Boucher, J. and Osgood, C. E. (1969). The Pollyanna hypothesis. In Journal of Verbal Learning and Verbal Behavior, volume 8, pages 1–8. Elsevier.

Bounie, D., Bourreau, M., Gensollen, M., and Waelbroeck, P. (2005). The effect of online customer reviews on purchasing decisions: The case of video games. In Retrieved July, volume 8, page 2009. Citeseer.

Breese, J. S., Heckerman, D., and Kadie, C. (1998). Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the Fourteenth conference on Uncertainty in artificial intelligence, pages 43–52. Morgan Kaufmann Publishers Inc.

Brill, E. (1995). Transformation-based error-driven learning and natural language processing: A case study in part-of-speech tagging. In Computational linguistics, volume 21, pages 543–565. MIT Press.

Buchanan, T. W., Lutz, K., Mirzazade, S., Specht, K., Shah, N. J., Zilles, K., and Jäncke, L. (2000). Recognition of emotional prosody and verbal components of spoken language: an fMRI study. In Cognitive Brain Research, volume 9, pages 227–238. Elsevier.

Buvet, P.-A., Girardin, C., Gross, G., and Groud, C. (2005). Les prédicats d’affect. In LIDIL, number 32, pages pp–125.

Carbonell, J. G. (1979). Subjective understanding: Computer models of belief systems. Technical report, DTIC Document.

Carenini, G., Ng, R. T., and Zwart, E. (2005). Extracting knowledge from evaluative text. In Proceedings of the 3rd international conference on Knowledge capture, pages 11–18. ACM.

Carreras, X. and Màrquez, L. (2005). Introduction to the CoNLL-2005 shared task: Semantic role labeling. In Proceedings of the Ninth Conference on Computational Natural Language Learning, pages 152–164. Association for Computational Linguistics.

Carvalho, P., Sarmento, L., Silva, M. J., and De Oliveira, E. (2009). Clues for detecting irony in user-generated contents: oh it’s so easy ;-). In Proceedings of the 1st international CIKM workshop on Topic-sentiment analysis for mass opinion, pages 53–56. ACM.

Chen, Y., Zhou, Y., Zhu, S., and Xu, H. (2012). Detecting offensive language in social media to protect adolescent online safety. In Privacy, Security, Risk and Trust (PASSAT), 2012 International Conference on and 2012 International Conference on Social Computing (SocialCom), pages 71–80. IEEE.

Chetviorkin, I. and Loukachevitch, N. V. (2012). Extraction of Russian sentiment lexicon for product meta-domain. In COLING, pages 593–610.

Chevalier, J. A. and Mayzlin, D. (2006). The effect of word of mouth on sales: Online book reviews. In Journal of marketing research, volume 43, pages 345–354. American Marketing Association.

Chin, P. (2005). The value of user-generated contents. In Intranet Journal, www.intranetjournal.com.

Choi, Y., Breck, E., and Cardie, C. (2006). Joint extraction of entities and relations for opinion recognition. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, pages 431–439. Association for Computational Linguistics.

Choi, Y. and Cardie, C. (2008). Learning with compositional semantics as structural inference for subsentential sentiment analysis. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 793–801.

Choi, Y., Cardie, C., Riloff, E., and Patwardhan, S. (2005). Identifying sources of opinions with conditional random fields and extraction patterns. In Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, pages 355–362. Association for Computational Linguistics.

Chomsky, N. (1957). Syntactic Structures. Mouton and Co., The Hague-Paris.

Chomsky, N. (1965). Aspects of the Theory of Syntax. Number 11. MIT Press.

Chomsky, N. (1993). Lectures on government and binding: The Pisa lectures. Number 9. Walter de Gruyter.

Chu, S.-C. and Kim, Y. (2011). Determinants of consumer engagement in electronic word-of-mouth (eWOM) in social networking sites. In International journal of Advertising, volume 30, pages 47–75. Taylor & Francis.

Clark, H. H. and Gerrig, R. J. (1984). On the pretense theory of irony. In Journal of Experimental Psychology: General, volume 113, pages 121–126.

Clark, S. and Curran, J. R. (2004). The importance of supertagging for wide-coverage CCG parsing. In Proceedings of the 20th international conference on Computational Linguistics, page 282. Association for Computational Linguistics.

Clematide, S. and Klenner, M. (2010). Evaluation and extension of a polarity lexicon for German. In Proceedings of the First Workshop on Computational Approaches to Subjectivity and Sentiment Analysis, pages 7–13.

Councill, I. G., McDonald, R., and Velikovich, L. (2010). What’s great and what’s not: learning to classify the scope of negation for improved sentiment analysis. In Proceedings of the workshop on negation and speculation in natural language processing, pages 51–59. Association for Computational Linguistics.

Curcó, C. (2000). Irony: Negation, echo and metarepresentation. In Lingua, volume 110, pages 257–280. Elsevier.

D’Agostino, E. (1992). Analisi del discorso: metodi descrittivi dell’italiano d’uso. Loffredo.

D’Agostino, E. (2005). Grammatiche lessicalmente esaustive delle passioni: il caso dell’io collerico. Le forme nominali. In Quaderns d’Italià, pages 149–169.

D’Agostino, E., De Bueriis, G., Cicalese, A., Monteleone, M., Vellutino, D., Messina, S., Langella, A., Santonicola, S., Longobardi, F., and Guglielmo, D. (2007). Lexicon-grammar classifications, or better: to get rid of anguish. In 26th International Conference on Lexis and Grammar (LGC’07).

D’Agostino, E. and Elia, A. (1998). Il significato delle frasi: un continuum dalle frasi semplici alle forme polirematiche. In AA. VV., Ai limiti del linguaggio. Laterza, Bari, pages 287–310.

Dalianis, H. and Skeppstedt, M. (2010). Creating and evaluating a consensus for negated and speculative words in a Swedish clinical corpus. In Negation and Speculation in Natural Language Processing (NeSp-NLP 2010), page 5.

Dang, Y., Zhang, Y., and Chen, H. (2010). A lexicon-enhanced method for sentiment classification: An experiment on online product reviews. In Intelligent Systems, IEEE, volume 25, pages 46–53. IEEE.

Darroch, J. N. and Ratcliff, D. (1972). Generalized iterative scaling for log-linear models. In The annals of mathematical statistics, pages 1470–1480. JSTOR.

Dasgupta, S. and Ng, V. (2009). Mine the easy, classify the hard: a semi-supervised approach to automatic sentiment classification. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2-Volume 2, pages 701–709. Association for Computational Linguistics.

Davidov, D., Tsur, O., and Rappoport, A. (2010). Semi-supervised recognition of sarcastic sentences in Twitter and Amazon. In Proceedings of the Fourteenth Conference on Computational Natural Language Learning, pages 107–116. Association for Computational Linguistics.

de Albornoz, J. C., Plaza, L., and Gervás, P. (2012). SentiSense: An easily scalable concept-based affective lexicon for sentiment analysis. In LREC, pages 3562–3567.

De Bueriis, G. and Elia, A. (2008). Lessici elettronici e descrizioni lessicali, sintattiche, morfologiche ed ortografiche. Plectica.

De Gioia, M. (1994a). Chiaro come il sole a mezzogiorno: una classe di avverbi idiomatici dell’italiano. In The Linguist, volume 33, pages 220–225. Newnorth Print Ltd.

De Gioia, M. (1994b). Problemi di rappresentazione et traduzione degli avverbi idiomatici. Micromégas.

De Gioia, M. (2001). Avverbi idiomatici dell’italiano: analisi lessico-grammaticale. L’Harmattan Italia.

De Mauro, T. and Thornton, A. M. (1985). La predicazione: teoria e applicazione all’italiano. In Sintassi e morfologia della lingua italiana d’uso: teorie ed applicazioni descrittive, pages 487–519.

De Silva, L. C., Miyasato, T., and Nakatsu, R. (1997). Facial emotion recognition using multi-modal information. In Information, Communications and Signal Processing, 1997. ICICS., Proceedings of 1997 International Conference on, volume 1, pages 397–401. IEEE.

Denecke, K. (2008). Using SentiWordNet for multilingual sentiment analysis. In Data Engineering Workshop, 2008. ICDEW 2008. IEEE 24th International Conference on, pages 507–512. IEEE.

Desclés, J., Makkaoui, O., and Hacène, T. (2010). Automatic annotation of speculation in biomedical texts: new perspectives and large scale evaluation. In Negation and Speculation in Natural Language Processing (NeSp-NLP 2010), page 32.

Dewally, M. and Ederington, L. (2006). Reputation, certification, warranties, and information as remedies for seller-buyer information asymmetries: Lessons from the online comic book market. In The Journal of Business, volume 79, pages 693–729. JSTOR.

Diamond, D. W. (1989). Reputation acquisition in debt markets. In The journal of political economy, pages 828–862. JSTOR.

Dowty, D. R. (1989). On the semantic content of the notion of ‘thematic role’. In Properties, types and meaning, pages 69–129. Springer.

Dragut, E. C., Yu, C., Sistla, P., and Meng, W. (2010). Construction of a sentimental word dictionary. In Proceedings of the 19th ACM international conference on Information and knowledge management, pages 1761–1764.

Duan, W., Gu, B., and Whinston, A. B. (2008). The dynamics of online word-of-mouth and product sales – an empirical investigation of the movie industry. In Journal of retailing, volume 84, pages 233–242. Elsevier.

Elia, A. Per filo e per segno: la struttura degli avverbi composti. In D’Agostino (a cura di), Tra sintassi e semantica. Descrizioni e metodi di elaborazione automatica della lingua d’uso, pages 167–263. Napoli, ESI.

Elia, A. (1984). Le verbe italien. Les complétives dans les phrases à un complément. Fasano di Puglia, Schena-Nizet.

Elia, A. (1990). Chiaro e tondo: Lessico-Grammatica degli avverbi composti in italiano. Segno Associati.

Elia, A. (2013). On lexical, semantic and syntactic granularity of Italian verbs. In Penser le Lexique Grammaire, Perspectives Actuelles, Honoré Champion, Paris, pages 277–286.

Elia, A. (2014a). Lessico e sintassi tra tempo e massa parlante. In Marchese M.P., Nocentini A., Il lessico nella teoria e nella storia linguistica, pages 15–47. Edizioni il Calamo.

Elia, A. (2014b). Operatori, argomenti e il sistema leg-semantic role labelling dell’italiano. In Mirto I., a cura di, Le relazioni irresistibili, pages 105–118. Edizioni Ets.

Elia, A. and De Bueriis, G. (2008). Lessici elettronici e descrizioni lessicali, sintattiche, morfologiche ed ortografiche. Plectica.

Elia, A., Guglielmo, D., Maisto, A., and Pelosi, S. (2013). A linguistic-based method for automatically extracting spatial relations from large non-structured data. In Algorithms and Architectures for Parallel Processing, pages 193–200. Springer.

Elia, A., Martinelli, M., and D’Agostino, E. (1981). Lessico e Strutture sintattiche. Introduzione alla sintassi del verbo italiano. Napoli, Liguori.

Elia, A., Pelosi, S., Maisto, A., and Raffaele, G. (2015). Towards a lexicon-grammar based framework for nlp: an opinion mining application. In RANLP 2015, pages 160–167.

Elia, A. and Vietri, S. (2010). Lexis-grammar & semantic web. In INFOtheca, volume 11, pages 15a–38a.

Elia, A., Vietri, S., Postiglione, A., Monteleone, M., and Marano, F. (2010). Data mining modular software system. In SWWS, pages 127–133.

Engström, C. (2004). Topic dependence in sentiment classification. In Unpublished MPhil Dissertation. University of Cambridge.

Essa, I., Pentland, A. P., et al. (1997). Coding, analysis, interpretation, and recognition of facial expressions. In Pattern Analysis and Machine Intelligence, IEEE Transactions on, volume 19, pages 757–763. IEEE.

Esuli, A. and Sebastiani, F. (2005). Determining the semantic orientation of terms through gloss classification. In Proceedings of the 14th ACM international conference on Information and knowledge management, pages 617–624. ACM.

Esuli, A. and Sebastiani, F. (2006a). Determining term subjectivity and term orientation for opinion mining. In EACL, volume 6, page 2006.

Esuli, A. and Sebastiani, F. (2006b). SentiWordNet: A publicly available lexical resource for opinion mining. In Proceedings of LREC, volume 6, pages 417–422.

Farkas, D. F. and Kiss, K. É. (2000). On the comparative and absolute readings of superlatives. In Natural Language & Linguistic Theory, volume 18, pages 417–455. Springer.

Farkas, R., Vincze, V., Móra, G., Csirik, J., and Szarvas, G. (2010). The conll-2010 shared task: learning to detect hedges and their scope in natural language text. In Proceedings of the Fourteenth Conference on Computational Natural Language Learning: Shared Task, pages 1–12. Association for Computational Linguistics.

Fellbaum, C. (1998). WordNet. Wiley Online Library.

Ferreira, L., Jakob, N., and Gurevych, I. (2008). A comparative study of feature extraction algorithms in customer reviews. In Semantic Computing, 2008 IEEE International Conference on, pages 144–151. IEEE.

Filip, H. (1996). Psychological predicates and the syntax-semantics interface. In Conceptual Structure, Discourse and Language. Stanford, Center for the Study of Language and Information.

Fillmore, C. J. (1976). Frame semantics and the nature of language. In Annals of the New York Academy of Sciences, volume 280, pages 20–32. Wiley Online Library.

Fillmore, C. J. (1977). The case for case reopened. In Syntax and semantics, volume 8, pages 59–82.

Fillmore, C. J. (2006). Frame semantics. In Cognitive linguistics: Basic readings, volume 34, pages 373–400. Berlin, Mouton de Gruyter. [Originally published in Linguistics in the Morning Calm, Seoul, Hanshin Publishing Co., 1982].

Fillmore, C. J. and Baker, C. F. (2001). Frame semantics for text understanding. In Proceedings of WordNet and Other Lexical Resources Workshop, NAACL.

Fillmore, C. J., Baker, C. F., and Sato, H. (2002). The framenet database and software tools. In LREC.

Fishelov, D. (1993). Poetic and non-poetic simile: structure, semantics, rhetoric. In Poetics Today, pages 1–23. JSTOR.

Fiszman, M., Demner-Fushman, D., Lang, F. M., Goetz, P., and Rindflesch, T. C. (2007). Interpreting comparative constructions in biomedical text. In Proceedings of the Workshop on BioNLP 2007: Biological, Translational, and Clinical Language Processing, pages 137–144. Association for Computational Linguistics.

Fragopanagos, N. and Taylor, J. G. (2005). Emotion recognition in human–computer interaction. In Neural Networks, volume 18, pages 389–405. Elsevier.

Gaeta, L. (2004). Nomi d'azione. In La formazione delle parole in italiano. Tübingen, Max Niemeyer Verlag, pages 314–51.

Gamon, M. and Aue, A. (2005). Automatic identification of sentiment vocabulary: exploiting low association with known sentiment terms. In Proceedings of the ACL Workshop on Feature Engineering for Machine Learning in Natural Language Processing, pages 57–64.

Ganapathibhotla, M. and Liu, B. (2008). Mining opinions in comparative sentences. In Proceedings of the 22nd International Conference on Computational Linguistics-Volume 1, pages 241–248. Association for Computational Linguistics.

Ganter, V. and Strube, M. (2009). Finding hedges by chasing weasels: Hedge detection using wikipedia tags and shallow linguistic features. In Proceedings of the ACL-IJCNLP 2009 Conference Short Papers, pages 173–176. Association for Computational Linguistics.

Gantz, J. and Reinsel, D. (2011). Extracting value from chaos. In IDC iview, number 1142, pages 9–10.

Garcia, D., Garas, A., and Schweitzer, F. (2012). Positive words carry less information than negative words. In EPJ Data Science, volume 1. EPJ.

Gatti, L. and Guerini, M. (2012). Assessing sentiment strength in words prior polarities. In arXiv preprint arXiv:1212.4315.

Gibbs, R. W. (2000). Irony in talk among friends. In Metaphor and symbol, volume 15, pages 5–27. Taylor & Francis.

Gildea, D. and Jurafsky, D. (2002). Automatic labeling of semantic roles. In Computational linguistics, volume 28, pages 245–288. MIT Press.

Giora, R. (1995). On irony and negation. In Discourse processes, volume 19, pages 239–264. Taylor & Francis.

Giora, R., Fein, O., and Schwartz, T. (1998). Irony: Grade salience and indirect negation. In Metaphor and Symbol, volume 13, pages 83–101. Taylor & Francis.

Giordano, R. and Voghera, M. (2008). Frasi senza verbo: il contributo della prosodia. In Sintassi storica e sincronica dell'italiano. Atti del Conv. Intern. SILFI, Basilea.

Giuglea, A.-M. and Moschitti, A. (2006). Semantic role labeling via framenet, verbnet and propbank. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics, pages 929–936. Association for Computational Linguistics.

Godard, D. (2013). Les négateurs. In La grande Grammaire du français. Actes Sud.

Goldberg, A. B. and Zhu, X. (2006). Seeing stars when there aren't many stars: graph-based semi-supervised learning for sentiment categorization. In Proceedings of the First Workshop on Graph Based Methods for Natural Language Processing, pages 45–52. Association for Computational Linguistics.

González-Ibáñez, R., Muresan, S., and Wacholder, N. (2011). Identifying sarcasm in twitter: a closer look. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers-Volume 2, pages 581–586. Association for Computational Linguistics.

Grimshaw, J. (1990). Argument structure. the MIT Press.

Gross, G. (1992a). Un outil pour le FLE: les classes d'objets. In Actes du colloque FLE, pages 169–192.

Gross, G. (2008). Les classes d'objets. In Lalies, number 28, pages 111–165.

Gross, M. (1971). Transformational Analysis of French Verbal Constructions. University of Pennsylvania.

Gross, M. (1975). Méthodes en syntaxe. Hermann.

Gross, M. (1979). On the failure of generative grammar. In Language, pages 859–885. JSTOR.

Gross, M. (1981). Les bases empiriques de la notion de prédicat sémantique. In Langages, pages 7–52. JSTOR.

Gross, M. (1982). Une classification des phrases «figées» du français. In Revue québécoise de linguistique, volume 11, pages 151–185. Université du Québec à Montréal.

Gross, M. (1986). Lexicon-grammar: the representation of compound words. In Proceedings of the 11th conference on Computational linguistics, pages 1–6. Association for Computational Linguistics.

Gross, M. (1988). Les limites de la phrase figée. In Langages, pages 7–22. JSTOR.

Gross, M. (1990). La caractérisation des adverbes dans un lexique-grammaire. In Langue française, pages 90–102. JSTOR.

Gross, M. (1992b). The argument structure of elementary sentences. In Language Research, volume 28, pages 699–716.

Gross, M. (1993). Les phrases figées en français. In L'information grammaticale, volume 59, pages 36–41. Peeters.

Gross, M. (1995). Une grammaire locale de l'expression des sentiments. In Langue française, pages 70–87. JSTOR.

Gross, M. (1996). Les verbes supports d'adjectifs et le passif. In Langages, volume 30, pages 8–18. Armand Colin.

Gross, M. (1997). The construction of local grammars. In Finite-state language processing, volume 11, page 329. MIT Press.

Gruber, J. S. (1965). Studies in lexical relations. PhD thesis, Massachusetts Institute of Technology.

Gruen, T. W., Osmonbekov, T., and Czaplewski, A. J. (2006). ewom: The impact of customer-to-customer online know-how exchange on customer value and loyalty. In Journal of Business research, volume 59, pages 449–456. Elsevier.

Guglielmo, D. (2009). "Guardarsi allo specchio": forme lessicali e sintattiche della vanità. In Atti del II Convegno Interdipartimentale del Centro Studi sulle Rappresentazioni Linguistiche.

Guillet, A. and Leclère, C. (1981). Restructuration du groupe nominal. In Langages, pages 99–125. JSTOR.

Gutiérrez, Y., Vázquez, S., and Montoyo, A. (2011). Sentiment classification using semantic features extracted from WordNet-based resources. In Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis, pages 139–145. Association for Computational Linguistics.

Hanani, U., Shapira, B., and Shoval, P. (2001). Information filtering: Overview of issues, research and systems. In User Modeling and User-Adapted Interaction, volume 11, pages 203–259. Kluwer Academic Publishers.

Hansen, L. K., Arvidsson, A., Nielsen, F. Å., Colleoni, E., and Etter, M. (2011). Good friends, bad news - affect and virality in twitter. In Future information technology, pages 34–43. Springer.

Harkema, H., Dowling, J. N., Thornblade, T., and Chapman, W. W. (2009). Context: An algorithm for determining negation, experiencer, and temporal status from clinical reports. In Journal of biomedical informatics, volume 42, page 839. NIH Public Access.

Harris, Z. (1988). Language and information. Columbia University Press.

Harris, Z. S. (1964). Transformations in linguistic structure. In Proceedings of the American Philosophical Society, pages 418–422.

Harris, Z. S. (1970). Discourse analysis. In Papers in structural and transformational linguistics, pages 313–347.

Hassan, A. and Radev, D. (2010). Identifying text polarity using random walks. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 395–403.

Hatzivassiloglou, V. and McKeown, K. R. (1997). Predicting the semantic orientation of adjectives. In Proceedings of the 35th annual meeting of the association for computational linguistics and eighth conference of the european chapter of the association for computational linguistics, pages 174–181. Association for Computational Linguistics.

Herlocker, J. L., Konstan, J. A., Borchers, A., and Riedl, J. (1999). An algorithmic framework for performing collaborative filtering. In Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, pages 230–237. ACM.

Hernandez-Farias, I., Buscaldi, D., and Priego-Sánchez, B. (2014). Iradabe: Adapting english lexicons to the italian sentiment polarity classification task. In First Italian Conference on Computational Linguistics (CLiC-it 2014) and the fourth International Workshop EVALITA2014, pages 75–81.

Horn, L. (1989). A natural history of negation. University of Chicago Press.

Houen, S. (2011). Opinion mining with semantic analysis. Online: www.diku.dk/forskning/Publikationer/specialer/2011/specialerapport_final_Soren_Houen.pdf

Hu, M. and Liu, B. (2004). Mining opinion features in customer reviews. In AAAI, volume 4, pages 755–760.

Hu, M. and Liu, B. (2006). Opinion feature extraction using class sequential rules. In AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs, pages 61–66.

Iacobini, C. (2004). Prefissazione. In La formazione delle parole in italiano. Tübingen, Max Niemeyer Verlag, pages 97–161.

Jackendoff, R. (1972). Semantic interpretation in generative grammar. MIT press, Cambridge, MA.

Jackendoff, R. (1990). Semantic structures, volume 18. MIT press.

Jespersen, O. (1965). The philosophy of grammar. Number 307. University of Chicago Press.

Jia, L., Yu, C., and Meng, W. (2009). The effect of negation on sentiment analysis and retrieval effectiveness. In Proceedings of the 18th ACM conference on Information and knowledge management, pages 1827–1830. ACM.

Jin, X., Li, Y., Mah, T., and Tong, J. (2007). Sensitive webpage classification for content advertising. In Proceedings of the 1st international workshop on Data mining and audience intelligence for advertising, pages 28–33. ACM.

Jindal, N. and Liu, B. (2006a). Identifying comparative sentences in text documents. In Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, pages 244–251. ACM.

Jindal, N. and Liu, B. (2006b). Mining comparative sentences and relations. In AAAI, volume 22, pages 1331–1336.

Jindal, N. and Liu, B. (2007). Review spam detection. In Proceedings of the 16th international conference on World Wide Web, pages 1189–1190. ACM.

Justo, R., Corcoran, T., Lukin, S. M., Walker, M., and Torres, M. I. (2014). Extracting relevant knowledge for the detection of sarcasm and nastiness in the social web. In Knowledge-Based Systems, volume 69, pages 124–133. Elsevier.

Kaji, N. and Kitsuregawa, M. (2007). Building lexicon for sentiment analysis from massive collection of html documents. In EMNLP-CoNLL, pages 1075–1083.

Kamp, H. and Reyle, U. (1993). From discourse to logic: Introduction to modeltheoretic semantics of natural language, formal logic and discourse representation theory. Number 42. Springer Science & Business Media.

Kamps, J., Marx, M., Mokken, R. J., and De Rijke, M. (2004). Using wordnet to measure semantic orientations of adjectives. In LREC, volume 4, pages 1115–1118.

Kanayama, H. and Nasukawa, T. (2006a). Fully automatic lexicon expansion for domain-oriented sentiment analysis. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, pages 355–363. Association for Computational Linguistics.

Kanayama, H. and Nasukawa, T. (2006b). Fully automatic lexicon expansion for domain-oriented sentiment analysis. In COLING ACL 2006, page 355.

Kang, H., Yoo, S. J., and Han, D. (2012). Senti-lexicon and improved naïve bayes algorithms for sentiment analysis of restaurant reviews. In Expert Systems with Applications, volume 39, pages 6000–6010. Elsevier.

Kennedy, A. and Inkpen, D. (2006). Sentiment classification of movie reviews using contextual valence shifters. In Computational intelligence, volume 22, pages 110–125. Wiley Online Library.

Khan, K., Baharudin, B., and Khan, A. Identifying product features from customer reviews using hybrid dependency patterns. Citeseer.

Khan, K., Baharudin, B. B., and City, T. (2012). Identifying product features from customer reviews using lexical concordance. In Research Journal of Applied Sciences, Engineering and Technology, volume 4, pages 833–839.

Kim, S.-M. and Hovy, E. (2004). Determining the sentiment of opinions. In Proceedings of the 20th international conference on Computational Linguistics, page 1367.

Kim, S.-M. and Hovy, E. (2005). Identifying opinion holders for question answering in opinion texts. In Proceedings of AAAI-05 Workshop on Question Answering in Restricted Domains, pages 1367–1373.

Kim, S.-M. and Hovy, E. (2006). Extracting opinions, opinion holders, and topics expressed in online news media text. In Proceedings of the Workshop on Sentiment and Subjectivity in Text, pages 1–8. Association for Computational Linguistics.

Kim, Y., Kim, S., and Myaeng, S.-H. (2008). Extracting topic-related opinions and their targets in NTCIR-7. In Proceedings of the 7th NTCIR Workshop Meeting, Tokyo, Japan.

Kim, Y.-j. and Larson, R. (1989). Scope interpretation and the syntax of psych-verbs. In Linguistic Inquiry, pages 681–688. JSTOR.

Kingsbury, P. and Palmer, M. (2002). From treebank to propbank. In LREC.

Klein, B. and Leffler, K. B. (1981). The role of market forces in assuring contractual performance. In The Journal of Political Economy, pages 615–641. JSTOR.

Klein, K. and Kutscher, S. (2002). Psych-verbs and lexical economy. In Theorie des Lexikons, volume 122, pages 1–41.

Klein, L. R. (1998). Evaluating the potential of interactive media through a new lens: Search versus experience goods. In Journal of business research, volume 41, pages 195–203. Elsevier.

Kleinberg, J. M. (1999). Authoritative sources in a hyperlinked environment. In Journal of the ACM (JACM), volume 46, pages 604–632. ACM.

Kobayakawa, T. S., Kumano, T., Tanaka, H., Okazaki, N., Kim, J.-D., and Tsujii, J. (2009). Opinion classification with tree kernel svm using linguistic modality analysis. In Proceedings of the 18th ACM conference on Information and knowledge management, pages 1791–1794. ACM.

Ku, L.-W., Huang, T.-H., and Chen, H.-H. (2009). Using morphological and syntactic structures for chinese opinion analysis. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3-Volume 3, pages 1260–1269.

Lafferty, J., McCallum, A., and Pereira, F. C. (2001). Conditional random fields: Probabilistic models for segmenting and labeling sequence data.

Lakoff, G. (1973). Hedges: A study in meaning criteria and the logic of fuzzy concepts. In Journal of philosophical logic, volume 2, pages 458–508. Springer.

Landau, I. (2010). The locative syntax of experiencers, volume 34. MIT press, Cambridge.

Landauer, T. K. and Dumais, S. T. (1997). A solution to plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. In Psychological review, volume 104, page 211. American Psychological Association.

Laporte, E. (1997). L'analyse de phrases adjectivales par rétablissement de noms appropriés. In Langages, volume 31, pages 79–104. Armand Colin.

Laporte, E. (2012). Appropriate nouns with obligatory modifiers. In arXiv preprint arXiv:1207.4625.

Laporte, E. and Voyatzi, S. (2008). An electronic dictionary of french multiword adverbs. In Language Resources and Evaluation Conference. Workshop Towards a Shared Task for Multiword Expressions, pages 31–34.

Larreya, P. (2004). L'expression de la modalité en français et en anglais (domaine verbal). In Revue belge de philologie et d'histoire, volume 82, pages 733–762. Société pour le progrès des études philologiques et historiques.

Lavanda, I. and Rampa, G. (2001). Microeconomia: scelte individuali e benessere sociale. Carocci.

Laver, M., Benoit, K., and Garry, J. (2003). Extracting policy positions from political texts using words as data. In American Political Science Review, volume 97, pages 311–331. Cambridge Univ Press.

Le Pesant, D. and Mathieu-Colas, M. (1998). Introduction aux classes d'objets. In Langages, pages 6–33. JSTOR.

Lee, C. M., Narayanan, S. S., and Pieraccini, R. (2002). Combining acoustic and language information for emotion recognition. In INTERSPEECH.

Lee, M. and Youn, S. (2009). Electronic word of mouth (ewom): how ewom platforms influence consumer product judgement. In International Journal of Advertising, volume 28, pages 473–499. Taylor & Francis.

Lenci, A., Lapesa, G., and Bonansinga, G. (2012). Lexit: A computational resource on italian argument structure. In LREC, pages 3712–3718.

Leung, C. W., Chan, S. C., and Chung, F.-l. (2006). Integrating collaborative filtering and sentiment analysis: A rating inference approach. In Proceedings of the ECAI 2006 workshop on recommender systems, pages 62–66.

Leung, C. W., Chan, S. C., and Chung, F.-L. (2008). Evaluation of a rating inference approach to utilizing textual reviews for collaborative recommendation. In Cooperative Internet Computing. World Scientific Publisher, World Scientific.

Li, F., Huang, M., and Zhu, X. (2010). Sentiment analysis with global topics and local dependency. In AAAI.

Linden, G., Smith, B., and York, J. (2003). Amazon.com recommendations: Item-to-item collaborative filtering. In Internet Computing, IEEE, volume 7, pages 76–80. IEEE.

Liu, B. (2010). Sentiment analysis and subjectivity. In Handbook of natural language processing, volume 2, pages 627–666. Chapman & Hall, Goshen, CT.

Liu, B. (2012). Sentiment analysis and opinion mining. In Synthesis Lectures on Human Language Technologies, volume 5, pages 1–167. Morgan & Claypool Publishers.

Liu, J. and Seneff, S. (2009). Review sentiment scoring via a parse-and-paraphrase paradigm. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1-Volume 1, pages 161–169. Association for Computational Linguistics.

Macdonald, C., Ounis, I., and Soboroff, I. (2007). Overview of the trec 2007 blog track. In TREC, volume 7, pages 31–43.

Mahmud, A., Ahmed, K. Z., and Khan, M. (2008). Detecting flames and insults in text. In Proceedings of the Sixth International Conference on Natural Language Processing. BRAC University.

Maisto, A. and Pelosi, S. (2014a). Feature-based customer review summarization. In On the Move to Meaningful Internet Systems: OTM 2014 Workshops, pages 299–308.

Maisto, A. and Pelosi, S. (2014b). A lexicon-based approach to sentiment analysis: the italian module for nooj. In Proceedings of the International Nooj 2014 Conference, University of Sassari, Italy. Cambridge Scholars Publishing.

Maks, I. and Vossen, P. (2011). Different approaches to automatic polarity annotation at synset level. In Proceedings of the First International Workshop on Lexical Resources, WoLeR, pages 62–69.

Maks, I., Vossen, P., et al. (2012). Building a fine-grained subjectivity lexicon from a web corpus. In LREC, pages 3070–3076.

Markov, A. (1971). Extension of the limit theorems of probability theory to a sum of variables connected in a chain. John Wiley & Sons, Inc.

Màrquez, L., Carreras, X., Litkowski, K. C., and Stevenson, S. (2008). Semantic role labeling: an introduction to the special issue. In Computational linguistics, volume 34, pages 145–159. MIT Press.

Martín, J. (1998). The lexico-syntactic interface: psych verbs and psych nouns. In Hispania, pages 619–631. JSTOR.

Mathieu, Y. Y. (1995). Verbes psychologiques et interprétation sémantique. In Langue française, pages 98–106. JSTOR.

Mathieu, Y. Y. (1999a). Les prédicats de sentiment. In Langages, pages 41–52. JSTOR.

Mathieu, Y. Y. (1999b). Un classement sémantique des verbes psychologiques. In Cahiers du CIEL. Publications Paris 7.

Mathieu, Y. Y. (2006). A computational semantic lexicon of french verbs of emotion. In Computing attitude and affect in text: Theory and applications, pages 109–124. Springer.

Matthews, H., Hendrickson, C., and Soh, D. (2001). Environmental and economic effects of e-commerce: A case study of book publishing and retail logistics. In Transportation Research Record: Journal of the Transportation Research Board, number 1763, pages 6–12. Transportation Research Board of the National Academies.

Maynard, D. and Greenwood, M. A. (2014). Who cares about sarcastic tweets? investigating the impact of sarcasm on sentiment analysis. In Proceedings of LREC.

Maziero, E. G., Pardo, T. A., Di Felippo, A., and Dias-da Silva, B. C. (2008). A base de dados lexical e a interface web do tep 2.0: thesaurus eletrônico para o português do brasil. In Companion Proceedings of the XIV Brazilian Symposium on Multimedia and the Web, pages 390–392. ACM.

McAfee, A., Brynjolfsson, E., Davenport, T. H., Patil, D., and Barton, D. (2012). Big data. In The management revolution. Harvard Bus Rev, volume 90, pages 61–67.

Meier, C. (2003). The meaning of too, enough, and so... that. In Natural Language Semantics, volume 11, pages 69–107. Springer.

Meillet, A. (1906). La Phrase nominale en indoeuropéen. Société de linguistique.

Mejova, Y. and Srinivasan, P. (2011). Exploring feature definition and selection for sentiment classifiers. In ICWSM.

Meunier, A. (1984). La sémantique locative de certaines structures N0 être adj. In Revue québécoise de linguistique, volume 13, pages 95–121. Université du Québec à Montréal.

Meunier, A. (1999). Une construction complexe N0hum être Adj de V0-inf W caractéristique de certains adjectifs à sujet humain. In Langages, pages 12–44. JSTOR.

Meydan, M. (1996). Constructions adjectivales, substantifs appropriés et verbes supports. In Linx, volume 34, pages 197–210. Centre de recherches linguistiques de Paris 10.

Meydan, M. (1999). La restructuration du gn sujet dans les phrases adjectivales à substantif approprié. In Langages, pages 59–80. JSTOR.

Miller, G. A. (1995). WordNet: a lexical database for english. In Communications of the ACM, volume 38, pages 39–41. ACM.

Mohammad, S., Dunne, C., and Dorr, B. (2009). Generating high-coverage semantic orientation lexicons from overtly marked words and a thesaurus. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2-Volume 2, pages 599–608. Association for Computational Linguistics.

Moilanen, K. and Pulman, S. (2007). Sentiment composition. In Proceedings of the Recent Advances in Natural Language Processing International Conference, pages 378–382.

Moilanen, K. and Pulman, S. (2008). The good, the bad, and the unknown: morphosyllabic sentiment tagging of unseen words. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers, pages 109–112.

Molinier, C. and Levrier, F. (2000). Grammaire des adverbes: description des formes en -ment, volume 33. Librairie Droz.

Monachesi, P. (2009). Annotation of semantic roles. In Utrecht University, Netherlands.

Montefinese, M., Ambrosini, E., Fairfield, B., and Mammarella, N. (2014). The adaptation of the affective norms for english words (anew) for italian. In Behavior research methods, volume 46, pages 887–903. Springer.

Moon, R. (2008). Conventionalized as-similes in english: a problem case. In International Journal of Corpus Linguistics, volume 13, pages 3–37. John Benjamins Publishing Company.

Mulder, M., Nijholt, A., Den Uyl, M., and Terpstra, P. (2004). A lexical grammatical implementation of affect. In Text, Speech and Dialogue, pages 171–177.

Mullen, T. and Collier, N. (2004). Sentiment analysis using support vector machines with diverse information sources. In EMNLP, volume 4, pages 412–418.

Mullen, T. and Malouf, R. (2006). A preliminary investigation into sentiment analysis of informal political discourse. In AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs, pages 159–162.

Nakagawa, T., Inui, K., and Kurohashi, S. (2010). Dependency tree-based sentiment classification using crfs with hidden variables. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 786–794. Association for Computational Linguistics.

Nakayama, M., Sutcliffe, N., and Wan, Y. (2010). Has the web transformed experience goods into search goods? In Electronic Markets, volume 20, pages 251–262. Springer.

Narayanan, R., Liu, B., and Choudhary, A. (2009). Sentiment analysis of conditional sentences. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1-Volume 1, pages 180–189.

Nasukawa, T. and Yi, J. (2003). Sentiment analysis: Capturing favorability using natural language processing. In Proceedings of the 2nd international conference on Knowledge capture, pages 70–77. ACM.

Navigli, R. and Ponzetto, S. P. (2012). BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network, volume 193, pages 217–250.

Nelson, P. (1970). Information and consumer behavior. In The Journal of Political Economy, pages 311–329. JSTOR.

Neviarouskaya, A. (2010). Compositional Approach for Automatic Recognition of Fine-Grained Affect, Judgment, and Appreciation in Text. PhD thesis, University of Tokyo.

Neviarouskaya, A., Prendinger, H., and Ishizuka, M. (2007). Textual affect sensing for sociable and expressive online communication. In Affective Computing and Intelligent Interaction, pages 218–229. Springer.

Neviarouskaya, A., Prendinger, H., and Ishizuka, M. (2009a). Compositionality principle in recognition of fine-grained emotions from text. In ICWSM.

Neviarouskaya, A., Prendinger, H., and Ishizuka, M. (2009b). Sentiful: Generating a reliable lexicon for sentiment analysis. In Affective Computing and Intelligent Interaction and Workshops, 2009. ACII 2009. 3rd International Conference on, pages 1–6. IEEE.

Neviarouskaya, A., Prendinger, H., and Ishizuka, M. (2011). Sentiful: A lexicon for sentiment analysis. In Affective Computing, IEEE Transactions on, volume 2, pages 22–36. IEEE.

Niedenthal, P. M., Halberstadt, J. B., Margolin, J., and Innes-Ker, Å. H. (2000). Emotional state and the detection of change in facial expression of emotion. In European Journal of Social Psychology, volume 30, pages 211–222. Wiley Online Library.

Nielsen, F. Å. (2011). A new anew: Evaluation of a word list for sentiment analysis in microblogs. In arXiv preprint arXiv:1103.2903.

Norrick, N. R. (1986). Stock similes. In Journal of Literary Semantics, volume 15, pages 39–52.

Osgood, C. E. (1952). The nature and measurement of meaning. In Psychological bulletin, volume 49, page 197. American Psychological Association.

Pak, A. and Paroubek, P. (2011). Twitter for sentiment analysis: When language resources are not available. In Database and Expert Systems Applications (DEXA), 2011 22nd International Workshop on, pages 111–115. IEEE.

Pal, P., Iyer, A. N., and Yantorno, R. E. (2006). Emotion detection from infant facial expressions and cries. In Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on, volume 2, pages II–II. IEEE.

Palmer, M., Gildea, D., and Kingsbury, P. (2005). The proposition bank: An annotated corpus of semantic roles. In Computational linguistics, volume 31, pages 71–106. MIT press.

Palmer, M., Gildea, D., and Xue, N. (2010). Semantic role labeling. In Synthesis Lectures on Human Language Technologies, volume 3, pages 1–103. Morgan & Claypool Publishers.

Pang, B. and Lee, L. (2004). A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In Proceedings of the 42nd annual meeting on Association for Computational Linguistics, page 271. Association for Computational Linguistics.

Pang, B. and Lee, L. (2005). Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, pages 115–124. Association for Computational Linguistics.

Pang, B. and Lee, L. (2008a). Opinion mining and sentiment analysis. In Foundations and trends in information retrieval, volume 2, pages 1–135. Now Publishers Inc.

Pang, B. and Lee, L. (2008b). Using very simple statistics for review search: An exploration. In COLING (Posters), pages 75–78.

Pang, B., Lee, L., and Vaithyanathan, S. (2002). Thumbs up?: sentiment classification using machine learning techniques. In Proceedings of the ACL-02 conference on Empirical methods in natural language processing-Volume 10, pages 79–86. Association for Computational Linguistics.

Pantic, M. and Patras, I. (2006). Dynamics of facial expression: recognition of facial actions and their temporal segments from face profile image sequences. In Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on, volume 36, pages 433–449. IEEE.

Park, C. and Lee, T. M. (2009). Information direction, website reputation and ewom effect: A moderating role of product type. In Journal of Business Research, volume 62, pages 61–67. Elsevier.

Paulo-Santos, A., Ramos, C., and Marques, N. C. (2011). Determining the polarity of words through a common online dictionary. In Progress in Artificial Intelligence, pages 649–663. Springer.

Pelosi, S. (2015a). Morphological relations for the automatic expansion of italian sentiment lexicons. In Proceedings of the International Scientific Conference on the Automatic Processing of Natural-Language Electronic Texts, NooJ 2015.

Pelosi, S. (2015b). Sentita and doxa: Italian databases and tools for sentiment analysis purposes. In Proceedings of the Second Italian Conference on Computational Linguistics CLiC-it 2015, pages 226–231. Accademia University Press.

Pelosi, S., Maisto, A., and Elia, A. (2015). Sentiment analysis through finite state automata. In the 12th International Conference on Finite-State Methods and Natural Language Processing 2015 - FSMNLP 2015.

Perez-Rosas, V., Banea, C., and Mihalcea, R. (2012). Learning sentiment lexicons in spanish. In LREC, pages 3077–3081.

Pesetsky, D. (1996). Zero syntax: Experiencers and cascades. Number 27. MIT press.

Pianta, E., Bentivogli, L., and Girardi, C. (2002). MultiWordNet: developing an aligned multilingual database. In Proceedings of the first international conference on global WordNet, volume 152, pages 55–63.

Piao, S., Ananiadou, S., Tsuruoka, Y., Sasaki, Y., and McNaught, J. (2007). Mining opinion polarity relations of citations. In International Workshop on Computational Semantics (IWCS), pages 366–371.

Picabia, L. (1978). Les constructions adjectivales en français: systématique transformationnelle, volume 11. Librairie Droz.

Picard, R. W. and Picard, R. (1997). Affective computing, volume 252. MIT press, Cambridge.

Pierre-Yves, O. (2003). The production and recognition of emotions in speech: features and algorithms. In International Journal of Human-Computer Studies, volume 59, pages 157–183. Elsevier.

Plag, I. (2003). Word-formation in English. Cambridge University Press.

BIBLIOGRAPHY 271

Polanyi L and Zaenen A (2006) Contextual valence shifters In Com-puting attitude and affect in text Theory and applications pages 1ndash10 Springer

Popescu A-M and Etzioni O (2007) Extracting product features andopinions from reviews In Natural language processing and text min-ing pages 9ndash28 Springer

Portner P (2009) Modality OUP Oxford

Potts C (2011) On the negativity of negation In Semantics and Lin-guistic Theory number 20 pages 636ndash659

Prabowo R and Thelwall M (2009) Sentiment analysis A combinedapproach In Journal of Informetrics volume 3 pages 143ndash157 Else-vier

Pradhan S Hacioglu K Ward W Martin J H and Jurafsky D(2003) Semantic role parsing Adding semantic structure to un-structured text In Data Mining 2003 ICDM 2003 Third IEEE In-ternational Conference on pages 629ndash632 IEEE

Qiu G Liu B Bu J and Chen C (2009) Expanding domain senti-ment lexicon through double propagation In IJCAI volume 9 pages1199ndash1204

Quirk R Crystal D and Education P (1985) A comprehensive gram-mar of the English language volume 397 Cambridge Univ Press

Rahimi Rastgar S and Razavi N (2013) A System for Building CorpusAnnotated With Semantic Roles PhD thesis

Rainer F (2004) Derivazione nominale deaggettivale In La for-mazione delle parole in italiano pages 293ndash314

Rao D and Ravichandran D (2009) Semi-supervised polarity lexiconinduction In Proceedings of the 12th Conference of the EuropeanChapter of the Association for Computational Linguistics pages 675ndash682 Association for Computational Linguistics

272 BIBLIOGRAPHY

Rappaport Hovav M and Levin B (1998) Building verb meaningsIn The projection of arguments Lexical and compositional factorspages 97ndash134

Read J and Carroll J (2009) Weakly supervised techniques fordomain-independent sentiment classification In Proceedings of the1st international CIKM workshop on Topic-sentiment analysis formass opinion pages 45ndash52 ACM

Reinstein D A and Snyder C M (2005) The influence of expert re-views on consumer demand for experience goods A case study ofmovie critics In The journal of industrial economics volume 53pages 27ndash51 Wiley Online Library

Remus R Quasthoff U and Heyer G (2010) Sentiws-a pub-licly available german-language resource for sentiment analysis InLREC

Reyes A and Rosso P (2011) Mining subjective knowledge from cus-tomer reviews a specific case of irony detection In Proceedings ofthe 2nd Workshop on Computational Approaches to Subjectivity andSentiment Analysis pages 118ndash124 Association for ComputationalLinguistics

Reyes A Rosso P and Veale T (2013) A multidimensional approachfor detecting irony in twitter In Language Resources and Evaluationvolume 47 pages 239ndash268 Springer

Reynolds K Kontostathis A and Edwards L (2011) Using machinelearning to detect cyberbullying In Machine Learning and Applica-tions and Workshops (ICMLA) 2011 10th International Conferenceon volume 2 pages 241ndash244 IEEE

Ricca D (2004a) Aggettivi deverbali In La formazione delle parole initaliano Tuumlbingen Niemeyer pages 419ndash444

BIBLIOGRAPHY 273

Ricca D (2004b) Derivazione avverbiale In La formazione delle pa-role in italiano pages 472ndash489

Riloff E (1996) An empirical study of automated dictionary construc-tion for information extraction in three domains In Artificial intel-ligence volume 85 pages 101ndash134 Elsevier

Riloff E Patwardhan S and Wiebe J (2006) Feature subsumptionfor opinion analysis In Proceedings of the 2006 Conference on Em-pirical Methods in Natural Language Processing pages 440ndash448 As-sociation for Computational Linguistics

Riloff E Wiebe J and Wilson T (2003) Learning subjective nounsusing extraction pattern bootstrapping In Proceedings of the sev-enth conference on Natural language learning at HLT-NAACL 2003-Volume 4 pages 25ndash32 Association for Computational Linguistics

Rosenbaum P S (1967) The grammar of English predicate comple-ment constructions Research monograph 47 Cambridge MA MITPress

Rubin V L (2010) Epistemic modality From uncertainty to certaintyin the context of information seeking as interactions with texts InInformation Processing amp Management volume 46 pages 533ndash540Elsevier

Ruppenhofer J (2013) Anchoring sentiment analysis in frame se-mantics In Veredas pages 66ndash81

Ruppenhofer J Ellsworth M Petruck M R Johnson C R andScheffczyk J (2006) FrameNet II Extended theory and practice

Ruppenhofer J and Rehbein I (2012) Semantic frames as an an-chor representation for sentiment analysis In Proceedings of the 3rdWorkshop in Computational Approaches to Subjectivity and Senti-ment Analysis pages 104ndash109 Association for Computational Lin-guistics

274 BIBLIOGRAPHY

Ruppenhofer J Somasundaran S and Wiebe J (2008) Finding thesources and targets of subjective expressions In LREC

Russom P et al (2011) Big data analytics In TDWI Best PracticesReport Fourth Quarter

Ruwet N (1994) Ecirctre ou ne pas ecirctre un verbe de sentiment In Languefranccedilaise pages 45ndash55 JSTOR

Sagot B and Fort K (2007) Ameacuteliorer un lexique syntaxique agrave lrsquoaidedes tables du lexique-grammaire adverbes en-ment In 26e Col-loque International sur le Lexique et la grammaire 2007

Salvi G and Vanelli L (2004) Nuova grammatica italiana BolognaIl Mulino

Sarwar B Karypis G Konstan J and Riedl J (2001) Item-basedcollaborative filtering recommendation algorithms In Proceedingsof the 10th international conference on World Wide Web pages 285ndash295 ACM

Sauriacute R and Pustejovsky J (2009) Factbank A corpus annotated withevent factuality In Language resources and evaluation volume 43pages 227ndash268 Springer

Schmid H (1994) Probabilistic part-of-speech tagging using decisiontrees In Proceedings of the international conference on new methodsin language processing volume 12 pages 44ndash49

Schmid H Baroni M Zanchetta E and Stein A (2007) The en-riched treetagger system In proceedings of the EVALITA 2007 work-shop

Schuler K K (2005) VerbNet A broad-coverage comprehensive verblexicon PhD thesis Computer and Information Science Dept Uni-versity of Pennsylvania

BIBLIOGRAPHY 275

Schuller B Rigoll G and Lang M (2003) Hidden markov model-based speech emotion recognition In Acoustics Speech and SignalProcessing 2003 Proceedings(ICASSPrsquo03) 2003 IEEE InternationalConference on volume 2 pages IIndash1 IEEE

Schuller B Rigoll G and Lang M (2004) Speech emotion recogni-tion combining acoustic features and linguistic information in a hy-brid support vector machine-belief network architecture In Acous-tics Speech and Signal Processing 2004 Proceedings(ICASSPrsquo04)IEEE International Conference on volume 1 pages Indash577 IEEE

Schwarzschild R (2008) The semantics of comparatives and otherdegree constructions In Language and Linguistics Compass vol-ume 2 pages 308ndash331 Wiley Online Library

Seki Y Eguchi K Kando N and Aono M (2005) Multi-documentsummarization with subjectivity analysis at duc 2005 In Proceed-ings of the Document Understanding Conference (DUC)

Seki Y Evans D K Ku L-W Chen H-H Kando N and Lin C-Y(2007) Overview of opinion analysis pilot task at ntcir-6 In Proceed-ings of NTCIR-6 Workshop Meeting pages 265ndash278

Shaikh M A M Prendinger H and Mitsuru I (2007) Assessingsentiment of text by semantic dependency and contextual valenceanalysis In Affective Computing and Intelligent Interaction pages191ndash202 Springer

Shapiro C (1982) Consumer information product quality and sellerreputation In The Bell Journal of Economics pages 20ndash35 JSTOR

Shapiro C (1983) Premiums for high quality products as returns toreputations In The quarterly journal of economics pages 659ndash679JSTOR

Shimada K and Endo T (2008) Seeing several stars A rating in-ference task for a document containing several evaluation criteria

276 BIBLIOGRAPHY

In Advances in Knowledge Discovery and Data Mining pages 1006ndash1014 Springer

Silberztein M (2003) Nooj manual In Available for download atwww nooj4nlp net

Silberztein M (2008) Complex annotations with nooj In Proceedingsof the 2007 International NooJ Conference pages pndash214 CambridgeScholars Publishing

Silva M J Carvalho P Costa C and Sarmento L (2010) Automaticexpansion of a social judgment lexicon for sentiment analysis Tech-nical Report TR 1008

Silva M J Carvalho P and Sarmento L (2012) Building a sentimentlexicon for social judgement mining In Computational Processingof the Portuguese Language pages 218ndash228 Springer

Singhal P and Bansal A (2013) Improved textual cyberbullying de-tection using data mining In International Journal of Informationand Computation Technology ISSN pages 0974ndash2239

Snyder B and Barzilay R (2007) Multiple aspect ranking using thegood grief algorithm In HLT-NAACL pages 300ndash307

Socher R Perelygin A Wu J Y Chuang J Manning C D Ng A Yand Potts C (2013) Recursive deep models for semantic composi-tionality over a sentiment treebank In Proceedings of the conferenceon empirical methods in natural language processing (EMNLP) vol-ume 1631 page 1642

Somprasertsri G and Lalitrojwong P (2010) Mining feature-opinionin online customer reviews for opinion summarization In J UCSvolume 16 pages 938ndash955

Souza M Vieira R Busetti D Chishman R Alves I M et al(2011) Construction of a portuguese opinion lexicon from multi-ple resources In STIL

BIBLIOGRAPHY 277

Steinberger J Ebrahim M Ehrmann M Hurriyetoglu A KabadjovM Lenkova P Steinberger R Tanev H Vaacutezquez S and ZavarellaV (2012) Creating sentiment dictionaries via triangulation In Deci-sion Support Systems volume 53 pages 689ndash694 Elsevier

Stigler G J (1961) The economics of information In The journal ofpolitical economy pages 213ndash225 JSTOR

Stone P J Dunphy D C and Smith M S (1966) The General In-quirer A Computer Approach to Content Analysis MIT press

Stone P J and Hunt E B (1963) A computer approach to contentanalysis studies using the general inquirer system In Proceedingsof the May 21-23 1963 spring joint computer conference pages 241ndash256 ACM

Strapparava C and Mihalcea R (2008) Learning to identify emo-tions in text In Proceedings of the 2008 ACM symposium on Appliedcomputing pages 1556ndash1560 ACM

Strapparava C Valitutti A et al (2004) WordNet-Affect an affectiveextension of WordNet In LREC volume 4 pages 1083ndash1086

Strapparava C Valitutti A and Stock O (2006) The affective weightof lexicon In Proceedings of the fifth international conference on lan-guage resources and evaluation pages 423ndash426

Swier R S and Stevenson S (2004) Unsupervised semantic role la-belling In Proceedings of EMNLP volume 95 page 102

Taboada M Anthony C and Voll K (2006) Methods for creatingsemantic orientation dictionaries In Proceedings of the 5th Inter-national Conference on Language Resources and Evaluation (LREC)Genova Italy pages 427ndash432

Taboada M Brooke J Tofiloski M Voll K and Stede M (2011)Lexicon-based methods for sentiment analysis In Computationallinguistics volume 37 pages 267ndash307 MIT Press

278 BIBLIOGRAPHY

Taboada M and Grieve J (2004) Analyzing appraisal automaticallyIn Proceedings of AAAI Spring Symposium on Exploring Attitude andAffect in Text (AAAI Technical Re port SS 04 07) Stanford Univer-sity CA pp 158q161 AAAI Press

Tan S Cheng X Wang Y and Xu H (2009) Adapting naive bayesto domain adaptation for sentiment analysis In Advances in Infor-mation Retrieval pages 337ndash349 Springer

Taylor A (1954) Proverbial comparisons and similes from Californiavolume 3 University of California Press

Tepperman J Traum D R and Narayanan S (2006) yeahright sarcasm recognition for spoken dialogue systems In INTER-SPEECH

Terveen L Hill W Amento B McDonald D and Creter J (1997)Phoaks A system for sharing recommendations In Communica-tions of the ACM volume 40 pages 59ndash62 ACM

Tesniegravere L (1959) Eleacutements de syntaxe structurale Librairie C Klinck-sieck

Thomas M Pang B and Lee L (2006) Get out the vote Deter-mining support or opposition from congressional floor-debate tran-scripts In Proceedings of the 2006 conference on empirical methodsin natural language processing pages 327ndash335 Association for Com-putational Linguistics

Tolone E and Stavroula V (2011) Extending the adverbial coverage ofa nlp oriented resource for french In arXiv preprint arXiv11113462

Tsur O Davidov D and Rappoport A (2010) Icwsm-a great catchyname Semi-supervised recognition of sarcastic sentences in onlineproduct reviews In ICWSM

Turney P D (2001) Mining the web for synonyms Pmi-ir versus lsaon toefl In Machine Learning ECML 2001 pages 491ndash502 Springer

BIBLIOGRAPHY 279

Turney P D (2002) Thumbs up or thumbs down semantic orienta-tion applied to unsupervised classification of reviews In Proceed-ings of the 40th annual meeting on association for computationallinguistics pages 417ndash424

Turney P D and Littman M L (2003) Measuring praise and criti-cism Inference of semantic orientation from association In ACMTransactions on Information Systems (TOIS) volume 21 pages 315ndash346

Veale T and Hao Y (2009) Support structures for linguistic creativityA computational analysis of creative irony in similes In Proceedingsof CogSci pages 1376ndash1381

Velikovich L Blair-Goldensohn S Hannan K and McDonald R(2010) The viability of web-derived polarity lexicons In HumanLanguage Technologies The 2010 Annual Conference of the NorthAmerican Chapter of the Association for Computational Linguisticspages 777ndash785 Association for Computational Linguistics

Vermeij M (2005) The orientation of user opinions through adverbsverbs and nouns In 3rd Twente Student Conference on IT EnschedeJune

Vietri S (1990) On some comparative frozen sentences in italianIn Lingvisticaelig Investigationes volume 14 pages 149ndash174 John Ben-jamins Publishing Company

Vietri S (2004) Lessico-grammatica dellrsquoitaliano Metodi descrizionie applicazioni UTET Universitagrave

Vietri S (2011) On a class of italian frozen sentences In LingvisticaeligInvestigationes volume 34 pages 228ndash267 John Benjamins Publish-ing Company

Vietri S (2014a) The construction of an annotated corpus for theanalysis of italian transfer predicates In Lingvisticae Investigationesvolume 37 pages 69ndash105 John Benjamins Publishing Company

280 BIBLIOGRAPHY

Vietri S (2014b) Idiomatic Constructions in Italian A Lexicon-grammar Approach volume 31 John Benjamins Publishing Com-pany

Vietri S (2014c) The italian module for nooj In In Proceedings of theFirst Italian Conference on Computational Linguistics CLiC-it 2014Pisa University Press

Vietri S (2014d) The lexicon-grammar of italian idioms In Work-shop on Lexical and Grammatical Resources for Language Process-ing page 137

Villars R L Olofson C W and Eastwood M (2011) Big data Whatit is and why you should care In White Paper IDC

Vincze V (2010) Speculation and negation annotation in naturallanguage texts what the case of bioscope might (not) reveal InNegation and Speculation in Natural Language Processing (NeSp-NLP 2010) page 28

Vincze V Szarvas G Farkas R Moacutera G and Csirik J (2008) Thebioscope corpus biomedical texts annotated for uncertainty nega-tion and their scopes In BMC bioinformatics volume 9 page S9BioMed Central Ltd

Voghera M (2004) Polirematiche In Grossmann M Rainer F(a curadi) La formazione delle parole in italiano Tbingen Niemeyer pages56ndash69

Voll K and Taboada M (2007) Not all words are created equal Ex-tracting semantic orientation as a function of adjective relevance InAI 2007 Advances in Artificial Intelligence pages 337ndash346 Springer

Vollero A (2010) E-marketing e Web communication Verso la gestionedella corporate reputation online Giappichelli Torino

BIBLIOGRAPHY 281

von Ungern-Sternberg T and von Weizsaumlcker C C (1985) The sup-ply of quality on a market for experience goods In The Journal ofIndustrial Economics pages 531ndash540 JSTOR

Voyatzi S (2006) Description morphosyntaxique et seacutemantique desadverbes figeacutes en vue drsquoun systeacuteme drsquoanalyse automatique des textesgrecs PhD thesis Universiteacute Paris-Est

Waltinger U (2010) Germanpolarityclues A lexical resource for ger-man sentiment analysis In LREC

Wan X (2009) Co-training for cross-lingual sentiment classificationIn Proceedings of the Joint Conference of the 47th Annual Meeting ofthe ACL and the 4th International Joint Conference on Natural Lan-guage Processing of the AFNLP Volume 1 pages 235ndash243 Associa-tion for Computational Linguistics

Wang X Zhao Y and Fu G (2011) A morpheme-based method tochinese sentence-level sentiment classification In Int J of AsianLang Proc volume 21 pages 95ndash106

Warriner A B Kuperman V and Brysbaert M (2013) Norms of va-lence arousal and dominance for 13915 english lemmas In Behav-ior research methods volume 45 pages 1191ndash1207 Springer

Wawer A (2012) Extracting emotive patterns for languages with richmorphology In International Journal of Computational Linguisticsand Applications volume 3 pages 11ndash24

Wei C-P Chen Y-M Yang C-S and Yang C C (2010) Un-derstanding what concerns consumers a semantic approach toproduct feature extraction from consumer reviews In Informa-tion Systems and E-Business Management volume 8 pages 149ndash167Springer

Whissel C (1989) The dictionary of affect in language emotion The-ory research and experience vol 4 the measurement of emotionsr In Plutchik and H Kellerman Eds New York Academic

282 BIBLIOGRAPHY

Whissell C (2009) Using the revised dictionary of affect in languageto quantify the emotional undertones of samples of natural lan-guage 1 2 In Psychological reports volume 105 pages 509ndash521 Am-mons Scientific

Whitley M S (1995) Gustar and other psych verbs a problem in tran-sitivity In Hispania pages 573ndash585 JSTOR

Wiebe J (2000) Learning subjective adjectives from corpora InAAAIIAAI pages 735ndash740

Wiebe J Wilson T Bruce R Bell M and Martin M (2004) Learn-ing subjective language In Computational linguistics volume 30pages 277ndash308 MIT Press

Wiebe J Wilson T and Cardie C (2005) Annotating expressionsof opinions and emotions in language In Language resources andevaluation volume 39 pages 165ndash210 Springer

Wiegand M Balahur A Roth B Klakow D and Montoyo A (2010)A survey on the role of negation in sentiment analysis In Proceed-ings of the workshop on negation and speculation in natural lan-guage processing pages 60ndash68 Association for Computational Lin-guistics

Wilson D and Sperber D (1992) On verbal irony In Lingua vol-ume 87 pages 53ndash76 Elsevier

Wilson T Wiebe J and Hoffmann P (2005) Recognizing contex-tual polarity in phrase-level sentiment analysis In Proceedings ofthe conference on human language technology and empirical meth-ods in natural language processing pages 347ndash354 Association forComputational Linguistics

Wilson T Wiebe J and Hoffmann P (2009) Recognizing contextualpolarity An exploration of features for phrase-level sentiment anal-ysis In Computational linguistics volume 35 pages 399ndash433 MITPress

BIBLIOGRAPHY 283

Xia R and Zong C (2010) Exploring the use of word relation featuresfor sentiment classification In Proceedings of the 23rd InternationalConference on Computational Linguistics Posters pages 1336ndash1344Association for Computational Linguistics

Xiang G Fan B Wang L Hong J and Rose C (2012) Detecting of-fensive tweets via topical feature discovery over a large scale twittercorpus In Proceedings of the 21st ACM international conference onInformation and knowledge management pages 1980ndash1984 ACM

Xu Z and Zhu S (2010) Filtering offensive language in online com-munities using grammatical relations In Proceedings of the SeventhAnnual Collaboration Electronic Messaging Anti-Abuse and SpamConference

Yang S and Ko Y (2011) Extracting comparative entities and pred-icates from texts using comparative type classification In Proceed-ings of the 49th Annual Meeting of the Association for ComputationalLinguistics Human Language Technologies-Volume 1 pages 1636ndash1644 Association for Computational Linguistics

Ye Q Law R Gu B and Chen W (2011) The influence of user-generated content on traveler behavior An empirical investigationon the effects of e-word-of-mouth to hotel online bookings In Com-puters in Human Behavior volume 27 pages 634ndash639 Elsevier

Ye Q Zhang Z and Law R (2009) Sentiment classification of on-line reviews to travel destinations by supervised machine learningapproaches In Expert Systems with Applications volume 36 pages6527ndash6535 Elsevier

Yessenalina A and Cardie C (2011) Compositional matrix-spacemodels for sentiment analysis In Proceedings of the Conference onEmpirical Methods in Natural Language Processing pages 172ndash182Association for Computational Linguistics

284 BIBLIOGRAPHY

Yi J Nasukawa T Bunescu R and Niblack W (2003) Sentimentanalyzer Extracting sentiments about a given topic using naturallanguage processing techniques In Data Mining 2003 ICDM 2003Third IEEE International Conference on pages 427ndash434

Yin D Xue Z Hong L Davison B D Kontostathis A and Ed-wards L (2009) Detection of harassment on web 20 In Proceedingsof the Content Analysis in the WEB volume 2 pages 1ndash7

Yuen R W Chan T Y Lai T B Kwong O Y and Trsquosou B K(2004) Morpheme-based derivation of bipolar semantic orientationof chinese words In Proceedings of the 20th international conferenceon Computational Linguistics page 1008 Association for Computa-tional Linguistics

Zhang L and Liu B (2011) Identifying noun product features thatimply opinions In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human Language Tech-nologies short papers-Volume 2 pages 575ndash580 Association forComputational Linguistics

Zhang L Liu B Lim S H and OrsquoBrien-Strain E (2010) Extractingand ranking product features in opinion documents In Proceed-ings of the 23rd international conference on computational linguis-tics Posters pages 1462ndash1470 Association for Computational Lin-guistics

Zhao Q Sun C Liu B and Cheng Y (2010) Learning to detecthedges and their scope using crf In Proceedings of the FourteenthConference on Computational Natural Language LearningmdashSharedTask pages 100ndash105 Association for Computational Linguistics

Zhou N (2010) Taboo Language on the Internet An Analysis of Gen-der Differences in Using Taboo Language PhD thesis KristianstadUniversity

  • Introduction
    • Research Context
    • Filtering Information Automatically
    • Subjectivity, Emotions and Opinions
    • Structure of the Thesis
  • Theoretical Background
    • The Lexicon-Grammar Theoretical Framework
      • Distributional and Transformational Analysis
      • Formal Representation of Elementary Sentences
        • The Continuum from Simple Sentences to Polyrematic Structures
        • Against the Bipartition of the Sentence Structure
      • Lexicon-Grammar Binary Matrices
      • Syntax-Semantics Interface
        • Frame Semantics
        • Semantic Predicates Theory
    • The Italian Module of NooJ
      • Electronic Dictionaries
      • Local grammars and Finite-state Automata
  • SentIta: Lexicon-Grammar based Sentiment Resources
    • Lexicon-Grammar of Sentiment Expressions
    • Literature Survey on Subjectivity Lexicons
      • Polarity Lexicons
      • Affective Lexicons
      • Subjectivity Lexicons for Other Languages
        • Italian Resources
        • Sentiment Lexical Resources for Different Languages
    • SentIta and its Manually-built Resources
      • Semantic Orientations and Prior Polarities
      • Adjectives of Sentiment
        • A Special Remark on Negative Words in Dictionaries and Positive Biases in Corpora
      • Psychological and other Opinion Bearing Verbs
        • A Brief Literature Survey on the Classification of Psych Verbs
        • Semantic Predicates from the LG Framework
        • Psychological Predicates
        • Other Verbs of Sentiment
      • Psychological Predicates’ Nominalizations
        • The Presence of Support Verbs
        • The Absence of Support Verbs
        • Ordinary Verbs Structures
        • Standard-Crossed Structures
      • Psychological Predicates’ Adjectivalizations
        • Support Verbs
        • Nominal groups with appropriate nouns Napp
        • Verbless Structures
        • Evaluation Verbs
      • Vulgar Lexicon
    • Towards a Lexicon of Opinionated Compounds
      • Compound Adverbs and their Polarities
        • Compound Adjectives of Sentiment
      • Frozen Expressions of Sentiment
        • The Lexicon-Grammar Classes Under Examination: PECO, CEAC, EAA, ECA and EAPC
        • The Formalization of Sentiment Frozen Expressions
    • The Automatic Expansion of SentIta
      • Literature Survey on Sentiment Lexicon Propagation
        • Thesaurus Based Approaches
        • Corpus Based Approaches
        • The Morphological Approach
      • Deadjectival Adverbs in -mente
        • Semantics of the -mente formations
        • The Superlative of the -mente Adverbs
      • Deadjectival Nouns of Quality
      • Morphological Semantics
  • Shifting Lexical Valences through FSA
    • Contextual Valence Shifters
    • Polarity Intensification and Downtoning
      • Related Works
      • Intensification and Downtoning in SentIta
      • Excess Quantifiers
    • Negation Modeling
      • Related Works
      • Negation in SentIta
    • Modality Detection
      • Related Works
      • Modality in SentIta
    • Comparative Sentences Mining
      • Related Works
      • Comparison in SentIta
      • Interaction with the Intensification Rules
      • Interaction with the Negation Rules
  • Specific Tasks of the Sentiment Analysis
    • Sentiment Polarity Classification
      • Related Works
      • DOXA: a Sentiment Classifier based on FSA
      • A Dataset of Opinionated Reviews
      • Experiment and Evaluation
        • Sentence-level Sentiment Analysis
        • Document-level Sentiment Analysis
    • Feature-based Sentiment Analysis
      • Related Works
      • Experiment and Evaluation
      • Opinion Visual Summarization
    • Sentiment Role Labeling
      • Related Works
      • Experiment and Evaluation
  • Conclusion
  • Italian Lexicon-Grammar Symbols
  • Acronyms
  • Bibliography

Contents

1 Introduction
1.1 Research Context
1.2 Filtering Information Automatically
1.3 Subjectivity, Emotions and Opinions
1.4 Structure of the Thesis

2 Theoretical Background
2.1 The Lexicon-Grammar Theoretical Framework
2.1.1 Distributional and Transformational Analysis
2.1.2 Formal Representation of Elementary Sentences
2.1.2.1 The Continuum from Simple Sentences to Polyrematic Structures
2.1.2.2 Against the Bipartition of the Sentence Structure
2.1.3 Lexicon-Grammar Binary Matrices
2.1.4 Syntax-Semantics Interface
2.1.4.1 Frame Semantics
2.1.4.2 Semantic Predicates Theory
2.2 The Italian Module of Nooj
2.2.1 Electronic Dictionaries
2.2.2 Local grammars and Finite-state Automata

3 SentIta: Lexicon-Grammar based Sentiment Resources
3.1 Lexicon-Grammar of Sentiment Expressions
3.2 Literature Survey on Subjectivity Lexicons
3.2.1 Polarity Lexicons
3.2.2 Affective Lexicons
3.2.3 Subjectivity Lexicons for Other Languages
3.2.3.1 Italian Resources
3.2.3.2 Sentiment Lexical Resources for Different Languages
3.3 SentIta and its Manually-built Resources
3.3.1 Semantic Orientations and Prior Polarities
3.3.2 Adjectives of Sentiment
3.3.2.1 A Special Remark on Negative Words in Dictionaries and Positive Biases in Corpora
3.3.3 Psychological and other Opinion Bearing Verbs
3.3.3.1 A Brief Literature Survey on the Classification of Psych Verbs
3.3.3.2 Semantic Predicates from the LG Framework
3.3.3.3 Psychological Predicates
3.3.3.4 Other Verbs of Sentiment
3.3.4 Psychological Predicates’ Nominalizations
3.3.4.1 The Presence of Support Verbs
3.3.4.2 The Absence of Support Verbs
3.3.4.3 Ordinary Verbs Structures
3.3.4.4 Standard-Crossed Structures
3.3.5 Psychological Predicates’ Adjectivalizations
3.3.5.1 Support Verbs
3.3.5.2 Nominal groups with appropriate nouns Napp
3.3.5.3 Verbless Structures
3.3.5.4 Evaluation Verbs
3.3.6 Vulgar Lexicon
3.4 Towards a Lexicon of Opinionated Compounds
3.4.1 Compound Adverbs and their Polarities
3.4.1.1 Compound Adjectives of Sentiment
3.4.2 Frozen Expressions of Sentiment
3.4.2.1 The Lexicon-Grammar Classes Under Examination: PECO, CEAC, EAA, ECA and EAPC
3.4.2.2 The Formalization of Sentiment Frozen Expressions
3.5 The Automatic Expansion of SentIta
3.5.1 Literature Survey on Sentiment Lexicon Propagation
3.5.1.1 Thesaurus Based Approaches
3.5.1.2 Corpus Based Approaches
3.5.1.3 The Morphological Approach
3.5.2 Deadjectival Adverbs in -mente
3.5.2.1 Semantics of the -mente formations
3.5.2.2 The Superlative of the -mente Adverbs
3.5.3 Deadjectival Nouns of Quality
3.5.4 Morphological Semantics

4 Shifting Lexical Valences through FSA
4.1 Contextual Valence Shifters
4.2 Polarity Intensification and Downtoning
4.2.1 Related Works
4.2.2 Intensification and Downtoning in SentIta
4.2.3 Excess Quantifiers
4.3 Negation Modeling
4.3.1 Related Works
4.3.2 Negation in SentIta
4.4 Modality Detection
4.4.1 Related Works
4.4.2 Modality in SentIta
4.5 Comparative Sentences Mining
4.5.1 Related Works
4.5.2 Comparison in SentIta
4.5.3 Interaction with the Intensification Rules
4.5.4 Interaction with the Negation Rules

5 Specific Tasks of the Sentiment Analysis
5.1 Sentiment Polarity Classification
5.1.1 Related Works
5.1.2 DOXA: a Sentiment Classifier based on FSA
5.1.3 A Dataset of Opinionated Reviews
5.1.4 Experiment and Evaluation
5.1.4.1 Sentence-level Sentiment Analysis
5.1.4.2 Document-level Sentiment Analysis
5.2 Feature-based Sentiment Analysis
5.2.1 Related Works
5.2.2 Experiment and Evaluation
5.2.3 Opinion Visual Summarization
5.3 Sentiment Role Labeling
5.3.1 Related Works
5.3.2 Experiment and Evaluation

6 Conclusion

A Italian Lexicon-Grammar Symbols

B Acronyms

Bibliography

Chapter 1

Introduction

Consumers, as Internet users, can freely share their thoughts with huge and geographically dispersed groups of people, competing in this way with the traditional power of marketing and advertising channels. Unlike traditional word of mouth, which is usually limited to private conversations, Internet user-generated content can be directly observed and described by researchers. Pang and Lee (2008a, p. 7) identified 2001 as the “beginning of widespread awareness of the research problems and opportunities that Sentiment Analysis and Opinion Mining raise, and subsequently there have been literally hundreds of papers published on the subject”.

The context that facilitated the growth of interest in the automatic treatment of opinions and emotions can be summarized in the following points:

• the expansion of e-commerce (Matthews et al. 2001);

• the growth of user-generated content (forums, discussion groups, blogs, social media, review websites, aggregation sites), which can constitute large-scale databases for training machine learning algorithms (Chin 2005; Pang and Lee 2008a);

• the rise of machine learning methods (Pang and Lee 2008a);

• the importance of on-line Word of Mouth (eWOM) (Gruen et al. 2006; Lee and Youn 2009; Park and Lee 2009; Chu and Kim 2011);

• customer empowerment (Vollero 2010);

• the large volume, the high velocity and the wide variety of unstructured data (Russom et al. 2011; Villars et al. 2011; McAfee et al. 2012).

The same preconditions caused a parallel rise of attention in the economics and marketing literature. Basically, through user-generated content consumers share positive or negative information that can influence purchase decisions in many different ways and can shape buyer expectations, above all with regard to experience goods (Nakayama et al. 2010) such as hotels (Ye et al. 2011; Nelson 1970), restaurants (Zhang et al. 2010), movies (Duan et al. 2008; Reinstein and Snyder 2005), books (Chevalier and Mayzlin 2006) or videogames (Zhu and Zhang 2006; Bounie et al. 2005).

1.1 Research Context

Experience goods are products or services whose qualities cannot be observed prior to purchase. They are distinguished from search goods (e.g. clothes, smartphones) on the basis of the possibility of testing them before the purchase and of the information costs involved (Nelson 1970). Akerlof (1970, pp. 488-490) clearly exemplified the problems caused by the interaction of quality differences and uncertainty:

“There are many markets in which buyers use some market statistic to judge the quality of prospective purchases. [...] The automobile market is used as a finger exercise to illustrate and develop these thoughts. [...] The individuals in this market buy a new automobile without knowing whether the car they buy will be good or a lemon¹. [...] After owning a specific car, however, for a length of time, the car owner can form a good idea of the quality of this machine. [...] An asymmetry in available information has developed: for the sellers now have more knowledge about the quality of a car than the buyers. [...] The bad cars sell at the same price as good cars since it is impossible for a buyer to tell the difference between a good and a bad car; only the seller knows.”

Asymmetric information can damage both the customers, when they are not able to recognize good-quality products and services, and the market itself, when it is negatively affected by the presence of a smaller quantity of quality products than would have been offered under the condition of symmetrical information (Lavanda and Rampa 2001; von Ungern-Sternberg and von Weizsäcker 1985).

However, it has been deemed possible, for experience goods, to base an evaluation of their quality on the past experiences of customers who, in previous “periods”, had already experienced the same goods:

“In most real world situations suppliers stay on the market for considerably longer than one ‘period’. They then have the possibility to build up reputations or goodwill with the consumers. This is due to the following mechanism. While the consumers cannot directly observe a product’s quality at the time of purchase, they may try to draw inferences about this quality from the past experience they (or others) have had with this supplier’s products” (von Ungern-Sternberg and von Weizsäcker 1985, p. 531).

Possible sources of information for buyers of experience goods are word of mouth and the good reputation of the sellers (von Ungern-Sternberg and von Weizsäcker 1985; Diamond 1989; Klein and Leffler 1981; Shapiro 1982, 1983; Dewally and Ederington 2006).

¹ A defective car, discovered to be bad only after it has been bought.

The possibility of surveying other customers’ experiences increases the chance of selecting a good-quality service and the customer’s expected utility. It is worthwhile to continue this search until its cost equals its expected marginal return. This is the “optimum amount of search” defined by Stigler (1961).

The rapid growth of the Internet drew the attention of managers and business academics to the possible influences that this medium can exert on customers’ information search behaviors and acquisition processes. The hypothesis that the Web 2.0, as an interactive medium, could make the experience goods market transparent by transforming it into a search goods market has been explored by Klein (1998), who based his idea on three factors:

• lower search costs;

• different evaluation of some product (attributes);

• the possibility to experience products (attributes) virtually, without physically inspecting them.

Actually, this idea seems too groundbreaking to be entirely true. Indeed, the real outcome of web searches depends on the quantity of available information, on its quality and on the possibility of making fruitful comparisons among alternatives (Alba et al. 1997; Nakayama et al. 2010).

Therefore, from this perspective, it is true that the growth of user-generated content and eWOM can reduce information search costs. On the other hand, the distance increased by e-commerce, the content explosion and the information overload typical of the Big Data age can seriously hinder the achievement of a symmetrical distribution of information, affecting not only the market of experience goods but also that of search goods.


1.2 Filtering Information Automatically

The last remarks on the influence of the Internet on information searches inevitably reopen the long-standing problems connected to web information filtering (Hanani et al. 2001) and to the possibility of automatically “extracting value from chaos” (Gantz and Reinsel 2011).

An appropriate management of online corporate reputation requires careful monitoring of the new digital environments, which strengthen the stakeholders’ influence and independence and give support during the decision-making process. For example, an in-depth analysis of these environments can show a company the areas that need improvement. Furthermore, they can be a valid support in price determination or in demand prediction (Bloom 2011; Pang and Lee 2008a).

It would indeed be difficult for humans to read and summarize such a huge volume of data about customer opinions. At the same time, introducing machines to the semantic dimension of human language remains an ongoing challenge.

In this context, business and intelligence applications would play a crucial role in the ability to automatically analyze and summarize not only databases but also raw data in real time.

The largest amount of on-line data is semistructured or unstructured and, as a result, its monitoring requires sophisticated Natural Language Processing (NLP) tools that must be able to pre-process the data from a linguistic point of view and then automatically access their semantic content.

Traditional NLP tasks consist in Data Mining, Information Retrieval and Extraction of facts, i.e. objective information about entities, from semistructured and unstructured texts.

In any case, it is of crucial importance for both customers and companies to have at their disposal automatically extracted, analyzed and summarized data that include not only factual information but also opinions regarding any kind of good on offer (Liu 2010; Bloom 2011).

Companies could take advantage of concise and comprehensive customer opinion overviews that automatically summarize the strengths and the weaknesses of their products and services, with evident benefits in terms of reputation management and customer relationship management.

Customer information search costs could be decreased through the same overviews, which offer the opportunity to evaluate and compare the positive and negative experiences of other consumers who have already tested the same products and services.

Obviously, the advantages of sophisticated NLP methods and software, and of their ability to distinguish factual from opinionated language, are not limited to the ones discussed so far; they are spread across different tasks and domains, such as:

Ads placement: possibility to avoid websites that are irrelevant, but also unfavorable, when placing advertisements (Jin et al. 2007);

Question-answering: chance to distinguish between factual and speculative answers (Carbonell 1979);

Text summarization: potential of summarizing different opinions and perspectives in addition to factual information (Seki et al. 2005);

Recommendation systems: opportunity to recommend only the items with positive feedback (Terveen et al. 1997);

Flame and cyberbullying detection: ability to find semantically and syntactically complex offensive contents on the web (Xiang et al. 2012; Chen et al. 2012; Reynolds et al. 2011);

Literary reputation tracking: possibility to distinguish disapproving from supporting citations (Piao et al. 2007; Taboada et al. 2006);

Political texts analysis: opportunity to better understand the positive or negative orientations of both voters and politicians (Laver et al. 2003; Mullen and Malouf 2006).


1.3 Subjectivity, Emotions and Opinions

Sentiment Analysis, Opinion Mining, subjectivity analysis, review mining, appraisal extraction and affective computing are all terms that refer to the computational treatment of opinion, sentiment and subjectivity in raw text. Their ultimate shared goal is enabling computers to recognize and express emotions (Pang and Lee 2008a; Picard and Picard 1997).

The first two terms are by far the most used in the literature. Although they basically refer to the same subject, they have met with varying degrees of success in different research communities. While Opinion Mining is more popular among web information retrieval researchers, Sentiment Analysis is more common in NLP environments (Pang and Lee 2008a). Therefore, consistency with the approach of our work is the only reason for favoring the term Sentiment Analysis; all the terms mentioned above could be used and interpreted nearly interchangeably.

Subjectivity is directly connected to the expression of private states: personal feelings or beliefs toward entities, events and their properties (Liu 2010). The expression private state has been defined by Quirk et al. (1985) and Wiebe et al. (2004) as a general covering term for opinions, evaluations, emotions and speculations.

The main difference from objectivity regards the impossibility of directly observing or verifying subjective language, but neither subjective nor objective implies truth: “whether or not the source truly believes the information, and whether or not the information is in fact true, are considerations outside the purview of a theory of linguistic subjectivity” (Wiebe et al. 2004, p. 281).

Opinions are defined as positive or negative views, attitudes, emotions or appraisals about a topic, expressed by an opinion holder at a given time. They are represented by Liu (2010) as the following quintuple:

$(o_j, f_{jk}, oo_{ijkl}, h_i, t_l)$

Where $o_j$ represents the object of the opinion, $f_{jk}$ its features, $oo_{ijkl}$ the positive or negative semantic orientation of the opinion, $h_i$ the opinion holder and $t_l$ the time at which the opinion is expressed.

Because the time can almost always be deduced from structured data, we focused our work on the automatic detection and annotation of the other elements of the quintuple.

As regards the opinion holder, it must be underlined that “In the case of product reviews and blogs, opinion holders are usually the authors of the posts. Opinion holders are more important in news articles because they often explicitly state the person or organization that holds a particular opinion” (Liu 2010, p. 2).

The Semantic Orientation (SO) gives a measure to opinions by weighing their polarity (positive/negative/neutral) and strength (intense/weak) (Taboada et al. 2011; Liu 2010). The polarity can be assigned to words and phrases outside or inside of a sentence or discourse context. In the first case it is called prior polarity (Osgood 1952); in the second case, contextual or posterior polarity (Gatti and Guerini 2012).

We can refer to both opinion objects and features with the term target (Liu 2010), represented by the following function:

$T = O(f)$

Where the object can take the shape of products, services, individuals, organizations, events, topics, etc., and the features are components or attributes of the object.

Each object $O$, represented as a “special feature” and defined by a subset of features, is formalized in the following way:

$F = \{f_1, f_2, \ldots, f_n\}$

Targets can be automatically discovered in texts through both synonym words and phrases $W_i$ or indicators $I_i$ (Liu 2010):

$W_i = \{w_{i1}, w_{i2}, \ldots, w_{im}\}$

$I_i = \{i_{i1}, i_{i2}, \ldots, i_{iq}\}$
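By way of illustration only, the quintuple and the target lexicons above can be sketched as a small Python data structure. The class name, the field names, the integer encoding of the semantic orientation and the toy synonym sets are assumptions made for this example; they do not come from Liu (2010) or from the resources described in this thesis.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Opinion:
    """One opinion quintuple (o_j, f_jk, oo_ijkl, h_i, t_l)."""
    obj: str                    # o_j: the object of the opinion
    feature: str                # f_jk: the feature the opinion is about
    orientation: int            # oo_ijkl: sign = polarity, magnitude = strength (assumed encoding)
    holder: str                 # h_i: the opinion holder
    time: Optional[str] = None  # t_l: often recoverable from structured data

    @property
    def polarity(self) -> str:
        if self.orientation > 0:
            return "positive"
        if self.orientation < 0:
            return "negative"
        return "neutral"


# Hypothetical W_i sets: words that signal each feature f_i.
FEATURE_WORDS: Dict[str, Set[str]] = {
    "room": {"room", "bedroom", "suite"},
    "staff": {"staff", "receptionist", "personnel"},
}


def find_features(text: str, feature_words: Dict[str, Set[str]]) -> List[str]:
    """Return the features whose synonym set W_i shares a token with the text."""
    tokens = {tok.strip(".,;:!?").lower() for tok in text.split()}
    return sorted(f for f, words in feature_words.items() if tokens & words)


review = "The receptionist was rude, but the suite was lovely."
print(find_features(review, FEATURE_WORDS))
print(Opinion("hotel", "staff", -2, "reviewer").polarity)
```

A real system would need lemmatization and multiword matching; exact token lookup is used here only to keep the sketch short.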

Emotions constitute the research object of Emotion Detection (Strapparava et al. 2004; Fragopanagos and Taylor 2005; Alm et al. 2005; Strapparava et al. 2006; Neviarouskaya et al. 2007; Strapparava and Mihalcea 2008; Whissell 2009; Argamon et al. 2009; Neviarouskaya et al. 2009b, 2011; de Albornoz et al. 2012), a very active NLP subfield that borders on Speech Recognition (Buchanan et al. 2000; Lee et al. 2002; Pierre-Yves 2003; Schuller et al. 2003; Bhatti et al.; Schuller et al. 2004) and Facial Expression Recognition (Essa et al. 1997; Niedenthal et al. 2000; Pantic and Patras 2006; Pal et al. 2006; De Silva et al. 1997). In this NLP-based work we focus on Emotion Detection from written raw texts.

In order to provide a formal definition of sentiments, we refer to the functions of Gross (1995):

$P(\mathit{sent}, h)$

$\mathit{Caus}(s, \mathit{sent}, h)$

In the first one, the experiencer of the emotion ($h$) is a function of a sentiment ($\mathit{sent}$) through a predicative relationship; the latter expresses a sentiment that is caused by a stimulus ($s$) on a person ($h$).
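As a purely illustrative sketch, the two predicates can be written as typed records. The Python names and the example values are assumptions of this example; in particular, reading the experiencer relation off the causative one by dropping the stimulus is a simplification adopted here, not a claim made by Gross (1995).

```python
from typing import NamedTuple


class P(NamedTuple):
    """P(sent, h): the experiencer h stands in a predicative relation to a sentiment."""
    sent: str
    h: str


class Caus(NamedTuple):
    """Caus(s, sent, h): a stimulus s causes the sentiment sent in the person h."""
    s: str
    sent: str
    h: str


def experiencer_view(c: Caus) -> P:
    """Drop the stimulus and keep only the sentiment-experiencer pair."""
    return P(sent=c.sent, h=c.h)


# Hypothetical example values.
event = Caus(s="the delay", sent="irritation", h="Maria")
print(experiencer_view(event))
```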

1.4 Structure of the Thesis

In this Chapter we introduced the research field of Sentiment Analysis, placing it in the wider framework of Computational Linguistics and Natural Language Processing. We tried to clarify some terminological issues raised by the proliferation and overlap of terms in the literature on subjectivity. The choice of these topics has been justified through a brief presentation of the research context, which poses numerous challenges to all Internet stakeholders.

The main aim of the thesis is to treat both the facts and the opinions expressed on the Web in the form of raw data with the same accuracy and velocity as the data stored in database tables.

Details about the contribution provided by this work to Sentiment Analysis studies are structured as follows.

Chapter 2 introduces the theoretical framework on which the whole work is grounded, the Lexicon-Grammar (LG) approach, with an in-depth examination of distributional and transformational analysis and of the semantic predicates theory (Section 2.1).

The assumption that elementary sentences are the basic discourse units endowed with meaning, and the necessity to replace a unique grammar with plenty of lexicon-dependent local grammars, are the founding ideas of the LG approach on which the research has been designed and realized.

The formalization of the sentiment lexical resources and tools has been performed through the tools described in Section 2.2.

Consistently with the LG method, our approach to Sentiment Analysis is lexicon-based. Chapter 3 provides a detailed description of the sentiment lexicon built ad hoc for the Italian language. It includes manually annotated resources for both simple (Section 3.3) and multiword (Section 3.4) units, and automatically derived annotations (Section 3.5) for the adverbs in -mente and for the nouns of quality.

Because a word’s meaning cannot be considered out of context, Chapter 3 explores the influences of some elementary sentence structures on the sentiment words occurring in them, while Chapter 4 investigates the effects on sentence polarities of the so-called Contextual Valence Shifters (CVS), namely Intensification and Downtoning (Section 4.2), Negation (Section 4.3), Modality (Section 4.4) and Comparison (Section 4.5).

Chapter 5 focuses on the three most challenging subtasks of Sentiment Analysis: the sentiment polarity classification of sentences and documents (Section 5.1), feature-based Sentiment Analysis (Section 5.2) and sentiment-based semantic role labeling (Section 5.3). Experiments on these tasks are conducted with our resources and rules on, respectively, a multi-domain corpus of customer reviews, a dataset of hotel comments, and a dataset that mixes tweets and news headlines.

Concluding remarks and limitations of the present work are reported in Chapter 6.

Due to the large number of challenges faced in this work, and in order to avoid penalizing the clarity and readability of the thesis, we preferred to mention the state-of-the-art works related to a given topic directly in the sections in which the topic is discussed. For easy reference, we anticipate that the literature survey on existing sentiment lexicons is reported in Section 3.2; the methods tested in the literature for lexicon propagation are presented in Section 3.5.1; works related to intensification and negation modeling, to modality detection and to comparative sentence mining are mentioned in Sections 4.2.1, 4.3.1, 4.4.1 and 4.5.1, respectively; finally, state-of-the-art approaches to sentiment classification, feature-based Sentiment Analysis and Sentiment Role Labeling (SRL) are cited in Sections 5.1.1, 5.2.1 and 5.3.1.

Some Sections of this thesis refer to works that have in part already been presented to the international computational linguistics community. These are the cases of Chapter 3, briefly introduced in Pelosi (2015a,b) and Pelosi et al. (2015), and Chapter 4, anticipated in Pelosi et al. (2015). Furthermore, the results presented in Section 5.1 have been discussed in Maisto and Pelosi (2014b), the results of Section 5.2 in Maisto and Pelosi (2014a), and those of Section 5.3 in Elia et al. (2015).

Chapter 2

Theoretical Background

2.1 The Lexicon-Grammar Theoretical Framework

By Lexicon-Grammar we mean the method and practice of formal description of natural language introduced by Maurice Gross in the second half of the 1960s. While verifying some procedures of transformational-generative grammar (TGG) (Chomsky 1965), Gross laid the foundations for a brand new theoretical framework.

More specifically, during the description of French complement clause verbs through a transformational-generative approach, Gross (1975) realized that the English descriptive model of Rosenbaum (1967) was not sufficient to account for the huge number of irregularities he found in the French language.

LG introduced important changes in the way the relationship between lexicon and syntax was conceived (Gross 1971, 1975). For the first time, it underlined the necessity of providing linguistic descriptions grounded on the systematic testing of syntactic and semantic rules across the whole lexicon, and not only on a limited set of speculative examples.

Actually, in order to define what can be considered a rule and what must be considered an exception, a small sample of the lexicon of the analyzed language, collected in arbitrary ways, can be truly misleading. It is assumed to be impossible to make generalizations without verifying or falsifying the hypotheses. According to Gross (1997, p. 1):

“Grammar, or as it has now been called, linguistic theory, has always been driven by a quest for complete generalizations, resulting invariably in recent times in the production of abstract symbolism, often semantic, but also algorithmic. This development contrasts with that of the other Natural Sciences such as Biology or Geology, where the main stream of activity was and is still now the search and accumulation of exhaustive data. Why the study of language turned out to be so different is a moot question. One could argue that the study of sentences provides an endless variety of forms and that the observer himself can increase this variety at will within his own production of new forms; that would seem to confirm that an exhaustive approach makes no sense. [...] But grammarians operating at the level of sentences seem to be interested only in elaborating general rules and do so without performing any sort of systematic observation and without a methodical accumulation of sentence forms to be used by further generations of scientists.”

In the LG methodology, the collection and analysis of a large quantity of linguistic facts, and their continuous comparison with the reality of linguistic usage through examples and counter-examples, are crucial.

The linguistic information collected is constantly registered in LG tables that cross-check the lexicon and the syntax of any given language. The ultimate purpose of this procedure is the definition of wide-ranging classes of lexical items associated with transformational, distributional and structural rules (D’Agostino 1992).

What emerges from LG studies is that, when more than five or six properties are associated with a set of lexical entries, each entry shows an individual behavior that distinguishes it from any other lexical item. However, it is always possible to organize a classification around at least one definitional property that is simultaneously accepted by all the items belonging to the same LG class and, for this reason, is promoted to the distinctive feature of the class.

Having established this, it becomes easier to understand the necessity of replacing “the” grammar with thousands of lexicon-dependent local grammars.

The choice of this paradigm is due to its compatibility with the purposes of the present work and with computational linguistics in general, which, in order to reach high performance, requires a large amount of linguistic data that must be as exhaustive, reproducible and well organized as possible.

The huge linguistic datasets produced over the years by the international LG community provide fine-grained semantic and syntactic descriptions of thousands of lexical entries, also referred to as “lexically exhaustive grammars” (D’Agostino 2005; D’Agostino et al. 2007; Guglielmo 2009), available for reuse in any kind of NLP task¹.

The LG classification and description of the Italian verbs² (Elia et al. 1981; Elia 1984; D’Agostino 1992) is grounded on the differentiation of three macro-classes: transitive verbs, intransitive verbs and complement clause verbs. Every LG class has its own definitional structure, which corresponds to the syntactic structure of the nuclear sentence selected by a given number of verbs (e.g. V for piovere “to rain” and all the verbs of class 1; N0 V for bruciare “to burn” and the other verbs of class 3; N0 V da N1 for provenire “to come from” and the verbs belonging to class 6; etc.). All the lexical entries are then differentiated from one another within each class by taking into account all the transformational, distributional and structural properties accepted or rejected by every item.

¹ LG descriptions are available for 4,000+ nouns that enter into verb support constructions, 7,000+ verbs, 3,000+ multiword adverbs and almost 1,000 phrasal verbs (Elia 2014a).

² LG tables are freely available at http://dsc.unisa.it/composti/tavole/combo/tavole.asp
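The idea of an LG table can be sketched, under simplifying assumptions, as a binary matrix with lexical entries as rows and properties as columns. The entries verb_a, verb_b and verb_c and the properties P1 and P2 are hypothetical placeholders, and the +/- values are invented for the example; they are not taken from any actual LG table.

```python
from typing import Dict, List

# Rows: lexical entries; columns: properties; True = accepted (+), False = rejected (-).
LGTable = Dict[str, Dict[str, bool]]

# Toy data: "N0 V" plays the role of the definitional structure of the class.
TABLE: LGTable = {
    "verb_a": {"N0 V": True, "P1": True,  "P2": False},
    "verb_b": {"N0 V": True, "P1": False, "P2": False},
    "verb_c": {"N0 V": True, "P1": False, "P2": True},
}


def definitional_properties(table: LGTable) -> List[str]:
    """Properties accepted by every entry: candidates for defining the class."""
    columns = list(next(iter(table.values())))
    return [p for p in columns if all(row[p] for row in table.values())]


def all_entries_distinct(table: LGTable) -> bool:
    """True when no two entries accept exactly the same set of properties."""
    profiles = [tuple(sorted(row.items())) for row in table.values()]
    return len(set(profiles)) == len(profiles)


def render(table: LGTable) -> str:
    """Format the matrix with + for accepted and - for rejected properties."""
    columns = list(next(iter(table.values())))
    lines = ["\t".join(["entry"] + columns)]
    for entry, row in table.items():
        lines.append("\t".join([entry] + ["+" if row[p] else "-" for p in columns]))
    return "\n".join(lines)


print(render(TABLE))
print(definitional_properties(TABLE))
print(all_entries_distinct(TABLE))
```

In this toy table the structure N0 V is the only property shared by all three entries, while the remaining columns are enough to tell the entries apart.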

2.1.1 Distributional and Transformational Analysis

The Lexicon-Grammar theory lays its foundations on the operator-argument grammar of Zellig S. Harris, the combinatorial system that supports the generation of utterances in natural language.

Saying that operators store information about sentence structures means assuming the nuclear sentence to be the minimum discourse unit endowed with meaning (Gross 1992b).

This premise is shared with the LG theory, together with the centrality of distributional analysis, a method from structural linguistics that was first formulated by Bloomfield (1933) and then refined by Harris (1970). The insight that some categories of words can somehow control the functioning of a number of actants through a dependency relationship called valency comes, instead, from Tesnière (1959).

The distribution of an element A is defined by Harris (1970) as the sum of the contexts of A, where the context of A is the actual disposition of its co-occurring elements. Distributional analysis consists in the systematic substitution of lexical items with the aim of verifying the semantic or transformational reactions of the sentences. All of this is governed by verisimilitude rules that involve a gradation in the acceptability of item combinations.

Although the native speakers of a language generally think that sentence elements can be combined arbitrarily, they actually choose a set of items from classes that regularly appear together. The selection depends on the likelihood that an element co-occurs with elements of one class rather than another³.

When two sequences can appear in the same context, they have an equivalent distribution. When they do not share any context, they have a complementary distribution.

Obviously, the possibility of combining sentence elements is related to many different levels of acceptability. Basically, the utterances produced by word co-occurrence can vary in terms of plausibility. In the operator-argument grammar of Harris, the verisimilitude of occurrence of a word under an operator (or an argument) is an approximation of the likelihood or of the frequency of this word with respect to a fixed number of occurrences of the operator (or argument) (Harris 1988).

By transformations (Harris 1964) we refer to the phenomenon in which the sentences A and B, despite a different combination of words, are equivalent from a semantic point of view (e.g. Floria ha letto quel romanzo = quel romanzo è stato letto da Floria, “Floria read that novel = that novel has been read by Floria”; D’Agostino 1992). Therefore, a transformation is defined as a function T that ties a set of variables representing sentences that are in paraphrastic equivalence (A and B are partial or full synonyms) and that follow morphemic invariance (A and B possess the same lexical morphemes).

Transformational relations must not be confused with any process deriving sentence B from A, or vice versa. A and B are just two correlated variants in which equivalent grammatical relations are realized (D’Agostino 1992)⁴. That is why in this work we do not use directed arrows to indicate transformed sentences.

³ Among the traits that mark the co-occurrence classes we mention, by way of example, human and non-human nouns, abstract nouns, verbs with concrete or abstract usages, etc. (Elia et al. 1981).

⁴ The concept of transformation is different in generative grammar, which envisages an oriented passage from a deep structure to a surface one: “The central idea of transformational grammar is that they are, in general, distinct and that the surface structure is determined by repeated application of certain formal operations called ‘grammatical transformations’ to objects of a more elementary sort” (Chomsky 1965, p. 16).

Among the transformational manipulations considered in this thesis, while applying the Lexicon-Grammar theoretical framework to the Sentiment Analysis field, we mention the following (Elia et al. 1981; Vietri 2004):

Substitution: The substitution of polar elements in sentences is the starting point of this work, which explores the changes in terms of acceptability and semantic orientation in the transformed sentences. It can be found in almost every chapter of this thesis.

Paraphrases with support verbs,5 nominalizations and adjectivalizations: This manipulation is essential, for example, in Sections 3.3.4 and 3.3.5, where we discuss the nominalization (e.g. amare = amore "to love = love") and the adjectivalization (e.g. amare = amoroso "to love = loving") of the Italian psychological verbs, or in Section 3.5.2, when we account for the relation among a set of sentiment adjectives and adverbs (e.g. appassionatamente = in modo appassionato "passionately = in a passionate way").

Repositioning: Emphasis, permutation, dislocation and extraction are all transformations that, by shifting the focus onto a specific element of the sentence, influence its strength when the element is polarized.

Extension: We are above all interested in the expansion of nominal groups with opinionated modifiers (see Section 3.3.2).

Restructuring: The complement restructuring plays a crucial role in Feature-based Sentiment Analysis, for the correct annotation of the features and the objects of the opinions (see Sections 3.3.4 and 5.2).

Passivization: In our system of rules, passive sentences simply possess the same semantic orientation as the corresponding active sentences.

Systematic correlation: Examples are the standard/crossed structures mentioned in Section 3.3.4.

5 The concept of operator does not depend on a specific part of speech, so nouns, adjectives and prepositions can also possess the power to determine the nature and the number of the sentence arguments (D'Agostino, 1992). Because only verbs carry morpho-grammatical information regarding mood, tense, person and aspect, they must give this kind of support to non-verbal operators. The so-called Support Verbs (Vsup) are different from auxiliaries (Aux), which instead support other verbs. Support verbs (e.g. essere "to be", avere "to have", fare "to do") can be substituted, case by case, by stylistic, aspectual and causative equivalents.
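
As a rough illustration of the distributional notions introduced in this section, the following Python sketch treats the distribution of an item as the set of contexts in which it occurs in a toy corpus. The corpus, the function names, the reduction of a context to the pair of adjacent words and the intermediate label "overlapping" are assumptions made only for this example; they are not part of Harris's formal definition or of the LG method.

```python
from collections import defaultdict

# Toy corpus, invented for this illustration only.
CORPUS = [
    "Maria odia il fumo",
    "Maria detesta il fumo",
    "Maria odia il rumore",
    "Maria detesta il rumore",
    "il fumo tormenta Maria",
]

def distributions(sentences):
    """Map every word to the set of (left, right) contexts it occurs in."""
    contexts = defaultdict(set)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for i in range(1, len(tokens) - 1):
            contexts[tokens[i]].add((tokens[i - 1], tokens[i + 1]))
    return dict(contexts)

def compare(dist, a, b):
    """Classify the relation between the distributions of two words."""
    ctx_a, ctx_b = dist.get(a, set()), dist.get(b, set())
    if ctx_a == ctx_b:
        return "equivalent"      # exactly the same contexts
    if not (ctx_a & ctx_b):
        return "complementary"   # no shared context at all
    return "overlapping"         # assumption: some, but not all, contexts shared

dist = distributions(CORPUS)
print(compare(dist, "odia", "detesta"))
print(compare(dist, "fumo", "rumore"))
print(compare(dist, "odia", "fumo"))
```

In a real setting the contexts would be distributional classes of co-occurring elements (human nouns, abstract nouns and so on) rather than adjacent word forms, and acceptability would be graded rather than binary.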

2.1.2 Formal Representation of Elementary Sentences

While generative grammar focused its studies on complex sentences, both the Operator grammar and the LG approach assumed the so-called nuclear or elementary sentences, respectively, to be the minimum discourse units endowed with meaning, above all because complex sentences are made from elementary ones, as stated by Chomsky (1957, p. 92) himself:

"In particular, in order to understand a sentence it is necessary to know the kernel sentences from which it originates (more precisely, the terminal strings underlying these kernel sentences) and the phrase structure of each of these elementary components, as well as the transformational history of development of the given sentence from these kernel sentences. The general problem of analyzing the process of 'understanding' is thus reduced, in a sense, to the problem of explaining how kernel sentences are understood, these being considered the basic 'content elements' from which the usual, more complex sentences of real life are formed by transformational development."

This assumption shifts the definition of creativity from "recursivity" in complex sentences (Chomsky, 1957) to "combinatory possibility" at the level of elementary sentences (Vietri, 2004), but it also affects the way in which the lexicon is conceived. Gross (1981, p. 48) made the following observation in this regard:6

"Les entrées du lexique ne sont pas des mots, mais des phrases simples. Ce principe n'est en contradiction avec les notions traditionnelles de lexique que de façon apparente. En effet, dans un dictionnaire, il n'est pas possible de donner le sens d'un mot sans utiliser une phrase, ni de contraster des emplois différents d'un même mot sans le placer dans des phrases."7

As will be seen in the following sections, the idea that the entries of the lexicon can be sentences rather than isolated words is a principle that has been entirely shared by the sentiment lexical resources built for the present work.

With regard to formalisms, Gross (1992b) represented all the elementary sentences through the following generic shape:

N0 V W

where N0 stands for the formal subject of the sentence, V stands for the verb and W can indicate all kinds of essential complements, including an empty one. While N0 V represents a great generality, W raises more complex classification problems (Gross, 1992b). Complements indicated by W can be direct (Ni), prepositional (Prep Ni) or sentential (Ch F).

6 On this topic, Chomsky also agrees that "In describing the meaning of a word it is often expedient, or necessary, to refer to the syntactic framework in which this word is usually embedded; e.g., in describing the meaning of hit we would no doubt describe the agent and object of the action in terms of the notions subject and object, which are apparently best analyzed as purely formal notions belonging to the theory of grammar" (Chomsky, 1957, p. 104). But, without any exhaustive application of syntactic rules to lexical items, he feels compelled to point out that "(...) to generalize from this fairly systematic use and to assign 'structural meanings' to grammatical categories or constructions just as 'lexical meanings' are assigned to words or morphemes, is a step of very questionable validity" (Chomsky, 1957, p. 104).

7 "The entries of the lexicon are not words, but simple sentences. This principle contradicts the traditional notions of lexicon only apparently. Indeed, in a dictionary it is impossible to provide the meaning of a word without using a sentence, or to contrast different uses of the same word without placing it in sentences." Author's translation.

Information about the nature and the number of complements is contained in the verbs (or in other parts of speech that, case by case, play a predicative function).

Gross (1992b) presented the following typology of complements, approached for the Italian language by the LG researchers mentioned below:

frozen arguments (C): Vietri (1990, 2011, 2014d)

free arguments (N): D'Agostino (1992), Bocchino (2007)

sentential arguments (Ch F): Elia (1984)

2.1.2.1 The Continuum from Simple Sentences to Polyrematic Structures

Gross (1988, 1992b, 1993) recognized and formalized the concept of phrase figée "frozen sentence", which refers to groups of words that can be tied to one another by different degrees of variability and cohesion (Elia and De Bueriis, 2008). They can take the shape of verbal, nominal, adjectival or adverbial structures.

The continuum along which frozen expressions can be classified varies according to higher or lower levels of compositionality and idiomaticity, and goes from completely free structures to totally frozen expressions, as summarized below (D'Agostino and Elia, 1998; Elia and De Bueriis, 2008):

distributionally free structures: high levels of co-occurrence variability among words

distributionally restricted structures: limited levels of co-occurrence variability

frozen structures: (almost) null levels of co-occurrence variability

proverbs: no variability of co-occurrence

Idiomatic interpretations can be attributed to the last two elements of the continuum, due to their low levels of compositionality.

The fact that the meaning of these expressions cannot be computed by simply summing up the meanings of the words that compose them has a direct impact on every kind of Sentiment Analysis application that does not correctly approach the linguistic phenomenon of idiomaticity. We will show the role of multiword expressions in Section 3.4.

2.1.2.2 Against the Bipartition of the Sentence Structure

In the operator grammar, differently from the generative grammar, sentences are not supposed to have a bipartite structure

S → NP + VP

VP → V + NP 8

but are represented as predicative functions in which a central element, the operator, influences the organization of its variables, the arguments. The role of operator need not be played by verbs: it can also be played by other lexical items that possess a predicative function, such as nouns or adjectives.

This idea, shared also by the LG framework, is particularly important in the Sentiment Analysis field, especially if the task is the detection of the semantic roles involved in sentiment expressions (Section 5.3). Sentences like (1) and (2) are examples in which the semantic roles of experiencer of the sentiment and stimulus of the sentiment are respectively played by "Maria" and il fumo "the smoke". It is intuitive that, from a syntactic point of view, these semantic roles are differently distributed in the direct (the experiencer is the subject, as in 1) and the reverse (the experiencer is the object, as in 2) sentence structures (see Section 3.3.3 for further explanations).

(1) [Sentiment [Experiencer Maria] odia [Stimulus il fumo]]9
"Maria hates the smoke"

(2) [Sentiment [Stimulus Il fumo] tormenta [Experiencer Maria]]
"The smoke harasses Maria"

8 Sentence = Noun Phrase + Verb Phrase; Verb Phrase = Verb + Noun Phrase.

The indissoluble bond between the sentiment and the experiencer,10 which does not differ from (1) to (2), is perfectly expressed by the Operator-argument grammar function Onn for both sentences, by the LG representation N0 V N1, or by the function Caus(s, sent, h) of Gross (1995) (described in Section 1.3 and explored further in Section 3).

(1) Caus(il fumo, odiare, Maria)
(2) Caus(il fumo, tormentare, Maria)

The same cannot be said of the bipartite sentence structure, which, by representing the experiencer "Maria" inside (2) and outside (1) the verb phrase, unreasonably brings it closer to or further from the sentiment, to which it should be attached in just the same way (Figure 2.1).

Moreover, it does not explain how the verbs of (1) and (2) can exercise distributional constraints on the noun phrase indicating the experiencer (which must be human or at least animate), both in the case in which it is under the dependence of the same verb phrase (experiencer object, see 2a) and when it is not (experiencer subject, see 1a).

(1a) [Sentiment [Experiencer (Maria + Il mio gatto + *Il televisore)] odia [Stimulus il fumo]]
"(Maria + My cat + The television) hates the smoke"

(2a) [Sentiment [Stimulus Il fumo] tormenta [Experiencer (Maria + Il mio gatto + *Il televisore)]]
"The smoke harasses (Maria + My cat + The television)"

Figure 2.1: Syntactic Trees of the sentences (1) and (2)

9 For the sentence annotation we used the notation of Jackendoff (1990): [Event ].

10 Un sentiment est toujours attaché à la personne qui l'éprouve "A sentiment is always attached to the person that feels it" (Gross, 1995, p. 1).

According to Gross (1979, p. 872), the problem is that

"rewriting rules (e.g. S → NP VP, VP → V NP) describe only LOCAL dependencies. [...] But there are numerous syntactic situations that involve non-local constraints."

While the Lexicon-Grammar and the operator grammar representations do not change, the representations of the paraphrases of (1, 2) with support verbs (1b, 2b) further complicate the generative grammar solution, which still does not correctly represent the relationships among the operators and the arguments (see Figure 2.2). It fails to determine the correct number of arguments (which are still two, and not three) and does not recognize the predicative function, which is played by the nouns odio "hate" and tormento "harassment" and not by the verbs provare "to feel" and dare "to give".

(1b) [Sentiment [Experiencer Maria] prova odio per [Stimulus il fumo]]
"Maria feels hate for the smoke"

(2b) [Sentiment [Stimulus Il fumo] dà il tormento a [Experiencer Maria]]
"The smoke gives harassment to Maria"

For all these reasons, for the computational treatment of subjectivity, emotions and opinions in raw text, we preferred in this work the Lexicon-Grammar framework to other linguistic approaches, despite their popularity in the NLP literature.

Figure 2.2: Syntactic Trees of the sentences (1b) and (2b)

2.1.3 Lexicon-Grammar Binary Matrices

We said that the difference between the LG language description and any other linguistic approach regards the systematic formalization of a very broad quantity of data. This means that, given a set of properties P that define a class of lexical items L, the LG method collects and formalizes every single item that enters such a class (extensional classification), while other approaches limit their analyses to reduced sets of examples (intensional classification) (Elia et al., 1981).

Because the presentation of the information must be as transparent, rigorous and formalized as possible, LG represents its classes through binary matrices like the one exemplified in Table 2.1.

X     P1    P2    P3    P4    Pn
L1    +     –     –     –     +
L2    +     +     –     –     –
L3    +     +     +     +     +
L4    +     +     +     +     +
Ln    +     –     –     –     +

Table 2.1: Example of a Lexicon-Grammar Binary Matrix

They are called "binary" because of the mathematical symbols "+" and "–", respectively used to indicate the acceptance or the rejection of the properties P by each lexical entry L.

The symbol X unequivocally denotes an LG class and is always associated with a class definitional property, simultaneously accepted by all the items of the class (in the example of Table 2.1, P1). Subclasses of X can easily be built by choosing another P as definitional property (e.g. L2, L3 and L4 under the definitional substructure P2).

The sequences of "+" and "–" attributed to each entry constitute the "lexico-syntactic profiles" of the lexical items (Elia, 2014a).

The properties on which the classification is arranged can regard the following aspects (Elia et al., 1981):

distributional properties: by which all the co-occurrences of the lexical items L with distributional classes and subclasses are tested inside acceptable elementary sentence structures;

structural and transformational properties: through which all the combinatorial transformational possibilities inside elementary sentence structures are explored (see Section 2.1.1 for more in-depth explanations).

It is possible for a lexical entry to appear in more than one LG matrix. In this case, we are not referring to a single verb that accepts divergent syntactic and semantic properties but, instead, to two different verb usages attributable to the same morpho-phonological entry (Elia et al., 1981; Vietri, 2004; D'Agostino, 1992). Verb usages possess the same syntactic and semantic autonomy as two verbs that do not present any morpho-phonological relation.
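
To make the matrix representation more tangible, the following Python sketch encodes the abstract matrix of Table 2.1 and derives lexico-syntactic profiles and subclasses from it. It is only a minimal illustration: the entry and property labels are the placeholders of Table 2.1, while the data layout and the function names are assumptions of this example and do not reproduce the format of the actual LG tables.

```python
# Abstract matrix of Table 2.1: one string of "+" / "-" per lexical entry.
PROPERTIES = ["P1", "P2", "P3", "P4", "Pn"]
MATRIX = {
    "L1": "+---+",
    "L2": "++---",
    "L3": "+++++",
    "L4": "+++++",
    "Ln": "+---+",
}

def profile(entry):
    """Lexico-syntactic profile: property -> accepted (True) or refused (False)."""
    return {p: sign == "+" for p, sign in zip(PROPERTIES, MATRIX[entry])}

def subclass(definitional_property):
    """Entries that accept the property chosen as definitional."""
    return [e for e in MATRIX if profile(e)[definitional_property]]

def same_profile(a, b):
    """True when two entries accept and refuse exactly the same properties."""
    return MATRIX[a] == MATRIX[b]

print(subclass("P1"))  # the whole class X
print(subclass("P2"))  # the subclass defined by P2
print(same_profile("L3", "L4"))
```

A verb with two different usages would simply appear as two distinct keys, possibly in two different matrices, which is consistent with treating usages as autonomous entries.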

2.1.4 Syntax-Semantics Interface

In the sixties, the heated debate about the syntax-semantics interface opposed theories supporting the autonomy of syntax from semantics to approaches sustaining the close link between these two linguistic dimensions.

Chomsky (1957) postulated the independence of syntax from semantics, due to the vagueness of semantics and because of the correspondence mismatches between these two linguistic dimensions, but recognized the existence of a relation between the two that deserves to be explored:

"There is, however, little evidence that 'intuition about meaning' is at all useful in the actual investigation of linguistic form. [...] It seems clear, then, that undeniable, though only imperfect correspondences hold between formal and semantic features in language. The fact that the correspondences are so inexact suggests that meaning will be relatively useless as a basis for grammatical description. [...] The fact that correspondences between formal and semantic features exist, however, cannot be ignored. [...] Having determined the syntactic structure of the language, we can study the way in which this syntactic structure is put to use in the actual functioning of language. An investigation of the semantic function of level structure (...) might be a reasonable step towards a theory of the interconnections between syntax and semantics." (Chomsky, 1957, p. 94-102)

The contributions that come from generative grammar, with particular reference to the theta-theory (Chomsky, 1993), assume the possibility of a mapping between syntax and semantics:

"Every content-bearing major phrasal constituent of a sentence (S, NP, AP, PP, etc.) corresponds to a conceptual constituent of some major conceptual category.11" (Jackendoff, 1990, p. 44)

But their autonomy is not negated through this assumption: thematic roles are still considered to be part of the level of conceptual structure and not part of syntax (Jackendoff, 1990). This is confirmed again by Chomsky (1993, p. 17) when stating that "(...) the mapping of S-structures onto PF and LF are independent from one another". Here, the S-structure, generated by the rules of the syntax, is associated by means of different interpretative rules with representations in phonetic form (PF) and in logical form (LF).

Many contributions from the linguistic community define the concept of thematic roles through the linking problem between syntax (syntactic realization of predicate-argument structures that determine the roles) and semantics (semantic representation of such structures) (Dowty, 1989; Grimshaw, 1990; Rappaport Hovav and Levin, 1998). Among the canonical thematic roles, we mention the agent (Gruber, 1965), the patient (Baker, 1988), the experiencer (Grimshaw, 1990), the theme (Jackendoff, 1972), etc.

11 Object, Event, State, Action, Place, Path, Property, Amount.

2.1.4.1 Frame Semantics

"Some words exist in order to provide access to knowledge of such frames to the participants in the communication process, and simultaneously serve to perform a categorization which takes such framing for granted." (Fillmore, 2006, p. 381)

With these words, Fillmore (2006) depicted Frame Semantics, which describes sentences on the basis of predicators able to bring to mind the semantic frames (inference structures, linked through linguistic convention to the meanings of lexical items) and the frame elements (participants and props in the frame) involved in these frames (Fillmore, 1976; Fillmore and Baker, 2001; Fillmore, 2006).

A frame semantic description starts from the identification of the lexical items that carry a given meaning and then explores the ways in which the frame elements and their constellations are realized around the structures that have such items as head (Fillmore et al., 2002).

This theoretical framework is based on Case grammar, a study of the combination of the deep cases chosen by verbs and of the definition of case frames, which have the function of providing "a bridge between descriptions of situations and underlying syntactic representations" by attributing "semantico-syntactic roles to particular participants in the (real or imagined) situation represented by the sentence" (Fillmore, 1977, p. 61). A direct reference of Fillmore's work is the valency theory of Tesnière (1959).

Based on these principles, the FrameNet research project produced a lexicon of English for both human use and NLP applications (Baker et al., 1998; Fillmore et al., 2002; Ruppenhofer et al., 2006). Its purpose is to provide a large amount of semantically and syntactically annotated sentences, endowed with information about the valences (combinatorial possibilities) of the items, derived from an annotated contemporary English corpus. Among the semantic domains covered, there are also emotion and cognition (Baker et al., 1998).

For the Italian language, Lenci et al. (2012) developed LexIt, a tool that, following the FrameNet approach, automatically explores syntactic and semantic properties of Italian predicates in terms of distributional profiles. It performs frame semantic analyses using both the La Repubblica corpus and the Wikipedia taxonomy.

2.1.4.2 Semantic Predicates Theory

The Lexicon-Grammar framework offers the opportunity to create matches between sets or subsets of lexico-syntactic structures and their semantic interpretations. The basis of such matches is the connection between the arguments selected by a given predicative item listed in our tables and the actants involved by the same semantic predicate. According to Gross (1981, p. 9):

"Soit Sy l'ensemble des formes syntaxiques d'une langue [...]. Soit Se un ensemble d'éléments de sens, les éléments de Sy seront associés à ceux de Se par des règles d'interprétation. [...] nous appellerons actants syntaxiques les sujet et complément(s) du verbe, tels qu'ils sont décrits dans Sy; nous appelons arguments les variables des prédicats sémantiques. Dans certains exemples, il y a correspondance biunivoque entre actants et arguments, entre phrase simple et prédicat."12

This is the basic assumption on which the Semantic Predicates theory has been built within the LG framework. It postulates a complex parallelism between Sy and Se that, differently from other approaches, cannot ignore the idiosyncrasies of the lexicon, since "il n'existe pas deux verbes ayant le même ensemble de propriétés syntaxiques" ("no two verbs have the same set of syntactic properties") (Gross, 1981, p. 10). Actually, the large number of structures in which words can enter underlines the importance of a sentiment lexical database that includes both semantic and (lexicon-dependent) syntactic classification parameters.

Therefore, the possibility of creating intuitive semantic macro-classifications within the LG framework does not contrast with the autonomy of semantics from syntax (Elia, 2014b).

In this thesis we will refer to the following three predicates, all of them connected to the expression of subjectivity:

Sentiment (e.g. odiare "to hate"), which involves the actants experiencer and stimulus;

Opinion (e.g. difendere "to stand up for"), which implicates the actants opinion holder and target of the opinion;

Physical act (e.g. baciare "to kiss"), which evokes the actants patient and agent.

Figure 2.3: LG syntax-semantic interface

12 "Let Sy be the set of syntactic forms of a language [...]. Let Se be the set of semantic elements; the elements of Sy will be associated with the ones of Se through interpretation rules. [...] We will call syntactic actants the subject and the complement(s) of the verb, in the way in which they are described in Sy; we will call arguments the variables of semantic predicates. In some examples, there is a one-to-one correspondence between actants and arguments, between simple sentence and predicate." Author's translation.

As an example, the predicate in sentence (1), from the LG class 43, will be associated with a Predicate with two variables, described by the mathematical function Caus(s, sent, h):

(1) [Sentiment [Experiencer Maria] odia [Stimulus il fumo]]
"Maria hates the smoke"

The rules of interpretation that have been followed are:

1. the experiencer (h in the Se) corresponds to the formal subject (N0 in Sy);

2. the sentiment stimulus (t in the Se) is the human complement (N1 in Sy).
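
The interpretation rules above can be sketched in Python as a mapping from the syntactic positions N0 and N1 to the arguments of Caus(s, sent, h). The two-entry lexicon, the slot labels and the function name are hypothetical and serve only to contrast the direct structure of (1) with the reverse structure of (2); the actual resources encode this information in the LG tables.

```python
from typing import NamedTuple

class Caus(NamedTuple):
    s: str      # stimulus of the sentiment
    sent: str   # sentiment predicate (lemma)
    h: str      # human experiencer

# Hypothetical mini-lexicon: lemma and syntactic slot that hosts the experiencer.
LEXICON = {
    "odia": ("odiare", "N0"),          # direct structure: experiencer = subject
    "tormenta": ("tormentare", "N1"),  # reverse structure: experiencer = object
}

def interpret(n0, verb, n1):
    """Apply the interpretation rule associated with the verb to an N0 V N1 sentence."""
    lemma, experiencer_slot = LEXICON[verb]
    if experiencer_slot == "N0":
        return Caus(s=n1, sent=lemma, h=n0)
    return Caus(s=n0, sent=lemma, h=n1)

print(interpret("Maria", "odia", "il fumo"))
print(interpret("il fumo", "tormenta", "Maria"))
```

Both calls return a predicate in which "Maria" fills h and "il fumo" fills s, regardless of the syntactic position each occupies.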

As shown in the few examples below, the syntactic transformations in which the same predicate is involved preserve the role played by its arguments:

Nominalization

(1b) [Sentiment [Experiencer Maria] prova odio per [Stimulus il fumo]]
"Maria feels hate for the smoke"

Adjectivalization

(1c) [Sentiment [Stimulus il fumo] è odioso per [Experiencer Maria]]
"The smoke is hateful for Maria"

Passivization

(1d) [Sentiment [Stimulus il fumo] è odiato da [Experiencer Maria]]
"The smoke is hated by Maria"

Repositioning

(1e) [Sentiment è [Stimulus il fumo] che [Experiencer Maria] odia]
"It is the smoke that Maria hates"
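
The invariance of the roles across these transformations can be checked mechanically on the annotated examples. The following Python sketch is only an illustration: it assumes that each role label is separated from its filler by a space inside the brackets, and the regular expression and function name are introduced here for the example rather than taken from the annotation tools used in this work.

```python
import re

# Matches innermost role brackets such as "[Experiencer Maria]".
ROLE = re.compile(r"\[(Experiencer|Stimulus)\s+([^\[\]]+)\]")

def roles(annotated):
    """Extract a role -> filler mapping from a bracketed sentence."""
    return {label: filler.strip().lower() for label, filler in ROLE.findall(annotated)}

VARIANTS = {
    "base": "[Sentiment [Experiencer Maria] odia [Stimulus il fumo]]",
    "nominalization": "[Sentiment [Experiencer Maria] prova odio per [Stimulus il fumo]]",
    "adjectivalization": "[Sentiment [Stimulus il fumo] è odioso per [Experiencer Maria]]",
    "passivization": "[Sentiment [Stimulus il fumo] è odiato da [Experiencer Maria]]",
    "repositioning": "[Sentiment è [Stimulus il fumo] che [Experiencer Maria] odia]",
}

reference = roles(VARIANTS["base"])
for name, sentence in VARIANTS.items():
    # Every transformed sentence should assign the same fillers to the same roles.
    print(name, roles(sentence) == reference)
```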

As concerns the link between actants and arguments, the background of the research is the LEG-Semantic Role Labeling system of Elia (2014b), with particular reference to its psychological, evaluative and bodily predicates. The granularity of the verb properties in this SRL system is evidence of the strong dependence of syntax on the lexicon and proof of its absolute independence from semantics.

Special kinds of Semantic Predicates have already been used in NLP applications in a Lexicon-Grammar context. We mention Vietri (2014a), Elia et al. (2010) and Elia and Vietri (2010), who formalized and tested the Transfer Predicates on the Italian Civil Code; Elia et al. (2013), who focused on the Spatial Predicates; and Maisto and Pelosi (2014b), Pelosi (2015b) and Elia et al. (2015), who exploited the Psychological Semantic Predicates for Sentiment Analysis purposes.

2.2 The Italian Module of NooJ

NooJ is the NLP tool used in this work for both the language formalization and the corpora pre-processing and processing, at the orthographical, lexical, morphological, syntactic and semantic levels (Silberztein, 2003).

Among the NooJ modules that have been developed for more than twenty languages by the international NooJ community, the Italian Lingware has been built by the Maurice Gross Laboratory of the University of Salerno.

The Italian module for NooJ can be integrated at any time with other ad hoc resources, in the form of electronic dictionaries and local grammars. The freely downloadable Italian resources13 include:

• electronic dictionaries of simple and compound words

• an electronic dictionary of proper names

• an electronic dictionary of toponyms

• a set of morphological grammars

• samples of syntactic grammars

This knowledge-based and linguistics-based environment is not in contrast with hybrid and statistical approaches, which often require annotated corpora for their testing stages.

The NooJ annotations can cover the description of simple word forms, multiword units and even discontinuous expressions. For a detailed description of the potential of the Italian module of NooJ, see Vietri (2014c).

13 www.nooj-association.org

2.2.1 Electronic Dictionaries

If we typed the key-phrase "electronic dictionary" into a search engine, the first results to appear in the SERP would concern CD-ROM dictionaries or machine translators.

Actually, electronic dictionaries (usable with all computerized documents for text recognition, information retrieval and machine translation) should be conceived as lexical databases intended to be used only by computer applications, rather than by a wide audience, who probably would not be able to interpret data codes formalized in a complex way.

In addition to the target, the differences between the two types of dictionary can be summarized in three points:

Completeness: human dictionaries may overlook information that human users can extract from their encyclopedic knowledge. Electronic dictionaries, instead, must necessarily be exhaustive. They cannot leave anything to chance: the computer must not have any possibility of error.

Explanation: human dictionaries can provide part of the information implicitly, because they can rely on human users' skills (intuition, adaptation and deduction). Electronic dictionaries, on the contrary, must make all information explicit: the computer can only process fully explained symbols and instructions.

Coding: unlike human dictionaries, all the information provided in electronic dictionaries, related to the use of software for the automatic processing of data and texts, must be accurate, consistent and codified.

Combining all these features enlarges the size and increases the complexity of monolingual and bilingual electronic dictionaries, which are, for this reason, always bigger than the human ones.

The Italian electronic dictionary system, based on Maurice Gross's Lexicon-Grammar, has been developed at the University of Salerno by the Department of Communication Science; it is grounded on the European standard RELEX.

2.2.2 Local grammars and Finite-state Automata

Local grammars are algorithms that, through grammatical, morphological and lexical instructions, are used to formalize linguistic phenomena and to parse texts. They are defined as "local" because, despite any generalization, they can be used only in the description and analysis of limited linguistic phenomena.

The reliability of "finite state Markov processes" (Markov, 1971) had already been excluded by Chomsky (1957, p. 23-24), who argued that

"(...) there are processes of sentence-formation that finite state grammars are intrinsically not equipped to handle. If these processes have no finite limit, we can prove the literal inapplicability of this elementary theory. If the processes have a limit, then the construction of a finite state grammar will not be literally out of the question, since it will be possible to list the sentences, and a list is essentially a trivial finite state grammar. But this grammar will be so complex that it will be of little use or interest."

and remarked that

"If a grammar does not have recursive devices (...) it will be prohibitively complex. If it does have recursive devices of some sort, it will produce infinitely many sentences."

Instead, Gross (1993) grasped the potential of finite state machines, especially for those linguistic segments which are more limited from a combinatorial point of view.

The NooJ software, actually, is not limited to finite-state machines and regular grammars, but relies on the four types of grammars of the Chomsky hierarchy. In fact, as we will show in this research, it is capable of processing context-free and context-sensitive grammars and of performing transformations in cascade.

Finite-State Automata (FSA) are abstract devices characterized by a finite set of nodes or "states" (Si) connected to one another by transitions (tj) that allow us to determine sequences of symbols related to a particular path (see Figure 2.4).

These graphs are read from left to right, or rather from the initial state (S0) to the final state (S3) (Gross, 1993). FSA can be deterministic, such as the one in Figure 2.4, or non-deterministic, if they activate more than one path, as in Figure 2.5. A Finite-State Transducer (FST) transduces the input symbols (Sii) into the output ones (Sio), as in Figure 2.6. A Recursive Transition Network (RTN) is a grammar that contains embedded graphs (the S3 node in Figure 2.7). An Enhanced Recursive Transition Network (ERTN) is a more complex grammar that includes variables (V) and constraints (C) (Silberztein, 2003; see Figure 2.7).

Figure 2.4: Deterministic Finite-State Automaton

Figure 2.5: Non-deterministic Finite-State Automaton

Figure 2.6: Finite-State Transducer

Figure 2.7: Enhanced Recursive Transition Network

For simplicity, from now on we will just use the acronym FSA to indicate all the kinds of local grammars used to formalize different linguistic phenomena.

In general, the nodes of the graphs can at any time recall entries or semantic and syntactic information from the electronic dictionaries. FSA are inflectional or derivational if the symbols in the nodes are morphemes, and syntactic if the symbols are words.
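
The behaviour of a deterministic FSA and of an FST can be sketched in a few lines of Python. The example below is a minimal sketch under stated assumptions: the states S0 to S3 follow the naming used above, but the transitions, the Italian input words and the output tags are invented for illustration and do not reproduce Figures 2.4 to 2.6 or the NooJ grammar format.

```python
# Transition table of a toy transducer:
# (current state, input symbol) -> (next state, output symbol).
# Each key is unique, so at most one path is active: the automaton is deterministic.
TRANSITIONS = {
    ("S0", "molto"): ("S1", "<INTENSIFIER>"),
    ("S0", "non"): ("S2", "<NEGATION>"),
    ("S0", "bello"): ("S3", "<POSITIVE>"),
    ("S1", "bello"): ("S3", "<POSITIVE>"),
    ("S2", "bello"): ("S3", "<POSITIVE>"),
}
INITIAL = "S0"
FINAL = {"S3"}

def transduce(tokens):
    """Return the output symbols if the path ends in a final state, else None."""
    state, outputs = INITIAL, []
    for token in tokens:
        step = TRANSITIONS.get((state, token))
        if step is None:          # no transition for this symbol: the path fails
            return None
        state, output = step
        outputs.append(output)
    return outputs if state in FINAL else None

def accepts(tokens):
    """Plain recognition: ignore the outputs and keep only acceptance."""
    return transduce(tokens) is not None

print(accepts(["molto", "bello"]))
print(accepts(["bello", "molto"]))
print(transduce(["non", "bello"]))
```

A non-deterministic variant would map each key to a set of possible next states, and a recursive network would allow a transition label to call another graph instead of consuming a single symbol.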

Chapter 3

SentIta: Lexicon-Grammar based Sentiment Resources

3.1 Lexicon-Grammar of Sentiment Expressions

The LG theoretical framework, with its huge collections and categorization of linguistic data in the form of LG tables, built and refined over the years by its researchers, offers the opportunity to systematically relate a large number of semantic and syntactic properties to the lexicon. The complex intersection between semantics and syntax, the greatest benefit of this approach, corrects the worst deficiency of other semantic strategies proposed in the Sentiment Analysis literature: the tendency to group together, only on the basis of their meaning, words that have nothing in common from the syntactic point of view (Le Pesant and Mathieu-Colas 1998).1

1 On this point, Elia (2013, p. 286) explains that “(...) the semantic intuition that drives us to put together certain predicates and their argument is not correlated to the set of syntactic properties of the verbs, nor is it «helped» by it, except in a very superficial way; indeed, we might say that the granular nature of the syntax in the lexicon would be an obstacle for a mind that is organized in an efficient and rigorous logic way.”

“Classes d’objets” is the expression introduced by Gross (1992a) to indicate the linguistic device that allows the formalization of language from both a syntactic and a semantic point of view. Arguments and Predicates are classified into subsets which are semantically consistent and which also share lexical and grammatical properties (De Bueriis and Elia 2008).
According to Gross (2008, p. 1):

“Si un mot ne peut pas être défini en lui-même, c’est-à-dire hors contexte, mais seulement dans un environnement syntaxique donné, alors le lexique ne peut pas être séparé de la syntaxe, c’est-à-dire de la combinatoire des mots. La sémantique n’est pas autonome non plus: elle est le résultat des éléments lexicaux organisés d’une façon déterminée (distribution). Qu’il en soit ainsi est confirmé par les auteurs de dictionnaires eux-mêmes qui, timidement et sans aucune méthode, notent, pour un prédicat donné, le ou les arguments qui permettent de séparer un emploi d’un autre. Pas de sémantique sans syntaxe donc, c’est-à-dire sans contexte.”2

2 “If a word cannot be defined in itself, namely out of context, but only in a specific syntactic environment, then the lexicon cannot be separated from the syntax, that is to say from the word combinatorics. Semantics is not autonomous either: it is the result of lexical elements organized in a determined way (distribution). This is confirmed by the dictionary authors themselves, who, timidly and without any method, note, for a specific predicate, the argument(s) that allow one use to be separated from another. No semantics without syntax, that is, without context.” Author’s translation.

SentIta is a sentiment lexical database that directly aims to apply the Lexicon-Grammar theory, starting from its basic hypothesis: the minimum semantic units are the elementary sentences, not the words (Gross 1975).
Therefore, in this work, the lemmas collected in the dictionaries and their Semantic Orientations are systematically recalled and computed in a specific context through the use of local grammars. On the basis of their combinatorial features and co-occurrence contexts, the SentIta lexical items can take the following shapes (Buvet et al. 2005; Elia 2014a):

operators: the predicates, which can be verbs, nouns, adjectives, adverbs, multiword expressions, prepositions and conjunctions;

arguments: the predicate complements, which can be nominal and prepositional groups or entire clauses.

As already specified in Section 2.2, the sentiment words and other oriented Atomic Linguistic Units (ALU) are listed in Nooj electronic dictionaries, while the local grammars used to manipulate their polarities are formalized by means of Finite-State Automata.
Electronic dictionaries have then been exploited in order to list and to classify, semantically and syntactically, the sentiment lexical resources in a machine-readable format. The computational power of Nooj graphs has instead been used to represent the acceptance or refusal of the semantic and syntactic properties through the use of constraints and restrictions.
Chapter 2 underlined the role played in the elementary sentences by the Predicates and the one assumed by their arguments. According to Le Pesant and Mathieu-Colas (1998):

“La structure prédicat/arguments semble plus opératoire que les découpages binaires classiques (sujet/prédicat). [...] C’est le prédicat qui détermine le nombre de positions constitutives de la phrase.”3

Predicates must not be identified only with the category of verbs, but also with the predicative nouns and adjectives that enter into support verb structures.
In the subgroup of sentiment expressions, starting from their syntactic structures, Gross identified two semantic types:

3 “The predicate/argument structure seems to be more operative than the classical binary subdivision (subject/predicate). [...] It is the predicate that determines the number of positions that constitute the sentence.” Author’s translation.


1. sentences with two arguments, where one is the person and the other is the feeling (e.g. Maria è angosciata, “Maria is anguished”);

2. sentences with three arguments, which differ from the first type for the presence of a stimulus that triggers the feeling in another person (e.g. Gianni angoscia Maria, “Gianni anguishes Maria”).

The notation for their representation is the same as the one used for the Semantic Predicates (Gross 1981). Type 1 can be formalized by the following formula:

1. P(sent, h), e.g. P(angoscia, Maria)

in which the variable h is a function of a sentiment sent through a predicative relationship.
Type 2, instead, can be expressed by the following formula:

2. Caus(s, sent, h), e.g. Caus(Gianni, angoscia, Maria)

in which a sentiment sent is caused by a stimulus s on a person h.
Both of them are outcomes of the following assumption: “un sentiment est toujours attaché à la personne qui l’éprouve”4 (Gross 1995, p. 1). At first glance, the problem connected to sentiment expressions seems easy to define and solve. Upon deeper analysis, however, it presents challenging aspects: the quantity and the variability of sentiment expressions, which can vary from both a transformational and a morpho-syntactic point of view. As displayed below, a sentiment expressed by a word can take the shape of verbs, nouns, adjectives or adverbs (Gross 1995; D’Agostino 2005; D’Agostino et al. 2007):

V: angosciare “to anguish”, angosciarsi “to become anguished”

N: angoscia “anguish”

A: angosciante “anguishing”, angosciato, angoscioso “anguished”

ADV: angosciatamente, angosciosamente “with anguish”

4 “A sentiment is always attached to the person that feels it.” Author’s translation.


Such groups of words, related to one another by derivational morphology under a common root (e.g. angosc-), represent, from the LG point of view, classes of paraphrastic equivalence, or paraphrastic constellations (D’Agostino 1992). These concepts are crucial when applied to the Sentiment Analysis field, because the connection among the words displayed above is related both to a formal principle, which regards the sentence structure, and to a semantic principle, which concerns the distributional properties and the semantic roles of the arguments selected by the predicates (D’Agostino 1992).
Nevertheless, from a syntactic point of view, Gross (1995) noticed that the terms of sentiment can occupy all the different nominal, adjectival, verbal or adverbial positions within the same sentence structure.

3.2 Literature Survey on Subjectivity Lexicons

The most used approaches in the Sentiment Analysis field can be divided and summarized into three main lines of research: lexicon-based methods, learning and statistical methods, and hybrid methods.
The first ones coincide with the strategy that has been chosen to carry out the present work.
Lexicon-based approaches always start from the following assumption: the sentiment orientation of a text comes from the semantic orientations of the words and phrases contained in it.
The most commonly used Semantic Orientation (SO) indicators are adjectives or adjective phrases (Hatzivassiloglou and McKeown 1997; Hu and Liu 2004; Taboada et al. 2006), but recently the use of adverbs (Benamara et al. 2007), nouns (Vermeij 2005; Riloff et al. 2003) and verbs (Neviarouskaya et al. 2009a) has become common as well.
Hand-built lexicons are definitely more accurate than automatically built ones, especially in cross-domain Sentiment Analysis tasks. Nevertheless, manually drawing up a dictionary is


considered a painstakingly time-consuming activity (Taboada et al. 2011; Bloom 2011). This is why a large number of studies on the automatic creation and propagation of polarity lexicons can be observed in the literature. On this topic, it must be said that automatic dictionaries seem to be more unstable, but usually larger, than the manually built ones (see Tables 3.1 and 3.2). Size, anyway, does not always mean quality: it is common for these large dictionaries to carry scarcely detailed information. A large number of entries could denote fewer details in the description (e.g. the Maryland dictionary (Mohammad et al. 2009) does not specify the entries’ part of speech) or could instead mean more noise.
Among the state-of-the-art methods used to build and test those dictionaries we mention Latent Semantic Analysis (LSA) (Landauer and Dumais 1997), bootstrapping algorithms (Riloff et al. 2003), graph propagation algorithms (Velikovich et al. 2010; Kaji and Kitsuregawa 2007), conjunctions (and, or, but) and morphological relations between adjectives (Hatzivassiloglou and McKeown 1997), Context Coherency (Kanayama and Nasukawa 2006a), distributional similarity (Wiebe 2000), etc.5
Word Similarity is a very frequently used method in thesaurus-based dictionary propagation. Examples are the Maryland dictionary, created thanks to a Roget-like thesaurus and a handful of affixes (Mohammad et al. 2009), and other lexicons based on WordNet, like SentiWordNet, built on the basis of the quantitative analysis of the glosses associated with synsets (Esuli and Sebastiani 2005, 2006a), or other lexicons based on the computation of a distance measure on WordNet (Kamps et al. 2004; Esuli and Sebastiani 2005).
Pointwise Mutual Information (PMI) using Seed Words has been applied to sentiment lexicon propagation by Turney (2002), Turney and Littman (2003), Rao and Ravichandran (2009), Velikovich et al. (2010) and Gamon and Aue (2005). Seed words are words which are strongly associated with a positive or negative meaning, such as eccellente (“excellent”) or orrendo (“horrible”), by which it is possible to build a bigger lexicon, detecting other words that frequently occur alongside them. It has been observed, indeed, that positive words often occur close to positive seed words, whereas negative words are likely to appear around negative seed words (Turney 2002; Turney and Littman 2003).
In the past, PMI could be easily calculated using the Web as a corpus, thanks to AltaVista’s NEAR operator. Today, one must be content with the Google AND operator.
Learning and statistical methods for Sentiment Analysis usually make use of Support Vector Machines (Pang et al. 2002; Mullen and Collier 2004; Ye et al. 2009) or Naïve Bayes classifiers (Tan et al. 2009; Kang et al. 2012).
Finally, as regards the hybrid methods, the works of Read and Carroll (2009), Li et al. (2010), Andreevskaia and Bergler (2008), Dang et al. (2010), Dasgupta and Ng (2009), Goldberg and Zhu (2006), Prabowo and Thelwall (2009) and Wan (2009) must be cited.6
A brief summary of the nature and the size of the most popular lexicons for Sentiment Analysis is given in Tables 3.1 and 3.2.
A clarification needs to be made about the distinction between Polarity lexicons and Affect lexicons. The first ones are used in subjectivity detection or in polarity and intensity classification systems, while the latter, going beyond the positive/negative dichotomy, refer to a more fine-grained emotion classification.
In the following paragraphs we will first describe some of the most famous Polarity lexicons, and then we will briefly cite some databases for affect classification.
Another distinction will be made between the resources conceived for the English language and the ones created for other languages. The Italian language will especially be considered.

5 More details about lexicon propagation will be given in Section 3.5.1.
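To make the seed-word propagation mentioned earlier in this section more concrete, the following Python sketch computes a Turney-style orientation score, SO(w) = PMI(w, positive seeds) - PMI(w, negative seeds). Written with hit counts, this reduces to log2 of hits(w, pos) · hits(neg) over hits(w, neg) · hits(pos). Several elements are assumptions made for the example: co-occurrence within the same toy document stands in for a NEAR or AND web query, the corpus is invented, and the smoothing constant 0.01 and the name `so_pmi` are illustrative choices.

```python
import math
from typing import Iterable, List, Set


def so_pmi(
    word: str,
    docs: List[Set[str]],
    pos_seeds: Iterable[str],
    neg_seeds: Iterable[str],
    smoothing: float = 0.01,
) -> float:
    """Semantic orientation of `word` from document-level co-occurrence."""
    pos_seeds = set(pos_seeds)
    neg_seeds = set(neg_seeds)

    def hits(seeds: Set[str], with_word: bool) -> int:
        # Number of documents containing at least one seed
        # (and, if requested, also the target word).
        return sum(
            1
            for doc in docs
            if (not with_word or word in doc) and any(s in doc for s in seeds)
        )

    # Smoothing avoids division by zero and log(0) for unseen pairs.
    numerator = (hits(pos_seeds, True) + smoothing) * (hits(neg_seeds, False) + smoothing)
    denominator = (hits(neg_seeds, True) + smoothing) * (hits(pos_seeds, False) + smoothing)
    return math.log2(numerator / denominator)


# Invented toy corpus: each document is a bag of word forms.
DOCS = [
    {"film", "eccellente", "emozionante"},
    {"attore", "eccellente", "emozionante"},
    {"film", "orrendo", "noioso"},
    {"finale", "orrendo", "noioso"},
]
POSITIVE_SEEDS = ["eccellente"]
NEGATIVE_SEEDS = ["orrendo"]
```

With this toy corpus, a word that only co-occurs with the positive seed (emozionante) receives a positive score, a word that only co-occurs with the negative seed (noioso) receives a negative one, and a word distributed evenly (film) scores zero.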

6 These contributions will be discussed further in Section 5.1.1.


Dictionary                  Author             Year   Entries         Language
SO-CAL lexicon              Taboada            2011   5000+           Eng
AFINN-111                   Hansen             2011   2400+           Eng
Sentilex                    Silva              2010   7000+           Port
Q-WordNet 3.0               Agerri et al.      2010   15500+          Eng
DAL-2009                    Whissell           2009   8700+           Eng
MPQA lexicon                Wilson et al.      2005   8000+           Eng
Hu-Liu lexicon              Hu et al.          2004   6800+           Eng
Hatzivassiloglou lexicon    Hatzivassiloglou   1997   15400+ pairs    Eng
General Inquirer            Stone              1961   9800+           Eng

Table 3.1: Manually built Polarity Lexicons

Dictionary               Author               Year   Entries    Language
Sentix                   Basile, Nissim       2013   59700+     Ita
SentiWordNet             Baccianella et al.   2010   38000+     Eng
Web-generated lexicon    Velikovich et al.    2010   178000+    Eng
Maryland dictionary      Mohammad et al.      2009   76000+     Eng

Table 3.2: (Semi-)Automatically built Polarity Lexicons

3.2.1 Polarity Lexicons

SO-CAL dictionary: Taboada et al. (2011), due to the low stability of automatically generated lexical databases, manually developed the SO-CAL dictionary. They hand-tagged, on an evaluation scale ranging from +5 to -5, the semantically oriented words they found in a variety of sources:

• the multi-domain collection of 400 reviews belonging to different categories, described in Taboada and Grieve (2004) and Taboada et al. (2006);

• 100 movie reviews from the Polarity Dataset of Pang et al. (2002) and Pang and Lee (2004);

• the whole General Inquirer dictionary (Stone et al. 1966).

The result was a dictionary of 2252 adjectives, 1142 nouns, 903 verbs and 745 adverbs. The adverb list was automatically generated by matching adverbs ending in “-ly” to their potentially corresponding adjectives. Moreover, a set of multiword expressions (152 phrasal verbs, e.g. “to fall apart”, and 35 intensifier expressions, e.g. “a little bit”) has been taken into account. In case of overlap between a simple word (e.g. “fun”, +2) and a multiword expression (e.g. “to make fun of”, -1) with different polarity, the latter has the higher priority in the annotation process.
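The priority of multiword expressions over simple words can be sketched as a longest-match lookup. The snippet below is only an illustration of that principle and not the SO-CAL implementation: the two entries and their scores come from the example above, while the function name `annotate`, the absence of lemmatization and the tokenized input are assumptions.

```python
from typing import Dict, List, Sequence, Tuple

# Entries taken from the example in the text; everything else is illustrative.
SIMPLE_WORDS: Dict[str, int] = {"fun": 2}
MULTIWORDS: Dict[Tuple[str, ...], int] = {("make", "fun", "of"): -1}


def annotate(
    tokens: Sequence[str],
    simple: Dict[str, int],
    multiword: Dict[Tuple[str, ...], int],
) -> List[Tuple[str, int]]:
    """Tag tokens with polarity scores, trying the longest expressions first."""
    max_len = max((len(key) for key in multiword), default=1)
    result: List[Tuple[str, int]] = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try multiword expressions from the longest to the shortest.
        for n in range(max_len, 1, -1):
            chunk = tuple(tokens[i:i + n])
            if len(chunk) == n and chunk in multiword:
                result.append((" ".join(chunk), multiword[chunk]))
                i += n  # skip the tokens covered by the expression
                matched = True
                break
        if not matched:
            token = tokens[i]
            if token in simple:
                result.append((token, simple[token]))
            i += 1
    return result
```

In this sketch the word “fun” inside “make fun of” is never scored on its own, because the multiword match consumes it before the simple-word lookup is reached.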

AFINN: It is a list of English words that associates 1446 words with a valence between +5 (positive) and -5 (negative) (Hansen et al. 2011). AFINN-111, its newest version, includes 2477 words and phrases. The drawing up of the list started from a set of obscene words and was then gradually extended by examining Twitter posts. This has led to a bias of 65% towards negative words compared to positive words (Nielsen 2011).

Q-WordNet: It is a lexical resource consisting of WordNet senses automatically annotated with positive and negative polarity (Agerri and García-Serrano 2010). It is grounded on the hypothesis that if a positive synset is matched in a gloss, then its synset can also be annotated as positive. Exceptions are the cases under the scope of a negation, which are classified as the opposite of the matched lemma.
What differentiates Q-WordNet from SentiWordNet is the fact that positive/negative labels are attributed to word senses at a certain level by means of a graded classification.

Multi-Perspective Question Answering Lexicon: The MPQA subjectivity lexicon (Wiebe et al. 2004; Wilson et al. 2005) comprises 8000+ subjectivity clues (words and phrases that may be used to express private states) annotated by type (strongsubj or weaksubj) and prior polarity. Subjectivity clues and contextual features have been learned from two different kinds of corpora: a small amount of data manually annotated at the expression level from the Wall Street Journal and newsgroup data, and a large amount of data with existing document-level annotations from the Wall Street Journal. The focus is on three types of subjectivity clues:

• hapax legomena, words that appeared just once in the corpus;

• collocations, identified through fixed n-grams;

• adjective and verb features, identified using the results of a method for clustering words according to distributional similarity.

Hu-Liu lexicon: The Hu and Liu (2004) wordlist comprises around 6800 English items (2006 positive and 4783 negative), including slang, misspellings, morphological variants and social media mark-up.

Hatzivassiloglou Lexicon: Hatzivassiloglou and McKeown (1997), starting from a list of 1336 manually labeled adjectives (657 positive and 679 negative terms), demonstrated that conjunctions between adjectives can provide indirect information about their orientation. In detail, the authors extracted the conjunctions through a two-level finite-state grammar, able to cover modification patterns and noun-adjective apposition, and tested their method on 21 million words from the Wall Street Journal corpus. This way they collected 13426 conjunctions of adjectives, further expanded to a total of 15431 conjoined adjective pairs.

Dictionary of Affect in Language: The Whissell (1989, 2009) Dictionary of Affect in Language (DAL) lists about 4500 English words. Its more recent version (Whissell 2009) is provided with 8742 words manually annotated for their Activation, Evaluation and Imagery.

General Inquirer: It is an IBM 7090 program system developed at Harvard in the spring of 1961 for content analysis research problems in the behavioral sciences (Stone et al. 1966; Stone and Hunt 1963). It is based on the idea that, through the analysis of many variables at once, it is often possible to discover trends that can appear “latent” to casual observation.
The description of the “systematic” content analysis processes must be as “objective” as possible; the features of texts must be “manifest” and the results “quantitative”.
The lexicon attaches syntactic, semantic and pragmatic information to part-of-speech tagged words. Among others, the categories related to polarity measures are the ones pertaining to the three semantic dimensions of Osgood (1952):

Positive: 1915 words of positive outlook

Negative: 2291 words of negative outlook

Strong: 1902 words implying strength

Weak: 755 words implying weakness

Active: 2045 words implying an active orientation

Passive: 911 words indicating a passive orientation7

SentiWordNet: Esuli and Sebastiani (2006a) developed the SentiWordNet 1.0 lexicon, based on WordNet 2.0 (Miller 1995), by automatically associating each WordNet synset with three scores: Obj(s) for objective terms, and Pos(s) and Neg(s) for positive and negative terms.

7 Table 3.1 refers to the General Inquirer size only in terms of sentiment words.


Every score ranges from 0.0 to 1.0 and their sum is 1.0 for each synset. The values are determined on the basis of the proportion of eight ternary classifiers (with similar accuracy levels but different classification behavior) that quantitatively analyze the glosses associated with every synset and assign them the proper label.
What differentiates SentiWordNet from other sentiment lexicons is the fact that the weighting process focuses on synonym sets rather than on simple terms. The researchers intuitively observe that different senses of the same term could have different opinion-related properties.
SentiWordNet 3.0 (Baccianella et al. 2010) improves SentiWordNet 1.0. The main differences between them are the version of WordNet they annotate (3.0 for SentiWordNet 3.0) and the algorithm used to annotate WordNet, which in the 3.0 version, along with the semi-supervised learning step, also includes a random-walk step that refines the scores.
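The way the three scores of SentiWordNet 1.0 can be read as classifier proportions is sketched below. The snippet does not model the classifiers themselves, nor the random-walk step of version 3.0: it only assumes that each of the eight ternary classifiers returns one label per synset, and the vote list used in the usage note is invented.

```python
from collections import Counter
from typing import Dict, Sequence

LABELS = ("Pos", "Neg", "Obj")


def synset_scores(votes: Sequence[str]) -> Dict[str, float]:
    """Turn one label per classifier into Pos/Neg/Obj proportions."""
    if not votes:
        raise ValueError("at least one classifier vote is required")
    counts = Counter(votes)
    total = len(votes)
    # Each score is the share of classifiers that chose the label,
    # so the three scores always sum to 1.0.
    return {label: counts[label] / total for label in LABELS}
```

For instance, if five hypothetical classifiers label a synset as Pos, two as Obj and one as Neg, the resulting scores are Pos(s) = 0.625, Obj(s) = 0.25 and Neg(s) = 0.125.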

Velikovich Web-generated lexicon: This sentiment dictionary, built through graph propagation algorithms and lexical graphs, counts 178104 entries, which include not only simple words but also spelling variations, slang, vulgarity and multiword expressions (Velikovich et al. 2010).

Maryland dictionary: The procedure with which Mohammad et al. (2009) built the Maryland lexicon can be split into two steps: the identification of a seed set of positive and negative words, and the positive and negative annotation of the words synonymous with them in a Roget-like thesaurus.
Eleven antonym-generating affix patterns have been used to generate overtly marked words and their unmarked counterparts. This way, such patterns generated 2692 pairs of valid English words listed in the Macquarie Thesaurus.
The main innovation introduced by Mohammad et al. (2009) is the autonomy from the manual annotation of any term in the dictionary, even if such annotation could be used, when available, to improve both the correctness and the coverage of its entries.
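The seed-generation step based on affix patterns can be illustrated as follows. This is a simplified sketch and not the procedure of Mohammad et al. (2009): only two prefix patterns are shown instead of the eleven affix patterns, the prefixes themselves and the small vocabulary (standing in for the thesaurus word list) are assumptions, and the function name `affix_pairs` is hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple


def affix_pairs(
    vocabulary: Iterable[str],
    prefixes: Sequence[str],
) -> List[Tuple[str, str]]:
    """Pair each unmarked word with its overtly marked counterpart.

    A pair (w, prefix + w) is kept only if both members are valid
    entries of the vocabulary.
    """
    vocab = set(vocabulary)
    pairs: List[Tuple[str, str]] = []
    for word in sorted(vocab):
        for prefix in prefixes:
            marked = prefix + word
            if marked in vocab:
                pairs.append((word, marked))
    return pairs


# Invented vocabulary; "under" and "table" show that words merely
# starting with a prefix, or lacking a counterpart, produce no pair.
VOCABULARY = {"honest", "dishonest", "happy", "unhappy", "table", "under"}
PREFIXES = ("dis", "un")
```

Requiring both members to be attested words filters out spurious analyses: “under” is not paired with anything because “der” is not in the vocabulary. The two members of each pair can then be used as seeds of opposite orientation.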

3.2.2 Affective Lexicons

SentiSense: SentiSense (de Albornoz et al. 2012) is a semi-automatically developed affective lexicon which consists of 5496 words and 2190 synsets.
Similarly to SentiWordNet or WordNet-Affect, it associates emotional meanings or categories with concepts (instead of terms) from the WordNet lexical database, and can be used for polarity and intensity classification as well as for emotion identification.
Synsets are labeled with a set of 14 emotional categories, which are also related by antonymy relationships (examples are disgust/like, love/hate, sadness/joy): ambiguous, anger, anticipation, calmness, despair, disgust, fear, hate, hope, joy, like, love, sadness, surprise.

SentiFul and the Affect Database: Neviarouskaya et al. (2009b, 2011) developed the SentiFul lexicon by automatically expanding the core of the sentiment lexicon through several strategies:

• synonymy relation;

• direct antonymy relation;

• hyponymy relations;

• derivation and scoring of morphologically modified words;

• compounding of sentiment-carrying base components.

In summary, 4190 new words connected to the known ones through semantic relations have been automatically extracted from WordNet, and 4029 lemmas have been automatically derived with the morphological method. The starting point of the work was the Affect Database of Neviarouskaya et al. (2007), which contained 364 emoticons, 337 popular acronyms and abbreviations (e.g. bl, “belly laughing”), words standing for communicative functions (e.g. “wow”, “ouch”, etc.), 112 modifiers (e.g. “very”, “slightly”, etc.) and 2438 direct and indirect emotion-related entries (1627 of which taken from WordNet-Affect).
In the Affect Database, emotion categories (anger, disgust, fear, guilt, interest, joy, sadness, shame and surprise) and intensity (from 0.0 to 1.0) were manually assigned to each entry by three independent annotators.

Appraisal Lexicon: The appraisal lexicon (Argamon et al. 2009) has been built through semi-supervised learning on a set of WordNet glosses containing adjectives and adverbs, among which only a small subset of the training data was manually labelled.
The construction of the lexicon is grounded on Appraisal Theory, a linguistic approach to the analysis of subjective language. In this framework, several sentiment-related features are assigned to relevant lexical items. The feature set, which allows more subtle distinctions than the widely used binary Positive/Negative classification, includes the labels orientation (Positive or Negative), attitude type (Affect, Appreciation, Judgment) and force (Low, Median, High or Max).

WordNet-Affect: WordNet-Affect is another extension of WordNet for the lexical representation of affective knowledge (Strapparava et al. 2004). Thanks to the synset model, it relates a set of affective concepts to a number of affective words. In the first step of the WordNet-Affect creation, the affective core was built starting from almost 2000 terms directly or indirectly referring to mental states. Lexical and affective information was then added to each item by creating specific frames for them. In the end, the senses of the terms were mapped to their respective synsets in WordNet. The synsets have been specialized with a hierarchically organized set of emotional categories (namely emotion, mood, trait, cognitive state, physical state, edonic signal, emotion-eliciting situation, emotional response, behaviour, attitude, sensation). Furthermore, the emotional valences have been distinguished by four additional a-labels: positive, negative, ambiguous and neutral (Strapparava et al. 2006).
WordNet-Affect contains 2874 synsets and 4787 words (Strapparava et al. 2004).

3.2.3 Subjectivity Lexicons for Other Languages

3.2.3.1 Italian Resources

Most of the state-of-the-art works on polarity lexicons for Sentiment Analysis purposes focus on the English language. Thus, Italian lexical databases are mostly created by translating and adapting the English ones, SentiWordNet and WordNet-Affect among others.
Basile and Nissim (2013) merged the semantic information belonging to existing lexical resources in order to obtain an annotated lexicon of senses for Italian, Sentix (Sentiment Italian Lexicon). Basically, MultiWordNet (Pianta et al. 2002), the Italian counterpart of WordNet (Miller 1995; Fellbaum 1998), has been used to transfer the polarity information associated with English synsets in SentiWordNet to Italian synsets, thanks to the multilingual ontology BabelNet (Navigli and Ponzetto 2012).
Every Sentix entry is described by information concerning its part of speech, its WordNet synset ID, a positive and a negative score from SentiWordNet, a polarity score (from -1 to 1) and an intensity score (from 0 to 1).
The dictionary contains 59742 entries for 16043 synsets. Details about its composition are reported in Table 3.3.

PoS          Lemmas    Synsets
noun         52257     12228
adjective    3359      1805
verb         2775      1260
adverb       1351      750
all          59742     16043

Table 3.3: Sentix (Basile and Nissim 2013)

Moreover, Steinberger et al. (2012) verified a triangulation hypothesis for the creation of sentiment dictionaries in many languages, namely English, Spanish, Arabic, Czech, French, German, Russian and also Italian.
The starting point is the semi-automatic generation of two pilot sentiment dictionaries for English and Spanish, which have been automatically transposed into the other languages by using the overlap of the translations (triangulation) and then manually filtered and expanded. As regards the Italian lexicon, the authors collected 918 correctly triangulated terms and 1500 correctly translated terms from the starting dictionaries.
Hernandez-Farias et al. (2014) achieved good results in the SentiPolC 2014 task by semi-automatically translating different lexicons into Italian, namely SentiWordNet, the Hu-Liu Lexicon, the AFINN Lexicon, the Whissell Dictionary, etc.
Baldoni et al. (2012) proposed an ontology-driven approach to Sentiment Analysis, where tags and tagged resources are related to emotions. They selected representative Italian emotional words and used them to query MultiWordNet. The synsets connected to these lemmas were then processed with WordNet-Affect, in order to populate the emotion ontology only with the words belonging to synsets that represented affective information. The resulting ontology contains about 450 Italian words referring to the 87 emotional categories of OntoEmotion, an ontology conceived for the categorization of emotion-denoting words. Furthermore, thanks to the SentiWordNet database, every synset has been associated with neutral, positive or negative scores.

3.2.3.2 Sentiment Lexical Resources for Different Languages

SentiLex-PT (Silva et al. 2010, 2012) is a sentiment lexicon for Portuguese made up of 7014 lemmas, distributed in this way:

• 4779 adjectives

– e.g. castigado.PoS=Adj;TG=HUM:N0;POL:N0=-1;ANOT=JALC

• 1081 nouns

– e.g. aberração.PoS=N;TG=HUM:N0;POL:N0=-1;ANOT=MAN

• 489 verbs

– e.g. enganar.PoS=V;TG=HUM:N0:N1;POL:N0=-1;POL:N1=0;ANOT=MAN

• 666 verb-based idiomatic expressions

– e.g. engolir em seco.PoS=IDIOM;TG=HUM:N0;POL:N0=-1;ANOT=MAN

The sentiment entries are all modifiers of human nouns, compiled from different publicly available corpora and dictionaries.
The tagset used to describe the attributes of the lemmas is explained below:

• part of speech (ADJ, N, V and IDIOM);

• polarity (POL), which ranges between 1 (positive) and -1 (negative);

• target of polarity (TG), which corresponds to a human subject (HUM);

• polarity assignment (ANOT), which specifies whether the annotation has been performed manually (MAN) or automatically by the Judgment Analysis Lexicon Classifier (JALC).
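The tagset described above lends itself to a simple machine-readable reading. The following sketch parses one entry into a dictionary; it assumes that entries are serialized as `lemma.PoS=...;TG=...;POL:N0=...;ANOT=...`, with a dot after the lemma and semicolons between attributes, and the function name `parse_sentilex_entry` and the output keys are hypothetical.

```python
from typing import Any, Dict


def parse_sentilex_entry(entry: str) -> Dict[str, Any]:
    """Split a SentiLex-PT-style entry into lemma and attributes.

    Assumed layout: lemma.PoS=X;TG=HUM:N0;POL:N0=-1;ANOT=MAN
    """
    lemma, _, attributes = entry.partition(".")
    record: Dict[str, Any] = {"lemma": lemma, "polarity": {}}
    for field in attributes.split(";"):
        key, _, value = field.partition("=")
        if key.startswith("POL:"):
            # One polarity per argument position (N0, N1, ...).
            record["polarity"][key[len("POL:"):]] = int(value)
        elif key == "PoS":
            record["pos"] = value
        elif key == "TG":
            record["target"] = value
        elif key == "ANOT":
            record["annotation"] = value
    return record
```

Keeping the polarity as a mapping from argument position to score preserves the distinction, visible in the verb example, between the polarity assigned to N0 and the one assigned to N1.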


Differently from Silva et al. (2010, 2012), who focused on the domain-specific features of the opinion lexicon of social judgment, Souza et al. (2011) proposed a domain-independent lexicon for the Portuguese language, composed of 7077 polar words (adjectives, verbs and nouns) and expressions.
As regards the dictionary population, the authors applied three different methods:

• a corpus-based approach, applied to 1317 documents annotated using Pointwise Mutual Information;

• a thesaurus-based approach, applied to 44077 words and their semantic relations from the TEP thesaurus (Maziero et al. 2008);

• a translation-based approach, starting from Liu’s English Opinion Lexicon (Hu and Liu 2004).

Other contributions that deserve to be cited are Perez-Rosas et al. (2012) for Spanish; Mathieu (1995, 1999a, 2006) for French; Clematide and Klenner (2010), Waltinger (2010) and Remus et al. (2010) for German; Abdul-Mageed et al. (2011) and Abdul-Mageed and Diab (2012) for Arabic; and Chetviorkin and Loukachevitch (2012) for Russian.
As regards Multilingual Sentiment Analysis, we can mention Balahur and Turchi (2012), Steinberger et al. (2012), Pak and Paroubek (2011) and Denecke (2008), among others.

3.3 SentIta and its Manually-built Resources

In this Section we will describe in depth SentIta, the LG-based sentiment lexicon for the Italian language, which has been semi-automatically built on the basis of the richness of both the Italian module of Nooj and the Italian Lexicon-Grammar resources.
The tagset used for the Prior Polarity (Osgood 1952) annotation of the resources is composed of four tags:


Lexical Item    Translation    Tag           Description          Score
meraviglioso    wonderful      +POS+FORTE    Intensely Positive   +3
colto           cultured       +POS          Positive             +2
rasserenante    calming        +POS+DEB      Weakly Positive      +1
nuovo           new            -             Neutral              0
insapore        flavourless    +NEG+DEB      Weakly Negative      -1
cafone          boorish        +NEG          Negative             -2
disastroso      disastrous     +NEG+FORTE    Intensely Negative   -3

Table 3.4: SentIta Evaluation tagset

Lexical Item    Translation    Tag       Description    Score
grande          big            +FORTE    Intense        +
velato          veiled         +DEB      Weak           -

Table 3.5: SentIta Strength tagset

POS: positive

NEG: negative

FORTE: intense

DEB: weak

Such labels, if combined together, can generate an evaluation scale that goes from -3 to +3 (see Table 3.4) and a strength scale that ranges from -1 to +1 (see Table 3.5).
Neutral words (e.g. nuovo “new”, with score 0 in the evaluation scale) have been excluded from the lexicon.
The main difference between the words listed in the two scales is the possibility of using them as indicators for the subjectivity detection task. Basically, the words belonging to the evaluation scale are “anchors” that trigger the identification of polarized phrases or sentences, while the ones belonging to the strength scale are just used as intensity modifiers (see Section 4.2).
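The combination of the four tags into the two scales of Tables 3.4 and 3.5 can be summarized with a short Python sketch. It only restates the mapping given in the tables; it is not part of the Nooj resources, and the function names `parse_tags`, `evaluation_score` and `strength_score` are assumptions.

```python
from typing import List, Optional


def parse_tags(annotation: str) -> List[str]:
    """Split an annotation such as '+POS+FORTE' into ['POS', 'FORTE']."""
    return [tag for tag in annotation.split("+") if tag]


def evaluation_score(tags: List[str]) -> Optional[int]:
    """Score on the evaluation scale (-3..+3), or None for pure modifiers."""
    if "POS" in tags:
        sign = 1
    elif "NEG" in tags:
        sign = -1
    else:
        return None  # no polarity tag: the word is not an "anchor"
    if "FORTE" in tags:
        magnitude = 3
    elif "DEB" in tags:
        magnitude = 1
    else:
        magnitude = 2
    return sign * magnitude


def strength_score(tags: List[str]) -> Optional[int]:
    """Score on the strength scale (+1 intense, -1 weak), or None."""
    if "POS" in tags or "NEG" in tags:
        return None  # polarized words belong to the evaluation scale
    if "FORTE" in tags:
        return 1
    if "DEB" in tags:
        return -1
    return None
```

Under this reading, meraviglioso (+POS+FORTE) scores +3 on the evaluation scale, while grande (+FORTE) has no evaluation score and acts only as an intensifier with strength +1.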

Details about the lexical assets available for the Italian language are summarized in Table 3.6. Here we report all the items contained in


Category        Entries    Example           Translation
Adjectives      5381       allegro           cheerful
Adverbs         3693       tristemente       sadly
Compound Adv    774        a gonfie vele     full steam ahead
Idioms          577        essere in difetto to be in fault
Nouns           3215       eccellenza        excellence
Psych Verbs     604        amare             to love
LG Verbs        651        prendersela       to feel offended
Bad words       182        leccaculo         arse licker
Tot             15077      -                 -

Table 3.6: Composition of SentIta

SentIta, including the ones automatically derived (namely adverbs and nouns), that will be described in detail in Section 3.5.
Because hand-built lexicons are more accurate than automatically built ones (Taboada et al. 2011; Bloom 2011), we started the creation of the SentIta database with the manual tagging of part of the lemmas contained in the Nooj Italian dictionaries. In detail, adjectives and bad words have been manually extracted and evaluated starting from the Nooj Italian electronic dictionary of simple words, preserving their inflectional (FLX) and derivational (DRV) properties. Moreover, compound adverbs (Elia 1990), idioms (Vietri 1990, 2011), psych verbs and other LG verbs (Elia et al. 1981; Elia 1984) have been weighted starting from the Italian LG tables, in order to maintain the syntactic, semantic and transformational properties connected to each of them.

3.3.1 Semantic Orientations and Prior Polarities

In the Sentiment Analysis field, the Semantic Orientation (SO) is a measure of subjectivity and opinion that weights the polarity (positive/negative) and the strength (intense/weak) of an opinion. Other


concepts used to refer to SO are sentiment orientation, polarity of opinion or opinion orientation (Taboada et al. 2011; Liu 2010).
The Prior Polarity, instead, refers to the individual words’ Semantic Orientation and differs from the SO because it is always independent from the context (Osgood 1952). The second concept, generally used in sentiment dictionary population, has also been exploited to manually evaluate the words contained in the simple word dictionary of the Italian module of Nooj (Sdic_it.dic).
Before the description of the annotation process begins, it must be clarified that, at this stage of the work, the multiple meanings that a single word can possess have not been taken into account. In this sense, SentIta is different from the resources in which Prior Polarities are assigned to “concepts” rather than to mere words (e.g. SentiWordNet). Such a procedure gives rise to the need to reconstruct the Prior Polarities starting from the various word meanings, actually transforming them into Posterior Polarities (Gatti and Guerini 2012). An example is the word “cold”, which possesses different meanings and, consequently, different polarities if associated with the words “beer” (with a low temperature, neutral) or “person” (emotionless, weakly negative) (Gatti and Guerini 2012).
From our point of view, this distinction is redundant in a work that includes a syntactic module. We strongly support the idea that only the Prior Polarities must be contained in the lexicon and that the context should be used to compute them in a parallel module, that is the syntactic one.
Indeed, our purpose is to reduce, at the same time, the presence of ambiguity, the computational time and the size of the electronic dictionaries. Therefore, the annotators of SentIta simply labeled the words of Sdic_it according to the meaning they thought dominant for each word. If, out of any context, a word like “cold” cannot be unequivocally associated with a positive or negative orientation, we preferred to attribute it a neutral polarity.


3.3.2 Adjectives of Sentiment

The annotation of Sdic_it started with 34,000+ adjectives, by considering each word apart from any context. The first decision is binary and regards the choice between the positive and the negative polarity, so the first tags, +POS or +NEG (which respectively correspond to the scores +2 and -2), are attached to the adjective under examination, as shown in the following examples:

colto,A+FLX=N88+DRV=ISSIMO:N88+POS

cafone,A+FLX=N88+DRV=ISSIMO:N88+NEG

A fine-grained classification approach must also include, during the Prior Polarity attribution, the determination of the intensity of the oriented words. The strength of the SO is therefore defined with the combined use of the POS/NEG tags and the FORTE/DEB labels. The last two respectively increase or decrease by one point the intensity of the sentiment items. This way, as shown in Table 3.4, we could take into account three different degrees of positivity and as many levels of negativity.
As we will explain better in Section 4.2, we selected among the adjectives of Sdic_it a list of about 650 words that did not receive any polarity tag, but that have just been labeled with information about their intensity. This is the case of the items reported below, which are just examples from the SentIta Strength list:

grande,A+FLX=N79+DRV=GRANDE:N88+DRV=GRANDE-C:N79+FORTE

velato,A+FLX=N88+DRV=ISSIMO:N88+DEB
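As an illustration of how entries of this kind can be read programmatically, the following minimal Python sketch splits an entry into its parts. It assumes the usual Nooj dictionary layout, in which a comma separates the lemma from its category and "+" separates the features; the function `parse_entry` and the returned field names are hypothetical and do not belong to Nooj.

```python
# Hypothetical reader for a Nooj-style dictionary line (layout assumed:
# "lemma,CATEGORY+FEATURE+KEY=VALUE+...").
def parse_entry(line):
    lemma, _, info = line.strip().partition(",")
    category, *features = info.split("+")
    flags, attrs = [], {}
    for feature in features:
        if "=" in feature:
            # Attributes such as FLX=N88 or DRV=ISSIMO:N88 may be repeated.
            key, value = feature.split("=", 1)
            attrs.setdefault(key, []).append(value)
        else:
            # Bare features such as POS, NEG, FORTE, DEB.
            flags.append(feature)
    return {"lemma": lemma, "category": category, "flags": flags, "attrs": attrs}


entry = parse_entry("velato,A+FLX=N88+DRV=ISSIMO:N88+DEB")
print(entry["lemma"], entry["category"], entry["flags"])
```

Keeping the bare features apart from the key-value attributes makes it easy to pass the polarity and intensity tags to a scoring step without touching the morphological information.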

Such words are not used to locate subjectivity in texts, but they become really advantageous when the task is to compute the actual orientation of a phrase. The words endowed with the tag +FORTE are called intensifiers, and the ones marked with +DEB downtoners (Taboada et al. 2011). Their function is, respectively, to increase or


decrease the score of the polarized words.
Details about the distribution of sentiment tags in the adjective dictionary are reported in Table 3.7.
Table 3.8 presents a summary, in terms of percentage values, of the composition of the adjective dictionary in SentIta. In the upper part of the Table, percentages are computed on the basis of SentIta, which accounts for 5381 adjectives, indicated in the Table by tot oriented adj. The differences between the positive and negative adjectives and the intensity modifiers are shown there.
The coverage of the SentIta adjectives with respect to the whole number of adjectives contained in Sdic_it seems, at first glance, to be very low, but if compared with the other parts of speech it becomes more significant.
For a LG treatment of adjectives see Section 3.3.5.

Score    Adjectives    Entries
+3       +POS+FORTE    111
+2       +POS          1137
+1       +POS+DEB      110
-1       +NEG+DEB      770
-2       +NEG          2375
-3       +NEG+FORTE    240
+        +FORTE        512
-        +DEB          126

Table 3.7: SentIta tag distribution in the adjective list

Adjectives          Entries    Percentage (%)
tot pos             1358       25 (SentIta)
tot neg             3385       63 (SentIta)
tot int             638        12 (SentIta)
tot oriented adj    5381       16 (Sdic_it)
tot neutral adj     28664      84 (Sdic_it)
tot adj Sdic_it     34045      100

Table 3.8: Adjectives percentage values in SentIta
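The percentage values of Table 3.8 can be recomputed from the tag counts of Table 3.7, as in the following Python sketch. The counts and the size of Sdic_it are taken from the two tables; the variable and function names are illustrative only.

```python
# Tag counts from Table 3.7 and dictionary size from Table 3.8.
TAG_COUNTS = {
    "+POS+FORTE": 111, "+POS": 1137, "+POS+DEB": 110,
    "+NEG+DEB": 770, "+NEG": 2375, "+NEG+FORTE": 240,
    "+FORTE": 512, "+DEB": 126,
}
SDIC_IT_ADJECTIVES = 34045


def summarize(tag_counts, dictionary_size):
    """Group the tag counts and express them as rounded percentages."""
    pos = sum(n for tag, n in tag_counts.items() if tag.startswith("+POS"))
    neg = sum(n for tag, n in tag_counts.items() if tag.startswith("+NEG"))
    oriented = sum(tag_counts.values())
    intensity = oriented - pos - neg  # items tagged only +FORTE or +DEB
    return {
        "tot pos": (pos, round(100 * pos / oriented)),
        "tot neg": (neg, round(100 * neg / oriented)),
        "tot int": (intensity, round(100 * intensity / oriented)),
        "tot oriented adj": (oriented, round(100 * oriented / dictionary_size)),
    }


for label, (entries, percentage) in summarize(TAG_COUNTS, SDIC_IT_ADJECTIVES).items():
    print(label, entries, percentage)
```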


3.3.2.1 A Special Remark on Negative Words in Dictionaries and Positive Biases in Corpora

The most important aspect to notice in Table 3.8 is the fact that the positive items cover just 25% of the total amount of oriented adjectives, while negative adjectives, with a percentage of 63%, predominate in the dictionary.
Just as in SentIta, other sentiment lexicons also present an excess of negative items, such as the SO-CAL lexicon (Taboada et al. 2011), AFINN-111 (Nielsen 2011) and the Maryland dictionary (Mohammad et al. 2009).
In texts, instead, exactly the opposite trend can be observed. The Pollyanna hypothesis asserts the universal human tendency to use positive words more frequently than negative ones, through the popular claim “people tend to look on (and talk about) the bright side of life” (Boucher and Osgood 1969, p. 1).
The experimental results of Montefinese et al. (2014) and Warriner et al. (2013) confirmed the so-called linguistic positivity bias, that is, the prevalence of positive words in different languages.
Garcia et al. (2012) also measured this bias by quantifying, in the context of online written communication, both the emotional content of words in terms of valence and the frequency of word usage in the whole indexable web. They provided strong evidence that words with a positive emotional connotation are used more often.
As regards the experiments in the Sentiment Analysis field that tried to solve the positive bias, we mention Taboada et al. (2011) who, upon noticing the presence of almost twice as many positive as negative words in their corpus, addressed the problem by increasing the SO of any negative expression by a fixed amount (50%). Voll and Taboada (2007) overcame the same problem by shifting the numerical cut-off point between positive and negative reviews.
In this research we chose not to apply such corrections, in order to avoid the introduction of unpredictable errors due to the difficulty of accurately evaluating to what extent positive and negative items are differently distributed in written texts.

3.3.3 Psychological and other Opinion Bearing Verbs

In the present Section we will go through the more complex annotation of verbs of sentiment from the available Lexicon-Grammar matrices. We will return to the adjectives of sentiment in Section 3.3.5, where the issue of the adjectivalization of the psychological predicates will be addressed.

3.3.3.1 A Brief Literature Survey on the Classification of Psych Verbs

The difference between the direct (Experiencer subject) and reverse (Experiencer object) representations of thematic roles is underlined in the extensive body of scientific literature on psych verbs, e.g. Chomsky (1965), Belletti and Rizzi (1988), Kim and Larson (1989), Jackendoff (1990), Whitley (1995), Pesetsky (1996), Filip (1996), Baker (1997), Martín (1998), Arad (1998), Klein and Kutscher (2002), Bennis (2004), Landau (2010).
The thematic roles involved are the following:

Experiencer: the entity or the person that experiences the reaction;

Cause: “the thing, person, state of affairs that causes the reaction” (Whitley 1995) “or cognitive judgment in the experiencer” (Klein and Kutscher 2002).

These two main classes⁸ correspond to the three classes proposed by Belletti and Rizzi (1988), in which the reverse representation is split according to the presence of a direct or indirect object mapped to the role of experiencer:

⁸ These classes of psych verbs are also known as the Fear class (direct) and the Frighten class (reverse) (Jackendoff 1990; Pesetsky 1996; Baker 1997), on the basis of the syntactic positions occupied by the thematic roles of the experiencer and the cause (theme).


Class I: verbs which have the experiencer as subject and the cause as direct object (e.g. temere “to be afraid of”);

Class II: verbs which have the cause as subject and the experiencer as direct object (e.g. preoccupare “to worry”);

Class III: verbs which have the cause as subject and the experiencer as indirect object in dative case (e.g. piacere “to like”⁹).

Transitivity is also taken into account by Whitley (1995), who instead identified four classes of psych verbs for Spanish:

Class I: transitive-direct (e.g. amar “to love”);

Class II: intransitive-direct (e.g. gozar de “to enjoy”);

Class III: intransitive-reverse (e.g. gustar “to like”);

Class IV: transitive-reverse (e.g. tranquilizar “to appease”).

3.3.3.2 Semantic Predicates from the LG Framework

Shifting the focus to the works carried out within the Lexicon-Grammar framework, we find again the distinction between two types of possible constructions on the basis of the syntactic position (subject or complement) of the person who feels the sentiment (Mathieu 1999a; Ruwet 1994) and, of course, the distinction between transitive and intransitive verbs.
The (reverse) French verbs of sentiment that enter into the transitive structure N0 V N1 have been examined by Mathieu (1995)¹⁰. In this structure N0, the stimulus of the sentiment, can be a human, concrete or abstract noun, or a completive or infinitive clause, and N1, the experiencer of the sentiment, is a human noun or also other animated

⁹ Note that while the Italian verb piacere is classified in this class of Belletti and Rizzi (1988), its English translation “to like” belongs to the first one.

¹⁰ The analysis of Mathieu (1995) is based on the syntactic model of the LG French Table 4.


nouns such as animals or gods (Ruwet 1994; see example 3).

(3) (Luc + Ce tableau + Le mensonge + Que Luc soit venu + Trembler) irrite Marie (Mathieu 1995)
“(Luc + This table + The lie + That Luc has come + To tremble) irritates Marie”

Nevertheless, Ruwet (1994) argued that a number of predicates from this class¹¹ select a Num in position N1 that does not play the role of Experiencer, as happens in example (5), differently from what happens in (4). According to Ruwet (1994), le ministre in (5) does not feel any sentiment; he is instead the object of a negative (public) judgment caused by the stimulus expressed by N0.

(4) La lecture d’Homère passionne Maxime (Ruwet 1994)
“Reading Homer moves Maxime”

(5) Ses déclarations intempestives discréditent le ministre (Ruwet 1994)
“His untimely declarations discredit the Minister”

One of the main purposes of this thesis is to provide a Lexicon-Grammar based database for Sentiment Analysis tools. We decided to start from the LG works on psychological predicates because of their richness, but we do not look at the Ruwet (1994) objection as a limit for our data collection. It only offers a wider range of information, which we associate with our data. Sentiment Analysis does not focus only on mental states, but searches for every kind of positive or negative statement expressed in raw texts; therefore, a positive or negative (public) judgment also represents a valuable resource in this field.
The hypothesis is that the Predicates under examination could be distinguished into three subsets: the ones indicating a mental state

¹¹ Among the verbs pointed out by Ruwet (1994) there are abuser, aigrir, anoblir, aplatir, avilir, berner, civiliser, compromettre, consacrer, corrompre, couler, dédouaner, démasquer, dénaturer, etc.


that select the roles experiencer and causer (amare “to love”, odiare “to hate”); the ones connected to the manifestation of judgments that (following the Sentiment Analysis trend in terminology) involve the roles of opinion holder and target (e.g. difendere “to defend”, condannare “to condemn”); and the ones related to physical acts that select the roles patient and agent (e.g. baciare “to kiss”, schiaffeggiare “to slap”)¹². The mapping between the syntactic representations of actants and the semantic representations of arguments is feasible and valuable but, from a LG point of view, becomes a little more complex if compared with the generalizations described in the previous Paragraph. The main problem raised by the LG method is that the Syntax-Semantics mapping cannot ignore the idiosyncrasies of the lexicon.
In the following paragraphs we will show the distribution of Psychological Predicates among the classes 41, 42, 43 and 43B (Section 3.3.3.3) and we will demonstrate that a large number of predicates evoking judgments and physical positive/negative acts, but also psychological states, are distributed among 28 other LG classes, each of which possesses its own definitional syntactic structure. The large number of structures in which these verbs can enter underlines the importance of a sentiment database that includes both lexicon-based semantic and syntactic classification parameters.
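The three hypothesized subsets and their role pairs can be pictured as a small lookup structure. The Python sketch below is only illustrative: the labels Sent and Op follow the dictionary tags used in this Chapter, while the label Phys, the sample lexicon and the function `roles_for` are hypothetical names introduced for the example.

```python
# Role pairs selected by each subset of predicates (labels partly hypothetical).
FRAME_ROLES = {
    "Sent": ("experiencer", "causer"),    # mental states
    "Op": ("opinion holder", "target"),   # manifestation of judgments
    "Phys": ("agent", "patient"),         # physical acts (label assumed here)
}

# Tiny sample lexicon built from the verbs quoted in the text.
SAMPLE_LEXICON = {
    "amare": "Sent", "odiare": "Sent",
    "difendere": "Op", "condannare": "Op",
    "baciare": "Phys", "schiaffeggiare": "Phys",
}


def roles_for(verb):
    """Return the pair of semantic roles selected by a verb, or None if unknown."""
    frame = SAMPLE_LEXICON.get(verb)
    return FRAME_ROLES.get(frame)


print(roles_for("amare"))
print(roles_for("condannare"))
print(roles_for("schiaffeggiare"))
```

Which syntactic position (N0, N1) each role occupies is deliberately left out of this structure, since it depends on the LG class of the verb.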

3.3.3.3 Psychological Predicates

Among the verbs chosen for our sentiment lexicon, a prominent position is occupied by the Predicates belonging to the Italian LG classes 41, 42, 43 and 43B, which constitute 47% of the total amount of verbs in SentIta (Table 3.11). From the lexical items contained in such LG classes, a list of 602 entries has been evaluated and hand-tagged with the same labels used to evaluate the adjectives.

¹² An experiment carried out on these frames is reported in Section 5.3.


In the following Paragraphs we will describe some of the peculiarities of the psych verbs from the classes 41, 42, 43 and 43B, questioning in many cases their psychological nature. We will then briefly examine the presence of subjective verbs in 28 other LG classes.

Psych Verb    Translation     LG Class    Score
angosciare    “to anguish”    41          -3
piacere       “to like”       42          +2
amare         “to love”       43          +3
biasimare     “to blame”      43B         -2

Table 3.9: Examples from the Psych Predicates dictionary

LG Class    Tot Class    Psych Verbs    Positive items    Negative items    Intensifiers    Downtoners    Percentage (%)
41          599          525            136               325               45              19            88
42          114          8              4                 4                 0               0             7
43          298          39             19                20                0               0             13
43B         142          32             11                21                0               0             23
Tot         1153         604            170               368               45              19            52

Table 3.10: Psych Predicates chosen for SentIta

LG Class 41 Ch F V N1

The verbs from this LG class have as definitional structure N0 V N1, in which:

• N0 is generally an infinitive or completive clause (direct or introduced by the phrase il fatto (Ch F + di) “the fact (that S + of)”) but, with few exceptions, it can also be a human, a concrete or an abstract noun;


SentIta Verbs    Tot Items    Positive items    Negative items    Intensifiers    Downtoners
Psych Verbs      604          170               368               45              19
Other Verbs      651          189               350               79              33
Tot              1255         359               718               124             52

Table 3.11: All the Semantic Predicates from SentIta

• N1 is a human noun (Num).

A great part of these verbs (436 entries), with homogeneous syntactic behavior, can be grouped together also because of a semantic homogeneity connected to their psychological nature (Elia 1984). Examples of this subset are reported below in their dictionary form:

addolorare,V+NEG+Sent+FLX=V3+DRV=RI+41
ingelosire,V+NEG+DEB+Sent+FLX=V201+41
perturbare,V+NEG+Sent+FLX=V3+41
rallegrare,V+POS+Sent+FLX=V3+DRV=RI+41
rincuorare,V+POS+DEB+Sent+FLX=V3+41
tormentare,V+NEG+FORTE+Sent+FLX=V3+DRV=RI+41¹³

This Psychological subgroup, identified in the electronic dictionary through the tag +Sent, exactly like the French verbs from the LG class 4, enters into a reverse transitive psych-verb structure by selecting an Experiencer as object N1 and a Stimulus as N0, as exemplified in (6).

¹³ “To grieve, to make jealous, to upset, to cheer up, to reassure, to torment”


(6) [Sentiment [Stimulus Le sue parole] (addolorano[-2] + ingelosiscono[-1] + perturbano[-2] + rallegrano[+2] + rincuorano[+1] + tormentano[-3]) [Experiencer Maria]]
“His words (grieve + make jealous + upset + cheer up + reassure + torment) Maria”

The objection of Ruwet (1994) to the psychological semantic homogeneity of this class can also be applied to the Italian LG class 41. The extract of the dictionary reported below explains it better than words:

danneggiare,V+NEG+Op+FLX=V4+41
esautorare,V+NEG+Op+FLX=V3+41
glorificare,V+POS+Op+FORTE+FLX=V8+41
ledere,V+NEG+Op+FLX=V34+41
miticizzare,V+POS+Op+FLX=V3+41
smascherare,V+NEG+Op+DEB+FLX=V3+41¹⁴

122 items from the class 41 have been selected and marked with the tag +Op to indicate, in accordance with Ruwet (1994), that they select an N1 that does not play the role of the Experiencer, but that represents just the Target of a positive or negative judgment. Differently from other verbs also tagged with the +Op label (e.g. condannare “to condemn”), here N0 is not the Opinion Holder, but just the Cause that justifies the opinion conveyed by the verb (see example 7). The Opinion Holder is never expressed by the arguments of the predicates from the class 41, because it can be played in turn by the public opinion or the people (Ruwet 1994).

¹⁴ “To damage, to reduce or deprive of the authority, to glorify, to wrong, to make into heroes, to unmask”


(7) [Opinion [Cause Le sue parole] (danneggiano[-2] + esautorano[-2] + glorificano[+3] + ledono[-2] + miticizzano[+2] + smascherano[-2]) [Target Maria]]
“His words (damage + reduce or deprive of the authority + glorify + wrong + make into heroes + unmask) Maria”

Another subgroup of the verbs from this class, marked in the LG table as accepting the property V = uso concreto (“V = concrete use”), also possesses a concrete meaning. This is the case of abbattere “to demolish” (fig. “to discourage”), accendere “to light” (fig. “to fire up”), colpire “to hit” (fig. “to move”), etc.
Of course, the semantic properties of the figurative verbs are not shared with their concrete homographs.
The automatic disambiguation of these metaphors is sometimes very simple, because it can be based on the divergent properties of the different LG classes to which the concrete and figurative words belong (8, 9).

Concrete usage (Class 20NR)

(8) Carla accende il fiammifero (Elia 1984)
“Carla lights the match”

Figurative usage (Class 41)

(9) Parlare di rivoluzione accende Arturo (Elia 1984)
“Talking about revolution fires up Arturo”

As exemplified, the verb accendere also belongs to the class 20NR, in which it accepts neither a completive as N0 (Parlare di rivoluzione accende il fiammifero “Talking about revolution lights the match”), nor a human noun as N1 (Carla accende il fratello “Carla lights her brother”), nor a body part noun (Carla accende la gamba del fratello “Carla lights her brother’s leg”).
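The selectional restrictions just described lend themselves to a simple compatibility check, sketched below in Python. The argument type labels ("human", "clause", "concrete", "body_part") and the function `compatible_classes` are hypothetical; in particular, the sketch assumes that the types of N0 and N1 have already been recognized by some other component.

```python
# Hypothetical compatibility check for accendere, based on the restrictions
# described in the text for the classes 20NR (concrete) and 41 (figurative).
def compatible_classes(n0_type, n1_type):
    """Return the LG classes of accendere compatible with the given argument types."""
    classes = []
    # Class 20NR: no completive/infinitive clause as N0, no human or body part as N1.
    if n0_type != "clause" and n1_type not in {"human", "body_part"}:
        classes.append("20NR")
    # Class 41: N1 must be a human noun (N0 is a clause or, less often, a noun).
    if n1_type == "human":
        classes.append("41")
    return classes


print(compatible_classes("human", "concrete"))  # (8) Carla accende il fiammifero
print(compatible_classes("clause", "human"))    # (9) Parlare di rivoluzione accende Arturo
```

When exactly one class survives the check, the concrete or figurative reading (and therefore the presence or absence of a sentiment score) follows directly from it.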


In other cases it proves more difficult, but it can be solved as well with the help of semantically annotated dictionaries. This happens in the examples below, in which the fact that this verb is marked in the class 20NR as accepting the property Prep N2strum makes it possible to automatically distinguish its concrete use (10) from the figurative one (11).

Concrete usage

(10) Carla accende il fuoco della (stufa + griglia + cucina)
“Carla lights the fire of the (stove + grill + cooker)”

Figurative usage

(11) Carla accende il fuoco del(la) (rivoluzione + cambiamento + passione)
“Carla ignites the fire of (revolution + change + passion)”

LG Class 42 Ch F V Prep N1

The verbs from the class 42 enter into an intransitive reverse psych-verb structure and are defined by the LG formula N0 V Prep N1, where:

• N0 is generally an infinitive or completive clause (direct or introduced by the phrase il fatto (Ch F + di) “the fact (that S + of)”), but in a few cases it can also be a human (Num) or a non-restricted noun (Nnr);

• N1 is a human noun (Num), but can also be a body part or an abstract noun of the kind discorso “discourse”, mente “mind”, memoria “memory” (Elia 1984).

Here we find a selection of Psychological Predicates so small that it can be reported in full:


aggradare,V+POS+Sent+42+FLX=V403+Prep=a
arridere,V+POS+Sent+42+FLX=V34+Prep=a
dispiacere,V+NEG+Sent+42+FLX=V37+Prep=a
spiacere,V+NEG+Sent+42+FLX=V37+Prep=a
piacere,V+POS+Sent+42+FLX=V37+DRV=RI+Prep=a¹⁵

The importance of this LG class in SentIta is not given by the (very small) number of entries it contains, but depends on the extremely high frequency of the verb piacere “to like”.
Here the prepositions (Prep) have been associated directly in the dictionary, in order to avoid ambiguity with other verb usages.
The semantic annotation does not raise any problem, as can be deduced from example (12).

(12) [Sentiment [Stimulus (Il fatto che si fumi + Che si fumi + Fumare + Il fumare + Il fumo)] (aggrada[+2] + dispiace[-2] + piace[+2]) a [Experiencer Maria]]
“(The fact that people smoke + That people smoke + Smoking + The smoke) (pleases + regrets + is appreciated by) Maria”

The main difference from the class 41, in addition to its intransitivity, is the fact that in Italian the more easily accepted form is the permuted one (Elia et al. 1981; see example 13).

(13) [Sentiment A [Experiencer Maria] piace[+2] [Stimulus fumare]]
“Maria likes smoking”

Again, a subset of verbs from this class can be marked as opinion indicators.

¹⁵ “To please, to favor, to regret, to like”


Both N0 and N1 can be human nouns:
incombere,V+NEG+Op+42+FLX=V21+Prep=loc+N0um+N1um
primeggiare,V+POS+Op+FLX=V4+Prep=su+N0um+N1um
contrastare,V+NEG+Op+FLX=V3+Prep=con+N0um+N1um
contare,V+POS+Op+FLX=V3+Prep=per+N0um+N1um

Just N1 can be a human noun:
addirsi,V+42+POS+Op+FLX=V263+PRX+Prep=a+N1um
sconvenire,V+NEG+Op+FLX=V236+Prep=a+N1um
convenire,V+POS+Op+FLX=V236+Prep=a+N1um
costare,V+NEG+Op+FLX=V3+Prep=a+N1um

Neither N0 nor N1 can be human nouns:
degenerare,V+NEG+FORTE+Op+42+FLX=V3+DRV=BILE:N79+Prep=in
stridere,V+NEG+Op+FLX=V2+Prep=con¹⁶

Not every verb of this subset accepts a human noun as N1 (see examples 14 and 15).

(14) [Opinion [Target Queste idee] stridono[-2] con (i miei obiettivi + la tua immagine + *Maria)]
“These ideas are at odds with (my purposes + your image + Maria)”

(15) [Opinion [Target La situazione] degenera[-3] in un (dramma + conflitto + *Maria)]
“The situation degenerates into a (drama + conflict + Maria)”

In such cases the Opinion Holder remains implicit, but the judgment negatively affects the subject of the sentence instead of the N1.
Sentences like (16) and (17) raise the doubt whether N1 can be the Holder of the opinion expressed by the sentence or not.

¹⁶ “To loom over, to excel, to hinder, to be worth, to suit, to be inconvenient, to be convenient, to cost, to degenerate, to clash”


(16) [Opinion [Target (Maria + Il fidanzamento + Che si sposti la riunione)] conta[+2] per [Opinion Holder Arturo]]
“(Maria + The engagement + That the meeting is moved) matters to Arturo”

(17) [Opinion [Target (*Maria + Il fidanzamento + Che si sposti la riunione)] (conviene[+2] + costa[-2]) [Opinion Holder ad Arturo]]
“(Maria + The engagement + That the meeting is moved) (is convenient + costs) for Arturo”

Actually, the verb does not provide any information regarding the active or passive role of N1 in the expression of the opinion. The writer can be convinced about what “matters, is convenient or costs” for Arturo, but it remains a belief that is not confirmed in the text (16, 17). Therefore, we chose to assign the Opinion Holder role to the N1 of the class 42 only in the case in which the object corresponds to the personal pronouns “me” or “us” (18, 19).

(18) [Opinion [Target (Maria + Il fidanzamento + Che si sposti la riunione)] conta[+2] per [Opinion Holder me]]
“(Maria + The engagement + That the meeting is moved) matters to me”

(19) [Opinion [Target (*Maria + Il fidanzamento + Che si sposti la riunione)] [Opinion Holder ci] (conviene[+2] + costa[-2])]
“(Maria + The engagement + That the meeting is moved) (is convenient + costs) for us”


LG Classes 43 and 43B N0 V Ch F and N0 V il fatto Ch F

The sentence structure that defines these classes is N0 V N1, where:

• N0, in both the classes 43 and 43B, is generally a human noun (Num);

• N1 is an infinitive clause or a completive one, which is direct for the 43 and introduced by il fatto (Ch F + di) “the fact (that S + of)” for the 43B.

The psychological (Sent) but also the opinion (Op) verbs from the class 43 share, in many cases, the acceptance or the refusal of a set of LG properties:

deprecare,V+Op+NEG+Class=43+FLX=V8
bestemmiare,V+Op+NEG+FORTE+Class=43+FLX=V11
criticare,V+Op+NEG+Class=43+FLX=V8
maledire,V+Op+NEG+FORTE+Class=43+FLX=V216
tollerare,V+Op+NEG+DEB+Class=43+FLX=V3
adorare,V+Sent+POS+FORTE+Class=43+FLX=V3
amare,V+Sent+POS+FORTE+Class=43+FLX=V3
ammirare,V+Sent+POS+Class=43+FLX=V3
odiare,V+Sent+NEG+FORTE+Class=43+FLX=V12
rimpiangere,V+Sent+NEG+DEB+Class=43+FLX=V31¹⁷

As regards the distribution of N0, it must be noticed that all these verbs accept a human noun, but only two of them (condannare “to condemn” and meritare “to deserve”) also accept a non-restricted noun (Nnr, not active) or a subjective clause introduced by il fatto (Ch F + di). This fact is perfectly consistent with the animate nature of the Experiencer with respect to the Psychological verbs (20). However, it underlines that this class of verbs also selects an Opinion Holder as N0 (except for the verb meritare).

¹⁷ “To deprecate, to blaspheme, to criticize, to curse, to tolerate, to adore, to love, to admire, to hate”


(20) [Sentiment [Experiencer Maria] (adora[+3] + odia[-3]) [Stimulus (Arturo + che Arturo fumi + il fatto che Arturo fumi + il fumo)]]
“Maria (adores + hates) (Arturo + that Arturo smokes + the fact that Arturo smokes + the smoke)”

(21) [Opinion [Opinion Holder Maria] (critica[-2] + tollera[-1]) [Target (Arturo + che Arturo fumi + il fatto che Arturo fumi + il fumo)]]
“Maria (criticizes + tolerates) (Arturo + that Arturo smokes + the fact that Arturo smokes + the smoke)”

As far as the distribution of N1 is concerned, all the psych and opinion items accept the pronoun ciò “this” and the Ppv lo “it/him/her” (apart from inveire “to inveigh”). The only verbs that refuse Num as complement are auspicare “to hope for”, inveire “to inveigh”, lamentare “to lament”, malignare “to speak/think ill of”, meritare “to deserve”, agognare “to yearn for”, anelare “to long for”, delirare “to talk nonsense” and gustare “to taste”. The verbs that instead do not accept a N-um are fewer: inveire and delirare.
The property N1 dal fatto Ch F is refused by every verb except for apprezzare “to admire”, which, together with sopportare “to suffer”, desiderare “to desire” and bramare “to crave”, also accepts the property N1 da N2. Other properties refused in bulk are Ch N0 V essere Comp (save malignare, sospettare “to suspect”, temere “to be afraid of”); N1 Agg (save sospettare, temere); N1 = Agg Ch F (save sospettare); N0 V essere Cong (save sospettare and malignare); N1 = se F o se F (save benedire “to bless”, apprezzare and gustare); Ch F a N2um (save approvare “to approve”).
Similar observations, but with an even higher homogeneity, can be


made on the class 43B which, differently from the 43, has the N1 always introduced by il fatto (Ch F + di).
The Psychological (Sent) and the opinionated (Op) verbs from the class 43B sometimes share the acceptance or the refusal of a set of LG properties:

attaccare,V+Op+NEG+Class=43B+FLX=V8
biasimare,V+Op+NEG+Class=43B+FLX=V3
lodare,V+Op+POS+Class=43B+FLX=V3
osannare,V+Op+POS+FORTE+Class=43B+FLX=V3
snobbare,V+Op+NEG+Class=43B+FLX=V3
assaporare,V+Sent+POS+Class=43B+FLX=V3
pregustare,V+Sent+POS+Class=43B+FLX=V3
schifare,V+Sent+NEG+FORTE+Class=43B+FLX=V3
sdegnare,V+Sent+NEG+FORTE+Class=43B+FLX=V16
spregiare,V+Sent+NEG+FORTE+Class=43B+FLX=V4¹⁸

Here again, an N0 different from a human noun is accepted by just three verbs (banalizzare “to trivialize”, demitizzare “to unidealize” and pregiudicare “to compromise”).
The pronoun ciò “this”, the Ppv lo “it/him/her” and a non-human noun are accepted by all the Sent and Op verbs in complement position, while the N1um is refused by 2 psychological and 5 opinionated verbs.
As regards the properties refused by almost all the verbs, we count N1 dal fatto Ch F, N1 da N2 (save difendere “to defend”) and Ch F a N2um (save apologizzare “to make an apology for” and vantare “to praise”).
Differently from the class 43, in which the passive transformation is not accepted in 3 cases, in the class 43B it is accepted by all the items.

¹⁸ “To attack, to blame, to praise, to snub, to taste, to foretaste, to be disgusted by, to be indignant, to despise”


3.3.3.4 Other Verbs of Sentiment

We stated in Paragraph 3.3.3.2 that almost half of the SentIta verbs do not coincide with what is defined in literature as “psych verb”. Moreover, a lot of psych verbs and other opinion-bearing items can be found in LG classes different from the ones classically identified as containing psychological predicates, namely 41, 42, 43, 43B (Elia 2014a, 2013). In detail, a large number of verbs that can be used as sentiment and opinion indicators is classified over 28 other LG classes, described by as many definitional structures. This fact could be interpreted as the failure of the syntactic-semantic mapping hypothesis, but actually, in the LG framework, the semantic entropy of the lexicon can be decreased through the identification of classes of words that share sets of lexical-syntactic behaviors.

The main Lexicon-Grammar verbal subclasses are mentioned below:

Intransitive verbs: LG classes from 1 to 15, on which LG researchers tested the acceptance or the refusal of 545 combinatorial and distributional properties through 23827 elementary sentences;

Transitive verbs: LG classes from 16 to 40, tested on 361 properties among 19453 sentences;

Complement clause verbs (transitive and intransitive): LG classes from 41 to 60, tested on 443 properties and 57242 sentences (Elia 2014a).

Table 3.12 shows the percentage values of opinionated verbs in each one of the Lexicon-Grammar verbal subclasses mentioned above. As we can see, the complement clause group contains the largest number of oriented items. When inserted in our electronic dictionaries, the lemmas from these LG classes are enriched with semantic information concerning the Types (Sent: sentiments, emotions, psychological states; Op: opinions, judgments, cognitive states; Phy: physical acts), the Orientations (POS, NEG) and the Intensities (STRONG, WEAK) of the described lemmas.

Similarly to SentiLex (Silva et al. 2010, 2012), SentIta is provided with the specification of the argument(s) Ni semantically affected by the orientation of the verbs. The presence of oriented verbs and the distribution of semantic tags are respectively described in Tables 3.13 and 3.14. For a complete presentation of LG verbal classes, their composition and their granularity, see Elia (2014a).

LG Class                 Verb usages  Oriented verbs  Percentages (%)
Intransitive verbs               401             138               21
Transitive verbs                 605             205               31
Complement clause verbs          760             308               47
Tot                             1766             651              100

Table 3.12: Polar verbs in the main Lexicon-Grammar verbal subclasses
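The percentages reported in Table 3.12 can be recomputed from the raw counts. The short Python sketch below is only illustrative: the variable names are ours and are not part of the SentIta resources. It divides the oriented verbs of each subclass by the total number of oriented verbs and, in addition, computes the overall proportion of verb usages that carry an orientation.

```python
# Counts copied from Table 3.12: (verb usages, oriented verbs) per subclass.
counts = {
    "Intransitive verbs": (401, 138),
    "Transitive verbs": (605, 205),
    "Complement clause verbs": (760, 308),
}

total_usages = sum(usages for usages, _ in counts.values())
total_oriented = sum(oriented for _, oriented in counts.values())

# Share of each subclass over all the oriented verbs, rounded to whole percent.
shares = {
    name: round(100 * oriented / total_oriented)
    for name, (_, oriented) in counts.items()
}

# Overall proportion of verb usages that are oriented.
overall = round(100 * total_oriented / total_usages)

for name, share in shares.items():
    print(f"{name}: {share}%")
print(f"Oriented verbs over all usages: {overall}%")
```

The overall proportion obtained this way corresponds to the value given in the total row of Table 3.13.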

3.3.4 Psychological Predicates’ Nominalizations

The nominal entities of SentIta that will be described in this Section are all predicative forms, due to their correlation with the verbal predicates described above, e.g. angosciare “to anguish”, angoscia “anguish” (D’Agostino 2005). The complex phenomenon related to the combinatorial constraints between sentiment V-n and specific verbs in simple sentences has already been explored by Balibar-Mrabti (1995) and Gross (1995) for the French language. In this regard, Gross (1995) specified that “les

Main LG Class            LG Class  Verb usage  Oriented verbs  Percentages (%)
Intransitive Verbs       2                122              34               28
                         2A                40               9               23
                         4                 31              12               39
                         9                 72              35               49
                         10                56              19               34
                         11                80              29               36
Transitive Verbs         18                40               8               20
                         20I               25               1                4
                         20NR              83              15               18
                         20UM             145              65               45
                         21                29               2                7
                         21A               65              44               68
                         22                63              28               44
                         23R               34              13               38
                         24                72              18               25
                         27                37              10               27
                         28ST              12               1                8
Complement Clause Verbs  44                33              18               55
                         44B               44              20               45
                         45                31              23               74
                         45B               21              16               76
                         47               313             120               38
                         47B               34              12               35
                         49                96              12               13
                         50                51              41               80
                         51                36              19               53
                         53                28               9               32
                         56                73              18               25
Tot                      -               1766             651               37

Table 3.13: Polar verbs in other LG verb classes which contain at least one oriented item

Main LG Class            LG Class  Oriented Verbs  Sent   Op  POS  NEG  FORTE  DEB
Intransitive Verbs       2                     34     7   24    8   26      0    0
                         2A                     9     4    2    1    8      0    0
                         4                     12     0   11    5    7      0    0
                         9                     35    10   25   11   24      0    0
                         10                    19     4   15    9   10      0    0
                         11                    29    25    4    0   29      0    0
Transitive Verbs         18                     8     0    0    4    4      0    0
                         20I                    1     0    0    1    0      0    0
                         20NR                  15     5    8    4   11      0    0
                         20UM                  65    16   25   11   54      0    0
                         21                     2     0    2    0    2      0    0
                         21A                   44     9   14   17    6     11   10
                         22                    28     0   28   21    7      0    0
                         23R                   13     0    0    4    9      0    0
                         24                    18     0    0    0    1     15    2
                         27                    10     0   10   10    0      0    0
                         28ST                   1     0    0    0    1      0    0
Complement Clause Verbs  44                    18     0   18   16    2      0    0
                         44B                   20     2   18   14    6      0    0
                         45                    23     6   16    7   15      1    0
                         45B                   16     5   10    7    9      0    0
                         47                   120     0  120    5   54     43   18
                         47B                   12     0   34    4    8      0    0
                         49                    12     0   12    4    8      0    0
                         50                    41     4   33   15   22      4    0
                         51                    19     1   14   10    9      0    0
                         53                     9     0    9    0    9      0    0
                         56                    18     1    9    1    9      5    3
Tot                      -                    651    99  461  189  350     79   33

Table 3.14: Distribution of semantic labels into the LG verb classes that contain at least one oriented item


restrictions de noms de sentiment à certains verbes sont nombreuses et tellement inattendues que, dans un premier temps, nous les avions traitées comme des expressions figées, c’est-à-dire comme de simples listes”¹⁹.

As regards Italian works in the LG framework, we mention the works of D’Agostino (2005), D’Agostino et al. (2007) and Guglielmo (2009), which respectively explored the lexical micro-classes of the sentiments ira “anger”, angoscia “anguish” and vanità “vanity”, studying their nominal manifestation in support verb constructions. Differently from ordinary verbs (discussed here in Section 3.3.3), support verbs do not carry any semantic information: this role is played here by the sentiment nouns, but the support verbs can modulate it to a certain extent.

In the present research, the Nominalizations of Psychological verbs have been used to manually start the formalization of the SentIta dictionary dedicated to nouns. It comprises 1000+ entries and includes a list of 80+ human nouns, evaluated and added to this dictionary in order to use them in sentences like N0um essere un N1sent (e.g. Max è un attaccabrighe “Max is a cantankerous person”). Table 3.15 shows one example of Psych V-n for each LG class taken into account during the annotation of Psychological Predicates.

Psych verb  V-n       V-n Translation  LG Class  Score
angosciare  angoscia  “anguish”        41        -3
piacere     piacere   “pleasure”       42        +2
amare       amore     “love”           43        +3
biasimare   biasimo   “blame”          43B       -2

Table 3.15: Description of the Nominalizations of the Psychological Semantic Predicates

19 “The restrictions of nouns of sentiment to certain verbs are numerous and so unexpected that at first we treated them as frozen expressions, that is to say as mere lists”. Author’s translation.

A sample of the entries from this dictionary is given below. The LG class is associated to a given item by the property “Class”, which lets the grammar recognize the proper syntactic structures in which the nominalizations occur in real texts.

LG Class 41
bellezza,N+POS+Class=41+DASOLO+FLX=N41
calore,N+POS+Class=41+FLX=N5
contentezza,N+POS+Class=41+DASOLO+FLX=N41
dolcezza,N+POS+Class=41+DASOLO+FLX=N41
fascino,N+POS+Class=41+DASOLO+FLX=N5 ²⁰

LG Class 42
dispiacere,N+FLX=N5+42+NEG+DASOLO
dispiacevolezza,N+FLX=N41+42+NEG+DASOLO
dispiacimento,N+FLX=N5+42+NEG+DEB+DASOLO
piacere,N+FLX=N5+42+POS+DASOLO
piacevolezza,N+FLX=N41+42+POS+DASOLO ²¹

LG Class 43
approvazione,N+FLX=N46+43+POS+DASOLO
benedizione,N+FLX=N46+43+POS+DASOLO
bestemmia,N+FLX=N41+43+NEG+DASOLO
bramosia,N+FLX=N41+43+NEG+DEB+DASOLO
condanna,N+FLX=N41+43+NEG+DASOLO ²²

LG Class 43B
denigrazione,N+FLX=N46+43B+NEG+FORTE+DASOLO
derisione,N+FLX=N46+43B+NEG+DASOLO
dileggio,N+FLX=N12+43B+NEG+DASOLO
disdegno,N+FLX=N5+43B+NEG+FORTE+DASOLO
disprezzo,N+FLX=N5+43B+NEG+FORTE+DASOLO ²³
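To make the structure of these entries explicit, the following Python sketch splits one entry into lemma, grammatical category, bare tags and key=value properties. It is a minimal illustration and not part of the Nooj toolchain: it assumes that each entry is written as lemma, comma, category, followed by features separated by “+”, and the function name parse_entry is hypothetical.

```python
def parse_entry(line):
    """Split an entry of the form 'lemma,CAT+tag+key=value' into its parts.

    Assumption: the lemma is separated from the category by a comma and
    the features are separated by '+'.
    """
    lemma, _, info = line.strip().partition(",")
    category, *features = info.split("+")
    flags, attributes = [], {}
    for feature in features:
        key, sep, value = feature.partition("=")
        if sep:
            # key=value properties such as Class=41 or FLX=N41
            attributes[key] = value
        else:
            # bare tags such as POS, NEG, FORTE, DEB, DASOLO
            flags.append(feature)
    return {
        "lemma": lemma,
        "category": category,
        "flags": flags,
        "attributes": attributes,
    }


entry = parse_entry("bellezza,N+POS+Class=41+DASOLO+FLX=N41")
print(entry)
```

Note that, in the sample above, the entries of the classes 42, 43 and 43B record the class as a bare tag (e.g. +42) rather than as a Class property, so with this sketch the class would appear among the flags for those entries.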

20 “Beauty, warmth, happiness, kindness, charm”
21 “Sorrow, displeasure, regret, pleasure, pleasantness”
22 “Approval, blessing, curse, yearning, condemnation”
23 “Detraction, derision, mockery, disdain, despise”

Table 3.16 presents the details about the distribution of sentiment tags into the Psych V-n dictionary. The size of the nouns dictionary is larger than the verbs’ one. This happens because the same predicate can possess more than one nominalization (e.g. the verb amare from the LG class 43 is related with the V-n amabilità, amore, amorevolezza).

LG Class  Nouns  Positive Items  Negative Items  Intensifiers  Downtoners  Percentages
41          856             245             542            41          28           77
42           11               5               6             0           0            1
43          170              57             113             0           0           15
43B          74              21              53             0           0            7
Tot        1111             328             714            41          28          100
%           100              30              64             4           3            -

Table 3.16: Nominalizations of the Psych Predicates chosen from SentIta

3.3.4.1 The Presence of Support Verbs

To include V-n in a sentiment lexicon is particularly risky. In many cases it is misleading to use them out of any context, because they often have a specific SO only if used with particular support verbs. Among the Vsup considered, we mention the following, with their equivalents (D’Agostino 2005, D’Agostino et al. 2007):

• essere in “to be in”, patire di, soffrire di “to suffer from”, stare in “to be in”, andare in “to go in”

• avere “to have”, avvertire “to perceive”, patire, soffrire “to suffer”, tenere “to have”, sentire, provare “to feel”, subire “to undergo”, nutrire, covare “to harbor”, provare un (senso + sentimento) di “to feel a (sense + sentiment) of”

• causare “to cause”, dare “to give”, suscitare “to raise”, provocare “to provoke”, creare “to create”, alimentare “to instigate”, stimolare “to inspire”, sviluppare “to develop”, incutere “to instill”, sollecitare “to urge to”, donare, regalare, offrire “to offer”

Sometimes it is even possible to have an SO switch by changing the word context. For example, the noun pietà “pity” and “compassion” is strongly negative in a fixed expression with the support verb fare (22), but it becomes positive when it co-occurs with the sentiment expression of (23).

(22) È proprio la sceneggiatura che fa pietà [-3]
“That’s the script that is pitiful”

(23) Mary prova un sentimento di pietà [+2]
“Mary feels a sentiment of compassion”

Systematically evaluating, through LG tables, the relationships that exist between each one of the psych nouns and each one of the support verbs that can be combined with them leads to an important conclusion: the fact that the support verb variants can influence the polarity of the V-n with which they occur is not an exception. For example, extensions of Vsup like subire “to undergo”, covare “to harbor”, incutere “to instill” bring a negative orientation with them; donare, regalare, offrire “to offer” are positively connoted; perdere “to lose”, abbandonare “to abandon”, lasciare “to leave” cause a polarity switch, whatever the orientation of the psych noun.

Support verbs like the ones cited above cannot be simply included in a Vsup list, but must be inserted in specific paths of a noun grammar able to correctly manage their influence on V-n.
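The three behaviors just described can be summarized with a small Python sketch. It is only an approximation written for illustration, since in SentIta this logic is encoded in the paths of the Nooj grammars. The function name, the choice of returning only the sign of the orientation, and the precedence given to the switching verbs are our assumptions.

```python
# Support verb extensions cited in the text, grouped by their effect.
NEGATIVE_VSUP = {"subire", "covare", "incutere"}
POSITIVE_VSUP = {"donare", "regalare", "offrire"}
SWITCHING_VSUP = {"perdere", "abbandonare", "lasciare"}


def vsup_noun_polarity(vsup, noun_score):
    """Return the orientation sign (-1, 0 or +1) of a Vsup + V-n pair.

    Assumption: only the sign is modeled, because the text does not state
    how the scores of the verb and of the noun are combined.
    """
    sign = (noun_score > 0) - (noun_score < 0)
    if vsup in SWITCHING_VSUP:
        return -sign      # polarity switch, whatever the noun orientation
    if vsup in NEGATIVE_VSUP:
        return -1         # the verb brings a negative orientation
    if vsup in POSITIVE_VSUP:
        return 1          # the verb is positively connoted
    return sign           # neutral support verb: the noun decides


print(vsup_noun_polarity("perdere", 2))   # switch applied to a positive noun
print(vsup_noun_polarity("avere", -3))    # neutral Vsup with a negative noun
```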

3.3.4.2 The Absence of Support Verbs

Another problem arises when a psych predicative noun from our collection presents different semantic behaviors when occurring without support verbs in real texts. This is the case of calore “warmth”, from the verb riscaldare “to warm up” of the LG class 41, which possesses its psychological figurative meaning only together with specific structures (24), but assumes a physical interpretation when it occurs alone (26). Different is the case of contentezza “happiness”, which preserves its psychological meaning in both cases (25, 27).

(24) Le tue parole invadono[+] Maria di calore[+2] [+3]
“your words warm up Mary”

(25) Le tue parole invadono[+] Maria di contentezza[+2] [+3]
“your words overwhelm Mary with happiness”

(26) Che calore[+2] [0]
“what a heat”

(27) Che contentezza[+2] [+2]
“what a joy”

We solved the problem in a simple way: by means of the special tag +DASOLO “alone”, we distinguish in the dictionaries, and recall in the grammars, the cases in which the nouns can occur alone from the cases in which a specific Vsup context is required to have a particular SO.
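The effect of the +DASOLO tag can be sketched as follows. This is an illustrative Python fragment, not the Nooj grammar itself: the function name and its arguments are hypothetical, and the intensification contributed by the verb in (24) and (25) is deliberately left out.

```python
def noun_score(flags, prior_score, has_vsup_context):
    """Score a psych noun depending on its context.

    Outside a support verb context, the prior score is kept only when the
    noun carries the DASOLO tag; otherwise the noun is treated as neutral.
    """
    if has_vsup_context or "DASOLO" in flags:
        return prior_score
    return 0


# calore is not tagged DASOLO: alone it is neutral, as in (26).
print(noun_score({"POS"}, 2, has_vsup_context=False))
# contentezza is tagged DASOLO: alone it keeps its score, as in (27).
print(noun_score({"POS", "DASOLO"}, 2, has_vsup_context=False))
```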

3.3.4.3 Ordinary Verbs Structures

In order to examine more deeply how the presence of different verbs can affect the polarity of the Psychological V-n, we translated the list of 61 verbs of Balibar-Mrabti (1995) from the French and annotated them with semantic information. The sentence structure for which they have been conceived is N0sent V N1um (e.g. una gioia intensa riempie Max “an intense joy fills Max”), in which the subject N0 plays the role of Nsent. Then they have been tested in the structure N0 V N1um di N2sent (e.g. le tue parole riempiono Max di una gioia intensa “your words fill Max with an intense joy”), which in turn is related with the passive Max è pieno di una gioia intensa “Max is full of an intense joy”.

This list of corresponding Italian verbs contains 47 entries, because of the exclusion of the duplicates from the LG class 41²⁴. Among them, 11 give the Nsent a negative connotation (e.g. abbattere “to lose heart”, offuscare “to confuse”, corrodere “to corrode”), 6 confer it a positive orientation (e.g. risplendere “to shine”, cullare “to cradle”, illuminare “to light up”), 18 intensify its polarity (e.g. travolgere “to overwhelm”, attraversare “to undergo”, riempire “to fill”) and 12 simply possess a neutral orientation. Their role can become crucial in differently oriented sentences that make use of the same Nsent, like (28a, b, c).

(28a) L’amore[+2] tormenta[-3] Maria [-3]
(28b) L’amore[+2] illumina[+2] Maria [+2]
(28c) L’amore[+2] travolge[+] Maria [+3]
“The love (torments + lights up + overwhelms) Maria”

Therefore, for the sake of simplicity, we decided to put them into an FSA that assigns as sentence polarity the score of the verbs with the tags NEG/POS (28a, b) and that uses the verbs with the tag FORTE as intensifiers (28c). A simplified version of this grammar is displayed in Figure 3.1. Here the paths marked with the number (1) represent the negative rule (28a), the paths (2) represent the positive rule (28b) and the paths marked with (3) exemplify the intensification rule (28c).

In this dictionary/grammar pair we did not mark the verbs with special tags; consequently, other Psychological Predicates endowed with polarity and intensity tags can be matched in the corresponding paths.
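The three rules can be approximated in a few lines of Python. The fragment is a sketch under stated assumptions and does not reproduce the Nooj FSA: we assume that intensification moves the noun score one step away from zero and that scores are capped at ±3, which is consistent with examples (28a, b, c) but is not stated as a general rule in the text. The function name is hypothetical.

```python
def sentence_score(noun_score, verb_tag=None, verb_score=0):
    """Combine the score of an Nsent with the tag of the verb it occurs with."""
    if verb_tag in ("NEG", "POS"):
        # Rules (1) and (2): the score of the oriented verb becomes
        # the score of the sentence.
        return verb_score
    if verb_tag == "FORTE" and noun_score != 0:
        # Rule (3): the verb works as an intensifier of the noun score
        # (assumed: one step away from zero, capped at +3 / -3).
        step = 1 if noun_score > 0 else -1
        return max(-3, min(3, noun_score + step))
    # Neutral verbs leave the noun score unchanged.
    return noun_score


print(sentence_score(2, "NEG", -3))   # (28a) L'amore tormenta Maria
print(sentence_score(2, "POS", 2))    # (28b) L'amore illumina Maria
print(sentence_score(2, "FORTE"))     # (28c) L'amore travolge Maria
```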

24 The verbs from this class are all compatible with this structure, because they accept an abstract noun in the subject position and always require a human noun in position N1.

Figure 3.1: FSA for the interaction between Nsent and semantically annotated verbs

3.3.4.4 Standard-Crossed Structures

To conclude the work on sentiment V-n, we mention the case of the structures of the type “standard” (N0sent V Loc N1um) and “crossed” (N0um V di N1sent), exemplified below (D’Agostino 2005, Elia et al. 1981).

(29) l’amore[+2] brilla[+] negli occhi di Maria [+3]
“love shines in Maria’s eyes”

(30) gli occhi di Maria brillano[+] d’amore[+2] [+3]
“Maria’s eyes shine with love”

To formalize this group of sentiment expressions we made use of 12 verbs from the LG class 12, interpreted in a figurative way, in which the N1um “standard” position and the N0um “crossed” position are designated by their body parts Npc through a metonymic process (Elia et al. 1981). As happens in (29) and (30), such verbs can work as intensifiers (e.g. bruciare “to burn”) or can also have a negative polarity influence (only in the case of dolere “to hurt”).

3.3.5 Psychological Predicates’ Adjectivalizations

Paragraph 3.3.2 described the manually-built collection of opinion bearing adjectives, which counts more than 5000 items. In this list, a subset of items in morpho-phonological relation with our predicates from the classes 41, 42, 43 and 43B has been automatically associated to the verbs of origin and to their respective LG classes. The result of this work is an electronic dictionary of 757 entries, realized with a morphological grammar of Nooj (see Figure 3.2). It is displayed in the sample below.

angosciante,A+FLX=N79+DRV=ISSIMO:N88+NEG+Va+V=angosciare+Class=41
angosciato,A+FLX=N88+DRV=ISSIMO:N88+NEG+Va+V=angosciare+Class=41
angoscioso,A+FLX=N88+DRV=ISSIMO:N88+NEG+Va+V=angosciare+Class=41
piacente,A+FLX=N79+DRV=ISSIMO:N88+POS+Va+V=piacere+Class=42
piacevole,A+FLX=N79+DRV=ISSIMO:N88+POS+Va+V=piacere+Class=42
amato,A+FLX=N88+DRV=ISSIMO:N88+POS+Va+V=amare+Class=43
amorevole,A+FLX=N79+DRV=ISSIMO:N88+POS+Va+V=amare+Class=43
amoroso,A+FLX=N88+DRV=ISSIMO:N88+POS+Va+V=amare+Class=43
biasimevole,A+FLX=N79+DRV=ISSIMO:N88+NEG+Va+V=biasimare+Class=43B ²⁵

Table 3.17 gives information about the distribution of its semantic tags, while Table 3.18 compares the sentiment selection of V-a with the whole dictionary of sentiment adjectives of SentIta, in which it is contained.

Tag          41  42  43  43B  Tot
POS+FORTE     8   0   0    9   17
POS         133   3  35   11  182
POS+DEB      14   0   0    0   14
NEG+DEB      66   0   5    1   72
NEG         307   2  24   25  358
NEG+FORTE    49   0   6    2   57
FORTE        46   1   0    0   47
DEB           9   0   0    1   10
Tot         632   6  70   49  757

Table 3.17: Distribution of semantic tags in the dictionary of Adjectival Psych Predicates

Adjectives   Entries  Percentages (%) on Adj Psych  Percentages (%) on SentIta
Tot pos          213                            28                           4
Tot neg          487                            64                           9
Tot int           57                             8                           1
Tot Psych        757                           100                          14
Tot SentIta     5381                             -                         100

Table 3.18: Percentage values of Psych Adjectivalizations in SentIta

25 “Anguishing, anguished, anguishous, attractive, pleasing, beloved, loving, amorous, blameworthy”

The grammar resembles the ones used to derive the orientation of adverbs in -mente and quality nouns from the semantic information listed in the sentiment adjectives, which will be respectively treated in Sections 3.5.2 and 3.5.3. The process is similar: we copy and paste the dictionary that has to be semantically enriched into a Nooj text and we annotate it through the instructions of the grammar. We then export the annotations and compile a brand new dictionary on their base. Because the operations involve just the two dictionaries, we exclude any other lexical (.nod) and morphological (.nom) resource from the Nooj Preferences before applying the grammar to the dictionary-text.

The grammar recognizes each verbal stem, checks if it belongs to the LG class of interest and then adds the adjectival suffixes listed in a .dic file (see Table 3.19). The difference from the grammars for the quick population of the adverbs in -mente and the quality nouns is that here the morphological strategy is not exploited to discover the prior polarity of the words, since the adjective dictionary was already tagged with sentiment information. In fact, in the final annotation the adjectives preserve all their semantic ($2S), inflectional and derivational ($2F) properties. The new information they acquire regards their nature of V-a (+Va), the connection with the psychological verb (+V=$1L) and the relative LG class of the predicate (e.g. +Class=41).

Regarding the adjectival suffixes for the deverbal derivation, we examined the ones that follow (Ricca 2004a):

• morphemes for the formation of adjectives (-bile, -(t)orio, -evole, -(t)ivo, -(t)ore, -bondo, -ereccio, -iccio, -ido, -ile, -ndo, -tario, -ulo)

• morphemes for the formation of adjectives that coincide with the verb’s past or present participle (-to, -nte)

• not typically deverbal morphemes (-oso, -istico, -(t)ico, -aneo, -ardo, -ale, -ario)

• not typically adjectival morphemes (-(t)ore, -trice, -ista, -one)

Figure 3.2: Extract of the morphological FSA that associates psych verbs to their adjectivalizations

SFX              Adjectives
-t-o                    309
-nt-e                   159
-(t)or-e                100
-(t)iv-o                 38
-os-o                    38
-(t)ori-o                35
-evol-e                  26
-o (conversion)          20
-(t)ic-o                  8
-on-e                     6
-ist-a                    4
-istic-o                  3
-al-e                     3
-il-e                     2
-bil-e                    1
-bond-o                   1
-nd-o                     1
-tric-e                   1
-icci-o                   1
-ari-o                    1
-erecci-o                 0
-id-o                     0
-tari-o                   0
-ul-o                     0
-ane-o                    0
-ard-o                    0

Table 3.19: Suffixes from the dictionary of Psych Adjectivalizations

As can be seen in Table 3.19, some suffixes produced a large number of connections, while others were less productive. Six morphemes did not produce any connection. Furthermore, a set of adjectives derived by conversion has also been automatically connected to the respective verbs:

LG class 43: maligno = malignare “mean = to speak ill of”
LG class 41: schifo = schifare “disgusting = to loathe”

Psych verb  Psych V-n  Psych V-a    V-a Translation  LG Class  Score
angosciare  angoscia   angosciante  “anguishing”     41        -3
                       angosciato   “anguished”
                       angoscioso   “anguishous”
piacere     piacere    piacente     “attractive”     42        +2
                       piacevole    “pleasing”
amare       amore      amato        “beloved”        43        +3
                       amorevole    “loving”
                       amoroso      “amorous”
biasimare   biasimo    biasimevole  “blameworthy”    43B       -2

Table 3.20: Adjectivalizations of the Psych Predicates
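The association strategy described above (recognize a verbal stem, check its LG class, add an adjectival suffix) can be approximated outside Nooj with the following Python sketch. It is a deliberately simplified illustration and not the morphological grammar of Figure 3.2: the stem is obtained by dropping the last three letters of the infinitive, only a handful of the suffixes of Table 3.19 is used, an optional thematic vowel is allowed before the suffix, and all the names are hypothetical.

```python
# Small illustrative subset of the suffixes listed in Table 3.19.
SUFFIXES = {"to", "nte", "oso", "evole", "bile"}
THEMATIC_VOWELS = ("a", "e", "i", "u")


def link_adjective(adjective, verb_classes):
    """Return the tags linking an adjective to a psych verb, or None.

    verb_classes maps each infinitive to its LG class, e.g. {"amare": "43"}.
    """
    for verb, lg_class in verb_classes.items():
        stem = verb[:-3]  # drop the infinitive ending -are / -ere / -ire
        if not adjective.startswith(stem):
            continue
        rest = adjective[len(stem):]
        # Accept stem + suffix, or stem + thematic vowel + suffix.
        if rest in SUFFIXES or (
            rest[:1] in THEMATIC_VOWELS and rest[1:] in SUFFIXES
        ):
            return f"+Va+V={verb}+Class={lg_class}"
    return None


verbs = {"angosciare": "41", "piacere": "42", "amare": "43", "biasimare": "43B"}
print(link_adjective("angosciante", verbs))
print(link_adjective("amorevole", verbs))
```

Such a crude stem matching misses forms like amorevole, which require a modified stem; this is one reason why a morphological grammar, followed by manual correction, is preferable to plain string matching.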

The average precision reached in the automatic association process is 97%. The list produced by Nooj has then been manually corrected, in order to add it to the SentIta resources without any possible source of errors. Table 3.20 shows examples of the connection between the verbs from the tables 41, 42, 43 and 43B and their respective nominalizations and adjectivalizations.

Actually, nouns can be associated to adjectives also independently from verbs (Gross 1996), as happens in (31). Section 3.5.3 will deal with such cases.

(31) Luc (est généreux + a de la générosité) envers ses ennemis (Gross 1996)
“Luc (is generous + has generosity) for his enemies”

Basically, we chose to associate the psych adjectives to the respective verbs in order to preserve their argument/actant selection for Semantic Role Labeling purposes. This does not exclude that other adjectives that do not belong to this subset can also have the function of predicates in the sentences in which they occur, as happens in (31).

3.3.5.1 Support Verbs

As exemplified above, the sentiment expressions in which we inserted the adjectives from SentIta are of the kind N0 essere Agg val, where Agg val represents an adjective that expresses an evaluation (Elia et al. 1981) and the support verb for these predicates is essere “to be”. This verb also gives its support to the expression N0 essere un N1-class (e.g. Questo film è una porcheria “This movie is a mess”) which, without it, together with the adjectives, would not possess any mark of tense (Elia et al. 1981).

A great part of the compound adverbs also possess an adjectival function (e.g. a fin di bene “for good”, tutto rose e fiori “all peace and light”, see Section 3.4.1), so with good reason they have been included in this support verb construction as well. The support verbs’ equivalents included in this case are the following (Gross 1996):

• aspectual equivalents: stare “to stay”, diventare “to become”, rimanere, restare “to remain”

• causative equivalents: rendere “to make”

• stylistic equivalents: sembrare “to seem”, apparire “to appear”, risultare “to result”, rivelarsi “to reveal to be”, dimostrarsi, mostrarsi “to show oneself to be”

Among the Italian LG structures that include adjectives²⁶ we selected the following, in which polar and intensive adjectives can occur with the support verb essere (Vietri 2004, Meunier 1984):

26 For the LG study of adjectives in French see Picabia (1978), Meunier (1999, 1984).

• Sentences with polar adjectives

  – N0 Vsup Agg Val²⁷
    L’idea iniziale era accettabile[+1]
    “The initial idea was acceptable”

  – V-inf Vsup Agg Val
    Vedere questo film è demoralizzante[-2]
    “Watching this movie is demoralizing”

  – N0 Vsup Agg Val di V-Inf
    La polizia sembra incapace[-2] di fare indagini
    “The police seems unable to investigate”

  – N0 Vsup Agg Val a N1
    La giocabilità è inferiore[-2] alla serie precedente
    “The playability is worse than the preceding series”

  – N0 Vsup Agg Val Per N1
    Per me questo film è stato noioso[-2]
    “In my opinion this movie was boring”

• Sentences with adjectives as nouns intensifiers and downtoners

  – N0 Vsup Agg Int di N1
    Una trama piena[+] di falsità[-2]
    “A plot filled with mendacity”

27 We preferred to use in these examples the notation Agg Val rather than Psych V-a, because they can refer both to adjectivalizations of psych verbs and to other SentIta evaluative adjectives.

3.3.5.2 Nominal groups with appropriate nouns Napp

The support verb avere “to have” (and its equivalent tenere) has been observed in a transformation (Nb Vsup Na V-a) which involves a special kind of GN subject that contains noms appropriés “appropriate nouns” Napp (Harris 1970, Guillet and Leclère 1981, Laporte 1997, 2012, Meydan 1996, 1999). Citing (Laporte 2012, p. 1):

“A sequence is said to be appropriate to a given context if it has the highest plausibility of occurrence in that context, and can therefore be reduced to zero. In French, the notion of appropriateness is often connected with a metonymical restructuration of the subject”

and (Mathieu 1999b, p. 122):

“On considère comme substantif approprié tout substantif Na pour lequel, dans une position syntaxique donnée, Na de Nb = Nb”²⁸

we can clarify that “the notion of highest plausibility of occurrence of a term in a given context” (Laporte 2012) should not be interpreted in statistical terms or proved by searches in corpora, but just intuitively defined through the paraphrastic relation Na di Nb = Nb.

According to (Meydan 1996, p. 198), “the adjectival transformations with Napp (n.b. (Na di Nb)Q essere V-a = Il fisico di Lea è attraente “The body of Lea is attractive”) can be put in relation through four types of transformations”, which correspond also to the structures included in our network of sentiment FSA. The obligatoriness of the modifiers and the appropriateness of the nouns are reflected in these transformations (Laporte 1997):

28 “Every substantive Na for which, in a given syntactic position, Na of Nb = Nb is considered to be an appropriate substantive”. Author’s translation.

• a nominal construction Vsup Napp:
  Nb Vsup Na V-a
  Lea ha un fisico attraente “Lea has an attractive body”

• a restructured sentence in which the GN subject is split into two independent constituents:
  Nb essere V-a Prep Na
  Lea è attraente (per il suo + di) fisico “Lea is attractive for her body”

• a metonymic sentence in which the Napp is erased:
  Nb essere V-a
  Lea è attraente “Lea is attractive”

• a construction in which the Napp is adverbialized:
  Nb essere Na-mente V-a
  Lea è fisicamente attraente “Lea is physically attractive”

The concept of appropriate noun has a crucial importance not only in these adjectival expressions, but also when it appears as object of psychological predicates (see Section 3.3.4), as exemplified by (Mathieu 1999b, p. 122):

N0 V (Dét N de Nhum)1 = N0 V (Nhum)1

Les soucis rongent l’esprit de Marie = Les soucis rongent Marie
“Worries torment the spirit of Mary = Worries torment Mary”

Moreover, in the Sentiment Analysis field, where the identification and the classification of the features of the opinion object even constitute a whole subfield of research (see Section 5.2), the Napp becomes a very advantageous linguistic device for the automatic feature analysis. See for example (32), in which Na (Napp) is the feature and Nb (N-um) is the object of the opinion.

(32) [Opinion [Feature (La risoluzione + L’obiettivo)] della [Object fotocamera] è eccellente[+3]]
“(The image resolution + The lens) of the camera is excellent”

(32a) [Opinion [Object La fotocamera] ha una [Feature (risoluzione + obiettivo)] eccellente[+3]]
“The camera has an excellent (image resolution + lens)”

(32b) [Opinion [Object La fotocamera] è eccellente[+3]]
“The camera is excellent”

(32c) [Opinion [Object La fotocamera] è eccellente[+3] per [Feature (la sua risoluzione + il suo obiettivo)]]
“The camera is excellent for its (image resolution + lens)”

A useful classification of the nouns that can play the role of Napp, for both human and not human nouns, and that can be exploited for the opinion feature classification, is the following (Meydan 1996):

Napp of Num
  Npc = body parts
  Npabs = personality trait
  Ncomport = attitudes
  Nprédr = intentions

Napp of N-um
  Npo = shape
  Nprop = ability
  Dnom = model


3.3.5.3 Verbless Structures

Predicativity is not a property necessarily possessed by a particular class of morpho-syntagmatic structures (e.g. verbs, which carry information concerning person, tense, mood and aspect); instead, it is determined by the connection between elements (Giordano and Voghera 2008, De Mauro and Thornton 1985). Also on the basis of their frequency in written and spoken corpora and in informal and formal speech, together with Giordano and Voghera (2008) we consider verbless expressions as syntactically and semantically autonomous sentences, which can be coordinated, juxtaposed, and can introduce subordinate clauses, just like verbal sentences.

Among the verbless sentences available in the Italian language, we are interested here in those involving adjectives indicating appreciation (Agg val), e.g. Bella questa “Good one” (Meillet 1906, De Mauro and Thornton 1985). Below we report a selection of customer reviews from the dataset described in Section 5.1.3, which can give an idea of the diffusion of the verbless constructions in user generated contents:

• Movie Reviews: [Molto bravi gli interpreti] [la mia preferita la sorella combina guai] [Bella la fotografia] [i colori]²⁹

• Car Reviews: [Ottimo l’impianto radio cd] [comodissimi i comandi al volante]³⁰

• Hotel Reviews: [Posizione fantastica] si raggiungono a piedi molti luoghi strategici di Londra [molto curato nel servizio e nel soddisfare le nostre richieste] [ottimo servizio in camera]³¹

• Videogame Reviews: [Provato una volta] e [subito scartato] [odioso il modo in cui viene recensito]³²

• Smartphone Reviews: [Utilissimo il sistema di spegnimento e riaccensione ad orari programmati] [Telefonia e sensibilità OK]³³

• Book Reviews: [Azzeccatissima la descrizione anche psicologica e introspettiva dei personaggi]³⁴

29 http://www.mymovies.it/film/2013/abouttime/pubblico/?id=680439
30 http://auto.ciao.it/Nuova_Fiat_500__Opinione_1336287/SortOrder/1
31 http://www.tripadvisor.it/ShowUserReviews-g186338-d651511-r120264867-Haymarket_Hotel-London_England.html

3.3.5.4 Evaluation Verbs

In this Paragraph we mention the use of the verbs of evaluation Vval (Elia et al. 1981), which represent a subclass of the LG class 43, grouped together through the acceptance of at least one of the properties N1 = N1 Agg1 and N1 = Agg1 Ch F. Examples are giudicare “to judge”, trovare “to consider”, avvertire “to notice”, valutare “to evaluate”, etc. Of course, the N1 Agg here can include an Napp that takes the shape of (Na di Nb)1 Agg, just as happens with the psychological predicates of Mathieu (1999b).

They present a different role attribution with respect to support verb constructions, since the N0 of the evaluative verbs is always an Opinion Holder (33).

(33) [Opinion [Opinion Holder Maria] considera ([Target Arturo] affascinante[+2] + affascinante[+2] [Target che Arturo venga])]
“Maria considers (Arturo fascinating + enchanting that Arturo comes)”

32 httpwwwamazonitreviewRK6CGST4C9LXI
33 httpwwwamazonitAlcatel-Touch-Smartphone-Dual-Italiaproduct-reviewsB00F621PPGpageNumber=3
34 httpwwwamazonitproduct-reviewsB007IZ1Q5SpageNumber=3

102CHAPTER 3 SENTITA LEXICON-GRAMMAR BASED SENTIMENT RESOURCES

The N0 of support verb constructions, instead, is always the target of the opinion (34). The Opinion Holder remains here unexpressed.

(34) [Opinion [Target Arturo] (è + rimane + sembra) davvero[+] affascinante[+2]]
“Arturo (is + remains + seems) truly fascinating”

In causative constructions the roles are again different (35).

(35) [Opinion [Cause (Maria + L’acconciatura)] rende [Target Arturo] davvero[+] affascinante[+2]]
“(Maria + The hairstyle) makes Arturo truly fascinating”
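The role attributions in (33), (34) and (35) can be summarised as a small lookup. The following Python sketch is purely illustrative: the construction labels and the function name are hypothetical and do not belong to SentIta or NooJ.

```python
# Hypothetical labels for the three constructions discussed in (33)-(35).
ROLE_OF_N0 = {
    "evaluation_verb": "Opinion Holder",  # Maria considera Arturo affascinante
    "support_verb": "Target",             # Arturo è davvero affascinante
    "causative": "Cause",                 # Maria rende Arturo davvero affascinante
}

def assign_roles(construction, n0, n1=None):
    """Return a role -> filler mapping for a simple evaluative sentence."""
    roles = {ROLE_OF_N0[construction]: n0}
    if construction in ("evaluation_verb", "causative"):
        # In both cases the complement N1 is the target of the opinion.
        roles["Target"] = n1
    else:
        # With support verbs the Opinion Holder is left unexpressed.
        roles["Opinion Holder"] = None
    return roles

print(assign_roles("evaluation_verb", "Maria", "Arturo"))
print(assign_roles("support_verb", "Arturo"))
print(assign_roles("causative", "L'acconciatura", "Arturo"))
```

The only point the sketch makes is that the same surface position (N0) receives a different semantic role depending on the construction type.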

3.3.6 Vulgar Lexicon

The fact that “taboo language is emotionally powerful” (Zhou 2010, p. 8) cannot be ignored during the collection of lexical and grammatical resources for Sentiment Analysis purposes.
SentIta is provided with a collection of 182 bad words that include the following grammatical and semantic subcategories:

• Nouns: 115 entries, among which:

  – 30 are human nouns (e.g. rompiballe “pain in the ass”);

  – 9 are nouns of body parts (e.g. cazzo “dick”);

• Verbs: 45 entries, among which:

  – 17 are pro-complementary and pronominal verbs (e.g. 4 +PRXCOMP, fottersene “to not give a fuck”, and 13 +PRX, incazzarsi “to get mad”);

  – 17 allow the derivation of adjectives in -bile “-able” (e.g. trombabile “bangable” from trombare “to screw”);

  – 4 are already included in the LG class 41 (e.g. fregare “to matter”);

• Adjectives: 16 entries, among which 10 are from SentIta (e.g. 5 +POS, cazzuto “die-hard”, and 5 +NEG, scoglionato “annoyed”);

• Adverbs: 4 entries (e.g. incazzosamente “grumpily”);

• Exclamations: 8 entries (e.g. vaffanculo “fuck off”).

Figure 3.3: Frozen and semi-frozen expressions that involve the vulgar word cazzo and its euphemisms


Although some vulgar terms and expressions, e.g. with the functions of interjections or discourse markers, can be semantically empty, most of them have metaphorical functions that are relevant for the correct analysis of the sentiment in real texts. Examples are offense, imprecation, intensification of negation, etc.
The scatological and sexual semantic fields are among the most frequent in our lexicon. The masculine and feminine nouns derived from human body parts can have different kinds of orientations and functions, which range from the insult (coglione “asshole”) to the praise (figata “cool thing”).
In accordance with the observations made by Velikovich et al. (2010) on web corpora, our lexicon involves a set of racial or homophobic derogatory terms (e.g. terrone³⁵ “southern Italian”, culattone “faggot”).
The vulgar words in the SentIta database, especially the ones with an uncertain polarity, can be part of frozen or semi-frozen expressions that can make clear, for each occurrence, the actual semantic orientation of the words in context. A very typical Italian example is cazzo “dick”, with its more or less vulgar regional variants (e.g. minchia, pirla) and euphemisms (e.g. cavolo “cabbage”, cacchio “dang”, mazza “stick”, tubo “pipe”, corno “horn”, etc.). Figure 3.3 exemplifies the cases in which the context gives the word(s) under examination a negative connotation. In detail, the FSA includes negative adverbial and adjectival functions (paths 1, 2, 5, 7), exclamations (paths 3, 6, 9), interrogative forms (path 6) and intensification of negations (paths 8, 9). Path (4) also exemplifies the frozen sentence from the LG class CAN (N0 V (C1 di N) = N0 V C1 a N2) that presents cazzo (Vietri 2011) as C1 frozen complement. This frozen sentence, in particular, is also related to some monorematic compound nouns and adjectives included in the bad word list of SentIta.
However, a Sentiment Analysis tool that works on user generated contents must be aware of the fact that, in colloquial and informal situations, a word like cazzo can work simply as a regular intensifier, also for positive sentences (e.g. è così bello[+2] cazzo[+] [+3] “it’s fucking nice”). This is why it received in the dictionary just the tag attributed to intensifiers (+FORTE). In order to be annotated in texts with other orientations, it must occur as a component of specific expressions.
As we can see in Table 3.21, although the majority of the entries in the bad words dictionary possess a negative connotation, 10% of them are positive (e.g. strafigo “supercool”). Moreover, the manual annotation of the whole list of bad words with sentiment tags confirmed the budding concept of the emotional power of taboo language: 84% of the vulgar lexicon is endowed with a defined semantic orientation (see Table 3.22).

35 Term used with insulting purposes only by Northern Italians.

Score   Tag           Bad Words Entries
+3      +POS+FORTE      6
+2      +POS           12
+1      +POS+DEB        0
-1      +NEG+DEB        8
-2      +NEG          125
-3      +NEG+FORTE      6
+       +FORTE          3
-       +DEB            0

Table 3.21: SentIta tag distribution in the list of bad words

Bad Words       Entries   Percentage (%)
tot pos            18          10
tot neg           132          73
tot int             3           2
tot oriented      153          84
tot neutral        29          16
tot BW            182         100

Table 3.22: Bad Words percentage values in SentIta
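As a quick check of the arithmetic, the percentages of Table 3.22 can be recomputed from its entry counts. The snippet below is only a sketch; the variable names are ours, and rounding to the nearest integer is an assumption about how the table was produced.

```python
# Entry counts as reported in Table 3.22.
counts = {"pos": 18, "neg": 132, "int": 3, "neutral": 29}
total = sum(counts.values())            # all bad words
oriented = total - counts["neutral"]    # entries with a semantic orientation

def pct(n, whole=total):
    """Percentage of `whole`, rounded to the nearest integer."""
    return round(100 * n / whole)

shares = {name: pct(n) for name, n in counts.items()}
shares["oriented"] = pct(oriented)
print(total, oriented, shares)
```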


Collateral Benefits of Exhaustive Vulgar Lexical and Grammatical Resources

Web 2.0 gives Internet users the opportunity to freely share almost everything in online communities, hiding their own identity behind the shelter of anonymity.
Flaming, trolling, harassment, cyberbullying, cyberstalking and cyberthreats are all terms used to refer to offensive contents on the web. The shapes can be different and the focus can be on various topics, e.g. physical appearance, ethnicity, sexuality, social acceptance, etc. (Singhal and Bansal 2013), but they are all annoying (when not even dangerous, if the victims are children or teenagers) for the same reason: they make the online experience unpleasant.
The detection and filtering of offensive language on the web is strongly connected to the Sentiment Analysis field when one decides to handle it automatically. The occurrence of derogatory terms and phrases, or the unjustified repetition of words with a strongly negative connotation, can effectively help with this task.
Although taboo language is generally considered to be the strongest clue of harassment on the web (Xiang et al. 2012; Chen et al. 2012; Reynolds et al. 2011; Xu and Zhu 2010; Yin et al. 2009; Mahmud et al. 2008), it must be clarified that the presence of bad words in posts does not necessarily indicate the presence of offensive behaviors.
We already explained that the words in our collection of vulgar terms are, in some cases, neutral or even positive. Profanity can be used with comical or satirical purposes, and bad words are often just the expression of strong emotions (Yin et al. 2009).
Thus, the well-known censoring approaches based only on keywords, which simply match the words appearing in text messages with offensive words stored in blacklists, cannot be expected to reach high levels of accuracy.
Consistent with this idea, in recent years many studies on offensive language, cyberbullying and flame detection have integrated the context of bad words into their methods and tools.


Chen et al. (2012) exploited a Lexical Syntactic Feature (LSF) architecture to detect offensive content and identify potential offensive users in social media, introducing the concept of syntactic intensifier to adjust words’ offensiveness levels based on their context.
Xiang et al. (2012) learned topic models from a dataset of tweets through the Latent Dirichlet Allocation (LDA) algorithm. They combined topical and lexical features into a single compact feature space.
Xu and Zhu (2010) proposed a sentence-level semantic filtering approach that combined grammatical relations with offensive words in order to semantically remove all the offensive content from sentences.
Insulting phrases (e.g. get-a-life, get-lost, etc.) and derogatory comparisons of human beings with insulting items or animals (e.g. donkey, dog) were clues used by Mahmud et al. (2008) to locate flames and insults in text.
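The limit of keyword-only censoring can be illustrated with a toy comparison. The sketch below does not reproduce any of the systems cited above: the words are taken from the examples of this section, while the scores and the two function names are illustrative assumptions.

```python
import re

# Toy lexicon: the scores are illustrative, not the actual SentIta values.
VULGAR_POLARITY = {"coglione": -2, "figata": 2, "strafigo": 3}

def tokens(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def blacklist_flag(text):
    """Keyword censoring: flag any message that contains a listed word."""
    return any(tok in VULGAR_POLARITY for tok in tokens(text))

def polarity_flag(text):
    """Flag a message only if a listed word has a negative orientation."""
    return any(VULGAR_POLARITY.get(tok, 0) < 0 for tok in tokens(text))

praise = "Che figata questo film"
insult = "Sei un coglione"
print(blacklist_flag(praise), polarity_flag(praise))
print(blacklist_flag(insult), polarity_flag(insult))
```

The blacklist flags both messages, whereas the polarity-aware check flags only the insult. A realistic filter would also need the contextual rules discussed in this section.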

3.4 Towards a Lexicon of Opinionated Compounds

3.4.1 Compound Adverbs and their Polarities

Elia (1990) adapted to the Italian language the formal definition of “adverb” of Jespersen (1965), who for the first time also included special kinds of phrases and sentences in the category of adverbs. In detail, the structures included in this category are the following:

• traditional adverbs with a monolithic shape (e.g. volentieri “gladly”);

• traditional adverbs in -mente (e.g. impetuosamente “impetuously”);

• prepositional phrases (e.g. a vanvera “randomly”);

• phrases derived from sentences with finite or non-finite verbs (e.g. da levare il fiato “to take the breath away”).

The first two types of adverbs will be treated separately in Section 3.5.2, since they have been automatically derived from the SentIta adjective dictionary.
Compound adverbs, or frozen or idiomatic adverbs (the last two points of Jespersen’s list), can be defined as

“(...) adverbs that can be separated into several words, with some or all of their words frozen, that is, semantically and/or syntactically noncompositional” (Gross 1986, p. 2).

Therefore, just as happens in polyrematic expressions, the syntactic elements that compose such adverbs cannot be freely shifted, substituted or deleted. This holds despite the fact that, from a syntactic point of view, they work just as simple adverbs.
Following the LG paradigm, and according to their internal morphosyntactic structure, compound adverbs have been classified in many languages, namely English (Gross 1986), French (Gross 1990; Laporte and Voyatzi 2008; Tolone and Stavroula 2011), Italian (Elia 1990; Elia De Gioia 1994a,b, 2001), Modern Greek (Voyatzi 2006), Portuguese (Baptista 2003), etc.
The LG description and classification of compound adverbs represents the intersection of two criteria, corresponding to the phenomenon of multiword expressions and to the adverbial functions (Tolone and Stavroula 2011; Laporte and Voyatzi 2008).
According to Laporte and Voyatzi (2008, p. 32):

“(...) a phrase composed of several words is considered to be a multiword expression if some or all of its elements are frozen together, that is, if their combination does not obey productive rules of syntactic and semantic compositionality.”


The adverbial functions, instead, pertain to the adverbial roles, which can take the shape of circumstantial complements or of complements which are not objects of the predicate of the clause in which they appear.
In LG descriptions, compound adverbs are formalized as a sequence of parts of speech and syntactic categories. Their morpho-syntactic structures are described on the basis of the number, the category and the position of the frozen and free components of the adverbials.
The Italian classification system for compound adverbs follows the same logic as the French one, with which it shares a strong similarity in terms of syntactic structures.
In this research, among the 16 classes of idiomatic adverbs selected by De Gioia (1994a,b), we worked on the ones reported in Table 3.23. The comparative structures with come “like”, treated by Vietri (1990, 2011) as idiomatic sentences, will be described in Section 3.4.2.
The adverbs belonging to the classes described in the first 9 rows of Table 3.23 have been manually selected and evaluated starting from the electronic dictionaries and the LG tables of Elia (1990), while the classes PF, PV and PCPN come from the LG tables of De Gioia (2001). In the first classes, Elia (1990) recognized the following three abstract structures:

1. Prep (E+Det) (E+Agg) C1 (E+Agg)

   (a) PC

   (b) PDETC

   (c) PAC

   (d) PCA

2. Prep (E+Det+Agg) C1 (E+Agg) Prep (E+Det+Agg) C2 (E+Agg)

   (a) PCDC

   (b) PCPC

3. (E+Prep) (E+Det) C1 (E+Cong+Prep) (E+Det+Agg) C2

   (a) PCONG

   (b) CC

   (c) CPC

Class   Structure          Example                  Translation
PC      Prep C             controvoglia             unwillingly
PDETC   Prep Det C         con le cattive           by any means necessary
PAC     Prep Agg C         a pieni voti             with flying colors
PCA     Prep C Agg         a testa alta             with your head held high
PCDC    Prep C di C        con cognizione di causa  with full knowledge
PCPC    Prep C Prep C      di bene in meglio        from good to better
PCONG   (Prep) C Cong C    d’amore e d’accordo      in harmony
CC      C C                niente male              nothing bad
CPC     C Prep C           l’ira di Dio             a king’s ransom
PF      Compound Sentence  si salvi chi può         every man for himself
PV      Prep V W           seduta stante            on the spot
PCPN    Prep C Prep N      in onta a                in spite of

Table 3.23: Classes of Compound Adverbs in SentIta
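The abstract structures above use the LG convention in which E is the empty element, so a slot such as (E+Det) is optional. As a minimal sketch (the function and variable names are ours, not part of the LG tables), the following Python code expands the first structure into its concrete sequences and checks that the four classes it groups are among them. The class patterns are taken from Table 3.23.

```python
from itertools import product

def expand(structure):
    """Expand a list of slots into all concrete sequences.

    Each slot is a list of alternatives; "E" marks the empty element,
    so a slot such as ["E", "Det"] is optional.
    """
    sequences = set()
    for choice in product(*structure):
        sequences.add(tuple(item for item in choice if item != "E"))
    return sequences

# Structure 1: Prep (E+Det) (E+Agg) C1 (E+Agg)
structure_1 = [["Prep"], ["E", "Det"], ["E", "Agg"], ["C1"], ["E", "Agg"]]

named_classes = {
    "PC": ("Prep", "C1"),
    "PDETC": ("Prep", "Det", "C1"),
    "PAC": ("Prep", "Agg", "C1"),
    "PCA": ("Prep", "C1", "Agg"),
}

expansions = expand(structure_1)
for name, sequence in named_classes.items():
    print(name, sequence in expansions)
```

The abstract formula licenses more sequences than the four named classes, which is why the classification lists the attested patterns explicitly.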

Table 3.24 shows the distribution of each one of the evaluation (POS/NEG), intensity (FORTE/DEB) and discourse tags³⁶ (SINTESI, NEGAZIONE, etc.) in every class of compound adverb.
As we can see in the two lower sections of Table 3.24, the percentage of compound adverbs that possess an orientation, or that are significant for Sentiment Analysis purposes, is 51% of the total amount of compound adverbs under exam. Moreover, it must be observed that 60% of the labels are connected to intensity, while evaluation tags account for only 30%.
The presence of discourse tags seems to be marginal, but if compared with other parts of speech (or even with the -mente and monolithic adverbs, see Section 3.5.2), it reveals its significance.

3.4.1.1 Compound Adjectives of Sentiment

A great part of the polyrematic units with an adverbial function can also play the role of adjectives:

Adverbial function: agire a fin di bene “to act for good”

Adjectival function: una bugia a fin di bene “a white lie”

For simplicity, we did not distinguish the cases in which they possess just one function from the cases in which they can have both roles. Separating these two groups, however, is often impossible if one also considers their invariability and the fact that their function depends on semantic features rather than on morphological ones (Voghera 2004).

36 These tags annotate 70 discourse operators able to summarize (e.g. in parole povere “in simple terms”), invert (e.g. nonostante ciò “nevertheless”), confirm (e.g. in altri termini “in other terms”), compare (e.g. allo stesso modo “in the same way”) and negate (e.g. neanche per sogno “in your dreams”) the opinion expressed in the previous sentences of the text.

AVV+AVVC+PC+PN without adjectival function: in conclusione “in conclusion”

AVV+AVVC+PC+PN with adjectival function: a vanvera (e.g. parlare a vanvera “to prattle”, un discorso a vanvera “a prattle”)

Therefore, we did not make any distinction in the dictionaries, in order to avoid the duplication of the entries, but we also recalled them in the grammars dedicated to adjectives as noun modifiers and to constructions of the kind N0 (essere + E) Agg Val (see Section 3.3.5).

Tag                     PC    CC   CPC   PAC   PCA  PCDC PCONG  PCPC PDETC    PF    PV  PCPN   Tot
POS+FORTE                0     6     0     4     2     0     2     1     5     1     0     0    21
POS                      1     5     2     9    15    10     1     0    14     0     0     0    57
POS+DEB                  0     0     0     3     2     2     2     0     5     1     0     0    15
NEG+DEB                  0     0     1     0     4     0     1     3     7     3     1     0    20
NEG                     36    15     6    15     5     6     2     5    19     7     0     5   121
NEG+FORTE                0     0     0     1     1     1     2     2     3     1     1     0    12
FORTE                  102    40     5    49    26    13    35    18    76     8     5     0   377
DEB                     23    13     1    14     4     4     5     2     6     1     0     0    73
SINTESI                 13     2     0     6     1     2     0     0     1     0     5     0    30
CONFERMA                 5     2     0     0     0     0     0     0     5     0     0     1    13
INVERTE                  4     2     0     8     0     0     0     0     7     0     1     0    22
COMP                     3     0     0     1     0     0     0     0     1     0     0     0     5
NEGAZIONE                3     0     0     0     0     0     0     2     1     0     1     1     8
Tot                    190    85    15   110    60    38    50    33   150    22    14     7   774
Tot Class              872   346   104   316   320   248   132   127   621   202   124    53  3321
Percentage (%)          22    25    14    35    19    15    38    26    23    11    11    13    51
Evaluation tags (%)     19    31    60    29    48    50    20    33    35    59    14    71    32
Intensity tags (%)      66    62    40    57    50    45    80    61    55    41    36     0    58
Discourse tags (%)      15     7     0    14     2     5     0     6    10     0    50    29    10

Table 3.24: Composition of the compound adverb dictionary

3.4.2 Frozen Expressions of Sentiment

In this thesis, oriented words are always considered in context. In this paragraph we will focus on the importance that frozen sentences can have in an LG-based module for Sentiment Analysis.
Traditional and generative grammars generally treat idioms as aberrations, and many taggers and corpus linguistic tools just ignore multiword units and frozen expressions (Vietri 2004; Silberztein 2008).
With Gross (1982), the LG paradigm started the systematic and formal study of frozen sentences, underlining their non-exceptional nature from both the lexical and the syntactic points of view. Indeed, idioms occupy in the lexicon a volume that is comparable with the one of the free forms (Gross 1982).
This kind of approach opens plenty of possibilities in the field of Sentiment Analysis, in which the collection of lexical resources endowed with a defined polarity cannot just run aground on simple words, but needs to be opened also to different kinds of oriented multiword expressions and frozen sentences.
According to Vietri (2014b, p. 32):

(...) a verb, when co-occurring with a certain noun (or a set of nouns), produces a “special meaning” that it would not have been assigned if the noun was substituted. This type of meaning is conventionally defined “non-compositional”, but this does not automatically imply, from a syntactic point of view, that idioms are “units”.

On the basis of this statement, strictly connected to the distributional properties of idiomatic sentences, we included in our sentiment database more than 500 Italian frozen sentences that have been manually evaluated and then formalized with dictionary-grammar pairs. This choice is connected to the purpose of lemmatizing them in lexical databases while taking into account, through FSA, also their syntactic flexibility and lexical variation. Treating idioms computationally means dealing with the problems they pose in terms of the relationship between “interpretation” and “syntactic constructions” (Vietri 2014b).
The source of the work is the Lexicon-Grammar description of idioms, which takes the shape of binary tables in which lexical entries are described on the basis of syntactic and semantic properties.
The formal notation is the traditional one used in the LG framework. The symbol C is used in frozen sentences to indicate a fixed nominal position that cannot be substituted by different items belonging to the same class or semantic field and that cannot be morpho-grammatically modified (Vietri 2004).
The Lexicon-Grammar classification of Italian idioms includes more than 30 classes of idioms with ordinary verbs and support verbs. The most common support verbs are avere “to have”, essere “to be” and fare “to make”. They differ from their ordinary counterparts because of their semantic emptiness. When they are involved in the formation of idiomatic constructions, they present a very high lexical and syntactic flexibility, which can take the shape of the following phenomena (Vietri 2014d):

• the alternation of support verbs with aspectual variants (e.g. rimanere “to remain”, diventare “to become”);

• the production of causative constructions (e.g. with verbs like rendere “to make”);

• the deletion of the support verb (e.g. N0 Agg come C1: una donna astuta come una volpe “a woman as sharp as a tack”).

Because of the complexity of their lexical and syntactic variability, which often affects the intensity of the idiom itself, we chose to restrict the sample of idioms under examination for Sentiment Analysis purposes to the frozen sentences with the support verb essere that include adjectives in their structure, namely PECO, CEAC, EAA, ECA and EAPC. Since a large part of the idioms belonging to this subgroup contains adjectives evaluated in SentIta, we found it interesting to compare the prior polarity of the adjectives involved in the idioms with the polarity assigned to the idioms themselves. The results of this comparison are reported in Table 3.25.

                           PECO   CEAC   EAA   ECA   EAPC   Tot
Idioms in the LG Class      274     36    40   153    162   665
Idioms in SentIta           245     27    32   134    139   577
Adj of SentIta in Idioms    133     12    15    37     56   253

Table 3.25: Polar adjectives in frozen sentences

3.4.2.1 The Lexicon-Grammar Classes Under Examination: PECO, CEAC, EAA, ECA and EAPC

Vietri (1990) showed that the comparative frozen sentences of the type N0 Agg come C1 (PECO) usually intensify the polarity of the adjective of sentiment they contain, as happens in (36), in which the SentIta adjective bello, endowed with a polarity score of +2, changes its polarity to +3 when it occurs in the idiom Essere bello come il sole.

(36) Maria è bella[+2] come il sole [+3]
“Maria is as beautiful as the sun”

Actually, it is also possible for an idiom of this sort to be polarized when the adjective it contains (e.g. bianco “white”) is neutral (37), or even to reverse the polarity of the adjective, as happens in (38) (agile “agile” is positive).

(37) Maria è bianca[0] come un cadavere [-2]
“Maria is as white as a dead body” (Maria is pale)

(38) Maria è agile[+2] come una gatta di piombo [-2]
“Maria is as agile as a lead cat” (Maria is not agile)

In this regard, it is interesting to notice that 84% of the idioms have a clear SO, while just 36% of the adjectives they contain are polarized (see the examples in Table 3.26). This confirms the significance of a sentiment lexicon that also includes frozen expressions in its list. Indeed, a resource of this sort is able to disambiguate the cases in which the polarity of sentiment (or neutral) words is changed by the ironic nature of the idioms in which they occur (38), (39), (40).
There is only one disambiguation rule, which consists in always annotating the longest match during text analysis.

(39) Maria è alta[0] come un soldo di cacio [-2]
“Maria is knee-high to a grasshopper” (Maria is short)

(40) Maria è asciutta[0] come l’esca [-2]
“Maria is on the bones of her arse” (Maria is penniless)
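The longest-match rule can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the lexicon is a toy dictionary keyed by token sequences, with scores taken from examples (36) and (38). The actual resources rely on NooJ dictionaries and grammars rather than on this kind of lookup, and the function name is ours.

```python
# Toy lexicon keyed by token tuples; scores follow examples (36) and (38).
LEXICON = {
    ("bella",): 2,
    ("agile",): 2,
    ("bella", "come", "il", "sole"): 3,
    ("agile", "come", "una", "gatta", "di", "piombo"): -2,
}
MAX_LEN = max(len(key) for key in LEXICON)

def annotate(tokens):
    """Greedy left-to-right annotation that always prefers the longest match."""
    annotations, i = [], 0
    while i < len(tokens):
        for size in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            span = tuple(tokens[i:i + size])
            if span in LEXICON:
                annotations.append((" ".join(span), LEXICON[span]))
                i += size
                break
        else:
            i += 1  # no entry starts at this position, move on
    return annotations

print(annotate("maria è agile come una gatta di piombo".split()))
print(annotate("maria è agile".split()))
```

In the first sentence the idiom wins over the simple adjective, so the positive prior polarity of agile never surfaces. In the second sentence only the adjective is found and its prior polarity is kept.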

Frozen Sentence                         Translation                      Adj Polarity   Adj Score   Idiom Polarity   Idiom Score
essere agile come una gatta di piombo   “to be as agile as a lead cat”   POS            +2          NEG              -2
essere agile come una gazzella          “to be as agile as a gazelle”    POS            +2          POS              +2
essere agile come uno scoiattolo        “to be as agile as a squirrel”   POS            +2          POS              +2
essere astuto come il demonio           “to be as clever as the devil”   -              0           POS              +2
essere astuto come una volpe            “to be as sharp as a tack”       -              0           POS              +2
essere astuto come uno zingaro          “to be as cunning as a gypsy”    -              0           NEG              -2

Table 3.26: Examples of idioms that maintain, switch or shift the prior polarity of the adjectives they contain

Other idioms included in our resources are the ones belonging to the class EAPC, which involves the presence of an adjective (or a verb in its past participle form) and a prepositional complement. These frozen expressions usually intensify the polarity of the adjective/past participle (see example below) but, just as the PECO ones, they can also switch it or shift it.

(41) Maria è matta[-2] da legare [-3]
“Maria is so crazy she should be locked up”

The LG class EAA, with definitional structure N0 essere Agg e Agg, includes two adjectives that can be polarized or not. Example (42) shows that, in this case as well, the sentence polarity can be opposite to that of the adjectives.

(42) Maria è bella[+2] e fritta[0] [-2]
“Maria is cooked”

An example from the class CEAC is reported in (43).

(43) L’anima è nera[0] come il carbone [-3]
“The soul is black like the coal”

Among the transformations that these idioms can undergo, we included N0 avere C Agg (44), which is also the most frequent one.

(44) Maria ha l’anima nera[0] come il carbone [-3]
“Maria has a black soul like the coal”

Finally, we formalized the LG class ECA, whose frozen elements are the nominal element C1 and the adjective.

(45) Maria è una gatta morta[0] [-2]
“Maria is a cock tease”

3.4.2.2 The Formalization of Sentiment Frozen Expressions

Due to their syntactic and lexical variation and to their discontinuity, frozen expressions need to be formalized in an electronic context able to systematically list them in electronic databases and to take into account their syntactic variability. Basically, the final purpose is to take advantage of the large amount of information listed in the LG matrices while recognizing and annotating idioms in real texts.
One of the best solutions is to link dictionaries, which contain the lists of Atomic Linguistic Units (ALU) and their syntactic and semantic properties, with syntactic grammars, which can make such words and properties interact in an FSA context (Silberztein 2008). ALU are “the smallest elements that make up the sentence, i.e. the non-analyzable units of the language” (Silberztein 2003). They can take the shape of simple words, affixes, multiword expressions and frozen expressions.
What distinguishes frozen expressions from the other ALU is discontinuity: as displayed in Figure 3.4, it is in fact possible for adverbs to occur between the verb and the direct object (Vietri 2004). That is why frozen sentences need both dictionaries and FSA to be recognized and annotated correctly: the dictionaries allow the recognition of idioms thanks to the characteristic components (word forms or sequences that occur every time the expressions occur and trigger the recognition of the frozen expressions in texts (Silberztein 2008)), and FSA let the machine compute them despite the many different forms that they can assume in real texts.
Figure 3.4 represents a simplified version of the main graph of the FSA for the recognition and the sentiment annotation of idioms of the LG class PECO. It includes a metanode that allows the recognition of a nominal group in subject position (N0), an embedded graph for the verb and six nodes related to different Semantic Orientations.

Figure 3.4: Extract of the FSA for the SO identification of idioms

The instruction XREF is used to exclude the linguistic material that can occur between the support verb and the idiom object when the annotation is performed. Figure 3.5 shows, as an example, the content of the metanode that annotates frozen expressions with +3 as Semantic Orientation (MOLTOPOS). Here we observe the interactions between the idiom dictionary and the restrictions reported in the syntactic grammar. In its structure, the grammar under examination looks exactly the same as the grammar for the identification of free sentences with the same syntactic shape. The difference lies in the syntactic and distributional constraints, which in the grammar recall only the properties and the lexical items associated in the dictionary with a specific characteristic component. To be more precise, the PECO dictionary, among others, lists the three entries shown below, which have been labeled with the semantic tags POS and FORTE (+3), as happens in the other subsets of SentIta.

Essere (forte + coraggioso) come un leone “To be (strong + brave) like a lion”

leone,N+Class=PECO
  +A=coraggioso+A=forte+AVV=come+AVV=quanto+DET=un
  +Sent=POS+Int=FORTE
  +N0um
  +Vsup=essere+Vsup=diventare+Vsup=restare+Vsup=rimanere
  +Vcaus=rendere
  +N0essereunC1+N0esserecomeC1+N0esserepiu

Essere buono come il pane “To be as good as bread”

pane,N+Class=PECO
  +A=buono+AVV=come+AVV=quanto+DET=il+PREP=di
  +Sent=POS+Int=FORTE
  +N0um
  +Vsup=essere+Vsup=diventare+Vsup=restare+Vsup=rimanere
  +Vcaus=rendere
  +N0esserediC1+N0esserecomeC1+N0esserepiu

Essere (astuto + furbo) come il demonio “To be as clever as the devil”

demonio,N+Class=PECO
  +A=astuto+A=furbo+AVV=come+AVV=quanto+DET=il
  +Sent=POS+Int=FORTE
  +N0um
  +Vsup=essere+Vsup=diventare+Vsup=restare+Vsup=rimanere
  +Vcaus=rendere
  +N0esserepiu


As the entries show, the characteristic components C (e.g. leone “lion”, pane “bread”, demonio “devil”) have been used as NooJ entries in the electronic dictionary and have been associated with the other components of the frozen expressions (e.g. +A, +AVV, +DET). Moreover, they have also been described with other information, referring to the following properties, borrowed from the already cited LG tables:

distributional: e.g. +N0um, +Vsup, +Vcaus

transformational: e.g. +N0essereunC1, +N0esserecomeC1, +N0esserediC1, +N0esserepiu
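To make the structure of such entries explicit, the following Python sketch splits one of them into lemma, category, valued features and bare properties. It assumes the usual "lemma,CATEGORY+feature" layout of the dictionary lines. The parser and its output format are our own illustration and are not part of NooJ.

```python
from collections import defaultdict

# The entry for the characteristic component "leone", written on one line.
ENTRY = (
    "leone,N+Class=PECO+A=coraggioso+A=forte+AVV=come+AVV=quanto+DET=un"
    "+Sent=POS+Int=FORTE+N0um"
    "+Vsup=essere+Vsup=diventare+Vsup=restare+Vsup=rimanere"
    "+Vcaus=rendere+N0essereunC1+N0esserecomeC1+N0esserepiu"
)

def parse_entry(line):
    """Split 'lemma,CAT+feature(=value)...' into its parts."""
    lemma, rest = line.split(",", 1)
    category, *features = rest.split("+")
    values = defaultdict(list)   # multi-valued features such as A or Vsup
    flags = set()                # bare properties such as N0um
    for feature in features:
        if "=" in feature:
            key, value = feature.split("=", 1)
            values[key].append(value)
        else:
            flags.add(feature)
    return {"lemma": lemma, "category": category,
            "values": dict(values), "flags": flags}

entry = parse_entry(ENTRY)
print(entry["values"]["A"])
print(sorted(entry["flags"]))
```

Keeping valued features as lists reflects the fact that the same feature (e.g. +A or +Vsup) can occur several times in one entry.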

All these properties are recalled in the idiom syntactic grammar in the form of lexical and grammatical constraints.
The restrictions +N0um and +Vsup, +Vcaus respectively refer to the distributional properties of the sentence subject, which can or cannot be human, and of the support verb essere, with its aspectual (Vsup) or causative (Vcaus) variants. While the support verbs restare and rimanere are always acceptable, diventare and rendere are unacceptable only in a few cases (Vietri 1990).
We chose to formalize the distributional restrictions in the syntactic grammars with the following instruction, written under each involved node and referring to a related variable (var):

<$var=$C$var>

It represents a constraint on the words that are allowed to appear in the nodes it restricts: these can only be the ones indicated in the dictionary, by the same variable var, for the entry C. As an example, the idiom Essere (forte+coraggioso) come un leone, which in the dictionary has only the adjectives forte “strong” and coraggioso “brave” as values for the variable +A, cannot accept in the grammar, in the variable A, adjectives other than these two if the variable goes with the restriction shown below:

<$A=$C$A>


The same happens with the aspectual and causative variants of the verb (Vietri 1990), with similar restrictions:

<$Vvar=$C$Vvar>
<$Vcaus=$C$Vcaus>
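The logic of these distributional constraints can be paraphrased as a membership test: a word is accepted in a node only if the entry of the characteristic component lists it under the same feature. The Python sketch below mimics this check outside NooJ. The data layout and the function name are assumptions made for illustration, while the feature values are copied from the dictionary entry for leone.

```python
# Feature values of the entry for the characteristic component "leone".
LEONE = {
    "A": {"coraggioso", "forte"},
    "AVV": {"come", "quanto"},
    "DET": {"un"},
    "Vsup": {"essere", "diventare", "restare", "rimanere"},
    "Vcaus": {"rendere"},
}

def satisfies(entry, variable, lemma):
    """Accept `lemma` in `variable` only if the entry lists it for that feature."""
    return lemma in entry.get(variable, set())

print(satisfies(LEONE, "A", "forte"))        # listed adjective
print(satisfies(LEONE, "A", "buono"))        # adjective of another idiom
print(satisfies(LEONE, "Vcaus", "rendere"))  # listed causative variant
```

In this way a single generic path of the grammar can serve every idiom of the class, since the lexical restrictions are read from the dictionary instead of being hard-coded in the graph.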

The symbol “=” is used in Figure 3.5 when the inflection of the word in the variable is permitted (e.g. for the adjective: essere furbi come il demonio). Otherwise, the symbol “=” is used.
We preferred, instead, to treat transformational properties differently from Vietri (1990). In order to reduce the size of the grammar, which here also contains the polarity and intensity information, we wrote the transformational constraints before the path starts, in the following form:

<$C$var>

This instructs the FSA that the specific path is allowed just for the idioms that present in the dictionary the property var for the entry C.
In the FSA in Figure 3.5, the PECO standard structure Essere Agg come C1 is recognized by path (2). Path (1) deletes the adjective if the idiom recognized by C possesses the specific transformational property +N0esserecomeC1. Below we report examples of idioms that satisfy this constraint; the ones preceded by an asterisk do not satisfy the restrictions.

essere come un leone “to be like a lion”
essere come il pane “to be like the bread”
essere come il demonio “to be like the devil”

In path (3), the grammar deletes both the adjective and the adverb come for the idioms endowed with the property +N0essereunC1 in the electronic dictionary.

essere un leone (di uomo) “to be a lion (man)”
essere un pane “to be a bread”
essere un demonio (di uomo) “to be a devil (man)”


Figure 3.5: Extract of the FSA for the extraction of PECO idioms


Unlike essere un pane, which is not acceptable at all, essere come il demonio and essere un demonio (di uomo) are acceptable sentences that change the polarity of the basic idiom, probably because they are related to another comparative sentence with opposite meaning and orientation, e.g. essere (brutto + cattivo) come il demonio “to be (ugly + evil) like the devil” (Vietri, 1990).
The last two paths refer, respectively, to the correlation of PECO with other sentence structures, such as N0 essere di C1 (4), indicated by the property +N0esserediC1, and N0 essere più Agg di C, summarized as +N0esserepiu in the dictionary and handled in path (5).

essere (fatto + E) di pane “to be made of bread”
essere (fatto + E) di leone “to be made of lion”
essere (fatto + E) di demonio “to be made of devil”

essere più (forte + coraggioso) di un leone “to be (stronger + braver) than a lion”
essere più buono del pane “to be better than bread”
essere più (astuto + furbo) del demonio “to be cleverer than the devil”

While only the first idiom is allowed to go through path (4), path (5) implies a property accepted by all the exemplified idioms.
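The way transformational properties open or close the optional paths of the FSA can be sketched as follows. This is only an illustration under stated assumptions: the names PATH_PROPERTY, IDIOMS and allowed_paths are hypothetical, the property sets encode only what the examples above suggest (the values for leone are assumed), and +N0esserecomeC1 is deliberately left unassigned because the sketch makes no assumption about it.

```python
# Path number -> transformational property that licenses it.
# Path (2), the standard structure, needs no property.
PATH_PROPERTY = {
    1: "N0esserecomeC1",  # adjective deleted
    3: "N0essereunC1",    # adjective and "come" deleted
    4: "N0esserediC1",    # N0 essere di C1
    5: "N0esserepiu",     # N0 essere più Agg di C
}

# Illustrative dictionary entries (assumed values, not the real SentIta data).
IDIOMS = {
    "pane": {"N0esserediC1", "N0esserepiu"},
    "leone": {"N0essereunC1", "N0esserepiu"},
    "demonio": {"N0essereunC1", "N0esserepiu"},
}


def allowed_paths(c_entry, idioms=IDIOMS):
    """Return the FSA paths that the idiom headed by `c_entry` may follow."""
    properties = idioms.get(c_entry, set())
    paths = {2}  # the standard PECO structure is always available
    for path, prop in PATH_PROPERTY.items():
        if prop in properties:
            paths.add(path)
    return sorted(paths)


if __name__ == "__main__":
    for entry in IDIOMS:
        print(entry, allowed_paths(entry))
```

The design mirrors the constraint <$C$var> placed before each path: the check is made once, on the dictionary entry, rather than on every node of the path.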

3.5 The Automatic Expansion of SentIta

The present Section discusses the automatic enlargement of the manually built electronic dictionaries of SentIta.
Our work took advantage of derivational linguistic clues that relate semantically oriented adjectives to quality nouns and to adverbs in -mente. The purpose is to make new words automatically inherit the semantic information associated with the adjectives to which they are morpho-phonologically related.
Furthermore, we used as morphological Contextual Valence Shifters (mCVS) a list of prefixes able to negate or to intensify/downtone the orientation of the words with which they occur.
We clarify in advance that the morphological method could also have been applied to Italian verbs, but we chose to avoid this solution because of the complexity of their argument structures. We decided, instead, to manually evaluate all the verbs described in the Italian Lexicon-Grammar binary tables, so that we could preserve the different lexical, syntactic and transformational rules connected to each of them.

3.5.1 Literature Survey on Sentiment Lexicon Propagation

The works on sentiment lexicon propagation follow three main research lines. The first one is grounded on the richness of the already existing thesauri, WordNet (Miller, 1995) among others. Although WordNet does not include semantic orientation information for its lemmas, semantic relations such as synonymy or antonymy are commonly used to automatically propagate the polarity, starting from a manually annotated set of seed words. However, this approach presents some drawbacks, such as the lack of scalability, the unavailability of enough resources for many languages (including Italian) and the difficulty of handling newly coined words, which are not yet contained in the thesauri. Furthermore, homographs that belong to different synsets can present a diversity of meanings and polarities (Silva et al., 2010).
The second line of research is based on the hypothesis that the words that convey the same polarity appear close to each other in the same corpus, so the propagation can be performed on the basis of co-occurrence algorithms.
Finally, the morphological approach employs morphological structures and relations for the assignment of prior sentiment polarities to unknown words, on the basis of the manipulation of the morphological structures of known lemmas.
In the following Paragraphs we briefly describe the most interesting contributions pertaining to these research areas.

3.5.1.1 Thesaurus Based Approaches

Kamps et al. (2004) investigated the graph-theoretic model of WordNet's synonymy relation and, using elementary notions from graph theory, proposed a measure for the automatic attribution of the semantic orientation to adjectives. All the words listed in WordNet have been collected and related on the basis of their occurrence in the same synset.
Although current WordNet-based measures of distance or similarity usually focus on taxonomic relations and, in general, include the antonymy relation in their algorithms, the authors preferred to use only the synonymy relation.
Hu and Liu (2004) chose to limit the lexicon construction to the adjectives occurring in sentences that contained one or more product features, due to their specific interest in product features.
In order to predict the semantic orientations of such adjectives, the authors proposed a simple and effective method grounded on the adjective synonym and antonym sets discovered by surfing the WordNet graphs.
Kim and Hovy (2004) proposed a system for the automatic detection of opinion holders, topics and features for each opinion, which contains two modules: one for the determination of the word sentiment and another for the sentence sentiment.
In their system architecture, the stage of interest for the task under examination is the one in which the word sentiment classifier individually calculated the polarity of all the sentiment-bearing words. Their basic approach started from a small number of seed words, sorted by polarity into a positive and a negative list. WordNet has been used to lengthen the initial list on the basis of two very intuitive insights: synonyms of polar words usually preserve the orientation of the original lemma, and antonyms of polar words reverse the orientation of the original lemmas.
Esuli and Sebastiani (2005) presented a semi-supervised learning method for the orientation identification of subjective terms, based on the classification of their glosses.
The authors started from a representative seed set built for both the positive and negative categories and from some lexical relations defined in WordNet (e.g. synonymy, hypernymy and hyponymy for the annotation of terms with the same orientation, and direct antonymy and indirect antonymy for the terms with opposite orientation). Every time a new term is added to the original ones, it becomes part of the training set for the learning phase, performed with a binary text classifier. All the glosses of a given term found in a machine-readable dictionary are converted into a vectorial form thanks to standard text indexing techniques.
Esuli and Sebastiani (2006b) tested the following approaches:

• a two-stage method in which a first classifier groups the terms into the Subjective or Objective categories and another classifier tags the Subjective terms as Positive or Negative;

• two other binary classifiers, based on learning algorithms, which categorise the terms belonging to the Positive/not-Positive categories and the ones fitting the Negative/not-Negative categories;

• a method that simply considers Positive, Negative and Objective as three categories with equal status.

Argamon et al. (2009) grounded on Appraisal Theory a method for the automatic determination of complex sentiment-related attributes (e.g. type and force). The authors applied supervised learning to WordNet glosses. The method started from small training sets of (positive or negative) seed words and then added to them new sets of terms collected in the WordNet graph along the synonymy and antonymy relations. This way, based on the WordNet glosses, each term can receive a vectorial representation thanks to standard text indexing techniques.
Dragut et al. (2010) based their research on a few basic assumptions:

• the semantic relations between words with known polarities can be used to enlarge sentimental word dictionaries;

• two words are related if they share at least one synset;

• each synset has a unique polarity;

• the deduction can be performed on just a few WordNet semantic relations, e.g. hyponymy, antonymy and similarity.

Thanks to these intuitions, given a sentimental seed dictionary, they deduced approximately an additional 50% of words endowed with a specific polarity on top of the electronic dictionary WordNet.
Hassan and Radev (2010) applied a Markov random walk model to a large word relatedness graph, with the purpose of producing a polarity estimate of the subjective words.
In detail, in their network two nodes are linked if they are semantically related. Among the sources of information used as indicators of the relatedness of words, we mention WordNet.
The hypothesis of the work is the following: a random walk that starts from a given word is more likely to first hit a word with the same semantic orientation than a word with a different orientation.
Paulo-Santos et al. (2011), with a semi-supervised approach, created a small polarity lexicon (3000+ entries, 84.86% accuracy) for the Portuguese language, having as input only a limited collection of resources: a set of ten seed words labelled as positive or negative, a common online dictionary, a set of extraction rules and an intuitive polarity propagation algorithm.
The authors implemented their task by converting the online dictionary into a directed graph in which nodes correspond to words and edges correspond to synonyms, antonyms or other semantic relations between words. Thus, they propagated the polarities of the known seed set to unlabelled words by applying a breadth-first graph traversal.
In their research, Maks and Vossen (2011) explored two approaches for the annotation of polarity (positive, negative and neutral) of adjective synsets in Dutch, using WordNet as lexical resource. The first approach is based on the translation of the English SentiWordNet 1.0 (Esuli and Sebastiani, 2006b) into Dutch and the respective transfer of the lemmas' polarity values. In the second approach, WordNet is used in combination with a propagation algorithm that starts with a seed list of synsets of known sentiment and extends the polarity values through WordNet, making use of its lexical relations.
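The common core of these thesaurus based methods, in which synonyms keep the polarity of a seed and antonyms reverse it, can be sketched with a breadth-first traversal. The code below is a generic illustration, not the algorithm of any of the cited works: the graph format, the relation labels "syn" and "ant" and the function name propagate_polarity are assumptions made for the example.

```python
from collections import deque


def propagate_polarity(graph, seeds):
    """Breadth-first propagation of +1/-1 labels over a lexical graph.

    graph: word -> list of (neighbour, relation), relation in {"syn", "ant"}
    seeds: word -> +1 (positive) or -1 (negative)
    Synonyms inherit the polarity of the source word, antonyms reverse it.
    A word keeps the first label that reaches it.
    """
    polarity = dict(seeds)
    queue = deque(seeds)
    while queue:
        word = queue.popleft()
        for neighbour, relation in graph.get(word, []):
            if neighbour in polarity:
                continue
            sign = 1 if relation == "syn" else -1
            polarity[neighbour] = sign * polarity[word]
            queue.append(neighbour)
    return polarity


if __name__ == "__main__":
    toy_graph = {
        "buono": [("ottimo", "syn"), ("cattivo", "ant")],
        "cattivo": [("pessimo", "syn")],
    }
    print(propagate_polarity(toy_graph, {"buono": 1}))
```

Letting the first label win is a simplification: the drawback noted above for homographs belonging to different synsets shows up here as conflicting labels arriving along different paths.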

3.5.1.2 Corpus Based Approaches

Baroni and Vegnaduzzo (2004) observed the co-occurrences of polar adjectives in subjective texts. The ranking of a large list of adjectives according to a subjectivity score has been implemented without any knowledge-intensive external resource (e.g. lexical databases, human annotation, etc.). The authors made use just of an unannotated list of adjectives and a small seed set of manually selected subjective adjectives.
The main idea of this work is taken from the work of Turney (2001) on co-occurrence statistics applied to the web as a corpus through the Web-based Mutual Information (WMI) method. Baroni and Vegnaduzzo (2004) obtained their subjectivity scores by computing the Mutual Information of pairs of adjectives taken from each set, using frequency and co-occurrence frequency counts on the World Wide Web collected through queries to the AltaVista search engine.
Kanayama and Nasukawa (2006b) proposed an unsupervised method of lexicon construction for the annotation of polar clauses for domain-oriented Sentiment Analysis.
They grounded their work on unannotated corpora and anchored their research on context coherency (the tendency for equal polarities to appear successively in contexts), achieving very satisfying results in terms of Precision.
Qiu et al. (2009) proposed a double propagation method, which involved different kinds of relations between both sentiment words and features (words modified by the sentiment words).
With this method it has been possible to locate specific sentiment words in relevant reviews, starting from a small set of seed sentiment words. Their research emphasized the fact that, in reviews, sentiment words always occur in association with features.
Double propagation basically means that both sentiment words and features can be reused to extract new sentiment words and new features, until no more sentiment words or features can be identified to continue the process. The relations are described above all by Dependency Grammars and trees (Tesnière, 1959), which represented the basis for the design of the extraction rules.
In detail, the polarity of the new words is inferred using the following rules:

• Heterogeneous rule: the same polarity of known words is assigned to the novel words;

• Homogeneous rule: the same polarity of known words is assigned to the novel words, unless contrary words occur between them;

• Intra-review rule: in the cases in which the polarity cannot be inferred by dependency cues, it is assumed that the sentiment word takes the polarity of the whole review.

Maks et al. (2012) proposed a method for the Dutch language based on the idea that the words that express different types of subjectivity are distributed differently depending on the text types. Therefore, they grounded their work on the comparison between three corpora: Wikipedia, News and News comments.
In order to perform their task, they used and tested two different calculations generally employed in Keyword Extraction tasks to identify the words that are significantly frequent in a corpus with respect to other corpora: the Log-likelihood Technique and the Percentage Difference Calculation (DIFF). Their research revealed the better performance of the DIFF calculation.
Wawer (2012) introduced, for the Polish language, a novel iterative technique of sentiment lexicon expansion, which involved rule mining and contrast sets discovery. The research took advantage of two different textual resources: in a first stage, a small corpus endowed with morpho-syntactic annotations (The National Corpus of Polish), used to acquire candidates for emotive patterns and to evaluate the vocabulary, and then the whole web as a corpus, used to perform the lexical expansion.
The key idea of this work is that, if a word appears in the same extraction patterns as the seeds, then they belong to the same semantic class.
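The Mutual Information score mentioned earlier in this Paragraph for web-based co-occurrence counts can be made concrete with a short sketch. It shows the standard pointwise mutual information computed from raw counts. How the cited authors aggregated the scores over the seed set is not described here, so only the pairwise measure is sketched, and the function and parameter names are illustrative.

```python
import math


def pmi(hits_xy, hits_x, hits_y, total):
    """Pointwise mutual information from raw (co-)occurrence counts.

    hits_xy: number of documents in which x and y co-occur
    hits_x, hits_y: number of documents containing x and y respectively
    total: size of the collection used to turn counts into probabilities
    Returns log2( p(x, y) / (p(x) * p(y)) ), or -inf when a count is zero.
    """
    if min(hits_xy, hits_x, hits_y) <= 0:
        return float("-inf")
    return math.log2((hits_xy * total) / (hits_x * hits_y))


if __name__ == "__main__":
    # Independent words: co-occurrence equals what chance predicts.
    print(pmi(10, 100, 100, 1000))
    # Associated words: co-occurrence is five times the chance level.
    print(pmi(50, 100, 100, 1000))
```

A candidate adjective that co-occurs with subjective seeds more often than chance receives a positive score, which is the intuition behind ranking adjectives by subjectivity.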

3.5.1.3 The Morphological Approach

Hatzivassiloglou and McKeown (1997) proposed a method for sentiment lexicon expansion based on morphological relations between adjectives. They demonstrated that adjectives related in form almost always have different semantic orientations (e.g. “adequate-inadequate”, “thoughtful-thoughtless”), achieving 97% accuracy.
Moilanen and Pulman (2008) proposed five methods for the assignment of prior sentiment polarities to unknown words, based on known sentiment carriers. They started from a core sentiment lexicon, which contained 41109 entries tagged with positive (+), neutral (N) or negative (-) prior polarities, and then employed a classifier society of five rule-driven classifiers, each of which adopted a specific analytical strategy:

Conversion, which estimated zero-derived paronyms by retagging the unknown words with different POS tags.


Morphological derivation, which relied on derivational and inflectional morphology and was based on the progressive transformation of the words into shorter paronymic aliases by using affixes and neo-classical combining forms. Some polarity reversal affixes (e.g. -less and not-so-) were also supported.

Affix-like Polarity Markers, which computed affix-like sentiment markers (e.g. well-built, badly-behaving) that usually propagate their sentiment orientation across the whole word.

Syllables, which split unknown words into individual syllables with a rule-based syllable chunker and then processed them in order to connect them with the words contained in the original lexicon.

Shallow Parsing, which split unknown words into virtual sentences and POS-tagged them.

Ku et al. (2009) employed morphological structures and relations for opinion analysis on Chinese words and sentences. They classified the words into eight morphological types through Support Vector Machines (SVM) and Conditional Random Fields (CRF) classifiers.
Chinese morphological formative structures consist of three major processes: compounding, affixation and conversion. Every word is composed of one or more Chinese characters, on the basis of which the word's meaning can be interpreted.
The authors demonstrated that the injection of morphological information can truly improve the performance of the word polarity detection task.
Neviarouskaya (2010) proposed the expansion of the SentiFul lexicon, grounding the task on the manipulation of the morphological structure (base and affixes) of known lemmas (Plag, 2003).
The derivation process is a linguistic device for the creation of new lemmas from known ones by adding prefixes or suffixes.
The results of the application of this method are 4000+ new derived and scored sentiment words (1400+ adjectives, almost 500 adverbs, 1800 nouns and almost 350 verbs).
The author distinguished four types of affixes on the basis of their roles in the sentiment feature attribution:

Propagating: the sentiment features are preserved and inherited by the new derived lexical units, which often belong to different grammatical categories (e.g. en-, -ous, -fy). The scoring function transfers the original polarity score to the new word without any variation.

Reversing: the affixes have the effect of changing the semantic orientation of the original lexeme (e.g. dis-, -less). The scoring function simply switches the original score of the starting lemma.

Intensifying: the sentiment features are strengthened (e.g. over-, super-). The scoring function increases the score of the new word with respect to the score of the original lemma.

Weakening: the sentiment features are decreased (e.g. semi-), so the score of the derived word is proportionally reduced by the scoring function.
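The four affix roles can be summarized with a small scoring sketch. It is not the scoring function of Neviarouskaya (2010): the score range [-1, 1], the multiplicative factor 1.5 and the names AFFIX_TYPES and score_derived are assumptions introduced only to make the four behaviours concrete.

```python
# Affix -> role, using the examples quoted in the list above.
AFFIX_TYPES = {
    "en-": "propagating", "-ous": "propagating", "-fy": "propagating",
    "dis-": "reversing", "-less": "reversing",
    "over-": "intensifying", "super-": "intensifying",
    "semi-": "weakening",
}


def score_derived(base_score, affix, factor=1.5):
    """Score a derived word from the score of its base and the affix role.

    base_score is assumed to lie in [-1, 1]; `factor` is an arbitrary
    illustrative constant for intensification and weakening.
    """
    role = AFFIX_TYPES[affix]
    if role == "propagating":
        return base_score
    if role == "reversing":
        return -base_score
    if role == "intensifying":
        # Strengthen the score, but keep it inside the assumed range.
        return max(-1.0, min(1.0, base_score * factor))
    return base_score / factor  # weakening


if __name__ == "__main__":
    print(score_derived(0.5, "-ous"))
    print(score_derived(0.5, "-less"))
    print(score_derived(0.8, "over-"))
    print(score_derived(0.6, "semi-"))
```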

In the approach of Neviarouskaya (2010), if a word is not already contained in the SentiFul lexicon, the algorithm checks the presence of the missing lemma in the WordNet database: if the word actually exists there, it is taken into account for a future inclusion in the SentiFul lexicon. Its sentiment orientation is always deduced from the features of the starting word and of the affixes used to derive it.
Neviarouskaya et al. (2011) described methods to automatically generate and score a new sentiment lexicon, SentiFul, and to expand it through direct synonymy and antonymy relations, hyponymy relations, derivation and compounding with known lexical units.
The authors distinguished four types of affixes used to derive new words, depending on the role they play with regard to sentiment features: propagating, reversing, intensifying and weakening. They elaborated the algorithm for the automatic extraction of new sentiment-related compounds from WordNet by using words from SentiFul as seeds for sentiment-carrying base components and by applying the patterns of compound formation.
Wang et al. (2011) proposed a morpheme-based fine-to-coarse strategy for Chinese sentence-level sentiment classification, which used pre-existing sentiment dictionaries in order to extract sentiment morphemes and calculate their polarity intensity. Such morphemes are then reused to evaluate the sentence-level semantic orientation on the basis of the sentiment phrases and their relevant polarity scores.
The authors preferred morphemes to words as the basic tokens for sentiment classification because they are much less numerous than words.
Wang et al. (2011) derived the semantic orientation of unknown sentiment words on the basis of their component morphemes (Yuen et al., 2004; Ku et al., 2009), thanks to a specific morpheme segmentation module which applied the Forward Maximum Matching (FMM) word segmentation technique for the decomposition of words into strings of morphemes. Their approach can manage both unknown lexical sentiments and contextual sentiments for sentence-level sentiment classification, by taking into consideration sentiments at multiple levels of granularity (e.g. sentiment morphemes, sentiment words and sentiment phrases) in a morpheme-based framework.
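Forward Maximum Matching is a greedy dictionary-based segmentation: at each position the longest dictionary item is chosen. The sketch below shows the general technique, not the module of Wang et al. (2011). The toy lexicon uses Latin letters instead of Chinese morphemes, and the fallback to single characters for unknown material is an assumption of this example.

```python
def forward_maximum_matching(text, lexicon, max_len=None):
    """Greedy left-to-right segmentation with longest-match preference.

    text: string to segment
    lexicon: collection of known units (e.g. morphemes)
    max_len: longest unit to try; defaults to the longest lexicon entry
    Characters not covered by the lexicon are emitted one by one.
    """
    if max_len is None:
        max_len = max((len(unit) for unit in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest window first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


if __name__ == "__main__":
    toy_lexicon = {"ab", "abc", "cd", "d"}
    print(forward_maximum_matching("abcd", toy_lexicon))
```

In the toy run, "abc" is preferred over "ab" because it is the longest match at the first position, which leaves "d" as the second unit.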

3.5.2 Deadjectival Adverbs in -mente

As anticipated, this Section aims to enlarge the size of SentIta on the basis of the morphological relations that connect the words and their meanings. In a first stage of the work, more than 5000 labeled adjectives have been used to predict the orientation of the adverbs to which they were morphologically related. This Section explores the rules formalized to perform the task.


Adverbs are morphologically invariable and, consequently, they do not present any inflection. However, the majority of them are characterized by a complex structure that includes an adjective base and the derivational morpheme -mente “-ly”:

[[veloce]A -mente]AVV “fast-ly”

[[fragile]A -mente]AVV “delicate-ly”

[[rapido]A -mente]AVV “rapid-ly”

Therefore, thanks to a morphological FSA, the SentIta dictionary of sentiment adverbs has been derived from the adjective dictionary (see Figure 3.6). All the adverbs contained in the Italian dictionary of simple words have been put in a Nooj text, and the above-mentioned grammar has been used to quickly populate the new dictionary by extracting the words ending with the suffix -mente “-ly” (Ricca, 2004b) and by making such words inherit the polarity of the adjectives.
The Nooj annotations consisted of a list of 3200+ adverbs that, at a later stage, has been manually checked in order to correct the grammar's mistakes (e.g. vigorosamente “vigorously” and pazzamente “madly”, which respectively come from the positive adjective “vigorous” and the negative adjective “mad”, become intensifiers as adverbs) and to add the Prior Polarity to the adverbs that did not end with the suffixes used in the grammar (e.g. volentieri “gladly”, controvoglia “unwillingly”).
The manual check produced a set of 3600+ adverbs of sentiment, which has been completed with 126 adverbs that did not end in -mente.
In detail, the Precision achieved in this task is 99% and the Recall is 88%. The distribution of semantic tags in this dictionary is reported in Table 3.27.

Tag             Automatically generated    Manually adjusted
POS+FORTE       82                         89
POS             743                        779
POS+DEB         61                         55
NEG+DEB         436                        456
NEG             1273                       1466
NEG+FORTE       160                        158
FORTE           366                        482
DEB             84                         163
SINTESI         0                          9
CONFERMA        0                          17
INVERTE         0                          19
Tot             3205                       3693
Tot Int         450                        645
Tot Sent        2755                       3003
Tot Discorso    0                          42

Table 3.27: Composition of the adverb dictionary

Figure 3.6 shows the local grammar in which we formalized the rules for the adverb formation. We started the word recognition by anchoring it on the localization of an adjective stem (see the first node on the left, in the variable Agg). Then the automaton continues the word analysis by checking that the adjective stem belongs to the sentiment dictionary (e.g. $Agg=A+POS+DEB). Figure 3.6 represents just an extract of the bigger grammar, which includes not only the positive (POS) adverbs but also the negative ones.

Figure 3.6: Extract of the FSA for the automatic annotation of sentiment adverbs

Notice, in the center of the FSA, a multiplication of paths. This depends on the different inflectional classes of the adjectives to which the suffix -mente is attached (Table 3.28).
The rules used in the grammar to derive the adverbs are given in the following (see Table 3.29 and Table 3.28):

• class 2, first path: nothing in the adjective changes (e.g. veloce, veloce-mente);

• class 2, second path: in the adjectives ending in -re, -le, the -e is deleted [e] (e.g. fragil-e, fragil-mente);

• class 1, third path: the -o is deleted [o] and substituted by the thematic vowel -a (e.g. rapid-o, rapid-a-mente).

Adj      Number=s               Number=p
Class    Gender=m   Gender=f    Gender=m   Gender=f
1        -o         -a          -i         -e
2        -e                     -i

Table 3.28: Adjective's inflectional classes (Salvi and Vanelli, 2004)

Adj      Adjective    Adverb
Class                 Deletion    Stem       Thematic Vowel    Suffix
1        rapid-o      o           rapid-     -a-               -mente
2        veloc-e      -           veloce-    -                 -mente
2        fragil-e     e           fragil-    -                 -mente

Table 3.29: Rules for the adverb derivation
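The three derivation rules can be read in the recognition direction, from an attested adverb back to its base adjective. The sketch below is a plain Python approximation of that idea, not the Nooj grammar of Figure 3.6: the mini-dictionary ADJECTIVES and its tags are hypothetical, and the function names are introduced only for the example.

```python
# Hypothetical extract of the sentiment adjective dictionary (lemma -> tag).
ADJECTIVES = {"veloce": "POS", "fragile": "NEG+DEB", "rapido": "POS"}


def base_adjective_candidates(adverb):
    """Undo the three derivation rules on an adverb ending in -mente."""
    if not adverb.endswith("mente"):
        return []
    stem = adverb[:-len("mente")]
    candidates = [stem]                     # class 2, first path: veloce-mente
    if stem.endswith(("r", "l")):
        candidates.append(stem + "e")       # class 2, second path: fragil(e)-mente
    if stem.endswith("a"):
        candidates.append(stem[:-1] + "o")  # class 1: rapid-a-mente from rapid-o
    return candidates


def annotate_adverb(adverb, adjectives=ADJECTIVES):
    """Return (base adjective, inherited tag), or None if no base is found."""
    for lemma in base_adjective_candidates(adverb):
        if lemma in adjectives:
            return lemma, adjectives[lemma]
    return None


if __name__ == "__main__":
    for word in ("velocemente", "fragilmente", "rapidamente", "volentieri"):
        print(word, annotate_adverb(word))
```

Adverbs that do not end in -mente, such as volentieri, are left without annotation by this sketch, in line with the manual additions that the dictionary required for them.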

Actually, almost nothing about the inflection of the base adjectives has been specified in the FSA. Because the adverbs of sentiment must be identified among the whole list of Nooj adverbs and semantically annotated (and not generated from scratch), we did not find it necessary to recall all the inflectional classes of the adjectives: just the deletion or the conservation of the final vowel was used to select the correct derivational rule.
Mistakes in the annotation regard those exceptions that were not productive enough to deserve a specific path in the local grammar. Examples are the adjectives in -lento: although they have -o as last vowel, they do not require the thematic vowel -a- but -e- (e.g. violento becomes violent-e-mente rather than violent-a-mente).

3.5.2.1 Semantics of the -mente Formations

The meaning of the deadjectival adverbs in -mente is not always predictable from the base adjectives from which they are derived. The syntactic structures in which they occur also influence their interpretation. Depending on their position in sentences, the deadjectival adverbs can be described as follows:

Adjective modifiers: they modify adjectives or other adverbs, even though two adverbs in -mente almost never appear next to each other.

• degree modifiers

  – intensifiers, e.g. altamente “highly”, enormemente “enormously”

  – downtoners, e.g. moderatamente “moderately”, parzialmente “partially”

Predicate modifiers: they are translatable with the paraphrase in a A way, in which A is the adjective base.

• verb arguments, e.g. Maria si comporta perfettamente “Maria acts perfectly”

• extranuclear elements, e.g. Maria si reca settimanalmente a Milano “Maria goes weekly to Milan”

Sentence modifiers: they usually do not modify the sentence content, but they often give it coherence or work as discourse signals.

• evaluative adverbs, e.g. stupidamente “foolishly”

• modal adverbs, e.g. necessariamente “necessarily”

• circumstantial adverbs of time, e.g. ultimamente “lately”

• quantifiers over time, e.g. frequentemente “often”

• domain adverbs, e.g. politicamente “politically”


The French LG description of adverbs in -ment includes 3200+ items represented in LG tables, which classify the adverbs in a similar way. They distinguish sentential adverbs (e.g. conjuncts, style disjuncts and attitude disjuncts, which include evaluative adverbs, adverbs of habit, modal adverbs and subject oriented attitude adverbs) from adverbs integrated into the sentence (e.g. adverbs of subject oriented manner, adverbs of verbal manner, adverbs of quantity, adverbs of time, viewpoint adverbs and focus adverbs) (Molinier and Levrier, 2000; Sagot and Fort, 2007).

3.5.2.2 The Superlative of the -mente Adverbs

As concerns the superlative form of the adverbs in -mente, it must be underlined that they have been treated as adverbs derived from the superlative form of the adjectives. In fact, the rule for the adverb formation is selected by the inflectional paradigm of the superlative form and not by the inflection of the adjective. Moreover, the semantic orientation inherited by the superlative adverb is again the one belonging to the superlative adjective, and not the one of the adjective itself. Therefore, the FSA for the superlative adverbs is almost the same as the one shown in Figure 3.6: the only difference is in the recognition of the base adjective, which is A+SUP rather than only A.

3.5.3 Deadjectival Nouns of Quality

The derivation of quality nouns (QN) from qualifier adjectives is another derivation phenomenon of which we took advantage for the automatic enlargement of SentIta. This kind of noun makes it possible to treat as entities the qualities expressed by the base adjectives.
A morphological FSA (Figure 3.7), following the same idea as the adverb grammar, matches, in a list of abstract nouns, the stems that are in morpho-phonological relation with our list of hand-tagged adjectives. Because the nouns, unlike the adverbs, need their inflection information to be specified, we associated with each suffix entry in the QN suffixes electronic dictionary the inflectional paradigm that it gives to the words with which it occurs (see Table 3.30).
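The matching between abstract nouns and hand-tagged adjectives can be sketched as follows. This is an approximation for illustration only, not the FSA of Figure 3.7: the score scale, the mini-dictionary ADJ_SCORES and the function name are hypothetical, only five of the suffixes are included, and the shift to the weakly negative value (-1) for -edine and -eria anticipates the exception discussed in this Section. The paradigm codes are the ones listed in Table 3.30.

```python
# Suffix -> inflectional paradigm (FLX codes as listed in Table 3.30).
QN_SUFFIXES = {
    "aggine": "N46",
    "edine": "N46",
    "eria": "N41",
    "ezza": "N41",
    "ità": "N602",
}

# Suffixes that shift the noun to the weakly negative score (-1).
SHIFT_TO_WEAK_NEG = {"edine", "eria"}

# Hypothetical hand-tagged adjectives with illustrative scores.
ADJ_SCORES = {"bello": 2, "stupido": -2, "facilone": -2}


def annotate_quality_noun(noun, adjectives=ADJ_SCORES):
    """Link a noun to a tagged adjective through a quality noun suffix.

    Returns the base adjective, the inflectional paradigm given by the
    suffix and the score assigned to the noun, or None if nothing matches.
    """
    for suffix in sorted(QN_SUFFIXES, key=len, reverse=True):
        if not noun.endswith(suffix):
            continue
        stem = noun[:-len(suffix)]
        # Restore the final vowel that the adjective loses before the suffix.
        for ending in ("o", "e", "a", ""):
            base = stem + ending
            if base in adjectives:
                shifted = suffix in SHIFT_TO_WEAK_NEG
                score = -1 if shifted else adjectives[base]
                return {"base": base, "flx": QN_SUFFIXES[suffix], "score": score}
    return None


if __name__ == "__main__":
    for word in ("bellezza", "stupidità", "faciloneria", "tavolo"):
        print(word, annotate_quality_noun(word))
```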

As regards the suffixes used to form the quality nouns (Rainer, 2004), it must be said that they generally make the new words simply inherit the orientation of the base adjectives. Exceptions are -edine and -eria, which almost always shift the polarity of the quality nouns into the weakly negative one (-1), e.g. faciloneria “slapdash attitude”. The suffix -mento also differs from the others, in so far as it belongs to the derivational phenomenon of the deverbal nouns of action (Gaeta, 2004). It has been possible to use it in our grammar for the deadjectival noun derivation by using the past participles of the verbs listed in the adjective dictionary of sentiment (e.g. V sfinire “to wear out”, A sfinito “worn out”, N sfinimento “weariness”). Basically, we chose to include it in our list of suffixes because of its productivity. This caused an overlap of 475 nouns derived from both the psychological predicates and the qualifier adjectives of sentiment. Just 185 psych nominalizations exceeded the coverage of our morphological FSA, proving the power of our methodology also in terms of Recall.
In about half the cases the annotations differed just in terms of intensity; the other mistakes affected the orientation as well (see Table 3.33).
In general, the Precision achieved in this task is 93%, while the Precision of the automatic tagging of the QN, compared with the manual tagging made on the corresponding nominalizations of the psych verbs, is summarized in Table 3.34.
Tables 3.31 and 3.32 show the differences in productivity of all the suffixes (SFX) among the electronic dictionary of abstract nouns (Abstr), the sublist of QN that have their counterparts in the Psy V-n list (Psy V-n=QN) and the whole collection of QN (QN tot).
As regards the suffix productivity in Abstr, the suffixes that achieved the highest percentage values are -ità, -ia and -(z)ione. The most productive suffixes for the Psy V-n=QN formation are -(z)ione, -mento and -ezza, while the most productive ones for the QN tot are -ità, -(z)ione and -mento.

Figure 3.7: Extract of the FSA for the automatic annotation of quality nouns

Suffix (SFX)    Inflectional paradigm (FLX)    Correct matches    Errors    Precision (%)
-edin-e         N46                            0                  0         -
-età            N602                           0                  0         -
-izie           N602                           0                  0         -
-el-a           N41                            0                  1         0
-udin-e         N46                            5                  2         71.43
-or-e           N5                             36                 9         80
-zion-e         N46                            359                59        85.89
-anz-a          N41                            57                 9         86.36
-itudin-e       N46                            13                 2         86.67
-ur-a           N41                            142                20        87.65
-ment-o         N5                             514                58        89.86
-izi-a          N41                            14                 1         93.33
-enz-a          N41                            148                10        93.67
-eri-a          N41                            71                 4         94.67
-ietà           N602                           27                 1         96.43
-aggin-e        N46                            72                 2         97.3
-i-a            N41                            145                3         97.97
-ità            N602                           666                13        98.09
-ezz-a          N41                            305                2         99.35
-igi-a          N41                            3                  0         100
-z-a            N41                            2                  0         100
tot             -                              2579               196       92.94

Table 3.30: Error analysis of the automatic QN annotation

Suffix (SFX)    Suffix in Abstr    Suffix in Psy V-n=QN    Suffix in QN
-ezz-a          498                73                      305
-aggin-e        149                13                      72
-itudin-e       31                 1                       13
-ietà           65                 2                       27
-igi-a          9                  0                       3
-izi-a          40                 2                       14
-enz-a          461                11                      148
-anz-a          189                7                       57
-eri-a          267                13                      71
-ità            3263               53                      666
-or-e           297                6                       36
-zion-e         2866               87                      359
-ur-a           1700               11                      142
-udin-e         39                 3                       5
-i-a            3545               13                      145
-el-a           15                 0                       0
-edin-e         4                  0                       0
-età            66                 0                       0
-izie           3                  0                       0
-z-a            30                 2                       2
-ment-o         2520               150                     514

Table 3.31: Presence of the QN suffixes in different dictionaries

Suffix (SFX)    Suffix in Abstr (%)    Suffix in Psy V-n=QN (%)    Suffix in QN (%)
-ezz-a          3.1                    16.33                       6.28
-aggin-e        0.93                   2.91                        1.48
-itudin-e       0.19                   0.22                        0.27
-ietà           0.4                    0.45                        0.56
-igi-a          0.06                   0                           0.06
-izi-a          0.25                   0.45                        0.29
-enz-a          2.87                   2.46                        3.05
-anz-a          1.18                   1.57                        1.17
-eri-a          1.66                   2.91                        1.46
-ità            20.32                  11.86                       13.72
-or-e           1.85                   1.34                        0.74
-zion-e         17.85                  19.46                       7.4
-ur-a           10.59                  2.46                        2.93
-udin-e         0.24                   0.67                        0.1
-i-a            22.08                  2.91                        2.99
-el-a           0.09                   0                           0
-edin-e         0.02                   0                           0
-età            0.41                   0                           0
-izie           0.02                   0                           0
-z-a            0.19                   0.45                        0.04
-ment-o         15.69                  33.56                       10.59

Table 3.32: Productivity of the QN suffixes in different dictionaries

Correspondences    Differences    Different Orientation    Only Different Intensity
475                185            93                       92

Table 3.33: About the overlap among Psych V-n dictionary and QN dictionary

                Precision    Precision (Orientation only)
QN<>Psy V-n     0.92         -
QN=Psy V-n      0.61         0.80

Table 3.34: Precision measure about the overlap of QN and Psy V-n

3.5.4 Morphological Semantics

Our morphological FSA can moreover interact with a list of prefixes able to negate (e.g. anti-, contra-, non-, etc.) or to intensify/downtone (e.g. arci-, semi-, etc.) the orientation of the words in which they appear (Iacobini 2004).
While the suffixes for the creation of quality nouns interact with the preexisting dictionaries of the Italian module of Nooj in order to automatically tag them with new semantic descriptions, the prefixes treated in this Section work directly on opinionated documents, so that the machine can understand the actual orientation of the words occurring in real texts, even when their morphological context shifts the polarity of the words listed in the dictionaries.
The collection of the prefixes that, from now on, we will call Morphological Contextual Valence Shifters (mCVS) is reported in Tables 3.35 and 3.36.

Prefixes   Negation types
non-       CDD
mis-       CR
a-         CR+PRV
dis-       CR+PRV
in-        CR+PRV
s-         CR+PRV
anti-      OPP
contra-    OPP
contro-    OPP
de-        PRV
di-        PRV
es-        PRV

Table 3.35: Negation prefixes

They are endowed with special tags that specify the way in which they alter the meaning of the sentiment words with which they occur:

• FORTE “strong”: intensifies the Semantic Orientation of the words, making their polarity increase by one position on the evaluation scale (first path in Figure 3.8);

• DEB “weak”: downtones the Semantic Orientation of the words, making their polarity decrease by one position on the evaluation scale (second path in Figure 3.8);

• NEGAZIONE “negation”: works following the same rules as the negation formalised in the syntactic grammars (third path in Figure 3.8):

  – CR “contrary”

  – OPP “opposition”

  – PRV “deprivation”

  – CDD “contradiction”

Intensifiers   Downtoners
iper-          bis-
macro-         fra-
maxi-          infra-
mega-          intra-
multi-         ipo-
oltre-         micro-
pluri-         mini-
poli-          para-
sopra-         semi-
sovra-         sotto-
stra-          sub-
super-         -
sur-           -
ultra-         -

Table 3.36: Intensifier/downtoner prefixes

Figure 3.8 shows the morphological FSA that combines the polarity and intensity of the adjectives of sentiment with the meaning carried by the mCVS.
The shifting rules in terms of polarity score, which are the same as those exploited in the syntactic grammar net (see Section 4), are summarized in this grammar through the green comments on the right side of the automaton.
As regards the manipulation of the annotations of the resulting words, we followed three parallel solutions:

• when the score doesn’t change (e.g. the case of the intensification of words with starting polarity +3, or the case in which words with polarity -1 are decreased), the resulting word simply inherits the inflectional ($2F) and the syntactic and semantic ($2S) information of the original word;

• when the intensity changes and the polarity remains steady (e.g. when the words with polarity +/-2 are intensified or downtoned), the resulting word inherits the inflectional and the syntactic and semantic information of the original word, but also obtains the tags FORTE/DEB;

• when both the intensity and the polarity change (in almost every case in which the words are negated), the resulting word just inherits the inflectional information, while the semantic tags are added from scratch.

Figure 3.8: Morphological Contextual Valence Shifting of Adjectives

We used the list of the sentiment adjectives as a Nooj text in order to check the performance of our grammar. The discovery that the words containing the mCVS lemmatised in the dictionary are very few (just 29 adjectives, e.g. straricco “very rich”, ultraresistente “heavy duty”, stramaledetto “damned”, and 9 adverbs, e.g. strabene “very well”, ultrapiattamente “very dully”) increases the importance of an FSA like the one described in this Paragraph.
It is also important to underline that all the synonyms of poco “few”, including the ones that take the shape of morphemes (e.g. ipo-, sotto-, sub-), do not seem to be proper downtoners, but resemble the behaviour of the negation words that transform the sentiment words with which they occur into weakly negative ones, see Section 4.2 (e.g. ipodotato “subnormal”, ipofunzionante “hypofunctioning”). Therefore, we excluded them from the dictionary of mCVS and included them in a dedicated metanode of the morphological FSA, able to correctly compute their meaning.
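The behaviour of the intensifying and downtoning mCVS described in this Section can be illustrated outside Nooj with a minimal Python sketch. It is not the FSA itself: the toy lexicon reuses the prior polarities of Table 4.3, the function names shift and analyse are hypothetical, the prefixed test words are invented for illustration, and we assume that downtoning never brings a word below the weakest degree of its own orientation.

```python
# Toy sentiment lexicon: prior polarities as listed in Table 4.3.
LEXICON = {"fantastico": 3, "bello": 2, "carino": 1,
           "scialbo": -1, "brutto": -2, "orribile": -3}

# Prefixes from Table 3.36. ipo-, sotto- and sub- are left out on purpose,
# since the text treats them like negation words rather than downtoners.
INTENSIFIERS = ("iper", "macro", "maxi", "mega", "multi", "oltre", "pluri",
                "poli", "sopra", "sovra", "stra", "super", "sur", "ultra")
DOWNTONERS = ("bis", "fra", "infra", "intra", "micro", "mini", "para", "semi")


def shift(score, tag):
    """Move a non-zero score one position on the -3..+3 scale (FORTE or DEB)."""
    sign = 1 if score > 0 else -1
    magnitude = abs(score)
    if tag == "FORTE":
        magnitude = min(magnitude + 1, 3)   # +3 / -3 cannot be exceeded
    elif tag == "DEB":
        magnitude = max(magnitude - 1, 1)   # assumption: orientation is kept
    return sign * magnitude


def analyse(word):
    """Return the orientation of a (possibly prefixed) word, or None if unknown."""
    if word in LEXICON:
        return LEXICON[word]
    prefixes = [(p, "FORTE") for p in INTENSIFIERS] + [(p, "DEB") for p in DOWNTONERS]
    # Longest prefixes first, so that e.g. "infra" is tried before shorter ones.
    for prefix, tag in sorted(prefixes, key=lambda pt: len(pt[0]), reverse=True):
        base = word[len(prefix):]
        if word.startswith(prefix) and base in LEXICON:
            return shift(LEXICON[base], tag)
    return None
```

Under these assumptions, a word that is not lemmatised in the dictionary is still scored as long as its base is listed, which is the point made above about the scarcity of prefixed entries.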

Chapter 4

Shifting Lexical Valences through FSA

4.1 Contextual Valence Shifters

In Section 3.3.1 we provided the definitions of Semantic Orientation, the measure of the polarity and intensity of an opinion (Taboada et al. 2011, Liu 2010), and of Prior Polarity, the context-independent orientation of individual words (Osgood 1952). While the second concept is relevant during the population of sentiment dictionaries, the first one refers to the final semantic annotation that can be automatically or manually attributed to whole sentences and documents.
The need to make a distinction between these two measures is due to the semantic modifications that can come from the words’ surrounding contexts (Pang and Lee 2008a). Contextual Valence Shifters are linguistic devices able to change the prior polarity of words when co-occurring in the same context (Kennedy and Inkpen 2006, Polanyi and Zaenen 2006).
From a computational point of view, Contextual Valence Shifting is nothing more than an enhanced version of the term-counting method first proposed by Turney (2002), but from the distributional analysis perspective it raises a number of challenges that deserve to be examined (Neviarouskaya et al. 2009a).
In this work we handle the contextual shifting by generalizing over all the polar words that possess the same prior polarity. A network of local grammars has been designed on a set of rules that compute the words’ individual polarity scores according to the contexts in which they occur.
In general, the sentence annotation is performed using six different metanodes that, working as containers for the sentiment expressions, assign equal labels to the patterns embedded in the same graphs.
We used the FSA editor of Nooj to formalize our CVS rules for its ease of use and its computational power, but nothing prevents the use of other strategies and tools for their application to texts. Consequently, in this chapter we will limit the discussion to the shifting rules, without any further mention of their actual formalization, which can be trivially summarized as the treatment of embedded graphs as score boxes.

4.2 Polarity Intensification and Downtoning

4.2.1 Related Works

Amplifiers and downtoners (Quirk et al. 1985), intensifiers and diminishers (Polanyi and Zaenen 2006), overstatements and understatements (Kennedy and Inkpen 2006), intensifiers and downtoners (Taboada et al. 2011) are all pairs of concepts that refer to the increasing or decreasing effects of special kinds of words on oriented lexical items.
Although the changes of the polarity scores are very intuitive, the simple addition and subtraction of a score to/from the base valence of a term (Polanyi and Zaenen 2006, Kennedy and Inkpen 2006) has been criticized by Taboada et al. (2011), who instead associated different percentage values (positive for intensifiers and negative for downtoners) with each strength word.
Differently from both of these works, we decided to restrain the complexity of the rules by avoiding the distinction of degrees. In compensation, we did not limit the intensive function only to degree adverbs and adjectives, but we also took into consideration verbs and nouns.

4.2.2 Intensification and Downtoning in SentIta

The lexical items for the computational treatment of intensification in SentIta are the ones denoted by the tags FORTE and DEB.
As stated in the previous Chapter (Section 3.3), the lexical items endowed with these labels do not possess any orientation, but can modify the intensity of semantically oriented words.
The strength indicators included in the intensification FSA that account for (but are not limited to) such modifiers are the following:

• Morphological Rules

  – the use of the absolute superlative suffixes -issim-o and -errim-o, which always increase the words’ semantic orientations;

  – the use of intensifying or downtoning prefixes (e.g. super-, stra-, semi-, micro-), shown in Section 3.5.4.

• Syntactic Rules

  – the repetition of more than one negative or positive word, which always increases the words’ semantic orientations (e.g. bello[+2] bello[+2] bello[+2] [+3] “nice nice nice”);

  – the co-occurrence of words belonging to the strength scale (tags FORTE/DEB) with the sentiment words listed in the evaluation scale (tags POS/NEG).

As generally assumed in the literature, adverbs intensify or attenuate adjectives (46), verbs (48) and other adverbs (49), while adjectives modify the intensity of nouns (47). We adopted the simplest strategy for the polarity modification: the addition/subtraction of one point on the evaluation scale (Polanyi and Zaenen 2006, Kennedy and Inkpen 2006). Because this scale does not exceed +/-3, words that start from this polarity cannot be increased any further (e.g. davvero[+] eccezionale[+3] [+3] “truly exceptional”).

(46) Parzialmente[-] deludente[-2] anche il reparto degli attori [-1]
“Partially unsatisfying also the actor staff”

(47) Ciò che ne deriva (...) è una terribile[-2] confusione[-2] narrativa [-3]
“What comes from it is a terrible narrative chaos”

(48) Alla guida ci si diverte[+2] molto[+] [+3]
“In the driver’s seat you have a lot of fun”

(49) Ne sono rimasta molto[+] favorevolmente[+2] colpita [+3]
“I have been very favourably impressed by it”

Anyway, as already discussed in Section 3.3.4, special subsets of verbs also possess the power of intensifying or downtoning the polarity of the nominal groups or complement clauses that they affect, case by case, in subject (50) or object (51) position (see Figure 3.1, Section 3.3.4).

(50) L’amore[+2] travolge[+] Maria [+3]
“The love overwhelms Maria”

(51) Le tue parole invadono[+] Maria di contentezza[+2] [+3]
“Your words overwhelm Maria with happiness”
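The one-point strategy applied in examples (46) to (51) can be sketched as follows. This is only an illustration under stated assumptions, not the Nooj grammar: the function name phrase_score is hypothetical, all polar words in the phrase are assumed to share the same orientation, the strongest prior polarity is taken as the starting point, and a downtoned word is assumed to keep at least the weakest degree of its orientation.

```python
def phrase_score(polar_scores, n_intensifiers=0, n_downtoners=0):
    """Combine prior polarities (-3..+3, non-zero) with FORTE/DEB modifiers.

    polar_scores   -- prior polarities of the sentiment words in the phrase
    n_intensifiers -- number of co-occurring words tagged FORTE
    n_downtoners   -- number of co-occurring words tagged DEB
    """
    if not polar_scores or 0 in polar_scores:
        raise ValueError("at least one non-neutral sentiment word is required")
    signs = {1 if s > 0 else -1 for s in polar_scores}
    if len(signs) != 1:
        raise ValueError("mixed orientations are outside the scope of this sketch")
    sign = signs.pop()
    magnitude = max(abs(s) for s in polar_scores)
    steps = n_intensifiers - n_downtoners
    if len(polar_scores) > 1:
        steps += 1          # repetition of polar words counts as one intensification
    # The evaluation scale stops at 3; below 1 the orientation would be lost.
    magnitude = max(1, min(3, magnitude + steps))
    return sign * magnitude


# (46) Parzialmente[-] deludente[-2]
print(phrase_score([-2], n_downtoners=1))
# (48) diverte[+2] molto[+]
print(phrase_score([2], n_intensifiers=1))
```

Keeping the intensity as a step count rather than a percentage mirrors the choice made above to avoid the distinction of degrees.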

Examples of intensifying/downtoning verbs, with the LG classes they belong to, are reported below:

• the list of entries translated from the French work of Balibar-Mrabti (1995), e.g. travolgere[+] “to overwhelm”, risplendere[+] “to shine”;

• verbs from LG class 24 (15 intense, 2 weak), e.g. aumentare[+] “to increase”, gonfiare[+] “to amplify”, calare[-] “to go down”, diminuire[-] “to decrease”;

• verbs from LG class 41 (45 intense, 19 weak), e.g. rivoluzionare[+] “to revolutionize”, sconquassare[+] “to smash”, moderare[-] “to moderate”, placare[-] “to appease”;

• verbs from LG class 47 (43 intense, 18 weak), e.g. strillare[+] “to shout”, sventolare[+] “to show off”, mormorare[-] “to murmur”, sussurrare[-] “to whisper”;

• verbs from LG class 50 (4 intense, 0 weak), e.g. assicurare[+] “to assure”, garantire[+] “to guarantee”;

• verbs from LG class 56 (5 intense, 3 weak), e.g. abbandonarsi[+] “to abandon yourself”, affrettarsi[+] “to rush”, accennare[-] “to touch upon”, esitare[-] “to hesitate”.

As regards intensifying or downtoning nouns, we refer to the ones automatically derived from degree adjectives (220 intense, 95 weak), e.g. sfrenatezza[+] “wildness”, abbondanza[+] “abundance”, labilità[-] “evanescence”, fugacità[-] “fugacity”, and to the nominalizations of intensifying or downtoning verbs (41 intense, 95 weak), e.g. rafforzamento[+] “strengthening”, fervore[+] “fervor”, affievolimento[-] “weakening”, diminuzione[-] “decrease”.
They can modify both other nouns (52) and verbs (53).

(52) La sfrenatezza[+] dell’odio[-3] di Maria [-3]
“The wildness of Maria’s hate”

(53) Luca difendeva[+2] Maria con fervore[+] [+3]
“Luca was defending Maria with fervour”

Intensification, negation, modality and comparison can appear together in the same sentence. We will describe these cases in the following paragraphs.

4.2.3 Excess Quantifiers

Words that at first glance seem to be intensifiers, but at a deeper analysis reveal a more complex behavior, are abbastanza “enough”, troppo “too much” and poco “not much”. Because of their particular influence on polar words, their meaning, especially when associated with polar adjectives, has been related in the literature to other contextual valence shifters¹. Examples are Meier (2003, p. 69), who proposed an interpretation that includes a comparison between extents:

“Enough, too, and so are quantifiers that relate an extent predicate and the incomplete conditional (expressed by the sentential complement) and are interpreted as comparisons between two extents.
The first extent is the maximal extent that satisfies the extent predicate. The second extent is the minimal or maximal extent of a set of extents that satisfy the (hidden) conditional, where the sentential complement supplies the consequent and the main clause the antecedent. [...] Intuitively, the value an object has on a scale associated with the meaning of the adjective or adverb is related to some standard of comparison that is determined by the sentential complement.” (Meier 2003, p. 69)

Schwarzschild (2008, p. 316), who noticed the possibility of an implicit negation:

¹ When quantifiers like the ones under examination modify polar words, they compare such words with a threshold value that is defined by an infinitive clause in position N1 (e.g. The food is too good to throw (it) away) (Meier 2003, Schwarzschild 2008).

“In excessives, the infinitival complement, while it does not form a threshold description, it does contain an implicit negation. So like other negative statements, excessives support let alone rejoinders².”

And again Meier (2003, p. 71), who transformed the infinitive clause into a modal expression:

“In this paper I will propose that the sentential complements of constructions with too and enough implicitly or explicitly contain a modal expression. [...] Evidence for this move is the fact that a modal expression (with existential force) can be added or omitted without changing the intuitive meaning of the sentences. [...] An explicit modal expression like be able or be allowed in the sentential complement makes them more precise, but it does not make them unacceptable.”

² This fuel is too volatile to use in a car engine = This fuel is too volatile to use in a car engine, let alone a lawn mower engine = This fuel should not be used in a car engine, let alone a lawn mower engine. Example from Schwarzschild (2008).

In this research we noticed as well that the co-occurrence of troppo, poco and abbastanza with polar lexical items can produce, in their semantic orientation, effects that can be associated with other contextual valence shifters. The ad hoc rules dedicated to these words (see Table 4.1) are not actually new, but refer to other contextual valence shifting rules that have been discussed in this Paragraph and/or will be explored in this Chapter.
Troppo and poco can also co-occur or can be negated; the rules to be applied in each case are reported in Table 4.2.
Troppo, poco and abbastanza are also able to give a polarity to words that in SentIta possess a neutral Prior Polarity. An exception is the case of non troppo + neutral words (Table 4.2), which does not produce generalizable effects (e.g. non troppo decorato “not so decorated” could have a weakly positive orientation, while the same cannot be said of non troppo nuovo “not so new”). Therefore, it has been excluded from the rules in order to avoid additional errors.

             Positive Word            Neutral Word             Negative Word
Examples     bello                    decorato                 brutto
troppo       Intensification          Negative Switch          Intensification
poco         Weakly Negative Switch   Weakly Negative Switch   Weak Negation
abbastanza   Downtoning               Weakly Positive Switch   Intensification

Table 4.1: The effects of troppo, poco and abbastanza on polar lexical items

                 Positive Word     Neutral Word             Negative Word
Examples         bello             decorato                 brutto
troppo poco      Strong Negation   Negative Switch          Downtoning
un poco troppo   Intensification   Negative Switch          Intensification
non troppo       Negative Switch   -                        Weak Negation
non poco         Intensification   Weakly Positive Switch   Intensification

Table 4.2: The effects of the co-occurrence and the negation of troppo, poco with reference to polar lexical items
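Since the rules of Tables 4.1 and 4.2 simply redirect each combination to an already existing shifting rule, they can be read as a lookup table. The sketch below encodes them in Python as an illustration only: the names EFFECTS, polarity_class and quantifier_effect are hypothetical, and the function returns the label of the rule to be applied rather than a score, because the scores depend on the rules discussed elsewhere in this Chapter.

```python
# Rule labels copied from Tables 4.1 and 4.2, in the order
# (positive word, neutral word, negative word). None marks the excluded case.
EFFECTS = {
    "troppo":         ("Intensification", "Negative Switch", "Intensification"),
    "poco":           ("Weakly Negative Switch", "Weakly Negative Switch", "Weak Negation"),
    "abbastanza":     ("Downtoning", "Weakly Positive Switch", "Intensification"),
    "troppo poco":    ("Strong Negation", "Negative Switch", "Downtoning"),
    "un poco troppo": ("Intensification", "Negative Switch", "Intensification"),
    "non troppo":     ("Negative Switch", None, "Weak Negation"),
    "non poco":       ("Intensification", "Weakly Positive Switch", "Intensification"),
}


def polarity_class(prior):
    """Map a prior polarity score to a column index of the tables."""
    if prior > 0:
        return 0
    if prior == 0:
        return 1
    return 2


def quantifier_effect(quantifier, prior):
    """Return the name of the shifting rule triggered by the quantifier, or None."""
    return EFFECTS[quantifier][polarity_class(prior)]


print(quantifier_effect("troppo", 0))      # neutral word such as decorato
print(quantifier_effect("non troppo", 0))  # excluded from the rules
```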

4.3 Negation Modeling

4.3.1 Related Works

Negation modeling is an NLP research area that includes the automatic detection of negation expressions in raw texts and the determination of the negation scopes, which are the parts of the meaning modified by negation (Jia et al. 2009).
Negation meets the Sentiment Analysis need of determining the correct polarity of opinionated words when shifted by the context. That is why we included it in our set of Contextual Valence Shifters (Polanyi and Zaenen 2006, Kennedy and Inkpen 2006).
In this paragraph we will survey the literature on negation modeling, going in depth through the state-of-the-art methods for automatic negation detection, from the pioneering work of Pang and Lee (2004) to more sophisticated techniques, such as some Semantic Composition methods, which even define the syntactic contexts of the polar expressions (Choi and Cardie 2008).
Despite the large interest in negation in the biomedical domain (Harkema et al. 2009, Desclés et al. 2010, Dalianis and Skeppstedt 2010, Vincze 2010), maybe due to the availability of free annotated corpora (Vincze et al. 2008), this paragraph focuses on the most significant works on sentiment analysis.

Bag of Words. Pang and Lee (2004) demonstrated that using contextual information can improve the polarity-classification accuracy. By means of a standard bag-of-words representation, they handled negation modeling by adding artificial words to texts, without any explicit knowledge of polar expressions (e.g. I do not NOT_like NOT_this NOT_new NOT_Nokia NOT_model). The problem with this method is the duplication of the features, which are always represented in both their plain and negated occurrences (Wiegand et al. 2010).
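The artificial-word strategy can be reproduced with a few lines of Python. The sketch below is an illustration and not the cited authors' code: the negation word list and the function name mark_negation are hypothetical, and we assume, as in common formulations of this technique, that the negated span runs from the negation word to the next punctuation mark.

```python
import re

# Hypothetical, deliberately small list of English negation words.
NEGATION_WORDS = {"not", "no", "never"}


def mark_negation(text):
    """Prefix with NOT_ every word between a negation word and the next punctuation mark."""
    tokens = re.findall(r"\w+|[^\w\s]", text)
    marked = []
    in_scope = False
    for token in tokens:
        if re.fullmatch(r"[^\w\s]", token):
            in_scope = False              # punctuation closes the negated span
            marked.append(token)
        elif token.lower() in NEGATION_WORDS:
            in_scope = True               # the negation word itself stays unmarked
            marked.append(token)
        elif in_scope:
            marked.append("NOT_" + token)
        else:
            marked.append(token)
    return " ".join(marked)


print(mark_negation("I do not like this new Nokia model"))
# I do not NOT_like NOT_this NOT_new NOT_Nokia NOT_model
```

The output makes the feature duplication visible: like and NOT_like end up as two unrelated entries of the bag-of-words vocabulary.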

Contextual Valence Shifters. Polanyi and Zaenen (2006) and Kennedy and Inkpen (2006) proposed a more sophisticated method that either switches (e.g. clever[+2] = not clever[-2]) or shifts (efficient[+2] = rather efficient[+1]) the initial scores of words when negated or downtoned in context. As a general rule, polarized expressions are considered to be negated if the negation words immediately precede them.
Wilson et al. (2005, 2009) distinguished the following three types of binary relationship polarity features for negation modeling:

• negation features: negation expressions that negate polar expressions;

• shifter features: expressions that can be approximately equated with downtoners;

• positive and negative polarity modification features: special kinds of polar expressions that modify other polar expressions, turning their polarity into the positive or negative one.

Semantic Composition. Moilanen and Pulman (2007) exploited the syntactic representations of sentences in a compositional semantics framework, with the purpose of computing the polarity of headlines and complex noun phrases. In this work, composition rules are incrementally applied to constituents, depending on the negation scope, by means of negation words, shifters and negative polar expressions.
A similar approach is the one of Shaikh et al. (2007), who used verb frames as a more abstract level of representation.
Choi and Cardie (2008) computed the polarity of phrases in two steps: the assessment of the polarity of the constituents and the subsequent application of a set of previously defined inference rules. For example, the rule [Polarity([NP1]- [IN] [NP2]-) = +] can be applied to phrases like [[lack]-_NP1 [of]_IN [crime]-_NP2 in rural areas].
Liu and Seneff (2009) and Taboada et al. (2011) proposed compositional models able to mix together intensifiers, downtoners, polarity shifters and negation words, always starting from the words’ prior polarities and intensities.
The work of Benamara et al. (2012) goes beyond the studies mentioned above, since they specifically account for negative polarity items and for multiple negatives.
Socher et al. (2013) exploited a Recursive Neural Tensor Network (RNTN) to compute two types of negation:

• negation of positive sentences, where the negation is supposed to change the overall sentiment of a sentence from positive to negative;

• negation of negative sentences, where the overall sentiment is considered to become less negative (but not necessarily positive).


Negation Scope Detection. Jia et al. (2009) introduced the concept of scope of the negation term t. The scope identification³ goes through the computing of a candidate scope (a subset of the words appearing after t in the sentence, which represent the minimal logical units of the sentence containing the scope) and then a pruning of those words. The computational procedure that approximates the candidate scope considers the following elements: static delimiters, e.g. because or unless, that signal the beginning of another clause; dynamic delimiters, e.g. like and for; and heuristic rules focused on polar expressions, which involve sentimental verbs, adjectives and nouns.

Morphological Negation modeling. In order to complete the literature survey on negation modeling, we cannot fail to mention morphological negation modeling as well. However, due to the fact that the scope of any morphological negation remains included within simple words (Councill et al. 2010), we examined this aspect of negation in depth in Sections 3.5.1 and 3.5.4 of Chapter 3. We mention again here only the most significant contributions on this topic: Hatzivassiloglou and McKeown (1997), Plag (2003), Yuen et al. (2004), Moilanen and Pulman (2008), Ku et al. (2009), Neviarouskaya (2010), Neviarouskaya et al. (2011) and Wang et al. (2011).

4.3.2 Negation in SentIta

As exemplified in the following sentences, negation indicators do not always switch a sentence polarity into its positive or negative counterpart (54); they often have the effect of increasing or decreasing the sentence score (55). That is why we prefer to talk about valence “shifting” rather than “switching”.

³ Scope Identification concerns the determination, at a sentence level, of the tokens that are affected by negation cues (Jia et al. 2009).


(54) Citroen non[neg] produce auto valide[+2] [-2]
“Citroen does not produce efficient cars”

(55) Grafica non proprio[neg] spettacolare[+3] [-1]
“The graphics are not quite spectacular”

Taboada et al. (2011, p. 277) made comparable considerations:

“Consider excellent, a +5 adjective. If we negate it, we get not excellent, which intuitively is a far cry from atrocious, a -5 adjective. In fact, not excellent seems more positive than not good, which would negate to a -3. In order to capture these pragmatic intuitions, we implemented another method of negation, a polarity shift (shift negation). Instead of changing the sign, the SO value is shifted toward the opposite polarity by a fixed amount (in our current implementation, 4). Thus a +2 adjective is negated to a -2, but the negation of a -3 adjective (for instance, sleazy) is only slightly positive, an effect we could call ‘damning with faint praise’. [...] it is very difficult to negate a strongly positive word without implying that a less positive one is to some extent true, and thus our negator becomes a downtoner.”

Just as Benamara et al. (2012, p. 11):

“(...) negation cannot be reduced to reversing polarity. For example, if we assume that the score of the adjective “excellent” is +3, then the opinion score in “this student is not excellent” cannot be -3. It rather means that the student is not good enough. Hence, dealing with negation requires to go beyond polarity reversal.”
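The difference between reversing the polarity and the shift negation described in the first quotation can be made concrete with a short sketch. It only restates the arithmetic given in the quoted passage (a fixed amount of 4 on a scale from -5 to +5); the function names are hypothetical and the code is not taken from the cited system.

```python
def switch_negation(so):
    """Polarity reversal: the sign of the Semantic Orientation is flipped."""
    return -so


def shift_negation(so, amount=4):
    """Shift the SO value toward the opposite polarity by a fixed amount."""
    if so > 0:
        return so - amount
    if so < 0:
        return so + amount
    return 0


for word, so in [("excellent", 5), ("good", 3), ("sleazy", -3)]:
    print(word, switch_negation(so), shift_negation(so))
# excellent -5 1
# good -3 -1
# sleazy 3 1
```

With the shift, not excellent (+1) remains more positive than not good (-1), which is the intuition the quoted authors wanted to capture.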

Even though the subtraction of a fixed amount of polarity points⁴ seems to us a risky generalization, the authors confirmed Horn’s ideas on the asymmetry between affirmative and negative sentences in natural language, differently from standard logic (Horn 1989).

⁴ Other strategies for automatic negation modeling have been proposed by Yessenalina and Cardie (2011), who combined words using iterated matrix multiplication, and Benamara et al. (2012), who, more in line with our research, computed negation on the basis of its own types.

Such an assumption complicates any attempt to automatically extract the meaning of negated phrases and sentences by simply switching their affirmative scores.
Our hypothesis, which fits easily into the compositional semantics approaches, is that the final polarity of a negated expression can be easily modulated, but only by taking into account, at the same time, both the prior polarity of the opinionated expressions and the strength of the negation indicators (see Tables 4.3, 4.5 and 4.6).
Despite this complexity, finite-state technology offered us the opportunity to easily compute the influence of negation on polarized expressions and to formalize the rules to automatically annotate real texts.
As regards the full list of negation indicators, we included in our grammar three types of negation (Benamara et al. 2012, Godard 2013), which respectively include the following items. As we will show, each type has a different impact on the opinion expressions in terms of polarity and intensity.

Negation operators

• simple adverbs of negation
no,AVV+NEGAZIONE
non,AVV+NEGAZIONE
mica,AVV+NEGAZIONE
affatto,AVV+NEGAZIONE+FORTE
neanche,AVV+NEGAZIONE
neppure,AVV+NEGAZIONE
manco,AVV+NEGAZIONE
senza,PREP+NEGAZIONE ⁵

• compound adverbs of negation
neanche per sogno,AVV+AVVC+PC+PN+NEGAZIONE+FORTE
per niente al mondo,AVV+AVVC+PCPC+PNPN+NEGAZIONE+FORTE
per nulla al mondo,AVV+AVVC+PCPC+PNPN+NEGAZIONE+FORTE
in nessun modo,AVV+AVVC+PDETC+PDETN+NEGAZIONE+FORTE
per niente,AVV+AVVC+PC+PAVV+NEGAZIONE+FORTE
per nulla,AVV+AVVC+PC+PAVV+NEGAZIONE+FORTE
niente affatto,AVV+AVVC+CC+AVVAVV+NEGAZIONE+FORTE ⁶

⁵ “not”, “at all”, “by no means”, “neither”

Negation operator Rules. Negation operators can appear alone (56) or, with a strengthening function, together with another negation adverb (57).

(56) {Mica | Affatto | Per niente} bello l’hotel
“Totally not cool the hotel”

(57) L’hotel non è {mica | affatto | per niente} bello
“The hotel is not cool at all”

The general rules concerning negation operators, formalized in the grammars in the form of FSA, are summarized in Tables 4.3, 4.4, 4.5 and 4.6. The local grammar principle is to put in the same embedded graph all the structures that are supposed to receive the same score. Such a score is indicated in the column Shifted Polarity.
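The “score box” principle can be mimicked with plain lookup tables. The following Python sketch copies the Word Polarity and Shifted Polarity columns of Tables 4.3 to 4.6; it illustrates the rules and is not the Nooj implementation. The labels plain, strong and weak, the four-entry operator inventory and the function name negate are hypothetical simplifications.

```python
# Prior polarity -> shifted polarity, copied from the tables.
NEGATION_RULES = {
    "plain":  {3: -1, 2: -2, 1: -2, -1: 1, -2: 1, -3: -1},         # Table 4.3
    "strong": {3: -2, 2: -3, 1: -3, -1: 2, -2: 2, -3: 1},          # Tables 4.4 and 4.5
    "weak":   {3: -1, 2: -1, 1: -1, 0: -1, -1: 1, -2: 1, -3: -1},  # Table 4.6
}

# One example operator per table; the full inventory is in the dictionary extracts.
OPERATOR_STRENGTH = {
    "non": "plain",
    "non mica": "strong",
    "per niente": "strong",
    "poco": "weak",
}


def negate(operator, prior):
    """Return the shifted polarity, or None when the tables define no rule."""
    strength = OPERATOR_STRENGTH[operator]
    return NEGATION_RULES[strength].get(prior)


print(negate("non", 2))         # as in example (54): valide[+2] becomes [-2]
print(negate("per niente", 1))  # carino[+1] under a strong operator
```

Each inner dictionary plays the role of the embedded graphs: all the structures that end in the same value are the ones the grammar groups under the same label.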

Negation Operator   Sentiment Word   Word Polarity   Shifted Polarity
non                 fantastico       +3              -1
non                 bello            +2              -2
non                 carino           +1              -2
non                 scialbo          -1              +1
non                 brutto           -2              +1
non                 orribile         -3              -1

Table 4.3: Negation rules

Negation Operator   Negation Operator   Sentiment Word   Word Polarity   Shifted Polarity
non                 mica                fantastico       +3              -2
non                 mica                bello            +2              -3
non                 mica                carino           +1              -3
non                 mica                scialbo          -1              +2
non                 mica                brutto           -2              +2
non                 mica                orribile         -3              +1

Table 4.4: Negation rules with the repetition of negative operators

Strong Negation Operator   Sentiment Word   Word Polarity   Shifted Polarity
per niente                 fantastico       +3              -2
per niente                 bello            +2              -3
per niente                 carino           +1              -3
per niente                 scialbo          -1              +2
per niente                 brutto           -2              +2
per niente                 orribile         -3              +1

Table 4.5: Negation rules with strong operators

Weak Negation Operator   Sentiment Word   Word Polarity   Shifted Polarity
poco                     fantastico       +3              -1
poco                     bello            +2              -1
poco                     carino           +1              -1
poco                     nuovo            0               -1
poco                     scialbo          -1              +1
poco                     brutto           -2              +1
poco                     orribile         -3              -1

Table 4.6: Negation rules with weak operators

Negative Quantifiers

• Adjectives
nessuno,A+FLX=A114+NEGAZIONE
niente,A+FLX=A605+NEGAZIONE ⁷

• Adverbs
niente,AVV+NEGAZIONE
nulla,AVV+NEGAZIONE
poco,AVV+NEGAZIONE+DEB
poco o niente,AVV+PCONG+NKN+NEGAZIONE+DEB
poco o nulla,AVV+PCONG+NKN+NEGAZIONE+DEB
mai,AVV+NEGAZIONE ⁸

• Pronouns
nessuno,PRON+FLX=PRO717+NEGAZIONE
niente,PRON+FLX=PRO719+NEGAZIONE
poco,PRON+FLX=PRO721+NEGAZIONE+DEB ⁹

⁶ “no way”, “for anything in the world”, “there’s no way”, “for nothing”, “not at all”
⁷ “nobody”, “nothing”

⁸ “nothing”, “little”, “little or nothing”, “never”
⁹ “not one”, “few”

Negative Quantifiers Rules. As indicated in the dictionary extract above, and as exemplified in the sentences below, negative quantifiers, which express both a negation and a quantification, can take the shape of different parts of speech and, consequently, can assume different syntactic positions in sentences.
We can directly observe that they change their function on the basis of the different roles they assume.

Negative quantifiers as Adjectives: they tend to be pragmatically negative when occurring as adjectival modifiers (58, 59), just in the same way as lexical negation indicators (Potts 2011).

(58) Cortesia nessuna [-2]
“No kindness”
(59) Nessun servizio nelle stanze [-2]
“No room service”

Negative quantifiers as Adverbs: they work exactly as poco when they occupy the adverbial position (60, 61), following the rules formalized in the next paragraph (see Table 4.6). As adverbs, they can occur together with negation operators (61) or not (60).

(60) Costa quasi niente [+1]
“It costs almost nothing”
(61) Non costa nulla [+2]
“It doesn’t cost anything”

Negative quantifiers as Pronouns: they seem to possess the same strengthening function as negative operators when occurring as pronouns (62, 64); the rules of reference are in Table 4.5.

(62) Non lo consiglio a nessuno [-3]
“I don’t recommend it to anyone”
(64) Non gliene frega niente a nessuno [-3]
“Nobody cares”

Negative quantifiers in verbless sentences: in elliptical sentences (65, 66) they resemble the negation function of negative operators, following the rules of Table 4.3.

(65) Niente di veramente innovativo [-1]
“Nothing truly innovative”
(66) Nulla da eccepire [+1]
“No objections”


Lexical Negation
Nouns: mancanza, assenza, carenza
Verbs: mancare, difettare, scarseggiare
Adjectives: mancante, carente, privo
+ di ¹⁰

Lexical Negation Rules. Lexical negation regards content word negators that always possess an implicit negative influence. Therefore, when they occur in texts, their polarity is assumed to be negative.
Because they can co-occur with both negative operators and quantifiers, their polarity can also be shifted to positive by simply applying the negation rules conceived for the first two negation indicators.

4.4 Modality Detection

4.4.1 Related Works

Speculative Language Detection (Dalianis and Skeppstedt 2010; Desclés et al. 2010; Vincze et al. 2008), Hedge Detection (Lakoff 1973; Ganter and Strube 2009; Zhao et al. 2010), Irrealis Blocking (Taboada et al. 2011) and Uncertainty Detection (Rubin 2010) are all tasks that completely or partially overlap with the analysis of modal constructions.

¹⁰ “lack”, “to lack in”, “to be lacking of”


To account for the most popular annotated corpora in these NLP fields, we mention the BioScope corpus (Vincze et al. 2008), which has been annotated with negation and speculation cues and their scopes; FactBank, which has been annotated with event factuality (Saurí and Pustejovsky 2009); and the CoNLL 2010 Shared Task (Farkas et al. 2010), which focused on hedge detection in natural language texts.
Modality has been organized by Kobayakawa et al. (2009) into a taxonomy that includes request, recommendation, desire, will, judgment, etc.
Taboada et al. (2011) did not consider irrealis markers to be reliable for the purposes of sentiment analysis. Therefore, they simply ignored the semantic orientation of any word in their scope. Their list of irrealis markers includes modals, conditional markers (e.g. “if”), negative polarity items (e.g. “any” and “anything”), some verbs (“to expect”, “to doubt”), questions and words enclosed in quotes (which may not necessarily reflect the author’s opinion).
According to Benamara et al. (2012), modality can be used to express possibility, necessity, permission, obligation or desire through grammatical cues such as adverbial phrases (e.g. “maybe”, “certainly”), conditional verbal moods, some verbs (e.g. “must”, “can”, “may”), and some adjectives and nouns (e.g. “a probable cause”).
Relying on the categories of Larreya (2004) and Portner (2009), they distinguished the following three types of modality, each of which has been indicated to possess a specific effect on the opinion expression in its scope, in terms of strength or degree of certainty:

Buletic Modality: it indicates the speaker’s desires/wishes.
Cues: verbs denoting hope (e.g. “to wish”).

Epistemic Modality: it refers to the speaker’s personal beliefs and affects the strength and the certainty degree of opinion expressions.
Cues: doubt, possibility or necessity adverbs (e.g. “perhaps”, “definitely”) and modal verbs (e.g. “have to”, “may/can”).

Deontic Modality: it indicates possibility/impossibility or obligation/permission.
Cues: the same modal verbs of the epistemic modality, but with a deontic reading (e.g. “You must go see the movie”).

4.4.2 Modality in SentIta

When computing the Prior Polarities of the SentIta items in their textual context, we considered that modality can also have a significant impact on the SO of sentiment expressions.
In line with the trends in the literature, but without specifically focusing on the Benamara et al. (2012) modality categories, we recalled in the FSA dedicated to modality the following linguistic cues and made them interact with the SentIta expressions:

Sharpening and Softening Adverbs:

• simple adverbs (32 sharp and 12 soft)
inequivocabilmente,AVV+FORTE+Focus=SHARP
necessariamente,AVV+FORTE+Focus=SHARP
relativamente,AVV+DEB+Focus=SOFT
suppergiù,AVV+DEB+Focus=SOFT

• compound adverbs (38 sharp and 26 soft)
con ogni probabilità,AVV+AVVC+PDETC+PDETN+Focus=SHARP
senza alcun dubbio,AVV+AVVC+PDETC+PDETN+Focus=SHARP
fino a un certo punto,AVV+AVVC+PAC+PDETAGGN+Focus=SOFT
in qualche modo,AVV+AVVC+PDETC+PDETN+Focus=SOFT

Modal Verbs: from the LG class 56 (dovere “have to”, potere “may/can”, volere “want”), which have as definitional structure N0 V (E + Prep) Vinf W and accept in position N1 a direct or prepositional infinitive clause (all) and, only in the cases of potere and volere, a N1-um.

Conditional and Imperfect Tenses: tenses are located, after text preprocessing, through the annotations Tempo=IM and Tempo=C.

Dealing with modality is very challenging in the Sentiment Analysis field because, in this research area, it is not enough to simply detect its indicators: it is also necessary to manage its effects in terms of polarity. We did not find it appropriate to simplify the problem by ignoring the semantic orientation of all the phrases and sentences in which modality can be detected (Taboada et al. 2011). In fact, as exemplified below, there are cases in which modality noticeably affects the intensity of the opinion and other cases in which it regularly shifts the sentence polarity in specific directions.

Uncertainty Degrees: the words annotated with the labels +SHARP and +SOFT described above, just like intensifiers and downtoners, have the power of modifying the intensity of the words endowed with positive or negative prior polarities, with the same effects described in Section 4.2. We selected them, respectively, among the words tagged with the labels FORTE and DEB, with the sole aim of providing an exhaustive module that can work well for both Modality Detection and Sentiment Analysis purposes.

“Potere” + Indicative Imperfect + Oriented Item: modal verbs occurring with an imperfect tense can turn a sentence into a weakly negative one (66) when combined with positive words, or into a weakly positive one when occurring with negative items (67). The effect is very close to that of weak negation (see Table 4.6 in Section 4.3). This can be explained by the fact that the meaning of sentences of this kind always concerns subverted expectations.

(66) Poteva[Modal+IM] essere una trama interessante[+2] [-1]
“It could be an interesting plot”

(67) Poteva[Modal+IM] essere una trama noiosa[-2] [+1]
“It could be a boring plot”

“Potere” + Indicative Imperfect + Comparative + Oriented Items: also in this case we use the explanation of subverted expectations. The terms of comparison are, obviously, the expected opinion object and the real one. If the expected object is better than the real one, the sentence score is negative (68); if it is worse, the sentence score is positive (69). The rules for the polarity score attribution are the ones displayed in Section 4.5.

(68) Poteva[Modal+IM] essere più[I-CW] dettagliata[+2] [-1]
“It might have been more detailed”

(69) Poteva[Modal+IM] andare peggio[I-OpW -2] [+1]
“It might have gone worse”

“Dovere” + Indicative Imperfect: here again we face a disappointment of expectations, but with a stronger effect: the negation does not regard the possibility of the opinion object being good, but a necessity. Therefore, we assumed the final score in these cases to follow the negation rules shown in Table 4.3 in Section 4.3 (70).

(70) Questo doveva[Modal+IM] essere un film di sfumature[+1] [-2]
“This one was supposed to be a nuanced movie”

“Dovere” + “Potere” + Past Conditional: past conditional sentences (Narayanan et al. 2009) also have a particular impact on the opinion polarity, especially with the modal verb dovere (“have to”). In such cases, as exemplified in (71), the sentence polarity is always negative in our corpus.

(71) Non[Negation] avrei[Aux+C] dovuto[Modal+PP] buttare via i miei soldi [-2]
“I should not have burnt my money”

The verb potere instead follows the same rules described abovefor the imperfect tense
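The modality rules described in this paragraph can be summarized as a lookup from a modal verb and its tense to the family of shifting rules that applies. The following is only a sketch of that summary: the tense labels and the names `ShiftRule` and `modality_rule` are hypothetical (the FSA itself relies on annotations such as Tempo=IM and Tempo=C), and the comparative variant of potere, which follows the rules of Section 4.5, is left out.

```python
from enum import Enum
from typing import Optional


class ShiftRule(Enum):
    WEAK_NEGATION = "weak negation rules (Table 4.6)"
    NEGATION = "negation rules (Table 4.3)"
    ALWAYS_NEGATIVE = "sentence polarity always negative"


# Hypothetical tense labels used only in this sketch.
IMPERFECT = "imperfect"
PAST_CONDITIONAL = "past_conditional"

_RULES = {
    ("potere", IMPERFECT): ShiftRule.WEAK_NEGATION,
    ("dovere", IMPERFECT): ShiftRule.NEGATION,
    ("dovere", PAST_CONDITIONAL): ShiftRule.ALWAYS_NEGATIVE,
    # potere in the past conditional follows the same rules as in the imperfect
    ("potere", PAST_CONDITIONAL): ShiftRule.WEAK_NEGATION,
}


def modality_rule(modal: str, tense: str) -> Optional[ShiftRule]:
    """Return the rule family for a modal/tense pair, or None if no rule is described."""
    return _RULES.get((modal.lower(), tense))


if __name__ == "__main__":
    print(modality_rule("potere", IMPERFECT))
    print(modality_rule("dovere", PAST_CONDITIONAL))
```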

4.5 Comparative Sentences Mining

4.5.1 Related Works

Comparative Sentence Mining and Comparison Mining Systems are NLP techniques and tools that only partly coincide with the Sentiment Analysis field.
Sentences that express a comparison generally carry opinions about two or more entities with regard to their shared features or attributes (Ganapathibhotla and Liu 2008). While the main purpose of Sentiment Analysis is to identify details about such opinions, the main goals of Information Extraction regard comparative sentence detection and classification. Therefore, although the object of the analysis and the basic information units are shared by the two approaches, their purposes sometimes coincide and sometimes differ.
Among the few studies on the computational treatment of comparison (Jindal and Liu 2006a,b; Ganapathibhotla and Liu 2008; Fiszman et al. 2007; Yang and Ko 2011), only Ganapathibhotla and Liu (2008) tried to extract the authors’ preferred entities and to identify the Semantic Orientation of comparative sentences after their identification and classification.
With the following two examples, Ganapathibhotla and Liu (2008) showed the difficulty of determining the polarity of context-dependent opinion comparatives, which always require domain knowledge for a correct interpretation: e.g. the word “longer” can assume a positive (72) or negative (73) orientation, depending on the product feature with which it is associated.

(72) “The battery life of Camera X is longer than Camera Y”
(73) “Program X’s execution time is longer than Program Y”

Since this work aims to provide a general-purpose lexical and grammatical database for Sentiment Analysis and Opinion Mining, we did not address here the treatment of this special kind of opinionated sentences, just as we did not include in the lexicon items with domain-dependent meaning.
Concerning the comparative classification, the main types that have been identified in the literature are the following:

• Gradable (direct comparison)
  – Non-equal
      Increasing
      Decreasing
  – Equative
  – Superlative
• Non-gradable (implicit comparisons)

According to Ganapathibhotla and Liu (2008) and Jindal and Liu (2006a,b), a comparative relation can be represented in the following way:

<ComparativeWord, Features, EntityS1, EntityS2, Type>

Where ComparativeWord is the keyword used to express a comparative relation in a sentence, Features is a set of features being compared, EntityS1 is the first term of comparison and EntityS2 is the second one.
This way, examples (72) and (73) are represented as follows:

(72) <longer, battery life, Camera X, Camera Y, Non-equal Gradable>
(73) <longer, execution time, Program X, Program Y, Non-equal Gradable>
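The five-slot relation above maps naturally onto a small record type. The sketch below is only one possible encoding; the class and field names are hypothetical and simply mirror the slot names used in the text.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ComparativeRelation:
    comparative_word: str      # keyword expressing the comparison
    features: Tuple[str, ...]  # features being compared
    entity_s1: str             # first term of comparison
    entity_s2: str             # second term of comparison
    type: str                  # comparative type, e.g. "Non-equal Gradable"


example_72 = ComparativeRelation(
    "longer", ("battery life",), "Camera X", "Camera Y", "Non-equal Gradable"
)
example_73 = ComparativeRelation(
    "longer", ("execution time",), "Program X", "Program Y", "Non-equal Gradable"
)

if __name__ == "__main__":
    print(example_72)
    print(example_73)
```

The two records share the same comparative word and type, which is exactly why their polarity cannot be decided without knowing the feature being compared.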

Our work shares with Ganapathibhotla and Liu (2008) the following basic rules:

Increasing comparative + Negative item → Negative opinion → EntityS2 is preferred
Increasing comparative + Positive item → Positive opinion → EntityS1 is preferred
Decreasing comparative + Negative item → Positive opinion → EntityS1 is preferred
Decreasing comparative + Positive item → Negative opinion → EntityS2 is preferred
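The four basic rules above amount to a sign flip for decreasing comparatives. A minimal sketch, with a hypothetical function name and string labels chosen only for illustration:

```python
from typing import Tuple


def comparative_opinion(direction: str, item_polarity: int) -> Tuple[str, str]:
    """Return (opinion, preferred entity) for a comparative word plus an oriented item."""
    if direction not in ("increasing", "decreasing"):
        raise ValueError("direction must be 'increasing' or 'decreasing'")
    if item_polarity == 0:
        raise ValueError("the item must be positively or negatively oriented")
    positive = item_polarity > 0
    if direction == "decreasing":
        positive = not positive  # a decreasing comparative reverses the orientation
    return ("positive", "EntityS1") if positive else ("negative", "EntityS2")


if __name__ == "__main__":
    for direction in ("increasing", "decreasing"):
        for polarity in (+2, -2):
            print(direction, polarity, comparative_opinion(direction, polarity))
```

This sketch only reproduces the orientation and the preferred entity; it deliberately ignores the degrees of intensity discussed in the rest of this Section.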

However, differently from our work, the authors simply switched the polarity of the opinions (and, consequently, the preferred entity) in the cases in which comparative sentences contained negation words or phrases. They underlined that this operation can sometimes be problematic, e.g. “not longer” does not mean “shorter”.
Actually, the problem is that the authors did not take into account different degrees of positivity and negativity. As we will show in the next paragraphs of this Section, the co-occurrences of weakly positive/negative and intensely positive/negative Prior Polarities with comparison indicators can generate different final sentence scores.
As regards the automatic analysis of superlatives, we mention the work of Bos and Nissim (2006), who grounded the semantic interpretation of superlatives on the characterization of a comparison set (the set of entities that are compared to each other with respect to a certain dimension). The authors took into consideration the superlative modification performed by ordinals, cardinals or adverbs (e.g. intensifiers or modals) and also the fact that the superlative can manifest itself in both predicative and attributive form.
Their superlative parser, able to recognise a superlative expression and its comparison set with high accuracy, provides a Combinatory Categorial Grammar (CCG) (Clark and Curran 2004) derivation of the input sentence, which is then used to construct a Discourse Representation Structure (DRS) (Kamp and Reyle 1993).
In this thesis we did not go through a semantic and logical analysis of comparative and superlative sentences: as explained in the next paragraph, we just used them as Contextual Valence Shifters (CVS) in order to accurately modify the polarity of the entities they involve.

4.5.2 Comparison in SentIta

In general, this work, which focuses on Gradable comparison, includes the analysis of the following comparative structures:

• equative comparative frozen sentences of the type N0 Agg come C1 from the LG class PECO (see Section 3.4.2);
• opinionated comparative sentences that involve the expressions meglio di, migliore di “better than”; peggio di, peggiore di “worse than”; superiore a “superior to”; inferiore a “less than” (74);
• adjectival, adverbial and nominal comparative structures that respectively involve oriented adjectives, adverbs and nouns from SentIta (75);
• absolute superlative (75);
• comparative superlative (76).

The comparison with other products (74) has been evaluated with the same measures as the other sentiment expressions, so the polarity can range from -3 to +3.
The superlative, which may (76) or may not (75) imply a comparison with a whole class of items (Farkas and Kiss 2000), confers on the first term of the comparison the highest polarity score, so it always increases the strength of the opinion. Its polarity can be -3 or +3.

(74) L’S3 è complessivamente superiore all’Iphone5 [+2]
“The S3 is on the whole superior to the iPhone5”

(75) Il suo motore era anche il più brioso[+2] [+3]
“Its engine was also the most lively”

(76) Un film peggiore di qualsiasi telefilm [-3]
“A film worse than whatever tv series”
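The behaviour of the superlative described above (the score is always pushed to the end of the scale, keeping the orientation of the item) can be sketched in one function. The name is hypothetical, and the sketch covers only the scoring, not the recognition of superlative structures.

```python
def superlative_score(prior: int) -> int:
    """Score of a superlative built on an oriented item: always -3 or +3."""
    if prior == 0:
        raise ValueError("the superlative rule applies to oriented items only")
    return 3 if prior > 0 else -3


if __name__ == "__main__":
    # e.g. brioso has prior polarity +2, so "il più brioso" is scored +3, as in (75)
    print(superlative_score(2))
```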

We did not include the equative comparative in the Contextual Valence Shifters Section because it does not change the polarity expressed by the opinion words (77).

(77) [Non[Negation] entusiasmante[+3]][-1] come i PES degli anni precedenti
“Not as exciting as the PES of the past years”

In the next paragraphs we will go through the rules used in this FSA network to compute the Semantic Orientations of minority and majority comparative structures. As will be observed, the variety and the complexity of co-occurrence rules can be easily managed using finite-state technology.

4.5.3 Interaction with the Intensification Rules

Rule I: increasing comparative sentences which occur with positive items from SentIta generate a positive opinion.
Comparative indicators are the following:

• Increasing Comparative Words (I-CW), e.g. più “more”
• Intensifiers + Increasing Comparative Words (Int + I-CW), e.g. molto più “a lot more”

The sentence score depends on the interaction of I-CWs, intensifiers and the prior polarity scores attributed to the SentIta words in the electronic dictionaries (see Table 4.7).

Rule II: increasing comparative sentences with increasing opinionated comparative items, listed in SentIta with positive Prior Polarities, generate a positive opinion.
Comparative indicators are the following:

• Increasing Opinionated Comparative Words (I-Op-CW), e.g. meglio “better” = +2
• Intensifiers + Increasing Opinionated Comparative Words (Int + I-Op-CW), e.g. molto meglio “a lot better” = +3

The sentence score just depends on the Prior Polarity of the I-Op-CW.

                                     Positive Prior Polarities
Increasing comparative   Examples    carino   buono   prodigioso
                                     +1       +2      +3
                                     Entity 1 Polarities
I-CW + OpW[POS]          più         +1       +2      +3
Int + I-CW + OpW[POS]    molto più   +2       +3      +3

Table 4.7: Rule I, increasing comparative sentences with positive orientation

                                      Positive Prior Polarities
Decreasing comparative   Examples     carino   buono   prodigioso
                                      +1       +2      +3
                                      Entity 1 Polarities
D-CW + OpW[POS]          meno         -1       -2      -1
Int + D-CW + OpW[POS]    molto meno   -2       -3      -1

Table 4.8: Rule III, decreasing comparative sentences with negative orientation

Rule III: decreasing comparative sentences which occur with positive items from SentIta generate a negative opinion (see Table 4.8).
Comparative indicators are the following:

• Decreasing Comparative Words (D-CW), e.g. meno “less”
• Intensifiers + Decreasing Comparative Words (Int + D-CW), e.g. molto meno “much less”

                                     Negative Prior Polarities
Increasing comparative   Examples    distratto   cafone   osceno
                                     -1          -2       -3
                                     Entity 1 Polarities
I-CW + OpW[NEG]          più         -1          -2       -3
Int + I-CW + OpW[NEG]    molto più   -2          -3       -3

Table 4.9: Rule IV, increasing comparative sentences with negative orientation

                                      Negative Prior Polarities
Decreasing comparative   Examples     distratto   cafone   osceno
                                      -1          -2       -3
                                      Entity 1 Polarities
D-CW + OpW[NEG]          meno         +1          +1       +1
Int + D-CW + OpW[NEG]    molto meno   +2          +2       +1

Table 4.10: Rule VI, decreasing comparative sentences with positive orientation

Rule IV: this rule is the exact mirror image of Rule I. Increasing comparative sentences which occur with negative items from SentIta generate a negative opinion.
The sentence score depends again on the interaction of I-CWs, intensifiers and the prior polarity scores attributed to the SentIta words that appear in the same sentence (see Table 4.9).

Rule V: decreasing comparative sentences with decreasing opinionated comparative items, listed in SentIta with negative Prior Polarities, generate a negative opinion.
Comparative indicators are the following:

• Decreasing Opinionated Comparative Words (I-Op-CW), e.g. peggio “worse” = -2
• Intensifiers + Decreasing Opinionated Comparative Words (Int + I-Op-CW), e.g. molto peggio “much worse” = -3

Just as in Rule II, the sentence score just depends on the Prior Polarity of the I-Op-CWs.

                                              Positive Prior Polarities
Increasing comparative       Examples        carino   buono   prodigioso
                                             +1       +2      +3
                                             Entity 1 Polarities
N + I-CW + OpW[POS]          non più         -1       -1      -1
N + Int + I-CW + OpW[POS]    non molto più   -1       -1      -1

Table 4.11: Negation of Rule I, negated increasing comparative sentences with negative orientation

                                               Positive Prior Polarities
Decreasing comparative       Examples         carino   buono   prodigioso
                                              +1       +2      +3
                                              Entity 1 Polarities
N + D-CW + OpW[POS]          non meno         +1       +2      +3
N + Int + D-CW + OpW[POS]    non molto meno   +1       +1      +2

Table 4.12: Negation of Rule III, negated decreasing comparative sentences with positive orientation

Rule VI: decreasing comparative sentences which occur with negative items from SentIta generate a positive opinion (see Table 4.10).
Rule VI shares the comparative indicators with Rule III.
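Rules I, III, IV and VI can be condensed into a single lookup keyed by the kind of comparative word and the presence of an intensifier. The sketch below only transcribes the values of Tables 4.7 to 4.10; the dictionary layout and the function name are hypothetical and do not reproduce the FSA implementation.

```python
# (comparative word, intensified?) -> {prior polarity: sentence score}
COMPARATIVE_SCORES = {
    ("I-CW", False): {1: 1, 2: 2, 3: 3, -1: -1, -2: -2, -3: -3},  # Tables 4.7, 4.9
    ("I-CW", True): {1: 2, 2: 3, 3: 3, -1: -2, -2: -3, -3: -3},   # Tables 4.7, 4.9
    ("D-CW", False): {1: -1, 2: -2, 3: -1, -1: 1, -2: 1, -3: 1},  # Tables 4.8, 4.10
    ("D-CW", True): {1: -2, 2: -3, 3: -1, -1: 2, -2: 2, -3: 1},   # Tables 4.8, 4.10
}


def comparative_score(comparative_word: str, prior: int, intensified: bool = False) -> int:
    """Sentence score for I-CW (più) or D-CW (meno) combined with an oriented word."""
    try:
        return COMPARATIVE_SCORES[(comparative_word, intensified)][prior]
    except KeyError:
        raise ValueError(f"no rule for {comparative_word!r} with prior {prior}") from None


if __name__ == "__main__":
    print(comparative_score("I-CW", 2, intensified=True))  # molto più buono
    print(comparative_score("D-CW", 3))                    # meno prodigioso
```

Keeping the values in a table rather than in arithmetic makes the non-linear cells (e.g. meno prodigioso scored -1) explicit.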

4.5.4 Interaction with the Negation Rules

Negation of Rule I: increasing comparative sentences which occur with positive items from SentIta, when negated, generate a negative opinion.
Comparative indicators are, of course, the same as those of Rule I, with the addition of negation indicators (N), e.g. non “not”. The sentence score is always turned into the weakly negative one (see Table 4.11).

                                              Negative Prior Polarities
Increasing comparative       Examples        distratto   cafone   osceno
                                             -1          -2       -3
                                             Entity 1 Polarities
N + I-CW + OpW[NEG]          non più         +1          +1       -1
N + Int + I-CW + OpW[NEG]    non molto più   +1          +1       -2

Table 4.13: Negation of Rule IV, negated increasing comparative sentences with negative orientation

                                               Negative Prior Polarities
Decreasing comparative       Examples         distratto   cafone   osceno
                                              -1          -2       -3
                                              Entity 1 Polarities
N + D-CW + OpW[NEG]          non meno         -1          -2       -3
N + Int + D-CW + OpW[NEG]    non molto meno   -1          -2       -3

Table 4.14: Negation of Rule VI, negated decreasing comparative sentences with positive orientation

Negation of Rule II: increasing comparative sentences in which increasing opinionated comparative items, listed in SentIta with positive Prior Polarities, occur always generate a negative opinion when negated.
Comparative indicators are the same as those of Rule II:

• Negation markers + Increasing Opinionated Comparative Words (N + I-Op-CW), e.g. non meglio “not better” = -2
• Negation markers + Intensifiers + Increasing Opinionated Comparative Words (N + Int + I-Op-CW), e.g. non molto meglio “not much better” = -1


Negation of Rule III: the negation of Rule III, which involves decreasing comparative sentences with positive items from SentIta, always generates positive opinions.
The different sentence scores, computed on the basis of the interaction of comparative words and SentIta’s Prior Polarities, are reported in Table 4.12.

Negation of Rule IV: increasing comparative sentences which occur with negative items from SentIta generate, when negated, positive opinions in most cases (see Table 4.13). The sentence score is influenced by the interaction of negation markers, I-CWs, intensifiers and negative prior polarity scores.

Negation of Rule V: decreasing comparative sentences in which increasing opinionated comparative items, listed in SentIta with negative Prior Polarities, occur generate a weakly negative opinion when negated (non (molto + E) peggio “not (much + E) worse”).

Negation of Rule VI: decreasing comparative sentences which occur with negative items from SentIta generate negative opinions if negated (see Table 4.14).
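The negated counterparts of Rules I, III, IV and VI can be transcribed in the same tabular way. As before, this is only a sketch of Tables 4.11 to 4.14 with hypothetical names; it shows why a plain polarity switch, as in Ganapathibhotla and Liu (2008), would not reproduce these scores.

```python
# (comparative word, intensified?) -> {prior polarity: score of the negated sentence}
NEGATED_COMPARATIVE_SCORES = {
    ("I-CW", False): {1: -1, 2: -1, 3: -1, -1: 1, -2: 1, -3: -1},  # Tables 4.11, 4.13
    ("I-CW", True): {1: -1, 2: -1, 3: -1, -1: 1, -2: 1, -3: -2},   # Tables 4.11, 4.13
    ("D-CW", False): {1: 1, 2: 2, 3: 3, -1: -1, -2: -2, -3: -3},   # Tables 4.12, 4.14
    ("D-CW", True): {1: 1, 2: 1, 3: 2, -1: -1, -2: -2, -3: -3},    # Tables 4.12, 4.14
}


def negated_comparative_score(comparative_word: str, prior: int, intensified: bool = False) -> int:
    """Sentence score for non + (molto) + più/meno combined with an oriented word."""
    try:
        return NEGATED_COMPARATIVE_SCORES[(comparative_word, intensified)][prior]
    except KeyError:
        raise ValueError(f"no rule for {comparative_word!r} with prior {prior}") from None


if __name__ == "__main__":
    print(negated_comparative_score("I-CW", 3))   # non più prodigioso
    print(negated_comparative_score("D-CW", -2))  # non meno cafone
```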

Chapter 5

Specific Tasks of the Sentiment Analysis

5.1 Sentiment Polarity Classification

Differently from traditional topic-based text classification, sentiment classification aims to categorize documents through the classes positive and negative. The objective can be a simple binary classification, or it can come after a more complex classification with a larger number of classes in the continuum between positive and negative (Pang and Lee 2008a). The rating-inference problem (Pang and Lee 2005; Leung et al. 2006, 2008; Shimada and Endo 2008) regards the mapping of automatically detected document polarity scores onto the fine-grained rating scales (e.g. 510) of existing Collaborative Filtering algorithms (Breese et al. 1998; Herlocker et al. 1999; Sarwar et al. 2001; Linden et al. 2003). The challenging aspect of sentiment classification is the fact that the sentiment classes, far from being unrelated to one another, represent opposing (binary classification) or ordinal categories (multi-class and rating-inference).


5.1.1 Related Works

Many techniques have been discussed in the literature to perform Sentiment Polarity Classification. In Section 3.2 we divided them into lexicon-based methods, learning methods and hybrid methods¹.
The first, which calculate the sentiment orientation of sentences and documents starting from the polarity of the words and phrases they contain, are the methods that have been chosen for the present work. This is why they received an in-depth analysis in Chapter 3, where we also discussed the pros and cons of both the manual and the automatic drafting of sentiment lexical resources. In this respect, we mention again the major contributions related to knowledge-based solutions: Landauer and Dumais (1997), Hatzivassiloglou and McKeown (1997), Wiebe (2000), Turney (2002), Turney and Littman (2003), Riloff et al. (2003), Hu and Liu (2004), Kamps et al. (2004), Vermeij (2005), Gamon and Aue (2005), Esuli and Sebastiani (2005, 2006a), Kanayama and Nasukawa (2006a), Benamara et al. (2007), Kaji and Kitsuregawa (2007), Rao and Ravichandran (2009), Mohammad et al. (2009), Neviarouskaya et al. (2009a), Velikovich et al. (2010), Taboada et al. (2006, 2011).
Rule-based approaches that take into account the syntactic dimension of Sentiment Analysis are the ones used by Mulder et al. (2004), Moilanen and Pulman (2007) and Nasukawa and Yi (2003).
The most used classification algorithms in learning and statistical methods are Support Vector Machines, which, trained on specific data-sets, map a space of unstructured data points onto a new structured space through a kernel function, and Naïve Bayes, i.e. probabilistic classifiers that assume the presence or absence of each feature to be independent of the presence or absence of the other features, given the class variable. Effective sentiment features exploited in supervised machine learning applications are terms, their frequencies and TF-IDF, parts of speech, sentiment words and phrases, sentiment shifters, syntactic dependency, etc.
Tan et al. (2009) and Kang et al. (2012) adapted Naïve Bayes algorithms for sentiment analysis.
Pang et al. (2002), with the purpose of testing the hypothesis that sentiment classification can be treated just as a special case of topic-based categorization, with the positive and negative sentiment considered as topics, used SVM, Naïve Bayes and Maximum Entropy classifiers with diverse sets of features.
Mullen and Collier (2004) employed the values potency, activity and evaluative from a variety of sources and created a feature space classified by means of SVM.
Ye et al. (2009) compared SVM with other statistical approaches, showing that SVM and n-gram outperform the Naïve Bayes approach.
The minimum-cut framework working on graphs has been used by Pang and Lee (2004).
Bespalov et al. (2011) performed sentiment classification via latent n-gram analysis.
Nakagawa et al. (2010) presented a dependency tree-based classification method grounded on conditional random fields.
Compositionality algorithms have been tested by Yessenalina and Cardie (2011) and Socher et al. (2013). Socher et al. (2013) proposed the Recursive Neural Tensor Network, able to compute compositional vector representations for phrases of variable length and syntactic type, obtaining better performance than standard recursive neural networks, matrix-vector recursive neural networks, Naïve Bayes, bi-gram Naïve Bayes and SVM. Their output is the annotated corpus Stanford Sentiment Treebank, which represents a precious resource for the analysis of the compositional effects of sentiment in language.
Sentiment indicators can be used for sentiment classification in an unsupervised manner: this is the case of Turney (2002) and the other lexicon-based methods (Liu 2012).
Finally, as regards the hybrid methods, we mention the works of Andreevskaia and Bergler (2008), who integrated the accuracy of a corpus-based system with the portability of a lexicon-based system in a single ensemble of classifiers; Aue and Gamon, who combined nine SVM-based classifiers; Dasgupta and Ng (2009), who first mined the unambiguous reviews through spectral techniques and afterwards exploited them to classify the ambiguous reviews by combining active learning, transductive learning and ensemble learning; Goldberg and Zhu (2006), who presented an adaptation of graph-based learning algorithms for rating-inference problems; and Prabowo and Thelwall (2009), who combined rule-based and machine learning classification algorithms.

¹ In general, while hybrid methods are characterized by the application of classifiers in sequence, approaches based on rules consist of an if-then relation between an antecedent and its associated consequent (Prabowo and Thelwall 2009).

5.1.2 DOXA: a Sentiment Classifier based on FSA

Using the command-line program noojapply.exe, we built Doxa, a prototype written in Java by which users can automatically apply the Nooj lexicon-grammatical resources from Section 3 and the rules from Section 4 to every kind of text, getting back statistics on the opinions expressed in each case.
Doxa sums up the values corresponding to every sentiment expression and then normalizes the result by the total number of sentiment expressions contained in each review. Neutral sentences are simply ignored by both the FSA and Doxa.
The automatically attributed polarity scores are compared with the stars that the opinion holder gave to his or her review. Then, through the statistics about the opinions expressed in every domain, Doxa tests the consistency of the sentiment lexical databases and tools with respect to the rating-inference problem, in each document and in the whole corpus.
Figure 5.1 reports an example of the analysis on the domain of hotel reviews.
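The scoring step described above (sum of the expression values, divided by the number of sentiment expressions, with neutral items ignored) can be sketched as follows. Doxa itself is written in Java; this Python fragment is only an illustration, and the function names and the three-way label are assumptions made for the example.

```python
from typing import Iterable, Optional


def review_score(expression_scores: Iterable[int]) -> Optional[float]:
    """Average polarity of the sentiment expressions found in one review."""
    scored = [s for s in expression_scores if s != 0]  # neutral items are ignored
    if not scored:
        return None  # no sentiment expression was matched
    return sum(scored) / len(scored)


def polarity_label(score: Optional[float]) -> str:
    """Coarse document label derived from the normalized score (assumed thresholds)."""
    if score is None or score == 0:
        return "neutral"
    return "positive" if score > 0 else "negative"


if __name__ == "__main__":
    scores = [3, -1, 0, 2]  # values annotated on the expressions of a review
    print(review_score(scores), polarity_label(review_score(scores)))
```

The label obtained this way could then be compared with the polarity implied by the stars given by the reviewer; how the star scale is mapped to a polarity is not specified here.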


Figure 5.1: Doxa, the LG sentiment classifier based on the finite-state technology

5.1.3 A Dataset of Opinionated Reviews

Although a large number of corpora for sentiment analysis is available for the English language (Hu and Liu 2004; Pang and Lee 2004, 2005; Wiebe et al. 2005; Wilson et al. 2005; Thomas et al. 2006; Jindal and Liu 2007; Snyder and Barzilay 2007; Blitzer et al. 2007; Seki et al. 2007; Macdonald et al. 2007; Pang and Lee 2008b, among others), the Italian language presents fewer contributions in this sense (Basile and Nissim 2013; Bosco et al. 2013, 2014).
The dataset used to evaluate the lexical and grammatical resources of the present work has been built from scratch using Italian opinionated texts in the form of users’ reviews and comments from e-commerce and opinion websites. It contains 600 text units and refers to six different domains, for each of which different websites have been exploited:


Cars: www.ciao.it
Smartphones: www.tecnozoom.it, www.ciao.it, www.amazon.it, www.alatest.it
Books: www.amazon.it, www.qlibri.it
Movies: www.mymovies.it, www.cinemalia.it, www.filmtv.it, www.filmscoop.it
Hotels: www.tripadvisor.it, www.expedia.it, www.venere.com, it.hotels.com, www.booking.com
Videogames: www.amazon.it

Details about the corpus are shown in Table 5.1.

Text Features   Cars    Smartphones   Books   Movies   Hotels   Games   Tot
Neg Docs        50      50            50      50       50       50      300
Pos Docs        50      50            50      50       50       50      300
Text Files      20      20            20      20       20       20      120
Word Forms      17163   19226         8903    37213    12553    5597    101655
Tokens          21663   24979         10845   45397    16230    7070    126184

Table 5.1: Dataset of opinionated online customer reviews

Adjectives, as is commonly recognized in the literature (Hatzivassiloglou and McKeown 1997; Hu and Liu 2004; Taboada et al. 2006), seem to be the most reliable SO indicators, considering that 17% of the adjectives occurring in the corpus are polarized, compared to 3% of the adverbs, 2% of the nouns and 7% of the verbs.
Moreover, considering all the opinion-bearing words in the corpus, the adjectives’ patterns cover 81% of the total number of occurrences (almost 5000 matches), while the adverbs, the nouns and the verbs reach, respectively, a percentage of 4%, 6% and 2%. The remaining 7% is covered by the other sentiment expressions, which in any case contribute to the achievement of satisfactory levels of Recall.
As regards the Semantic Orientation, while in the dictionaries almost 70% of the words are connoted by a negative Prior Polarity, the number of positive and negative expressions in our perfectly balanced corpus leads to almost antipodal results: 64% of the sentiment expressions possess a positive polarity. In particular, 61% of the adjective expressions, 77% of the adverb expressions, 65% of the noun expressions and 51% of the verb expressions are positive.
The domain-independent expressions have diametrically opposite percentages: only 34% of the occurrences are positive. This result is due to the presence of the very productive negative metanodes troppo “too much” and poco “not much”. Inactivating them, in fact, the positive percentage reaches 58% of occurrences. Thus, it could be stated that, although the lexicon makes available a greater number of negative items, the real occurrences of the opinion words tip the balance in the direction of the positive items (see Section 3.3.2.1 for more on the phenomenon of positive biases).

5.1.4 Experiment and Evaluation

In the sentiment classification task the information unit is represented by an opinionated document as a whole. The purpose of our sentiment tool is to classify such documents on the basis of their overall Semantic Orientation.
To ground this task on a lexicon means to hypothesize that the polarities of opinion words can be considered indicators of the polarity of the document in which the words and the expressions are contained. Our Lexicon-Grammar based sentiment lexical database confers to elementary sentences the status of minimum semantic units of the language. Therefore, the indicators on which we grounded our classification go beyond the limit of mere words, to match entire elementary sentences or polar phrases from complex sentences.
Because the inference of the whole document SO depends on the correct annotation of its sentences and phrases, the evaluation phase involves both the sentence-level and the document-level performances of the tool.

5.1.4.1 Sentence-level Sentiment Analysis

This Paragraph presents the results pertaining to the Nooj output, which has been produced by applying the SentIta LG sentiment resources to our corpus of customer reviews.
For the sake of clarity and for the reproducibility of the results, it must be pointed out that some of the lexical resources have been applied with higher priority attributions. Details about the preferences are given below:

• H1 to the sentiment adverb and noun dictionaries;

• H2 to the sentiment adjective dictionary;

• H3 to the freely available dictionary Contrazioni, which belongs to the official Italian module of Nooj;

• H4 to a high-level priority dictionary that contains what is commonly called a "stop word list";

• H5 to a dictionary of compound adverbs.

In order to avoid ambiguity as much as possible, the dictionary of the verbs of sentiment must have the same priority as the Italian standard dictionary (sdic), lower than any other mentioned resource.
Because our lexical and grammatical resources are not domain-specific, we observed their interaction with every single part of the corpus, which is composed of many different domains, each characterized by its own peculiarities.


Moreover, in order to verify the performance of every part of speech (and of the expressions connected to them), we checked, as shown in Table 5.2, the Precision and the Recall by applying separately every single metanode (A, ADV, N, V, D-ind²) of the main graph of the sentiment grammar.

Nodes     Cars    Smartphones   Movies   Books   Hotels   Videogames   Average
A         88      90            79       87.5    91.5     83.5         86.6
ADV       80.4    75.8          87.9     92.3    92       50           79.7
N         81.8    85.7          82.8     77.8    85.3     85.7         83.2
V         88.2    57.1          84.8     89.5    57.1     100          79.5
D-ind     87.9    83.5          78.1     76.7    90       94.7         87.9
Average   85.3    78.4          82.5     84.8    83.2     82.8         82.8

Table 5.2: Precision measure in sentence-level classification (values in %)

² Domain-independent sentiment expressions (D-ind) refer to all the frozen and semi-frozen expressions that have been directly formalized in the FSA, and also include the vulgar expressions.

The sentence-level Precision has been manually calculated by determining whether the polarity and the intensity score of the expressions were correctly assigned in all the concordances produced by Nooj. We made an exception for the expressions of the adjectives which, due to the extremely large number of concordances, have been evaluated by extracting a sample of 1200 matches (200 for each domain).
The values marked by the asterisks have been reported for completeness, but they are not really relevant, because of the small number of concordances on which they have been calculated.
The best performance has been reached on average by the cars dataset, while the lowest values have been assigned to the movies dataset. The performance of the adjectives grammar is really satisfying, especially if compared with the high number of matches it produced. Even though the domain-independent node covers a small part of the matches (7%), it reached the best Precision results if compared with the other metanodes.
In many cases, however, just evaluating the polarity and the intensity of the expressions contained in a document is not enough to determine whether such expressions truly contribute to the identification of the polarity and the intensity of the document itself. Sometimes polarized sentences and phrases are not opinion indicators, especially in the reviews of movies and books, in which polarized sentences can just refer to the plot (78).

(78) Le belle case sono dimora della degenerazione più bieca [-3]
"Pretty houses host the grimmest degeneracy"

An opinionated sentence can also be considered not pertinent when it does not directly refer to the object of the opinion, but mentions aspects that are just indirectly connected with the product. Examples of this are (79), which does not refer to the product but to the delivery service, and (80), which in a book review does not refer to the object but to another media product related to the book.

(79) Apprezzo[+2] molto[+] Amazon per questo [+3]
"I really appreciate Amazon for this"

(80) Il film non[Negation] mi era piaciuto[+2] [-2]
"I didn't like the movie"

In this work we chose to exclude the second sentence from the correct matches, but we considered it worthwhile to include the first one, because of the high influence that this feature has on the numerical value that the opinion holder confers to his own opinion.
Table 5.3 corrects the results reported in Table 5.2 by excluding from the correct matches the ones that do not contribute to the identification of the correct document SO. This makes the average sentence-level Precision drop by 8.7 points, reaching 74.1%, which we considered adequate.
We chose to present these two results separately because this way it is possible to distinguish the domains that are more influenced by the pertinence problem (e.g. movies and books) from the ones that are less affected by it (e.g. hotels and smartphones).

Nodes     Cars    Smartphones   Movies   Books   Hotels   Videogames   Average
A         -5      -5.5          -28      -10.5   -1       -1.5         -8.6
ADV       -2.2    0             -29.7    -7.7    0        0            -6.6
N         -11.4   -14.3         -39.5    -14.8   -5.9     -14.3        -16.7
V         0       0             -17.6    -15.8   0        0            -5.6
D-ind     -8.6    0             -13.3    -6.7    -2.5     -5.3         -6.1
Average   -5.4    -4            -25.6    -11.1   -1.9     -4.2         -8.7

Table 5.3: Irrelevant matches in sentence-level classification (percentage points)
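The correction that Table 5.3 applies to Table 5.2 is a per-domain subtraction, which can be checked with a few lines of Python. This is only a sketch: the two lists are transcribed from the Average rows of the two tables, and the function name corrected_precision is a hypothetical helper, not part of Nooj or Doxa.

```python
# Average rows of Table 5.2 (Precision, %) and Table 5.3 (irrelevant matches,
# percentage points), transcribed by hand from the two tables.
DOMAINS = ["Cars", "Smartphones", "Movies", "Books", "Hotels", "Videogames"]
RAW_PRECISION = [85.3, 78.4, 82.5, 84.8, 83.2, 82.8]
IRRELEVANT = [5.4, 4.0, 25.6, 11.1, 1.9, 4.2]


def corrected_precision(raw, penalty):
    """Subtract the share of non-pertinent matches from the raw Precision."""
    return [round(r - p, 1) for r, p in zip(raw, penalty)]


corrected = corrected_precision(RAW_PRECISION, IRRELEVANT)
by_domain = dict(zip(DOMAINS, corrected))
average = round(sum(corrected) / len(corrected), 1)

print(by_domain)
print(average)
```

Under these inputs the movies domain loses the most (56.9) and the hotels domain the least (81.3), while the average agrees with the 74.1% reported in the text.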

5.1.4.2 Document-level Sentiment Analysis

As far as the document-level performance is concerned (Table 5.4), we calculated the Precision twice: in the first case by considering as true positives the reviews correctly classified by Doxa on the basis of their polarity, and in the second case by considering as true positives the documents that received from our tool exactly the same number of stars specified by the Opinion Holder.
In detail, in the Polarity only row the True Positives are the documents that Doxa classified with a polarity corresponding to the one specified by the Opinion Holder. In the Intensity also row the True Positives are the documents that received from Doxa exactly the same stars specified by the Opinion Holder.

Precision        Cars    Smartphones   Movies   Books   Hotels   Videogames   Average
Polarity only    71.0    72.0          63.0     74.0    91.0     72.0         74.0
Intensity also   32.0    45.0          25.0     33.0    49.0     34.0         36.3

Table 5.4: Precision measure in document-level classification (values in %)

As we can see, the latter seems to have a very low Precision, but upon a deeper analysis we discovered that it is really common for the Opinion Holders to write in their reviews texts that do not perfectly correspond to the stars they specified. That increases the importance of a software like Doxa, which does not stop the analysis at the structured data, but enters the semantic dimension of texts written in natural language.
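The difference between the two rows of Table 5.4 can be illustrated with a minimal Python sketch. It rests on assumptions that the text does not state: ratings are taken to be integers on a 1-5 star scale, 3 stars is treated as the neutral midpoint, and the names polarity and document_precision are hypothetical. It does not reproduce the way Doxa converts its scores into stars.

```python
def polarity(stars, neutral=3):
    """Return +1, -1 or 0 depending on the side of the (assumed) neutral midpoint."""
    return (stars > neutral) - (stars < neutral)


def document_precision(pairs):
    """Compute (polarity-only, intensity-also) Precision in percent.

    pairs: list of (predicted_stars, gold_stars); predicted_stars is None
    when the tool produced no classification for the document.
    """
    classified = [(p, g) for p, g in pairs if p is not None]
    if not classified:
        return 0.0, 0.0
    same_polarity = sum(polarity(p) == polarity(g) for p, g in classified)
    same_stars = sum(p == g for p, g in classified)
    total = len(classified)
    return 100.0 * same_polarity / total, 100.0 * same_stars / total


# Toy data (invented for illustration): three reviews agree on polarity,
# only one agrees on the exact number of stars.
toy_pairs = [(5, 4), (1, 2), (4, 4), (2, 5)]
polarity_only, intensity_also = document_precision(toy_pairs)
print(polarity_only, intensity_also)
```

The stricter criterion can only lower the figure, since every exact match on the stars is also a match on the polarity.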

Recall           Cars    Smartphones   Movies   Books   Hotels   Videogames   Average
Sentence-level   72.7    79.6          64.8     65.7    72.1     58.8         69.0
Document-level   100     98.6          100      96.1    98.9     91.2         97.5

Table 5.5: Recall in both the sentence-level and the document-level classifications (values in %)

The Recall pertaining to the sentence-level performance of our tool has been manually calculated on a sample of 150 opinionated documents (25 from each domain), by considering as false negatives the sentiment indicators which have not been annotated by our grammar.
The document-level Recall, instead, has been automatically checked with Doxa, by considering as true positives all the opinionated documents which contained at least one appropriate sentiment indicator. So the documents in which the Nooj grammar did not annotate any pattern were considered false negatives. Because we assumed that all the texts of our corpus were opinionated, we also considered as false negatives the cases in which the value of the document was 0.
Taking the F-measure into account, the best results were achieved with the smartphone domain (77.0%) in the sentence-level task and with the hotel dataset (94.8%) in the document-level task.
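The F-measure figures can be related to the tables with the usual harmonic mean of Precision and Recall. The sketch below is an illustration under stated assumptions: the inputs are the rounded values printed in Tables 5.2 to 5.5, the sentence-level Precision is taken to be the corrected one (Table 5.2 minus Table 5.3), and the function names are hypothetical. The document_recall helper encodes the rule described above, where a document with no annotated pattern (None) or with an overall value of 0 counts as a false negative.

```python
def f_measure(precision, recall):
    """Harmonic mean of Precision and Recall (both in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def document_recall(doc_values):
    """Recall in percent over documents that are all assumed to be opinionated.

    doc_values: one overall value per document, or None when no pattern
    was annotated; None and 0 are both treated as false negatives.
    """
    true_positives = sum(1 for v in doc_values if v is not None and v != 0)
    return 100.0 * true_positives / len(doc_values)


# Hotels, document level: Precision from Table 5.4, Recall from Table 5.5.
hotels_document = round(f_measure(91.0, 98.9), 1)

# Smartphones, sentence level: corrected Precision (78.4 - 4.0) and Recall 79.6.
smartphones_sentence = round(f_measure(78.4 - 4.0, 79.6), 1)

print(hotels_document, smartphones_sentence)
```

With the rounded table values the hotel figure agrees with the reported 94.8, while the smartphone figure comes out at about 76.9, slightly below the reported 77.0; the small gap is presumably due to rounding in the published tables.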

5.2 Feature-based Sentiment Analysis

The purpose of the sentiment analysis based on features is to provide companies with overviews of customer opinions, which automatically summarize the strengths and the weaknesses of the products and services they offer. In Section 1 we connected the definition of the opinion features to the following function (Liu, 2010):

$T = O(f)$

Where the features ($f$) of an object ($O$) (e.g. hotels) can be brand names (e.g. Artemide, Milestone), properties (e.g. cleanness, location), parts (e.g. rooms, apartments), related concepts (e.g. city, toponyms) or parts of the related concepts (e.g. city centre, tube, museum, etc.) (Popescu and Etzioni, 2007). Liu (2010) represented every object as a "special feature" ($F$) defined by a subset of features ($f_i$):

$F = \{f_1, f_2, \ldots, f_n\}$

Both $F$ and $f_i$, grouped together under the noun target, can be automatically discovered in texts through both synonym words and phrases $W_i$ or other direct or indirect indicators $I_i$:

$W_i = \{w_{i1}, w_{i2}, \ldots, w_{im}\}$

$I_i = \{i_{i1}, i_{i2}, \ldots, i_{iq}\}$
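One way to picture this representation is as a small data structure in which each feature carries its own set of synonyms ($W_i$) and indicators ($I_i$). The following Python sketch is only illustrative: the class names, the example terms and the naive substring matching are assumptions made for the sake of the example, and are not taken from Liu (2010) or from the resources described in this work.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """A feature f_i with its synonyms W_i and its indicators I_i (lowercase terms)."""
    name: str
    synonyms: set = field(default_factory=set)
    indicators: set = field(default_factory=set)

    def mentioned_in(self, text):
        # Naive substring lookup, used here only to keep the sketch short.
        lowered = text.lower()
        terms = {self.name} | self.synonyms | self.indicators
        return any(term in lowered for term in terms)


@dataclass
class OpinionObject:
    """An object O, i.e. the special feature F = {f_1, ..., f_n}."""
    name: str
    features: list = field(default_factory=list)

    def targets_in(self, text):
        return [f.name for f in self.features if f.mentioned_in(text)]


# Hypothetical hotel object with two features.
hotel = OpinionObject("hotel", [
    Feature("staff", synonyms={"personale", "receptionist"}),
    Feature("location", synonyms={"posizione"}, indicators={"metro"}),
])

print(hotel.targets_in("Personale meravigliosamente accogliente"))
```

In this toy setting the sentence is linked to the staff feature through one of its synonyms, while no term of the location feature is found.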


It is essential to discover not only the overall polarity of an opinionated document, but also its topic and features, in order to discern the aspect of a product that must be improved, or whether the opinions extracted by the Sentiment Analysis applications are relevant to the product or not. For example (81), an extract from a hotel review, expresses two opinions on two different kinds of features: the behavior of the staff (81a) and the beauty of the city in which the hotel is located (81b).

(81a) [Opinion [Feature Personale] meravigliosamente accogliente]
"Staff wonderfully welcoming"

(81b) [Opinion [Feature Roma] è bellissima]
"Rome is beautiful"

It is crucial to distinguish between the two topics, because just one of them (81a) refers to an aspect that can be influenced or improved by the company.

5.2.1 Related Works

Pioneering works on feature-based opinion summarization are Hu and Liu (2004, 2006), Carenini et al. (2005), Riloff et al. (2006) and Popescu and Etzioni (2007). Both Popescu and Etzioni (2007) and Hu and Liu (2004) first identified the product features on the basis of their frequency and then calculated the Semantic Orientation of the opinions expressed on these features. In order to find the most important features commented on in reviews, Hu and Liu (2004) used association rule mining, thanks to which frequent itemsets can be extracted from free texts. Redundant and meaningless items are removed during a Feature Pruning phase. Experiments have been carried out on the first 100 reviews belonging to five classes of electronic products (www.amazon.com and www.C|net.com). Hu and Liu (2006) presented the algorithm ClassPrefix-Span, which aimed to find a special kind of pattern, the Class Sequential Rules (CSR), using a fixed target and classes. Their method was based on sequential pattern mining.
Carenini et al. (2005) struck a balance between supervised and unsupervised approaches. They mapped crude (learned) features into a User-Defined taxonomy of the entity's Features (UDF) that provided a conceptual organization for the information extracted. This method took advantage of a similarity matching in which the UDF reduced the redundancies by grouping together identical features, and then organized and presented information by using hierarchical relations. Experiments have been conducted on a testing set containing Pros, Cons and detailed free reviews of five products.
Riloff et al. (2006) used the subsumption hierarchy in order to identify complex features and then to reduce a feature set by removing useless features, which have, for example, a more general counterpart in the subsumption hierarchy. The feature representations used for opinion analysis are n-grams (unigrams, bigrams) and lexico-syntactic extraction patterns. Popescu and Etzioni (2007) presented OPINE, an unsupervised feature and opinion extraction system that used the Web as a corpus to identify explicit and implicit features, and relaxation-labelling methods to infer the Semantic Orientation of words. The system draws on WordNet's semantic relations and hierarchies for the identification of the features (parts, properties and related concepts) and the creation of clusters of words.
Ferreira et al. (2008) made a comparison between the likelihood ratio test approach (Yi et al., 2003) and the association mining approach (Hu and Liu, 2004). The e-product dataset (topical documents) and the annotation scheme (improved with some revisions) are the ones used by Hu and Liu (2004). In particular, the revised annotation scheme comprised:

• part-of relationship with the product (e.g. battery is a part of a digital camera);

• attribute-of relationship with the product (weight and design are attributes of a camera);

• attribute-of relationship with the product's feature (battery life is an attribute of the battery, which is in turn a camera's feature).

Double Propagation (Qiu et al., 2009) focuses on the natural relation between opinion words and features. Because opinion words are often used to modify features, such relations can be identified thanks to the dependency grammar.
Because these methods have good results only for medium-size corpora, they must be supported by other feature mining methods. The strategy proposed by Zhang et al. (2010) is based on "no" patterns and part-whole patterns (meronymy):

• NP + Prep + CP: "battery of the camera"

• CP + with + NP: "mattress with a cover"

• NP CP or CP NP: "mattress pad"

• CP Verb NP: "the phone has a big screen"

Where NP is a noun phrase and CP a class concept phrase. The verbs used are "has", "have", "include", "contain", "consist", etc. "No" patterns are feature indicators as well. Examples of such patterns are "no noise" or "no indentation". Clearly, a manually built list of fixed "no" expressions, like "no problem", is filtered out from the feature candidates. Every sentence is POS-tagged using Brill's tagger (Brill, 1995) and used as system input.
In order to find important features and rank them high, Zhang et al. (2010) used a web page ranking algorithm named HITS (Kleinberg, 1999).
Somprasertsri and Lalitrojwong (2010) used a dependency-based approach for the opinion summarization task. A central stage in their work is the extraction of relations between product features ("the topic of the sentiment") and opinions ("the subjective expression of the product feature") from online customer reviews. Adjectives and verbs have been used in this study as opinion words. The maximum entropy model has been used in order to predict the opinion-relevant product feature relation. In general, in a dependency tree, a feature can be of different kinds:

• Child: the product feature is the subject or object of the verbs, and the opinion word is a verb or a complement of a copular verb (opinions depend on the product features);

e.g. I like(opinion) this camera(feature)

• Parent: the opinion words are in the modifiers of product features, which include adjectival modifier, relative clause modifier, etc. (product features depend on the opinions);

e.g. I have found that this camera takes incredible(opinion) pictures(feature)

• Sibling: the opinion word may also be in an adverbial modifier, a complement of the verb or a predicative (both opinions and product features depend on the same words);

e.g. The pictures(feature) some time turn out blurry(opinion)

• Grand Parent: the opinion words are adjectival complements of modifiers of product features (in the following example the word "good" is the adverbial complement of the relative clause modifier of the noun phrase "movie mode") (opinions depend on the words which depend on the product features);

e.g. It has movie mode(feature) that works good(opinion) for a digital camera

• Grand Child: the product feature is the subject or object of the complements, and the opinion word is a verb or a complement of a copular verb (product features depend on the words which depend on the opinions);

e.g. It's great(opinion) having the LCD display(feature)

• Indirect: none of the above relations.

The Stanford Parser has been used by Somprasertsri and Lalitrojwong (2010) to parse documents in the pre-processing phase, before the dependency analysis stage. Definite linguistic filtering (POS-based) patterns and the General Inquirer dictionary (Stone et al., 1966) have been used to locate noun phrases and to extract product feature candidates. The generalized iterative scaling algorithm (Darroch and Ratcliff, 1972) has been used to estimate parameters or weights of the selected features. Because it is possible to refer to a particular feature using several synonyms, Somprasertsri and Lalitrojwong (2010) used semantic information encoded into a product ontology, manually built by integrating manufacturer product descriptions and terminologies in customer reviews.
Wei et al. (2010) proposed a semantic-based method that made use of a list of positive and negative adjectives defined in the General Inquirer to recognize opinion words, and then extracted the related product features in consumer reviews.
Xia and Zong (2010) performed the feature extraction and selection tasks using word relation features, which seem to be effective features for sentiment classification because they encode relation information between words. They can be of different kinds: Uni (unigrams), WR-Bi (traditional bigrams), WR-Dp (word pairs of dependency relation), GWR-Bi (generalized bigrams) and GWR-Dp (dependency relations).
Gutiérrez et al. (2011) exploited Relevant Semantic Trees (RST) for word-sense disambiguation and measured the association between concepts at the sentence level using the association ratio measure.
Mejova and Srinivasan (2011) explored different feature definition and selection techniques (stemming, negation-enriched features, term frequency versus binary weighting, n-grams and phrases) and approaches (frequency-based vocabulary trimming, part-of-speech and lexicon selection, and expected Mutual Information). Their experimental results confirmed the fact that adjectives are important for polarity classification. Furthermore, they revealed that stemming and using binary instead of term frequency feature vectors do not have any impact on performance, and that the helpfulness of certain techniques depends on the nature of the dataset (e.g. size and class balance).
Concordance-based Feature Extraction (CFE) is the technique used by Khan et al. (2012). After a traditional pre-processing step, regular expressions are used to extract candidate features. Evaluative adjectives, collected on the basis of a seed list from Hu and Liu (2004), are helpful in the feature extraction task. In the end, a grouping phase finds the appropriate features for the opinion's topic, grouping together all the related features and removing the useless ones. The algorithm used in this phase is based on the co-occurrence of features and uses the left and right context of each feature.
Khan et al. selected candidate product features by employing noun phrases that appear in texts close to subjective adjectives. This intuition is shared with Wei et al. (2010) and Zhang and Liu (2011). The centrepiece of Khan's method is represented by hybrid patterns, Combined Pattern Based Noun Phrases (cBNP), that are grounded on the dependency relation between subjective adjectives (opinionated terms) and nouns (product features). Nouns and adjectives can sometimes be connected by linking verbs (e.g. "camera produces fantastically good pictures"). Preposition-based noun phrases (e.g. "quality of photo", "range of lenses") often represent entity-to-entity or entity-to-feature relations. The last stage is the proper feature extraction phase, in which, using an ad hoc module, the noun phrases of the cBNP patterns have been designated as product features.
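Two of the feature definitions surveyed above, n-grams and term frequency versus binary weighting, can be made concrete with a short Python sketch. It is a generic illustration written for this purpose, with hypothetical function names and a toy sentence; it does not reproduce the experimental setup of Mejova and Srinivasan (2011) or of any other cited work.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def feature_vector(tokens, max_n=2, binary=False):
    """Map every n-gram (up to max_n) to its frequency, or to 1 if binary."""
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    if binary:
        return {gram: 1 for gram in counts}
    return dict(counts)


tokens = "good camera good lens".split()
tf_vector = feature_vector(tokens)
binary_vector = feature_vector(tokens, binary=True)

print(tf_vector["good"], binary_vector["good"])
```

The two vectors share the same keys and differ only in the weights: the unigram "good" is counted twice under term frequency and once under binary weighting.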

5.2.2 Experiment and Evaluation

In order to contribute to the opinion feature extraction and selection tasks, we exploited both the SentIta database and other preexisting Italian lexical resources of Nooj. The latter consist of an objective dictionary that describes its lemmas with appropriate semantic properties; it has been used to define and summarize the features on which the opinion holder expressed his opinions.
In order to identify the relations between opinions and features, the lexical items contained in the just mentioned dictionaries have been systematically recalled into a Nooj local grammar that provides, as feedback, annotations about the (positive or negative) nature of the opinion and about the semantic category of the features.
The objective lexicon consists of a Nooj dictionary of concrete nouns (Table 5.6) that has been manually built and checked by seven linguists. It counts almost 22,000 nouns, described from a semantic point of view by the property Hyper, which specifies every lemma's hyperonym. The tags it includes are shown in Table 5.6. Through the software Nooj it has been possible to create a connection between these dictionaries and an ad hoc FSA for the feature analysis.

Label                             Tag       Entries   Example     Translation
body part                         Npc       598       labbro      lip
organism part                     Npcorg    627       colon       colon
text                              Ntesti    1168      rubrica     address book
(part of) article of clothing     Nindu     1009      vestaglia   nightgown
cosmetic product                  Ncos      66        rossetto    lipstick
food                              Ncibo     1170      agnello     lamb
liquid substance                  Nliq      341       urina       urine
potable liquid substance          Nliqbev   29        sidro       cider
money                             Nmon      216       rublo       ruble
(part of) building                Nedi      1580      torre       tower
place                             Nloc      1018      valle       valley
physical substance                Nmat      1727      agata       agate
(part of) plant                   Nbot      2187      zucca       pumpkin
medicine                          Nfarm     415       sedativo    sedative
drug                              Ndrug     69        mescalina   mescaline
chemical element                  Nchim     1212      cloruro     chloride
(part of) electronic device       Ndisp     1191      motosega    chainsaw
(part of) vehicle                 Nvei      1082      gondola     gondola
(part of) furniture               Narr      450       tavolo      table
instrument                        Nstrum    3666      ventaglio   fan
Tot                               -         19946     -           -

Table 5.6: Semantically Labeled Dictionary of Concrete Nouns

As we will demonstrate below, the feature classification is strongly dependent on the review domains and topics, on the basis of which specific local grammars need to be adapted (Engström, 2004; Andreevskaia and Bergler, 2008).
As shown in Figure 5.2, in the Feature Extraction phase the sentences or the noun phrases expressing the opinions on specific features are found in free texts. Annotations regarding their polarity (BENEFIT/DRAWBACK) and their intensity (SCORE=[-3,+3]) are attributed to each sentence on the basis of the opinion words on which they are anchored.

Figure 5.2: Extract of the Feature Extraction grammar

Obviously, the graph reported in Figure 5.2 represents just an extract of the more complex FSA built to perform the task, which includes more than 100 embedded graphs and thousands of different paths, able to describe not only the sentences and the phrases anchored on sentiment adjectives, but also the ones that contain sentiment adverbs, nouns, verbs, idioms and other lexicon-independent opinion-bearing expressions (e.g. essere (dotato + fornito + provvisto) di "to be equipped with"; essere un (aspetto + nota + cosa + lato) negativo "to be a negative side").
The Contextual Valence Shifters that have been considered are the ones described in Chapter 4.
Differently from the other approaches proposed in the literature, the Feature Pruning task (Figure 5.3), in which the features extracted are grouped together on the basis of their semantic nature, is performed at the same time as the Extraction phase, by applying the dictionaries-grammar pair to free texts just once. This happens because the grammar shown in Figure 5.3 is contained in the metanode NG (Nominal Group) of the feature extraction grammar.

Figure 5.3: Extract of the Feature Pruning metanode

Thanks to the instructions contained in this subgraph, the words described in our dictionary of concrete nouns as belonging to the same hyperonym receive the same annotation. The words labeled with the tags Narr (furniture, e.g. "bed") and Nindu (article of clothing, e.g. "bath towel") are annotated as "furnishings" (first path); the words labeled with the tags Ncibo (food, e.g. "cake") and NliqBev (potable liquid substance, e.g. "coffee") receive the annotation "foodservice" (second path), and so on.
In order to refine the results, we also directly included in the nodes of the grammar the words that were important for the feature selection but were not contained in the dictionary of concrete nouns (abstract nouns, e.g. pranzo "lunch", vista "view") and the hyperonyms themselves, which in general correspond to each feature class name (e.g. arredamento "furnishings", luogo "location").
Finally, we disambiguated terms that in the objective dictionary were associated with more than one tag, by excluding the wrong interpretations from the corresponding node. For example, in the domain of hotel reviews, if one comments on the cucina "cooking" (first path) or the soggiorno "sojourn" (third path), he is clearly not talking about rooms (in Italian cucina also means "kitchen" and soggiorno "living room"), but respectively about the food service and the whole holiday experience.
The annotations provided by Nooj, once the lexical and grammatical resources are applied to texts written in natural language, can take the form of XML (eXtensible Markup Language) tags. For this reason they can be effortlessly recalled into an application that automatically applies the Nooj ad hoc resources and makes statistical analyses on them. Therefore, we dedicated a specific module of Doxa to the feature-based sentiment analysis, which has been tested on a corpus composed of 1000 hotel reviews collected from websites like www.tripadvisor.it, www.expedia.it and www.booking.com.
An extract of the automatically annotated corpus is reported below.

Review

Sono stato in questo albergo quasi per caso, dopo aver subito un incidente, e devo ammettere che, se non fosse stato per la disponibilità, la cortesia e la professionalità di Mario, Pino e Rosa, non avrei potuto risolvere una serie di cose. L'hotel è fantastico, silenzioso e decisamente pulito. La colazione ottima ed il responsabile della sala colazione gentilissimo.

"I happened to be in this hotel almost by chance, after an accident, and I must admit that, if it hadn't been for the willingness, the kindness and the competence of Mario, Pino and Rosa, I couldn't have solved a number of things. The hotel is fantastic, quiet and definitely clean. The breakfast was great and the person in charge of the breakfast room very kind."

Annotation

Sono stato in questo albergo quasi per caso dopo aver subito un incidente e devo ammettere
che se non fosse stato per <BENEFIT TYPE=ATTITUDES SCORE=2> la disponibilità
</BENEFIT> <BENEFIT TYPE=ATTITUDES SCORE=2> la cortesia </BENEFIT> e
<BENEFIT TYPE=ATTITUDES SCORE=2> la professionalità </BENEFIT> di Mario Pino
e Rosa non avrei potuto risolvere una serie di cose
<BENEFIT TYPE=ACCOMMODATIONS SCORE=3> L'hotel è fantastico </BENEFIT>
<BENEFIT TYPE=LOCATION SCORE=2> silenzioso </BENEFIT> e
<BENEFIT SCORE=3 TYPE=CLEANLINESS> decisamente pulito </BENEFIT>
<BENEFIT TYPE=FOODSERVICE SCORE=3> La colazione ottima </BENEFIT> ed
<BENEFIT TYPE=ATTITUDES SCORE=3> il responsabile della sala colazione gentilissimo
</BENEFIT>
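An annotated text of this kind can be read back by an application with a few regular expressions. The Python sketch below is a minimal illustration, not the Doxa implementation: it assumes that the annotations are exported as paired tags of the form <BENEFIT ...> ... </BENEFIT> or <DRAWBACK ...> ... </DRAWBACK>, that attribute values may be unquoted and may appear in any order, and all function names are hypothetical.

```python
import re
from collections import defaultdict

# Paired BENEFIT/DRAWBACK tags; attributes in any order, optionally quoted.
TAG = re.compile(r"<(BENEFIT|DRAWBACK)\b([^>]*)>(.*?)</\1>", re.DOTALL)
ATTR = re.compile(r'(\w+)\s*=\s*"?([^\s">]+)"?')


def parse_annotations(text):
    """Return one record per annotated span: kind, feature type, score, text."""
    records = []
    for kind, raw_attrs, span in TAG.findall(text):
        attrs = dict(ATTR.findall(raw_attrs))
        records.append({
            "kind": kind,
            "type": attrs.get("TYPE"),
            "score": int(attrs["SCORE"]) if "SCORE" in attrs else None,
            "text": " ".join(span.split()),
        })
    return records


def scores_by_type(records):
    """Group the scores of the annotated spans by feature type."""
    grouped = defaultdict(list)
    for record in records:
        if record["type"] is not None and record["score"] is not None:
            grouped[record["type"]].append(record["score"])
    return dict(grouped)


sample = (
    "<BENEFIT TYPE=ACCOMMODATIONS SCORE=3> L'hotel è fantastico </BENEFIT> "
    "<BENEFIT SCORE=3 TYPE=CLEANLINESS> decisamente pulito </BENEFIT> "
    "<DRAWBACK TYPE=CLEANLINESS SCORE=-2> Le camere erano sporche </DRAWBACK>"
)
records = parse_annotations(sample)
print(scores_by_type(records))
```

Grouping the scores by TYPE is the step that makes feature-level statistics possible, since the same feature class can receive both benefits and drawbacks within one review.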

The evaluation presented in Table 5.7 shows that the method proposed in the present work performs very well with the hotel domain. We manually calculated, on a sample of 100 documents from the original corpus, the Precision on the Semantic Orientation Identification task (which regards the annotation of the sentences' polarity and intensity) and on the Feature Classification task (which regards the annotation of the types of the features). Because the classes used to group the features have been identified a priori in our grammar, we also checked the results of our tool with the introduction of a residual category to annotate the features that do not belong to these classes. Even though this causes an increase in the Recall, a commensurate decrease of the Precision in the Classification task makes the value of the F-score similar to the one reached when the residual category is not taken into account.

Method                  Precision              Precision          Recall   F-score
                        (SO Identification)    (Classification)
No residual category    93.3                   92.4               72.3     81.1
Residual category       92.9                   82.6               78.3     80.4

Table 5.7: F-score of Feature Extraction and Pruning (values in %)

Significant examples of the sentences that have been automatically annotated in our corpus are given below.

(82) Il responsabile della sala colazione gentilissimo
"Very kind the person in charge of the breakfast room"
BENEFIT TYPE=ATTITUDES SCORE=3

(83) Le camere erano sporche
"The rooms were dirty"
DRAWBACK TYPE=CLEANLINESS SCORE=-2

(84) L'hotel è vicinissimo alla metro
"The hotel is really close to the tube"
BENEFIT TYPE=LOCATION SCORE=3

In (82) the subjective adjective gentilissimo "very kind", recognised by the grammar as the superlative form (SCORE=+3) of the positive adjective gentile "kind" (SCORE=+2), makes the grammar recognize the whole sentiment expression and provides the information regarding its polarity and intensity. This is why the FSA associates the tag BENEFIT and the highest score with the feature, which in this case is represented by a noun.
Example (83) shows that the "type" of the features can be indicated not only by nouns, but also by specific adjectives, as happens with sporche "dirty", which is both an opinion word that specifies that the whole expression is negative and a feature class indicator that lets the grammar classify it in the group of "cleanliness" rather than in the group of "accommodations" (which instead would be selected by the noun camera "room", described in our objective dictionary with the tags Conc "concrete" and Nedi "building").
Finally, sentence (84) attests the strength of the influence of the review domains: sometimes the lexicon is not enough to correctly extract and classify the product features. Domain-specific, lexicon-independent patterns are often required in order to make the analyses adequate (Engström, 2004; Andreevskaia and Bergler, 2008).
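The classification behaviour discussed for (82)-(84), together with the hyperonym grouping of the Feature Pruning metanode, can be mimicked in a few lines of Python. This is a toy sketch and not the Nooj grammar: the tag-to-class pairs come from the text above, while the surface-form lexicons, the two tags attached to cucina, the DOMAIN_READING table and the residual label OTHER are hypothetical choices made for illustration.

```python
# Hyperonym tags mapped to feature classes (pairs mentioned in the text).
HYPERNYM_TO_CLASS = {
    "Narr": "FURNISHINGS",
    "Nindu": "FURNISHINGS",
    "Ncibo": "FOODSERVICE",
    "NliqBev": "FOODSERVICE",
    "Nedi": "ACCOMMODATIONS",
}

# Toy noun lexicon keyed on surface forms; the tags of "cucina" are assumed.
NOUN_TAGS = {
    "camere": ["Nedi"],
    "tavolo": ["Narr"],
    "agnello": ["Ncibo"],
    "cucina": ["Nedi", "Ncibo"],
}

# Hotel-domain disambiguation: keep only the reading that fits the domain.
DOMAIN_READING = {"cucina": "Ncibo"}

# Opinion words that also act as feature class indicators, as in (83).
OPINION_CLASS = {"sporche": "CLEANLINESS", "pulito": "CLEANLINESS"}


def classify_feature(noun, opinion_word=None, residual=None):
    """Return the feature class of a noun, or the residual label if unknown."""
    if opinion_word in OPINION_CLASS:
        return OPINION_CLASS[opinion_word]
    tags = NOUN_TAGS.get(noun, [])
    if noun in DOMAIN_READING:
        tags = [DOMAIN_READING[noun]]
    for tag in tags:
        if tag in HYPERNYM_TO_CLASS:
            return HYPERNYM_TO_CLASS[tag]
    return residual


print(classify_feature("camere", "sporche"))
print(classify_feature("camere", "belle"))
print(classify_feature("piscina", residual="OTHER"))
```

The residual argument corresponds to the two configurations compared in Table 5.7: when it is left unset, a noun outside the predefined classes simply receives no class.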

5.2.3 Opinion Visual Summarization

We said in Section 5.1 that Doxa is a prototype able to analyze opinionated documents from their semantic point of view. It automatically applies the Nooj resources to free texts, sums up the values corresponding to every sentiment expression, standardizes the results for the total number of sentiment expressions contained in the reviews, and then provides statistics about the opinions expressed.
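The scoring step just described can be sketched as follows. This is only an illustration: it assumes that "standardizing" means dividing the sum of the scores by the number of sentiment expressions, and the function name is hypothetical.

```python
from typing import Sequence


def review_score(expression_scores: Sequence[int]) -> float:
    """Sum the scores of the sentiment expressions found in a review and
    normalize by their number (assumed reading of "standardizes")."""
    if not expression_scores:
        # No sentiment expression was found in the review.
        return 0.0
    return sum(expression_scores) / len(expression_scores)
```

For instance, the three annotations of examples (82), (83) and (84) would be passed as `review_score([3, -2, 3])`.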


In this Section we show the improved Doxa functionalities dedicated to sentiment analysis based on the opinion features. In the Hotel domain, this deeper analysis considerably increases the Precision on the sentence-level SO identification task, which goes from 81.3 to 93.3, without any loss in the Recall, which holds steady, rising by just 0.2 percentage points, from 72.1 to 72.3 (see Table 5.7).

The feature-based module of Doxa, completely grounded on the automatic analysis of free texts, allows the comparison between more than one object on the basis of the different kinds of aspects that characterize them.

The regular decagon in the center of the spider graphs of Figure 5.4 represents the threshold value (zero) with which every group of opinions on the same object is compared: when a figure is larger than it, the opinion is positive; when it is smaller, the opinion is negative; when a feature’s value is zero, it means that the consumers did not express any judgment on that topic.

The automatic transformation of non-structured reviews into structured visual summaries has a strong impact on the work of company managers, who must ground their marketing strategies on the analysis of a large amount of information, and on the individual consumers, who need to quickly compare the Pros and Cons of the products or services they want to buy.
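The comparison against the zero threshold can be sketched in a few lines of Python. The sketch assumes that each feature value is the mean of the scores annotated for that feature class and that classes without annotations default to zero; the function names and the PRICE class in the usage note are hypothetical.

```python
def feature_summary(annotations, feature_types):
    """Average the scores per feature class.

    annotations: iterable of (feature_type, score) pairs.
    feature_types: the classes to show on the graph; classes with no
    annotation keep the threshold value 0.0.
    """
    scores = {feature_type: [] for feature_type in feature_types}
    for feature_type, score in annotations:
        scores.setdefault(feature_type, []).append(score)
    return {t: (sum(v) / len(v) if v else 0.0) for t, v in scores.items()}


def reading(value):
    """Interpret a feature value against the zero threshold."""
    if value > 0:
        return "positive"
    if value < 0:
        return "negative"
    return "no judgment"
```

With the annotations of (82), (83) and (84) and an extra class PRICE, `feature_summary` returns a positive value for ATTITUDES and LOCATION, a negative one for CLEANLINESS and zero for PRICE, on which no opinion was expressed.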

Figure 5.4: Doxa module for Feature-based Sentiment Analysis

5.3 Sentiment Role Labeling

Semantic Roles, also called theta or thematic roles (Chomsky 1993; Jackendoff 1990; see Section 2.1.4), are linguistic devices able to identify in texts “Who did What to Whom, and How, When and Where” (Palmer et al. 2010, p. 1).

The main task of an SRL system is to map the syntactic elements of free text sentences to their semantic representation, through the identification of all the syntactic functions of clause-complements and their labeling with appropriate semantic roles (Rahimi Rastgar and Razavi 2013; Monachesi 2009).

To work, an SRL system needs machine-readable lexical resources in which all the predicates and their relative arguments are specified. FrameNet (Baker et al. 1998; Fillmore et al. 2002; Ruppenhofer et al. 2006), VerbNet (Schuler 2005) and the PropBank Frame Files (Kingsbury and Palmer 2002) provide the basis for large-scale semantic annotation of corpora (Giuglea and Moschitti 2006; Palmer et al. 2010).

In this Section, in accordance with the Semantic Predicates theory (Gross 1981), we performed a matching between the definitional syntactic structures attributed to each class of verbs and the semantic information we attached in the database to every lexical entry.

This way we could create a strict connection between the arguments selected by a given Predicate listed in our tables and the actants involved in the same verb’s Semantic Frame (Fillmore 1976; Fillmore and Baker 2001; Fillmore 2006). In the experiment that we are going to describe, we focused on the following frames:

• “Sentiment” (e.g. odiare “to hate”), which involves the roles experiencer and stimulus;

• “Opinion” (e.g. difendere “to stand up for”), which implicates the roles opinion holder, target of the opinion and cause of the opinion;

• “Physical act” (e.g. baciare “to kiss”), which evokes the roles patient and agent.
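The three frames listed above and their roles can be held in a small lookup structure. The sketch below is illustrative only: the keys reuse the type tags SENT, OP and PHY of the tagset adopted in the experiment, while the dictionary layout and the function name are assumptions.

```python
# Frame type -> roles evoked by the frame, as listed above.
FRAME_ROLES = {
    "SENT": ("experiencer", "stimulus"),
    "OP": ("opinion holder", "opinion target", "cause"),
    "PHY": ("agent", "patient"),
}

# The example verbs given for each frame.
VERB_FRAME = {"odiare": "SENT", "difendere": "OP", "baciare": "PHY"}


def roles_for(verb):
    """Return the roles evoked by the frame of a listed verb."""
    return FRAME_ROLES[VERB_FRAME[verb]]
```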

5.3.1 Related Works

A group of studies in the field of Semantic Role Labeling focused on the automatic annotation of the roles that are relevant to Sentiment Analysis and Opinion Mining, such as source/opinion holder and target/theme/topic.


Bethard et al. (2004), with question answering purposes, proposed an extension of semantic parsing techniques, coupled with additional lexical and syntactic features, for the automatic extraction of propositional opinions. The algorithm used to identify and label semantic constituents is the one developed by Gildea and Jurafsky (2002) and Pradhan et al. (2003).

Kim and Hovy (2006) presented a Semantic Role Labeling system based on the Frame Semantics of Fillmore (1976), which aimed to tag the frame elements semantically related to an opinion-bearing word expressed in a sentence, considered as the key indicator of an opinion. The learning algorithm used for the classification model is Maximum Entropy (Berger et al. 1996; Kim and Hovy 2005). FrameNet II (Ruppenhofer et al. 2006) has been used to collect frames related to opinion words.

Other works based on FrameNet are Houen (2011), who used Frame-Frame relations in order to expand the subset of polarity-scored Frames found in the FrameNet database, and Ruppenhofer et al. (2008), Ruppenhofer and Rehbein (2012) and Ruppenhofer (2013), who added opinion frames with slots for source, target, polarity and intensity to the FrameNet database to build SentiFrameNet, a lexical database enriched with inherently evaluative items associated to opinion frames.

Lexicon-based approaches are the one of Kim et al. (2008), who made use of a collection of communication and appraisal verbs, the SentiWordNet lexicon, a syntactic parser and a named entity recognizer for the extraction of opinion holders from texts, and the one of Bloom et al. (2007), who extracted domain-dependent opinion holders through a combination of heuristic shallow parsing and dependency parsing.

Conditional Random Fields (Lafferty et al. 2001) is the method used by Choi et al. (2005), who, interpreting the semantic role labeling task as an information extraction problem, exploited a hybrid approach that combines two very different learning-based methods: CRF for the


named entity recognition and a variation of AutoSlog (Riloff 1996) for the information extraction. CRFs have also been used by Choi et al. (2006), who presented a global inference approach to jointly extract entities and relations in the context of opinion-oriented information extraction, through two separate token-level sequence-tagging classifiers for opinion expression extraction and source extraction via linear-chain CRF (Lafferty et al. 2001).

5.3.2 Experiment and Evaluation

The starting point of our experiment on SRL are the LG tables of the Italian verbs developed at the Department of Communication Science of the University of Salerno. Among the lexical entries listed in such matrices, we manually extracted all the verbs endowed with a defined positive or negative semantic orientation. Such verbs can express emotions, opinions or physical acts. Details of this classification can be found in Section 3.3.3.

As an example, in Tables 5.8 and 5.9 we show a small group of verbs belonging to the Lexicon-Grammar class 45. This class includes all the verbs that can enter into a syntactic structure such as N0 V di N1, in which the “subject” (N0) selected by the verb (V) is generally a human noun (Num) and the complement (N1) is a completive (Ch F) or infinitive (V-inf comp) clause, usually introduced by the preposition “di” (see Table 5.8). As shown in Table 5.9, our databases also contain semantic information concerning the nature, the semantic orientation and the strength of the Predicates under examination.

In detail, the tagset used in this experiment is the following:

1. Type

(a) SENT: sentiment

(b) OP: opinion

(c) PHY: physical act

2. Orientation

(a) POS: positive

(b) NEG: negative

3. Intensity

(a) FORTE: intense

(b) DEB: feeble

Table 5.8: Extract of the LG table of the verb Class 45

| N0=Num | N0=il fatto Ch F | Verb | N1=che F | N1=che Fcong | di N1um | di N1-um | di N1 contro N2 | prep V-inf comp |
|---|---|---|---|---|---|---|---|---|
| + | - | profittare | + | + | + | + | + | - |
| + | - | ridersene | + | + | + | + | - | - |
| + | - | risentirsi | + | + | + | + | - | + |
| + | - | strafottersene | + | + | + | + | - | - |
| + | - | vergognarsi | + | + | + | + | - | + |
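An LG table such as Table 5.8 is a binary matrix, so it can be queried directly once loaded into memory. The following Python sketch encodes the five rows of the extract; the column order is the one reconstructed from the table header, and the variable and function names are hypothetical.

```python
# Property columns of Table 5.8, excluding the verb column.
COLUMNS = [
    "N0=Num", "N0=il fatto Ch F", "N1=che F", "N1=che Fcong",
    "di N1um", "di N1-um", "di N1 contro N2", "prep V-inf comp",
]

# One string of "+" / "-" values per verb, in the column order above.
ROWS = {
    "profittare":     "+-+++++-",
    "ridersene":      "+-++++--",
    "risentirsi":     "+-++++-+",
    "strafottersene": "+-++++--",
    "vergognarsi":    "+-++++-+",
}


def accepts(verb, prop):
    """True if the verb is marked "+" for the given property."""
    return ROWS[verb][COLUMNS.index(prop)] == "+"


def verbs_with(prop):
    """All verbs of the extract that accept the given property."""
    return sorted(verb for verb in ROWS if accepts(verb, prop))
```

For example, `verbs_with("prep V-inf comp")` selects the verbs of the extract that take an infinitive clause, which is the property exploited later for sentences like the one with vergognarsi.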

Speaking in terms of Frame Semantics, we identified in the Opinion Mining and Sentiment Analysis fields three Frames of interest, recalled by specific Predicates: Sentiment, Opinion and Physical act. The frame elements evoked by such frames are described below.

Sentiment

It refers to the expression of any given frame of mind or affective state. The “sentiment” words can be put in connection with some WordNet-Affect categories (Strapparava et al. 2004), such as emotion, mood and hedonic signal. Examples are sdegnarsi “to be indignant” (class 10), odiare “to hate” (class 20), affezionarsi “to grow fond” (class 44B), flirtare “to flirt”


(class 9), disprezzare “to despise” (class 20), gioire “to rejoice” (class 45).

Predicates of that kind evoke, as frame elements, an experiencer, who feels the emotion or other internal states, and a stimulus, an event or a person that instigates such states (Gross 1995; Gildea and Jurafsky 2002; Swier and Stevenson 2004; Palmer et al. 2005; Elia 2013). This semantic frame summarizes the FrameNet ones connected to emotions, such as Sensation, Emotions_of_mental_activity, Cause_to_experience, Emotion_active, Emotions, Cause_emotion, etc.

Opinion

The type “Opinion”, instead, is the expression of positive or negative viewpoints, beliefs or judgments that can be personal or shared by most people. Among the WN-Affect categories, it comprises trait, cognitive state, behavior and attitude. Examples are ignorare “to neglect” (class 20), premiare “to reward” (class 20), difendere “to defend” (class 27), esaltare “to exalt” (class 22), dubitare “to doubt” (class 45), condannare “to condemn” (class 49), deridere “to make fun of” (class 50).

The frame elements they evoke are an opinion holder, who states an opinion about an object or an event, and an opinion target, which represents the event or the object about which the opinion is expressed (Kim and Hovy 2006; Liu 2012). Some predicates also imply the presence of a cause that motivates the opinion (see Section 3.3.3).

In the FrameNet frames Opinion and Judgment, the opinion holder is called Cognizer, but we preferred to use a word which is more common in the Sentiment Analysis and Opinion Mining literature.

Physical act

The type “Physical act” comprises verbs like baciare “to kiss” (class 18), suicidarsi “to commit suicide” (class 2), vomitare “to vomit” (class 2A), schiaffeggiare “to slap” (class 20), sparare “to shoot” (class 4),


palpeggiare “to grope” (class 18).

For this group of predicates, the selected frame elements are a patient, who is the victim (for the negative actions) or the beneficiary (for the positive ones) of the physical act carried out by an agent (Carreras and Màrquez 2005; Màrquez et al. 2008). It includes a large number of FrameNet frames, such as Cause_bodily_experience, Shoot_projectiles, Killing, Cause_harm, Rape, Sex, Violence, etc.

In order to perform the orientation and the intensity attribution, we manually explored the Italian LG tables of verbs and the SentIta semantic labels.

Thanks to lexical resources of this kind, it is possible to automatically extract and semantically describe real occurrences of sentences like (85),

(85) [Sentiment [Experiencer Renzi(N0)] si vergogna(V) [Stimulus di parlare di energia in Europa(N1)]]
“Renzi feels ashamed of talking about energy in Europe”

in which the syntactic structure of the verb vergognarsi “to feel ashamed”, N0 V (di) Ch F, is matched by means of interpretation rules (e.g. V = N0 V (di) Ch F = Caus(s, sent, h); N0 = h; N1 = s) to the semantic function Caus(s, sent, h), which puts in relation an experiencer h and a stimulus s thanks to a Sentiment Semantic Predicate sent (Gross 1995; see Section 2.1.4).

Moreover, we provided our LG databases with the specification of the arguments (N0, N1, N2, etc.) that are semantically influenced by the semantic orientation of the verbs. The purpose is to correctly identify them as features of the opinionated sentences and to use them also in feature-based sentiment analysis tasks.
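The interpretation rule for vergognarsi can be pictured as a mapping from syntactic slots to frame roles. The sketch below only illustrates that idea: the data layout and the function name are assumptions, while the structure, the frame and the N0/N1 assignments come from example (85).

```python
# Hypothetical encoding of one interpretation rule:
# V = N0 V (di) Ch F = Caus(s, sent, h); N0 = h (experiencer); N1 = s (stimulus).
RULES = {
    "vergognarsi": {
        "structure": "N0 V (di) Ch F",
        "frame": "Sentiment",
        "roles": {"N0": "Experiencer", "N1": "Stimulus"},
    },
}


def label_roles(verb, arguments):
    """Map the syntactic slots of a parsed sentence to frame roles.

    arguments: dict from slot name (e.g. "N0") to the matched text.
    Slots that the rule does not mention are ignored.
    """
    rule = RULES[verb]
    labeled = {
        rule["roles"][slot]: text
        for slot, text in arguments.items()
        if slot in rule["roles"]
    }
    return rule["frame"], labeled
```

Applied to (85), `label_roles("vergognarsi", {"N0": "Renzi", "N1": "di parlare di energia in Europa"})` labels Renzi as Experiencer and the infinitive clause as Stimulus of a Sentiment frame.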

Table 5.9: Examples of semantic descriptions of the opinionated verbs from the LG class 45

| Verb | LG Class | Type | Orientation | Intensity | Influence |
|---|---|---|---|---|---|
| profittare | 45 | opinion | negative | - | N0 |
| ridersene | 45 | opinion | negative | weak | N1 |
| risentirsi | 45 | sentiment | negative | - | N0 |
| strafottersene | 45 | opinion | negative | strong | N1 |
| vergognarsi | 45 | sentiment | negative | strong | N0 |

The reliability of the LG method on the Semantic Role Labeling in the Sentiment Analysis task has been tested on three different datasets, two of which have been extracted from social networks or web resources.

In detail, the first two datasets come from Twitter and the third is a free web news headings dataset provided by DataMediaHub (www.datamediahub.it, www.humanhighway.it).

The tweets have been downloaded using the hashtag that groups together the user comments on the election of the Italian President Mattarella (1) and the one that collects the comments on the Italian Masterchef TV show (2).

• Tweets: 46393 tweets

1. Mattarellapresidente: 10000 tweets

2. Masterchefit: 36393 tweets

• News Headings: 80651 titles

The LG-based approach on which this experiment has been built includes the following basic steps:

• a pre-processing pipeline, which includes two phases:

– a cleaning-up phase, carried out with Python routines, which aims to distinguish in the datasets linguistic elements from structural elements (e.g. markup information, web-specific elements);

– an automatic linguistic analysis phase, with the goal of linguistically standardizing the relevant elements obtained from the cleaned datasets; in this phase texts are tokenized, lemmatized and POS tagged using TreeTagger (Schmid 1994; Schmid et al. 2007) and then parsed using DeSR, a dependency parser (Attardi et al. 2009);

• an LG-based automatic analysis, in which the raw data are semantically labeled according to the syntactic/semantic rules of interpretation connected with each LG verb class.
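The cleaning-up phase can be pictured with a short Python routine. This is a minimal sketch and not the routines actually used in the experiment: which structural elements were removed is not detailed in the text, so the choice of markup tags, URLs, user mentions and the hashtag symbol is an assumption, as are all the names below.

```python
import re

# Assumed structural (non-linguistic) elements of tweets and web headings.
TAG = re.compile(r"<[^>]+>")         # markup tags
URL = re.compile(r"https?://\S+")    # web addresses
MENTION = re.compile(r"@\w+")        # user mentions
SPACES = re.compile(r"\s+")


def clean(text: str) -> str:
    """Strip structural elements and keep the linguistic material."""
    text = TAG.sub(" ", text)
    text = URL.sub(" ", text)
    text = MENTION.sub(" ", text)
    text = text.replace("#", "")  # keep the hashtag word, drop the symbol
    return SPACES.sub(" ", text).strip()
```

The cleaned strings would then be handed to the tokenization, lemmatization, POS tagging and parsing tools mentioned above, which are not called in this sketch.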

Figure 5.5: Examples of syntactically and semantically annotated sentences


Figure 5.5 presents three headline examples processed with both the dependency syntactic parser and the LG-based semantic analyzer. Notice that the elements of the traditional grammar automatically identified by DeSR, such as subjects and complements, have been renamed according to the Lexicon-Grammar tradition.

The corpus, which counts 127044 short texts, has been analyzed and semantically and syntactically annotated. The representative sample on which the human evaluation has been performed, instead, has 42348 texts.

The evaluation of the performances of our tool proved the effectiveness of the Lexicon-Grammar approach. The average F-scores achieved in the different datasets are 0.71 in the Twitter corpus and 0.76 in the Heading corpus.
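As a quick consistency check on the figures reported in this Section, the dataset sizes add up to the corpus size, and the evaluation sample corresponds to one third of the corpus. The last line recalls the usual balanced F-score in terms of Precision P and Recall R; that this is the variant used here is an assumption, since the text does not state it.

```latex
\begin{align*}
|\text{Tweets}| &= 10000 + 36393 = 46393 \\
|\text{Corpus}| &= 46393 + 80651 = 127044 \\
|\text{Sample}| &= 42348 = \tfrac{127044}{3} \\
F &= \frac{2PR}{P + R}
\end{align*}
```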

Chapter 6

Conclusion

The present research addressed the detection of linguistic phenomena connected to subjectivity, emotions and opinions from a computational point of view.

The necessity to quickly monitor huge quantities of semistructured and unstructured data from the web poses several challenges to Natural Language Processing, which must provide strategies and tools to analyze their structures from a lexical, syntactic and semantic point of view.

The general aim of Sentiment Analysis, shared with the broader fields of NLP, Data Mining, Information Extraction, etc., is the automatic extraction of value from chaos (Gantz and Reinsel 2011); its specific focus, instead, is on opinions rather than on factual information. This is the aspect that differentiates it from other computational linguistics subfields.

The majority of the sentiment lexicons have been manually or automatically created for the English language; therefore, existing Italian lexicons are mostly built through the translation and adaptation of the English lexical databases, e.g. SentiWordNet and WordNet-Affect.

Unlike many other Italian and English sentiment lexicons, SentIta,


made up of the interaction of electronic dictionaries and lexicon-dependent local grammars, is able to manage simple and multiword structures that can take the shape of distributionally free structures, distributionally restricted structures and frozen structures.

Moreover, differently from other lexicon-based Sentiment Analysis methods, our approach has been grounded on the solidity of the Lexicon-Grammar resources and classifications, which provide fine-grained semantic, but also syntactic, descriptions of the lexical entries. Such lexically exhaustive grammars distance themselves from the tendency of other sentiment resources to classify together words that have nothing in common from the syntactic point of view.

Furthermore, through the Semantic Predicate theory it has been possible to postulate a complex parallelism between syntax and semantics that does not ignore the numerous idiosyncrasies of the lexicon.

In accordance with the major contributions in the Sentiment Analysis literature, we did not consider polar words in isolation. We computed their elementary sentence contexts, with the allowed transformations, and then their interaction with contextual valence shifters, the linguistic devices that are able to modify the prior polarity of the words from SentIta when occurring with them in the same sentences.

In order to do so, we took advantage of the computational power of the finite-state technology. We formalized a set of rules that work for intensification, downtoning and negation modeling, modality detection and the analysis of comparative forms.

Here, the difference with other state-of-the-art strategies consists in the elimination of complex mathematical calculations in favor of the easier use of embedded graphs as containers for the expressions designed to receive the same annotations in a compositional framework.

With regard to the applicative part of the research, we conducted, with satisfactory results, three experiments on the same number of Sentiment Analysis subtasks: the sentiment classification of documents and sentences, the feature-based Sentiment Analysis and the semantic role labeling based on sentiments.

This work exploited a method based on knowledge, but this does not run counter to statistical strategies. All the semantically annotated datasets produced by our fine-grained analysis can indeed be reused as testing sets for learning machines.

As concerns the limitations of this research, we mention, among others, the cases of irony, sarcasm and cultural stereotypes, which still remain open problems for NLP in general and for Sentiment Analysis in particular, since they can completely overturn the polarity of the sentences.

Sarcasm, hyperbole, jocularity, etc. are all rhetorical devices used to express verbal irony, a figurative linguistic process used to intentionally deny what is literally expressed (Gibbs 2000; Curcó 2000). Therefore, irony could be interpreted as a sort of indirect negation (Giora 1995; Giora et al. 1998) in which some expectations are violated (Clark and Gerrig 1984; Wilson and Sperber 1992). On this point, Reyes et al. (2013, p. 243) affirm that irony is perceived “at the boundaries of conflicting frames of reference in which an expectation of one frame has been inappropriately violated in a way that is appropriate in the other”.

This linguistic device has been explored both in computational linguistics studies and in Opinion Mining works. Sets of potential indicators have been tested on large corpora through different approaches, obtaining encouraging results over the state of the art in some respects.

In detail, Tepperman et al. (2006) used prosodic, spectral and contextual cues; Carvalho et al. (2009) selected as irony cues emoticons, onomatopoeic expressions for laughter, heavy punctuation marks, quotation marks and positive interjections; Davidov et al. (2010) exploited the meta-data provided by Amazon to locate gaps between the star ratings and the sentiments expressed in the raw reviews; Reyes and Rosso (2011) chose n-grams, part of speech n-grams, funny profiling, positive/negative profiling, affective profiling and
pleasantness profiling; González-Ibánez et al. (2011) identified sarcasm through punctuation marks, emoticons, quotes, capitalized words, discursive terms that hint at opposition or contradiction in a text, temporal and contextual imbalance, character-grams, skip-grams and emotional scenarios concerning activation, imagery and pleasantness; Maynard and Greenwood (2014) exploited hashtag tokenization.

Among others, Veale and Hao (2009) carried out a very interesting study on creative irony, focusing on ironic comparisons of the kind “A Topic is as Ground as a Vehicle” (Fishelov 1993; Moon 2008), which can consist of pre-fabricated similes, e.g. “as strong as an ox” (Taylor 1954; Norrick 1986; Moon 2008), or of new, more emphatic and creative variations on the frozen similes, e.g. “as dangerous as a toothless wolf”. The authors found out that the 2 of these elaborations subvert the original simile to achieve an ironic effect. The problem is that sarcasm in general is a phenomenon inherently difficult to analyze for machines, and often for humans as well (Justo et al. 2014; Maynard and Greenwood 2014). In addition, the barriers to the creation of brand new comparisons are very low (Veale and Hao 2009).

In any case, the solutions to verbal irony proposed in the literature seem far from being accurate and generalizable. They frequently lead to contradictory results, as with the use of punctuation marks, which made Carvalho et al. (2009) and Tepperman et al. (2006) reach relatively high precision, but served as very weak predictors for Justo et al. (2014) and Tsur et al. (2010).

In this work, speaking in cost-benefit terms, we realized that the quantity of errors introduced by the use of irony cues would have exceeded the errors caused by their absence. Therefore, we limited the irony analysis to the cases of frozen comparative sentences described in Section 3.4.2, which can attribute a semantic orientation to neutral words (86) or reverse the orientation of polar words (87).

(86) Arturo è bianco[0] come un cadavere [-2]
“Arturo is as white as a dead body” (Arturo is pale)


(87) Maria è agile[+2] come una gatta di piombo [-2]
“Maria is as agile as a lead cat” (Maria is not agile)

Anyway, as can be noticed in (88) and (89), ironic comparisons can vary considerably from one case to another (Veale and Hao 2009) and cannot be distinguished automatically from non-ironic comparisons (90), as long as we do not include (domain-dependent) encyclopedic knowledge into the analysis tools.

(88) La ripresa è degna[+2] di un trattore con aratro inserito
“The pickup is worthy of a tractor with an inserted plough”

(89) E quel tocco di piccante (...) è gradevole[+2] quanto lo sarebbe una spruzzata di pepe su un gelato alla panna
“And that touch of piquancy (...) is as pleasant as a spattering of pepper on a cream flavoured ice-cream”

(90) Gli spazi interni sono comodi[+2] come una berlina [+2]
“Inside spaces are as comfortable as a sedan”

Similar comments on the required encyclopedic knowledge can be made about cultural stereotypes as well (91, 92).

(91) La nuova Fiat 500 è consigliabile[+2] molto di più ad una ragazza [-2]
“The new Fiat 500 is recommended a lot more to a girl”

(92) Un gioco per bambini di 12 anni [-2]
“A game for 12-year-old children”

Appendix A

Italian Lexicon-Grammar Symbols

Agg Val Evaluative adjective

Agg Adjective

Avv Adverb

Ci Constrained nominal group

Ch F Completive complement without specification of mood

Det Determiner

N0 Sentence formal subject

N1, N2, N3 Sentence complements

Ni Nominal group

N Noun

Napp Appropriate noun

Nconc Concrete noun


Npc Noun of body part

Nsent Noun of Sentiment

Num Human noun

N-um Not human noun

Ppv Pronoun in preverbal position

Prep Preposition

V Verb

V-a Adjectivalization

Vcaus Causative verb

V-n Nominalization

Vsup Support verb

Vval Evaluative verb

Vvar Support verb variant

Appendix B

Acronyms

ALU Atomic Linguistic Units

cBNP Combined Pattern Based Noun Phrases

CCG Combinatory Categorial Grammar

CFE Concordance-based Feature Extraction

CRF Conditional Random Field

CSR Class Sequential Rule

CVS Contextual Valence Shifters

DRS Discourse Representation Structure

ERTN Enhanced Recursive Transition Network

eWOM electronic Word of Mouth

FMM Forward Maximum Matching

FSA Finite State Automata

FST Finite State Transducers


LDA Latent Dirichlet Allocation

LG Lexicon-Grammar

LSA Latent Semantic Analysis

LSF Lexical Syntactic Feature

mCVS Morphological Contextual Valence Shifters

MPQA Multi Perspective Question Answering

NLP Natural Language Processing

PMI Pointwise Mutual Information

POS Part of Speech

PP Prior Polarity

QN Quality Noun

RNTN Recursive Neural Tensor Network

RST Relevant Semantic Tree

RTN Recursive Transition Network

SO Semantic Orientation

SVM Support Vector Machine

TGG Transformational-Generative Grammar

UDF User Defined taxonomy of entity Feature

WMI Web-based Mutual Information

XML eXtensible Markup Language

List of Figures

2.1 Syntactic Trees of the sentences (1) and (2)
2.2 Syntactic Trees of the sentences (1b) and (2b)
2.3 LG syntax-semantic interface
2.4 Deterministic Finite-State Automaton
2.5 Non deterministic Finite-State Automaton
2.6 Finite-State Transducer
2.7 Enhanced Recursive Transition Network

3.1 FSA for the interaction between Nsent and semantically annotated verbs
3.2 Extract of the morphological FSA that associates psych verbs to their adjectivalizations
3.3 Frozen and semi-frozen expressions that involve the vulgar word cazzo and its euphemisms
3.4 Extract of the FSA for the SO identification of idioms
3.5 Extract of the FSA for the extraction of PECO idioms
3.6 Extract of the FSA for the automatic annotation of sentiment adverbs
3.7 Extract of the FSA for the automatic annotation of quality nouns
3.8 Morphological Contextual Valence Shifting of Adjectives

5.1 Doxa, the LG sentiment classifier based on the finite-state technology
5.2 Extract of the Feature Extraction grammar
5.3 Extract of the Feature Pruning metanode
5.4 Doxa module for Feature-based Sentiment Analysis
5.5 Examples of syntactically and semantically annotated sentences

List of Tables

2.1 Example of a Lexicon-Grammar Binary Matrix

3.1 Manually built Polarity Lexicons
3.2 (Semi-)Automatically built Polarity Lexicons
3.3 Sentix (Basile and Nissim 2013)
3.4 SentIta Evaluation tagset
3.5 SentIta Strength tagset
3.6 Composition of SentIta
3.7 SentIta tag distribution in the adjective list
3.8 Adjectives percentage values in SentIta
3.9 Examples from the Psych Predicates dictionary
3.10 Psych Predicates chosen for SentIta
3.11 All the Semantic Predicates from SentIta
3.12 Polar verbs in the main Lexicon-Grammar verbal subclasses
3.13 Polar verbs in other LG verb classes which contain at least one oriented item
3.14 Distribution of semantic labels into the LG verb classes that contain at least one oriented item
3.15 Description of the Nominalizations of the Psychological Semantic Predicates
3.16 Nominalizations of the Psych Predicates chosen from SentIta
3.17 Distribution of semantic tags in the dictionary of Adjectival Psych Predicates
3.18 Percentage values of Psych Adjectivalizations in SentIta
3.19 Suffixes from the dictionary of Psych Adjectivalizations
3.20 Adjectivalizations of the Psych Predicates
3.21 SentIta tag distribution in the list of bad words
3.22 Bad Words percentage values in SentIta
3.23 Classes of Compound Adverbs in SentIta
3.24 Composition of the compound adverb dictionary
3.25 Polar adjectives in frozen sentences
3.26 Examples of idioms that maintain, switch or shift the prior polarity of the adjectives they contain
3.27 Composition of the adverb dictionary
3.28 Adjective’s inflectional classes (Salvi and Vanelli 2004)
3.29 Rules for the adverb derivation
3.30 Error analysis of the automatic QN annotation
3.31 Presence of the QN suffixes in different dictionaries
3.32 Productivity of the QN suffixes in different dictionaries
3.33 About the overlap among Psych V-n dictionary and QN dictionary
3.34 Precision measure about the overlap of QN and Psy V-n
3.35 Negation suffixes
3.36 Intensifier/downtoner suffixes

4.1 The effects of troppo, poco and abbastanza on polar lexical items
4.2 The effects of the co-occurrence and the negation of troppo, poco with reference to polar lexical items
4.3 Negation rules
4.4 Negation rules with the repetition of negative operators
4.5 Negation rules with strong operators
4.6 Negation rules with weak operators
4.7 Rule I: increasing comparative sentences with positive orientation
4.8 Rule III: decreasing comparative sentences with negative orientation
4.9 Rule IV: increasing comparative sentences with negative orientation
4.10 Rule VI: decreasing comparative sentences with positive orientation
4.11 Negation of Rule I: Negated increasing comparative sentences with negative orientation
4.12 Negation of Rule III: Negated decreasing comparative sentences with positive orientation
4.13 Negation of Rule IV: Negated increasing comparative sentences with negative orientation
4.14 Negation of Rule VI: Negated decreasing comparative sentences with positive orientation

5.1 Dataset of opinionated online customer reviews
5.2 Precision measure in sentence-level classification
5.3 Irrelevant matches in sentence-level classification
5.4 Precision measure in document-level classification
5.5 Recall in both the sentence-level and the document-level classifications
5.6 Semantically Labeled Dictionary of Concrete Nouns
5.7 F-score of Feature Extraction and Pruning
5.8 Extract of the LG table of the verb Class 45
5.9 Examples of semantic descriptions of the opinionated verbs from the LG class 45

Bibliography

Abdul-Mageed, M. and Diab, M. (2012). Toward building a large-scale Arabic sentiment lexicon. In Proceedings of the 6th International Global WordNet Conference, pages 18–22.

Abdul-Mageed, M., Diab, M. T., and Korayem, M. (2011). Subjectivity and sentiment analysis of modern standard Arabic. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2, pages 587–591. Association for Computational Linguistics.

Agerri, R. and García-Serrano, A. (2010). Q-WordNet: Extracting polarity from WordNet senses. In Proceedings of the International Conference on Language Resources and Evaluation, pages 2300–2305.

Akerlof, G. A. (1970). The market for “lemons”: Quality uncertainty and the market mechanism. In The quarterly journal of economics, pages 488–500. JSTOR.

Alba, J., Lynch, J., Weitz, B., Janiszewski, C., Lutz, R., Sawyer, A., and Wood, S. (1997). Interactive home shopping: consumer, retailer, and manufacturer incentives to participate in electronic marketplaces. In The Journal of Marketing, pages 38–53. JSTOR.

Alm, C. O., Roth, D., and Sproat, R. (2005). Emotions from text: machine learning for text-based emotion prediction. In Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, pages 579–586. Association for Computational Linguistics.

Andreevskaia, A. and Bergler, S. (2008). When specialists and generalists work together: Overcoming domain dependence in sentiment tagging. In ACL, pages 290–298.

Arad, M. (1998). Psych-notes. In UCL Working Papers in Linguistics, volume 10, pages 1–22.

Argamon, S., Bloom, K., Esuli, A., and Sebastiani, F. (2009). Automatically determining attitude type and force for sentiment analysis. In Human Language Technology. Challenges of the Information Society, pages 218–231. Springer.

Attardi, G., Dell’Orletta, F., Simi, M., and Turian, J. (2009). Accurate dependency parsing with a stacked multilayer perceptron. In Proceedings of EVALITA, volume 9.

Aue, A. and Gamon, M. Customizing sentiment classifiers to new domains: A case study. In Proceedings of recent advances in natural language processing (RANLP).

Baccianella, S., Esuli, A., and Sebastiani, F. (2010). SentiWordNet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining. In LREC, volume 10, pages 2200–2204.

Baker, C. F., Fillmore, C. J., and Lowe, J. B. (1998). The Berkeley FrameNet project. In Proceedings of the 17th international conference on Computational linguistics - Volume 1, pages 86–90. Association for Computational Linguistics.

Baker, M. (1988). Theta theory and the syntax of applicatives in Chichewa. In Natural Language & Linguistic Theory, volume 6, pages 353–389. Springer.

Baker, M. C. (1997). Thematic roles and syntactic structure. In Elements of grammar, pages 73–137. Springer.

BIBLIOGRAPHY 241

Balahur A and Turchi M (2012) Multilingual sentiment analysis us-ing machine translation In Proceedings of the 3rd Workshop inComputational Approaches to Subjectivity and Sentiment Analysispages 52ndash60 Association for Computational Linguistics

Baldoni M Baroglio C Patti V and Rena P (2012) From tags toemotions Ontology-driven sentiment analysis in the social seman-tic web In Intelligenza Artificiale volume 6 pages 41ndash54

Balibar-Mrabti A (1995) Une eacutetude de la combinatoire des noms desentiment dans une grammaire locale In Langue franccedilaise pages88ndash97 JSTOR

Baptista J (2003) Some families of compound temporal adverbs inportuguese In Proceedings of the Workshop on Finite-State Methodsfor Natural Language Processing

Baroni M and Vegnaduzzo S (2004) Identifying subjective adjec-tives through web-based mutual information In Proceedings ofKONVENS volume 4 pages 17ndash24

Basile V and Nissim M (2013) Sentiment analysis on italian tweetsIn Proceedings of the 4th Workshop on Computational Approaches toSubjectivity Sentiment and Social Media Analysis pages 100ndash107

Belletti A and Rizzi L (1988) Psych-verbs and θ-theory In NaturalLanguage amp Linguistic Theory volume 6 pages 291ndash352 Springer

Benamara F Cesarano C Picariello A Recupero D R and Subrah-manian V S (2007) Sentiment analysis Adjectives and adverbs arebetter than adjectives alone In ICWSM

Benamara F Chardon B Mathieu Y Popescu V and Asher N(2012) How do negation and modality impact on opinions In Pro-ceedings of the Workshop on Extra-Propositional Aspects of Meaningin Computational Linguistics pages 10ndash18 Association for Compu-tational Linguistics

242 BIBLIOGRAPHY

Bennis H (2004) Unergative adjectives and psych verbs In Studiesin unaccusativity The syntax-lexicon interface pages 84ndash113

Berger A L Pietra V J D and Pietra S A D (1996) A maximum en-tropy approach to natural language processing In Computationallinguistics volume 22 pages 39ndash71 MIT Press

Bespalov D Bai B Qi Y and Shokoufandeh A (2011) Sentimentclassification based on supervised latent n-gram analysis In Pro-ceedings of the 20th ACM international conference on Informationand knowledge management pages 375ndash382 ACM

Bethard S Yu H Thornton A Hatzivassiloglou V and Jurafsky D(2004) Automatic extraction of opinion propositions and their hold-ers In 2004 AAAI Spring Symposium on Exploring Attitude and Affectin Text volume 2224

Bhatti M W Wang Y and Guan L A neural network approachfor human emotion recognition in speech In Circuits and Systems2004 ISCASrsquo04 volume 2

Blitzer J Dredze M Pereira F et al (2007) Biographies bollywoodboom-boxes and blenders Domain adaptation for sentiment clas-sification In ACL volume 7 pages 440ndash447

Bloom K (2011) Sentiment analysis based on appraisal theory andfunctional local grammars PhD thesis Illinois Institute of Technol-ogy

Bloom K Garg N and Argamon S (2007) Extracting appraisal ex-pressions In HLT-NAACL volume 2007 pages 308ndash315

Bloomfield L (1933) Language holt new york In Edizione italiana(1974) Il linguaggio Il Saggiatore Milano

Bocchino F (2007) Lessico-Grammatica dellrsquoitaliano Le costruzioniintransitive PhD thesis Universitagrave degli studi di Salerno

BIBLIOGRAPHY 243

Bos J and Nissim M (2006) An empirical approach to the interpreta-tion of superlatives In Proceedings of the 2006 conference on empiri-cal methods in natural language processing pages 9ndash17 Associationfor Computational Linguistics

Bosco C Allisio L Mussa V Patti V Ruffo G Sanguinetti M andSulis E (2014) Detecting happiness in italian tweets Towards anevaluation dataset for sentiment analysis in felicitta In Proc of ES-SSLOD pages 56ndash63

Bosco C Patti V and Bolioli A (2013) Developing corpora for sen-timent analysis The case of irony and senti-tut In IEEE IntelligentSystems number 2 pages 55ndash63 IEEE

Boucher J and Osgood C E (1969) The pollyanna hypothesis InJournal of Verbal Learning and Verbal Behavior volume 8 pages 1ndash8 Elsevier

Bounie D Bourreau M Gensollen M and Waelbroeck P (2005)The effect of online customer reviews on purchasing decisions Thecase of video games In Retrieved July volume 8 page 2009 Citeseer

Breese J S Heckerman D and Kadie C (1998) Empirical analysisof predictive algorithms for collaborative filtering In Proceedingsof the Fourteenth conference on Uncertainty in artificial intelligencepages 43ndash52 Morgan Kaufmann Publishers Inc

Brill E (1995) Transformation-based error-driven learning and nat-ural language processing A case study in part-of-speech tagging InComputational linguistics volume 21 pages 543ndash565 MIT Press

Buchanan T W Lutz K Mirzazade S Specht K Shah N J ZillesK and Jaumlncke L (2000) Recognition of emotional prosody and ver-bal components of spoken language an fmri study In CognitiveBrain Research volume 9 pages 227ndash238 Elsevier

244 BIBLIOGRAPHY

Buvet P-A Girardin C Gross G and Groud C (2005) Les preacutedicatsdrsquoaffect In LIDIL number 32 pages ppndash125

Carbonell J G (1979) Subjective understanding Computer modelsof belief systems Technical report DTIC Document

Carenini G Ng R T and Zwart E (2005) Extracting knowledge fromevaluative text In Proceedings of the 3rd international conference onKnowledge capture pages 11ndash18 ACM

Carreras X and Magraverquez L (2005) Introduction to the conll-2005shared task Semantic role labeling In Proceedings of the Ninth Con-ference on Computational Natural Language Learning pages 152ndash164 Association for Computational Linguistics

Carvalho P Sarmento L Silva M J and De Oliveira E (2009) Cluesfor detecting irony in user-generated contents oh itrsquos so easy-) In Proceedings of the 1st international CIKM workshop on Topic-sentiment analysis for mass opinion pages 53ndash56 ACM

Chen Y Zhou Y Zhu S and Xu H (2012) Detecting offensive lan-guage in social media to protect adolescent online safety In Pri-vacy Security Risk and Trust (PASSAT) 2012 International Confer-ence on and 2012 International Confernece on Social Computing (So-cialCom) pages 71ndash80 IEEE

Chetviorkin I and Loukachevitch N V (2012) Extraction of rus-sian sentiment lexicon for product meta-domain In COLING pages593ndash610

Chevalier J A and Mayzlin D (2006) The effect of word of mouthon sales Online book reviews In Journal of marketing research vol-ume 43 pages 345ndash354 American Marketing Association

Chin P (2005) The value of user-generated contents In Intranet Jour-nal www intranetjournal com

BIBLIOGRAPHY 245

Choi Y Breck E and Cardie C (2006) Joint extraction of entities andrelations for opinion recognition In Proceedings of the 2006 Confer-ence on Empirical Methods in Natural Language Processing pages431ndash439 Association for Computational Linguistics

Choi Y and Cardie C (2008) Learning with compositional seman-tics as structural inference for subsentential sentiment analysis InProceedings of the Conference on Empirical Methods in Natural Lan-guage Processing pages 793ndash801

Choi Y Cardie C Riloff E and Patwardhan S (2005) Identify-ing sources of opinions with conditional random fields and extrac-tion patterns In Proceedings of the conference on Human LanguageTechnology and Empirical Methods in Natural Language Processingpages 355ndash362 Association for Computational Linguistics

Chomsky N (1957) Syntactic Structures Mouton and Co The Hague-Paris

Chomsky N (1965) Aspects of the Theory of Syntax Number 11 MITpress

Chomsky N (1993) Lectures on government and binding The Pisalectures Number 9 Walter de Gruyter

Chu S-C and Kim Y (2011) Determinants of consumer engagementin electronic word-of-mouth (ewom) in social networking sites InInternational journal of Advertising volume 30 pages 47ndash75 Tayloramp Francis

Clark H H and Gerrig R J (1984) On the pretense theory of ironyIn Journal of Experimental Psychology General volume 113 pages121ndash126

Clark S and Curran J R (2004) The importance of supertagging forwide-coverage ccg parsing In Proceedings of the 20th international

246 BIBLIOGRAPHY

conference on Computational Linguistics page 282 Association forComputational Linguistics

Clematide S and Klenner M (2010) Evaluation and extension of apolarity lexicon for german In Proceedings of the First Workshop onComputational Approaches to Subjectivity and Sentiment Analysispages 7ndash13

Councill I G McDonald R and Velikovich L (2010) Whatrsquos greatand whatrsquos not learning to classify the scope of negation for im-proved sentiment analysis In Proceedings of the workshop on nega-tion and speculation in natural language processing pages 51ndash59Association for Computational Linguistics

Curcoacute C (2000) Irony Negation echo and metarepresentation InLingua volume 110 pages 257ndash280 Elsevier

DrsquoAgostino E (1992) Analisi del discorso metodi descrittividellrsquoitaliano drsquouso Loffredo

DrsquoAgostino E (2005) Grammatiche lessicalmente esaustive delle pas-sioni il caso dellrsquoio collerico le forme nominali In Quaderns drsquoItaliagravepages 149ndash169

DrsquoAgostino E De Bueriis G Cicalese A Monteleone M VellutinoD Messina S Langella A Santonicola S Longobardi F andGuglielmo D (2007) Lexicon-grammar classifications or better toget rid of anguish In 26th International Conference on Lexis andGrammar (LGCrsquo07)

DrsquoAgostino E and Elia A (1998) Il significato delle frasi un contin-uum dalle frasi semplici alle forme polirematiche In AA VV Ai limitidel linguaggio Laterza Bari pages 287ndash310

Dalianis H and Skeppstedt M (2010) Creating and evaluating a con-sensus for negated and speculative words in a swedish clinical cor-pus In Negation and Speculation in Natural Language Processing(NeSp-NLP 2010) page 5

BIBLIOGRAPHY 247

Dang Y Zhang Y and Chen H (2010) A lexicon-enhanced methodfor sentiment classification An experiment on online product re-views In Intelligent Systems IEEE volume 25 pages 46ndash53 IEEE

Darroch J N and Ratcliff D (1972) Generalized iterative scaling forlog-linear models In The annals of mathematical statistics pages1470ndash1480 JSTOR

Dasgupta S and Ng V (2009) Mine the easy classify the hard asemi-supervised approach to automatic sentiment classification InProceedings of the Joint Conference of the 47th Annual Meeting of theACL and the 4th International Joint Conference on Natural LanguageProcessing of the AFNLP Volume 2-Volume 2 pages 701ndash709 Asso-ciation for Computational Linguistics

Davidov D Tsur O and Rappoport A (2010) Semi-supervisedrecognition of sarcastic sentences in twitter and amazon In Pro-ceedings of the Fourteenth Conference on Computational NaturalLanguage Learning pages 107ndash116 Association for ComputationalLinguistics

de Albornoz J C Plaza L and Gervaacutes P (2012) Sentisense An easilyscalable concept-based affective lexicon for sentiment analysis InLREC pages 3562ndash3567

De Bueriis G and Elia A (2008) Lessici elettronici e descrizioni lessi-cali sintattiche morfologiche ed ortografiche Plectica

De Gioia M (1994a) Chiaro come il sole a mezzogiorno una classedi avverbi idiomatici dellrsquoitaliano In The Linguist volume 33 pages220ndash225 Newnorth Print Ltd

De Gioia M (1994b) Problemi di rappresentazione et traduzione degliavverbi idiomatici Micromeacutegas

De Gioia M (2001) Avverbi idiomatici dellrsquoitaliano analisi lessico-grammaticale LrsquoHarmattan Italia

248 BIBLIOGRAPHY

De Mauro T and Thornton A M (1985) La predicazione teoria e ap-plicazione allrsquoitaliano In Sintassi e morfologia della lingua italianadrsquouso teorie ed applicazioni descrittive pages 487ndash519

De Silva L C Miyasato T and Nakatsu R (1997) Facial emotionrecognition using multi-modal information In Information Com-munications and Signal Processing 1997 ICICS Proceedings of 1997International Conference on volume 1 pages 397ndash401 IEEE

Denecke K (2008) Using SentiWordNet for multilingual sentimentanalysis In Data Engineering Workshop 2008 ICDEW 2008 IEEE24th International Conference on pages 507ndash512 IEEE

Descleacutes J Makkaoui O and Hacegravene T (2010) Automatic annota-tion of speculation in biomedical texts new perspectives and largescale evaluation In Negation and Speculation in Natural LanguageProcessing (NeSp-NLP 2010) page 32

Dewally M and Ederington L (2006) Reputation certification war-ranties and information as remedies for seller-buyer informationasymmetries Lessons from the online comic book market In TheJournal of Business volume 79 pages 693ndash729 JSTOR

Diamond D W (1989) Reputation acquisition in debt markets InThe journal of political economy pages 828ndash862 JSTOR

Dowty D R (1989) On the semantic content of the notion of lsquothe-matic rolersquo In Properties types and meaning pages 69ndash129 Springer

Dragut E C Yu C Sistla P and Meng W (2010) Construction ofa sentimental word dictionary In Proceedings of the 19th ACM in-ternational conference on Information and knowledge managementpages 1761ndash1764

Duan W Gu B and Whinston A B (2008) The dynamics of on-line word-of-mouth and product salesmdashan empirical investigation

BIBLIOGRAPHY 249

of the movie industry In Journal of retailing volume 84 pages 233ndash242 Elsevier

Elia A Per filo e per segno la struttura degli avverbi compostiIn DrsquoAgostino (a cura di) Tra sintassi e semantica Descrizioni emetodi di elaborazione automatica della lingua drsquouso pages 167ndash263 Napoli ESI

Elia A (1984) Le verbe italien Les compleacutetives dans les phrases agrave uncompleacutement Fasano di Puglia Schena-Nizet

Elia A (1990) Chiaro e tondo Lessico-Grammatica degli avverbi com-posti in italiano Segno Associati

Elia A (2013) On lexical semantic and syntactic granularity of ital-ian verbs In Penser le Lexique Grammaire Perspectives ActuellesHonoreacute Champion Paris pages 277ndash286

Elia A (2014a) Lessico e sintassi tra tempo e massa parlante InMarchese MP Nocentini A Il lessico nella teoria e nella storia lin-guistica pages 15ndash47 Edizioni il Calamo

Elia A (2014b) Operatori argomenti e il sistema leg-semantic rolelabelling dellrsquoitaliano In Mirto I a cura di Le relazioni irresistibilipages 105ndash118 Edizioni Ets

Elia A and De Bueriis G (2008) Lessici elettronici e descrizioni lessi-cali sintattiche morfologiche ed ortografiche Plectica

Elia A Guglielmo D Maisto A and Pelosi S (2013) A linguistic-based method for automatically extracting spatial relations fromlarge non-structured data In Algorithms and Architectures for Par-allel Processing pages 193ndash200 Springer

Elia A Martinelli M and DrsquoAgostino E (1981) Lessico e Strut-ture sintattiche Introduzione alla sintassi del verbo italiano NapoliLiguori

250 BIBLIOGRAPHY

Elia A Pelosi S Maisto A and Raffaele G (2015) Towards alexicon-grammar based framework for nlp an opinion mining ap-plication In RANLP 2015 pages 160ndash167

Elia A and Vietri S (2010) Lexis-grammar amp semantic web In IN-FOtheca volume 11 pages 15andash38a

Elia A Vietri S Postiglione A Monteleone M and Marano F(2010) Data mining modular software system In SWWS pages 127ndash133

Engstroumlm C (2004) Topic dependence in sentiment classification InUnpublished MPhil Dissertation University of Cambridge

Essa I Pentland A P et al (1997) Coding analysis interpretationand recognition of facial expressions In Pattern Analysis and Ma-chine Intelligence IEEE Transactions on volume 19 pages 757ndash763IEEE

Esuli A and Sebastiani F (2005) Determining the semantic orienta-tion of terms through gloss classification In Proceedings of the 14thACM international conference on Information and knowledge man-agement pages 617ndash624 ACM

Esuli A and Sebastiani F (2006a) Determining term subjectivity andterm orientation for opinion mining In EACL volume 6 page 2006

Esuli A and Sebastiani F (2006b) SentiWordNet A publicly avail-able lexical resource for opinion mining In Proceedings of LRECvolume 6 pages 417ndash422

Farkas D F and Kiss K Eacute (2000) On the comparative and absolutereadings of superlatives In Natural Language amp Linguistic Theoryvolume 18 pages 417ndash455 Springer

Farkas R Vincze V Moacutera G Csirik J and Szarvas G (2010) Theconll-2010 shared task learning to detect hedges and their scope in

BIBLIOGRAPHY 251

natural language text In Proceedings of the Fourteenth Conferenceon Computational Natural Language LearningmdashShared Task pages1ndash12 Association for Computational Linguistics

Fellbaum C (1998) WordNet Wiley Online Library

Ferreira L Jakob N and Gurevych I (2008) A comparative studyof feature extraction algorithms in customer reviews In SemanticComputing 2008 IEEE International Conference on pages 144ndash151IEEE

Filip H (1996) Psychological predicates and the syntax-semantics in-terface In Conceptual Structure Discourse and Language StanfordCenter for the Study of Language and Information

Fillmore C J (1976) Frame semantics and the nature of languageIn Annals of the New York Academy of Sciences volume 280 pages20ndash32 Wiley Online Library

Fillmore C J (1977) The case for case reopened In Syntax and se-mantics volume 8 pages 59ndash82

Fillmore C J (2006) Frame semantics In Cognitive linguisticsBasic readings volume 34 pages 373ndash400 Berlin Mounton deGruyter[Originally published in Linguistics in the Morning CalmSeoul Hanshin Publishing Co 1982]

Fillmore C J and Baker C F (2001) Frame semantics for text un-derstanding In Proceedings of WordNet and Other Lexical ResourcesWorkshop NAACL

Fillmore C J Baker C F and Sato H (2002) The framenet databaseand software tools In LREC

Fishelov D (1993) Poetic and non-poetic simile structure seman-tics rhetoric In Poetics Today pages 1ndash23 JSTOR

252 BIBLIOGRAPHY

Fiszman M Demner-Fushman D Lang F M Goetz P and Rind-flesch T C (2007) Interpreting comparative constructions inbiomedical text In Proceedings of the Workshop on BioNLP 2007Biological Translational and Clinical Language Processing pages137ndash144 Association for Computational Linguistics

Fragopanagos N and Taylor J G (2005) Emotion recognition inhumanndashcomputer interaction In Neural Networks volume 18pages 389ndash405 Elsevier

Gaeta L (2004) Nomi drsquoazione In La formazione delle parole in ital-iano Tuumlbingen Max Niemeyer Verlag pages 314ndash51

Gamon M and Aue A (2005) Automatic identification of senti-ment vocabulary exploiting low association with known sentimentterms In Proceedings of the ACL Workshop on Feature Engineeringfor Machine Learning in Natural Language Processing pages 57ndash64

Ganapathibhotla M and Liu B (2008) Mining opinions in compara-tive sentences In Proceedings of the 22nd International Conferenceon Computational Linguistics-Volume 1 pages 241ndash248 Associationfor Computational Linguistics

Ganter V and Strube M (2009) Finding hedges by chasing weaselsHedge detection using wikipedia tags and shallow linguistic fea-tures In Proceedings of the ACL-IJCNLP 2009 Conference Short Pa-pers pages 173ndash176 Association for Computational Linguistics

Gantz J and Reinsel D (2011) Extracting value from chaos In IDCiview number 1142 pages 9ndash10

Garcia D Garas A and Schweitzer F (2012) Positive words carryless information than negative words In EPJ Data Science volume 1EPJ

Gatti L and Guerini M (2012) Assessing sentiment strength in wordsprior polarities In arXiv preprint arXiv12124315

BIBLIOGRAPHY 253

Gibbs R W (2000) Irony in talk among friends In Metaphor andsymbol volume 15 pages 5ndash27 Taylor amp Francis

Gildea D and Jurafsky D (2002) Automatic labeling of semanticroles In Computational linguistics volume 28 pages 245ndash288 MITPress

Giora R (1995) On irony and negation In Discourse processes vol-ume 19 pages 239ndash264 Taylor amp Francis

Giora R Fein O and Schwartz T (1998) Irony Grade salience andindirect negation In Metaphor and Symbol volume 13 pages 83ndash101 Taylor amp Francis

Giordano R and Voghera M (2008) Frasi senza verbo il contributodella prosodia In Sintassi storica e sincronica dellrsquoitaliano Atti delConv Intern SILFI Basilea

Giuglea A-M and Moschitti A (2006) Semantic role labeling viaframenet verbnet and propbank In Proceedings of the 21st Inter-national Conference on Computational Linguistics and the 44th an-nual meeting of the Association for Computational Linguistics pages929ndash936 Association for Computational Linguistics

Godard D (2013) Les neacutegateurs In La grande Grammaire du franccedilaisActes Sud

Goldberg A B and Zhu X (2006) Seeing stars when there arenrsquot manystars graph-based semi-supervised learning for sentiment catego-rization In Proceedings of the First Workshop on Graph Based Meth-ods for Natural Language Processing pages 45ndash52 Association forComputational Linguistics

Gonzaacutelez-Ibaacutenez R Muresan S and Wacholder N (2011) Identify-ing sarcasm in twitter a closer look In Proceedings of the 49th An-nual Meeting of the Association for Computational Linguistics Hu-man Language Technologies short papers-Volume 2 pages 581ndash586Association for Computational Linguistics

254 BIBLIOGRAPHY

Grimshaw J (1990) Argument structure the MIT Press

Gross G (1992a) Un outil pour le FLE les classes d lsquoobjets In Actesdu colloque FLE pages 169ndash192

Gross G (2008) Les classes drsquoobjets In Lalies number 28 pages 111ndash165

Gross M (1971) Transformational Analysis of French Verbal Con-structions University of Pennsylvania

Gross M (1975) Meacutethodes en syntaxe Hermann

Gross M (1979) On the failure of generative grammar In Languagepages 859ndash885 JSTOR

Gross M (1981) Les bases empiriques de la notion de preacutedicat seacute-mantique In Langages pages 7ndash52 JSTOR

Gross M (1982) Une classification des phrases laquofigeacuteesraquo du franccedilaisIn Revue queacutebeacutecoise de linguistique volume 11 pages 151ndash185 Uni-versiteacute du Queacutebec agrave Montreacuteal

Gross M (1986) Lexicon-grammar the representation of compoundwords In Proceedings of the 11th coference on Computational lin-guistics pages 1ndash6 Association for Computational Linguistics

Gross M (1988) Les limites de la phrase figeacutee In Langages pages7ndash22 JSTOR

Gross M (1990) La caracteacuterisation des adverbes dans un lexique-grammaire In Langue franccedilaise pages 90ndash102 JSTOR

Gross M (1992b) The argument structure of elementary sentencesIn Language Research volume 28 pages 699ndash716

Gross M (1993) Les phrases figeacutees en franccedilais In Lrsquoinformationgrammaticale volume 59 pages 36ndash41 Peeters

BIBLIOGRAPHY 255

Gross M (1995) Une grammaire locale de lrsquoexpression des senti-ments In Langue franccedilaise pages 70ndash87 JSTOR

Gross M (1996) Les verbes supports drsquoadjectifs et le passif In Lan-gages volume 30 pages 8ndash18 Armand Colin

Gross M (1997) The construction of local grammars In Finite-statelanguage processing volume 11 page 329 MIT Press

Gruber J S (1965) Studies in lexical relations PhD thesis Mas-sachusetts Institute of Technology

Gruen T W Osmonbekov T and Czaplewski A J (2006) ewomThe impact of customer-to-customer online know-how exchangeon customer value and loyalty In Journal of Business research vol-ume 59 pages 449ndash456 Elsevier

Guglielmo D (2009) ldquoGuardarsi allo specchiordquo forme lessicali e sin-tattiche della vanitagrave In Atti del II Convegno Interdipartimentale delCentro Studi sulle Rappresentazioni Linguistiche

Guillet A and Leclegravere C (1981) Restructuration du groupe nominalIn Langages pages 99ndash125 JSTOR

Gutieacuterrez Y Vaacutezquez S and Montoyo A (2011) Sentiment classi-fication using semantic features extracted from WordNet-based re-sources In Proceedings of the 2nd Workshop on Computational Ap-proaches to Subjectivity and Sentiment Analysis pages 139ndash145 As-sociation for Computational Linguistics

Hanani U Shapira B and Shoval P (2001) Information filteringOverview of issues research and systems In User Modeling andUser-Adapted Interaction volume 11 pages 203ndash259 Kluwer Aca-demic Publishers

Hansen L K Arvidsson A Nielsen F Aring Colleoni E and Etter M(2011) Good friends bad news-affect and virality in twitter In Fu-ture information technology pages 34ndash43 Springer

256 BIBLIOGRAPHY

Harkema H Dowling J N Thornblade T and Chapman W W(2009) Context An algorithm for determining negation expe-riencer and temporal status from clinical reports In Journal ofbiomedical informatics volume 42 page 839 NIH Public Access

Harris Z (1988) Language and information Columbia UniversityPress

Harris Z S (1964) Transformations in linguistic structure In Pro-ceedings of the American Philosophical Society pages 418ndash422

Harris Z S (1970) Discourse analysis In Papers in structural andtransformational linguistics pages 313ndash347

Hassan A and Radev D (2010) Identifying text polarity using randomwalks In Proceedings of the 48th Annual Meeting of the Associationfor Computational Linguistics pages 395ndash403

Hatzivassiloglou V and McKeown K R (1997) Predicting the seman-tic orientation of adjectives In Proceedings of the 35th annual meet-ing of the association for computational linguistics and eighth con-ference of the european chapter of the association for computationallinguistics pages 174ndash181 Association for Computational Linguis-tics

Herlocker J L Konstan J A Borchers A and Riedl J (1999) An al-gorithmic framework for performing collaborative filtering In Pro-ceedings of the 22nd annual international ACM SIGIR conference onResearch and development in information retrieval pages 230ndash237ACM

Hernandez-Farias I Buscaldi D and Priego-Saacutenchez B (2014)Iradabe Adapting english lexicons to the italian sentiment polar-ity classification task In First Italian Conference on ComputationalLinguistics (CLiC-it 2014) and the fourth International WorkshopEVALITA2014 pages 75ndash81

BIBLIOGRAPHY 257

Horn L (1989) A natural history of negation University of ChicagoPress

Houen S (2011) Opinion mining with semantic analysis In Onlinewww diku dk forskning Publikationer specialer 2011

specialerapport_ final_ Soren_ Houen pdf

Hu M and Liu B (2004) Mining opinion features in customer re-views In AAAI volume 4 pages 755ndash760

Hu M and Liu B (2006) Opinion feature extraction using classsequential rules In AAAI Spring Symposium Computational Ap-proaches to Analyzing Weblogs pages 61ndash66

Iacobini C (2004) Prefissazione In La formazione delle parole initaliano Tuumlbingen Max Niemeyer Verlag pages 97ndash161

Jackendoff R (1972) Semantic interpretation in generative grammarMIT press Cambridge MA

Jackendoff R (1990) Semantic structures volume 18 MIT press

Jespersen O (1965) The philosophy of grammar Number 307 Uni-versity of Chicago Press

Jia L Yu C and Meng W (2009) The effect of negation on senti-ment analysis and retrieval effectiveness In Proceedings of the 18thACM conference on Information and knowledge management pages1827ndash1830 ACM

Jin X Li Y Mah T and Tong J (2007) Sensitive webpage classifica-tion for content advertising In Proceedings of the 1st internationalworkshop on Data mining and audience intelligence for advertisingpages 28ndash33 ACM

Jindal N and Liu B (2006a) Identifying comparative sentences intext documents In Proceedings of the 29th annual internationalACM SIGIR conference on Research and development in informationretrieval pages 244ndash251 ACM

258 BIBLIOGRAPHY

Jindal N and Liu B (2006b) Mining comparative sentences and re-lations In AAAI volume 22 pages 1331ndash1336

Jindal N and Liu B (2007) Review spam detection In Proceedings ofthe 16th international conference on World Wide Web pages 1189ndash1190 ACM

Justo R Corcoran T Lukin S M Walker M and Torres M I (2014)Extracting relevant knowledge for the detection of sarcasm and nas-tiness in the social web In Knowledge-Based Systems volume 69pages 124ndash133 Elsevier

Kaji N and Kitsuregawa M (2007) Building lexicon for sentimentanalysis from massive collection of html documents In EMNLP-CoNLL pages 1075ndash1083

Kamp H and Reyle U (1993) From discourse to logic Introduction tomodeltheoretic semantics of natural language formal logic and dis-course representation theory Number 42 Springer Science amp Busi-ness Media

Kamps J Marx M Mokken R J and De Rijke M (2004) Usingwordnet to measure semantic orientations of adjectives In LRECvolume 4 pages 1115ndash1118

Kanayama H and Nasukawa T (2006a) Fully automatic lexicon ex-pansion for domain-oriented sentiment analysis In Proceedings ofthe 2006 Conference on Empirical Methods in Natural Language Pro-cessing pages 355ndash363 Association for Computational Linguistics

Kanayama H and Nasukawa T (2006b) Fully automatic lexicon ex-pansion for domain-oriented sentiment analysis In COLING ACL2006 page 355

Kang H Yoo S J and Han D (2012) Senti-lexicon and improvednaiumlve bayes algorithms for sentiment analysis of restaurant reviews

BIBLIOGRAPHY 259

In Expert Systems with Applications volume 39 pages 6000ndash6010Elsevier

Kennedy A and Inkpen D (2006) Sentiment classification of moviereviews using contextual valence shifters In Computational intelli-gence volume 22 pages 110ndash125 Wiley Online Library

Khan K Baharudin B and Khan A Identifying product featuresfrom customer reviews using hybrid dependency patterns Citeseer

Khan K Baharudin B B and City T (2012) Identifying product fea-tures from customer reviews using lexical concordance In ResearchJournal of Applied Sciences Engineering and Technology volume 4pages 833ndash839

Kim S-M and Hovy E (2004) Determining the sentiment of opin-ions In Proceedings of the 20th international conference on Compu-tational Linguistics page 1367

Kim S-M and Hovy E (2005) Identifying opinion holders for ques-tion answering in opinion texts In Proceedings of AAAI-05 Workshopon Question Answering in Restricted Domains pages 1367ndash1373

Kim S-M and Hovy E (2006) Extracting opinions opinion holdersand topics expressed in online news media text In Proceedings ofthe Workshop on Sentiment and Subjectivity in Text pages 1ndash8 As-sociation for Computational Linguistics

Kim Y Kim S and Myaeng S-H (2008) Extracting topic-relatedopinions and their targets in NTCIR-7 In Proceedings of the 7th NT-CIR Workshop Meeting Tokyo Japan

Kim Y-j and Larson R (1989) Scope interpretation and the syntaxof psych-verbs In Linguistic Inquiry pages 681ndash688 JSTOR

Kingsbury P and Palmer M (2002) From treebank to propbank InLREC

260 BIBLIOGRAPHY

Klein B and Leffler K B (1981) The role of market forces in assuring contractual performance In The Journal of Political Economy pages 615–641 JSTOR

Klein K and Kutscher S (2002) Psych-verbs and lexical economy In Theorie des Lexikons volume 122 pages 1–41

Klein L R (1998) Evaluating the potential of interactive media through a new lens Search versus experience goods In Journal of business research volume 41 pages 195–203 Elsevier

Kleinberg J M (1999) Authoritative sources in a hyperlinked environment In Journal of the ACM (JACM) volume 46 pages 604–632 ACM

Kobayakawa T S Kumano T Tanaka H Okazaki N Kim J-D and Tsujii J (2009) Opinion classification with tree kernel svm using linguistic modality analysis In Proceedings of the 18th ACM conference on Information and knowledge management pages 1791–1794 ACM

Ku L-W Huang T-H and Chen H-H (2009) Using morphological and syntactic structures for chinese opinion analysis In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing Volume 3-Volume 3 pages 1260–1269

Lafferty J McCallum A and Pereira F C (2001) Conditional random fields Probabilistic models for segmenting and labeling sequence data

Lakoff G (1973) Hedges A study in meaning criteria and the logic of fuzzy concepts In Journal of philosophical logic volume 2 pages 458–508 Springer

Landau I (2010) The locative syntax of experiencers volume 34 MIT press Cambridge

Landauer T K and Dumais S T (1997) A solution to plato’s problem The latent semantic analysis theory of acquisition induction and representation of knowledge In Psychological review volume 104 page 211 American Psychological Association

Laporte E (1997) L’analyse de phrases adjectivales par rétablissement de noms appropriés In Langages volume 31 pages 79–104 Armand Colin

Laporte E (2012) Appropriate nouns with obligatory modifiers In arXiv preprint arXiv:1207.4625

Laporte E and Voyatzi S (2008) An electronic dictionary of french multiword adverbs In Language Resources and Evaluation Conference Workshop Towards a Shared Task for Multiword Expressions pages 31–34

Larreya P (2004) L’expression de la modalité en français et en anglais (domaine verbal) In Revue belge de philologie et d’histoire volume 82 pages 733–762 Société pour le progrès des études philologiques et historiques

Lavanda I and Rampa G (2001) Microeconomia scelte individuali e benessere sociale Carocci

Laver M Benoit K and Garry J (2003) Extracting policy positions from political texts using words as data In American Political Science Review volume 97 pages 311–331 Cambridge Univ Press

Le Pesant D and Mathieu-Colas M (1998) Introduction aux classes d’objets In Langages pages 6–33 JSTOR

Lee C M Narayanan S S and Pieraccini R (2002) Combining acoustic and language information for emotion recognition In INTERSPEECH

Lee M and Youn S (2009) Electronic word of mouth (ewom) how ewom platforms influence consumer product judgement In International Journal of Advertising volume 28 pages 473–499 Taylor & Francis

Lenci A Lapesa G and Bonansinga G (2012) Lexit A computational resource on italian argument structure In LREC pages 3712–3718

Leung C W Chan S C and Chung F-l (2006) Integrating collaborative filtering and sentiment analysis A rating inference approach In Proceedings of the ECAI 2006 workshop on recommender systems pages 62–66

Leung C W Chan S C and Chung F-L (2008) Evaluation of a rating inference approach to utilizing textual reviews for collaborative recommendation In Cooperative Internet Computing World Scientific Publisher World Scientific

Li F Huang M and Zhu X (2010) Sentiment analysis with global topics and local dependency In AAAI

Linden G Smith B and York J (2003) Amazon.com recommendations Item-to-item collaborative filtering In Internet Computing IEEE volume 7 pages 76–80 IEEE

Liu B (2010) Sentiment analysis and subjectivity In Handbook of natural language processing volume 2 pages 627–666 Chapman & Hall Goshen CT

Liu B (2012) Sentiment analysis and opinion mining In Synthesis Lectures on Human Language Technologies volume 5 pages 1–167 Morgan & Claypool Publishers

Liu J and Seneff S (2009) Review sentiment scoring via a parse-and-paraphrase paradigm In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing Volume 1-Volume 1 pages 161–169 Association for Computational Linguistics

Macdonald C Ounis I and Soboroff I (2007) Overview of the trec 2007 blog track In TREC volume 7 pages 31–43

Mahmud A Ahmed K Z and Khan M (2008) Detecting flames and insults in text In Proceedings of the Sixth International Conference on Natural Language Processing BRAC University

Maisto A and Pelosi S (2014a) Feature-based customer review summarization In On the Move to Meaningful Internet Systems OTM 2014 Workshops pages 299–308

Maisto A and Pelosi S (2014b) A lexicon-based approach to sentiment analysis the italian module for nooj In Proceedings of the International Nooj 2014 Conference University of Sassari Italy Cambridge Scholar Publishing

Maks I and Vossen P (2011) Different approaches to automatic polarity annotation at synset level In Proceedings of the First International Workshop on Lexical Resources WoLeR pages 62–69

Maks I Vossen P et al (2012) Building a fine-grained subjectivity lexicon from a web corpus In LREC pages 3070–3076

Markov A (1971) Extension of the limit theorems of probability theory to a sum of variables connected in a chain John Wiley & Sons Inc

Màrquez L Carreras X Litkowski K C and Stevenson S (2008) Semantic role labeling an introduction to the special issue In Computational linguistics volume 34 pages 145–159 MIT Press

Martín J (1998) The lexico-syntactic interface psych verbs and psych nouns In Hispania pages 619–631 JSTOR

Mathieu Y Y (1995) Verbes psychologiques et interprétation sémantique In Langue française pages 98–106 JSTOR

Mathieu Y Y (1999a) Les prédicats de sentiment In Langages pages 41–52 JSTOR

Mathieu Y Y (1999b) Un classement sémantique des verbes psychologiques In Cahiers du CIEL Publications Paris 7

Mathieu Y Y (2006) A computational semantic lexicon of french verbs of emotion In Computing attitude and affect in text Theory and applications pages 109–124 Springer

Matthews H Hendrickson C and Soh D (2001) Environmental and economic effects of e-commerce A case study of book publishing and retail logistics In Transportation Research Record Journal of the Transportation Research Board number 1763 pages 6–12 Transportation Research Board of the National Academies

Maynard D and Greenwood M A (2014) Who cares about sarcastic tweets investigating the impact of sarcasm on sentiment analysis In Proceedings of LREC

Maziero E G Pardo T A Di Felippo A and Dias-da Silva B C (2008) A base de dados lexical e a interface web do tep 2.0 thesaurus eletrônico para o português do brasil In Companion Proceedings of the XIV Brazilian Symposium on Multimedia and the Web pages 390–392 ACM

McAfee A Brynjolfsson E Davenport T H Patil D and Barton D (2012) Big data In The management revolution Harvard Bus Rev volume 90 pages 61–67

Meier C (2003) The meaning of too enough and so that In Natural Language Semantics volume 11 pages 69–107 Springer

Meillet A (1906) La Phrase nominale en indoeuropéen Société de linguistique

Mejova Y and Srinivasan P (2011) Exploring feature definition and selection for sentiment classifiers In ICWSM

Meunier A (1984) La sémantique locative de certaines structures N0 être adj In Revue québécoise de linguistique volume 13 pages 95–121 Université du Québec à Montréal

Meunier A (1999) Une construction complexe N0hum être Adj de V0-inf W caractéristique de certains adjectifs à sujet humain In Langages pages 12–44 JSTOR

Meydan M (1996) Constructions adjectivales substantifs appropriés et verbes supports In Linx volume 34 pages 197–210 Centre de recherches linguistiques de Paris 10

Meydan M (1999) La restructuration du gn sujet dans les phrases adjectivales à substantif approprié In Langages pages 59–80 JSTOR

Miller G A (1995) WordNet a lexical database for english In Communications of the ACM volume 38 pages 39–41 ACM

Mohammad S Dunne C and Dorr B (2009) Generating high-coverage semantic orientation lexicons from overtly marked words and a thesaurus In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing Volume 2-Volume 2 pages 599–608 Association for Computational Linguistics

Moilanen K and Pulman S (2007) Sentiment composition In Proceedings of the Recent Advances in Natural Language Processing International Conference pages 378–382

Moilanen K and Pulman S (2008) The good the bad and the unknown morphosyllabic sentiment tagging of unseen words In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies Short Papers pages 109–112

Molinier C and Levrier F (2000) Grammaire des adverbes description des formes en -ment volume 33 Librairie Droz

Monachesi P (2009) Annotation of semantic roles In Utrecht University Netherlands

Montefinese M Ambrosini E Fairfield B and Mammarella N (2014) The adaptation of the affective norms for english words (anew) for italian In Behavior research methods volume 46 pages 887–903 Springer

Moon R (2008) Conventionalized as-similes in english a problem case In International Journal of Corpus Linguistics volume 13 pages 3–37 John Benjamins Publishing Company

Mulder M Nijholt A Den Uyl M and Terpstra P (2004) A lexical grammatical implementation of affect In Text Speech and Dialogue pages 171–177

Mullen T and Collier N (2004) Sentiment analysis using support vector machines with diverse information sources In EMNLP volume 4 pages 412–418

Mullen T and Malouf R (2006) A preliminary investigation into sentiment analysis of informal political discourse In AAAI Spring Symposium Computational Approaches to Analyzing Weblogs pages 159–162

Nakagawa T Inui K and Kurohashi S (2010) Dependency tree-based sentiment classification using crfs with hidden variables In Human Language Technologies The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics pages 786–794 Association for Computational Linguistics

Nakayama M Sutcliffe N and Wan Y (2010) Has the web transformed experience goods into search goods In Electronic Markets volume 20 pages 251–262 Springer

Narayanan R Liu B and Choudhary A (2009) Sentiment analysis of conditional sentences In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing Volume 1-Volume 1 pages 180–189

Nasukawa T and Yi J (2003) Sentiment analysis Capturing favorability using natural language processing In Proceedings of the 2nd international conference on Knowledge capture pages 70–77 ACM

Navigli R and Ponzetto S P (2012) BabelNet The automatic construction evaluation and application of a wide-coverage multilingual semantic network volume 193 pages 217–250

Nelson P (1970) Information and consumer behavior In The Journal of Political Economy pages 311–329 JSTOR

Neviarouskaya A (2010) Compositional Approach for Automatic Recognition of Fine-Grained Affect Judgment and Appreciation in Text PhD thesis University of Tokyo

Neviarouskaya A Prendinger H and Ishizuka M (2007) Textual affect sensing for sociable and expressive online communication In Affective Computing and Intelligent Interaction pages 218–229 Springer

Neviarouskaya A Prendinger H and Ishizuka M (2009a) Compositionality principle in recognition of fine-grained emotions from text In ICWSM

Neviarouskaya A Prendinger H and Ishizuka M (2009b) Sentiful Generating a reliable lexicon for sentiment analysis In Affective Computing and Intelligent Interaction and Workshops 2009 ACII 2009 3rd International Conference on pages 1–6 IEEE

Neviarouskaya A Prendinger H and Ishizuka M (2011) Sentiful A lexicon for sentiment analysis In Affective Computing IEEE Transactions on volume 2 pages 22–36 IEEE

Niedenthal P M Halberstadt J B Margolin J and Innes-Ker Å H (2000) Emotional state and the detection of change in facial expression of emotion In European Journal of Social Psychology volume 30 pages 211–222 Wiley Online Library

Nielsen F Å (2011) A new anew Evaluation of a word list for sentiment analysis in microblogs In arXiv preprint arXiv:1103.2903

Norrick N R (1986) Stock similes In Journal of Literary Semantics volume 15 pages 39–52

Osgood C E (1952) The nature and measurement of meaning In Psychological bulletin volume 49 page 197 American Psychological Association

Pak A and Paroubek P (2011) Twitter for sentiment analysis When language resources are not available In Database and Expert Systems Applications (DEXA) 2011 22nd International Workshop on pages 111–115 IEEE

Pal P Iyer A N and Yantorno R E (2006) Emotion detection from infant facial expressions and cries In Acoustics Speech and Signal Processing 2006 ICASSP 2006 Proceedings 2006 IEEE International Conference on volume 2 pages II–II IEEE

Palmer M Gildea D and Kingsbury P (2005) The proposition bank An annotated corpus of semantic roles In Computational linguistics volume 31 pages 71–106 MIT press

Palmer M Gildea D and Xue N (2010) Semantic role labeling In Synthesis Lectures on Human Language Technologies volume 3 pages 1–103 Morgan & Claypool Publishers

Pang B and Lee L (2004) A sentimental education Sentiment analysis using subjectivity summarization based on minimum cuts In Proceedings of the 42nd annual meeting on Association for Computational Linguistics page 271 Association for Computational Linguistics

Pang B and Lee L (2005) Seeing stars Exploiting class relationships for sentiment categorization with respect to rating scales In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics pages 115–124 Association for Computational Linguistics

Pang B and Lee L (2008a) Opinion mining and sentiment analysis In Foundations and trends in information retrieval volume 2 pages 1–135 Now Publishers Inc

Pang B and Lee L (2008b) Using very simple statistics for review search An exploration In COLING (Posters) pages 75–78

Pang B Lee L and Vaithyanathan S (2002) Thumbs up sentiment classification using machine learning techniques In Proceedings of the ACL-02 conference on Empirical methods in natural language processing-Volume 10 pages 79–86 Association for Computational Linguistics

Pantic M and Patras I (2006) Dynamics of facial expression recognition of facial actions and their temporal segments from face profile image sequences In Systems Man and Cybernetics Part B Cybernetics IEEE Transactions on volume 36 pages 433–449 IEEE

Park C and Lee T M (2009) Information direction website reputation and ewom effect A moderating role of product type In Journal of Business Research volume 62 pages 61–67 Elsevier

Paulo-Santos A Ramos C and Marques N C (2011) Determining the polarity of words through a common online dictionary In Progress in Artificial Intelligence pages 649–663 Springer

Pelosi S (2015a) Morphological relations for the automatic expansion of italian sentiment lexicons In Proceedings of the International Scientific Conference on the Automatic Processing of Natural-Language Electronic Texts NooJ 2015

Pelosi S (2015b) Sentita and doxa Italian databases and tools for sentiment analysis purposes In Proceedings of the Second Italian Conference on Computational Linguistics CLiC-it 2015 pages 226–231 Accademia University Press

Pelosi S Maisto A and Elia A (2015) Sentiment analysis through finite state automata In the 12th International Conference on Finite-State Methods and Natural Language Processing 2015 - FSMNLP 2015

Perez-Rosas V Banea C and Mihalcea R (2012) Learning sentiment lexicons in spanish In LREC pages 3077–3081

Pesetsky D (1996) Zero syntax Experiencers and cascades Number 27 MIT press

Pianta E Bentivogli L and Girardi C (2002) MultiWordNet developing an aligned multilingual database In Proceedings of the first international conference on global WordNet volume 152 pages 55–63

Piao S Ananiadou S Tsuruoka Y Sasaki Y and McNaught J (2007) Mining opinion polarity relations of citations In International Workshop on Computational Semantics (IWCS) pages 366–371

Picabia L (1978) Les constructions adjectivales en français systématique transformationnelle volume 11 Librairie Droz

Picard R W and Picard R (1997) Affective computing volume 252 MIT press Cambridge

Pierre-Yves O (2003) The production and recognition of emotions in speech features and algorithms In International Journal of Human-Computer Studies volume 59 pages 157–183 Elsevier

Plag I (2003) Word-formation in English Cambridge University Press

Polanyi L and Zaenen A (2006) Contextual valence shifters In Computing attitude and affect in text Theory and applications pages 1–10 Springer

Popescu A-M and Etzioni O (2007) Extracting product features and opinions from reviews In Natural language processing and text mining pages 9–28 Springer

Portner P (2009) Modality OUP Oxford

Potts C (2011) On the negativity of negation In Semantics and Linguistic Theory number 20 pages 636–659

Prabowo R and Thelwall M (2009) Sentiment analysis A combined approach In Journal of Informetrics volume 3 pages 143–157 Elsevier

Pradhan S Hacioglu K Ward W Martin J H and Jurafsky D (2003) Semantic role parsing Adding semantic structure to unstructured text In Data Mining 2003 ICDM 2003 Third IEEE International Conference on pages 629–632 IEEE

Qiu G Liu B Bu J and Chen C (2009) Expanding domain sentiment lexicon through double propagation In IJCAI volume 9 pages 1199–1204

Quirk R Crystal D and Education P (1985) A comprehensive grammar of the English language volume 397 Cambridge Univ Press

Rahimi Rastgar S and Razavi N (2013) A System for Building Corpus Annotated With Semantic Roles PhD thesis

Rainer F (2004) Derivazione nominale deaggettivale In La formazione delle parole in italiano pages 293–314

Rao D and Ravichandran D (2009) Semi-supervised polarity lexicon induction In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics pages 675–682 Association for Computational Linguistics

Rappaport Hovav M and Levin B (1998) Building verb meanings In The projection of arguments Lexical and compositional factors pages 97–134

Read J and Carroll J (2009) Weakly supervised techniques for domain-independent sentiment classification In Proceedings of the 1st international CIKM workshop on Topic-sentiment analysis for mass opinion pages 45–52 ACM

Reinstein D A and Snyder C M (2005) The influence of expert reviews on consumer demand for experience goods A case study of movie critics In The journal of industrial economics volume 53 pages 27–51 Wiley Online Library

Remus R Quasthoff U and Heyer G (2010) Sentiws-a publicly available german-language resource for sentiment analysis In LREC

Reyes A and Rosso P (2011) Mining subjective knowledge from customer reviews a specific case of irony detection In Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis pages 118–124 Association for Computational Linguistics

Reyes A Rosso P and Veale T (2013) A multidimensional approach for detecting irony in twitter In Language Resources and Evaluation volume 47 pages 239–268 Springer

Reynolds K Kontostathis A and Edwards L (2011) Using machine learning to detect cyberbullying In Machine Learning and Applications and Workshops (ICMLA) 2011 10th International Conference on volume 2 pages 241–244 IEEE

Ricca D (2004a) Aggettivi deverbali In La formazione delle parole in italiano Tübingen Niemeyer pages 419–444

Ricca D (2004b) Derivazione avverbiale In La formazione delle parole in italiano pages 472–489

Riloff E (1996) An empirical study of automated dictionary construction for information extraction in three domains In Artificial intelligence volume 85 pages 101–134 Elsevier

Riloff E Patwardhan S and Wiebe J (2006) Feature subsumption for opinion analysis In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing pages 440–448 Association for Computational Linguistics

Riloff E Wiebe J and Wilson T (2003) Learning subjective nouns using extraction pattern bootstrapping In Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003-Volume 4 pages 25–32 Association for Computational Linguistics

Rosenbaum P S (1967) The grammar of English predicate complement constructions Research monograph 47 Cambridge MA MIT Press

Rubin V L (2010) Epistemic modality From uncertainty to certainty in the context of information seeking as interactions with texts In Information Processing & Management volume 46 pages 533–540 Elsevier

Ruppenhofer J (2013) Anchoring sentiment analysis in frame semantics In Veredas pages 66–81

Ruppenhofer J Ellsworth M Petruck M R Johnson C R and Scheffczyk J (2006) FrameNet II Extended theory and practice

Ruppenhofer J and Rehbein I (2012) Semantic frames as an anchor representation for sentiment analysis In Proceedings of the 3rd Workshop in Computational Approaches to Subjectivity and Sentiment Analysis pages 104–109 Association for Computational Linguistics

Ruppenhofer J Somasundaran S and Wiebe J (2008) Finding the sources and targets of subjective expressions In LREC

Russom P et al (2011) Big data analytics In TDWI Best Practices Report Fourth Quarter

Ruwet N (1994) Être ou ne pas être un verbe de sentiment In Langue française pages 45–55 JSTOR

Sagot B and Fort K (2007) Améliorer un lexique syntaxique à l’aide des tables du lexique-grammaire adverbes en -ment In 26e Colloque International sur le Lexique et la grammaire 2007

Salvi G and Vanelli L (2004) Nuova grammatica italiana Bologna Il Mulino

Sarwar B Karypis G Konstan J and Riedl J (2001) Item-based collaborative filtering recommendation algorithms In Proceedings of the 10th international conference on World Wide Web pages 285–295 ACM

Saurí R and Pustejovsky J (2009) Factbank A corpus annotated with event factuality In Language resources and evaluation volume 43 pages 227–268 Springer

Schmid H (1994) Probabilistic part-of-speech tagging using decision trees In Proceedings of the international conference on new methods in language processing volume 12 pages 44–49

Schmid H Baroni M Zanchetta E and Stein A (2007) The enriched treetagger system In proceedings of the EVALITA 2007 workshop

Schuler K K (2005) VerbNet A broad-coverage comprehensive verb lexicon PhD thesis Computer and Information Science Dept University of Pennsylvania

Schuller B Rigoll G and Lang M (2003) Hidden markov model-based speech emotion recognition In Acoustics Speech and Signal Processing 2003 Proceedings (ICASSP’03) 2003 IEEE International Conference on volume 2 pages II–1 IEEE

Schuller B Rigoll G and Lang M (2004) Speech emotion recognition combining acoustic features and linguistic information in a hybrid support vector machine-belief network architecture In Acoustics Speech and Signal Processing 2004 Proceedings (ICASSP’04) IEEE International Conference on volume 1 pages I–577 IEEE

Schwarzschild R (2008) The semantics of comparatives and other degree constructions In Language and Linguistics Compass volume 2 pages 308–331 Wiley Online Library

Seki Y Eguchi K Kando N and Aono M (2005) Multi-document summarization with subjectivity analysis at duc 2005 In Proceedings of the Document Understanding Conference (DUC)

Seki Y Evans D K Ku L-W Chen H-H Kando N and Lin C-Y (2007) Overview of opinion analysis pilot task at ntcir-6 In Proceedings of NTCIR-6 Workshop Meeting pages 265–278

Shaikh M A M Prendinger H and Mitsuru I (2007) Assessing sentiment of text by semantic dependency and contextual valence analysis In Affective Computing and Intelligent Interaction pages 191–202 Springer

Shapiro C (1982) Consumer information product quality and seller reputation In The Bell Journal of Economics pages 20–35 JSTOR

Shapiro C (1983) Premiums for high quality products as returns to reputations In The quarterly journal of economics pages 659–679 JSTOR

Shimada K and Endo T (2008) Seeing several stars A rating inference task for a document containing several evaluation criteria In Advances in Knowledge Discovery and Data Mining pages 1006–1014 Springer

Silberztein M (2003) Nooj manual In Available for download at www.nooj4nlp.net

Silberztein M (2008) Complex annotations with nooj In Proceedings of the 2007 International NooJ Conference pages p–214 Cambridge Scholars Publishing

Silva M J Carvalho P Costa C and Sarmento L (2010) Automatic expansion of a social judgment lexicon for sentiment analysis Technical Report TR 1008

Silva M J Carvalho P and Sarmento L (2012) Building a sentiment lexicon for social judgement mining In Computational Processing of the Portuguese Language pages 218–228 Springer

Singhal P and Bansal A (2013) Improved textual cyberbullying detection using data mining In International Journal of Information and Computation Technology ISSN pages 0974–2239

Snyder B and Barzilay R (2007) Multiple aspect ranking using the good grief algorithm In HLT-NAACL pages 300–307

Socher R Perelygin A Wu J Y Chuang J Manning C D Ng A Y and Potts C (2013) Recursive deep models for semantic compositionality over a sentiment treebank In Proceedings of the conference on empirical methods in natural language processing (EMNLP) volume 1631 page 1642

Somprasertsri G and Lalitrojwong P (2010) Mining feature-opinion in online customer reviews for opinion summarization In J UCS volume 16 pages 938–955

Souza M Vieira R Busetti D Chishman R Alves I M et al (2011) Construction of a portuguese opinion lexicon from multiple resources In STIL

Steinberger J Ebrahim M Ehrmann M Hurriyetoglu A Kabadjov M Lenkova P Steinberger R Tanev H Vázquez S and Zavarella V (2012) Creating sentiment dictionaries via triangulation In Decision Support Systems volume 53 pages 689–694 Elsevier

Stigler G J (1961) The economics of information In The journal of political economy pages 213–225 JSTOR

Stone P J Dunphy D C and Smith M S (1966) The General Inquirer A Computer Approach to Content Analysis MIT press

Stone P J and Hunt E B (1963) A computer approach to content analysis studies using the general inquirer system In Proceedings of the May 21-23 1963 spring joint computer conference pages 241–256 ACM

Strapparava C and Mihalcea R (2008) Learning to identify emotions in text In Proceedings of the 2008 ACM symposium on Applied computing pages 1556–1560 ACM

Strapparava C Valitutti A et al (2004) WordNet-Affect an affective extension of WordNet In LREC volume 4 pages 1083–1086

Strapparava C Valitutti A and Stock O (2006) The affective weight of lexicon In Proceedings of the fifth international conference on language resources and evaluation pages 423–426

Swier R S and Stevenson S (2004) Unsupervised semantic role labelling In Proceedings of EMNLP volume 95 page 102

Taboada M Anthony C and Voll K (2006) Methods for creating semantic orientation dictionaries In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC) Genova Italy pages 427–432

Taboada M Brooke J Tofiloski M Voll K and Stede M (2011) Lexicon-based methods for sentiment analysis In Computational linguistics volume 37 pages 267–307 MIT Press

Taboada M and Grieve J (2004) Analyzing appraisal automatically In Proceedings of AAAI Spring Symposium on Exploring Attitude and Affect in Text (AAAI Technical Report SS-04-07) Stanford University CA pp 158–161 AAAI Press

Tan S Cheng X Wang Y and Xu H (2009) Adapting naive bayes to domain adaptation for sentiment analysis In Advances in Information Retrieval pages 337–349 Springer

Taylor A (1954) Proverbial comparisons and similes from California volume 3 University of California Press

Tepperman J Traum D R and Narayanan S (2006) yeah right sarcasm recognition for spoken dialogue systems In INTERSPEECH

Terveen L Hill W Amento B McDonald D and Creter J (1997) Phoaks A system for sharing recommendations In Communications of the ACM volume 40 pages 59–62 ACM

Tesnière L (1959) Eléments de syntaxe structurale Librairie C Klincksieck

Thomas M Pang B and Lee L (2006) Get out the vote Determining support or opposition from congressional floor-debate transcripts In Proceedings of the 2006 conference on empirical methods in natural language processing pages 327–335 Association for Computational Linguistics

Tolone E and Stavroula V (2011) Extending the adverbial coverage of a nlp oriented resource for french In arXiv preprint arXiv:1111.3462

Tsur O Davidov D and Rappoport A (2010) Icwsm-a great catchy name Semi-supervised recognition of sarcastic sentences in online product reviews In ICWSM

Turney P D (2001) Mining the web for synonyms Pmi-ir versus lsa on toefl In Machine Learning ECML 2001 pages 491–502 Springer

Turney P D (2002) Thumbs up or thumbs down semantic orientation applied to unsupervised classification of reviews In Proceedings of the 40th annual meeting on association for computational linguistics pages 417–424

Turney P D and Littman M L (2003) Measuring praise and criticism Inference of semantic orientation from association In ACM Transactions on Information Systems (TOIS) volume 21 pages 315–346

Veale T and Hao Y (2009) Support structures for linguistic creativity A computational analysis of creative irony in similes In Proceedings of CogSci pages 1376–1381

Velikovich L Blair-Goldensohn S Hannan K and McDonald R (2010) The viability of web-derived polarity lexicons In Human Language Technologies The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics pages 777–785 Association for Computational Linguistics

Vermeij M (2005) The orientation of user opinions through adverbs verbs and nouns In 3rd Twente Student Conference on IT Enschede June

Vietri S (1990) On some comparative frozen sentences in italian In Lingvisticæ Investigationes volume 14 pages 149–174 John Benjamins Publishing Company

Vietri S (2004) Lessico-grammatica dell’italiano Metodi descrizioni e applicazioni UTET Università

Vietri S (2011) On a class of italian frozen sentences In Lingvisticæ Investigationes volume 34 pages 228–267 John Benjamins Publishing Company

Vietri S (2014a) The construction of an annotated corpus for the analysis of italian transfer predicates In Lingvisticæ Investigationes volume 37 pages 69–105 John Benjamins Publishing Company

Vietri S (2014b) Idiomatic Constructions in Italian A Lexicon-grammar Approach volume 31 John Benjamins Publishing Company

Vietri S (2014c) The italian module for nooj In Proceedings of the First Italian Conference on Computational Linguistics CLiC-it 2014 Pisa University Press

Vietri S (2014d) The lexicon-grammar of italian idioms In Workshop on Lexical and Grammatical Resources for Language Processing page 137

Villars R L Olofson C W and Eastwood M (2011) Big data What it is and why you should care In White Paper IDC

Vincze V (2010) Speculation and negation annotation in natural language texts what the case of bioscope might (not) reveal In Negation and Speculation in Natural Language Processing (NeSp-NLP 2010) page 28

Vincze V Szarvas G Farkas R Móra G and Csirik J (2008) The bioscope corpus biomedical texts annotated for uncertainty negation and their scopes In BMC bioinformatics volume 9 page S9 BioMed Central Ltd

Voghera M (2004) Polirematiche In Grossmann M Rainer F (a cura di) La formazione delle parole in italiano Tübingen Niemeyer pages 56–69

Voll K and Taboada M (2007) Not all words are created equal Extracting semantic orientation as a function of adjective relevance In AI 2007 Advances in Artificial Intelligence pages 337–346 Springer

Vollero A (2010) E-marketing e Web communication Verso la gestione della corporate reputation online Giappichelli Torino

von Ungern-Sternberg, T. and von Weizsäcker, C. C. (1985). The supply of quality on a market for experience goods. In The Journal of Industrial Economics, pages 531–540. JSTOR.

Voyatzi, S. (2006). Description morphosyntaxique et sémantique des adverbes figés en vue d’un système d’analyse automatique des textes grecs. PhD thesis, Université Paris-Est.

Waltinger, U. (2010). Germanpolarityclues: A lexical resource for german sentiment analysis. In LREC.

Wan, X. (2009). Co-training for cross-lingual sentiment classification. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1, pages 235–243. Association for Computational Linguistics.

Wang, X., Zhao, Y., and Fu, G. (2011). A morpheme-based method to chinese sentence-level sentiment classification. In Int. J. of Asian Lang. Proc., volume 21, pages 95–106.

Warriner, A. B., Kuperman, V., and Brysbaert, M. (2013). Norms of valence, arousal and dominance for 13,915 english lemmas. In Behavior research methods, volume 45, pages 1191–1207. Springer.

Wawer, A. (2012). Extracting emotive patterns for languages with rich morphology. In International Journal of Computational Linguistics and Applications, volume 3, pages 11–24.

Wei, C.-P., Chen, Y.-M., Yang, C.-S., and Yang, C. C. (2010). Understanding what concerns consumers: a semantic approach to product feature extraction from consumer reviews. In Information Systems and E-Business Management, volume 8, pages 149–167. Springer.

Whissel, C. (1989). The dictionary of affect in language. Emotion: Theory, research and experience, vol. 4, The measurement of emotions, R. Plutchik and H. Kellerman, Eds. New York: Academic.

Whissell, C. (2009). Using the revised dictionary of affect in language to quantify the emotional undertones of samples of natural language 1, 2. In Psychological reports, volume 105, pages 509–521. Ammons Scientific.

Whitley, M. S. (1995). Gustar and other psych verbs: a problem in transitivity. In Hispania, pages 573–585. JSTOR.

Wiebe, J. (2000). Learning subjective adjectives from corpora. In AAAI/IAAI, pages 735–740.

Wiebe, J., Wilson, T., Bruce, R., Bell, M., and Martin, M. (2004). Learning subjective language. In Computational linguistics, volume 30, pages 277–308. MIT Press.

Wiebe, J., Wilson, T., and Cardie, C. (2005). Annotating expressions of opinions and emotions in language. In Language resources and evaluation, volume 39, pages 165–210. Springer.

Wiegand, M., Balahur, A., Roth, B., Klakow, D., and Montoyo, A. (2010). A survey on the role of negation in sentiment analysis. In Proceedings of the workshop on negation and speculation in natural language processing, pages 60–68. Association for Computational Linguistics.

Wilson, D. and Sperber, D. (1992). On verbal irony. In Lingua, volume 87, pages 53–76. Elsevier.

Wilson, T., Wiebe, J., and Hoffmann, P. (2005). Recognizing contextual polarity in phrase-level sentiment analysis. In Proceedings of the conference on human language technology and empirical methods in natural language processing, pages 347–354. Association for Computational Linguistics.

Wilson, T., Wiebe, J., and Hoffmann, P. (2009). Recognizing contextual polarity: An exploration of features for phrase-level sentiment analysis. In Computational linguistics, volume 35, pages 399–433. MIT Press.

Xia, R. and Zong, C. (2010). Exploring the use of word relation features for sentiment classification. In Proceedings of the 23rd International Conference on Computational Linguistics: Posters, pages 1336–1344. Association for Computational Linguistics.

Xiang, G., Fan, B., Wang, L., Hong, J., and Rose, C. (2012). Detecting offensive tweets via topical feature discovery over a large scale twitter corpus. In Proceedings of the 21st ACM international conference on Information and knowledge management, pages 1980–1984. ACM.

Xu, Z. and Zhu, S. (2010). Filtering offensive language in online communities using grammatical relations. In Proceedings of the Seventh Annual Collaboration, Electronic Messaging, Anti-Abuse and Spam Conference.

Yang, S. and Ko, Y. (2011). Extracting comparative entities and predicates from texts using comparative type classification. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1, pages 1636–1644. Association for Computational Linguistics.

Ye, Q., Law, R., Gu, B., and Chen, W. (2011). The influence of user-generated content on traveler behavior: An empirical investigation on the effects of e-word-of-mouth to hotel online bookings. In Computers in Human Behavior, volume 27, pages 634–639. Elsevier.

Ye, Q., Zhang, Z., and Law, R. (2009). Sentiment classification of online reviews to travel destinations by supervised machine learning approaches. In Expert Systems with Applications, volume 36, pages 6527–6535. Elsevier.

Yessenalina, A. and Cardie, C. (2011). Compositional matrix-space models for sentiment analysis. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 172–182. Association for Computational Linguistics.

Yi, J., Nasukawa, T., Bunescu, R., and Niblack, W. (2003). Sentiment analyzer: Extracting sentiments about a given topic using natural language processing techniques. In Data Mining, 2003. ICDM 2003. Third IEEE International Conference on, pages 427–434.

Yin, D., Xue, Z., Hong, L., Davison, B. D., Kontostathis, A., and Edwards, L. (2009). Detection of harassment on web 2.0. In Proceedings of the Content Analysis in the WEB, volume 2, pages 1–7.

Yuen, R. W., Chan, T. Y., Lai, T. B., Kwong, O. Y., and T’sou, B. K. (2004). Morpheme-based derivation of bipolar semantic orientation of chinese words. In Proceedings of the 20th international conference on Computational Linguistics, page 1008. Association for Computational Linguistics.

Zhang, L. and Liu, B. (2011). Identifying noun product features that imply opinions. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers-Volume 2, pages 575–580. Association for Computational Linguistics.

Zhang, L., Liu, B., Lim, S. H., and O’Brien-Strain, E. (2010). Extracting and ranking product features in opinion documents. In Proceedings of the 23rd international conference on computational linguistics: Posters, pages 1462–1470. Association for Computational Linguistics.

Zhao, Q., Sun, C., Liu, B., and Cheng, Y. (2010). Learning to detect hedges and their scope using crf. In Proceedings of the Fourteenth Conference on Computational Natural Language Learning: Shared Task, pages 100–105. Association for Computational Linguistics.

Zhou, N. (2010). Taboo Language on the Internet: An Analysis of Gender Differences in Using Taboo Language. PhD thesis, Kristianstad University.

  • Introduction
    • Research Context
    • Filtering Information Automatically
    • Subjectivity, Emotions and Opinions
    • Structure of the Thesis
  • Theoretical Background
    • The Lexicon-Grammar Theoretical Framework
      • Distributional and Transformational Analysis
      • Formal Representation of Elementary Sentences
        • The Continuum from Simple Sentences to Polyrematic Structures
        • Against the Bipartition of the Sentence Structure
      • Lexicon-Grammar Binary Matrices
      • Syntax-Semantics Interface
        • Frame Semantics
        • Semantic Predicates Theory
    • The Italian Module of Nooj
      • Electronic Dictionaries
      • Local grammars and Finite-state Automata
  • SentIta: Lexicon-Grammar based Sentiment Resources
    • Lexicon-Grammar of Sentiment Expressions
    • Literature Survey on Subjectivity Lexicons
      • Polarity Lexicons
      • Affective Lexicons
      • Subjectivity Lexicons for Other Languages
        • Italian Resources
        • Sentiment Lexical Resources for Different Languages
    • SentIta and its Manually-built Resources
      • Semantic Orientations and Prior Polarities
      • Adjectives of Sentiment
        • A Special Remark on Negative Words in Dictionaries and Positive Biases in Corpora
      • Psychological and other Opinion Bearing Verbs
        • A Brief Literature Survey on the Classification of Psych Verbs
        • Semantic Predicates from the LG Framework
        • Psychological Predicates
        • Other Verbs of Sentiment
      • Psychological Predicates’ Nominalizations
        • The Presence of Support Verbs
        • The Absence of Support Verbs
        • Ordinary Verbs Structures
        • Standard-Crossed Structures
      • Psychological Predicates’ Adjectivalizations
        • Support Verbs
        • Nominal groups with appropriate nouns Napp
        • Verbless Structures
        • Evaluation Verbs
      • Vulgar Lexicon
    • Towards a Lexicon of Opinionated Compounds
      • Compound Adverbs and their Polarities
        • Compound Adjectives of Sentiment
      • Frozen Expressions of Sentiment
        • The Lexicon-Grammar Classes Under Examination: PECO, CEAC, EAA, ECA and EAPC
        • The Formalization of Sentiment Frozen Expressions
    • The Automatic Expansion of SentIta
      • Literature Survey on Sentiment Lexicon Propagation
        • Thesaurus Based Approaches
        • Corpus Based Approaches
        • The Morphological Approach
      • Deadjectival Adverbs in -mente
        • Semantics of the -mente formations
        • The Superlative of the -mente Adverbs
      • Deadjectival Nouns of Quality
      • Morphological Semantics
  • Shifting Lexical Valences through FSA
    • Contextual Valence Shifters
    • Polarity Intensification and Downtoning
      • Related Works
      • Intensification and Downtoning in SentIta
      • Excess Quantifiers
    • Negation Modeling
      • Related Works
      • Negation in SentIta
    • Modality Detection
      • Related Works
      • Modality in SentIta
    • Comparative Sentences Mining
      • Related Works
      • Comparison in SentIta
      • Interaction with the Intensification Rules
      • Interaction with the Negation Rules
  • Specific Tasks of the Sentiment Analysis
    • Sentiment Polarity Classification
      • Related Works
      • DOXA: a Sentiment Classifier based on FSA
      • A Dataset of Opinionated Reviews
      • Experiment and Evaluation
        • Sentence-level Sentiment Analysis
        • Document-level Sentiment Analysis
    • Feature-based Sentiment Analysis
      • Related Works
      • Experiment and Evaluation
      • Opinion Visual Summarization
    • Sentiment Role Labeling
      • Related Works
      • Experiment and Evaluation
  • Conclusion
  • Italian Lexicon-Grammar Symbols
  • Acronyms
  • Bibliography

Chapter 1

Introduction

Consumers, as Internet users, can freely share their thoughts with huge and geographically dispersed groups of people, competing in this way with the traditional power of marketing and advertising channels. Unlike traditional word of mouth, which is usually limited to private conversations, user-generated content on the Internet can be directly observed and described by researchers. Pang and Lee (2008a, p. 7) identified 2001 as the “beginning of widespread awareness of the research problems and opportunities that Sentiment Analysis and Opinion Mining raise, and subsequently there have been literally hundreds of papers published on the subject”.

The context that facilitated the growth of interest in the automatic treatment of opinions and emotions can be summarized in the following points:

• the expansion of e-commerce (Matthews et al., 2001)

• the growth of user-generated content (forums, discussion groups, blogs, social media, review websites, aggregation sites), which can constitute large-scale databases for training machine learning algorithms (Chin, 2005; Pang and Lee, 2008a)

• the rise of machine learning methods (Pang and Lee, 2008a)

• the importance of online word of mouth (eWOM) (Gruen et al., 2006; Lee and Youn, 2009; Park and Lee, 2009; Chu and Kim, 2011)

• customer empowerment (Vollero, 2010)

• the large volume, high velocity and wide variety of unstructured data (Russom et al., 2011; Villars et al., 2011; McAfee et al., 2012)

The same preconditions caused a parallel rise in attention in the economics and marketing literature as well. Basically, through user-generated content, consumers share positive or negative information that can influence purchase decisions in many different ways and can shape buyers’ expectations, above all with regard to experience goods (Nakayama et al., 2010), such as hotels (Ye et al., 2011; Nelson, 1970), restaurants (Zhang et al., 2010), movies (Duan et al., 2008; Reinstein and Snyder, 2005), books (Chevalier and Mayzlin, 2006) or videogames (Zhu and Zhang, 2006; Bounie et al., 2005).

1.1 Research Context

Experience goods are products or services whose qualities cannot be observed prior to purchase. They are distinguished from search goods (e.g. clothes, smartphones) on the basis of the possibility of testing them before the purchase and depending on the information costs (Nelson, 1970). Akerlof (1970, pp. 488–490) clearly exemplified the problems caused by the interaction of quality differences and uncertainty:

“There are many markets in which buyers use some market statistic to judge the quality of prospective purchases. [...] The automobile market is used as a finger exercise to illustrate and develop these thoughts. [...] The individuals in this market buy a new automobile without knowing whether the car they buy will be good or a lemon¹. [...] After owning a specific car, however, for a length of time, the car owner can form a good idea of the quality of this machine. [...] An asymmetry in available information has developed: for the sellers now have more knowledge about the quality of a car than the buyers. [...] The bad cars sell at the same price as good cars since it is impossible for a buyer to tell the difference between a good and a bad car; only the seller knows.”

Asymmetric information could damage both the customers, when they are not able to recognize good-quality products and services, and the market itself, when it is negatively affected by the presence of a smaller quantity of quality products than would have been offered under conditions of symmetrical information (Lavanda and Rampa, 2001; von Ungern-Sternberg and von Weizsäcker, 1985).

However, it has been deemed possible, for experience goods, to base an evaluation of their quality on the past experiences of customers who, in previous “periods”, had already experienced the same goods:

“In most real world situations suppliers stay on the market for considerably longer than one ‘period’. They then have the possibility to build up reputations or goodwill with the consumers. This is due to the following mechanism. While the consumers cannot directly observe a product’s quality at the time of purchase, they may try to draw inferences about this quality from the past experience they (or others) have had with this supplier’s products.” (von Ungern-Sternberg and von Weizsäcker, 1985, p. 531)

¹ A defective car, discovered to be bad only after it has been bought.

Possible sources of information for buyers of experience goods are word of mouth and the good reputation of the sellers (von Ungern-Sternberg and von Weizsäcker, 1985; Diamond, 1989; Klein and Leffler, 1981; Shapiro, 1982, 1983; Dewally and Ederington, 2006).

The possibility of surveying other customers’ experiences increases the chance of selecting a good-quality service and the customer’s expected utility. It is worth continuing this search until its cost equals its expected marginal return. This is the “optimum amount of search” defined by Stigler (1961).

The rapid growth of the Internet drew the attention of managers and business academics to the possible influences that this medium can exert on customers’ information search behaviors and acquisition processes. The hypothesis that Web 2.0, as an interactive medium, could make the experience goods market transparent by transforming it into a search goods market has been explored by Klein (1998), who based his idea on three factors:

• lower search costs

• different evaluation of some products (attributes)

• possibility to experience products (attributes) virtually, without physically inspecting them

Actually, this idea seems too groundbreaking to be entirely true. Indeed, the real outcome of web searches depends on the quantity of available information, on its quality and on the possibility of making fruitful comparisons among alternatives (Alba et al., 1997; Nakayama et al., 2010).

Therefore, from this perspective, it is true that the growth of user-generated content and eWOM can reduce information search costs. On the other hand, the distance increased by e-commerce, the content explosion and the information overload typical of the Big Data age can seriously hinder the achievement of a symmetrical distribution of information, affecting not only the market of experience goods but also that of search goods.

1.2 Filtering Information Automatically

The last remarks on the influence of the Internet on information searches inevitably reopen the long-standing problems connected to web information filtering (Hanani et al., 2001) and to the possibility of automatically “extracting value from chaos” (Gantz and Reinsel, 2011).

An appropriate management of online corporate reputation requires careful monitoring of the new digital environments, which strengthen the stakeholders’ influence and independence and give support during the decision-making process. As an example, analyzing them in depth can show a company the areas that need improvement. Furthermore, they can be a valid support in price determination or in demand prediction (Bloom, 2011; Pang and Lee, 2008a).

It would indeed be difficult for humans to read and summarize such a huge volume of data about customer opinions. However, in other respects, introducing machines to the semantic dimension of human language remains an ongoing challenge.

In this context, business and intelligence applications would play a crucial role, thanks to their ability to automatically analyze and summarize not only databases but also raw data in real time.

The largest amount of online data is semistructured or unstructured and, as a result, its monitoring requires sophisticated Natural Language Processing (NLP) tools that must be able to pre-process the data from a linguistic point of view and then automatically access their semantic content.

Traditional NLP tasks consist of Data Mining, Information Retrieval and the Extraction of facts, that is, objective information about entities, from semistructured and unstructured texts.

In any case, it is of crucial importance for both customers and companies to have at their disposal automatically extracted, analyzed and summarized data that include not only factual information but also opinions regarding any kind of good on offer (Liu, 2010; Bloom, 2011).

Companies could take advantage of concise and comprehensive customer opinion overviews that automatically summarize the strengths and weaknesses of their products and services, with evident benefits in terms of reputation management and customer relationship management.

Customer information search costs could be decreased through the same overviews, which offer the opportunity to evaluate and compare the positive and negative experiences of other consumers who have already tested the same products and services.

Obviously, the advantages of sophisticated NLP methods and software, and of their ability to distinguish factual from opinionated language, are not limited to the ones discussed so far; they are spread across different specialized tasks and domains, such as:

Ads placement: possibility to avoid websites that are irrelevant, but also unfavorable, when placing advertisements (Jin et al., 2007)

Question-answering: chance to distinguish between factual and speculative answers (Carbonell, 1979)

Text summarization: potential of summarizing different opinions and perspectives in addition to factual information (Seki et al., 2005)

Recommendation systems: opportunity to recommend only the items with positive feedback (Terveen et al., 1997)

Flame and cyberbullying detection: ability to find semantically and syntactically complex offensive contents on the web (Xiang et al., 2012; Chen et al., 2012; Reynolds et al., 2011)

Literary reputation tracking: possibility to distinguish disapproving from supporting citations (Piao et al., 2007; Taboada et al., 2006)

Political texts analysis: opportunity to better understand the positive or negative orientations of both voters and politicians (Laver et al., 2003; Mullen and Malouf, 2006)

1.3 Subjectivity, Emotions and Opinions

Sentiment Analysis, Opinion Mining, subjectivity analysis, review mining, appraisal extraction and affective computing are all terms that refer to the computational treatment of opinion, sentiment and subjectivity in raw text. Their ultimate shared goal is enabling computers to recognize and express emotions (Pang and Lee, 2008a; Picard and Picard, 1997).

The first two terms are definitely the most used in the literature. Although they basically refer to the same subject, they have met with varying degrees of success in different research communities. While Opinion Mining is more popular among web information retrieval researchers, Sentiment Analysis is more widely used in NLP environments (Pang and Lee, 2008a). Therefore, consistency with the approach of our work is the only reason for preferring the term Sentiment Analysis; all the terms mentioned above could be used and interpreted nearly interchangeably.

Subjectivity is directly connected to the expression of private states, that is, personal feelings or beliefs toward entities, events and their properties (Liu, 2010). The expression private state has been defined by Quirk et al. (1985) and Wiebe et al. (2004) as a general covering term for opinions, evaluations, emotions and speculations.

The main difference from objectivity concerns the impossibility of directly observing or verifying subjective language, but neither subjective nor objective implies truth: “whether or not the source truly believes the information, and whether or not the information is in fact true, are considerations outside the purview of a theory of linguistic subjectivity” (Wiebe et al., 2004, p. 281).

Opinions are defined as positive or negative views, attitudes, emotions or appraisals about a topic, expressed by an opinion holder at a given time. They are represented by Liu (2010) as the following quintuple:

$(o_j, f_{jk}, oo_{ijkl}, h_i, t_l)$

Where $o_j$ represents the object of the opinion, $f_{jk}$ its features, $oo_{ijkl}$ the positive or negative semantic orientation of the opinion, $h_i$ the opinion holder and $t_l$ the time at which the opinion is expressed.

Because the time can almost always be deduced from structured data, we focused our work on the automatic detection and annotation of the other elements of the quintuple.

As regards the opinion holder, it must be underlined that “In the case of product reviews and blogs, opinion holders are usually the authors of the posts. Opinion holders are more important in news articles because they often explicitly state the person or organization that holds a particular opinion” (Liu, 2010, p. 2).

The Semantic Orientation (SO) gives a measure to opinions by weighing their polarity (positive/negative/neutral) and strength (intense/weak) (Taboada et al., 2011; Liu, 2010). The polarity can be assigned to words and phrases outside or inside of a sentence or discourse context. In the first case it is called prior polarity (Osgood, 1952); in the second case, contextual or posterior polarity (Gatti and Guerini, 2012).

We can refer to both opinion objects and features with the term target (Liu, 2010), represented by the following function:

T=O(f)

Where the object can take the shape of products, services, individuals, organizations, events, topics, etc., and the features are components or attributes of the object.
Each object O, represented as a “special feature” which is defined by a subset of features, is formalized in the following way:

F = {f_1, f_2, ..., f_n}

Targets can be automatically discovered in texts through both synonym words and phrases W_i or indicators I_i (Liu, 2010):

W_i = {w_{i1}, w_{i2}, ..., w_{im}}

I_i = {i_{i1}, i_{i2}, ..., i_{iq}}
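As an illustration only, the quintuple and the synonym and indicator sets described above could be sketched in Python as follows. This is not the implementation used in this work: the class name, the integer encoding of the orientation and the English feature vocabulary are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Opinion:
    """One opinion in the sense of the quintuple (o_j, f_jk, oo_ijkl, h_i, t_l)."""
    obj: str          # o_j: object of the opinion
    feature: str      # f_jk: feature of the object
    orientation: int  # oo_ijkl: assumed encoding, +1 positive, -1 negative, 0 neutral
    holder: str       # h_i: opinion holder
    time: str         # t_l: time at which the opinion is expressed


# Hypothetical synonym sets W_i and indicator sets I_i for two hotel features.
FEATURES = {
    "room": {"synonyms": {"room", "bedroom", "suite"},
             "indicators": {"spacious", "cramped"}},
    "price": {"synonyms": {"price", "cost", "rate"},
              "indicators": {"expensive", "cheap"}},
}


def find_features(tokens):
    """Return the features whose synonyms or indicators occur among the tokens."""
    token_set = set(tokens)
    return [name for name, sets in FEATURES.items()
            if token_set & (sets["synonyms"] | sets["indicators"])]


review = "the suite was cramped but cheap".split()
found = find_features(review)  # "suite"/"cramped" point to room, "cheap" to price
opinion = Opinion("hotel", found[0], -1, "reviewer", "2015-06-01")
```

A real system would of course need lemmatization and multiword matching; the sketch only shows how the elements of the formalization relate to one another.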

Emotions constitute the research object of Emotion Detection (Strapparava et al., 2004; Fragopanagos and Taylor, 2005; Alm et al., 2005; Strapparava et al., 2006; Neviarouskaya et al., 2007; Strapparava and Mihalcea, 2008; Whissell, 2009; Argamon et al., 2009; Neviarouskaya et al., 2009b, 2011; de Albornoz et al., 2012), a very active NLP subfield that borders on Speech Recognition (Buchanan et al., 2000; Lee et al., 2002; Pierre-Yves, 2003; Schuller et al., 2003; Bhatti et al.; Schuller et al., 2004) and Facial Expression Recognition (Essa et al., 1997; Niedenthal et al., 2000; Pantic and Patras, 2006; Pal et al., 2006; De Silva et al., 1997). In this NLP-based work we focus on Emotion Detection from written raw texts.
In order to provide a formal definition of sentiments, we refer to the functions of Gross (1995):

P(sent, h)

Caus(s, sent, h)

In the first one, the experiencer of the emotion (h) is a function of a sentiment (sent) through a predicative relationship; the latter expresses a sentiment that is caused by a stimulus (s) on a person (h).

1.4 Structure of the Thesis

In this Chapter we introduced the research field of Sentiment Analysis, placing it in the wider framework of Computational Linguistics and Natural Language Processing. We tried to clarify some terminological issues raised by the proliferation and the overlap of terms in the literature on subjectivity. The choice of these topics has been justified through a brief presentation of the research context, which poses numerous challenges to all Internet stakeholders.
The main aim of the thesis has been identified with the need to treat both facts and opinions, expressed on the Web in the form of raw data, with the same accuracy and speed as the data stored in database tables.
Details about the contribution provided by this work to Sentiment Analysis studies are structured as follows.
Chapter 2 introduces the theoretical framework on which the whole work is grounded, the Lexicon-Grammar (LG) approach, with an in-depth examination of distributional and transformational analysis and of the semantic predicates theory (Section 2.1).
The assumption that elementary sentences are the basic discourse units endowed with meaning, and the necessity to replace a unique grammar with plenty of lexicon-dependent local grammars, are the founding ideas of the LG approach on which the research has been designed and realized.
The formalization of the sentiment lexical resources and tools has been performed through the tools described in Section 2.2.
Consistently with the LG method, our approach to Sentiment Analysis is lexicon-based. Chapter 3 provides a detailed description of the sentiment lexicon built ad hoc for the Italian language. It includes manually annotated resources for both simple (Section 3.3) and multiword (Section 3.4) units, and automatically derived annotations (Section 3.5) for the adverbs in -mente and for the nouns of quality.
Because the meaning of a word cannot be considered out of context, Chapter 3 explores the influence of some elementary sentence structures on the sentiment words occurring in them, while Chapter 4 investigates the effects on sentence polarity of the so-called Contextual Valence Shifters (CVS), namely Intensification and Downtoning (Section 4.2), Negation (Section 4.3), Modality (Section 4.4) and Comparison (Section 4.5).
Chapter 5 focuses on the three most challenging subtasks of Sentiment Analysis: the sentiment polarity classification of sentences and documents (Section 5.1), the Sentiment Analysis based on features (Section 5.2) and the sentiment-based semantic role labeling (Section 5.3). Experiments on these tasks are conducted through our resources and rules, respectively, on a multi-domain corpus of customer reviews, on a dataset of hotel comments and on a dataset that mixes tweets and news headlines.
Concluding remarks and limitations of the present work are reported in Chapter 6.
Due to the large number of challenges faced in this work, and in order to avoid penalizing the clarity and the readability of the thesis, we preferred to mention the state-of-the-art works related to a given topic directly in the sections in which the topic is discussed. For easy reference, we anticipate that the literature survey on the existing sentiment lexicons is reported in Section 3.2; the methods tested in the literature for lexicon propagation are presented in Section 3.5.1; works related to intensification and negation modeling, to modality detection and to comparative sentence mining are respectively mentioned in Sections 4.2.1, 4.3.1, 4.4.1 and 4.5.1; finally, state-of-the-art approaches to sentiment classification, feature-based Sentiment Analysis and Sentiment Role Labeling (SRL) are cited in Sections 5.1.1, 5.2.1 and 5.3.1.
Some Sections of this thesis refer to works that have in part already been presented to the international computational linguistics community. These are the cases of Chapter 3, briefly introduced in Pelosi (2015a,b) and Pelosi et al. (2015), and Chapter 4, anticipated in Pelosi et al. (2015). Furthermore, the results presented in Section 5.1 have been discussed in Maisto and Pelosi (2014b), as well as the results of Section 5.2 in Maisto and Pelosi (2014a) and the ones of Section 5.3 in Elia et al. (2015).

Chapter 2

Theoretical Background

2.1 The Lexicon-Grammar Theoretical Framework

With Lexicon-Grammar we mean the method and the practice of formal description of natural language introduced by Maurice Gross in the second half of the 1960s, who, during the verification of some procedures from the transformational-generative grammar (TGG) (Chomsky, 1965), laid the foundations for a brand new theoretical framework.
More specifically, during the description of French complement clause verbs through a transformational-generative approach, Gross (1975) realized that the English descriptive model of Rosenbaum (1967) was not enough to take into account the huge number of irregularities he found in the French language.
LG introduced important changes in the way in which the relationship between lexicon and syntax was conceived (Gross, 1971, 1975). It underlined for the first time the necessity to provide linguistic descriptions grounded on the systematic testing of syntactic and semantic rules along the whole lexicon, and not only on a limited set of speculative examples.
Actually, in order to define what can be considered a rule and what must be considered an exception, a small sample of the lexicon of the analyzed language, collected in arbitrary ways, can be truly misleading. It is assumed to be impossible to make generalizations without verifying or falsifying the hypotheses. According to Gross (1997, p. 1):

“Grammar, or as it has now been called, linguistic theory, has always been driven by a quest for complete generalizations, resulting invariably in recent times in the production of abstract symbolism, often semantic, but also algorithmic. This development contrasts with that of the other Natural Sciences such as Biology or Geology, where the main stream of activity was and is still now the search and accumulation of exhaustive data. Why the study of language turned out to be so different is a moot question. One could argue that the study of sentences provides an endless variety of forms and that the observer himself can increase this variety at will within his own production of new forms; that would seem to confirm that an exhaustive approach makes no sense. [...]
But grammarians operating at the level of sentences seem to be interested only in elaborating general rules and do so without performing any sort of systematic observation and without a methodical accumulation of sentence forms to be used by further generations of scientists.”

In the LG methodology, the collection and analysis of a large quantity of linguistic facts, and their continuous comparison with the reality of linguistic usage through examples and counter-examples, are crucial.
The linguistic information collected is constantly registered into LG tables that cross-check the lexicon and the syntax of any given language. The ultimate purpose of this procedure is the definition of wide-ranging classes of lexical items associated with transformational, distributional and structural rules (D’Agostino, 1992).
What emerges from the LG studies is that, when more than five or six properties are associated with lexical entries, each entry shows an individual behavior that distinguishes it from any other lexical item. However, it is always possible to organize a classification around at least one definitional property that is simultaneously accepted by all the items belonging to the same LG class and, for this reason, is promoted to distinctive feature of the class.
Having established this, it becomes easier to understand the necessity to replace “the” grammar with thousands of lexicon-dependent local grammars.
The choice of this paradigm is due to its compatibility with the purposes of the present work and with computational linguistics in general, which, in order to reach high performance, requires a large amount of linguistic data that must be as exhaustive, reproducible and well organized as possible.
The huge linguistic datasets produced over the years by the international LG community provide fine-grained semantic and syntactic descriptions of thousands of lexical entries, also referred to as “lexically exhaustive grammars” (D’Agostino, 2005; D’Agostino et al., 2007; Guglielmo, 2009), available for reuse in any kind of NLP task.1
The LG classification and description of the Italian verbs2 (Elia et al., 1981; Elia, 1984; D’Agostino, 1992) is grounded on the differentiation of three macro-classes: transitive verbs, intransitive verbs and complement clause verbs. Every LG class has its own definitional structure, which corresponds to the syntactic structure of the nuclear sentence selected by a given number of verbs (e.g. V for piovere “to rain” and all the verbs of class 1; N0 V for bruciare “to burn” and the other verbs of class 3; N0 V da N1 for provenire “to come from” and the verbs belonging to class 6; etc.). All the lexical entries are then differentiated from one another in each class by taking into account all the transformational, distributional and structural properties accepted or rejected by every item.

1 LG descriptions are available for 4000+ nouns that enter into support verb constructions, 7000+ verbs, 3000+ multiword adverbs and almost 1000 phrasal verbs (Elia, 2014a).

2 LG tables are freely available at the address http://dsc.unisa.it/composti/tavole/combo/tavole.asp

2.1.1 Distributional and Transformational Analysis

The Lexicon-Grammar theory lays its foundations on the Operator-argument grammar of Zellig S. Harris, the combinatorial system that supports the generation of utterances in natural language.
Saying that the operators store information regarding the sentence structures means assuming the nuclear sentence to be the minimum discourse unit endowed with meaning (Gross, 1992b).
This premise is shared with the LG theory, together with the centrality of distributional analysis, a method from structural linguistics that was formulated for the first time by Bloomfield (1933) and then perfected by Harris (1970). The insight that some categories of words can somehow control the functioning of a number of actants through a dependency relationship called valency comes instead from Tesnière (1959).
The distribution of an element A is defined by Harris (1970) as the sum of the contexts of A, where the context of A is the actual disposition of its co-occurring elements. Distributional analysis consists in the systematic substitution of lexical items with the aim of verifying the semantic or the transformational reactions of the sentences. All of this is governed by verisimilitude rules that involve a gradation in the acceptability of the item combinations.
Although the native speakers of a language generally think that the sentence elements can be combined arbitrarily, they actually choose a set of items along the classes that regularly appear together. The selection depends on the likelihood that an element co-occurs with elements of one class rather than another.3
When two sequences can appear in the same context they have an equivalent distribution. When they do not share any context they have a complementary distribution.
Obviously, the possibility to combine sentence elements is related to many different levels of acceptability. Basically, the utterances produced by word co-occurrence can vary in terms of plausibility. In the operator-argument grammar of Harris, the verisimilitude of occurrence of a word under an operator (or an argument) is an approximation of the likelihood or of the frequency of this word with respect to a fixed number of occurrences of the operator (or argument) (Harris, 1988).
Concerning transformations (Harris, 1964), we refer to the phenomenon in which the sentences A and B, despite a different combination of words, are equivalent from a semantic point of view (e.g. Floria ha letto quel romanzo = quel romanzo è stato letto da Floria “Floria read that novel = that novel has been read by Floria” (D’Agostino, 1992)). Therefore, a transformation is defined as a function T that ties a set of variables representing the sentences that are in paraphrastic equivalence (A and B are partial or full synonyms) and that follow the morphemic invariance (A and B possess the same lexical morphemes).
Transformational relations must not be mixed up with any derivation process of the sentence B from A and vice versa. A and B are just two correlated variants in which equivalent grammatical relations are realized (D’Agostino, 1992).4 That is why in this work we do not use directed arrows to indicate transformed sentences.

3 Among the traits that mark the co-occurrence classes we mention, by way of example, human and non-human nouns, abstract nouns, verbs with concrete or abstract usages, etc. (Elia et al., 1981).

4 The concept of transformation is different in the generative grammar, which envisages an oriented passage from a deep structure to a superficial one: “The central idea of transformational grammar is that they are, in general, distinct and that the surface structure is determined by repeated application of certain formal operations called ‘grammatical transformations’ to objects of a more elementary sort” (Chomsky, 1965, p. 16).

Among the transformational manipulations considered in this thesis, while applying the Lexicon-Grammar theoretical framework to the Sentiment Analysis field, we mention the following (Elia et al., 1981; Vietri, 2004):

Substitution: The substitution of polar elements in sentences is the starting point of this work, which explores the changes in terms of acceptability and semantic orientation in the transformed sentences. It can be found in almost every chapter of this thesis.

Paraphrases with support verbs,5 nominalizations and adjectivalizations: This manipulation is essential, for example, in Sections 3.3.4 and 3.3.5, where we discuss the nominalization (e.g. amare = amore “to love = love”) and the adjectivalization (e.g. amare = amoroso “to love = loving”) of the Italian psychological verbs, or in Section 3.5.2, when we account for the relation among a set of sentiment adjectives and adverbs (e.g. appassionatamente = in modo appassionato “passionately = in a passionate way”).

Repositioning: Emphasis, permutation, dislocation and extraction are all transformations that, by shifting the focus onto a specific element of the sentence, influence its strength when the element is polarized.

Extension: We are above all interested in the expansion of nominal groups with opinionated modifiers (see Section 3.3.2).

Restructuring: The complement restructuring plays a crucial role in Feature-based Sentiment Analysis, for the correct annotation of the features and the objects of the opinions (see Sections 3.3.4 and 5.2).

Passivization: In our system of rules, passive sentences simply possess the same semantic orientation as the corresponding active sentences.

Systematic correlation: Examples are the standard/crossed structures mentioned in Section 3.3.4.

5 The concept of operator does not depend on a specific part of speech, so nouns, adjectives and prepositions can also possess the power to determine the nature and the number of the sentence arguments (D’Agostino, 1992). Because only verbs carry morpho-grammatical information regarding mood, tense, person and aspect, they must give this kind of support to non-verbal operators. The so-called Support Verbs (Vsup) are different from auxiliaries (Aux), which instead support other verbs. Support verbs (e.g. essere “to be”, avere “to have”, fare “to do”) can be substituted, case by case, by stylistic, aspectual and causative equivalents.
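The distributional notions introduced in this section (the distribution of an element as the sum of its contexts, and equivalent versus complementary distribution) can be made concrete with a small sketch. It rests on simplifying assumptions that are not part of the original framework: a context is reduced to the pair of immediately adjacent tokens, the corpus is a three-sentence toy sample, and the label "overlapping" for the intermediate case is introduced here only for convenience.

```python
from collections import defaultdict


def distributions(sentences):
    """Map each word to the set of its contexts (left neighbour, right neighbour)."""
    contexts = defaultdict(set)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for i in range(1, len(tokens) - 1):
            contexts[tokens[i]].add((tokens[i - 1], tokens[i + 1]))
    return contexts


def compare(contexts, a, b):
    """Classify the distributional relation between two words of the sample."""
    ctx_a = contexts.get(a, set())
    ctx_b = contexts.get(b, set())
    if not ctx_a & ctx_b:
        return "complementary"   # no shared context
    if ctx_a == ctx_b:
        return "equivalent"      # exactly the same contexts in this sample
    return "overlapping"         # some, but not all, contexts are shared


# Toy corpus, assumed only for illustration.
corpus = ["Maria odia il fumo", "Maria ama il fumo", "Maria odia il gatto"]
ctx = distributions(corpus)
relation_verbs = compare(ctx, "odia", "ama")
relation_mixed = compare(ctx, "odia", "il")
```

On this sample, odia and ama share their only context (between Maria and il), whereas odia and il never occur in the same position. A realistic study would of course test substitutions over whole distributional classes rather than over adjacent tokens.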

2.1.2 Formal Representation of Elementary Sentences

While the generative grammar focused its studies on complex sentences, both the Operator grammar and the LG approach assumed the so-called nuclear or elementary sentences, respectively, to be the minimum discourse units endowed with meaning, above all because complex sentences are made from elementary ones, as stated by Chomsky (1957, p. 92) himself:

“In particular, in order to understand a sentence it is necessary to know the kernel sentences from which it originates (more precisely, the terminal strings underlying these kernel sentences) and the phrase structure of each of these elementary components, as well as the transformational history of development of the given sentence from these kernel sentences. The general problem of analyzing the process of ‘understanding’ is thus reduced, in a sense, to the problem of explaining how kernel sentences are understood, these being considered the basic ‘content elements’ from which the usual, more complex sentences of real life are formed by transformational development.”

This assumption shifts the definition of creativity from “recursivity” in complex sentences (Chomsky, 1957) to “combinatory possibility” at the level of elementary sentences (Vietri, 2004), but it also affects the way in which the lexicon is conceived. Gross (1981, p. 48) made the following observation in this regard:6

“Les entrées du lexique ne sont pas des mots, mais des phrases simples. Ce principe n’est en contradiction avec les notions traditionnelles de lexique que de façon apparente. En effet, dans un dictionnaire, il n’est pas possible de donner le sens d’un mot sans utiliser une phrase, ni de contraster des emplois différents d’un même mot sans le placer dans des phrases.”7

As will be seen in the following Sections, the idea that the entries of the lexicon can be sentences rather than isolated words is a principle that has been fully adopted in the sentiment lexical resources built for the present work.
With regard to formalisms, Gross (1992b) represented all the elementary sentences through the following generic shape:

N0 V W

Where N0 stands for the formal subject of the sentence, V stands for the verb and W can indicate all kinds of essential complements, including an empty one. While N0 V represents a great generality, W raises more complex classification problems (Gross, 1992b). Complements indicated by W can be direct (Ni), prepositional (Prep Ni) or sentential (Ch F).

6 On this topic, Chomsky also agrees that:

“In describing the meaning of a word it is often expedient, or necessary, to refer to the syntactic framework in which this word is usually embedded; e.g., in describing the meaning of hit we would no doubt describe the agent and object of the action in terms of the notions subject and object, which are apparently best analyzed as purely formal notions belonging to the theory of grammar” (Chomsky, 1957, p. 104)

But without any exhaustive application of syntactic rules to lexical items, he feels compelled to point out that:

“(...) to generalize from this fairly systematic use and to assign ‘structural meanings’ to grammatical categories or constructions, just as ‘lexical meanings’ are assigned to words or morphemes, is a step of very questionable validity” (Chomsky, 1957, p. 104)

7 “The entries of the lexicon are not words, but simple sentences. This principle contradicts the traditional notions of lexicon only in appearance. Indeed, in a dictionary it is impossible to provide the meaning of a word without using a sentence, or to contrast different uses of the same word without placing it in sentences.” Author's translation.

Information about the nature and the number of complements is contained in the verbs (or in other parts of speech that, case by case, play a predicative function).
Gross (1992b) presented the following typology of complements, respectively approached for the Italian language by the LG researchers mentioned below:

frozen arguments (C): Vietri (1990, 2011, 2014d)

free arguments (N): D’Agostino (1992), Bocchino (2007)

sentential arguments (Ch F): Elia (1984)
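To make the generic shape N0 V W more tangible, the following sketch encodes an elementary sentence as a small data structure and derives its structural formula. It is only an illustration under stated assumptions: the class and field names are hypothetical, impersonal structures without N0 (such as the class of piovere) are not covered, and the example sentences are invented.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Complement:
    kind: str                          # "direct" (Ni), "prepositional" (Prep Ni) or "sentential" (Ch F)
    content: str
    preposition: Optional[str] = None  # only used by prepositional complements


@dataclass(frozen=True)
class ElementarySentence:
    n0: str                            # formal subject
    verb: str                          # V
    w: Tuple[Complement, ...] = ()     # essential complements; an empty tuple is an empty W

    def formula(self) -> str:
        """Return the structural formula, e.g. 'N0 V da N1'."""
        parts = ["N0", "V"]
        index = 1
        for complement in self.w:
            if complement.kind == "direct":
                parts.append(f"N{index}")
                index += 1
            elif complement.kind == "prepositional":
                parts.append(f"{complement.preposition} N{index}")
                index += 1
            elif complement.kind == "sentential":
                parts.append("Ch F")
            else:
                raise ValueError(f"unknown complement kind: {complement.kind}")
        return " ".join(parts)


hate = ElementarySentence("Maria", "odia", (Complement("direct", "il fumo"),))
come = ElementarySentence("Il treno", "proviene",
                          (Complement("prepositional", "Napoli", "da"),))
burn = ElementarySentence("La legna", "brucia")
```

The three instances correspond to the formulas N0 V N1, N0 V da N1 and N0 V respectively.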

2.1.2.1 The Continuum from Simple Sentences to Polyrematic Structures

Gross (1988, 1992b, 1993) recognized and formalized the concept of phrase figée “frozen sentence”, which refers to groups of words that can be tied to one another by different degrees of variability and cohesion (Elia and De Bueriis, 2008). They can take the shape of verbal, nominal, adjectival or adverbial structures.
The continuum along which frozen expressions can be classified varies according to higher or lower levels of compositionality and idiomaticity, and goes from completely free structures to totally frozen expressions, as summarized below (D’Agostino and Elia, 1998; Elia and De Bueriis, 2008):

distributionally free structures: high levels of co-occurrence variability among words

distributionally restricted structures: limited levels of co-occurrence variability

frozen structures: (almost) null levels of co-occurrence variability

proverbs: no variability of co-occurrence

Idiomatic interpretations can be attributed to the last two elements of the continuum, due to their low levels of compositionality.
The fact that the meaning of these expressions cannot be computed by simply summing up the meanings of the words that compose them has a direct impact on every kind of Sentiment Analysis application that does not correctly approach the linguistic phenomenon of idiomaticity.
We will show the role of multiword expressions in Section 3.4.
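The point about non-compositionality can be sketched with a toy comparison between a word-by-word polarity sum and a lookup that first matches frozen expressions. The two dictionaries below are hypothetical miniatures written for this example, not entries taken from the resources described in this thesis; the idiom essere in gamba “to be capable” is used because none of its words is polar in isolation.

```python
# Hypothetical prior polarities for simple words.
WORD_POLARITY = {"bravo": 1, "pessimo": -1}

# Hypothetical polarity for a frozen expression, stored as a token tuple.
MULTIWORD_POLARITY = {("in", "gamba"): 1}


def naive_polarity(tokens):
    """Sum the prior polarities of the single words, ignoring idioms."""
    return sum(WORD_POLARITY.get(token, 0) for token in tokens)


def multiword_aware_polarity(tokens):
    """Match frozen expressions first, then fall back to single words."""
    score, i = 0, 0
    while i < len(tokens):
        for expression, value in MULTIWORD_POLARITY.items():
            if tuple(tokens[i:i + len(expression)]) == expression:
                score += value
                i += len(expression)
                break
        else:
            # No frozen expression starts here: score the single word.
            score += WORD_POLARITY.get(tokens[i], 0)
            i += 1
    return score


sentence = "maria è in gamba".split()
naive = naive_polarity(sentence)
aware = multiword_aware_polarity(sentence)
```

With these toy dictionaries the word-level sum sees no polar item at all, while the multiword-aware function recovers the positive orientation of the idiom.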

2.1.2.2 Against the Bipartition of the Sentence Structure

In the operator grammar, differently from the generative grammar, sentences are not supposed to have a bipartite structure

S → NP + VP

VP → V + NP 8

but are represented as predicative functions in which a central element, the operator, influences the organization of its variables, the arguments. The role of operator is not necessarily played by verbs, but also by other lexical items that possess a predicative function, such as nouns or adjectives.
This idea, shared also by the LG framework, is particularly important in the Sentiment Analysis field, especially if the task is the detection of the semantic roles involved in sentiment expressions (Section 5.3). Sentences like (1) and (2) are examples in which the semantic roles of experiencer of the sentiment and stimulus of the sentiment are respectively played by “Maria” and il fumo “the smoke”. It is intuitive that, from a syntactic point of view, these semantic roles are differently distributed in the direct (the experiencer is the subject, as in 1) and the reverse (the experiencer is the object, as in 2) sentence structures (see Section 3.3.3 for further explanations).

8 Sentence = Noun Phrase + Verb Phrase; Verb Phrase = Verb + Noun Phrase


(1) [Sentiment [Experiencer Maria] odia [Stimulus il fumo]] 9
“Maria hates the smoke”

(2) [Sentiment [Stimulus Il fumo] tormenta [Experiencer Maria]]
“The smoke harasses Maria”

The indissoluble bond between the sentiment and the experiencer,10 which does not differ from (1) to (2), is perfectly expressed, for both sentences, by the Operator-argument grammar function O_{nn}, by the LG representation N0 V N1, or by the function of Gross (1995) Caus(s, sent, h) (described in Section 1.3 and further explored in Section 3):

(1) Caus(il fumo, odiare, Maria)
(2) Caus(il fumo, tormentare, Maria)

The same cannot be said of the bipartite sentence structure which, representing the experiencer “Maria” inside (2) and outside (1) the verb phrase, unreasonably brings it closer to or further from the sentiment, to which it should be attached in just the same way (Figure 2.1).

Moreover, it does not explain how the verbs of (1) and (2) can exercise distributional constraints on the noun phrase indicating the experiencer (which must be human or at least animate), both in the case in which it is under the dependence of the same verb phrase (experiencer object, see 2a) and also when it is not (experiencer subject, see 1a).

(1a) [Sentiment [Experiencer (Maria + Il mio gatto + *Il televisore)] odia [Stimulus il fumo]]
“(Maria + My cat + *The television) hates the smoke”

9 For the sentence annotation we used the notation of Jackendoff (1990): [Event ].

10 Un sentiment est toujours attaché à la personne qui l’éprouve “A sentiment is always attached to the person that feels it” (Gross, 1995, p. 1).


Figure 2.1: Syntactic Trees of the sentences (1) and (2)

(2a) [Sentiment [Stimulus Il fumo] tormenta [Experiencer (Maria + Il mio gatto + *Il televisore)]]
“The smoke harasses (Maria + My cat + *The television)”

According to Gross (1979, p. 872), the problem is that

“rewriting rules (e.g. S → NP VP, VP → V NP) describe only LOCAL dependencies. [...] But there are numerous syntactic situations that involve non-local constraints”

While the Lexicon-Grammar and the operator grammar representations do not change, the paraphrases of (1, 2) with support verbs (1b, 2b) further complicate the generative grammar solution, which still does not correctly represent the relationships among the operators and the arguments (see Figure 2.2). It fails to determine the correct number of arguments (which are still two, and not three) and does not recognize the predicative function, which is played by the nouns odio “hate” and tormento “harassment” and not by the verbs provare “to feel” and dare “to give”.

(1b) [Sentiment [Experiencer Maria] prova odio per [Stimulus il fumo]]
“Maria feels hate for the smoke”

(2b) [Sentiment [Stimulus Il fumo] dà il tormento a [Experiencer Maria]]
“The smoke gives harassment to Maria”

For all these reasons, for the computational treatment of subjectivity, emotions and opinions in raw text, we preferred in this work the Lexicon-Grammar framework to other linguistic approaches, despite their popularity in the NLP literature.
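The argument of this subsection can be summarized in a short sketch: direct and reverse structures, as well as their support-verb paraphrases, are mapped onto the same Caus(s, sent, h) representation once the lexicon records which syntactic slot hosts the experiencer. The mini-lexicon is hypothetical, and normalizing the predicative nouns odio and tormento to the verb lemmas odiare and tormentare is an assumption made here only to show that the paraphrases coincide.

```python
from collections import namedtuple

Caus = namedtuple("Caus", ["stimulus", "sentiment", "experiencer"])

# Hypothetical mini-lexicon: predicate -> (sentiment lemma, slot of the experiencer).
PREDICATES = {
    "odia": ("odiare", "subject"),               # direct structure, as in (1)
    "tormenta": ("tormentare", "object"),        # reverse structure, as in (2)
    "prova odio per": ("odiare", "subject"),     # support verb + predicative noun, as in (1b)
    "dà il tormento a": ("tormentare", "object"),  # support verb + predicative noun, as in (2b)
}


def to_caus(n0, predicate, n1):
    """Map an N0 V N1 sentence onto Caus(s, sent, h) using the lexical entry."""
    sentiment, experiencer_slot = PREDICATES[predicate]
    if experiencer_slot == "subject":
        return Caus(stimulus=n1, sentiment=sentiment, experiencer=n0)
    return Caus(stimulus=n0, sentiment=sentiment, experiencer=n1)


s1 = to_caus("Maria", "odia", "il fumo")
s2 = to_caus("il fumo", "tormenta", "Maria")
s1b = to_caus("Maria", "prova odio per", "il fumo")
s2b = to_caus("il fumo", "dà il tormento a", "Maria")
```

In all four cases the experiencer is Maria and the stimulus is il fumo, regardless of the syntactic position of the two noun phrases, and the number of arguments stays at two.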

2.1.3 Lexicon-Grammar Binary Matrices

We said that the difference between the LG language description and any other linguistic approach regards the systematic formalization of a very broad quantity of data. This means that, given a set of properties P that define a class of lexical items L, the LG method collects and formalizes every single item that enters such a class (extensional classification), while other approaches limit their analyses to reduced sets of examples (intensional classification) (Elia et al., 1981).
Because the presentation of the information must be as transparent, rigorous and formalized as possible, LG represents its classes through binary matrices like the one exemplified in Table 2.1.

Figure 2.2: Syntactic Trees of the sentences (1b) and (2b)

They are called “binary” because of the mathematical symbols “+” and “–”, respectively used to indicate the acceptance or the rejection of the properties P by each lexical entry L.
The symbol X unequivocally denotes an LG class and is always associated with a class definitional property, simultaneously accepted by all the items of the class (in the example of Table 2.1, P1). Subclasses of X can be easily built by choosing another P as definitional property (e.g. L2, L3 and L4 under the definitional substructure P2).
The sequences of “+” and “–” attributed to each entry constitute the “lexico-syntactic profiles” of the lexical items (Elia, 2014a).

X     P1   P2   P3   P4   Pn
L1    +    –    –    –    +
L2    +    +    –    –    –
L3    +    +    +    +    +
L4    +    +    +    +    +
Ln    +    –    –    –    +

Table 2.1: Example of a Lexicon-Grammar Binary Matrix

The properties on which the classification is arranged can regard the following aspects (Elia et al., 1981):

distributional properties, by which all the co-occurrences of the lexical items L with distributional classes and subclasses are tested inside acceptable elementary sentence structures;

structural and transformational properties, through which all the combinatorial transformational possibilities inside elementary sentence structures are explored (see Section 2.1.1 for more in-depth explanations).

It is possible for a lexical entry to appear in more than one LG matrix. In this case we are not referring to a single verb that accepts divergent syntactic and semantic properties, but instead to two different verb usages attributable to the same morpho-phonological entry (Elia et al., 1981; Vietri, 2004; D’Agostino, 1992). Verb usages possess the same syntactic and semantic autonomy as two verbs that do not present any morpho-phonological relation.
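As a minimal sketch of how such a matrix can be handled computationally, the example of Table 2.1 is encoded below as strings of “+” and “-” (ASCII hyphens are used in place of the typographic sign). The function names are hypothetical and the data are the abstract entries of the table, not real lexical items.

```python
PROPERTIES = ["P1", "P2", "P3", "P4", "Pn"]

# Rows of Table 2.1: one string of "+" / "-" per lexical entry.
MATRIX = {
    "L1": "+---+",
    "L2": "++---",
    "L3": "+++++",
    "L4": "+++++",
    "Ln": "+---+",
}


def accepts(entry, prop):
    """True if the lexical entry accepts the given property."""
    return MATRIX[entry][PROPERTIES.index(prop)] == "+"


def subclass(prop):
    """Entries that accept `prop`, i.e. the subclass defined by that property."""
    return [entry for entry in MATRIX if accepts(entry, prop)]


def profile(entry):
    """Lexico-syntactic profile of an entry as a property -> bool mapping."""
    return dict(zip(PROPERTIES, (sign == "+" for sign in MATRIX[entry])))


whole_class = subclass("P1")   # P1 is the definitional property of class X
p2_subclass = subclass("P2")   # subclass built on P2
l1_profile = profile("L1")
```

Choosing P1 returns the whole class, since it is the definitional property, while P2 isolates L2, L3 and L4, as described in the text.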

2.1.4 Syntax-Semantics Interface

In the sixties, the heated debate about the syntax-semantics interface set theories supporting the autonomy of syntax from semantics against approaches sustaining the close link between these two linguistic dimensions.
Chomsky (1957) postulated the independence of syntax from semantics, due to the vagueness of semantics and because of the correspondence mismatches between these two linguistic dimensions, but recognized the existence of a relation between the two that deserves to be explored:

“There is, however, little evidence that ‘intuition about meaning’ is at all useful in the actual investigation of linguistic form. [...]
It seems clear, then, that undeniable, though only imperfect correspondences hold between formal and semantic features in language. The fact that the correspondences are so inexact suggests that meaning will be relatively useless as a basis for grammatical description. [...]
The fact that correspondences between formal and semantic features exist, however, cannot be ignored. [...] Having determined the syntactic structure of the language, we can study the way in which this syntactic structure is put to use in the actual functioning of language. An investigation of the semantic function of level structure (...) might be a reasonable step towards a theory of the interconnections between syntax and semantics” (Chomsky, 1957, p. 94-102)

The contributions that come from the generative grammar withparticular reference to the theta-theory (Chomsky 1993) assume thepossibility of a mapping between syntax and semantics

“Every content-bearing major phrasal constituent of a sentence (S, NP, AP, PP, etc.) corresponds to a conceptual constituent of some major conceptual category.”¹¹ (Jackendoff, 1990, p. 44)

¹¹ Object, Event, State, Action, Place, Path, Property, Amount.


But their autonomy is not negated by this assumption: thematic roles are still considered to be part of the level of conceptual structure and not part of syntax (Jackendoff, 1990). This is confirmed again by Chomsky (1993, p. 17), who states that “(...) the mapping of S-structures onto PF and LF are independent from one another”. Here the S-structure, generated by the rules of the syntax, is associated, by means of different interpretative rules, with representations in phonetic form (PF) and in logical form (LF).

Many contributions from the linguistic community define the concept of thematic roles through the linking problem between syntax (syntactic realization of the predicate-argument structures that determine the roles) and semantics (semantic representation of such structures) (Dowty, 1989; Grimshaw, 1990; Rappaport Hovav and Levin, 1998). Among the canonical thematic roles we mention the agent (Gruber, 1965), the patient (Baker, 1988), the experiencer (Grimshaw, 1990), the theme (Jackendoff, 1972), etc.

2.1.4.1 Frame Semantics

“Some words exist in order to provide access to knowledge of such frames to the participants in the communication process, and simultaneously serve to perform a categorization which takes such framing for granted.” (Fillmore, 2006, p. 381)

With these words Fillmore (2006) depicted Frame Semantics, which describes sentences on the basis of predicators able to bring to mind the semantic frames (inference structures linked through linguistic convention to the meaning of the lexical items) and the frame elements (participants and props in the frame) involved in these frames (Fillmore, 1976; Fillmore and Baker, 2001; Fillmore, 2006).

A frame semantic description starts from the identification of the lexical items that carry a given meaning and then explores the ways in which the frame elements and their constellations are realized around the structures that have such items as head (Fillmore et al., 2002).

This theoretical framework is based on Case grammar, a study on the combination of the deep cases chosen by verbs and on the definition of case frames, which have the function of providing “a bridge between descriptions of situations and underlying syntactic representations” by attributing “semantico-syntactic roles to particular participants in the (real or imagined) situation represented by the sentence” (Fillmore, 1977, p. 61). A direct reference of Fillmore’s work is the valency theory of Tesnière (1959).

Based on these principles, the FrameNet research project produced a lexicon of English for both human use and NLP applications (Baker et al., 1998; Fillmore et al., 2002; Ruppenhofer et al., 2006). Its purpose is to provide a large amount of semantically and syntactically annotated sentences, endowed with information about the valences (combinatorial possibilities) of the items, derived from an annotated contemporary English corpus. Among the semantic domains covered there are also emotion and cognition (Baker et al., 1998).

For the Italian language, Lenci et al. (2012) developed LexIt, a tool that, following the FrameNet approach, automatically explores syntactic and semantic properties of Italian predicates in terms of distributional profiles. It performs frame semantic analyses using both the La Repubblica corpus and the Wikipedia taxonomy.

2.1.4.2 Semantic Predicates Theory

The Lexicon-Grammar framework offers the opportunity to create matches between sets or subsets of lexico-syntactic structures and their semantic interpretations. The basis of such matches is the connection between the arguments selected by a given predicative item listed in our tables and the actants involved by the same semantic predicate. According to Gross (1981, p. 9):


“Soit Sy l’ensemble des formes syntaxiques d’une langue [...]. Soit Se un ensemble d’éléments de sens, les éléments de Sy seront associés à ceux de Se par des règles d’interprétation [...]. Nous appellerons actants syntaxiques les sujet et complément(s) du verbe tels qu’ils sont décrits dans Sy; nous appelons arguments les variables des prédicats sémantiques. Dans certains exemples, il y a correspondance biunivoque entre actants et arguments, entre phrase simple et prédicat.”¹²

This is the basic assumption on which the Semantic Predicates theory has been built into the LG framework. It postulates a complex parallelism between Sy and Se that, differently from other approaches, cannot ignore the idiosyncrasies of the lexicon, since “il n’existe pas deux verbes ayant le même ensemble de propriétés syntaxiques” (“no two verbs have the same set of syntactic properties”; Gross, 1981, p. 10). Actually, the large number of structures into which the words can enter underlines the importance of a sentiment lexical database that includes both semantic and (lexicon-dependent) syntactic classification parameters.

Therefore, the possibility of creating intuitive semantic macro-classifications within the LG framework does not contrast with the autonomy of semantics from syntax (Elia, 2014b).

In this thesis we will refer to the following three predicates, all of them connected to the expression of subjectivity:

Sentiment (e.g. odiare, “to hate”), which involves the actants experiencer and stimulus;

Opinion (e.g. difendere, “to stand up for”), which implicates the actants opinion holder and target of the opinion;

¹² “Let Sy be the set of syntactic forms of a language [...]. Let Se be a set of semantic elements; the elements of Sy will be associated to the ones of Se through interpretation rules [...]. We will call syntactic actants the subject and the complement(s) of the verb, in the way in which they are described in Sy; we will call arguments the variables of semantic predicates. In some examples there is a one-to-one correspondence between actants and arguments, between simple sentence and predicate.” Author’s translation.


Figure 2.3: LG syntax-semantic interface

Physical act (e.g. baciare, “to kiss”), which evokes the actants patient and agent.

As an example, the predicate in sentence (1), from the LG class 43, will be associated to a Predicate with two variables, described by the mathematical function Caus(s, sent, h):

(1) [Sentiment [Experiencer Maria] odia [Stimulus il fumo]]
“Maria hates the smoke”

The rules of interpretation which have been followed are:

1. the experiencer (h in Se) corresponds to the formal subject (N0 in Sy);

2. the sentiment stimulus (s in Se) is the human complement (N1 in Sy).

As shown in the few examples below, the syntactic transformations in which the same predicate is involved preserve the roles played by its arguments.

Nominalization


(1b) [Sentiment [Experiencer Maria] prova odio per [Stimulus il fumo]]
“Maria feels hate for the smoke”

Adjectivalization

(1c) [Sentiment [Stimulus il fumo] è odioso per [Experiencer Maria]]
“The smoke is hateful for Maria”

Passivization

(1d) [Sentiment [Stimulus il fumo] è odiato da [Experiencer Maria]]
“The smoke is hated by Maria”

Repositioning

(1e) [Sentiment è [Stimulus il fumo] che [Experiencer Maria] odia]
“It is the smoke that Maria hates”
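The role-preserving behaviour of these transformations can be illustrated with a small sketch. It is not the LEG-Semantic Role Labeling system: the pattern table `PATTERNS`, the regular expressions and the function `label_roles` are hypothetical and cover only the five variants of example (1). Each pattern names its capture groups after the semantic role they fill, so the same labelling is obtained whatever the surface order.

```python
import re

# Hypothetical surface patterns for the five variants of example (1).
# Capture groups are named after the semantic role they fill, so the
# role assignment does not depend on the position in the sentence.
PATTERNS = {
    "base": r"^(?P<experiencer>.+?) odia (?P<stimulus>.+)$",
    "nominalization": r"^(?P<experiencer>.+?) prova odio per (?P<stimulus>.+)$",
    "adjectivalization": r"^(?P<stimulus>.+?) è odioso per (?P<experiencer>.+)$",
    "passivization": r"^(?P<stimulus>.+?) è odiato da (?P<experiencer>.+)$",
    "repositioning": r"^è (?P<stimulus>.+?) che (?P<experiencer>.+?) odia$",
}


def label_roles(sentence):
    """Return (transformation, roles) for the first matching pattern, or None."""
    for name, pattern in PATTERNS.items():
        match = re.match(pattern, sentence)
        if match:
            return name, match.groupdict()
    return None


if __name__ == "__main__":
    for s in ("Maria odia il fumo", "il fumo è odiato da Maria"):
        print(s, "->", label_roles(s))
```

In this toy setting every variant yields the same pair of roles (experiencer Maria, stimulus il fumo), which is the invariance the examples above are meant to show.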

As concerns the link between actants and arguments, the background of the research is the LEG-Semantic Role Labeling system of Elia (2014b), with particular reference to its psychological, evaluative and bodily predicates. The granularity of the verb properties in this SRL system is evidence of the strong dependence of syntax on the lexicon and proof of its absolute independence from semantics.

Special kinds of Semantic Predicates have already been used in NLP applications in a Lexicon-Grammar context: we mention Vietri (2014a), Elia et al. (2010) and Elia and Vietri (2010), who formalized and tested the Transfer Predicates on the Italian Civil Code; Elia et al. (2013), who focused on the Spatial Predicates; and Maisto and Pelosi (2014b), Pelosi (2015b) and Elia et al. (2015), who exploited the Psychological Semantic Predicates for Sentiment Analysis purposes.


2.2 The Italian Module of Nooj

Nooj is the NLP tool used in this work for both the language formalization and the corpora pre-processing and processing at the orthographical, lexical, morphological, syntactic and semantic levels (Silberztein, 2003).

Among the Nooj modules that have been developed for more than twenty languages by the international Nooj community, the Italian Lingware has been built by the Maurice Gross Laboratory of the University of Salerno.

The Italian module for Nooj can at any time be integrated with other ad hoc resources in the form of electronic dictionaries and local grammars. The freely downloadable Italian resources¹³ include:

• electronic dictionaries of simple and compound words;

• an electronic dictionary of proper names;

• an electronic dictionary of toponyms;

• a set of morphological grammars;

• samples of syntactic grammars.

This knowledge-based and linguistically grounded environment is not in contrast with hybrid and statistical approaches, which often require annotated corpora for their testing stages.

The Nooj annotations can cover the description of simple word forms, multiword units and even discontinuous expressions.

For a detailed description of the potential of the Italian module of Nooj see Vietri (2014c).

¹³ www.nooj-association.org


2.2.1 Electronic Dictionaries

If we typed the key-phrase “electronic dictionary” into a search engine, the first results to appear in the SERP would concern CD-ROM dictionaries or machine translators.

Actually, electronic dictionaries (usable with all computerized documents for text recognition, information retrieval and machine translation) should be conceived as lexical databases intended to be used only by computer applications, rather than by a wide audience who probably would not be able to interpret data codes formalized in a complex way.

In addition to the target, the differences between the two types of dictionary can be summarized in three points:

Completeness: human dictionaries may overlook information that human users can extract from their encyclopedic knowledge. Electronic dictionaries, instead, must necessarily be exhaustive. They cannot leave anything to chance: the computer must not have any possibility of error.

Explanation: human dictionaries can provide part of the information implicitly, because they can rely on the human user’s skills (intuition, adaptation and deduction). Electronic dictionaries, on the contrary, must make everything explicit: the computer can only process fully explained symbols and instructions.

Coding: unlike in human dictionaries, all the information provided in electronic dictionaries, being related to the use of software for the automatic processing of data and texts, must be accurate, consistent and codified.

Joining all these features enlarges the size and magnifies the complexity of monolingual and bilingual electronic dictionaries, which are for this reason always bigger than the human ones.

The Italian electronic dictionaries system, based on Maurice Gross’s Lexicon-Grammar, has been developed at the University of Salerno by the Department of Communication Science; it is grounded on the European standard RELEX.
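The requirements of completeness, explicitness and consistent coding can be made concrete with a minimal sketch of a machine-readable entry. The layout `lemma,CATEGORY+FEATURE...`, the code inventories `CATEGORIES` and `FEATURES` and the function `parse_entry` are assumptions made for illustration only; they are not the actual coding of the Italian electronic dictionaries. The point is simply that an entry with missing or uncoded information is rejected rather than interpreted.

```python
from dataclasses import dataclass

# Hypothetical closed inventories of codes: anything outside them is rejected.
CATEGORIES = {"N", "V", "A", "ADV"}
FEATURES = {"POS", "NEG", "Hum"}


@dataclass(frozen=True)
class Entry:
    lemma: str
    category: str
    features: tuple


def parse_entry(line):
    """Parse 'lemma,CAT+FEAT1+FEAT2'; raise ValueError on any uncoded information."""
    lemma, sep, codes = line.strip().partition(",")
    if not lemma or not sep or not codes:
        raise ValueError(f"incomplete entry: {line!r}")
    category, *features = codes.split("+")
    if category not in CATEGORIES:
        raise ValueError(f"unknown category {category!r} in {line!r}")
    unknown = [f for f in features if f not in FEATURES]
    if unknown:
        raise ValueError(f"unknown features {unknown} in {line!r}")
    return Entry(lemma, category, tuple(features))


if __name__ == "__main__":
    print(parse_entry("odiare,V+NEG"))
```

A human reader would cope with an entry such as `odiare` alone; the parser above deliberately does not, which mirrors the contrast drawn between the two kinds of dictionary.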

2.2.2 Local grammars and Finite-state Automata

Local grammars are algorithms that, through grammatical, morphological and lexical instructions, are used to formalize linguistic phenomena and to parse texts. They are defined “local” because, despite any generalization, they can be used only in the description and analysis of limited linguistic phenomena.

The reliability of “finite state Markov processes” (Markov, 1971) had already been excluded by Chomsky (1957, pp. 23-24), who argued that

“(...) there are processes of sentence-formation that finite state grammars are intrinsically not equipped to handle. If these processes have no finite limit, we can prove the literal inapplicability of this elementary theory. If the processes have a limit, then the construction of a finite state grammar will not be literally out of the question, since it will be possible to list the sentences, and a list is essentially a trivial finite state grammar. But this grammar will be so complex that it will be of little use or interest.”

and remarked that

“If a grammar does not have recursive devices (...) it will be prohibitively complex. If it does have recursive devices of some sort, it will produce infinitely many sentences.”

Gross (1993), instead, caught the potential of finite-state machines, especially for those linguistic segments which are more limited from a combinatorial point of view.

The Nooj software, actually, is not limited to finite-state machines and regular grammars, but relies on the four types of grammars of the Chomsky hierarchy. In fact, as we will show in this research, it is capable of processing context-free and context-sensitive grammars and of performing transformations in cascade.

Finite-State Automata (FSA) are abstract devices characterized by a finite set of nodes or “states” (Si) connected to one another by transitions (tj) that allow us to determine sequences of symbols related to a particular path (see Figure 2.4).

These graphs are read from left to right, or rather from the initial state (S0) to the final state (S3) (Gross, 1993). FSA can be deterministic, such as the one in Figure 2.4, or non-deterministic, if they activate more than one path, as in Figure 2.5. A Finite-State Transducer (FST) transduces the input symbols (Sii) into the output ones (Sio), as in Figure 2.6. A Recursive Transition Network (RTN) is a grammar that contains embedded graphs (the S3 node in Figure 2.7). An Enhanced Recursive Transition Network (ERTN) is a more complex grammar that includes variables (V) and constraints (C) (Silberztein, 2003; see Figure 2.7).

Figure 2.4: Deterministic Finite-State Automaton

Figure 2.5: Non-deterministic Finite-State Automaton

Figure 2.6: Finite-State Transducer

Figure 2.7: Enhanced Recursive Transition Network

For simplicity, from now on we will just use the acronym FSA to indicate all the kinds of local grammars used to formalize different linguistic phenomena.

In general, the nodes of the graphs can at any time recall entries or semantic and syntactic information from the electronic dictionaries. FSA are inflectional or derivational if the symbols in the nodes are morphemes, syntactic if the symbols are words.
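The difference between an automaton that only recognizes a path and a transducer that also produces an output can be sketched as follows. The state names S0 to S3 follow the text, but the vocabulary, the transitions and the output tags are hypothetical, since the figures are not reproduced here; the sketch is not a Nooj grammar.

```python
# Finite-state transducer: (state, input symbol) -> (next state, output symbol).
# Transitions and output tags are invented for illustration.
FST = {
    ("S0", "Maria"): ("S1", "N"),
    ("S1", "odia"): ("S2", "V+NEG"),
    ("S1", "ama"): ("S2", "V+POS"),
    ("S2", "Gianni"): ("S3", "N"),
}
# The underlying deterministic FSA simply ignores the outputs.
TRANSITIONS = {key: target for key, (target, _) in FST.items()}
INITIAL, FINAL = "S0", {"S3"}


def accepts(symbols):
    """Follow one transition per symbol; accept only if a final state is reached."""
    state = INITIAL
    for symbol in symbols:
        state = TRANSITIONS.get((state, symbol))
        if state is None:
            return False
    return state in FINAL


def transduce(symbols):
    """Return the list of output symbols for an accepted path, else None."""
    state, output = INITIAL, []
    for symbol in symbols:
        step = FST.get((state, symbol))
        if step is None:
            return None
        state, out = step
        output.append(out)
    return output if state in FINAL else None


if __name__ == "__main__":
    print(accepts("Maria odia Gianni".split()))
    print(transduce("Maria ama Gianni".split()))
```

Because every (state, symbol) pair has at most one target, the automaton is deterministic; a non-deterministic variant would map a pair to a set of targets and explore more than one path.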

Chapter 3

SentIta: Lexicon-Grammar based Sentiment Resources

3.1 Lexicon-Grammar of Sentiment Expressions

The LG theoretical framework, with its huge collections and categorization of linguistic data in the form of LG tables, built and refined over the years by its researchers, offers the opportunity to systematically relate a large number of semantic and syntactic properties to the lexicon. The complex intersection between semantics and syntax, the greatest benefit of this approach, corrects the worst deficiency of other semantic strategies proposed in the Sentiment Analysis literature: the tendency to group together, only on the basis of their meaning, words that have nothing in common from the syntactic point of view (Le Pesant and Mathieu-Colas, 1998).¹

¹ On this point, Elia (2013, p. 286) illustrates that “(...) the semantic intuition that drives us to put together certain predicates and their argument is not correlated to the set of syntactic properties of the verbs, nor is it «helped» by it, except in a very superficial way; indeed, we might say that the granular nature of the syntax in the lexicon would be an obstacle for a mind that is organized in an efficient and rigorous logic way”.

“Classes d’objets” is the expression introduced by Gross (1992a) to indicate the linguistic device that allows the language formalization from both a syntactic and a semantic point of view. Arguments and Predicates are classified into subsets which are semantically consistent and that also share lexical and grammatical properties (De Bueriis and Elia, 2008).

According to Gross (2008, p. 1):

“Si un mot ne peut pas être défini en lui-même, c’est-à-dire hors contexte, mais seulement dans un environnement syntaxique donné, alors le lexique ne peut pas être séparé de la syntaxe, c’est-à-dire de la combinatoire des mots. La sémantique n’est pas autonome non plus: elle est le résultat des éléments lexicaux organisés d’une façon déterminée (distribution). Qu’il en soit ainsi est confirmé par les auteurs de dictionnaires eux-mêmes qui, timidement et sans aucune méthode, notent pour un prédicat donné le ou les arguments qui permettent de séparer un emploi d’un autre. Pas de sémantique sans syntaxe donc, c’est-à-dire sans contexte.”²

² “If a word cannot be defined in itself, namely out of context, but only in a specific syntactic environment, then the lexicon cannot be separated from the syntax, that is to say from the word combinatorics. Semantics is not autonomous either: it is the result of lexical elements organized in a determined way (distribution). This is confirmed by the dictionary authors themselves who, timidly and without any method, note for a specific predicate the argument(s) that allow one use to be separated from another. No semantics without syntax, that is, without context.” Author’s translation.

SentIta is a sentiment lexical database that directly aims to apply the Lexicon-Grammar theory, starting from its basic hypothesis: the minimum semantic units are the elementary sentences, not the words (Gross, 1975).

Therefore, in this work, the lemmas collected in the dictionaries and their Semantic Orientations are systematically recalled and computed in a specific context through the use of local grammars. On the basis of their combinatorial features and co-occurrence contexts, the SentIta lexical items can take the following shapes (Buvet et al., 2005; Elia, 2014a):

operators: the predicates, which can be verbs, nouns, adjectives, adverbs, multiword expressions, prepositions and conjunctions;

arguments: the predicate complements, which can be nominal and prepositional groups or entire clauses.

As already specified in Section 2.2, the sentiment words and other oriented Atomic Linguistic Units (ALU) are listed in Nooj electronic dictionaries, while the local grammars used to manipulate their polarities are formalized thanks to Finite-State Automata.

Electronic dictionaries have then been exploited in order to list and to classify, semantically and syntactically, the sentiment lexical resources in a machine-readable format. The computational power of Nooj graphs has instead been used to represent the acceptance or refusal of the semantic and syntactic properties through the use of constraints and restrictions.

Chapter 2 underlined the role played in the elementary sentences by the Predicates and the one assumed by their arguments. According to Le Pesant and Mathieu-Colas (1998):

“La structure prédicat/arguments semble plus opératoire que les découpages binaires classiques (sujet/prédicat). [...] C’est le prédicat qui détermine le nombre de positions constitutives de la phrase.”³

Predicates must not be identified only with the category of verbs, but also with predicative nouns and adjectives that enter into support verb structures.

In the subgroup of sentiment expressions, starting from their syntactic structures, Gross identified two semantic types:

³ “The predicate/argument structure seems to be more operational than the classical binary subdivision (subject/predicate). [...] It is the predicate that determines the number of positions that constitute the sentence.” Author’s translation.


1. sentences with two arguments, where one is the person and the other is the feeling (e.g. Maria è angosciata, “Maria is anguished”);

2. sentences with three arguments, which differ from the first type in the presence of a stimulus that triggers the feeling in another person (e.g. Gianni angoscia Maria, “Gianni anguishes Maria”).

The notation for their representation is the same as the one used for the Semantic Predicates (Gross, 1981). Type 1 can be formalized by the following formula:

1. P(sent, h), e.g. P(angoscia, Maria)

in which the variable h is a function of a sentiment sent through a predicative relationship.

Type 2, instead, can be expressed by the following formula:

2. Caus(s, sent, h), e.g. Caus(Gianni, angoscia, Maria)

in which a sentiment sent is caused by a stimulus s on a person h.

Both of them are outcomes of the following assumption: “un sentiment est toujours attaché à la personne qui l’éprouve”⁴ (Gross, 1995, p. 1). At first glance, the problem connected to sentiment expressions seems easy to define and solve. Actually, upon a deeper analysis, it presents its challenging aspects: the quantity and the variability of sentiment expressions, which can vary both from a transformational and from a morpho-syntactic point of view. As displayed below, a sentiment expressed by a word can take the shape of verbs, nouns, adjectives or adverbs (Gross, 1995; D’Agostino, 2005; D’Agostino et al., 2007):

V: angosciare “to anguish”, angosciarsi “to become anguished”

N: angoscia “anguish”

A: angosciante “anguishing”, angosciato, angoscioso “anguished”

ADV: angosciatamente, angosciosamente “with anguish”

⁴ “A sentiment is always attached to the person that feels it.” Author’s translation.


Such groups of words, related to one another by derivational morphology under a common root (e.g. angosc-), represent, from the LG point of view, classes of paraphrastic equivalence, or also paraphrastic constellations (D’Agostino, 1992). These concepts are crucial when applied to the Sentiment Analysis field, because the connection of the words displayed above is related to a formal principle, which regards the sentence structure, and also to a semantic principle, which concerns the distributional properties and the semantic roles of the arguments selected by the predicates (D’Agostino, 1992).

Nevertheless, from a syntactic point of view, Gross (1995) noticed that the terms of sentiment can occupy all the different nominal, adjectival, verbal or adverbial positions within the same sentence structure.
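The two formulas of this section and the constellation of forms sharing the root angosc- can be represented as plain data structures. This is only a sketch: the class names `P` and `Caus` mirror the notation of the formulas, while `CONSTELLATION`, `sentiment_of` and `experiencer` are hypothetical helpers introduced for illustration.

```python
from typing import NamedTuple, Union


class P(NamedTuple):
    """Type 1: a sentiment `sent` predicated of a person `h`."""
    sent: str
    h: str


class Caus(NamedTuple):
    """Type 2: a stimulus `s` causes the sentiment `sent` in a person `h`."""
    s: str
    sent: str
    h: str


# Forms of the angosc- constellation listed in this section, by part of speech.
CONSTELLATION = {
    "V": ["angosciare", "angosciarsi"],
    "N": ["angoscia"],
    "A": ["angosciante", "angosciato", "angoscioso"],
    "ADV": ["angosciatamente", "angosciosamente"],
}


def sentiment_of(form):
    """Map any form of the constellation back to the sentiment noun, else None."""
    for forms in CONSTELLATION.values():
        if form in forms:
            return CONSTELLATION["N"][0]
    return None


def experiencer(pred: Union[P, Caus]) -> str:
    """Both types attach the sentiment to the person who feels it (h)."""
    return pred.h


if __name__ == "__main__":
    print(experiencer(P("angoscia", "Maria")))
    print(experiencer(Caus("Gianni", "angoscia", "Maria")))
    print(sentiment_of("angosciato"))
```

Whatever the part of speech of the word that carries the sentiment, both predicate types expose the same person h, which reflects the assumption that a sentiment is always attached to the person who feels it.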

3.2 Literature Survey on Subjectivity Lexicons

The most used approaches in the Sentiment Analysis field can be divided and summarized in three main lines of research: lexicon-based methods, learning and statistical methods, and hybrid methods. The first ones coincide with the strategy that has been chosen to carry out the present work.

Lexicon-based approaches always start from the following assumption: the text sentiment orientation comes from the semantic orientations of the words and phrases contained in it.

The most commonly used SO indicators are adjectives or adjective phrases (Hatzivassiloglou and McKeown, 1997; Hu and Liu, 2004; Taboada et al., 2006), but recently the use of adverbs (Benamara et al., 2007), nouns (Vermeij, 2005; Riloff et al., 2003) and verbs (Neviarouskaya et al., 2009a) has become really common as well.

Hand-built lexicons are definitely more accurate than the automatically built ones, especially in cross-domain Sentiment Analysis tasks. Nevertheless, manually drawing up a dictionary is considered a painstakingly time-consuming activity (Taboada et al., 2011; Bloom, 2011). This is why a large number of studies on the automatic creation and propagation of polarity lexicons can be observed in the literature. On this topic, it must be said that automatic dictionaries seem to be more unstable, but usually larger, than the manually built ones (see Tables 3.1 and 3.2). Size, anyway, does not always mean quality: it is common for these large dictionaries to have scarcely detailed information. A large number of entries could denote fewer details in the description (e.g. the Maryland dictionary (Mohammad et al., 2009) does not specify the entries’ part of speech) or, instead, could mean more noise.

Among the state-of-the-art methods used to build and test those dictionaries we mention Latent Semantic Analysis (LSA) (Landauer and Dumais, 1997), bootstrapping algorithms (Riloff et al., 2003), graph propagation algorithms (Velikovich et al., 2010; Kaji and Kitsuregawa, 2007), conjunctions (and, or, but) and morphological relations between adjectives (Hatzivassiloglou and McKeown, 1997), Context Coherency (Kanayama and Nasukawa, 2006a), distributional similarity (Wiebe, 2000), etc.⁵

⁵ More details about the lexicon propagation will be given in Section 3.5.1.

Word Similarity is a very frequently used method in dictionary propagation over the thesaurus-based approaches. Examples are the Maryland dictionary, created thanks to a Roget-like thesaurus and a handful of affixes (Mohammad et al., 2009), and other lexicons based on WordNet, like SentiWordNet, built on the basis of a quantitative analysis of the glosses associated to synsets (Esuli and Sebastiani, 2005, 2006a), or other lexicons based on the computing of a distance measure on WordNet (Kamps et al., 2004; Esuli and Sebastiani, 2005).

Pointwise Mutual Information (PMI) using Seed Words has been applied to sentiment lexicon propagation by Turney (2002), Turney and Littman (2003), Rao and Ravichandran (2009), Velikovich et al. (2010) and Gamon and Aue (2005). Seed words are words which are strongly associated with a positive/negative meaning, such as eccellente (“excellent”) or orrendo (“horrible”), by which it is possible to build a bigger lexicon, detecting other words that frequently occur alongside them. It has been observed, indeed, that positive words often occur close to positive seed words, whereas negative words are likely to appear around negative seed words (Turney, 2002; Turney and Littman, 2003).

PMI could easily be calculated in the past using the Web as a corpus, thanks to AltaVista’s NEAR operator. Today one must be content with the Google AND operator.

Learning and statistical methods for Sentiment Analysis usually make use of Support Vector Machine (Pang et al., 2002; Mullen and Collier, 2004; Ye et al., 2009) or Naïve Bayes classifiers (Tan et al., 2009; Kang et al., 2012).

Finally, as regards the hybrid methods, we cite the work of Read and Carroll (2009), Li et al. (2010), Andreevskaia and Bergler (2008), Dang et al. (2010), Dasgupta and Ng (2009), Goldberg and Zhu (2006), Prabowo and Thelwall (2009) and Wan (2009).⁶

⁶ These contributions will be examined in more depth in Section 5.1.1.

A brief summary of the nature and the size of the most popular lexicons for Sentiment Analysis is given in Tables 3.1 and 3.2.

A clarification needs to be made about the distinction between Polarity lexicons and Affect lexicons. The former are used in subjectivity detection or in polarity and intensity classification systems, while the latter, going beyond the positive/negative dichotomy, refer to a more fine-grained emotion classification.

In the following paragraphs we will first describe some of the most famous Polarity lexicons, and then we will briefly cite some databases for affect classification.

Another distinction will be made between the resources conceived for the English language and the ones created for other languages. The Italian language will especially be considered.
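The seed-word propagation described in this section can be sketched with the orientation measure of Turney (2002), in which the semantic orientation of a word is its PMI with a positive seed minus its PMI with a negative seed. The toy corpus `DOCS` is invented for illustration, co-occurrence is counted at document level instead of through a search-engine operator, and the function names are hypothetical; the small smoothing constant avoids taking the logarithm of zero.

```python
import math

# Toy corpus, invented for illustration: each document is a set of lemmas.
DOCS = [
    {"hotel", "eccellente", "pulito"},
    {"servizio", "eccellente", "pulito"},
    {"cibo", "orrendo", "sporco"},
    {"hotel", "orrendo", "sporco"},
    {"hotel", "pulito"},
]


def pmi(word, seed, docs, smoothing=0.01):
    """PMI from document-level co-occurrence counts, smoothed to avoid log(0)."""
    n = len(docs)
    both = sum(1 for d in docs if word in d and seed in d) + smoothing
    w = sum(1 for d in docs if word in d) + smoothing
    s = sum(1 for d in docs if seed in d) + smoothing
    return math.log2(both * n / (w * s))


def semantic_orientation(word, docs, pos_seed="eccellente", neg_seed="orrendo"):
    """SO(word) = PMI(word, positive seed) - PMI(word, negative seed)."""
    return pmi(word, pos_seed, docs) - pmi(word, neg_seed, docs)


if __name__ == "__main__":
    for w in ("pulito", "sporco", "hotel"):
        print(w, round(semantic_orientation(w, DOCS), 3))
```

In this toy corpus pulito receives a positive score, sporco a negative one, and hotel, which occurs equally often with both seeds, a score of zero; words with a clearly signed score would be the candidates for extending the lexicon.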


Dictionary                 Author             Year   Entries        Language
SO-CAL lexicon             Taboada            2011   5000+          Eng
AFINN-111                  Hansen             2011   2400+          Eng
Sentilex                   Silva              2010   7000+          Port
Q-WordNet 3.0              Agerri et al.      2010   15500+         Eng
DAL-2009                   Whissell           2009   8700+          Eng
MPQA lexicon               Wilson et al.      2005   8000+          Eng
Hu-Liu lexicon             Hu et al.          2004   6800+          Eng
Hatzivassiloglou lexicon   Hatzivassiloglou   1997   15400+ pairs   Eng
General Inquirer           Stone              1961   9800+          Eng

Table 3.1: Manually built Polarity Lexicons

Dictionary              Author              Year   Entries   Language
Sentix                  Basile, Nissim      2013   59700+    Ita
SentiWordNet            Baccianella et al.  2010   38000+    Eng
Web-generated lexicon   Velikovich et al.   2010   178000+   Eng
Maryland dictionary     Mohammad et al.     2009   76000+    Eng

Table 3.2: (Semi-)Automatically built Polarity Lexicons

3.2.1 Polarity Lexicons

SO-CAL dictionary: Taboada et al. (2011), due to the low stability of the automatically generated lexical databases, manually developed the SO-CAL dictionary. They hand-tagged, with an evaluation scale that ranged from +5 to -5, the semantically oriented words they found in a variety of sources:

• the multi-domain collection of 400 reviews belonging to different categories described in Taboada and Grieve (2004) and Taboada et al. (2006);

• 100 movie reviews from the Polarity Dataset of Pang et al. (2002) and Pang and Lee (2004);

• the whole General Inquirer dictionary (Stone et al., 1966).

The result was a dictionary of 2252 adjectives, 1142 nouns, 903 verbs and 745 adverbs. The adverb list has been automatically generated by matching adverbs ending in “-ly” to their potentially corresponding adjectives. Moreover, a set of multiword expressions (152 phrasal verbs, e.g. “to fall apart”, and 35 intensifier expressions, e.g. “a little bit”) has also been taken into account. In case of overlap between a simple word (e.g. “fun”, +2) and a multiword expression (e.g. “to make fun of”, -1) with different polarity, the latter has the higher priority in the annotation process.
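The priority of multiword expressions over overlapping simple words can be sketched with a greedy longest-match lookup. This is an illustration of the priority rule only, not the SO-CAL implementation: the two entries and their scores come from the example above, while `LEXICON`, `MAX_LEN` and `annotate` are hypothetical names, and the expression is stored without the infinitive marker “to”.

```python
# Entries from the example above: token tuples mapped to SO values.
LEXICON = {
    ("fun",): 2,
    ("make", "fun", "of"): -1,
}
MAX_LEN = max(len(key) for key in LEXICON)


def annotate(tokens):
    """Greedy longest-match lookup: at each position try the longest entry first."""
    matches, i = [], 0
    while i < len(tokens):
        for length in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + length])
            if key in LEXICON:
                matches.append((" ".join(key), LEXICON[key]))
                i += length  # skip the tokens consumed by the match
                break
        else:
            i += 1  # no entry starts at this position
    return matches


if __name__ == "__main__":
    print(annotate("they make fun of him".split()))
    print(annotate("that was fun".split()))
```

Because the three-token entry is tried before the single token, “fun” inside “make fun of” is never annotated on its own, so the negative multiword polarity wins over the positive simple-word one.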

AFINN: It is a list of English words that associates 1446 words with a valence between +5 (positive) and -5 (negative) (Hansen et al., 2011). AFINN-111, its newest version, includes 2477 words and phrases. The drawing up of the list started from a set of obscene words and was then gradually extended by examining Twitter posts. This has led to a bias of 65% towards negative words compared to positive words (Nielsen, 2011).

Q-WordNet: It is a lexical resource consisting of WordNet senses automatically annotated with positive and negative polarity (Agerri and García-Serrano, 2010). It is grounded on the hypothesis that, if a positive synset is matched in a gloss, then its synset can also be annotated as positive. Exceptions are the cases under the scope of a negation, which are classified as the opposite of the matched lemma.

What differentiates Q-WordNet from SentiWordNet is the fact that positive/negative labels are attributed to word senses at a certain level by means of a graded classification.

Multi-Perspective Question Answering Lexicon: The MPQA subjectivity lexicon (Wiebe et al., 2004; Wilson et al., 2005) comprises 8000+ subjectivity clues (words and phrases that may be used to express private states) annotated by type (strongsubj or weaksubj) and prior polarity. Subjectivity clues and contextual features have been learned from two different kinds of corpora: a small amount of data manually annotated at the expression level, from the Wall Street Journal and newsgroup data, and a large amount of data with existing document-level annotations, from the Wall Street Journal. The focus is on three types of subjectivity clues:

• hapax legomena, words that appeared just once in the corpus;

• collocations, identified through fixed n-grams;

• adjective and verb features, identified using the results of a method for clustering words according to distributional similarity.

Hu-Liu lexicon: The Hu and Liu (2004) wordlist comprises around 6800 English items (2006 positive and 4783 negative), including slang, misspellings, morphological variants and social media mark-up.

Hatzivassiloglou Lexicon: Hatzivassiloglou and McKeown (1997), starting from a list of 1336 manually labeled adjectives (657 positive and 679 negative terms), demonstrated that conjunctions between adjectives can provide indirect information about their orientation. In detail, the authors extracted the conjunctions through a two-level finite-state grammar able to cover modification patterns and noun-adjective apposition, and tested their method on 21 million words from the Wall Street Journal corpus. This way they collected 13426 conjunctions of adjectives, further expanded to a total of 15431 conjoined adjective pairs.

Dictionary of Affect in Language: The Whissell (1989, 2009)

32 LITERATURE SURVEY ON SUBJECTIVITY LEXICONS 49

Dictionary of Affect (DAL) lists about 4500 English words. Its more recent version (Whissell, 2009) provides 8742 words manually annotated for their Activation, Evaluation and Imagery.

General Inquirer: It is an IBM 7090 program system developed at Harvard in the spring of 1961 for content analysis research problems in the behavioral sciences (Stone et al., 1966; Stone and Hunt, 1963). It is based on the idea that, through the analysis of many variables at once, it is often possible to discover trends that can appear “latent to casual observation”.
The description of the “systematic” content analysis processes must be as “objective” as possible, the features of texts must be “manifest” and the results “quantitative”.
The lexicon attaches syntactic, semantic and pragmatic information to part-of-speech tagged words. Among others, the categories related to polarity measures are the ones pertaining to the three semantic dimensions of Osgood (1952):

Positive: 1915 words of positive outlook

Negative: 2291 words of negative outlook

Strong: 1902 words implying strength

Weak: 755 words implying weakness

Active: 2045 words implying an active orientation

Passive: 911 words indicating a passive orientation⁷

SentiWordNet: Esuli and Sebastiani (2006a) developed the SentiWordNet 1.0 lexicon, based on WordNet 2.0 (Miller, 1995), by automatically associating each WordNet synset with three scores: Obj(s) for objective terms, and Pos(s) and Neg(s) for positive and negative terms.

⁷ Table 3.1 refers to the General Inquirer size only in terms of sentiment words.


Every score ranges from 0.0 to 1.0, and their sum is 1.0 for each synset. The values are determined on the basis of the proportion of eight ternary classifiers (with similar accuracy levels but different classification behavior) that assign the corresponding label to the synset after quantitatively analyzing the glosses associated with it.
What differentiates SentiWordNet from other sentiment lexicons is the fact that the weighting process focuses on synonym sets rather than on simple terms: the researchers observe that different senses of the same term can have different opinion-related properties.
SentiWordNet 3.0 (Baccianella et al., 2010) improves on SentiWordNet 1.0. The main differences between them are the version of WordNet they annotate (3.0 for SentiWordNet 3.0) and the algorithm used to annotate WordNet, which in the 3.0 version includes, along with the semi-supervised learning step, a random-walk step that refines the scores.
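The way the three scores arise from a committee of classifiers can be sketched as follows. This is only an illustration of the proportion-based scoring described above: the vote list is invented toy data, and the labels `pos`, `neg`, `obj` and the function name `synset_scores` are hypothetical.

```python
from collections import Counter


def synset_scores(votes):
    """Turn the labels produced by a committee of ternary classifiers
    into Pos/Neg/Obj scores; by construction the three scores sum to 1.0."""
    counts = Counter(votes)
    total = len(votes)
    return {label: counts[label] / total for label in ("pos", "neg", "obj")}


# Invented example: eight classifiers labelling the gloss of a single synset.
votes = ["pos", "pos", "pos", "obj", "obj", "obj", "obj", "neg"]
scores = synset_scores(votes)
```

With eight classifiers each score is a multiple of 1/8, which is why the constraint that the three values add up to 1.0 holds for every synset.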

Velikovich Web-generated lexicon: This sentiment dictionary, built through graph propagation algorithms over lexical graphs, counts 178104 entries, which include not only simple words but also spelling variations, slang, vulgarity and multiword expressions (Velikovich et al., 2010).

Maryland dictionary: The procedure with which Mohammad et al. (2009) built the Maryland lexicon can be split into two steps: the identification of a seed set of positive and negative words, and the positive and negative annotation of the words that are synonymous with them in a Roget-like thesaurus.
Eleven antonym-generating affix patterns have been used to generate overtly marked words and their unmarked counterparts. In this way, the patterns generated 2692 pairs of valid English words listed in the Macquarie Thesaurus.
The main innovation introduced by Mohammad et al. (2009) is


the autonomy from the manual annotation of any term in the dictionary, even though manual annotation can be used, when available, to improve both the correctness and the coverage of its entries.
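The seed-generation step based on affix patterns can be sketched as below. The three prefixes and the small vocabulary are illustrative assumptions only: they are not the eleven patterns of Mohammad et al. (2009), nor entries checked against the Macquarie Thesaurus, and the function name `antonym_pairs` is hypothetical.

```python
# Illustrative prefixes only; the original work uses eleven affix patterns.
PREFIXES = ("un", "dis", "in")

# Toy vocabulary standing in for the word list of a thesaurus.
VOCABULARY = {
    "happy", "unhappy", "honest", "dishonest",
    "correct", "incorrect", "table",
}


def antonym_pairs(vocabulary, prefixes=PREFIXES):
    """Return (unmarked, marked) pairs in which both words are in the vocabulary."""
    pairs = set()
    for word in vocabulary:
        for prefix in prefixes:
            marked = prefix + word
            if marked in vocabulary:
                pairs.add((word, marked))
    return sorted(pairs)


pairs = antonym_pairs(VOCABULARY)
```

Requiring both members of a pair to be attested words is what keeps the procedure independent of manual annotation: the unmarked word can seed the positive set and the marked one the negative set.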

3.2.2 Affective Lexicons

SentiSense: SentiSense (de Albornoz et al., 2012) is a semi-automatically developed affective lexicon which consists of 5496 words and 2190 synsets.
Similarly to SentiWordNet or WordNet-Affect, it associates emotional meanings or categories with concepts (instead of terms) from the WordNet lexical database, and can be used for polarity and intensity classification as well as for emotion identification.
Synsets are labeled with a set of 14 emotional categories, which are also related by antonymy relationships (examples are disgust/like, love/hate, sadness/joy): ambiguous, anger, anticipation, calmness, despair, disgust, fear, hate, hope, joy, like, love, sadness, surprise.

SentiFul and the Affect Database: Neviarouskaya et al. (2009b, 2011) developed the SentiFul lexicon by automatically expanding the core of the sentiment lexicon through several strategies:

• synonymy relation

• direct antonymy relation

• hyponymy relations

• derivation and scoring of morphologically modified words

• compounding of sentiment-carrying base components

In summary, 4190 new words connected to the known ones through semantic relations have been automatically extracted from WordNet, and 4029 lemmas have been automatically derived with the morphological method. The starting point of the work was the Affect Database of Neviarouskaya et al. (2007), which contained 364 emoticons, 337 popular acronyms and abbreviations (e.g. bl, “belly laughing”), words standing for communicative functions (e.g. “wow”, “ouch”, etc.), 112 modifiers (e.g. “very”, “slightly”, etc.) and 2438 direct and indirect emotion-related entries (1627 of which were taken from WordNet-Affect).
In the Affect Database, emotion categories (anger, disgust, fear, guilt, interest, joy, sadness, shame and surprise) and intensity (from 0.0 to 1.0) were manually assigned to each entry by three independent annotators.

Appraisal Lexicon: The appraisal lexicon (Argamon et al., 2009) has been built through semi-supervised learning on a set of WordNet glosses containing adjectives and adverbs; only a small subset of the training data was manually labelled.
The construction of the lexicon is grounded on Appraisal Theory, a linguistic approach to the analysis of subjective language. In this framework, several sentiment-related features are assigned to relevant lexical items. The feature set, which allows more subtle distinctions than the commonly used binary Positive/Negative classification, includes the labels orientation (Positive or Negative), attitude type (Affect, Appreciation, Judgment) and force (Low, Median, High or Max).

WordNet-Affect: WordNet-Affect is another extension of WordNet for the lexical representation of affective knowledge (Strapparava et al., 2004). Thanks to the synset model, it relates a set of affective concepts to a number of affective words. In the first step of the creation of WordNet-Affect, the affective core was built starting from almost 2000 terms directly or indirectly referring to mental states. Lexical and affective information was then added to each item by creating specific frames for it. In the end, the senses of terms have been


mapped to their respective synsets in WordNet. The synsets have been specialized with a hierarchically organized set of emotional categories (namely emotion, mood, trait, cognitive state, physical state, hedonic signal, emotion-eliciting situation, emotional response, behaviour, attitude, sensation). Furthermore, the emotional valences have been distinguished by four additional a-labels: positive, negative, ambiguous and neutral (Strapparava et al., 2006).
WordNet-Affect contains 2874 synsets and 4787 words (Strapparava et al., 2004).

3.2.3 Subjectivity Lexicons for Other Languages

3.2.3.1 Italian Resources

Most state-of-the-art work on polarity lexicons for Sentiment Analysis focuses on the English language. Thus, Italian lexical databases are mostly created by translating and adapting the English ones, SentiWordNet and WordNet-Affect among others. Basile and Nissim (2013) merged the semantic information belonging to existing lexical resources in order to obtain an annotated lexicon of senses for Italian, Sentix (Sentiment Italian Lexicon). Basically, MultiWordNet (Pianta et al., 2002), the Italian counterpart of WordNet (Miller, 1995; Fellbaum, 1998), has been used to transfer the polarity information associated with English synsets in SentiWordNet to Italian synsets, thanks to the multilingual ontology BabelNet (Navigli and Ponzetto, 2012).
Every Sentix entry is described by information concerning its part of speech, its WordNet synset ID, a positive and a negative score from SentiWordNet, a polarity score (from -1 to 1) and an intensity score (from 0 to 1).
The dictionary contains 59742 entries for 16043 synsets. Details


about its composition are reported in Table 3.3.

PoS          Lemmas   Synsets
noun          52257     12228
adjective      3359      1805
verb           2775      1260
adverb         1351       750
all           59742     16043

Table 3.3: Sentix (Basile and Nissim, 2013)

Moreover, Steinberger et al. (2012) verified a triangulation hypothesis for the creation of sentiment dictionaries in many languages, namely English, Spanish, Arabic, Czech, French, German, Russian and also Italian.
The starting point is the semi-automatic generation of two pilot sentiment dictionaries, for English and Spanish, which have been automatically transposed into the other languages by using the overlap of the translations (triangulation), and then manually filtered and expanded. As regards the Italian lexicon, the authors collected 918 correctly triangulated terms and 1500 correctly translated terms from the starting dictionaries.
Hernandez-Farias et al. (2014) achieved good results in the SentiPolC 2014 task by semi-automatically translating different lexicons into Italian, namely SentiWordNet, the Hu-Liu Lexicon, the AFINN Lexicon, the Whissell Dictionary, etc.
Baldoni et al. (2012) proposed an ontology-driven approach to Sentiment Analysis, where tags and tagged resources are related to emotions. They selected representative Italian emotional words and used them to query MultiWordNet. The synsets connected to these lemmas were then processed with WordNet-Affect, in order to populate the emotion ontology only with the words belonging to synsets that represented affective information. The resulting ontology contains about 450 Italian words referring to the 87 emotional categories of OntoEmotion, an ontology conceived for the categorization of emotion-denoting words. Furthermore, thanks to the SentiWordNet database, every synset has been associated with neutral, positive or negative scores.
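The triangulation step of Steinberger et al. (2012) mentioned above can be pictured as a set intersection. The sketch below is a simplified reading of “overlap of the translations”: the bilingual mappings are invented toy data, and the function name `triangulate` and its parameters are hypothetical.

```python
def triangulate(english_terms, spanish_terms, en_to_target, es_to_target):
    """Keep only the target-language terms reached from BOTH pilot dictionaries."""
    from_english = {t for term in english_terms for t in en_to_target.get(term, ())}
    from_spanish = {t for term in spanish_terms for t in es_to_target.get(term, ())}
    return from_english & from_spanish


# Invented toy translation tables (English -> Italian, Spanish -> Italian).
EN_TO_IT = {"happy": ["felice", "contento"], "sad": ["triste"]}
ES_TO_IT = {"feliz": ["felice"], "triste": ["triste", "mesto"]}

italian_terms = triangulate(["happy", "sad"], ["feliz", "triste"], EN_TO_IT, ES_TO_IT)
```

Candidates reached from only one source language (here “contento” and “mesto”) are discarded, which trades coverage for precision before the manual filtering stage.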

3.2.3.2 Sentiment Lexical Resources for Different Languages

SentiLex-PT (Silva et al., 2010, 2012) is a sentiment lexicon for Portuguese made up of 7014 lemmas, distributed in this way:

• 4779 adjectives

– e.g. castigado.PoS=Adj;TG=HUM:N0;POL:N0=-1;ANOT=JALC

• 1081 nouns

– e.g. aberração.PoS=N;TG=HUM:N0;POL:N0=-1;ANOT=MAN

• 489 verbs

– e.g. enganar.PoS=V;TG=HUM:N0:N1;POL:N0=-1;POL:N1=0;ANOT=MAN

• 666 verb-based idiomatic expressions

– e.g. engolir em seco.PoS=IDIOM;TG=HUM:N0;POL:N0=-1;ANOT=MAN

The sentiment entries are all modifiers of human nouns, compiled from different publicly available corpora and dictionaries.
The tagset used to describe the attributes of the lemmas is explained below:

• part-of-speech (ADJ, N, V and IDIOM);

• polarity (POL), which ranges between 1 (positive) and -1 (negative);

• target of polarity (TG), which corresponds to a human subject (HUM);

• polarity assignment (ANOT), which specifies whether the annotation has been performed manually (MAN) or automatically, by the Judgment Analysis Lexicon Classifier (JALC).
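To make the tagset concrete, the following sketch splits one entry into its attributes. It assumes that the lemma is separated from the attributes by a dot and that attributes are separated by semicolons, as in the distributed SentiLex-PT files; this layout is an assumption here, because the separators are not visible in the extracted text, and `parse_entry` is a hypothetical helper.

```python
def parse_entry(line):
    """Split a SentiLex-style entry into lemma and attribute fields.

    Assumed layout: lemma.ATTR=value;ATTR=value;...
    """
    lemma, _, attributes = line.partition(".")
    entry = {"lemma": lemma}
    for field in attributes.split(";"):
        key, _, value = field.partition("=")
        entry[key] = value
    return entry


entry = parse_entry("enganar.PoS=V;TG=HUM:N0:N1;POL:N0=-1;POL:N1=0;ANOT=MAN")
```

Polarity values are kept as strings keyed by argument position (`POL:N0`, `POL:N1`), so that a verb can carry a different polarity for each of its arguments.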


Unlike Silva et al. (2010, 2012), who focused on the domain-specific features of the opinion lexicon of social judgment, Souza et al. (2011) proposed a domain-independent lexicon for Portuguese, composed of 7077 polar words (adjectives, verbs and nouns) and expressions.
As regards the population of the dictionary, the authors applied three different methods:

• a corpus-based approach, applied to 1317 documents annotated using Pointwise Mutual Information;

• a thesaurus-based approach, applied to 44077 words and their semantic relations from the TEP thesaurus (Maziero et al., 2008);

• a translation-based approach, starting from Liu’s English Opinion Lexicon (Hu and Liu, 2004).

Other contributions that deserve to be cited are Perez-Rosas et al. (2012) for Spanish; Mathieu (1995, 1999a, 2006) for French; Clematide and Klenner (2010), Waltinger (2010) and Remus et al. (2010) for German; Abdul-Mageed et al. (2011) and Abdul-Mageed and Diab (2012) for Arabic; and Chetviorkin and Loukachevitch (2012) for Russian.
As regards Multilingual Sentiment Analysis, we can mention Balahur and Turchi (2012), Steinberger et al. (2012), Pak and Paroubek (2011) and Denecke (2008), among others.

3.3 SentIta and its Manually-built Resources

In this Section we describe in depth SentIta, the LG-based sentiment lexicon for the Italian language, which has been semi-automatically built on the basis of the richness of both the Italian module of Nooj and the Italian Lexicon-Grammar resources.
The tagset used for the Prior Polarity (Osgood, 1952) annotation of the resources is composed of four tags:

33 SENTITA AND ITS MANUALLY-BUILT RESOURCES 57

Lexical Item   Translation   Tag          Description          Score
meraviglioso   wonderful     +POS+FORTE   Intensely Positive   +3
colto          cultured      +POS         Positive             +2
rasserenante   calming       +POS+DEB     Weakly Positive      +1
nuovo          new           -            Neutral               0
insapore       flavourless   +NEG+DEB     Weakly Negative      -1
cafone         boorish       +NEG         Negative             -2
disastroso     disastrous    +NEG+FORTE   Intensely Negative   -3

Table 3.4: SentIta Evaluation tagset

Lexical Item   Translation   Tag      Description   Score
grande         big           +FORTE   Intense       +
velato         veiled        +DEB     Weak          -

Table 3.5: SentIta Strength tagset

POS: positive

NEG: negative

FORTE: intense

DEB: weak

Such labels, if combined together, generate an evaluation scale that goes from -3 to +3 (see Table 3.4) and a strength scale that ranges from -1 to +1 (see Table 3.5).
Neutral words (e.g. nuovo “new”, with score 0 in the evaluation scale) have been excluded from the lexicon.
The main difference between the words listed in the two scales is the possibility of using them as indicators in the subjectivity detection task. Basically, the words belonging to the evaluation scale are “anchors” that trigger the identification of polarized phrases or sentences, while the ones belonging to the strength scale are only used as intensity modifiers (see Section 4.2).
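The combination of the four tags into the evaluation scale of Table 3.4 can be expressed compactly. The sketch below is only an illustration of that mapping: the function name `evaluation_score` is hypothetical, and returning `None` for words without a polarity tag is a design assumption that reflects the fact that such words are not anchors.

```python
def evaluation_score(tags):
    """Map a combination of SentIta tags onto the -3..+3 evaluation scale.

    +POS and +NEG give the base scores +2 and -2; +FORTE and +DEB move the
    score one point away from or towards zero. Words that carry no polarity
    tag (pure intensity modifiers, neutral words) have no evaluation score.
    """
    tags = set(tags)
    if "POS" in tags:
        sign = 1
    elif "NEG" in tags:
        sign = -1
    else:
        return None
    magnitude = 2 + ("FORTE" in tags) - ("DEB" in tags)
    return sign * magnitude
```

For instance, the tags of meraviglioso (+POS+FORTE) yield +3 and those of insapore (+NEG+DEB) yield -1, while grande (+FORTE only) receives no evaluation score.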

Details about the lexical assets available for the Italian language are summarized in Table 3.6. Here we report all the items contained in


Category       Entries   Example             Translation
Adjectives        5381   allegro             cheerful
Adverbs           3693   tristemente         sadly
Compound Adv       774   a gonfie vele       full steam ahead
Idioms             577   essere in difetto   to be in fault
Nouns             3215   eccellenza          excellence
Psych Verbs        604   amare               to love
LG Verbs           651   prendersela         to feel offended
Bad words          182   leccaculo           arse licker
Tot              15077   -                   -

Table 3.6: Composition of SentIta

SentIta, including the ones automatically derived (namely adverbs and nouns), which will be described in detail in Section 3.5.
Because hand-built lexicons are more accurate than automatically built ones (Taboada et al., 2011; Bloom, 2011), we started the creation of the SentIta database with the manual tagging of part of the lemmas contained in the Nooj Italian dictionaries. In detail, adjectives and bad words have been manually extracted and evaluated starting from the Nooj Italian electronic dictionary of simple words, preserving their inflectional (FLX) and derivational (DRV) properties. Moreover, compound adverbs (Elia, 1990), idioms (Vietri, 1990, 2011), psych verbs and other LG verbs (Elia et al., 1981; Elia, 1984) have been weighted starting from the Italian LG tables, in order to maintain the syntactic, semantic and transformational properties connected to each of them.

3.3.1 Semantic Orientations and Prior Polarities

In the Sentiment Analysis field, the Semantic Orientation (SO) is a measure of subjectivity and opinion that weights the polarity (positive/negative) and the strength (intense/weak) of an opinion. Other


concepts used to refer to SO are sentiment orientation, polarity of opinion or opinion orientation (Taboada et al., 2011; Liu, 2010).
The Prior Polarity, instead, refers to the Semantic Orientation of individual words and differs from the SO because it is always independent of the context (Osgood, 1952). This second concept, generally used in sentiment dictionary population, has also been exploited to manually evaluate the words contained in the simple word dictionary of the Italian module of Nooj (Sdic_it.dic).
Before the description of the annotation process begins, it must be clarified that, at this stage of the work, the multiple meanings that a single word can possess have not been taken into account. In this sense, SentIta differs from the resources in which Prior Polarities are assigned to “concepts” rather than to mere words (e.g. SentiWordNet). Such a procedure gives rise to the need to reconstruct the Prior Polarities starting from the various word meanings, actually transforming them into Posterior Polarities (Gatti and Guerini, 2012). An example is the word “cold”, which possesses different meanings and, consequently, different polarities when associated with the words “beer” (with a low temperature, neutral) or “person” (emotionless, weakly negative) (Gatti and Guerini, 2012).
From our point of view, this distinction is redundant in a work that includes a syntactic module. We strongly support the idea that only the Prior Polarities must be contained in the lexicon, and that the context should be used to compute them in a parallel module, namely the syntactic one.
Indeed, our purpose is to reduce, at the same time, the presence of ambiguity, the computational time and the size of the electronic dictionaries. Therefore, the annotators of SentIta simply labeled the words of Sdic_it according to the meaning they considered dominant for each word. If, out of any context, a word such as “cold” cannot be unequivocally associated with a positive or negative orientation, we preferred to attribute it a neutral polarity.


3.3.2 Adjectives of Sentiment

The annotation of Sdic_it started with 34000+ adjectives, by considering each word apart from any context. The first decision is binary and regards the choice between positive and negative polarity, so the first tags, +POS or +NEG (which correspond to the scores +2 and -2 respectively), are attached to the adjective under examination, as shown in the following examples:

colto,A+FLX=N88+DRV=ISSIMO:N88+POS

cafone,A+FLX=N88+DRV=ISSIMO:N88+NEG

A fine-grained classification approach must also include, during the Prior Polarity attribution, the determination of the intensity of the oriented words. The strength of the SO is therefore defined through the combined use of the POS/NEG tags and the FORTE/DEB labels. The last two respectively increase or decrease by one point the intensity of the sentiment items. This way, as shown in Table 3.4, we could take into account three different degrees of positivity and as many levels of negativity.
As we will explain in more detail in Section 4.2, we selected among the adjectives of Sdic_it a list of about 650 words that did not receive any polarity tag, but have only been labeled with information about their intensity. This is the case of the items reported below, which are just examples from the SentIta Strength list:

grande,A+FLX=N79+DRV=GRANDE:N88+DRV=GRANDE-C:N79+FORTE

velato,A+FLX=N88+DRV=ISSIMO:N88+DEB
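As an aid to reading such entries, the sketch below separates the lemma, the category, the key=value properties and the bare semantic tags. It assumes the usual Nooj dictionary layout, with a comma after the lemma and a colon between a derivational suffix and its paradigm; these separators are an assumption here, since they are not visible in the extracted text, and `parse_line` is a hypothetical helper rather than part of Nooj.

```python
def parse_line(line):
    """Split a Nooj-style dictionary line into lemma, category, properties and tags.

    Assumed layout: lemma,CATEGORY+KEY=value+...+TAG
    """
    lemma, _, rest = line.partition(",")
    category, *features = rest.split("+")
    entry = {"lemma": lemma, "category": category, "properties": {}, "tags": []}
    for feature in features:
        if "=" in feature:
            key, value = feature.split("=", 1)
            # A key such as DRV may occur more than once, so values are collected in lists.
            entry["properties"].setdefault(key, []).append(value)
        else:
            entry["tags"].append(feature)
    return entry


grande = parse_line("grande,A+FLX=N79+DRV=GRANDE:N88+DRV=GRANDE-C:N79+FORTE")
```

Keeping the sentiment tags apart from the FLX and DRV properties makes it easy to hand them to a scoring function without touching the morphological information.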

Such words are not used to locate subjectivity in texts, but they become really advantageous when the task is to compute the actual orientation of a phrase. The words endowed with the tag +FORTE are called intensifiers, and the ones marked with +DEB downtoners (Taboada et al., 2011). Their function is respectively to increase or


decrease the score of the polarized words.
Details about the distribution of sentiment tags in the adjective dictionary are reported in Table 3.7.
Table 3.8 presents a summary, in terms of percentage values, of the composition of the adjective dictionary in SentIta. In the upper part of the Table, percentages are computed with respect to SentIta, which counts 5381 adjectives (indicated in the Table by tot oriented adj). The differences between the positive adjectives, the negative adjectives and the intensity modifiers are shown here.
The coverage of the SentIta adjectives with respect to the whole number of adjectives contained in Sdic_it seems, at first glance, to be very low, but it becomes more significant if compared with the other parts of speech.
For an LG treatment of adjectives see Section 3.3.5.

Score   Adjectives    Entries
+3      +POS+FORTE        111
+2      +POS             1137
+1      +POS+DEB          110
-1      +NEG+DEB          770
-2      +NEG             2375
-3      +NEG+FORTE        240
+       +FORTE            512
-       +DEB              126

Table 3.7: SentIta tag distribution in the adjective list

Adjectives         Entries   Percentage (%)
tot pos               1358   25 (SentIta)
tot neg               3385   63 (SentIta)
tot int                638   12 (SentIta)
tot oriented adj      5381   16 (Sdic_it)
tot neutral adj      28664   84 (Sdic_it)
tot adj Sdic_it      34045   100

Table 3.8: Adjectives percentage values in SentIta
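The figures of Table 3.8 can be recomputed from the tag counts of Table 3.7, as the following worked example shows. The counts are copied from the tables; the variable and function names are hypothetical, and rounding to the nearest integer is an assumption about how the percentages were obtained.

```python
# Entries per tag, copied from Table 3.7.
TAG_COUNTS = {
    "+POS+FORTE": 111, "+POS": 1137, "+POS+DEB": 110,
    "+NEG+DEB": 770, "+NEG": 2375, "+NEG+FORTE": 240,
    "+FORTE": 512, "+DEB": 126,
}
SDIC_IT_ADJECTIVES = 34045  # total number of adjectives in Sdic_it (Table 3.8)


def percentage(part, whole):
    """Percentage rounded to the nearest integer."""
    return round(100 * part / whole)


positive = sum(n for tag, n in TAG_COUNTS.items() if "POS" in tag)
negative = sum(n for tag, n in TAG_COUNTS.items() if "NEG" in tag)
modifiers = TAG_COUNTS["+FORTE"] + TAG_COUNTS["+DEB"]
oriented = positive + negative + modifiers
neutral = SDIC_IT_ADJECTIVES - oriented

summary = {
    "tot pos": percentage(positive, oriented),
    "tot neg": percentage(negative, oriented),
    "tot int": percentage(modifiers, oriented),
    "tot oriented adj": percentage(oriented, SDIC_IT_ADJECTIVES),
    "tot neutral adj": percentage(neutral, SDIC_IT_ADJECTIVES),
}
```

The first three percentages are relative to the 5381 SentIta adjectives, the last two to the whole adjective list of Sdic_it, which is why the two groups are reported with different bases in the Table.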


3.3.2.1 A Special Remark on Negative Words in Dictionaries and Positive Biases in Corpora

The most important aspect to notice in Table 3.8 is the fact that the positive items cover just 25% of the total amount of oriented adjectives, while negative adjectives, with a percentage of 63%, predominate in the dictionary.
Just like SentIta, other sentiment lexicons also present an excess of negative items, such as the SO-CAL lexicon (Taboada et al., 2011), AFINN-111 (Nielsen, 2011) and the Maryland dictionary (Mohammad et al., 2009).
In texts, instead, exactly the opposite trend can be observed. The Pollyanna hypothesis asserts the universal human tendency to use positive words more frequently than negative ones, through the popular claim “people tend to look on (and talk about) the bright side of life” (Boucher and Osgood, 1969, p. 1).
The experimental results of Montefinese et al. (2014) and Warriner et al. (2013) confirmed the so-called linguistic positivity bias: the prevalence of positive words in different languages.
Garcia et al. (2012) also measured this bias by quantifying, in the context of online written communication, both the emotional content of words in terms of valence and the frequency of word usage in the whole indexable web. They provided strong evidence that words with a positive emotional connotation are used more often.
As regards the experiments in the Sentiment Analysis field that tried to solve the positive bias, we mention Taboada et al. (2011), who, upon noticing the presence of almost twice as many positive as negative words in their corpus, addressed the problem by increasing the SO of any negative expression by a fixed amount (of 50%). Voll and Taboada (2007) overcame the same problem by shifting the numerical cut-off point between positive and negative reviews.
In this research, we chose to avoid the introduction of unpredictable errors due to the difficulty of accurately evaluating to what extent positive and negative items are differently distributed in written texts.
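For comparison, the correction attributed above to Taboada et al. (2011) amounts to a one-line reweighting. The sketch below is not part of SentIta, which deliberately avoids such corrections; the function name `adjust_for_positive_bias` and the parameter `negative_boost` are hypothetical.

```python
def adjust_for_positive_bias(so, negative_boost=0.5):
    """Increase the magnitude of a negative Semantic Orientation by a fixed proportion.

    Positive and neutral values are returned unchanged.
    """
    return so * (1 + negative_boost) if so < 0 else so
```

With the default value a negative expression scored -2 would count as -3.0, while a positive one scored +2 would keep its value.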

3.3.3 Psychological and other Opinion Bearing Verbs

In the present Section we will go through the more complex annotation of verbs of sentiment, starting from the available Lexicon-Grammar matrices. We will return to the adjectives of sentiment in Section 3.3.5, where the issue of the adjectivalization of the psychological predicates will be addressed.

3.3.3.1 A Brief Literature Survey on the Classification of Psych Verbs

The difference between the direct (Experiencer subject) and reverse (Experiencer object) representations of thematic roles is underlined in the extensive body of scientific literature on psych verbs, e.g. Chomsky (1965), Belletti and Rizzi (1988), Kim and Larson (1989), Jackendoff (1990), Whitley (1995), Pesetsky (1996), Filip (1996), Baker (1997), Martín (1998), Arad (1998), Klein and Kutscher (2002), Bennis (2004), Landau (2010).
The thematic roles involved are the following:

Experiencer: the entity or the person that experiences the reaction;

Cause: “the thing, person, state of affairs that causes the reaction” (Whitley, 1995) “or cognitive judgment in the experiencer” (Klein and Kutscher, 2002).

These two main classes⁸ correspond to the three classes proposed by Belletti and Rizzi (1988), in which the reverse representation is split according to the presence of a direct or an indirect object mapped to the role of experiencer:

⁸ These classes of psych verbs are also known as the Fear class (direct) and the Frighten class (reverse) (Jackendoff, 1990; Pesetsky, 1996; Baker, 1997), on the basis of the syntactic positions occupied by the thematic roles of the experiencer and the cause (theme).


Class I: verbs which have the experiencer as subject and the cause as direct object (e.g. temere, “to be afraid of”);

Class II: verbs which have the cause as subject and the experiencer as direct object (e.g. preoccupare, “to worry”);

Class III: verbs which have the cause as subject and the experiencer as indirect object in dative case (e.g. piacere, “to like”⁹).

Transitivity is also taken into account by Whitley (1995), who instead identified four classes of psych verbs for Spanish:

Class I: transitive-direct (e.g. amar, “to love”);

Class II: intransitive-direct (e.g. gozar de, “to enjoy”);

Class III: intransitive-reverse (e.g. gustar, “to like”);

Class IV: transitive-reverse (e.g. tranquilizar, “to appease”).

3.3.3.2 Semantic Predicates from the LG Framework

Shifting the focus to the works carried out within the Lexicon-Grammar framework, we find again the distinction between two types of possible constructions, on the basis of the syntactic position (subject or complement) of the person who feels the sentiment (Mathieu, 1999a; Ruwet, 1994), and, of course, the distinction between transitive and intransitive verbs.
The (reverse) French verbs of sentiment that enter into the transitive structure N0 V N1 have been examined by Mathieu (1995)¹⁰. In this structure N0, the stimulus of the sentiment, can be a human, concrete or abstract noun, or a completive or infinitive clause; and N1, the experiencer of the sentiment, is a human noun or also other animated

⁹ Note that while the Italian verb piacere is classified in this class of Belletti and Rizzi (1988), its English translation “to like” belongs to the first one.

¹⁰ The analysis of Mathieu (1995) is based on the syntactic model of the LG French Table 4.


nouns, such as animals or gods (Ruwet, 1994; see example 3).

(3) (Luc + Ce tableau + Le mensonge + Que Luc soit venu + Trembler) irrite Marie (Mathieu, 1995)
“(Luc + This painting + The lie + That Luc has come + To tremble) irritates Marie”

Nevertheless, Ruwet (1994) argued that a number of predicates from this class¹¹ select a Num in position N1 that does not play the role of Experiencer, as happens in example (5), differently from what happens in (4). According to Ruwet (1994), le ministre of (5) does not feel any sentiment: he is instead the object of a negative (public) judgment caused by the stimulus expressed by N0.

(4) La lecture d’Homère passionne Maxime (Ruwet, 1994)
“Reading Homer moves Maxime”

(5) Ses déclarations intempestives discréditent le ministre (Ruwet, 1994)
“His untimely declarations discredit the Minister”

One of the main purposes of this thesis is to provide a Lexicon-Grammar based database for Sentiment Analysis tools. We decided to start from the LG works on psychological predicates because of their richness, but we do not regard the Ruwet (1994) objection as a limit for our data collection: it simply offers a wider range of information, which we associate with our data. Sentiment Analysis does not focus only on mental states, but searches for every kind of positive or negative statement expressed in raw texts; therefore, a positive or negative (public) judgment also represents a valuable resource in this field.
The hypothesis is that the Predicates under examination can be distinguished into three subsets: the ones indicating a mental state,

¹¹ Among the verbs pointed out by Ruwet (1994) there are abuser, aigrir, anoblir, aplatir, avilir, berner, civiliser, compromettre, consacrer, corrompre, couler, dédouaner, démasquer, dénaturer, etc.


which select the roles experiencer and causer (amare “to love”, odiare “to hate”); the ones connected to the manifestation of judgments, which (following the Sentiment Analysis trend in terminology) involve the roles of opinion holder and target (e.g. difendere “to defend”, condannare “to condemn”); and the ones related to physical acts, which select the roles patient and agent (e.g. baciare “to kiss”, schiaffeggiare “to slap”)¹². The mapping between the syntactic representations of actants and the semantic representations of arguments is feasible and valuable, but from an LG point of view it becomes somewhat more complex than the generalizations described in the previous Paragraph. The main problem raised by the LG method is that the Syntax-Semantics mapping cannot ignore the idiosyncrasies of the lexicon.
In the following paragraphs we will show the distribution of Psychological Predicates among the classes 41, 42, 43 and 43B (Section 3.3.3.3), and we will demonstrate that a large number of predicates evoking judgment and physical positive/negative acts, but also psychological states, are distributed among 28 other LG classes, each of which possesses its own definitional syntactic structure. The large number of structures in which these verbs can enter underlines the importance of a sentiment database that includes both lexicon-based semantic and syntactic classification parameters.

3.3.3.3 Psychological Predicates

Among the verbs chosen for our sentiment lexicon, a prominent position is occupied by the Predicates belonging to the Italian LG classes 41, 42, 43 and 43B, which constitute 47% of the total amount of verbs in SentIta (Table 3.11). From the lexical items contained in these LG classes, a list of 602 entries has been evaluated and hand-tagged with the same labels used to evaluate the adjectives.

¹² An experiment carried out on these frames is reported in Section 5.3.


In the following Paragraphs we will describe some of the peculiarities of the psych verbs from the classes 41, 42, 43 and 43B, questioning in many cases their psychological nature. We will then briefly examine the presence of subjective verbs in 28 other LG classes.

Psych Verb   Translation    LG Class   Score
angosciare   “to anguish”   41         -3
piacere      “to like”      42         +2
amare        “to love”      43         +3
biasimare    “to blame”     43B        -2

Table 3.9: Examples from the Psych Predicates dictionary

LG Class   Tot Class   Psych Verbs   Positive items   Negative items   Intensifiers   Downtoners   Percentage (%)
41         599         525           136              325              45             19           88
42         114         8             4                4                0              0            7
43         298         39            19               20               0              0            13
43B        142         32            11               21               0              0            23
Tot        1153        604           170              368              45             19           52

Table 3.10: Psych Predicates chosen for SentIta
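The last column of Table 3.10 can be recomputed from the two count columns. The sketch below is only an illustrative check: it assumes that the percentage is the share of selected psych verbs over the size of each class, and the names `TABLE_3_10` and `coverage` are hypothetical helpers, not part of SentIta.

```python
# Counts copied from Table 3.10: class -> (verbs in the class, psych verbs chosen).
TABLE_3_10 = {"41": (599, 525), "42": (114, 8), "43": (298, 39), "43B": (142, 32)}


def coverage(table):
    """Return the rounded percentage of selected verbs per class, plus the total."""
    result = {
        name: round(100 * chosen / total)
        for name, (total, chosen) in table.items()
    }
    all_total = sum(total for total, _ in table.values())
    all_chosen = sum(chosen for _, chosen in table.values())
    result["Tot"] = round(100 * all_chosen / all_total)
    return result


if __name__ == "__main__":
    for name, percent in coverage(TABLE_3_10).items():
        print(name, percent)
```

Under this assumption the recomputed values coincide with the percentages reported in the table (88, 7, 13, 23 and 52 for the total).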

LG Class 41 Ch F V N1

The verbs from this LG class have as definitional structure N0 V N1, in which

• N0 is generally an infinitive or completive clause (direct or introduced by the phrase il fatto (Ch F + di) “the fact (that S + of)”), but with few exceptions it can also be a human, a concrete or an abstract noun;

68 CHAPTER 3 SENTITA LEXICON-GRAMMAR BASED SENTIMENT RESOURCES

SentIta Verbs   Tot Items   Positive items   Negative items   Intensifiers   Downtoners
Psych Verbs     604         170              368              45             19
Other Verbs     651         189              350              79             33
Tot             1255        359              718              124            52

Table 3.11: All the Semantic Predicates from SentIta

• N1 is a human noun (Num).

A great part of these verbs (436 entries), with homogeneous syntactic behavior, can be grouped together also because of a semantic homogeneity connected to their psychological nature (Elia 1984); examples of this subset are reported below in their dictionary form:

addolorare,V+NEG+Sent+FLX=V3+DRV=RI+41
ingelosire,V+NEG+DEB+Sent+FLX=V201+41
perturbare,V+NEG+Sent+FLX=V3+41
rallegrare,V+POS+Sent+FLX=V3+DRV=RI+41
rincuorare,V+POS+DEB+Sent+FLX=V3+41
tormentare,V+NEG+FORTE+Sent+FLX=V3+DRV=RI+41 ¹³

This Psychological subgroup, identified in the electronic dictionary through the tag +Sent, exactly like the French verbs from the LG class 4, enters into a reverse transitive psych-verb structure by selecting an Experiencer as object N1 and a Stimulus as N0, as exemplified in (6).

¹³ “To grieve, to make jealous, to upset, to cheer up, to reassure, to torment”


(6) [Sentiment [Stimulus Le sue parole] (addolorano[-2] + ingelosiscono[-1] + perturbano[-2] + rallegrano[+2] + rincuorano[+1] + tormentano[-3]) [Experiencer Maria]]
“His words (grieve + make jealous + upset + cheer up + reassure + torment) Maria”
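How the dictionary tags relate to the scores shown in (6) can be sketched in a few lines of Python. This is a minimal illustration, not the SentIta implementation: it assumes that each entry has the NooJ form `lemma,CATEGORY+feature+...`, and it infers the tag-to-score mapping (POS/NEG for the sign; FORTE, no intensity tag and DEB for the magnitudes 3, 2 and 1) from the examples above. The function names are hypothetical.

```python
def parse_entry(line):
    """Split 'lemma,POS+TAG+KEY=VALUE+...' into lemma, category, flags and attributes."""
    lemma, _, info = line.partition(",")
    category, *fields = info.split("+")
    flags, attrs = set(), {}
    for field in fields:
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
        else:
            flags.add(field)
    return {"lemma": lemma, "category": category, "flags": flags, "attrs": attrs}


def prior_score(flags):
    """Map orientation and intensity tags to a score in [-3, +3] (assumed scale)."""
    if "POS" in flags:
        sign = 1
    elif "NEG" in flags:
        sign = -1
    else:
        return 0
    magnitude = 3 if "FORTE" in flags else 1 if "DEB" in flags else 2
    return sign * magnitude


if __name__ == "__main__":
    entry = parse_entry("tormentare,V+NEG+FORTE+Sent+FLX=V3+DRV=RI+41")
    print(entry["lemma"], prior_score(entry["flags"]))
```

Keeping boolean tags (flags) apart from `KEY=VALUE` attributes mirrors the two kinds of information visible in the entries; the LG class appears as a bare flag (`41`) in this sample.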

The objection of Ruwet (1994) to the psychological semantic homogeneity of this class can also be applied to the Italian LG class 41. The extract of the dictionary reported below explains it better than words:

danneggiare,V+NEG+Op+FLX=V4+41
esautorare,V+NEG+Op+FLX=V3+41
glorificare,V+POS+Op+FORTE+FLX=V8+41
ledere,V+NEG+Op+FLX=V34+41
miticizzare,V+POS+Op+FLX=V3+41
smascherare,V+NEG+Op+DEB+FLX=V3+41 ¹⁴

122 items from the class 41 have been selected and marked with the tag +Op to indicate, in accordance with Ruwet (1994), that they select an N1 that does not play the role of the Experiencer, but represents just the Target of a positive or negative judgment. Differently from other verbs also tagged with the +Op label (e.g. condannare “to condemn”), here N0 is not the Opinion Holder, but just the Cause that justifies the opinion conveyed by the verb (see example 7). The Opinion Holder is never expressed by the arguments of the predicates from the class 41, because it can be played in turn by the public opinion or the people (Ruwet 1994).

¹⁴ “To damage, to reduce or deprive of the authority, to glorify, to wrong, to make into heroes, to unmask”


(7) [Opinion [Cause Le sue parole] (danneggiano[-2] + esautorano[-2] + glorificano[+3] + ledono[-2] + miticizzano[+2] + smascherano[-2]) [Target Maria]]
“His words (damage + reduce or deprive of the authority + glorify + wrong + make into heroes + unmask) Maria”

Another subgroup of the verbs from this class, marked in the LG table as accepting the property V= uso concreto (“V= concrete use”), also possesses a concrete meaning. This is the case of abbattere “to demolish” (fig. “to discourage”), accendere “to light” (fig. “to fire up”), colpire “to hit” (fig. “to move”), etc.
Of course, the semantic properties of the figurative verbs are not shared with their concrete homographs.
The automatic disambiguation of these metaphors is sometimes very simple, because it can be based on the divergent properties of the different LG classes to which the concrete and figurative words belong (8, 9).

Concrete usage (Class 20NR)

(8) Carla accende il fiammifero (Elia 1984)
“Carla lights the match”

Figurative usage (Class 41)

(9) Parlare di rivoluzione accende Arturo (Elia 1984)
“Talking about revolution fires up Arturo”

As exemplified, the verb accendere also belongs to the class 20NR, in which it accepts neither a completive as N0 (Parlare di rivoluzione accende il fiammifero “Talking about revolution lights the match”), nor a human noun as N1 (Carla accende il fratello “Carla lights her brother”) or a body part noun (Carla accende la gamba del fratello “Carla lights her brother’s leg”).
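The class-based disambiguation just described can be sketched as a compatibility check. The following is a simplified illustration under stated assumptions: the argument types (`"clause"`, `"human"`, `"body_part"`, `"concrete"`) are supposed to be already known from semantically annotated dictionaries, only the constraints mentioned in the text are encoded, and the function name is hypothetical.

```python
def compatible_classes(n0_type, n1_type):
    """Return the LG classes of 'accendere' compatible with the given argument types."""
    classes = []
    # Class 20NR (concrete reading): rejects a clausal N0 and a human or body-part N1.
    if n0_type != "clause" and n1_type not in {"human", "body_part"}:
        classes.append("20NR")
    # Class 41 (figurative reading): requires a human N1.
    if n1_type == "human":
        classes.append("41")
    return classes


if __name__ == "__main__":
    # (8) Carla accende il fiammifero
    print(compatible_classes("human", "concrete"))
    # (9) Parlare di rivoluzione accende Arturo
    print(compatible_classes("clause", "human"))
```

An empty result signals a combination that neither class licenses, as in the sentence with a body-part noun as N1.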

In other cases it proves more difficult, but it can be solved as well with the help of semantically annotated dictionaries. This happens in the examples below, in which the fact that this verb is marked in the class 20NR as accepting the property Prep N2strum makes it possible to automatically distinguish its concrete use (10) from the figurative one (11).

Concrete usage

(10) Carla accende il fuoco della (stufa + griglia + cucina)
“Carla lights the fire of the (stove + grill + cooker)”

Figurative usage

(11) Carla accende il fuoco del(la) (rivoluzione + cambiamento + passione)
“Carla ignites the fire of (revolution + change + passion)”

LG Class 42 Ch F V Prep N1

The verbs from the class 42 enter into an intransitive reverse psych-verb structure and are defined by the LG formula N0 V Prep N1, where

• N0 is generally an infinitive or completive clause (direct or introduced by the phrase il fatto (Ch F + di) “the fact (that S + of)”), but in few cases can also be a human (Num) or a non-restricted noun (Nnr);

• N1 is a human noun (Num), but can also be a body part or an abstract noun of the kind discorso “discourse”, mente “mind”, memoria “memory” (Elia 1984).

Here we find a selection of Psychological Predicates so small that it can be reported in full:

aggradare,V+POS+Sent+42+FLX=V403+Prep=a
arridere,V+POS+Sent+42+FLX=V34+Prep=a
dispiacere,V+NEG+Sent+42+FLX=V37+Prep=a
spiacere,V+NEG+Sent+42+FLX=V37+Prep=a
piacere,V+POS+Sent+42+FLX=V37+DRV=RI+Prep=a ¹⁵

The importance of this LG class in SentIta is not given by the (very small) number of entries it accounts for, but depends on the extremely high frequency of the verb piacere “to like”.
Here the prepositions (Prep) have been associated directly in the dictionary, in order to avoid ambiguity with other verb usages.
The semantic annotation does not raise any problem, as can be deduced from the example (12).

(12) [Sentiment [Stimulus (Il fatto che si fumi + Che si fumi + Fumare + Il fumare + Il fumo)] (aggrada[+2] + dispiace[-2] + piace[+2]) a [Experiencer Maria]]
“(The fact that people smoke + That people smoke + Smoking + The smoke) (pleases + displeases + is appreciated by) Maria”

The main difference with the class 41, in addition to its intransitivity, is the fact that in Italian a more easily accepted form is the permuted one (Elia et al. 1981; see example 13).

(13) [Sentiment A [Experiencer Maria] piace[+2] [Stimulus fumare]]
“Maria likes smoking”

Again, a subset of verbs from this class can be marked as opinion indicators.

¹⁵ “to please, to favor, to regret, to like”

Both N0 and N1 can be human nouns
incombere,V+NEG+Op+42+FLX=V21+Prep=loc+N0um+N1um
primeggiare,V+POS+Op+FLX=V4+Prep=su+N0um+N1um
contrastare,V+NEG+Op+FLX=V3+Prep=con+N0um+N1um
contare,V+POS+Op+FLX=V3+Prep=per+N0um+N1um

Just N1 can be a human noun
addirsi,V+42+POS+Op+FLX=V263+PRX+Prep=a+N1um
sconvenire,V+NEG+Op+FLX=V236+Prep=a+N1um
convenire,V+POS+Op+FLX=V236+Prep=a+N1um
costare,V+NEG+Op+FLX=V3+Prep=a+N1um

Neither N0 nor N1 can be human nouns
degenerare,V+NEG+FORTE+Op+42+FLX=V3+DRV=BILE:N79+Prep=in
stridere,V+NEG+Op+FLX=V2+Prep=con ¹⁶

Not every verb of this subset accepts a human noun as N1 (see examples 14 and 15).

(14) [Opinion [Target Queste idee] stridono[-2] con (i miei obiettivi + la tua immagine + *Maria)]
“These ideas are at odds with (my purposes + your image + Maria)”

(15) [Opinion [Target La situazione] degenera[-3] in un (dramma + conflitto + *Maria)]
“The situation degenerates into a (drama + conflict + Maria)”

In such cases the Opinion Holder remains implicit, but the judgment negatively affects the subject of the sentence instead of the N1.
Sentences like (16) and (17) raise the doubt whether N1 can be the Holder of the opinion expressed by the sentence or not.

¹⁶ “to loom over, to excel, to hinder, to matter, to suit, to be inconvenient, to be convenient, to cost, to degenerate, to clash”

(16) [Opinion [Target (Maria + Il fidanzamento + Che si sposti la riunione)] conta[+2] per [Opinion Holder Arturo]]
“(Maria + The engagement + That the meeting is moved) matters to Arturo”

(17) [Opinion [Target (*Maria + Il fidanzamento + Che si sposti la riunione)] (conviene[+2] + costa[-2]) [Opinion Holder ad Arturo]]
“(Maria + The engagement + That the meeting is moved) (is convenient + costs) for Arturo”

Actually, the verb does not provide any information regarding the active or passive role of N1 in the expression of the opinion. The writer can be convinced about what “matters, is convenient or costs” for Arturo, but it remains a belief that is not confirmed in the text (16, 17). Therefore, we chose to assign the Opinion Holder role to the N1 of the class 42 only in the case in which the object corresponds to the personal pronouns “me” or “us” (18, 19).

(18) [Opinion [Target (Maria + Il fidanzamento + Che si sposti la riunione)] conta[+2] per [Opinion Holder me]]
“(Maria + The engagement + That the meeting is moved) matters to me”

(19) [Opinion [Target (*Maria + Il fidanzamento + Che si sposti la riunione)] [Opinion Holder ci] (conviene[+2] + costa[-2])]
“(Maria + The engagement + That the meeting is moved) (is convenient + costs) for us”
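The role assignment chosen for the N1 of the class 42 amounts to a very small rule, sketched below. It is only an illustration: the set of first-person forms contains just the two pronouns attested in (18) and (19), and both the constant and the function name are hypothetical.

```python
# First-person forms attested in examples (18) and (19); other forms could be added.
FIRST_PERSON_FORMS = {"me", "ci"}


def n1_role(n1_token):
    """Label N1 as Opinion Holder only when it is a first-person pronoun."""
    if n1_token.lower() in FIRST_PERSON_FORMS:
        return "Opinion Holder"
    # For any other N1 (e.g. a proper noun) the holder is left unassigned.
    return None


if __name__ == "__main__":
    print(n1_role("me"))      # first-person object, as in (18)
    print(n1_role("Arturo"))  # third-person object, as in (16)
```

Returning no role for third-person objects reflects the observation that the text alone does not confirm whether such an N1 actually holds the opinion.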


LG Classes 43 and 43B N0 V Ch F and N0 V il fatto Ch F

The sentence structure that defines these classes is N0 V N1, in which

• N0, in both the classes 43 and 43B, is generally a human noun (Num);

• N1 is an infinitive clause or a completive one, which is direct for the 43 and introduced by il fatto (Ch F + di) “the fact (that S + of)” for the 43B.

The psychological (Sent), but also the opinion (Op), verbs from the class 43 share in many cases the acceptance or the refusal of a set of LG properties:

deprecare,V+Op+NEG+Class=43+FLX=V8
bestemmiare,V+Op+NEG+FORTE+Class=43+FLX=V11
criticare,V+Op+NEG+Class=43+FLX=V8
maledire,V+Op+NEG+FORTE+Class=43+FLX=V216
tollerare,V+Op+NEG+DEB+Class=43+FLX=V3
adorare,V+Sent+POS+FORTE+Class=43+FLX=V3
amare,V+Sent+POS+FORTE+Class=43+FLX=V3
ammirare,V+Sent+POS+Class=43+FLX=V3
odiare,V+Sent+NEG+FORTE+Class=43+FLX=V12
rimpiangere,V+Sent+NEG+DEB+Class=43+FLX=V31 ¹⁷

As regards the distribution of N0, it must be noticed that all these verbs accept a human noun, but only two of them (condannare “to condemn” and meritare “to deserve”) also accept a non-restricted noun (Nnr, not active) or a subjective clause introduced by il fatto (Ch F + di). This fact is perfectly consistent with the animate nature of the Experiencer with respect to the Psychological verbs (20). However, it underlines that this class of verbs also selects an Opinion Holder as N0 (except for the verb meritare).

¹⁷ “to deprecate, to blaspheme, to criticize, to curse, to tolerate, to adore, to love, to admire, to hate”

(20) [Sentiment [Experiencer Maria] (adora[+3] + odia[-3]) [Stimulus (Arturo + che Arturo fumi + il fatto che Arturo fumi + il fumo)]]
“Maria (adores + hates) (Arturo + that Arturo smokes + the fact that Arturo smokes + the smoke)”

(21) [Opinion [Opinion Holder Maria] (critica[-2] + tollera[-1]) [Target (Arturo + che Arturo fumi + il fatto che Arturo fumi + il fumo)]]
“Maria (criticizes + tolerates) (Arturo + that Arturo smokes + the fact that Arturo smokes + the smoke)”

As far as the distribution of N1 is concerned, all the psych and opinion items accept the pronoun ciò “this” and the Ppv lo “it/him/her” (apart from inveire “to inveigh”). The only verbs that refuse Num as complement are auspicare “to hope for”, inveire “to inveigh”, lamentare “to lament”, malignare “to speak/think ill of”, meritare “to deserve”, agognare “to yearn for”, anelare “to long for”, delirare “to talk nonsense” and gustare “to taste”. The verbs that instead do not accept a N-um are fewer: inveire and delirare.
The property N1 dal fatto Ch F is refused by every verb except for apprezzare “to admire”, which, together with sopportare “to suffer”, desiderare “to desire” and bramare “to crave”, also accepts the property N1 da N2. Other properties refused in bulk are Ch N0 V essere Comp (save malignare, sospettare “to suspect”, temere “to be afraid of”), N1 Agg (save sospettare, temere), N1 = Agg Ch F (save sospettare), N0 V essere Cong (save sospettare and malignare), N1 = se F o se F (save benedire “to bless”, apprezzare and gustare) and Ch F a N2um (save approvare “to approve”).
Similar observations, but with an even higher homogeneity, can be made on the class 43B, which, differently from the 43, has the N1 always introduced by il fatto (Ch F + di).
The Psychological (Sent) and the opinionated (Op) verbs from the class 43B sometimes share the acceptance or the refusal of a set of LG properties:

attaccare,V+Op+NEG+Class=43B+FLX=V8
biasimare,V+Op+NEG+Class=43B+FLX=V3
lodare,V+Op+POS+Class=43B+FLX=V3
osannare,V+Op+POS+FORTE+Class=43B+FLX=V3
snobbare,V+Op+NEG+Class=43B+FLX=V3
assaporare,V+Sent+POS+Class=43B+FLX=V3
pregustare,V+Sent+POS+Class=43B+FLX=V3
schifare,V+Sent+NEG+FORTE+Class=43B+FLX=V3
sdegnare,V+Sent+NEG+FORTE+Class=43B+FLX=V16
spregiare,V+Sent+NEG+FORTE+Class=43B+FLX=V4 ¹⁸

Here again, an N0 different from a human noun can be accepted by just three verbs (banalizzare “to trivialize”, demitizzare “to unidealize” and pregiudicare “to compromise”).
The pronoun ciò “this”, the Ppv lo “it/him/her” and a non-human noun are accepted by all the Sent and Op verbs in complement position, while the N1um is refused by 2 psychological and 5 opinionated verbs.
As regards the properties refused by almost all the verbs, we count N1 dal fatto Ch F, N1 da N2 (save difendere “to defend”) and Ch F a N2um (save apologizzare “to make an apology for” and vantare “to praise”).
Differently from the class 43, in which the passive transformation is not accepted in 3 cases, in the class 43B it is accepted by all the items.

¹⁸ “to attack, to blame, to praise, to snub, to taste, to foretaste, to be disgusted by, to be indignant, to despise”

3.3.3.4 Other Verbs of Sentiment

We stated in Paragraph 3.3.3.2 that almost half of the SentIta verbs do not coincide with what is defined in literature as “psych verb”. Moreover, a lot of psych verbs and other opinion-bearing items can be found in LG classes different from the ones classically identified as containing psychological predicates, namely 41, 42, 43, 43B (Elia 2014a, 2013). In detail, a large number of verbs that can be used as sentiment and opinion indicators is classified over 28 other LG classes, described by as many definitional structures. This fact could be interpreted as the failure of the syntactic-semantic mapping hypothesis, but actually, in the LG framework, the semantic entropy of the lexicon can be decreased through the identification of classes of words that share sets of lexical-syntactic behaviors.
The main Lexicon-Grammar verbal subclasses are the ones mentioned below:

Intransitive verbs: LG classes from 1 to 15, on which LG researchers tested the acceptance or the refusal of 545 combinatorial and distributional properties through 23827 elementary sentences;

Transitive verbs: LG classes from 16 to 40, tested on 361 properties among 19453 sentences;

Complement clause verbs (transitive and intransitive): LG classes from 41 to 60, tested on 443 properties and 57242 sentences (Elia 2014a).

Table 3.12 shows the percentage values of opinionated verbs in each one of the Lexicon-Grammar verbal subclasses mentioned above. As we can see, the complement clause group contains the largest number of oriented items. When inserted in our electronic dictionaries, the lemmas from these LG classes are enriched with semantic information concerning the Types (Sent: sentiments, emotions, psychological states; Op: opinions, judgments, cognitive states; Phy: physical acts), the Orientations (POS, NEG) and the Intensities (STRONG, WEAK) of the described lemmas.
Similarly to SentiLex (Silva et al. 2010, 2012), SentIta is provided with the specification of the argument(s) Ni semantically affected by the orientation of the verbs.
The presence of oriented verbs and the distribution of semantic tags are respectively described in Tables 3.13 and 3.14. For a complete presentation of LG verbal classes, their composition and their granularity, see Elia (2014a).

LG Class                   Verb usages   Oriented verbs   Percentages (%)
Intransitive verbs         401           138              21
Transitive verbs           605           205              31
Complement clause verbs    760           308              47
Tot                        1766          651              100

Table 3.12: Polar verbs in the main Lexicon-Grammar verbal subclasses

Main LG Class             LG Class   Verb usage   Oriented verbs   Percentages (%)
Intransitive Verbs        2          122          34               28
                          2A         40           9                23
                          4          31           12               39
                          9          72           35               49
                          10         56           19               34
                          11         80           29               36
Transitive Verbs          18         40           8                20
                          20I        25           1                4
                          20NR       83           15               18
                          20UM       145          65               45
                          21         29           2                7
                          21A        65           44               68
                          22         63           28               44
                          23R        34           13               38
                          24         72           18               25
                          27         37           10               27
                          28ST       12           1                8
Complement Clause Verbs   44         33           18               55
                          44B        44           20               45
                          45         31           23               74
                          45B        21           16               76
                          47         313          120              38
                          47B        34           12               35
                          49         96           12               13
                          50         51           41               80
                          51         36           19               53
                          53         28           9                32
                          56         73           18               25
Tot                       -          1766         651              37

Table 3.13: Polar verbs in other LG verb classes which contain at least one oriented item

Main LG Class             LG Class   Oriented Verbs   Sent   Op    POS   NEG   FORTE   DEB
Intransitive Verbs        2          34               7      24    8     26    0       0
                          2A         9                4      2     1     8     0       0
                          4          12               0      11    5     7     0       0
                          9          35               10     25    11    24    0       0
                          10         19               4      15    9     10    0       0
                          11         29               25     4     0     29    0       0
Transitive Verbs          18         8                0      0     4     4     0       0
                          20I        1                0      0     1     0     0       0
                          20NR       15               5      8     4     11    0       0
                          20UM       65               16     25    11    54    0       0
                          21         2                0      2     0     2     0       0
                          21A        44               9      14    17    6     11      10
                          22         28               0      28    21    7     0       0
                          23R        13               0      0     4     9     0       0
                          24         18               0      0     0     1     15      2
                          27         10               0      10    10    0     0       0
                          28ST       1                0      0     0     1     0       0
Complement Clause Verbs   44         18               0      18    16    2     0       0
                          44B        20               2      18    14    6     0       0
                          45         23               6      16    7     15    1       0
                          45B        16               5      10    7     9     0       0
                          47         120              0      120   5     54    43      18
                          47B        12               0      34    4     8     0       0
                          49         12               0      12    4     8     0       0
                          50         41               4      33    15    22    4       0
                          51         19               1      14    10    9     0       0
                          53         9                0      9     0     9     0       0
                          56         18               1      9     1     9     5       3
Tot                       -          651              99     461   189   350   79      33

Table 3.14: Distribution of semantic labels into the LG verb classes that contain at least one oriented item

3.3.4 Psychological Predicates’ Nominalizations

The nominal entities of SentIta that will be described in this Section are all predicative forms, due to their correlation with the verbal predicates described above, e.g. angosciare “to anguish”, angoscia “anguish” (D’Agostino 2005).
The complex phenomenon related to the combinatorial constraints between sentiment V-n and specific verbs in simple sentences has already been explored by Balibar-Mrabti (1995) and Gross (1995) for the French language. In this regard, Gross (1995) specified that “les restrictions de noms de sentiment à certains verbes sont nombreuses et tellement inattendues que dans un premier temps nous les avions traitées comme des expressions figées, c’est-à-dire comme de simples listes”¹⁹.
As regards Italian works in the LG framework, we mention the works of D’Agostino (2005), D’Agostino et al. (2007) and Guglielmo (2009), which respectively explored the lexical micro-classes of the sentiments ira “anger”, angoscia “anguish” and vanità “vanity”, studying their nominal manifestation in support verb constructions.
Differently from ordinary verbs (discussed here in Section 3.3.3), support verbs do not carry any semantic information: this role is played here by the sentiment nouns, but support verbs can modulate it to a certain extent.
In the present research, the Nominalizations of Psychological verbs have been used to manually start the formalization of the SentIta dictionary dedicated to nouns. It comprises 1000+ entries and includes a list of 80+ human nouns, evaluated and added to this dictionary in order to use them in sentences like N0um essere un N1sent (e.g. Max è un attaccabrighe “Max is a cantankerous person”).
Table 3.15 shows one example of Psych V-n for each LG class taken into account during the annotation of Psychological Predicates.

Psych verb    V-n         V-n Translation   LG Class   Score
angosciare    angoscia    “anguish”         41         -3
piacere       piacere     “pleasure”        42         +2
amare         amore       “love”            43         +3
biasimare     biasimo     “blame”           43B        -2

Table 3.15: Description of the Nominalizations of the Psychological Semantic Predicates

A sample of the entries from this dictionary is given below. The LG class is associated to a given item by the property “Class”, which lets the grammar recognize the proper syntactic structures in which the nominalizations occur in real texts.

¹⁹ “The restrictions of nouns of sentiment to certain verbs are numerous and so unexpected that at first we treated them as frozen expressions, that is to say, as mere lists”. Author’s translation.

LG Class 41
bellezza,N+POS+Class=41+DASOLO+FLX=N41
calore,N+POS+Class=41+FLX=N5
contentezza,N+POS+Class=41+DASOLO+FLX=N41
dolcezza,N+POS+Class=41+DASOLO+FLX=N41
fascino,N+POS+Class=41+DASOLO+FLX=N5 ²⁰

LG Class 42
dispiacere,N+FLX=N5+42+NEG+DASOLO
dispiacevolezza,N+FLX=N41+42+NEG+DASOLO
dispiacimento,N+FLX=N5+42+NEG+DEB+DASOLO
piacere,N+FLX=N5+42+POS+DASOLO
piacevolezza,N+FLX=N41+42+POS+DASOLO ²¹

LG Class 43
approvazione,N+FLX=N46+43+POS+DASOLO
benedizione,N+FLX=N46+43+POS+DASOLO
bestemmia,N+FLX=N41+43+NEG+DASOLO
bramosia,N+FLX=N41+43+NEG+DEB+DASOLO
condanna,N+FLX=N41+43+NEG+DASOLO ²²

LG Class 43B
denigrazione,N+FLX=N46+43B+NEG+FORTE+DASOLO
derisione,N+FLX=N46+43B+NEG+DASOLO
dileggio,N+FLX=N12+43B+NEG+DASOLO
disdegno,N+FLX=N5+43B+NEG+FORTE+DASOLO
disprezzo,N+FLX=N5+43B+NEG+FORTE+DASOLO ²³

Table 3.16 presents the details about the distribution of sentiment tags into the Psych V-n dictionary. The size of the nouns dictionary is larger than the verbs’ one. This happens because the same predicate can possess more than one nominalization (e.g. the verb amare, from the LG class 43, is related to the V-n amabilità, amore, amorevolezza).

LG Class   Nouns   Positive Items   Negative Items   Intensifiers   Downtoners   Percentages (%)
41         856     245              542              41             28           77
42         11      5                6                0              0            1
43         170     57               113              0              0            15
43B        74      21               53               0              0            7
Tot        1111    328              714              41             28           100
%          100     30               64               4              3            -

Table 3.16: Nominalizations of the Psych Predicates chosen from SentIta

²⁰ “Beauty, warmth, happiness, kindness, charm”
²¹ “Sorrow, displeasure, regret, pleasure, pleasantness”
²² “Approval, blessing, curse, yearning, condemnation”
²³ “Detraction, derision, mockery, disdain, contempt”

3.3.4.1 The Presence of Support Verbs

To include V-n in a sentiment lexicon is particularly risky. In many cases it is misleading to use them out of any context, because they often have a specific SO only if used with particular support verbs. Among the Vsup considered, we mention the following, with their equivalents (D’Agostino 2005; D’Agostino et al. 2007):

• essere in “to be in”, patire di, soffrire di “to suffer from”, stare in “to be in”, andare in “to go in”;

• avere “to have”, avvertire “to perceive”, patire, soffrire “to suffer”, tenere “to have”, sentire, provare “to feel”, subire “to undergo”, nutrire, covare “to harbor”, provare un (senso + sentimento) di “to feel a (sense + sentiment) of”;

• causare “to cause”, dare “to give”, suscitare “to raise”, provocare “to provoke”, creare “to create”, alimentare “to instigate”, stimolare “to inspire”, sviluppare “to develop”, incutere “to instill”, sollecitare “to urge to”, donare, regalare, offrire “to offer”.

Sometimes it is even possible to have an SO switch by changing the word context. For example, the noun pietà “pity” and “compassion” is strongly negative in a fixed expression with the support verb fare (22), but it becomes positive when it co-occurs with the sentiment expression of (23).

(22) È proprio la sceneggiatura che fa pietà [-3]
“That’s the script that is pitiful”

(23) Mary prova un sentimento di pietà [+2]
“Mary feels a sentiment of compassion”

Systematically evaluating, through LG tables, the relationships that exist between each one of the psych nouns and each one of the support verbs that can be combined with them leads to an important conclusion: the fact that the support verb variants can influence the polarity of the V-n with which they occur is not an exception. For example, extensions of Vsup like subire “to undergo”, covare “to harbor”, incutere “to instill” bring a negative orientation with them; donare, regalare, offrire “to offer” are positively connoted; perdere “to lose”, abbandonare “to abandon”, lasciare “to leave” cause a polarity switch, whatever the psych noun orientation.
Support verbs like the ones cited above cannot be simply included into a Vsup list, but must be inserted in specific paths of a noun grammar able to correctly manage their influence on V-n.
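The three behaviors of support verb extensions can be illustrated with a short sketch. It is not the grammar used in SentIta: the verb sets contain only the examples cited above, and the way a negatively or positively connoted Vsup combines with the noun score (the magnitude is kept and the sign is forced) is an assumption made here for illustration; the text itself only specifies the direction of the influence.

```python
# Example verbs taken from the text; the lists are not exhaustive.
NEGATIVE_VSUP = {"subire", "covare", "incutere"}
POSITIVE_VSUP = {"donare", "regalare", "offrire"}
SWITCHING_VSUP = {"perdere", "abbandonare", "lasciare"}


def vsup_adjusted_score(vsup, noun_score):
    """Adjust the prior score of a psych noun according to its support verb."""
    if vsup in SWITCHING_VSUP:
        return -noun_score       # polarity switch, whatever the noun orientation
    if vsup in NEGATIVE_VSUP:
        return -abs(noun_score)  # assumed: keep the magnitude, force a negative sign
    if vsup in POSITIVE_VSUP:
        return abs(noun_score)   # assumed: keep the magnitude, force a positive sign
    return noun_score            # other support verbs leave the score unchanged


if __name__ == "__main__":
    print(vsup_adjusted_score("perdere", 2))
    print(vsup_adjusted_score("avere", 2))
```

Encoding the three groups as separate sets corresponds to the idea that these verbs need dedicated paths in the grammar rather than a single undifferentiated Vsup list.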

3.3.4.2 The Absence of Support Verbs

Another problem arises when a psych predicative noun from our collection presents different semantic behaviors when occurring without support verbs in real texts. This is the case of calore “warmth”, from the verb riscaldare “to warm up” of the LG class 41, which possesses its psychological figurative meaning only together with specific structures (24), but assumes a physical interpretation when it occurs alone (26). Different is the case of contentezza “happiness”, which preserves its psychological meaning in both cases (25, 27).

(24) Le tue parole invadono[+] Maria di calore[+2] [+3]
“Your words warm up Mary”

(25) Le tue parole invadono[+] Maria di contentezza[+2] [+3]
“Your words overwhelm Mary with happiness”

(26) Che calore[+2] [0]
“What a heat”

(27) Che contentezza[+2] [+2]
“What a joy”

We solved the problem in a simple way: by means of the special tag +DASOLO “alone”, we distinguish in the dictionaries, and recall in the grammars, the cases in which the nouns can occur alone from the cases in which a specific Vsup context is required to have a particular SO.
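The effect of the +DASOLO tag on examples (26) and (27) can be sketched as follows. This is a minimal illustration under the assumption that a noun without the tag contributes a neutral score when no support verb context is present; the function name and the flag sets are hypothetical.

```python
def standalone_score(prior, flags, has_vsup_context):
    """Return the score a psych noun contributes, given its tags and its context."""
    if has_vsup_context or "DASOLO" in flags:
        return prior
    # Without +DASOLO the noun needs a Vsup context to keep its orientation.
    return 0


if __name__ == "__main__":
    calore = {"POS"}                  # calore: no +DASOLO in the dictionary
    contentezza = {"POS", "DASOLO"}   # contentezza: tagged +DASOLO
    print(standalone_score(2, calore, has_vsup_context=False))       # as in (26)
    print(standalone_score(2, contentezza, has_vsup_context=False))  # as in (27)
```

The intensification contributed by the verb in (24) and (25) is deliberately left out of this sketch, which only covers the presence or absence of a licensing context.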

3.3.4.3 Ordinary Verbs Structures

In order to examine in more depth how the presence of different verbs can affect the polarity of the Psychological V-n, we translated the list of 61 verbs of Balibar-Mrabti (1995) from the French and annotated them with semantic information. The sentence structure for which they have been conceived is N0sent V N1um (e.g. una gioia intensa riempie Max “an intense joy fills Max”), in which the subject N0 plays the role of Nsent. Then they have been tested in the structure N0 V N1um di N2sent (e.g. le tue parole riempiono Max di una gioia intensa “your words fill Max with an intense joy”), which in turn is related to the passive Max è pieno di una gioia intensa “Max is full of an intense joy”.
This list of corresponding Italian verbs contains 47 entries, because of the exclusion of the duplicates from the LG class 41²⁴. Among them, 11 give the Nsent a negative connotation (e.g. abbattere “to lose heart”, offuscare “to confuse”, corrodere “to corrode”), 6 confer on it a positive orientation (e.g. risplendere “to shine”, cullare “to cling”, illuminare “to light up”), 18 intensify its polarity (e.g. travolgere “to overwhelm”, attraversare “to undergo”, riempire “to fill”) and 12 simply possess a neutral orientation.
Their role can become crucial in differently oriented sentences that make use of the same Nsent, like (28a, b, c).

(28a) L’amore[+2] tormenta[-3] Maria [-3]
(28b) L’amore[+2] illumina[+2] Maria [+2]
(28c) L’amore[+2] travolge[+] Maria [+3]
“The love (torments + lights up + overwhelms) Maria”

Therefore, for the sake of simplicity, we decided to put them into an FSA that assigns as sentence polarity the score of the verbs with the tags NEG/POS (28a, b) and that uses the verbs with the tag FORTE as intensifiers (28c). A simplified version of this grammar is displayed in Figure 3.1. Here the paths marked with the number (1) represent the negative rule (28a), the paths (2) represent the positive rule (28b) and the paths marked with (3) exemplify the intensification rule (28c).
In this dictionary/grammar pair we did not mark the verbs with special tags; consequently, other Psychological Predicates endowed with polarity and intensity tags can be matched in the corresponding paths.
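The three rules of the FSA can be restated procedurally, as in the sketch below. It is not the NooJ grammar of Figure 3.1, only an approximation under stated assumptions: verbs are represented by their tag sets, a verb carrying POS or NEG imposes its own score, a verb carrying only FORTE moves the noun score one step away from zero (capped at 3, as in 28c), and all function names are hypothetical.

```python
def tag_score(flags):
    """Score implied by orientation and intensity tags (assumed scale from -3 to +3)."""
    if "POS" in flags:
        sign = 1
    elif "NEG" in flags:
        sign = -1
    else:
        return 0
    magnitude = 3 if "FORTE" in flags else 1 if "DEB" in flags else 2
    return sign * magnitude


def sentence_score(noun_score, verb_flags):
    """Combine the score of an Nsent subject with its semantically annotated verb."""
    if "POS" in verb_flags or "NEG" in verb_flags:
        return tag_score(verb_flags)  # rules (1) and (2): the verb score wins
    if "FORTE" in verb_flags and noun_score != 0:
        sign = 1 if noun_score > 0 else -1
        return sign * min(abs(noun_score) + 1, 3)  # rule (3): intensification
    return noun_score  # neutral verbs leave the noun score unchanged


if __name__ == "__main__":
    print(sentence_score(2, {"NEG", "FORTE"}))  # (28a) tormentare
    print(sentence_score(2, {"POS"}))           # (28b) illuminare
    print(sentence_score(2, {"FORTE"}))         # (28c) travolgere
```

Because the rules are driven by tags rather than by a closed verb list, any other predicate carrying the same polarity and intensity tags would follow the same paths.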

²⁴ The verbs from this class are all compatible with this structure, because they accept an abstract noun in the subject position and always require a human noun in position N1.

Figure 3.1: FSA for the interaction between Nsent and semantically annotated verbs

3.3.4.4 Standard-Crossed Structures

To conclude the work on sentiment V-n, we mention the case of the structures of the type “standard” (N0sent V Loc N1um) and “crossed” (N0um V di N1sent), exemplified below (D’Agostino 2005; Elia et al. 1981).

(29) l’amore[+2] brilla[+] negli occhi di Maria [+3]
“love shines in Maria’s eyes”

(30) gli occhi di Maria brillano[+] d’amore[+2] [+3]
“Maria’s eyes shine with love”

To formalize this group of sentiment expressions we made use of 12 verbs from the LG class 12, interpreted in a figurative way, in which the N1um “standard” position and the N0um “crossed” position are designated by their body parts Npc through a metonymic process (Elia et al. 1981).
As happens in (29) and (30), such verbs can work as intensifiers (e.g. bruciare “to burn”) or can also have a negative polarity influence (only in the case of dolere “to hurt”).

3.3.5 Psychological Predicates’ Adjectivalizations

Paragraph 3.3.2 described the manually-built collection of opinion-bearing adjectives, which counts more than 5000 items.
In this list, a subset of items in morpho-phonological relation with our predicates from the classes 41, 42, 43 and 43B has been automatically associated to the verbs of origin and to their respective LG classes. The result of this work is an electronic dictionary of 757 entries, realized with a morphological grammar of Nooj (see Figure 3.2). It is displayed in the sample below:

angoscianteA+FLX=N79+DRV=ISSIMON88+NEG+Va+V=angosciare+Class=41

angosciatoA+FLX=N88+DRV=ISSIMON88+NEG+Va+V=angosciare+Class=41

angosciosoA+FLX=N88+DRV=ISSIMON88+NEG+Va+V=angosciare+Class=41

piacenteA+FLX=N79+DRV=ISSIMON88+POS+Va+V=piacere+Class=42

piacevoleA+FLX=N79+DRV=ISSIMON88+POS+Va+V=piacere+Class=42

amatoA+FLX=N88+DRV=ISSIMON88+POS+Va+V=amare+Class=43

amorevoleA+FLX=N79+DRV=ISSIMON88+POS+Va+V=amare+Class=43

amorosoA+FLX=N88+DRV=ISSIMON88+POS+Va+V=amare+Class=43

biasimevoleA+FLX=N79+DRV=ISSIMON88+NEG+Va+V=biasimare+Class=43B25
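To show how the information packed into one of these entries can be read back, the following Python sketch splits an entry into its fields. It assumes the usual Nooj layout `lemma,CATEGORY+feature+KEY=value`; the comma after the lemma and the colon inside the `DRV` value are assumptions about the original formatting, and `parse_entry` is a hypothetical helper, not part of Nooj.

```python
from typing import Dict, List, Union

Entry = Dict[str, Union[str, List[str], Dict[str, str]]]


def parse_entry(line: str) -> Entry:
    """Split a dictionary line into lemma, category, flags and attributes."""
    lemma, _, info = line.strip().partition(",")
    category, *features = info.split("+")
    flags: List[str] = []          # bare tags such as NEG, POS, Va
    attrs: Dict[str, str] = {}     # KEY=value pairs such as V=angosciare
    for feature in features:
        if "=" in feature:
            key, value = feature.split("=", 1)
            attrs[key] = value
        else:
            flags.append(feature)
    return {"lemma": lemma, "category": category,
            "flags": flags, "attrs": attrs}


entry = parse_entry(
    "angosciante,A+FLX=N79+DRV=ISSIMO:N88+NEG+Va+V=angosciare+Class=41"
)
print(entry["lemma"], entry["flags"], entry["attrs"]["V"], entry["attrs"]["Class"])
```

Keeping the bare tags separate from the `KEY=value` pairs makes it easy to retrieve, for each adjective, both its sentiment tag and the verb and LG class it has been linked to.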

Table 3.17 gives information about the distribution of its semantic tags, while Table 3.18 compares the sentiment selection of V-a with the whole dictionary of sentiment adjectives of SentIta, in which it is contained.

Tag          41    42    43    43B   Tot
POS+FORTE     8     0     0     9     17
POS         133     3    35    11    182
POS+DEB      14     0     0     0     14
NEG+DEB      66     0     5     1     72
NEG         307     2    24    25    358
NEG+FORTE    49     0     6     2     57
FORTE        46     1     0     0     47
DEB           9     0     0     1     10
Tot         632     6    70    49    757

Table 3.17: Distribution of semantic tags in the dictionary of Adjectival Psych Predicates
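The aggregate shares reported for the psych adjectivalizations can be recomputed from the counts of Table 3.17. The sketch below is only a check of that arithmetic: the grouping of the tags into positive, negative and intensity-only items and the rounding to whole percentages are assumptions, while the figure 5381 is the size of the SentIta adjective dictionary given in the text.

```python
# Counts per LG class (41, 42, 43, 43B), copied from Table 3.17
TABLE_3_17 = {
    "POS+FORTE": (8, 0, 0, 9),
    "POS": (133, 3, 35, 11),
    "POS+DEB": (14, 0, 0, 0),
    "NEG+DEB": (66, 0, 5, 1),
    "NEG": (307, 2, 24, 25),
    "NEG+FORTE": (49, 0, 6, 2),
    "FORTE": (46, 1, 0, 0),
    "DEB": (9, 0, 0, 1),
}
SENTITA_ADJECTIVES = 5381  # size of the whole SentIta adjective dictionary


def group(tag: str) -> str:
    """Assumed grouping: POS* -> pos, NEG* -> neg, bare FORTE/DEB -> int."""
    if tag.startswith("POS"):
        return "pos"
    if tag.startswith("NEG"):
        return "neg"
    return "int"


def share(part: int, whole: int) -> int:
    """Percentage rounded to the nearest integer."""
    return round(100 * part / whole)


totals = {"pos": 0, "neg": 0, "int": 0}
for tag, counts in TABLE_3_17.items():
    totals[group(tag)] += sum(counts)
psych_total = sum(totals.values())

for name, count in totals.items():
    print(name, count, share(count, psych_total), share(count, SENTITA_ADJECTIVES))
print("psych", psych_total, share(psych_total, SENTITA_ADJECTIVES))
```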

The grammar resembles the ones used to derive the orientation of adverbs in -mente and quality nouns from the semantic information listed in the sentiment adjectives, which will be treated in Sections 3.5.2 and 3.5.3 respectively. The process is similar: we copy and paste the dictionary that has to be semantically enriched into a Nooj text and we annotate it through the instructions of the grammar. We then export the annotations and compile a brand new dictionary on their basis.

Adjectives     Entries   Percentage (%) on Adj Psych   Percentage (%) on SentIta
Tot pos          213                28                            4
Tot neg          487                64                            9
Tot int           57                 8                            1
Tot Psych        757               100                           14
Tot SentIta     5381                 -                          100

Table 3.18: Percentage values of Psych Adjectivalizations in SentIta

Because the operations involve just the two dictionaries, we exclude any other lexical (.nod) and morphological (.nom) resource from the Nooj Preferences before applying the grammar to the dictionary-text.
The grammar recognizes each verbal stem, checks if it belongs to the LG class of interest and then adds the adjectival suffixes listed in a .dic file (see Table 3.19). The difference from the grammars for the quick population of the adverbs in -mente and the quality nouns is that here the morphological strategy is not exploited to discover the prior polarity of the words, since the adjective dictionary was already tagged with sentiment information. In fact, in the final annotation the adjectives preserve all their semantic ($2S), inflectional and derivational ($2F) properties. The new information they acquire regards their nature of V-a (+Va), the connection with the psychological verb (+V=$1L) and the relative LG class of the predicate (+Class=41).
Regarding the adjectival suffixes for the deverbal derivation, we examined the ones that follow (Ricca 2004a):

25 “Anguishing, anguished, anguishous, attractive, pleasing, beloved, loving, amorous, blameworthy”

• morphemes for the formation of adjectives (-bile, -(t)orio, -evole, -(t)ivo, -(t)ore, -bondo, -ereccio, -iccio, -ido, -ile, -ndo, -tario, -ulo);

• morphemes for the formation of adjectives that coincide with the verb’s past or present participle (-to, -nte);

• not typically deverbal morphemes (-oso, -istico, -(t)ico, -aneo, -ardo, -ale, -ario);

• not typically adjectival morphemes (-(t)ore, -trice, -ista, -one).

Figure 3.2: Extract of the morphological FSA that associates psych verbs to their adjectivalizations

SFX               Adjectives
-t-o                 309
-nt-e                159
-(t)or-e             100
-(t)iv-o              38
-os-o                 38
-(t)ori-o             35
-evol-e               26
-o (conversion)       20
-(t)ic-o               8
-on-e                  6
-ist-a                 4
-istic-o               3
-al-e                  3
-il-e                  2
-bil-e                 1
-bond-o                1
-nd-o                  1
-tric-e                1
-icci-o                1
-ari-o                 1
-erecci-o              0
-id-o                  0
-tari-o                0
-ul-o                  0
-ane-o                 0
-ard-o                 0

Table 3.19: Suffixes from the dictionary of Psych Adjectivalizations
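The logic of the association step (recognize a verbal stem, then look for adjectives that extend it with one of the listed suffixes) can be approximated outside Nooj with a few lines of Python. This is a deliberately naive sketch, not the morphological grammar of Figure 3.2: the stem extraction, the reduced suffix list and the adjective `ambizioso` (added here as a hypothetical distractor) are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# A reduced list of adjectival endings taken from the suffixes discussed above
ADJECTIVAL_ENDINGS = ("to", "nte", "oso", "evole", "ivo", "orio", "ore")


def verb_stem(infinitive: str) -> str:
    """Strip the infinitive ending (-are, -ere, -ire) to get a rough stem."""
    for ending in ("are", "ere", "ire"):
        if infinitive.endswith(ending):
            return infinitive[: -len(ending)]
    return infinitive


def associate(verbs: Dict[str, str],
              adjectives: List[str]) -> List[Tuple[str, str, str]]:
    """Link each adjective to every verb whose stem and suffix it matches."""
    links = []
    for verb, lg_class in verbs.items():
        stem = verb_stem(verb)
        for adjective in adjectives:
            if adjective.startswith(stem) and adjective.endswith(ADJECTIVAL_ENDINGS):
                links.append((adjective, verb, lg_class))
    return links


verbs = {"angosciare": "41", "piacere": "42", "amare": "43", "biasimare": "43B"}
adjectives = ["angosciante", "angosciato", "angoscioso", "piacente",
              "piacevole", "amato", "amorevole", "amoroso", "biasimevole",
              "ambizioso"]  # hypothetical distractor, unrelated to "amare"

links = associate(verbs, adjectives)
for link in links:
    print(link)
```

With a short stem such as that of amare, plain prefix matching also links the unrelated `ambizioso`, which illustrates why an automatically produced list of this kind benefits from a manual check before being added to the lexicon.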

As can be seen in Table 3.19, some suffixes produced a large number of connections, while others were less productive. Six morphemes did not produce any connection.
Furthermore, a set of adjectives derived by conversion has also been automatically connected to the respective verbs:

LG class 43: maligno = malignare “mean = to speak ill of”
LG class 41: schifo = schifare “disgusting = to loathe”

Psych verb    Psych V-n   Psych V-a      V-a Translation   LG Class   Score
angosciare    angoscia    angosciante    “anguishing”      41         -3
                          angosciato     “anguished”
                          angoscioso     “anguishous”
piacere       piacere     piacente       “attractive”      42         +2
                          piacevole      “pleasing”
amare         amore       amato          “beloved”         43         +3
                          amorevole      “loving”
                          amoroso        “amorous”
biasimare     biasimo     biasimevole    “blameworthy”     43B        -2

Table 3.20: Adjectivalizations of the Psych Predicates

The average precision reached in the automatic association process is 97%. The list produced by Nooj has then been manually corrected, in order to add it to the SentIta resources without any possible source of errors.
Table 3.20 shows examples of the connection between the verbs from the tables 41, 42, 43 and 43B and their respective nominalizations and adjectivalizations.
Actually, nouns can be associated with adjectives also independently from verbs (Gross 1996), as happens in (31). Section 3.5.3 will deal with such cases.

(31) Luc (est généreux + a de la générosité) envers ses ennemis (Gross 1996)
“Luc (is generous + has generosity) for his enemies”

Basically, we chose to associate the psych adjectives with the respective verbs in order to preserve their argument/actant selection for Semantic Role Labeling purposes. This does not exclude that other adjectives that do not belong to this subset can also have the function of predicates in the sentences in which they occur, as happens in (31).

3.3.5.1 Support Verbs

As exemplified above, the sentiment expressions in which we inserted the adjectives from SentIta are of the kind N0 essere Agg val, where Agg val represents an adjective that expresses an evaluation (Elia et al. 1981) and the support verb for these predicates is essere “to be”.
This verb also gives its support to the expression N0 essere un N1-class (e.g. Questo film è una porcheria “This movie is a mess”), which without it, just like the adjectives, would not possess any mark of tense (Elia et al. 1981).
A great part of the compound adverbs also possess an adjectival function (e.g. a fin di bene “for good”, tutto rose e fiori “all peace and light”; see Section 3.4.1), so with good reason they have been included in this support verb construction as well.
The support verbs’ equivalents included in this case are the following (Gross 1996):

• aspectual equivalents: stare “to stay”, diventare “to become”, rimanere, restare “to remain”;

• causative equivalents: rendere “to make”;

• stylistic equivalents: sembrare “to seem”, apparire “to appear”, risultare “to result”, rivelarsi “to reveal to be”, dimostrarsi, mostrarsi “to show oneself to be”.

Among the Italian LG structures that include adjectives26, we selected the following, in which polar and intensive adjectives can occur with the support verb essere (Vietri 2004; Meunier 1984):

26 For the LG study of adjectives in French see Picabia (1978), Meunier (1999, 1984).

• Sentences with polar adjectives:

– N0 Vsup Agg Val27
L’idea iniziale era accettabile[+1]
“The initial idea was acceptable”

– V-inf Vsup Agg Val
Vedere questo film è demoralizzante[-2]
“Watching this movie is demoralizing”

– N0 Vsup Agg Val di V-Inf
La polizia sembra incapace[-2] di fare indagini
“The police seem unable to investigate”

– N0 Vsup Agg Val a N1
La giocabilità è inferiore[-2] alla serie precedente
“The playability is worse than in the preceding series”

– N0 Vsup Agg Val Per N1
Per me questo film è stato noioso[-2]
“In my opinion this movie was boring”

• Sentences with adjectives as noun intensifiers and downtoners:

– N0 Vsup Agg Int di N1
Una trama piena[+] di falsità[-2]
“A plot filled with mendacity”

27 We preferred to use in these examples the notation Agg Val rather than Psych V-a, because they can refer both to adjectivalizations of psych verbs and to other SentIta evaluative adjectives.

3.3.5.2 Nominal groups with appropriate nouns Napp

The support verb avere “to have” (and its equivalent tenere) has been observed in a transformation (Nb Vsup Na V-a) that involves a special kind of GN subject containing noms appropriés “appropriate nouns” Napp (Harris 1970; Guillet and Leclère 1981; Laporte 1997, 2012; Meydan 1996, 1999).
Citing (Laporte 2012, p. 1):

“A sequence is said to be appropriate to a given context if it has the highest plausibility of occurrence in that context, and can therefore be reduced to zero. In French, the notion of appropriateness is often connected with a metonymical restructuration of the subject”

and (Mathieu 1999b, p. 122):

“On considère comme substantif approprié tout substantif Na pour lequel, dans une position syntaxique donnée, Na de Nb = Nb”28

we can clarify that “the notion of highest plausibility of occurrence of a term in a given context” (Laporte 2012) should not be interpreted in statistical terms or proved by searches in corpora, but just intuitively defined through the paraphrastic relation Na di Nb = Nb.
According to (Meydan 1996, p. 198), “the adjectival transformations with Napp (nb (Na di Nb)Q essere V-a = Il fisico di Lea è attraente “The body of Lea is attractive”) can be put in relation through four types of transformations”, which also correspond to the structures included in our network of sentiment FSA. The obligatoriness of the modifiers and the appropriateness of the nouns are reflected in these transformations (Laporte 1997):

28 “It is considered to be an appropriate substantive each substantive for which, in a given syntactic position, Na of Nb = Nb”. Author’s translation.

• a nominal construction Vsup Napp:
Nb Vsup Na V-a
Lea ha un fisico attraente “Lea has an attractive body”

• a restructured sentence in which the GN subject is exploded into two independent constituents:
Nb essere V-a Prep Na
Lea è attraente (per il suo + di) fisico “Lea is attractive for her body”

• a metonymic sentence in which the Napp is erased:
Nb essere V-a
Lea è attraente “Lea is attractive”

• a construction in which the Napp is adverbalized:
Nb essere Na-mente V-a
Lea è fisicamente attraente “Lea is physically attractive”
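The four related structures can be produced from the same three constituents (Nb, Na, V-a). The following Python sketch only fills string templates: the function name, the template wording, the default article and the explicit `na_adverb` argument (the derivation of the -mente adverb is not computed) are assumptions made for illustration.

```python
from typing import Dict


def napp_paraphrases(nb: str, na: str, va: str, na_adverb: str,
                     article: str = "un") -> Dict[str, str]:
    """Return the four sentence types built on an appropriate noun (Napp).

    nb: the human or non-human noun, na: the appropriate noun,
    va: the evaluative adjective, na_adverb: the adverb derived from na.
    """
    return {
        "nominal": f"{nb} ha {article} {na} {va}",    # Nb Vsup Na V-a
        "restructured": f"{nb} è {va} di {na}",       # Nb essere V-a Prep Na
        "metonymic": f"{nb} è {va}",                  # Nb essere V-a
        "adverbial": f"{nb} è {na_adverb} {va}",      # Nb essere Na-mente V-a
    }


sentences = napp_paraphrases("Lea", "fisico", "attraente", "fisicamente")
for kind, sentence in sentences.items():
    print(f"{kind}: {sentence}")
```

Since all four outputs share the same adjective and the same Nb, a sentiment grammar can assign them the same polarity while recording Na, when present, as additional information.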

The concept of appropriate noun has a crucial importance not only in these adjectival expressions, but also when it appears as the object of psychological predicates (see Section 3.3.4), as exemplified by (Mathieu 1999b, p. 122):

N0 V (Dét N de Nhum)1 = N0 V (Nhum)1

Les soucis rongent l’esprit de Marie = Les soucis rongent Marie
“Worries torment the spirit of Mary = Worries torment Mary”

Moreover, in the Sentiment Analysis field, where the identification and the classification of the features of the opinion object even constitute a whole subfield of research (see Section 5.2), the Napp becomes a very advantageous linguistic device for the automatic feature analysis. See, for example, (32), in which Na (Napp) is the feature and Nb (N-um) is the object of the opinion.

(32) [Opinion [Feature (La risoluzione + L’obiettivo)] della [Object fotocamera] è eccellente[+3]]

(32a) [Opinion [Object La fotocamera] ha una [Feature (risoluzione + obiettivo)] eccellente[+3]]

(32b) [Opinion [Object La fotocamera] è eccellente[+3]]

(32c) [Opinion [Object La fotocamera] è eccellente[+3] per [Feature (la sua risoluzione + il suo obiettivo)]]

“(The image resolution + The lens) of the camera is excellent”
“The camera has an excellent (image resolution + lens)”
“The camera is excellent”
“The camera is excellent for its (image resolution + lens)”
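How the four variants in (32) lead to the same object and feature can be sketched with a handful of regular expressions. This is a toy approximation and not the FSA network described in this chapter: the patterns cover only the word orders of (32)-(32c), elided articles such as L’ are not handled, and the one-word lexicon and all names are assumptions.

```python
import re
from typing import Dict, Optional

POLARITY = {"eccellente": 3}  # toy lexicon; the score comes from example (32)

ART = r"(?:la|il|lo)"
PATTERNS = [
    # (32)  feature + di + object + essere + adjective
    re.compile(rf"^{ART} (?P<feature>\w+) (?:della|del|dello) (?P<object>\w+) è (?P<adj>\w+)$"),
    # (32a) object + avere + feature + adjective
    re.compile(rf"^{ART} (?P<object>\w+) ha (?:una|un|uno) (?P<feature>\w+) (?P<adj>\w+)$"),
    # (32c) object + essere + adjective + per + feature
    re.compile(rf"^{ART} (?P<object>\w+) è (?P<adj>\w+) per (?:la sua|il suo) (?P<feature>\w+)$"),
    # (32b) object + essere + adjective, with no feature expressed
    re.compile(rf"^{ART} (?P<object>\w+) è (?P<adj>\w+)$"),
]


def analyse(sentence: str) -> Optional[Dict[str, object]]:
    """Return object, feature (if any) and score for a matching sentence."""
    text = sentence.lower().strip()
    for pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            groups = match.groupdict()
            return {"object": groups["object"],
                    "feature": groups.get("feature"),
                    "score": POLARITY.get(groups["adj"])}
    return None


for s in ["La risoluzione della fotocamera è eccellente",
          "La fotocamera ha una risoluzione eccellente",
          "La fotocamera è eccellente",
          "La fotocamera è eccellente per la sua risoluzione"]:
    print(analyse(s))
```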

A useful classification of the nouns that can play the role of Napp, for both human and non-human nouns, and that can be exploited for the opinion feature classification, is the following (Meydan 1996):

Napp of Num

Npc = body parts

Npabs = personality trait

Ncomport = attitudes

Nprédr = intentions

Napp of N-um

Npo = shape

Nprop = ability

Dnom = model

3.3.5.3 Verbless Structures

Predicativity is not a property necessarily possessed by a particular class of morpho-syntagmatic structures (e.g. verbs, which carry information concerning person, tense, mood and aspect); instead, it is determined by the connection between elements (Giordano and Voghera 2008; De Mauro and Thornton 1985).
Also on the basis of their frequency in written and spoken corpora and in informal and formal speech, together with Giordano and Voghera (2008) we consider verbless expressions to be syntactically and semantically autonomous sentences, which can be coordinated and juxtaposed, and which can introduce subordinate clauses, just like verbal sentences.
Among the verbless sentences available in the Italian language, we are interested here in those involving adjectives indicating appreciation (Agg val), e.g. Bella questa! “Good one!” (Meillet 1906; De Mauro and Thornton 1985).
Below we report a selection of customer reviews from the dataset described in Section 5.1.3, which can give an idea of the diffusion of the verbless constructions in user generated contents:

• Movie Reviews: [Molto bravi gli interpreti] [la mia preferita la sorella combina guai] [Bella la fotografia] [i colori]29

• Car Reviews: [Ottimo l’impianto radio cd] [comodissimi i comandi al volante]30

• Hotel Reviews: [Posizione fantastica] si raggiungono a piedi molti luoghi strategici di Londra [molto curato nel servizio e nel soddisfare le nostre richieste] [ottimo servizio in camera]31

• Videogame Reviews: [Provato una volta] e [subito scartato] [odioso il modo in cui viene recensito]32

• Smartphone Reviews: [Utilissimo il sistema di spegnimento e riaccensione ad orari programmati] [Telefonia e sensibilità OK]33

• Book Reviews: [Azzeccatissima la descrizione anche psicologica e introspettiva dei personaggi]34

29 httpwwwmymoviesitfilm2013abouttimepubblicoid=680439
30 httpautociaoitNuova_Fiat_500__Opinione_1336287SortOrder1
31 httpwwwtripadvisoritShowUserReviews-g186338-d651511-r120264867-Haymarket_Hotel-London_Englandhtml

3.3.5.4 Evaluation Verbs

In this paragraph we mention the use of the verbs of evaluation Vval (Elia et al. 1981), which represent a subclass of the LG class 43, grouped together through the acceptance of at least one of the properties N1 = N1 Agg1 and N1 = Agg1 Ch F. Examples are giudicare “to judge”, trovare “to consider”, avvertire “to notice”, valutare “to evaluate”, etc.
Of course, the N1 Agg here can include a Napp that takes the shape of (Na di Nb)1 Agg, just as happens with the psychological predicates of Mathieu (1999b).
They present a different role attribution with respect to support verb constructions, since the N0 of the evaluative verbs is always an Opinion Holder (33).

(33) [Opinion [Opinion Holder Maria] considera ([Target Arturo] affascinante[+2] + affascinante[+2] [Target che Arturo venga])]
“Maria considers (Arturo fascinating + enchanting that Arturo comes)”

32 httpwwwamazonitreviewRK6CGST4C9LXI
33 httpwwwamazonitAlcatel-Touch-Smartphone-Dual-Italiaproduct-reviewsB00F621PPGpageNumber=3
34 httpwwwamazonitproduct-reviewsB007IZ1Q5SpageNumber=3

The N0 of the support verb construction, instead, is always the target of the opinion (34). The Opinion Holder remains here unexpressed.

(34) [Opinion [Target Arturo] (è + rimane + sembra) davvero[+] affascinante[+2]]
“Arturo (is + remains + seems) truly fascinating”

In causative constructions the roles are again different (35).

(35) [Opinion [Cause (Maria + L’acconciatura)] rende [Target Arturo] davvero[+] affascinante[+2]]
“(Maria + The hairstyle) makes Arturo truly fascinating”
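The role attribution contrasted in (33)-(35) depends only on the type of verb that governs the evaluative adjective, so it can be summarized as a lookup on the verb lemma. The sketch below is an assumption-laden simplification: the three sets contain only the lemmas mentioned in this section (with considerare taken from example (33)), and `subject_role` is a hypothetical helper, not part of the SentIta grammars.

```python
from typing import Optional

# Lemmas mentioned in this section; the sets are illustrative, not exhaustive
EVALUATION_VERBS = {"giudicare", "trovare", "avvertire", "valutare",
                    "considerare"}
SUPPORT_VERBS = {"essere", "stare", "diventare", "rimanere", "restare",
                 "sembrare", "apparire", "risultare", "rivelarsi",
                 "dimostrarsi", "mostrarsi"}
CAUSATIVE_VERBS = {"rendere"}


def subject_role(verb_lemma: str) -> Optional[str]:
    """Semantic role assigned to N0 according to the type of verb."""
    if verb_lemma in EVALUATION_VERBS:
        return "Opinion Holder"   # (33) Maria considera Arturo affascinante
    if verb_lemma in CAUSATIVE_VERBS:
        return "Cause"            # (35) Maria rende Arturo affascinante
    if verb_lemma in SUPPORT_VERBS:
        return "Target"           # (34) Arturo è affascinante
    return None                   # verb not covered by this sketch


for lemma in ("considerare", "sembrare", "rendere"):
    print(lemma, "->", subject_role(lemma))
```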

3.3.6 Vulgar Lexicon

The fact that “taboo language is emotionally powerful” (Zhou 2010, p. 8) cannot be ignored during the collection of lexical and grammatical resources for Sentiment Analysis purposes.
SentIta is provided with a collection of 182 bad words that include the following grammatical and semantic subcategories:

• Nouns: 115 entries, among which

– 30 are human nouns (e.g. rompiballe “pain in the ass”);

– 9 are nouns of body parts (e.g. cazzo “dick”).

• Verbs: 45 entries, among which

– 17 are pro-complementary and pronominal verbs (e.g. 4 +PRXCOMP, fottersene “to give not a fuck”, and 13 +PRX, incazzarsi “to get mad”);

– 17 allow the derivation of nouns in -bile “-ble” (e.g. trombabile “bangable” from trombare “to screw”);

– 4 are already included in the LG class 41 (e.g. fregare “to matter”).

• Adjectives: 16 entries, among which 10 are from SentIta (e.g. 5 +POS, cazzuto “die-hard”, and 5 +NEG, scoglionato “annoyed”).

• Adverbs: 4 entries (e.g. incazzosamente “grumpily”).

• Exclamations: 8 entries (e.g. vaffanculo “fuck off”).

Figure 3.3: Frozen and semi-frozen expressions that involve the vulgar word cazzo and its euphemisms

Although some vulgar terms and expressions, e.g. with the functions of interjections or discourse markers, can be semantically empty, the great part of them have metaphorical functions that are relevant for the correct analysis of the sentiment in real texts. Examples are offense, imprecation, intensification of negation, etc.
The scatological and sexual semantic fields are among the most frequent in our lexicon. The male and female nouns derived from human body parts can have different kinds of orientations and functions, which range from the insult (coglione “asshole”) to the praise (figata “cool thing”).
In accordance with the observations made by Velikovich et al. (2010) on web corpora, our lexicon involves a set of racial or homophobic derogatory terms (e.g. terrone35 “southern Italian”, culattone “faggot”).
The vulgar words in the SentIta database, especially the ones with an uncertain polarity, can be part of frozen or semi-frozen expressions that can make clear, for each occurrence, the actual semantic orientation of the words in context. A very typical Italian example is cazzo “dick”, with its more or less vulgar regional variants (e.g. minchia, pirla) and euphemisms (e.g. cavolo “cabbage”, cacchio “dang”, mazza “stick”, tubo “pipe”, corno “horn”, etc.). Figure 3.3 exemplifies the cases in which the context gives a negative connotation to the word(s) under examination. In detail, the FSA includes negative adverbial and adjectival functions (paths 1, 2, 5, 7), exclamations (paths 3, 6, 9), interrogative forms (6) and intensification of negations (paths 8, 9). Path (4) also exemplifies the frozen sentence from the LG class CAN (N0 V (C1 di N) = N0 V C1 a N2) that presents cazzo (Vietri 2011) as C1 frozen complement. This frozen sentence, in particular, is also related to some monorematic compound nouns and adjectives included in the bad word list of SentIta.
However, a Sentiment Analysis tool that works on user generated contents must be aware of the fact that, in colloquial and informal situations, a word like cazzo can work simply as a regular intensifier, also for positive sentences (e.g. è così bello[+2], cazzo[+]! [+3] “it’s fucking nice!”). This is why it received in the dictionary just the tag attributed to intensifiers (+FORTE). In order to be annotated in texts with other orientations, it must occur as a component of specific expressions.
As we can see in Table 3.21, although the majority of the entries in the bad words dictionary possess a negative connotation, 10% of them are positive (e.g. strafigo “supercool”). Moreover, the manual annotation of the whole list of bad words with sentiment tags confirmed the budding concept of the emotional power of taboo language: 84% of the vulgar lexicon is endowed with a defined semantic orientation (see Table 3.22).

35 Term used with insulting purposes only by Northern Italians.

Score   Bad Words     Entries
+3      +POS+FORTE       6
+2      +POS            12
+1      +POS+DEB         0
-1      +NEG+DEB         8
-2      +NEG           125
-3      +NEG+FORTE       6
+       +FORTE           3
-       +DEB             0

Table 3.21: SentIta tag distribution in the list of bad words

Bad Words      Entries   Percentage (%)
tot pos           18          10
tot neg          132          73
tot int            3           2
tot oriented     153          84
tot neutral       29          16
tot BW           182         100

Table 3.22: Bad Words percentage values in SentIta

Collateral Benefits of Exhaustive Vulgar Lexical and Grammatical Resources

The Web 2.0 offers Internet users the opportunity to freely share almost everything in online communities, hiding their own identity behind the shelter of anonymity.
Flaming, trolling, harassment, cyberbullying, cyberstalking and cyberthreats are all terms used to refer to offensive contents on the web. The shapes can be different and the focus can be on various topics, e.g. physical appearance, ethnicity, sexuality, social acceptance, etc. (Singhal and Bansal 2013), but they are all annoying (when not even dangerous, if the victims are children or teenagers) for the same reason: they make the online experience unpleasant.
The detection and filtering of offensive language on the web is strongly connected to the Sentiment Analysis field when one decides to handle it automatically. The occurrence of derogatory terms and phrases, or the unjustified repetition of words with a strongly negative connotation, can effectively help with this task.
Although taboo language is generally considered to be the strongest clue of harassment on the web (Xiang et al. 2012; Chen et al. 2012; Reynolds et al. 2011; Xu and Zhu 2010; Yin et al. 2009; Mahmud et al. 2008), it must be clarified that the presence of bad words in posts does not necessarily indicate the presence of offensive behaviors.
We already explained that the words in our collection of vulgar terms are, in some cases, neutral or even positive. Profanity can be used with comical or satirical purposes, and bad words are often just the expression of strong emotions (Yin et al. 2009).
Thus, the well-known censoring approaches based only on keywords, which simply match the words appearing in text messages with offensive words stored in blacklists, cannot be expected to reach high levels of accuracy.
Consistent with this idea, in recent years many studies on offensive language, cyberbullying and flame detection integrated the bad words’ context in their methods and tools.

Chen et al. (2012) exploited a Lexical Syntactic Feature (LSF) architecture to detect offensive content and identify potential offensive users in social media, introducing the concept of syntactic intensifier to adjust words’ offensiveness levels on the basis of their context.
Xiang et al. (2012) learned topic models from a dataset of tweets through the Latent Dirichlet Allocation (LDA) algorithm. They combined topical and lexical features into a single compact feature space.
Xu and Zhu (2010) proposed a sentence-level semantic filtering approach that combined grammatical relations with offensive words, in order to semantically remove all the offensive content from sentences.
Insulting phrases (e.g. get-a-life, get-lost, etc.) and derogatory comparisons of human beings with insulting items or animals (e.g. donkey, dog) were the clues used by Mahmud et al. (2008) to locate flames and insults in text.
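The limit of pure keyword matching discussed above can be made concrete with a small Python comparison between a blacklist filter and a lookup that also reads the orientation annotated in the lexicon. It is only a sketch: the four-word lexicon, the "pos"/"neg" labels (the negative label for coglione is assumed from its description as an insult) and the use of orientation as a crude proxy for offensiveness are assumptions, and none of the cited systems works this way.

```python
# Toy excerpt of a vulgar lexicon with an assumed orientation label
VULGAR_LEXICON = {
    "strafigo": "pos",     # "supercool"
    "cazzuto": "pos",      # "die-hard"
    "scoglionato": "neg",  # "annoyed"
    "coglione": "neg",     # insult (label assumed)
}


def blacklist_flag(message: str) -> bool:
    """Keyword approach: flag any message that contains a listed word."""
    return any(word in VULGAR_LEXICON for word in message.lower().split())


def orientation_flag(message: str) -> bool:
    """Flag a message only if a listed word has a negative orientation."""
    return any(VULGAR_LEXICON.get(word) == "neg"
               for word in message.lower().split())


for text in ("questo film è strafigo", "sei un coglione"):
    print(text, blacklist_flag(text), orientation_flag(text))
```

The first message is flagged by the blacklist even though its only vulgar word is a praise, while the second is flagged by both functions.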

3.4 Towards a Lexicon of Opinionated Compounds

3.4.1 Compound Adverbs and their Polarities

Elia (1990) adapted to the Italian language the formal definition of “adverb” of Jespersen (1965), who for the first time also included special kinds of phrases and sentences in the category of adverbs.
In detail, the structures included in this category are the following:

• traditional adverbs with a monolithic shape (e.g. volentieri “gladly”);

• traditional adverbs in -mente (e.g. impetuosamente “impetuously”);

• prepositional phrases (e.g. a vanvera “randomly”);

• phrases derived from sentences with finite or non-finite verbs (e.g. da levare il fiato “to take the breath away”).

The first two types of adverbs will be treated separately in Section 3.5.2, due to the fact that they have been automatically derived from the SentIta adjective dictionary.
Compound adverbs, or frozen or idiomatic adverbs (the last two points of Jespersen’s list), can be defined as

“(...) adverbs that can be separated into several words, with some or all of their words frozen, that is, semantically and/or syntactically noncompositional” (Gross 1986, p. 2).

Therefore, just as happens in polyrematic expressions, the syntactic elements that compose such adverbs cannot be freely shifted, substituted or deleted. This holds despite the fact that, from a syntactic point of view, they work just as simple adverbs.
Following the LG paradigm and according to their internal morphosyntactic structure, the compound adverbs have been classified in many languages, namely English (Gross 1986), French (Gross 1990; Laporte and Voyatzi 2008; Tolone and Stavroula 2011), Italian (Elia 1990; Elia De Gioia 1994a,b, 2001), Modern Greek (Voyatzi 2006), Portuguese (Baptista 2003), etc.
The LG description and classification of compound adverbs represents the intersection of two criteria, corresponding to the phenomenon of multiword expressions and to the adverbial functions (Tolone and Stavroula 2011; Laporte and Voyatzi 2008).
According to (Laporte and Voyatzi 2008, p. 32):

“(...) a phrase composed of several words is considered to be a multiword expression if some or all of its elements are frozen together, that is, if their combination does not obey productive rules of syntactic and semantic compositionality”

The adverbial functions, instead, pertain to the adverbial roles, which can take the shape of circumstantial complements or of complements which are not objects of the predicate of the clause in which they appear.
In LG descriptions, compound adverbs are formalized as a sequence of parts of speech and syntactic categories. Their morpho-syntactic structures are described on the basis of the number, the category and the position of the frozen and free components of the adverbials.
The Italian classification system for compound adverbs follows the same logic as the French one, with which it shares a strong similarity in terms of syntactic structures.
In this research, among the 16 classes of idiomatic adverbs selected by De Gioia (1994a,b), we worked on the ones reported in Table 3.23. The comparative structures with come “like”, treated by Vietri (1990, 2011) as idiomatic sentences, will be described in Section 3.4.2.
The adverbs belonging to the classes described in the first 9 rows of Table 3.23 have been manually selected and evaluated starting from the electronic dictionaries and the LG tables of Elia (1990), while the classes PF, PV and PCPN come from the LG tables of De Gioia (2001).
In the first classes, Elia (1990) recognized the following three abstract structures:

1. Prep (E+Det) (E+Agg) C1 (E+Agg)

(a) PC

(b) PDETC

(c) PAC

(d) PCA

2. Prep (E+Det+Agg) C1 (E+Agg) Prep (E+Det+Agg) C2 (E+Agg)

(a) PCDC

(b) PCPC

3. (E+Prep) (E+Det) C1 (E+Cong+Prep) (E+Det+Agg) C2

(a) PCONG

(b) CC

(c) CPC

Class    Structure           Example                   Translation
PC       Prep C              contro voglia             unwillingly
PDETC    Prep Det C          con le cattive            by any means necessary
PAC      Prep Agg C          a pieni voti              with flying colors
PCA      Prep C Agg          a testa alta              with your head held high
PCDC     Prep C di C         con cognizione di causa   with full knowledge
PCPC     Prep C Prep C       di bene in meglio         from good to better
PCONG    (Prep) C Cong C     d’amore e d’accordo       in harmony
CC       C C                 niente male               nothing bad
CPC      C Prep C            l’ira di Dio              a king’s ransom
PF       Compound Sentence   si salvi chi può          every man for himself
PV       Prep V W            seduta stante             on the spot
PCPN     Prep C Prep N       in onta a                 in spite of

Table 3.23: Classes of Compound Adverbs in SentIta
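Since the class labels are mnemonic abbreviations of the sequences of categories, the assignment of a compound adverb to a class can be sketched as a lookup on its tag sequence. The Python fragment below is only illustrative: it covers the nine classes derived from Elia (1990), assumes that each word has already been tagged as Prep, Det, Agg, Cong or C, and separates PCDC from PCPC by checking whether the second preposition is di. Function and variable names are hypothetical.

```python
from typing import List, Optional, Tuple

CLASS_BY_STRUCTURE = {
    ("Prep", "C"): "PC",
    ("Prep", "Det", "C"): "PDETC",
    ("Prep", "Agg", "C"): "PAC",
    ("Prep", "C", "Agg"): "PCA",
    ("Prep", "C", "Prep", "C"): "PCPC",   # refined to PCDC when Prep = di
    ("Prep", "C", "Cong", "C"): "PCONG",
    ("C", "Cong", "C"): "PCONG",
    ("C", "C"): "CC",
    ("C", "Prep", "C"): "CPC",
}


def adverb_class(tagged: List[Tuple[str, str]]) -> Optional[str]:
    """Return the LG class of a compound adverb given (form, tag) pairs."""
    structure = tuple(tag for _, tag in tagged)
    label = CLASS_BY_STRUCTURE.get(structure)
    if label == "PCPC" and tagged[2][0].lower() == "di":
        return "PCDC"
    return label


examples = {
    "PC": [("contro", "Prep"), ("voglia", "C")],
    "PDETC": [("con", "Prep"), ("le", "Det"), ("cattive", "C")],
    "PAC": [("a", "Prep"), ("pieni", "Agg"), ("voti", "C")],
    "PCA": [("a", "Prep"), ("testa", "C"), ("alta", "Agg")],
    "PCDC": [("con", "Prep"), ("cognizione", "C"), ("di", "Prep"), ("causa", "C")],
    "PCPC": [("di", "Prep"), ("bene", "C"), ("in", "Prep"), ("meglio", "C")],
    "CC": [("niente", "C"), ("male", "C")],
    "CPC": [("l'ira", "C"), ("di", "Prep"), ("Dio", "C")],
}
for expected, tagged in examples.items():
    print(expected, adverb_class(tagged))
```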

Table 3.24 shows the distribution of each one of the evaluation (POS/NEG), intensity (FORTE/DEB) and discourse tags36 (SINTESI, NEGAZIONE, etc.) in every class of compound adverb.
As we can see in the two lower sections of Table 3.24, the percentage of compound adverbs that possess an orientation, or that are significant for Sentiment Analysis purposes, is 51% of the total amount of compound adverbs under examination. Moreover, it must be observed that about 60% of the labels are connected to the intensity, while only about 30% are evaluation tags.
The presence of discourse tags seems to be marginal, but if compared with other parts of speech (or even with the -mente and monolithic adverbs, see Section 3.5.2) it reveals its significance.

3.4.1.1 Compound Adjectives of Sentiment

A great part of the polyrematic units with an adverbial function can also play the role of adjectives:

Adverbial function: agire a fin di bene “to act for good”

Adjectival function: una bugia a fin di bene “a white lie”

For simplicity, we did not distinguish the cases in which they possess just one function from the cases in which they can have both roles. However, separating these two groups is often impossible, if one also considers their invariability and the fact that their function depends on semantic features rather than on morphological ones (Voghera 2004):

AVV+AVVC+PC+PN without adjectival function: in conclusione “in conclusion”

AVV+AVVC+PC+PN with adjectival function: a vanvera (e.g. parlare a vanvera “to prattle”, un discorso a vanvera “a prattle”)

Therefore, we did not make any distinction in the dictionaries, in order to avoid the duplication of the entries, but we recalled them also in the grammars dedicated to adjectives as noun modifiers and in constructions of the kind N0 (essere + E) Agg Val (see Section 3.3.5).

36 These tags annotate 70 discourse operators able to summarize (e.g. in parole povere “in simple terms”), invert (e.g. nonostante ciò “nevertheless”), confirm (e.g. in altri termini “in other terms”), compare (e.g. allo stesso modo “in the same way”) and negate (e.g. neanche per sogno “in your dreams”) the opinion expressed in the previous sentences of the text.

342 Frozen Expressions of Sentiment

In this thesis, oriented words are always considered in context. In this paragraph we focus on the importance that frozen sentences can have in an LG-based module for Sentiment Analysis.
Traditional and generative grammars generally treat idioms as aberrations, and many taggers and corpus-linguistic tools simply ignore multiword units and frozen expressions (Vietri 2004; Silberztein 2008).
With Gross (1982), the LG paradigm started the systematic and formal study of frozen sentences, underlining their non-exceptional nature from both the lexical and the syntactic point of view. Indeed, idioms occupy in the lexicon a volume that is comparable with that of the free forms (Gross 1982).
This kind of approach opens plenty of possibilities in the field of Sentiment Analysis, in which the collection of lexical resources endowed with a defined polarity cannot stop at simple words, but needs to be opened also to different kinds of oriented multiword expressions and frozen sentences.

Tag                     PC    CC   CPC   PAC   PCA  PCDC PCONG  PCPC PDETC    PF    PV  PCPN   Tot
POS+FORTE                0     6     0     4     2     0     2     1     5     1     0     0    21
POS                      1     5     2     9    15    10     1     0    14     0     0     0    57
POS+DEB                  0     0     0     3     2     2     2     0     5     1     0     0    15
NEG+DEB                  0     0     1     0     4     0     1     3     7     3     1     0    20
NEG                     36    15     6    15     5     6     2     5    19     7     0     5   121
NEG+FORTE                0     0     0     1     1     1     2     2     3     1     1     0    12
FORTE                  102    40     5    49    26    13    35    18    76     8     5     0   377
DEB                     23    13     1    14     4     4     5     2     6     1     0     0    73
SINTESI                 13     2     0     6     1     2     0     0     1     0     5     0    30
CONFERMA                 5     2     0     0     0     0     0     0     5     0     0     1    13
INVERTE                  4     2     0     8     0     0     0     0     7     0     1     0    22
COMP                     3     0     0     1     0     0     0     0     1     0     0     0     5
NEGAZIONE                3     0     0     0     0     0     0     2     1     0     1     1     8
Tot                    190    85    15   110    60    38    50    33   150    22    14     7   774
Tot Class              872   346   104   316   320   248   132   127   621   202   124    53  3321
Percentage (%)          22    25    14    35    19    15    38    26    23    11    11    13    51
Evaluation tags (%)     19    31    60    29    48    50    20    33    35    59    14    71    32
Intensity tags (%)      66    62    40    57    50    45    80    61    55    41    36     0    58
Discourse tags (%)      15     7     0    14     2     5     0     6    10     0    50    29    10

Table 3.24: Composition of the compound adverb dictionary

According to Vietri (2014b, p. 32):

(...) a verb, when co-occurring with a certain noun (or a set of nouns), produces a “special meaning” that it would not have been assigned if the noun was substituted. This type of meaning is conventionally defined “non-compositional”, but this does not automatically imply, from a syntactic point of view, that idioms are “units”.

On the basis of this statement, strictly connected to the distributional properties of idiomatic sentences, we included in our sentiment database more than 500 Italian frozen sentences, which have been manually evaluated and then formalized with dictionary-grammar pairs.
This choice is connected to the purpose of lemmatizing them in lexical databases while taking into account, through FSA, also their syntactic flexibility and lexical variation. Treating idioms computationally means dealing with the problems they pose in terms of the relationship between “interpretation” and “syntactic constructions” (Vietri 2014b).
The source of the work are the Lexicon-Grammar descriptions of idioms, which take the shape of binary tables in which lexical entries are described on the basis of syntactic and semantic properties.
The formal notation is the traditional one used in the LG framework. The symbol C is used in frozen sentences to indicate a fixed nominal position that cannot be filled by different items belonging to the same class or semantic field, and that cannot be morpho-grammatically modified (Vietri 2004).
The Lexicon-Grammar classification of Italian idioms includes more than 30 classes of idioms with ordinary verbs and support verbs. The most common support verbs are avere “to have”, essere “to be” and fare “to make”. They differ from their ordinary counterparts because of their semantic emptiness. When they are involved in the formation of idiomatic constructions, they present a very high lexical and syntactic flexibility, which can take the shape of the following phenomena (Vietri 2014d):

• the alternation of support verbs with aspectual variants (e.g. rimanere “to remain”, diventare “to become”);

• the production of causative constructions (e.g. with verbs like rendere “to make”);

• the deletion of the support verb (e.g. N0 Agg come C1: una donna astuta come una volpe “a woman as sharp as a tack”).

Because of the complexity of their lexical and syntactic variability, which often affects the intensity of the idiom itself, we chose to restrict the sample of idioms examined for Sentiment Analysis purposes to the frozen sentences with the support verb essere that include adjectives in their structure, namely PECO, CEAC, EAA, ECA and EAPC. Since a large part of the idioms belonging to this subgroup contains adjectives evaluated in SentIta, we found it interesting to compare the prior polarity of the adjectives involved in the idioms with the polarity assigned to the idioms themselves. The results of this comparison are reported in Table 3.25.

                           PECO  CEAC   EAA   ECA  EAPC   Tot
Idioms in the LG Class      274    36    40   153   162   665
Idioms in SentIta           245    27    32   134   139   577
Adj of SentIta in Idioms    133    12    15    37    56   253

Table 3.25: Polar adjectives in frozen sentences

3.4.2.1 The Lexicon-Grammar Classes Under Examination: PECO, CEAC, EAA, ECA and EAPC

Vietri (1990) showed that the comparative frozen sentences of the type N0 Agg come C1 (PECO) usually intensify the polarity of the sentiment adjective they contain, as happens in (36), in which the SentIta adjective bello, endowed with a polarity score of +2, changes its polarity to +3 when it occurs in the idiom essere bello come il sole.

(36) Maria è bella[+2] come il sole [+3]
“Maria is as beautiful as the sun”

Actually, it is also possible for an idiom of this sort to be polarized when the adjective it contains is neutral, as with bianco “white” in (37), or even to reverse the polarity of the adjective, as happens in (38), where agile “agile” is positive.

(37) Maria è bianca[0] come un cadavere [-2]
“Maria is as white as a dead body” (Maria is pale)

(38) Maria è agile[+2] come una gatta di piombo [-2]
“Maria is as agile as a lead cat” (Maria is not agile)

In this regard, it is interesting to notice that 84% of the idioms have a clear SO, while just 36% of the adjectives they contain are polarized (see the examples in Table 3.26). This confirms the significance of a sentiment lexicon that also includes frozen expressions. Indeed, a resource of this sort is able to disambiguate the cases in which the polarity of sentiment (or neutral) words is changed by the ironic nature of the idioms in which they occur, as in (38), (39) and (40).
There is only one disambiguation rule, which consists in always annotating the longest match in the text analysis.

(39) Maria è alta[0] come un soldo di cacio [-2]
“Maria is knee-high to a grasshopper” (Maria is short)

(40) Maria è asciutta[0] come l’esca [-2]
“Maria is on the bones of her arse” (Maria is penniless)
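The longest-match rule can be illustrated with a minimal Python sketch. It is not the Nooj implementation: the toy lexicon, the tuple-based lookup and the function name annotate_longest_match are assumptions made only for illustration, with scores taken from examples (36), (38) and (39).

```python
# Toy lexicon: token sequences mapped to a polarity score.
# The scores follow examples (36), (38) and (39); the data structure is an assumption.
LEXICON = {
    ("bella",): 2,
    ("agile",): 2,
    ("alta",): 0,
    ("bella", "come", "il", "sole"): 3,
    ("agile", "come", "una", "gatta", "di", "piombo"): -2,
    ("alta", "come", "un", "soldo", "di", "cacio"): -2,
}
MAX_LEN = max(len(key) for key in LEXICON)


def annotate_longest_match(tokens):
    """Scan left to right, always keeping the longest lexicon entry that matches."""
    annotations = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate span first, then shrink it.
        for length in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + length])
            if candidate in LEXICON:
                match = (" ".join(candidate), LEXICON[candidate])
                i += length
                break
        if match is None:
            i += 1  # no entry starts at this token, move on
        else:
            annotations.append(match)
    return annotations
```

With this rule the idiom in (38) is annotated once, as a negative unit, and the positive prior polarity of agile never surfaces; the same adjective outside the idiom keeps its own score.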

Frozen Sentence                          Translation                        Adj Polarity  Adj Score  Idiom Polarity  Idiom Score
essere agile come una gatta di piombo    “to be as agile as a lead cat”     POS           +2         NEG             -2
essere agile come una gazzella           “to be as agile as a gazelle”      POS           +2         POS             +2
essere agile come uno scoiattolo         “to be as agile as a squirrel”     POS           +2         POS             +2
essere astuto come il demonio            “to be as clever as the devil”     -             0          POS             +2
essere astuto come una volpe             “to be as sharp as a tack”         -             0          POS             +2
essere astuto come uno zingaro           “to be as cunning as a gypsy”      -             0          NEG             -2

Table 3.26: Examples of idioms that maintain, switch or shift the prior polarity of the adjectives they contain

Other idioms included in our resources are the ones belonging to the class EAPC, which involves the presence of an adjective, or a verb in its past participle form, and a prepositional complement. These frozen expressions usually intensify the polarity of the adjective/past participle (see example (41)) but, just like the PECO ones, they can also switch it or shift it.

(41) Maria è matta[-2] da legare [-3]
“Maria is so crazy she should be locked up”

The LG class EAA, with definitional structure N0 essere Agg e Agg, includes two adjectives that may or may not be polarized. Example (42) shows that, in this case as well, the polarity of the sentence can be opposite to that of the adjectives.

(42) Maria è bella[+2] e fritta[0] [-2]
“Maria is cooked”

An example from the class CEAC is reported in (43).

(43) L’anima è nera[0] come il carbone [-3]
“The soul is as black as coal”

Among the transformations that these idioms can undergo we included N0 avere C Agg (44), which is also the most frequent one.

(44) Maria ha l’anima nera[0] come il carbone [-3]
“Maria has a soul as black as coal”

Finally, we formalized the LG class ECA, whose frozen elements are the nominal element C1 and the adjective.

(45) Maria è una gatta morta[0] [-2]
“Maria is a cock tease”

3.4.2.2 The Formalization of Sentiment Frozen Expressions

Due to their syntactic and lexical variation, and to their discontinuity, frozen expressions need to be formalized in an electronic context able to list them systematically in electronic databases and to take into account their syntactic variability. Basically, the final purpose is to take advantage of the large amount of information listed in the LG matrices while recognizing and annotating idioms in real texts.
One of the best solutions is to link dictionaries, which contain the lists of Atomic Linguistic Units (ALU) and their syntactic and semantic properties, with syntactic grammars, which can make such words and properties interact in an FSA context (Silberztein 2008). ALU are “the smallest elements that make up the sentence, i.e. the non-analyzable units of the language” (Silberztein 2003). They can take the shape of simple words, affixes, multiword expressions and frozen expressions.
What distinguishes frozen expressions from the other ALU is discontinuity: as displayed in Figure 3.4, it is in fact possible for adverbs to occur between the verb and the direct object (Vietri 2004). That is why frozen sentences need both dictionaries and FSA in order to be recognized and annotated correctly: the dictionaries allow the recognition of idioms thanks to their characteristic components (word forms or sequences that occur every time the expressions occur and that trigger the recognition of the frozen expressions in texts (Silberztein 2008)), while FSA let the machine compute them despite the many different forms that they can assume in real texts.
Figure 3.4 represents a simplified version of the main graph of the FSA for the recognition and the sentiment annotation of idioms of the LG class PECO. It includes a metanode that allows the recognition of a nominal group in subject position (N0), an embedded graph for the verb, and six nodes related to different Semantic Orientations.

Figure 3.4: Extract of the FSA for the SO identification of idioms

The instruction XREF is used to exclude the linguistic material that can occur between the support verb and the idiom object when the annotation is performed. Figure 3.5 shows, as an example, the content of the metanode that annotates frozen expressions with +3 as Semantic Orientation (MOLTOPOS). Here we observe the interactions between the idiom dictionary and the restrictions reported in the syntactic grammar. In its structure, the grammar under examination looks exactly the same as the grammar for the identification of free sentences with the same syntactic shape. The difference lies in the syntactic and distributional constraints, which in the grammar recall only the properties and the lexical items associated in the dictionary with a specific characteristic component. To be more precise, the PECO dictionary lists, among others, the three entries shown below, which have been labeled with the semantic tags POS and FORTE (+3), as happens in the other subsets of SentIta.

Essere (forte + coraggioso) come un leone “To be (strong + brave) like a lion”

leone,N+Class=PECO
+A=coraggioso+A=forte+AVV=come+AVV=quanto+DET=un
+Sent=POS+Int=FORTE
+N0um
+Vsup=essere+Vsup=diventare+Vsup=restare+Vsup=rimanere
+Vcaus=rendere
+N0essereunC1+N0esserecomeC1+N0esserepiu

Essere buono come il pane “To be as good as bread”

pane,N+Class=PECO
+A=buono+AVV=come+AVV=quanto+DET=il+PREP=di
+Sent=POS+Int=FORTE
+N0um
+Vsup=essere+Vsup=diventare+Vsup=restare+Vsup=rimanere
+Vcaus=rendere
+N0esserediC1+N0esserecomeC1+N0esserepiu

Essere (astuto + furbo) come il demonio “To be as clever as the devil”

demonio,N+Class=PECO
+A=astuto+A=furbo+AVV=come+AVV=quanto+DET=il
+Sent=POS+Int=FORTE
+N0um
+Vsup=essere+Vsup=diventare+Vsup=restare+Vsup=rimanere
+Vcaus=rendere
+N0esserepiu


As shown above, the characteristic components C (e.g. leone “lion”, pane “bread”, demonio “devil”) have been used as Nooj entries in the electronic dictionary and have been associated with the other components of the frozen expressions (e.g. +A, +AVV, +DET). Moreover, they have also been described with other information, referring to the following properties borrowed from the already cited LG tables:

distributional, e.g. +N0um, +Vsup, +Vcaus;

transformational, e.g. +N0essereunC1, +N0esserecomeC1, +N0esserediC1, +N0esserepiu.
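To make the structure of these entries concrete, the following minimal Python sketch splits one of them into its lemma, its category and its properties. It is only an illustration and not part of Nooj: the function name parse_entry and the returned dictionary layout are assumptions, as is the comma that separates the lemma from the category, which follows the usual Nooj dictionary convention.

```python
from collections import defaultdict


def parse_entry(line):
    """Split 'lemma,CAT+key=value+flag...' into lemma, category and properties."""
    lemma, rest = line.split(",", 1)
    category, *fields = rest.split("+")
    features = defaultdict(list)
    flags = []
    for field in fields:
        if "=" in field:
            key, value = field.split("=", 1)
            features[key].append(value)  # a key may repeat, e.g. +A=... +A=...
        elif field:
            flags.append(field)  # bare properties such as N0um or N0esserepiu
    return {"lemma": lemma, "category": category,
            "features": dict(features), "flags": flags}


# The leone entry, written on a single logical line.
ENTRY = ("leone,N+Class=PECO+A=coraggioso+A=forte+AVV=come+AVV=quanto+DET=un"
         "+Sent=POS+Int=FORTE+N0um"
         "+Vsup=essere+Vsup=diventare+Vsup=restare+Vsup=rimanere"
         "+Vcaus=rendere+N0essereunC1+N0esserecomeC1+N0esserepiu")

leone = parse_entry(ENTRY)
```

In this representation, valued properties (such as A, Vsup or Sent) end up in the features mapping, while the distributional and transformational properties without a value are collected in the flags list.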

All these properties are recalled in the idiom syntactic grammar in the form of lexical and grammatical constraints.
The restrictions +N0um and +Vsup, +Vcaus concern, respectively, the distributional properties of the sentence subject, which may or may not be human, and of the support verb essere, with its aspectual (Vsup) or causative (Vcaus) variants. While the support verbs restare and rimanere are always acceptable, diventare and rendere are unacceptable only in a few cases (Vietri 1990).
We chose to formalize the distributional restrictions in the syntactic grammars with the following instruction, written under each node involved and referring to a related variable (var):

<$var=$C$var>

It represents a constraint on the words that are allowed to appear in the nodes it restricts, which can only be the ones indicated in the dictionary by the same variable var for the entry C. As an example, the idiom essere (forte + coraggioso) come un leone, which in the dictionary has as values for the variable +A only the adjectives forte “strong” and coraggioso “brave”, cannot accept in the grammar, in the variable A, adjectives other than these two if the variable goes with the restriction shown below:

<$A=$C$A>

The same happens with the aspectual and causative variants of the verb (Vietri 1990), with similar restrictions:

<$Vvar=$C$Vvar>
<$Vcaus=$C$Vcaus>
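The effect of these lexical constraints can be mimicked outside Nooj with a few lines of Python. The sketch below is a hedged illustration, not the grammar itself: the IDIOMS mapping and the function satisfies are hypothetical names, the values are copied from the leone and pane entries, and the words are assumed to be already lemmatized.

```python
# Hypothetical in-memory view of two PECO entries, keyed by the characteristic noun C.
IDIOMS = {
    "leone": {
        "A": {"forte", "coraggioso"},
        "Vsup": {"essere", "diventare", "restare", "rimanere"},
        "Vcaus": {"rendere"},
    },
    "pane": {
        "A": {"buono"},
        "Vsup": {"essere", "diventare", "restare", "rimanere"},
        "Vcaus": {"rendere"},
    },
}


def satisfies(c, var, value):
    """Mimic a constraint of the form <$var=$C$var>.

    The word matched in the node `var` is accepted only if the dictionary
    entry of the characteristic component `c` lists it under the same variable.
    """
    entry = IDIOMS.get(c)
    return entry is not None and value in entry.get(var, set())
```

Under these assumptions, forte is accepted as the adjective of the leone idiom, while buono is rejected there and accepted only with pane.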

The symbol “=” is used in Figure 3.5 when the inflection of the word in the variable is permitted (e.g. for the adjective in essere furbi come il demonio). Otherwise, the symbol “=” is used.
Differently from Vietri (1990), we preferred to treat transformational properties in another way. In order to reduce the size of the grammar, which here also contains the polarity and intensity information, we wrote the transformational constraints before the path starts, in the following form:

<$C$var>

This instructs the FSA that the specific path is allowed only for the idioms that present, in the dictionary, the variable var for the entry C.
In the FSA in Figure 3.5, the PECO standard structure essere Agg come C1 is recognized by the path (2). The path (1) deletes the adjective if the idiom recognized by C possesses the specific transformational property +N0esserecomeC1. Examples of idioms tested against this constraint are reported below; the ones preceded by an asterisk do not satisfy the restriction.

essere come un leone “to be like a lion”
essere come il pane “to be like the bread”
*essere come il demonio “to be like the devil”

In the path (3), the grammar performs the deletion of both the adjective and the adverb come for the idioms endowed with the property +N0essereunC1 in the electronic dictionary.

essere un leone (di uomo) “to be a lion (man)”
*essere un pane “to be a bread”
*essere un demonio (di uomo) “to be a devil (man)”

Figure 3.5: Extract of the FSA for the extraction of PECO idioms

35 THE AUTOMATIC EXPANSION OF SENTITA 125

Differently from essere un pane, which is not acceptable at all, essere come il demonio and essere un demonio (di uomo) are acceptable sentences that change the polarity of the basic idiom, probably because they are related to another comparative sentence with opposite meaning and orientation, e.g. essere (brutto + cattivo) come il demonio “to be (ugly + evil) like the devil” (Vietri 1990).
The last two paths refer to the correlation of PECO with other sentence structures: N0 essere di C1, indicated by the property +N0esserediC1 and recognized by the path (4), and N0 essere più Agg di C, summarized by +N0esserepiu in the dictionary and recognized by the path (5).

essere (fatto + E) di pane “to be made of bread”
*essere (fatto + E) di leone “to be made of lion”
*essere (fatto + E) di demonio “to be made of devil”

essere più (forte + coraggioso) di un leone “to be (stronger + braver) than a lion”
essere più buono del pane “to be better than the bread”
essere più (astuto + furbo) del demonio “to be cleverer than the devil”

While just the first idiom is allowed to walk through the path (4), the path (5) implies a property accepted by all the exemplified idioms.
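The way the transformational properties open or close the five paths can be summarized with a small Python sketch. It is an illustrative approximation of the behaviour described above, not the Nooj graph: the names PROPERTIES, PATH_REQUIREMENT and open_paths are hypothetical, and the property sets are copied from the three dictionary entries.

```python
# Transformational properties listed in the dictionary for each characteristic noun.
PROPERTIES = {
    "leone": {"N0essereunC1", "N0esserecomeC1", "N0esserepiu"},
    "pane": {"N0esserediC1", "N0esserecomeC1", "N0esserepiu"},
    "demonio": {"N0esserepiu"},
}

# Path number -> property required to enter it (None: the basic structure).
PATH_REQUIREMENT = {
    1: "N0esserecomeC1",  # essere come C1 (adjective deleted)
    2: None,              # essere Agg come C1 (standard PECO structure)
    3: "N0essereunC1",    # essere un C1 (adjective and come deleted)
    4: "N0esserediC1",    # essere di C1
    5: "N0esserepiu",     # essere più Agg di C1
}


def open_paths(c):
    """Return the paths that an idiom with characteristic noun `c` may follow."""
    if c not in PROPERTIES:
        return []
    props = PROPERTIES[c]
    return [path for path, required in sorted(PATH_REQUIREMENT.items())
            if required is None or required in props]
```

Keeping the check at the entrance of each path, as in the constraint written before the path starts, means that a new idiom only needs a new dictionary entry and no change to the grammar.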

3.5 The Automatic Expansion of SentIta

The present Section discusses the automatic enlargement of the manually built electronic dictionaries of SentIta.
Our work took advantage of derivational linguistic clues that relate semantically oriented adjectives to quality nouns and to adverbs in -mente. The purpose is to make the new words automatically inherit the semantic information associated with the adjectives to which they are morpho-phonologically related.
Furthermore, we used as morphological Contextual Valence Shifters (mCVS) a list of prefixes able to negate or to intensify/downtone the orientation of the words with which they occur.
We clarify in advance that the morphological method could also have been applied to Italian verbs, but we chose to avoid this solution because of the complexity of their argument structures. We decided, instead, to manually evaluate all the verbs described in the Italian Lexicon-Grammar binary tables, so that we could preserve the different lexical, syntactic and transformational rules connected to each one of them.

3.5.1 Literature Survey on Sentiment Lexicon Propagation

The works on sentiment lexicon propagation follow three main research lines. The first one is grounded on the richness of already existing thesauri, WordNet (Miller 1995) among others. Although WordNet does not include semantic orientation information for its lemmas, semantic relations such as synonymy or antonymy are commonly used to automatically propagate polarity, starting from a manually annotated set of seed words. However, this approach presents some drawbacks, such as the lack of scalability, the unavailability of sufficient resources for many languages (including Italian) and the difficulty of handling newly coined words, which are not yet contained in the thesauri. Furthermore, homographs belonging to different synsets can present a diversity of meanings and polarities (Silva et al. 2010).
The second line of research is based on the hypothesis that words conveying the same polarity appear close to each other in the same corpus, so the propagation can be performed on the basis of co-occurrence algorithms.
Finally, the morphological approach employs morphological structures and relations for the assignment of prior sentiment polarities to unknown words, on the basis of the manipulation of the morphological structures of known lemmas.
In the following Paragraphs we will briefly describe the most interesting contributions pertaining to the research areas just described.

3.5.1.1 Thesaurus Based Approaches

Kamps et al. (2004) investigated the graph-theoretic model of WordNet’s synonymy relation and, using elementary notions from graph theory, proposed a measure for the automatic attribution of semantic orientation to adjectives. All the words listed in WordNet have been collected and related on the basis of their occurrence in the same synset.
Although current WordNet-based measures of distance or similarity usually focus on taxonomic relations and, in general, include the antonymy relation in their algorithms, the authors preferred to use only the synonymy relation.
Hu and Liu (2004), given their specific interest in product features, chose to limit the lexicon construction to the adjectives occurring in sentences that contained one or more product features.
In order to predict the semantic orientation of such adjectives, the authors proposed a simple and effective method grounded on the adjective synonym and antonym sets discovered by exploring the WordNet graph.
Kim and Hovy (2004) proposed a system for the automatic detection of opinion holders, topics and features for each opinion. The system contains two modules, one for the determination of word sentiment and another for sentence sentiment.
In their system architecture, the stage relevant to the task under examination is the one in which the word sentiment classifier individually calculates the polarity of all the sentiment-bearing words. Their basic approach started from a small number of seed words, sorted by polarity into a positive and a negative list. WordNet has been used to lengthen the initial lists on the basis of two very intuitive insights: synonyms of polar words usually preserve the orientation of the original lemma, and antonyms of polar words reverse it.
Esuli and Sebastiani (2005) presented a semi-supervised learning method for the identification of the orientation of subjective terms, based on the classification of their glosses.
The authors started from a representative seed set built for both the positive and the negative categories and from some lexical relations defined in WordNet (e.g. synonymy, hypernymy and hyponymy for the annotation of terms with the same orientation, and direct and indirect antonymy for the terms with opposite orientation). Every time a new term is added to the original ones, it becomes part of the training set for the learning phase, performed with a binary text classifier. All the glosses of a given term found in a machine-readable dictionary are converted into a vectorial form thanks to standard text indexing techniques.
Esuli and Sebastiani (2006b) tested the following approaches:

• a two-stage method, in which a first classifier groups the terms into the Subjective or Objective categories and another classifier tags the Subjective terms as Positive or Negative;

• two other binary classifiers based on learning algorithms, which categorise terms as belonging to the Positive/not-Positive categories and to the Negative/not-Negative categories;

• a method that simply considers Positive, Negative and Objective as three categories with equal status.

Argamon et al. (2009) grounded on Appraisal Theory a method for the automatic determination of complex sentiment-related attributes (e.g. type and force). The authors applied supervised learning to WordNet glosses. The method started from small training sets of (positive or negative) seed words and then added to them new sets of terms collected in the WordNet graph along the synonymy and antonymy relations. This way, based on the WordNet glosses, each term can receive a vectorial representation thanks to standard text indexing techniques.
Dragut et al. (2010) based their research on a few basic assumptions:

• the semantic relations between words with known polarities can be used to enlarge sentimental word dictionaries;

• two words are related if they share at least one synset;

• each synset has a unique polarity;

• the deduction can be performed on just a few WordNet semantic relations, e.g. hyponymy, antonymy and similarity.

Thanks to these intuitions, given a sentimental seed dictionary, they deduced approximately an additional 50% of words endowed with a specific polarity on top of the electronic dictionary WordNet.
Hassan and Radev (2010) applied a Markov random walk model to a large word relatedness graph, with the purpose of producing a polarity estimate for subjective words.
In detail, two nodes of their network are linked if they are semantically related. Among the sources of information used as indicators of the relatedness of words, we mention WordNet.
The hypothesis of the work is the following: a random walk that starts from a given word is more likely to first hit a word with the same semantic orientation than a word with a different orientation.
Paulo-Santos et al. (2011), with a semi-supervised approach, created a small polarity lexicon (3000+ entries, 84.86% accuracy) for the Portuguese language, having as input only a limited collection of resources: a set of ten seed words labelled as positive or negative, a common online dictionary, a set of extraction rules and an intuitive polarity propagation algorithm.
The authors implemented their task by converting the online dictionary into a directed graph, in which nodes correspond to words and edges correspond to synonyms, antonyms or other semantic relations between words. Thus, they propagated the polarities of the known seed set to unlabelled words by applying a breadth-first graph traversal.
In their research, Maks and Vossen (2011) explored two approaches for the annotation of the polarity (positive, negative and neutral) of adjective synsets in Dutch, using WordNet as lexical resource. The first approach is based on the translation of the English SentiWordNet 1.0 (Esuli and Sebastiani 2006b) into Dutch and on the corresponding transfer of the lemmas’ polarity values. In the second approach, WordNet is used in combination with a propagation algorithm that starts from a seed list of synsets of known sentiment and extends the polarity values through WordNet, making use of its lexical relations.
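The seed-based propagation shared by several of the thesaurus-based works above (synonyms preserve the polarity, antonyms reverse it, and the graph is visited breadth-first) can be sketched in a few lines of Python. This is a generic illustration under stated assumptions, not the algorithm of any of the cited authors: the toy graph, the relation labels "syn" and "ant" and the function propagate are invented for the example, and no real thesaurus is queried.

```python
from collections import deque

# Toy word graph: each word lists (relation, neighbour) edges.
# Words and edges are invented for illustration only.
GRAPH = {
    "good": [("syn", "fine"), ("ant", "bad")],
    "fine": [("syn", "nice")],
    "bad": [("syn", "awful")],
    "nice": [],
    "awful": [],
}


def propagate(graph, seeds):
    """Breadth-first propagation of +1/-1 labels from a seed dictionary."""
    polarity = dict(seeds)
    queue = deque(seeds)
    while queue:
        word = queue.popleft()
        for relation, neighbour in graph.get(word, []):
            if neighbour in polarity:
                continue  # the first label assigned to a word is kept
            sign = 1 if relation == "syn" else -1  # antonyms flip the sign
            polarity[neighbour] = sign * polarity[word]
            queue.append(neighbour)
    return polarity
```

Because the first label reached is kept, words closer to the seeds are labelled before more distant ones, which is the practical reason for preferring a breadth-first visit in this kind of sketch.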

3.5.1.2 Corpus Based Approaches

Baroni and Vegnaduzzo (2004) observed the co-occurrences of polar adjectives in subjective texts. The ranking of a large list of adjectives according to a subjectivity score has been implemented without any knowledge-intensive external resource (e.g. lexical databases, human annotation, etc.). The authors made use only of an unannotated list of adjectives and of a small seed set of manually selected subjective adjectives.
The main idea of this work is taken from the work of Turney (2001) on co-occurrence statistics applied to the web as a corpus through the Web-based Mutual Information (WMI) method. Baroni and Vegnaduzzo (2004) obtained their subjectivity scores by computing the Mutual Information of pairs of adjectives taken from each set, using frequency and co-occurrence frequency counts on the World Wide Web, collected through queries to the AltaVista search engine.
Kanayama and Nasukawa (2006b) proposed an unsupervised method of lexicon construction for the annotation of polar clauses in domain-oriented Sentiment Analysis.
They grounded their work on unannotated corpora and anchored their research on context coherency (the tendency for equal polarities to appear successively in contexts), achieving very satisfying results in terms of Precision.
Qiu et al. (2009) proposed a double propagation method, which involved different kinds of relations between both sentiment words and features (the words modified by the sentiment words).
With this method it has been possible to locate specific sentiment words in relevant reviews, starting from a small set of seed sentiment words. Their research emphasized the fact that, in reviews, sentiment words always occur in association with features.
Double propagation basically means that both sentiment words and features can be reused to extract new sentiment words and new features, until no more sentiment words or features can be identified to continue the process. The relations are described above all by Dependency Grammars and trees (Tesnière 1959), which represented the basis for the design of the extraction rules.
In detail, the polarity of the new words is inferred using the following rules:

• Heterogeneous rule: the same polarity of known words is assigned to the novel words;

• Homogeneous rule: the same polarity of known words is assigned to the novel words, unless contrary words occur between them;

• Intra-review rule: in the cases in which the polarity cannot be inferred from dependency cues, it is assumed that the sentiment word takes the polarity of the whole review.

Maks et al. (2012) proposed a method for the Dutch language based on the idea that the words expressing different types of subjectivity are distributed differently depending on the text type; therefore, they grounded their work on the comparison between three corpora: Wikipedia, News and News comments.
In order to perform their task, they used and tested two different calculations generally employed in Keyword Extraction tasks to identify the words that are significantly frequent in a corpus with respect to other corpora: the Log-likelihood technique and the Percentage Difference calculation (DIFF). Their research revealed the better performance of the DIFF calculation.
Wawer (2012) introduced for the Polish language a novel iterative technique of sentiment lexicon expansion, which involved rule mining and contrast sets discovery. The research took advantage of two different textual resources: in a first stage, a small corpus endowed with morpho-syntactic annotations (the National Corpus of Polish), used to acquire candidates for emotive patterns and to evaluate the vocabulary, and then the whole web as a corpus, used to perform the lexical expansion.
The key idea of this work is that, if a word appears in the same extraction patterns as the seeds, it belongs to the same semantic class.
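The co-occurrence scoring mentioned for Baroni and Vegnaduzzo (2004) relies on Mutual Information computed from frequency counts. The Python sketch below shows the standard pointwise formulation, PMI(x, y) = log2(N · hits(x, y) / (hits(x) · hits(y))), on invented counts. It is a hedged illustration: the aggregation over seeds (the maximum), the function names and all the numbers are assumptions and do not reproduce the authors' exact procedure or data.

```python
import math


def pmi(hits_xy, hits_x, hits_y, total):
    """Pointwise mutual information (base 2) estimated from raw counts."""
    if hits_xy == 0 or hits_x == 0 or hits_y == 0:
        return float("-inf")  # the two words were never observed together
    return math.log2(hits_xy * total / (hits_x * hits_y))


def rank_by_subjectivity(candidates, seeds, counts, pair_counts, total):
    """Order candidate adjectives by their best PMI with any seed adjective."""
    def score(word):
        return max(
            pmi(pair_counts.get(frozenset((word, seed)), 0),
                counts[word], counts[seed], total)
            for seed in seeds
        )
    return sorted(candidates, key=score, reverse=True)


# Invented counts, standing in for search-engine hit counts.
COUNTS = {"splendid": 100, "wooden": 100, "lovely": 100}
PAIR_COUNTS = {
    frozenset(("splendid", "lovely")): 40,
    frozenset(("wooden", "lovely")): 5,
}
TOTAL = 1000

ranking = rank_by_subjectivity(["wooden", "splendid"], ["lovely"],
                               COUNTS, PAIR_COUNTS, TOTAL)
```

A candidate that co-occurs with a subjective seed more often than chance receives a positive score and rises in the ranking, while a candidate that co-occurs less often than chance receives a negative one.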

3.5.1.3 The Morphological Approach

Hatzivassiloglou and McKeown (1997) proposed a method for sentiment lexicon expansion based on morphological relations between adjectives. They demonstrated that adjectives related in form almost always have different semantic orientations (e.g. “adequate-inadequate”, “thoughtful-thoughtless”), achieving 97% accuracy.
Moilanen and Pulman (2008) proposed five methods for the assignment of prior sentiment polarities to unknown words, based on known sentiment carriers. They started from a core sentiment lexicon containing 41109 entries tagged with positive (+), neutral (N) or negative (-) prior polarities, and then employed a classifier society of five rule-driven classifiers, each of which adopted a specific analytical strategy:

Conversion: estimated zero-derived paronyms by retagging the unknown words with different POS tags.

Morphological derivation: relied on derivational and inflectional morphology, and was based on the progressive transformation of words into shorter paronymic aliases by using affixes and neo-classical combining forms. Some polarity reversal affixes (e.g. -less and not-so-) were also supported.

Affix-like Polarity Markers: computed affix-like sentiment markers (e.g. well-built, badly-behaving), which usually propagate their sentiment orientation across the whole word.

Syllables: split unknown words into individual syllables with a rule-based syllable chunker and then processed them in order to connect them with the words contained in the original lexicon.

Shallow Parsing: split unknown words into virtual sentences and POS-tagged them.

Ku et al. (2009) employed morphological structures and relations for opinion analysis on Chinese words and sentences. They classified the words into eight morphological types through Support Vector Machine (SVM) and Conditional Random Field (CRF) classifiers.
Chinese morphological formative structures consist of three major processes: compounding, affixation and conversion. Every word is composed of one or more Chinese characters, on the basis of which the word’s meaning can be interpreted.
The authors demonstrated that the injection of morphological information can truly improve the performance of the word polarity detection task.
Neviarouskaya (2010) proposed the expansion of the SentiFul lexicon, grounding the task on the manipulation of the morphological structure (base and affixes) of known lemmas (Plag 2003).
The derivation process is a linguistic device for the creation of new lemmas from known ones by adding prefixes or suffixes.
The results of the application of this method are 4000+ new derived and scored sentiment words (1400+ adjectives, almost 500 adverbs, 1800 nouns and almost 350 verbs).
The author distinguished four types of affixes on the basis of their role in the sentiment feature attribution:

Propagating: the sentiment features are preserved and inherited by the new derived lexical units, which often belong to different grammatical categories (e.g. en-, -ous, -fy). The scoring function transfers the original polarity score to the new word without any variation.

Reversing: the affixes have the effect of changing the semantic orientation of the original lexeme (e.g. dis-, -less). The scoring function simply switches the original score of the starting lemma.

Intensifying: the sentiment features are strengthened (e.g. over-, super-). The scoring function increases the score of the new word with respect to the score of the original lemma.

Weakening: the sentiment features are decreased (e.g. semi-), so the score of the derived word is proportionally reduced by the scoring function.
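The four affix roles can be read as four scoring functions over a signed polarity score. The sketch below only illustrates that reading: the function name `score_derived`, the score range [-1, 1] and the multipliers 1.5 and 0.5 are assumptions made here, since the description above only says that the score is transferred, switched, increased or proportionally reduced.

```python
# Illustrative only: the multipliers and the [-1, 1] range are assumptions,
# not values taken from Neviarouskaya (2010).
AFFIX_ROLE = {
    "en-": "propagating", "-ous": "propagating", "-fy": "propagating",
    "dis-": "reversing", "-less": "reversing",
    "over-": "intensifying", "super-": "intensifying",
    "semi-": "weakening",
}

def score_derived(base_score: float, affix: str) -> float:
    """Return the score of a word derived from a scored base with `affix`."""
    role = AFFIX_ROLE[affix]
    if role == "propagating":
        score = base_score          # inherited without variation
    elif role == "reversing":
        score = -base_score         # orientation is switched
    elif role == "intensifying":
        score = base_score * 1.5    # assumed strengthening factor
    else:                           # weakening
        score = base_score * 0.5    # assumed proportional reduction
    return max(-1.0, min(1.0, score))  # stay inside the assumed range
```

With these assumed factors, a base scored 0.4 keeps 0.4 under -ous, becomes -0.4 under dis- and 0.2 under semi-.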

In the approach of Neviarouskaya (2010), if a word is not already contained in the SentiFul lexicon, the algorithm checks for the presence of the missing lemma in the WordNet database: if the word actually exists there, it is taken into account for future inclusion in the SentiFul lexicon. Its sentiment orientation is always deduced from the features of the starting word and of the affixes used to derive it.

Neviarouskaya et al. (2011) described methods to automatically generate and score a new sentiment lexicon, SentiFul, and to expand it through direct synonymy and antonymy relations, hyponymy relations, derivation and compounding with known lexical units. The authors distinguished four types of affixes used to derive new words, depending on the role they play with regard to sentiment features: propagating, reversing, intensifying and weakening. They elaborated an algorithm for the automatic extraction of new sentiment-related compounds from WordNet, using words from SentiFul as seeds for sentiment-carrying base components and applying the patterns of compound formation.

Wang et al. (2011) proposed a morpheme-based fine-to-coarse strategy for Chinese sentence-level sentiment classification, which used pre-existing sentiment dictionaries in order to extract sentiment morphemes and calculate their polarity intensity. Such morphemes are then reused to evaluate the sentence-level semantic orientation on the basis of the sentiment phrases and their relevant polarity scores. The authors preferred morphemes to words as the basic tokens for sentiment classification because they are much less numerous than words.

Wang et al. (2011) derived the semantic orientation of unknown sentiment words on the basis of their component morphemes (Yuen et al. 2004; Ku et al. 2009), thanks to a specific morpheme segmentation module which applied the Forward Maximum Matching (FMM) word segmentation technique to decompose words into strings of morphemes. Their approach can manage both unknown lexical sentiments and contextual sentiments for sentence-level sentiment classification, by taking into consideration multiple granularity-level sentiments (e.g. sentiment morphemes, sentiment words and sentiment phrases) in a morpheme-based framework.
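Forward Maximum Matching is a greedy, dictionary-driven segmentation: at each position it takes the longest lexicon entry that matches, and falls back to a single character when nothing matches. The following is a minimal sketch of the general technique, not the module of Wang et al. (2011); the function name `fmm_segment` and the toy morpheme inventory are made up for illustration.

```python
def fmm_segment(text: str, lexicon: set) -> list:
    """Greedy left-to-right segmentation with longest-match preference."""
    max_len = max((len(entry) for entry in lexicon), default=1)
    pieces, i = [], 0
    while i < len(text):
        # Try the longest candidate first, then shrink the window.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                pieces.append(candidate)  # unknown single characters pass through
                i += size
                break
    return pieces

# Hypothetical morpheme inventory (English is used only for readability).
toy_morphemes = {"un", "happi", "happy", "ness", "kind"}
print(fmm_segment("unhappiness", toy_morphemes))
```

Once a word is decomposed in this way, the polarity of an unknown word can be estimated from the polarities of the pieces that are listed in a sentiment dictionary.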

3.5.2 Deadjectival Adverbs in -mente

As anticipated, this Section aims to enlarge SentIta on the basis of the morphological relations that connect words and their meanings. In a first stage of the work, more than 5,000 labeled adjectives were used to predict the orientation of the adverbs to which they are morphologically related. This Section explores the rules formalized to perform the task.

Adverbs are morphologically invariable and, consequently, do not present any inflection. However, the majority of them are characterized by a complex structure that includes an adjective base and the derivational morpheme -mente “-ly”:

[[veloce]A -mente]AVV “fast-ly”

[[fragile]A -mente]AVV “delicate-ly”

[[rapido]A -mente]AVV “rapid-ly”

Therefore, thanks to a morphological FSA, the SentIta dictionary of sentiment adverbs has been derived from the dictionary of adjectives (see Figure 3.6). All the adverbs contained in the Italian dictionary of simple words were put in a Nooj text, and the above-mentioned grammar was used to quickly populate the new dictionary by extracting the words ending with the suffix -mente “-ly” (Ricca 2004b) and by making such words inherit the adjectives’ polarity.

The Nooj annotations consisted of a list of 3200+ adverbs that, at a later stage, was manually checked in order to correct the grammar’s mistakes (e.g. vigorosamente “vigorously” and pazzamente “madly”, which respectively come from the positive adjective “vigorous” and the negative adjective “mad”, become intensifiers as adverbs) and to add the Prior Polarity to the adverbs that did not end with the suffixes used in the grammar (e.g. volentieri “gladly”, controvoglia “unwillingly”).

The manual check produced a set of 3600+ adverbs of sentiment, which was completed with 126 adverbs that did not end in -mente. In detail, the Precision achieved in this task is 99% and the Recall is 88%. The distribution of semantic tags in this dictionary is reported in Table 3.27.

Figure 3.6 shows the local grammar in which we formalized the rules for adverb formation. We started the word recognition by anchoring it on the localization of an adjective stem (see the first node on the left, in the variable Agg). Then the automaton continues the word analysis by checking that the adjective stem belongs to the sentiment dictionary (e.g. $Agg=A+POS+DEB). Figure 3.6 represents just an extract of the bigger grammar, which includes not only the positive (POS) adverbs but also the negative ones.

Figure 3.6: Extract of the FSA for the automatic annotation of sentiment adverbs

Tag             Automatically generated   Manually adjusted
POS+FORTE                            82                  89
POS                                 743                 779
POS+DEB                              61                  55
NEG+DEB                             436                 456
NEG                                1273                1466
NEG+FORTE                           160                 158
FORTE                               366                 482
DEB                                  84                 163
SINTESI                               0                   9
CONFERMA                              0                  17
INVERTE                               0                  19
Tot                                3205                3693
Tot Int                             450                 645
Tot Sent                           2755                3003
Tot Discorso                          0                  42

Table 3.27: Composition of the adverb dictionary

Notice, in the center of the FSA, a multiplication of paths. This depends on the different inflectional classes of the adjectives to which the suffix -mente is attached (Table 3.28). The rules used in the grammar to derive the adverbs are given in the following (see Tables 3.28 and 3.29):

• class 2, first path: nothing in the adjective changes (e.g. veloc-e, veloce-mente);

• class 2, second path: in the adjectives ending in -re, -le, the -e is deleted [e] (e.g. fragil-e, fragil-mente);

• class 1, third path: the -o is deleted [o] and substituted by the thematic vowel -a- (e.g. rapid-o, rapid-a-mente).

Adj Class   Number=s                Number=p
            Gender=m    Gender=f    Gender=m    Gender=f
1           -o          -a          -i          -e
2           -e          -e          -i          -i

Table 3.28: Adjective’s inflectional classes (Salvi and Vanelli 2004)

Adj Class   Adjective   Deletion   Stem       Thematic Vowel   Suffix
1           rapid-o     o          rapid-     -a-              -mente
2           veloc-e     -          veloce-    -                -mente
2           fragil-e    e          fragil-    -                -mente

Table 3.29: Rules for the adverb derivation

Actually, in the FSA almost nothing about the inflection of the base adjectives has been specified. Because the adverbs of sentiment had to be identified among the whole list of Nooj adverbs and semantically annotated (and not generated from scratch), we did not find it necessary to recall all the inflectional classes of the adjectives: just the deletion or the conservation of the final vowels was used to select the correct derivational rule.

Mistakes concerning the annotation regard those exceptions that were not productive enough to deserve a specific path in the local grammar. Examples are the adjectives in -lento: although they have -o as their last vowel, they do not require the thematic vowel -a- but -e- (e.g. violento becomes violent-e-mente rather than violent-a-mente).
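The three derivation paths can be sketched in a few lines of Python. This is a simplification written for illustration, not the Nooj grammar itself: the names `derive_adverb` and `annotate_adverbs` are hypothetical, the -re/-le condition is applied exactly as stated above, and the polarity tag in the usage example is invented. As in the grammar, adverbs are not generated from scratch: the derived forms are only used to recognise and tag adverbs that already exist in a list.

```python
def derive_adverb(adjective: str) -> str:
    """Build the expected -mente form from a masculine singular adjective."""
    if adjective.endswith(("re", "le")):
        return adjective[:-1] + "mente"    # class 2, second path: -e deleted
    if adjective.endswith("e"):
        return adjective + "mente"         # class 2, first path: no change
    if adjective.endswith("o"):
        return adjective[:-1] + "amente"   # class 1: -o replaced by -a-
    raise ValueError(f"no derivation rule for {adjective!r}")

def annotate_adverbs(adverbs, adjective_tags):
    """Tag the listed adverbs that match a form derived from a tagged adjective."""
    expected = {derive_adverb(adj): tag for adj, tag in adjective_tags.items()}
    return {adv: expected[adv] for adv in adverbs if adv in expected}

# Hypothetical tag, for illustration only.
print(annotate_adverbs(["velocemente", "ieri"], {"veloce": "POS"}))
```

Under these rules `derive_adverb("violento")` yields violentamente, so the attested violentemente is not matched: this is the kind of unproductive exception that was left to the manual check.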

3.5.2.1 Semantics of the -mente formations

The meaning of the deadjectival adverbs in -mente is not always predictable from the base adjectives from which they are derived. The syntactic structures in which they occur also influence their interpretation. Depending on their position in sentences, the deadjectival adverbs can be described as follows:

Adjective modifiers: they modify adjectives or other adverbs, although two adverbs in -mente almost never appear next to each other.

• degree modifiers

– intensifiers, e.g. altamente “highly”, enormemente “enormously”

– downtoners, e.g. moderatamente “moderately”, parzialmente “partially”

Predicate modifiers: they can be paraphrased as “in a A way”, in which A is the adjective base.

• verb arguments, e.g. Maria si comporta perfettamente “Maria acts perfectly”

• extranuclear elements, e.g. Maria si reca settimanalmente a Milano “Maria goes weekly to Milan”

Sentence modifiers: they usually do not modify the sentence content, but they often give it coherence or work as discourse signals.

• evaluative adverbs, e.g. stupidamente “foolishly”

• modal adverbs, e.g. necessariamente “necessarily”

• circumstantial adverbs of time, e.g. ultimamente “lately”

• quantifiers over time, e.g. frequentemente “often”

• domain adverbs, e.g. politicamente “politically”

The French LG description of adverbs in -ment includes 3200+ items, represented in LG tables which classify the adverbs in a similar way. They distinguish sentential adverbs (e.g. conjuncts, style disjuncts and attitude disjuncts, which include evaluative adverbs, adverbs of habit, modal adverbs and subject-oriented attitude adverbs) from adverbs integrated into the sentence (e.g. adverbs of subject-oriented manner, adverbs of verbal manner, adverbs of quantity, adverbs of time, viewpoint adverbs and focus adverbs) (Molinier and Levrier 2000; Sagot and Fort 2007).

3.5.2.2 The Superlative of the -mente Adverbs

As concerns the superlative form of the adverbs in -mente, it must be underlined that they have been treated as adverbs derived from the superlative form of the adjectives. In fact, the rule for adverb formation is selected by the inflectional paradigm of the superlative form, and not by the inflection of the adjective. Moreover, the semantic orientation inherited by the superlative adverb is again the one belonging to the superlative adjective, and not that of the adjective itself. Therefore, the FSA for superlative adverbs is almost the same as the one shown in Figure 3.6; the only difference is in the recognition of the base adjective, which is A+SUP rather than only A.

3.5.3 Deadjectival Nouns of Quality

The derivation of quality nouns (QN) from qualifier adjectives is another derivational phenomenon of which we took advantage for the automatic enlargement of SentIta. This kind of noun makes it possible to treat as entities the qualities expressed by the base adjectives.

A morphological FSA (Figure 3.7), following the same idea as the adverb grammar, matches in a list of abstract nouns the stems that are in morpho-phonological relation with our list of hand-tagged adjectives. Because the nouns, differently from the adverbs, need their inflection information to be specified, we associated with each suffix entry in the electronic dictionary of QN suffixes the inflectional paradigm that it gives to the words with which it occurs (see Table 3.30).

As regards the suffixes used to form the quality nouns (Rainer 2004), it must be said that they generally make the new words simply inherit the orientation of the adjectives from which they are derived. Exceptions are -edine and -eria, which almost always shift the polarity of the quality nouns to the weakly negative one (-1), e.g. faciloneria “slapdash attitude”. The suffix -mento also differs from the others, in so far as it belongs to the derivational phenomenon of the deverbal nouns of action (Gaeta 2004). It has been possible to use it in our grammar for the deadjectival noun derivation by using the past participles of the verbs listed in the adjective dictionary of sentiment (e.g. V sfinire “to wear out”, A sfinito “worn out”, N sfinimento “weariness”). Basically, we chose to include it in our list of suffixes because of its productivity. This caused an overlap of 475 nouns derived from both the psychological predicates and the qualifier adjectives of sentiment. Just 185 psych nominalizations exceeded the coverage of our morphological FSA, proving the power of our methodology also in terms of Recall.

In about half of the cases the annotations differed just in terms of intensity; the other mistakes also affected the orientation (see Table 3.33). In general, the Precision achieved in this task is 93%, while the Precision of the automatic tagging of the QN, compared with the manual tagging made on the corresponding nominalizations of the psych verbs, is summarized in Table 3.34.

Tables 3.31 and 3.32 show the differences in productivity of all the suffixes (SFX) among the electronic dictionary of abstract nouns (Abstr), the sublist of QN that have their counterparts in the Psy V-n list (Psy V-n=QN) and the whole collection of QN (QN tot). As regards the suffix productivity in Abstr, the suffixes that achieved the highest percentage values are -ità, -ia and -(z)ione. The most productive suffixes for the Psy V-n=QN formation are -(z)ione, -mento and -ezza, while the most productive ones for QN tot are -ità, -(z)ione and -mento.

Figure 3.7: Extract of the FSA for the automatic annotation of quality nouns

Suffix (SFX)   Inflectional paradigm (FLX)   Correct matches   Errors   Precision (%)
-edin-e        N46                                         0        0   -
-età           N602                                        0        0   -
-izie          N602                                        0        0   -
-el-a          N41                                         0        1   0
-udine         N46                                         5        2   71.43
-or-e          N5                                         36        9   80
-zion-e        N46                                       359       59   85.89
-anz-a         N41                                        57        9   86.36
-itudin-e      N46                                        13        2   86.67
-ur-a          N41                                       142       20   87.65
-ment-o        N5                                        514       58   89.86
-izi-a         N41                                        14        1   93.33
-enz-a         N41                                       148       10   93.67
-eri-a         N41                                        71        4   94.67
-ietà          N602                                       27        1   96.43
-aggin-e       N46                                        72        2   97.3
-i-a           N41                                       145        3   97.97
-ità           N602                                      666       13   98.09
-ezz-a         N41                                       305        2   99.35
-igi-a         N41                                         3        0   100
-z-a           N41                                         2        0   100
tot            -                                        2579      196   92.94

Table 3.30: Error analysis of the automatic QN annotation

Suffix (SFX)   Suffix in Abstr   Suffix in Psy V-n=QN   Suffix in QN
-ezz-a                     498                     73            305
-aggin-e                   149                     13             72
-itudin-e                   31                      1             13
-ietà                       65                      2             27
-igi-a                       9                      0              3
-izi-a                      40                      2             14
-enz-a                     461                     11            148
-anz-a                     189                      7             57
-eri-a                     267                     13             71
-ità                      3263                     53            666
-or-e                      297                      6             36
-zion-e                   2866                     87            359
-ur-a                     1700                     11            142
-udin-e                     39                      3              5
-i-a                      3545                     13            145
-el-a                       15                      0              0
-edin-e                      4                      0              0
-età                        66                      0              0
-izie                        3                      0              0
-z-a                        30                      2              2
-ment-o                   2520                    150            514

Table 3.31: Presence of the QN suffixes in different dictionaries

Suffix (SFX)   Suffix in Abstr (%)   Suffix in Psy V-n=QN (%)   Suffix in QN (%)
-ezz-a                         3.1                      16.33               6.28
-aggin-e                      0.93                       2.91               1.48
-itudin-e                     0.19                       0.22               0.27
-ietà                          0.4                       0.45               0.56
-igi-a                        0.06                          0               0.06
-izi-a                        0.25                       0.45               0.29
-enz-a                        2.87                       2.46               3.05
-anz-a                        1.18                       1.57               1.17
-eri-a                        1.66                       2.91               1.46
-ità                         20.32                      11.86              13.72
-or-e                         1.85                       1.34               0.74
-zion-e                      17.85                      19.46                7.4
-ur-a                        10.59                       2.46               2.93
-udin-e                       0.24                       0.67                0.1
-i-a                         22.08                       2.91               2.99
-el-a                         0.09                          0                  0
-edin-e                       0.02                          0                  0
-età                          0.41                          0                  0
-izie                         0.02                          0                  0
-z-a                          0.19                       0.45               0.04
-ment-o                      15.69                      33.56              10.59

Table 3.32: Productivity of the QN suffixes in different dictionaries

Correspondences   Differences   Different Orientation   Only Different Intensity
            475           185                      93                         92

Table 3.33: About the overlap among Psych V-n dictionary and QN dictionary

               Precision   Precision (Orientation only)
QN<>Psy V-n         0.92   -
QN=Psy V-n          0.61   0.80

Table 3.34: Precision measure about the overlap of QN and Psy V-n
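The Precision column of Table 3.30 follows directly from the two count columns, as correct matches over all matches. The short check below recomputes it for a few rows; the helper name `precision` is ours, and returning `None` for suffixes without any match mirrors the “-” cells of the table.

```python
from typing import Optional

def precision(correct: int, errors: int) -> Optional[float]:
    """Percentage of correct matches among all matches (None when there are none)."""
    total = correct + errors
    return None if total == 0 else round(100 * correct / total, 2)

# (correct matches, errors) copied from a few rows of Table 3.30
rows = {"-zion-e": (359, 59), "-ment-o": (514, 58), "-ezz-a": (305, 2), "tot": (2579, 196)}
for suffix, (correct, errors) in rows.items():
    print(suffix, precision(correct, errors))
```

The recomputed values (85.89, 89.86, 99.35 and 92.94) agree with the table, including the overall figure of about 93% quoted in the text.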

3.5.4 Morphological Semantics

Our morphological FSA can, moreover, interact with a list of prefixes able to negate (e.g. anti-, contra-, non-, etc.) or to intensify/downtone (e.g. arci-, semi-, etc.) the orientation of the words in which they appear (Iacobini 2004).

While the suffixes for the creation of quality nouns interact with the preexisting dictionaries of the Italian module of Nooj in order to automatically tag them with new semantic descriptions, the prefixes treated in this Section work directly on opinionated documents, so that the machine can understand the actual orientation of the words occurring in real texts, also when their morphological context shifts the polarity of the words listed in the dictionaries.

The collection of the prefixes, which from now on we will call Morphological Contextual Valence Shifters (mCVS), is reported in Tables 3.35 and 3.36.

Prefixes   Negation types
non-       CDD
mis-       CR
a-         CR+PRV
dis-       CR+PRV
in-        CR+PRV
s-         CR+PRV
anti-      OPP
contra-    OPP
contro-    OPP
de-        PRV
di-        PRV
es-        PRV

Table 3.35: Negation prefixes

They are endowed with special tags that specify the way in which they alter the meaning of the sentiment words with which they occur:

• FORTE “strong”: intensifies the Semantic Orientation of the words, making their polarity increase by one position in the evaluation scale (first path in Figure 3.8);

• DEB “weak”: downtones the Semantic Orientation of the words, making their polarity decrease by one position in the evaluation scale (second path in Figure 3.8);

• NEGAZIONE “negation”: works following the same rules as the negation formalised in the syntactic grammars (third path in Figure 3.8):

– CR “contrary”

– OPP “opposition”

– PRV “deprivation”

– CDD “contradiction”

Intensifiers   Downtoners
iper-          bis-
macro-         fra-
maxi-          infra-
mega-          intra-
multi-         ipo-
oltre-         micro-
pluri-         mini-
poli-          para-
sopra-         semi-
sovra-         sotto-
stra-          sub-
super-         -
sur-           -
ultra-         -

Table 3.36: Intensifier/downtoner prefixes

Figure 3.8 shows the morphological FSA that combines the polarity and intensity of the adjectives of sentiment with the meaning carried by the mCVS. The shifting rules in terms of polarity score, which are the same as those exploited in the syntactic grammar net (see Chapter 4), are summarized in this grammar through the green comments on the right side of the automaton.

Figure 3.8: Morphological Contextual Valence Shifting of Adjectives

As regards the manipulation of the annotations of the resulting words, we followed three parallel solutions:

• when the score does not change (e.g. the case of the intensification of words with starting polarity +3, or the case in which words with polarity -1 are downtoned), the resulting word simply inherits the inflectional ($2F) and the syntactic and semantic ($2S) information of the original word;

• when the intensity changes and the polarity remains steady (e.g. when the words with polarity ±2 are intensified or downtoned), the resulting word inherits the inflectional, syntactic and semantic information of the original word, but also obtains the tags FORTE/DEB;

• when both the intensity and the polarity change (in almost every case in which the words are negated), the resulting word just inherits the inflectional information, while the semantic tags are added from scratch.
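The FORTE and DEB paths can be approximated outside Nooj as prefix stripping followed by a one-position move on the ±3 scale. The sketch below is a hedged illustration only: `MCVS`, `shift` and `analyse` are hypothetical names, only a handful of prefixes from Table 3.36 are listed, the prior scores in the usage example are invented, and the NEGAZIONE path is left out because its rules are those of the syntactic negation grammars.

```python
from typing import Optional

# A few prefixes from Table 3.36; the prior scores passed in `lexicon`
# are invented for the example.
MCVS = {"FORTE": ("super", "ultra", "iper", "stra"),
        "DEB": ("semi", "mini", "micro")}

def shift(score: int, tag: str) -> int:
    """Move a polar score one position along the scale, never past 3 or below 1."""
    if score == 0:
        return 0  # neutral words are outside the scope of this sketch
    sign = 1 if score > 0 else -1
    step = 1 if tag == "FORTE" else -1
    return sign * min(3, max(1, abs(score) + step))

def analyse(word: str, lexicon: dict) -> Optional[tuple]:
    """Return (base, tag, shifted score) when `word` is prefix + known base."""
    for tag, prefixes in MCVS.items():
        for prefix in prefixes:
            base = word[len(prefix):]
            if word.startswith(prefix) and base in lexicon:
                return base, tag, shift(lexicon[base], tag)
    return None

print(analyse("straricco", {"ricco": 2}))
```

The clamping in `shift` corresponds to the first case of the list: a +3 word that is intensified, or a -1 word that is downtoned, keeps its score.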

We used the list of the sentiment adjectives as a Nooj text in order to check the performance of our grammar. The discovery that the words containing mCVS that are lemmatised in the dictionary are very few (just 29 adjectives, e.g. straricco “very rich”, ultraresistente “heavy duty”, stramaledetto “damned”, and 9 adverbs, e.g. strabene “very well”, ultrapiattamente “very dully”) increases the importance of an FSA like the one described in this Paragraph.

It is also important to underline that all the synonyms of poco “few”, including the ones that take the shape of morphemes (e.g. ipo-, sotto-, sub-), do not seem to be proper downtoners, but resemble the behaviour of the negation words that transform the sentiment words with which they occur into weakly negative ones, see Section 4.2 (e.g. ipodotato “subnormal”, ipofunzionante “hypofunctioning”). Therefore, we excluded them from the dictionary of mCVS and included them in a dedicated metanode of the morphological FSA, able to correctly compute their meaning.

Chapter 4

Shifting Lexical Valences through FSA

4.1 Contextual Valence Shifters

In Section 3.3.1 we provided the definitions of Semantic Orientation, the measure of the polarity and intensity of an opinion (Taboada et al. 2011; Liu 2010), and of Prior Polarity, the individual words’ context-independent orientation (Osgood 1952). While the second concept is relevant during the population of sentiment dictionaries, the first one refers to the final semantic annotation that can be automatically or manually attributed to whole sentences and documents.

The need to distinguish between these two measures is due to the semantic modifications that can come from the words’ surrounding contexts (Pang and Lee 2008a). Contextual Valence Shifters are linguistic devices able to change the prior polarity of words when co-occurring in the same context (Kennedy and Inkpen 2006; Polanyi and Zaenen 2006).

From a computational point of view, Contextual Valence Shifting is nothing more than an enhanced version of the term-counting method first proposed by Turney (2002); from the perspective of distributional analysis, however, it raises a number of challenges that deserve to be examined (Neviarouskaya et al. 2009a).

In this work we handle contextual shifting by generalizing over all the polar words that possess the same prior polarity. A network of local grammars has been designed on a set of rules that compute the words’ individual polarity scores according to the contexts in which they occur.

In general, the sentence annotation is performed using six different metanodes that, working as containers for the sentiment expressions, assign equal labels to the patterns embedded in the same graphs.

We used the FSA editor of Nooj to formalize our CVS rules because of its ease of use and computational power, but nothing prevents the use of other strategies and tools for their application to texts. Consequently, in this chapter we will limit the discussion to the shifting rules, without any further mention of their actual formalization, which can be trivially summarized as the treatment of embedded graphs as score boxes.

4.2 Polarity Intensification and Downtoning

4.2.1 Related Works

Amplifiers and downtoners (Quirk et al. 1985), intensifiers and diminishers (Polanyi and Zaenen 2006), overstatements and understatements (Kennedy and Inkpen 2006), intensifiers and downtoners (Taboada et al. 2011) are all pairs of concepts that refer to the increasing or decreasing effects of special kinds of words on oriented lexical items.

Although the changes of the polarity scores are very intuitive, the simple addition and subtraction of a score to/from the base valence of a term (Polanyi and Zaenen 2006; Kennedy and Inkpen 2006) has been criticized by Taboada et al. (2011), who instead associated different percentage values (positive for intensifiers and negative for downtoners) with each strength word.

Differently from both of these works, we decided to restrain the complexity of the rules by avoiding the distinction of degrees. In compensation, we did not limit the intensive function only to degree adverbs and adjectives, but also took verbs and nouns into consideration.

4.2.2 Intensification and Downtoning in SentIta

The lexical items for the computational treatment of intensification in SentIta are the ones denoted by the tags FORTE and DEB. As stated in the previous Chapter (Section 3.3), the lexical items endowed with these labels do not possess any orientation, but can modify the intensity of semantically oriented words. The strength indicators included in the intensification FSA, which account for (but are not limited to) such modifiers, are the following:

• Morphological Rules

– the use of the absolute superlative suffixes -issim-o and -errim-o, which always increase the words’ semantic orientations;

– the use of intensifying or downtoning prefixes (e.g. super-, stra-, semi-, micro-), shown in Section 3.5.4.

• Syntactic Rules

– the repetition of more than one negative or positive word, which always increases the words’ semantic orientations (e.g. bello[+2] bello[+2] bello[+2] [+3] “nice nice nice”);

– the co-occurrence of words belonging to the strength scale (tags FORTE/DEB) with the sentiment words listed in the evaluation scale (tags POS/NEG).

As generally assumed in the literature, adverbs intensify or attenuate adjectives (46), verbs (48) and other adverbs (49), while adjectives modify the intensity of nouns (47). We adopted the simplest strategy for polarity modification: the addition/subtraction of one point in the evaluation scale (Polanyi and Zaenen 2006; Kennedy and Inkpen 2006). Because this scale does not exceed ±3, words that start from this polarity cannot be increased further (e.g. davvero[+] eccezionale[+3] [+3] “truly exceptional”).

(46) Parzialmente[-] deludente[-2] anche il reparto degli attori [-1]
“Partially unsatisfying also the actor staff”

(47) Ciò che ne deriva (...) è una terribile[-2] confusione[-2] narrativa [-3]
“What comes from it is a terrible narrative chaos”

(48) Alla guida ci si diverte[+2] molto[+] [+3]
“In the driver’s seat you have a lot of fun”

(49) Ne sono rimasta molto[+] favorevolmente[+2] colpita [+3]
“I have been very favourably affected by it”
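The one-point strategy and its ±3 ceiling can be replayed on examples (46)-(49) with a toy scorer. It is a flat reading of the rules that ignores the syntactic contexts encoded in the local grammars: the token annotations are typed in by hand, and the names `shift` and `phrase_score` are hypothetical.

```python
def shift(score: int, delta: int) -> int:
    """Move a non-zero score by `delta` points of intensity inside the +-3 scale."""
    sign = 1 if score > 0 else -1
    return sign * min(3, max(1, abs(score) + delta))

def phrase_score(tokens: list) -> int:
    """tokens: (word, annotation) pairs, where the annotation is an int prior
    polarity, '+' (FORTE), '-' (DEB) or None for unmarked words."""
    polar = [a for _, a in tokens if isinstance(a, int)]
    if not polar or len({p > 0 for p in polar}) != 1:
        raise ValueError("expected polar words sharing one orientation")
    score = max(polar, key=abs)
    if len(polar) > 1:                 # repetition of same-sign polar words
        score = shift(score, +1)
    for _, a in tokens:                # co-occurring strength words
        if a == "+":
            score = shift(score, +1)
        elif a == "-":
            score = shift(score, -1)
    return score

print(phrase_score([("Parzialmente", "-"), ("deludente", -2)]))   # example (46)
print(phrase_score([("diverte", 2), ("molto", "+")]))             # example (48)
```

Because the sign is kept separate from the magnitude, a downtoned -2 moves to -1 rather than to -3, and an intensified +3 stays at +3.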

However, as already discussed in Section 3.3.4, special subsets of verbs also possess the power of intensifying or downtoning the polarity of the nominal groups or complement clauses that, case by case, they affect in subject (50) or object (51) position (see Figure 3.1, Section 3.3.4).

(50) L’amore[+2] travolge[+] Maria [+3]
“Love overwhelms Maria”

(51) Le tue parole invadono[+] Maria di contentezza[+2] [+3]
“Your words overwhelm Maria with happiness”

Examples of intensifying/downtoning verbs, with the LG classes to which they belong, are reported below:

• the list of entries translated from the French work of Balibar-Mrabti (1995), e.g. travolgere[+] “to overwhelm”, risplendere[+] “to shine”;

• verbs from LG class 24 (15 intense, 2 weak), e.g. aumentare[+] “to increase”, gonfiare[+] “to amplify”, calare[–] “to go down”, diminuire[–] “to decrease”;

• verbs from LG class 41 (45 intense, 19 weak), e.g. rivoluzionare[+] “to revolutionize”, sconquassare[+] “to smash”, moderare[–] “to moderate”, placare[–] “to appease”;

• verbs from LG class 47 (43 intense, 18 weak), e.g. strillare[+] “to shout”, sventolare[+] “to show off”, mormorare[–] “to murmur”, sussurrare[–] “to whisper”;

• verbs from LG class 50 (4 intense, 0 weak), e.g. assicurare[+] “to assure”, garantire[+] “to guarantee”;

• verbs from LG class 56 (5 intense, 3 weak), e.g. abbandonarsi[+] “to abandon yourself”, affrettarsi[+] “to rush”, accennare[–] “to touch upon”, esitare[–] “to hesitate”.

As regards intensifying or downtoning nouns, we refer to the ones automatically derived from degree adjectives (220 intense, 95 weak), e.g. sfrenatezza[+] “wildness”, abbondanza[+] “abundance”, labilità[–] “evanescence”, fugacità[–] “fugacity”, and to the nominalizations of intensifying or downtoning verbs (41 intense, 95 weak), e.g. rafforzamento[+] “strengthening”, fervore[+] “fervor”, affievolimento[–] “weakening”, diminuzione[–] “decrease”. They can modify both other nouns (52) and verbs (53).

(52) La sfrenatezza[+] dell’odio[-3] di Maria [-3]
“The wildness of Maria’s hate”

(53) Luca difendeva[+2] Maria con fervore[+] [+3]
“Luca was defending Maria with fervour”

Intensification, negation, modality and comparison can appear together in the same sentence. We will describe these cases in the following paragraphs.

4.2.3 Excess Quantifiers

Words that at first glance seem to be intensifiers, but on closer analysis reveal a more complex behavior, are abbastanza “enough”, troppo “too much” and poco “not much”. Because of their particular influence on polar words, their meaning, especially when associated with polar adjectives, has been related in the literature to other contextual valence shifters¹. An example is Meier (2003, p. 69), who proposed an interpretation that includes a comparison between extents:

“Enough, too, and so are quantifiers that relate an extent predicate and the incomplete conditional (expressed by the sentential complement) and are interpreted as comparisons between two extents. The first extent is the maximal extent that satisfies the extent predicate. The second extent is the minimal or maximal extent of a set of extents that satisfy the (hidden) conditional, where the sentential complement supplies the consequent and the main clause the antecedent. [...] Intuitively, the value an object has on a scale associated with the meaning of the adjective or adverb is related to some standard of comparison that is determined by the sentential complement.” (Meier 2003, p. 69)

Schwarzschild (2008, p. 316) noticed the possibility of an implicit negation:

“In excessives, the infinitival complement, while it does not form a threshold description, it does contain an implicit negation. So, like other negative statements, excessives support let alone rejoinders².”

And again Meier (2003, p. 71) transformed the infinitive clause into a modal expression:

“In this paper, I will propose that the sentential complements of constructions with too and enough implicitly or explicitly contain a modal expression. [...] Evidence for this move is the fact that a modal expression (with existential force) can be added or omitted without changing the intuitive meaning of the sentences. [...] An explicit modal expression like be able or be allowed in the sentential complement makes them more precise, but it does not make them unacceptable.”

¹ When quantifiers like the ones under examination modify polar words, they compare such words with a threshold value that is defined by an infinitive clause in position N1 (e.g. The food is too good to throw (it) away) (Meier 2003; Schwarzschild 2008).

In this research we noticed as well that the co-occurrence of troppopoco and abbastanza with polar lexical items can provoke in their se-mantic orientation effects that can be associated to other contextualvalence shifters The ad hoc rules dedicated to these words (see Table41) are not actually new but refer to other contextual valence shift-ing rules that have been discussed in this Paragraph andor will beexplored in this ChapterTroppo and poco can also co-occur or can be negated the rules to beapplied in each case are reported in Table 42Troppo poco and abbastanza are also able to give a polarity to wordsthat in SentIta possess a neutral Prior Polarity An exception is the caseof non troppo + neutral words (Table 42) that does not produce gen-eralizable effects (eg non troppo decorato ldquonot so decoratedrdquo couldhave a weakly positive orientation while same can not be said of non

2This fuel is too volatile to use in a car engine = This fuel is too volatile to use in a car engine let alone alawn mower engine = This fuel should not be used in a car engine let alone a lawn mower engine Examplefrom Schwarzschild (2008)

160 CHAPTER 4 SHIFTING LEXICAL VALENCES THROUGH FSA

             Positive Word            Neutral Word             Negative Word
Examples     bello                    decorato                 brutto
troppo       Intensification          Negative Switch          Intensification
poco         Weakly Negative Switch   Weakly Negative Switch   Weak Negation
abbastanza   Downtoning               Weakly Positive Switch   Intensification

Table 4.1: The effects of troppo, poco and abbastanza on polar lexical items

                 Positive Word     Neutral Word             Negative Word
Examples         bello             decorato                 brutto
troppo poco      Strong Negation   Negative Switch          Downtoning
un poco troppo   Intensification   Negative Switch          Intensification
non troppo       Negative Switch   -                        Weak Negation
non poco         Intensification   Weakly Positive Switch   Intensification

Table 4.2: The effects of the co-occurrence and the negation of troppo and poco with reference to polar lexical items

troppo nuovo “not so new”). Therefore, it has been excluded from the rules in order to avoid additional errors.
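The effects reported in Tables 4.1 and 4.2 can be read as a simple lookup from a quantifier pattern and the polarity class of the modified word to an effect label. The following is only an illustrative sketch: the dictionary layout and the function name `quantifier_effect` are assumptions introduced here, not part of the SentIta FSA, and the excluded case (non troppo + neutral word) is represented as `None`.

```python
from typing import Optional

# Effect labels transcribed from Tables 4.1 and 4.2.
# None marks the combination excluded from the rules (non troppo + neutral word).
QUANTIFIER_EFFECTS = {
    "troppo": {"positive": "Intensification",
               "neutral": "Negative Switch",
               "negative": "Intensification"},
    "poco": {"positive": "Weakly Negative Switch",
             "neutral": "Weakly Negative Switch",
             "negative": "Weak Negation"},
    "abbastanza": {"positive": "Downtoning",
                   "neutral": "Weakly Positive Switch",
                   "negative": "Intensification"},
    "troppo poco": {"positive": "Strong Negation",
                    "neutral": "Negative Switch",
                    "negative": "Downtoning"},
    "un poco troppo": {"positive": "Intensification",
                       "neutral": "Negative Switch",
                       "negative": "Intensification"},
    "non troppo": {"positive": "Negative Switch",
                   "neutral": None,
                   "negative": "Weak Negation"},
    "non poco": {"positive": "Intensification",
                 "neutral": "Weakly Positive Switch",
                 "negative": "Intensification"},
}


def quantifier_effect(modifier: str, polarity_class: str) -> Optional[str]:
    """Return the effect label for a modifier and a polarity class
    ('positive', 'neutral' or 'negative'), or None if no rule applies."""
    return QUANTIFIER_EFFECTS[modifier.lower()][polarity_class]


print(quantifier_effect("troppo", "neutral"))
print(quantifier_effect("non troppo", "neutral"))
```

Keeping the excluded combination explicit, rather than omitting it, makes the absence of a rule visible to whoever consumes the table.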

4.3 Negation Modeling

4.3.1 Related Works

Negation modeling is an NLP research area that includes the automatic detection of negation expressions in raw texts and the determination of negation scopes, i.e. the parts of the meaning modified by negation (Jia et al. 2009).

Negation meets the Sentiment Analysis need of determining the correct polarity of opinionated words when shifted by the context. That is why we included it in our set of Contextual Valence Shifters (Polanyi and Zaenen 2006; Kennedy and Inkpen 2006).

In this paragraph we will survey the literature on negation modeling, going in depth through the state-of-the-art methods for automatic negation detection, from the pioneering work of Pang and Lee (2004) to more sophisticated techniques, such as some Semantic Composition methods, which even define the syntactic contexts of the polar expressions (Choi and Cardie 2008).

Despite the large interest in negation in the biomedical domain (Harkema et al. 2009; Desclés et al. 2010; Dalianis and Skeppstedt 2010; Vincze 2010), maybe due to the availability of free annotated corpora (Vincze et al. 2008), this paragraph focuses on the most significant works on sentiment analysis.

Bag of Words: Pang and Lee (2004) demonstrated that using contextual information can improve the polarity-classification accuracy. By means of a standard bag-of-words representation, they handled negation modeling by adding artificial words to texts, without any explicit knowledge of polar expressions (e.g. I do not NOT_like NOT_this NOT_new NOT_Nokia NOT_model). The problem with this method is the duplication of the features, which are always represented in both their plain and negated occurrences (Wiegand et al. 2010).
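The artificial-word strategy described above can be sketched in a few lines. This is a minimal illustration, not the cited authors' code: the negation word list and the choice of closing the negated span at the next punctuation token are assumptions made here for the example.

```python
import re

# Assumed, deliberately small list of English negation cues.
NEGATION_WORDS = {"not", "no", "never"}
# Assumption: a punctuation token closes the negated span.
CLAUSE_END = re.compile(r"^[.,;:!?]$")


def mark_negated(tokens):
    """Prefix with NOT_ every token that follows a negation word,
    up to the next punctuation token."""
    marked, negated = [], False
    for token in tokens:
        if CLAUSE_END.match(token):
            negated = False
            marked.append(token)
        elif token.lower() in NEGATION_WORDS:
            negated = True
            marked.append(token)
        else:
            marked.append("NOT_" + token if negated else token)
    return marked


print(" ".join(mark_negated("I do not like this new Nokia model".split())))
# I do not NOT_like NOT_this NOT_new NOT_Nokia NOT_model
```

The feature duplication mentioned above is visible here: `like` and `NOT_like` become two unrelated entries of the bag-of-words vocabulary.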

Contextual Valence Shifters: Polanyi and Zaenen (2006) and Kennedy and Inkpen (2006) proposed a more sophisticated method that either switches (e.g. clever[+2] = not clever[-2]) or shifts (e.g. efficient[+2] = rather efficient[+1]) the initial scores of words when negated or downtoned in context. As a general rule, polarized expressions are considered to be negated if the negation words immediately precede them.

Wilson et al. (2005, 2009) distinguished the following three types of binary relationship polarity features for negation modeling:

• negation features: negation expressions that negate polar expressions;

• shifter features: expressions that can be approximately equated with downtoners;


• positive and negative polarity modification features: a special kind of polar expressions that modify other polar expressions, turning their polarity into the positive or negative one.

Semantic Composition: Moilanen and Pulman (2007) exploited the syntactic representations of sentences in a compositional semantics framework, with the purpose of computing the polarity of headlines and complex noun phrases. In this work, composition rules are incrementally applied to constituents, depending on the negation scope, by means of negation words, shifters and negative polar expressions.

A similar approach is the one of Shaikh et al. (2007), who used verb frames as a more abstract level of representation.

Choi and Cardie (2008) computed the phrase polarity in two steps: the assessment of the polarity of the constituents and the subsequent application of a set of previously defined inference rules. For example, the rule [Polarity([NP1]⁻ [IN] [NP2]⁻) = +] can be applied to phrases like [[lack]⁻NP1 [of]IN [crime]⁻NP2 in rural areas].

Liu and Seneff (2009) and Taboada et al. (2011) proposed compositional models able to mix together intensifiers, downtoners, polarity shifters and negation words, always starting from the words' prior polarities and intensities.

The work of Benamara et al. (2012) goes beyond the studies mentioned above, since they specifically account for negative polarity items and for multiple negatives.

Socher et al. (2013) exploited a Recursive Neural Tensor Network (RNTN) to compute two types of negation:

• negation of positive sentences, where the negation is supposed to change the overall sentiment of a sentence from positive to negative;

• negation of negative sentences, where the overall sentiment is considered to become less negative (but not necessarily positive).


Negation Scope Detection: Jia et al. (2009) introduced the concept of scope of the negation term t. The scope identification3 goes through the computation of a candidate scope (a subset of the words appearing after t in the sentence, which represents the minimal logical unit of the sentence containing the scope) and then a pruning of those words. The computational procedure that approximates the candidate scope considers the following elements: static delimiters, e.g. because or unless, that signal the beginning of another clause; dynamic delimiters, e.g. like and for; and heuristic rules focused on polar expressions that involve sentimental verbs, adjectives and nouns.
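As a rough illustration of the first of these elements only, the sketch below cuts the candidate scope of a negation term at the first static delimiter. It is not the procedure of Jia et al. (2009): dynamic delimiters, the heuristic rules and the pruning step are left out, and the delimiter list is limited to the two examples quoted above.

```python
# Static delimiters quoted in the text; any longer list would be an assumption.
STATIC_DELIMITERS = {"because", "unless"}


def candidate_scope(tokens, negation_index):
    """Return the tokens after the negation term at negation_index,
    stopping before the first static delimiter."""
    scope = []
    for token in tokens[negation_index + 1:]:
        if token.lower() in STATIC_DELIMITERS:
            break
        scope.append(token)
    return scope


sentence = "I do not like this phone because it is slow".split()
print(candidate_scope(sentence, sentence.index("not")))
# ['like', 'this', 'phone']
```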

Morphological Negation Modeling: in order to complete the literature survey on negation modeling, we cannot fail to mention morphological negation modeling as well. However, since the scope of any morphological negation remains confined within simple words (Councill et al. 2010), we examined this aspect of negation in Sections 3.5.1 and 3.5.4 of Chapter 3. We mention again here only the most significant contributions on this topic: Hatzivassiloglou and McKeown (1997), Plag (2003), Yuen et al. (2004), Moilanen and Pulman (2008), Ku et al. (2009), Neviarouskaya (2010), Neviarouskaya et al. (2011) and Wang et al. (2011).

4.3.2 Negation in SentIta

As exemplified in the following sentences, negation indicators do not always change a sentence polarity into its positive or negative counterpart (54); they often have the effect of increasing or decreasing the sentence score (55). That is why we prefer to talk about valence “shifting” rather than “switching”.

3 Scope identification concerns the determination, at the sentence level, of the tokens that are affected by negation cues (Jia et al. 2009).


(54) Citroen non[neg] produce auto valide[+2] [-2]
“Citroen does not produce efficient cars”

(55) Grafica non proprio[neg] spettacolare[+3] [-1]
“The graphics are not quite spectacular”

Taboada et al. (2011, p. 277) made comparable considerations:

“Consider excellent, a +5 adjective: if we negate it, we get not excellent, which intuitively is a far cry from atrocious, a -5 adjective. In fact, not excellent seems more positive than not good, which would negate to a -3. In order to capture these pragmatic intuitions, we implemented another method of negation, a polarity shift (shift negation). Instead of changing the sign, the SO value is shifted toward the opposite polarity by a fixed amount (in our current implementation, 4). Thus a +2 adjective is negated to a -2, but the negation of a -3 adjective (for instance, sleazy) is only slightly positive, an effect we could call ‘damning with faint praise’. [...] it is very difficult to negate a strongly positive word without implying that a less positive one is to some extent true, and thus our negator becomes a downtoner.”

The same holds for Benamara et al. (2012, p. 11):

“(...) negation cannot be reduced to reversing polarity. For example, if we assume that the score of the adjective “excellent” is +3, then the opinion score in “this student is not excellent” cannot be -3. It rather means that the student is not good enough. Hence, dealing with negation requires to go beyond polarity reversal.”
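The difference between polarity switching and the shift negation described in the quotation from Taboada et al. (2011) can be made concrete with a small worked example. The function names are hypothetical; the shift amount of 4 and the -5/+5 scale are the ones quoted above, and they differ from the -3/+3 scale used in SentIta.

```python
def switch_negation(so: int) -> int:
    """Polarity reversal: only the sign of the semantic orientation changes."""
    return -so


def shift_negation(so: int, amount: int = 4) -> int:
    """Shift the semantic orientation toward the opposite polarity
    by a fixed amount; a neutral value is left unchanged."""
    if so > 0:
        return so - amount
    if so < 0:
        return so + amount
    return 0


for word, so in [("excellent", 5), ("a +2 adjective", 2), ("sleazy", -3)]:
    print(word, so, "switch:", switch_negation(so), "shift:", shift_negation(so))
# excellent 5 switch: -5 shift: 1
# a +2 adjective 2 switch: -2 shift: -2
# sleazy -3 switch: 3 shift: 1
```

With the shift, not excellent stays mildly positive (+1) instead of becoming as negative as atrocious (-5), which is the intuition the quotation describes.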

Even though the subtraction of a fixed amount of polarity points4 seems to us a risky generalization, the authors confirmed Horn's ideas on the asymmetry between affirmative and negative sentences in natural language, differently from standard logic (Horn 1989).

4 Other strategies for automatic negation modeling have been proposed by Yessenalina and Cardie (2011), who combined words using iterated matrix multiplication, and by Benamara et al. (2012), who, more in line with our research, computed negation on the basis of its types.

Such an assumption complicates any attempt to automatically extract the meaning of negated phrases and sentences by simply switching their affirmative scores.

Our hypothesis, which fits easily into the compositional semantics approaches, is that the final polarity of a negated expression can be easily modulated, but only by taking into account, at the same time, both the prior polarity of the opinionated expressions and the strength of the negation indicators (see Tables 4.3, 4.5 and 4.6).

Despite this complexity, finite-state technology offered us the opportunity to easily compute the influence of negation on polarized expressions and to formalize the rules to automatically annotate real texts.

As regards the full list of negation indicators, we included in our grammar three types of negation (Benamara et al. 2012; Godard 2013), which respectively include the following items. As we will show, each type has a different impact on the opinion expressions in terms of polarity and intensity.

Negation operators

• simple adverbs of negation
no,AVV+NEGAZIONE
non,AVV+NEGAZIONE
mica,AVV+NEGAZIONE
affatto,AVV+NEGAZIONE+FORTE
neanche,AVV+NEGAZIONE
neppure,AVV+NEGAZIONE
manco,AVV+NEGAZIONE
senza,PREP+NEGAZIONE 5

• compound adverbs of negation
neanche per sogno,AVV+AVVC+PC+PN+NEGAZIONE+FORTE
per niente al mondo,AVV+AVVC+PCPC+PNPN+NEGAZIONE+FORTE
per nulla al mondo,AVV+AVVC+PCPC+PNPN+NEGAZIONE+FORTE
in nessun modo,AVV+AVVC+PDETC+PDETN+NEGAZIONE+FORTE
per niente,AVV+AVVC+PC+PAVV+NEGAZIONE+FORTE
per nulla,AVV+AVVC+PC+PAVV+NEGAZIONE+FORTE
niente affatto,AVV+AVVC+CC+AVVAVV+NEGAZIONE+FORTE 6

5 “not”, “at all”, “by no means”, “neither”

Negation operator Rules: negation operators can appear alone (56) or, with a strengthening function, together with another negation adverb (57).

(56) (Mica + Affatto + Per niente) bello l'hotel
“Totally not cool the hotel”

(57) L'hotel non è (mica + affatto + per niente) bello
“The hotel is not cool at all”

The general rules concerning negation operators, formalized in the grammars in the form of FSA, are summarized in Tables 4.3, 4.4, 4.5 and 4.6. The local grammar principle is to put in the same embedded graph all the structures that are supposed to receive the same score. Such score is indicated in the column Shifted Polarity.
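Since every structure in the same embedded graph receives the same score, the rules of Tables 4.3 to 4.6 behave like lookup tables indexed by the type of negation and the Prior Polarity of the sentiment word. The sketch below transcribes the Shifted Polarity columns into Python dictionaries. It only illustrates the mapping and does not reproduce the NooJ grammars; the key names (`plain`, `double`, `strong`, `weak`) and the function name are assumptions, as is the choice of returning the prior polarity unchanged when a table has no row for it.

```python
# Shifted Polarity columns of Tables 4.3-4.6, indexed by Prior Polarity.
NEGATION_RULES = {
    "plain":  {3: -1, 2: -2, 1: -2, -1: 1, -2: 1, -3: -1},         # Table 4.3 (non)
    "double": {3: -2, 2: -3, 1: -3, -1: 2, -2: 2, -3: 1},          # Table 4.4 (non ... mica)
    "strong": {3: -2, 2: -3, 1: -3, -1: 2, -2: 2, -3: 1},          # Table 4.5 (per niente)
    "weak":   {3: -1, 2: -1, 1: -1, 0: -1, -1: 1, -2: 1, -3: -1},  # Table 4.6 (poco)
}


def shifted_polarity(negation_type: str, prior: int) -> int:
    """Look up the shifted polarity of a negated sentiment word.
    Assumption: priors without a row in the table are returned unchanged."""
    return NEGATION_RULES[negation_type].get(prior, prior)


# Example (54): non + valide[+2]
print(shifted_polarity("plain", 2))   # -2
# Weak negation of a neutral word, as in Table 4.6: poco + nuovo[0]
print(shifted_polarity("weak", 0))    # -1
```

The tables make the asymmetry discussed above explicit: negating a +3 word with non yields -1, not -3.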

Negative Quantifiers

• Adjectives
nessuno,A+FLX=A114+NEGAZIONE
niente,A+FLX=A605+NEGAZIONE 7

6 “no way”, “for anything in the world”, “there's no way”, “for nothing”, “not at all”
7 “nobody”, “nothing”


Negation Operator   Sentiment Word   Word Polarity   Shifted Polarity
non                 fantastico       +3              -1
                    bello            +2              -2
                    carino           +1              -2
                    scialbo          -1              +1
                    brutto           -2              +1
                    orribile         -3              -1

Table 4.3: Negation rules

Negation Operator   Negation Operator   Sentiment Word   Word Polarity   Shifted Polarity
non                 mica                fantastico       +3              -2
                                        bello            +2              -3
                                        carino           +1              -3
                                        scialbo          -1              +2
                                        brutto           -2              +2
                                        orribile         -3              +1

Table 4.4: Negation rules with the repetition of negative operators

Strong Negation Operator   Sentiment Word   Word Polarity   Shifted Polarity
per niente                 fantastico       +3              -2
                           bello            +2              -3
                           carino           +1              -3
                           scialbo          -1              +2
                           brutto           -2              +2
                           orribile         -3              +1

Table 4.5: Negation rules with strong operators


Weak Negation Operator   Sentiment Word   Word Polarity   Shifted Polarity
poco                     fantastico       +3              -1
                         bello            +2              -1
                         carino           +1              -1
                         nuovo             0              -1
                         scialbo          -1              +1
                         brutto           -2              +1
                         orribile         -3              -1

Table 4.6: Negation rules with weak operators

• Adverbs
niente,AVV+NEGAZIONE
nulla,AVV+NEGAZIONE
poco,AVV+NEGAZIONE+DEB
poco o niente,AVV+PCONG+NKN+NEGAZIONE+DEB
poco o nulla,AVV+PCONG+NKN+NEGAZIONE+DEB
mai,AVV+NEGAZIONE 8

• Pronouns
nessuno,PRON+FLX=PRO717+NEGAZIONE
niente,PRON+FLX=PRO719+NEGAZIONE
poco,PRON+FLX=PRO721+NEGAZIONE+DEB 9

Negative Quantifiers Rules: as indicated in the dictionary extract above and as exemplified in the sentences below, negative quantifiers, which express both a negation and a quantification, can take the shape of different parts of speech and, consequently, can occupy different syntactic positions in sentences.

We can directly observe that they change their function on the basis of the different roles they assume.

Negative quantifiers as Adjectives: they tend to be pragmatically negative when occurring as adjectival modifiers (58, 59), just in the same way as lexical negation indicators (Potts 2011).

(58) Cortesia nessuna [-2]
“No kindness”

(59) Nessun servizio nelle stanze [-2]
“No room service”

8 “nothing”, “little”, “little or nothing”, “never”
9 “not one”, “few”

Negative quantifiers as Adverbs: they work exactly as poco when they occupy the adverbial position (60, 61), following the rules formalized in the next paragraph (see Table 4.6). As adverbs, they can occur together with negation operators (61) or not (60).

(60) Costa quasi niente [+1]
“It costs almost nothing”

(61) Non costa nulla [+2]
“It doesn't cost anything”

Negative quantifiers as Pronouns: they seem to possess the same strengthening function of negation operators when occurring as pronouns (62, 63); the rules of reference are in Table 4.5.

(62) Non lo consiglio a nessuno [-3]
“I don't recommend it to anyone”

(63) Non gliene frega niente a nessuno [-3]
“Nobody cares”

Negative quantifiers in verbless sentences: in elliptical sentences (64, 65) they resemble the negation function of negation operators, following the rules of Table 4.3.

(64) Niente di veramente innovativo [-1]
“Nothing truly innovative”

(65) Nulla da eccepire [+1]
“No objections”


Lexical Negation

• Nouns: (mancanza + assenza + carenza) di
• Verbs: (mancare + difettare + scarseggiare) di
• Adjectives: (mancante + carente + privo) di 10

Lexical negation Rules: lexical negation regards content word negators, which always possess an implicit negative influence. Therefore, when they occur in texts, their polarity is assumed to be negative.

Because they can co-occur with both negation operators and negative quantifiers, their polarity can also be shifted to positive by simply applying the negation rules conceived for the first two types of negation indicators.

4.4 Modality Detection

4.4.1 Related Works

Speculative Language Detection (Dalianis and Skeppstedt 2010; Desclés et al. 2010; Vincze et al. 2008), Hedge Detection (Lakoff 1973; Ganter and Strube 2009; Zhao et al. 2010), Irrealis Blocking (Taboada et al. 2011) and Uncertainty Detection (Rubin 2010) are all tasks that completely or partially overlap with the treatment of modal constructions.

10 “lack”, “to lack in”, “to be lacking in”

Among the most popular annotated corpora in these NLP fields, we mention the BioScope corpus (Vincze et al. 2008), which has been annotated with negation and speculation cues and their scopes; FactBank, which has been annotated with event factuality (Saurí and Pustejovsky 2009); and the CoNLL 2010 Shared Task (Farkas et al. 2010), which focused on hedge detection in natural language texts.

Modality has been organized by Kobayakawa et al. (2009) into a taxonomy that includes request, recommendation, desire, will, judgment, etc.

Taboada et al. (2011) did not consider the words in the scope of irrealis markers to be reliable for the purposes of sentiment analysis. Therefore, they simply ignored the semantic orientation of any word in their scope. Their list of irrealis markers includes modals, conditional markers (e.g. “if”), negative polarity items (e.g. “any” and “anything”), some verbs (e.g. “to expect”, “to doubt”), questions and words enclosed in quotes (which may not necessarily reflect the author's opinion).

According to Benamara et al. (2012), modality can be used to express possibility, necessity, permission, obligation or desire through grammatical cues, such as adverbial phrases (e.g. “maybe”, “certainly”), conditional verbal moods, some verbs (e.g. “must”, “can”, “may”) and some adjectives and nouns (e.g. “a probable cause”).

Relying on the categories of Larreya (2004) and Portner (2009), they distinguished the following three types of modality, each of which has been shown to have a specific effect on the opinion expression in its scope, in terms of strength or degree of certainty:

Buletic Modality: it indicates the speaker's desires/wishes.
Cues: verbs denoting hope (e.g. “to wish”).

Epistemic Modality: it refers to the speaker's personal beliefs and affects the strength and the certainty degree of opinion expressions.
Cues: doubt, possibility or necessity adverbs (e.g. “perhaps”, “definitely”) and modal verbs (e.g. “have to”, “may/can”).

Deontic Modality: it indicates possibility/impossibility or obligation/permission.
Cues: the same modal verbs of the epistemic modality, but with a deontic reading (e.g. “You must go see the movie”).

4.4.2 Modality in SentIta

When computing the Prior Polarities of the SentIta items in their textual context, we considered that modality can also have a significant impact on the SO of sentiment expressions.

In line with the trends in the literature, but without specifically focusing on the modality categories of Benamara et al. (2012), we referenced in the FSA dedicated to modality the following linguistic cues, and we made them interact with the SentIta expressions:

Sharpening and Softening Adverbs

• simple adverbs (32 sharp and 12 soft)
inequivocabilmente,AVV+FORTE+Focus=SHARP
necessariamente,AVV+FORTE+Focus=SHARP
relativamente,AVV+DEB+Focus=SOFT
suppergiù,AVV+DEB+Focus=SOFT

• compound adverbs (38 sharp and 26 soft)
con ogni probabilità,AVV+AVVC+PDETC+PDETN+Focus=SHARP
senza alcun dubbio,AVV+AVVC+PDETC+PDETN+Focus=SHARP
fino a un certo punto,AVV+AVVC+PAC+PDETAGGN+Focus=SOFT
in qualche modo,AVV+AVVC+PDETC+PDETN+Focus=SOFT

Modal Verbs: from the LG class 56 (dovere “have to”, potere “may/can”, volere “want”), which have as definitional structure N0 V (E + Prep) Vinf W and accept in position N1 a direct or prepositional infinitive clause (all of them) and, only in the cases of potere and volere, an N1-um.

Conditional and Imperfect Tenses: tenses are located after the text preprocessing, through the annotations Tempo=IM and Tempo=C.
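The dictionary entries listed above for the sharpening and softening adverbs pair a lemma with a category and a list of `+`-separated traits. The following sketch shows how such an entry could be split into its parts outside NooJ. It assumes the `lemma,CATEGORY+trait+Key=Value` layout, with a comma between lemma and category; the function name and the returned dictionary are hypothetical and are not part of NooJ.

```python
def parse_entry(line: str) -> dict:
    """Split a dictionary line of the assumed form
    'lemma,CATEGORY+FLAG+Key=Value' into its components."""
    lemma, _, info = line.strip().partition(",")
    category, *traits = info.split("+")
    flags, attributes = set(), {}
    for trait in traits:
        if "=" in trait:
            key, value = trait.split("=", 1)
            attributes[key] = value      # e.g. Focus=SOFT
        else:
            flags.add(trait)             # e.g. DEB, FORTE, NEGAZIONE
    return {"lemma": lemma, "category": category,
            "flags": flags, "attributes": attributes}


entry = parse_entry("relativamente,AVV+DEB+Focus=SOFT")
print(entry["lemma"], entry["category"], entry["flags"], entry["attributes"])
# relativamente AVV {'DEB'} {'Focus': 'SOFT'}
```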

Dealing with modality is very challenging in the Sentiment Analysis field because, in this research area, it is not enough to simply detect its indicators: it is also necessary to manage its effects in terms of polarity. We did not find it appropriate to simplify the problem by ignoring the semantic orientation of all the phrases and sentences in which modality can be detected (Taboada et al. 2011). In fact, as exemplified below, there are cases in which modality noticeably affects the intensity of a sentence and other cases in which it regularly shifts the sentence polarity in specific directions.

Uncertainty Degrees: the words annotated with the labels +SHARP and +SOFT described above, just like intensifiers and downtoners, have the power to modify the intensity of the words endowed with positive or negative prior polarities, with the same effects described in Section 4.2. We selected them, respectively, among the words tagged with the labels FORTE and DEB, with the sole aim of providing an exhaustive module that can work well for both Modality Detection and Sentiment Analysis purposes.

“Potere” + Indicative Imperfect + Oriented Item: modal verbs occurring in the imperfect tense can turn a sentence into a weakly negative one (66) when combined with positive words, or into a weakly positive one when occurring with negative items (67). The effect is very close to weak negation (see Table 4.6 in Section 4.3). This can be explained by the fact that the meaning of sentences of this kind always concerns subverted expectations.

(66) Poteva[Modal+IM] essere una trama interessante[+2] [-1]
“It could have been an interesting plot”

(67) Poteva[Modal+IM] essere una trama noiosa[-2] [+1]
“It could have been a boring plot”

“Potere” + Indicative Imperfect + Comparative + Oriented Items: in this case too, we resort to the explanation of subverted expectations. The terms of comparison are, obviously, the expected opinion object and the real one. If the expected object is better than the real one, the sentence score is negative (68); if it is worse, the sentence score is positive (69). The rules for the polarity score attribution are the ones displayed in Section 4.5.

(68) Poteva[Modal+IM] essere più[I-CW] dettagliata[+2] [-1]
“It might have been more detailed”

(69) Poteva[Modal+IM] andare peggio[I-OpW -2] [+1]
“It might have gone worse”

“Dovere” + Indicative Imperfect: here again we face the disappointment of expectations, but with a stronger effect: the negation does not regard the possibility of the opinion object being good, but a necessity. Therefore, we assumed the final score in these cases to follow the negation rules shown in Table 4.3 in Section 4.3 (70).

(70) Questo doveva[Modal+IM] essere un film di sfumature[+1] [-2]
“This one was supposed to be a nuanced movie”

“Dovere” + “Potere” + Past Conditional: past conditional sentences (Narayanan et al. 2009) also have a particular impact on the opinion polarity, especially with the modal verb dovere (“have to”). In such cases, as exemplified in (71), the sentence polarity is always negative in our corpus.

(71) Non[Negation] avrei[Aux+C] dovuto[Modal+PP] buttare via i miei soldi [-2]
“I should not have burnt my money”

The verb potere, instead, follows the same rules described above for the imperfect tense.
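The two imperfect-tense rules described above reduce to reusing negation tables: potere in the imperfect behaves like weak negation (Table 4.6), dovere in the imperfect like plain negation (Table 4.3). The sketch below covers only these two cases, with the values transcribed from those tables; the function name is hypothetical, and the comparative and past conditional patterns are deliberately left out.

```python
# Shifted polarities transcribed from Table 4.6 (weak negation)
# and Table 4.3 (plain negation), indexed by Prior Polarity.
WEAK_NEGATION = {3: -1, 2: -1, 1: -1, 0: -1, -1: 1, -2: 1, -3: -1}
PLAIN_NEGATION = {3: -1, 2: -2, 1: -2, -1: 1, -2: 1, -3: -1}


def modal_imperfect_score(modal: str, prior: int) -> int:
    """Score of 'modal[IM] + oriented item' for potere and dovere."""
    if modal == "potere":
        return WEAK_NEGATION[prior]
    if modal == "dovere":
        return PLAIN_NEGATION[prior]
    raise ValueError(f"no imperfect-tense rule sketched for {modal!r}")


print(modal_imperfect_score("potere", 2))    # example (66): -1
print(modal_imperfect_score("potere", -2))   # example (67): +1
print(modal_imperfect_score("dovere", 1))    # example (70): -2
```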

4.5 Comparative Sentences Mining

4.5.1 Related Works

Comparative Sentence Mining and Comparison Mining Systems are NLP techniques and tools that coincide only in part with the Sentiment Analysis field.

Sentences that express a comparison generally carry along with them opinions about two or more entities, with regard to their shared features or attributes (Ganapathibhotla and Liu 2008). While the main purpose of Sentiment Analysis is to identify details about such opinions, the main goals of Information Extraction regard comparative sentence detection and classification. Therefore, although the object of the analysis and the basic information units are shared between the two approaches, their purposes sometimes coincide and sometimes differ.

Among the few studies on the computational treatment of comparison (Jindal and Liu 2006a,b; Ganapathibhotla and Liu 2008; Fiszman et al. 2007; Yang and Ko 2011), only Ganapathibhotla and Liu (2008) tried to extract the authors' preferred entities and to identify the Semantic Orientation of comparative sentences after their identification and classification.

With the following two examples, Ganapathibhotla and Liu (2008) showed the difficulty of determining the polarity of context-dependent opinion comparatives, which always require domain knowledge for a correct interpretation: e.g. the word “longer” can assume a positive (72) or a negative (73) orientation, depending on the product feature with which it is associated.

(72) “The battery life of Camera X is longer than Camera Y”
(73) “Program X's execution time is longer than Program Y”

Since this work aims to provide a general-purpose lexical and grammatical database for Sentiment Analysis and Opinion Mining, we did not address here the treatment of this special kind of opinionated sentences, just as we did not include in the lexicon items with domain-dependent meaning.

Concerning the comparative classification, the main types that have been identified in the literature are the following:

• Gradable (direct comparison)
  – Non-equal
    · Increasing
    · Decreasing
  – Equative
  – Superlative

• Non-gradable (implicit comparisons)

According to Ganapathibhotla and Liu (2008) and Jindal and Liu (2006a,b), a comparative relation can be represented in the following way:

(ComparativeWord, Features, EntityS1, EntityS2, Type)

where ComparativeWord is the keyword used to express a comparative relation in a sentence, Features is a set of features being compared, EntityS1 is the first term of comparison and EntityS2 is the second one.

This way, examples (72) and (73) are represented as follows:

(72) (longer, battery life, Camera X, Camera Y, Non-equal Gradable)

(73) (longer, execution time, Program X, Program Y, Non-equal Gradable)
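The five-slot representation above maps naturally onto a small record type. The sketch below is only an illustration of the data structure: the class name, the field names and the use of a tuple for the feature set are choices made here, not part of the cited works.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ComparativeRelation:
    comparative_word: str        # keyword expressing the comparison
    features: Tuple[str, ...]    # features being compared
    entity_s1: str               # first term of comparison
    entity_s2: str               # second term of comparison
    relation_type: str           # e.g. "Non-equal Gradable"


example_72 = ComparativeRelation(
    "longer", ("battery life",), "Camera X", "Camera Y", "Non-equal Gradable")
example_73 = ComparativeRelation(
    "longer", ("execution time",), "Program X", "Program Y", "Non-equal Gradable")

# The two relations share the comparative word and the type, and differ only
# in the feature: this is why "longer" alone cannot decide the polarity.
print(example_72.comparative_word == example_73.comparative_word)  # True
print(example_72.features, example_73.features)
```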

Our work shares with Ganapathibhotla and Liu (2008) the following basic rules:

Increasing comparative + Negative item → Negative opinion → EntityS2 is preferred

Increasing comparative + Positive item → Positive opinion → EntityS1 is preferred

Decreasing comparative + Negative item → Positive opinion → EntityS1 is preferred

Decreasing comparative + Positive item → Negative opinion → EntityS2 is preferred
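The four basic rules above can be condensed into a single test: the opinion is positive, and EntityS1 is preferred, exactly when the direction of the comparative and the polarity of the item agree (increasing with positive, decreasing with negative). A minimal sketch, with a hypothetical function name and string labels chosen for the example:

```python
def comparative_opinion(direction: str, item_polarity: str):
    """Return (opinion, preferred entity) for a comparative direction
    ('increasing' or 'decreasing') and an item polarity
    ('positive' or 'negative')."""
    if direction not in ("increasing", "decreasing"):
        raise ValueError(f"unknown direction: {direction!r}")
    if item_polarity not in ("positive", "negative"):
        raise ValueError(f"unknown polarity: {item_polarity!r}")
    agree = (direction == "increasing") == (item_polarity == "positive")
    return ("positive", "EntityS1") if agree else ("negative", "EntityS2")


for direction in ("increasing", "decreasing"):
    for polarity in ("negative", "positive"):
        print(direction, polarity, comparative_opinion(direction, polarity))
```

This binary outcome ignores intensity, which is the limitation the SentIta rules address by working with graded scores.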

However, differently from our work, the authors simply switched the polarity of the opinions (and consequently the preferred entity) in the cases in which comparative sentences contained negation words or phrases. They underlined that sometimes this operation could appear problematic, e.g. “not longer” does not mean “shorter”.

Actually, the problem is that the authors did not take into account different degrees of positivity and negativity. As we will show in the next paragraphs of this Section, the co-occurrences of weakly positive/negative and intensely positive/negative Prior Polarities with comparison indicators can generate different final sentence scores.

As regards the automatic analysis of superlatives, we mention the work of Bos and Nissim (2006), who grounded the semantic interpretation of superlatives on the characterization of a comparison set (the set of entities that are compared to each other with respect to a certain dimension). The authors took into consideration the superlative modification performed by ordinals, cardinals or adverbs (e.g. intensifiers or modals) and also the fact that the superlative can manifest itself both in predicative and in attributive form.

Their superlative parser, able to recognise a superlative expression and its comparison set with high accuracy, provides a Combinatory Categorial Grammar (CCG) (Clark and Curran 2004) derivation of the input sentence, which is then used to construct a Discourse Representation Structure (DRS) (Kamp and Reyle 1993).

In this thesis we did not go through a semantic and logical analysis of comparative and superlative sentences; as explained in the next paragraph, we just used them as CVS in order to accurately modify the polarity of the entities they involve.

4.5.2 Comparison in SentIta

In general, this work, which focuses on Gradable comparison, includes the analysis of the following comparative structures:

• equative comparative frozen sentences of the type N0 Agg come C1 from the LG class PECO (see Section 3.4.2);

• opinionated comparative sentences that involve the expressions meglio di, migliore di “better than”, peggio di, peggiore di “worse than”, superiore a “superior to”, inferiore a “less than” (74);

• adjectival, adverbial and nominal comparative structures that respectively involve oriented adjectives, adverbs and nouns from SentIta (75);

• absolute superlative (75);

• comparative superlative (76).

The comparison with other products (74) has been evaluated with the same measures as the other sentiment expressions, so the polarity can range from -3 to +3.

The superlative, which can imply (76) or not (75) a comparison with a whole class of items (Farkas and Kiss 2000), confers on the first term of the comparison the highest polarity score, so it always increases the strength of the opinion. Its polarity can be -3 or +3.

(74) L'S3 è complessivamente superiore all'Iphone5 [+2]
“The S3 is on the whole superior to the iPhone5”

(75) Il suo motore era anche il più brioso[+2] [+3]
“Its engine was also the most lively”

(76) Un film peggiore di qualsiasi telefilm [-3]
“A film worse than any TV series”

We did not include the equative comparative in the Contextual Valence Shifters Section because it does not change the polarity expressed by the opinion words (77).

(77) [Non[Negation] entusiasmante[+3]][-1] come i PES degli anni precedenti
“Not as exciting as the PES of the past years”

In the next paragraphs we will go through the rules used in this FSA network to compute the Semantic Orientations of decreasing (minority) and increasing (majority) comparative structures. As will be observed, the variety and the complexity of the co-occurrence rules can be easily managed using finite-state technology.

4.5.3 Interaction with the Intensification Rules

Rule I: increasing comparative sentences which occur with positive items from SentIta generate a positive opinion.
Comparative indicators are the following:

• Increasing Comparative Words (I-CW)
e.g. più “more”

• Intensifiers + Increasing Comparative Words (Int + I-CW)
e.g. molto più “a lot more”

The sentence score depends on the interaction of I-CWs, intensifiers and the prior polarity scores attributed to the SentIta words in the electronic dictionaries (see Table 4.7).

Rule II: increasing comparative sentences with increasing opinionated comparative items, listed in SentIta with positive Prior Polarities, generate a positive opinion.
Comparative indicators are the following:

• Increasing opinionated comparative words (I-Op-CW)
e.g. meglio “better” = +2

• Intensifiers + increasing opinionated comparative words (Int + I-Op-CW)
e.g. molto meglio “a lot better” = +3

The sentence score just depends on the Prior Polarity of the I-Op-CW.

                                         Positive Prior Polarities
Increasing comparative     Examples      carino   buono   prodigioso
                                         +1       +2      +3
                                         Entity 1 Polarities
I-CW + OpW[POS]            più           +1       +2      +3
Int + I-CW + OpW[POS]      molto più     +2       +3      +3

Table 4.7: Rule I: increasing comparative sentences with positive orientation

                                         Positive Prior Polarities
Decreasing comparative     Examples      carino   buono   prodigioso
                                         +1       +2      +3
                                         Entity 1 Polarities
D-CW + OpW[POS]            meno          -1       -2      -1
Int + D-CW + OpW[POS]      molto meno    -2       -3      -1

Table 4.8: Rule III: decreasing comparative sentences with negative orientation

Rule III: decreasing comparative sentences which occur with positive items from SentIta generate a negative opinion (see Table 4.8).
Comparative indicators are the following:

• Decreasing Comparative Words (D-CW)
e.g. meno “less”

• Intensifiers + Decreasing Comparative Words (Int + D-CW)
e.g. molto meno “much less”

                                         Negative Prior Polarities
Increasing comparative     Examples      distratto   cafone   osceno
                                         -1          -2       -3
                                         Entity 1 Polarities
I-CW + OpW[NEG]            più           -1          -2       -3
Int + I-CW + OpW[NEG]      molto più     -2          -3       -3

Table 4.9: Rule IV: increasing comparative sentences with negative orientation

                                         Negative Prior Polarities
Decreasing comparative     Examples      distratto   cafone   osceno
                                         -1          -2       -3
                                         Entity 1 Polarities
D-CW + OpW[NEG]            meno          +1          +1       +1
Int + D-CW + OpW[NEG]      molto meno    +2          +2       +1

Table 4.10: Rule VI: decreasing comparative sentences with positive orientation

Rule IV: this rule mirrors Rule I exactly. Increasing comparative sentences which occur with negative items from SentIta generate a negative opinion.
The sentence score depends again on the interaction of I-CWs, intensifiers and the prior polarity scores attributed to the SentIta words that appear in the same sentence (see Table 4.9).

Rule V: decreasing comparative sentences with decreasing opinionated comparative items listed in SentIta with negative Prior Polarities generate a negative opinion.
Comparative indicators are the following:

• Decreasing opinionated comparative words (I-Op-CW), e.g. peggio “worse” = -2;

• Intensifiers + decreasing opinionated comparative words (Int + I-Op-CW), e.g. molto peggio “much worse” = -3.

Just as in Rule II, the sentence score depends only on the Prior Polarity of the I-Op-CWs.

Rule VI: decreasing comparative sentences which occur with negative items from SentIta generate a positive opinion (see Table 4.10).
Rule VI shares its comparative indicators with Rule III.

4.5.4 Interaction with the Negation Rules

Negation of Rule I: increasing comparative sentences which occur with positive items from SentIta, when negated, generate a negative opinion.
Comparative indicators are, of course, the same as in Rule I, with the addition of negation indicators (N), e.g. non “not”. The sentence score is always turned into the weakly negative one (see Table 4.11).

                                              Positive Prior Polarities
Increasing comparative         Examples        carino    buono    prodigioso
                                               +1        +2       +3
                                               Entity 1 Polarities
N + I-CW + OpW[POS]            non più         -1        -1       -1
N + Int + I-CW + OpW[POS]      non molto più   -1        -1       -1

Table 4.11: Negation of Rule I, negated increasing comparative sentences with negative orientation

                                              Positive Prior Polarities
Decreasing comparative         Examples         carino    buono    prodigioso
                                                +1        +2       +3
                                                Entity 1 Polarities
N + D-CW + OpW[POS]            non meno         +1        +2       +3
N + Int + D-CW + OpW[POS]      non molto meno   +1        +1       +2

Table 4.12: Negation of Rule III, negated decreasing comparative sentences with positive orientation

                                              Negative Prior Polarities
Increasing comparative         Examples        distratto    cafone    osceno
                                               -1           -2        -3
                                               Entity 1 Polarities
N + I-CW + OpW[NEG]            non più         +1           +1        -1
N + Int + I-CW + OpW[NEG]      non molto più   +1           +1        -2

Table 4.13: Negation of Rule IV, negated increasing comparative sentences with negative orientation

                                              Negative Prior Polarities
Decreasing comparative         Examples         distratto    cafone    osceno
                                                -1           -2        -3
                                                Entity 1 Polarities
N + D-CW + OpW[NEG]            non meno         -1           -2        -3
N + Int + D-CW + OpW[NEG]      non molto meno   -1           -2        -3

Table 4.14: Negation of Rule VI, negated decreasing comparative sentences with positive orientation

Negation of Rule II: increasing comparative sentences in which increasing opinionated comparative items listed in SentIta with positive Prior Polarities occur always generate, when negated, a negative opinion.
Comparative indicators are the same as in Rule II, preceded by a negation marker:

• Negation markers + Increasing Opinionated Comparative Words (N + I-Op-CW), e.g. non meglio “not better” = -2;

• Negation markers + Intensifiers + Increasing Opinionated Comparative Words (N + Int + I-Op-CW), e.g. non molto meglio “not much better” = -1.


Negation of Rule III: the negation of Rule III, which involves decreasing comparative sentences with positive items from SentIta, always generates positive opinions.
The different sentence scores, computed on the basis of the interaction of comparative words and SentIta’s Prior Polarities, are reported in Table 4.12.

Negation of Rule IV: increasing comparative sentences which occur with negative items from SentIta generate, when negated, positive opinions in most cases (see Table 4.13). The sentence score is influenced by the interaction of negation markers, I-CWs, intensifiers and negative prior polarity scores.

Negation of Rule V: decreasing comparative sentences in which increasing opinionated comparative items listed in SentIta with negative Prior Polarities occur generate, when negated, a weakly negative opinion (non (molto + E) peggio “not (much + E) worse”).

Negation of Rule VI: decreasing comparative sentences which occur with negative items from SentIta generate negative opinions if negated (see Table 4.14).
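The interaction between comparative words, intensifiers, negation and Prior Polarities described by Rules I, III, IV and VI and by their negations can be read as a plain lookup. The following is a minimal sketch, not the Nooj grammar itself: the scores are copied from Tables 4.7 to 4.14, while the dictionary layout, the function name and its parameters are hypothetical and introduced only for illustration. The opinionated comparatives of Rules II and V (meglio, peggio) are not covered, since their score depends only on their own Prior Polarity.

```python
# Scores copied from Tables 4.7-4.14. The data layout and names below are
# illustrative assumptions, not part of SentIta or Nooj.
# Key: (negated, intensified, comparative direction, sign of the prior polarity)
# Value: sentence scores for prior polarities of strength 1, 2 and 3.
COMPARATIVE_SCORES = {
    (False, False, "increasing", +1): (+1, +2, +3),  # Table 4.7
    (False, True,  "increasing", +1): (+2, +3, +3),
    (False, False, "decreasing", +1): (-1, -2, -1),  # Table 4.8
    (False, True,  "decreasing", +1): (-2, -3, -1),
    (False, False, "increasing", -1): (-1, -2, -3),  # Table 4.9
    (False, True,  "increasing", -1): (-2, -3, -3),
    (False, False, "decreasing", -1): (+1, +1, +1),  # Table 4.10
    (False, True,  "decreasing", -1): (+2, +2, +1),
    (True,  False, "increasing", +1): (-1, -1, -1),  # Table 4.11
    (True,  True,  "increasing", +1): (-1, -1, -1),
    (True,  False, "decreasing", +1): (+1, +2, +3),  # Table 4.12
    (True,  True,  "decreasing", +1): (+1, +1, +2),
    (True,  False, "increasing", -1): (+1, +1, -1),  # Table 4.13
    (True,  True,  "increasing", -1): (+1, +1, -2),
    (True,  False, "decreasing", -1): (-1, -2, -3),  # Table 4.14
    (True,  True,  "decreasing", -1): (-1, -2, -3),
}


def comparative_score(prior, direction, intensified=False, negated=False):
    """Return the sentence score for a comparative built on an opinion word.

    prior      -- prior polarity of the opinion word, in -3..-1 or +1..+3
    direction  -- "increasing" (più) or "decreasing" (meno)
    """
    if prior == 0 or abs(prior) > 3:
        raise ValueError("prior polarity must be in -3..-1 or +1..+3")
    sign = 1 if prior > 0 else -1
    row = COMPARATIVE_SCORES[(negated, intensified, direction, sign)]
    return row[abs(prior) - 1]


# "più buono" (buono = +2) and "molto meno buono"
print(comparative_score(+2, "increasing"))
print(comparative_score(+2, "decreasing", intensified=True))
```

Keeping the scores in a table rather than in arithmetic makes the irregular cells (for instance the -1 assigned to meno prodigioso in Table 4.8) explicit instead of hiding them behind a formula.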

Chapter 5

Specific Tasks of the Sentiment Analysis

5.1 Sentiment Polarity Classification

Unlike traditional topic-based text classification, sentiment classification aims to categorize documents through the classes positive and negative. The objective can be a simple binary classification, or it can follow a more complex classification with a larger number of classes in the continuum between positive and negative (Pang and Lee, 2008a). The rating-inference problem (Pang and Lee, 2005; Leung et al., 2006, 2008; Shimada and Endo, 2008) regards the mapping of automatically detected document polarity scores onto the fine-grained rating scales (e.g. 5 or 10 points) of existing Collaborative Filtering algorithms (Breese et al., 1998; Herlocker et al., 1999; Sarwar et al., 2001; Linden et al., 2003). What makes sentiment classification challenging is the fact that the sentiment classes, far from being unrelated to one another, represent opposing (binary classification) or ordinal categories (multi-class and rating-inference).


5.1.1 Related Works

Many techniques have been discussed in the literature to perform Sentiment Polarity Classification. In Section 3.2 we divided them into lexicon-based methods, learning methods and hybrid methods.¹
The first, which calculate the sentiment orientation of sentences and documents starting from the polarity of the words and phrases they contain, are the methods that have been chosen for the present work. This is why they received an in-depth analysis in Chapter 3, where we also discussed the pros and cons of both the manual and the automatic drafting of sentiment lexical resources. In this respect, we mention again the major contributions related to knowledge-based solutions: Landauer and Dumais (1997), Hatzivassiloglou and McKeown (1997), Wiebe (2000), Turney (2002), Turney and Littman (2003), Riloff et al. (2003), Hu and Liu (2004), Kamps et al. (2004), Vermeij (2005), Gamon and Aue (2005), Esuli and Sebastiani (2005, 2006a), Kanayama and Nasukawa (2006a), Benamara et al. (2007), Kaji and Kitsuregawa (2007), Rao and Ravichandran (2009), Mohammad et al. (2009), Neviarouskaya et al. (2009a), Velikovich et al. (2010), Taboada et al. (2006, 2011).
Rule-based approaches that take into account the syntactic dimension of Sentiment Analysis are the ones used by Mulder et al. (2004), Moilanen and Pulman (2007) and Nasukawa and Yi (2003).
The most used classification algorithms in learning and statistical methods are Support Vector Machines, which, trained on specific datasets, map a space of unstructured data points onto a new structured space through a kernel function, and Naïve Bayes, which are probabilistic classifiers that assume the presence/absence of a class feature to be independent of the absence/presence of the other features, given the class variable. Effective sentiment features exploited in supervised machine learning applications are terms, their frequencies and TF-IDF, parts of speech, sentiment words and phrases, sentiment shifters, syntactic dependency, etc.
Tan et al. (2009) and Kang et al. (2012) adapted Naïve Bayes algorithms for sentiment analysis.
Pang et al. (2002), in order to test the hypothesis that sentiment classification can be treated just as a special case of topic-based categorization, with the positive and negative sentiment considered as topics, used SVM, Naïve Bayes and Maximum Entropy classifiers with diverse sets of features.
Mullen and Collier (2004) employed the values potency, activity and evaluative from a variety of sources and created a feature space classified by means of SVM.
Ye et al. (2009) compared SVM with other statistical approaches, showing that SVM and n-gram outperform the Naïve Bayes approach.
The minimum-cut framework working on graphs has been used by Pang and Lee (2004).
Bespalov et al. (2011) performed sentiment classification via latent n-gram analysis.
Nakagawa et al. (2010) presented a dependency tree-based classification method grounded on conditional random fields.
Compositionality algorithms have been tested by Yessenalina and Cardie (2011) and Socher et al. (2013). Socher et al. (2013) proposed the Recursive Neural Tensor Network, able to compute compositional vector representations for phrases of variable length or syntactic kind, obtaining better performance than standard recursive neural networks, matrix-vector recursive neural networks, Naïve Bayes, bi-gram Naïve Bayes and SVM. Their output is the annotated corpus Stanford Sentiment Treebank, which represents a precious resource for the analysis of the compositional effects of sentiment in language.
Sentiment indicators can be used for sentiment classification in an unsupervised manner: this is the case of Turney (2002) and the other lexicon-based methods (Liu, 2012).
Finally, as regards the hybrid methods, we mention the works of Andreevskaia and Bergler (2008), who integrated the accuracy of a corpus-based system with the portability of a lexicon-based system in a single ensemble of classifiers; Aue and Gamon, who combined nine SVM-based classifiers; Dasgupta and Ng (2009), who first mined the unambiguous reviews through spectral techniques and afterwards exploited them to classify the ambiguous reviews by combining active learning, transductive learning and ensemble learning; Goldberg and Zhu (2006), who presented an adaptation of graph-based learning algorithms for rating-inference problems; and Prabowo and Thelwall (2009), who combined rule-based and machine learning classification algorithms.

¹ In general, while hybrid methods are characterized by the application of classifiers in sequence, approaches based on rules consist of an if-then relation between an antecedent and its associated consequent (Prabowo and Thelwall, 2009).

5.1.2 DOXA: a Sentiment Classifier based on FSA

Using the command-line program noojapply.exe, we built Doxa, a prototype written in Java by which users can automatically apply the Nooj lexicon-grammatical resources from Section 3 and the rules from Section 4 to any kind of text, and get back statistics on the opinions expressed in each case.
Doxa sums up the values corresponding to every sentiment expression and then normalizes the result by the total number of sentiment expressions contained in each review. Neutral sentences are simply ignored by both the FSA and Doxa.
The automatically attributed polarity scores are compared with the stars that the opinion holder gave to his review. Then, through the statistics about the opinions expressed in every domain, Doxa tests the consistency of the sentiment lexical databases and tools with respect to the rating-inference problem, in each document and in the whole corpus.
Figure 5.1 reports an example of the analysis on the domain of hotel reviews.


Figure 5.1: Doxa, the LG sentiment classifier based on the finite-state technology
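The scoring step described above (sum of the expression values, divided by the number of sentiment expressions) can be sketched as follows. This is a minimal illustration in Python rather than the Java code of Doxa, which is not reproduced in this work; the function names are hypothetical, and the input is assumed to be the list of scores already annotated by the Nooj grammars for one review.

```python
def document_score(expression_scores):
    """Average the scores of the sentiment expressions found in one review.

    Neutral items (score 0) are ignored, mirroring the description of Doxa.
    Returns None when no sentiment expression was annotated.
    """
    polar = [score for score in expression_scores if score != 0]
    if not polar:
        return None
    return sum(polar) / len(polar)


def document_polarity(score):
    """Map the normalized score onto a coarse polarity label (assumed labels)."""
    if score is None or score == 0:
        return "unclassified"
    return "positive" if score > 0 else "negative"


# A review with three annotated expressions: +3, +2 and -1
score = document_score([3, 2, -1])
print(score, document_polarity(score))
```

How the normalized score is then projected onto the star scale of the review websites is not specified here, so the sketch stops at the polarity label.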

5.1.3 A Dataset of Opinionated Reviews

Although a large number of corpora for sentiment analysis are available for the English language (Hu and Liu, 2004; Pang and Lee, 2004, 2005; Wiebe et al., 2005; Wilson et al., 2005; Thomas et al., 2006; Jindal and Liu, 2007; Snyder and Barzilay, 2007; Blitzer et al., 2007; Seki et al., 2007; Macdonald et al., 2007; Pang and Lee, 2008b, among others), the Italian language presents fewer contributions in this sense (Basile and Nissim, 2013; Bosco et al., 2013, 2014).
The dataset used to evaluate the lexical and grammatical resources of the present work has been built from scratch, using Italian opinionated texts in the form of users’ reviews and comments from e-commerce and opinion websites. It contains 600 text units and refers to six different domains, for each of which different websites have been exploited:


Cars: www.ciao.it

Smartphones: www.tecnozoom.it, www.ciao.it, www.amazon.it, www.alatest.it

Books: www.amazon.it, www.qlibri.it

Movies: www.mymovies.it, www.cinemalia.it, www.filmtv.it, www.filmscoop.it

Hotels: www.tripadvisor.it, www.expedia.it, www.venere.com/it, hotels.com, www.booking.com

Videogames: www.amazon.it

Details about the corpus are shown in Table 5.1.

Text Features   Cars     Smartphones   Books    Movies   Hotels   Games   Tot.
Neg. Docs       50       50            50       50       50       50      300
Pos. Docs       50       50            50       50       50       50      300
Text Files      20       20            20       20       20       20      120
Word Forms      17163    19226         8903     37213    12553    5597    101655
Tokens          21663    24979         10845    45397    16230    7070    126184

Table 5.1: Dataset of opinionated online customer reviews

Adjectives, as is commonly recognized in the literature (Hatzivassiloglou and McKeown, 1997; Hu and Liu, 2004; Taboada et al., 2006), seem to be the most reliable SO indicators, considering that 17% of the adjectives occurring in the corpus are polarized, compared to 3% of the adverbs, 2% of the nouns and 7% of the verbs.
Moreover, considering all the opinion-bearing words in the corpus, the adjectives’ patterns cover 81% of the total number of occurrences (almost 5000 matches), while the adverbs, the nouns and the verbs reach respectively 4%, 6% and 2%. The remaining 7% is covered by the other sentiment expressions, which in any case contribute to the achievement of satisfactory levels of Recall.
As regards the Semantic Orientation, while in the dictionaries almost 70% of the words are connoted by a negative Prior Polarity, the numbers of positive and negative expressions in our perfectly balanced corpus lead to almost antipodal results: 64% of the sentiment expressions possess a positive polarity. In particular, 61% of the adjective expressions, 77% of the adverb expressions, 65% of the noun expressions and 51% of the verb expressions are positive.
The domain-independent expressions have diametrically opposite percentages: only 34% of the occurrences are positive. This result is due to the presence of the highly productive negative metanodes troppo “too much” and poco “not much”. When they are deactivated, in fact, the positive percentage reaches 58% of the occurrences. Thus, it can be stated that, although the lexicon makes available a greater number of negative items, the real occurrences of the opinion words tip the balance in the direction of the positive items (see Section 3.3.2.1 for more on the phenomenon of positive biases).

5.1.4 Experiment and Evaluation

In the sentiment classification task, the information unit is represented by an opinionated document as a whole. The purpose of our sentiment tool is to classify such documents on the basis of their overall Semantic Orientation.
To ground this task on a lexicon means to hypothesize that the polarities of opinion words can be considered indicators of the polarity of the document in which the words and the expressions are contained.
Our Lexicon-Grammar based sentiment lexical database confers on elementary sentences the status of minimum semantic units of the language. Therefore, the indicators on which we grounded our classification go beyond the limit of mere words, to match entire elementary sentences or polar phrases from complex sentences.
Because the inference of the whole document SO depends on the correct annotation of its sentences and phrases, the evaluation phase involves both the sentence-level and the document-level performances of the tool.

5.1.4.1 Sentence-level Sentiment Analysis

This section presents the results pertaining to the Nooj output, which has been produced by applying the SentIta LG sentiment resources to our corpus of customer reviews.
For the sake of clarity and for the reproducibility of the results, it must be pointed out that some of the lexical resources have been applied with higher priority. Details about the priorities are given below:

• H1 to the sentiment adverb and noun dictionaries;

• H2 to the sentiment adjective dictionary;

• H3 to the freely available dictionary Contrazioni, which belongs to the official Italian module of Nooj;

• H4 to a high-level priority dictionary that contains what is commonly called a “stop word list”;

• H5 to a dictionary of compound adverbs.

In order to avoid ambiguity as much as possible, the dictionary of the verbs of sentiment must have the same priority as the Italian standard dictionary (sdic), lower than any other mentioned resource.
Because our lexical and grammatical resources are not domain-specific, we observed their interaction with every single part of the corpus, which is composed of many different domains, each one of them characterized by its own peculiarities.


Moreover, in order to verify the performances of every part of speech (and of the expressions connected to them), we checked, as shown in Table 5.2, the Precision and the Recall by applying separately every single metanode (A, ADV, N, V, D-ind²) of the main graph of the sentiment grammar.

Nodes     Cars   Smartphones   Movies   Books   Hotels   Videogames   Average
A         88     90            79       87.5    91.5     83.5         86.6
ADV       80.4   75.8          87.9     92.3    92       50           79.7
N         81.8   85.7          82.8     77.8    85.3     85.7         83.2
V         88.2   57.1          84.8     89.5    57.1     100          79.5
D-ind     87.9   83.5          78.1     76.7    90       94.7         87.9
Average   85.3   78.4          82.5     84.8    83.2     82.8         82.8

Table 5.2: Precision measure in sentence-level classification

The sentence-level Precision has been manually calculated by determining whether the polarity and the intensity score of the expressions were correctly assigned in all the concordances produced by Nooj. We made an exception for the expressions of the adjectives which, due to the extremely large number of concordances, have been evaluated by extracting a sample of 1200 matches (200 for each domain).
The values marked by asterisks are reported for the sake of completeness, but they are not really relevant because of the small number of concordances on which they have been calculated.
The best performance has been reached, on average, by the cars dataset, while lower values have been obtained on the movies dataset. The performances of the adjectives grammar are really satisfying, especially if compared with the high number of matches it produced.
Even though the domain-independent node covers a small part of the matches (7%), it reached the best Precision results if compared with the other metanodes.
In many cases, however, evaluating the polarity and the intensity of the expressions contained in a document is not enough to determine whether such expressions truly contribute to the identification of the polarity and the intensity of the document itself. Sometimes polarized sentences and phrases are not opinion indicators, especially in the reviews of movies and books, in which polarized sentences can just refer to the plot, as in (78).

² Domain-independent sentiment expressions (D-ind) refer to all the frozen and semi-frozen expressions that have been directly formalized in the FSA, and include also the vulgar expressions.

(78) Le belle case sono dimora della degenerazione più bieca [-3]
“pretty houses host the grimmest degeneracy”

An opinionated sentence can be considered not pertinent also when it does not directly refer to the object of the opinion, but mentions aspects that are just indirectly connected with the product. Examples of this are (79), which does not refer to the product but to the delivery service, and (80), which, in a book review, does not refer to the object but to another media product related to the book.

(79) Apprezzo[+2] molto[+] Amazon per questo [+3]
“I really appreciate Amazon for this”

(80) Il film non[Negation] mi era piaciuto[+2] [-2]
“I didn’t like the movie”

In this work we chose to exclude the second sentence from the correct matches, but we considered it worthwhile to include the first one, because of the strong influence that this feature has on the numerical value that the opinion holder assigns to his own opinion.
Table 5.3 corrects the results reported in Table 5.2 by excluding from the correct matches the ones that do not contribute to the identification of the correct document SO. This makes the average sentence-level Precision drop by 8.7 points, reaching 74.1%, which we considered adequate.
We chose to present these two results separately because this way it is possible to distinguish the domains that are more influenced by this problem (e.g. movies and books) from the ones that are less affected by the pertinence problem (e.g. hotels and smartphones).

Nodes     Cars    Smartphones   Movies   Books   Hotels   Videogames   Average
A         -5      -5.5          -28      -10.5   -1       -1.5         -8.6
ADV       -2.2    0             -29.7    -7.7    0        0            -6.6
N         -11.4   -14.3         -39.5    -14.8   -5.9     -14.3        -16.7
V         0       0             -17.6    -15.8   0        0            -5.6
D-ind     -8.6    0             -13.3    -6.7    -2.5     -5.3         -6.1
Average   -5.4    -4            -25.6    -11.1   -1.9     -4.2         -8.7

Table 5.3: Irrelevant matches in sentence-level classification
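As a worked check of the correction just described, the Average rows of Table 5.2 and Table 5.3 can be combined domain by domain. The snippet below is only an arithmetic sketch: the figures are copied from the two tables, and the variable names are hypothetical.

```python
# Average sentence-level Precision per domain (Table 5.2, "Average" row)
raw_precision = {
    "Cars": 85.3, "Smartphones": 78.4, "Movies": 82.5,
    "Books": 84.8, "Hotels": 83.2, "Videogames": 82.8,
}
# Points lost to non-pertinent matches (Table 5.3, "Average" row)
irrelevant_drop = {
    "Cars": 5.4, "Smartphones": 4.0, "Movies": 25.6,
    "Books": 11.1, "Hotels": 1.9, "Videogames": 4.2,
}

# Precision once the non-pertinent matches are no longer counted as correct
adjusted = {
    domain: round(raw_precision[domain] - irrelevant_drop[domain], 1)
    for domain in raw_precision
}

# Overall figures quoted in the text: 82.8 - 8.7
overall_adjusted = round(82.8 - 8.7, 1)

for domain, value in adjusted.items():
    print(f"{domain:12s} {value:5.1f}")
print("Overall     ", overall_adjusted)
```

The per-domain values make the pertinence problem visible at a glance: movies lose more than twenty-five points, hotels fewer than two.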

5.1.4.2 Document-level Sentiment Analysis

As far as the document-level performance is concerned (Table 5.4), we calculated the Precision twice.
In the Polarity only row, the True Positives are the documents that have been correctly classified by Doxa, with a polarity attribution that corresponds to the one specified by the Opinion Holder.
In the Intensity also row, the True Positives are the documents that received from Doxa exactly the same stars specified by the Opinion Holder.


Precision        Cars   Smartphones   Movies   Books   Hotels   Videogames   Average
Polarity only    71.0   72.0          63.0     74.0    91.0     72.0         74.0
Intensity also   32.0   45.0          25.0     33.0    49.0     34.0         36.3

Table 5.4: Precision measure in document-level classification

As we can see, the latter seems to have a very low Precision, but upon a deeper analysis we discovered that it is really common for Opinion Holders to write review texts that do not perfectly correspond to the stars they specified. That increases the importance of software like Doxa, which does not stop the analysis at the structured data, but enters the semantic dimension of texts written in natural language.
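The two readings of document-level Precision reported in Table 5.4 can be sketched as follows. This is an illustrative assumption-laden example, not the evaluation code used for this work: it assumes a five-star scale with three stars as the neutral midpoint, takes the stars predicted by the tool as given, and treats every document as classified; the function names are hypothetical.

```python
def polarity_of(stars, midpoint=3):
    """Return +1, 0 or -1 for a star rating (midpoint is an assumed threshold)."""
    return (stars > midpoint) - (stars < midpoint)


def document_precision(predicted_stars, gold_stars, midpoint=3):
    """Return (polarity-only, intensity-also) Precision as percentages."""
    pairs = list(zip(predicted_stars, gold_stars))
    if not pairs:
        raise ValueError("at least one document is required")
    # Polarity only: the predicted and the gold rating fall on the same side
    same_polarity = sum(
        polarity_of(p, midpoint) == polarity_of(g, midpoint) for p, g in pairs
    )
    # Intensity also: the predicted stars match the gold stars exactly
    same_stars = sum(p == g for p, g in pairs)
    return 100 * same_polarity / len(pairs), 100 * same_stars / len(pairs)


# Four toy reviews: stars assigned by the tool vs stars given by the opinion holder
print(document_precision([5, 4, 1, 2], [4, 4, 2, 5]))
```

The toy example shows why the second measure is necessarily lower: three of the four documents are on the right side of the scale, but only one receives exactly the same number of stars.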

Recall           Cars   Smartphones   Movies   Books   Hotels   Videogames   Average
Sentence-level   72.7   79.6          64.8     65.7    72.1     58.8         69.0
Document-level   100    98.6          100      96.1    98.9     91.2         97.5

Table 5.5: Recall in both the sentence-level and the document-level classifications

The Recall pertaining to the sentence-level performance of our tool has been manually calculated on a sample of 150 opinionated documents (25 from each domain), by considering as false negatives the sentiment indicators which have not been annotated by our grammar.
The document-level Recall, instead, has been automatically checked with Doxa, by considering as true positives all the opinionated documents which contained at least one appropriate sentiment indicator. So, the documents in which the Nooj grammar did not annotate any pattern were considered false negatives. Because we assumed that all the texts of our corpus were opinionated, we considered as false negatives also the cases in which the value of the document was 0.
Taking the F-measure into account, the best results were achieved with the smartphones domain (77.0%) in the sentence-level task and with the hotels dataset (94.8%) in the document-level task.
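The F-measure values quoted above can be related to the Precision and Recall tables. The sketch below assumes the usual balanced F-measure (the harmonic mean of Precision and Recall); the pairing of the hotels Polarity only Precision from Table 5.4 with the hotels document-level Recall from Table 5.5 is our reading of the figures rather than something stated explicitly, and the function name is hypothetical.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of Precision and Recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hotels, document level: Precision 91.0 (Table 5.4), Recall 98.9 (Table 5.5)
hotels_f = round(f_measure(91.0, 98.9), 1)
print(hotels_f)
```

Under these assumptions the result agrees with the document-level value reported for the hotels dataset.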

5.2 Feature-based Sentiment Analysis

The purpose of feature-based sentiment analysis is to provide companies with overviews of customer opinions, which automatically summarize the strengths and the weaknesses of the products and services they offer. In Section 1 we connected the definition of the opinion features to the following function (Liu, 2010):

$T = O(f)$

where the features ($f$) of an object ($O$) (e.g. hotels) can be brand names (e.g. Artemide, Milestone), properties (e.g. cleanness, location), parts (e.g. rooms, apartments), related concepts (e.g. city toponyms) or parts of the related concepts (e.g. city centre, tube, museum, etc.) (Popescu and Etzioni, 2007). Liu (2010) represented every object as a “special feature” ($F$) defined by a subset of features ($f_i$):

$F = \{f_1, f_2, \ldots, f_n\}$

Both $F$ and $f_i$, grouped together under the noun target, can be automatically discovered in texts through both synonym words and phrases $W_i$, or other direct or indirect indicators $I_i$:

$W_i = \{w_{i1}, w_{i2}, \ldots, w_{im}\}$

$I_i = \{i_{i1}, i_{i2}, \ldots, i_{iq}\}$
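The object, feature, synonym and indicator sets defined above map naturally onto a small data structure. The following sketch is only illustrative: the class names, the matching strategy (single-token lookup, so multiword expressions are not handled) and the Italian word lists are hypothetical and are not taken from the resources of this work.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Feature:
    name: str                                      # f_i
    synonyms: set = field(default_factory=set)     # W_i
    indicators: set = field(default_factory=set)   # I_i

    def terms(self):
        """All the lowercase words that point to this feature."""
        return {self.name} | self.synonyms | self.indicators


@dataclass
class OpinionObject:
    name: str                                      # O
    features: list = field(default_factory=list)   # F = {f_1, ..., f_n}

    def find_targets(self, text):
        """Return the names of the features mentioned in the text."""
        tokens = set(re.findall(r"\w+", text.lower()))
        return [f.name for f in self.features if f.terms() & tokens]


# Hypothetical word lists for the hotel domain
hotel = OpinionObject("hotel", [
    Feature("personale", synonyms={"staff"}),
    Feature("camera", synonyms={"camere", "stanza", "stanze"}),
    Feature("pulizia", indicators={"pulito", "pulita", "sporco", "sporca"}),
])

print(hotel.find_targets("Personale meravigliosamente accogliente"))
print(hotel.find_targets("La stanza era pulita"))
```

Keeping synonyms and indirect indicators in separate sets preserves the distinction between $W_i$ and $I_i$, even though both are used in the same way during the lookup.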


It is essential to discover not only the overall polarity of an opinionated document, but also its topic and features, in order to discern the aspects of a product that must be improved, or whether the opinions extracted by the Sentiment Analysis applications are relevant to the product or not. For example, (81), an extract from a hotel review, expresses two opinions on two different kinds of features: the behavior of the staff (81a) and the beauty of the city in which the hotel is located (81b).

(81a) [Opinion [Feature Personale] meravigliosamente accogliente]
“Staff wonderfully welcoming”

(81b) [Opinion [Feature Roma] è bellissima]
“Rome is beautiful”

It is crucial to distinguish between the two topics, because just one of them (81a) refers to an aspect that can be influenced or improved by the company.

5.2.1 Related Works

Pioneering works on feature-based opinion summarization are Hu and Liu (2004, 2006), Carenini et al. (2005), Riloff et al. (2006) and Popescu and Etzioni (2007). Both Popescu and Etzioni (2007) and Hu and Liu (2004) first identified the product features on the basis of their frequency and then calculated the Semantic Orientation of the opinions expressed on these features. In order to find the most important features commented on in reviews, Hu and Liu (2004) used association rule mining, thanks to which frequent itemsets can be extracted from free texts. Redundant and meaningless items are removed during a Feature Pruning phase. Experiments have been carried out on the first 100 reviews belonging to five classes of electronic products (www.amazon.com and www.C|net.com). Hu and Liu (2006) presented the algorithm ClassPrefix-Span, which aimed to find a special kind of pattern, the Class Sequential Rules (CSR), using fixed targets and classes. Their method was based on sequential pattern mining.
Carenini et al. (2005) struck a balance between supervised and unsupervised approaches. They mapped crude (learned) features into a User-Defined taxonomy of the entity’s Features (UDF), which provided a conceptual organization for the information extracted. This method took advantage of a similarity matching, in which the UDF reduced the redundancies by grouping together identical features, and then organized and presented information by using hierarchical relations. Experiments have been conducted on a testing set containing Pros, Cons and detailed free reviews of five products.
Riloff et al. (2006) used the subsumption hierarchy in order to identify complex features and then to reduce a feature set by removing useless features which have, for example, a more general counterpart in the subsumption hierarchy. The feature representations used for opinion analysis are n-grams (unigrams, bigrams) and lexico-syntactic extraction patterns. Popescu and Etzioni (2007) presented OPINE, an unsupervised feature and opinion extraction system that used the Web as a corpus to identify explicit and implicit features, and relaxation-labelling methods to infer the Semantic Orientation of words. The system draws on WordNet’s semantic relations and hierarchies for the identification of the features (parts, properties and related concepts) and the creation of clusters of words.
Ferreira et al. (2008) made a comparison between the likelihood ratio test approach (Yi et al., 2003) and the association mining approach (Hu and Liu, 2004). The e-product dataset (topical documents) and the annotation scheme (improved with some revisions) are the ones used by Hu and Liu (2004). In particular, the revised annotation scheme comprised:

• part-of relationship with the product (e.g. battery is a part of a digital camera);

• attribute-of relationship with the product (weight and design are attributes of a camera);

• attribute-of relationship with the product’s feature (battery life is an attribute of the battery, which is in turn a camera’s feature).
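The frequency-based identification of features mentioned above for Hu and Liu (2004) can be approximated with a very small frequent-itemset count. The following is a simplified sketch, not their association rule miner: it assumes that candidate nouns have already been extracted for each sentence by an upstream POS tagger, it only counts itemsets of one or two items, and the function name, the threshold and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support=0.5, max_size=2):
    """Return the itemsets whose support reaches min_support.

    transactions -- list of sets of candidate nouns, one set per sentence
    Support is the share of transactions that contain the whole itemset.
    """
    counts = Counter()
    for items in transactions:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(items), size):
                counts[combo] += 1
    total = len(transactions)
    return {
        combo: count / total
        for combo, count in counts.items()
        if count / total >= min_support
    }


# Toy candidate nouns from four review sentences
sentences = [
    {"battery", "screen"},
    {"battery"},
    {"battery", "screen", "price"},
    {"screen"},
]
print(frequent_itemsets(sentences, min_support=0.5))
```

A pruning step comparable to the Feature Pruning phase would then discard redundant or meaningless itemsets; it is left out here because its criteria are not detailed in this section.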

Double Propagation (Qiu et al., 2009) focuses on the natural relation between opinion words and features. Because opinion words are often used to modify features, such relations can be identified thanks to the dependency grammar.
Because these methods have good results only for medium-size corpora, they must be supported by other feature mining methods. The strategy proposed by Zhang et al. (2010) is based on “no” patterns and part-whole patterns (meronymy):

• NP + Prep + CP: “battery of the camera”;

• CP + with + NP: “mattress with a cover”;

• NP CP or CP NP: “mattress pad”;

• CP Verb NP: “the phone has a big screen”.

Here NP is a noun phrase and CP a class concept phrase. The verbs used are “has”, “have”, “include”, “contain”, “consist”, etc. “No” patterns are feature indicators as well. Examples of such patterns are “no noise” or “no indentation”. Clearly, a manually built list of fixed “no” expressions, like “no problem”, is filtered out from the feature candidates. Every sentence is POS-tagged using Brill’s tagger (Brill, 1995) and used as system input.
In order to find important features and rank them high, Zhang et al. (2010) used a web page ranking algorithm named HITS (Kleinberg, 1999).
Somprasertsri and Lalitrojwong (2010) used a dependency-based approach for the opinion summarization task. A central stage in their work is the extraction of relations between product features (“the topic of the sentiment”) and opinions (“the subjective expression of the product feature”) from online customer reviews. Adjectives and verbs have been used in this study as opinion words. The maximum entropy model has been used in order to predict the opinion-relevant product feature relation. In general, in a dependency tree, a feature can be of different kinds:

• Child: the product feature is the subject or object of the verb, and the opinion word is a verb or a complement of a copular verb (product features depend on the opinions).

e.g. I like(opinion) this camera(feature)

• Parent: the opinion words are in the modifiers of product features, which include adjectival modifier, relative clause modifier, etc. (opinions depend on the product features).

e.g. I have found that this camera takes incredible(opinion) pictures(feature)

• Sibling: the opinion word may also be in an adverbial modifier, a complement of the verb or a predicative (both opinions and product features depend on the same words).

e.g. The pictures(feature) some time turn out blurry(opinion)

• Grand Parent: the opinion words are adjectival complements of modifiers of product features (in the following example, the word “good” is the adverbial complement of the relative clause modifier of the noun phrase “movie mode”) (opinions depend on the words which depend on the product features).

e.g. It has movie mode(feature) that works good(opinion) for a digital camera

• Grand Child: the product feature is the subject or object of the complements, and the opinion word is a verb or a complement of a copular verb (product features depend on the words which depend on the opinions).

e.g. It’s great(opinion) having the LCD display(feature)

202 CHAPTER 5 SPECIFIC TASKS OF THE SENTIMENT ANALYSIS

• Indirect: none of the above relations.
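The six relation kinds listed above can be illustrated with a minimal sketch. It assumes a toy representation in which each token of a sentence is mapped to its syntactic head; the `Heads` type, the `relation` function and the hand-written trees are hypothetical and are not taken from Somprasertsri and Lalitrojwong (2010). The labels name the position of the feature with respect to the opinion word, following the examples given for each kind.

```python
from typing import Dict, Optional

# Hypothetical toy representation: token -> syntactic head (None for the root).
# Tokens are assumed to be unique within a sentence.
Heads = Dict[str, Optional[str]]


def relation(feature: str, opinion: str, heads: Heads) -> str:
    """Name the position of the feature relative to the opinion word."""
    f_head = heads.get(feature)
    o_head = heads.get(opinion)
    if f_head == opinion:
        return "child"        # the feature depends directly on the opinion
    if o_head == feature:
        return "parent"       # the opinion depends directly on the feature
    if f_head is not None and f_head == o_head:
        return "sibling"      # both depend on the same word
    if o_head is not None and heads.get(o_head) == feature:
        return "grandparent"  # opinion -> intermediate word -> feature
    if f_head is not None and heads.get(f_head) == opinion:
        return "grandchild"   # feature -> intermediate word -> opinion
    return "indirect"


# Hand-built trees for three of the example sentences (illustrative only).
LIKE_CAMERA = {"I": "like", "like": None, "this": "camera", "camera": "like"}
MOVIE_MODE = {"It": "has", "has": None, "mode": "has", "works": "mode", "good": "works"}
BLURRY = {"pictures": "turn", "turn": None, "blurry": "turn"}

print(relation("camera", "like", LIKE_CAMERA))
print(relation("mode", "good", MOVIE_MODE))
print(relation("pictures", "blurry", BLURRY))
```

In a real system the head map would come from a dependency parser rather than being written by hand, and tokens would be identified by position rather than by surface form.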

The Stanford Parser has been used by Somprasertsri and Lalitrojwong (2010) to parse documents in the pre-processing phase, before the dependency analysis stage. Definite linguistic filtering (POS-based) patterns and the General Inquirer dictionary (Stone et al., 1966) have been used to locate noun phrases and to extract product feature candidates. The generalized iterative scaling algorithm (Darroch and Ratcliff, 1972) has been used to estimate the parameters, or weights, of the selected features. Because it is possible to refer to a particular feature using several synonyms, Somprasertsri and Lalitrojwong (2010) used semantic information encoded into a product ontology, manually built by integrating manufacturer product descriptions and terminologies in customer reviews.

Wei et al. (2010) proposed a semantic-based method that made use of a list of positive and negative adjectives defined in the General Inquirer to recognize opinion words, and then extracted the related product features in consumer reviews.

Xia and Zong (2010) performed the feature extraction and selection tasks using word relation features, which seem to be effective features for sentiment classification because they encode relation information between words. They can be of different kinds: Uni (unigrams), WR-Bi (traditional bigrams), WR-Dp (word pairs of dependency relation), GWR-Bi (generalized bigrams) and GWR-Dp (dependency relations).

Gutiérrez et al. (2011) exploited Relevant Semantic Trees (RST) for word-sense disambiguation and measured the association between concepts at the sentence level using the association ratio measure.

Mejova and Srinivasan (2011) explored different feature definition and selection techniques (stemming, negation-enriched features, term frequency versus binary weighting, n-grams and phrases) and approaches (frequency-based vocabulary trimming, part-of-speech and lexicon selection, and expected Mutual Information). Mejova’s experimental results confirmed the fact that adjectives are important for polarity classification. Furthermore, they revealed that stemming and using binary instead of term frequency feature vectors do not have any impact on performances, and that the helpfulness of certain techniques depends on the nature of the dataset (e.g. size and class balance).

Concordance-based Feature Extraction (CFE) is the technique used by Khan et al. (2012). After a traditional pre-processing step, regular expressions are used to extract candidate features. Evaluative adjectives, collected on the basis of a seed list from Hu and Liu (2004), are helpful in the feature extraction task. In the end, a grouping phase finds the appropriate features for the opinion’s topic, grouping together all the related features and removing the useless ones. The algorithm used in this phase is based on the co-occurrence of features and uses the left and right feature’s context.

Khan et al. selected candidate product features by employing noun phrases that appear in texts close to subjective adjectives. This intuition is shared with Wei et al. (2010) and Zhang and Liu (2011). The centrepiece of Khan’s method is represented by hybrid patterns, Combined Pattern Based Noun Phrases (cBNP), that are grounded on the dependency relation between subjective adjectives (opinionated terms) and nouns (product features). Nouns and adjectives can sometimes be connected by linking verbs (e.g. “camera produces fantastically good pictures”). Preposition-based noun phrases (e.g. “quality of photo”, “range of lenses”) often represent entity-to-entity or entity-to-feature relations. The last stage is the proper feature extraction phase, in which, using an ad hoc module, the noun phrases of the cBNP patterns are designated as product features.

5.2.2 Experiment and Evaluation

In order to contribute to the resolution of the opinion feature extraction and selection tasks, we exploited both the SentIta database and other preexisting Italian lexical resources of Nooj. The latter consist of an objective dictionary that describes its lemmas with appropriate semantic properties; it has been used to define and summarize the features on which the opinion holder expressed his opinions.

In order to identify the relations between opinions and features, the lexical items contained in the just mentioned dictionaries have been systematically recalled into a Nooj local grammar that provides, as feedback, annotations about the (positive or negative) nature of the opinion and about the semantic category of the features.

The objective lexicon consists of a Nooj dictionary of concrete nouns (Table 5.6) that has been manually built and checked by seven linguists. It counts almost 22,000 nouns, described from a semantic point of view by the property Hyper, which specifies every lemma’s hyperonym. The tags it includes are shown in Table 5.6. Through the software Nooj, it has been possible to create a connection between these dictionaries and an ad hoc FSA for the feature analysis.

As we will demonstrate below, the feature classification is strongly dependent on the review domains and topics, on the basis of which specific local grammars need to be adapted (Engström, 2004; Andreevskaia and Bergler, 2008).

As shown in Figure 5.2, in the Feature Extraction phase the sentences or the noun phrases expressing the opinions on specific features are found in free texts. Annotations regarding their polarity (BENEFIT/DRAWBACK) and their intensity (SCORE=[-3, +3]) are attributed to each sentence on the basis of the opinion words on which they are anchored.

Obviously, the graph reported in Figure 5.2 represents just an extract of the more complex FSA built to perform the task, which includes more than 100 embedded graphs and thousands of different paths, able to describe not only the sentences and the phrases anchored on sentiment adjectives, but also the ones that contain sentiment adverbs, nouns, verbs, idioms and other lexicon-independent opinion-bearing expressions (e.g. essere (dotato + fornito + provvisto) di “to be


Label                            Tag      Entries  Example    Translation
body part                        Npc      598      labbro     lip
organism part                    Npcorg   627      colon      colon
text                             Ntesti   1168     rubrica    address book
(part of) article of clothing    Nindu    1009     vestaglia  nightgown
cosmetic product                 Ncos     66       rossetto   lipstick
food                             Ncibo    1170     agnello    lamb
liquid substance                 Nliq     341      urina      urine
potable liquid substance         Nliqbev  29       sidro      cider
money                            Nmon     216      rublo      ruble
(part of) building               Nedi     1580     torre      tower
place                            Nloc     1018     valle      valley
physical substance               Nmat     1727     agata      agate
(part of) plant                  Nbot     2187     zucca      pumpkin
medicine                         Nfarm    415      sedativo   sedative
drug                             Ndrug    69       mescalina  mescaline
chemical element                 Nchim    1212     cloruro    chloride
(part of) electronic device      Ndisp    1191     motosega   chainsaw
(part of) vehicle                Nvei     1082     gondola    gondola
(part of) furniture              Narr     450      tavolo     table
instrument                       Nstrum   3666     ventaglio  fan
Tot                              -        19946    -          -

Table 5.6: Semantically Labeled Dictionary of Concrete Nouns

equipped with”, essere un (aspetto + nota + cosa + lato) negativo “to be a negative side”).

The Contextual Valence Shifters that have been considered are the ones described in Chapter 4.

Differently from the other approaches proposed in the literature, the Feature Pruning task (Figure 5.3), in which the extracted features are grouped together on the basis of their semantic nature, is performed at the same time as the Extraction phase, by applying the dictionaries-grammar pair to free texts just once. This happens because the grammar shown in Figure 5.3 is contained in the metanode NG (Nominal Group) of the feature extraction grammar.

Thanks to the instructions contained in this subgraph, the words described in our dictionary of concrete nouns as belonging to the same


Figure 5.2: Extract of the Feature Extraction grammar

hyperonym receive the same annotation. The words labeled with the tags Narr (furniture, e.g. “bed”) and Nindu (article of clothing, e.g. “bath towel”) are annotated as “furnishings” (first path); the words labeled with the tags Ncibo (food, e.g. “cake”) and NliqBev (potable liquid substance, e.g. “coffee”) receive the annotation “foodservice” (second path), and so on.

In order to refine the results, we also directly included in the nodes of the grammar the words that were important for the feature selection but were not contained in the dictionary of concrete nouns (abstract nouns, e.g. pranzo “lunch”, vista “view”), and the hyperonyms themselves, which in general correspond to each feature class name (e.g. arredamento “furnishings”, luogo “location”).

Finally, we disambiguated terms that in the objective dictionary were associated with more than one tag, by excluding the wrong interpretations from the corresponding node. For example, in the domain of


Figure 5.3: Extract of the Feature Pruning metanode

hotel reviews, if one comments on the cucina “cooking” (first path) or the soggiorno “sojourn” (third path), he is clearly not talking about rooms (in Italian, cucina also means “kitchen” and soggiorno “living room”), but about the food service and the whole holiday experience, respectively.

The annotations provided by Nooj, once the lexical and grammatical resources are applied to texts written in natural language, can take the form of XML (eXtensible Markup Language) tags. For this reason, they can be effortlessly recalled into an application that automatically applies the Nooj ad hoc resources and makes statistical analyses on them. Therefore, we dedicated a specific module of Doxa to the feature-based sentiment analysis, which has been tested on a corpus composed of 1,000 hotel reviews collected from websites like www.tripadvisor.it, www.expedia.it and www.booking.com.

An extract of the automatically annotated corpus is reported below.

Review

Sono stato in questo albergo quasi per caso, dopo aver subito un incidente, e devo ammettere che, se non fosse stato per la disponibilità, la cortesia e la professionalità di Mario Pino e Rosa, non avrei potuto risolvere una serie di cose. L’hotel è fantastico, silenzioso e decisamente pulito. La colazione ottima ed il responsabile della sala colazione gentilissimo.

“I happened to be in this hotel by accident, and I must admit that, if it hadn’t been for the willingness, the kindness and the competence of Mario Pino and Rosa, I couldn’t have solved a number of things. The hotel is fantastic, silent and definitely clean. Great the breakfast, and very kind the person in charge of the breakfast room.”

Annotation

Sono stato in questo albergo quasi per caso, dopo aver subito un incidente, e devo ammettere che, se non fosse stato per <BENEFIT TYPE=ATTITUDES SCORE=2> la disponibilità </BENEFIT> <BENEFIT TYPE=ATTITUDES SCORE=2> la cortesia </BENEFIT> e <BENEFIT TYPE=ATTITUDES SCORE=2> la professionalità </BENEFIT> di Mario Pino e Rosa, non avrei potuto risolvere una serie di cose.

<BENEFIT TYPE=ACCOMMODATIONS SCORE=3> L’hotel è fantastico </BENEFIT> <BENEFIT TYPE=LOCATION SCORE=2> silenzioso </BENEFIT> e <BENEFIT SCORE=3 TYPE=CLEANLINESS> decisamente pulito </BENEFIT>

<BENEFIT TYPE=FOODSERVICE SCORE=3> La colazione ottima </BENEFIT> ed <BENEFIT TYPE=ATTITUDES SCORE=3> il responsabile della sala colazione gentilissimo </BENEFIT>
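As an illustration of how annotations of this kind could be recalled by an external application, the following minimal sketch extracts the tagged spans and averages the scores for each feature type. The tag shape (an opening BENEFIT or DRAWBACK tag with TYPE and SCORE attributes in any order, closed by a matching end tag) is an assumption modeled on the extract above, and the `summarize` function is hypothetical: it does not reproduce the actual Doxa implementation.

```python
import re
from collections import defaultdict

# Assumed tag shape, modeled on the annotated extract: BENEFIT or DRAWBACK
# elements carrying TYPE and SCORE attributes, quoted or unquoted.
TAG = re.compile(r"<(BENEFIT|DRAWBACK)\b([^>]*)>(.*?)</\1>", re.S)
ATTR = re.compile(r'(\w+)\s*=\s*"?([^\s">]+)"?')


def summarize(annotated: str) -> dict:
    """Return the mean SCORE for every feature TYPE found in the text."""
    scores = defaultdict(list)
    for _tag, attrs, _span in TAG.findall(annotated):
        fields = dict(ATTR.findall(attrs))
        if "TYPE" in fields and "SCORE" in fields:
            scores[fields["TYPE"]].append(int(fields["SCORE"]))
    return {ftype: sum(vals) / len(vals) for ftype, vals in scores.items()}


SAMPLE = (
    "<BENEFIT TYPE=ATTITUDES SCORE=2> la cortesia </BENEFIT> e "
    "<BENEFIT SCORE=3 TYPE=CLEANLINESS> decisamente pulito </BENEFIT> "
    "<DRAWBACK TYPE=CLEANLINESS SCORE=-2> Le camere erano sporche </DRAWBACK>"
)
print(summarize(SAMPLE))  # {'ATTITUDES': 2.0, 'CLEANLINESS': 0.5}
```

Averaging by the number of expressions of each type is one possible normalization; other choices (e.g. dividing by the total number of sentiment expressions in the review) would only change the last line of `summarize`.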

The evaluation presented in Table 5.7 shows that the method proposed in the present work performs very well in the hotel domain. On a sample of 100 documents from the original corpus, we manually calculated the Precision on the Semantic Orientation Identification task (which regards the annotation of the sentences’ polarity and intensity) and on the Feature Classification task (which regards the annotation of the types of the features).

Method                 Precision (%)                        Recall (%)  F-score (%)
                       SO Identification   Classification
No residual category   93.3                92.4             72.3        81.1
Residual category      92.9                82.6             78.3        80.4

Table 5.7: F-score of Feature Extraction and Pruning

Because the classes used to group the features have been identified a priori in our grammar, we also checked the results of our tool with the introduction of a residual category to annotate the features that do not belong to these classes. Even though this causes an increase in Recall, a commensurate decrease of the Precision in the Classification task makes the value of the F-score similar to the one reached when the residual category is not taken into account. Significant examples of the sentences that have been automatically annotated in our corpus are given below.

(82) Il responsabile della sala colazione gentilissimo
“Very kind the person in charge of the breakfast room”
BENEFIT TYPE=ATTITUDES SCORE=3

(83) Le camere erano sporche
“The rooms were dirty”
DRAWBACK TYPE=CLEANLINESS SCORE=-2

(84) L’hotel è vicinissimo alla metro
“The hotel is really close to the tube”
BENEFIT TYPE=LOCATION SCORE=3

In (82) the subjective adjective gentilissimo “very kind”, recognised by the grammar as the superlative form (SCORE=+3) of the positive adjective gentile “kind” (SCORE=+2), makes the grammar recognize the whole sentiment expression and provides the information regarding its polarity and intensity. This is why the FSA associates the tag BENEFIT and the higher score with the feature, which in this case is represented by a noun.

Example (83) shows that the “type” of the features can be indicated not only by nouns, but also by specific adjectives, as happens with sporche “dirty”, which is both an opinion word, specifying that the whole expression is negative, and a feature class indicator, which lets the grammar classify it in the group of “cleanliness” rather than in the group of “accommodations” (which would instead be selected by the noun camera “room”, described in our objective dictionary with the tags Conc “concrete” and Nedi “building”).

Finally, sentence (84) attests the strength of the influence of the review domains: sometimes the lexicon is not enough to correctly extract and classify the product features. Domain-specific, lexicon-independent patterns are often required in order to make the analyses adequate (Engström, 2004; Andreevskaia and Bergler, 2008).
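The class assignment logic discussed in this section (dictionary tags mapped to feature classes, with words listed directly in the grammar nodes taking precedence) can be sketched as follows. This is an illustration under stated assumptions, not the Nooj grammar itself: the function and table names are hypothetical, the tag-to-class pairs are limited to those mentioned in the text, and the class assigned to pranzo is an assumption.

```python
from typing import List, Optional, Tuple

# Tag-to-class pairs mentioned in the text (first and second path of the
# pruning metanode, plus Nedi for "accommodations").
TAG_TO_CLASS = {
    "Narr": "furnishings",
    "Nindu": "furnishings",
    "Ncibo": "foodservice",
    "NliqBev": "foodservice",
    "Nedi": "accommodations",
}

# Hotel-domain lexical overrides: lemmas handled directly in the grammar
# nodes. The entry for "pranzo" is an assumed class, for illustration only.
LEXICAL_OVERRIDES = {
    "cucina": "foodservice",   # "cooking", not "kitchen", in hotel reviews
    "pranzo": "foodservice",
    "sporco": "cleanliness",
    "pulito": "cleanliness",
}


def feature_class(tokens: List[Tuple[str, Optional[str]]]) -> Optional[str]:
    """Classify an opinionated expression given (lemma, dictionary tag) pairs.

    Lexical overrides win over dictionary tags; None stands for the
    residual category.
    """
    for lemma, _tag in tokens:
        if lemma in LEXICAL_OVERRIDES:
            return LEXICAL_OVERRIDES[lemma]
    for _lemma, tag in tokens:
        if tag in TAG_TO_CLASS:
            return TAG_TO_CLASS[tag]
    return None


# "Le camere erano sporche": the adjective selects "cleanliness" even though
# the noun camera (Nedi) alone would select "accommodations".
print(feature_class([("camera", "Nedi"), ("sporco", None)]))
```

Giving the overrides priority mirrors the disambiguation step described above, in which wrong interpretations are excluded from the corresponding node for a given domain.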

5.2.3 Opinion Visual Summarization

We said in Section 5.1 that Doxa is a prototype able to analyze opinionated documents from a semantic point of view. It automatically applies the Nooj resources to free texts, sums up the values corresponding to every sentiment expression, standardizes the results by the total number of sentiment expressions contained in the reviews, and then provides statistics about the opinions expressed.

53 SENTIMENT ROLE LABELING 211

In this Section we show the improved Doxa functionalities dedicated to the sentiment analysis based on the opinion features. In the Hotel domain, this deeper analysis considerably increases the Precision on the sentence-level SO identification task, which goes from 81.3 to 93.3, without any loss in Recall, which holds steady, rising by just 0.2 percentage points, from 72.1 to 72.3 (see Table 5.7).

The feature-based module of Doxa, completely grounded on the automatic analysis of free texts, allows the comparison between more than one object on the basis of the different kinds of aspects that characterize them.

The regular decagon in the center of the spider graphs of Figure 5.4 represents the threshold value (zero) with which every group of opinions on the same object is compared: when a figure is larger than it, the opinion is positive; when it is smaller, the opinion is negative; when a feature’s value is zero, it means that the consumers did not express any judgment on that topic.

The automatic transformation of non-structured reviews into structured visual summaries has a strong impact on the work of company managers, who must ground their marketing strategies on the analysis of a large amount of information, and on the individual consumers, who need to quickly compare the Pros and Cons of the products or services they want to buy.
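As a cross-check on the figures of Table 5.7, the short sketch below recomputes the F-score as the harmonic mean of Precision and Recall. It reads the table values as percentages (e.g. 92.4 and 72.3) and assumes that the F-score column is derived from the Classification Precision; this assumption is not stated in the text, but it is consistent with the reported values.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Classification Precision and Recall, read from Table 5.7 as percentages.
print(round(f_score(92.4, 72.3), 1))  # 81.1 (no residual category)
print(round(f_score(82.6, 78.3), 1))  # 80.4 (residual category)
```

The two results match the F-score column, which also makes explicit why the gain in Recall obtained with the residual category is offset by the loss in Classification Precision.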

5.3 Sentiment Role Labeling

Semantic Roles, also called theta or thematic roles (Chomsky, 1993; Jackendoff, 1990; see Section 2.1.4), are linguistic devices able to identify in texts “Who did What to Whom, and How, When and Where” (Palmer et al., 2010, p. 1).

The main task of a Semantic Role Labeling (SRL) system is to map the syntactic elements of free text sentences to their semantic representation, through the identification of all the syntactic functions of clause-complements and their

Figure 5.4: Doxa module for Feature-based Sentiment Analysis

labeling with appropriate semantic roles (Rahimi Rastgar and Razavi, 2013; Monachesi, 2009).

To work, an SRL system needs machine-readable lexical resources in which all the predicates and their relative arguments are specified. FrameNet (Baker et al., 1998; Fillmore et al., 2002; Ruppenhofer et al., 2006), VerbNet (Schuler, 2005) and the PropBank Frame Files (Kingsbury and Palmer, 2002) provide the basis for large-scale semantic annotation of corpora (Giuglea and Moschitti, 2006; Palmer et al., 2010).

In this Section, in accordance with the Semantic Predicates theory (Gross, 1981), we performed a matching between the definitional syntactic structures attributed to each class of verbs and the semantic information we attached in the database to every lexical entry.

This way, we could create a strict connection between the arguments selected by a given Predicate listed in our tables and the actants involved in the same verb’s Semantic Frame (Fillmore, 1976; Fillmore and Baker, 2001; Fillmore, 2006). In the experiment that we are going to describe, we focused on the following frames:

• “Sentiment” (e.g. odiare “to hate”), which involves the roles experiencer and stimulus;

• “Opinion” (e.g. difendere “to stand up for”), which implicates the roles opinion holder, target of the opinion and cause of the opinion;

• “Physical act” (e.g. baciare “to kiss”), which evokes the roles patient and agent.

5.3.1 Related Works

A group of studies in the field of Semantic Role Labeling focused on the automatic annotation of the roles which are relevant to Sentiment Analysis and opinion mining, such as source/opinion holder and target/theme/topic.


Bethard et al. (2004), with question answering purposes, proposed an extension of semantic parsing techniques, coupled with additional lexical and syntactic features, for the automatic extraction of propositional opinions. The algorithm used to identify and label semantic constituents is the one developed by Gildea and Jurafsky (2002) and Pradhan et al. (2003).

Kim and Hovy (2006) presented a Semantic Role Labeling system based on the Frame Semantics of Fillmore (1976) that aimed to tag the frame elements semantically related to an opinion-bearing word expressed in a sentence, which is considered as the key indicator of an opinion. The learning algorithm used for the classification model is Maximum Entropy (Berger et al., 1996; Kim and Hovy, 2005). FrameNet II (Ruppenhofer et al., 2006) has been used to collect frames related to opinion words.

Other works based on FrameNet are Houen (2011), which used Frame-Frame relations in order to expand the subset of polarity-scored Frames found in the FrameNet database, and Ruppenhofer et al. (2008), Ruppenhofer and Rehbein (2012) and Ruppenhofer (2013), which added opinion frames with slots for source, target, polarity and intensity to the FrameNet database to build SentiFrameNet, a lexical database enriched with inherently evaluative items associated with opinion frames.

Lexicon-based approaches are those of Kim et al. (2008), who made use of a collection of communication and appraisal verbs, the SentiWordNet lexicon, a syntactic parser and a named entity recognizer for the extraction of opinion holders from texts, and of Bloom et al. (2007), who extracted domain-dependent opinion holders through a combination of heuristic shallow parsing and dependency parsing.

Conditional Random Fields (Lafferty et al., 2001) is the method used by Choi et al. (2005), who, interpreting the semantic role labeling task as an information extraction problem, exploited a hybrid approach that combines two very different learning-based methods: CRF for the named entity recognition and a variation of AutoSlog (Riloff, 1996) for the information extraction. CRF have also been used by Choi et al. (2006), who presented a global inference approach to jointly extract entities and relations in the context of opinion-oriented information extraction, through two separate token-level sequence-tagging classifiers for opinion expression extraction and source extraction via linear-chain CRF (Lafferty et al., 2001).

5.3.2 Experiment and Evaluation

The starting point of our experiment on SRL is the set of LG tables of the Italian verbs developed at the Department of Communication Science of the University of Salerno. Among the lexical entries listed in such matrices, we manually extracted all the verbs endowed with a defined positive or negative semantic orientation. Such verbs could be expressions of emotions, opinions or physical acts. Details of this classification can be found in Section 3.3.3.

As an example, in Tables 5.8 and 5.9 we show a small group of verbs belonging to the Lexicon-Grammar class 45. This class includes all the verbs that can enter into a syntactic structure such as N0 V di N1, in which the “subject” (N0) selected by the verb (V) is generally a human noun (Num) and the complement (N1) is a completive (Ch F) or infinitive (V-inf comp) clause, usually introduced by the preposition “di” (see Table 5.8). As shown in Table 5.9, our databases also contain semantic information concerning the nature, the semantic orientation and the strength of the Predicates under examination.

In detail, the tagset used in this experiment is the following:

1. Type

(a) SENT: sentiment

(b) OP: opinion

(c) PHY: physical act

N0 = Num  N0 = il fatto Ch F  Verb            N1 = che F  N1 = che Fcong  di N1um  di N1-um  di N1 contro N2  prep V-inf comp
+         -                   profittare      +           +               +        +         +                -
+         -                   ridersene       +           +               +        +         -                -
+         -                   risentirsi      +           +               +        +         -                +
+         -                   strafottersene  +           +               +        +         -                -
+         -                   vergognarsi     +           +               +        +         -                +

Table 5.8: Extract of the LG table of the verb Class 45

2. Orientation

(a) POS: positive

(b) NEG: negative

3. Intensity

(a) FORTE: intense

(b) DEB: feeble

Speaking in terms of Frame Semantics, we identified in the Opinion Mining and Sentiment Analysis field three Frames of interest, recalled by specific Predicates: Sentiment, Opinion and Physical act. The frame elements evoked by such frames are described below.

Sentiment

It refers to the expression of any given frame of mind or affective state.

The “sentiment” words can be put in connection with some WordNet-Affect categories (Strapparava et al., 2004), such as emotion, mood and hedonic signal.

Examples are sdegnarsi “to be indignant” (class 10), odiare “to hate” (class 20), affezionarsi “to grow fond” (class 44B), flirtare “to flirt” (class 9), disprezzare “to despise” (class 20), gioire “to rejoice” (class 45).

Predicates of that kind evoke, as frame elements, an experiencer, who feels the emotion or other internal states, and a stimulus, an event or a person that instigates such states (Gross, 1995; Gildea and Jurafsky, 2002; Swier and Stevenson, 2004; Palmer et al., 2005; Elia, 2013). This semantic frame summarizes the FrameNet ones connected to emotions, such as Sensation, Emotions_of_mental_activity, Cause_to_experience, Emotion_active, Emotions, Cause_emotion, etc.

Opinion

The type “Opinion”, instead, is the expression of positive or negative viewpoints, beliefs or judgments that can be personal or shared by most people. It comprises, among the WN-Affect categories, trait, cognitive state, behavior and attitude. Examples are ignorare “to neglect” (class 20), premiare “to reward” (class 20), difendere “to defend” (class 27), esaltare “to exalt” (class 22), dubitare “to doubt” (class 45), condannare “to condemn” (class 49), deridere “to make fun of” (class 50).

The frame elements they evoke are an opinion holder, who states an opinion about an object or an event, and an opinion target, which represents the event or the object on which the opinion is expressed (Kim and Hovy, 2006; Liu, 2012). Some predicates also imply the presence of a cause that motivates the opinion (see Section 3.3.3).

In the FrameNet frames Opinion and Judgment the opinion holder is called Cognizer, but we preferred to use a word which is more common in the Sentiment Analysis and Opinion Mining literature.

Physical act

The type “Physical act” comprises verbs like baciare “to kiss” (class 18), suicidarsi “to commit suicide” (class 2), vomitare “to vomit” (class 2A), schiaffeggiare “to slap” (class 20), sparare “to shoot” (class 4), palpeggiare “to grope” (class 18).

For this group of predicates the selected frame elements are a patient, who is the victim (for the negative actions) or the beneficiary (for the positive ones) of the physical act, carried out by an agent (Carreras and Màrquez, 2005; Màrquez et al., 2008). It includes a large number of FrameNet frames, such as Cause_bodily_experience, Shoot_projectiles, Killing, Cause_harm, Rape, Sex, Violence, etc.

In order to perform the orientation and the intensity attribution, we manually explored the Italian LG tables of verbs and the SentIta semantic labels.

Thanks to lexical resources of this kind, it is possible to automatically extract and semantically describe real occurrences of sentences like (85):

(85) [Sentiment [Experiencer Renzi(N0)] si vergogna(V) [Stimulus di parlare di energia in Europa(N1)]]

“Renzi feels ashamed of talking about energy in Europe”

in which the syntactic structure of the verb vergognarsi “to feel ashamed”, N0 V (di) Ch F, is matched by means of interpretation rules (e.g. V = N0 V (di) Ch F = Caus(s, sent, h); N0 = h; N1 = s) to the semantic function Caus(s, sent, h), which puts in relation an experiencer h and a stimulus s thanks to a Sentiment Semantic Predicate sent (Gross, 1995; Section 2.1.4).

Moreover, we provided our LG databases with the specification of the arguments (N0, N1, N2, etc.) that are semantically influenced by the semantic orientation of the verbs. The purpose is to correctly identify them as features of the opinionated sentences and to work on them also in feature-based sentiment analysis tasks.
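The interpretation rule quoted above (N0 = h, N1 = s for a Sentiment predicate) can be sketched as a lookup from syntactic slots to frame elements. The data structures and the `label_roles` function below are hypothetical; the two lexical entries reuse the semantic descriptions given for class 45, and the slot mapping for the Opinion frame is an assumption added for illustration, since the text only spells out the Sentiment case.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class VerbEntry:
    lemma: str
    lg_class: str
    frame: str        # "sentiment" or "opinion" in this sketch
    orientation: str
    influence: str    # argument influenced by the verb's orientation


# Two entries taken from the semantic descriptions of LG class 45.
LEXICON: Dict[str, VerbEntry] = {
    "vergognarsi": VerbEntry("vergognarsi", "45", "sentiment", "negative", "N0"),
    "ridersene": VerbEntry("ridersene", "45", "opinion", "negative", "N1"),
}

# Slot-to-role rules. The sentiment rule follows N0 = h (experiencer),
# N1 = s (stimulus); the opinion rule is an assumption for illustration.
ROLES: Dict[str, Dict[str, str]] = {
    "sentiment": {"N0": "Experiencer", "N1": "Stimulus"},
    "opinion": {"N0": "Opinion holder", "N1": "Target"},
}


def label_roles(lemma: str, args: Dict[str, str]) -> Dict[str, str]:
    """Map the syntactic arguments of a verb to the roles of its frame."""
    roles = ROLES[LEXICON[lemma].frame]
    return {roles[slot]: text for slot, text in args.items() if slot in roles}


print(label_roles("vergognarsi", {
    "N0": "Renzi",
    "N1": "di parlare di energia in Europa",
}))
```

Applied to sentence (85), the sketch labels Renzi as Experiencer and the infinitive clause as Stimulus; the `influence` field records which argument would be treated as the feature of the opinionated sentence.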

The reliability of the LG method on Semantic Role Labeling in the Sentiment Analysis task has been tested on three different

Verb            LG Class  Type       Orientation  Intensity  Influence
profittare      45        opinion    negative     -          N0
ridersene       45        opinion    negative     weak       N1
risentirsi      45        sentiment  negative     -          N0
strafottersene  45        opinion    negative     strong     N1
vergognarsi     45        sentiment  negative     strong     N0

Table 5.9: Examples of semantic descriptions of the opinionated verbs from the LG class 45

datasets, two of which have been extracted from social networks or web resources.

In detail, the first two datasets come from Twitter and the third is a free web news headings dataset provided by DataMediaHub (www.datamediahub.it, www.humanhighway.it).

The tweets have been downloaded using the hashtag that groups together the user comments on the election of the Italian President Mattarella (1) and the one that collects the comments on the Masterchef Italian TV show (2).

• Tweets: 46,393 tweets

1. Mattarellapresidente: 10,000 tweets

2. Masterchefit: 36,393 tweets

• News Headings: 80,651 titles

The LG-based approach on which this experiment has been built includes the following basic steps:

• a pre-processing pipeline that includes two phases:

– a cleaning-up phase, carried out with Python routines, that aims to distinguish in the datasets linguistic elements from structural elements (e.g. markup information, web-specific elements);

– an automatic linguistic analysis phase, with the goal of linguistically standardizing the relevant elements obtained from the cleaned datasets; in this phase texts are tokenized, lemmatized and POS-tagged using TreeTagger (Schmid, 1994; Schmid et al., 2007) and then parsed using DeSR, a dependency parser (Attardi et al., 2009);

• an LG-based automatic analysis, in which the raw data are semantically labeled according to the syntactic/semantic rules of interpretation connected with each LG verb class.
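The cleaning-up phase of the pipeline above can be illustrated with a minimal sketch. The actual Python routines used in the experiment are not described in the text, so the patterns and the `clean` function below are assumptions: they simply separate some typical web-specific elements (markup, URLs, mentions, hashtags) from the linguistic content of a short text.

```python
import re
from typing import Dict, List, Tuple

# Assumed patterns for structural (non-linguistic) elements of web texts.
MARKUP = re.compile(r"<[^>]+>")
URL = re.compile(r"https?://\S+")
MENTION = re.compile(r"@\w+")
HASHTAG = re.compile(r"#\w+")


def clean(text: str) -> Tuple[str, Dict[str, List[str]]]:
    """Split a raw text into its linguistic part and its structural elements."""
    structural = {
        "urls": URL.findall(text),
        "mentions": MENTION.findall(text),
        "hashtags": HASHTAG.findall(text),
    }
    for pattern in (MARKUP, URL, MENTION, HASHTAG):
        text = pattern.sub(" ", text)
    # Collapse the whitespace left behind by the removed elements.
    return " ".join(text.split()), structural


cleaned, meta = clean("RT @utente Che serata! #Masterchefit http://t.co/abc <b>wow</b>")
print(cleaned)  # RT Che serata! wow
print(meta["hashtags"])  # ['#Masterchefit']
```

Whether hashtags should be dropped or kept as ordinary words is a design choice that depends on the corpus; the cleaned text would then be passed to the tokenization, lemmatization, POS-tagging and parsing tools named above.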

Figure 5.5: Examples of syntactically and semantically annotated sentences


Figure 5.5 presents three headline examples processed with both the dependency syntactic parser and the LG-based semantic analyzer.

Notice that the elements of the traditional grammar automatically identified by DeSR, such as subjects and complements, have been renamed according to the Lexicon-Grammar tradition.

The corpus, which counts 127,044 short texts, has been analyzed and semantically and syntactically annotated.

The representative sample on which the human evaluation has been performed, instead, has 42,348 texts.

The evaluation of the performances of our tool proved the effectiveness of the Lexicon-Grammar approach. The average F-scores achieved in the different datasets are 0.71 in the Twitter corpus and 0.76 in the Heading corpus.

Chapter 6

Conclusion

The present research handled the detection of linguistic phenomenaconnected to subjectivity emotions and opinions from a computa-tional point of viewThe necessity to quickly monitor huge quantity of semistructured andunstructured data from the web poses several challenges to NaturalLanguage Processing that must provide strategies and tools to ana-lyze their structures from a lexical syntactical and semantic point ofviewsThe general aim of the Sentiment Analysis shared with the broaderfields of NLP Data Mining Information Extraction etc is the auto-matic extraction of value from chaos (Gantz and Reinsel 2011) itsspecific focus instead is on opinions rather than on factual informa-tion This is the aspect that differentiates it from other computationallinguistics subfieldsThe majority of the sentiment lexicons has been manually or auto-matically created for the English language therefore existent Italianlexicons are mostly built through the translation and adaptation of theEnglish lexical databases eg SentiWordNet and WordNet-AffectUnlike many other Italian and English sentiment lexicons SentIta

223

224 CHAPTER 6 CONCLUSION

made up of the interaction of electronic dictionaries and lexicon-dependent local grammars is able to manage simple and multiword structures that can take the shape of distributionally free structures, distributionally restricted structures and frozen structures.

Moreover, differently from other lexicon-based Sentiment Analysis methods, our approach has been grounded on the solidity of the Lexicon-Grammar resources and classifications, which provide fine-grained semantic, but also syntactic, descriptions of the lexical entries. Such lexically exhaustive grammars distance themselves from the tendency of other sentiment resources to classify together words that have nothing in common from the syntactic point of view.

Furthermore, through the Semantic Predicate theory, it has been possible to postulate a complex parallelism between syntax and semantics that does not ignore the numerous idiosyncrasies of the lexicon.

In accordance with the major contributions in the Sentiment Analysis literature, we did not consider polar words in isolation. We computed their elementary sentence contexts, with the allowed transformations, and then their interaction with contextual valence shifters, the linguistic devices that are able to modify the prior polarity of the words from SentIta when occurring with them in the same sentences.

In order to do so, we took advantage of the computational power of the finite-state technology. We formalized a set of rules that work for the intensification, downtoning and negation modeling, the modality detection and the analysis of comparative forms.

Here, the difference from other state-of-the-art strategies consists in the elimination of complex mathematical calculation in favor of the easier use of embedded graphs as containers for the expressions designed to receive the same annotations within a compositional framework.

With regard to the applicative part of the research, we conducted, with satisfactory results, three experiments on the same number of Sentiment Analysis subtasks: the sentiment classification of documents and sentences, the feature-based Sentiment Analysis and the semantic role labeling based on sentiments.

This work exploited a method based on knowledge, but this does not run counter to statistical strategies. All the semantically annotated datasets produced by our fine-grained analysis can indeed be reused as testing sets for learning machines.

As concerns the limitations of this research, we mention, among others, the cases of irony, sarcasm and cultural stereotypes, which still remain open problems for NLP in general and for Sentiment Analysis in particular, since they can completely overturn the polarity of the sentences.

Sarcasm, hyperbole, jocularity, etc. are all rhetorical devices used to express verbal irony, a figurative linguistic process used to intentionally deny what is literally expressed (Gibbs, 2000; Curcó, 2000). Therefore, irony could be interpreted as a sort of indirect negation (Giora, 1995; Giora et al., 1998) in which some expectations are violated (Clark and Gerrig, 1984; Wilson and Sperber, 1992). In this regard, Reyes et al. (2013, p. 243) affirm that irony is perceived “at the boundaries of conflicting frames of reference in which an expectation of one frame has been inappropriately violated in a way that is appropriate in the other”.

This linguistic device has been explored both in computational linguistics studies and in Opinion Mining works. Sets of potential indicators have been tested on large corpora through different approaches, obtaining encouraging results that in some respects improve on the state of the art.

In detail, Tepperman et al. (2006) used prosodic, spectral and contextual cues; Carvalho et al. (2009) selected as irony cues emoticons, onomatopoeic expressions for laughter, heavy punctuation marks, quotation marks and positive interjections; Davidov et al. (2010) exploited the meta-data provided by Amazon to locate gaps between the star ratings and the sentiments expressed in the raw reviews; Reyes and Rosso (2011) chose n-grams, part-of-speech n-grams, funny profiling, positive/negative profiling, affective profiling and pleasantness profiling; González-Ibáñez et al. (2011) identified sarcasm through punctuation marks, emoticons, quotes, capitalized words, discursive terms that hint at opposition or contradiction in a text, temporal and contextual imbalance, character-grams, skip-grams and emotional scenarios concerning activation, imagery and pleasantness; Maynard and Greenwood (2014) exploited the hashtag tokenization.

Among others, Veale and Hao (2009) carried out a very interesting study on creative irony, focusing on ironic comparisons of the kind “A Topic is as Ground as a Vehicle” (Fishelov, 1993; Moon, 2008), which can consist of pre-fabricated similes, e.g. “as strong as an ox” (Taylor, 1954; Norrick, 1986; Moon, 2008), or of new, more emphatic and creative variations on the frozen similes, e.g. “as dangerous as a toothless wolf”. The authors found that 2% of these elaborations subvert the original simile to achieve an ironic effect. The problem is that sarcasm, in general, is a phenomenon inherently difficult to analyze for machines, and often for humans as well (Justo et al., 2014; Maynard and Greenwood, 2014). In addition, the barriers to the creation of brand new comparisons are very low (Veale and Hao, 2009).

In any case, the solutions to verbal irony proposed in the literature seem far from being accurate and generalizable. They frequently lead to contradictory results, as with the use of punctuation marks, which made Carvalho et al. (2009) and Tepperman et al. (2006) reach relatively high precision, but served as very weak predictors for Justo et al. (2014) and Tsur et al. (2010).

In this work, speaking in cost-benefit terms, we realized that the quantity of errors introduced by the use of irony cues would have exceeded the errors caused by their absence. Therefore, we limited the irony analysis to the cases of frozen comparative sentences, described in Section 3.4.2, that can attribute a semantic orientation to neutral words (86) or reverse the orientation of polar words (87).

(86) Arturo è bianco[0] come un cadavere [-2]
“Arturo is as white as a dead body (Arturo is pale)”

(87) Maria è agile[+2] come una gatta di piombo [-2]
“Maria is as agile as a lead cat (Maria is not agile)”
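The treatment of frozen comparisons in (86) and (87) can be illustrated with a minimal sketch. It is not the finite-state grammar used in this work: a plain dictionary lookup stands in for the local grammars, and the names `PRIOR_POLARITY`, `FROZEN_COMPARISONS` and `sentence_orientation`, as well as the tiny lexicon itself, are hypothetical and introduced only for illustration. The scores follow the annotations shown in the examples.

```python
import re

# Hypothetical mini-lexicon of prior polarities (values taken from the
# annotations in the examples; a real lexicon would be far larger).
PRIOR_POLARITY = {"bianco": 0, "agile": 2, "comodi": 2}

# Hypothetical list of frozen similes: (adjective, vehicle) -> orientation
# assigned to the whole comparison, overriding the adjective's prior polarity.
FROZEN_COMPARISONS = {
    ("bianco", "un cadavere"): -2,          # neutral adjective becomes negative
    ("agile", "una gatta di piombo"): -2,   # positive adjective is reversed
}

# Matches "<adjective> come <vehicle>" at the end of a sentence.
PATTERN = re.compile(r"\b(?P<adj>\w+) come (?P<vehicle>.+?)[.!?]?$", re.IGNORECASE)


def sentence_orientation(sentence):
    """Return the orientation of a 'come' comparison, or None if none is found."""
    match = PATTERN.search(sentence.strip())
    if match is None:
        return None
    adj = match.group("adj").lower()
    vehicle = match.group("vehicle").lower()
    # A listed frozen simile wins over the prior polarity of the adjective.
    if (adj, vehicle) in FROZEN_COMPARISONS:
        return FROZEN_COMPARISONS[(adj, vehicle)]
    # Otherwise the comparison is treated as literal and the prior polarity is kept.
    return PRIOR_POLARITY.get(adj)


print(sentence_orientation("Arturo è bianco come un cadavere"))
print(sentence_orientation("Maria è agile come una gatta di piombo"))
print(sentence_orientation("Gli spazi interni sono comodi come una berlina"))
```

In this sketch a comparison whose vehicle is not listed simply keeps the prior polarity of its adjective, which is why creative ironic comparisons that are not frozen cannot be caught by a lookup of this kind.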

However, as can be noticed in (88) and (89), ironic comparisons can vary considerably from one case to another (Veale and Hao, 2009) and cannot be distinguished automatically from non-ironic comparisons (90), as long as we do not include (domain-dependent) encyclopedic knowledge in the analysis tools.

(88) La ripresa è degna[+2] di un trattore con aratro inserito
“The pickup is worthy of a tractor with an inserted plough”

(89) E quel tocco di piccante (...) è gradevole[+2] quanto lo sarebbe una spruzzata di pepe su un gelato alla panna
“And that touch of piquancy (...) is as pleasant as a sprinkling of pepper on a cream-flavoured ice cream”

(90) Gli spazi interni sono comodi[+2] come una berlina [+2]
“Inside spaces are as comfortable as a sedan”

Similar comments on the required encyclopedic knowledge can be made about cultural stereotypes as well (91, 92).

(91) La nuova fiat 500 è consigliabile[+2] molto di più ad una ragazza [-2]
“The new Fiat 500 is recommended a lot more to a girl”

(92) Un gioco per bambini di 12 anni [-2]
“A game for 12-year-old children”

Appendix A

Italian Lexicon-Grammar Symbols

Agg Val Evaluative adjective

Agg Adjective

Avv Adverb

Ci Constrained nominal group

Ch F Completive complement without specification of mood

Det Determiner

N0 Sentence formal subject

N1, N2, N3 Sentence complements

Ni Nominal group

N Noun

Napp Appropriate noun

Nconc Concrete noun

Npc Noun of body part

Nsent Noun of Sentiment

Num Human noun

N-um Non-human noun

Ppv Pronoun in preverbal position

Prep Preposition

V Verb

V-a Adjectivalization

Vcaus Causative verb

V-n Nominalization

Vsup Support verb

Vval Evaluative verb

Vvar Support verb variant

Appendix B

Acronyms

ALU Atomic Linguistic Units

cBNP Combined Pattern Based Noun Phrases

CCG Combinatory Categorial Grammar

CFE Concordance-based Feature Extraction

CRF Conditional Random Field

CSR Class Sequential Rule

CVS Contextual Valence Shifters

DRS Discourse Representation Structure

ERTN Enhanced Recursive Transition Network

eWOM electronic Word of Mouth

FMM Forward Maximum Matching

FSA Finite State Automata

FST Finite State Transducers

LDA Latent Dirichlet Allocation

LG Lexicon-Grammar

LSA Latent Semantic Analysis

LSF Lexical Syntactic Feature

mCVS Morphological Contextual Valence Shifters

MPQA Multi Perspective Question Answering

NLP Natural Language Processing

PMI Pointwise Mutual Information

POS Part of Speech

PP Prior Polarity

QN Quality Noun

RNTN Recursive Neural Tensor Network

RST Relevant Semantic Tree

RTN Recursive Transition Network

SO Semantic Orientation

SVM Support Vector Machine

TGG Transformational-Generative Grammar

UDF User Defined taxonomy of entity Feature

WMI Web-based Mutual Information

XML eXtensible Markup Language

List of Figures

2.1 Syntactic Trees of the sentences (1) and (2)
2.2 Syntactic Trees of the sentences (1b) and (2b)
2.3 LG syntax-semantic interface
2.4 Deterministic Finite-State Automaton
2.5 Non deterministic Finite-State Automaton
2.6 Finite-State Transducer
2.7 Enhanced Recursive Transition Network

3.1 FSA for the interaction between Nsent and semantically annotated verbs
3.2 Extract of the morphological FSA that associates psych verbs to their adjectivalizations
3.3 Frozen and semi-frozen expressions that involve the vulgar word cazzo and its euphemisms
3.4 Extract of the FSA for the SO identification of idioms
3.5 Extract of the FSA for the extraction of PECO idioms
3.6 Extract of the FSA for the automatic annotation of sentiment adverbs
3.7 Extract of the FSA for the automatic annotation of quality nouns
3.8 Morphological Contextual Valence Shifting of Adjectives

5.1 Doxa, the LG sentiment classifier based on the finite-state technology
5.2 Extract of the Feature Extraction grammar
5.3 Extract of the Feature Pruning metanode
5.4 Doxa module for Feature-based Sentiment Analysis
5.5 Examples of syntactically and semantically annotated sentences

List of Tables

2.1 Example of a Lexicon-Grammar Binary Matrix

3.1 Manually built Polarity Lexicons
3.2 (Semi-)Automatically built Polarity Lexicons
3.3 Sentix (Basile and Nissim, 2013)
3.4 SentIta Evaluation tagset
3.5 SentIta Strength tagset
3.6 Composition of SentIta
3.7 SentIta tag distribution in the adjective list
3.8 Adjectives percentage values in SentIta
3.9 Examples from the Psych Predicates dictionary
3.10 Psych Predicates chosen for SentIta
3.11 All the Semantic Predicates from SentIta
3.12 Polar verbs in the main Lexicon-Grammar verbal subclasses
3.13 Polar verbs in other LG verb classes which contain at least one oriented item
3.14 Distribution of semantic labels into the LG verb classes that contain at least one oriented item
3.15 Description of the Nominalizations of the Psychological Semantic Predicates
3.16 Nominalizations of the Psych Predicates chosen from SentIta
3.17 Distribution of semantic tags in the dictionary of Adjectival Psych Predicates
3.18 Percentage values of Psych Adjectivalizations in SentIta
3.19 Suffixes from the dictionary of Psych Adjectivalizations
3.20 Adjectivalizations of the Psych Predicates
3.21 SentIta tag distribution in the list of bad words
3.22 Bad Words percentage values in SentIta
3.23 Classes of Compound Adverbs in SentIta
3.24 Composition of the compound adverb dictionary
3.25 Polar adjectives in frozen sentences
3.26 Examples of idioms that maintain, switch or shift the prior polarity of the adjectives they contain
3.27 Composition of the adverb dictionary
3.28 Adjective’s inflectional classes (Salvi and Vanelli, 2004)
3.29 Rules for the adverb derivation
3.30 Error analysis of the automatic QN annotation
3.31 Presence of the QN suffixes in different dictionaries
3.32 Productivity of the QN suffixes in different dictionaries
3.33 About the overlap among Psych V-n dictionary and QN dictionary
3.34 Precision measure about the overlap of QN and Psy V-n
3.35 Negation suffixes
3.36 Intensifier/downtoner suffixes

4.1 The effects of troppo, poco and abbastanza on polar lexical items
4.2 The effects of the co-occurrence and the negation of troppo, poco with reference to polar lexical items
4.3 Negation rules
4.4 Negation rules with the repetition of negative operators
4.5 Negation rules with strong operators
4.6 Negation rules with weak operators
4.7 Rule I: increasing comparative sentences with positive orientation
4.8 Rule III: decreasing comparative sentences with negative orientation
4.9 Rule IV: increasing comparative sentences with negative orientation
4.10 Rule VI: decreasing comparative sentences with positive orientation
4.11 Negation of Rule I: Negated increasing comparative sentences with negative orientation
4.12 Negation of Rule III: Negated decreasing comparative sentences with positive orientation
4.13 Negation of Rule IV: Negated increasing comparative sentences with negative orientation
4.14 Negation of Rule VI: Negated decreasing comparative sentences with positive orientation

5.1 Dataset of opinionated online customer reviews
5.2 Precision measure in sentence-level classification
5.3 Irrelevant matches in sentence-level classification
5.4 Precision measure in document-level classification
5.5 Recall in both the sentence-level and the document-level classifications
5.6 Semantically Labeled Dictionary of Concrete Nouns
5.7 F-score of Feature Extraction and Pruning
5.8 Extract of the LG table of the verb Class 45
5.9 Examples of semantic descriptions of the opinionated verbs from the LG class 45

Bibliography

Abdul-Mageed, M. and Diab, M. (2012). Toward building a large-scale Arabic sentiment lexicon. In Proceedings of the 6th International Global WordNet Conference, pages 18–22.

Abdul-Mageed, M., Diab, M. T., and Korayem, M. (2011). Subjectivity and sentiment analysis of Modern Standard Arabic. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2, pages 587–591. Association for Computational Linguistics.

Agerri, R. and García-Serrano, A. (2010). Q-WordNet: Extracting polarity from WordNet senses. In Proceedings of the International Conference on Language Resources and Evaluation, pages 2300–2305.

Akerlof, G. A. (1970). The market for “lemons”: Quality uncertainty and the market mechanism. In The quarterly journal of economics, pages 488–500. JSTOR.

Alba, J., Lynch, J., Weitz, B., Janiszewski, C., Lutz, R., Sawyer, A., and Wood, S. (1997). Interactive home shopping: consumer, retailer, and manufacturer incentives to participate in electronic marketplaces. In The Journal of Marketing, pages 38–53. JSTOR.

Alm, C. O., Roth, D., and Sproat, R. (2005). Emotions from text: machine learning for text-based emotion prediction. In Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, pages 579–586. Association for Computational Linguistics.

Andreevskaia, A. and Bergler, S. (2008). When specialists and generalists work together: Overcoming domain dependence in sentiment tagging. In ACL, pages 290–298.

Arad, M. (1998). Psych-notes. In UCL Working Papers in Linguistics, volume 10, pages 1–22.

Argamon, S., Bloom, K., Esuli, A., and Sebastiani, F. (2009). Automatically determining attitude type and force for sentiment analysis. In Human Language Technology. Challenges of the Information Society, pages 218–231. Springer.

Attardi, G., Dell’Orletta, F., Simi, M., and Turian, J. (2009). Accurate dependency parsing with a stacked multilayer perceptron. In Proceedings of EVALITA, volume 9.

Aue, A. and Gamon, M. Customizing sentiment classifiers to new domains: A case study. In Proceedings of recent advances in natural language processing (RANLP).

Baccianella, S., Esuli, A., and Sebastiani, F. (2010). SentiWordNet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining. In LREC, volume 10, pages 2200–2204.

Baker, C. F., Fillmore, C. J., and Lowe, J. B. (1998). The Berkeley FrameNet project. In Proceedings of the 17th international conference on Computational linguistics - Volume 1, pages 86–90. Association for Computational Linguistics.

Baker, M. (1988). Theta theory and the syntax of applicatives in Chichewa. In Natural Language & Linguistic Theory, volume 6, pages 353–389. Springer.

Baker, M. C. (1997). Thematic roles and syntactic structure. In Elements of grammar, pages 73–137. Springer.

Balahur, A. and Turchi, M. (2012). Multilingual sentiment analysis using machine translation. In Proceedings of the 3rd Workshop in Computational Approaches to Subjectivity and Sentiment Analysis, pages 52–60. Association for Computational Linguistics.

Baldoni, M., Baroglio, C., Patti, V., and Rena, P. (2012). From tags to emotions: Ontology-driven sentiment analysis in the social semantic web. In Intelligenza Artificiale, volume 6, pages 41–54.

Balibar-Mrabti, A. (1995). Une étude de la combinatoire des noms de sentiment dans une grammaire locale. In Langue française, pages 88–97. JSTOR.

Baptista, J. (2003). Some families of compound temporal adverbs in Portuguese. In Proceedings of the Workshop on Finite-State Methods for Natural Language Processing.

Baroni, M. and Vegnaduzzo, S. (2004). Identifying subjective adjectives through web-based mutual information. In Proceedings of KONVENS, volume 4, pages 17–24.

Basile, V. and Nissim, M. (2013). Sentiment analysis on Italian tweets. In Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 100–107.

Belletti, A. and Rizzi, L. (1988). Psych-verbs and θ-theory. In Natural Language & Linguistic Theory, volume 6, pages 291–352. Springer.

Benamara, F., Cesarano, C., Picariello, A., Recupero, D. R., and Subrahmanian, V. S. (2007). Sentiment analysis: Adjectives and adverbs are better than adjectives alone. In ICWSM.

Benamara, F., Chardon, B., Mathieu, Y., Popescu, V., and Asher, N. (2012). How do negation and modality impact on opinions? In Proceedings of the Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics, pages 10–18. Association for Computational Linguistics.

Bennis, H. (2004). Unergative adjectives and psych verbs. In Studies in unaccusativity: The syntax-lexicon interface, pages 84–113.

Berger, A. L., Pietra, V. J. D., and Pietra, S. A. D. (1996). A maximum entropy approach to natural language processing. In Computational linguistics, volume 22, pages 39–71. MIT Press.

Bespalov, D., Bai, B., Qi, Y., and Shokoufandeh, A. (2011). Sentiment classification based on supervised latent n-gram analysis. In Proceedings of the 20th ACM international conference on Information and knowledge management, pages 375–382. ACM.

Bethard, S., Yu, H., Thornton, A., Hatzivassiloglou, V., and Jurafsky, D. (2004). Automatic extraction of opinion propositions and their holders. In 2004 AAAI Spring Symposium on Exploring Attitude and Affect in Text, volume 2224.

Bhatti, M. W., Wang, Y., and Guan, L. A neural network approach for human emotion recognition in speech. In Circuits and Systems, 2004. ISCAS’04, volume 2.

Blitzer, J., Dredze, M., Pereira, F., et al. (2007). Biographies, Bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. In ACL, volume 7, pages 440–447.

Bloom, K. (2011). Sentiment analysis based on appraisal theory and functional local grammars. PhD thesis, Illinois Institute of Technology.

Bloom, K., Garg, N., and Argamon, S. (2007). Extracting appraisal expressions. In HLT-NAACL, volume 2007, pages 308–315.

Bloomfield, L. (1933). Language. Holt, New York. In Edizione italiana (1974), Il linguaggio. Il Saggiatore, Milano.

Bocchino, F. (2007). Lessico-Grammatica dell’italiano. Le costruzioni intransitive. PhD thesis, Università degli studi di Salerno.

Bos, J. and Nissim, M. (2006). An empirical approach to the interpretation of superlatives. In Proceedings of the 2006 conference on empirical methods in natural language processing, pages 9–17. Association for Computational Linguistics.

Bosco, C., Allisio, L., Mussa, V., Patti, V., Ruffo, G., Sanguinetti, M., and Sulis, E. (2014). Detecting happiness in Italian tweets: Towards an evaluation dataset for sentiment analysis in Felicitta. In Proc. of ESSSLOD, pages 56–63.

Bosco, C., Patti, V., and Bolioli, A. (2013). Developing corpora for sentiment analysis: The case of irony and Senti-TUT. In IEEE Intelligent Systems, number 2, pages 55–63. IEEE.

Boucher, J. and Osgood, C. E. (1969). The Pollyanna hypothesis. In Journal of Verbal Learning and Verbal Behavior, volume 8, pages 1–8. Elsevier.

Bounie, D., Bourreau, M., Gensollen, M., and Waelbroeck, P. (2005). The effect of online customer reviews on purchasing decisions: The case of video games. In Retrieved July, volume 8, page 2009. Citeseer.

Breese, J. S., Heckerman, D., and Kadie, C. (1998). Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the Fourteenth conference on Uncertainty in artificial intelligence, pages 43–52. Morgan Kaufmann Publishers Inc.

Brill, E. (1995). Transformation-based error-driven learning and natural language processing: A case study in part-of-speech tagging. In Computational linguistics, volume 21, pages 543–565. MIT Press.

Buchanan, T. W., Lutz, K., Mirzazade, S., Specht, K., Shah, N. J., Zilles, K., and Jäncke, L. (2000). Recognition of emotional prosody and verbal components of spoken language: an fMRI study. In Cognitive Brain Research, volume 9, pages 227–238. Elsevier.

Buvet, P.-A., Girardin, C., Gross, G., and Groud, C. (2005). Les prédicats d’affect. In LIDIL, number 32, pages pp–125.

Carbonell, J. G. (1979). Subjective understanding: Computer models of belief systems. Technical report, DTIC Document.

Carenini, G., Ng, R. T., and Zwart, E. (2005). Extracting knowledge from evaluative text. In Proceedings of the 3rd international conference on Knowledge capture, pages 11–18. ACM.

Carreras, X. and Màrquez, L. (2005). Introduction to the CoNLL-2005 shared task: Semantic role labeling. In Proceedings of the Ninth Conference on Computational Natural Language Learning, pages 152–164. Association for Computational Linguistics.

Carvalho, P., Sarmento, L., Silva, M. J., and De Oliveira, E. (2009). Clues for detecting irony in user-generated contents: oh...!! it’s "so easy" ;-). In Proceedings of the 1st international CIKM workshop on Topic-sentiment analysis for mass opinion, pages 53–56. ACM.

Chen, Y., Zhou, Y., Zhu, S., and Xu, H. (2012). Detecting offensive language in social media to protect adolescent online safety. In Privacy, Security, Risk and Trust (PASSAT), 2012 International Conference on and 2012 International Conference on Social Computing (SocialCom), pages 71–80. IEEE.

Chetviorkin, I. and Loukachevitch, N. V. (2012). Extraction of Russian sentiment lexicon for product meta-domain. In COLING, pages 593–610.

Chevalier, J. A. and Mayzlin, D. (2006). The effect of word of mouth on sales: Online book reviews. In Journal of marketing research, volume 43, pages 345–354. American Marketing Association.

Chin, P. (2005). The value of user-generated contents. In Intranet Journal. www.intranetjournal.com.

Choi, Y., Breck, E., and Cardie, C. (2006). Joint extraction of entities and relations for opinion recognition. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, pages 431–439. Association for Computational Linguistics.

Choi, Y. and Cardie, C. (2008). Learning with compositional semantics as structural inference for subsentential sentiment analysis. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 793–801.

Choi, Y., Cardie, C., Riloff, E., and Patwardhan, S. (2005). Identifying sources of opinions with conditional random fields and extraction patterns. In Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, pages 355–362. Association for Computational Linguistics.

Chomsky, N. (1957). Syntactic Structures. Mouton and Co., The Hague-Paris.

Chomsky, N. (1965). Aspects of the Theory of Syntax. Number 11. MIT press.

Chomsky, N. (1993). Lectures on government and binding: The Pisa lectures. Number 9. Walter de Gruyter.

Chu, S.-C. and Kim, Y. (2011). Determinants of consumer engagement in electronic word-of-mouth (eWOM) in social networking sites. In International journal of Advertising, volume 30, pages 47–75. Taylor & Francis.

Clark, H. H. and Gerrig, R. J. (1984). On the pretense theory of irony. In Journal of Experimental Psychology: General, volume 113, pages 121–126.

Clark, S. and Curran, J. R. (2004). The importance of supertagging for wide-coverage CCG parsing. In Proceedings of the 20th international conference on Computational Linguistics, page 282. Association for Computational Linguistics.

Clematide, S. and Klenner, M. (2010). Evaluation and extension of a polarity lexicon for German. In Proceedings of the First Workshop on Computational Approaches to Subjectivity and Sentiment Analysis, pages 7–13.

Councill, I. G., McDonald, R., and Velikovich, L. (2010). What’s great and what’s not: learning to classify the scope of negation for improved sentiment analysis. In Proceedings of the workshop on negation and speculation in natural language processing, pages 51–59. Association for Computational Linguistics.

Curcó, C. (2000). Irony: Negation, echo and metarepresentation. In Lingua, volume 110, pages 257–280. Elsevier.

D’Agostino, E. (1992). Analisi del discorso: metodi descrittivi dell’italiano d’uso. Loffredo.

D’Agostino, E. (2005). Grammatiche lessicalmente esaustive delle passioni: il caso dell’io collerico, le forme nominali. In Quaderns d’Italià, pages 149–169.

D’Agostino, E., De Bueriis, G., Cicalese, A., Monteleone, M., Vellutino, D., Messina, S., Langella, A., Santonicola, S., Longobardi, F., and Guglielmo, D. (2007). Lexicon-grammar classifications or better to get rid of anguish. In 26th International Conference on Lexis and Grammar (LGC’07).

D’Agostino, E. and Elia, A. (1998). Il significato delle frasi: un continuum dalle frasi semplici alle forme polirematiche. In AA. VV., Ai limiti del linguaggio. Laterza, Bari, pages 287–310.

Dalianis, H. and Skeppstedt, M. (2010). Creating and evaluating a consensus for negated and speculative words in a Swedish clinical corpus. In Negation and Speculation in Natural Language Processing (NeSp-NLP 2010), page 5.

Dang, Y., Zhang, Y., and Chen, H. (2010). A lexicon-enhanced method for sentiment classification: An experiment on online product reviews. In Intelligent Systems, IEEE, volume 25, pages 46–53. IEEE.

Darroch, J. N. and Ratcliff, D. (1972). Generalized iterative scaling for log-linear models. In The annals of mathematical statistics, pages 1470–1480. JSTOR.

Dasgupta, S. and Ng, V. (2009). Mine the easy, classify the hard: a semi-supervised approach to automatic sentiment classification. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2 - Volume 2, pages 701–709. Association for Computational Linguistics.

Davidov, D., Tsur, O., and Rappoport, A. (2010). Semi-supervised recognition of sarcastic sentences in Twitter and Amazon. In Proceedings of the Fourteenth Conference on Computational Natural Language Learning, pages 107–116. Association for Computational Linguistics.

de Albornoz, J. C., Plaza, L., and Gervás, P. (2012). SentiSense: An easily scalable concept-based affective lexicon for sentiment analysis. In LREC, pages 3562–3567.

De Bueriis, G. and Elia, A. (2008). Lessici elettronici e descrizioni lessicali, sintattiche, morfologiche ed ortografiche. Plectica.

De Gioia, M. (1994a). Chiaro come il sole a mezzogiorno: una classe di avverbi idiomatici dell’italiano. In The Linguist, volume 33, pages 220–225. Newnorth Print Ltd.

De Gioia, M. (1994b). Problemi di rappresentazione et traduzione degli avverbi idiomatici. Micromégas.

De Gioia, M. (2001). Avverbi idiomatici dell’italiano: analisi lessico-grammaticale. L’Harmattan Italia.

De Mauro, T. and Thornton, A. M. (1985). La predicazione: teoria e applicazione all’italiano. In Sintassi e morfologia della lingua italiana d’uso: teorie ed applicazioni descrittive, pages 487–519.

De Silva, L. C., Miyasato, T., and Nakatsu, R. (1997). Facial emotion recognition using multi-modal information. In Information, Communications and Signal Processing, 1997. ICICS. Proceedings of 1997 International Conference on, volume 1, pages 397–401. IEEE.

Denecke, K. (2008). Using SentiWordNet for multilingual sentiment analysis. In Data Engineering Workshop, 2008. ICDEW 2008. IEEE 24th International Conference on, pages 507–512. IEEE.

Desclés, J., Makkaoui, O., and Hacène, T. (2010). Automatic annotation of speculation in biomedical texts: new perspectives and large scale evaluation. In Negation and Speculation in Natural Language Processing (NeSp-NLP 2010), page 32.

Dewally, M. and Ederington, L. (2006). Reputation, certification, warranties, and information as remedies for seller-buyer information asymmetries: Lessons from the online comic book market. In The Journal of Business, volume 79, pages 693–729. JSTOR.

Diamond, D. W. (1989). Reputation acquisition in debt markets. In The journal of political economy, pages 828–862. JSTOR.

Dowty, D. R. (1989). On the semantic content of the notion of ‘thematic role’. In Properties, types and meaning, pages 69–129. Springer.

Dragut, E. C., Yu, C., Sistla, P., and Meng, W. (2010). Construction of a sentimental word dictionary. In Proceedings of the 19th ACM international conference on Information and knowledge management, pages 1761–1764.

Duan, W., Gu, B., and Whinston, A. B. (2008). The dynamics of online word-of-mouth and product sales - an empirical investigation of the movie industry. In Journal of retailing, volume 84, pages 233–242. Elsevier.

Elia, A. Per filo e per segno: la struttura degli avverbi composti. In D’Agostino (a cura di), Tra sintassi e semantica. Descrizioni e metodi di elaborazione automatica della lingua d’uso, pages 167–263. Napoli, ESI.

Elia, A. (1984). Le verbe italien. Les complétives dans les phrases à un complément. Fasano di Puglia, Schena-Nizet.

Elia, A. (1990). Chiaro e tondo: Lessico-Grammatica degli avverbi composti in italiano. Segno Associati.

Elia, A. (2013). On lexical, semantic and syntactic granularity of Italian verbs. In Penser le Lexique Grammaire: Perspectives Actuelles, Honoré Champion, Paris, pages 277–286.

Elia, A. (2014a). Lessico e sintassi tra tempo e massa parlante. In Marchese, M. P., Nocentini, A., Il lessico nella teoria e nella storia linguistica, pages 15–47. Edizioni il Calamo.

Elia, A. (2014b). Operatori, argomenti e il sistema leg-semantic role labelling dell’italiano. In Mirto, I. (a cura di), Le relazioni irresistibili, pages 105–118. Edizioni Ets.

Elia, A. and De Bueriis, G. (2008). Lessici elettronici e descrizioni lessicali, sintattiche, morfologiche ed ortografiche. Plectica.

Elia, A., Guglielmo, D., Maisto, A., and Pelosi, S. (2013). A linguistic-based method for automatically extracting spatial relations from large non-structured data. In Algorithms and Architectures for Parallel Processing, pages 193–200. Springer.

Elia, A., Martinelli, M., and D’Agostino, E. (1981). Lessico e Strutture sintattiche. Introduzione alla sintassi del verbo italiano. Napoli, Liguori.

Elia, A., Pelosi, S., Maisto, A., and Raffaele, G. (2015). Towards a lexicon-grammar based framework for NLP: an opinion mining application. In RANLP 2015, pages 160–167.

Elia, A. and Vietri, S. (2010). Lexis-grammar & semantic web. In INFOtheca, volume 11, pages 15a–38a.

Elia, A., Vietri, S., Postiglione, A., Monteleone, M., and Marano, F. (2010). Data mining modular software system. In SWWS, pages 127–133.

Engström, C. (2004). Topic dependence in sentiment classification. In Unpublished MPhil Dissertation. University of Cambridge.

Essa, I., Pentland, A. P., et al. (1997). Coding, analysis, interpretation, and recognition of facial expressions. In Pattern Analysis and Machine Intelligence, IEEE Transactions on, volume 19, pages 757–763. IEEE.

Esuli, A. and Sebastiani, F. (2005). Determining the semantic orientation of terms through gloss classification. In Proceedings of the 14th ACM international conference on Information and knowledge management, pages 617–624. ACM.

Esuli, A. and Sebastiani, F. (2006a). Determining term subjectivity and term orientation for opinion mining. In EACL, volume 6, page 2006.

Esuli, A. and Sebastiani, F. (2006b). SentiWordNet: A publicly available lexical resource for opinion mining. In Proceedings of LREC, volume 6, pages 417–422.

Farkas, D. F. and Kiss, K. É. (2000). On the comparative and absolute readings of superlatives. In Natural Language & Linguistic Theory, volume 18, pages 417–455. Springer.

Farkas, R., Vincze, V., Móra, G., Csirik, J., and Szarvas, G. (2010). The CoNLL-2010 shared task: learning to detect hedges and their scope in natural language text. In Proceedings of the Fourteenth Conference on Computational Natural Language Learning - Shared Task, pages 1–12. Association for Computational Linguistics.

Fellbaum, C. (1998). WordNet. Wiley Online Library.

Ferreira, L., Jakob, N., and Gurevych, I. (2008). A comparative study of feature extraction algorithms in customer reviews. In Semantic Computing, 2008 IEEE International Conference on, pages 144–151. IEEE.

Filip, H. (1996). Psychological predicates and the syntax-semantics interface. In Conceptual Structure, Discourse and Language. Stanford, Center for the Study of Language and Information.

Fillmore, C. J. (1976). Frame semantics and the nature of language. In Annals of the New York Academy of Sciences, volume 280, pages 20–32. Wiley Online Library.

Fillmore, C. J. (1977). The case for case reopened. In Syntax and semantics, volume 8, pages 59–82.

Fillmore, C. J. (2006). Frame semantics. In Cognitive linguistics: Basic readings, volume 34, pages 373–400. Berlin, Mouton de Gruyter. [Originally published in Linguistics in the Morning Calm, Seoul, Hanshin Publishing Co., 1982]

Fillmore, C. J. and Baker, C. F. (2001). Frame semantics for text understanding. In Proceedings of WordNet and Other Lexical Resources Workshop, NAACL.

Fillmore, C. J., Baker, C. F., and Sato, H. (2002). The FrameNet database and software tools. In LREC.

Fishelov, D. (1993). Poetic and non-poetic simile: structure, semantics, rhetoric. In Poetics Today, pages 1–23. JSTOR.

Fiszman M Demner-Fushman D Lang F M Goetz P and Rindflesch T C (2007) Interpreting comparative constructions in biomedical text In Proceedings of the Workshop on BioNLP 2007 Biological Translational and Clinical Language Processing pages 137–144 Association for Computational Linguistics

Fragopanagos N and Taylor J G (2005) Emotion recognition in human–computer interaction In Neural Networks volume 18 pages 389–405 Elsevier

Gaeta L (2004) Nomi d'azione In La formazione delle parole in italiano Tübingen Max Niemeyer Verlag pages 314–51

Gamon M and Aue A (2005) Automatic identification of sentiment vocabulary exploiting low association with known sentiment terms In Proceedings of the ACL Workshop on Feature Engineering for Machine Learning in Natural Language Processing pages 57–64

Ganapathibhotla M and Liu B (2008) Mining opinions in comparative sentences In Proceedings of the 22nd International Conference on Computational Linguistics-Volume 1 pages 241–248 Association for Computational Linguistics

Ganter V and Strube M (2009) Finding hedges by chasing weasels Hedge detection using wikipedia tags and shallow linguistic features In Proceedings of the ACL-IJCNLP 2009 Conference Short Papers pages 173–176 Association for Computational Linguistics

Gantz J and Reinsel D (2011) Extracting value from chaos In IDC iview number 1142 pages 9–10

Garcia D Garas A and Schweitzer F (2012) Positive words carry less information than negative words In EPJ Data Science volume 1 EPJ

Gatti L and Guerini M (2012) Assessing sentiment strength in words prior polarities In arXiv preprint arXiv:1212.4315

Gibbs R W (2000) Irony in talk among friends In Metaphor and symbol volume 15 pages 5–27 Taylor & Francis

Gildea D and Jurafsky D (2002) Automatic labeling of semantic roles In Computational linguistics volume 28 pages 245–288 MIT Press

Giora R (1995) On irony and negation In Discourse processes volume 19 pages 239–264 Taylor & Francis

Giora R Fein O and Schwartz T (1998) Irony Grade salience and indirect negation In Metaphor and Symbol volume 13 pages 83–101 Taylor & Francis

Giordano R and Voghera M (2008) Frasi senza verbo il contributo della prosodia In Sintassi storica e sincronica dell'italiano Atti del Conv Intern SILFI Basilea

Giuglea A-M and Moschitti A (2006) Semantic role labeling via framenet verbnet and propbank In Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics pages 929–936 Association for Computational Linguistics

Godard D (2013) Les négateurs In La grande Grammaire du français Actes Sud

Goldberg A B and Zhu X (2006) Seeing stars when there aren't many stars graph-based semi-supervised learning for sentiment categorization In Proceedings of the First Workshop on Graph Based Methods for Natural Language Processing pages 45–52 Association for Computational Linguistics

González-Ibánez R Muresan S and Wacholder N (2011) Identifying sarcasm in twitter a closer look In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics Human Language Technologies short papers-Volume 2 pages 581–586 Association for Computational Linguistics

Grimshaw J (1990) Argument structure the MIT Press

Gross G (1992a) Un outil pour le FLE les classes d'objets In Actes du colloque FLE pages 169–192

Gross G (2008) Les classes d'objets In Lalies number 28 pages 111–165

Gross M (1971) Transformational Analysis of French Verbal Constructions University of Pennsylvania

Gross M (1975) Méthodes en syntaxe Hermann

Gross M (1979) On the failure of generative grammar In Language pages 859–885 JSTOR

Gross M (1981) Les bases empiriques de la notion de prédicat sémantique In Langages pages 7–52 JSTOR

Gross M (1982) Une classification des phrases «figées» du français In Revue québécoise de linguistique volume 11 pages 151–185 Université du Québec à Montréal

Gross M (1986) Lexicon-grammar the representation of compound words In Proceedings of the 11th conference on Computational linguistics pages 1–6 Association for Computational Linguistics

Gross M (1988) Les limites de la phrase figée In Langages pages 7–22 JSTOR

Gross M (1990) La caractérisation des adverbes dans un lexique-grammaire In Langue française pages 90–102 JSTOR

Gross M (1992b) The argument structure of elementary sentences In Language Research volume 28 pages 699–716

Gross M (1993) Les phrases figées en français In L'information grammaticale volume 59 pages 36–41 Peeters

Gross M (1995) Une grammaire locale de l'expression des sentiments In Langue française pages 70–87 JSTOR

Gross M (1996) Les verbes supports d'adjectifs et le passif In Langages volume 30 pages 8–18 Armand Colin

Gross M (1997) The construction of local grammars In Finite-state language processing volume 11 page 329 MIT Press

Gruber J S (1965) Studies in lexical relations PhD thesis Massachusetts Institute of Technology

Gruen T W Osmonbekov T and Czaplewski A J (2006) ewom The impact of customer-to-customer online know-how exchange on customer value and loyalty In Journal of Business research volume 59 pages 449–456 Elsevier

Guglielmo D (2009) “Guardarsi allo specchio” forme lessicali e sintattiche della vanità In Atti del II Convegno Interdipartimentale del Centro Studi sulle Rappresentazioni Linguistiche

Guillet A and Leclère C (1981) Restructuration du groupe nominal In Langages pages 99–125 JSTOR

Gutiérrez Y Vázquez S and Montoyo A (2011) Sentiment classification using semantic features extracted from WordNet-based resources In Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis pages 139–145 Association for Computational Linguistics

Hanani U Shapira B and Shoval P (2001) Information filtering Overview of issues research and systems In User Modeling and User-Adapted Interaction volume 11 pages 203–259 Kluwer Academic Publishers

Hansen L K Arvidsson A Nielsen F Å Colleoni E and Etter M (2011) Good friends bad news-affect and virality in twitter In Future information technology pages 34–43 Springer

Harkema H Dowling J N Thornblade T and Chapman W W (2009) Context An algorithm for determining negation experiencer and temporal status from clinical reports In Journal of biomedical informatics volume 42 page 839 NIH Public Access

Harris Z (1988) Language and information Columbia University Press

Harris Z S (1964) Transformations in linguistic structure In Proceedings of the American Philosophical Society pages 418–422

Harris Z S (1970) Discourse analysis In Papers in structural and transformational linguistics pages 313–347

Hassan A and Radev D (2010) Identifying text polarity using random walks In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics pages 395–403

Hatzivassiloglou V and McKeown K R (1997) Predicting the semantic orientation of adjectives In Proceedings of the 35th annual meeting of the association for computational linguistics and eighth conference of the european chapter of the association for computational linguistics pages 174–181 Association for Computational Linguistics

Herlocker J L Konstan J A Borchers A and Riedl J (1999) An algorithmic framework for performing collaborative filtering In Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval pages 230–237 ACM

Hernandez-Farias I Buscaldi D and Priego-Sánchez B (2014) Iradabe Adapting english lexicons to the italian sentiment polarity classification task In First Italian Conference on Computational Linguistics (CLiC-it 2014) and the fourth International Workshop EVALITA2014 pages 75–81

Horn L (1989) A natural history of negation University of Chicago Press

Houen S (2011) Opinion mining with semantic analysis In Online www.diku.dk/forskning/Publikationer/specialer/2011/specialerapport_final_Soren_Houen.pdf

Hu M and Liu B (2004) Mining opinion features in customer reviews In AAAI volume 4 pages 755–760

Hu M and Liu B (2006) Opinion feature extraction using class sequential rules In AAAI Spring Symposium Computational Approaches to Analyzing Weblogs pages 61–66

Iacobini C (2004) Prefissazione In La formazione delle parole in italiano Tübingen Max Niemeyer Verlag pages 97–161

Jackendoff R (1972) Semantic interpretation in generative grammar MIT press Cambridge MA

Jackendoff R (1990) Semantic structures volume 18 MIT press

Jespersen O (1965) The philosophy of grammar Number 307 University of Chicago Press

Jia L Yu C and Meng W (2009) The effect of negation on sentiment analysis and retrieval effectiveness In Proceedings of the 18th ACM conference on Information and knowledge management pages 1827–1830 ACM

Jin X Li Y Mah T and Tong J (2007) Sensitive webpage classification for content advertising In Proceedings of the 1st international workshop on Data mining and audience intelligence for advertising pages 28–33 ACM

Jindal N and Liu B (2006a) Identifying comparative sentences in text documents In Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval pages 244–251 ACM

Jindal N and Liu B (2006b) Mining comparative sentences and relations In AAAI volume 22 pages 1331–1336

Jindal N and Liu B (2007) Review spam detection In Proceedings of the 16th international conference on World Wide Web pages 1189–1190 ACM

Justo R Corcoran T Lukin S M Walker M and Torres M I (2014) Extracting relevant knowledge for the detection of sarcasm and nastiness in the social web In Knowledge-Based Systems volume 69 pages 124–133 Elsevier

Kaji N and Kitsuregawa M (2007) Building lexicon for sentiment analysis from massive collection of html documents In EMNLP-CoNLL pages 1075–1083

Kamp H and Reyle U (1993) From discourse to logic Introduction to modeltheoretic semantics of natural language formal logic and discourse representation theory Number 42 Springer Science & Business Media

Kamps J Marx M Mokken R J and De Rijke M (2004) Using wordnet to measure semantic orientations of adjectives In LREC volume 4 pages 1115–1118

Kanayama H and Nasukawa T (2006a) Fully automatic lexicon expansion for domain-oriented sentiment analysis In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing pages 355–363 Association for Computational Linguistics

Kanayama H and Nasukawa T (2006b) Fully automatic lexicon expansion for domain-oriented sentiment analysis In COLING ACL 2006 page 355

Kang H Yoo S J and Han D (2012) Senti-lexicon and improved naïve bayes algorithms for sentiment analysis of restaurant reviews In Expert Systems with Applications volume 39 pages 6000–6010 Elsevier

Kennedy A and Inkpen D (2006) Sentiment classification of movie reviews using contextual valence shifters In Computational intelligence volume 22 pages 110–125 Wiley Online Library

Khan K Baharudin B and Khan A Identifying product features from customer reviews using hybrid dependency patterns Citeseer

Khan K Baharudin B B and City T (2012) Identifying product features from customer reviews using lexical concordance In Research Journal of Applied Sciences Engineering and Technology volume 4 pages 833–839

Kim S-M and Hovy E (2004) Determining the sentiment of opinions In Proceedings of the 20th international conference on Computational Linguistics page 1367

Kim S-M and Hovy E (2005) Identifying opinion holders for question answering in opinion texts In Proceedings of AAAI-05 Workshop on Question Answering in Restricted Domains pages 1367–1373

Kim S-M and Hovy E (2006) Extracting opinions opinion holders and topics expressed in online news media text In Proceedings of the Workshop on Sentiment and Subjectivity in Text pages 1–8 Association for Computational Linguistics

Kim Y Kim S and Myaeng S-H (2008) Extracting topic-related opinions and their targets in NTCIR-7 In Proceedings of the 7th NTCIR Workshop Meeting Tokyo Japan

Kim Y-j and Larson R (1989) Scope interpretation and the syntax of psych-verbs In Linguistic Inquiry pages 681–688 JSTOR

Kingsbury P and Palmer M (2002) From treebank to propbank In LREC

Klein B and Leffler K B (1981) The role of market forces in assuring contractual performance In The Journal of Political Economy pages 615–641 JSTOR

Klein K and Kutscher S (2002) Psych-verbs and lexical economy In Theorie des Lexikons volume 122 pages 1–41

Klein L R (1998) Evaluating the potential of interactive media through a new lens Search versus experience goods In Journal of business research volume 41 pages 195–203 Elsevier

Kleinberg J M (1999) Authoritative sources in a hyperlinked environment In Journal of the ACM (JACM) volume 46 pages 604–632 ACM

Kobayakawa T S Kumano T Tanaka H Okazaki N Kim J-D and Tsujii J (2009) Opinion classification with tree kernel svm using linguistic modality analysis In Proceedings of the 18th ACM conference on Information and knowledge management pages 1791–1794 ACM

Ku L-W Huang T-H and Chen H-H (2009) Using morphological and syntactic structures for chinese opinion analysis In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing Volume 3-Volume 3 pages 1260–1269

Lafferty J McCallum A and Pereira F C (2001) Conditional random fields Probabilistic models for segmenting and labeling sequence data

Lakoff G (1973) Hedges A study in meaning criteria and the logic of fuzzy concepts In Journal of philosophical logic volume 2 pages 458–508 Springer

Landau I (2010) The locative syntax of experiencers volume 34 MIT press Cambridge

Landauer T K and Dumais S T (1997) A solution to plato's problem The latent semantic analysis theory of acquisition induction and representation of knowledge In Psychological review volume 104 page 211 American Psychological Association

Laporte E (1997) L'analyse de phrases adjectivales par rétablissement de noms appropriés In Langages volume 31 pages 79–104 Armand Colin

Laporte E (2012) Appropriate nouns with obligatory modifiers In arXiv preprint arXiv:1207.4625

Laporte E and Voyatzi S (2008) An electronic dictionary of french multiword adverbs In Language Resources and Evaluation Conference Workshop Towards a Shared Task for Multiword Expressions pages 31–34

Larreya P (2004) L'expression de la modalité en français et en anglais (domaine verbal) In Revue belge de philologie et d'histoire volume 82 pages 733–762 Société pour le progrès des études philologiques et historiques

Lavanda I and Rampa G (2001) Microeconomia scelte individuali e benessere sociale Carocci

Laver M Benoit K and Garry J (2003) Extracting policy positions from political texts using words as data In American Political Science Review volume 97 pages 311–331 Cambridge Univ Press

Le Pesant D and Mathieu-Colas M (1998) Introduction aux classes d'objets In Langages pages 6–33 JSTOR

Lee C M Narayanan S S and Pieraccini R (2002) Combining acoustic and language information for emotion recognition In INTERSPEECH

Lee M and Youn S (2009) Electronic word of mouth (ewom) how ewom platforms influence consumer product judgement In International Journal of Advertising volume 28 pages 473–499 Taylor & Francis

Lenci A Lapesa G and Bonansinga G (2012) Lexit A computational resource on italian argument structure In LREC pages 3712–3718

Leung C W Chan S C and Chung F-l (2006) Integrating collaborative filtering and sentiment analysis A rating inference approach In Proceedings of the ECAI 2006 workshop on recommender systems pages 62–66

Leung C W Chan S C and Chung F-L (2008) Evaluation of a rating inference approach to utilizing textual reviews for collaborative recommendation In Cooperative Internet Computing World Scientific Publisher World Scientific

Li F Huang M and Zhu X (2010) Sentiment analysis with global topics and local dependency In AAAI

Linden G Smith B and York J (2003) Amazon.com recommendations Item-to-item collaborative filtering In Internet Computing IEEE volume 7 pages 76–80 IEEE

Liu B (2010) Sentiment analysis and subjectivity In Handbook of natural language processing volume 2 pages 627–666 Chapman & Hall Goshen CT

Liu B (2012) Sentiment analysis and opinion mining In Synthesis Lectures on Human Language Technologies volume 5 pages 1–167 Morgan & Claypool Publishers

Liu J and Seneff S (2009) Review sentiment scoring via a parse-and-paraphrase paradigm In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing Volume 1-Volume 1 pages 161–169 Association for Computational Linguistics

Macdonald C Ounis I and Soboroff I (2007) Overview of the trec 2007 blog track In TREC volume 7 pages 31–43

Mahmud A Ahmed K Z and Khan M (2008) Detecting flames and insults in text In Proceedings of the Sixth International Conference on Natural Language Processing BRAC University

Maisto A and Pelosi S (2014a) Feature-based customer review summarization In On the Move to Meaningful Internet Systems OTM 2014 Workshops pages 299–308

Maisto A and Pelosi S (2014b) A lexicon-based approach to sentiment analysis the italian module for nooj In Proceedings of the International Nooj 2014 Conference University of Sassari Italy Cambridge Scholar Publishing

Maks I and Vossen P (2011) Different approaches to automatic polarity annotation at synset level In Proceedings of the First International Workshop on Lexical Resources WoLeR pages 62–69

Maks I Vossen P et al (2012) Building a fine-grained subjectivity lexicon from a web corpus In LREC pages 3070–3076

Markov A (1971) Extension of the limit theorems of probability theory to a sum of variables connected in a chain John Wiley & Sons Inc

Màrquez L Carreras X Litkowski K C and Stevenson S (2008) Semantic role labeling an introduction to the special issue In Computational linguistics volume 34 pages 145–159 MIT Press

Martín J (1998) The lexico-syntactic interface psych verbs and psych nouns In Hispania pages 619–631 JSTOR

Mathieu Y Y (1995) Verbes psychologiques et interprétation sémantique In Langue française pages 98–106 JSTOR

Mathieu Y Y (1999a) Les prédicats de sentiment In Langages pages 41–52 JSTOR

Mathieu Y Y (1999b) Un classement sémantique des verbes psychologiques In Cahiers du CIEL Publications Paris 7

Mathieu Y Y (2006) A computational semantic lexicon of french verbs of emotion In Computing attitude and affect in text Theory and applications pages 109–124 Springer

Matthews H Hendrickson C and Soh D (2001) Environmental and economic effects of e-commerce A case study of book publishing and retail logistics In Transportation Research Record Journal of the Transportation Research Board number 1763 pages 6–12 Transportation Research Board of the National Academies

Maynard D and Greenwood M A (2014) Who cares about sarcastic tweets investigating the impact of sarcasm on sentiment analysis In Proceedings of LREC

Maziero E G Pardo T A Di Felippo A and Dias-da Silva B C (2008) A base de dados lexical e a interface web do tep 2.0 thesaurus eletrônico para o português do brasil In Companion Proceedings of the XIV Brazilian Symposium on Multimedia and the Web pages 390–392 ACM

McAfee A Brynjolfsson E Davenport T H Patil D and Barton D (2012) Big data In The management revolution Harvard Bus Rev volume 90 pages 61–67

Meier C (2003) The meaning of too enough and so that In Natural Language Semantics volume 11 pages 69–107 Springer

Meillet A (1906) La Phrase nominale en indoeuropéen Société de linguistique

Mejova Y and Srinivasan P (2011) Exploring feature definition and selection for sentiment classifiers In ICWSM

Meunier A (1984) La sémantique locative de certaines structures N0 être adj In Revue québécoise de linguistique volume 13 pages 95–121 Université du Québec à Montréal

Meunier A (1999) Une construction complexe N0hum être Adj de V0-inf W caractéristique de certains adjectifs à sujet humain In Langages pages 12–44 JSTOR

Meydan M (1996) Constructions adjectivales substantifs appropriés et verbes supports In Linx volume 34 pages 197–210 Centre de recherches linguistiques de Paris 10

Meydan M (1999) La restructuration du gn sujet dans les phrases adjectivales à substantif approprié In Langages pages 59–80 JSTOR

Miller G A (1995) WordNet a lexical database for english In Communications of the ACM volume 38 pages 39–41 ACM

Mohammad S Dunne C and Dorr B (2009) Generating high-coverage semantic orientation lexicons from overtly marked words and a thesaurus In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing Volume 2-Volume 2 pages 599–608 Association for Computational Linguistics

Moilanen K and Pulman S (2007) Sentiment composition In Proceedings of the Recent Advances in Natural Language Processing International Conference pages 378–382

Moilanen K and Pulman S (2008) The good the bad and the unknown morphosyllabic sentiment tagging of unseen words In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies Short Papers pages 109–112

Molinier C and Levrier F (2000) Grammaire des adverbes description des formes en -ment volume 33 Librairie Droz

Monachesi P (2009) Annotation of semantic roles In Utrecht University Netherlands

Montefinese M Ambrosini E Fairfield B and Mammarella N (2014) The adaptation of the affective norms for english words (anew) for italian In Behavior research methods volume 46 pages 887–903 Springer

Moon R (2008) Conventionalized as-similes in english a problem case In International Journal of Corpus Linguistics volume 13 pages 3–37 John Benjamins Publishing Company

Mulder M Nijholt A Den Uyl M and Terpstra P (2004) A lexical grammatical implementation of affect In Text Speech and Dialogue pages 171–177

Mullen T and Collier N (2004) Sentiment analysis using support vector machines with diverse information sources In EMNLP volume 4 pages 412–418

Mullen T and Malouf R (2006) A preliminary investigation into sentiment analysis of informal political discourse In AAAI Spring Symposium Computational Approaches to Analyzing Weblogs pages 159–162

Nakagawa T Inui K and Kurohashi S (2010) Dependency tree-based sentiment classification using crfs with hidden variables In Human Language Technologies The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics pages 786–794 Association for Computational Linguistics

Nakayama M Sutcliffe N and Wan Y (2010) Has the web transformed experience goods into search goods In Electronic Markets volume 20 pages 251–262 Springer

Narayanan R Liu B and Choudhary A (2009) Sentiment analysis of conditional sentences In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing Volume 1-Volume 1 pages 180–189

Nasukawa T and Yi J (2003) Sentiment analysis Capturing favorability using natural language processing In Proceedings of the 2nd international conference on Knowledge capture pages 70–77 ACM

Navigli R and Ponzetto S P (2012) BabelNet The automatic construction evaluation and application of a wide-coverage multilingual semantic network volume 193 pages 217–250

Nelson P (1970) Information and consumer behavior In The Journal of Political Economy pages 311–329 JSTOR

Neviarouskaya A (2010) Compositional Approach for Automatic Recognition of Fine-Grained Affect Judgment and Appreciation in Text PhD thesis University of Tokyo

Neviarouskaya A Prendinger H and Ishizuka M (2007) Textual affect sensing for sociable and expressive online communication In Affective Computing and Intelligent Interaction pages 218–229 Springer

Neviarouskaya A Prendinger H and Ishizuka M (2009a) Compositionality principle in recognition of fine-grained emotions from text In ICWSM

Neviarouskaya A Prendinger H and Ishizuka M (2009b) Sentiful Generating a reliable lexicon for sentiment analysis In Affective Computing and Intelligent Interaction and Workshops 2009 ACII 2009 3rd International Conference on pages 1–6 IEEE

Neviarouskaya A Prendinger H and Ishizuka M (2011) Sentiful A lexicon for sentiment analysis In Affective Computing IEEE Transactions on volume 2 pages 22–36 IEEE

Niedenthal P M Halberstadt J B Margolin J and Innes-Ker Å H (2000) Emotional state and the detection of change in facial expression of emotion In European Journal of Social Psychology volume 30 pages 211–222 Wiley Online Library

Nielsen F Å (2011) A new anew Evaluation of a word list for sentiment analysis in microblogs In arXiv preprint arXiv:1103.2903

Norrick N R (1986) Stock similes In Journal of Literary Semantics volume 15 pages 39–52

Osgood C E (1952) The nature and measurement of meaning In Psychological bulletin volume 49 page 197 American Psychological Association

Pak A and Paroubek P (2011) Twitter for sentiment analysis When language resources are not available In Database and Expert Systems Applications (DEXA) 2011 22nd International Workshop on pages 111–115 IEEE

Pal P Iyer A N and Yantorno R E (2006) Emotion detection from infant facial expressions and cries In Acoustics Speech and Signal Processing 2006 ICASSP 2006 Proceedings 2006 IEEE International Conference on volume 2 pages II–II IEEE

Palmer M Gildea D and Kingsbury P (2005) The proposition bank An annotated corpus of semantic roles In Computational linguistics volume 31 pages 71–106 MIT press

Palmer M Gildea D and Xue N (2010) Semantic role labeling In Synthesis Lectures on Human Language Technologies volume 3 pages 1–103 Morgan & Claypool Publishers

Pang B and Lee L (2004) A sentimental education Sentiment analysis using subjectivity summarization based on minimum cuts In Proceedings of the 42nd annual meeting on Association for Computational Linguistics page 271 Association for Computational Linguistics

Pang B and Lee L (2005) Seeing stars Exploiting class relationships for sentiment categorization with respect to rating scales In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics pages 115–124 Association for Computational Linguistics

Pang B and Lee L (2008a) Opinion mining and sentiment analysis In Foundations and trends in information retrieval volume 2 pages 1–135 Now Publishers Inc

Pang B and Lee L (2008b) Using very simple statistics for review search An exploration In COLING (Posters) pages 75–78

Pang B Lee L and Vaithyanathan S (2002) Thumbs up sentiment classification using machine learning techniques In Proceedings of the ACL-02 conference on Empirical methods in natural language processing-Volume 10 pages 79–86 Association for Computational Linguistics

Pantic M and Patras I (2006) Dynamics of facial expression recognition of facial actions and their temporal segments from face profile image sequences In Systems Man and Cybernetics Part B Cybernetics IEEE Transactions on volume 36 pages 433–449 IEEE

Park C and Lee T M (2009) Information direction website reputation and ewom effect A moderating role of product type In Journal of Business Research volume 62 pages 61–67 Elsevier

Paulo-Santos A Ramos C and Marques N C (2011) Determining the polarity of words through a common online dictionary In Progress in Artificial Intelligence pages 649–663 Springer

Pelosi S (2015a) Morphological relations for the automatic expansion of italian sentiment lexicons In Proceedings of the International Scientific Conference on the Automatic Processing of Natural-Language Electronic Texts NooJ 2015

Pelosi S (2015b) Sentita and doxa Italian databases and tools for sentiment analysis purposes In Proceedings of the Second Italian Conference on Computational Linguistics CLiC-it 2015 pages 226–231 Accademia University Press

Pelosi S Maisto A and Elia A (2015) Sentiment analysis through finite state automata In the 12th International Conference on Finite-State Methods and Natural Language Processing 2015 - FSMNLP 2015

Perez-Rosas V Banea C and Mihalcea R (2012) Learning sentiment lexicons in spanish In LREC pages 3077–3081

Pesetsky D (1996) Zero syntax Experiencers and cascades Number 27 MIT press

Pianta E Bentivogli L and Girardi C (2002) MultiWordNet developing an aligned multilingual database In Proceedings of the first international conference on global WordNet volume 152 pages 55–63

Piao S Ananiadou S Tsuruoka Y Sasaki Y and McNaught J (2007) Mining opinion polarity relations of citations In International Workshop on Computational Semantics (IWCS) pages 366–371

Picabia L (1978) Les constructions adjectivales en français systématique transformationnelle volume 11 Librairie Droz

Picard R W and Picard R (1997) Affective computing volume 252 MIT press Cambridge

Pierre-Yves O (2003) The production and recognition of emotions in speech features and algorithms In International Journal of Human-Computer Studies volume 59 pages 157–183 Elsevier

Plag I (2003) Word-formation in English Cambridge University Press

Polanyi L and Zaenen A (2006) Contextual valence shifters In Computing attitude and affect in text Theory and applications pages 1–10 Springer

Popescu A-M and Etzioni O (2007) Extracting product features and opinions from reviews In Natural language processing and text mining pages 9–28 Springer

Portner P (2009) Modality OUP Oxford

Potts C (2011) On the negativity of negation In Semantics and Linguistic Theory number 20 pages 636–659

Prabowo R and Thelwall M (2009) Sentiment analysis A combined approach In Journal of Informetrics volume 3 pages 143–157 Elsevier

Pradhan S Hacioglu K Ward W Martin J H and Jurafsky D (2003) Semantic role parsing Adding semantic structure to unstructured text In Data Mining 2003 ICDM 2003 Third IEEE International Conference on pages 629–632 IEEE

Qiu G Liu B Bu J and Chen C (2009) Expanding domain sentiment lexicon through double propagation In IJCAI volume 9 pages 1199–1204

Quirk R Crystal D and Education P (1985) A comprehensive grammar of the English language volume 397 Cambridge Univ Press

Rahimi Rastgar S and Razavi N (2013) A System for Building Corpus Annotated With Semantic Roles PhD thesis

Rainer F (2004) Derivazione nominale deaggettivale In La formazione delle parole in italiano pages 293–314

Rao D and Ravichandran D (2009) Semi-supervised polarity lexicon induction In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics pages 675–682 Association for Computational Linguistics

Rappaport Hovav, M. and Levin, B. (1998). Building verb meanings. In The projection of arguments: Lexical and compositional factors, pages 97–134.

Read, J. and Carroll, J. (2009). Weakly supervised techniques for domain-independent sentiment classification. In Proceedings of the 1st international CIKM workshop on Topic-sentiment analysis for mass opinion, pages 45–52. ACM.

Reinstein, D. A. and Snyder, C. M. (2005). The influence of expert reviews on consumer demand for experience goods: A case study of movie critics. In The journal of industrial economics, volume 53, pages 27–51. Wiley Online Library.

Remus, R., Quasthoff, U., and Heyer, G. (2010). Sentiws - a publicly available german-language resource for sentiment analysis. In LREC.

Reyes, A. and Rosso, P. (2011). Mining subjective knowledge from customer reviews: a specific case of irony detection. In Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis, pages 118–124. Association for Computational Linguistics.

Reyes, A., Rosso, P., and Veale, T. (2013). A multidimensional approach for detecting irony in twitter. In Language Resources and Evaluation, volume 47, pages 239–268. Springer.

Reynolds, K., Kontostathis, A., and Edwards, L. (2011). Using machine learning to detect cyberbullying. In Machine Learning and Applications and Workshops (ICMLA), 2011 10th International Conference on, volume 2, pages 241–244. IEEE.

Ricca, D. (2004a). Aggettivi deverbali. In La formazione delle parole in italiano, Tübingen, Niemeyer, pages 419–444.

Ricca, D. (2004b). Derivazione avverbiale. In La formazione delle parole in italiano, pages 472–489.

Riloff, E. (1996). An empirical study of automated dictionary construction for information extraction in three domains. In Artificial intelligence, volume 85, pages 101–134. Elsevier.

Riloff, E., Patwardhan, S., and Wiebe, J. (2006). Feature subsumption for opinion analysis. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, pages 440–448. Association for Computational Linguistics.

Riloff, E., Wiebe, J., and Wilson, T. (2003). Learning subjective nouns using extraction pattern bootstrapping. In Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003-Volume 4, pages 25–32. Association for Computational Linguistics.

Rosenbaum, P. S. (1967). The grammar of English predicate complement constructions. Research monograph 47. Cambridge, MA: MIT Press.

Rubin, V. L. (2010). Epistemic modality: From uncertainty to certainty in the context of information seeking as interactions with texts. In Information Processing & Management, volume 46, pages 533–540. Elsevier.

Ruppenhofer, J. (2013). Anchoring sentiment analysis in frame semantics. In Veredas, pages 66–81.

Ruppenhofer, J., Ellsworth, M., Petruck, M. R., Johnson, C. R., and Scheffczyk, J. (2006). FrameNet II: Extended theory and practice.

Ruppenhofer, J. and Rehbein, I. (2012). Semantic frames as an anchor representation for sentiment analysis. In Proceedings of the 3rd Workshop in Computational Approaches to Subjectivity and Sentiment Analysis, pages 104–109. Association for Computational Linguistics.

Ruppenhofer, J., Somasundaran, S., and Wiebe, J. (2008). Finding the sources and targets of subjective expressions. In LREC.

Russom, P. et al. (2011). Big data analytics. In TDWI Best Practices Report, Fourth Quarter.

Ruwet, N. (1994). Être ou ne pas être un verbe de sentiment. In Langue française, pages 45–55. JSTOR.

Sagot, B. and Fort, K. (2007). Améliorer un lexique syntaxique à l’aide des tables du lexique-grammaire: adverbes en -ment. In 26e Colloque International sur le Lexique et la grammaire 2007.

Salvi, G. and Vanelli, L. (2004). Nuova grammatica italiana. Bologna, Il Mulino.

Sarwar, B., Karypis, G., Konstan, J., and Riedl, J. (2001). Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th international conference on World Wide Web, pages 285–295. ACM.

Saurí, R. and Pustejovsky, J. (2009). Factbank: A corpus annotated with event factuality. In Language resources and evaluation, volume 43, pages 227–268. Springer.

Schmid, H. (1994). Probabilistic part-of-speech tagging using decision trees. In Proceedings of the international conference on new methods in language processing, volume 12, pages 44–49.

Schmid, H., Baroni, M., Zanchetta, E., and Stein, A. (2007). The enriched treetagger system. In Proceedings of the EVALITA 2007 workshop.

Schuler, K. K. (2005). VerbNet: A broad-coverage, comprehensive verb lexicon. PhD thesis, Computer and Information Science Dept., University of Pennsylvania.

Schuller, B., Rigoll, G., and Lang, M. (2003). Hidden markov model-based speech emotion recognition. In Acoustics, Speech, and Signal Processing, 2003. Proceedings (ICASSP’03). 2003 IEEE International Conference on, volume 2, pages II–1. IEEE.

Schuller, B., Rigoll, G., and Lang, M. (2004). Speech emotion recognition combining acoustic features and linguistic information in a hybrid support vector machine-belief network architecture. In Acoustics, Speech, and Signal Processing, 2004. Proceedings (ICASSP’04). IEEE International Conference on, volume 1, pages I–577. IEEE.

Schwarzschild, R. (2008). The semantics of comparatives and other degree constructions. In Language and Linguistics Compass, volume 2, pages 308–331. Wiley Online Library.

Seki, Y., Eguchi, K., Kando, N., and Aono, M. (2005). Multi-document summarization with subjectivity analysis at duc 2005. In Proceedings of the Document Understanding Conference (DUC).

Seki, Y., Evans, D. K., Ku, L.-W., Chen, H.-H., Kando, N., and Lin, C.-Y. (2007). Overview of opinion analysis pilot task at ntcir-6. In Proceedings of NTCIR-6 Workshop Meeting, pages 265–278.

Shaikh, M. A. M., Prendinger, H., and Mitsuru, I. (2007). Assessing sentiment of text by semantic dependency and contextual valence analysis. In Affective Computing and Intelligent Interaction, pages 191–202. Springer.

Shapiro, C. (1982). Consumer information, product quality, and seller reputation. In The Bell Journal of Economics, pages 20–35. JSTOR.

Shapiro, C. (1983). Premiums for high quality products as returns to reputations. In The quarterly journal of economics, pages 659–679. JSTOR.

Shimada, K. and Endo, T. (2008). Seeing several stars: A rating inference task for a document containing several evaluation criteria. In Advances in Knowledge Discovery and Data Mining, pages 1006–1014. Springer.

Silberztein, M. (2003). NooJ manual. Available for download at www.nooj4nlp.net.

Silberztein, M. (2008). Complex annotations with nooj. In Proceedings of the 2007 International NooJ Conference, pages p–214. Cambridge Scholars Publishing.

Silva, M. J., Carvalho, P., Costa, C., and Sarmento, L. (2010). Automatic expansion of a social judgment lexicon for sentiment analysis. Technical Report TR 1008.

Silva, M. J., Carvalho, P., and Sarmento, L. (2012). Building a sentiment lexicon for social judgement mining. In Computational Processing of the Portuguese Language, pages 218–228. Springer.

Singhal, P. and Bansal, A. (2013). Improved textual cyberbullying detection using data mining. In International Journal of Information and Computation Technology, ISSN, pages 0974–2239.

Snyder, B. and Barzilay, R. (2007). Multiple aspect ranking using the good grief algorithm. In HLT-NAACL, pages 300–307.

Socher, R., Perelygin, A., Wu, J. Y., Chuang, J., Manning, C. D., Ng, A. Y., and Potts, C. (2013). Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the conference on empirical methods in natural language processing (EMNLP), volume 1631, page 1642.

Somprasertsri, G. and Lalitrojwong, P. (2010). Mining feature-opinion in online customer reviews for opinion summarization. In J. UCS, volume 16, pages 938–955.

Souza, M., Vieira, R., Busetti, D., Chishman, R., Alves, I. M., et al. (2011). Construction of a portuguese opinion lexicon from multiple resources. In STIL.

Steinberger, J., Ebrahim, M., Ehrmann, M., Hurriyetoglu, A., Kabadjov, M., Lenkova, P., Steinberger, R., Tanev, H., Vázquez, S., and Zavarella, V. (2012). Creating sentiment dictionaries via triangulation. In Decision Support Systems, volume 53, pages 689–694. Elsevier.

Stigler, G. J. (1961). The economics of information. In The journal of political economy, pages 213–225. JSTOR.

Stone, P. J., Dunphy, D. C., and Smith, M. S. (1966). The General Inquirer: A Computer Approach to Content Analysis. MIT Press.

Stone, P. J. and Hunt, E. B. (1963). A computer approach to content analysis: studies using the general inquirer system. In Proceedings of the May 21-23, 1963, spring joint computer conference, pages 241–256. ACM.

Strapparava, C. and Mihalcea, R. (2008). Learning to identify emotions in text. In Proceedings of the 2008 ACM symposium on Applied computing, pages 1556–1560. ACM.

Strapparava, C., Valitutti, A., et al. (2004). WordNet-Affect: an affective extension of WordNet. In LREC, volume 4, pages 1083–1086.

Strapparava, C., Valitutti, A., and Stock, O. (2006). The affective weight of lexicon. In Proceedings of the fifth international conference on language resources and evaluation, pages 423–426.

Swier, R. S. and Stevenson, S. (2004). Unsupervised semantic role labelling. In Proceedings of EMNLP, volume 95, page 102.

Taboada, M., Anthony, C., and Voll, K. (2006). Methods for creating semantic orientation dictionaries. In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC), Genova, Italy, pages 427–432.

Taboada, M., Brooke, J., Tofiloski, M., Voll, K., and Stede, M. (2011). Lexicon-based methods for sentiment analysis. In Computational linguistics, volume 37, pages 267–307. MIT Press.

Taboada, M. and Grieve, J. (2004). Analyzing appraisal automatically. In Proceedings of AAAI Spring Symposium on Exploring Attitude and Affect in Text (AAAI Technical Report SS-04-07), Stanford University, CA, pp. 158–161. AAAI Press.

Tan, S., Cheng, X., Wang, Y., and Xu, H. (2009). Adapting naive bayes to domain adaptation for sentiment analysis. In Advances in Information Retrieval, pages 337–349. Springer.

Taylor, A. (1954). Proverbial comparisons and similes from California, volume 3. University of California Press.

Tepperman, J., Traum, D. R., and Narayanan, S. (2006). “yeah right”: sarcasm recognition for spoken dialogue systems. In INTERSPEECH.

Terveen, L., Hill, W., Amento, B., McDonald, D., and Creter, J. (1997). Phoaks: A system for sharing recommendations. In Communications of the ACM, volume 40, pages 59–62. ACM.

Tesnière, L. (1959). Eléments de syntaxe structurale. Librairie C. Klincksieck.

Thomas, M., Pang, B., and Lee, L. (2006). Get out the vote: Determining support or opposition from congressional floor-debate transcripts. In Proceedings of the 2006 conference on empirical methods in natural language processing, pages 327–335. Association for Computational Linguistics.

Tolone, E. and Stavroula, V. (2011). Extending the adverbial coverage of a nlp oriented resource for french. In arXiv preprint arXiv:1111.3462.

Tsur, O., Davidov, D., and Rappoport, A. (2010). Icwsm - a great catchy name: Semi-supervised recognition of sarcastic sentences in online product reviews. In ICWSM.

Turney, P. D. (2001). Mining the web for synonyms: Pmi-ir versus lsa on toefl. In Machine Learning: ECML 2001, pages 491–502. Springer.

Turney, P. D. (2002). Thumbs up or thumbs down?: semantic orientation applied to unsupervised classification of reviews. In Proceedings of the 40th annual meeting on association for computational linguistics, pages 417–424.

Turney, P. D. and Littman, M. L. (2003). Measuring praise and criticism: Inference of semantic orientation from association. In ACM Transactions on Information Systems (TOIS), volume 21, pages 315–346.

Veale, T. and Hao, Y. (2009). Support structures for linguistic creativity: A computational analysis of creative irony in similes. In Proceedings of CogSci, pages 1376–1381.

Velikovich, L., Blair-Goldensohn, S., Hannan, K., and McDonald, R. (2010). The viability of web-derived polarity lexicons. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 777–785. Association for Computational Linguistics.

Vermeij, M. (2005). The orientation of user opinions through adverbs, verbs and nouns. In 3rd Twente Student Conference on IT, Enschede, June.

Vietri, S. (1990). On some comparative frozen sentences in italian. In Lingvisticæ Investigationes, volume 14, pages 149–174. John Benjamins Publishing Company.

Vietri, S. (2004). Lessico-grammatica dell’italiano. Metodi, descrizioni e applicazioni. UTET Università.

Vietri, S. (2011). On a class of italian frozen sentences. In Lingvisticæ Investigationes, volume 34, pages 228–267. John Benjamins Publishing Company.

Vietri, S. (2014a). The construction of an annotated corpus for the analysis of italian transfer predicates. In Lingvisticæ Investigationes, volume 37, pages 69–105. John Benjamins Publishing Company.

Vietri, S. (2014b). Idiomatic Constructions in Italian: A Lexicon-grammar Approach, volume 31. John Benjamins Publishing Company.

Vietri, S. (2014c). The italian module for nooj. In Proceedings of the First Italian Conference on Computational Linguistics, CLiC-it 2014. Pisa University Press.

Vietri, S. (2014d). The lexicon-grammar of italian idioms. In Workshop on Lexical and Grammatical Resources for Language Processing, page 137.

Villars, R. L., Olofson, C. W., and Eastwood, M. (2011). Big data: What it is and why you should care. In White Paper, IDC.

Vincze, V. (2010). Speculation and negation annotation in natural language texts: what the case of bioscope might (not) reveal. In Negation and Speculation in Natural Language Processing (NeSp-NLP 2010), page 28.

Vincze, V., Szarvas, G., Farkas, R., Móra, G., and Csirik, J. (2008). The bioscope corpus: biomedical texts annotated for uncertainty, negation and their scopes. In BMC bioinformatics, volume 9, page S9. BioMed Central Ltd.

Voghera, M. (2004). Polirematiche. In Grossmann, M., Rainer, F. (a cura di), La formazione delle parole in italiano, Tübingen, Niemeyer, pages 56–69.

Voll, K. and Taboada, M. (2007). Not all words are created equal: Extracting semantic orientation as a function of adjective relevance. In AI 2007: Advances in Artificial Intelligence, pages 337–346. Springer.

Vollero, A. (2010). E-marketing e Web communication: Verso la gestione della corporate reputation online. Giappichelli, Torino.

von Ungern-Sternberg, T. and von Weizsäcker, C. C. (1985). The supply of quality on a market for experience goods. In The Journal of Industrial Economics, pages 531–540. JSTOR.

Voyatzi, S. (2006). Description morphosyntaxique et sémantique des adverbes figés en vue d’un système d’analyse automatique des textes grecs. PhD thesis, Université Paris-Est.

Waltinger, U. (2010). Germanpolarityclues: A lexical resource for german sentiment analysis. In LREC.

Wan, X. (2009). Co-training for cross-lingual sentiment classification. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1, pages 235–243. Association for Computational Linguistics.

Wang, X., Zhao, Y., and Fu, G. (2011). A morpheme-based method to chinese sentence-level sentiment classification. In Int. J. of Asian Lang. Proc., volume 21, pages 95–106.

Warriner, A. B., Kuperman, V., and Brysbaert, M. (2013). Norms of valence, arousal, and dominance for 13,915 english lemmas. In Behavior research methods, volume 45, pages 1191–1207. Springer.

Wawer, A. (2012). Extracting emotive patterns for languages with rich morphology. In International Journal of Computational Linguistics and Applications, volume 3, pages 11–24.

Wei, C.-P., Chen, Y.-M., Yang, C.-S., and Yang, C. C. (2010). Understanding what concerns consumers: a semantic approach to product feature extraction from consumer reviews. In Information Systems and E-Business Management, volume 8, pages 149–167. Springer.

Whissel, C. (1989). The dictionary of affect in language. In R. Plutchik and H. Kellerman, Eds., Emotion: Theory, research and experience, vol. 4: The measurement of emotions. New York: Academic.

Whissell, C. (2009). Using the revised dictionary of affect in language to quantify the emotional undertones of samples of natural language. In Psychological reports, volume 105, pages 509–521. Ammons Scientific.

Whitley, M. S. (1995). Gustar and other psych verbs: a problem in transitivity. In Hispania, pages 573–585. JSTOR.

Wiebe, J. (2000). Learning subjective adjectives from corpora. In AAAI/IAAI, pages 735–740.

Wiebe, J., Wilson, T., Bruce, R., Bell, M., and Martin, M. (2004). Learning subjective language. In Computational linguistics, volume 30, pages 277–308. MIT Press.

Wiebe, J., Wilson, T., and Cardie, C. (2005). Annotating expressions of opinions and emotions in language. In Language resources and evaluation, volume 39, pages 165–210. Springer.

Wiegand, M., Balahur, A., Roth, B., Klakow, D., and Montoyo, A. (2010). A survey on the role of negation in sentiment analysis. In Proceedings of the workshop on negation and speculation in natural language processing, pages 60–68. Association for Computational Linguistics.

Wilson, D. and Sperber, D. (1992). On verbal irony. In Lingua, volume 87, pages 53–76. Elsevier.

Wilson, T., Wiebe, J., and Hoffmann, P. (2005). Recognizing contextual polarity in phrase-level sentiment analysis. In Proceedings of the conference on human language technology and empirical methods in natural language processing, pages 347–354. Association for Computational Linguistics.

Wilson, T., Wiebe, J., and Hoffmann, P. (2009). Recognizing contextual polarity: An exploration of features for phrase-level sentiment analysis. In Computational linguistics, volume 35, pages 399–433. MIT Press.

Xia, R. and Zong, C. (2010). Exploring the use of word relation features for sentiment classification. In Proceedings of the 23rd International Conference on Computational Linguistics: Posters, pages 1336–1344. Association for Computational Linguistics.

Xiang, G., Fan, B., Wang, L., Hong, J., and Rose, C. (2012). Detecting offensive tweets via topical feature discovery over a large scale twitter corpus. In Proceedings of the 21st ACM international conference on Information and knowledge management, pages 1980–1984. ACM.

Xu, Z. and Zhu, S. (2010). Filtering offensive language in online communities using grammatical relations. In Proceedings of the Seventh Annual Collaboration, Electronic Messaging, Anti-Abuse and Spam Conference.

Yang, S. and Ko, Y. (2011). Extracting comparative entities and predicates from texts using comparative type classification. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1, pages 1636–1644. Association for Computational Linguistics.

Ye, Q., Law, R., Gu, B., and Chen, W. (2011). The influence of user-generated content on traveler behavior: An empirical investigation on the effects of e-word-of-mouth to hotel online bookings. In Computers in Human Behavior, volume 27, pages 634–639. Elsevier.

Ye, Q., Zhang, Z., and Law, R. (2009). Sentiment classification of online reviews to travel destinations by supervised machine learning approaches. In Expert Systems with Applications, volume 36, pages 6527–6535. Elsevier.

Yessenalina, A. and Cardie, C. (2011). Compositional matrix-space models for sentiment analysis. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 172–182. Association for Computational Linguistics.

Yi, J., Nasukawa, T., Bunescu, R., and Niblack, W. (2003). Sentiment analyzer: Extracting sentiments about a given topic using natural language processing techniques. In Data Mining, 2003. ICDM 2003. Third IEEE International Conference on, pages 427–434.

Yin, D., Xue, Z., Hong, L., Davison, B. D., Kontostathis, A., and Edwards, L. (2009). Detection of harassment on web 2.0. In Proceedings of the Content Analysis in the WEB, volume 2, pages 1–7.

Yuen, R. W., Chan, T. Y., Lai, T. B., Kwong, O. Y., and T’sou, B. K. (2004). Morpheme-based derivation of bipolar semantic orientation of chinese words. In Proceedings of the 20th international conference on Computational Linguistics, page 1008. Association for Computational Linguistics.

Zhang, L. and Liu, B. (2011). Identifying noun product features that imply opinions. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers-Volume 2, pages 575–580. Association for Computational Linguistics.

Zhang, L., Liu, B., Lim, S. H., and O’Brien-Strain, E. (2010). Extracting and ranking product features in opinion documents. In Proceedings of the 23rd international conference on computational linguistics: Posters, pages 1462–1470. Association for Computational Linguistics.

Zhao, Q., Sun, C., Liu, B., and Cheng, Y. (2010). Learning to detect hedges and their scope using crf. In Proceedings of the Fourteenth Conference on Computational Natural Language Learning: Shared Task, pages 100–105. Association for Computational Linguistics.

Zhou, N. (2010). Taboo Language on the Internet: An Analysis of Gender Differences in Using Taboo Language. PhD thesis, Kristianstad University.


    4.2.2 Intensification and Downtoning in SentIta
    4.2.3 Excess Quantifiers
  4.3 Negation Modeling
    4.3.1 Related Works
    4.3.2 Negation in SentIta
  4.4 Modality Detection
    4.4.1 Related Works
    4.4.2 Modality in SentIta
  4.5 Comparative Sentences Mining
    4.5.1 Related Works
    4.5.2 Comparison in SentIta
    4.5.3 Interaction with the Intensification Rules
    4.5.4 Interaction with the Negation Rules

5 Specific Tasks of the Sentiment Analysis
  5.1 Sentiment Polarity Classification
    5.1.1 Related Works
    5.1.2 DOXA: a Sentiment Classifier based on FSA
    5.1.3 A Dataset of Opinionated Reviews
    5.1.4 Experiment and Evaluation
      5.1.4.1 Sentence-level Sentiment Analysis
      5.1.4.2 Document-level Sentiment Analysis
  5.2 Feature-based Sentiment Analysis
    5.2.1 Related Works
    5.2.2 Experiment and Evaluation
    5.2.3 Opinion Visual Summarization
  5.3 Sentiment Role Labeling
    5.3.1 Related Works
    5.3.2 Experiment and Evaluation

6 Conclusion

A Italian Lexicon-Grammar Symbols

B Acronyms

Bibliography

Chapter 1

Introduction

Consumers as Internet users can freely share their thoughts withhuge and geographically dispersed groups of people competing thisway with the traditional power of marketing and advertising channelsDifferently from the traditional word-of-mouth which is usually lim-ited to private conversations the Internet used generated contentscan be directly observed and described by the researchers (Pang andLee 2008a p 7) identified the 2001 as the ldquobeginning of widespreadawareness of the research problems and opportunities that SentimentAnalysis and Opinion Mining raise and subsequently there have beenliterally hundreds of papers published on the subjectrdquoThe context that facilitated the growth of interest around the auto-matic treatment of opinion and emotions can be summarized in thefollowing points

• the expansion of e-commerce (Matthews et al., 2001);

• the growth of user-generated content (forums, discussion groups, blogs, social media, review websites, aggregation sites), which can constitute large-scale databases for the training of machine learning algorithms (Chin, 2005; Pang and Lee, 2008a);

• the rise of machine learning methods (Pang and Lee, 2008a);

• the importance of on-line Word of Mouth (eWOM) (Gruen et al., 2006; Lee and Youn, 2009; Park and Lee, 2009; Chu and Kim, 2011);

• customer empowerment (Vollero, 2010);

• the large volume, the high velocity and the wide variety of unstructured data (Russom et al., 2011; Villars et al., 2011; McAfee et al., 2012).

The same preconditions caused a parallel rise of attention in the economics and marketing literature. Basically, through user-generated content, consumers share positive or negative information that can influence purchase decisions in many different ways and can shape buyers' expectations, above all with regard to experience goods (Nakayama et al., 2010) such as hotels (Ye et al., 2011; Nelson, 1970), restaurants (Zhang et al., 2010), movies (Duan et al., 2008; Reinstein and Snyder, 2005), books (Chevalier and Mayzlin, 2006) or videogames (Zhu and Zhang, 2006; Bounie et al., 2005).

1.1 Research Context

Experience goods are products or services whose qualities cannot be observed prior to purchase. They are distinguished from search goods (e.g. clothes, smartphones) on the basis of whether they can be tested before the purchase and of the information costs involved (Nelson, 1970). Akerlof (1970, pp. 488-490) clearly exemplified the problems caused by the interaction of quality differences and uncertainty:

“There are many markets in which buyers use some market statistic to judge the quality of prospective purchases. [...] The automobile market is used as a finger exercise to illustrate and develop these thoughts. [...] The individuals in this market buy a new automobile without knowing whether the car they buy will be good or a lemon.¹ [...] After owning a specific car, however, for a length of time, the car owner can form a good idea of the quality of this machine. [...] An asymmetry in available information has developed: for the sellers now have more knowledge about the quality of a car than the buyers. [...] The bad cars sell at the same price as good cars since it is impossible for a buyer to tell the difference between a good and a bad car; only the seller knows.”

Asymmetric information could damage both the customers, when they are not able to recognize good quality products and services, and the market itself, when it is negatively affected by the presence of a smaller quantity of quality products than would have been offered under conditions of symmetrical information (Lavanda and Rampa, 2001; von Ungern-Sternberg and von Weizsäcker, 1985).

However, for experience goods it has been deemed possible to base an evaluation of quality on the past experiences of customers who, in previous “periods”, had already experienced the same goods:

“In most real world situations suppliers stay on the market for considerably longer than one ‘period’. They then have the possibility to build up reputations or goodwill with the consumers. This is due to the following mechanism. While the consumers cannot directly observe a product's quality at the time of purchase, they may try to draw inferences about this quality from the past experience they (or others) have had with this supplier's products.” (von Ungern-Sternberg and von Weizsäcker, 1985, p. 531)

¹ A defective car, discovered to be bad only after it has been bought.

Possible sources of information for buyers of experience goods are word of mouth and the good reputation of the sellers (von Ungern-Sternberg and von Weizsäcker, 1985; Diamond, 1989; Klein and Leffler, 1981; Shapiro, 1982, 1983; Dewally and Ederington, 2006).

The possibility of surveying other customers' experiences increases the chance of selecting a good quality service and the customer's expected utility. It is worthwhile to continue this search until its cost equals its expected marginal return: this is the “optimum amount of search” defined by Stigler (1961).

The rapid growth of the Internet drew the attention of managers and business academics to the possible influences that this medium can exert on customers' information search behaviors and acquisition processes. The hypothesis that the Web 2.0, as an interactive medium, could make the experience goods market transparent by transforming it into a search goods market has been explored by Klein (1998), who based his idea on three factors:

• lower search costs;

• different evaluation of some product (attributes);

• possibility to experience products (attributes) virtually, without physically inspecting them.

Actually, this idea seems too groundbreaking to be entirely true. Indeed, the real outcome of web searches depends on the quantity of available information, on its quality and on the possibility of making fruitful comparisons among alternatives (Alba et al., 1997; Nakayama et al., 2010).

From this perspective, it is true that the growth of user-generated content and eWOM can reduce information search costs. On the other hand, the distance increased by e-commerce, the content explosion and the information overload typical of the Big Data age can seriously hinder the achievement of a symmetrical distribution of information, affecting not only the market of experience goods but also that of search goods.

1.2 Filtering Information Automatically

The last remarks on the influence of the Internet on information searches inevitably reopen the long-standing problems connected to web information filtering (Hanani et al., 2001) and to the possibility of automatically “extracting value from chaos” (Gantz and Reinsel, 2011).

An appropriate management of online corporate reputation requires a careful monitoring of the new digital environments, which strengthen the stakeholders' influence and independence and give support during the decision making process. For example, an in-depth analysis of these environments can show a company the areas that need improvement. Furthermore, they can be a valid support in price determination or in demand prediction (Bloom, 2011; Pang and Lee, 2008a).

It would indeed be difficult for humans to read and summarize such a huge volume of data about customer opinions. On the other hand, introducing machines to the semantic dimension of human language remains an ongoing challenge.

In this context, business and intelligence applications would play a crucial role thanks to their ability to automatically analyze and summarize not only databases, but also raw data, in real time.

The largest amount of on-line data is semistructured or unstructured and, as a result, its monitoring requires sophisticated Natural Language Processing (NLP) tools, which must be able to pre-process the data from a linguistic point of view and then automatically access their semantic content.

Traditional NLP tasks consist of Data Mining, Information Retrieval and the Extraction of facts, i.e. objective information about entities, from semistructured and unstructured texts.

In any case, it is of crucial importance for both customers and companies to have at their disposal automatically extracted, analyzed and summarized data that include not only factual information, but also opinions regarding any kind of good on offer (Liu, 2010; Bloom, 2011).

Companies could take advantage of concise and comprehensive customer opinion overviews that automatically summarize the strengths and the weaknesses of their products and services, with evident benefits in terms of reputation management and customer relationship management.

Customer information search costs could be decreased through the same overviews, which offer the opportunity to evaluate and compare the positive and negative experiences of other consumers who have already tested the same products and services.

Obviously, the advantages of sophisticated NLP methods and software, and of their ability to distinguish factual from opinionated language, are not limited to the ones discussed so far; they are spread across different specialized tasks and domains, such as:

Ads placement: possibility to avoid websites that are irrelevant, but also unfavorable, when placing advertisements (Jin et al., 2007).

Question-answering: chance to distinguish between factual and speculative answers (Carbonell, 1979).

Text summarization: potential of summarizing different opinions and perspectives in addition to factual information (Seki et al., 2005).

Recommendation systems: opportunity to recommend only the items with positive feedback (Terveen et al., 1997).

Flame and cyberbullying detection: ability to find semantically and syntactically complex offensive contents on the web (Xiang et al., 2012; Chen et al., 2012; Reynolds et al., 2011).

Literary reputation tracking: possibility to distinguish disapproving from supporting citations (Piao et al., 2007; Taboada et al., 2006).

Political texts analysis: opportunity to better understand the positive or negative orientations of both voters and politicians (Laver et al., 2003; Mullen and Malouf, 2006).

1.3 Subjectivity, Emotions and Opinions

Sentiment Analysis, Opinion Mining, subjectivity analysis, review mining, appraisal extraction and affective computing are all terms that refer to the computational treatment of opinion, sentiment and subjectivity in raw text. Their ultimate shared goal is enabling computers to recognize and express emotions (Pang and Lee, 2008a; Picard and Picard, 1997).

The first two terms are definitely the most used ones in the literature. Although they basically refer to the same subject, they have met with varying degrees of success in different research communities: while Opinion Mining is more popular among web information retrieval researchers, Sentiment Analysis is more used in NLP environments (Pang and Lee, 2008a). Therefore, consistency with the NLP approach of our work is the only reason for preferring the term Sentiment Analysis; all the terms mentioned above could be used and interpreted nearly interchangeably.

Subjectivity is directly connected to the expression of private states, personal feelings or beliefs toward entities, events and their properties (Liu, 2010). The expression private state has been defined by Quirk et al. (1985) and Wiebe et al. (2004) as a general covering term for opinions, evaluations, emotions and speculations.

The main difference from objectivity regards the impossibility of directly observing or verifying subjective language, but neither subjective nor objective implies truth: “whether or not the source truly believes the information, and whether or not the information is in fact true, are considerations outside the purview of a theory of linguistic subjectivity” (Wiebe et al., 2004, p. 281).

Opinions are defined as positive or negative views, attitudes, emotions or appraisals about a topic, expressed by an opinion holder at a given time. They are represented by Liu (2010) as the following quintuple:

$(o_j, f_{jk}, oo_{ijkl}, h_i, t_l)$

where $o_j$ represents the object of the opinion, $f_{jk}$ its features, $oo_{ijkl}$ the positive or negative semantic orientation of the opinion, $h_i$ the opinion holder and $t_l$ the time at which the opinion is expressed.

Because the time can almost always be deduced from structured data, we focused our work on the automatic detection and annotation of the other elements of the quintuple.

As regards the opinion holder, it must be underlined that “In the case of product reviews and blogs, opinion holders are usually the authors of the posts. Opinion holders are more important in news articles because they often explicitly state the person or organization that holds a particular opinion” (Liu, 2010, p. 2).

The Semantic Orientation (SO) gives a measure to opinions by weighing their polarity (positive/negative/neutral) and strength (intense/weak) (Taboada et al., 2011; Liu, 2010). The polarity can be assigned to words and phrases outside or inside of a sentence or discourse context. In the first case (out of context) it is called prior polarity (Osgood, 1952); in the second case, contextual or posterior polarity (Gatti and Guerini, 2012).

We can refer to both opinion objects and features with the term target (Liu, 2010), represented by the following function:

$T = O(f)$

where the object can take the shape of products, services, individuals, organizations, events, topics, etc., and the features are components or attributes of the object.

Each object $O$, represented as a “special feature” which is defined by a subset of features, is formalized in the following way:

$F = \{f_1, f_2, \ldots, f_n\}$

Targets can be automatically discovered in texts through both synonym words and phrases $W_i$ or indicators $I_i$ (Liu, 2010):

$W_i = \{w_{i1}, w_{i2}, \ldots, w_{im}\}$

$I_i = \{i_{i1}, i_{i2}, \ldots, i_{iq}\}$
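To make the notation above more concrete, the quintuple and the target can be sketched as plain data structures. The following Python fragment is only an illustrative sketch: the class names, the field names, the helper `opinions_on` and the toy reviews are assumptions introduced here, not part of Liu's formalization or of the resources described in this thesis.

```python
from dataclasses import dataclass
from enum import Enum


class Polarity(Enum):
    """Polarity values of the semantic orientation."""
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"


@dataclass(frozen=True)
class Opinion:
    """One opinion quintuple (o_j, f_jk, oo_ijkl, h_i, t_l)."""
    obj: str               # o_j: the object of the opinion
    feature: str           # f_jk: the evaluated feature of the object
    orientation: Polarity  # oo_ijkl: the semantic orientation
    holder: str            # h_i: the opinion holder
    time: str              # t_l: when the opinion was expressed


@dataclass
class Target:
    """An object O together with its feature set F = {f_1, ..., f_n}."""
    obj: str
    features: set


def opinions_on(target, opinions):
    """Return the opinions whose object and feature belong to the target."""
    return [op for op in opinions
            if op.obj == target.obj and op.feature in target.features]


# Toy data, invented for illustration only.
hotel = Target(obj="hotel", features={"room", "breakfast"})
sample = [
    Opinion("hotel", "room", Polarity.POSITIVE, "reviewer_1", "2015-06-01"),
    Opinion("hotel", "parking", Polarity.NEGATIVE, "reviewer_2", "2015-06-02"),
]
matched = opinions_on(hotel, sample)
```

In this sketch only the first toy opinion is matched, because "parking" is not listed among the features of the hypothetical target.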

Emotions constitute the research object of Emotion Detection (Strapparava et al., 2004; Fragopanagos and Taylor, 2005; Alm et al., 2005; Strapparava et al., 2006; Neviarouskaya et al., 2007; Strapparava and Mihalcea, 2008; Whissell, 2009; Argamon et al., 2009; Neviarouskaya et al., 2009b, 2011; de Albornoz et al., 2012), a very active NLP subfield that encroaches on Speech Recognition (Buchanan et al., 2000; Lee et al., 2002; Pierre-Yves, 2003; Schuller et al., 2003; Bhatti et al.; Schuller et al., 2004) and Facial Expression Recognition (Essa et al., 1997; Niedenthal et al., 2000; Pantic and Patras, 2006; Pal et al., 2006; De Silva et al., 1997). In this NLP-based work we focus on Emotion Detection from written raw texts.

In order to provide a formal definition of sentiments, we refer to the functions of Gross (1995):

$P(sent, h)$

$Caus(s, sent, h)$

In the first one, the experiencer of the emotion ($h$) is a function of a sentiment ($sent$) through a predicative relationship; the latter expresses a sentiment that is caused by a stimulus ($s$) on a person ($h$).

1.4 Structure of the Thesis

In this Chapter we introduced the research field of Sentiment Analysis, placing it in the wider framework of Computational Linguistics and Natural Language Processing. We tried to clarify some terminological issues raised by the proliferation and the overlap of terms in the literature on subjectivity. The choice of these topics has been justified through a brief presentation of the research context, which poses numerous challenges to all Internet stakeholders.

The main aim of the thesis has been identified as the need to treat both facts and opinions expressed on the Web in the form of raw data with the same accuracy and velocity as the data stored in database tables.

Details about the contribution provided by this work to Sentiment Analysis studies are structured as follows.

Chapter 2 introduces the theoretical framework on which the whole work is grounded, the Lexicon-Grammar (LG) approach, with an in-depth examination of distributional and transformational analysis and of the semantic predicates theory (Section 2.1).

The assumption that elementary sentences are the basic discourse units endowed with meaning, and the need to replace a unique grammar with plenty of lexicon-dependent local grammars, are the founding ideas of the LG approach on which the research has been designed and realized.

The formalization of the sentiment lexical resources and tools has been performed through the tools described in Section 2.2.

Coherently with the LG method, our approach to Sentiment Analysis is lexicon-based. Chapter 3 provides a detailed description of the sentiment lexicon built ad hoc for the Italian language. It includes manually annotated resources for both simple (Section 3.3) and multiword (Section 3.4) units, and automatically derived annotations (Section 3.5) for the adverbs in -mente and for the nouns of quality.

Because the meaning of a word cannot be considered out of context, Chapter 3 explores the influences of some elementary sentence structures on the sentiment words occurring in them, while Chapter 4 investigates the effects on sentence polarity of the so-called Contextual Valence Shifters (CVS), namely Intensification and Downtoning (Section 4.2), Negation (Section 4.3), Modality (Section 4.4) and Comparison (Section 4.5).

Chapter 5 focuses on the three most challenging subtasks of Sentiment Analysis: the sentiment polarity classification of sentences and documents (Section 5.1), feature-based Sentiment Analysis (Section 5.2) and sentiment-based semantic role labeling (Section 5.3). Experiments on these tasks are conducted through our resources and rules, respectively, on a multi-domain corpus of customer reviews, on a dataset of hotel comments and on a dataset that mixes tweets and news headlines.

Concluding remarks and limitations of the present work are reported in Chapter 6.

Due to the large number of challenges faced in this work, and in order to avoid penalizing the clarity and the readability of the thesis, we preferred to mention the state-of-the-art works related to a given topic directly in the sections in which the topic is discussed. For easy reference, we anticipate that the literature survey on the existing sentiment lexicons is reported in Section 3.2; the methods tested in the literature for lexicon propagation are presented in Section 3.5.1; works related to intensification and negation modeling, to modality detection and to comparative sentence mining are mentioned, respectively, in Sections 4.2.1, 4.3.1, 4.4.1 and 4.5.1; finally, state-of-the-art approaches to sentiment classification, feature-based Sentiment Analysis and Sentiment Role Labeling (SRL) are cited in Sections 5.1.1, 5.2.1 and 5.3.1.

Some Sections of this thesis refer to works that have already been presented, in part, to the international computational linguistics community. These are the cases of Chapter 3, briefly introduced in Pelosi (2015a,b) and Pelosi et al. (2015), and of Chapter 4, anticipated in Pelosi et al. (2015). Furthermore, the results presented in Section 5.1 have been discussed in Maisto and Pelosi (2014b), the results of Section 5.2 in Maisto and Pelosi (2014a) and the ones of Section 5.3 in Elia et al. (2015).

Chapter 2

Theoretical Background

2.1 The Lexicon-Grammar Theoretical Framework

By Lexicon-Grammar we mean the method and the practice of formal description of natural language introduced in the second half of the 1960s by Maurice Gross who, while verifying some procedures of transformational-generative grammar (TGG) (Chomsky, 1965), laid the foundations for a brand new theoretical framework.

More specifically, during the description of French complement clause verbs through a transformational-generative approach, Gross (1975) realized that the English descriptive model of Rosenbaum (1967) was not sufficient to account for the huge number of irregularities he found in the French language.

LG introduced important changes in the way in which the relationship between lexicon and syntax was conceived (Gross, 1971, 1975). For the first time, it underlined the need to provide linguistic descriptions grounded on the systematic testing of syntactic and semantic rules across the whole lexicon, and not only on a limited set of speculative examples.

Actually, in order to define what can be considered a rule and what must be considered an exception, a small sample of the lexicon of the analyzed language, collected in arbitrary ways, can be truly misleading. It is assumed to be impossible to make generalizations without verifying or falsifying the hypotheses. According to Gross (1997, p. 1):

“Grammar, or, as it has now been called, linguistic theory, has always been driven by a quest for complete generalizations, resulting invariably in recent times in the production of abstract symbolism, often semantic, but also algorithmic. This development contrasts with that of the other Natural Sciences, such as Biology or Geology, where the main stream of activity was and is still now the search and accumulation of exhaustive data. Why the study of language turned out to be so different is a moot question. One could argue that the study of sentences provides an endless variety of forms and that the observer himself can increase this variety at will within his own production of new forms; that would seem to confirm that an exhaustive approach makes no sense. [...]

But grammarians operating at the level of sentences seem to be interested only in elaborating general rules and do so without performing any sort of systematic observation and without a methodical accumulation of sentence forms to be used by further generations of scientists.”

In the LG methodology, the collection and the analysis of a large quantity of linguistic facts, and their continuous comparison with the reality of linguistic usage through examples and counter-examples, are crucial.

The collected linguistic information is constantly registered into LG tables that cross-check the lexicon and the syntax of any given language. The ultimate purpose of this procedure is the definition of wide-ranging classes of lexical items associated with transformational, distributional and structural rules (D'Agostino, 1992).

What emerges from the LG studies is that, when more than five or six properties are associated with a lexical entry, each entry shows an individual behavior that distinguishes it from any other lexical item. However, it is always possible to organize a classification around at least one definitional property that is simultaneously accepted by all the items belonging to the same LG class and, for this reason, is promoted to distinctive feature of the class.

Having established this, it becomes easier to understand the need to replace “the” grammar with thousands of lexicon-dependent local grammars.

The choice of this paradigm is due to its compatibility with the purposes of the present work and with computational linguistics in general, which, in order to reach high performance, requires a large amount of linguistic data that must be as exhaustive, reproducible and well organized as possible.

The huge linguistic datasets produced over the years by the international LG community provide fine-grained semantic and syntactic descriptions of thousands of lexical entries, also referred to as “lexically exhaustive grammars” (D'Agostino, 2005; D'Agostino et al., 2007; Guglielmo, 2009), available for reuse in any kind of NLP task.¹

The LG classification and description of the Italian verbs² (Elia et al., 1981; Elia, 1984; D'Agostino, 1992) is grounded on the differentiation of three macro-classes: transitive verbs, intransitive verbs and complement clause verbs. Every LG class has its own definitional structure, which corresponds to the syntactic structure of the nuclear sentence selected by a given number of verbs (e.g. V for piovere “to rain” and all the verbs of class 1; N0 V for bruciare “to burn” and the other verbs of class 3; N0 V da N1 for provenire “to come from” and the verbs belonging to class 6; etc.). Within each class, all the lexical entries are then differentiated from one another by taking into account all the transformational, distributional and structural properties accepted or rejected by every item.

¹ LG descriptions are available for 4,000+ nouns that enter into support verb constructions, 7,000+ verbs, 3,000+ multiword adverbs and almost 1,000 phrasal verbs (Elia, 2014a).

² LG tables are freely available at the address http://dsc.unisa.it/composti/tavole/combo/tavole.asp
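The organization described above (classes identified by a definitional structure, entries distinguished by accepted or rejected properties) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the three classes and example verbs are the ones quoted in the text, while the function names, the placeholder entries `entry_a` and `entry_b` and the properties `P1` to `P3` are hypothetical and do not reproduce any actual LG table.

```python
# Definitional structures of three LG verb classes, as quoted in the text.
DEFINITIONAL_STRUCTURE = {
    1: "V",
    3: "N0 V",
    6: "N0 V da N1",
}

# Example verbs named in the text, mapped to their LG class.
VERB_CLASS = {
    "piovere": 1,
    "bruciare": 3,
    "provenire": 6,
}


def nuclear_sentence(verb):
    """Return the definitional structure of the class the verb belongs to."""
    return DEFINITIONAL_STRUCTURE[VERB_CLASS[verb]]


def shared_properties(table):
    """Return the properties accepted by every entry of a binary table.

    `table` maps each lexical entry to a dict {property: accepted?}.
    Properties accepted by all entries are candidates for the role of
    definitional property of the class.
    """
    rows = list(table.values())
    if not rows:
        return set()
    common = {prop for prop, accepted in rows[0].items() if accepted}
    for row in rows[1:]:
        common &= {prop for prop, accepted in row.items() if accepted}
    return common


# Hypothetical placeholder table: entries and properties are invented.
TOY_TABLE = {
    "entry_a": {"P1": True, "P2": True, "P3": False},
    "entry_b": {"P1": True, "P2": False, "P3": True},
}
```

In the placeholder table only `P1` is accepted by both entries, while `P2` and `P3` are the properties that differentiate the two items inside the class.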

2.1.1 Distributional and Transformational Analysis

The Lexicon-Grammar theory lays its foundations on the Operator-argument grammar of Zellig S. Harris, the combinatorial system that supports the generation of utterances in natural language.

Saying that the operators store information regarding sentence structures means assuming the nuclear sentence to be the minimum discourse unit endowed with meaning (Gross, 1992b).

This premise is shared with the LG theory, together with the centrality of distributional analysis, a method from structural linguistics that was first formulated by Bloomfield (1933) and then perfected by Harris (1970). The insight that some categories of words can somehow control the functioning of a number of actants through a dependency relationship called valency comes instead from Tesnière (1959).

The distribution of an element A is defined by Harris (1970) as the sum of the contexts of A, where the context of A is the actual disposition of its co-occurring elements. Distributional analysis consists in the systematic substitution of lexical items with the aim of verifying the semantic or transformational reactions of the sentences. All of this is governed by verisimilitude rules that involve a gradation in the acceptability of the item combinations.

Although the native speakers of a language generally think that sentence elements can be combined arbitrarily, they actually choose a set of items among the classes that regularly appear together. The selection depends on the likelihood that an element co-occurs with elements of one class rather than another.³

When two sequences can appear in the same contexts, they have an equivalent distribution. When they do not share any context, they have a complementary distribution.

Obviously, the possibility of combining sentence elements is related to many different levels of acceptability. Basically, the utterances produced by the co-occurrence of words can vary in terms of plausibility. In the operator-argument grammar of Harris, the verisimilitude of occurrence of a word under an operator (or an argument) is an approximation of the likelihood or of the frequency of this word with respect to a fixed number of occurrences of the operator (or argument) (Harris, 1988).

Concerning transformations (Harris, 1964), we refer to the phenomenon in which the sentences A and B, despite a different combination of words, are equivalent from a semantic point of view (e.g. Floria ha letto quel romanzo = quel romanzo è stato letto da Floria, “Floria read that novel = that novel has been read by Floria”; D'Agostino, 1992). Therefore, a transformation is defined as a function T that ties a set of variables representing sentences that are in paraphrastic equivalence (A and B are partial or full synonyms) and that respect morphemic invariance (A and B possess the same lexical morphemes).

Transformational relations must not be confused with any derivation process of the sentence B from A or vice versa. A and B are just two correlated variants in which equivalent grammatical relations are realized (D'Agostino, 1992).⁴ That is why in this work we do not use directed arrows to indicate transformed sentences.

³ Among the traits that mark the co-occurrence classes we mention, by way of example, human and non-human nouns, abstract nouns, verbs with concrete or abstract usages, etc. (Elia et al., 1981).

⁴ The concept of transformation is different in generative grammar, which envisages an oriented passage from a deep structure to a surface one: “The central idea of transformational grammar is that they are, in general, distinct and that the surface structure is determined by repeated application of certain formal operations called ‘grammatical transformations’ to objects of a more elementary sort” (Chomsky, 1965, p. 16).

Among the transformational manipulations considered in this thesis while applying the Lexicon-Grammar theoretical framework to the Sentiment Analysis field, we mention the following (Elia et al., 1981; Vietri, 2004):

Substitution: The substitution of polar elements in sentences is the starting point of this work, which explores the changes in terms of acceptability and semantic orientation in the transformed sentences. It can be found in almost every chapter of this thesis.

Paraphrases with support verbs,⁵ nominalizations and adjectivalizations: This manipulation is essential, for example, in Sections 3.3.4 and 3.3.5, where we discuss the nominalization (e.g. amare = amore, “to love = love”) and the adjectivalization (e.g. amare = amoroso, “to love = loving”) of the Italian psychological verbs, or in Section 3.5.2, where we account for the relation between a set of sentiment adjectives and adverbs (e.g. appassionatamente = in modo appassionato, “passionately = in a passionate way”).

Repositioning: Emphasis, permutation, dislocation and extraction are all transformations that, by shifting the focus onto a specific element of the sentence, influence its strength when the element is polarized.

Extension: We are above all interested in the expansion of nominal groups with opinionated modifiers (see Section 3.3.2).

Restructuring: The complement restructuring plays a crucial role in Feature-based Sentiment Analysis, for the correct annotation of the features and the objects of the opinions (see Sections 3.3.4 and 5.2).

Passivization: In our system of rules, passive sentences simply possess the same semantic orientation as the corresponding active sentences.

Systematic correlation: Examples are the standard/crossed structures mentioned in Section 3.3.4.

⁵ The concept of operator does not depend on a specific part of speech, so nouns, adjectives and prepositions can also possess the power to determine the nature and the number of the sentence arguments (D'Agostino, 1992). Because only verbs carry morpho-grammatical information regarding mood, tense, person and aspect, they must give this kind of support to non-verbal operators. The so-called Support Verbs (Vsup) are different from auxiliaries (Aux), which instead support other verbs. Support verbs (e.g. essere “to be”, avere “to have”, fare “to do”) can, case by case, be substituted by stylistic, aspectual and causative equivalents.
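The notions of distribution, equivalent distribution and complementary distribution introduced at the beginning of this Section can be illustrated with a small Python sketch. It is a deliberately simplified approximation, offered under explicit assumptions: a context is modeled as the whole sentence with the slot of the item marked by "_", the function names and the label "overlapping" for the intermediate case are introduced here, and the toy corpus (built around the example sentence quoted above) is invented.

```python
def distribution(item, corpus):
    """Return the set of contexts in which `item` occurs.

    Each context is the tokenized sentence with the position of the
    item replaced by the placeholder "_".
    """
    contexts = set()
    for sentence in corpus:
        tokens = sentence.split()
        for i, token in enumerate(tokens):
            if token == item:
                contexts.add(tuple(tokens[:i]) + ("_",) + tuple(tokens[i + 1:]))
    return contexts


def relation(a, b, corpus):
    """Compare the distributions of two items over the same corpus."""
    dist_a = distribution(a, corpus)
    dist_b = distribution(b, corpus)
    if dist_a and dist_a == dist_b:
        return "equivalent"      # the two items appear in the same contexts
    if not (dist_a & dist_b):
        return "complementary"   # the two items share no context
    return "overlapping"         # some, but not all, contexts are shared


# Toy corpus, invented for illustration only.
CORPUS = [
    "Floria ha letto quel romanzo",
    "Floria ha letto quel libro",
    "Floria ha scritto quel romanzo",
    "Floria ha scritto quel libro",
]
```

On this toy corpus `relation("romanzo", "libro", CORPUS)` yields "equivalent", while `relation("Floria", "romanzo", CORPUS)` yields "complementary". A real distributional analysis would also grade the acceptability of each combination, which this sketch does not attempt.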

2.1.2 Formal Representation of Elementary Sentences

While generative grammar focused its studies on complex sentences, both the Operator grammar and the LG approach assumed the so-called nuclear or elementary sentences, respectively, to be the minimum discourse units endowed with meaning, above all because complex sentences are made from elementary ones, as stated by Chomsky (1957, p. 92) himself:

“In particular, in order to understand a sentence it is necessary to know the kernel sentences from which it originates (more precisely, the terminal strings underlying these kernel sentences) and the phrase structure of each of these elementary components, as well as the transformational history of development of the given sentence from these kernel sentences. The general problem of analyzing the process of ‘understanding’ is thus reduced, in a sense, to the problem of explaining how kernel sentences are understood, these being considered the basic ‘content elements’ from which the usual, more complex sentences of real life are formed by transformational development.”

This assumption shifts the definition of creativity from ldquorecursivityrdquoin complex sentences (Chomsky 1957) to ldquocombinatory possibilityrdquoat the level of elementary sentences (Vietri 2004) but affects also theway in which the lexicon is conceived (Gross 1981 p 48) made thefollowing observation in this regard6

6 On this topic also Chomsky agrees that

20 CHAPTER 2 THEORETICAL BACKGROUND

ldquoLes entreacutees du lexique ne sont pas des mots mais des phrasessimples Ce principe nrsquoest en contradiction avec les notionstraditionnelles de lexique que de faccedilon apparente En ef-fet dans un dictionnaire il nrsquoest pas possible de donner lesens drsquoun mot sans utiliser une phrase ni de contraster desemplois diffeacuterents drsquoun mecircme mot sans le placer dans desphrasesrdquo7

As it will be discovered in the following Sections the idea that theentries of the lexicon can be sentences rather that isolated words is aprinciple that has been entirely shared among the sentiment lexicalresources built for the present workWith regard to formalisms Gross (1992b) represented all the elemen-tary sentences through the following generic shape

N0 V W

where N0 stands for the formal subject of the sentence, V stands for the verb, and W can indicate all kinds of essential complements, including an empty one. While N0 V represents a great generality, W raises more complex classification problems (Gross 1992b). Complements indicated by W can be direct (Ni), prepositional (Prep Ni) or sentential (Ch F).
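The generic shape N0 V W can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the class names, the field names and the example sentences with dorme and pensa a are hypothetical and do not come from the LG tables; only the notation (N0, V, Ni, Prep Ni, Ch F) follows the text above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Complement:
    """One essential complement inside W (hypothetical helper class)."""
    kind: str                          # "direct", "prepositional" or "sentential"
    head: str                          # surface text of the complement
    preposition: Optional[str] = None  # only used for prepositional complements

    def notation(self, index: int) -> str:
        # Map the complement type onto the LG symbols quoted in the text.
        if self.kind == "direct":
            return f"N{index}"
        if self.kind == "prepositional":
            return f"Prep N{index}"
        if self.kind == "sentential":
            return "Ch F"
        raise ValueError(f"unknown complement kind: {self.kind}")


@dataclass(frozen=True)
class ElementarySentence:
    """An elementary sentence N0 V W, where W may be empty."""
    subject: str
    verb: str
    complements: Tuple[Complement, ...] = ()

    def formula(self) -> str:
        parts = ["N0", "V"]
        for i, complement in enumerate(self.complements, start=1):
            parts.append(complement.notation(i))
        return " ".join(parts)


hate = ElementarySentence("Maria", "odia", (Complement("direct", "il fumo"),))
sleep = ElementarySentence("Maria", "dorme")  # W is empty
think = ElementarySentence(
    "Maria", "pensa", (Complement("prepositional", "Gianni", "a"),)
)

print(hate.formula())   # N0 V N1
print(sleep.formula())  # N0 V
print(think.formula())  # N0 V Prep N1
```

The sketch only derives the structural formula from the number and type of complements; it does not model the distributional or transformational properties that the LG tables attach to each verb.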

“In describing the meaning of a word it is often expedient, or necessary, to refer to the syntactic framework in which this word is usually embedded; e.g., in describing the meaning of ‘hit’ we would no doubt describe the agent and object of the action in terms of the notions ‘subject’ and ‘object’, which are apparently best analyzed as purely formal notions belonging to the theory of grammar.” (Chomsky 1957, p. 104)

But, without any exhaustive application of syntactic rules to lexical items, he feels compelled to point out that:

“(...) to generalize from this fairly systematic use and to assign ‘structural meanings’ to grammatical categories or constructions, just as ‘lexical meanings’ are assigned to words or morphemes, is a step of very questionable validity.” (Chomsky 1957, p. 104)

7 “The entries of the lexicon are not words, but simple sentences. This principle contradicts the traditional notions of lexicon only in appearance. Indeed, in a dictionary it is impossible to provide the meaning of a word without using a sentence, or to contrast different uses of the same word without placing it into sentences.” Author’s translation.


Information about the nature and the number of complements is contained in the verbs (or in other parts of speech that, case by case, play a predicative function).
Gross (1992b) presented the following typology of complements, respectively approached for the Italian language by the LG researchers mentioned below:

frozen arguments (C): Vietri (1990, 2011, 2014d)

free arguments (N): D’Agostino (1992), Bocchino (2007)

sentential arguments (Ch F): Elia (1984)

2.1.2.1 The Continuum from Simple Sentences to Polyrematic Structures

Gross (1988, 1992b, 1993) recognized and formalized the concept of phrase figée, “frozen sentence”, which refers to groups of words that can be tied to one another by different degrees of variability and cohesion (Elia and De Bueriis 2008). They can take the shape of verbal, nominal, adjectival or adverbial structures.
The continuum along which frozen expressions can be classified varies according to higher or lower levels of compositionality and idiomaticity, and goes from completely free structures to totally frozen expressions, as summarized below (D’Agostino and Elia 1998; Elia and De Bueriis 2008):

distributionally free structures: high levels of co-occurrence variability among words

distributionally restricted structures: limited levels of co-occurrence variability

frozen structures: (almost) null levels of co-occurrence variability

proverbs: no variability of co-occurrence


Idiomatic interpretations can be attributed to the last two elements of the continuum, due to their low levels of compositionality.
The fact that the meaning of these expressions cannot be computed by simply summing up the meanings of the words that compose them has a direct impact on every kind of Sentiment Analysis application that does not correctly approach the linguistic phenomenon of idiomaticity.
We will show the role of multiword expressions in Section 3.4.
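The non-compositionality problem can be made concrete with a toy Python sketch. Everything in it is an assumption made for illustration: the idiom rompere il ghiaccio “to break the ice”, the polarity values and the function names are hypothetical and are not taken from the resources described in this thesis.

```python
# Hypothetical word-level polarities (toy values, not real lexicon scores).
WORD_POLARITY = {"rompere": -1}

# Hypothetical multiword entry: the idiom carries its own polarity.
MULTIWORD_POLARITY = {("rompere", "il", "ghiaccio"): +1}


def naive_score(tokens):
    """Sum the polarities of the single words, ignoring idiomaticity."""
    return sum(WORD_POLARITY.get(token, 0) for token in tokens)


def multiword_aware_score(tokens):
    """Match multiword expressions first, then fall back to single words."""
    score = 0
    i = 0
    while i < len(tokens):
        for expression, polarity in MULTIWORD_POLARITY.items():
            if tuple(tokens[i:i + len(expression)]) == expression:
                score += polarity
                i += len(expression)  # skip the whole frozen expression
                break
        else:
            # No multiword expression starts here: score the single word.
            score += WORD_POLARITY.get(tokens[i], 0)
            i += 1
    return score


idiom = ["rompere", "il", "ghiaccio"]
free = ["rompere", "il", "vaso"]

print(naive_score(idiom), multiword_aware_score(idiom))  # -1 1
print(naive_score(free), multiword_aware_score(free))    # -1 -1
```

With these toy values the word-by-word sum assigns the idiom the polarity of its verb, while the lookup that treats the frozen expression as a single unit does not; the free structure receives the same score in both cases.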

2.1.2.2 Against the Bipartition of the Sentence Structure

In the operator grammar, differently from the generative grammar, sentences are not supposed to have a bipartite structure

S → NP + VP

VP → V + NP 8

but are represented as predicative functions in which a central element, the operator, influences the organization of its variables, the arguments. The role of operator need not be played by verbs only: it can also be played by other lexical items that possess a predicative function, such as nouns or adjectives.
This idea, also shared by the LG framework, is particularly important in the Sentiment Analysis field, especially if the task is the detection of the semantic roles involved in sentiment expressions (Section 5.3).
Sentences like (1) and (2) are examples in which the semantic roles of experiencer of the sentiment and stimulus of the sentiment are respectively played by “Maria” and il fumo “the smoke”. It is intuitive that, from a syntactic point of view, these semantic roles are differently distributed in the direct (the experiencer is the subject, as in 1) and the reverse (the experiencer is the object, as in 2) sentence structures (see Section 3.3.3 for further explanations).

8 Sentence = Noun Phrase + Verb Phrase; Verb Phrase = Verb + Noun Phrase.


(1) [Sentiment [Experiencer Maria] odia [Stimulus il fumo]]9
“Maria hates the smoke”

(2) [Sentiment [Stimulus Il fumo] tormenta [Experiencer Maria]]
“The smoke harasses Maria”

The indissoluble bond between the sentiment and the experiencer10, which does not differ from (1) to (2), is perfectly expressed by the Operator-argument grammar function Onn for both sentences, by the LG representation N0 V N1, or by the function of Gross (1995) Caus(s, sent, h) (described in Section 1.3 and deepened in Section 3):

(1) Caus(il fumo, odiare, Maria)
(2) Caus(il fumo, tormentare, Maria)

The same cannot be said of the bipartite sentence structure which, by representing the experiencer “Maria” inside (2) and outside (1) the verb phrase, unreasonably brings it closer to or further from the sentiment to which it should be attached in exactly the same way (Figure 2.1).

Moreover, it does not explain how the verbs of (1) and (2) can exercise distributional constraints on the noun phrase indicating the experiencer (which must be human or at least animate), both in the case in which it is under the dependence of the same verb phrase (experiencer object, see 2a) and also when it is not (experiencer subject, see 1a).

(1a) [Sentiment [Experiencer (Maria + Il mio gatto + *Il televisore)] odia [Stimulus il fumo]]
“(Maria + My cat + *The television) hates the smoke”

9 For the sentence annotation we used the notation of Jackendoff (1990): [Event ].
10 Un sentiment est toujours attaché à la personne qui l’éprouve, “A sentiment is always attached to the person that feels it” (Gross 1995, p. 1).


Figure 2.1: Syntactic Trees of the sentences (1) and (2)

(2a) [Sentiment [Stimulus Il fumo] tormenta [Experiencer (Maria + Il mio gatto + *Il televisore)]]
“The smoke harasses (Maria + My cat + *The television)”

According to Gross (1979, p. 872), the problem is that:

“rewriting rules (e.g. S → NP VP, VP → V NP) describe only LOCAL dependencies [...] But there are numerous syntactic situations that involve non-local constraints.”


While the Lexicon-Grammar and the operator grammar representations do not change, the representations of the paraphrases of (1, 2) with support verbs (1b, 2b) further complicate the generative grammar solution, which still does not correctly represent the relationships among the operators and the arguments (see Figure 2.2). It fails to determine the correct number of arguments (which are still two, and not three) and does not recognize the predicative function, which is played by the nouns odio “hate” and tormento “harassment”, and not by the verbs provare “to feel” and dare “to give”.

(1b) [Sentiment [Experiencer Maria] prova odio per [Stimulus il fumo]]
“Maria feels hate for the smoke”

(2b) [Sentiment [Stimulus Il fumo] dà il tormento a [Experiencer Maria]]
“The smoke gives harassment to Maria”

For all these reasons, for the computational treatment of subjectivity, emotions and opinions in raw text, we preferred in this work the Lexicon-Grammar framework to other linguistic approaches, despite their popularity in the NLP literature.

2.1.3 Lexicon-Grammar Binary Matrices

We said that the difference between the LG language description and any other linguistic approach regards the systematic formalization of a very broad quantity of data. This means that, given a set of properties P that define a class of lexical items L, the LG method collects and formalizes every single item that enters such a class (extensional classification), while other approaches limit their analyses to reduced sets of examples (intensional classification) (Elia et al. 1981).
Because the presentation of the information must be as transparent, rigorous and formalized as possible, LG represents its classes through binary matrices like the one exemplified in Table 2.1.

Figure 2.2: Syntactic Trees of the sentences (1b) and (2b)

They are called “binary” because of the mathematical symbols “+” and “–”, respectively used to indicate the acceptance or the refusal of the properties P by each lexical entry L.
The symbol X unequivocally denotes an LG class and is always associated with a class definitional property, simultaneously accepted by all the items of the class (in the example of Table 2.1, P1). Subclasses of X can be easily built by choosing another P as definitional property (e.g. L2, L3 and L4 under the definitional substructure P2).
The sequences of “+” and “–” attributed to each entry constitute the “lexico-syntactic profiles” of the lexical items (Elia 2014a).

X     P1   P2   P3   P4   Pn
L1    +    –    –    –    +
L2    +    +    –    –    –
L3    +    +    +    +    +
L4    +    +    +    +    +
Ln    +    –    –    –    +

Table 2.1: Example of a Lexicon-Grammar Binary Matrix

The properties on which the classification is arranged can regard the following aspects (Elia et al. 1981):

distributional properties: by which all the co-occurrences of the lexical items L with distributional classes and subclasses are tested inside acceptable elementary sentence structures;

structural and transformational properties: through which all the combinatorial transformational possibilities inside elementary sentence structures are explored (see Section 2.1.1 for more in-depth explanations).

It is possible for a lexical entry to appear in more than one LG matrix. In this case we are not referring to a single verb that accepts divergent syntactic and semantic properties, but instead to two different verb usages attributable to the same morpho-phonological entry (Elia et al. 1981; Vietri 2004; D’Agostino 1992). Verb usages possess the same syntactic and semantic autonomy as two verbs that do not present any morpho-phonological relation.
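The binary matrix of Table 2.1 lends itself to a direct encoding. The following Python sketch is a minimal illustration, assuming a plain dictionary representation; the function names are hypothetical, and the entries L1 to Ln and properties P1 to Pn are the abstract placeholders of the table, not real lexical items. The ASCII character "-" stands for the “–” symbol of the table.

```python
# Column order of the matrix, as in Table 2.1.
PROPERTIES = ["P1", "P2", "P3", "P4", "Pn"]

# One row per lexical entry: "+" = property accepted, "-" = property refused.
MATRIX = {
    "L1": "+---+",
    "L2": "++---",
    "L3": "+++++",
    "L4": "+++++",
    "Ln": "+---+",
}


def accepts(entry, prop):
    """True if the lexical entry accepts the given property."""
    return MATRIX[entry][PROPERTIES.index(prop)] == "+"


def subclass(definitional_property):
    """Entries that accept the property chosen as definitional."""
    return [entry for entry in MATRIX if accepts(entry, definitional_property)]


def profile(entry):
    """Lexico-syntactic profile of an entry as a property -> bool mapping."""
    return {prop: accepts(entry, prop) for prop in PROPERTIES}


print(subclass("P1"))  # ['L1', 'L2', 'L3', 'L4', 'Ln']
print(subclass("P2"))  # ['L2', 'L3', 'L4']
print(profile("L1"))
```

Choosing P1 returns the whole class X, since P1 is accepted by every entry, while choosing P2 isolates the subclass formed by L2, L3 and L4, as described above.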

2.1.4 Syntax-Semantics Interface

In the sixties, the heated debate about the syntax-semantics interface opposed theories supporting the autonomy of syntax from semantics to approaches sustaining the close link between these two linguistic dimensions.
Chomsky (1957) postulated the independence of syntax from semantics, due to the vagueness of semantics and because of the correspondence mismatches between these two linguistic dimensions, but recognized the existence of a relation between the two that deserves to be explored:

“There is, however, little evidence that ‘intuition about meaning’ is at all useful in the actual investigation of linguistic form. [...]
It seems clear, then, that undeniable, though only imperfect, correspondences hold between formal and semantic features in language. The fact that the correspondences are so inexact suggests that meaning will be relatively useless as a basis for grammatical description. [...]
The fact that correspondences between formal and semantic features exist, however, cannot be ignored. [...] Having determined the syntactic structure of the language, we can study the way in which this syntactic structure is put to use in the actual functioning of language. An investigation of the semantic function of level structure (...) might be a reasonable step towards a theory of the interconnections between syntax and semantics.” (Chomsky 1957, pp. 94-102)

The contributions that come from the generative grammar, with particular reference to the theta-theory (Chomsky 1993), assume the possibility of a mapping between syntax and semantics:

“Every content-bearing major phrasal constituent of a sentence (S, NP, AP, PP, etc.) corresponds to a conceptual constituent of some major conceptual category.”11 (Jackendoff 1990, p. 44)

11 Object, Event, State, Action, Place, Path, Property, Amount.


But their autonomy is not negated through this assumption: thematic roles are still considered to be part of the level of conceptual structure and not part of syntax (Jackendoff 1990). This is confirmed again by Chomsky (1993, p. 17) when stating that “(...) the mapping of S-structures onto PF and LF are independent from one another”. Here the S-structure, generated by the rules of the syntax, is associated by means of different interpretative rules with representations in phonetic form (PF) and in logical form (LF).
Many contributions from the linguistic community define the concept of thematic roles through the linking problem between syntax (syntactic realization of predicate argument structures that determine the roles) and semantics (semantic representation of such structures) (Dowty 1989; Grimshaw 1990; Rappaport Hovav and Levin 1998). Among the canonical thematic roles we mention the agent (Gruber 1965), the patient (Baker 1988), the experiencer (Grimshaw 1990), the theme (Jackendoff 1972), etc.

2.1.4.1 Frame Semantics

“Some words exist in order to provide access to knowledge of such frames to the participants in the communication process, and simultaneously serve to perform a categorization which takes such framing for granted.” (Fillmore 2006, p. 381)

With these words Fillmore (2006) depicted Frame Semantics, which describes sentences on the basis of predicators able to bring to mind the semantic frames (inference structures linked through linguistic convention to the meaning of the lexical items) and the frame elements (participants and props in the frame) involved in these frames (Fillmore 1976; Fillmore and Baker 2001; Fillmore 2006).
A frame semantic description starts from the identification of the lexical items that carry a given meaning, and then explores the ways in which the frame elements and their constellations are realized around the structures that have such items as head (Fillmore et al. 2002).
This theoretical framework is based on the Case grammar, a study on the combination of the deep cases chosen by verbs, and on the definition of case frames, which have the function of providing “a bridge between descriptions of situations and underlying syntactic representations” by attributing “semantico-syntactic roles to particular participants in the (real or imagined) situation represented by the sentence” (Fillmore 1977, p. 61). A direct reference for Fillmore’s work is the valency theory of Tesnière (1959).
Based on these principles, the FrameNet research project produced a lexicon of English for both human use and NLP applications (Baker et al. 1998; Fillmore et al. 2002; Ruppenhofer et al. 2006). Its purpose is to provide a large amount of semantically and syntactically annotated sentences, endowed with information about the valences (combinatorial possibilities) of the items, derived from an annotated corpus of contemporary English. Among the semantic domains covered there are also emotion and cognition (Baker et al. 1998).
For the Italian language, Lenci et al. (2012) developed LexIt, a tool that, following the FrameNet approach, automatically explores syntactic and semantic properties of Italian predicates in terms of distributional profiles. It performs frame semantic analyses using both the La Repubblica corpus and the Wikipedia taxonomy.

2.1.4.2 Semantic Predicates Theory

The Lexicon-Grammar framework offers the opportunity to create matches between sets or subsets of lexico-syntactic structures and their semantic interpretations. The basis of such matches is the connection between the arguments selected by a given predicative item listed in our tables and the actants involved by the same semantic predicate. According to Gross (1981, p. 9):


“Soit Sy l’ensemble des formes syntaxiques d’une langue [...] Soit Se un ensemble d’éléments de sens, les éléments de Sy seront associés à ceux de Se par des règles d’interprétation [...] nous appellerons actants syntaxiques les sujet et complément(s) du verbe tels qu’ils sont décrits dans Sy; nous appelons arguments les variables des prédicats sémantiques. Dans certains exemples, il y a correspondance biunivoque entre actants et arguments, entre phrase simple et prédicat.”12

This is the basic assumption on which the Semantic Predicates theory has been built within the LG framework. It postulates a complex parallelism between Sy and Se that, differently from other approaches, cannot ignore the idiosyncrasies of the lexicon, since “il n’existe pas deux verbes ayant le même ensemble de propriétés syntaxiques” (“no two verbs have the same set of syntactic properties”; Gross 1981, p. 10). Actually, the large number of structures in which the words can enter underlines the importance of a sentiment lexical database that includes both semantic and (lexicon-dependent) syntactic classification parameters.

Therefore, the possibility of creating intuitive semantic macro-classifications within the LG framework does not contrast with the autonomy of semantics from syntax (Elia 2014b).
In this thesis we will refer to the following three different predicates, all of them connected to the expression of subjectivity:

Sentiment (e.g. odiare, “to hate”), which involves the actants experiencer and stimulus;

Opinion (e.g. difendere, “to stand up for”), which implicates the actants opinion holder and target of the opinion;

12 “Let Sy be the set of syntactic forms of a language [...] Let Se be a set of semantic elements; the elements of Sy will be associated with those of Se through interpretation rules [...] we will call syntactic actants the subject and the complement(s) of the verb, in the way in which they are described in Sy; we will call arguments the variables of semantic predicates. In some examples there is a one-to-one correspondence between actants and arguments, between simple sentence and predicate.” Author’s translation.


Figure 2.3: LG syntax-semantic interface

Physical act (e.g. baciare, “to kiss”), which evokes the actants patient and agent.

As an example, the predicate in the sentence (1), from the LG class 43, will be associated with a Predicate with two variables, described by the mathematical function Caus(s, sent, h).

(1) [Sentiment [Experiencer Maria] odia [Stimulus il fumo]]
“Maria hates the smoke”

The rules of interpretation that have been followed are:

1. the experiencer (h in the Se) corresponds to the formal subject (N0 in Sy);

2. the sentiment stimulus (s in the Se) is the human complement (N1 in Sy).

As shown in the few examples below, the syntactic transformations in which the same predicate is involved preserve the roles played by its arguments:

Nominalization

(1b) [Sentiment [Experiencer Maria] prova odio per [Stimulus il fumo]]
“Maria feels hate for the smoke”

Adjectivalization

(1c) [Sentiment [Stimulus il fumo] è odioso per [Experiencer Maria]]
“The smoke is hateful for Maria”

Passivization

(1d) [Sentiment [Stimulus il fumo] è odiato da [Experiencer Maria]]
“The smoke is hated by Maria”

Repositioning

(1e) [Sentiment è [Stimulus il fumo] che [Experiencer Maria] odia]
“It is the smoke that Maria hates”
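The invariance of the roles under transformation can be checked mechanically on the annotated examples. The following Python sketch is only an illustration: it reads the bracketed annotation used in this chapter with a regular expression and builds a tuple in the order of Caus(s, sent, h). The function names are hypothetical, and the sentiment lemma odiare is supplied by hand rather than recognized automatically.

```python
import re

# Matches the innermost role brackets, e.g. "[Experiencer Maria]".
ROLE_PATTERN = re.compile(r"\[(Experiencer|Stimulus) ([^\[\]]+)\]")


def roles(annotated):
    """Extract the role -> filler mapping from an annotated sentence."""
    return {role: filler for role, filler in ROLE_PATTERN.findall(annotated)}


def caus(annotated, sentiment):
    """Build the triple (s, sent, h) of Caus(s, sent, h)."""
    found = roles(annotated)
    return (found["Stimulus"], sentiment, found["Experiencer"])


VARIANTS = {
    "(1)": "[Sentiment [Experiencer Maria] odia [Stimulus il fumo]]",
    "(1b)": "[Sentiment [Experiencer Maria] prova odio per [Stimulus il fumo]]",
    "(1c)": "[Sentiment [Stimulus il fumo] è odioso per [Experiencer Maria]]",
    "(1d)": "[Sentiment [Stimulus il fumo] è odiato da [Experiencer Maria]]",
    "(1e)": "[Sentiment è [Stimulus il fumo] che [Experiencer Maria] odia]",
}

results = {label: caus(text, "odiare") for label, text in VARIANTS.items()}

for label, triple in results.items():
    print(label, triple)

# All five surface structures yield the same predicate-argument triple.
print(len(set(results.values())) == 1)
```

The sketch works on sentences that are already annotated; assigning the roles to raw text is the task of the local grammars discussed in the following chapters.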

As concerns the link between actants and arguments, the background of the research is the LEG-Semantic Role Labeling system of Elia (2014b), with particular reference to its psychological, evaluative and bodily predicates. The granularity of the verb properties in this SRL system is evidence of the strong dependence of syntax on the lexicon and proof of its absolute independence from semantics.
Special kinds of Semantic Predicates have already been used in NLP applications in a Lexicon-Grammar context. We mention Vietri (2014a), Elia et al. (2010) and Elia and Vietri (2010), who formalized and tested the Transfer Predicates on the Italian Civil Code; Elia et al. (2013), who focused on the Spatial Predicates; and Maisto and Pelosi (2014b), Pelosi (2015b) and Elia et al. (2015), who exploited the Psychological Semantic Predicates for Sentiment Analysis purposes.


2.2 The Italian Module of Nooj

Nooj is the NLP tool used in this work for both the language formalization and the corpora pre-processing and processing, at the orthographical, lexical, morphological, syntactic and semantic levels (Silberztein 2003).
Among the Nooj modules that have been developed for more than twenty languages by the international Nooj community, the Italian Lingware has been built by the Maurice Gross Laboratory of the University of Salerno.
The Italian module for Nooj can be integrated at any time with other ad hoc resources, in the form of electronic dictionaries and local grammars. The freely downloadable Italian resources13 include:

• electronic dictionaries of simple and compound words

• an electronic dictionary of proper names

• an electronic dictionary of toponyms

• a set of morphological grammars

• samples of syntactic grammars

This knowledge-based and linguistically based environment is not in contrast with hybrid and statistical approaches, which often require annotated corpora for their testing stages.
The Nooj annotations can cover the description of simple word forms, multiword units and even discontinuous expressions.
For a detailed description of the potential of the Italian module of Nooj see Vietri (2014c).

13 www.nooj-association.org


2.2.1 Electronic Dictionaries

If we typed the key-phrase “electronic dictionary” into a search engine, the first results appearing in the SERP would concern CD-ROM dictionaries or machine translators.
Actually, electronic dictionaries (usable with all computerized documents for text recognition, information retrieval and machine translation) should be conceived as lexical databases intended to be used only by computer applications, rather than by a wide audience, who probably would not be able to interpret data codes formalized in a complex way.
In addition to the target, the differences between the two types of dictionary can be summarized in three points:

Completeness: human dictionaries may overlook information that human users can extract from their encyclopedic knowledge. Electronic dictionaries, instead, must necessarily be exhaustive. They cannot leave anything to chance: the computer must have no possibility of error.

Explanation: human dictionaries can provide part of the information implicitly, because they can rely on the human user’s skills (intuition, adaptation and deduction). Electronic dictionaries, on the contrary, must make everything explicit: the computer can only process fully explained symbols and instructions.

Coding: unlike in human dictionaries, all the information provided in electronic dictionaries, which is related to the use of software for the automatic processing of data and texts, must be accurate, consistent and codified.

Joining all these features enlarges the size and magnifies the complexity of monolingual and bilingual electronic dictionaries, which are for this reason always bigger than the human ones.
The Italian electronic dictionaries system, based on Maurice Gross’s Lexicon-Grammar, has been developed at the University of Salerno by the Department of Communication Science; it is grounded on the European standard RELEX.

2.2.2 Local grammars and Finite-state Automata

Local grammars are algorithms that, through grammatical, morphological and lexical instructions, are used to formalize linguistic phenomena and to parse texts. They are defined as “local” because, despite any generalization, they can be used only in the description and analysis of limited linguistic phenomena.
The reliability of “finite state Markov processes” (Markov 1971) had already been excluded by Chomsky (1957, pp. 23-24), who argued that:

“(...) there are processes of sentence-formation that finite state grammars are intrinsically not equipped to handle. If these processes have no finite limit, we can prove the literal inapplicability of this elementary theory. If the processes have a limit, then the construction of a finite state grammar will not be literally out of the question, since it will be possible to list the sentences, and a list is essentially a trivial finite state grammar. But this grammar will be so complex that it will be of little use or interest.”

and remarked that

“If a grammar does not have recursive devices (...) it will be prohibitively complex. If it does have recursive devices of some sort, it will produce infinitely many sentences.”

Instead, Gross (1993) caught the potential of finite state machines, especially for those linguistic segments which are more limited from a combinatorial point of view.
The Nooj software, actually, is not limited to finite-state machines and regular grammars, but relies on the four types of grammars of the Chomsky hierarchy. In fact, as we will show in this research, it is capable of processing context-free and context-sensitive grammars and of performing transformations in cascade.
Finite-State Automata (FSA) are abstract devices characterized by a finite set of nodes or “states” (Si) connected to one another by transitions (tj) that allow us to determine sequences of symbols related to a particular path (see Figure 2.4).
These graphs are read from left to right, or rather from the initial state (S0) to the final state (S3) (Gross 1993). FSA can be deterministic, such as the one in Figure 2.4, or non-deterministic, if they activate more than one path, as in Figure 2.5. A Finite-State Transducer (FST) transduces the input symbols (Sii) into the output ones (Sio), as in Figure 2.6. A Recursive Transition Network (RTN) is a grammar that contains embedded graphs (the S3 node in Figure 2.7). An Enhanced Recursive Transition Network (ERTN) is a more complex grammar that includes variables (V) and constraints (C) (Silberztein 2003; see Figure 2.7).

Figure 2.4: Deterministic Finite-State Automaton

Figure 2.5: Non-deterministic Finite-State Automaton

Figure 2.6: Finite-State Transducer

Figure 2.7: Enhanced Recursive Transition Network

For simplicity, from now on we will just use the acronym FSA to indicate all the kinds of local grammars used to formalize different linguistic phenomena.
In general, the nodes of the graphs can at any time recall entries or semantic and syntactic information from the electronic dictionaries. FSA are inflectional or derivational if the symbols in the nodes are morphemes, and syntactic if the symbols are words.
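The difference between a deterministic automaton and a transducer can be sketched in a few lines of Python. This is a toy illustration in plain Python and not the Nooj grammar format: the transition tables, the state names S0 to S3 and the choice of emitting the LG symbols N0, V and N1 as output are assumptions made for the example.

```python
def run_dfa(transitions, start, finals, symbols):
    """Return True if the deterministic automaton accepts the sequence."""
    state = start
    for symbol in symbols:
        key = (state, symbol)
        if key not in transitions:
            return False  # no transition for this symbol: reject
        state = transitions[key]
    return state in finals


def run_fst(transitions, start, finals, symbols):
    """Return the output sequence of the transducer, or None if rejected."""
    state = start
    output = []
    for symbol in symbols:
        key = (state, symbol)
        if key not in transitions:
            return None
        state, emitted = transitions[key]
        output.append(emitted)
    return output if state in finals else None


# Toy automaton: S0 --Maria--> S1 --odia|tormenta--> S2 --Gianni--> S3
DFA = {
    ("S0", "Maria"): "S1",
    ("S1", "odia"): "S2",
    ("S1", "tormenta"): "S2",
    ("S2", "Gianni"): "S3",
}

# Same paths, but each transition also emits an output symbol.
FST = {
    ("S0", "Maria"): ("S1", "N0"),
    ("S1", "odia"): ("S2", "V"),
    ("S1", "tormenta"): ("S2", "V"),
    ("S2", "Gianni"): ("S3", "N1"),
}

sentence = ["Maria", "odia", "Gianni"]
print(run_dfa(DFA, "S0", {"S3"}, sentence))             # True
print(run_dfa(DFA, "S0", {"S3"}, ["Maria", "Gianni"]))  # False
print(run_fst(FST, "S0", {"S3"}, sentence))             # ['N0', 'V', 'N1']
```

Each (state, symbol) pair has at most one target here, which is what makes the automaton deterministic; a non-deterministic one would map a pair to several targets, and an RTN or ERTN would additionally allow a node to call another graph or to carry variables and constraints, which this sketch does not model.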

Chapter 3

SentIta: Lexicon-Grammar based Sentiment Resources

3.1 Lexicon-Grammar of Sentiment Expressions

The LG theoretical framework, with its huge collections and categorization of linguistic data in the form of LG tables, built and refined over the years by its researchers, offers the opportunity to systematically relate a large number of semantic and syntactic properties to the lexicon. The complex intersection between semantics and syntax, the greatest benefit of this approach, corrects the worst deficiency of other semantic strategies proposed in the Sentiment Analysis literature: the tendency to group together, only on the basis of their meaning, words that have nothing in common from the syntactic point of view (Le Pesant and Mathieu-Colas 1998).1

1 On this point, Elia (2013, p. 286) illustrates that:

“(...) the semantic intuition that drives us to put together certain predicates and their argument is not correlated to the set of syntactic properties of the verbs, nor is it «helped» by it, except in a very superficial way; indeed, we might say that the granular nature of the syntax in the lexicon would be an obstacle for a mind that is organized in an efficient and rigorous logic way.”

“Classes d’objets” is the expression introduced by Gross (1992a) to indicate the linguistic device that allows the language formalization from both a syntactic and a semantic point of view. Arguments and Predicates are classified into subsets which are semantically consistent and that also share lexical and grammatical properties (De Bueriis and Elia 2008).
According to Gross (2008, p. 1):

“Si un mot ne peut pas être défini en lui-même, c’est-à-dire hors contexte, mais seulement dans un environnement syntaxique donné, alors le lexique ne peut pas être séparé de la syntaxe, c’est-à-dire de la combinatoire des mots. La sémantique n’est pas autonome non plus: elle est le résultat des éléments lexicaux organisés d’une façon déterminée (distribution). Qu’il en soit ainsi est confirmé par les auteurs de dictionnaires eux-mêmes qui, timidement et sans aucune méthode, notent pour un prédicat donné le ou les arguments qui permettent de séparer un emploi d’un autre. Pas de sémantique sans syntaxe donc, c’est-à-dire sans contexte.”2

2 “If a word cannot be defined in itself, namely out of context, but only in a specific syntactic environment, then the lexicon cannot be separated from the syntax, that is to say from the combinatorics of words. Semantics is not autonomous either: it is the result of lexical elements organized in a determined way (distribution). This is confirmed by the dictionary authors themselves who, timidly and without any method, note for a specific predicate the argument(s) that allow one use to be separated from another. No semantics without syntax, that is, without context.” Author’s translation.

SentIta is a sentiment lexical database that directly aims to apply the Lexicon-Grammar theory, starting from its basic hypothesis: the minimum semantic units are the elementary sentences, not the words (Gross 1975).
Therefore, in this work, the lemmas collected in the dictionaries and their Semantic Orientations are systematically recalled and computed in a specific context through the use of local grammars. On the basis of their combinatorial features and co-occurrence contexts, the SentIta lexical items can take the following shapes (Buvet et al. 2005; Elia 2014a):

operators: the predicates, which can be verbs, nouns, adjectives, adverbs, multiword expressions, prepositions and conjunctions;

arguments: the predicate complements, which can be nominal and prepositional groups or entire clauses.

As already specified in Section 2.2, the sentiment words and other oriented Atomic Linguistic Units (ALU) are listed in Nooj electronic dictionaries, while the local grammars used to manipulate their polarities are formalized thanks to Finite State Automata.
Electronic dictionaries have then been exploited in order to list and to semantically and syntactically classify, in a machine-readable format, the sentiment lexical resources. The computational power of Nooj graphs has instead been used to represent the acceptance/refusal of the semantic and syntactic properties through the use of constraints and restrictions.
Chapter 2 underlined the role played in the elementary sentences by the Predicates and the one assumed by their arguments. According to Le Pesant and Mathieu-Colas (1998):

“La structure prédicat/arguments semble plus opératoire que les découpages binaires classiques (sujet/prédicat). [...] C’est le prédicat qui détermine le nombre de positions constitutives de la phrase.”3

Predicates must not be identified only with the category of verbs, but also with the predicative nouns and adjectives that enter into support verb structures.

In the subgroup of sentiment expressions, starting from their syntactic structures, Gross identified two semantic types:

3 “The predicate/argument structure seems to be more operational than the classical binary subdivision (subject/predicate). [...] It is the predicate that determines the number of positions that constitute the sentence.” Author’s translation.

42 CHAPTER 3 SENTITA LEXICON-GRAMMAR BASED SENTIMENT RESOURCES

1. sentences with two arguments, where one is the person and the other is the feeling (e.g. Maria è angosciata “Maria is anguished”);

2. sentences with three arguments, which differ from the first type in the presence of a stimulus that triggers the feeling in another person (e.g. Gianni angoscia Maria “Gianni anguishes Maria”).

The notation for their representation is the same as the one used for the Semantic Predicates (Gross 1981). Type 1 can be formalized by the following formula:

1. P(sent, h), e.g. P(angoscia, Maria)

in which the variable h is a function of a sentiment sent through a predicative relationship.

Type 2, instead, can be expressed by the following formula:

2. Caus(s, sent, h), e.g. Caus(Gianni, angoscia, Maria)

in which a sentiment sent is caused by a stimulus s on a person h.

Both of them are outcomes of the following assumption: “un sentiment est toujours attaché à la personne qui l’éprouve”4 (Gross 1995, p. 1). At first glance, the problem connected to sentiment expressions seems easy to define and solve. Actually, upon a deeper analysis, it presents challenging aspects: the quantity and the variability of sentiment expressions, which can vary from both a transformational and a morpho-syntactic point of view. As displayed below, a sentiment expressed by a word can take the shape of verbs, nouns, adjectives or adverbs (Gross 1995; D’Agostino 2005; D’Agostino et al. 2007):

V: angosciare “to anguish”, angosciarsi “to become anguished”

N: angoscia “anguish”

A: angosciante “anguishing”, angosciato, angoscioso “anguished”

ADV: angosciatamente, angosciosamente “with anguish”

4 “A sentiment is always attached to the person that feels it.” Author’s translation.

32 LITERATURE SURVEY ON SUBJECTIVITY LEXICONS 43

Such groups of words, related to one another by derivational morphology under a common root (e.g. angosc-), represent, from the LG point of view, classes of paraphrastic equivalence, also called paraphrastic constellations (D’Agostino 1992). These concepts are crucial when applied to the Sentiment Analysis field, because the connection among the words displayed above rests on a formal principle, which regards the sentence structure, and on a semantic principle, which concerns the distributional properties and the semantic roles of the arguments selected by the predicates (D’Agostino 1992).

Nevertheless, from a syntactic point of view, Gross (1995) noticed that the terms of sentiment can occupy all the different nominal, adjectival, verbal or adverbial positions within the same sentence structure.
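As a purely illustrative aid, the two notations of this section, P(sent, h) and Caus(s, sent, h), can be mirrored with two small Python records. The class names, field names and the `experiencer_view` helper are assumptions introduced for this sketch; they are not part of the Lexicon-Grammar formalism or of Nooj.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class P:
    """Type 1: a sentiment `sent` predicated of a person `h`."""
    sent: str
    h: str


@dataclass(frozen=True)
class Caus:
    """Type 2: a stimulus `s` causes the sentiment `sent` in a person `h`."""
    s: str
    sent: str
    h: str

    def experiencer_view(self) -> P:
        # Hypothetical helper: dropping the stimulus leaves the sentiment
        # attached to the person who feels it.
        return P(self.sent, self.h)


# Word forms sharing the root angosc-, as listed in this section.
ANGOSC = {
    "V": ["angosciare", "angosciarsi"],
    "N": ["angoscia"],
    "A": ["angosciante", "angosciato", "angoscioso"],
    "ADV": ["angosciatamente", "angosciosamente"],
}

type1 = P("angoscia", "Maria")
type2 = Caus("Gianni", "angoscia", "Maria")
```

Under these assumptions, the three-argument structure reduces to the two-argument one once the stimulus is left out, which is one way to read the claim that a sentiment is always attached to the person who feels it.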

3.2 Literature Survey on Subjectivity Lexicons

The most used approaches in the Sentiment Analysis field can be divided and summarized in three main lines of research: lexicon-based methods, learning and statistical methods, and hybrid methods.

The first ones coincide with the strategy that has been chosen to carry out the present work.

Lexicon-based approaches always start from the following assumption: the text sentiment orientation comes from the semantic orientations of the words and phrases contained in it.

The most commonly used SO indicators are adjectives or adjective phrases (Hatzivassiloglou and McKeown 1997; Hu and Liu 2004; Taboada et al. 2006), but recently the use of adverbs (Benamara et al. 2007), nouns (Vermeij 2005; Riloff et al. 2003) and verbs (Neviarouskaya et al. 2009a) has become common as well.

Hand-built lexicons are definitely more accurate than the automatically built ones, especially in cross-domain Sentiment Analysis tasks. Nevertheless, manually drawing up a dictionary is considered a painstakingly time-consuming activity (Taboada et al. 2011; Bloom 2011). This is why a large number of studies on automatic polarity lexicon creation and propagation can be observed in the literature. On this topic, it must be said that automatic dictionaries seem to be more unstable, but usually larger, than the manually built ones (see Tables 3.1 and 3.2). Size, anyway, does not always mean quality: it is common for these large dictionaries to carry scarcely detailed information. A large number of entries could denote fewer details in the description (e.g. the Maryland dictionary (Mohammad et al. 2009) does not specify the entries’ part of speech) or could instead mean more noise.

Among the state-of-the-art methods used to build and test those dictionaries we mention Latent Semantic Analysis (LSA) (Landauer and Dumais 1997), bootstrapping algorithms (Riloff et al. 2003), graph propagation algorithms (Velikovich et al. 2010; Kaji and Kitsuregawa 2007), conjunctions (and, or, but) and morphological relations between adjectives (Hatzivassiloglou and McKeown 1997), Context Coherency (Kanayama and Nasukawa 2006a), distributional similarity (Wiebe 2000), etc.5

Word Similarity is a very frequently used method in dictionary propagation over the thesaurus-based approaches. Examples are the Maryland dictionary, created thanks to a Roget-like thesaurus and a handful of affixes (Mohammad et al. 2009), and other lexicons based on WordNet, like SentiWordNet, built on the basis of a quantitative analysis of the glosses associated with synsets (Esuli and Sebastiani 2005, 2006a), or other lexicons based on the computation of a distance measure on WordNet (Kamps et al. 2004; Esuli and Sebastiani 2005).

Pointwise Mutual Information (PMI) using Seed Words has been applied to sentiment lexicon propagation by Turney (2002), Turney and Littman (2003), Rao and Ravichandran (2009), Velikovich et al. (2010) and Gamon and Aue (2005). Seed words are words which are strongly associated with a positive/negative meaning, such as eccellente (“excellent”) or orrendo (“horrible”), by which it is possible to build a bigger lexicon, detecting other words that frequently occur alongside them. It has been observed, indeed, that positive words often occur close to positive seed words, whereas negative words are likely to appear around negative seed words (Turney 2002; Turney and Littman 2003).

PMI could be easily calculated in the past using the Web as a corpus, thanks to AltaVista’s NEAR operator. Today one must be content with the Google AND operator.

Learning and statistical methods for Sentiment Analysis usually make use of Support Vector Machines (Pang et al. 2002; Mullen and Collier 2004; Ye et al. 2009) or Naïve Bayes classifiers (Tan et al. 2009; Kang et al. 2012).

Finally, as regards the hybrid methods, the works of Read and Carroll (2009), Li et al. (2010), Andreevskaia and Bergler (2008), Dang et al. (2010), Dasgupta and Ng (2009), Goldberg and Zhu (2006), Prabowo and Thelwall (2009) and Wan (2009) must be cited.6

A brief summary of the nature and the size of the most popular lexicons for Sentiment Analysis is given in Tables 3.1 and 3.2.

A clarification needs to be made about the distinction between Polarity lexicons and Affect lexicons. The former are used in subjectivity detection or in polarity and intensity classification systems, while the latter, going beyond the positive/negative dichotomy, refer to a more fine-grained emotion classification.

In the following paragraphs we will first describe some of the most famous Polarity lexicons, and then we will briefly cite some databases for affect classification.

Another distinction will be made between the resources conceived for the English language and the ones created for other languages. The Italian language will especially be considered.

5 More details about lexicon propagation will be given in Section 3.5.1.

6 These contributions will be discussed in more depth in Section 5.1.1.
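The seed-word idea mentioned in this section can be made concrete with a small sketch. It assumes the usual definition PMI(a, b) = log2(p(a, b) / (p(a) p(b))) and takes the orientation of a word as the difference between its PMI with a positive seed and its PMI with a negative seed. The toy corpus, the document-level co-occurrence window and the smoothing constant are assumptions made for illustration only; they do not reproduce the search-engine hit counts used in the cited works.

```python
import math


def pmi(hits_both: float, hits_a: float, hits_b: float, total: float) -> float:
    """PMI(a, b) = log2(p(a, b) / (p(a) * p(b))), estimated from counts."""
    return math.log2((hits_both / total) / ((hits_a / total) * (hits_b / total)))


def so_pmi(word, pos_seed, neg_seed, docs, smoothing=0.01):
    """Orientation of `word` as PMI(word, pos_seed) - PMI(word, neg_seed)."""

    def hits(*terms):
        # Documents containing all the terms; smoothing avoids log(0).
        return sum(all(t in d for t in terms) for d in docs) + smoothing

    total = len(docs)
    positive = pmi(hits(word, pos_seed), hits(word), hits(pos_seed), total)
    negative = pmi(hits(word, neg_seed), hits(word), hits(neg_seed), total)
    return positive - negative


# Toy corpus: each document is reduced to its set of words (hypothetical data).
docs = [
    {"film", "eccellente", "emozionante"},
    {"attore", "eccellente", "emozionante"},
    {"film", "orrendo", "noioso"},
    {"finale", "orrendo", "noioso"},
]
```

With this toy data, a word that only co-occurs with eccellente receives a positive score and one that only co-occurs with orrendo a negative one; on real corpora the result depends heavily on corpus size and on the chosen co-occurrence window.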


Dictionary                Author            Year  Entries       Language
SO-CAL lexicon            Taboada           2011  5000+         Eng
AFINN-111                 Hansen            2011  2400+         Eng
Sentilex                  Silva             2010  7000+         Port
Q-WordNet 3.0             Agerri et al.     2010  15500+        Eng
DAL-2009                  Whissell          2009  8700+         Eng
MPQA lexicon              Wilson et al.     2005  8000+         Eng
Hu-Liu lexicon            Hu et al.         2004  6800+         Eng
Hatzivassiloglou lexicon  Hatzivassiloglou  1997  15400+ pairs  Eng
General Inquirer          Stone             1961  9800+         Eng

Table 3.1: Manually built Polarity Lexicons

Dictionary             Author              Year  Entries  Language
Sentix                 Basile, Nissim      2013  59700+   Ita
SentiWordNet           Baccianella et al.  2010  38000+   Eng
Web-generated lexicon  Velikovich et al.   2010  178000+  Eng
Maryland dictionary    Mohammad et al.     2009  76000+   Eng

Table 3.2: (Semi-)Automatically built Polarity Lexicons

3.2.1 Polarity Lexicons

SO-CAL dictionary. Taboada et al. (2011), due to the low stability of automatically generated lexical databases, manually developed the SO-CAL dictionary. They hand-tagged, on an evaluation scale ranging from +5 to -5, the semantically oriented words they found in a variety of sources:

• the multi-domain collection of 400 reviews belonging to different categories, described in Taboada and Grieve (2004) and Taboada et al. (2006);

• 100 movie reviews from the Polarity Dataset of Pang et al. (2002) and Pang and Lee (2004);

• the whole General Inquirer dictionary (Stone et al. 1966).

The result was a dictionary of 2252 adjectives, 1142 nouns, 903 verbs and 745 adverbs. The adverb list has been automatically generated by matching adverbs ending in “-ly” to their potentially corresponding adjectives. Moreover, a set of multiword expressions (152 phrasal verbs, e.g. “to fall apart”, and 35 intensifier expressions, e.g. “a little bit”) has also been taken into account. In case of overlap between a simple word (e.g. “fun”, +2) and a multiword expression (e.g. “to make fun of”, -1) with different polarity, the latter has the higher priority in the annotation process.
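The priority rule just described, where a multiword expression overrides the simple word it contains, can be sketched as a longest-match lookup. The two toy dictionaries and the function name are assumptions for illustration (the multiword entry is stored without the infinitive marker and no lemmatization is performed); this is not the SO-CAL implementation.

```python
# Toy entries taken from the example in the text (scores on the +5/-5 scale).
SIMPLE = {"fun": 2}
MULTIWORD = {("make", "fun", "of"): -1}


def score_tokens(tokens):
    """Return (expression, score) pairs, preferring the longest multiword match."""
    results, i = [], 0
    max_len = max(len(key) for key in MULTIWORD)
    while i < len(tokens):
        matched = False
        for n in range(max_len, 1, -1):  # try the longest span first
            span = tuple(tokens[i:i + n])
            if span in MULTIWORD:
                results.append((" ".join(span), MULTIWORD[span]))
                i += n  # skip the words already covered by the expression
                matched = True
                break
        if not matched:
            if tokens[i] in SIMPLE:
                results.append((tokens[i], SIMPLE[tokens[i]]))
            i += 1
    return results
```

Consuming the whole span before moving on is what prevents “fun” from being scored a second time inside “make fun of”.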

AFINN. It is a list of English words that associates 1446 words with a valence between +5 (positive) and -5 (negative) (Hansen et al. 2011). AFINN-111, its newest version, includes 2477 words and phrases. The compilation of the list started from a set of obscene words and was then gradually extended by examining Twitter posts. This has led to a bias of 65% towards negative words compared to positive words (Nielsen 2011).

Q-WordNet. It is a lexical resource consisting of WordNet senses automatically annotated with positive and negative polarity (Agerri and García-Serrano 2010). It is grounded on the hypothesis that, if a positive synset is matched in a gloss, then its synset can also be annotated as positive. Exceptions are the cases under the scope of a negation, which are classified as the opposite of the matched lemma.

What differentiates Q-WordNet from SentiWordNet is the fact that positive/negative labels are attributed to word senses at a certain level by means of a graded classification.

Multi-Perspective Question Answering Lexicon. The MPQA subjectivity lexicon (Wiebe et al. 2004; Wilson et al. 2005) comprises 8000+ subjectivity clues (words and phrases that may be used to express private states), annotated by type (strongsubj or weaksubj) and prior polarity. Subjectivity clues and contextual features have been learned from two different kinds of corpora: a small amount of data manually annotated at the expression level, from the Wall Street Journal and newsgroup data, and a large amount of data with existing document-level annotations, from the Wall Street Journal. The focus is on three types of subjectivity clues:

• hapax legomena: words that appeared just once in the corpus;

• collocations, identified through fixed n-grams;

• adjective and verb features, identified using the results of a method for clustering words according to distributional similarity.

Hu-Liu lexicon. The Hu and Liu (2004) wordlist comprises around 6800 English items (2006 positive and 4783 negative), including slang, misspellings, morphological variants and social media mark-up.

Hatzivassiloglou Lexicon. Hatzivassiloglou and McKeown (1997), starting from a list of 1336 manually labeled adjectives (657 positive and 679 negative terms), demonstrated that conjunctions between adjectives can provide indirect information about their orientation. In detail, the authors extracted the conjunctions through a two-level finite-state grammar, able to cover modification patterns and noun-adjective apposition, and tested their method on 21 million words from the Wall Street Journal corpus. This way they collected 13426 conjunctions of adjectives, further expanded to a total of 15431 conjoined adjective pairs.

Dictionary of Affect in Language. The Dictionary of Affect in Language (DAL) (Whissell 1989, 2009) lists about 4500 English words. Its more recent version (Whissell 2009) provides 8742 words manually annotated for their Activation, Evaluation and Imagery.

General Inquirer. It is an IBM 7090 program system developed at Harvard in the spring of 1961 for content analysis research problems in behavioral sciences (Stone et al. 1966; Stone and Hunt 1963). It is based on the idea that, through the analysis of many variables at once, it is often possible to discover trends that can appear “latent to casual observation”.

The description of the “systematic” content analysis processes must be as “objective” as possible, the features of texts must be “manifest” and the results “quantitative”.

The lexicon attaches syntactic, semantic and pragmatic information to part-of-speech tagged words. Among others, the categories related to polarity measures are the ones pertaining to the three semantic dimensions of Osgood (1952):

Positive: 1915 words of positive outlook

Negative: 2291 words of negative outlook

Strong: 1902 words implying strength

Weak: 755 words implying weakness

Active: 2045 words implying an active orientation

Passive: 911 words indicating a passive orientation 7

SentiWordNet. Esuli and Sebastiani (2006a) developed the SentiWordNet 1.0 lexicon, based on WordNet 2.0 (Miller 1995), by automatically associating each WordNet synset with three scores: Obj(s) for objective terms, and Pos(s) and Neg(s) for positive and negative terms.

7 Table 3.1 refers to the General Inquirer size only in terms of sentiment words.


Every score ranges from 0.0 to 1.0, and their sum is 1.0 for each synset. Each value is determined by the proportion of eight ternary classifiers (with similar accuracy levels but different classification behavior) that, quantitatively analyzing the glosses associated with every synset, assign it the corresponding label.

What differentiates SentiWordNet from other sentiment lexicons is the fact that the weighting process focuses on synonym sets rather than on simple terms. The researchers intuitively observe that different senses of the same term could have different opinion-related properties.

SentiWordNet 3.0 (Baccianella et al. 2010) improves SentiWordNet 1.0. The main differences between them are the version of WordNet they annotate (3.0 for SentiWordNet 3.0) and the algorithm used to annotate WordNet, which in the 3.0 version, along with the semi-supervised learning step, also includes a random-walk step that refines the scores.
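The committee-based scoring described for SentiWordNet 1.0 can be illustrated with a minimal sketch: each of the eight ternary classifiers casts one label for a synset, and the three scores are the proportions of votes. The function name and the example votes are hypothetical; the classifiers themselves and the random-walk step of version 3.0 are not modelled here.

```python
from collections import Counter


def synset_scores(votes):
    """Turn the labels cast by a committee of ternary classifiers into scores."""
    counts = Counter(votes)
    n = len(votes)
    # Each score is the share of classifiers that chose that label.
    return {label: counts[label] / n for label in ("Pos", "Neg", "Obj")}


# Hypothetical votes of eight classifiers for one synset.
votes = ["Pos", "Pos", "Pos", "Obj", "Obj", "Obj", "Obj", "Neg"]
scores = synset_scores(votes)
```

Because every classifier casts exactly one vote, the three proportions necessarily sum to 1.0, which matches the constraint stated in the text.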

Velikovich Web-generated lexicon. This sentiment dictionary, built through graph propagation algorithms and lexical graphs, counts 178104 entries, which include not only simple words, but also spelling variations, slang, vulgarity and multiword expressions (Velikovich et al. 2010).

Maryland dictionary. The procedure through which Mohammad et al. (2009) built the Maryland lexicon can be split into two steps: the identification of a seed set of positive and negative words, and the positive and negative annotation of the words that are synonymous with them in a Roget-like thesaurus.

Eleven antonym-generating affix patterns have been used to generate overtly marked words and their unmarked counterparts. This way, such patterns generated 2692 pairs of valid English words listed in the Macquarie Thesaurus.

The main innovation introduced by Mohammad et al. (2009) is the autonomy from the manual annotation of any term in the dictionary, even if manual annotation can be used, when available, to improve both the correctness and the coverage of its entries.

3.2.2 Affective Lexicons

SentiSense. SentiSense (de Albornoz et al. 2012) is a semi-automatically developed affective lexicon, which consists of 5496 words and 2190 synsets.

Similarly to SentiWordNet or WordNet-Affect, it associates emotional meanings or categories with concepts (instead of terms) from the WordNet lexical database, and can be used for both polarity and intensity classification and emotion identification.

Synsets are labeled with a set of 14 emotional categories (ambiguous, anger, anticipation, calmness, despair, disgust, fear, hate, hope, joy, like, love, sadness, surprise), which are also related by an antonymy relationship (examples are disgust/like, love/hate, sadness/joy).

SentiFul and the Affect Database. Neviarouskaya et al. (2009b, 2011) developed the SentiFul lexicon by automatically expanding the core of the sentiment lexicon through many different strategies:

• synonymy relations;

• direct antonymy relations;

• hyponymy relations;

• derivation and scoring of morphologically modified words;

• compounding of sentiment-carrying base components.

In summary, 4190 new words connected to the known ones through semantic relations have been automatically extracted from WordNet, and 4029 lemmas have been automatically derived with the morphological method. The starting point of the work has been the Affect Database of Neviarouskaya et al. (2007), which contained 364 emoticons, 337 popular acronyms and abbreviations (e.g. bl “belly laughing”), words standing for communicative functions (e.g. “wow”, “ouch”, etc.), 112 modifiers (e.g. “very”, “slightly”, etc.) and 2438 direct and indirect emotion-related entries (1627 of which were taken from WordNet-Affect).

In the Affect Database, emotion categories (anger, disgust, fear, guilt, interest, joy, sadness, shame and surprise) and intensities (from 0.0 to 1.0) were manually assigned to each entry by three independent annotators.

Appraisal Lexicon. The appraisal lexicon (Argamon et al. 2009) has been built through semi-supervised learning on a set of WordNet glosses containing adjectives and adverbs, among which only a small subset of the training data was manually labelled.

The construction of the lexicon is grounded on Appraisal Theory, a linguistic approach to the analysis of subjective language. In this framework, several sentiment-related features are assigned to relevant lexical items. The feature set, which allows more subtle distinctions than the most used binary Positive/Negative classification, includes the labels orientation (Positive or Negative), attitude type (Affect, Appreciation, Judgment) and force (Low, Median, High or Max).

WordNet-Affect. WordNet-Affect is another extension of WordNet for the lexical representation of affective knowledge (Strapparava et al. 2004). Thanks to the synset model, it puts a set of affective concepts in relation with a number of affective words. In a first step of the WordNet-Affect creation, the affective core has been built starting from almost 2000 terms directly or indirectly referring to mental states. Lexical and affective information has then been added to each item by creating specific frames for them. In the end, the senses of terms have been mapped to their respective synsets in WordNet. The synsets have been specialized with a hierarchically organized set of emotional categories (namely emotion, mood, trait, cognitive state, physical state, hedonic signal, emotion-eliciting situation, emotional response, behaviour, attitude, sensation). Furthermore, the emotional valences have been distinguished by four additional a-labels: positive, negative, ambiguous and neutral (Strapparava et al. 2006).

WordNet-Affect contains 2874 synsets and 4787 words (Strapparava et al. 2004).

3.2.3 Subjectivity Lexicons for Other Languages

3.2.3.1 Italian Resources

The largest part of the state-of-the-art works on polarity lexicons for Sentiment Analysis purposes focuses on the English language. Thus, Italian lexical databases are mostly created by translating and adapting the English ones, SentiWordNet and WordNet-Affect among others.

Basile and Nissim (2013) merged the semantic information belonging to existing lexical resources in order to obtain an annotated lexicon of senses for Italian, Sentix (Sentiment Italian Lexicon). Basically, MultiWordNet (Pianta et al. 2002), the Italian counterpart of WordNet (Miller 1995; Fellbaum 1998), has been used to transfer the polarity information associated with English synsets in SentiWordNet to Italian synsets, thanks to the multilingual ontology BabelNet (Navigli and Ponzetto 2012).

Every Sentix entry is described by information concerning its part of speech, its WordNet synset ID, a positive and a negative score from SentiWordNet, a polarity score (from -1 to 1) and an intensity score (from 0 to 1).

The dictionary contains 59742 entries for 16043 synsets. Details about its composition are reported in Table 3.3.

PoS        Lemmas  Synsets
noun       52257   12228
adjective  3359    1805
verb       2775    1260
adverb     1351    750
all        59742   16043

Table 3.3: Sentix (Basile and Nissim 2013)

Moreover, Steinberger et al. (2012) verified a triangulation hypothesis for the creation of sentiment dictionaries in many languages, namely English, Spanish, Arabic, Czech, French, German, Russian and also Italian.

The starting point is the semi-automatic generation of two pilot sentiment dictionaries for English and Spanish, which have been automatically transposed into the other languages by using the overlap of the translations (triangulation) and then manually filtered and expanded. As regards the Italian lexicon, the authors collected 918 correctly triangulated terms and 1500 correctly translated terms from the starting dictionaries.

Hernandez-Farias et al. (2014) achieved good results in the SentiPolC 2014 task by semi-automatically translating into Italian different lexicons, namely SentiWordNet, the Hu-Liu Lexicon, the AFINN Lexicon, the Whissell Dictionary, etc.

Baldoni et al. (2012) proposed an ontology-driven approach to Sentiment Analysis, where tags and tagged resources are related to emotions. They selected representative Italian emotional words and used them to query MultiWordNet. The synsets connected to these lemmas were then processed with WordNet-Affect, in order to populate the emotion ontology only with the words belonging to synsets that represented affective information. The resulting ontology contains about 450 Italian words referring to the 87 emotional categories of OntoEmotion, an ontology conceived for the categorization of emotion-denoting words. Furthermore, thanks to the SentiWordNet database, every synset has been associated with neutral, positive or negative scores.
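The triangulation step attributed earlier in this section to Steinberger et al. (2012) can be pictured as a set intersection: a target-language candidate is kept only when it is reached from both pilot dictionaries. The function name and the tiny bilingual dictionaries below are hypothetical, and the manual filtering and expansion reported by the authors are not modelled.

```python
def triangulate(en_terms, es_terms, en_to_it, es_to_it):
    """Keep the Italian candidates reached from both pivot dictionaries."""
    from_en = {t for w in en_terms for t in en_to_it.get(w, ())}
    from_es = {t for w in es_terms for t in es_to_it.get(w, ())}
    return from_en & from_es  # overlap of the two translation sets


# Hypothetical bilingual dictionaries, for illustration only.
en_to_it = {"happy": ["felice", "contento"], "sad": ["triste"]}
es_to_it = {"feliz": ["felice"], "triste": ["triste", "mesto"]}

italian = triangulate(["happy", "sad"], ["feliz", "triste"], en_to_it, es_to_it)
```

Candidates produced by only one pivot language (here contento and mesto) are discarded, which trades coverage for precision.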

3.2.3.2 Sentiment Lexical Resources for Different Languages

SentiLex-PT (Silva et al. 2010, 2012) is a sentiment lexicon for Portuguese made up of 7014 lemmas, distributed in this way:

• 4779 adjectives

– e.g. castigado.PoS=Adj;TG=HUM:N0;POL:N0=-1;ANOT=JALC

• 1081 nouns

– e.g. aberração.PoS=N;TG=HUM:N0;POL:N0=-1;ANOT=MAN

• 489 verbs

– e.g. enganar.PoS=V;TG=HUM:N0:N1;POL:N0=-1;POL:N1=0;ANOT=MAN

• 666 verb-based idiomatic expressions

– e.g. engolir em seco.PoS=IDIOM;TG=HUM:N0;POL:N0=-1;ANOT=MAN

The sentiment entries are all human-noun modifiers, compiled from different publicly available corpora and dictionaries.

The tagset used to describe the attributes of the lemmas is explained below:

• part of speech (ADJ, N, V and IDIOM);

• polarity (POL), which ranges between 1 (positive) and -1 (negative);

• target of polarity (TG), which corresponds to a human subject (HUM);

• polarity assignment (ANOT), which specifies whether the annotation has been performed manually (MAN) or automatically by the Judgment Analysis Lexicon Classifier (JALC).
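To make the attribute tagset more tangible, the following sketch splits one entry into its lemma and its attribute/value pairs. The separators used here (a full stop after the lemma, semicolons between attributes, colons before the argument positions N0/N1) are an assumption about the entry format, since punctuation is not reliably preserved in the examples; the function name is also hypothetical.

```python
def parse_sentilex(line):
    """Split a SentiLex-style entry into (lemma, attributes)."""
    lemma, _, rest = line.partition(".")  # lemma ends at the first full stop
    attrs = {}
    for field in rest.split(";"):
        key, _, value = field.partition("=")
        attrs[key] = value
    return lemma, attrs


# Verb entry with two arguments: negative for N0, neutral for N1.
lemma, attrs = parse_sentilex(
    "enganar.PoS=V;TG=HUM:N0:N1;POL:N0=-1;POL:N1=0;ANOT=MAN"
)
```

Keeping POL:N0 and POL:N1 as separate keys preserves the fact that a verb can assign different polarities to its different arguments.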


Differently from Silva et al. (2010, 2012), who focused on the domain-specific features of the opinion lexicon of social judgment, Souza et al. (2011) proposed, for the Portuguese language, a domain-independent lexicon composed of 7077 polar words (adjectives, verbs and nouns) and expressions.

As regards the dictionary population, the authors applied three different methods:

• a corpus-based approach, applied to 1317 documents annotated using the Pointwise Mutual Information;

• a thesaurus-based approach, applied to 44077 words and their semantic relations from the TEP thesaurus (Maziero et al. 2008);

• a translation-based approach, starting from Liu’s English Opinion Lexicon (Hu and Liu 2004).

Other contributions that deserve to be cited are Perez-Rosas et al. (2012) for Spanish; Mathieu (1995, 1999a, 2006) for French; Clematide and Klenner (2010), Waltinger (2010) and Remus et al. (2010) for German; Abdul-Mageed et al. (2011) and Abdul-Mageed and Diab (2012) for Arabic; and Chetviorkin and Loukachevitch (2012) for Russian.

As regards Multilingual Sentiment Analysis, we can mention Balahur and Turchi (2012), Steinberger et al. (2012), Pak and Paroubek (2011) and Denecke (2008), among others.

3.3 SentIta and its Manually-built Resources

In this Section we will go in depth into the description of SentIta, the LG-based Sentiment lexicon for the Italian language, which has been semi-automatically built on the basis of the richness of both the Italian module of Nooj and the Italian Lexicon-Grammar resources.

The tagset used for the Prior Polarity (Osgood 1952) annotation of the resources is composed of four tags:

33 SENTITA AND ITS MANUALLY-BUILT RESOURCES 57

Lexical Item  Translation  Tag         Description         Score
meraviglioso  wonderful    +POS+FORTE  Intensely Positive  +3
colto         cultured     +POS        Positive            +2
rasserenante  calming      +POS+DEB    Weakly Positive     +1
nuovo         new          -           Neutral             0
insapore      flavourless  +NEG+DEB    Weakly Negative     -1
cafone        boorish      +NEG        Negative            -2
disastroso    disastrous   +NEG+FORTE  Intensely Negative  -3

Table 3.4: SentIta Evaluation tagset

Lexical Item  Translation  Tag     Description  Score
grande        big          +FORTE  Intense      +
velato        veiled       +DEB    Weak         -

Table 3.5: SentIta Strength tagset

POS: positive

NEG: negative

FORTE: intense

DEB: weak

Such labels, if combined together, can generate an evaluation scale that goes from -3 to +3 (see Table 3.4) and a strength scale that ranges from -1 to +1 (see Table 3.5).

Neutral words (e.g. nuovo “new”, with score 0 in the evaluation scale) have been excluded from the lexicon.

The main difference between the words listed in the two scales is the possibility to use them as indicators for the subjectivity detection task. Basically, the words belonging to the evaluation scale are “anchors” that begin the identification of polarized phrases or sentences, while the ones belonging to the strength scale are just used as intensity modifiers (see Section 4.2).
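The way the four tags combine into the two scales of Tables 3.4 and 3.5 can be summarized in a short sketch. The function name and the returned (scale, score) pair are assumptions made for illustration; in SentIta the tags live in Nooj dictionaries and are not computed by a script of this kind.

```python
def sentita_score(tags):
    """Map a set of SentIta tags to a (scale, score) pair."""
    tags = set(tags)
    if "POS" in tags and "NEG" in tags:
        raise ValueError("an entry cannot be both POS and NEG")
    # FORTE adds one point of intensity, DEB removes one.
    shift = ("FORTE" in tags) - ("DEB" in tags)
    if "POS" in tags:
        return "evaluation", 2 + shift
    if "NEG" in tags:
        return "evaluation", -(2 + shift)
    if shift:
        return "strength", shift  # pure intensifier or downtoner
    return "evaluation", 0  # no tag at all: neutral, excluded from the lexicon
```

For example, the tags of meraviglioso (+POS+FORTE) yield +3 on the evaluation scale, while grande (+FORTE alone) only yields +1 on the strength scale.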

Details about the lexical assets available for the Italian language are summarized in Table 3.6. Here we report all the items contained in SentIta, including the ones automatically derived (namely adverbs and nouns), which will be described in detail in Section 3.5.

Category      Entries  Example          Translation
Adjectives    5381     allegro          cheerful
Adverbs       3693     tristemente      sadly
Compound Adv  774      a gonfie vele    full steam ahead
Idioms        577      essere in difetto  to be in fault
Nouns         3215     eccellenza       excellence
Psych Verbs   604      amare            to love
LG Verbs      651      prendersela      to feel offended
Bad words     182      leccaculo        arse licker
Tot           15077    -                -

Table 3.6: Composition of SentIta

Because hand-built lexicons are more accurate than the automatically built ones (Taboada et al. 2011; Bloom 2011), we started the creation of the SentIta database with the manual tagging of part of the lemmas contained in the Nooj Italian dictionaries. In detail, adjectives and bad words have been manually extracted and evaluated starting from the Nooj Italian electronic dictionary of simple words, preserving their inflectional (FLX) and derivational (DRV) properties. Moreover, compound adverbs (Elia 1990), idioms (Vietri 1990, 2011), psych verbs and other LG verbs (Elia et al. 1981; Elia 1984) have been weighted starting from the Italian LG tables, in order to maintain the syntactic, semantic and transformational properties connected to each one of them.

3.3.1 Semantic Orientations and Prior Polarities

In the Sentiment Analysis field, the Semantic Orientation (SO) is a measure of subjectivity and opinion that weights the polarity (positive/negative) and the strength (intense/weak) of an opinion. Other concepts used to refer to SO are sentiment orientation, polarity of opinion or opinion orientation (Taboada et al. 2011; Liu 2010).

The Prior Polarity, instead, refers to the individual words’ Semantic Orientation and differs from the SO because it is always independent from the context (Osgood 1952). The second concept, generally used in sentiment dictionary population, has also been exploited to manually evaluate the words contained in the simple word dictionary of the Italian module of Nooj (Sdic_it.dic).

Before the description of the annotation process begins, it must be clarified that, at this stage of the work, the multiple meanings that a single word can possess have not been taken into account. In this sense, SentIta is different from the resources in which Prior Polarities are assigned to “concepts” rather than to mere words (e.g. SentiWordNet). Such a procedure gives rise to the need to reconstruct the Prior Polarities starting from the various word meanings, actually transforming them into Posterior Polarities (Gatti and Guerini 2012). An example is the word “cold”, which possesses different meanings and, consequently, different polarities if associated with the words “beer” (with a low temperature, neutral) or “person” (emotionless, weakly negative) (Gatti and Guerini 2012).

From our point of view, this distinction is redundant in a work that includes a syntactic module. We strongly support the idea that only the Prior Polarities must be contained in the lexicon and that the context should be used to compute them in a parallel module, that is, the syntactic one.

Indeed, our purpose is to reduce at the same time the presence of ambiguity, the computational time and the size of the electronic dictionaries. Therefore, the annotators of SentIta simply labeled the words of Sdic_it according to the meaning they thought dominant for each word. If, out of any context, a word such as “cold” cannot be unequivocally associated with a positive or negative orientation, we preferred to attribute it a neutral polarity.


3.3.2 Adjectives of Sentiment

The annotation of Sdic_it started with 34000+ adjectives, considering each word apart from any context. The first decision is binary and regards the choice between positive and negative polarity, so the first tags, +POS or +NEG (which respectively correspond to the scores +2 and -2), are attached to the adjective under examination, as shown in the following examples:

colto,A+FLX=N88+DRV=ISSIMO:N88+POS

cafone,A+FLX=N88+DRV=ISSIMO:N88+NEG

A fine-grained classification approach must include, during the Prior Polarity attribution, also the determination of the intensity of the oriented words. The strength of the SO is therefore defined with the combined use of the POS/NEG tags and the FORTE/DEB labels. The last two respectively increase or decrease by one point the intensity of the sentiment items. This way, as shown in Table 3.4, we could take into account three different degrees of positivity and as many levels of negativity.

As we will explain better in Section 4.2, we selected among the adjectives of Sdic_it a list of about 650 words that did not receive any polarity tag, but that have just been labeled with information about their intensity. This is the case of the items reported below, which are just examples from the SentIta Strength list:

grande,A+FLX=N79+DRV=GRANDE:N88+DRV=GRANDE-C:N79+FORTE

velato,A+FLX=N88+DRV=ISSIMO:N88+DEB

Such words are not used to locate subjectivity in texts, but they can become really advantageous when the task is to compute the actual orientation of a phrase. The words endowed with the tag +FORTE are called intensifiers, and the ones marked with +DEB downtoners (Taboada et al. 2011). Their function is, respectively, to increase or decrease the score of the polarized words.

Details about the distribution of sentiment tags in the adjective dictionary are reported in Table 3.7.

Table 3.8 presents a summary, in terms of percentage values, of the composition of the adjective dictionary in SentIta. In the upper part of the Table, percentages are evaluated on the basis of SentIta, which accounts for 5381 adjectives, indicated in the Table by tot oriented adj. The differences between the positive and negative adjectives and the intensity modifiers are shown here.

The coverage of the SentIta adjectives with respect to the whole number of adjectives contained in Sdic_it seems, at first glance, to be very low, but if compared with the other parts of speech it becomes more significant.

For a LG treatment of adjectives, see Section 3.3.5.

Score   Adjectives    Entries
+3      +POS+FORTE    111
+2      +POS          1137
+1      +POS+DEB      110
-1      +NEG+DEB      770
-2      +NEG          2375
-3      +NEG+FORTE    240
+       +FORTE        512
-       +DEB          126

Table 3.7: SentIta tag distribution in the adjective list
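
The scoring scheme summarized in the table above can be sketched as a small function. This is only an illustration under stated assumptions: the tag names (POS, NEG, FORTE, DEB) and the scores come from the text, while the function name `prior_polarity` and the choice of returning `None` for items without a polarity tag are hypothetical.

```python
# Illustrative sketch: tag names and scores follow the adjective table;
# the function itself is a hypothetical helper, not part of SentIta.
BASE_SCORE = {"POS": 2, "NEG": -2}


def prior_polarity(tags):
    """Return the score in [-3, +3] for a collection of tags, or None when
    the item carries no polarity tag (intensifier, downtoner or neutral)."""
    tags = set(tags)
    polarity = tags & set(BASE_SCORE)
    if len(polarity) > 1:
        raise ValueError("an entry cannot be both POS and NEG")
    if not polarity:
        return None
    score = BASE_SCORE[polarity.pop()]
    sign = 1 if score > 0 else -1
    if "FORTE" in tags:  # intensity increased by one point
        score += sign
    if "DEB" in tags:  # intensity decreased by one point
        score -= sign
    return score
```

Under these assumptions, an entry tagged +NEG+DEB receives -1 and an entry tagged only +FORTE receives no score of its own, in line with the fact that intensity modifiers are not used to locate subjectivity.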

Adjectives          Entries   Percentage (%)
tot pos             1358      25 (SentIta)
tot neg             3385      63 (SentIta)
tot int             638       12 (SentIta)
tot oriented adj    5381      16 (Sdic_it)
tot neutral adj     28664     84 (Sdic_it)
tot adj Sdic_it     34045     100

Table 3.8: Adjectives percentage values in SentIta
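
The percentages reported for the adjective dictionary can be re-derived from the raw counts with a few lines of arithmetic. The sketch below only reuses the figures given in the tables; the variable names are illustrative.

```python
# Counts taken from the adjective tables; variable names are illustrative.
oriented = {"tot pos": 1358, "tot neg": 3385, "tot int": 638}
sdic_it_adjectives = 34045

sentita_adjectives = sum(oriented.values())
# Shares of each group with respect to the SentIta adjectives.
shares = {name: round(100 * count / sentita_adjectives)
          for name, count in oriented.items()}
# Coverage of SentIta with respect to all the adjectives of Sdic_it.
coverage = round(100 * sentita_adjectives / sdic_it_adjectives)
neutral = sdic_it_adjectives - sentita_adjectives
```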


3.3.2.1 A Special Remark on Negative Words in Dictionaries and Positive Biases in Corpora

The most important aspect to notice in Table 3.8 is the fact that the positive items cover just 25% of the total amount of oriented adjectives, while negative adjectives, with a percentage of 63%, predominate in the dictionary.

Just as in SentIta, other sentiment lexicons also present an excess of negative items, such as the SO-CAL lexicon (Taboada et al. 2011), AFINN-111 (Nielsen 2011) and the Maryland dictionary (Mohammad et al. 2009).

In texts, instead, exactly the opposite trend can be observed. The Pollyanna hypothesis asserts the universal human tendency to use positive words more frequently than negative ones, through the popular claim “people tend to look on (and talk about) the bright side of life” (Boucher and Osgood 1969, p. 1).

The experimental results of Montefinese et al. (2014) and Warriner et al. (2013) confirmed the so-called linguistic positivity bias, the prevalence of positive words in different languages.

Garcia et al. (2012) also measured this bias by quantifying, in the context of online written communication, both the emotional content of words in terms of valence and the frequency of word usage in the whole indexable web. They provided strong evidence that words with a positive emotional connotation are used more often.

As regards the experiments in the Sentiment Analysis field that tried to solve the positive bias, we mention Taboada et al. (2011), who, upon noticing the presence of almost twice as many positive as negative words in their corpus, addressed the problem by increasing the SO of any negative expression by a fixed amount (of 50%). Voll and Taboada (2007) overcame the same problem by shifting the numerical cut-off point between positive and negative reviews.

In this research, we chose not to apply any such correction, in order to avoid the introduction of unpredictable errors due to the difficulty of accurately evaluating to what extent positive and negative items are differently distributed in written texts.

3.3.3 Psychological and other Opinion Bearing Verbs

In the present Section, we will go through the more complex annotation of verbs of sentiment from available Lexicon-Grammar matrices. We will return to the adjectives of sentiment in Section 3.3.5, where the issue of the adjectivalization of the psychological predicates will be addressed.

3.3.3.1 A Brief Literature Survey on the Classification of Psych Verbs

The difference between the direct (Experiencer subject) and reverse (Experiencer object) representations of thematic roles is underlined in the extensive body of scientific literature on psych verbs, e.g. Chomsky (1965), Belletti and Rizzi (1988), Kim and Larson (1989), Jackendoff (1990), Whitley (1995), Pesetsky (1996), Filip (1996), Baker (1997), Martín (1998), Arad (1998), Klein and Kutscher (2002), Bennis (2004), Landau (2010).

The thematic roles involved are the following:

Experiencer: the entity or the person that experiences the reaction;

Cause: “the thing, person, state of affairs that causes the reaction” (Whitley 1995) “or cognitive judgment in the experiencer” (Klein and Kutscher 2002).

These two main classes [8] correspond to the three classes proposed by Belletti and Rizzi (1988), in which the reverse representation is split according to the presence of a direct or indirect object mapped to the role of experiencer:

[8] These classes of psych verbs are also known as the Fear class (direct) and the Frighten class (reverse) (Jackendoff 1990; Pesetsky 1996; Baker 1997), on the basis of the syntactic positions occupied by the thematic roles of the experiencer and the cause (theme).


Class I: verbs which have the experiencer as subject and the cause as direct object (e.g. temere “to be afraid of”);

Class II: verbs which have the cause as subject and the experiencer as direct object (e.g. preoccupare “to worry”);

Class III: verbs which have the cause as subject and the experiencer as indirect object in dative case (e.g. piacere “to like” [9]).

Transitivity is also taken into account by Whitley (1995), who instead identified four classes of psych verbs for Spanish:

Class I: transitive-direct (e.g. amar “to love”);

Class II: intransitive-direct (e.g. gozar de “to enjoy”);

Class III: intransitive-reverse (e.g. gustar “to like”);

Class IV: transitive-reverse (e.g. tranquilizar “to appease”).

3.3.3.2 Semantic Predicates from the LG Framework

Shifting the focus to the works realized within the Lexicon-Grammar framework, we find again the distinction between two types of possible constructions, on the basis of the syntactic position (subject or complement) of the person who feels the sentiment (Mathieu 1999a; Ruwet 1994), and, of course, the distinction between transitive and intransitive verbs.

The (reverse) French verbs of sentiment that enter into the transitive structure N0 V N1 have been examined by Mathieu (1995) [10]. In this structure, N0, the stimulus of the sentiment, can be a human, concrete or abstract noun, or a completive or infinitive clause, and N1, the experiencer of the sentiment, is a human noun or also other animate

[9] Note that, while the Italian verb piacere is classified in this class of Belletti and Rizzi (1988), its English translation “to like” belongs to the first one.

[10] The analysis of Mathieu (1995) is based on the syntactic model of the LG French Table 4.


nouns, such as animals or gods (Ruwet 1994; see example 3).

(3) (Luc + Ce tableau + Le mensonge + Que Luc soit venu + Trembler) irrite Marie (Mathieu 1995)
“(Luc + This table + The lie + That Luc has come + To tremble) irritates Marie”

Nevertheless, Ruwet (1994) argued that a number of predicates from this class [11] select a Num in position N1 that does not play the role of Experiencer, as happens in example (5), differently from what happens in (4). According to Ruwet (1994), le ministre of (5) does not feel any sentiment: he is instead the object of a negative (public) judgment caused by the stimulus expressed by N0.

(4) La lecture d’Homère passionne Maxime (Ruwet 1994)
“Reading Homer moves Maxime”

(5) Ses déclarations intempestives discréditent le ministre (Ruwet 1994)
“His untimely declarations discredit the Minister”

One of the main purposes of this thesis is to provide a Lexicon-Grammar based database for Sentiment Analysis tools. We decided to start from the LG works on psychological predicates because of their richness, but we do not look at the Ruwet (1994) objection as a limit for our data collection. It simply offers a wider range of information, which we associate with our data. Sentiment Analysis does not focus only on mental states, but searches for every kind of positive or negative statement expressed in raw texts; therefore, a positive or negative (public) judgment also represents a valuable resource in this field.

The hypothesis is that the Predicates under examination could be distinguished into three subsets: the ones indicating a mental state

[11] Among the verbs pointed out by Ruwet (1994) there are abuser, aigrir, anoblir, aplatir, avilir, berner, civiliser, compromettre, consacrer, corrompre, couler, dédouaner, démasquer, dénaturer, etc.


that select the roles experiencer and causer (amare “to love”, odiare “to hate”); the ones connected to the manifestation of judgments, which (following the Sentiment Analysis trend in terminology) involve the roles of opinion holder and target (e.g. difendere “to defend”, condannare “to condemn”); and the ones related to physical acts, which select the roles patient and agent (e.g. baciare “to kiss”, schiaffeggiare “to slap”) [12]. The mapping between the syntactic representations of actants and the semantic representations of arguments is feasible and valuable but, from a LG point of view, becomes a little more complex if compared with the generalizations described in the previous Paragraph. The main problem raised by the LG method is that the Syntax-Semantics mapping cannot ignore the idiosyncrasies of the lexicon.

In the following paragraphs, we will show the distribution of Psychological Predicates among the classes 41, 42, 43 and 43B (Section 3.3.3.3), and we will demonstrate that a large number of predicates evoking judgments and physical positive/negative acts, but also psychological states, are distributed among 28 other LG classes, each of which possesses its own definitional syntactic structure. The large number of structures into which these verbs can enter underlines the importance of a sentiment database that includes both lexicon-based semantic and syntactic classification parameters.

3.3.3.3 Psychological Predicates

Among the verbs chosen for our sentiment lexicon, a prominent position is occupied by the Predicates belonging to the Italian LG classes 41, 42, 43 and 43B, which constitute 47% of the total amount of verbs in SentIta (Table 3.11). From the lexical items contained in such LG classes, a list of 602 entries has been evaluated and hand-tagged with the same labels used to evaluate the adjectives.

[12] An experiment realized on these frames is reported in Section 5.3.


In the following Paragraphs, we will describe some of the peculiarities of the psych verbs from the classes 41, 42, 43 and 43B, questioning, in many cases, their psychological nature. We will then briefly examine the presence of subjective verbs in 28 other LG classes.

Psych Verb    Translation     LG Class   Score
angosciare    “to anguish”    41         -3
piacere       “to like”       42         +2
amare         “to love”       43         +3
biasimare     “to blame”      43B        -2

Table 3.9: Examples from the Psych Predicates dictionary

LG Class   Tot Class   Psych Verbs   Positive items   Negative items   Intensifiers   Downtoners   Percentage (%)
41         599         525           136              325              45             19           88
42         114         8             4                4                0              0            7
43         298         39            19               20               0              0            13
43B        142         32            11               21               0              0            23
Tot        1153        604           170              368              45             19           52

Table 3.10: Psych Predicates chosen for SentIta

LG Class 41: Ch F V N1

The verbs from this LG class have as definitional structure N0 V N1, in which:

• N0 is generally an infinitive or completive clause (direct or introduced by the phrase il fatto (Ch F + di) “the fact (that S + of)”) but, with few exceptions, it can also be a human, a concrete or an abstract noun;


SentIta Verbs   Tot Items   Positive items   Negative items   Intensifiers   Downtoners
Psych Verbs     604         170              368              45             19
Other Verbs     651         189              350              79             33
Tot             1255        359              718              124            52

Table 3.11: All the Semantic Predicates from SentIta

• N1 is a human noun (Num).

A great part of these verbs (436 entries), with homogeneous syntactic behavior, can be grouped together also because of a semantic homogeneity connected to their psychological nature (Elia 1984). Examples of this subset are reported below in their dictionary form:

addolorare,V+NEG+Sent+FLX=V3+DRV=RI+41
ingelosire,V+NEG+DEB+Sent+FLX=V201+41
perturbare,V+NEG+Sent+FLX=V3+41
rallegrare,V+POS+Sent+FLX=V3+DRV=RI+41
rincuorare,V+POS+DEB+Sent+FLX=V3+41
tormentare,V+NEG+FORTE+Sent+FLX=V3+DRV=RI+41 [13]

This Psychological subgroup, identified in the electronic dictionary through the tag +Sent, enters, exactly like the French verbs from the LG class 4, into a reverse transitive psych-verb structure, selecting an Experiencer as object N1 and a Stimulus as N0, as exemplified in (6).

[13] “To grieve, to make jealous, to upset, to cheer up, to reassure, to torment”


(6) [Sentiment [Stimulus Le sue parole] (addolorano[-2] + ingelosiscono[-1] + perturbano[-2] + rallegrano[+2] + rincuorano[+1] + tormentano[-3]) [Experiencer Maria]]
“His words (grieve + make jealous + upset + cheer up + reassure + torment) Maria”

The objection of Ruwet (1994) to the psychological semantic homogeneity of this class can also be applied to the Italian LG class 41. The extract of the dictionary reported below explains it better than words:

danneggiare,V+NEG+Op+FLX=V4+41
esautorare,V+NEG+Op+FLX=V3+41
glorificare,V+POS+Op+FORTE+FLX=V8+41
ledere,V+NEG+Op+FLX=V34+41
miticizzare,V+POS+Op+FLX=V3+41
smascherare,V+NEG+Op+DEB+FLX=V3+41 [14]

122 items from the class 41 have been selected and marked with the tag +Op to indicate, in accordance with Ruwet (1994), that they select an N1 that does not play the role of the Experiencer, but just represents the Target of a positive or negative judgment. Differently from other verbs also tagged with the +Op label (e.g. condannare “to condemn”), here N0 is not the Opinion Holder, but just the Cause that justifies the opinion conveyed by the verb (see example 7). The Opinion Holder is never expressed by the arguments of the predicates from the class 41, because this role can be played, in turn, by public opinion or by the people (Ruwet 1994).

[14] “To damage, to reduce or deprive of the authority, to glorify, to wrong, to make into heroes, to unmask”


(7) [Opinion [Cause Le sue parole] (danneggiano[-2] + esautorano[-2] + glorificano[+3] + ledono[-2] + miticizzano[+2] + smascherano[-2]) [Target Maria]]
“His words (damage + reduce or deprive of the authority + glorify + wrong + make into heroes + unmask) Maria”
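
The way a dictionary line carries both the semantic type and the LG class can be illustrated with a minimal parsing sketch. It assumes the usual NooJ layout, with a comma between lemma and category and `+` between features; the function name `parse_entry`, the returned dictionary and the `CLASS_41_ROLES` table are hypothetical, while the role labels are those used in examples (6) and (7).

```python
import re

# Role frames for class 41 as described in the text: +Sent verbs select a
# Stimulus (N0) and an Experiencer (N1); +Op verbs a Cause (N0) and a Target (N1).
CLASS_41_ROLES = {
    "Sent": {"N0": "Stimulus", "N1": "Experiencer"},
    "Op": {"N0": "Cause", "N1": "Target"},
}


def parse_entry(line):
    """Split 'lemma,CAT+TAG+KEY=VALUE+...' into its parts (assumed layout)."""
    lemma, _, info = line.strip().partition(",")
    category, *features = info.split("+")
    flags, props = [], {}
    for feature in features:
        key, sep, value = feature.partition("=")
        if sep:
            props.setdefault(key, []).append(value)
        else:
            flags.append(feature)
    # The LG class appears either as a bare flag ("41") or as "Class=43B".
    lg_class = next((f for f in flags if re.fullmatch(r"\d+[A-Z]*", f)), None)
    lg_class = props.get("Class", [lg_class])[0]
    sem_type = next((f for f in flags if f in ("Sent", "Op", "Phy")), None)
    return {"lemma": lemma, "category": category, "flags": flags,
            "props": props, "lg_class": lg_class, "type": sem_type}


entry = parse_entry("tormentare,V+NEG+FORTE+Sent+FLX=V3+DRV=RI+41")
roles = CLASS_41_ROLES.get(entry["type"]) if entry["lg_class"] == "41" else None
```

With this layout, the same lookup applied to an entry tagged +Op in class 41 would return the Cause and Target roles instead of Stimulus and Experiencer.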

Another subgroup of the verbs from this class, marked in the LG table as accepting the property V= uso concreto (“V= concrete use”), possesses also a concrete meaning. This is the case of abbattere “to demolish” (fig. “to discourage”), accendere “to light” (fig. “to fire up”), colpire “to hit” (fig. “to move”), etc.

Of course, the semantic properties of the figurative verbs are not shared with their concrete homographs.

The automatic disambiguation of these metaphors is sometimes very simple, because it can be based on the divergent properties of the different LG classes to which the concrete and figurative words belong (8, 9).

Concrete usage (Class 20NR)

(8) Carla accende il fiammifero (Elia 1984)
“Carla lights the match”

Figurative usage (Class 41)

(9) Parlare di rivoluzione accende Arturo (Elia 1984)
“Talking about revolution fires up Arturo”

As exemplified, the verb accendere also belongs to the class 20NR, in which it accepts neither a completive as N0 (Parlare di rivoluzione accende il fiammifero “Talking about revolution lights the match”), nor a human noun as N1 (Carla accende il fratello “Carla lights her brother”), nor a body part noun (Carla accende la gamba del fratello “Carla lights her brother’s leg”).
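
The disambiguation strategy just described can be sketched as a check of the argument types against the frames of the two classes. This is a simplified illustration: the coarse type labels ("human", "concrete", "clause", "abstract"), the frame table and the function name are assumptions, and only the restrictions explicitly mentioned in the text are encoded.

```python
# Coarse selection restrictions, simplified from the description of classes
# 20NR and 41. The type labels and the table itself are assumptions.
ACCENDERE_FRAMES = {
    "20NR": {"N0": {"human"}, "N1": {"concrete"}},
    "41": {"N0": {"clause", "human", "concrete", "abstract"}, "N1": {"human"}},
}


def compatible_classes(n0_type, n1_type, frames=ACCENDERE_FRAMES):
    """Return the LG classes whose frame accepts both argument types."""
    return [lg_class for lg_class, frame in frames.items()
            if n0_type in frame["N0"] and n1_type in frame["N1"]]
```

Under these assumptions, a human subject with a concrete object, as in (8), is compatible only with the class 20NR, while a clause subject with a human object, as in (9), is compatible only with the class 41.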


In other cases, the disambiguation proves more difficult, but it can be solved as well with the help of semantically annotated dictionaries. This happens in the examples below, in which the fact that this verb is marked in the class 20NR as accepting the property Prep N2strum makes it possible to automatically distinguish its concrete use (10) from the figurative one (11).

Concrete usage

(10) Carla accende il fuoco della (stufa + griglia + cucina)
“Carla lights the fire of the (stove + grill + cooker)”

Figurative usage

(11) Carla accende il fuoco del(la) (rivoluzione + cambiamento + passione)
“Carla ignites the fire of (revolution + change + passion)”

LG Class 42: Ch F V Prep N1

The verbs from the class 42 enter into an intransitive reverse psych-verb structure and are defined by the LG formula N0 V Prep N1, where:

• N0 is generally an infinitive or completive clause (direct or introduced by the phrase il fatto (Ch F + di) “the fact (that S + of)”), but in few cases it can also be a human (Num) or a non-restricted noun (Nnr);

• N1 is a human noun (Num), but it can also be a body part or an abstract noun of the kind discorso “discourse”, mente “mind”, memoria “memory” (Elia 1984).

Here we find a selection of Psychological Predicates so small that it can be reported in its entirety:


aggradare,V+POS+Sent+42+FLX=V403+Prep=a
arridere,V+POS+Sent+42+FLX=V34+Prep=a
dispiacere,V+NEG+Sent+42+FLX=V37+Prep=a
spiacere,V+NEG+Sent+42+FLX=V37+Prep=a
piacere,V+POS+Sent+42+FLX=V37+DRV=RI+Prep=a [15]

The importance of this LG class in SentIta does not derive from the (very small) number of entries it contains, but from the extremely high frequency of the verb piacere “to like”.

Here, the prepositions (Prep) have been associated directly in the dictionary, in order to avoid ambiguity with other verb usages.

The semantic annotation does not raise any problem, as can be deduced from example (12).

(12) [Sentiment [Stimulus (Il fatto che si fumi + Che si fumi + Fumare + Il fumare + Il fumo)] (aggrada[+2] + dispiace[-2] + piace[+2]) a [Experiencer Maria]]
“(The fact that people smoke + That people smoke + Smoking + The smoke) (pleases + regrets + is appreciated by) Maria”

The main difference from the class 41, in addition to its intransitivity, is the fact that in Italian the permuted form is more easily accepted (Elia et al. 1981; see example 13).

(13) [Sentiment A [Experiencer Maria] piace[+2] [Stimulus fumare]]
“Maria likes smoking”

Again, a subset of verbs from this class can be marked as opinion indicators.

[15] “To please, to favor, to regret, to like”


Both N0 and N1 can be human nouns:

incombere,V+NEG+Op+42+FLX=V21+Prep=loc+N0um+N1um
primeggiare,V+POS+Op+FLX=V4+Prep=su+N0um+N1um
contrastare,V+NEG+Op+FLX=V3+Prep=con+N0um+N1um
contare,V+POS+Op+FLX=V3+Prep=per+N0um+N1um

Just N1 can be a human noun:

addirsi,V+42+POS+Op+FLX=V263+PRX+Prep=a+N1um
sconvenire,V+NEG+Op+FLX=V236+Prep=a+N1um
convenire,V+POS+Op+FLX=V236+Prep=a+N1um
costare,V+NEG+Op+FLX=V3+Prep=a+N1um

Neither N0 nor N1 can be human nouns:

degenerare,V+NEG+FORTE+Op+42+FLX=V3+DRV=BILE:N79+Prep=in
stridere,V+NEG+Op+FLX=V2+Prep=con [16]

Not every verb of this subset accepts a human noun as N1 (see examples 14 and 15).

(14) [Opinion [Target Queste idee] stridono[-2] con (i miei obiettivi + la tua immagine + *Maria)]
“These ideas are at odds with (my purposes + your image + Maria)”

(15) [Opinion [Target La situazione] degenera[-3] in un (dramma + conflitto + *Maria)]
“The situation degenerates into a (drama + conflict + Maria)”

In such cases, the Opinion Holder remains implicit, but the judgment negatively affects the subject of the sentence instead of the N1.

Sentences like (16) and (17) raise the doubt of whether N1 can be the Holder of the opinion expressed by the sentence or not.

[16] “To loom over, to excel, to hinder, to matter, to suit, to be inconvenient, to be convenient, to cost, to degenerate, to clash”


(16) [Opinion [Target (Maria + Il fidanzamento + Che si sposti la riunione)] conta[+2] per [Opinion Holder Arturo]]
“(Maria + The engagement + That the meeting is moved) matters to Arturo”

(17) [Opinion [Target (*Maria + Il fidanzamento + Che si sposti la riunione)] (conviene[+2] + costa[-2]) [Opinion Holder ad Arturo]]
“(Maria + The engagement + That the meeting is moved) (is convenient + costs) for Arturo”

Actually, the verb does not provide any information regarding the active or passive role of N1 in the expression of the opinion. The writer can be convinced about what “matters, is convenient or costs” for Arturo, but it remains a belief that is not confirmed in the text (16, 17). Therefore, we chose to assign the Opinion Holder role to the N1 of the class 42 only in the case in which the object corresponds to the personal pronouns “me” or “us” (18, 19).

(18) [Opinion [Target (Maria + Il fidanzamento + Che si sposti la riunione)] conta[+2] per [Opinion Holder me]]
“(Maria + The engagement + That the meeting is moved) matters to me”

(19) [Opinion [Target (*Maria + Il fidanzamento + Che si sposti la riunione)] [Opinion Holder ci] (conviene[+2] + costa[-2])]
“(Maria + The engagement + That the meeting is moved) (is convenient + costs) for us”
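
The annotation choice made for the N1 of the class 42 can be summarized in a short sketch. The function name and the set of pronoun forms are hypothetical: “me” and “ci” occur in examples (18) and (19), while “mi” and “noi” are added here only as an assumption about other first-person forms.

```python
# Hypothetical list of first-person forms: "me" and "ci" occur in examples
# (18) and (19); "mi" and "noi" are added here as an assumption.
FIRST_PERSON_FORMS = {"me", "mi", "noi", "ci"}


def n1_role_class_42(semantic_type, n1_form):
    """Return the semantic role assigned to N1 for a class 42 verb, or None
    when no role is assigned (opinion verbs with a non-first-person N1)."""
    if semantic_type == "Sent":
        return "Experiencer"
    if semantic_type == "Op" and n1_form.lower() in FIRST_PERSON_FORMS:
        return "Opinion Holder"
    return None
```

In this sketch, conta per me yields an Opinion Holder, whereas conta per Arturo leaves the role unassigned, since the text does not confirm that Arturo holds the opinion.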


LG Classes 43 and 43B: N0 V Ch F and N0 V il fatto Ch F

The sentence structure that defines these classes is N0 V N1, where:

• N0, in both the classes 43 and 43B, is generally a human noun (Num);

• N1 is an infinitive clause or a completive one, which is direct for the 43 and introduced by il fatto (Ch F + di) “the fact (that S + of)” for the 43B.

The psychological (Sent), but also the opinion (Op), verbs from the class 43 share, in many cases, the acceptance or the refusal of a set of LG properties:

deprecare,V+Op+NEG+Class=43+FLX=V8
bestemmiare,V+Op+NEG+FORTE+Class=43+FLX=V11
criticare,V+Op+NEG+Class=43+FLX=V8
maledire,V+Op+NEG+FORTE+Class=43+FLX=V216
tollerare,V+Op+NEG+DEB+Class=43+FLX=V3
adorare,V+Sent+POS+FORTE+Class=43+FLX=V3
amare,V+Sent+POS+FORTE+Class=43+FLX=V3
ammirare,V+Sent+POS+Class=43+FLX=V3
odiare,V+Sent+NEG+FORTE+Class=43+FLX=V12
rimpiangere,V+Sent+NEG+DEB+Class=43+FLX=V31 [17]

As regards the distribution of N0, it must be noticed that all these verbs accept a human noun, but only two of them (condannare “to condemn” and meritare “to deserve”) also accept a non-restricted noun (Nnr, not active) or a subjective clause introduced by il fatto (Ch F + di). This fact is perfectly consistent with the animate nature of the Experiencer with respect to the Psychological verbs (20). However, it underlines that this class of verbs also selects an Opinion Holder as N0 (except for the verb meritare).

[17] “To deprecate, to blaspheme, to criticize, to curse, to tolerate, to adore, to love, to admire, to hate”


(20) [Sentiment [Experiencer Maria] (adora[+3] + odia[-3]) [Stimulus (Arturo + che Arturo fumi + il fatto che Arturo fumi + il fumo)]]
“Maria (adores + hates) (Arturo + that Arturo smokes + the fact that Arturo smokes + the smoke)”

(21) [Opinion [Opinion Holder Maria] (critica[-2] + tollera[-1]) [Target (Arturo + che Arturo fumi + il fatto che Arturo fumi + il fumo)]]
“Maria (criticizes + tolerates) (Arturo + that Arturo smokes + the fact that Arturo smokes + the smoke)”

As far as the distribution of N1 is concerned, all the psych and opinion items accept the pronoun ciò “this” and the Ppv lo “it/him/her” (apart from inveire “to inveigh”). The only verbs that refuse Num as complement are auspicare “to hope for”, inveire “to inveigh”, lamentare “to lament”, malignare “to speak/think ill of”, meritare “to deserve”, agognare “to yearn for”, anelare “to long for”, delirare “to talk nonsense” and gustare “to taste”. The verbs that instead do not accept a N-um are fewer: inveire and delirare.

The property N1 dal fatto Ch F is refused by every verb except for apprezzare “to admire”, which, together with sopportare “to suffer”, desiderare “to desire” and bramare “to crave”, also accepts the property N1 da N2. Other properties refused in bulk are Ch N0 V essere Comp (save malignare, sospettare “to suspect”, temere “to be afraid of”), N1 Agg (save sospettare, temere), N1 = Agg Ch F (save sospettare), N0 V essere Cong (save sospettare and malignare), N1 = se F o se F (save benedire “to bless”, apprezzare and gustare), Ch F a N2um (save approvare “to approve”).

Similar observations, but with an even higher homogeneity, can be made on the class 43B which, differently from the 43, has the N1 always introduced by il fatto (Ch F + di).

The Psychological (Sent) and the opinionated (Op) verbs from the class 43B sometimes share the acceptance or the refusal of a set of LG properties:

attaccare,V+Op+NEG+Class=43B+FLX=V8
biasimare,V+Op+NEG+Class=43B+FLX=V3
lodare,V+Op+POS+Class=43B+FLX=V3
osannare,V+Op+POS+FORTE+Class=43B+FLX=V3
snobbare,V+Op+NEG+Class=43B+FLX=V3
assaporare,V+Sent+POS+Class=43B+FLX=V3
pregustare,V+Sent+POS+Class=43B+FLX=V3
schifare,V+Sent+NEG+FORTE+Class=43B+FLX=V3
sdegnare,V+Sent+NEG+FORTE+Class=43B+FLX=V16
spregiare,V+Sent+NEG+FORTE+Class=43B+FLX=V4 [18]

Here again, an N0 different from a human noun is accepted by just three verbs (banalizzare “to trivialize”, demitizzare “to unidealize” and pregiudicare “to compromise”).

The pronoun ciò “this”, the Ppv lo “it/him/her” and a non-human noun are accepted by all the Sent and Op verbs in complement position, while the N1um is refused by 2 psychological and 5 opinionated verbs.

As regards the properties refused by almost all the verbs, we count N1 dal fatto Ch F, N1 da N2 (save difendere “to defend”) and Ch F a N2um (save apologizzare “to make an apology for” and vantare “to praise”).

Differently from the class 43, in which the passive transformation is not accepted in 3 cases, in the class 43B it is accepted by all the items.

[18] “To attack, to blame, to praise, to snub, to taste, to foretaste, to be disgusted by, to be indignant, to despise”


3.3.3.4 Other Verbs of Sentiment

We stated in Paragraph 3.3.3.2 that almost half of the SentIta verbs do not coincide with what is defined in literature as “psych verb”. Moreover, a lot of psych verbs and other opinion-bearing items can be found in LG classes different from the ones classically identified as containing psychological predicates, namely 41, 42, 43 and 43B (Elia 2014a, 2013). In detail, a large number of verbs that can be used as sentiment and opinion indicators is classified over 28 other LG classes, described by as many definitional structures. This fact could be interpreted as the failure of the syntactic-semantic mapping hypothesis but, actually, within the LG framework, the semantic entropy of the lexicon can be decreased through the identification of classes of words that share sets of lexical-syntactic behaviors.

The main Lexicon-Grammar verbal subclasses are those mentioned below:

Intransitive verbs: LG classes from 1 to 15, on which LG researchers tested the acceptance or the refusal of 545 combinatorial and distributional properties through 23827 elementary sentences;

Transitive verbs: LG classes from 16 to 40, tested on 361 properties among 19453 sentences;

Complement clause verbs (transitive and intransitive): LG classes from 41 to 60, tested on 443 properties and 57242 sentences (Elia 2014a).

Table 3.12 shows the percentage values of opinionated verbs in each one of the Lexicon-Grammar verbal subclasses mentioned above. As we can see, the complement clause group contains the largest number of oriented items. When inserted in our electronic dictionaries, the lemmas from these LG classes are enriched with semantic information concerning the Types (Sent: sentiments, emotions, psychological states; Op: opinions, judgments, cognitive states; Phy: physical acts), the Orientations (POS, NEG) and the Intensities (STRONG, WEAK) of the described lemmas.

Similarly to SentiLex (Silva et al. 2010, 2012), SentIta is provided with the specification of the argument(s) Ni semantically affected by the orientation of the verbs.

The presence of oriented verbs and the distribution of semantic tags are described in Tables 3.13 and 3.14, respectively. For a complete presentation of LG verbal classes, their composition and their granularity, see Elia (2014a).

LG Class                   Verb usages   Oriented verbs   Percentages (%)
Intransitive verbs         401           138              21
Transitive verbs           605           205              31
Complement clause verbs    760           308              47
Tot                        1766          651              100

Table 3.12: Polar verbs in the main Lexicon-Grammar verbal subclasses

3.3.4 Psychological Predicates’ Nominalizations

The nominal entities of SentIta that will be described in this Section are all predicative forms, due to their correlation with the verbal predicates described above, e.g. angosciare “to anguish”, angoscia “anguish” (D’Agostino 2005).

The complex phenomenon related to the combinatorial constraints between sentiment V-n and specific verbs in simple sentences has already been explored by Balibar-Mrabti (1995) and Gross (1995) for the French language. In this regard, Gross (1995) specified that “les


Main LG Class             LG Class   Verb usage   Oriented verbs   Percentages (%)
Intransitive Verbs        2          122          34               28
                          2A         40           9                23
                          4          31           12               39
                          9          72           35               49
                          10         56           19               34
                          11         80           29               36
Transitive Verbs          18         40           8                20
                          20I        25           1                4
                          20NR       83           15               18
                          20UM       145          65               45
                          21         29           2                7
                          21A        65           44               68
                          22         63           28               44
                          23R        34           13               38
                          24         72           18               25
                          27         37           10               27
                          28ST       12           1                8
Complement Clause Verbs   44         33           18               55
                          44B        44           20               45
                          45         31           23               74
                          45B        21           16               76
                          47         313          120              38
                          47B        34           12               35
                          49         96           12               13
                          50         51           41               80
                          51         36           19               53
                          53         28           9                32
                          56         73           18               25
Tot                       -          1766         651              37

Table 3.13: Polar verbs in other LG verb classes which contain at least one oriented item


Main LG Class             LG Class   Oriented Verbs   Sent   Op    POS   NEG   FORTE   DEB
Intransitive Verbs        2          34               7      24    8     26    0       0
                          2A         9                4      2     1     8     0       0
                          4          12               0      11    5     7     0       0
                          9          35               10     25    11    24    0       0
                          10         19               4      15    9     10    0       0
                          11         29               25     4     0     29    0       0
Transitive Verbs          18         8                0      0     4     4     0       0
                          20I        1                0      0     1     0     0       0
                          20NR       15               5      8     4     11    0       0
                          20UM       65               16     25    11    54    0       0
                          21         2                0      2     0     2     0       0
                          21A        44               9      14    17    6     11      10
                          22         28               0      28    21    7     0       0
                          23R        13               0      0     4     9     0       0
                          24         18               0      0     0     1     15      2
                          27         10               0      10    10    0     0       0
                          28ST       1                0      0     0     1     0       0
Complement Clause Verbs   44         18               0      18    16    2     0       0
                          44B        20               2      18    14    6     0       0
                          45         23               6      16    7     15    1       0
                          45B        16               5      10    7     9     0       0
                          47         120              0      120   5     54    43      18
                          47B        12               0      34    4     8     0       0
                          49         12               0      12    4     8     0       0
                          50         41               4      33    15    22    4       0
                          51         19               1      14    10    9     0       0
                          53         9                0      9     0     9     0       0
                          56         18               1      9     1     9     5       3
tot                       -          651              99     461   189   350   79      33

Table 3.14: Distribution of semantic labels into the LG verb classes that contain at least one oriented item


restrictions de noms de sentiment à certains verbes sont nombreuses et tellement inattendues que, dans un premier temps, nous les avions traitées comme des expressions figées, c’est-à-dire comme de simples listes” [19].

As regards Italian works in the LG framework, we mention the works of D’Agostino (2005), D’Agostino et al. (2007) and Guglielmo (2009), which respectively explored the lexical micro-classes of the sentiments ira “anger”, angoscia “anguish” and vanità “vanity”, studying their nominal manifestation in support verb constructions.

Differently from ordinary verbs (discussed here in Section 3.3.3), support verbs do not carry any semantic information: this role is played here just by the sentiment nouns, although support verbs can modulate it to a certain extent.

In the present research, the Nominalizations of Psychological verbs have been used to manually start the formalization of the SentIta dictionary dedicated to nouns. It comprises 1000+ entries and includes a list of 80+ human nouns, evaluated and added to this dictionary in order to use them in sentences like N0um essere un N1sent (e.g. Max è un attaccabrighe “Max is a cantankerous person”).

Table 3.15 shows one example of Psych V-n for each LG class taken into account during the annotation of Psychological Predicates.

A sample of the entries from this dictionary is given below The LG

Psych verb V-n V-n Translation LG Class Scoreangosciare angoscia ldquoanguish 41 -3piacere piacere ldquopleasure 42 +2amare amore ldquolove 43 +3biasimare biasimo ldquoblame 43B -2

Table 315 Description of the Nominalizations of the Psychological Semantic Pred-icates

class is associated to a given item by the property ldquoClass that let

19ldquoThe restrictions of nouns of sentiment to certain verbs are numerous and so unexpected that at firstwe treated them as frozen expressions that is to say as mere listsrdquo Authorrsquos translation

33 SENTITA AND ITS MANUALLY-BUILT RESOURCES 83

the grammar recognize the proper syntactic structures in which thenominalizations occur in real texts

LG Class 41
bellezza,N+POS+Class=41+DASOLO+FLX=N41
calore,N+POS+Class=41+FLX=N5
contentezza,N+POS+Class=41+DASOLO+FLX=N41
dolcezza,N+POS+Class=41+DASOLO+FLX=N41
fascino,N+POS+Class=41+DASOLO+FLX=N5 ²⁰

LG Class 42
dispiacere,N+FLX=N5+42+NEG+DASOLO
dispiacevolezza,N+FLX=N41+42+NEG+DASOLO
dispiacimento,N+FLX=N5+42+NEG+DEB+DASOLO
piacere,N+FLX=N5+42+POS+DASOLO
piacevolezza,N+FLX=N41+42+POS+DASOLO ²¹

LG Class 43
approvazione,N+FLX=N46+43+POS+DASOLO
benedizione,N+FLX=N46+43+POS+DASOLO
bestemmia,N+FLX=N41+43+NEG+DASOLO
bramosia,N+FLX=N41+43+NEG+DEB+DASOLO
condanna,N+FLX=N41+43+NEG+DASOLO ²²

LG Class 43B
denigrazione,N+FLX=N46+43B+NEG+FORTE+DASOLO
derisione,N+FLX=N46+43B+NEG+DASOLO
dileggio,N+FLX=N12+43B+NEG+DASOLO
disdegno,N+FLX=N5+43B+NEG+FORTE+DASOLO
disprezzo,N+FLX=N5+43B+NEG+FORTE+DASOLO ²³
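The entries sampled above can be read mechanically. The following is a minimal sketch, assuming the layout `lemma,CATEGORY+TAG+KEY=VALUE` (the comma separator is an assumption about the dictionary format); the names `Entry`, `parse_entry` and `lg_class` are hypothetical helpers and are not part of Nooj or SentIta.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    lemma: str
    category: str
    flags: set = field(default_factory=set)     # bare tags, e.g. POS, DASOLO, 43B
    values: dict = field(default_factory=dict)  # key=value tags, e.g. FLX=N41


def parse_entry(line: str) -> Entry:
    """Parse one 'lemma,CAT+TAG+KEY=VALUE' line (comma separator assumed)."""
    lemma, _, info = line.strip().partition(",")
    category, *tags = info.split("+")
    entry = Entry(lemma=lemma, category=category)
    for tag in tags:
        key, sep, value = tag.partition("=")
        if sep:
            entry.values[key] = value
        else:
            entry.flags.add(tag)
    return entry


def lg_class(entry: Entry):
    """Return the LG class, written either as Class=41 or as a bare 42/43/43B tag."""
    if "Class" in entry.values:
        return entry.values["Class"]
    found = entry.flags & {"41", "42", "43", "43B"}
    return next(iter(found)) if found else None
```

The helper `lg_class` accepts both notations because the sample writes the class as `Class=41` for some entries and as a bare tag (`42`, `43`, `43B`) for others.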

Table 3.16 presents the details about the distribution of sentiment tags in the Psych V-n dictionary. The size of the nouns dictionary is larger than that of the verbs. This happens because the same predicate can possess more than one nominalization (e.g. the verb amare, from the LG class 43, is related to the V-n amabilità, amore, amorevolezza).

20 “Beauty, warmth, happiness, kindness, charm”
21 “Sorrow, displeasure, regret, pleasure, pleasantness”
22 “Approval, blessing, curse, yearning, condemnation”
23 “Detraction, derision, mockery, disdain, despise”

LG Class   Nouns   Positive Items   Negative Items   Intensifiers   Downtoners   Percentages
41           856        245              542              41            28            77
42            11          5                6               0             0             1
43           170         57              113               0             0            15
43B           74         21               53               0             0             7
Tot         1111        328              714              41            28           100
%            100         30               64               4             3             -

Table 3.16 Nominalizations of the Psych Predicates chosen from SentIta

3.3.4.1 The Presence of Support Verbs

To include V-n in a sentiment lexicon is particularly risky. In many cases it is misleading to use them out of context, because they often have a specific SO only if used with particular support verbs. Among the Vsup considered, we mention the following, with their equivalents (D’Agostino, 2005; D’Agostino et al., 2007):

• essere in “to be in”, patire di, soffrire di “to suffer from”, stare in “to be in”, andare in “to go in”;

• avere “to have”, avvertire “to perceive”, patire, soffrire “to suffer”, tenere “to have”, sentire, provare “to feel”, subire “to undergo”, nutrire, covare “to harbor”, provare un (senso + sentimento) di “to feel a (sense + sentiment) of”;

• causare “to cause”, dare “to give”, suscitare “to raise”, provocare “to provoke”, creare “to create”, alimentare “to instigate”, stimolare “to inspire”, sviluppare “to develop”, incutere “to instill”, sollecitare “to urge”, donare, regalare, offrire “to offer”.

Sometimes it is even possible to have an SO switch by changing the word context. For example, the noun pietà (“pity” and “compassion”) is strongly negative in a fixed expression with the support verb fare (22), but it becomes positive when it co-occurs with the sentiment expression of (23).

(22) È proprio la sceneggiatura che fa pietà [-3]
“It is really the script that is pitiful”

(23) Mary prova un sentimento di pietà [+2]
“Mary feels a sentiment of compassion”

Systematically evaluating, through LG tables, the relationships between each psych noun and each support verb that can be combined with it leads to an important conclusion: the fact that support verb variants can influence the polarity of the V-n with which they occur is not an exception. For example, extensions of Vsup like subire “to undergo”, covare “to harbor”, incutere “to instill” bring a negative orientation with them; donare, regalare, offrire “to offer” are positively connoted; perdere “to lose”, abbandonare “to abandon”, lasciare “to leave” cause a polarity switch whatever the orientation of the psych noun.
Support verbs like the ones cited above cannot simply be included in a Vsup list, but must be inserted in specific paths of a noun grammar able to correctly manage their influence on V-n.
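The influence of the support verb variants on the V-n can be illustrated with a small Python sketch. It is only an approximation under stated assumptions: the verb groups come from the paragraph above, while the composition rules (a switch verb negates the noun score, a negatively or positively connoted verb forces the sign) and the names `VSUP_EFFECT` and `vn_polarity` are hypothetical and do not reproduce the actual Nooj grammar.

```python
# Verb groups taken from the text; every other Vsup is treated as neutral.
VSUP_EFFECT = {
    "subire": "negative", "covare": "negative", "incutere": "negative",
    "donare": "positive", "regalare": "positive", "offrire": "positive",
    "perdere": "switch", "abbandonare": "switch", "lasciare": "switch",
}


def vn_polarity(noun_score: int, vsup: str) -> int:
    """Combine the prior score of a psych noun with the effect of its support verb.

    Assumed rules: 'switch' negates the score, 'negative' forces a negative
    sign, 'positive' forces a positive sign, anything else keeps the score.
    """
    effect = VSUP_EFFECT.get(vsup, "neutral")
    if effect == "switch":
        return -noun_score
    if effect == "negative":
        return -abs(noun_score)
    if effect == "positive":
        return abs(noun_score)
    return noun_score
```

Keeping the verbs in a lookup table rather than in a flat Vsup list mirrors the point made above: each group needs its own path, because each one acts differently on the noun.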

3.3.4.2 The Absence of Support Verbs

Another problem arises when a psych predicative noun from our collection presents different semantic behaviors when occurring without support verbs in real texts. This is the case of calore “warmth”, from the verb riscaldare “to warm up” of the LG class 41, which possesses its psychological figurative meaning only together with specific structures (24), but assumes a physical interpretation when it occurs alone (26). Different is the case of contentezza “happiness”, which preserves its psychological meaning in both cases (25, 27).

(24) Le tue parole invadono[+] Maria di calore[+2] [+3]
“Your words warm up Mary”

(25) Le tue parole invadono[+] Maria di contentezza[+2] [+3]
“Your words overwhelm Mary with happiness”

(26) Che calore[+2] [0]
“What heat”

(27) Che contentezza[+2] [+2]
“What a joy”

We solved the problem in a simple way: the dictionaries distinguish, and the grammars recall, the cases in which the nouns can occur alone from the cases in which a specific Vsup context is required to have a particular SO, by means of the special tag +DASOLO “alone”.
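The role of the +DASOLO tag can be sketched in a few lines of Python. This is an illustration only: the function name `noun_score_in_context` and the boolean `has_vsup_context` are hypothetical, and the actual check is performed by the Nooj grammars, not by code of this kind.

```python
def noun_score_in_context(tags: set, prior_score: int, has_vsup_context: bool) -> int:
    """Return the score a psych noun contributes in a given occurrence.

    The prior score is kept when the noun occurs inside one of the expected
    verbal structures, or when it carries the DASOLO tag; otherwise the noun
    is treated as neutral (score 0).
    """
    if has_vsup_context or "DASOLO" in tags:
        return prior_score
    return 0


# Tags as listed in the sample entries of the nouns dictionary.
CALORE = {"POS", "Class=41", "FLX=N5"}
CONTENTEZZA = {"POS", "Class=41", "DASOLO", "FLX=N41"}
```

With these tags, calore is scored only inside a structure such as (24), while contentezza keeps its +2 also in the verbless exclamation (27).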

3.3.4.3 Ordinary Verbs Structures

In order to examine more deeply how the presence of different verbs can affect the polarity of the Psychological V-n, we translated the list of 61 verbs of Balibar-Mrabti (1995) from French and annotated them with semantic information. The sentence structure for which they were conceived is N0sent V N1um (e.g. una gioia intensa riempie Max “an intense joy fills Max”), in which the subject N0 plays the role of Nsent. They were then tested in the structure N0 V N1um di N2sent (e.g. le tue parole riempiono Max di una gioia intensa “your words fill Max with an intense joy”), which in turn is related to the passive Max è pieno di una gioia intensa “Max is full of an intense joy”.
The resulting list of Italian verbs contains 47 entries, because of the exclusion of the duplicates from the LG class 41²⁴. Among them, 11 give the Nsent a negative connotation (e.g. abbattere “to lose heart”, offuscare “to confuse”, corrodere “to corrode”); 6 confer on it a positive orientation (e.g. risplendere “to shine”, cullare “to cradle”, illuminare “to light up”); 18 intensify its polarity (e.g. travolgere “to overwhelm”, attraversare “to undergo”, riempire “to fill”); and 12 simply possess a neutral orientation.
Their role can become crucial in differently oriented sentences that make use of the same Nsent, like (28a, b, c):

(28a) L’amore[+2] tormenta[-3] Maria [-3]
(28b) L’amore[+2] illumina[+2] Maria [+2]
(28c) L’amore[+2] travolge[+] Maria [+3]
“Love (torments + lights up + overwhelms) Maria”

Therefore, for the sake of simplicity, we decided to put them into an FSA that assigns as sentence polarity the score of the verbs with the tags NEG/POS (28a, b) and that uses the verbs with the tag FORTE as intensifiers (28c). A simplified version of this grammar is displayed in Figure 3.1. Here the paths marked with the number (1) represent the negative rule (28a), the paths (2) represent the positive rule (28b), and the paths marked with (3) exemplify the intensification rule (28c).
In this dictionary/grammar pair we did not mark the verbs with special tags; consequently, other Psychological Predicates endowed with polarity and intensity tags can be matched in the corresponding paths.

24 The verbs from this class are all compatible with this structure, because they accept an abstract noun in the subject position and always require a human noun in position N1.
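The three rules exemplified by (28a, b, c) can be paraphrased with the following Python sketch. It assumes that scores lie on the -3 to +3 scale used in the examples and that an intensifier raises the magnitude of the noun score by one step; the function names are hypothetical and the code is not a transcription of the FSA of Figure 3.1.

```python
def intensify(score: int) -> int:
    """Push a non-zero score one step away from zero, within the -3..+3 scale."""
    if score == 0:
        return 0
    step = 1 if score > 0 else -1
    return max(-3, min(3, score + step))


def sentence_score(noun_score: int, verb_tags: set, verb_score: int = 0) -> int:
    """Score of an 'Nsent V N1um' sentence from the noun and verb annotations.

    Rules (1) and (2): a verb tagged NEG or POS imposes its own score.
    Rule (3): a verb tagged only FORTE intensifies the score of the noun.
    Neutral verbs leave the noun score untouched.
    """
    if "NEG" in verb_tags or "POS" in verb_tags:
        return verb_score
    if "FORTE" in verb_tags:
        return intensify(noun_score)
    return noun_score
```

For instance, `sentence_score(2, {"FORTE"})` reproduces the [+3] of (28c), while `sentence_score(2, {"NEG", "FORTE"}, -3)` reproduces the [-3] of (28a).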

88 CHAPTER 3 SENTITA LEXICON-GRAMMAR BASED SENTIMENT RESOURCES

Figu

re31F

SAfo

rth

ein

teraction

betw

eenN

sentan

dsem

antically

ann

otated

verbs

33 SENTITA AND ITS MANUALLY-BUILT RESOURCES 89

3.3.4.4 Standard-Crossed Structures

To conclude the work on sentiment V-n, we mention the case of the structures of the type “standard” (N0sent V Loc N1um) and “crossed” (N0um V di N1sent), exemplified below (D’Agostino, 2005; Elia et al., 1981):

(29) l’amore[+2] brilla[+] negli occhi di Maria [+3]
“love shines in Maria’s eyes”

(30) gli occhi di Maria brillano[+] d’amore[+2] [+3]
“Maria’s eyes shine with love”

To formalize this group of sentiment expressions we made use of 12 verbs from the LG class 12, interpreted in a figurative way, in which the N1um “standard” position and the N0um “crossed” position are designated by their body parts Npc through a metonymic process (Elia et al., 1981).
As happens in (29) and (30), such verbs can work as intensifiers (e.g. bruciare “to burn”) or can also have a negative polarity influence (only in the case of dolere “to hurt”).

3.3.5 Psychological Predicates’ Adjectivalizations

Paragraph 3.3.2 described the manually-built collection of opinion-bearing adjectives, which counts more than 5000 items.
In this list, a subset of items in morpho-phonological relation with our predicates from the classes 41, 42, 43 and 43B has been automatically associated with the verbs of origin and with their respective LG classes. The result of this work is an electronic dictionary of 757 entries, realized with a morphological grammar of Nooj (see Figure 3.2). It is displayed in the sample below.

angosciante,A+FLX=N79+DRV=ISSIMO:N88+NEG+Va+V=angosciare+Class=41
angosciato,A+FLX=N88+DRV=ISSIMO:N88+NEG+Va+V=angosciare+Class=41
angoscioso,A+FLX=N88+DRV=ISSIMO:N88+NEG+Va+V=angosciare+Class=41
piacente,A+FLX=N79+DRV=ISSIMO:N88+POS+Va+V=piacere+Class=42
piacevole,A+FLX=N79+DRV=ISSIMO:N88+POS+Va+V=piacere+Class=42
amato,A+FLX=N88+DRV=ISSIMO:N88+POS+Va+V=amare+Class=43
amorevole,A+FLX=N79+DRV=ISSIMO:N88+POS+Va+V=amare+Class=43
amoroso,A+FLX=N88+DRV=ISSIMO:N88+POS+Va+V=amare+Class=43
biasimevole,A+FLX=N79+DRV=ISSIMO:N88+NEG+Va+V=biasimare+Class=43B ²⁵

Table 3.17 gives information about the distribution of its semantic tags, while Table 3.18 compares the sentiment selection of V-a with the whole dictionary of sentiment adjectives of SentIta in which it is contained.

Tag          41    42    43    43B   Tot
POS+FORTE     8     0     0     9     17
POS         133     3    35    11    182
POS+DEB      14     0     0     0     14
NEG+DEB      66     0     5     1     72
NEG         307     2    24    25    358
NEG+FORTE    49     0     6     2     57
FORTE        46     1     0     0     47
DEB           9     0     0     1     10
Tot         632     6    70    49    757

Table 3.17 Distribution of semantic tags in the dictionary of Adjectival Psych Predicates
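The aggregate figures of Table 3.18 can be re-derived from the row totals of Table 3.17 with a short Python check. The counts below are copied from the two tables (5381 is the size of the SentIta adjective dictionary reported in Table 3.18); the names `aggregate` and `percent` are hypothetical helpers introduced only for this illustration.

```python
# Row totals of Table 3.17 (last column).
TABLE_3_17_TOTALS = {
    "POS+FORTE": 17, "POS": 182, "POS+DEB": 14,
    "NEG+DEB": 72, "NEG": 358, "NEG+FORTE": 57,
    "FORTE": 47, "DEB": 10,
}
SENTITA_ADJECTIVES = 5381  # "Tot SentIta" in Table 3.18


def aggregate(totals: dict) -> dict:
    """Group the tag counts into positive, negative and intensity-only items."""
    groups = {"pos": 0, "neg": 0, "int": 0}
    for tag, count in totals.items():
        if tag.startswith("POS"):
            groups["pos"] += count
        elif tag.startswith("NEG"):
            groups["neg"] += count
        else:  # bare FORTE or DEB: intensifiers and downtoners
            groups["int"] += count
    return groups


def percent(part: int, whole: int) -> int:
    """Percentage rounded to the nearest integer."""
    return round(100 * part / whole)
```

Summing the three groups gives back the 757 entries of the dictionary, and dividing each group by 757 or by 5381 yields the two percentage columns of Table 3.18.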

Adjectives     Entries   Percentages (%)   Percentages (%)
                         on Adj Psych      on SentIta
Tot pos          213          28                4
Tot neg          487          64                9
Tot int           57           8                1
Tot Psych        757         100               14
Tot SentIta     5381           -              100

Table 3.18 Percentage values of Psych Adjectivalizations in SentIta

The grammar resembles the ones used to derive the orientation of adverbs in -mente and quality nouns from the semantic information listed in the sentiment adjectives, which will be treated in Sections 3.5.2 and 3.5.3 respectively. The process is similar: we copy and paste the dictionary that has to be semantically enriched into a Nooj text and we annotate it through the instructions of the grammar. We then export the annotations and compile a brand new dictionary on their basis.

25 “Anguishing, anguished, anguishous, attractive, pleasing, beloved, loving, amorous, blameworthy”

Because the operations involve just the two dictionaries, we exclude any other lexical (.nod) and morphological (.nom) resource from the Nooj Preferences before applying the grammar to the dictionary-text.
The grammar recognizes each verbal stem, checks whether it belongs to the LG class of interest, and then adds the adjectival suffixes listed in a .dic file (see Table 3.19). The difference from the grammars for the quick population of the adverbs in -mente and the quality nouns is that here the morphological strategy is not exploited to discover the prior polarity of the words, since the adjective dictionary was already tagged with sentiment information. In fact, in the final annotation the adjectives preserve all their semantic ($2S), inflectional and derivational ($2F) properties. The new information they acquire regards their nature of V-a (+Va), the connection with the psychological verb (+V=$1L) and the relative LG class of the predicate (+Class=41).
Regarding the adjectival suffixes for the deverbal derivation, we examined the following ones (Ricca, 2004a):

• morphemes for the formation of adjectives (-bile, -(t)orio, -evole, -(t)ivo, -(t)ore, -bondo, -ereccio, -iccio, -ido, -ile, -ndo, -tario, -ulo);

• morphemes for the formation of adjectives that coincide with the verb’s past or present participle (-to, -nte);

• not typically deverbal morphemes (-oso, -istico, -(t)ico, -aneo, -ardo, -ale, -ario);

• not typically adjectival morphemes (-(t)ore, -trice, -ista, -one).

Figure 3.2 Extract of the morphological FSA that associates psych verbs to their adjectivalizations

SFX               Adjectives
-t-o                 309
-nt-e                159
-(t)or-e             100
-(t)iv-o              38
-os-o                 38
-(t)ori-o             35
-evol-e               26
-o (conversion)       20
-(t)ic-o               8
-on-e                  6
-ist-a                 4
-istic-o               3
-al-e                  3
-il-e                  2
-bil-e                 1
-bond-o                1
-nd-o                  1
-tric-e                1
-icci-o                1
-ari-o                 1
-erecci-o              0
-id-o                  0
-tari-o                0
-ul-o                  0
-ane-o                 0
-ard-o                 0

Table 3.19 Suffixes from the dictionary of Psych Adjectivalizations

As can be seen in Table 3.19, some suffixes produced a large number of connections, while others were less productive. Six morphemes did not produce any connection.
Furthermore, a set of adjectives derived by conversion has also been automatically connected to the respective verbs:

LG class 43: maligno = malignare “mean = to speak ill of”
LG class 41: schifo = schifare “disgusting = to loathe”

Psych verb    Psych V-n   Psych V-a      V-a Translation   LG Class   Score
angosciare    angoscia    angosciante    “anguishing”      41         -3
                          angosciato     “anguished”
                          angoscioso     “anguishous”
piacere       piacere     piacente       “attractive”      42         +2
                          piacevole      “pleasing”
amare         amore       amato          “beloved”         43         +3
                          amorevole      “loving”
                          amoroso        “amorous”
biasimare     biasimo     biasimevole    “blameworthy”     43B        -2

Table 3.20 Adjectivalizations of the Psych Predicates
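The association between psych verbs and already tagged adjectives can be approximated outside Nooj with the following Python sketch. It is a deliberately naive stand-in for the morphological grammar: it assumes that the verbal root is obtained by dropping the infinitive ending, it uses only a handful of the suffixes of Table 3.19, and the names `root` and `link_adjectives` as well as the simplified tag strings are hypothetical.

```python
# A few of the suffixes of Table 3.19, written without morpheme boundaries.
SUFFIXES = ("to", "nte", "oso", "evole", "bile", "ivo", "orio", "ore")


def root(verb: str) -> str:
    """Drop the infinitive ending (-are, -ere, -ire); a crude stemming assumption."""
    for ending in ("are", "ere", "ire"):
        if verb.endswith(ending):
            return verb[: -len(ending)]
    return verb


def link_adjectives(verbs: dict, adjectives: dict) -> dict:
    """Attach +Va, +V=<verb> and +Class=<LG class> to adjectives built on a verbal root.

    `verbs` maps a psych verb to its LG class; `adjectives` maps an adjective
    to its existing tag string, which is preserved unchanged.
    """
    linked = {}
    for adjective, tags in adjectives.items():
        for verb, lg in verbs.items():
            if adjective.startswith(root(verb)) and adjective.endswith(SUFFIXES):
                linked[adjective] = f"{adjective},{tags}+Va+V={verb}+Class={lg}"
                break
    return linked
```

A matcher this permissive over-generates on short roots (the root of amare is just "am"), so its output would need manual checking.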

The average precision reached in the automatic association process is 97%. The list produced by Nooj has then been manually corrected, in order to add it to the SentIta resources without any possible source of errors.
Table 3.20 shows examples of the connection between the verbs from the tables 41, 42, 43 and 43B and their respective nominalizations and adjectivalizations.
Actually, nouns can be associated with adjectives also independently from verbs (Gross, 1996), as happens in (31). Section 3.5.3 will deal with such cases.

(31) Luc (est généreux + a de la générosité) envers ses ennemis (Gross, 1996)
“Luc (is generous + has generosity) towards his enemies”

Basically, we chose to associate the psych adjectives with the respective verbs in order to preserve their argument/actant selection for Semantic Role Labeling purposes. This does not exclude that other adjectives, which do not belong to this subset, can also function as predicates in the sentences in which they occur, as happens in (31).

3.3.5.1 Support Verbs

As exemplified above, the sentiment expressions in which we inserted the adjectives from SentIta are of the kind N0 essere Agg val, where Agg val represents an adjective that expresses an evaluation (Elia et al., 1981) and the support verb for these predicates is essere “to be”.
This verb also gives its support to the expression N0 essere un N1-class (e.g. Questo film è una porcheria “This movie is a mess”) which, like the adjectives, would not possess any mark of tense without it (Elia et al., 1981).
A great part of the compound adverbs also possess an adjectival function (e.g. a fin di bene “for good”, tutto rose e fiori “all peace and light”; see Section 3.4.1), so with good reason they have been included in this support verb construction as well.
The support verb equivalents included in this case are the following (Gross, 1996):

• aspectual equivalents: stare “to stay”, diventare “to become”, rimanere, restare “to remain”;

• causative equivalents: rendere “to make”;

• stylistic equivalents: sembrare “to seem”, apparire “to appear”, risultare “to result”, rivelarsi “to reveal to be”, dimostrarsi, mostrarsi “to show oneself to be”.

Among the Italian LG structures that include adjectives²⁶, we selected the following, in which polar and intensive adjectives can occur with the support verb essere (Vietri, 2004; Meunier, 1984):

• Sentences with polar adjectives

  – N0 Vsup Agg Val²⁷
    L’idea iniziale era accettabile[+1]
    “The initial idea was acceptable”

  – V-inf Vsup Agg Val
    Vedere questo film è demoralizzante[-2]
    “Watching this movie is demoralizing”

  – N0 Vsup Agg Val di V-Inf
    La polizia sembra incapace[-2] di fare indagini
    “The police seem unable to investigate”

  – N0 Vsup Agg Val a N1
    La giocabilità è inferiore[-2] alla serie precedente
    “The playability is worse than in the preceding series”

  – N0 Vsup Agg Val Per N1
    Per me questo film è stato noioso[-2]
    “In my opinion this movie was boring”

• Sentences with adjectives as noun intensifiers and downtoners

  – N0 Vsup Agg Int di N1
    Una trama piena[+] di falsità[-2]
    “A plot filled with mendacity”

26 For the LG study of adjectives in French see Picabia (1978), Meunier (1999, 1984).
27 We preferred to use in these examples the notation Agg Val rather than Psych V-a because they can refer both to adjectivalizations of psych verbs and to other SentIta evaluative adjectives.

3.3.5.2 Nominal groups with appropriate nouns Napp

The support verb avere “to have” (and its equivalent tenere) has been observed in a transformation (Nb Vsup Na V-a) which involves a special kind of GN subject that contains noms appropriés “appropriate nouns” Napp (Harris, 1970; Guillet and Leclère, 1981; Laporte, 1997, 2012; Meydan, 1996, 1999).
Citing Laporte (2012, p. 1):

“A sequence is said to be appropriate to a given context if it has the highest plausibility of occurrence in that context, and can therefore be reduced to zero. In French, the notion of appropriateness is often connected with a metonymical restructuration of the subject”

and Mathieu (1999b, p. 122):

“On considère comme substantif approprié tout substantif Na pour lequel, dans une position syntaxique donnée, Na de Nb = Nb²⁸”

we can clarify that “the notion of highest plausibility of occurrence of a term in a given context” (Laporte, 2012) should not be interpreted in statistical terms or proved by searches in corpora, but just intuitively defined through the paraphrastic relation Na di Nb = Nb.
According to Meydan (1996, p. 198), “the adjectival transformations with Napp (n.b. (Na di Nb)Q essere V-a = Il fisico di Lea è attraente “The body of Lea is attractive”) can be put in relation through four types of transformations”, which also correspond to the structures included in our network of sentiment FSA. The obligatoriness of the modifiers and the appropriateness of the nouns are reflected in these transformations (Laporte, 1997):

28 “It is considered to be an appropriate substantive each substantive for which, in a given syntactic position, Na of Nb = Nb”. Author’s translation.

• a nominal construction Vsup Napp:
  Nb Vsup Na V-a
  Lea ha un fisico attraente “Lea has an attractive body”

• a restructured sentence in which the GN subject is split into two independent constituents:
  Nb essere V-a Prep Na
  Lea è attraente (per il suo + di) fisico “Lea is attractive for her body”

• a metonymic sentence in which the Napp is erased:
  Nb essere V-a
  Lea è attraente “Lea is attractive”

• a construction in which the Napp is adverbialized:
  Nb essere Na-mente V-a
  Lea è fisicamente attraente “Lea is physically attractive”

The concept of appropriate noun has a crucial importance not only in these adjectival expressions, but also when it appears as object of psychological predicates (see Section 3.3.4), as exemplified by Mathieu (1999b, p. 122):

N0 V (Dét N de Nhum)1 = N0 V (Nhum)1
Les soucis rongent l’esprit de Marie = Les soucis rongent Marie
“Worries torment the spirit of Mary = Worries torment Mary”

Moreover, in the Sentiment Analysis field, where the identification and the classification of the features of the opinion object even constitute a whole subfield of research (see Section 5.2), the Napp becomes a very advantageous linguistic device for the automatic feature analysis. See for example (32), in which Na (Napp) is the feature and Nb (N-um) is the object of the opinion.

(32) [Opinion [Feature (La risoluzione + L’obiettivo)] della [Object fotocamera] è eccellente[+3]]

(32a) [Opinion [Object La fotocamera] ha una [Feature (risoluzione + obiettivo)] eccellente[+3]]

(32b) [Opinion [Object La fotocamera] è eccellente[+3]]

(32c) [Opinion [Object La fotocamera] è eccellente[+3] per [Feature (la sua risoluzione + il suo obiettivo)]]

“(The image resolution + The lens) of the camera is excellent”
“The camera has an excellent (image resolution + lens)”
“The camera is excellent”
“The camera is excellent for its (image resolution + lens)”
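How the four Napp structures map onto the same feature and object pair can be illustrated with a toy Python extractor. It is a sketch under heavy assumptions: the four regular expressions only cover the camera sentences of (32)-(32c), the one-word lexicon `ADJ_SCORES` takes its score from the example, and the name `extract_opinion` is hypothetical; the actual analysis relies on the network of sentiment FSA, not on regular expressions.

```python
import re

ADJ_SCORES = {"eccellente": 3}  # score taken from example (32)

DET = r"(?:la|il|l'|una|un)\s*"
PATTERNS = [
    # (32)  Na di Nb essere V-a
    re.compile(rf"^{DET}(?P<feature>\w+) (?:della|del) (?P<object>\w+) è (?P<adj>\w+)$"),
    # (32a) Nb avere Na V-a
    re.compile(rf"^{DET}(?P<object>\w+) ha {DET}(?P<feature>\w+) (?P<adj>\w+)$"),
    # (32c) Nb essere V-a per Na
    re.compile(rf"^{DET}(?P<object>\w+) è (?P<adj>\w+) per (?:la sua|il suo) (?P<feature>\w+)$"),
    # (32b) Nb essere V-a (the Napp is erased, so no feature is available)
    re.compile(rf"^{DET}(?P<object>\w+) è (?P<adj>\w+)$"),
]


def extract_opinion(sentence: str):
    """Return object, feature (or None) and score for the four toy structures."""
    text = sentence.strip().lower()
    for pattern in PATTERNS:
        match = pattern.match(text)
        if match and match["adj"] in ADJ_SCORES:
            groups = match.groupdict()
            return {
                "object": groups["object"],
                "feature": groups.get("feature"),
                "score": ADJ_SCORES[groups["adj"]],
            }
    return None
```

Whatever the surface structure, the object (Nb) and the feature (Na) end up in the same slots, which is what makes the Napp useful for feature-based analysis.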

A useful classification of the nouns that can play the role of Napp, for both human and non-human nouns, which can be exploited for the opinion feature classification, is the following (Meydan, 1996):

Napp of Num
  Npc = body parts
  Npabs = personality trait
  Ncomport = attitudes
  Nprédr = intentions

Napp of N-um
  Npo = shape
  Nprop = ability
  Dnom = model

3.3.5.3 Verbless Structures

Predicativity is not a property necessarily possessed by a particular class of morpho-syntagmatic structures (e.g. verbs that carry information concerning person, tense, mood, aspect); instead, it is determined by the connection between elements (Giordano and Voghera, 2008; De Mauro and Thornton, 1985).
Also on the basis of their frequency in written and spoken corpora, and in informal and formal speech, together with Giordano and Voghera (2008) we consider verbless expressions as syntactically and semantically autonomous sentences, which can be coordinated and juxtaposed, and which can introduce subordinate clauses just like verbal sentences.
Among the verbless sentences available in the Italian language, we are interested here in those involving adjectives indicating appreciation (Agg val), e.g. Bella questa “Good one” (Meillet, 1906; De Mauro and Thornton, 1985).
Below we report a selection of customer reviews from the dataset described in Section 5.1.3, which can give an idea of the diffusion of verbless constructions in user generated contents:

• Movie Reviews: [Molto bravi gli interpreti] [la mia preferita la sorella combina guai] [Bella la fotografia] [i colori]²⁹

• Car Reviews: [Ottimo l’impianto radio cd] [comodissimi i comandi al volante]³⁰

• Hotel Reviews: [Posizione fantastica] si raggiungono a piedi molti luoghi strategici di Londra [molto curato nel servizio e nel soddisfare le nostre richieste] [ottimo servizio in camera]³¹

• Videogame Reviews: [Provato una volta] e [subito scartato] [odioso il modo in cui viene recensito]³²

• Smartphone Reviews: [Utilissimo il sistema di spegnimento e riaccensione ad orari programmati] [Telefonia e sensibilità OK]³³

• Book Reviews: [Azzeccatissima la descrizione anche psicologica e introspettiva dei personaggi]³⁴

29 httpwwwmymoviesitfilm2013abouttimepubblicoid=680439
30 httpautociaoitNuova_Fiat_500__Opinione_1336287SortOrder1
31 httpwwwtripadvisoritShowUserReviews-g186338-d651511-r120264867-Haymarket_Hotel-London_Englandhtml

3.3.5.4 Evaluation Verbs

In this Paragraph we mention the use of the verbs of evaluation Vval (Elia et al., 1981), which represent a subclass of the LG class 43, grouped together through the acceptance of at least one of the properties N1 = N1 Agg1 and N1 = Agg1 Ch F. Examples are giudicare “to judge”, trovare “to consider”, avvertire “to notice”, valutare “to evaluate”, etc.
Of course, the N1 Agg here can include an Napp that takes the shape of (Na di Nb)1 Agg, just as happens with the psychological predicates of Mathieu (1999b).
They present a different role attribution with respect to support verb constructions, since the N0 of the evaluative verbs is always an Opinion Holder (33).

(33) [Opinion [Opinion Holder Maria] considera ([Target Arturo] affascinante[+2] + affascinante[+2] [Target che Arturo venga])]
“Maria considers (Arturo fascinating + enchanting that Arturo comes)”

32 httpwwwamazonitreviewRK6CGST4C9LXI
33 httpwwwamazonitAlcatel-Touch-Smartphone-Dual-Italiaproduct-reviewsB00F621PPGpageNumber=3
34 httpwwwamazonitproduct-reviewsB007IZ1Q5SpageNumber=3

The N0 of support verb constructions, instead, is always the target of the opinion (34). The Opinion Holder remains unexpressed here.

(34) [Opinion [Target Arturo] (è + rimane + sembra) davvero[+] affascinante[+2]]
“Arturo (is + remains + seems) truly fascinating”

In causative constructions the roles are again different (35).

(35) [Opinion [Cause (Maria + L’acconciatura)] rende [Target Arturo] davvero[+] affascinante[+2]]
“(Maria + The hairstyle) makes Arturo truly fascinating”
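The three role attributions contrasted in (33), (34) and (35) can be summarized with a small Python sketch. The verb lists are taken from the text (considerare comes from example (33)), whereas the function `assign_roles` and the dictionary keys are hypothetical names chosen for illustration.

```python
EVALUATION_VERBS = {"giudicare", "trovare", "avvertire", "valutare", "considerare"}
SUPPORT_VERBS = {"essere", "stare", "diventare", "rimanere", "restare",
                 "sembrare", "apparire", "risultare"}
CAUSATIVE_VERBS = {"rendere"}


def assign_roles(n0: str, verb_lemma: str, n1: str = None) -> dict:
    """Map the arguments of an adjectival sentiment expression to semantic roles.

    Evaluation verb: N0 is the Opinion Holder and N1 the Target (33).
    Support verb:    N0 is the Target, the Opinion Holder is unexpressed (34).
    Causative verb:  N0 is the Cause and N1 the Target (35).
    """
    if verb_lemma in EVALUATION_VERBS:
        return {"opinion_holder": n0, "target": n1}
    if verb_lemma in CAUSATIVE_VERBS:
        return {"cause": n0, "target": n1}
    if verb_lemma in SUPPORT_VERBS:
        return {"target": n0}
    return {}
```

The sketch makes explicit that the same surface position (N0) receives three different roles depending only on the class of the verb.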

3.3.6 Vulgar Lexicon

The fact that “taboo language is emotionally powerful” (Zhou, 2010, p. 8) cannot be ignored during the collection of lexical and grammatical resources for Sentiment Analysis purposes.
SentIta is provided with a collection of 182 bad words that include the following grammatical and semantic subcategories:

• Nouns: 115 entries, among which

  – 30 are human nouns (e.g. rompiballe “pain in the ass”);

  – 9 are nouns of body parts (e.g. cazzo “dick”);

• Verbs: 45 entries, among which

  – 17 are pro-complementary and pronominal verbs (e.g. 4 +PRXCOMP, fottersene “to not give a fuck”, and 13 +PRX, incazzarsi “to get mad”);

  – 17 allow the derivation of nouns in -bile “-ble” (e.g. trombabile “bangable” from trombare “to screw”);

  – 4 are already included in the LG class 41 (e.g. fregare “to matter”);

• Adjectives: 16 entries, among which 10 are from SentIta (e.g. 5 +POS, cazzuto “die-hard”, and 5 +NEG, scoglionato “annoyed”);

• Adverbs: 4 entries (e.g. incazzosamente “grumpily”);

• Exclamations: 8 entries (e.g. vaffanculo “fuck off”).

Figure 3.3 Frozen and semi-frozen expressions that involve the vulgar word cazzo and its euphemisms

Although some vulgar terms and expressions, e.g. with the functions of interjections or discourse markers, can be semantically empty, the great part of them have metaphorical functions that are relevant for the correct analysis of the sentiment in real texts. Examples are offense, imprecation, intensification of negation, etc.
The scatological and sexual semantic fields are among the most frequent in our lexicon. The male and female nouns derived from human body parts can have different kinds of orientations and functions, ranging from the insult (coglione “asshole”) to the praise (figata “cool thing”).
In accordance with the observations made by Velikovich et al. (2010) on web corpora, our lexicon involves a set of racial or homophobic derogatory terms (e.g. terrone³⁵ “southern Italian”, culattone “faggot”).
The vulgar words in the SentIta database, especially the ones with an uncertain polarity, can be part of frozen or semi-frozen expressions that clarify, for each occurrence, the actual semantic orientation of the words in context. A very typical Italian example is cazzo “dick”, with its more or less vulgar regional variants (e.g. minchia, pirla) and euphemisms (e.g. cavolo “cabbage”, cacchio “dang”, mazza “stick”, tubo “pipe”, corno “horn”, etc.). Figure 3.3 exemplifies the cases in which the context gives the word(s) under examination a negative connotation. In detail, the FSA includes negative adverbial and adjectival functions (paths 1, 2, 5, 7), exclamations (paths 3, 6, 9), interrogative forms (6) and intensification of negations (paths 8, 9). Path (4) also exemplifies the frozen sentence from the LG class CAN (N0 V (C1 di N) = N0 V C1 a N2) that presents cazzo (Vietri, 2011) as C1 frozen complement. This frozen sentence, in particular, is also related to some monorematic compound nouns and adjectives included in the bad word list of SentIta.
However, a Sentiment Analysis tool that works on user generated contents must be aware of the fact that, in colloquial and informal situations, a word like cazzo can work simply as a regular intensifier, also for positive sentences (e.g. è così bello[+2] cazzo[+] [+3] “it’s fucking nice”). This is why it received in the dictionary just the tag attributed to intensifiers (+FORTE). In order to be annotated in texts with other orientations, it must occur as a component of specific expressions.
As we can see in Table 3.21, although the majority of the entries in the bad words dictionary possess a negative connotation, 10% of them are positive (e.g. strafigo “supercool”). Moreover, the manual annotation of the whole list of bad words with sentiment tags confirmed the budding concept of the emotional power of taboo language: 84% of the vulgar lexicon is endowed with a defined semantic orientation (see Table 3.22).

35 Term used with insulting purposes only by Northern Italians.

Score   Bad Words     Entries
+3      +POS+FORTE        6
+2      +POS             12
+1      +POS+DEB          0
-1      +NEG+DEB          8
-2      +NEG            125
-3      +NEG+FORTE        6
+       +FORTE            3
-       +DEB              0

Table 3.21 SentIta tag distribution in the list of bad words

Bad Words       Entries   Percentage (%)
tot pos            18          10
tot neg           132          73
tot int             3           2
tot oriented      153          84
tot neutral        29          16
tot BW            182         100

Table 3.22 Bad Words percentage values in SentIta
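The correspondence between tag combinations and scores listed in Table 3.21 can be written as a small Python function. This is a sketch, not part of the SentIta resources: the function name `tags_to_score` is hypothetical, and returning "0" for an entry without any sentiment tag is an assumption used here to represent neutral items.

```python
def tags_to_score(tags) -> str:
    """Translate a set of SentIta tags into the score notation of Table 3.21.

    POS and NEG give the base polarity (+2 / -2); FORTE and DEB raise or
    lower its magnitude by one. FORTE or DEB alone mark pure intensifiers
    ("+") and downtoners ("-").
    """
    tags = set(tags)
    if "POS" in tags:
        sign = "+"
    elif "NEG" in tags:
        sign = "-"
    else:
        if "FORTE" in tags:
            return "+"
        if "DEB" in tags:
            return "-"
        return "0"  # assumption: no sentiment tag means neutral
    magnitude = 2 + ("FORTE" in tags) - ("DEB" in tags)
    return f"{sign}{magnitude}"
```

Under this mapping the entry cazzo, tagged only +FORTE, comes out as the bare intensifier "+", as in the example è così bello[+2] cazzo[+] [+3].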


Collateral Benefits of Exhaustive Vulgar Lexical and Grammatical Resources

Web 2.0 offers Internet users the opportunity to freely share almost everything in online communities, hiding their own identity behind the shelter of anonymity.

Flaming, trolling, harassment, cyberbullying, cyberstalking and cyberthreats are all terms used to refer to offensive content on the web. The shapes can be different and the focus can be on various topics, e.g. physical appearance, ethnicity, sexuality, social acceptance, etc. (Singhal and Bansal 2013), but they are all annoying (when not even dangerous, as when the victims are children or teenagers) for the same reason: they make the online experience unpleasant.

The detection and filtering of offensive language on the web is strongly connected to the Sentiment Analysis field when one decides to handle it automatically. The occurrence of derogatory terms and phrases, or the unjustified repetition of words with a strongly negative connotation, can effectively help with this task.

Although taboo language is generally considered to be the strongest clue of harassment on the web (Xiang et al. 2012; Chen et al. 2012; Reynolds et al. 2011; Xu and Zhu 2010; Yin et al. 2009; Mahmud et al. 2008), it must be clarified that the presence of bad words in posts does not necessarily indicate the presence of offensive behavior.

We already explained that the words in our collection of vulgar terms are, in some cases, neutral or even positive. Profanity can be used with comical or satirical purposes, and bad words are often just the expression of strong emotions (Yin et al. 2009).

Thus, the well-known censoring approaches based only on keywords, which simply match words appearing in text messages with offensive words stored in blacklists, cannot be expected to reach high levels of accuracy.

Consistent with this idea, in recent years many studies on offensive language, cyberbullying and flame detection have integrated the bad words’ context into their methods and tools.


Chen et al. (2012) exploited a Lexical Syntactic Feature (LSF) architecture to detect offensive content and identify potentially offensive users in social media, introducing the concept of syntactic intensifier to adjust words’ offensiveness levels based on their context.

Xiang et al. (2012) learned topic models from a dataset of tweets through the Latent Dirichlet Allocation (LDA) algorithm. They combined topical and lexical features into a single compact feature space.

Xu and Zhu (2010) proposed a sentence-level semantic filtering approach that combined grammatical relations with offensive words in order to semantically remove all the offensive content from sentences.

Insulting phrases (e.g. get-a-life, get-lost, etc.) and derogatory comparisons of human beings with insulting items or animals (e.g. donkey, dog) were clues used by Mahmud et al. (2008) to locate flames and insults in text.
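The limit of keyword-only censoring discussed in this section can be illustrated with a toy sketch. It contrasts a blacklist match with a lexicon score in which a vulgar word tagged only as intensifier strengthens the polarity of the sentence. The word lists, the scores other than bello[+2], and the one-step intensification with clamping at ±3 are assumptions made for the example; they do not reproduce the SentIta grammars or any of the cited systems.

```python
import re

BLACKLIST = {"cazzo", "coglione"}        # toy keyword blacklist (assumption)
PRIOR = {"bello": 2, "coglione": -2}     # toy prior polarities (assumption)
INTENSIFIERS = {"cazzo"}                 # tagged +FORTE only, as in the text


def tokens(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def keyword_filter(text: str) -> bool:
    """Flag a text as offensive whenever it contains a blacklisted word."""
    return any(tok in BLACKLIST for tok in tokens(text))


def lexicon_score(text: str) -> int:
    """Sum the prior polarities, then let an intensifier push the score one step."""
    toks = tokens(text)
    score = sum(PRIOR.get(tok, 0) for tok in toks)
    if score != 0 and any(tok in INTENSIFIERS for tok in toks):
        score += 1 if score > 0 else -1
    return max(-3, min(3, score))        # keep the score on the -3..+3 scale


sentence = "è così bello cazzo"
print(keyword_filter(sentence), lexicon_score(sentence))
```

On the positive sentence of the example, the blacklist raises a false alarm, while the lexicon-based score treats the vulgar word as an intensifier of a positive opinion.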

3.4 Towards a Lexicon of Opinionated Compounds

3.4.1 Compound Adverbs and their Polarities

Elia (1990) adapted to the Italian language the formal definition of “adverb” of Jespersen (1965), who for the first time also included special kinds of phrases and sentences in the category of adverbs. In detail, the structures included in this category are the following:

• traditional adverbs with a monolithic shape (e.g. volentieri “gladly”);

• traditional adverbs in -mente (e.g. impetuosamente “impetuously”);

• prepositional phrases (e.g. a vanvera “randomly”);

• phrases derived from sentences with finite or non-finite verbs (e.g. da levare il fiato “to take the breath away”).

The first two types of adverbs will be treated separately in Section 3.5.2, because they have been automatically derived from the SentIta adjective dictionary.

Compound adverbs, or frozen or idiomatic adverbs (the last two points of Jespersen’s list), can be defined as

“(...) adverbs that can be separated into several words, with some or all of their words frozen, that is, semantically and/or syntactically noncompositional” (Gross 1986, p. 2)

Therefore, just as happens in polyrematic expressions, the syntactic elements that compose such adverbs cannot be freely shifted, substituted or deleted. This holds despite the fact that, from a syntactic point of view, they work just like simple adverbs.

Following the LG paradigm, and according to their internal morphosyntactic structure, compound adverbs have been classified in many languages, namely English (Gross 1986), French (Gross 1990; Laporte and Voyatzi 2008; Tolone and Stavroula 2011), Italian (Elia 1990; Elia, De Gioia 1994a,b, 2001), Modern Greek (Voyatzi 2006), Portuguese (Baptista 2003), etc.

The LG description and classification of compound adverbs represents the intersection of two criteria, corresponding to the phenomenon of multiword expressions and to the adverbial functions (Tolone and Stavroula 2011; Laporte and Voyatzi 2008).

According to Laporte and Voyatzi (2008, p. 32):

“(...) a phrase composed of several words is considered to be a multiword expression if some or all of its elements are frozen together, that is, if their combination does not obey productive rules of syntactic and semantic compositionality”


The adverbial functions, instead, pertain to the adverbial roles, which can take the shape of circumstantial complements or of complements that are not objects of the predicate of the clause in which they appear.

In LG descriptions, compound adverbs are formalized as a sequence of parts of speech and syntactic categories. Their morpho-syntactic structures are described on the basis of the number, the category and the position of the frozen and free components of the adverbials.

The Italian classification system for compound adverbs follows the same logic as the French one, with which it shares a strong similarity in terms of syntactic structures.

In this research, among the 16 classes of idiomatic adverbs selected by De Gioia (1994a,b), we worked on the ones reported in Table 3.23. The comparative structures with come “like”, treated by Vietri (1990, 2011) as idiomatic sentences, will be described in Section 3.4.2.

The adverbs belonging to the classes described in the first 9 rows of Table 3.23 have been manually selected and evaluated starting from the electronic dictionaries and the LG tables of Elia (1990), while the classes PF, PV and PCPN come from the LG tables of De Gioia (2001). In the first classes, Elia (1990) recognized the following three abstract structures:

1 Prep (E+Det) (E+Agg) C1 (E+Agg)

(a) PC

(b) PDETC

(c) PAC

(d) PCA

2 Prep (E+Det+Agg) C1 (E+Agg) Prep (E+Det+Agg) C2 (E+Agg)

(a) PCDC

(b) PCPC

Class   Structure          Example                   Translation
PC      Prep C             controvoglia              unwillingly
PDETC   Prep Det C         con le cattive            by any means necessary
PAC     Prep Agg C         a pieni voti              with flying colors
PCA     Prep C Agg         a testa alta              with your head held high
PCDC    Prep C di C        con cognizione di causa   with full knowledge
PCPC    Prep C Prep C      di bene in meglio         from good to better
PCONG   (Prep) C Cong C    d’amore e d’accordo       in harmony
CC      C C                niente male               nothing bad
CPC     C Prep C           l’ira di Dio              a king’s ransom
PF      Compound Sentence  si salvi chi può          every man for himself
PV      Prep V W           seduta stante             on the spot
PCPN    Prep C Prep N      in onta a                 in spite of

Table 3.23: Classes of Compound Adverbs in SentIta

3 (E+Prep) (E+Det) C1 (E+Cong+Prep) (E+Det+Agg) C2

(a) PCONG

(b) CC

(c) CPC

Table 3.24 shows the distribution of each one of the evaluation (POS, NEG), intensity (FORTE, DEB) and discourse tags36 (SINTESI, NEGAZIONE, etc.) in every class of compound adverbs.

As we can see in the two lower sections of Table 3.24, the compound adverbs that possess an orientation, or that are significant for Sentiment Analysis purposes, amount to 51% of the total number of compound adverbs under examination. Moreover, it must be observed that 60% of the labels are connected to intensity, while only 30% are evaluation tags.

The presence of discourse tags seems to be marginal but, if compared with other parts of speech (or even with the -mente and monolithic adverbs, see Section 3.5.2), it reveals its significance.
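The shares of the three tag families can be recomputed from the row totals of Table 3.24. The sketch below only illustrates the arithmetic: the row totals are read from the rightmost column of the table, while the grouping dictionary and the variable names are illustrative.

```python
# Row totals read from the rightmost column of Table 3.24, grouped by tag family.
ROW_TOTALS = {
    "evaluation": {"POS+FORTE": 21, "POS": 57, "POS+DEB": 15,
                   "NEG+DEB": 20, "NEG": 121, "NEG+FORTE": 12},
    "intensity": {"FORTE": 377, "DEB": 73},
    "discourse": {"SINTESI": 30, "CONFERMA": 13, "INVERTE": 22,
                  "COMP": 5, "NEGAZIONE": 8},
}

# Number of tags per family and overall number of assigned tags.
family_totals = {family: sum(rows.values()) for family, rows in ROW_TOTALS.items()}
all_tags = sum(family_totals.values())

# Share of each family, rounded to the nearest integer percentage.
family_share = {family: round(100 * n / all_tags)
                for family, n in family_totals.items()}

print(all_tags, family_totals, family_share)
```

The rounded shares correspond to the values in the last column of the bottom rows of the table, which the running text reports in approximate form.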

3.4.1.1 Compound Adjectives of Sentiment

A large part of the polyrematic units with an adverbial function can also play the role of adjectives.

Adverbial function: agire a fin di bene “to act for good”

Adjectival function: una bugia a fin di bene “a white lie”

For simplicity, we did not distinguish the cases in which they possess just one function from the cases in which they can have both roles. In any case, separating these two groups is often impossible, if one also considers their invariability and the fact that their function depends on semantic features rather than on morphological ones (Voghera 2004).

AVV+AVVC+PC+PN without adjectival function: in conclusione “in conclusion”

AVV+AVVC+PC+PN with adjectival function: a vanvera (e.g. parlare a vanvera “to prattle”, un discorso a vanvera “a prattle”)

Therefore, we did not make any distinction in the dictionaries, in order to avoid the duplication of entries, but we also recalled them in the grammars dedicated to adjectives as noun modifiers and in constructions of the kind N0 (essere + E) Agg Val (see Section 3.3.5).

36 These tags annotate 70 discourse operators able to summarize (e.g. in parole povere “in simple terms”), invert (e.g. nonostante ciò “nevertheless”), confirm (e.g. in altri termini “in other terms”), compare (e.g. allo stesso modo “in the same way”) and negate (e.g. neanche per sogno “in your dreams”) the opinion expressed in the previous sentences of the text.

3.4.2 Frozen Expressions of Sentiment

In this thesis, oriented words are always considered in context. In this paragraph we will focus on the importance that frozen sentences can have in an LG-based module for Sentiment Analysis.

Traditional and generative grammars generally treat idioms as aberrations, and many taggers and corpus linguistics tools simply ignore multiword units and frozen expressions (Vietri 2004; Silberztein 2008).

With Gross (1982), the LG paradigm started the systematic and formal study of frozen sentences, underlining their non-exceptional nature from both the lexical and the syntactic points of view. Indeed, idioms occupy in the lexicon a volume that is comparable with that of the free forms (Gross 1982).

This kind of approach opens plenty of possibilities in the field of Sentiment Analysis, in which the collection of lexical resources endowed with a defined polarity cannot just run aground on simple words, but needs to be opened also to different kinds of oriented multiword expressions and frozen sentences.

Tag                     PC    CC   CPC   PAC   PCA  PCDC PCONG  PCPC PDETC    PF    PV  PCPN   Tot
POS+FORTE                0     6     0     4     2     0     2     1     5     1     0     0    21
POS                      1     5     2     9    15    10     1     0    14     0     0     0    57
POS+DEB                  0     0     0     3     2     2     2     0     5     1     0     0    15
NEG+DEB                  0     0     1     0     4     0     1     3     7     3     1     0    20
NEG                     36    15     6    15     5     6     2     5    19     7     0     5   121
NEG+FORTE                0     0     0     1     1     1     2     2     3     1     1     0    12
FORTE                  102    40     5    49    26    13    35    18    76     8     5     0   377
DEB                     23    13     1    14     4     4     5     2     6     1     0     0    73
SINTESI                 13     2     0     6     1     2     0     0     1     0     5     0    30
CONFERMA                 5     2     0     0     0     0     0     0     5     0     0     1    13
INVERTE                  4     2     0     8     0     0     0     0     7     0     1     0    22
COMP                     3     0     0     1     0     0     0     0     1     0     0     0     5
NEGAZIONE                3     0     0     0     0     0     0     2     1     0     1     1     8
Tot                    190    85    15   110    60    38    50    33   150    22    14     7   774
Tot Class              872   346   104   316   320   248   132   127   621   202   124    53  3321
Percentage (%)          22    25    14    35    19    15    38    26    23    11    11    13    51
Evaluation tags (%)     19    31    60    29    48    50    20    33    35    59    14    71    32
Intensity tags (%)      66    62    40    57    50    45    80    61    55    41    36     0    58
Discourse tags (%)      15     7     0    14     2     5     0     6    10     0    50    29    10

Table 3.24: Composition of the compound adverb dictionary

According to Vietri (2014b, p. 32):

(...) a verb, when co-occurring with a certain noun (or a set of nouns), produces a “special meaning” that it would not have been assigned if the noun was substituted. This type of meaning is conventionally defined “non-compositional”, but this does not automatically imply, from a syntactic point of view, that idioms are “units”.

On the basis of this statement, strictly connected to the distributional properties of idiomatic sentences, we included in our sentiment database more than 500 Italian frozen sentences, which have been manually evaluated and then formalized with dictionary-grammar pairs.

This choice is connected to the purpose of lemmatizing them in lexical databases while also taking into account, through FSA, their syntactic flexibility and lexical variation. Treating idioms computationally means dealing with the problems they pose in terms of the relationship between “interpretation” and “syntactic constructions” (Vietri 2014b).

The sources of the work are the Lexicon-Grammar descriptions of idioms, which take the shape of binary tables in which lexical entries are described on the basis of syntactic and semantic properties.

The formal notation is the traditional one used in the LG framework. The symbol C is used in frozen sentences to indicate a fixed nominal position that cannot be filled by different items belonging to the same class or semantic field and that cannot be morpho-grammatically modified (Vietri 2004).

The Lexicon-Grammar classification of Italian idioms includes more than 30 classes of idioms with ordinary verbs and support verbs. The most common support verbs are avere “to have”, essere “to be” and fare “to make”. They differ from their ordinary counterparts because of their semantic emptiness. When they are involved in the formation of idiomatic constructions, they present a very high lexical and syntactic flexibility, which can take the shape of the following phenomena (Vietri 2014d):

• the alternation of support verbs with aspectual variants (e.g. rimanere “to remain”, diventare “to become”);

• the production of causative constructions (e.g. with verbs like rendere “to make”);

• the deletion of the support verb (e.g. N0 Agg come C1: una donna astuta come una volpe “a woman as sharp as a tack”).

Because of the complexity of their lexical and syntactic variability, which often affects the intensity of the idiom itself, we chose to restrict the sample of idioms examined for Sentiment Analysis purposes to the frozen sentences with the support verb essere that include adjectives in their structure, namely PECO, CEAC, EAA, ECA and EAPC. Since a large part of the idioms belonging to this subgroup contains adjectives evaluated in SentIta, we found it interesting to compare the prior polarity of the adjectives involved in idioms with the polarity assigned to the idioms themselves. The results of this comparison are reported in Table 3.25.

                           PECO   CEAC   EAA   ECA   EAPC   Tot
Idioms in the LG Class      274     36    40   153    162   665
Idioms in SentIta           245     27    32   134    139   577
Adj of SentIta in Idioms    133     12    15    37     56   253

Table 3.25: Polar adjectives in frozen sentences

3.4.2.1 The Lexicon-Grammar Classes Under Examination: PECO, CEAC, EAA, ECA and EAPC

Vietri (1990) showed that the comparative frozen sentences of the type N0 Agg come C1 (PECO) usually intensify the polarity of the adjective of sentiment they contain, as happens in (36), in which the SentIta adjective bello, endowed with a polarity score of +2, changes its polarity to +3 when occurring in the idiom essere bello come il sole.

(36) Maria è bella[+2] come il sole [+3]
“Maria is as beautiful as the sun”

Actually, it is also possible for an idiom of this sort to be polarized when the adjective it contains is neutral, as with bianco “white” in (37), or even to reverse its polarity, as happens in (38), where agile “agile” is positive.

(37) Maria è bianca[0] come un cadavere [-2]
“Maria is as white as a dead body” (Maria is pale)

(38) Maria è agile[+2] come una gatta di piombo [-2]
“Maria is as agile as a lead cat” (Maria is not agile)

In this regard, it is interesting to notice that 84% of the idioms have a clear SO, while just 36% of the adjectives they contain are polarized (see the examples in Table 3.26). This confirms the significance of a sentiment lexicon that also includes frozen expressions in its list. Indeed, a resource of this sort is able to disambiguate the cases in which the polarity of sentiment (or neutral) words is changed by the ironic nature of the idioms in which they occur (38), (39), (40).

There is only one disambiguation rule, which consists in always annotating the longest match in the text analysis.

(39) Maria è alta[0] come un soldo di cacio [-2]
“Maria is knee-high to a grasshopper” (Maria is short)

(40) Maria è asciutta[0] come l’esca [-2]
“Maria is on the bones of its arse” (Maria is penniless)
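The longest-match rule stated above can be sketched with a toy lexicon in which both simple adjectives and whole idioms are listed as token sequences. The scores are the ones given in examples (36) and (38); the data structure and the function are illustrative assumptions and do not reproduce how Nooj applies the SentIta dictionaries and grammars.

```python
# Toy lexicon: token sequences mapped to scores (values from examples 36 and 38).
LEXICON = {
    ("bella",): 2,
    ("agile",): 2,
    ("bella", "come", "il", "sole"): 3,
    ("agile", "come", "una", "gatta", "di", "piombo"): -2,
}
MAX_LEN = max(len(key) for key in LEXICON)


def annotate(tokens: list[str]) -> list[tuple[str, int]]:
    """Scan left to right, always annotating the longest entry found."""
    annotations, i = [], 0
    while i < len(tokens):
        # Try the longest candidate first, then progressively shorter ones.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in LEXICON:
                annotations.append((" ".join(key), LEXICON[key]))
                i += n
                break
        else:
            i += 1  # no lexicon entry starts at this token
    return annotations


print(annotate("maria è agile come una gatta di piombo".split()))
print(annotate("maria è agile".split()))
```

In the first sentence the idiom wins over the positive adjective it contains, so only the negative annotation is produced; in the second one the adjective is annotated with its prior polarity.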

Frozen Sentence                          Translation                        Adj Polarity   Adj Score   Idiom Polarity   Idiom Score
essere agile come una gatta di piombo    “to be as agile as a lead cat”     POS            +2          NEG              -2
essere agile come una gazzella           “to be as agile as a gazelle”      POS            +2          POS              +2
essere agile come uno scoiattolo         “to be as agile as a squirrel”     POS            +2          POS              +2
essere astuto come il demonio            “to be as clever as the devil”     -              0           POS              +2
essere astuto come una volpe             “to be as sharp as a tack”         -              0           POS              +2
essere astuto come uno zingaro           “to be as cunning as a gypsy”      -              0           NEG              -2

Table 3.26: Examples of idioms that maintain, switch or shift the prior polarity of the adjectives they contain

Other idioms included in our resources are the ones belonging to the class EAPC, which involve the presence of an adjective (or a verb in its past participle form) and a prepositional complement. These frozen expressions usually intensify the polarity of the adjective/past participle (see example below) but, just like the PECO ones, they can also switch it or shift it.

(41) Maria è matta[-2] da legare [-3]
“Maria is so crazy she should be locked up”

The LG class EAA, with definitional structure N0 essere Agg e Agg, includes two adjectives that can be polarized or not. Example (42) shows that, in this case as well, the sentence polarity can also be opposite to that of the adjectives.

(42) Maria è bella[+2] e fritta[0] [-2]
“Maria is cooked”

An example from the class CEAC is reported in (43).

(43) L’anima è nera[0] come il carbone [-3]
“The soul is black like coal”

Among the transformations that these idioms can undergo, we included N0 avere C Agg (44), which is also the most frequent one.

(44) Maria ha l’anima nera[0] come il carbone [-3]
“Maria has a soul black like coal”

Finally, we formalized the LG class ECA, which counts among its frozen elements the nominal element C1 and the adjective.

(45) Maria è una gatta morta[0] [-2]
“Maria is a cock tease”
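The relation between the prior polarity of an adjective and the polarity of the idiom that contains it can be made explicit with a small helper. The three labels follow the wording of Table 3.26, but the operational definitions used here (same score, opposite sign, any other change) are an assumption of this sketch, not a definition given in the text.

```python
def relation(adjective_score: int, idiom_score: int) -> str:
    """Label how an idiom treats the prior polarity of its adjective."""
    if adjective_score == idiom_score:
        return "maintain"               # e.g. agile[+2] come una gazzella [+2]
    if adjective_score * idiom_score < 0:
        return "switch"                 # opposite signs, as in example (38)
    return "shift"                      # intensified, or polarized from neutral


examples = {
    "(36) bella come il sole": (2, 3),
    "(37) bianca come un cadavere": (0, -2),
    "(38) agile come una gatta di piombo": (2, -2),
    "(41) matta da legare": (-2, -3),
}
labels = {name: relation(adj, idiom) for name, (adj, idiom) in examples.items()}
print(labels)
```

Under these assumptions, the intensifying idioms in (36) and (41) and the neutral-to-negative case in (37) count as shifts, while (38) is a switch.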

3.4.2.2 The Formalization of Sentiment Frozen Expressions

Due to their syntactic and lexical variation and to their discontinuity, frozen expressions need to be formalized in an electronic context able to list them systematically in electronic databases and to take into account their syntactic variability. Basically, the final purpose is to take advantage of the large amount of information listed in the LG matrices while recognizing and annotating idioms in real texts.

One of the best solutions is to link dictionaries, which contain the lists of Atomic Linguistic Units (ALU) and their syntactic and semantic properties, with syntactic grammars, which can make such words and properties interact in an FSA context (Silberztein 2008). ALU are “the smallest elements that make up the sentence, i.e. the non-analyzable units of the language” (Silberztein 2003). They can take the shape of simple words, affixes, multiword expressions and frozen expressions.

What distinguishes frozen expressions from the other ALU is discontinuity: as displayed in Figure 3.4, it is in fact possible for adverbs to occur between the verb and the direct object (Vietri 2004). That is why frozen sentences need both dictionaries and FSA to be recognized and annotated correctly: the dictionaries allow the recognition of idioms thanks to their characteristic components (word forms or sequences that occur every time the expressions occur and trigger the recognition of the frozen expressions in texts (Silberztein 2008)), and the FSA let the machine compute them in spite of the many different forms that they can assume in real texts.

Figure 3.4 represents a simplified version of the main graph of the FSA for the recognition and the sentiment annotation of idioms of the LG class PECO. It includes a metanode that allows the recognition of a nominal group in subject position (N0), an embedded graph for the verb and six nodes related to different Semantic Orientations.

Figure 3.4: Extract of the FSA for the SO identification of idioms

The instruction XREF is used to exclude the linguistic material that can occur between the support verb and the idiom object when the annotation is performed. Figure 3.5 shows, as an example, the content of the metanode that annotates frozen expressions with +3 as Semantic Orientation (MOLTOPOS). Here we observe the interactions between the idiom dictionary and the restrictions reported in the syntactic grammar. In its structure, the grammar under examination looks exactly the same as the grammar for the identification of free sentences with the same syntactic shape. The difference lies in the syntactic and distributional constraints, which in the grammar recall only the properties and the lexical items associated in the dictionary with a specific characteristic component. To be more precise, the PECO dictionary lists, among others, the three entries shown below, which have been labeled with the semantic tags POS and FORTE (+3), as happens in the other subsets of SentIta.

Essere (forte+coraggioso) come un leone “To be (strong+brave) like a lion”

leone,N+Class=PECO
+A=coraggioso+A=forte+AVV=come+AVV=quanto+DET=un
+Sent=POS+Int=FORTE
+N0um
+Vsup=essere+Vsup=diventare+Vsup=restare+Vsup=rimanere
+Vcaus=rendere
+N0essereunC1+N0esserecomeC1+N0esserepiu

Essere buono come il pane “To be as good as bread”

pane,N+Class=PECO
+A=buono+AVV=come+AVV=quanto+DET=il+PREP=di
+Sent=POS+Int=FORTE
+N0um
+Vsup=essere+Vsup=diventare+Vsup=restare+Vsup=rimanere
+Vcaus=rendere
+N0esserediC1+N0esserecomeC1+N0esserepiu

Essere (astuto+furbo) come il demonio “To be as clever as the devil”

demonio,N+Class=PECO
+A=astuto+A=furbo+AVV=come+AVV=quanto+DET=il
+Sent=POS+Int=FORTE
+N0um
+Vsup=essere+Vsup=diventare+Vsup=restare+Vsup=rimanere
+Vcaus=rendere
+N0esserepiu

As shown above, the characteristic components C (e.g. leone “lion”, pane “bread”, demonio “devil”) have been used as Nooj entries in the electronic dictionary and have been associated with the other components of the frozen expressions (e.g. +A, +AVV, +DET). Moreover, they have also been described with other information referring to the following properties, borrowed from the already cited LG tables:

distributional, e.g. +N0um, +Vsup, +Vcaus;

transformational, e.g. +N0essereunC1, +N0esserecomeC1, +N0esserediC1, +N0esserepiu.
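To make the structure of these entries explicit, the sketch below parses one of them into a lemma, a category, a set of valued features and a list of plain flags. It assumes the entry is written on a single line in the form lemma,CATEGORY+Feature=Value+Flag; the parser and its names are illustrative and are not part of Nooj.

```python
from collections import defaultdict


def parse_entry(line: str):
    """Split a dictionary line into lemma, category, valued features and flags."""
    head, _, rest = line.partition("+")
    lemma, _, category = head.partition(",")
    features, flags = defaultdict(list), []
    for item in rest.split("+"):
        if not item:
            continue
        name, separator, value = item.partition("=")
        if separator:
            features[name].append(value)   # e.g. A=forte, Vsup=essere
        else:
            flags.append(name)             # e.g. N0um, N0esserepiu
    return lemma, category, dict(features), flags


ENTRY = ("leone,N+Class=PECO+A=coraggioso+A=forte+AVV=come+AVV=quanto+DET=un"
         "+Sent=POS+Int=FORTE+N0um"
         "+Vsup=essere+Vsup=diventare+Vsup=restare+Vsup=rimanere"
         "+Vcaus=rendere+N0essereunC1+N0esserecomeC1+N0esserepiu")

lemma, category, features, flags = parse_entry(ENTRY)
print(lemma, category, features["A"], flags)
```

Repeated features such as +A or +Vsup are collected into lists, which mirrors the fact that a characteristic component can combine with more than one adjective or support verb.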

All these properties are recalled in the idiom syntactic grammar in the form of lexical and grammatical constraints.

The restrictions +N0um and +Vsup, +Vcaus concern, respectively, the distributional properties of the sentence subject, which may or may not be human, and of the support verb essere, with its aspectual (Vsup) or causative (Vcaus) variants. While the support verbs restare and rimanere are always acceptable, diventare and rendere are unacceptable only in a few cases (Vietri 1990).

We chose to formalize the distributional restrictions in the syntactic grammars with the following instruction, written under each involved node and referring to a related variable (var):

<$var=$C$var>

It represents a constraint on the words that are allowed to appear in the nodes it restricts: they can only be the ones indicated in the dictionary by the same variable var for the entry C. As an example, the idiom Essere (forte+coraggioso) come un leone, which in the dictionary has as values for the variable +A only the adjectives forte “strong” and coraggioso “brave”, cannot accept in the grammar, in the variable A, adjectives other than these two if the variable goes with the restriction shown below:

<$A=$C$A>
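The effect of this kind of constraint can be approximated outside Nooj with a simple lookup: a word matched by a variable is accepted only if it is listed, under the same variable name, in the entry of the characteristic component C. The sketch works on lemmas only and ignores inflection; the dictionary literal repeats the values of the three PECO entries above, while the function name is hypothetical.

```python
# Values copied from the three PECO entries; the structure is illustrative.
IDIOMS = {
    "leone": {"A": {"forte", "coraggioso"}, "DET": {"un"}, "Vcaus": {"rendere"},
              "Vsup": {"essere", "diventare", "restare", "rimanere"}},
    "pane": {"A": {"buono"}, "DET": {"il"}, "Vcaus": {"rendere"},
             "Vsup": {"essere", "diventare", "restare", "rimanere"}},
    "demonio": {"A": {"astuto", "furbo"}, "DET": {"il"}, "Vcaus": {"rendere"},
                "Vsup": {"essere", "diventare", "restare", "rimanere"}},
}


def satisfies(component: str, variable: str, lemma: str) -> bool:
    """Check that the lemma is licensed for this variable by the entry of C."""
    return lemma in IDIOMS.get(component, {}).get(variable, set())


print(satisfies("leone", "A", "forte"))      # adjective listed for leone
print(satisfies("leone", "A", "buono"))      # adjective listed only for pane
print(satisfies("pane", "Vsup", "restare"))  # aspectual variant of essere
```

In this reading, the same grammar path serves every idiom of the class, and the dictionary alone decides which lexical items are admitted for each characteristic component.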

The same happens with the aspectual and causative variants of the verb (Vietri 1990), with similar restrictions:

<$Vvar=$C$Vvar>
<$Vcaus=$C$Vcaus>

The symbol “=” is used in Figure 3.5 when the inflection of the word in the variable is permitted (e.g. for the adjective in essere furbi come il demonio). Otherwise, the symbol “=” is used.

We preferred, instead, to treat transformational properties differently from Vietri (1990). In order to reduce the size of the grammar, which here also contains the polarity and intensity information, we wrote the transformational constraints before the path starts, in the following form:

<$C$var>

This instructs the FSA that the specific path is allowed just for the idioms that present in the dictionary the variable var for the entry C.

In the FSA in Figure 3.5, the PECO standard structure Essere Agg come C1 is recognized by path (2). Path (1) deletes the adjective if the idiom recognized by C possesses the specific transformational property +N0esserecomeC1. Examples of idioms that satisfy this constraint are reported below. The ones preceded by an asterisk do not satisfy the restrictions.

essere come un leone “to be like a lion”
essere come il pane “to be like the bread”
essere come il demonio “to be like the devil”

In path (3), the grammar performs the deletion of both the adjective and the adverb come for the idioms endowed with the property +N0essereunC1 in the electronic dictionary.

essere un leone (di uomo) “to be a lion (man)”
*essere un pane “to be a bread”
essere un demonio (di uomo) “to be a devil (man)”

Figure 3.5: Extract of the FSA for the extraction of PECO idioms

Unlike essere un pane, which is not acceptable at all, essere come il demonio and essere un demonio (di uomo) are acceptable sentences that change the polarity of the basic idiom, probably because they are related to another comparative sentence with opposite meaning and orientation, e.g. essere (brutto + cattivo) come il demonio “to be (ugly + evil) like the devil” (Vietri 1990).

The last two paths refer to the correlation of PECO with other sentence structures, namely N0 essere di C1, indicated by the property +N0esserediC1 (path 4), and N0 essere più Agg di C, summarized as +N0esserepiu in the dictionary (path 5).

essere (fatto + E) di pane “to be made of bread”
*essere (fatto + E) di leone “to be made of lion”
*essere (fatto + E) di demonio “to be made of devil”

essere più (forte + coraggioso) di un leone “to be (stronger + braver) than a lion”
essere più buono del pane “to be better than bread”
essere più (astuto + furbo) del demonio “to be cleverer than the devil”

While just the first idiom is allowed to walk through path (4), path (5) implies a property accepted by all the exemplified idioms.
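The way transformational properties open or close the five paths can be summarized in a short sketch. The mapping from paths to properties follows the description above and the property sets repeat the three dictionary entries; the data structures and the function are illustrative and do not reproduce the Nooj graph of Figure 3.5.

```python
# Property required by each path; path 2 is the standard structure, always open.
PATH_REQUIREMENT = {
    1: "N0esserecomeC1",   # essere come C1 (adjective deleted)
    2: None,               # essere Agg come C1
    3: "N0essereunC1",     # essere un C1 (adjective and come deleted)
    4: "N0esserediC1",     # essere di C1
    5: "N0esserepiu",      # essere più Agg di C1
}

# Transformational properties listed in the three PECO entries.
PROPERTIES = {
    "leone": {"N0essereunC1", "N0esserecomeC1", "N0esserepiu"},
    "pane": {"N0esserediC1", "N0esserecomeC1", "N0esserepiu"},
    "demonio": {"N0esserepiu"},
}


def allowed_paths(component: str) -> list[int]:
    """Return the paths whose required property is listed for the entry."""
    return [path for path, required in PATH_REQUIREMENT.items()
            if required is None or required in PROPERTIES[component]]


for component in PROPERTIES:
    print(component, allowed_paths(component))
```

Only pane reaches path (4) and all three idioms reach path (5), in line with the examples; the reduced forms of demonio are excluded from this +3 graph because they change the polarity of the basic idiom.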

3.5 The Automatic Expansion of SentIta

The present Section discusses the automatic enlargement of the manually built electronic dictionaries of SentIta.

Our work took advantage of derivational linguistic clues that relate semantically oriented adjectives to quality nouns and to adverbs in -mente. The purpose is to make new words automatically inherit the semantic information associated with the adjectives to which they are morpho-phonologically related.

Furthermore, we used as morphological Contextual Valence Shifters (mCVS) a list of prefixes able to negate or to intensify/downtone the orientation of the words with which they occur.

We clarify in advance that the morphological method could also have been applied to Italian verbs, but we chose to avoid this solution because of the complexity of their argument structures. We decided, instead, to manually evaluate all the verbs described in the Italian Lexicon-Grammar binary tables, so that we could preserve the different lexical, syntactic and transformational rules connected to each one of them.
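The idea of letting adverbs in -mente inherit the tags of the adjectives they derive from can be sketched as follows. The derivation rule is a simplified version of the regular Italian pattern (feminine form plus -mente, with the final -e dropped after -le and -re preceded by a vowel) and ignores irregular forms; the tags in the sample dictionary are hypothetical and the code does not reproduce the procedure actually used for SentIta.

```python
def mente_adverb(adjective: str) -> str:
    """Derive the -mente adverb from a regular Italian adjective lemma."""
    if adjective.endswith("o"):
        return adjective[:-1] + "amente"      # impetuoso -> impetuosamente
    if (len(adjective) > 2 and adjective.endswith(("le", "re"))
            and adjective[-3] in "aeiou"):
        return adjective[:-1] + "mente"       # facile -> facilmente
    return adjective + "mente"                # felice -> felicemente


def derive_adverbs(adjectives: dict[str, str]) -> dict[str, str]:
    """Give each derived adverb the tag of the adjective it comes from."""
    return {mente_adverb(adj): tag for adj, tag in adjectives.items()}


# Hypothetical tags, used only to show the inheritance step.
sample = {"felice": "+POS", "facile": "+POS", "impetuoso": "+NEG+DEB"}
print(derive_adverbs(sample))
```

A derivation of this kind only proposes candidate entries: forms that do not exist or that lose the orientation of the base adjective would still need to be filtered.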

3.5.1 Literature Survey on Sentiment Lexicon Propagation

The works on sentiment lexicon propagation follow three main research lines. The first one is grounded on the richness of already existing thesauri, WordNet (Miller 1995) among others. Although WordNet does not include semantic orientation information for its lemmas, semantic relations such as synonymy or antonymy are commonly used to automatically propagate polarity, starting from a manually annotated set of seed words. However, this approach presents some drawbacks, such as the lack of scalability, the unavailability of sufficient resources for many languages, including Italian, and the difficulty of handling newly coined words, which are not yet contained in the thesauri. Furthermore, homographs that belong to different synsets could present a diversity of meanings and polarities (Silva et al. 2010).

The second line of research is based on the hypothesis that words conveying the same polarity appear close to each other in the same corpus, so the propagation can be performed on the basis of co-occurrence algorithms.

Finally, the morphological approach employs morphological structures and relations to assign prior sentiment polarities to unknown words, on the basis of the manipulation of the morphological structures of known lemmas.

In the following paragraphs we will briefly describe the most interesting contributions pertaining to the research areas just described.
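The first research line, in which polarity spreads from seed words along synonymy and antonymy links, can be illustrated with a minimal breadth-first propagation. The word pairs below are toy data and not WordNet content, and the procedure is a generic sketch rather than the algorithm of any of the works surveyed here.

```python
from collections import deque


def propagate(seeds: dict[str, int], synonyms, antonyms) -> dict[str, int]:
    """Spread seed polarities: synonyms keep the sign, antonyms reverse it."""
    edges: dict[str, list[tuple[str, int]]] = {}
    for pairs, sign in ((synonyms, 1), (antonyms, -1)):
        for a, b in pairs:
            edges.setdefault(a, []).append((b, sign))
            edges.setdefault(b, []).append((a, sign))

    polarity = dict(seeds)
    queue = deque(seeds)                      # start from the seed words
    while queue:
        word = queue.popleft()
        for other, sign in edges.get(word, []):
            if other not in polarity:         # first label reached wins
                polarity[other] = polarity[word] * sign
                queue.append(other)
    return polarity


result = propagate(
    seeds={"good": 1},
    synonyms=[("good", "fine"), ("fine", "nice"), ("bad", "awful")],
    antonyms=[("good", "bad")],
)
print(result)
```

The sketch also shows one of the drawbacks mentioned above: words that are not connected to any seed in the graph, such as newly coined ones, never receive a polarity.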

3.5.1.1 Thesaurus Based Approaches

Kamps et al. (2004) investigated the graph-theoretic model of WordNet’s synonymy relation and, using elementary notions from graph theory, proposed a measure for the automatic attribution of semantic orientation to adjectives. All the words listed in WordNet have been collected and related on the basis of their occurrence in the same synset.

Although current WordNet-based measures of distance or similarity usually focus on taxonomic relations and, in general, include the antonymy relation in their algorithms, the authors preferred to use only the synonymy relation.

Hu and Liu (2004), given their specific interest in product features, chose to limit the lexicon construction to the adjectives occurring in sentences that contained one or more product features.

In their research, in order to predict the semantic orientations of such adjectives, the authors proposed a simple and effective method grounded on the adjective synonym and antonym sets discovered by surfing the WordNet graphs.

Kim and Hovy (2004) proposed a system for the automatic detection of opinion holders, topics and features for each opinion, which contains two modules: one for the determination of word sentiment and another for sentence sentiment.

In their system architecture, the stage relevant to the task under examination is the one in which the word sentiment classifier individually calculates the polarity of all the sentiment-bearing words. Their basic approach started from a small number of seed words, sorted by polarity into a positive and a negative list. WordNet has been used to lengthen the initial lists on the basis of two very intuitive insights: synonyms of polar words usually preserve the orientation of the original lemma, and antonyms of polar words reverse it.

Esuli and Sebastiani (2005) presented a semi-supervised learning method for identifying the orientation of subjective terms, based on the classification of their glosses.

The authors started from a representative seed set built for both the positive and negative categories and from some lexical relations defined in WordNet (e.g. synonymy, hypernymy and hyponymy for the annotation of terms with the same orientation, and direct and indirect antonymy for the terms with opposite orientation). Every time a new term is added to the original ones, it becomes part of the training set for the learning phase, performed with a binary text classifier. All the glosses of a given term found in a machine-readable dictionary are converted into a vectorial form thanks to standard text indexing techniques.

Esuli and Sebastiani (2006b) tested the following approaches:

• a two-stage method, in which a first classifier groups the terms into the Subjective or Objective categories and another classifier tags the Subjective terms as Positive or Negative;

• two further binary classifiers, based on learning algorithms, which categorise the terms belonging to the Positive/not-Positive categories and the ones fitting the Negative/not-Negative categories;

• a method that simply considers Positive, Negative and Objective as three categories with equal status.

Argamon et al. (2009) grounded on the Appraisal Theory a method for the automatic determination of complex sentiment-related attributes (e.g. type and force). The authors applied supervised learning to WordNet glosses. The method started from small training sets of (positive or negative) seed words and then added to them new sets of terms collected in the WordNet graph along the synonymy and antonymy relations. This way, based on the WordNet glosses, each term can receive a vectorial representation thanks to standard text indexing techniques.

Dragut et al. (2010) based their research on a few basic assumptions:

• the semantic relations between words with known polarities can be used to enlarge sentimental word dictionaries;

• two words are related if they share at least a synset;

• each synset has a unique polarity;

• the deduction can be performed just on a few WordNet semantic relations, e.g. hyponymy, antonymy and similarity.

Thanks to these intuitions, given a sentimental seed dictionary, they deduced approximately an additional 50% of words endowed with a specific polarity on top of the electronic dictionary WordNet.

Hassan and Radev (2010) applied a Markov random walk model to a large word relatedness graph, with the purpose of producing a polarity estimate for subjective words.
In detail, two nodes in their network are linked if they are semantically related. Among the sources of information used as indicators of the relatedness of words we mention WordNet.
The hypothesis of the work is the following: a random walk that starts from a given word is more likely to first hit a word with the same semantic orientation than a word with a different orientation.

Paulo-Santos et al. (2011), with a semi-supervised approach, created a small polarity lexicon (3000+ entries, 84.86% accuracy) for the Portuguese language, having as input only a limited collection of resources: a set of ten seed words labelled as positive or negative, a common online dictionary, a set of extraction rules and an intuitive polarity propagation algorithm.
The authors implemented their task by converting the online dictionary into a directed graph, in which nodes correspond to words and edges correspond to synonyms, antonyms or other semantic relations between words. Thus, they propagated the polarities of the known seed set to unlabelled words by applying a breadth-first graph traversal.

In their research, Maks and Vossen (2011) explored two approaches for the annotation of the polarity (positive, negative and neutral) of adjective synsets in Dutch, using WordNet as lexical resource. The first approach is based on the translation of the English SentiWordNet 1.0 (Esuli and Sebastiani, 2006b) into Dutch and the respective transfer of the lemmas’ polarity values. In the second approach, WordNet is used in combination with a propagation algorithm that starts with a seed list of synsets of known sentiment and extends the polarity values through WordNet, making use of its lexical relations.
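The seed-based propagation shared by several of the works above can be illustrated with a minimal sketch. It is not the implementation of any of the cited authors: the toy graph, the relation labels `syn` and `ant` and the function names are assumptions made for illustration. Synonyms inherit the polarity of the word from which they are reached, antonyms receive the opposite one, and the graph is visited breadth-first.

```python
from collections import deque

# Toy lexical graph (hypothetical data): each edge carries a relation label.
EDGES = [
    ("good", "nice", "syn"),
    ("nice", "pleasant", "syn"),
    ("good", "bad", "ant"),
    ("bad", "awful", "syn"),
    ("pleasant", "unpleasant", "ant"),
]


def build_graph(edges):
    """Store every relation in both directions."""
    graph = {}
    for a, b, rel in edges:
        graph.setdefault(a, []).append((b, rel))
        graph.setdefault(b, []).append((a, rel))
    return graph


def propagate(seeds, edges):
    """Breadth-first propagation: synonyms keep the polarity, antonyms flip it."""
    graph = build_graph(edges)
    polarity = dict(seeds)
    queue = deque(seeds)
    while queue:
        word = queue.popleft()
        for neighbour, rel in graph.get(word, []):
            if neighbour in polarity:
                continue  # the first label reached is kept
            polarity[neighbour] = polarity[word] if rel == "syn" else -polarity[word]
            queue.append(neighbour)
    return polarity


lexicon = propagate({"good": 1}, EDGES)
```

Starting from the single seed `good`, the sketch labels `nice` and `pleasant` as positive and `bad`, `awful` and `unpleasant` as negative. A real system would also have to deal with conflicting paths and with homographs belonging to different synsets.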

3.5.1.2 Corpus Based Approaches

Baroni and Vegnaduzzo (2004) observed the co-occurrences of polar adjectives in subjective texts. The ranking of a large list of adjectives according to a subjectivity score has been implemented without any knowledge-intensive external resource (e.g. lexical databases, human annotation, etc.). The authors made use just of an unannotated list of adjectives and a small seed set of manually selected subjective adjectives.
The main idea of this work is taken from the work of Turney (2001) on co-occurrence statistics applied to the web as a corpus through the Web-based Mutual Information (WMI) method. Baroni and Vegnaduzzo (2004) obtained their subjectivity scores by computing the Mutual Information of pairs of adjectives taken from each set, using frequency and co-occurrence frequency counts on the World Wide Web, collected through queries to the AltaVista search engine.

Kanayama and Nasukawa (2006b) proposed an unsupervised method of lexicon construction for the annotation of polar clauses for domain-oriented Sentiment Analysis.
They grounded their work on unannotated corpora and anchored their research on context coherency (the tendency for equal polarities to appear successively in contexts), achieving very satisfying results in terms of Precision.

Qiu et al. (2009) proposed a double propagation method, which involved different kinds of relations between both sentiment words and features (words modified by the sentiment words).
With this method it has been possible to locate specific sentiment words in relevant reviews, starting from a small set of seed sentiment words. Their research emphasized the fact that, in reviews, sentiment words always occur in association with features.
Double propagation basically means that both sentiment words and features can be reused to extract new sentiment words and new features, until no more sentiment words or features can be identified to continue the process. The relations are described above all by Dependency Grammars and trees (Tesnière, 1959), which represented the basis for the design of the extraction rules.
In detail, the polarity of the new words is inferred using the following rules:

• Heterogeneous rule: the same polarity of known words is assigned to the novel words;

• Homogeneous rule: the same polarity of known words is assigned to the novel words, unless contrary words occur between them;

• Intra-review rule: in the cases in which the polarity cannot be inferred by dependency cues, it is assumed that the sentiment word takes the polarity of the whole review.

Maks et al. (2012) proposed a method for the Dutch language based on the idea that the words that express different types of subjectivity are distributed differently depending on the text type; therefore, they grounded their work on the comparison between three corpora: Wikipedia, News and News comments.
In order to perform their task, they used and tested two different calculations generally employed in Keyword Extraction tasks to identify the words that are significantly frequent in a corpus with respect to other corpora: the Log-likelihood Technique and the Percentage Difference Calculation (DIFF). Their research revealed the better performance of the DIFF calculation.

Wawer (2012) introduced, for the Polish language, a novel iterative technique of sentiment lexicon expansion, which involved rule mining and contrast sets discovery. The research took advantage of two different textual resources: in a first stage, a small corpus endowed with morpho-syntactic annotations (the National Corpus of Polish), used to acquire candidates for emotive patterns and to evaluate the vocabulary; then, the whole web as a corpus, used to perform the lexical expansion.
The key idea of this work is that, if a word appears in the same extraction patterns as the seeds, then it belongs to the same semantic class.
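The co-occurrence hypothesis behind the corpus-based approaches of this Paragraph can be sketched with a small Mutual Information computation over raw counts. The counts, the word pairs and the averaging over seeds are hypothetical choices made for illustration; they do not reproduce the exact scoring of Baroni and Vegnaduzzo (2004) or of the other works cited above.

```python
import math


def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information from raw counts (-inf if the pair never co-occurs)."""
    if count_xy == 0:
        return float("-inf")
    return math.log2((count_xy * total) / (count_x * count_y))


def subjectivity_score(candidate, seeds, freq, cofreq, total):
    """Average PMI between a candidate adjective and a set of seed adjectives."""
    scores = [
        pmi(cofreq.get(frozenset((candidate, seed)), 0), freq[candidate], freq[seed], total)
        for seed in seeds
    ]
    return sum(scores) / len(scores)


# Hypothetical frequency and co-occurrence counts.
TOTAL = 100_000
FREQ = {"bello": 1000, "orribile": 400, "elettrico": 800}
COFREQ = {
    frozenset(("orribile", "bello")): 40,
    frozenset(("elettrico", "bello")): 8,
}

score_orribile = subjectivity_score("orribile", ["bello"], FREQ, COFREQ, TOTAL)
score_elettrico = subjectivity_score("elettrico", ["bello"], FREQ, COFREQ, TOTAL)
```

With these toy counts the candidate that co-occurs with the subjective seed more often than chance would predict (`orribile`) obtains a higher score than the one whose co-occurrence matches chance (`elettrico`, whose PMI is zero).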

3.5.1.3 The Morphological Approach

Hatzivassiloglou and McKeown (1997) proposed a method for sentiment lexicon expansion based on morphological relations between adjectives. They demonstrated that adjectives related in form almost always have different semantic orientations, e.g. “adequate-inadequate”, “thoughtful-thoughtless”, achieving 97% accuracy.

Moilanen and Pulman (2008) proposed five methods for the assignment of prior sentiment polarities to unknown words, based on known sentiment carriers. They started from a core sentiment lexicon, which contained 41109 entries tagged with positive (+), neutral (N) or negative (-) prior polarities, and then employed a classifier society of five rule-driven classifiers, each of which adopted a specific analytical strategy:

Conversion, which estimated zero-derived paronyms by retagging the unknown words with different POS tags.

Morphological derivation, which relied on derivational and inflectional morphology and was based on the progressive transformation of the words into shorter paronymic aliases, by using affixes and neo-classical combining forms. Some polarity reversal affixes (e.g. -less and not-so-) were also supported.

Affix-like Polarity Markers, which computed affix-like sentiment markers (e.g. well-built, badly-behaving) that usually propagate their sentiment orientation across the whole word.

Syllables, which split unknown words into individual syllables with a rule-based syllable chunker and then processed them in order to connect them with the words contained in the original lexicon.

Shallow Parsing, which split unknown words into virtual sentences and POS-tagged them.

Ku et al. (2009) employed morphological structures and relations for opinion analysis on Chinese words and sentences. They classified the words into eight morphological types through Support Vector Machine (SVM) and Conditional Random Field (CRF) classifiers.
Chinese morphological formative structures consist of three major processes: compounding, affixation and conversion. Every word is composed of one or more Chinese characters, on the basis of which the word’s meaning can be interpreted.
The authors demonstrated that the injection of morphological information can truly improve the performance of the word polarity detection task.

Neviarouskaya (2010) proposed the expansion of the SentiFul lexicon, grounding the task on the manipulation of the morphological structure (base and affixes) of known lemmas (Plag, 2003).
The derivation process is a linguistic device for the creation of new lemmas from known ones by adding prefixes or suffixes.
The results of the application of this method are 4000+ new derived and scored sentiment words (1400+ adjectives, almost 500 adverbs, 1800 nouns and almost 350 verbs).
The author distinguished four types of affixes on the basis of their roles in the sentiment feature attribution:

Propagating: the sentiment features are preserved and inherited by the new derived lexical units, which often belong to different grammatical categories (e.g. en-, -ous, -fy). The scoring function transfers the original polarity score to the new word without any variation.

Reversing: the affixes have the effect of changing the semantic orientation of the original lexeme (e.g. dis-, -less). The scoring function simply switches the original score of the starting lemma.

Intensifying: the sentiment features are strengthened (e.g. over-, super-). The scoring function increases the score of the new word with respect to the score of the original lemma.

Weakening: the sentiment features are decreased (e.g. semi-), so the score of the derived word is proportionally reduced by the scoring function.
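The four affix roles listed above can be read as four small scoring functions. The following sketch is only an illustration of that reading: the signed score in [-1, 1], the scaling factor of 1.5 and the names `AFFIX_ROLE` and `derived_score` are assumptions, not the actual SentiFul scoring functions.

```python
# Affixes taken from the examples above; the role table itself is a toy.
AFFIX_ROLE = {
    "en-": "propagating", "-ous": "propagating", "-fy": "propagating",
    "dis-": "reversing", "-less": "reversing",
    "over-": "intensifying", "super-": "intensifying",
    "semi-": "weakening",
}


def derived_score(base_score, affix, factor=1.5):
    """Score of a derived word from the signed score of its base (assumed range [-1, 1])."""
    role = AFFIX_ROLE[affix]
    if role == "propagating":
        return base_score                      # inherited without variation
    if role == "reversing":
        return -base_score                     # orientation switched
    if role == "intensifying":
        return max(-1.0, min(1.0, base_score * factor))  # strengthened, then clipped
    return base_score / factor                 # weakening: proportionally reduced
```

For instance, under these assumptions a base scored 0.4 keeps 0.4 with `-ous`, becomes -0.4 with `-less`, rises to about 0.6 with `over-` and drops to about 0.27 with `semi-`.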

In the approach of Neviarouskaya (2010), if a word is not already contained in the SentiFul lexicon, the algorithm checks the presence of the missing lemma in the WordNet database: if the word actually exists there, it is taken into account for a future inclusion in the SentiFul lexicon. Its sentiment orientation is always deduced from the features of the starting word and of the affixes used to derive it.

Neviarouskaya et al. (2011) described methods to automatically generate and score a new sentiment lexicon, SentiFul, and to expand it through direct synonymy and antonymy relations, hyponymy relations, derivation and compounding with known lexical units.
The authors distinguished four types of affixes used to derive new words, depending on the role they play with regard to sentiment features: propagating, reversing, intensifying and weakening. They elaborated an algorithm for the automatic extraction of new sentiment-related compounds from WordNet, using words from SentiFul as seeds for sentiment-carrying base components and applying the patterns of compound formation.

Wang et al. (2011) proposed a morpheme-based fine-to-coarse strategy for Chinese sentence-level sentiment classification, which used pre-existing sentiment dictionaries in order to extract sentiment morphemes and calculate their polarity intensity. Such morphemes are then reused to evaluate the sentence-level semantic orientation, on the basis of the sentiment phrases and their relevant polarity scores.
The authors preferred morphemes to words as the basic tokens for sentiment classification because they are much less numerous than words.
Wang et al. (2011) derived the semantic orientation of unknown sentiment words on the basis of their component morphemes (Yuen et al., 2004; Ku et al., 2009), thanks to a specific morpheme segmentation module which applied the Forward Maximum Matching (FMM) word segmentation technique for the decomposition of words into strings of morphemes. Their approach can manage both unknown lexical sentiments and contextual sentiments for sentence-level sentiment classification, by taking into consideration sentiments at multiple granularity levels (e.g. sentiment morphemes, sentiment words and sentiment phrases) in a morpheme-based framework.
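Forward Maximum Matching, mentioned above, is a greedy left-to-right segmentation technique: at each position the longest string found in the lexicon is taken. The sketch below shows the general technique only; the tiny lexicon, the `max_len` parameter and the fallback to single characters are illustrative assumptions, not details of the module of Wang et al. (2011).

```python
def fmm_segment(text, lexicon, max_len=4):
    """Greedy left-to-right segmentation: at each position take the longest lexicon match."""
    segments = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink it one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:  # single characters are always accepted
                segments.append(piece)
                i += size
                break
    return segments


# Toy example: "不" (not) + "高兴" (happy).
morphemes = fmm_segment("不高兴", {"高兴"})
```

In the toy example the string is decomposed into the negative morpheme and the positive base, which is the kind of decomposition a morpheme-based polarity computation needs. Being greedy, the technique can prefer a long match that blocks a better segmentation later in the string.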

3.5.2 Deadjectival Adverbs in -mente

As anticipated, this Section aims to enlarge the size of SentIta on the basis of the morphological relations that connect the words and their meanings. In a first stage of the work, more than 5000 labeled adjectives have been used to predict the orientation of the adverbs with which they are morphologically related. This Section explores the rules formalized to perform the task.


Adverbs are morphologically invariable and, consequently, they do not present any inflection. However, the majority of them are characterized by a complex structure that includes an adjective base and the derivational morpheme -mente “-ly”:

[[veloce]A -mente]AVV “fast-ly”

[[fragile]A -mente]AVV “delicate-ly”

[[rapido]A -mente]AVV “rapid-ly”

Therefore, thanks to a morphological FSA, in SentIta the dictionary of sentiment adverbs has been derived from that of the adjectives (see Figure 3.6). All the adverbs contained in the Italian dictionary of simple words have been put in a Nooj text, and the above-mentioned grammar has been used to quickly populate the new dictionary by extracting the words ending with the suffix -mente “-ly” (Ricca, 2004b) and by making such words inherit the adjectives’ polarity.
The Nooj annotations consisted of a list of 3200+ adverbs that, at a later stage, has been manually checked in order to correct the grammar’s mistakes (e.g. vigorosamente “vigorously” and pazzamente “madly”, which respectively come from the positive adjective “vigorous” and the negative adjective “mad”, become intensifiers as adverbs) and to add the Prior Polarity to the adverbs that did not end with the suffixes used in the grammar (e.g. volentieri “gladly”, controvoglia “unwillingly”).
The manual check produced a set of 3600+ adverbs of sentiment, which has been completed with 126 adverbs that did not end in -mente.
In detail, the Precision achieved in this task is 99% and the Recall is 88%. The distribution of semantic tags in this dictionary is reported in Table 3.27.

Tag            Automatically generated   Manually adjusted
POS+FORTE      82                        89
POS            743                       779
POS+DEB        61                        55
NEG+DEB        436                       456
NEG            1273                      1466
NEG+FORTE      160                       158
FORTE          366                       482
DEB            84                        163
SINTESI        0                         9
CONFERMA       0                         17
INVERTE        0                         19
Tot            3205                      3693
Tot Int        450                       645
Tot Sent       2755                      3003
Tot Discorso   0                         42

Table 3.27: Composition of the adverb dictionary

Figure 3.6 shows the local grammar in which we formalized the rules for the adverb formation. We started the word recognition by anchoring it on the localization of an adjective stem (see the first node on the left, in the variable Agg). Then the automaton continues the word analysis by checking that the adjective stem belongs to the sentiment dictionary (e.g. $Agg=A+POS+DEB). Figure 3.6 represents just an extract of the bigger grammar, which includes not only the positive (POS) adverbs but also the negative ones.

Figure 3.6: Extract of the FSA for the automatic annotation of sentiment adverbs

Notice, in the center of the FSA, a multiplication of paths. This depends on the different inflectional classes of the adjectives to which the suffix -mente is attached (Table 3.28).
The rules used in the grammar to derive the adverbs are given in the following (see Tables 3.28 and 3.29):

• class 2, first path: nothing in the adjective changes (e.g. veloc-e, veloce-mente);

• class 2, second path: in the adjectives ending in -re, -le, the -e is deleted [e] (e.g. fragil-e, fragil-mente);

• class 1, third path: the -o is deleted [o] and substituted by the thematic vowel -a- (e.g. rapid-o, rapid-a-mente).

Adj Class   Number=s, Gender=m   Number=s, Gender=f   Number=p, Gender=m   Number=p, Gender=f
1           -o                   -a                   -i                   -e
2           -e                   -e                   -i                   -i

Table 3.28: Adjective’s inflectional classes (Salvi and Vanelli, 2004)

Adj Class   Adjective   Deletion   Adverb Stem   Thematic Vowel   Suffix
1           rapid-o     o          rapid-        -a-              -mente
2           veloc-e     -          veloce-       -                -mente
2           fragil-e    e          fragil-       -                -mente

Table 3.29: Rules for the adverb derivation
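The three derivation paths can be mimicked outside Nooj with a few string rules. The sketch below is only an approximation of the local grammar: the function names and the two tagged adjectives are hypothetical, and, as in the grammar, existing adverbs are annotated rather than generated from scratch.

```python
def derive_adverb(adjective):
    """Build the -mente adverb from a masculine singular adjective (toy rules)."""
    if adjective.endswith("o"):             # class 1: -o replaced by the thematic vowel -a-
        return adjective[:-1] + "a" + "mente"
    if adjective.endswith(("re", "le")):    # class 2 in -re, -le: the final -e is deleted
        return adjective[:-1] + "mente"
    if adjective.endswith("e"):             # class 2: nothing changes
        return adjective + "mente"
    raise ValueError(f"unsupported adjective ending: {adjective}")


def annotate_adverbs(adverbs, adjective_tags):
    """Give each known adverb the tag of the adjective it can be derived from."""
    derived = {derive_adverb(adj): tag for adj, tag in adjective_tags.items()}
    return {adv: derived[adv] for adv in adverbs if adv in derived}


# Hypothetical tags, used only to show the inheritance step.
tags = annotate_adverbs(
    ["rapidamente", "fragilmente", "volentieri"],
    {"rapido": "POS", "fragile": "NEG+DEB"},
)
```

Here `volentieri` receives no tag, since it is not a -mente formation and has to be handled manually. Exceptions that lack a dedicated rule are derived wrongly by such a sketch: for `violento` it returns `violentamente` instead of the correct form with the thematic vowel -e-.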

Actually, almost nothing about the inflection of the base adjectives has been specified in the FSA. Because the adverbs of sentiment must be identified among the whole list of Nooj adverbs and semantically annotated (and not generated from scratch), we did not find it necessary to recall all the inflectional classes of the base adjectives: just the deletion or the conservation of the final vowel was used to select the correct derivational rule.
Mistakes in the annotation concern those exceptions that were not productive enough to deserve a specific path in the local grammar. Examples are the adjectives in -lento: although they have -o as last vowel, they do not require the thematic vowel -a- but -e- (e.g. violento becomes violent-e-mente rather than violent-a-mente).

3.5.2.1 Semantics of the -mente formations

The meaning of the deadjectival adverbs in -mente is not always predictable starting from the base adjectives from which they are derived. The syntactic structures in which they occur also influence their interpretation. Depending on their position in sentences, the deadjectival adverbs can be described as follows:

Adjective modifiers: they modify adjectives or other adverbs, even though two adverbs in -mente almost never appear next to each other.

• degree modifiers

– intensifiers, e.g. altamente “highly”, enormemente “enormously”

– downtoners, e.g. moderatamente “moderately”, parzialmente “partially”

Predicate modifiers: they can be paraphrased as in an A way, where A is the adjective base.

• verb arguments, e.g. Maria si comporta perfettamente “Maria acts perfectly”

• extranuclear elements, e.g. Maria si reca settimanalmente a Milano “Maria goes weekly to Milan”

Sentence modifiers: they usually do not modify the sentence content, but they often give it coherence or work as discourse signals.

• evaluative adverbs, e.g. stupidamente “foolishly”

• modal adverbs, e.g. necessariamente “necessarily”

• circumstantial adverbs of time, e.g. ultimamente “lately”

• quantifiers over time, e.g. frequentemente “often”

• domain adverbs, e.g. politicamente “politically”


The French LG description of adverbs in -ment includes 3200+ items, represented in LG tables which classify the adverbs in a similar way. They distinguish sentential adverbs, e.g. conjuncts, style disjuncts and attitude disjuncts (which include evaluative adverbs, adverbs of habit, modal adverbs and subject-oriented attitude adverbs), from adverbs integrated into the sentence, e.g. adverbs of subject-oriented manner, adverbs of verbal manner, adverbs of quantity, adverbs of time, viewpoint adverbs and focus adverbs (Molinier and Levrier, 2000; Sagot and Fort, 2007).

3.5.2.2 The Superlative of the -mente Adverbs

As concerns the superlative form of the adverbs in -mente, it must be underlined that they have been treated as adverbs derived from the superlative form of the adjectives. In fact, the rule for the adverb formation is selected by the inflectional paradigm of the superlative form and not by the adjective inflection. Moreover, the semantic orientation inherited by the superlative adverb is again the one belonging to the superlative adjective, and not the one of the adjective itself. Therefore, the FSA for the superlative adverbs is almost the same as the one shown in Figure 3.6: the only difference is in the recognition of the base adjective, which is A+SUP rather than only A.

3.5.3 Deadjectival Nouns of Quality

The derivation of quality nouns (QN) from qualifier adjectives is another derivation phenomenon of which we took advantage for the automatic enlargement of SentIta. These kinds of nouns allow one to treat as entities the qualities expressed by the base adjectives.
A morphological FSA (Figure 3.7), following the same idea as the adverbs grammar, matches in a list of abstract nouns the stems that are in morpho-phonological relation with our list of hand-tagged adjectives. Because the nouns, differently from the adverbs, need to have their inflection information specified, we associated with each suffix entry in the QN suffixes electronic dictionary the inflectional paradigm that it gives to the words with which it occurs (see Table 3.30).

Figure 3.7: Extract of the FSA for the automatic annotation of quality nouns

Suffix (SFX)   Inflectional paradigm (FLX)   Correct matches   Errors   Precision (%)
-edin-e        N46                           0                 0        -
-età           N602                          0                 0        -
-izie          N602                          0                 0        -
-el-a          N41                           0                 1        0
-udine         N46                           5                 2        71.43
-or-e          N5                            36                9        80
-zion-e        N46                           359               59       85.89
-anz-a         N41                           57                9        86.36
-itudin-e      N46                           13                2        86.67
-ur-a          N41                           142               20       87.65
-ment-o        N5                            514               58       89.86
-izi-a         N41                           14                1        93.33
-enz-a         N41                           148               10       93.67
-eri-a         N41                           71                4        94.67
-ietà          N602                          27                1        96.43
-aggin-e       N46                           72                2        97.3
-i-a           N41                           145               3        97.97
-ità           N602                          666               13       98.09
-ezz-a         N41                           305               2        99.35
-igi-a         N41                           3                 0        100
-z-a           N41                           2                 0        100
tot            -                             2579              196      92.94

Table 3.30: Error analysis of the automatic QN annotation

As regards the suffixes used to form the quality nouns (Rainer, 2004), it must be said that they generally make the new words simply inherit the orientation of the adjectives from which they are derived. Exceptions are -edine and -eria, which almost always shift the polarity of the quality nouns into the weakly negative one (-1), e.g. faciloneria “slapdash attitude”. The suffix -mento also differs from the others, in so far as it belongs to the derivational phenomenon of the deverbal nouns of action (Gaeta, 2004). It has been possible to use it in our grammar for the deadjectival noun derivation by using the past participles of the verbs listed in the adjective dictionary of sentiment (e.g. V sfinire “to wear out”, A sfinito “worn out”, N sfinimento “weariness”). Basically, we chose to include it in our list of suffixes because of its productivity. This caused an overlap of 475 nouns derived from both the psychological predicates and the qualifier adjectives of sentiment. Just 185 psych nominalizations exceeded the coverage of our morphological FSA, proving the power of our methodology also in terms of Recall.
In about half the cases the annotations differed just in terms of intensity; the other mistakes also affected the orientation (see Table 3.33). In general, the Precision achieved in this task is 93%, while the Precision of the automatic tagging of the QN, compared with the manual tagging made on the corresponding nominalizations of the psych verbs, is summarized in Table 3.34.

Correspondences   Differences   Different Orientation   Only Different Intensity
475               185           93                      92

Table 3.33: About the overlap among Psych V-n dictionary and QN dictionary

               Precision   Precision (Orientation only)
QN<>Psy V-n    0.92        -
QN=Psy V-n     0.61        0.80

Table 3.34: Precision measure about the overlap of QN and Psy V-n

Tables 3.31 and 3.32 show the differences in productivity of all the suffixes (SFX) among the electronic dictionary of abstract nouns (Abstr), the sublist of QN that have their counterparts in the Psy V-n list (Psy V-n=QN) and the whole collection of QN (QN tot).
As regards the productivity of the suffixes in Abstr, the suffixes that achieved the highest percentage values are -ità, -ia and -(z)ione. The most productive suffixes for the Psy V-n=QN formation are -(z)ione, -mento and -ezza, while the most productive ones for QN tot are -ità, -(z)ione and -mento.

Suffix (SFX)   Suffix in Abstr   Suffix in Psy V-n=QN   Suffix in QN
-ezz-a         498               73                     305
-aggin-e       149               13                     72
-itudin-e      31                1                      13
-ietà          65                2                      27
-igi-a         9                 0                      3
-izi-a         40                2                      14
-enz-a         461               11                     148
-anz-a         189               7                      57
-eri-a         267               13                     71
-ità           3263              53                     666
-or-e          297               6                      36
-zion-e        2866              87                     359
-ur-a          1700              11                     142
-udin-e        39                3                      5
-i-a           3545              13                     145
-el-a          15                0                      0
-edin-e        4                 0                      0
-età           66                0                      0
-izie          3                 0                      0
-z-a           30                2                      2
-ment-o        2520              150                    514

Table 3.31: Presence of the QN suffixes in different dictionaries

Suffix (SFX)   Suffix in Abstr (%)   Suffix in Psy V-n=QN (%)   Suffix in QN (%)
-ezz-a         3.1                   16.33                      6.28
-aggin-e       0.93                  2.91                       1.48
-itudin-e      0.19                  0.22                       0.27
-ietà          0.4                   0.45                       0.56
-igi-a         0.06                  0                          0.06
-izi-a         0.25                  0.45                       0.29
-enz-a         2.87                  2.46                       3.05
-anz-a         1.18                  1.57                       1.17
-eri-a         1.66                  2.91                       1.46
-ità           20.32                 11.86                      13.72
-or-e          1.85                  1.34                       0.74
-zion-e        17.85                 19.46                      7.4
-ur-a          10.59                 2.46                       2.93
-udin-e        0.24                  0.67                       0.1
-i-a           22.08                 2.91                       2.99
-el-a          0.09                  0                          0
-edin-e        0.02                  0                          0
-età           0.41                  0                          0
-izie          0.02                  0                          0
-z-a           0.19                  0.45                       0.04
-ment-o        15.69                 33.56                      10.59

Table 3.32: Productivity of the QN suffixes in different dictionaries
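The Precision column of the error analysis table (Table 3.30) can be recomputed from the counts of correct matches and errors. The short sketch below assumes the usual definition, correct matches divided by all the matches proposed by the grammar; the function name and the choice of rows are illustrative.

```python
def precision(correct, errors):
    """Percentage of correct matches among all matches; None when nothing was matched."""
    total = correct + errors
    if total == 0:
        return None
    return 100 * correct / total


# (correct matches, errors) for a few rows of the error analysis table.
ROWS = {"-udine": (5, 2), "-zion-e": (359, 59), "-ezz-a": (305, 2), "tot": (2579, 196)}
PRECISION = {sfx: round(precision(c, e), 2) for sfx, (c, e) in ROWS.items()}
```

Under this definition the recomputed values agree with the reported ones, e.g. 71.43 for -udine and 92.94 for the total, while suffixes without any match have no defined Precision.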

3.5.4 Morphological Semantics

Our morphological FSA can, moreover, interact with a list of prefixes able to negate (e.g. anti-, contra-, non-, etc.) or to intensify/downtone (e.g. arci-, semi-, etc.) the orientation of the words in which they appear (Iacobini, 2004).
While the suffixes for the creation of quality nouns interact with the preexisting dictionaries of the Italian module of Nooj in order to automatically tag them with new semantic descriptions, the prefixes treated in this Section work directly on opinionated documents, so that the machine can understand the actual orientation of the words occurring in real texts, also when their morphological context shifts the polarity of the words listed in the dictionaries.
The collection of prefixes, which from now on we will call Morphological Contextual Valence Shifters (mCVS), is reported in Tables 3.35 and 3.36.

Prefixes   Negation types
non-       CDD
mis-       CR
a-         CR+PRV
dis-       CR+PRV
in-        CR+PRV
s-         CR+PRV
anti-      OPP
contra-    OPP
contro-    OPP
de-        PRV
di-        PRV
es-        PRV

Table 3.35: Negation prefixes

They are endowed with special tags that specify the way in which they alter the meaning of the sentiment words with which they occur:

• FORTE “strong”: intensifies the Semantic Orientation of the words, making their polarity increase by one position in the evaluation scale (first path in Figure 3.8);

• DEB “weak”: downtones the Semantic Orientation of the words, making their polarity decrease by one position in the evaluation scale (second path in Figure 3.8);

• NEGAZIONE “negation”: works following the same rules of the negation formalised in the syntactic grammars (third path in Figure 3.8):

– CR “contrary”

– OPP “opposition”

– PRV “deprivation”

– CDD “contradiction”

Intensifiers   Downtoners
iper-          bis-
macro-         fra-
maxi-          infra-
mega-          intra-
multi-         ipo-
oltre-         micro-
pluri-         mini-
poli-          para-
sopra-         semi-
sovra-         sotto-
stra-          sub-
super-         -
sur-           -
ultra-         -

Table 3.36: Intensifier/downtoner prefixes

Figure 3.8 shows the morphological FSA that combines the polarity and intensity of the adjectives of sentiment with the meaning carried by the mCVS.
The shifting rules in terms of polarity score, which are the same exploited in the syntactic grammar net (see Section 4), are summarized in this grammar through the green comments on the right side of the automaton.

Figure 3.8: Morphological Contextual Valence Shifting of Adjectives

As regards the manipulation of the annotations of the resulting words, we followed three parallel solutions:

• when the score does not change (e.g. the case of the intensification of words with starting polarity +3, or the case in which words with polarity -1 are decreased), the resulting word simply inherits the inflectional ($2F) and the syntactic and semantic ($2S) information of the original word;

• when the intensity changes and the polarity remains steady (e.g. when the words with polarity +/-2 are intensified or downtoned), the resulting word inherits the inflectional, syntactic and semantic information of the original word, but also obtains the tags FORTE/DEB;

• when both the intensity and the polarity change (in almost every case in which the words are negated), the resulting word just inherits the inflectional information, while the semantic tags are added from scratch.
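The first two cases above (intensification and downtoning) can be sketched as arithmetic on a signed scale. The sketch assumes an evaluation scale from -3 to +3 in which FORTE corresponds to magnitude 3 and DEB to magnitude 1; the prefix subsets, the toy lexicon and the function names are hypothetical, and negation, which follows the rules of the syntactic grammars, is deliberately left out.

```python
INTENSIFIERS = ("stra", "ultra", "super", "iper")   # subset of the intensifying mCVS
DOWNTONERS = ("semi", "mini", "micro")              # subset of the downtoning mCVS


def shift(score, kind):
    """Move a non-zero score one position away from zero (FORTE) or toward it (DEB)."""
    if score == 0:
        return 0
    sign = 1 if score > 0 else -1
    magnitude = abs(score)
    if kind == "FORTE":
        magnitude = min(3, magnitude + 1)   # +3 and -3 cannot be intensified further
    elif kind == "DEB":
        magnitude = max(1, magnitude - 1)   # +1 and -1 cannot be downtoned further
    return sign * magnitude


def analyse(word, lexicon):
    """Return (base, shifted score) if the word is prefix + known base, else None."""
    for prefixes, kind in ((INTENSIFIERS, "FORTE"), (DOWNTONERS, "DEB")):
        for prefix in prefixes:
            base = word[len(prefix):]
            if word.startswith(prefix) and base in lexicon:
                return base, shift(lexicon[base], kind)
    return None


# Hypothetical prior polarities.
LEXICON = {"ricco": 2, "resistente": 2, "maledetto": -2}
```

With these assumptions `straricco` and `ultraresistente` are analysed as +3, `stramaledetto` as -3, and a word without a known prefix and base is left unanalysed.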

We used the list of the sentiment adjectives as a Nooj text in order to check the performance of our grammar. The discovery that the words containing mCVS lemmatised in the dictionary are very few (just 29 adjectives, e.g. straricco “very rich”, ultraresistente “heavy duty”, stramaledetto “damned”, and 9 adverbs, e.g. strabene “very well”, ultrapiattamente “very dully”) increases the importance of an FSA like the one described in this Paragraph.
It is also important to underline that all the synonyms of poco “few”, including the ones that take the shape of morphemes (e.g. ipo-, sotto-, sub-), do not seem to be proper downtoners, but resemble the behaviour of the negation words, which transform the sentiment words with which they occur into weakly negative ones, see Section 4.2 (e.g. ipodotato “subnormal”, ipofunzionante “hypofunctioning”). Therefore, we excluded them from the dictionary of mCVS and included them in a dedicated metanode of the morphological FSA, able to correctly compute their meaning.

Chapter 4

Shifting Lexical Valences through FSA

4.1 Contextual Valence Shifters

In Section 3.3.1 we provided the definitions of Semantic Orientation, the measure of the polarity and intensity of an opinion (Taboada et al., 2011; Liu, 2010), and of Prior Polarity, the context-independent orientation of individual words (Osgood, 1952). While the second concept is relevant during the population of sentiment dictionaries, the first one refers to the final semantic annotation that can be automatically or manually attributed to whole sentences and documents.
The need to make a distinction between these two measures is due to the semantic modifications that can come from the words’ surrounding contexts (Pang and Lee, 2008a). Contextual Valence Shifters are linguistic devices able to change the prior polarity of words when co-occurring in the same context (Kennedy and Inkpen, 2006; Polanyi and Zaenen, 2006).
From a computational point of view, Contextual Valence Shifting is nothing more than an enhanced version of the term-counting method first proposed by Turney (2002); from the distributional analysis perspective, however, it raises a number of challenges that deserve to be examined (Neviarouskaya et al., 2009a).
In this work we handle contextual shifting by generalizing over all the polar words that possess the same prior polarity. A network of local grammars has been designed on a set of rules that compute the words’ individual polarity scores according to the contexts in which they occur.
In general, the sentence annotation is performed using six different metanodes that, working as containers for the sentiment expressions, assign equal labels to the patterns embedded in the same graphs.
We used the FSA editor of Nooj to formalize our CVS rules because of its ease of use and its computational power, but nothing prevents the use of other strategies and tools for their application to texts. Consequently, in this chapter we will limit the discussion to the shifting rules, without any further mention of their actual formalization, which can be simply summarized as the treatment of embedded graphs as score boxes.

4.2 Polarity Intensification and Downtoning

4.2.1 Related Works

Amplifiers and downtoners (Quirk et al. 1985), intensifiers and diminishers (Polanyi and Zaenen 2006), overstatements and understatements (Kennedy and Inkpen 2006), intensifiers and downtoners (Taboada et al. 2011) are all pairs of concepts that refer to the increasing or decreasing effects of special kinds of words on oriented lexical items.
Although the changes of the polarity scores are very intuitive, the simple addition and subtraction of a score to/from the base valence of a term (Polanyi and Zaenen 2006; Kennedy and Inkpen 2006) has been criticized by Taboada et al. (2011), who instead associated different percentage values (positive for intensifiers and negative for downtoners) with each strength word.
Differently from both of these works, we decided to restrain the complexity of the rules by avoiding the distinction of degrees. In compensation, we did not limit the intensive function to degree adverbs and adjectives, but also took verbs and nouns into consideration.

4.2.2 Intensification and Downtoning in SentIta

The lexical items for the computational treatment of intensification in SentIta are the ones denoted by the tags FORTE and DEB.
As stated in the previous Chapter (Section 3.3), the lexical items endowed with these labels do not possess any orientation, but can modify the intensity of semantically oriented words.
The strength indicators included in the intensification FSA, which account for (but are not limited to) such modifiers, are the following:

• Morphological Rules

– the use of the absolute superlative suffixes -issim-o and -errim-o, which always increase the words’ semantic orientations;

– the use of intensifying or downtoning prefixes (e.g. super-, stra-, semi-, micro-), shown in Section 3.5.4.

• Syntactic Rules

– the repetition of more than one negative or positive word, which always increases the words’ semantic orientations (e.g. bello[+2] bello[+2] bello[+2] [+3] “nice nice nice”);

– the co-occurrence of words belonging to the strength scale (tags FORTE/DEB) with the sentiment words listed in the evaluation scale (tags POS/NEG).

As generally assumed in the literature, adverbs intensify or attenuate adjectives (46), verbs (48) and other adverbs (49), while adjectives modify the intensity of nouns (47). We adopted the simplest strategy for the polarity modification: the addition/subtraction of one point on the evaluation scale (Polanyi and Zaenen 2006; Kennedy and Inkpen 2006). Because this scale does not exceed ±3, words that start from this polarity cannot be increased further (e.g. davvero[+] eccezionale[+3] [+3] “truly exceptional”).

(46) Parzialmente[-] deludente[-2] anche il reparto degli attori [-1]
“The cast is also partially disappointing”

(47) Ciò che ne deriva (...) è una terribile[-2] confusione[-2] narrativa [-3]
“What comes from it is a terrible narrative chaos”

(48) Alla guida ci si diverte[+2] molto[+] [+3]
“In the driver’s seat you have a lot of fun”

(49) Ne sono rimasta molto[+] favorevolmente[+2] colpita [+3]
“I have been very favourably impressed by it”
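The one-point strategy illustrated by examples (46)-(49) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the +1/-1 encoding of the FORTE/DEB tags and the lower bound of 1 for downtoned words are hypothetical choices, whereas SentIta encodes these rules as Nooj local grammars.

```python
SCALE_MAX = 3  # the evaluation scale does not exceed +/-3


def shift_intensity(prior: int, strength: int) -> int:
    """Apply an intensifier (+1, tag FORTE) or a downtoner (-1, tag DEB)
    to a polar word whose prior polarity lies in [-3, -1] or [+1, +3]."""
    if prior == 0 or abs(prior) > SCALE_MAX:
        raise ValueError("prior must be a non-zero score within the scale")
    if strength not in (1, -1):
        raise ValueError("strength must be +1 (FORTE) or -1 (DEB)")
    sign = 1 if prior > 0 else -1
    # Assumption: a downtoned word stays polar, so the magnitude
    # never drops below 1; the upper cap of 3 comes from the text.
    magnitude = min(SCALE_MAX, max(1, abs(prior) + strength))
    return sign * magnitude


print(shift_intensity(-2, -1))  # example (46): -1
print(shift_intensity(2, 1))    # example (48): 3
print(shift_intensity(3, 1))    # davvero eccezionale: stays at 3
```

Working on the absolute value and restoring the sign afterwards keeps intensification symmetric for positive and negative words.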

However, as already discussed in Section 3.3.4, special subsets of verbs also have the power to intensify or downtone the polarity of the nominal groups or complement clauses that they affect, case by case, in subject (50) or object (51) position (see Figure 3.1, Section 3.3.4).

(50) L’amore[+2] travolge[+] Maria [+3]
“Love overwhelms Maria”

(51) Le tue parole invadono[+] Maria di contentezza[+2] [+3]
“Your words overwhelm Maria with happiness”

Examples of intensifying/downtoning verbs, with the LG classes to which they belong, are reported below:

• the list of entries translated from the French work of Balibar-Mrabti (1995), e.g. travolgere[+] “to overwhelm”, risplendere[+] “to shine”;

• verbs from LG class 24 (15 intense, 2 weak), e.g. aumentare[+] “to increase”, gonfiare[+] “to amplify”, calare[–] “to go down”, diminuire[–] “to decrease”;

• verbs from LG class 41 (45 intense, 19 weak), e.g. rivoluzionare[+] “to revolutionize”, sconquassare[+] “to smash”, moderare[–] “to moderate”, placare[–] “to appease”;

• verbs from LG class 47 (43 intense, 18 weak), e.g. strillare[+] “to shout”, sventolare[+] “to show off”, mormorare[–] “to murmur”, sussurrare[–] “to whisper”;

• verbs from LG class 50 (4 intense, 0 weak), e.g. assicurare[+] “to assure”, garantire[+] “to guarantee”;

• verbs from LG class 56 (5 intense, 3 weak), e.g. abbandonarsi[+] “to abandon yourself”, affrettarsi[+] “to rush”, accennare[–] “to touch upon”, esitare[–] “to hesitate”.

As regards intensifying or downtoning nouns, we refer to the ones automatically derived from degree adjectives (220 intense, 95 weak), e.g. sfrenatezza[+] “wildness”, abbondanza[+] “abundance”, labilità[–] “evanescence”, fugacità[–] “fugacity”, and to the nominalizations of intensifying or downtoning verbs (41 intense, 95 weak), e.g. rafforzamento[+] “strengthening”, fervore[+] “fervor”, affievolimento[–] “weakening”, diminuzione[–] “decrease”.
They can modify both other nouns (52) and verbs (53).

(52) La sfrenatezza[+] dell’odio[-3] di Maria [-3]
“The wildness of Maria’s hate”

(53) Luca difendeva[+2] Maria con fervore[+] [+3]
“Luca was defending Maria with fervour”


Intensification, negation, modality and comparison can appear together in the same sentence. We will describe these eventualities in the following paragraphs.

4.2.3 Excess Quantifiers

Words that at first glance seem to be intensifiers, but at a deeper analysis reveal a more complex behavior, are abbastanza “enough”, troppo “too much” and poco “not much”. Because of their particular influence on polar words, their meanings, especially when combined with polar adjectives, have been associated in the literature with other contextual valence shifters.¹ Examples are Meier (2003, p. 69), who proposed an interpretation that includes a comparison between extents:

“Enough, too, and so are quantifiers that relate an extent predicate and the incomplete conditional (expressed by the sentential complement) and are interpreted as comparisons between two extents. The first extent is the maximal extent that satisfies the extent predicate. The second extent is the minimal or maximal extent of a set of extents that satisfy the (hidden) conditional, where the sentential complement supplies the consequent and the main clause the antecedent. [...] Intuitively, the value an object has on a scale associated with the meaning of the adjective or adverb is related to some standard of comparison that is determined by the sentential complement.” (Meier 2003, p. 69)

Schwarzschild (2008, p. 316), who noticed the possibility of an implicit negation:

¹ When quantifiers like the ones under examination modify polar words, they compare such words with a threshold value that is defined by an infinitive clause in position N1 (e.g. The food is too good to throw (it) away) (Meier 2003; Schwarzschild 2008).


“In excessives, the infinitival complement, while it does not form a threshold description, it does contain an implicit negation. So, like other negative statements, excessives support let alone rejoinders.”²

And again Meier (2003, p. 71), who transformed the infinitive clause into a modal expression:

“In this paper, I will propose that the sentential complements of constructions with too and enough implicitly or explicitly contain a modal expression. [...] Evidence for this move is the fact that a modal expression (with existential force) can be added or omitted without changing the intuitive meaning of the sentences. [...] An explicit modal expression like be able or be allowed in the sentential complement makes them more precise, but it does not make them unacceptable.”

² This fuel is too volatile to use in a car engine = This fuel is too volatile to use in a car engine, let alone a lawn mower engine = This fuel should not be used in a car engine, let alone a lawn mower engine. Example from Schwarzschild (2008).

In this research we noticed as well that the co-occurrence of troppo, poco and abbastanza with polar lexical items can produce effects on their semantic orientation that can be associated with other contextual valence shifters. The ad hoc rules dedicated to these words (see Table 4.1) are not actually new, but refer to other contextual valence shifting rules that have been discussed in this Paragraph and/or will be explored in this Chapter.
Troppo and poco can also co-occur or can be negated; the rules to be applied in each case are reported in Table 4.2.
Troppo, poco and abbastanza are also able to give a polarity to words that in SentIta possess a neutral Prior Polarity. An exception is the case of non troppo + neutral words (Table 4.2), which does not produce generalizable effects (e.g. non troppo decorato “not so decorated” could have a weakly positive orientation, while the same cannot be said of non troppo nuovo “not so new”). Therefore, it has been excluded from the rules in order to avoid additional errors.

             Positive Word            Neutral Word             Negative Word
Examples     bello                    decorato                 brutto
troppo       Intensification          Negative Switch          Intensification
poco         Weakly Negative Switch   Weakly Negative Switch   Weak Negation
abbastanza   Downtoning               Weakly Positive Switch   Intensification

Table 4.1: The effects of troppo, poco and abbastanza on polar lexical items

                 Positive Word     Neutral Word             Negative Word
Examples         bello             decorato                 brutto
troppo poco      Strong Negation   Negative Switch          Downtoning
un poco troppo   Intensification   Negative Switch          Intensification
non troppo       Negative Switch   -                        Weak Negation
non poco         Intensification   Weakly Positive Switch   Intensification

Table 4.2: The effects of the co-occurrence and the negation of troppo and poco with reference to polar lexical items
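Tables 4.1 and 4.2 amount to a lookup from a quantifier and the polarity class of the modified word to the shifting rule to apply. The following is a minimal sketch of that lookup; the dictionary layout and the function name are assumptions made for illustration and do not reproduce the Nooj grammars.

```python
from typing import Optional

# Rule labels copied from Tables 4.1 and 4.2.
# Each tuple is ordered as (positive word, neutral word, negative word).
EFFECTS = {
    "troppo": ("Intensification", "Negative Switch", "Intensification"),
    "poco": ("Weakly Negative Switch", "Weakly Negative Switch", "Weak Negation"),
    "abbastanza": ("Downtoning", "Weakly Positive Switch", "Intensification"),
    "troppo poco": ("Strong Negation", "Negative Switch", "Downtoning"),
    "un poco troppo": ("Intensification", "Negative Switch", "Intensification"),
    # non troppo + neutral word is excluded from the rules (None).
    "non troppo": ("Negative Switch", None, "Weak Negation"),
    "non poco": ("Intensification", "Weakly Positive Switch", "Intensification"),
}


def quantifier_effect(quantifier: str, prior: int) -> Optional[str]:
    """Return the rule label for a quantifier applied to a word with the
    given prior polarity, or None when no rule is defined."""
    column = 0 if prior > 0 else 1 if prior == 0 else 2
    return EFFECTS[quantifier][column]


print(quantifier_effect("troppo", 0))       # Negative Switch
print(quantifier_effect("abbastanza", 2))   # Downtoning
print(quantifier_effect("non troppo", 0))   # None
```

The function returns only the name of the rule; the actual score would then be computed by the corresponding intensification or negation rule.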

4.3 Negation Modeling

4.3.1 Related Works

Negation modeling is an NLP research area that includes the automatic detection of negation expressions in raw texts and the determination of the negation scopes, that is, the parts of the meaning modified by negation (Jia et al. 2009).
Negation meets the Sentiment Analysis need of determining the correct polarity of opinionated words when shifted by the context. That is why we included it in our set of Contextual Valence Shifters (Polanyi and Zaenen 2006; Kennedy and Inkpen 2006).
In this paragraph we will survey the literature on negation modeling, going in depth through the state-of-the-art methods for automatic negation detection, from the pioneering work of Pang and Lee (2004) to more sophisticated techniques, such as some Semantic Composition methods, which even define the syntactic contexts of the polar expressions (Choi and Cardie 2008).
Despite the large interest in negation in the biomedical domain (Harkema et al. 2009; Desclés et al. 2010; Dalianis and Skeppstedt 2010; Vincze 2010), maybe due to the availability of free annotated corpora (Vincze et al. 2008), this paragraph focuses on the most significant works on sentiment analysis.

Bag of Words: Pang and Lee (2004) demonstrated that using contextual information can improve the polarity-classification accuracy. By means of a standard bag-of-words representation, they handled negation modeling by adding artificial words to texts, without any explicit knowledge of polar expressions (e.g. I do not NOT_like NOT_this NOT_new NOT_Nokia NOT_model). The problem with this method is the duplication of the features, which are always represented in both their plain and negated occurrences (Wiegand et al. 2010).
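The artificial-word strategy can be illustrated with a short Python sketch that reproduces the example above. The list of negation cues and the choice of closing the negated span at the next punctuation mark are assumptions made for illustration; the text does not specify how the span is delimited.

```python
import re
from typing import List

# Illustrative English negation cues (an assumption, not a list
# taken from the cited work).
NEGATION_CUES = {"not", "no", "never"}


def mark_negation(text: str) -> List[str]:
    """Prefix NOT_ to every word that follows a negation cue, up to the
    next punctuation mark (assumed here to close the negated span)."""
    tokens = re.findall(r"\w+|[^\w\s]", text)
    marked = []
    in_scope = False
    for token in tokens:
        if re.fullmatch(r"[^\w\s]", token):
            in_scope = False              # punctuation closes the span
            marked.append(token)
        elif in_scope:
            marked.append("NOT_" + token)
        else:
            marked.append(token)
            if token.lower() in NEGATION_CUES:
                in_scope = True
    return marked


print(" ".join(mark_negation("I do not like this new Nokia model.")))
# I do not NOT_like NOT_this NOT_new NOT_Nokia NOT_model .
```

The output shows the feature duplication mentioned above: like and NOT_like become two unrelated features of the bag-of-words representation.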

Contextual Valence Shifters: Polanyi and Zaenen (2006) and Kennedy and Inkpen (2006) proposed a more sophisticated method that either switches (e.g. clever[+2] → not clever[-2]) or shifts (efficient[+2] → rather efficient[+1]) the initial scores of words when they are negated or downtoned in context. As a general rule, polarized expressions are considered to be negated if the negation words immediately precede them.
Wilson et al. (2005, 2009) distinguished the following three types of binary relationship polarity features for negation modeling:

• negation features: negation expressions that negate polar expressions;

• shifter features: expressions that can be approximately equated with downtoners;

• positive and negative polarity modification features: special kinds of polar expressions that modify other polar expressions, turning their polarity into the positive or negative one.

Semantic Composition: Moilanen and Pulman (2007) exploited the syntactic representations of sentences in a compositional semantics framework, with the purpose of computing the polarity of headlines and complex noun phrases. In this work, composition rules are incrementally applied to constituents, depending on the negation scope, by means of negation words, shifters and negative polar expressions.
A similar approach is the one of Shaikh et al. (2007), who used verb frames as a more abstract level of representation.
Choi and Cardie (2008) computed the polarity of phrases in two steps: the assessment of the polarity of the constituents and the subsequent application of a set of previously defined inference rules. For example, the rule [Polarity([NP1]– [IN] [NP2]–) = +] can be applied to phrases like [[lack]–NP1 [of]IN [crime]–NP2 in rural areas].
Liu and Seneff (2009) and Taboada et al. (2011) proposed compositional models able to mix together intensifiers, downtoners, polarity shifters and negation words, always starting from the words’ prior polarities and intensities.
The work of Benamara et al. (2012) goes beyond the studies mentioned above, since they specifically account for negative polarity items and for multiple negatives.
Socher et al. (2013) exploited Recursive Neural Tensor Networks (RNTN) to compute two types of negation:

• negation of positive sentences, where the negation is supposed to change the overall sentiment of a sentence from positive to negative;

• negation of negative sentences, where the overall sentiment is considered to become less negative (but not necessarily positive).


Negation Scope Detection: Jia et al. (2009) introduced the concept of the scope of the negation term t. The scope identification³ goes through the computing of a candidate scope (a subset of the words appearing after t in the sentence, which represent the minimal logical units of the sentence containing the scope) and then a pruning of those words. The computational procedure that approximates the candidate scope considers the following elements: static delimiters, e.g. because or unless, which signal the beginning of another clause; dynamic delimiters, e.g. like and for; and heuristic rules focused on polar expressions, which involve sentimental verbs, adjectives and nouns.

Morphological Negation Modeling: in order to complete the literature survey on negation modeling, we cannot fail to mention morphological negation modeling as well. However, because the scope of any morphological negation remains confined within simple words (Councill et al. 2010), we examined this aspect of negation in depth in Sections 3.5.1 and 3.5.4 of Chapter 3. We mention again here only the most significant contributions on this topic: Hatzivassiloglou and McKeown (1997), Plag (2003), Yuen et al. (2004), Moilanen and Pulman (2008), Ku et al. (2009), Neviarouskaya (2010), Neviarouskaya et al. (2011) and Wang et al. (2011).

4.3.2 Negation in SentIta

As exemplified in the following sentences, negation indicators do not always change the polarity of a sentence into its positive or negative counterpart (54); they often have the effect of increasing or decreasing the sentence score (55). That is why we prefer to talk about valence “shifting” rather than “switching”.

³ Scope Identification concerns the determination, at the sentence level, of the tokens that are affected by negation cues (Jia et al. 2009).


(54) Citroen non[neg] produce auto valide[+2] [-2]
“Citroen does not produce efficient cars”

(55) Grafica non proprio[neg] spettacolare[+3] [-1]
“The graphics are not quite spectacular”

Taboada et al. (2011, p. 277) made comparable considerations:

“Consider excellent, a +5 adjective. If we negate it, we get not excellent, which intuitively is a far cry from atrocious, a -5 adjective. In fact, not excellent seems more positive than not good, which would negate to a -3. In order to capture these pragmatic intuitions, we implemented another method of negation, a polarity shift (shift negation). Instead of changing the sign, the SO value is shifted toward the opposite polarity by a fixed amount (in our current implementation, 4). Thus a +2 adjective is negated to a -2, but the negation of a -3 adjective (for instance, sleazy) is only slightly positive, an effect we could call ‘damning with faint praise’. [...] it is very difficult to negate a strongly positive word without implying that a less positive one is to some extent true, and thus our negator becomes a downtoner.”

Just as Benamara et al. (2012, p. 11):

“(...) negation cannot be reduced to reversing polarity. For example, if we assume that the score of the adjective “excellent” is +3, then the opinion score in “this student is not excellent” cannot be -3. It rather means that the student is not good enough. Hence, dealing with negation requires to go beyond polarity reversal.”
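The difference between polarity reversal and the shift negation described in the quotation from Taboada et al. (2011) can be made concrete with a small sketch. The function names are hypothetical; the fixed amount of 4 and the scores of the three words are the ones given in the quotation, on its own -5 to +5 scale.

```python
def switch_negation(so: int) -> int:
    """Plain polarity reversal: only the sign changes."""
    return -so


def shift_negation(so: int, shift: int = 4) -> int:
    """Shift negation: move the SO value toward the opposite polarity
    by a fixed amount (4 in the quoted implementation)."""
    if so > 0:
        return so - shift
    if so < 0:
        return so + shift
    return 0


for word, so in [("excellent", 5), ("good", 3), ("sleazy", -3)]:
    print(word, switch_negation(so), shift_negation(so))
# excellent -5 1
# good -3 -1
# sleazy 3 1
```

With the shift, not excellent (+1) stays more positive than not good (-1), and not sleazy is only slightly positive (+1), as the quotation describes.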

Even though the subtraction of a fixed amount of polarity points⁴ seems to us a risky generalization, the authors confirmed Horn’s ideas on the asymmetry between affirmative and negative sentences in natural language, differently from standard logic (Horn 1989). Such an assumption complicates any attempt to automatically extract the meaning of negated phrases and sentences by simply switching their affirmative scores.

⁴ Other strategies for automatic negation modeling have been proposed by Yessenalina and Cardie (2011), who combined words using iterated matrix multiplication, and by Benamara et al. (2012), who, more in line with our research, computed negation on the basis of its own types.

Our hypothesis, which can be easily placed among the compositional semantics approaches, is that the final polarity of a negated expression can be easily modulated, but only by taking into account, at the same time, both the prior polarity of the opinionated expressions and the strength of the negation indicators (see Tables 4.3, 4.5 and 4.6).
Despite this complexity, finite-state technology offered us the opportunity to easily compute the influence of negation on polarized expressions and to formalize the rules to automatically annotate real texts.
As regards the full list of negation indicators, we included in our grammar three types of negation (Benamara et al. 2012; Godard 2013), which respectively include the following items. As we will show, each type has a different impact on the opinion expressions in terms of polarity and intensity.

Negation operators

• simple adverbs of negation
no,AVV+NEGAZIONE
non,AVV+NEGAZIONE
mica,AVV+NEGAZIONE
affatto,AVV+NEGAZIONE+FORTE
neanche,AVV+NEGAZIONE
neppure,AVV+NEGAZIONE
manco,AVV+NEGAZIONE
senza,PREP+NEGAZIONE ⁵

• compound adverbs of negation
neanche per sogno,AVV+AVVC+PC+PN+NEGAZIONE+FORTE
per niente al mondo,AVV+AVVC+PCPC+PNPN+NEGAZIONE+FORTE
per nulla al mondo,AVV+AVVC+PCPC+PNPN+NEGAZIONE+FORTE
in nessun modo,AVV+AVVC+PDETC+PDETN+NEGAZIONE+FORTE
per niente,AVV+AVVC+PC+PAVV+NEGAZIONE+FORTE
per nulla,AVV+AVVC+PC+PAVV+NEGAZIONE+FORTE
niente affatto,AVV+AVVC+CC+AVVAVV+NEGAZIONE+FORTE ⁶

⁵ “not”, “at all”, “by no means”, “neither”

Negation Operator Rules: negation operators can appear alone (56) or, with a strengthening function, together with another negation adverb (57).

(56) {Mica / Affatto / Per niente} bello l’hotel
“Totally not cool the hotel”

(57) L’hotel non è {mica / affatto / per niente} bello
“The hotel is not cool at all”

The general rules concerning negation operators, formalized in the grammars in the form of FSA, are summarized in Tables 4.3, 4.4, 4.5 and 4.6. The local grammar principle is to put in the same embedded graph all the structures that are supposed to receive the same score. This score is indicated in the column Shifted Polarity.
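The rules summarized in Tables 4.3 to 4.6 can be read as a mapping from the type of negation operator and the prior polarity of the sentiment word to a shifted polarity. The following sketch copies the scores from those tables; the operator labels ("plain", "strong", "weak") and the function name are assumptions made for illustration, since in SentIta the rules are formalized as embedded graphs.

```python
from typing import Optional

# Shifted polarities copied from Tables 4.3-4.6, keyed by prior polarity.
NEGATION_RULES = {
    # Table 4.3: a single negation operator such as non
    "plain": {3: -1, 2: -2, 1: -2, -1: 1, -2: 1, -3: -1},
    # Tables 4.4 and 4.5: repeated operators (non ... mica) or
    # strong operators (per niente) share the same scores
    "strong": {3: -2, 2: -3, 1: -3, -1: 2, -2: 2, -3: 1},
    # Table 4.6: weak operators such as poco (also defined for neutral words)
    "weak": {3: -1, 2: -1, 1: -1, 0: -1, -1: 1, -2: 1, -3: -1},
}


def negate(prior: int, operator: str = "plain") -> Optional[int]:
    """Return the shifted polarity, or None when the tables give no rule
    for this combination (e.g. a plain operator with a neutral word)."""
    return NEGATION_RULES[operator].get(prior)


print(negate(2))             # example (54): -2
print(negate(3))             # example (55): -1
print(negate(-3, "strong"))  # per niente orribile: 1
print(negate(0, "weak"))     # poco nuovo: -1
```

Keeping one table per operator type mirrors the local grammar principle of grouping in the same graph all the structures that receive the same score.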

Negative Quantifiers

• Adjectives
nessuno,A+FLX=A114+NEGAZIONE
niente,A+FLX=A605+NEGAZIONE ⁷

⁶ “no way”, “for anything in the world”, “there’s no way”, “for nothing”, “not at all”
⁷ “nobody”, “nothing”


Negation Operator   Sentiment Word   Word Polarity   Shifted Polarity
non                 fantastico       +3              -1
non                 bello            +2              -2
non                 carino           +1              -2
non                 scialbo          -1              +1
non                 brutto           -2              +1
non                 orribile         -3              -1

Table 4.3: Negation rules

Negation Operator   Negation Operator   Sentiment Word   Word Polarity   Shifted Polarity
non                 mica                fantastico       +3              -2
non                 mica                bello            +2              -3
non                 mica                carino           +1              -3
non                 mica                scialbo          -1              +2
non                 mica                brutto           -2              +2
non                 mica                orribile         -3              +1

Table 4.4: Negation rules with the repetition of negative operators

Strong Negation Operator   Sentiment Word   Word Polarity   Shifted Polarity
per niente                 fantastico       +3              -2
per niente                 bello            +2              -3
per niente                 carino           +1              -3
per niente                 scialbo          -1              +2
per niente                 brutto           -2              +2
per niente                 orribile         -3              +1

Table 4.5: Negation rules with strong operators

Weak Negation Operator   Sentiment Word   Word Polarity   Shifted Polarity
poco                     fantastico       +3              -1
poco                     bello            +2              -1
poco                     carino           +1              -1
poco                     nuovo             0              -1
poco                     scialbo          -1              +1
poco                     brutto           -2              +1
poco                     orribile         -3              -1

Table 4.6: Negation rules with weak operators

• Adverbs
niente,AVV+NEGAZIONE
nulla,AVV+NEGAZIONE
poco,AVV+NEGAZIONE+DEB
poco o niente,AVV+PCONG+NKN+NEGAZIONE+DEB
poco o nulla,AVV+PCONG+NKN+NEGAZIONE+DEB
mai,AVV+NEGAZIONE ⁸

• Pronouns
nessuno,PRON+FLX=PRO717+NEGAZIONE
niente,PRON+FLX=PRO719+NEGAZIONE
poco,PRON+FLX=PRO721+NEGAZIONE+DEB ⁹

⁸ “nothing”, “little”, “little or nothing”, “never”
⁹ “not one”, “few”

Negative Quantifier Rules: as indicated in the dictionary extract above and as exemplified in the sentences below, negative quantifiers, which express both a negation and a quantification, can take the shape of different parts of speech and, consequently, can assume different syntactic positions in sentences.
We can directly observe that they change their function on the basis of the different roles they assume.

Negative quantifiers as Adjectives: they tend to be pragmatically negative when occurring as adjectival modifiers (58, 59), just like lexical negation indicators (Potts 2011).

(58) Cortesia nessuna [-2]
“No kindness”
(59) Nessun servizio nelle stanze [-2]
“No room service”

Negative quantifiers as Adverbs: they work exactly like poco when they occupy the adverbial position (60, 61), following the rules formalized in the next paragraph (see Table 4.6). As adverbs, they can occur together with negation operators (61) or not (60).

(60) Costa quasi niente [+1]
“It costs almost nothing”
(61) Non costa nulla [+2]
“It doesn’t cost anything”

Negative quantifiers as Pronouns: they seem to possess the same strengthening function as negation operators when occurring as pronouns (62, 63); the rules of reference are in Table 4.5.

(62) Non lo consiglio a nessuno [-3]
“I don’t recommend it to anyone”
(63) Non gliene frega niente a nessuno [-3]
“Nobody cares”

Negative quantifiers in verbless sentences: in elliptical sentences (64, 65) they resemble the negation function of negation operators, following the rules of Table 4.3.

(64) Niente di veramente innovativo [-1]
“Nothing truly innovative”
(65) Nulla da eccepire [+1]
“No objections”


Lexical Negation

Nouns: {mancanza / assenza / carenza} di
Verbs: {mancare / difettare / scarseggiare} di
Adjectives: {mancante / carente / privo} di ¹⁰

Lexical Negation Rules: lexical negation concerns content-word negators, which always possess an implicit negative influence. Therefore, when they occur in texts, their polarity is assumed to be negative.
Because they can co-occur with both negation operators and negative quantifiers, their polarity can also be shifted to positive by simply applying the negation rules conceived for the first two types of negation indicators.

4.4 Modality Detection

4.4.1 Related Works

Speculative Language Detection (Dalianis and Skeppstedt 2010; Desclés et al. 2010; Vincze et al. 2008), Hedge Detection (Lakoff 1973; Ganter and Strube 2009; Zhao et al. 2010), Irrealis Blocking (Taboada et al. 2011) and Uncertainty Detection (Rubin 2010) are all tasks that completely or partially overlap with the detection of modal constructions.

¹⁰ “lack”, “to lack in”, “to be lacking in”


Among the most popular annotated corpora in these NLP fields, we mention the BioScope corpus (Vincze et al. 2008), which has been annotated with negation and speculation cues and their scopes; FactBank, which has been annotated with event factuality (Saurí and Pustejovsky 2009); and the CoNLL 2010 Shared Task (Farkas et al. 2010), which focused on hedge detection in natural language texts.
Modality has been organized by Kobayakawa et al. (2009) into a taxonomy that includes request, recommendation, desire, will, judgment, etc.
Taboada et al. (2011) did not consider the words in the scope of irrealis markers to be reliable for the purposes of sentiment analysis. Therefore, they simply ignored the semantic orientation of any word in their scope. Their list of irrealis markers includes modals, conditional markers (e.g. “if”), negative polarity items (e.g. “any” and “anything”), some verbs (“to expect”, “to doubt”), questions, and words enclosed in quotes (which may not necessarily reflect the author’s opinion).
According to Benamara et al. (2012), modality can be used to express possibility, necessity, permission, obligation or desire through grammatical cues such as adverbial phrases (e.g. “maybe”, “certainly”), conditional verbal moods, some verbs (e.g. “must”, “can”, “may”), and some adjectives and nouns (e.g. “a probable cause”).
Relying on the categories of Larreya (2004) and Portner (2009), they distinguished the following three types of modality, each of which has been indicated to possess a specific effect on the opinion expression in its scope, in terms of strength or degree of certainty:

Buletic Modality: it indicates the speaker’s desires/wishes.
Cues: verbs denoting hope (e.g. “to wish”).

Epistemic Modality: it refers to the speaker’s personal beliefs and affects the strength and the certainty degree of opinion expressions.
Cues: doubt, possibility or necessity adverbs (e.g. “perhaps”, “definitely”) and modal verbs (e.g. “have to”, “may/can”).

Deontic Modality: it indicates possibility/impossibility or obligation/permission.
Cues: the same modal verbs as the epistemic modality, but with a deontic reading (e.g. “You must go see the movie”).

4.4.2 Modality in SentIta

When computing the Prior Polarities of the SentIta items in their textual context, we considered that modality can also have a significant impact on the SO of sentiment expressions.
In line with the trends in the literature, but without specifically focusing on the modality categories of Benamara et al. (2012), we included in the FSA dedicated to modality the following linguistic cues, and we made them interact with the SentIta expressions:

Sharpening and Softening Adverbs

• simple adverbs (32 sharp and 12 soft)
inequivocabilmente,AVV+FORTE+Focus=SHARP
necessariamente,AVV+FORTE+Focus=SHARP
relativamente,AVV+DEB+Focus=SOFT
suppergiù,AVV+DEB+Focus=SOFT

• compound adverbs (38 sharp and 26 soft)
con ogni probabilità,AVV+AVVC+PDETC+PDETN+Focus=SHARP
senza alcun dubbio,AVV+AVVC+PDETC+PDETN+Focus=SHARP
fino a un certo punto,AVV+AVVC+PAC+PDETAGGN+Focus=SOFT
in qualche modo,AVV+AVVC+PDETC+PDETN+Focus=SOFT

Modal Verbs: verbs from the LG class 56 (dovere “have to”, potere “may/can”, volere “want”), which have N0 V (E + Prep) Vinf W as their definitional structure and accept in position N1 a direct or prepositional infinitive clause (all of them) and, only in the cases of potere and volere, an N1-um.

Conditional and Imperfect Tenses: tenses are located after the text preprocessing through the annotations Tempo=IM and Tempo=C.

Dealing with modality is very challenging in the Sentiment Analysis field because, in this research area, it is not enough to simply detect its indicators: it is also necessary to manage its effects in terms of polarity. We did not find it appropriate to simplify the problem by ignoring the semantic orientation of all the phrases and sentences in which modality can be detected (Taboada et al. 2011). In fact, as exemplified below, there are cases in which modality appreciably affects the intensity of a sentence, and other cases in which it regularly shifts the sentence polarity in specific directions.

Uncertainty Degrees: the words annotated with the labels +SHARP and +SOFT described above, just like intensifiers and downtoners, have the power of modifying the intensity of the words endowed with positive or negative prior polarities, with the same effects described in Section 4.2. We selected them, respectively, among the words tagged with the labels FORTE and DEB, with the sole aim of providing an exhaustive module that can work well for both Modality Detection and Sentiment Analysis purposes.

“Potere” + Indicative Imperfect + Oriented Item: modal verbs occurring in the imperfect tense can turn a sentence into a weakly negative one (66) when combined with positive words, or into a weakly positive one when occurring with negative items (67). The effect is really close to weak negation (see Table 4.6 in Section 4.3). This can be explained by the fact that the meaning of sentences of this kind always concerns subverted expectations.

(66) Poteva[Modal+IM] essere una trama interessante[+2] [-1]
“It could have been an interesting plot”

(67) Poteva[Modal+IM] essere una trama noiosa[-2] [+1]
“It could have been a boring plot”

ldquoPotererdquo + Indicative Imperfect + Comparative + Oriented Items alsoin this case we use the explanation of subverted expectationsThe terms of comparison are obviously the expected opinionobject and the real one If the expected object is better than thereal one the sentence score is negative (68) if it is worse the sen-tence score is positive (69) The rules for the polarity score attri-bution are the ones displayed in Section 45

(68) Poteva[Modal+IM] essere più[I-CW] dettagliata[+2] [-1]
“It might have been more detailed”

(69) Poteva[Modal+IM] andare peggio[I-OpW -2] [+1]
“It might have gone worse”

“Dovere” + Indicative Imperfect: here again we face a disappointment of expectations, but with a stronger effect: the negation does not concern the possibility of the opinion object being good, but a necessity. Therefore, we assumed the final score in these cases to follow the negation rules shown in Table 4.3 in Section 4.3 (70).

(70) Questo doveva[Modal+IM] essere un film di sfumature[+1] [-2]
“This one was supposed to be a nuanced movie”

“Dovere” + “Potere” + Past Conditional: past conditional sentences (Narayanan et al., 2009) also have a particular impact on the opinion polarity, especially with the modal verb dovere (“have to”). In such cases, as exemplified in (71), the sentence polarity is always negative in our corpus.

(71) Non[Negation] avrei[Aux+C] dovuto[Modal+PP] buttare via i miei soldi [-2]
“I should not have thrown away my money”

The verb potere, instead, follows the same rules described above for the imperfect tense.
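The first of the modality rules above can be sketched in a few lines of Python. This is only an illustration, not the FSA used in the thesis: the function name is hypothetical, and the assumption that "weakly" always means a score of magnitude 1, whatever the strength of the prior polarity, is inferred from examples (66) and (67).

```python
def potere_imperfect_score(prior_polarity: int) -> int:
    """Sentence score for 'potere' in the imperfect + an oriented item.

    Assumption (from examples 66 and 67): the orientation is reversed
    and the result is always weak, i.e. +1 or -1.
    """
    if prior_polarity == 0:
        # A neutral item carries no expectation to subvert.
        return 0
    return -1 if prior_polarity > 0 else 1


# Example (66): "interessante" has prior polarity +2.
score_66 = potere_imperfect_score(2)
# Example (67): "noiosa" has prior polarity -2.
score_67 = potere_imperfect_score(-2)
```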

4.5 Comparative Sentences Mining

4.5.1 Related Works

Comparative Sentence Mining and Comparison Mining Systems are NLP techniques and tools that only partly coincide with the Sentiment Analysis field.

Sentences that express a comparison generally carry along with them opinions about two or more entities with regard to their shared features or attributes (Ganapathibhotla and Liu, 2008). While the main purpose of Sentiment Analysis is to identify details about such opinions, the main goals of Information Extraction are comparative sentence detection and classification. Therefore, although the object of the analysis and the basic information units are shared between the two approaches, their purposes coincide in some cases and differ in others.

Among the few studies on the computational treatment of comparison (Jindal and Liu, 2006a,b; Ganapathibhotla and Liu, 2008; Fiszman et al., 2007; Yang and Ko, 2011), only Ganapathibhotla and Liu (2008) tried to extract the authors’ preferred entities and to identify the Semantic Orientation of comparative sentences after their identification and classification.

With the following two examples, Ganapathibhotla and Liu (2008) showed the difficulty of determining the polarity of context-dependent opinion comparatives, whose correct interpretation always requires domain knowledge: e.g. the word “longer” can assume a positive (72) or negative (73) orientation depending on the product feature with which it is associated.

(72) “The battery life of Camera X is longer than Camera Y”
(73) “Program X’s execution time is longer than Program Y”

Since this work aims to provide a general-purpose lexical and grammatical database for Sentiment Analysis and Opinion Mining, we did not address here the treatment of this special kind of opinionated sentence, just as we did not include items with domain-dependent meaning in the lexicon.

Concerning comparative classification, the main types that have been identified in the literature are the following:

• Gradable (direct comparison)
  – Non-equal
      · Increasing
      · Decreasing
  – Equative
  – Superlative
• Non-gradable (implicit comparisons)

According to Ganapathibhotla and Liu (2008) and Jindal and Liu (2006a,b), a comparative relation can be represented in the following way:

ComparativeWord, Features, EntityS1, EntityS2, Type

where ComparativeWord is the keyword used to express a comparative relation in a sentence, Features is the set of features being compared, EntityS1 is the first term of comparison and EntityS2 is the second one. In this way, examples (72) and (73) are represented as follows:

(72) longer, battery life, Camera X, Camera Y, Non-equal Gradable

(73) longer, execution time, Program X, Program Y, Non-equal Gradable
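As an illustration, the five-field comparative relation can be held in a small immutable record. The sketch below is in Python; the class and field names are hypothetical renderings of the labels used above, not part of the cited works or of SentIta.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ComparativeRelation:
    """One comparative relation: keyword, compared features, two entities, type."""
    comparative_word: str
    features: Tuple[str, ...]   # the set of features being compared
    entity_s1: str              # first term of comparison
    entity_s2: str              # second term of comparison
    relation_type: str


# Examples (72) and (73) expressed with the record above.
example_72 = ComparativeRelation(
    "longer", ("battery life",), "Camera X", "Camera Y", "Non-equal Gradable")
example_73 = ComparativeRelation(
    "longer", ("execution time",), "Program X", "Program Y", "Non-equal Gradable")
```

The two records share the same comparative word and type and differ only in the compared feature, which is exactly the information a domain-aware system would need in order to decide whether “longer” is desirable.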

Our work shares with Ganapathibhotla and Liu (2008) the following basic rules:

Increasing comparative + Negative item → Negative opinion → EntityS2 is preferred

Increasing comparative + Positive item → Positive opinion → EntityS1 is preferred

Decreasing comparative + Negative item → Positive opinion → EntityS1 is preferred

Decreasing comparative + Positive item → Negative opinion → EntityS2 is preferred
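The four basic rules reduce to a sign computation: a decreasing comparative reverses the orientation of the item, and the preferred entity follows from the resulting opinion. A minimal Python sketch, with hypothetical function and label names, could look as follows.

```python
from typing import Tuple


def comparative_opinion(direction: str, item_polarity: int) -> Tuple[str, str]:
    """Return (opinion, preferred entity) for a comparative sentence.

    direction     -- "increasing" or "decreasing"
    item_polarity -- prior polarity of the oriented item (non-zero)
    """
    if direction not in ("increasing", "decreasing"):
        raise ValueError("direction must be 'increasing' or 'decreasing'")
    if item_polarity == 0:
        raise ValueError("the item must be positively or negatively oriented")

    item_is_positive = item_polarity > 0
    # A decreasing comparative reverses the orientation of the item.
    opinion_is_positive = item_is_positive if direction == "increasing" else not item_is_positive

    if opinion_is_positive:
        return ("positive", "EntityS1")
    return ("negative", "EntityS2")
```

For instance, `comparative_opinion("decreasing", -2)` yields a positive opinion in which EntityS1 is preferred.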

However, differently from our work, the authors simply switched the polarity of the opinions (and consequently the preferred entity) in the cases in which comparative sentences contained negation words or phrases. They underlined that this operation could sometimes appear problematic: e.g. “not longer” does not mean “shorter”. Actually, the problem is that the authors did not take into account different degrees of positivity and negativity. As we will show in the next paragraphs of this Section, the co-occurrence of weakly positive/negative and intensely positive/negative Prior Polarities with comparison indicators can generate different final sentence scores.

As regards the automatic analysis of superlatives, we mention the work of Bos and Nissim (2006), which grounded the semantic interpretation of superlatives on the characterization of a comparison set (the set of entities that are compared to each other with respect to a certain dimension). The authors took into consideration the superlative modification performed by ordinals, cardinals or adverbs (e.g. intensifiers or modals) and also the fact that the superlative can manifest itself in both predicative and attributive form.

Their superlative parser, able to recognise a superlative expression and its comparison set with high accuracy, provides a Combinatory Categorial Grammar (CCG) (Clark and Curran, 2004) derivation of the input sentence, which is then used to construct a Discourse Representation Structure (DRS) (Kamp and Reyle, 1993).

In this thesis we did not go through a semantic and logical analysis of comparative and superlative sentences: as explained in the next paragraph, we just used them as Contextual Valence Shifters (CVS) in order to accurately modify the polarity of the entities they involve.

4.5.2 Comparison in SentIta

In general, this work, which focuses on Gradable comparison, includes the analysis of the following comparative structures:

• equative comparative frozen sentences of the type N0 Agg come C1 from the LG class PECO (see Section 3.4.2)

• opinionated comparative sentences that involve the expressions meglio di, migliore di “better than”, peggio di, peggiore di “worse than”, superiore a “superior to”, inferiore a “inferior to” (74)

• adjectival, adverbial and nominal comparative structures that respectively involve oriented adjectives, adverbs and nouns from SentIta (75)

• absolute superlative (75)

• comparative superlative (76)

The comparison with other products (74) has been evaluated with the same measures as the other sentiment expressions, so the polarity can range from -3 to +3.

The superlative, which may (76) or may not (75) imply a comparison with a whole class of items (Farkas and Kiss, 2000), confers on the first term of the comparison the highest polarity score, so it always increases the strength of the opinion. Its polarity can be -3 or +3.

(74) L’S3 è complessivamente superiore all’Iphone5 [+2]
“The S3 is on the whole superior to the iPhone5”

(75) Il suo motore era anche il più brioso[+2] [+3]
“Its engine was also the liveliest”

(76) Un film peggiore di qualsiasi telefilm [-3]
“A film worse than any TV series”

We did not include the equative comparative in the Contextual Valence Shifters Section because it does not change the polarity expressed by the opinion words (77).

(77) [Non[Negation] entusiasmante[+3]][-1] come i PES degli anni precedenti
“Not as exciting as the PES of the past years”

In the next paragraphs we will go through the rules used in this FSA network to compute the Semantic Orientations of minority and majority comparative structures. As will be observed, the variety and the complexity of co-occurrence rules can be easily managed using finite-state technology.

4.5.3 Interaction with the Intensification Rules

Rule I: increasing comparative sentences which occur with positive items from SentIta generate a positive opinion.
Comparative indicators are the following:

• Increasing Comparative Words (I-CW), e.g. più “more”

• Intensifiers + Increasing Comparative Words (Int + I-CW), e.g. molto più “a lot more”

The sentence score depends on the interaction of I-CWs, intensifiers and the prior polarity scores attributed to the SentIta words in the electronic dictionaries (see Table 4.7).

Rule II: increasing comparative sentences with increasing opinionated comparative items listed in SentIta with positive Prior Polarities generate a positive opinion.
Comparative indicators are the following:

• Increasing Opinionated Comparative Words (I-Op-CW), e.g. meglio “better” = +2

• Intensifiers + Increasing Opinionated Comparative Words (Int + I-Op-CW), e.g. molto meglio “a lot better” = +3

The sentence score just depends on the Prior Polarity of the I-Op-CW.

                                        Positive Prior Polarities
Increasing comparative   Examples       carino   buono   prodigioso
                                        +1       +2      +3
                                        Entity 1 Polarities
I-CW + OpW[POS]          più            +1       +2      +3
Int + I-CW + OpW[POS]    molto più      +2       +3      +3

Table 4.7: Rule I, increasing comparative sentences with positive orientation

                                        Positive Prior Polarities
Decreasing comparative   Examples       carino   buono   prodigioso
                                        +1       +2      +3
                                        Entity 1 Polarities
D-CW + OpW[POS]          meno           -1       -2      -1
Int + D-CW + OpW[POS]    molto meno     -2       -3      -1

Table 4.8: Rule III, decreasing comparative sentences with negative orientation

Rule III: decreasing comparative sentences which occur with positive items from SentIta generate a negative opinion (see Table 4.8).
Comparative indicators are the following:

• Decreasing Comparative Words (D-CW), e.g. meno “less”

• Intensifiers + Decreasing Comparative Words (Int + D-CW), e.g. molto meno “much less”

                                        Negative Prior Polarities
Increasing comparative   Examples       distratto   cafone   osceno
                                        -1          -2       -3
                                        Entity 1 Polarities
I-CW + OpW[NEG]          più            -1          -2       -3
Int + I-CW + OpW[NEG]    molto più      -2          -3       -3

Table 4.9: Rule IV, increasing comparative sentences with negative orientation

                                        Negative Prior Polarities
Decreasing comparative   Examples       distratto   cafone   osceno
                                        -1          -2       -3
                                        Entity 1 Polarities
D-CW + OpW[NEG]          meno           +1          +1       +1
Int + D-CW + OpW[NEG]    molto meno     +2          +2       +1

Table 4.10: Rule VI, decreasing comparative sentences with positive orientation

Rule IV: this rule is perfectly symmetrical to Rule I. Increasing comparative sentences which occur with negative items from SentIta generate a negative opinion.
The sentence score depends again on the interaction of I-CWs, intensifiers and the prior polarity scores attributed to the SentIta words that appear in the same sentence (see Table 4.9).

Rule V: decreasing comparative sentences with decreasing opinionated comparative items listed in SentIta with negative Prior Polarities generate a negative opinion.
Comparative indicators are the following:

• decreasing opinionated comparative words (I-Op-CW), e.g. peggio “worse” = -2

• Intensifiers + decreasing opinionated comparative words (Int + I-Op-CW), e.g. molto peggio “much worse” = -3

Just as in Rule II, the sentence score just depends on the Prior Polarity of the I-Op-CWs.

                                             Positive Prior Polarities
Increasing comparative       Examples        carino   buono   prodigioso
                                             +1       +2      +3
                                             Entity 1 Polarities
N + I-CW + OpW[POS]          non più         -1       -1      -1
N + Int + I-CW + OpW[POS]    non molto più   -1       -1      -1

Table 4.11: Negation of Rule I, negated increasing comparative sentences with negative orientation

                                             Positive Prior Polarities
Decreasing comparative       Examples        carino   buono   prodigioso
                                             +1       +2      +3
                                             Entity 1 Polarities
N + D-CW + OpW[POS]          non meno        +1       +2      +3
N + Int + D-CW + OpW[POS]    non molto meno  +1       +1      +2

Table 4.12: Negation of Rule III, negated decreasing comparative sentences with positive orientation

Rule VI: decreasing comparative sentences which occur with negative items from SentIta generate a positive opinion (see Table 4.10).
Rule VI shares the comparative indicators with Rule III.
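Since Rules I, III, IV and VI are fully specified by their score tables, they can be encoded as a plain lookup. The Python sketch below transcribes the values of those four tables; the dictionary layout and the function name are hypothetical choices made for illustration and do not reproduce the Nooj grammars.

```python
# Scores transcribed from the tables of Rules I, III, IV and VI.
# Key: (comparative type, intensified?) -> {prior polarity: sentence score}
COMPARATIVE_SCORES = {
    ("I-CW", False): {1: 1, 2: 2, 3: 3, -1: -1, -2: -2, -3: -3},   # più
    ("I-CW", True):  {1: 2, 2: 3, 3: 3, -1: -2, -2: -3, -3: -3},   # molto più
    ("D-CW", False): {1: -1, 2: -2, 3: -1, -1: 1, -2: 1, -3: 1},   # meno
    ("D-CW", True):  {1: -2, 2: -3, 3: -1, -1: 2, -2: 2, -3: 1},   # molto meno
}


def comparative_score(comparative: str, prior_polarity: int,
                      intensified: bool = False) -> int:
    """Look up the sentence score for a comparative word plus an oriented item."""
    try:
        return COMPARATIVE_SCORES[(comparative, intensified)][prior_polarity]
    except KeyError:
        raise ValueError("unknown comparative type or prior polarity") from None
```

A table-driven encoding makes the asymmetries visible: for example, `comparative_score("D-CW", 3)` returns -1 rather than -3, because a decreasing comparative applied to a strongly positive item yields only a weakly negative score.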

4.5.4 Interaction with the Negation Rules

Negation of Rule I: increasing comparative sentences which occur with positive items from SentIta, when negated, generate a negative opinion.
Comparative indicators are of course the same as in Rule I, with the addition of negation indicators (N), e.g. non “not”. The sentence score is always turned into the weakly negative one (see Table 4.11).

                                             Negative Prior Polarities
Increasing comparative       Examples        distratto   cafone   osceno
                                             -1          -2       -3
                                             Entity 1 Polarities
N + I-CW + OpW[NEG]          non più         +1          +1       -1
N + Int + I-CW + OpW[NEG]    non molto più   +1          +1       -2

Table 4.13: Negation of Rule IV, negated increasing comparative sentences with negative orientation

                                             Negative Prior Polarities
Decreasing comparative       Examples        distratto   cafone   osceno
                                             -1          -2       -3
                                             Entity 1 Polarities
N + D-CW + OpW[NEG]          non meno        -1          -2       -3
N + Int + D-CW + OpW[NEG]    non molto meno  -1          -2       -3

Table 4.14: Negation of Rule VI, negated decreasing comparative sentences with positive orientation

Negation of Rule II: increasing comparative sentences in which increasing opinionated comparative items listed in SentIta with positive Prior Polarities occur, when negated, always generate a negative opinion.
Comparative indicators are the same as in Rule II:

• Negation markers + Increasing Opinionated Comparative Words (N + I-Op-CW), e.g. non meglio “not better” = -2

• Negation markers + Intensifiers + Increasing Opinionated Comparative Words (N + Int + I-Op-CW), e.g. non molto meglio “not much better” = -1

Negation of Rule III: the negation of Rule III, which involves decreasing comparative sentences with positive items from SentIta, always generates positive opinions.
The different sentence scores, computed on the basis of the interaction of comparative words and SentIta’s Prior Polarities, are reported in Table 4.12.

Negation of Rule IV: increasing comparative sentences which occur with negative items from SentIta, when negated, generate positive opinions in most cases (see Table 4.13). The sentence score is influenced by the interaction of negation markers, I-CWs, intensifiers and negative prior polarity scores.

Negation of Rule V: decreasing comparative sentences in which increasing opinionated comparative items listed in SentIta with negative Prior Polarities occur, when negated, generate a weakly negative opinion (non (molto + E) peggio “not (much + E) worse”).

Negation of Rule VI: decreasing comparative sentences which occur with negative items from SentIta generate negative opinions if negated (see Table 4.14).
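The negated rules can be encoded in the same table-driven way, which also makes it possible to check where they depart from a plain polarity switch. The Python sketch below transcribes the score tables of the negated and un-negated versions of Rules I, III, IV and VI; all names are hypothetical and the code is only an illustration of the scoring logic.

```python
# Un-negated scores (Rules I, III, IV, VI), kept here only for comparison.
PLAIN_COMPARATIVE_SCORES = {
    ("I-CW", False): {1: 1, 2: 2, 3: 3, -1: -1, -2: -2, -3: -3},
    ("I-CW", True):  {1: 2, 2: 3, 3: 3, -1: -2, -2: -3, -3: -3},
    ("D-CW", False): {1: -1, 2: -2, 3: -1, -1: 1, -2: 1, -3: 1},
    ("D-CW", True):  {1: -2, 2: -3, 3: -1, -1: 2, -2: 2, -3: 1},
}

# Negated scores (Negation of Rules I, III, IV, VI).
NEGATED_COMPARATIVE_SCORES = {
    ("I-CW", False): {1: -1, 2: -1, 3: -1, -1: 1, -2: 1, -3: -1},   # non più
    ("I-CW", True):  {1: -1, 2: -1, 3: -1, -1: 1, -2: 1, -3: -2},   # non molto più
    ("D-CW", False): {1: 1, 2: 2, 3: 3, -1: -1, -2: -2, -3: -3},    # non meno
    ("D-CW", True):  {1: 1, 2: 1, 3: 2, -1: -1, -2: -2, -3: -3},    # non molto meno
}


def negated_comparative_score(comparative: str, prior_polarity: int,
                              intensified: bool = False) -> int:
    """Sentence score for a negated comparative plus an oriented item."""
    return NEGATED_COMPARATIVE_SCORES[(comparative, intensified)][prior_polarity]


def differs_from_sign_switch(comparative: str, prior_polarity: int,
                             intensified: bool = False) -> bool:
    """True when the negated score is not just the un-negated score with its sign flipped."""
    plain = PLAIN_COMPARATIVE_SCORES[(comparative, intensified)][prior_polarity]
    return negated_comparative_score(comparative, prior_polarity, intensified) != -plain
```

For instance, non più prodigioso scores -1, while a simple switch of più prodigioso (+3) would give -3; non meno carino (+1), on the other hand, coincides with the switched value.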

Chapter 5

Specific Tasks of the Sentiment Analysis

5.1 Sentiment Polarity Classification

Differently from traditional topic-based text classification, sentiment classification aims to categorize documents through the classes positive and negative. The objective can be a simple binary classification, or it can be a more complex classification with a larger number of classes in the continuum between positive and negative (Pang and Lee, 2008a). The rating-inference problem (Pang and Lee, 2005; Leung et al., 2006, 2008; Shimada and Endo, 2008) regards the mapping of automatically detected document polarity scores onto the fine-grained rating scales (e.g. 5/10) of existing Collaborative Filtering algorithms (Breese et al., 1998; Herlocker et al., 1999; Sarwar et al., 2001; Linden et al., 2003). The challenging aspect of sentiment classification is that the sentiment classes, far from being unrelated to one another, represent opposing (binary classification) or ordinal categories (multi-class and rating-inference).

5.1.1 Related Works

Many techniques have been discussed in the literature to perform Sentiment Polarity Classification. In Section 3.2 we divided them into lexicon-based methods, learning methods and hybrid methods¹.

The first, which calculate the sentiment orientation of sentences and documents starting from the polarity of the words and phrases they contain, are the methods that have been chosen for the present work. This is why they received an in-depth analysis in Chapter 3, where we also discussed the pros and cons of both the manual and the automatic drafting of sentiment lexical resources. In this respect we mention again the major contributions related to knowledge-based solutions: Landauer and Dumais (1997); Hatzivassiloglou and McKeown (1997); Wiebe (2000); Turney (2002); Turney and Littman (2003); Riloff et al. (2003); Hu and Liu (2004); Kamps et al. (2004); Vermeij (2005); Gamon and Aue (2005); Esuli and Sebastiani (2005, 2006a); Kanayama and Nasukawa (2006a); Benamara et al. (2007); Kaji and Kitsuregawa (2007); Rao and Ravichandran (2009); Mohammad et al. (2009); Neviarouskaya et al. (2009a); Velikovich et al. (2010); Taboada et al. (2006, 2011).

Rule-based approaches that take into account the syntactic dimension of Sentiment Analysis are the ones used by Mulder et al. (2004), Moilanen and Pulman (2007) and Nasukawa and Yi (2003).

The most used classification algorithms in learning and statistical methods are Support Vector Machines, which, trained on specific data-sets, map a space of unstructured data points onto a new structured space through a kernel function, and Naïve Bayes, which are probabilistic classifiers that connect the presence/absence of a class feature to the absence/presence of other features, given the class variable. Effective sentiment features exploited in supervised machine learning applications are terms, their frequencies and TF-IDF, parts of speech, sentiment words and phrases, sentiment shifters, syntactic dependency, etc.

Tan et al. (2009) and Kang et al. (2012) adapted Naïve Bayes algorithms for sentiment analysis.

Pang et al. (2002), with the purpose of testing the hypothesis that sentiment classification can be treated just as a special case of topic-based categorization, with the positive and negative sentiment considered as topics, used SVM, Naïve Bayes and Maximum Entropy classifiers with diverse sets of features.

Mullen and Collier (2004) employed the values potency, activity and evaluative from a variety of sources and created a feature space classified by means of SVM.

Ye et al. (2009) compared SVM with other statistical approaches, proving that SVM and n-gram outperform the Naïve Bayes approach.

The minimum-cut framework working on graphs has been used by Pang and Lee (2004).

Bespalov et al. (2011) performed sentiment classification via latent n-gram analysis.

Nakagawa et al. (2010) presented a dependency tree-based classification method grounded on conditional random fields.

Compositionality algorithms have been tested by Yessenalina and Cardie (2011) and Socher et al. (2013). Socher et al. (2013) proposed the Recursive Neural Tensor Network, able to compute compositional vector representations for phrases of variable length or syntactic type, obtaining better performances than standard recursive neural networks, matrix-vector recursive neural networks, Naïve Bayes, bi-gram Naïve Bayes and SVM. Their output is the annotated corpus Stanford Sentiment Treebank, which represents a precious resource for the analysis of the compositional effects of sentiment in language.

Sentiment indicators can be used for sentiment classification in an unsupervised manner: this is the case of Turney (2002) and the other lexicon-based methods (Liu, 2012).

Finally, as regards the hybrid methods, we mention the works of Andreevskaia and Bergler (2008), who integrated the accuracy of a corpus-based system with the portability of a lexicon-based system in a single ensemble of classifiers; Aue and Gamon, who combined nine SVM-based classifiers; Dasgupta and Ng (2009), who first mined the unambiguous reviews through spectral techniques and afterwards exploited them to classify the ambiguous reviews by combining active learning, transductive learning and ensemble learning; Goldberg and Zhu (2006), who presented an adaptation of graph-based learning algorithms for rating-inference problems; and Prabowo and Thelwall (2009), who combined rule-based and machine learning classification algorithms.

¹ In general, while hybrid methods are characterized by the application of classifiers in sequence, approaches based on rules consist of an if-then relation between an antecedent and its associated consequent (Prabowo and Thelwall, 2009).

5.1.2 DOXA: a Sentiment Classifier based on FSA

Using the command-line program noojapply.exe, we built Doxa, a prototype written in Java by which users can automatically apply the Nooj lexicon-grammatical resources from Section 3 and the rules from Section 4 to every kind of text, getting back statistics on the opinions expressed in each case.

Doxa sums up the values corresponding to every sentiment expression and then normalizes the result by the total number of sentiment expressions contained in each review. Neutral sentences are simply ignored by both the FSA and Doxa.

The automatically attributed polarity scores are compared with the stars that the opinion holder gave to the review. Then, through the statistics about the opinions expressed in every domain, Doxa tests the consistency of the sentiment lexical databases and tools with respect to the rating-inference problem, in each document and in the whole corpus.

Figure 5.1 reports an example of the analysis on the domain of hotel reviews.
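The aggregation step described above (sum the scores of the sentiment expressions, then divide by their number) can be sketched as follows. Doxa itself is written in Java and its code is not shown in this work, so the Python functions below are a hypothetical illustration of the arithmetic only; in particular, treating a score of 0 as "neutral, to be ignored" and returning no score for a document without sentiment expressions are assumptions.

```python
from typing import Optional, Sequence


def document_score(expression_scores: Sequence[int]) -> Optional[float]:
    """Average the scores of the sentiment expressions found in one review.

    Neutral items (score 0) are ignored; None is returned when the review
    contains no sentiment expression at all.
    """
    scored = [s for s in expression_scores if s != 0]
    if not scored:
        return None
    return sum(scored) / len(scored)


def document_polarity(score: Optional[float]) -> str:
    """Map a document score to a coarse polarity label."""
    if score is None or score == 0:
        return "undetermined"
    return "positive" if score > 0 else "negative"


# A review annotated with three sentiment expressions scored +2, +3 and -1.
review_score = document_score([2, 3, -1, 0])
review_label = document_polarity(review_score)
```

How the averaged score is then mapped onto the star scale is not specified here, so that step is deliberately left out of the sketch.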

Figure 5.1: Doxa, the LG sentiment classifier based on finite-state technology

5.1.3 A Dataset of Opinionated Reviews

Although a large number of corpora for sentiment analysis are available for the English language (Hu and Liu, 2004; Pang and Lee, 2004, 2005; Wiebe et al., 2005; Wilson et al., 2005; Thomas et al., 2006; Jindal and Liu, 2007; Snyder and Barzilay, 2007; Blitzer et al., 2007; Seki et al., 2007; Macdonald et al., 2007; Pang and Lee, 2008b, among others), the Italian language presents fewer contributions in this sense (Basile and Nissim, 2013; Bosco et al., 2013, 2014).

The dataset used to evaluate the lexical and grammatical resources of the present work has been built from scratch using Italian opinionated texts in the form of users’ reviews and comments from e-commerce and opinion websites. It contains 600 text units and covers six different domains, for each of which different websites have been exploited:

Cars: www.ciao.it

Smartphones: www.tecnozoom.it, www.ciao.it, www.amazon.it, www.alatest.it

Books: www.amazon.it, www.qlibri.it

Movies: www.mymovies.it, www.cinemalia.it, www.filmtv.it, www.filmscoop.it

Hotels: www.tripadvisor.it, www.expedia.it, www.venere.com, it.hotels.com, www.booking.com

Videogames: www.amazon.it

Details about the corpus are shown in Table 5.1.

Text Features   Cars     Smartphones   Books    Movies   Hotels   Games   Tot
Neg Docs        50       50            50       50       50       50      300
Pos Docs        50       50            50       50       50       50      300
Text Files      20       20            20       20       20       20      120
Word Forms      17163    19226         8903     37213    12553    5597    101655
Tokens          21663    24979         10845    45397    16230    7070    126184

Table 5.1: Dataset of opinionated online customer reviews

Adjectives, as is commonly recognized in the literature (Hatzivassiloglou and McKeown, 1997; Hu and Liu, 2004; Taboada et al., 2006), seem to be the most reliable SO indicators, considering that 17% of the adjectives occurring in the corpus are polarized, compared to 3% of the adverbs, 2% of the nouns and 7% of the verbs.

Moreover, considering all the opinion-bearing words in the corpus, the adjectives’ patterns cover 81% of the total number of occurrences (almost 5000 matches), while the adverbs, the nouns and the verbs reach, respectively, 4%, 6% and 2%. The remaining 7% is covered by the other sentiment expressions, which in any case contribute to the achievement of satisfactory levels of Recall.

As regards the Semantic Orientation, while in the dictionaries almost 70% of the words are connoted by a negative Prior Polarity, the numbers of positive and negative expressions in our perfectly balanced corpus lead to almost antipodal results: 64% of the sentiment expressions possess a positive polarity. In particular, 61% of the adjective expressions, 77% of the adverb expressions, 65% of the noun expressions and 51% of the verb expressions are positive.

The domain-independent expressions have diametrically opposite percentages: only 34% of the occurrences are positive. This result is due to the presence of the really productive negative metanodes troppo “too much” and poco “not much”. Deactivating them, in fact, the positive percentage reaches 58% of occurrences. Thus, it could be stated that, although the lexicon makes available a greater number of negative items, the real occurrences of the opinion words tip the balance in the direction of the positive items (see Section 3.3.2.1 for more on the phenomenon of positive bias).

5.1.4 Experiment and Evaluation

In the sentiment classification task, the information unit is represented by an opinionated document as a whole. The purpose of our sentiment tool is to classify such documents on the basis of their overall Semantic Orientation.

To ground this task on a lexicon means to hypothesize that the polarities of opinion words can be considered indicators of the polarity of the document in which the words and the expressions are contained.

Our Lexicon-Grammar based sentiment lexical database confers on elementary sentences the status of minimum semantic units of the language. Therefore, the indicators on which we grounded our classification go beyond the limit of mere words, to match entire elementary sentences or polar phrases from complex sentences.

Because the inference of the whole document SO depends on the correct annotation of its sentences and phrases, the evaluation phase involves both the sentence-level and the document-level performances of the tool.

5.1.4.1 Sentence-level Sentiment Analysis

This Paragraph presents the results pertaining to the Nooj output, which has been produced by applying the SentIta LG sentiment resources to our corpus of customer reviews.

For the sake of clarity and for the reproducibility of the results, it must be pointed out that some of the lexical resources have been applied with higher priority attributions. Details about the preferences are given below:

• H1 to the sentiment adverb and noun dictionaries

• H2 to the sentiment adjective dictionary

• H3 to the freely available dictionary Contrazioni, which belongs to the official Italian module of Nooj

• H4 to a high-level priority dictionary that contains what is commonly called a “stop word list”

• H5 to a dictionary of compound adverbs

In order to avoid ambiguity as much as possible, the dictionary of the verbs of sentiment must have the same priority as the Italian standard dictionary (sdic), lower than any other mentioned resource.

Because our lexical and grammatical resources are not domain-specific, we observed their interaction with every single part of the corpus, which is composed of many different domains, each one characterized by its own peculiarities.

Moreover, in order to verify the performance of every part of speech (and of the expressions connected to them), we checked, as shown in Table 5.2, the Precision and the Recall by separately applying every single metanode (A, ADV, N, V, D-ind²) of the main graph of the sentiment grammar.

Nodes     Cars   Smartphones   Movies   Books   Hotels   Videogames   Average
A         88     90            79       87.5    91.5     83.5         86.6
ADV       80.4   75.8          87.9     92.3    92       50           79.7
N         81.8   85.7          82.8     77.8    85.3     85.7         83.2
V         88.2   57.1          84.8     89.5    57.1     100          79.5
D-ind     87.9   83.5          78.1     76.7    90       94.7         87.9
Average   85.3   78.4          82.5     84.8    83.2     82.8         82.8

Table 5.2: Precision measure in sentence-level classification

The sentence-level Precision has been manually calculated by determining whether the polarity and the intensity score of the expressions were correctly assigned in all the concordances produced by Nooj. We made an exception for the expressions of the adjectives which, due to the extremely large number of concordances, have been evaluated by extracting a sample of 1200 matches (200 for each domain).

The values marked by the asterisks have been reported for the sake of completeness, but they are not really relevant because of the small number of concordances on which they have been calculated.

The best performance has been reached on average by the cars dataset, while the lowest values have been assigned to the movies dataset. The performances of the adjectives grammar are really satisfying, especially if compared with the high number of matches they produced. Even though the domain-independent node covers a small part of the matches (7%), it reached the best Precision results compared with the other metanodes.

In many cases, however, just evaluating the polarity and the intensity of the expressions contained in a document is not enough to determine whether such expressions truly contribute to the identification of the polarity and the intensity of the document itself. Sometimes polarized sentences and phrases are not opinion indicators, especially in the reviews of movies and books, in which polarized sentences can just refer to the plot (78).

² Domain-independent sentiment expressions (D-ind) refer to all the frozen and semi-frozen expressions that have been directly formalized in the FSA, and also include the vulgar expressions.

(78) Le belle case sono dimora della degenerazione più bieca [-3]
“Pretty houses host the grimmest degeneracy”

An opinionated sentence can also be considered not pertinent in the case in which it does not directly refer to the object of the opinion, but mentions aspects that are just indirectly connected with the product. Examples of this are (79), which does not refer to the product but to the delivery service, and (80), which, in a book review, does not refer to the object but to another media product related to the book.

(79) Apprezzo[+2] molto[+] Amazon per questo [+3]
“I really appreciate Amazon for this”

(80) Il film non[Negation] mi era piaciuto[+2] [-2]
“I didn’t like the movie”

In this work we chose to exclude the second sentence from the correct matches, but we considered it worthwhile to include the first one because of the strong influence that this feature has on the numerical value that the opinion holder confers on his own opinion.

Table 5.3 corrects the results reported in Table 5.2 by excluding from the correct matches the ones that do not contribute to the identification of the correct document SO. This makes the average sentence-level Precision drop by 8.7 points, reaching 74.1%, which we considered adequate.

Nodes     Cars    Smartphones   Movies   Books   Hotels   Videogames   Average
A         -5      -5.5          -28      -10.5   -1       -1.5         -8.6
ADV       -2.2    0             -29.7    -7.7    0        0            -6.6
N         -11.4   -14.3         -39.5    -14.8   -5.9     -14.3        -16.7
V         0       0             -17.6    -15.8   0        0            -5.6
D-ind     -8.6    0             -13.3    -6.7    -2.5     -5.3         -6.1
Average   -5.4    -4            -25.6    -11.1   -1.9     -4.2         -8.7

Table 5.3: Irrelevant matches in sentence-level classification

We chose to present these two results separately because in this way it is possible to distinguish the domains that are more influenced by this problem (e.g. movies and books) from the ones that are less affected by the pertinence problem (e.g. hotels and smartphones).
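The correction amounts to adding the (negative) share of irrelevant matches to the corresponding Precision value. The Python sketch below applies it to the Average rows of the two tables; the overall figure (82.8 - 8.7 = 74.1) is the one reported above, whereas the per-domain corrected values are derived here for illustration and are not stated in the source. Variable names are hypothetical.

```python
# Average rows of the Precision table and of the irrelevant-matches table.
average_precision = {
    "Cars": 85.3, "Smartphones": 78.4, "Movies": 82.5,
    "Books": 84.8, "Hotels": 83.2, "Videogames": 82.8,
}
irrelevant_matches = {
    "Cars": -5.4, "Smartphones": -4.0, "Movies": -25.6,
    "Books": -11.1, "Hotels": -1.9, "Videogames": -4.2,
}

# Corrected Precision per domain (derived, not reported in the source).
corrected_precision = {
    domain: round(average_precision[domain] + irrelevant_matches[domain], 1)
    for domain in average_precision
}

# Overall corrected Precision, as reported: 82.8 - 8.7 = 74.1.
overall_corrected = round(82.8 + (-8.7), 1)
```

Read this way, movies lose more than 25 points and hotels fewer than 2, which is the contrast between plot-heavy and product-focused domains discussed above.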

5.1.4.2 Document-level Sentiment Analysis

As far as the document-level performance is concerned (Table 54)we calculated the precision twice by considering in a first case as truepositive the reviews correctly classified by Doxa on the base of theirpolarity and in a second case by considering as true positive the docu-ments that received by our tool exactly the same stars specified by theOpinion HolderIn detail in the Polarity only row the True Positives are the documentsthat have been correctly classified by Doxa with a polarity attributionthat corresponds to the one specified by the Opinion HolderIn the Intensity also row the True Positives are the document that re-ceived by Doxa exactly the same stars specified by the Opinion Holder


Precision       Cars   Smartphones  Movies  Books  Hotels  Videogames  Average
Polarity only   71.0   72.0         63.0    74.0   91.0    72.0        74.0
Intensity also  32.0   45.0         25.0    33.0   49.0    34.0        36.3

Table 5.4: Precision measure in document-level classification

As we can see, the latter seems to have a very low precision, but upon deeper analysis we discovered that it is really common for Opinion Holders to write review texts that do not perfectly correspond to the stars they specified. This increases the importance of software like Doxa, which does not limit its analysis to structured data but enters the semantic dimension of texts written in natural language.

Recall          Cars   Smartphones  Movies  Books  Hotels  Videogames  Average
Sentence-level  72.7   79.6         64.8    65.7   72.1    58.8        69.0
Document-level  100    98.6         100     96.1   98.9    91.2        97.5

Table 5.5: Recall in both the sentence-level and the document-level classifications

The Recall pertaining to the sentence-level performance of our tool has been manually calculated on a sample of 150 opinionated documents (25 from each domain), by considering as false negatives the sentiment indicators which have not been annotated by our grammar.

The document-level Recall, instead, has been automatically checked with Doxa, by considering as true positives all the opinionated documents which contained at least one appropriate sentiment indicator. So the documents in which the Nooj grammar did not annotate any pattern were considered false negatives. Because we assumed that all the texts of our corpus were opinionated, we also considered as false negatives the cases in which the value of the document was 0.

Taking the F-measure into account, the best results were achieved with the smartphone domain (77.0%) in the sentence-level task and with the hotel dataset (94.8%) in the document-level task.
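A minimal sketch of the two measures discussed here, assuming that the F-measure is the balanced harmonic mean of Precision and Recall. The function names and the convention of representing an unannotated document with None are hypothetical. The first call uses the hotel figures reported for the document level (Precision 91.0 in the Polarity only row, Recall 98.9).

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def document_recall(document_values):
    """One value per opinionated document; None means no pattern was annotated.

    Documents with no annotation, or with an overall value of 0, count as
    false negatives, as described in the text.
    """
    found = sum(1 for v in document_values if v is not None and v != 0)
    return found / len(document_values)


print(round(f_measure(91.0, 98.9), 1))             # 94.8
print(document_recall([2.5, -1.0, None, 0, 3.0]))  # 0.6
```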

5.2 Feature-based Sentiment Analysis

The purpose of feature-based sentiment analysis is to provide companies with overviews of customer opinions, which automatically summarize the strengths and the weaknesses of the products and services they offer. In Section 1 we connected the definition of the opinion features to the following function (Liu, 2010):

$T = O(f)$

Where the features ($f$) of an object ($O$) (e.g. hotels) can be brand names (e.g. Artemide, Milestone), properties (e.g. cleanness, location), parts (e.g. rooms, apartments), related concepts (e.g. city toponyms) or parts of the related concepts (e.g. city centre, tube, museum, etc.) (Popescu and Etzioni, 2007). Liu (2010) represented every object as a “special feature” ($F$) defined by a subset of features ($f_i$):

$F = \{f_1, f_2, \ldots, f_n\}$

Both $F$ and $f_i$, grouped together under the noun target, can be automatically discovered in texts through both synonym words and phrases $W_i$, or other direct or indirect indicators $I_i$:

$W_i = \{w_{i1}, w_{i2}, \ldots, w_{im}\}$

$I_i = \{i_{i1}, i_{i2}, \ldots, i_{iq}\}$
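The object model above can be sketched as a small data structure. This is only an illustration of the sets F, W_i and I_i: the class names, the matching strategy (exact lowercase token match) and the sample words are assumptions, not part of Liu's formulation or of our resources.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    name: str                                          # f_i
    synonyms: set[str] = field(default_factory=set)    # W_i
    indicators: set[str] = field(default_factory=set)  # I_i

    def matches(self, token: str) -> bool:
        token = token.lower()
        return (token == self.name.lower()
                or token in self.synonyms
                or token in self.indicators)


@dataclass
class OpinionObject:
    name: str
    features: list[Feature]  # F = {f_1, ..., f_n}

    def find_features(self, tokens):
        """Return the feature names evoked by the tokens, in text order."""
        return [f.name for t in tokens for f in self.features if f.matches(t)]


hotel = OpinionObject("hotel", [
    Feature("staff", synonyms={"personnel"}),
    Feature("location", indicators={"tube", "centre"}),
])
tokens = "The personnel was kind and the tube is close".split()
print(hotel.find_features(tokens))  # ['staff', 'location']
```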


It is essential to discover not only the overall polarity of an opinionated document but also its topic and features, in order to discern which aspect of a product must be improved, or whether the opinions extracted by the Sentiment Analysis applications are relevant to the product or not. For example, (81), an extract from a hotel review, expresses two opinions on two different kinds of features: the behavior of the staff (81a) and the beauty of the city in which the hotel is located (81b).

(81a) [Opinion[Feature Personale] meravigliosamente accogliente]
“Staff wonderfully welcoming”

(81b) [Opinion[Feature Roma] è bellissima]
“Rome is beautiful”

It is crucial to distinguish between the two topics, because just one of them (81a) refers to an aspect that can be influenced or improved by the company.

5.2.1 Related Works

Pioneering works on feature-based opinion summarization are Hu and Liu (2004, 2006), Carenini et al. (2005), Riloff et al. (2006) and Popescu and Etzioni (2007). Both Popescu and Etzioni (2007) and Hu and Liu (2004) first identified the product features on the basis of their frequency and then calculated the Semantic Orientation of the opinions expressed on these features. In order to find the most important features commented on in reviews, Hu and Liu (2004) used association rule mining, thanks to which frequent itemsets can be extracted from free texts. Redundant and meaningless items are removed during a Feature Pruning phase. Experiments have been carried out on the first 100 reviews belonging to five classes of electronic products (www.amazon.com and www.C|net.com). Hu and Liu (2006) presented the algorithm ClassPrefix-Span, which aimed to find a special kind of pattern, the Class Sequential Rules (CSR), using fixed targets and classes. Their method was based on sequential pattern mining.

Carenini et al. (2005) struck a balance between supervised and unsupervised approaches. They mapped crude (learned) features into a User-Defined taxonomy of the entity’s Features (UDF) that provided a conceptual organization for the information extracted. This method took advantage of a similarity matching, in which the UDF reduced the redundancies by grouping together identical features and then organized and presented information by using hierarchical relations. Experiments have been conducted on a testing set containing Pros, Cons and detailed free reviews of five products.

Riloff et al. (2006) used the subsumption hierarchy in order to identify complex features and then to reduce a feature set by removing useless features, which have, for example, a more general counterpart in the subsumption hierarchy. The feature representations used for opinion analysis are n-grams (unigrams, bigrams) and lexico-syntactic extraction patterns. Popescu and Etzioni (2007) presented OPINE, an unsupervised feature and opinion extraction system that used the Web as a corpus to identify explicit and implicit features, and relaxation-labelling methods to infer the Semantic Orientation of words. The system draws on WordNet’s semantic relations and hierarchies for the identification of the features (parts, properties and related concepts) and the creation of clusters of words.

Ferreira et al. (2008) made a comparison between the likelihood ratio test approach (Yi et al., 2003) and the association mining approach (Hu and Liu, 2004). The e-product dataset (topical documents) and the annotation scheme (improved with some revisions) are the ones used by Hu and Liu (2004). In particular, the revised annotation scheme comprised:

• part-of relationship with the product (e.g. battery is a part of a digital camera)

• attribute-of relationship with the product (weight and design are attributes of a camera)

• attribute-of relationship with the product’s feature (battery life is an attribute of the battery, which is in turn a camera’s feature)

Double Propagation (Qiu et al., 2009) focuses on the natural relation between opinion words and features. Because opinion words are often used to modify features, such relations can be identified thanks to the dependency grammar.

Because these methods give good results only for medium-size corpora, they must be supported by other feature mining methods. The strategy proposed by Zhang et al. (2010) is based on “no” patterns and part-whole patterns (meronymy):

• NP + Prep + CP: “battery of the camera”

• CP + with + NP: “mattress with a cover”

• NP CP or CP NP: “mattress pad”

• CP Verb NP: “the phone has a big screen”

Where NP is a noun phrase and CP a class concept phrase. The verbs used are “has”, “have”, “include”, “contain”, “consist”, etc. “No” patterns are feature indicators as well. Examples of such patterns are “no noise” or “no indentation”. Clearly, a manually built list of fixed “no” expressions like “no problem” is filtered out from the feature candidates. Every sentence is POS-tagged using Brill’s tagger (Brill, 1995) and used as system input.

In order to find important features and rank them high, Zhang et al. (2010) used a web page ranking algorithm named HITS (Kleinberg, 1999).

Somprasertsri and Lalitrojwong (2010) used a dependency-based approach for the opinion summarization task. A central stage in their work is the extraction of relations between product features (“the topic of the sentiment”) and opinions (“the subjective expression of the product feature”) from online customer reviews. Adjectives and verbs have been used in this study as opinion words. The maximum entropy model has been used in order to predict the opinion-relevant product feature relation. In general, in a dependency tree, a feature can be of differing kinds:

• Child: the product feature is the subject or object of the verbs, and the opinion word is a verb or a complement of a copular verb (opinions depend on the product features)

e.g. I like(opinion) this camera(feature)

• Parent: the opinion words are in the modifiers of product features, which include adjectival modifier, relative clause modifier, etc. (product features depend on the opinions)

e.g. I have found that this camera takes incredible(opinion) pictures(feature)

• Sibling: the opinion word may also be in an adverbial modifier, a complement of the verb or a predicative (both opinions and product features depend on the same words)

e.g. The pictures(feature) some time turn out blurry(opinion)

• Grand Parent: the opinion words are adjectival complement of modifiers of product features (in the following example, the word “good” is the adverbial complement of the relative clause modifier of the noun phrase “movie mode”) (opinions depend on the words which depend on the product features)

e.g. It has movie mode(feature) that works good(opinion) for a digital camera

• Grand Child: the product feature is the subject or object of the complements, and the opinion word is a verb or a complement of a copular verb (product features depend on the words which depend on the opinions)

e.g. It’s great(opinion) having the LCD display(feature)

• Indirect: none of the above relations

The Stanford Parser has been used by Somprasertsri and Lalitrojwong (2010) to parse documents in the pre-processing phase, before the dependency analysis stage. Definite linguistic filtering (POS-based) patterns and the General Inquirer dictionary (Stone et al., 1966) have been used to locate noun phrases and to extract product feature candidates. The generalized iterative scaling algorithm (Darroch and Ratcliff, 1972) has been used to estimate parameters or weights of the selected features. Because it is possible to refer to a particular feature using several synonyms, Somprasertsri and Lalitrojwong (2010) used semantic information encoded into a product ontology, manually built by integrating manufacturer product descriptions and terminologies in customer reviews.

Wei et al. (2010) proposed a semantic-based method that made use of a list of positive and negative adjectives defined in the General Inquirer to recognize opinion words, and then extracted the related product features in consumer reviews.

Xia and Zong (2010) performed the feature extraction and selection tasks using word relation features, which seem to be effective features for sentiment classification because they encode relation information between words. They can be of different kinds: Uni (unigrams), WR-Bi (traditional bigrams), WR-Dp (word pairs of dependency relation), GWR-Bi (generalized bigrams) and GWR-Dp (dependency relations).

Gutiérrez et al. (2011) exploited Relevant Semantic Trees (RST) for word-sense disambiguation and measured the association between concepts at the sentence level using the association ratio measure.

Mejova and Srinivasan (2011) explored different feature definition and selection techniques (stemming, negation-enriched features, term frequency versus binary weighting, n-grams and phrases) and approaches (frequency-based vocabulary trimming, part-of-speech and lexicon selection, and expected Mutual Information). Mejova’s experimental results confirmed the fact that adjectives are important for polarity classification. Furthermore, they revealed that stemming and using binary instead of term frequency feature vectors do not have any impact on performance, and that the helpfulness of certain techniques depends on the nature of the dataset (e.g. size and class balance).

Concordance-based Feature Extraction (CFE) is the technique used by Khan et al. (2012). After a traditional pre-processing step, regular expressions are used to extract candidate features. Evaluative adjectives, collected on the basis of a seed list from Hu and Liu (2004), are helpful in the feature extraction task. In the end, a grouping phase found the appropriate features for the opinion’s topic, grouping together all the related features and removing the useless ones. The algorithm used in this phase is based on the co-occurrence of features and uses the left and right feature’s context.

Khan et al. selected candidate product features by employing noun phrases that appear in texts close to subjective adjectives. This intuition is shared with Wei et al. (2010) and Zhang and Liu (2011). The centrepiece of Khan’s method is represented by hybrid patterns, Combined Pattern Based Noun Phrases (cBNP), that are grounded on the dependency relation between subjective adjectives (opinionated terms) and nouns (product features). Nouns and adjectives can sometimes be connected by linking verbs (e.g. “camera produces fantastically good pictures”). Preposition-based noun phrases (e.g. “quality of photo”, “range of lenses”) often represent entity-to-entity or entity-to-feature relations. The last stage is the proper feature extraction phase, in which, using an ad hoc module, the noun phrases of the cBNP patterns have been designated as product features.

5.2.2 Experiment and Evaluation

In order to contribute to the resolution of the opinion feature extraction and selection tasks, we exploited both the SentIta database and other preexisting Italian lexical resources of Nooj. The latter consist of an objective dictionary that describes its lemmas with appropriate semantic properties; it has been used to define and summarize the features on which the opinion holder expressed his opinions.

In order to identify the relations between opinions and features, the lexical items contained in the just mentioned dictionaries have been systematically recalled into a Nooj local grammar, which provides as feedback annotations about the (positive or negative) nature of the opinion and about the semantic category of the features.

The objective lexicon consists of a Nooj dictionary of concrete nouns (Table 5.6) that has been manually built and checked by seven linguists. It counts almost 22000 nouns, described from a semantic point of view by the property Hyper, which specifies every lemma’s hyperonym. The tags it includes are shown in Table 5.6. Through the software Nooj it has been possible to create a connection between these dictionaries and an ad hoc FSA for the feature analysis.

Label                            Tag      Entries  Example    Translation
body part                        Npc      598      labbro     lip
organism part                    Npcorg   627      colon      colon
text                             Ntesti   1168     rubrica    address book
(part of) article of clothing    Nindu    1009     vestaglia  nightgown
cosmetic product                 Ncos     66       rossetto   lipstick
food                             Ncibo    1170     agnello    lamb
liquid substance                 Nliq     341      urina      urine
potable liquid substance         Nliqbev  29       sidro      cider
money                            Nmon     216      rublo      ruble
(part of) building               Nedi     1580     torre      tower
place                            Nloc     1018     valle      valley
physical substance               Nmat     1727     agata      agate
(part of) plant                  Nbot     2187     zucca      pumpkin
medicine                         Nfarm    415      sedativo   sedative
drug                             Ndrug    69       mescalina  mescaline
chemical element                 Nchim    1212     cloruro    chloride
(part of) electronic device      Ndisp    1191     motosega   chainsaw
(part of) vehicle                Nvei     1082     gondola    gondola
(part of) furniture              Narr     450      tavolo     table
instrument                       Nstrum   3666     ventaglio  fan
Tot                              -        19946    -          -

Table 5.6: Semantically Labeled Dictionary of Concrete Nouns

As we will demonstrate below, the feature classification is strongly dependent on the review domains and topics, on the basis of which specific local grammars need to be adapted (Engström, 2004; Andreevskaia and Bergler, 2008).

As shown in Figure 5.2, in the Feature Extraction phase the sentences or the noun phrases expressing opinions on specific features are found in free texts. Annotations regarding their polarity (BENEFIT/DRAWBACK) and their intensity (SCORE=[-3, +3]) are attributed to each sentence on the basis of the opinion words on which they are anchored.

Obviously, the graph reported in Figure 5.2 represents just an extract of the more complex FSA built to perform the task, which includes more than 100 embedded graphs and thousands of different paths, able to describe not only the sentences and the phrases anchored on sentiment adjectives, but also the ones that contain sentiment adverbs, nouns, verbs, idioms and other lexicon-independent opinion-bearing expressions (e.g. essere (dotato + fornito + provvisto) di “to be equipped with”, essere un (aspetto + nota + cosa + lato) negativo “to be a negative side”).

Figure 5.2: Extract of the Feature Extraction grammar

The Contextual Valence Shifters that have been considered are the ones described in Chapter 4.

Differently from the other approaches proposed in the literature, the Feature Pruning task (Figure 5.3), in which the features extracted are grouped together on the basis of their semantic nature, is performed at the same time as the Extraction phase, by applying the dictionaries-grammar pair to free texts just once. This happens because the grammar shown in Figure 5.3 is contained in the metanode NG (Nominal Group) of the feature extraction grammar.

Thanks to the instructions contained in this subgraph, the words described in our dictionary of concrete nouns as belonging to the same hyperonym receive the same annotation. The words labeled with the tags Narr (furniture, e.g. “bed”) and Nindu (article of clothing, e.g. “bath towel”) are annotated as “furnishings” (first path); the words labeled with the tags Ncibo (food, e.g. “cake”) and NliqBev (potable liquid substance, e.g. “coffee”) receive the annotation “foodservice” (second path), and so on.

In order to refine the results, we also directly included in the nodes of the grammar the words that were important for the feature selection but that were not contained in the dictionary of concrete nouns (abstract nouns, e.g. pranzo “lunch”, vista “view”) and the hyperonyms themselves, which in general correspond to each feature class name (e.g. arredamento “furnishings”, luogo “location”).

Finally, we disambiguated terms that in the objective dictionary were associated with more than one tag, by excluding the wrong interpretations from the corresponding node. For example, in the domain of hotel reviews, if one comments on the cucina “cooking” (first path) or the soggiorno “sojourn” (third path), he is clearly not talking about rooms (in Italian cucina also means “kitchen” and soggiorno “living room”), but respectively about the food service and the whole holiday experience.

Figure 5.3: Extract of the Feature Pruning metanode

The annotations provided by Nooj, once the lexical and grammatical resources are applied to texts written in natural language, can be in the form of XML (eXtensible Markup Language) tags. For this reason, they can be effortlessly recalled into an application that automatically applies the Nooj ad hoc resources and performs statistical analyses on them. Therefore, we dedicated a specific module of Doxa to the feature-based sentiment analysis, which has been tested on a corpus composed of 1000 hotel reviews collected from websites like www.tripadvisor.it, www.expedia.it and www.booking.com.

An extract of the automatically annotated corpus is reported below.

Review

Sono stato in questo albergo quasi per caso, dopo aver subito un incidente, e devo ammettere che se non fosse stato per la disponibilità, la cortesia e la professionalità di Mario Pino e Rosa non avrei potuto risolvere una serie di cose. L’hotel è fantastico, silenzioso e decisamente pulito. La colazione ottima ed il responsabile della sala colazione gentilissimo.

“I happened to be in this hotel by accident and I must admit that, if it hadn’t been for the willingness, the kindness and the competence of Mario Pino and Rosa, I couldn’t have solved a number of things. The hotel is fantastic, silent and definitely clean. Great the breakfast and very kind the responsible of the breakfast room.”

Annotation

Sono stato in questo albergo quasi per caso dopo aver subito un incidente e devo ammettere che se non fosse stato per <BENEFIT TYPE=ATTITUDES SCORE=2> la disponibilità </BENEFIT> <BENEFIT TYPE=ATTITUDES SCORE=2> la cortesia </BENEFIT> e <BENEFIT TYPE=ATTITUDES SCORE=2> la professionalità </BENEFIT> di Mario Pino e Rosa non avrei potuto risolvere una serie di cose

<BENEFIT TYPE=ACCOMMODATIONS SCORE=3> L’hotel è fantastico </BENEFIT> <BENEFIT TYPE=LOCATION SCORE=2> silenzioso </BENEFIT> e <BENEFIT SCORE=3 TYPE=CLEANLINESS> decisamente pulito </BENEFIT>

<BENEFIT TYPE=FOODSERVICE SCORE=3> La colazione ottima </BENEFIT> ed <BENEFIT TYPE=ATTITUDES SCORE=3> il responsabile della sala colazione gentilissimo </BENEFIT>
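As an illustration of how such annotations can be recalled into an application, the following sketch reads them back with regular expressions. It assumes that the tags are serialized as <BENEFIT TYPE=X SCORE=n> text </BENEFIT> (or DRAWBACK), with unquoted attributes in any order; the function name and the record layout are hypothetical and do not describe the actual Doxa code.

```python
import re
from collections import Counter

# Assumed serialization: <BENEFIT TYPE=X SCORE=n> text </BENEFIT>,
# with unquoted attributes that may appear in either order.
TAG = re.compile(r"<(BENEFIT|DRAWBACK)\s+([^>]*)>\s*(.*?)\s*</\1>", re.S)
ATTR = re.compile(r"(\w+)=\s*([+-]?\w+)")


def parse_annotations(text):
    """Return one record per annotated sentiment expression."""
    records = []
    for label, attrs, span in TAG.findall(text):
        fields = dict(ATTR.findall(attrs))
        records.append({
            "label": label,                # BENEFIT or DRAWBACK
            "type": fields.get("TYPE"),    # feature class
            "score": int(fields["SCORE"]),
            "text": span,
        })
    return records


sample = (
    "<BENEFIT TYPE=ACCOMMODATIONS SCORE=3> L'hotel è fantastico </BENEFIT> "
    "<BENEFIT TYPE=LOCATION SCORE=2> silenzioso </BENEFIT> e "
    "<BENEFIT SCORE=3 TYPE=CLEANLINESS> decisamente pulito </BENEFIT>"
)
records = parse_annotations(sample)
print(Counter(r["type"] for r in records))
```

Regular expressions are used here instead of an XML parser because the attribute values in the extract are not quoted.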

The evaluation presented in Table 5.7 shows that the method proposed in the present work performs very well with the hotel domain. On a sample of 100 documents from the original corpus, we manually calculated the Precision on the Semantic Orientation Identification task (which regards the annotation of the sentences’ polarity and intensity) and on the Feature Classification task (which regards the annotation of the types of the features).

Method                Precision                            Recall  F-score
                      SO Identification  Classification
No residual category  93.3               92.4              72.3    81.1
Residual category     92.9               82.6              78.3    80.4

Table 5.7: F-score of Feature Extraction and Pruning

Because the classes used to group the features have been identified a priori in our grammar, we also checked the results of our tool with the introduction of a residual category to annotate the features that do not belong to these classes. Even though this causes an increase in the Recall, a commensurate decrease of the Precision in the Classification task makes the value of the F-score similar to the one reached when the residual category is not taken into account. Significant examples of the sentences that have been automatically annotated in our corpus are given below.

(82) Il responsabile della sala colazione gentilissimo
“Very kind the responsible of the breakfast room”
BENEFIT TYPE=ATTITUDES SCORE=3

(83) Le camere erano sporche
“The rooms were dirty”
DRAWBACK TYPE=CLEANLINESS SCORE=-2

(84) L’hotel è vicinissimo alla metro
“The hotel is really close to the tube”
BENEFIT TYPE=LOCATION SCORE=3

In (82), the subjective adjective gentilissimo “very kind”, recognised by the grammar as the superlative form (SCORE=+3) of the positive adjective gentile “kind” (SCORE=+2), makes the grammar recognize the whole sentiment expression and provides the information regarding its polarity and intensity. This is why the FSA associates the tag BENEFIT and the higher score with the feature, which in this case is represented by a noun.

Example (83) shows that the “type” of the features can be indicated not only by nouns but also by specific adjectives, as happens with sporche “dirty”, which is both an opinion word, specifying that the whole expression is negative, and a feature class indicator, letting the grammar classify it in the group of “cleanliness” rather than in the group of “accommodations” (which would instead be selected by the noun camera “room”, described in our objective dictionary with the tags Conc “concrete” and Nedi “building”).

Finally, sentence (84) attests the strength of the influence of the review domains: sometimes the lexicon is not enough to correctly extract and classify the product features. Domain-specific, lexicon-independent patterns are often required in order to make the analyses adequate (Engström, 2004; Andreevskaia and Bergler, 2008).
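The grouping logic discussed in this section can be approximated outside Nooj with a few lookup tables. The sketch below is a hypothetical Python rendering, not the grammar itself: the mini lexicon, the tags assigned to cucina, the order of the checks and the function name are assumptions; only the tag-to-class pairs and the class indicated by each adjective come from the examples in the text.

```python
# Hypothetical rendering of the pruning step; in this work it is performed
# by a Nooj metanode, not by Python code.
TAG_TO_CLASS = {
    "Narr": "FURNISHINGS", "Nindu": "FURNISHINGS",
    "Ncibo": "FOODSERVICE", "Nliqbev": "FOODSERVICE",
    "Nedi": "ACCOMMODATIONS",
}
# Adjectives that act as feature class indicators (lemmatized).
ADJECTIVE_CLASS = {"sporco": "CLEANLINESS", "pulito": "CLEANLINESS",
                   "silenzioso": "LOCATION"}
# Domain-specific disambiguation: in hotel reviews cucina is the food service.
DOMAIN_OVERRIDES = {"cucina": "FOODSERVICE"}
# Illustrative mini lexicon (lemma -> Hyper tags); the tags of cucina are assumed.
LEXICON = {"camera": ["Nedi"], "tavolo": ["Narr"], "agnello": ["Ncibo"],
           "cucina": ["Nedi", "Ncibo"]}


def classify(noun=None, adjective=None, residual=None):
    """Return the feature class, or `residual` when no class applies."""
    if adjective in ADJECTIVE_CLASS:      # e.g. sporche -> cleanliness
        return ADJECTIVE_CLASS[adjective]
    if noun in DOMAIN_OVERRIDES:          # wrong readings excluded by hand
        return DOMAIN_OVERRIDES[noun]
    for tag in LEXICON.get(noun, []):
        if tag in TAG_TO_CLASS:
            return TAG_TO_CLASS[tag]
    return residual


print(classify("camera", "sporco"))       # CLEANLINESS
print(classify("camera"))                 # ACCOMMODATIONS
print(classify("metro", residual="OTHER"))  # OTHER
```

Passing a value for residual corresponds to the evaluation setting with a residual category; leaving it as None corresponds to the setting without it.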

5.2.3 Opinion Visual Summarization

We said in Section 5.1 that Doxa is a prototype able to analyze opinionated documents from a semantic point of view. It automatically applies the Nooj resources to free texts, sums up the values corresponding to every sentiment expression, standardizes the results by the total number of sentiment expressions contained in the reviews and then provides statistics about the opinions expressed.


In this Section we show the improved functionalities of Doxa dedicated to the sentiment analysis based on opinion features. In the Hotel domain, this deeper analysis considerably increases the Precision on the sentence-level SO identification task, which goes from 81.3% to 93.3%, without any loss in the Recall, which holds steady, rising by just 0.2 percentage points, from 72.1% to 72.3% (see Table 5.7).

The feature-based module of Doxa, completely grounded on the automatic analysis of free texts, allows the comparison of more than one object on the basis of the different kinds of aspects that characterize them.

The regular decagon in the center of the spider graphs of Figure 5.4 represents the threshold value (zero) with which every group of opinions on the same object is compared: when a figure is larger than it, the opinion is positive; when it is smaller, the opinion is negative; when a feature’s value is zero, it means that the consumers did not express any judgment on that topic.

The automatic transformation of non-structured reviews into structured visual summaries has a strong impact on the work of company managers, who must ground their marketing strategies on the analysis of a large amount of information, and on individual consumers, who need to quickly compare the Pros and Cons of the products or services they want to buy.
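The values plotted on each axis of a spider graph can be sketched as follows. The sketch assumes that the score of each feature class is the sum of the scores of its sentiment expressions divided by their number, and that classes with no expressions are set to zero; the function names and this per-class normalization are assumptions, not a description of the Doxa code.

```python
from collections import defaultdict


def feature_profile(expressions, classes):
    """expressions: iterable of (feature_class, score) pairs.

    Returns one normalized value per class; classes on which no opinion
    was expressed get 0.0, the threshold value of the spider graph.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for cls, score in expressions:
        totals[cls] += score
        counts[cls] += 1
    return {c: (totals[c] / counts[c] if counts[c] else 0.0) for c in classes}


def judgment(value):
    """Reading of an axis value relative to the zero threshold."""
    if value > 0:
        return "positive"
    if value < 0:
        return "negative"
    return "no judgment expressed"


expressions = [("ATTITUDES", 2), ("ATTITUDES", 3), ("CLEANLINESS", -2)]
profile = feature_profile(expressions, ["ATTITUDES", "CLEANLINESS", "LOCATION"])
print(profile)  # {'ATTITUDES': 2.5, 'CLEANLINESS': -2.0, 'LOCATION': 0.0}
print({c: judgment(v) for c, v in profile.items()})
```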

Figure 5.4: Doxa module for Feature-based Sentiment Analysis

5.3 Sentiment Role Labeling

Semantic Roles, also called theta or thematic roles (Chomsky, 1993; Jackendoff, 1990; see Section 2.1.4), are linguistic devices able to identify in texts “Who did What to Whom, and How, When and Where” (Palmer et al., 2010, p. 1).

The main task of an SRL system is to map the syntactic elements of free text sentences to their semantic representation, through the identification of all the syntactic functions of clause complements and their labeling with appropriate semantic roles (Rahimi Rastgar and Razavi, 2013; Monachesi, 2009).

To work, an SRL system needs machine-readable lexical resources in which all the predicates and their relative arguments are specified. FrameNet (Baker et al., 1998; Fillmore et al., 2002; Ruppenhofer et al., 2006), VerbNet (Schuler, 2005) and the PropBank Frame Files (Kingsbury and Palmer, 2002) provide the basis for large-scale semantic annotation of corpora (Giuglea and Moschitti, 2006; Palmer et al., 2010).

In this Section, in accordance with the Semantic Predicates theory (Gross, 1981), we performed a matching between the definitional syntactic structures attributed to each class of verbs and the semantic information we attached in the database to every lexical entry.

This way, we could create a strict connection between the arguments selected by a given Predicate listed in our tables and the actants involved in the same verb’s Semantic Frame (Fillmore, 1976; Fillmore and Baker, 2001; Fillmore, 2006). In the experiment that we are going to describe, we focused on the following frames:

• “Sentiment” (e.g. odiare “to hate”), which involves the roles experiencer and stimulus;

• “Opinion” (e.g. difendere “to stand up for”), which implicates the roles opinion holder, target of the opinion and cause of the opinion;

• “Physical act” (e.g. baciare “to kiss”), which evokes the roles patient and agent.

5.3.1 Related Works

A group of studies in the field of Semantic Role Labeling focused on the automatic annotation of the roles which are relevant to Sentiment Analysis and opinion mining, such as source/opinion holder and target/theme/topic.

Bethard et al. (2004), for question answering purposes, proposed an extension of semantic parsing techniques, coupled with additional lexical and syntactic features, for the automatic extraction of propositional opinions. The algorithm used to identify and label semantic constituents is the one developed by Gildea and Jurafsky (2002) and Pradhan et al. (2003).

Kim and Hovy (2006) presented a Semantic Role Labeling system based on the Frame Semantics of Fillmore (1976) that aimed to tag the frame elements semantically related to an opinion-bearing word expressed in a sentence, which is considered as the key indicator of an opinion. The learning algorithm used for the classification model is Maximum Entropy (Berger et al., 1996; Kim and Hovy, 2005). FrameNet II (Ruppenhofer et al., 2006) has been used to collect frames related to opinion words.

Other works based on FrameNet are Houen (2011), which used Frame-Frame relations in order to expand the subset of polarity-scored Frames found in the FrameNet database, and Ruppenhofer et al. (2008), Ruppenhofer and Rehbein (2012) and Ruppenhofer (2013), which added opinion frames with slots for source, target, polarity and intensity to the FrameNet database to build SentiFrameNet, a lexical database enriched with inherently evaluative items associated with opinion frames.

Lexicon-based approaches are the one of Kim et al. (2008), which made use of a collection of communication and appraisal verbs, the SentiWordNet lexicon, a syntactic parser and a named entity recognizer for the extraction of opinion holders from texts, and the one of Bloom et al. (2007), which extracted domain-dependent opinion holders through a combination of heuristic shallow parsing and dependency parsing.

Conditional Random Fields (Lafferty et al., 2001) is the method used by Choi et al. (2005), who, interpreting the semantic role labeling task as an information extraction problem, exploited a hybrid approach that combines two very different learning-based methods: CRF for the named entity recognition and a variation of AutoSlog (Riloff, 1996) for the information extraction. CRF have also been used by Choi et al. (2006), who presented a global inference approach to jointly extract entities and relations in the context of opinion-oriented information extraction, through two separate token-level sequence-tagging classifiers for opinion expression extraction and source extraction via linear-chain CRF (Lafferty et al., 2001).

5.3.2 Experiment and Evaluation

The starting point of our experiment on SRL is the set of LG tables of the Italian verbs developed at the Department of Communication Science of the University of Salerno. Among the lexical entries listed in such matrices, we manually extracted all the verbs endowed with a defined positive or negative semantic orientation. Such verbs could express emotions, opinions or physical acts. Details of this classification can be found in Section 3.3.3.

As an example, in Tables 5.8 and 5.9 we show a small group of verbs belonging to the Lexicon-Grammar class 45. This class includes all the verbs that can enter into a syntactic structure such as N0 V di N1, in which the “subject” (N0) selected by the verb (V) is generally a human noun (Num) and the complement (N1) is a completive (Ch F) or infinitive (V-inf comp) clause, usually introduced by the preposition “di” (see Table 5.8). As shown in Table 5.9, our databases also contain semantic information concerning the nature, the semantic orientation and the strength of the Predicates under examination.

In detail, the tagset used in this experiment is the following:

1. Type

(a) SENT: sentiment

(b) OP: opinion

(c) PHY: physical act

N0=Num | N0=il fatto Ch F | Verb | N1=che F | N1=che Fcong | di N1um | di N1-um | di N1 contro N2 | prep V-inf comp
+ | - | profittare | + | + | + | + | + | -
+ | - | ridersene | + | + | + | + | - | -
+ | - | risentirsi | + | + | + | + | - | +
+ | - | strafottersene | + | + | + | + | - | -
+ | - | vergognarsi | + | + | + | + | - | +

Table 5.8: Extract of the LG table of the verb Class 45

2. Orientation

(a) POS: positive

(b) NEG: negative

3. Intensity

(a) FORTE: intense

(b) DEB: feeble
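As an illustration only, the tagset above and the entries of Table 5.9 could be held in a small typed structure such as the following Python sketch. The class and field names (`VerbEntry`, `influence`, `describe`) are hypothetical and not part of the SentIta resources; mapping the “strong” and “weak” values of Table 5.9 onto the FORTE and DEB tags is likewise an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SemType(Enum):
    SENT = "sentiment"
    OP = "opinion"
    PHY = "physical act"


class Orientation(Enum):
    POS = "positive"
    NEG = "negative"


class Intensity(Enum):
    FORTE = "intense"
    DEB = "feeble"


@dataclass(frozen=True)
class VerbEntry:
    """One semantically described verb (hypothetical record layout)."""
    lemma: str
    lg_class: str
    sem_type: SemType
    orientation: Orientation
    intensity: Optional[Intensity]  # None where Table 5.9 shows "-"
    influence: str                  # argument affected by the orientation


# Rows copied from Table 5.9; "strong" -> FORTE and "weak" -> DEB is assumed.
CLASS_45 = {
    entry.lemma: entry
    for entry in (
        VerbEntry("profittare", "45", SemType.OP, Orientation.NEG, None, "N0"),
        VerbEntry("ridersene", "45", SemType.OP, Orientation.NEG, Intensity.DEB, "N1"),
        VerbEntry("risentirsi", "45", SemType.SENT, Orientation.NEG, None, "N0"),
        VerbEntry("strafottersene", "45", SemType.OP, Orientation.NEG, Intensity.FORTE, "N1"),
        VerbEntry("vergognarsi", "45", SemType.SENT, Orientation.NEG, Intensity.FORTE, "N0"),
    )
}


def describe(lemma: str) -> str:
    """Render the tag string of a verb, e.g. 'SENT+NEG+FORTE'."""
    e = CLASS_45[lemma]
    strength = e.intensity.name if e.intensity else "-"
    return f"{e.lemma}: {e.sem_type.name}+{e.orientation.name}+{strength} ({e.influence})"
```

Keeping the intensity optional mirrors the fact that some verbs in Table 5.9 carry no strength value.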

Speaking in terms of Frame Semantics, we identified in the Opinion Mining and Sentiment Analysis field three Frames of interest, evoked by specific Predicates: Sentiment, Opinion and Physical act. The frame elements evoked by such frames are described below.

Sentiment

It refers to the expression of any given frame of mind or affective state.

The “sentiment” words can be connected with some WordNet-Affect categories (Strapparava et al. 2004), such as emotion, mood and hedonic signal.

Examples are sdegnarsi “to be indignant” (class 10), odiare “to hate” (class 20), affezionarsi “to grow fond” (class 44B), flirtare “to flirt” (class 9), disprezzare “to despise” (class 20), gioire “to rejoice” (class 45).

Predicates of that kind evoke as frame elements an experiencer that feels the emotion or other internal states, and a stimulus, an event or a person that instigates such states (Gross 1995; Gildea and Jurafsky 2002; Swier and Stevenson 2004; Palmer et al. 2005; Elia 2013). This semantic frame summarizes the FrameNet ones connected to emotions, such as Sensation, Emotions_of_mental_activity, Cause_to_experience, Emotion_active, Emotions, Cause_emotion, etc.

Opinion

The type “Opinion”, instead, is the expression of positive or negative viewpoints, beliefs or judgments that can be personal or shared by most people. Among the WN-Affect categories, it covers trait, cognitive state, behavior and attitude. Examples are ignorare “to neglect” (class 20), premiare “to reward” (class 20), difendere “to defend” (class 27), esaltare “to exalt” (class 22), dubitare “to doubt” (class 45), condannare “to condemn” (class 49), deridere “to make fun of” (class 50).

The frame elements they evoke are an opinion holder that states an opinion about an object or an event, and an opinion target that represents the event or the object about which the opinion is expressed (Kim and Hovy 2006; Liu 2012). Some predicates also imply the presence of a cause that motivates the opinion (see Section 3.3.3).

In the FrameNet frames Opinion and Judgment the opinion holder is called Cognizer, but we preferred to use a term that is more common in the Sentiment Analysis and Opinion Mining literature.

Physical act

The type “Physical act” comprises verbs like baciare “to kiss” (class 18), suicidarsi “to commit suicide” (class 2), vomitare “to vomit” (class 2A), schiaffeggiare “to slap” (class 20), sparare “to shoot” (class 4), palpeggiare “to grope” (class 18).

For this group of predicates the selected frame elements are a patient, which is the victim (for the negative actions) or the beneficiary (for the positive ones) of the physical act, carried out by an agent (Carreras and Màrquez 2005; Màrquez et al. 2008). It includes a large number of FrameNet frames, such as Cause_bodily_experience, Shoot_projectiles, Killing, Cause_harm, Rape, Sex, Violence, etc.

In order to perform the orientation and the intensity attribution, we manually explored the Italian LG tables of verbs and the SentIta semantic labels.

Thanks to lexical resources of this kind, it is possible to automatically extract and semantically describe real occurrences of sentences like (85):

(85) [Sentiment [Experiencer Renzi (N0)] si vergogna (V) [Stimulus di parlare di energia in Europa (N1)]]
“Renzi feels ashamed of talking about energy in Europe”

in which the syntactic structure of the verb vergognarsi “to feel ashamed”, N0 V (di) Ch F, is matched by means of interpretation rules (e.g. V = N0 V (di) Ch F = Caus(s, sent, h); N0 = h; N1 = s) to the semantic function Caus(s, sent, h), which puts in relation an experiencer h and a stimulus s through a Sentiment Semantic Predicate sent (Gross 1995; Section 2.1.4).

Moreover, we provided our LG databases with the specification of the arguments (N0, N1, N2, etc.) that are semantically influenced by the semantic orientation of the verbs. The purpose is to correctly identify them as features of the opinionated sentences and to use them also in feature-based sentiment analysis tasks.
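The interpretation rule quoted for vergognarsi (N0 = h, N1 = s) can be sketched as a simple lookup from syntactic slots to frame elements. The snippet below is a minimal illustration, not the NooJ grammar actually used in the thesis; the `RULES` table and the `label_roles` function are hypothetical names, and only the vergognarsi rule is taken from the text.

```python
# Interpretation rules: verb lemma -> (frame, {syntactic slot: frame element}).
# Only the rule for "vergognarsi" comes from the text; the layout is assumed.
RULES = {
    "vergognarsi": ("Sentiment", {"N0": "Experiencer", "N1": "Stimulus"}),
}


def label_roles(verb, slots):
    """Map the filled syntactic slots of a sentence to semantic roles.

    `slots` holds the text found in each LG position, e.g. {"N0": "Renzi"}.
    Slots that the rule does not mention are left unlabeled.
    """
    frame, mapping = RULES[verb]
    roles = {mapping[slot]: text for slot, text in slots.items() if slot in mapping}
    return frame, roles


# Example (85): "Renzi si vergogna di parlare di energia in Europa"
frame, roles = label_roles(
    "vergognarsi",
    {"N0": "Renzi", "N1": "di parlare di energia in Europa"},
)
```

Here `frame` is the evoked frame and `roles` pairs each frame element with the text span that fills it, which corresponds to the bracketing shown in (85).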

Verb | LG Class | Type | Orientation | Intensity | Influence
profittare | 45 | opinion | negative | - | N0
ridersene | 45 | opinion | negative | weak | N1
risentirsi | 45 | sentiment | negative | - | N0
strafottersene | 45 | opinion | negative | strong | N1
vergognarsi | 45 | sentiment | negative | strong | N0

Table 5.9: Examples of semantic descriptions of the opinionated verbs from the LG class 45

The reliability of the LG method for Semantic Role Labeling in the Sentiment Analysis task has been tested on three different datasets, two of which have been extracted from social networks or web resources.

In detail, the first two datasets come from Twitter and the third is a free web news headings dataset provided by DataMediaHub (www.datamediahub.it, www.humanhighway.it).

The tweets have been downloaded using the hashtag that groups together the user comments on the election of the Italian President Mattarella (1) and the one that collects the comments on the Italian TV show Masterchef (2).

• Tweets: 46393 tweets

  1. Mattarellapresidente: 10000 tweets

  2. Masterchefit: 36393 tweets

• News Headings: 80651 titles

The LG-based approach on which this experiment has been built includes the following basic steps:

• a pre-processing pipeline that includes two phases:

  – a cleaning-up phase, carried out with Python routines, that aims to distinguish linguistic elements from structural elements in the datasets (e.g. markup information, web-specific elements);

  – an automatic linguistic analysis phase, with the goal of linguistically standardizing the relevant elements obtained from the cleaned datasets; in this phase texts are tokenized, lemmatized and POS-tagged using TreeTagger (Schmid 1994; Schmid et al. 2007) and then parsed using DeSR, a dependency parser (Attardi et al. 2009);

• an LG-based automatic analysis, in which the raw data are semantically labeled according to the syntactic/semantic rules of interpretation connected with each LG verb class.
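The cleaning-up phase can be pictured with a short Python sketch like the one below. It is not the routine used in the experiment, whose code is not given in the text: the regular expression, the function name `split_structural` and the decision to keep hashtag words while dropping the `#` marker are all assumptions made for illustration.

```python
import html
import re

# Structural (non-linguistic) material: URLs, markup tags and user mentions.
# This pattern is an illustrative assumption, not the thesis's actual rule set.
STRUCTURAL_RE = re.compile(r"https?://\S+|www\.\S+|<[^>]+>|@\w+")


def split_structural(text):
    """Separate the linguistic content of a short text from structural elements.

    Returns a pair (linguistic_text, structural_elements).
    """
    text = html.unescape(text)                  # decode entities such as &amp;
    structural = STRUCTURAL_RE.findall(text)    # what gets set aside
    linguistic = STRUCTURAL_RE.sub(" ", text)   # what goes on to the tagger
    linguistic = linguistic.replace("#", "")    # keep hashtag words, drop the marker
    linguistic = " ".join(linguistic.split())   # normalize whitespace
    return linguistic, structural


raw = "<b>Renzi</b> si vergogna &amp; tace http://t.co/xyz #Mattarellapresidente @user"
clean_text, removed = split_structural(raw)
```

The cleaned string would then be passed to the tokenization, lemmatization, POS-tagging and parsing steps described above; those tools are not called here.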

Figure 5.5: Examples of syntactically and semantically annotated sentences

Figure 5.5 presents three examples of headlines processed both with the dependency syntactic parser and with the LG-based semantic analyzer.

Notice that the elements of traditional grammar automatically identified by DeSR, such as subjects and complements, have been renamed according to the Lexicon-Grammar tradition.

The corpus, which counts 127044 short texts, has been analyzed and semantically and syntactically annotated.

The representative sample on which the human evaluation has been performed has, instead, 42348 texts.

The evaluation of the performance of our tool proved the effectiveness of the Lexicon-Grammar approach. The average F-scores achieved in the different datasets are 0.71 in the Twitter corpus and 0.76 in the Headings corpus.
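For reference, the figures above can be related through a few lines of Python. The sketch assumes that “F-score” denotes the balanced F1 measure, which the text does not state explicitly, and the counts passed to `f_score` in any usage are hypothetical. The dataset sizes are the ones reported in this section; the sample turns out to be exactly one third of the corpus.

```python
def f_score(tp: int, fp: int, fn: int) -> float:
    """Balanced F-score (F1) from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Dataset sizes reported in the text.
TWEETS = 10000 + 36393   # Mattarellapresidente + Masterchefit
HEADINGS = 80651
CORPUS = TWEETS + HEADINGS
SAMPLE = 42348           # texts in the human-evaluated sample
```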

Chapter 6

Conclusion

The present research handled the detection of linguistic phenomena connected to subjectivity, emotions and opinions from a computational point of view.

The need to quickly monitor huge quantities of semi-structured and unstructured data from the web poses several challenges to Natural Language Processing, which must provide strategies and tools to analyze their structures from a lexical, syntactic and semantic point of view.

The general aim of Sentiment Analysis, shared with the broader fields of NLP, Data Mining, Information Extraction, etc., is the automatic extraction of value from chaos (Gantz and Reinsel 2011); its specific focus, instead, is on opinions rather than on factual information. This is the aspect that differentiates it from other computational linguistics subfields.

The majority of sentiment lexicons have been manually or automatically created for the English language; therefore, existing Italian lexicons are mostly built through the translation and adaptation of English lexical databases, e.g. SentiWordNet and WordNet-Affect.

Unlike many other Italian and English sentiment lexicons, SentIta, built on the interaction of electronic dictionaries and lexicon-dependent local grammars, is able to manage simple and multiword structures that can take the shape of distributionally free structures, distributionally restricted structures and frozen structures.

Moreover, differently from other lexicon-based Sentiment Analysis methods, our approach is grounded in the solidity of the Lexicon-Grammar resources and classifications, which provide fine-grained semantic, but also syntactic, descriptions of the lexical entries. Such lexically exhaustive grammars distance themselves from the tendency of other sentiment resources to classify together words that have nothing in common from the syntactic point of view.

Furthermore, through the Semantic Predicate theory, it has been possible to postulate a complex parallelism between syntax and semantics that does not ignore the numerous idiosyncrasies of the lexicon.

In line with the major contributions in the Sentiment Analysis literature, we did not consider polar words in isolation. We computed their elementary sentence contexts, with the allowed transformations, and then their interaction with contextual valence shifters, the linguistic devices that are able to modify the prior polarity of the words from SentIta when occurring with them in the same sentences.

In order to do so, we took advantage of the computational power of finite-state technology. We formalized a set of rules that handle intensification, downtoning and negation modeling, modality detection and the analysis of comparative forms.

Here the difference with other state-of-the-art strategies consists in the elimination of complex mathematical calculation in favor of the easier use of embedded graphs as containers for the expressions designed to receive the same annotations within a compositional framework.

With regard to the applicative part of the research, we conducted, with satisfactory results, three experiments on the same number of Sentiment Analysis subtasks: the sentiment classification of documents and sentences, feature-based Sentiment Analysis and semantic role labeling based on sentiments.

This work exploited a knowledge-based method, but this does not run counter to statistical strategies. All the semantically annotated datasets produced by our fine-grained analysis can indeed be reused as testing sets for learning machines.

As concerns the limitations of this research, we mention, among others, the cases of irony, sarcasm and cultural stereotypes, which still remain open problems for NLP in general and for Sentiment Analysis in particular, since they can completely overturn the polarity of the sentences.

Sarcasm, hyperbole, jocularity, etc. are all rhetorical devices used to express verbal irony, a figurative linguistic process used to intentionally deny what is literally expressed (Gibbs 2000; Curcó 2000). Therefore, irony can be interpreted as a sort of indirect negation (Giora 1995; Giora et al. 1998) in which some expectations are violated (Clark and Gerrig 1984; Wilson and Sperber 1992). In this regard, Reyes et al. (2013, p. 243) affirm that irony is perceived “at the boundaries of conflicting frames of reference, in which an expectation of one frame has been inappropriately violated in a way that is appropriate in the other”.

This linguistic device has been explored both in computational linguistics studies and in Opinion Mining works. Sets of potential indicators have been tested on large corpora through different approaches, obtaining encouraging results over the state of the art in some respects.

In detail, Tepperman et al. (2006) used prosodic, spectral and contextual cues; Carvalho et al. (2009) selected as irony cues emoticons, onomatopoeic expressions for laughter, heavy punctuation marks, quotation marks and positive interjections; Davidov et al. (2010) exploited the meta-data provided by Amazon to locate gaps between the star ratings and the sentiments expressed in the raw reviews; Reyes and Rosso (2011) chose n-grams, part-of-speech n-grams, funny profiling, positive/negative profiling, affective profiling and pleasantness profiling; González-Ibánez et al. (2011) identified sarcasm through punctuation marks, emoticons, quotes, capitalized words, discursive terms that hint at opposition or contradiction in a text, temporal and contextual imbalance, character-grams, skip-grams and emotional scenarios concerning activation, imagery and pleasantness; Maynard and Greenwood (2014) exploited hashtag tokenization.

Among others, Veale and Hao (2009) carried out a very interesting study on creative irony, focusing on ironic comparisons of the kind “A Topic is as Ground as a Vehicle” (Fishelov 1993; Moon 2008), which can consist of pre-fabricated similes, e.g. “as strong as an ox” (Taylor 1954; Norrick 1986; Moon 2008), or of new, more emphatic and creative variations on the frozen similes, e.g. “as dangerous as a toothless wolf”. The authors found that 2% of these elaborations subvert the original simile to achieve an ironic effect. The problem is that sarcasm in general is a phenomenon inherently difficult to analyze for machines, and often for humans as well (Justo et al. 2014; Maynard and Greenwood 2014). In addition, the barriers to the creation of brand new comparisons are very low (Veale and Hao 2009).

In any case, the solutions to verbal irony proposed in the literature seem far from being accurate and generalizable. They frequently lead to contradictory results, as with the use of punctuation marks, which made Carvalho et al. (2009) and Tepperman et al. (2006) reach relatively high precision but served as very weak predictors for Justo et al. (2014) and Tsur et al. (2010).

In this work, speaking in cost-benefit terms, we realized that the quantity of errors introduced by the use of irony cues would have exceeded the errors caused by their absence. Therefore, we limited the irony analysis to the cases of frozen comparative sentences described in Section 3.4.2, which can attribute a semantic orientation to neutral words (86) or reverse the orientation of polar words (87).

(86) Arturo è bianco[0] come un cadavere [-2]
“Arturo is as white as a dead body” (Arturo is pale)

(87) Maria è agile[+2] come una gatta di piombo [-2]
“Maria is as agile as a lead cat” (Maria is not agile)
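The behaviour shown in (86) and (87) can be approximated by a lookup in which a listed frozen comparison overrides the prior polarity of the adjective. The following Python sketch is only a simplified stand-in for the finite-state grammars of the thesis: the dictionary names and the `comparative_score` function are hypothetical, and the override rule is an assumption inferred from the two examples.

```python
# Prior polarities of the adjectives, as annotated in examples (86) and (87).
PRIOR = {"bianco": 0, "agile": 2}

# Frozen comparative sentences with the orientation they convey as a whole.
FROZEN_COMPARISONS = {
    ("bianco", "cadavere"): -2,         # neutral adjective receives an orientation
    ("agile", "gatta di piombo"): -2,   # positive adjective is reversed
}


def comparative_score(adjective, comparatum):
    """Score an 'ADJ come N' pattern.

    A listed frozen comparison overrides the adjective's prior polarity;
    otherwise the prior polarity is returned unchanged (0 if unknown).
    """
    frozen = FROZEN_COMPARISONS.get((adjective, comparatum))
    if frozen is not None:
        return frozen
    return PRIOR.get(adjective, 0)
```

Because only listed pairs trigger the override, a comparison that is not in the dictionary falls back on the adjective alone, whether or not it is used ironically.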

Anyway, as can be noticed in (88) and (89), ironic comparisons can vary considerably from one case to another (Veale and Hao 2009) and cannot be distinguished automatically from non-ironic comparisons (90) as long as we do not include (domain-dependent) encyclopedic knowledge in the analysis tools.

(88) La ripresa è degna[+2] di un trattore con aratro inserito
“The pickup is worthy of a tractor with an inserted plough”

(89) E quel tocco di piccante (...) è gradevole[+2] quanto lo sarebbe una spruzzata di pepe su un gelato alla panna
“And that touch of piquancy (...) is as pleasant as a spattering of pepper on a cream-flavoured ice cream”

(90) Gli spazi interni sono comodi[+2] come una berlina [+2]
“Inside spaces are as comfortable as a sedan”

Similar comments on the required encyclopedic knowledge can be made about cultural stereotypes as well (91, 92).

(91) La nuova Fiat 500 è consigliabile[+2] molto di più ad una ragazza [-2]
“The new Fiat 500 is recommended a lot more to a girl”

(92) Un gioco per bambini di 12 anni [-2]
“A game for 12-year-old children”

Appendix A

Italian Lexicon-Grammar Symbols

Agg Val Evaluative adjective

Agg Adjective

Avv Adverb

Ci Constrained nominal group

Ch F Completive complement without specification of mood

Det Determiner

N0 Sentence formal subject

N1, N2, N3 Sentence complements

Ni Nominal group

N Noun

Napp Appropriate noun

Nconc Concrete noun


Npc Noun of body part

Nsent Noun of Sentiment

Num Human noun

N-um Not human noun

Ppv Pronoun in preverbal position

Prep Preposition

V Verb

V-a Adjectivalization

Vcaus Causative verb

V-n Nominalization

Vsup Support verb

Vval Evaluative verb

Vvar Support verb variant

Appendix B

Acronyms

ALU Atomic Linguistic Units

cBNP Combined Pattern Based Noun Phrases

CCG Combinatory Categorial Grammar

CFE Concordance-based Feature Extraction

CRF Conditional Random Field

CSR Class Sequential Rule

CVS Contextual Valence Shifters

DRS Discourse Representation Structure

ERTN Enhanced Recursive Transition Network

eWOM electronic Word of Mouth

FMM Forward Maximum Matching

FSA Finite State Automata

FST Finite State Transducers


LDA Latent Dirichlet Allocation

LG Lexicon-Grammar

LSA Latent Semantic Analysis

LSF Lexical Syntactic Feature

mCVS Morphological Contextual Valence Shifters

MPQA Multi Perspective Question Answering

NLP Natural Language Processing

PMI Pointwise Mutual Information

POS Part of Speech

PP Prior Polarity

QN Quality Noun

RNTN Recursive Neural Tensor Network

RST Relevant Semantic Tree

RTN Recursive Transition Network

SO Semantic Orientation

SVM Support Vector Machine

TGG Transformational-Generative Grammar

UDF User Defined taxonomy of entity Feature

WMI Web-based Mutual Information

XML eXtensible Markup Language

List of Figures

2.1 Syntactic Trees of the sentences (1) and (2)
2.2 Syntactic Trees of the sentences (1b) and (2b)
2.3 LG syntax-semantic interface
2.4 Deterministic Finite-State Automaton
2.5 Non-deterministic Finite-State Automaton
2.6 Finite-State Transducer
2.7 Enhanced Recursive Transition Network

3.1 FSA for the interaction between Nsent and semantically annotated verbs
3.2 Extract of the morphological FSA that associates psych verbs to their adjectivalizations
3.3 Frozen and semi-frozen expressions that involve the vulgar word cazzo and its euphemisms
3.4 Extract of the FSA for the SO identification of idioms
3.5 Extract of the FSA for the extraction of PECO idioms
3.6 Extract of the FSA for the automatic annotation of sentiment adverbs
3.7 Extract of the FSA for the automatic annotation of quality nouns
3.8 Morphological Contextual Valence Shifting of Adjectives

5.1 Doxa, the LG sentiment classifier based on the finite-state technology
5.2 Extract of the Feature Extraction grammar
5.3 Extract of the Feature Pruning metanode
5.4 Doxa module for Feature-based Sentiment Analysis
5.5 Examples of syntactically and semantically annotated sentences

List of Tables

2.1 Example of a Lexicon-Grammar Binary Matrix

3.1 Manually built Polarity Lexicons
3.2 (Semi-)Automatically built Polarity Lexicons
3.3 Sentix (Basile and Nissim 2013)
3.4 SentIta Evaluation tagset
3.5 SentIta Strength tagset
3.6 Composition of SentIta
3.7 SentIta tag distribution in the adjective list
3.8 Adjectives percentage values in SentIta
3.9 Examples from the Psych Predicates dictionary
3.10 Psych Predicates chosen for SentIta
3.11 All the Semantic Predicates from SentIta
3.12 Polar verbs in the main Lexicon-Grammar verbal subclasses
3.13 Polar verbs in other LG verb classes which contain at least one oriented item
3.14 Distribution of semantic labels into the LG verb classes that contain at least one oriented item
3.15 Description of the Nominalizations of the Psychological Semantic Predicates
3.16 Nominalizations of the Psych Predicates chosen from SentIta
3.17 Distribution of semantic tags in the dictionary of Adjectival Psych Predicates
3.18 Percentage values of Psych Adjectivalizations in SentIta
3.19 Suffixes from the dictionary of Psych Adjectivalizations
3.20 Adjectivalizations of the Psych Predicates
3.21 SentIta tag distribution in the list of bad words
3.22 Bad Words percentage values in SentIta
3.23 Classes of Compound Adverbs in SentIta
3.24 Composition of the compound adverb dictionary
3.25 Polar adjectives in frozen sentences
3.26 Examples of idioms that maintain, switch or shift the prior polarity of the adjectives they contain
3.27 Composition of the adverb dictionary
3.28 Adjective’s inflectional classes (Salvi and Vanelli 2004)
3.29 Rules for the adverb derivation
3.30 Error analysis of the automatic QN annotation
3.31 Presence of the QN suffixes in different dictionaries
3.32 Productivity of the QN suffixes in different dictionaries
3.33 About the overlap among Psych V-n dictionary and QN dictionary
3.34 Precision measure about the overlap of QN and Psy V-n
3.35 Negation suffixes
3.36 Intensifier/downtoner suffixes

4.1 The effects of troppo, poco and abbastanza on polar lexical items
4.2 The effects of the co-occurrence and the negation of troppo poco with reference to polar lexical items
4.3 Negation rules
4.4 Negation rules with the repetition of negative operators
4.5 Negation rules with strong operators
4.6 Negation rules with weak operators
4.7 Rule I: increasing comparative sentences with positive orientation
4.8 Rule III: decreasing comparative sentences with negative orientation
4.9 Rule IV: increasing comparative sentences with negative orientation
4.10 Rule VI: decreasing comparative sentences with positive orientation
4.11 Negation of Rule I: negated increasing comparative sentences with negative orientation
4.12 Negation of Rule III: negated decreasing comparative sentences with positive orientation
4.13 Negation of Rule IV: negated increasing comparative sentences with negative orientation
4.14 Negation of Rule VI: negated decreasing comparative sentences with positive orientation

5.1 Dataset of opinionated online customer reviews
5.2 Precision measure in sentence-level classification
5.3 Irrelevant matches in sentence-level classification
5.4 Precision measure in document-level classification
5.5 Recall in both the sentence-level and the document-level classifications
5.6 Semantically Labeled Dictionary of Concrete Nouns
5.7 F-score of Feature Extraction and Pruning
5.8 Extract of the LG table of the verb Class 45
5.9 Examples of semantic descriptions of the opinionated verbs from the LG class 45

Bibliography

Abdul-Mageed, M. and Diab, M. (2012). Toward building a large-scale arabic sentiment lexicon. In Proceedings of the 6th International Global WordNet Conference, pages 18–22.

Abdul-Mageed, M., Diab, M. T., and Korayem, M. (2011). Subjectivity and sentiment analysis of modern standard arabic. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2, pages 587–591. Association for Computational Linguistics.

Agerri, R. and García-Serrano, A. (2010). Q-WordNet: Extracting polarity from WordNet senses. In Proceedings of the International Conference on Language Resources and Evaluation, pages 2300–2305.

Akerlof, G. A. (1970). The market for “lemons”: Quality uncertainty and the market mechanism. In The quarterly journal of economics, pages 488–500. JSTOR.

Alba, J., Lynch, J., Weitz, B., Janiszewski, C., Lutz, R., Sawyer, A., and Wood, S. (1997). Interactive home shopping: consumer, retailer, and manufacturer incentives to participate in electronic marketplaces. In The Journal of Marketing, pages 38–53. JSTOR.

Alm, C. O., Roth, D., and Sproat, R. (2005). Emotions from text: machine learning for text-based emotion prediction. In Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, pages 579–586. Association for Computational Linguistics.

Andreevskaia, A. and Bergler, S. (2008). When specialists and generalists work together: Overcoming domain dependence in sentiment tagging. In ACL, pages 290–298.

Arad, M. (1998). Psych-notes. In UCL Working Papers in Linguistics, volume 10, pages 1–22.

Argamon, S., Bloom, K., Esuli, A., and Sebastiani, F. (2009). Automatically determining attitude type and force for sentiment analysis. In Human Language Technology. Challenges of the Information Society, pages 218–231. Springer.

Attardi, G., Dell’Orletta, F., Simi, M., and Turian, J. (2009). Accurate dependency parsing with a stacked multilayer perceptron. In Proceedings of EVALITA, volume 9.

Aue, A. and Gamon, M. Customizing sentiment classifiers to new domains: A case study. In Proceedings of recent advances in natural language processing (RANLP).

Baccianella, S., Esuli, A., and Sebastiani, F. (2010). SentiWordNet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining. In LREC, volume 10, pages 2200–2204.

Baker, C. F., Fillmore, C. J., and Lowe, J. B. (1998). The berkeley framenet project. In Proceedings of the 17th international conference on Computational linguistics - Volume 1, pages 86–90. Association for Computational Linguistics.

Baker, M. (1988). Theta theory and the syntax of applicatives in chichewa. In Natural Language & Linguistic Theory, volume 6, pages 353–389. Springer.

Baker, M. C. (1997). Thematic roles and syntactic structure. In Elements of grammar, pages 73–137. Springer.

Balahur, A. and Turchi, M. (2012). Multilingual sentiment analysis using machine translation. In Proceedings of the 3rd Workshop in Computational Approaches to Subjectivity and Sentiment Analysis, pages 52–60. Association for Computational Linguistics.

Baldoni, M., Baroglio, C., Patti, V., and Rena, P. (2012). From tags to emotions: Ontology-driven sentiment analysis in the social semantic web. In Intelligenza Artificiale, volume 6, pages 41–54.

Balibar-Mrabti, A. (1995). Une étude de la combinatoire des noms de sentiment dans une grammaire locale. In Langue française, pages 88–97. JSTOR.

Baptista, J. (2003). Some families of compound temporal adverbs in portuguese. In Proceedings of the Workshop on Finite-State Methods for Natural Language Processing.

Baroni, M. and Vegnaduzzo, S. (2004). Identifying subjective adjectives through web-based mutual information. In Proceedings of KONVENS, volume 4, pages 17–24.

Basile, V. and Nissim, M. (2013). Sentiment analysis on italian tweets. In Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 100–107.

Belletti, A. and Rizzi, L. (1988). Psych-verbs and θ-theory. In Natural Language & Linguistic Theory, volume 6, pages 291–352. Springer.

Benamara, F., Cesarano, C., Picariello, A., Recupero, D. R., and Subrahmanian, V. S. (2007). Sentiment analysis: Adjectives and adverbs are better than adjectives alone. In ICWSM.

Benamara, F., Chardon, B., Mathieu, Y., Popescu, V., and Asher, N. (2012). How do negation and modality impact on opinions? In Proceedings of the Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics, pages 10–18. Association for Computational Linguistics.

Bennis, H. (2004). Unergative adjectives and psych verbs. In Studies in unaccusativity: The syntax-lexicon interface, pages 84–113.

Berger, A. L., Pietra, V. J. D., and Pietra, S. A. D. (1996). A maximum entropy approach to natural language processing. In Computational linguistics, volume 22, pages 39–71. MIT Press.

Bespalov, D., Bai, B., Qi, Y., and Shokoufandeh, A. (2011). Sentiment classification based on supervised latent n-gram analysis. In Proceedings of the 20th ACM international conference on Information and knowledge management, pages 375–382. ACM.

Bethard, S., Yu, H., Thornton, A., Hatzivassiloglou, V., and Jurafsky, D. (2004). Automatic extraction of opinion propositions and their holders. In 2004 AAAI Spring Symposium on Exploring Attitude and Affect in Text, volume 2224.

Bhatti, M. W., Wang, Y., and Guan, L. A neural network approach for human emotion recognition in speech. In Circuits and Systems, 2004. ISCAS’04, volume 2.

Blitzer, J., Dredze, M., Pereira, F., et al. (2007). Biographies, bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. In ACL, volume 7, pages 440–447.

Bloom, K. (2011). Sentiment analysis based on appraisal theory and functional local grammars. PhD thesis, Illinois Institute of Technology.

Bloom, K., Garg, N., and Argamon, S. (2007). Extracting appraisal expressions. In HLT-NAACL, volume 2007, pages 308–315.

Bloomfield, L. (1933). Language. Holt, New York. In Edizione italiana (1974): Il linguaggio. Il Saggiatore, Milano.

Bocchino, F. (2007). Lessico-Grammatica dell’italiano. Le costruzioni intransitive. PhD thesis, Università degli studi di Salerno.

Bos, J. and Nissim, M. (2006). An empirical approach to the interpretation of superlatives. In Proceedings of the 2006 conference on empirical methods in natural language processing, pages 9–17. Association for Computational Linguistics.

Bosco, C., Allisio, L., Mussa, V., Patti, V., Ruffo, G., Sanguinetti, M., and Sulis, E. (2014). Detecting happiness in italian tweets: Towards an evaluation dataset for sentiment analysis in felicitta. In Proc. of ESSSLOD, pages 56–63.

Bosco, C., Patti, V., and Bolioli, A. (2013). Developing corpora for sentiment analysis: The case of irony and senti-tut. In IEEE Intelligent Systems, number 2, pages 55–63. IEEE.

Boucher, J. and Osgood, C. E. (1969). The pollyanna hypothesis. In Journal of Verbal Learning and Verbal Behavior, volume 8, pages 1–8. Elsevier.

Bounie, D., Bourreau, M., Gensollen, M., and Waelbroeck, P. (2005). The effect of online customer reviews on purchasing decisions: The case of video games. In Retrieved July, volume 8, page 2009. Citeseer.

Breese, J. S., Heckerman, D., and Kadie, C. (1998). Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the Fourteenth conference on Uncertainty in artificial intelligence, pages 43–52. Morgan Kaufmann Publishers Inc.

Brill, E. (1995). Transformation-based error-driven learning and natural language processing: A case study in part-of-speech tagging. In Computational linguistics, volume 21, pages 543–565. MIT Press.

Buchanan, T. W., Lutz, K., Mirzazade, S., Specht, K., Shah, N. J., Zilles, K., and Jäncke, L. (2000). Recognition of emotional prosody and verbal components of spoken language: an fmri study. In Cognitive Brain Research, volume 9, pages 227–238. Elsevier.

Buvet, P.-A., Girardin, C., Gross, G., and Groud, C. (2005). Les prédicats d’affect. In LIDIL, number 32, pages pp–125.

Carbonell, J. G. (1979). Subjective understanding: Computer models of belief systems. Technical report, DTIC Document.

Carenini, G., Ng, R. T., and Zwart, E. (2005). Extracting knowledge from evaluative text. In Proceedings of the 3rd international conference on Knowledge capture, pages 11–18. ACM.

Carreras, X. and Màrquez, L. (2005). Introduction to the conll-2005 shared task: Semantic role labeling. In Proceedings of the Ninth Conference on Computational Natural Language Learning, pages 152–164. Association for Computational Linguistics.

Carvalho, P., Sarmento, L., Silva, M. J., and De Oliveira, E. (2009). Clues for detecting irony in user-generated contents: oh it’s so easy ;-). In Proceedings of the 1st international CIKM workshop on Topic-sentiment analysis for mass opinion, pages 53–56. ACM.

Chen, Y., Zhou, Y., Zhu, S., and Xu, H. (2012). Detecting offensive language in social media to protect adolescent online safety. In Privacy, Security, Risk and Trust (PASSAT), 2012 International Conference on and 2012 International Conference on Social Computing (SocialCom), pages 71–80. IEEE.

Chetviorkin, I. and Loukachevitch, N. V. (2012). Extraction of russian sentiment lexicon for product meta-domain. In COLING, pages 593–610.

Chevalier, J. A. and Mayzlin, D. (2006). The effect of word of mouth on sales: Online book reviews. In Journal of marketing research, volume 43, pages 345–354. American Marketing Association.

Chin, P. (2005). The value of user-generated contents. In Intranet Journal, www.intranetjournal.com.

Choi, Y., Breck, E., and Cardie, C. (2006). Joint extraction of entities and relations for opinion recognition. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, pages 431–439. Association for Computational Linguistics.

Choi, Y. and Cardie, C. (2008). Learning with compositional semantics as structural inference for subsentential sentiment analysis. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 793–801.

Choi, Y., Cardie, C., Riloff, E., and Patwardhan, S. (2005). Identifying sources of opinions with conditional random fields and extraction patterns. In Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, pages 355–362. Association for Computational Linguistics.

Chomsky, N. (1957). Syntactic Structures. Mouton and Co., The Hague-Paris.

Chomsky, N. (1965). Aspects of the Theory of Syntax. Number 11. MIT press.

Chomsky, N. (1993). Lectures on government and binding: The Pisa lectures. Number 9. Walter de Gruyter.

Chu, S.-C. and Kim, Y. (2011). Determinants of consumer engagement in electronic word-of-mouth (ewom) in social networking sites. In International journal of Advertising, volume 30, pages 47–75. Taylor & Francis.

Clark, H. H. and Gerrig, R. J. (1984). On the pretense theory of irony. In Journal of Experimental Psychology: General, volume 113, pages 121–126.

Clark, S. and Curran, J. R. (2004). The importance of supertagging for wide-coverage ccg parsing. In Proceedings of the 20th international conference on Computational Linguistics, page 282. Association for Computational Linguistics.

Clematide, S. and Klenner, M. (2010). Evaluation and extension of a polarity lexicon for german. In Proceedings of the First Workshop on Computational Approaches to Subjectivity and Sentiment Analysis, pages 7–13.

Councill, I. G., McDonald, R., and Velikovich, L. (2010). What’s great and what’s not: learning to classify the scope of negation for improved sentiment analysis. In Proceedings of the workshop on negation and speculation in natural language processing, pages 51–59. Association for Computational Linguistics.

Curcó, C. (2000). Irony: Negation, echo and metarepresentation. In Lingua, volume 110, pages 257–280. Elsevier.

D’Agostino, E. (1992). Analisi del discorso: metodi descrittivi dell’italiano d’uso. Loffredo.

D’Agostino, E. (2005). Grammatiche lessicalmente esaustive delle passioni: il caso dell’io collerico, le forme nominali. In Quaderns d’Italià, pages 149–169.

D’Agostino, E., De Bueriis, G., Cicalese, A., Monteleone, M., Vellutino, D., Messina, S., Langella, A., Santonicola, S., Longobardi, F., and Guglielmo, D. (2007). Lexicon-grammar classifications, or better: to get rid of anguish. In 26th International Conference on Lexis and Grammar (LGC’07).

D’Agostino, E. and Elia, A. (1998). Il significato delle frasi: un continuum dalle frasi semplici alle forme polirematiche. In AA. VV., Ai limiti del linguaggio, Laterza, Bari, pages 287–310.

Dalianis, H. and Skeppstedt, M. (2010). Creating and evaluating a consensus for negated and speculative words in a swedish clinical corpus. In Negation and Speculation in Natural Language Processing (NeSp-NLP 2010), page 5.

Dang, Y., Zhang, Y., and Chen, H. (2010). A lexicon-enhanced method for sentiment classification: An experiment on online product reviews. In Intelligent Systems, IEEE, volume 25, pages 46–53. IEEE.

Darroch, J. N. and Ratcliff, D. (1972). Generalized iterative scaling for log-linear models. In The annals of mathematical statistics, pages 1470–1480. JSTOR.

Dasgupta, S. and Ng, V. (2009). Mine the easy, classify the hard: a semi-supervised approach to automatic sentiment classification. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2-Volume 2, pages 701–709. Association for Computational Linguistics.

Davidov, D., Tsur, O., and Rappoport, A. (2010). Semi-supervised recognition of sarcastic sentences in twitter and amazon. In Proceedings of the Fourteenth Conference on Computational Natural Language Learning, pages 107–116. Association for Computational Linguistics.

de Albornoz, J. C., Plaza, L., and Gervás, P. (2012). Sentisense: An easily scalable concept-based affective lexicon for sentiment analysis. In LREC, pages 3562–3567.

De Bueriis, G. and Elia, A. (2008). Lessici elettronici e descrizioni lessicali, sintattiche, morfologiche ed ortografiche. Plectica.

De Gioia, M. (1994a). Chiaro come il sole a mezzogiorno: una classe di avverbi idiomatici dell’italiano. In The Linguist, volume 33, pages 220–225. Newnorth Print Ltd.

De Gioia, M. (1994b). Problemi di rappresentazione et traduzione degli avverbi idiomatici. Micromégas.

De Gioia, M. (2001). Avverbi idiomatici dell’italiano: analisi lessico-grammaticale. L’Harmattan Italia.

De Mauro, T. and Thornton, A. M. (1985). La predicazione: teoria e applicazione all’italiano. In Sintassi e morfologia della lingua italiana d’uso: teorie ed applicazioni descrittive, pages 487–519.

De Silva, L. C., Miyasato, T., and Nakatsu, R. (1997). Facial emotion recognition using multi-modal information. In Information, Communications and Signal Processing, 1997. ICICS., Proceedings of 1997 International Conference on, volume 1, pages 397–401. IEEE.

Denecke, K. (2008). Using SentiWordNet for multilingual sentiment analysis. In Data Engineering Workshop, 2008. ICDEW 2008. IEEE 24th International Conference on, pages 507–512. IEEE.

Desclés, J., Makkaoui, O., and Hacène, T. (2010). Automatic annotation of speculation in biomedical texts: new perspectives and large scale evaluation. In Negation and Speculation in Natural Language Processing (NeSp-NLP 2010), page 32.

Dewally, M. and Ederington, L. (2006). Reputation, certification, warranties, and information as remedies for seller-buyer information asymmetries: Lessons from the online comic book market. In The Journal of Business, volume 79, pages 693–729. JSTOR.

Diamond, D. W. (1989). Reputation acquisition in debt markets. In The journal of political economy, pages 828–862. JSTOR.

Dowty, D. R. (1989). On the semantic content of the notion of ‘thematic role’. In Properties, types and meaning, pages 69–129. Springer.

Dragut, E. C., Yu, C., Sistla, P., and Meng, W. (2010). Construction of a sentimental word dictionary. In Proceedings of the 19th ACM international conference on Information and knowledge management, pages 1761–1764.

Duan, W., Gu, B., and Whinston, A. B. (2008). The dynamics of online word-of-mouth and product sales: an empirical investigation of the movie industry. In Journal of retailing, volume 84, pages 233–242. Elsevier.

Elia, A. Per filo e per segno: la struttura degli avverbi composti. In D’Agostino (a cura di), Tra sintassi e semantica. Descrizioni e metodi di elaborazione automatica della lingua d’uso, pages 167–263. Napoli, ESI.

Elia, A. (1984). Le verbe italien: Les complétives dans les phrases à un complément. Fasano di Puglia, Schena-Nizet.

Elia, A. (1990). Chiaro e tondo: Lessico-Grammatica degli avverbi composti in italiano. Segno Associati.

Elia, A. (2013). On lexical, semantic and syntactic granularity of italian verbs. In Penser le Lexique Grammaire: Perspectives Actuelles, Honoré Champion, Paris, pages 277–286.

Elia, A. (2014a). Lessico e sintassi tra tempo e massa parlante. In Marchese, M. P., Nocentini, A., Il lessico nella teoria e nella storia linguistica, pages 15–47. Edizioni il Calamo.

Elia, A. (2014b). Operatori, argomenti e il sistema leg-semantic role labelling dell’italiano. In Mirto, I. (a cura di), Le relazioni irresistibili, pages 105–118. Edizioni Ets.

Elia, A. and De Bueriis, G. (2008). Lessici elettronici e descrizioni lessicali, sintattiche, morfologiche ed ortografiche. Plectica.

Elia, A., Guglielmo, D., Maisto, A., and Pelosi, S. (2013). A linguistic-based method for automatically extracting spatial relations from large non-structured data. In Algorithms and Architectures for Parallel Processing, pages 193–200. Springer.

Elia, A., Martinelli, M., and D’Agostino, E. (1981). Lessico e Strutture sintattiche: Introduzione alla sintassi del verbo italiano. Napoli, Liguori.

Elia, A., Pelosi, S., Maisto, A., and Raffaele, G. (2015). Towards a lexicon-grammar based framework for nlp: an opinion mining application. In RANLP 2015, pages 160–167.

Elia, A. and Vietri, S. (2010). Lexis-grammar & semantic web. In INFOtheca, volume 11, pages 15a–38a.

Elia, A., Vietri, S., Postiglione, A., Monteleone, M., and Marano, F. (2010). Data mining modular software system. In SWWS, pages 127–133.

Engström, C. (2004). Topic dependence in sentiment classification. In Unpublished MPhil Dissertation. University of Cambridge.

Essa, I., Pentland, A. P., et al. (1997). Coding, analysis, interpretation, and recognition of facial expressions. In Pattern Analysis and Machine Intelligence, IEEE Transactions on, volume 19, pages 757–763. IEEE.

Esuli, A. and Sebastiani, F. (2005). Determining the semantic orientation of terms through gloss classification. In Proceedings of the 14th ACM international conference on Information and knowledge management, pages 617–624. ACM.

Esuli, A. and Sebastiani, F. (2006a). Determining term subjectivity and term orientation for opinion mining. In EACL, volume 6, page 2006.

Esuli, A. and Sebastiani, F. (2006b). SentiWordNet: A publicly available lexical resource for opinion mining. In Proceedings of LREC, volume 6, pages 417–422.

Farkas, D. F. and Kiss, K. É. (2000). On the comparative and absolute readings of superlatives. In Natural Language & Linguistic Theory, volume 18, pages 417–455. Springer.

Farkas, R., Vincze, V., Móra, G., Csirik, J., and Szarvas, G. (2010). The conll-2010 shared task: learning to detect hedges and their scope in natural language text. In Proceedings of the Fourteenth Conference on Computational Natural Language Learning: Shared Task, pages 1–12. Association for Computational Linguistics.

Fellbaum, C. (1998). WordNet. Wiley Online Library.

Ferreira, L., Jakob, N., and Gurevych, I. (2008). A comparative study of feature extraction algorithms in customer reviews. In Semantic Computing, 2008 IEEE International Conference on, pages 144–151. IEEE.

Filip, H. (1996). Psychological predicates and the syntax-semantics interface. In Conceptual Structure, Discourse and Language. Stanford: Center for the Study of Language and Information.

Fillmore, C. J. (1976). Frame semantics and the nature of language. In Annals of the New York Academy of Sciences, volume 280, pages 20–32. Wiley Online Library.

Fillmore, C. J. (1977). The case for case reopened. In Syntax and semantics, volume 8, pages 59–82.

Fillmore, C. J. (2006). Frame semantics. In Cognitive linguistics: Basic readings, volume 34, pages 373–400. Berlin: Mouton de Gruyter. [Originally published in Linguistics in the Morning Calm, Seoul: Hanshin Publishing Co., 1982].

Fillmore, C. J. and Baker, C. F. (2001). Frame semantics for text understanding. In Proceedings of WordNet and Other Lexical Resources Workshop, NAACL.

Fillmore, C. J., Baker, C. F., and Sato, H. (2002). The framenet database and software tools. In LREC.

Fishelov, D. (1993). Poetic and non-poetic simile: structure, semantics, rhetoric. In Poetics Today, pages 1–23. JSTOR.

Fiszman, M., Demner-Fushman, D., Lang, F. M., Goetz, P., and Rindflesch, T. C. (2007). Interpreting comparative constructions in biomedical text. In Proceedings of the Workshop on BioNLP 2007: Biological, Translational, and Clinical Language Processing, pages 137–144. Association for Computational Linguistics.

Fragopanagos, N. and Taylor, J. G. (2005). Emotion recognition in human–computer interaction. In Neural Networks, volume 18, pages 389–405. Elsevier.

Gaeta, L. (2004). Nomi d’azione. In La formazione delle parole in italiano. Tübingen: Max Niemeyer Verlag, pages 314–51.

Gamon, M. and Aue, A. (2005). Automatic identification of sentiment vocabulary: exploiting low association with known sentiment terms. In Proceedings of the ACL Workshop on Feature Engineering for Machine Learning in Natural Language Processing, pages 57–64.

Ganapathibhotla, M. and Liu, B. (2008). Mining opinions in comparative sentences. In Proceedings of the 22nd International Conference on Computational Linguistics-Volume 1, pages 241–248. Association for Computational Linguistics.

Ganter, V. and Strube, M. (2009). Finding hedges by chasing weasels: Hedge detection using wikipedia tags and shallow linguistic features. In Proceedings of the ACL-IJCNLP 2009 Conference Short Papers, pages 173–176. Association for Computational Linguistics.

Gantz, J. and Reinsel, D. (2011). Extracting value from chaos. In IDC iview, number 1142, pages 9–10.

Garcia, D., Garas, A., and Schweitzer, F. (2012). Positive words carry less information than negative words. In EPJ Data Science, volume 1. EPJ.

Gatti, L. and Guerini, M. (2012). Assessing sentiment strength in words prior polarities. In arXiv preprint arXiv:1212.4315.

Gibbs, R. W. (2000). Irony in talk among friends. In Metaphor and symbol, volume 15, pages 5–27. Taylor & Francis.

Gildea, D. and Jurafsky, D. (2002). Automatic labeling of semantic roles. In Computational linguistics, volume 28, pages 245–288. MIT Press.

Giora, R. (1995). On irony and negation. In Discourse processes, volume 19, pages 239–264. Taylor & Francis.

Giora, R., Fein, O., and Schwartz, T. (1998). Irony: Grade salience and indirect negation. In Metaphor and Symbol, volume 13, pages 83–101. Taylor & Francis.

Giordano, R. and Voghera, M. (2008). Frasi senza verbo: il contributo della prosodia. In Sintassi storica e sincronica dell’italiano. Atti del Conv. Intern. SILFI, Basilea.

Giuglea, A.-M. and Moschitti, A. (2006). Semantic role labeling via framenet, verbnet and propbank. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics, pages 929–936. Association for Computational Linguistics.

Godard, D. (2013). Les négateurs. In La grande Grammaire du français. Actes Sud.

Goldberg, A. B. and Zhu, X. (2006). Seeing stars when there aren’t many stars: graph-based semi-supervised learning for sentiment categorization. In Proceedings of the First Workshop on Graph Based Methods for Natural Language Processing, pages 45–52. Association for Computational Linguistics.

González-Ibánez, R., Muresan, S., and Wacholder, N. (2011). Identifying sarcasm in twitter: a closer look. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers-Volume 2, pages 581–586. Association for Computational Linguistics.

Grimshaw, J. (1990). Argument structure. the MIT Press.

Gross, G. (1992a). Un outil pour le FLE: les classes d’objets. In Actes du colloque FLE, pages 169–192.

Gross, G. (2008). Les classes d’objets. In Lalies, number 28, pages 111–165.

Gross, M. (1971). Transformational Analysis of French Verbal Constructions. University of Pennsylvania.

Gross, M. (1975). Méthodes en syntaxe. Hermann.

Gross, M. (1979). On the failure of generative grammar. In Language, pages 859–885. JSTOR.

Gross, M. (1981). Les bases empiriques de la notion de prédicat sémantique. In Langages, pages 7–52. JSTOR.

Gross, M. (1982). Une classification des phrases «figées» du français. In Revue québécoise de linguistique, volume 11, pages 151–185. Université du Québec à Montréal.

Gross, M. (1986). Lexicon-grammar: the representation of compound words. In Proceedings of the 11th conference on Computational linguistics, pages 1–6. Association for Computational Linguistics.

Gross, M. (1988). Les limites de la phrase figée. In Langages, pages 7–22. JSTOR.

Gross, M. (1990). La caractérisation des adverbes dans un lexique-grammaire. In Langue française, pages 90–102. JSTOR.

Gross, M. (1992b). The argument structure of elementary sentences. In Language Research, volume 28, pages 699–716.

Gross, M. (1993). Les phrases figées en français. In L’information grammaticale, volume 59, pages 36–41. Peeters.

Gross, M. (1995). Une grammaire locale de l’expression des sentiments. In Langue française, pages 70–87. JSTOR.

Gross, M. (1996). Les verbes supports d’adjectifs et le passif. In Langages, volume 30, pages 8–18. Armand Colin.

Gross, M. (1997). The construction of local grammars. In Finite-state language processing, volume 11, page 329. MIT Press.

Gruber, J. S. (1965). Studies in lexical relations. PhD thesis, Massachusetts Institute of Technology.

Gruen, T. W., Osmonbekov, T., and Czaplewski, A. J. (2006). ewom: The impact of customer-to-customer online know-how exchange on customer value and loyalty. In Journal of Business research, volume 59, pages 449–456. Elsevier.

Guglielmo, D. (2009). “Guardarsi allo specchio”: forme lessicali e sintattiche della vanità. In Atti del II Convegno Interdipartimentale del Centro Studi sulle Rappresentazioni Linguistiche.

Guillet, A. and Leclère, C. (1981). Restructuration du groupe nominal. In Langages, pages 99–125. JSTOR.

Gutiérrez, Y., Vázquez, S., and Montoyo, A. (2011). Sentiment classification using semantic features extracted from WordNet-based resources. In Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis, pages 139–145. Association for Computational Linguistics.

Hanani, U., Shapira, B., and Shoval, P. (2001). Information filtering: Overview of issues, research and systems. In User Modeling and User-Adapted Interaction, volume 11, pages 203–259. Kluwer Academic Publishers.

Hansen, L. K., Arvidsson, A., Nielsen, F. Å., Colleoni, E., and Etter, M. (2011). Good friends, bad news-affect and virality in twitter. In Future information technology, pages 34–43. Springer.

Harkema, H., Dowling, J. N., Thornblade, T., and Chapman, W. W. (2009). Context: An algorithm for determining negation, experiencer, and temporal status from clinical reports. In Journal of biomedical informatics, volume 42, page 839. NIH Public Access.

Harris, Z. (1988). Language and information. Columbia University Press.

Harris, Z. S. (1964). Transformations in linguistic structure. In Proceedings of the American Philosophical Society, pages 418–422.

Harris, Z. S. (1970). Discourse analysis. In Papers in structural and transformational linguistics, pages 313–347.

Hassan, A. and Radev, D. (2010). Identifying text polarity using random walks. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 395–403.

Hatzivassiloglou, V. and McKeown, K. R. (1997). Predicting the semantic orientation of adjectives. In Proceedings of the 35th annual meeting of the association for computational linguistics and eighth conference of the european chapter of the association for computational linguistics, pages 174–181. Association for Computational Linguistics.

Herlocker, J. L., Konstan, J. A., Borchers, A., and Riedl, J. (1999). An algorithmic framework for performing collaborative filtering. In Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, pages 230–237. ACM.

Hernandez-Farias, I., Buscaldi, D., and Priego-Sánchez, B. (2014). Iradabe: Adapting english lexicons to the italian sentiment polarity classification task. In First Italian Conference on Computational Linguistics (CLiC-it 2014) and the fourth International Workshop EVALITA2014, pages 75–81.

Horn, L. (1989). A natural history of negation. University of Chicago Press.

Houen, S. (2011). Opinion mining with semantic analysis. Online: www.diku.dk/forskning/Publikationer/specialer/2011/specialerapport_final_Soren_Houen.pdf

Hu, M. and Liu, B. (2004). Mining opinion features in customer reviews. In AAAI, volume 4, pages 755–760.

Hu, M. and Liu, B. (2006). Opinion feature extraction using class sequential rules. In AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs, pages 61–66.

Iacobini, C. (2004). Prefissazione. In La formazione delle parole in italiano. Tübingen: Max Niemeyer Verlag, pages 97–161.

Jackendoff, R. (1972). Semantic interpretation in generative grammar. MIT press, Cambridge, MA.

Jackendoff, R. (1990). Semantic structures, volume 18. MIT press.

Jespersen, O. (1965). The philosophy of grammar. Number 307. University of Chicago Press.

Jia, L., Yu, C., and Meng, W. (2009). The effect of negation on sentiment analysis and retrieval effectiveness. In Proceedings of the 18th ACM conference on Information and knowledge management, pages 1827–1830. ACM.

Jin, X., Li, Y., Mah, T., and Tong, J. (2007). Sensitive webpage classification for content advertising. In Proceedings of the 1st international workshop on Data mining and audience intelligence for advertising, pages 28–33. ACM.

Jindal, N. and Liu, B. (2006a). Identifying comparative sentences in text documents. In Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, pages 244–251. ACM.

Jindal, N. and Liu, B. (2006b). Mining comparative sentences and relations. In AAAI, volume 22, pages 1331–1336.

Jindal, N. and Liu, B. (2007). Review spam detection. In Proceedings of the 16th international conference on World Wide Web, pages 1189–1190. ACM.

Justo, R., Corcoran, T., Lukin, S. M., Walker, M., and Torres, M. I. (2014). Extracting relevant knowledge for the detection of sarcasm and nastiness in the social web. In Knowledge-Based Systems, volume 69, pages 124–133. Elsevier.

Kaji, N. and Kitsuregawa, M. (2007). Building lexicon for sentiment analysis from massive collection of html documents. In EMNLP-CoNLL, pages 1075–1083.

Kamp, H. and Reyle, U. (1993). From discourse to logic: Introduction to modeltheoretic semantics of natural language, formal logic and discourse representation theory. Number 42. Springer Science & Business Media.

Kamps, J., Marx, M., Mokken, R. J., and De Rijke, M. (2004). Using wordnet to measure semantic orientations of adjectives. In LREC, volume 4, pages 1115–1118.

Kanayama, H. and Nasukawa, T. (2006a). Fully automatic lexicon expansion for domain-oriented sentiment analysis. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, pages 355–363. Association for Computational Linguistics.

Kanayama, H. and Nasukawa, T. (2006b). Fully automatic lexicon expansion for domain-oriented sentiment analysis. In COLING ACL 2006, page 355.

Kang, H., Yoo, S. J., and Han, D. (2012). Senti-lexicon and improved naïve bayes algorithms for sentiment analysis of restaurant reviews. In Expert Systems with Applications, volume 39, pages 6000–6010. Elsevier.

Kennedy, A. and Inkpen, D. (2006). Sentiment classification of movie reviews using contextual valence shifters. In Computational intelligence, volume 22, pages 110–125. Wiley Online Library.

Khan, K., Baharudin, B., and Khan, A. Identifying product features from customer reviews using hybrid dependency patterns. Citeseer.

Khan, K., Baharudin, B. B., and City, T. (2012). Identifying product features from customer reviews using lexical concordance. In Research Journal of Applied Sciences, Engineering and Technology, volume 4, pages 833–839.

Kim, S.-M. and Hovy, E. (2004). Determining the sentiment of opinions. In Proceedings of the 20th international conference on Computational Linguistics, page 1367.

Kim, S.-M. and Hovy, E. (2005). Identifying opinion holders for question answering in opinion texts. In Proceedings of AAAI-05 Workshop on Question Answering in Restricted Domains, pages 1367–1373.

Kim, S.-M. and Hovy, E. (2006). Extracting opinions, opinion holders, and topics expressed in online news media text. In Proceedings of the Workshop on Sentiment and Subjectivity in Text, pages 1–8. Association for Computational Linguistics.

Kim, Y., Kim, S., and Myaeng, S.-H. (2008). Extracting topic-related opinions and their targets in NTCIR-7. In Proceedings of the 7th NTCIR Workshop Meeting, Tokyo, Japan.

Kim, Y.-j. and Larson, R. (1989). Scope interpretation and the syntax of psych-verbs. In Linguistic Inquiry, pages 681–688. JSTOR.

Kingsbury, P. and Palmer, M. (2002). From treebank to propbank. In LREC.

Klein, B. and Leffler, K. B. (1981). The role of market forces in assuring contractual performance. In The Journal of Political Economy, pages 615–641. JSTOR.

Klein, K. and Kutscher, S. (2002). Psych-verbs and lexical economy. In Theorie des Lexikons, volume 122, pages 1–41.

Klein, L. R. (1998). Evaluating the potential of interactive media through a new lens: Search versus experience goods. In Journal of business research, volume 41, pages 195–203. Elsevier.

Kleinberg, J. M. (1999). Authoritative sources in a hyperlinked environment. In Journal of the ACM (JACM), volume 46, pages 604–632. ACM.

Kobayakawa, T. S., Kumano, T., Tanaka, H., Okazaki, N., Kim, J.-D., and Tsujii, J. (2009). Opinion classification with tree kernel svm using linguistic modality analysis. In Proceedings of the 18th ACM conference on Information and knowledge management, pages 1791–1794. ACM.

Ku, L.-W., Huang, T.-H., and Chen, H.-H. (2009). Using morphological and syntactic structures for chinese opinion analysis. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3-Volume 3, pages 1260–1269.

Lafferty, J., McCallum, A., and Pereira, F. C. (2001). Conditional random fields: Probabilistic models for segmenting and labeling sequence data.

Lakoff, G. (1973). Hedges: A study in meaning criteria and the logic of fuzzy concepts. In Journal of philosophical logic, volume 2, pages 458–508. Springer.

Landau, I. (2010). The locative syntax of experiencers, volume 34. MIT press, Cambridge.

Landauer, T. K. and Dumais, S. T. (1997). A solution to plato’s problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. In Psychological review, volume 104, page 211. American Psychological Association.

Laporte, E. (1997). L’analyse de phrases adjectivales par rétablissement de noms appropriés. In Langages, volume 31, pages 79–104. Armand Colin.

Laporte, E. (2012). Appropriate nouns with obligatory modifiers. In arXiv preprint arXiv:1207.4625.

Laporte, E. and Voyatzi, S. (2008). An electronic dictionary of french multiword adverbs. In Language Resources and Evaluation Conference. Workshop Towards a Shared Task for Multiword Expressions, pages 31–34.

Larreya, P. (2004). L’expression de la modalité en français et en anglais (domaine verbal). In Revue belge de philologie et d’histoire, volume 82, pages 733–762. Société pour le progrès des études philologiques et historiques.

Lavanda, I. and Rampa, G. (2001). Microeconomia: scelte individuali e benessere sociale. Carocci.

Laver, M., Benoit, K., and Garry, J. (2003). Extracting policy positions from political texts using words as data. In American Political Science Review, volume 97, pages 311–331. Cambridge Univ Press.

Le Pesant, D. and Mathieu-Colas, M. (1998). Introduction aux classes d’objets. In Langages, pages 6–33. JSTOR.

Lee, C. M., Narayanan, S. S., and Pieraccini, R. (2002). Combining acoustic and language information for emotion recognition. In INTERSPEECH.

Lee, M. and Youn, S. (2009). Electronic word of mouth (ewom): how ewom platforms influence consumer product judgement. In International Journal of Advertising, volume 28, pages 473–499. Taylor & Francis.

Lenci, A., Lapesa, G., and Bonansinga, G. (2012). Lexit: A computational resource on italian argument structure. In LREC, pages 3712–3718.

Leung, C. W., Chan, S. C., and Chung, F.-l. (2006). Integrating collaborative filtering and sentiment analysis: A rating inference approach. In Proceedings of the ECAI 2006 workshop on recommender systems, pages 62–66.

Leung, C. W., Chan, S. C., and Chung, F.-L. (2008). Evaluation of a rating inference approach to utilizing textual reviews for collaborative recommendation. In Cooperative Internet Computing. World Scientific Publisher, World Scientific.

Li, F., Huang, M., and Zhu, X. (2010). Sentiment analysis with global topics and local dependency. In AAAI.

Linden, G., Smith, B., and York, J. (2003). Amazon.com recommendations: Item-to-item collaborative filtering. In Internet Computing, IEEE, volume 7, pages 76–80. IEEE.

Liu, B. (2010). Sentiment analysis and subjectivity. In Handbook of natural language processing, volume 2, pages 627–666. Chapman & Hall, Goshen, CT.

Liu, B. (2012). Sentiment analysis and opinion mining. In Synthesis Lectures on Human Language Technologies, volume 5, pages 1–167. Morgan & Claypool Publishers.

Liu, J. and Seneff, S. (2009). Review sentiment scoring via a parse-and-paraphrase paradigm. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1-Volume 1, pages 161–169. Association for Computational Linguistics.

Macdonald, C., Ounis, I., and Soboroff, I. (2007). Overview of the trec 2007 blog track. In TREC, volume 7, pages 31–43.

Mahmud, A., Ahmed, K. Z., and Khan, M. (2008). Detecting flames and insults in text. In Proceedings of the Sixth International Conference on Natural Language Processing. BRAC University.

Maisto, A. and Pelosi, S. (2014a). Feature-based customer review summarization. In On the Move to Meaningful Internet Systems: OTM 2014 Workshops, pages 299–308.

Maisto, A. and Pelosi, S. (2014b). A lexicon-based approach to sentiment analysis: the italian module for nooj. In Proceedings of the International Nooj 2014 Conference, University of Sassari, Italy. Cambridge Scholar Publishing.

Maks, I. and Vossen, P. (2011). Different approaches to automatic polarity annotation at synset level. In Proceedings of the First International Workshop on Lexical Resources, WoLeR, pages 62–69.

Maks, I., Vossen, P., et al. (2012). Building a fine-grained subjectivity lexicon from a web corpus. In LREC, pages 3070–3076.

Markov, A. (1971). Extension of the limit theorems of probability theory to a sum of variables connected in a chain. John Wiley & Sons, Inc.

Màrquez, L., Carreras, X., Litkowski, K. C., and Stevenson, S. (2008). Semantic role labeling: an introduction to the special issue. In Computational linguistics, volume 34, pages 145–159. MIT Press.

Martín, J. (1998). The lexico-syntactic interface: psych verbs and psych nouns. In Hispania, pages 619–631. JSTOR.

Mathieu, Y. Y. (1995). Verbes psychologiques et interprétation sémantique. In Langue française, pages 98–106. JSTOR.

Mathieu, Y. Y. (1999a). Les prédicats de sentiment. In Langages, pages 41–52. JSTOR.

Mathieu, Y. Y. (1999b). Un classement sémantique des verbes psychologiques. In Cahiers du CIEL. Publications Paris 7.

Mathieu, Y. Y. (2006). A computational semantic lexicon of french verbs of emotion. In Computing attitude and affect in text: Theory and applications, pages 109–124. Springer.

Matthews, H., Hendrickson, C., and Soh, D. (2001). Environmental and economic effects of e-commerce: A case study of book publishing and retail logistics. In Transportation Research Record: Journal of the Transportation Research Board, number 1763, pages 6–12. Transportation Research Board of the National Academies.

Maynard, D. and Greenwood, M. A. (2014). Who cares about sarcastic tweets? investigating the impact of sarcasm on sentiment analysis. In Proceedings of LREC.

Maziero, E. G., Pardo, T. A., Di Felippo, A., and Dias-da Silva, B. C. (2008). A base de dados lexical e a interface web do tep 2.0: thesaurus eletrônico para o português do brasil. In Companion Proceedings of the XIV Brazilian Symposium on Multimedia and the Web, pages 390–392. ACM.

McAfee, A., Brynjolfsson, E., Davenport, T. H., Patil, D., and Barton, D. (2012). Big data. In The management revolution. Harvard Bus Rev, volume 90, pages 61–67.

Meier, C. (2003). The meaning of too, enough, and so... that. In Natural Language Semantics, volume 11, pages 69–107. Springer.

Meillet, A. (1906). La Phrase nominale en indoeuropéen. Société de linguistique.

Mejova, Y. and Srinivasan, P. (2011). Exploring feature definition and selection for sentiment classifiers. In ICWSM.

Meunier, A. (1984). La sémantique locative de certaines structures N0 être adj. In Revue québécoise de linguistique, volume 13, pages 95–121. Université du Québec à Montréal.

Meunier, A. (1999). Une construction complexe N0hum être Adj de V0-inf W caractéristique de certains adjectifs à sujet humain. In Langages, pages 12–44. JSTOR.

Meydan M (1996) Constructions adjectivales substantifs approprieacuteset verbes supports In Linx volume 34 pages 197ndash210 Centre derecherches linguistiques de Paris 10

Meydan M (1999) La restructuration du gn sujet dans les phrases ad-jectivales agrave substantif approprieacute In Langages pages 59ndash80 JSTOR

Miller G A (1995) WordNet a lexical database for english In Com-munications of the ACM volume 38 pages 39ndash41 ACM

Mohammad S Dunne C and Dorr B (2009) Generating high-coverage semantic orientation lexicons from overtly marked wordsand a thesaurus In Proceedings of the 2009 Conference on EmpiricalMethods in Natural Language Processing Volume 2-Volume 2 pages599ndash608 Association for Computational Linguistics

Moilanen K and Pulman S (2007) Sentiment composition In Pro-ceedings of the Recent Advances in Natural Language Processing In-ternational Conference pages 378ndash382

Moilanen K and Pulman S (2008) The good the bad and the un-known morphosyllabic sentiment tagging of unseen words In Pro-ceedings of the 46th Annual Meeting of the Association for Computa-tional Linguistics on Human Language Technologies Short Paperspages 109ndash112

Molinier C and Levrier F (2000) Grammaire des adverbes descrip-tion des formes en-ment volume 33 Librairie Droz

266 BIBLIOGRAPHY

Monachesi P (2009) Annotation of semantic roles In Utrechet Uni-versity Netherlands

Montefinese M Ambrosini E Fairfield B and Mammarella N(2014) The adaptation of the affective norms for english words(anew) for italian In Behavior research methods volume 46 pages887ndash903 Springer

Moon R (2008) Conventionalized as-similes in english a problemcase In International Journal of Corpus Linguistics volume 13pages 3ndash37 John Benjamins Publishing Company

Mulder M Nijholt A Den Uyl M and Terpstra P (2004) A lexi-cal grammatical implementation of affect In Text Speech and Dia-logue pages 171ndash177

Mullen T and Collier N (2004) Sentiment analysis using supportvector machines with diverse information sources In EMNLP vol-ume 4 pages 412ndash418

Mullen T and Malouf R (2006) A preliminary investigation into sen-timent analysis of informal political discourse In AAAI Spring Sym-posium Computational Approaches to Analyzing Weblogs pages159ndash162

Nakagawa T Inui K and Kurohashi S (2010) Dependency tree-based sentiment classification using crfs with hidden variables InHuman Language Technologies The 2010 Annual Conference of theNorth American Chapter of the Association for Computational Lin-guistics pages 786ndash794 Association for Computational Linguistics

Nakayama M Sutcliffe N and Wan Y (2010) Has the web trans-formed experience goods into search goods In Electronic Marketsvolume 20 pages 251ndash262 Springer

Narayanan R Liu B and Choudhary A (2009) Sentiment analy-sis of conditional sentences In Proceedings of the 2009 Conference

BIBLIOGRAPHY 267

on Empirical Methods in Natural Language Processing Volume 1-Volume 1 pages 180ndash189

Nasukawa T and Yi J (2003) Sentiment analysis Capturing favora-bility using natural language processing In Proceedings of the 2ndinternational conference on Knowledge capture pages 70ndash77 ACM

Navigli R and Ponzetto S P (2012) BabelNet The automatic con-struction evaluation and application of a wide-coverage multilin-gual semantic network volume 193 pages 217ndash250

Nelson P (1970) Information and consumer behavior In The Journalof Political Economy pages 311ndash329 JSTOR

Neviarouskaya A (2010) Compositional Approach for AutomaticRecognition of Fine-Grained Affect Judgment and Appreciation inText PhD thesis University of Tokyo

Neviarouskaya A Prendinger H and Ishizuka M (2007) Textualaffect sensing for sociable and expressive online communicationIn Affective Computing and Intelligent Interaction pages 218ndash229Springer

Neviarouskaya A Prendinger H and Ishizuka M (2009a) Com-positionality principle in recognition of fine-grained emotions fromtext In ICWSM

Neviarouskaya A Prendinger H and Ishizuka M (2009b) Senti-ful Generating a reliable lexicon for sentiment analysis In AffectiveComputing and Intelligent Interaction and Workshops 2009 ACII2009 3rd International Conference on pages 1ndash6 IEEE

Neviarouskaya A Prendinger H and Ishizuka M (2011) Sentiful Alexicon for sentiment analysis In Affective Computing IEEE Trans-actions on volume 2 pages 22ndash36 IEEE

268 BIBLIOGRAPHY

Niedenthal P M Halberstadt J B Margolin J and Innes-Ker Aring H(2000) Emotional state and the detection of change in facial ex-pression of emotion In European Journal of Social Psychology vol-ume 30 pages 211ndash222 Wiley Online Library

Nielsen F Aring (2011) A new anew Evaluation of a word list for senti-ment analysis in microblogs In arXiv preprint arXiv11032903

Norrick N R (1986) Stock similes In Journal of Literary Semanticsvolume 15 pages 39ndash52

Osgood C E (1952) The nature and measurement of meaning InPsychological bulletin volume 49 page 197 American PsychologicalAssociation

Pak A and Paroubek P (2011) Twitter for sentiment analysis Whenlanguage resources are not available In Database and Expert Sys-tems Applications (DEXA) 2011 22nd International Workshop onpages 111ndash115 IEEE

Pal P Iyer A N and Yantorno R E (2006) Emotion detection frominfant facial expressions and cries In Acoustics Speech and SignalProcessing 2006 ICASSP 2006 Proceedings 2006 IEEE InternationalConference on volume 2 pages IIndashII IEEE

Palmer M Gildea D and Kingsbury P (2005) The proposition bankAn annotated corpus of semantic roles In Computational linguis-tics volume 31 pages 71ndash106 MIT press

Palmer M Gildea D and Xue N (2010) Semantic role labelingIn Synthesis Lectures on Human Language Technologies volume 3pages 1ndash103 Morgan amp Claypool Publishers

Pang B and Lee L (2004) A sentimental education Sentiment anal-ysis using subjectivity summarization based on minimum cuts InProceedings of the 42nd annual meeting on Association for Compu-tational Linguistics page 271 Association for Computational Lin-guistics

BIBLIOGRAPHY 269

Pang B and Lee L (2005) Seeing stars Exploiting class relation-ships for sentiment categorization with respect to rating scales InProceedings of the 43rd Annual Meeting on Association for Compu-tational Linguistics pages 115ndash124 Association for ComputationalLinguistics

Pang B and Lee L (2008a) Opinion mining and sentiment analysisIn Foundations and trends in information retrieval volume 2 pages1ndash135 Now Publishers Inc

Pang B and Lee L (2008b) Using very simple statistics for reviewsearch An exploration In COLING (Posters) pages 75ndash78

Pang B Lee L and Vaithyanathan S (2002) Thumbs up senti-ment classification using machine learning techniques In Proceed-ings of the ACL-02 conference on Empirical methods in natural lan-guage processing-Volume 10 pages 79ndash86 Association for Compu-tational Linguistics

Pantic M and Patras I (2006) Dynamics of facial expression recog-nition of facial actions and their temporal segments from face pro-file image sequences In Systems Man and Cybernetics Part B Cy-bernetics IEEE Transactions on volume 36 pages 433ndash449 IEEE

Park C and Lee T M (2009) Information direction website reputa-tion and ewom effect A moderating role of product type In Journalof Business Research volume 62 pages 61ndash67 Elsevier

Paulo-Santos A Ramos C and Marques N C (2011) Determin-ing the polarity of words through a common online dictionary InProgress in Artificial Intelligence pages 649ndash663 Springer

Pelosi S (2015a) Morphological relations for the automatic expan-sion of italian sentiment lexicons In Proceedings of the Interna-tional Scientific Conference on the Automatic Processing of Natural-Language Electronic Texts NooJ 2015

270 BIBLIOGRAPHY

Pelosi S (2015b) Sentita and doxa Italian databases and tools forsentiment analysis purposes In Proceedings of the Second ItalianConference on Computational Linguistics CLiC-it 2015 pages 226ndash231 Accademia University Press

Pelosi S Maisto A and Elia A (2015) Sentiment analysis throughfinite state automata In the 12th International Conference on Finite-State Methods and Natural Language Processing 2015 - FSMNLP2015

Perez-Rosas V Banea C and Mihalcea R (2012) Learning senti-ment lexicons in spanish In LREC pages 3077ndash3081

Pesetsky D (1996) Zero syntax Experiencers and cascades Num-ber 27 MIT press

Pianta E Bentivogli L and Girardi C (2002) MultiWordNet devel-oping an aligned multilingual database In Proceedings of the firstinternational conference on global WordNet volume 152 pages 55ndash63

Piao S Ananiadou S Tsuruoka Y Sasaki Y and McNaught J(2007) Mining opinion polarity relations of citations In Interna-tional Workshop on Computational Semantics (IWCS) pages 366ndash371

Picabia L (1978) Les constructions adjectivales en franccedilais systeacutema-tique transformationnelle volume 11 Librairie Droz

Picard R W and Picard R (1997) Affective computing volume 252MIT press Cambridge

Pierre-Yves O (2003) The production and recognition of emotionsin speech features and algorithms In International Journal ofHuman-Computer Studies volume 59 pages 157ndash183 Elsevier

Plag I (2003) Word-formation in English Cambridge UniversityPress

BIBLIOGRAPHY 271

Polanyi L and Zaenen A (2006) Contextual valence shifters In Com-puting attitude and affect in text Theory and applications pages 1ndash10 Springer

Popescu A-M and Etzioni O (2007) Extracting product features andopinions from reviews In Natural language processing and text min-ing pages 9ndash28 Springer

Portner P (2009) Modality OUP Oxford

Potts C (2011) On the negativity of negation In Semantics and Lin-guistic Theory number 20 pages 636ndash659

Prabowo R and Thelwall M (2009) Sentiment analysis A combinedapproach In Journal of Informetrics volume 3 pages 143ndash157 Else-vier

Pradhan S Hacioglu K Ward W Martin J H and Jurafsky D(2003) Semantic role parsing Adding semantic structure to un-structured text In Data Mining 2003 ICDM 2003 Third IEEE In-ternational Conference on pages 629ndash632 IEEE

Qiu G Liu B Bu J and Chen C (2009) Expanding domain senti-ment lexicon through double propagation In IJCAI volume 9 pages1199ndash1204

Quirk R Crystal D and Education P (1985) A comprehensive gram-mar of the English language volume 397 Cambridge Univ Press

Rahimi Rastgar S and Razavi N (2013) A System for Building CorpusAnnotated With Semantic Roles PhD thesis

Rainer F (2004) Derivazione nominale deaggettivale In La for-mazione delle parole in italiano pages 293ndash314

Rao D and Ravichandran D (2009) Semi-supervised polarity lexiconinduction In Proceedings of the 12th Conference of the EuropeanChapter of the Association for Computational Linguistics pages 675ndash682 Association for Computational Linguistics

272 BIBLIOGRAPHY

Rappaport Hovav M and Levin B (1998) Building verb meaningsIn The projection of arguments Lexical and compositional factorspages 97ndash134

Read J and Carroll J (2009) Weakly supervised techniques fordomain-independent sentiment classification In Proceedings of the1st international CIKM workshop on Topic-sentiment analysis formass opinion pages 45ndash52 ACM

Reinstein D A and Snyder C M (2005) The influence of expert re-views on consumer demand for experience goods A case study ofmovie critics In The journal of industrial economics volume 53pages 27ndash51 Wiley Online Library

Remus R Quasthoff U and Heyer G (2010) Sentiws-a pub-licly available german-language resource for sentiment analysis InLREC

Reyes A and Rosso P (2011) Mining subjective knowledge from cus-tomer reviews a specific case of irony detection In Proceedings ofthe 2nd Workshop on Computational Approaches to Subjectivity andSentiment Analysis pages 118ndash124 Association for ComputationalLinguistics

Reyes A Rosso P and Veale T (2013) A multidimensional approachfor detecting irony in twitter In Language Resources and Evaluationvolume 47 pages 239ndash268 Springer

Reynolds K Kontostathis A and Edwards L (2011) Using machinelearning to detect cyberbullying In Machine Learning and Applica-tions and Workshops (ICMLA) 2011 10th International Conferenceon volume 2 pages 241ndash244 IEEE

Ricca D (2004a) Aggettivi deverbali In La formazione delle parole initaliano Tuumlbingen Niemeyer pages 419ndash444

BIBLIOGRAPHY 273

Ricca D (2004b) Derivazione avverbiale In La formazione delle pa-role in italiano pages 472ndash489

Riloff E (1996) An empirical study of automated dictionary construc-tion for information extraction in three domains In Artificial intel-ligence volume 85 pages 101ndash134 Elsevier

Riloff E Patwardhan S and Wiebe J (2006) Feature subsumptionfor opinion analysis In Proceedings of the 2006 Conference on Em-pirical Methods in Natural Language Processing pages 440ndash448 As-sociation for Computational Linguistics

Riloff E Wiebe J and Wilson T (2003) Learning subjective nounsusing extraction pattern bootstrapping In Proceedings of the sev-enth conference on Natural language learning at HLT-NAACL 2003-Volume 4 pages 25ndash32 Association for Computational Linguistics

Rosenbaum P S (1967) The grammar of English predicate comple-ment constructions Research monograph 47 Cambridge MA MITPress

Rubin V L (2010) Epistemic modality From uncertainty to certaintyin the context of information seeking as interactions with texts InInformation Processing amp Management volume 46 pages 533ndash540Elsevier

Ruppenhofer J (2013) Anchoring sentiment analysis in frame se-mantics In Veredas pages 66ndash81

Ruppenhofer J Ellsworth M Petruck M R Johnson C R andScheffczyk J (2006) FrameNet II Extended theory and practice

Ruppenhofer J and Rehbein I (2012) Semantic frames as an an-chor representation for sentiment analysis In Proceedings of the 3rdWorkshop in Computational Approaches to Subjectivity and Senti-ment Analysis pages 104ndash109 Association for Computational Lin-guistics

274 BIBLIOGRAPHY

Ruppenhofer J Somasundaran S and Wiebe J (2008) Finding thesources and targets of subjective expressions In LREC

Russom P et al (2011) Big data analytics In TDWI Best PracticesReport Fourth Quarter

Ruwet N (1994) Ecirctre ou ne pas ecirctre un verbe de sentiment In Languefranccedilaise pages 45ndash55 JSTOR

Sagot B and Fort K (2007) Ameacuteliorer un lexique syntaxique agrave lrsquoaidedes tables du lexique-grammaire adverbes en-ment In 26e Col-loque International sur le Lexique et la grammaire 2007

Salvi G and Vanelli L (2004) Nuova grammatica italiana BolognaIl Mulino

Sarwar B Karypis G Konstan J and Riedl J (2001) Item-basedcollaborative filtering recommendation algorithms In Proceedingsof the 10th international conference on World Wide Web pages 285ndash295 ACM

Sauriacute R and Pustejovsky J (2009) Factbank A corpus annotated withevent factuality In Language resources and evaluation volume 43pages 227ndash268 Springer

Schmid H (1994) Probabilistic part-of-speech tagging using decisiontrees In Proceedings of the international conference on new methodsin language processing volume 12 pages 44ndash49

Schmid H Baroni M Zanchetta E and Stein A (2007) The en-riched treetagger system In proceedings of the EVALITA 2007 work-shop

Schuler K K (2005) VerbNet A broad-coverage comprehensive verblexicon PhD thesis Computer and Information Science Dept Uni-versity of Pennsylvania

BIBLIOGRAPHY 275

Schuller B Rigoll G and Lang M (2003) Hidden markov model-based speech emotion recognition In Acoustics Speech and SignalProcessing 2003 Proceedings(ICASSPrsquo03) 2003 IEEE InternationalConference on volume 2 pages IIndash1 IEEE

Schuller B Rigoll G and Lang M (2004) Speech emotion recogni-tion combining acoustic features and linguistic information in a hy-brid support vector machine-belief network architecture In Acous-tics Speech and Signal Processing 2004 Proceedings(ICASSPrsquo04)IEEE International Conference on volume 1 pages Indash577 IEEE

Schwarzschild R (2008) The semantics of comparatives and otherdegree constructions In Language and Linguistics Compass vol-ume 2 pages 308ndash331 Wiley Online Library

Seki Y Eguchi K Kando N and Aono M (2005) Multi-documentsummarization with subjectivity analysis at duc 2005 In Proceed-ings of the Document Understanding Conference (DUC)

Seki Y Evans D K Ku L-W Chen H-H Kando N and Lin C-Y(2007) Overview of opinion analysis pilot task at ntcir-6 In Proceed-ings of NTCIR-6 Workshop Meeting pages 265ndash278

Shaikh M A M Prendinger H and Mitsuru I (2007) Assessingsentiment of text by semantic dependency and contextual valenceanalysis In Affective Computing and Intelligent Interaction pages191ndash202 Springer

Shapiro C (1982) Consumer information product quality and sellerreputation In The Bell Journal of Economics pages 20ndash35 JSTOR

Shapiro C (1983) Premiums for high quality products as returns toreputations In The quarterly journal of economics pages 659ndash679JSTOR

Shimada K and Endo T (2008) Seeing several stars A rating in-ference task for a document containing several evaluation criteria

276 BIBLIOGRAPHY

In Advances in Knowledge Discovery and Data Mining pages 1006ndash1014 Springer

Silberztein M (2003) Nooj manual In Available for download atwww nooj4nlp net

Silberztein M (2008) Complex annotations with nooj In Proceedingsof the 2007 International NooJ Conference pages pndash214 CambridgeScholars Publishing

Silva M J Carvalho P Costa C and Sarmento L (2010) Automaticexpansion of a social judgment lexicon for sentiment analysis Tech-nical Report TR 1008

Silva M J Carvalho P and Sarmento L (2012) Building a sentimentlexicon for social judgement mining In Computational Processingof the Portuguese Language pages 218ndash228 Springer

Singhal P and Bansal A (2013) Improved textual cyberbullying de-tection using data mining In International Journal of Informationand Computation Technology ISSN pages 0974ndash2239

Snyder B and Barzilay R (2007) Multiple aspect ranking using thegood grief algorithm In HLT-NAACL pages 300ndash307

Socher R Perelygin A Wu J Y Chuang J Manning C D Ng A Yand Potts C (2013) Recursive deep models for semantic composi-tionality over a sentiment treebank In Proceedings of the conferenceon empirical methods in natural language processing (EMNLP) vol-ume 1631 page 1642

Somprasertsri G and Lalitrojwong P (2010) Mining feature-opinionin online customer reviews for opinion summarization In J UCSvolume 16 pages 938ndash955

Souza M Vieira R Busetti D Chishman R Alves I M et al(2011) Construction of a portuguese opinion lexicon from multi-ple resources In STIL

BIBLIOGRAPHY 277

Steinberger J Ebrahim M Ehrmann M Hurriyetoglu A KabadjovM Lenkova P Steinberger R Tanev H Vaacutezquez S and ZavarellaV (2012) Creating sentiment dictionaries via triangulation In Deci-sion Support Systems volume 53 pages 689ndash694 Elsevier

Stigler G J (1961) The economics of information In The journal ofpolitical economy pages 213ndash225 JSTOR

Stone P J Dunphy D C and Smith M S (1966) The General In-quirer A Computer Approach to Content Analysis MIT press

Stone P J and Hunt E B (1963) A computer approach to contentanalysis studies using the general inquirer system In Proceedingsof the May 21-23 1963 spring joint computer conference pages 241ndash256 ACM

Strapparava C and Mihalcea R (2008) Learning to identify emo-tions in text In Proceedings of the 2008 ACM symposium on Appliedcomputing pages 1556ndash1560 ACM

Strapparava C Valitutti A et al (2004) WordNet-Affect an affectiveextension of WordNet In LREC volume 4 pages 1083ndash1086

Strapparava C Valitutti A and Stock O (2006) The affective weightof lexicon In Proceedings of the fifth international conference on lan-guage resources and evaluation pages 423ndash426

Swier R S and Stevenson S (2004) Unsupervised semantic role la-belling In Proceedings of EMNLP volume 95 page 102

Taboada M Anthony C and Voll K (2006) Methods for creatingsemantic orientation dictionaries In Proceedings of the 5th Inter-national Conference on Language Resources and Evaluation (LREC)Genova Italy pages 427ndash432

Taboada M Brooke J Tofiloski M Voll K and Stede M (2011)Lexicon-based methods for sentiment analysis In Computationallinguistics volume 37 pages 267ndash307 MIT Press

278 BIBLIOGRAPHY

Taboada M and Grieve J (2004) Analyzing appraisal automaticallyIn Proceedings of AAAI Spring Symposium on Exploring Attitude andAffect in Text (AAAI Technical Re port SS 04 07) Stanford Univer-sity CA pp 158q161 AAAI Press

Tan S Cheng X Wang Y and Xu H (2009) Adapting naive bayesto domain adaptation for sentiment analysis In Advances in Infor-mation Retrieval pages 337ndash349 Springer

Taylor A (1954) Proverbial comparisons and similes from Californiavolume 3 University of California Press

Tepperman J Traum D R and Narayanan S (2006) yeahright sarcasm recognition for spoken dialogue systems In INTER-SPEECH

Terveen L Hill W Amento B McDonald D and Creter J (1997)Phoaks A system for sharing recommendations In Communica-tions of the ACM volume 40 pages 59ndash62 ACM

Tesniegravere L (1959) Eleacutements de syntaxe structurale Librairie C Klinck-sieck

Thomas M Pang B and Lee L (2006) Get out the vote Deter-mining support or opposition from congressional floor-debate tran-scripts In Proceedings of the 2006 conference on empirical methodsin natural language processing pages 327ndash335 Association for Com-putational Linguistics

Tolone E and Stavroula V (2011) Extending the adverbial coverage ofa nlp oriented resource for french In arXiv preprint arXiv11113462

Tsur O Davidov D and Rappoport A (2010) Icwsm-a great catchyname Semi-supervised recognition of sarcastic sentences in onlineproduct reviews In ICWSM

Turney P D (2001) Mining the web for synonyms Pmi-ir versus lsaon toefl In Machine Learning ECML 2001 pages 491ndash502 Springer

BIBLIOGRAPHY 279

Turney P D (2002) Thumbs up or thumbs down semantic orienta-tion applied to unsupervised classification of reviews In Proceed-ings of the 40th annual meeting on association for computationallinguistics pages 417ndash424

Turney P D and Littman M L (2003) Measuring praise and criti-cism Inference of semantic orientation from association In ACMTransactions on Information Systems (TOIS) volume 21 pages 315ndash346

Veale T and Hao Y (2009) Support structures for linguistic creativityA computational analysis of creative irony in similes In Proceedingsof CogSci pages 1376ndash1381

Velikovich L Blair-Goldensohn S Hannan K and McDonald R(2010) The viability of web-derived polarity lexicons In HumanLanguage Technologies The 2010 Annual Conference of the NorthAmerican Chapter of the Association for Computational Linguisticspages 777ndash785 Association for Computational Linguistics

Vermeij M (2005) The orientation of user opinions through adverbsverbs and nouns In 3rd Twente Student Conference on IT EnschedeJune

Vietri S (1990) On some comparative frozen sentences in italianIn Lingvisticaelig Investigationes volume 14 pages 149ndash174 John Ben-jamins Publishing Company

Vietri S (2004) Lessico-grammatica dellrsquoitaliano Metodi descrizionie applicazioni UTET Universitagrave

Vietri S (2011) On a class of italian frozen sentences In LingvisticaeligInvestigationes volume 34 pages 228ndash267 John Benjamins Publish-ing Company

Vietri S (2014a) The construction of an annotated corpus for theanalysis of italian transfer predicates In Lingvisticae Investigationesvolume 37 pages 69ndash105 John Benjamins Publishing Company

280 BIBLIOGRAPHY

Vietri S (2014b) Idiomatic Constructions in Italian A Lexicon-grammar Approach volume 31 John Benjamins Publishing Com-pany

Vietri S (2014c) The italian module for nooj In In Proceedings of theFirst Italian Conference on Computational Linguistics CLiC-it 2014Pisa University Press

Vietri S (2014d) The lexicon-grammar of italian idioms In Work-shop on Lexical and Grammatical Resources for Language Process-ing page 137

Villars R L Olofson C W and Eastwood M (2011) Big data Whatit is and why you should care In White Paper IDC

Vincze V (2010) Speculation and negation annotation in naturallanguage texts what the case of bioscope might (not) reveal InNegation and Speculation in Natural Language Processing (NeSp-NLP 2010) page 28

Vincze V Szarvas G Farkas R Moacutera G and Csirik J (2008) Thebioscope corpus biomedical texts annotated for uncertainty nega-tion and their scopes In BMC bioinformatics volume 9 page S9BioMed Central Ltd

Voghera M (2004) Polirematiche In Grossmann M Rainer F(a curadi) La formazione delle parole in italiano Tbingen Niemeyer pages56ndash69

Voll K and Taboada M (2007) Not all words are created equal Ex-tracting semantic orientation as a function of adjective relevance InAI 2007 Advances in Artificial Intelligence pages 337ndash346 Springer

Vollero A (2010) E-marketing e Web communication Verso la gestionedella corporate reputation online Giappichelli Torino

BIBLIOGRAPHY 281

von Ungern-Sternberg T and von Weizsaumlcker C C (1985) The sup-ply of quality on a market for experience goods In The Journal ofIndustrial Economics pages 531ndash540 JSTOR

Voyatzi S (2006) Description morphosyntaxique et seacutemantique desadverbes figeacutes en vue drsquoun systeacuteme drsquoanalyse automatique des textesgrecs PhD thesis Universiteacute Paris-Est

Waltinger U (2010) Germanpolarityclues A lexical resource for ger-man sentiment analysis In LREC

Wan X (2009) Co-training for cross-lingual sentiment classificationIn Proceedings of the Joint Conference of the 47th Annual Meeting ofthe ACL and the 4th International Joint Conference on Natural Lan-guage Processing of the AFNLP Volume 1 pages 235ndash243 Associa-tion for Computational Linguistics

Wang X Zhao Y and Fu G (2011) A morpheme-based method tochinese sentence-level sentiment classification In Int J of AsianLang Proc volume 21 pages 95ndash106

Warriner A B Kuperman V and Brysbaert M (2013) Norms of va-lence arousal and dominance for 13915 english lemmas In Behav-ior research methods volume 45 pages 1191ndash1207 Springer

Wawer A (2012) Extracting emotive patterns for languages with richmorphology In International Journal of Computational Linguisticsand Applications volume 3 pages 11ndash24

Wei C-P Chen Y-M Yang C-S and Yang C C (2010) Un-derstanding what concerns consumers a semantic approach toproduct feature extraction from consumer reviews In Informa-tion Systems and E-Business Management volume 8 pages 149ndash167Springer

Whissel C (1989) The dictionary of affect in language emotion The-ory research and experience vol 4 the measurement of emotionsr In Plutchik and H Kellerman Eds New York Academic

282 BIBLIOGRAPHY

Whissell C (2009) Using the revised dictionary of affect in languageto quantify the emotional undertones of samples of natural lan-guage 1 2 In Psychological reports volume 105 pages 509ndash521 Am-mons Scientific

Whitley M S (1995) Gustar and other psych verbs a problem in tran-sitivity In Hispania pages 573ndash585 JSTOR

Wiebe J (2000) Learning subjective adjectives from corpora InAAAIIAAI pages 735ndash740

Wiebe J Wilson T Bruce R Bell M and Martin M (2004) Learn-ing subjective language In Computational linguistics volume 30pages 277ndash308 MIT Press

Wiebe J Wilson T and Cardie C (2005) Annotating expressionsof opinions and emotions in language In Language resources andevaluation volume 39 pages 165ndash210 Springer

Wiegand M Balahur A Roth B Klakow D and Montoyo A (2010)A survey on the role of negation in sentiment analysis In Proceed-ings of the workshop on negation and speculation in natural lan-guage processing pages 60ndash68 Association for Computational Lin-guistics

Wilson D and Sperber D (1992) On verbal irony In Lingua vol-ume 87 pages 53ndash76 Elsevier

Wilson T Wiebe J and Hoffmann P (2005) Recognizing contex-tual polarity in phrase-level sentiment analysis In Proceedings ofthe conference on human language technology and empirical meth-ods in natural language processing pages 347ndash354 Association forComputational Linguistics

Wilson T Wiebe J and Hoffmann P (2009) Recognizing contextualpolarity An exploration of features for phrase-level sentiment anal-ysis In Computational linguistics volume 35 pages 399ndash433 MITPress

BIBLIOGRAPHY 283

Xia R and Zong C (2010) Exploring the use of word relation featuresfor sentiment classification In Proceedings of the 23rd InternationalConference on Computational Linguistics Posters pages 1336ndash1344Association for Computational Linguistics

Xiang G Fan B Wang L Hong J and Rose C (2012) Detecting of-fensive tweets via topical feature discovery over a large scale twittercorpus In Proceedings of the 21st ACM international conference onInformation and knowledge management pages 1980ndash1984 ACM

Xu Z and Zhu S (2010) Filtering offensive language in online com-munities using grammatical relations In Proceedings of the SeventhAnnual Collaboration Electronic Messaging Anti-Abuse and SpamConference

Yang S and Ko Y (2011) Extracting comparative entities and pred-icates from texts using comparative type classification In Proceed-ings of the 49th Annual Meeting of the Association for ComputationalLinguistics Human Language Technologies-Volume 1 pages 1636ndash1644 Association for Computational Linguistics

Ye Q Law R Gu B and Chen W (2011) The influence of user-generated content on traveler behavior An empirical investigationon the effects of e-word-of-mouth to hotel online bookings In Com-puters in Human Behavior volume 27 pages 634ndash639 Elsevier

Ye Q Zhang Z and Law R (2009) Sentiment classification of on-line reviews to travel destinations by supervised machine learningapproaches In Expert Systems with Applications volume 36 pages6527ndash6535 Elsevier

Yessenalina A and Cardie C (2011) Compositional matrix-spacemodels for sentiment analysis In Proceedings of the Conference onEmpirical Methods in Natural Language Processing pages 172ndash182Association for Computational Linguistics

284 BIBLIOGRAPHY

Yi J Nasukawa T Bunescu R and Niblack W (2003) Sentimentanalyzer Extracting sentiments about a given topic using naturallanguage processing techniques In Data Mining 2003 ICDM 2003Third IEEE International Conference on pages 427ndash434

Yin D Xue Z Hong L Davison B D Kontostathis A and Ed-wards L (2009) Detection of harassment on web 20 In Proceedingsof the Content Analysis in the WEB volume 2 pages 1ndash7

Yuen R W Chan T Y Lai T B Kwong O Y and Trsquosou B K(2004) Morpheme-based derivation of bipolar semantic orientationof chinese words In Proceedings of the 20th international conferenceon Computational Linguistics page 1008 Association for Computa-tional Linguistics

Zhang L and Liu B (2011) Identifying noun product features thatimply opinions In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human Language Tech-nologies short papers-Volume 2 pages 575ndash580 Association forComputational Linguistics

Zhang L Liu B Lim S H and OrsquoBrien-Strain E (2010) Extractingand ranking product features in opinion documents In Proceed-ings of the 23rd international conference on computational linguis-tics Posters pages 1462ndash1470 Association for Computational Lin-guistics

Zhao Q Sun C Liu B and Cheng Y (2010) Learning to detecthedges and their scope using crf In Proceedings of the FourteenthConference on Computational Natural Language LearningmdashSharedTask pages 100ndash105 Association for Computational Linguistics

Zhou N (2010) Taboo Language on the Internet An Analysis of Gen-der Differences in Using Taboo Language PhD thesis KristianstadUniversity
