
Twitter Sentiment Analysis for Marketing Research

Rachel Bugeja
Supervisor: Mr. Charlie Abela

Department of Intelligent Computer Systems
University of Malta

[email protected]

ABSTRACT
With the popularity of social media networks, Natural Language Processing (NLP) has faced new challenges because of the dynamic nature of the data published on these platforms. Social media networks are flooded every second with posts of different lengths and formats about news, users' opinions and observations, comments, etc. [1]. This information can be exploited by companies, especially marketing companies, to monitor and evaluate their social media hype, online marketing strategies and customer relationships. This paper focuses on sentiment analysis and attempts to classify tweets by sentiment for a marketing tool, Twitter MAT (Marketing Analysis Tool), which summarizes the public's view of products. Twitter MAT uses Natural Language technologies, in particular the Named Entity Recognition (NER) component of the GATE text processor, to perform sentiment analysis [2]. Twitter MAT also uses other techniques, such as discourse analysis (namely the effect of adverbs on adjectives), to detect sentiment polarity and its strength. The accuracy of classifying tweets by sentiment was 75% when compared both with existing, manually annotated datasets and with the results of the survey we conducted. When compared with the dataset focused on sentiment polarity strength, Twitter MAT reached 100% agreement.

1. INTRODUCTION
The growth of social media platforms and their frequent use by the public [4] have resulted in these networks being flooded with information every second. Billions of accounts across popular social media networks are used every single day to post snippets of different lengths and formats about users' news, opinions and observations, comments, etc. [6]. This data can be structured into information that organisations exploit to analyse and monitor sales and to guide future product development.

Nowadays, before buying a product, people tend to search for information and reviews about it. Through the Web, one can instantly find others' opinions, reviews and experiences related to particular products. People use social media platforms such as Facebook and Twitter to instantly update their circle of friends about new purchases and first impressions, products they use frequently, disappointing products, and products that break down. These updates, posts and even micro-conversations are important to companies since they are authentic consumer insights about a product or campaign. However, due to their huge volume, short length and noisy content, these streams pose new challenges for natural language processing and interpretation.

2. AIMS AND OBJECTIVES
The aim of this paper is to extract information from social media streams, in this case Twitter, and to interpret this data as structured information for product marketing research. This information can be used to analyze how the public is reviewing products and how their perception changes over time. Consequently, we analyze tweets that mention or discuss products that companies offer. Our aim is thus to provide the user with a market research tool that delivers instant consumer insights and a better understanding of what these insights indicate.

The objectives behind this research are the following:

• The identification and filtering of noisy data for sentiment analysis: certain data, such as duplicate tweets, will be discarded entirely, while other noisy data that may be useful to our analysis, such as tweets with various metadata (for example, hashtags or mentions), will be exploited.

• To analyse the strength of sentiment polarity by examining the effect of adverbs on adjectives, and to evaluate how adverbs intensify or weaken the sentiment polarity of a given text.

• Providing the user with a tool for querying, by topic and through keywords, particular information extracted from Twitter. The retrieved data will be displayed in a timeline showing how the public's sentiment about the topic or product chosen by the user has evolved over time. The tool will be able to receive data continuously, so that the information represented in the timeline is realistic and up to date.

• The tool will also highlight which product features stand out through a tag cloud, which is likewise updated over time. This helps the user identify what people associate with the products.

3. RELATED WORK
Previous work on opinion mining and sentiment analysis of web content, such as Yu and Hatzivassiloglou (2003) [3], Kim and Hovy (2004) [4] and Hu and Liu (2004) [5], takes different approaches; however, the keyword-based approach is the most common. In short, these approaches rely on a bag-of-words collection of sentiment-bearing words, each assigned a binary sentiment of either positive or negative. When one of these keywords appears in a phrase or paragraph, the sentiment polarity is worked out. Other studies, including Wilson et al. [6], introduce further sentiment categories such as neutral or both (a phrase that contains both negative and positive lexicons).
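As a minimal sketch of the keyword-based approach described above (not the exact method of [3], [4] or [5]), the following classifies a text by counting hits against two hypothetical word lists, and adds the "both" and "neutral" categories in the spirit of Wilson et al. [6]:

```python
import re

# Hypothetical miniature lexicon; real systems use thousands of entries.
POSITIVE = {"love", "great", "excellent", "happy"}
NEGATIVE = {"hate", "awful", "broken", "disappointing"}


def keyword_polarity(text: str) -> str:
    """Classify text as positive, negative, both or neutral by lexicon lookup."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(word in POSITIVE for word in words)
    neg = sum(word in NEGATIVE for word in words)
    if pos and neg:
        return "both"      # contains both positive and negative keywords
    if pos:
        return "positive"
    if neg:
        return "negative"
    return "neutral"       # no sentiment-bearing keyword found


print(keyword_polarity("I love my new phone"))             # positive
print(keyword_polarity("Great camera but awful battery"))  # both
```

A purely binary classifier would instead pick whichever count is larger; the extra categories avoid forcing mixed or sentiment-free text into one of two classes.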

One of the biggest challenges in NLP is noisy data. Since the idea of this thesis is to classify tweets, we were particularly interested in studies that work on these microblogging texts. In [7], an interesting approach was taken to sanitize tweets, owing to their noisy content, in order to improve the performance of the sentiment classifier. Capitalized (upper-case) words, excessive punctuation, emoticons, words that indicate laughter and Twitter-specific characters such as the mention (@) were replaced by a keyword, as they were considered sentiment intensifiers. Un-opinionated words, known as stop words in NLP, and suffixes were removed. This sanitization process was shown to improve sentiment classification.
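The sanitization step of [7] can be sketched as below. The placeholder tokens (MENTION, EMOTICON, LAUGH, EXCLAIM, CAPS), the regular expressions and the stop-word list are our own illustrative assumptions, not those used in [7], and suffix removal (stemming) is omitted for brevity:

```python
import re

# Hypothetical stop-word list and placeholder tokens, for illustration only.
STOP_WORDS = {"i", "the", "a", "an", "is", "it", "to", "and", "of", "this"}
PLACEHOLDERS = {"MENTION", "EMOTICON", "LAUGH", "EXCLAIM"}
EMOTICON = re.compile(r"[:;=]-?[()DPp]")
LAUGHTER = re.compile(r"\b(?:(?:ha){2,}h?|(?:he){2,}|lo+l)\b", re.IGNORECASE)


def sanitize(tweet: str) -> list[str]:
    """Replace sentiment intensifiers with placeholder tokens and drop stop words."""
    tweet = re.sub(r"@\w+", " MENTION ", tweet)       # Twitter mentions
    tweet = EMOTICON.sub(" EMOTICON ", tweet)          # text emoticons such as :) or ;-(
    tweet = LAUGHTER.sub(" LAUGH ", tweet)             # haha, hehe, lol, ...
    tweet = re.sub(r"([!?])\1+", " EXCLAIM ", tweet)   # runs of ! or ?
    tokens = []
    for tok in re.findall(r"[A-Za-z']+", tweet):
        if tok in PLACEHOLDERS:
            tokens.append(tok)
        elif len(tok) > 1 and tok.isupper():
            tokens.extend(["CAPS", tok.lower()])       # shouted word: flag it, keep the word
        elif tok.lower() not in STOP_WORDS:
            tokens.append(tok.lower())
    return tokens


print(sanitize("@apple I LOVE this phone!!! :) haha"))
# ['MENTION', 'CAPS', 'love', 'phone', 'EXCLAIM', 'EMOTICON', 'LAUGH']
```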

On the other hand, in [4] the system selects sentences that mention the required topic and contain opinionated keywords, calculates the polarity of each word separately and then calculates the polarity of the whole phrase. However, this required a large amount of time spent training on text in order to calculate the score for each word. Studies by Pang et al. [8] showed that using keywords to determine the polarity of a text yields 60% accuracy when compared to manually annotated texts.

3.1 Sarcasm
In [9], a study was conducted specifically on tweets and Amazon reviews. The Semi-Supervised Sarcasm Identification algorithm (SASI) was used to identify sarcastic patterns and to classify tweets by their probability of being sarcastic. To train the classifier, tweets with the hashtag #sarcasm were used, since these were considered the most likely to be sarcastic, having been explicitly marked as such by the user.

However, tweets containing the hashtag were found to be biased and too noisy. This was due to the hashtag being added to non-sarcastic tweets (mainly because the user does not fully understand what is meant by sarcasm) and to its use when talking about another tweet, document or external entity, such as: "I love it when #sarcasm is used in TV shows, there's always someone who doesn't get it". Other tweets were impossible to classify as sarcastic without the explicit tag. It is also important to point out that only 4.09% of tweets are explicitly tagged with this hashtag [10] (125 tweets out of 3.3 million), which shows that either users do not use the hashtag to mark their tweets or sarcasm is not used regularly.

The biggest problem with detecting sarcasm in microblogging was identified in [11], which stated that tweets are not accurately labelled. Again, hashtags play an important role in the classification of sarcastic tweets. Several sarcasm-related hashtags (#sarcasm, #sarcastic) were used to collect tweets, and only tweets with these hashtags at the very end were used. This was based on the finding of the earlier study [9] that tweets with #sarcasm were too noisy and biased, whereas a marker at the end is more likely to indicate that a tweet is sarcastic than the use of sarcasm as a noun within a tweet. A further manual inspection was conducted to eliminate tweets in which sarcasm was the main subject rather than a marker. Manual inspection was also used to compare the system's classification of sarcastic tweets with human classification. With only 50% agreement between the human judges themselves, this study shows how difficult it is to detect sarcasm in text.
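The collection rule of [11], keeping only tweets whose sarcasm hashtag appears at the very end, can be sketched as follows; the manual inspection that followed cannot be automated this way. The function name and the handling of trailing punctuation are our own assumptions:

```python
SARCASM_TAGS = {"#sarcasm", "#sarcastic"}


def ends_with_sarcasm_marker(text: str) -> bool:
    """True if the tweet's final token is a sarcasm hashtag (trailing punctuation ignored)."""
    tokens = text.lower().split()
    return bool(tokens) and tokens[-1].rstrip(".!?") in SARCASM_TAGS


print(ends_with_sarcasm_marker("Another Monday, can't wait #sarcasm"))           # True
print(ends_with_sarcasm_marker("I love it when #sarcasm is used in TV shows"))  # False
```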

3.2 Strength in Sentiment Polarity
Although several studies propose different methodologies, an unusual but interesting approach was taken in [12], where adverbs and adjectives were studied to show how they affect the sentiment of a sentence. Although other studies [13] [14] analysed the use of adjectives to determine the sentiment of a text, this study was the first to take adverbs into consideration as well, in particular adverbs of degree. These adverbs affect sentiment polarity and can even reverse it. A method for assigning a score to each adverb, together with a set of axiomatic rules, was defined to capture the relationship between adverbs and adjectives. The results show that this approach reaches a high level of precision in sentiment classification.

4. DESIGN
The main purpose of Twitter MAT was to build a system that gathers tweets from Twitter and classifies them according to their sentiment group, while displaying them in a timeline so as to show how that sentiment changes over time. This section discusses the design issues and decisions taken in modelling this system.

Twitter MAT includes two main components: the User Interface, in the form of a Web application, and the Classification Module. The component diagram in Figure 1 shows how the various components interact with each other. The Classification Module handles all processing of the tweets, from data gathering to data storage, including the algorithms that filter noise in tweets, annotate sentiment and compute scores. The web application, on the other hand, allows the user to access this data, controls how results are displayed and gives extra information for interpreting the data.

The Twitter module is the intermediary between the web application and the Annotation module. Essentially, it is responsible for fetching tweets, checking for existing tweets grouped by the same keyword, and filtering. Tweets are retrieved through the Twitter API, which handles all interaction between a developer and the Twitter service. Given a query, the API returns tweets in JSON format, including information about each tweet such as the author, date of publication, number of retweets, location, any URLs included and several other details.

The Annotation module processes the tweets in order to calculate the score, which determines the tweet's sentiment and its strength. Once a tweet enters the Annotation module, it is first checked for abbreviations or acronyms, which are common in tweets given their 140-character limit. Any abbreviations and acronyms found are expanded to their full, original meaning. This ensures that each word can be understood by GATE while annotating the content.
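A minimal sketch of the abbreviation-expansion step is shown below. The four entries are hypothetical examples; Twitter MAT used the lists shipped with GATE's Twitter plugin, which are not reproduced here:

```python
import re

# Hypothetical excerpt of an abbreviation list (abbreviation -> full form).
ABBREVIATIONS = {
    "gr8": "great",
    "imo": "in my opinion",
    "idk": "I don't know",
    "thx": "thanks",
}


def expand_abbreviations(text: str) -> str:
    """Replace every word that appears in ABBREVIATIONS with its full form."""
    def expand(match: re.Match) -> str:
        word = match.group(0)
        return ABBREVIATIONS.get(word.lower(), word)   # unknown words are left untouched

    return re.sub(r"\b\w+\b", expand, text)


print(expand_abbreviations("imo this phone is gr8, thx!"))
# in my opinion this phone is great, thanks!
```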

Each tweet is then processed through GATE to annotate its contents. Annotation is done using ANNIE and the POS tagger provided. Although the POS tagger can annotate many different kinds of words and entities, it was necessary to add new annotations for the purpose of this thesis. The following annotations were added: Sentiment Words, which are divided into positive and negative words, and Adverbs of Degree.

Once GATE annotates the content, each annotation is parsed and evaluated, and a score is given to the tweet depending on the types of annotation found. The scoring system increases the score when a positive word is found and decreases it when a negative word is found. The same applies to the adverbs of degree mentioned above. Depending on the score, the tweets are then classified as Highly Positive, Positive, Negative or Highly Negative. They are stored under the appropriate topic by adding them to the existing list, if any. If the tweet being stored already exists, this stage is skipped; this check is needed because filtering out retweets does not prevent the Twitter API from returning the same tweet twice. Moreover, if no annotations are found in the resulting output, the tweet is not stored. Although previous work on sentiment analysis usually puts such tweets in a "neutral" category [6], they do not express any feedback or perception of the product and are therefore treated as noisy data in this paper.
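The paper does not state the score thresholds separating the four classes, so the sketch below assumes, purely for illustration, that a score of magnitude 2 or more counts as "highly" positive or negative. It also assumes that a net score of zero is dropped in the same way as a tweet with no annotations:

```python
from typing import Optional

HIGH_THRESHOLD = 2  # hypothetical cut-off between e.g. Positive and Highly Positive


def sentiment_class(score: int, n_annotations: int) -> Optional[str]:
    """Map a tweet's score to a Twitter MAT class, or None if it should not be stored."""
    if n_annotations == 0 or score == 0:
        return None   # no sentiment annotations (or, by our assumption, a net score of 0)
    if score >= HIGH_THRESHOLD:
        return "Highly Positive"
    if score > 0:
        return "Positive"
    if score <= -HIGH_THRESHOLD:
        return "Highly Negative"
    return "Negative"
```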

As soon as the new tweets are stored, each tweet is checked for keywords, which are then displayed in a tag cloud. Every word in the tweet is checked to see whether it is related to the topic or commonly found in tweets about the same topic. This procedure is applied to the new tweets so that the keywords shown are up to date and reflect new trends. Keywords are extracted by a simple algorithm of our own, which first eliminates all stop words. These include adverbs, conjunctions, modal verbs and common nouns, amongst others. The remaining words are then checked for whether they appear in other tweets; a word found in at least three tweets from the same set, i.e. from the same query or topic, is considered a keyword.
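The keyword-extraction rule can be sketched as follows. The stop-word list is a hypothetical stand-in for the full list of adverbs, conjunctions, modal verbs and common nouns, and the threshold of three tweets is exposed as a parameter (Section 5.1 phrases it as "at least three other tweets"):

```python
import re
from collections import Counter

# Hypothetical stop-word list (the real one covers adverbs, conjunctions,
# modal verbs and common nouns, amongst others).
STOP_WORDS = {"the", "a", "and", "but", "is", "my", "i", "it", "can", "will",
              "very", "really", "so", "this", "of"}


def extract_keywords(tweets: list[str], min_tweets: int = 3) -> dict[str, int]:
    """Return {keyword: weight}, where weight is the number of tweets containing the word."""
    weights = Counter()
    for tweet in tweets:
        words = set(re.findall(r"[a-z0-9']+", tweet.lower())) - STOP_WORDS
        weights.update(words)   # a word counts at most once per tweet
    return {word: n for word, n in weights.items() if n >= min_tweets}


tweets = ["The battery is great", "battery dies fast",
          "love the battery and camera", "camera is great"]
print(extract_keywords(tweets))   # {'battery': 3}
```

Counting each word at most once per tweet keeps a single repetitive tweet from pushing a word into the tag cloud on its own.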

Figure 1: Top-Level Design

The database is a crucial part of this system. Since the Twitter API only returns tweets less than a week old, it is important to store data continuously in order to have a longer time span to show on the timeline. Moreover, this gives the user the ability to search for the previously perceived sentiment of certain products and to analyse how this perception is changing based on the results shown in the timeline. The database is a relational database used to store the actual tweet, the score given by the classification algorithm, the number of retweets and the date of publication.
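The paper does not name the database engine or its schema. As a sketch, assuming SQLite through Python's built-in sqlite3 module and hypothetical table and column names, the stored fields and the "skip if it already exists" rule could look like this:

```python
import sqlite3

# Hypothetical schema holding the fields listed above, plus an id and topic.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tweets (
    tweet_id      INTEGER NOT NULL,
    topic         TEXT    NOT NULL,  -- query keyword the tweet was fetched for
    content       TEXT    NOT NULL,  -- the actual tweet
    score         INTEGER NOT NULL,  -- result of the classification algorithm
    retweet_count INTEGER NOT NULL DEFAULT 0,
    published_at  TEXT    NOT NULL,  -- ISO-8601 publication date
    PRIMARY KEY (tweet_id, topic)    -- a tweet is stored at most once per topic
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def store_tweet(conn: sqlite3.Connection, tweet_id: int, topic: str, content: str,
                score: int, retweets: int, published_at: str) -> bool:
    """Insert a tweet; return False (and change nothing) if it is already stored."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO tweets VALUES (?, ?, ?, ?, ?, ?)",
        (tweet_id, topic, content, score, retweets, published_at),
    )
    conn.commit()
    return cur.rowcount == 1


conn = open_store()
print(store_tweet(conn, 1, "iphone", "Loving the new iPhone", 1, 5, "2014-05-01"))  # True
print(store_tweet(conn, 1, "iphone", "Loving the new iPhone", 1, 5, "2014-05-01"))  # False
```

Letting the primary key reject duplicates moves the "already exists" check into the database instead of a separate lookup before each insert.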

5. IMPLEMENTATION
Following the Design section, this section explains the process of building the system and discusses in further detail how each module, the Classification module and the Web Application module, was implemented based on that design. We also elaborate on the techniques, frameworks and libraries used during development, how they were integrated, and the API used to interact with Twitter. Figure 2 shows a top-level diagram of the system, showing both modules and highlighting the technologies used to develop it.

Figure 2: Top-Level Design with Technologies Used

5.1 Classification of Tweets
To retrieve tweets from Twitter, we used the search functionality provided by twitter4j. The number of tweets returned depends on how many tweets containing the query keyword are available from the 7 days preceding the day being queried. The search also returns duplicate tweets when only a limited number of tweets is available. Once the search is done, the system loops through all the returned tweets and filters out those it should not process. Filtering is based on the following characteristics:

• The tweet must be in English

• The tweet must contain the keyword specified in the query, since the API also returns noisy and unrelated data.

• The tweet must not be a retweet. Processing retweets can lead to duplicate data, since it is highly probable that the original tweet is also retrieved. Instead of storing a repeated tweet, we use the retweet count of the original tweet, so that it is reflected in the percentage of tweets found for each sentiment group.

• The tweet must not be a reply to someone else's tweet, due to the possibility of propaganda.

• The tweet is checked for any sarcasm indicator. Such indicators include the explicit hashtags #sarcasm and #not. The rest of the indicators can be found in [10].
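The filtering criteria above can be sketched as two predicates. Twitter MAT worked with twitter4j objects; here, as an assumption, each tweet is a plain dictionary using field names from the Twitter REST API v1.1 tweet JSON (lang, text, retweeted_status, in_reply_to_status_id), and only the two sarcasm indicators named above are included:

```python
SARCASM_INDICATORS = {"#sarcasm", "#not"}  # the remaining indicators are listed in [10]


def passes_filter(tweet: dict, keyword: str) -> bool:
    """Apply the filtering criteria to one tweet given as a v1.1-style JSON dict."""
    return (
        tweet.get("lang") == "en"                               # must be in English
        and keyword.lower() in tweet.get("text", "").lower()    # must contain the keyword
        and "retweeted_status" not in tweet                     # must not be a retweet
        and tweet.get("in_reply_to_status_id") is None          # must not be a reply
    )


def has_sarcasm_indicator(tweet: dict) -> bool:
    """Flag tweets with an explicit sarcasm hashtag; their score is reversed later."""
    words = tweet.get("text", "").lower().split()
    return any(word.strip(".,!?") in SARCASM_INDICATORS for word in words)
```

Note that the sarcasm check only flags a tweet rather than rejecting it, matching the later step in which the score of a flagged tweet is reversed.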

The tweets that pass this filtering process are then passed to the Annotation module and through GATE. By splitting the content into separate words, the system identifies any abbreviations from a given list¹. If an abbreviation is found, the abbreviated word or phrase is replaced by its full form based on a corresponding list. Both lists were extracted from the Twitter plugin provided by GATE. Once every word has been analysed, the whole tweet is passed on as a document to a corpus. The system then has GATE annotate the document's content, including the user-added annotations, loops through each annotation to check its type, and increases or decreases the score depending on the type found. The axioms below show how scores are assigned.

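Let AFF, DOUBT, WEAK and STRONG be the sets of adverbs of affirmation, adverbs of doubt, adverbs of weak intensity and adverbs of strong intensity respectively, and let POS and NEG be the sets of positive and negative adjectives. For a tweet T, where an adverb adv is applied to (i.e. immediately precedes) an adjective adj:

$$\exists\, adv \in \mathrm{AFF} \cup \mathrm{STRONG} \cup \mathrm{DOUBT} \text{ applied to } adj \in \mathrm{POS} \implies \mathrm{Score}(T) \leftarrow \mathrm{Score}(T) + 2$$

and

$$\big(\exists\, adv \in \mathrm{DOUBT} \cup \mathrm{WEAK} \cup \mathrm{STRONG} \cup \mathrm{AFF} \text{ applied to } adj \in \mathrm{NEG}\big) \lor \big(\exists\, adv \in \mathrm{WEAK} \text{ applied to } adj \in \mathrm{POS}\big) \implies \mathrm{Score}(T) \leftarrow \mathrm{Score}(T) - 2$$

¹ http://gate.ac.uk/wiki/twitter-postagger.html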

The user-added annotations were created using JAPE (Java Annotation Patterns Engine). Once we had created the gazetteer list files, it was necessary to add an XML schema for each annotation. Then, using JAPE, we specified how these words should be annotated. Adverbs of Degree comprise four different types:

1. Adverbs of affirmation, such as absolutely, certainly, exactly, totally and so on.

2. Adverbs of doubt, such as possibly, roughly, apparently, seemingly and so on.

3. Strong intensifying adverbs, such as astronomically, exceedingly, extremely, immensely and so on.

4. Weak intensifying adverbs, such as barely, scarcely, weakly, slightly and so on.

Since adverbs are usually found in front of adjectives, our JAPE rule specifies this adverb-adjective sequence as the pattern to annotate. To create an annotation, GATE pattern-matches the documents in the corpus against the JAPE rules. If a match is found, a new XML tag is created surrounding the matched content; the tag's contents depend on what has been specified in the JAPE rule.

Note that a sentiment-bearing word found without any adverb in front of it still affects the score. When all annotations have been parsed and scoring is finished, the tweet is ready to be stored. However, if no annotations are found, the tweet is discarded. The score is reversed if any sarcasm indicator was found. Meanwhile, the system also checks whether the tweet to be stored contains any keywords for our tag cloud. This is done by checking whether each word is a stop word (such as a conjunction or common verb). If it is not, it is added to a HashMap whose key is the keyword and whose value is its weight, i.e. the number of times the word has appeared in incoming tweets for that query. For a keyword to be shown, it must be present in at least three other tweets.
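Putting together the axioms, the adverb-before-adjective JAPE pattern and the rules in the paragraph above, a token-level sketch of the scoring might look as follows. The adverb sets use the examples listed above; the adjective sets are hypothetical, and the weight of ±1 for a sentiment word without an adverb is our assumption (it is consistent with the score of -1 reported for a single negative word in Section 6). The sketch operates on plain tokens rather than GATE annotations:

```python
from typing import Optional

# Adverbs of degree (examples from the lists above).
AFF = {"absolutely", "certainly", "exactly", "totally"}
DOUBT = {"possibly", "roughly", "apparently", "seemingly"}
STRONG = {"astronomically", "exceedingly", "extremely", "immensely"}
WEAK = {"barely", "scarcely", "weakly", "slightly"}
ADVERBS = AFF | DOUBT | STRONG | WEAK

# Hypothetical sentiment-word lists.
POS = {"good", "great", "fast", "reliable"}
NEG = {"bad", "slow", "awful", "buggy"}


def score_tweet(tokens: list[str], sarcastic: bool = False) -> Optional[int]:
    """Score lower-cased tokens; return None when no sentiment annotation is found."""
    score, annotations, i = 0, 0, 0
    while i < len(tokens):
        word = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if word in ADVERBS and (nxt in POS or nxt in NEG):
            # Adverb of degree immediately followed by an adjective: apply the axioms.
            if nxt in POS and word not in WEAK:
                score += 2    # AFF, STRONG or DOUBT + positive adjective
            else:
                score -= 2    # any adverb + negative adjective, or WEAK + positive
            annotations += 1
            i += 2
            continue
        if word in POS:       # sentiment word without an adverb in front
            score += 1
            annotations += 1
        elif word in NEG:
            score -= 1
            annotations += 1
        i += 1
    if annotations == 0:
        return None           # no annotations: the tweet is discarded
    return -score if sarcastic else score   # a sarcasm indicator reverses the score


print(score_tweet("the battery is extremely good".split()))   # 2
```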

5.2 Web Application
The web application was developed to let the user search for new data and display the results found. We made use of Google Visualization (Google Charts)² for our timeline and Tag Canvas³ for our tag cloud. Through the web application, the user is able to register for the Twitter MAT services and query new keywords or topics. The results are then displayed on the timeline, which shows how the sentiment of the retrieved tweets changes over the selected time span. The user is also able to view the percentage of tweets found for each sentiment class, view negative and positive tweets separately, and see the tag cloud showing the keywords found in the given tweets. Figure 3 shows the main focus of the web application, the Timeline.

² https://developers.google.com/chart/interactive/docs/reference
³ http://www.goat1000.com/tagcanvas.php

Figure 3: Twitter MAT: Timeline
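Because retweets are not stored (Section 5.1), the per-class percentages on the results page have to fold in each tweet's retweet count. A minimal sketch, assuming each stored tweet is weighted by one plus its retweet count (our reading of the text, not a confirmed formula):

```python
from collections import Counter


def class_percentages(tweets: list[tuple[str, int]]) -> dict[str, float]:
    """tweets: (sentiment_class, retweet_count) pairs for the stored tweets of one query.

    Each stored tweet contributes 1 + its retweet count, so retweets are
    reflected in the percentages without being stored as separate rows.
    """
    weights = Counter()
    for label, retweets in tweets:
        weights[label] += 1 + retweets
    total = sum(weights.values())
    if total == 0:
        return {}
    return {label: 100 * w / total for label, w in weights.items()}


print(class_percentages([("Positive", 1), ("Negative", 0), ("Highly Positive", 0)]))
# {'Positive': 50.0, 'Negative': 25.0, 'Highly Positive': 25.0}
```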

6. EVALUATION AND RESULTS
The accuracy of a sentiment analysis system is based on how well it agrees with human assessment. Unlike spoken language, written text carries no body language, so extracting sentiment from it has always been a challenge. Since different people rate sentiment in text differently, evaluating this type of analysis is an even harder task. Most studies on sentiment analysis [15] [16] [17] [18] base their evaluation on surveys completed by different individuals, in order to capture a wide range of answers. To interpret the results of these surveys, the average mark is usually taken so as to reflect the general answer given by the respondents.

We did the same with our survey, in which 20 individuals were given different tweets and asked to assign a sentiment to each. As in any other sentiment analysis study, the respondents' opinions varied considerably, which shows how difficult it is to determine the exact sentiment of written text. However, most answers did match up. Comparing the survey results with the scores given by Twitter MAT, we obtained a total agreement of 75%.
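The agreement figure is a simple proportion. As an illustration, assuming each respondent rated a tweet on a hypothetical numeric scale (for example -2 to +2) and that a tweet counts as agreeing when the polarity of its mean rating matches the polarity of Twitter MAT's score, it could be computed like this:

```python
from statistics import mean


def sign(x: float) -> int:
    """Return 1, -1 or 0 according to the sign of x."""
    return (x > 0) - (x < 0)


def percent_agreement(survey_ratings: list[list[int]], system_scores: list[int]) -> float:
    """survey_ratings[i] holds every respondent's rating of tweet i;
    system_scores[i] is Twitter MAT's score for the same tweet."""
    if len(survey_ratings) != len(system_scores) or not system_scores:
        raise ValueError("need the same, non-zero number of rated and scored tweets")
    matches = sum(
        sign(mean(ratings)) == sign(score)   # average the survey, then compare polarity
        for ratings, score in zip(survey_ratings, system_scores)
    )
    return 100 * matches / len(system_scores)
```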

We also compared the system's results with publicly available datasets. The datasets chosen were manually annotated, and each was used for a different reason.

1. Stanford Twitter Sentiment Test Set
This corpus, known as Sentiment140⁴, consists of over a million tweets labelled as positive or negative. The results are shown in Figure 4, where results that did not match are shown in red. In both mismatched instances, the tweet is more of a statement than an expression of sentiment, which may be why the results did not agree. In the first tweet, words such as "die" or "blasting" are considered negative by the application, whereas the human annotators knew that the tweet refers to movies.

⁴ http://help.sentiment140.com

Figure 4: Sentiment140 Dataset vs. Twitter MAT Results

2. Sentiment Strength Twitter Dataset
This dataset⁵ focuses on the strength of the sentiment (as does our approach), with each tweet manually annotated with numbers representing sentiment strength. Two scores are given: a positive score and a negative score. Positive sentiment strength ranges from 1 (not positive) to 5 (extremely positive), and negative sentiment strength from -1 (not negative) to -5 (extremely negative). For example, the third tweet has a negative score of -4 in the Sentiment Strength Dataset and a score of -1 from our application. Both scores indicate the same (negative) polarity: a score of -4 means that a highly negative word was found, while our score of -1 means that our application found a negative word and decreased the score (since we do not keep separate scores for positive and negative words).

Figure 5: Sentiment Strength Dataset vs. Twitter MAT Results

3. STS-Gold Dataset
This dataset was also constructed for evaluating Twitter sentiment analysis systems. What distinguishes it from the other datasets is that the positive or negative annotations were made not only at tweet level but also at entity level. Each entity mentioned in a tweet was annotated separately based on the adjectives and adverbs found, and an overall label was given to the whole tweet based on the majority of the entity-level labels.

⁵ http://sentistrength.wlv.ac.uk/documentation

Figure 6: STS-Gold Dataset vs. Twitter MAT Results
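Since the Sentiment Strength dataset gives each tweet two scores while Twitter MAT keeps a single signed score, the two can only be compared on polarity. One possible rule, which is our assumption rather than the paper's stated procedure, takes whichever of the two strengths is larger:

```python
def strength_polarity(pos: int, neg: int) -> int:
    """Collapse a (pos in 1..5, neg in -1..-5) pair into +1, -1 or 0."""
    if not (1 <= pos <= 5 and -5 <= neg <= -1):
        raise ValueError("expected pos in 1..5 and neg in -1..-5")
    if pos > -neg:
        return 1      # positive strength dominates
    if -neg > pos:
        return -1     # negative strength dominates
    return 0          # equal strengths: no clear polarity


def agrees(pos: int, neg: int, mat_score: int) -> bool:
    """Twitter MAT keeps one signed score, so only polarities are compared."""
    return strength_polarity(pos, neg) == (mat_score > 0) - (mat_score < 0)


print(agrees(1, -4, -1))   # True: both sides say "negative"
```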

The comparison with the manually annotated datasets also showed a high percentage of agreement. With the first dataset, Sentiment140, we had 75% agreement, as shown in Figure 4, while with the Sentiment Strength Dataset we had 100% agreement, as shown in Figure 5. This suggests that using adverbs to capture the strength of polarity did improve our sentiment analysis. For the last dataset, STS-Gold, we again reached 75% agreement, as shown in Figure 6.

As part of our evaluation, we also obtained feedback from a lecturer in the Marketing department at the University of Malta, Mr E. Said. After a brief demonstration of the web application and its main features, we discussed whether similar tools are useful for the marketing industry. The following are some points which Mr Said mentioned:

• Twitter MAT is very simple to use, which appeals to the many marketing managers who are not technically proficient.

• The application is very helpful both for marketing companies and for companies using the tool directly to evaluate customer insights.

• The timeline and the zoom feature are very useful for analysing a particular time frame in more depth.

• Moreover, the tag cloud, which has recently become very popular in the marketing industry in Malta, can help the user pinpoint new trends, errors, defects and articles about the products which may have gone viral, all of which are very important for any marketing strategy.

• Although similar tools are used every day in marketing companies, these are not linked to any social media platforms, which makes Twitter MAT innovative and useful for analysing social media hype.

7. CONCLUSION AND FUTURE WORK
There are several issues which could be addressed to improve our sentiment analysis.

• The use of emoticons is more popular than ever with the introduction of emoji keyboard apps and the emoticons offered by the default keyboards of new smartphones. Although the studies in [19] and [20] explore how text emoticons (emoticons made up of punctuation marks) affect the sentiment of a tweet, we now face the new challenge of image emoticons. We could therefore categorize these images by sentiment and adjust our score whenever such emoticons are found. Currently, the Twitter API does not cater for such emoticons, as they are listed as image URLs.

• Another feature we could add to our sentiment analysis component is to examine users' publishing patterns in order to identify any form of propaganda [21]. This can be done by storing the publisher's information while receiving tweets. In doing so, we could eliminate spam published by propagandists by analyzing how frequently they retweet certain tweets about particular topics or publish repetitive content.

• We could also improve the overview of the results and the timeline provided by the application. One feature we could consider is identifying events by analyzing sudden increases in tweets about certain topics or trending keywords. Such events could include product launches, new updates, articles about the products or company, new publicity, etc. These events could then be displayed to the user to suggest what might have affected the sentiment polarity of the tweets, or which event might have triggered a fluctuation in the hype or in the number of tweets published in a time span.
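The first future-work point, adjusting the score when emoticons are found, could look roughly like the sketch below. It assumes emoji reach the system as Unicode characters rather than as image URLs, and the lexicon, its scores and the function names are hypothetical illustrations, not part of Twitter MAT.

```python
# Hypothetical emoticon lexicon; scores are illustrative only.
EMOTICON_SCORES = {
    ":)": 1.0, ":-)": 1.0, ":D": 1.5, ";-)": 0.5,
    ":(": -1.0, ":-(": -1.0,
    "\U0001F600": 1.5,   # grinning face emoji
    "\U0001F622": -1.5,  # crying face emoji
}


def emoticon_adjustment(text, lexicon=EMOTICON_SCORES):
    """Sum the lexicon score of every emoticon occurrence in the text.

    Uses plain substring counting, so it assumes no lexicon key is a
    substring of another key (true for the entries above).
    """
    return sum(score * text.count(symbol) for symbol, score in lexicon.items())


def adjusted_score(text, base_score):
    """Add the emoticon adjustment to a score computed from the words alone."""
    return base_score + emoticon_adjustment(text)


print(adjusted_score("Loving my new phone :) \U0001F600", 2.0))  # 4.5
```

Substring counting works even when an emoji is attached to a word without a space, which is common in tweets.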
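A very simple version of the publishing-pattern check in the second point is sketched below. It only counts how often each user repeats identical content, which is far cruder than the measures studied in [21]. The (user, text) input format, the `max_repeats` threshold and the function name are hypothetical.

```python
from collections import Counter, defaultdict


def flag_repetitive_users(tweets, max_repeats=3):
    """Return the set of users who published the same text more than max_repeats times.

    tweets: iterable of (user, text) pairs. Text is normalised by stripping
    whitespace and lowercasing, so repeated copies of identical content collide.
    """
    per_user = defaultdict(Counter)  # user -> Counter of normalised texts
    for user, text in tweets:
        per_user[user][text.strip().lower()] += 1
    return {user for user, texts in per_user.items() if max(texts.values()) > max_repeats}


# Illustrative stream: user "a" posts the same promotional text five times.
stream = [("a", "Buy X now!")] * 5 + [("b", "I like X"), ("b", "X broke today")]
print(flag_repetitive_users(stream))  # {'a'}
```

Tweets from flagged users could then be excluded before scoring, so that one propagandist cannot dominate the polarity of a topic.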
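Detecting the sudden increases mentioned in the last point could start from per-interval tweet counts. The sketch below uses a simple moving-baseline rule: an interval is flagged when its count exceeds the mean of the preceding intervals by more than k standard deviations. The window size, k, the floor on the standard deviation and the function name are hypothetical choices, not part of Twitter MAT.

```python
from statistics import mean, pstdev


def detect_spikes(counts, window=6, k=2.0):
    """Return indices of time intervals whose tweet count is unusually high.

    counts: tweet counts per interval (e.g. per hour), in time order.
    An interval is flagged when its count exceeds the mean of the previous
    `window` intervals by more than k standard deviations.
    """
    spikes = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu, sigma = mean(history), pstdev(history)
        # Floor sigma at 1.0 so a perfectly flat history does not flag tiny changes.
        if counts[i] > mu + k * max(sigma, 1.0):
            spikes.append(i)
    return spikes


hourly = [10, 12, 11, 9, 10, 11, 40, 12, 11]
print(detect_spikes(hourly))  # [6]
```

The flagged intervals are the points on the timeline where an event such as a product launch or a viral article could be suggested to the user.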

From our research, we identified several forms of noise in tweets, which we reduced using filters such as removing tweets that are replies to other tweets, tweets with no sentiment, and URLs. Instead of treating abbreviations and acronyms as noise or stop words, we expanded them to their full meaning, since they can also indicate sentiment. Moreover, we used adverbs of degree, which may affect the strength of sentiment of adjectives and verbs: for each adverb found, we adjusted our score to reflect that strength. The results of our evaluation indicate that this was effective. Although, as discussed, there are several features we could add to improve our application, we obtained satisfactory results. Our discussion with a local expert in the field also indicates that there is a need for such tools in an era where social media is becoming an important part of our lives.
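The filtering and abbreviation-expansion steps described above can be sketched as follows. This is a minimal illustration, assuming tweets arrive as dictionaries shaped like Twitter API v1.1 Tweet objects (where `in_reply_to_status_id` is non-null for replies) and using a small hypothetical abbreviation dictionary; Twitter MAT's own filters and word lists are not reproduced here. Removing tweets with no sentiment would happen after scoring and is not shown.

```python
import re

# Hypothetical abbreviation dictionary; Twitter MAT's actual list is not shown.
ABBREVIATIONS = {"luv": "love", "gr8": "great", "omg": "oh my god", "imo": "in my opinion"}

URL_RE = re.compile(r"https?://\S+")


def preprocess(tweet):
    """Return cleaned tweet text, or None if the tweet should be discarded."""
    # Discard replies to other tweets.
    if tweet.get("in_reply_to_status_id") is not None:
        return None
    # Strip URLs, which carry no sentiment on their own.
    text = URL_RE.sub("", tweet["text"])
    # Expand abbreviations instead of discarding them (whitespace tokens only).
    words = [ABBREVIATIONS.get(word.lower(), word) for word in text.split()]
    return " ".join(words)


print(preprocess({"text": "omg this phone is gr8 https://t.co/abc",
                  "in_reply_to_status_id": None}))  # oh my god this phone is great
```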
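The adverb-of-degree adjustment can be illustrated with a simple token-based scorer. This is a sketch under stated assumptions, not the GATE-based pipeline Twitter MAT uses: the polarity lexicon, the adverb multipliers and the rule that only the immediately preceding word counts as a modifier are all hypothetical.

```python
# Hypothetical lexicons with illustrative values (adjectives and verbs).
POLARITY = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0,
            "love": 1.0, "hate": -1.0}
DEGREE_ADVERBS = {"very": 1.5, "extremely": 2.0, "slightly": 0.5, "barely": 0.25}


def score_tweet(text):
    """Sum word polarities, scaling each by the degree adverb directly before it."""
    tokens = text.lower().split()
    total = 0.0
    for i, token in enumerate(tokens):
        polarity = POLARITY.get(token)
        if polarity is None:
            continue
        # An adverb of degree strengthens or weakens the word it modifies.
        if i > 0 and tokens[i - 1] in DEGREE_ADVERBS:
            polarity *= DEGREE_ADVERBS[tokens[i - 1]]
        total += polarity
    return total


print(score_tweet("The screen is very good but the battery is slightly bad"))  # 1.0
```

Here "very good" contributes 1.5 and "slightly bad" contributes -0.5, so the adverbs change both the strength and, in mixed tweets, the overall balance of the score.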

8. REFERENCES

[1] D. Maynard, K. Bontcheva, and D. Rout, “Challenges in developing opinion mining tools for social media,” University of Sheffield, 2012.

[2] D. Maynard, V. Tablan, C. Ursu, H. Cunningham, and Y. Wilks, “Named Entity Recognition from Diverse Text Types,” in Proceedings of the Recent Advances in Natural Language Processing 2001 Conference, 2001, pp. 257–274.

[3] H. Yu and V. Hatzivassiloglou, “Towards answering opinion questions: Separating facts from opinions and identifying the polarity of opinion sentences,” EMNLP-2003, 2003.

[4] S.-M. Kim and E. Hovy, “Determining the sentiment of opinions,” Coling 2004, 2004.

[5] M. Hu and B. Liu, “Mining and summarizing customer reviews,” KDD-2004, 2004.



[6] J. Wiebe, T. Wilson, and C. Cardie, “Annotating expressions of opinions and emotions in language,” Language Resources and Evaluation (Formerly Computers and the Humanities), 2005.

[7] M. Cohen, P. Damiani, S. Durandeu, R. Navas, H. Merlino, and E. Fernández, “Sentiment analysis in microblogging: A practical implementation,” University of Buenos Aires, 2011.

[8] B. Pang, L. Lee, and S. Vaithyanathan, “Thumbs up? Sentiment classification using machine learning techniques,” Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 79–86, 2002.

[9] D. Davidov, O. Tsur, and A. Rappoport, “Semi-supervised recognition of sarcastic sentences in Twitter and Amazon,” in Proceedings of the Fourteenth Conference on Computational Natural Language Learning, 2010, pp. 107–116.

[10] C. Liebrecht, F. Kunneman, and A. van den Bosch, “The perfect solution for detecting sarcasm in tweets #not,” Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pp. 29–37, 2013.

[11] R. González-Ibáñez, S. Muresan, and N. Wacholder, “Identifying sarcasm in Twitter: A closer look,” Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, pp. 581–586, 2011.

[12] F. Benamara, C. Cesarano, and D. Reforgiato, “Sentiment analysis: Adjectives and adverbs are better than adjectives alone,” 2007.

[13] P. D. Turney and M. L. Littman, “Measuring praise and criticism: Inference of semantic orientation from association,” 2003.

[14] D. Borth, R. Ji, T. Chen, T. Breuel, and S.-F. Chang, “Large-scale visual sentiment ontology and detectors using adjective noun pairs,” 2013.

[15] M. Thelwall, K. Buckley, G. Paltoglou, and D. Cai, “Sentiment strength detection in short informal text,” Journal of the American Society for Information Science and Technology, 2010.

[16] H. Saif, M. Fernandez, Y. He, and H. Alani, “Evaluation datasets for Twitter sentiment analysis: A survey and a new dataset, the STS-Gold,” 2013.

[17] A. Go, R. Bhayani, and L. Huang, “Twitter sentiment classification using distant supervision,” Processing, 2009. [Online]. Available: http://www.stanford.edu/~alecmgo/papers/TwitterDistantSupervision09.pdf

[18] E. Kouloumpis, T. Wilson, and J. Moore, “Twitter sentiment analysis: The good the bad and the OMG!” The AAAI Press, 2011. [Online]. Available: http://dblp.uni-trier.de/db/conf/icwsm/icwsm2011.html#KouloumpisWM11

[19] J. Zhao, L. Dong, J. Wu, and K. Xu, “MoodLens: An emoticon-based sentiment analysis system for Chinese tweets,” in KDD. ACM, 2012, pp. 1528–1531.

[20] S. L. Rojas, U. Kirschenmann, and M. Wolpers, “We have no feelings, we have emoticons ;-),” in ICALT. IEEE, 2012, pp. 642–646.

[21] C. Lumezanu, N. Feamster, and H. Klein, “#bias: Measuring the tweeting behavior of propagandists,” 2012.
