Twitter Sentiment Analysis for Marketing Research
Rachel Bugeja
Supervisor: Mr. Charlie Abela
Department of Intelligent Computer Systems, University of Malta
ABSTRACT
With the popularity of social media networks, Natural Language Processing (NLP) faces new challenges because of the dynamic nature of the data published on these platforms. Social media networks are flooded with posts every second, of different lengths and formats, about news, users' opinions and observations, comments, etc. This information can be exploited by companies, especially marketing companies, to monitor and evaluate their social media hype, online marketing strategies and customer relationships. This paper focuses on sentiment analysis and attempts to classify tweets by sentiment for a marketing tool, Twitter MAT (Marketing Analysis Tool), which summarizes the public's view about products. Twitter MAT uses natural language technologies, in particular the Named Entity Recognition (NER) component found in the GATE text processor, to perform sentiment analysis. Twitter MAT also uses other techniques such as discourse analysis, namely the effect of adverbs on adjectives, to detect sentiment polarity and its strength. The accuracy obtained for the classification of tweets by sentiment was 75% when compared against existing, manually annotated datasets and the results of the survey we conducted. However, when compared against datasets which focus on sentiment polarity strength, Twitter MAT reached 100% agreement.
1. INTRODUCTION
The growth of social media platforms and their frequent use by the public has resulted in these networks being flooded with information every second. Billions of accounts across popular social media networks are used every single day to post snippets of different lengths and formats about users' news, opinions and observations, comments, etc. This data can be structured into information that organisations can exploit for analysing and monitoring sales and for future product development.
Nowadays, before buying a product, people tend to search for information and reviews about it. Through the Web, one can instantly find others' opinions, reviews and experiences related to particular products. People use social media platforms such as Facebook and Twitter to instantly update their friends' circle about new purchases and first impressions, their frequently used or disappointing products, and also when certain products break down. These updates, posts or even micro-conversations are important to companies since they are authentic consumer insights about any product or campaign. However, due to the huge volume received and also their short length and noisy content, these streams pose new challenges for natural language processing and interpretation.
2. AIMS AND OBJECTIVES
The aim of this paper is to extract information from social media streams, in this case Twitter, and to interpret this data as structured information for product marketing research. This information can be used to analyze how the public reviews products and how their perception changes over time. Consequently, we shall be analyzing tweets that mention or discuss certain products that companies offer. Our aim is thus to provide the user with a market research tool that delivers instant consumer insights and a better understanding of what these insights indicate.
The objectives behind this research are the following:
The identification and filtering of noisy data for sentiment analysis: certain data, such as duplicate tweets, will be discarded entirely, while other noisy data, such as tweets with various meta-data (e.g. hashtags or mentions), which may be useful to our analysis, will be exploited.
To analyse the strength of sentiment polarity by studying the effect of adverbs on adjectives, and to evaluate how adverbs intensify or weaken the sentiment polarity of a given text.
Providing the user with a tool through which they can query by topic, using keywords, for particular information extracted from Twitter. The retrieved data will be displayed in a timeline which shows how the opinion about the topic or product chosen by the user has evolved over time according to the sentiment of the public's view. The tool will be able to receive data continuously so that the information represented in the timeline remains current and up to date.
This tool will also point out which product features stood out, through a tag cloud which is likewise updated over time. This also helps the user identify what people associate with the products.
3. RELATED WORK
Previous work on opinion mining and sentiment analysis on web content, such as Yu and Hatzivassiloglou (2003), Kim and Hovy (2004) and Hu and Liu (2004), takes different approaches, of which the keyword-based approach is the most common. In short, they all have a collection of sentiment-bearing bag-of-words entries which are assigned a binary sentiment of either positive or negative. When one of those keywords appears in a phrase or paragraph, the sentiment polarity is worked out. Other studies, including Wilson et al., introduce additional sentiment categories such as neutral or both (a phrase that contains both negative and positive lexicons).
One of the biggest challenges in NLP is noisy data. Since the idea for this thesis is to classify tweets, we were particularly interested in studies which work on such microblogging texts. One study took an interesting approach to sanitizing tweets to improve the performance of the sentiment classifier, given the amount of noisy content in tweets. Capitalized words, excessive punctuation, emoticons, words that indicate laughter, and Twitter-specific characters such as the mention (@) were replaced by a keyword, as they were considered sentiment intensifiers. However, un-opinionated words, known as stop words in NLP, and suffixes were removed. This sanitization process was shown to improve sentiment classification.
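A sanitization step of this kind can be sketched as below. The marker tokens, regular expressions and stop-word list are illustrative assumptions, not the cited study's actual choices.

```python
import re

# Illustrative stop-word list; a real system would load a full NLP list.
STOP_WORDS = {"the", "a", "an", "is", "this", "to", "of", "and"}

def sanitize(tweet: str) -> str:
    """Replace sentiment-intensifying features with marker tokens
    and drop un-opinionated stop words."""
    # All-caps words (two or more letters) signal shouting/intensity;
    # replace them first so the marker tokens below are left untouched.
    t = re.sub(r"\b[A-Z]{2,}\b", " CAPS ", tweet)
    t = re.sub(r"@\w+", " MENTION ", t)             # Twitter mentions
    t = re.sub(r"[!?]{2,}", " PUNCT ", t)           # excessive punctuation
    t = re.sub(r"\b(?:ha){2,}h?\b", " LAUGH ", t, flags=re.IGNORECASE)  # laughter
    t = re.sub(r"[:;]-?[)(DP]", " EMOT ", t)        # simple emoticons
    # Lower-case, split and drop stop words; the markers survive as tokens.
    return " ".join(w for w in t.lower().split() if w not in STOP_WORDS)
```

For example, `sanitize("@bob this is GREAT!!! :)")` reduces the tweet to a sequence of intensifier markers with the stop words removed.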
Another system, on the other hand, selects sentences which mention the required topic and contain opinionated keywords, calculates the polarity of each word separately and then calculates the polarity of the whole phrase. However, this required a large amount of time dedicated to training on text and words in order to calculate the score for each word. Studies by Pang et al. showed that using keywords to determine the polarity of a text results in 60% accuracy when compared to manually annotated texts.
3.1 Sarcasm
One study was conducted explicitly on tweets and Amazon reviews. The Semi-Supervised Sarcasm Identification algorithm (SASI) was used to identify sarcastic patterns and to classify tweets by their probability of being sarcastic. To train the classifier, tweets carrying the hashtag #sarcasm were used, since these were considered the ones with the highest probability of being sarcastic, having been explicitly marked by the user.
However, tweets containing the hashtag were found to be biased and too noisy. This was due to the hashtag being applied to non-sarcastic tweets (the main reason being that the user does not fully understand what is meant by sarcasm) and its use when talking about another tweet, document or external entity, such as: "I love it when #sarcasm is used in TV shows, there's always someone who doesn't get it." Other tweets were also impossible to classify as sarcastic without the explicit sarcasm tag. It is also important to point out that only 4.09% of tweets are explicitly tagged with this hashtag (125 tweets out of 3.3 million), which shows that either users don't use the hashtag to mark their tweets or sarcasm is not used regularly.
The biggest problem with detecting sarcasm in microblogging, identified in a later study, is that tweets are not accurately labeled. Again, hashtags play an important role in the classification of sarcastic tweets. In fact, several hashtags somewhat related to sarcasm were used to collect tweets (#sarcasm, #sarcastic), and only tweets having these hashtags at the very end were used. This was based on the result from the previous study, which found tweets with #sarcasm too noisy and biased: a marker at the end is more likely to indicate that a tweet is sarcastic than the use of sarcasm as a noun within the tweet. A further manual inspection was conducted to eliminate tweets where sarcasm was the main subject rather than a marker. Beyond this step, manual inspection was also used to compare the system's classification against human classification of sarcastic tweets. With only 50% agreement between the human judges themselves, this study shows how difficult it is to detect sarcasm in text.
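The hashtag-at-the-end filtering rule can be expressed as a simple predicate. The tag list and the token-based check below are assumptions made for illustration, not the cited study's exact procedure.

```python
SARCASM_TAGS = ("#sarcasm", "#sarcastic")  # hashtags used to collect tweets

def is_training_candidate(tweet: str) -> bool:
    """Keep a tweet for the sarcasm training set only if a sarcasm
    hashtag appears as a trailing marker, not as the tweet's subject."""
    tokens = tweet.strip().lower().split()
    if not tokens or tokens[-1] not in SARCASM_TAGS:
        return False
    # Reject tweets where sarcasm is also mentioned mid-sentence,
    # i.e. where it is the topic rather than a marker.
    return not any(t in SARCASM_TAGS for t in tokens[:-1])
```

Under this rule, "Great start to the week #sarcasm" is kept, while a tweet that merely talks about the hashtag is discarded.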
3.2 Strength in Sentiment Polarity
Although several studies employ different methodologies, an unusual but interesting approach was taken in one study where adverbs and adjectives were examined to show how they affect the sentiment of a sentence. Although other studies did analyze the use of adjectives to determine the sentiment of a text, this was the first study to also take adverbs into consideration, in particular adverbs of degree. These kinds of adverbs affect the sentiment polarity and can even reverse it. A method to assign a score to each adverb, together with associated axiomatic rules, was defined to express the relationship between adverbs and adjectives. The results show that this approach reaches a high level of precision in sentiment classification.
4. DESIGN
The main purpose of Twitter MAT is to build a system which gathers tweets from Twitter, classifies them according to their sentiment group and displays them in a timeline, so as to show how the mentioned sentiment changes over time. This section discusses the design issues and decisions taken in modeling this system.
Twitter MAT includes two main components: the User Interface, a component in the form of a web application, and the Classification Module. The component diagram depicted in Figure 1 shows how the various components interact with each other. The Classification Module handles all processing of the tweets, from data gathering to data storage, including the algorithms that filter noise in tweets, annotate sentiment and compute the scores. The web application, on the other hand, allows the user to access this data, handles how results are displayed, and provides extra information for interpreting the data.
The Twitter module is the intermediary between the web application and the Annotation module. Essentially, it is responsible for fetching tweets, checking for existing tweets grouped by the same keyword, and filtering. Tweets are retrieved through the Twitter API, which handles any type of interaction between a developer and the Twitter service. Given a query, the API returns tweets in JSON format, including information about the tweet itself such as the author, date of publication, number of retweets, location, any URLs included and several other details.
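Extracting the fields the system needs from such a JSON payload might look as follows. The field names follow the Twitter REST API v1.1 tweet object; the selection of fields is an assumption mirroring those mentioned in the text.

```python
import json

def parse_tweet(raw: str) -> dict:
    """Pull out the tweet details the system uses from the raw JSON
    returned by the Twitter API (v1.1-style field names assumed)."""
    t = json.loads(raw)
    return {
        "text": t["text"],
        "author": t["user"]["screen_name"],
        "published": t["created_at"],
        "retweets": t.get("retweet_count", 0),
        "urls": [u["expanded_url"]
                 for u in t.get("entities", {}).get("urls", [])],
    }
```

The parsed dictionary is then what the Annotation module and the database layer work with, rather than the full API payload.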
The Annotation module processes the tweets in order to calculate the score, which then determines the tweet's sentiment and its strength. Once a tweet enters the Annotation module, it is first checked for any abbreviations or acronyms, which are commonly found in tweets given their 140-character limit. Any abbreviations and acronyms found are converted to their full, original form. This procedure ensures that each word can be understood by the GATE system while annotating the content.
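The abbreviation-expansion step can be sketched with a simple lookup table. The table entries below are illustrative; the actual system would use a much larger abbreviation lexicon.

```python
# Illustrative abbreviation table; a full system would load a larger lexicon.
ABBREVIATIONS = {
    "gr8": "great",
    "u": "you",
    "imo": "in my opinion",
    "thx": "thanks",
}

def expand_abbreviations(tweet: str) -> str:
    """Replace common tweet abbreviations with their full form so that
    downstream annotation (e.g. the POS tagger) can recognise each word.
    Words with attached punctuation are left as-is in this sketch."""
    return " ".join(ABBREVIATIONS.get(w.lower(), w) for w in tweet.split())
```

For example, "u r gr8 imo" expands to "you r great in my opinion" before being passed on to GATE.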
Each tweet is then processed through GATE to annotate its contents. Annotations are produced using ANNIE and the POS tagger it provides. Although the POS tagger can annotate several different kinds of words and entities, it was necessary to add new annotations for the purposes of this thesis: Sentiment Words, which are divided into positive and negative words, and Adverbs of Degree.
Once GATE annotates the content, each annotation is parsed and evaluated. Based on the type of annotation found, a score is given to the tweet. The scoring system increases the score when a positive word is found and reduces it when a negative word is found. The same applies to the aforementioned adverbs of degree. Depending on the resulting score, tweets are classified as Highly Positive, Positive, Negative or Highly Negative. Each tweet is stored under the appropriate topic by adding it to the existing list, if any. If the tweet being stored already exists, this stage is skipped; unfortunately, filtering retweets does not prevent the Twitter API from returning the same tweet twice. Moreover, if no annotations are found in the resulting output, the tweet is not stored. Although previous works on sentiment analysis usually place such tweets in a "neutral" category, they do not express any form of feedback or perception of the product and are thus considered noisy data for our purposes.
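A minimal sketch of this scoring and classification scheme is shown below. The lexicons, adverb weights and score thresholds are assumptions made for illustration, since the exact values are not given here.

```python
# Illustrative lexicons and weights; the real system takes these
# from GATE annotations (Sentiment Words and Adverbs of Degree).
POSITIVE = {"good", "great", "love", "amazing"}
NEGATIVE = {"bad", "poor", "hate", "broken"}
DEGREE_ADVERBS = {"very": 2.0, "extremely": 3.0, "slightly": 0.5}

def score_tweet(tweet: str) -> float:
    """Increase the score for positive words, decrease it for negative
    ones; an adverb of degree scales the next sentiment word found."""
    score, weight = 0.0, 1.0
    for word in tweet.lower().split():
        if word in DEGREE_ADVERBS:
            weight = DEGREE_ADVERBS[word]
        elif word in POSITIVE:
            score += weight
            weight = 1.0
        elif word in NEGATIVE:
            score -= weight
            weight = 1.0
    return score

def classify(score: float):
    """Map a score to a sentiment class; thresholds are illustrative."""
    if score >= 2:
        return "Highly Positive"
    if score > 0:
        return "Positive"
    if score <= -2:
        return "Highly Negative"
    if score < 0:
        return "Negative"
    return None  # no sentiment found: treated as noise and not stored
```

Under these assumed weights, "the battery is very good" lands in Highly Positive, while a tweet with no sentiment words returns None and is discarded.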
As soon as new tweets are stored, each tweet undergoes keyword extraction, and the extracted keywords are displayed in a tag cloud. Every word a tweet holds is checked as to whether it is a keyword related to the topic or a word commonly found in tweets regarding the same topic. This procedure is applied to the new tweets so that the keywords shown are up to date and reflect new trends. Keywords are extracted by our simple but effective algorithm: any type of stop word is first eliminated, including all forms of adverbs, conjunctions, modal verbs and common nouns, amongst others. The remaining words are then checked for occurrence in other tweets; if a word is found in at least three tweets from the same set, i.e. from the same query or topic, it is considered a keyword.
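This keyword-extraction rule can be sketched as follows; the stop-word list is an illustrative placeholder for the fuller POS-based filtering described above.

```python
from collections import Counter

# Illustrative stop words; the real filter removes adverbs, conjunctions,
# modal verbs, common nouns, etc. based on POS tags.
STOP_WORDS = {"the", "a", "is", "very", "and", "i", "it", "this"}
MIN_TWEETS = 3  # a word must appear in at least three tweets of the set

def extract_keywords(tweets: list) -> set:
    """Count, per word, the number of distinct tweets it appears in
    (after dropping stop words); keep words seen in >= MIN_TWEETS tweets."""
    counts = Counter()
    for tweet in tweets:
        # A set per tweet so repeated words in one tweet count once.
        counts.update({w for w in tweet.lower().split() if w not in STOP_WORDS})
    return {w for w, n in counts.items() if n >= MIN_TWEETS}
```

For a set of tweets about the same topic, only words recurring across at least three of them survive into the tag cloud.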
Figure 1: Top-Level Design
The database is a crucial part of this system. Given that the Twitter API only returns tweets less than a week old, it is important to store data continuously so as to cover a longer time span on the timeline. Moreover, this gives the user the ability to search for the formerly perceived sentiment of certain products and analyze how this perception is changing based on the results shown in the timeline. The database, in this case, is a relational database used to store the actual tweet, the score given by the classification algorithm, the number of retweets and the date of publication.
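A relational schema covering the stored fields might look like the sketch below, here using SQLite for illustration. The table and column names are assumptions based on the fields the text says are stored, not the system's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory DB for the sketch
conn.execute("""
    CREATE TABLE tweets (
        id        INTEGER PRIMARY KEY,
        topic     TEXT NOT NULL,     -- the query keyword the tweet belongs to
        text      TEXT NOT NULL,     -- the actual tweet
        score     REAL NOT NULL,     -- output of the classification algorithm
        retweets  INTEGER DEFAULT 0, -- number of retweets
        published TEXT NOT NULL      -- date of publication
    )
""")
conn.execute(
    "INSERT INTO tweets (topic, text, score, retweets, published) "
    "VALUES (?, ?, ?, ?, ?)",
    ("phoneX", "great battery life", 1.0, 2, "2013-05-01"),
)
```

Continuously appending rows like this is what lets the timeline span more than the one week of history the Twitter API itself exposes.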
5. IMPLEMENTATION
Following the previous Design section, this section continues to explain the process of building the system and discusses in further detail how each module, the Classification module and the Web Application module, was implemented based on the design already discussed. We will also elaborate on te...