01 03 Sentiment Analysis in Twitter

Embed Size (px)

Citation preview

  • 8/9/2019 01 03 Sentiment Analysis in Twitter

    1/12

    Sentiment Analysis in Twit

    ContAyushi Dalmia (ayushi.Dalmia@researc

    Mayank Gupta(mayank.g@studeArpit Kumar Jaiswal(arpitkumar.jaiswal@studen

    Chinthala Tharun Reddy(tharun.chinthala@studen

    Course: Information Retrieval and Extraction, IIIT HInstructor: Dr. Vasude

    Source Code: [http://github.com/mayank93/Twitter-Sentiment-

    Demo: [http://10.2.4.49:8000/TSAA/]

  • 8/9/2019 01 03 Sentiment Analysis in Twitter

    2/12

    What is Sentiment Analys

    It is classification of the polarity of agiven text in the document, sentenceor phrase

    The goal is to determine whether theexpressed opinion in the text ispositive, negative or neutral.

  • 8/9/2019 01 03 Sentiment Analysis in Twitter

    3/12

    Negative

    Neutral

  • 8/9/2019 01 03 Sentiment Analysis in Twitter

    4/12

    Why is Sentiment Analysis Impo

    Microblogging has become popular communication tool

    Opinion of the mass is important Political party may want to know whether people support their pro

    not.

    Before investing into a company, one can leverage the sentiment opeople for the company to find out where it stands.

    A company might want find out the reviews of its products

  • 8/9/2019 01 03 Sentiment Analysis in Twitter

    5/12

    Using Twitter for Sentiment An

    Popular microblogging site Short Text Messages of 140 characters

    240+ million active users

    500 million tweets are generated everyday

    Twitter audience varies from common man to celebriti

    Users often discuss current affairs and share personal vvarious subjects

    Tweets are small in length and hence unambiguous

  • 8/9/2019 01 03 Sentiment Analysis in Twitter

    6/12

    Problem Statemen

    The problem at hand consists of two subtasks:

    Phrase Level Sentiment Analysis in Twitter :

    Given a message containing a marked instance of a word or a pdetermine whether that instance is positive, negative or neutracontext.

    Sentence Level Sentiment Analysis in Twitter:

    Given a message, decide whether the message is of positive, neneutral sentiment. For messages conveying both a positive and sentiment, whichever is the stronger sentiment should be chos

    The task is inspired from SemEval 2013 , Task 9 : Sentiment Analysis in Twitter

  • 8/9/2019 01 03 Sentiment Analysis in Twitter

    7/12

    Challenges Tweets are highly unstructured and also non-grammatical

    Out of Vocabulary Words

    Lexical Variation

    Extensive usage of acronyms like asap, lol, afaik

  • 8/9/2019 01 03 Sentiment Analysis in Twitter

    8/12

    Approach

    SVM Classifier and Prediction

    Feature Extractor

    Preprocessing

    Tokeniser

    Tweet Downloader

  • 8/9/2019 01 03 Sentiment Analysis in Twitter

    9/12

    Tweet Downloader

    Download the tweets using Twitter API Tokenisation

    Twitter specific POS Tagger developed by ARK Social Media Search

    Preprocessing

    Removing non-English Tweets

    Replacing Emoticons by their polarity Remove URL, Target Mentions, Hashtags, Numbers.

    Replace Negative Mentions

    Replace Sequence of Repeated Characters eg. coooooooool by c

    Remove Nouns and Prepositions

    Approach

  • 8/9/2019 01 03 Sentiment Analysis in Twitter

    10/12

  • 8/9/2019 01 03 Sentiment Analysis in Twitter

    11/12

    ResultsA baseline model by taking the unigrams, bigrams and trigramcompare it with the feature based model for both the sub-tas

    Sub-Task Baseline Model Feature Based

    Model

    Baseline + Feature Base

    Model

    Phrase Based 62.24 % 77.33% 79.90%

    Sentence Based 52.54% 57.57% 58.36%

    Accuracy

    F1 Score

    Sub-Task Baseline Model Feature BasedModel

    Baseline + Feature BaseModel

    Phrase Based 76.27* 75.23 75.98

    Sentence Based 55.70 59.86 60.55

    *Classifies in positive classes only, hence high recall.

  • 8/9/2019 01 03 Sentiment Analysis in Twitter

    12/12

    Conclusion

    We investigated two kinds of models: Baseline and Feature

    Models and demonstrate that combination of both these mperform the best.

    For our feature-based approach, feature analysis reveals thamost important features are those that combine the prior pwords and their parts-of-speech tags.