
Twitter Data Analytics – Sentiment Analysis of An Election


DESCRIPTION

Social media is used to analyze political campaigns, the stock market, movies, medicines, agriculture and more. Twitter is a microblogging website where users read and write millions of tweets on a variety of topics daily. This project attempts to analyze the sentiment of people towards election candidates based on live opinions and emotions. The focus of the project is to assign a polarity to each tweet, that is, whether the user expresses a positive or negative opinion. From the extracted tweets, we try to find how frequently users' emotions change. We also try to classify and differentiate people's sentiments before and after the election based on the tweets they upload. The location of the Twitter user is used to classify the geographical area, which in turn helps to analyze the emotions of people in different areas. Our project uses the Naïve Bayes approach in the R language and RStudio for processing the textual data.

A. V. Gukanesh | G. Karthick Kumar | K. Karthik Raja Kumar | N. Saranya "Twitter Data Analytics – Sentiment Analysis of An Election" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-2 | Issue-3, April 2018, URL: https://www.ijtsrd.com/papers/ijtsrd11457.pdf Paper URL: http://www.ijtsrd.com/computer-science/data-miining/11457/twitter-data-analytics-'“-sentiment-analysis-of-an-election/a-v-gukanesh

@ IJTSRD | Available Online @ www.ijtsrd.com | Volume – 2 | Issue – 3 | Mar-Apr 2018

International Journal of Trend in Scientific Research and Development (IJTSRD)
International Open Access Journal
ISSN No: 2456-6470 | www.ijtsrd.com | Volume - 2 | Issue – 3

Twitter Data Analytics – Sentiment Analysis of An Election

N. Saranya1, A. V. Gukanesh2, G. Karthick Kumar3, K. Karthik Raja Kumar4

1Assistant Professor, 2,3,4Student
1,2,3,4Department of Computer Science and Engineering,
Sri Shakthi Institute of Engineering and Technology, Coimbatore, Tamil Nadu, India

ABSTRACT

Social media is used to analyze political campaigns, the stock market, movies, medicines, agriculture and more. Twitter is a microblogging website where users read and write millions of tweets on a variety of topics daily. This project attempts to analyze the sentiment of people towards election candidates based on live opinions and emotions. The focus of the project is to assign a polarity to each tweet, that is, whether the user expresses a positive or negative opinion. From the extracted tweets, we try to find how frequently users' emotions change. We also try to classify and differentiate people's sentiments before and after the election based on the tweets they upload. The location of the Twitter user is used to classify the geographical area, which in turn helps to analyze the emotions of people in different areas. Our project uses the Naïve Bayes approach in the R language and RStudio for processing the textual data.

Keywords: Twitter, R, Sentiment Analysis, Naïve Bayes Algorithm

INTRODUCTION:

Social media sites that are used to make frequent, short posts, either publicly or to a group of users, are called microblogging sites. There are hundreds of such sites, so they hold a huge collection of data, and the volume of data increases rapidly. These data express the moods and sentiments of large numbers of people. The benefits of social media analytics include anticipation of business opportunities and competitive advantage. Cost efficiency is a major benefit of data analytics. Therefore, implementing the technology will result in positive advantages for organizations in a number of industries. Twitter is one of the largest social media platforms and a microblogging site where numerous users upload data. Twitter allows its users to read and post messages of up to 140 characters, which are called tweets. Nearly 656 million tweets are posted in a day. Users' ideas are expressed as tweets, which in turn express their mood. Twitter makes these utterances available in a data stream, which can be mined using mining algorithms. In this paper, we discuss the analysis of the sentiment of Twitter users towards an election based on the hashtags and emojis they use in their tweets.

Sentiment Analysis:

Sentiment analysis is the process of determining the subjective feelings of people, that is, whether their opinion about something is positive, negative or neutral. Sentiment analysis, also known as opinion mining, is used to derive the attitude of the speaker. Sentiment analysis of social media has a variety of applications such as marketing, reviewing and customer service. For example, movie reviews can be analysed and reports can be generated that can be used to decide how far the movie reached the people.

Working Model:

The system uses the Naïve Bayes approach for text categorization. For the categorization of the text, Naïve Bayes classifiers assume that the effect of a variable's value on a given class is independent of the values of the other variables. This assumption is called conditional independence. In this paper, we propose an approach involving both dictionary-based and corpus-based techniques, which finds the semantic orientation of the sentiments in the tweets. Emoticons, neutralization, negation handling and capitalization are also considered, as they are a large part of modern internet language. To uncover the sentiments, we first extract the opinion words from the tweets and then find their orientation, that is, decide whether the sentiment word reflects the feelings of the user.
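The Naïve Bayes idea described above can be sketched in base R as follows. This is a minimal illustration, not the authors' implementation: the toy training sentences, the function names `train_nb` and `predict_nb`, and the add-one (Laplace) smoothing are all assumptions. Under conditional independence, the score of a class is its log prior plus the sum of the log probabilities of each word given that class.

```r
# Toy training set (hypothetical examples, not data from the paper).
texts  <- c("great leader good speech",
            "good policy great future",
            "bad policy poor speech",
            "poor leader bad future")
labels <- c("positive", "positive", "negative", "negative")

# Split a text into lowercase word tokens.
tokenize <- function(x) {
  tokens <- strsplit(tolower(x), "[^a-z]+")[[1]]
  tokens[nzchar(tokens)]
}

# Model construction: class priors and per-class word counts.
train_nb <- function(texts, labels) {
  tokens  <- lapply(texts, tokenize)
  vocab   <- unique(unlist(tokens))
  classes <- unique(labels)
  counts  <- matrix(0, nrow = length(vocab), ncol = length(classes),
                    dimnames = list(vocab, classes))
  for (cl in classes) {
    tab <- table(unlist(tokens[labels == cl]))
    counts[names(tab), cl] <- as.vector(tab)
  }
  prior <- sapply(classes, function(cl) mean(labels == cl))
  list(vocab = vocab, classes = classes, prior = prior, counts = counts)
}

# Model usage: pick the class with the highest log score.
# Add-one smoothing is an assumption made for this sketch.
predict_nb <- function(model, text) {
  words <- tokenize(text)
  words <- words[words %in% model$vocab]  # ignore words never seen in training
  vocab_size <- length(model$vocab)
  log_scores <- sapply(model$classes, function(cl) {
    word_counts <- model$counts[words, cl]
    total       <- sum(model$counts[, cl])
    log(model$prior[[cl]]) + sum(log((word_counts + 1) / (total + vocab_size)))
  })
  names(which.max(log_scores))
}

model <- train_nb(texts, labels)
predict_nb(model, "great speech good future")
predict_nb(model, "bad poor policy")
```

With this toy data the first call should come out as "positive" and the second as "negative", because the words in each test sentence occur mostly in the training sentences of that class.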

Fig 1. Naïve Bayes approach

Fig 2. Working model

The following steps outline the process of the proposed system discussed in this paper:

1. Retrieval of tweets
2. Pre-processing of extracted data
3. Parallel processing
4. Sentiment scoring module
5. Output sentiment

1. Retrieval of tweets:
As Twitter is a very prominent social networking site, it contains microblog posts related to various topics worldwide. Instead of taking all posts, we search on a particular topic and extract all the tweets related to that topic.

2. Pre-processing of extracted data:
After retrieval, applying a sentiment analysis tool directly to the raw tweets results in very poor performance in most cases. Therefore, pre-processing techniques are necessary for obtaining better results. The process involves the following steps:


i) Filtering:
Filtering is the cleaning of raw data. In this step, URL links (e.g. http://twitter.com), special words in Twitter (e.g. “RT”, which means retweet) and usernames (e.g. @kgk, where the @ symbol indicates a username) are removed, and emoticons are replaced with special strings.

ii) Tokenization:
Tokenization is the segmentation of sentences. In this step, we tokenize the text by splitting it on spaces and punctuation marks to form a container of words.

iii) Construction of n-grams:
A set of n-grams can be made out of consecutive words. A negation word such as “no” or “not” is attached to the word which follows or precedes it. For instance, “I do not like jazz music” yields the bigrams “I do+not”, “do+not like” and “not+like jazz music”. The accuracy of the classification improves by taking negation into account, because it is a very common linguistic construction that affects polarity.

3. Parallel processing:
The sentiment classifier classifies the sentiments using a Naïve Bayes classifier. Every database has hidden information which can be used for decision-making. Classification finds a set of models, and such a model can be used to predict the class of objects. It is a two-step process: the first step is model construction, in which the model is built from the training set, and the second step is model usage, in which the model is used to classify new data.

4. Sentiment scoring module:
The basic feature of this model is the polarity of words. A dictionary contains a list of English words, each with a score ranging from 1 to 3. The scoring module is used to determine the sentiment of the textual data.
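The pre-processing steps i) to iii) above can be sketched in base R as follows. This is a minimal sketch under stated assumptions: the function names, the emoticon placeholders `EMOPOS` and `EMONEG`, and the two emoticons handled are hypothetical, and for simplicity the negation word is attached only to the word that follows it.

```r
# i) Filtering: drop URLs, the "RT" marker and @usernames, and replace
# emoticons with placeholder strings (placeholder names are assumptions).
filter_tweet <- function(x) {
  x <- gsub("https?://\\S+", " ", x, perl = TRUE)
  x <- gsub("\\bRT\\b", " ", x, perl = TRUE)
  x <- gsub("@\\w+", " ", x, perl = TRUE)
  x <- gsub(":)", " EMOPOS ", x, fixed = TRUE)
  x <- gsub(":(", " EMONEG ", x, fixed = TRUE)
  x
}

# ii) Tokenization: split on spaces and punctuation marks.
tokenize <- function(x) {
  tokens <- strsplit(tolower(x), "[[:space:][:punct:]]+")[[1]]
  tokens[nzchar(tokens)]
}

# iii) Negation handling: glue "no"/"not" to the following word.
attach_negation <- function(tokens, negations = c("no", "not")) {
  out <- character(0)
  i <- 1
  while (i <= length(tokens)) {
    if (tokens[i] %in% negations && i < length(tokens)) {
      out <- c(out, paste(tokens[i], tokens[i + 1], sep = "+"))
      i <- i + 2
    } else {
      out <- c(out, tokens[i])
      i <- i + 1
    }
  }
  out
}

# Bigrams: pairs of consecutive tokens.
bigrams <- function(tokens) {
  paste(head(tokens, -1), tail(tokens, -1))
}

raw    <- "RT @kgk I do not like jazz music :( http://twitter.com"
tokens <- attach_negation(tokenize(filter_tweet(raw)))
pairs  <- bigrams(tokens)
```

For the example tweet, `tokens` keeps "not+like" as a single unit, so a later classifier or scoring step can treat the negated word differently from plain "like".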

Polarity    Sentiment
1           Negative
2           Neutral
3           Positive

Table 1. Polarity values
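A dictionary-based scoring step using the polarity values in Table 1 might look like the sketch below. The paper does not say how word scores are combined into a tweet score, so the lexicon entries, the function name `score_tweet`, the use of the rounded mean, and the neutral default for tweets without dictionary words are all assumptions.

```r
# Hypothetical polarity dictionary: 1 = negative, 2 = neutral, 3 = positive.
lexicon <- c(good = 3, great = 3, win = 3,
             bad = 1, poor = 1, lose = 1,
             vote = 2, election = 2)

polarity_labels <- c("Negative", "Neutral", "Positive")

# Score a tokenized tweet by averaging the scores of its dictionary words
# (averaging is an assumption; the paper does not give the rule).
score_tweet <- function(tokens, lexicon) {
  scores <- lexicon[tokens[tokens %in% names(lexicon)]]
  if (length(scores) == 0) {
    return("Neutral")  # assumption: no dictionary words means neutral
  }
  polarity_labels[round(mean(scores))]
}

score_tweet(c("great", "win", "election"), lexicon)
score_tweet(c("bad", "poor", "vote"), lexicon)
score_tweet(c("hello", "world"), lexicon)
```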

5. Output the sentiment:


Based on the score obtained from the previous module, the system interprets whether the tweet is positive, neutral or negative and visualizes the results with visualization models.

Visualization:
The details of the analysis and their visualizations are shown below. For visualization, R packages such as tm, reshape, gridExtra, ggplot and wordcloud are used, which can be obtained from different countries of the world. The analysis is based on the computation of these tweets.

Word cloud:
Words are extracted from the tweets based on the analysis of hashtags (the words that start with a hash). The content attribute of the hashtag is used in the map function, and the reduce function is used to get the respective counts of the different hashtags. The Shiny interface allows the user to select the maximum number of words as well as the minimum frequency of words used for visualization.
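The hashtag counting described above can be sketched with base R's `Map` and `Reduce`. The example tweets, the function name `extract_hashtags`, and the threshold values are hypothetical; the paper does not show its own map and reduce functions.

```r
# Hypothetical example tweets.
tweets <- c("Vote today #election #vote",
            "Results are out #Election",
            "Long queues at the booth #vote #election")

# Map step: pull the hashtags out of each tweet.
extract_hashtags <- function(x) {
  tolower(regmatches(x, gregexpr("#[[:alnum:]_]+", x))[[1]])
}
tags_per_tweet <- Map(extract_hashtags, tweets)

# Reduce step: fold the per-tweet vectors into one and count each hashtag.
all_tags <- Reduce(c, tags_per_tweet)
counts   <- table(all_tags)
freq     <- sort(setNames(as.vector(counts), names(counts)), decreasing = TRUE)

# Apply the two user-selected limits: minimum frequency and maximum words.
min_freq  <- 2
max_words <- 20
selected  <- head(freq[freq >= min_freq], max_words)
```

The named vector `selected` has the shape that a word cloud function expects: names as the words and values as their frequencies. In a Shiny application, `min_freq` and `max_words` would typically come from input controls rather than constants.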

Fig 3. Word cloud

Top 20 tweeters:
In this visualization type, the top 20 tweets are displayed. A bar plot is created using the barplot function, which takes the bar heights as a vector.

Fig 4. Top tweets

Pie chart:


A pie chart graphically shows the opinions of people.

Fig 5. Pie chart

Tabular form:
The sentiments are depicted on a scale of 5 and presented here. The table lists the positive score and the negative score of the tweets.

Fig 6. Tabular form

Histogram:
A histogram shows the score according to the positive, negative and neutral opinions. This model shows two histograms, one for positive and the other for negative sentiment, based on the frequency per day.
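The per-day frequencies behind such histograms can be computed with a simple cross-tabulation. The data frame below is hypothetical (dates and labels are made up for illustration), and the column names `day` and `sentiment` are assumptions.

```r
# Hypothetical scored tweets: one row per tweet.
scored <- data.frame(
  day       = c("2018-03-01", "2018-03-01", "2018-03-02",
                "2018-03-02", "2018-03-02"),
  sentiment = c("Positive", "Negative", "Positive", "Positive", "Neutral")
)

# Count tweets per day and sentiment class.
daily <- table(scored$day, scored$sentiment)

# One vector of daily counts per histogram.
positive_per_day <- daily[, "Positive"]
negative_per_day <- daily[, "Negative"]
```

Each of the two vectors can then be drawn as its own plot, for example by passing it as the heights to `barplot`.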

Fig 7. Histogram

CONCLUSION:


The main objective of this paper was to describe and design a system for classifying the sentiments of people over a period of time using their tweets. It was developed using R and RStudio as big data processing technologies. We developed a set of analytical representations that help the user to explore the data and gain insights from it. The set of visualizations is implemented as a Shiny web application.
