IJSRD - International Journal for Scientific Research & Development| Vol. 5, Issue 02, 2017 | ISSN (online): 2321-0613
All rights reserved by www.ijsrd.com 111
Sentiment Analysis using Twitter Data
Kaustubh Chalke1, Ria Dogra2, Tanushree Shetty3, Satish Ranbhise4
1,2,3,4Department of Computer Engineering, Atharva College of Engineering, Mumbai, India
Abstract— A product or entity can be analysed using the large
collection of unstructured data available on the Internet
(World Wide Web). Many industrial and survey companies use
this kind of data analysis to make decisions. In this paper, we
show how sentiment analysis can be performed on data
collected from the social networking website Twitter.
Twitter is an online web application containing large amounts
of data that may be unstructured, semi-structured or
structured. We collect Twitter data within the big data
ecosystem using the streaming tool Flume. The type of
analysis we concentrate on is sentiment analysis: we use Hive
queries to derive sentiment data based on the groups that are
defined in HQL (Hive Query Language). We categorize the
tweets into three groups: those with positive, moderate and
negative comments.
Key words: Social Networking, Sentiment Analysis using
Twitter Data
I. INTRODUCTION
Social networking sites generate a large share of today's
data. To find interesting patterns or trends in this huge
volume of data, data scientists need to clean, integrate,
aggregate and analyze it. The purpose of this project is to
find trends by aggregating data from a social networking
site such as Twitter, analyzing this data and evaluating the
sentiments of user tweets. More generally, such data comes
from everywhere: posts on social media sites, digital videos
and pictures, sensors used to gather climate information,
purchase transaction records, cell phone GPS signals, etc.
This data is called big data.
People today express their views and thoughts on social
media sites and share this information with the world.
Twitter alone generates 1 TB worth of data within a
week. Many companies use these Twitter posts, called
tweets, for data analysis and to predict the success rate of
their products.
Sentiment analysis, also known as opinion mining, is the
process of computationally identifying and categorizing
opinions expressed in a piece of text in order to determine the
writer's attitude towards a particular product or topic. The
main purpose of sentiment analysis is to detect the
contextual polarity of text. In other words, it determines
whether a piece of writing is positive, negative or neutral.
In this project, we use Hadoop to analyze Twitter data.
Twitter data takes the form of comments that express the
opinions and feelings of people. This data is collected
using the Twitter API. By analyzing it, our system labels
each tweet as positive, negative or neutral, using a data
dictionary to classify the data. The classified data can then
be used according to the particular application, and it can be
represented in the form of pie charts.
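The dictionary-based classification described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the two word lists are hypothetical stand-ins for a real data dictionary, and `classify_tweet` is a name introduced here, not one taken from the project.

```python
import re

# Hypothetical miniature dictionary; a real system would load a much
# larger word list from a file or table.
POSITIVE_WORDS = {"good", "great", "love", "excellent", "happy"}
NEGATIVE_WORDS = {"bad", "poor", "hate", "terrible", "sad"}


def classify_tweet(text: str) -> str:
    """Label a tweet by counting dictionary hits.

    Tweets with more positive than negative words are "positive",
    the reverse is "negative", and ties or no hits are "neutral".
    """
    words = re.findall(r"[a-z']+", text.lower())
    positive_hits = sum(word in POSITIVE_WORDS for word in words)
    negative_hits = sum(word in NEGATIVE_WORDS for word in words)
    if positive_hits > negative_hits:
        return "positive"
    if negative_hits > positive_hits:
        return "negative"
    return "neutral"
```

Counting hits is the simplest possible scoring rule; it ignores negation ("not good") and sarcasm, which a production dictionary approach would need to handle separately.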
II. PROPOSED SYSTEM
Fig. 1: Proposed System
We collect Tweets using Flume and save them into HDFS for
later analysis. Twitter exposes an API for retrieving
Tweets.
The system extracts raw data from Twitter and then
assigns a positive, negative, or neutral sentiment value to
each Tweet.
The resulting sentiment counts are then plotted as a
pie chart using Excel.
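As a rough sketch of the step between labelled Tweets and the pie chart, the per-group percentages can be computed and written out as CSV text that a spreadsheet such as Excel can open and chart. The function names and the CSV column headers below are assumptions made for illustration only.

```python
import csv
import io
from collections import Counter

LABELS = ("positive", "negative", "neutral")


def sentiment_shares(labels):
    """Return the percentage of tweets that fall in each sentiment group."""
    counts = Counter(labels)
    total = sum(counts[label] for label in LABELS)
    if total == 0:
        return {label: 0.0 for label in LABELS}
    return {label: 100.0 * counts[label] / total for label in LABELS}


def shares_to_csv(shares):
    """Serialise the shares as CSV text that a spreadsheet can chart."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["sentiment", "percent"])  # hypothetical headers
    for label in LABELS:
        writer.writerow([label, f"{shares[label]:.1f}"])
    return buffer.getvalue()
```

For example, four Tweets labelled positive, positive, negative and neutral give shares of 50, 25 and 25 percent, which are the three slices of the pie.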
III. METHODOLOGY
Fig. 2: Methodology
(IJSRD/Vol. 5/Issue 02/2017/030)
In this project, we follow these steps:
Creating a Twitter application.
Getting data using Flume.
Querying using Hive Query Language (HQL).
A. Creating Twitter Application
For sentiment analysis on Twitter data, we first create a
Twitter developer account and then create an application by
clicking the new application button provided there.
After creating the application, we generate access tokens
so that we do not need to supply our own authentication
details. The application also has a consumer key and secret,
which are used to access it when requesting Twitter data.
We then set these key and token details in the Flume
configuration file so that Flume can retrieve the required
data from Twitter in the form of tweets.
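To make the configuration step concrete, the following Python sketch renders the text of a Flume agent properties file from the four credentials. Several details are assumptions rather than the project's actual setup: the agent, source, channel and sink names are hypothetical, and the source class `com.cloudera.flume.source.TwitterSource` is one commonly used Twitter source that emits JSON; the paper does not say which source it used. The memory channel and HDFS sink are standard Flume components.

```python
def render_flume_config(consumer_key, consumer_secret,
                        access_token, access_token_secret, hdfs_path):
    """Build the text of a Flume agent properties file for a Twitter source."""
    agent = "TwitterAgent"  # hypothetical agent name
    src = f"{agent}.sources.Twitter"
    sink = f"{agent}.sinks.HDFS"
    lines = [
        f"{agent}.sources = Twitter",
        f"{agent}.channels = MemChannel",
        f"{agent}.sinks = HDFS",
        # Assumed source class; the paper does not name the one it used.
        f"{src}.type = com.cloudera.flume.source.TwitterSource",
        f"{src}.channels = MemChannel",
        # Credentials from the Twitter developer application.
        f"{src}.consumerKey = {consumer_key}",
        f"{src}.consumerSecret = {consumer_secret}",
        f"{src}.accessToken = {access_token}",
        f"{src}.accessTokenSecret = {access_token_secret}",
        f"{agent}.channels.MemChannel.type = memory",
        # The HDFS sink writes the raw events under hdfs_path.
        f"{sink}.type = hdfs",
        f"{sink}.channel = MemChannel",
        f"{sink}.hdfs.path = {hdfs_path}",
        f"{sink}.hdfs.fileType = DataStream",
    ]
    return "\n".join(lines) + "\n"
```

Generating the file from parameters keeps the secrets out of version control; the same text could equally be written by hand.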
B. Getting data using Flume
After creating an application on the Twitter developer site, we
use the consumer key and secret along with the access token
and secret to access Twitter and retrieve exactly the
information we want. The tweets arrive in JSON format and
are stored in HDFS, at the location we have configured for
saving all the data that comes from Twitter.
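Since the collected data is JSON, a first processing step is to pull the fields of interest out of each record. The sketch below assumes one JSON object per line and uses the `id` and `text` fields of a tweet object; the function name `parse_tweets` is hypothetical. Records without those fields (for example, non-tweet notices in the stream) and corrupt lines are skipped.

```python
import json


def parse_tweets(lines):
    """Yield (tweet_id, text) pairs from newline-delimited tweet JSON."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between records
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or corrupt records
        if not isinstance(record, dict):
            continue
        # Keep only records that look like tweets.
        if "id" in record and "text" in record:
            yield record["id"], record["text"]
```

In practice the input would be an open file read from the HDFS output directory; any iterable of strings works for testing.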
C. Querying using Hive Query Language (HQL)
After Flume is run with the above configuration, the
Twitter data is automatically saved into HDFS at the
storage path we have set for the data collected
by Flume.
We also use UDFs (User Defined
Functions) to perform the sentiment analysis on the tables
that are created using Hive.
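The paper does not list its queries, so the following Python sketch only mimics the kind of table operations such an analysis might involve: splitting each tweet into one row per word, joining the words against a dictionary table, and summing the ratings per tweet. The dictionary contents, the ratings and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical dictionary table: word -> polarity rating.
DICTIONARY = {"good": 1, "great": 2, "bad": -1, "awful": -2}


def explode_words(tweets):
    """Turn (id, text) rows into (id, word) rows, one row per word."""
    for tweet_id, text in tweets:
        for word in text.lower().split():
            yield tweet_id, word.strip(".,!?#@\"'")


def score_tweets(tweets):
    """Join words to the dictionary and sum the ratings per tweet id."""
    totals = defaultdict(int)
    for tweet_id, _ in tweets:
        totals[tweet_id] += 0  # keep tweets that match no dictionary word
    for tweet_id, word in explode_words(tweets):
        totals[tweet_id] += DICTIONARY.get(word, 0)  # unmatched words add 0
    return dict(totals)


def label(score):
    """Map a summed rating onto one of the three sentiment groups."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Here `tweets` must be a list (it is traversed twice). Treating unmatched words as zero plays the role of a left outer join, so tweets with no dictionary words still appear in the result and fall into the neutral group.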
IV. FUTURE SCOPE AND CONCLUSION
This project can be used by various companies and agencies
to review their products and to predict, from the analysed
data, the profit they might make. Major decisions can also be
made by judging the online reviews of various products. For
example, a movie production house can judge whether the
trailer of a movie has fared well with the masses. Based on
this data, it can decide where to screen the movie more widely.
Thus, in today's increasingly digital world, views and
sentiments are expressed and shared on the internet. This
information can be analysed to help brands and companies
improve their products and increase profits.
ACKNOWLEDGMENT
We thank Prof. Satish Ranbhise for his continued guidance and
advice throughout the course of this project. His guidance
helped us at every stage in rethinking ideas and implementing
them in the project. Without his support this project would not
have been a success.
We would also like to thank our HOD, Prof.
Mahindra Patil, and the other faculty members of the Department
of Computer Engineering, who have helped us and guided us
towards becoming graduates.
We would like to thank all our friends and family for
their continuous support, which has helped us achieve this
personal goal.