
IJSRD - International Journal for Scientific Research & Development| Vol. 5, Issue 02, 2017 | ISSN (online): 2321-0613

All rights reserved by www.ijsrd.com 111

Sentiment Analysis using Twitter Data

Kaustubh Chalke¹, Ria Dogra², Tanushree Shetty³, Satish Ranbhise⁴

¹,²,³,⁴Department of Computer Engineering, Atharva College of Engineering, Mumbai, India

Abstract: The large body of unstructured data available on the Internet (World Wide Web) can be used to analyze opinion about a particular product or entity. Many industrial and survey companies use this kind of data analysis to make decisions. In this paper, we show how sentiment analysis can be performed effectively on data collected from the social networking website Twitter using Flume. Twitter is an online web application containing large amounts of data, which may be unstructured, semi-structured, or structured. Twitter data can be collected within the big data ecosystem using the online streaming tool Flume. The type of analysis this project concentrates on is sentiment analysis: Hive queries return the sentiment data according to groups defined in HQL (Hive Query Language). We categorize the tweets into three groups: those with positive, moderate (neutral), and negative comments.

Key words: Social Networking, Sentiment Analysis using Twitter Data

I. INTRODUCTION

Social networking sites now contribute a large share of the data being generated. To find interesting patterns or trends in this data, data scientists need to clean, integrate, aggregate, and analyze it. The purpose of this project is to find trends by aggregating data from a social networking site such as Twitter, analyzing it, and evaluating the sentiments of user tweets. Such data comes from everywhere: posts on social media sites, digital videos and pictures, sensors used to gather climate information, purchase transaction records, cell phone GPS signals, and so on. Collectively, this data is called big data.

People today express their views and thoughts on social media sites and share this information with the world.

Twitter alone generates 1 TB of data within a week. Many companies use posts from Twitter, called tweets, for data analysis and to predict the success rate of their products.

Sentiment analysis, also known as opinion mining, is the process of computationally identifying and categorizing opinions expressed in a piece of text in order to determine the writer's attitude towards a particular product or topic. The main purpose of sentiment analysis is to detect the contextual polarity of text; in other words, it determines whether a piece of writing is positive, negative, or neutral.

In this project, we use Hadoop to analyze Twitter data. Twitter data takes the form of comments, which express the sentiments, that is, the opinions and feelings, of people. This data is collected using the Twitter API. By analyzing it, our system produces output in the form of positive, negative, and neutral tweets. The system uses a data dictionary to classify the data. The classified data can then be used according to the needs of a particular application, and it can be represented in the form of pie charts.
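As a minimal sketch of dictionary-based classification, the following Python fragment counts how many words of a tweet appear in a positive or a negative word list and labels the tweet accordingly. The word lists, the function name `classify_tweet`, and the simple majority rule are assumptions made for illustration; the paper does not specify the contents of its data dictionary or its scoring rule.

```python
import re

# Hypothetical word lists for illustration only; the paper does not
# specify the contents of its data dictionary.
POSITIVE_WORDS = {"good", "great", "love", "excellent", "happy"}
NEGATIVE_WORDS = {"bad", "poor", "hate", "terrible", "sad"}


def classify_tweet(text):
    """Label a tweet positive, negative, or neutral by counting dictionary hits."""
    # Lowercase the text and split it into simple alphabetic tokens.
    words = re.findall(r"[a-z']+", text.lower())
    positive_hits = sum(1 for word in words if word in POSITIVE_WORDS)
    negative_hits = sum(1 for word in words if word in NEGATIVE_WORDS)
    if positive_hits > negative_hits:
        return "positive"
    if negative_hits > positive_hits:
        return "negative"
    # No hits, or an equal number of positive and negative hits.
    return "neutral"
```

Under these assumptions, a call such as `classify_tweet("I love this phone, great camera")` yields the label `positive`, while a tweet containing no dictionary words is labelled `neutral`.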

II. PROPOSED SYSTEM

Fig. 1: Proposed System

We retrieve tweets using Flume and save them in HDFS for later analysis. Twitter exposes an API for retrieving tweets.

The proposed system extracts raw data from Twitter and then assigns a positive, negative, or neutral sentiment value to each tweet.

The classified tweets are then summarized in a pie chart produced using Excel.
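The figures behind such a pie chart are simply the share of tweets in each sentiment group. A minimal Python sketch of that calculation is given below; the function name `sentiment_shares` and the list-of-labels input are assumptions for illustration, and in the proposed system the charting itself is done in Excel.

```python
from collections import Counter


def sentiment_shares(labels):
    """Return the percentage of tweets in each sentiment group."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        # No tweets were classified, so there are no slices to draw.
        return {}
    return {label: 100.0 * count / total for label, count in counts.items()}
```

Each value in the returned dictionary corresponds to one slice of the pie chart, and the values sum to 100 whenever at least one tweet has been classified.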

III. METHODOLOGY

Fig. 2: Methodology


(IJSRD/Vol. 5/Issue 02/2017/030)


In this project, we follow these steps:

Creating a Twitter application

Getting data using Flume

Querying using Hive Query Language (HQL)

A. Creating Twitter Application

To perform sentiment analysis on Twitter data, we first create an account on the Twitter developer site and create an application by clicking the new application button provided there. After creating the application, we generate access tokens so that we do not need to supply our authentication details each time. The application is also assigned a consumer key, which is used to access it when retrieving Twitter data. We copy these key and token details into the Flume configuration file so that the required data can be retrieved from Twitter in the form of tweets.

B. Getting data using Flume

After creating an application on the Twitter developer site, we use the consumer key and secret along with the access token and secret values. With these we can access Twitter and retrieve exactly the information we want. Everything is received in JSON format and stored in HDFS, at the location we have specified for saving all the data that comes from Twitter.
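To illustrate what the collected records look like to a downstream step, the following Python sketch reads newline-delimited JSON records and extracts the identifier and text of each tweet. It assumes that each record is a JSON object with `id` and `text` fields and that there is one record per line; the function name `extract_tweets` is hypothetical, and in the proposed system this step is carried out in Hive rather than in Python.

```python
import json


def extract_tweets(lines):
    """Yield (id, text) pairs from newline-delimited JSON tweet records."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip records that are not valid JSON
        if not isinstance(record, dict):
            continue  # skip JSON values that are not objects
        text = record.get("text")
        if text is None:
            continue  # skip records without tweet text
        yield record.get("id"), text
```

Malformed or incomplete records are skipped rather than raising an error, since a streaming collection may contain records other than complete tweets.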

C. Querying using Hive Query Language (HQL)

After Flume is run with the above configuration, the Twitter data is automatically saved into HDFS at the storage path we set for the data collected by Flume.

We also use UDFs (User Defined Functions) to perform the sentiment analysis on the tables created using Hive.

IV. FUTURE SCOPE AND CONCLUSION

This project can be used by various companies and agencies to review their products and to predict, from the analysed data, the profit they might make. Major decisions can also be made by judging the online reviews of various products. For example, a movie production house can judge whether the trailer of a movie has fared well with the masses and, based on this data, decide where to screen the movie more widely.

In today's increasingly digital world, views and sentiments are expressed and shared on the Internet. This information can be analysed to improve the products of brands and companies and to increase their profits.

ACKNOWLEDGMENT

We thank Prof. Satish Ranbhise for his continued guidance and advice throughout the course of this project. His guidance helped us at every stage in rethinking ideas and implementing them in the project. Without his support this project would not have been a success.

We would also like to thank our HOD, Prof. Mahindra Patil, and the other faculty members of the Department of Computer Engineering, who have helped and guided us towards becoming graduates.

We would like to thank all our friends and family for their continuous support, which has helped us achieve this personal goal.

