
Monitoring Opinion on ESOP through Social Media and Clustering its Polarity


Abstract

Facebook and Twitter started off as friendship and networking tools, but they have evolved into potent weapons of social mobilization. These tools not only monitor users who actively provide sentiment or opinion (attitude), but also add unique insights into product penetration and reflect the changing moods of the public. In this paper we collect feedback on ESOP at a very basic level through different social monitoring tools and observe its polarity. The proposed approach combines K-Means clustering, the Expectation Maximization (EM) clustering algorithm and VAR K-Means to group the monitoring tools within a particular time span. The data mining tool Tanagra 1.4 is used to implement the approach and present the results.

Keywords:

Social media, Sentiment Analysis, Opinion Mining, Social Monitoring Tools, Tanagra1.4

1. SOCIAL MEDIA

Social media are popularly known as democracy’s pipeline, an amplifier of unfiltered emotion, an organism with a million tongues and twice as many eyes, a virtual megaphone with a global reach. This networking tool is now a weapon of public mobilization. The unbridled power of social media was undoubtedly a top trend, if not the number one trend, of 2012 in India. In many cases, it set the agenda of public discourse. It is a movement without leaders, without any organized structure and without any pre-determined plan, and it is fundamentally taking a new shape. Recent surveys on media by the research firms Socialbakers and Semiocast, a Paris-based firm, report that:

1. Facebook has 65 million active users in India, placing India in the top five worldwide in terms of users.
2. Half of Indian Facebook users are below the age of 25.
3. 75% of web users in India are below the age of 35.
4. One in every four online minutes is spent on social networking sites.
5. Twitter has over 200 million users globally, who post half a billion tweets; India ranks 6th in terms of total Twitter accounts.
6. 42% of smartphone users in India use their device to access news.
7. Nearly 72% of netizens live in urban areas.
8. Nearly 52% of internet users connect to the web via a mobile phone.
9. Among urban internet users, 67% connect to the web for social networking and 87% for communication.
10. About 1.5 lakh (150,000) new internet users are added every month in India.

2. SENTIMENT ANALYSIS

In general, opinion mining or sentiment analysis is an important sub-discipline within data mining and Natural Language Processing (NLP) that deals with building systems that explore users’ opinions expressed in blog posts, comments, reviews, discussions, news, feedback or tweets about a product, policy, person or topic. Sentiment analysis, opinion mining, opinion extraction, sentiment mining, subjectivity analysis, affect analysis, emotion analysis, review mining, etc. now all fall under the umbrella of sentiment analysis or opinion mining. All these platforms provide a huge amount of valuable information that we are interested in analysing. To be specific, opinion mining can be defined as a sub-discipline of computational linguistics that focuses on extracting people’s opinions from the web. Given a piece of text, it determines which part expresses an opinion, who wrote the opinion, and what is being commented on. Sentiment analysis, on the other hand, is about determining the subjectivity, polarity (positive, negative or neutral) and polarity strength (weakly positive, mildly positive, strongly positive, etc.). In other words, it looks into a piece of text to find out what the writer’s opinion is.
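To make the distinction between polarity and polarity strength concrete, the sketch below maps a numeric sentiment score in [-1, 1] to a polarity label and a strength bucket. The score range, the neutral band and the 0.35/0.7 cut-offs are illustrative assumptions, not values used by any of the tools discussed in this paper.

```python
def polarity_and_strength(score: float, neutral_band: float = 0.05) -> tuple:
    """Map a sentiment score in [-1, 1] to (polarity, strength).

    The neutral band and the strength cut-offs are illustrative assumptions.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must lie in [-1, 1]")
    if abs(score) <= neutral_band:
        return ("neutral", "none")
    polarity = "positive" if score > 0 else "negative"
    magnitude = abs(score)
    # Bucket the magnitude into weak / mild / strong (assumed thresholds).
    if magnitude < 0.35:
        strength = "weakly"
    elif magnitude < 0.7:
        strength = "mildly"
    else:
        strength = "strongly"
    return (polarity, strength)


for s in (0.9, 0.4, -0.2, 0.0):
    print(s, polarity_and_strength(s))
```

Real tools derive the score itself from the text; only the final labelling step is shown here.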

2.1. SENTIMENT ANALYSIS RESEARCH

The urban educated reader may first check out blogs and read product reviews before deciding on brands, be it new LED TVs, home décor, DVDs, insurance plans or policies. In general, sentiment analysis has been investigated mainly at three levels:

1. Document level: The task at this level is to classify whether a whole opinion document expresses a positive or negative sentiment. For example, given a product review, the system determines whether the review expresses an overall positive or negative opinion about the product. This task is commonly known as document-level sentiment classification. This level of analysis assumes that each document expresses opinions on a single entity (e.g., a single product).

2. Sentence level: The task at this level goes to the sentences and determines whether each sentence expresses a positive, negative, or neutral opinion. Neutral usually means no opinion. This level of analysis is closely related to subjectivity classification, which distinguishes sentences that express factual information (called objective sentences) from sentences that express subjective views and opinions (called subjective sentences). However, note that subjectivity is not equivalent to sentiment, as many objective sentences can imply opinions, e.g., “We bought the car last month and the windshield wiper has fallen off.”

3. Entity and aspect level: Neither the document-level nor the sentence-level analysis discovers what exactly people liked and did not like. Aspect-level analysis is finer-grained. Instead of looking at language constructs (documents, paragraphs, sentences, clauses or phrases), it looks directly at the opinion itself. It is based on the idea that an opinion consists of a sentiment (positive or negative) and a target (of opinion). An opinion whose target has not been identified is of limited use. Realizing the importance of opinion targets also helps us understand the sentiment analysis problem better. For example, the sentence “The iPhone’s call quality is good, but its battery life is short” evaluates two aspects of the iPhone (entity): call quality and battery life. The sentiment on the iPhone’s call quality is positive, but the sentiment on its battery life is negative. The call quality and battery life of the iPhone are the opinion targets. Based on this level of analysis, a structured summary of opinions about entities and their aspects can be produced.
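The aspect-level idea can be illustrated on the iPhone sentence above. The sketch below splits the sentence at "but", looks for known aspect terms in each clause and scores the clause with a tiny opinion lexicon. The aspect list and lexicon are hypothetical and chosen only for this example. A word such as "short" is negative for battery life but need not be negative for other aspects, which is one reason practical aspect-level systems need context-dependent lexicons.

```python
import re

# Hypothetical aspect vocabulary and opinion lexicon, for illustration only.
ASPECTS = ["call quality", "battery life"]
LEXICON = {"good": 1, "great": 1, "short": -1, "poor": -1}


def aspect_sentiment(sentence: str) -> dict:
    """Return {aspect: polarity} for the aspects found in a sentence."""
    # Treat "but" as a clause boundary so each aspect gets its own opinion words.
    clauses = re.split(r",?\s*\bbut\b\s*", sentence.lower())
    result = {}
    for clause in clauses:
        words = re.findall(r"[a-z']+", clause)
        score = sum(LEXICON.get(w, 0) for w in words)
        label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
        for aspect in ASPECTS:
            if aspect in clause:
                result[aspect] = label
    return result


print(aspect_sentiment("The iPhone's call quality is good, but its battery life is short"))
# {'call quality': 'positive', 'battery life': 'negative'}
```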

3. SENTIMENT MONITORING TOOLS

Social media has created a new world of venting and consumer voice. This changing online environment allows customers to comment on brands and personal experiences, which is why social media monitoring is necessary. One helpful aspect of monitoring is sentiment: the attitude and tone of a user’s comment, review or mention with respect to the brand. There are hundreds of sentiment analysis programs available, but most come at a cost.

4. PROPOSED WORK

In our proposed work, we examine the document-level sentiment towards ESOP using the four best sentiment monitoring tools:

1. Social Mention: tracks and measures what people are saying about you, your company, a new product, a policy or any topic across the Web’s social media landscape (100+ social media platforms).

Fig 1: Social Mention feedback on ESOP

2. Trackur: an online reputation and social media monitoring tool to track trends, understand influence, receive alerts and tag sentiment.

Fig 2: Trackur feedback on ESOP

3. Twendz: a Twitter-mining web application that highlights conversation themes and the sentiment of tweets, and pinpoints top influencers minute by minute.

Fig 3: Twendz feedback on ESOP

4. Twitrratr: simply analyses terms based on a pre-defined glossary and gives highly simplified and unreliable results.

Fig 4: Twitrratr feedback on ESOP
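Glossary-based tools such as Twitrratr are described above only at a high level. As a rough sketch of the general idea (not Twitrratr's actual implementation), each tweet can be labelled by counting matches against positive and negative word lists, and the labels tallied into the kind of positive/negative/neutral counts reported in Table 1. The word lists and sample tweets below are hypothetical.

```python
from collections import Counter

# Hypothetical glossary; the real word lists used by Twitrratr are not given here.
POSITIVE = {"good", "great", "benefit", "love", "useful"}
NEGATIVE = {"bad", "poor", "loss", "hate", "risky"}


def classify(tweet: str) -> str:
    """Label a tweet by comparing positive and negative glossary hits."""
    # Each distinct word counts once; surrounding punctuation is stripped.
    words = {w.strip(".,!?:;\"'").lower() for w in tweet.split()}
    pos = len(words & POSITIVE)
    neg = len(words & NEGATIVE)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"  # no hits, or a tie between positive and negative hits


def polarity_counts(tweets) -> dict:
    """Tally labels into positive / negative / neutral counts."""
    counts = Counter(classify(t) for t in tweets)
    return {label: counts[label] for label in ("positive", "negative", "neutral")}


sample = [
    "ESOP is a great benefit for staff",
    "ESOP payouts look risky",
    "Company announces ESOP details",
]
print(polarity_counts(sample))  # {'positive': 1, 'negative': 1, 'neutral': 1}
```

Because such a classifier ignores negation, sarcasm and context, its output is exactly the kind of "highly simplified" result the tool description warns about.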

5. CLUSTERING USING TANAGRA 1.4

Clustering is an important data mining technique that categorizes unlabeled objects into several clusters such that objects belonging to the same cluster are more similar to each other than to those belonging to different clusters. A cluster is a collection of objects that share some common characteristics. The objects’ attribute values lie in an interval [a, b], commonly normalized to [0, 1]. The distance between two clusters may involve some or all elements of the two clusters.

Tanagra 1.4 is free data mining software for academic and research purposes. It offers several data mining methods from the areas of exploratory data analysis, statistical learning, machine learning and databases. The main purpose of the Tanagra project is to give researchers easy-to-use data mining software. The second purpose is to offer researchers an architecture that allows them to easily add their own data mining methods and compare their performance; in this sense, Tanagra acts more as an experimental platform. The third and last purpose, aimed at novice developers, is to disseminate a possible methodology for building this kind of software. In this way, Tanagra can be considered a pedagogical tool for learning programming techniques.
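The two clustering ideas mentioned above, attribute values scaled into an interval such as [0, 1] and a between-cluster distance that uses some or all elements of the two clusters, can be sketched as follows. Min-max scaling, Euclidean distance, and single versus average linkage are standard choices picked here for illustration; the paper does not state which scaling or linkage Tanagra applies.

```python
from math import dist  # Euclidean distance, Python 3.8+


def min_max_scale(column, a=0.0, b=1.0):
    """Linearly rescale a list of numbers into [a, b]."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column: map everything to the lower bound
        return [a for _ in column]
    return [a + (x - lo) * (b - a) / (hi - lo) for x in column]


def single_linkage(c1, c2):
    """Distance between the closest pair of points (uses only some elements)."""
    return min(dist(p, q) for p in c1 for q in c2)


def average_linkage(c1, c2):
    """Mean distance over all pairs of points (uses all elements)."""
    return sum(dist(p, q) for p in c1 for q in c2) / (len(c1) * len(c2))


# Positive-polarity counts from Table 1, scaled into [0, 1].
print(min_max_scale([13, 21, 7, 17]))
print(single_linkage([(0, 0), (1, 0)], [(3, 0), (5, 0)]))   # 2.0
print(average_linkage([(0, 0), (1, 0)], [(3, 0), (5, 0)]))  # 3.5
```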

6. DATA SAMPLE

The dataset used in our experimental research was acquired from various social monitoring tools and then imported into Tanagra 1.4. The first step is to use the feature selection component to define the status and parameters. The next step is to click on the clustering component and choose the K-Means method. The other two clustering methods follow the same procedure in defining their status.
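The paper does not say how the data was loaded into Tanagra. Assuming the common route of a tab-separated text file whose first row holds the attribute names, the Table 1 counts could be prepared as below; the attribute names are hypothetical.

```python
import csv
import io

# Rows from Table 1: tool, time taken, positive, negative, neutral, total.
ROWS = [
    ("Social Mention", 9, 13, 0, 284, 297),
    ("Twendz", 6, 21, 3, 76, 100),
    ("Trackur", 5, 7, 6, 1, 13),
    ("Twitrratr", 3, 17, 1, 153, 171),
]
# Hypothetical attribute names (no spaces, to keep the import simple).
HEADER = ("Tool", "TimeTaken", "Positive", "Negative", "Neutral", "Total")


def to_tab_separated(rows, header=HEADER) -> str:
    """Serialize rows as tab-separated text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


print(to_tab_separated(ROWS))
```

The returned string can be written to a `.txt` file and opened as a new dataset; check the Tanagra documentation for the exact import options of your version.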

Table 1: Polarity strength towards ESOP

Tool used | Time taken | No. of positive polarity | No. of negative polarity | No. of neutral polarity | Total number of tweets
Social Mention | 9 | 13 | 0 | 284 | 297
Twendz | 6 | 21 | 3 | 76 | 100
Trackur | 5 | 7 | 6 | 1 | 13
Twitrratr | 3 | 17 | 1 | 153 | 171
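Because the tools returned very different numbers of items, converting the counts in Table 1 into percentages makes them easier to compare. The sketch below uses the sum of the three polarity counts as the denominator, because for Trackur the reported total (13) does not match the sum of its counts (7 + 6 + 1 = 14); the script flags that mismatch rather than correcting it.

```python
# Counts copied from Table 1: (positive, negative, neutral, reported total).
TABLE1 = {
    "Social Mention": (13, 0, 284, 297),
    "Twendz": (21, 3, 76, 100),
    "Trackur": (7, 6, 1, 13),
    "Twitrratr": (17, 1, 153, 171),
}


def polarity_shares(pos, neg, neu):
    """Percentage of positive, negative and neutral items, to one decimal place."""
    n = pos + neg + neu
    return tuple(round(100 * x / n, 1) for x in (pos, neg, neu))


for tool, (pos, neg, neu, total) in TABLE1.items():
    counted = pos + neg + neu
    note = "" if counted == total else f"  [counts sum to {counted}, table reports {total}]"
    print(f"{tool}: {polarity_shares(pos, neg, neu)}{note}")
```

With this view, neutral mentions dominate for Social Mention (95.6%) and Twitrratr (89.5%), while Trackur's small sample is mostly polarized.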

Fig 5: Dataset view window in Tanagra 1.4

7. IMPLEMENTATION OF ALGORITHMS

7.1. K-Means is a well-known partitioning method. Objects are classified as belonging to one of k groups. Cluster membership is determined by calculating the centroid of each group and assigning each object to the group with the closest centroid. This approach minimizes the overall within-cluster dispersion by iteratively reallocating cluster members.

Description: Clustering with the K-Means method (Forgy or MacQueen), continuous input attributes.
Precondition: One or more continuous attributes must be available in the dataset.
Target attribute(s): None.
Input attribute(s): One or more continuous attributes.
Post condition: A new discrete attribute is added to the dataset. Each value of the attribute corresponds to a cluster.
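The assignment and update loop described above can be sketched in plain Python. This is a generic Lloyd-style K-Means with Forgy-style initialization (k randomly chosen objects as starting centroids), plus the R-square measure (1 minus the within-cluster sum of squares over the total sum of squares) of the kind reported in Fig 6. It is not Tanagra's source code, and the seed and iteration limit are arbitrary choices.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd-style K-Means; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Forgy-style initialization: k randomly chosen objects become the first centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each object joins the group with the closest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # no object changed cluster: converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def r_square(points, labels, centroids):
    """Share of total variance explained by the clustering: 1 - W / T."""
    dims = len(points[0])
    grand = [sum(p[d] for p in points) / len(points) for d in range(dims)]
    total = sum(sq_dist(p, grand) for p in points)
    within = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return 1 - within / total


# (positive, negative, neutral) counts per tool from Table 1.
data = [(13, 0, 284), (21, 3, 76), (7, 6, 1), (17, 1, 153)]
labels, centroids = kmeans(data, k=2)
print(labels, round(r_square(data, labels, centroids), 4))
```

Like any K-Means run, the result on the Table 1 rows can depend on the starting centroids, which is why tools such as Tanagra report several trials.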

Fig 6: K-Means R-square calculation for each trial

Fig 7: K-Means cluster centroids

7.2. Expectation Maximization (EM) is a well-established clustering algorithm in the statistics community. EM is a model-based algorithm that assumes the dataset can be modeled as a linear combination of multivariate normal distributions; the algorithm finds the distribution parameters that maximize a model quality measure called the log-likelihood.

Description: Clustering with the Expectation-Maximization clustering algorithm. Gaussian mixture. Continuous inputs.
Precondition: One or more continuous attributes must be available in the dataset.
Target attribute(s): None.
Input attribute(s): One or more continuous attributes.
Post condition: A new discrete attribute is added to the dataset. Each value of this attribute corresponds to a cluster.
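To make the EM description concrete, the sketch below fits a one-dimensional Gaussian mixture (the setting above is multivariate; one dimension keeps the example short). The E-step computes each point's membership probabilities, the M-step re-estimates the weights, means and variances, and the loop stops when the log-likelihood stops improving. Initializing the means at evenly spaced sorted values, the variance floor and the tolerance are assumptions of this sketch, not Tanagra's settings.

```python
import math


def normal_pdf(x, mu, var):
    """Density of a univariate normal distribution."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(xs, k, max_iter=200, tol=1e-8):
    """Fit a k-component 1-D Gaussian mixture by EM.

    Returns (weights, means, variances, log-likelihood from the final E-step).
    """
    n = len(xs)
    ordered = sorted(xs)
    # Deterministic start: means at evenly spaced order statistics.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    grand = sum(xs) / n
    variances = [sum((x - grand) ** 2 for x in xs) / n] * k
    weights = [1.0 / k] * k
    prev_ll = -math.inf
    for _ in range(max_iter):
        # E-step: responsibilities and log-likelihood under the current parameters.
        resp, ll = [], 0.0
        for x in xs:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            ll += math.log(total)
            resp.append([d / total for d in dens])
        # M-step: re-estimate each component from its weighted members.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, 1e-6)  # floor avoids a collapsing component
        if ll - prev_ll < tol:
            break  # log-likelihood has stopped improving
        prev_ll = ll
    return weights, means, variances, ll


xs = [1.0, 1.2, 0.8, 1.1, 0.9, 10.0, 10.2, 9.8, 10.1, 9.9]
weights, means, variances, ll = em_gmm_1d(xs, k=2)
print([round(m, 2) for m in means], [round(w, 2) for w in weights], round(ll, 3))
```

The log-likelihood computed in the E-step is the same quality criterion that Fig 8 reports for the EM run.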


Fig 8: EM clustering: log-likelihood calculated as the cluster quality criterion

Fig 9: EM clustering cluster centroids

7.3. VAR K-Means

Description: Clustering variables using the K-Means approach on latent variables.
Precondition: Two or more continuous attributes must be available in the dataset.
Target attribute(s): None.
Input attribute(s): Two or more continuous attributes.
Post condition: A set of continuous attributes representing the clusters is available.
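VAR K-Means clusters variables (columns) rather than objects: each cluster is summarized by a latent variable, and every variable joins the cluster whose latent variable it correlates with best. The sketch below is a simplified version under stated assumptions: the latent variable is taken as the standardized mean of the cluster's standardized members, membership uses plain (signed) correlation, and the starting partition is round-robin. Tanagra's VAR K-Means component may differ, for example by using each cluster's first principal component and squared correlations, and, as with K-Means, the result depends on the starting partition. The data is made up.

```python
from statistics import mean, pstdev


def standardize(col):
    """Center and scale a column to mean 0 and population std 1 (col must not be constant)."""
    m, s = mean(col), pstdev(col)
    return [(x - m) / s for x in col]


def corr(za, zb):
    """Pearson correlation of two already-standardized columns."""
    return sum(a * b for a, b in zip(za, zb)) / len(za)


def var_kmeans(columns, k, max_iter=50):
    """Cluster variables around latent variables; returns (labels, latents)."""
    z = [standardize(c) for c in columns]
    # Round-robin starting partition (assumption); requires k <= number of columns.
    labels = [i % k for i in range(len(z))]
    latents = []
    for _ in range(max_iter):
        latents = []
        for j in range(k):
            # Empty cluster: fall back to variable j as its seed.
            members = [v for v, lab in zip(z, labels) if lab == j] or [z[j]]
            avg = [sum(vals) / len(members) for vals in zip(*members)]
            latents.append(standardize(avg))
        # Each variable joins the cluster whose latent variable it correlates with best.
        new_labels = [max(range(k), key=lambda j: corr(v, latents[j])) for v in z]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, latents


# Two pairs of related variables observed on six objects (made-up data).
a = [1, 2, 3, 4, 5, 6]
c = [3, 1, 3, 1, 3, 1]
b = [1, 2, 3, 4, 5, 7]
d = [3, 1, 3, 1, 3, 2]
labels, _ = var_kmeans([a, c, b, d], k=2)
print(labels)  # [0, 1, 0, 1]
```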

Fig 10: Cluster members and R-square value

Fig 11: VAR K-Means cluster correlation

8. CONCLUSION

After analysing the results of the three clustering algorithms run in Tanagra 1.4, the following tables and charts were generated. They indicate that cluster 2 has the most negative value compared with cluster 1 and cluster 3, irrespective of the algorithm chosen.

Table 2: Number of positive polarity

Algorithm | Cluster 1 | Cluster 2 | Cluster 3
K-means | 19.000000 | 19.000000 | 1.0000
EM | 7.000000 | 19.000000 | 0.2075
VAR K-means | 13.000000 | -99999.000000 | -0.4852

Table 3: Number of negative polarity

Algorithm | Cluster 1 | Cluster 2 | Cluster 3
K-means | 2.000000 | 2.000000 | -0.4852
EM | 6.000000 | 3.000000 | -0.9373
VAR K-means | 0.000000 | -99999.000000 | 1.0000

Table 4: Number of neutral polarity

Algorithm | Cluster 1 | Cluster 2 | Cluster 3
K-means | 114.500000 | 114.500000 | 0.2075
EM | 0.000000 | 142.000000 | 1.0000
VAR K-means | 284.000000 | -99999.000000 | -0.9373

Fig 12: Chart of positive polarity

Fig 13: Chart of negative polarity

Fig 14: Chart of neutral polarity


Nithya Ramachandran is an Assistant Professor in the Computer Science Department at R.V.S College of Arts and Science, Sulur, Coimbatore, Tamil Nadu, India, and is pursuing a part-time Ph.D. in the area of data mining. Her research focuses on data mining and its supporting open source tools.
