Twitter Sentiment Analysis using Hadoop

DESCRIPTION

This demonstration-based session shows how to use an HDInsight cluster (Apache Hadoop exposed as an Azure service) to run sentiment analysis on live Twitter feeds for a specific keyword or brand. Sentiment analysis parses unstructured data that represents opinions, emotions, and attitudes contained in sources such as social media posts, blogs, online product reviews, and customer support interactions. The demo uses Hadoop Hive and MapReduce to schematize, refine, and transform raw Twitter data. It also covers the Hive endpoint that HDInsight exposes so that client applications can consume HDInsight data through the Hive ODBC interface. Finally, the session uses current self-service BI tools (Power View, Power Query, and Power Map) to show how you can build interactive visualizations of your Twitter data with just a few mouse clicks, in support of brand promotion and productivity.

Speakers

• Debarchan Sarkar

Agenda

• HDInsight – Hadoop on Windows

• Microsoft Offerings

• Demo – Twitter sentiment analysis

Social Media Opinion Mining

Microsoft Data Platform

Demo – Twitter Sentiment Analysis

• Enterprises may analyze sentiment about:

• Product

• Service

• Competitors

• Reputation

Sentiment analysis is used to understand how the public feels about something at a particular moment in time, and to track how those opinions change over time.

What We Can Determine

• What do people think about our product (service, company etc.)?

• How positive (or negative) are people about our product in different geographical locations?

• What would people prefer our product to be like?

Basic Steps For Sentiment Analysis

• Filtering – we remove URL links (e.g. http://example.com) and Twitter user names (e.g. @alex, where the @ symbol indicates a user name).

• Tokenization – we segment text by splitting it by spaces and punctuation marks, and form a bag of words.

• Removing stop words – we remove articles (“a”, “an”, “the”) from the bag of words.

• Constructing n-grams – we make a set of n-grams out of consecutive words. A negation (such as “no” or “not”) is attached to the word that precedes or follows it.
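The four steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used in the session: the regular expressions, the function names, and the choice to join a negation to the following word with an underscore (falling back to the preceding word when the negation ends the tweet) are all assumptions made for the example.

```python
import re

# Assumed patterns and word lists, chosen only for illustration.
URL_RE = re.compile(r"https?://\S+")
USER_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[a-z0-9']+")
STOP_WORDS = {"a", "an", "the"}
NEGATIONS = {"no", "not"}


def filter_text(text):
    """Filtering: drop URL links and Twitter user names."""
    text = URL_RE.sub(" ", text)
    return USER_RE.sub(" ", text)


def tokenize(text):
    """Tokenization: lowercase, then split on spaces and punctuation."""
    return TOKEN_RE.findall(text.lower())


def remove_stop_words(tokens):
    """Removing stop words: drop the articles."""
    return [t for t in tokens if t not in STOP_WORDS]


def attach_negations(tokens):
    """Join a negation to the next word, or to the previous one if it is last."""
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in NEGATIONS and i + 1 < len(tokens):
            out.append(tok + "_" + tokens[i + 1])
            i += 2
        elif tok in NEGATIONS and out:
            out[-1] = out[-1] + "_" + tok
            i += 1
        else:
            out.append(tok)
            i += 1
    return out


def ngrams(tokens, n=2):
    """Constructing n-grams: every run of n consecutive tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def preprocess(tweet, n=2):
    """Run all four steps on one tweet and return its n-grams."""
    tokens = tokenize(filter_text(tweet))
    tokens = attach_negations(remove_stop_words(tokens))
    return ngrams(tokens, n)
```

For example, `preprocess("@alex I do not like the new phone http://example.com")` drops the user name and the link, removes "the", merges "not" with "like", and returns the bigrams built from the remaining tokens.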

Demo – Twitter Sentiment Analysis – Continued

In this demo, we will:

• Convert the raw Twitter data into a tabular format.

• Use a dictionary file to score the sentiment of each Tweet by the number of positive words compared to the number of negative words, and then assign a positive, negative, or neutral sentiment value to each Tweet.

• Create a new table that includes the sentiment value for each Tweet.

• Project the sentiment, grouped by the geographical location of the users, in an interactive Excel map visualization.

References: https://tweetinvi.codeplex.com/ http://hortonworks.com/
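The demo itself performs these steps with Hive and MapReduce on HDInsight. As a rough illustration of the logic only, the Python sketch below flattens raw tweets into rows, scores each one against a dictionary, and counts sentiment per location. The tiny word list, the function names, and the assumption that each raw tweet is one JSON object with `id`, `text`, and `user.location` fields are hypothetical choices made for this example.

```python
import json
import re
from collections import Counter, defaultdict

# Hypothetical miniature dictionary; a real dictionary file would be far larger.
SENTIMENT_DICTIONARY = {
    "good": 1, "great": 1, "love": 1,
    "bad": -1, "awful": -1, "hate": -1,
}


def to_row(raw_line):
    """Flatten one raw tweet (assumed to be a JSON object) into a table row."""
    tweet = json.loads(raw_line)
    user = tweet.get("user") or {}
    return {
        "id": tweet["id"],
        "text": tweet["text"],
        "location": user.get("location") or "unknown",
    }


def score_tweet(text):
    """Compare the counts of positive and negative dictionary words."""
    words = re.findall(r"[a-z']+", text.lower())
    positive = sum(1 for w in words if SENTIMENT_DICTIONARY.get(w, 0) > 0)
    negative = sum(1 for w in words if SENTIMENT_DICTIONARY.get(w, 0) < 0)
    if positive > negative:
        return "positive"
    if negative > positive:
        return "negative"
    return "neutral"


def add_sentiment(rows):
    """Build a new table that carries a sentiment value for each tweet."""
    return [dict(row, sentiment=score_tweet(row["text"])) for row in rows]


def sentiment_by_location(scored_rows):
    """Count sentiment values per user location, ready for a map chart."""
    counts = defaultdict(Counter)
    for row in scored_rows:
        counts[row["location"]][row["sentiment"]] += 1
    return {location: dict(c) for location, c in counts.items()}
```

Calling `sentiment_by_location(add_sentiment([to_row(line) for line in raw_lines]))` on a list of raw JSON lines yields a mapping from location to sentiment counts, which is the kind of aggregate a map visualization would plot.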

Give Us Feedback

• Support Team’s blog: http://blogs.msdn.com/b/bigdatasupport/

• Facebook Page: https://www.facebook.com/MicrosoftBigData

• Facebook Group: https://www.facebook.com/groups/bigdatalearnings/

• Twitter: @debarchans

© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Thank You!