Sentiment Analysis

Introduction
Data Source for Sentiment analysis
Sentiment Analysis: Problem definition
Sentiment Analysis Tools
Current Research Problems

Introduction

Sentiment Analysis: Sentiment analysis deals with the computational treatment of opinion, sentiment, and subjectivity in text. It starts with a small question, “What do other people think?”, and ends up informing commercial deals worth billions of dollars. After the great success of Web 2.0, sentiment analysis became an in-demand, commercially supported research field.

Important Points:

1. A Web 2.0 site lets its users interact and collaborate with each other in a social media dialogue, as creators of user-generated content in a virtual community.

2. This resulted in social-networking sites, blogs, wikis, video-sharing sites, hosted services, web applications, mashups, folksonomies, etc. The number of internet users has also increased hugely (chart source: http://www.internetworldstats.com/stats.htm).

Data Source for Sentiment analysis

Important Points:

1. Data used in sentiment analysis is generally unstructured text from (1) blog posts, (2) user reviews (of any product), (3) chat records, (4) opinion polls, etc.

2. It may contain noisy symbols, casual language, and emoticons. For example, if you search Twitter for “hungry” with an arbitrary number of u's in the middle (e.g. huuuungry, huuuuuuungry, huuuuuuuuuungry), there will most likely be a nonempty result set.
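The elongated spellings in point 2 are often normalized before any classification. The sketch below shows one common convention, which is an assumption here and not something the slides prescribe: any character repeated three or more times in a row is squeezed down to two. The function name `squeeze_repeats` is made up for illustration.

```python
import re

# Matches any single character followed by two or more copies of itself,
# i.e. a run of three or more identical characters.
_ELONGATED = re.compile(r"(.)\1{2,}")

def squeeze_repeats(text: str) -> str:
    """Collapse runs of 3+ identical characters down to exactly two."""
    return _ELONGATED.sub(r"\1\1", text)

variants = ["huuuungry", "huuuuuuungry", "huuuuuuuuuungry"]
normalized = {squeeze_repeats(v) for v in variants}
print(normalized)  # {'huungry'}
```

All three variants map to the same token, so they can share one feature. Note that the result is "huungry", not "hungry": squeezing to two characters keeps ordinary double letters (as in "good") intact, at the cost of not fully restoring the dictionary form.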

Dataset:

1. Cornell University dataset: It contains movie review data, sentiment polarity datasets, sentiment scale datasets, and subjectivity datasets. URL: http://www.cs.cornell.edu/People/pabo/movie-review-data/

2. Wiki Blog Lists: It contains web links to a large number of famous English blogs and can be obtained from http://en.wikipedia.org/wiki/List_of_blogs

3. BLOGS06 (Macdonald and Ounis, 2006) collection: It contains a 148GB crawl of approximately 100,000 blogs and their respective RSS feeds. The collection has been used for 3 consecutive years by the Text REtrieval Conferences (TREC). The data set can be found at http://www.trec.nist.gov

4. Multi-Domain Sentiment Dataset (version 2.0): It contains product reviews taken from Amazon.com for many product types (domains). Some domains (books and DVDs) have hundreds of thousands of reviews (http://www.cs.jhu.edu/~mdredze/datasets/sentiment/).

Sentiment Analysis: Problem definition

Analyzing sentiment using clear reviews: Such reviews contain either a negative or a positive opinion about a product or topic(s). In these cases it is very simple to identify the positive or negative sentiment. For example:

Product Reviews: Inspiron 1525

Title: Where has customer service gone?

Review: I have an inspirion 1525-it was not listed in the models to review. DO NOT BUY THIS COMPUTER!!! The LCD has cracked after less than 9 months and Dell refuses to fix it under warranty. They will not tell me why they will not fix it and now after sending it all the way to Ontario to find out why there was lines on my screen- the service depot has returned it to me and the keyboard no longer functions. I can not use it all all now-lines or not!!!

Title: Insprion 1525

Review: I rec'd my Inspiron 1525 about 1 month ago, and I LOVE it!! It is quicker than my Dell desktop, very portable and I love Windows 7. I opted for the 6 cell battery and am so thankful that I did - I almost wish I would have got the 9 cell. So, if you are looking for an everyday computer - this one is a great deal and a great computer...but I would reccommend upgrading the battery!

RESULT: 1 out of 2 (50%) customers would recommend this product to a friend.

Sentiment Analysis: Problem definition

Analyzing sentiment using multi-theme documents: In this type of document the problem statement is not always so clear. It can be broken into several different problems, and successful sentiment analysis depends on many issues, including (but not limited to):

1. Sometimes such texts contain multiple sentiments related to two or more issues.

2. Sometimes such documents contain both kinds of sentiment, i.e., negative and positive. Here, identifying the most important one is a major issue.

3. In some cases the problem can be converted into multi-subjective sentiment analysis.

Example: “(1) I bought an iPhone a few days ago. (2) It was such a nice phone. (3) The touch screen was really cool. (4) The voice quality was clear too. (5) Although the battery life was not long, that is ok for me. (6) However, my mother was mad with me as I did not tell her before I bought it. (7) She also thought the phone was too expensive, and wanted me to return it to the shop. … ”

Description: The above text contains seven sentences in total. It contains both kinds of sentiment: positive sentiment from the buyer and negative sentiment from his mother. It also covers two issues: product quality (with positive sentiment attached) and cost (with negative sentiment attached). Deciding which sentiment is more important is therefore also a problem.
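One way to make the difficulty concrete is to record each opinion in the iPhone example as a (holder, aspect, polarity) tuple. The labels below are hand-assigned for illustration only; they are an assumption, not the output of any tool, and the `Opinion` type and aspect names are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple

class Opinion(NamedTuple):
    sentence: int  # sentence number in the example text
    holder: str    # who holds the opinion
    aspect: str    # what the opinion is about
    polarity: int  # +1 positive, -1 negative

# Hand-labelled for illustration; sentence 1 carries no opinion.
opinions = [
    Opinion(2, "buyer", "phone", +1),
    Opinion(3, "buyer", "touch screen", +1),
    Opinion(4, "buyer", "voice quality", +1),
    Opinion(5, "buyer", "battery life", -1),
    Opinion(6, "mother", "purchase", -1),
    Opinion(7, "mother", "price", -1),
]

def net_polarity_by_holder(items):
    """Sum polarities separately for each opinion holder."""
    totals = defaultdict(int)
    for op in items:
        totals[op.holder] += op.polarity
    return dict(totals)

print(net_polarity_by_holder(opinions))    # {'buyer': 2, 'mother': -2}
print(sum(op.polarity for op in opinions))  # 0
```

Under these labels the document-level sum is zero, even though each holder has a clear stance. A single positive/negative label for the whole text therefore hides exactly the structure the description points out.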

Current Trends and Techniques

Some novel approaches:

1. Document-level sentiment classification: This technique identifies whether a given document expresses positive or negative sentiment about a topic. Classification techniques are generally used for this. The usual features are: (1) terms and their occurrence frequency (for example, TF-IDF weights), (2) part-of-speech (POS) tags, (3) opinion words and phrases, (4) syntactic dependencies, and (5) negative and positive words.

2. Using unsupervised learning: For example, a POS tagger is used to identify two-word phrases, and the orientation of each extracted phrase is then estimated using pointwise mutual information (PMI).

3. Sentiment analysis at sentence level: Techniques using this approach treat each sentence as the source of a single opinion. For a given sentence s, two sub-tasks are applied: (a) Subjectivity classification: determine whether s is a subjective sentence or an objective sentence, and (b) Sentence-level sentiment classification: if s is subjective, determine whether it expresses a positive or negative opinion.

4. Some other approaches: For example, a unified framework in which one can use background lexical information in terms of word-class associations, and refine this information for specific domains using any available training examples (Melville et al., 2009). To deal with the problem of topic shift within blog articles, another approach proposes text extraction techniques to create topic-specific sub-documents, which are used to train a sentiment classifier (O’Hare et al., 2009). The latter work shows that such approaches provide a substantial improvement over full-document classification and that word-based approaches perform better than sentence-based or paragraph-based approaches.
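Two of the building blocks named in the list above, TF-IDF term weights (item 1) and PMI-based semantic orientation (item 2), can be sketched in a few lines. This is a minimal illustration under stated assumptions: the toy corpus and all hit counts are invented, the TF-IDF variant shown (length-normalized term frequency times log(N/df)) is only one of several in use, and the orientation score follows the form used by Turney (2002, in the references), which compares a phrase's co-occurrence with "excellent" against its co-occurrence with "poor" and adds a small smoothing constant to avoid division by zero.

```python
import math
from collections import Counter

def tf_idf(corpus):
    """Return {doc_name: {term: weight}} using tf * log(N / df)."""
    n_docs = len(corpus)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in corpus.values() for term in set(tokens))
    weights = {}
    for name, tokens in corpus.items():
        counts = Counter(tokens)
        weights[name] = {
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
    return weights

def semantic_orientation(near_excellent, near_poor,
                         hits_excellent, hits_poor, smoothing=0.01):
    """PMI(phrase, 'excellent') - PMI(phrase, 'poor'), from raw hit counts."""
    numerator = (near_excellent + smoothing) * hits_poor
    denominator = (near_poor + smoothing) * hits_excellent
    return math.log2(numerator / denominator)

# Toy corpus, invented for illustration.
docs = {
    "pos": "great phone great screen".split(),
    "neg": "poor battery poor service".split(),
    "mix": "great screen poor battery".split(),
}
weights = tf_idf(docs)
# "phone" occurs in one document only, so it outweighs "screen" (two documents).
print(weights["pos"]["phone"] > weights["pos"]["screen"])  # True

# Hypothetical hit counts for some two-word phrase.
so = semantic_orientation(near_excellent=80, near_poor=20,
                          hits_excellent=1000, hits_poor=1000)
print(so > 0)  # True: the phrase leans positive
```

A positive orientation score marks a phrase as leaning positive and a negative score as leaning negative; averaging the scores of all extracted phrases gives a document-level decision without any labelled training data.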

Sentiment Analysis Tools

Twitter Sentiment (http://twittersentiment.appspot.com/): It is a freely available, simple sentiment analysis tool. It supports the following uses: 1. Brand management (e.g. windows 7), 2. Polling (e.g. obama), 3. Purchase planning (e.g. kindle), 4. Technology planning (e.g. streaming api), 5. Discovery (e.g. iphone app).

Techniques applied: It uses N-grams (N = 1, 2, and 3), together with the Stanford POS tagger, to identify the emotions attached to Twitter statements. It removes emoticons (symbols that show emotion, e.g. :) ), as they can mislead the final result. Finally, it applies three different classifiers, (1) a Naïve Bayes classifier, (2) a maximum-entropy model, and (3) a support vector machine, to classify the sentiment of Twitter statements.
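The N-gram features and the first of the three classifiers can be sketched as follows. This is a minimal illustration, not the tool's actual implementation: the four training tweets are invented, the class name `NaiveBayes` is hypothetical, and add-one (Laplace) smoothing is an assumption made here so that unseen N-grams do not zero out a class.

```python
import math
from collections import Counter

def ngrams(tokens, max_n=3):
    """Return all 1- to max_n-grams as space-joined strings."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]

class NaiveBayes:
    def __init__(self):
        self.feature_counts = {}     # label -> Counter of N-gram features
        self.doc_counts = Counter()  # label -> number of training texts
        self.vocab = set()           # all features seen in training

    def train(self, examples):
        for text, label in examples:
            feats = ngrams(text.lower().split())
            self.feature_counts.setdefault(label, Counter()).update(feats)
            self.doc_counts[label] += 1
            self.vocab.update(feats)

    def predict(self, text):
        feats = ngrams(text.lower().split())
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.feature_counts.items():
            total = sum(counts.values())
            # Log prior plus log likelihood of every feature.
            score = math.log(self.doc_counts[label] / total_docs)
            for f in feats:
                # Add-one smoothing keeps unseen features from zeroing the score.
                score += math.log((counts[f] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Invented training data, for illustration only.
model = NaiveBayes()
model.train([
    ("i love my new kindle", "positive"),
    ("what a great phone", "positive"),
    ("i hate this update", "negative"),
    ("terrible battery life", "negative"),
])
print(ngrams(["a", "b", "c"]))               # ['a', 'b', 'c', 'a b', 'b c', 'a b c']
print(model.predict("love my great phone"))  # positive
print(model.predict("i hate this battery"))  # negative
```

Bigrams and trigrams let the classifier pick up short phrases such as "i hate" in addition to single words, at the cost of a much larger and sparser feature space.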

Sentiment Analysis Tools

LingPipe (http://alias-i.com/lingpipe/index.html): LingPipe is a text processing toolkit based on computational linguistics. It treats sentiment analysis as a classification problem and breaks it into two classification tasks:

1. Subjective (opinion) vs. objective (fact) sentences.

2. Positive (favorable) vs. negative (unfavorable) movie reviews.

Method used: It uses the concept of sentence polarity. To determine sentiment polarity, it applies a machine-learning method that restricts text categorization to just the subjective portions of the document. For this, as depicted in Figure 1, a subjectivity detector determines whether each sentence is subjective or not; discarding the objective sentences creates an extract that should better represent a review's subjective content to a default polarity classifier. In the underlying method (Pang and Lee, 2004), a graph-cut (minimum-cut) algorithm is what selects the subjective extract: it partitions the sentences into subjective and objective ones, and the polarity classifier then labels the extract as negative or positive.
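The minimum-cut step can be sketched on a three-sentence toy graph in the style of Pang and Lee (2004): each sentence is linked to a "subjective" source with its individual subjectivity score, to an "objective" sink with the complementary score, and to neighbouring sentences with an association score. Everything numeric below is invented for illustration, and the Edmonds-Karp max-flow routine is simply one standard way to find a minimum cut, not necessarily what the authors or LingPipe use.

```python
from collections import deque

def min_cut_source_side(capacity, source, sink):
    """Run Edmonds-Karp max-flow; return the nodes on the source side of a min cut."""
    # Residual capacities, with missing reverse edges initialised to 0.
    residual = {u: dict(edges) for u, edges in capacity.items()}
    for u, edges in capacity.items():
        for v in edges:
            residual.setdefault(v, {}).setdefault(u, 0.0)

    while True:
        # Breadth-first search for a shortest augmenting path.
        parents = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 1e-12 and v not in parents:
                    parents[v] = u
                    queue.append(v)
        if sink not in parents:
            # No augmenting path left: reachable nodes form the source side.
            return set(parents)
        # Walk back from the sink to collect the path and its bottleneck.
        path, v = [], sink
        while parents[v] is not None:
            path.append((parents[v], v))
            v = parents[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck

# Hypothetical per-sentence subjectivity probabilities and association scores.
p_subjective = {"s1": 0.9, "s2": 0.45, "s3": 0.1}
association = {("s1", "s2"): 0.5, ("s2", "s3"): 0.1}

graph = {"SRC": {}}
for s, p in p_subjective.items():
    graph["SRC"][s] = p           # cost of calling s objective
    graph[s] = {"SINK": 1.0 - p}  # cost of calling s subjective
for (a, b), w in association.items():
    graph[a][b] = w               # cost of separating a and b
    graph[b][a] = w

subjective = min_cut_source_side(graph, "SRC", "SINK") - {"SRC"}
print(sorted(subjective))  # ['s1', 's2']
```

Taken alone, sentence s2 leans objective (0.45), but its strong association with s1 makes it cheaper to keep the two together, so the cut places s2 in the subjective extract. This is the effect the graph formulation is meant to capture: nearby sentences tend to share a subjectivity label.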

Current Research Problems

This section presents some problems and issues that need more attention to achieve better results in the field of sentiment analysis.

1. Instead of concentrating on only one of (a) document-level, (b) paragraph-level, (c) sentence-level, or (d) feature-based approaches, can a good combination of these techniques give better results?

2. Topic shift within sentences is still little studied for sentiment analysis.

3. We generally use sentiment label scales such as (1) Very Negative to Very Positive: Very Negative, Negative, Neutral, Positive, Very Positive; (2) Negative to Positive; (3) Negative to Neutral; (4) Positive to Neutral, etc. Most of the papers I read use this kind of shift in classification. Such shifting should have some effect, and accounting for that effect may give better results.

References

• FreshMinds Research. 2010. Turning conversations into insights: A comparison of Social Media Monitoring Tools. White paper, 14th May 2010. FreshMinds, 229-231 High Holborn, London WC1V 7DA. Tel: +44 20 7692 4300, Fax: +44 870 46 01596, www.freshminds.co.uk.

• Alec Go, Richa Bhayani, and Lei Huang. Twitter Sentiment Classification using Distant Supervision. Technical report, Stanford University.

• Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP Proceedings.

• Bo Pang and Lillian Lee. 2004. A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts. ACL Proceedings.

• Bo Pang and Lillian Lee. 2005. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. ACL Proceedings.

• Chenghua Lin and Yulan He. 2009. Joint Sentiment/Topic Model for Sentiment Analysis. CIKM’09, November 2–6, 2009, Hong Kong, China. ACM 978-1-60558-512-3/09/11.

• P. Turney. 2002. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. Proceedings of the Association for Computational Linguistics (ACL), pp. 417–424.

• R. Ghani, K. Probst, Y. Liu, M. Krema, and A. Fano. 2006. Text mining for product attribute extraction. SIGKDD Explorations Newsletter, vol. 8, pp. 41–48.

• E. Riloff, S. Patwardhan, and J. Wiebe. 2006. Feature subsumption for opinion analysis. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP).

• Prem Melville, Wojciech Gryc, and Richard D. Lawrence. 2009. Sentiment Analysis of Blogs by Combining Lexical Knowledge with Text Classification. KDD’09, June 28–July 1, 2009, Paris, France. ACM 978-1-60558-495-9/09/06.

• Neil O’Hare, Michael Davy, Adam Bermingham, Paul Ferguson, Páraic Sheridan, Cathal Gurrin, and Alan F. Smeaton. 2009. Topic-Dependent Sentiment Analysis of Financial Blogs. TSA’09, November 6, 2009, Hong Kong, China. ACM 978-1-60558-805-6/09/11.