
SU4 Project Proposal

Implementation of opinion mining technique

Group Members

Itsariya Saniwong Na Ayuthaya (5422791277)

Thanchanok Klabsong (5422791889)

Advisor: Sasiporn Usanavasin

School of Information, Computer and Communication Technology, Sirindhorn International Institute of Technology,

Thammasat University

Semester 2, Academic Year 2014

Date March 1, 2015


Table of Contents

1 Introduction

2 Background

3 Objectives

4 Outputs and Expected Benefits

4.1 Outputs

4.2 Benefits

5 Literature Review

6 Methodology

6.1 Approach

6.2 Tools and Techniques

7 Project Schedule

8 Project Progress (optional)

9 References


Statement of Contribution

By submitting this document, all students in the group agree that their contribution to the project so far, including the preparation of this document, is as follows:

Itsariya Saniwong Na Ayuthaya (5422791277) 50%

Thanchanok Klabsong (5422791889) 50%


Senior Project 2014 Short Project Name

Introduction

A social network is a set of connections between individuals, or between individuals and groups, maintained through a networked system. Social networking websites make it convenient for people to get to know one another and to build such connections, and they offer engaging features that turn these connections into communities whose members can share almost anything with others in the network in many ways. Use of social network services keeps increasing as users become more attached to them.

Social media is becoming increasingly important in Thailand and is used for both business and fun. Out of the 66 million people in Thailand, 25 million have access to the internet, and more than 18.5 million are social media users, meaning that 28% of the population uses social media. For every four people accessing the internet in Thailand, three use social media, and the number is still growing. Compared with the world as a whole, Thailand is slightly above average: about one in four people worldwide use social media, against 28% in Thailand. The gender ratio of social media users is the same as that of the entire nation, and 74% of the users are between 13 and 34 years old.

Zocial Inc. is a company that monitors the social media scene in Bangkok. In its latest infographic report, the team found that the biggest social gainer in Thailand is Instagram, which has seen 163 percent growth in users in the country in the past 12 months. Facebook, in contrast, has slowed down in the country, seeing only 28 percent growth, while Twitter grew 54 percent. In terms of sheer numbers, however, Facebook is still far ahead with 18 million users.

Facebook, Twitter and LINE are the main players in social commerce. Social commerce sits between producers and customers, and its special characteristic is that the two sides can communicate with each other immediately. User recommendations and referrals are a technology development that makes the social network system faster and more effective. They are one of the options an organization can adopt for contacting people outside or inside the organization, because the approach is low-cost yet popular. It is an easy way to get close to customers and to people in the organization, as well as to spread knowledge and build a good image for the organization.

On the other hand, social media still has weaknesses and limitations, precisely because it is such an easy and fast way to communicate. If negative information or text is broadcast through a social network, it can quickly give others a bad image of the organization.

Given these limitations, an organization has to consider whether it can actually reach its target group through social networks.

Nowadays, marketers treat social media as one of the channels of marketing communication. When planning advertising and public relations, the most important thing to pay attention to is the reaction of customers. Searching for information through social media is a convenient way to find it. Social media today contains public comments on many topics from many angles, such as discussion of products or services, or complaints for many reasons. The factors behind each decision differ: some customers are interested in the product, while others are dissatisfied with the service or a promotion. Presenting opinions at the feature level helps readers find the viewpoint they want more easily than reading through the comment box, especially when all results are combined into a chart with percentages. This is a good way to grasp the overall picture of the comments without reading every comment yourself, and it makes it easier to decide whether to produce more of a product or which weaknesses to improve in order to get closer to the target group.

School of ICT, SIIT 1

Opinion mining for business is a system that analyzes comments written in everyday language and summarizes the attitudes they express, at the level of feature-based sentiment analysis and summarization. It presents the comments as a summary that is simple to understand, in the form of a web application, shows the comment counts as numbers, and lets users reread the earlier messages that were analyzed for a topic of interest.


Background

Social media is the social interaction among people in which they create, share or exchange information and ideas in virtual communities and networks. There are several types of social media, such as blogging, microblogging (for example Twitter), social networking, and media sharing. A social network provides an alternative way for people to chat or exchange opinions with each other, and in the case of business, most customers use social networks to search for data, reviews or comments related to a particular product.

An Application Programming Interface (API) is a set of functionalities that are independent of their respective implementation, allowing both definition and implementation to vary without compromising each other. An API can be used to ease the work of programming graphical user interface components, to allow integration of new features into existing applications, or to share data between otherwise distinct applications. For example, many websites connect to Twitter, reading content from Twitter and also sending data to Twitter; this exchange of data takes place through an API. Another example is the Google Maps API, a Google service that many websites use to show their contact location.

Opinion mining, or sentiment analysis, is a type of natural language processing (NLP). It is a field of study that analyzes and summarizes people’s opinions, sentiments, evaluations, attitudes, and emotions towards entities such as products, services, organizations, individuals, issues, and topics, as expressed in written language in the form of comments, reviews, blogs, or tweets. Opinion mining is central to social media research. It also has a profound impact on political science, economics, management, and social science, as they are all affected by people’s opinions. In general, comments or reviews on a webpage are divided into two types: facts and opinions. A fact is real data or a document that can be proven. An opinion is what a person thinks about a particular topic. There are two types of opinion. A direct opinion clearly tells the attitude of the writer or speaker, for example, “This camera has high quality”. A comparison expresses an opinion by comparing two objects, for example, “Camera A is more expensive than Camera B”.

Text segmentation is the process of dividing written text into meaningful units, such as words, sentences, or topics. The term applies both to mental processes used by humans when reading text, and to artificial processes implemented in computers, which are the subject of natural language processing. In general, sentiment analysis has been investigated mainly at three levels. First, the document level assumes that each document expresses opinions on a single entity. The task at this level is to classify whether a whole opinion document expresses a positive or negative sentiment. Second, the sentence level is closely related to subjectivity classification, which distinguishes sentences that express factual information from sentences that express subjective views and opinions. The task at this level is to go through the sentences and determine whether each sentence expresses a positive, negative or neutral opinion. Last, the feature level identifies and extracts the object features that have been commented on by an opinion holder. An opinion holder is the person or organization that holds a specific opinion on a particular topic. The task at this level is to determine whether the polarity of an opinion is positive, negative, or neutral.

Nowadays, there is an increasing number of reviews and comments on particular topics such as products and services on websites, typically in the form of a comment and a rating. However, reviewers sometimes give comments and ratings that do not match. This is a problem for other readers, who must read and comprehend the comments and interpret the ratings, and some comments are not clearly understandable. Opinion mining can therefore be useful in several cases. It can help an organization or business learn the feedback of its customers in order to improve and develop its products. It can help marketers evaluate the success of an advertising campaign, product or service. It can help individual customers decide whether to buy a product. It can help governments research the opinions of people in the country. For example, if you own a company and sell a product, you usually want to know the feedback from your customers: whether they like or dislike your products, and whether they want features dropped or added. Another example is buying a smartphone: you will usually check the opinions of other users before making a decision.

In this senior project, we will develop a web application that analyzes the opinions of users who use social media to comment on or review brands, products, services or topics. We are interested in what users mention on Twitter. Twitter is an online social network service that allows users to send and read short messages called “tweets”. We chose Twitter because its messages are short, at most 140 characters, which makes the opinions easier to analyze, and because Twitter has the “retweet”, a re-posting of someone else’s tweet or content. Retweets are important because they can make a comment more reliable. We will analyze each opinion, determine whether it is positive, negative, or neutral, and show a summary on a webpage as the percentage of positive, negative, and neutral comments. Finally, our web application helps customers and owners learn the opinion of each user, or the feedback on any part of a product, so that the product can be improved and developed appropriately.


Objectives

The aim of our project is to summarize and analyze the opinions of people who use social media to mention a particular topic such as products, services, politics, actors, or other issues.

- Study and analyze the meaning of opinion mining, such as separating sentences and determining which is a comment, which is a query, which is a question, and whether each sentence carries positive, negative or neutral intent.

- Design and develop a system that analyzes comments about products and companies for traders, displaying the results as statistics and percentages so that the information is as accurate as possible.

- Pull the data from the social network into a web application, and design the web application to be simple to use.

- Develop the system so that it supports the assessment of comments written in English.

Outputs and Expected Benefits

3.1 Outputs

We are creating a system for evaluating and analyzing the opinions of users who use social media to comment on or review a particular topic. The web application will show only the specified details: it will display comments from Twitter, let users search the comments for a specific word by keyword, let them choose a date to view only the comments from that day, and show the calculated percentage of positive, negative, and neutral comments so that customers and organizations can easily see the overall rating of users’ comments.

3.2 Benefits

Individual customer:

- Saves time when viewing other comments on a particular topic.

- Helps the customer make a decision before buying a product.

Organization or business:

- Saves time when checking whether comments on a product, service or ad campaign are positive, negative or neutral.

- Helps the marketer or Customer Relationship Management (CRM) team analyze the opinion of each customer on products or services, in order to evaluate customer satisfaction.


Literature Review

Opinion mining is a field of study that analyzes and summarizes people’s opinions on a particular topic such as a product, service or issue, from written language in the form of comments, reviews, blogs, or tweets on Twitter. Most people share opinions on different aspects of life every day. Millions of messages are posted daily on popular social media such as Facebook, Twitter, and blogs. Opinions are important when people make decisions. For example, before buying a product, you will read reviews from people who have used that product to help make your decision. Another example is the owner of a company deciding how to improve or develop a product.

We are developing a web application that allows the user to discover the opinions expressed in mentions of a brand on Twitter and then classifies each comment as positive, negative, or neutral. Our web application is useful for customers who want to research opinions about a product before purchase, and for organizations that want to know the public opinion of their brands. The user interface of our application has the following features:

- Show a graph that represents the percentage of each type of comment: positive, negative, and neutral.

- Show each user's tweet from Twitter, classified as positive, negative, or neutral.

- Select a date to see the comments from a specific period of time.

- Search the comments by a specific word or phrase.

- Rank the comments by number of retweets, from most to fewest.
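The date filter, keyword search, and retweet ranking listed above can be sketched in a few lines of Python. This is only an illustration under assumptions: the `Tweet` record, its field names, and the sample data are hypothetical and are not part of the project's actual design.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Tweet:
    # Hypothetical record layout, used only for this sketch.
    text: str
    posted: date
    retweets: int
    sentiment: str  # "positive", "negative" or "neutral"


def on_date(tweets, day):
    """Keep only the tweets posted on the given day."""
    return [t for t in tweets if t.posted == day]


def search(tweets, phrase):
    """Keep only the tweets containing the word or phrase (case-insensitive)."""
    needle = phrase.lower()
    return [t for t in tweets if needle in t.text.lower()]


def rank_by_retweets(tweets):
    """Order tweets from the most retweeted to the least."""
    return sorted(tweets, key=lambda t: t.retweets, reverse=True)


# Made-up sample data.
tweets = [
    Tweet("Battery life is great", date(2015, 3, 1), 12, "positive"),
    Tweet("The camera is poor", date(2015, 3, 1), 40, "negative"),
    Tweet("Just bought a new phone", date(2015, 2, 28), 3, "neutral"),
]
```

The three functions can be chained, for example `rank_by_retweets(search(on_date(tweets, date(2015, 3, 1)), "camera"))`, which mirrors how a user would combine the filters in the interface.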

As mentioned above, Twitter provides an Application Programming Interface (API). Twitter is a popular social media service where users send and read short messages of up to 140 characters called “tweets”. Each tweet expresses an opinion about a different topic. Another aspect of Twitter that we focus on is the number of retweets. A retweet is someone else's tweet that you choose to share with all of your followers. If a tweet has a large number of retweets, it means that many people have seen the comment, and we take this as a sign that readers agree with it.

Some opinion mining web applications already exist in this area. For example, Sentiment 140 (also known as “Twitter Sentiment”) [1] [Figures 1 and 2] is a web application that allows the user to discover the sentiment of a brand, product, service, or topic on Twitter. Sentiment 140 was created by computer science graduate students at Stanford University. According to their research paper, they used machine learning algorithms (Naïve Bayes, maximum entropy classification, and support vector machines (SVM)) to build the classifiers, and used emoticons as noisy labels for the training data. The application shows the classification of individual tweets and shows aggregated numbers so that users can assess how accurate the classifier is.
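The idea of using emoticons as noisy labels can be illustrated with a minimal Python sketch. The emoticon lists below are illustrative assumptions, not the lists used by Sentiment 140, and removing the emoticon from the text after labeling is likewise an assumption of this sketch.

```python
# Illustrative emoticon lists (assumed for this sketch).
POSITIVE_EMOTICONS = (":)", ":-)", ":D", "=)")
NEGATIVE_EMOTICONS = (":(", ":-(")


def noisy_label(tweet):
    """Return (label, text without emoticons), or None if the tweet is unusable."""
    has_pos = any(e in tweet for e in POSITIVE_EMOTICONS)
    has_neg = any(e in tweet for e in NEGATIVE_EMOTICONS)
    if has_pos == has_neg:  # no emoticon at all, or conflicting emoticons
        return None
    text = tweet
    for emoticon in POSITIVE_EMOTICONS + NEGATIVE_EMOTICONS:
        text = text.replace(emoticon, "")
    label = "positive" if has_pos else "negative"
    return label, " ".join(text.split())
```

Tweets without an emoticon, or with both kinds, are skipped rather than guessed at, which keeps the automatically built training set as clean as such a crude rule allows.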


Figure 1. http://www.sentiment140.com/

Figure 2. http://www.sentiment140.com/

When the user searches by a short keyword, Sentiment 140 also shows the matching tweets from Twitter and calculates the percentages.


Advantages and Disadvantages of Other Applications

Our application differs from the existing application as follows.

Features compared (Opinion mining vs. Sentiment 140); a "-" marks a feature that is marked "-" for one of the two applications:

- Show the comment
- Show a graph that represents the percentage of comments
- Search comments
- English language
- Spanish language (-)
- Classify comments by type (-)
- Arrange comments by retweets (-)
- Filter by time (-)

Classification Techniques

1. Decision Tree

Decision trees are trees that classify instances by sorting them based on feature values. Each node in a decision tree represents a feature of an instance to be classified. Instances are classified starting at the root node and sorted based on their feature values. Decision trees can handle both categorical and numerical data [2].

2. Naïve Bayes

The Naïve Bayes classifier is based on Bayes’ theorem with independence assumptions between predictors. The model is easy to build, with no complicated iterative parameter estimation, which makes it particularly useful for very large datasets [3].

3. K-Nearest Neighbors

K-Nearest Neighbors is a simple algorithm that stores all available cases and classifies new cases based on a similarity measure (e.g., distance functions). A new case is classified by a majority vote of its neighbors, being assigned to the class most common among its K nearest neighbors as measured by a distance function [3].

4. Neural Network

Neural networks are analytic techniques modeled after the processes of learning in the cognitive system and the neurological functions of the brain. They are capable of predicting new observations (on specific variables) from other observations (on the same or other variables) after executing a process of so-called learning from existing data [4].

5. Support Vector Machine (SVM)

SVM classification is based on the concept of decision planes that separate sets of objects having different class memberships. SVM finds the vectors ("support vectors") that define the separators giving the widest separation of classes. SVM models have a similar functional form to neural networks, and both are popular data mining techniques [5].

We therefore focus on the Support Vector Machine (SVM) as the classification technique to classify each comment as positive, negative, or neutral. More detail on SVM is given under Classification Techniques in Section 5.2.
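Of the techniques listed above, K-Nearest Neighbors is the simplest to show in code. The following Python sketch implements the majority vote described in item 3 with Euclidean distance. The two-dimensional features (counts of positive and negative words) and the sample data are hypothetical and serve only to make the example runnable.

```python
from collections import Counter
import math


def knn_classify(training, query, k=3):
    """Classify `query` by majority vote among its k nearest training cases.

    `training` is a list of (feature_vector, label) pairs.
    """
    by_distance = sorted(training, key=lambda case: math.dist(case[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical features: (number of positive words, number of negative words).
training = [
    ((3, 0), "positive"), ((2, 1), "positive"), ((4, 1), "positive"),
    ((0, 3), "negative"), ((1, 2), "negative"), ((0, 4), "negative"),
]
```

An odd value of `k` is a common choice with two classes because it avoids tied votes.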

Methodology

5.1 Approach

Opinion Mining Process

The opinion mining process [Figure 3] consists of the following steps.

Figure 3. Opinion Mining Process (labels in the diagram: Twitter, Web server, Opinion mining system, System Management Module, Opinion Mining Module, Feature Identification Module, Opinion Sentiment Classification Module, Local database, Summarized Opinions)

STEP 1: Retrieve data from Twitter.

We retrieve tweet messages from Twitter using Zapier, and the messages are automatically stored in a Google Docs spreadsheet. Zapier is a service that allows data to be extracted from one web application to another. See Sections 5.2.1 and 5.2.2.

STEP 2: Put data into local database.

We put the messages into a local database, using XAMPP to support creating the database in phpMyAdmin. See Section 5.2.3.

STEP 3: Clean the data.

The retrieved data is passed into the Feature Identification Module process, which we carry out with the RapidMiner program. The process is composed of three steps:

- Feature selection: the process of finding the frequency of the words that occur in the content. Feature selection takes the documents and converts them into a common representation in which each word, phrase or sentence is replaced by a value in a set of values. See Section 5.2.4.

- Text cleaning: the process of stemming words to their roots and clearing all the stop words, that is, pronouns, conjunctions, prepositions, numbers, and sentence-ending marks. See Section 5.2.4.

- Text representation: this process uses Natural Language Processing (NLP) to arrange the unstructured data into structured data. See Section 5.2.4.

STEP 4: Classify the messages into positive, negative, and neutral.

Classify each sentence as positive, negative, or neutral using SVM classification, and calculate the percentage of each type.

STEP 5: Summarize the opinions and present them in the web application.
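The percentage calculation in STEP 4 is simple arithmetic once every message has a label. A minimal Python sketch follows; the function name and the rounding to one decimal place are assumptions made for illustration.

```python
from collections import Counter

LABELS = ("positive", "negative", "neutral")


def sentiment_percentages(labels):
    """Return the percentage of each sentiment class, rounded to one decimal."""
    counts = Counter(labels)
    total = len(labels)
    if total == 0:
        # No messages yet: report zero for every class instead of dividing by zero.
        return {label: 0.0 for label in LABELS}
    return {label: round(100 * counts[label] / total, 1) for label in LABELS}
```

For example, four messages labeled positive, positive, negative, and neutral give 50% positive, 25% negative, and 25% neutral.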

5.2 Tools and Techniques

At present, Twitter tends to simplify the expression of opinions by topic, and the expansion of social networking helps users post opinions online. As a result, the volume of reviews has increased rapidly, so that big e-commerce sites and product and service recommendation sites contain a large number of reviews per item. Access to these reviews is useful, for example, to compare offers from different competitors on the market and make an informed decision about buying a certain offer. However, it is very difficult to read every opinion on a subject or product because:

- Some sentences express an explicit opinion and some do not, and in some cases the reviews are very long. Reading only part of them can create a false impression.

- Users may not be familiar with the various metrics used for comparison in a specialized field. The large number of reviews makes them hard to digest, and the variety of websites makes it difficult to track the range of products.

A system that detects indicators of product performance and domain-specific metrics is therefore very useful. It can be used to summarize the opinions obtained from a large number of reviews into positive and negative aspects. To determine the opinions, we use existing techniques in opinion mining.


5.2.1 Zapier

Zapier is a service that allows web applications to exchange data by providing an integration platform for a number of popular web services, e.g. YouTube, Gmail, and PayPal. We use Zapier to retrieve messages from Twitter into a Google Docs spreadsheet and then copy the data across to a local database to mine it for sentiment trends. On the Twitter side, we select the “Search Mention” option to retrieve the mentions of each user; on the Google Docs side, we select “Create Spreadsheet Row” to connect to the spreadsheet that we created [12] [Figure 4].

5.2.2 Google Docs Spreadsheet

Google Docs is a web service that allows users to create and edit documents online while collaborating with other users in real time. Google Sheets is where we keep the content retrieved from Twitter. Google Sheets can also export the file as an Excel file or a CSV file. CSV stands for comma-separated values, a file format in which attributes are separated by commas [13] [Figure 5].

Figure 4. https://zapier.com/

Figure 5. The data kept in the Google spreadsheet

5.2.3 XAMPP

XAMPP is an open source cross-platform web server solution stack package, consisting mainly of the Apache HTTP Server and the MySQL database. It allows website designers and programmers to test their work on their own computers without any access to the Internet. XAMPP also provides support for creating and manipulating databases in MySQL and SQLite, among others. We use XAMPP to support creating the database in phpMyAdmin [Figures 6 and 7].
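Copying the CSV export of the spreadsheet into a local database can be sketched as follows. This is not the project's actual setup: the project uses MySQL through XAMPP and phpMyAdmin, whereas the sketch uses Python's built-in `sqlite3` module as a stand-in so that it runs without a server. The table name `tweets` and the `load_tweets` helper are hypothetical; the column names `ID` and `Tweet` are assumed from the sample data shown later in this report.

```python
import csv
import io
import sqlite3


def load_tweets(csv_file, connection):
    """Copy rows from a CSV export into a `tweets` table and return the row count."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS tweets (id INTEGER PRIMARY KEY, tweet TEXT NOT NULL)"
    )
    rows = [(int(r["ID"]), r["Tweet"]) for r in csv.DictReader(csv_file)]
    connection.executemany("INSERT INTO tweets (id, tweet) VALUES (?, ?)", rows)
    connection.commit()
    return len(rows)


# Stand-in for a CSV file exported from the spreadsheet.
sample_csv = io.StringIO(
    "ID,Tweet\n"
    "2,Iphone6 is the best; better than the rest!!\n"
    "4,its amazing phone\n"
)
connection = sqlite3.connect(":memory:")  # in-memory database, nothing written to disk
inserted = load_tweets(sample_csv, connection)
```

With a real export, `sample_csv` would be replaced by `open("export.csv", newline="", encoding="utf-8")`, where the file name is again only a placeholder.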

Figure 6. https://www.apachefriends.org/index.html - XAMPP

Figure 7. phpMyAdmin, where we create the database and store the data

5.2.4 RapidMiner

RapidMiner is a software platform that provides an integrated environment for machine learning, data mining, and text mining [9]. Starting in 2006, its development was driven by Rapid-I, a company founded by Ingo Mierswa and Ralf Klinkenberg. In 2013, the company rebranded from Rapid-I to RapidMiner. RapidMiner has a strong list of users, including EADS, GfK, Lufthansa, PayPal, Pepsi, Sanofi, Siemens, Telenor, and Volkswagen. We use RapidMiner Studio to carry out the feature identification module process.

First, we import the content that we keep in the local database [Figure 8]. The feature identification module process then has three steps:

STEP 1: Feature selection - the process of finding the frequency of the words that occur in the content [Figure 9]. In the figure, each comment is a user review of the IPhone6. The comments share many of the same words, so this step counts the frequency of each word in each comment.
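The word counting in this step is done in RapidMiner in the project. As a rough illustration of the same idea outside RapidMiner, the Python sketch below counts word frequencies per comment and over all comments. The function name and the simple regular-expression tokenizer are assumptions of the sketch.

```python
from collections import Counter
import re


def word_frequencies(comment):
    """Count how often each word occurs in one comment (case-insensitive)."""
    return Counter(re.findall(r"[a-z0-9]+", comment.lower()))


comments = [
    "Iphone6 is the best; better than the rest!!",
    "its amazing phone",
]
per_comment = [word_frequencies(c) for c in comments]
overall = sum(per_comment, Counter())  # frequencies across all comments
```

In the first comment the word "the" is counted twice, which is the kind of repetition this step is meant to surface.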

STEP 2: Text cleaning - the process of cleaning the data [Figure 10]. We use the following operators to clean the data:

- Tokenize (non letter, linguistic sentences): removes non-letter characters and text in other languages.

- Filter Stop word: clears all the stop words, that is, pronouns, conjunctions, prepositions, numbers, and sentence-ending marks [Figure 11].

- Stem: converts each word to its root word. The root word is the base of a word, the form that has no prefix (at the front of the word) or suffix (at the end of the word); see the example in Figure 12.

- Transform Case: converts text from upper case to lower case.
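The four operators above can be approximated in plain Python to show what each one does to a comment. This is a sketch under assumptions, not a reproduction of the RapidMiner operators: the stop-word list is a small illustrative sample, and `naive_stem` is a crude suffix stripper standing in for a real stemming algorithm.

```python
import re

# Small illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"i", "is", "the", "a", "an", "and", "than", "of", "in", "are", "its"}
SUFFIXES = ("ing", "ed", "es", "s")  # checked in this order


def naive_stem(word):
    """Strip one common suffix; a crude stand-in for a real stemmer."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def clean(text):
    """Lower-case, tokenize on non-letters, drop stop words, then stem."""
    tokens = re.split(r"[^a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t and t not in STOP_WORDS]
```

The sketch lower-cases the text first so that stop-word matching is case-insensitive, whereas the operator list above places Transform Case last. Note also that tokenizing on non-letters drops digits, so "Iphone6" becomes "iphone".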

Figure 8: Import database file into RapidMiner.

Figure 9: Feature selection step

Figure 10: Text cleaning process

Figure 11: Remove stop word

ID  Tweet
1   Disappointed in the specs of theI
2   Iphone6 is the best; better than the rest!!
3   IPhone6 should work on card slot, camera and battery 4.7 size is good for use, and all specification are good.
4   its amazing phone
5   IPhone 5s model is better than IPhone6 Camera and battery are so poor
6   I think S series of the IPhone6 are better.

STEP 3: Text representation - this process uses Natural Language Processing (NLP) to arrange the unstructured data into structured data [Figure 13]. Unstructured data is information that is not organized in the rows and columns of a database, for example e-mails, messages, word processing documents, videos, photos, audio files, presentations, and webpages. Structured data is data stored in the fields of a database or table, for example Excel files, database files, and spreadsheets.
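One common way to give cleaned text a row-and-column form is a document-term matrix, with one row per comment and one column per word. The sketch below assumes this bag-of-words representation for illustration; the report does not state which representation RapidMiner produces, and the token lists are hypothetical.

```python
from collections import Counter

# Hypothetical cleaned token lists, one per comment.
documents = [
    ["iphone", "best", "better", "rest"],
    ["amaz", "phone"],
    ["camera", "battery", "poor", "battery"],
]

def document_term_matrix(docs):
    """Return (vocabulary, rows); rows[i][j] counts term j in document i."""
    vocabulary = sorted({term for doc in docs for term in doc})
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[term] for term in vocabulary])
    return vocabulary, rows

vocabulary, matrix = document_term_matrix(documents)
print(vocabulary)
for row in matrix:
    print(row)
```

Each row has the same length as the vocabulary, so the result can be stored directly as a database table or spreadsheet.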

5.2.4 Classification Techniques

Support Vector Machine (SVM)

A support vector machine searches for a decision boundary between two classes that lies as far as possible from any point in the training data [Figure 9]. SVM constructs a hyperplane, or a set of hyperplanes, in a high-dimensional or infinite-dimensional space. The hyperplane acts as the decision surface: the side of the hyperplane on which a data point falls determines its class. The distance from the decision surface to the closest data point is the margin of the classifier. Because the margin is made as large as possible, a slight error in a data point will not cause a misclassification. We therefore use the support vector machine (SVM) as the classification technique to classify each sentence as positive, negative, or neutral.
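For reference, the standard hard-margin formulation for two linearly separable classes can be written as below. The symbols are notation introduced here and do not appear in the report: $\mathbf{x}_i$ is a training point, $y_i \in \{-1, +1\}$ is its class, and $\mathbf{w}$ and $b$ define the hyperplane. Handling the third (neutral) class needs an extension such as one-versus-rest, which is an assumption and not something the report specifies.

```latex
\begin{aligned}
&\text{Decision surface:} && \mathbf{w}^{\top}\mathbf{x} + b = 0 \\
&\text{Distance to the closest point:} && \frac{1}{\lVert \mathbf{w} \rVert}
  \quad\text{(full margin width } \tfrac{2}{\lVert \mathbf{w} \rVert}\text{)} \\
&\text{Training problem:} && \min_{\mathbf{w},\, b}\ \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2}
  \quad \text{subject to} \quad
  y_i\left(\mathbf{w}^{\top}\mathbf{x}_i + b\right) \ge 1,\quad i = 1,\dots,n
\end{aligned}
```

Minimizing $\lVert \mathbf{w} \rVert$ is equivalent to maximizing the margin, which is why the resulting boundary lies as far as possible from the training points.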


Figure 12: Stemming to the root word

Figure 13: Convert unstructured data to structured data form


5.2.5 Opinion Sentiment Analysis

Opinion mining, or sentiment analysis, is the study that aims to classify the sentiment expressed in a text.

Using the Twitter API, we collected a corpus of text posts and formed a dataset of three classes: positive sentiments, negative sentiments, and a set of objective texts (no sentiment), as in the following examples:

• Harry is angry → negative

• The sun is absolutely beautiful today → positive

We then identified two types of emoticons that indicate a sentiment:

• Happy emoticons: “:-)”, “:)”, “=)”, “:D” etc. → positive

• Sad emoticons: “:-(”, “:(”, “=(”, “;(” etc. → negative

A tweet on Twitter is limited to 140 characters, so a tweet usually consists of a single sentence. We therefore assume that an emoticon in a message represents the emotion of the message as a whole, and that every word of the message is related to that emotion.
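Under this assumption, labeling a message by its emoticons could look like the sketch below. The emoticon lists contain only the examples given above (the "etc." entries are not filled in), the function name is hypothetical, and leaving messages with mixed emoticons unlabeled is an assumption rather than a rule from the report.

```python
# Emoticon lists taken from the examples above; the "etc." entries are omitted.
HAPPY = (":-)", ":)", "=)", ":D")
SAD = (":-(", ":(", "=(", ";(")

def label_by_emoticon(message):
    """Return 'positive', 'negative', or None when there is no clear signal."""
    has_happy = any(e in message for e in HAPPY)
    has_sad = any(e in message for e in SAD)
    if has_happy and not has_sad:
        return "positive"
    if has_sad and not has_happy:
        return "negative"
    # No emoticon, or both kinds present (assumed to stay unlabeled).
    return None

print(label_by_emoticon("its amazing phone :)"))
```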

Negation Rules

Words with a negative meaning, such as “no”, “not”, and “never”, and words that follow patterns such as “stop” + “vb-ing”, “quit” + “vb-ing”, and “cease” + “vb-ing”, change the orientation of opinion words in the following way:

I. Negation Negative → Positive

II. Negation Positive → Negative

III. Negation Neutral → Negative

Some examples for each negation rule defined above:

I. “no problem”

II. “Not good”

III. “Does not work”
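The three negation rules can be sketched as a lookup applied to the word that follows a negation. The mini-lexicon is hypothetical, words missing from it are treated as neutral, and checking only the immediately preceding token is a simplifying assumption.

```python
NEGATION_WORDS = {"no", "not", "never"}
# These verbs negate only when followed by a verb ending in "-ing".
NEGATION_VERBS = {"stop", "quit", "cease"}

# Rules I-III: how negation changes the orientation of an opinion word.
FLIPPED = {"negative": "positive", "positive": "negative", "neutral": "negative"}

# Hypothetical mini-lexicon; words not listed are treated as neutral.
LEXICON = {"problem": "negative", "good": "positive", "complaining": "negative"}

def is_negated(tokens, index):
    """True if the token just before tokens[index] negates it."""
    if index == 0:
        return False
    previous = tokens[index - 1]
    if previous in NEGATION_WORDS:
        return True
    return previous in NEGATION_VERBS and tokens[index].endswith("ing")

def orientation(tokens, index):
    """Orientation of tokens[index] after applying the negation rules."""
    base = LEXICON.get(tokens[index], "neutral")
    return FLIPPED[base] if is_negated(tokens, index) else base

print(orientation(["no", "problem"], 1))
```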

TOO Rules

The word “too” usually carries a negative meaning when it is placed in front of an adjective. The idea is to apply this rule to words whose orientation depends on the context.


Figure 9. Support Vector Machine (SVM)


I. “The battery life lasts long”

II. “The initialization time takes too long.”

So when the word “too” is found before an opinion word, the orientation of the opinion should be negative.
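A minimal sketch of this rule, assuming lower-cased tokens as input. The set of context-dependent words is hypothetical and only serves the two example sentences above.

```python
# Hypothetical set of opinion words whose orientation depends on context.
CONTEXT_DEPENDENT = {"long", "small", "big", "high", "low"}

def too_rule(tokens):
    """Return (word, 'negative') for each context-dependent word after 'too'."""
    hits = []
    for previous, current in zip(tokens, tokens[1:]):
        if previous == "too" and current in CONTEXT_DEPENDENT:
            hits.append((current, "negative"))
    return hits

print(too_rule("the initialization time takes too long".split()))
```

For the first example sentence, "long" is not preceded by "too", so the rule returns no hits and the orientation is left to the other rules.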

6 Project Schedule

Task Description                                           Person     Duration  Deadline    Status

Research Phase
Research about API which easy to pull data                 ITSARIYA   1 w       2 Sep 14    100% Complete
Choose the interested topic in social media                ITSARIYA   1 w       2 Sep 14    100% Complete
Research what is opinion mining                            THA        1 w       2 Sep 14    100% Complete
Research about influencer in social media                  ITSARIYA   1 w       9 Sep 14    100% Complete
Research type of opinion in social media                   ITSARIYA   1 w       16 Sep 14   100% Complete

Learn Phase
Learn how to use Rapidminer studio6 to analyze an opinion  IT & THA   3 w       Semester 1  100% Complete
Learn how to do the text segment                           ITS & THA  3 w       Semester 1  100% Complete
Learn how to put the data on web application               ITS & THA  3 w       Semester 1  100% Complete

Develop Phase
Design user interface                                      ITS & THA            Semester 2  -
Analyze a comment                                          ITS & THA            Semester 2  -
Do the text segment                                        ITS & THA            Semester 2  -
Develop web application                                    ITS & THA            Semester 2  -
Calculate an opinion in term of percentage                 ITS & THA            Semester 2  -

7 Project Progress (optional).

According to our project plan, we first selected a topic to define the scope. Next, we retrieved users' mentions of the topic from Twitter and stored the retrieved content in a local database. We then cleaned the data: we removed all non-letter characters and text in other languages, removed stop words such as “a”, “an”, and “the”, and converted each word to its root word. The result is clean data that is ready to be classified as positive, negative, or neutral. This is the part we have completed so far. The next step is to classify the content as positive, negative, or neutral, calculate the percentage of each class, and display the classified data on the web application page.
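The planned percentage calculation is simple once each comment has a class label. The sketch below assumes the classifier output is a list of label strings; the function name and the sample labels are hypothetical.

```python
from collections import Counter

def sentiment_percentages(labels):
    """Return the percentage of each class among the given labels."""
    classes = ("positive", "negative", "neutral")
    if not labels:
        return {c: 0.0 for c in classes}
    counts = Counter(labels)
    return {c: 100.0 * counts[c] / len(labels) for c in classes}

# Hypothetical classifier output for four comments.
print(sentiment_percentages(["positive", "negative", "positive", "neutral"]))
```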

