RM World 2014: Sentiment analysis on Arabic tweets using RapidMiner

Sentiment Analysis on Arabic Tweets Using RapidMiner

Student Name: Salha al osaimi

Supervised by: Dr. Khan Muhammad Badruddin

Agenda

• Introduction

• Motivation

• Challenges Related to Arabic Language

• Experiment Steps

• Experiment Results

• Conclusion

Introduction

• Social media carries a large number of informal messages posted every day, and most of them describe the sender’s feelings and emotions.

• Millions of tweets with modern Arabic content offer a challenging opportunity to understand the emotions of their authors.

• Sentiment analysis is needed to help understand the emotions expressed in this informal communication.

• One of the main objectives of sentiment analysis is to extract the sentiment of a given text by classifying it as positive, negative, or neutral.

Motivation

Our focus: how to address the challenges of informal Arabic sentiment analysis. For this purpose, we used RapidMiner to manipulate Arabic text.

1. The work has applications in education, business, technology, security, and almost every other field.

2. Working in this area means doing cutting-edge research.

3. The task is challenging in nature.

Arabic sentiment analysis challenges

Arabic sentiment analysis faces many challenges, such as the following:

• The complexity of the Arabic language in terms of both structure and morphology.

• Arabic grammar is highly complex.

• The variety of Arabic dialects.

• The Arabic language contains many word forms and diacritics.

• Arabic is a derivational language.

• Semantic dictionaries or lexicons for Arabic sentiment mining are limited.

• Most Arabic text on the internet is written in informal language, which is unstructured in nature.

Experiment steps

This research aims to investigate how to address the challenges of informal Arabic sentiment analysis. For this purpose, we used RapidMiner to manipulate Arabic text. We performed the experiments, evaluated the results of the different text processing steps, and then explored the problems and tried to fix them.

Figure 1 shows the sentiment classification process for Arabic tweets using RapidMiner.


Figure 1: Sentiment classification process for Arabic tweets using RapidMiner


The experiment steps can be described as follows:

• Data

After collecting tweets with the Twitter API, we randomly picked 3000 tweets to create the text corpus. We then manually determined the sentiment of each tweet (positive, negative, or neutral). Each collected tweet contained at least one emotion icon. At the end of this step, we had 3000 labelled tweets (1000 positive, 1000 negative, and 1000 neutral).


• Text processing

Text processing is a very important part of text mining because it prepares the data for the classification step. Figure 2 illustrates the text preprocessing steps.

Figure 2: Text processing steps in RapidMiner.

We performed text processing in the four steps described below:

1- Tokenization

The tokenization step was performed on each tweet to divide it into multiple tokens based on whitespace characters.

2- Filtering

In this step, we used the “Filter Tokens (by Length)” facility to remove tokens shorter than 3 characters.

3- Light stemming

RapidMiner’s light stemming facility was used to reduce the feature space. Light stemming removes common prefixes and suffixes without reducing a word to its root; in Arabic, the stem of a word is different from its root.

4- Word vector model

In this step, we converted the text data into a matrix showing the frequency of occurrence of each term for each sentiment polarity. Figure 3 shows the word vector model.

Figure 3: Word vector model
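The four text processing steps can be sketched in plain Python. This is a minimal illustration, not RapidMiner's implementation: the function names and, in particular, the prefix and suffix lists used for light stemming are hypothetical assumptions, chosen only to show how a light stemmer strips affixes instead of extracting roots.

```python
from collections import Counter

# Hypothetical affix lists for illustration only; they are NOT the lists
# used by RapidMiner's Arabic light stemmer. Longest affixes are tried first.
PREFIXES = sorted(["و", "ال", "لل", "وال", "بال", "كال", "فال"], key=len, reverse=True)
SUFFIXES = sorted(["ه", "ة", "ي", "ها", "ان", "ات", "ون", "ين", "يه", "ية"], key=len, reverse=True)


def tokenize(text):
    """Step 1: split a tweet into tokens on whitespace characters."""
    return text.split()


def filter_by_length(tokens, min_len=3):
    """Step 2: drop tokens shorter than min_len characters."""
    return [t for t in tokens if len(t) >= min_len]


def light_stem(token, min_stem=2):
    """Step 3: strip at most one prefix and one suffix (longest match first),
    always keeping at least min_stem characters. No root extraction is done."""
    for p in PREFIXES:
        if token.startswith(p) and len(token) - len(p) >= min_stem:
            token = token[len(p):]
            break
    for s in SUFFIXES:
        if token.endswith(s) and len(token) - len(s) >= min_stem:
            token = token[:-len(s)]
            break
    return token


def word_vectors(labelled_tweets, stem=True):
    """Step 4: build a term-frequency matrix (one row per tweet) plus labels."""
    rows = []
    for text, label in labelled_tweets:
        tokens = filter_by_length(tokenize(text))
        if stem:
            tokens = [light_stem(t) for t in tokens]
        rows.append((Counter(tokens), label))
    vocabulary = sorted({term for counts, _ in rows for term in counts})
    # Counter returns 0 for absent terms, giving a dense frequency matrix.
    matrix = [[counts[term] for term in vocabulary] for counts, _ in rows]
    labels = [label for _, label in rows]
    return vocabulary, matrix, labels
```

Passing `stem=False` corresponds to a pipeline without the stemming step. With these illustrative affix lists, `light_stem("القران")` strips both `ال` and `ان` and returns `قر`, the same kind of over-stemming that Table 4 reports.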

• Building and Validation of Model

We built a model to classify unlabelled tweets with the correct sentiment. The input to the process was the training data, in which sentiments had been assigned to tweets. We used the naïve Bayes (NB) and k-Nearest Neighbor (k-NN) algorithms to build the classification model, and we validated the model with cross-validation, which is easy to implement in RapidMiner.
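Cross-validating NB and k-NN on term-frequency vectors can be sketched as follows. This is a self-contained illustration, not RapidMiner's operators: multinomial naive Bayes with add-one smoothing, cosine-similarity k-NN with k = 3, and a plain (unstratified) 10-fold split are all assumptions, since the source does not state the model parameters. The function names are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def nb_fit(train):
    """Multinomial naive Bayes with add-one smoothing.
    train: list of (Counter of term frequencies, label)."""
    class_docs = Counter(label for _, label in train)
    term_counts = defaultdict(Counter)
    for counts, label in train:
        term_counts[label].update(counts)
    vocab = {term for counts, _ in train for term in counts}
    totals = {label: sum(term_counts[label].values()) for label in class_docs}

    def predict(counts):
        def log_score(label):
            score = math.log(class_docs[label] / len(train))  # log prior
            for term, freq in counts.items():
                if term in vocab:  # terms unseen in training are ignored
                    p = (term_counts[label][term] + 1) / (totals[label] + len(vocab))
                    score += freq * math.log(p)
            return score
        return max(class_docs, key=log_score)

    return predict


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency Counters."""
    dot = sum(v * b[t] for t, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn_fit(train, k=3):
    """k-NN: majority label among the k training tweets most similar by cosine."""
    def predict(counts):
        nearest = sorted(train, key=lambda doc: cosine(counts, doc[0]), reverse=True)[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]
    return predict


def cross_validate(docs, fit, folds=10, seed=0):
    """Mean accuracy over `folds` train/test splits (requires len(docs) >= folds)."""
    docs = list(docs)
    random.Random(seed).shuffle(docs)
    accuracies = []
    for i in range(folds):
        test = docs[i::folds]
        train = [d for j, d in enumerate(docs) if j % folds != i]
        predict = fit(train)  # the model never sees the held-out fold
        correct = sum(predict(counts) == label for counts, label in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / folds
```

With `docs` built as a list of `(Counter, label)` pairs, `cross_validate(docs, nb_fit)` and `cross_validate(docs, knn_fit)` each return a mean accuracy between 0 and 1. Fitting the model inside each fold keeps the held-out tweets out of training, which is what makes the cross-validated accuracy an honest estimate.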

Experiment Results

When we analyzed the experiment results, we discovered several problems in the text processing steps. The first problem was related to emotion icon symbols. When we cleaned the text by removing English words and special characters with filters, all emotion icons were removed as well. To preserve the emotion icons, we gave each one a special, meaningful name. Table 1 shows examples of the emotion icon conversion step.

Emotion icons | Tweet | Tweet after converting the emotion icons

:'( ♥ | مع السلامه :'( ♥ | مع السلامه رمزحزين رمزقلب

XD | اي جمال XD | اي جمال رمزضحك

Table 1: Examples of converting emotion icons to meaningful text
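The emoticon conversion can be sketched as a dictionary-driven replacement that runs before the cleaning filters. This is a minimal illustration, not the authors' implementation: only the three icons from Table 1 are included, and `EMOTICON_NAMES` and `replace_emoticons` are hypothetical names.

```python
import re

# Only the three icons from Table 1; a real dictionary would be much larger.
# EMOTICON_NAMES and replace_emoticons are hypothetical names.
EMOTICON_NAMES = {
    ":'(": "رمزحزين",  # "sad symbol"
    "♥": "رمزقلب",     # "heart symbol"
    "XD": "رمزضحك",    # "laugh symbol"
}

# Try longer icons first so that overlapping icons are matched greedily.
EMOTICON_PATTERN = re.compile(
    "|".join(re.escape(icon) for icon in sorted(EMOTICON_NAMES, key=len, reverse=True))
)


def replace_emoticons(text):
    """Swap each icon for a space-padded Arabic word so that it survives
    filters that strip Latin letters and special characters."""
    return EMOTICON_PATTERN.sub(lambda m: f" {EMOTICON_NAMES[m.group(0)]} ", text)
```

Padding each placeholder with spaces keeps it a separate token under whitespace tokenization, and because each placeholder is an Arabic word longer than three letters, it also survives the length filter.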


The second problem was caused by variations in word forms and by diacritics during tokenization: the tokenizer treats diacritics as whitespace, so words containing them are split apart. Table 2 shows these tokenization problems.

Tweet before Tokenization Tweet after Tokenization

موقوتَاِكتابا َالُمْؤِمنينَََعٓلىَكانتالَصالةََإنكنينممؤالىلعانتكالةالصإن

موقوتتابا

ساعدونيانَتْحـَرراح ساعدونيرحـتانراح

جميعاــالمعليكمالســ جميعاــالمعليكمالســ

Table 2: Tokenization problems

The diacritic problem and some word forms, such as tatweel (letter elongation), were solved by a normalization process: manipulating Arabic text to produce a consistent form by converting the various forms of a word to a common form and removing diacritics. Table 3 shows the normalization cases.

Rule | Example

Tashkeel | الُمْؤِمنين → المؤمنين

Tatweel | اللــــه → الله

Hamza | ء or ئ or ؤ → ء

Alef | إ or أ or آ → ا

Lam-alef | لإ or لأ or لآ or لا → لا

Yeh | ى or ي → ي

Heh | ة or ه → ه

Table 3: Normalization cases
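The rules in Table 3 map onto a few character-level operations. The sketch below is an illustration under stated assumptions, not the authors' code: it removes the Arabic diacritics U+064B to U+0652 and the tatweel U+0640, then applies the letter mappings of the Alef, Hamza, Yeh, and Heh rules. The lam-alef cases are covered by the Alef mapping, since Unicode text normally stores lam-alef as two separate letters. `normalize` is a hypothetical name.

```python
import re

# Assumed Unicode code points for the Table 3 rules (illustrative sketch).
TASHKEEL = re.compile("[\u064B-\u0652]")  # fathatan ... sukun
TATWEEL = "\u0640"                         # elongation character
LETTER_MAP = str.maketrans({
    "\u0625": "\u0627",  # Alef: إ -> ا
    "\u0623": "\u0627",  # Alef: أ -> ا
    "\u0622": "\u0627",  # Alef: آ -> ا
    "\u0624": "\u0621",  # Hamza: ؤ -> ء
    "\u0626": "\u0621",  # Hamza: ئ -> ء
    "\u0649": "\u064A",  # Yeh: ى -> ي
    "\u0629": "\u0647",  # Heh: ة -> ه
})


def normalize(text):
    """Remove diacritics and tatweel, then unify letter variants."""
    text = TASHKEEL.sub("", text)
    text = text.replace(TATWEEL, "")
    return text.translate(LETTER_MAP)
```

Normalizing before tokenization keeps a diacritized word together. For example, Python's `re.findall(r"\w+", ...)` does not count diacritics as word characters, so it splits `كَتَبَ` into three one-letter tokens, while the normalized form `كتب` stays whole; this illustrates the kind of problem shown in Table 2.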

The last problem was stemming. In Arabic, there are different words that have different meanings but share the same root, which makes detecting the polarity of these words a very difficult task. Moreover, other problems occurred during the stemming process: the stemmer sometimes deleted some of the basic letters of a word. Table 4 shows these light stemmer problems. We therefore removed the stemming step from the text processing.

Tweet before stemming Tweet after stemmer

❤ليَولكَلمنَبعدكَ! تعدلَثلثَالقرانَ تعدل ثلث قر ولك لمن بعدك

انزينَخالصَاالمتحانَصعبَوَعيدكمَ

=((مباركانز خالص امتح صعب عيدكم مبارك

Table 4: Stemmer problems

• Initially, the accuracy of the NB classifier was 58.61%, while that of the k-NN classifier was 52.47%.

• After we solved all of these problems, we obtained better results: the accuracy of NB increased to 63.99% and that of k-NN reached 59.04%. We plan to perform more experiments in different settings to improve these results.

• RapidMiner is a great tool for Arabic text mining, but problems specific to the Arabic language can compel a researcher to work partially outside the RapidMiner environment and then return to it after solving the problems mentioned above.

• This migration may also happen because a researcher lacks knowledge of RapidMiner’s functionality.

• There is a lot of room to build new extensions so that RapidMiner becomes a one-window solution for every kind of Arabic text preprocessing.
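The reported accuracies can be put in perspective with a short calculation. Because the corpus is balanced at 1000 tweets per class, a classifier that always predicts the same class would score about one third; the gains below are simple differences of the reported figures.

```latex
\begin{aligned}
\text{majority-class baseline} &= \frac{1000}{3000} \approx 33.33\% \\
\Delta_{\mathrm{NB}} &= 63.99\% - 58.61\% = 5.38 \ \text{percentage points} \\
\Delta_{k\text{-NN}} &= 59.04\% - 52.47\% = 6.57 \ \text{percentage points}
\end{aligned}
```

Both classifiers stay well above this baseline; k-NN gained slightly more from the preprocessing fixes, while NB remained the more accurate of the two.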

Conclusion

Research on sentiment analysis for the Arabic language has been very limited compared to other languages such as English. This paper:

1. described the issues related to sentiment analysis of the Arabic language, and

2. showed how the RapidMiner tool helped to generate a classification model that discovers the sentiment (positive, negative, or neutral) of each tweet.

We found that, even though RapidMiner’s facilities are very helpful for manipulating Arabic text and performing text mining, there is a lot of room for developing new RapidMiner plug-ins that handle Arabic-specific issues.
